How to read a namespace declaration as it appears in an XML document using XMLStreamReader? - java

I have an XML file that I read using an XMLStreamReader object.
I'll keep it simple.
Take this XML example:
<mySample xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" attribute1="value1"/>
What I need is to get the String "xmlns:xsi" and the String "http://www.w3.org/2001/XMLSchema-instance".
I tried a test like this:
if (reader.getEventType() != XMLStreamConstants.NAMESPACE){
attributeName = reader.getAttributeLocalName(i);
attributeValue = reader.getAttributeValue(i);
}
else{
attributeName = reader.getNamespacePrefix(i) + reader.getNamespaceURI(i);
attributeValue = reader.getAttributeValue(i);
}
But it did not work.
Obviously I missed something, being a newbie to this API, so any help would be very welcome.

The JSR-173 specification (Stax API for Java) states the following regarding the NAMESPACE event :
Namespace
Namespace declarations can also exist outside of a StartElement and may be reported as a
standalone information item. In general Namespaces are reported as part of a StartElement
event. When namespaces are the result of an XQuery or XPath expression they may be
reported as standalone events.
So if you are looking at namespace events, you should most probably be checking StartElement events, and inspect them. Once again, from the spec :
Namespaces can be accessed using the following methods:
int getNamespaceCount();
String getNamespacePrefix(int index);
String getNamespaceURI(int index);
Only the namespaces declared on the current StartElement are available. The list does
not contain previously declared namespaces and does not remove redeclared namespaces.
At any point during the parsing, you can get the current complete namespace context :
The namespace context of the current state is available by calling
XMLStreamReader.getNamespaceContext() or
StartElement.getNamespaceContext(). These methods return an instance of the
javax.xml.namespace.NamespaceContext interface.
That's the theory: most namespace declarations come from START_ELEMENT, and some may come independently.
In practice, I have never come across a NAMESPACE event reported by the API when reading from a file. Namespace declarations are almost always reported as part of a START_ELEMENT (and repeated in the corresponding END_ELEMENT), so you must check START_ELEMENT if you are interested in namespace declarations. For example, starting with your document:
String xml = "<?xml version=\"1.0\" encoding=\"utf-8\" ?><mySample xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" attribute1=\"value1\"/>";
XMLStreamReader reader = XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml));
while (reader.hasNext()) {
int event = reader.next();
if (XMLStreamConstants.START_ELEMENT == event) {
if (reader.getNamespaceCount() > 0) {
// This happens
System.out.println("ELEMENT START: " + reader.getLocalName() + " , namespace count is: " + reader.getNamespaceCount());
for (int nsIndex = 0; nsIndex < reader.getNamespaceCount(); nsIndex++) {
String nsPrefix = reader.getNamespacePrefix(nsIndex);
String nsId = reader.getNamespaceURI(nsIndex);
System.out.println("\tNamespace prefix: " + nsPrefix + " associated with URI " + nsId);
}
}
} else if(XMLStreamConstants.NAMESPACE == event) {
// This almost never happens
System.out.println("NAMESPACE EVENT");
}
}
Will produce :
ELEMENT START: mySample , namespace count is: 1
Namespace prefix: xsi associated with URI http://www.w3.org/2001/XMLSchema-instance
Bottom line: check for both NAMESPACE and START_ELEMENT events. Most of the time only START_ELEMENT will report namespace declarations, but it is not one or the other; handle both.
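To get back the literal "xmlns:xsi" string the question asks for, the prefix reported on START_ELEMENT can be recombined with "xmlns". The following is a minimal sketch, not a definitive implementation; the class and method names (NamespaceDeclarations, firstElementDeclarations) are made up for illustration, and it assumes only the first element's declarations are of interest.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class NamespaceDeclarations {

    // Hypothetical helper: returns the namespace declarations of the first element,
    // keyed by their attribute-style name ("xmlns:prefix", or "xmlns" for the default namespace).
    static Map<String, String> firstElementDeclarations(String xml) {
        Map<String, String> declarations = new LinkedHashMap<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    for (int i = 0; i < reader.getNamespaceCount(); i++) {
                        String prefix = reader.getNamespacePrefix(i);
                        // A null or empty prefix is treated as the default namespace declaration.
                        String name = (prefix == null || prefix.isEmpty())
                                ? "xmlns"
                                : "xmlns:" + prefix;
                        declarations.put(name, reader.getNamespaceURI(i));
                    }
                    break; // only the first element is inspected in this sketch
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return declarations;
    }

    public static void main(String[] args) {
        String xml = "<mySample xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
                + " attribute1=\"value1\"/>";
        firstElementDeclarations(xml)
                .forEach((name, uri) -> System.out.println(name + " = " + uri));
    }
}
```

Ordinary attributes such as attribute1 are not included here; they are still read through getAttributeLocalName and getAttributeValue on the same START_ELEMENT event.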

Related

How to get the whole text of StAX XMLEvent object?

I am using the StAX iterator API to read an XML document.
XML:
<FormData OID="QUAL">
<IGData IGRepeatKey="1" IGOID="SQUAL" TType="Insert">
<IData Value="0859" IOID="SID"></IData>
<IData Value="DM" IOID="RDOMAIN"></IData>
</IGData>
<IGData IGRepeatKey="1" IGOID="SQUAL" TType="Insert">
<IData Value="0860" IOID="SID"></IData>
<IData Value="2013-01-03T02:00" IOID="QVAL"></IData>
</IGData>
</FormData>
And Stax code:
while(xmlEventReader.hasNext()){
xmlEvent = xmlEventReader.nextEvent();
eventString = xmlEvent.toString();
if(xmlEvent.isStartElement() && eventString.contains("FormData") && eventString.contains("QUAL")){
//do something
}
}
This works in my local environment (eventString contains the whole text of xmlEvent).
But when I deploy it to the server, eventString contains something like "Stax Event #1", so the if condition returns false.
I thought the two environments might be using different XMLEvent implementations, so I checked through code, and the jar is the same in both: jre1.8.0_73/lib/rt.jar!/javax/xml/stream/events/XMLEvent.class
How do I get the whole text of an XMLEvent object? Am I doing anything wrong here? Please suggest any alternatives.
The events reported for an element fall into three kinds:
Start Element
Characters
End Element
For example, to read the value of IGRepeatKey from your XML file, check in the Start Element event whether the IGData tag has started. If it has, iterate over that element's attributes, i.e. IGRepeatKey, IGOID and TType.
Try something like this:
Iterator<Attribute> iterator = element.getAttributes();
while (iterator.hasNext())
{
Attribute attribute = (Attribute)iterator.next();
QName name = attribute.getName();
String value = attribute.getValue();
System.out.println(name+" + "+value);
}
Put this loop inside the xmlEvent.isStartElement() block, where element is the StartElement obtained from xmlEvent.asStartElement().
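For the check in the question (a FormData element whose OID is QUAL), the element name and attribute can be compared directly instead of relying on toString(), whose output is implementation-specific. This is a minimal sketch under that assumption; the class and method names (FormDataMatcher, hasFormData) are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

class FormDataMatcher {

    // Hypothetical helper: true if the document contains a <FormData> start tag
    // whose OID attribute equals the given value.
    static boolean hasFormData(String xml, String oid) {
        try {
            XMLEventReader events =
                    XMLInputFactory.newFactory().createXMLEventReader(new StringReader(xml));
            while (events.hasNext()) {
                XMLEvent event = events.nextEvent();
                if (!event.isStartElement()) {
                    continue;
                }
                StartElement start = event.asStartElement();
                // getAttributeByName returns null when the attribute is absent.
                Attribute oidAttr = start.getAttributeByName(new QName("OID"));
                if ("FormData".equals(start.getName().getLocalPart())
                        && oidAttr != null
                        && oid.equals(oidAttr.getValue())) {
                    return true;
                }
            }
            return false;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<FormData OID=\"QUAL\"><IGData IGRepeatKey=\"1\" IGOID=\"SQUAL\"/></FormData>";
        System.out.println(hasFormData(xml, "QUAL"));
    }
}
```

Because this only uses the typed StartElement and Attribute accessors, it should not depend on which XMLEvent implementation the server happens to load.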

stax xml confusion with getname function

I have a xml file like this:
<comment type="PTM">
<text evidence="19">Sumoylated following its interaction with PIAS1 and UBE2I.</text>
</comment>
<comment type="PTM">
<text evidence="17">Ubiquitinated, leading to proteasomal degradation.</text>
</comment>
<comment type="disease">
<text>A chromosomal aberration involving ZMYND11 is a cause of acute poorly differentiated myeloid leukemia. Translocation (10;17)(p15;q21) with MBTD1.</text>
</comment>
<comment type="disease" evidence="23">
<disease id="DI-04257">
<name>Mental retardation, autosomal dominant 30</name>
<acronym>MRD30</acronym>
<description>A disorder characterized by significantly below average general intellectual functioning associated with impairments in adaptive behavior and manifested during the developmental period. MRD30 patients manifest mild intellectual disability and subtle facial dysmorphisms, including hypertelorism, ptosis, and a wide mouth.</description>
<dbReference type="MIM" id="616083"/>
</disease>
<text>The disease is caused by mutations affecting the gene represented in this entry.</text>
</comment>
<comment type="similarity">
<text evidence="8">Contains 1 bromo domain.</text>
</comment>
<comment type="similarity">
<text evidence="9">Contains 1 MYND-type zinc finger.</text>
</comment>
I use StAX to extract the disease information. This is part of my code:
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLEventReader eventReader = factory.createXMLEventReader( new FileReader(p));
while(eventReader.hasNext()){
XMLEvent event = eventReader.nextEvent();
switch(event.getEventType()){
case XMLStreamConstants.START_ELEMENT:
StartElement startElement = event.asStartElement();
String qName = startElement.getName().getLocalPart();
if (qName.equalsIgnoreCase("comment")) {
System.out.println("Start Element : comment");
Iterator<Attribute> attributes = startElement.getAttributes();
Attribute a = attributes.next();
System.out.println("ATRIBUTES " + a.getName());
type = a.getValue();
System.out.println("Roll No : " + type);
} else if(qName.equalsIgnoreCase("text") && type.equals("disease")){ text = true; }
break;
case XMLStreamConstants.CHARACTERS:
Characters characters = event.asCharacters();
if(text){ res = res + " " + characters.getData();
//System.out.println("TEXT: " + res);
text = false;
}
break;
case XMLStreamConstants.END_ELEMENT:
EndElement endElement = event.asEndElement();
if(endElement.getName().getLocalPart().equalsIgnoreCase("comment")){
//System.out.println("End Element : comment");
//System.out.println();
}
break;
For this type of line:
<comment type="disease">
I can extract the info correctly, but when I try to find comment type "disease" in this line:
<comment type="disease" evidence="23">
it gives me type=evidence and not type=disease as it should be. Therefore it doesn't save anything from this kind of line.
First of all, can we please get in the habit of using useful variable names? You have the following variables (with their types): a (Attribute), text (boolean), qName (String). These names leave me scratching my head and wondering what they are:
a - Not a useful name; it should be something like typeAttr, noting that it holds the type="" attribute.
text - It's a boolean?! Maybe collectText would be more appropriate, since it signals that you should collect the value of the next text event.
qName - It's a String holding the local part of a QName; if it's not a QName, don't name it as one.
But that's enough ranting; you get the idea. Your problem lies in how you get the attribute. XML attributes have no significant order, and you should not expect them to be returned in the order in which they are written. In your code you have the following:
Iterator<Attribute> attributes = startElement.getAttributes();
Attribute a = attributes.next();
System.out.println("ATRIBUTES " + a.getName());
type = a.getValue();
Here you get the first attribute from the element and set the type equal to its value. As I mentioned the XML attributes have no specific order so you are getting the evidence attribute. You should be getting the attribute by name:
Attribute a = startElement.getAttributeByName(QName.valueOf("type"));
System.out.println("ATRIBUTES " + a.getName());
type = a.getValue();
Sorry, no direct answer, but a comment on how to use StAX or XmlPull effectively: streaming XML parsers are designed to be friendly to recursive descent parsing (avoiding explicit state modeling, something you'd often need with a SAX parser). In your case I'd expect the following methods (rejecting or ignoring all unexpected content):
Comment parseComment(XMLEventReader eventReader) {
// call parseText and parseDisease for the corresponding element starts
}
Text parseText(XMLEventReader eventReader) {
}
Disease parseDisease(XMLEventReader eventReader) {
}
That said, there is a tradeoff: if you don't need the streaming aspect (performance), you may be better off just parsing to a DOM and then extracting the information as needed by walking or peeking into the DOM, avoiding a low-level XML API altogether.
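The recursive descent idea can be sketched with the cursor API as follows. This is only an illustration under stated assumptions: it uses XMLStreamReader rather than XMLEventReader, it collects just the text of disease comments instead of building Comment, Text and Disease objects, and all class and method names (DiseaseComments, diseaseTexts, parseComment, skipElement) are invented here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class DiseaseComments {

    // Hypothetical entry point: returns the <text> content of every <comment type="disease">.
    static List<String> diseaseTexts(String xml) {
        List<String> texts = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "comment".equals(reader.getLocalName())) {
                    parseComment(reader, texts);
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return texts;
    }

    // Called with the reader on <comment>; returns with the reader on the matching </comment>.
    private static void parseComment(XMLStreamReader reader, List<String> texts)
            throws XMLStreamException {
        // Look the attribute up by name, never by position.
        boolean disease = "disease".equals(reader.getAttributeValue(null, "type"));
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.END_ELEMENT) {
                return; // </comment>: child end tags are consumed below
            }
            if (event == XMLStreamConstants.START_ELEMENT) {
                if (disease && "text".equals(reader.getLocalName())) {
                    texts.add(reader.getElementText()); // leaves the reader on </text>
                } else {
                    skipElement(reader); // ignore unexpected or uninteresting content
                }
            }
        }
    }

    // Skips the current element and all of its descendants.
    private static void skipElement(XMLStreamReader reader) throws XMLStreamException {
        int depth = 1;
        while (depth > 0 && reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }

    public static void main(String[] args) {
        String xml = "<entry><comment type=\"disease\" evidence=\"23\">"
                + "<disease id=\"DI-04257\"><name>n</name></disease>"
                + "<text>caused by mutations</text></comment></entry>";
        diseaseTexts(xml).forEach(System.out::println);
    }
}
```

Each parse method owns exactly one element, so no boolean flags need to be carried between events.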
By using StAX I assume you are dealing with a large document, or a platform with limited resources. Memory overhead is largely a DOM-related issue. VTD-XML, on the other hand, is far more efficient than DOM while retaining virtually all the benefits of the DOM style of coding. Please read this research paper for more info:
http://sdiwc.us/digitlib/journal_paper.php?paper=00000582.pdf
import com.ximpleware.*;
public class queryAttr {
public static void main(String[] s) throws VTDException{
VTDGen vg = new VTDGen();
vg.selectLcDepth(5);// improve XPath performance for deep document
if (!vg.parseFile("input.xml", false))
return;
VTDNav vn = vg.getNav();
AutoPilot ap = new AutoPilot(vn);
ap.selectXPath("/root/comment[@type='disease' and @evidence='23']");
int i=0,j=0;
while((i=ap.evalXPath())!=-1){
if (vn.toElement(VTDNav.FIRST_CHILD)){
System.out.println(" element name: "+ vn.toString(vn.getCurrentIndex()));
j=vn.getText();
if (j!=-1) // j is the index of the text node, or -1 if there is none
System.out.println(""+vn.toString(j));
if (vn.toElement(VTDNav.NS)){
System.out.println(" element name: "+ vn.toString(vn.getCurrentIndex()));
j=vn.getText();
if (j!=-1)
System.out.println("text node==>"+vn.toString(j));
}
if (vn.toElement(VTDNav.NS)){
System.out.println(" element name: "+ vn.toString(vn.getCurrentIndex()));
j=vn.getText();
if (j!=-1)
System.out.println("text node==>"+vn.toString(j));
}
if (vn.toElement(VTDNav.NS)){
System.out.println(" element name: "+ vn.toString(vn.getCurrentIndex()));
j=vn.getText();
if (j!=-1)
System.out.println("text node==>"+vn.toString(j));
}
vn.toElement(VTDNav.PARENT);
}
}
}
}

Why did qName work and LocalName did not?

I was learning the Java SAX API. I made my own XML feed using PHP. Here is the XML Document
When I ran my application, nothing was printed to the console. I pinpointed the problem to the endElement method in my XMLHandler, which extends DefaultHandler. Here is my implementation of it.
public void endElement(String uri, String localName, String qName) throws SAXException {
//I added the next three lines for debugging
System.out.println("Found End Element " + count + " times");
System.out.println("Localname = " + localName);
System.out.println("QName = " + qName);
super.endElement(uri, localName, qName);
if (this.currentItem != null){
if (localName.equalsIgnoreCase(me.osama.XMLParsing.BaseFeedParser.ITEMNAME)){
currentItem.setItemName(builder.toString());
} else if (localName.equalsIgnoreCase(me.osama.XMLParsing.BaseFeedParser.ITEMSITE)){
currentItem.setItemSite(builder.toString());
} else if (localName.equalsIgnoreCase(me.osama.XMLParsing.BaseFeedParser.ITEMNO)){
currentItem.setItemNo(builder.toString());
} else if (localName.equalsIgnoreCase(me.osama.XMLParsing.BaseFeedParser.ITEM)){
System.out.println(currentItem);
items.add(currentItem);
}
builder.setLength(0);
}
count++;
}
It turns out that localName kept coming back empty, so the conditions never held true and the code never entered the decision block. qName, on the other hand, returned all the names properly, and once I changed the variable to qName the List<Item> items collection filled up and worked correctly.
Why did qName work and not localName, when the tutorial from IBM's developerWorks used an RSS feed and localName worked perfectly there?
P.S. this is the feed the IBM Tutorial used: http://www.androidster.com/android_news.rss
According to the SAX documentation on namespaces for the Java API:
By default, an XML reader will report a Namespace URI and a localName
for every element that belongs in a namespace, in both the start and
end handler.
Perhaps if you add a namespace to the XML and define your elements in that namespace, the parser will return a valid localName. The article also mentions that with namespace processing enabled, some implementations return an empty qName.
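Another possible cause, assuming the parser was obtained from SAXParserFactory (the question does not show how it was created): the factory is not namespace-aware by default, and SAX specifies localName as the empty string when namespace processing is not performed. The sketch below lets you compare both settings; the class and method names (LocalNameDemo, endElementLocalNames) are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class LocalNameDemo {

    // Hypothetical helper: parses the XML and records the localName reported for each end tag.
    static List<String> endElementLocalNames(String xml, boolean namespaceAware) {
        List<String> names = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // The factory default is false; set it explicitly to get local names reported.
            factory.setNamespaceAware(namespaceAware);
            factory.newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new DefaultHandler() {
                        @Override
                        public void endElement(String uri, String localName, String qName) {
                            names.add(localName);
                        }
                    });
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<items><item/></items>";
        System.out.println("namespace-aware:     " + endElementLocalNames(xml, true));
        System.out.println("not namespace-aware: " + endElementLocalNames(xml, false));
    }
}
```

If the second line shows empty names on your parser, either enable namespace awareness or fall back to qName when localName is empty.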

Dom4j xmlns attribute

I want to add an xmlns attribute to the root node only. However, when I add a namespace to the root element, all subsequent child elements also get the same xmlns attribute. How do I add an xmlns attribute to a single node but not to any of its children?
CODE:
public String toXml() {
Document document = DocumentHelper.createDocument();
Element documentRoot = document.addElement("ResponseMessage");
documentRoot.addNamespace("",getXmlNamespace())
.addAttribute("xmlns:xsi", getXmlNamespaceSchemaInstance())
.addAttribute("xsi:schemaLocation", getXmlSchemaLocation())
.addAttribute("id", super.getId());
Element header = documentRoot.addElement("Header");
buildHeader(header);
Element body = documentRoot.addElement("Body");
buildProperties(body);
body.addElement("StatusMessage").addText(this.getStatusMessage().getMessage());
return document.asXML();
}
OK, new answer.
If you want your elements to belong to a certain namespace, be sure to create them in that namespace. Use the methods that take a QName as one of their arguments. If you create an element with no namespace, DOM4J has to add namespace declarations to accommodate your (unintended) specification.
Your example slightly edited. I didn't use QName, but gave each element a namespace uri:
public static String toXml() {
Document document = DocumentHelper.createDocument();
Element documentRoot = document.addElement("ResponseMessage",
getXmlNamespace());
documentRoot.addAttribute(QName.get("schemaLocation", "xsi", "xsi-ns"),
"schema.xsd").addAttribute("id", "4711");
Element header = documentRoot.addElement("Header");
Element body = documentRoot.addElement("Body", getXmlNamespace());
// buildProperties(body);
body.addElement("StatusMessage", getXmlNamespace()).addText("status");
return document.asXML();
}
private static String getXmlNamespace() {
return "xyzzy";
}
public static void main(String[] args) throws Exception {
System.out.println(toXml());
}
produces as output:
<?xml version="1.0" encoding="UTF-8"?>
<ResponseMessage xmlns="xyzzy" xmlns:xsi="xsi-ns" xsi:schemaLocation="schema.xsd" id="4711">
<Header/><Body><StatusMessage>status</StatusMessage></Body>
</ResponseMessage>
UPDATE 2:
Note also that I changed the way the schemaLocation attribute is declared. You really never have to manage the namespace declarations manually; the library takes care of this.
However, there is one case where it might be useful to add a namespace declaration: if you have a document with predominantly namespace X elements, and some child elements with namespace Y spread throughout the document, declaring a namespace binding for Y at the root element may save a lot of repeated namespace declarations in the child elements.
Here's how. It's a bit of a hack, but it does what you want:
public String toXml() { // not static, because it calls super.getId()
Document d = DocumentHelper.createDocument();
Namespace rootNs = new Namespace("", DEFAULT_NAMESPACE); // root namespace uri
Namespace xsiNs = new Namespace("xsi", XSI_NAMESPACE); // xsi namespace uri
QName rootQName = QName.get(rootElement, rootNs); // your root element's name
Element root = d.addElement(rootElement);
root.setQName(rootQName);
root.add(xsiNs);
root.addAttribute("xsi:schemaLocation", SCHEMA_LOC)
.addAttribute("id", super.getId());
Element header = root.addElement("Header");
Element body = root.addElement("Body", getXmlNamespace());
// buildProperties(body);
body.addElement("StatusMessage", getXmlNamespace()).addText("status");
return d.asXML();
}

Retrieving Xpath from SOAP Message

I want to retrieve all the xpaths from soap message at run time.
For example, if I have a soap message like
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body xmlns:ns1="http://xmlns.oracle.com/TestAppln_jws/TestEmail/TestEmail">
<ns1:process>
<ns1:To></ns1:To>
<ns1:Subject></ns1:Subject>
<ns1:Body></ns1:Body>
</ns1:process>
</soap:Body>
</soap:Envelope>
then the possible xpaths from this soap message are
/soap:Envelope/soap:Body/ns1:process/ns1:To
/soap:Envelope/soap:Body/ns1:process/ns1:Subject
/soap:Envelope/soap:Body/ns1:process/ns1:Body
How can I retrieve those with Java?
Use the XPath type with a NamespaceContext.
Map<String, String> map = new HashMap<String, String>();
map.put("foo", "http://xmlns.oracle.com/TestAppln_jws/TestEmail/TestEmail");
NamespaceContext context = ...; //TODO: context from map
XPath xpath = ...; //TODO: create instance from factory
xpath.setNamespaceContext(context);
Document doc = ...; //TODO: parse XML
String toValue = xpath.evaluate("//foo:To", doc);
The double forward slash makes this expression match To elements in the http://xmlns.oracle.com/TestAppln_jws/TestEmail/TestEmail namespace anywhere in the given node; evaluated as a string, it yields the text of the first match. It does not matter that I used foo instead of ns1; the prefix mapping needs to match the one in the XPath expression, not the one in the document.
You can find further examples in Java: using XPath with namespaces and implementing NamespaceContext. You can find further examples of working with SOAP here.
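One way the TODO placeholders in the snippet above might be filled in is sketched here. It is a sketch under assumptions, not the answer author's code: MapNamespaceContext and its evaluate helper are hypothetical names, and the document is parsed with a namespace-aware DocumentBuilderFactory, which the prefixed XPath needs in order to match.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical NamespaceContext backed by a prefix-to-URI map.
class MapNamespaceContext implements NamespaceContext {

    private final Map<String, String> prefixToUri;

    MapNamespaceContext(Map<String, String> prefixToUri) {
        this.prefixToUri = prefixToUri;
    }

    @Override
    public String getNamespaceURI(String prefix) {
        String uri = prefixToUri.get(prefix);
        return uri != null ? uri : XMLConstants.NULL_NS_URI;
    }

    @Override
    public String getPrefix(String namespaceURI) {
        for (Map.Entry<String, String> entry : prefixToUri.entrySet()) {
            if (entry.getValue().equals(namespaceURI)) {
                return entry.getKey();
            }
        }
        return null;
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        String prefix = getPrefix(namespaceURI);
        return prefix == null
                ? Collections.<String>emptyIterator()
                : Collections.singletonList(prefix).iterator();
    }

    // Parses the XML (namespace-aware) and evaluates the expression as a string.
    static String evaluate(String xml, String expression, Map<String, String> namespaces) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for prefixed XPath steps to match
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new MapNamespaceContext(namespaces));
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        String ns = "http://xmlns.oracle.com/TestAppln_jws/TestEmail/TestEmail";
        String xml = "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">"
                + "<soap:Body xmlns:ns1=\"" + ns + "\"><ns1:process>"
                + "<ns1:To>someone@example.com</ns1:To>"
                + "</ns1:process></soap:Body></soap:Envelope>";
        System.out.println(evaluate(xml, "//foo:To", Collections.singletonMap("foo", ns)));
    }
}
```

The sample recipient value is invented; the original message has empty To, Subject and Body elements.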
Something like this could work:
string[] paths;
function RecurseThroughRequest(string request, string[] paths, string currentPath)
{
Nodes[] nodes = getNodesAtPath(request, currentPath);
//getNodesAtPath is an assumed function which returns a set of
//Node objects representing all the nodes that are children at the current path
foreach(Node n in nodes)
{
if(!n.hasChildren())
{
paths.Add(currentPath + "/" + n.Name);
}
else
{
RecurseThroughRequest(request, paths, currentPath + "/" + n.Name);
}
}
}
And then call the function with something like this:
string[] paths = new string[];
RecurseThroughRequest(request, paths, "/");
Of course that won't work out of the gates, but I think the theory is there.
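The same recursion can be written in Java with the JDK's DOM parser. This is a minimal sketch of the idea rather than a finished tool: LeafPaths and its methods are invented names, paths use the prefixes exactly as they appear in the document, and repeated sibling elements are not given positional indexes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class LeafPaths {

    // Hypothetical helper: returns one path per element that has no child elements.
    static List<String> leafPaths(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> paths = new ArrayList<>();
            collect(doc.getDocumentElement(), "", paths);
            return paths;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Depth-first walk: extend the path with this element, then recurse into child elements.
    private static void collect(Element element, String parentPath, List<String> paths) {
        String path = parentPath + "/" + element.getNodeName();
        boolean hasChildElement = false;
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                hasChildElement = true;
                collect((Element) child, path, paths);
            }
        }
        if (!hasChildElement) {
            paths.add(path); // leaf element: record its full path
        }
    }

    public static void main(String[] args) {
        String xml = "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">"
                + "<soap:Body xmlns:ns1=\"http://xmlns.oracle.com/TestAppln_jws/TestEmail/TestEmail\">"
                + "<ns1:process><ns1:To></ns1:To><ns1:Subject></ns1:Subject><ns1:Body></ns1:Body>"
                + "</ns1:process></soap:Body></soap:Envelope>";
        leafPaths(xml).forEach(System.out::println);
    }
}
```

To evaluate the resulting paths later with an XPath object, the prefixes in them still have to be bound through a NamespaceContext.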
