unable to parse a node from xml string with dom4j

unable to parse a node from xml string with dom4j - java

I'm parsing a xml string with dom4j and I'm using xpath to select some element from it, the code is :
String test = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><epp xmlns=\"urn:ietf:params:xml:ns:epp-1.0\"><response><result code=\"1000\"><msg lang=\"en-US\">Command completed successfully</msg></result><trID><clTRID>87285586-99412370</clTRID><svTRID>52639BB8-1-ARNES</svTRID></trID></response></epp>";
SAXReader reader = new SAXReader();
reader.setIncludeExternalDTDDeclarations(false);
reader.setIncludeInternalDTDDeclarations(false);
reader.setValidation(false);
Document xmlDoc;
try {
xmlDoc = reader.read(new StringReader(test));
xmlDoc.getRootElement();
Node nodeStatus = xmlDoc.selectSingleNode("//epp/response/result");
System.out.print(nodeStatus.getText());
} catch (DocumentException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
I always get null for the nodeStatus variable. I actualy nead to read the code from the result noad from the xml
<result code="1000">
This is the XML that I am reading from the String test:
<?xml version="1.0" encoding="UTF-8"?>
<epp xmlns="urn:ietf:params:xml:ns:epp-1.0">
<response>
<result code="1000">
<msg lang="en-US">Command completed successfully</msg>
</result>
<trID>
<clTRID>87285586-99412370</clTRID>
<svTRID>52639BB8-1-ARNES</svTRID>
</trID>
</response>
</epp>
Any hints?

Your XML has a namespace. DOM4J returns null because it won't find your nodes.
To make it work, you first have to register the namespaces you are using. You will need a prefix. Any one. And you will have to use that prefix in your XPath.
You could use tns for "target namespace". Then you have to create a xpath object with it like this:
XPath xpath = new DefaultXPath("/tns:epp/tns:response/tns:result");
To register the namespaces you will need to create a Map, add the namespace with the prefix you used in the xpath expression, and pass it to the setNamespaceURIs() method.
namespaces.put("tns", "urn:ietf:params:xml:ns:epp-1.0");
xpath.setNamespaceURIs(namespaces);
Now you can call selectSingleNode, but you will call it on your XPath object passing the document as the argument:
Node nodeStatus = xpath.selectSingleNode(xmlDoc);
From there you can extract the data you need. getText() won't give you the data you want. If you want the contents of the result node as XML, you can use:
nodeStatus.asXML()
Edit: to retrieve just the code, change your XPath to:
/tns:epp/tns:response/tns:result/#code
And retrieve the result with
nodeStatus.getText();
I replaced the double slash // (which means descendant-or-self) with / since the expression contains the full path and / is more efficient. But if you only have one result node in your whole file, you can use:
//result/#code
to extract the data. It will match all descendants. If there is more than one result, it will return a node-set.

Related

How do I retrieve one specific piece of data from a XML file using XMLreader?

A real example of the XML data I have to parse through and how the file is configured. this is how the file is presented to me.
<?xml version="1.0"?>
<session>
<values>
<value id="FILE_CREATE_DATE">
<timestamp>2012-04-16T21:33:31Z</timestamp>
</value>
<value id="LAST_ACCESSED">
<timestamp>2012-09-17T17:15:23Z</timestamp>
</value>
<value id="VERSION_TIMESTAMP">
<timestamp>2012-04-16T21:33:31Z</timestamp>
</value>
</values>
</session>
I need to go into this file and retrieve the FILE_CREATE_DATE data.
My code so far:
File xmlFile = new File(XMLFileData[i].getPath());
FileInputStream myXMLStream = new FileInputStream(xmlFile);
XMLInputFactory XMLFactory = XMLInputFactory.newInstance();
XMLStreamReader XMLReader = XMLFactory.createXMLStreamReader(myXMLStream);
while(XMLReader.hasNext())
{
if (XMLReader.getEventType() == XMLStreamReader.START_ELEMENT)
{
String XMLTag = XMLReader.getLocalName();
if(XMLReader.hasText())
{
System.out.println(XMLReader.getText());
break;
}
}
XMLReader.next();
}
the 'getLocalName()' function returns 'Sessions' then 'value' then 'values' but never returns the actual name of the element. I need to test to see if I am at the right element then retrieve the data from that element...

I use Jsoup which is a library for parsing HTML. But it can be used for xml too. you would first have to load the XML file into a Document object then simply call
doc.getElementById("FILE_CREATE_DATE");
This will return an Element object that will have the timestamp as a child. Here's a link to the library: https://jsoup.org/
This is my first StackOverflow answer so let me know if it helps !

Your id is not an element - it's element attribute.
You should read attribute of your value node, see the javadoc for getAttributeValue method:
http://docs.oracle.com/javase/6/docs/api/javax/xml/stream/XMLStreamReader.html#getAttributeValue(java.lang.String,%20java.lang.String)
Returns the normalized attribute value of the attribute with the
namespace and localName If the namespaceURI is null the namespace is
not checked for equality
So it will be:
String XMLTag = XMLReader.getLocalName();
if(XMLTag.equals("value")) {
String idValue = XMLReader.getAttributeValue(null, "id");
//here idValue will be equal to FILE_CREATE_DATE, LAST_ACCESSED or VERSION_TIMESTAMP
}

try something like
if(XMLReader.getAttributeValue(0).equalIgnorecase("FILE_CREATE_DATE"))
getAttributeValue : Return value of the given index of the attribute. for
<value id="FILE_CREATE_DATE">
id is the first attribute. So XMLReader.getAttributeValue(0)
but before calling this you have to validate whether element has the first attribute. Because all the tags does not have at least 1 attribute.
in jsoup you can query like this
public static void main(String[] args) {
Document doc;
try {
doc = Jsoup.connect("http://www.dropbox.com/public/xml/yourfile.xml").userAgent("Mozilla").get();
//<value id="FILE_CREATE_DATE">
Elements links = doc.select("value[id=FILE_CREATE_DATE]");
for (Element link : links) {
if(link.attr("id").contains("FILE_CREATE_DATE"))//find the link with some texts
{
System.out.println("here is the element you need");
}
}
} catch (IOException e) {
e.printStackTrace();
}
}

XMLInputFactory XMLFactory = XMLInputFactory.newInstance();
XMLStreamReader XMLReader = XMLFactory.createXMLStreamReader(myXMLStream);
while(XMLReader.hasNext())
{
if (XMLReader.getEventType() == XMLStreamReader.START_ELEMENT)
{
String XMLTag = XMLReader.getLocalName();
if(XMLTag.equals("value"))
{
String idValue = XMLReader.getAttributeValue(null, "id");
if (idValue.equals("FILE_CREATE_DATE"))
{
System.out.println(idValue);
XMLReader.nextTag();
System.out.println(XMLReader.getElementText());
}
}
}
XMLReader.next();
}
So this code is the final result of all my anguish on the topic of recovering specific data from a XML data file. I want to thank everyone who helped me out with answers - regardless on if it was what I was looking for they got me thinking and that led to the solution...

Dom4j xmlns attribute

I want to add xmlns attribute to the root node only, however when i add a namespace to the root element, all subsequent child elements also get the same xmlns attribute. How do I add xmlns attribute to a single node but not any of its children ?
CODE:
public String toXml() {
Document document = DocumentHelper.createDocument();
Element documentRoot = document.addElement("ResponseMessage");
documentRoot.addNamespace("",getXmlNamespace())
.addAttribute("xmlns:xsi", getXmlNamespaceSchemaInstance())
.addAttribute("xsi:schemaLocation", getXmlSchemaLocation())
.addAttribute("id", super.getId());
Element header = documentRoot.addElement("Header");
buildHeader(header);
Element body = documentRoot.addElement("Body");
buildProperties(body);
body.addElement("StatusMessage").addText(this.getStatusMessage().getMessage());
return document.asXML();
}

OK, new answer.
If you want your elements to belong to a certain namespace, be sure to create them in that namespace. Use the methods that have Qname as one of its arguments. If you create an element with no namespace, DOM4J will have to add namespace declarations to accommodate to your (unwillingly) specification.
Your example slightly edited. I didn't use QName, but gave each element a namespace uri:
public static String toXml() {
Document document = DocumentHelper.createDocument();
Element documentRoot = document.addElement("ResponseMessage",
getXmlNamespace());
documentRoot.addAttribute(QName.get("schemaLocation", "xsi", "xsi-ns"),
"schema.xsd").addAttribute("id", "4711");
Element header = documentRoot.addElement("Header");
Element body = documentRoot.addElement("Body", getXmlNamespace());
// buildProperties(body);
body.addElement("StatusMessage", getXmlNamespace()).addText("status");
return document.asXML();
}
private static String getXmlNamespace() {
return "xyzzy";
}
public static void main(String[] args) throws Exception {
System.out.println(toXml());
}
produces as output:
<?xml version="1.0" encoding="UTF-8"?>
<ResponseMessage xmlns="xyzzy" xmlns:xsi="xsi-ns" xsi:schemaLocation="schema.xsd" id="4711">
<Header/><Body><StatusMessage>status</StatusMessage></Body>
</ResponseMessage>
UPDATE 2:
Note also, that I changed the way how the schemaLocation attribute is declared. You really never have to manually manage the namespace declarations--this will be taken care of by the library.
However, there is one case where it might be useful to add a namespace delaration: If you have a document with predominantly namespace X elements, and some child elements with namspace Y spread out in the document, declaring a namesapce binding for Y at the root element, may save a lot of repeating name space declarations in the child elements.

Heres how. Its a bit of a hack, but it does what you want:
public static String toXml() {
Document d = DocumentHelper.createDocument();
Namespace rootNs = new Namespace("", DEFAULT_NAMESPACE); // root namespace uri
Namespace xsiNs = new Namespace("xsi", XSI_NAMESPACE); // xsi namespace uri
QName rootQName = QName.get(rootElement, rootNs); // your root element's name
Element root = d.addElement(rootElement);
root.setQName(rootQName);
root.add(xsiNs);
root.addAttribute("xsi:schemaLocation", SCHEMA_LOC)
.addAttribute("id", super.getId());
Element header = documentRoot.addElement("Header");
Element body = documentRoot.addElement("Body", getXmlNamespace());
// buildProperties(body);
body.addElement("StatusMessage", getXmlNamespace()).addText("status");
return document.asXML();
}

Efficient way of parsing XML in Java

I have to parse an XML file with following structure:
<root>
<object_1>
<pro1> abc </pro1>
<pro2> pqr </pro2>
<pro3> xyz </pro3>
<children>
<object_a>
<pro1> abc </pro1>
<pro2> pqr </pro2>
<pro3> xyz </pro3>
<children>
.
.
.
</children>
</object_a>
</children>
</object_1>
<object_2>
.
.
.
</object_n>
</root>
Aim is to parse this multilevel nesting. A few classes are defined in Java.
Class Object_1
Class Object_2
.
.
.
Class Object_N
with their respective properties.
The following code is working for me, but then this is not the best way of doing things.
File file = new File(fileName);
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(file);
doc.getDocumentElement().normalize();
if(doc ==null) return;
Node node = doc.getFirstChild();
NodeList lst = node.getChildNodes();
Node children = null ;
int len = lst.getLength();
for(int index=0;index<len;index++)
{
Node child = lst.item(index);
String name = child.getNodeName();
if(name=="Name")
name = child.getNodeValue();
else if(name=="Comment")
comment = child.getNodeValue());
else if(name=="children")
children = child;
}
if(children==null) return;
lst = children.getChildNodes();
len = lst.getLength();
Class<?> obj=null;
AbsModel model = null;
for(int index=0;index<len;index++)
{
Node childNode = lst.item(index);
String modelName = childNode.getNodeName();
try {
obj = Class.forName(modelName);
} catch (ClassNotFoundException e) {
e.printStackTrace();
}
if(obj!=null)
model = (AbsModel) obj.newInstance();
else
model = new GenericModel();
model.restoreDefaultPropFromXML(childNode);
addChild(model);
}
}
Is there a better way of parsing this XML.

Consider using JAXB, which is part of Java since version 6. You should be able to parse (“unmarshall”) your XML file into your own classes with almost no code, just adding a few annotations expliciting the mapping between your object structure and your XML structure.

StAX and or JAXB is almost always the way to go.
If the XML is really dynamic (like attributes specify the property name) ie <prop name="property" value="" /> then you will need to use StAX only or live with what JAXB will map it to (a POJO with name and value properties) and post process.
Personally I find combining StAX and JAXB the best. I parse to the elements I want and then use JAXB to turn the element into a POJO.
See Also:
My own utility library that will turn an XML Stream into an iterator of objects.
Parsing very large XML files and marshalling to Java Objects
http://tedone.typepad.com/blog/2011/06/unmarshalling-benchmark-in-java-jaxb-vs-stax-vs-woodstox.html

While JAXB may be the best choice I'd also like to mention jOOX which provides a JQuery-like API and makes working with XML documents really pleasant.

Retrieving Xpath from SOAP Message

I want to retrieve all the xpaths from soap message at run time.
For example, if I have a soap message like
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Bodyxmlns:ns1="http://xmlns.oracle.com/TestAppln_jws/TestEmail/TestEmail">
<ns1:process>
<ns1:To></ns1:To>
<ns1:Subject></ns1:Subject>
<ns1:Body></ns1:Body>
</ns1:process>
</soap:Body>
</soap:Envelope>
then the possible xpaths from this soap message are
/soap:Envelope/soap:Body/ns1:process/ns1:To
/soap:Envelope/soap:Body/ns1:process/ns1:Subject
/soap:Envelope/soap:Body/ns1:process/ns1:Body
How can i retrive those with java?

Use the XPath type with a NamespaceContext.
Map<String, String> map = new HashMap<String, String>();
map.put("foo", "http://xmlns.oracle.com/TestAppln_jws/TestEmail/TestEmail");
NamespaceContext context = ...; //TODO: context from map
XPath xpath = ...; //TODO: create instance from factory
xpath.setNamespaceContext(context);
Document doc = ...; //TODO: parse XML
String toValue = xpath.evaluate("//foo:To", doc);
The double forward slash makes this expression match the first To element in the http://xmlns.oracle.com/TestAppln_jws/TestEmail/TestEmail in the given node. It does not matter that I used foo instead of ns1; the prefix mapping needs to match the one in the XPath expression, not the one in the document.
You can find further examples in Java: using XPath with namespaces and implementing NamespaceContext. You can find further examples of working with SOAP here.

Something like this could work:
string[] paths;
function RecurseThroughRequest(string request, string[] paths, string currentPath)
{
Nodes[] nodes = getNodesAtPath(request, currentPath);
//getNodesAtPath is an assumed function which returns a set of
//Node objects representing all the nodes that are children at the current path
foreach(Node n in nodes)
{
if(!n.hasChildren())
{
paths.Add(currentPath + "/" + n.Name);
}
else
{
RecurseThroughRequest(paths, currentPath + "/" + n.Name);
}
}
}
And then call the function with something like this:
string[] paths = new string[];
RecurseThroughRequest(request, paths, "/");
Of course that won't work out of the gates, but I think the theory is there.

NamespaceContext and using namespaces with XPath

Resolving an xpath that includes namespaces in Java appears to require the use of a NamespaceContext object, mapping prefixes to namespace urls and vice versa. However, I can find no mechanism for getting a NamespaceContext other than implementing it myself. This seems counter-intuitive.
The question: Is there any easy way to acquire a NamespaceContext from a document, or to create one, or failing that, to forgo prefixes altogether and specify the xpath with fully qualified names?

It is possible to get a NamespaceContext instance without writing your own class. Its class-use page shows you can get one using the javax.xml.stream package.
String ctxtTemplate = "<data xmlns=\"http://base\" xmlns:foo=\"http://foo\" />";
NamespaceContext nsContext = null;
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLEventReader evtReader = factory
.createXMLEventReader(new StringReader(ctxtTemplate));
while (evtReader.hasNext()) {
XMLEvent event = evtReader.nextEvent();
if (event.isStartElement()) {
nsContext = ((StartElement) event)
.getNamespaceContext();
break;
}
}
System.out.println(nsContext.getNamespaceURI(""));
System.out.println(nsContext.getNamespaceURI("foo"));
System.out.println(nsContext
.getNamespaceURI(XMLConstants.XMLNS_ATTRIBUTE));
System.out.println(nsContext
.getNamespaceURI(XMLConstants.XML_NS_PREFIX));
Forgoing prefixes altogether is likely to lead to ambiguous expressions - if you want to drop namespace prefixes, you'd need to change the document format. Creating a context from a document doesn't necessarily make sense. The prefixes have to match the ones used in the XPath expression, not the ones in any document, as in this code:
String xml = "<data xmlns=\"http://base\" xmlns:foo=\"http://foo\" >"
+ "<foo:value>"
+ "hello"
+ "</foo:value>"
+ "</data>";
String expression = "/stack:data/overflow:value";
class BaseFooContext implements NamespaceContext {
#Override
public String getNamespaceURI(String prefix) {
if ("stack".equals(prefix))
return "http://base";
if ("overflow".equals(prefix))
return "http://foo";
throw new IllegalArgumentException(prefix);
}
#Override
public String getPrefix(String namespaceURI) {
throw new UnsupportedOperationException();
}
#Override
public Iterator<String> getPrefixes(
String namespaceURI) {
throw new UnsupportedOperationException();
}
}
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
xpath.setNamespaceContext(new BaseFooContext());
String value = xpath.evaluate(expression,
new InputSource(new StringReader(xml)));
System.out.println(value);
Neither the implementation returned by the StAX API nor the one above implement the full class/method contracts as defined in the doc. You can get a full, map-based implementation here.

I've just been working through using xpath and NamespaceContexts myself. I came across a good treatment of the issue on developerworks.

I found a convenient implementation in "Apache WebServices Common Utilities" called NamespaceContextImpl.
You can use the following maven dependency to obtain this class:
<dependency>
<groupId>org.apache.ws.commons</groupId>
<artifactId>ws-commons-util</artifactId>
<version>1.0.1</version>
</dependency>
I've use it in the following manner (I know its built for sax, but after reading the code, its o.k):
NamespaceContextImpl nsContext = new NamespaceContextImpl();
nsContext.startPrefixMapping("foo", "my.name.space.com");
You don't need to called endPrefixMapping.

If you are using the Spring framework you can reuse their NamespaceContext implementation
org.springframework.util.xml.SimpleNamespaceContext
This is a similar answer like the one from Asaf Mesika. So it doesn't give you automatic a NamespaceContext based on your document. You have to construct it yourself. Still it helps you because it at least gives you an implementation to starts with.
When we faced a similar problem, Both the spring SimpleNamespaceContext and the "Apache WebServices Common Utilities" worked. We wanted to avoid to the addition jar dependency on Apache WebServices Common Utilities and used the Spring one, because our application is Spring based.

If you are using Jersey 2 and only have a default XML namespace (xmlns="..."), you can use SimpleNamespaceResolver:
<?xml version="1.0" encoding="UTF-8"?>
<Outer xmlns="http://host/namespace">
<Inner />
</Outer>
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
docBuilderFactory.setNamespaceAware(true);
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document document = docBuilder.parse(new File("document.xml"));
String query = "/t:Outer/t:Inner";
XPath xpath = XPathFactory.newInstance().newXPath();
String xmlns = document.getDocumentElement().getAttribute("xmlns");
xpath.setNamespaceContext(new SimpleNamespaceResolver("t", xmlns));
NodeList nodeList = (NodeList) xpath.evaluate(query, document, XPathConstants.NODESET);
//nodeList will contain the <Inner> element
You can also specify xmlns manually if you want.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

unable to parse a node from xml string with dom4j - java

Related

How do I retrieve one specific piece of data from a XML file using XMLreader?

Dom4j xmlns attribute

Efficient way of parsing XML in Java

Retrieving Xpath from SOAP Message

NamespaceContext and using namespaces with XPath

Categories

Resources