How to get XML comments using Java's DocumentBuilder

I have an application that uses SAML authentication, acts as an SP, and therefore parses SAMLResponses. I received notification that an IdP that communicates with my application will now start signing their SAMLResponses with http://www.w3.org/2001/10/xml-exc-c14n#WithComments, which means comments matter when calculating the validity of the SAML signature.
Here lies the problem - the library I use for XML parsing strips these comment nodes by default. See this example program:
import org.apache.commons.io.IOUtils;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
public class Main {
public static void main(String[] args) {
try {
String xml = "<NameID>test#email<!---->.com</NameID>";
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setNamespaceAware(true);
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
Document doc = documentBuilder.parse(IOUtils.toInputStream(xml));
NodeList nodes = doc.getElementsByTagName("NameID");
if (nodes == null || nodes.getLength() == 0)
{
throw new RuntimeException("No NameID in document");
}
System.out.println(nodes.item(0).getTextContent());
} catch(Exception e) {
System.err.println(e.getMessage());
}
}
}
So, this program will print test#email.com (which means that's what my SAML code will get, too). This is a problem, as I'm pretty sure it will cause signature validation to fail without the comment included, since the XML document was signed with the #WithComments canonicalization method.
Is there any way to get DocumentBuilder/getTextContent() to leave in comment nodes so my signature is not invalidated by the missing comment?
Documentation for getTextContent() is here: https://docs.oracle.com/javase/7/docs/api/org/w3c/dom/Node.html#getTextContent()

Your code actually retains the comment.
Here, slightly modified:
public static void main(String[] args) throws Exception {
String xml = "<NameID>test#email<!--foobar-->.com</NameID>";
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setNamespaceAware(true);
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
Document doc = documentBuilder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
NodeList childNodes = doc.getDocumentElement().getChildNodes();
Node[] nodes = new Node[childNodes.getLength()];
for (int index = 0; index < childNodes.getLength(); index++) {
nodes[index] = childNodes.item(index);
}
System.out.println(nodes[1].getTextContent());
}
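Prints foobar.
There are 3 child nodes of the root element, and one of them is the comment node, so it is actually retained. Only getTextContent() on the element leaves it out, because that method is specified to skip comment nodes when it concatenates the text of the children.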
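The difference between what the DOM holds and what getTextContent() returns can be checked by listing the children of NameID together with their node types. This is a minimal sketch using only standard JDK DOM calls; the class name CommentNodes and the helper describeChildren are hypothetical names picked for illustration, and nothing here is specific to any SAML library.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical demo class (illustration only).
final class CommentNodes {

    // Returns one line per child of the root element in the form "<type>:<value>".
    static String describeChildren(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Comments are kept unless this flag is switched on; false is the default.
        factory.setIgnoringComments(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        StringBuilder out = new StringBuilder();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            String type = child.getNodeType() == Node.COMMENT_NODE ? "comment"
                    : child.getNodeType() == Node.TEXT_NODE ? "text" : "other";
            out.append(type).append(':').append(child.getNodeValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(describeChildren("<NameID>test#email<!--foobar-->.com</NameID>"));
    }
}
```

If signature validation canonicalizes the DOM nodes themselves rather than a string built from getTextContent(), the comment node listed here should still be part of what gets canonicalized.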

Related

Not able to parse inner elements of XML using DocumentBuilderFactory in Java

I have a response as XML and I'm trying to parse it to get the inner details. I'm using DocumentBuilderFactory for this. The parent object is not null, but when I try to get the nested elements from the node list, it returns null. Am I missing anything?
Here is my response XML
ResponseXML
<DATAPACKET REQUEST-ID = "1">
<HEADER>
</HEADER>
<BODY>
<CONSUMER_PROFILE2>
<CONSUMER_DETAILS2>
<NAME>David</NAME>
<DATE_OF_BIRTH>1949-01-01T00:00:00+03:00</DATE_OF_BIRTH>
<GENDER>001</GENDER>
</CONSUMER_DETAILS2>
</CONSUMER_PROFILE2></BODY></DATAPACKET>
and I'm parsing it in the following way:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(responseXML));
// Consumer details.
if(doc.getDocumentElement().getElementsByTagName("CONSUMER_DETAILS2") != null) {
Node consumerDetailsNode = doc.getDocumentElement().getElementsByTagName("CONSUMER_DETAILS2").item(0); // This is coming back as null
dateOfBirth = getNamedItem(consumerDetailsNode, "DATE_OF_BIRTH");
System.out.println("DOB:"+dateOfBirth);
}
getNamedItem
private static String getNamedItem(Node searchResultNode, String param) {
return searchResultNode.getAttributes().getNamedItem(param) != null ? searchResultNode.getAttributes().getNamedItem(param).getNodeValue() : "";
}
Any ideas would be greatly appreciated.

The easiest way to search for individual elements within an XML document is with XPath. It provides a search syntax similar to file system notation.
Here is a solution to the specific problem in your document:
EDIT: solution adapted to support multiple CONSUMER_PROFILE2 elements. You just need to get and parse a NodeList instead of one Node.
import java.io.*;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;
import org.xml.sax.*;
public class XpathDemo
{
public static void main(String[] args)
{
try {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document xmlDoc = builder.parse(new InputSource(new FileReader("C://Temp/xx.xml")));
// Selects all CONSUMER_PROFILE2 elements no matter where they are in the document
String cp2_nodes = "//CONSUMER_PROFILE2";
// Selects every DATE_OF_BIRTH below the current element that is the first DATE_OF_BIRTH child of its parent
String dob_nodes = "//DATE_OF_BIRTH[1]";
// Selects text child node of current element
String text_node = "/child::text()";
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList dob_list = (NodeList)xPath.compile(cp2_nodes + dob_nodes + text_node)
.evaluate(xmlDoc, XPathConstants.NODESET);
for (int i = 0; i < dob_list.getLength() ; i++) {
Node dob_node = dob_list.item(i);
String dob_text = dob_node.getNodeValue();
System.out.println(dob_text);
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
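For comparison, here is a DOM-only sketch. In the response above, DATE_OF_BIRTH is a child element of CONSUMER_DETAILS2 rather than an attribute, so it is looked up with getElementsByTagName instead of getAttributes(). The class name ChildText and the helpers childText and parse are hypothetical, and the XML string is a copy of the response from the question with the whitespace removed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class (illustration only).
final class ChildText {

    // Returns the text of the first descendant element named tag, or "" if there is none.
    static String childText(Element parent, String tag) {
        NodeList matches = parent.getElementsByTagName(tag);
        return matches.getLength() > 0 ? matches.item(0).getTextContent() : "";
    }

    static Document parse(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // The InputSource has to be handed to parse(); creating it alone builds no Document.
        return db.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String responseXML = "<DATAPACKET REQUEST-ID=\"1\"><HEADER></HEADER><BODY><CONSUMER_PROFILE2>"
                + "<CONSUMER_DETAILS2><NAME>David</NAME>"
                + "<DATE_OF_BIRTH>1949-01-01T00:00:00+03:00</DATE_OF_BIRTH><GENDER>001</GENDER>"
                + "</CONSUMER_DETAILS2></CONSUMER_PROFILE2></BODY></DATAPACKET>";
        Document doc = parse(responseXML);
        NodeList details = doc.getElementsByTagName("CONSUMER_DETAILS2");
        for (int i = 0; i < details.getLength(); i++) {
            System.out.println("DOB:" + childText((Element) details.item(i), "DATE_OF_BIRTH"));
        }
    }
}
```

The loop over the NodeList also handles the case of several CONSUMER_DETAILS2 elements without any change.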

Java Dom parser reports wrong number of child nodes

I have the following xml file:
<?xml version="1.0" encoding="UTF-8"?>
<users>
<user id="0" firstname="John"/>
</users>
Then I'm trying to parse it with Java, but getChildNodes() reports the wrong number of child nodes.
Java code:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(this.file);
document.getDocumentElement().normalize();
Element root = document.getDocumentElement();
NodeList nodes = root.getChildNodes();
System.out.println(nodes.getLength());
Result: 3
Also I'm getting NPEs for accessing the nodes attributes, so I'm guessing something's going horribly wrong.

The child nodes consist of elements and text nodes for whitespace. You will want to check the node type before processing the attributes. You may also want to consider using the javax.xml.xpath APIs available in the JDK/JRE starting with Java SE 5.
Example 1
This example demonstrates how to issue an XPath statement against a DOM.
package forum11649396;
import java.io.StringReader;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;
import org.xml.sax.InputSource;
public class Demo {
public static void main(String[] args) throws Exception {
String xml = "<?xml version='1.0' encoding='UTF-8'?><users><user id='0' firstname='John'/></users>";
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.parse(new InputSource(new StringReader(xml)));
XPathFactory xpf = XPathFactory.newInstance();
XPath xpath = xpf.newXPath();
Element userElement = (Element) xpath.evaluate("/users/user", document, XPathConstants.NODE);
System.out.println(userElement.getAttribute("id"));
System.out.println(userElement.getAttribute("firstname"));
}
}
Example 2
The following example demonstrates how to issue an XPath statement against an InputSource to get a DOM node. This saves you from having to parse the XML into a DOM yourself.
package forum11649396;
import java.io.StringReader;
import javax.xml.xpath.*;
import org.w3c.dom.*;
import org.xml.sax.InputSource;
public class Demo {
public static void main(String[] args) throws Exception {
String xml = "<?xml version='1.0' encoding='UTF-8'?><users><user id='0' firstname='John'/></users>";
XPathFactory xpf = XPathFactory.newInstance();
XPath xpath = xpf.newXPath();
InputSource inputSource = new InputSource(new StringReader(xml));
Element userElement = (Element) xpath.evaluate("/users/user", inputSource, XPathConstants.NODE);
System.out.println(userElement.getAttribute("id"));
System.out.println(userElement.getAttribute("firstname"));
}
}

There are three child nodes:
a text node containing a line break
an element node (tagged user)
a text node containing a line break
So when processing the child nodes, check for element nodes.
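A minimal sketch of that check, assuming the same users document as in the question: the hypothetical helper elementChildren (in a hypothetical class ElementChildren) keeps only the element nodes, so the whitespace text nodes never reach the attribute-reading code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class (illustration only).
final class ElementChildren {

    // Collects only the element children, skipping whitespace text nodes and comments.
    static List<Element> elementChildren(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<users>\n<user id=\"0\" firstname=\"John\"/>\n</users>";
        Element root = parse(xml).getDocumentElement();
        // Counts every child, including the two whitespace text nodes.
        System.out.println(root.getChildNodes().getLength());
        // Only the <user> element survives the filter.
        for (Element user : elementChildren(root)) {
            System.out.println(user.getAttribute("id") + " " + user.getAttribute("firstname"));
        }
    }
}
```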

You have to account for the '\n' between the nodes, which counts as a text node. You can test for that using if(root.getNodeType() == Node.ELEMENT_NODE)
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(this.file);
document.getDocumentElement().normalize();
for(Node root = document.getFirstChild(); root != null; root = root.getNextSibling()) {
if(root.getNodeType() == Node.ELEMENT_NODE) {
NodeList nodes = root.getChildNodes();
System.out.println(root.getNodeName() + " has "+nodes.getLength()+" children");
for(int i=0; i<nodes.getLength(); i++) {
Node n = nodes.item(i);
System.out.println("\t"+n.getNodeName());
}
}
}

I didn't notice any of the answers addressing your last note about NPEs when trying to access attributes.
"Also I'm getting NPEs for accessing the nodes attributes, so I'm guessing something's going horribly wrong."
Since I've seen the following suggestion on a few sites, I assume it's a common way to access attributes:
String myPropValue = node.getAttributes().getNamedItem("myProp").getNodeValue();
which works fine if the node is an element that always has a myProp attribute. For a node that is not an element (such as a whitespace text node), getAttributes will return null. Also, if the element has attributes but no myProp attribute, getNamedItem will return null.
I'm currently using
public static String getStrAttr(Node node, String key) {
if (node.hasAttributes()) {
Node item = node.getAttributes().getNamedItem(key);
if (item != null) {
return item.getNodeValue();
}
}
return null;
}
public static int getIntAttr(Node node, String key) {
if (node.hasAttributes()) {
Node item = node.getAttributes().getNamedItem(key);
if (item != null) {
return Integer.parseInt(item.getNodeValue());
}
}
return -1;
}
in a utility class, but your mileage may vary.
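An alternative sketch along the same lines uses the Element interface directly: Element.hasAttribute reports whether the attribute is present, and Element.getAttribute returns its value. The class name AttrUtil and the method attrOrDefault are hypothetical, and the caller chooses the fallback value.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical utility class (illustration only).
final class AttrUtil {

    // Returns the attribute value, or fallback when the node is not an element
    // or the attribute is absent.
    static String attrOrDefault(Node node, String key, String fallback) {
        if (node instanceof Element) {
            Element element = (Element) node;
            if (element.hasAttribute(key)) {
                return element.getAttribute(key);
            }
        }
        return fallback;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<users>\n<user id=\"0\" firstname=\"John\"/>\n</users>";
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        Node whitespace = root.getFirstChild();      // text node holding the line break
        Node user = whitespace.getNextSibling();     // the <user> element
        System.out.println(attrOrDefault(user, "id", "?"));        // attribute present
        System.out.println(attrOrDefault(user, "lastname", "?"));  // attribute missing
        System.out.println(attrOrDefault(whitespace, "id", "?"));  // not an element
    }
}
```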

Building an XML Document: I'm doing it wrong

I'm trying to build an XML representation of some data. I've followed other examples, but I can't get it working. I've commented code down to this basic bit, and still nothing. This code compiles and runs OK, but the resulting output is empty. A call to dDoc.getDocumentElement() returns null. What am I doing wrong?
Please help me, Stack Overflow. You're my only hope.
DocumentBuilderFactory dFactory = DocumentBuilderFactory.newInstance();
dFactory.setValidating( false );
DocumentBuilder dBuilder = dFactory.newDocumentBuilder();
Document dDoc = dBuilder.newDocument();
// The root document element.
Element pageDataElement = dDoc.createElement("page-data");
pageDataElement.appendChild(dDoc.createTextNode("Example Text."));
dDoc.appendChild(pageDataElement);
log.debug(dDoc.getTextContent());

The following runs OK. You just need to call dDoc.getDocumentElement().getTextContent() instead of dDoc.getTextContent(), which is defined to return null for a Document node.
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
public class Demo {
public static void main(String[] args) throws Exception {
DocumentBuilderFactory dFactory = DocumentBuilderFactory.newInstance();
dFactory.setValidating( false );
DocumentBuilder dBuilder = dFactory.newDocumentBuilder();
Document dDoc = dBuilder.newDocument();
// The root document element.
Element pageDataElement = dDoc.createElement("page-data");
pageDataElement.appendChild(dDoc.createTextNode("Example Text."));
dDoc.appendChild(pageDataElement);
System.out.println(dDoc.getDocumentElement().getTextContent());
}
}
Will give the output:
Example Text.
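If the goal is to see the whole document as markup rather than only its text, one option is the JDK's identity Transformer. The sketch below is one way to do it under that assumption; the class name SerializeDemo and the helpers toXml and build are hypothetical, and build repeats the construction code from the question.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class (illustration only).
final class SerializeDemo {

    // Serializes a DOM Document to a String using the JDK's identity transformer.
    static String toXml(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    // Builds the same one-element document as in the question.
    static Document build() throws Exception {
        Document dDoc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element pageDataElement = dDoc.createElement("page-data");
        pageDataElement.appendChild(dDoc.createTextNode("Example Text."));
        dDoc.appendChild(pageDataElement);
        return dDoc;
    }

    public static void main(String[] args) throws Exception {
        Document dDoc = build();
        System.out.println(dDoc.getTextContent());   // null for a Document node
        System.out.println(toXml(dDoc));             // the serialized markup
    }
}
```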

You can also use XOM (http://xom.nu/).
XOM has a better API, and it is small and very fast.

How do you traverse and store XML in Blackberry Java app?

I'm having a problem accessing the contents of an XML document.
My goal is this:
Take an XML source and parse it into a fair equivalent of an associative array, then store it as a persistable object.
The XML is pretty simple:
<root>
<element>
<category_id>1</category_id>
<name>Cars</name>
</element>
<element>
<category_id>2</category_id>
<name>Boats</name>
</element>
</root>
Basic java class below. I'm pretty much just calling save(xml) after http response above. Yes, the xml is properly formatted.
import java.io.IOException;
import java.util.Hashtable;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import java.util.Vector;
import net.rim.device.api.system.PersistentObject;
import net.rim.device.api.system.PersistentStore;
import net.rim.device.api.xml.parsers.DocumentBuilder;
import net.rim.device.api.xml.parsers.DocumentBuilderFactory;
public class database{
private static PersistentObject storeVenue;
static final long key = 0x2ba5f8081f7ef332L;
public Hashtable hashtable;
public Vector venue_list;
String _node,_element;
public database()
{
storeVenue = PersistentStore.getPersistentObject(key);
}
public void save(Document xml)
{
venue_list = new Vector();
storeVenue.setContents(venue_list);
Hashtable categories = new Hashtable();
try{
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory. newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
docBuilder.isValidating();
xml.getDocumentElement ().normalize ();
NodeList list=xml.getElementsByTagName("*");
_node=new String();
_element = new String();
for (int i=0;i<list.getLength();i++){
Node value=list.item(i).getChildNodes().item(0);
_node=list.item(i).getNodeName();
_element=value.getNodeValue();
categories.put(_element, _node);
}
}
catch (Exception e){
System.out.println(e.toString());
}
venue_list.addElement(categories);
storeVenue.commit();
}
The code above is the work in progress, and is most likely heavily flawed. However, I have been at this for days now. I can never seem to get all child nodes, or the name / value pair.
When I print out the vector as a string, I usually end up with results like this:
[{ = root, = element}]
and that's it. No "category_id", no "name"
Ideally, I would end up with something like
[{1 = cars, 2 = boats}]
Any help is appreciated.
Thanks

Here's a fixed version of your program. The changes that I made are as follows:
I removed the DocumentBuilder code from the save() method. These calls are needed to construct a new Document. Once you have such an object (and you do, since it is passed in as an argument) you don't need the DocumentBuilder anymore. A proper use of DocumentBuilder is illustrated in the main method, below.
_node and _element need not be fields. They get new values with each pass through the loop inside save, so I made them local variables. In addition, I changed their names to category and name to reflect their association with the elements in the XML document.
There's never a need to create a new String object by using new String(). A simple "" is enough (see the initialization of the category and name variables).
Instead of looping over everything (via "*"), the loop now iterates over the element elements. Then there is an inner loop that iterates over the children of each element, namely its category_id and name elements.
In each pass of the inner loop we set either the category or the name variable, depending on the name of the node at hand.
The actual value assigned to these variables is obtained via node.getTextContent(), which returns the text between the node's enclosing tags.
class database:
public class database {
private static PersistentObject storeVenue;
static final long key = 0x2ba5f8081f7ef332L;
public Hashtable hashtable;
public Vector venue_list;
public database() {
storeVenue = PersistentStore.getPersistentObject(key);
}
public void save(Document xml) {
venue_list = new Vector();
storeVenue.setContents(venue_list);
Hashtable categories = new Hashtable();
try {
xml.getDocumentElement().normalize();
NodeList list = xml.getElementsByTagName("element");
for (int i = 0; i < list.getLength(); i++) {
String category = "";
String name = "";
NodeList children = list.item(i).getChildNodes();
for(int j = 0; j < children.getLength(); ++j)
{
Node n = children.item(j);
if("category_id".equals(n.getNodeName()))
category = n.getTextContent();
else if("name".equals(n.getNodeName()))
name = n.getTextContent();
}
categories.put(category, name);
System.out.println("category=" + category + "; name=" + name);
}
} catch (Exception e) {
System.out.println(e.toString());
}
venue_list.addElement(categories);
storeVenue.commit();
}
}
Here's a main method:
public static void main(String[] args) throws Exception {
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory
.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
docBuilder.isValidating();
Document xml = docBuilder.parse(new File("input.xml"));
database db = new database();
db.save(xml);
}

Thank you so much. With only slight modification I was able to do exactly what I was looking for.
Here are the modifications I had to do:
Even though I am building in 1.5, getTextContent was not available. I had to use category = n.getFirstChild().getNodeValue(); to obtain the value of each node. Though there may have been a simple solution like updating my build settings, I am not familiar enough with BB requirements to know when it is safe to stray from the default recommended build settings.
In the main, I had to alter this line:
Document xml = docBuilder.parse(new File("input.xml"));
so that it was reading from an InputStream delivered from my web server, and not necessarily a local file, even though I wonder if storing the XML locally would be more efficient than storing a vector full of hash tables.
...
InputStream responseData = connection.openInputStream();
Document xmlParsed = docBuilder.parse(responseData);
Obviously I skipped over the HTTP connection portion for the sake of keeping this readable.
Your help has saved me a full weekend of blind debugging. Thank you very much! Hopefully this post will help someone else as well.
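The getFirstChild().getNodeValue() workaround throws a NullPointerException when an element is empty, because getFirstChild() then returns null. A hedged sketch of a null-safe replacement that avoids getTextContent() is shown below. It is written against standard Java SE DOM interfaces and has not been tried on a BlackBerry device; the class name TextUtil and the method directText are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical utility class (illustration only).
final class TextUtil {

    // Concatenates the direct text children of a node without calling getTextContent().
    // Returns "" for an empty element instead of throwing.
    static String directText(Node node) {
        StringBuilder text = new StringBuilder();
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.TEXT_NODE
                    || child.getNodeType() == Node.CDATA_SECTION_NODE) {
                text.append(child.getNodeValue());
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // The <note/> element is an assumption added here to show the empty case.
        String xml = "<element><category_id>1</category_id><name>Cars</name><note/></element>";
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            System.out.println(n.getNodeName() + "=" + directText(n));
        }
    }
}
```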

//res/xml/input.xml
private static String _xmlFileName = "/xml/input.xml";
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
InputStream inputStream = getClass().getResourceAsStream( _xmlFileName );
Document document = builder.parse( inputStream );

XML Parsing from Java Rest Service Response

I am getting the XML response below from a Java REST service. Could anyone tell me how to get the status tag information?
<operation name="EDIT_REQUEST">
<result>
<status>Failed</status>
<message>Error when editing request details - Request Closing Rule Violation. Request cannot be Resolved</message>
</result>
</operation>

XPath is a very robust and intuitive way to quickly query XML documents. You can get the value of the status tag with the following steps.
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setNamespaceAware(true);
DocumentBuilder builder = documentBuilderFactory.newDocumentBuilder();
Document doc = builder.parse(stream); //input stream of response.
XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xpath = xPathFactory.newXPath();
XPathExpression expr = xpath.compile("//status"); // Look for status tag value.
String status = expr.evaluate(doc);
System.out.println(status);
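The same steps as a self-contained sketch. The class name StatusReader and the method readStatus are hypothetical, the response string is a shortened copy of the one in the question, and an absolute path is used instead of "//status" as a design choice so that only the status under operation/result is matched.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical demo class (illustration only).
final class StatusReader {

    // Parses the stream and returns the string value of /operation/result/status.
    // XPath.evaluate(String, Object) returns "" when nothing matches.
    static String readStatus(InputStream stream) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(stream);
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate("/operation/result/status", doc);
    }

    public static void main(String[] args) throws Exception {
        // Shortened copy of the response from the question.
        String response = "<operation name=\"EDIT_REQUEST\">\n<result>\n"
                + "<status>Failed</status>\n"
                + "<message>Request cannot be Resolved</message>\n"
                + "</result>\n</operation>";
        InputStream stream = new ByteArrayInputStream(response.getBytes(StandardCharsets.UTF_8));
        System.out.println(readStatus(stream));
    }
}
```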

Here is some rough code:
ByteArrayInputStream input = new ByteArrayInputStream(
response.getBytes("UTF-8"));
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(input);
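// first get the root tag, then the result tag, and then the children of the result tag
// (looked up by name, because item(0) of the root's children is a whitespace text node in the formatted response)
NodeList childNodes = doc.getDocumentElement().getElementsByTagName("result").item(0).getChildNodes();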
for(int i = 0 ; i < childNodes.getLength(); i++){
if("status".equals(childNodes.item(i).getNodeName())){
System.out.println(childNodes.item(i).getTextContent());
}
}

If you are just concerned with the status text, how about just having a simple regex with a group?
e.g.
String responseXml = "xml response from service....";
Pattern p = Pattern.compile("<status>(.+?)</status>");
Matcher m = p.matcher(responseXml);
if(m.find())
{
System.out.println(m.group(1)); //Should print 'Failed'
}

One option is to use the jOOX library, imported using Gradle, like:
compile 'org.jooq:joox:1.3.0'
And use it like:
import org.xml.sax.SAXException;
import java.io.File;
import java.io.IOException;
import static org.joox.JOOX.$;
public class Main {
public static void main(String[] args) throws IOException, SAXException {
System.out.println(
$(new File(args[0])).xpath("/operation/result/status").text()
);
}
}
It yields:
Failed
