XML Parsing from Java Rest Service Response - java

I am getting the below XML response from Java Rest Service.Could any one how to get status tag information?
<operation name="EDIT_REQUEST">
<result>
<status>Failed</status>
<message>Error when editing request details - Request Closing Rule Violation. Request cannot be Resolved</message>
</result>
</operation>

XPath is a very robust and intuitive way to quickly query XML documents. You can reach value of status tag by following steps.
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setNamespaceAware(true);
DocumentBuilder builder = documentBuilderFactory.newDocumentBuilder();
Document doc = builder.parse(stream); //input stream of response.
XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xpath = xPathFactory.newXPath();
XPathExpression expr = xpath.compile("//status"); // Look for status tag value.
String status = expr.evaluate(doc);
System.out.println(status);

A rough code is here
ByteArrayInputStream input = new ByteArrayInputStream(
response.getBytes("UTF-8"));
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(input);
// first get root tag and than result tag and childs of result tag
NodeList childNodes = doc.getDocumentElement().getChildNodes().item(0).getChildNodes();
for(int i = 0 ; i < childNodes.getLength(); i++){
if("status".equals(childNodes.item(i).getNodeName())){
System.out.println(childNodes.item(i).getTextContent());
}
}

If you are just concerned with the status text, how about just having a simple regex with a group?
e.g.
String responseXml = 'xml response from service....'
Pattern p = Pattern.compile("<status>(.+?)</status>");
Matcher m = p.matcher(responseXml);
if(m.find())
{
System.out.println(m.group(1)); //Should print 'Failed'
}

One option is to use the joox library, imported using gradle, like:
compile 'org.jooq:joox:1.3.0'
And use it like:
import org.xml.sax.SAXException;
import java.io.File;
import java.io.IOException;
import static org.joox.JOOX.$;
public class Main {
public static void main(String[] args) throws IOException, SAXException {
System.out.println(
$(new File(args[0])).xpath("/operation/result/status").text()
);
}
}
It yields:
Failed

Related

How to get XML comments using Java's DocumentBuilder

I have an application that uses SAML authentication, acts as an SP, and therefore parses SAMLResponses. I received notification that an IdP that communicates with my application will now start signing their SAMLResponses with http://www.w3.org/2001/10/xml-exc-c14n#WithComments, which means comments matter when calculating the validity of the SAML signature.
Here lies the problem - the library I use for XML parsing strips these comment nodes by default. See this example program:
import org.apache.commons.io.IOUtils;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
public class Main {
public static void main(String[] args) {
try {
String xml = "<NameID>test#email<!---->.com</NameID>";
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setNamespaceAware(true);
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
Document doc = documentBuilder.parse(IOUtils.toInputStream(xml));
NodeList nodes = doc.getElementsByTagName("NameID");
if (nodes == null || nodes.getLength() == 0)
{
throw new RuntimeException("No NameID in document");
}
System.out.println(nodes.item(0).getTextContent());
} catch(Exception e) {
System.err.println(e.getMessage());
}
}
}
So, this program will print test#email.com (which means that's what my SAML code will get, too). This is a problem, as I'm pretty sure it will cause signature validation to fail without the comment included, since the XML document was signed with the #WithComments canonicalization method.
Is there any way to get DocumentBuilder/getTextContent() to leave in comment nodes so my signature is not invalidated by the missing comment?
Documentation for getTextContent() is here: https://docs.oracle.com/javase/7/docs/api/org/w3c/dom/Node.html#getTextContent()
Your code actually retains the comment.
Here, slightly modified:
public static void main(String[] args) throws Exception {
String xml = "<NameID>test#email<!--foobar-->.com</NameID>";
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setNamespaceAware(true);
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
Document doc = documentBuilder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
NodeList childNodes = doc.getDocumentElement().getChildNodes();
Node[] nodes = new Node[childNodes.getLength()];
for (int index = 0; index < childNodes.getLength(); index++) {
nodes[index] = childNodes.item(index);
}
System.out.println(nodes[1].getTextContent());
}
Prints foobar. (Run it on Ideone.)
There are 3 child nodes of the root element, one of the is the comment node. So it is actually retained.

Not able to parse inner elements of XML using DocumentBuilderFactory in Java

I'm having a response as XML. I'm trying to parse the XML object to get inner details. Im using DocumentBuilderFactory for this. The parent object is not null, but when I try to get the deepnode list elements, its returning null. Am I missing anything
Here is my response XML
ResponseXML
<DATAPACKET REQUEST-ID = "1">
<HEADER>
</HEADER>
<BODY>
<CONSUMER_PROFILE2>
<CONSUMER_DETAILS2>
<NAME>David</NAME>
<DATE_OF_BIRTH>1949-01-01T00:00:00+03:00</DATE_OF_BIRTH>
<GENDER>001</GENDER>
</CONSUMER_DETAILS2>
</CONSUMER_PROFILE2></BODY></DATAPACKET>
and Im parsing in the following way
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(responseXML));
// Consumer details.
if(doc.getDocumentElement().getElementsByTagName("CONSUMER_DETAILS2") != null) {
Node consumerDetailsNode = doc.getDocumentElement().getElementsByTagName("CONSUMER_DETAILS2").item(0); -->This is coming as null
dateOfBirth = getNamedItem(consumerDetailsNode, "DATE_OF_BIRTH");
System.out.println("DOB:"+dateOfBirth);
}
getNamedItem
private static String getNamedItem(Node searchResultNode, String param) {
return searchResultNode.getAttributes().getNamedItem(param) != null ? searchResultNode.getAttributes().getNamedItem(param).getNodeValue() : "";
}
Any ideas would be greatly appreciated.
The easiest way to search for individual elements within an XML document is with XPAth. It provides search syntax similar to file system notation.
Here is a solution to the specific problem of you document:
EDIT: solution adopted to support multiple CONSUMER_PROFILE2 elements. You just need to get and parse NodeList instread of one Node
import java.io.*;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;
import org.xml.sax.*;
public class XpathDemo
{
public static void main(String[] args)
{
try {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document xmlDoc = builder.parse(new InputSource(new FileReader("C://Temp/xx.xml")));
// Selects all CONSUMER_PROFILE2 elements no matter where they are in the document
String cp2_nodes = "//CONSUMER_PROFILE2";
// Selects first DATE_OF_BIRTH element somewhere under current element
String dob_nodes = "//DATE_OF_BIRTH[1]";
// Selects text child node of current element
String text_node = "/child::text()";
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList dob_list = (NodeList)xPath.compile(cp2_nodes + dob_nodes + text_node)
.evaluate(xmlDoc, XPathConstants.NODESET);
for (int i = 0; i < dob_list.getLength() ; i++) {
Node dob_node = dob_list.item(i);
String dob_text = dob_node.getNodeValue();
System.out.println(dob_text);
}
} catch (Exception e) {
e.printStackTrace();
}
}
}

xml and percent symbols

I'm using XMLStreamReader to parse a piece of xml:
XMLStreamReader rd = XMLInputFactory.newInstance().createXMLStreamReader(io_xml, "UTF-8");
...
if (eventType == XMLStreamConstants.START_ELEMENT) {
String name = rd.getLocalName();
if (name.equals("key")) {
String val = rd.getElementText();
}
}
Problem is, I'm getting a bad read for the following string: "<key>cami%C3%B5es%2Babc</key>"
org.junit.ComparisonFailure:
expected:<cami[%C3%B5es%]2Babc> but was:<cami[ C3 B5es ]2Babc>
Do I neeed to do anything special within the XML? I already tried to put everything within a CDATA section but I get the same error.
Using a "regular" parser everything works:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(xml));
Document parse = builder.parse(is);
String value = parse.getFirstChild().getTextContent();
...
I figured it out. The problem was in a different section of the code. A setter that didn't just set.

Java Dom parser reports wrong number of child nodes

I have the following xml file:
<?xml version="1.0" encoding="UTF-8"?>
<users>
<user id="0" firstname="John"/>
</users>
Then I'm trying to parse it with java, but getchildnodes reports wrong number of child nodes.
Java code:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(this.file);
document.getDocumentElement().normalize();
Element root = document.getDocumentElement();
NodeList nodes = root.getChildNodes();
System.out.println(nodes.getLength());
Result: 3
Also I'm getting NPEs for accessing the nodes attributes, so I'm guessing something's going horribly wrong.
The child nodes consist of elements and text nodes for whitespace. You will want to check the node type before processing the attributes. You may also want to consider using the javax.xml.xpath APIs available in the JDK/JRE starting with Java SE 5.
Example 1
This example demonstrates how to issue an XPath statement against a DOM.
package forum11649396;
import java.io.StringReader;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;
import org.xml.sax.InputSource;
public class Demo {
public static void main(String[] args) throws Exception {
String xml = "<?xml version='1.0' encoding='UTF-8'?><users><user id='0' firstname='John'/></users>";
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.parse(new InputSource(new StringReader(xml)));
XPathFactory xpf = XPathFactory.newInstance();
XPath xpath = xpf.newXPath();
Element userElement = (Element) xpath.evaluate("/users/user", document, XPathConstants.NODE);
System.out.println(userElement.getAttribute("id"));
System.out.println(userElement.getAttribute("firstname"));
}
}
Example 2
The following example demonstrates how to issue an XPath statement against an InputSource to get a DOM node. This saves you from having to parse the XML into a DOM yourself.
package forum11649396;
import java.io.StringReader;
import javax.xml.xpath.*;
import org.w3c.dom.*;
import org.xml.sax.InputSource;
public class Demo {
public static void main(String[] args) throws Exception {
String xml = "<?xml version='1.0' encoding='UTF-8'?><users><user id='0' firstname='John'/></users>";
XPathFactory xpf = XPathFactory.newInstance();
XPath xpath = xpf.newXPath();
InputSource inputSource = new InputSource(new StringReader(xml));
Element userElement = (Element) xpath.evaluate("/users/user", inputSource, XPathConstants.NODE);
System.out.println(userElement.getAttribute("id"));
System.out.println(userElement.getAttribute("firstname"));
}
}
There are three child nodes:
a text node containing a line break
an element node (tagged user)
a text node containing a line break
So when processing the child nodes, check for element nodes.
You have to make sure you account for the '\n' between the nodes, which count for text nodes. You can test for that using if(root.getNodeType() == Node.ELEMENT_NODE)
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(this.file);
document.getDocumentElement().normalize();
for(Node root = document.getFirstChild(); root != null; root = root.getNextSibling()) {
if(root.getNodeType() == Node.ELEMENT_NODE) {
NodeList nodes = root.getChildNodes();
System.out.println(root.getNodeName() + " has "+nodes.getLength()+" children");
for(int i=0; i<nodes.getLength(); i++) {
Node n = nodes.item(i);
System.out.println("\t"+n.getNodeName());
}
}
}
I didn't notice any of the answers addressing your last note about NPEs when trying to access attributes.
Also I'm getting NPEs for accessing the nodes attributes, so I'm guessing something's going horribly wrong.
Since I've seen the following suggestion on a few sites, I assume it's a common way to access attributes:
String myPropValue = node.getAttributes().getNamedItem("myProp").getNodeValue();
which works fine if the nodes always contain a myProp attribute, but if it has no attributes, getAttributes will return null. Also, if there are attributes, but no myProp attribute, getNamedItem will return null.
I'm currently using
public static String getStrAttr(Node node, String key) {
if (node.hasAttributes()) {
Node item = node.getAttributes().getNamedItem(key);
if (item != null) {
return item.getNodeValue();
}
}
return null;
}
public static int getIntAttr(Node node, String key) {
if (node.hasAttributes()) {
Node item = node.getAttributes().getNamedItem(key);
if (item != null) {
return Integer.parseInt(item.getNodeValue());
}
}
return -1;
}
in a utility class, but your mileage may vary.

How to easily parse HTML for consumption as a service using Java?

I want to parse an HTML such as http://www.reddit.com/r/reddit.com/search?q=Microsoft&sort=top
and only want extract the text of the element which has <a class="title"
The options I have looked so far all look like overkill (SAX, DOM traversal).
Use Jsoup. It supports jQuery-like CSS selectors. Here's a kickoff example:
String url = "http://www.reddit.com/r/reddit.com/search?q=Microsoft&sort=top";
Document document = Jsoup.connect(url).get();
for (Element link : document.select("a.title")) {
System.out.println(link.absUrl("href"));
}
Result:
http://news.cnet.com/8301-13579_3-10288022-37.html
http://dl.getdropbox.com/u/18264/mspoland.jpg
http://www.reddit.com/r/reddit.com/comments/ar5z1/verizon_stealthily_installed_a_bing_search_app_on/
http://www.grabup.com/uploads/240ccede5360b093dbf298f8946025a5.png
http://www.youtube.com/watch?v=7Ym0tZSWGMc&fmt=34
http://i42.tinypic.com/wv5qar.jpg
http://www.reddit.com/r/technology/comments/8hnya/apple_no_i_dont_want_to_make_quicktime_my_default/
http://cssferret.imgur.com/microsoft_wtf
http://imgur.com/8pct5.png
http://googleblog.blogspot.com/2011/02/microsofts-bing-uses-google-search.html
http://news.cnet.com/8301-27076_3-20011994-248.html?part=rss&subj=news&tag=2547-1_3-0-20
http://gizmodo.com/5383413/shady-microsoft-plugin-pokes-critical-hole-in-firefox-security
http://i.stack.imgur.com/sl1LY.png
http://imgur.com/T6BMs
http://www.nytimes.com/2010/09/14/world/europe/14raid.html
http://twitter.com/phil_nash/status/21159419598
http://online.wsj.com/article/SB10001424052748704415104576065641376054226.html?mod=WSJASIA_hpp_MIDDLESecondNews
http://www.reddit.com/r/reddit.com/comments/bqqxv/inside_the_chinese_factory_that_makes_microsofts/
http://i.min.us/iX0PA.png
http://imgur.com/m4nuz.gif
http://www.gamesforwindows.com/en-CA/Games/AgeofEmpiresIII/
http://foredecker.wordpress.com/2011/02/27/working-at-microsoft-day-to-day-coding/
http://homepage.mac.com/aleksivic/.Pictures/humor/spotTheBusey.jpg
http://www.bloomberg.com/apps/news?pid=20601087&sid=a7uOT0ro100U&refer=home
http://www.microsoft.com/windowsxp/eula/pro.mspx
Pretty concise, huh?
See also:
Pros and cons of leading HTML parsers in Java
Just an observation: Reddit generates XHTML, which means it's XML compliant. So you can just use an XPath library. e.g. (shamelessly copied from http://www.ibm.com/developerworks/library/x-javaxpathapi.html with minor modifications),
import java.io.IOException;
import org.w3c.dom.*;
import org.xml.sax.SAXException;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
public class XPathExample {
public static void main(String[] args)
throws ParserConfigurationException, SAXException,
IOException, XPathExpressionException {
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true); // never forget this!
DocumentBuilder builder = domFactory.newDocumentBuilder();
// replace the following line with code to retrieve and parse the URL of your choice
Document doc = builder.parse("books.xml");
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr
= xpath.compile("//a[class='title']/text()");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getNodeValue());
}
}
}
Obviously won't work on all websites, but will work for any that serve XHTML.

Categories