Loop through XML String using XPath - Java

I have a function that should loop through an XML string and pull out the values of certain tags.
My xml looks like this:
<Report_Data>
<Report_Entry>
<Company>Test</Company>
<Name>Test Name</Name>
<Division>Test Division</Division>
</Report_Entry>
<Report_Entry>
<Company>Test 2</Company>
<Name>Test Name 2</Name>
<Division>Test Division 2</Division>
</Report_Entry>
<Report_Entry>
<Company>Test 3</Company>
<Name>Test Name 3</Name>
<Division>Test Division 3</Division>
</Report_Entry>
</Report_Data>
Here is my code to loop through:
String comp, name, div, nodeName, NodeValue;
Node node;
try
{
XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xpath = xpathFactory.newXPath();
InputSource source = new InputSource(new StringReader(coaFULL));
Document doc2 = (Document) xpath.evaluate("/", source, XPathConstants.NODE);
NodeList nodeList = (NodeList) xpath.compile("/Report_Data/Report_Entry").evaluate(doc2, XPathConstants.NODESET);
System.out.println("NODE LIST LENGTH =" + nodeList.getLength());
String nodeName, nodeValue = "";
Node node;
for(int i = 0; i < nodeList.getLength(); i++)
{
node = nodeList.item(i);
node = nodeList.item(i).getFirstChild();
nodeName = node.getNodeName();
nodeValue = node.getChildNodes().item( 0 ).getNodeValue();
if(nodeName.equals("Company"))
{
comp = nodeValue;
}
else if( nodeName.equals("Name"))
{
name = nodeValue;
}
else if(nodeName.equals("Division"))
{
div = nodeValue;
}
System.out.println("COMPANY = " + comp);
System.out.println("NAME = " + name);
System.out.println("DIVISION = " + div);
}
When I run my code, only the first value (company) gets an actual value, everything else is blank. I also tried adding node = nodeList.item(i).getNextSibling(); inside of each if statement to grab the next node, but that did not work.
My nodeList does have items in it, over 1000. Is there a problem with this statement: NodeList nodeList = (NodeList) xpath.compile("/Report_Data/Report_Entry").evaluate(doc2, XPathConstants.NODESET);?
Should it be: NodeList nodeList = (NodeList) xpath.compile("/Report_Data/Report_Entry/*").evaluate(doc2, XPathConstants.NODESET);
I tried it with the /* at the end but that caused the nodeList to have every single node in it. I want to make sure that when I grab a Report_Entry node, that I set the string variables to the correct values that correspond to each other.
==========================================================
Solution: It's ugly but my solution was to just go with one loop and use the second list of children nodes with hard coded values:
for (int i = 0; i < nodeList.getLength(); i++)
{
    node = nodeList.item(i);
    tempList = node.getChildNodes();
    System.out.println("TEMP LIST LENGTH =" + tempList.getLength());
    comp = tempList.item(0).getTextContent();
    name = tempList.item(1).getTextContent();
    div = tempList.item(2).getTextContent();
}
Thanks to @hage for his help.

Maybe it's because your node is only the first child?
node = nodeList.item(i);
node = nodeList.item(i).getFirstChild();
I guess nodeList.item(i) will give you the Report_Entry elements, and the first child of each one is the Company.
You will need to loop over all children of the Report_Entry element, not just the first one.
EDIT (regarding your edit):
tempList.item(x) is the Company, then the Name, and then the Division. When you get the first child of one of these, you are at its text node (the actual content). Because you ask for the name of that text node, you get #text as the output.
To get name and value of the nodes, try this (untested)
nodeName = tempList.item(x).getNodeName();
nodeValue = tempList.item(x).getTextContent();
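The suggestion above can be sketched as a complete program. This is only a minimal sketch under stated assumptions: the class name ReportEntryLoop, the method readEntries, and the inline sample XML are made up for illustration and are not the asker's actual code. It walks the children of each Report_Entry and skips everything that is not an element. Whitespace between tags shows up as #text nodes, so hard-coded indexes such as item(0), item(1), item(2) only line up when the XML contains no whitespace between elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for illustration only.
class ReportEntryLoop {

    // Returns one map per Report_Entry, keyed by child element name.
    static List<Map<String, String>> readEntries(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList entries = (NodeList) xpath.evaluate(
                    "/Report_Data/Report_Entry",
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);

            List<Map<String, String>> result = new ArrayList<>();
            for (int i = 0; i < entries.getLength(); i++) {
                Map<String, String> fields = new LinkedHashMap<>();
                NodeList children = entries.item(i).getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    // Skip #text nodes (whitespace between the elements).
                    if (child.getNodeType() == Node.ELEMENT_NODE) {
                        fields.put(child.getNodeName(), child.getTextContent());
                    }
                }
                result.add(fields);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read the report XML", e);
        }
    }

    public static void main(String[] args) {
        // Sample data in the shape shown in the question, with indentation.
        String xml = "<Report_Data>\n"
                + "  <Report_Entry>\n"
                + "    <Company>Test</Company>\n"
                + "    <Name>Test Name</Name>\n"
                + "    <Division>Test Division</Division>\n"
                + "  </Report_Entry>\n"
                + "</Report_Data>";
        for (Map<String, String> entry : readEntries(xml)) {
            System.out.println("COMPANY = " + entry.get("Company"));
            System.out.println("NAME = " + entry.get("Name"));
            System.out.println("DIVISION = " + entry.get("Division"));
        }
    }
}
```

Keying the values by element name keeps Company, Name, and Division of one entry together even if the order of the child elements changes.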

Related

Why is XPath returning nodes from my entire document, and not just the select node [duplicate]

This question already has answers here:
What is the difference between .// and //* in XPath?
(4 answers)
Closed 5 years ago.
Given:
public class XPathTest {
public static void main(String args[]) throws Exception {
String xmlString
= "<a>"
+ "<b c=\"1\"/>"
+ "<b c=\"2\"/>"
+ "</a>";
ByteArrayInputStream bis = new ByteArrayInputStream(xmlString.getBytes());
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse(new InputSource(bis));
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("//b");
NodeList nl = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
dumpNodeList(nl);
for (int i = 0; i < nl.getLength(); i++) {
Node n = nl.item(i);
NodeList nl2 = (NodeList) expr.evaluate(n, XPathConstants.NODESET);
dumpNodeList(nl2);
}
}
public static void dumpNodeList(NodeList nl) {
System.out.println("NodeList length = " + nl.getLength());
for (int i = 0; i < nl.getLength(); i++) {
System.out.println("Node #" + i);
Element e = (Element) nl.item(i);
System.out.println("Name = " + e.getTagName());
System.out.println("Attr = " + e.getAttribute("c"));
}
System.out.println();
}
}
Sample result:
NodeList length = 2
Node #0
Name = b
Attr = 1
Node #1
Name = b
Attr = 2
NodeList length = 2
Node #0
Name = b
Attr = 1
Node #1
Name = b
Attr = 2
NodeList length = 2
Node #0
Name = b
Attr = 1
Node #1
Name = b
Attr = 2
I evaluate the XPath expression //b, passing in the root document node to the evaluator. It, expectedly, returns the two b nodes.
What I'm confused about is what happens when I evaluate the same expression but, instead of passing in the root node as the parameter, I pass in one of the child nodes I located earlier. The documentation for XPathExpression.evaluate(Object item) says: "Evaluate the compiled XPath expression in the specified context and return the result as the specified type."
The expression, //b, means "give me all of the b nodes".
Intuitively, I would think that if I pass a Node to XPathExpression.evaluate(Object item), the expression would be evaluated with that node as its root for the "give me all" part, rather than the entire document. So I would expect a resulting NodeList of one node, not two.
But instead, I get the same two nodes as if from the entire document.
So, the two questions are:
Why is the expression being evaluated relative to the entire document, and not using the passed Node as a synthetic root for the evaluation?
How can I get the expression to evaluate using the passed in Node as the synthetic root for the evaluation?
It works as it should. Instead of //b, try ./b, which selects the b children of the context node. Microsoft's help is not always very helpful, but it does have some handy XPath examples.
An expression starting with /, like //b, starts by navigating upwards from the context node to the root of the containing tree. So your cited documentation is correct that it is "evaluated in the current context", but you somehow misread this as saying that it only looks at the subtree rooted at the context node. If you choose to navigate upwards from the context node, you can, and that is precisely what you have done.
You probably wanted .//b.
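The difference between these expressions can be checked with a small program. This is a minimal sketch that reuses the XML from the question; the class name ContextNodeDemo and the helpers parse and count are hypothetical names chosen for illustration. Note that because the b elements here are empty, .//b evaluated against a b node finds no descendants at all; selecting the context node itself would be . or self::b.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, named for illustration only.
class ContextNodeDemo {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Number of nodes the expression selects when evaluated against context.
    static int count(String expression, Object context) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, context, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<a><b c=\"1\"/><b c=\"2\"/></a>");
        Node firstB = doc.getDocumentElement().getFirstChild();

        // Absolute path: starts at the document root, ignores the subtree.
        System.out.println("//b     -> " + count("//b", firstB));     // 2
        // Relative path: only descendants of the context node (none here).
        System.out.println(".//b    -> " + count(".//b", firstB));    // 0
        // The context node itself, if it is a b element.
        System.out.println("self::b -> " + count("self::b", firstB)); // 1
        // Relative path from the root element finds both b children.
        System.out.println(".//b from a -> "
                + count(".//b", doc.getDocumentElement()));           // 2
    }
}
```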

XPATH evaluation against child node

I have xml as follows,
<students>
<Student><age>23</age><id>2000</id><name>PP2000</name></Student>
<Student><age>23</age><id>1000</id><name>PP1000</name></Student>
</students>
I have two kinds of XPath. The template XPath, students/Student, selects the template nodes. I cannot hard-code this XPath, because it changes for other XML documents; the XML is quite dynamic and can grow, although the base XPaths stay the same. To evaluate a second XPath against each template node, I'm using the following code:
XPath xpathResource = XPathFactory.newInstance().newXPath();
Document xmlDocument = //creating document;
NodeList nodeList = (NodeList)xpathResource.compile("//students/Student").evaluate(xmlDocument, XPathConstants.NODESET);
for (int nodeIndex = 0; nodeIndex < nodeList.getLength(); nodeIndex++) {
Node currentNode = nodeList.item(nodeIndex);
String xpathID = "//students/Student/id";
String xpathName = "//students/Student/name";
NodeList childID = (NodeList)xpathResource.compile(xpathID).evaluate(currentNode, XPathConstants.NODESET);
NodeList childName = (NodeList)xpathResource.compile(xpathName).evaluate(currentNode, XPathConstants.NODESET);
System.out.println("node ID " +childID.item(0).getTextContent());
System.out.println("node Name " +childName.item(0).getTextContent());
}
The problem is that the for loop executes twice, but both times I get 2000 and PP2000 as the ID and name. Is there a way to evaluate a generic XPath against a single node so that it reaches only that node's children? I cannot evaluate the generic XPath against the whole document, because I have some validation to do. I want to treat the NodeList like the rows of a result set, so that I can validate each XML value and process it.
XPath xpathResource = XPathFactory.newInstance().newXPath();
Document xmlDocument = //creating document;
NodeList nodeList = (NodeList)xpathResource.compile("//students/Student/id").evaluate(xmlDocument, XPathConstants.NODESET);
for (int nodeIndex = 0; nodeIndex < nodeList.getLength(); nodeIndex++) {
Node currentNode = nodeList.item(nodeIndex);
System.out.println("node " +currentNode.getTextContent());
}
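One way to get per-row values is to keep the row XPath configurable and make the field XPaths relative (no leading / or //), so that they resolve against each row node rather than the document root. The following is a minimal sketch under that assumption; the class name StudentRows, the method readRows, and its parameters are hypothetical and not part of the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for illustration only.
class StudentRows {

    // rowPath selects the "row" nodes; each fieldPath is evaluated
    // relative to one row, like a column of a result set.
    static List<List<String>> readRows(String xml, String rowPath,
                                       String... fieldPaths) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList rows = (NodeList) xpath.evaluate(
                    rowPath, doc, XPathConstants.NODESET);

            List<List<String>> result = new ArrayList<>();
            for (int i = 0; i < rows.getLength(); i++) {
                Node row = rows.item(i);
                List<String> values = new ArrayList<>();
                for (String fieldPath : fieldPaths) {
                    // The two-argument evaluate returns the string value
                    // of the first node matched from the context node.
                    values.add(xpath.evaluate(fieldPath, row));
                }
                result.add(values);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read rows", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<students>"
                + "<Student><age>23</age><id>2000</id><name>PP2000</name></Student>"
                + "<Student><age>23</age><id>1000</id><name>PP1000</name></Student>"
                + "</students>";
        for (List<String> row : readRows(xml, "/students/Student", "id", "name")) {
            System.out.println("node ID " + row.get(0));
            System.out.println("node Name " + row.get(1));
        }
    }
}
```

In the question's code, //students/Student/id starts from the document root on every iteration, which is why item(0) is always the first student's id.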

Retrieving different child elements xml

I have a xml file that looks like this.
<Device>
<Staff>
<Name>ABC</Name>
<Name>Hello</Name>
</Staff>
<Connect>
<Speed>123</Speed>
<Speed>456</Speed>
</Connect>
</Device>
I need help retrieving the values of Name and Speed, as I have never worked with XML before. I get a null pointer exception whenever I try to retrieve the element values. Any help is appreciated.
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
// Load the input XML document, parse it and return an instance of the
// Document class.
Document document = builder.parse(new File("C:/Users/AA/Desktop/eclipse/lol/testing.xml"));//change to own directory
NodeList nodeList = document.getDocumentElement().getChildNodes();
System.out.println(nodeList.getLength());
for (int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i);
if (node.getNodeType() == Node.ELEMENT_NODE) {
System.out.println(i);
Element elem = (Element) node;
// Get the value of the ID attribute.
// String ID = node.getAttributes().getNamedItem("ID").getNodeValue();
// Get the value of all sub-elements.
String name = elem.getElementsByTagName("Name")
.item(0).getChildNodes().item(0).getNodeValue();
Integer speed = Integer.parseInt(elem.getElementsByTagName("Connect")
.item(0).getChildNodes().item(0).getNodeValue());//null pointer exception happens here
staffList.add(new staff(name));
connectList.add(new connect(speed));
}
}
// Print all employees.
for (staff stl : staffList)
{System.out.println("STAFF "+stl.getName());}
for (connect ctl : connectList)
{System.out.println("Connect "+ctl.getSpeed());}
You will have null pointer exceptions because you're assuming that in every iteration of the for loop, the desired nodes have children elements:
String name = elem.getElementsByTagName("Name")
.item(0).getChildNodes().item(0).getNodeValue();
In the above code, getElementsByTagName("Name") searches only the descendants of elem. When elem is the Connect element there are no Name descendants, so item(0) returns null and the chained getChildNodes() call throws the NullPointerException.
Likewise,
Integer speed = Integer.parseInt(elem.getElementsByTagName("Connect")
.item(0).getChildNodes().item(0).getNodeValue());
will throw for every elem: getElementsByTagName("Connect") looks only at the descendants of elem, and neither Staff nor Connect itself contains a Connect element, so item(0) returns null.
You can try the following code instead:
if (node.getNodeType() == Node.ELEMENT_NODE) {
    System.out.println(i);
    Element elem = (Element) node;
    // Get the value of the ID attribute.
    // String ID =
    // node.getAttributes().getNamedItem("ID").getNodeValue();
    // Get the value of all sub-elements.
    NodeList nameNodes = elem.getElementsByTagName("Name");
    for (int j = 0; j < nameNodes.getLength(); j++) {
        Node nameNode = nameNodes.item(j);
        staffList.add(new staff(nameNode.getTextContent()));
    }
    NodeList speedNodes = elem.getElementsByTagName("Speed");
    for (int j = 0; j < speedNodes.getLength(); j++) {
        Node speedNode = speedNodes.item(j);
        connectList.add(new connect(Integer.parseInt(speedNode.getTextContent())));
    }
}
P.S.: Try to use class names that start with an uppercase letter.
You want getTextContent() rather than getNodeValue() - the latter always returns null for element nodes.
See: DOMDocument getNodeValue() returns null (contains an output escaped string)
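Putting both answers together, a self-contained version might look like the following. It is a rough sketch, not the asker's program: the class name DeviceReader and the helpers parse and textOf are invented for illustration, the XML is parsed from an inline string instead of a file, and plain lists of String and Integer stand in for the asker's staff and connect classes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for illustration only.
class DeviceReader {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Text content of every element with the given tag name, in document order.
    static List<String> textOf(Document doc, String tagName) {
        NodeList nodes = doc.getElementsByTagName(tagName);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // getTextContent() works on element nodes; getNodeValue() would be null.
            values.add(nodes.item(i).getTextContent().trim());
        }
        return values;
    }

    public static void main(String[] args) {
        Document doc = parse("<Device>\n"
                + "  <Staff>\n    <Name>ABC</Name>\n    <Name>Hello</Name>\n  </Staff>\n"
                + "  <Connect>\n    <Speed>123</Speed>\n    <Speed>456</Speed>\n  </Connect>\n"
                + "</Device>");

        for (String name : textOf(doc, "Name")) {
            System.out.println("STAFF " + name);
        }
        for (String speed : textOf(doc, "Speed")) {
            System.out.println("Connect " + Integer.parseInt(speed));
        }
    }
}
```

Because the loop runs over whatever the NodeList contains, an element with no matching descendants simply contributes nothing instead of causing a null item(0).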

Unable to get Name attribute of Childnode

I have a XML file that looks like this:
<exist:result xmlns:exist="http://exist.sourceforge.net/NS/exist">
<exist:collection name="/db/RCM" created="2013-03-24T09:37:34.957+05:30" owner="admin" group="dba" permissions="rwxrwxrwx">
<exist:resource name="demo2.xml" created="2013-03-24T09:44:13.696+05:30" last-modified="2013-03-24T09:44:13.696+05:30" owner="guest" group="guest" permissions="rw-r--r--"/>
<exist:resource name="demo3.xml" created="2013-03-24T09:45:47.592+05:30" last-modified="2013-03-24T09:45:47.592+05:30" owner="guest" group="guest" permissions="rw-r--r--"/>
<exist:resource name="rcmdemo.xml" created="2013-03-25T11:36:45.659+05:30" last-modified="2013-03-25T11:36:45.659+05:30" owner="guest" group="guest" permissions="rw-r--r--"/>
<exist:resource name="rcmdemo2.xml" created="2013-03-25T11:47:03.564+05:30" last-modified="2013-03-25T11:47:03.564+05:30" owner="guest" group="guest" permissions="rw-r--r--"/>
</exist:collection>
</exist:result>
I want to fetch the name of the XML files, so the output looks like this:
demo2.xml
demo3.xml
rcmdemo.xml
rcmdemo2.xml
I have written the following code:
NodeList nodeList = doc.getElementsByTagName("exist:resource");
for (int i = 0; i < nodeList.getLength(); i++) {
Node n = nodeList.item(i);
Node actualNode = n.getFirstChild();
if (actualNode != null) {
System.out.println(actualNode.getNodeValue());
}
}
But it does not return the output that I want, where am I going wrong?
In this example, name is an attribute of the node rather than the name of the node. See the following question for information about reading the attributes of nodes; the second answer in particular is what you are looking for, I think.
get the the attributes from an XML File using Java
You have to get the attribute from the given node since your name is an attribute of exist:resource.
NodeList nodeList = doc.getElementsByTagName("exist:resource");
for (int i = 0; i < nodeList.getLength(); i++) {
    // The exist:resource elements are empty, so getFirstChild() returns null.
    // Read the attribute from the element itself instead.
    Element resource = (Element) nodeList.item(i);
    // Prints the value of the name attribute, e.g. demo2.xml
    System.out.println(resource.getAttribute("name"));
}
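For a namespace-aware parser, a variant of the same idea is to look the elements up by namespace URI and local name rather than by the exist: prefix. The sketch below assumes the namespace URI shown in the question's XML; the class name ResourceNames and the method resourceNames are hypothetical, and the sample input is shortened to two resources.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for illustration only.
class ResourceNames {

    // Namespace URI taken from the xmlns:exist declaration in the question.
    static final String EXIST_NS = "http://exist.sourceforge.net/NS/exist";

    static List<String> resourceNames(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Required for getElementsByTagNameNS to match on namespace URI.
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            NodeList resources = doc.getElementsByTagNameNS(EXIST_NS, "resource");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < resources.getLength(); i++) {
                Element resource = (Element) resources.item(i);
                // name carries no prefix, so it is a plain attribute.
                names.add(resource.getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read resources", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<exist:result xmlns:exist=\"" + EXIST_NS + "\">"
                + "<exist:collection name=\"/db/RCM\">"
                + "<exist:resource name=\"demo2.xml\"/>"
                + "<exist:resource name=\"demo3.xml\"/>"
                + "</exist:collection>"
                + "</exist:result>";
        for (String name : resourceNames(xml)) {
            System.out.println(name);
        }
    }
}
```

Matching on the namespace URI keeps working if the document ever declares the same namespace under a different prefix.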

Looping over nodes and extracting specific subnode values using Java's XPath

I understand from Googling that it makes more sense to extract data from XML using XPath than by using DOM looping.
At the moment, I have implemented a solution using DOM, but the code is verbose, and it feels untidy and unmaintainable, so I would like to switch to a cleaner XPath solution.
Let's say I have this structure:
<products>
<product>
<title>Some title 1</title>
<image>Some image 1</image>
</product>
<product>
<title>Some title 2</title>
<image>Some image 2</image>
</product>
...
</products>
I want to be able to run a for loop for each of the <product> elements, and inside this for loop, extract the title and image node values.
My code looks like this:
InputStream is = conn.getInputStream();
DocumentBuilder builder =
DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.parse(is);
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("/products/product");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList products = (NodeList) result;
for (int i = 0; i < products.getLength(); i++) {
Node n = products.item(i);
if (n != null && n.getNodeType() == Node.ELEMENT_NODE) {
Element product = (Element) n;
// do some DOM navigation to get the title and image
}
}
Inside my for loop I get each <product> as a Node, which is cast to an Element.
Can I simply use my instance of XPathExpression to compile and run another XPath on the Node or the Element?
Yes, you can do it like this:
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("/products/product");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
expr = xpath.compile("title"); // The new xpath expression to find 'title' within 'product'.
NodeList products = (NodeList) result;
for (int i = 0; i < products.getLength(); i++) {
    Node n = products.item(i);
    if (n != null && n.getNodeType() == Node.ELEMENT_NODE) {
        Element product = (Element) n;
        NodeList nodes = (NodeList) expr.evaluate(product, XPathConstants.NODESET); // Find the 'title' in the 'product'
        System.out.println("TITLE: " + nodes.item(0).getTextContent()); // And here is the title
    }
}
Here I have given an example of extracting the 'title' value. You can extract 'image' in the same way.
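A version that reads both values per product might look like this. It is a minimal sketch: the class name ProductReader and the method readProducts are hypothetical, and the XML comes from an inline string rather than the connection in the question. The relative expressions title and image are compiled once outside the loop and evaluated against each product; the one-argument XPathExpression.evaluate returns the string value directly.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for illustration only.
class ProductReader {

    // Returns one {title, image} pair per product element.
    static List<String[]> readProducts(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            XPathExpression productExpr = xpath.compile("/products/product");
            // Relative expressions: resolved against each product node.
            XPathExpression titleExpr = xpath.compile("title");
            XPathExpression imageExpr = xpath.compile("image");

            NodeList products = (NodeList) productExpr.evaluate(
                    doc, XPathConstants.NODESET);
            List<String[]> result = new ArrayList<>();
            for (int i = 0; i < products.getLength(); i++) {
                Node product = products.item(i);
                result.add(new String[] {
                        titleExpr.evaluate(product),
                        imageExpr.evaluate(product)
                });
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read products", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<products>"
                + "<product><title>Some title 1</title><image>Some image 1</image></product>"
                + "<product><title>Some title 2</title><image>Some image 2</image></product>"
                + "</products>";
        for (String[] product : readProducts(xml)) {
            System.out.println("TITLE: " + product[0]);
            System.out.println("IMAGE: " + product[1]);
        }
    }
}
```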
I'm not a big fan of this approach because you have to build a document (which might be expensive) before you can apply XPaths to it.
I've found VTD-XML a lot more efficient when it comes to applying XPaths to documents, because it indexes the raw document bytes instead of building a DOM tree of node objects. Here is some sample code:
final VTDGen vg = new VTDGen();
vg.parseFile("file.xml", false);
final VTDNav vn = vg.getNav();
final AutoPilot ap = new AutoPilot(vn);
ap.selectXPath("/products/product");
while (ap.evalXPath() != -1) {
System.out.println("PRODUCT:");
// you could either apply another xpath or simply get the first child
if (vn.toElement(VTDNav.FIRST_CHILD, "title")) {
int val = vn.getText();
if (val != -1) {
System.out.println("Title: " + vn.toNormalizedString(val));
}
vn.toElement(VTDNav.PARENT);
}
if (vn.toElement(VTDNav.FIRST_CHILD, "image")) {
int val = vn.getText();
if (val != -1) {
System.out.println("Image: " + vn.toNormalizedString(val));
}
vn.toElement(VTDNav.PARENT);
}
}
Also see this post on Faster XPaths with VTD-XML.
