Getting XML child elements with XPath - java

I have this XML:
<root>
<items>
<item1>
<tag1>1</tag1>
<sub>
<sub1>10 </sub1>
<sub2>20 </sub2>
</sub>
</item1>
<item2>
<tag1>1</tag1>
<sub>
<sub1> </sub1>
<sub2> </sub2>
</sub>
</item2>
</items>
</root>
I want to get the item1 element and the names and values of its child elements.
That is, I want to get: tag1 - 1, sub1 - 10, sub2 - 20.
How can I do this? So far I can only get elements without children.

Document doc = ...;
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile("/root/items/item1/*/text()");
Object o = expr.evaluate(doc, XPathConstants.NODESET);
NodeList list = (NodeList) o;

import org.w3c.dom.*;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
/**
* File: Ex1.java
* @author ronda
*/
public class Ex1 {
public static void main(String[] args) throws Exception {
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = dbFactory.newDocumentBuilder();
Document doc = builder.parse("myxml.xml");
//creating an XPathFactory:
XPathFactory factory = XPathFactory.newInstance();
//using this factory to create an XPath object:
XPath xpath = factory.newXPath();
// XPath Query for showing all nodes value
XPathExpression expr = xpath.compile("//" + "item1" + "/*");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
System.out.println(nodes.getLength());
for (int i = 0; i < nodes.getLength(); i++) {
Element el = (Element) nodes.item(i);
System.out.println("tag: " + el.getNodeName());
// look for a leading text child (getFirstChild() is null for an empty element)
if (el.getFirstChild() != null && el.getFirstChild().getNodeType() == Node.TEXT_NODE)
System.out.println("inner value:" + el.getFirstChild().getNodeValue());
NodeList children = el.getChildNodes();
for (int k = 0; k < children.getLength(); k++) {
Node child = children.item(k);
if (child.getNodeType() != Node.TEXT_NODE) {
System.out.println("child tag: " + child.getNodeName());
if (child.getFirstChild() != null && child.getFirstChild().getNodeType() == Node.TEXT_NODE)
System.out.println("inner child value:" + child.getFirstChild().getNodeValue());
}
}
}
}
}
Loading the XML from your question from a file named myxml.xml, I get this output:
run:
2
tag: tag1
inner value:1
tag: sub
inner value:
child tag: sub1
inner child value:10
child tag: sub2
inner child value:20
It's a bit wordy, but it lets us see how it works. PS: I found a good guide here.
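A shorter variant is possible if only the leaf elements matter. The sketch below is not part of the original answer: it assumes the XML from the question is passed in as a string, and the class name `LeafValues` and method `leafPairs` are hypothetical. The predicate `[not(*)]` keeps only elements that have no element children, so `sub` itself is skipped while `sub1` and `sub2` are returned.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, introduced only for this illustration.
class LeafValues {

    // Returns "name - value" for every element below item1 that has no element children.
    static List<String> leafPairs(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // "//*[not(*)]" selects descendants without child elements (the leaves).
            NodeList leaves = (NodeList) xpath.evaluate(
                    "/root/items/item1//*[not(*)]", doc, XPathConstants.NODESET);
            List<String> pairs = new ArrayList<>();
            for (int i = 0; i < leaves.getLength(); i++) {
                Node leaf = leaves.item(i);
                // trim() drops the trailing space in values such as "10 ".
                pairs.add(leaf.getNodeName() + " - " + leaf.getTextContent().trim());
            }
            return pairs;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate the XPath", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><items><item1><tag1>1</tag1>"
                + "<sub><sub1>10 </sub1><sub2>20 </sub2></sub></item1></items></root>";
        leafPairs(xml).forEach(System.out::println);
    }
}
```

The trade-off is that the nesting is lost: the result is a flat list of leaf names and values, which is what the question asks for but not what the longer listing prints.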

Related

XPATH evaluation against child node

I have xml as follows,
<students>
<Student><age>23</age><id>2000</id><name>PP2000</name></Student>
<Student><age>23</age><id>1000</id><name>PP1000</name></Student>
</students>
I have two XPaths. The template XPath, students/Student, selects the template nodes, but I cannot hard-code it because it changes for other XML files; the XML is quite dynamic and can grow (though with the same base XPaths). To evaluate a second XPath against each template node, I'm using the following code:
XPath xpathResource = XPathFactory.newInstance().newXPath();
Document xmlDocument = ...; // creating document
NodeList nodeList = (NodeList)xpathResource.compile("//students/Student").evaluate(xmlDocument, XPathConstants.NODESET);
for (int nodeIndex = 0; nodeIndex < nodeList.getLength(); nodeIndex++) {
Node currentNode = nodeList.item(nodeIndex);
String xpathID = "//students/Student/id";
String xpathName = "//students/Student/name";
NodeList childID = (NodeList)xpathResource.compile(xpathID).evaluate(currentNode, XPathConstants.NODESET);
NodeList childName = (NodeList)xpathResource.compile(xpathName).evaluate(currentNode, XPathConstants.NODESET);
System.out.println("node ID " +childID.item(0).getTextContent());
System.out.println("node Name " +childName.item(0).getTextContent());
}
Now the problem is that the for loop executes twice, but both times I get 2000 and PP2000 as the values. Is there a way to evaluate a generic XPath against each child node in turn? I cannot run a generic XPath against the whole XML document because I have some validation to do. I want to treat the XML node list like result-set rows, so that I can validate each value and do my processing.
XPath xpathResource = XPathFactory.newInstance().newXPath();
Document xmlDocument = ...; // creating document
NodeList nodeList = (NodeList)xpathResource.compile("//students/Student/id").evaluate(xmlDocument, XPathConstants.NODESET);
for (int nodeIndex = 0; nodeIndex < nodeList.getLength(); nodeIndex++) {
Node currentNode = nodeList.item(nodeIndex);
System.out.println("node " +currentNode.getTextContent());
}

Java XML Parse xPath not getting all child elements

I have a very large XML file with a lot of elements.
I'm only interested in the cases that look like the example below. There are about 400 cases in the XML document. I want to parse the document and print out each element and its name.
<cases>
<case>
<id/>
<title/>
<type/>
<priority/>
<estimate/>
<references/>
<custom>
<functional_area/>
<technology_dependence/>
<reviewed/>
<steps_completed>
</steps_completed>
<preconds> </preconds>
<steps_seperated>
<step>
<index/>
<content>
</content>
<expected>
</expected>
</step>
<step>
<index/>
<content>
</content>
<expected>
</expected>
</step>
<step>
</steps_seperated>
</custom>
</case>
At the moment my code works fine up until "steps_seperated", where it stops and moves on to the next case.
My code looks like this (MCVE below).
I can't work out why it stops after "steps_seperated" and starts a new case.
A second issue I've noticed is that it only prints 10 or so cases (I'm not sure whether this is because I'm running it in NetBeans).
Any help would be very much appreciated, thank you.
P.S.
MCVE:
public void printCaseElements(NodeList list){
for(int i = 0 ;i <list.getLength();i++){
Element el = (Element) list.item(i);
System.out.println("tag: " + el.getNodeName());
if(el.getFirstChild().getNodeType() == Node.TEXT_NODE)
{
System.out.println("Inner Value: "+ el.getFirstChild().getNodeValue());
System.out.println("________________________________________________________________________");
NodeList children = el.getChildNodes();
for(int k = 0; k < children.getLength(); k++){
Node child = children.item(k);
if (child.getNodeType() != Node.TEXT_NODE){
System.out.println("child tag: "+ child.getNodeName());
if(child.getFirstChild().getNodeType() == Node.TEXT_NODE){
System.out.println("inner child value :" + child.getFirstChild().getNodeValue());
System.out.println("____________________________________________________________________________________________");
}
}
}
}
}
P.P.S
I think the issue may be when I use xPath
DOMParser parser =new DOMParser() ;
InputSource source = new InputSource(path) ;
try {
parser.parse(source);
Element docElement = parser.getDocument().getDocumentElement();
XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xPath = xPathFactory.newXPath();
XPathExpression expression =xPath.compile("//case/*");
NodeList list =(NodeList) expression.evaluate(docElement,XPathConstants.NODESET);
DocumentBuilder documentBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document newDoc = documentBuilder.newDocument();
Element newElement = newDoc.createElement("cases");
newDoc.appendChild(newElement);
for(int i =0 ; i <list.getLength(); i++){
Node n = newDoc.importNode(list.item(i), true);
newElement.appendChild(n);
}
When I changed ("//case/*") to ("//steps_seperated/*") it showed all the elements below steps_seperated, but not the elements before steps_seperated.
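One likely reason for the behaviour described above is that printCaseElements only looks two levels down (each selected element and its direct children), so anything nested deeper, such as the step elements inside steps_seperated, is never visited. A recursive walk removes that depth limit. The sketch below is an assumption about what is wanted, not the asker's code: the class `ElementWalker`, its methods, and the sample values in `main` are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, introduced only for this illustration.
class ElementWalker {

    // Adds one entry per element ("path" or "path = text"), then recurses into child elements.
    static void walk(Element el, String parentPath, List<String> out) {
        String path = parentPath.isEmpty() ? el.getTagName() : parentPath + "/" + el.getTagName();
        NodeList children = el.getChildNodes();

        // Collect only the text that belongs directly to this element.
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                text.append(child.getNodeValue());
            }
        }
        String value = text.toString().trim();
        out.add(value.isEmpty() ? path : path + " = " + value);

        // Recurse so that elements at any depth are visited.
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) child, path, out);
            }
        }
    }

    // Parses the XML string and walks it from the document element.
    static List<String> describe(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<>();
            walk(doc.getDocumentElement(), "", out);
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }
    }

    public static void main(String[] args) {
        // Sample values (7 and 1) are made up for the demonstration.
        String xml = "<case><id>7</id><custom><steps_seperated>"
                + "<step><index>1</index></step></steps_seperated></custom></case>";
        describe(xml).forEach(System.out::println);
    }
}
```

Because the loop never calls getFirstChild() without a check, empty elements such as `<id/>` are handled without a NullPointerException.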

How to narrow an XPath evaluation to a single node instead of a whole document in Java?

XML stream
<l>
<i>
<a>AAA</a>
<b>BBB</b>
<c>CCC</c>
</i>
<i>
<a>AAA2</a>
<b>BBB2</b>
<c>CCC2</c>
</i>
<i>
...
</i>
</l>
I want to output the following text with some Java code:
> CCC
> CCC2
...
Here is the code I wrote to produce the expected result:
Java code
DocumentBuilder docBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document d = docBuilder.parse("file:///C:/path/to/my/xml/stream.xml");
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile("//i");
NodeList listOfiNodes = (NodeList) expr.evaluate(d, XPathConstants.NODESET);
for(int i=0;i<listOfiNodes.getLength();i++) {
XPathExpression expr2 = xpath.compile("//c");
System.out.println("> " + ((Node) expr2.evaluate(listOfiNodes.item(i), XPathConstants.NODE)).getTextContent());
}
expr2 keeps on returning the first c node. So I get this output:
> CCC
> CCC
...
The evaluation performed by expr2 doesn't seem to "stay" on the node passed to evaluate() method. Why?
Note: I don't want to get the c nodes directly with the XPath //i/c (or /l/i/c).
Java 6
//c selects all matching nodes in the whole document. Use c instead and you will receive this output:
> CCC
> CCC2
Note that the line that prints the result will throw a NullPointerException if an i node does not contain a c. The following code should work as expected:
DocumentBuilder docBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document d = docBuilder.parse("stream.xml");
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile("//i");
NodeList listOfiNodes = (NodeList) expr.evaluate(d, XPathConstants.NODESET);
for (int i = 0; i < listOfiNodes.getLength(); i++) {
javax.xml.xpath.XPathExpression expr2 = xpath.compile("c");
Node item = listOfiNodes.item(i);
Node node = (Node) expr2.evaluate(item, XPathConstants.NODE);
if (null != node) {
System.out.println("> " + node.getTextContent());
}
}
Replace "//c" with ".//c":
XPathExpression expr2 = xpath.compile(".//c");
It starts the search at the current node and looks through its descendants, instead of searching the whole document.
XPathExpression expr2 = xpath.compile(".//c");
for(int i=0;i<listOfiNodes.getLength();i++) {
System.out.println("> " + ((Node) expr2.evaluate(listOfiNodes.item(i), XPathConstants.NODE)).getTextContent());
}
Output:
> CCC
> CCC2
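To compare the three expressions discussed in these answers side by side, here is a small self-contained sketch. It is an illustration rather than anyone's original code: the class `ContextNodeDemo` and the method `firstMatchPerItem` are hypothetical names, and the XML is a shortened version of the stream from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, introduced only for this illustration.
class ContextNodeDemo {

    // Evaluates innerExpression once per <i> node and returns the text of the first match.
    static List<String> firstMatchPerItem(String xml, String innerExpression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList items = (NodeList) xpath.evaluate("//i", doc, XPathConstants.NODESET);
            // Compile once, outside the loop, and reuse for every context node.
            XPathExpression inner = xpath.compile(innerExpression);
            List<String> found = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Node match = (Node) inner.evaluate(items.item(i), XPathConstants.NODE);
                found.add(match == null ? "(none)" : match.getTextContent());
            }
            return found;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate the XPath", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<l><i><a>AAA</a><c>CCC</c></i><i><a>AAA2</a><c>CCC2</c></i></l>";
        System.out.println("//c  -> " + firstMatchPerItem(xml, "//c"));   // absolute: document-wide
        System.out.println("c    -> " + firstMatchPerItem(xml, "c"));     // child of the context node
        System.out.println(".//c -> " + firstMatchPerItem(xml, ".//c"));  // descendant of the context node
    }
}
```

An expression that begins with `/` or `//` is resolved from the document root no matter which node is passed to evaluate(), which is why `//c` returns CCC for both items. `c` matches only direct children of the context node, while `.//c` also matches deeper descendants.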

xml parsing with java

I'm new to XML parsing and am trying to parse the following XML file using Java.
<a>
<e class="object">
<amenities class="array">
<e class="object">
<id type="number">31</id>
<name type="string">Internet access available</name>
</e>
<e class="object">
<id type="number">9</id>
<name type="string">Business center</name>
</e>
</amenities>
<brands class="array">
<e class="object">
<code type="number">291</code>
<name type="string">Utell</name>
</e>
<e class="object">
<code type="number">72</code>
<name type="string">Best Western International</name>
</e>
</brands>
<hotels class="array">
<e class="object">
<addressLine1 type="string">4 Rue du Mont-Thabor</addressLine1>
<city type="string">Paris</city>
<name type="string">Renaissance Paris Vendome Hotel</name>
<starRating type="string">5</starRating>
</e>
<e class="object">
<addressLine1 type="string">35 Rue de Berri</addressLine1>
<city type="string">Paris</city>
<name type="string">Crowne Plaza Hotel PARIS-CHAMPS ELYSÉES</name>
<starRating type="string">5</starRating>
</e>
</hotels>
</e>
</a>
I only need to list the name tag info (which will be the hotel name). I used the following code, but it returned not only the hotel info but everything else as well. Can anyone please help me parse this?
Thanks a lot!
Here is the Java code I used:
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
import java.io.File;
public class ReadXMLFile {
public static void main(String argv[]) {
try {
File fXmlFile = new File("c:\\file.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
doc.getDocumentElement().normalize();
System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
NodeList nList = doc.getElementsByTagName("e");
System.out.println("-----------------------");
for (int temp = 0; temp < nList.getLength(); temp++) {
Node nNode = nList.item(temp);
if (nNode.getNodeType() == Node.ELEMENT_NODE) {
Element eElement = (Element) nNode;
System.out.println("Hotel Name : " + getTagValue("name",eElement));
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
private static String getTagValue(String sTag, Element eElement){
NodeList nlList= eElement.getElementsByTagName(sTag).item(0).getChildNodes();
Node nValue = (Node) nlList.item(0);
return nValue.getNodeValue();
}
}
You could use xpath to get all the name nodes:
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile("//hotels//name"); // the root element is <a>, so "/hotels//name" would match nothing
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
Pavithira, you can use XPath to select only the hotels. Below is the main method, which you can simply copy and paste into your code (it also needs the javax.xml.xpath imports).
public static void main(String argv[]) {
try {
File fXmlFile = new File("c:\\file.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory
.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
doc.getDocumentElement().normalize();
System.out.println("Root element :"
+ doc.getDocumentElement().getNodeName());
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("//hotels/e");
NodeList nList = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
System.out.println("-----------------------");
for (int temp = 0; temp < nList.getLength(); temp++) {
Node nNode = nList.item(temp);
if (nNode.getNodeType() == Node.ELEMENT_NODE) {
Element eElement = (Element) nNode;
System.out.println("Hotel Name : "
+ getTagValue("name", eElement));
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
You are iterating over the 'e' nodes, so your loop will print out every node inside any 'e' node (which includes the (sub)root node!). Change your getElementsByTagName parameter to "name" if you only want to retrieve and print those nodes.
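As a compact illustration of restricting the match to hotels, the following sketch selects the name elements directly with the path `//hotels/e/name`. It is not taken from any of the answers: the class `HotelNames` and method `hotelNames` are hypothetical, and the XML is a cut-down version of the file in the question (attributes omitted).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, introduced only for this illustration.
class HotelNames {

    // Returns the text of every <name> that sits in an <e> directly under <hotels>.
    static List<String> hotelNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Amenity and brand names are excluded because their parent chain
            // does not pass through <hotels>.
            NodeList names = (NodeList) xpath.evaluate(
                    "//hotels/e/name", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < names.getLength(); i++) {
                result.add(names.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate the XPath", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a><e>"
                + "<amenities><e><name>Business center</name></e></amenities>"
                + "<brands><e><name>Utell</name></e></brands>"
                + "<hotels><e><name>Renaissance Paris Vendome Hotel</name><city>Paris</city></e></hotels>"
                + "</e></a>";
        hotelNames(xml).forEach(name -> System.out.println("Hotel Name : " + name));
    }
}
```

Note that getElementsByTagName("name") on the whole document would also return the amenity and brand names, so a path that goes through `hotels` is the more selective option here.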

Looping over nodes and extracting specific subnode values using Java's XPath

I understand from Googling that it makes more sense to extract data from XML using XPath than by using DOM looping.
At the moment, I have implemented a solution using DOM, but the code is verbose, and it feels untidy and unmaintainable, so I would like to switch to a cleaner XPath solution.
Let's say I have this structure:
<products>
<product>
<title>Some title 1</title>
<image>Some image 1</image>
</product>
<product>
<title>Some title 2</title>
<image>Some image 2</image>
</product>
...
</products>
I want to be able to run a for loop for each of the <product> elements, and inside this for loop, extract the title and image node values.
My code looks like this:
InputStream is = conn.getInputStream();
DocumentBuilder builder =
DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.parse(is);
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("/products/product");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList products = (NodeList) result;
for (int i = 0; i < products.getLength(); i++) {
Node n = products.item(i);
if (n != null && n.getNodeType() == Node.ELEMENT_NODE) {
Element product = (Element) n;
// do some DOM navigation to get the title and image
}
}
Inside my for loop I get each <product> as a Node, which is cast to an Element.
Can I simply use my instance of XPathExpression to compile and run another XPath on the Node or the Element?
Yes, you can do it like this:
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("/products/product");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
expr = xpath.compile("title"); // The new xpath expression to find 'title' within 'product'.
NodeList products = (NodeList) result;
for (int i = 0; i < products.getLength(); i++) {
Node n = products.item(i);
if (n != null && n.getNodeType() == Node.ELEMENT_NODE) {
Element product = (Element) n;
NodeList nodes = (NodeList) expr.evaluate(product,XPathConstants.NODESET); //Find the 'title' in the 'product'
System.out.println("TITLE: " + nodes.item(0).getTextContent()); // And here is the title
}
}
Here I have given an example of extracting the 'title' value. You can do the same for 'image'.
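For completeness, here is a sketch that reads both values per product. Instead of a NODESET it uses the two-argument `XPath.evaluate(String, Object)` overload, which returns the string value of the first match (or an empty string when nothing matches). The class `ProductReader` and method `titlesAndImages` are hypothetical names, and passing the XML as a string is an assumption made to keep the example self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, introduced only for this illustration.
class ProductReader {

    // Returns one "title | image" row per <product> element.
    static List<String> titlesAndImages(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList products = (NodeList) xpath.evaluate(
                    "/products/product", doc, XPathConstants.NODESET);
            List<String> rows = new ArrayList<>();
            for (int i = 0; i < products.getLength(); i++) {
                Node product = products.item(i);
                // Relative paths are resolved against the current <product> node.
                String title = xpath.evaluate("title", product);
                String image = xpath.evaluate("image", product);
                rows.add(title + " | " + image);
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate the XPath", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<products>"
                + "<product><title>Some title 1</title><image>Some image 1</image></product>"
                + "<product><title>Some title 2</title><image>Some image 2</image></product>"
                + "</products>";
        titlesAndImages(xml).forEach(System.out::println);
    }
}
```

For a large document it is usually worth compiling "title" and "image" once with xpath.compile() before the loop, as the answer above does for 'title'.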
I'm not a big fan of this approach because you have to build a document (which might be expensive) before you can apply XPaths to it.
I've found VTD-XML a lot more efficient when it comes to applying XPaths to documents, because it does not build a DOM object tree; it indexes the raw document bytes instead. Here is some sample code:
final VTDGen vg = new VTDGen();
vg.parseFile("file.xml", false);
final VTDNav vn = vg.getNav();
final AutoPilot ap = new AutoPilot(vn);
ap.selectXPath("/products/product");
while (ap.evalXPath() != -1) {
System.out.println("PRODUCT:");
// you could either apply another xpath or simply get the first child
if (vn.toElement(VTDNav.FIRST_CHILD, "title")) {
int val = vn.getText();
if (val != -1) {
System.out.println("Title: " + vn.toNormalizedString(val));
}
vn.toElement(VTDNav.PARENT);
}
if (vn.toElement(VTDNav.FIRST_CHILD, "image")) {
int val = vn.getText();
if (val != -1) {
System.out.println("Image: " + vn.toNormalizedString(val));
}
vn.toElement(VTDNav.PARENT);
}
}
Also see this post on Faster XPaths with VTD-XML.
