Parse XML file using XPATH expressions and Java

I would like to parse an XML file using Java and XPath evaluation.
I would like to output the current node's sibling with an XPath expression, without using the getNextSibling() function. Is that possible?
For example, after reading the name element I would like to evaluate the XPath expression "./address" in order to output the sibling of "name" without using getNextSibling().
The XML file is as follows:
<root>
<name>
<address>
<profession>
</root>
My code is as follows:
package dom_stack4;
import org.w3c.dom.*;
import javax.xml.xpath.*;
import javax.xml.parsers.*;
import java.io.IOException;
import org.xml.sax.SAXException;
public class Dom_stack4 {
public static void main(String[] args) throws ParserConfigurationException, SAXException,
IOException, XPathExpressionException {
// TODO code application logic here
DocumentBuilderFactory domFactory =
DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse("root.xml");
XPath xpath = XPathFactory.newInstance().newXPath();
// XPath Query for showing all nodes value
XPathExpression expr = xpath.compile("/root/name/text()");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(" Name is : " + nodes.item(i).getNodeValue());
/* IS that possible here ?? */
/* ./address/text() => outputs the current's node sibling */
expr = xpath.compile("./address/text()");
Object result1 = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes1 = (NodeList) result1;
for (int j = 0; j < nodes1.getLength(); j++) {
System.out.println(" ---------------- " + nodes1.item(j).getNodeValue());
}
}
}
}
Thanks, in advance

First off, your code should run against well-formed XML: the name, address and profession tags must be closed, or you will get an error when you try to parse the file. Secondly, if you want to select the text child of a node, there has to actually be something there, so your XML should look something like this:
<root>
<name>hi</name>
<address>hi</address>
<profession>hi</profession>
</root>
Now, what you have for selecting the text of the name element is fine, so starting with your IS that possible here ?? comment there are some changes you need to make.
If you want to evaluate a relative XPath, you don't want to be passing your document object to the XPath's evaluate method. The current location that the XPath is being evaluated from is determined by the item that you give to it, and the document object is always evaluated at the root. If you want to evaluate relative to a specific node, rather than giving it the document, give it that node.
So you would have something like this for your evaluate method call:
Object result1 = expr.evaluate(nodes.item(i), XPathConstants.NODESET);
Next, you should make sure that your XPath is actually correct. The node that we currently have selected is the text node of name, which means we first need to go from the text node up to the name node. The . expression in XPath syntax selects the current node, so all you are doing with that is selecting the same node. You want the .. expression, which selects the parent node.
So with an XPath of .. we are selecting the name node. What we want to do is select address nodes that are siblings of the name node. There are two ways we can do this: we could select the parent of the name node (the root node) and select the address nodes that are children of that, or we could use an XPath axis to select the siblings (information about axes can be found here http://www.w3schools.com/xpath/xpath_axes.asp).
If we are going through the root node, we need to select the parent of the parent of our current node, so ../.. gives us the root node, followed by the address children: ../../address/text(), which gives us the text of all address siblings.
Alternatively, using an axis, we could do .. to select the name node, followed by ../following-sibling::address (NOTE: this only works if the address nodes come after the name node), and then select the text of the address nodes with ../following-sibling::address/text().
This gives us those two lines as either
expr = xpath.compile("../../address/text()");
Object result1 = expr.evaluate(nodes.item(i), XPathConstants.NODESET);
or
expr = xpath.compile("../following-sibling::address/text()");
Object result1 = expr.evaluate(nodes.item(i), XPathConstants.NODESET);
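Putting the pieces together, here is a minimal self-contained sketch of the relative evaluation described above. The class name SiblingXPathSketch, the helper addressFor and the sample values are made up for illustration, and the XML is parsed from a string instead of root.xml so the example runs on its own.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SiblingXPathSketch {

    // Hypothetical helper: evaluates relativePath against the text node of <name>
    // and returns the value of the first hit, or null if nothing matches.
    public static String addressFor(String xml, String relativePath) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        // Context node: the text node inside <name>, as in the question.
        Node nameText = (Node) xpath.evaluate("/root/name/text()", doc, XPathConstants.NODE);

        // Pass the node (not the document) so the relative path starts from it.
        NodeList hits = (NodeList) xpath.evaluate(relativePath, nameText, XPathConstants.NODESET);
        return hits.getLength() > 0 ? hits.item(0).getNodeValue() : null;
    }

    public static void main(String[] args) throws Exception {
        // Sample data invented for this sketch.
        String xml = "<root><name>Ann</name><address>1 Main St</address>"
                + "<profession>editor</profession></root>";
        System.out.println(addressFor(xml, "../../address/text()"));
        System.out.println(addressFor(xml, "../following-sibling::address/text()"));
    }
}
```

Both calls print the address text for this sample. The second form would return null if address came before name in the document.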

Related

How to parse through a Node and extract the value of a child node in Java?

The input to the function this code is in, is a Node configNode. I need to extract the value of a child node inTemplate. The following is the code. Only null is printed.
XPath xpath = XPathFactory.newInstance().newXPath();
Node inTemplateNode = (Node) xpath.compile("@inTemplate").evaluate(configNode, XPathConstants.NODE);
String inTemplate = (inTemplateNode != null) ? inTemplateNode.getTextContent() : null;
System.out.println("inTemplate Value =" + inTemplate);
Can anyone help me understand why this code is not working?

The XPath expression @inTemplate selects the attribute named inTemplate of the context node (e.g. <config inTemplate="foo"/>). If you really need an attribute value then doing ((Element)configNode).getAttribute("inTemplate") should work in the DOM without the need to use any XPath.
If you want to select a child element (e.g. <config><inTemplate>foo</inTemplate></config>) named inTemplate then use the path inTemplate and not @inTemplate.
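To see the difference side by side, here is a small sketch that evaluates both the attribute path and the child-element path against a config node. The class name AttributeVsChildSketch, the helper eval and the sample documents are assumptions made for illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class AttributeVsChildSketch {

    // Hypothetical helper: evaluates path relative to the root element (standing in
    // for configNode) and returns the text content of the match, or null.
    public static String eval(String xml, String path) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Node configNode = doc.getDocumentElement();
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node hit = (Node) xpath.evaluate(path, configNode, XPathConstants.NODE);
        return hit != null ? hit.getTextContent() : null;
    }

    public static void main(String[] args) throws Exception {
        String withAttribute = "<config inTemplate=\"foo\"/>";
        String withChild = "<config><inTemplate>bar</inTemplate></config>";

        System.out.println(eval(withAttribute, "@inTemplate")); // attribute value
        System.out.println(eval(withChild, "inTemplate"));      // child element text
        System.out.println(eval(withChild, "@inTemplate"));     // no such attribute
    }
}
```

The last call is the situation from the question: the attribute path finds nothing on a node that only has a child element, so the result is null.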

Java: Parse XML child element that may not exist

I'm trying to extract values from an InputStream containing XML data. The general data layout is something like this:
<objects count="1">
<object>
<stuff>...</stuff>
<more_stuff>...</more_stuff>
...
<connections>
<connection>124</connection>
<connection>128</connection>
</connections>
</object>
</objects>
I need to find the integers stored in the <connection> elements. However, I can't guarantee that there will always be exactly two (there may be just one or none at all). Even more, there will be cases where the element <connections> is not present.
I've been looking at examples like this, but it doesn't mention how to handle cases where a parent is non-existent.
The case where <connections> doesn't exist at all is quite rare (but is something I definitely need to know when it does happen), and the case where it does exist but contains less than two <connection>'s would be even more rare (basically I expect it to never happen).
Should I just assume everything is in place and catch the exception if something happens, or is there a clever way to detect the presence of <connections>?
My initial idea was to use something like:
InputStream response = urlConnection.getInputStream();
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(response);
String xPathExpressionString = "/objects/object/connections/connection";
XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xPath = xPathFactory.newXPath();
XPathExpression expr = xPath.compile(xPathExpressionString);
NodeList nodeList = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); i++) {
Node intersectionNode = nodeList.item(i);
if (intersectionNode.getNodeType() == Node.ELEMENT_NODE) { // What is this anyway?
// Do something with value
}
}
According to the example linked above, this should handle the case with varying numbers of <connection> elements, but how should I deal with <connections> missing altogether?
(Btw, there should always only be a single object, so no need to worry about that)

Use this xpath expression:
"//object//connection"
The "//" construct is short for "/descendant-or-self::node()/". So the expression above will select all <connection> elements that have an <object> ancestor, and it simply returns an empty node list (no exception is thrown) when <connections> is missing.
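As a rough illustration of the missing-parent case, the sketch below evaluates the path from the question against a document with and without <connections>. The class name OptionalConnectionsSketch and its helpers are hypothetical. The boolean(...) check is one possible way to detect the missing element explicitly, not the only one.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class OptionalConnectionsSketch {

    private static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns every <connection> value; the list is empty when none exist.
    public static List<Integer> connections(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                "/objects/object/connections/connection", parse(xml), XPathConstants.NODESET);
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(Integer.parseInt(nodes.item(i).getTextContent().trim()));
        }
        return result;
    }

    // Explicit presence check for the <connections> wrapper element.
    public static boolean hasConnections(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Boolean) xpath.evaluate(
                "boolean(/objects/object/connections)", parse(xml), XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) throws Exception {
        String full = "<objects count=\"1\"><object><connections>"
                + "<connection>124</connection><connection>128</connection>"
                + "</connections></object></objects>";
        String missing = "<objects count=\"1\"><object><stuff>x</stuff></object></objects>";

        System.out.println(connections(full) + " " + hasConnections(full));
        System.out.println(connections(missing) + " " + hasConnections(missing));
    }
}
```

Since an absent parent only produces an empty result, there is no exception to catch. Checking the list size (or the boolean) is enough to flag the rare case.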

The code below walks the child nodes of the document's first child (the root element) and checks each one's name. Reaching the inner if block means a connections tag exists among those child nodes.
Since, as you said, we don't know the parent in advance, adjust how deep you descend to match the XML you actually have, for example:
group.getChildNodes().item(0).getChildNodes()......
Document doc = dBuilder.parse(inputFile);
doc.getDocumentElement().normalize();
NodeList groupList = doc.getChildNodes().item(0).getChildNodes();
for (int groupCount = 0; groupCount < groupList.getLength(); groupCount++)
{
Node group = groupList.item(groupCount);
if (group.getNodeType() == Node.ELEMENT_NODE)
{
if(group.getNodeName().equals("connections"))
{
}
}
}
My first answer on Stack Overflow. Hope this helps.

How can I traverse xml nodes without knowing its schema

I know I can use DocumentBuilder to parse an xml file and traverse through the nodes but I am stuck at figuring out if the node has any more children. So for example in this xml:
<MyDoc>
<book>
<title> ABCD </title>
</book>
</MyDoc>
if I do node.hasChildNodes() I get true for both book and title. But what I am trying to do is if a node has some text value (not attributes) like title then print it otherwise don't do anything. I know this is some simple check but I just can't seem to find the answer on web. I am probably not searching with right keywords. Thanks in advance.

Try getChildNodes(). That will return a NodeList object which will allow you to iterate through all of the Nodes under the one you're referencing, regardless of what names they might have.

You have to check the type of the child nodes that you get from getChildNodes() by calling getNodeType() on each of them. <book> has a child of type ELEMENT_NODE whereas <title> has a child of type TEXT_NODE.
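A minimal sketch of that node-type check follows. The class TextChildSketch and the helpers ownText and elementsWithText are names invented for this example. It treats an element as "having text" when its direct TEXT_NODE children contain something other than whitespace.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TextChildSketch {

    // Concatenates the direct TEXT_NODE children of a node and trims the result.
    public static String ownText(Node element) {
        StringBuilder sb = new StringBuilder();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString().trim();
    }

    // Lists "elementName=text" for every element that has non-whitespace text of its own.
    public static List<String> elementsWithText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList all = doc.getElementsByTagName("*"); // "*" matches every element
        List<String> out = new ArrayList<>();
        for (int i = 0; i < all.getLength(); i++) {
            String text = ownText(all.item(i));
            if (!text.isEmpty()) {
                out.add(all.item(i).getNodeName() + "=" + text);
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<MyDoc>\n<book>\n<title> ABCD </title>\n</book>\n</MyDoc>";
        System.out.println(elementsWithText(xml));
    }
}
```

For the document in the question only title is reported, because MyDoc and book contain nothing but whitespace text between their child elements.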

I am not sure, but I think you wanted a way to iterate through all of the elements regardless of how deeply they are nested. The code below recursively goes through all nodes and prints a node's value as long as it's not just white space:
public static void main(String[] args) throws SAXException, IOException, ParserConfigurationException
{
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("test.xml");
NodeList childNodes = doc.getChildNodes();
iterateNodes(childNodes);
}
private static void iterateNodes(NodeList childNodes)
{
for (int i = 0; i < childNodes.getLength(); ++i)
{
Node node = childNodes.item(i);
String text = node.getNodeValue();
if (text != null && !text.trim().isEmpty()) {
System.out.println(text);
}
if (node.hasChildNodes()) {
iterateNodes(node.getChildNodes());
}
}
}

Text nodes exist under element nodes in a DOM, and data is always stored in text nodes. Perhaps the most common error in DOM processing is to navigate to an element node and expect it to contain the data that is stored in that element. Not so! Even the simplest element node has a text node under it that contains the data.
Ref: http://docs.oracle.com/javase/tutorial/jaxp/dom/readingXML.html

reading xml file with multiple child node

Consider i have a XML file like the below xml file.
<top>
<CRAWL>
<NAME>div[class=name],attr=0</NAME>
<PRICE>span[class~=(?i)(price-new|price-old)],attr=0</PRICE>
<DESC>div[class~=(?i)(sttl dyn|bin)],attr=0</DESC>
<PROD_IMG>div[class=image]>a>img,attr=src</PROD_IMG>
<URL>div[class=name]>a,attr=href</URL>
</CRAWL>
<CRAWL>
<NAME>img[class=img],attr=alt</NAME>
<PRICE>div[class=g-b],attr=0</PRICE>
<DESC>div[class~=(?i)(sttl dyn|bin)],attr=0</DESC>
<PROD_IMG>img[itemprop=image],attr=src</PROD_IMG>
<URL>a[class=img],attr=href</URL>
</CRAWL>
</top>
What I want is to first take all the values under one <CRAWL> element and, after finishing that operation, go to the next <CRAWL> and repeat it, even if I have more than two <CRAWL> tags. I have managed to get the values when just one <CRAWL> is available. I use the values inside the tags for some other function; each <CRAWL> holds different values, and I use them for different operations. Everything else is fine, except that I don't know how to loop over the entries inside the XML file.
regards

If I'm understanding this correctly, you're trying to extract data from ALL tags that exist within your XML fragment. There are multiple solutions to this. I'm listing them below:
XPath: If you know exactly what your XML structure is, you can employ XPath for each node=CRAWL to find data within tags:
// Instantiate XPath variable
XPath xpath = XPathFactory.newInstance().newXPath();
// Define the exact XPath expressions you want to get data for:
XPathExpression name = xpath.compile("//top/CRAWL/NAME/text()");
XPathExpression price = xpath.compile("//top/CRAWL/PRICE/text()");
XPathExpression desc = xpath.compile("//top/CRAWL/DESC/text()");
XPathExpression prod_img = xpath.compile("//top/CRAWL/PROD_IMG/text()");
XPathExpression url = xpath.compile("//top/CRAWL/URL/text()");
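At this point, each of the variables above holds a compiled expression, not the data itself. To get the data you still have to evaluate each expression against the parsed document (or against an individual CRAWL node). You could then drop the results into an array for each tag, so that you have all the data for each of the tags across all CRAWL elements.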
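One way to loop over the CRAWL elements with XPath is sketched below: select the CRAWL nodes first, then evaluate short relative paths against each one. The class name CrawlLoopSketch, the helper readCrawls and the sample values are assumptions made for this example, and the XML is read from a string to keep it self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CrawlLoopSketch {

    private static final String[] FIELDS = {"NAME", "PRICE", "DESC", "PROD_IMG", "URL"};

    // Returns one map per <CRAWL>, keyed by child tag name.
    public static List<Map<String, String>> readCrawls(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList crawls = (NodeList) xpath.evaluate("/top/CRAWL", doc, XPathConstants.NODESET);

        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 0; i < crawls.getLength(); i++) {
            Node crawl = crawls.item(i);
            Map<String, String> row = new LinkedHashMap<>();
            for (String field : FIELDS) {
                // Relative path: evaluated against this CRAWL only.
                // A missing child yields an empty string.
                row.put(field, xpath.evaluate(field, crawl));
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        // Shortened sample data invented for this sketch.
        String xml = "<top>"
                + "<CRAWL><NAME>div[class=name],attr=0</NAME><URL>a,attr=href</URL></CRAWL>"
                + "<CRAWL><NAME>img[class=img],attr=alt</NAME><URL>a[class=img],attr=href</URL></CRAWL>"
                + "</top>";
        for (Map<String, String> row : readCrawls(xml)) {
            System.out.println(row);
        }
    }
}
```

Each map holds the values of one CRAWL, so the per-entry processing can run inside the loop before moving on to the next entry.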
The other solution (which may be more efficient) is to walk the DOM directly:
// Needs: import javax.xml.parsers.*; import org.w3c.dom.*;
// Instantiate the doc builder
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder xmlDocBuilder = domFactory.newDocumentBuilder();
Document xmlDoc = xmlDocBuilder.parse("xmlFile.xml");
// Create NodeList of element tag "CRAWL"
NodeList crawlNodeList = xmlDoc.getElementsByTagName("CRAWL");
// NodeList is not Iterable, so iterate by index and get the values of
// each of the elements in NAME, PRICE, DESC etc.
for (int c = 0; c < crawlNodeList.getLength(); c++) {
Node node = crawlNodeList.item(c);
NodeList subNodes = node.getChildNodes();
// Get each child node's name and value
for (int i = 0; i < subNodes.getLength(); i++) {
Node sub = subNodes.item(i);
// Skip the whitespace text nodes between the child elements
if (sub.getNodeType() != Node.ELEMENT_NODE) continue;
String tag = sub.getNodeName(); // e.g. NAME, PRICE, DESC, etc.
String value = sub.getTextContent();
// Do something with these values
}
}
Hope this helps!

Java XPath : iterating over a collection of nodes and their indices

I have this XML instance document:
<entities>
<person>James</person>
<person>Jack</person>
<person>Jim</person>
</entities>
And with the following code I iterate over the person nodes and print their names:
XPathExpression expr = xpath.compile("/entities/person");
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0 ; i < nodes.getLength() ; i++) {
Node node = nodes.item(i);
String nodeName = node.getNodeName();
String name = xpath.compile("text()").evaluate(node).trim();
System.out.printf("node type = %s, node name = %s\n", nodeName, name);
}
Now what I would like is to also have access to the index of each node.
I know I can trivially get it from the i loop variable but I want to get it as an XPath expression instead, preferably in no different way than I get the value of the text() XPath expression.
My use-case is that I am trying to handle all attributes I collect as XPath expressions (which I load at run-time from a config file) so that I minimize non-generic code, so I don't want to treat the index as a special case.

You'd have to use a trick like counting the preceding siblings
count(preceding-sibling::person)
which gives 0 for the first person, 1 for the second one, etc.
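Here is a small runnable sketch of the preceding-sibling trick. The class NodeIndexSketch and the helper indexed are hypothetical names. The count is read as an XPath number and converted to an int, which is one possible choice; evaluating it as a string like the other expressions should also work.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NodeIndexSketch {

    // Returns "index:name" for every <person>, with the index computed by XPath.
    public static List<String> indexed(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate("/entities/person", doc, XPathConstants.NODESET);

        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            // Zero-based index: how many <person> siblings come before this node.
            Double count = (Double) xpath.evaluate(
                    "count(preceding-sibling::person)", node, XPathConstants.NUMBER);
            String name = xpath.evaluate("text()", node).trim();
            out.add(count.intValue() + ":" + name);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<entities><person>James</person><person>Jack</person>"
                + "<person>Jim</person></entities>";
        System.out.println(indexed(xml)); // [0:James, 1:Jack, 2:Jim]
    }
}
```

Because the index is just another relative expression evaluated against the node, it can be loaded from the config file alongside text() without special handling for the loop variable.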

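Try using position(), but note that it reports the context position of the evaluation. When a single node is passed as the context item, as in the call below, that position is 1 for every node, so position() is only useful inside a predicate or step of a larger expression:
String index = xpath.compile("position()").evaluate(node).trim();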
