xpath getting multiple node values - xml parser using java - java

below are the xml file
<priority-claims>
<priority-claim sequence="1" kind="national">
<document-id document-id-type="maindoc">
<doc-number>FD0297663</doc-number>
<date>20070403</date>
</document-id>
</priority-claim>
<priority-claim sequence="2" kind="national">
<document-id document-id-type="maindoc">
<doc-number>FD0745459P</doc-number>
<date>20060424</date>
</document-id>
</priority-claim>
</priority-claims>
my expected conditions:
1.How can i getting all the node value (i.e FD0297663, 20070403 and FD0745459P,20060424)
2.its may be single (i.e Priority -claim tag) or more than a single level is possible
my existing code getting first level value only
String priorityNumber = xPath.compile("//priority-claim//doc-number").evaluate(xmlDocument);
String priorityDate = xPath.compile("//priority-claim//date").evaluate(xmlDocument);

Below is a working example:
updated the xpath expressions (e.g, /priority-claims/priority-claim/document-id/doc-number/text())
using NodeList
NodeList priorityNumbers = (NodeList) xPath.compile("/priority-claims/priority-claim/document-id/doc-number/text()").evaluate(xmlDocument, XPathConstants.NODESET);
NodeList priorityDates = (NodeList) xPath.compile("/priority-claims/priority-claim/document-id/date/text()").evaluate(xmlDocument,XPathConstants.NODESET);
for(int i=0; i<priorityNumbers.getLength();i++){
System.out.println(priorityNumbers.item(i).getNodeValue());
}
for(int i=0; i<priorityDates.getLength();i++){
System.out.println(priorityDates.item(i).getNodeValue());
}
Here is a linkt to a gist with a runnable version: https://gist.github.com/rparree/1c7eb8e9ca928b98418fdb167a2096a3

Related

Using XPath count function

I am using a oracle sql database to carryout sql queries with xpath expressions – I have created an XML file which contains data relating to a film
The XPath expression you're looking for (not the SQL expression) is:
count(/film/directors/director)
which result should be 1 with your example XML file.
If you want to check if it's 2, use
count(/film/directors/director) = 2
which should return FALSE with your XML file.
First, you obviously know you need to use xPath to query the XML file, but you seem to have failed to understand what xPath is or how it should be used.
My first suggestion would be to go a read up on xPath and xPath in Java because it has nothing to do with the SQL.
I then did a quick search on "java xpath count" and come across a number of excellent examples, but based on XPath count() function, I went about testing your document with...
try {
DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
DocumentBuilder b = f.newDocumentBuilder();
// This is your document in a file
Document d = b.parse(new File("Test.xml"));
d.getDocumentElement().normalize();
String expression = "//film[count(directors)=1]";
XPath xPath = XPathFactory.newInstance().newXPath();
Object result = xPath.compile(expression).evaluate(d, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
System.out.println(nodes.getLength());
for (int i = 0; i < nodes.getLength(); i++) {
Node node = nodes.item(i);
System.out.println("Found " + node.getTextContent());
}
} catch (ParserConfigurationException | SAXException | IOException | XPathExpressionException | DOMException exp) {
exp.printStackTrace();
}
This basically listed the film node (found one match) ... but, why did you produce a result?! Look at the query, //film[count(directors)=1], it's listing all film matches with a one director, because I want to test the query. Change it to //film[count(directors)=2] and it will return a result of zero matches based on your example.
I would highly recommend that you pause for a moment and become more familiar with what xPath is and how it works before you continue

How to read an XML in java w/o DOM?

I have an XML file and reading the information using Xpath, I want to read the 'listings_Id' and 'budget_remaining' together.
XML example
<ads>
<ad>
<listing_ids>
<listing_id>2235</listing_id>
<listing_id>303</listing_id>
<listing_id>394</listing_id>
</listing_ids>
<reference_id>11</reference_id>
<net_ppe>0.55</net_ppe>
<budget_remaining>50000.0</budget_remaining>
</ad>
<ad>
<listing_ids>
<listing_id>2896</listing_id>
</listing_ids>
<reference_id>8</reference_id>
<net_ppe>1.5</net_ppe>
<budget_remaining>1.3933399</budget_remaining>
</ad>
</ads>
I want to output it to a CSV file as the following
ListingId,BudgetRemaining
2235,50000
303,50000
394,50000
2896,1.39
Using the code
String expression = "/ads/ad/listing_ids/listing_id";
System.out.println(expression);
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(docum, XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); i++) {
System.out.println(nodeList.item(i).getFirstChild().getNodeValue());
}
String expression1 = "/ads/ad/budget_remaining";
System.out.println(expression1);
NodeList nodeList1 = (NodeList) xPath.compile(expression1).evaluate(docum, XPathConstants.NODESET);
for (int i = 0; i < nodeList1.getLength(); i++) {
System.out.println(nodeList1.item(i).getFirstChild().getNodeValue());
}
Output
/ads/ad/listing_ids/listing_id
2235
303
394
2896
/ads/ad/budget_remaining
50000.0
1.3933399
Desired Output
2235,50000.0
303,50000.0
2896,50000.0
2896,1.3933399
How to read the XML using Xpath or any other method? I want the 'listing_ids' and 'budget_ remaining' to be read together for each 'Listing Id' like
303,50000
Please help me-new to Java.
It may be easier for you to use jaxb to parse the XML into a list of ads.
You can then reference your Java list
I would suggest using XQuery, which unlike XPath can return structured results. (Or XPath 2.0, but if you're going to XPath 2.0 then you might as well go all the way to XQuery).
The relevant query is
string-join(
for $n in /ads/ad/listing_ids/listing_id
return $n/concat(., ',', ../../budget_remaining),
'
'
)
This will return the required output as a single string.

how to get a node value in Xpath - Java

I've got a section of XML that looks like this:
<entry>
<id>tag:example.com,2005:Release/343597</id>
<published>2012-04-10T11:29:19Z</published>
<updated>2012-04-10T12:04:41Z</updated>
<link type="text/html" href="http://example.com/projects/example1" rel="alternate"/>
<title>example1</title>
</entry>
I need to grab the link http://example.com/projects/example1 from this block. I'm not sure how to do this. To get the title of the project I use this code:
String title1 = children.item(9).getFirstChild().getNodeValue();
where children is the getChildNodes() object for the <entry> </entry> block. But I keep getting NullPointerExceptions when I try to get the node value for the <link> node in a similar way. I see that the XML code is different for the <link> node, and I'm not sure what it's value is.... Please advise!
The xpath expression to get that node is
//entry/link/#href
In java you can write
Document doc = ... // your XML document
XPathExpression xp = XPathFactory.newInstance().newXPath().compile("//entry/link/#href");
String href = xp.evaluate(doc);
Then if you need to get the link value of the entry with a specific id you can change the xpath expression to
//entry[id='tag:example.com,2005:Release/343597']/link/#href
Finally if you want to get all the links in the documents, if the document has many entry elements you can write
Document doc = ... // your XML document
XPathExpression xp = XPathFactory.newInstance().newXPath().compile("//entry/link/#href");
NodeList links = (NodeList) xp.evaluate(doc, XPathConstants.NODESET);
// and iterate on links
Here is the complete code:
DocumentBuilderFactory domFactory = DocumentBuilderFactory
.newInstance();
domFactory.setNamespaceAware(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse("test.xml");
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile("//entry/link/#href");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i));
}

How to retrieve a specific node's value in XPath?

I have a XML file with this format:
<object>
<origin>1:1:1</origin>
<normal>2:2:2</normal>
<leafs>
<object>
<origin>1:1:1</origin>
<normal>3:3:3</normal>
<leafs>none</leafs>
</object>
</leafs>
</object>
How could I retrieve the value "none" of element <leafs> on second level of the tree? I used this
XPathExpression expLeafs = xpath.compile("*[name()='leafs']");
Object resLeafs = expLeafs.evaluate(node, XPathConstants.NODESET);
NodeList leafsList = (NodeList) resLeafs;
if (!leafsList.item(0).getFirstChild().getNodeValue().equals("none"))
more code...
but it doesn't work because there are some empty text nodes bofore and after "none". Is there a way to deal with it like xpath.compile("*[value()='none']")?
I just ran a simple test program using your XML file and
expr = xpath.compile("/object/leafs/object/leafs/text()");
and got the desired "none" result. If you have additional requirements, you'll have to edit your question.
After a checking the code line #Lord Torgamus provided i managed to parse the document as i needed like this:
XPathExpression expLeafs = xpath.compile("*[name()='leafs']");
Object resLeafs = expLeafs.evaluate(node, XPathConstants.NODESET);
NodeList leafsList = (NodeList) resLeafs;
Node nd = leafsList.item(0);
XPathExpression expr = xpath.compile("text()");
Object resultObj = expr.evaluate(nd, XPathConstants.NODE);
String str = expr.evaluate(nd).trim();
System.out.println(str);
and the output is "none" with no other empty text node.

Parsing XML in Java from Wordpress feed

private void parseXml(String urlPath) throws Exception {
URL url = new URL(urlPath);
URLConnection connection = url.openConnection();
DocumentBuilder db = DOCUMENT_BUILDER_FACTORY.newDocumentBuilder();
final Document document = db.parse(connection.getInputStream());
XPath xPathEvaluator = XPATH_FACTORY.newXPath();
XPathExpression nameExpr = xPathEvaluator.compile("rss/channel/item/title");
NodeList trackNameNodes = (NodeList) nameExpr.evaluate(document, XPathConstants.NODESET);
for (int i = 0; i < trackNameNodes.getLength(); i++) {
Node trackNameNode = trackNameNodes.item(i);
System.out.println(String.format("Blog Entry Title: %s" , trackNameNode.getTextContent()));
XPathExpression artistNameExpr = xPathEvaluator.compile("rss/channel/item/content:encoded");
NodeList artistNameNodes = (NodeList) artistNameExpr.evaluate(trackNameNode, XPathConstants.NODESET);
for (int j=0; j < artistNameNodes.getLength(); j++) {
System.out.println(String.format(" - Artist Name: %s", artistNameNodes.item(j).getTextContent()));
}
}
}
I have this code for parsing the title and content from the default wordpress xml, the only problem is that when I try to get the content of the blog entry, the xml tag is: <content:encoded> and I do not understand how to retrieve this data ?
The tag <content:encoded> means an element with the name encoded in the XML namespace with the prefix content. The XPath evaluator is probably unable to resolve the content prefix to it's namespace, which I think is http://purl.org/rss/1.0/modules/content/ from a quick Google.
To get it to resolve, you'll need to do the following:
Ensure your DocumentBuilderFactory has setNamespaceAware( true ) called on it after construction, otherwise all namespaces are discarded during parsing.
Write an implementation of javax.xml.namespace.NamespaceContext to resolve the prefix to it's namespace (doc).
Call XPath#setNamespaceContext() with your implementation.
You could also try to use XStream, wich is a good and easy to use XML parser. Makes you have almost no work for parsing known XML structures.
PS: Their site is currently offline, use Google Cache to see it =P

Categories