How to extract all values pointed to by an XPath?


I have the XML below:
<test>
<nodeA>
<nodeB>key</nodeB>
<nodeC>value1</nodeC>
</nodeA>
<nodeA>
<nodeB>key</nodeB>
<nodeC>value2</nodeC>
</nodeA>
</test>
How can I get all the values selected by the XPath /test/nodeA/nodeC and concatenate them?
My expected output in this scenario would be value1value2.
From what I have read, I am not sure this is possible with XPath alone, but thanks for your help.
P.S.: I am using VTD-XML from XimpleWare to parse this in Java, so any Java-based solution is also welcome. Currently my Java solution returns only the first value, i.e. value1.

XPath will return a NodeList which you can iterate and concatenate:
StringBuilder concatenated = new StringBuilder();
XPath xpath = XPathFactory.newInstance().newXPath();
String expression = "/test/nodeA/nodeC/text()";
InputSource inputSource = new InputSource("sample.xml");
NodeList nodes = (NodeList) xpath.evaluate(expression, inputSource, XPathConstants.NODESET);
for(int i = 0; i < nodes.getLength(); i++) {
concatenated.append(nodes.item(i).getTextContent());
}

Here's a Groovy implementation (in 2 lines of code!) using XmlSlurper:
def xml = new groovy.util.XmlSlurper().parse(new File('sample.xml'))
print xml.nodeA*.nodeC.join("")
Outputs
value1value2
I don't use Groovy in production code, but for local experiments it's great. I often have little Groovy utilities in Gradle build files.

If you want a single XPath expression to return a string that concatenates the selected values, you need XPath 2.0 (or later) or XQuery 1.0 (or later), where you can write string-join(/test/nodeA/nodeC, ''). XPath 1.0 does not have the expressive power to build that string, but, as already shown in another answer, you can iterate over the selected nodes and concatenate their values in a host language like Java.
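For reference, here is a minimal, self-contained sketch of the host-language approach using only the JDK's built-in XPath 1.0 engine. The class and method names (XPathConcat, concat) are made up for illustration, and the XML is inlined as a string rather than read from a file.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any answer above.
class XPathConcat {
    // Evaluates an XPath 1.0 expression as a node set and joins the
    // text content of every selected node, in document order.
    static String concat(String xml, String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < nodes.getLength(); i++) {
            sb.append(nodes.item(i).getTextContent());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws XPathExpressionException {
        String xml = "<test>"
                + "<nodeA><nodeB>key</nodeB><nodeC>value1</nodeC></nodeA>"
                + "<nodeA><nodeB>key</nodeB><nodeC>value2</nodeC></nodeA>"
                + "</test>";
        System.out.println(concat(xml, "/test/nodeA/nodeC")); // prints value1value2
    }
}
```

Requesting XPathConstants.STRING instead of NODESET would return only the first node's value, which matches the "only value1" symptom described in the question.";
if (!XPathConcat.concat(xmlUnderTest, "/test/nodeA/nodeC").equals("value1value2")) throw new AssertionError("concat mismatch");
if (!XPathConcat.concat(xmlUnderTest, "/test/nodeA/missing").equals("")) throw new AssertionError("expected empty string");
</test>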

This is how I would do it with VTD-XML...
import com.ximpleware.*;
public class concat {
public static void main(String[] s) throws VTDException{
VTDGen vg = new VTDGen();
if (!vg.parseFile("input.xml", false))
return;
VTDNav vn = vg.getNav();
AutoPilot ap = new AutoPilot(vn);
ap.selectXPath("/test/*/nodeC/text()");
StringBuilder sb = new StringBuilder(100);
int i=0;
while((i=ap.evalXPath())!=-1){
sb.append(vn.toString(i));
}
System.out.println(sb.toString());
}
}

Related

Using XPath count function

I am using an Oracle SQL database to carry out SQL queries with XPath expressions – I have created an XML file which contains data relating to a film

The XPath expression you're looking for (not the SQL expression) is:
count(/film/directors/director)
which should return 1 for your example XML file.
If you want to check if it's 2, use
count(/film/directors/director) = 2
which should return FALSE with your XML file.
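As a rough sketch of how these two expressions could be evaluated from Java, the JDK XPath API can return a number or a boolean directly. The film document below is an assumption (the original XML is not shown here), and XPathCountDemo, count and check are illustrative names.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class; the XML in main is an assumed stand-in for the asker's file.
class XPathCountDemo {
    // Evaluates an expression such as count(...) and returns it as a number.
    static double count(String xml, String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Double) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
    }

    // Evaluates a comparison such as count(...) = 2 and returns it as a boolean.
    static boolean check(String xml, String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Boolean) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) throws XPathExpressionException {
        String xml = "<film><title>Example</title>"
                + "<directors><director>Someone</director></directors></film>";
        System.out.println(count(xml, "count(/film/directors/director)"));     // prints 1.0
        System.out.println(check(xml, "count(/film/directors/director) = 2")); // prints false
    }
}
```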

First, you obviously know you need to use XPath to query the XML file, but you seem to have misunderstood what XPath is and how it should be used.
My first suggestion would be to go and read up on XPath, and on XPath in Java, because it has nothing to do with SQL.
I then did a quick search on "java xpath count" and came across a number of excellent examples. Based on the XPath count() function, I tested your document with...
try {
DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
DocumentBuilder b = f.newDocumentBuilder();
// This is your document in a file
Document d = b.parse(new File("Test.xml"));
d.getDocumentElement().normalize();
String expression = "//film[count(directors)=1]";
XPath xPath = XPathFactory.newInstance().newXPath();
Object result = xPath.compile(expression).evaluate(d, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
System.out.println(nodes.getLength());
for (int i = 0; i < nodes.getLength(); i++) {
Node node = nodes.item(i);
System.out.println("Found " + node.getTextContent());
}
} catch (ParserConfigurationException | SAXException | IOException | XPathExpressionException | DOMException exp) {
exp.printStackTrace();
}
This basically listed the film node (found one match) ... but why did it produce a result? Look at the query, //film[count(directors)=1]: it lists every film that has exactly one directors element, which is what I used to test the query. Change it to //film[count(directors)=2] and it will return zero matches for your example.
I would highly recommend that you pause for a moment and become more familiar with what XPath is and how it works before you continue.

xpath getting multiple node values - xml parser using java

Below is the XML file:
<priority-claims>
<priority-claim sequence="1" kind="national">
<document-id document-id-type="maindoc">
<doc-number>FD0297663</doc-number>
<date>20070403</date>
</document-id>
</priority-claim>
<priority-claim sequence="2" kind="national">
<document-id document-id-type="maindoc">
<doc-number>FD0745459P</doc-number>
<date>20060424</date>
</document-id>
</priority-claim>
</priority-claims>
My requirements:
1. How can I get all the node values (i.e. FD0297663, 20070403 and FD0745459P, 20060424)?
2. There may be a single priority-claim element or more than one.
My existing code gets only the first value:
String priorityNumber = xPath.compile("//priority-claim//doc-number").evaluate(xmlDocument);
String priorityDate = xPath.compile("//priority-claim//date").evaluate(xmlDocument);

Below is a working example. It makes two changes:
1. updates the XPath expressions (e.g. /priority-claims/priority-claim/document-id/doc-number/text())
2. evaluates them as a NodeList
NodeList priorityNumbers = (NodeList) xPath.compile("/priority-claims/priority-claim/document-id/doc-number/text()").evaluate(xmlDocument, XPathConstants.NODESET);
NodeList priorityDates = (NodeList) xPath.compile("/priority-claims/priority-claim/document-id/date/text()").evaluate(xmlDocument,XPathConstants.NODESET);
for(int i=0; i<priorityNumbers.getLength();i++){
System.out.println(priorityNumbers.item(i).getNodeValue());
}
for(int i=0; i<priorityDates.getLength();i++){
System.out.println(priorityDates.item(i).getNodeValue());
}
Here is a link to a gist with a runnable version: https://gist.github.com/rparree/1c7eb8e9ca928b98418fdb167a2096a3

Comparing date with Xpath in java

I have an XML page that I parse using DOM in Java. When I perform a query using XPath, for example price <10 or price >20, I get the expected result. However, I cannot get any results when I try to compare by date. NetBeans says it's successful, but does not give me any results.
This code is from the XML page:
<?xml version="1.0" encoding="UTF-8"?><catalog ><book id="bk101"><author>Gambardella, Matthew</author><title>XML Developer's Guide</title><genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
</catalog>
This is my Java code:
public class Main {
public static void main(String[] args) throws XPathExpressionException, FileNotFoundException {
XPathFactory factory = XPathFactory.newInstance();
XPath path = factory.newXPath();
XPathExpression xPathExpression=path.compile("//book[price >10]/* ");
//| //book[price>10]/*
File xmlDocument =new File("books.xml");
InputSource inputSource = new InputSource(new FileInputStream(xmlDocument));
Object result = xPathExpression.evaluate(inputSource,XPathConstants.NODESET);
NodeList nodeList = (NodeList)result;
for (int i = 0; i < nodeList.getLength(); i++) {
System.out.print(nodeList.item(i).getNodeName()+" ");
System.out.print(nodeList.item(i).getFirstChild().getNodeValue());
System.out.print("\n");
}
}
}
What I need to do is compare publish_date to a predefined date, with something like this:
//book[publish_date>2000-01-01]/*

XPath (at least version 1.0) cannot compare dates, but you can turn each date into an integer that sorts in the same order by stripping the hyphens with the translate function:
//book[translate(publish_date,'-','') > 20000101]
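The translate trick can be tried from Java roughly as follows. This is only a sketch: XPathDateFilter and bookIdsPublishedAfter are illustrative names, the cutoff is passed as a yyyymmdd integer, and the second book (bk000) is invented so that the filter has something to exclude.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
class XPathDateFilter {
    // Returns the id attribute of every book whose publish_date, with the
    // hyphens removed, is numerically greater than the given yyyymmdd cutoff.
    static List<String> bookIdsPublishedAfter(String xml, int yyyymmdd)
            throws XPathExpressionException {
        String expression =
                "//book[translate(publish_date,'-','') > " + yyyymmdd + "]/@id";
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            ids.add(nodes.item(i).getNodeValue());
        }
        return ids;
    }

    public static void main(String[] args) throws XPathExpressionException {
        String xml = "<catalog>"
                + "<book id=\"bk101\"><publish_date>2000-10-01</publish_date></book>"
                + "<book id=\"bk000\"><publish_date>1999-12-16</publish_date></book>" // invented entry
                + "</catalog>";
        System.out.println(bookIdsPublishedAfter(xml, 20000101)); // prints [bk101]
    }
}
```

This relies on the dates being zero-padded in yyyy-mm-dd form, so that numeric order and chronological order agree.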

How to read an XML in java w/o DOM?

I have an XML file and am reading the information using XPath. I want to read 'listing_id' and 'budget_remaining' together.
XML example
<ads>
<ad>
<listing_ids>
<listing_id>2235</listing_id>
<listing_id>303</listing_id>
<listing_id>394</listing_id>
</listing_ids>
<reference_id>11</reference_id>
<net_ppe>0.55</net_ppe>
<budget_remaining>50000.0</budget_remaining>
</ad>
<ad>
<listing_ids>
<listing_id>2896</listing_id>
</listing_ids>
<reference_id>8</reference_id>
<net_ppe>1.5</net_ppe>
<budget_remaining>1.3933399</budget_remaining>
</ad>
</ads>
I want to output it to a CSV file as the following
ListingId,BudgetRemaining
2235,50000
303,50000
394,50000
2896,1.39
Using the code
String expression = "/ads/ad/listing_ids/listing_id";
System.out.println(expression);
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(docum, XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); i++) {
System.out.println(nodeList.item(i).getFirstChild().getNodeValue());
}
String expression1 = "/ads/ad/budget_remaining";
System.out.println(expression1);
NodeList nodeList1 = (NodeList) xPath.compile(expression1).evaluate(docum, XPathConstants.NODESET);
for (int i = 0; i < nodeList1.getLength(); i++) {
System.out.println(nodeList1.item(i).getFirstChild().getNodeValue());
}
Output
/ads/ad/listing_ids/listing_id
2235
303
394
2896
/ads/ad/budget_remaining
50000.0
1.3933399
Desired Output
2235,50000.0
303,50000.0
394,50000.0
2896,1.3933399
How can I read the XML using XPath or any other method? I want 'listing_id' and 'budget_remaining' to be read together for each listing ID, like
303,50000
Please help me; I am new to Java.

It may be easier for you to use JAXB to parse the XML into a list of ads.
You can then work with the resulting Java list.

I would suggest using XQuery, which unlike XPath can return structured results. (Or XPath 2.0, but if you're going to XPath 2.0 then you might as well go all the way to XQuery).
The relevant query is
string-join(
for $n in /ads/ad/listing_ids/listing_id
return $n/concat(., ',', ../../budget_remaining),
'
'
)
This will return the required output as a single string.
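If an XQuery or XPath 2.0 processor is not available, roughly the same pairing can be sketched with the JDK's XPath 1.0 engine by selecting each listing_id and then evaluating a relative path from that node. The names ListingBudgetPairs and pairs are hypothetical, and the XML is inlined without whitespace for brevity.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
class ListingBudgetPairs {
    // Builds one "listing_id,budget_remaining" line per listing_id element.
    static List<String> pairs(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList ids = (NodeList) xpath.evaluate(
                "/ads/ad/listing_ids/listing_id", doc, XPathConstants.NODESET);
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < ids.getLength(); i++) {
            Node id = ids.item(i);
            // Relative path: listing_id -> listing_ids -> ad -> budget_remaining.
            String budget = xpath.evaluate("../../budget_remaining", id);
            lines.add(id.getTextContent() + "," + budget);
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<ads>"
                + "<ad><listing_ids><listing_id>2235</listing_id><listing_id>303</listing_id>"
                + "<listing_id>394</listing_id></listing_ids>"
                + "<budget_remaining>50000.0</budget_remaining></ad>"
                + "<ad><listing_ids><listing_id>2896</listing_id></listing_ids>"
                + "<budget_remaining>1.3933399</budget_remaining></ad>"
                + "</ads>";
        for (String line : pairs(xml)) {
            System.out.println(line);
        }
    }
}
```

Each line keeps the budget value exactly as it appears in the XML; rounding to two decimals for the CSV would be a separate formatting step.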

Parsing XML in Java from Wordpress feed

private void parseXml(String urlPath) throws Exception {
URL url = new URL(urlPath);
URLConnection connection = url.openConnection();
DocumentBuilder db = DOCUMENT_BUILDER_FACTORY.newDocumentBuilder();
final Document document = db.parse(connection.getInputStream());
XPath xPathEvaluator = XPATH_FACTORY.newXPath();
XPathExpression nameExpr = xPathEvaluator.compile("rss/channel/item/title");
NodeList trackNameNodes = (NodeList) nameExpr.evaluate(document, XPathConstants.NODESET);
for (int i = 0; i < trackNameNodes.getLength(); i++) {
Node trackNameNode = trackNameNodes.item(i);
System.out.println(String.format("Blog Entry Title: %s" , trackNameNode.getTextContent()));
XPathExpression artistNameExpr = xPathEvaluator.compile("rss/channel/item/content:encoded");
NodeList artistNameNodes = (NodeList) artistNameExpr.evaluate(trackNameNode, XPathConstants.NODESET);
for (int j=0; j < artistNameNodes.getLength(); j++) {
System.out.println(String.format(" - Artist Name: %s", artistNameNodes.item(j).getTextContent()));
}
}
}
I have this code for parsing the title and content from the default WordPress XML feed. The only problem is that when I try to get the content of a blog entry, the XML tag is <content:encoded>, and I do not understand how to retrieve this data.

The tag <content:encoded> means an element with the name encoded in the XML namespace bound to the prefix content. The XPath evaluator is probably unable to resolve the content prefix to its namespace, which I think is http://purl.org/rss/1.0/modules/content/ from a quick Google search.
To get it to resolve, you'll need to do the following:
1. Ensure your DocumentBuilderFactory has setNamespaceAware( true ) called on it after construction, otherwise all namespaces are discarded during parsing.
2. Write an implementation of javax.xml.namespace.NamespaceContext to resolve the prefix to its namespace.
3. Call XPath#setNamespaceContext() with your implementation.
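The three steps above can be sketched as follows. This assumes the namespace URI guessed in the answer is the one the feed declares; ContentNamespaceDemo and titlesAndContent are illustrative names, and the tiny RSS document is made up. Note that the per-item paths are relative to each item node, rather than starting again from rss.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class for illustration only.
class ContentNamespaceDemo {
    // Assumed URI for the "content" prefix, as suggested in the answer.
    static final String CONTENT_NS = "http://purl.org/rss/1.0/modules/content/";

    // Step 2: a minimal NamespaceContext that only knows the "content" prefix.
    static final NamespaceContext CONTEXT = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            return "content".equals(prefix) ? CONTENT_NS : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) {
            return CONTENT_NS.equals(uri) ? "content" : null;
        }
        @Override public Iterator<String> getPrefixes(String uri) {
            String prefix = getPrefix(uri);
            return prefix == null
                    ? Collections.<String>emptyIterator()
                    : Collections.singletonList(prefix).iterator();
        }
    };

    // Returns one "title: content" string per item in the feed.
    static List<String> titlesAndContent(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // step 1
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(CONTEXT); // step 3
        NodeList items = (NodeList) xpath.evaluate(
                "/rss/channel/item", doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            Node item = items.item(i);
            String title = xpath.evaluate("title", item);
            String content = xpath.evaluate("content:encoded", item);
            result.add(title + ": " + content);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rss xmlns:content=\"" + CONTENT_NS + "\"><channel>"
                + "<item><title>Hello</title><content:encoded>Body</content:encoded></item>"
                + "</channel></rss>";
        System.out.println(titlesAndContent(xml)); // prints [Hello: Body]
    }
}
```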

You could also try XStream, which is a good and easy-to-use XML library. It leaves you with almost no work when parsing known XML structures.
PS: Their site is currently offline, use Google Cache to see it =P
