Comparing dates with XPath in Java

I have an XML page that I parse using DOM in Java. When I perform a query using XPath, for example price <10 or price >20, I get the expected result. However, I cannot get any results when I try to compare by date. NetBeans says it's successful, but does not give me any results.
This code is from the XML page:
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
</catalog>
This is my Java code:
public class Main {
public static void main(String[] args) throws XPathExpressionException, FileNotFoundException {
XPathFactory factory = XPathFactory.newInstance();
XPath path = factory.newXPath();
XPathExpression xPathExpression=path.compile("//book[price >10]/* ");
//| //book[price>10]/*
File xmlDocument =new File("books.xml");
InputSource inputSource = new InputSource(new FileInputStream(xmlDocument));
Object result = xPathExpression.evaluate(inputSource,XPathConstants.NODESET);
NodeList nodeList = (NodeList)result;
for (int i = 0; i < nodeList.getLength(); i++) {
System.out.print(nodeList.item(i).getNodeName()+" ");
System.out.print(nodeList.item(i).getFirstChild().getNodeValue());
System.out.print("\n");
}
}
}
What I need to do is to compare publish_date to a predefined date.
"//book[publish_date>2000-01-01]/*
something like this

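XPath (at least 1.0) has no date type, so it cannot compare dates directly. In publish_date > 2000-01-01 the right-hand side is parsed as the subtraction 2000 - 01 - 01 (that is, 1998) and the left-hand side converts to NaN, so the predicate is never true. Because the dates are in yyyy-MM-dd form, you can strip the hyphens with the translate function and compare the results as numbers, which preserves chronological order: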
//book[translate(publish_date,'-','') > 20000101]
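A minimal sketch of how that expression could be used from Java, assuming Java 7 or later. The class name DateFilter, the method titlesPublishedAfter, and the second sample book are made up for illustration; the cutoff is passed as a yyyyMMdd integer so it lines up with the translated date.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DateFilter {
    // Hypothetical helper: titles of books whose publish_date (yyyy-MM-dd)
    // is later than the cutoff, given as a yyyyMMdd integer such as 20000101.
    static List<String> titlesPublishedAfter(String xml, int cutoff) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // translate() removes the hyphens, so "2000-10-01" is compared as 20001001.
        String expression = "//book[translate(publish_date, '-', '') > " + cutoff + "]/title";
        NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            titles.add(nodes.item(i).getTextContent());
        }
        return titles;
    }

    public static void main(String[] args) throws XPathExpressionException {
        // Sample data; the second book is invented to show a non-matching date.
        String xml = "<catalog>"
                + "<book><title>XML Developer's Guide</title><publish_date>2000-10-01</publish_date></book>"
                + "<book><title>Older Book</title><publish_date>1999-12-31</publish_date></book>"
                + "</catalog>";
        System.out.println(titlesPublishedAfter(xml, 20000101));
    }
}
```

This only works because the date format is fixed-width with the year first; a format like 04/04/2016 would need its parts reordered before comparing.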

Related

Java XPath scan file looking for a word

I'm building an application that will take a word from the user and then scan a file using XPath, returning true or false depending on whether the word was found in that file.
I have built the following class that uses XPath, but either I am misunderstanding how it should work or there is something wrong with my code. Can anyone explain how to use XPath to search a whole file?
public XPath() throws IOException, SAXException, ParserConfigurationException, XPathExpressionException {
FileInputStream fileIS = new FileInputStream("text.xml");
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(fileIS);
XPathFactory xPathfactory = XPathFactory.newInstance();
javax.xml.xpath.XPath xPath = xPathfactory.newXPath();
XPathExpression expr = xPath.compile("//text()[contains(.,'java')]");
System.out.println(expr.evaluate(xmlDocument, XPathConstants.NODESET));
}
And the xml file i am currently testing on.
<?xml version="1.0"?>
<Tutorials>
<Tutorial tutId="01" type="java">
<title>Guava</title>
<description>Introduction to Guava</description>
<date>04/04/2016</date>
<author>GuavaAuthor</author>
</Tutorial>
<Tutorial tutId="02" type="java">
<title>XML</title>
<description>Introduction to XPath</description>
<date>04/05/2016</date>
<author>XMLAuthor</author>
</Tutorial>
</Tutorials>
I found the solution. I was not displaying the matches correctly and, as someone pointed out in a comment, 'java' appears only in attributes while my expression scans only text nodes, so it would never be found. After adding the following code and changing the word my app looks for, the application works:
Object result = expr.evaluate(xmlDocument, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getNodeValue());
}
Your XPath is searching the text() nodes, but the word java appears in the @type attribute (which is not a text() node).
If you want to search for the word in both text() and @* then you could use the union operator | and check for either/both containing that word:
//text()[contains(., 'java')] | //@*[contains(., 'java')]
But you might also want to scan comment() and processing-instruction(), so you could match generically on node() and then test in the predicate:
//node()[contains(., 'java')] | //@*[contains(., 'java')]
With XPath 2.0 or greater, you could use:
//node()[(.|@*)[contains(., 'java')]]
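For the true/false result the question asks for, the XPath 1.0 union above can be evaluated with XPathConstants.BOOLEAN, which is true when the selected node-set is non-empty. This is only a sketch: the class name WordSearch and the method containsWord are made up, and the word is spliced into the expression, so it is assumed to contain no single quote.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class WordSearch {
    // Hypothetical helper: true if the word occurs in any text node or attribute value.
    // Assumption: the word contains no single quote, because it is pasted into the expression.
    static boolean containsWord(String xml, String word) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expression = "//text()[contains(., '" + word + "')]"
                + " | //@*[contains(., '" + word + "')]";
        // Requesting BOOLEAN converts the node-set: non-empty means true.
        return (Boolean) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) throws XPathExpressionException {
        String xml = "<Tutorials><Tutorial tutId=\"01\" type=\"java\">"
                + "<title>Guava</title></Tutorial></Tutorials>";
        System.out.println(containsWord(xml, "java"));   // found in the type attribute
        System.out.println(containsWord(xml, "python")); // not present anywhere
    }
}
```

Note that contains() is case-sensitive, so "Java" and "java" are different words as far as this check is concerned.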

How to extract all values pointed to by an XPath?

I have the below xml
<test>
<nodeA>
<nodeB>key</nodeB>
<nodeC>value1</nodeC>
</nodeA>
<nodeA>
<nodeB>key</nodeB>
<nodeC>value2</nodeC>
</nodeA>
</test>
How to concatenate and get all the values in the xpath /test/nodeA/nodeC ?
My expected output in this scenario would be value1value2
I am not sure from what I have read that it is possible with xpath, but thanks for your help.
P.S: I am using VTD-XML from Ximpleware to parse the same in Java. Any java based solution is also welcome. Currently my java solution gives only the first value, i.e. value1
XPath will return a NodeList which you can iterate and concatenate:
StringBuilder concatenated = new StringBuilder();
XPath xpath = XPathFactory.newInstance().newXPath();
String expression = "/test/nodeA/nodeC/text()";
InputSource inputSource = new InputSource("sample.xml");
NodeList nodes = (NodeList) xpath.evaluate(expression, inputSource, XPathConstants.NODESET);
for(int i = 0; i < nodes.getLength(); i++) {
concatenated.append(nodes.item(i).getTextContent());
}
Here's a groovy implementation (in 2 lines of code!) using XmlSlurper
def xml = new groovy.util.XmlSlurper().parse(new File('sample.xml'))
print xml.nodeA*.nodeC.join("")
Outputs
value1value2
I don't use groovy in production code but for local mucking about it's great. I often have little groovy utilities in gradle build files.
If you want a single XPath expression to return the concatenation of the selected values as a string, you need XPath 2.0 (or later) or XQuery 1.0 (or later), where you can write string-join(/test/nodeA/nodeC, ''). XPath 1.0 does not have the expressive power to give you that string but, as already shown in another answer, you can iterate over the selected nodes and concatenate their values in a host language like Java.
This is how I would do it with VTD-XML...
import com.ximpleware.*;
public class concat {
public static void main(String[] s) throws VTDException{
VTDGen vg = new VTDGen();
if (!vg.parseFile("input.xml", false))
return;
VTDNav vn = vg.getNav();
AutoPilot ap = new AutoPilot(vn);
ap.selectXPath("/test/*/nodeC/text()");
StringBuilder sb = new StringBuilder(100);
int i=0;
while((i=ap.evalXPath())!=-1){
sb.append(vn.toString(i));
}
System.out.println(sb.toString());
}
}

Complex search query through XML records

I have a list of objects which contain one XML String field. I have to execute an SQL like query for that field, and get a sub list that satisfies the values. I am trying to use XPath.
Firstly, I can't figure out the XPath string to achieve this. Secondly, there might be a better way of doing this. I tried searching through SO but the answers don't really address this problem
Details
I have a list of books:
List<Book> allBooks;
The Book class can have an id and details fields. The details is XML.
class Book
{
String id;
String details; //XML
}
Here is a sample of the details xml String:
<book>
<name>Harry Potter and the sorcerer's stone</name>
<author>J K Rowling</author>
<genre>fantasy</genre>
<keyword>wizard</keyword>
<keyword>british</keyword>
<keyword>hogwarts</keyword>
<price>25</price>
</book>
So, up to here it is all set in stone. It is part of existing code and I cannot change that design.
My work is to take the list allBooks & run a query through it, the logic of which is:
WHERE author = "J K Rowling" AND
genre = "fantasy" AND
(keyword = "wizard" OR keyword="hogwarts")
I considered throwing this data in a DB to run an actual query, but since the list will only contain a couple of hundred records, the overhead of connection, loading data etc is not worth it.
Anyone know how to do this through XPath? Any better way of doing this?
We need book records
//book
with author "J K Rowling"
//book[author = "J K Rowling"]
and genre is "fantasy"
//book[author = "J K Rowling" and genre = "fantasy"]
and keyword is "wizard" or "hogwarts"
//book[author = "J K Rowling" and genre = "fantasy" and (keyword = "wizard" or keyword = "hogwarts")]
You need to build the XPath queries first. I recommend referring to a previous answer for those (hoaz has a good listing here). Then you need to write the code to compile the query and evaluate it. Example:
public List<Book> findBookInformation(List<Book> books)
throws ParserConfigurationException, SAXException,
IOException, XPathExpressionException {
List<Book> foundBooks = new ArrayList<Book>(); // books matching criteria
for (Book book : books) {
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true); // never forget this!
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse(new InputSource(new StringReader(book.details))); // parse details XML into a Doc object
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
//using one of the query examples
XPathExpression expr = xpath.compile("/book[author = \"J K Rowling\" and genre = \"fantasy\" and (keyword = \"wizard\" or keyword = \"hogwarts\")]");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
if (null != nodes && nodes.getLength() > 0) {
foundBooks.add(book); // add to your return list
}
}
return foundBooks;
}
You could extend a method like this to take in your query arguments to dynamically build your XPath query, but this should give you the basic idea.
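One way to take the query arguments without concatenating them into the XPath string is to use XPath variables with an XPathVariableResolver. The sketch below assumes Java 8 or later (for the lambda); the class name BookQuery, the method matches, and the variable names author, genre, kw1 and kw2 are chosen here for illustration and are not part of the original code.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class BookQuery {
    // Hypothetical helper: true if the <book> XML matches the author, the genre,
    // and at least one of the two keywords supplied in params.
    static boolean matches(String detailsXml, Map<String, String> params) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Each $name in the expression is looked up in the map; every variable
        // used below must be present, otherwise evaluation fails.
        xpath.setXPathVariableResolver(name -> params.get(name.getLocalPart()));
        String expression =
                "/book[author = $author and genre = $genre and (keyword = $kw1 or keyword = $kw2)]";
        // BOOLEAN is true when the expression selects at least one node.
        return (Boolean) xpath.evaluate(
                expression, new InputSource(new StringReader(detailsXml)), XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) throws XPathExpressionException {
        Map<String, String> params = new HashMap<>();
        params.put("author", "J K Rowling");
        params.put("genre", "fantasy");
        params.put("kw1", "wizard");
        params.put("kw2", "hogwarts");
        String details = "<book><author>J K Rowling</author><genre>fantasy</genre>"
                + "<keyword>british</keyword><keyword>hogwarts</keyword></book>";
        System.out.println(matches(details, params));
    }
}
```

Because the values travel as variables, an apostrophe in a title or author name (as in "sorcerer's") cannot break the expression.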
Assuming Books is the root element:
/Books/Book[(author = "J K Rowling") and (genre = "fantasy") and (keyword = "wizard" or keyword = "hogwarts")]

How to access to value read XML using XPath in Java

I want to read XML data using XPath in Java.
I have the next XML file named MyXML.xml:
<?xml version="1.0" encoding="iso-8859-1" ?>
<REPOSITORY xmlns:LIBRARY="http://www.openarchives.org/LIBRARY/2.0/"
xmlns:xsi="http://www.w3.prg/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/LIBRARY/2.0/ http://www.openarchives.org/LIBRARY/2.0/LIBRARY-PHM.xsd">
<repository>Test</repository>
<records>
<record>
<ejemplar>
<library_book:book
xmlns:library_book="http://www.w3c.es/LIBRARY/book/"
xmlns:book="http://www.w3c.es/LIBRARY/book/"
xmlns:bookAssets="http://www.w3c.es/LIBRARY/book/"
xmlns:bookAsset="http://www.w3c.es/LIBRARY/book/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3c.es/LIBRARY/book/ http://www.w3c.es/LIBRARY/replacement/book.xsd">
<book:bookAssets count="1">
<book:bookAsset nasset="1">
<book:bookAsset.id>value1</book:bookAsset.id>
<book:bookAsset.event>
<book:bookAsset.event.id>value2</book:bookAsset.event.id>
</book:bookAsset.event>
</book:bookAsset>
</book:bookAssets>
</library_book:book>
</ejemplar>
</record>
</records>
</REPOSITORY>
I want to access the values value1 and value2. To do that, I tried this:
// Standard of reading a XML file
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder;
Document doc = null;
XPathExpression expr = null;
builder = factory.newDocumentBuilder();
doc = builder.parse("MyXML.xml");
// Create a XPathFactory
XPathFactory xFactory = XPathFactory.newInstance();
// Create a XPath object
XPath xpath = xFactory.newXPath();
expr = xpath.compile("//REPOSITORY/records/record/ejemplar/library_book:book//book:bookAsset.event.id/text()");
Object result = expr.evaluate(doc, XPathConstants.STRING);
System.out.println("RESULT=" + (String)result);
But I don't get any results. Only prints RESULT=.
How can I access the values value1 and value2? What XPath expression should I use?
Thanks in advance.
I'm using JDK6.
You are having problems with namespaces. You can either:
1. take them into account, or
2. ignore them using the XPath local-name() function.
Solution 1 means implementing a NamespaceContext that maps namespace prefixes to URIs and setting it on the XPath object before querying.
Solution 2 is easy: you just need to change your XPath (though, depending on your XML, you may need to fine-tune it to be sure you select the correct element):
XPath xpath = xFactory.newXPath();
expr = xpath.compile("//*[local-name()='bookAsset.event.id']/text()");
Object result = expr.evaluate(doc, XPathConstants.STRING);
System.out.println("RESULT=" + result);
Runnable example on ideone.
You can take a look at the following blog article to better understand the uses of namespaces and XPath in Java (even if old)
Try
Object result = expr.evaluate(doc, XPathConstants.NODESET);
// Cast the result to a DOM NodeList
NodeList nodes = (NodeList) result;
for (int i=0; i<nodes.getLength();i++){
System.out.println(nodes.item(i).getNodeValue());
}
One approach is to implement a name space context like:
public static class UniversalNamespaceResolver implements NamespaceContext {
private Document sourceDocument;
public UniversalNamespaceResolver(Document document) {
sourceDocument = document;
}
public String getNamespaceURI(String prefix) {
if (prefix.equals(XMLConstants.DEFAULT_NS_PREFIX)) {
return sourceDocument.lookupNamespaceURI(null);
} else {
return sourceDocument.lookupNamespaceURI(prefix);
}
}
public String getPrefix(String namespaceURI) {
return sourceDocument.lookupPrefix(namespaceURI);
}
public Iterator getPrefixes(String namespaceURI) {
return null;
}
}
And then use it like
xpath.setNamespaceContext(new UniversalNamespaceResolver(doc));
You also need to move up all the namespace declarations to the root node (REPOSITORY). Otherwise it might be a problem if you have namespace declarations on two different levels.

Parsing XML in Java from Wordpress feed

private void parseXml(String urlPath) throws Exception {
URL url = new URL(urlPath);
URLConnection connection = url.openConnection();
DocumentBuilder db = DOCUMENT_BUILDER_FACTORY.newDocumentBuilder();
final Document document = db.parse(connection.getInputStream());
XPath xPathEvaluator = XPATH_FACTORY.newXPath();
XPathExpression nameExpr = xPathEvaluator.compile("rss/channel/item/title");
NodeList trackNameNodes = (NodeList) nameExpr.evaluate(document, XPathConstants.NODESET);
for (int i = 0; i < trackNameNodes.getLength(); i++) {
Node trackNameNode = trackNameNodes.item(i);
System.out.println(String.format("Blog Entry Title: %s" , trackNameNode.getTextContent()));
XPathExpression artistNameExpr = xPathEvaluator.compile("rss/channel/item/content:encoded");
NodeList artistNameNodes = (NodeList) artistNameExpr.evaluate(trackNameNode, XPathConstants.NODESET);
for (int j=0; j < artistNameNodes.getLength(); j++) {
System.out.println(String.format(" - Artist Name: %s", artistNameNodes.item(j).getTextContent()));
}
}
}
I have this code for parsing the title and content from the default wordpress xml, the only problem is that when I try to get the content of the blog entry, the xml tag is: <content:encoded> and I do not understand how to retrieve this data ?
The tag <content:encoded> means an element with the name encoded in the XML namespace bound to the prefix content. The XPath evaluator is probably unable to resolve the content prefix to its namespace, which I think is http://purl.org/rss/1.0/modules/content/ from a quick Google.
To get it to resolve, you'll need to do the following:
Ensure your DocumentBuilderFactory has setNamespaceAware( true ) called on it after construction, otherwise all namespaces are discarded during parsing.
Write an implementation of javax.xml.namespace.NamespaceContext to resolve the prefix to its namespace (doc).
Call XPath#setNamespaceContext() with your implementation.
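The three steps above can be sketched as follows. This is an illustration under assumptions, not the asker's code: the class name FeedContent and the method entries are made up, the feed is read from a string instead of a URL, and the namespace URI is the one guessed above, so check it against the xmlns:content declaration in the actual feed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FeedContent {
    // Assumed URI for the "content" prefix; verify it against the feed itself.
    static final String CONTENT_NS = "http://purl.org/rss/1.0/modules/content/";

    // Step 2: a NamespaceContext that only knows the "content" prefix.
    static final NamespaceContext CONTEXT = new NamespaceContext() {
        public String getNamespaceURI(String prefix) {
            return "content".equals(prefix) ? CONTENT_NS : XMLConstants.NULL_NS_URI;
        }
        public String getPrefix(String uri) {
            return CONTENT_NS.equals(uri) ? "content" : null;
        }
        public Iterator<String> getPrefixes(String uri) {
            String prefix = getPrefix(uri);
            return prefix == null
                    ? Collections.<String>emptyIterator()
                    : Collections.singletonList(prefix).iterator();
        }
    };

    // Hypothetical helper: returns "title: content" for every item in the feed.
    static List<String> entries(String rssXml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // Step 1: keep namespaces while parsing
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(rssXml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(CONTEXT); // Step 3
        NodeList items = (NodeList) xpath.evaluate("/rss/channel/item", doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            Node item = items.item(i);
            // Both paths are relative to the current item element.
            String title = xpath.evaluate("title", item);
            String content = xpath.evaluate("content:encoded", item);
            result.add(title + ": " + content);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String rss = "<rss xmlns:content=\"" + CONTENT_NS + "\"><channel><item>"
                + "<title>Hello</title><content:encoded>Body text</content:encoded>"
                + "</item></channel></rss>";
        System.out.println(entries(rss));
    }
}
```

Selecting the item elements first and then evaluating title and content:encoded relative to each item keeps every title paired with its own content.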
You could also try XStream, which is a good and easy-to-use XML library. It leaves you almost no work when parsing known XML structures.
PS: Their site is currently offline, use Google Cache to see it =P
