Complex search query through XML records - java

I have a list of objects, each containing one XML String field. I have to execute an SQL-like query against that field and get the sub-list of objects that satisfy the conditions. I am trying to use XPath.
Firstly, I can't figure out the XPath string to achieve this. Secondly, there might be a better way of doing this. I tried searching through SO, but the answers don't really address this problem.
Details
I have a list of books:
List<Book> allBooks;
The Book class has an id field and a details field; details holds an XML string.
class Book
{
String id;
String details; //XML
}
Here is a sample of the details xml String:
<book>
<name>Harry Potter and the sorcerer's stone</name>
<author>J K Rowling</author>
<genre>fantasy</genre>
<keyword>wizard</keyword>
<keyword>british</keyword>
<keyword>hogwarts</keyword>
<price>25</price>
</book>
So, up to here it is all set in stone. It is part of existing code and I cannot change that design.
My work is to take the list allBooks and run a query through it, the logic of which is:
WHERE author = "J K Rowling" AND
genre = "fantasy" AND
(keyword = "wizard" OR keyword="hogwarts")
I considered throwing this data into a DB to run an actual query, but since the list will only contain a couple of hundred records, the overhead of connecting, loading data, etc. is not worth it.
Anyone know how to do this through XPath? Any better way of doing this?

We need book records
//book
with author "J K Rowling"
//book[author = "J K Rowling"]
and genre is "fantasy"
//book[author = "J K Rowling" and genre = "fantasy"]
and keyword is "wizard" or "hogwarts"
//book[author = "J K Rowling" and genre = "fantasy" and (keyword = "wizard" or keyword = "hogwarts")]
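To check the final expression, here is a minimal sketch that parses the sample details XML from the question and evaluates the predicate as a boolean. Because each details string has <book> as its root, it uses /book rather than //book (both match here). BookQueryCheck and matches are illustrative names, not part of the question's code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class BookQueryCheck {
    // The query from the answer above, anchored at the root <book> element.
    static final String QUERY =
        "/book[author = 'J K Rowling' and genre = 'fantasy'"
        + " and (keyword = 'wizard' or keyword = 'hogwarts')]";

    // Returns true if the details XML has a <book> root matching QUERY.
    static boolean matches(String detailsXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(detailsXml)));
        // A node-set converted to BOOLEAN is true when it is non-empty.
        return (Boolean) XPathFactory.newInstance().newXPath()
                .evaluate(QUERY, doc, XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) throws Exception {
        String details = "<book>"
            + "<name>Harry Potter and the sorcerer's stone</name>"
            + "<author>J K Rowling</author>"
            + "<genre>fantasy</genre>"
            + "<keyword>wizard</keyword>"
            + "<keyword>british</keyword>"
            + "<keyword>hogwarts</keyword>"
            + "<price>25</price>"
            + "</book>";
        System.out.println(matches(details)); // true
    }
}
```

Note that keyword = 'wizard' compares a node-set with a string, so it is true if any of the <keyword> elements has that value; this is what makes the OR over repeated elements work.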

You need to build the XPath queries first. I recommend referring to a previous answer for those (hoaz has a good listing here). Then you need to write the code to compile the query and evaluate it. Example:
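import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public List<Book> findBookInformation(List<Book> books)
throws ParserConfigurationException, SAXException,
IOException, XPathExpressionException {
List<Book> foundBooks = new ArrayList<Book>(); // books matching criteria
// create the parser and compile the query once, not once per book
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true); // never forget this!
DocumentBuilder builder = domFactory.newDocumentBuilder();
XPath xpath = XPathFactory.newInstance().newXPath();
//using one of the query examples; each details string has <book> as its root
XPathExpression expr = xpath.compile("/book[author = \"J K Rowling\" and genre = \"fantasy\" and (keyword = \"wizard\" or keyword = \"hogwarts\")]");
for (Book book : books) {
Document doc = builder.parse(new InputSource(new StringReader(book.details))); // parse details XML into a Doc object
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
if (nodes.getLength() > 0) {
foundBooks.add(book); // add to your return list
}
}
return foundBooks;
}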
You could extend a method like this to take in your query arguments to dynamically build your XPath query, but this should give you the basic idea.
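A minimal sketch of that extension, under assumptions: rather than concatenating argument values into the XPath string (which breaks as soon as a value contains a quote), it binds them as XPath variables through the standard XPath.setXPathVariableResolver hook. DynamicBookQuery, findBooks and the variable names $author, $genre, $kw0, $kw1, ... are hypothetical; the nested Book class only mirrors the question's Book.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DynamicBookQuery {
    // Mirrors the question's Book class (id + XML details).
    static class Book {
        String id;
        String details;
        Book(String id, String details) { this.id = id; this.details = details; }
    }

    // Hypothetical helper: books whose details match
    // author AND genre AND (keyword = any of keywords).
    static List<Book> findBooks(List<Book> books, String author, String genre,
                                List<String> keywords) throws Exception {
        Map<String, Object> vars = new HashMap<>();
        vars.put("author", author);
        vars.put("genre", genre);
        StringBuilder query = new StringBuilder("/book[author = $author and genre = $genre");
        if (!keywords.isEmpty()) {
            query.append(" and (");
            for (int i = 0; i < keywords.size(); i++) {
                if (i > 0) query.append(" or ");
                query.append("keyword = $kw").append(i);
                vars.put("kw" + i, keywords.get(i));
            }
            query.append(")");
        }
        query.append("]");

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Values come from the resolver instead of being pasted into the query,
        // so a value such as O'Brien cannot break the expression.
        // The resolver must be set before compile().
        xpath.setXPathVariableResolver(name -> vars.get(name.getLocalPart()));
        XPathExpression expr = xpath.compile(query.toString());

        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        List<Book> found = new ArrayList<>();
        for (Book book : books) {
            Document doc = builder.parse(new InputSource(new StringReader(book.details)));
            if ((Boolean) expr.evaluate(doc, XPathConstants.BOOLEAN)) {
                found.add(book);
            }
        }
        return found;
    }
}
```

For the question's query this would be called as findBooks(allBooks, "J K Rowling", "fantasy", Arrays.asList("wizard", "hogwarts")). Compiling once and reusing the expression keeps the per-record cost to a parse and an evaluation, which should be fine for a few hundred records.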

Assuming Books is the root element:
/Books/Book[(author = "J K Rowling") and (genre = "fantasy") and (keyword = "wizard" or keyword = "hogwarts")]

Related

Java XPath scan file looking for a word

I'm building an application that will take a word from the user and then scan a file using XPath, returning true or false depending on whether the word was found in that file or not.
I have built the following class that implements XPath, but I am either misunderstanding how it should work or there is something wrong with my code. Can anyone explain to me how to use XPath to do a full-file search?
public XPath() throws IOException, SAXException, ParserConfigurationException, XPathExpressionException {
FileInputStream fileIS = new FileInputStream("text.xml");
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(fileIS);
XPathFactory xPathfactory = XPathFactory.newInstance();
javax.xml.xpath.XPath xPath = xPathfactory.newXPath();
XPathExpression expr = xPath.compile("//text()[contains(.,'java')]");
System.out.println(expr.evaluate(xmlDocument, XPathConstants.NODESET));
}
And the xml file i am currently testing on.
<?xml version="1.0"?>
<Tutorials>
<Tutorial tutId="01" type="java">
<title>Guava</title>
<description>Introduction to Guava</description>
<date>04/04/2016</date>
<author>GuavaAuthor</author>
</Tutorial>
<Tutorial tutId="02" type="java">
<title>XML</title>
<description>Introduction to XPath</description>
<date>04/05/2016</date>
<author>XMLAuthor</author>
</Tutorial>
</Tutorials>
Found the solution. I was not displaying the found entries correctly, and, as someone pointed out in a comment, 'java' appears only in attributes while I was scanning only text nodes, so it could never be found. After adding the following code and changing the word my app looks for, the application works:
Object result = expr.evaluate(xmlDocument, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getNodeValue());
}
Your XPath is searching the text() nodes, but the word java appears in the @type attribute (which is not a text() node).
If you want to search for the word in both text() and @*, then you could use the union | operator and check for either or both containing that word:
//text()[contains(. ,'java')] | //@*[contains(., 'java')]
But you might also want to scan comment() and processing-instruction() nodes, so you could match generically on node() and then test in the predicate:
//node()[contains(. ,'java')] | //@*[contains(., 'java')]
With XPath 2.0 or greater, you could use:
//node()[(.|@*)[contains(., 'java')]]
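Applied to the original goal (true or false depending on whether the word occurs anywhere), a minimal sketch that converts the union node-set to a boolean. WordSearch and containsWord are illustrative names, and passing the word as an XPath variable $w is an assumption of this sketch rather than part of the answer; it avoids quoting problems when the word contains an apostrophe.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class WordSearch {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // true if the word occurs in any text node or any attribute value
    static boolean containsWord(Document doc, String word) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // $w is the only variable in the expression, so every lookup returns the word.
        xpath.setXPathVariableResolver(name -> word);
        String expr = "//text()[contains(., $w)] | //@*[contains(., $w)]";
        // A node-set converted to BOOLEAN is true when it is non-empty.
        return (Boolean) xpath.evaluate(expr, doc, XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<Tutorials>"
                + "<Tutorial tutId=\"01\" type=\"java\"><title>Guava</title></Tutorial>"
                + "</Tutorials>");
        System.out.println(containsWord(doc, "java"));   // true (type attribute)
        System.out.println(containsWord(doc, "Guava"));  // true (title text)
        System.out.println(containsWord(doc, "python")); // false
    }
}
```

contains() is a case-sensitive substring test, so "jav" would also match; a whole-word or case-insensitive search would need a different predicate.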

Using XPath count function

I am using an Oracle SQL database to carry out SQL queries with XPath expressions. I have created an XML file which contains data relating to a film.
The XPath expression you're looking for (not the SQL expression) is:
count(/film/directors/director)
whose result should be 1 with your example XML file.
If you want to check whether it's 2, use
count(/film/directors/director) = 2
which should return false with your XML file.
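From Java, both expressions can be evaluated directly: count() comes back through XPathConstants.NUMBER as a Double, and the comparison through XPathConstants.BOOLEAN. The question's film XML is not shown here, so the sketch uses a hypothetical film document with one director; FilmCount and its methods are illustrative names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class FilmCount {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // count() returns an XPath number, which JAXP hands back as a Double.
    static double directorCount(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Double) xpath.evaluate("count(/film/directors/director)",
                doc, XPathConstants.NUMBER);
    }

    // A comparison expression yields an XPath boolean.
    static boolean hasDirectors(Document doc, int n) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Boolean) xpath.evaluate("count(/film/directors/director) = " + n,
                doc, XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical film document; the question's own XML is not shown.
        Document doc = parse("<film><title>Example</title>"
                + "<directors><director>Jane Doe</director></directors></film>");
        System.out.println(directorCount(doc));   // 1.0
        System.out.println(hasDirectors(doc, 2)); // false
    }
}
```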
First, you obviously know you need to use XPath to query the XML file, but you seem not to have understood what XPath is or how it should be used.
My first suggestion would be to go and read up on XPath and XPath in Java, because it has nothing to do with SQL.
I then did a quick search on "java xpath count" and came across a number of excellent examples, but based on XPath count() function, I went about testing your document with...
try {
DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
DocumentBuilder b = f.newDocumentBuilder();
// This is your document in a file
Document d = b.parse(new File("Test.xml"));
d.getDocumentElement().normalize();
String expression = "//film[count(directors)=1]";
XPath xPath = XPathFactory.newInstance().newXPath();
Object result = xPath.compile(expression).evaluate(d, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
System.out.println(nodes.getLength());
for (int i = 0; i < nodes.getLength(); i++) {
Node node = nodes.item(i);
System.out.println("Found " + node.getTextContent());
}
} catch (ParserConfigurationException | SAXException | IOException | XPathExpressionException | DOMException exp) {
exp.printStackTrace();
}
This basically listed the film node (one match)... but why did it produce a result? Look at the query, //film[count(directors)=1]: it lists every film element that has exactly one directors child, because I wanted to test the query. Change it to //film[count(directors)=2] and it will return zero matches for your example.
I would highly recommend that you pause for a moment and become more familiar with what XPath is and how it works before you continue.

Comparing date with Xpath in java

I have an XML page that I parse using DOM in Java. When I perform a query using XPath, for example price < 10 or price > 20, I get the expected result. However, I cannot get any results when I try to compare by date. NetBeans reports the run as successful, but it does not give me any results.
This code is from the XML page:
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
</catalog>
This is my Java code:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class Main {
public static void main(String[] args) throws XPathExpressionException, FileNotFoundException {
XPathFactory factory = XPathFactory.newInstance();
XPath path = factory.newXPath();
XPathExpression xPathExpression=path.compile("//book[price >10]/* ");
//| //book[price>10]/*
File xmlDocument =new File("books.xml");
InputSource inputSource = new InputSource(new FileInputStream(xmlDocument));
Object result = xPathExpression.evaluate(inputSource,XPathConstants.NODESET);
NodeList nodeList = (NodeList)result;
for (int i = 0; i < nodeList.getLength(); i++) {
System.out.print(nodeList.item(i).getNodeName()+" ");
System.out.print(nodeList.item(i).getFirstChild().getNodeValue());
System.out.print("\n");
}
}
}
What I need to do is compare publish_date to a predefined date, with something like this:
//book[publish_date > 2000-01-01]/*
XPath (at least 1.0) cannot compare dates, but you can turn a date into an integer with the right ordering by removing the dashes with the translate() function:
//book[translate(publish_date,'-','') > 20000101]
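The same trick from Java, as a sketch. It assumes every publish_date is zero-padded yyyy-MM-dd (otherwise the digit order no longer matches date order) and passes the cutoff to the expression as an XPath variable $cutoff. DateFilter and titlesPublishedAfter are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DateFilter {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Titles of books whose publish_date is after cutoff (both yyyy-MM-dd).
    static List<String> titlesPublishedAfter(Document doc, String cutoff) throws Exception {
        // 2000-01-01 -> 20000101; with zero-padded fields, numeric order = date order
        Double cutoffNumber = Double.valueOf(cutoff.replace("-", ""));
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setXPathVariableResolver(name -> cutoffNumber); // binds $cutoff
        NodeList titles = (NodeList) xpath.evaluate(
                "//book[translate(publish_date, '-', '') > $cutoff]/title",
                doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < titles.getLength(); i++) {
            result.add(titles.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<catalog><book id=\"bk101\">"
                + "<title>XML Developer's Guide</title>"
                + "<publish_date>2000-10-01</publish_date></book></catalog>");
        System.out.println(titlesPublishedAfter(doc, "2000-01-01"));
    }
}
```

In XPath 1.0 the > operator converts both sides to numbers, so the translated string 20001001 is compared numerically with 20000101. By contrast, the question's unquoted 2000-01-01 is read as the arithmetic expression 2000 - 1 - 1.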

java adding 2 xml documents together

I have two separate Java Document objects:
Doc1 :
<CIS_REQUEST>
<Request1>
</CIS_REQUEST>
<CIS_RESPONSE>
<RESPONSE1>
<RESPONSE2>
</CIS_RESPONSE>
Doc2 :
<CIS_REQUEST>
<Request1>
</CIS_REQUEST>
I want the resulting document to look like:
<CIS_REQUEST>
<Request1>
</CIS_REQUEST>
<CIS_RESPONSE>
<RESPONSE1>
<RESPONSE2>
</CIS_RESPONSE>
<PROCESSING>1</PROCESSING>
<CIS_PROFILES>
<CIS_REQUEST>
<Request1>
</CIS_REQUEST>
</CIS_PROFILES>
The code I have so far:
Document doc2 = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument(combinedResponse);
counter++;
String counter_str = Integer.toString(counter);
Element count = doc2.createElement("PROCESSING");
root.appendChild(count);
Text counter_text = doc2.createTextNode(counter_str);
count.appendChild(counter_text);
Element profileElement = doc2.createElement(profName + "_profiles");
profileElement.append(doc1) //I need some replacement for this code.
Can anyone educate me on how I can just append one document to another, and not insert it somewhere in the original document?
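One common DOM approach, sketched under assumptions: a node owned by one Document cannot be appended directly to another (appendChild throws a DOMException with WRONG_DOCUMENT_ERR), so the subtree is first copied with Document.importNode(node, true), or moved with Document.adoptNode, and then appended. Because a DOM Document can hold only one root element, the sketch wraps Doc1's content in a hypothetical <CIS> root. MergeDocuments, appendProfile and toXml are illustrative names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class MergeDocuments {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Appends <PROCESSING>counter</PROCESSING> and a <CIS_PROFILES> element
    // holding a copy of other's root element to the root of target.
    static void appendProfile(Document target, Document other, int counter) {
        Element root = target.getDocumentElement();
        Element processing = target.createElement("PROCESSING");
        processing.appendChild(target.createTextNode(Integer.toString(counter)));
        root.appendChild(processing);

        Element profiles = target.createElement("CIS_PROFILES");
        // importNode deep-copies the subtree into target's ownership;
        // other is left unchanged.
        Node copy = target.importNode(other.getDocumentElement(), true);
        profiles.appendChild(copy);
        root.appendChild(profiles);
    }

    static String toXml(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical wrapper root <CIS>: a DOM Document allows only one root element.
        Document doc1 = parse("<CIS><CIS_REQUEST><Request1/></CIS_REQUEST>"
                + "<CIS_RESPONSE><RESPONSE1/><RESPONSE2/></CIS_RESPONSE></CIS>");
        Document doc2 = parse("<CIS_REQUEST><Request1/></CIS_REQUEST>");
        appendProfile(doc1, doc2, 1);
        System.out.println(toXml(doc1));
    }
}
```

importNode leaves doc2 intact; adoptNode would instead move the nodes out of doc2, which avoids a copy if doc2 is no longer needed.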

Parsing XML in Java from Wordpress feed

private void parseXml(String urlPath) throws Exception {
URL url = new URL(urlPath);
URLConnection connection = url.openConnection();
DocumentBuilder db = DOCUMENT_BUILDER_FACTORY.newDocumentBuilder();
final Document document = db.parse(connection.getInputStream());
XPath xPathEvaluator = XPATH_FACTORY.newXPath();
XPathExpression nameExpr = xPathEvaluator.compile("rss/channel/item/title");
NodeList trackNameNodes = (NodeList) nameExpr.evaluate(document, XPathConstants.NODESET);
for (int i = 0; i < trackNameNodes.getLength(); i++) {
Node trackNameNode = trackNameNodes.item(i);
System.out.println(String.format("Blog Entry Title: %s" , trackNameNode.getTextContent()));
XPathExpression artistNameExpr = xPathEvaluator.compile("rss/channel/item/content:encoded");
NodeList artistNameNodes = (NodeList) artistNameExpr.evaluate(trackNameNode, XPathConstants.NODESET);
for (int j=0; j < artistNameNodes.getLength(); j++) {
System.out.println(String.format(" - Artist Name: %s", artistNameNodes.item(j).getTextContent()));
}
}
}
I have this code for parsing the title and content from the default WordPress XML. The only problem is that when I try to get the content of the blog entry, the XML tag is <content:encoded>, and I do not understand how to retrieve this data.
The tag <content:encoded> means an element with the name encoded in the XML namespace with the prefix content. The XPath evaluator is probably unable to resolve the content prefix to its namespace, which I think is http://purl.org/rss/1.0/modules/content/ from a quick Google.
To get it to resolve, you'll need to do the following:
Ensure your DocumentBuilderFactory has setNamespaceAware( true ) called on it after construction, otherwise all namespaces are discarded during parsing.
Write an implementation of javax.xml.namespace.NamespaceContext to resolve the prefix to its namespace (see its Javadoc).
Call XPath#setNamespaceContext() with your implementation.
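The three steps above might look like this in code. The sketch assumes the namespace URI http://purl.org/rss/1.0/modules/content/ mentioned in the answer; FeedContent, CTX and encodedContents are hypothetical names. It also evaluates content:encoded relative to each <item> node, because the question's inner expression rss/channel/item/content:encoded, evaluated with a title node as context, looks for an rss child of title and so cannot match.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FeedContent {
    static final String CONTENT_NS = "http://purl.org/rss/1.0/modules/content/";

    // Step 2: a minimal NamespaceContext that only knows the "content" prefix.
    static final NamespaceContext CTX = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            return "content".equals(prefix) ? CONTENT_NS : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) {
            return CONTENT_NS.equals(uri) ? "content" : null;
        }
        @Override public Iterator<String> getPrefixes(String uri) {
            if (CONTENT_NS.equals(uri)) {
                return Collections.singletonList("content").iterator();
            }
            return Collections.<String>emptyIterator();
        }
    };

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // step 1: keep namespace information
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    static List<String> encodedContents(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(CTX); // step 3
        NodeList items = (NodeList) xpath.evaluate("/rss/channel/item", doc, XPathConstants.NODESET);
        List<String> bodies = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            Node item = items.item(i);
            // Relative to the current <item>; "content" maps to CONTENT_NS via CTX.
            System.out.println("Blog Entry Title: " + xpath.evaluate("title", item));
            bodies.add(xpath.evaluate("content:encoded", item));
        }
        return bodies;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<rss xmlns:content=\"" + CONTENT_NS + "\"><channel><item>"
                + "<title>Hello</title><content:encoded>Body text</content:encoded>"
                + "</item></channel></rss>");
        System.out.println(encodedContents(doc));
    }
}
```

The prefix used in the XPath expression only has to map to the same URI as the document's; it does not need to match the prefix written in the feed.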
You could also try XStream, which is a good and easy-to-use XML library; it takes almost no work to parse known XML structures.
PS: Their site is currently offline, use Google Cache to see it =P
