Parsing XML in Java from Wordpress feed - java

private void parseXml(String urlPath) throws Exception {
URL url = new URL(urlPath);
URLConnection connection = url.openConnection();
DocumentBuilder db = DOCUMENT_BUILDER_FACTORY.newDocumentBuilder();
final Document document = db.parse(connection.getInputStream());
XPath xPathEvaluator = XPATH_FACTORY.newXPath();
XPathExpression nameExpr = xPathEvaluator.compile("rss/channel/item/title");
NodeList trackNameNodes = (NodeList) nameExpr.evaluate(document, XPathConstants.NODESET);
for (int i = 0; i < trackNameNodes.getLength(); i++) {
Node trackNameNode = trackNameNodes.item(i);
System.out.println(String.format("Blog Entry Title: %s" , trackNameNode.getTextContent()));
XPathExpression artistNameExpr = xPathEvaluator.compile("rss/channel/item/content:encoded");
NodeList artistNameNodes = (NodeList) artistNameExpr.evaluate(trackNameNode, XPathConstants.NODESET);
for (int j=0; j < artistNameNodes.getLength(); j++) {
System.out.println(String.format(" - Artist Name: %s", artistNameNodes.item(j).getTextContent()));
}
}
}
I have this code for parsing the title and content from the default WordPress XML feed. The only problem is retrieving the content of each blog entry: the XML tag is <content:encoded>, and I do not understand how to retrieve this data.

The tag <content:encoded> means an element with the local name encoded in the XML namespace bound to the prefix content. The XPath evaluator is probably unable to resolve the content prefix to its namespace, which I think is http://purl.org/rss/1.0/modules/content/ from a quick Google.
To get it to resolve, you'll need to do the following:
1. Ensure your DocumentBuilderFactory has setNamespaceAware(true) called on it after construction; otherwise namespaces are discarded during parsing.
2. Write an implementation of javax.xml.namespace.NamespaceContext to resolve the prefix to its namespace.
3. Call XPath#setNamespaceContext() with your implementation.
Note also that your second expression is evaluated relative to the title node, so even with the prefix resolved it would match nothing. Evaluate content:encoded relative to each item element instead.
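The three steps above can be sketched roughly as follows. This is only a minimal illustration, not the asker's final code: the class name FeedContentReader, the helper readEntries, and the "title => content" output format are made up for the example, and the namespace URI is the one guessed above, so check it against the xmlns:content declaration in your own feed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class, named for this example only.
class FeedContentReader {

    // Assumed namespace URI for the "content" prefix; verify it in your feed.
    static final String CONTENT_NS = "http://purl.org/rss/1.0/modules/content/";

    // Step 2: resolves the "content" prefix used in the XPath expressions.
    static final class ContentNamespaceContext implements NamespaceContext {
        @Override
        public String getNamespaceURI(String prefix) {
            if (prefix == null) {
                throw new IllegalArgumentException("prefix must not be null");
            }
            return "content".equals(prefix) ? CONTENT_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return CONTENT_NS.equals(namespaceURI) ? "content" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            String prefix = getPrefix(namespaceURI);
            List<String> prefixes = prefix == null
                    ? Collections.<String>emptyList()
                    : Collections.singletonList(prefix);
            return prefixes.iterator();
        }
    }

    // Returns one "title => content" string per <item> in the feed.
    static List<String> readEntries(String rssXml) {
        try {
            // Step 1: namespace-aware parsing.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document document = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(rssXml.getBytes(StandardCharsets.UTF_8)));

            // Step 3: register the namespace context on the XPath object.
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new ContentNamespaceContext());

            NodeList items = (NodeList) xpath.evaluate(
                    "/rss/channel/item", document, XPathConstants.NODESET);
            List<String> entries = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Node item = items.item(i);
                // Both expressions are relative to the current <item>.
                String title = xpath.evaluate("title", item);
                String content = xpath.evaluate("content:encoded", item);
                entries.add(title + " => " + content);
            }
            return entries;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read feed", e);
        }
    }
}
```

To read from a URL as in the question, the same parsing code can be given connection.getInputStream() instead of the in-memory byte stream.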

You could also try XStream, which is a good, easy-to-use XML library. It leaves you almost no work when parsing known XML structures.
PS: Their site is currently offline, use Google Cache to see it =P

Related

Java XPath scan file looking for a word

I'm building an application that will take a word from the user and then scan a file using XPath, returning true or false depending on whether the word was found in that file.
I have built the following class that uses XPath, but either I am misunderstanding how it should work or there is something wrong with my code. Can anyone explain how to use XPath to search a whole file?
public XPath() throws IOException, SAXException, ParserConfigurationException, XPathExpressionException {
FileInputStream fileIS = new FileInputStream("text.xml");
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(fileIS);
XPathFactory xPathfactory = XPathFactory.newInstance();
javax.xml.xpath.XPath xPath = xPathfactory.newXPath();
XPathExpression expr = xPath.compile("//text()[contains(.,'java')]");
System.out.println(expr.evaluate(xmlDocument, XPathConstants.NODESET));
}
And the XML file I am currently testing on:
<?xml version="1.0"?>
<Tutorials>
<Tutorial tutId="01" type="java">
<title>Guava</title>
<description>Introduction to Guava</description>
<date>04/04/2016</date>
<author>GuavaAuthor</author>
</Tutorial>
<Tutorial tutId="02" type="java">
<title>XML</title>
<description>Introduction to XPath</description>
<date>04/05/2016</date>
<author>XMLAuthor</author>
</Tutorial>
</Tutorials>
Found the solution: I was not displaying the found entries correctly, and, as someone pointed out in a comment, 'java' appears only in attributes while I was scanning only text nodes, so it would never be found. After adding the following code and changing the word my app looks for, the application works:
Object result = expr.evaluate(xmlDocument, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getNodeValue());
}
Your XPath is searching the text() nodes, but the word java appears in the @type attribute (which is not a text() node).
If you want to search for the word in both text() and @*, then you could use the union operator | and check for either or both containing that word:
//text()[contains(., 'java')] | //@*[contains(., 'java')]
But you might also want to scan comment() and processing-instruction() nodes, so you could match generically on node() and then test in the predicate:
//node()[contains(., 'java')] | //@*[contains(., 'java')]
With XPath 2.0 or greater, you could use:
//node()[(.|@*)[contains(., 'java')]]
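As a rough sketch of how the XPath 1.0 union expression could be wired into Java, the helper below returns the values of all matching text and attribute nodes. The class name WordSearch and the method findMatches are invented for this example. Passing the word as an XPath variable ($word) is a design choice made here, not something from the question, to avoid quoting problems when the word contains an apostrophe.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this example only.
class WordSearch {

    // Returns the value of every text node or attribute containing the word.
    // Note that XPath contains() is case-sensitive.
    static List<String> findMatches(String xml, String word) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Supply the search word as the XPath variable $word.
            xpath.setXPathVariableResolver(
                    name -> "word".equals(name.getLocalPart()) ? word : null);

            NodeList nodes = (NodeList) xpath.evaluate(
                    "//text()[contains(., $word)] | //@*[contains(., $word)]",
                    doc, XPathConstants.NODESET);

            List<String> matches = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                matches.add(nodes.item(i).getNodeValue());
            }
            return matches;
        } catch (Exception e) {
            throw new IllegalStateException("Could not search XML", e);
        }
    }
}
```

A true/false answer, as the question asks for, is then simply !WordSearch.findMatches(xml, word).isEmpty().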

How to write XPath to get node attribute value from a "Name Space XML" in Java

INPUT_XML:
<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:ns1="http://path1/schema1" xmlns:ns2="http://path2/schema2">
<ns1:abc>1234</ns1:abc>
<ns2:def>5678</ns2:def>
</root>
In Java, I am trying to write XPath expression which will get the value corresponding to this attribute "xmlns:ns1" from the above INPUT_XML string content.
I've tried the following:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(INPUT_XML);
String xpathExpression = "/root/xmlns:ns1";
// Create XPathFactory object
XPathFactory xpathFactory = XPathFactory.newInstance();
// Create XPath object
XPath xpath = xpathFactory.newXPath();
// Create XPathExpression object
XPathExpression expr = xpath.compile(xpathExpression);
// Evaluate expression result on XML document
NodeList nodes = (NodeList) expr.evaluate(document, XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getNodeValue());
}
But the above code is not giving the expected value of the specified attribute i.e. xmlns:ns1. I heavily suspect the xPathExpression is wrong. Please suggest with the right XPath expression or the right approach to tackle this issue.
If you're using an XPath 1.0 processor, or an XPath 2.0 processor with XPath 1.0 compatibility mode turned on, you can use the namespace axis to select the namespace value.
You will need to make the following change in your code:
String xpathExpression = "/root/namespace::ns1";
The xmlns:ns1="http://path1/schema1" and xmlns:ns2="http://path2/schema2" entries are not attributes but namespace declarations. You cannot retrieve them so easily with an XPath expression (there is the XPath function namespace-uri() for this purpose, but the root element is not in any namespace; it only declares them for use by its descendants).
When using DOM API you could use method lookupNamespaceURI():
System.out.println("ns1 = " + doc.getDocumentElement().lookupNamespaceURI("ns1"));
System.out.println("ns2 = " + doc.getDocumentElement().lookupNamespaceURI("ns2"));
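A self-contained version of the DOM approach might look like the sketch below. The class name NamespaceLookup and the method namespaceFor are invented for the example. One detail worth noting: DocumentBuilder.parse(String) treats its argument as a URI, so XML held in a string is wrapped in an InputSource here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this example only.
class NamespaceLookup {

    // Returns the namespace URI bound to the given prefix on the root element.
    static String namespaceFor(String xml, String prefix) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace awareness is needed for the prefix lookup to work.
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().lookupNamespaceURI(prefix);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```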
When using XPath you could try following expressions:
namespace-uri(/*[local-name()='root']/*[local-name()='abc'])
namespace-uri(/*[local-name()='root']/*[local-name()='def'])

How to read an XML in java w/o DOM?

I have an XML file and am reading the information using XPath. I want to read 'listing_id' and 'budget_remaining' together.
XML example
<ads>
<ad>
<listing_ids>
<listing_id>2235</listing_id>
<listing_id>303</listing_id>
<listing_id>394</listing_id>
</listing_ids>
<reference_id>11</reference_id>
<net_ppe>0.55</net_ppe>
<budget_remaining>50000.0</budget_remaining>
</ad>
<ad>
<listing_ids>
<listing_id>2896</listing_id>
</listing_ids>
<reference_id>8</reference_id>
<net_ppe>1.5</net_ppe>
<budget_remaining>1.3933399</budget_remaining>
</ad>
</ads>
I want to output it to a CSV file as the following
ListingId,BudgetRemaining
2235,50000
303,50000
394,50000
2896,1.39
Using the code
String expression = "/ads/ad/listing_ids/listing_id";
System.out.println(expression);
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(docum, XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); i++) {
System.out.println(nodeList.item(i).getFirstChild().getNodeValue());
}
String expression1 = "/ads/ad/budget_remaining";
System.out.println(expression1);
NodeList nodeList1 = (NodeList) xPath.compile(expression1).evaluate(docum, XPathConstants.NODESET);
for (int i = 0; i < nodeList1.getLength(); i++) {
System.out.println(nodeList1.item(i).getFirstChild().getNodeValue());
}
Output
/ads/ad/listing_ids/listing_id
2235
303
394
2896
/ads/ad/budget_remaining
50000.0
1.3933399
Desired Output
2235,50000.0
303,50000.0
394,50000.0
2896,1.3933399
How can I read the XML using XPath or any other method? I want 'listing_id' and 'budget_remaining' to be read together for each listing ID, like
303,50000
Please help; I am new to Java.
It may be easier for you to use JAXB to parse the XML into a list of ads.
You can then work with that Java list directly.
I would suggest using XQuery, which unlike XPath can return structured results. (Or XPath 2.0, but if you're going to XPath 2.0 then you might as well go all the way to XQuery).
The relevant query is
string-join(
for $n in /ads/ad/listing_ids/listing_id
return $n/concat(., ',', ../../budget_remaining),
'&#xa;'
)
This will return the required output as a single string.
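If you would rather stay with the JAXP XPath 1.0 API from the question than bring in an XQuery processor, the same pairing can be done in Java by selecting each listing_id and then evaluating a relative expression from that node back up to its ad. This is only a sketch: the class name ListingBudgetCsv and the method toCsv are made up for the example, and the budget is written exactly as it appears in the XML, without rounding.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this example only.
class ListingBudgetCsv {

    // Builds CSV text with one row per listing_id and its ad's budget_remaining.
    static String toCsv(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            NodeList ids = (NodeList) xpath.evaluate(
                    "/ads/ad/listing_ids/listing_id", doc, XPathConstants.NODESET);

            StringBuilder csv = new StringBuilder("ListingId,BudgetRemaining\n");
            for (int i = 0; i < ids.getLength(); i++) {
                Node id = ids.item(i);
                // Walk up from listing_id to listing_ids to ad, then down to the budget.
                String budget = xpath.evaluate("../../budget_remaining", id);
                csv.append(id.getTextContent().trim())
                   .append(',')
                   .append(budget.trim())
                   .append('\n');
            }
            return csv.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not convert XML to CSV", e);
        }
    }
}
```

The returned string can be written to a file with, for example, java.nio.file.Files.write.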

How do I get the tag 'Name' from a XML Node in Java (Android)

I have a tiny little problem parsing an XML file in Java (Android).
I have an XML file that is like this:
<Events>
<Event Name="Olympus Has Fallen">
...
</Event>
<Event Name="Iron Man 3">
...
</Event>
</Events>
I already managed to get the NodeList by doing this:
URL url = new URL("********");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new InputSource(url.openStream()));
doc.getDocumentElement().normalize();
NodeList nodeList = doc.getElementsByTagName("Event");
Also I managed to get every single item of the NodeList by doing this:
for (int i = 0; i < nodeList.getLength(); i++) {
// Item
Node node = nodeList.item(i);
Log.i("film", node.getNodeName());
}
But this just Logs: "Event" instead of the value of the Name tag.
How do I output the value of this 'Name' tag from the XML?
Can anyone help me with this one?
Thanks in advance!
But this just Logs: "Event" instead of the value of the Name tag.
Yes, because you're asking for the name of the element. There isn't a Name "tag" - there's a Name attribute, and that's what you should find:
// Only check in elements, and only those which actually have attributes.
if (node.hasAttributes()) {
NamedNodeMap attributes = node.getAttributes();
Node nameAttribute = attributes.getNamedItem("Name");
if (nameAttribute != null) {
System.out.println("Name attribute: " + nameAttribute.getTextContent());
}
}
(It's very important to be precise in terminology - it's worth knowing the difference between nodes, elements, attributes etc. It will help you enormously both when communicating with others and when looking for the right bits of API to call.)
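Since getElementsByTagName only ever returns elements, another option is to cast each node to Element and call getAttribute directly. The sketch below assumes the XML shape shown in the question; the class name EventNames and the method eventNames are invented for the example, and it uses a plain list rather than Android's Log so that it runs anywhere.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this example only.
class EventNames {

    // Collects the Name attribute of every <Event> element.
    static List<String> eventNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList events = doc.getElementsByTagName("Event");

            List<String> names = new ArrayList<>();
            for (int i = 0; i < events.getLength(); i++) {
                Element event = (Element) events.item(i);
                // getAttribute returns an empty string when the attribute is absent.
                names.add(event.getAttribute("Name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```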

how to get a node value in Xpath - Java

I've got a section of XML that looks like this:
<entry>
<id>tag:example.com,2005:Release/343597</id>
<published>2012-04-10T11:29:19Z</published>
<updated>2012-04-10T12:04:41Z</updated>
<link type="text/html" href="http://example.com/projects/example1" rel="alternate"/>
<title>example1</title>
</entry>
I need to grab the link http://example.com/projects/example1 from this block. I'm not sure how to do this. To get the title of the project I use this code:
String title1 = children.item(9).getFirstChild().getNodeValue();
where children is the NodeList returned by getChildNodes() for the <entry> element. But I keep getting NullPointerExceptions when I try to get the node value of the <link> node in a similar way. I see that the XML is different for the <link> node, and I'm not sure what its value is. Please advise!
The xpath expression to get that node is
//entry/link/@href
In java you can write
Document doc = ... // your XML document
XPathExpression xp = XPathFactory.newInstance().newXPath().compile("//entry/link/@href");
String href = xp.evaluate(doc);
Then if you need to get the link value of the entry with a specific id you can change the xpath expression to
//entry[id='tag:example.com,2005:Release/343597']/link/@href
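A runnable sketch of the lookup by id is shown below. The class name EntryLink and the method linkFor are invented for the example, and the id is passed in as an XPath variable ($id) rather than pasted into the expression, which is a choice made here for safety. It assumes the XML has no namespace declaration, as in the excerpt; a complete Atom feed normally declares a default namespace, in which case the parser would need to be namespace-aware and the XPath object would need a NamespaceContext.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this example only.
class EntryLink {

    // Returns the href of the <link> in the entry with the given id,
    // or an empty string when no such entry exists.
    static String linkFor(String xml, String entryId) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Supply the entry id as the XPath variable $id.
            xpath.setXPathVariableResolver(
                    name -> "id".equals(name.getLocalPart()) ? entryId : null);
            return xpath.evaluate("//entry[id = $id]/link/@href", doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }
}
```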
Finally if you want to get all the links in the documents, if the document has many entry elements you can write
Document doc = ... // your XML document
XPathExpression xp = XPathFactory.newInstance().newXPath().compile("//entry/link/@href");
NodeList links = (NodeList) xp.evaluate(doc, XPathConstants.NODESET);
// and iterate on links
Here is the complete code:
DocumentBuilderFactory domFactory = DocumentBuilderFactory
.newInstance();
domFactory.setNamespaceAware(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse("test.xml");
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile("//entry/link/@href");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getNodeValue());
}
