How to read an XML in java w/o DOM? - java

I have an XML file that I am reading with XPath. I want to read 'listing_id' and 'budget_remaining' together.
XML example
<ads>
<ad>
<listing_ids>
<listing_id>2235</listing_id>
<listing_id>303</listing_id>
<listing_id>394</listing_id>
</listing_ids>
<reference_id>11</reference_id>
<net_ppe>0.55</net_ppe>
<budget_remaining>50000.0</budget_remaining>
</ad>
<ad>
<listing_ids>
<listing_id>2896</listing_id>
</listing_ids>
<reference_id>8</reference_id>
<net_ppe>1.5</net_ppe>
<budget_remaining>1.3933399</budget_remaining>
</ad>
</ads>
I want to output it to a CSV file as follows:
ListingId,BudgetRemaining
2235,50000
303,50000
394,50000
2896,1.39
Using the code
String expression = "/ads/ad/listing_ids/listing_id";
System.out.println(expression);
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(docum, XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); i++) {
    System.out.println(nodeList.item(i).getFirstChild().getNodeValue());
}
String expression1 = "/ads/ad/budget_remaining";
System.out.println(expression1);
NodeList nodeList1 = (NodeList) xPath.compile(expression1).evaluate(docum, XPathConstants.NODESET);
for (int i = 0; i < nodeList1.getLength(); i++) {
    System.out.println(nodeList1.item(i).getFirstChild().getNodeValue());
}
Output
/ads/ad/listing_ids/listing_id
2235
303
394
2896
/ads/ad/budget_remaining
50000.0
1.3933399
Desired Output
2235,50000.0
303,50000.0
394,50000.0
2896,1.3933399
How can I read the XML using XPath or any other method? I want 'listing_id' and 'budget_remaining' to be read together for each listing ID, like
303,50000
Please help me; I am new to Java.

It may be easier to use JAXB to unmarshal the XML into a list of ads.
You can then work with that Java list directly.

I would suggest using XQuery, which unlike XPath can return structured results. (Or XPath 2.0, but if you're going to XPath 2.0 then you might as well go all the way to XQuery).
The relevant query is
string-join(
for $n in /ads/ad/listing_ids/listing_id
return $n/concat(., ',', ../../budget_remaining),
'
'
)
This will return the required output as a single string.
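If you would rather stay with the XPath 1.0 support that ships with the JDK, the same pairing can be done in Java by evaluating a relative path from each listing_id node. This is only a minimal sketch: the class name ListingBudgetCsv and the method toCsv are made up for illustration, and the XML is a shortened copy of the question's sample.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original question.
public class ListingBudgetCsv {

    // Builds one "listing_id,budget_remaining" line per listing_id element.
    static String toCsv(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xPath = XPathFactory.newInstance().newXPath();

        NodeList ids = (NodeList) xPath.evaluate(
                "/ads/ad/listing_ids/listing_id", doc, XPathConstants.NODESET);

        StringBuilder csv = new StringBuilder("ListingId,BudgetRemaining\n");
        for (int i = 0; i < ids.getLength(); i++) {
            Node id = ids.item(i);
            // Relative path: listing_id -> listing_ids -> ad -> budget_remaining
            String budget = xPath.evaluate("../../budget_remaining", id);
            csv.append(id.getTextContent().trim())
               .append(',')
               .append(budget.trim())
               .append('\n');
        }
        return csv.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<ads>"
                + "<ad><listing_ids><listing_id>2235</listing_id>"
                + "<listing_id>303</listing_id></listing_ids>"
                + "<budget_remaining>50000.0</budget_remaining></ad>"
                + "<ad><listing_ids><listing_id>2896</listing_id></listing_ids>"
                + "<budget_remaining>1.3933399</budget_remaining></ad>"
                + "</ads>";
        System.out.print(toCsv(xml));
    }
}
```

The budget values are copied as text, so rounding 1.3933399 to 1.39 would be a separate formatting step.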

Related

Using XPath count function

I am using an Oracle SQL database to carry out SQL queries with XPath expressions. I have created an XML file which contains data relating to a film
The XPath expression you're looking for (not the SQL expression) is:
count(/film/directors/director)
which should evaluate to 1 with your example XML file.
If you want to check if it's 2, use
count(/film/directors/director) = 2
which should return FALSE with your XML file.
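For anyone evaluating these two expressions from Java rather than SQL, here is a rough sketch using the JDK's XPath API. The class name DirectorCount and the one-director film XML are assumptions made up for the example, since the original file is not shown here.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical class for illustration only.
public class DirectorCount {

    // count() yields an XPath number, which the JDK returns as a Double.
    static int countDirectors(String xml) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        Double n = (Double) xPath.evaluate(
                "count(/film/directors/director)",
                new InputSource(new StringReader(xml)),
                XPathConstants.NUMBER);
        return n.intValue();
    }

    // The comparison is evaluated inside XPath and comes back as a Boolean.
    static boolean hasTwoDirectors(String xml) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        return (Boolean) xPath.evaluate(
                "count(/film/directors/director) = 2",
                new InputSource(new StringReader(xml)),
                XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample document with a single director.
        String xml = "<film><title>Example</title>"
                + "<directors><director>Jane Doe</director></directors></film>";
        System.out.println(countDirectors(xml));
        System.out.println(hasTwoDirectors(xml));
    }
}
```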
First, you clearly know that you need XPath to query the XML file, but you seem to have misunderstood what XPath is and how it should be used.
My first suggestion would be to go and read up on XPath and on XPath in Java, because it has nothing to do with SQL.
I then did a quick search for "java xpath count" and came across a number of excellent examples. Based on the XPath count() function, I tested your document with...
try {
    DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
    DocumentBuilder b = f.newDocumentBuilder();
    // This is your document in a file
    Document d = b.parse(new File("Test.xml"));
    d.getDocumentElement().normalize();
    String expression = "//film[count(directors)=1]";
    XPath xPath = XPathFactory.newInstance().newXPath();
    Object result = xPath.compile(expression).evaluate(d, XPathConstants.NODESET);
    NodeList nodes = (NodeList) result;
    System.out.println(nodes.getLength());
    for (int i = 0; i < nodes.getLength(); i++) {
        Node node = nodes.item(i);
        System.out.println("Found " + node.getTextContent());
    }
} catch (ParserConfigurationException | SAXException | IOException | XPathExpressionException | DOMException exp) {
    exp.printStackTrace();
}
This listed the film node (one match). Why did it produce a result? Look at the query: //film[count(directors)=1] selects every film that has exactly one directors child element, which is what I wanted in order to test the query. Change it to //film[count(directors)=2] and it returns zero matches for your example. To count the individual director elements instead, use count(directors/director) in the predicate.
I would highly recommend that you pause for a moment and become more familiar with what XPath is and how it works before you continue.

xpath getting multiple node values - xml parser using java

Below is the XML file:
<priority-claims>
<priority-claim sequence="1" kind="national">
<document-id document-id-type="maindoc">
<doc-number>FD0297663</doc-number>
<date>20070403</date>
</document-id>
</priority-claim>
<priority-claim sequence="2" kind="national">
<document-id document-id-type="maindoc">
<doc-number>FD0745459P</doc-number>
<date>20060424</date>
</document-id>
</priority-claim>
</priority-claims>
My requirements:
1. How can I get all the node values (i.e. FD0297663, 20070403 and FD0745459P, 20060424)?
2. There may be a single priority-claim tag or more than one.
My existing code gets only the first value:
String priorityNumber = xPath.compile("//priority-claim//doc-number").evaluate(xmlDocument);
String priorityDate = xPath.compile("//priority-claim//date").evaluate(xmlDocument);
Below is a working example. It
uses updated XPath expressions (e.g. /priority-claims/priority-claim/document-id/doc-number/text())
and reads the results into a NodeList.
NodeList priorityNumbers = (NodeList) xPath.compile("/priority-claims/priority-claim/document-id/doc-number/text()").evaluate(xmlDocument, XPathConstants.NODESET);
NodeList priorityDates = (NodeList) xPath.compile("/priority-claims/priority-claim/document-id/date/text()").evaluate(xmlDocument, XPathConstants.NODESET);
for (int i = 0; i < priorityNumbers.getLength(); i++) {
    System.out.println(priorityNumbers.item(i).getNodeValue());
}
for (int i = 0; i < priorityDates.getLength(); i++) {
    System.out.println(priorityDates.item(i).getNodeValue());
}
Here is a link to a gist with a runnable version: https://gist.github.com/rparree/1c7eb8e9ca928b98418fdb167a2096a3
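If the number and the date need to stay paired per claim (rather than printed as two separate lists), one option is to select the priority-claim elements first and evaluate the child paths relative to each one. The sketch below assumes that approach; the class name PriorityClaims and the method readClaims are hypothetical, and the XML is the sample from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class for illustration only.
public class PriorityClaims {

    // Returns one "sequence,doc-number,date" row per priority-claim element.
    static List<String> readClaims(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xPath = XPathFactory.newInstance().newXPath();

        NodeList claims = (NodeList) xPath.evaluate(
                "/priority-claims/priority-claim", doc, XPathConstants.NODESET);

        List<String> rows = new ArrayList<>();
        for (int i = 0; i < claims.getLength(); i++) {
            Node claim = claims.item(i);
            // Each path below is evaluated relative to the current claim.
            String sequence = xPath.evaluate("@sequence", claim);
            String number = xPath.evaluate("document-id/doc-number", claim);
            String date = xPath.evaluate("document-id/date", claim);
            rows.add(sequence + "," + number + "," + date);
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<priority-claims>"
                + "<priority-claim sequence=\"1\" kind=\"national\">"
                + "<document-id document-id-type=\"maindoc\">"
                + "<doc-number>FD0297663</doc-number><date>20070403</date>"
                + "</document-id></priority-claim>"
                + "<priority-claim sequence=\"2\" kind=\"national\">"
                + "<document-id document-id-type=\"maindoc\">"
                + "<doc-number>FD0745459P</doc-number><date>20060424</date>"
                + "</document-id></priority-claim>"
                + "</priority-claims>";
        for (String row : readClaims(xml)) {
            System.out.println(row);
        }
    }
}
```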

How to extract all values pointed to by an XPath?

I have the below xml
<test>
<nodeA>
<nodeB>key</nodeB>
<nodeC>value1</nodeC>
</nodeA>
<nodeA>
<nodeB>key</nodeB>
<nodeC>value2</nodeC>
</nodeA>
</test>
How to concatenate and get all the values in the xpath /test/nodeA/nodeC ?
My expected output in this scenario would be value1value2
I am not sure from what I have read that it is possible with xpath, but thanks for your help.
P.S: I am using VTD-XML from Ximpleware to parse the same in Java. Any java based solution is also welcome. Currently my java solution gives only the first value, i.e. value1
XPath will return a NodeList which you can iterate and concatenate:
StringBuilder concatenated = new StringBuilder();
XPath xpath = XPathFactory.newInstance().newXPath();
String expression = "/test/nodeA/nodeC/text()";
InputSource inputSource = new InputSource("sample.xml");
NodeList nodes = (NodeList) xpath.evaluate(expression, inputSource, XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); i++) {
    concatenated.append(nodes.item(i).getTextContent());
}
Here's a groovy implementation (in 2 lines of code!) using XmlSlurper
def xml = new groovy.util.XmlSlurper().parse(new File('sample.xml'))
print xml.nodeA*.nodeC.join("")
Outputs
value1value2
I don't use groovy in production code but for local mucking about it's great. I often have little groovy utilities in gradle build files.
If you want a single XPath expression that returns the concatenation of the selected values as a string, you need XPath 2.0 (or later) or XQuery 1.0 (or later), where you can write string-join(/test/nodeA/nodeC, ''). XPath 1.0 does not have the expressive power to produce that string, but, as another answer already shows, you can iterate over the selected nodes and concatenate their values in a host language such as Java.
This is how I would do it with VTD-XML...
import com.ximpleware.*;
public class concat {
    public static void main(String[] s) throws VTDException {
        VTDGen vg = new VTDGen();
        if (!vg.parseFile("input.xml", false))
            return;
        VTDNav vn = vg.getNav();
        AutoPilot ap = new AutoPilot(vn);
        ap.selectXPath("/test/*/nodeC/text()");
        StringBuilder sb = new StringBuilder(100);
        int i = 0;
        while ((i = ap.evalXPath()) != -1) {
            sb.append(vn.toString(i));
        }
        System.out.println(sb.toString());
    }
}

Reading XML in java using DOM

I am new to reading XML in Java using DOM. Could someone help me with simple code steps to read this XML with DOM?
Here is my XML:
<DataSet xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:noNamespaceSchemaLocation='datamartschema.1.3.xsd'>
<DataStream title='QUESTIONNAIRE'>
<Record>
<TransactionDate>2014-05-28T14:17:31.2186777-06:00</TransactionDate><SubType>xhaksdj</SubType>
<IntegerValue title='ComponentID'>11111</IntegerValue>
</Record><Record>
<TransactionDate>2014-05-28T14:17:31.2186777-06:00</TransactionDate><SubType>jhgjhg</SubType>
<IntegerValue title='ComponentID'>11111</IntegerValue>
</Record>
</DataStream>
</DataSet>
From this XML I need to read the DataStream title and the Record values. My expected output is
DataStream=QUESTIONNAIRE and my records are
<TransactionDate>2014-05-28T14:17:31.2186777-06:00</TransactionDate><SubType>xhaksdj</SubType><IntegerValue title='ComponentID'>11111</IntegerValue><TransactionDate>2014-05-28T14:17:31.2186777-06:00</TransactionDate><SubType>jhgjhg</SubType><IntegerValue title='ComponentID'>11111</IntegerValue>
How can I get this output? I tried it myself, but I can't get the records in the form shown above: I get the output without the tags. I am using the getTextContent() call below to get the output, but it does not give me the correct result. Also, how do I read the DataStream value from this XML? Kindly help me.
This is my code snippet:
NodeList datasetallRecords = indElement.getElementsByTagName("Record");
for (int y = 0; y < datasetallRecords.getLength(); y++) {
    Element recordsElement = (Element) datasetallRecords.item(y);
    recordXMl = recordXMl + recordsElement.getTextContent();
    String d = datasetallRecords.item(y).getTextContent();
    if (recordsElement.getTagName().equalsIgnoreCase("SubType")) {
        lsDataStreamSubTypes.add(recordsElement.getTextContent());
    }
    recordCount = y;
}
Once you have created a builder instance and parsed the document, you can get the DataStream elements.
It would look like this:
Element root = document.getDocumentElement();
NodeList dataStreams = root.getElementsByTagName("DataStream");
Then go through this list and read the information like this:
for (int i = 0; i < dataStreams.getLength(); i++) {
    Node node = dataStreams.item(i);
    // Check the node type before casting to Element
    if (node.getNodeType() == Node.ELEMENT_NODE) {
        Element dataStream = (Element) node;
        String title = dataStream.getAttribute("title");
    }
}
First you need to get a Node like this:
Node nNode = datasetallRecords.item(y);
Then cast it to an Element like this:
Element eElement = (Element) nNode;
Now you can start reading values from the element using methods such as getElementsByTagName() and getNodeValue().
You're not getting the tags because the call to getTextContent() on the "Record" node will return only the textual content of that node and its descendants.
If you need the tags as well, you'll have to process the XML by hand. Have a look at the DOM tutorial; it covers processing a document in DOM mode very well, including how to read out element names.
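One way to get the markup back without walking the tree by hand is to serialize each child of a Record with an identity Transformer from javax.xml.transform. This is a sketch under that assumption, not the only approach; the class name RecordMarkup and its two methods are invented for the example, and the XML is a trimmed version of the question's document.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class for illustration only.
public class RecordMarkup {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Reads the title attribute of the first DataStream element.
    static String dataStreamTitle(String xml) throws Exception {
        Element stream = (Element) parse(xml)
                .getElementsByTagName("DataStream").item(0);
        return stream.getAttribute("title");
    }

    // Serializes the element children of every Record, tags included.
    static String recordsAsXml(String xml) throws Exception {
        Document doc = parse(xml);
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        // Without this, every serialized fragment would start with <?xml ...?>
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        NodeList records = doc.getElementsByTagName("Record");
        for (int i = 0; i < records.getLength(); i++) {
            NodeList children = records.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                // Skip whitespace text nodes between the elements.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    serializer.transform(new DOMSource(child), new StreamResult(out));
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<DataSet><DataStream title='QUESTIONNAIRE'>"
                + "<Record><SubType>xhaksdj</SubType>"
                + "<IntegerValue title='ComponentID'>11111</IntegerValue></Record>"
                + "<Record><SubType>jhgjhg</SubType>"
                + "<IntegerValue title='ComponentID'>11111</IntegerValue></Record>"
                + "</DataStream></DataSet>";
        System.out.println("DataStream=" + dataStreamTitle(xml));
        System.out.println(recordsAsXml(xml));
    }
}
```

The serializer may normalize details such as attribute quote style, so the result is equivalent markup rather than a byte-for-byte copy of the input.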

Parsing XML in Java from Wordpress feed

private void parseXml(String urlPath) throws Exception {
    URL url = new URL(urlPath);
    URLConnection connection = url.openConnection();
    DocumentBuilder db = DOCUMENT_BUILDER_FACTORY.newDocumentBuilder();
    final Document document = db.parse(connection.getInputStream());
    XPath xPathEvaluator = XPATH_FACTORY.newXPath();
    XPathExpression nameExpr = xPathEvaluator.compile("rss/channel/item/title");
    NodeList trackNameNodes = (NodeList) nameExpr.evaluate(document, XPathConstants.NODESET);
    for (int i = 0; i < trackNameNodes.getLength(); i++) {
        Node trackNameNode = trackNameNodes.item(i);
        System.out.println(String.format("Blog Entry Title: %s", trackNameNode.getTextContent()));
        XPathExpression artistNameExpr = xPathEvaluator.compile("rss/channel/item/content:encoded");
        NodeList artistNameNodes = (NodeList) artistNameExpr.evaluate(trackNameNode, XPathConstants.NODESET);
        for (int j = 0; j < artistNameNodes.getLength(); j++) {
            System.out.println(String.format(" - Artist Name: %s", artistNameNodes.item(j).getTextContent()));
        }
    }
}
I have this code for parsing the title and content from the default WordPress XML feed. The only problem is that when I try to get the content of a blog entry, the XML tag is <content:encoded>, and I do not understand how to retrieve this data.
The tag <content:encoded> means an element with the name encoded in the XML namespace bound to the prefix content. The XPath evaluator is probably unable to resolve the content prefix to its namespace, which I believe is http://purl.org/rss/1.0/modules/content/ (based on a quick Google search).
To get it to resolve, you'll need to do the following:
1. Ensure your DocumentBuilderFactory has setNamespaceAware(true) called on it after construction; otherwise all namespaces are discarded during parsing.
2. Write an implementation of javax.xml.namespace.NamespaceContext to resolve the prefix to its namespace.
3. Call XPath#setNamespaceContext() with your implementation.
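Put together, the steps above might look roughly like the following. Treat it as a sketch: the class name ContentEncodedReader, the method readEntries and the tiny RSS sample are made up for illustration, and the namespace URI is the one guessed earlier in this answer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class for illustration only.
public class ContentEncodedReader {

    // Assumed namespace URI for the "content" prefix.
    static final String CONTENT_NS = "http://purl.org/rss/1.0/modules/content/";

    // Maps the single prefix "content" to its namespace and back.
    static final NamespaceContext RSS_CONTEXT = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            if (prefix == null) {
                throw new IllegalArgumentException("prefix must not be null");
            }
            return "content".equals(prefix) ? CONTENT_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return CONTENT_NS.equals(namespaceURI) ? "content" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            String prefix = getPrefix(namespaceURI);
            return prefix == null
                    ? Collections.<String>emptyIterator()
                    : Collections.singleton(prefix).iterator();
        }
    };

    // Returns "title: content" for every item in the feed.
    static List<String> readEntries(String rssXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // step 1
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(rssXml)));

        XPath xPath = XPathFactory.newInstance().newXPath();
        xPath.setNamespaceContext(RSS_CONTEXT); // steps 2 and 3

        NodeList items = (NodeList) xPath.evaluate(
                "/rss/channel/item", doc, XPathConstants.NODESET);
        List<String> entries = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            Node item = items.item(i);
            // Both paths are relative to the current item element.
            String title = xPath.evaluate("title", item);
            String body = xPath.evaluate("content:encoded", item);
            entries.add(title + ": " + body);
        }
        return entries;
    }

    public static void main(String[] args) throws Exception {
        String rss = "<rss version=\"2.0\" xmlns:content=\"" + CONTENT_NS + "\">"
                + "<channel><item><title>Hello</title>"
                + "<content:encoded><![CDATA[<p>Body</p>]]></content:encoded>"
                + "</item></channel></rss>";
        for (String entry : readEntries(rss)) {
            System.out.println(entry);
        }
    }
}
```

Note that the content is looked up relative to each item element, not relative to its title node, so the title and body of one entry stay together.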
You could also try XStream, which is a good and easy-to-use XML library. It leaves you with almost no work when parsing known XML structures.
PS: Their site is currently offline; use Google Cache to see it =P
