convert xpath expression result to json - java

I want to convert a dom nodelist to a json array and send the result to a rest client:
Each node of the xml represents the following:
<A NAME="x" COUNT="y">
<B KEY="z1" VALUE="z2"/>
<B KEY="z3" VALUE="z4"/>
</A>
I want that i i will have for the output an array of objects where each object looks like the following:
{"NAME":"x",
"COUNT":"y",
"B": [ {"KEY": "z1, VALUE:"z2"},
{"KEY":"z3", VALUE:"z4"} ]
}
I tried to use the GSON library:
package com.a;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
public class Test {
private static final String XPATH = "/A/B";
public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException, XPathExpressionException {
File f = new File("C:/Users/abc/Desktop/a.xml");
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = null;
builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(f);
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList nodeList = (NodeList) xPath.compile(XPATH).evaluate(xmlDocument, XPathConstants.NODESET);
Gson gson = new GsonBuilder().setPrettyPrinting().create();
String jsonOutput = gson.toJson(nodeList);
System.out.println(jsonOutput);
}
}
but i am getting an error
Exception in thread "main" java.lang.StackOverflowError at
java.lang.StringBuffer.append(StringBuffer.java:224) at
java.io.StringWriter.write(StringWriter.java:84) at
com.google.gson.stream.JsonWriter.newline(JsonWriter.java:569) at
com.google.gson.stream.JsonWriter.beforeName(JsonWriter.java:586)
How can i fix this code?
As it is possible to convert a whole xml to json (Quickest way to convert XML to JSON in Java)
I assume that it is possible to convert dom nodes to json. What is wrong here?

What's wrong ? you simply don't use the lib described on your link (Quickest way to convert XML to JSON in Java)
Gson will use java reflection to generate a json string from any object. From a DOM Document (or node), even when it doesn't end up with a StackOverflowError, it will not produce what you expect. here is the result for your XML:
{"fNamespacesEnabled":false,"mutationEvents":false,"actualEncoding":"UTF-8","standalone":false,"fDocumentURI":"...a.xml","changes":0,"allowGrammarAccess":false,"errorChecking":true,"ancestorChecking":true,"xmlVersionChanged":false,"documentNumber":0,"nodeCounter":0,"xml11Version":false,"flags":6}
Actually it seems that if any method has been invoked on a DOM Document (ex: getDocumentElement), the gson.toJson end up with a StackOverflowError.
As you can see in the link, a jar that will do the job can be found here: http://mvnrepository.com/artifact/org.json/json
It implies that you re-convert the nodes extracted with your XPath to string.
You can do it with that:
private static String toString(Node n) throws TransformerFactoryConfigurationError, TransformerException {
Transformer transformer = TransformerFactory.newInstance().newTransformer();
StreamResult result = new StreamResult(new StringWriter());
DOMSource source = new DOMSource(n);
transformer.transform(source, result);
return result.getWriter().toString();
}
With all that, all you have to do is to loop through your nodeList, convert it to string, and then convert it to json
for (int i = 0; i < nodeList.getLength(); i++) {
Node n = nodeList.item(i);
JSONObject xmlJSONObj = XML.toJSONObject(toString(n));
String jsonPrettyPrintString = xmlJSONObj.toString(1);
System.out.println(jsonPrettyPrintString);
}

Related

How to parse the full content of a XML Tag in java

I have some kind of complex XML data structure. The structure contains different fragments like in the following example:
<data>
<content-part-1>
<h1>Hello <strong>World</strong>. This is some text.</h1>
<h2>.....</h2>
</content-part1>
....
</data>
The h1 tag within the tag 'content-part-1' is of interest. I want to get the full content of the xml tag 'h1'.
In java I used the javax.xml.parsers.DocumentBuilder and tried something like this:
String my_content="<h1>Hello <strong>World</strong>. This is some text.</h1>";
// parse h1 tag..
DocumentBuilder documentBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = documentBuilder.parse(new InputSource(new StringReader(my_content)));
Node node = doc.importNode(doc.getDocumentElement(), true);
if (node != null && node.getNodeName().equals("h1")) {
return node.getTextContent();
}
But the method 'getTextContent()' will return:
Hello World. This is some text.
The tag "strong" is removed by the xml parser (as it is the documented behavior).
My question is how I can extract the full content of a single XML Node within a org.w3c.dom.Document without any further parsing the node content?
Although java DOM parser provides functionality for parsing mixed content, in this particular case it could be more convenient to use Jsoup library. When using it code to extract h1 element content would be as follows:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
String text = "<data>\n"
+ " <content-part1>\n"
+ " <h1>Hello <strong>World</strong>. This is some text.</h1>\n"
+ " <h2></h2>\n"
+ " </content-part1>\n"
+ "</data>";
Document doc = Jsoup.parse(text);
Elements h1Elements = doc.select("h1");
for (Element h1 : h1Elements) {
System.out.println(h1.html());
}
Output in this case will be "Hello <strong>World</strong>. This is some text."
What you probaly want is XML generation from some subnode of your document.
So with slighlty modified nodeToString from earlier answer to similar question I can propose to
generate text <h1>Hello <strong>World</strong>. This is some text.</h1>. Some extra effor might be needed to get rid of <h1> and </h1>
package com.github.vtitov.test;
import org.junit.Test;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.StringReader;
import java.io.StringWriter;
import static org.hamcrest.MatcherAssert.*;
import static org.hamcrest.Matchers.*;
public class XmlTest {
#Test
public void buildXml() throws Exception {
String my_content="<h1>Hello <strong>World</strong>. This is some text.</h1>";
// parse h1 tag..
DocumentBuilder documentBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = documentBuilder.parse(new InputSource(new StringReader(my_content)));
Node node = doc.importNode(doc.getDocumentElement(), true);
String h1Content = null;
if (node != null && node.getNodeName().equals("h1")) {
h1Content = nodeToString(node);
}
assertThat("h1", h1Content, equalTo("<h1>Hello <strong>World</strong>. This is some text.</h1>"));
}
private static String nodeToString(Node node) throws TransformerException {
StringWriter sw = new StringWriter();
Transformer t = TransformerFactory.newInstance().newTransformer();
t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
t.setOutputProperty(OutputKeys.INDENT, "no");
t.transform(new DOMSource(node), new StreamResult(sw));
return sw.toString();
}
}

Selenium Webdriver : Reusable xml parsing class method is not working due to return type unknown

My objective is to Create Reusable xml parsing class concerning that return type could be array or arraylist
My code is working but I wanted reusablity I am unable to create reusable class/method due to return type which could array or arraylist is not working.**
1) I have created a xml file as follows:
<SearchStrings>
<Search id="1111" type="high">
<Questions>What is software Testing?</Questions>
<Tips>How to connect database with eclipse ?</Tips>
<Multiple>Who was the first prime minister of India? </Multiple>
<Persons>Who is Dr.APJ Abdul Kalam </Persons>
</Search>
<Search id="2222" type="low">
<Questions>What is Automation Testing?</Questions>
<Tips>How to use selenium webdriver </Tips>
<Multiple>Who was the fourth prime minister of India? </Multiple>
<Persons>Who is Superman? </Persons>
</Search>
<Search id="3333" type="medium">
<Questions>What is Selenium ide Testing?</Questions>
<Tips>How to use selenium webdriver with eclipse ? </Tips>
<Multiple>Who was the ninth prime minister of India? </Multiple>
<Persons>Who is Spiderman? </Persons>
</Search>
<Search id="4444" type="optional">
<Questions>What is database Testing?</Questions>
<Tips>How to use Class in java ? </Tips>
<Multiple>Who was the eight prime minister of India? </Multiple>
<Persons>Who is motherindia? </Persons>
</Search>
</SearchStrings>
2) Creating a class which fetch nodes of tags at once and store all of them in
a String [] SearchString and then use this array to fetch the values and by .sendKeys(value) attribute search them at google.
Simplified:
1) Store elements tag element in an reusable datatype my knowledge is limited so using string array.
2) Fetch string array elements and search them using the .sendkeys(element) at google.
my code is as below:
package searchexperiment;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class Experiment implements Paths
{
public static WebDriver driver;
static Document document;
static DocumentBuilder db;
public static void main(String args[])
{
String[] SearchStrings;
driver=new FirefoxDriver();
driver.manage().timeouts().implicitlyWait(50, TimeUnit.SECONDS);
driver.get("https://www.google.com/");
//loading xml as test data
WebElement googlebox=driver.findElement(By.id("gbqfq"));
try {
FileInputStream file = new FileInputStream(new File(test_xml));
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(file);
XPath xPath = XPathFactory.newInstance().newXPath();
System.out.println("*************************");
String expression="/SearchStrings/Search/Questions";
System.out.println("This is ordered expression \n"+expression);
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
for(int i=0;i< nodeList.getLength();i++)
{
// Node nNode = emailNodeElementList.item(j);
// Element eElement = (Element) nNode;
System.out.println("Taking the loop value");
// below push is not working.
Object array = push(SearchStrings[i],nodeList.item(i).getFirstChild().getNodeValue());
String text=nodeList.item(i).getFirstChild().getNodeValue();
googlebox.clear();
googlebox.sendKeys(text);
System.out.println("Closing the loop value");
}
I am using the string array in order to make xml parsing class reusable.
I have used an interface to get file name
public interface Paths {
String test_xml="XML/Searchtext.xml";
}
Reusable method along with class was :
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
import searchexperiment.Paths;
public class DocBuilderClass implements Paths
{
public static String[] username()
{
String[] SearchElements=new String[4];
try
{
FileInputStream file = new FileInputStream(new File(test_xml));
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(file);
XPath xPath = XPathFactory.newInstance().newXPath();
System.out.println("*************************");
String expression="/SearchStrings/Search/Tips";
System.out.println("This is ordered expression \n"+expression);
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
//int size=
for(int i=0;i< nodeList.getLength();i++)
{
// Node nNode = emailNodeElementList.item(j);
// Element eElement = (Element) nNode;
System.out.println("Taking the loop value");
//Object array = push(SearchStrings[i],nodeList.item(i).getFirstChild().getNodeValue());
String text=nodeList.item(i).getFirstChild().getNodeValue();
//googlebox.clear();
// googlebox.sendKeys(text);
SearchElements[i]=text;
System.out.println("Closing the loop value");
}
}
catch(Exception ex)
{
System.out.println("This is a exception" + ex);
}
finally
{
}
return SearchElements;
}
}
and then way to call the class was as follows:
String [] namelist=DocBuilderClass.username();
for(int i=0;i<namelist.length;i++)
{
String abc=namelist[i];
googlebox.sendKeys(abc);
googlebox.clear();
googlebox.sendKeys(namelist[i]);
}
References were Reference Link String[] array
Reference Link XML Parsing
All I learn that Your basics should be strong to solve a strong and complex problems.

How to get the value of text node inside <div> element using XPATH and Jtidy

I have been banging my head over for two days now.
I have a XHTML web-page from which i want to scrap some data
I am using JTidy to DOMParse and then XPathFactory to find nodes using XPath
The Xhtml snippet is something like this
<div style="line-height: 22px;" id="dvTitle" class="titlebtmbrdr01">BAJAJ AUTO LTD.</div>
Now i want that BAJAJ AUTO LTD.
The code that i am using is :
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Vector;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class BSEQuotesExtractor implements valueExtractor {
#Override
public Vector<String> getName(Document d) throws XPathExpressionException {
// TODO Auto-generated method stub
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("//div[#id='dvTitle']/text()");
Object result = expr.evaluate(d, XPathConstants.NODESET);
NodeList nodes = (NodeList)result;
for(int i=0;i<nodes.getLength();i++)
{
System.out.println(nodes.item(i).getNodeValue());
}
return null;
}
public static void main(String[] args) throws MalformedURLException, IOException, XPathExpressionException{
BSEQuotesExtractor q = new BSEQuotesExtractor();
DOMParser parser = new DOMParser(new URL("http://www.bseindia.com/bseplus/StockReach/StockQuote/Equity/BAJAJ%20AUTO%20LTD/BAJAJAUT/532977/Scrips").openStream());
Document d = parser.getDocument();
q.getName(d);
}
}
But i gett a null output and not BAJAJ AUTO LTD.
Please rescue me
try this.
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("//div[#id='dvTitle']");
Object result = expr.evaluate(d, XPathConstants.NODE);
Node node = (Node)result;
System.out.println(node.getTextContent());
you must use XPathConstants.STRING instead of XPathConstants.NODESET.
You want to get a value of a single element (div), not a list of nodes.
Write:
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
String divContent = (String) path.evaluate("//div[#id='dvTitle']", document, XPathConstants.STRING);
Into divContent you get "BAJAJ AUTO LTD.".

Parse Last.Fm XML from API in Java

URL: http://ws.audioscrobbler.com/2.0/?method=chart.gethypedtracks&api_key=1732077d6772048ccc671c754061cb18&limit=10
From the above url I need to somehow remove the Artist name and the track name from the XML file produced from each Song given but I have no Idea how to work with an XML file structured in this way ??
Any help or pointers would be very much appreciated !
Thanks,
Ross
Here's a fully working class that loads the URL you have indicated and parses the Track and artist names.
Basically it reads the xml into a Document, and runs 2 xpath queries in loops to get the data you want.
The document itself is simple xml, if you reformat it, it looks like:
<?xml version="1.0" encoding="utf-8"?>
<lfm status="ok">
<tracks page="1" perPage="10" totalPages="50" total="500">
<track>
<name>Hysterical</name>
<duration>231</duration>
<percentagechange>3626</percentagechange>
<mbid/>
<url>http://www.last.fm/music/Clap+Your+Hands+Say+Yeah/_/Hysterical</url>
<streamable fulltrack="0">0</streamable>
<artist>
<name>Clap Your Hands Say Yeah</name>
...
All I did to clean it up was run it through a re-formatter like xmlstarlet as I mentioned in my comment. Note: you don't have to reformat it for java to read it if it's well formed. Human readable is all a re-format does for you.
The first xpath query gets the track name using a path lfm/tracks/track/name. You can use something like this xpath tester to try out your xpath queries (you can paste your xml in and it will reformat it too). If you don't understand xpath, there are many sources on the net.
The second xpath works relative to the current track name node, and looks for a following-sibling node of type artist with a name sub-node, and then displays the text of the node.
Here's the code
package net.fish;
import java.net.URL;
import java.net.URLConnection;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class ParseXML {
private static final DocumentBuilderFactory DOCUMENT_BUILDER_FACTORY = DocumentBuilderFactory.newInstance();
private static final XPathFactory XPATH_FACTORY = XPathFactory.newInstance();
public static void main(String[] args) throws Exception {
new ParseXML().parseXml("http://ws.audioscrobbler.com/2.0/?method=chart.gethypedtracks&api_key=1732077d6772048ccc671c754061cb18&limit=10");
}
private void parseXml(String urlPath) throws Exception {
URL url = new URL(urlPath);
URLConnection connection = url.openConnection();
DocumentBuilder db = DOCUMENT_BUILDER_FACTORY.newDocumentBuilder();
final Document document = db.parse(connection.getInputStream());
XPath xPathEvaluator = XPATH_FACTORY.newXPath();
XPathExpression nameExpr = xPathEvaluator.compile("lfm/tracks/track/name");
NodeList trackNameNodes = (NodeList) nameExpr.evaluate(document, XPathConstants.NODESET);
for (int i = 0; i < trackNameNodes.getLength(); i++) {
Node trackNameNode = trackNameNodes.item(i);
System.out.println(String.format("Track Name: %s" , trackNameNode.getTextContent()));
XPathExpression artistNameExpr = xPathEvaluator.compile("following-sibling::artist/name");
NodeList artistNameNodes = (NodeList) artistNameExpr.evaluate(trackNameNode, XPathConstants.NODESET);
for (int j=0; j < artistNameNodes.getLength(); j++) {
System.out.println(String.format(" - Artist Name: %s", artistNameNodes.item(j).getTextContent()));
}
}
}
}

XPath error in Android

My goal is executing an XQuery using XPath.
My XML file is:
<?xml version="1.0" encoding="UTF-8"?>
<postes>
<poste>
<gouvernourat>Kairouan</gouvernourat>
<ville>Kairouan sud</ville>
<cp>3100</cp>
</poste>
<poste>
<gouvernourat>Tunis</gouvernourat>
<ville>Ghazela</ville>
<cp>1002</cp>
</poste>
</postes>
My Java code is:
package xmlparse;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class QueryXML {
public void query() throws ParserConfigurationException, SAXException,
IOException, XPathExpressionException {
// Standard of reading a XML file
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder;
Document doc = null;
XPathExpression expr = null;
builder = factory.newDocumentBuilder();
doc = builder.parse("a.xml"); //C:\\Users\\aymen\\Desktop\\
// Create a XPathFactory
XPathFactory xFactory = XPathFactory.newInstance();
// Create a XPath object
XPath xpath = xFactory.newXPath();
// Compile the XPath expression
expr = xpath.compile("/postes/poste[gouvernourat='Tunis']/ville/text()");
// Run the query and get a nodeset
Object result = expr.evaluate(doc, XPathConstants.NODESET);
// Cast the result to a DOM NodeList
NodeList nodes = (NodeList) result;
for (int i=0; i<nodes.getLength();i++){
System.out.println(nodes.item(i).getNodeValue());
}
}
public static void main(String[] args) throws XPathExpressionException, ParserConfigurationException, SAXException, IOException {
QueryXML process = new QueryXML();
process.query();
}
}
When I launch this Java code the result is displayed on the console correctly (System.out.println).
But if I copy this code to my Android application and change System.out.println(nodes.item(i).getNodeValue()); to Text2.setText(nodes.item(i).getNodeValue()); (I have a TextView named Text2)
When I execute the code and I click the button the TextView stays empty (No error for Force Close)
Thank you in advance
Attribute names needs to start with '#' while using XPath in Android.
So change
[gouvernourat='Tunis']
To
[#gouvernourat='Tunis']
Refer http://developer.android.com/reference/javax/xml/xpath/package-summary.html for details.

Categories