Output XML content with unknown node depth using XPath


I have paths of information I want to extract from an XML string:
"/root/A/info1"
"/root/A/B/info2"
"/root/A/B/info3"
"/root/A/info4"
And this is the input:
<root>
<A>
<info1>value1</info1>
<B>
<info2>value2.1</info2>
<info3>value3.1</info3>
</B>
<B>
<info2>value2.2</info2>
<!-- note: element "info3" is missing here! -->
</B>
<B>
<info2>value2.3</info2>
<info3>value3.3</info3>
</B>
<info4>value4</info4>
</A>
</root>
And I want to achieve this:
value1|value2.1|value3.1|value4
value1|value2.2|NULL|value4
value1|value2.3|value3.3|value4
My paths vary and I never know the depth of the XML file. Because "/root/A/B/info2" and "/root/A/B/info3" exist three times, I obviously need to output three lines.
I think recursion is needed here.
My code:
main function:
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(new ByteArrayInputStream(xml.getBytes()));
String[] paths = new String[] {"/root/A/info1", "/root/A/B/info2", "/root/A/B/info3", "/root/A/info4"};
XPath xPath = XPathFactory.newInstance().newXPath();
String[] output = new String[paths.length];
for(int i=0; i<paths.length; i++) {
recursion(paths, doc, xPath, paths[i], i, output);
}
recursive function:
private static void recursion(String[] paths, Object parent, XPath xPath, String path, int position, String[] output) throws Exception {
if(path.contains("/")) { // check if it's the last element, which contains the needed value
List<String> pathNodes = new ArrayList<>(Arrays.asList(StringUtils.split(path, "/")));
String currentPathNode = pathNodes.get(0);
NodeList nodeList = (NodeList) xPath.compile(currentPathNode).evaluate(parent, XPathConstants.NODESET);
pathNodes.remove(0);
String newPath = StringUtils.join(pathNodes, "/");
for(int i=0; i<nodeList.getLength(); i++) {
Node node = nodeList.item(i);
recursion(paths, node, xPath, newPath, position, output.clone()); // clone?
}
}
else {
output[position] = xPath.compile(path).evaluate(parent);
if((position + 1) == paths.length) { // check if it's the last path, so output the values
System.out.println(StringUtils.join(output, "|"));
}
}
}
If I clone output I get this:
|||value4
If I don't clone output I get this (previous values are overwritten):
value1|value2.3|value3.3|value4
Please give me a hint.
Update: Take another look at the XML input. Elements that have no value may be missing entirely.

I finally solved it.
I added a context path to my application. It specifies which element is the deepest.
In my example here it would be "/root/A/B".
I update all my paths to be relative to that context path:
"../info1"
"info2"
"info3"
"../info4"
Then I count the nodes matched by the context path (three here). That is also the number of lines that will be created. I loop over those nodes and evaluate my updated paths against each one with XPath.
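A minimal sketch of the context-path approach described above, not the original application code. The class name ContextPathExtractor and the method extract are made up for illustration. It assumes that evaluating with XPathConstants.NODE yields null when nothing matches, which is how the JDK's built-in XPath implementation behaves, and uses that to print NULL for missing elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ContextPathExtractor {

    // Evaluates every relative path once per node matched by contextPath.
    // A path that matches nothing contributes the literal "NULL".
    static List<String> extract(String xml, String contextPath, String[] relativePaths) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xPath = XPathFactory.newInstance().newXPath();

            // One output line per context node (three <B> elements in the example).
            NodeList contexts = (NodeList) xPath.evaluate(contextPath, doc, XPathConstants.NODESET);
            List<String> lines = new ArrayList<>();
            for (int i = 0; i < contexts.getLength(); i++) {
                Node context = contexts.item(i);
                List<String> values = new ArrayList<>();
                for (String path : relativePaths) {
                    // Relative paths such as "../info1" are resolved against the context node.
                    Node hit = (Node) xPath.evaluate(path, context, XPathConstants.NODE);
                    values.add(hit == null ? "NULL" : hit.getTextContent());
                }
                lines.add(String.join("|", values));
            }
            return lines;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse or query the XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><A><info1>value1</info1>"
                + "<B><info2>value2.1</info2><info3>value3.1</info3></B>"
                + "<B><info2>value2.2</info2></B>"
                + "<B><info2>value2.3</info2><info3>value3.3</info3></B>"
                + "<info4>value4</info4></A></root>";
        String[] paths = {"../info1", "info2", "info3", "../info4"};
        // Prints:
        // value1|value2.1|value3.1|value4
        // value1|value2.2|NULL|value4
        // value1|value2.3|value3.3|value4
        for (String line : extract(xml, "/root/A/B", paths)) {
            System.out.println(line);
        }
    }
}
```

Because every path is evaluated against one context node at a time, no recursion or array cloning is needed; the number of context nodes alone decides how many lines are produced.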

Related

XML - Extract One tag Value

I have to extract the tag value from an XML document that contains a single tag, like this:
<error>Permission denied</error>
I have tried:
String xmlRecords = "<error>Permission denied</error>";
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(xmlRecords));
Document doc = db.parse(is);
Node nodes = doc.getFirstChild();
String value = nodes.getNodeValue();
but it doesn't work.
How can I do it?

Use doc.getDocumentElement().getTextContent() to get the string Permission denied.
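A small self-contained sketch of that suggestion. The class name SingleTagValue and the helper rootText are invented for illustration. It also shows why the original attempt fails: getNodeValue() is defined to be null for element nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class SingleTagValue {

    // Returns the text inside the document's root element.
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // getNodeValue() is null for an element node; getTextContent()
            // returns the concatenated text of the element's descendants.
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Prints: Permission denied
        System.out.println(rootText("<error>Permission denied</error>"));
    }
}
```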

With DOM it is useful to know the structure of the XML document and at which level the node you are looking for sits.
Once you have the Document, you can use document.getElementsByTagName("root") to find the root or parent element, then walk its children as a list to find the item. Something like this:
NodeList listresults = document.getElementsByTagName("root"); // name of the parent or root element
NodeList nl = listresults.item(0).getChildNodes();
// Iterate over the nodes
for (int temp = 0; temp < nl.getLength(); temp++) {
Node node = nl.item(temp);
// Check if it is a node
if (node.getNodeType() == Node.ELEMENT_NODE) {
Element element = (Element) node;
if(element.getNodeName().equals("error")){
// check the element
}
}
}
I hope this helps you.

Just try the following code:
String value = nodes.getTextContent();

With the approach above you have to build the string yourself. These methods give you the tag name and its content:
tag name = nodes.getNodeName()
tag value = nodes.getTextContent()

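I guess this is what you want:
// Search from the Document: Element.getElementsByTagName only matches descendants,
// so it would miss <error> when it is the root element itself.
NodeList errorTagList = document.getElementsByTagName("error");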
if (errorTagList != null && errorTagList.getLength() > 0) {
NodeList errorTagSubList = errorTagList.item(0).getChildNodes();
if (errorTagSubList != null && errorTagSubList.getLength() > 0) {
String value = errorTagSubList.item(0).getNodeValue();
}
}

Get node text with HTML tag inside

I am trying to examine an HTML string like this with Java XPath:
<app>
<elem class="A">value1</elem>
<elem class="B">value2a<br />value2b</elem>
<elem class="C">value3</elem>
</app>
Currently I use this code to obtain the elem's value:
public String getValue(String xml, String classValue) throws Exception {
XPath xpath = XPathFactory.newInstance().newXPath();
InputSource source = new InputSource(new StringReader(xml));
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document document = db.parse(source);
String xpathRequest = "//*[@class='" + classValue + "']/text()";
String value = xpath.evaluate(xpathRequest , document);
return value;
}
This works fine for classes A and C, but when I ask for the content of the tag with class B I only get value2a.
How can I get the complete string of the node?

Simply run
String xpathRequest = "//*[@class='" + classValue + "']";
String value = xpath.evaluate(xpathRequest, document);
This selects the <elem> node itself; converting it to a String concatenates all of its text content, e.g. value2avalue2b.
To get a list of all text nodes below an elem, select them as a NODESET:
String xpathRequest = "//*[@class='" + classValue + "']/text()";
NodeList textNodes = (NodeList)xpath.evaluate(xpathRequest , document, XPathConstants.NODESET);
ArrayList<String> texts = new ArrayList<>();
for (int i=0; i<textNodes.getLength(); i++)
texts.add(textNodes.item(i).getTextContent());
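Both variants side by side as a runnable sketch. The class ElemTextDemo and its method names are made up for illustration, and the class value is concatenated straight into the expression, which is only reasonable for trusted input.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ElemTextDemo {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // String value of the first matching element: all descendant text concatenated.
    static String fullText(Document doc, String classValue) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate("//*[@class='" + classValue + "']", doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("Bad XPath expression", e);
        }
    }

    // Each text node directly below the matching element, as a separate string.
    static List<String> textParts(Document doc, String classValue) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//*[@class='" + classValue + "']/text()", doc, XPathConstants.NODESET);
            List<String> parts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                parts.add(nodes.item(i).getTextContent());
            }
            return parts;
        } catch (Exception e) {
            throw new IllegalArgumentException("Bad XPath expression", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<app><elem class=\"A\">value1</elem>"
                + "<elem class=\"B\">value2a<br />value2b</elem>"
                + "<elem class=\"C\">value3</elem></app>");
        System.out.println(fullText(doc, "B"));  // prints value2avalue2b
        System.out.println(textParts(doc, "B")); // prints [value2a, value2b]
    }
}
```

The original getValue returned only value2a because evaluating a node set as a String uses the string value of its first node, and text() matched two separate text nodes.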

This happens because the XPath matches two text nodes here. Try the following:
List<WebElement> allprice = driver.findElements(By.xpath("//*[@class='B']/text()"));
for (WebElement a : allprice) {
System.out.println(a.getText());
}

navigating hierarchy of xml input file

How do I list the element names at a given level in an xml schema hierarchy? The code I have below is listing all element names at every level of the hierarchy, with no concept of nesting.
Here is my xml file:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><?xml-stylesheet type="text/xsl" href="CDA.xsl"?>
<SomeDocument xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:something">
<title>some title</title>
<languageCode code="en-US"/>
<versionNumber value="1"/>
<recordTarget>
<someRole>
<id extension="998991"/>
<addr use="HP">
<streetAddressLine>1357 Amber Drive</streetAddressLine>
<city>Beaverton</city>
<state>OR</state>
<postalCode>97867</postalCode>
<country>US</country>
</addr>
<telecom value="tel:(816)276-6909" use="HP"/>
</someRole>
</recordTarget>
</SomeDocument>
Here is my java method for importing and iterating the xml file:
public static void parseFile() {
//get the factory
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
try {
//Using factory get an instance of document builder
DocumentBuilder db = dbf.newDocumentBuilder();
//parse using builder to get DOM representation of the XML file
Document dom = db.parse("D:\\mypath\\somefile.xml");
//get the root element
Element docEle = dom.getDocumentElement();
//get a nodelist of elements
NodeList nl = docEle.getElementsByTagName("*");
if (nl != null && nl.getLength() > 0) {
for (int i = 0; i < nl.getLength(); i++) {
Node node = nl.item(i);
if (node.getNodeType() == Node.ELEMENT_NODE) {
System.out.println("node.getNodeName() is: "+node.getNodeName());
}
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
The output of the above program is:
title
languageCode
versionNumber
recordTarget
someRole
id
addr
streetAddressLine
city
state
postalCode
country
telecom
Instead, I would like to output the following:
title
languageCode
versionNumber
recordTarget
It would be nice to then be able to list the children of recordTarget as someRole, and then to list the children of someRole as id, addr, and telecom. And so on, but at my discretion in the code. How can I change my code to get the output that I want?

You're getting all nodes with this line:
NodeList nl = docEle.getElementsByTagName("*");
Change it to
NodeList nl = docEle.getChildNodes();
to get all of its children. Your print statement will then give you the output you're looking for.
Then, when you iterate through your NodeList, you can choose to call the same method on each Node you create:
NodeList children = node.getChildNodes();
If you want to print an XML-like structure, perhaps a recursive method that prints all child nodes is what you are looking for.
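One way such a recursive method could look, as a sketch only. The class ElementOutline and the maxDepth parameter are assumptions added for illustration; maxDepth limits how many levels below the root are listed, so 1 gives just the direct children.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ElementOutline {

    // Lists element names below the root, one per line, indented by nesting level.
    static String outline(String xml, int maxDepth) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            appendChildren(doc.getDocumentElement(), 0, maxDepth, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    private static void appendChildren(Node parent, int depth, int maxDepth, StringBuilder out) {
        if (depth >= maxDepth) {
            return; // stop descending once the requested level is reached
        }
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // getChildNodes() also returns text and comment nodes
            }
            for (int d = 0; d < depth; d++) {
                out.append("  ");
            }
            out.append(child.getNodeName()).append('\n');
            appendChildren(child, depth + 1, maxDepth, out);
        }
    }

    public static void main(String[] args) {
        String xml = "<SomeDocument><title>some title</title><languageCode code=\"en-US\"/>"
                + "<recordTarget><someRole><id extension=\"998991\"/><addr use=\"HP\"/>"
                + "</someRole></recordTarget></SomeDocument>";
        System.out.print(outline(xml, 1)); // direct children of the root only
        System.out.print(outline(xml, 3)); // three levels, indented
    }
}
```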

You could re-write the parseFile (I'd rather call it parseChildrenElementNames) method to take an input String that specifies the element name for which you want to print out its children element names:
public static void parseChildrenElementNames(String parentElementName) {
// get the factory
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
try {
// Using factory get an instance of document builder
DocumentBuilder db = dbf.newDocumentBuilder();
// parse using builder to get DOM representation of the XML file
Document dom = db
.parse("D:\\mypath\\somefile.xml");
// find the elements with the requested name
NodeList elementsByTagName = dom.getElementsByTagName(parentElementName);
if (elementsByTagName.getLength() > 0) {
Node parentElement = elementsByTagName.item(0);
// get a nodelist of elements
NodeList nl = parentElement.getChildNodes();
if (nl != null) {
for (int i = 0; i < nl.getLength(); i++) {
Node node = nl.item(i);
if (node.getNodeType() == Node.ELEMENT_NODE) {
System.out.println("node.getNodeName() is: "
+ node.getNodeName());
}
}
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
However, this will only consider the first element that matches the specified name.
For example, to get the list of elements under the first node named someRole, you would call parseChildrenElementNames("someRole"); which would print out:
node.getNodeName() is: id
node.getNodeName() is: addr
node.getNodeName() is: telecom

Java XML - nested elements with same name

How can I reach elements that share the same name and are nested recursively, using Java's XML APIs? This worked with Python's ElementTree, but I need to get it running in Java.
I have tried:
String filepath = ("file.xml");
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
Document doc = docBuilder.parse(filepath);
NodeList nl = doc.getElementsByTagName("*/*/foo");
Example
<foo>
<foo>
<foo>
</foo>
</foo>
</foo>

You seem to be under the impression that getElementsByTagName takes an XPath expression. It doesn't. As documented:
Returns a NodeList of all the Elements in document order with a given tag name and are contained in the document.
If you need to use XPath, you should look at the javax.xml.xpath package. Sample code:
XPath xpath = XPathFactory.newInstance().newXPath();
Object set = xpath.evaluate("*/*/foo", doc, XPathConstants.NODESET);
NodeList list = (NodeList) set;
int count = list.getLength();
for (int i = 0; i < count; i++) {
Node node = list.item(i);
// Handle the node
}
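To make the difference concrete, here is a runnable sketch comparing the two lookups on the nested example. The class NestedFooDemo and its helper names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NestedFooDemo {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Matches on the tag name alone, at any depth; the argument is not a path.
    static int countByTagName(Document doc, String name) {
        return doc.getElementsByTagName(name).getLength();
    }

    // Evaluates a real XPath expression with the document as context node.
    static int countByXPath(Document doc, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("Bad XPath expression", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<foo><foo><foo/></foo></foo>");
        System.out.println(countByTagName(doc, "foo"));     // 3: every foo, regardless of nesting
        System.out.println(countByTagName(doc, "*/*/foo")); // 0: no element has that literal name
        System.out.println(countByXPath(doc, "*/*/foo"));   // 1: only the innermost foo
        System.out.println(countByXPath(doc, "//foo"));     // 3: all foo elements
    }
}
```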

Parsing xml string containing hyperlink

I am using DOM to parse an XML string, as in the following example. This works well except in one case. The document I am trying to parse looks like this:
<response requestID=\"1234\">
<expectedValue>Alarm</expectedValue>
<recommendations>For steps on how to resolve visit Website and use the search features for \"Alarm\"</recommendations>
<setting>Active</setting>
</response>
The code I used to parse the XML is as follows:
try {
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(xmlResult));
Document doc = db.parse(is);
NodeList nlResponse = doc.getElementsByTagName("response");
String[] String = new String[3]; //result entries
for (int i = 0; i < nlResponse.getLength(); i++) {
Element e = (Element) nlResponse.item(i);
int c1 = 0; //count for string array
NodeList ev = e.getElementsByTagName("expectedValue");
Element line = (Element) ev.item(0);
String[c1] = (getCharacterDataFromElement(line));
c1++;
NodeList rec = e.getElementsByTagName("recommendations");
line = (Element) rec.item(0);
String[c1] = (getCharacterDataFromElement(line));
c1++;
NodeList set = e.getElementsByTagName("setting");
line = (Element) set.item(0);
String[c1] = (getCharacterDataFromElement(line));
c1++;
I am able to parse the XML and put the result into a string array (as opposed to using System.out.println()). With the current code, my string array looks as follows:
String[0] = "Alarm"
String[1] = "For steps on how to resolve visit"
String[2] = "Active"
I would like some way of being able to read the rest of the information within "Recommendations" in order to ultimately display the hyperlink (along with other output) in a TextView. How can I do this?

I apologize for my previous answer, which assumed your XML was ill-formed.
I think what is happening is that getCharacterDataFromElement only looks at the first child node for text. For the recommendations node it needs to look at all the child nodes, taking the href attribute as well as the text from the second child node.
For example, after getting the Element for recommendations:
String srec = "";
NodeList nl = line.getChildNodes();
srec += nl.item(0).getTextContent();
Node n = nl.item(1);
NamedNodeMap nm = n.getAttributes();
srec += "<a href=\"" + nm.getNamedItem("href").getNodeValue() + "\">" + n.getTextContent() + "</a>";
srec += nl.item(2).getTextContent();
String[c1] = srec;
