How can I select several XML elements using XPath?


Assuming the following XML :
<response>
<header>
<resultCode>0000</resultCode>
<resultMsg>OK</resultMsg>
</header>
<body>
<items>
<item>
<addr1>America</addr1>
<addr2>(Atlanta)</addr2>
</item>
<item>
<addr1>Canada</addr1>
<addr2>(Toronto)</addr2>
</item>
<item>
<addr1>France</addr1>
<addr2>(Paris)</addr2>
</item>
</items>
</body>
</response>
I want to select several XML elements using XPath, so I wrote the Java code below.
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(urlBuilder.toString());
XPath xpath = XPathFactory.newInstance().newXPath();
NodeList items = (NodeList) xpath.evaluate("//item", doc, XPathConstants.NODESET);
NodeList addrAll = (NodeList) xpath.evaluate("//item/addr1 | //item/addr2", doc, XPathConstants.NODESET);
System.out.println("length:" + addrAll.getLength());
for (int tmp = 0; tmp < addrAll.getLength(); tmp++) {
    System.out.println(addrAll.item(tmp).getTextContent());
}
The result is:
length:6
America
(Atlanta)
Canada
(Toronto)
France
(Paris)
But this is not what I wanted.
My expected output:
length:3
America (Atlanta)
Canada (Toronto)
France (Paris)
I hope you understand my question.
How can I edit my code to do that?

That's not how XPath works: it retrieves the nodes you designate, but it won't concatenate several of them into one result.
To do that, you'll have to either use XSLT, or write two XPath expressions, one for each of the addrX parts, and have the Java client code combine the results.
How you need to update your Java code depends on several things, such as whether each item will always contain both an addr1 and an addr2.
If you can rely on that, you can do this:
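// Two nodes (addr1, addr2) per item, so the item count is half the node count.
System.out.println("length:" + addrAll.getLength() / 2);
for (int tmp = 0; tmp < addrAll.getLength(); tmp += 2) {
    // As the output above shows, addr1 and addr2 alternate in the result.
    String country = addrAll.item(tmp).getTextContent();
    String city = addrAll.item(tmp + 1).getTextContent();
    System.out.printf("%s %s\n", country, city);
}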
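If an item might lack addr1 or addr2, pairing by index breaks down. A minimal sketch of an alternative, staying within XPath 1.0 and the standard javax.xml.xpath API: select each item, then evaluate a relative concat() expression with that item as the context node. The class name ItemAddresses and the method addresses are made up for illustration, and the XML is passed as a string rather than fetched from the URL in the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ItemAddresses {

    // Hypothetical helper: returns one "addr1 addr2" string per <item> element.
    static List<String> addresses(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList items = (NodeList) xpath.evaluate("//item", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                // Relative expression: the current <item> is the context node.
                // A missing addr1 or addr2 simply contributes an empty string.
                result.add(xpath.evaluate("concat(addr1, ' ', addr2)", items.item(i)));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<response><body><items>"
                + "<item><addr1>America</addr1><addr2>(Atlanta)</addr2></item>"
                + "<item><addr1>Canada</addr1><addr2>(Toronto)</addr2></item>"
                + "<item><addr1>France</addr1><addr2>(Paris)</addr2></item>"
                + "</items></body></response>";
        List<String> lines = addresses(xml);
        System.out.println("length:" + lines.size());
        lines.forEach(System.out::println);
    }
}
```

Run against the sample document, this prints length:3 followed by one "country (city)" line per item.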

XPath 1.0 has a limited set of data types available: string, boolean, number, and node-set. Your desired answer is a sequence of three strings, which don't correspond to existing nodes, and there's no such thing in XPath 1.0 as a sequence of three strings.
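If you're in the Java world, there's really no reason to restrict yourself to XPath 1.0. XPath 2.0 extends the type system to allow a sequence of strings, so you can get your answer with an expression such as //item/concat(addr1, ' ', addr2) or //item/string-join(*, ' '). Note that the JDK's built-in javax.xml.xpath implementation only supports XPath 1.0, so this requires a third-party XPath 2.0 processor.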
XPath 2.0 has been around for more than ten years - time to move forward! You might also consider using a more modern tree model than DOM: JDOM2 and XOM are vastly easier to use.

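With Selenium WebDriver (assuming wd is a WebDriver that has loaded the XML), you can select the item elements directly:
List<WebElement> items = wd.findElements(By.xpath("//items/item"));
System.out.println("length: " + items.size());
items.forEach(item -> System.out.println(item.getText()));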
Output:
length: 3
America (Atlanta)
Canada (Toronto)
France (Paris)
You could also collect the results into a List or a Map.

Related

Check if element node contains no text using java and Xpath?

I am new to XPath. I need to get a boolean result from an XPath expression: if an element does not contain any text it should return false, otherwise true. I have seen many examples, but I don't have much time to learn XPath expressions. Below is the XML file.
<?xml version="1.0" encoding="UTF-8" ?>
<order id="1234" date="05/06/2013">
<customer first_name="James" last_name="Rorrison">
<email>j.rorri@me.com</email>
<phoneNumber>+44 1234 1234</phoneNumber>
</customer>
<content>
<order_line item="H2G2" quantity="1">
<unit_price>23.5</unit_price>
</order_line>
<order_line item="Harry Potter" quantity="2">
<unit_price></unit_price> <!-- I want false here -->
</order_line>
</content>
<credit_card number="1357" expiry_date="10/13" control_number="234" type="Visa" />
</order>
Could you point me in the right direction to create an XPath expression for this problem?
What I want is an expression (a dummy expression) like the one below.
/order/content/order_line/unit_price[at this point I want to put a validation which will return true or false based on some isNull or notNull check].

The following XPath will do this:
not(boolean(//*[not(text() or *)]))
However, this XPath will also match the credit_card node, since it too contains no text (attributes are not text()).
If you also want to exclude nodes with attributes, use this:
not(boolean(//*[not(text() or * or @*)]))
Following your edit, you can do this:
/order/content/order_line/unit_price[not(text())]
It returns the list of unit_price nodes with no text, and you can then test the count of nodes.
Or, to return true/false:
not(boolean(/order/content/order_line/unit_price[not(text())]))
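To get one true/false answer per unit_price element from Java, one option is to select the elements first and then evaluate a boolean expression relative to each. The sketch below uses normalize-space(.) != '' rather than text(), which is a deliberate substitution: it also treats whitespace-only content as empty. The class name UnitPriceCheck and the method unitPricesHaveText are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class UnitPriceCheck {

    // Hypothetical helper: one Boolean per unit_price, true when it has non-blank text.
    static List<Boolean> unitPricesHaveText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList prices = (NodeList) xpath.evaluate(
                    "/order/content/order_line/unit_price", doc, XPathConstants.NODESET);
            List<Boolean> result = new ArrayList<>();
            for (int i = 0; i < prices.getLength(); i++) {
                // XPathConstants.BOOLEAN makes evaluate() return a java.lang.Boolean.
                result.add((Boolean) xpath.evaluate(
                        "normalize-space(.) != ''", prices.item(i), XPathConstants.BOOLEAN));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<order><content>"
                + "<order_line item=\"H2G2\"><unit_price>23.5</unit_price></order_line>"
                + "<order_line item=\"Harry Potter\"><unit_price></unit_price></order_line>"
                + "</content></order>";
        System.out.println(unitPricesHaveText(xml));
    }
}
```

For the two order lines in the question, this prints [true, false].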

How to get inner text from children XML tags using jdom2?

My XML file is structured like so:
<parent xml:space="preserve">
Hello, my name is
<variable type="firstname">ABC</variable>
and my last name is
<variable type="lastname">XYZ</variable>
</parent>
I need a way to get the text output in this format:
"Hello, my name is ABC and my last name is XYZ".
The issue with jdom2 is that the element.getText() method returns only the element's own text as a single string (with no regard to the position of the child tags):
"Hello, my name is and my last name is".
Is there any way I can get the position of the child tags or delimit them, so that even a manual variable insert can be done at some later point?

Edit: this example uses the DOM API with the Xerces parser that ships with the Java runtime. For a JDOM2 solution see the answer from rolfl.
As a starting point you could use the following snippet. Depending on what you really want to achieve, you will need to adapt it yourself.
String xml = "<parent xml:space=\"preserve\">\n"
        + "Hello, my name is\n"
        + " <variable type=\"firstname\">ABC</variable>\n"
        + "and my last name is \n"
        + " <variable type=\"lastname\">XYZ</variable>\n"
        + "</parent>";
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
// Name the charset explicitly instead of relying on the platform default.
Document document = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList nodeList = (NodeList) xPath.compile("//parent").evaluate(document, XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); i++) {
    // getTextContent() concatenates the text of all descendants in document order.
    System.out.println(nodeList.item(i).getTextContent());
}
Output:
Hello, my name is
ABC
and my last name is
XYZ
Note: the snippet is not optimised. Treat it as a proof of concept.
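The output above still contains the line breaks from the source. If the goal is the single-line form from the question, one possible approach is to let XPath collapse the whitespace with normalize-space(). This is only a sketch; the class name ParentText and the method flatText are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ParentText {

    // Hypothetical helper: string value of <parent> with whitespace runs collapsed.
    static String flatText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // normalize-space() trims both ends and turns each whitespace run into one space.
            return XPathFactory.newInstance().newXPath()
                    .evaluate("normalize-space(/parent)", doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<parent xml:space=\"preserve\">\n"
                + "Hello, my name is\n"
                + " <variable type=\"firstname\">ABC</variable>\n"
                + "and my last name is \n"
                + " <variable type=\"lastname\">XYZ</variable>\n"
                + "</parent>";
        System.out.println(flatText(xml));
    }
}
```

With the sample document this prints: Hello, my name is ABC and my last name is XYZ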

getText is specified in JDOM to return the immediate Text content of the element. JDOM also has the method getValue() which returns:
Returns the XPath 1.0 string value of this element, which is the complete, ordered content of all text node descendants of this element (i.e. the text that's left after all references are resolved and all other markup is stripped out.)
Applying this to your document:
Document doc = new SAXBuilder().build("parentwtext.xml");
Element root = doc.getRootElement();
System.out.println(root.getValue());
I get the output (there's an empty line at the beginning I can't show in here):
Hello, my name is
ABC
and my last name is
XYZ

Reading XML tag from MediaWiki using Java

I need to read the output of the 'search' tag from the following URL using Java.
First I need to read the XML into a string from this URL:
http://en.wikipedia.org/w/api.php?format=xml&action=query&list=search&srlimit=1&srsearch=big+brother
I should end up having this:
<api>
<query-continue>
<search sroffset="1"/>
</query-continue>
<query>
<searchinfo totalhits="55180"/>
<search>
<p ns="0" title="Big Brothers Big Sisters of America" snippet="<span class='searchmatch'>Big</span> <span class='searchmatch'>Brothers</span> <span class='searchmatch'>Big</span> Sisters of America is a 501(c)(3) non-profit organization whose goal is to help all children reach their potential through <b>...</b> " size="13008" wordcount="1906" timestamp="2014-04-15T06:46:01Z"/>
</search>
</query>
</api>
Once I have the XML, I need to get the content of the search tag.
The 'search' tag looks like this, and I need to get two parts from the element in the middle:
<search>
<p ns="0" title="Big Brothers Big Sisters of America" snippet="<span class='searchmatch'>Big</span> <span class='searchmatch'>Brothers</span> <span class='searchmatch'>Big</span> Sisters of America is a 501(c)(3) non-profit organization whose goal is to help all children reach their potential through <b>...</b> " size="13008" wordcount="1906" timestamp="2014-04-15T06:46:01Z"/>
</search>
In the end, all I need is two strings, equal to this:
String title = "Big Brothers Big Sisters of America"
String snippet = "<span class='searchmatch'>Big..."
Can someone please help me amend this code? I am not sure what I am doing wrong. I don't think it's even retrieving the XML from the URL, much less the tags inside the XML.
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("http://en.wikipedia.org/w/api.php?format=xml&action=query&list=search&srlimit=1&srsearch=big+brother");
doc.getDocumentElement().normalize();
XPathFactory xFactory = XPathFactory.newInstance();
XPath xpath = xFactory.newXPath();
XPathExpression expr = xpath.compile("//query/search/text()");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
    System.out.println(nodes.item(i).getNodeValue());
}
Sorry, I am a newbie and can't find the answer to this anywhere.

The main problem here is that you're asking for text nodes that are children of <search>, but in fact the <p ..> that you want is not a text node: it's an element. (In fact, the <search> element has no text node children, as you can tell when you view the response from that URL using "View Source".)
So what you want to do is change your XPath expression to
//query/search/p
which will give you the p element node. Then ask for the value of this node's two attributes title and snippet in your Java code:
Element e = (Element)(nodes.item(i));
String title = e.getAttribute("title");
String snippet = e.getAttribute("snippet");
Or, you could do two XPath queries, one for each attribute:
//query/search/p/@title
and
//query/search/p/@snippet
assuming there will only be one <p> element. If you were doing this over multiple <p> elements, you'd probably want to keep each pair of attributes together instead of having two separate lists of results.
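To keep each title together with its snippet when there are several p elements, a sketch along these lines may help. It parses an XML string instead of calling the Wikipedia URL, the sample data is made up, and SearchHits and titlesAndSnippets are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SearchHits {

    // Hypothetical helper: one {title, snippet} pair per <p> element under <search>.
    static List<String[]> titlesAndSnippets(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(
                    "//query/search/p", doc, XPathConstants.NODESET);
            List<String[]> rows = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                Element p = (Element) hits.item(i);
                // Reading both attributes from the same element keeps the pair together.
                rows.add(new String[] {p.getAttribute("title"), p.getAttribute("snippet")});
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample; markup inside an attribute value must be entity-escaped.
        String xml = "<api><query><search>"
                + "<p title=\"First\" snippet=\"&lt;b&gt;one&lt;/b&gt;\"/>"
                + "<p title=\"Second\" snippet=\"two\"/>"
                + "</search></query></api>";
        for (String[] row : titlesAndSnippets(xml)) {
            System.out.println(row[0] + " -> " + row[1]);
        }
    }
}
```

The parser unescapes the attribute values, so the first row's snippet comes back as `<b>one</b>`.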

XPath - Get id attribute from parent element

I have the following XML file:
<diagnostic version="1.0">
<!-- diagnostic panel 1 -->
<panel xml:id="0">
<!-- list controls -->
<control xml:id="0_0">
<settings description="text 1"/>
</control>
<control xml:id="0_1">
<settings description="text 2"/>
</control>
</panel>
<panel xml:id="1">
<!-- list controls -->
<control xml:id="1_0">
<settings description="text 3"/>
</control>
<control xml:id="1_1">
<settings description="text 4"/>
</control>
</panel>
</diagnostic>
and this XPath expression:
//*[not(@description='-')]/@description
and this Java code:
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse("diagnostic.xml");
XPath xpath = XPathFactory.newInstance().newXPath();
// XPath Query for showing all nodes value
XPathExpression expr = xpath.compile("//*[not(@description='-')]/@description");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
    System.out.println(i + ": " + nodes.item(i).getParentNode() + nodes.item(i).getNodeValue());
}
This XPath expression returns the values of all description attributes whose value is not '-'.
Output:
text 1
text 2
text 3
text 4
But alongside each description attribute I also need the xml:id attribute of the enclosing control element.
Output:
0_0 text 1
0_1 text 2
1_0 text 3
1_1 text 4
How can I make the result also include the xml:id of the control element? I need to know which control each description belongs to.

Someone correct me if I'm wrong, but I don't think this can be done with a single XPath expression. The concat function returns a single text result, not a list. I suggest you run multiple XPath expressions and construct your results from that, or run a single XPath expression to get the settings elements you need, then take the description attribute from it and concatenate it with the xml:id attribute from the parent element if that's a control one.
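Nodes keep references to their parents. Use the method getParentNode() to obtain the parent of an element; for an attribute node, use Attr.getOwnerElement() instead, because getParentNode() returns null for attributes in the W3C DOM.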
Here's an alternative: run this XPath expression...
//control[settings[@description!='-']]/@xml:id | //control/settings[@description!='-']/@description
... and then concatenate the text of the alternating results in the returned node list. In other words, text from item 0 + item 1, text from item 2 + item 3 etc.
The above XPath expression will return this node list:
0_0
text 1
0_1
text 2
1_0
text 3
1_1
text 4
You can then parse through that list and construct your results.
Be careful. This will only work if there's at most 1 settings element per control element. Also, you may find that on evaluation the XPath engine throws an error for that xml: prefix. It may say that it's unknown. You might have to bind that prefix to the correct namespace first. Since the xml prefix is reserved and bound by default to a specific namespace, this might not be needed. I'm not certain as I haven't used it before.
I've tested the expression in XMLSpy. It's not entirely impossible that the XPath engine used in Java (or the one you set for use) returns the nodes in another order. It might evaluate both parts of the "or" (the pipe symbol) separately and then dump the results into a single node list. I don't know what the XPath spec mandates regarding result ordering.

I may be just as wrong, but the nodes you traverse in the result are the XML nodes themselves. Your code sample is almost there:
- nodes.item(i) points to the attribute "description".
- ((Attr) nodes.item(i)).getOwnerElement() points to the tag "settings". (In the W3C DOM, getParentNode() returns null for an attribute node, so the owner element is the way up.)
- Calling getParentNode() on that "settings" element then points to the tag "control" (class Element). You could then use getAttribute() or getAttributeNS() on that node to get the attribute you need.
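A variation on this idea, sketched under a few assumptions: select the settings elements rather than the attributes, so that getParentNode() can be used directly, and read xml:id by its qualified name so that no namespace prefix has to be resolved inside the XPath expression. The parser here is left at its default (not namespace-aware) setting, the XML is inlined as a string, and ControlDescriptions and idsWithDescriptions are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ControlDescriptions {

    // Hypothetical helper: "<control xml:id> <description>" for each matching settings element.
    static List<String> idsWithDescriptions(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList settings = (NodeList) xpath.evaluate(
                    "//control/settings[@description != '-']", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < settings.getLength(); i++) {
                Element s = (Element) settings.item(i);
                // The XPath guarantees the parent of each match is a <control> element.
                Element control = (Element) s.getParentNode();
                result.add(control.getAttribute("xml:id") + " " + s.getAttribute("description"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<diagnostic version=\"1.0\">"
                + "<panel xml:id=\"0\">"
                + "<control xml:id=\"0_0\"><settings description=\"text 1\"/></control>"
                + "<control xml:id=\"0_1\"><settings description=\"text 2\"/></control>"
                + "</panel>"
                + "<panel xml:id=\"1\">"
                + "<control xml:id=\"1_0\"><settings description=\"text 3\"/></control>"
                + "<control xml:id=\"1_1\"><settings description=\"-\"/></control>"
                + "</panel>"
                + "</diagnostic>";
        idsWithDescriptions(xml).forEach(System.out::println);
    }
}
```

In this sample the last control has description '-' and is filtered out, so three lines are printed: 0_0 text 1, 0_1 text 2 and 1_0 text 3.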

Parsing Tree-structure into Relational-style data-store

Would someone be able to help with how I could implement this, or at least with the algorithm to use?
What I am trying to do is parse a hierarchical/tree-structured file into a relational store. I will explain further below, with an example.
This is a sample source file, just a simple/non-realistic example for purposes of this question.
<title text="title1">
  <comment id="comment1">
    <data> this is part of comment one</data>
    <data> this is some more of comment one</data>
  </comment>
  <comment id="comment2">
    <data> this is part of comment two</data>
    <data> this is some more of comment two</data>
    <data> this is even some more of comment two</data>
  </comment>
</title>
So the main thing to note here is that the number of <comment>, and the number of <data> elements for each comment may be arbitrary. So given the above, I would want to transform into something looking like:
title  | comment  | data
-------+----------+---------------------------------------
title1 | comment1 | this is part of comment one
title1 | comment1 | this is some more of comment one
title1 | comment2 | this is part of comment two
title1 | comment2 | this is some more of comment two
title1 | comment2 | this is even some more of comment two
To make this happen, let's say I have specified the relational schema in the following manner, using XPath expressions that can be evaluated on the source file.
attribute1: title = /title/@text
attribute2: comment = /title/comment/@id
attribute3: data = /title/comment/data/text()
Suggested Data-structures:
ResultSet is a List<Map<String,String>> (where: each map represents a single row)
Schema is a Map<String,String> (where: we map attribute-name --> path expression)
Source file, some DOM Document

I'm not sure whether you're asking how to implement the XML parser itself or how, given a parse tree for the XML, to flatten it into a relational structure. I'm guessing that you're looking at the latter of these now (there are many good XML parsers out there and I doubt that's the bottleneck), so I'll answer that here. Let me know if you're actually interested in the XML parsing detail and I can update the answer.
I believe that the way you want to think about this is with a recursive descent over the tree. The idea is as follows: your naming system consists of the concatenation of all the nodes above you in the tree followed by your own name. Given that, you could run a recursive DFS over the tree using something like this:
FlattenXML(XMLDocument x) {
    for each top-level XML node t, whose name is n:
        RecFlattenTree(t, "/" + n);
}
RecFlattenTree(Tree t, String prefix) {
    if t is a leaf with data d:
        update the master table by adding (prefix, d) to the list of entries
    else
        for each child c of t, whose name is x:
            RecFlattenTree(c, prefix + "/" + x)
}
For example, if you were to trace this over the XML document you had up top, it might go something like this:
RecFlattenTree(title1, "/title1")
    RecFlattenTree(comment1, "/title1/comment1")
        RecFlattenTree(data node 1, "/title1/comment1/data")
            Add /title1/comment1/data, value = "this is part of comment one"
        RecFlattenTree(data node 2, "/title1/comment1/data")
            Add /title1/comment1/data, value = "this is some more of comment one"
    RecFlattenTree(comment2, "/title1/comment2")
        RecFlattenTree(data node 1, "/title1/comment2/data")
            Add /title1/comment2/data, value = "this is part of comment two"
        RecFlattenTree(data node 2, "/title1/comment2/data")
            Add /title1/comment2/data, value = "this is some more of comment two"
        RecFlattenTree(data node 3, "/title1/comment2/data")
            Add /title1/comment2/data, value = "this is even some more of comment two"
Which ends up generating the list
/title1/comment1/data, value = "this is part of comment one"
/title1/comment1/data, value = "this is some more of comment one"
/title1/comment2/data, value = "this is part of comment two"
/title1/comment2/data, value = "this is some more of comment two"
/title1/comment2/data, value = "this is even some more of comment two"
Which is exactly what you want.
Hope this helps! Let me know if I misinterpreted your question!
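For what it's worth, the recursive descent above might look roughly like this in Java on top of the standard DOM API. This is a sketch, not a definitive implementation: the naming rule in label() (use the text or id attribute when present, otherwise the tag name) is an assumption made to reproduce the paths in the trace, and FlattenTree, label, flatten and flattenXml are all hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FlattenTree {

    // Assumed naming rule: "text" or "id" attribute when present, else the tag name.
    static String label(Element e) {
        if (e.hasAttribute("text")) return e.getAttribute("text");
        if (e.hasAttribute("id")) return e.getAttribute("id");
        return e.getTagName();
    }

    // Depth-first walk; elements without child elements are leaves and produce one row.
    static void flatten(Element e, String prefix, List<String[]> rows) {
        boolean leaf = true;
        NodeList children = e.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                leaf = false;
                Element c = (Element) child;
                flatten(c, prefix + "/" + label(c), rows);
            }
        }
        if (leaf) {
            rows.add(new String[] {prefix, e.getTextContent().trim()});
        }
    }

    // Parses the XML and returns {path, value} rows in document order.
    static List<String[]> flattenXml(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            List<String[]> rows = new ArrayList<>();
            flatten(root, "/" + label(root), rows);
            return rows;
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse the XML", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<title text=\"title1\">"
                + "<comment id=\"comment1\"><data> this is part of comment one</data></comment>"
                + "<comment id=\"comment2\"><data> this is part of comment two</data></comment>"
                + "</title>";
        for (String[] row : flattenXml(xml)) {
            System.out.println(row[0] + ", value = \"" + row[1] + "\"");
        }
    }
}
```

Splitting each path on "/" afterwards gives the title, comment and data columns of the target table.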
