Get XML nodes from certain tree level - java

I need a method like Document.getElementsByTagName(), but one that searches only tags at a certain level (i.e., not nested tags with the same name).
Example file:
<script>
<something>
<findme></findme><!-- DO NOT FIND THIS TAG -->
</something>
<findme></findme><!-- FIND THIS TAG -->
</script>
Document.getElementsByTagName() simply returns all findme tags in the document.

Here is an example with XPath
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
public class TestXPath {
    private static final String FILE = "a.xml";
    private static final String XPATH = "/script/findme";
    public static void main(String[] args) {
        DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
        docFactory.setNamespaceAware(true);
        DocumentBuilder builder;
        try {
            builder = docFactory.newDocumentBuilder();
            Document doc = builder.parse(FILE);
            XPathExpression expr = XPathFactory.newInstance().newXPath().compile(XPATH);
            Object hits = expr.evaluate(doc, XPathConstants.NODESET);
            if (hits instanceof NodeList) {
                NodeList list = (NodeList) hits;
                for (int i = 0; i < list.getLength(); i++) {
                    System.out.println(list.item(i).getTextContent());
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
With
<script>
<something>
<findme>1</findme><!-- DO NOT FIND THIS TAG -->
</something>
<findme>Find this</findme><!-- FIND THIS TAG -->
<findme>More of this</findme><!-- FIND THIS TAG AS WELL -->
</script>
It yields
Find this
More of this

If you're using DOM, the only way I can think of would be a recursive function that looks at the children of each element.
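A minimal sketch of that recursive DOM approach, assuming the sample document from the question. The class LevelFilter and its methods are hypothetical names for illustration, not part of the DOM API; a depth of 1 means the direct children of the starting element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class LevelFilter {
    // Hypothetical helper: collects elements named `name` that sit exactly
    // `depth` levels below `current` (depth 1 = direct children).
    static void collect(Element current, String name, int depth, List<Element> out) {
        for (Node n = current.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip text nodes, comments, etc.
            }
            Element child = (Element) n;
            if (depth > 1) {
                collect(child, name, depth - 1, out); // descend one level
            } else if (name.equals(child.getTagName())) {
                out.add(child);
            }
        }
    }

    // Parses `xml` and returns the text of the matches below the root element.
    static List<String> textsAtDepth(String xml, String name, int depth) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            List<Element> hits = new ArrayList<>();
            collect(root, name, depth, hits);
            List<String> texts = new ArrayList<>();
            for (Element hit : hits) {
                texts.add(hit.getTextContent());
            }
            return texts;
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<script><something><findme>1</findme></something>"
                + "<findme>Find this</findme><findme>More of this</findme></script>";
        System.out.println(textsAtDepth(xml, "findme", 1)); // [Find this, More of this]
        System.out.println(textsAtDepth(xml, "findme", 2)); // [1]
    }
}
```

Because the recursion stops at the requested depth, deeper findme elements are never visited, which is the difference from getElementsByTagName().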

Use:
/*/findme
This XPath expression selects all findme elements that are children of the top element of the XML document.
This expression:
//findme[count(ancestor::*) = 5]
selects all findme elements in the XML document that have exactly five ancestor elements, that is, the findme elements at "level 6".
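As a rough illustration, both kinds of expression can be evaluated with the JDK's javax.xml.xpath API. The class LevelXPath and its select method are hypothetical names, and the sample document is the one from the question; there the nested findme has two ancestors, so the count is 2 rather than 5.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LevelXPath {
    // Hypothetical helper: evaluates `expression` against `xml` and returns
    // the text content of every selected node, in document order.
    static List<String> select(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<script><something><findme>1</findme></something>"
                + "<findme>Find this</findme><findme>More of this</findme></script>";
        // Children of the top element only.
        System.out.println(select(xml, "/*/findme"));
        // Elements with exactly two ancestors (script and something).
        System.out.println(select(xml, "//findme[count(ancestor::*) = 2]"));
    }
}
```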

Related

java search specific attribute name in the xml file

I would like to search my XML file for all attributes called name, without going through the element tag nodes:
xml :
<test 1><test1/>
<test2> <test2/>
<test 3 id="aaa"> </test3>
<test 5> </test5>
<test 6 id="bbb" name="ijof"> </test6>
JAVA :
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new File(path));
root = document.getDocumentElement();
String attribut = root.getAttribute("name");
System.out.println(attribut); // Expected ijof
Did you execute your code at least once? I don't think so. Otherwise you would surely have noticed that your XML cannot be parsed.
There are several flaws in your example XML:
No root element.
Wrong end tags: It should be <test1></test1> and not <test1><test1/>.
Element names must not contain whitespace and start and end tag must match. It should be <test5> </test5> and not <test 5> </test5>
Apart from that, you can use XPath to get all elements with a name attribute.
Here is a complete example with the XML as a string but this should be irrelevant:
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import java.io.IOException;
import java.io.StringReader;
public class FindNameAttribute {
    private static final String XML =
            "<root>\n" +
            " <test1></test1>\n" +
            " <test2> </test2>\n" +
            " <test3 id=\"aaa\"> </test3>\n" +
            " <test4 name=\"4\"/>\n" +
            " <test5> </test5>\n" +
            " <test6 id=\"bbb\" name=\"ijof\"> </test6>\n" +
            " <test7 id=\"bbb\"><child name=\"childname\"/> </test7>\n" +
            "</root>\n";
    public static void main(String[] args) {
        System.out.println(XML);
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = null;
        try {
            builder = factory.newDocumentBuilder();
            StringReader reader = new StringReader(XML);
            InputSource source = new InputSource(reader);
            Document document = builder.parse(source);
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Select every element that carries a name attribute
            NodeList nodes = (NodeList) xpath.evaluate("//*[@name]", document, XPathConstants.NODESET);
            for (int i = 0; i < nodes.getLength(); i++) {
                Element el = (Element) nodes.item(i);
                String elementName = el.getTagName();
                String nameAttribute = el.getAttribute("name");
                System.out.println(String.format("Element name: %s, name attribute: %s", elementName, nameAttribute));
            }
        } catch (ParserConfigurationException | SAXException | IOException | XPathExpressionException e) {
            e.printStackTrace();
        }
    }
}
This is the output:
<root>
<test1></test1>
<test2> </test2>
<test3 id="aaa"> </test3>
<test4 name="4"/>
<test5> </test5>
<test6 id="bbb" name="ijof"> </test6>
<test7 id="bbb"><child name="childname"/> </test7>
</root>
Element name: test4, name attribute: 4
Element name: test6, name attribute: ijof
Element name: child, name attribute: childname
The relevant XPath expression is: //*[@name]
//: Looks for every element in the document
*: Placeholder for the element name. Every name matches.
*[@name]: The [] denotes the predicate. We only want elements with a name attribute.
@: Means the following name is the name of an attribute. Without it, the name would be interpreted as an element name.
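If only the attributes themselves are of interest, the attribute nodes can also be selected directly with //@name. The following is a small sketch under that assumption; the class AttributeQuery and its method are hypothetical names, and the sample XML is a shortened version of the one above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AttributeQuery {
    // Hypothetical helper: returns "ownerElement=value" for every attribute
    // called name, wherever it occurs in the document.
    static List<String> nameAttributes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // //@name selects attribute nodes, not the elements that carry them.
            NodeList attrs = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//@name", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < attrs.getLength(); i++) {
                Attr attr = (Attr) attrs.item(i);
                result.add(attr.getOwnerElement().getTagName() + "=" + attr.getValue());
            }
            return result;
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><test3 id=\"aaa\"/><test4 name=\"4\"/>"
                + "<test6 id=\"bbb\" name=\"ijof\"/>"
                + "<test7 id=\"bbb\"><child name=\"childname\"/></test7></root>";
        System.out.println(nameAttributes(xml));
    }
}
```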

xpath to get position of childnode against the parent in XML

I need to loop through all child nodes of the parent node (callEvents) and record the position of each child node (first call event, second call event, ...). I'm stuck on how to get the position of each child node within the parent node using Java/XPath.
say I have the following xml:
<callEvents>
<gprsCall>...</gprsCall>
<gprsCall>...</gprsCall>
<mobileOrigintaedCall>...</mobileOrigintaedCall>
<gprsCall>...</gprsCall>
<gprsCall>...</gprsCall>
</callEvents>
So the code should return:
My first gprsCall position = 1
My second gprsCall position = 2
My first mobileOriginatedCall position = 3
My third gprsCall position = 4
My fourth gprsCall position = 5
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathConstants;
import java.io.StringReader;
import org.xml.sax.InputSource;
class myxml {
    static final String xml = "<callEvents>\n"
            + "<gprsCall>...</gprsCall>\n"
            + "<gprsCall>...</gprsCall>\n"
            + "<mobileOrigintaedCall>...</mobileOrigintaedCall>\n"
            + "<gprsCall>...</gprsCall>\n"
            + "<gprsCall>...</gprsCall>\n"
            + "</callEvents>";
    public static void main(String... args) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        XPathExpression xpath = XPathFactory.newInstance().newXPath().compile("/callEvents/*");
        NodeList nodelist = (NodeList) xpath.evaluate(doc, XPathConstants.NODESET);
        // The node list is in document order, so index + 1 is the position
        for (int i = 0; i < nodelist.getLength(); i++) {
            System.out.println("Position: " + (i + 1)
                    + ", name: " + nodelist.item(i).getNodeName());
        }
    }
}
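Iterate over the nodes selected by the XPath /callEvents/* and use the position() function to get each node's position.
This is an XSLT example. The div 2 presumably compensates for the whitespace text nodes between the elements, which are also counted when the children are processed by the default template rules: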
<xsl:template match="/callEvents/*">
<xsl:value-of select="concat(local-name(),' - ',position() div 2)" />
</xsl:template>
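In plain Java, a comparable result can be obtained by asking XPath for the number of preceding element siblings of each child. This is only a sketch: the class ChildPositions and its method are hypothetical names, and the expression count(preceding-sibling::*) + 1 is one possible way to express the position, not the only one.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildPositions {
    // Hypothetical helper: returns "name=position" for each element child of
    // callEvents. Whitespace text nodes are ignored because
    // preceding-sibling::* only counts elements.
    static List<String> positions(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList children = (NodeList) xpath.evaluate(
                    "/callEvents/*", doc, XPathConstants.NODESET);
            XPathExpression position = xpath.compile("count(preceding-sibling::*) + 1");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                Double pos = (Double) position.evaluate(child, XPathConstants.NUMBER);
                result.add(child.getNodeName() + "=" + pos.intValue());
            }
            return result;
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<callEvents>\n<gprsCall>a</gprsCall>\n<gprsCall>b</gprsCall>\n"
                + "<mobileOriginatedCall>c</mobileOriginatedCall>\n"
                + "<gprsCall>d</gprsCall>\n</callEvents>";
        System.out.println(positions(xml));
    }
}
```

Computing the position from the node itself, rather than from a loop index, also works when a single child is looked up in isolation.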

getElementsByTagName searching down all levels of XML nodes

I have this XML file:
<root>
<node1>
<name>A</name>
<node2>
<name>B</name>
<node3>
<name>C</name>
<number>001</number>
</node3>
</node2>
</node1>
</root>
I am parsing the file, to get the name for each node, and the corresponding number if existing.
I use:
String number = eElement.getElementsByTagName("number").item(0).getTextContent();
This should give me something like:
Name | Number
A |
B |
C | 001
But I get:
Name | Number
A | 001
B | 001
C | 001
So I think getElementsByTagName("number") is looking for a number node among all the descendants of a node. I don't want that. Does anybody know a workaround?
I thought of using XPath instead of the above method, but I really want to know if there's an existing way. Thanks
You can use the javax.xml.xpath APIs in the JDK/JRE to get much more control over which nodes are returned than getElementsByTagName offers.
import java.io.File;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;
public class Demo {
    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
        Document document = docBuilder.parse(new File("filename.xml"));
        XPathFactory xpathFactory = XPathFactory.newInstance();
        XPath xpath = xpathFactory.newXPath();
        Element element = (Element) xpath.evaluate("//node3/name", document, XPathConstants.NODE);
    }
}
Hope this helps,
import java.io.File;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class XML {
    public static void main(String[] args) throws IOException {
        File input = new File("D:\\sample.xml");
        Document doc = Jsoup.parse(input, "UTF-8");
        Elements allElements = doc.select("root");
        for (Element value : allElements) {
            System.out.println(value.text());
        }
        // Select only <number> elements that are direct children of <node3>
        // (Elements.tagName(String) would rename the matched elements instead)
        String node3Num = doc.select("node3 > number").text();
        System.out.println(node3Num);
    }
}
Output:
A B C 001
001
I have used jsoup-1.7.2 jar (you can download from jsoup.org)
Assuming that your eElement variable is always one of the <node1/>, <node2/>, … elements in question, then the following code should work when you replace your own snippet mentioned above:
String number = null;
NodeList childNodes = eElement.getChildNodes();
for (int i = 0; i < childNodes.getLength(); i++) {
    Node node = childNodes.item(i);
    if (node.getNodeType() == Node.ELEMENT_NODE
            && node.getNodeName().equals("number")) {
        number = node.getTextContent();
        break;
    }
}
The number variable will be null when there is no <number/> child; it will contain the number you need otherwise.
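The same restriction to direct children can be sketched with a relative XPath evaluated against each element. The class DirectNumber and its method are hypothetical names; unlike the loop above, XPath.evaluate returns an empty string rather than null when there is no number child.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DirectNumber {
    // Hypothetical helper: for every element that has a name child, returns
    // "name|number", where number is taken from a direct child only.
    static List<String> nameAndNumber(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList owners = (NodeList) xpath.evaluate("//*[name]", doc, XPathConstants.NODESET);
            List<String> rows = new ArrayList<>();
            for (int i = 0; i < owners.getLength(); i++) {
                Node owner = owners.item(i);
                // Relative paths without // only look at children of `owner`.
                String name = xpath.evaluate("name", owner);
                String number = xpath.evaluate("number", owner); // "" if absent
                rows.add(name + "|" + number);
            }
            return rows;
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><node1><name>A</name><node2><name>B</name>"
                + "<node3><name>C</name><number>001</number></node3></node2></node1></root>";
        System.out.println(nameAndNumber(xml));
    }
}
```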

using conditions in XPath expressions

I need to parse through an XML document in the database and search for a given expression in it. Then, I must return a String value if the given expression is present in the XML else I need to parse through the next expression and return another String value and so on.
I achieved this by using the following code:
// An xml document is passed as a Node when getEntryType() method is called
public static class XMLTextFields {
    public static String getEntryType(Node target) throws XPathExpressionException {
        XPathFactory factory = XPathFactory.newInstance();
        XPath xpath = factory.newXPath();
        String entryType = null;
        String[] expression = new String[] {"./libx:package", "./libx:libapp", "./libx:module"};
        String[] type = new String[] {"Package", "Libapp", "Module"};
        for (int i = 0; i < 3; i++) {
            XPathExpression expr = xpath.compile(expression[i]);
            Object result = expr.evaluate(target, XPathConstants.NODESET);
            NodeList nodes = (NodeList) result;
            if (nodes.getLength() == 0)
                continue;
            entryType = type[i];
        }
        return entryType;
    }
}
I am wondering if there is a simpler way to do this? Meaning, is there a way to use the "expression" like a function which returns a string if the expression is present in the xml.
I am guessing I should be able to do something like this but am not exactly sure:
String [] Expression = new String [] {"[./libx:package]\"Package\"", ....}
Meaning, return "Package" if libx:package node exists in the given XML
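If your XPath processor supports XPath 2.0, you can use if expressions: http://www.w3.org/TR/xpath20/#id-conditionals . Note that the XPath implementation bundled with the JDK (javax.xml.xpath) only supports XPath 1.0.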
You can use XSLT here. In XSLT you can check the node name by using
<xsl:value-of select="*[starts-with(name(),'libx:package')]" />
Or you can check using
<xsl:if test="name()='libx:package'">
<!-- Your custom elements here... -->
</xsl:if>
You can check for the existence of an element or attribute this way to validate specific needs.
Hope this helps.
Yes, there is. Just use an XPath function in your expression:
XPathExpression exp = xpath.compile("local-name(*[local-name() = 'package'])");
// will return "package" for any matching elements
exp.evaluate(target, XPathConstants.STRING);
But this will return "package" instead of "Package". Note the capital P.
Below is the Test code:
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import java.util.Map;
import java.util.HashMap;
public class Test {
private static Map<String, String> mappings = new HashMap<String, String>();
static {
mappings.put("package", "Package");
mappings.put("libapp", "Application");
mappings.put("module", "Module");
}
public static void main(String[] args) throws Throwable {
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
String entryType = null;
XPathExpression [] expressions = new XPathExpression[] {
xpath.compile("local-name(*[local-name() = 'package'])"),
xpath.compile("local-name(*[local-name() = 'libapp'])"),
xpath.compile("local-name(*[local-name() = 'module'])")
};
DocumentBuilderFactory fac = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = fac.newDocumentBuilder();
Document doc = parser.parse(args[0]);
for(int i = 0; i < expressions.length; i++) {
String found = (String) expressions[i].evaluate(doc.getDocumentElement(),
XPathConstants.STRING);
entryType = mappings.get(found);
if(entryType != null && !entryType.trim().isEmpty()) {
break;
}
}
System.out.println(entryType);
}
}
Contents of text file:
<?xml version="1.0"?>
<root xmlns:libx="urn:libex">
<libx:package>mypack</libx:package>
<libx:libapp>myapp</libx:libapp>
<libx:module>mymod</libx:module>
</root>
In XPath 1.0
concat(translate(substring(local-name(libx:package|libx:libapp|libx:module),
1,
1),
'plm',
'PLM'),
substring(local-name(libx:package|libx:libapp|libx:module),2))
EDIT: It was difficult to understand the path because no input sample was provided...
@ALL: Thanks!
I used:
XPathExpression expr = xpath.compile("concat(substring('Package',0,100*boolean(//libx:package))," + "substring('Libapp',0,100*boolean(//libx:libapp)),substring('Module',0,100*boolean(//libx:module)))");
expr.evaluate(target, XPathConstants.STRING);
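The trick in that expression is that boolean(...) becomes 1 or 0 when multiplied, so each substring call yields either the whole label or an empty string. A self-contained sketch of the same idea follows. The class EntryType is a hypothetical name, and the sketch matches on local-name() instead of the libx: prefix purely to avoid registering a NamespaceContext, which is an assumption that differs from the code above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EntryType {
    // 100 * boolean(...) is 100 when the node exists and 0 otherwise, so each
    // substring() returns either the full label or "".
    private static final String EXPR =
            "concat("
            + "substring('Package', 1, 100 * boolean(//*[local-name()='package'])),"
            + "substring('Libapp', 1, 100 * boolean(//*[local-name()='libapp'])),"
            + "substring('Module', 1, 100 * boolean(//*[local-name()='module'])))";

    // Hypothetical helper: returns the label for the entry found in `xml`.
    // If several entry kinds are present, their labels are concatenated.
    static String entryType(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // so local-name() drops the prefix
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(EXPR, doc);
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<root xmlns:libx=\"urn:libex\">"
                + "<libx:libapp>myapp</libx:libapp></root>";
        System.out.println(entryType(xml)); // Libapp
    }
}
```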

Getting XML Node text value with Java DOM

I can't fetch text value with Node.getNodeValue(), Node.getFirstChild().getNodeValue() or with Node.getTextContent().
My XML is like
<add job="351">
<tag>foobar</tag>
<tag>foobar2</tag>
</add>
And I'm trying to get the tag value (non-text element fetching works fine). My Java code looks like this:
Document doc = db.parse(new File(args[0]));
Node n = doc.getFirstChild();
NodeList nl = n.getChildNodes();
Node an,an2;
for (int i=0; i < nl.getLength(); i++) {
an = nl.item(i);
if(an.getNodeType()==Node.ELEMENT_NODE) {
NodeList nl2 = an.getChildNodes();
for(int i2=0; i2<nl2.getLength(); i2++) {
an2 = nl2.item(i2);
// DEBUG PRINTS
System.out.println(an2.getNodeName() + ": type (" + an2.getNodeType() + "):");
if(an2.hasChildNodes())
System.out.println(an2.getFirstChild().getTextContent());
if(an2.hasChildNodes())
System.out.println(an2.getFirstChild().getNodeValue());
System.out.println(an2.getTextContent());
System.out.println(an2.getNodeValue());
}
}
}
It prints out
tag type (1):
tag1
tag1
tag1
null
#text type (3):
_blank line_
_blank line_
...
Thanks for the help.
I'd print out the result of an2.getNodeName() as well for debugging purposes. My guess is that your tree crawling code isn't crawling to the nodes that you think it is. That suspicion is enhanced by the lack of checking for node names in your code.
Other than that, the javadoc for Node defines "getNodeValue()" to return null for Nodes of type Element. Therefore, you really should be using getTextContent(). I'm not sure why that wouldn't give you the text that you want.
Perhaps iterate the children of your tag node and see what types are there?
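To make the difference between these accessors concrete, here is a small sketch that reads the first tag element in three ways. The class TextValueDemo and its method are hypothetical names, and the XML is the sample from the question without whitespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TextValueDemo {
    // Hypothetical helper: returns the results of getNodeValue(),
    // getTextContent() and getFirstChild().getNodeValue() for the first
    // <tag> element, separated by '|'.
    static String describeFirstTag(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node tag = doc.getElementsByTagName("tag").item(0);
            return tag.getNodeValue()                    // null for element nodes
                    + "|" + tag.getTextContent()         // text of all descendants
                    + "|" + tag.getFirstChild().getNodeValue(); // value of the text node
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<add job=\"351\"><tag>foobar</tag><tag>foobar2</tag></add>";
        System.out.println(describeFirstTag(xml)); // null|foobar|foobar
    }
}
```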
Tried this code and it works for me:
String xml = "<add job=\"351\">\n" +
" <tag>foobar</tag>\n" +
" <tag>foobar2</tag>\n" +
"</add>";
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
ByteArrayInputStream bis = new ByteArrayInputStream(xml.getBytes());
Document doc = db.parse(bis);
Node n = doc.getFirstChild();
NodeList nl = n.getChildNodes();
Node an,an2;
for (int i=0; i < nl.getLength(); i++) {
an = nl.item(i);
if(an.getNodeType()==Node.ELEMENT_NODE) {
NodeList nl2 = an.getChildNodes();
for(int i2=0; i2<nl2.getLength(); i2++) {
an2 = nl2.item(i2);
// DEBUG PRINTS
System.out.println(an2.getNodeName() + ": type (" + an2.getNodeType() + "):");
if(an2.hasChildNodes()) System.out.println(an2.getFirstChild().getTextContent());
if(an2.hasChildNodes()) System.out.println(an2.getFirstChild().getNodeValue());
System.out.println(an2.getTextContent());
System.out.println(an2.getNodeValue());
}
}
}
Output was:
#text: type (3): foobar foobar
#text: type (3): foobar2 foobar2
If your XML goes quite deep, you might want to consider using XPath, which comes with your JRE, so you can access the contents far more easily using:
String text = xp.evaluate("//add[@job='351']/tag[position()=1]/text()",
document.getDocumentElement());
Full example:
import static org.junit.Assert.assertEquals;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.junit.Before;
import org.junit.Test;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
public class XPathTest {
    private Document document;
    @Before
    public void setup() throws Exception {
        String xml = "<add job=\"351\"><tag>foobar</tag><tag>foobar2</tag></add>";
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        document = db.parse(new InputSource(new StringReader(xml)));
    }
    @Test
    public void testXPath() throws Exception {
        XPathFactory xpf = XPathFactory.newInstance();
        XPath xp = xpf.newXPath();
        String text = xp.evaluate("//add[@job='351']/tag[position()=1]/text()",
                document.getDocumentElement());
        assertEquals("foobar", text);
    }
}
I use a very old Java version, JDK 1.4.08, and I had the same issue. The Node class there does not have the getTextContent() method. I had to use Node.getFirstChild().getNodeValue() instead of Node.getNodeValue() to get the value of the node. This fixed it for me.
If you are open to vtd-xml, which excels at both performance and memory efficiency, below is the code to do what you are looking for, using both XPath and manual navigation. The overall code is much more concise and easier to understand.
import com.ximpleware.*;
public class queryText {
public static void main(String[] s) throws VTDException{
VTDGen vg = new VTDGen();
if (!vg.parseFile("input.xml", true))
return;
VTDNav vn = vg.getNav();
AutoPilot ap = new AutoPilot(vn);
// first manually navigate
if(vn.toElement(VTDNav.FC,"tag")){
int i= vn.getText();
if (i!=-1){
System.out.println("text ===>"+vn.toString(i));
}
if (vn.toElement(VTDNav.NS,"tag")){
i=vn.getText();
System.out.println("text ===>"+vn.toString(i));
}
}
// second version use XPath
ap.selectXPath("/add/tag/text()");
int i=0;
while((i=ap.evalXPath())!= -1){
System.out.println("text node ====>"+vn.toString(i));
}
}
}
