Java XML - nested elements with same name - java

How can I reach elements that share the same name and are nested recursively, using Java XML? This worked with Python's ElementTree, but I need to get it running in Java.
I have tried:
String filepath = ("file.xml");
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
Document doc = docBuilder.parse(filepath);
NodeList nl = doc.getElementsByTagName("*/*/foo");
Example
<foo>
<foo>
<foo>
</foo>
</foo>
</foo>

You seem to be under the impression that getElementsByTagName takes an XPath expression. It doesn't. As documented:
Returns a NodeList of all the Elements in document order with a given tag name and are contained in the document.
If you need to use XPath, you should look at the javax.xml.xpath package. Sample code:
XPath xpath = XPathFactory.newInstance().newXPath();
Object set = xpath.evaluate("*/*/foo", doc, XPathConstants.NODESET);
NodeList list = (NodeList) set;
int count = list.getLength();
for (int i = 0; i < count; i++) {
Node node = list.item(i);
// Handle the node
}
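For reference, here is a minimal self-contained sketch of the same idea applied to the nested foo example from the question. The class name NestedFooDemo and the helper countMatches are made up for illustration, and the XML is passed in as a string rather than read from file.xml.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; the name and the helper are illustrative only.
final class NestedFooDemo {

    // Parses the XML text and returns how many nodes the XPath expression selects.
    static int countMatches(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList list = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return list.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<foo><foo><foo></foo></foo></foo>";
        // Relative to the document node, */*/foo reaches only the innermost foo.
        System.out.println(countMatches(xml, "*/*/foo"));
        // //foo selects every foo element at any depth.
        System.out.println(countMatches(xml, "//foo"));
    }
}
```

If every foo is needed regardless of depth, getElementsByTagName("foo") would also do; XPath is only required when the position in the tree matters.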

Related

parse xml using dom java

I have the XML below:
<modelingOutput>
<listOfTopics>
<topic id="1">
<token id="354">wish</token>
</topic>
</listOfTopics>
<rankedDocs>
<topic id="1">
<documents>
<document id="1" numWords="0"/>
<document id="2" numWords="1"/>
<document id="3" numWords="2"/>
</documents>
</topic>
</rankedDocs>
<listOfDocs>
<documents>
<document id="1">
<topic id="1" percentage="4.790644689978203%"/>
<topic id="2" percentage="11.427632949428334%"/>
<topic id="3" percentage="17.86913349249596%"/>
</document>
</documents>
</listOfDocs>
</modelingOutput>
I want to parse this XML file and get the topic id and percentage from listOfDocs.
My first approach is to get all document elements from the XML and then check whether the grandparent node is listOfDocs.
But the document element exists in both rankedDocs and listOfDocs, so I end up with a very large list.
Is there a better way to parse this XML that avoids the if statement?
My code:
public void parse(){
Document dom = null;
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(xml));
dom = db.parse(is);
Element doc = dom.getDocumentElement();
NodeList documentnl = doc.getElementsByTagName("document");
for (int i = 1; i <= documentnl.getLength(); i++) {
Node item = documentnl.item(i);
Node parentNode = item.getParentNode();
Node grandpNode = parentNode.getParentNode();
if(grandpNode.getNodeName() == "listOfDocs"){
//get value
}
}
}
First, when checking the node name you shouldn't compare Strings using ==. Always use the equals method instead.
You can use XPath to evaluate only the document topic elements under listOfDocs:
XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xPath = xPathFactory.newXPath();
XPathExpression xPathExpression = xPath.compile("//listOfDocs//document/topic");
NodeList topicnl = (NodeList) xPathExpression.evaluate(dom, XPathConstants.NODESET);
for(int i = 0; i < topicnl.getLength(); i++) {
...
If you do not want to use the if statement you can use XPath to get the element you need directly.
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("source.xml");
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile("/*/listOfDocs/documents/document/topic");
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getAttributes().getNamedItem("id"));
System.out.println(nodes.item(i).getAttributes().getNamedItem("percentage"));
}
Please check GitHub project here.
Hope this helps.
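As a rough sketch of what reading the values (rather than printing the attribute nodes) might look like: the class TopicPercentages and its read method are hypothetical names, and the code assumes a single document element under listOfDocs, as in the sample, so that topic ids do not collide in the map.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; names are illustrative and not part of any library.
final class TopicPercentages {

    // Returns topic id -> percentage for the topic elements under listOfDocs only.
    static Map<String, Double> read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList topics = (NodeList) xpath.evaluate(
                    "/modelingOutput/listOfDocs/documents/document/topic",
                    doc, XPathConstants.NODESET);

            Map<String, Double> result = new LinkedHashMap<>();
            for (int i = 0; i < topics.getLength(); i++) {
                Element topic = (Element) topics.item(i);
                // The attribute looks like "4.79%"; drop the percent sign before parsing.
                String raw = topic.getAttribute("percentage").replace("%", "");
                result.put(topic.getAttribute("id"), Double.parseDouble(raw));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read topic percentages", e);
        }
    }
}
```

Because the path starts at /modelingOutput/listOfDocs, the topic elements under rankedDocs are never selected, so no grandparent check is needed.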
I like to use XMLBeam for such tasks:
public class Answer {
@XBDocURL("resource://data.xml")
public interface DataProjection {
public interface Topic {
@XBRead("./@id")
int getID();
@XBRead("./@percentage")
String getPercentage();
}
@XBRead("/modelingOutput/listOfDocs//document/topic")
List<Topic> getTopics();
}
public static void main(final String[] args) throws IOException {
final DataProjection dataProjection = new XBProjector().io().fromURLAnnotation(DataProjection.class);
for (DataProjection.Topic topic : dataProjection.getTopics()) {
System.out.println(topic.getID() + ": " + topic.getPercentage());
}
}
}
There is even a convenient way to convert the percentage to float or double. Tell me if you would like to have an example.

How to print values within XML tag in java [duplicate]

I have never really known how to work with XML tags. How do I traverse the nodes and print a particular node in the XML? Below is the XML file.
<Employees>
<Employee>
<Gender></Gender>
<Name>
<Firstname></Firstname>
<Lastname></Lastname>
</Name>
<Email></Email>
<Projects>
<Project></Project>
</Projects>
<PhoneNumbers>
<Home></Home>
<Office></Office>
</PhoneNumbers>
</Employee>
</Employees>
There is no data, but this is the structure. I am using the following code to parse it partially.
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(false);
DocumentBuilder builder = factory.newDocumentBuilder();
Document xmlDocument = builder.parse("employees.xml");
System.out.println(xmlDocument.getDocumentElement().getNodeName());
I would like to print the Gender and Lastname values. How do I parse a tag that is inside the Name tag, which in turn is inside the Employee tag?
Regards.
You should use XPath. There is a good explanation in this post.
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(<uri_as_string>);
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile(<xpath_expression>);
Try this.
String expression = "/Employees/Employee/Gender"; //read Gender value
NodeList nList = (NodeList) xPath.compile(expression).evaluate(document, XPathConstants.NODESET);
for (int j = 0; nList != null && j < nList.getLength(); j++) {
Node node = nList.item(j);
System.out.println("" + node.getFirstChild().getNodeValue());
}
expression = "/Employees/Employee/Name/Lastname"; //read Lastname value
nList = (NodeList) xPath.compile(expression).evaluate(document, XPathConstants.NODESET);
for (int j = 0; nList != null && j < nList.getLength(); j++) {
Node node = nList.item(j);
System.out.println("" + node.getFirstChild().getNodeValue());
}
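When only a single text value is needed, the two-argument XPath.evaluate overload returns the string value directly and yields an empty string for an empty or missing element, which sidesteps the null check on getFirstChild(). The following is a small sketch under that assumption; EmployeeFields and text are hypothetical names, and the sample data is invented because the question's XML has no values.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
final class EmployeeFields {

    // Returns the string value of the first node matched by the expression,
    // or an empty string if nothing matches or the element has no text.
    static String text(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Invented sample values, following the structure from the question.
        String xml = "<Employees><Employee><Gender>F</Gender>"
                + "<Name><Firstname>Ann</Firstname><Lastname>Smith</Lastname></Name>"
                + "</Employee></Employees>";
        System.out.println(text(xml, "/Employees/Employee/Gender"));
        System.out.println(text(xml, "/Employees/Employee/Name/Lastname"));
    }
}
```

With several Employee elements this returns only the first match; the NODESET loop shown above is still the way to visit all of them.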

how to get attribute of given node?

I am trying to write DOM XML parsing.
My Xml file
<?xml version="1.0"?>
<BLAH>
<AgentNm type="citi1">
<accName>accName1</accName>
<accType>accType1</accType>
<someThing>someThing1</someThing>
<amt>100000</amt>
</AgentNm>
<AgentNm type="citi2">
<accName>accName2</accName>
<accType>accType2</accType>
<someThing>someThing2</someThing>
<amt>200000</amt>
</AgentNm>
</BLAH>
And I tried the following Java code:
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document doc = docBuilder.parse (new File("c:\\file.xml"));
// normalize text representation
doc.getDocumentElement ().normalize ();
System.out.println ("Root element of the doc is " +doc.getDocumentElement().getNodeName());
NodeList agentNm = doc.getElementsByTagName("AgentNm");
int totalAgentNm = agentNm.getLength();
System.out.println("Total no of Agents : " + totalAgentNm);
for(int s=0; s<agentNm.getLength() ; s++){
Node firstPersonNode = agentNm.item(s);
if(firstPersonNode.getNodeType() == Node.ELEMENT_NODE){
Element firstPersonElement = (Element)firstPersonNode;
PrintNodeElem(firstPersonElement,"type");
}//end of if clause
}//end of for loop with s var
static void PrintNodeElem(Element nodeElem,String elem){
NodeList someThingList = nodeElem.getElementsByTagName(elem);
Element ageElement = (Element)someThingList.item(0);
NodeList textAgeList = ageElement.getChildNodes();
System.out.println(elem+" : " +((Node)textAgeList.item(0)).getNodeValue().trim());
}
But when I try to execute the above method, I get a NullPointerException.
Can anyone explain how to fix this?
If you want an attribute of a given node, I would suggest XPath. It is much easier.
http://onjava.com/onjava/2005/01/12/xpath.html
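A minimal sketch of the XPath route for the XML in the question might look like this. The class AgentTypes and its types method are hypothetical names. The key point is that type is an attribute, so it is addressed with @type instead of being looked up as a child element, which is why getElementsByTagName("type") finds nothing and item(0) returns null.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
final class AgentTypes {

    // Collects the value of the type attribute of every AgentNm under BLAH.
    static List<String> types(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // @type selects attribute nodes, not child elements.
            NodeList attrs = (NodeList) xpath.evaluate(
                    "/BLAH/AgentNm/@type", doc, XPathConstants.NODESET);

            List<String> result = new ArrayList<>();
            for (int i = 0; i < attrs.getLength(); i++) {
                result.add(attrs.item(i).getNodeValue());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read AgentNm types", e);
        }
    }
}
```

Staying with plain DOM, calling getAttribute("type") on the Element inside the existing loop should give the same values without XPath.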

Getting error in parsing an XML file using JDOM

I have this XML document:
<?xml version="1.0" encoding="utf-8"?>
<RootElement>
<Achild>
.....
</Achild>
</RootElement>
How can I check if the document contains Achild element or not? I tried
final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// Use the factory to create a builder
try {
final DocumentBuilder builder = factory.newDocumentBuilder();
final Document doc = builder.parse(configFile);
final Node parentNode = doc.getDocumentElement();
final Element childElement = (Element) parentNode.getFirstChild();
if(childElement.getNodeName().equalsIgnoreCase(...
but it gives me an error (childElement is null).
I think you're getting a #text node (the whitespace between <RootElement> and <Achild>) as the first child, which is a pretty common mistake. For example:
final Node parentNode = doc.getDocumentElement();
Node childElement = parentNode.getFirstChild();
System.out.println(childElement.getNodeName());
Returns:
#text
Use instead:
final Node parentNode = doc.getDocumentElement();
NodeList childElements = parentNode.getChildNodes();
for (int i = 0; i < childElements.getLength(); ++i)
{
Node childElement = childElements.item(i);
if (childElement instanceof Element)
System.out.println(childElement.getNodeName());
}
Wanted result:
Achild
EDIT:
There is a second way, using the DocumentBuilderFactory.setIgnoringElementContentWhitespace method:
factory.setIgnoringElementContentWhitespace(true);
However, this works only in validating mode, so you need to provide a DTD in your XML document:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE RootElement [
<!ELEMENT RootElement (Achild)+>
<!ELEMENT Achild (#PCDATA)>
]>
<RootElement>
<Achild>some text</Achild>
</RootElement>
and set factory.setValidating(true). Full example:
final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(true);
factory.setIgnoringElementContentWhitespace(true);
final DocumentBuilder builder = factory.newDocumentBuilder();
final Document doc = builder.parse("input.xml");
final Node rootNode = doc.getDocumentElement();
final Element childElement = (Element) rootNode.getFirstChild();
System.out.println(childElement.getNodeName());
Wanted result with original code:
Achild
It sounds like .getFirstChild() is returning a text node containing the whitespace between "<RootElement>" and "<Achild>", in which case you would need to advance to the next sibling node to get to the element you expect.
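Advancing through the siblings until an element turns up can be sketched as below. FirstChildElement and its two methods are hypothetical names, not part of the DOM API, and the XML is supplied as a string for brevity.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
final class FirstChildElement {

    // Walks the children in order and returns the first one that is an element,
    // skipping whitespace text nodes and comments; returns null if there is none.
    static Element firstChildElement(Node parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) n;
            }
        }
        return null;
    }

    // Convenience wrapper: name of the root's first child element, or null.
    static String firstChildName(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element child = firstChildElement(doc.getDocumentElement());
            return child == null ? null : child.getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

Unlike the setIgnoringElementContentWhitespace approach, this needs neither a DTD nor validating mode.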

Child node name in a xml

I am trying to write a piece of code that can parse any XML and print its contents. I am using the DOM parser. I am able to get the name of the root tag of the XML, but can't obtain the tag name of its immediate child. This is easy when the node names are known, using the method 'getElementsByTagName'. Is there any way out of this dilemma?
My code goes like this :
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(file);
doc.getDocumentElement().normalize();
doc.getDocumentElement().getNodeName() // this gets me the name of the root node.
Now how can I get the name of the immediate child node so that I can traverse the XML using getElementsByTagName("x")?
Thanks in advance.
getChildNodes() returns all children of an element. The list will contain more than just elements, so you'll have to check whether each child node is an element:
NodeList nodes = doc.getDocumentElement().getChildNodes();
for (int i = 0; i < nodes.getLength(); i++) {
Node node = nodes.item(i);
if (node instanceof Element) {
Element childElement = (Element) node;
System.out.println("tag name: " + childElement.getTagName());
}
}
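Since the goal in the question is to walk any XML without knowing the tag names in advance, the same check can be applied recursively. The sketch below is one possible way to do that; TagNameWalker, names and collect are hypothetical names, and each entry is recorded as depth:tagName purely for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
final class TagNameWalker {

    // Returns every element in document order as "depth:tagName".
    static List<String> names(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<>();
            collect(doc.getDocumentElement(), 0, out);
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Records this element, then recurses into its element children only.
    private static void collect(Element element, int depth, List<String> out) {
        out.add(depth + ":" + element.getTagName());
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                collect((Element) child, depth + 1, out);
            }
        }
    }

    public static void main(String[] args) {
        for (String entry : names("<a><b><c/></b><d/></a>")) {
            System.out.println(entry);
        }
    }
}
```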
