Getting error in parsing an XML file using JDOM - java

I have this XML document:
<?xml version="1.0" encoding="utf-8"?>
<RootElement>
<Achild>
.....
</Achild>
</RootElement>
How can I check whether the document contains an Achild element? I tried:
final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// Use the factory to create a builder
try {
final DocumentBuilder builder = factory.newDocumentBuilder();
final Document doc = builder.parse(configFile);
final Node parentNode = doc.getDocumentElement();
final Element childElement = (Element) parentNode.getFirstChild();
if(childElement.getNodeName().equalsIgnoreCase(...
but it gives me an error (childElement is null).

I think you're getting a #text node (the whitespace between <RootElement> and <Achild>) as the first child, which is a pretty common mistake. For example:
final Node parentNode = doc.getDocumentElement();
Node childElement = parentNode.getFirstChild();
System.out.println(childElement.getNodeName());
Returns:
#text
Use instead:
final Node parentNode = doc.getDocumentElement();
NodeList childElements = parentNode.getChildNodes();
for (int i = 0; i < childElements.getLength(); ++i)
{
Node childElement = childElements.item(i);
if (childElement instanceof Element)
System.out.println(childElement.getNodeName());
}
Wanted result:
Achild
EDIT:
There is a second way, using the DocumentBuilderFactory.setIgnoringElementContentWhitespace method:
factory.setIgnoringElementContentWhitespace(true);
However, this works only in validating mode, so you need to provide a DTD in your XML document:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE RootElement [
<!ELEMENT RootElement (Achild)+>
<!ELEMENT Achild (#PCDATA)>
]>
<RootElement>
<Achild>some text</Achild>
</RootElement>
and set factory.setValidating(true). Full example:
final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(true);
factory.setIgnoringElementContentWhitespace(true);
final DocumentBuilder builder = factory.newDocumentBuilder();
final Document doc = builder.parse("input.xml");
final Node rootNode = doc.getDocumentElement();
final Element childElement = (Element) rootNode.getFirstChild();
System.out.println(childElement.getNodeName());
Wanted result with original code:
Achild

It sounds like .getFirstChild() is returning a text node containing the whitespace between "<RootElement>" and "<Achild>", in which case you would need to advance to the next sibling node to get to the element you expect.
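A minimal sketch of the "advance to the next sibling" idea, assuming the standard JDK DOM parser. The class name and the helpers firstChildElement and parse are hypothetical names introduced only for this illustration:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class FirstChildElementSketch {

    // Hypothetical helper: walks the siblings until it finds an element node.
    // Returns null when the parent has no element children at all.
    static Element firstChildElement(Node parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) n;
            }
        }
        return null;
    }

    // Hypothetical helper: parses an XML string, wrapping checked exceptions for brevity.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<RootElement>\n  <Achild>some text</Achild>\n</RootElement>");
        Element child = firstChildElement(doc.getDocumentElement());
        // Null-safe check for the element the question is looking for.
        boolean hasAchild = child != null && "Achild".equals(child.getNodeName());
        System.out.println("Contains Achild: " + hasAchild);
    }
}
```

Checking for null before calling getNodeName() also covers the case where the root element is empty.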

Related

XML - Extract One tag Value

I have to extract tag value from an xml Document that contains a single tag like below:
<error>Permission denied</error>
i have tried:
String xmlRecords = "<error>Permission denied</error>";
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(xmlRecords));
Document doc = db.parse(is);
Node nodes = doc.getFirstChild();
String value = nodes.getNodeValue();
but it doesn't work.
How can I do it?
Use doc.getDocumentElement().getTextContent() to get the string Permission denied.
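As a rough, self-contained sketch of that suggestion (the class name and the rootText helper are made up for this example): getNodeValue() is defined to return null for element nodes, which is why the original attempt gets nothing back, while getTextContent() returns the text inside the element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ErrorTextSketch {

    // Hypothetical helper: returns the text content of the root element of an XML string.
    static String rootText(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = db.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootText("<error>Permission denied</error>"));
    }
}
```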
With DOM it's useful to know the structure of the XML document and which node level you are looking for.
After getting the Document, you can use document.getElementsByTagName("root") to look for the root or parent tags, and get the children as a list to look for the item. Something like this:
NodeList listresults = document.getElementsByTagName("father/root element string");
NodeList nl = listresults.item(0).getChildNodes();
// Iterate over the nodes
for (int temp = 0; temp < nl.getLength(); temp++) {
Node node = nl.item(temp);
// Check whether it is an element node
if (node.getNodeType() == Node.ELEMENT_NODE) {
Element element = (Element) node;
if(element.getNodeName().equals("error")){
// check the element
}
}
}
I hope this helps you.
Just try the following code:
String value = nodes.getTextContent();
If you are using the approach above, you have to build the string yourself. These methods return the tag name and its content as strings:
Tag name = nodes.getNodeName()
Tag value = nodes.getTextContent()
I guess this is what you want
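// Search from the Document: Element.getElementsByTagName only returns descendants,
// so it would miss <error> when it is the root element itself.
NodeList errorTagList = document.getElementsByTagName("error");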
if (errorTagList != null && errorTagList.getLength() > 0) {
NodeList errorTagSubList = errorTagList.item(0).getChildNodes();
if (errorTagSubList != null && errorTagSubList.getLength() > 0) {
String value = errorTagSubList.item(0).getNodeValue();
}
}

parse xml using dom java

I have the XML below:
<modelingOutput>
<listOfTopics>
<topic id="1">
<token id="354">wish</token>
</topic>
</listOfTopics>
<rankedDocs>
<topic id="1">
<documents>
<document id="1" numWords="0"/>
<document id="2" numWords="1"/>
<document id="3" numWords="2"/>
</documents>
</topic>
</rankedDocs>
<listOfDocs>
<documents>
<document id="1">
<topic id="1" percentage="4.790644689978203%"/>
<topic id="2" percentage="11.427632949428334%"/>
<topic id="3" percentage="17.86913349249596%"/>
</document>
</documents>
</listOfDocs>
</modelingOutput>
I want to parse this XML file and get the topic id and percentage from listOfDocs.
My first approach was to get all document elements from the XML and then check whether each one's grandparent node is listOfDocs.
But document elements exist under both rankedDocs and listOfDocs, so I get a very large list.
Is there a better way to parse this XML that avoids the if statement?
My code:
public void parse(){
Document dom = null;
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(xml));
dom = db.parse(is);
Element doc = dom.getDocumentElement();
NodeList documentnl = doc.getElementsByTagName("document");
for (int i = 0; i < documentnl.getLength(); i++) {
Node item = documentnl.item(i);
Node parentNode = item.getParentNode();
Node grandpNode = parentNode.getParentNode();
if(grandpNode.getNodeName() == "listOfDocs") {
//get value
}
}
}
First, when checking the node name you shouldn't compare Strings using ==. Always use the equals method instead.
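A small illustration of why this matters. The class and method names are hypothetical, and new String(...) merely stands in for a name that a parser builds at run time. Whether == happens to work against a real DOM implementation depends on whether that implementation interns its names, so it should not be relied on:

```java
class StringCompareSketch {

    // Compares object identity, which is what == does for Strings.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    // Compares the characters, which is what a node-name check needs.
    static boolean sameContent(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        // Assumption for the demo: a distinct String object with the same characters.
        String nodeName = new String("listOfDocs");
        System.out.println("==     : " + sameReference(nodeName, "listOfDocs"));
        System.out.println("equals : " + sameContent(nodeName, "listOfDocs"));
    }
}
```

Writing the literal first, as in "listOfDocs".equals(node.getNodeName()), additionally avoids a NullPointerException if the name is null.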
You can use XPath to evaluate only the document topic elements under listOfDocs:
XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xPath = xPathFactory.newXPath();
XPathExpression xPathExpression = xPath.compile("//listOfDocs//document/topic");
NodeList topicnl = (NodeList) xPathExpression.evaluate(dom, XPathConstants.NODESET);
for(int i = 0; i < topicnl.getLength(); i++) {
...
If you do not want to use the if statement you can use XPath to get the element you need directly.
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("source.xml");
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile("/*/listOfDocs/documents/document/topic");
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getAttributes().getNamedItem("id"));
System.out.println(nodes.item(i).getAttributes().getNamedItem("percentage"));
}
Hope this helps.
I like to use XMLBeam for such tasks:
public class Answer {
@XBDocURL("resource://data.xml")
public interface DataProjection {
public interface Topic {
@XBRead("./@id")
int getID();
@XBRead("./@percentage")
String getPercentage();
}
@XBRead("/modelingOutput/listOfDocs//document/topic")
List<Topic> getTopics();
}
public static void main(final String[] args) throws IOException {
final DataProjection dataProjection = new XBProjector().io().fromURLAnnotation(DataProjection.class);
for (Topic topic : dataProjection.getTopics()) {
System.out.println(topic.getID() + ": " + topic.getPercentage());
}
}
}
There is even a convenient way to convert the percentage to float or double. Tell me if you would like to have an example.

Java XML - nested elements with same name

How can I reach elements that have the same name and are nested recursively, using Java XML? This worked in Python's ElementTree, but for some reason I need to get this running in Java.
I have tried:
String filepath = ("file.xml");
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
Document doc = docBuilder.parse(filepath);
NodeList nl = doc.getElementsByTagName("*/*/foo");
Example
<foo>
<foo>
<foo>
</foo>
</foo>
</foo>
You seem to be under the impression that getElementsByTagName takes an XPath expression. It doesn't. As documented:
Returns a NodeList of all the Elements in document order with a given tag name and are contained in the document.
If you need to use XPath, you should look at the javax.xml.xpath package. Sample code:
Object set = xpath.evaluate("*/*/foo", doc, XPathConstants.NODESET);
NodeList list = (NodeList) set;
int count = list.getLength();
for (int i = 0; i < count; i++) {
Node node = list.item(i);
// Handle the node
}
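For completeness, a self-contained sketch of that approach using javax.xml.xpath. The class name and the select helper are hypothetical; the nested foo document mirrors the example from the question:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NestedFooSketch {

    // Hypothetical helper: evaluates an XPath expression against an XML string,
    // using the document node as the context node.
    static NodeList select(String expression, String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<foo><foo><foo/></foo></foo>";
        // Relative path: only the foo element exactly three levels down.
        System.out.println(select("*/*/foo", xml).getLength());
        // Descendant axis: every foo element at any depth.
        System.out.println(select("//foo", xml).getLength());
    }
}
```

Use the relative path when the nesting depth is known and //foo when every foo element is wanted regardless of depth.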

Child node name in a xml

I am trying to write a piece of code that can parse any XML and print its contents. I am using the DOM parser. I am able to get the name of the root tag of the XML, but can't obtain the tag name of its immediate child. This would be easy with the method 'getElementsByTagName' if the node names were known. Is there any way out of this dilemma?
My code goes like this :
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(file);
doc.getDocumentElement().normalize();
doc.getDocumentElement().getNodeName(); // this gets me the name of the root node.
Now how can I get the name of the immediate child node so that I can traverse the XML using getElementsByTagName("x")?
Thanks in advance.
getChildNodes() returns all children of an element. The list will contain more than just elements, so you'll have to check whether each child node is an element:
NodeList nodes = doc.getDocumentElement().getChildNodes();
for (int i = 0; i < nodes.getLength(); i++) {
Node node = nodes.item(i);
if (node instanceof Element) {
Element childElement = (Element) node;
System.out.println("tag name: " + childElement.getTagName());
}
}
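Since the goal is to handle any XML without knowing the tag names in advance, the same check can be applied recursively. This is only a sketch: the class name and the elementNames, collect and parse helpers are hypothetical, and it collects element names in document order rather than printing the full contents:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ElementNamesSketch {

    // Hypothetical helper: returns the names of all elements below (and including) the node,
    // in document order.
    static List<String> elementNames(Node start) {
        List<String> names = new ArrayList<>();
        collect(start, names);
        return names;
    }

    // Depth-first walk; text, comment and other non-element nodes are skipped.
    private static void collect(Node node, List<String> names) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            names.add(node.getNodeName());
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), names);
        }
    }

    // Hypothetical helper: parses an XML string, wrapping checked exceptions for brevity.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<a>\n  <b><c/></b>\n  <d/>\n</a>");
        for (String name : elementNames(doc)) {
            System.out.println("tag name: " + name);
        }
    }
}
```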

XML parser gives null element

When I try to parse an XML file, it sometimes gives a null element for the title.
I think it has to do with HTML-tags '
How can I solve this problem?
I have the following XML file:
<item>
<title>' Nieuwe DVD '</title>
<description>tekst, tekst tekst</description>
<link>dvd.html</link>
<category>nieuws</category>
<pubDate>Sat, 1 Jan 2011 9:24:00 +0000</pubDate>
</item>
And the following code to parse the XML file:
//DocumentBuilderFactory, DocumentBuilder are used for
//xml parsing
DocumentBuilderFactory dbf = DocumentBuilderFactory
.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
//using db (Document Builder) parse xml data and assign
//it to Element
Document document = db.parse(is);
Element element = document.getDocumentElement();
//take rss nodes to NodeList
element.normalize();
NodeList nodeList = element.getElementsByTagName("item");
if (nodeList.getLength() > 0)
{
for (int i = 0; i < nodeList.getLength(); i++)
{
//take each entry (corresponds to <item></item> tags in
//xml data
Element entry = (Element) nodeList.item(i);
entry.normalize();
Element _titleE = (Element) entry.getElementsByTagName(
"title").item(0);
Element _categoryE = (Element) entry
.getElementsByTagName("category").item(0);
Element _pubDateE = (Element) entry
.getElementsByTagName("pubDate").item(0);
Element _linkE = (Element) entry.getElementsByTagName(
"link").item(0);
String _title = _titleE.getFirstChild().getNodeValue();
String _category = _categoryE.getFirstChild().getNodeValue();
Date _pubDate = new Date(_pubDateE.getFirstChild().getNodeValue());
String _link = _linkE.getFirstChild().getNodeValue();
//create RssItemObject and add it to the ArrayList
RssItem rssItem = new RssItem(_title, _category, _pubDate, _link);
rssItems.add(rssItem);
conn.disconnect();
}
}
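Don't use getFirstChild().getNodeValue() when you really want getTextContent(). getFirstChild() returns null for an empty element and only the first piece when the text is split across several nodes, whereas getTextContent() returns the element's whole text.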
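A short sketch of the difference, assuming the JDK's default (non-coalescing) DOM parser. The class name and the firstTitle helper are hypothetical, and the CDATA section is just one way the title text can end up split into more than one child node:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class TitleTextSketch {

    // Hypothetical helper: parses the XML and returns its first <title> element.
    static Element firstTitle(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (Element) doc.getElementsByTagName("title").item(0);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Empty title: there is no child node, so getFirstChild() is null.
        Element empty = firstTitle("<item><title></title></item>");
        System.out.println(empty.getFirstChild() == null);
        System.out.println("[" + empty.getTextContent() + "]");

        // Title split into a text node and a CDATA section.
        Element split = firstTitle("<item><title>Nieuwe <![CDATA[DVD]]></title></item>");
        System.out.println("[" + split.getFirstChild().getNodeValue() + "]");
        System.out.println("[" + split.getTextContent() + "]");
    }
}
```

With getTextContent() the empty title yields an empty string instead of a NullPointerException, which can then be handled explicitly.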
