XMLParser in Java

When I call getNodeName(), it returns the actual value with "#text" as a prefix. I do not want that prefix.
If I remove the spaces and newlines from the file, getNodeName() works fine. I am using DocumentBuilderFactory, DocumentBuilder and Document to parse the XML.
My XML file
<test>
<a>
file1
</a>
<b>
file2
</b>
<c>
<files>
<file>
myfile1
</file>
</files>
</c>
</test>
My Java method
NodeList childNodes = null;
NodeList parentNodes = xml.getNodeList("test");
int node_len = parentNodes.getLength();
for (int i = 0; i < node_len; i++)
{
    childNodes = parentNodes.item(i).getChildNodes();
    int child_len = childNodes.getLength();
    for (int j = 0; j < child_len; j++)
    {
        Node dataNode = childNodes.item(j);
        System.out.println(dataNode.getNodeName());
    }
}
Please help me resolve this issue. Thanks in advance.

In XML almost everything is a node, and all nodes implement getNodeName() (or similar syntax in each parser). Elements and attributes are nodes with explicit names: the element name (in your case "test", "a", "b", "c", "files", "file") or the attribute name (you have no attributes). Text nodes and comment nodes do not have individual names. The parser normally gives them a single common node name, #text or #comment, so you can see what type they are. (The only other logical alternatives would be null, an empty string, or throwing an exception, all of which would be worse.)
"While using getNodeName, it will return actual value with "#text" as prefix". Are you sure?
Be sure that you are not confusing the name of a node with its value. There are two separate operations:
getNodeName(), which should return "#text" for ALL text nodes, and getNodeValue(), which should return "myfile1" for that text node (probably with a leading and trailing \n). Note that your file contains many whitespace-only text nodes.
Note that in Java's org.w3c.dom API, getNodeValue() on an element returns null. To get the concatenated strings of all the descendant text nodes of an element, including whitespace, call getTextContent().
Note also that the string "myfile1" is NOT a child of the elementNode file. The elementNode has a child text() node whose string value is "myfile1".
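The difference between a node's name and its value can be checked with a small sketch like the one below. It assumes the standard JDK DOM parser; the class name NodeNameVsValue and the parse helper are made up for illustration and are not part of the question's code.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class NodeNameVsValue {

    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element file = parse("<file>\nmyfile1\n</file>").getDocumentElement();
        // The element's only child is a text node holding "\nmyfile1\n".
        Node text = file.getFirstChild();

        System.out.println(file.getNodeName());           // name of the element: file
        System.out.println(file.getNodeValue());          // elements have no node value: null
        System.out.println(text.getNodeName());           // common name of all text nodes: #text
        System.out.println(text.getNodeValue().trim());   // the text itself: myfile1
        System.out.println(file.getTextContent().trim()); // text of all descendants: myfile1
    }
}
```

The trim() calls only strip the newlines around the text so the printed lines stay readable.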

In addition to the answer given by @peter.murray.rust, I suggest checking whether the node is actually an Element (which is what you expect in your case), casting it to Element and invoking getTagName():
if(dataNode instanceof Element) {
    String tag = ((Element) dataNode).getTagName();
}

Try to drop text nodes.
for (int j = 0; j < child_len; j++)
{
    Node dataNode = childNodes.item(j);
    if (dataNode.getNodeType() == Node.ELEMENT_NODE) {
        System.out.println(dataNode.getNodeName());
    }
}
The condition dataNode.getNodeType() == Node.ELEMENT_NODE will drop all non-element nodes.
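As a self-contained sketch of that filter, the example below collects the names of the direct child elements of the root of the question's XML. The class name ChildElementNames and both helper methods are assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ChildElementNames {

    // Returns the names of the direct child elements, skipping text and comment nodes.
    static List<String> childElementNames(Element parent) {
        List<String> names = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    // Hypothetical helper: parses an XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<test>\n<a>\nfile1\n</a>\n<b>\nfile2\n</b>\n"
                + "<c>\n<files>\n<file>\nmyfile1\n</file>\n</files>\n</c>\n</test>";
        // The whitespace between the tags is still parsed into text nodes,
        // but the filter keeps only the elements a, b and c.
        System.out.println(childElementNames(parseRoot(xml)));
    }
}
```"));
assert names.equals(java.util.Arrays.asList("a", "b"));
</test>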

for (int j = 0; j < child_len; j++)
{
    Node dataNode = childNodes.item(j);
    // getAttributes() returns null for every node that is not an element.
    if (dataNode.getAttributes() != null)
        System.out.println(dataNode.getNodeName());
    /* or
    if (dataNode.getNodeType() == Node.ELEMENT_NODE)
    {
        System.out.println(dataNode.getNodeName());
    }
    */
}

Related

JAVA: Read xml with iterated child node names

To migrate from an old XML-based application to a database application, I need to implement a converter that reads XML files and creates a file with insert queries.
I've looked at common tutorials for reading XML in Java, but my issue is that I do not have nodes with the same name. Instead, each node has a unique name that follows a certain prefix.
Most tutorial examples follow a scheme like this:
<root>
<class>
<node></node>
<node></node>
<node></node>
</class>
</root>
which allows the use of doc.getElementsByTagName("theName"). But in my case, the tag name is a prefix followed by a unique identifier, like <theNodeName_A1>. Here is a sample of my XML. Each <theNodeName_XX> contains multiple children and children of children.
<root>
<class>
<theNodeName_A1>
</theNodeName_A1>
<theNodeName_B3>
</theNodeName_B3>
</class>
</root>
My goal is to provide a function that does something like doc.getElementsByTagName(contains("theNodeName")), which would let me iterate through each node and process the children (subnodes) of each node.
How can I achieve this?
getElementsByTagName is not going to work for this. I think the easiest approach is to use XPath:
XPath xpath = XPathFactory.newInstance().newXPath();
NodeList elements = (NodeList) xpath.evaluate(
    "//class/*[contains(local-name(), '_')]",
    doc, XPathConstants.NODESET);
int count = elements.getLength();
for (int i = 0; i < count; i++) {
    Element element = (Element) elements.item(i);
    String name = element.getTagName();
    int underscore = name.indexOf('_');
    String id = name.substring(underscore + 1);
    // ...
}
You could also iterate over the child elements one by one, but it’s more involved:
NodeList classElements = doc.getElementsByTagName("class");
int classCount = classElements.getLength();
for (int i = 0; i < classCount; i++) {
    Element classElement = (Element) classElements.item(i);
    NodeList children = classElement.getChildNodes();
    int childCount = children.getLength();
    for (int j = 0; j < childCount; j++) {
        Node child = children.item(j);
        // IMPORTANT: Not all child nodes are Elements.
        // DO NOT SKIP THIS CHECK!
        if (child instanceof Element) {
            Element element = (Element) child;
            String name = element.getTagName();
            int underscore = name.indexOf('_');
            if (underscore > 0) {
                String id = name.substring(underscore + 1);
                // ...
            }
        }
    }
}
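A variation on the two approaches above, as a rough sketch: select every child element of <class> with the XPath expression //class/* and do the prefix test in Java, which avoids building the prefix into the expression string. The class name PrefixedElements and the methods parse and idsWithPrefix are hypothetical names chosen for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class PrefixedElements {

    // Returns the identifier part of every <class> child whose tag name starts with the prefix.
    static List<String> idsWithPrefix(Document doc, String prefix) {
        List<String> ids = new ArrayList<>();
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // "*" on the child axis matches elements only, so no text nodes show up here.
            NodeList elements = (NodeList) xpath.evaluate("//class/*", doc, XPathConstants.NODESET);
            for (int i = 0; i < elements.getLength(); i++) {
                Element element = (Element) elements.item(i);
                String name = element.getTagName();
                if (name.startsWith(prefix)) {
                    ids.add(name.substring(prefix.length()));
                    // Process the children of 'element' here.
                }
            }
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Invalid XPath expression", e);
        }
        return ids;
    }

    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<root><class><theNodeName_A1/><theNodeName_B3/></class></root>");
        System.out.println(idsWithPrefix(doc, "theNodeName_"));
    }
}
```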

How to know if a xml node contains a value or another node? [duplicate]

I am using DOM representations in Java.
How can I distinguish whether an XML tag has a value inside it or another embedded tag?
For example, I can have:
<item> 2 </item>
or
<item> <name> item1 </name> </item>
I want to do the following:
if (condition1: there are no tags inside the item tag) do ...
else do ...
How can I write condition1?
You can just test every child by iterating over the list of child nodes:
public static boolean hasChildElements(Element el) {
    NodeList children = el.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
        if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
            return true;
        }
    }
    return false;
}
condition1 is then (! hasChildElements(el)).
Alternatively, you can implement the test with getElementsByTagName("*").getLength() == 0. However, if there are sub elements, this method will traverse the whole fragment you're testing, and allocate lots of memory.
Element nodes have a getElementsByTagName method. Call it with the string "*" as the argument (to match all tag names), then count the results. If there are more than zero, the element contains other elements (at any depth, not only direct children).
NodeList nl = docEle.getElementsByTagName("item");
NodeList children = nl.item(0).getChildNodes();
You can iterate over the child nodes and check each one's node type. Where the type is Node.TEXT_NODE, you can apply your condition.
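To show how such a check might drive the if/else from the question, here is a minimal sketch. It walks the children with getFirstChild() and getNextSibling() instead of a NodeList; the class name ItemContent and the describe and parseRoot methods are invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ItemContent {

    // True if at least one direct child is an element; stops at the first match.
    static boolean hasChildElements(Element el) {
        for (Node n = el.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return true;
            }
        }
        return false;
    }

    // Branches on condition1 from the question: no tags inside the item tag.
    static String describe(Element item) {
        if (!hasChildElements(item)) {
            return "value: " + item.getTextContent().trim();
        }
        return "nested elements";
    }

    // Hypothetical helper: parses an XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(parseRoot("<item> 2 </item>")));
        System.out.println(describe(parseRoot("<item> <name> item1 </name> </item>")));
    }
}
```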

can't delete conditionally from xml using Java

I have an XML file from which I need to delete certain elements depending on an attribute value. Unconditional deletion works just fine, but this doesn't:
NodeList nodes = doc.getElementsByTagName("host");
Element d;
for ( int i = 0; i < nodes.getLength(); i++ ) {
    String state = nodes.item(i).getChildNodes().item(0).getAttributes().item(0).getTextContent();
    if ("down".equals(state)) {
        d = (Element) nodes.item(0);
        d.getParentNode().removeChild(d);
        System.out.println(state);
    }
}
There are a couple of problems with the code. The first is the line
String state = nodes.item(i).getChildNodes().item(0).getAttributes().item(0).getTextContent();
which assumes that 1) there are no text nodes between child elements, and 2) the attribute being looked for is always the first attribute in the list. This makes the code brittle and subject to failure if the format of the input document changes.
The second problem is that, while iterating over the node list, some of its members are removed from their parent. The NodeList returned by getElementsByTagName is live, so each removal shrinks the list and shifts the indexes of the remaining items, and the loop can skip elements. Note also that the code removes nodes.item(0) rather than the element at index i whose state was just checked.
I would suggest using XPath instead (much clearer and shorter). Something like this:
XPath xp = XPathFactory.newInstance().newXPath();
NodeList list = (NodeList) xp.evaluate("//node[*[@* = 'down']]", doc, XPathConstants.NODESET);
for (int i = 0; i < list.getLength(); ++i) {
    Node node = list.item(i);
    node.getParentNode().removeChild(node);
}
The expression //node[*[@* = 'down']] selects all <node> elements that have a child element that has any attribute with the value down. For the document in the question, use host in place of node.
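Here is a self-contained sketch of that approach. The question does not show its XML, so the document layout used here (a <host> element with a <status state="..."/> child) is purely an assumption, as are the class name RemoveDownHosts and its methods.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class RemoveDownHosts {

    // Removes every <host> that has a <status state="down"/> child; returns how many were removed.
    // ASSUMPTION: the element and attribute names "status" and "state" are invented for this sketch.
    static int removeDownHosts(Document doc) {
        try {
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList hosts = (NodeList) xp.evaluate(
                    "//host[status/@state = 'down']", doc, XPathConstants.NODESET);
            // The XPath result is collected before the loop starts, so removing
            // nodes from the document does not disturb the iteration.
            for (int i = 0; i < hosts.getLength(); i++) {
                Node host = hosts.item(i);
                host.getParentNode().removeChild(host);
            }
            return hosts.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Invalid XPath expression", e);
        }
    }

    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<hosts>"
                + "<host><status state=\"down\"/></host>"
                + "<host><status state=\"up\"/></host>"
                + "<host><status state=\"down\"/></host>"
                + "</hosts>");
        System.out.println("removed: " + removeDownHosts(doc));
        System.out.println("remaining: " + doc.getElementsByTagName("host").getLength());
    }
}
```

Naming the child element and attribute in the expression, rather than relying on positions, keeps the code working when text nodes or extra attributes appear in the input.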

Java org.w3c.dom - Recursively find and return an attribute value in a NodeList

I am currently parsing an XML response from a web service. It returns a finite number of <result> elements. I am currently iterating through a NodeList of such results.
While I iterate through this, sometimes I need to find the value of an attribute that exists in each <result> element. In that case, I want to call a method that traverses through all of the child nodes (and potentially the children's children, etc.) and returns the attribute's value.
I have attempted to do this recursively:
private String findAttrInChildren(Element element, String tag) {
    if (!element.getAttribute(tag).isEmpty()) {
        return element.getAttribute(tag);
    }
    NodeList children = element.getChildNodes();
    for (int i = 0, len = children.getLength(); i < len; i++) {
        if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
            Element childElement = (Element) children.item(i);
            return findAttrInChildren(childElement, tag);
        }
    }
    // We didn't find it, return null
    return null;
}
Unfortunately, this isn't working. Is recursion the best approach here? I think the fact that I want to return a value at the end is messing me up somewhere along the line, rather than implementing a void recursive method.
You leave the recursion too early. Given
if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
    Element childElement = (Element) children.item(i);
    return findAttrInChildren(childElement, tag);
}
this will end the recursive search at the first child element, regardless of whether that child or one of its descendants has the attribute.
So test if the returned attribute is not null:
if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
    Element childElement = (Element) children.item(i);
    String attr = findAttrInChildren(childElement, tag);
    if (attr != null)
        return attr;
}
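Put together, the corrected method could look like the sketch below, shown with a small document in which the attribute sits under the second child so the search has to move past the first one. The wrapper class AttributeSearch, the parseRoot helper and the sample XML are assumptions added for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class AttributeSearch {

    // Depth-first search for the first non-empty attribute with the given name.
    static String findAttrInChildren(Element element, String tag) {
        if (!element.getAttribute(tag).isEmpty()) {
            return element.getAttribute(tag);
        }
        NodeList children = element.getChildNodes();
        for (int i = 0, len = children.getLength(); i < len; i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                Element childElement = (Element) children.item(i);
                String attr = findAttrInChildren(childElement, tag);
                // Only stop when this subtree produced a value;
                // otherwise continue with the next sibling.
                if (attr != null) {
                    return attr;
                }
            }
        }
        // Not found on this element or on any of its descendants.
        return null;
    }

    // Hypothetical helper: parses an XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element result = parseRoot("<result><a><b/></a><c><d id=\"42\"/></c></result>");
        System.out.println(findAttrInChildren(result, "id"));
        System.out.println(findAttrInChildren(result, "missing"));
    }
}
```

Because getAttribute returns an empty string for a missing attribute, this version (like the original) treats an attribute that is present but empty as not found.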
I know your question has been answered and you have an answer that works, but you can make your code a little better. Try to avoid having two exit points where one will do, and avoid returning null. To address both points, I have modified your code as shown below; I hope the changes are clear. I am using Optional to avoid null and to make the code a little more readable.
private static Optional<String> findAttributeInChildren(Element element, String tag) {
    Optional<String> attr = Optional.empty();
    if (!element.getAttribute(tag).isEmpty()) {
        attr = Optional.of(element.getAttribute(tag));
    } else {
        NodeList children = element.getChildNodes();
        int len = children.getLength();
        for (int i = 0; (i < len) && (!attr.isPresent()); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                Element childElement = (Element) children.item(i);
                attr = findAttributeInChildren(childElement, tag);
            }
        }
    }
    return attr;
}
And if you want to check whether a value was found, you just call
findAttributeInChildren(x,y).isPresent()
rather than
findAttributeInChildren(x,y) == null
