To migrate from an old XML-based application to a database application, I need to implement a converter that reads XML files and creates a file with insert queries.
I've looked at common tutorials for reading XML in Java, but my problem is that my nodes do not share a name. Each node has a unique name that follows a certain prefix.
Most tutorial examples follow a scheme like this:
<root>
<class>
<node></node>
<node></node>
<node></node>
</class>
</root>
which allows the use of doc.getElementsByTagName("theName"). But in my case, the tag name is a prefix followed by a unique identifier, like <theNodeName_A1>. Here is a sample of my XML. Each <theNodeName_XX> contains multiple children and children of children.
<root>
<class>
<theNodeName_A1>
</theNodeName_A1>
<theNodeName_B3>
</theNodeName_B3>
</class>
</root>
My goal is to provide a function that does something like doc.getElementsByTagName(contains("theNodeName")), which would allow me to iterate through each node and process the children (subnodes) of each node.
How can I achieve this?
getElementsByTagName is not going to work for this. I think the easiest approach is to use XPath:
XPath xpath = XPathFactory.newInstance().newXPath();
NodeList elements = (NodeList) xpath.evaluate(
"//class/*[contains(local-name(), '_')]",
doc, XPathConstants.NODESET);
int count = elements.getLength();
for (int i = 0; i < count; i++) {
Element element = (Element) elements.item(i);
String name = element.getTagName();
int underscore = name.indexOf('_');
String id = name.substring(underscore + 1);
// ...
}
You could also iterate over the child elements one by one, but it’s more involved:
NodeList classElements = doc.getElementsByTagName("class");
int classCount = classElements.getLength();
for (int i = 0; i < classCount; i++) {
Element classElement = (Element) classElements.item(i);
NodeList children = classElement.getChildNodes();
int childCount = children.getLength();
for (int j = 0; j < childCount; j++) {
Node child = children.item(j);
// IMPORTANT: Not all child nodes are Elements.
// DO NOT SKIP THIS CHECK!
if (child instanceof Element) {
Element element = (Element) child;
String name = element.getTagName();
int underscore = name.indexOf('_');
if (underscore > 0) {
String id = name.substring(underscore + 1);
// ...
}
}
}
}
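If other children of <class> could also contain an underscore, the XPath predicate can be tightened with starts-with(). Below is a minimal, self-contained sketch, assuming the prefix is literally theNodeName_ and the XML looks like the sample in the question. The class name PrefixedElements and the method idsWithPrefix are made up for illustration. It uses name() rather than local-name() so that the match lines up with getTagName() on a parser that is not namespace-aware.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PrefixedElements {

    // Returns the part of the tag name after the prefix for every child
    // of <class> whose name starts with that prefix.
    static List<String> idsWithPrefix(String xml, String prefix) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // The prefix is pasted into the expression as-is, so it must not
        // contain a single quote.
        String expression = "//class/*[starts-with(name(), '" + prefix + "')]";
        NodeList elements = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);

        List<String> ids = new ArrayList<>();
        for (int i = 0; i < elements.getLength(); i++) {
            Element element = (Element) elements.item(i);
            ids.add(element.getTagName().substring(prefix.length()));
            // element.getChildNodes() would give access to the sub-nodes here.
        }
        return ids;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><class>"
                + "<theNodeName_A1><x/></theNodeName_A1>"
                + "<other_Z9/>"
                + "<theNodeName_B3/>"
                + "</class></root>";
        System.out.println(idsWithPrefix(xml, "theNodeName_"));
    }
}
```

The <other_Z9/> element in the sample input is only there to show that an element with an underscore but a different prefix is not selected.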
Related
The code below outputs a list of element IDs and then plugs those IDs into another XPath query, with the end goal of returning specific child elements for each ID. This code works; however, it is VERY slow and takes far longer than anyone would want to wait, since there is a large number of IDs. It also returns all elements, not the specific ones that I choose.
Is there a one-line query that will select a range of attribute IDs?
Is there a better way to return the elements that I want, without using another XPath query?
Here is what I have so far:
String expression = "MyComplexQueryHere/@ID-REF"; //Returns all IDs
NodeList nodeList = (NodeList) xpath.compile(expression).evaluate(doc, XPathConstants.NODESET); // Get Nodelist
for (int i = 0; i < nodeList.getLength(); i++)
{
Node nNode = nodeList.item(i);
String expression2 = String.format("MyComplexElements/Element[@ID=\"%s\"]", nNode.getTextContent()) ;
NodeList childList = (NodeList) xpath.compile(expression2).evaluate(doc, XPathConstants.NODESET);
for (int k = 0; k < childList.getLength(); k++)
{
Node kNode = childList.item(k);
System.out.println(kNode.getNodeName());
System.out.println(kNode.getTextContent());
}
}
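One possible way to avoid running a second XPath query per ID is to walk the target elements once, index them by their ID attribute in a map, and then resolve each reference with a lookup. The sketch below is only an illustration under assumptions: the document layout (/data/refs/ref/@ID-REF and /data/elements/Element/@ID) and the names IdIndex, indexById and resolve are hypothetical stand-ins for the real queries, which the question does not show.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class IdIndex {

    // Builds a lookup table from the ID attribute to the elements carrying it,
    // in a single pass over the node list.
    static Map<String, List<Element>> indexById(NodeList elements) {
        Map<String, List<Element>> index = new HashMap<>();
        for (int i = 0; i < elements.getLength(); i++) {
            Element e = (Element) elements.item(i);
            index.computeIfAbsent(e.getAttribute("ID"), k -> new ArrayList<>()).add(e);
        }
        return index;
    }

    // Returns the text of every element referenced by an ID-REF attribute,
    // in the order of the references.
    static List<String> resolve(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        // Hypothetical paths; substitute the real queries here.
        NodeList targets = (NodeList) xpath.evaluate(
                "/data/elements/Element", doc, XPathConstants.NODESET);
        NodeList refs = (NodeList) xpath.evaluate(
                "/data/refs/ref/@ID-REF", doc, XPathConstants.NODESET);

        Map<String, List<Element>> index = indexById(targets);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < refs.getLength(); i++) {
            String id = refs.item(i).getNodeValue();
            for (Element e : index.getOrDefault(id, Collections.emptyList())) {
                result.add(e.getTextContent());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<data>"
                + "<refs><ref ID-REF=\"b\"/><ref ID-REF=\"a\"/></refs>"
                + "<elements>"
                + "<Element ID=\"a\">first</Element>"
                + "<Element ID=\"b\">second</Element>"
                + "<Element ID=\"c\">third</Element>"
                + "</elements></data>";
        System.out.println(resolve(xml));
    }
}
```

With this layout only two XPath evaluations run in total, no matter how many IDs there are. Whether that is fast enough depends on the real document.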
I have an XML file from which I need to delete certain elements depending on an attribute value. The unconditional deletion works just fine, but this doesn't:
NodeList nodes = doc.getElementsByTagName("host");
Element d;
for ( int i = 0; i < nodes.getLength(); i++ ) {
String state = nodes.item(i).getChildNodes().item(0).getAttributes().item(0).getTextContent();
if("down".equals(state)){
d= (Element) nodes.item(0);
d.getParentNode().removeChild(d);
System.out.println(state);
}
}
There are a couple of problems with the code. The first is the line
String state = nodes.item(i).getChildNodes().item(0).getAttributes().item(0).getTextContent();
which assumes that 1) there are no text nodes between child elements, and 2) the attribute you are looking for is always the first attribute in the list. This makes the code brittle and likely to fail if the format of the input document changes.
The second problem is that members of the node list are removed from their parent while the list is being iterated. The NodeList returned by getElementsByTagName is live, so each removal shrinks the list and shifts the remaining items down, and a forward loop then skips the element that follows each removed one. I wouldn't depend on that behavior.
I would suggest using XPath instead (much clearer and shorter). Something like this:
XPath xp = XPathFactory.newInstance().newXPath();
NodeList list = (NodeList) xp.evaluate("//node[*[@* = 'down']]", doc, XPathConstants.NODESET);
for(int i=0; i < list.getLength(); ++i) {
Node node = list.item(i);
node.getParentNode().removeChild(node);
}
The expression //node[*[@* = 'down']] selects all <node> elements that have a child element that has any attribute with the value down.
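For the <host> elements from the question, a more specific variant might look like the sketch below. It assumes each <host> has a <status> child with a state attribute. The question does not show the XML, so that layout and the names RemoveDownHosts, parse and removeDownHosts are assumptions. The matches are copied into a plain list before anything is removed, so the removal loop never iterates over a list that could change underneath it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RemoveDownHosts {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Removes every <host> that has a <status state="down"/> child
    // and returns how many were removed.
    static int removeDownHosts(Document doc) throws Exception {
        XPath xp = XPathFactory.newInstance().newXPath();
        NodeList matches = (NodeList) xp.evaluate(
                "//host[status/@state = 'down']", doc, XPathConstants.NODESET);

        // Copy the matches first, then detach them from the tree.
        List<Node> toRemove = new ArrayList<>();
        for (int i = 0; i < matches.getLength(); i++) {
            toRemove.add(matches.item(i));
        }
        for (Node host : toRemove) {
            host.getParentNode().removeChild(host);
        }
        return toRemove.size();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<scan>"
                + "<host><status state=\"down\"/></host>"
                + "<host><status state=\"up\"/></host>"
                + "<host><status state=\"down\"/></host>"
                + "</scan>");
        System.out.println("removed: " + removeDownHosts(doc));
        System.out.println("left: " + doc.getElementsByTagName("host").getLength());
    }
}
```

Naming the child element and the attribute in the expression, instead of using wildcards, also avoids the dependency on child and attribute order that the original loop had.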
When I use getNodeName(), it returns the actual value with "#text" as a prefix. I do not want that prefix.
If I remove the spaces and newlines, getNodeName() works fine. I am using DocumentBuilderFactory, DocumentBuilder and Document to parse the XML.
My XML file
<test>
<a>
file1
</a>
<b>
file2
</b>
<c>
<files>
<file>
myfile1
</file>
</files>
</c>
</test>
My java Method
NodeList childNodes = null;
NodeList parentNodes = xml.getNodeList("test");
int node_len = parentNodes.getLength();
for (int i = 0; i < node_len; i++)
{
childNodes = parentNodes.item(i).getChildNodes();
int child_len = childNodes.getLength();
for (int j = 0; j < child_len; j++)
{
Node dataNode = childNodes.item(j);
System.out.println(dataNode.getNodeName());
}
}
Please help me to clear up this issue. Thanks in advance.
In XML almost everything is a node, and all nodes implement getNodeName() (or a similar method, depending on the parser). Elements and attributes are nodes with explicit names: the element name (in your case "test", "a", "b", "c", "files", "file") or the attribute name (you have no attributes). text() nodes and comment() nodes do not have individual node names. The parser will normally give them a single common node name of #text or #comment so you can see what type they are. (The only other logical alternatives would be null, an empty string, or throwing an exception, all of which would be worse.)
"While using getNodeName, it will return actual value with "#text" as prefix". Are you sure?
Be sure that you are not confusing the name of a node with its value. There are two separate operations:
getNodeName(), which should return "#text" for ALL text nodes.
getNodeValue() (getValue() in some other parsers), which should return "myfile1" (probably with surrounding whitespace and a trailing \n). Note that your file contains many whitespace-only text nodes.
Note that in the W3C DOM used here, getNodeValue() of an element returns null; getTextContent() returns the concatenated text of all its descendants, including whitespace.
Note also that the string "myfile1" is NOT a child of the elementNode file. The elementNode has a child text() node whose string value is "myfile1".
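The difference between a node's name and its value can be checked with a small experiment. This is only a sketch using the standard org.w3c.dom API. The class name NodeNameVsValue and the method describeChildren are invented for the example, and the input is a cut-down version of the <files> fragment from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeNameVsValue {

    // Describes each direct child of the root element as
    // "name | nodeValue=... | textContent=...".
    static List<String> describeChildren(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        NodeList children = doc.getDocumentElement().getChildNodes();
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            lines.add(n.getNodeName()
                    + " | nodeValue=" + n.getNodeValue()
                    + " | textContent=" + n.getTextContent());
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // The newlines around <file> become whitespace-only text nodes.
        String xml = "<files>\n<file>myfile1</file>\n</files>";
        for (String line : describeChildren(xml)) {
            // Make the newlines visible in the output.
            System.out.println(line.replace("\n", "\\n"));
        }
    }
}
```

The root element here has three children: a whitespace text node, the <file> element, and another whitespace text node. Only the element has a real name, and only the text nodes have a non-null node value.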
In addition to the answer given by @peter.murray.rust, I suggest checking whether the node is actually an Element (which is what you expect in your case), casting it to Element and invoking getTagName():
if(dataNode instanceof Element) {
String tag = ((Element)dataNode).getTagName();
}
Try to drop text nodes.
for (int j = 0; j < child_len; j++)
{
Node dataNode = childNodes.item(j);
if (dataNode.getNodeType() == Node.ELEMENT_NODE) {
System.out.println(dataNode.getNodeName());
}
}
The condition dataNode.getNodeType() == Node.ELEMENT_NODE will drop all non-element nodes.
for (int j = 0; j < child_len; j++)
{
    Node dataNode = childNodes.item(j);
    // getAttributes() returns null for every node type except Element.
    if (dataNode.getAttributes() != null)
    {
        System.out.println(dataNode.getNodeName());
    }
    /* or:
    if (dataNode.getNodeType() == Node.ELEMENT_NODE)
    {
        System.out.println(dataNode.getNodeName());
    }
    */
}
I am trying to read the values of specific nodes from an XML file, and this is working fine.
However, there is one line I can't read:
<misc viewers="898" duration="6684"/>
I can find the node, but getNodeValue() and getTextContent() both return null.
Is there a workaround to get the contents of this line?
Thanks
Edit: I am using this loop to find nodes:
NodeList nodes = doc.getElementsByTagName("item");
for (int i = 0; i < nodes.getLength(); i++) {
Element element = (Element) nodes.item(i);
System.out.println("Title: "
+ getElementValue(element, "title"));
System.out
.println("embed: " + getElementValue(element, "misc"));
System.out.println();
}
viewers and duration are attributes of the misc node, not values. You need to call getAttributes() to get a NamedNodeMap of all the attributes, then call getNamedItem() on your node map to access a specific attribute.
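A minimal sketch of that lookup, assuming the <misc> element sits inside an <item> as in the loop from the question. The class name MiscAttributes, the method readAttribute and the sample title are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class MiscAttributes {

    // Returns the value of attrName on the first <tag> element in the
    // document, or null if the element or the attribute is missing.
    static String readAttribute(String xml, String tag, String attrName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        NodeList matches = doc.getElementsByTagName(tag);
        if (matches.getLength() == 0) {
            return null;
        }
        Element element = (Element) matches.item(0);

        // Look the attribute up in the element's NamedNodeMap.
        // Shortcut: element.getAttribute(attrName) does the same lookup but
        // returns an empty string instead of null when the attribute is absent.
        Node attribute = element.getAttributes().getNamedItem(attrName);
        return attribute == null ? null : attribute.getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<item><title>Some title</title>"
                + "<misc viewers=\"898\" duration=\"6684\"/></item>";
        System.out.println("viewers: " + readAttribute(xml, "misc", "viewers"));
        System.out.println("duration: " + readAttribute(xml, "misc", "duration"));
    }
}
```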
I have an XML file as follows:
<rootNode>
<link>http://rootlink/</link>
<image>
<link>http://imagelink/</link>
<title>This is the title</title>
</image>
</rootNode>
The XML Java code using DOM is as follows:
NodeList rootNodeList = element.getElementsByTagName("link");
This will give me all of the "link" elements including the top level and the one inside the "image" node.
Is there a way to just get the "link" tags for rootNode within one level and not two such as is the case for the image link? That is, I just want the http://rootlink/ "link".
You could use XPath:
XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xpath = xpathFactory.newXPath();
NodeList links = (NodeList) xpath.evaluate("rootNode/link", element,
XPathConstants.NODESET);
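Which path to use depends on the context node you pass in. The sketch below assumes the context is the rootNode element itself, in which case the relative path "link" selects only its direct children. The class name DirectChildLinks and the method directLinks are invented for the example, and the input is the XML from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DirectChildLinks {

    // Returns the text of the <link> elements that are direct children
    // of the document's root element.
    static List<String> directLinks(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();

        XPath xpath = XPathFactory.newInstance().newXPath();
        // "link" is relative to the context node (root), so it matches
        // child elements only, not deeper descendants.
        NodeList links = (NodeList) xpath.evaluate("link", root, XPathConstants.NODESET);

        List<String> result = new ArrayList<>();
        for (int i = 0; i < links.getLength(); i++) {
            result.add(links.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rootNode>"
                + "<link>http://rootlink/</link>"
                + "<image><link>http://imagelink/</link>"
                + "<title>This is the title</title></image>"
                + "</rootNode>";
        System.out.println(directLinks(xml));
    }
}
```

If the context node is the Document instead, "rootNode/link" (or "/rootNode/link") is the equivalent expression.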
I couldn't find any method to do that either, so I wrote this helper function:
public static List<Element> getChildrenByTagName(Element parent, String name) {
List<Element> nodeList = new ArrayList<Element>();
for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
if (child.getNodeType() == Node.ELEMENT_NODE &&
name.equals(child.getNodeName())) {
nodeList.add((Element) child);
}
}
return nodeList;
}
If you can use JDOM instead, you can do this:
element.getChildren("link");
With standard DOM, the closest you can get is to iterate over the child node list (by calling getChildNodes() and checking each item(i) of the NodeList), picking out the nodes with the matching name.
I know this is an old question, but nonetheless I have an alternative solution to add.
It's not the most efficient, but works:
Get all the descendants using getElementsByTagName, then check that each one's parent is the node you started with.
I use this because I have a set of results, each result can have results nested inside it. When I call my Result constructor I need to add any nested results, but as they themselves will look for their own children I don't want to add children to the current level (their parent will add them).
Example:
NodeList children = resultNode.getElementsByTagName("result");
for(int i = 0; i<children.getLength(); i++){
// make sure not to pick up grandchildren.
if(children.item(i).getParentNode().isSameNode(resultNode)){
addChildResult(new Result((Element)children.item(i)));
}
}
Hope this helps.
I wrote this function to get the node value by tag name, restricted to the top level:
public static String getValue(Element item, String tagToGet, String parentTagName) {
NodeList n = item.getElementsByTagName(tagToGet);
Node nodeToGet = null;
for (int i = 0; i<n.getLength(); i++) {
if (n.item(i).getParentNode().getNodeName().equalsIgnoreCase(parentTagName)) {
nodeToGet = n.item(i);
}
}
return getElementValue(nodeToGet);
}
public final static String getElementValue(Node elem) {
Node child;
if (elem != null) {
if (elem.hasChildNodes()) {
for (child = elem.getFirstChild(); child != null; child = child
.getNextSibling()) {
if (child.getNodeType() == Node.TEXT_NODE) {
return child.getNodeValue();
}
}
}
}
return "";
}