My question is: "DOM parser, why do I get just one child of an element?"
I looked at two related questions, but I still do not get the point.
What I'm trying to do is the following:
I have an XML file (see the extract below) :
<POITEM>
<item>
<PO_ITEM>00010</PO_ITEM>
<SHORT_TEXT>ITEM_A</SHORT_TEXT>
<MATL_GROUP>20010102</MATL_GROUP>
<AGREEMENT>4600010076</AGREEMENT>
<AGMT_ITEM>00010</AGMT_ITEM>
<HL_ITEM>00000</HL_ITEM>
<NET_PRICE>1.000000000</NET_PRICE>
<QUANTITY>1.000</QUANTITY>
<PO_UNIT>EA</PO_UNIT>
</item>
<item>
<PO_ITEM>00020</PO_ITEM>
<SHORT_TEXT>ITEM_B</SHORT_TEXT>
<MATL_GROUP>20010102</MATL_GROUP>
<AGREEMENT>4600010080</AGREEMENT>
<AGMT_ITEM>00020</AGMT_ITEM>
<HL_ITEM>00000</HL_ITEM>
<NET_PRICE>5.000000000</NET_PRICE>
<QUANTITY>5.000</QUANTITY>
<PO_UNIT>EA</PO_UNIT>
</item>
</POITEM>
I only want to extract <PO_ITEM>, <SHORT_TEXT>, <MATL_GROUP>, <NET_PRICE>, <QUANTITY> and <PO_UNIT> and write it into another, smaller XML file.
So this is my code:
nodes = dcmt.getElementsByTagName("POITEM");
Element rootElement2 = doc1.createElement("PO_POSITIONS");
rootElement1.appendChild(rootElement2);
Element details2 = doc1.createElement("PO_DETAILS");
rootElement2.appendChild(details2);
for (int i = 0; i < nodes.getLength(); i++) {
Node node = nodes.item(i);
if (node.getNodeType() == Node.ELEMENT_NODE) {
Element element = (Element) node;
Element position = doc1.createElement("position");
details2.appendChild(position);
Element poItm = doc1.createElement("PO_ITEM");
poItm.appendChild(doc1.createTextNode(getValue("PO_ITEM", element)));
position.appendChild(poItm);
Element matlGrp = doc1.createElement("MATL_GROUP");
matlGrp.appendChild(doc1.createTextNode(getValue("MATL_GROUP",element)));
position.appendChild(matlGrp);
Element poUnit = doc1.createElement("PO_UNIT");
poUnit.appendChild(doc1.createTextNode(getValue("PO_UNIT",element)));
position.appendChild(poUnit);
Element netPrice = doc1.createElement("NET_PRICE");
netPrice.appendChild(doc1.createTextNode(getValue("NET_PRICE",element)));
position.appendChild(netPrice);
Element shortTxt = doc1.createElement("SHORT_TEXT");
shortTxt.appendChild(doc1.createTextNode(getValue("SHORT_TEXT",element)));
position.appendChild(shortTxt);
//Element matl = doc2.createElement("MATERIAL");
//matl.appendChild(doc2.createTextNode(getValue("MATERIAL",element)));
//details2.appendChild(matl);
Element qnty = doc1.createElement("QUANTITY");
qnty.appendChild(doc1.createTextNode(getValue("QUANTITY",element)));
position.appendChild(qnty);
/*Element preqNr = doc1.createElement("PREQ_NO");
preqNr.appendChild(doc1.createTextNode(getValue("PREQ_NO",element)));
details2.appendChild(preqNr); */
}
}
So far so good: I get a new XML file, but it only holds the first entry. As I understand it, nodes = dcmt.getElementsByTagName("POITEM"); steps into the first <item>, reads up to the first </item>, and then leaves the loop. So how do I step into the next <item>? Do I need some kind of inner loop to access it?
By the way, changing the structure of the XML file is no option, since I get the file from an interface.
Or do I make a mistake while writing the new XML file?
The output looks like this:
<PO_POSITIONS>
<PO_DETAILS>
<position>
<PO_ITEM>00010</PO_ITEM>
<MATL_GROUP>20010102</MATL_GROUP>
<PO_UNIT>EA</PO_UNIT>
<NET_PRICE>1.00000000</NET_PRICE>
<SHORT_TEXT>ITEM_A</SHORT_TEXT>
<QUANTITY>1.000</QUANTITY>
</position>
</PO_DETAILS>
</PO_POSITIONS>
You could parse it yourself, but it's kind of a pain. When I did XML way back when, we used stylesheets (XSLT) for these kinds of transformations. See this post: How to transform XML with XSL using Java
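As a rough illustration of the stylesheet route, here is a minimal sketch using the javax.xml.transform API that ships with the JDK. The class name PoItemXslt and the stylesheet itself are made up for this example; the output element names are taken from the question, and PO_POSITIONS is assumed to be the root of the output.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class PoItemXslt {
    // Hypothetical stylesheet: one <position> per <item>, keeping only the
    // six fields named in the question.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<PO_POSITIONS><PO_DETAILS>"
      + "<xsl:for-each select='//POITEM/item'>"
      + "<position>"
      + "<xsl:copy-of select='PO_ITEM | SHORT_TEXT | MATL_GROUP | NET_PRICE | QUANTITY | PO_UNIT'/>"
      + "</position>"
      + "</xsl:for-each>"
      + "</PO_DETAILS></PO_POSITIONS>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet to the given XML text and returns the result as a string. */
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }
}
```

Note that xsl:copy-of with a union emits the selected fields in the order they appear in the input, not in the order they are listed in the select expression.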
If that's not an option, then to do it by hand (I omitted the new document construction, but you can see where it goes):
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.junit.Test;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class XMLTest {
@Test
public void testXmlParsing() throws Exception {
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(new File("/Users/aakture/Documents/workspace-sts-2.9.1.RELEASE/smartfox/branches/trunk/java/gelato/src/test/resources/sample.xml").getAbsolutePath());
Node poItem = doc.getElementsByTagName("POITEM").item(0);
NodeList poItemChildren = poItem.getChildNodes();
for (int i = 0; i < poItemChildren.getLength(); i++) {
Node item = poItemChildren.item(i);
NodeList itemChildren = item.getChildNodes();
for (int j = 0; j < itemChildren.getLength(); j++) {
Node itemChild = itemChildren.item(j);
if("PO_ITEM".equals(itemChild.getNodeName())) {
System.out.println("found PO_ITEM: " + itemChild.getTextContent());
}
if("MATL_GROUP".equals(itemChild.getNodeName())) {
System.out.println("found MATL_GROUP: " + itemChild.getTextContent());
}
}
}
}
}
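The code above leaves out the construction of the new document. A minimal sketch of that part might look like the following; it iterates over the <item> elements (not over <POITEM>, of which there is only one) and copies the wanted fields. The class name PoItemExtractor is hypothetical, and PO_POSITIONS is assumed to be the root of the new document, whereas the question appends it to an outer root element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PoItemExtractor {
    // Fields to copy, as listed in the question.
    static final String[] WANTED = {
        "PO_ITEM", "SHORT_TEXT", "MATL_GROUP", "NET_PRICE", "QUANTITY", "PO_UNIT"
    };

    /** Builds a new document with one <position> per <item> of the input. */
    static Document extract(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document in = builder.parse(new InputSource(new StringReader(xml)));
            Document out = builder.newDocument();

            Element positions = out.createElement("PO_POSITIONS");
            out.appendChild(positions);
            Element details = out.createElement("PO_DETAILS");
            positions.appendChild(details);

            // One loop iteration per <item>, so every item yields a <position>.
            NodeList items = in.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                Element position = out.createElement("position");
                details.appendChild(position);
                for (String name : WANTED) {
                    NodeList found = item.getElementsByTagName(name);
                    if (found.getLength() == 0) {
                        continue; // field missing in this item: skip it
                    }
                    Element copy = out.createElement(name);
                    copy.setTextContent(found.item(0).getTextContent());
                    position.appendChild(copy);
                }
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("Could not build the reduced document", e);
        }
    }
}
```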
I'm not sure how to explain my situation, so here is an example.
I have an XML file to be read in Java, something like this:
<Author AffiliationIDS="Aff1">
<AuthorName DisplayOrder="Western">
<GivenName>Wei</GivenName>
<GivenName>Long</GivenName>
<FamilyName>
<Value>Tan</Value>
</FamilyName>
</AuthorName>
</Author>
As you can see, the content of the <FamilyName> tag is wrapped in a <Value> tag. This is because the XSD declares the element with maxOccurs="unbounded", which means <FamilyName> can contain more than one <Value>. How should I modify the code so that it reads the <FamilyName> tag and gets every <Value> element, no matter how many occurrences exist?
Example:
<FamilyName>
<Value>Sarah</Value>
<Value>Johnson</Value>
</FamilyName>
The code looks like this:
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
import java.io.File;
public class ReadXMLFile {
public static void main(String argv[]) {
try {
File fXmlFile = new File("/fileaddress/test-1.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
doc.getDocumentElement().normalize();
System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
NodeList nList = doc.getElementsByTagName("AuthorName");
System.out.println("----------------------------");
for (int temp = 0; temp < nList.getLength(); temp++) {
Node nNode = nList.item(temp);
System.out.println("\nCurrent Element :" + nNode.getNodeName());
if (nNode.getNodeType() == Node.ELEMENT_NODE) {
Element eElement = (Element) nNode;
System.out.println("Given Name : " + eElement.getElementsByTagName("GivenName").item(0).getTextContent());
System.out.println("Family Name : " + eElement.getElementsByTagName("FamilyName").item(0).getTextContent());
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
Get the FamilyName node with getElementsByTagName("FamilyName").item(0), loop over its child nodes (.getChildNodes()), and read the text of each child element.
Or:
You can even call getElementsByTagName("Value") if you are sure that the Value tag does not occur anywhere other than inside FamilyName.
Here is a code sample:
NodeList children = doc.getElementsByTagName("FamilyName").item(0).getChildNodes();
for(int i=0;i<children.getLength();i++) {
if(children.item(i).getNodeType()== Node.ELEMENT_NODE) {
Element child = (Element)children.item(i);
System.out.println(child.getTextContent());
}
}
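If the values are needed as data rather than printed, the same loop can collect them into a list. This is only a sketch: the class name FamilyNameValues and the method valuesOf are invented here, and it looks only at the first <FamilyName> in the document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FamilyNameValues {
    /** Returns the text of every <Value> directly inside the first <FamilyName>. */
    static List<String> valuesOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            List<String> values = new ArrayList<String>();
            Node familyName = doc.getElementsByTagName("FamilyName").item(0);
            if (familyName == null) {
                return values; // no <FamilyName> at all
            }
            NodeList children = familyName.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip whitespace text nodes; keep only <Value> elements.
                if (child.getNodeType() == Node.ELEMENT_NODE
                        && "Value".equals(child.getNodeName())) {
                    values.add(child.getTextContent().trim());
                }
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read FamilyName values", e);
        }
    }
}
```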
I am working on an XML example in order to understand DOM and XML better. I have an XML document with cars, from which I want to get the first-level child nodes of <cars>.
I also want to do this generically, without giving a specific tag name (find elements by tag "supercars" / "luxurycars" ...). More like "give me all the direct subnodes of cars" -> "supercars, supercars, luxurycars".
Therefore I've written the following code in order to understand the structure.
But the output confuses me:
Why is the NodeList length 7? Is it "[cars], [supercars], [content of supercars], [supercars], [content of supercars]"? I can't manage to get the elements out and see for myself.
Why are there 4 empty "Current Element:" entries?
Why is the first NodeName "#text" and not "supercars", which comes AFTER that?
My XML document sportcars.xml.:
<?xml version="1.0"?>
<cars>
<supercars company="Ferrari">
<carname type="formula one">Ferarri 101</carname>
<carname type="sports car">Ferarri 201</carname>
<carname type="sports car">Ferarri 301</carname>
</supercars>
<supercars company="Lamborgini">
<carname>Lamborgini 001</carname>
<carname>Lamborgini 002</carname>
<carname>Lamborgini 003</carname>
</supercars>
<luxurycars company="Benteley">
<carname>Benteley 1</carname>
<carname>Benteley 2</carname>
<carname>Benteley 3</carname>
</luxurycars>
</cars>
My Java file QueryXmlFileDemo.java:
package xml;
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class QueryXmlFileDemo {
public static void main(String[] args) {
try {
File inputFile = new File("sportcars.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(inputFile);
doc.getDocumentElement().normalize();
Node n = doc.getFirstChild();
NodeList nL = n.getChildNodes();
System.out.println("Nodelist length: " + nL.getLength());
for (int i = 0; i < nL.getLength(); i++) {
Node temp = nL.item(i);
System.out.println("Current Element: " + temp.getTextContent());
System.out.println("NodeName: " + temp.getNodeName());
System.out.println("Root Element: " + doc.getDocumentElement().getNodeName());
NodeList nList = doc.getElementsByTagName("supercars");
}
} catch (Exception e) {
}
}
}
Output:
Nodelist length: 7
Current Element:
NodeName: #text
Current Element:
Ferarri 101
Ferarri 201
Ferarri 301
NodeName: supercars
Current Element:
NodeName: #text
Current Element:
Lamborgini 001
Lamborgini 002
Lamborgini 003
NodeName: supercars
Current Element:
NodeName: #text
Current Element:
Benteley 1
Benteley 2
Benteley 3
NodeName: luxurycars
Current Element:
NodeName: #text
So, how can I print only the nodes "supercars, supercars, luxurycars" and nothing else?
A better way of retrieving nodes is to use XPath or XQuery, which are inherently easier to reason about.
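For instance, a sketch with the javax.xml.xpath API from the JDK could select the direct element children of <cars> with the expression /cars/*. The class and method names below are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DirectChildrenXPath {
    /** Returns the names of the element children of the <cars> root, in document order. */
    static List<String> childNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // "*" matches elements only, so whitespace text nodes never show up.
            NodeList nodes = (NodeList) xpath.evaluate("/cars/*", doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<String>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }
}
```

Replacing /cars/* with /*/* would make the query independent of the root element's name as well.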
You get the "#text" in the output because in XML there are text nodes between the elements, even if these are just white space like line breaks or indentation. See the Node Javadoc on the different possible node types.
When you print a node's getTextContent(), you get the concatenated text of the node and all of its descendants, as per the Javadoc.
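To see where the length of 7 comes from, a small sketch can count the direct children of the root by node type. With three elements separated by line breaks, it should find three element nodes and four whitespace-only text nodes. The class name ChildNodeCounter is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildNodeCounter {
    /** Returns {number of element children, number of text children} of the root element. */
    static int[] count(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            int elements = 0;
            int texts = 0;
            for (int i = 0; i < children.getLength(); i++) {
                short type = children.item(i).getNodeType();
                if (type == Node.ELEMENT_NODE) {
                    elements++;
                } else if (type == Node.TEXT_NODE) {
                    texts++; // line breaks and indentation between the elements
                }
            }
            return new int[] { elements, texts };
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```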
If you just want to ignore the #text nodes (or any other ones), you can check in your loop what node you're dealing with. In your case, it would be something like this:
if (Node.ELEMENT_NODE != temp.getNodeType()) {
continue;
}
I found the solution, but I also have to admit that my question was too broad and confusing. Therefore I am posting my way of solving the problem, and I hope this clarifies what I was asking before.
package xml;
import javax.xml.parsers.DocumentBuilder;
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class QueryXmlFileDemo {
public static void main(String[] args) {
try {
File inputFile = new File("sportcars.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document inputDocument = dBuilder.parse(inputFile);
inputDocument.getDocumentElement().normalize();
Node carsNode = inputDocument.getFirstChild();
NodeList carsNodeList = carsNode.getChildNodes();
for (int i = 0; i < carsNodeList.getLength(); i++) {
Node carTypes = carsNodeList.item(i);
// hides the #text-entries
if (Node.ELEMENT_NODE != carTypes.getNodeType()) {
continue;
}
System.out.println("CarType: " + carTypes.getNodeName());
}
} catch (Exception e) {
e.printStackTrace(); // do not swallow parse errors silently
}
}
}
Output:
CarType: supercars
CarType: supercars
CarType: luxurycars
So without knowing the attributes of my XML-document I can get the "first level" of the nodes - the first nodes within <cars>: <supercars>, <supercars> and <luxurycars>.
I have the below plist.
<?xml version="1.5" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//PSect//ACD PLIST 1.5//" "http://pset.com/ACD/plist.dtd">
<plist version="1.5">
<dict>
<key>City</key>
<string>Melbourne</string>
<key>DetailedInfo</key>
<dict>
<key>Name</key>
<real>Sam</real>
<key>Income</key>
<real>4000</real>
</dict>
<key>Status</key>
<string>Single</string>
<key>PIN</key>
<string>123456789</string>
</dict>
</plist>
I have the code to parse this plist into an xml file. What I need help with is to find the key City in the plist. I have looked at some posts to search for a string in an xml file, but haven't had much luck. Basically what I want to do is,
1. Check if my xml file has Key City
2. If it does, assign its value (Melbourne) to another String.
Is there any way I can achieve this? Please suggest.
I am not sure about the DOCTYPE you have in your plist, but try this:
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
public class Demo {
public static void main(String[] args) throws Exception {
String keyVal = null;
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.parse(new File("input.xml"));
NodeList keyList = document.getElementsByTagName("key");
if(keyList !=null && keyList.getLength() > 0) {
for(int i =0; i< keyList.getLength(); i++) {
keyVal = keyList.item(i).getTextContent();
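if ("City".equals(keyVal)) {
// The value is the next element after this <key>, not the i-th <string>:
// other keys may be followed by <dict> or <real>, which shifts the indexes.
org.w3c.dom.Node value = keyList.item(i).getNextSibling();
while (value != null && value.getNodeType() != org.w3c.dom.Node.ELEMENT_NODE) {
value = value.getNextSibling(); // skip whitespace text nodes
}
if (value != null) {
System.out.println(value.getTextContent());
}
}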
}
}
}
}
XPath path = XPathFactory.newInstance().newXPath();
NodeList nl = (NodeList) path.evaluate("//dict[key/text()='City']", doc, XPathConstants.NODESET);
if (nl.getLength() == 1) {
Element dictElement = (Element) nl.item(0);
NodeList stringNodeList = dictElement.getElementsByTagName("string");
for (int i = 0; i < stringNodeList.getLength(); i++) {
// replace string here
System.out.println("Replace: " + stringNodeList.item(i));
}
}
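A variation on the XPath idea is to select the element that directly follows the matching <key>, using the following-sibling axis. The sketch below makes a few assumptions: the class name PlistKeyLookup is invented, the key is assumed not to contain quote characters, and the test input has no DOCTYPE (with the DOCTYPE from the question, the parser may try to fetch the external DTD).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class PlistKeyLookup {
    /** Returns the text of the element following <key>key</key>, or null if the key is absent. */
    static String valueForKey(String xml, String key) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Assumption: key contains no quotes, so it can be embedded verbatim.
            String expression = "//key[. = '" + key + "']/following-sibling::*[1]";
            // Evaluating as a string yields "" when nothing matches.
            String value = xpath.evaluate(expression, doc);
            return value.isEmpty() ? null : value;
        } catch (Exception e) {
            throw new IllegalStateException("Lookup failed for key " + key, e);
        }
    }
}
```

Usage would then be along the lines of String city = PlistKeyLookup.valueForKey(xml, "City"); followed by a null check. A key whose value is empty is indistinguishable from a missing key in this sketch.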
Why don't you use an object-XML binding framework like JAXB or XStream to do this? Those frameworks create an object out of your XML, which makes it very easy to navigate. For example, you could do if ("City".equals(getDict().getKey())) { /* then do this */ }
I am basically following the example here
http://www.mkyong.com/java/how-to-read-xml-file-in-java-jdom-example/
The example reads each value by name, like
node.getChildText("firstname")
and this works fine.
But is there a way to get all the "keys" first and then query each of them for its value,
just like we do when parsing JSON,
JSONObject json = (JSONObject) parser.parse(value);
for (Object key : json.keySet()) {
Object val = json.get(key);
}
rather than hardcoding keys and values?
Thanks
Code for reference:
package org.random_scripts;
import java.io.File;
import java.io.IOException;
import java.util.List;
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;
public class XMLReader {
public static void main(String[] args) {
SAXBuilder builder = new SAXBuilder();
File xmlFile = new File("data.xml");
try {
Document document = (Document) builder.build(xmlFile);
Element rootNode = document.getRootElement();
List list = rootNode.getChildren("staff");
List children = rootNode.getChildren();
System.out.println(children);
for (int i = 0; i < list.size(); i++) {
Element node = (Element) list.get(i);
System.out.println("First Name : " + node.getChildText("firstname"));
System.out.println("Last Name : " + node.getChildText("lastname"));
System.out.println("Nick Name : " + node.getChildText("nickname"));
System.out.println("Salary : " + node.getChildText("salary"));
}
} catch (IOException io) {
System.out.println(io.getMessage());
} catch (JDOMException jdomex) {
System.out.println(jdomex.getMessage());
}
}
}
Well, if you wanted to write out all of the children of the node, you could do something like this:
List children = rootNode.getChildren();
for (int i = 0; i < children.size(); i++) {
Element node = (Element) children.get(i);
List dataNodes = node.getChildren();
for (int j = 0; j < dataNodes.size(); ++j) {
Element dataNode = (Element) dataNodes.get(j);
System.out.println(dataNode.getName() + " : " + dataNode.getText());
}
}
This would let you write out all of the children without knowing the names, with the only downside being that you wouldn't have "pretty" names for the fields (i.e. "First Name" instead of "firstname"). Of course, you'd have the same limitation in JSON - I don't know of an easy way to get pretty names for the fields unless your program has some knowledge about what the children are, which is the thing you seem to be trying to avoid.
The above code only lists the first-level children under the tag.
For example:
<parent>
<child1>
<childinternal></childinternal>
</child1>
<child2></child2>
</parent>
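For this XML, the above code prints only child1 and child2. If you also want to print the nested nodes at any depth, you have to make a recursive call.
To check whether a child has more nodes in it, you can use the JDOM API child.getContentSize(); if it is greater than 1, it has more content. Note that getContentSize() counts text content as well as elements, so child.getChildren().isEmpty() is a more direct test for nested elements.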
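The recursion could be sketched as follows. To keep the example free of extra dependencies, it uses the JDK's built-in org.w3c.dom API rather than JDOM; with JDOM the same shape would apply, calling getChildren() on each Element instead of filtering getChildNodes(). The class name ElementTreeOutline is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ElementTreeOutline {
    /** Returns one line per element, indented by two spaces per nesting level. */
    static List<String> outline(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            List<String> lines = new ArrayList<String>();
            walk(doc.getDocumentElement(), 0, lines);
            return lines;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Records this element, then recurses into each child element.
    private static void walk(Element element, int depth, List<String> lines) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            line.append("  ");
        }
        lines.add(line.append(element.getNodeName()).toString());

        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) child, depth + 1, lines);
            }
        }
    }
}
```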
I can't fetch text value with Node.getNodeValue(), Node.getFirstChild().getNodeValue() or with Node.getTextContent().
My XML is like
<add job="351">
<tag>foobar</tag>
<tag>foobar2</tag>
</add>
And I'm trying to get the tag value (non-text element fetching works fine). My Java code looks like this:
Document doc = db.parse(new File(args[0]));
Node n = doc.getFirstChild();
NodeList nl = n.getChildNodes();
Node an,an2;
for (int i=0; i < nl.getLength(); i++) {
an = nl.item(i);
if(an.getNodeType()==Node.ELEMENT_NODE) {
NodeList nl2 = an.getChildNodes();
for(int i2=0; i2<nl2.getLength(); i2++) {
an2 = nl2.item(i2);
// DEBUG PRINTS
System.out.println(an2.getNodeName() + ": type (" + an2.getNodeType() + "):");
if(an2.hasChildNodes())
System.out.println(an2.getFirstChild().getTextContent());
if(an2.hasChildNodes())
System.out.println(an2.getFirstChild().getNodeValue());
System.out.println(an2.getTextContent());
System.out.println(an2.getNodeValue());
}
}
}
It prints out
tag type (1):
tag1
tag1
tag1
null
#text type (3):
_blank line_
_blank line_
...
Thanks for the help.
I'd print out the result of an2.getNodeName() as well for debugging purposes. My guess is that your tree crawling code isn't crawling to the nodes that you think it is. That suspicion is enhanced by the lack of checking for node names in your code.
Other than that, the javadoc for Node defines "getNodeValue()" to return null for Nodes of type Element. Therefore, you really should be using getTextContent(). I'm not sure why that wouldn't give you the text that you want.
Perhaps iterate the children of your tag node and see what types are there?
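A small sketch makes the difference between the three calls visible on the XML from the question. The class name NodeValueDemo is made up; the DOM calls are the standard org.w3c.dom ones.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeValueDemo {
    /**
     * Returns, for the first <tag> element:
     * [0] getNodeValue(), [1] getTextContent(), [2] getFirstChild().getNodeValue().
     */
    static String[] describeFirstTag(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Node tag = doc.getElementsByTagName("tag").item(0);
            return new String[] {
                tag.getNodeValue(),                 // null: element nodes have no node value
                tag.getTextContent(),               // text of the element and its descendants
                tag.getFirstChild().getNodeValue()  // value of the child text node
            };
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

The third form assumes the element has a text node as its first child; for an empty <tag/> element, getFirstChild() returns null.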
Tried this code and it works for me:
String xml = "<add job=\"351\">\n" +
" <tag>foobar</tag>\n" +
" <tag>foobar2</tag>\n" +
"</add>";
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
ByteArrayInputStream bis = new ByteArrayInputStream(xml.getBytes());
Document doc = db.parse(bis);
Node n = doc.getFirstChild();
NodeList nl = n.getChildNodes();
Node an,an2;
for (int i=0; i < nl.getLength(); i++) {
an = nl.item(i);
if(an.getNodeType()==Node.ELEMENT_NODE) {
NodeList nl2 = an.getChildNodes();
for(int i2=0; i2<nl2.getLength(); i2++) {
an2 = nl2.item(i2);
// DEBUG PRINTS
System.out.println(an2.getNodeName() + ": type (" + an2.getNodeType() + "):");
if(an2.hasChildNodes()) System.out.println(an2.getFirstChild().getTextContent());
if(an2.hasChildNodes()) System.out.println(an2.getFirstChild().getNodeValue());
System.out.println(an2.getTextContent());
System.out.println(an2.getNodeValue());
}
}
}
Output was:
#text: type (3):
foobar
foobar
#text: type (3):
foobar2
foobar2
If your XML goes quite deep, you might want to consider using XPath, which comes with your JRE, so you can access the contents far more easily using:
String text = xp.evaluate("//add[@job='351']/tag[position()=1]/text()",
document.getDocumentElement());
Full example:
import static org.junit.Assert.assertEquals;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.junit.Before;
import org.junit.Test;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
public class XPathTest {
private Document document;
@Before
public void setup() throws Exception {
String xml = "<add job=\"351\"><tag>foobar</tag><tag>foobar2</tag></add>";
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
document = db.parse(new InputSource(new StringReader(xml)));
}
@Test
public void testXPath() throws Exception {
XPathFactory xpf = XPathFactory.newInstance();
XPath xp = xpf.newXPath();
String text = xp.evaluate("//add[@job='351']/tag[position()=1]/text()",
document.getDocumentElement());
assertEquals("foobar", text);
}
}
I use a very old Java version, JDK 1.4.08, and I had the same issue. The Node class there did not have the getTextContent() method, so I had to use Node.getFirstChild().getNodeValue() instead of Node.getNodeValue() to get the value of the node. This fixed it for me.
If you are open to vtd-xml, which excels at both performance and memory efficiency, below is the code to do what you are looking for, using both XPath and manual navigation. The overall code is much more concise and easier to understand.
import com.ximpleware.*;
public class queryText {
public static void main(String[] s) throws VTDException{
VTDGen vg = new VTDGen();
if (!vg.parseFile("input.xml", true))
return;
VTDNav vn = vg.getNav();
AutoPilot ap = new AutoPilot(vn);
// first manually navigate
if(vn.toElement(VTDNav.FC,"tag")){
int i= vn.getText();
if (i!=-1){
System.out.println("text ===>"+vn.toString(i));
}
if (vn.toElement(VTDNav.NS,"tag")){
i=vn.getText();
System.out.println("text ===>"+vn.toString(i));
}
}
// second version use XPath
ap.selectXPath("/add/tag/text()");
int i=0;
while((i=ap.evalXPath())!= -1){
System.out.println("text node ====>"+vn.toString(i));
}
}
}