Retrieve XML Element names with Java from unknown message format - java

I am parsing XML from lots of JMS messaging topics, so the structure of each message varies a lot and I'd like to make one general tool to parse them all.
To start, all I want to do is get the element names:
<gui-action>
<action>some action</action>
<params>
<param1>blue</param1>
<param2>tall</param2>
</params>
</gui-action>
I just want to retrieve the strings "gui-action", "action", "params", "param1", and "param2". Duplicates are fine.
I've tried using org.w3c.dom.Node, Element, NodeLists and I'm not having much luck. I keep getting the element values, not the names.
private Element root;
private Document doc;
private NodeList nl;
//messageStr is passed in elsewhere in the code
//but is a string of the full XML message.
doc = xmlParse( messageStr );
root = doc.getDocumentElement();
nl = root.getChildNodes();
int size = nl.getLength();
for (int i=0; i<size; i++) {
log.info( nl.item(i).getNodeName() );
}
public Document xmlParse( String xml ){
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db;
InputSource is;
try {
//Using factory get an instance of document builder
db = dbf.newDocumentBuilder();
is = new InputSource(new StringReader( xml ) );
doc = db.parse( is );
} catch(ParserConfigurationException pce) {
pce.printStackTrace();
} catch(SAXException se) {
se.printStackTrace();
} catch(IOException ioe) {
ioe.printStackTrace();
}
return doc;
//parse using builder to get DOM representation of the XML file
}
My logged "parsed" XML looks like this:
#text
action
#text
params
#text

Figured it out. I was iterating over only the child nodes, and not including the parent. So now I just filter out the #texts, and include the parent. Derp.
log.info(root.getNodeName() );
for (int i=0; i<size; i++) {
String nodeName = nl.item(i).getNodeName();
// compare strings with equals(); != only compares object references
if( !"#text".equals(nodeName) ) {
log.info( nodeName );
}
}
Now if anyone knows a way to get a NodeList of the entire document, that would be awesome.
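For the whole-document case, one option is the wildcard form of getElementsByTagName: passing "*" returns every element in document order, root included, and never returns text nodes. The following is a minimal sketch of that idea; the class name ElementNames and the helper allElementNames are made up for illustration, and the sample message is the one from the question with the closing params tag corrected.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ElementNames {

    // Hypothetical helper: returns the name of every element in the message, in document order.
    public static List<String> allElementNames(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = db.parse(new InputSource(new StringReader(xml)));
            // "*" matches all elements, including the root; #text nodes are not elements.
            NodeList all = doc.getElementsByTagName("*");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < all.getLength(); i++) {
                names.add(all.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            // Rethrown unchecked to keep the sketch short.
            throw new IllegalArgumentException("Could not parse XML message", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<gui-action><action>some action</action>"
                + "<params><param1>blue</param1><param2>tall</param2></params></gui-action>";
        System.out.println(allElementNames(xml));
    }
}
```

Because the lookup is done on the Document, nested elements such as param1 and param2 are included without any recursion.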

Related

XML parsing using XPath in Java

Hi!
I've spent some time trying to parse an XML document with XPath. It seems to be a simple task, but I ran into trouble from the beginning.
My code is :
public class QueryXML3 {
public static void main(String[] args) {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder;
Document doc = null;
try {
builder = factory.newDocumentBuilder();
//doc = builder.parse("SampleExample.xml");
InputStream is = QueryXML3.class.getClassLoader().getResourceAsStream("SampleXml.xml");
doc = builder.parse(is);
XPathFactory xpathFactory = XPathFactory.newInstance();
// Create XPath object
XPath xpath = xpathFactory.newXPath();
Node parNode = getParameterNode(doc, xpath);
System.out.println("parameter node:" + parNode);
NodeList res = getParameterNodeList(doc, xpath );
System.out.println("List of nodes" + res);
} catch (ParserConfigurationException | SAXException | IOException e) {
e.printStackTrace();
}
}
public static Node getParameterNode(Document doc, XPath xpath) {
Node res = null;
try {
res = (Node) xpath.evaluate("/definitions/process", doc, XPathConstants.NODE);
} catch (XPathExpressionException e) {
e.printStackTrace();
}
return res;
}
public static NodeList getParameterNodeList(Document doc, XPath xpath) {
NodeList nodeList = null;
try {
nodeList = (NodeList) xpath.evaluate("/definitions/process", doc, XPathConstants.NODESET);
// Bug: the condition should be i < nodeList.getLength(); as written, the loop body never runs.
for (int i = 0; i > nodeList.getLength(); i++) {
System.out.print(nodeList.item(i).getNodeName() + " ");
}
} catch (XPathExpressionException e) {
e.printStackTrace();
}
return nodeList;
}
}
As a result I get this:
parameter node:[process: null]
List of nodes com.sun.org.apache.xml.internal.dtm.ref.DTMNodeList@2f17aadf
I just want to output all the nodes of my XML file and their attributes...
You are really asking how to serialize an Element to a string - use either a Transformer or DOMImplementationLS.
The NodeList type has no toString() contract and the implementation does not override the default Object.toString(). You need to iterate over the nodes and serialize each Element as above.
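As a rough illustration of the Transformer route, the sketch below serializes each node from an XPath result with an identity transform. The class NodeSerializer, its helpers parse and toXml, and the inline sample document are assumptions made for this example; they stand in for the question's SampleXml.xml, whose content is not shown.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NodeSerializer {

    // Hypothetical helper: parses a string into a DOM Document.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Hypothetical helper: serializes one node and its subtree, attributes included.
    public static String toXml(Node node) {
        try {
            // A Transformer created without a stylesheet copies its input unchanged.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize node", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample data standing in for SampleXml.xml.
        Document doc = parse(
                "<definitions><process id=\"p1\"><task name=\"t1\"/></process></definitions>");
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("/definitions/process", doc, XPathConstants.NODESET);
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(toXml(nodes.item(i)));
        }
    }
}
```

Printing the NodeList itself still only shows the class name and hash code; the loop over item(i) is what produces readable output.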
You could easily parse an XML file in Java using a third-party library such as jsoup or JDOM.
As an example, here is some simple output of an XML file's elements using jsoup:
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
Java code printing all elements and the selected <from>-element:
String xml = "<note>\n"
+ "<to>Tove</to>\n"
+ "<from>Jani</from>\n"
+ "<heading>Reminder</heading>\n"
+ "<body>Don't forget me this weekend!</body>\n"
+ "</note>";
Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
for (Element e : doc.children()) {
System.out.println(e);
}
Element fromElement = doc.select("from").first();
System.out.println("\nThis is the <from>-element content:\n" + fromElement);

Retrieve values from XML tag in Java

I have a set of XML output strings from a natural language tool and need to retrieve values from them, supplying null for any tag that is not present in the output string. I tried the Java code provided in "Extracting data from XML using Java", but it doesn't seem to work.
Current sample tag inventory is listed below:
<TimeStamp>, <Role>, <SpeakerId>, <Person>, <Location>, <Organization>
Sample XML output string:
<TimeStamp>00.00.00</TimeStamp> <Role>Speaker1</Role><SpeakerId>1234</SpeakerId>Blah, blah, blah.
Desired output:
TimeStamp: 00.00.00
Role: Speaker1
SpeakerId: 1234
Person: null
Place: null
Organization: null
In order to use the Java code provided in the link above (in the updated code), I wrapped the string in <Dummy> and </Dummy> as follows:
<Dummy><TimeStamp>00.00.00</TimeStamp><Role>Speaker1</Role><SpeakerId>1234</SpeakerId>Blah, blah, blah.</Dummy>
However, it returns dummy and null only. Since I'm still a newbie to Java, detailed explanations will be much appreciated.
Try it this way; I hope it helps.
File fXmlFile = new File("yourfile.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
You can get child node list like this:
NodeList nList = doc.getElementsByTagName("staff");
Get the item like this:
Node nNode = nList.item(temp);
Example Site
This is what I ended up doing for my Java wrapper (showing TimeStamp only):
public class NERPost {
public String convertXML (String input) {
String nerOutput = input;
try {
DocumentBuilderFactory docBuilderFactory =
DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(nerOutput));
Document doc = docBuilder.parse(is);
// normalize text representation
doc.getDocumentElement ().normalize ();
NodeList listOfDummies = doc.getElementsByTagName("dummy");
for(int s=0; s<listOfDummies.getLength() ; s++){
Node firstDummyNode = listOfDummies.item(s);
if(firstDummyNode.getNodeType() == Node.ELEMENT_NODE){
Element firstDummyElement = (Element)firstDummyNode;
//Convert each entity label --------------------------------
//TimeStamp
String ts = "<TimeStamp>";
Boolean foundTs;
if (foundTs = nerOutput.contains(ts)) {
NodeList timeStampList = firstDummyElement.getElementsByTagName("TimeStamp");
//loop over every TimeStamp element
for (int i=0; i<timeStampList.getLength(); i++) {
Node firstTimeStampNode = timeStampList.item(i);
Element timeStampElement = (Element)firstTimeStampNode;
NodeList textTSList = timeStampElement.getChildNodes();
String timeStampOutput = ((Node)textTSList.item(0)).getNodeValue().trim();
System.out.println ("<TimeStamp>" + timeStampOutput + "</TimeStamp>\n");
} //end for
}//end if
//other XML tags
//.....
}//end if
}//end for
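} catch (ParserConfigurationException | SAXException | IOException e) {
// the original catch block was elided; printing the stack trace is a placeholder
e.printStackTrace();
}//end try
// the original return statement was not shown; returning the input unchanged is an assumption
return nerOutput;
}}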
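For the goal stated in the question (every tag in the inventory printed, with null for the missing ones), a shorter route may be to loop over the tag names and ask the document for each one. This is only a sketch under assumptions: the class TagValues and the method extract are invented for illustration, the wrapper element is named Dummy as in the question, and only the first occurrence of each tag is read. Note that XML element names are case-sensitive, so a lookup for "dummy" would not match a <Dummy> wrapper.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TagValues {

    // Tag inventory taken from the question.
    private static final String[] TAGS =
            {"TimeStamp", "Role", "SpeakerId", "Person", "Location", "Organization"};

    // Hypothetical helper: maps each tag in the inventory to its text, or to null if absent.
    public static Map<String, String> extract(String fragment) {
        try {
            // The tool output has no single root element, so wrap it to make it well-formed.
            String wrapped = "<Dummy>" + fragment + "</Dummy>";
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wrapped)));
            Map<String, String> values = new LinkedHashMap<>();
            for (String tag : TAGS) {
                NodeList hits = doc.getElementsByTagName(tag); // case-sensitive match
                values.put(tag, hits.getLength() > 0
                        ? hits.item(0).getTextContent().trim()
                        : null);
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse tool output", e);
        }
    }

    public static void main(String[] args) {
        String sample = "<TimeStamp>00.00.00</TimeStamp> <Role>Speaker1</Role>"
                + "<SpeakerId>1234</SpeakerId>Blah, blah, blah.";
        for (Map.Entry<String, String> e : extract(sample).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

A LinkedHashMap is used so the output keeps the order of the inventory.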

Parsing XML from website to an Android device

I am starting an Android application that will parse XML from the web. I've created a few Android apps but they've never involved parsing XML and I was wondering if anyone had any tips on the best way to go about it?
Here's an example:
try {
URL url = new URL(/*your xml url*/);
URLConnection conn = url.openConnection();
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(conn.getInputStream());
NodeList nodes = doc.getElementsByTagName(/*tag from xml file*/);
for (int i = 0; i < nodes.getLength(); i++) {
Element element = (Element) nodes.item(i);
NodeList title = element.getElementsByTagName(/*item within the tag*/);
Element line = (Element) title.item(0);
phoneNumberList.add(line.getTextContent());
}
}
catch (Exception e) {
e.printStackTrace();
}
In my example, my XML file looks a little like:
<numbers>
<phone>
<string name = "phonenumber1">555-555-5555</string>
</phone>
<phone>
<string name = "phonenumber2">555-555-5555</string>
</phone>
</numbers>
and I would replace /*tag from xml file*/ with "phone" and /*item within the tag*/ with "string".
I always use the W3C DOM classes. I have a static helper method that parses the XML data from a string and returns a Document object. Where you get the XML data can vary (web, file, etc.), but eventually you load it as a string.
Something like this:
Document document = null;
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder;
try
{
builder = factory.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(data));
document = builder.parse(is);
}
catch (SAXException e) { }
catch (IOException e) { }
catch (ParserConfigurationException e) { }
There are different parsing mechanisms available. One is SAX (here is a SAX parsing example); the second is DOM (here is a DOM parsing example). From your question it is not clear what you want, but these may be good starting points.
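To give a feel for the SAX style, here is a small sketch that collects the text of every element with a given name while the parser streams through the input, without building a tree. The class SaxTextCollector and the method textOf are hypothetical names, and the sample data mirrors the phone-number XML shown earlier on this page.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxTextCollector {

    // Hypothetical helper: returns the text content of each element named `tag`.
    public static List<String> textOf(String xml, String tag) {
        List<String> result = new ArrayList<>();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                private StringBuilder current; // non-null only while inside a matching element

                @Override
                public void startElement(String uri, String localName, String qName,
                                         Attributes attrs) {
                    if (qName.equals(tag)) {
                        current = new StringBuilder();
                    }
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    // The parser may deliver text in several chunks, so append each one.
                    if (current != null) {
                        current.append(ch, start, length);
                    }
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    if (current != null && qName.equals(tag)) {
                        result.add(current.toString());
                        current = null;
                    }
                }
            });
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<numbers><phone><string name=\"phonenumber1\">555-555-5555</string></phone>"
                + "<phone><string name=\"phonenumber2\">555-555-5555</string></phone></numbers>";
        System.out.println(textOf(xml, "string"));
    }
}
```

The trade-off is the usual one: SAX keeps memory use low on large documents, while DOM is easier to navigate once the document fits comfortably in memory.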
There are three types of parsing I know of: DOM, SAX, and XML pull parsing.
In my example here you need the URL and the parent node of the XML element.
try {
URL url = new URL("http://www.something.com/something.xml");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new InputSource(url.openStream()));
doc.getDocumentElement().normalize();
NodeList nodeList1 = doc.getElementsByTagName("parent node here");
for (int i = 0; i < nodeList1.getLength(); i++) {
Node node = nodeList1.item(i);
}
} catch(Exception e) {
}
Also try this.
I would use the DOM parser if the XML file is not too large. It is not as efficient as SAX, but it is easier to work with in that case.
I have made just one Android app that involved XML parsing. The XML was received from a SOAP web service, and I used XmlPullParser. The implementation from Xml.newPullParser() had a bug where calls to nextText() did not always advance to the END_TAG as the documentation promised. There is a workaround for this.

Issue with XML child nodes iteration when mix of text and element nodes

I am trying to parse the following strings into an XML document, extract all child nodes of <dhruba>, and add them to a different Document object that I already have.
<dhruba><test>this</test>that<test2>wang chu</test2> something.... </dhruba>
<dhruba>this is text node <test>this</test>that<test2>wang chu</test2> anything..</dhruba>
When I read the child nodes, the list returns null for the TEXT_NODE children of the first string and null for the ELEMENT_NODE children of the second string. This seems wrong; is it an API problem?
I am using the following code. It compiles; I am using Java 6.
Node n = null;
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
try {
db = dbf.newDocumentBuilder();
} catch (ParserConfigurationException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
dom = db.newDocument();
Element rootEle = dom.createElement("resources");
// adding the root element to the document
dom.appendChild(rootEle);
Element element = dom.createElement("string");
element.setAttribute("name", "some_name");
try {
n = db.parse(new InputSource(new StringReader("<dhruba><test>this</test>that<test2>node value</test2> some text</dhruba>"))).getDocumentElement();
n = dom.importNode(n, true);
NodeList nodeList = n.getChildNodes();
int length = nodeList.getLength();
System.out.println("Total no of childs : "+length);
for(int count = 0 ; count < length ; count++ ){
Node node = nodeList.item(count);
if(node != null ){
element.appendChild(node);
}
}
} catch (SAXException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
rootEle.appendChild(element);
INPUT :: as string
<dhruba><string name="some_name">
that
<test>this</test>
<test2>node value</test2>
some text
</string>
</dhruba>
EXPECTED OUTPUT :: as document
<string>
<string name="some_name">
<test>this</test>
<test2>node value</test2>
</string>
</string>
if I try to parse
<test>this</test>that<test2>wang chu</test2> something....
then output comes as "thiswang chu"
Why is this happening? What needs to be done if I want to add the following nodes under an element of another document, i.e. <string>?
<test>this</test>
that
<test2>node value</test2>
some text
[Notice that it does not have <dhruba>.] It should go inside a parent node of another document.
I hope that is clear. The above code compiles in Java 6.
I will assume that this is Java.
First, I'm surprised that you don't get an exception with your importNode() call, since you're importing the Document, which shouldn't be allowed (per the JavaDoc).
Now to the question that you asked: if you only want to attach specific node types, you need to make a test using the node's type. A switch statement is the easiest (note: this has not been compiled, may contain syntax errors):
switch (n.getNodeType())
{
case Node.ELEMENT_NODE :
// append the node to the other tree
break;
default :
// do nothing
}
You probably want the Node.cloneNode() method:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document dom = db.newDocument();
Element element = dom.createElement("string");
element.setAttribute("name", "some_name");
String inputXMLString =
"<dhruba><test>this</test>that<test2>node value</test2> some text</dhruba>";
Node n = db.parse(new InputSource(new StringReader(inputXMLString))).getDocumentElement();
n = dom.importNode(n, true);
NodeList nodeList = n.getChildNodes();
for (int i = 0; i < nodeList.getLength(); ++i)
{
Node node = nodeList.item(i);
element.appendChild(node.cloneNode(true));
}
dom.appendChild(element);
To write dom to stdout or a file, you could use:
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
DOMSource source = new DOMSource(dom);
StreamResult result = new StreamResult(System.out);
transformer.transform(source, result);
Result:
<string name="some_name">
<test>this</test>that<test2>node value</test2> some text</string>
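The reason cloning helps is that the NodeList returned by getChildNodes() is live: appendChild() first removes a node from its current parent, so the list shrinks while the index keeps growing, and every other child is skipped. That matches the "thiswang chu" result in the question, where only the two elements were moved. If copies are not needed, an alternative sketch is to keep moving the first child until none are left. The class MoveChildren and the method moveChildren are names invented for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class MoveChildren {

    // Hypothetical helper: parses `xml`, imports its root into a new document, and moves
    // all of the root's children (text and elements alike) under a new <string> element.
    public static Element moveChildren(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document dom = db.newDocument();
            Element target = dom.createElement("string");
            target.setAttribute("name", "some_name");
            dom.appendChild(target);

            Node source = db.parse(new InputSource(new StringReader(xml))).getDocumentElement();
            Node imported = dom.importNode(source, true);

            // appendChild() detaches the node from `imported`, so always take the
            // current first child instead of walking the live list by index.
            while (imported.hasChildNodes()) {
                target.appendChild(imported.getFirstChild());
            }
            return target;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element e = moveChildren(
                "<dhruba><test>this</test>that<test2>node value</test2> some text</dhruba>");
        System.out.println(e.getChildNodes().getLength() + " children: " + e.getTextContent());
    }
}
```

Either approach keeps the text nodes; combine it with a node-type check if only elements should be copied.that<test2>node value</test2> some text</dhruba>");
assert e.getChildNodes().getLength() == 4;
assert e.getTextContent().equals("thisthatnode value some text");
assert e.getAttribute("name").equals("some_name");
</test>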

A question on how to solve the string problem in Java

I've created a simple xml file here:
http://roberthan.host56.com/productsNew.xml
which is quite simple, the root node is [products] while all other element nodes are [product]. Under each [product] node, there are two child nodes, [code] and [name], so it basically looks like:
[product]
[code]ddd[/code]
[name]ssss[/name]
[/product]
I've also written up the following Java code to parse this XML file and take out the text content of the [product] node, and add it to a JComboBox.
docBuilder = docFactory.newDocumentBuilder();
doc = docBuilder.parse("http://roberthan.host56.com/productsNew.xml");
NodeList productNodes = doc.getElementsByTagName("product");
productlist.clear();
for (i = 0; i < productNodes.getLength(); i++)
{
Node childNode = productNodes.item(i);
if (childNode.hasChildNodes()) {
NodeList nl = childNode.getChildNodes();
Node nameNode = nl.item(2);
productlist.add(nameNode.getTextContent());
}
}
final JComboBox productComboB = new JComboBox();
Iterator iterator = productlist.iterator();
while(iterator.hasNext())
{
productComboB.addItem(iterator.next().toString());
}
The code is quite straightforward. I first parse the XML, get all the product nodes, and put them into a NodeList; productList is an ArrayList. I loop through all the [product] nodes and, for each one that has child nodes, take the second child node (which is the [name] node) and put its text content in the ArrayList. Finally, I loop through the ArrayList and add each item to the combo box.
The problem I got is, if I select the [code] child node, which means "Node nameNode = nl.item(1)", it will work perfectly; however, if I change that item(1) to item(2) to extract all the [name] nodes, the combo box will have a drop down list, but all the items are blank, like I have inserted 10 empty strings.
Also, if I try to add a "Hello World" string into the combo box after the above code, the "Hello World" item will appear after the 10 empty items.
I have spent the whole afternoon debugging this but still no breakthrough, the XML is actually quite simple and the Java is straightforward too. Could anyone share some thoughts with me on this please. Thanks a lot!
It is because the node list also contains text nodes.
You can see this by adding the following snippet to your code:
for(int j = 0;j<nl.getLength();j++){
System.out.println(nl.item(j).getNodeName());
}
It gives the following output for each product:
#text
code
#text
name
#text
This means the name node is at index 3 (the fourth item in the list), because the whitespace between the tags is stored as text nodes.
Node nameNode = nl.item(3);
But I'd suggest using XPath to solve this problem.
NodeList nodelist = XPathAPI.selectNodeList(doc, "//products/product/name");
for (int i = 0; i < nodelist.getLength(); i++) {
productlist.add(nodelist.item(i).getTextContent());
}
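If you would rather stay with the plain DOM API, another way to avoid counting text nodes is to look the child up by name instead of by index. The sketch below assumes the structure described in the question (product elements with code and name children); the class ProductNames, the method names, and the sample values are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ProductNames {

    // Hypothetical helper: returns the text of the first <name> under each <product>.
    public static List<String> names(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> result = new ArrayList<>();
            NodeList products = doc.getElementsByTagName("product");
            for (int i = 0; i < products.getLength(); i++) {
                // getElementsByTagName only returns elements, so this cast is safe.
                Element product = (Element) products.item(i);
                // Searching by tag name ignores the whitespace text nodes between children.
                NodeList nameNodes = product.getElementsByTagName("name");
                if (nameNodes.getLength() > 0) {
                    result.add(nameNodes.item(0).getTextContent());
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample data with the same shape as productsNew.xml.
        String xml = "<products>\n"
                + "<product>\n<code>c1</code>\n<name>Widget</name>\n</product>\n"
                + "<product>\n<code>c2</code>\n<name>Gadget</name>\n</product>\n"
                + "</products>";
        System.out.println(names(xml));
    }
}
```

This keeps working if the file's indentation changes, which a hard-coded index such as item(3) does not.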
XPath using this expression will easily solve your problem:
String XPATH_EXPRESSION1 = "//name/text()";
e.g.,
public static final String PRODUCTS_NEW = "http://roberthan.host56.com/productsNew.xml";
public static final String XPATH_EXPRESSION1 = "//name/text()";
public XmlFun() {
URL productsUrl;
try {
productsUrl = new URL(PRODUCTS_NEW);
List<String> nameList = xPathExtract(productsUrl.openStream());
} catch (MalformedURLException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} catch (ParserConfigurationException e) {
e.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
} catch (XPathExpressionException e) {
e.printStackTrace();
}
}
private List<String> xPathExtract(InputStream inStream) throws ParserConfigurationException, SAXException, IOException, XPathExpressionException {
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document domDoc = builder.parse(inStream);
XPathFactory xFactory = XPathFactory.newInstance();
XPath xpath = xFactory.newXPath();
XPathExpression xExpr = xpath.compile(XPATH_EXPRESSION1);
NodeList nodes = (NodeList)xExpr.evaluate(domDoc, XPathConstants.NODESET);
List<String> resultList = new ArrayList<String>();
for (int i = 0; i < nodes.getLength(); i++) {
String node = nodes.item(i).getNodeValue();
resultList.add(node);
}
return resultList;
}
