Parsing XML file Using Java (DOM parser) - java

Ok, so I have been able to kind of parse through this xml file. But I am unable to get to the section I want.
http://www.faroo.com/api?q=iphone&start=1&length=10&l=en&src=news&f=rss
This is the URL to the xml because it looks very ugly just pasted on here. I have gone through this xml and have copied it to a file. The part that I need is the "title" in the first "item". I have gone through with this code:
System.out.println(myDocument.getElementsByTagName("item").item(0).getTextContent());
This prints all of the contents of the first "item" (the "title", "link" and "description"), but I only want the "title" to be printed. I am having problems getting it to work exactly right, but I feel like I am close. Any help will be appreciated. Thanks.

From the Oracle documentation on the org.w3c.dom package:
This attribute returns the text content of this node and its descendants.
Your code is calling getTextContent() on the item tag. If you modify your code so that it retrieves the text from the title tag, it works correctly.
System.out.println(myDocument.getElementsByTagName("item").item(0).getFirstChild().getTextContent());
Note that this relies on title being the first child node of item. If whitespace or another element precedes it, getFirstChild() returns that node instead. You may want to change this to a more order-independent solution.

Below is code that iterates through the whole RSS feed and gets all the titles, links and descriptions. You can create an object that has title, link and description as attributes and use it as you please:
try {
    File fXmlFile = new File("api.xml");
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    Document doc = dBuilder.parse(fXmlFile);
    doc.getDocumentElement().normalize();
    NodeList nList = doc.getElementsByTagName("item");
    for (int temp = 0; temp < nList.getLength(); temp++) {
        Node nNode = nList.item(temp);
        if (nNode.getNodeType() == Node.ELEMENT_NODE) {
            Element eElement = (Element) nNode;
            System.out.println("title : " + eElement.getElementsByTagName("title").item(0).getTextContent());
            System.out.println("link : " + eElement.getElementsByTagName("link").item(0).getTextContent());
            System.out.println("description : " + eElement.getElementsByTagName("description").item(0).getTextContent());
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}
Hope that helps.
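The suggestion above about creating an object with title, link and description could be sketched roughly as follows. The class name RssItem, the helper childText and the inline sample feed are all made up for illustration; they are not part of the original answer. The helper returns an empty string when a tag is missing, so an item without a description does not cause a NullPointerException.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class RssItems {
    // Hypothetical value class; the name and fields are illustrative only.
    static final class RssItem {
        final String title;
        final String link;
        final String description;

        RssItem(String title, String link, String description) {
            this.title = title;
            this.link = link;
            this.description = description;
        }
    }

    // Text of the first descendant element with the given tag, or "" if there is none.
    static String childText(Element parent, String tag) {
        NodeList matches = parent.getElementsByTagName(tag);
        return matches.getLength() == 0 ? "" : matches.item(0).getTextContent().trim();
    }

    // Parses the feed and returns one RssItem per <item> element.
    static List<RssItem> parseItems(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList items = doc.getElementsByTagName("item");
            List<RssItem> result = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                // getElementsByTagName only returns elements, so the cast is safe.
                Element item = (Element) items.item(i);
                result.add(new RssItem(childText(item, "title"),
                                       childText(item, "link"),
                                       childText(item, "description")));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse RSS", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample feed; note that <title> is not the first child of the first item.
        String xml = "<rss><channel><title>Feed</title>"
                + "<item><link>http://example.com/a</link><title>First</title><description>D1</description></item>"
                + "<item><title>Second</title><link>http://example.com/b</link></item>"
                + "</channel></rss>";
        for (RssItem item : parseItems(xml)) {
            System.out.println(item.title + " -> " + item.link);
        }
    }
}
```

Because the lookup is by tag name inside each item, it does not depend on the order of the child elements.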

Related

How to extract data from <dc> tag in java?

I am currently trying to extract the <dc:title> element from an EPUB in Java. However, I tried using
doc.getDocumentElement().getElementsByTagName("dc:title");
and it only showed 2nd element :com.sun.org.apache.xerces.internal.dom.DeepNodeListImpl. I would like to know how I can extract <dc:title>.
Here is my code:
File fXmlFile = new File("file directory");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
doc.getDocumentElement().normalize();
System.out.println("1st element :" + doc.getElementsByTagName("dc"));
System.out.println("2nd element :" + doc.getDocumentElement().getElementsByTagName("dc:title"));
System output:
1st element : com.sun.org.apache.xerces.internal.dom.DeepNodeListImpl@4f53e9be
2nd element :com.sun.org.apache.xerces.internal.dom.DeepNodeListImpl@e16e1a2
Added Sample Data
<dc:title>
<![CDATA[someData]]>
</dc:title>
<dc:creator>
<![CDATA[someData]]>
</dc:creator>
<dc:language>someData</dc:language>
The method getElementsByTagName(String) returns a list of matching elements (note the plural 's'), which is why printing it shows the list object rather than any text. You then need to specify which element you want, for example by using .item(index) to get a Node instance. You can then call getNodeValue() on that Node object.
EDITED: getNodeValue() returns null for an element node, and the text here sits in a CDATA child, so use Node.getTextContent() instead:
NodeList elems = doc.getElementsByTagName("dc:title");
Node item = elems.item(0);
System.out.println(item.getTextContent());
I would suggest using XPath to get the desired output.
Also, refer to the following link for examples.
https://www.journaldev.com/1194/java-xpath-example-tutorial
For example:
XPath xPath = XPathFactory.newInstance().newXPath();
String expression = "//dc:title/text()";
NodeList nodes = (NodeList) xPath.compile(expression).evaluate(doc, XPathConstants.NODESET);
System.out.println(nodes.item(0).getNodeValue());
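Both prefix-based lookups above depend on how the parser was configured for namespaces. As an alternative, here is a minimal sketch that turns on namespace awareness and looks the element up by namespace URI and local name. It assumes the dc prefix in the EPUB is bound to the standard Dublin Core namespace http://purl.org/dc/elements/1.1/, which the question does not show, and the surrounding package/metadata wrapper in the sample is invented for the demo.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class DcTitle {
    // Assumption: the file binds the dc prefix to the Dublin Core elements namespace.
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    // Returns the trimmed text of the first dc:title element, or null if there is none.
    static String firstDcTitle(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList titles = doc.getElementsByTagNameNS(DC_NS, "title");
            if (titles.getLength() == 0) {
                return null;
            }
            // getTextContent() includes CDATA content; trim() drops the surrounding line breaks.
            return titles.item(0).getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Invented wrapper around the sample data from the question.
        String xml = "<package><metadata xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
                + "<dc:title>\n<![CDATA[someData]]>\n</dc:title>"
                + "<dc:language>someData</dc:language>"
                + "</metadata></package>";
        System.out.println(firstDcTitle(xml));
    }
}
```

Looking up by namespace URI keeps working even if a file uses a different prefix for the same namespace.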

XML Parse - Issue with parsing text from specific Node [duplicate]

This question already has answers here:
Getting an attribute value in xml element
(3 answers)
Closed 5 years ago.
I am facing an issue parsing XML to extract data from a specific node. I referred to Link1 Link2 Link3. Please note that I am able to parse and get the data for other nodes in the XML file below, such as id, order_id, etc. But for the line / node below, I am unable to extract segment_id and instrument_id:
<trade segment_id="NSE-F&O " instrument_id="NSE:INFRATEL17NOVFUT">
I am not sure whether the way the XML file is set up or the way I am trying to extract the data for that specific node is wrong. I hope the specific issue I face is clear.
XML File:
<contract_note version="0.1">
    <contracts>
        <contract>
            <id>CNT-17/18-5310750</id>
            <name>CONTRACT NOTE CUM BILL</name>
            <description>None</description>
            <timestamp>2017-11-01</timestamp>
            <trades>
                <trade segment_id="NSE-F&O " instrument_id="NSE:INFRATEL17NOVFUT">
                    <id>37513030</id>
                    <order_id>1300000000352370</order_id>
                    <timestamp>09:20:48</timestamp>
                    <description>None</description>
                    <type>buy</type>
                    <quantity>1700</quantity>
                    <average_price>444.2</average_price>
                    <value>755140.0</value>
                </trade>
            </trades>
        </contract>
    </contracts>
</contract_note>
Code:
try {
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    Document doc = dBuilder.parse(xmlFile);
    NodeList cNoteList = doc.getElementsByTagName("contract");
    Node nNode = cNoteList.item(0);
    if (nNode.getNodeType() == Node.ELEMENT_NODE) {
        Element eElement = (Element) nNode;
        for (int j = 1; j <= eElement.getElementsByTagName("trade").getLength(); j++) {
            // Check if data can be read for Node - 'id'
            System.out.println(eElement.getElementsByTagName("id").item(j).getTextContent());
            // Check if data can be read for segment_id & instrument_id
            System.out.println("Scrip: " + eElement.getElementsByTagName("trade").item(0).getTextContent());
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}
Edit:
Corrected the xml file info provided above.
As @Juan commented, your XML is not well-formed: a bare & is not allowed in an attribute value. Fix it by following the required XML escaping rules and replacing segment_id="NSE-F&O " with segment_id="NSE-F&amp;O ".
If you cannot change the XML, then see How to parse invalid (bad / not well-formed) XML? for options, but the best option is to fix the XML at the source.
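Once the ampersand is escaped, the two values still have to be read as attributes, because getTextContent() on the trade element only returns the text of its children. A minimal sketch using Element.getAttribute, with a shortened copy of the contract note as inline sample data (the class and method names are made up for illustration):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class TradeAttributes {
    // Returns "segment_id | instrument_id" for every <trade> element in the document.
    static List<String> tradeKeys(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList trades = doc.getElementsByTagName("trade");
            List<String> keys = new ArrayList<>();
            for (int i = 0; i < trades.getLength(); i++) {
                Element trade = (Element) trades.item(i);
                // getAttribute returns "" when the attribute is missing; trim() drops the trailing space.
                String segment = trade.getAttribute("segment_id").trim();
                String instrument = trade.getAttribute("instrument_id").trim();
                keys.add(segment + " | " + instrument);
            }
            return keys;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Shortened sample; note the escaped ampersand in segment_id.
        String xml = "<contract_note version=\"0.1\"><contracts><contract><trades>"
                + "<trade segment_id=\"NSE-F&amp;O \" instrument_id=\"NSE:INFRATEL17NOVFUT\">"
                + "<id>37513030</id></trade>"
                + "</trades></contract></contracts></contract_note>";
        for (String key : tradeKeys(xml)) {
            System.out.println("Scrip: " + key);
        }
    }
}
```

The parser turns the escaped entity back into a plain ampersand, so the value seen by the program is NSE-F&O.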

How to add nodes from another xml using xmlbeans

I am using XMLBeans to generate an XML document, and I need to extract all the children from another XML file and insert them into my current document.
The to_be_added.xml:
<root>
<style>
.....
</style>
<atlas img="styles/jmap.png">
....
</atlas>
.....
</root>
This XML file does not have a schema, so I have not created a related Java class to map it. You can think of it as a plain XML file.
I want the style and atlas nodes added. I use the following code:
XmlObject pointRoot = XmlObject.Factory.parse(Main.class.getResourceAsStream("to_be_added.xml"));
NodeList nodeList = pointRoot.getDomNode().getChildNodes();
Node themeNode = renderthemeDoc.getDomNode();
for (int i = 0; i < nodeList.getLength(); i++) {
    Node node = nodeList.item(i);
    themeNode.appendChild(node);
}
Then I got error:
Exception in thread "main"
org.apache.xmlbeans.impl.store.DomImpl$WrongDocumentErr: Child to add
is from another document
And I found this post by searching for "child to .... another document": how to add a xml document to another xml document in java, which said that the connection between the element and its document has to be broken before the element can be added to another document.
So I tried to build the Document objects (that is why the variables pointDoc and themeDoc exist):
XmlObject pointRoot = XmlObject.Factory.parse(Main.class.getResourceAsStream("to_be_added.xml"));
Document pointDoc = pointRoot.getDomNode().getOwnerDocument();
System.out.println(pointDoc);
Element element = pointDoc.getDocumentElement();
NodeList nodeList = element.getChildNodes();
Document themeDoc = myCurrentDoc.getDomNode().getOwnerDocument();
for (int i = 0; i < nodeList.getLength(); i++) {
    Node node = nodeList.item(i);
    node = themeDoc.importNode(node, true);
    themeDoc.appendChild(node);
}
Then I got a NullPointerException, which indicated that pointDoc is null.
That is the whole process of how I tried to solve this problem. If it is unclear, please tell me and I will update accordingly.
Is it possible to fix it?
Since your other XML file is not mapped to a class, you can use a regular DOM parser to read it and extract its nodes. But using a generic object factory you can still get the nodes:
XmlObject pointRoot = XmlObject.Factory.parse("<root>\n" +
        " <style>\n" +
        " </style>\n" +
        " <atlas img=\"styles/jmap.png\">\n" +
        " </atlas>\n" +
        "</root>");
Node pointDoc = pointRoot.getDomNode().getFirstChild();
NodeList nodeList = pointDoc.getChildNodes();
for (int i = 0; i < nodeList.getLength(); i++) {
    System.out.println("Node: " + nodeList.item(i).getNodeName());
}
This will print:
Node: #text
Node: style
Node: #text
Node: atlas
Node: #text
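To then move those nodes into the other document, the usual DOM pattern is Document.importNode followed by appendChild on an element of the target. The sketch below uses the plain JAXP DOM parser rather than XMLBeans, so whether the XMLBeans DOM behaves identically is an assumption that has not been checked here. The rendertheme root name is also made up. Note that the copies are appended to the target's root element: appending them to the Document itself would fail, because a document can hold only one root element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class ImportChildren {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Copies every element child of source's root element under target's root element.
    static void importElementChildren(Document source, Document target) {
        NodeList children = source.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) { // skip whitespace #text nodes
                // importNode makes a deep copy owned by target; source is left unchanged.
                Node copy = target.importNode(child, true);
                target.getDocumentElement().appendChild(copy);
            }
        }
    }

    public static void main(String[] args) {
        Document source = parse("<root>\n <style>\n </style>\n <atlas img=\"styles/jmap.png\">\n </atlas>\n</root>");
        Document target = parse("<rendertheme/>"); // hypothetical target document
        importElementChildren(source, target);
        NodeList added = target.getDocumentElement().getChildNodes();
        for (int i = 0; i < added.getLength(); i++) {
            System.out.println("Added: " + added.item(i).getNodeName());
        }
    }
}
```

Since importNode copies rather than moves, the source list does not shrink while the loop runs, which avoids skipping nodes.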

How to get XML content as a String

<root>
    <h id="1">
        <d value="1,2,3,4,5"><open>10:00</open><close>23:00</close></d>
        <d value="6"><open>10:00</open><close>2:00</close></d>
        <d value="7"><open>10:00</open><close>21:00</close></d>
    </h>
    <h id="2">
    </h>
</root>
Here I have XML whose root has a list of <h> nodes. Now I need to break these into parts and store them in different variables (add them to a map).
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(new InputSource(new ByteArrayInputStream(data.getBytes("utf-8"))));
NodeList nList = doc.getElementsByTagName("h");
for (int i = 0; i < nList.getLength(); i++)
{
    Node nNode = nList.item(i);
    System.out.println(nNode.getAttributes().getNamedItem("id") + " " + ?????);
}
What should I call in order to get the value (as a String) of nNode?
Here is what I'm looking for as the answer for the above code once someone fills in the ?????:
1 <h id="1"><d value="1,2,3,4,5"><open>10:00</open><close>23:00</close></d><d value="6"><open>10:00</open><close>2:00</close></d><d value="7"><open>10:00</open><close>21:00</close></d></h>
2 <h id="2"></h>
And I don't mind having as root element
You can use Node.getTextContent() to conveniently get all the text of a node (gets text of children as well).
See Parsing xml file contents without knowing xml file structure for a short example.
If you're trying to get the value attributes of the d nodes (I can't actually tell, your question is slightly unclear to me), then it would be different -- for that you would iterate through the children of each h node (use getChildNodes() or getFirstChild() + getNextSibling()) then grab their value attributes just as you are getting the id attribute of the h nodes (the above link also shows an example of iterating through child nodes).
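If the goal really is the value attributes, the iteration described above might look like the following sketch. It walks the children of each h node with getFirstChild() and getNextSibling() and collects the value attribute of every d element into a map keyed by the h id. The class and method names are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class HourValues {
    // Maps each <h> id to the value attributes of its direct <d> children, in document order.
    static Map<String, List<String>> valuesById(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, List<String>> result = new LinkedHashMap<>();
            NodeList hNodes = doc.getElementsByTagName("h");
            for (int i = 0; i < hNodes.getLength(); i++) {
                Element h = (Element) hNodes.item(i);
                List<String> values = new ArrayList<>();
                for (Node child = h.getFirstChild(); child != null; child = child.getNextSibling()) {
                    // Whitespace between tags shows up as text nodes, so filter for <d> elements.
                    if (child.getNodeType() == Node.ELEMENT_NODE && "d".equals(child.getNodeName())) {
                        values.add(((Element) child).getAttribute("value"));
                    }
                }
                result.put(h.getAttribute("id"), values);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><h id=\"1\">"
                + "<d value=\"1,2,3,4,5\"><open>10:00</open><close>23:00</close></d>"
                + "<d value=\"6\"><open>10:00</open><close>2:00</close></d>"
                + "</h><h id=\"2\">\n</h></root>";
        System.out.println(valuesById(xml));
    }
}
```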
Have you tried jDom library? http://www.jdom.org/docs/apidocs/org/jdom2/output/XMLOutputter.html
XMLOutputter outp = new XMLOutputter();
String s = outp.outputString(your_jdom_element);
Have you tried nNode.toString(), if you are using javax.xml.soap.Node?
You can use that:
http://docs.oracle.com/javase/7/docs/api/org/w3c/dom/Node.html#getTextContent()
but your sample nNode has other nodes, not just text. It seems you need a helper method to construct a String from the child nodes.
Pass your nNode to the nodeToString method from:
XML Node to String in Java
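A helper of that kind is commonly written with the JDK's identity Transformer. The following is a sketch of that general approach, not necessarily the exact code from the linked answer. The exact serialization is up to the transformer; for instance, an empty element may come out in self-closing form instead of as an open/close pair.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class NodeToString {
    // Serializes a node, including its tags, attributes and descendants, to a String.
    static String nodeToString(Node node) {
        try {
            // newTransformer() with no stylesheet gives an identity transform.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(node), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize node", e);
        }
    }

    public static void main(String[] args) throws Exception {
        String data = "<root><h id=\"1\">"
                + "<d value=\"1,2,3,4,5\"><open>10:00</open><close>23:00</close></d>"
                + "</h><h id=\"2\"></h></root>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(data.getBytes(StandardCharsets.UTF_8)));
        NodeList nList = doc.getElementsByTagName("h");
        for (int i = 0; i < nList.getLength(); i++) {
            Element h = (Element) nList.item(i);
            System.out.println(h.getAttribute("id") + " " + nodeToString(h));
        }
    }
}
```

Using getAttribute("id") instead of getNamedItem("id") prints the bare value rather than the attribute node's own string form.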

How do I get the tag 'Name' from a XML Node in Java (Android)

I have a tiny little problem parsing an XML file in Java (Android).
I have an XML file that is like this:
<Events>
    <Event Name="Olympus Has Fallen">
        ...
    </Event>
    <Event Name="Iron Man 3">
        ...
    </Event>
</Events>
I already managed to get the NodeList by doing this:
URL url = new URL("********");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new InputSource(url.openStream()));
doc.getDocumentElement().normalize();
NodeList nodeList = doc.getElementsByTagName("Event");
Also I managed to get every single item of the NodeList by doing this:
for (int i = 0; i < nodeList.getLength(); i++) {
    // Item
    Node node = nodeList.item(i);
    Log.i("film", node.getNodeName());
}
But this just Logs: "Event" instead of the value of the Name tag.
How do I output the value of this 'Name' tag from the XML?
Can anyone help me with this one?
Thanks in advance!
But this just Logs: "Event" instead of the value of the Name tag.
Yes, because you're asking for the name of the element. There isn't a Name "tag" - there's a Name attribute, and that's what you should find:
// Only check in elements, and only those which actually have attributes.
if (node.hasAttributes()) {
    NamedNodeMap attributes = node.getAttributes();
    Node nameAttribute = attributes.getNamedItem("Name");
    if (nameAttribute != null) {
        System.out.println("Name attribute: " + nameAttribute.getTextContent());
    }
}
(It's very important to be precise in terminology - it's worth knowing the difference between nodes, elements, attributes etc. It will help you enormously both when communicating with others and when looking for the right bits of API to call.)
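Since getElementsByTagName only returns elements, a shorter variant is to cast each node to Element and call getAttribute directly. The sketch below shows that alternative on an inline copy of the sample XML. It uses System.out instead of Android's Log so that it runs on a plain JVM, and the class and method names are invented.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class EventNames {
    // Collects the Name attribute of every <Event> element that has one.
    static List<String> eventNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList events = doc.getElementsByTagName("Event");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < events.getLength(); i++) {
                Element event = (Element) events.item(i);
                // getAttribute returns "" for a missing attribute, so check hasAttribute first.
                if (event.hasAttribute("Name")) {
                    names.add(event.getAttribute("Name"));
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Events>"
                + "<Event Name=\"Olympus Has Fallen\"></Event>"
                + "<Event Name=\"Iron Man 3\"></Event>"
                + "</Events>";
        for (String name : eventNames(xml)) {
            System.out.println("film: " + name);
        }
    }
}
```

Attribute names are case-sensitive, so "Name" must match the capitalization used in the XML.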
