I have XML content as below
<PARENT1 ATTR="FILE1">
<TITLE>test1.pdf</TITLE>
</PARENT1>
<PARENT2 ATTR="FILE2">
<TITLE>test2.pdf</TITLE>
</PARENT2>
I want to build a HashMap in Java whose keys are the parent elements' attribute values and whose values are the child nodes' text.
Example:
map.put("FILE1","test1.pdf");
map.put("FILE2","test2.pdf");
I know how to get the list of all child nodes, but I can't work out how to get a child node's value based on its parent node or the parent's attribute.
How can I achieve this in Java using a DOM or SAX parser?
Any help is greatly appreciated.
Regards,
Tendulkar
If the XML files aren't huge, I'd recommend using JDOM instead of the default DOM parser, as it's much more user-friendly.
Here's an example using the standard DOM API that does roughly what you want, but you'll need to add the error checking and such yourself.
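import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
public class XmlParser {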
private static final String xml = "<parents><parent name=\"name1\"><title>title1</title></parent><parent name=\"name2\"><title>title2</title></parent></parents>";
public static final void main(String [] args) throws ParserConfigurationException, SAXException, IOException {
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.parse(new InputSource(new StringReader(xml)));
Node parents = doc.getChildNodes().item(0);
Map<String, String> dataMap = new HashMap<>();
for (int i = 0; i < parents.getChildNodes().getLength(); i++) {
Node parent = parents.getChildNodes().item(i);
String name = parent.getAttributes().getNamedItem("name").getNodeValue();
String title = parent.getChildNodes().item(0).getTextContent();
dataMap.put(name, title);
}
System.out.println(dataMap);
}
}
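Note that the example above relies on the XML string containing no whitespace between elements. With indented XML, getChildNodes() also returns text nodes, for which getAttributes() returns null. A minimal sketch applied to the markup from the question, assuming the PARENT elements are wrapped in a single root element (called ROOT here, which is not in the original; the class name ParentTitleMap is also just illustrative):

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ParentTitleMap {

    // Maps each child element's ATTR value to the text of its first TITLE element.
    static Map<String, String> parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        Map<String, String> result = new LinkedHashMap<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text nodes and comments
            }
            Element parent = (Element) node;
            NodeList titles = parent.getElementsByTagName("TITLE");
            if (parent.hasAttribute("ATTR") && titles.getLength() > 0) {
                result.put(parent.getAttribute("ATTR"), titles.item(0).getTextContent().trim());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // ROOT is a hypothetical wrapper; a well-formed document needs exactly one root.
        String xml = "<ROOT>\n"
                + "  <PARENT1 ATTR=\"FILE1\">\n    <TITLE>test1.pdf</TITLE>\n  </PARENT1>\n"
                + "  <PARENT2 ATTR=\"FILE2\">\n    <TITLE>test2.pdf</TITLE>\n  </PARENT2>\n"
                + "</ROOT>";
        System.out.println(parse(xml)); // {FILE1=test1.pdf, FILE2=test2.pdf}
    }
}
```

Checking the node type before casting is what makes this tolerant of pretty-printed input; parents without an ATTR attribute or a TITLE child are simply skipped.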
I have JAXB annotations:
@XmlElement(name="String")
private String string = "one";
@XmlElement(name="ArrayOne")
private ArrayList<String> array1 = new ArrayList<String>();
and marshalling:
array1.add("Just one");
JAXBContext jc1 = JAXBContext.newInstance( getClass() );
Marshaller marshaller = jc1.createMarshaller();
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance() ;
Document doc = factory.newDocumentBuilder().newDocument();
marshaller.marshal(this, doc);
If I dig into the document nodes, I cannot see any difference between
the node elements.
My question is whether there is any trick for marshalling into a DOM document
so that the node elements are somehow distinguishable as a simple object (String)
or an array object. The marshaller is of course aware of the field types, so I wonder
whether it puts some flag in the Element data.
DOM structure is
NodeName:String NodeContent:one
NodeName:ArrayOne NodeContent:Just one
but I would like to have like:
NodeName:String NodeContent:one
NodeName:ArrayOne
Children:
NodeName:ArrayOne NodeContent:Just one
so that I know ArrayOne is an array, even when it holds just one object.
Note that I cannot change the annotations, since the source is not always available.
You can create a wrapper element for collections using @XmlElementWrapper:
@XmlElement(name="String")
private String string = "one";
@XmlElementWrapper(name="ArrayOne")
private ArrayList<String> array1 = new ArrayList<String>();
XML output for this mapping looks like this:
<testElement>
<String>one</String>
<ArrayOne>
<array1>one</array1>
</ArrayOne>
</testElement>
Update for comment:
Adding wrapper on DOM Document manually (probably there is an easier way, maybe by using a Transformer):
TestElement te = new TestElement();
JAXBContext jc1 = JAXBContext.newInstance(TestElement.class);
Marshaller marshaller = jc1.createMarshaller();
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance() ;
Document doc = factory.newDocumentBuilder().newDocument();
marshaller.marshal(te, doc);
NodeList nodeList = doc.getDocumentElement().getChildNodes();
Node newNode = doc.createElement("ArrayOneWrapper");
List<Node> arrayOneElements = new ArrayList<>();
for (int i = 0; i < nodeList.getLength(); i++) {
Node n = nodeList.item(i);
if (n.getNodeName().equals("ArrayOne")) {
arrayOneElements.add(n);
}
}
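for (Node n : arrayOneElements) {
// appendChild moves each node from its old parent into the wrapper
newNode.appendChild(n);
}
// attach the wrapper itself, otherwise it never becomes part of the document
doc.getDocumentElement().appendChild(newNode);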
XML output:
<testElement>
<String>one</String>
<ArrayOneWrapper>
<ArrayOne>one</ArrayOne>
<ArrayOne>two</ArrayOne>
</ArrayOneWrapper>
</testElement>
I have an XML with a list of beaches.
Each entry looks like:
<beach>
<name>Dor</name>
<longitude>32.1867</longitude>
<latitude>34.6077</latitude>
</beach>
I am using Jsoup to read this XML into a Document doc.
Is there an easy way to handle this data?
I would like to be able to do something like this:
x = my_beach_list["Dor"].longitude;
Currently I keep it in the Jsoup Document and use:
x = get_XML_val(doc, "Dor", "longitude");
With get_XML_val defined as:
private String get_XML_val(Document doc, String element_name, String element_attr) {
Elements beaches = doc.select("beach");
Elements one_node = beaches.select("beach:matches(" + element_name + ")");
Element node_attr = one_node.select(element_attr).first();
String t = node_attr.text();
return t;
}
Thanks
Ori
Java is an object-oriented language, and you would benefit a lot from using that fact. For instance, you can parse your XML into a Java data structure once and use that for lookups, e.g.:
class Beach {
private String name;
private double longitude;
private double latitude;
// constructor;
// getters and setters;
}
Now that your class is set up, you can parse your XML into a list of Beach objects:
List<Beach> beaches = new ArrayList<>();
// for every beach object in the xml
beaches.add(new Beach(val_from_xml, ..., ...));
Now when you want to find a specific object, you can query your collection; findFirst() returns an Optional<Beach>:
beaches.stream().filter(beach -> "Dor".equals(beach.getName())).findFirst();
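To get closer to the my_beach_list["Dor"].longitude syntax from the question, the parsed objects can be stored in a Map keyed by name. The following is only a sketch: it uses the JDK's built-in DOM parser rather than Jsoup so that it stays self-contained, it assumes the beach entries sit under a single root element, and the names BeachIndex, load and childText are made up for illustration.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class BeachIndex {

    // Immutable value object for one <beach> entry.
    static final class Beach {
        final String name;
        final double longitude;
        final double latitude;

        Beach(String name, double longitude, double latitude) {
            this.name = name;
            this.longitude = longitude;
            this.latitude = latitude;
        }
    }

    // Text of the first descendant element with the given tag (assumed to exist).
    private static String childText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    // Parses the XML once and indexes every beach by its name.
    static Map<String, Beach> load(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Map<String, Beach> byName = new HashMap<>();
        NodeList beaches = doc.getElementsByTagName("beach");
        for (int i = 0; i < beaches.getLength(); i++) {
            Element e = (Element) beaches.item(i); // getElementsByTagName only returns elements
            Beach beach = new Beach(
                    childText(e, "name"),
                    Double.parseDouble(childText(e, "longitude")),
                    Double.parseDouble(childText(e, "latitude")));
            byName.put(beach.name, beach);
        }
        return byName;
    }
}
```

With this in place a lookup becomes BeachIndex.load(xml).get("Dor").longitude, and the document is only traversed once instead of on every query.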
You can use an XPath evaluator to run queries on the XML directly. For example, the XPath expression for the longitude of the beach named Dor is /beach[name="Dor"]/longitude. This form assumes <beach> is the root element; if the entries are nested inside a list element, use //beach[name="Dor"]/longitude instead.
The corresponding Java code to evaluate the XPath expression can be written as:
private static void getValue(String xml) {
try {
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
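// StringBufferInputStream is deprecated; read the string through a Reader instead
Document doc = dBuilder.parse(new org.xml.sax.InputSource(new java.io.StringReader(xml)));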
doc.getDocumentElement().normalize();
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList nodeList = (NodeList) xPath.compile("/beach[name=\"Dor\"]/longitude").evaluate(doc,
XPathConstants.NODESET);
System.out.println(nodeList.item(0).getTextContent());
} catch (IOException | ParserConfigurationException | XPathExpressionException | SAXException exception) {
exception.printStackTrace();
}
}
The above method prints the longitude value: 32.1867
I need to get the tag of an element right below the root, but DOM only seems to offer methods for getting child nodes (not elements), and you can't cast from one to the other.
http://ideone.com/SUjRmn
@Override
public void loadXml(String filepath) throws Exception {
File f = new File(filepath);
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = null;
Document doc = null;
try {
db = dbf.newDocumentBuilder();
} catch (ParserConfigurationException e) {
e.printStackTrace();
}
try {
doc = db.parse(f);
} catch (SAXException | IOException | NullPointerException e) {
e.printStackTrace();
}
Element root = doc.getDocumentElement();
Node firstChild = root.getFirstChild();
String tag = firstChild.getNodeName();
//here is the problem. I can't cast from Node to Element and Node
//stores only an int value, not the name of the object I want to restore
ShapeDrawer drawable = null;
switch (tag) {
case "scribble":
drawable = new ScribbleDrawer();
...
From the class to restore:
@Override
public void setValues(Element root) {
NodeList nodelist = null;
nodelist = root.getElementsByTagName("color");
colorManager.setColor((nodelist.item(0).getTextContent()));
this.color = colorManager.getCurrentColor();
System.out.println(color.toString());
nodelist = root.getElementsByTagName("pressx");
pressx = Integer.parseInt(nodelist.item(0).getTextContent());
System.out.println(pressx);
nodelist = root.getElementsByTagName("pressy");
pressy = Integer.parseInt(nodelist.item(0).getTextContent());
System.out.println(pressy);
nodelist = root.getElementsByTagName("lastx");
lastx = Integer.parseInt(nodelist.item(0).getTextContent());
nodelist = root.getElementsByTagName("lasty");
lasty = Integer.parseInt(nodelist.item(0).getTextContent());
}
public void toDOM(Document doc, Element root) {
System.out.println("ScribbleDrawer being saved");
Element shapeBranch = doc.createElement("scribble");
Attr attr1 = doc.createAttribute("hashcode");
attr1.setValue(((Integer) this.hashCode()).toString());
shapeBranch.setAttributeNode(attr1);
root.appendChild(shapeBranch);
Element eColor = doc.createElement("color");
eColor.setTextContent(colorManager.namedColorToString(color));
shapeBranch.appendChild(eColor);
// creating tree branch
Element press = doc.createElement("press");
Attr attr2 = doc.createAttribute("pressx");
attr2.setValue(((Integer) pressy).toString());
press.setAttributeNode(attr2);
Attr attr3 = doc.createAttribute("pressy");
attr3.setValue(((Integer) pressy).toString());
press.setAttributeNode(attr3);
shapeBranch.appendChild(press);
Element last = doc.createElement("last");
Attr attr4 = doc.createAttribute("lastx");
attr4.setValue(((Integer) lastx).toString());
last.setAttributeNode(attr4);
Attr attr5 = doc.createAttribute("lasty");
attr5.setValue(((Integer) lasty).toString());
last.setAttributeNode(attr5);
shapeBranch.appendChild(last);
}
I know other parsers are easier, but I am almost finished, and when it comes to polymorphism JAXB seems to be just as complicated, with Option-marshalling etc.
EDIT: this is what the xml looks like; instead of "scribble" other tags/polymorphic children are possible which are deserialized from different instance variables (and thus different DOM-trees except for the root)
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Drawables>
<scribble hashcode="189680059">
<color>Black</color>
<press pressx="221" pressy="221"/>
<last lastx="368" lasty="219"/>
</scribble>
<scribble hashcode="1215837841">
<color>Black</color>
<press pressx="246" pressy="246"/>
<last lastx="368" lasty="221"/>
</scribble>
If your node is an Element, you can cast it from node to element. But your first child might also be a text node, which can't be cast, of course. You have to test the nodes for their NodeType before casting.
If your XML is not using namespaces, you can use a method like this one to extract your child elements. It receives an element, tests each of its child nodes, and returns a list containing only the elements:
public static List<Element> getChildren(Element element) {
List<Element> elements = new ArrayList<>();
NodeList nodeList = element.getChildNodes();
for (int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i);
if (node.getNodeType() == Node.ELEMENT_NODE) {
elements.add((Element) node);
}
}
return elements;
}
An alternative is to use an API which already includes such utility methods, like DOM4J, or JDOM.
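Applied to the question, the same node-type check can be used to fetch just the first child element of the root, whose tag name then drives the switch. This is a small sketch with an illustrative class and method name (FirstChildElement, firstChildElement), not code from the original post:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class FirstChildElement {

    // Walks the children in document order and returns the first element node,
    // or null if the parent only contains text, comments, etc.
    static Element firstChildElement(Element parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) n;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Drawables>\n"
                + "  <scribble hashcode=\"189680059\">\n"
                + "    <color>Black</color>\n"
                + "  </scribble>\n"
                + "</Drawables>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element first = firstChildElement(doc.getDocumentElement());
        System.out.println(first.getTagName());             // scribble
        System.out.println(first.getAttribute("hashcode")); // 189680059
    }
}
```

In the indented document, root.getFirstChild() is the whitespace text node before the first scribble element, which is why its node name is "#text" rather than the tag you expect.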
I'm trying to read xml file, ex :
<entry>
<title>FEED TITLE</title>
<id>5467sdad98787ad3149878sasda</id>
<tempi type="application/xml">
<conento xmlns="http://mydomainname.com/xsd/radiofeed.xsd" madeIn="USA" />
</tempi>
</entry>
Here is my attempt at coding this. It was not successful, which is why I started a bounty: http://pastebin.com/huKP4KED
Bounty update:
I have really tried to do this for days now and didn't expect it to be so hard. I'll accept useful links, books, or tutorials, but I'd prefer code because I need this done yesterday.
Here is what I need:
Concerning the XML above:
I need to get the values of title and id,
the attribute value of tempi, as well as the madeIn attribute value of contento.
What is the best way to do this?
EDIT:
@Pascal Thivent
Maybe creating a method would be a good idea, something like public String getValue(String xml, Element elementname), where you specify the tag name and the method returns the tag value, or a tag attribute (maybe with its name given as an additional method argument) if the value is not available.
What I really want is to get a certain tag's value, or an attribute if the tag value is not available, so I'm still thinking about the best way to do this since I've never done it before.
The best solution for this is to use XPath. Your pastebin is expired, but here's what I gathered. Let's say we have the following feed.xml file:
<?xml version="1.0" encoding="UTF-8" ?>
<entries>
<entry>
<title>FEED TITLE 1</title>
<id>id1</id>
<tempi type="type1">
<conento xmlns="dontcare?" madeIn="MadeIn1" />
</tempi>
</entry>
<entry>
<title>FEED TITLE 2</title>
<id>id2</id>
<tempi type="type2">
<conento xmlns="dontcare?" madeIn="MadeIn2" />
</tempi>
</entry>
<entry>
<id>id3</id>
</entry>
</entries>
Here's a short but compilable and runnable proof of concept (with the feed.xml file in the same directory).
import javax.xml.xpath.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import java.io.*;
import java.util.*;
public class XPathTest {
static class Entry {
final String title, id, origin, type;
Entry(String title, String id, String origin, String type) {
this.title = title;
this.id = id;
this.origin = origin;
this.type = type;
}
@Override public String toString() {
return String.format("%s:%s(%s)[%s]", id, title, origin, type);
}
}
final static XPath xpath = XPathFactory.newInstance().newXPath();
static String evalString(Node context, String path) throws XPathExpressionException {
return (String) xpath.evaluate(path, context, XPathConstants.STRING);
}
public static void main(String[] args) throws Exception {
File file = new File("feed.xml");
Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(file);
NodeList entriesNodeList = (NodeList) xpath.evaluate("//entry", document, XPathConstants.NODESET);
List<Entry> entries = new ArrayList<Entry>();
for (int i = 0; i < entriesNodeList.getLength(); i++) {
Node entryNode = entriesNodeList.item(i);
entries.add(new Entry(
evalString(entryNode, "title"),
evalString(entryNode, "id"),
evalString(entryNode, "tempi/conento/@madeIn"),
evalString(entryNode, "tempi/@type")
));
}
for (Entry entry : entries) {
System.out.println(entry);
}
}
}
This produces the following output:
id1:FEED TITLE 1(MadeIn1)[type1]
id2:FEED TITLE 2(MadeIn2)[type2]
id3:()[]
Note how using XPath makes the value retrieval very simple, intuitive, readable, and straightforward, and "missing" values are also gracefully handled.
API links
package javax.xml.xpath
http://www.w3.org/TR/xpath
Wikipedia/XPath
Use Element.getAttribute and Element.setAttribute
In your example you call ((Node) content.item(0)).getFirstChild().getAttributes(). Assuming that content is a typo and you mean contento, getFirstChild is correctly returning null, as contento has no children. Try ((Node) contento.item(0)).getAttributes() instead.
Another issue is that by using getFirstChild and getChildNodes().item(0) without checking the type of the returned node, you run the risk of picking up child text nodes instead of the element you want.
As pointed out, <contento> doesn't have any child so instead of:
(contento.item(0)).getFirstChild().getAttributes()
You should treat the Node as Element and use getAttribute(String), something like this:
((Element)contento.item(0)).getAttribute("madeIn")
Here is a modified version of your code (it's not the most robust code I've written):
InputStream inputStream = new ByteArrayInputStream(xml.getBytes());
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(inputStream);
doc.getDocumentElement().normalize();
System.out.println("Root element " + doc.getDocumentElement().getNodeName());
NodeList nodeLst = doc.getElementsByTagName("entry");
System.out.println("Information of all entries");
for (int s = 0; s < nodeLst.getLength(); s++) {
Node fstNode = nodeLst.item(s);
if (fstNode.getNodeType() == Node.ELEMENT_NODE) {
Element fstElmnt = (Element) fstNode;
NodeList title = fstElmnt.getElementsByTagName("title").item(0).getChildNodes();
System.out.println("Title : " + (title.item(0)).getNodeValue());
NodeList id = fstElmnt.getElementsByTagName("id").item(0).getChildNodes();
System.out.println("Id: " + (id.item(0)).getNodeValue());
Node tempiNode = fstElmnt.getElementsByTagName("tempi").item(0);
System.out.println("Type : " + ((Element) tempiNode).getAttribute("type"));
Node contento = tempiNode.getChildNodes().item(0);
System.out.println("Made in : " + ((Element) contento).getAttribute("madeIn"));
}
}
Running it on your XML snippet produces the following output:
Root element entry
Information of all entries
Title : FEED TITLE
Id: 5467sdad98787ad3149878sasda
Type : application/xml
Made in : USA
By the way, did you consider using something like Rome instead?
I have a 1000 entry document whose format is something like:
<Example>
<Entry>
<n1></n1>
<n2></n2>
</Entry>
<Entry>
<n1></n1>
<n2></n2>
</Entry>
<!--and so on-->
There are more than 1000 Entry nodes here. I am writing a Java program that retrieves the nodes one by one and analyzes each of them. The problem is that the retrieval time grows with the node's index: it takes 78 milliseconds to retrieve the first node, 100 ms for the second, and it keeps increasing, so retrieving node 999 takes more than 5 seconds. This is extremely slow. We will be running this code on XML files with far more than 1000 entries, some with millions. Parsing the whole document currently takes more than 5 minutes.
I am using this simple code to traverse it. Here nxp is my own class, which has all the methods for getting nodes from an XPath expression.
nxp.fromXpathToNode("/Example/Entry" + "[" + i + "]", doc);
Here doc is the document for the file and i is the index of the node to retrieve.
Also, when I try something like this
List<Node> nl = nxp.fromXpathToNodes("/Example/Entry",doc);
content = nl.get(i);
I face the same problem.
Does anyone have a solution for speeding up the retrieval of the nodes, so that it takes the same amount of time to get the 1st node as the 1000th node from the XML file?
Here is the code for xpathtonode.
public Node fromXpathToNode(String expression, Node context)
{
try
{
return (Node)this.getCachedExpression(expression).evaluate(context, XPathConstants.NODE);
}
catch (Exception cause)
{
throw new RuntimeException(cause);
}
}
and here is the code for fromxpathtonodes.
public List<Node> fromXpathToNodes(String expression, Node context)
{
List<Node> nodes = new ArrayList<Node>();
NodeList results = null;
try
{
results = (NodeList)this.getCachedExpression(expression).evaluate(context, XPathConstants.NODESET);
for (int index = 0; index < results.getLength(); index++)
{
nodes.add(results.item(index));
}
}
catch (Exception cause)
{
throw new RuntimeException(cause);
}
return nodes;
}
and here is the starting
public class NativeXpathEngine implements XpathEngine
{
private final XPathFactory factory;
private final XPath engine;
/**
* Cache for previously compiled XPath expressions. {@link XPathExpression#hashCode()}
* is not reliable or consistent so use the textual representation instead.
*/
private final Map<String, XPathExpression> cachedExpressions;
public NativeXpathEngine()
{
super();
this.factory = XPathFactory.newInstance();
this.engine = factory.newXPath();
this.cachedExpressions = new HashMap<String, XPathExpression>();
}
Try VTD-XML. It uses less memory than DOM. It is easier to use than SAX and supports XPath. Here is some sample code to help you get started. It applies an XPath to get the Entry elements and then prints out the n1 and n2 child elements.
final VTDGen vg = new VTDGen();
vg.parseFile("/path/to/file.xml", false);
final VTDNav vn = vg.getNav();
final AutoPilot ap = new AutoPilot(vn);
ap.selectXPath("/Example/Entry");
int count = 1;
while (ap.evalXPath() != -1) {
System.out.println("Inside Entry: " + count);
//move to n1 child
vn.toElement(VTDNav.FIRST_CHILD, "n1");
System.out.println("\tn1: " + vn.toNormalizedString(vn.getText()));
//move to n2 child
vn.toElement(VTDNav.NEXT_SIBLING, "n2");
System.out.println("\tn2: " + vn.toNormalizedString(vn.getText()));
//move back to parent
vn.toElement(VTDNav.PARENT);
count++;
}
The correct solution is to detach the node right after you call item(i), like so:
Node node = results.item(index);
node.getParentNode().removeChild(node);
nodes.add(node);
See XPath.evaluate performance slows down (absurdly) over multiple calls
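Put together, the detaching approach might look like the sketch below. It evaluates the expression once, removes every matched node from its parent, and returns the detached nodes so that later relative queries run against small subtrees. The class and method names (DetachedEntries, detachAll) are illustrative, and note the side effect: the original document no longer contains the matched nodes afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class DetachedEntries {

    // Evaluates the expression once and detaches each matched node from the tree.
    // Assumes the expression matches child nodes, not the Document node itself
    // (a Document has no parent to be removed from).
    static List<Node> detachAll(Document doc, String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList results = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);

        List<Node> nodes = new ArrayList<>(results.getLength());
        for (int index = 0; index < results.getLength(); index++) {
            Node node = results.item(index);
            node.getParentNode().removeChild(node); // detach from the large document
            nodes.add(node);
        }
        return nodes;
    }
}
```

A call such as DetachedEntries.detachAll(doc, "/Example/Entry") replaces the per-index "/Example/Entry[i]" queries from the question with a single evaluation.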
I had a similar issue with XPath evaluation. I tried CachedXPathAPI, which turned out to be about 100x faster than the XPathAPI I had been using.
More information about this API is provided here:
http://xml.apache.org/xalan-j/apidocs/org/apache/xpath/CachedXPathAPI.html
Hope it helps.
Cheers,
Madhusudhan
If you need to parse huge but flat documents, SAX is a good alternative. It allows you to handle the XML as a stream instead of building a huge DOM. Your example could be parsed using a ContentHandler like this:
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.ext.DefaultHandler2;
public class ExampleHandler extends DefaultHandler2 {
private StringBuffer chars = new StringBuffer(1000);
private MyEntry currentEntry;
private MyEntryHandler myEntryHandler;
ExampleHandler(MyEntryHandler myEntryHandler) {
this.myEntryHandler = myEntryHandler;
}
@Override
public void characters(char[] ch, int start, int length)
throws SAXException {
// only the [start, start + length) slice of the buffer belongs to this event
chars.append(ch, start, length);
}
@Override
public void endElement(String uri, String localName, String qName)
throws SAXException {
if ("Entry".equals(localName)) {
myEntryHandler.handle(currentEntry);
currentEntry = null;
}
else if ("n1".equals(localName)) {
currentEntry.setN1(chars.toString());
}
else if ("n2".equals(localName)) {
currentEntry.setN2(chars.toString());
}
}
@Override
public void startElement(String uri, String localName, String qName,
Attributes atts) throws SAXException {
chars.setLength(0);
if ("Entry".equals(localName)) {
currentEntry = new MyEntry();
}
}
}
If the document has a deeper and more complex structure, you're going to need to use Stacks to keep track of the current path in the document. Then you should consider writing a general purpose ContentHandler to do the dirty work and use with your document type dependent handlers.
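For completeness, here is a self-contained sketch of how such a handler can be wired up and run. MyEntry and MyEntryHandler above are not shown in the answer, so this version simply collects each entry as a two-element String array; the class name SaxEntryReader is hypothetical. One detail worth noting: the handler compares against localName, which is only guaranteed to be filled in when the parser is namespace-aware, so the factory is configured accordingly.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxEntryReader {

    // Streams the document and returns one {n1, n2} pair per <Entry>.
    static List<String[]> parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // without this, localName may be empty

        List<String[]> entries = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder chars = new StringBuilder();
            private String n1;
            private String n2;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                chars.setLength(0); // start collecting text for the new element
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                chars.append(ch, start, length); // may be called several times per element
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                switch (localName) {
                    case "n1":
                        n1 = chars.toString();
                        break;
                    case "n2":
                        n2 = chars.toString();
                        break;
                    case "Entry":
                        entries.add(new String[] {n1, n2});
                        n1 = null;
                        n2 = null;
                        break;
                    default:
                        break;
                }
            }
        };
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return entries;
    }
}
```

Because nothing but the current entry is kept in memory, the cost per entry stays constant regardless of how many entries the file contains.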
What kind of parser are you using?
A DOM parser pulls the whole document into memory. Once the document is loaded, your operations can be fast, but doing this in a web app or inside a for loop can have an impact on memory and performance.
A SAX parser streams the document and reports each element to your handler as it is read, without building a tree in memory.
So try to use a parser implementation that suits your needs.
Use the JAXEN library for xpaths:
http://jaxen.codehaus.org/