How to store XML into Java array

I have an XML with a list of beaches.
Each entry looks like:
<beach>
<name>Dor</name>
<longitude>32.1867</longitude>
<latitude>34.6077</latitude>
</beach>
I am using Jsoup to read this XML into a Document doc.
Is there an easy way to handle this data?
I would like to be able to do something like this:
x = my_beach_list["Dor"].longitude;
Currently I keep the data in the Jsoup Document and use:
x = get_XML_val(doc, "Dor", "longitude");
With get_XML_val defined as:
private String get_XML_val(Document doc, String element_name, String element_attr) {
Elements beaches = doc.select("beach");
Elements one_node = beaches.select("beach:matches(" + element_name + ")");
Element node_attr = one_node.select(element_attr).first();
String t = node_attr.text();
return t;
}
Thanks
Ori

Java is an object-oriented language, and you can take advantage of that: parse your XML into a Java data structure once, then use it for lookups. For example:
class Beach {
private String name;
private double longitude;
private double latitude;
// constructor;
// getters and setters;
}
Now that the class is set up, you can parse your XML into a list of Beach objects:
List<Beach> beaches = new ArrayList<>();
// for every beach element in the XML
beaches.add(new Beach(val_from_xml, ..., ...));
When you want to find a specific object, you can query the collection:
Optional<Beach> dor = beaches.stream().filter(beach -> "Dor".equals(beach.getName())).findFirst();
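If you look beaches up by name often, a Map keyed on the name gets close to the my_beach_list["Dor"].longitude syntax from the question. The following is only a minimal sketch: it uses the JDK's built-in DOM parser instead of Jsoup, and it assumes the beach entries sit inside a wrapper element called beaches (that wrapper name, the BeachLookup class and the parse method are made up for illustration).

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BeachLookup {

    // Simple immutable value holder for one <beach> entry.
    static final class Beach {
        final String name;
        final double longitude;
        final double latitude;

        Beach(String name, double longitude, double latitude) {
            this.name = name;
            this.longitude = longitude;
            this.latitude = latitude;
        }
    }

    // Parses the XML once and indexes every <beach> by its <name>.
    static Map<String, Beach> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("beach");
            Map<String, Beach> byName = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element beach = (Element) nodes.item(i);
                String name = childText(beach, "name");
                byName.put(name, new Beach(
                        name,
                        Double.parseDouble(childText(beach, "longitude")),
                        Double.parseDouble(childText(beach, "latitude"))));
            }
            return byName;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse beach XML", e);
        }
    }

    // Text of the first descendant with the given tag name.
    private static String childText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // Assumption: a <beaches> root wraps the entries shown in the question.
        String xml = "<beaches><beach><name>Dor</name>"
                + "<longitude>32.1867</longitude><latitude>34.6077</latitude>"
                + "</beach></beaches>";
        Map<String, Beach> beaches = parse(xml);
        System.out.println(beaches.get("Dor").longitude);
    }
}
```

After the one-time parse, each lookup is just beaches.get("Dor").longitude, with no further XML queries.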

You can use an XPath evaluator to run queries on the XML directly. For example, the XPath expression for the longitude of the beach named Dor is: /beach[name="Dor"]/longitude (this form assumes beach is the root element; if the beach elements are nested inside another element, //beach[name="Dor"]/longitude matches them at any depth).
The corresponding Java code to evaluate the XPath expression can be written as:
private static void getValue(String xml) {
try {
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(new InputSource(new StringReader(xml))); // StringBufferInputStream is deprecated
doc.getDocumentElement().normalize();
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList nodeList = (NodeList) xPath.compile("/beach[name=\"Dor\"]/longitude").evaluate(doc,
XPathConstants.NODESET);
System.out.println(nodeList.item(0).getTextContent());
} catch (IOException | ParserConfigurationException | XPathExpressionException | SAXException exception) {
exception.printStackTrace();
}
}
The above method prints the longitude value as : 32.1867
More on Xpaths here.

Related

Retrieving child Node value based on Parent Node attribute using DOM parser

I have XML content as below
<PARENT1 ATTR="FILE1">
<TITLE>test1.pdf</TITLE>
</PARENT1>
<PARENT2 ATTR="FILE2">
<TITLE>test2.pdf</TITLE>
</PARENT2>
I want to create a HashMap in Java with the parent's attribute value as the key and the child node's value as the value.
Example:
map.put("FILE1","test1.pdf");
map.put("FILE2","test2.pdf");
I know how to get the list of all child nodes, but I can't work out how to get a child node's value based on its parent node or the parent's attribute.
How can I achieve this in Java using a DOM or SAX parser?
Any help is greatly appreciated.
Regards,
Tendulkar

If the XML files aren't huge, I'd recommend using JDOM instead of the default DOM parser, as it's much more user-friendly.
Here's an example (written against the standard DOM API) that does roughly what you want, but you'll need to add the error checking yourself.
public class XmlParser {
private static final String xml = "<parents><parent name=\"name1\"><title>title1</title></parent><parent name=\"name2\"><title>title2</title></parent></parents>";
public static final void main(String [] args) throws ParserConfigurationException, SAXException, IOException {
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.parse(new InputSource(new StringReader(xml)));
Node parents = doc.getChildNodes().item(0);
Map<String, String> dataMap = new HashMap<>();
for (int i = 0; i < parents.getChildNodes().getLength(); i++) {
Node parent = parents.getChildNodes().item(i);
String name = parent.getAttributes().getNamedItem("name").getNodeValue();
String title = parent.getChildNodes().item(0).getTextContent();
dataMap.put(name, title);
}
System.out.println(dataMap);
}
}
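For the element names from the question (PARENT1/PARENT2 with an ATTR attribute and a TITLE child), a variation could look like the sketch below. It skips whitespace text nodes, which have no attributes and would otherwise cause a NullPointerException in a loop like the one above once the XML is pretty-printed. The ROOT wrapper element and the ParentAttributeMap class are assumptions added for illustration, since the snippet in the question has no single root.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ParentAttributeMap {

    // Maps each child element's ATTR value to the text of its first TITLE element.
    static Map<String, String> toMap(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, String> result = new LinkedHashMap<>();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace text nodes and comments
                }
                Element parent = (Element) child;
                NodeList titles = parent.getElementsByTagName("TITLE");
                if (parent.hasAttribute("ATTR") && titles.getLength() > 0) {
                    result.put(parent.getAttribute("ATTR"),
                            titles.item(0).getTextContent().trim());
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the two parents from the question wrapped in a <ROOT> element.
        String xml = "<ROOT>\n"
                + "  <PARENT1 ATTR=\"FILE1\">\n    <TITLE>test1.pdf</TITLE>\n  </PARENT1>\n"
                + "  <PARENT2 ATTR=\"FILE2\">\n    <TITLE>test2.pdf</TITLE>\n  </PARENT2>\n"
                + "</ROOT>";
        System.out.println(toMap(xml)); // {FILE1=test1.pdf, FILE2=test2.pdf}
    }
}
```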

Get node text with HTML task inside

I am trying to examine an XML string like this one with Java XPath:
<app>
<elem class="A">value1</elem>
<elem class="B">value2a<br />value2b</elem>
<elem class="C">value3</elem>
</app>
Currently, to obtain an elem's value I use this code:
public String getValue(String xml, String classValue){
XPath xpath = XPathFactory.newInstance().newXPath();
InputSource source = new InputSource(new StringReader(xml));
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
document = db.parse(source);
String xpathRequest = "//*[@class='"+classValue+"']/text()";
String value = xpath.evaluate(xpathRequest , document);
return value;
}
This works fine for classes A and C, but when I ask for the content of the elem with class B, I get only value2a.
How can I get the complete string of the node?

Simply run
String xpathRequest = "//*[@class='"+classValue+"']";
String value = this.xpath.evaluate(xpathRequest , document);
This selects the <elem> node; when it is converted to a string, the result is the concatenation of all its text content, e.g. value2avalue2b.
To get a list of all the text nodes directly below an elem, you need to select them as a node set:
String xpathRequest = "//*[@class='"+classValue+"']/text()";
NodeList textNodes = (NodeList)xpath.evaluate(xpathRequest , document, XPathConstants.NODESET);
ArrayList<String> texts = new ArrayList<>();
for (int i=0; i<textNodes.getLength(); i++)
texts.add(textNodes.item(i).getTextContent());
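Put together, the two variants can be compared on the XML from the question. This is only a sketch; the ElemText class and its method names are invented for the example, and the class value is concatenated straight into the expression, so it must not contain a single quote.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ElemText {

    static final String XML = "<app>"
            + "<elem class=\"A\">value1</elem>"
            + "<elem class=\"B\">value2a<br />value2b</elem>"
            + "<elem class=\"C\">value3</elem>"
            + "</app>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // String value of the element: all descendant text joined together.
    static String fullText(Document doc, String classValue) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate("//*[@class='" + classValue + "']", doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    // Each direct text node of the element as a separate list entry.
    static List<String> textParts(Document doc, String classValue) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//*[@class='" + classValue + "']/text()", doc, XPathConstants.NODESET);
            List<String> parts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                parts.add(nodes.item(i).getTextContent());
            }
            return parts;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println(fullText(doc, "B"));  // value2avalue2b
        System.out.println(textParts(doc, "B")); // [value2a, value2b]
    }
}
```

If you want a separator where the <br /> was, joining the list (for example with String.join(" ", parts)) is one option.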

This happens because the XPath matches two text nodes here. Try the following:
List<WebElement> allprice = driver.findElements(By.xpath("//*[@class='B']/text()"));
for (WebElement a : allprice) {
System.out.println(a.getText());
}

navigating hierarchy of xml input file

How do I list the element names at a given level in an xml schema hierarchy? The code I have below is listing all element names at every level of the hierarchy, with no concept of nesting.
Here is my xml file:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><?xml-stylesheet type="text/xsl" href="CDA.xsl"?>
<SomeDocument xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:something">
<title>some title</title>
<languageCode code="en-US"/>
<versionNumber value="1"/>
<recordTarget>
<someRole>
<id extension="998991"/>
<addr use="HP">
<streetAddressLine>1357 Amber Drive</streetAddressLine>
<city>Beaverton</city>
<state>OR</state>
<postalCode>97867</postalCode>
<country>US</country>
</addr>
<telecom value="tel:(816)276-6909" use="HP"/>
</someRole>
</recordTarget>
</SomeDocument>
Here is my java method for importing and iterating the xml file:
public static void parseFile() {
//get the factory
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
try {
//Using factory get an instance of document builder
DocumentBuilder db = dbf.newDocumentBuilder();
//parse using builder to get DOM representation of the XML file
Document dom = db.parse("D:\\mypath\\somefile.xml");
//get the root element
Element docEle = dom.getDocumentElement();
//get a nodelist of elements
NodeList nl = docEle.getElementsByTagName("*");
if (nl != null && nl.getLength() > 0) {
for (int i = 0; i < nl.getLength(); i++) {
Node node = nl.item(i);
if (node.getNodeType() == Node.ELEMENT_NODE) {
System.out.println("node.getNodeName() is: "+node.getNodeName());
}
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
The output of the above program is:
title
languageCode
versionNumber
recordTarget
someRole
id
addr
streetAddressLine
city
state
postalCode
country
telecom
Instead, I would like to output the following:
title
languageCode
versionNumber
recordTarget
It would be nice to then be able to list the children of recordTarget as someRole, and then to list the children of someRole as id, addr, and telecom. And so on, but at my discretion in the code. How can I change my code to get the output that I want?

You're getting all nodes with this line:
NodeList nl = docEle.getElementsByTagName("*");
Change it to
NodeList nl = docEle.getChildNodes();
to get all of its children. Your print statement will then give you the output you're looking for.
Then, when you iterate through your NodeList, you can choose to call the same method on each Node you create:
NodeList children = node.getChildNodes();
If you want to print an XML-like structure, perhaps a recursive method that prints all child nodes is what you are looking for.
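A recursive method of that kind might look like the sketch below. It prints only element names, indents them by nesting level, and stops at a depth you choose, so a depth of 1 lists just the root and its direct children. The ElementTreePrinter class and the maxDepth parameter are illustrative names, not part of the DOM API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ElementTreePrinter {

    // Renders element names, one per line, indented two spaces per level.
    // The root is depth 0; elements deeper than maxDepth are left out.
    static String render(Node root, int maxDepth) {
        StringBuilder out = new StringBuilder();
        append(root, 0, maxDepth, out);
        return out.toString();
    }

    private static void append(Node node, int depth, int maxDepth, StringBuilder out) {
        if (node.getNodeType() != Node.ELEMENT_NODE || depth > maxDepth) {
            return; // ignore text, comments, and anything below the cut-off
        }
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(node.getNodeName()).append('\n');
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            append(children.item(i), depth + 1, maxDepth, out);
        }
    }

    // Convenience wrapper: parse a string and render from the root element.
    static String renderXml(String xml, int maxDepth) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return render(doc.getDocumentElement(), maxDepth);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Shortened version of the document from the question.
        String xml = "<SomeDocument><title>some title</title>"
                + "<recordTarget><someRole><id/><addr/></someRole></recordTarget>"
                + "</SomeDocument>";
        System.out.print(renderXml(xml, 1)); // root plus its direct children
        System.out.print(renderXml(xml, Integer.MAX_VALUE)); // the whole tree
    }
}
```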

You could re-write the parseFile (I'd rather call it parseChildrenElementNames) method to take an input String that specifies the element name for which you want to print out its children element names:
public static void parseChildrenElementNames(String parentElementName) {
// get the factory
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
try {
// Using factory get an instance of document builder
DocumentBuilder db = dbf.newDocumentBuilder();
// parse using builder to get DOM representation of the XML file
Document dom = db
.parse("D:\\mypath\\somefile.xml");
// get the root element
NodeList elementsByTagName = dom.getElementsByTagName(parentElementName);
if (elementsByTagName.getLength() > 0) { // item(0) would be null if nothing matched
Node parentElement = elementsByTagName.item(0);
// get a nodelist of elements
NodeList nl = parentElement.getChildNodes();
if (nl != null) {
for (int i = 0; i < nl.getLength(); i++) {
Node node = nl.item(i);
if (node.getNodeType() == Node.ELEMENT_NODE) {
System.out.println("node.getNodeName() is: "
+ node.getNodeName());
}
}
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
However, this will only consider the first element that matches the specified name.
For example, to get the list of elements under the first node named someRole, you would call parseChildrenElementNames("someRole"); which would print out:
node.getNodeName() is: id
node.getNodeName() is: addr
node.getNodeName() is: telecom

Output XML content with unknown node depth using XPath

I have paths of information I want to extract from a XML string:
"/root/A/info1"
"/root/A/B/info2"
"/root/A/B/info3"
"/root/A/info4"
And this is the input:
<root>
<A>
<info1>value1</info1>
<B>
<info2>value2.1</info2>
<info3>value3.1</info3>
</B>
<B>
<info2>value2.2</info2>
<!-- note: element "info3" is missing here! -->
</B>
<B>
<info2>value2.3</info2>
<info3>value3.3</info3>
</B>
<info4>value4</info4>
</A>
</root>
And I want to achieve this:
value1|value2.1|value3.1|value4
value1|value2.2|NULL|value4
value1|value2.3|value3.3|value4
My paths vary and I never know the depth of the XML file. Because "/root/A/B/info2" and "/root/A/B/info3" exist three times, I obviously need to output three lines.
I think recursion is needed here.
My code:
main function:
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(new ByteArrayInputStream(xml.getBytes()));
String[] paths = new String[] {"/root/A/info1", "/root/A/B/info2", "/root/A/B/info3", "/root/A/info4"};
XPath xPath = XPathFactory.newInstance().newXPath();
String[] output = new String[paths.length];
for(int i=0; i<paths.length; i++) {
recursion(paths, doc, xPath, paths[i], i, output);
}
recursive function:
private static void recursion(String[] paths, Object parent, XPath xPath, String path, int position, String[] output) throws Exception {
if(path.contains("/")) { // check if it's the last element, which contains the needed value
List<String> pathNodes = new ArrayList(Arrays.asList(StringUtils.split(path, "/")));
String currentPathNode = pathNodes.get(0);
NodeList nodeList = (NodeList) xPath.compile(currentPathNode).evaluate(parent, XPathConstants.NODESET);
pathNodes.remove(0);
String newPath = StringUtils.join(pathNodes, "/");
for(int i=0; i<nodeList.getLength(); i++) {
Node node = nodeList.item(i);
recursion(paths, node, xPath, newPath, position, output.clone()); // clone?
}
}
else {
output[position] = xPath.compile(path).evaluate(parent);
if((position + 1) == paths.length) { // check if it's the last path, so output the values
System.out.println(StringUtils.join(output, "|"));
}
}
}
If I clone output I get this:
|||value4
If I don't clone output I get that (overwriting previous values):
value1|value2.3|value3.3|value4
Please give me a hint.
Update: Please take another look at the XML input. Elements that have no value may be missing.

I finally solved it.
I added a context path to my application. It specifies which element is the deepest.
In my example here it would be "/root/A/B".
I update all my paths to be relative to that context path:
"../info1"
"info2"
"info3"
"../info4"
Then I count the nodes from the context path (here 3). That's also the number of lines that will be created. I create a loop to iterate over them and query my updated paths with XPath.
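The approach described above could be sketched roughly as follows. This is not the original code, only a reconstruction under assumptions: the ContextPathRows class and rows method are invented names, and a missing element is rendered as the literal text NULL to match the desired output in the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ContextPathRows {

    // Builds one output line per node matched by contextPath. Each relative
    // path is evaluated against that node; a missing element becomes "NULL".
    static List<String> rows(String xml, String contextPath, String[] relativePaths) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            NodeList contexts = (NodeList) xPath.evaluate(
                    contextPath, doc, XPathConstants.NODESET);
            List<String> rows = new ArrayList<>();
            for (int i = 0; i < contexts.getLength(); i++) {
                Node context = contexts.item(i);
                List<String> cells = new ArrayList<>();
                for (String path : relativePaths) {
                    // XPathConstants.NODE yields null when nothing matches
                    Node hit = (Node) xPath.evaluate(path, context, XPathConstants.NODE);
                    cells.add(hit == null ? "NULL" : hit.getTextContent().trim());
                }
                rows.add(String.join("|", cells));
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate paths", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><A><info1>value1</info1>"
                + "<B><info2>value2.1</info2><info3>value3.1</info3></B>"
                + "<B><info2>value2.2</info2></B>"
                + "<B><info2>value2.3</info2><info3>value3.3</info3></B>"
                + "<info4>value4</info4></A></root>";
        String[] paths = {"../info1", "info2", "info3", "../info4"};
        for (String row : rows(xml, "/root/A/B", paths)) {
            System.out.println(row);
        }
    }
}
```

Because every row is built from its own context node, no state is shared between rows, which sidesteps the clone-or-overwrite problem of the recursive version.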

Using Java to parse XML

I have made a PHP script that parses an XML file. This is not easy to use and I wanted to implement it in Java.
Inside the first element there is a varying number of wfs:member elements that I loop through:
foreach ($data->children("wfs", true)->member as $member) { }
This was easy to do with Java:
NodeList wfsMember = doc.getElementsByTagName("wfs:member");
for(int i = 0; i < wfsMember.getLength(); i++) { }
I have opened the XML file like this
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
Document doc = documentBuilder.parse(WeatherDatabaseUpdater.class.getResourceAsStream("wfs.xml"));
Then I need to get an attribute from an element called observedProperty. In PHP this is simple:
$member->
children("omso", true)->PointTimeSeriesObservation->
children("om", true)->observedProperty->
attributes("xlink", true)->href
But in Java, how do I do this? Do I need to use getElementsByTagName and loop through the results if I want to go deeper into the structure?
In PHP the whole script looks like this:
foreach ($data->children("wfs", true)->member as $member) {
$dataType = $dataTypes[(string) $member->
children("omso", true)->PointTimeSeriesObservation->
children("om", true)->observedProperty->
attributes("xlink", true)->href];
foreach ($member->
children("omso", true)->PointTimeSeriesObservation->
children("om", true)->result->
children("wml2", true)->MeasurementTimeseries->
children("wml2", true)->point as $point) {
$time = $point->children("wml2", true)->MeasurementTVP->children("wml2", true)->time;
$value = $point->children("wml2", true)->MeasurementTVP->children("wml2", true)->value;
$data[$dataType][] = array($time, $value);
}
}
In the second foreach I loop through the observation elements and get the time and value data from each. Then I save them in an array. If I need to loop through the elements in Java the way I described, this will be very hard to implement. I don't think that is the case, so could someone advise me on how to implement something similar in Java?

The easiest way, if performance is not a main concern, is probably XPath. With XPath, you can find nodes and attributes simply by specifying a path.
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile(<xpath_expression>);
NodeList nl = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
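The xpath_expression could be as simple as
"string(//member/observedProperty/@href)"
Note that an expression wrapped in string() returns a string, so evaluate it with XPathConstants.STRING; drop the string() wrapper if you want a NodeList as in the snippet above.
For more information about XPath, the XPath Tutorial from W3Schools is pretty good.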
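Since the elements in the question carry namespace prefixes (wfs, omso, om, xlink), one way to keep the XPath short without registering a NamespaceContext is to match on local-name() with a namespace-aware parser. The sketch below shows that idea only; the namespace URIs, the FeatureCollection root element and the href value are placeholders made up for the example, because the real document is not shown in the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ObservedPropertyHrefs {

    // Returns the href attribute of observedProperty for every member element,
    // matching on local names so the namespace URIs do not need to be known.
    static List<String> hrefs(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for reliable local-name()
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList members = (NodeList) xpath.evaluate(
                    "//*[local-name()='member']", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < members.getLength(); i++) {
                // Relative expression, evaluated against the current member
                result.add(xpath.evaluate(
                        "string(.//*[local-name()='observedProperty']"
                                + "/@*[local-name()='href'])",
                        members.item(i)));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        // Placeholder document: URIs and root element are assumptions.
        String xml = "<wfs:FeatureCollection xmlns:wfs=\"urn:example:wfs\""
                + " xmlns:omso=\"urn:example:omso\" xmlns:om=\"urn:example:om\""
                + " xmlns:xlink=\"urn:example:xlink\">"
                + "<wfs:member><omso:PointTimeSeriesObservation>"
                + "<om:observedProperty xlink:href=\"http://example.org/param/temperature\"/>"
                + "</omso:PointTimeSeriesObservation></wfs:member>"
                + "</wfs:FeatureCollection>";
        System.out.println(hrefs(xml));
    }
}
```

Matching on local names is convenient but ignores which namespace an element belongs to; registering a NamespaceContext on the XPath object is the stricter alternative.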

There are several ways to parse XML in Java.
The most common are DOM, SAX, and StAX.
Each has pros and cons. With DOM and SAX you can validate your XML against an XSD schema. StAX works without XSD validation and is much faster.
For example, take this XML file:
<?xml version="1.0" encoding="UTF-8"?>
<staff xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="oldEmployee.xsd">
<employee>
<name>Carl Cracker</name>
<salary>75000</salary>
<hiredate year="1987" month="12" day="15" />
</employee>
<employee>
<name>Harry Hacker</name>
<salary>50000</salary>
<hiredate year="1989" month="10" day="1" />
</employee>
<employee>
<name>Tony Tester</name>
<salary>40000</salary>
<hiredate year="1990" month="3" day="15" />
</employee>
</staff>
The DOM parser takes the most code to implement (in my opinion):
class DomXmlParser {
private Document document;
List<Employee> empList = new ArrayList<>();
public SchemaFactory schemaFactory;
public final String JAXP_SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
public final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";
public DomXmlParser() {
try {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
DocumentBuilder builder = factory.newDocumentBuilder();
document = builder.parse(new File(EMPLOYEE_XML.getFilename()));
} catch (Exception e) {
e.printStackTrace();
}
}
public List<Employee> parseFromXmlToEmployee() {
NodeList nodeList = document.getDocumentElement().getChildNodes();
for (int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i);
if (node instanceof Element) {
Employee emp = new Employee();
NodeList childNodes = node.getChildNodes();
for (int j = 0; j < childNodes.getLength(); j++) {
Node cNode = childNodes.item(j);
// identify the child tag of employees
if (cNode instanceof Element) {
switch (cNode.getNodeName()) {
case "name":
emp.setName(text(cNode));
break;
case "salary":
emp.setSalary(Double.parseDouble(text(cNode)));
break;
case "hiredate":
int yearAttr = Integer.parseInt(cNode.getAttributes().getNamedItem("year").getNodeValue());
int monthAttr = Integer.parseInt(cNode.getAttributes().getNamedItem("month").getNodeValue());
int dayAttr = Integer.parseInt(cNode.getAttributes().getNamedItem("day").getNodeValue());
emp.setHireDay(yearAttr, monthAttr - 1, dayAttr);
break;
}
}
}
empList.add(emp);
}
}
return empList;
}
private String text(Node cNode) {
return cNode.getTextContent().trim();
}
}
SAX parser:
class SaxHandler extends DefaultHandler {
private Stack<String> elementStack = new Stack<>();
private Stack<Object> objectStack = new Stack<>();
public List<Employee> employees = new ArrayList<>();
Employee employee = null;
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
this.elementStack.push(qName);
if ("employee".equals(qName)) {
employee = new Employee();
this.objectStack.push(employee);
this.employees.add(employee);
}
if("hiredate".equals(qName))
{
int yearatt = Integer.parseInt(attributes.getValue("year"));
int monthatt = Integer.parseInt(attributes.getValue("month"));
int dayatt = Integer.parseInt(attributes.getValue("day"));
if (employee != null) {
employee.setHireDay(yearatt, monthatt - 1, dayatt) ;
}
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
this.elementStack.pop();
if ("employee".equals(qName)) {
Object objects = this.objectStack.pop();
}
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
String value = new String(ch, start, length).trim();
if (value.length() == 0) return; // skip white space
if ("name".equals(currentElement())) {
employee = (Employee) this.objectStack.peek();
employee.setName(value);
} else if ("salary".equals(currentElement()) && "employee".equals(currentParrentElement())) {
employee.setSalary(Double.parseDouble(value));
}
}
private String currentElement() {
return this.elementStack.peek();
}
private String currentParrentElement() {
if (this.elementStack.size() < 2) return null;
return this.elementStack.get(this.elementStack.size() - 2);
}
}
Stax parser:
class StaxXmlParser {
private List<Employee> employeeList;
private Employee currentEmployee;
private String tagContent;
private String attrContent;
private XMLStreamReader reader;
public StaxXmlParser(String filename) {
employeeList = null;
currentEmployee = null;
tagContent = null;
try {
XMLInputFactory factory = XMLInputFactory.newFactory();
reader = factory.createXMLStreamReader(new FileInputStream(new File(filename)));
parseEmployee();
} catch (Exception e) {
e.printStackTrace();
}
}
public List<Employee> parseEmployee() throws XMLStreamException {
while (reader.hasNext()) {
int event = reader.next();
switch (event) {
case XMLStreamConstants.START_ELEMENT:
if ("employee".equals(reader.getLocalName())) {
currentEmployee = new Employee();
}
if ("staff".equals(reader.getLocalName())) {
employeeList = new ArrayList<>();
}
if ("hiredate".equals(reader.getLocalName())) {
int yearAttr = Integer.parseInt(reader.getAttributeValue(null, "year"));
int monthAttr = Integer.parseInt(reader.getAttributeValue(null, "month"));
int dayAttr = Integer.parseInt(reader.getAttributeValue(null, "day"));
currentEmployee.setHireDay(yearAttr, monthAttr - 1, dayAttr);
}
break;
case XMLStreamConstants.CHARACTERS:
tagContent = reader.getText().trim();
break;
case XMLStreamConstants.ATTRIBUTE:
int count = reader.getAttributeCount();
for (int i = 0; i < count; i++) {
System.out.printf("count is: %d%n", count);
}
break;
case XMLStreamConstants.END_ELEMENT:
switch (reader.getLocalName()) {
case "employee":
employeeList.add(currentEmployee);
break;
case "name":
currentEmployee.setName(tagContent);
break;
case "salary":
currentEmployee.setSalary(Double.parseDouble(tagContent));
break;
}
}
}
return employeeList;
}
}
And some main() test:
public static void main(String[] args) {
long startTime, elapsedTime;
Main main = new Main();
startTime = System.currentTimeMillis();
main.testSaxParser(); // test
elapsedTime = System.currentTimeMillis() - startTime;
System.out.println(String.format("Parsing time is: %d ms%n", elapsedTime / 1000));
startTime = System.currentTimeMillis();
main.testStaxParser(); // test
elapsedTime = System.currentTimeMillis() - startTime;
System.out.println(String.format("Parsing time is: %d ms%n", elapsedTime / 1000));
startTime = System.currentTimeMillis();
main.testDomParser(); // test
elapsedTime = System.currentTimeMillis() - startTime;
System.out.println(String.format("Parsing time is: %d ms%n", elapsedTime / 1000));
}
Output:
Using SAX Parser:
-----------------
Employee { name=Carl Cracker, salary=75000.0, hireDay=Tue Dec 15 00:00:00 EET 1987 }
Employee { name=Harry Hacker, salary=50000.0, hireDay=Sun Oct 01 00:00:00 EET 1989 }
Employee { name=Tony Tester, salary=40000.0, hireDay=Thu Mar 15 00:00:00 EET 1990 }
Parsing time is: 106 ms
Using StAX Parser:
------------------
Employee { name=Carl Cracker, salary=75000.0, hireDay=Tue Dec 15 00:00:00 EET 1987 }
Employee { name=Harry Hacker, salary=50000.0, hireDay=Sun Oct 01 00:00:00 EET 1989 }
Employee { name=Tony Tester, salary=40000.0, hireDay=Thu Mar 15 00:00:00 EET 1990 }
Parsing time is: 5 ms
Using DOM Parser:
-----------------
Employee { name=Carl Cracker, salary=75000.0, hireDay=Tue Dec 15 00:00:00 EET 1987 }
Employee { name=Harry Hacker, salary=50000.0, hireDay=Sun Oct 01 00:00:00 EET 1989 }
Employee { name=Tony Tester, salary=40000.0, hireDay=Thu Mar 15 00:00:00 EET 1990 }
Parsing time is: 13 ms
That gives a quick glimpse of these variations.
Java also offers other options such as JAXB: you need an XSD schema, and from that schema you generate classes. After that you can use unmarshal() to read from the XML file:
public class JaxbDemo {
public static void main(String[] args) {
try {
long startTime = System.currentTimeMillis();
// create jaxb and instantiate marshaller
JAXBContext context = JAXBContext.newInstance(Staff.class.getPackage().getName());
FileInputStream in = new FileInputStream(new File(Files.EMPLOYEE_XML.getFilename()));
System.out.println("Output from employee XML file");
Unmarshaller um = context.createUnmarshaller();
Staff staff = (Staff) um.unmarshal(in);
// print employee list
for (Staff.Employee emp : staff.getEmployee()) {
System.out.println(emp);
}
long elapsedTime = System.currentTimeMillis() - startTime;
System.out.println(String.format("Parsing time is: %d ms%n", elapsedTime));
} catch (Exception e) {
e.printStackTrace();
}
}
}
I ran this approach the same way as before; the result is:
Employee { name='Carl Cracker', salary=75000, hiredate=1987-12-15 } }
Employee { name='Harry Hacker', salary=50000, hiredate=1989-10-1 } }
Employee { name='Tony Tester', salary=40000, hiredate=1990-3-15 } }
Parsing time is: 320 ms
I used a different toString() here, so the hire date format differs.
Here are a few links that may interest you:
Java & XML Tutorial
JAXB Tutorial

DOM Parser through Recursion
Using a DOM parser, you can easily get into a mess of nested for loops, as you've already pointed out. However, a DOM structure is represented by a Node that holds its child nodes in a NodeList, where each item is again a Node. That makes it a perfect candidate for recursion.
Sample XML
To showcase what a DOM parser can do regardless of the size of the XML, I took a hosted sample OpenWeatherMap XML as the example.
Searching by city name in XML format
This XML contains London's weather forecast in 3-hour intervals. It makes a good case for reading through a relatively large data set and extracting specific information from attributes of the child elements.
We are aiming to gather the temperature and clouds elements (marked by arrows in the original snapshot).
The Code
We start off by creating a custom class to hold the temperature and clouds values. We also override toString() in this class to conveniently print our records.
ForeCast.java
public class ForeCast {
/**
* Overridden toString() to conveniently print the results
*/
@Override
public String toString() {
return "The minimum temperature is: " + getTemperature()
+ " and the weather overall: " + getClouds();
}
public String getTemperature() {
return temperature;
}
public void setTemperature(String temperature) {
this.temperature = temperature;
}
public String getClouds() {
return clouds;
}
public void setClouds(String clouds) {
this.clouds = clouds;
}
private String temperature;
private String clouds;
}
Now to the main class. In the main class where we perform our recursion, we want to create a List of ForeCast objects which store individual temperature and clouds records by traversing the entire XML.
// List that holds all the data parsed from the XML,
// in the format defined by the custom type 'ForeCast'
private static List<ForeCast> forecastList = new ArrayList<>();
In the XML, the parent of both the temperature and clouds elements is time, so we check for the time element.
/**
* Logical block
*/
// As per the XML syntax our 2 fields temperature and clouds come
// directly under the Node/Element time
if (node.getNodeName().equals("time")
&& node.getNodeType() == Node.ELEMENT_NODE) {
// Instantiate our custom forecast object
forecastObj = new ForeCast();
Element timeElement = (Element) node;
Thereafter, we would get a handle on temperature and clouds elements which can be set to the ForeCast object.
// Get the temperature element by its tag name within the XML (0th
// index known)
Element tempElement = (Element) timeElement.getElementsByTagName("temperature").item(0);
// Minimum temperature value is selectively picked (for proof of concept)
forecastObj.setTemperature(tempElement.getAttribute("min"));
// Similarly get the clouds element
Element cloudElement = (Element) timeElement.getElementsByTagName("clouds").item(0);
forecastObj.setClouds(cloudElement.getAttribute("value"));
The complete class below:
CustomDomXmlParser.java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class CustomDomXmlParser {
// List that holds all the data parsed from the XML,
// in the format defined by the custom type 'ForeCast'
private static List<ForeCast> forecastList = new ArrayList<>();
public static void main(String[] args) throws ParserConfigurationException,
SAXException, IOException {
// Read XML through a URL (a FileInputStream can be used to pick up an
// XML file from the file system)
InputStream path = new URL(
"http://api.openweathermap.org/data/2.5/forecast?q=London,us&mode=xml")
.openStream();
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(path);
// Call to the recursive method with the parent node
traverse(document.getDocumentElement());
// Print the List values collected within the recursive method
for (ForeCast forecastObj : forecastList)
System.out.println(forecastObj);
}
/**
*
* @param node
*/
public static void traverse(Node node) {
// Get the list of Child Nodes immediate to the current node
NodeList list = node.getChildNodes();
// Declare our local instance of forecast object
ForeCast forecastObj = null;
/**
* Logical block
*/
// As per the XML syntax our 2 fields temperature and clouds come
// directly under the Node/Element time
if (node.getNodeName().equals("time")
&& node.getNodeType() == Node.ELEMENT_NODE) {
// Instantiate our custom forecast object
forecastObj = new ForeCast();
Element timeElement = (Element) node;
// Get the temperature element by its tag name within the XML (0th
// index known)
Element tempElement = (Element) timeElement.getElementsByTagName(
"temperature").item(0);
// Minimum temperature value is selectively picked (for proof of
// concept)
forecastObj.setTemperature(tempElement.getAttribute("min"));
// Similarly get the clouds element
Element cloudElement = (Element) timeElement.getElementsByTagName(
"clouds").item(0);
forecastObj.setClouds(cloudElement.getAttribute("value"));
}
// Add our foreCastObj if initialized within this recursion, that is if
// it traverses the time node within the XML, and not in any other case
if (forecastObj != null)
forecastList.add(forecastObj);
/**
* Recursion block
*/
// Iterate over the next child nodes
for (int i = 0; i < list.getLength(); i++) {
Node currentNode = list.item(i);
// Recursively invoke the method for the current node
traverse(currentNode);
}
}
}
The Output
As the output shows, we were able to group the two specific elements together and assign their values to a Java collection. We delegated the complex parsing of the XML to the generic recursive solution and customized mainly the logical block. As mentioned, it is a generic solution that needs only minimal customization and can work through any valid XML.
Alternatives
Many other alternatives are available, here is a list of open source XML parsers for Java.
However, your approach with PHP and your initial work with Java based parser aligns to the DOM based XML parser solution, simplified by the use of recursion.

I wouldn't suggest implementing your own parse function for XML, since there are already many options out there. My suggestion is a DOM parser. You can find a few examples at the following link. (You can also choose from the other available options.)
http://www.javacodegeeks.com/2013/05/parsing-xml-using-dom-sax-and-stax-parser-in-java.html
You can use commands such as
eElement.getAttribute("id");
Source: http://www.mkyong.com/java/how-to-read-xml-file-in-java-dom-parser/

I agree with what has already been posted about not implementing parse functions yourself.
Instead of DOM/SAX/STAX parsers though, I would suggest using JDOM or XOM, which are external libraries.
Related discussions:
What Java XML library do you recommend (to replace dom4j)?
Should I still be using JDOM with Java 5 or 6?
My gut feeling is that jdom is the one most java developers use. Some use dom4j, some xom, some others, but hardly anybody implements these parsing functions themselves.

Use the startElement and endElement callbacks. Note that these belong to a SAX handler (DefaultHandler), not to DOM parsers.
