XMLParsing, Dynamic Structure, Content


Want to achieve:
Get an unknown XML file's Elements (Element Name, How many elements are there in the xml file).
Then get all the attributes and their name and values to use it later (eg Comparison to other xml file)
Researched:
1. 2. 3. 4. 5.
And many more
Does anyone have any idea for this?
I don't want to predefine more than 500 tables like in the previous code snippet; somehow I should be able to get the number of elements and the element names themselves dynamically.
EDIT!
Example1
<Root Attri1="" Attri2="">
<element1 EAttri1="" EAttri2=""/>
<Element2 EAttri1="" EAttri2="">
<nestedelement3 NEAttri1="" NEAttri2=""/>
</Element2>
</Root>
Example2
<Root Attri1="" Attri2="" Attr="" At="">
<element1 EAttri1="" EAttri2="">
<nestedElement2 EAttri1="" EAttri2="">
<nestedelement3 NEAttri1="" NEAttri2=""/>
</nestedElement2>
</element1>
</Root>
Program snippet:
String Example1[] = {"element1","Element2","nestedelement3"};
String Example2[] = {"element1","nestedElement2","nestedelement3"};
for (int i = 0; i < Example1.length; i++) {
    NodeList Elements = oldDOC.getElementsByTagName(Example1[i]);
    for (int j = 0; j < Elements.getLength(); j++) {
        Node nodeinfo = Elements.item(j);
        for (int l = 0; l < nodeinfo.getAttributes().getLength(); l++) {
            .....
        }
    }
}
Output:
The expected result is to get all the elements and all the attributes out of the XML file without predefining anything.
eg:
Elements: element1 Element2 nestedelement3
Attributes: Attri1 Attri2 EAttri1 EAttri2 EAttri1 EAttri2 NEAttri1 NEAttri2

The right tool for this job is XPath.
It allows you to collect all or some elements and attributes based on various criteria. It is the closest you will get to a "universal" XML parser.
Here is the solution that I came up with. It first finds all element names in the given XML document, then counts each element's occurrences and collects the results into a map. It does the same for attributes.
I added inline comments, and the method and variable names should be self-explanatory.
import java.io.*;
import java.nio.file.*;
import java.util.*;
import java.util.function.*;
import java.util.stream.*;
import org.w3c.dom.*;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
public class TestXpath
{
public static void main(String[] args) {
XPath xPath = XPathFactory.newInstance().newXPath();
try (InputStream is = Files.newInputStream(Paths.get("C://temp/test.xml"))) {
// parse file into xml doc
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document xmlDocument = builder.parse(is);
// find all element names in xml doc
Set<String> allElementNames = findNames(xmlDocument, xPath.compile("//*[name()]"));
// for each name, count occurrences, and collect to map
Map<String, Integer> elementsAndOccurrences = allElementNames.stream()
.collect(Collectors.toMap(Function.identity(), name -> countElementOccurrences(xmlDocument, name)));
System.out.println(elementsAndOccurrences);
// find all attribute names in xml doc
Set<String> allAttributeNames = findNames(xmlDocument, xPath.compile("//@*"));
// for each name, count occurrences, and collect to map
Map<String, Integer> attributesAndOccurrences = allAttributeNames.stream()
.collect(Collectors.toMap(Function.identity(), name -> countAttributeOccurrences(xmlDocument, name)));
System.out.println(attributesAndOccurrences);
} catch (Exception e) {
e.printStackTrace();
}
}
public static Set<String> findNames(Document xmlDoc, XPathExpression xpathExpr) {
try {
NodeList nodeList = (NodeList)xpathExpr.evaluate(xmlDoc, XPathConstants.NODESET);
// convert nodeList to set of node names
return IntStream.range(0, nodeList.getLength())
.mapToObj(i -> nodeList.item(i).getNodeName())
.collect(Collectors.toSet());
} catch (XPathExpressionException e) {
e.printStackTrace();
}
return new HashSet<>();
}
public static int countElementOccurrences(Document xmlDoc, String elementName) {
return countOccurrences(xmlDoc, elementName, "count(//*[name()='" + elementName + "'])");
}
public static int countAttributeOccurrences(Document xmlDoc, String attributeName) {
return countOccurrences(xmlDoc, attributeName, "count(//@*[name()='" + attributeName + "'])");
}
public static int countOccurrences(Document xmlDoc, String name, String xpathExpr) {
XPath xPath = XPathFactory.newInstance().newXPath();
try {
Number count = (Number)xPath.compile(xpathExpr).evaluate(xmlDoc, XPathConstants.NUMBER);
return count.intValue();
} catch (XPathExpressionException e) {
e.printStackTrace();
}
return 0;
}
}
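The question also asks for the attribute values, not only the names. As a complement to the XPath approach above, here is a minimal sketch that walks the DOM once and records element counts plus every attribute value. The class name XmlInventory and the "element/@attribute" key format are assumptions made for illustration; they are not part of the original answer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: one recursive pass over the DOM, nothing predefined.
class XmlInventory {
    // element name -> number of occurrences
    final Map<String, Integer> elementCounts = new TreeMap<>();
    // "element/@attribute" -> values seen, in document order
    final Map<String, List<String>> attributeValues = new TreeMap<>();

    static XmlInventory of(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XmlInventory inventory = new XmlInventory();
            inventory.visit(doc.getDocumentElement());
            return inventory;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    private void visit(Node node) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return; // skip text, comments, etc.
        }
        String elementName = node.getNodeName();
        elementCounts.merge(elementName, 1, Integer::sum);

        NamedNodeMap attributes = node.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attribute = attributes.item(i);
            String key = elementName + "/@" + attribute.getNodeName();
            attributeValues.computeIfAbsent(key, k -> new ArrayList<>())
                    .add(attribute.getNodeValue());
        }

        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            visit(children.item(i)); // recurse into nested elements
        }
    }
}
```

Calling XmlInventory.of(xmlString) on two files gives two pairs of maps that can then be compared with Map.equals or walked key by key.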

Related

Jackson XML serialization of a list

EDIT: I am trying to serialise to XML markup from Java objects.
I am struggling to serialise some XML from a List of size N of Integers using Jackson.
I want to output the following XML from a list of integers of variable length [9, 2, ... , 7].
<tagName>
<thing1>9</thing1>
<thing2>2</thing2>
...
<thingN>7</thingN>
</tagName>
I can't find any resource on here for dealing with deserialising lists.
The closest I have managed to get is
@JacksonXmlProperty(localName = "thing")
private List<Integer> thingList;
And I can't figure out how to add a counter to the local name for each member of the list.
Any help would be appreciated, thank you!

OK, now I understand your problem. I tried it with the Jsoup library; the code snippet below should work for you.
public static void main(String[] args) {
int [] array={1,2,3,4,5};
TagName name = new TagName();
//initialize the TagName object
for (int a=0;a<array.length;a++) {
name.setThingList(array[a]);
}
XmlMapper xmlMapper = new XmlMapper();
try {
//get object as a string
String value = xmlMapper.writeValueAsString(name);
//First you need to parse the xml
Document doc = Jsoup.parse(value, "", Parser.xmlParser());
//get tagname object
Element tagname = doc.getElementsByTag("tagname").first();
//get tagname's children which are thing
Elements childs = tagname.children();
for (int a = 0; a < childs.size(); ) {
//rename their tagname
childs.get(a).tagName("thing" + ++a);
}
System.out.println(tagname);
} catch (IOException e) {
e.printStackTrace();
}
}
<tagname>
<thing1>
1
</thing1>
<thing2>
2
</thing2>
<thing3>
3
</thing3>
<thing4>
4
</thing4>
<thing5>
5
</thing5>
</tagname>
@JacksonXmlRootElement(localName = "xml")
public class TagName {
public ArrayList<Integer> getThingList() {
return thingList;
}
public void setThingList(Integer thing) {
this.thingList.add(thing);
}
@JacksonXmlElementWrapper(localName = "tagname")
@JacksonXmlProperty(localName = "thing")
private ArrayList<Integer> thingList = new ArrayList<>();
}
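If the post-processing step with Jsoup feels heavy, numbered tags can also be written directly. The following is a minimal sketch that sets Jackson aside and uses the JDK's StAX XMLStreamWriter instead; the class NumberedTagWriter and its parameters are hypothetical names chosen for this illustration.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper: writes <wrapper><prefix1>..</prefix1><prefix2>..</prefix2>...</wrapper>.
class NumberedTagWriter {
    static String write(String wrapperName, String itemPrefix, List<Integer> values) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement(wrapperName);
            int counter = 1;
            for (Integer value : values) {
                // The element name carries the running counter: thing1, thing2, ...
                writer.writeStartElement(itemPrefix + counter++);
                writer.writeCharacters(String.valueOf(value));
                writer.writeEndElement();
            }
            writer.writeEndElement();
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
        return out.toString();
    }
}
```

For example, NumberedTagWriter.write("tagName", "thing", Arrays.asList(9, 2, 7)) produces a tagName element containing thing1, thing2 and thing3. The trade-off is that this bypasses Jackson's object mapping entirely, so it only suits cases where the list is serialised on its own.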

Using Java to parse XML

I have made a PHP script that parses an XML file. This is not easy to use and I wanted to implement it in Java.
Inside the first element there is a varying number of wfs:member elements that I loop through:
foreach ($data->children("wfs", true)->member as $member) { }
This was easy to do with Java:
NodeList wfsMember = doc.getElementsByTagName("wfs:member");
for(int i = 0; i < wfsMember.getLength(); i++) { }
I have opened the XML file like this
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
Document doc = documentBuilder.parse(WeatherDatabaseUpdater.class.getResourceAsStream("wfs.xml"));
Then I need to get an attribute from an element called observedProperty. In PHP this is simple:
$member->
children("omso", true)->PointTimeSeriesObservation->
children("om", true)->observedProperty->
attributes("xlink", true)->href
But in Java, how do I do this? Do I need to use getElementsByTagName and loop through them if I want to go deeper in the structure?
In PHP the whole script looks the following.
foreach ($data->children("wfs", true)->member as $member) {
$dataType = $dataTypes[(string) $member->
children("omso", true)->PointTimeSeriesObservation->
children("om", true)->observedProperty->
attributes("xlink", true)->href];
foreach ($member->
children("omso", true)->PointTimeSeriesObservation->
children("om", true)->result->
children("wml2", true)->MeasurementTimeseries->
children("wml2", true)->point as $point) {
$time = $point->children("wml2", true)->MeasurementTVP->children("wml2", true)->time;
$value = $point->children("wml2", true)->MeasurementTVP->children("wml2", true)->value;
$data[$dataType][] = array($time, $value);
}
}
In the second foreach I loop through the observation elements and get the time and value data from it. Then I save it in an array. If I need to loop through the elements in Java the way I described, this is very hard to implement. I don't think that is the case, so could someone advise me how to implement something similar in Java?

The easiest way, if performance is not a main concern, is probably XPath. With XPath, you can find nodes and attributes simply by specifying a path.
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile(<xpath_expression>);
NodeList nl = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
The xpath_expression could be as simple as
"string(//member/observedProperty/@href)"
For more information about XPath, XPath Tutorial from W3Schools is pretty good.
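One practical wrinkle with the document in the question is that its elements are namespaced (wfs:member, om:observedProperty, xlink:href). A minimal sketch of one way around that, assuming a namespace-aware parse and local-name() tests so that no NamespaceContext has to be registered, follows. The class name and the sample namespace URIs used to exercise it are assumptions; only the xlink URI is the standard one.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: collects every href attribute of observedProperty
// elements found below a member element, regardless of namespace prefix.
class NamespaceAgnosticXPath {
    static List<String> observedPropertyHrefs(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // so local-name() yields the name without prefix
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            String expression = "//*[local-name()='member']"
                    + "//*[local-name()='observedProperty']"
                    + "/@*[local-name()='href']";
            // The expression selects attribute nodes, so NODESET is the matching return type.
            NodeList attributes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);

            List<String> hrefs = new ArrayList<>();
            for (int i = 0; i < attributes.getLength(); i++) {
                hrefs.add(attributes.item(i).getNodeValue());
            }
            return hrefs;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
    }
}
```

Note that an expression wrapped in string(...) returns a string, so it should be evaluated with XPathConstants.STRING rather than cast to a NodeList.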

There are a few ways to implement XML parsing in Java.
The most common are DOM, SAX, and StAX.
Each one has pros and cons. With DOM and SAX you can validate your XML against an XSD schema. StAX works without XSD validation and was much faster in the test below.
For example, xml file:
<?xml version="1.0" encoding="UTF-8"?>
<staff xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="oldEmployee.xsd">
<employee>
<name>Carl Cracker</name>
<salary>75000</salary>
<hiredate year="1987" month="12" day="15" />
</employee>
<employee>
<name>Harry Hacker</name>
<salary>50000</salary>
<hiredate year="1989" month="10" day="1" />
</employee>
<employee>
<name>Tony Tester</name>
<salary>40000</salary>
<hiredate year="1990" month="3" day="15" />
</employee>
</staff>
The longest to implement (to my mind) is the DOM parser:
class DomXmlParser {
private Document document;
List<Employee> empList = new ArrayList<>();
public SchemaFactory schemaFactory;
public final String JAXP_SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
public final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";
public DomXmlParser() {
try {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
DocumentBuilder builder = factory.newDocumentBuilder();
document = builder.parse(new File(EMPLOYEE_XML.getFilename()));
} catch (Exception e) {
e.printStackTrace();
}
}
public List<Employee> parseFromXmlToEmployee() {
NodeList nodeList = document.getDocumentElement().getChildNodes();
for (int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i);
if (node instanceof Element) {
Employee emp = new Employee();
NodeList childNodes = node.getChildNodes();
for (int j = 0; j < childNodes.getLength(); j++) {
Node cNode = childNodes.item(j);
// identify the child tag of employees
if (cNode instanceof Element) {
switch (cNode.getNodeName()) {
case "name":
emp.setName(text(cNode));
break;
case "salary":
emp.setSalary(Double.parseDouble(text(cNode)));
break;
case "hiredate":
int yearAttr = Integer.parseInt(cNode.getAttributes().getNamedItem("year").getNodeValue());
int monthAttr = Integer.parseInt(cNode.getAttributes().getNamedItem("month").getNodeValue());
int dayAttr = Integer.parseInt(cNode.getAttributes().getNamedItem("day").getNodeValue());
emp.setHireDay(yearAttr, monthAttr - 1, dayAttr);
break;
}
}
}
empList.add(emp);
}
}
return empList;
}
private String text(Node cNode) {
return cNode.getTextContent().trim();
}
}
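The parsers in this answer all fill an Employee object whose class is not shown. A minimal sketch of what it might look like, inferred only from the calls used here (setName, setSalary, setHireDay with a zero-based month) and from the printed output, could be the following. The field types and the toString format are assumptions.

```java
import java.util.Date;
import java.util.GregorianCalendar;

// Hypothetical reconstruction of the Employee class used by the parsers.
class Employee {
    private String name;
    private double salary;
    private Date hireDay;

    String getName() {
        return name;
    }

    void setName(String name) {
        this.name = name;
    }

    double getSalary() {
        return salary;
    }

    void setSalary(double salary) {
        this.salary = salary;
    }

    Date getHireDay() {
        return hireDay;
    }

    // month is zero-based, as in java.util.Calendar, which is why the
    // parsers pass "monthAttr - 1".
    void setHireDay(int year, int month, int day) {
        this.hireDay = new GregorianCalendar(year, month, day).getTime();
    }

    @Override
    public String toString() {
        return "Employee { name=" + name + ", salary=" + salary + ", hireDay=" + hireDay + " }";
    }
}
```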
SAX parser:
class SaxHandler extends DefaultHandler {
private Stack<String> elementStack = new Stack<>();
private Stack<Object> objectStack = new Stack<>();
public List<Employee> employees = new ArrayList<>();
Employee employee = null;
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
this.elementStack.push(qName);
if ("employee".equals(qName)) {
employee = new Employee();
this.objectStack.push(employee);
this.employees.add(employee);
}
if("hiredate".equals(qName))
{
int yearatt = Integer.parseInt(attributes.getValue("year"));
int monthatt = Integer.parseInt(attributes.getValue("month"));
int dayatt = Integer.parseInt(attributes.getValue("day"));
if (employee != null) {
employee.setHireDay(yearatt, monthatt - 1, dayatt) ;
}
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
this.elementStack.pop();
if ("employee".equals(qName)) {
Object objects = this.objectStack.pop();
}
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
String value = new String(ch, start, length).trim();
if (value.length() == 0) return; // skip white space
if ("name".equals(currentElement())) {
employee = (Employee) this.objectStack.peek();
employee.setName(value);
} else if ("salary".equals(currentElement()) && "employee".equals(currentParrentElement())) {
employee.setSalary(Double.parseDouble(value));
}
}
private String currentElement() {
return this.elementStack.peek();
}
private String currentParrentElement() {
if (this.elementStack.size() < 2) return null;
return this.elementStack.get(this.elementStack.size() - 2);
}
}
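The handler above is only half of a SAX setup, since something still has to feed it parse events. A minimal sketch of that wiring, using a small stand-in handler that merely counts element names so the example stays self-contained, might look like this. The class SaxDriver is a hypothetical name; SAXParserFactory, SAXParser and DefaultHandler are the standard JDK types.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical driver: shows how a DefaultHandler subclass is handed to a SAXParser.
class SaxDriver {
    static Map<String, Integer> countElements(String xml) {
        Map<String, Integer> counts = new TreeMap<>();

        // Stand-in handler; a handler like SaxHandler would be passed the same way.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                counts.merge(qName, 1, Integer::sum);
            }
        };

        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // The parser pushes events into the handler while it reads the input.
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return counts;
    }
}
```

With the SaxHandler from this answer, the same parse call would be followed by reading its employees list once parsing has finished.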
Stax parser:
class StaxXmlParser {
private List<Employee> employeeList;
private Employee currentEmployee;
private String tagContent;
private String attrContent;
private XMLStreamReader reader;
public StaxXmlParser(String filename) {
employeeList = null;
currentEmployee = null;
tagContent = null;
try {
XMLInputFactory factory = XMLInputFactory.newFactory();
reader = factory.createXMLStreamReader(new FileInputStream(new File(filename)));
parseEmployee();
} catch (Exception e) {
e.printStackTrace();
}
}
public List<Employee> parseEmployee() throws XMLStreamException {
while (reader.hasNext()) {
int event = reader.next();
switch (event) {
case XMLStreamConstants.START_ELEMENT:
if ("employee".equals(reader.getLocalName())) {
currentEmployee = new Employee();
}
if ("staff".equals(reader.getLocalName())) {
employeeList = new ArrayList<>();
}
if ("hiredate".equals(reader.getLocalName())) {
int yearAttr = Integer.parseInt(reader.getAttributeValue(null, "year"));
int monthAttr = Integer.parseInt(reader.getAttributeValue(null, "month"));
int dayAttr = Integer.parseInt(reader.getAttributeValue(null, "day"));
currentEmployee.setHireDay(yearAttr, monthAttr - 1, dayAttr);
}
break;
case XMLStreamConstants.CHARACTERS:
tagContent = reader.getText().trim();
break;
case XMLStreamConstants.ATTRIBUTE:
int count = reader.getAttributeCount();
for (int i = 0; i < count; i++) {
System.out.printf("count is: %d%n", count);
}
break;
case XMLStreamConstants.END_ELEMENT:
switch (reader.getLocalName()) {
case "employee":
employeeList.add(currentEmployee);
break;
case "name":
currentEmployee.setName(tagContent);
break;
case "salary":
currentEmployee.setSalary(Double.parseDouble(tagContent));
break;
}
}
}
return employeeList;
}
}
And some main() test:
public static void main(String[] args) {
long startTime, elapsedTime;
Main main = new Main();
startTime = System.currentTimeMillis();
main.testSaxParser(); // test
// currentTimeMillis() already yields milliseconds, so no division is needed
elapsedTime = System.currentTimeMillis() - startTime;
System.out.println(String.format("Parsing time is: %d ms%n", elapsedTime));
startTime = System.currentTimeMillis();
main.testStaxParser(); // test
elapsedTime = System.currentTimeMillis() - startTime;
System.out.println(String.format("Parsing time is: %d ms%n", elapsedTime));
startTime = System.currentTimeMillis();
main.testDomParser(); // test
elapsedTime = System.currentTimeMillis() - startTime;
System.out.println(String.format("Parsing time is: %d ms%n", elapsedTime));
}
Output:
Using SAX Parser:
-----------------
Employee { name=Carl Cracker, salary=75000.0, hireDay=Tue Dec 15 00:00:00 EET 1987 }
Employee { name=Harry Hacker, salary=50000.0, hireDay=Sun Oct 01 00:00:00 EET 1989 }
Employee { name=Tony Tester, salary=40000.0, hireDay=Thu Mar 15 00:00:00 EET 1990 }
Parsing time is: 106 ms
Using StAX Parser:
------------------
Employee { name=Carl Cracker, salary=75000.0, hireDay=Tue Dec 15 00:00:00 EET 1987 }
Employee { name=Harry Hacker, salary=50000.0, hireDay=Sun Oct 01 00:00:00 EET 1989 }
Employee { name=Tony Tester, salary=40000.0, hireDay=Thu Mar 15 00:00:00 EET 1990 }
Parsing time is: 5 ms
Using DOM Parser:
-----------------
Employee { name=Carl Cracker, salary=75000.0, hireDay=Tue Dec 15 00:00:00 EET 1987 }
Employee { name=Harry Hacker, salary=50000.0, hireDay=Sun Oct 01 00:00:00 EET 1989 }
Employee { name=Tony Tester, salary=40000.0, hireDay=Thu Mar 15 00:00:00 EET 1990 }
Parsing time is: 13 ms
That gives you a glimpse of these variations.
Java also offers other options such as JAXB. You need an XSD schema, and you generate classes according to that schema. After that you can use unmarshal() to read from the XML file:
public class JaxbDemo {
public static void main(String[] args) {
try {
long startTime = System.currentTimeMillis();
// create jaxb and instantiate marshaller
JAXBContext context = JAXBContext.newInstance(Staff.class.getPackage().getName());
FileInputStream in = new FileInputStream(new File(Files.EMPLOYEE_XML.getFilename()));
System.out.println("Output from employee XML file");
Unmarshaller um = context.createUnmarshaller();
Staff staff = (Staff) um.unmarshal(in);
// print employee list
for (Staff.Employee emp : staff.getEmployee()) {
System.out.println(emp);
}
long elapsedTime = System.currentTimeMillis() - startTime;
System.out.println(String.format("Parsing time is: %d ms%n", elapsedTime));
} catch (Exception e) {
e.printStackTrace();
}
}
}
I tried this approach the same way as before; the result is:
Employee { name='Carl Cracker', salary=75000, hiredate=1987-12-15 } }
Employee { name='Harry Hacker', salary=50000, hiredate=1989-10-1 } }
Employee { name='Tony Tester', salary=40000, hiredate=1990-3-15 } }
Parsing time is: 320 ms
I added another toString(), so the hire day has a different format.
Here are a few links that may be interesting for you:
Java & XML Tutorial
JAXB Tutorial

DOM Parser through Recursion
Using a DOM parser, you can easily get into a mess of nested for loops as you've already pointed out. Nevertheless, DOM structure is represented by Node containing child nodes collection in the form of a NodeList where each element is again a Node - this becomes a perfect candidate for recursion.
Sample XML
To showcase the ability of DOM parser discounting the size of the XML, I took the example of a hosted sample OpenWeatherMap XML.
Searching by city name in XML format
This XML contains London's weather forecast for every 3 hour duration. This XML makes a good case of reading through a relatively large data set and extracting specific information through attributes within the child elements.
In the snapshot, we are targeting to gather the Elements marked by the arrows.
The Code
We start off by creating a custom class to hold temperature and clouds values. We also override toString() of this custom class to conveniently print our records.
ForeCast.java
public class ForeCast {
/**
* Overridden toString() to conveniently print the results
*/
@Override
public String toString() {
return "The minimum temperature is: " + getTemperature()
+ " and the weather overall: " + getClouds();
}
public String getTemperature() {
return temperature;
}
public void setTemperature(String temperature) {
this.temperature = temperature;
}
public String getClouds() {
return clouds;
}
public void setClouds(String clouds) {
this.clouds = clouds;
}
private String temperature;
private String clouds;
}
Now to the main class. In the main class where we perform our recursion, we want to create a List of ForeCast objects which store individual temperature and clouds records by traversing the entire XML.
// List collection which will hold all the data parsed from the XML
// in the format defined by the custom type 'ForeCast'
private static List<ForeCast> forecastList = new ArrayList<>();
In the XML the parent to both temperature and clouds elements is time, we would logically check for the time element.
/**
* Logical block
*/
// As per the XML syntax our 2 fields temperature and clouds come
// directly under the Node/Element time
if (node.getNodeName().equals("time")
&& node.getNodeType() == Node.ELEMENT_NODE) {
// Instantiate our custom forecast object
forecastObj = new ForeCast();
Element timeElement = (Element) node;
Thereafter, we would get a handle on temperature and clouds elements which can be set to the ForeCast object.
// Get the temperature element by its tag name within the XML (0th
// index known)
Element tempElement = (Element) timeElement.getElementsByTagName("temperature").item(0);
// Minimum temperature value is selectively picked (for proof of concept)
forecastObj.setTemperature(tempElement.getAttribute("min"));
// Similarly get the clouds element
Element cloudElement = (Element) timeElement.getElementsByTagName("clouds").item(0);
forecastObj.setClouds(cloudElement.getAttribute("value"));
The complete class below:
CustomDomXmlParser.java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class CustomDomXmlParser {
// List collection which will hold all the data parsed from the XML
// in the format defined by the custom type 'ForeCast'
private static List<ForeCast> forecastList = new ArrayList<>();
public static void main(String[] args) throws ParserConfigurationException,
SAXException, IOException {
// Read XML through a URL (a FileInputStream can be used to pick up an
// XML file from the file system)
InputStream path = new URL(
"http://api.openweathermap.org/data/2.5/forecast?q=London,us&mode=xml")
.openStream();
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(path);
// Call to the recursive method with the parent node
traverse(document.getDocumentElement());
// Print the List values collected within the recursive method
for (ForeCast forecastObj : forecastList)
System.out.println(forecastObj);
}
/**
*
* @param node
*/
public static void traverse(Node node) {
// Get the list of Child Nodes immediate to the current node
NodeList list = node.getChildNodes();
// Declare our local instance of forecast object
ForeCast forecastObj = null;
/**
* Logical block
*/
// As per the XML syntax our 2 fields temperature and clouds come
// directly under the Node/Element time
if (node.getNodeName().equals("time")
&& node.getNodeType() == Node.ELEMENT_NODE) {
// Instantiate our custom forecast object
forecastObj = new ForeCast();
Element timeElement = (Element) node;
// Get the temperature element by its tag name within the XML (0th
// index known)
Element tempElement = (Element) timeElement.getElementsByTagName(
"temperature").item(0);
// Minimum temperature value is selectively picked (for proof of
// concept)
forecastObj.setTemperature(tempElement.getAttribute("min"));
// Similarly get the clouds element
Element cloudElement = (Element) timeElement.getElementsByTagName(
"clouds").item(0);
forecastObj.setClouds(cloudElement.getAttribute("value"));
}
// Add our foreCastObj if initialized within this recursion, that is if
// it traverses the time node within the XML, and not in any other case
if (forecastObj != null)
forecastList.add(forecastObj);
/**
* Recursion block
*/
// Iterate over the next child nodes
for (int i = 0; i < list.getLength(); i++) {
Node currentNode = list.item(i);
// Recursively invoke the method for the current node
traverse(currentNode);
}
}
}
The Output
As you can figure out from the screenshot below, we were able to group together the 2 specific elements and assign their values effectively to a Java Collection instance. We delegated the complex parsing of the XML to the generic recursive solution and customized mainly the logical block part. As mentioned, it is a generic solution with minimal customization which can work through all valid XML documents.
Alternatives
Many other alternatives are available, here is a list of open source XML parsers for Java.
However, your approach with PHP and your initial work with Java based parser aligns to the DOM based XML parser solution, simplified by the use of recursion.

I wouldn't suggest implementing your own parse function for XML parsing, since there are already many options out there. My suggestion is the DOM parser. You can find a few examples at the following link. (You can also choose from the other available options.)
http://www.javacodegeeks.com/2013/05/parsing-xml-using-dom-sax-and-stax-parser-in-java.html
You can use commands such as
eElement.getAttribute("id");
Source: http://www.mkyong.com/java/how-to-read-xml-file-in-java-dom-parser/

I agree with what has already been posted about not implementing parse functions yourself.
Instead of DOM/SAX/STAX parsers though, I would suggest using JDOM or XOM, which are external libraries.
Related discussions:
What Java XML library do you recommend (to replace dom4j)?
Should I still be using JDOM with Java 5 or 6?
My gut feeling is that jdom is the one most java developers use. Some use dom4j, some xom, some others, but hardly anybody implements these parsing functions themselves.

Use Java's startElement and endElement callbacks. Note that these belong to SAX handlers (org.xml.sax.helpers.DefaultHandler), not to DOM parsers.

Get XPath of XML Tag

If I have an XML document like below:
<foo>
<foo1>Foo Test 1</foo1>
<foo2>
<another1>
<test10>This is a duplicate</test10>
</another1>
</foo2>
<foo2>
<another1>
<test1>Foo Test 2</test1>
</another1>
</foo2>
<foo3>Foo Test 3</foo3>
<foo4>Foo Test 4</foo4>
</foo>
How do I get the XPath of <test1> for example? So the output should be something like: foo/foo2[2]/another1/test1
I'm guessing the code would look something like this:
public String getXPath(Document document, String xmlTag) {
String xpath = "";
...
//Get the node from xmlTag
//Get the xpath using the node
return xpath;
}
Let's say String XPathVar = getXPath(document, "<test1>");. I need to get back an absolute xpath that will work in the following code:
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression xpr = xpath.compile(XPathVar);
xpr.evaluate(Document, XPathConstants.STRING);
But it can't be a shortcut like //test1 because it will also be used for meta data purposes.
When printing the result out via:
System.out.println(xpr.evaluate(Document, XPathConstants.STRING));
I should get the node's value. So if XPathVar = foo/foo2[2]/another1/test1 then I should get back:
Foo Test 2 and not This is a duplicate

You don't 'get' an xpath in the same way you don't 'get' sql.
An xpath is a query you write based on your understanding of an xml document or schema, just as sql is a query you write based on your understanding of a database schema - you don't 'get' either of them.
It would be possible to generate XPath statements from the DOM simply by walking back up the nodes from a given node, though doing this generically enough, taking into account attribute values on each node, would make the resulting code next to useless. For example (with the warning that this will find only the first node that has a given name; XPath is much more than this, and you may as well just use the XPath //foo2):
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
public class XPathExample
{
private static String getXPath(Node root, String elementName)
{
for (int i = 0; i < root.getChildNodes().getLength(); i++)
{
Node node = root.getChildNodes().item(i);
if (node instanceof Element)
{
if (node.getNodeName().equals(elementName))
{
return "/" + node.getNodeName();
}
else if (node.getChildNodes().getLength() > 0)
{
String xpath = getXPath(node, elementName);
if (xpath != null)
{
return "/" + node.getNodeName() + xpath;
}
}
}
}
return null;
}
private static String getXPath(Document document, String elementName)
{
return document.getDocumentElement().getNodeName() + getXPath(document.getDocumentElement(), elementName);
}
public static void main(String[] args)
{
try
{
Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(
new ByteArrayInputStream(
("<foo><foo1>Foo Test 1</foo1><foo2><another1><test1>Foo Test 2</test1></another1></foo2><foo3>Foo Test 3</foo3><foo4>Foo Test 4</foo4></foo>").getBytes()
)
);
String xpath = "/" + getXPath(document, "test1");
System.out.println(xpath);
Node node1 = (Node)XPathFactory.newInstance().newXPath().compile(xpath).evaluate(document, XPathConstants.NODE);
Node node2 = (Node)XPathFactory.newInstance().newXPath().compile("//test1").evaluate(document, XPathConstants.NODE);
//This evaluates to true, hence you may as well just use the xpath //test1.
System.out.println(node1.equals(node2));
}
catch (Exception e)
{
e.printStackTrace();
}
}
}
Likewise, you could write an XML transformation that turned an XML document into a series of XPath statements, but this transformation would be more complicated than writing the XPath in the first place and so largely pointless.

How's this:
private static String getXPath(Document root, String elementName)
{
try{
XPathExpression expr = XPathFactory.newInstance().newXPath().compile("//" + elementName);
Node node = (Node)expr.evaluate(root, XPathConstants.NODE);
if(node != null) {
return getXPath(node);
}
}
catch(XPathExpressionException e) { }
return null;
}
private static String getXPath(Node node) {
if(node == null || node.getNodeType() != Node.ELEMENT_NODE) {
return "";
}
return getXPath(node.getParentNode()) + "/" + node.getNodeName();
}
Note that this is first locating the node (using XPath) and then using the located node to get its XPath. Quite the roundabout approach to get a value you already have.
Working ideone example: http://ideone.com/EL4783
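Neither answer so far emits the positional predicates the question asks for (foo/foo2[2]/another1/test1). A minimal sketch that adds them by counting preceding siblings with the same name is shown here; the class IndexedXPath and its method names are hypothetical. It always writes an index, even [1], which keeps the generated path unambiguous.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: builds an absolute XPath with a position on every step.
class IndexedXPath {
    static String pathOf(Node node) {
        if (node == null || node.getNodeType() != Node.ELEMENT_NODE) {
            return ""; // reached the Document node above the root element
        }
        // XPath positions are 1-based and count only same-named element siblings.
        int index = 1;
        for (Node s = node.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
            if (s.getNodeType() == Node.ELEMENT_NODE && s.getNodeName().equals(node.getNodeName())) {
                index++;
            }
        }
        return pathOf(node.getParentNode()) + "/" + node.getNodeName() + "[" + index + "]";
    }

    // Path of the first element with the given tag name, or null if there is none.
    static String pathOfFirst(Document doc, String elementName) {
        NodeList matches = doc.getElementsByTagName(elementName);
        return matches.getLength() == 0 ? null : pathOf(matches.item(0));
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Evaluates the generated path back against the document as a string.
    static String evaluate(Document doc, String path) {
        try {
            return (String) XPathFactory.newInstance().newXPath()
                    .evaluate(path, doc, XPathConstants.STRING);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + path, e);
        }
    }
}
```

For the document in the question, pathOfFirst(doc, "test1") yields /foo[1]/foo2[2]/another1[1]/test1[1], and evaluating that path returns "Foo Test 2" rather than the duplicate.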

using conditions in XPath expressions

I need to parse through an XML document in the database and search for a given expression in it. Then, I must return a String value if the given expression is present in the XML else I need to parse through the next expression and return another String value and so on.
I achieved this by using the following code:
// An xml document is passed as a Node when getEntryType() method is called
public static class XMLTextFields {
public static String getEntryType(Node target) throws XPathExpressionException {
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
String entryType = null;
String [] expression = new String [] {"./libx:package", "./libx:libapp", "./libx:module"};
String [] type = new String [] {"Package", "Libapp", "Module" };
for (int i = 0; i < 3; i ++) {
XPathExpression expr = xpath.compile(expression[i]);
Object result = expr.evaluate(target, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
if (nodes.getLength() == 0)
continue;
entryType = (type[i]);
}
return entryType;
}
}
I am wondering if there is a simpler way to do this? Meaning, is there a way to use the "expression" like a function which returns a string if the expression is present in the xml.
I am guessing I should be able to do something like this but am not exactly sure:
String [] Expression = new String [] {"[./libx:package]\"Package\"", ....}
Meaning, return "Package" if libx:package node exists in the given XML

If your XPath processor is version 2, you can use if expressions: http://www.w3.org/TR/xpath20/#id-conditionals .
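The JDK's built-in javax.xml.xpath implementation supports XPath 1.0, so if expressions are not available there unless a third-party processor is plugged in. A minimal XPath 1.0 sketch of the same idea keeps the labels in an ordered map and asks XPath only for a boolean. The class EntryTypeResolver is a hypothetical name, matching by local-name() is an assumption made to avoid registering the libx namespace, and unlike the loop in the question this version returns the first match rather than the last.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: maps the first child element that exists to a label.
class EntryTypeResolver {
    // Insertion order defines the priority of the checks.
    private static final Map<String, String> LABELS = new LinkedHashMap<>();
    static {
        LABELS.put("package", "Package");
        LABELS.put("libapp", "Libapp");
        LABELS.put("module", "Module");
    }

    static String entryType(Node target) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            for (Map.Entry<String, String> entry : LABELS.entrySet()) {
                String expression = "boolean(./*[local-name()='" + entry.getKey() + "'])";
                Boolean present = (Boolean) xpath.evaluate(expression, target, XPathConstants.BOOLEAN);
                if (present) {
                    return entry.getValue();
                }
            }
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Bad XPath expression", e);
        }
        return null; // none of the known child elements is present
    }

    // Convenience for trying it out: parses a string and returns the root element.
    static Node parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // so local-name() ignores the libx prefix
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```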

You can use XSLT here. In XSLT you can check the node name by using
<xsl:value-of select="*[starts-with(name(),'libx:package')]" />
or you can check using
<xsl:if test="name()='libx:package'">
<!-- Your custom elements here... -->
</xsl:if>
You can check for the existence of an element or attribute this way to validate specific needs.
Hope this helps.

Yes, there is: just use XPath functions in your expression:
XPathExpression exp = xpath.compile("local-name(*[local-name() = 'package'])");
// will return "package" for any matching elements
exp.evaluate(target, XPathConstants.STRING);
But this will return "package" instead of "Package" (note the capital P), so the result still has to be mapped to the display name.
Below is the test code:
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import java.util.Map;
import java.util.HashMap;
public class Test {
private static Map<String, String> mappings = new HashMap<String, String>();
static {
mappings.put("package", "Package");
mappings.put("libapp", "Application");
mappings.put("module", "Module");
}
public static void main(String[] args) throws Throwable {
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
String entryType = null;
XPathExpression [] expressions = new XPathExpression[] {
xpath.compile("local-name(*[local-name() = 'package'])"),
xpath.compile("local-name(*[local-name() = 'libapp'])"),
xpath.compile("local-name(*[local-name() = 'module'])")
};
DocumentBuilderFactory fac = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = fac.newDocumentBuilder();
Document doc = parser.parse(args[0]);
for(int i = 0; i < expressions.length; i++) {
String found = (String) expressions[i].evaluate(doc.getDocumentElement(),
XPathConstants.STRING);
entryType = mappings.get(found);
if(entryType != null && !entryType.trim().isEmpty()) {
break;
}
}
System.out.println(entryType);
}
}
Contents of text file:
<?xml version="1.0"?>
<root xmlns:libx="urn:libex">
<libx:package>mypack</libx:package>
<libx:libapp>myapp</libx:libapp>
<libx:module>mymod</libx:module>
</root>

In XPath 1.0
concat(translate(substring(local-name(libx:package|libx:libapp|libx:module),
1,
1),
'plm',
'PLM'),
substring(local-name(libx:package|libx:libapp|libx:module),2))
EDIT: It was difficult to work out the path because no input sample was provided...

@ALL: Thanks!
I used:
XPathExpression expr = xpath.compile("concat(substring('Package',0,100*boolean(//libx:package))," + "substring('Libapp',0,100*boolean(//libx:libapp)),substring('Module',0,100*boolean(//libx:module)))");
expr.evaluate(target, XPathConstants.STRING);
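The trick relies on XPath 1.0 arithmetic: boolean(...) becomes 1 or 0 when multiplied, so each substring call returns either the whole label or an empty string. Below is a minimal sketch of how it could be run end to end. The class name, the entryType helper and the urn:libex namespace URI (taken from the sample file in the earlier answer) are assumptions, not part of the original code.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EntryTypeDemo {

    // Assumed namespace URI for the libx prefix (from the sample file above).
    static final String LIBX_URI = "urn:libex";

    // Each substring(...) yields the full label when the element exists, else "".
    static final String EXPRESSION =
          "concat(substring('Package',0,100*boolean(//libx:package)),"
        + "substring('Libapp',0,100*boolean(//libx:libapp)),"
        + "substring('Module',0,100*boolean(//libx:module)))";

    // Hypothetical helper: evaluates the expression against an XML string.
    static String entryType(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required so libx: can be matched by namespace
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Tell the XPath engine which URI the libx prefix stands for.
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return "libx".equals(prefix) ? LIBX_URI : XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) {
                    return LIBX_URI.equals(uri) ? "libx" : null;
                }
                @Override public Iterator<String> getPrefixes(String uri) {
                    return LIBX_URI.equals(uri)
                            ? Collections.singletonList("libx").iterator()
                            : Collections.<String>emptyIterator();
                }
            });
            return xpath.evaluate(EXPRESSION, doc);
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate expression", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root xmlns:libx=\"urn:libex\"><libx:libapp>myapp</libx:libapp></root>";
        // prints Libapp
        System.out.println(entryType(xml));
    }
}
```

If more than one of the three elements is present, the labels are simply concatenated (for example "PackageLibappModule"), so this approach assumes the document contains at most one of them.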

XML child node attribute value

I'm trying to read an XML file, e.g.:
<entry>
<title>FEED TITLE</title>
<id>5467sdad98787ad3149878sasda</id>
<tempi type="application/xml">
<conento xmlns="http://mydomainname.com/xsd/radiofeed.xsd" madeIn="USA" />
</tempi>
</entry>
Here is my attempt at coding this. Needless to say, it was not successful, which is why I started a bounty: http://pastebin.com/huKP4KED
Bounty update:
I have really tried to do this for days now and didn't expect it to be so hard. I'll accept useful links/books/tutorials, but I prefer code because I need this done yesterday.
Here is what I need, concerning the XML above:
I need to get the values of title and id,
the attribute value of tempi, as well as the madeIn attribute value of contento.
What is the best way to do this?
EDIT:
@Pascal Thivent
Maybe creating a method would be a good idea, like public String getValue(String xml, Element elementname), where you specify the tag name and the method returns the tag value, or the tag attribute (maybe with its name given as an additional method argument) if the value is not available.
What I really want is to get a certain tag value, or an attribute if the tag value is not available, so I'm still thinking about the best way to do this, since I've never done it before.

The best solution for this is to use XPath. Your pastebin has expired, but here's what I gathered. Let's say we have the following feed.xml file:
<?xml version="1.0" encoding="UTF-8" ?>
<entries>
<entry>
<title>FEED TITLE 1</title>
<id>id1</id>
<tempi type="type1">
<conento xmlns="dontcare?" madeIn="MadeIn1" />
</tempi>
</entry>
<entry>
<title>FEED TITLE 2</title>
<id>id2</id>
<tempi type="type2">
<conento xmlns="dontcare?" madeIn="MadeIn2" />
</tempi>
</entry>
<entry>
<id>id3</id>
</entry>
</entries>
Here's a short but compilable and runnable proof of concept (with the feed.xml file in the same directory).
import javax.xml.xpath.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import java.io.*;
import java.util.*;
public class XPathTest {
static class Entry {
final String title, id, origin, type;
Entry(String title, String id, String origin, String type) {
this.title = title;
this.id = id;
this.origin = origin;
this.type = type;
}
@Override public String toString() {
return String.format("%s:%s(%s)[%s]", id, title, origin, type);
}
}
final static XPath xpath = XPathFactory.newInstance().newXPath();
static String evalString(Node context, String path) throws XPathExpressionException {
return (String) xpath.evaluate(path, context, XPathConstants.STRING);
}
public static void main(String[] args) throws Exception {
File file = new File("feed.xml");
Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(file);
NodeList entriesNodeList = (NodeList) xpath.evaluate("//entry", document, XPathConstants.NODESET);
List<Entry> entries = new ArrayList<Entry>();
for (int i = 0; i < entriesNodeList.getLength(); i++) {
Node entryNode = entriesNodeList.item(i);
entries.add(new Entry(
evalString(entryNode, "title"),
evalString(entryNode, "id"),
evalString(entryNode, "tempi/conento/@madeIn"),
evalString(entryNode, "tempi/@type")
));
}
for (Entry entry : entries) {
System.out.println(entry);
}
}
}
This produces the following output:
id1:FEED TITLE 1(MadeIn1)[type1]
id2:FEED TITLE 2(MadeIn2)[type2]
id3:()[]
Note how using XPath makes the value retrieval very simple, intuitive, readable, and straightforward, and "missing" values are also gracefully handled.
API links
package javax.xml.xpath
http://www.w3.org/TR/xpath
Wikipedia/XPath

Use Element.getAttribute and Element.setAttribute.
In your example you call ((Node) content.item(0)).getFirstChild().getAttributes(). Assuming that content is a typo and you mean contento, getFirstChild is correctly returning null, as contento has no children. Try ((Node) contento.item(0)).getAttributes() instead.
Another issue is that by using getFirstChild and getChildNodes().item(0) without checking the return value, you run the risk of picking up child text nodes (such as the whitespace between tags) instead of the element you want.
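One way to avoid picking up text nodes is to walk the children and stop at the first one that really is an element. The following is only a sketch: firstChildElement and madeIn are hypothetical helper names, and the element name conento is spelled as in the question's XML.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class FirstChildElementDemo {

    // Hypothetical helper: returns the first child that is an element,
    // skipping text and comment nodes, or null if there is none.
    static Element firstChildElement(Node parent) {
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) child;
            }
        }
        return null;
    }

    // Hypothetical helper: reads madeIn from the first element inside <tempi>,
    // returning null when <tempi> or its child element is missing.
    static String madeIn(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node tempi = doc.getElementsByTagName("tempi").item(0);
            if (tempi == null) {
                return null;
            }
            Element child = firstChildElement(tempi);
            return child == null ? null : child.getAttribute("madeIn");
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        // The line breaks make the first child of <tempi> a whitespace text node.
        String xml = "<entry>\n"
                   + "  <tempi type=\"application/xml\">\n"
                   + "    <conento madeIn=\"USA\"/>\n"
                   + "  </tempi>\n"
                   + "</entry>";
        // prints USA
        System.out.println(madeIn(xml));
    }
}
```

With the indented XML above, tempi.getFirstChild() would be the whitespace text node, which has no attributes, so the loop is what makes the lookup safe.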

As pointed out, <contento> doesn't have any child so instead of:
(contento.item(0)).getFirstChild().getAttributes()
You should treat the Node as Element and use getAttribute(String), something like this:
((Element)contento.item(0)).getAttribute("madeIn")
Here is a modified version of your code (it's not the most robust code I've written):
InputStream inputStream = new ByteArrayInputStream(xml.getBytes());
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(inputStream);
doc.getDocumentElement().normalize();
System.out.println("Root element " + doc.getDocumentElement().getNodeName());
NodeList nodeLst = doc.getElementsByTagName("entry");
System.out.println("Information of all entries");
for (int s = 0; s < nodeLst.getLength(); s++) {
Node fstNode = nodeLst.item(s);
if (fstNode.getNodeType() == Node.ELEMENT_NODE) {
Element fstElmnt = (Element) fstNode;
NodeList title = fstElmnt.getElementsByTagName("title").item(0).getChildNodes();
System.out.println("Title : " + (title.item(0)).getNodeValue());
NodeList id = fstElmnt.getElementsByTagName("id").item(0).getChildNodes();
System.out.println("Id: " + (id.item(0)).getNodeValue());
Node tempiNode = fstElmnt.getElementsByTagName("tempi").item(0);
System.out.println("Type : " + ((Element) tempiNode).getAttribute("type"));
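// Look the child up by name: getChildNodes().item(0) can be a whitespace text node.
Node contento = ((Element) tempiNode).getElementsByTagName("conento").item(0);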
System.out.println("Made in : " + ((Element) contento).getAttribute("madeIn"));
}
}
Running it on your XML snippet produces the following output:
Root element entry
Information of all entries
Title : FEED TITLE
Id: 5467sdad98787ad3149878sasda
Type : application/xml
Made in : USA
By the way, did you consider using something like Rome instead?
