What is the easy and best way to read below xml? - java

I am new to XML.
Please let me know easy and best way to read xml below in java. In my xml, I have queries as root and query as child element in it.
<queries>
<query id="getUserByName">
select * from users where name=?
</query>
<query id="getUserByEmail">
select * from users where email=?
</query>
</queries>
I will pass query id, based on that we need to fetch corresponding query. Please help me with code for better understanding.

With XPath, it's simple.
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;
public class Test {
public static final String xml =
"<queries>"
+ " <query id=\"getUserByName\">"
+ " select * from users where name=?"
+ " </query>"
+ " <query id=\"getUserByEmail\">"
+ " select * from users where email=?"
+ " </query>"
+ "</queries>";
public static void main(String[] args) throws Exception {
System.out.println(getQuery("getUserByName"));
System.out.println(getQuery("getUserByEmail"));
}
public static String getQuery (String id) throws Exception {
InputStream is = new ByteArrayInputStream(xml.getBytes("UTF8"));
InputSource inputSource = new InputSource(is);
XPath xpath = XPathFactory.newInstance().newXPath();
return xpath.evaluate("/queries/query[#id='" + id +"']", inputSource);
}
}

A very easy code to implement would be JAXB parser. Personally I love this one as it establishes everything using simple annotations.
Steps.
Create a couple of bean classes with the structure of your xml. In your case Queries class containing List<Query>. Define Query to contain a string variable. If you take the time to go through the annotations, I'm sure you can do this even with a single bean class but with multiple annotations.
Pass your string of XML to a JAXB context of Queries class and you are done.
You'll get one Java object for each Query tag. Once you get the bean class, manipulation becomes easy.
Ref:
JAXB Hello World Example
JAXB Tutorial

A good solution would be to load these queries to a map first and access it later based on the map. To load queries to a map you could do something like:
Map<String, String> queriesMap = new HashMap<String, String>();
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
ByteArrayInputStream inputStream = new ByteArrayInputStream("<queries> <query id=\"getUserByName\"> select * from users where name=? </query> <query id=\"getUserByEmail\"> select * from users where email=? </query></queries>".getBytes());
// you could use something like: new FileInputStream("queries.xml");
Document doc = documentBuilder.parse(inputStream);
// get queries elements
NodeList queriesNodes = doc.getElementsByTagName("queries");
// iterate over it
for (int i = 0; i < queriesNodes.getLength(); i++) {
// get queries element
Node node = queriesNodes.item(i);
// get query elements (theoretically)
NodeList queryNodes = node.getChildNodes();
for (int j = 0; j < queryNodes.getLength(); j++) {
Node queryNode = queryNodes.item(j);
// if not element just skip to next one (in case of text nodes for the white spaces)
if (!(queryNode.getNodeType() == Node.ELEMENT_NODE)) {
continue;
}
// get query
Node idAttr = queryNode.getAttributes().getNamedItem("id");
if (idAttr != null) {
queriesMap.put(idAttr.getTextContent(), StringUtils.trim(queryNode.getTextContent()));
}
}
}
System.out.println(queriesMap);

Related

Java XML JDOM2 XPath - Read text value from XML attribute and element using XPath expression

The program should be allowed to read from an XML file using XPath expressions.
I already started the project using JDOM2, switching to another API is unwanted.
The difficulty is, that the program does not know beforehand if it has to read an element or an attribute.
Does the API provide any function to receive the content (string) just by giving it the XPath expression?
From what I know about XPath in JDOM2, it uses objects of different types to evaluate XPath expressions pointing to attributes or elements.
I am only interested in the content of the attribute / element where the XPath expression points to.
Here is an example XML file:
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book category="COOKING">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="CHILDREN">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="WEB">
<title lang="en">XQuery Kick Start</title>
<author>James McGovern</author>
<author>Per Bothner</author>
<author>Kurt Cagle</author>
<author>James Linn</author>
<author>Vaidyanathan Nagarajan</author>
<year>2003</year>
<price>49.99</price>
</book>
<book category="WEB">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
This is what my program looks like:
package exampleprojectgroup;
import java.io.IOException;
import java.util.LinkedList;
import java.util.List;
import org.jdom2.Attribute;
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.filter.Filters;
import org.jdom2.input.SAXBuilder;
import org.jdom2.input.sax.XMLReaders;
import org.jdom2.xpath.XPathExpression;
import org.jdom2.xpath.XPathFactory;
public class ElementAttribute2String
{
ElementAttribute2String()
{
run();
}
public void run()
{
final String PATH_TO_FILE = "c:\\readme.xml";
/* It is essential that the program has to work with a variable amount of XPath expressions. */
LinkedList<String> xPathExpressions = new LinkedList<>();
/* Simulate user input.
* First XPath expression points to attribute,
* second one points to element.
* Many more expressions follow in a real situation.
*/
xPathExpressions.add( "/bookstore/book/#category" );
xPathExpressions.add( "/bookstore/book/price" );
/* One list should be sufficient to store the result. */
List<Element> elementsResult = null;
List<Attribute> attributesResult = null;
List<Object> objectsResult = null;
try
{
SAXBuilder saxBuilder = new SAXBuilder( XMLReaders.NONVALIDATING );
Document document = saxBuilder.build( PATH_TO_FILE );
XPathFactory xPathFactory = XPathFactory.instance();
int i = 0;
for ( String string : xPathExpressions )
{
/* Works only for elements, uncomment to give it a try. */
// XPathExpression<Element> xPathToElement = xPathFactory.compile( xPathExpressions.get( i ), Filters.element() );
// elementsResult = xPathToElement.evaluate( document );
// for ( Element element : elementsResult )
// {
// System.out.println( "Content of " + string + ": " + element.getText() );
// }
/* Works only for attributes, uncomment to give it a try. */
// XPathExpression<Attribute> xPathToAttribute = xPathFactory.compile( xPathExpressions.get( i ), Filters.attribute() );
// attributesResult = xPathToAttribute.evaluate( document );
// for ( Attribute attribute : attributesResult )
// {
// System.out.println( "Content of " + string + ": " + attribute.getValue() );
// }
/* I want to receive the content of the XPath expression as a string
* without having to know if it is an attribute or element beforehand.
*/
XPathExpression<Object> xPathExpression = xPathFactory.compile( xPathExpressions.get( i ) );
objectsResult = xPathExpression.evaluate( document );
for ( Object object : objectsResult )
{
if ( object instanceof Attribute )
{
System.out.println( "Content of " + string + ": " + ((Attribute)object).getValue() );
}
else if ( object instanceof Element )
{
System.out.println( "Content of " + string + ": " + ((Element)object).getText() );
}
}
i++;
}
}
catch ( IOException ioException )
{
ioException.printStackTrace();
}
catch ( JDOMException jdomException )
{
jdomException.printStackTrace();
}
}
}
Another thought is to search for the '#' character in the XPath expression, to determine if it is pointing to an attribute or element.
This gives me the desired result, though I wish there was a more elegant solution.
Does the JDOM2 API provide anything useful for this problem?
Could the code be redesigned to meet my requirements?
Thank you in advance!
XPath expressions are hard to type/cast because they need to be compiled in a system that is sensitive to the return type of the XPath functions/values that are in the expression. JDOM relies on third-party code to do that, and that third party code does not have a mechanism to correlate those types at your JDOM code's compile time. Note that XPath expressions can return a number of different types of content, including String, boolean, Number, and Node-List-like content.
In most cases, the XPath expression return type is known before the expression is evaluated, and the programmer has the "right" casting/expectations for processing the results.
In your case, you don't, and the expression is more dynamic.
I recommend that you declare a helper function to process the content:
private static final Function extractValue(Object source) {
if (source instanceof Attribute) {
return ((Attribute)source).getValue();
}
if (source instanceof Content) {
return ((Content)source).getValue();
}
return String.valueOf(source);
}
This at least will neaten up your code, and if you use Java8 streams, can be quite compact:
List<String> values = xPathExpression.evaluate( document )
.stream()
.map(o -> extractValue(o))
.collect(Collectors.toList());
Note that the XPath spec for Element nodes is that the string-value is the concatination of the Element's text() content as well as all child elements' content. Thus, in the following XML snippet:
<a>bilbo <b>samwise</b> frodo</a>
the getValue() on the a element will return bilbo samwise frodo, but the getText() will return bilbo frodo. Choose which mechanism you use for the value extraction carefully.
I had the exact same problem and took the approach of recognizing when an attribute is the focus of the Xpath. I solved with two functions. The first complied the XPathExpression for later use:
XPathExpression xpExpression;
if (xpath.matches( ".*/#[\\w]++$")) {
// must be an attribute value we're after..
xpExpression = xpfac.compile(xpath, Filters.attribute(), null, myNSpace);
} else {
xpExpression = xpfac.compile(xpath, Filters.element(), null, myNSpace);
}
The second evaluates and returns a value:
Object target = xpExpression.evaluateFirst(baseEl);
if (target != null) {
String value = null;
if (target instanceof Element) {
Element targetEl = (Element) target;
value = targetEl.getTextNormalize();
} else if (target instanceof Attribute) {
Attribute targetAt = (Attribute) target;
value = targetAt.getValue();
}
I suspect its a matter of coding style whether you prefer the helper function suggested in the previous answer or this approach. Either will work.

Using Java to parse XML

I have made a PHP script that parses an XML file. This is not easy to use and I wanted to implement it in Java.
Inside the first element there are various count of wfs:member elements I loop through:
foreach ($data->children("wfs", true)->member as $member) { }
This was easy to do with Java:
NodeList wfsMember = doc.getElementsByTagName("wfs:member");
for(int i = 0; i < wfsMember.getLength(); i++) { }
I have opened the XML file like this
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
Document doc = documentBuilder.parse(WeatherDatabaseUpdater.class.getResourceAsStream("wfs.xml"));
Then I need to get a attribute from an element called observerdProperty. In PHP this is simple:
$member->
children("omso", true)->PointTimeSeriesObservation->
children("om", true)->observedProperty->
attributes("xlink", true)->href
But in Java, how do I do this? Do I need to use getElementsByTagName and loop through them if I want to go deeper in the structure?`
In PHP the whole script looks the following.
foreach ($data->children("wfs", true)->member as $member) {
$dataType = $dataTypes[(string) $member->
children("omso", true)->PointTimeSeriesObservation->
children("om", true)->observedProperty->
attributes("xlink", true)->href];
foreach ($member->
children("omso", true)->PointTimeSeriesObservation->
children("om", true)->result->
children("wml2", true)->MeasurementTimeseries->
children("wml2", true)->point as $point) {
$time = $point->children("wml2", true)->MeasurementTVP->children("wml2", true)->time;
$value = $point->children("wml2", true)->MeasurementTVP->children("wml2", true)->value;
$data[$dataType][] = array($time, $value)
}
}
In the second foreach I loop through the observation elements and get the time and value data from it. Then I save it in an array. If I need to loop through the elements in Java the way I described, this is very hard to implement. I don't think that is the case, so could someone advice me how to implement something similar in Java?
The easiest way, if performance is not a main concern, is probably XPath. With XPath, you can find nodes and attributes simply by specifying a path.
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile(<xpath_expression>);
NodeList nl = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
The xpath_expression could be as simple as
"string(//member/observedProperty/#href)"
For more information about XPath, XPath Tutorial from W3Schools is pretty good.
You have few variations how to implement XML parsing at Java.
The most common is: DOM, SAX, StAX.
Everyone one has pros and cons. With Dom and Sax you able to validate your xml with xsd schema. But Stax works without xsd validation, and much faster.
For example, xml file:
<?xml version="1.0" encoding="UTF-8"?>
<staff xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="oldEmployee.xsd">
<employee>
<name>Carl Cracker</name>
<salary>75000</salary>
<hiredate year="1987" month="12" day="15" />
</employee>
<employee>
<name>Harry Hacker</name>
<salary>50000</salary>
<hiredate year="1989" month="10" day="1" />
</employee>
<employee>
<name>Tony Tester</name>
<salary>40000</salary>
<hiredate year="1990" month="3" day="15" />
</employee>
</staff>
The longest at implementation (to my mind) DOM parser:
class DomXmlParser {
private Document document;
List<Employee> empList = new ArrayList<>();
public SchemaFactory schemaFactory;
public final String JAXP_SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
public final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";
public DomXmlParser() {
try {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
DocumentBuilder builder = factory.newDocumentBuilder();
document = builder.parse(new File(EMPLOYEE_XML.getFilename()));
} catch (Exception e) {
e.printStackTrace();
}
}
public List<Employee> parseFromXmlToEmployee() {
NodeList nodeList = document.getDocumentElement().getChildNodes();
for (int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i);
if (node instanceof Element) {
Employee emp = new Employee();
NodeList childNodes = node.getChildNodes();
for (int j = 0; j < childNodes.getLength(); j++) {
Node cNode = childNodes.item(j);
// identify the child tag of employees
if (cNode instanceof Element) {
switch (cNode.getNodeName()) {
case "name":
emp.setName(text(cNode));
break;
case "salary":
emp.setSalary(Double.parseDouble(text(cNode)));
break;
case "hiredate":
int yearAttr = Integer.parseInt(cNode.getAttributes().getNamedItem("year").getNodeValue());
int monthAttr = Integer.parseInt(cNode.getAttributes().getNamedItem("month").getNodeValue());
int dayAttr = Integer.parseInt(cNode.getAttributes().getNamedItem("day").getNodeValue());
emp.setHireDay(yearAttr, monthAttr - 1, dayAttr);
break;
}
}
}
empList.add(emp);
}
}
return empList;
}
private String text(Node cNode) {
return cNode.getTextContent().trim();
}
}
SAX parser:
class SaxHandler extends DefaultHandler {
private Stack<String> elementStack = new Stack<>();
private Stack<Object> objectStack = new Stack<>();
public List<Employee> employees = new ArrayList<>();
Employee employee = null;
#Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
this.elementStack.push(qName);
if ("employee".equals(qName)) {
employee = new Employee();
this.objectStack.push(employee);
this.employees.add(employee);
}
if("hiredate".equals(qName))
{
int yearatt = Integer.parseInt(attributes.getValue("year"));
int monthatt = Integer.parseInt(attributes.getValue("month"));
int dayatt = Integer.parseInt(attributes.getValue("day"));
if (employee != null) {
employee.setHireDay(yearatt, monthatt - 1, dayatt) ;
}
}
}
#Override
public void endElement(String uri, String localName, String qName) throws SAXException {
this.elementStack.pop();
if ("employee".equals(qName)) {
Object objects = this.objectStack.pop();
}
}
#Override
public void characters(char[] ch, int start, int length) throws SAXException {
String value = new String(ch, start, length).trim();
if (value.length() == 0) return; // skip white space
if ("name".equals(currentElement())) {
employee = (Employee) this.objectStack.peek();
employee.setName(value);
} else if ("salary".equals(currentElement()) && "employee".equals(currentParrentElement())) {
employee.setSalary(Double.parseDouble(value));
}
}
private String currentElement() {
return this.elementStack.peek();
}
private String currentParrentElement() {
if (this.elementStack.size() < 2) return null;
return this.elementStack.get(this.elementStack.size() - 2);
}
}
Stax parser:
class StaxXmlParser {
private List<Employee> employeeList;
private Employee currentEmployee;
private String tagContent;
private String attrContent;
private XMLStreamReader reader;
public StaxXmlParser(String filename) {
employeeList = null;
currentEmployee = null;
tagContent = null;
try {
XMLInputFactory factory = XMLInputFactory.newFactory();
reader = factory.createXMLStreamReader(new FileInputStream(new File(filename)));
parseEmployee();
} catch (Exception e) {
e.printStackTrace();
}
}
public List<Employee> parseEmployee() throws XMLStreamException {
while (reader.hasNext()) {
int event = reader.next();
switch (event) {
case XMLStreamConstants.START_ELEMENT:
if ("employee".equals(reader.getLocalName())) {
currentEmployee = new Employee();
}
if ("staff".equals(reader.getLocalName())) {
employeeList = new ArrayList<>();
}
if ("hiredate".equals(reader.getLocalName())) {
int yearAttr = Integer.parseInt(reader.getAttributeValue(null, "year"));
int monthAttr = Integer.parseInt(reader.getAttributeValue(null, "month"));
int dayAttr = Integer.parseInt(reader.getAttributeValue(null, "day"));
currentEmployee.setHireDay(yearAttr, monthAttr - 1, dayAttr);
}
break;
case XMLStreamConstants.CHARACTERS:
tagContent = reader.getText().trim();
break;
case XMLStreamConstants.ATTRIBUTE:
int count = reader.getAttributeCount();
for (int i = 0; i < count; i++) {
System.out.printf("count is: %d%n", count);
}
break;
case XMLStreamConstants.END_ELEMENT:
switch (reader.getLocalName()) {
case "employee":
employeeList.add(currentEmployee);
break;
case "name":
currentEmployee.setName(tagContent);
break;
case "salary":
currentEmployee.setSalary(Double.parseDouble(tagContent));
break;
}
}
}
return employeeList;
}
}
And some main() test:
public static void main(String[] args) {
long startTime, elapsedTime;
Main main = new Main();
startTime = System.currentTimeMillis();
main.testSaxParser(); // test
elapsedTime = System.currentTimeMillis() - startTime;
System.out.println(String.format("Parsing time is: %d ms%n", elapsedTime / 1000));
startTime = System.currentTimeMillis();
main.testStaxParser(); // test
elapsedTime = System.currentTimeMillis() - startTime;
System.out.println(String.format("Parsing time is: %d ms%n", elapsedTime / 1000));
startTime = System.currentTimeMillis();
main.testDomParser(); // test
elapsedTime = System.currentTimeMillis() - startTime;
System.out.println(String.format("Parsing time is: %d ms%n", elapsedTime / 1000));
}
Output:
Using SAX Parser:
-----------------
Employee { name=Carl Cracker, salary=75000.0, hireDay=Tue Dec 15 00:00:00 EET 1987 }
Employee { name=Harry Hacker, salary=50000.0, hireDay=Sun Oct 01 00:00:00 EET 1989 }
Employee { name=Tony Tester, salary=40000.0, hireDay=Thu Mar 15 00:00:00 EET 1990 }
Parsing time is: 106 ms
Using StAX Parser:
------------------
Employee { name=Carl Cracker, salary=75000.0, hireDay=Tue Dec 15 00:00:00 EET 1987 }
Employee { name=Harry Hacker, salary=50000.0, hireDay=Sun Oct 01 00:00:00 EET 1989 }
Employee { name=Tony Tester, salary=40000.0, hireDay=Thu Mar 15 00:00:00 EET 1990 }
Parsing time is: 5 ms
Using DOM Parser:
-----------------
Employee { name=Carl Cracker, salary=75000.0, hireDay=Tue Dec 15 00:00:00 EET 1987 }
Employee { name=Harry Hacker, salary=50000.0, hireDay=Sun Oct 01 00:00:00 EET 1989 }
Employee { name=Tony Tester, salary=40000.0, hireDay=Thu Mar 15 00:00:00 EET 1990 }
Parsing time is: 13 ms
You can see some glimpse view at there variations.
But at java exist other as JAXB - You need to have xsd schema and accord to this schema you generate classes. After this you this can use unmarchal() to read from xml file:
public class JaxbDemo {
public static void main(String[] args) {
try {
long startTime = System.currentTimeMillis();
// create jaxb and instantiate marshaller
JAXBContext context = JAXBContext.newInstance(Staff.class.getPackage().getName());
FileInputStream in = new FileInputStream(new File(Files.EMPLOYEE_XML.getFilename()));
System.out.println("Output from employee XML file");
Unmarshaller um = context.createUnmarshaller();
Staff staff = (Staff) um.unmarshal(in);
// print employee list
for (Staff.Employee emp : staff.getEmployee()) {
System.out.println(emp);
}
long elapsedTime = System.currentTimeMillis() - startTime;
System.out.println(String.format("Parsing time is: %d ms%n", elapsedTime));
} catch (Exception e) {
e.printStackTrace();
}
}
}
I tried this one approach as before, result is next:
Employee { name='Carl Cracker', salary=75000, hiredate=1987-12-15 } }
Employee { name='Harry Hacker', salary=50000, hiredate=1989-10-1 } }
Employee { name='Tony Tester', salary=40000, hiredate=1990-3-15 } }
Parsing time is: 320 ms
I added another toString(), and it has different hire day format.
Here is few links that is interesting for you:
Java & XML Tutorial
JAXB Tutorial
DOM Parser through Recursion
Using a DOM parser, you can easily get into a mess of nested for loops as you've already pointed out. Nevertheless, DOM structure is represented by Node containing child nodes collection in the form of a NodeList where each element is again a Node - this becomes a perfect candidate for recursion.
Sample XML
To showcase the ability of DOM parser discounting the size of the XML, I took the example of a hosted sample OpenWeatherMap XML.
Searching by city name in XML format
This XML contains London's weather forecast for every 3 hour duration. This XML makes a good case of reading through a relatively large data set and extracting specific information through attributes within the child elements.
In the snapshot, we are targeting to gather the Elements marked by the arrows.
The Code
We start of by creating a Custom class to hold temperature and clouds values. We would also override toString() of this custom class to conveniently print our records.
ForeCast.java
public class ForeCast {
/**
* Overridden toString() to conveniently print the results
*/
#Override
public String toString() {
return "The minimum temperature is: " + getTemperature()
+ " and the weather overall: " + getClouds();
}
public String getTemperature() {
return temperature;
}
public void setTemperature(String temperature) {
this.temperature = temperature;
}
public String getClouds() {
return clouds;
}
public void setClouds(String clouds) {
this.clouds = clouds;
}
private String temperature;
private String clouds;
}
Now to the main class. In the main class where we perform our recursion, we want to create a List of ForeCast objects which store individual temperature and clouds records by traversing the entire XML.
// List collection which is would hold all the data parsed through the XML
// in the format defined by the custom type 'ForeCast'
private static List<ForeCast> forecastList = new ArrayList<>();
In the XML the parent to both temperature and clouds elements is time, we would logically check for the time element.
/**
* Logical block
*/
// As per the XML syntax our 2 fields temperature and clouds come
// directly under the Node/Element time
if (node.getNodeName().equals("time")
&& node.getNodeType() == Node.ELEMENT_NODE) {
// Instantiate our custom forecast object
forecastObj = new ForeCast();
Element timeElement = (Element) node;
Thereafter, we would get a handle on temperature and clouds elements which can be set to the ForeCast object.
// Get the temperature element by its tag name within the XML (0th
// index known)
Element tempElement = (Element) timeElement.getElementsByTagName("temperature").item(0);
// Minimum temperature value is selectively picked (for proof of concept)
forecastObj.setTemperature(tempElement.getAttribute("min"));
// Similarly get the clouds element
Element cloudElement = (Element) timeElement.getElementsByTagName("clouds").item(0);
forecastObj.setClouds(cloudElement.getAttribute("value"));
The complete class below:
CustomDomXmlParser.java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class CustomDomXmlParser {
// List collection which is would hold all the data parsed through the XML
// in the format defined by the custom type 'ForeCast'
private static List<ForeCast> forecastList = new ArrayList<>();
public static void main(String[] args) throws ParserConfigurationException,
SAXException, IOException {
// Read XML throuhg a URL (a FileInputStream can be used to pick up an
// XML file from the file system)
InputStream path = new URL(
"http://api.openweathermap.org/data/2.5/forecast?q=London,us&mode=xml")
.openStream();
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(path);
// Call to the recursive method with the parent node
traverse(document.getDocumentElement());
// Print the List values collected within the recursive method
for (ForeCast forecastObj : forecastList)
System.out.println(forecastObj);
}
/**
*
* #param node
*/
public static void traverse(Node node) {
// Get the list of Child Nodes immediate to the current node
NodeList list = node.getChildNodes();
// Declare our local instance of forecast object
ForeCast forecastObj = null;
/**
* Logical block
*/
// As per the XML syntax our 2 fields temperature and clouds come
// directly under the Node/Element time
if (node.getNodeName().equals("time")
&& node.getNodeType() == Node.ELEMENT_NODE) {
// Instantiate our custom forecast object
forecastObj = new ForeCast();
Element timeElement = (Element) node;
// Get the temperature element by its tag name within the XML (0th
// index known)
Element tempElement = (Element) timeElement.getElementsByTagName(
"temperature").item(0);
// Minimum temperature value is selectively picked (for proof of
// concept)
forecastObj.setTemperature(tempElement.getAttribute("min"));
// Similarly get the clouds element
Element cloudElement = (Element) timeElement.getElementsByTagName(
"clouds").item(0);
forecastObj.setClouds(cloudElement.getAttribute("value"));
}
// Add our foreCastObj if initialized within this recursion, that is if
// it traverses the time node within the XML, and not in any other case
if (forecastObj != null)
forecastList.add(forecastObj);
/**
* Recursion block
*/
// Iterate over the next child nodes
for (int i = 0; i < list.getLength(); i++) {
Node currentNode = list.item(i);
// Recursively invoke the method for the current node
traverse(currentNode);
}
}
}
The Output
As you can figure out from the screenshot below, we were able to group together the 2 specific elements and assign their values effectively to a Java Collection instance. We delegated the complex parsing of the xml to the generic recursive solution and customized mainly the logical block part. As mentioned, it is a genetic solution with a minimal customization which can work through all valid xmls.
Alternatives
Many other alternatives are available, here is a list of open source XML parsers for Java.
However, your approach with PHP and your initial work with Java based parser aligns to the DOM based XML parser solution, simplified by the use of recursion.
I wouldn't suggest you to implement your own parse function for XML parsing since there are already many options out there. My suggestion is DOM parser. You can find few examples in the following link. (You can also choose from other available options)
http://www.javacodegeeks.com/2013/05/parsing-xml-using-dom-sax-and-stax-parser-in-java.html
You can use commands such as
eElement.getAttribute("id");
Source: http://www.mkyong.com/java/how-to-read-xml-file-in-java-dom-parser/
I agree what has been already posted about not implementing parse functions yourself.
Instead of DOM/SAX/STAX parsers though, I would suggest using JDOM or XOM, which are external libraries.
Related discussions:
What Java XML library do you recommend (to replace dom4j)?
Should I still be using JDOM with Java 5 or 6?
My gut feeling is that jdom is the one most java developers use. Some use dom4j, some xom, some others, but hardly anybody implements these parsing functions themselves.
use Java
startElement and endElement
for DOM Parsers

Cannot get the content inside the child tag with JDOM

I am trying to extract the content from a gpx file. The problem is when I used getChildren("wpt") to get the content of wpt tag, I got nothing returned. And when I used getChildren() method, I got and several returned. And when I removed all the contents in the only leave it as , everything works fine.
The content of this file:
<?xml version="1.0" encoding="UTF-8"?>
<gpx version="1.0" creator="GPSBabel-
http://www.gpsbabel.org"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://www.topografix.com/GPX/1/0"
xsi:schemaLocation="http://www.topografix.com/GPX/1/0
http://www.topografix.com/GPX/1/0/gpx.xsd">
<time>2010-11-07T06:21:28Z</time>
<bounds minlat="40.516437500" minlon="-79.759539000"
maxlat="44.943992000" maxlon="-72.186828500"/>
<wpt lat="43.449895700" lon="-79.759539000">
<name>Pharmacy</name>
<cmt>Pharmacy</cmt>
<desc>Pharmacy</desc>
</wpt>
<wpt lat="43.650977000" lon="-79.758495300">
<name>Pharmacy:Walk-In Clinic</name>
<cmt>Pharmacy:Walk-In Clinic</cmt>
<desc>Pharmacy:Walk-In Clinic</desc>
</wpt>
<wpt lat="43.583929100" lon="-79.758268700">
<name>Hospital:Meadowvale Professional Center</name>
<cmt>Hospital:Meadowvale Professional Center</cmt>
<desc>Hospital:Meadowvale Professional Center</desc>
</wpt>
</gpx>
Below is my codes to extract the content:
import org.jdom.*;
import org.jdom.input.SAXBuilder;
import java.sql.*;
import java.util.*;
public class ReadXml
{
public Connection conn = null;
public Statement stmt = null ;
public void readXml()
{
try
{
Class.forName("org.gjt.mm.mysql.Driver").newInstance();
String url ="jdbc:mysql://localhost/test?user=root&password=admin";//
conn = DriverManager.getConnection(url);
stmt = conn.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE,ResultSet.CONCUR_UPDATABLE);
}
catch(Exception sqlexception)
{
System.out.println("connection error !");
}
try
{
SAXBuilder sb = new SAXBuilder();
Document doc = sb.build("new_york_Government_and_Public_Services.gpx");
Element root = doc.getRootElement();
String name = "" ,lat = "", lon = "";
Element elms = null;
List list1 = root.getChildren("wpt");
for(int i=0; i< list1.size(); i++)
{
elms = (Element)list1.get(i);
lat = elms.getAttributeValue("lat");
lon = elms.getAttributeValue("lon");
name = elms.getChildText("name");
String sql = "insert into poi_test(name,lat,lon)values ('"+name+"','"+lat+"','"+lon+"')";
stmt.executeUpdate(sql);
}//for
stmt.close();
conn.close();
}
catch(Exception e)
{
e.printStackTrace();
}
}
public static void main(String[] args)
{
ReadXml rx = new ReadXml();
rx.readXml();
}
}
getChildren(String) JavaDoc:
This returns a List of all the child elements nested directly (one level deep) within this element with the given local name and belonging to no namespace, returned as Element objects. If this target element has no nested elements with the given name outside a namespace, an empty List is returned. The returned list is "live" in document order and changes to it affect the element's actual contents.
As your wpt tag belongs to http://www.topografix.com/GPX/1/0 namespace, the getChildren method behaves correctly when returning no children. Instead, you should use getChildren(String,Namespace):
Namespace gpx = Namespace.getNamespace("http://www.topografix.com/GPX/1/0");
//[...]
List list1 = root.getChildren("wpt", gpx);
I believe you need to specify the element namespace in this instance:
//yada yada
Namespace rootNamespace = root.getNamespace();
List wptElements = root.getChildren("wpt", rootNamespace);
for (Iterator wptIt = wptElements.iterator(); wptIt.hasNext(); ) {
Element wpt = (Element)wptIt.next();
//yada yada
}
Please also note that its recommended to iterate through the list via an Iterator instead of by index, according to the JDOM javadocs:
Sequential traversal through the List
is best done with a Iterator since the
underlying implement of List.size()
may not be the most efficient.

XML child node attribute value

I'm trying to read xml file, ex :
<entry>
<title>FEED TITLE</title>
<id>5467sdad98787ad3149878sasda</id>
<tempi type="application/xml">
<conento xmlns="http://mydomainname.com/xsd/radiofeed.xsd" madeIn="USA" />
</tempi>
</entry>
Here is the code I have so far :
Here is my attempt of trying to code this, what to say not successful thats why I started bounty. Here it is http://pastebin.com/huKP4KED .
Bounty update :
I really really tried to do this for days now didn't expect to be so hard, I'll accept useful links/books/tutorials but prefer code because I need this done yesterday.
Here is what I need:
Concerning xml above :
I need to get value of title, id
attribute value of tempi as well as madeIn attribute value of contento
What is the best way to do this ?
EDIT:
#Pascal Thivent
Maybe creating method would be good idea like public String getValue(String xml, Element elementname), where you specify tag name, the method returns tag value or tag attribute(maybe give it name as additional method argument) if the value is not available
What I really want to get certain tag value or attribute if tag value(s) is not available, so I'm in the process of thinking what is the best way to do so since I've never done it before
The best solution for this is to use XPath. Your pastebin is expired, but here's what I gathered. Let's say we have the following feed.xml file:
<?xml version="1.0" encoding="UTF-8" ?>
<entries>
<entry>
<title>FEED TITLE 1</title>
<id>id1</id>
<tempi type="type1">
<conento xmlns="dontcare?" madeIn="MadeIn1" />
</tempi>
</entry>
<entry>
<title>FEED TITLE 2</title>
<id>id2</id>
<tempi type="type2">
<conento xmlns="dontcare?" madeIn="MadeIn2" />
</tempi>
</entry>
<entry>
<id>id3</id>
</entry>
</entries>
Here's a short but compile-and-runnable proof-of-concept (with feed.xml file in the same directory).
import javax.xml.xpath.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import java.io.*;
import java.util.*;
public class XPathTest {
static class Entry {
final String title, id, origin, type;
Entry(String title, String id, String origin, String type) {
this.title = title;
this.id = id;
this.origin = origin;
this.type = type;
}
#Override public String toString() {
return String.format("%s:%s(%s)[%s]", id, title, origin, type);
}
}
final static XPath xpath = XPathFactory.newInstance().newXPath();
static String evalString(Node context, String path) throws XPathExpressionException {
return (String) xpath.evaluate(path, context, XPathConstants.STRING);
}
public static void main(String[] args) throws Exception {
File file = new File("feed.xml");
Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(file);
NodeList entriesNodeList = (NodeList) xpath.evaluate("//entry", document, XPathConstants.NODESET);
List<Entry> entries = new ArrayList<Entry>();
for (int i = 0; i < entriesNodeList.getLength(); i++) {
Node entryNode = entriesNodeList.item(i);
entries.add(new Entry(
evalString(entryNode, "title"),
evalString(entryNode, "id"),
evalString(entryNode, "tempi/conento/#madeIn"),
evalString(entryNode, "tempi/#type")
));
}
for (Entry entry : entries) {
System.out.println(entry);
}
}
}
This produces the following output:
id1:FEED TITLE 1(MadeIn1)[type1]
id2:FEED TITLE 2(MadeIn2)[type2]
id3:()[]
Note how using XPath makes the value retrieval very simple, intuitive, readable, and straightforward, and "missing" values are also gracefully handled.
API links
package javax.xml.xpath
http://www.w3.org/TR/xpath
Wikipedia/XPath
Use Element.getAttribute and Element.setAttribute
In your example, ((Node) content.item(0)).getFirstChild().getAttributes(). Assuming that content is a typo, and you mean contento, getFirstChild is correctly returning NULL as contento has no children. Try: ((Node) contento.item(0)).getAttributes() instead.
Another issue is that by using getFirstChild and getChildNodes()[0] without checking the return value, you are running the risk of picking up child text nodes, instead of the element you want.
As pointed out, <contento> doesn't have any child so instead of:
(contento.item(0)).getFirstChild().getAttributes()
You should treat the Node as Element and use getAttribute(String), something like this:
((Element)contento.item(0)).getAttribute("madeIn")
Here is a modified version of your code (it's not the most robust code I've written):
InputStream inputStream = new ByteArrayInputStream(xml.getBytes());
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(inputStream);
doc.getDocumentElement().normalize();
System.out.println("Root element " + doc.getDocumentElement().getNodeName());
NodeList nodeLst = doc.getElementsByTagName("entry");
System.out.println("Information of all entries");
for (int s = 0; s < nodeLst.getLength(); s++) {
Node fstNode = nodeLst.item(s);
if (fstNode.getNodeType() == Node.ELEMENT_NODE) {
Element fstElmnt = (Element) fstNode;
NodeList title = fstElmnt.getElementsByTagName("title").item(0).getChildNodes();
System.out.println("Title : " + (title.item(0)).getNodeValue());
NodeList id = fstElmnt.getElementsByTagName("id").item(0).getChildNodes();
System.out.println("Id: " + (id.item(0)).getNodeValue());
Node tempiNode = fstElmnt.getElementsByTagName("tempi").item(0);
System.out.println("Type : " + ((Element) tempiNode).getAttribute("type"));
Node contento = tempiNode.getChildNodes().item(0);
System.out.println("Made in : " + ((Element) contento).getAttribute("madeIn"));
}
}
Running it on your XML snippet produces the following output:
Root element entry
Information of all entries
Title : FEED TITLE
Id: 5467sdad98787ad3149878sasda
Type : application/xml
Made in : USA
By the way, did you consider using something like Rome instead?

lucene get matched terms in query

What is the best way to find out which terms in a query matched against a given document returned as a hit in lucene?
I have tried a weird method involving hit highlighting package in lucene contrib and also a method that searches for every word in the query against the top most document ("docId: xy AND description: each_word_in_query").
Do not get satisfactory results?
Hit highlighting does not report some of the words that matched for a document other than the first one.
I'm not sure if the second approach is the best alternative.
The method explain in the Searcher is a nice way to see which part of a query was matched and how it affects the overall score.
Example taken from the book Lucene In Action 2nd Edition:
public class Explainer {
public static void main(String[] args) throws Exception {
if (args.length != 2) {
System.err.println("Usage: Explainer <index dir> <query>");
System.exit(1);
}
String indexDir = args[0];
String queryExpression = args[1];
Directory directory = FSDirectory.open(new File(indexDir));
QueryParser parser = new QueryParser(Version.LUCENE_CURRENT,
"contents", new SimpleAnalyzer());
Query query = parser.parse(queryExpression);
System.out.println("Query: " + queryExpression);
IndexSearcher searcher = new IndexSearcher(directory);
TopDocs topDocs = searcher.search(query, 10);
for (int i = 0; i < topDocs.totalHits; i++) {
ScoreDoc match = topDocs.scoreDocs[i];
Explanation explanation = searcher.explain(query, match.doc);
System.out.println("----------");
Document doc = searcher.doc(match.doc);
System.out.println(doc.get("title"));
System.out.println(explanation.toString());
}
}
}
This will explain the score of each document that matches the query.
Not tried yet, but have a look at the implementation of org.apache.lucene.search.highlight.QueryTermExtractor.

Categories