Best way to compare 2 XML documents in Java - java

I'm trying to write an automated test of an application that basically translates a custom message format into an XML message and sends it out the other end. I've got a good set of input/output message pairs so all I need to do is send the input messages in and listen for the XML message to come out the other end.
When it comes time to compare the actual output to the expected output I'm running into some problems. My first thought was just to do string comparisons on the expected and actual messages. This doens't work very well because the example data we have isn't always formatted consistently and there are often times different aliases used for the XML namespace (and sometimes namespaces aren't used at all.)
I know I can parse both strings and then walk through each element and compare them myself and this wouldn't be too difficult to do, but I get the feeling there's a better way or a library I could leverage.
So, boiled down, the question is:
Given two Java Strings which both contain valid XML how would you go about determining if they are semantically equivalent? Bonus points if you have a way to determine what the differences are.

Sounds like a job for XMLUnit
http://www.xmlunit.org/
https://github.com/xmlunit
Example:
public class SomeTest extends XMLTestCase {
#Test
public void test() {
String xml1 = ...
String xml2 = ...
XMLUnit.setIgnoreWhitespace(true); // ignore whitespace differences
// can also compare xml Documents, InputSources, Readers, Diffs
assertXMLEqual(xml1, xml2); // assertXMLEquals comes from XMLTestCase
}
}

The following will check if the documents are equal using standard JDK libraries.
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
dbf.setCoalescing(true);
dbf.setIgnoringElementContentWhitespace(true);
dbf.setIgnoringComments(true);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc1 = db.parse(new File("file1.xml"));
doc1.normalizeDocument();
Document doc2 = db.parse(new File("file2.xml"));
doc2.normalizeDocument();
Assert.assertTrue(doc1.isEqualNode(doc2));
normalize() is there to make sure there are no cycles (there technically wouldn't be any)
The above code will require the white spaces to be the same within the elements though, because it preserves and evaluates it. The standard XML parser that comes with Java does not allow you to set a feature to provide a canonical version or understand xml:space if that is going to be a problem then you may need a replacement XML parser such as xerces or use JDOM.

Xom has a Canonicalizer utility which turns your DOMs into a regular form, which you can then stringify and compare. So regardless of whitespace irregularities or attribute ordering, you can get regular, predictable comparisons of your documents.
This works especially well in IDEs that have dedicated visual String comparators, like Eclipse. You get a visual representation of the semantic differences between the documents.

The latest version of XMLUnit can help the job of asserting two XML are equal. Also XMLUnit.setIgnoreWhitespace() and XMLUnit.setIgnoreAttributeOrder() may be necessary to the case in question.
See working code of a simple example of XML Unit use below.
import org.custommonkey.xmlunit.DetailedDiff;
import org.custommonkey.xmlunit.XMLUnit;
import org.junit.Assert;
public class TestXml {
public static void main(String[] args) throws Exception {
String result = "<abc attr=\"value1\" title=\"something\"> </abc>";
// will be ok
assertXMLEquals("<abc attr=\"value1\" title=\"something\"></abc>", result);
}
public static void assertXMLEquals(String expectedXML, String actualXML) throws Exception {
XMLUnit.setIgnoreWhitespace(true);
XMLUnit.setIgnoreAttributeOrder(true);
DetailedDiff diff = new DetailedDiff(XMLUnit.compareXML(expectedXML, actualXML));
List<?> allDifferences = diff.getAllDifferences();
Assert.assertEquals("Differences found: "+ diff.toString(), 0, allDifferences.size());
}
}
If using Maven, add this to your pom.xml:
<dependency>
<groupId>xmlunit</groupId>
<artifactId>xmlunit</artifactId>
<version>1.4</version>
</dependency>

Building on Tom's answer, here's an example using XMLUnit v2.
It uses these maven dependencies
<dependency>
<groupId>org.xmlunit</groupId>
<artifactId>xmlunit-core</artifactId>
<version>2.0.0</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.xmlunit</groupId>
<artifactId>xmlunit-matchers</artifactId>
<version>2.0.0</version>
<scope>test</scope>
</dependency>
..and here's the test code
import static org.junit.Assert.assertThat;
import static org.xmlunit.matchers.CompareMatcher.isIdenticalTo;
import org.xmlunit.builder.Input;
import org.xmlunit.input.WhitespaceStrippedSource;
public class SomeTest extends XMLTestCase {
#Test
public void test() {
String result = "<root></root>";
String expected = "<root> </root>";
// ignore whitespace differences
// https://github.com/xmlunit/user-guide/wiki/Providing-Input-to-XMLUnit#whitespacestrippedsource
assertThat(result, isIdenticalTo(new WhitespaceStrippedSource(Input.from(expected).build())));
assertThat(result, isIdenticalTo(Input.from(expected).build())); // will fail due to whitespace differences
}
}
The documentation that outlines this is https://github.com/xmlunit/xmlunit#comparing-two-documents

Thanks, I extended this, try this ...
import java.io.ByteArrayInputStream;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
public class XmlDiff
{
private boolean nodeTypeDiff = true;
private boolean nodeValueDiff = true;
public boolean diff( String xml1, String xml2, List<String> diffs ) throws Exception
{
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
dbf.setCoalescing(true);
dbf.setIgnoringElementContentWhitespace(true);
dbf.setIgnoringComments(true);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc1 = db.parse(new ByteArrayInputStream(xml1.getBytes()));
Document doc2 = db.parse(new ByteArrayInputStream(xml2.getBytes()));
doc1.normalizeDocument();
doc2.normalizeDocument();
return diff( doc1, doc2, diffs );
}
/**
* Diff 2 nodes and put the diffs in the list
*/
public boolean diff( Node node1, Node node2, List<String> diffs ) throws Exception
{
if( diffNodeExists( node1, node2, diffs ) )
{
return true;
}
if( nodeTypeDiff )
{
diffNodeType(node1, node2, diffs );
}
if( nodeValueDiff )
{
diffNodeValue(node1, node2, diffs );
}
System.out.println(node1.getNodeName() + "/" + node2.getNodeName());
diffAttributes( node1, node2, diffs );
diffNodes( node1, node2, diffs );
return diffs.size() > 0;
}
/**
* Diff the nodes
*/
public boolean diffNodes( Node node1, Node node2, List<String> diffs ) throws Exception
{
//Sort by Name
Map<String,Node> children1 = new LinkedHashMap<String,Node>();
for( Node child1 = node1.getFirstChild(); child1 != null; child1 = child1.getNextSibling() )
{
children1.put( child1.getNodeName(), child1 );
}
//Sort by Name
Map<String,Node> children2 = new LinkedHashMap<String,Node>();
for( Node child2 = node2.getFirstChild(); child2!= null; child2 = child2.getNextSibling() )
{
children2.put( child2.getNodeName(), child2 );
}
//Diff all the children1
for( Node child1 : children1.values() )
{
Node child2 = children2.remove( child1.getNodeName() );
diff( child1, child2, diffs );
}
//Diff all the children2 left over
for( Node child2 : children2.values() )
{
Node child1 = children1.get( child2.getNodeName() );
diff( child1, child2, diffs );
}
return diffs.size() > 0;
}
/**
* Diff the nodes
*/
public boolean diffAttributes( Node node1, Node node2, List<String> diffs ) throws Exception
{
//Sort by Name
NamedNodeMap nodeMap1 = node1.getAttributes();
Map<String,Node> attributes1 = new LinkedHashMap<String,Node>();
for( int index = 0; nodeMap1 != null && index < nodeMap1.getLength(); index++ )
{
attributes1.put( nodeMap1.item(index).getNodeName(), nodeMap1.item(index) );
}
//Sort by Name
NamedNodeMap nodeMap2 = node2.getAttributes();
Map<String,Node> attributes2 = new LinkedHashMap<String,Node>();
for( int index = 0; nodeMap2 != null && index < nodeMap2.getLength(); index++ )
{
attributes2.put( nodeMap2.item(index).getNodeName(), nodeMap2.item(index) );
}
//Diff all the attributes1
for( Node attribute1 : attributes1.values() )
{
Node attribute2 = attributes2.remove( attribute1.getNodeName() );
diff( attribute1, attribute2, diffs );
}
//Diff all the attributes2 left over
for( Node attribute2 : attributes2.values() )
{
Node attribute1 = attributes1.get( attribute2.getNodeName() );
diff( attribute1, attribute2, diffs );
}
return diffs.size() > 0;
}
/**
* Check that the nodes exist
*/
public boolean diffNodeExists( Node node1, Node node2, List<String> diffs ) throws Exception
{
if( node1 == null && node2 == null )
{
diffs.add( getPath(node2) + ":node " + node1 + "!=" + node2 + "\n" );
return true;
}
if( node1 == null && node2 != null )
{
diffs.add( getPath(node2) + ":node " + node1 + "!=" + node2.getNodeName() );
return true;
}
if( node1 != null && node2 == null )
{
diffs.add( getPath(node1) + ":node " + node1.getNodeName() + "!=" + node2 );
return true;
}
return false;
}
/**
* Diff the Node Type
*/
public boolean diffNodeType( Node node1, Node node2, List<String> diffs ) throws Exception
{
if( node1.getNodeType() != node2.getNodeType() )
{
diffs.add( getPath(node1) + ":type " + node1.getNodeType() + "!=" + node2.getNodeType() );
return true;
}
return false;
}
/**
* Diff the Node Value
*/
public boolean diffNodeValue( Node node1, Node node2, List<String> diffs ) throws Exception
{
if( node1.getNodeValue() == null && node2.getNodeValue() == null )
{
return false;
}
if( node1.getNodeValue() == null && node2.getNodeValue() != null )
{
diffs.add( getPath(node1) + ":type " + node1 + "!=" + node2.getNodeValue() );
return true;
}
if( node1.getNodeValue() != null && node2.getNodeValue() == null )
{
diffs.add( getPath(node1) + ":type " + node1.getNodeValue() + "!=" + node2 );
return true;
}
if( !node1.getNodeValue().equals( node2.getNodeValue() ) )
{
diffs.add( getPath(node1) + ":type " + node1.getNodeValue() + "!=" + node2.getNodeValue() );
return true;
}
return false;
}
/**
* Get the node path
*/
public String getPath( Node node )
{
StringBuilder path = new StringBuilder();
do
{
path.insert(0, node.getNodeName() );
path.insert( 0, "/" );
}
while( ( node = node.getParentNode() ) != null );
return path.toString();
}
}

AssertJ 1.4+ has specific assertions to compare XML content:
String expectedXml = "<foo />";
String actualXml = "<bar />";
assertThat(actualXml).isXmlEqualTo(expectedXml);
Here is the Documentation

Below code works for me
String xml1 = ...
String xml2 = ...
XMLUnit.setIgnoreWhitespace(true);
XMLUnit.setIgnoreAttributeOrder(true);
XMLAssert.assertXMLEqual(actualxml, xmlInDb);

skaffman seems to be giving a good answer.
another way is probably to format the XML using a commmand line utility like xmlstarlet(http://xmlstar.sourceforge.net/) and then format both the strings and then use any diff utility(library) to diff the resulting output files. I don't know if this is a good solution when issues are with namespaces.

I'm using Altova DiffDog which has options to compare XML files structurally (ignoring string data).
This means that (if checking the 'ignore text' option):
<foo a="xxx" b="xxx">xxx</foo>
and
<foo b="yyy" a="yyy">yyy</foo>
are equal in the sense that they have structural equality. This is handy if you have example files that differ in data, but not structure!

I required the same functionality as requested in the main question. As I was not allowed to use any 3rd party libraries, I have created my own solution basing on #Archimedes Trajano solution.
Following is my solution.
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.junit.Assert;
import org.w3c.dom.Document;
/**
* Asserts for asserting XML strings.
*/
public final class AssertXml {
private AssertXml() {
}
private static Pattern NAMESPACE_PATTERN = Pattern.compile("xmlns:(ns\\d+)=\"(.*?)\"");
/**
* Asserts that two XML are of identical content (namespace aliases are ignored).
*
* #param expectedXml expected XML
* #param actualXml actual XML
* #throws Exception thrown if XML parsing fails
*/
public static void assertEqualXmls(String expectedXml, String actualXml) throws Exception {
// Find all namespace mappings
Map<String, String> fullnamespace2newAlias = new HashMap<String, String>();
generateNewAliasesForNamespacesFromXml(expectedXml, fullnamespace2newAlias);
generateNewAliasesForNamespacesFromXml(actualXml, fullnamespace2newAlias);
for (Entry<String, String> entry : fullnamespace2newAlias.entrySet()) {
String newAlias = entry.getValue();
String namespace = entry.getKey();
Pattern nsReplacePattern = Pattern.compile("xmlns:(ns\\d+)=\"" + namespace + "\"");
expectedXml = transletaNamespaceAliasesToNewAlias(expectedXml, newAlias, nsReplacePattern);
actualXml = transletaNamespaceAliasesToNewAlias(actualXml, newAlias, nsReplacePattern);
}
// nomralize namespaces accoring to given mapping
DocumentBuilder db = initDocumentParserFactory();
Document expectedDocuemnt = db.parse(new ByteArrayInputStream(expectedXml.getBytes(Charset.forName("UTF-8"))));
expectedDocuemnt.normalizeDocument();
Document actualDocument = db.parse(new ByteArrayInputStream(actualXml.getBytes(Charset.forName("UTF-8"))));
actualDocument.normalizeDocument();
if (!expectedDocuemnt.isEqualNode(actualDocument)) {
Assert.assertEquals(expectedXml, actualXml); //just to better visualize the diffeences i.e. in eclipse
}
}
private static DocumentBuilder initDocumentParserFactory() throws ParserConfigurationException {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(false);
dbf.setCoalescing(true);
dbf.setIgnoringElementContentWhitespace(true);
dbf.setIgnoringComments(true);
DocumentBuilder db = dbf.newDocumentBuilder();
return db;
}
private static String transletaNamespaceAliasesToNewAlias(String xml, String newAlias, Pattern namespacePattern) {
Matcher nsMatcherExp = namespacePattern.matcher(xml);
if (nsMatcherExp.find()) {
xml = xml.replaceAll(nsMatcherExp.group(1) + "[:]", newAlias + ":");
xml = xml.replaceAll(nsMatcherExp.group(1) + "=", newAlias + "=");
}
return xml;
}
private static void generateNewAliasesForNamespacesFromXml(String xml, Map<String, String> fullnamespace2newAlias) {
Matcher nsMatcher = NAMESPACE_PATTERN.matcher(xml);
while (nsMatcher.find()) {
if (!fullnamespace2newAlias.containsKey(nsMatcher.group(2))) {
fullnamespace2newAlias.put(nsMatcher.group(2), "nsTr" + (fullnamespace2newAlias.size() + 1));
}
}
}
}
It compares two XML strings and takes care of any mismatching namespace mappings by translating them to unique values in both input strings.
Can be fine tuned i.e. in case of translation of namespaces. But for my requirements just does the job.

This will compare full string XMLs (reformatting them on the way). It makes it easy to work with your IDE (IntelliJ, Eclipse), cos you just click and visually see the difference in the XML files.
import org.apache.xml.security.c14n.CanonicalizationException;
import org.apache.xml.security.c14n.Canonicalizer;
import org.apache.xml.security.c14n.InvalidCanonicalizerException;
import org.w3c.dom.Element;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import java.io.IOException;
import java.io.StringReader;
import static org.apache.xml.security.Init.init;
import static org.junit.Assert.assertEquals;
public class XmlUtils {
static {
init();
}
public static String toCanonicalXml(String xml) throws InvalidCanonicalizerException, ParserConfigurationException, SAXException, CanonicalizationException, IOException {
Canonicalizer canon = Canonicalizer.getInstance(Canonicalizer.ALGO_ID_C14N_OMIT_COMMENTS);
byte canonXmlBytes[] = canon.canonicalize(xml.getBytes());
return new String(canonXmlBytes);
}
public static String prettyFormat(String input) throws TransformerException, ParserConfigurationException, IOException, SAXException, InstantiationException, IllegalAccessException, ClassNotFoundException {
InputSource src = new InputSource(new StringReader(input));
Element document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(src).getDocumentElement();
Boolean keepDeclaration = input.startsWith("<?xml");
DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
LSSerializer writer = impl.createLSSerializer();
writer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
writer.getDomConfig().setParameter("xml-declaration", keepDeclaration);
return writer.writeToString(document);
}
public static void assertXMLEqual(String expected, String actual) throws ParserConfigurationException, IOException, SAXException, CanonicalizationException, InvalidCanonicalizerException, TransformerException, IllegalAccessException, ClassNotFoundException, InstantiationException {
String canonicalExpected = prettyFormat(toCanonicalXml(expected));
String canonicalActual = prettyFormat(toCanonicalXml(actual));
assertEquals(canonicalExpected, canonicalActual);
}
}
I prefer this to XmlUnit because the client code (test code) is cleaner.

Using XMLUnit 2.x
In the pom.xml
<dependency>
<groupId>org.xmlunit</groupId>
<artifactId>xmlunit-assertj3</artifactId>
<version>2.9.0</version>
</dependency>
Test implementation (using junit 5) :
import org.junit.jupiter.api.Test;
import org.xmlunit.assertj3.XmlAssert;
public class FooTest {
#Test
public void compareXml() {
//
String xmlContentA = "<foo></foo>";
String xmlContentB = "<foo></foo>";
//
XmlAssert.assertThat(xmlContentA).and(xmlContentB).areSimilar();
}
}
Other methods : areIdentical(), areNotIdentical(), areNotSimilar()
More details (configuration of assertThat(~).and(~) and examples) in this documentation page.
XMLUnit also has (among other features) a DifferenceEvaluator to do more precise comparisons.
XMLUnit website

Using JExamXML with java application
import com.a7soft.examxml.ExamXML;
import com.a7soft.examxml.Options;
.................
// Reads two XML files into two strings
String s1 = readFile("orders1.xml");
String s2 = readFile("orders.xml");
// Loads options saved in a property file
Options.loadOptions("options");
// Compares two Strings representing XML entities
System.out.println( ExamXML.compareXMLString( s1, s2 ) );

Since you say "semantically equivalent" I assume you mean that you want to do more than just literally verify that the xml outputs are (string) equals, and that you'd want something like
<foo> some stuff here</foo></code>
and
<foo>some stuff here</foo></code>
do read as equivalent. Ultimately it's going to matter how you're defining "semantically equivalent" on whatever object you're reconstituting the message from. Simply build that object from the messages and use a custom equals() to define what you're looking for.

Related

Adding content from one xml file to the other in JAVA

I've properties file which has key values of name tag in 2 xml files, one is source and other one is destination.
I need to check whether name tag with the same value from properties file is there or not in destination xml, if its there I should not do anything, if its not there the source xml file should be iterated to search for name tag value which is from properties file. Once it found the same name tag should be added from source.xml file to destination.xml file..
Please do help me on this java code
private void updateCofigDestn() throws ParserConfigurationException, TransformerConfigurationException, TransformerException, IOException, SAXException {
prop = loadConfigProperties();
String ConfigSrcFile = prop.getProperty("ConfigSourceFile");
String ConfigDesnFile = prop.getProperty("ConfigDestnFile");
System.out.println("\nConfig Destn Path update config :: " + ConfigDesnFile);
File configSrcFile = new File(ConfigSrcFile + "\\config.xml");
File configDstnFile = new File(ConfigDesnFile + "\\config.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
dbFactory.setValidating(false);
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document docSrc = dBuilder.parse(configSrcFile);
Document docDestn = dBuilder.parse(configDstnFile);
Set < Object > keys = getAllKeys();
for (Object k: keys) {
if (k.toString().startsWith("JDBC")) {
System.out.println("Inside Keys");
String key = (String) k;
keyVal = getPropertyValue(key);
System.out.println(key + ": " + getPropertyValue(key));
NodeList listSrc =
docSrc.getElementsByTagName("jdbc-system-resource");
NodeList listDsn =
docDestn.getElementsByTagName("jdbc-system-resource");
System.out.println("listDsn.item(0)" + listDsn.item(0).getTextContent());
if (listDsn.item(0) != null) {
for (int t = 0; t < listDsn.getLength(); t++) {
Element elmntDsn1 = (Element) listDsn.item(t);
String DsNameDsn1 = elmntDsn1.getElementsByTagName("name").item(0).getTextContent();
System.out.println("DS At DESTN in Update Conf " + DsNameDsn1);
if (keyVal.equalsIgnoreCase(DsNameDsn1)) {} else {
for (int temp = 0; temp < listSrc.getLength(); temp++) {
Element elmntSrc = (Element) listSrc.item(temp);
String DsNameSrc = elmntSrc.getElementsByTagName("name").item(0).getTextContent();
// elmntSrc.getElementsByTagName(keyVal).item(0).getTextContent();
// configDestn(keyVal);
//System.out.println("value bool >>>>> " +res ) ;
if (keyVal.equalsIgnoreCase(DsNameSrc) && keyVal != null) {
Node copiedNode = docDestn.importNode(elmntSrc, true);
docDestn.getDocumentElement().appendChild(copiedNode);
System.out.println(" Updating the destination Config File");
TransformerFactory.newInstance().newTransformer().transform(new DOMSource(docDestn),
new StreamResult(new FileWriter(configDstnFile)));
}
}
}
}
} else {
System.out.println("Destination List is null ");
for (int temp = 0; temp < listSrc.getLength(); temp++) {
Element elmntSrc = (Element) listSrc.item(temp);
String elmntValSrc = elmntSrc.getElementsByTagName("name").item(0).getTextContent();
if (keyVal.equalsIgnoreCase(elmntValSrc) &&
keyVal != null) {
Node copiedNode = docDestn.importNode(elmntSrc, true);
docDestn.getDocumentElement().appendChild(copiedNode);
System.out.println(" Updating the destination Config File in NULL");
TransformerFactory.newInstance().newTransformer().transform(new DOMSource(docDestn),
new StreamResult(new FileWriter(configDstnFile)));
}
}
}
}
}
}
For ex..
config.properties
file1 = def
file2 = xyz
file3 = abc
source.xml
<domain>
<node0>
<name>xyz</name>
</node0>
<node1>
<name>abc</name>
</node1>
<node2>
<name>def</name>
</node2>
</domain>
destination.xml
<domain>
<node1>
<name>abc</name>
</node1>
</domain>
Step1: It takes key value of file 1 'def' from properties file and checks in destination.xml file, since its not there it will append it.
Step2: It takes the next key value of file 2 'xyz' value from properties file and checks in destination.xml file, since its not there it will append it.
Step3: It takes the next key value of file 3 'abc' from properties file and checks in destination.xml or not, since its there it will not appended.
And now the destination.xml should be looks like,
<domain>
<node1>
<name>abc</name>
</node1>
<node0>
<name>xyz</name>
</node0>
<node2>
<name>def</name>
</node2>
</domain>
This is my requirement to do in JAVA, I have tried lot of coding.
Please do help me out on this..
You're almost there.
The mistake you are doing is using inner 3 level nested for-loop whereas only 2 level nested for-loop is required.
The for-loop of listSrc should be outside the destSrc.
Try this below one.
package com.tmp;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.Map;
import java.util.Properties;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class Tmp {
public static void main( String[] args ) {
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
try {
DocumentBuilder builder = builderFactory.newDocumentBuilder();
XPath path = XPathFactory.newInstance().newXPath();
Document destDocument = builder.parse( new FileInputStream( "D:\\tmp\\destination.xml" ) );
Document srcDocument = builder.parse( new FileInputStream( "D:\\tmp\\source.xml" ) );
Element destRootEle = destDocument.getDocumentElement();
Element srcRootEle = srcDocument.getDocumentElement();
Properties properties = new Properties();
properties.load( new FileInputStream( "D:\\tmp\\config.properties" ) );
// read properties from config.properties file one by one
for ( Map.Entry<Object, Object> entry : properties.entrySet() ) {
String propVal = (String) entry.getValue();
NodeList destNodeList = (NodeList) path.evaluate( "//name", destRootEle, XPathConstants.NODESET );
boolean destNodeNotExist = true;
// iterate through the destination.xml to check whether property value is exist or not
for ( int i = 0; i < destNodeList.getLength(); i++ ) {
Node node = destNodeList.item( i );
if ( propVal.trim().equals( node.getTextContent().trim() ) ) {
destNodeNotExist = false;
break;
}
}
// if the property value is not found in destination.xml then check for the node in source.xml to add to the destination.xml
if ( destNodeNotExist ) {
NodeList srcNodeList = (NodeList) path.evaluate( "//name", srcRootEle, XPathConstants.NODESET );
for ( int i = 0; i < srcNodeList.getLength(); i++ ) {
Node missingNodeToAdd = srcNodeList.item( i );
if ( propVal.trim().equals( missingNodeToAdd.getTextContent().trim() ) ) {
destRootEle.appendChild( destDocument.adoptNode( missingNodeToAdd.getParentNode() ) );
break;
}
}
}
}
// save the changes made to destination.xml file into file system
Transformer tr = TransformerFactory.newInstance().newTransformer();
tr.setOutputProperty( OutputKeys.INDENT, "yes" );
tr.setOutputProperty( OutputKeys.OMIT_XML_DECLARATION, "yes" );
tr.setOutputProperty( OutputKeys.ENCODING, "UTF-8" );
tr.transform( new DOMSource( destDocument ), new StreamResult( new FileOutputStream( "D:\\tmp\\destination.xml" ) ) );
} catch ( Exception e ) {
e.printStackTrace();
}
}
}

Java XML JDOM2 XPath - Read text value from XML attribute and element using XPath expression

The program should be allowed to read from an XML file using XPath expressions.
I already started the project using JDOM2, switching to another API is unwanted.
The difficulty is, that the program does not know beforehand if it has to read an element or an attribute.
Does the API provide any function to receive the content (string) just by giving it the XPath expression?
From what I know about XPath in JDOM2, it uses objects of different types to evaluate XPath expressions pointing to attributes or elements.
I am only interested in the content of the attribute / element where the XPath expression points to.
Here is an example XML file:
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book category="COOKING">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="CHILDREN">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="WEB">
<title lang="en">XQuery Kick Start</title>
<author>James McGovern</author>
<author>Per Bothner</author>
<author>Kurt Cagle</author>
<author>James Linn</author>
<author>Vaidyanathan Nagarajan</author>
<year>2003</year>
<price>49.99</price>
</book>
<book category="WEB">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
This is what my program looks like:
package exampleprojectgroup;
import java.io.IOException;
import java.util.LinkedList;
import java.util.List;
import org.jdom2.Attribute;
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.filter.Filters;
import org.jdom2.input.SAXBuilder;
import org.jdom2.input.sax.XMLReaders;
import org.jdom2.xpath.XPathExpression;
import org.jdom2.xpath.XPathFactory;
public class ElementAttribute2String
{
ElementAttribute2String()
{
run();
}
public void run()
{
final String PATH_TO_FILE = "c:\\readme.xml";
/* It is essential that the program has to work with a variable amount of XPath expressions. */
LinkedList<String> xPathExpressions = new LinkedList<>();
/* Simulate user input.
* First XPath expression points to attribute,
* second one points to element.
* Many more expressions follow in a real situation.
*/
xPathExpressions.add( "/bookstore/book/#category" );
xPathExpressions.add( "/bookstore/book/price" );
/* One list should be sufficient to store the result. */
List<Element> elementsResult = null;
List<Attribute> attributesResult = null;
List<Object> objectsResult = null;
try
{
SAXBuilder saxBuilder = new SAXBuilder( XMLReaders.NONVALIDATING );
Document document = saxBuilder.build( PATH_TO_FILE );
XPathFactory xPathFactory = XPathFactory.instance();
int i = 0;
for ( String string : xPathExpressions )
{
/* Works only for elements, uncomment to give it a try. */
// XPathExpression<Element> xPathToElement = xPathFactory.compile( xPathExpressions.get( i ), Filters.element() );
// elementsResult = xPathToElement.evaluate( document );
// for ( Element element : elementsResult )
// {
// System.out.println( "Content of " + string + ": " + element.getText() );
// }
/* Works only for attributes, uncomment to give it a try. */
// XPathExpression<Attribute> xPathToAttribute = xPathFactory.compile( xPathExpressions.get( i ), Filters.attribute() );
// attributesResult = xPathToAttribute.evaluate( document );
// for ( Attribute attribute : attributesResult )
// {
// System.out.println( "Content of " + string + ": " + attribute.getValue() );
// }
/* I want to receive the content of the XPath expression as a string
* without having to know if it is an attribute or element beforehand.
*/
XPathExpression<Object> xPathExpression = xPathFactory.compile( xPathExpressions.get( i ) );
objectsResult = xPathExpression.evaluate( document );
for ( Object object : objectsResult )
{
if ( object instanceof Attribute )
{
System.out.println( "Content of " + string + ": " + ((Attribute)object).getValue() );
}
else if ( object instanceof Element )
{
System.out.println( "Content of " + string + ": " + ((Element)object).getText() );
}
}
i++;
}
}
catch ( IOException ioException )
{
ioException.printStackTrace();
}
catch ( JDOMException jdomException )
{
jdomException.printStackTrace();
}
}
}
Another thought is to search for the '#' character in the XPath expression, to determine if it is pointing to an attribute or element.
This gives me the desired result, though I wish there was a more elegant solution.
Does the JDOM2 API provide anything useful for this problem?
Could the code be redesigned to meet my requirements?
Thank you in advance!
XPath expressions are hard to type/cast because they need to be compiled in a system that is sensitive to the return type of the XPath functions/values that are in the expression. JDOM relies on third-party code to do that, and that third party code does not have a mechanism to correlate those types at your JDOM code's compile time. Note that XPath expressions can return a number of different types of content, including String, boolean, Number, and Node-List-like content.
In most cases, the XPath expression return type is known before the expression is evaluated, and the programmer has the "right" casting/expectations for processing the results.
In your case, you don't, and the expression is more dynamic.
I recommend that you declare a helper function to process the content:
private static final Function extractValue(Object source) {
if (source instanceof Attribute) {
return ((Attribute)source).getValue();
}
if (source instanceof Content) {
return ((Content)source).getValue();
}
return String.valueOf(source);
}
This at least will neaten up your code, and if you use Java8 streams, can be quite compact:
List<String> values = xPathExpression.evaluate( document )
.stream()
.map(o -> extractValue(o))
.collect(Collectors.toList());
Note that the XPath spec for Element nodes is that the string-value is the concatination of the Element's text() content as well as all child elements' content. Thus, in the following XML snippet:
<a>bilbo <b>samwise</b> frodo</a>
the getValue() on the a element will return bilbo samwise frodo, but the getText() will return bilbo frodo. Choose which mechanism you use for the value extraction carefully.
I had the exact same problem and took the approach of recognizing when an attribute is the focus of the Xpath. I solved with two functions. The first complied the XPathExpression for later use:
XPathExpression xpExpression;
if (xpath.matches( ".*/#[\\w]++$")) {
// must be an attribute value we're after..
xpExpression = xpfac.compile(xpath, Filters.attribute(), null, myNSpace);
} else {
xpExpression = xpfac.compile(xpath, Filters.element(), null, myNSpace);
}
The second evaluates and returns a value:
Object target = xpExpression.evaluateFirst(baseEl);
if (target != null) {
String value = null;
if (target instanceof Element) {
Element targetEl = (Element) target;
value = targetEl.getTextNormalize();
} else if (target instanceof Attribute) {
Attribute targetAt = (Attribute) target;
value = targetAt.getValue();
}
I suspect its a matter of coding style whether you prefer the helper function suggested in the previous answer or this approach. Either will work.

Using Java to parse XML

I have made a PHP script that parses an XML file. This is not easy to use and I wanted to implement it in Java.
Inside the first element there are various count of wfs:member elements I loop through:
foreach ($data->children("wfs", true)->member as $member) { }
This was easy to do with Java:
NodeList wfsMember = doc.getElementsByTagName("wfs:member");
for(int i = 0; i < wfsMember.getLength(); i++) { }
I have opened the XML file like this
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
Document doc = documentBuilder.parse(WeatherDatabaseUpdater.class.getResourceAsStream("wfs.xml"));
Then I need to get a attribute from an element called observerdProperty. In PHP this is simple:
$member->
children("omso", true)->PointTimeSeriesObservation->
children("om", true)->observedProperty->
attributes("xlink", true)->href
But in Java, how do I do this? Do I need to use getElementsByTagName and loop through them if I want to go deeper in the structure?`
In PHP the whole script looks the following.
foreach ($data->children("wfs", true)->member as $member) {
$dataType = $dataTypes[(string) $member->
children("omso", true)->PointTimeSeriesObservation->
children("om", true)->observedProperty->
attributes("xlink", true)->href];
foreach ($member->
children("omso", true)->PointTimeSeriesObservation->
children("om", true)->result->
children("wml2", true)->MeasurementTimeseries->
children("wml2", true)->point as $point) {
$time = $point->children("wml2", true)->MeasurementTVP->children("wml2", true)->time;
$value = $point->children("wml2", true)->MeasurementTVP->children("wml2", true)->value;
$data[$dataType][] = array($time, $value)
}
}
In the second foreach I loop through the observation elements and get the time and value data from it. Then I save it in an array. If I need to loop through the elements in Java the way I described, this is very hard to implement. I don't think that is the case, so could someone advice me how to implement something similar in Java?
The easiest way, if performance is not a main concern, is probably XPath. With XPath, you can find nodes and attributes simply by specifying a path.
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile(<xpath_expression>);
NodeList nl = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
The xpath_expression could be as simple as
"string(//member/observedProperty/#href)"
For more information about XPath, XPath Tutorial from W3Schools is pretty good.
You have few variations how to implement XML parsing at Java.
The most common is: DOM, SAX, StAX.
Everyone one has pros and cons. With Dom and Sax you able to validate your xml with xsd schema. But Stax works without xsd validation, and much faster.
For example, xml file:
<?xml version="1.0" encoding="UTF-8"?>
<staff xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="oldEmployee.xsd">
<employee>
<name>Carl Cracker</name>
<salary>75000</salary>
<hiredate year="1987" month="12" day="15" />
</employee>
<employee>
<name>Harry Hacker</name>
<salary>50000</salary>
<hiredate year="1989" month="10" day="1" />
</employee>
<employee>
<name>Tony Tester</name>
<salary>40000</salary>
<hiredate year="1990" month="3" day="15" />
</employee>
</staff>
The longest at implementation (to my mind) DOM parser:
class DomXmlParser {
private Document document;
List<Employee> empList = new ArrayList<>();
public SchemaFactory schemaFactory;
public final String JAXP_SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
public final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";
public DomXmlParser() {
try {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
DocumentBuilder builder = factory.newDocumentBuilder();
document = builder.parse(new File(EMPLOYEE_XML.getFilename()));
} catch (Exception e) {
e.printStackTrace();
}
}
public List<Employee> parseFromXmlToEmployee() {
NodeList nodeList = document.getDocumentElement().getChildNodes();
for (int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i);
if (node instanceof Element) {
Employee emp = new Employee();
NodeList childNodes = node.getChildNodes();
for (int j = 0; j < childNodes.getLength(); j++) {
Node cNode = childNodes.item(j);
// identify the child tag of employees
if (cNode instanceof Element) {
switch (cNode.getNodeName()) {
case "name":
emp.setName(text(cNode));
break;
case "salary":
emp.setSalary(Double.parseDouble(text(cNode)));
break;
case "hiredate":
int yearAttr = Integer.parseInt(cNode.getAttributes().getNamedItem("year").getNodeValue());
int monthAttr = Integer.parseInt(cNode.getAttributes().getNamedItem("month").getNodeValue());
int dayAttr = Integer.parseInt(cNode.getAttributes().getNamedItem("day").getNodeValue());
emp.setHireDay(yearAttr, monthAttr - 1, dayAttr);
break;
}
}
}
empList.add(emp);
}
}
return empList;
}
private String text(Node cNode) {
return cNode.getTextContent().trim();
}
}
SAX parser:
class SaxHandler extends DefaultHandler {
private Stack<String> elementStack = new Stack<>();
private Stack<Object> objectStack = new Stack<>();
public List<Employee> employees = new ArrayList<>();
Employee employee = null;
#Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
this.elementStack.push(qName);
if ("employee".equals(qName)) {
employee = new Employee();
this.objectStack.push(employee);
this.employees.add(employee);
}
if("hiredate".equals(qName))
{
int yearatt = Integer.parseInt(attributes.getValue("year"));
int monthatt = Integer.parseInt(attributes.getValue("month"));
int dayatt = Integer.parseInt(attributes.getValue("day"));
if (employee != null) {
employee.setHireDay(yearatt, monthatt - 1, dayatt) ;
}
}
}
#Override
public void endElement(String uri, String localName, String qName) throws SAXException {
this.elementStack.pop();
if ("employee".equals(qName)) {
Object objects = this.objectStack.pop();
}
}
#Override
public void characters(char[] ch, int start, int length) throws SAXException {
String value = new String(ch, start, length).trim();
if (value.length() == 0) return; // skip white space
if ("name".equals(currentElement())) {
employee = (Employee) this.objectStack.peek();
employee.setName(value);
} else if ("salary".equals(currentElement()) && "employee".equals(currentParrentElement())) {
employee.setSalary(Double.parseDouble(value));
}
}
private String currentElement() {
return this.elementStack.peek();
}
private String currentParrentElement() {
if (this.elementStack.size() < 2) return null;
return this.elementStack.get(this.elementStack.size() - 2);
}
}
Stax parser:
class StaxXmlParser {
private List<Employee> employeeList;
private Employee currentEmployee;
private String tagContent;
private String attrContent;
private XMLStreamReader reader;
public StaxXmlParser(String filename) {
employeeList = null;
currentEmployee = null;
tagContent = null;
try {
XMLInputFactory factory = XMLInputFactory.newFactory();
reader = factory.createXMLStreamReader(new FileInputStream(new File(filename)));
parseEmployee();
} catch (Exception e) {
e.printStackTrace();
}
}
public List<Employee> parseEmployee() throws XMLStreamException {
while (reader.hasNext()) {
int event = reader.next();
switch (event) {
case XMLStreamConstants.START_ELEMENT:
if ("employee".equals(reader.getLocalName())) {
currentEmployee = new Employee();
}
if ("staff".equals(reader.getLocalName())) {
employeeList = new ArrayList<>();
}
if ("hiredate".equals(reader.getLocalName())) {
int yearAttr = Integer.parseInt(reader.getAttributeValue(null, "year"));
int monthAttr = Integer.parseInt(reader.getAttributeValue(null, "month"));
int dayAttr = Integer.parseInt(reader.getAttributeValue(null, "day"));
currentEmployee.setHireDay(yearAttr, monthAttr - 1, dayAttr);
}
break;
case XMLStreamConstants.CHARACTERS:
tagContent = reader.getText().trim();
break;
case XMLStreamConstants.ATTRIBUTE:
int count = reader.getAttributeCount();
for (int i = 0; i < count; i++) {
System.out.printf("count is: %d%n", count);
}
break;
case XMLStreamConstants.END_ELEMENT:
switch (reader.getLocalName()) {
case "employee":
employeeList.add(currentEmployee);
break;
case "name":
currentEmployee.setName(tagContent);
break;
case "salary":
currentEmployee.setSalary(Double.parseDouble(tagContent));
break;
}
}
}
return employeeList;
}
}
And some main() test:
public static void main(String[] args) {
long startTime, elapsedTime;
Main main = new Main();
startTime = System.currentTimeMillis();
main.testSaxParser(); // test
elapsedTime = System.currentTimeMillis() - startTime;
System.out.println(String.format("Parsing time is: %d ms%n", elapsedTime / 1000));
startTime = System.currentTimeMillis();
main.testStaxParser(); // test
elapsedTime = System.currentTimeMillis() - startTime;
System.out.println(String.format("Parsing time is: %d ms%n", elapsedTime / 1000));
startTime = System.currentTimeMillis();
main.testDomParser(); // test
elapsedTime = System.currentTimeMillis() - startTime;
System.out.println(String.format("Parsing time is: %d ms%n", elapsedTime / 1000));
}
Output:
Using SAX Parser:
-----------------
Employee { name=Carl Cracker, salary=75000.0, hireDay=Tue Dec 15 00:00:00 EET 1987 }
Employee { name=Harry Hacker, salary=50000.0, hireDay=Sun Oct 01 00:00:00 EET 1989 }
Employee { name=Tony Tester, salary=40000.0, hireDay=Thu Mar 15 00:00:00 EET 1990 }
Parsing time is: 106 ms
Using StAX Parser:
------------------
Employee { name=Carl Cracker, salary=75000.0, hireDay=Tue Dec 15 00:00:00 EET 1987 }
Employee { name=Harry Hacker, salary=50000.0, hireDay=Sun Oct 01 00:00:00 EET 1989 }
Employee { name=Tony Tester, salary=40000.0, hireDay=Thu Mar 15 00:00:00 EET 1990 }
Parsing time is: 5 ms
Using DOM Parser:
-----------------
Employee { name=Carl Cracker, salary=75000.0, hireDay=Tue Dec 15 00:00:00 EET 1987 }
Employee { name=Harry Hacker, salary=50000.0, hireDay=Sun Oct 01 00:00:00 EET 1989 }
Employee { name=Tony Tester, salary=40000.0, hireDay=Thu Mar 15 00:00:00 EET 1990 }
Parsing time is: 13 ms
You can see some glimpse view at there variations.
But at java exist other as JAXB - You need to have xsd schema and accord to this schema you generate classes. After this you this can use unmarchal() to read from xml file:
public class JaxbDemo {
public static void main(String[] args) {
try {
long startTime = System.currentTimeMillis();
// create jaxb and instantiate marshaller
JAXBContext context = JAXBContext.newInstance(Staff.class.getPackage().getName());
FileInputStream in = new FileInputStream(new File(Files.EMPLOYEE_XML.getFilename()));
System.out.println("Output from employee XML file");
Unmarshaller um = context.createUnmarshaller();
Staff staff = (Staff) um.unmarshal(in);
// print employee list
for (Staff.Employee emp : staff.getEmployee()) {
System.out.println(emp);
}
long elapsedTime = System.currentTimeMillis() - startTime;
System.out.println(String.format("Parsing time is: %d ms%n", elapsedTime));
} catch (Exception e) {
e.printStackTrace();
}
}
}
I tried this one approach as before, result is next:
Employee { name='Carl Cracker', salary=75000, hiredate=1987-12-15 } }
Employee { name='Harry Hacker', salary=50000, hiredate=1989-10-1 } }
Employee { name='Tony Tester', salary=40000, hiredate=1990-3-15 } }
Parsing time is: 320 ms
I added another toString(), and it has different hire day format.
Here is few links that is interesting for you:
Java & XML Tutorial
JAXB Tutorial
DOM Parser through Recursion
Using a DOM parser, you can easily get into a mess of nested for loops as you've already pointed out. Nevertheless, DOM structure is represented by Node containing child nodes collection in the form of a NodeList where each element is again a Node - this becomes a perfect candidate for recursion.
Sample XML
To showcase the ability of DOM parser discounting the size of the XML, I took the example of a hosted sample OpenWeatherMap XML.
Searching by city name in XML format
This XML contains London's weather forecast for every 3 hour duration. This XML makes a good case of reading through a relatively large data set and extracting specific information through attributes within the child elements.
In the snapshot, we are targeting to gather the Elements marked by the arrows.
The Code
We start of by creating a Custom class to hold temperature and clouds values. We would also override toString() of this custom class to conveniently print our records.
ForeCast.java
public class ForeCast {
/**
* Overridden toString() to conveniently print the results
*/
#Override
public String toString() {
return "The minimum temperature is: " + getTemperature()
+ " and the weather overall: " + getClouds();
}
public String getTemperature() {
return temperature;
}
public void setTemperature(String temperature) {
this.temperature = temperature;
}
public String getClouds() {
return clouds;
}
public void setClouds(String clouds) {
this.clouds = clouds;
}
private String temperature;
private String clouds;
}
Now to the main class. In the main class where we perform our recursion, we want to create a List of ForeCast objects which store individual temperature and clouds records by traversing the entire XML.
// List collection which is would hold all the data parsed through the XML
// in the format defined by the custom type 'ForeCast'
private static List<ForeCast> forecastList = new ArrayList<>();
In the XML the parent to both temperature and clouds elements is time, we would logically check for the time element.
/**
* Logical block
*/
// As per the XML syntax our 2 fields temperature and clouds come
// directly under the Node/Element time
if (node.getNodeName().equals("time")
&& node.getNodeType() == Node.ELEMENT_NODE) {
// Instantiate our custom forecast object
forecastObj = new ForeCast();
Element timeElement = (Element) node;
Thereafter, we would get a handle on temperature and clouds elements which can be set to the ForeCast object.
// Get the temperature element by its tag name within the XML (0th
// index known)
Element tempElement = (Element) timeElement.getElementsByTagName("temperature").item(0);
// Minimum temperature value is selectively picked (for proof of concept)
forecastObj.setTemperature(tempElement.getAttribute("min"));
// Similarly get the clouds element
Element cloudElement = (Element) timeElement.getElementsByTagName("clouds").item(0);
forecastObj.setClouds(cloudElement.getAttribute("value"));
The complete class below:
CustomDomXmlParser.java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class CustomDomXmlParser {
// List collection which is would hold all the data parsed through the XML
// in the format defined by the custom type 'ForeCast'
private static List<ForeCast> forecastList = new ArrayList<>();
public static void main(String[] args) throws ParserConfigurationException,
SAXException, IOException {
// Read XML throuhg a URL (a FileInputStream can be used to pick up an
// XML file from the file system)
InputStream path = new URL(
"http://api.openweathermap.org/data/2.5/forecast?q=London,us&mode=xml")
.openStream();
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(path);
// Call to the recursive method with the parent node
traverse(document.getDocumentElement());
// Print the List values collected within the recursive method
for (ForeCast forecastObj : forecastList)
System.out.println(forecastObj);
}
/**
*
* #param node
*/
public static void traverse(Node node) {
// Get the list of Child Nodes immediate to the current node
NodeList list = node.getChildNodes();
// Declare our local instance of forecast object
ForeCast forecastObj = null;
/**
* Logical block
*/
// As per the XML syntax our 2 fields temperature and clouds come
// directly under the Node/Element time
if (node.getNodeName().equals("time")
&& node.getNodeType() == Node.ELEMENT_NODE) {
// Instantiate our custom forecast object
forecastObj = new ForeCast();
Element timeElement = (Element) node;
// Get the temperature element by its tag name within the XML (0th
// index known)
Element tempElement = (Element) timeElement.getElementsByTagName(
"temperature").item(0);
// Minimum temperature value is selectively picked (for proof of
// concept)
forecastObj.setTemperature(tempElement.getAttribute("min"));
// Similarly get the clouds element
Element cloudElement = (Element) timeElement.getElementsByTagName(
"clouds").item(0);
forecastObj.setClouds(cloudElement.getAttribute("value"));
}
// Add our foreCastObj if initialized within this recursion, that is if
// it traverses the time node within the XML, and not in any other case
if (forecastObj != null)
forecastList.add(forecastObj);
/**
* Recursion block
*/
// Iterate over the next child nodes
for (int i = 0; i < list.getLength(); i++) {
Node currentNode = list.item(i);
// Recursively invoke the method for the current node
traverse(currentNode);
}
}
}
The Output
As you can figure out from the screenshot below, we were able to group together the 2 specific elements and assign their values effectively to a Java Collection instance. We delegated the complex parsing of the xml to the generic recursive solution and customized mainly the logical block part. As mentioned, it is a genetic solution with a minimal customization which can work through all valid xmls.
Alternatives
Many other alternatives are available, here is a list of open source XML parsers for Java.
However, your approach with PHP and your initial work with Java based parser aligns to the DOM based XML parser solution, simplified by the use of recursion.
I wouldn't suggest you to implement your own parse function for XML parsing since there are already many options out there. My suggestion is DOM parser. You can find few examples in the following link. (You can also choose from other available options)
http://www.javacodegeeks.com/2013/05/parsing-xml-using-dom-sax-and-stax-parser-in-java.html
You can use commands such as
eElement.getAttribute("id");
Source: http://www.mkyong.com/java/how-to-read-xml-file-in-java-dom-parser/
I agree what has been already posted about not implementing parse functions yourself.
Instead of DOM/SAX/STAX parsers though, I would suggest using JDOM or XOM, which are external libraries.
Related discussions:
What Java XML library do you recommend (to replace dom4j)?
Should I still be using JDOM with Java 5 or 6?
My gut feeling is that jdom is the one most java developers use. Some use dom4j, some xom, some others, but hardly anybody implements these parsing functions themselves.
use Java
startElement and endElement
for DOM Parsers

Get XPath of XML Tag

If I have an XML document like below:
<foo>
<foo1>Foo Test 1</foo1>
<foo2>
<another1>
<test10>This is a duplicate</test10>
</another1>
</foo2>
<foo2>
<another1>
<test1>Foo Test 2</test1>
</another1>
</foo2>
<foo3>Foo Test 3</foo3>
<foo4>Foo Test 4</foo4>
</foo>
How do I get the XPath of <test1> for example? So the output should be something like: foo/foo2[2]/another1/test1
I'm guessing the code would look something like this:
public String getXPath(Document document, String xmlTag) {
String xpath = "";
...
//Get the node from xmlTag
//Get the xpath using the node
return xpath;
}
Let's say String XPathVar = getXPath(document, "<test1>");. I need to get back an absolute xpath that will work in the following code:
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression xpr = xpath.compile(XPathVar);
xpr.evaluate(Document, XPathConstants.STRING);
But it can't be a shortcut like //test1 because it will also be used for meta data purposes.
When printing the result out via:
System.out.println(xpr.evaluate(Document, XPathConstants.STRING));
I should get the node's value. So if XPathVar = foo/foo2[2]/another1/test1 then I should get back:
Foo Test 2 and not This is a duplicate
You don't 'get' an xpath in the same way you don't 'get' sql.
An xpath is a query you write based on your understanding of an xml document or schema, just as sql is a query you write based on your understanding of a database schema - you don't 'get' either of them.
I would be possible to generate xpath statements from the DOM simply by walking back up the nodes from a given node, though to do this generically enough, taking into account attribute values on each node, would make the resulting code next to useless. For example (which comes with a warning that this will find the first node that has a given name, xpath is much more that this and you may as well just use the xpath //foo2):
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
public class XPathExample
{
private static String getXPath(Node root, String elementName)
{
for (int i = 0; i < root.getChildNodes().getLength(); i++)
{
Node node = root.getChildNodes().item(i);
if (node instanceof Element)
{
if (node.getNodeName().equals(elementName))
{
return "/" + node.getNodeName();
}
else if (node.getChildNodes().getLength() > 0)
{
String xpath = getXPath(node, elementName);
if (xpath != null)
{
return "/" + node.getNodeName() + xpath;
}
}
}
}
return null;
}
private static String getXPath(Document document, String elementName)
{
return document.getDocumentElement().getNodeName() + getXPath(document.getDocumentElement(), elementName);
}
public static void main(String[] args)
{
try
{
Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(
new ByteArrayInputStream(
("<foo><foo1>Foo Test 1</foo1><foo2><another1><test1>Foo Test 2</test1></another1></foo2><foo3>Foo Test 3</foo3><foo4>Foo Test 4</foo4></foo>").getBytes()
)
);
String xpath = "/" + getXPath(document, "test1");
System.out.println(xpath);
Node node1 = (Node)XPathFactory.newInstance().newXPath().compile(xpath).evaluate(document, XPathConstants.NODE);
Node node2 = (Node)XPathFactory.newInstance().newXPath().compile("//test1").evaluate(document, XPathConstants.NODE);
//This evaluates to true, hence you may as well just use the xpath //test1.
System.out.println(node1.equals(node2));
}
catch (Exception e)
{
e.printStackTrace();
}
}
}
Likewise you could write an XML transformation that turned an xml document into a series of xpath statements but this transformation would be more complicated that writing the xpath in the first place and so largely pointless.
How's this:
private static String getXPath(Document root, String elementName)
{
try{
XPathExpression expr = XPathFactory.newInstance().newXPath().compile("//" + elementName);
Node node = (Node)expr.evaluate(root, XPathConstants.NODE);
if(node != null) {
return getXPath(node);
}
}
catch(XPathExpressionException e) { }
return null;
}
private static String getXPath(Node node) {
if(node == null || node.getNodeType() != Node.ELEMENT_NODE) {
return "";
}
return getXPath(node.getParentNode()) + "/" + node.getNodeName();
}
Note that this is first locating the node (using XPath) and then using the located node to get its XPath. Quite the roundabout approach to get a value you already have.
Working ideone example: http://ideone.com/EL4783

Speeding up xpath

I have a 1000 entry document whose format is something like:
<Example>
<Entry>
<n1></n1>
<n2></n2>
</Entry>
<Entry>
<n1></n1>
<n2></n2>
</Entry>
<!--and so on-->
There are more than 1000 Entry nodes here. I am writing a Java program which basically gets all the node one by one and do some analyzing on each node. But the problem is that the retrieval time of the nodes increases with its no. For example it takes 78 millisecond to retrieve the first node 100 ms to retrieve the second and it keeps on increasing. And to retrieve the 999 node it takes more than 5 second. This is extremely slow. We would be plugging this code to XML files which have even more than 1000 entries. Some like millions. The total time to parse the whole document is more than 5 minutes.
I am using this simple code to traverse it. Here nxp is my own class which has all the methods to get nodes from xpath.
nxp.fromXpathToNode("/Example/Entry" + "[" + i + "]", doc);
and doc is the document for the file. i is the no of node to retrieve.
Also when i try something like this
List<Node> nl = nxp.fromXpathToNodes("/Example/Entry",doc);
content = nl.get(i);
I face the same problem.
Anyone has any solution on how to speed up the tretirival of the nodes, so it takes the same amount of time to get the 1st node as well as the 1000 node from the XML file.
Here is the code for xpathtonode.
public Node fromXpathToNode(String expression, Node context)
{
try
{
return (Node)this.getCachedExpression(expression).evaluate(context, XPathConstants.NODE);
}
catch (Exception cause)
{
throw new RuntimeException(cause);
}
}
and here is the code for fromxpathtonodes.
public List<Node> fromXpathToNodes(String expression, Node context)
{
List<Node> nodes = new ArrayList<Node>();
NodeList results = null;
try
{
results = (NodeList)this.getCachedExpression(expression).evaluate(context, XPathConstants.NODESET);
for (int index = 0; index < results.getLength(); index++)
{
nodes.add(results.item(index));
}
}
catch (Exception cause)
{
throw new RuntimeException(cause);
}
return nodes;
}
and here is the starting
public class NativeXpathEngine implements XpathEngine
{
private final XPathFactory factory;
private final XPath engine;
/**
* Cache for previously compiled XPath expressions. {#link XPathExpression#hashCode()}
* is not reliable or consistent so use the textual representation instead.
*/
private final Map<String, XPathExpression> cachedExpressions;
public NativeXpathEngine()
{
super();
this.factory = XPathFactory.newInstance();
this.engine = factory.newXPath();
this.cachedExpressions = new HashMap<String, XPathExpression>();
}
Try VTD-XML. It uses less memory than DOM. It is easier to use than SAX and supports XPath. Here is some sample code to help you get started. It applies an XPath to get the Entry elements and then prints out the n1 and n2 child elements.
final VTDGen vg = new VTDGen();
vg.parseFile("/path/to/file.xml", false);
final VTDNav vn = vg.getNav();
final AutoPilot ap = new AutoPilot(vn);
ap.selectXPath("/Example/Entry");
int count = 1;
while (ap.evalXPath() != -1) {
System.out.println("Inside Entry: " + count);
//move to n1 child
vn.toElement(VTDNav.FIRST_CHILD, "n1");
System.out.println("\tn1: " + vn.toNormalizedString(vn.getText()));
//move to n2 child
vn.toElement(VTDNav.NEXT_SIBLING, "n2");
System.out.println("\tn2: " + vn.toNormalizedString(vn.getText()));
//move back to parent
vn.toElement(VTDNav.PARENT);
count++;
}
The correct solution is to detach the node right after you call item(i), like so:
Node node = results.item(index)
node.getParentNode().removeChild(node)
nodes.add(node)
See XPath.evaluate performance slows down (absurdly) over multiple calls
I had similar issue with the Xpath Evaluation , I tried using CachedXPathAPI’s which is faster by 100X than the XPathApi’s which was used earlier.
more information about this Api is provided here :
http://xml.apache.org/xalan-j/apidocs/org/apache/xpath/CachedXPathAPI.html
Hope it helps.
Cheers,
Madhusudhan
If you need to parse huge but flat documents, SAX is a good alternative. It allows you to handle the XML as a stream instead of building a huge DOM. Your example could be parsed using a ContentHandler like this:
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.ext.DefaultHandler2;
public class ExampleHandler extends DefaultHandler2 {
private StringBuffer chars = new StringBuffer(1000);
private MyEntry currentEntry;
private MyEntryHandler myEntryHandler;
ExampleHandler(MyEntryHandler myEntryHandler) {
this.myEntryHandler = myEntryHandler;
}
#Override
public void characters(char[] ch, int start, int length)
throws SAXException {
chars.append(ch);
}
#Override
public void endElement(String uri, String localName, String qName)
throws SAXException {
if ("Entry".equals(localName)) {
myEntryHandler.handle(currentEntry);
currentEntry = null;
}
else if ("n1".equals(localName)) {
currentEntry.setN1(chars.toString());
}
else if ("n2".equals(localName)) {
currentEntry.setN2(chars.toString());
}
}
#Override
public void startElement(String uri, String localName, String qName,
Attributes atts) throws SAXException {
chars.setLength(0);
if ("Entry".equals(localName)) {
currentEntry = new MyEntry();
}
}
}
If the document has a deeper and more complex structure, you're going to need to use Stacks to keep track of the current path in the document. Then you should consider writing a general purpose ContentHandler to do the dirty work and use with your document type dependent handlers.
What kind of parser are you using?
DOM pulls the whole document in memory - once you pull the whole document in memory then your operations can be fast but doing so in a web app or a for loop can have an impact.
SAX parser does on demand parsing and loads nodes as and when you request.
So try to use a parser implementation that suits your need.
Use the JAXEN library for xpaths:
http://jaxen.codehaus.org/

Categories