Comparing two XML files and getting the differences - Java

I am trying to compare two XML files. My requirement is to compare the old and new XML files and, if there are any differences, merge them into the new XML file.
The code snippet below only tells me whether the two files are equal.
public static void compare() throws SAXException, IOException, ParserConfigurationException{
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
dbf.setCoalescing(true);
dbf.setIgnoringElementContentWhitespace(true);
dbf.setIgnoringComments(true);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc1 = db.parse(new File("old.xml"));
doc1.normalizeDocument();
Document doc2 = db.parse(new File("new.xml"));
doc2.normalizeDocument();
System.out.println (doc1.isEqualNode(doc2));
}
But I also want the differences. Please tell me how I can get them.
I have already tried XMLUnit, but I don't want to use it.

You can use JAXB to parse your XML, construct Java objects with the unmarshal method, and compare the generated objects.
See: How to compare jaxb, object, string to find differences
You can also use the google-diff-match-patch API to compare your two XML files as strings and build a patch that produces the new one.
https://code.google.com/archive/p/google-diff-match-patch/
Diff demo : https://neil.fraser.name/software/diff_match_patch/svn/trunk/demos/demo_diff.html
Patch Demo : https://neil.fraser.name/software/diff_match_patch/svn/trunk/demos/demo_patch.html
Sources are available on github : https://github.com/GerHobbelt/google-diff-match-patch
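If you would rather stay with the JDK's DOM classes, here is a minimal sketch of a recursive comparison that lists differences instead of only returning true or false. It is not the JAXB or diff-match-patch approach mentioned above. The class name XmlDiffSketch, the helper names, and the path format are my own assumptions. Child elements are paired by position, so an element inserted in the middle will make the following siblings show up as changed. Merging is left out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class XmlDiffSketch {

    // Parses an XML string and returns its root element; comments are ignored.
    static Element parseRoot(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setCoalescing(true);
        dbf.setIgnoringComments(true);
        return dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Convenience entry point: lists the differences between two XML strings.
    static List<String> diff(String oldXml, String newXml) throws Exception {
        List<String> out = new ArrayList<>();
        Element oldRoot = parseRoot(oldXml);
        compare(oldRoot, parseRoot(newXml), "/" + oldRoot.getNodeName(), out);
        return out;
    }

    // Recursively compares two elements and records each difference with its path.
    static void compare(Element a, Element b, String path, List<String> out) {
        if (!a.getNodeName().equals(b.getNodeName())) {
            out.add(path + ": <" + a.getNodeName() + "> vs <" + b.getNodeName() + ">");
            return;
        }
        // Attributes present in the old element: missing or changed in the new one.
        NamedNodeMap attrsA = a.getAttributes();
        for (int i = 0; i < attrsA.getLength(); i++) {
            String name = attrsA.item(i).getNodeName();
            String oldValue = attrsA.item(i).getNodeValue();
            if (!b.hasAttribute(name)) {
                out.add(path + "/@" + name + ": only in old");
            } else if (!oldValue.equals(b.getAttribute(name))) {
                out.add(path + "/@" + name + ": '" + oldValue + "' vs '" + b.getAttribute(name) + "'");
            }
        }
        // Attributes that exist only in the new element.
        NamedNodeMap attrsB = b.getAttributes();
        for (int i = 0; i < attrsB.getLength(); i++) {
            String name = attrsB.item(i).getNodeName();
            if (!a.hasAttribute(name)) {
                out.add(path + "/@" + name + ": only in new");
            }
        }
        // Direct text content, with surrounding whitespace trimmed.
        if (!ownText(a).equals(ownText(b))) {
            out.add(path + ": text '" + ownText(a) + "' vs '" + ownText(b) + "'");
        }
        // Child elements are paired by position.
        List<Element> childrenA = childElements(a);
        List<Element> childrenB = childElements(b);
        int max = Math.max(childrenA.size(), childrenB.size());
        for (int i = 0; i < max; i++) {
            if (i >= childrenB.size()) {
                out.add(path + "/" + childrenA.get(i).getNodeName() + "[" + i + "]: only in old");
            } else if (i >= childrenA.size()) {
                out.add(path + "/" + childrenB.get(i).getNodeName() + "[" + i + "]: only in new");
            } else {
                compare(childrenA.get(i), childrenB.get(i),
                        path + "/" + childrenA.get(i).getNodeName() + "[" + i + "]", out);
            }
        }
    }

    // Returns only the element children, skipping text and other node types.
    static List<Element> childElements(Element parent) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                result.add((Element) n);
            }
        }
        return result;
    }

    // Concatenates the direct text children of an element and trims the result.
    static String ownText(Element e) {
        StringBuilder sb = new StringBuilder();
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE) {
                sb.append(n.getNodeValue());
            }
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        String oldXml = "<root><a id=\"1\">x</a><b/></root>";
        String newXml = "<root><a id=\"2\">x</a><b/><c/></root>";
        for (String difference : diff(oldXml, newXml)) {
            System.out.println(difference);
        }
    }
}
```

To use it with the files from the question, read old.xml and new.xml into strings first, or change parseRoot to accept a File.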

Related

Special characters create problems while writing XML

First of all, please excuse my shallow understanding of coding, as I am a business analyst. Now my question: I am writing Java code to convert a CSV file into XML. I can read the CSV into objects successfully. However, while writing the XML, an error is thrown when a special character such as a space or "=" is encountered.
Here is the problematic piece of code. I have hard-coded the value passed to createElement just to highlight the problem; in the actual code I get this value from an object:
DocumentBuilderFactory documentFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentFactory.newDocumentBuilder();
Document xmlDocument= documentBuilder.newDocument();
Element root = xmlDocument.createElement("Media NationalGroupId=\"8\" AllFTA=\"1002\" AllSTV=\"1001\"");
xmlDocument.appendChild(root);
My xml should look something like this
<Media DateCreated="20200224 145251" NationalGroupId="8" AllFTA="1002" AllSTV="1001" AllTV="1000" NextId="1000000">
createElement should only receive Media as the argument.
To add the other attributes (DateCreated, NationalGroupId, etc), you need to call setAttribute on root, one by one.
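A minimal sketch of what that could look like, using the attribute values from the expected output in the question. The class name MediaElementSketch and the method name buildMediaDocument are only illustrative.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class MediaElementSketch {

    // Builds a document whose root is <Media> with the attributes from the question.
    static Document buildMediaDocument() throws Exception {
        DocumentBuilderFactory documentFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder documentBuilder = documentFactory.newDocumentBuilder();
        Document xmlDocument = documentBuilder.newDocument();

        // The element name must be a valid XML name: no spaces, no "=".
        Element root = xmlDocument.createElement("Media");

        // Each attribute is set separately as a name/value pair.
        root.setAttribute("DateCreated", "20200224 145251");
        root.setAttribute("NationalGroupId", "8");
        root.setAttribute("AllFTA", "1002");
        root.setAttribute("AllSTV", "1001");
        root.setAttribute("AllTV", "1000");
        root.setAttribute("NextId", "1000000");

        xmlDocument.appendChild(root);
        return xmlDocument;
    }

    public static void main(String[] args) throws Exception {
        Element root = buildMediaDocument().getDocumentElement();
        System.out.println(root.getTagName() + " NationalGroupId=" + root.getAttribute("NationalGroupId"));
    }
}
```

Attribute values may contain spaces (as DateCreated does); only the element and attribute names are restricted.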

Steps for creating a Document object

I am learning about XML in Java and every time I want to use a Document object I have to write:
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
I know how it works further, but what actually happens in those 3 lines? Why do I need a DocumentBuilderFactory and then a DocumentBuilder to build a Document?
Update: Could you give me an example where I shouldn't write the first 2 lines exactly the same? I don't see the point of instantiating 2 more objects for a new Document. What is their effective role?
1) Factory (creates something) can create a DocumentBuilder
Obtain a new instance of a DocumentBuilderFactory. This static method
creates a new factory instance.
2)
Creates a new instance of a DocumentBuilder using the currently
configured parameters.
3)
Parse the content of the given file as an XML document and return a
new DOM Document object. An IllegalArgumentException is thrown if the
File is null.
Source
This is how the library is built. Without the factory you cannot create a new DocumentBuilder object and thus cannot parse a file.
The approach you use for the XML parsing is known as the Document Object Model (DOM) approach (note: it is not the only one available) and a part of Java API for XML Processing (JAXP). Quoting:
Designed to be flexible, JAXP allows you to use any XML-compliant
parser from within your application
To allow the programmer to use any XML parser, the system needs to avoid using a specific implementation. To be able to do that it decides the implementation during runtime using a design pattern known as the Factory pattern which (quoting) "...deals with the problem of creating objects (products) without specifying the exact class of object that will be created."
So when you use DocumentBuilder dBuilder = dbFactory.newDocumentBuilder(); the returned instance is not actually a DocumentBuilder (it couldn't be - this is an abstract class) but an instance of another class that extends DocumentBuilder. You could print the actual class in runtime to verify that.
// returns com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl in my system
System.out.println( dbFactory.getClass().getName() );
// returns com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl in my system
System.out.println( dBuilder.getClass().getName() );
Examples where you wouldn't need the first two lines would be cases where you use a specific parser implementation directly (thereby introducing a third-party dependency into your project).
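As for when the first two lines are not written "exactly the same": the factory is also where you configure the parser before a builder is created. Here is a small sketch, with a class name and sample XML of my own choosing, where two differently configured factories yield builders that parse the same input differently.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class FactoryConfigSketch {

    // Parses a fixed sample document and returns how many child nodes the root has.
    static int countRootChildren(boolean ignoreComments) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Configuration is applied to the factory, before the builder exists.
        factory.setIgnoringComments(ignoreComments);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
                new InputSource(new StringReader("<a><!-- note --><b/></a>")));
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        // Comment dropped: only <b/> remains under the root.
        System.out.println(countRootChildren(true));
        // Comment kept: the comment node and <b/> are both children.
        System.out.println(countRootChildren(false));
    }
}
```

The builder inherits whatever settings the factory had at the moment newDocumentBuilder() was called, which is why the two objects are kept separate.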
I hope this helps
Right from the javadocs:
DocumentBuilderFactory.newInstance()
Obtain a new instance of a DocumentBuilderFactory. This static method
creates a new factory instance. This method uses the following ordered
lookup procedure to determine the DocumentBuilderFactory
implementation class to load:
Use the javax.xml.parsers.DocumentBuilderFactory system property.
Use the properties file "lib/jaxp.properties" in the JRE directory. This configuration file is in standard java.util.Properties format
and contains the fully qualified name of the implementation class
with the key being the system property defined above. The
jaxp.properties file is read only once by the JAXP implementation and
its values are then cached for future use. If the file does not
exist when the first attempt is made to read from it, no further
attempts are made to check for its existence. It is not possible to
change the value of any property in jaxp.properties after it has been
read for the first time.
Use the Services API (as detailed in the JAR specification), if available, to determine the classname. The Services API will look for
a classname in the file
META-INF/services/javax.xml.parsers.DocumentBuilderFactory in jars
available to the runtime.
Platform default DocumentBuilderFactory instance.
Once an application has obtained a reference to a
DocumentBuilderFactory it can use the factory to configure and obtain
parser instances.
DocumentBuilderFactory.newDocumentBuilder()
Creates a new instance of a DocumentBuilder using the currently
configured parameters.
Returns: A new instance of a DocumentBuilder.
Throws: ParserConfigurationException - if a DocumentBuilder cannot be
created which satisfies the configuration requested.
DocumentBuilder.parse()
Parse the content of the given file as an XML document and return a
new DOM Document object. An IllegalArgumentException is thrown if the
File is null.
Parameters: f - The file containing the XML to parse.
Returns: A new DOM Document object.
Throws: IOException - If any IO errors occur. SAXException - If any
parse errors occur. IllegalArgumentException - When f is null

create simple xml in Java

I am trying to build a server that sends an XML file to a client. I am getting the data from a database and want to build the XML file from it.
But I have a problem with:
DocumentBuilder documentBuilder = null;
Document doc =documentBuilder.newDocument();
I am getting a NullPointerException. Here is my full code:
public void createXmlTree() throws Exception {
//This method creates an element node
DocumentBuilder documentBuilder = null;
Document doc =documentBuilder.newDocument();
Element root = doc.createElement("items");
//adding a node after the last child node of the specified node.
doc.appendChild(root);
for(int i=0;i<db.stories.size();i++){
Element child = doc.createElement("item");
root.appendChild(child);
Element child1 = doc.createElement("title");
child.appendChild(child1);
Text text = doc.createTextNode(db.stories.get(i).title);
child1.appendChild(text);
//Comment comment = doc.createComment("Employee in roseindia");
//child.appendChild(comment);
Element child2 = doc.createElement("date");
child.appendChild(child2);
Text text2 = doc.createTextNode(db.stories.get(i).date);
child2.appendChild(text2);
Element child3 = doc.createElement("text");
child.appendChild(child3);
Text text3 = doc.createTextNode(db.stories.get(i).text);
child3.appendChild(text3);
root.appendChild(child3);
Well yes, you would get a NullPointerException. You're calling a method on a null reference - very clearly, given that you've assigned the documentBuilder a null value on the line before. You need to get an instance of DocumentBuilder to start with. For example:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = factory.newDocumentBuilder();
Of course you are getting a NullPointerException: your DocumentBuilder is null.
Try instantiating it first.
// Step 1: create a DocumentBuilderFactory
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
// Step 2: create a DocumentBuilder
DocumentBuilder db = dbf.newDocumentBuilder();
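Putting it together, here is a sketch of the method from the question with the builder obtained from a factory. Story is a hypothetical stand-in for whatever db.stories holds, and appendTextChild is a helper I introduced to avoid repeating the create/append steps. Unlike the original, each text element is appended only to its item, not moved to the root afterwards.

```java
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class ItemsXmlSketch {

    // Hypothetical stand-in for the entries of db.stories in the question.
    static class Story {
        final String title;
        final String date;
        final String text;

        Story(String title, String date, String text) {
            this.title = title;
            this.date = date;
            this.text = text;
        }
    }

    // Builds <items><item><title/><date/><text/></item>...</items> from the stories.
    static Document createXmlTree(List<Story> stories) throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder documentBuilder = factory.newDocumentBuilder();
        Document doc = documentBuilder.newDocument();

        Element root = doc.createElement("items");
        doc.appendChild(root);

        for (Story story : stories) {
            Element item = doc.createElement("item");
            root.appendChild(item);
            appendTextChild(doc, item, "title", story.title);
            appendTextChild(doc, item, "date", story.date);
            appendTextChild(doc, item, "text", story.text);
        }
        return doc;
    }

    // Creates <name>value</name> and appends it to the given parent element.
    static void appendTextChild(Document doc, Element parent, String name, String value) {
        Element child = doc.createElement(name);
        child.appendChild(doc.createTextNode(value));
        parent.appendChild(child);
    }

    public static void main(String[] args) throws Exception {
        Document doc = createXmlTree(Arrays.asList(
                new Story("First", "2020-01-01", "Body of the first story"),
                new Story("Second", "2020-01-02", "Body of the second story")));
        System.out.println(doc.getElementsByTagName("item").getLength() + " items built");
    }
}
```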
The others are right about DocumentBuilder, but may I offer another solution? Your servlet mostly generates XML, i.e. it produces markup, and that is what JSP is for. You can implement a simple JSP page that contains a template of your XML plus some code that inserts the dynamic data. This is much simpler and easier to maintain.
Yes, JSPs typically generate HTML, but nothing prevents them from generating XML or any other text format. Just do not forget to set the content type to text/xml.
Do you really need to write your XML manually?
Do you have the XSD of the XML you want to write?
If so, it would be easier to generate classes using XJC/JAXB and use the marshaller to write your XML file.

Parse a single Line of XML into a HashMap

I'm building an android app that communicates to a web server and am struggling with the following scenario:
Given ONE line of XML in a String eg:
"<test one="1" two="2" />"
I would like to extract the values into a HashMap so that:
map.get("one") = "1"
map.get("two") = "2"
I can already do this with a full XML document using the SAX parser, but it complains when I try to give it just the above string, with a MalformedURLException: Protocol not found
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder;
Document doc = null;
builder = factory.newDocumentBuilder();
doc = builder.parse("<test one=\"1\" two=\"2\" />"); //here
I realize some regex could do this, but I'd really rather do it properly.
The same behaviour can be found at http://metacpan.org/pod/XML::Simple#XMLin which is what the web server uses.
Can anyone help? Thanks :D
DocumentBuilder.parse(String) treats the string as a URL. Try this instead:
Document doc = builder.parse(new InputSource(new StringReader(text)));
(where text contains the XML, of course).
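For the HashMap part of the question, here is a sketch that parses the string this way and copies the root element's attributes into a map. The class name AttributesToMapSketch and the method name attributesToMap are made up for illustration.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class AttributesToMapSketch {

    // Parses a single-element XML string and returns the root's attributes as a map.
    static Map<String, String> attributesToMap(String text) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Wrapping the string in an InputSource makes the parser read it as content, not as a URL.
        Document doc = builder.parse(new InputSource(new StringReader(text)));

        NamedNodeMap attributes = doc.getDocumentElement().getAttributes();
        Map<String, String> map = new HashMap<>();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attribute = attributes.item(i);
            map.put(attribute.getNodeName(), attribute.getNodeValue());
        }
        return map;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> map = attributesToMap("<test one=\"1\" two=\"2\" />");
        System.out.println("one=" + map.get("one") + ", two=" + map.get("two"));
    }
}
```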

org.w3c.dom.Document to String without javax.xml.transform

I've spent a while looking around on Google for a way to convert an org.w3c.dom.Document to a string representation of the whole DOM tree, so I can save the object to the file system.
However all the solutions I've found use javax.xml.transform.Transformer which isn't supported as part of the Android 2.1 API. How can I do this without using this class/containing package?
Please try this code:
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
Document doc = docBuilder.parse("/path/to/file.xml");
// Obtain the DOM Level 3 Load and Save (LS) implementation from the parsed document
DOMImplementation domImpl = doc.getImplementation();
DOMImplementationLS domImplLS = (DOMImplementationLS) domImpl.getFeature("LS", "3.0");
LSSerializer serializer = domImplLS.createLSSerializer();
serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
// Serialize into an in-memory writer (java.io.StringWriter)
StringWriter output = new StringWriter();
LSOutput lsOutput = domImplLS.createLSOutput();
lsOutput.setCharacterStream(output);
serializer.write(doc, lsOutput);
String xml = output.toString();
To avoid using Transformer you should manually iterate over your xml tree, otherwise you can rely on some external libraries. You should take a look here.
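A rough sketch of what manual iteration could look like, using only org.w3c.dom. The class name ManualXmlWriterSketch and its methods are illustrative. It handles elements, attributes, text and CDATA only. Comments, processing instructions, the XML declaration and pretty-printing are deliberately left out, so treat it as a starting point rather than a complete serializer.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ManualXmlWriterSketch {

    // Serializes a document or element to a string by walking the tree.
    static String toXml(Node node) {
        StringBuilder out = new StringBuilder();
        write(node, out);
        return out.toString();
    }

    static void write(Node node, StringBuilder out) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                // A document has no markup of its own; just descend into its children.
                writeChildren(node, out);
                break;
            case Node.ELEMENT_NODE:
                out.append('<').append(node.getNodeName());
                NamedNodeMap attributes = node.getAttributes();
                for (int i = 0; i < attributes.getLength(); i++) {
                    Node attribute = attributes.item(i);
                    out.append(' ').append(attribute.getNodeName()).append("=\"")
                       .append(escape(attribute.getNodeValue(), true)).append('"');
                }
                if (!node.hasChildNodes()) {
                    out.append("/>");
                    break;
                }
                out.append('>');
                writeChildren(node, out);
                out.append("</").append(node.getNodeName()).append('>');
                break;
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE:
                // CDATA is written back as escaped text.
                out.append(escape(node.getNodeValue(), false));
                break;
            default:
                // Comments, processing instructions, etc. are skipped in this sketch.
                break;
        }
    }

    static void writeChildren(Node parent, StringBuilder out) {
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            write(child, out);
        }
    }

    // Escapes the characters that are not allowed verbatim in text or attribute values.
    static String escape(String value, boolean inAttribute) {
        String escaped = value.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        return inAttribute ? escaped.replace("\"", "&quot;") : escaped;
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader("<a x=\"1\"><b>t &amp; u</b><c/></a>")));
        System.out.println(toXml(doc));
    }
}
```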
