How to match two XML strings? - java

I am getting some information in the form of XML.
Before using that XML, I want to validate that all the required information is present in it.
For this purpose I will keep a master copy of the XML, against which I will match all incoming documents.
How can I do this?

It looks like a lot of work, but depending on the size and structure of your XML you could use XPath. Take a look at http://www.w3schools.com/xpath/default.asp.
Also there is a really good starting tutorial here:
http://www.ibm.com/developerworks/library/x-javaxpathapi.html
And if you're willing to validate against your XSD (if there is one), look at the question "XML Schema (XSD) validation tool?".
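A rough sketch of the XPath idea, assuming the "master copy" can be reduced to a list of XPath expressions that must each match something in the incoming document. The class name, method name and sample element names are made up for illustration, and the document is assumed not to use namespaces.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class RequiredPathCheck {
    // Returns the required XPath expressions that match nothing in the XML string.
    static List<String> missingPaths(String xml, List<String> requiredPaths) {
        try {
            // Parse once, then evaluate every expression against the same DOM tree.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            List<String> missing = new ArrayList<>();
            for (String path : requiredPaths) {
                // With BOOLEAN as return type, an empty node-set evaluates to false.
                Boolean found = (Boolean) xpath.evaluate(path, doc, XPathConstants.BOOLEAN);
                if (!found) {
                    missing.add(path);
                }
            }
            return missing;
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalArgumentException("Could not check XML", e);
        }
    }

    public static void main(String[] args) {
        String incoming = "<order><id>7</id><customer><name>Ann</name></customer></order>";
        List<String> required = List.of("/order/id", "/order/customer/name", "/order/total");
        System.out.println("Missing: " + missingPaths(incoming, required));
    }
}
```

This only checks presence. Value ranges and ordering are easier to express in a schema.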

A common approach for validating XML is to define a schema (XSD or DTD). A validating parser can then check the input document against all the constraints specified in the schema.
This is the usual way to make sure that certain elements are present and that certain values fall within a specified range.
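As a minimal sketch of schema validation with the JDK's `javax.xml.validation` API: the inline XSD below is a made-up example (an `order` with a required `id` and a `quantity` between 1 and 100), and the class and method names are hypothetical. In practice the schema would normally be loaded from a file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Optional;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class XsdCheck {
    // Hypothetical schema: <order> must contain <id> and then <quantity> in 1..100.
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='order'><xs:complexType><xs:sequence>"
          + "<xs:element name='id' type='xs:positiveInteger'/>"
          + "<xs:element name='quantity'><xs:simpleType><xs:restriction base='xs:int'>"
          + "<xs:minInclusive value='1'/><xs:maxInclusive value='100'/>"
          + "</xs:restriction></xs:simpleType></xs:element>"
          + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    static Schema compileSchema() {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("The schema itself is broken", e);
        }
    }

    // Returns the first validation error, or an empty Optional if the document is valid.
    static Optional<String> firstError(String xml) {
        try {
            // Without a custom ErrorHandler, the validator throws on the first error.
            compileSchema().newValidator().validate(new StreamSource(new StringReader(xml)));
            return Optional.empty();
        } catch (SAXException e) {
            return Optional.of(String.valueOf(e.getMessage()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstError("<order><id>1</id><quantity>5</quantity></order>"));
        System.out.println(firstError("<order><id>1</id><quantity>500</quantity></order>"));
    }
}
```

The first call should report no error. The second should report a range violation, with the exact message depending on the parser implementation.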

You can refer to the following link to see how an XML document can be validated against a DTD in Java:
http://www.roseindia.net/xml/dom/DOMValidateDTD.shtml
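For reference, a small sketch of DTD validation with the JDK's DOM parser (not the code from the linked page). It assumes the document declares its DTD in a `DOCTYPE`, here an internal subset invented for the example. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdCheck {
    // Returns true if the XML parses and is valid against the DTD named in its DOCTYPE.
    static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            // By default validity errors are only reported, not thrown, so rethrow them.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* ignored */ }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, or not valid against the DTD
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String dtd = "<!DOCTYPE order [<!ELEMENT order (id, item+)>"
                + "<!ELEMENT id (#PCDATA)><!ELEMENT item (#PCDATA)>]>";
        System.out.println(isValid(dtd + "<order><id>1</id><item>a</item></order>"));
        System.out.println(isValid(dtd + "<order><id>1</id></order>"));
    }
}
```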

Related

How can I programmatically modify an XML Document to respect a DTD in Java

I have an XML Document built programmatically and waiting for serialisation (as a String). Before serialising it, though, I would like to re-arrange its nodes so that they match the definition in the DTD. I should mention that my implementation prevents me from knowing in which order the tree will be built.
Are there any recommended solutions for this?
There is some academic research on correcting invalid XML, for example "Correction of Invalid XML Documents with Respect to Single Type Tree Grammars", but I don't know of any available library for that.
So you need to do it by hand or, better, rework the generation of the document so that it produces a valid instance in the first place.
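If you do go the by-hand route, one possible sketch is to reorder the child elements of each parent according to the order the DTD declares for it. This assumes a plain sequence content model such as `(id, name, age)` and that you supply that order yourself. Choices, repetition groups and nesting are not handled. All names here are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ChildReorder {
    // Moves the child elements of 'parent' into the order given by 'dtdOrder'.
    // Elements whose names are not listed keep their relative order and go last.
    static void reorderChildren(Element parent, List<String> dtdOrder) {
        List<Element> children = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                children.add((Element) n);
            }
        }
        // List.sort is stable, so repeated elements stay in their original order.
        children.sort(Comparator.comparingInt((Element e) -> rank(dtdOrder, e.getTagName())));
        // appendChild on a node that is already in the tree moves it to the end.
        for (Element e : children) {
            parent.appendChild(e);
        }
    }

    private static int rank(List<String> order, String name) {
        int index = order.indexOf(name);
        return index < 0 ? Integer.MAX_VALUE : index;
    }

    static List<String> childNames(Element parent) {
        List<String> names = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                names.add(((Element) n).getTagName());
            }
        }
        return names;
    }

    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) { // parser configuration, SAX or I/O failure
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element root = parseRoot("<person><age>3</age><name>A</name><id>1</id></person>");
        reorderChildren(root, List.of("id", "name", "age"));
        System.out.println(childNames(root)); // prints [id, name, age]
    }
}
```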

Use xpath instead of XSD object generation for accessing XML details?

There is an XML file hosted on a server that I want to parse. Normally I generate an XSD from the XML and then generate the Java POJOs from this XSD. Using Jackson I then parse the XML into a Java object representation. Is it not more straightforward to just use XPath? That way I do not need to generate an object hierarchy based on the XML, and I do not need to regenerate it if the XML changes. XPath also seems much more concise and intuitive.
Why should I use XSD and object generation instead of XPath?
According to the XML Schema specification XSD is used for defining the structure, content and semantics of XML documents. This means that you can use XSD to validate your XML file.
Depending on your circumstances you might be able to do without generating the whole object tree if all you need is to get some values from the XML file. In this case XPath is the way to go. However, you still might want to have an XSD file in order to validate the XML file before parsing it. This way your software fails fast when the structure of the XML file changes, which tells you that your XPath expressions need to change as well. But for this to work, you shouldn't use an XSD generated from the XML file itself. Instead you should have a separate, pre-generated XSD file that matches what your XPath expressions expect.
I think both approaches are valid, depending on the circumstances.
At the end of the day, you want to extract the values from that remote xml file and do something with them.
The first criterion to consider is the size of that file and the number of data elements.
If it's just a few, then XPath extraction should be straightforward. However, if that XML file represents a sizable and/or complex data structure, then you probably want to de-serialize it into a Java data structure that you can then work with, and JAXB would be a good candidate.
JAXB is going to be easier/better if the remote server adheres or publishes an XML Schema. If it doesn't, and changes often and significantly, you're going to suffer either way, but particularly so with JAXB. There are ways to smooth things over by pre-processing that xml with XSLT to force it into a more reliable form, but that is going to be a partial solution most likely.
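To illustrate the "just a few values" case, here is a small sketch using the JDK's `javax.xml.xpath` API. The feed structure and the class and method names are invented for the example. A real remote file would be read from a stream rather than a string.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathExtract {
    // Evaluates an XPath expression against an XML string and returns the result as text.
    // Returns an empty string when the expression selects nothing.
    static String valueAt(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath or unparsable XML: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<feed><entry id='a1'><title>Hello</title><price>12.5</price></entry></feed>";
        System.out.println(valueAt(xml, "/feed/entry/title"));   // prints Hello
        System.out.println(valueAt(xml, "/feed/entry/@id"));     // prints a1
        System.out.println(valueAt(xml, "count(/feed/entry)"));  // prints 1
    }
}
```

No classes need regenerating when unrelated parts of the XML change, but a renamed element silently yields an empty string, which is where validating against an XSD first helps.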

Modifying an XML document using XML parsers?

I have an XML document stored in a database table. I need to fetch the XML, modify a few elements and put the XML back in the database.
I am thinking of using JDOM or JAXB to modify the XML elements. Could you please suggest which one is better in terms of performance?
Thanks!
JAXB and JDOM are completely different things. JAXB serializes Java objects into an XML format and vice versa. JDOM simply reads in the XML file and stores it in a tree, which can then be used to modify the XML itself. So you are better off going for JDOM.
JAXB is meant for the case where you have objects whose attribute values are stored in XML: you parse an XML document, it gives you Java objects, and you can then write these back.
That is quite a bit of work if you simply want to change some values. And it doesn't work with arbitrary XML files; JAXB has its own format linked to your objects' definitions.
JDOM also creates objects, but the objects it uses are XML objects such as Element, Attribute, ...
If you just want to change some values, why not read the XML file as plain text and use string operations to make your changes?
Or, if the modification is more logically defined, use XSLT with a stylesheet transformer.
Googling for XSLT and Java will give you tons of examples.
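As a sketch of the "read, change a few values, write back" flow: this uses the DOM and transformer classes that ship with the JDK rather than JDOM, so it is a stand-in for the JDOM approach, not JDOM's API. The XML is handled as a string, as it would be when coming from a database column. The class, method and element names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XmlEdit {
    // Sets the text of every element called 'tagName' and returns the updated XML.
    // Note: setTextContent replaces all children, so only use it on leaf elements.
    static String setText(String xml, String tagName, String newValue) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList matches = doc.getElementsByTagName(tagName);
            for (int i = 0; i < matches.getLength(); i++) {
                matches.item(i).setTextContent(newValue);
            }
            // An identity transform serializes the DOM tree back to text.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) { // parser, SAX, I/O or transformer failure
            throw new IllegalArgumentException("Could not update XML", e);
        }
    }

    public static void main(String[] args) {
        String stored = "<user><name>old</name><age>3</age></user>";
        System.out.println(setText(stored, "name", "new"));
    }
}
```

Unlike plain string replacement, this keeps the result well-formed and escapes special characters in the new value.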

How to generate XSD from elements of XML

I have an XML input:
<field>
  <name>id</name>
  <dataType>string</dataType>
  <maxlength>42</maxlength>
  <required>false</required>
</field>
I am looking for a library or a tool which will take an XML instance document and output a corresponding XSD schema.
I am looking for a Java library with which I can generate an XSD for the above XML structure.
If all you want is an XSD so that the XML you gave conforms to it, you'd be much better off by crafting it yourself rather than using a tool.
No one knows better than you the particularities of the schema, such as which valid values are there (for instance, is the <maxlength> element required? are true and false the only valid values for <required>?).
If you really want to use a tool (I'd only advise it if you haven't designed the XML and really can't get the real XSD; if you did design it, double check the generated XSD), you could try Trang. It can infer an XSD schema from a number of example XML documents.
Bear in mind that the XSD a tool infers may be incomplete or inaccurate if the XML samples aren't representative enough.
java -jar trang.jar sampleXML.xml inferredXSD.xsd
You can find a usage example of Trang here.
You can also try an online tool called XMLGrid: http://xmlgrid.net/xml2xsd.html
You could write an XSLT to do something like that. But the problem is, a single document alone is not enough information to generate a schema. Are any of those elements optional? Is there anything missing from that document, that might appear in other instances? How many of a particular element can there be? Do they have to be in that order? There are loads of things that can be expressed in a schema, that are not immediately obvious from one instance of a document that conforms to that schema.
For the people who really want to include it in their Java code to generate an XSD and understand the perils, check out Generate XSD from XML programatically in Java
Try XMLBeans. It comes with some tools, one of which is inst2xsd. You can find the specifics here:
http://xmlbeans.apache.org/docs/2.0.0/guide/tools.html
Good luck

Is it possible to use Apache Digester to filter dynamic xml leaf tags?

I've used Apache digester before and loved the branch based searching of xml tags.
Specifying a tag as
h/a/b
is very intuitive.
Now I want to do an XML filtering project, but Apache Digester doesn't seem like it will work, simply because there is no way to get at the underlying XML tags. As the FAQ says:
How do I get some xml nested within a tag as a literal string?
It is frequently asked how some XML (esp. XHTML) nested within a document can be extracted as a string, eg to extract the contents of the "body" tag below as a string:
...some xml code...
If you can modify the above to wrap the desired text as a CDATA section then things are easy; Digester will simply treat that CDATA block as a single string:
...some xml code...
If this can't be done then you need to use a NodeCreateRule to create a DOM node representing the body tag and its children, then serialise that DOM node back to text.
Remember that Digester is just a layer on top of a standard XML parser, and standard XML parsers have no option to just stop parsing input at a specific element - unless it knows that the contents of that element is a block of characters (CDATA).
Is there something that uses the same pattern system that I can use to filter XML? My idea is to take the patterns given by the user, blacklist them, and copy everything else.
Or maybe there is a way to find the location of a match in Apache Digester (the location in the XML, not just the displayed text). That would be enough for me to copy the other text by keeping a copy of it around and skipping the matches.
Edit: I've since found out that XPath looks almost OK for doing this, but all the applications I found were for selecting something, not removing it. Do you have an example of this?
Never mind, managed to do it with XPath.
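The asker did not post their solution, so the following is only one way it could look: evaluate each blacklisted XPath expression as a node-set, detach the matching nodes from their parents, and serialize what is left. The class and method names are hypothetical, and only nodes with a parent (elements, text, comments) are removed, not attributes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathFilter {
    // Removes every node matched by any blacklist expression and returns the remaining XML.
    static String removeMatches(String xml, List<String> blacklist) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            for (String expression : blacklist) {
                NodeList matches = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
                for (int i = 0; i < matches.getLength(); i++) {
                    Node match = matches.item(i);
                    Node parent = match.getParentNode();
                    if (parent != null) { // attributes and the document node have no parent
                        parent.removeChild(match);
                    }
                }
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) { // parser, XPath or transformer failure
            throw new IllegalArgumentException("Could not filter XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<h><a><b>secret</b><c>keep</c></a></h>";
        System.out.println(removeMatches(xml, List.of("/h/a/b")));
    }
}
```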
