Ignore element order while validating XML against XSD

We have an XML document that needs to be validated against an XSD. The XML is generated by XStream, and we are using the JAXP APIs to validate it against the corresponding XSD. Unfortunately, our test cases currently fail because the generated XML has its elements/tags in a different order than the XSD specifies.
Is it possible to ignore the order of elements in generated XML while validating it against XSD?
Thanks for the help in advance.

What you are asking for is a way to say "validate some of the XSD and ignore other parts". I don't think that can be done.
One possible solution would be to modify the schema so that instead of using a <sequence> for those elements (which requires that the elements be in a particular order) you can use <all>, which allows the elements to be in any order.
The point of a schema is to impose certain structure and requirements on an XML document. You can't just say "eh, I don't like that particular part of the schema, ignore it" as then the document doesn't conform to the schema anymore.
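To illustrate the <sequence> versus <all> suggestion above, here is a minimal sketch using the JAXP validation API. The person/name/age schema and the class and method names are made up for the example; they are not from the question. Note that in XSD 1.0 the children of <all> may occur at most once, so this only helps for content models without repeating elements.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class AnyOrderValidation {
    // Hypothetical schema: <person> with <name> and <age> children,
    // grouped by the given compositor ("sequence" or "all").
    static String schemaWith(String compositor) {
        return "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
             + "<xs:element name='person'><xs:complexType><xs:" + compositor + ">"
             + "<xs:element name='name' type='xs:string'/>"
             + "<xs:element name='age' type='xs:int'/>"
             + "</xs:" + compositor + "></xs:complexType></xs:element></xs:schema>";
    }

    // Returns true if the XML validates against the XSD, false on a validation error.
    static boolean isValid(String xsd, String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // <age> comes before <name>, the reverse of the declared order.
        String reordered = "<person><age>30</age><name>Ann</name></person>";
        System.out.println(isValid(schemaWith("sequence"), reordered)); // false
        System.out.println(isValid(schemaWith("all"), reordered));      // true
    }
}
```

The document itself is unchanged between the two calls; only the compositor in the schema differs.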

Related

How to use XSD for non namespaced documents

Recently I've encountered a service that returns its results in XML, roughly in the following form:
<event>
<event-header>
...
</event-header>
<event-body>
...
</event-body>
</event>
Notice that the document does not have a namespace definition. As a result, there is no "official" schema that I can use.
I have written a schema definition that I can use to generate classes for interacting with the equivalent elements in the document. From observation I can tell that the document format does not change (field order remains the same, and fields are neither introduced nor removed). But the question stands: can I still deserialize the provided document using my schema? As far as I know, schemas must define a namespace, and in theory the document above and the one below
<event xmlns="http://saltyjuice.lt/dragas/event-service/1.0/event-schema.xsd">
<event-header>
...
</event-header>
<event-body>
...
</event-body>
</event>
are not equivalent.
For reference, I'm using stax and woodstox 6 as implementation.

You can have a schema for a no-namespace document, I don't know why you thought otherwise. It's not ideal, because a namespace can guide people to the right schema. But it's allowed. Anyway, even with a namespace, it's quite possible to have several schemas for the same namespace (usually, versions and variants).
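A minimal sketch of what that looks like with the JAXP validation API: the schema simply omits the targetNamespace attribute. The content model here (two string children) is a guess for illustration, since the question elides the real structure, and the class and method names are made up.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class NoNamespaceSchema {
    // No targetNamespace attribute: the declared elements are in no namespace.
    // The xs:string children are an assumption made for this example.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='event'><xs:complexType><xs:sequence>"
        + "<xs:element name='event-header' type='xs:string'/>"
        + "<xs:element name='event-body' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Returns true if the XML validates against XSD, false on a validation error.
    static boolean isValid(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String body = "<event-header>h</event-header><event-body>b</event-body></event>";
        String plain = "<event>" + body;
        String namespaced =
            "<event xmlns='http://saltyjuice.lt/dragas/event-service/1.0/event-schema.xsd'>" + body;
        System.out.println(isValid(plain));      // true
        System.out.println(isValid(namespaced)); // false: no declaration for the namespaced <event>
    }
}
```

The second call confirms the point from the question: the namespaced and non-namespaced documents are not equivalent, so the schema has to match whichever form the service actually sends.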

how to skip few element validations in xsd in java

I am trying to validate an XML document against an XSD in Java.
In the XSD, one of the fields (tax_id) is defined as a mandatory element.
But in my scenario I pass the XML to another component, and that component fills in the mandatory
field (tax_id).
Before sending the XML to the next component, I have to validate it against the XSD.
Because the XSD defines tax_id as a mandatory element, I get an exception for not filling in the mandatory element (tax_id).
I can create a new xsd by making tax_id as optional field, but with this we would be having 2 xsds.
Is there any way to skip/ignore few elements while validating in java?

In general, no. The purpose of the XSD is to specify the rules that the document must meet, in order to be valid. You can't ignore or skip some of those rules. If you could, you'd probably have other problems. For example, if required elements were really optional (or could be) then technically any element that was supposed to contain a bunch of other elements could be empty (and still valid) under that more lax validation.
In your situation, you probably have two options:
Change your workflow - make sure the first component populates the XML with the empty tax_id. Then it will validate.
Introduce a second schema - one earlier in the "pipeline" of processing, that doesn't require tax_id. Then validate against that.
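The second option could be sketched roughly as follows. The customer/name/tax_id structure and the class and method names are hypothetical, and both schemas are generated from one template here only to keep the example short; in practice they would be two XSD files.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class TwoStageValidation {
    // Hypothetical schema; the only difference between the two stages
    // is minOccurs on tax_id (0 before the handoff, 1 after it).
    static String schema(boolean taxIdRequired) {
        return "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
             + "<xs:element name='customer'><xs:complexType><xs:sequence>"
             + "<xs:element name='name' type='xs:string'/>"
             + "<xs:element name='tax_id' type='xs:string' minOccurs='"
             + (taxIdRequired ? 1 : 0) + "'/>"
             + "</xs:sequence></xs:complexType></xs:element></xs:schema>";
    }

    // Returns true if the XML validates against the XSD, false on a validation error.
    static boolean isValid(String xsd, String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema s = factory.newSchema(new StreamSource(new StringReader(xsd)));
        try {
            s.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String draft = "<customer><name>Ann</name></customer>";
        String complete = "<customer><name>Ann</name><tax_id>123</tax_id></customer>";
        System.out.println(isValid(schema(false), draft));    // true: early-stage schema
        System.out.println(isValid(schema(true), draft));     // false: tax_id is missing
        System.out.println(isValid(schema(true), complete));  // true: after the next component fills it in
    }
}
```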

Querying XSD-valid XML for the original XML schema

Given a schema document (XSD format) such as the MODS 3.5 schema (US Library of Congress, LoC), and a document (XML) known to be valid according to that schema, such as the metadata for the Antitrust & Competition Policy Blog archives 2007 (HTML view) from the LoC Law Blawgs Web Archive, is there a Java API that would allow a Java program to query the XML document for the XML schema data types that elements of the document are instances of?
It may seem as though I may have XML schemas and UML models confused. I'm thinking of an XML schema as representing something like a UML model (M1), and of an XML document as user data (M0) representing instances of UML model elements. If it is similarly possible to query an XML element to determine the XML schema data type or element definition that the element either derives from or conforms to in the parse tree, I think it could make for a nice feature for a sequencer for ModeShape.
I think the idea is essentially this: in a ModeShape JCR repository, each JCR node representing an XML element of a sequenced XML document could reference a JCR node representing an XML schema data type, where the type's JCR node would be defined when sequencing the schema used by the document, for example by the ModeShape XSD sequencer.
I'm simply not certain whether there is a Java API for determining the XML schema element that a valid XML document element conforms to in the parse tree, when the XML document is validated against an XML schema. I'm under the impression that such a computation should be possible. I simply wonder: might there already be an API for that?
Alternately, there is UML...

The answer is yes.
In terms of standards, validating an XML document against a schema produces a PSVI (post-schema-validation infoset), and the PSVI decorates nodes in the parse tree with information about what types they were validated against.
In terms of concrete implementation, if you use the JAXP Validation API you can either generate a DOM augmented with TypeInfo that tells you the type of each node, or you can use a SAX-based validation pipeline in which type information is notified to a TypeInfoProvider.
You can also do this using schema-aware XSLT and XQuery; after a validation operation, nodes are augmented with a "type annotation", which you can interrogate using the "instance of" test. If you use Saxon, you can use the extension functions saxon:type() or saxon:type-annotation() to explore further:
http://www.saxonica.com/documentation/#!functions/saxon/type
http://www.saxonica.com/documentation/#!functions/saxon/type-annotation
A limitation of the XSLT/XQuery approach is that it only works if validation succeeds. The DOM/SAX interfaces also provide information in cases where validation fails.
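As a rough sketch of the SAX-based pipeline mentioned above: a ValidatorHandler sits between the parser and your own ContentHandler, and its TypeInfoProvider can be queried from inside startElement. The item/name/quantity schema and the class and method names are made up for the example, and the behavior described is what I would expect from the schema validator bundled with the JDK.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.TypeInfoProvider;
import javax.xml.validation.ValidatorHandler;
import org.w3c.dom.TypeInfo;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SaxTypeInfo {
    // Hypothetical schema used only for this example.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='item'><xs:complexType><xs:sequence>"
        + "<xs:element name='name' type='xs:string'/>"
        + "<xs:element name='quantity' type='xs:integer'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Maps each element's local name to the schema type name reported for it.
    static Map<String, String> elementTypes(String xsd, String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
        ValidatorHandler validator = schema.newValidatorHandler();
        TypeInfoProvider types = validator.getTypeInfoProvider();
        Map<String, String> seen = new LinkedHashMap<>();

        // Downstream handler: receives validated events and asks for the current element's type.
        validator.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                TypeInfo info = types.getElementTypeInfo();
                seen.put(local, info == null ? null : info.getTypeName());
            }
        });

        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // schema validation needs namespace-aware parsing
        XMLReader reader = spf.newSAXParser().getXMLReader();
        reader.setContentHandler(validator);
        reader.parse(new InputSource(new StringReader(xml)));
        return seen;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<item><name>bolt</name><quantity>12</quantity></item>";
        System.out.println(elementTypes(XSD, xml));
    }
}
```

The type information is only available while the start or end of the element is being reported, which is why the lookup happens inside the callback rather than afterwards.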

Evaluating JAXB

I have a couple of questions about JAXB:
What options are there for parsing? Can I implement / plugin my own parser easily?
What about validity? Suppose I have a relaxed parser that is somewhat relaxed regarding the schema. Can I still create an (invalid) object-structure?
Does JAXB provide special means to do e.g. validation on the objects? I'd like to parse to an "invalid" object structure, have some algorithm repair it, then validate (in Java).
Does JAXB provide other means to do fancy things on the objects (e.g. visitor pattern).
What about the memory footprint? Is the object representation (disregarding the parsing) feasible for XML files of 10-100MB?
Good tutorials covering this kind of questions are appreciated, Google revealed only coarse overviews.

Below are my answers to your questions:
What options are there for parsing? Can I implement / plugin my own
parser easily?
JAXB (JSR-222) implementations can unmarshal from many different input types: InputStream, InputSource, Node, XMLStreamReader, XMLEventReader, File, Source. If your XML representation matches any of these then you're all set.
What about validity? Suppose I have a relaxed parser that is somewhat
relaxed regarding the schema. Can I still create an (invalid)
object-structure?
JAXB implementations require that the XML be well formed, but do not require it to be valid against an XML schema. JAXB is designed to handle a wide range of documents. If you want to ensure "validity" then you can set an XML schema (see JAXB and Marshal/Unmarshal Schema Validation).
Does JAXB provide special means to do e.g. validation on the objects?
I'd like to parse to an "invalid" object structure, have some
algorithm repair it, then validate (in Java).
You can use the javax.xml.validation APIs to do validation on an object model. For a full example see:
http://blog.bdoughan.com/2010/11/validate-jaxb-object-model-with-xml.html
Does JAXB provide other means to do fancy things on the objects (e.g.
visitor pattern).
JAXB models are POJOs so you can design them as you wish. You may be interested in the following classes:
http://docs.oracle.com/javase/6/docs/api/javax/xml/bind/Marshaller.Listener.html
http://docs.oracle.com/javase/6/docs/api/javax/xml/bind/Unmarshaller.Listener.html
What about the memory footprint? Is the object representation
(disregarding the parsing) feasible for XML files of 10-100MB?
Yes, JAXB can be used to process documents of that size. If you are concerned about size, you can use an XMLStreamReader to parse the XML file and then unmarshal objects from the XMLStreamReader in chunks.

How can I get the xml Node type based on schema definition in Java?

Let's say I have a doc.xml and a corresponding doc.xsd. I use XPath to retrieve some nodes, so I get a list of org.w3c.dom.Node. How can I get the type of each node from the schema, e.g. xs:integer, xs:string etc.?
One solution would be to parse the schema with the XPath query "//NodeName[@type]", using node.getNodeName() as NodeName, but that's not perfect. I can't be sure that the schema is elegant: what if NodeName exists in many places in the schema and has not been extracted as a separate type?
So generally I am looking for a reliable solution to get the node type for ANY valid xml & xsd.

You should consider using JAXB. It will create Java classes for you based on the schema types. Then your XML docs are read into those classes, which are typed according to how you defined your XSD. Therefore xsd:int maps to Java int (or the Integer wrapper class, I can't recall), etc.

Cast your DOM Elements to TypeInfo: from there, you can access the type information you're looking for.

Unfortunately, types as defined in an XML Schema (XSD) or Document Type Definition (DTD) are not directly tied to the XML document they validate. The elements and attributes in an XML document do not inherently have a type; they are just text. Think of an XSD as a script that validates an XML document rather than a set of type annotations for elements and attributes.
The XML specification does not define types as you are thinking of them here. Even Document Type Definitions (DTDs), which can be embedded inside XML documents, are more about the structure of the document than the type of the data contained in elements and attributes.
The type system described in XML Schema is an optional layer of validation that can be applied to XML documents. Since this validation is optional, the standard XML APIs do not provide a way to bind the validation rules in an XSD to the actual attributes and elements.
I think it would be possible for an XML API to provide a mechanism to bind an XSD to a specific XML document, but I am not aware of an XML parser that does this. One reason why this is not so easy is that the type system defined in XML Schema is much richer than what most mainstream programming languages support. In your example you may only be interested in xs:integer, xs:string and the like, but in XML Schema you can create types that specify ranges, patterns and other things that are just not possible with data types in most programming languages. Representing this complex type system in Java or any other programming language would require a fairly complex API. Then the question becomes: is it really worth it? I would say probably not.

As per David D's answer, but slightly cleaner: call getSchemaTypeInfo() on an element or attribute.
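A minimal sketch of that approach, assuming the parser is namespace-aware and has the schema attached through DocumentBuilderFactory.setSchema(), so that the DOM it builds carries type information. The order/id/note schema and the class and method names are invented for the example.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.TypeInfo;
import org.xml.sax.InputSource;

class DomTypeInfo {
    // Hypothetical schema used only for this example.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='order'><xs:complexType><xs:sequence>"
        + "<xs:element name='id' type='xs:integer'/>"
        + "<xs:element name='note' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Returns the schema type name of the element selected by the XPath expression.
    static String typeOf(String xsd, String xml, String xpath) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));

        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for schema validation
        dbf.setSchema(schema);       // validate while parsing, so nodes get type info
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        Element element = (Element) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, doc, XPathConstants.NODE);
        TypeInfo info = element.getSchemaTypeInfo();
        return info == null ? null : info.getTypeName();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<order><id>42</id><note>rush</note></order>";
        System.out.println(typeOf(XSD, xml, "/order/id"));
        System.out.println(typeOf(XSD, xml, "/order/note"));
    }
}
```

TypeInfo also exposes getTypeNamespace(), which helps to tell a built-in xs: type apart from a user-defined type with the same local name.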
