Recently I've encountered a service that returns its results in XML, roughly in the following fashion:
<event>
<event-header>
...
</event-header>
<event-body>
...
</event-body>
</event>
Notice that the document does not have a namespace definition. As a result, there is no "official" schema that I can use.
I have written a schema definition that I can use to generate classes for interacting with the equivalent elements in the document. From observation I can tell that the document format does not change (field order remains the same, and no fields are added or removed). But the question stands: can I still deserialize the provided document using my schema? As far as I know, schemas must define a namespace, and in theory the documents above and below
<event xmlns="http://saltyjuice.lt/dragas/event-service/1.0/event-schema.xsd">
<event-header>
...
</event-header>
<event-body>
...
</event-body>
</event>
are not equivalent.
For reference, I'm using StAX, with Woodstox 6 as the implementation.
You can have a schema for a no-namespace document; I don't know why you thought otherwise. It's not ideal, because a namespace can guide people to the right schema, but it's allowed. Anyway, even with a namespace, it's quite possible to have several schemas for the same namespace (usually versions and variants).
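To illustrate the point about no-namespace schemas, here is a minimal sketch using the JDK's javax.xml.validation API rather than the asker's StAX/Woodstox setup. The schema content is an assumption (the real element contents are elided in the question, so both children are simply typed as strings), and the class name and the http://example.com/ns namespace are made up for the example. A schema without a targetNamespace accepts the unqualified document and rejects the same document once a default namespace is added.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class NoNamespaceSchemaDemo {
    // Hypothetical schema: no targetNamespace, so it describes unqualified elements.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='event'><xs:complexType><xs:sequence>"
      + "<xs:element name='event-header' type='xs:string'/>"
      + "<xs:element name='event-body' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Returns true if the document validates against XSD, false on any validation error.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the default error handler throws on the first validation error
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String body = "<event-header>h</event-header><event-body>b</event-body></event>";
        System.out.println(isValid("<event>" + body));
        System.out.println(isValid("<event xmlns='http://example.com/ns'>" + body));
    }
}
```

The second call fails because the schema declares an element named event in no namespace, while the document's root is event in the http://example.com/ns namespace, which is a different name as far as the validator is concerned.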
When using JAXB marshalling, do I have an influence on which element a namespace/namespace prefix will be declared?
Currently, all namespace prefixes are declared at the root element, but due to strange limitations of the system which processes my XML I need to declare them at child elements (which would still result in a valid xml document).
A similar, but not identical, request was made on the official JAXB issue tracker and declined by a developer back in 2006. I'd like to know if this situation has changed in the meantime or if any workarounds exist.
Any help is appreciated.
Example:
JAXB marshalling creates the following XML:
<outer xmlns:ns1="http://mydomain">
<inner>
<ns1:data/>
</inner>
</outer>
While I need to have something like
(ns1 prefix is not declared at the root element):
<outer>
<inner xmlns:ns1="http://mydomain">
<ns1:data/>
</inner>
</outer>
JAXB (JSR-222) does not provide a means to control where namespace declarations occur. JAXB providers tend to put the namespaces on the root element (for performance reasons), but they are not required to.
Below is a link to an answer I gave to a similar question where XMLStreamWriter is extended to control when the namespace declarations get reported.
JAXB marshalling XMPP stanzas
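This is not the code from the linked answer, just a minimal sketch of the StAX primitive such a wrapper would rely on: with a plain XMLStreamWriter the caller decides on which element writeNamespace is invoked. The class name is made up; the element names and namespace URI are taken from the question.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class InnerNamespaceDemo {
    // Writes the question's document with ns1 declared on <inner> instead of the root.
    static String write() {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("outer");
            w.writeStartElement("inner");
            // The declaration is emitted on whichever element is currently open.
            w.writeNamespace("ns1", "http://mydomain");
            w.writeEmptyElement("ns1", "data", "http://mydomain");
            w.writeEndElement(); // closes inner
            w.writeEndElement(); // closes outer
            w.flush();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(write());
    }
}
```

A delegating XMLStreamWriter passed to Marshaller.marshal could, in principle, hold back the writeNamespace calls JAXB makes at the root and replay them at the element where they are needed; that part is left out here.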
I have a couple of questions about JAXB:
What options are there for parsing? Can I implement / plugin my own parser easily?
What about validity? Suppose I have a relaxed parser that is somewhat relaxed regarding the schema. Can I still create an (invalid) object-structure?
Does JAXB provide special means to do e.g. validation on the objects? I'd like to parse to an "invalid" object structure, have some algorithm repair it, then validate (in Java).
Does JAXB provide other means to do fancy things on the objects (e.g. visitor pattern).
What about the memory footprint? Is the object representation (disregarding the parsing) feasible for XML files of 10-100MB?
Good tutorials covering this kind of questions are appreciated, Google revealed only coarse overviews.
Below are my answers to your questions:
What options are there for parsing? Can I implement / plugin my own
parser easily?
JAXB (JSR-222) implementations can unmarshal from many different input types: InputStream, InputSource, Node, XMLStreamReader, XMLEventReader, File, Source. If your XML representation matches any of these then you're all set.
What about validity? Suppose I have a relaxed parser that is somewhat
relaxed regarding the schema. Can I still create an (invalid)
object-structure?
JAXB implementations require that the XML be well formed, but do not require it to be valid against an XML schema. JAXB is designed to handle a wide range of documents. If you want to ensure "validity" then you can set an XML schema (see JAXB and Marshal/Unmarshal Schema Validation).
Does JAXB provide special means to do e.g. validation on the objects?
I'd like to parse to an "invalid" object structure, have some
algorithm repair it, then validate (in Java).
You can use the javax.xml.validation APIs to do validation on an object model. For a full example see:
http://blog.bdoughan.com/2010/11/validate-jaxb-object-model-with-xml.html
Does JAXB provide other means to do fancy things on the objects (e.g.
visitor pattern).
JAXB models are POJOs so you can design them as you wish. You may be interested in the following classes:
http://docs.oracle.com/javase/6/docs/api/javax/xml/bind/Marshaller.Listener.html
http://docs.oracle.com/javase/6/docs/api/javax/xml/bind/Unmarshaller.Listener.html
What about the memory footprint? Is the object representation
(disregarding the parsing) feasible for XML files of 10-100MB?
Yes, JAXB can be used to process documents of that size. If you are concerned about size, you can use an XMLStreamReader to parse the XML file and then unmarshal objects from the XMLStreamReader in chunks.
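The chunked approach can be sketched with StAX alone. The element name record and the class name are assumptions made for the example, and the JAXB call is only indicated in a comment, since javax.xml.bind is no longer bundled with recent JDKs. The idea is to advance the reader to each repeating element and handle one element at a time, so only a single chunk is held in memory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class ChunkedReadDemo {
    // Collects the text of every element called recordName, one element at a time.
    static List<String> readRecords(String xml, String recordName) {
        List<String> records = new ArrayList<>();
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && recordName.equals(reader.getLocalName())) {
                    // With JAXB on the classpath, this is where a call such as
                    // unmarshaller.unmarshal(reader, SomeRecordClass.class) would go.
                    // Here the element's text content stands in for the domain object.
                    records.add(reader.getElementText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<records><record>a</record><record>b</record></records>";
        System.out.println(readRecords(xml, "record"));
    }
}
```

For a real 10-100MB file the reader would be created from an InputStream, and each chunk would be processed and discarded rather than collected in a list.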
Until now, I've been handling extensions by defining a placeholder element that has "name" and "value" attributes as shown in the below example
<root>
<typed-content>
...
</typed-content>
<extension name="var1" value="val1"/>
<extension name="var2" value="val2"/>
....
</root>
I am now planning to switch to using xsd:any. I'd appreciate it if you could help me choose the best approach.
What is the value add of xsd:any over my previous approach if I specify processContents="strict"?
Can an EAI/ESB tool/library execute XPath expressions against the arbitrary elements I return?
I see various binding tools treating this separately while generating the binding code. Is this the same case if I include a namespace="http://mynamespace" and provide the schema for "http://mynamespace" at code generation time?
Is this WS-I compliant?
Are there any gotchas that I am missing?
Thank you
Using <xsd:any processContents="strict"> gives people the ability to add extensions to their XML instance documents without changing the original schema. This is the critical benefit it gives you.
Yes. Tools that manipulate the instances don't care what the schema looks like; it's the instance documents they look at. To them, it doesn't really matter if you use <xsd:any> or not.
Binding tools generally don't handle <xsd:any> very elegantly. This is understandable, since they have no information about what it could contain, so they'll usually give you an untyped placeholder. It's up to the application code to handle that at runtime. JAXB in particular (the RI, at least) makes a bit of a fist of it, but it's workable.
Yes. It's perfectly good XML Schema practice, and all valid XML Schemas are supported by WS-I.
<xsd:any> makes life a bit harder on the programmer, due to the untyped nature of the bindings, but if you need to support arbitrary extension points, this is the way to do it. However, if your extensions are well-defined, and do not change, then it may not be worth the irritation factor.
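As a rough illustration of the strict wildcard, here is a sketch using the JDK's javax.xml.validation API. Everything named here is hypothetical: the base schema is a guess at the question's root/typed-content layout, and http://example.com/ext with its var1 element is an invented extension schema. With processContents="strict", an extension element is accepted only if a declaration for it is available to the validator.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class StrictAnyDemo {
    // Hypothetical base schema: typed content followed by an extension point.
    static final String BASE_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='root'><xs:complexType><xs:sequence>"
      + "<xs:element name='typed-content' type='xs:string'/>"
      + "<xs:any namespace='##other' processContents='strict'"
      + " minOccurs='0' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Hypothetical extension schema, supplied separately from the base schema.
    static final String EXT_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='http://example.com/ext'>"
      + "<xs:element name='var1' type='xs:string'/>"
      + "</xs:schema>";

    static boolean isValid(String xml) {
        try {
            Source[] schemas = {
                new StreamSource(new StringReader(BASE_XSD)),
                new StreamSource(new StringReader(EXT_XSD))
            };
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(schemas);
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation error (or a schema that failed to compile)
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String head = "<root xmlns:e='http://example.com/ext'><typed-content>x</typed-content>";
        System.out.println(isValid(head + "<e:var1>val1</e:var1></root>")); // declared extension
        System.out.println(isValid(head + "<e:var2>val2</e:var2></root>")); // undeclared extension
    }
}
```

Compared with the name/value placeholder approach, the extension content gets validated against its own schema, and new extension schemas can be added without touching the base schema.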
Regarding point 3
Binding tools generally don't handle
<xsd:any> very elegantly. This is
understandable, since they have no
information about what it could
contain, so they'll usually give you
an untyped placeholder. It's up to
the application code to handle that at
runtime. JAXB in particular (the RI,
at least) makes a bit of a fist of it,
but it's workable.
This corresponds to the @XmlAnyElement annotation in JAXB. The behaviour is as follows:
@XmlAnyElement - Keep All as DOM Nodes
If you annotate a property with this annotation the corresponding portion of the XML document will be kept as DOM nodes.
@XmlAnyElement(lax=true) - Convert Known Elements to Domain Objects
By setting lax=true, if JAXB has a root type corresponding to that QName then it will convert that chunk to a domain object.
http://bdoughan.blogspot.com/2010/08/using-xmlanyelement-to-build-generic.html
We have an XML document which needs to be validated against an XSD. The XML is generated by XStream, and we are using the JAXP APIs to validate it against the respective XSD. Unfortunately, our test cases currently fail because the generated XML has its elements in a different order than the XSD specifies.
Is it possible to ignore the order of elements in generated XML while validating it against XSD?
Thanks for the help in advance.
What you are asking for is a way to say "validate some of the XSD and ignore other parts". I don't think that can be done.
One possible solution would be to modify the schema so that instead of using a <sequence> for those elements (which requires that the elements be in a particular order) you can use <all>, which allows the elements to be in any order.
The point of a schema is to impose certain structure and requirements on an XML document. You can't just say "eh, I don't like that particular part of the schema, ignore it" as then the document doesn't conform to the schema anymore.
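The difference between the two compositors can be checked with a small sketch against the JDK's javax.xml.validation API. The person/first/last schema and the class name are invented for the example; the same content model is compiled once with <sequence> and once with <all>, and a document with the children swapped is validated against each.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class ElementOrderDemo {
    // Builds a hypothetical schema whose children are grouped by "sequence" or "all".
    static String schemaWith(String compositor) {
        return "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
             + "<xs:element name='person'><xs:complexType><xs:" + compositor + ">"
             + "<xs:element name='first' type='xs:string'/>"
             + "<xs:element name='last' type='xs:string'/>"
             + "</xs:" + compositor + "></xs:complexType></xs:element>"
             + "</xs:schema>";
    }

    static boolean isValid(String compositor, String xml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(schemaWith(compositor))));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the default error handler throws on the first validation error
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String swapped = "<person><last>b</last><first>a</first></person>";
        System.out.println(isValid("sequence", swapped));
        System.out.println(isValid("all", swapped));
    }
}
```

Keep in mind that in XSD 1.0 the children of <all> may each occur at most once, so this substitution only works for content models without repeating elements.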
Let's say I have a doc.xml and a corresponding doc.xsd. I use XPath to retrieve some nodes, so I get a list of org.w3c.dom.Node. How can I get the type of each node from the schema, e.g. xs:integer, xs:string, etc.?
One solution would be to parse the schema with the XPath query "//NodeName[@type]", using node.getNodeName() as NodeName, but that's not perfect. I can't be sure that the schema is elegant: what if NodeName exists in many places in the schema and has not been extracted as a separate type?
So generally I am looking for a reliable solution to get the node type for ANY valid xml & xsd.
You should consider using JAXB. It will create Java classes for you based on the schema types. Your XML docs are then read into those classes, which are typed according to how you defined your XSD. Therefore xsd:int maps to Java int (or the Integer wrapper class, I can't recall), etc.
Cast your DOM Elements to TypeInfo: from there, you can access the type information you're looking for.
Unfortunately, types as defined in an XML Schema (XSD) or Document Type Definition (DTD) are not directly tied to the XML document they validate. The elements and attributes in an XML document do not inherently have a type; they are just text. Think of an XSD as a script that validates an XML document rather than a set of type annotations for elements and attributes.
The XML specification does not define types as you are thinking of them here. Even Document Type Definitions (DTD), which can be embedded inside XML documents, are more about the structure of the document than the type of the data contained in elements and attributes.
The type system described in XML Schema is an optional layer of validation that can be applied to XML documents. Since this validation is optional, the standard XML APIs do not provide a way to bind the validation rules in an XSD to the actual attributes and elements.
I think it would be possible for an XML API to provide a mechanism to bind an XSD to a specific XML document, but I am not aware of an XML parser that does this. One reason why this is not so easy is that the type system defined in XML Schema is much richer than what is supported in most mainstream programming languages. In your example you may only be interested in xs:integer, xs:string and the like, but in XML Schema you can create types that specify ranges, patterns and other things that are just not possible with data types in most programming languages. Representing this complex type system in Java or any other programming language would require a fairly complex API. Then the question becomes: is it really worth it? I would say probably not.
As per David D's answer, but slightly cleaner: call getSchemaTypeInfo() on an element or attribute.
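A minimal sketch of that approach, assuming the JDK's built-in parser and a made-up doc/count/label schema: the document is parsed with a namespace-aware DocumentBuilderFactory that has the compiled schema set, so the parser validates while building the DOM. Whether type information is actually populated depends on the parser implementation, so treat this as something to verify against your own setup.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.TypeInfo;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class SchemaTypeInfoDemo {
    // Hypothetical schema standing in for doc.xsd.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='doc'><xs:complexType><xs:sequence>"
      + "<xs:element name='count' type='xs:integer'/>"
      + "<xs:element name='label' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Returns the schema type name of the first element with the given tag name.
    static String typeNameOf(String xml, String tagName) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // schema validation needs namespace awareness
            dbf.setSchema(schema);       // validate against the schema while parsing
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Element element = (Element) doc.getElementsByTagName(tagName).item(0);
            TypeInfo info = element.getSchemaTypeInfo();
            return info.getTypeName();
        } catch (SAXException | ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc><count>42</count><label>hello</label></doc>";
        System.out.println(typeNameOf(xml, "count"));
        System.out.println(typeNameOf(xml, "label"));
    }
}
```

TypeInfo also exposes getTypeNamespace(), which helps distinguish built-in XSD types from user-defined ones that happen to share a local name.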