Denormalize an XML schema programmatically - java

I need to take any given valid XML schema (XSD) and denormalize it to a simple form containing no refs, no includes, etc. All simple type definitions should be inline, such that when looking at any given element, all declarations are visible without performing another lookup.
I've found some tools that have this built in, but I need to do it "on the fly." The platform of choice is Java, but I'd be willing to port code from another language if necessary. I just really don't want to reinvent the wheel here. Searching for OSS libraries from Apache etc. has yielded nothing. The closest I've found is XSOM, which supports traversing a schema as an object model, but you still have to handle every possible form that a schema could take to represent a given structure.
The output doesn't have to be actual XML, as it will actually be used in an object model in its final form.

You might find XSD4J helpful:
http://dynvocation.selfip.net/xsd4j/

The EMF XSD model may be helpful:
http://www.eclipse.org/modeling/mdt/?project=xsd

Another useful API for XML Schema is XSOM.
XSOM is used by XJC, the JAXB schema compiler, under the hood, so it is probably guaranteed to be kept alive.
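For orientation, a minimal XSOM sketch might look roughly like this (the schema file name is an assumption; resolving refs, groups and anonymous types is still left to your traversal code):

import com.sun.xml.xsom.XSElementDecl;
import com.sun.xml.xsom.XSSchemaSet;
import com.sun.xml.xsom.parser.XSOMParser;

import javax.xml.parsers.SAXParserFactory;
import java.io.File;
import java.util.Iterator;

public class XsomDump {
    public static void main(String[] args) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // XSOM needs a namespace-aware parser
        XSOMParser parser = new XSOMParser(factory);
        parser.parse(new File("schema.xsd")); // hypothetical schema file
        XSSchemaSet schemaSet = parser.getResult();

        // List all global element declarations and their types; walking into
        // complex content, refs and includes is where the real work starts.
        Iterator<XSElementDecl> elements = schemaSet.iterateElementDecls();
        while (elements.hasNext()) {
            XSElementDecl element = elements.next();
            System.out.println(element.getName() + " : " + element.getType().getName());
        }
    }
}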

Related

Is it practical to combine XML Schema and an XML-to-JSON conversion?

I have to specify a JSON data structure; that data structure will be part of an interface description, the data will be processed by JavaScript. JSON is set for the data transmission. In other projects, where we used XML instead of JSON, I have used rich XML schemas for this. Unfortunately, I cannot do that now.
I did some research and found JSON Schema.
However, this is still draft status, which makes me feel a bit uneasy to use it in this context.
I also came across this question discussing how to map XML to JSON. There seems to be a standard (?) conversion in the XML class in the org.json namespace. It appears that the conversion is rather straight-forward for XML documents without mixed content.
So the idea is to use XML Schema to describe the data structure, use our existing XML processing (editing, transformation, validation, ...) tools as long as possible on the server side and convert the XML DOM to JSON just before delivering the data to the JSON consumer.
Data transmission is one-way only and we would not have mixed-content XML.
Maybe someone has tried this before? Would that be a practical approach, in the sense that the semantics of the XML Schema are still clear enough for the client-side programmers when (conceptually) applied to the JSON document? Are there any particular pitfalls to be aware of?
If I understood your idea right, you want to use XML Schema as the primary model for your data exchange - for XML as well as JSON formats.
This idea has two parts:
Use single source to model all the data exchange.
Use XML Schema as this single source.
Single-source model
The first idea brings you to MDD (Model-Driven Development) or MDA (Model-Driven Architecture), which had their hype around 2002-2005. It was a UML-heavy, vendor-driven hype, but quite a few reasonable things (like AndroMDA) survived.
Generally, MDA is a good idea. It works splendidly as long as you do "standard" things. But it can be a nightmare if you want to "customize".
In your case, I would definitely say that single-source model makes sense. This is about data exchange. In the core this can be reduced to very simple models which are still powerful enough to express everything you need.
JSON is an example of this. JSON is even simpler than XML but still powerful enough. It clearly shows that as long as you have basic primitive types, objects, arrays and nesting, you can express almost anything.
This "single source model" does not necessarily have to be UML; it can be anything powerful enough to cover all the underlying requirements.
The main problem with a "single source model" is customizing. You know, 90% works very well OOTB, but then in 10% you don't get the result you want, you have to customize, and then the effort gets you. Most of the generation tools have some kind of "plugins". So if you fit in the 90%, you're lucky; otherwise you may need to get to know the hairy internals of the generation tools.
To sum up, a single-source model is a good idea as long as it serves all the needs AND the effort to adapt/apply it for the required scenarios is not greater than making it from scratch.
XML Schema as the model
The next question is whether XML Schema is good as the single source model.
You have probably heard or used JAXB which has a schema compiler (XJC). This compiler can take your XML Schema and then generate Java classes with JAXB annotations. These classes can then be used to unmarshal XML into Java objects or marshal these object to XML.
And to JSON:
JAXB Mapping to JSON
Looks like you can also produce a JSON Schema from these classes (haven't tried it myself though):
How to generate JSON schema from a JAXB annotated class?
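For illustration, the basic JAXB round trip with XJC-generated classes might look roughly like this (the context path "com.example.po" and the input file are assumptions, not taken from a real schema):

import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.Unmarshaller;
import java.io.File;
import java.io.StringWriter;

public class JaxbRoundTrip {
    public static void main(String[] args) throws Exception {
        // "com.example.po" stands for the package XJC generated from your schema.
        JAXBContext context = JAXBContext.newInstance("com.example.po");

        // XML -> objects
        Unmarshaller unmarshaller = context.createUnmarshaller();
        Object order = unmarshaller.unmarshal(new File("purchase-order.xml"));

        // objects -> XML
        Marshaller marshaller = context.createMarshaller();
        marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
        StringWriter xml = new StringWriter();
        marshaller.marshal(order, xml);
        System.out.println(xml);
    }
}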
So the XML Schema-first approach works. You can call it schema-driven development (I hereby claim the copyright on this term).
I personally did a lot of things schema-first and wrote a number of tools/plugins for XJC. For instance:
Hyperjaxb makes schema-derived classes persistable with JPA.
Jsonix is basically a JAXB port for pure JavaScript.
My experience is that you can do a lot of things schema-first, but I also have to say that XML Schema is good but not the best or simplest model. The specification is complex, and if you take a look at the schema-derived classes you can spot a few constructs which don't fit well into Java beans and properties. For instance, @XmlElementRef is a complex and often weird-looking construct - which is still necessary to cover quite a number of cases you can easily express in XML Schema. In all the tools I wrote I always had to fight with cases, corner cases and corner cases of corner cases of such constructs.
XML Schema, if you keep it simple and neat, may be beautiful. It maps perfectly to beans and properties, is easy to understand and work with, and has a lot of tool support. So XML Schema is not the worst choice to model or specify data exchange.
But it can also get as complex as hell. I saw a lot of overengineered schemas, which are then extremely hard to work with - for very little gain. Sometimes schema designers just don't know XML Schema well enough, sometimes they know it too well. Last time I helped to work out "XML Schema design best practices", we landed on a 60+ page document of do's and don'ts. So it's easy to get XML Schemas wrong.
But still, as I said above, if it's kept simple and clean it may be beautiful.
What are the alternatives?
Well, you may actually use your Java code as your model source. Annotated POJOs are expressive and versatile enough, but still quite simple to work with. You are not schema-first then, you're Java-code-first, but you can still do all the same tricks. You can generate an XML Schema from your annotated classes. You can do persistence (and much more) with MOXy. You can do JSON just as well.
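As a rough sketch of the Java-first direction (the Customer class is invented for illustration), the standard JAXB API can generate the schema for you:

import javax.xml.bind.JAXBContext;
import javax.xml.bind.SchemaOutputResolver;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.transform.Result;
import javax.xml.transform.stream.StreamResult;
import java.io.File;

@XmlRootElement
class Customer {
    public String name;
    public int id;
}

public class JavaFirstSchemaGen {
    public static void main(String[] args) throws Exception {
        JAXBContext context = JAXBContext.newInstance(Customer.class);
        // Write each generated schema document to a file in the working directory.
        context.generateSchema(new SchemaOutputResolver() {
            @Override
            public Result createOutput(String namespaceUri, String suggestedFileName) {
                File file = new File(suggestedFileName);
                StreamResult result = new StreamResult(file);
                result.setSystemId(file.toURI().toString());
                return result;
            }
        });
    }
}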
To sum up and answer your question:
Yes, it is practical, and is known to work fairly well.
Along with the schema-first approach also consider Java-first approach.
You have tools to get XML-Objects-JSON-Persistence.
There are pitfalls (see above).
Hope this helps.
Since no one has answered this question so far and we have started to follow this approach, I'll quickly summarize: for us the approach generally works quite well. We have designed a very rich XML Schema that serves as part of the contract between the server and the web client. The JSON follows the XML one-to-one, so the XML Schema reads naturally for the JSON document, too.
The only minor problem we noticed is that the canonical XML-to-JSON transformation that we use (which is not schema-aware) creates a single object when there is just one child element somewhere in the tree, even when the XML Schema allows multiple occurrences of that element. This means that the programmers have to handle some polymorphism between object values and collections on the JSON side.
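To make the pitfall concrete, here is roughly what a non-schema-aware conversion (org.json's XML helper, in our case) does with one child versus several; the exact output may differ slightly between versions:

import org.json.XML;

public class SingleChildPitfall {
    public static void main(String[] args) {
        // One <item>: "item" comes out as a plain value, not an array.
        System.out.println(XML.toJSONObject("<items><item>a</item></items>"));
        // e.g. {"items":{"item":"a"}}

        // Two <item>s: "item" suddenly becomes an array.
        System.out.println(XML.toJSONObject("<items><item>a</item><item>b</item></items>"));
        // e.g. {"items":{"item":["a","b"]}}
    }
}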

Java to XSD or XSD to Java

I know that, using JAXB, you can generate Java files from an XSD and that you can also generate the XSD from annotated POJOs. What are the advantages and disadvantages of each? Is one overall better than the other?
We basically want to serialize events to a log in XML format.
Ultimately it depends on where you want to focus:
If the XML Schema is the Most Important Thing
Then it is best to start from the XML schema and generate a JAXB model. There are details of an XML schema that a JAXB (JSR-222) implementation just can't generate:
A maxOccurs other than 0, 1, or unbounded
Many facets on a simple type
Model groups
If the Object Model is the Most Important Thing
If you will be using the Java model for more than just converting between objects and XML (i.e. using it with JPA for persistence) then I would recommend starting with Java objects. This will give you the greatest control.
It depends on your requirements and scenario, with respect to where you start from.
Given your requirement, generate the Java files from an XSD, since you want to define the output (XML) format first and have it supported by Java.
Given that one of the main points of XML is to be a portable data-transfer format, usable regardless of platform or language, I would avoid generating XSD from any specific programming language, as a rule of thumb. This may not matter if you know you are only communicating between Java endpoints (but are you sure this will be true forever?).
It is better, all else being equal, to define the interface/schema in a programming-language neutral way.
There are lots of exceptions to this general principle, especially if you are integrating with existing or legacy code...
If you have the opportunity to design both the POJO and the schema, it's a matter of design: do you design for a "perfect" schema or for a "perfect" Java class?
In some cases you don't have the luxury of a choice. In system integration scenarios you might be given a predefined XSD from another system that you need to adapt to; then XSD -> class is the only way.

Java - Generating XML for a legacy system

I'm working on an existing system that's generating XML for a legacy system using a simple template language. This is obviously not ideal because it's difficult to see the structure of the generated XML, it suffers from escaping problems and it's easy to generate invalid XML.
For any sane XML format I'd just use XStream or another Java XML serialization library, but this legacy system has a lot of strange rules like "this node should be excluded if the value is less than ten" and "the formatting of the date in node x depends on the value of node y". There are other strange rules as well, but this should be enough to get the idea.
As I've said, the template approach is far from ideal, but it's pragmatic and works (with some effort). Is there a better way to approach generating XML for legacy systems with this amount of formatting rules? XSL has crossed my mind, but implementing any amount of logic in XSL is frankly not very tempting.
Basically you need some custom logic during serialization. I am guessing that the in-memory object structure is not directly mirrored in the XML structure? Alternatives:
Use StAX and distribute read and write methods within the objects.
Use JAXB and insert custom serialization (a sketch follows below).
Don't even think of expressing your custom logic in anything other than java, i.e. some "super" framework.
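If you go the JAXB route, one way to hook in such rules is an XmlAdapter; the field names and date pattern below are invented for illustration, and rules that depend on another node usually end up in an adapter for the enclosing type or in a marshal callback:

import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.adapters.XmlAdapter;
import javax.xml.bind.annotation.adapters.XmlJavaTypeAdapter;
import java.text.SimpleDateFormat;
import java.util.Date;

// Formats the legacy date; a cross-field rule would adapt the enclosing type
// instead of a single field so it can see the other values.
class LegacyDateAdapter extends XmlAdapter<String, Date> {
    private static final String PATTERN = "yyyyMMdd"; // assumed legacy format

    @Override
    public String marshal(Date value) {
        return value == null ? null : new SimpleDateFormat(PATTERN).format(value);
    }

    @Override
    public Date unmarshal(String value) throws Exception {
        return value == null ? null : new SimpleDateFormat(PATTERN).parse(value);
    }
}

@XmlRootElement
class LegacyOrder {
    @XmlJavaTypeAdapter(LegacyDateAdapter.class)
    public Date shipped;

    // Setting this to null before marshalling drops the node entirely,
    // e.g. when the value is less than ten.
    public Integer quantity;
}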
I am not sure if this is what you are looking for, but maybe try XML binding, like JAXB...
In other words: you could generate a class library from your XSD schema, then build your object graph in Java code, then serialize it to XML in one call.
You could use Simple XML and some converters, I think:
http://simple.sourceforge.net/download/stream/doc/tutorial/tutorial.php
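A minimal Simple (simple-xml) sketch, purely as an assumed illustration; custom Converter/Transform hooks would be where the legacy formatting rules go:

import org.simpleframework.xml.Element;
import org.simpleframework.xml.Root;
import org.simpleframework.xml.core.Persister;

@Root
class Event {
    @Element
    private String type;

    @Element
    private long timestamp;

    // Simple needs a no-argument constructor for deserialization.
    public Event() {
    }

    public Event(String type, long timestamp) {
        this.type = type;
        this.timestamp = timestamp;
    }
}

public class SimpleXmlDemo {
    public static void main(String[] args) throws Exception {
        // Serialize one event to stdout.
        new Persister().write(new Event("login", System.currentTimeMillis()), System.out);
    }
}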

How to create a handler for parsing XML using SAX?

I'm using Xerces SAX to parse this XML:
<product id="123">
  <sku>abc123</sku>
  <location>
    <warehouse>1</warehouse>
    <level>3</level>
  </location>
  <details>
    <weight unit="kg">150</weight>
    <mfg>honda</mfg>
  </details>
</product>
So I created my own class named ProductSAXHandler.java:
public class ProductSAXHandler extends DefaultHandler {
}
Now from my Spring MVC controller I plan on passing my XML file as a string, and I want a Product object returned.
So inside my ProductSAXHandler I will create my startElement and endElement methods.
So should I just create a method that takes the xml as a string, then instantiates this ProductSAXhandler and then call parse?
This is my first time dealing with xml/sax so I'm a little unclear how to design this class.
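Something like this is roughly what I have in mind for the entry point (untested sketch; getProduct() would be a getter I add to the handler to expose the object it builds):

import org.xml.sax.InputSource;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import java.io.StringReader;

public class ProductParser {

    // Takes the raw XML string from the controller and returns the populated Product.
    public static Product parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();

        ProductSAXHandler handler = new ProductSAXHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);

        return handler.getProduct();
    }
}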
This doesn't directly answer your question, but I'm gonna read a bit between the lines and focus on your actual intention...
Since you want an instance of some Product class that encapsulates the data from the XML, probably in a structured way by preference, you'd do much better to use JAXB for this task. Unless you have really specific requirements regarding the customization of binding XML input to objects, this will turn out a lot simpler than using SAX.
What you'll need to do is:
Get a W3C XML Schema for your XML. If you don't have one and can't obtain one, then there are tools out there that can generate a schema based on input XML. Trang makes this very easy.
Generate Java classes from the schema. For this you can use XJC (the XML-to-Java Compiler) available with Sun's JDK. If you're using build tools like Ant or Maven, there's plugins available. This can automate the process to make it part of a full build.
Use the JAXB API with your generated classes to easily turn XML documents into objects and vice-versa.
Although JAXB will take some time to learn (especially if the desired XML-Java mapping isn't 1-to-1), I think you'll end up saving time in the long run.
Sorry if you really do need SAX and this answer is not applicable, but I figured I'd rather let you know your options before using a somewhat archaic XML processing model. Note that DOM and StAX might also be of interest to you.

Generate object model out of RelaxNG schema with RNGOM - how to start?

I want to generate an object model out of a RelaxNG schema.
For this I want to use the RNGOM Object Model/Parser (mainly because I could not find any alternative - although I don't even care about the language the parser is written in or generates). Now that I have checked out the RNGOM source from SVN, I don't have ANY idea how to use RNGOM, since there is no information out there about its usage.
A useful hint on how to start with RNGOM - a link, example, or any description which saves me from having to read and understand the whole source code of RNGOM - will be accepted as an answer.
Even better would be a simple example of how to use the parser to generate an object model out of an RNG file.
More info:
I want to generate Java classes out of the following RelaxNG Schema:
http://libvirt.org/git/?p=libvirt.git;a=tree;f=docs/schemas;hb=HEAD
I found out that the Glassfish guys are using rngom to generate the same object model I need, but I could not yet find out how they are using rngom.
A way to proceed could be to:
use Trang (distributed with Jing) to convert from Relax NG to XML Schema (see here)
use more common tools to generate classes (e.g. JAXB).
Hi, I ran into mostly the same requirement, except that I am concentrating on the Compact Syntax. Here is one way of doing what you want, but YMMV.
To give some context, my goal in 2 phases: (a) Trying to slurp RelaxNG Compact Syntax and traverse an object/tree to create Spring 4 POJOs usable in Spring 4 Rest Controller. (b) From there I want to develop a request validator that uses the RNG Compact and automatically validates the request before Spring de-serializes the request. Basically scaffolding JSON REST API development using RelaxNG Compact Syntax as both design/documentation and JSON schema definition/validation.
For the first objective I thought about annotating the CompactSyntax grammar with JJTree, but I am obviously not fluent in JavaCC, so I decided to take a more programmatic approach...
I analyzed and tested the code in several ways to determine whether there was a tree implementation in the binary, digested and/or nc packages, but I don't think there is one (an OM/tree) as such.
So my latest (and actually successful) approach has been to build upon binary and extend SchemaBuilderImpl, implement the visitor interface, and pass my custom SchemaBuilderImpl to CompactSyntax using the long constructor: CompactSyntax(CompactParseable parseable, Reader r, String sourceUri, SchemaBuilder sb, ErrorHandler eh, String inheritedNs)
When you call CompactParseable.parse you will get structured events in the visitor interface, and I think this is good enough to traverse the RNG schema; from here you could easily create an OM or tree.
But I am not sure this is the best approach. Maybe I missed something and there is in fact an OM/Tree built by the rngom implementation (in my case CompactSyntax) that you can traverse to determine parent/child relationships more easily. Or maybe there are other approaches to this.
Anyway, this is one approach that seems to be working for what I want. It is mostly visitor-pattern based, and since the interfaces were there I decided to use them. Maybe it will work for you. Bottom line: I could not find an OM/AST that can be traversed implemented anywhere in the implementation packages (nc, binary, digested).
