Generate object model out of RelaxNG schema with RNGOM - how to start?

Generate object model out of RelaxNG schema with RNGOM - how to start? - java

I want to generate an object model out of an RelaxNG Schema.
Therefore I want to use the RNGOM Object Model/Parser (mainly because I could not find any alternative - although I don't even care about the language the parser is written in/generates). Now that I checked out the RNGOM source from SVN, I don't have ANY idea how to use RNGOM, since there is not any piece of information out there about the usage.
A useful hint how to start with RNGOM - a link, example, or any description which saves me from having to read understand the whole source code of RNGOM - will be awarded as an answer.
Even better would be a simple example how to use the parser to generate an Object model out of an RNG file.
More infos:
I want to generate Java classes out of the following RelaxNG Schema:
http://libvirt.org/git/?p=libvirt.git;a=tree;f=docs/schemas;hb=HEAD
I found out that the Glassfish guys are using rngom to generate the same object model I need, but I could not yet find out how they are using rngom.

A way to proceed could be to :
use jing to convert from Relax NG to XML Schema (see here)
use more common tools to generate classes (e.g. JaxB).

Hi I ran into mostly the same requirement except I am concentrating on the Compact Syntax. Here is one way of doing what you want but YMMV.
To give some context, my goal in 2 phases: (a) Trying to slurp RelaxNG Compact Syntax and traverse an object/tree to create Spring 4 POJOs usable in Spring 4 Rest Controller. (b) From there I want to develop a request validator that uses the RNG Compact and automatically validates the request before Spring de-serializes the request. Basically scaffolding JSON REST API development using RelaxNG Compact Syntax as both design/documentation and JSON schema definition/validation.
For the first objective I thought about annotating the CompactSyntax with JJTree but I am obviously not fluent in JavaCC so I decided to go a more programatic approach...
I analyzed and tested the code in several ways to determine if there was a tree implementation in binary, digested and/or nc packages but I don't think there is one (an om/tree) as such.
So my latest, actually successful approach, has been to build upon binary and extend SchemaBuilderImpl, implement the visitor interface, and passing my custom SchemaBuilderImpl to CompactSyntax using the long constructor: CompactSyntax(CompactParseable parseable, Reader r, String sourceUri, SchemaBuilder sb, ErrorHandler eh, String inheritedNs)
When you call CompactParseable.parse you will get structured events in the visitor interface and I think this is good enough to traverse the rng schema and from here you could easily create an OM or tree.
But I am not sure this is the best approach. Maybe I missed something and there is in fact an OM/Tree built by the rngom implementation (in my case CompactSyntax) that you can traverse to determine parent/child relationships more easily. Or maybe there are other approaches to this.
Anyway, this is one approach that seems to be working for what I want. Is mostly visitor pattern based and since the interfaces were there I decided to use them. Maybe it will work for you. Bottom line, I could not find an OM/AST that can be traversed implemented anywhere in the implementation packages (nc, binary, digested).

Related

Is it practical to combine XML Schema and an XML-to-JSON conversion?

I have to specify a JSON data structure; that data structure will be part of an interface description, the data will be processed by JavaScript. JSON is set for the data transmission. In other projects, where we used XML instead of JSON, I have used rich XML schemas for this. Unfortunately, I cannot do that now.
I did some researching and found JSON Schema.
However, this is still draft status, which makes me feel a bit uneasy to use it in this context.
I also came across this question discussing how to map XML to JSON. There seems to be a standard (?) conversion in the XML class in the org.json namespace. It appears that the conversion is rather straight-forward for XML documents without mixed content.
So the idea is to use XML Schema to describe the data structure, use our existing XML processing (editing, transformation, validation, ...) tools as long as possible on the server side and convert the XML DOM to JSON just before delivering the data to the JSON consumer.
Data transmission is one-way only and we would not have mixed-content XML.
Maybe someone has tried this before? Would that be a practical approach in the sense that the the semantics of the XML Schema are still clear enough for the client-side programmers when (conceptually) applied to the JSON document? Are there any particular pitfalls to be aware of?

If I understood your idea right, you want to use XML Schema as the primary model for you data exchange - for XML as well as JSON formats.
This idea has two parts:
Use single source to model all the data exchange.
Use XML Schema as this single source.
Singe source model
The first idea brings you to MDD (Model-Driven Development) or MDA (Model-Driven Architecture) which had a hype around 2002-2005. It was UML-heavy, vendor-driven hype, but quite a few reasonable things (like AndroMDA) survived.
Generally, MDA is a good idea. It works splendid as long as you do "standard" things. But it can be a nightmare if you want to "customize".
In your case, I would definitely say that single-source model makes sense. This is about data exchange. In the core this can be reduced to very simple models which are still powerful enough to express everything you need.
JSON is an example for this. JSON is even simpler that XML but still powerful enough. It clearly shows that as long as you have basic primitive types, objects, arrays and nesting you can express almost anything.
This "single source model" must not be necessarily UML, it can be anything powerful enough to cover all the underlying requirements.
The main problem with "single source model" is customizing. You know, 90% works verwy well OOTB, but then in 10% you don't get the result you want and have to customize and then the effort gets you. Most of the generation tools have some kinds of "plugins". So if you fit in the 90%, you're lucky, otherwise you may need to get to know the hairy internals of the genration tools.
To sum up, single-source model is a good idea as long as it serves all the needs AND the effort to adapt/apply it for the required scenarios is not greater that making it from scratch.
XML Schema as the model
The next question is whether XML Schema is good as the single source model.
You have probably heard or used JAXB which has a schema compiler (XJC). This compiler can take your XML Schema and then generate Java classes with JAXB annotations. These classes can then be used to unmarshal XML into Java objects or marshal these object to XML.
And to JSON:
JAXB Mapping to JSON
Looks like you can also produce a JSON Schema from these classes (haven't tried it myself though):
How to generate JSON schema from a JAXB annotated class?
So XML Schema-first approach works. You can call it schema-driven development (I, hereby, claim the copyright on this term).
I personally did a lot of things schema-first wrote a number of tools/plugins for XJC. For instance:
Hyperjaxb makes schema-derived classes persistable with JPA.
Jsonix is baiscally a JAXB port for pure JavaScript.
My experience is that you can do a lot of things schema-first, but I also have to say that XML Schema is good but not the best or simplest model. The specification is complex, and if you take a look at the schema-derived classes then you could spot a few constructs which don't fit well in Java beans and properties. For instance, #XmlElementRef is a complex and often weird looking construct - which is stil necessary to cover quite a number of cases you can easily express in the XML Schema. In all the tools I wrote i alsways had to fight with cases and corder cases and corner cases of corner cases of such constructs.
XML Schema, if you keep it simple and neat, may be beautiful. Maps perfect to beans and properties, easy to understand and work with, a lot of tool support. So XML Schema is not the worst choice to model or specify data exchange.
But it can also get as complex as hell. I saw a lot of overengineered schemas, which then are extremely hard to work with - for a very little gain. Sometimes schema designers just don't know XML Schema well enough, sometimes know it too well. Last time I helped to work out "XML Schema design best practices", we landed on 60+ someting pages document of do's and don't's. So it's easy to get XML Schemas wrong.
But still, as I said above, if it's kept simple and clean it may be beuatiful.
What are the alternatives?
Well, you may actually use your Java code as your model source. Annotated POJOs are expressionaly powerful and versatile enough, but still quite simple to work with. You are not schema-first, you're Java code-first then, but you still can do all the same tricks. You can generate an XML Schema based on your annotated classes. You can do persistence (and much more) with MOXy. You can do JSON just as well.
To sum up and answer your question:
Yes, it is practical, and is known to work fairly well.
Along with the schema-first approach also consider Java-first approach.
You have tools to get XML-Objects-JSON-Persistence.
There are pitfalls (see above).
Hope this helps.

Since no one has answered to this question so far and we have started to follow this approach, I quickly summarize that for us the approach works generally quite well. We have designed a very rich XML Schema, that serves us as part of the contract between the server and the web client. The JSON follows the XML one-to-one, so the XML Schema reads naturally for the JSON document, too.
The only minor problem we noticed is that the canonical XML-to-JSON transformation that we use (which is not Schema-aware) creates a single object when there is just one child element somewhere in the tree, even when the XML Schema has an upperBound of 'many' for that element. This means that the programmers have to handle some polymorphism between object-values and collections here on the JSON side.

Java - Generating XML for a legacy system

I'm working on an existing system that's generating XML for a legacy system using a simple template language. This is obviously not ideal because it's difficult to see the structure of the generated XML, it suffers from escaping problems and it's easy to generate invalid XML.
For any sane XML formats I'd just Xstream or another Java XML serializing library, but this legacy system has a lot of strange rules like "this node should be excluded if the value is less then ten" and "the formatting of the date in node x depends on the value of node y". There are other strange rules as well, but this should be enough to get the idea.
As I've said, the template approach is far from idea, but it's pragmatic and works (with some effort). Is there a better way to approach generating XML for legacy systems with this amount of formatting rules? XSL has crossed my mind, but implementing any amount of logic in XSL is frankly not very tempting.

Basically you need some custom logic during serialization. I am guessing that the in-memory object structure is not directly mirrored in the XML structure? Alternatives:
Use StAX and distribute read and write methods within the objects.
Use JAXB and insert custom serialization.
Don't even think of expressing your custom logic in anything other than java, i.e. some "super" framework.

I am not sure, if this is what you are looking for, but maybe try XML Binding like JAXB...
In other words: you could generate a class library from your xsd-Schema and then build your object graph in java code, then serialize it in one call to xml.

You could use simple xml and some converters I think:
http://simple.sourceforge.net/download/stream/doc/tutorial/tutorial.php

Extending Axis Code Generation module

In short, what I'm looking to achieve is the ability to run wsdl2java and generate extra code. Has anyone done this and can offer hints/tips/advice, has anyone done anything similar with a different approach the one outline further down the question (a lot further down)?
In the long form:
Background:
We have a third party piece of software that is used extensively in many projects but it has no ability to integrate directly with web services. With this in mind we take the wsdl, generate the client and then have a lot of boilerplate code that sits on top to allow integration. I've spent some time stream lining this but I want to go the whole hog.
Current standing:
I've written a simple first generation code generator which handles the creation of 95% of the code however this is reads in a hand written xml config, outputs the code using FileWriter(eugh), but I still need to hand write the code to tie it pass info to/from the actual webservice client code. This was just a quick and dirty solution as I needed it fast, and also to act as a POC.
Approach to solving this:
I'm picking this up in my own time purely because I think its a interesting problem but as such I don't want to waste lots of it on a dead end approach.
I believe the way to achieve my goal is to write an extension to the code generation module as described here http://wso2.org/library/35, I belive by writing this extension I will get access to the axis model of the wsdl and can apply my own xslt to it.
If you agree and have done similar, is there any advice you care to share, or useful resources you can point me to.
If you disagree with my approach I'd appreciate a brief outline( don't want to waste your time) of why and suggestion for a new approch.

I have never extended the Axis2 WSDL2Java emitter so I don't know how much flexibility you would gain from doing so. The article you reference suggests that you can hook into the generation process quite easily. It really depends on what you have to generate.
Recently I had to do create boilerplate code from Database schemas and WSDLs and I have used a mixed approach:
Groovy
Groovy is great for rapid prototyping and templates. You can, for instance, collect information from a Database or a Wsdl and emit code based on a template. Here you can see some examples: http://groovy.codehaus.org/Groovy+Templates
PMD API
PMD is a tool to scan Java code and report potential problems. It also exposes an API for parsing code using XPATH and has a very rich model to work with. You can do stuff like:
final Java15Parser parser = new Java15Parser();
final FileInputStream stream = new FileInputStream("VehicleServiceType.java");
final Object c = parser.parse(new InputStreamReader(stream));
final XPath xpath = new BaseXPath("//TypeDeclaration/Annotation/NormalAnnotation[Name/#Image = 'WebService']",
new DocumentNavigator());
for (final Iterator iter = xpath.selectNodes(c).iterator(); iter.hasNext();) {
final Object obj = iter.next();
// Do code generation based on annotations...
}
Personally, I have found that a mixed approach works better then a monolithic one. Code generation is often an art more then a science.
One more thing: in my current project, I'm looking at Python for (simple) code generation. It has a very nice templating library (jinja), but I wouldn't recommend it for parsing Java code.
Hope this help!

You've already started writing a code generator to achieve your end so you could try continuing on this path. The codemodel library is a pretty amazing library for generating code. I have just recently used it to generate code and it was very good.
I'd suggest you give this codemodel library a try to generate the code you require. This is the codemodel library written for jaxb wsdl to java compiler.

We've used wsdl2java a lot (but in axis 1.4) My only tips are:
1 use complex/structured types as the arguments to operations e.g. resetWidget(widgetStruct) where widgetStruct class contains the fields widgetId, widgetName, widgetType etc instead of resetWidget(arg1, arg2, arg3..). So next year when you extend the WSDL and add some more params all the legacy code will still compile without have to extend all your methods. This approach was actually forced on us because another (old) WSDL tool didn't generate the responses correctly if we passed all the fields as params.
2 Put all you business logic into other classes. So when you regenerate the skeleton you can just put back in a few lines of code instead of having to update huge chunks of code.
Maybe some of these problems are solved in Axis2.

Further research revealed that the way to go was to create a new CodeEmitter based on AxisServiceBasedMultiLanguageEmitter.
Unfortunately this project was canned as there stopped being a requirement to create the web service clients. The third party piece of software released a new version allowing it to directly consume web services.

Denormalize an XML schema programmatically

I need to take any given valid XML schema (XSD) and denormalize it to a simple form containing no refs, no includes, etc. All simple type definitions should be inline, such that when looking at any given element, all declarations are visible without performing another lookup.
I've found some tools that have this built-in, but I need to do it "on the fly." Platform of choice is Java, but I'd be willing to port the code from another language if necessary. I just really don't want to reinvent the wheel here. Searching for OSS libraries from Apache/etc have yielded nothing. The closest I've found is XSOM which supports traversing a schema as an object model, but you still have to handle every possible form that a schema could take to represent a given structure.
The output doesn't have to be actual XML, as it will actually be used in a object model in its final form.

You might find XSD4J helpful:
http://dynvocation.selfip.net/xsd4j/

The EMF XSD model may be helpful:
http://www.eclipse.org/modeling/mdt/?project=xsd

Another useful API for XML Schema is XSOM.
XSOM is used by XJC, JAXB schema compiler under the hub so is probably guaranteed to be kept alive.

Serialize Java objects into Java code

Does somebody know a Java library which serializes a Java object hierarchy into Java code which generates this object hierarchy? Like Object/XML serialization, only that the output format is not binary/XML but Java code.

Serialised data represents the internal data of objects. There isn't enough information to work out what methods you would need to call on the objects to reproduce the internal state.
There are two obvious approaches:
Encode the serialised data in a literal String and deserialise that.
Use java.beans XML persistence, which should be easy enough to process with your favourite XML->Java source technique.

I am not aware of any libraries that will do this out of the box but you should be able to take one of the many object to XML serialisation libraries and customise the backend code to generate Java. Would probably not be much code.
For example a quick google turned up XStream. I've never used it but is seems to support multiple backends other than XML - e.g. JSON. You can implement your own writer and just write out the Java code needed to recreate the hierarchy.
I'm sure you could do the same with other libraries, in particular if you can hook into a SAX event stream.
See:
HierarchicalStreamWriter

Great question. I was thinking about serializing objects into java code to make testing easier. The use case would be to load some data into a db, then generate the code creating an object and later use this code in test methods to initialize data without the need to access the DB.
It is somehow true that the object state doesn't contain enough info to know how it's been created and transformed, however, for simple java beans there is no reason why this shouldn't be possible.
Do you feel like writing a small library for this purpose? I'll start coding soon!

XStream is a serialization library I used for serialization to XML. It should be possible and rather easy to extend it so that it writes Java code.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.