What is the best performance solution for XML generation.
My goal is to build a few simple XMLs from code. I am going to implement simple custom StringBuffer based implementation of XML Builder. From other side there are several libraries like http://code.google.com/p/java-xmlbuilder/ and http://code.google.com/p/xmltool/ which has nice DSL but I guess lack on performance.
Since my goal is build simple enough XMLBuilder with great performance I think I will build custom solution. It will featuring:
Nice Java-based DSL for XML constructs (adding tags basically)
Great StringBuffer based performance.
String data escape handling when adding XML tags.
Auto-indent
Please suggest if I am wrong on performance expectations and its probably better to use ready-made libraries.
UPDATE. Why I think the performance of standard xml builders is not very good.
Standard XML builders uses Document Builder Factory and works with classes behind the scenes. Also these classes optimized to fit all users. For example I don't need namespace support etc.
<?xml version="1.0" encoding="utf-8">
<root>
<testdata>value</testdata>
</root>
</xml>
Consider very simple XML code above. If you build with standard tools it will involve so many work just to make this simple XML. I consider that it's better to just generate it by myself using String.
UPDATE 2. Performance requirement is that code should do as many things as required to generate simple XML and not more.
UPDATE 3. Thanks everyone for great comments! Now I understand better what I need and that my initial goal was not set very correctly with word "performance". My true goal is to use simple enough solution with convenient DSL to describe the XML structure and generate the XML output.
I will use plain Java objects as DSL for XML and generate XML using XStream library which is pretty straightforward solution.
UPDATE 4. JAXB. I discussed XStream vs JAXB and found that JAXB is faster than XStream. Plus I already use JAXB in my project and I like its standard annotations. I change my mind and will go with JAXB for now because XStream was originally heavily developed at the time when JAXB was not so good as today.
I will suggest something very controversial but still ...
Make profiling and performance tests with both libraries.
If you don't have time for that, assuming something is slow would be the wrong choice in my opinion.
Because if it turns out that it actually is not slow, it would save you a lot of time to use an already built and supported library/framework.
Another thought.
You will need to test your completed high performance solution against the solutions already available anyway, to check if it is really high performance. So I would strongly suggest measuring the performance of the libraries available before starting your own.
Regarding:
Standard XML builders uses Document
Builder Factory and works with classes
behind the scenes. Also these classes
optimized to fit all users. For
example I don't need namespace support
etc.
An alternative to DOM is StAX (JSR-173). It is a Streaming API for XML that is quite fast. There are several implementations, I have found Woodstox to be quite performant.
There is powerful and flexible Groovy's NodeBuilder (http://groovy.codehaus.org/GroovyMarkup).
def root = new NodeBuilder()
.people(kind:'folks', groovy:true) {
person(x:123, name:'James', cheese:'edam') {
project(name:'groovy')
project(name:'geronimo')
}
person(x:234, name:'bob', cheese:'cheddar') {
project(name:'groovy')
project(name:'drools')
}
}
XmlUtil.serialize(root, System.out)
This results with an XML document:
<?xml version="1.0" encoding="UTF-8"?>
<people kind="folks" groovy="true">
<person x="123" name="James" cheese="edam">
<project name="groovy"/>
<project name="geronimo"/>
</person>
<person x="234" name="bob" cheese="cheddar">
<project name="groovy"/>
<project name="drools"/>
</person>
</people>
One more high-performance suggestion: use StaxMate -- it is as fast as underlying Stax-based XML writer, which is rather fast (40 - 80 megabytes per second, sustained). Just make sure you do NOT use default JDK 6 Stax implementation (Sun sjsxp) but something faster like Woodstox or Aalto.
I would strongly recommend against writing your own XML writer; it is typically risky (good chance you will forget some part of escaping) as others have mentioned, and not all that likely to be faster than existing efficient solutions (not all existing solutions are efficient; you do need to find ones that are). And in the end... unless you really want to write these things, why not work on something more interesting and meaningful?
But if you do want to do something above and beyond existing writers you could consider using a simple writer and augmenting it with additional functionality that you need. For example, if you just use Stax XMLStreamWriter as base, it is quite easy to add simple but efficient abstractions. Or if you like existing packages, see if you can suggest improvements to their authors (or even code contributions).
Related
I have to specify a JSON data structure; that data structure will be part of an interface description, the data will be processed by JavaScript. JSON is set for the data transmission. In other projects, where we used XML instead of JSON, I have used rich XML schemas for this. Unfortunately, I cannot do that now.
I did some researching and found JSON Schema.
However, this is still draft status, which makes me feel a bit uneasy to use it in this context.
I also came across this question discussing how to map XML to JSON. There seems to be a standard (?) conversion in the XML class in the org.json namespace. It appears that the conversion is rather straight-forward for XML documents without mixed content.
So the idea is to use XML Schema to describe the data structure, use our existing XML processing (editing, transformation, validation, ...) tools as long as possible on the server side and convert the XML DOM to JSON just before delivering the data to the JSON consumer.
Data transmission is one-way only and we would not have mixed-content XML.
Maybe someone has tried this before? Would that be a practical approach in the sense that the the semantics of the XML Schema are still clear enough for the client-side programmers when (conceptually) applied to the JSON document? Are there any particular pitfalls to be aware of?
If I understood your idea right, you want to use XML Schema as the primary model for you data exchange - for XML as well as JSON formats.
This idea has two parts:
Use single source to model all the data exchange.
Use XML Schema as this single source.
Singe source model
The first idea brings you to MDD (Model-Driven Development) or MDA (Model-Driven Architecture) which had a hype around 2002-2005. It was UML-heavy, vendor-driven hype, but quite a few reasonable things (like AndroMDA) survived.
Generally, MDA is a good idea. It works splendid as long as you do "standard" things. But it can be a nightmare if you want to "customize".
In your case, I would definitely say that single-source model makes sense. This is about data exchange. In the core this can be reduced to very simple models which are still powerful enough to express everything you need.
JSON is an example for this. JSON is even simpler that XML but still powerful enough. It clearly shows that as long as you have basic primitive types, objects, arrays and nesting you can express almost anything.
This "single source model" must not be necessarily UML, it can be anything powerful enough to cover all the underlying requirements.
The main problem with "single source model" is customizing. You know, 90% works verwy well OOTB, but then in 10% you don't get the result you want and have to customize and then the effort gets you. Most of the generation tools have some kinds of "plugins". So if you fit in the 90%, you're lucky, otherwise you may need to get to know the hairy internals of the genration tools.
To sum up, single-source model is a good idea as long as it serves all the needs AND the effort to adapt/apply it for the required scenarios is not greater that making it from scratch.
XML Schema as the model
The next question is whether XML Schema is good as the single source model.
You have probably heard or used JAXB which has a schema compiler (XJC). This compiler can take your XML Schema and then generate Java classes with JAXB annotations. These classes can then be used to unmarshal XML into Java objects or marshal these object to XML.
And to JSON:
JAXB Mapping to JSON
Looks like you can also produce a JSON Schema from these classes (haven't tried it myself though):
How to generate JSON schema from a JAXB annotated class?
So XML Schema-first approach works. You can call it schema-driven development (I, hereby, claim the copyright on this term).
I personally did a lot of things schema-first wrote a number of tools/plugins for XJC. For instance:
Hyperjaxb makes schema-derived classes persistable with JPA.
Jsonix is baiscally a JAXB port for pure JavaScript.
My experience is that you can do a lot of things schema-first, but I also have to say that XML Schema is good but not the best or simplest model. The specification is complex, and if you take a look at the schema-derived classes then you could spot a few constructs which don't fit well in Java beans and properties. For instance, #XmlElementRef is a complex and often weird looking construct - which is stil necessary to cover quite a number of cases you can easily express in the XML Schema. In all the tools I wrote i alsways had to fight with cases and corder cases and corner cases of corner cases of such constructs.
XML Schema, if you keep it simple and neat, may be beautiful. Maps perfect to beans and properties, easy to understand and work with, a lot of tool support. So XML Schema is not the worst choice to model or specify data exchange.
But it can also get as complex as hell. I saw a lot of overengineered schemas, which then are extremely hard to work with - for a very little gain. Sometimes schema designers just don't know XML Schema well enough, sometimes know it too well. Last time I helped to work out "XML Schema design best practices", we landed on 60+ someting pages document of do's and don't's. So it's easy to get XML Schemas wrong.
But still, as I said above, if it's kept simple and clean it may be beuatiful.
What are the alternatives?
Well, you may actually use your Java code as your model source. Annotated POJOs are expressionaly powerful and versatile enough, but still quite simple to work with. You are not schema-first, you're Java code-first then, but you still can do all the same tricks. You can generate an XML Schema based on your annotated classes. You can do persistence (and much more) with MOXy. You can do JSON just as well.
To sum up and answer your question:
Yes, it is practical, and is known to work fairly well.
Along with the schema-first approach also consider Java-first approach.
You have tools to get XML-Objects-JSON-Persistence.
There are pitfalls (see above).
Hope this helps.
Since no one has answered to this question so far and we have started to follow this approach, I quickly summarize that for us the approach works generally quite well. We have designed a very rich XML Schema, that serves us as part of the contract between the server and the web client. The JSON follows the XML one-to-one, so the XML Schema reads naturally for the JSON document, too.
The only minor problem we noticed is that the canonical XML-to-JSON transformation that we use (which is not Schema-aware) creates a single object when there is just one child element somewhere in the tree, even when the XML Schema has an upperBound of 'many' for that element. This means that the programmers have to handle some polymorphism between object-values and collections here on the JSON side.
I am using java, I want to read strings from an XML tag. EX: < blank type="Something">
I need to be able to assign "Something" to a variable. Any ideas?
There are a lot of ways to do this:
You can use the XML APIs provided with Java (SAX or STAX or DOM).
There are libraries that build on the XML APIs (JDOM, DOM4J, or XOM) which are easier to use than the raw APIs.
There's Java-XML databinding, described in Pratik's answer. Java-XML databinding is sometimes overkill, depending on your requirements, and when there are errors they can be hard to figure out. Sometimes it's worthwhile, though. I think JiBX is particularly interesting.
If you don't know where to start, start with XOM. XOM was created by a JDOM contributor, it was designed to be easy to use.
What you want to achieve is referred to as Unmarshalling an XML. Unmarshalling means extracting data from an XML document and using it to construct a Java object or graph of objects. There are various APIs available to do the same. You should have a look at the following links:
JAXB:
http://www.oracle.com/technetwork/articles/javase/index-140168.html
Castor: http://castor.codehaus.org/
JiBX: http://jibx.sourceforge.net/
I'm working on an existing system that's generating XML for a legacy system using a simple template language. This is obviously not ideal because it's difficult to see the structure of the generated XML, it suffers from escaping problems and it's easy to generate invalid XML.
For any sane XML formats I'd just Xstream or another Java XML serializing library, but this legacy system has a lot of strange rules like "this node should be excluded if the value is less then ten" and "the formatting of the date in node x depends on the value of node y". There are other strange rules as well, but this should be enough to get the idea.
As I've said, the template approach is far from idea, but it's pragmatic and works (with some effort). Is there a better way to approach generating XML for legacy systems with this amount of formatting rules? XSL has crossed my mind, but implementing any amount of logic in XSL is frankly not very tempting.
Basically you need some custom logic during serialization. I am guessing that the in-memory object structure is not directly mirrored in the XML structure? Alternatives:
Use StAX and distribute read and write methods within the objects.
Use JAXB and insert custom serialization.
Don't even think of expressing your custom logic in anything other than java, i.e. some "super" framework.
I am not sure, if this is what you are looking for, but maybe try XML Binding like JAXB...
In other words: you could generate a class library from your xsd-Schema and then build your object graph in java code, then serialize it in one call to xml.
You could use simple xml and some converters I think:
http://simple.sourceforge.net/download/stream/doc/tutorial/tutorial.php
Can you please recommend a fast XML Builder in Java that doesn't use annotations? Thanks.
JAXB 2. You can use it without annotations, defining mappings in XML resources.
Betwixt, from apache is pretty good for simple stuff. http://commons.apache.org/betwixt/
Xstream is a better and faster framework. Its easier to learn as well.
XMLEncoder. It is a standard Java class, simple, feature-poor way of serializing objects to XML.
How about StaxMate -- very light-weight, faster than object-mapper solutions (XStream or JAXB), still reasonably easy to use for many use cases.
Of course it really depends on whether by fast you mean time to write code (if so, XStream is probably the thing), or time to run it (in which case simple Stax/SAX wrapping writers, like StaxMate or XMLBuilder are the things); XStream is not particular fast, although it's not the slowest thing around.
I've been doing quite a bit of simple XML-processing in python and grown to like the ElementTree way of doing things.
Is there something similar and as easy to use in Java? I find the DOM model a bit cumbersome and find myself writing much more code than I would like to do simple things.
Or am I asking the wrong thing?
Maybe my question is: Is there a better option than the "XMLUtils" classes I see people implementing in some places to simplify their code when dealing with DOM?
Adding a litte bit here about why I like ElementTree since the question was asked.
Simplicity (I guess anything seems simple after working with DOM though)
Feels like a natural fit in python
Requires very little code on my part.
I'm trying to come up with a simple code example to illustrate, but it's sort of hard to give a good example.
Here's an attempt though. This just adds a tag with a value and an attribute to an existing xml string.
from xml.etree.ElementTree import *
xml_string = '<top><sub a="x"></sub></top>'
parsed = fromstring(xmlstring)
se = SubElement(parsed, "tag")
se.text = "value"
se.attrib["a"] = "x"
new_xml_string = tostring(parsed)
After that, the new_xml_string is
<top><sub a="x" /><tag a="x">value</tag></top>
Not an example that really covers everything, but still. There's also the fairly simple looping over tags when you want to do stuff, easy testing for presence of tags and attributes and other things.
To be honest, all XML APIs in Java suck, you just can vary the level of suckage you push yourself into which may turn horrible/slow to manageable/decent to even suprisingly OK at times.
This all mostly stems from the fact that Java APIs try to be as W3C DOM compliant as possible, in fact Xerces (Java's current native XML solution) prides itself on being compliant to a whole bunch of XML related W3C specifications as you can see from their front page.
The actual Xerces API is very unpleasant to work with, though, and because of that multiple other Java XML libraries have popped out over the years. Currently most popular ones are
JDOM, simplifies DOM operations a lot and do I dare to say even pleasant at times, works like a charm when mixed with Jaxen - well, unless you hit this problem with namespaces.
XOM which has a wonderful presentation about what's wrong with Java's XML right now and how they propose their way of doing things as a solution. In part it is actually better than JDOM, but it's not widespread enough yet so can't really say how it behaves in the real world out there. Definitely worth a check though.
dom4j, well-rounded library, supports all kinds of important features and plays out as a down-to-earth solution for XML. dom4j is basically the "old, proven and reliable" option of the popular ones.
Last but definitely not least I just have to mention StAX just because it's different, it's actually event-driven streaming API for XML. Definitely worth a look just out of curiosity.
PS. I'm currently actually writing my own XML parser/navigator as an exercise but haven't decided on what kind of API it will have. I'm really aiming for ease of use which seems to be quite rare in Java XML APIs so far, but I'm not entirely sure what kind of API I am going to provide. Python's ElementTree seems interesting, but since I'm not entirely familiar with it, would you like to maybe give a short summary on what exactly in it you find enjoyable?
You might look into the following alternatives:
dom4j
xom
jdom
Since I never used ElementTree I don't know wich one is the closest.
If you can use Groovy inside your project, it offers a set of classes that helps a lot when processing XML.
We find XOM (http://www.xom.nu) to provide simple subclassable Element functionality.
It is true the Java XML APIs are not the greatest in terms of usability. My prefered options would be XOM, JDOM then the built in JAXP in that order. There were some rumbling about native XML in the language (Begin Product Tab Sub Links
Integrating XML into the Java Programming Language) as a new data-type but that seems to have stalled.