xml.etree.ElementTree equivalent in Java

I've been doing quite a bit of simple XML processing in Python and have grown to like the ElementTree way of doing things.
Is there something similar and as easy to use in Java? I find the DOM model a bit cumbersome and find myself writing much more code than I would like to do simple things.
Or am I asking the wrong thing?
Maybe my question is: Is there a better option than the "XMLUtils" classes I see people implementing in some places to simplify their code when dealing with DOM?
Adding a little bit here about why I like ElementTree, since the question was asked:
Simplicity (I guess anything seems simple after working with DOM, though)
Feels like a natural fit in Python
Requires very little code on my part
I'm trying to come up with a simple code example to illustrate, but it's sort of hard to give a good example.
Here's an attempt though. This just adds a tag with a value and an attribute to an existing xml string.
from xml.etree.ElementTree import fromstring, tostring, SubElement
xml_string = '<top><sub a="x"></sub></top>'
parsed = fromstring(xml_string)
se = SubElement(parsed, "tag")
se.text = "value"
se.attrib["a"] = "x"
new_xml_string = tostring(parsed)
After that, the new_xml_string is
<top><sub a="x" /><tag a="x">value</tag></top>
Not an example that really covers everything, but still. There's also the fairly simple looping over tags when you want to do stuff, easy testing for presence of tags and attributes and other things.
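For comparison, here is a rough sketch of the same change using only the JDK's built-in DOM and JAXP classes. The class and method names (DomExample, addTag) are made up for illustration; the calls themselves are standard JAXP.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomExample {
    // Parses the string, appends <tag a="x">value</tag> to the root, and serializes again.
    static String addTag(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        Element tag = doc.createElement("tag");
        tag.setTextContent("value");
        tag.setAttribute("a", "x");
        doc.getDocumentElement().appendChild(tag);

        // An identity transform is the standard JAXP way to turn a DOM back into text.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(addTag("<top><sub a=\"x\"></sub></top>"));
    }
}
```

Most of the extra code is in parsing from and serializing to a string, which ElementTree covers with fromstring and tostring.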

To be honest, all XML APIs in Java suck; you can only vary the level of suckage you subject yourself to, which ranges from horrible/slow through manageable/decent to even surprisingly OK at times.
This mostly stems from the fact that Java APIs try to be as W3C DOM compliant as possible; in fact, Xerces (Java's current native XML solution) prides itself on being compliant with a whole bunch of XML-related W3C specifications, as you can see from their front page.
The actual Xerces API is very unpleasant to work with, though, and because of that multiple other Java XML libraries have popped up over the years. Currently the most popular ones are:
JDOM, which simplifies DOM operations a lot and is, dare I say, even pleasant at times; it works like a charm when mixed with Jaxen - well, unless you hit this problem with namespaces.
XOM, which has a wonderful presentation about what's wrong with Java's XML right now and how they propose their way of doing things as a solution. In part it is actually better than JDOM, but it's not widespread enough yet, so I can't really say how it behaves in the real world out there. Definitely worth a check though.
dom4j, well-rounded library, supports all kinds of important features and plays out as a down-to-earth solution for XML. dom4j is basically the "old, proven and reliable" option of the popular ones.
Last but definitely not least, I just have to mention StAX because it's different: it's actually an event-based streaming API for XML, where your code pulls events from the parser instead of building a tree. Definitely worth a look just out of curiosity.
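To give an idea of what the StAX style looks like, here is a minimal sketch that uses the JDK's javax.xml.stream classes to collect element names while streaming through a document. The class and method names (StaxReadExample, elementNames) are just for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StaxReadExample {
    // Returns the local names of all elements in document order.
    static List<String> elementNames(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        // The caller drives the parser: each next() call pulls one event.
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                names.add(reader.getLocalName());
            }
        }
        reader.close();
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNames("<top><sub a=\"x\"/><tag a=\"x\">value</tag></top>"));
    }
}
```

Nothing is kept in memory beyond what the loop chooses to store, which is the main difference from the tree-based libraries listed above.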
PS. I'm currently actually writing my own XML parser/navigator as an exercise but haven't decided on what kind of API it will have. I'm really aiming for ease of use which seems to be quite rare in Java XML APIs so far, but I'm not entirely sure what kind of API I am going to provide. Python's ElementTree seems interesting, but since I'm not entirely familiar with it, would you like to maybe give a short summary on what exactly in it you find enjoyable?

You might look into the following alternatives:
dom4j
xom
jdom
Since I never used ElementTree I don't know which one is the closest.
If you can use Groovy inside your project, it offers a set of classes that help a lot when processing XML.

We find XOM (http://www.xom.nu) to provide simple subclassable Element functionality.

It is true the Java XML APIs are not the greatest in terms of usability. My preferred options would be XOM, JDOM, then the built-in JAXP, in that order. There were some rumblings about native XML in the language ("Integrating XML into the Java Programming Language") as a new data type, but that seems to have stalled.


Is it practical to combine XML Schema and an XML-to-JSON conversion?

I have to specify a JSON data structure; that data structure will be part of an interface description, the data will be processed by JavaScript. JSON is set for the data transmission. In other projects, where we used XML instead of JSON, I have used rich XML schemas for this. Unfortunately, I cannot do that now.
I did some researching and found JSON Schema.
However, this is still draft status, which makes me feel a bit uneasy to use it in this context.
I also came across this question discussing how to map XML to JSON. There seems to be a standard (?) conversion in the XML class in the org.json namespace. It appears that the conversion is rather straightforward for XML documents without mixed content.
So the idea is to use XML Schema to describe the data structure, use our existing XML processing (editing, transformation, validation, ...) tools as long as possible on the server side and convert the XML DOM to JSON just before delivering the data to the JSON consumer.
Data transmission is one-way only and we would not have mixed-content XML.
Maybe someone has tried this before? Would that be a practical approach in the sense that the semantics of the XML Schema are still clear enough for the client-side programmers when (conceptually) applied to the JSON document? Are there any particular pitfalls to be aware of?
If I understood your idea right, you want to use XML Schema as the primary model for your data exchange - for XML as well as JSON formats.
This idea has two parts:
Use single source to model all the data exchange.
Use XML Schema as this single source.
Single source model
The first idea brings you to MDD (Model-Driven Development) or MDA (Model-Driven Architecture) which had a hype around 2002-2005. It was UML-heavy, vendor-driven hype, but quite a few reasonable things (like AndroMDA) survived.
Generally, MDA is a good idea. It works splendidly as long as you do "standard" things. But it can be a nightmare if you want to "customize".
In your case, I would definitely say that single-source model makes sense. This is about data exchange. In the core this can be reduced to very simple models which are still powerful enough to express everything you need.
JSON is an example of this. JSON is even simpler than XML but still powerful enough. It clearly shows that as long as you have basic primitive types, objects, arrays and nesting, you can express almost anything.
This "single source model" need not necessarily be UML; it can be anything powerful enough to cover all the underlying requirements.
The main problem with a "single source model" is customizing. You know, 90% works very well OOTB, but then in 10% you don't get the result you want and have to customize, and then the effort gets you. Most of the generation tools have some kind of "plugins". So if you fit in the 90%, you're lucky; otherwise you may need to get to know the hairy internals of the generation tools.
To sum up, a single-source model is a good idea as long as it serves all the needs AND the effort to adapt/apply it for the required scenarios is not greater than making it from scratch.
XML Schema as the model
The next question is whether XML Schema is good as the single source model.
You have probably heard of or used JAXB, which has a schema compiler (XJC). This compiler can take your XML Schema and generate Java classes with JAXB annotations. These classes can then be used to unmarshal XML into Java objects or marshal these objects to XML.
And to JSON:
JAXB Mapping to JSON
Looks like you can also produce a JSON Schema from these classes (haven't tried it myself though):
How to generate JSON schema from a JAXB annotated class?
So XML Schema-first approach works. You can call it schema-driven development (I, hereby, claim the copyright on this term).
I personally did a lot of things schema-first and wrote a number of tools/plugins for XJC. For instance:
Hyperjaxb makes schema-derived classes persistable with JPA.
Jsonix is basically a JAXB port for pure JavaScript.
My experience is that you can do a lot of things schema-first, but I also have to say that XML Schema is good but not the best or simplest model. The specification is complex, and if you take a look at the schema-derived classes you can spot a few constructs which don't fit well with Java beans and properties. For instance, @XmlElementRef is a complex and often weird-looking construct, which is still necessary to cover quite a number of cases you can easily express in XML Schema. In all the tools I wrote, I always had to fight with cases, corner cases, and corner cases of corner cases of such constructs.
XML Schema, if you keep it simple and neat, may be beautiful. Maps perfect to beans and properties, easy to understand and work with, a lot of tool support. So XML Schema is not the worst choice to model or specify data exchange.
But it can also get as complex as hell. I have seen a lot of overengineered schemas, which are then extremely hard to work with, for very little gain. Sometimes schema designers just don't know XML Schema well enough; sometimes they know it too well. Last time I helped to work out "XML Schema design best practices", we landed on a 60-something-page document of dos and don'ts. So it's easy to get XML Schemas wrong.
But still, as I said above, if it's kept simple and clean it may be beautiful.
What are the alternatives?
Well, you may actually use your Java code as your model source. Annotated POJOs are expressive and versatile enough, but still quite simple to work with. You are not schema-first then, you're Java code-first, but you can still do all the same tricks. You can generate an XML Schema based on your annotated classes. You can do persistence (and much more) with MOXy. You can do JSON just as well.
To sum up and answer your question:
Yes, it is practical, and is known to work fairly well.
Along with the schema-first approach also consider Java-first approach.
You have tools to get XML-Objects-JSON-Persistence.
There are pitfalls (see above).
Hope this helps.
Since no one has answered this question so far and we have started to follow this approach, I'll quickly summarize: for us the approach generally works quite well. We have designed a very rich XML Schema that serves as part of the contract between the server and the web client. The JSON follows the XML one-to-one, so the XML Schema reads naturally for the JSON document, too.
The only minor problem we noticed is that the canonical XML-to-JSON transformation that we use (which is not Schema-aware) creates a single object when there is just one child element somewhere in the tree, even when the XML Schema allows many occurrences of that element (maxOccurs="unbounded"). This means that the programmers have to handle some polymorphism between single object values and collections on the JSON side.
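One way to contain that polymorphism is a small normalizing helper at the point where repeated elements are read. The sketch below is in Java for illustration only and assumes the JSON has already been parsed into plain java.util values (a List for arrays, anything else for a single value); the names JsonValues and asList are hypothetical. The same idea carries over to a few lines of JavaScript on the client.

```java
import java.util.Collections;
import java.util.List;

class JsonValues {
    // Hypothetical helper: always hands back a list, whatever the converter produced.
    // ASSUMPTION: arrays arrive as java.util.List, single children as any other object.
    @SuppressWarnings("unchecked")
    static List<Object> asList(Object value) {
        if (value == null) {
            return Collections.emptyList();        // element was absent
        }
        if (value instanceof List) {
            return (List<Object>) value;           // two or more children
        }
        return Collections.singletonList(value);   // exactly one child
    }
}
```

Callers can then iterate over asList(parent.get("child")) without caring how many children were present in the XML.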

grammar compiler compiler for Java

My company is trying to write some software for Android. We would like to work with Java, and there is a component of the company's software that is C++ and so needs to be ported (or at least porting needs to be tried before trying NDK stuff). This code was created using Accent, and it defines a grammar for grammars. As near as I can tell, the original writer (now gone) wrote a grammar to specify how to specify a grammar, then compiled a compiler-compiler with that grammar and Accent. The compiler-compiler takes a grammar of the specified format and produces binary code to parse strings conforming to that grammar. Here's an example snippet of the grammar:
//include rules from this file (such as <alpha>)
include "alphabet.bnf"
<<topSymbol>> = <alpha> <alpha> <alpha>? .//two letters with an optional third
//square brackets enclose an XML statement clarifying semantics of the rule
[
<topSymbol>
<letter>
<command val="doSomethingToLetter"/>
</letter>
<!--etc.-->
</topSymbol>
]
My question is how to do this with Java, using Antlr or some other tool. A compiler-compiler-compiler seems rather complicated to me. Alternatively, I would like to know how to easily compile/parse this type of grammar, which contains both grammatical and semantic (XML) information.
If the original designer knew what he was doing, and it is warranted, then you want to preserve that concept. Going with another parser generator (or at least a parsing scheme of some kind) is the right approach. Either JavaCC or ANTLR would be fine as parser generators; you'll have to hand-translate the grammar. You might hand code a recursive descent parser if the grammar is simple enough.
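As an illustration of the hand-coded route, here is a minimal recursive descent sketch for the single rule shown in the question (two letters with an optional third). It assumes that <alpha> means one ASCII letter, which is a guess since alphabet.bnf is not shown, and it ignores the XML semantic block entirely. The class and method names are made up.

```java
class TopSymbolParser {
    private final String input;
    private int pos = 0;

    TopSymbolParser(String input) {
        this.input = input;
    }

    // <alpha>: ASSUMED to be a single ASCII letter; the real rule lives in alphabet.bnf.
    private boolean alpha() {
        if (pos < input.length()) {
            char c = input.charAt(pos);
            if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) {
                pos++;
                return true;
            }
        }
        return false;
    }

    // <<topSymbol>> = <alpha> <alpha> <alpha>? .
    boolean topSymbol() {
        if (!alpha() || !alpha()) {
            return false;              // the first two letters are mandatory
        }
        alpha();                       // the third letter is optional
        return pos == input.length();  // nothing may follow
    }

    static boolean matches(String s) {
        return new TopSymbolParser(s).topSymbol();
    }

    public static void main(String[] args) {
        System.out.println(matches("ab"));
        System.out.println(matches("abcd"));
    }
}
```

Each grammar rule becomes one method, which is why this only stays manageable for small grammars; for anything larger a generator such as ANTLR or JavaCC does this bookkeeping for you.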
If the original designer was simply over the top, then you can probably replace the grammar-driven aspect, but you won't be able to do that without understanding what he was achieving. The fact that this "seems rather complicated to me" suggests you don't really understand parsing/parser generator technology, and that you are driven by a desire to do something you understand rather than preserve something you don't. But it's a bad idea to tear apart something that is well designed/implemented just because you don't understand it. I strongly suggest you learn more about these kinds of technologies, and ask why it was implemented this way. Ultimately you may be right and should replace his approach with something else, but make that choice based on knowledge, not fear.
My question is how to do this with Java, using Antlr or some other tool. A compiler-compiler-compiler seems rather complicated to me.
It sounds complicated to me too!
Alternatively, I would like to know how to easily compile/parse this type of grammar, which contains both grammatical and semantic (XML) information.
No ... there is no easy answer to this. It sounds like your ex-colleague has gone over the top on the complexity front. You are going to have to:
either get your head around what his code does, and how it does it, learn how Antlr works, and hand translate,
or ditch his code AND design and find a simpler way to do what it is doing.
Good luck!
(Actually, there is a good chance that the code is not as complicated as it seems ... once you get your head around it, and compiler-compiler technology.)
Your best bet is to translate the grammar you have into ANTLR or JavaCC or some other tool.
Another possibility is to call your C++ code using JNI, but that's fraught with peril.
I'm not aware of anything that can help. You'll just have to get a shovel and start digging.

Custom Java XMLBuilder vs Standard classes-based

What is the best-performing solution for XML generation?
My goal is to build a few simple XMLs from code. I am going to implement a simple custom StringBuffer-based XML builder. On the other hand, there are several libraries like http://code.google.com/p/java-xmlbuilder/ and http://code.google.com/p/xmltool/ which have a nice DSL but, I guess, lack performance.
Since my goal is to build a simple enough XMLBuilder with great performance, I think I will build a custom solution. It will feature:
Nice Java-based DSL for XML constructs (adding tags basically)
Great StringBuffer based performance.
String data escape handling when adding XML tags.
Auto-indent
Please tell me if I am wrong about my performance expectations and it's probably better to use ready-made libraries.
UPDATE. Why I think the performance of standard xml builders is not very good.
Standard XML builders use DocumentBuilderFactory and work with classes behind the scenes. These classes are also optimized to fit all users; for example, I don't need namespace support etc.
<?xml version="1.0" encoding="utf-8"?>
<root>
  <testdata>value</testdata>
</root>
Consider the very simple XML above. If you build it with standard tools, it involves so much work just to make this simple XML. I think it's better to just generate it myself using String.
UPDATE 2. The performance requirement is that the code should do only as much as is required to generate simple XML and no more.
UPDATE 3. Thanks everyone for the great comments! Now I understand better what I need, and that describing my initial goal with the word "performance" was not quite right. My true goal is to use a simple enough solution with a convenient DSL to describe the XML structure and generate the XML output.
I will use plain Java objects as the DSL for XML and generate XML using the XStream library, which is a pretty straightforward solution.
UPDATE 4. JAXB. I discussed XStream vs JAXB and found that JAXB is faster than XStream. Plus, I already use JAXB in my project and I like its standard annotations. I have changed my mind and will go with JAXB for now, because XStream was originally developed heavily at a time when JAXB was not as good as it is today.
I will suggest something very controversial but still ...
Do profiling and performance tests with both libraries.
If you don't have time for that, assuming something is slow would be the wrong choice in my opinion.
Because if it turns out that it actually is not slow, it would save you a lot of time to use an already built and supported library/framework.
Another thought.
You will need to test your completed high performance solution against the solutions already available anyway, to check if it is really high performance. So I would strongly suggest measuring the performance of the libraries available before starting your own.
Regarding:
Standard XML builders use DocumentBuilderFactory and work with classes behind the scenes. These classes are also optimized to fit all users; for example, I don't need namespace support etc.
An alternative to DOM is StAX (JSR-173). It is a Streaming API for XML that is quite fast. There are several implementations; I have found Woodstox to be quite performant.
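As a rough sketch, the small document from the question could be written with the JDK's javax.xml.stream API like this. The class and method names (StaxWriteExample, build) are illustrative; with Woodstox on the classpath the same code should pick up that implementation through XMLOutputFactory.newInstance().

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

class StaxWriteExample {
    // Writes <root><testdata>value</testdata></root> without building a tree first.
    static String build(String value) throws Exception {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartDocument();
        writer.writeStartElement("root");
        writer.writeStartElement("testdata");
        writer.writeCharacters(value);   // the writer takes care of escaping < and &
        writer.writeEndElement();
        writer.writeEndElement();
        writer.writeEndDocument();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(build("a < b"));
    }
}
```

There is no DocumentBuilderFactory and no in-memory document here; each call writes straight to the underlying Writer.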
There is also Groovy's powerful and flexible NodeBuilder (http://groovy.codehaus.org/GroovyMarkup).
import groovy.xml.XmlUtil

def root = new NodeBuilder().people(kind: 'folks', groovy: true) {
    person(x: 123, name: 'James', cheese: 'edam') {
        project(name: 'groovy')
        project(name: 'geronimo')
    }
    person(x: 234, name: 'bob', cheese: 'cheddar') {
        project(name: 'groovy')
        project(name: 'drools')
    }
}
XmlUtil.serialize(root, System.out)
This results in an XML document:
<?xml version="1.0" encoding="UTF-8"?>
<people kind="folks" groovy="true">
  <person x="123" name="James" cheese="edam">
    <project name="groovy"/>
    <project name="geronimo"/>
  </person>
  <person x="234" name="bob" cheese="cheddar">
    <project name="groovy"/>
    <project name="drools"/>
  </person>
</people>
One more high-performance suggestion: use StaxMate -- it is as fast as the underlying Stax-based XML writer, which is rather fast (40 - 80 megabytes per second, sustained). Just make sure you do NOT use the default JDK 6 Stax implementation (Sun sjsxp) but something faster like Woodstox or Aalto.
I would strongly recommend against writing your own XML writer; it is typically risky (good chance you will forget some part of escaping) as others have mentioned, and not all that likely to be faster than existing efficient solutions (not all existing solutions are efficient; you do need to find ones that are). And in the end... unless you really want to write these things, why not work on something more interesting and meaningful?
But if you do want to do something above and beyond existing writers you could consider using a simple writer and augmenting it with additional functionality that you need. For example, if you just use Stax XMLStreamWriter as base, it is quite easy to add simple but efficient abstractions. Or if you like existing packages, see if you can suggest improvements to their authors (or even code contributions).
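To illustrate what such a thin abstraction over XMLStreamWriter might look like, here is a small fluent wrapper. It is only a sketch: the class SimpleXml and its method names are invented for this example and do not come from StaxMate or any other library.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical fluent wrapper; none of these method names come from an existing library.
class SimpleXml {
    private final StringWriter out = new StringWriter();
    private final XMLStreamWriter writer;

    SimpleXml() throws XMLStreamException {
        writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
    }

    SimpleXml open(String name) throws XMLStreamException {
        writer.writeStartElement(name);
        return this;
    }

    SimpleXml attr(String name, String value) throws XMLStreamException {
        writer.writeAttribute(name, value);   // must follow open(), before any text
        return this;
    }

    SimpleXml text(String value) throws XMLStreamException {
        writer.writeCharacters(value);        // escaping is left to the underlying writer
        return this;
    }

    SimpleXml end() throws XMLStreamException {
        writer.writeEndElement();
        return this;
    }

    String finish() throws XMLStreamException {
        writer.writeEndDocument();            // closes any elements still open
        writer.close();
        return out.toString();
    }
}
```

A call chain such as new SimpleXml().open("root").open("testdata").text("value").end().end().finish() then reads close to a DSL, while escaping and well-formedness stay the responsibility of the Stax writer underneath.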

Evaluating creation of GUI via file vs coding

I'm working on a utility that will be used to test the project I'm currently working on. The utility will let the user provide various inputs, send out requests, and show the responses as output.
However, at this point the exact format (which input is required and what is optional) has yet to be fleshed out. In addition, coding in Swing is somewhat repetitive since the overall work is simple though this should be the safest route to go as I have more or less full control and every component can be tweaked as I want. I'm considering using a configuration file that's in XML to describe the GUI (at least one part of it) and then coding the event handling part (in addition to validation, etc). The GUI itself shouldn't be too complicated. For each type of request to make there's a tab for the request and within each tab are various inputs.
There seem to be quite a few questions about this already, but I'm not asking for a 3rd party library to do this. I'm looking to do this myself, since I don't think it'll be overly complicated (hopefully). My main considerations for using this are re-usability (later on, for other projects) and simplifying the GUI work. My question is: are there other pros/cons that I'm overlooking? Is it worth the (unknown) time to do this?
I've built GUIs in VB.NET and with Flex3 before.
XML is so 2000. It's code, put it in real source files. If it really is so simple that it could be XML, all you are doing is removing the XML handling step and using a clearer syntax. If it turns out to be a little more complicated than you first expected, then you have the full power of your favourite programming language to hand.
In my experience, if your layout really is simple, something like the non-visual builders in FormLayout can lead to really concise code with a minimum of repetition.
If you have to specify the precise location of every control you might look at a declarative swing helper toolkit that can minimize boilerplate and simplify layout. Groovy supports this as does JavaFX, and both are simple library extensions to Java (give or take).
If the form is laid out in a pattern, using a definition file in a format like XML or YAML will work. I've done that and have even set up data bindings in that file so that you don't even have to deal with listeners or initial values...
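A minimal sketch of the definition-file idea, using only JDK classes: it reads a small XML description and turns each entry into a label plus text field. The XML vocabulary (form, input, name, label) and the class FormFromXml are invented for this example; data bindings and validation are left out.

```java
import java.awt.GridLayout;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.swing.JLabel;
import javax.swing.JPanel;
import javax.swing.JTextField;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FormFromXml {
    // Reads <input name="..." label="..."/> elements into a name -> label map,
    // keeping document order. These element and attribute names are invented
    // for this sketch, not taken from any existing format.
    static Map<String, String> readInputs(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        NodeList inputs = root.getElementsByTagName("input");
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < inputs.getLength(); i++) {
            Element input = (Element) inputs.item(i);
            result.put(input.getAttribute("name"), input.getAttribute("label"));
        }
        return result;
    }

    // Builds one "label + text field" row per input.
    static JPanel buildPanel(Map<String, String> inputs) {
        JPanel panel = new JPanel(new GridLayout(0, 2));
        for (Map.Entry<String, String> entry : inputs.entrySet()) {
            JTextField field = new JTextField(20);
            field.setName(entry.getKey());   // lets event-handling code look the field up by name
            panel.add(new JLabel(entry.getValue()));
            panel.add(field);
        }
        return panel;
    }
}
```

Keeping the parsing separate from the Swing construction means the definition format can be tested, or swapped for YAML, without touching the GUI code.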
If you are sure you want XML, I'd seriously consider YAML instead; it's really close, but instead of:
<outer>
  <inner a="1"> abc </inner>
</outer>
I think it's a lot more like:
outer
inner a=1
abc
(I may have that a bit wrong, but that's close I think. Anyway, you should never force anyone to type XML--if you are set on XML, provide a GUI with which to edit it!)

Class libraries for Java's immense verbosity

I recently got into Java. I have a background in dynamic languages and I'm finally figuring out why people complain about Java's verbosity. Are there any class libraries out there that address this issue? I'd much rather type something like String text = someClass.stdin() instead of the 8 or so lines it takes to get user input in Java.
In Java 5:
import java.util.Scanner;
...
System.out.print("Enter your name: ");
String userName = new Scanner(System.in).nextLine();
Or, in Java 6:
String userName = System.console().readLine("Enter your name: ");
Some of the Apache Commons libraries (particularly Lang, IO and Collections) are designed to hide the verbosity of certain core Java APIs. The verbosity of the Java language, however, we're all stuck with.
Sure, there are several: Jython (formerly JPython), JRuby, Clojure, Scala...
Google has also released a number of libraries that complement sections of the standard library, like the collections library. Guice is also a nice lightweight DI framework that, IMHO, is easier to learn than Spring.
The standard library is so large I don't think you'll find a single library that replaces everything. Your best bet is to look for libraries that solve individual problems (e.g. I don't like the Collections API, I need an object pool, etc.)
I'd be interested in seeing these 8 lines to get user input in Java.
I personally think that Java's verbosity becomes an asset as your program becomes larger. Unlike C and C++, everything is done in a more object oriented way. You get the object representing your output, then you issue an operation on it, and so on. Much easier to understand and maintain in the long run.
Is this as quick as a nice printf() here and there? No. Is it as convenient as scripting in Python? Of course not. But that's part of the cost of using a language like Java, just like the lack of Lambdas is annoying.
As an engineer your role is to pick the best tool for the job. I do most of my coding in Java, and some in Python, accepting the tradeoffs of each.
While you can't change the language, you could use libraries that simplify some operations (e.g., Google's or Apache's IO libraries). You could also write your own classes for the things that annoy you the most.
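For the input case from the question, such a helper class could be as small as the sketch below. The class name Stdin and its methods are made up; it simply wraps the standard BufferedReader so that callers get a one-liner.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;

// Hypothetical convenience class, not part of any library.
final class Stdin {
    private static final BufferedReader IN =
            new BufferedReader(new InputStreamReader(System.in));

    private Stdin() {
    }

    // Prints the prompt and returns one line of input (null at end of input).
    static String ask(String prompt) {
        System.out.print(prompt);
        return readLine(IN);
    }

    // Reads one line and converts the checked IOException into an unchecked one.
    static String readLine(BufferedReader reader) {
        try {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With that in place, user code shrinks to String text = Stdin.ask("Enter your name: "); which is close to what the question asks for.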
I also think you're confusing the verbosity of the language and of the standard library. The library contains a lot of stuff, most of it you'll never need. I find the existing division fairly straightforward and have never found myself in areas I didn't care about.
If you really can't stand Java, you might want to use hybrid languages like Scala.
I'm a big fan of leaning on my IDE's live templating features. (IntelliJ IDEA) I can't remember the last time I spelled out StringBuffer or System.out.println("...").
