I'm using xerces SAX to parse this XML:
<product id="123">
<sku>abc123</sku>
<location>
<warehouse>1</warehouse>
<level>3</level>
</location>
<details>
<weight unit="kg">150</weight>
<mfg>honda</mfg>
</details>
</product>
So I created my own class named ProductSAXHandler.java:
public class ProductSAXHandler extends DefaultHandler {
}
Now from my Spring MVC controller I plan to pass my XML file as a string, and I want a Product object returned.
So inside my ProductSAXHandler I will create my startElement, endElement methods.
So should I just create a method that takes the XML as a string, instantiates this ProductSAXHandler, and then calls parse?
This is my first time dealing with xml/sax so I'm a little unclear how to design this class.
This doesn't directly answer your question, but I'm gonna read a bit between the lines and focus on your actual intention...
Since you want an instance of some Product class that encapsulates the data from the XML, probably in a structured way by preference, you'd do much better to use JAXB for this task. Unless you have really specific requirements regarding the customization of binding XML input to objects, this will turn out a lot simpler than using SAX.
What you'll need to do is:
Get a W3C XML Schema for your XML. If you don't have one and can't obtain one, then there are tools out there that can generate a schema based on input XML. Trang makes this very easy.
Generate Java classes from the schema. For this you can use XJC (the XML-to-Java Compiler) available with Sun's JDK. If you're using build tools like Ant or Maven, there are plugins available. These can automate the process and make it part of a full build.
Use the JAXB API with your generated classes to easily turn XML documents into objects and vice-versa.
Although JAXB will take some time to learn (especially if the desired XML-Java mapping isn't 1-to-1), I think you'll end up saving time in the long run.
Sorry if you really do need SAX and this answer is not applicable, but I figured I'd rather let you know your options before using a somewhat archaic XML processing model. Note that DOM and StAX might also be of interest to you.
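If SAX does turn out to be a requirement, the design asked about in the question could look roughly like the sketch below: a static method takes the XML string, creates the handler, runs the parser, and returns the populated object. The Product class and its field names are assumptions derived from the sample XML, and the sketch uses whatever JAXP parser the JDK provides rather than a separately installed Xerces.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical value class; the field names are assumptions based on the sample XML. */
final class Product {
    String id, sku, warehouse, level, weight, weightUnit, mfg;
}

final class ProductSAXHandler extends DefaultHandler {
    private final Product product = new Product();
    private final StringBuilder text = new StringBuilder();

    /** Parses the XML string and returns the populated Product. */
    static Product parse(String xml) {
        try {
            ProductSAXHandler handler = new ProductSAXHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.product;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse product XML", e);
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start collecting text for the new element
        if ("product".equals(qName)) {
            product.id = atts.getValue("id");
        } else if ("weight".equals(qName)) {
            product.weightUnit = atts.getValue("unit");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so append rather than assign.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        switch (qName) {
            case "sku":       product.sku = value; break;
            case "warehouse": product.warehouse = value; break;
            case "level":     product.level = value; break;
            case "weight":    product.weight = value; break;
            case "mfg":       product.mfg = value; break;
            default:          break; // container elements carry no text of interest
        }
        text.setLength(0);
    }
}
```

From the controller this would be a single call, e.g. `Product p = ProductSAXHandler.parse(xmlString);`. Note that the sample XML has to be well-formed (closing tags written as `</warehouse>`, not `</warehouse/>`), otherwise the parser rejects it.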
I have to specify a JSON data structure; that data structure will be part of an interface description, the data will be processed by JavaScript. JSON is set for the data transmission. In other projects, where we used XML instead of JSON, I have used rich XML schemas for this. Unfortunately, I cannot do that now.
I did some researching and found JSON Schema.
However, this is still draft status, which makes me feel a bit uneasy to use it in this context.
I also came across this question discussing how to map XML to JSON. There seems to be a standard (?) conversion in the XML class in the org.json namespace. It appears that the conversion is rather straight-forward for XML documents without mixed content.
So the idea is to use XML Schema to describe the data structure, use our existing XML processing (editing, transformation, validation, ...) tools as long as possible on the server side and convert the XML DOM to JSON just before delivering the data to the JSON consumer.
Data transmission is one-way only and we would not have mixed-content XML.
Maybe someone has tried this before? Would that be a practical approach in the sense that the semantics of the XML Schema are still clear enough for the client-side programmers when (conceptually) applied to the JSON document? Are there any particular pitfalls to be aware of?
If I understood your idea right, you want to use XML Schema as the primary model for your data exchange - for XML as well as JSON formats.
This idea has two parts:
Use single source to model all the data exchange.
Use XML Schema as this single source.
Single source model
The first idea brings you to MDD (Model-Driven Development) or MDA (Model-Driven Architecture) which had a hype around 2002-2005. It was UML-heavy, vendor-driven hype, but quite a few reasonable things (like AndroMDA) survived.
Generally, MDA is a good idea. It works splendidly as long as you do "standard" things. But it can be a nightmare if you want to "customize".
In your case, I would definitely say that single-source model makes sense. This is about data exchange. In the core this can be reduced to very simple models which are still powerful enough to express everything you need.
JSON is an example of this. JSON is even simpler than XML but still powerful enough. It clearly shows that as long as you have basic primitive types, objects, arrays and nesting you can express almost anything.
This "single source model" need not necessarily be UML; it can be anything powerful enough to cover all the underlying requirements.
The main problem with the "single source model" is customizing. You know, 90% works very well OOTB, but in the remaining 10% you don't get the result you want and have to customize, and then the effort gets you. Most generation tools have some kind of "plugins". So if you fit in the 90%, you're lucky; otherwise you may need to get to know the hairy internals of the generation tools.
To sum up, a single-source model is a good idea as long as it serves all the needs AND the effort to adapt/apply it for the required scenarios is not greater than making it from scratch.
XML Schema as the model
The next question is whether XML Schema is good as the single source model.
You have probably heard or used JAXB which has a schema compiler (XJC). This compiler can take your XML Schema and then generate Java classes with JAXB annotations. These classes can then be used to unmarshal XML into Java objects or marshal these object to XML.
And to JSON:
JAXB Mapping to JSON
Looks like you can also produce a JSON Schema from these classes (haven't tried it myself though):
How to generate JSON schema from a JAXB annotated class?
So XML Schema-first approach works. You can call it schema-driven development (I, hereby, claim the copyright on this term).
I personally did a lot of things schema-first and wrote a number of tools/plugins for XJC. For instance:
Hyperjaxb makes schema-derived classes persistable with JPA.
Jsonix is basically a JAXB port for pure JavaScript.
My experience is that you can do a lot of things schema-first, but I also have to say that XML Schema is good but not the best or simplest model. The specification is complex, and if you take a look at the schema-derived classes you can spot a few constructs which don't fit well into Java beans and properties. For instance, @XmlElementRef is a complex and often weird-looking construct which is still necessary to cover quite a number of cases you can easily express in XML Schema. In all the tools I wrote I always had to fight with cases, corner cases, and corner cases of corner cases of such constructs.
XML Schema, if you keep it simple and neat, may be beautiful. Maps perfect to beans and properties, easy to understand and work with, a lot of tool support. So XML Schema is not the worst choice to model or specify data exchange.
But it can also get as complex as hell. I saw a lot of overengineered schemas, which are then extremely hard to work with - for very little gain. Sometimes schema designers just don't know XML Schema well enough, sometimes they know it too well. Last time I helped to work out "XML Schema design best practices", we ended up with a 60+ page document of dos and don'ts. So it's easy to get XML Schemas wrong.
But still, as I said above, if it's kept simple and clean it may be beautiful.
What are the alternatives?
Well, you may actually use your Java code as your model source. Annotated POJOs are expressive and versatile enough, but still quite simple to work with. You are not schema-first then, you're Java code-first, but you can still do all the same tricks. You can generate an XML Schema based on your annotated classes. You can do persistence (and much more) with MOXy. You can do JSON just as well.
To sum up and answer your question:
Yes, it is practical, and is known to work fairly well.
Along with the schema-first approach also consider Java-first approach.
You have tools to get XML-Objects-JSON-Persistence.
There are pitfalls (see above).
Hope this helps.
Since no one has answered to this question so far and we have started to follow this approach, I quickly summarize that for us the approach works generally quite well. We have designed a very rich XML Schema, that serves us as part of the contract between the server and the web client. The JSON follows the XML one-to-one, so the XML Schema reads naturally for the JSON document, too.
The only minor problem we noticed is that the canonical XML-to-JSON transformation that we use (which is not Schema-aware) creates a single object when there is just one child element somewhere in the tree, even when the XML Schema has an upperBound of 'many' for that element. This means that the programmers have to handle some polymorphism between object-values and collections here on the JSON side.
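One way to contain that polymorphism is to normalize the value once, right after parsing, so the rest of the consumer code always sees a list. Our real consumers are JavaScript; the sketch below only illustrates the idea in Java, assuming a JSON parser that hands back a java.util.List for arrays and some other object for a single value. The class and method names are made up for this example.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical helper for consumers of non-schema-aware XML-to-JSON output. */
final class JsonShapes {

    /**
     * Normalizes a parsed JSON value for an element that the schema allows to repeat.
     * The converter emits nothing for zero occurrences, a single value for one
     * occurrence, and an array for several; callers always get a list back.
     */
    static List<?> asList(Object value) {
        if (value == null) {
            return Collections.emptyList();        // element absent
        }
        if (value instanceof List) {
            return (List<?>) value;                // already an array
        }
        return Collections.singletonList(value);   // single object or scalar
    }
}
```

With such a helper the client code can iterate over `asList(parent.get("item"))` without caring how many child elements the XML contained.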
I am using Java and want to read strings from an XML tag, e.g. <blank type="Something">
I need to be able to assign "Something" to a variable. Any ideas?
There are a lot of ways to do this:
You can use the XML APIs provided with Java (SAX or STAX or DOM).
There are libraries that build on the XML APIs (JDOM, DOM4J, or XOM) which are easier to use than the raw APIs.
There's Java-XML databinding, described in Pratik's answer. Java-XML databinding is sometimes overkill, depending on your requirements, and when there are errors they can be hard to figure out. Sometimes it's worthwhile, though. I think JiBX is particularly interesting.
If you don't know where to start, start with XOM. XOM was created by a JDOM contributor, it was designed to be easy to use.
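For the narrow case in the question, the DOM API that ships with the JDK is already enough. The following is a minimal sketch, assuming the tag is the root element of a well-formed document; the class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical helper; reads one attribute from the root element of an XML string. */
final class AttributeReader {

    static String rootAttribute(String xml, String attributeName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // getAttribute returns an empty string when the attribute is absent.
            return doc.getDocumentElement().getAttribute(attributeName);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String type = rootAttribute("<blank type=\"Something\"/>", "type");
        System.out.println(type);
    }
}
```

If the tag is nested deeper in the document, `getElementsByTagName` on the Document can locate it first; for large files a streaming API such as SAX or StAX avoids loading the whole tree.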
What you want to achieve is referred to as Unmarshalling an XML. Unmarshalling means extracting data from an XML document and using it to construct a Java object or graph of objects. There are various APIs available to do the same. You should have a look at the following links:
JAXB:
http://www.oracle.com/technetwork/articles/javase/index-140168.html
Castor: http://castor.codehaus.org/
JiBX: http://jibx.sourceforge.net/
I'm working on an existing system that's generating XML for a legacy system using a simple template language. This is obviously not ideal because it's difficult to see the structure of the generated XML, it suffers from escaping problems and it's easy to generate invalid XML.
For any sane XML format I'd just use XStream or another Java XML serializing library, but this legacy system has a lot of strange rules like "this node should be excluded if the value is less than ten" and "the formatting of the date in node x depends on the value of node y". There are other strange rules as well, but this should be enough to get the idea.
As I've said, the template approach is far from ideal, but it's pragmatic and works (with some effort). Is there a better way to approach generating XML for legacy systems with this amount of formatting rules? XSL has crossed my mind, but implementing any amount of logic in XSL is frankly not very tempting.
Basically you need some custom logic during serialization. I am guessing that the in-memory object structure is not directly mirrored in the XML structure? Alternatives:
Use StAX and distribute read and write methods within the objects.
Use JAXB and insert custom serialization.
Don't even think of expressing your custom logic in anything other than java, i.e. some "super" framework.
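As a rough illustration of the StAX alternative, each object can write itself to an XMLStreamWriter and apply the legacy rules in plain Java. The LegacyOrder class, its fields, and the element names are invented for this sketch; only the "omit the node if the value is less than ten" rule comes from the question.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical domain object; class, field, and element names are illustrative only. */
final class LegacyOrder {
    private final String name;
    private final int quantity;

    LegacyOrder(String name, int quantity) {
        this.name = name;
        this.quantity = quantity;
    }

    /** The object writes itself, so the legacy rules live next to the data. */
    void writeTo(XMLStreamWriter out) throws XMLStreamException {
        out.writeStartElement("order");

        out.writeStartElement("name");
        out.writeCharacters(name); // the writer escapes &, < and > for us
        out.writeEndElement();

        // Legacy rule from the question: exclude the node if the value is less than ten.
        if (quantity >= 10) {
            out.writeStartElement("quantity");
            out.writeCharacters(Integer.toString(quantity));
            out.writeEndElement();
        }

        out.writeEndElement();
    }

    /** Convenience wrapper that serializes this object to a string. */
    String toXml() {
        try {
            StringWriter buffer = new StringWriter();
            XMLStreamWriter out = XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);
            writeTo(out);
            out.flush();
            out.close();
            return buffer.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not serialize order", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(new LegacyOrder("A & B", 5).toXml());
        System.out.println(new LegacyOrder("Widget", 12).toXml());
    }
}
```

Compared with the template approach, the writer takes care of escaping and keeps the elements properly nested, while rules such as a date format that depends on another node become ordinary if statements.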
I am not sure, if this is what you are looking for, but maybe try XML Binding like JAXB...
In other words: you could generate a class library from your xsd-Schema and then build your object graph in java code, then serialize it in one call to xml.
You could use simple xml and some converters I think:
http://simple.sourceforge.net/download/stream/doc/tutorial/tutorial.php
There is a guy here swearing that JAXB is the greatest thing since sliced bread. I am curious to see what Stack Overflow users think the use case is for JAXB and what makes it a good or a bad solution for that case.
I'm a big fan of JAXB for manipulating XML. Basically, it provides a solution to this problem (I'm assuming familiarity with XML, Java data structures, and XML Schemas):
Working with XML is difficult. One needs a way to take an XML file - which is basically a text file - and convert it into some sort of data structure, which your program can then manipulate.
JAXB will take an XML Schema that you write and create a set of classes that correspond to that schema. The JAXB utilities will create the hierarchy of data structures for manipulating that XML.
JAXB can then be used to read an XML file, and then create instances of the generated classes - laden with the data from your XML. JAXB also does the reverse: takes java classes, and generates the corresponding XML.
I like JAXB because it is easy to use, and comes with Java 1.6 (if you are using 1.5, you can download the JAXB .jars.) The way it creates the class hierarchy is intuitive, and in my experience, does a decent job abstracting away the "XML" so that I can focus on "data".
So to answer your question: I would expect that, for small XML files, JAXB might be overkill. It requires you to create and maintain an XML schema, and to use "standard textbook methods" of utilizing Java classes for data structures. (Main classes, small inner-classes to represent "nodes", and a huge hierarchy of them.) So, JAXB is probably not that great for a simple linear list of "preferences" for an application.
But if you have a rather complex XML schema, and lots of data contained within it, then JAXB is fantastic. In my project, I was converting large amounts of data between binary (which was consumed by a C program) and XML (so that humans could consume and modify that data). The resulting XML Schema was nontrivial (many levels of hierarchy, some fields could be repeated, others could not) so JAXB was helpful in being able to manipulate that.
Here's a reason not to use it: performance suffers. There is a good deal of overhead when marshaling and unmarshaling. You might also want to consider another API for XML-Object binding -- such as JiBX:
http://jibx.sourceforge.net/
I use JAXB at work all the time and I really love it. It's perfect for complex XML schemas that are always changing and especially good for random access of tags in an XML file.
I hate to pimp but I just started a blog and this is literally the first thing I posted about!
Check it out here:
http://arthur.gonigberg.com/2010/04/21/getting-started-with-jaxb/
It's an "ORM for XML". Most often used alongside JAX-WS (and indeed the Sun implementations are developed together) for WS Death Star systems.
With JAXB you can automatically create XML representations of your objects (marshalling) and object representations of the XML (unmarshalling).
As far as the XML Schema is concerned, you have two choices:
Generate Java classes from an XSD
Generate an XSD from your Java classes
There are also some simpler XML serialization libraries like XStream, Digester or XMLBeans that might be alternatives.
JAXB is great if you have to code to some external XML spec defined as an XML schema (xsd).
For example, you have a trading application and you must report the trades to the Uber Lame Trade Reporting App and they've given you ultra.xsd to be getting on with. Use the $JAVA_HOME/bin/xjc compiler to turn the XSD into a bunch of Java classes (e.g. UltraTrade).
Then you can just write a simple adapter layer to convert your trade objects to UltraTrades and use the JAXB to marshal the data across to Ultra-Corp. Much easier than messing about converting your trades into their XML format.
Where it all breaks down is when Ultra-Corp haven't actually obeyed their own spec, and the trade price which they have down as a xsd:float should actually be expressed as a double!
Why we need JAXB?
The remote components (written in Java) of web services use XML as a means to exchange messages with each other. Why XML? Because XML is considered a lightweight option for exchanging messages on networks with limited resources.
So we often need to convert these XML documents into objects and vice versa. E.g., a simple Java POJO Employee can be used to send employee data to a remote component (also a Java program).
class Employee{
String name;
String dept;
....
}
This POJO should be converted (marshalled) into an XML document as follows:
<Employee>
<Name>...</Name>
<Department>...</Department>
</Employee>
And at the remote component, the XML document is converted back into a Java object (unmarshalled).
What is JAXB?
JAXB is a library or a tool to perform this operation of Marshalling and UnMarshalling. It spares you from this headache, as simple as that.
You can also check out JiBX. It is a very good XML data binder that also specializes in OTA (Open Travel Alliance) and is supported by Axis2 servers. If you're looking for performance and compatibility, you can check it out:
http://jibx.sourceforge.net/index.html
JAXB provides improved performance via default marshalling optimizations.
JAXB defines a programmer API for reading and writing Java objects to and from XML documents, thus simplifying the reading and writing of XML via Java.
I need to take any given valid XML schema (XSD) and denormalize it to a simple form containing no refs, no includes, etc. All simple type definitions should be inline, such that when looking at any given element, all declarations are visible without performing another lookup.
I've found some tools that have this built-in, but I need to do it "on the fly." Platform of choice is Java, but I'd be willing to port the code from another language if necessary. I just really don't want to reinvent the wheel here. Searching for OSS libraries from Apache/etc have yielded nothing. The closest I've found is XSOM which supports traversing a schema as an object model, but you still have to handle every possible form that a schema could take to represent a given structure.
The output doesn't have to be actual XML, as it will actually be used in a object model in its final form.
You might find XSD4J helpful:
http://dynvocation.selfip.net/xsd4j/
The EMF XSD model may be helpful:
http://www.eclipse.org/modeling/mdt/?project=xsd
Another useful API for XML Schema is XSOM.
XSOM is used by XJC, the JAXB schema compiler, under the hood, so it is likely to be kept alive.