What is the difference between JAXP and JAXB?
JAXP (Java API for XML Processing) is a rather outdated umbrella term covering the various low-level XML APIs in Java SE, such as DOM, SAX and StAX.
JAXB (Java Architecture for XML Binding) is a specific API (the stuff under javax.xml.bind) that uses annotations to bind XML documents to a Java object model.
JAXP is the Java API for XML Processing, which provides a platform for parsing XML files with DOM or SAX parsers.
JAXB, by contrast, is the Java Architecture for XML Binding, which makes it easier to access XML documents from applications written in the Java programming language.
For example, given a Computer.xml file, accessing the data with JAXP involves the following steps:
Create a SAX parser or a DOM parser and parse the data. DOM can be
memory intensive if the document is large. A SAX parser instead
reads the document from the beginning and, when it encounters
something significant (in SAX terms, an "event") such as the start of
an XML tag or the text inside a tag, makes that data available to the
calling application.
Then create a content handler that defines the methods the parser
calls when it encounters an event. These methods, known as callback
methods, take the appropriate action on the data they receive.
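The two JAXP steps above can be sketched roughly as follows. This is only a minimal illustration: the layout of Computer.xml is not shown in the answer, so the computers/computer/name element names and the ComputerSaxDemo class are assumptions made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class ComputerSaxDemo {

    // Content handler: the parser calls these methods back for each event it meets.
    static class NameHandler extends DefaultHandler {
        final List<String> names = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private boolean inName;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            // "name" is a hypothetical element; adapt it to the real document.
            if ("name".equals(qName)) {
                inName = true;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times for one text node, so accumulate.
            if (inName) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("name".equals(qName)) {
                names.add(text.toString().trim());
                inName = false;
            }
        }
    }

    // Step 1: create the SAX parser. Step 2: hand it the content handler.
    static List<String> computerNames(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            NameHandler handler = new NameHandler();
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.names;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<computers><computer><name>Alpha</name></computer>"
                   + "<computer><name>Beta</name></computer></computers>";
        System.out.println(computerNames(xml)); // prints [Alpha, Beta]
    }
}
```

Only the names are kept; everything else in the document is seen by the handler and then dropped, which is why SAX stays light on memory.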
To perform the same operations with JAXB, the following steps are needed to access Computer.xml:
Bind the schema for the XML document.
Unmarshal the document into Java content objects. The Java content objects represent the content and organization of the XML document, and are directly available to your program.
After unmarshalling, your program can access and display the data in the XML document simply by reading it from the Java content objects. There is no need to create and use a parser and no need to write a content handler with callback methods. This means that developers can access and process XML data without having to know XML or XML processing.
The key difference is the role the XML Schema plays. JAXP parsing works without any awareness of the XML Schema, while JAXB handles schema binding as the very first step.
I have a very large XML document to unmarshal. I don't want to create POJO classes for this because that would mean creating around 20 classes. Is there a way I can unmarshal this dynamically, i.e. without creating POJO classes?
Edit: Here is the link to the article to unmarshal (https://www.ncbi.nlm.nih.gov/pubmed/31297574/?report=xml&format=text)
I want to read this data and store it somewhere in my database.
I am trying to do this with jaxb.
The term "unmarshal" is usually used to mean a process of parsing XML and generating custom POJO objects. If you want to use generic Java objects instead, then you want one of the XML generic tree models. Most people use DOM, which is the oldest and worst of the models but is the default because it comes bundled with the Java platform; my own recommendation would be either JDOM2 or XOM.
If you don't want to create custom classes then you don't want to be using JAXB.
You haven't said in detail what you want to achieve, but for many XML operations, using XSLT or XQuery is going to be much easier than using Java (because processing XML is what they were designed for).
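As a rough illustration of the generic tree approach, here is a sketch using the DOM that ships with the JDK (JDOM2 or XOM would look similar but need an extra dependency). The article/title/year element names and the GenericTreeDemo class are made up for the example; they are not taken from the PubMed document.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class GenericTreeDemo {

    // Collects the text of each direct child element of the root, keyed by tag name.
    // No custom POJO classes are involved, only generic DOM nodes and a Map.
    static Map<String, String> childValues(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            Map<String, String> values = new LinkedHashMap<>();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node node = children.item(i);
                if (node instanceof Element) { // skip whitespace text nodes and comments
                    values.put(node.getNodeName(), node.getTextContent().trim());
                }
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical input; the real document has a different structure.
        String xml = "<article><title>Some title</title><year>2019</year></article>";
        System.out.println(childValues(xml)); // prints {title=Some title, year=2019}
    }
}
```

The resulting map could then be written to a database row. Note that this loads the whole document into memory, so it only suits documents that fit comfortably in the heap.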
You can look at the DSM library. It is designed to process complex XML and JSON documents while reading them. You define the mapping in YAML format, so you don't need to create classes to unmarshal into.
The DOM API loads the whole XML document into memory, so it is a poor fit for large XML. DSM uses stream parsing, so you won't run into memory problems. Using DSM is also easier than using DOM.
Reading the docs, this is the method used in all the examples I've seen:
(Version of org.jdom.input.SAXBuilder is jdom-1.1.jar)
Document doc = new SAXBuilder().build(is);
Element root = doc.getRootElement();
Element child = root.getChild("someChildElement");
...
where is is an InputStream variable.
I'm wondering, since this is a SAX builder (as opposed to a DOM builder), does the entire inputstream get read into the document object with the build method? Or is it working off a lazy load and as long as I request elements with Element.getChildren() or similar functions (stemming from the root node) that are forward-only through the document, then the builder automatically takes care of loading chunks of the stream for me?
I need to be sure I'm not loading the whole file into memory.
Thanks,
Mike
The DOM parser, like the JDOM parser, loads the whole XML resource into memory to give you a Document instance that lets you navigate the elements of the XML.
Some references here :
the DOM standard is a codified standard for an in-memory document
model.
And here :
JDOM works on the logical in-memory XML tree,
Both DOM and JDOM typically use a SAX parser internally to read the XML resource, but only to store the whole content in the Document instance they return. Indeed, with DOM and JDOM, the client never needs to provide a handler to intercept the events triggered by the SAX parser.
Note that neither DOM nor JDOM has any obligation to use SAX internally.
They use it mainly because the SAX standard is already there, so it makes sense to use it for reporting errors.
I need to be sure I'm not loading the whole file into memory.
You have two programming models to work with XML: streaming and the document object model (DOM).
You are looking for the first one.
So use the SAX parser, providing your own handler for the events it generates (startDocument(), startElement(), and so on), or as an alternative look at a more user-friendly API, StAX (Streaming API for XML):
As an API in the JAXP family, StAX can be compared, among other APIs,
to SAX, TrAX, and JDOM. Of the latter two, StAX is not as powerful or
flexible as TrAX or JDOM, but neither does it require as much memory
or processor load to be useful, and StAX can, in many cases,
outperform the DOM-based APIs. The same arguments outlined above,
weighing the cost/benefits of the DOM model versus the streaming
model, apply here.
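For comparison with the SAXBuilder snippet in the question, a streaming read with StAX might look like the sketch below. It counts occurrences of an element without ever building a Document; the StaxCountDemo class and the sample input are illustrative assumptions, and in real use the Reader would wrap your InputStream.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxCountDemo {

    // Walks the stream event by event; no in-memory tree of the document is built.
    static int countElements(Reader in, String localName) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            try {
                int count = 0;
                while (reader.hasNext()) {
                    // next() advances to the next event and returns its type.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && localName.equals(reader.getLocalName())) {
                        count++;
                    }
                }
                return count;
            } finally {
                reader.close(); // releases parser resources, not the underlying Reader
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read XML stream", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><someChildElement/><other/><someChildElement/></root>";
        System.out.println(countElements(new StringReader(xml), "someChildElement")); // prints 2
    }
}
```

Unlike SAX, the application drives the loop here and pulls the next event when it is ready for it.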
It eagerly parses the whole file to build the in-memory representation (i.e. Document) of the XML file.
If you want to be absolutely certain of that, you can go through the source on GitHub, in particular the following classes: SAXBuilder, SAXHandler, and Document.
I have a complex xml file which has multi-level elements. I have to parse the XML file and based on the elements present, I have to handle the incoming request. I can use JAXB to generate the classes and parse the xml. But to go through the multi-level elements and match against the rules makes the program too complex and heavy (leads to 4-5 levels of loops). Is there an efficient and lighter way of achieving the same?
Depending on whether you need to hold the data you read in memory, you can rely on these parsers:
DOM (Document Object Model) parsers store the XML data in an in-memory structure.
JDOM is a DOM-like, Java-specific document model with its own API.
SAX (Simple API for XML) parsers traverse an XML stream in push style, invoking user callbacks for each piece of data read.
StAX (Streaming API for XML) parsers read an XML stream in pull style: the application asks for the next event when it is ready for it.
All of them ship with any standard runtime (JRE), except JDOM, which is a separate open-source library.
So, if you are looking for an efficient way to process XML and make decisions based on the data read, StAX may suit your needs: as soon as you have the data you need, you can stop reading and discard the rest of the input XML.
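The "stop as soon as you have what you need" idea can be sketched like this with StAX. The request/type element names and the StaxEarlyExitDemo class are hypothetical, and the sketch assumes the target element contains text only.

```java
import java.io.StringReader;
import java.util.Optional;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxEarlyExitDemo {

    // Returns the text of the first element with the given name and stops reading there.
    static Optional<String> firstValue(String xml, String elementName) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && elementName.equals(reader.getLocalName())) {
                        // getElementText() only works for text-only elements.
                        return Optional.of(reader.getElementText());
                    }
                }
                return Optional.empty();
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read XML stream", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<request><type>refund</type><payload><item/><item/></payload></request>";
        // The decision is made from <type>; the payload after it is never visited.
        System.out.println(firstValue(xml, "type").orElse("unknown")); // prints refund
    }
}
```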
Update
To apply matching rules over the whole document, I recommend using XPath over DOM.
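A minimal sketch of that idea, using the XPath support bundled with the JDK: each rule becomes one boolean XPath expression instead of a set of nested loops. The order/customer/items structure, the rule itself and the XPathRuleDemo class are assumptions for illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathRuleDemo {

    // Parses the XML once into a DOM tree that all rules can be evaluated against.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Evaluates one rule, written as an XPath expression, to true or false.
    static boolean matches(Document doc, String rule) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Boolean) xpath.evaluate(rule, doc, XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath expression: " + rule, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<order><customer type='premium'/>"
                + "<items><item/><item/></items></order>");
        // Hypothetical rule: premium customer with more than one item.
        String rule = "/order/customer/@type = 'premium' and count(/order/items/item) > 1";
        System.out.println(matches(doc, rule)); // prints true
    }
}
```

Keeping the rules as strings also means they could live in configuration rather than in code, if that suits the application.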
There is an XML file hosted on a server that I want to parse. Normally I generate an XSD from the XML and then generate the Java POJOs from this XSD. Using Jackson, I then parse the XML into a Java object representation. Is it not more straightforward to just use XPath? This means I do not need to generate an object hierarchy based on the XML, and I do not need to regenerate the object hierarchy if the XML changes. XPath seems much more concise and intuitive.
Why should I use XSD and object generation instead of XPath?
According to the XML Schema specification, XSD is used for defining the structure, content and semantics of XML documents. This means that you can use XSD to validate your XML file.
Depending on your circumstances, you might be able to do without generating the whole object tree if all you need is to get some values from the XML file. In this case XPath is the way to go. However, you still might want to have an XSD file in order to validate the XML file before parsing it. This way your software fails fast when the structure of the XML file changes, which signals that your XPath expressions need updating. But for this to work, you shouldn't use an XSD generated from the XML file itself; instead you should have a separate, pre-written XSD file that matches what your XPath expressions expect.
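Here is a rough sketch of the fail-fast check using the JDK's javax.xml.validation API. The tiny price schema and the XsdGuardDemo class are made up for the example; a real schema would be a hand-maintained file that describes what the XPath expressions rely on.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class XsdGuardDemo {

    // Hypothetical hand-written schema: a single decimal <price> element.
    static final String PRICE_XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='price' type='xs:decimal'/>"
          + "</xs:schema>";

    // Compiles the schema once; the resulting Schema object can be reused.
    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("Invalid schema", e);
        }
    }

    // Returns false when the document no longer has the expected structure.
    static boolean isValid(Schema schema, String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation error: stop before running any XPath
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema schema = compile(PRICE_XSD);
        System.out.println(isValid(schema, "<price>9.99</price>")); // prints true
        System.out.println(isValid(schema, "<cost>9.99</cost>"));   // prints false
    }
}
```

Only when isValid returns true would the code go on to evaluate its XPath expressions against the document.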
I think both approaches are valid, depending on the circumstances.
At the end of the day, you want to extract the values from that remote XML file and do something with them.
The first criterion to consider is the size of that file and the number of data elements.
If there are just a few, then XPath extraction should be straightforward. However, if that XML file represents a sizable and/or complex data structure, then you probably want to deserialize it into a Java data structure that you can then work with, and JAXB would be a good candidate.
JAXB is going to be easier and better if the remote server adheres to or publishes an XML Schema. If it doesn't, and the format changes often and significantly, you're going to suffer either way, but particularly so with JAXB. There are ways to smooth things over by pre-processing the XML with XSLT to force it into a more reliable form, but that is most likely going to be a partial solution.
I want to create a basic RSS feeder app for Android and I don't know which library to choose:
SAX or DOM. Which should I choose?
Does anyone have any experience with either on an Android platform?
Any tips?
A SAX-based parser lets you store only the information you require, through an event-handler style interface, while DOM-based methods parse the whole file into an object model. Personally I would use SAX for both its speed and memory advantages (especially in a mobile environment: if you don't know the size of the XML at runtime, you could end up with a huge model). SAX allows you to construct your own objects/information as required, in the format you want, without having the default object model stored on top.
In general, however, a SAX-based parser is useful if the XML contains machine-readable data, and a DOM-based parser is useful when you have structured, document-style data. See here for more information.
Use a SAX parser; it offers better speed and performance.