What's the right way to produce a XML content in Java? - java

I've read several questions and tutorials over the internet such as
Best XML parser for Java [closed]
JAVA XML - How do I get specific elements in an XML Node?
What is the best way to create XML files in Java?
how to modify xml tag specific value in java?
Using StAX - From Oracle Tutorials
Apache Xerces Docs
Introduction to XML and XML With Java
Java DOM Parser - Modify XML Document
But since this is the very first time I have to manipulate XML documents in Java I'm still a little bit confused. The XML content is written with String concatenation and that seems to me wrong. It is the same to concatenate Strings to produce a JSON object instead of using a JSONObject class. That's the way the code is written right now:
"<msg:restenv xmlns:msg=\"http://www.b2wdigital.com/umb/NEXM_processa_nf_xml_req\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:schemaLocation=\"http://www.b2wdigital.com/umb/NEXM_processa_nf_xml_req req.xsd\"><autenticacao><usuario>"
+ usuario + "</usuario><senha>" + StringUtils.defaultIfBlank(UmbrellaRestClient.PARAMETROS_INFRA_UMBRELLA.get("SENHA_UMBRELLA"), "WS.INTEGRADOR")
+ "</senha></autenticacao><parametros><parametro><p_vl_gnre>" + valorGNRE + "</p_vl_gnre><p_cnpj_destinatario>" + cnpjFilial + "</p_cnpj_destinatario><p_num_ped_compra>" + idPedido
+ "</p_num_ped_compra><p_xml_sefaz><![CDATA[" + arquivoNfe + "]]></p_xml_sefaz></parametro></parametros></msg:restenv>"
In my research I think that almost everything I've read pointed to SAX as the best solution but I never really found something really useful to read about it, almost everything states that we have to create a handler and override startElement, endElement and characters methods.
I don't have to serialize the XML file in hard disk or database or anything else, I just need to pass its content to a webservice.
So my question really is, which is the right way to do it?
Concatenate Strings the way things are done right now?
Write the XML file using a Java API like Xerces? I have no clue on how that can be done.
Read the XML file with streams and just change node texts? My XML without the files would be like that:
<msg:restenv xmlns:msg="{url}"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="{schemaLocation}">
<autenticacao>
<usuario></usuario>
<senha></senha>
</autenticacao>
<parametros>
<parametro>
<p_vl_gnre></p_vl_gnre>
<p_cnpj_destinatario></p_cnpj_destinatario>
<p_num_ped_compra></p_num_ped_compra>
<p_xml_sefaz><![CDATA[]]></p_xml_sefaz>
</parametro>
</parametros>
</msg:restenv>
I've also read something about using Apache Velocity as a template Engine since I don't actually have to serialize the XML and that's a approach that I really like because I've already worked with this framework and it's a really simple framework.
Again, I'm not looking for the best way, but the right one with tutorials and examples, if possible, on how to get things done.

It all depends on context. There is no single "right way".
The biggest issues with concatenation is the combination of escaping the XML in to a String constant (which is fiddly), but also escaping the values that you can using so that they're correct for an XML document.
For small XMLs, that's fine.
But for larger ones, it can be a pain.
If most of your XML is boilerplate with just a few values inserted, you may find that templates using something like Velocity or any of the other several libraries may be quite effective. It helps keep the template "native" (you don't have to wrap it in "'s and escape it), plus it keeps the XML out of your code, but easily lets you stamp in the parts that you need to do.

I agree that there's not just one way to do it, but I would advise you to take a look at JAXB. You can easily consume and produce XML without any of that pesky String manipulation. Look here for a simple intro: https://docs.oracle.com/javase/tutorial/jaxb/index.html

The Answer by Will Hartung is correct. There is not one right way as it depends on your situation.
For a beginner programmer, I suggest writing the strings manually so you get to understand XML in general and your content in particular. As for the mechanics of String concatenation, you would generally be using StringBuilder rather than String for better performance. Where thread-safety is needed, use StringBuffer.
The major issue is memory.
Abundant MemoryIf you have lots of memory and small XML documents, you can load the entire document into memory. This way you can traverse a document forwards, backwards, and jump around arbitrarily. This approach is know as Document Object Model (DOM). One better-known implementation of this approach is Apache Xerces. There are other good implementations as well.
Scarce MemoryIf you have little memory and large XML documents, then you need to plow through the document from start to finish, biting off small chunks at a time for lower memory usage. This approach is known as SAX. You can find multiple good implementations.
Another issue is validation. Do you want to validate the XML documents against a DTD or Schema? Some tools do this and some do not.
When all you need is to serialize the content of a Java object and read it back, I recommend the Simple XML Serialization library. Much simpler with a quicker learning-curve than the other tools.

Related

Parsing very large XML files and marshalling to Java Objects

I have the following issue: I have very large XML files (like 300+ Megs), and I need to parse them in order to add some of their values to the db. The structure of these files is also very complex. I want to use Stax Parser as it offers the nice possibility of pull-parsing (and thus processing) only parts of the XML file at a time, and thus not loading the whole thing in memory, but on the other hand getting the values with Stax (at least on these XML files) is cumbersome, I need to write a ton of code. From this latter point of view it will immensly help me if I could marshall the XML file to Java objects (like JAX-B does) however this would load the whole file plus a ton of Object instances in memory all at once.
My question is, is there some way to pull-parse (or just partially parse) the file sequentially, and then marshall only those parts to Java objects so I can deal with them easily without bogging down on memory?
I would recommend Eclipse EMF. But it has the same problem, if you give it the file name it would parse the whole thing. Although there are some options to reduce how much is loaded, but I didn't bother much as we run on machines with 96 GB RAM. :)
Anyway, If your XML format is well defined, then one workaround is to fool the EMF by breaking down the whole file into several smaller (but still well defined) XML snippets. Then feed each snippet one after the other. I don't know JAX-B, but perhaps the same workaround can be applied there as well. Which I would recommend, because EMF is too big a hammer for such a small issue.
Just to elaborate a bit if your XML looks like this:
<tag1>
<tag2>
<tag3/>
<tag4>
<tag5/>
</tag4>
<tag6/>
<tag7/>
</tag2>
<tag2>
<tag3/>
<tag4>
<tag5/>
</tag4>
<tag6/>
<tag7/>
</tag2>
............
<tag2>
<tag3/>
<tag4>
<tag5/>
</tag4>
<tag6/>
<tag7/>
</tag2>
</tag1>
Then it can be broken down into one XML each starting with <tag2> and ending with </tag2>. And in java most parsers would accept a Stream, so just parse using whatever you want, create some StringStream or something for each <tag2> in a loop and pass to JAX-B or EMF.
HTH
Well, first off I wanna thank the two persons answering my questions, but I finally ended up not using those propositions partly because those proposed technologies are a bit far from the Java let's say "standard XML parsing" and it feels weird going so far when there's a similar tool already present in Java and partly also because in fact I did found a solution that only uses Java API's to accomplish this.
I will not detail too much the solution I found, because I've already finished the implementation, and it's quite a big chunk of code to place here (I use Spring Batch on top of it all, with a ton of configuration and stuff).
I will however make a small comment on what I finally ended up doing:
The big idea here is the fact that if you have an XML document AND it's corresponding XSD schema, you can parse & marshall it with JAXB, and you can do it in chunks, and said chunks can be read with an even parser such as STAX and then passed to the JAXB Marshaller.
This practically means that you must first decide where's a good place in your XML file where you can say "this part here has A LOT of repetive structure, I will treat those repetitions one at a time". Those repetitive parts are usually the same (child) tag repeated a lot inside a parent tag. So all you have to do is make an event listener in your STAX parser that is triggered at the start of each of those child tags, than stream over to JAXB the content of that child tag, marshall it with JAXB and process it.
Really the idea is excellently described in this article, which I followed (true, it's from 2006, but it deals with JDK 1.6 which at that time was pretty new, so version-wise it's not that old at all):
http://www.javarants.com/2006/04/30/simple-and-efficient-xml-parsing-using-jaxb-2-0/
Document projection might be the answer here. Saxon and a number of other XQuery processors offer this as an option. If you have a reasonably simple query that selects a small amount of data from a large document, the query processor analyses the query to work out which parts of the tree need to be available for the query, and which can be discarded during processing. The resulting tree can often be only 1% of the size of the full document. Details for Saxon here:
http://saxonica.com/documentation/sourcedocs/projection.xml

Android XML parser for simple xml node strings

I need to parse a series of simple XML nodes (String format) as they arrive from a persistent socket connection. Is a custom Android SAX parser really the best way? It seams slightly overkill to do it in this way
I had naively hoped I could cast the strings to XML then reference the names / attributes with dot syntax or similar.
I'd use the DOM Parser. It isn't as efficient as SAX, but if it's a simple XML file that's not too large, it's the easiest way to get up and moving.
Great tutorial on how to use it here: http://tutorials.jenkov.com/java-xml/dom.html
You might want to take a look at the XPath library. This is a more simple way of parsing xml. It's similar to building SQL queries and regex's.
http://www.ibm.com/developerworks/library/x-javaxpathapi.html
I'd go for a SAX Parser:
It's much more efficient in terms of memory, especially for larger files: you don't parse an entire document into objects, instead the parser performs a single uni-directional pass over the document and triggers events as it goes through.
It's actually surprisingly easy to implement: for instance take a look at Working with XML on Android by IBM. It's only listings 5 and 6 that are the actual implementation of their SAX parser so it's not a lot of code.
You can try to use Konsume-XML: SAX/STAX/Pull APIs are too low-level and hard to use; DOM requires the XML to fit into memory and is still clunky to use. Konsume-XML is based on Pull and therefore it's extremely efficient, yet the API is higher-level and much easier to use.

parsing/scanning/tokenizing "raw XML"

I have an application where I need to parse or tokenize XML and preserve the raw text (e.g. don't parse entities, don't convert whitespace in attributes, keep attribute order, etc.) in a Java program.
I've spent several hours today trying to use StAX, SAX, XSLT, TagSoup, etc. before realizing that none of them do this. I can't afford to spend much more time attacking this problem, and parsing the text manually seems highly nontrivial. Is there any Java library that can help me tokenize the XML?
edit: why am I doing this? -- I have a large XML file that I want to make a small number of localized changes programmatically, that need to be reviewed. It is highly valuable to be able to use a diff tool. If the parser/filter normalizes the XML, then all I see is "red ink" in the diff tool. The application that produces the XML in the first place isn't something that I can easily have changed to produce "canonical XML", if there is such a thing.
I think you might have to generate your own grammar.
Some links:
Parsing XML with ANTLR Tutorial
ANTXR
XPA
http://www.google.com/search?q=antlr+xml
I don't think any XML parser will do what you want. Why ? For instance, the XML spec doesn't enforce attribute ordering. I think you're going to have to parse it yourself, and that is non-trivial.
Why do you have to do this ? I'm guessing you have some client 'XML' that enforces or relies on non-standard construction. In that case I'd push back and get that fixed, rather than jump through numerous fixes to try and accommodate this.
I'm not entirely sure that I understand what it is you are trying to do. Have you tried using CDATA regions for the parts of the document you don't want the parser to touch?
Also relying on attribute order is not a good idea - if I remember the XML standard correctly then order is never to be expected.
It sounds like you are dealing with some malformed XML and that it would be easier to first turn it into proper XML.

XML template in Java

I need to generate XML and they differ only in the values, that the tags contain.
Is it possible to create a template XML and then write only the values each time? (I do not want to go the JAXB way as these are small XMLs and are not worth creating objects for them).
Is this a good approach?
Any thoughts?
You can use freemarker or velocity for templating in java -- or even just add PHP tags to a sample XML to generate from a template.
I think as a general rule, though, once you start conditionally adding elements or attributes, or looping to generate multiples, you're better of generating your XML -- though I agree sometimes getting it into a format you want (not what the generator wants) is sometimes a pain.
As long as the XML file to be produced is small, simple and mostly consistent in format, I tend to buck the trend: I simply create and write a text string.
writer.out.format("<?xml version='1.0'><root><tag1>%s</tag1></root>", value1)
kinda thing.
Despite the fact that you are against jaxb (which I have yet to use), I wish to recommend a comparable way to do this with Apache's XMLBeans.
This requires you to use an xml schema - but from my experience it worth it...

Small modification to an XML document using StAX

I'm currently trying to read in an XML file, make some minor changes (alter the value of some attributes), and write it back out again.
I have intended to use a StAX parser (javax.xml.stream.XMLStreamReader) to read in each event, see if it was one I wanted to change, and then pass it straight on to the StAX writer (javax.xml.stream.XMLStreamReader) if no changes were required.
Unfortunately, that doesn't look to be so simple - The writer has no way to take an event type and a parser object, only methods like writeAttribute and writeStartElement. Obviously I could write a big switch statement with a case for every possible type of element which can occur in an XML document, and just write it back out again, but it seems like a lot of trouble for something which seems like it should be simple.
Is there something I'm missing that makes it easy to write out a very similar XML document to the one you read in with StAX?
After a bit of mucking around, the answer seems to be to use the Event reader/writer versions rather than the Stream versions.
(i.e. javax.xml.stream.XMLEventReader and javax.xml.stream.XMLEventWriter)
See also http://www.devx.com/tips/Tip/37795, which is what finally got me moving.
StAX works pretty well and is very fast. I used it in a project to parse XML files which are up to 20MB. I don't have a thorough analysis, but it was definitely faster than SAX.
As for your question: The difference between streaming and event-handling, AFAIK is control. With the streaming API you can walk through your document step by step and get the contents you want. Whereas the event-based API you can only handle what you are interested in.
I know this is rather old question, but if anyone else is looking for something like this, there is another alternative: Woodstox Stax2 extension API has method:
XMLStreamWriter2.copyEventFromReader(XMLStreamReader2 r, boolean preserveEventData)
which copies the currently pointed-to event from stream reader using stream writer. This is not only simple but very efficient. I have used it for similar modifications with success.
(how to get XMLStreamWriter2 etc? All Woodstox-provided instances implement these extended versions -- plus there are wrappers in case someone wants to use "basic" Stax variants, as well)

Categories