Best Practice for large XML file builder

Best Practice for large XML file builder - java

I have to build an XML file for an input to a SOAP service in Java. The input xml can consist of at least 1000 tags. What is the best way to build the XML? I have the XSD files but it is a bit complicated to use JAXB. Is XMLStreamWriter a good option for that?

XMLStreamWriter is one of the better APIs to use for writing XML from a Java application, but it has a few quirks (e.g. its namespace handling is a bit bizarre) and you may find it worthwhile to wrap it in a convenience API that knows about the kind of document you are writing, e.g. what namespaces it uses.
One of the advantages of the XMLStreamWriter interface is that there are plenty of implementations to choose from. For example Saxon has an implementation that gives you full control over all the XSLT/XQuery serialization options plus Saxon extensions (for example, you can even control the output order of attributes!)
One of the problems I hit with all event-based APIs is that sooner or later you find yourself forgetting to write an end tag, and that can be quite tricky to debug. Using a wrapper API that forces you to include the element name in a call on endElement() can be useful for debugging; if debugging is switched on you can keep a stack of element names and check that endElement() is writing the right tag; with debugging switched off you just drop this check.
Serializing using JAXB is higher-level, of course, but the downside is that it gives you less control.

Related

Custom StAX Parser for XML using javax wrappers

Custom StAX Parser for XML using javax wrappers
How do you do this; or at least good suggestions on the right documentation / examples / tutorials?
I've been using the javax.xml.stream package to process XML files but the application is begging for some "non-standard XML" (easy to understand what the means if you're not picky). I can write the parser, but I want this to be configurable: so that the app continues to use the same XML processing code except for changing the parser as needed.
The hard part at this point is finding concrete info on how this is done. Documentation speaks of, for example, configuring the parameters of SAXParserFactory and such, but I haven't found specific documentation or examples. I've even looked into some existing StAX source code. Need some good hints / guidance on how this is done in order to move forward.

According to the documentation, you can't. You can use one of three approved parsers. Anything else will result in an error.

XML API for best performance

I have an application that works with a lot of XML data. So, I want to ask you which is the best API to handle XML in java. Today, I'm using W3 and, for performance, I want to migrate to some API.
I make XML from 0, a lot of transforms, import into database (mysql, mssql, etc), export from database to html, modifi of those XML, and more.
Is JDOM the best option? do you know some other better than JDOM?
I heard (by reading pages) about javolution. Somebody use it?
Which API you recommend me?

If you have vast amounts of data, the main thing is to avoid having to load it all into memory at once (because it will use a vast amount of memory, and because it prevents you overlapping IO and processing). Sadly, i believe most DOM and DOM-like libraries (like DOM4J) do just that, so they are not well suited for processing vast amounts of XML efficiently.
Instead, look at using a streaming API, like SAX or StAX. StAX is, in my experience, usually easier to use.
There are other APIs that try to give you the convenience of DOM with the performance of SAX. Javolution might be one; VTD-XML is another. But to be honest, i find StAX quite easy to work with - it's basically a fancy stream, so you just think in the same way as if you were reading a text file from a stream.
One thing you might try is combining JAXB with StAX. The idea is that you stream the file using StAX, then use JAXB to unmarshal chunks within it. For instance, if you were processing an Atom feed, you could open it, read past the header, then work in a loop unmarshalling entry elements to objects one at a time. This only really works if your format consists of a sequence of independent elements, like Atom; it would be largely useless on something richer like XHTML. You can see examples of this in the JAXB reference implementation and a guy's blog post.

The answer depends on what performance aspects are important for your application. One factor is whether you are handling large XML documents.
For parsing, DOM-based approaches will not scale well to large documents. If you need to parse large documents, non-DOM parsers such as those using SAX and StAX will be faster and less resource intensive. However, if you need to transform XML after parsing, using either XSL or a DOM API, you are going to need the whole document in memory in any case.
For creating XML from code, StAX provides a nice API for this. Since the approach is stream-based, this will scale well to writing very large documents.

Well, the most developers I know and myself, we use dom4J, maybe if you have the time you could write a small performancetest with use of both frameworks, then you will see the difference. I prefere dom4j.

Java Collada Parser - XML Pull based implementation

I am looking at a set of parsers generated for Atom, XAL, Kml etc. seemingly using an automated technique with a XML pull based parser. The clue towards the automation is presence of "package.html" in all XML-to-Java mapped classes folders. I would like to produce a similar one for the rather large Collada 1.4 spec. My first attempt with Altova ran into small problems due the "enum" keyword. I am sure I can fix it in the next run with appropriate renaming. Khronos admit to not designing the 1.4 spec to being automated parser generation friendly.
The actual parsers i.e. XAL parser, Atom parser etc. implement the XMLEventParser interface. I would like to know if anybody has encountered/used this pattern. If so which tool can be used to map the XSD to a class set simply giving access to the data components of the nodes using getters and setters.

I'm not sure I understand your question, but it appears that you want to process XML formats like Atom and represent it in objects with getters/setters. This can easily be done with JAXB.
For an example see:
http://bdoughan.blogspot.com/2010/09/processing-atom-feeds-with-jaxb.html

Why choose an XSL-transformation?

For a current project the decision has to be made whether to use XML and an XSL-transformation to produce HTML or to directly use HTML-templates.
I'd be interested in arguments for or against the XSL-approach. I understand that in cases where you have to support many different layouts, an XSL-solution has a lot of advantages, but why would you choose it in those cases where you only have to support one target layout?
Edit: We're talking about Java here.

XSLT is a functional programming language and you can use it to create frontends as rich as any templating system. However, you shouldn't — you and your team will go insane.
Both options present the opportunity of transforming objects into a presentation form in a logical sort of way. XSLT is best suited for creating more XML, which might lead you to believe that it's a perfect candidate to use to create XHTML. However, creating XHTML shouldn't be the primary goal — Creating a user experience is. Don't concern yourself with the medium.
Two significant drawbacks to XSLT concern the syntax: Your templates, and the templates that they include, and the templates that those templates include will all be gigantic and verbose. Second, you'll have to do a lot of functional programming, and less-experienced engineers may be confused and terrified when they encounter a recursive template with an accumulating function parameter instead of a simple for loop.
If you're attracted by the beauty of transforming logically-constructed, valid XML entities, consider instead a type-safe templating system that transforms beans instead. Check out Google XML Pages, and create logically-organized, type-safe templates that will be easy for future engineers to pick up and extend.

I created an XML/XSLT-driven UI for an enterprise product about 5 years ago. We're still using it, and I can now look back on my experience and see many pros and cons:
Pros:
XSL is a powerful declarative language, useful & fun for experienced developers, and transforms can do pretty amazing things in a few lines of code
XSL is designed for use with XML, so if your data is already XML then it makes a lot of sense
Separation of concerns (rendering vs. data) is better than many template languages
XSL-based rendering can be easily "subclassed". By that I mean: let's say you have data class A with associated template A.xslt. For class B derived from A, you can easily create B.xslt with only the small differences, and include A.xslt for inherited behaviors. This makes it less succeptible to breaking due to changes in A.xslt.
The above point also gives you the power to do overrides. For class A with associated A.xslt, we can easily switch the associated template to A-custom.xslt, which is a few small changes plus inheritance of A.xslt. We can do this on the fly in the field and again, the benefit is that A-custom.xslt is only a few lines, not an entire modified copy of the original A.xslt. The small footprint means it's more likely to work with multiple versions of A.xslt.
In .NET 2.0, XSLT is compiled and becomes very fast. There may be similar tech for Java. (Most template languages do this now too.)
In .NET, it's possible to create an "Object XPath Navigator", which lets you transform your data objects without having to convert them to an XML object. Again there may be similar tech in Java
XSLT is smart about HTML & handles escaping, white space issues, etc. well
Cons:
XSL is a powerful declarative language, confusing to newer programmers - and fewer people know XSLT well
XSL is verbose. XML is often verbose too.
XSL transforms are probably slower than "native" templates. Even when compiled there's still more state overhead to XSL than most template languages
It's hard to pass parameters to XSLs, you have to either send them in line with your data (forcing you to create extra XML) or via system-specific methods (which may also involve constructing XML data)
If you don't have an ObjectXPathNavigator or equivalent, you'll incur significant overhead when turning your data objects into XML for transformation
Depending on the capabilities of your transformer, you may also incur buffering overhead as you transform into a string buffer and then send that string to the output device
The more advanced your XSLT usage, the less likely it is that your tools will support you (specifically as you start to use includes or faster ways to pass XML data in)
I'll try to update as I think of more issues. I think that looking back now, my verdict would be to stick with a common template language. What were once big issues when I selected XML/XSLT have now been addressed by newer and more mature revisions of the major template engines. We do still benefit greatly from the ability to inherit .xslt files, which is something most template engines don't do well. But in the end the value of having lots of developers providing examples is far greater (compare ASP.NET answers vs XSLT answers on StackOverflow, for instance.)
Hope that helps!

I've done significant development using XSLT and it has been both tremendously successful and a complete failure at two different sites.
A few thoughts before a conclusion:
I don't think anyone would argue that XSLT is far more powerful than a template parsing engine, it's a functional language.
Although it's not as widely adopted as most procedural languages, it's still a real language that's being used out there for actual projects, people can be hired already with knowledge of XSLT and it's a transferable skill for your current staff.
XSLT has also been around for a while now, the implementations are mature, I'm sure this is the case for long running templating engines (like Velocity) but newer engines may be less robust.
Whatever template language you decide on it's unlikely to be as well documented as XSLT. Check out any of the Michael Kay Programmer's reference series for an example on how to do a great reference book.
Tool support is generally very good ... if you have a budget. XMLSpy and Stylus Studio have both been very useful for me in the past.
XSLT is not only hard but, more importantly, different. Most people are not Computer Science graduates formally trained in functional programming. The majority of programmers will write XSLT in a procedural style which will not harness any power of the language and give you a maintenance headache.
XSLT transforms can be slow and can take a lot of memory. You may have problems if you have a stylesheet with a large XML input.
I love XSLT but whether you should use it or not comes down to a few points:
Are you committed to XSLT? Do you have serious in-house expertise in XSLT? Are you prepared to get some?
Is your data in XML? Does it make sense in XML? Do you have someone in-house who loves your data enough to make sure it's well structured and there's always an appropriate schema?
Unless the answer to those questions is yes and you have complex data that requires a complex rendering process, I wouldn't consider using XSLT ... especially if there's no experience in the team. Bad XSLT is much, much, much worse than a bad template.
However, it can render complex data in a maintainable fashion which would be impossible using many of today's templating engines.

Going the XSL way will future-proof your application. Meaning, if you decide in the future to add more templates with different layouts you will be able capitalize on those advantages. In my current project we save off the XML used (in an XMLType or CLOB) and allow other applications to access the data and XSL templates to generate documents via a web service. This was an after thought of the original design that was super easy to implement due to our decision to use XML/XSL.

XSLT has the advantage of being able to also produce output in other document types (i.e. pdf) and pdf output is very likely nowadays. XML/XSLT does also separate data from the view.

When we have done XSLT in the past, it was to allow the ability to extend our product. The output remained the same, only the presentation layer needed to change. This allowed us a lot of flexibility when we had clients that wanted to "customize" their UI, since all we needed to do was replace the XSLT file. If you foresee needing to make a lot of those kinds of changes, XSLT might be your answer.
However, as stated above, the XSLT syntax and functional programming mentality can make it difficult to effectively produce templates. We found that we liked to stick to the tricks that we learned and when we had client requests that fell outside of what we already knew, no one wanted to volunteer for the ticket. Usually someone eventually figured out how to do the task and our "bag of tricks" got larger, but it was often very cumbersome to figure out new things.
If you don't foresee change the UI ever, or at least not much, XSLT may not be worth the extra effort.

Please don't use XML/XSLT for web front-ends. I was in projects like this and it's horrible. Often you have to first produce the XML from objects or something similar, which doesn't make sense. A second point is, that there are so many good HTML editors out there for free, but I've found none for XSLT. So editing complex XSLT is no fun. I would recommend to go with HTML templates and a common template engine.

Depending on your application, having an XML layer that is then transformed to XHTML via XSLT also meens, that you can write easy WebServices to the XML layer - allowing your customers to consume your sites data...
Having the XML sent to the browser with a transformation link (forgot the exact syntax...) also meens less bandwith needed, as the XSLT file will stay the same and you only need to pass the raw XML it is built from - sort of like using an external CSS style sheet instead of adding the style attributes to your markup ;)

I think you need to examine what the source of your data will be. As mentioned by boris callens earlier, if the you are pulling from a database you will have to transform first to XML, then apply your transformations. Should the data source be RSS or the like, then XSLT is a natural choice.
XPATH and XSLT has a high learning curve and functional programming can be daunting to get your arms around. In time crunch this may not be the right choice.
For front end work JSON has a lighter payload, and is readily supported by jQuery and other Javascript libraries. You may want to consider JSON as the data protocol as the jQuery library is far more accessible to developers and the time to productivity with the framework is far less than with XSLT, embedded Javascript in tags, awful syntax and all the other minutia that come with XML/XPATH/XSLT on the front end.

Keep it simple. That's a principle that one gets to appreciate more and more.
Velocity or Freemarker are incredibly flexible and versatile. Your code base will be clear, easily understandable, and it will run much (much) faster than the X monstrosities.

http://fishbowl.pastiche.org/2002/02/12/xslt_is_the_spawn_of_satan/

I see how the XSL approach can be handy if your data is already XML.
But usually it isn't. It's somewhere in a database, needs to be generated on the spot or comes from some service.
Creating XML from this source to then be able to create HTML from that XML is useless in my opinion. I would stick with (X)Html templates.

In contrast to HTML, there are a lot of XML tools available if you need to do parsing and processing of the templates in any way. So you should choose XML to get the benefits of using tools and libraries for XML.
However, that said, it may just be that XHTML fits your needs, since this gives you full support of XML tools and libraries while still being normal HTML which is correctly processed by modern web browsers. If you need to do post-processing of those later on, you can still apply XSLT to the XHTML data.

I've used XML & XSLT in a previous project, financial web sites, and it worked well for us, but:
We had multiple customers, which
varied the number of outputs we had.
We could replace the XSLT stylesheet
and this made changes to the site
easier to manage for the developers
We had a specialist web editor on the team. We gave them example XML & they could edit the stylesheets directly
If there were ever any wording changes that needed to go onto the website yesterday ( it was a bank, this happened surprisingly often), we could just deploy the new XSLT without redeploying the entire site.
Multiple different output formats were needed. We used FOP for transformation to PDF, which is based upon the same sort of technology, so wasn't too hard for us to understand :-)
The main reason I see for using XSLT is if you have multiple sites all based upon the same XML, but requiring different HTML output.

XML + XSLT are really cool. You have the ability to output many types of target formats in the future.
But ne aware of embedded HTML in the XML. Firefox XSLT doesn't support "disable-output-escaping". See Bugzilla.

We use XSLT to generate html in our content management system and it works just fine.
Some hints: Don't try to generate all the page at once from one big hairy XML, you'll go insane. Use the HTML template (plain text/html file with styles, decorations and basic markup) with embedded markers (like, <!--MENU-->, <!--CONTENT-->), and replace markers with xslt-transformation of appropriate data.
Having said that, I doubt you really need xslt if you only going to have one layout, forever.

When and why would you use Apache commons-digester?

Out of all the libraries for inputing and outputting xml with java, in which circumstances is commons-digester the tool of choice?

From the digester wiki
Why use Digester?
Digester is a layer on top of the SAX
xml parser API to make it easier to
process xml input. In particular,
digester makes it easy to create and
initialise a tree of objects based on
an xml input file.
The most common use for Digester is to
process xml-format configuration
files, building a tree of objects
based on that information.
Note that digester can create and
initialise true objects, ie things
that relate to the business goals of
the application and have real
behaviours. Many other tools have a
different goal: to build a model of
the data in the input XML document,
like a W3C DOM does but a little more
friendly.
and
And unlike tools that generate
classes, you can write your
application's classes first, then
later decide to use Digester to build
them from an xml input file. The
result is that your classes are real
classes with real behaviours, that
happen to be initialised from an xml
file, rather than simple "structs"
that just hold data.
As an example of what it's NOT used for:
If, however, you are looking for a direct representation of the input xml document, as
data rather than true objects, then digester is not for you; DOM, jDOM or other more
direct binding tools will be more appropriate.
So, digester will map XML directly into java objects. In some cases that's more useful than having to read through the tree and pull out options.

My first take would be "never"... but perhaps it has its place. I agree with eljenso that it has been surpassed by competition.
So for good efficient and simple object binding/mapping, JAXB is much better, or XStream. Much more convenient and even faster.
EDIT 2019: also, Jackson XML, similar to JAXB in approach but using Jackson annotations

If you want to create and intialize "true" objects from XML, use a decent bean container, like the one provided by Spring.
Also, reading in the XML and processing it yourself using XPath, or using Java/XML binding tools like Castor, are good and maybe more standard alternatives.
I have worked with the Digester when using Struts, but it seems that it has been surpassed by other tools and frameworks for the possible uses it has.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.