How to stream XML data using XOM?


Say I want to output a huge set of search results, as XML, into a PrintWriter or an OutputStream, using XOM. The resulting XML would look like this:
<?xml version="1.0" encoding="UTF-8"?>
<resultset>
<result>
[child elements and data]
</result>
...
...
[1000s of result elements more]
</resultset>
Because the resulting XML document could be big (hundreds of megabytes, perhaps), I want to output it in a streaming fashion (instead of creating the whole Document in memory and then writing that).
The granularity of outputting one <result> at a time is fine, so I want to generate one <result> after another, and write it into the stream. In other words, I'd simply like to do something like this pseudocode (automatic flushing enabled, so don't worry about that):
open stream/writer
write declaration
write start tag for <resultset>
while more results:
    write next <result> element
write end tag for <resultset>
close stream/writer
I've been looking at Serializer, but the necessary methods, writeStartTag(Element), writeEndTag(Element), write(DocType) are protected, not public! Is there no other way than to subclass Serializer to be able to use those methods, or to manually write the start and end tags directly into the stream as Strings, bypassing XOM altogether? (The latter wouldn't be too bad in this simple example, but in the general case it would get quite ugly.)
Am I missing something or is XOM just not made for this?
With dom4j I could do this easily using XMLWriter - it has constructors that take a Writer or OutputStream, and methods writeOpen(Element), writeClose(Element), writeDocType(DocumentType) etc. Compare to XOM's Serializer where the only public write method is the one that takes a whole Document.
(This is related to my question about the best dom4j replacement where XOM is a strong contender.)

I ran into the same issue, but found it's pretty simple to do what you mentioned as an option and subclass Serializer as follows:
import java.io.IOException;
import java.io.OutputStream;
import nu.xom.Element;
import nu.xom.Serializer;

// Widens the visibility of Serializer's protected write methods to public.
public class StreamSerializer extends Serializer {
    public StreamSerializer(OutputStream out) {
        super(out);
    }
    @Override
    public void write(Element element) throws IOException {
        super.write(element);
    }
    @Override
    public void writeXMLDeclaration() throws IOException {
        super.writeXMLDeclaration();
    }
    @Override
    public void writeEndTag(Element element) throws IOException {
        super.writeEndTag(element);
    }
    @Override
    public void writeStartTag(Element element) throws IOException {
        super.writeStartTag(element);
    }
}
Then you can still take advantage of the various XOM configuration options such as setIndent, but use it like this:
Element rootElement = new Element("resultset");
StreamSerializer serializer = new StreamSerializer(out);
serializer.setIndent(4);
serializer.writeXMLDeclaration();
serializer.writeStartTag(rootElement);
while (hasNextElement()) {
    serializer.write(nextElement());
}
serializer.writeEndTag(rootElement);
serializer.flush();

As far as I know, XOM doesn't support streaming directly.
What I used when I wanted to stream my XML documents was NUX, which has a streaming XML serializer similar to the standard Serializer class in XOM. NUX is compatible with XOM. I downloaded the NUX sources, extracted a few classes (the StreamingSerializer interface, StreamingXMLSerializer, which works with XOM documents, StreamingVerifier and NamespacesInScope), put them into my project, and it works like a charm. Too bad this isn't directly in XOM :-(
NUX is a very nice companion to XOM: http://acs.lbl.gov/software/nux/, working mirror download: nux-1.6.tar.gz
Link to API: http://acs.lbl.gov/software/nux/api/nux/xom/io/StreamingSerializer.html
Here is sample code (the methods are called in this order: start(), n * nextResult(), finish(); serializer is a StreamingXMLSerializer from NUX):
void start() throws IOException {
    serializer.writeXMLDeclaration();
    Element root = new Element("response");
    root.addAttribute(new Attribute("found", Integer.toString(123)));
    root.addAttribute(new Attribute("count", Integer.toString(542)));
    serializer.writeStartTag(root);
    serializer.flush();
}
void nextResult(Result result) throws IOException {
    Element element = result.createXMLRepresentation();
    serializer.write(element);
    serializer.flush();
}
void finish() throws IOException {
    // Closes the most recently opened start tag (the root element).
    serializer.writeEndTag();
    serializer.flush();
}

Related

Spring Integration - aggregate and transform

What would be the simplest integration component arrangement in my use case:
1. Receive messages from multiple sources and in multiple formats (all messages are JSON-serialized objects).
2. Store messages in a buffer for up to 10 seconds (aggregate).
3. Group messages by a different property getter per class (e.g. class1.someId(), class2.otherId(), ...).
4. Release all grouped messages and transform them into a new aggregated message.
So far (points 1 and 2) I'm using an aggregator, but I don't know whether there is an out-of-the-box solution for point 3, or whether I will have to cast each Message payload and check the object's type: if it is class1, use a correlation strategy based on someId; if it is class2, use otherId.
For point 4 I could code something manually, but a Transformer seems like a good component to use. I just don't know whether there is something like an aggregating transformer where I can specify mapping rules for each input type.
UPDATE
Something like this:
class One {
    public String getA() { return "1"; }
}
class Two {
    public Integer getB() { return 1; }
}
class ReduceTo {
    public void setId(Integer id) {}
    public void setOne(One one) {}
    public void setTwo(Two two) {}
}
public class ReducingAggregator {
    @CorrelationStrategyMethod
    public String strategy(One one) {
        return one.getA();
    }
    @CorrelationStrategyMethod
    public String strategy(Two two) {
        return two.getB().toString();
    }
    @AggregatorMethod
    public void reduce(ReduceTo out, One in) {
        out.setId(Integer.valueOf(in.getA()));
        out.setOne(in);
    }
    @AggregatorMethod
    public void reduce(ReduceTo out, Two in) {
        out.setId(in.getB());
        out.setTwo(in);
    }
}
These annotations have, I suppose, a different use case than the current Spring ones. ReduceTo could be any object, including a collection. In the configuration we could specify what it should be when passed the first time: an empty list or something else (like reduce in Java streams).

Not sure what you would like to see as an out-of-the-box solution. Those are your classes, and so your methods. How could the framework make any decision about them?
Well, yes, you need to implement CorrelationStrategy. Or you can consider using ExpressionEvaluatingCorrelationStrategy and not write any Java code :-).
Please elaborate on what you would like to see as an out-of-the-box feature.
The aggregating transformer is encapsulated exactly in the MessageGroupProcessor function of the Aggregator. By default it is DefaultAggregatingMessageGroupProcessor. Yes, you can code your own or, again, use an ExpressionEvaluatingMessageGroupProcessor and avoid writing Java code :-)
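To make the "check the type, then pick the getter" idea from the question concrete, here is a minimal plain-Java sketch of that dispatch. It deliberately uses no Spring classes; in a real flow this logic would sit inside a CorrelationStrategy implementation's getCorrelationKey method. The One and Two classes mirror the question, while TypeBasedCorrelation, correlationKey and group are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TypeBasedCorrelation {

    // Payload types mirroring the question's One and Two (constructors added for the demo).
    static class One {
        private final String a;
        One(String a) { this.a = a; }
        String getA() { return a; }
    }

    static class Two {
        private final Integer b;
        Two(Integer b) { this.b = b; }
        Integer getB() { return b; }
    }

    // Picks the correlation key according to the payload's runtime type.
    static String correlationKey(Object payload) {
        if (payload instanceof One) {
            return ((One) payload).getA();
        }
        if (payload instanceof Two) {
            return ((Two) payload).getB().toString();
        }
        throw new IllegalArgumentException("Unsupported payload type: " + payload.getClass());
    }

    // Groups payloads by key, preserving arrival order within each group.
    static Map<String, List<Object>> group(List<Object> payloads) {
        Map<String, List<Object>> groups = new LinkedHashMap<>();
        for (Object payload : payloads) {
            groups.computeIfAbsent(correlationKey(payload), k -> new ArrayList<>()).add(payload);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Object> payloads = Arrays.asList(new One("1"), new Two(1), new One("2"));
        Map<String, List<Object>> groups = group(payloads);
        System.out.println(groups.keySet());
        System.out.println(groups.get("1").size());
    }
}
```

Here One("1") and Two(1) end up in the same group because both map to the key "1", which is the behaviour the question's two strategy methods describe.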

Hadoop use one instance for each mapper

I'm using Hadoop's MapReduce to parse XML files. I have a class called Parser with a method parse() that parses the XML files, and I want to use it in the Mapper's map() function.
However, this means that every time I want to call the Parser, I need to create a Parser instance. But this instance should be the same for each map job. So I'm wondering whether I can instantiate this Parser just once?
And one add-on question: why is the Mapper class always static?

To ensure one parser instance per Mapper, instantiate the parser in the mapper's setup method and clean it up in the cleanup method.
We did the same thing for a protobuf parser we had, but you need to make sure that your parser instance is thread-safe and holds no shared data.
Note: the setup and cleanup methods are called only once per mapper, so we can initialize private variables there.
To clarify what cricket_007 said in "In a distributed computing environment, sharing instances of a variable isn't possible...":
we have a practice of reusing Writable classes instead of creating new Writables every time we need one. We can instantiate once and reset the Writable multiple times, as described by Tip 6.
Similarly, parser objects can also be reused (Tip 6 style), as shown in the code below.
For example:
private YourXMLParser xmlParser = null;

@Override
protected void setup(Context context) throws IOException, InterruptedException {
    super.setup(context);
    xmlParser = new YourXMLParser();
}

@Override
protected void cleanup(Mapper<ImmutableBytesWritable, Result, NullWritable, Put>.Context context)
        throws IOException, InterruptedException {
    super.cleanup(context);
    xmlParser = null;
}
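The add-on question (why the Mapper is declared static) is not addressed above. A likely explanation, assuming the mapper is written as a nested class of the job class and that the framework creates it reflectively through a no-argument constructor: a non-static inner class has no such constructor, because its constructor implicitly takes the enclosing instance. The plain-Java sketch below shows only that language rule; StaticNestedDemo, StaticMapper and InnerMapper are hypothetical names and no Hadoop classes are involved.

```java
import java.lang.reflect.Constructor;

public class StaticNestedDemo {

    // Static nested class: has a true no-argument constructor.
    public static class StaticMapper {
        public StaticMapper() { }
    }

    // Inner (non-static) class: its constructor implicitly takes the enclosing instance.
    public class InnerMapper {
        public InnerMapper() { }
    }

    // Returns true if the class can be created reflectively with no arguments.
    static boolean canInstantiateWithoutArgs(Class<?> type) {
        try {
            Constructor<?> constructor = type.getDeclaredConstructor();
            constructor.newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            // For the inner class this is a NoSuchMethodException.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canInstantiateWithoutArgs(StaticMapper.class)); // true
        System.out.println(canInstantiateWithoutArgs(InnerMapper.class));  // false
    }
}
```

A top-level Mapper class in its own file does not need the static keyword at all; the issue only arises for classes nested inside another class.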

Java: reading Xml configuration and applying to an object, what's a lightweight, simple solution

In our various applications, we have a hodge-podge of different methods used to read Xml configuration info and apply it to a Java object.
I am looking for a utility that, when given some Xml element, will automatically take any child element and set a corresponding property on the object to be configured (and of course handle any data conversion from String to correct standard Java data type).
I realize I am describing something a lot like JAXB (which I've used only a little, as part of a project to serialize/deserialize objects to Xml), so maybe it's the best solution? I just don't particularly want to be required to add the annotations to the class, and would rather it be assumed that any setter corresponds to any similarly named Xml element.
Any recommendations on what should be a standard way to do this would be appreciated. (And I'm fine if people say go back and read the JAXB docs, because that's the best solution.)
Thanks in advance.
Update: I did end up with JAXB, although it wasn't exactly painless. The main downside is that it is not case-insensitive (when you are dealing with config files, it's best not to require matching by case). Another downside is that you need to deploy 3 additional jars. I ended up with this code (maybe there is something more elegant):
public class JAXBConfigurator<T> {
    private String filePath;
    private Class<T> clazz;

    public JAXBConfigurator(Class<T> toConfigure, String xmlFilePath) {
        this.clazz = toConfigure;
        this.filePath = xmlFilePath;
    }

    /**
     * @return Parses Xml and reads configuration from Document element. Using this method assumes that the
     * configuration xml starts at the top of the xml document.
     * @throws Exception
     */
    public T createAndConfigure() throws Exception {
        return createAndConfigure(null);
    }

    /**
     * Selects specified element from the parsed Xml document to use as the base
     * for reading the configuration.
     *
     * @param tagName
     * @return
     */
    public T createAndConfigure(String tagName) throws Exception {
        Document doc = XmlUtils.parse(filePath);
        Node startNode;
        if (tagName == null) {
            startNode = doc;
        } else {
            startNode = XmlUtils.findFirstElement(doc, tagName);
        }
        return readConfigFromNode(startNode);
    }

    private T readConfigFromNode(Node startNode) throws JAXBException {
        JAXBContext context = JAXBContext.newInstance(clazz);
        Unmarshaller unmarshaller = context.createUnmarshaller();
        JAXBElement<T> configElement = unmarshaller.unmarshal(startNode, clazz);
        return configElement.getValue();
    }
}
The class gets used like this:
JAXBConfigurator<MyConfig> configurator = new JAXBConfigurator<MyConfig>(MyConfig.class, xmlfilePath);
instance = configurator.createAndConfigure("MyXmlStartTag");
...which seems reusable enough for most scenarios. Thanks again to everyone who responded.

JAXB (JSR-222) is configuration by exception. This means that no annotations are required. You just need to add metadata to override the default rules:
For an example, see:
http://wiki.eclipse.org/EclipseLink/Examples/MOXy/GettingStarted/TheBasics

Pick one of these: Castor, XMLBeans, JiBX, XStream or JAXB. I'd recommend JAXB or JiBX. And in case you use Spring for other purposes, you can also check out Spring Object/XML mapping, which is basically a wrapper around the aforementioned implementations and provides a consistent API.
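For comparison with the libraries above, the annotation-free, case-insensitive behaviour the question asks for can be approximated with plain reflection and the JDK's DOM parser. This is only a rough sketch under stated assumptions, not a replacement for JAXB: ReflectiveConfigurator, ServerConfig and the host/port properties are hypothetical, only flat child elements are handled, and only a handful of property types are converted.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class ReflectiveConfigurator {

    // Hypothetical configuration bean used only for this illustration.
    public static class ServerConfig {
        private String host;
        private int port;
        public void setHost(String host) { this.host = host; }
        public void setPort(int port) { this.port = port; }
        public String getHost() { return host; }
        public int getPort() { return port; }
    }

    // Applies each child element of root to a setter with a matching name, ignoring case.
    static void configure(Object target, Element root) throws ReflectiveOperationException {
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text nodes and comments
            }
            String setterName = "set" + child.getNodeName();
            for (Method method : target.getClass().getMethods()) {
                if (method.getName().equalsIgnoreCase(setterName) && method.getParameterCount() == 1) {
                    String text = child.getTextContent().trim();
                    method.invoke(target, convert(text, method.getParameterTypes()[0]));
                    break;
                }
            }
        }
    }

    // Converts text to a few common property types; extend as needed.
    static Object convert(String text, Class<?> type) {
        if (type == String.class) return text;
        if (type == int.class || type == Integer.class) return Integer.valueOf(text);
        if (type == long.class || type == Long.class) return Long.valueOf(text);
        if (type == boolean.class || type == Boolean.class) return Boolean.valueOf(text);
        if (type == double.class || type == Double.class) return Double.valueOf(text);
        throw new IllegalArgumentException("Unsupported property type: " + type.getName());
    }

    // Parses the XML string and configures a fresh ServerConfig from its root element.
    static ServerConfig parse(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            ServerConfig config = new ServerConfig();
            configure(config, root);
            return config;
        } catch (ParserConfigurationException | SAXException | IOException
                | ReflectiveOperationException e) {
            throw new IllegalArgumentException("Invalid configuration XML", e);
        }
    }

    public static void main(String[] args) {
        ServerConfig config = parse("<server><HOST>example.org</HOST><Port>8080</Port></server>");
        System.out.println(config.getHost() + ":" + config.getPort());
    }
}
```

Elements with no matching setter are silently ignored here; whether that should be an error instead is a design choice that depends on how strict the configuration format is meant to be.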

Return Objects created while parsing an XML Document

So I want to read data from an XML file. I am now at a point where I have an XMLReader and a ContentHandler in place, and when endDocument() is fired I have "collected" all the data I need from the document.
But now it seems that I ran into a wall...
How do I return the collected data (from the ContentHandler) so that it can be used in my application?

You may create a List<T> in ContentHandler.
public class MyTextHandler implements ContentHandler {
    ....
    private ArrayList<YourModel> list;

    public MyTextHandler() {
        list = new ArrayList<YourModel>();
    }

    public ArrayList<YourModel> getList() {
        return list;
    }
    ....
}
Obtain list from the Handler:
MyTextHandler handler = new MyTextHandler();
reader.setContentHandler(handler);
InputSource is = new InputSource(filename);
reader.parse(is);
ArrayList<YourModel> list = handler.getList();

Technically ...
You don't really "return" data from a ContentHandler: the content handler is not designed to fulfill any specific input/output contract. Rather, it is an object that is supposed to "act" on XML read events. Some ContentHandlers may not interact with the host application at all; they might just print data to the console, for example. Others might turn XML elements into beans and then serialize those beans to a database.
If you want your application to "depend" directly on a ContentHandler to create objects, then the ContentHandler would typically be a subclass in your application that accesses your application's Model and writes data to that model. By "Model" here, I'm referring to the M in a typical MVC application.
The Practical Answer
The ContentHandler interface is meant to define the behavior of your SAX parser. The data is often finalized in the endDocument() method, where we dump all the XML data out to a database or as print statements.
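Both answers boil down to the same pattern: the handler accumulates state while the parser runs, and the application reads that state once parse() returns. A self-contained sketch using only the JDK's SAX classes might look like this. The element name "name" and the class names CollectingHandlerDemo and NameHandler are assumptions made for the example, and DefaultHandler is used instead of implementing ContentHandler directly to avoid writing empty callback methods.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class CollectingHandlerDemo {

    // Collects the text of every <name> element (a hypothetical element name).
    static class NameHandler extends DefaultHandler {
        private final List<String> names = new ArrayList<>();
        private StringBuilder text; // non-null only while inside a <name> element

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            if ("name".equals(qName)) {
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (text != null) {
                text.append(ch, start, length); // may be called several times per element
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("name".equals(qName)) {
                names.add(text.toString());
                text = null;
            }
        }

        List<String> getNames() {
            return names;
        }
    }

    // Runs the parser, then hands the handler's collected data back to the caller.
    static List<String> parseNames(String xml) {
        try {
            NameHandler handler = new NameHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.getNames();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseNames("<people><name>Ada</name><name>Linus</name></people>"));
    }
}
```

Because parse() is synchronous, the list is complete by the time getNames() is called, so no callback from endDocument() into the application is needed.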

How to stream large Files using JAXB Marshaller?

The problem I'm facing is how to marshal a large list of objects into a single XML file, a list so large that I cannot marshal it in one step. I have a method that returns these objects in chunks, but when I marshal them using JAXB, the marshaller fails with an exception saying that these objects are not root elements. This is OK for the normal case where you want to marshal the complete document in one step, but it also happens if I set the JAXB_FRAGMENT property to true.
This is the desired XML output:
<rootElem>
<startDescription></startDescription>
<repeatingElem></repeatingElem>
<repeatingElem></repeatingElem>...
</rootElem>
So I assume I need some kind of listener that dynamically loads the next chunk of repeatingElem elements and feeds it to the marshaller before it writes the closing tag of the root element. But how do I do that? Up until now I have only used JAXB to marshal small files, and the JAXB documentation does not give many hints for this use case.

I'm aware that this is an old question but I came across it while searching for duplicates of another similar question.
As @skaffman suggests, you want to marshal with JAXB_FRAGMENT enabled and your objects wrapped in JAXBElement. You then repeatedly marshal each individual instance of the repeated element. Basically it sounds like you want something roughly like this:
public class StreamingMarshal<T>
{
    private XMLStreamWriter xmlOut;
    private Marshaller marshaller;
    private final Class<T> type;

    public StreamingMarshal(Class<T> type) throws JAXBException
    {
        this.type = type;
        JAXBContext context = JAXBContext.newInstance(type);
        this.marshaller = context.createMarshaller();
        this.marshaller.setProperty(Marshaller.JAXB_FRAGMENT, Boolean.TRUE);
    }

    public void open(String filename) throws XMLStreamException, IOException
    {
        xmlOut = XMLOutputFactory.newFactory().createXMLStreamWriter(new FileOutputStream(filename));
        xmlOut.writeStartDocument();
        xmlOut.writeStartElement("rootElement");
    }

    public void write(T t) throws JAXBException
    {
        JAXBElement<T> element = new JAXBElement<T>(QName.valueOf(type.getSimpleName()), type, t);
        marshaller.marshal(element, xmlOut);
    }

    public void close() throws XMLStreamException
    {
        xmlOut.writeEndDocument();
        xmlOut.close();
    }
}

As you've discovered, if a class does not have the @XmlRootElement annotation, then you can't pass an instance of that class to the marshaller. However, there is an easy way around this: wrap the object in a JAXBElement, and pass that to the marshaller instead.
Now JAXBElement is a rather clumsy beast, but it holds the element name and namespace of the object that you want to marshal, information which would normally be contained in the @XmlRootElement annotation. As long as you have the name and namespace, you can construct a JAXBElement to wrap your POJO, and marshal that.
If your POJOs were generated by XJC, then it will also have generated an ObjectFactory class which contains factory methods for building JAXBElement wrappers for you, making things a bit easier.
You'll still have to use the JAXB_FRAGMENT property for the repeating inner elements, otherwise JAXB will generate stuff like the XML prolog each time, which you don't want.

I don't know much of JAXB, so I can't help. But if you don't mind, I have a suggestion.
Writing XML is a lot easier than reading it, so a solution for your problem might be to use a more "low-level" approach. Just write your own marshaller using one of the available open source libraries for XML. I think you can easily do what you want using dom4j.
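As a rough illustration of such a low-level approach, here is a sketch that uses the JDK's built-in StAX XMLStreamWriter rather than dom4j (a substitution on my part, chosen so the example needs no extra library). The element names come from the question's desired output; LowLevelStreamingWriter, writeAll and render are hypothetical names, and plain strings stand in for the real objects.

```java
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.Iterator;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class LowLevelStreamingWriter {

    // Writes <rootElem> with one <repeatingElem> per item, pulling items one at a time.
    static void writeAll(Iterator<String> items, Writer out) throws XMLStreamException {
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartDocument();
        xml.writeStartElement("rootElem");
        while (items.hasNext()) {
            xml.writeStartElement("repeatingElem");
            xml.writeCharacters(items.next()); // special characters are escaped
            xml.writeEndElement();
            xml.flush(); // push each element out rather than buffering the whole document
        }
        xml.writeEndElement();
        xml.writeEndDocument();
        xml.close(); // does not close the underlying Writer
    }

    // Convenience wrapper that renders the items into a String.
    static String render(String... items) {
        StringWriter out = new StringWriter();
        try {
            writeAll(Arrays.asList(items).iterator(), out);
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render("a", "b & c"));
    }
}
```

Because the items arrive through an Iterator, the caller can fetch the next chunk lazily, so only one element needs to be in memory at a time.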
