Best way for building objects out of XMLs at runtime

I have a lot of XML documents sent to me from a server, and I'm using JAXB (or any API based on that architecture) to build object instances from them.
The problem is that I have to predetermine, at compile time, the class I want to unmarshal to. The solution I have in mind is to read the incoming XML and, based on some tags, direct the unmarshaller to create an instance of the appropriate class. That approach will leave me with a lot of if statements and a big state machine.
Is there a better design pattern or approach?

Try using Apache Digester 3. I think it can save you lots of "if"s, and it is not difficult to use at all.
Have a look at this article: http://www.javaworld.com/javaworld/jw-10-2002/jw-1025-opensourceprofile.html
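Independent of Digester, the tag-based dispatch described in the question can be kept table-driven instead of growing into a chain of ifs. The following is only a minimal sketch under my own assumptions: the class name XmlDispatcher and its methods are hypothetical, it looks only at the root element name, and it uses the JDK's DOM parser rather than JAXB or Digester.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: maps a root tag name to a factory, replacing an if/else chain.
class XmlDispatcher {
    private final Map<String, Function<Element, Object>> factories = new HashMap<>();

    // Register the code that knows how to build an object for one root tag.
    void register(String rootTag, Function<Element, Object> factory) {
        factories.put(rootTag, factory);
    }

    // Parse the XML, look up the factory for its root tag, and let it build the object.
    Object build(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        Function<Element, Object> factory = factories.get(root.getTagName());
        if (factory == null) {
            throw new IllegalArgumentException("No factory registered for <" + root.getTagName() + ">");
        }
        return factory.apply(root);
    }
}
```

Each factory could just as well call an unmarshaller for the matching class; adding a new message type then means one more register(...) call rather than one more branch.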

Related

Serializing objects with the Java Preferences API

When I first started using the Java Preferences API, the one glaring omission from the API was a putObject() method. I've always wondered why they did not include it.
So, I did some googling and I found this article from IBM which shows you how to do it: http://www.ibm.com/developerworks/library/j-prefapi/
The method they're using seems a bit hackish to me, because you have to break the Object up into byte matrices, store them, and reassemble them later.
My question is, has anyone tried this approach? Can you testify that it is a good way to store/retrieve objects?
I'm also curious why the Java devs left putObject() out of the API. Does anyone have valuable insight?

I'm also curious why the Java devs left putObject() out of the API.
Does anyone have valuable insight?
From: http://docs.oracle.com/javase/7/docs/technotes/guides/preferences/designfaq.html
Why doesn't this API contain methods to read and write arbitrary
serializable objects?
Serialized objects are somewhat fragile: if the version of the program
that reads such a property differs from the version that wrote it, the
object may not deserialize properly (or at all). It is not impossible
to store serialized objects using this API, but we do not encourage
it, and have not provided a convenience method.

The article describes a reliable way to do it. I see there are a couple of things I may do differently (like I would store the count of the number of pieces as well as the pieces themselves so that I can figure things out easily when I retrieve them).
Your comment about serialization is wrong, though: the object you want to store has to be Serializable. That's how the ObjectOutputStream that the article uses does its job.
So yes, it looks like a reliable mechanism, provided you have Serializable objects. I imagine putObject and getObject are not part of the API for two reasons:
It's not part of the way that is native to Windows registries.
It risks people putting huge amounts of data in the registry.
Storing serialized objects in the registry strikes me as being somewhat concerning because they can be so big. I would only use it for occasions when there is no way to reconstruct the Object from constructors, and the serialized version is relatively small.
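For what it's worth, here is a rough sketch of the "pieces plus a count" idea from above. It is not the code from the IBM article; the class name PrefsObjectStore, the ".count" key suffix and the numbered piece keys are my own assumptions. The one hard constraint it relies on is documented: putByteArray rejects arrays longer than three quarters of Preferences.MAX_VALUE_LENGTH.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.prefs.Preferences;

// Hypothetical helper, not part of the Preferences API.
class PrefsObjectStore {
    // Largest byte[] that putByteArray accepts (the value is stored Base64-encoded).
    static final int PIECE_SIZE = Preferences.MAX_VALUE_LENGTH * 3 / 4;

    static byte[] toBytes(Serializable obj) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(obj);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Object fromBytes(byte[] raw) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Cut the serialized form into pieces of at most pieceSize bytes.
    static byte[][] split(byte[] raw, int pieceSize) {
        int count = (raw.length + pieceSize - 1) / pieceSize;
        byte[][] pieces = new byte[count][];
        for (int i = 0; i < count; i++) {
            int from = i * pieceSize;
            pieces[i] = Arrays.copyOfRange(raw, from, Math.min(raw.length, from + pieceSize));
        }
        return pieces;
    }

    // Store the piece count next to the pieces so retrieval knows how many to read.
    static void putObject(Preferences node, String key, Serializable obj) {
        byte[][] pieces = split(toBytes(obj), PIECE_SIZE);
        node.putInt(key + ".count", pieces.length);
        for (int i = 0; i < pieces.length; i++) {
            node.putByteArray(key + "." + i, pieces[i]);
        }
    }

    // Returns null when nothing was stored under the key.
    static Object getObject(Preferences node, String key) {
        int count = node.getInt(key + ".count", 0);
        if (count == 0) {
            return null;
        }
        ByteArrayOutputStream joined = new ByteArrayOutputStream();
        for (int i = 0; i < count; i++) {
            byte[] piece = node.getByteArray(key + "." + i, new byte[0]);
            joined.write(piece, 0, piece.length);
        }
        return fromBytes(joined.toByteArray());
    }
}
```

Keys are limited to Preferences.MAX_KEY_LENGTH characters, so the base key has to leave room for the suffixes. The fragility the design FAQ warns about still applies: a changed class may no longer deserialize.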

Generate object model out of RelaxNG schema with RNGOM - how to start?

I want to generate an object model out of an RelaxNG Schema.
Therefore I want to use the RNGOM Object Model/Parser (mainly because I could not find any alternative - although I don't even care about the language the parser is written in/generates). Now that I checked out the RNGOM source from SVN, I don't have ANY idea how to use RNGOM, since there is not any piece of information out there about the usage.
A useful hint on how to start with RNGOM - a link, example, or any description which saves me from having to read and understand the whole source code of RNGOM - will be awarded as an answer.
Even better would be a simple example how to use the parser to generate an Object model out of an RNG file.
More infos:
I want to generate Java classes out of the following RelaxNG Schema:
http://libvirt.org/git/?p=libvirt.git;a=tree;f=docs/schemas;hb=HEAD
I found out that the Glassfish guys are using rngom to generate the same object model I need, but I could not yet find out how they are using rngom.

A way to proceed could be to:
use Trang, the converter that accompanies Jing (Jing itself only validates), to convert from Relax NG to XML Schema (see here)
use more common tools to generate classes (e.g. JAXB).

Hi I ran into mostly the same requirement except I am concentrating on the Compact Syntax. Here is one way of doing what you want but YMMV.
To give some context, my goal has two phases: (a) slurp RelaxNG Compact Syntax and traverse an object/tree to create Spring 4 POJOs usable in a Spring 4 REST controller; (b) from there, develop a request validator that uses the RNG Compact schema and automatically validates the request before Spring deserializes it. Basically, scaffolding JSON REST API development using RelaxNG Compact Syntax as both design/documentation and JSON schema definition/validation.
For the first objective I thought about annotating the CompactSyntax with JJTree, but I am obviously not fluent in JavaCC, so I decided to take a more programmatic approach...
I analyzed and tested the code in several ways to determine if there was a tree implementation in binary, digested and/or nc packages but I don't think there is one (an om/tree) as such.
So my latest, and actually successful, approach has been to build upon binary: extend SchemaBuilderImpl, implement the visitor interface, and pass my custom SchemaBuilderImpl to CompactSyntax using the long constructor: CompactSyntax(CompactParseable parseable, Reader r, String sourceUri, SchemaBuilder sb, ErrorHandler eh, String inheritedNs)
When you call CompactParseable.parse you will get structured events in the visitor interface and I think this is good enough to traverse the rng schema and from here you could easily create an OM or tree.
But I am not sure this is the best approach. Maybe I missed something and there is in fact an OM/Tree built by the rngom implementation (in my case CompactSyntax) that you can traverse to determine parent/child relationships more easily. Or maybe there are other approaches to this.
Anyway, this is one approach that seems to be working for what I want. It is mostly visitor-pattern based, and since the interfaces were there I decided to use them. Maybe it will work for you. Bottom line: I could not find a traversable OM/AST implemented anywhere in the implementation packages (nc, binary, digested).

Creating multithreaded Java server and clients, but messages have to be in XML format

I've got to write a multithreaded chat program, using a server and clients but each message sent has to be in XML.
Is it simpler/easier just to write out all the code in Java and then try to somehow alter it so the messages are sent in XML format, or would it be simpler just to try and go for it in XML and hope it works? I'll admit I don't know that much about XML. :)
Also any links to any relevant online help/tutorials would be much appreciated.
Thanks.

When messing with XML in Java, PLEASE consider using JAXB or something similar. It allows you to work with a normal object graph in memory and then serialize that to XML in one operation (and the other way around).
Manipulating XML through the DOM API is a slow way to lose your sanity, do not do it for any non-trivial amount of XML.
I fail to see what the program being multithreaded or a server have to do with it though...

Check out XStream. You can use this to marshal a normal Java object into XML, and back again into an object, without having to do anything intrusive like define interfaces or specify a schema, i.e. it works out of the box for objects you already have defined. For most cases it's seamless in its default mode.
XStream produces a direct XML serialised representation of a Java object (i.e. XML elements represent each field of a Java object directly). You can customise this further as/when you require. If you want to define persisted objects in terms of schema (XSD) then it's not appropriate. However if you're transporting objects where persistence is short-term and you're not worried about conforming to some schema then it's definitely of use.
e.g.
Person person = new Person("Brian Agnew");
XStream xStream = new XStream();
System.out.println(xStream.toXML(person));
and conversion from XML to the Person object is similarly trivial.
(note XStream is thread-safe)

There is something called XML-RPC. This example pretty much shows what you're looking for:
http://docstore.mik.ua/orelly/xml/jxml/ch11_02.htm

It would be simpler to use existing XMPP clients and servers and not write your own at all.
If this is in fact homework, then I would suggest writing the client and server as you have suggested, using all java, but use a String as the message. You can then easily add parsing of the string to/from XML when all other parts are working.
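As a rough illustration of that last step (string message first, XML wrapping added afterwards), something like the following could work. The ChatMessage class and the <message from="..."> layout are made up for this sketch, and I picked the JDK's StAX API only because it needs no extra library; JAXB or XStream would do the same job with less code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical message type: <message from="sender">text</message>
class ChatMessage {
    final String from;
    final String text;

    ChatMessage(String from, String text) {
        this.from = from;
        this.text = text;
    }

    // The writer escapes characters such as '<' and '&' in the text for us.
    String toXml() {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("message");
            w.writeAttribute("from", from);
            w.writeCharacters(text);
            w.writeEndElement();
            w.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    static ChatMessage fromXml(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            String from = null;
            StringBuilder text = new StringBuilder();
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT && "message".equals(r.getLocalName())) {
                    from = r.getAttributeValue(null, "from");
                } else if (event == XMLStreamConstants.CHARACTERS) {
                    // Text may arrive in several chunks, so append rather than assign.
                    text.append(r.getText());
                }
            }
            r.close();
            return new ChatMessage(from, text.toString());
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The sockets and threads only ever see the String returned by toXml(), so the networking code does not need to change when the message format does.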

I would also suggest having a look at Betwixt and Digester. For Digester there are some tutorials which can be found in the Digester wiki. Betwixt provides some pretty good tutorials right on its website.
Additionally to these two tools there is a list of alternatives that can be found in the Reference section of http://wiki.apache.org/commons/Digester/WhyUseDigester

You're on the right page trying to break the task into smaller pieces.

Homemade vs. Java Serialization

I have a certain POJO which needs to be persisted on a database, current design specifies its field as a single string column, and adding additional fields to the table is not an option.
Meaning, the objects need to be serialized in some way. So just for the basic implementation I went and designed my own serialized form of the object, which meant concatenating all its fields into one nice string, separated by a delimiter I chose. But this is rather ugly, and can cause problems, say if one of the fields contains my delimiter.
So I tried basic Java serialization, but from a basic test I conducted, this somehow becomes a very costly operation (building a ByteArrayOutputStream, an ObjectOutputStream, and so on, same for the deserialization).
So what are my options? What is the preferred way for serializing objects to go on a database?
Edit: this is going to be a very common operation in my project, so overhead must be kept to a minimum, and performance is crucial. Also, third-party solutions are nice, but irrelevant (and usually generate overhead which I am trying to avoid)

Elliotte Rusty Harold wrote up a nice argument against using Java Object serialization for the objects in his XOM library. The same principles apply to you. The built-in Java serialization is Java-specific, fragile, and slow, and so is best avoided.
You have roughly the right idea in using a String-based format. The problem, as you state, is that you're running into formatting/syntax problems with delimiters. The solution is to use a format that is already built to handle this. If this is a standardized format, then you can also potentially use other libraries/languages to manipulate it. Also, a string-based format means that you have a hope of understanding it just by eyeballing the data; binary formats remove that option.
XML and JSON are two great options here; they're standardized, text-based, flexible, readable, and have lots of library support. They'll also perform surprisingly well (sometimes even faster than Java serialization).

You might try Protocol Buffers, an open-source project from Google. It is said to be fast (it generates a shorter serialized form than XML and works faster). It also handles the addition of new fields gently (it inserts default values).

You need to consider versioning in your solution. Data incompatibility is a problem you will experience with any solution that involves the use of a binary serialization of the Object. How do you load an older row of data into a newer version of the object?
So, the solutions above which involve serializing to name/value pairs are the approach you probably want to use.
One solution is to include a version number as one of the field values. As fields are added, modified or removed, the version can be changed.
When deserializing the data, you can have different deserialization handlers for each version which can be used to convert data from one version to another.
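A minimal sketch of such per-version handlers, assuming the row has already been parsed into name/value pairs. Everything here is hypothetical: the "version" key, the made-up v1 layout with a single "name" field, and the v2 layout that splits it into "first" and "last".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical example of chained upgrade handlers keyed by stored version.
class VersionedRecord {
    static final int CURRENT_VERSION = 2;

    // UPGRADERS.get(n) converts a version-n field map into version n + 1.
    private static final Map<Integer, UnaryOperator<Map<String, String>>> UPGRADERS = new HashMap<>();

    static {
        // v1 -> v2: the single "name" field is split into "first" and "last".
        UPGRADERS.put(1, v1 -> {
            Map<String, String> v2 = new HashMap<>(v1);
            String[] parts = v2.remove("name").split(" ", 2);
            v2.put("first", parts[0]);
            v2.put("last", parts.length > 1 ? parts[1] : "");
            v2.put("version", "2");
            return v2;
        });
    }

    // Apply upgrade handlers one version at a time until the data is current.
    static Map<String, String> upgrade(Map<String, String> fields) {
        Map<String, String> current = fields;
        // Rows written before versioning existed are treated as version 1.
        int version = Integer.parseInt(current.getOrDefault("version", "1"));
        while (version < CURRENT_VERSION) {
            current = UPGRADERS.get(version).apply(current);
            version++;
        }
        return current;
    }
}
```

Because each handler only knows how to go from one version to the next, loading a very old row just runs several handlers in sequence.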

XStream or YAML or OGNL come to mind as easy serialization techniques. XML has been the most common, but OGNL provides the most flexibility with the least amount of metadata.

Consider putting the data in a Properties object and use its load()/store() serialization. That's a text-based technique so it's still readable in the database:
public String getFieldsAsString() throws IOException {
    Properties data = new Properties();
    data.setProperty("foo", this.getFoo());
    data.setProperty("bar", this.getBar());
    ...
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    data.store(out, "");
    // store(OutputStream, String) always writes ISO-8859-1
    return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
}
To load from string, do similar using a new Properties object and load() the data.
This is better than Java serialization because it's very readable and compact.
If you need support for different data types (i.e. not just String), use BeanUtils to convert each field to and from a string representation.
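For the load direction, a round trip might look like the sketch below. The class and method names (PropertiesCodec, encode, decode) are made up; the sketch uses the Writer/Reader overloads of store() and load(), which sidestep the byte encoding question entirely.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper around Properties.store()/load().
class PropertiesCodec {
    // Serialize the properties to text suitable for a string column.
    static String encode(Properties data) {
        try {
            StringWriter out = new StringWriter();
            data.store(out, null); // null: no custom comment, only the timestamp line
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Rebuild the properties from the stored text.
    static Properties decode(String stored) {
        try {
            Properties data = new Properties();
            data.load(new StringReader(stored));
            return data;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that store() always writes a timestamp comment, so two encodings of the same data are not byte-identical; compare the decoded Properties, not the strings.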

I'd say your initial approach is not all that bad if your POJO consists of Strings and primitive types. You could enforce escaping of the delimiter to prevent corruption. Also, if you use Hibernate, you can encapsulate the serialization in a custom type.
If you do not mind another dependency, Hessian is supposedly a more efficient way of serializing Java objects.
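To make the escaping idea concrete, here is a small sketch. The choice of '|' as delimiter and backslash as escape character is arbitrary, and the class name DelimitedCodec is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical codec: fields joined with '|', with '\' escaping both '|' and '\'.
class DelimitedCodec {
    private static final char DELIM = '|';
    private static final char ESC = '\\';

    static String join(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(DELIM);
            }
            for (char c : fields.get(i).toCharArray()) {
                if (c == DELIM || c == ESC) {
                    sb.append(ESC); // prefix special characters with the escape
                }
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Assumes at least one field was written: an empty string decodes to one empty field.
    static List<String> split(String joined) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean escaped = false;
        for (char c : joined.toCharArray()) {
            if (escaped) {
                current.append(c); // take the character literally
                escaped = false;
            } else if (c == ESC) {
                escaped = true;
            } else if (c == DELIM) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }
}
```

With a fixed number of fields per POJO this round-trips any String content, including fields that contain the delimiter itself.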

How about the standard JavaBeans persistence mechanism:
java.beans.XMLEncoder
java.beans.XMLDecoder
These are able to create Java POJOs from XML (which have been persisted to XML). From memory, it looks (something) like...
<object class="java.util.HashMap">
<void method="put">
<string>Hello</string>
<float>1</float>
</void>
</object>
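For user-defined classes that do not follow the JavaBeans conventions (public no-argument constructor plus getters and setters), you have to provide PersistenceDelegate classes so that it knows how to persist them. Assuming you don't remove any public methods, it is resilient to schema changes.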
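A minimal round trip with these two classes could look like this. BeanXml and its method names are hypothetical; the sketch relies on XMLEncoder writing UTF-8 by default.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper around java.beans XML persistence.
class BeanXml {
    // Write the object graph as an XML document and return it as text.
    static String encode(Object bean) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(out)) {
            encoder.writeObject(bean);
        } // closing the encoder finishes the XML document
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Read the first object back from the XML text.
    static Object decode(String xml) {
        byte[] raw = xml.getBytes(StandardCharsets.UTF_8);
        try (XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(raw))) {
            return decoder.readObject();
        }
    }
}
```

Only decode XML you trust: the format can describe arbitrary method calls, which is what makes it flexible and also what makes it risky with untrusted input.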

You can optimize the serialization by making your object Externalizable. That gives you complete control over how it is serialized and improves the performance of the process. This is simple to do as long as your POJO is simple (i.e. doesn't have references to other objects); otherwise you can easily break serialization.
tutorial here
EDIT: Not implying this is the preferred approach, but you are very limited in your options if it is performance critical and you can only use a string column in the table.
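Roughly, an Externalizable POJO could look like the sketch below. PointRecord and its two int fields are invented for illustration, and Base64 is my own assumption for squeezing the binary form into a string column; the answer does not say how to do that part.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Base64;

// Hypothetical POJO that controls its own serialized form.
class PointRecord implements Externalizable {
    private static final long serialVersionUID = 1L;
    int x;
    int y;

    // Externalizable requires a public no-argument constructor.
    public PointRecord() { }

    PointRecord(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        // Fields must be read in exactly the order they were written.
        x = in.readInt();
        y = in.readInt();
    }

    // Serialize and Base64-encode so the result fits a string column.
    String toColumn() {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(this);
            }
            return Base64.getEncoder().encodeToString(bos.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static PointRecord fromColumn(String column) {
        byte[] raw = Base64.getDecoder().decode(column);
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (PointRecord) ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The stream still carries a class descriptor, so this trims the per-field overhead rather than removing the serialization machinery altogether.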

If you are using a delimiter you could use a character which you know would never occur in your text such as \0, or special symbols http://unicode.org/charts/symbols.html
However, the time spent sending the data to the database and persisting it is likely to be much larger than the cost of serialization. So I would suggest starting with something simple and easy to read (like XStream), then looking at where your application spends most of its time and optimising that.

I have a certain POJO which needs to be persisted on a database, current design specifies its field as a single string column, and adding additional fields to the table is not an option.
Could you create a new table and put a foreign key into that column!?!? :)
I suspect not, but let's cover all the bases!
Serialization:
We've recently had this discussion so that if our application crashes we can resurrect it in the same state as previously. We essentially dispatch a persistence event onto a queue, and then this grabs the object, locks it, and then serializes it. This seems pretty quick. How much data are you serializing? Can you make any variables transient (i.e. cached variables)? Can you consider splitting up your serialization?
Beware: what happens if your objects change (locking) or classes change (different serialization id)? You'll need to upgrade everything that's serialized to the latest classes. Perhaps you only need to store this overnight, so it doesn't matter?
XML:
You could use something like XStream to achieve this. Building something custom is doable (a nice interview question!), but I'd probably not do it myself. Why bother? Remember that if you have cyclic links, or references to the same object more than once, rebuilding the objects isn't quite so trivial.
Database storage:
If you're using Oracle 10g to store blobs, upgrade to the latest version, since c/blob performance is massively increased. If we're talking large amounts of data, then perhaps zip the output stream?
Is this a realtime app, or will there be pauses of a second or two where you can safely persist the actual object? If you've got time, then you could clone it and then persist the clone on another thread. What's the persistence for? Is it critical that it's done inside a transaction?

Consider changing your schema. Even if you find a quick way to serialize a POJO to a string, how do you handle different versions? How do you migrate the database from X->Y? Or worse, from A->D? I am seeing issues where we stored a serialized object in a BLOB field and have to migrate a customer across multiple versions.

Have you looked into JAXB? It is a mechanism by which you can define a suite of java objects that are created from an XML Schema. It allows you to marshal from an object hierarchy to XML or unmarshal the XML back into an object hierarchy.

I'll second the suggestion to use JAXB, or possibly XStream (the former is faster, the latter has more focus on the object serialization part).
Plus, I'll further suggest a decent JSON-based alternative, Jackson (http://jackson.codehaus.org/Tutorial), which can fully serialize/deserialize beans to JSON text to store in the column.
Oh, and I absolutely agree that you should not use Java binary serialization under any circumstances for long-term data storage. The same goes for Protocol Buffers; both are too fragile for this purpose (they are better for data transfer between tightly coupled systems).

You might try Preon. Preon aims to be to binary encoded data what Hibernate is to relational databases and JAXB to XML.

Serialize Java objects into Java code

Does somebody know a Java library which serializes a Java object hierarchy into Java code which generates this object hierarchy? Like Object/XML serialization, only that the output format is not binary/XML but Java code.

Serialised data represents the internal data of objects. There isn't enough information to work out what methods you would need to call on the objects to reproduce the internal state.
There are two obvious approaches:
Encode the serialised data in a literal String and deserialise that.
Use java.beans XML persistence, which should be easy enough to process with your favourite XML->Java source technique.
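The first approach can be sketched in a few lines. The class LiteralSerializer and its methods are hypothetical, and Base64 is my assumption for turning the binary form into something that is safe inside a Java string literal.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Base64;

// Hypothetical helper: emits Java source text that recreates an object.
class LiteralSerializer {
    // Returns a Java expression, as source text, that rebuilds the given object.
    static String toJavaSource(Serializable obj) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(obj);
            }
            // The Base64 alphabet needs no escaping inside a Java string literal.
            String literal = Base64.getEncoder().encodeToString(bos.toByteArray());
            return "LiteralSerializer.fromLiteral(\"" + literal + "\")";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Called by the generated code to turn the literal back into an object.
    static Object fromLiteral(String literal) {
        byte[] raw = Base64.getDecoder().decode(literal);
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The generated expression is opaque to a human reader, and a single string constant in a class file is limited to 65535 bytes, so a large object graph would have to be split across several literals.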

I am not aware of any libraries that will do this out of the box but you should be able to take one of the many object to XML serialisation libraries and customise the backend code to generate Java. Would probably not be much code.
For example, a quick google turned up XStream. I've never used it, but it seems to support multiple backends other than XML, e.g. JSON. You can implement your own writer and just write out the Java code needed to recreate the hierarchy.
I'm sure you could do the same with other libraries, in particular if you can hook into a SAX event stream.
See:
HierarchicalStreamWriter

Great question. I was thinking about serializing objects into java code to make testing easier. The use case would be to load some data into a db, then generate the code creating an object and later use this code in test methods to initialize data without the need to access the DB.
It is somehow true that the object state doesn't contain enough info to know how it's been created and transformed, however, for simple java beans there is no reason why this shouldn't be possible.
Do you feel like writing a small library for this purpose? I'll start coding soon!

XStream is a serialization library I used for serialization to XML. It should be possible and rather easy to extend it so that it writes Java code.
