Best practices for object serialization to file - java

Not all projects require Databases.
The project I am currently working on doesn't want any DB to be used at all; instead it should use object serialization to file.
This implementation would serialize and deserialize lots of objects to file.
My question is: what are the best practices for object serialization to file?

It depends heavily on the nature of the data, how likely the classes you write to disk are to change, whether you need to store just the class's data or its data and code, and whether the result is meant to be human readable.
Object serialization to a file is one technique. Translating your object model to a structured text record (CSV, XML, etc.) is another. Generally, if the objects as referenced in the file must refer to each other, you will need to encode the references to an id number relevant to the file, and have the decoder rebuild the references while the objects are loading.
If you really need to control how the object marshalling to and from storage is done, you can control it in detail through the Externalizable interface. Beware: once you take on that responsibility, you need to handle all of it correctly.
As far as best practices go:
Create an in-file id for each object instance.
Encode the object's type in the serialization (this is done for you in default serialization schemes).
Add an extra field to track the class's revision, as loading "old" objects into newer versions of their classes can be problematic.
Provide for a layer that can "forward" translate an "old" on-disk object of one known revision to the current class revision (see the sketch after this list).
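A minimal sketch of the last two points, with an invented class and field names; the revision number stored with the object lets readObject fill in or convert fields when an older on-disk object is loaded:

import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.Serializable;

// Hypothetical example: the revision written with the object lets us
// "forward" old on-disk data to the current class layout on load.
public class PlayerProfile implements Serializable {
    private static final long serialVersionUID = 1L;
    private static final int CURRENT_REVISION = 2;

    private int revision = CURRENT_REVISION;
    private String name;
    private int score;   // added in revision 2

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        if (revision < 2) {
            score = -1;  // sentinel default for objects written before the field existed
        }
        revision = CURRENT_REVISION;
    }
}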

I would have suggested Protocol Buffers, but now I recommend MessagePack.

I recommend you have a look at XStream.
It is a simple library for serializing objects to XML and back.
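A minimal round trip looks roughly like this (the Person class is just a placeholder; recent XStream versions also want the allowed types whitelisted):

import com.thoughtworks.xstream.XStream;

public class XStreamDemo {
    static class Person {                 // placeholder class to serialize
        String name = "Alice";
        int age = 30;
    }

    public static void main(String[] args) {
        XStream xstream = new XStream();
        xstream.allowTypes(new Class[] { Person.class }); // security whitelist in newer versions
        String xml = xstream.toXML(new Person());         // object -> XML
        Person back = (Person) xstream.fromXML(xml);      // XML -> object
        System.out.println(xml);
        System.out.println(back.name + ", " + back.age);
    }
}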

Related

Java Serialization and references

Say I have a serialized class that is used to save the state of my game, and the serialized text is stored in a text file. If I restart my computer, reinstall Java, etc., and then try to deserialize that text, will it restore everything that it references? For the purposes of the question, assume the class has multiple ArrayLists of entities and map elements.
Class -> Serialization - > text
Text - > Deserialization - > Class
As long as the serialized classes don't change their definitions, there won't be any problem. You can even move these serialized files to another OS that deserializes them into the same class definitions and it will work with no problem (unless you use OS-specific libraries, thus breaking portability).
The serialized file will retain its state regardless of whether or not you reinstall Java or restart your machine. It would be pretty useless otherwise: the point of serialization is to capture state in a persistent form, for archival or transport, in a fashion that can be used to recreate that state later.
So, assuming the serialization method you are using originally saves all the references you care about, then you'll always be able to restore that object and all its references from that serialized data. Unless, perhaps, you try in five years using a new version of Java that no longer supports that serialization format or something.
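To make that concrete, here is a small sketch (class names invented) of saving and restoring a game state with default Java serialization; everything reachable from the ArrayList comes back, provided each class in the graph implements Serializable:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

public class SaveGameDemo {
    static class Entity implements Serializable {
        String name;
        Entity(String name) { this.name = name; }
    }
    static class GameState implements Serializable {
        ArrayList<Entity> entities = new ArrayList<>();
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        GameState state = new GameState();
        state.entities.add(new Entity("player"));

        // Write the whole object graph: the list and every entity it references.
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("save.bin"))) {
            out.writeObject(state);
        }

        // Read it back; references inside the graph are rebuilt automatically.
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream("save.bin"))) {
            GameState restored = (GameState) in.readObject();
            System.out.println(restored.entities.get(0).name);
        }
    }
}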
Are you asking about the standard Java serialization mechanism? Java serialization writes data in a binary format, not text. Java guarantees serialization/deserialization compatibility between versions as long as you stick to the standard Java library. So yes, in your case serialization will work fine.
However, I'm not sure that standard Java serialization is good for your purposes, because:
It's a completely unreadable format.
If you change languages later, you must fully reimplement the save format.
As an alternative you can use one of the XML serialization libraries for Java. In that case the save file will have a readable format (good for debugging).

How do I replace Java's default deserialization with my own readObject call?

Someone thought it would be a good idea to store Objects in the database in a blob column using Java's default serialization methods.
The structure of these objects is controlled by another group and they changed a field type from BigDecimal to a Long,
but the data in our database remains the same.
Now we can't read the objects back because it causes ClassCastExceptions.
I tried to override it by writing my own readObject method,
but that throws a StreamCorruptedException because the stream was written by the default writeObject method.
How do I make my readObject call behave like Java's default one?
Is there a certain number of bytes I can skip to get to my data?
Externalizable allows you to take full control of serialization/deserialization. But it means you're responsible for writing and reading every field.
Where it gets difficult, though, is when something was written out using the default serialization and you want to read it via Externalizable. (Or rather, it's impossible: if you try to read an object serialized with the default method using Externalizable, it'll just throw an exception.)
If you've got absolutely no control on the output, your only option is to keep two versions of the class: use the default deserialization of the old version, then convert to the new. The upside of this solution is that it keeps the "dirty" code in one place, separate from your nice and clean objects.
Again, unless you want to make things really complicated, your best option is to keep the old class as the "transport" bean and rename the class your code really uses to something else.
If you want to read what's already in your database your only option is to get them to change the class back again, and to institute some awareness that you're relying on the class definition as it was when the class was serialized. Merely implementing your own readObject() call can't fix this, and if the class is under someone else's control you can't do that anyway.
If you're prepared to throw away the existing data you have many other choices starting with custom Serialization, writeReplace()/readResolve(), Externalizable, ... or a different mechanism such as XML.
But if you're going to have third parties changing things whenever they feel like it you're always going to have problems of one kind or another.
BigDecimal to Long sounds like a retrograde step anyway.
Implement the readObject and readObjectNoData methods in your class.
Read the appropriate type using ObjectInputStream.readObject and convert it to the new type.
See the Serializable interface API for details.
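A sketch of that approach; the class and field names are invented, and it assumes the class declares a serialVersionUID matching the one in the old streams (otherwise you get an InvalidClassException before readObject is even reached):

import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.Serializable;
import java.math.BigDecimal;

public class Invoice implements Serializable {
    private static final long serialVersionUID = 1L; // must match the version that wrote the blobs

    private Long amount;                             // used to be a BigDecimal

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        // readFields() hands back whatever object is in the stream without
        // assigning it to the field, so no ClassCastException is triggered.
        ObjectInputStream.GetField fields = in.readFields();
        Object raw = fields.get("amount", null);
        if (raw instanceof BigDecimal) {
            amount = Long.valueOf(((BigDecimal) raw).longValue()); // old rows
        } else {
            amount = (Long) raw;                                   // rows written after the change
        }
    }
}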
More Details
You can only fix this easily if you control the source of the class that was serialized into the blob.
If you do not control this class,
then you have only a few limited and difficult options:
Have the controlling party give you a version of the class that reads the old format and writes the new format.
Write your own form of serialization (i.e. read the blob and convert the bytes to classes yourself) that can read the old format and generate new versions of the classes.
Write your own version of the class in question (removing the other from the class path) which reads the old format and produces some intermediate form (perhaps JSON).
Next you have to do one of these:
Convince the powers that be that the blob technique is shitty and should be done away with; use the current class change as evidence. Almost any technique is better than this. Even writing JSON to the blob column is better.
Stop depending on shitty classes from other people ("shitty" being a judgement I can only suspect, not know, to be true). Instead, create a suite of classes that represent the data in the database and convert from the externally controlled classes to the new data classes before writing to the database.

Benefits of serializing an object instead of using xml

Right now my application saves its objects as XML. The reason I am considering serializing the objects instead is the speed gain. Are there any other benefits of serializing?
There's one downside that has nothing to do with performance: You can't retrieve the serialized data if you ever change the underlying objects. If the data is long-lived, you could be setting yourself up for a rigid object model.
XML has the advantage of leaving open the possibility of "duck typing"; so does JSON, and it's lighter than XML.
Give JSON a try; it's more lightweight.
Object serialisation has given me many headaches in the past, particularly with an ever-changing domain model.

Homemade vs. Java Serialization

I have a certain POJO which needs to be persisted on a database, current design specifies its field as a single string column, and adding additional fields to the table is not an option.
Meaning, the objects need to be serialized in some way. So just for the basic implementation I went and designed my own serialized form of the object, which meant concatenating all its fields into one nice string, separated by a delimiter I chose. But this is rather ugly and can cause problems, say, if one of the fields contains my delimiter.
So I tried basic Java serialization, but from a basic test I conducted, this somehow becomes a very costly operation (building a ByteArrayOutputStream, an ObjectOutputStream, and so on, and the same for deserialization).
So what are my options? What is the preferred way for serializing objects to go on a database?
Edit: this is going to be a very common operation in my project, so overhead must be kept to a minimum and performance is crucial. Also, third-party solutions are nice but irrelevant (and usually generate overhead, which I am trying to avoid).
Elliotte Rusty Harold wrote up a nice argument against using Java object serialization for the objects in his XOM library. The same principles apply to you. The built-in Java serialization is Java-specific, fragile, and slow, and so is best avoided.
You have roughly the right idea in using a String-based format. The problem, as you state, is that you're running into formatting/syntax problems with delimiters. The solution is to use a format that is already built to handle this. If this is a standardized format, then you can also potentially use other libraries/languages to manipulate it. Also, a string-based format means that you have a hope of understanding it just by eyeballing the data; binary formats remove that option.
XML and JSON are two great options here; they're standardized, text-based, flexible, readable, and have lots of library support. They'll also perform surprisingly well (sometimes even faster than Java serialization).
You might try Protocol Buffers; it is an open-source project from Google and is said to be fast (it generates a shorter serialized form than XML and works faster). It also handles the addition of new fields gracefully (it inserts default values).
You need to consider versioning in your solution. Data incompatibility is a problem you will experience with any solution that involves the use of a binary serialization of the Object. How do you load an older row of data into a newer version of the object?
So, the solutions above that involve serializing to name/value pairs are probably the approach you want to use.
One solution is to include a version number as one of field values. As new fields are added, modified or removed then the version can be modified.
When deserializing the data, you can have different deserialization handlers for each version which can be used to convert data from one version to another.
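A rough sketch of that idea, with everything (format, class, field names) invented for illustration: the version is the first token of the stored string, and a handler registered per version turns the remainder into the current object shape.

import java.util.HashMap;
import java.util.Map;

public class VersionedDecoder {
    static class MyPojo { String name; int score; }   // placeholder object

    interface Handler { MyPojo decode(String body); } // one handler per stored version

    private static final Map<Integer, Handler> HANDLERS = new HashMap<>();
    static {
        // Version 1 stored only the name; default the score.
        HANDLERS.put(1, body -> { MyPojo p = new MyPojo(); p.name = body; p.score = 0; return p; });
        // Version 2 stores name|score.
        HANDLERS.put(2, body -> {
            String[] parts = body.split("\\|");
            MyPojo p = new MyPojo(); p.name = parts[0]; p.score = Integer.parseInt(parts[1]); return p;
        });
    }

    public static MyPojo decode(String stored) {
        int sep = stored.indexOf('|');                // leading "<version>|"
        int version = Integer.parseInt(stored.substring(0, sep));
        return HANDLERS.get(version).decode(stored.substring(sep + 1));
    }
}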
XStream or YAML or OGNL come to mind as easy serialization techniques. XML has been the most common, but OGNL provides the most flexibility with the least amount of metadata.
Consider putting the data in a Properties object and using its load()/store() serialization. That's a text-based technique, so it's still readable in the database:
// needs java.util.Properties, java.io.ByteArrayOutputStream and java.io.IOException
public String getFieldsAsString() throws IOException {
    Properties data = new Properties();
    data.setProperty( "foo", this.getFoo() );
    data.setProperty( "bar", this.getBar() );
    // ... one setProperty call per field
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    data.store( out, "" );                                // store() throws IOException
    return new String( out.toByteArray(), "ISO-8859-1" ); // store() always uses this encoding
}
To load from a string, do the same in reverse: create a new Properties object and load() the data into it.
This is better than Java serialization because it's very readable and compact.
If you need support for different data types (i.e. not just String), use BeanUtils to convert each field to and from a string representation.
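For example, something along these lines, assuming a recent Commons BeanUtils on the classpath (the bean here is a placeholder with non-String properties):

import java.util.Map;
import org.apache.commons.beanutils.BeanUtils;

public class BeanUtilsRoundTrip {
    public static class MyPojo {            // placeholder bean
        private int count;
        private double ratio;
        public int getCount() { return count; }
        public void setCount(int count) { this.count = count; }
        public double getRatio() { return ratio; }
        public void setRatio(double ratio) { this.ratio = ratio; }
    }

    public static void main(String[] args) throws Exception {
        MyPojo original = new MyPojo();
        original.setCount(3);
        original.setRatio(0.5);

        Map<String, String> asStrings = BeanUtils.describe(original); // every property as a String
        asStrings.remove("class");                                     // drop the getClass() pseudo-property

        MyPojo restored = new MyPojo();
        BeanUtils.populate(restored, asStrings);                       // converted back to int/double
        System.out.println(restored.getCount() + " / " + restored.getRatio());
    }
}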
I'd say your initial approach is not all that bad if your POJO consists of Strings and primitive types. You could enforce escaping of the delimiter to prevent corruption. Also, if you use Hibernate, you can encapsulate the serialization in a custom type.
If you do not mind another dependency, Hessian is supposedly a more efficient way of serializing Java objects.
How about the standard JavaBeans persistence mechanism:
java.beans.XMLEncoder
java.beans.XMLDecoder
These are able to persist Java POJOs to XML and recreate them from it later. From memory, the output looks (something) like...
<object class="java.util.HashMap">
  <void method="put">
    <string>Hello</string>
    <float>1</float>
  </void>
</object>
You have to provide PersistenceDelegate classes so that it knows how to persist user-defined classes. Assuming you don't remove any public methods, it is resilient to schema changes.
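A round trip with these classes looks something like this (the bean is a placeholder; XMLEncoder wants a public class with a no-arg constructor and getter/setter pairs):

import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

public class XmlEncoderDemo {
    public static class Settings {                   // placeholder JavaBean
        private String theme = "dark";
        public String getTheme() { return theme; }
        public void setTheme(String theme) { this.theme = theme; }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(buffer)) {
            Settings settings = new Settings();
            settings.setTheme("light");              // only non-default properties get written
            encoder.writeObject(settings);
        }

        try (XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(buffer.toByteArray()))) {
            Settings restored = (Settings) decoder.readObject();
            System.out.println(buffer + " -> " + restored.getTheme());
        }
    }
}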
You can optimize the serialization by externalizing your object (implementing Externalizable). That will give you complete control over how it is serialized and improve the performance of the process. This is simple to do as long as your POJO is simple (i.e. doesn't have references to other objects); otherwise you can easily break serialization.
tutorial here
EDIT: Not implying this is the preferred approach, but you are very limited in your options if it is performance critical and you can only use a string column in the table.
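For reference, a minimal Externalizable sketch (class invented) of what taking over the serialization looks like:

import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

// Illustrative only: a flat POJO with no references to other objects.
public class Measurement implements Externalizable {
    private String unit;
    private double value;

    public Measurement() { }                      // a public no-arg constructor is required

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeUTF(unit == null ? "" : unit);   // you decide the exact on-disk layout
        out.writeDouble(value);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        unit = in.readUTF();                      // must be read in exactly the order written
        value = in.readDouble();
    }
}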
If you are using a delimiter, you could use a character which you know will never occur in your text, such as \0, or one of the special symbols at http://unicode.org/charts/symbols.html
However, the time spent sending the data to the database and persisting it is likely to be much larger than the cost of serialization. So I would suggest starting with something simple and easy to read (like XStream), then look at where your application is spending most of its time and optimise that.
I have a certain POJO which needs to be persisted on a database, current design specifies its field as a single string column, and adding additional fields to the table is not an option.
Could you create a new table and put a foreign key into that column!?!? :)
I suspect not, but let's cover all the bases!
Serialization:
We've recently had this discussion so that if our application crashes we can resurrect it in the same state as before. We essentially dispatch a persistence event onto a queue, which then grabs the object, locks it, and serializes it. This seems pretty quick. How much data are you serializing? Can you make any variables transient (i.e. cached variables)? Can you consider splitting up your serialization?
Beware: what happens if your objects change (locking) or your classes change (different serialization id)? You'll need to upgrade everything that's serialized to the latest classes. Perhaps you only need to store this overnight, so it doesn't matter?
XML:
You could use something like XStream to achieve this. Building something custom is doable (a nice interview question!), but I'd probably not do it myself. Why bother? Remember the cases where you have cyclic links or reference the same object more than once: rebuilding the objects isn't quite so trivial.
Database storage:
If you're using Oracle 10g to store blobs, upgrade to the latest version, since c/blob performance is massively improved. If we're talking about large amounts of data, then perhaps zip the output stream?
Is this a realtime app, or will there be a second or two of pause during which you can safely persist the actual object? If you've got time, you could clone it and then persist the clone on another thread. What's the persistence for? Is it critical that it's done inside a transaction?
Consider changing your schema. Even if you find a quick way to serialize a POJO to a string, how do you handle different versions? How do you migrate the database from X to Y? Or worse, from A to D? I have seen issues where we stored a serialized object in a BLOB field and had to migrate a customer across multiple versions.
Have you looked into JAXB? It is a mechanism by which you can define a suite of java objects that are created from an XML Schema. It allows you to marshal from an object hierarchy to XML or unmarshal the XML back into an object hierarchy.
I'll second the suggestion to use JAXB, or possibly XStream (the former is faster, the latter has more focus on the object serialization part).
Plus, I'll further suggest a decent JSON-based alternative, Jackson (http://jackson.codehaus.org/Tutorial), which can fully serialize/deserialize beans to JSON text to store in the column.
Oh, and I absolutely agree that you should not use Java binary serialization under any circumstances for long-term data storage. The same goes for Protocol Buffers; both are too fragile for this purpose (they are better suited to data transfer between tightly coupled systems).
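For reference, a minimal Jackson round trip looks something like this (the bean is a placeholder, and it uses the newer com.fasterxml package names rather than the old codehaus ones from the tutorial link):

import com.fasterxml.jackson.databind.ObjectMapper;

public class JacksonDemo {
    public static class MyPojo {                     // placeholder bean
        public String name = "widget";
        public int quantity = 7;
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();

        String json = mapper.writeValueAsString(new MyPojo()); // this goes into the string column
        MyPojo back = mapper.readValue(json, MyPojo.class);    // and this reads it back out

        System.out.println(json + " -> " + back.name + ", " + back.quantity);
    }
}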
You might try Preon. Preon aims to be to binary encoded data what Hibernate is to relational databases and JAXB to XML.

Serialize Java objects into Java code

Does somebody know of a Java library which serializes a Java object hierarchy into Java code that recreates that object hierarchy? Like object/XML serialization, except that the output format is not binary/XML but Java code.
Serialised data represents the internal data of objects. There isn't enough information to work out what methods you would need to call on the objects to reproduce the internal state.
There are two obvious approaches:
Encode the serialised data in a literal String and deserialise that (a sketch of this follows below).
Use java.beans XML persistence, which should be easy enough to process with your favourite XML->Java source technique.
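A sketch of the first option, generating a line of Java that embeds the serialised bytes as a Base64 literal (the helper and the shape of its output are invented for illustration):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;
import java.util.Base64;

public class CodeGenerator {
    // Serialise the object and emit Java source that decodes and deserialises it again.
    public static String toJavaSnippet(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        String literal = Base64.getEncoder().encodeToString(bytes.toByteArray());
        return "Object restored = new java.io.ObjectInputStream(\n"
             + "    new java.io.ByteArrayInputStream(\n"
             + "        java.util.Base64.getDecoder().decode(\"" + literal + "\"))).readObject();";
    }

    public static void main(String[] args) throws IOException {
        System.out.println(toJavaSnippet(new java.util.ArrayList<>(Arrays.asList("a", "b"))));
    }
}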
I am not aware of any libraries that will do this out of the box, but you should be able to take one of the many object-to-XML serialisation libraries and customise the backend code to generate Java. It would probably not be much code.
For example, a quick google turned up XStream. I've never used it, but it seems to support multiple backends other than XML, e.g. JSON. You can implement your own writer and just write out the Java code needed to recreate the hierarchy.
I'm sure you could do the same with other libraries, in particular if you can hook into a SAX event stream.
See:
HierarchicalStreamWriter
Great question. I was thinking about serializing objects into Java code to make testing easier. The use case would be to load some data into a DB, then generate the code that creates the object, and later use this code in test methods to initialize data without needing to access the DB.
It is somewhat true that the object state doesn't contain enough info to know how it was created and transformed; however, for simple Java beans there is no reason why this shouldn't be possible.
Do you feel like writing a small library for this purpose? I'll start coding soon!
XStream is a serialization library I used for serialization to XML. It should be possible and rather easy to extend it so that it writes Java code.
