Serializing objects with changing class source code - java

Note: Due to the lack of questions like this on SO, I've decided to put one up myself as a Q&A
Serializing objects (using an ObjectOutputStream and an ObjectInputStream) is a method for storing an instance of a Java Object as data that can be later deserialized for use. This can cause problems and frustration when the Class used to deserialize the data does not remain the same (source-code changes; program updates).
So how can an Object be serialized and deserialized with an updated / downgraded version of a Class?

Here are a few common ways of serializing an object that can be deserialized in a backwards-compatible way.
1. Store the data as JSON, using import and export methods designed to save every field needed to recreate the instance. This can be made backwards-compatible by including a version key, so that an upgrade routine can be run when the stored version is too low. A common library for this is Google's Gson, which can map Java objects to and from JSON as well as manipulate JSON trees directly.
2. Use the built-in Java Properties class in a way similar to the method described above. A Properties object can later be written to a stream as a regular Java properties file (store()) or as XML (storeToXML()).
3. Sometimes a simple object can be represented with key-value pairs, and storing it in a JSON, XML, or Properties file is either too complicated or unnecessary (overkill, one could say). In this case, an effective approach is to use ObjectOutputStream to serialize a HashMap<String,Object> whose String keys name the fields and whose Object values hold their contents. This stores all of the object's fields, leaves room for a version key, and remains very flexible.
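A minimal sketch of the third approach, assuming a made-up schema in which version 1 stored a "name" key and version 2 renamed it to "fullName". The class name VersionedMapStore, the key names, and the upgrade rule are all hypothetical; only JDK classes end up in the stream, so changes to your own class cannot break deserialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;

class VersionedMapStore {
    static final int CURRENT_VERSION = 2; // hypothetical schema version

    // Serialize the fields as a plain HashMap, stamped with a version key.
    static byte[] write(Map<String, Object> fields, int version) throws IOException {
        HashMap<String, Object> copy = new HashMap<>(fields);
        copy.put("version", version);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(copy);
        }
        return bytes.toByteArray();
    }

    // Deserialize the map and upgrade it if it was written by an older version.
    @SuppressWarnings("unchecked")
    static Map<String, Object> read(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            Map<String, Object> map = (HashMap<String, Object>) in.readObject();
            int version = (Integer) map.get("version");
            if (version < 2) {
                // Hypothetical upgrade step: version 1 called the field "name".
                map.put("fullName", map.remove("name"));
                map.put("version", CURRENT_VERSION);
            }
            return map;
        }
    }
}
```

The object itself would then offer a constructor or factory method that takes the upgraded map, so the upgrade logic stays in one place.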
Note: Although serializing an object using the ObjectOutputStream for persistent storage is normally considered bad practice, it can still be used as long as the class's source code remains the same.
A note about versioning: Changes to a class can be made without breaking deserialization through an ObjectInputStream as long as they are compatible changes. As mentioned in the Versioning of Serializable Objects chapter of the Object Serialization Specification:
A compatible change is a change that does not affect the contract
between the class and its callers.
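For compatible changes to keep working, the stream's serialVersionUID has to match the one of the class on the class path. A small sketch, assuming a hypothetical Name class: declaring the UID explicitly keeps it stable across edits, whereas the default is computed from the class's structure and can change when the class is edited.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class Name implements Serializable {
    // Pinned explicitly so that compatible edits do not change the computed default.
    private static final long serialVersionUID = 1L;

    String first;
    String last;
    // Hypothetical field added in a later revision: streams written before it
    // existed lack it, so it is left at its default value (null) on deserialization.
    String middle;
}

class SuidCheck {
    // Returns the UID that serialization will write for, and expect from, this class.
    static long suidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }
}
```

Adding a field is one of the changes the specification lists as compatible; deleting a field or changing a field's type is not.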

Related

Java object serializer which is based on field order

Does anyone know a reflection-based Java object graph serializer which stores fields identified by field order instead of by field name? This is what I want to do:
load a JSON file with Jackson JSON deserializer
save it in binary format which doesn't contain the field names...
load the previously serialized object with the OBFUSCATED version of the application.
The serialized content won't be transferred to any other JVM. Excluding serialized POJOs from obfuscation is not an option for now.
Protostuff by default orders fields from top to bottom as defined in your pojo. You can additionally control the field number using annotations.
Note that the order is not guaranteed on some (non-sun) vms (especially dalvik).
Sun jdk6 or higher is recommended for guaranteed ordering.

Java deserialization of old Object

I'm having a problem when serializing and deserializing my objects in my project. I'm writing the object to a name.dat file.
However, whenever I make a change in the Name class I can no longer deserialize it, since the stored object and the new class are treated as two different types.
Is there any way around this?
Your best options are:
Don't change your classes :-)
Throw away any serialized objects each time you change your classes.
Don't use Java object serialization.
Given that 1) and 2) are probably out of the question, option 3) should be given serious consideration. There are a variety of alternatives to Java serialization, depending on the nature of the data you are persisting. These include:
Using Java properties files
Storing the data in a classical database (using SQL and the JDBC API)
Using an object-relational database mapping such as Hibernate
Using XML or JSON and a "binding" technology so that you can serialize / deserialize POJOs.
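As a rough illustration of the first alternative, here is a sketch that writes two fields of a hypothetical person record to a properties document in XML form. The class name PropertiesStore and the keys "version", "name", and "age" are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Properties;

class PropertiesStore {
    // Write the fields as string properties, with a version key for later upgrades.
    static byte[] save(String name, int age) throws IOException {
        Properties props = new Properties();
        props.setProperty("version", "1");
        props.setProperty("name", name);
        props.setProperty("age", Integer.toString(age));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        props.storeToXML(out, "person"); // second argument is a free-form comment
        return out.toByteArray();
    }

    // Read the properties back; the caller converts strings to field types.
    static Properties load(byte[] data) throws IOException {
        Properties props = new Properties();
        props.loadFromXML(new ByteArrayInputStream(data));
        return props;
    }
}
```

Because the file only contains names and string values, renaming or restructuring the Java class does not invalidate it; the loading code decides how old keys map onto new fields.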
Finally, it is possible to implement class versioning using Java object serialization. However, it is tricky. And if you are continually changing the classes, then it is not going to be pleasant. Start by reading Versioning of Serializable Objects.

Use of Serializable other than Writing& Reading object to/from File

In which cases is it good coding practice to implement Serializable, other than for writing and reading objects to/from a file? In a project, I went through code where a class implements Serializable even though nothing in that class or project writes or reads objects to/from a file.
If the object leaves the JVM it was created in, the class should implement Serializable.
Serialization is a method by which an object can be represented as a sequence of bytes that includes the object's data as well as information about the object's type and the types of data stored in the object.
After a serialized object has been written into a file, it can be read from the file and deserialized; that is, the type information and the bytes that represent the object and its data can be used to recreate the object in memory.
This is the main purpose of deserialization: to recover the object's data, its type, and the types of its variables from a (loosely speaking) written representation of the object. Serialization is what makes this possible in the first place.
So whenever your object may leave the JVM the program is executing in, you should make the class implement Serializable.
This covers reading and writing objects to files, as well as passing an object over the internet or any other type of connection. Whenever the object leaves the JVM it was created in, it should implement Serializable, so that it can be serialized and then deserialized once it enters another JVM or re-enters the same one.
Many good reads at :
1: Why Java needs Serializable interface?
2: What is the purpose of Serialization in Java?
Benefits of serialization:
To persist data for future use.
To send data to a remote computer using client/server Java technologies like RMI, socket programming, etc.
To flatten an object into an array of bytes in memory.
To send objects between the servers in a cluster.
To exchange data between applets and servlets.
To store user session in Web applications
To activate/passivate enterprise java beans.
You can refer to this article for more details.
If you ever expect your objects to be used as data in an RMI setting, they should be serializable: RMI needs an object either to be Serializable (if it is to be serialized and sent to the remote side) or to be a UnicastRemoteObject (if you need a remote reference).
In earlier versions of Java (before Java 5), marker interfaces were a good way to declare metadata, but we now have annotations, which are a more powerful way to declare metadata for classes.
Annotations are flexible, and each one can be configured to say whether its information is kept only in the bytecode or is also available at run time.
If you are not going to read and write objects, the only purpose left for Serializable is declaring metadata for the class, and for that I personally suggest using an annotation instead.
An annotation is a better choice than a marker interface, and JUnit is a perfect example, e.g. @Test for marking tests. The same could also be achieved with a Test marker interface.
Another example showing that annotations are the better choice: @ThreadSafe looks a lot better than implementing a ThreadSafe marker interface.
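To make the comparison concrete, here is a small sketch with a hand-written @ThreadSafe annotation and an equivalent marker interface. Both types, and the AnnotatedCounter and MarkedCounter classes, are defined here purely for illustration and are not taken from any library.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// RUNTIME retention keeps the annotation visible to reflection;
// CLASS or SOURCE would drop it before run time.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ThreadSafe {}

// Marker-interface equivalent: no methods, it only tags the type.
interface ThreadSafeMarker {}

@ThreadSafe
class AnnotatedCounter {}

class MarkedCounter implements ThreadSafeMarker {}

class MetadataCheck {
    // Annotation metadata is queried through reflection on the class.
    static boolean viaAnnotation(Class<?> type) {
        return type.isAnnotationPresent(ThreadSafe.class);
    }

    // Marker metadata is queried through the type system.
    static boolean viaMarker(Object o) {
        return o instanceof ThreadSafeMarker;
    }
}
```

One practical difference: the marker interface becomes part of the class's type and is inherited by every subclass, while an annotation does not change the type hierarchy.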
There are other cases in which you want to send an object by value instead of by reference:
Sending objects over the network.
Can't really send objects by reference here.
Multithreading, particularly in Android
Android uses Serializable/Parcelable to send information between Activities. It has something to do with memory mapping and multithreading. I don't really understand this though.
In addition to Martin C's answer, I want to add that if you use Serializable, you can easily load your object graph into memory. For example, say you have a Student class which has a Department. If you serialize your Student, the Department is saved as well. Moreover, it also allows you:
1. to rename variables in a serialized class while maintaining backwards-compatibility.
2. to access data from deleted fields in a new version (in other words, change the internal representation of your data while maintaining backwards-compatibility).
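One way to get the first point with standard serialization is to decouple the stream's field names from the source's field names using serialPersistentFields together with putFields() and readFields(). The sketch below assumes a Student whose field was originally called name and has since been renamed to fullName in the source; both names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamField;
import java.io.Serializable;

class Student implements Serializable {
    private static final long serialVersionUID = 1L;

    // The stream keeps using the old field name "name" ...
    private static final ObjectStreamField[] serialPersistentFields = {
        new ObjectStreamField("name", String.class)
    };

    // ... while the source now calls the field fullName.
    private String fullName;

    Student(String fullName) { this.fullName = fullName; }

    String getFullName() { return fullName; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        ObjectOutputStream.PutField fields = out.putFields();
        fields.put("name", fullName);   // write under the persistent name
        out.writeFields();
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        ObjectInputStream.GetField fields = in.readFields();
        fullName = (String) fields.get("name", null); // null if absent from the stream
    }
}

class RenameDemo {
    // Serialize to memory and read the object back.
    static Student roundTrip(Student s) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(s);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Student) in.readObject();
        }
    }
}
```

The same readFields() hook covers the second point: a value that no longer has a field in the class can still be fetched from the stream by its old name and converted into the new representation.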
Some frameworks/environments might depend upon data objects being serializable. For example in J2EE, the HttpSession attributes must be serializable in order to benefit from Session Persistence. Also RMI and other dark ages artifacts use serialization.
Therefore, though you might not immediately need your data objects to be serializable, it might make sense to declare Serializable just in case (It is almost free, unless you need to go through the pain of declaring readObject/writeObject methods)

Re-serializing JBPM process variables directly via MySQL

I'm working with an application that uses JBPM 3.1 and MySQL. The core problem is that there are process instances with variables that contain an older version of an external, non-JBPM Serializable class. When the main application is upgraded, these process instances cause JBPM to throw an exception, since the SUID (serialVersionUID) of that class has changed in the main application.
I believe I have a method for fixing the deserialization process using the technique described in the following:
How to deserialize an object persisted in a db now when the object has different serialVersionUID
However, my problem is figuring out where in MySQL JBPM stores process instance variables, so I can write a program that iterates over all the variables for all instances and reserializes them so the offending class has the new SUID and JBPM can operate against the processes.
From my initial look at the JBPM tables, it appears that JBPM_BYTEARRAY and/or JBPM_BYTEBLOCK may be the tables to operate against. However, I'm unsure how to proceed. I'm guessing each process variable is stored in a wrapping container class. Is that class org.jbpm.context.exe.VariableInstance? Or is it something else?
I figure that if I have the proper jar files on the class path, and I know which main class JBPM uses to store process variables in MySQL, I can deserialize the object (which will fix the SUID problem with the embedded problem class instance) and reserialize it back. Since the JBPM documentation mentions converters, I'm unsure whether I have to replicate the conversion process JBPM performs when deserializing, or whether standard Java deserialization is enough.
Some analysis of JBPM indicates that binary data may be split across multiple records. This may not be the case for MySQL itself, but the JBPM code is written to support multiple RDBMSs, and some have limits on the size of binary records.
Since the question earned me a tumbleweed reward, I was not going to get a usable MySQL-based answer within the deadline I had to meet, so I reconsidered the core problem and the context in which it occurs, and came up with a solution that avoided the need to perform direct MySQL operations.
The main application in question already has some custom modifications to JBPM, so the solution I implemented altered the JBPM source that performs the deserialization of process instance variables. This avoids the need to deal with the JBPM logic that extracts the serialized binary data from the RDBMS.
In the class org.jbpm.context.exe.converter.SerializableToByteArrayConverter, I modified the code to use a custom ObjectInputStream class that returns the latest SUID of a class. The technique of just replacing the descriptor with the latest version of the class, as described in the post referenced in the question, does not work if the new class includes new fields. Doing so causes an end-of-data exception, since the base deserialization code tries to read the "new" fields from the old serialized version of the class.
Therefore, I just need to replace the SUID, but keep all other parts of the descriptor the same. Since the JDK does not make ObjectStreamClass extensible, I created a subclass of ObjectInputStream that returns the new SUID based upon a given calling pattern the Java library executes against ObjectInputStream when deserializing data.
The pattern: When reading the header of a deserialized object, the readUTF() function is called (to obtain the class name) followed by a readLong() call. Therefore, if this calling sequence occurs, and if the readUTF() returned the class name I want to change the SUID of, I return the newer SUID in the readLong() call.
The custom code reads a configuration file that specifies class names and associated SUIDs that should be mapped to the latest SUIDs for the classes listed. This allows mapping of alternate classes in the future w/o modifying the custom code.
Note, this approach is applicable to general deserialization operations, where one needs to map old SUIDs to the latest SUIDs of specified classes, and leaving the other parts of the serialized class descriptor alone to avoid end-of-data problems if the newer class definition includes additional field declarations not present in the older class definition.
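The calling pattern described above could be sketched roughly as follows. This is not the code from the JBPM modification: the class name SuidMappingInputStream and the in-memory map standing in for the configuration file are assumptions, and the approach relies on the JDK's descriptor-reading code calling the public readUTF() and readLong() methods of the stream, which is an implementation detail rather than a documented contract. Account is a hypothetical class whose SUID was bumped to 2.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.Serializable;
import java.util.Map;

class SuidMappingInputStream extends ObjectInputStream {
    private final Map<String, Long> latestSuids; // class name -> SUID of the current class
    private String lastUtf;                      // result of the most recent readUTF()

    SuidMappingInputStream(InputStream in, Map<String, Long> latestSuids) throws IOException {
        super(in);
        this.latestSuids = latestSuids;
    }

    @Override
    public String readUTF() throws IOException {
        lastUtf = super.readUTF();   // remember it in case a readLong() follows
        return lastUtf;
    }

    @Override
    public long readLong() throws IOException {
        long value = super.readLong();
        String name = lastUtf;
        lastUtf = null;              // the pattern is consumed by this call
        Long replacement = (name == null) ? null : latestSuids.get(name);
        // Swap in the latest SUID only when the preceding string named a mapped class.
        return (replacement != null) ? replacement : value;
    }
}

// Hypothetical class whose SUID changed from 1 to 2 between releases.
class Account implements Serializable {
    private static final long serialVersionUID = 2L;
    String owner;

    Account(String owner) { this.owner = owner; }
}
```

A limitation of this simplified version: it only tracks readUTF() and readLong(), so application data that happens to contain a mapped class name as a UTF string followed by a long could be altered as well.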
Do you know whether you made changes that break the contract, or did you simply add new fields? If you simply added new fields, just declare the prior serialVersionUID in the class. Otherwise, you will have to read all the variables that have a different serialVersionUID and save them under the new class, because you are the only person who knows how to convert them.

will serialized object contains metadata?

When we deserialize an object, it is difficult to understand how the object is retrieved in a certain state. Does the serialized form contain any metadata about the object?
When an object is serialized, the object's class is written to the stream along with the contents of the object's non-transient fields. The deserializer will attempt to load that class (and there are several mechanisms for it to do that), then populate the non-transient fields.
The protocol spec is here: http://java.sun.com/javase/6/docs/platform/serialization/spec/protocol.html
If by "metadata" you're referring to annotations on the class, then no, they are not serialized with the object itself, but are available on the class. If you mean something else, please describe what you mean.
At a high level, the serialization stream contains the data inside the object and the name of the classes involved, as well as a version number to ensure the class didn't change. It uses that information to make a new instance of an object and fills it with the same data as the old instance. It does this avoiding all of the usual constraints on object creation (the need to call constructors, for example).
One point that confuses people is the idea that the class definition itself is serialized. It is not; only the data the object contains is written, with enough information to know which classes to instantiate when it is deserialized. When the object is deserialized, it has to match the existing class on the class path, because the serialized binary data does not contain the class.
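These points can be checked with a short experiment. The sketch below uses a hypothetical Session class and a helper that searches the raw stream bytes for a piece of text; it is meant as an illustration, not a tool for parsing the protocol.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

class Session implements Serializable {
    private static final long serialVersionUID = 1L;
    static int constructorCalls = 0;   // static state is never serialized

    String user;
    transient String password;         // transient: left out of the stream

    Session(String user, String password) {
        constructorCalls++;
        this.user = user;
        this.password = password;
    }
}

class StreamContents {
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // Crude check: does the ASCII text occur anywhere in the raw bytes?
    static boolean containsText(byte[] data, String text) {
        return new String(data, StandardCharsets.ISO_8859_1).contains(text);
    }
}
```

Serializing new Session("ada", "secret") produces bytes that contain the class name and the value "ada" but not "secret", and deserializing them returns a Session without running the Session constructor, so constructorCalls does not increase.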
