I need to serialize a Java object which might change later on, e.g. some of its variables may be added or removed. What are the pitfalls of such an approach, and what precautions should I take if this remains the only way out?
You definitely need to add a serialVersionUID field right from the beginning.
Changes might make the serialized objects incompatible. Adding and removing fields can violate class contracts (up to the point of exceptions being thrown): when you deserialize an instance written by a class version that lacked a field the current version expects, that field is set to its type's default value, so the most likely problems are NullPointerExceptions. This can be averted by implementing readObject() and writeObject(). Other changes (such as changing a field's type) can cause the deserialization to fail entirely.
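To make the readObject() advice concrete, here is a minimal sketch. The class Account and its fields are made up for illustration, and a null nickname stands in for "the field was missing from an older stream", since both cases arrive as the type's default value.

```java
import java.io.*;

// Hypothetical class: imagine "nickname" was added in a later version.
class Account implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String owner;
    private String nickname; // absent from streams written by the imagined older version

    Account(String owner, String nickname) {
        this.owner = owner;
        this.nickname = nickname;
    }

    String owner() { return owner; }
    String nickname() { return nickname; }

    // Called by the serialization runtime while deserializing.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();   // restores the fields that are present in the stream
        if (nickname == null) {   // a missing field shows up as the default value
            nickname = owner;     // re-establish the "nickname is never null" contract
        }
    }
}

class AccountDemo {
    // Serializes and immediately deserializes, standing in for "write now, read later".
    static Account roundTrip(Account account) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(account);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Account) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new Account("alice", null)).nickname());
    }
}
```

The fallback value (reusing owner) is just one possible choice; the point is that the repair happens in one place instead of every caller null-checking.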
As Michael pointed out, Java provides some support for serialization with java.io.Serializable. The main problem with the Java support is that versioning is clunky and requires the user to deal with it.
Instead I would recommend something like Google's Protocol Buffers or Apache Thrift. For both you define the object in a very simple language and then they will generate the serialization code for you. Both also handle the versioning for you, so that you don't have to worry about whether you are reading an old or a new version of the object.
For example, suppose you have a type foo with a field bar, and you write a bunch of foo objects to disk. Some time later you add a field baz to foo and write a few more foo objects to disk. When you read them back they will all be foo objects; it will seem as if the original foo objects simply never set their baz field.
I suppose the short answer would be that you will have to implement some sort of custom deserialization process that knows about the changes and deserializes older versions of an object correctly. You should also include the serialVersionUID field, which will keep track of your version and will help you find out if a serialized object is an old version. You can read more about this here
When you know that your serialized object will change in the future, you should create a new serialized object in another namespace instead of changing an existing one.
And adding a serialVersionUID like Michael described is also a ToDo.
What happens if I serialize a Map (or List) with one Java version and try to deserialize it with another Java version, where the serialVersionUID has changed? I suppose it will fail.
If you create a lib for others to use what will be the preferred way of serializing objects, using Java Objects like Map, List or using an array of self made objects?
e.g.
List<MyObject> or MyObject[]?
Map<String, MyObject> or MyObject2[] (MyObject2 contains the key and MyObject)?
If you control the class and did not change the serialVersionUID, deserializing an instance written by an older version of the class is also possible. For this, Java provides a concept called binary compatibility. Most of the flexibility of binary compatibility comes from the late binding of symbolic references for the names of classes, interfaces, fields, and methods:
http://docs.oracle.com/javase/7/docs/platform/serialization/spec/version.html
So core Java classes, e.g. HashMap, ArrayList, Vector..., will remain deserializable even if the class evolves in a future version of Java.
If you wish to control versioning in your own class, you simply have to provide the serialVersionUID field manually and ensure it is always the same, no matter what changes you make to the classfile.
See also this article:
http://www.oracle.com/technetwork/articles/java/javaserial-1536170.html
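If you want to see which serialVersionUID a class actually ends up with, ObjectStreamClass can tell you. A small sketch, where Pinned and Unpinned are made-up classes and 42L is an arbitrary value:

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class Pinned implements Serializable {
    // Arbitrary value; what matters is that it stays the same across versions.
    private static final long serialVersionUID = 42L;
    String name;
}

class Unpinned implements Serializable {
    String name; // no serialVersionUID declared, so a value is computed from the class
}

class SuidDemo {
    // Returns the serialVersionUID the serialization runtime uses for the given class.
    static long suidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(suidOf(Pinned.class));   // the declared value
        System.out.println(suidOf(Unpinned.class)); // a computed value, sensitive to class changes
    }
}
```

The JDK also ships a serialver command line tool that prints the same information.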
Yes, you are correct: deserialization with a changed serialVersionUID will fail. The JDK version doesn't matter here.
If you create a lib for others to use what will be the preferred way
of serializing objects, using Java Objects like Map, List or using an
array of self made objects?
You can serialize objects to a more portable plain-text format (e.g. JSON, XML). You may take a look at JAXB or XStream.
But keep in mind that the main use of serialization is to transfer objects over the network. If you would like to store some state, you should typically use a database. Serialization to bytes is useful mainly for short-lived objects (because, as you noticed, the object may change, and thus the serialVersionUID may change as well).
Hope it helps.
For a class that implements the Serializable interface, there are 2 ways to define which specific fields get streamed during serialization:
By default all non-static, non-transient fields that implement Serializable are preserved.
By defining ObjectStreamField[] serialPersistentFields and explicitly declaring the specific fields to save.
I wonder, what is the advantage of the second method except for the ability to define specific fields order?
The 'advantage' is that it does what it says in the Javadoc: defines which fields are serialized. Without it, all non-transient non-static fields are serialized. Your choice.
The advantage is you can conditionally populate ObjectStreamField at runtime albeit only once per JVM lifecycle to determine which fields should be serialized.
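// The field must be named serialPersistentFields; the serialization runtime ignores any other name.
private static final ObjectStreamField[] serialPersistentFields;
static {
    // code to init serialPersistentFields (runs once, when the class is initialized)
}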
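For reference, here is a minimal sketch of what declaring the field does in practice. Session, user and token are made-up names; the list is fixed at compile time here rather than computed in a static block.

```java
import java.io.*;

class Session implements Serializable {
    private static final long serialVersionUID = 1L;

    // Only the fields listed here are part of the serialized form.
    private static final ObjectStreamField[] serialPersistentFields = {
        new ObjectStreamField("user", String.class)
    };

    String user;
    String token; // not listed above, so it is not written although it is not transient

    Session(String user, String token) {
        this.user = user;
        this.token = token;
    }
}

class SessionDemo {
    // Serializes and immediately deserializes the given session.
    static Session roundTrip(Session session) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(session);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Session) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Session copy = roundTrip(new Session("bob", "secret"));
        System.out.println(copy.user + " / " + copy.token);
    }
}
```

After the round trip, user is restored and token comes back as null, the same result you would get from marking token transient.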
Luckily, I'm actually writing this up right now.... Besides the advantages mentioned (and I don't know much about unshared), writing your own output format seems to have the following advantages:
Allows conditional output (different uses for serialization, such as persistence and copying, can serialize different parts of the object).
Should be faster, use less memory, and in some cases use less disk than the default mechanism (this is from Bloch's Effective Java 2).
Allows you to rename variables in a serialized class while maintaining backwards-compatibility.
Allows you to access data from deleted fields in a new version (in other words, change the internal representation of your data while maintaining backwards-compatibility).
I've seen the documentation you're quoting, and mentioning just those 2 options is a bit misleading and leaves quite a bit out: you can customize your serialization format in 2 ways, by using the ObjectOutput/InputStream interface to write and read fields in a particular order (described in Bloch), and using the PutField and GetField classes to write and read fields by name. You can use serialPersistentFields as your quote mentions to extend this second method, but it's not required unless you need to read or write data with a name which is not a member variable name.
There's a 3rd way to control the format as well, using the Externalizable interface, though I haven't explored that much. And some of the advantages can also be gotten through Serialization Proxies (see Bloch).
Anyone feel free to correct me on details if I missed anything.
In serialPersistentFields you can specify fields that are not necessarily present in the class anymore.
See for example the JDK class java.math.BigInteger, where several fields are read and written which don't exist anymore in the class. These obsolete fields are still read and written for compatibility with older versions. The reading and writing of these fields is handled by the readObject() and writeObject() methods.
See also
http://docs.oracle.com/javase/7/docs/platform/serialization/spec/serial-arch.html#6250
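The same pattern in miniature, as a sketch rather than what BigInteger actually does: the made-up class Customer is imagined to have renamed its field from name to fullName, while the stream keeps using the old name through PutField and GetField.

```java
import java.io.*;

class Customer implements Serializable {
    private static final long serialVersionUID = 1L;

    // The stream still uses the old field name "name", which no longer exists in the class.
    private static final ObjectStreamField[] serialPersistentFields = {
        new ObjectStreamField("name", String.class)
    };

    private String fullName; // renamed in this imagined newer version

    Customer(String fullName) { this.fullName = fullName; }

    String fullName() { return fullName; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        ObjectOutputStream.PutField fields = out.putFields();
        fields.put("name", fullName); // write under the legacy name
        out.writeFields();
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        ObjectInputStream.GetField fields = in.readFields();
        // The second argument is the fallback used when the stream has no such field.
        fullName = (String) fields.get("name", (Object) null);
    }
}

class CustomerDemo {
    // Serializes and immediately deserializes the given customer.
    static Customer roundTrip(Customer customer) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(customer);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Customer) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the names in the stream are decoupled from the member variables, old and new class versions can keep exchanging data after the rename.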
I haven't tried this yet, but it seems risky. The case I'm thinking of is instrumenting simple VO classes with JiBX. These VOs are going to be serialized over AMF and possibly other schemes. Can anyone confirm or deny my suspicions that doing behind-the-back stuff like bytecode enhancement might mess something up in general, and provide some background information as to why? Also, I'm interested in the specific case of JiBX.
Behind the scenes, serialization uses reflection. Your bytecode manipulation is presumably adding fields. So, unless you mark these fields as transient, they will get serialised just like normal fields.
So, provided you have performed the same bytecode manipulation on both sides, you'll be fine.
If you haven't you'll need to read the serialisation documentation to understand how the backwards compatibility features work. Essentially, I think you can send fields that aren't expected by the receiver and you're fine; and you can miss out fields and they'll get their default values on the receiving end. But you should check this in the spec!
If you're just adding methods, then they have no effect on serialisation, unless they are things like readResolve(), etc. which are specifically used by the serialisation mechanism.
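A quick way to convince yourself of the transient part is a round trip like the sketch below. ValueObject and toolState are made-up names; toolState stands in for whatever field an enhancer might add, and it deliberately holds a non-serializable object.

```java
import java.io.*;

class ValueObject implements Serializable {
    private static final long serialVersionUID = 1L;

    String payload;
    transient Object toolState; // stand-in for a tool-added field; skipped because it is transient
}

class ValueObjectDemo {
    // Serializes and immediately deserializes the given object.
    static ValueObject roundTrip(ValueObject value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (ValueObject) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ValueObject vo = new ValueObject();
        vo.payload = "data";
        vo.toolState = new Object(); // not Serializable, but never written
        ValueObject copy = roundTrip(vo);
        System.out.println(copy.payload + " / " + copy.toolState);
    }
}
```

Without the transient modifier the same round trip would fail with a NotSerializableException, because java.lang.Object does not implement Serializable.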
Adding/changing/removing public or protected fields or methods to a class will affect its ability to be deserialized. As will adding interfaces. These are used among other things to generate a serialVersionUID which is written to the stream as part of the serialization process. If the serialVersionUID of the class doesn't match the loaded class during deserialization, then it will fail.
If you explicitly set the serialVersionUID in your class definition you can get around this. You may want to implement readObject and writeObject as well.
In the extreme case you can implement Externalizable and have full control of all serialization of the object.
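A minimal Externalizable sketch, assuming a made-up Money class and a one-byte format version that is purely my own convention. The class decides every byte that is written, and it needs a public no-arg constructor.

```java
import java.io.*;

class Money implements Externalizable {
    private long cents;
    private String currency;

    public Money() { } // required by Externalizable: public no-arg constructor

    Money(long cents, String currency) {
        this.cents = cents;
        this.currency = currency;
    }

    long cents() { return cents; }
    String currency() { return currency; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeByte(1);        // format version, a convention of this sketch
        out.writeLong(cents);
        out.writeUTF(currency);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        byte version = in.readByte();
        if (version != 1) {
            throw new InvalidObjectException("unknown format version " + version);
        }
        cents = in.readLong();
        currency = in.readUTF();
    }
}

class MoneyDemo {
    // Serializes and immediately deserializes the given value.
    static Money roundTrip(Money money) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(money);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Money) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The price of full control is that nothing is automatic: every field added later has to be handled by hand in both methods.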
Absolute worst case scenario (though incredibly useful in some situations) is to implement writeReplace on a complex object to swap it out with a sort of simpler value object in serialization. Then in deserialization the simpler value object can implement readResolve to either rebuild or locate the complex object on the other side. It's rare when you need to pull that out, but awfully fun when you do.
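Here is roughly what that looks like, following the serialization proxy idea from Bloch. Range and SerialForm are made-up names, and Range is far simpler than the kind of object you would really need this for.

```java
import java.io.*;

final class Range implements Serializable {
    private static final long serialVersionUID = 1L;

    private final int low;
    private final int high;

    Range(int low, int high) {
        if (low > high) throw new IllegalArgumentException("low > high");
        this.low = low;
        this.high = high;
    }

    int low() { return low; }
    int high() { return high; }

    // Serialization writes the simple value object instead of this instance.
    private Object writeReplace() {
        return new SerialForm(low, high);
    }

    // Reject streams that try to bypass the value object.
    private void readObject(ObjectInputStream in) throws InvalidObjectException {
        throw new InvalidObjectException("SerialForm required");
    }

    private static final class SerialForm implements Serializable {
        private static final long serialVersionUID = 1L;

        private final int low;
        private final int high;

        SerialForm(int low, int high) {
            this.low = low;
            this.high = high;
        }

        // Deserialization swaps the value object for a Range built through the constructor.
        private Object readResolve() {
            return new Range(low, high);
        }
    }
}

class RangeDemo {
    // Serializes and immediately deserializes; the result is whatever readResolve returned.
    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A nice side effect is that the constructor's validation also runs for deserialized instances, and the internal layout of Range can change freely as long as SerialForm stays stable.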
I know that I can use serialVersionUID to control the version of classes. And I read that I can then add or remove fields and the class will still be compatible, it will just use default values.
When must I change the serialVersionUID?
The value of the serialVersionUID field should ideally be changed when incompatible changes are made to the structure of the class. The complete list of incompatible changes is present in the Java Object Serialization Specification.
To expand further, incompatible changes to a class will prevent the deserialization mechanism from creating an instance of the object, because there is information in the stream that does not map to the current class definition.
The frequently-repeated mantra about changing the serialVersionUID every time you change the class is complete and utter nonsense. See this Sun article which they republished on their site and which was migrated to the Oracle Technology Network after the acquisition.
You should change the serialVersionUID only when you deliberately want to break compatibility with all existing serializations, or when your changes to the class are so radical that you have no choice - in which case you should really think several times about what it is that you are actually doing.
In all other cases you should bust your boiler trying to use custom readObject()/writeObject() and/or writeReplace()/readResolve() methods and/or serialFields annotations so that you can continue to read objects from those existing serializations. Once you break that you are in for a major headache, indeed nightmare.
If you don't specify a serialVersionUID field in your Serializable classes, the serialization runtime will compute one for you -- essentially it's a hash of the class name, interface names, methods, and fields of the class. Methods can be altered at any time, though, so if you need to change how a stored class is deserialized, you can override the readObject method. If you do specify the serialVersionUID field in your code, though, that value is used even if you do make incompatible changes, which can result in an exception at runtime -- your IDE or compiler won't give you a warning. (EDIT -- thanks EJP) IDEs such as Eclipse can insert the computed UID for you, if you want to easily check how certain changes affect it.
If you make changes often, keep an old version of the disk file around to test deserialization with. You can write unit tests to try and read in the old file, and see if it works or if it's totally incompatible.
One caveat, I've personally experienced the pain that is working with Serializable classes originally intended for long-term storage that were improperly designed. For example, storing GUI elements on disk rather than creating them when needed. Ask yourself if Serializable is really the best way to save your data.
For the sake of completeness, here's a list of changes that break the compatibility of Java serialization according to the Java 8 spec:
Deleting fields
Moving classes up or down the hierarchy
Changing a nonstatic field to static or a nontransient field to transient
Changing the declared type of a primitive field
Changing the writeObject or readObject method so that it no longer writes or reads the default field data or changing it so that it attempts to write it or read it when the previous version did not.
Changing a class from Serializable to Externalizable or vice versa
Changing a class from a non-enum type to an enum type or vice versa
Removing either Serializable or Externalizable
Adding the writeReplace or readResolve method to a class
You can set serialVersionUID to the same value for the life of the class. (Not always a good idea.) Note: you can implement your own serialization version checking strategy with readObject/writeObject if you need this and leave the UID unchanged.
The only time you MUST change it is if you have already serialized some data to a file and you want to read it. If it has changed for any reason you MUST set the serialVersionUID to the version in the file to have any hope of being able to read the data.
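One way such a home-grown version check could look, as a sketch: Settings, FORMAT and the default font size of 12 are all made up. The serialVersionUID stays fixed and the class writes its own format number after the default field data.

```java
import java.io.*;

class Settings implements Serializable {
    private static final long serialVersionUID = 1L; // stays the same for the life of the class
    private static final int FORMAT = 2;             // this sketch's own version number

    String theme;
    int fontSize; // imagined to exist only since format 2

    Settings(String theme, int fontSize) {
        this.theme = theme;
        this.fontSize = fontSize;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject(); // default field data first
        out.writeInt(FORMAT);     // then the version marker as extra data
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        int format = in.readInt();
        if (format > FORMAT) {
            throw new InvalidObjectException("written by a newer format: " + format);
        }
        if (format < 2) {
            fontSize = 12; // assumed default for data from before fontSize existed
        }
    }
}

class SettingsDemo {
    // Serializes and immediately deserializes the given settings.
    static Settings roundTrip(Settings settings) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(settings);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Settings) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

This only works if the version marker is there from the first released version of the class, since older streams without it cannot be told apart afterwards.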
To declare your own serialVersionUID in Java, add this to the serializable class (the @Serial annotation is java.io.Serial, available since Java 14, and is optional):
@Serial
private static final long serialVersionUID = desired_number;
I want to write an object into a stream (or byte array) with its transient attributes to be able to reconstruct it in another VM. I don't want to modify its attributes because that object is a part of legacy application.
Standard Java serialization mechanism doesn't help. What other options do I have?
Update:
The reason I'm asking the question is that I want to modify an existing Spring application. It called a bean's method in-process earlier but now I want to move the bean on a separate machine and use Spring remoting through HTTP invoker. And I have a problem with parameters that have transient fields that need to be passed to this method but not needed to be serialized in other parts of the app.
Hmm - if an attribute is marked as transient, that means exactly that it's not meant to be considered part of the object's persistent state, e.g. for serialization. The fact that you want to do this at all is a code smell, and the correct solution is to stop those fields being transient.
Let's say though that for whatever reason you can't modify the target classes themselves. My first thought was that you could customise the serialisation by implementing readObject() and writeObject() methods, but that would also require changes to the target class.
In that case, you'll need to work with some kind of reflection-based or metadata-based API in order to do this. There are many libraries that will convert objects to and from XML or JSON or DB rows, etc. Your best bet would be to use one of these to convert the object to and from "hydrated" form (and likely you'll need to customise them, as any sane serialiser will ignore transient fields). Which one to pick depends on your current software stack, and your precise requirements.
I assume you cannot change the legacy code. In this case I think you will have to resort to going over the object fields with reflection and DataOutputStream.
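A rough sketch of that idea, under several assumptions: LegacyParam is a made-up stand-in for the legacy class, only int and String fields are supported, superclass fields are ignored, and the receiver must be able to create an empty instance to fill in.

```java
import java.io.*;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Stand-in for a legacy class that cannot be edited (hypothetical).
class LegacyParam implements Serializable {
    String id;
    transient String sessionToken; // skipped by standard serialization
    transient int retries;
}

class ReflectiveCodec {
    // Instance fields in a stable order; getDeclaredFields() itself guarantees no order.
    private static List<Field> instanceFields(Class<?> type) {
        List<Field> fields = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers()) && !f.isSynthetic()) {
                f.setAccessible(true);
                fields.add(f);
            }
        }
        fields.sort(Comparator.comparing(Field::getName));
        return fields;
    }

    // Writes every instance field, transient or not.
    static byte[] write(Object source) {
        try (ByteArrayOutputStream buffer = new ByteArrayOutputStream();
             DataOutputStream out = new DataOutputStream(buffer)) {
            for (Field f : instanceFields(source.getClass())) {
                if (f.getType() == int.class) {
                    out.writeInt(f.getInt(source));
                } else if (f.getType() == String.class) {
                    String s = (String) f.get(source);
                    out.writeBoolean(s != null); // null marker
                    if (s != null) out.writeUTF(s);
                } else {
                    throw new IllegalArgumentException("unsupported field type: " + f);
                }
            }
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    // Fills the fields of an existing instance from bytes produced by write().
    static <T> T readInto(T target, byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            for (Field f : instanceFields(target.getClass())) {
                if (f.getType() == int.class) {
                    f.setInt(target, in.readInt());
                } else if (f.getType() == String.class) {
                    f.set(target, in.readBoolean() ? in.readUTF() : null);
                } else {
                    throw new IllegalArgumentException("unsupported field type: " + f);
                }
            }
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Usage would be something like ReflectiveCodec.readInto(new LegacyParam(), ReflectiveCodec.write(original)). Both sides need the same class version, since the format carries no field names.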
transient variables are supposed to be those that aren't serializable or are easily recalculated.
My first suggestion is to look for methods on this object to recalculate the transient fields.