What happens if I serialize a Map (or List) with one Java version and then try to deserialize it with another Java version in which the serialVersionUID has changed? I suppose it will fail.
If you create a library for others to use, what is the preferred way of serializing objects: using Java types like Map and List, or using arrays of your own objects?
e.g.
List<MyObject> or MyObject[]?
Map<String, MyObject> or MyObject2[] (MyObject2 contains the key and MyObject)?
If you control the class and did not change its serialVersionUID, you can also deserialize an instance written by an older version of that class. Java supports this through the versioning rules of the serialization specification, which build on the concept of binary compatibility: most of its flexibility comes from late binding of the symbolic references used for the names of classes, interfaces, fields and methods:
http://docs.oracle.com/javase/7/docs/platform/serialization/spec/version.html
So core Java classes such as HashMap, ArrayList and Vector remain deserializable even if the class evolves in a future version of Java.
If you want to control versioning in your own class, declare the serialVersionUID field explicitly and keep it the same whenever you make compatible changes to the class file.
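A minimal sketch of declaring the UID explicitly. The class names Customer and CustomerNoUid are hypothetical; ObjectStreamClass.lookup is the standard JDK call that reports which UID serialization will actually use for a class.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical class: the explicit UID stays fixed across compatible edits
class Customer implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String name;
    // Imagine this field was added in a later version. The change is compatible because
    // the UID above is unchanged; streams written before it existed load with email == null.
    private String email;

    Customer(String name) { this.name = name; }
}

// Same shape, but no declared UID: the JVM computes one from the class structure,
// so almost any edit to the class changes it
class CustomerNoUid implements Serializable {
    private final String name;
    CustomerNoUid(String name) { this.name = name; }
}

class UidCheck {
    public static void main(String[] args) {
        System.out.println(ObjectStreamClass.lookup(Customer.class).getSerialVersionUID());      // 1
        System.out.println(ObjectStreamClass.lookup(CustomerNoUid.class).getSerialVersionUID()); // computed hash
    }
}
```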
See also this article:
http://www.oracle.com/technetwork/articles/java/javaserial-1536170.html
Yes, you are correct: deserialization fails (with an InvalidClassException) if the serialVersionUID has changed. The JDK version doesn't matter here.
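The failure can be reproduced in a single JVM. This sketch (class names hypothetical) serializes an object, then overwrites the 8-byte UID inside the stream's class descriptor to simulate data written by a class version with a different serialVersionUID. It assumes the distinctive UID value appears only once in the stream, which holds for this small example.

```java
import java.io.*;
import java.nio.ByteBuffer;

class UidMismatchDemo {
    // Hypothetical class with a distinctive UID so its bytes are easy to find in the stream
    static class Payload implements Serializable {
        private static final long serialVersionUID = 0x1122334455667788L;
        int value = 42;
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return ois.readObject();
        }
    }

    // Replaces the first occurrence of oldUid's 8 bytes with newUid's bytes
    static byte[] withPatchedUid(byte[] data, long oldUid, long newUid) {
        byte[] oldBytes = ByteBuffer.allocate(8).putLong(oldUid).array();
        byte[] newBytes = ByteBuffer.allocate(8).putLong(newUid).array();
        byte[] copy = data.clone();
        outer:
        for (int i = 0; i + 8 <= copy.length; i++) {
            for (int j = 0; j < 8; j++) {
                if (copy[i + j] != oldBytes[j]) continue outer;
            }
            System.arraycopy(newBytes, 0, copy, i, 8);
            return copy;
        }
        throw new IllegalArgumentException("UID not found in stream");
    }

    // True if a stream carrying a different UID is rejected on deserialization
    static boolean mismatchIsRejected() {
        try {
            byte[] original = serialize(new Payload());
            deserialize(withPatchedUid(original, 0x1122334455667788L, 0x0102030405060708L));
            return false;
        } catch (InvalidClassException e) {
            // message reads like "local class incompatible: stream classdesc serialVersionUID = ..."
            return true;
        } catch (IOException | ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Payload p = (Payload) deserialize(serialize(new Payload()));
        System.out.println(p.value);              // 42: same UID, the round trip works
        System.out.println(mismatchIsRejected()); // true: the UID mismatch is rejected
    }
}
```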
"If you create a lib for others to use what will be the preferred way of serializing objects, using Java Objects like Map, List or using an array of self made objects?"
You can serialize objects to a more portable, text-based format such as JSON or XML. Take a look at JAXB or XStream.
But keep in mind that the main use of serialization is to transfer objects over the network. If you want to store state, you should typically use a database. Serialization to bytes is mainly useful for short-lived objects, because, as you noticed, a class may change, and so may its serialVersionUID.
Hope it helps.
On a project, we have several serialized objects. It will be necessary to use these objects on machines with different JVMs (possibly different versions).
Our objects' serialVersionUIDs are fixed and won't change, but we are concerned about the serialVersionUIDs of standard JDK classes, for instance ArrayList and HashSet, that are used in our serialized objects.
So the question is: can these serialVersionUIDs change between different versions of the JVM, or between different JVMs?
Or do we have to use another serialization mechanism to support different JVMs?
The serialVersionUID should only be changed if there is a change to the class that would not be compatible with previously serialized versions of it.
To see which changes could break compatibility, check the serialization specification.
I highly doubt that a new version of Java would introduce any changes to core classes that would break compatibility.
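One way to check this empirically is to print the UIDs a given JVM uses for the JDK classes you depend on, then run the same program on every target JVM and compare the output. The class name JdkUidReport is hypothetical; ObjectStreamClass.lookup is a standard JDK method.

```java
import java.io.ObjectStreamClass;
import java.util.*;

class JdkUidReport {
    // The serialVersionUID this JVM uses for a serializable class
    static long uidOf(Class<?> cls) {
        ObjectStreamClass desc = ObjectStreamClass.lookup(cls);
        if (desc == null) throw new IllegalArgumentException(cls + " is not serializable");
        return desc.getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("JVM: " + System.getProperty("java.vendor") + " " + System.getProperty("java.version"));
        Class<?>[] classes = { ArrayList.class, HashMap.class, HashSet.class, LinkedList.class };
        for (Class<?> cls : classes) {
            System.out.println(cls.getName() + " = " + uidOf(cls));
        }
    }
}
```

The core collections declare their UIDs explicitly in source (for example, ArrayList declares 8683452581122892189L), which is why they stay stable across releases.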
We use serialVersionUID as a version code for the class, and we should change this field when we modify the class. This field serves as the class's identity during deserialization.
For example, suppose you serialize an object of class A and save it in a binary file; you can later deserialize the file back into the original object. But if you add a field to A and do not change the serialVersionUID, deserialization may return a malformed object (the new field simply gets its default value). If you do change the serialVersionUID, deserialization rejects the input and throws an exception. An exception is better than an unknown error.
These errors and exceptions occur only when you use old serialized data to create an instance of a modified class. If you don't use serialization for data persistence, there won't be any problems.
I'm looking for some info on the best approach to serialize a graph of objects given the following constraints (Java):
Two objects of the same class must serialize to bit-for-bit identical output if their state is equal (this must not depend on the JVM's field ordering).
Collections are only modeled with arrays (no Collection classes).
All instances are immutable
The serialization format should be binary (byte[]) rather than text-based.
I am in control of all the classes in the graph.
I don't want to put an empty constructor in the classes just to support serialization.
I have looked at implementing a solution based on my own traversal and on Objenesis, but my problem does not seem that unique, so I'd better check for an existing, complete solution first.
Updated details:
First, thanks for your help!
Objects must serialize to exactly the same bit sequence based on the object's state. This is important because the binary content will be digitally signed. The serialized form will be reconstructed from the state of the object, not from stored original bits.
Interoperability between different technologies is important. I can see the software running on, e.g., .NET in the future, so there should be no Java-specific flavour in the serialized format.
Note on the comments about immutability: the array values are copied from the constructor arguments into the internal fields. This is less important.
Best regards,
Niclas Lindberg
You could write the data yourself, using reflection or hand-coded methods. I use methods that look hand-coded, except that they are generated. (You get the performance of hand-coded methods and the convenience of not having to rewrite the code when the class changes.)
Developers often talk about the built-in Java serialization, but you can write custom serialization that does whatever you want, any way you want.
A more detailed answer would depend on exactly what you want to do.
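A minimal hand-coded sketch of writing the data yourself, under the question's constraints (immutable class, arrays only, byte[] output, no no-arg constructor). The Document class and its layout (big-endian ints, length-prefixed standard UTF-8) are hypothetical choices, not an established format; the point is that fields are written in an explicit order, so equal state always yields identical bytes.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

// Hypothetical immutable class: no no-arg constructor, arrays instead of collections
final class Document {
    private final String title;
    private final int[] values;

    Document(String title, int[] values) {
        this.title = title;
        this.values = values.clone(); // defensive copy keeps the instance immutable
    }

    String title() { return title; }
    int[] values() { return values.clone(); }

    // Fields are written in a fixed, explicit order, independent of JVM field ordering.
    // DataOutputStream is big-endian, which other platforms can read easily.
    byte[] toBytes() {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            byte[] utf8 = title.getBytes(StandardCharsets.UTF_8); // standard UTF-8, not Java's modified UTF-8
            out.writeInt(utf8.length);
            out.write(utf8);
            out.writeInt(values.length);
            for (int v : values) out.writeInt(v);
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with an in-memory stream
        }
    }

    // Reading goes through the normal constructor, so no no-arg constructor is needed
    static Document fromBytes(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            byte[] utf8 = new byte[in.readInt()];
            in.readFully(utf8);
            int[] values = new int[in.readInt()];
            for (int i = 0; i < values.length; i++) values[i] = in.readInt();
            return new Document(new String(utf8, StandardCharsets.UTF_8), values);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A nested object would be handled by calling its own toBytes-style method at a fixed position; generating such methods from the class definition is the "looks hand-coded, but generated" variant.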
BTW: You can serialize your data into a byte[] and still make it human-readable and editable in a text editor. All you have to do is use a binary format that looks like text. ;)
Maybe you want to familiarize yourself with the serialization frameworks available for Java. A good starting point is the thrift-protobuf-compare project, whose name is misleading: it compares the performance of more than 10 ways of serializing data in Java.
It seems that your hardest constraint is interoperability between different technologies. I know that Google's Protocol Buffers and Thrift deliver here. Avro might also fit.
The important thing to know about serialization is that it is not guaranteed to be consistent across multiple versions of Java. It's not meant as a way to store data on a disk or anywhere permanent.
It's used internally to send objects from one JVM to another during RMI or some other network protocol. These are the kinds of applications you should use serialization for. If this describes your problem (short-term communication between two different JVMs), then you should try to get serialization working.
If you're looking for a way to store the data more permanently or you will need the data to survive in forward versions of Java, then you should find your own solution. Given your requirements, you should create some sort of method of converting each object into a byte stream yourself and reading it back into objects. You will then be responsible for making sure the format is forward compatible with future objects and features.
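One common way to take on that responsibility is to write a format version number at the start of every record, so a reader can recognize and upgrade old data. A minimal sketch, assuming a hypothetical Task type whose format version 1 stored only a name and version 2 added a priority:

```java
import java.io.*;

// Hypothetical record type whose stored format has evolved:
// version 1 stored only "name"; version 2 added "priority".
final class Task {
    static final int CURRENT_FORMAT = 2;

    final String name;
    final int priority;

    Task(String name, int priority) {
        this.name = name;
        this.priority = priority;
    }

    void write(DataOutputStream out) throws IOException {
        out.writeInt(CURRENT_FORMAT); // format version first, so readers know what follows
        out.writeUTF(name);
        out.writeInt(priority);
    }

    static Task read(DataInputStream in) throws IOException {
        int version = in.readInt();
        switch (version) {
            case 1:
                return new Task(in.readUTF(), 0); // field did not exist yet: use a default
            case 2:
                return new Task(in.readUTF(), in.readInt());
            default:
                throw new IOException("Unsupported format version " + version);
        }
    }
}
```

Unlike a serialVersionUID mismatch, an old version number here leads to an explicit upgrade path rather than a rejected stream.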
I highly recommend Chapter 11 of Effective Java by Joshua Bloch.
Is the Externalizable interface what you're looking for? You fully control the way your objects are persisted, and you do it OO-style, with methods that are inherited and so on (unlike the private readObject/writeObject methods used with Serializable). But you still cannot get rid of the requirement for a public no-arg constructor.
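A minimal Externalizable sketch; the Money class and its copy helper are hypothetical. Note the public no-arg constructor the answer mentions: deserialization calls it first and then hands the stream to readExternal.

```java
import java.io.*;

// Hypothetical class: with Externalizable, the class itself decides what goes into the stream
class Money implements Externalizable {
    private String currency;
    private long cents;

    // Mandatory for Externalizable: deserialization calls this public no-arg
    // constructor, then readExternal() fills in the state
    public Money() { }

    Money(String currency, long cents) {
        this.currency = currency;
        this.cents = cents;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeUTF(currency);
        out.writeLong(cents);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        // Fields must be read in the same order they were written
        currency = in.readUTF();
        cents = in.readLong();
    }

    String currency() { return currency; }
    long cents() { return cents; }

    // Round trip through ObjectOutputStream/ObjectInputStream, for demonstration
    static Money copy(Money m) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(m);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return (Money) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the fields are assigned in readExternal, they cannot be final, which is one reason Externalizable fits poorly with strictly immutable classes.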
The only way you would get this is:
A/ Use UTF-8 text, i.e. XML or JSON, with binary data turned into Base64 (the HTTP/XML-safe variant).
B/ Enforce UTF-8 binary ordering of all data.
C/ Pack the contents, removing all unescaped white space.
D/ Hash the content and put that hash at a fixed, standard position in the file.
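Steps A to D can be sketched for the simple case of a flat string-to-string map. The CanonicalJson class is hypothetical and its escaping covers only quotes, backslashes and control characters, so it is an illustration, not a complete canonical-JSON implementation. Keys are sorted by their UTF-8 bytes (Arrays.compareUnsigned, Java 9+), no white space is emitted, and the UTF-8 bytes are hashed with SHA-256 and encoded as URL-safe Base64.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.*;

class CanonicalJson {
    // Steps A-C: UTF-8 JSON text, keys in UTF-8 byte order, no insignificant white space
    static String canonicalize(Map<String, String> map) {
        List<String> keys = new ArrayList<>(map.keySet());
        keys.sort((a, b) -> Arrays.compareUnsigned(
                a.getBytes(StandardCharsets.UTF_8), b.getBytes(StandardCharsets.UTF_8)));
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < keys.size(); i++) {
            if (i > 0) sb.append(',');
            String k = keys.get(i);
            appendString(sb, k);
            sb.append(':');
            appendString(sb, map.get(k));
        }
        return sb.append('}').toString();
    }

    // Minimal escaping: quote, backslash and control characters only
    private static void appendString(StringBuilder sb, String s) {
        sb.append('"');
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\').append(c);
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        sb.append('"');
    }

    // Step D: hash the canonical bytes; URL-safe Base64 keeps the hash HTTP/XML friendly
    static String hash(Map<String, String> map) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(canonicalize(map).getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }
}
```

Two maps with the same entries inserted in a different order produce the same text and therefore the same hash, which is what a signature over the content needs.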
I haven't tried this yet, but it seems risky. The case I'm thinking of is instrumenting simple VO classes with JiBX. These VOs are going to be serialized over AMF and possibly other schemes. Can anyone confirm or deny my suspicions that doing behind-the-back stuff like bytecode enhancement might mess something up in general, and provide some background information as to why? Also, I'm interested in the specific case of JiBX.
Behind the scenes, serialization uses reflection. Your bytecode manipulation is presumably adding fields. So, unless you mark these fields as transient, they will get serialised just like normal fields.
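A small sketch of the transient rule. It does not use JiBX; the cache field is a hypothetical stand-in for a field that bytecode enhancement might inject, marked transient so default serialization skips it.

```java
import java.io.*;

class TransientDemo {
    // Hypothetical value object; imagine 'cache' was added by bytecode enhancement
    static class Vo implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        transient String cache; // transient: skipped by default serialization

        Vo(String name, String cache) {
            this.name = name;
            this.cache = cache;
        }
    }

    static Vo roundTrip(Vo vo) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(vo);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return (Vo) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Vo copy = roundTrip(new Vo("order-1", "enhancer state"));
        System.out.println(copy.name);  // order-1
        System.out.println(copy.cache); // null: transient fields come back with their default value
    }
}
```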
So, provided you have performed the same bytecode manipulation on both sides, you'll be fine.
If you haven't, you'll need to read the serialisation documentation to understand how the backwards compatibility features work. Essentially, I think you can send fields that the receiver doesn't expect and you're fine, and you can leave out fields and they'll get their default values on the receiving end. But you should check this in the spec!
If you're just adding methods, then they have no effect on serialisation, unless they are things like readResolve(), etc. which are specifically used by the serialisation mechanism.
Adding, changing or removing public or protected fields or methods of a class will affect its ability to be deserialized, as will adding interfaces. When no serialVersionUID is declared, these are used, among other things, to generate a serialVersionUID, which is written to the stream as part of the serialization process. If the serialVersionUID in the stream doesn't match that of the loaded class during deserialization, deserialization fails.
If you explicitly set the serialVersionUID in your class definition, you can get around this. You may want to implement readObject and writeObject as well.
In the extreme case you can implement Externalizable and have full control of all serialization of the object.
Absolute worst case scenario (though incredibly useful in some situations) is to implement writeReplace on a complex object to swap it out with a sort of simpler value object in serialization. Then in deserialization the simpler value object can implement readResolve to either rebuild or locate the complex object on the other side. It's rare when you need to pull that out, but awfully fun when you do.
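A minimal sketch of that writeReplace/readResolve swap; the Connection and Ref classes and the URL are hypothetical. Only the simple Ref value object travels in the stream, and its readResolve rebuilds the complex object on the receiving side.

```java
import java.io.*;

// Hypothetical "complex" object that should not be serialized field by field
final class Connection implements Serializable {
    private static final long serialVersionUID = 1L;
    final String url;
    transient Object socket; // stands in for state that cannot or should not travel

    Connection(String url) {
        this.url = url;
        this.socket = new Object();
    }

    // Serialization writes this simple value object instead of the Connection itself
    private Object writeReplace() { return new Ref(url); }

    // Guards against hand-crafted streams that contain a Connection directly
    private void readObject(ObjectInputStream in) throws InvalidObjectException {
        throw new InvalidObjectException("Connection is serialized via Ref");
    }

    // The simple value object that actually goes over the wire
    private static final class Ref implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String url;

        Ref(String url) { this.url = url; }

        // On the receiving side, rebuild (or look up) the complex object
        private Object readResolve() { return new Connection(url); }
    }

    // Round trip for demonstration
    static Connection copy(Connection c) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(c);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return (Connection) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the copy is built through the constructor, its transient socket is initialized again rather than left null.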
I need to serialize a Java object that might change later on; for example, some of its variables may be added or removed. What are the pitfalls of such an approach, and what precautions should I take if this remains the only way out?
You definitely need to add a serialVersionUID field right from the beginning.
Changes might make the serialized objects incompatible. Adding and removing fields can violate class contracts (up to the point of exceptions being thrown) when you deserialize an instance written by a class version that lacked a field the current version expects: in that case the field is set to its type's default value, and the most likely problems are NullPointerExceptions. This can be averted by implementing readObject() and writeObject(). Other changes (such as changing a field's type) can cause deserialization to fail entirely.
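A minimal sketch of repairing a missing field in readObject(); the Profile class is hypothetical. Field initializers do not run during deserialization, so a field absent from an old stream stays null unless readObject restores it. The roundTrip helper only simulates an old stream by nulling the field before writing.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class in its second version: 'tags' was added after instances were stored
class Profile implements Serializable {
    private static final long serialVersionUID = 1L; // unchanged, so old streams still load

    String user;
    List<String> tags = new ArrayList<>(); // this initializer does NOT run on deserialization

    Profile(String user) { this.user = user; }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject(); // restores every field present in the stream
        if (tags == null) {
            // Stream came from the old version (or held null): restore the invariant
            tags = new ArrayList<>();
        }
    }

    static Profile roundTrip(Profile p) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(p);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return (Profile) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```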
As Michael pointed out, Java provides some support for serialization with java.io.Serializable. The main problem with the Java support is that versioning is clunky and requires the user to deal with it.
Instead, I would recommend something like Google's Protocol Buffers or Apache Thrift. For both, you define the object in a very simple language, and they generate the serialization code for you. Both also handle versioning for you, so you don't have to worry about whether you are reading an old or a new version of the object.
For example, say you have a type foo with a field bar, and you write a bunch of foo objects to disk. Some time later you add a field baz to foo and write a few more foo objects to disk. When you read them back, they will all be foo objects; it will seem as if the original foo objects simply never set their baz field.
I suppose the short answer is that you will have to implement some sort of custom deserialization process that knows about the changes and deserializes older versions of an object correctly. You should also include the serialVersionUID field, which keeps track of your version and will help you find out whether a serialized object is an old version. You can read more about this here
When you know that your serialized object will change in the future, you should create a new serialized object in another namespace instead of changing the existing one.
And adding a serialVersionUID like Michael described is also a ToDo.
I have a file that contains serialized Java classes. I would like to parse this file in order to get a list of the classes in the file and the serialVersionUID of each class.
Is there a tool anyone can recommend to do this, or perhaps someone could offer some pointers on where I should start to accomplish this myself?
Cheers
Rich
I don't know if there's already such a tool (if you have access to the classes themselves, the serialver tool can tell you the ID), but if you need to roll your own,
Sun's serialization spec should contain all the information you need, specifically the grammar of the stream format.
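As a starting point for rolling your own, this sketch reads the stream header and the first class descriptor by following the spec's grammar, using the constants from java.io.ObjectStreamConstants. It assumes the simple case of a new top-level object whose non-proxy class descriptor is written inline; a real parser must handle the full grammar (back-references, proxies, field descriptions, class data, and so on). StreamPeek and Sample are hypothetical names.

```java
import java.io.*;
import static java.io.ObjectStreamConstants.*;

class StreamPeek {
    // Class name and UID as recorded in the stream
    static final class Desc {
        final String className;
        final long uid;
        Desc(String className, long uid) { this.className = className; this.uid = uid; }
    }

    // Hypothetical class used only to produce a sample stream
    static class Sample implements Serializable {
        private static final long serialVersionUID = 42L;
        int x = 1;
    }

    static Desc firstClass(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        if (in.readShort() != STREAM_MAGIC) throw new IOException("not a Java serialization stream");
        if (in.readShort() != STREAM_VERSION) throw new IOException("unexpected stream version");
        if (in.readByte() != TC_OBJECT) throw new IOException("first item is not a new object");
        if (in.readByte() != TC_CLASSDESC) throw new IOException("class descriptor is not inline");
        String className = in.readUTF(); // written with writeUTF: 2-byte length + modified UTF-8
        long uid = in.readLong();        // the 8-byte serialVersionUID follows the name
        return new Desc(className, uid);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(new Sample());
        }
        Desc d = firstClass(bos.toByteArray());
        System.out.println(d.className + " " + d.uid);
    }
}
```

Unlike deserializing, this only reads bytes, so the classes named in the file do not need to be on the classpath.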
Unfortunately not all classes (even in the JDK) obey the serialisation spec. In particular readObject does not always call defaultReadObject or readFields, with the equivalent mistake in writeObject.
You can detect which classes are being used whilst deserialising. ObjectInputStream uses resolveClass and resolveProxyClass to map class descriptors to actual Classes (some subclasses use different rules for class loader lookup).
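A minimal sketch of that approach; ClassListingInputStream is a hypothetical name. It overrides the standard resolveClass hook to record the name and serialVersionUID of every class descriptor in the stream. Because it still resolves each class, the classes must be available on the classpath; proxy classes (resolveProxyClass) are not covered here.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

// Records every class descriptor the stream refers to while it is being deserialized
class ClassListingInputStream extends ObjectInputStream {
    final List<String> seen = new ArrayList<>();

    ClassListingInputStream(InputStream in) throws IOException {
        super(in);
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc) throws IOException, ClassNotFoundException {
        // desc carries the name and UID exactly as written in the stream
        seen.add(desc.getName() + " " + desc.getSerialVersionUID());
        return super.resolveClass(desc);
    }
}
```

Usage: wrap the file's InputStream, call readObject() until the stream is exhausted, then inspect seen.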