I have a file that contains serialized Java classes. I would like to parse this file in order to get a list of the classes in the file and the serialVersionUID of each class.
Is there a tool anyone can recommend to do this, or perhaps someone could offer some pointers on where I should start to accomplish this myself?
Cheers
Rich
I don't know if there's already such a tool (if you have access to the classes themselves, the serialver tool can tell you the ID), but if you need to roll your own,
Sun's serialization spec should contain all the information you need, specifically the grammar of the stream format.
Unfortunately not all classes (even in the JDK) obey the serialisation spec. In particular readObject does not always call defaultReadObject or readFields, with the equivalent mistake in writeObject.
You can detect which classes are being used whilst deserialising. ObjectInputStream uses resolveClass and resolveProxyClass to map class descriptors to actual Classes (some subclasses use different rules for class loader lookup).
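A rough sketch of that idea, in case it helps: subclass ObjectInputStream and record each class descriptor as it is read. The names StreamClassLister and ListingStream are made up for this example. It assumes the stream holds a single top-level object, and readObject still resolves the classes, so they have to be on the classpath unless you also override resolveClass.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: lists class names and serialVersionUIDs found in a stream.
class StreamClassLister {

    /** ObjectInputStream that records every class descriptor it reads. */
    static class ListingStream extends ObjectInputStream {
        final Map<String, Long> seen = new LinkedHashMap<>();

        ListingStream(InputStream in) throws IOException {
            super(in);
        }

        @Override
        protected ObjectStreamClass readClassDescriptor()
                throws IOException, ClassNotFoundException {
            ObjectStreamClass desc = super.readClassDescriptor();
            // Name and UID here are the values written in the stream.
            seen.put(desc.getName(), desc.getSerialVersionUID());
            return desc;
        }
    }

    /** Reads one object and returns class name -> serialVersionUID. */
    static Map<String, Long> listClasses(byte[] data) {
        try (ListingStream in = new ListingStream(new ByteArrayInputStream(data))) {
            in.readObject();
            return in.seen;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Demo helper: default Java serialization into a byte array. */
    static byte[] serialize(Object o) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        ArrayList<String> list = new ArrayList<>();
        list.add("x");
        listClasses(serialize(list))
                .forEach((name, uid) -> System.out.println(name + " " + uid));
    }
}
```

For a real file you would wrap a FileInputStream instead of the byte array and call readObject in a loop until EOFException.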
We work heavily with serialization and having to specify Serializable tag on every object we use is kind of a burden. Especially when it's a 3rd-party class that we can't really change.
The question is: since Serializable is an empty interface and Java provides robust serialization once you add implements Serializable - why didn't they make everything serializable and that's it?
What am I missing?
Serialization is fraught with pitfalls. Automatic serialization support of this form makes the class internals part of the public API (which is why javadoc gives you the persisted forms of classes).
For long-term persistence, the class must be able to decode this form, which restricts the changes you can make to class design. This breaks encapsulation.
Serialization can also lead to security problems. By being able to serialize any object it has a reference to, a class can access data it would not normally be able to (by parsing the resultant byte data).
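To make the security point concrete, here is a small sketch (the Account class and its pin field are invented for the example). The private field has no accessor, yet its value shows up in the serialized bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class SerializedSecretDemo {

    /** Hypothetical class: the pin is private and has no getter. */
    static class Account implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String pin;

        Account(String pin) {
            this.pin = pin;
        }
    }

    static byte[] serialize(Object o) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Returns true if the (ASCII) secret can be found in the serialized form. */
    static boolean leaks(String secret) {
        byte[] bytes = serialize(new Account(secret));
        // ISO-8859-1 maps every byte to one char, so the raw bytes can be searched as text.
        String raw = new String(bytes, StandardCharsets.ISO_8859_1);
        return raw.contains(secret);
    }

    public static void main(String[] args) {
        System.out.println("secret visible in bytes: " + leaks("4711-secret"));
    }
}
```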
There are other issues, such as the serialized form of inner classes not being well defined.
Making all classes serializable would exacerbate these problems. Check out Effective Java Second Edition, in particular Item 74: Implement Serializable judiciously.
I think both the Java and .Net people got it wrong this time around. It would have been better to make everything serializable by default and only mark those classes that can't be safely serialized.
For example in Smalltalk (a language created in the 70s) every object is serializable by default. I have no idea why this is not the case in Java, considering the fact that the vast majority of objects are safe to serialize and just a few of them aren't.
Marking an object as serializable (with an interface) doesn't magically make that object serializable. It was serializable all along; it's just that now you have expressed something that the system could have found out on its own, so I see no real good reason for serialization being the way it is now.
I think it was either a poor decision made by designers or serialization was an afterthought, or the platform was never ready to do serialization by default on all objects safely and consistently.
Not everything is genuinely serializable. Take a network socket connection, for example. You could serialize the data/state of your socket object, but the essence of an active connection would be lost.
The main role of Serializable in Java is to actually make, by default, all other objects nonserializable. Serialization is a very dangerous mechanism, especially in its default implementation. Hence, like friendship in C++, it is off by default, even if it costs a little to make things serializable.
Serialization adds constraints and potential problems since structure compatibility is not ensured. It is good that it is off by default.
I have to admit that I have seen very few nontrivial classes where standard serialization does what I want it to, especially in the case of complex data structures. So the effort you'd spend making the class serializable properly dwarfs the cost of adding the interface.
For some classes, especially those that represent something more physical like a File, a Socket, a Thread, or a DB connection, it makes absolutely no sense to serialize instances. For many others, Serialization may be problematic because it destroys uniqueness constraints or simply forces you to deal with instances of different versions of a class, which you may not want to.
Arguably, it might have been better to make everything Serializable by default and make classes non-serializable through a keyword or marker interface - but then, those who should use that option probably would not think about it. The way it is, if you need to implement Serializable, you'll be told so by an Exception.
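A minimal sketch of "being told so by an Exception" (Plain and Marked are made-up classes): writing an object whose class lacks the marker interface fails with NotSerializableException.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class NotSerializableDemo {

    /** Does not implement Serializable. */
    static class Plain {
        int x = 1;
    }

    /** Same state, but opted in. */
    static class Marked implements Serializable {
        private static final long serialVersionUID = 1L;
        int x = 1;
    }

    /** Returns false when ObjectOutputStream rejects the object. */
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Plain:  " + canSerialize(new Plain()));
        System.out.println("Marked: " + canSerialize(new Marked()));
    }
}
```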
I think the thought was to make sure you, as the programmer, know that your object may be serialized.
Apparently everything was serializable in some preliminary designs, but because of security and correctness concerns the final design ended up as we all know.
Source: Why must classes implement Serializable in order to be written to an ObjectOutputStream?
By making you state explicitly that instances of a certain class are Serializable, the language forces you to think about whether you should allow that. For simple value objects serialization is trivial, but in more complex cases you need to really think things through.
By just relying on the standard serialization support of the JVM you expose yourself to all kinds of nasty versioning issues.
Uniqueness, references to 'real' resources, timers and lots of other types of artifacts are NOT candidates for serialization.
Read this to understand the Serializable interface, why we should make only a few classes Serializable, and where to use the transient keyword when we want to exclude some fields from being stored.
http://www.codingeek.com/java/io/object-streams-serialization-deserialization-java-example-serializable-interface/
Well, my answer is that this is for no good reason. And from your comments I can see that you've already learned that. Other languages happily try serializing everything that doesn't jump on a tree after you've counted to 10. An Object should default to be serializable.
So, what you basically need to do is read all the properties of your 3rd-party class yourself. Or, if that's an option for you: decompile, put the damn keyword there, and recompile.
There are some things in Java that simply cannot be serialized because they are runtime specific. Things like streams, threads, runtime, etc. and even some GUI classes (which are connected to the underlying OS) cannot be serialized.
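When a serializable class has to hold such a runtime-specific object, the usual way out is to mark the field transient so it is skipped. A small sketch with an invented Worker class; note that the field simply comes back as null and has to be recreated by your own code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientDemo {

    /** Hypothetical class holding a Thread, which is not serializable. */
    static class Worker implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        transient Thread thread; // skipped by default serialization

        Worker(String name) {
            this.name = name;
            this.thread = new Thread(() -> { });
        }
    }

    /** Serializes and immediately deserializes the worker. */
    static Worker roundTrip(Worker w) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeObject(w);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
                return (Worker) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Worker copy = roundTrip(new Worker("w1"));
        System.out.println(copy.name + " thread=" + copy.thread);
    }
}
```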
While I agree with the points made in other answers here, the real problem is with deserialisation: If the class definition changes then there's a real risk the deserialisation won't work. Never modifying existing fields is a pretty major commitment for the author of a library to make! Maintaining API compatibility is enough of a chore as it is.
A class which needs to be persisted to a file or other media has to implement the Serializable interface, so that the JVM allows its objects to be serialized.
Why isn't the Object class itself serializable, so that no class would need to implement the interface? After all, the JVM serializes an object only when I use ObjectOutputStream, which means the decision to serialize is still in my hands.
The reason the Object class is not serializable by default is that class versioning is the major issue. Therefore each class that is interested in serialization has to be marked as Serializable explicitly and should provide a version number, serialVersionUID.
If serialVersionUID is not declared, the runtime computes a default one from the details of the class, so an otherwise harmless change to the class can alter it and give unexpected results while deserializing the object. The JVM throws InvalidClassException if the serialVersionUID in the stream doesn't match that of the local class. Therefore every class to be serialized has to implement the Serializable interface and should declare a serialVersionUID, so that both ends agree on the version of the class.
What happens if I serialize a Map (or List) with one Java version and try to deserialize it with another Java version where the serialVersionUID has changed? I suppose it will fail.
If you create a lib for others to use what will be the preferred way of serializing objects, using Java Objects like Map, List or using an array of self made objects?
e.g.
List<MyObject> or MyObject[]?
Map<String, MyObject> or MyObject2[] (MyObject2 contains the key and MyObject)?
If you control the class and did not change the serialVersionUID, deserializing an instance written by an older version of the class is also possible. For this, Java provides a concept called binary compatibility. Most of the flexibility of binary compatibility comes from the late binding of symbolic references for the names of classes, interfaces, fields and methods:
http://docs.oracle.com/javase/7/docs/platform/serialization/spec/version.html
So core Java classes, e.g. HashMap, ArrayList, Vector..., will remain deserializable even if the class evolves in a future version of Java.
If you wish to control versioning in your own class, you simply have to provide the serialVersionUID field manually and ensure it is always the same, no matter what changes you make to the classfile.
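For illustration, a minimal sketch of declaring the field and reading it back (Customer is an invented class). ObjectStreamClass.lookup reports the declared value, or the computed default for classes that declare none, which is the same number the serialver tool prints.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class SerialVersionDemo {

    /** Hypothetical class with a manually fixed version number. */
    static class Customer implements Serializable {
        private static final long serialVersionUID = 42L;
        String name;
    }

    /** Returns the serialVersionUID the runtime will use for the class. */
    static long uidOf(Class<?> c) {
        ObjectStreamClass desc = ObjectStreamClass.lookup(c);
        if (desc == null) {
            throw new IllegalArgumentException(c.getName() + " is not serializable");
        }
        return desc.getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("Customer UID: " + uidOf(Customer.class));
    }
}
```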
See also this article:
http://www.oracle.com/technetwork/articles/java/javaserial-1536170.html
Yes, you are correct, deserialization with a changed serialVersionUID will fail. The JDK version doesn't matter here.
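You can see the failure without keeping two versions of a class around. The sketch below (all names invented) serializes an object and then flips one bit of the serialVersionUID inside the byte stream, which simulates data written by a class with a different UID. The offset calculation assumes the object is the first thing in the stream and that the class name is plain ASCII.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class UidMismatchDemo {

    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        int x = 3;
    }

    static byte[] serialize(Object o) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Returns a copy of the stream with one bit of the first class's UID flipped. */
    static byte[] withAlteredUid(byte[] data, String className) {
        byte[] copy = data.clone();
        // stream header (4) + TC_OBJECT (1) + TC_CLASSDESC (1) + name length (2) + name
        int uidStart = 4 + 1 + 1 + 2 + className.getBytes(StandardCharsets.UTF_8).length;
        copy[uidStart + 7] ^= 0x01; // lowest byte of the 8-byte UID
        return copy;
    }

    /** Returns true if reading the data fails with InvalidClassException. */
    static boolean failsWithInvalidClass(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.readObject();
            return false;
        } catch (InvalidClassException e) {
            return true;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] good = serialize(new Point());
        byte[] bad = withAlteredUid(good, Point.class.getName());
        System.out.println("original fails: " + failsWithInvalidClass(good));
        System.out.println("altered fails:  " + failsWithInvalidClass(bad));
    }
}
```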
If you create a lib for others to use what will be the preferred way
of serializing objects, using Java Objects like Map, List or using an
array of self made objects?
You can serialize objects to some more portable format, like plain text (e.g. JSON, XML). You may take a look at JAXB or XStream.
But keep in mind that the main use of serialization is to transfer objects over the network. If you would like to store some state, you should typically use a database. Serialization to bytes is useful mainly for short-lived objects (because, as you noticed, a class may change, and thus its serialVersionUID may change too).
Hope it helps.
I'm looking for some info on the best approach to serialize a graph of objects based on the following (Java):
Two objects of the same class must serialize to identical bytes (bit by bit) if their state is equal. (Must not depend on JVM field ordering).
Collections are only modeled with arrays (nothing from the Collections framework).
All instances are immutable
Serialization format should be in byte[] format instead of text based.
I am in control of all the classes in the graph.
I don't want to put an empty constructor in the classes just to support serialization.
I have looked at implementing a solution based on my own traversal and on Objenesis, but my problem does not seem that unique. Better to check for any existing/complete solution first.
Updated details:
First, thanks for your help!
Objects must serialize to exactly the same bit order based on the object's state. This is important since the binary content will be digitally signed. Reconstruction of the serialized format will be based on the state of the object, not on storing the original bits.
Interoperability between different technologies is important. I do see the software running on e.g. .Net in the future. No Java flavour in the serialized format.
Note on comments of immutability: The values of the arrays are copied from the argument to the inner fields in the constructor. Less important.
Best regards,
Niclas Lindberg
You could write the data yourself, using reflection or hand-coded methods. I use methods which look hand-coded, except that they are generated. (You get the performance of hand-coded methods and the convenience of not having to rewrite the code when it changes.)
Often developers talk about the builtin java serialization, but you can have a custom serialization to do whatever you want, any way you want.
To give you a more detailed answer, it would depend on what exactly you want to do.
BTW: You can serialize your data into byte[] and still make it human readable/text like/editable in a text editor. All you have to do is use a binary format which looks like text. ;)
Maybe you want to familiarize yourself with the serialization frameworks available for Java. A good starting point for that is the thrift-protobuf-compare project, whose name is misleading: it compares the performance of more than 10 ways of serializing data using Java.
It seems that the hardest constraint you have is interoperability between different technologies. I know that Google's Protocol Buffers and Thrift deliver here. Avro might also fit.
The important thing to know about serialization is that it is not guaranteed to be consistent across multiple versions of Java. It's not meant as a way to store data on a disk or anywhere permanent.
It's used internally to send objects from one JVM to another during RMI or some other network protocol. These are the types of applications that you should use Serialization for. If this describes your problem (short-term communication between two different JVMs), then you should try to get Serialization going.
If you're looking for a way to store the data more permanently or you will need the data to survive in forward versions of Java, then you should find your own solution. Given your requirements, you should create some sort of method of converting each object into a byte stream yourself and reading it back into objects. You will then be responsible for making sure the format is forward compatible with future objects and features.
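A sketch of what such a hand-rolled format could look like. The Reading class, its fields and the one-byte format version are all assumptions made up for this example. Fields are written in a fixed order with DataOutputStream (big-endian), so two objects with equal state always produce identical bytes, and no no-arg constructor is needed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CanonicalCodec {

    /** Hypothetical immutable value class. */
    static final class Reading {
        final String sensor;
        final long timestamp;
        final int[] values;

        Reading(String sensor, long timestamp, int[] values) {
            this.sensor = sensor;
            this.timestamp = timestamp;
            this.values = values.clone();
        }
    }

    static final int FORMAT_VERSION = 1;

    /** Writes the fields in a fixed, documented order. */
    static byte[] encode(Reading r) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeByte(FORMAT_VERSION);
            byte[] name = r.sensor.getBytes(StandardCharsets.UTF_8);
            out.writeInt(name.length);
            out.write(name);
            out.writeLong(r.timestamp);
            out.writeInt(r.values.length);
            for (int v : r.values) {
                out.writeInt(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Reads the fields back in the same order and calls the normal constructor. */
    static Reading decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int version = in.readUnsignedByte();
            if (version != FORMAT_VERSION) {
                throw new IllegalArgumentException("unsupported format version " + version);
            }
            byte[] name = new byte[in.readInt()];
            in.readFully(name);
            long timestamp = in.readLong();
            int[] values = new int[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readInt();
            }
            return new Reading(new String(name, StandardCharsets.UTF_8), timestamp, values);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Reading a = new Reading("s1", 1000L, new int[] {1, 2, 3});
        Reading b = new Reading("s1", 1000L, new int[] {1, 2, 3});
        System.out.println("same bytes: " + Arrays.equals(encode(a), encode(b)));
    }
}
```

The leading version byte is one possible way to keep the format forward compatible: a later reader can branch on it.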
I highly recommend Chapter 11 of Effective Java by Joshua Bloch.
Is the Externalizable interface what you're looking for? You fully control the way your objects are persisted and you do that the OO-style, with methods that are inherited and all (unlike the private read-/write-Object methods used with Serializable). But still, you cannot get rid of the no-arg accessible constructor requirement.
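A minimal Externalizable sketch (Temperature is an invented class) showing both the control you get and the public no-arg constructor you cannot avoid:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

class ExternalizableDemo {

    public static class Temperature implements Externalizable {
        private static final long serialVersionUID = 1L;
        private double celsius; // cannot be final: readExternal assigns it

        /** Required: deserialization calls this before readExternal. */
        public Temperature() {
        }

        public Temperature(double celsius) {
            this.celsius = celsius;
        }

        public double celsius() {
            return celsius;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeDouble(celsius);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            celsius = in.readDouble();
        }
    }

    static Temperature roundTrip(Temperature t) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeObject(t);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
                return (Temperature) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new Temperature(21.5)).celsius());
    }
}
```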
The only way you would get this is:
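A/ Use UTF-8 text, i.e. XML or JSON, with binary data turned into base64 (the HTTP/XML-safe variety).
B/ Enforce UTF-8 binary ordering of all data.
C/ Pack the contents, removing all unescaped white space.
D/ Hash the content and provide that hash in a positionally standard location in the file.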
D/ Hash the content and provide that hash in a positionally standard location in the file.
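Steps B to D could be sketched roughly like this. To stay dependency-free the example uses a toy key=value text form instead of real XML or JSON, and it assumes keys and values contain no '=' or ';'. TreeMap sorts by UTF-16 code units, which matches UTF-8 byte order only as long as no supplementary characters are involved.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

class CanonicalHash {

    /** Builds a whitespace-free text form with keys in sorted order: k1=v1;k2=v2 */
    static String canonicalText(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(fields).entrySet()) {
            if (sb.length() > 0) {
                sb.append(';');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    /** Hashes the UTF-8 bytes of the text and returns lowercase hex. */
    static String sha256Hex(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("b", "2");
        fields.put("a", "1");
        String text = canonicalText(fields);
        System.out.println(text + " " + sha256Hex(text));
    }
}
```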
I've got a complex object which is being managed by the LCDS DataServices data management and being created/updated etc using custom assemblers. The vast majority of the object hierarchy is being serialized/deserialized correctly but I've hit a stumbling block when it comes to serializing immutable java classes.
In a java only world I would use the java writeReplace and readResolve methods as this excellent blog describes: http://lingpipe-blog.com/2009/08/10/serializing-immutable-singletons-serialization-proxy/
This is how I originally wrote my Java class, expecting LiveCycle to call the writeReplace method and duly replace the immutable class with a mutable one for serialization. However, it would appear that LCDS knows nothing of the writeReplace method and will only call readExternal/writeExternal, ignoring readResolve and writeReplace.
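For readers who have not seen the pattern, here is a minimal sketch of writeReplace/readResolve with plain Java serialization (Money and its Proxy are invented names, and this is not the code from the linked blog). It only shows the Java-only behaviour; as described above, LCDS does not call these methods.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationProxyDemo {

    /** Immutable class; only its proxy ever reaches the stream. */
    static final class Money implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String currency;
        private final long cents;

        Money(String currency, long cents) {
            if (cents < 0) {
                throw new IllegalArgumentException("negative amount");
            }
            this.currency = currency;
            this.cents = cents;
        }

        String currency() { return currency; }
        long cents() { return cents; }

        /** Called by ObjectOutputStream: write the proxy instead of this object. */
        private Object writeReplace() {
            return new Proxy(currency, cents);
        }

        /** Rejects streams that try to create a Money directly. */
        private void readObject(ObjectInputStream in) throws InvalidObjectException {
            throw new InvalidObjectException("proxy required");
        }

        private static final class Proxy implements Serializable {
            private static final long serialVersionUID = 1L;
            private final String currency;
            private final long cents;

            Proxy(String currency, long cents) {
                this.currency = currency;
                this.cents = cents;
            }

            /** Called by ObjectInputStream: rebuild through the real constructor. */
            private Object readResolve() {
                return new Money(currency, cents);
            }
        }
    }

    static Object roundTrip(Object o) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeObject(o);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Money copy = (Money) roundTrip(new Money("EUR", 1250));
        System.out.println(copy.currency() + " " + copy.cents());
    }
}
```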
Firstly, have other people found this to be the case, or am I missing something?
Secondly, has anyone come up with an appropriate method to deserialize actionscript classes into either immutable objects or singletons?
Many thanks
Yes, it's a common problem. Adobe recommend that the Java type that has immutable properties implements Externalizable and the equivalent ActionScript type implements IExternalizable.
There is no plan to handle writeReplace and readResolve, but you can file a feature request at http://bugs.adobe.com/jira/browse/BLZ
When implementing your custom serialization, be aware that you will lose some benefits like compressing numbers and identifying duplicate strings. One idea is to take a look at the actual serialization mechanism and to modify it accordingly.
However, if you are interested just in serializing the read-only properties, this enhancement was implemented in BlazeDS; take a look here: http://bugs.adobe.com/jira/browse/BLZ-427
Shortest and most comprehensive answer I found: http://expertdevelopers.blogspot.com/2010/07/serializable-vs-externalizable.html
Using Google's Protocol Buffers, I have a service already written in Java which has its own data structures already. I'd like to use pb to deliver messages and I'm looking for a way to serialize the existing data structures that I have in Java to pb.
I can start by defining all the data structures in pb from scratch, which is probably the right way to go but I'm too lazy.
So, say I have a Person class in Java (or other supported languages) or a Plane class which has tens of attributes in it, is there a way to serialize that class to pb? Can I have a pb attribute of type Plane? (when Plane is not a pb, it's a Java class)
No, you can't. Fields in protobuf messages are always the primitives (numbers, strings and byte arrays, basically), protobuf enums (which are generated as Java enums) or protobuf messages - and repeated versions of all of those, of course.
You could potentially write a tool which used reflection to create a .proto file from a Java class, but I suspect you'd find it quicker just to do it by hand. In particular, if you did use reflection you'd want to make sure that the fields were always generated with the same name, to maintain compatibility. One thing you could do is annotate the Java classes and write code to generate the .proto file based on those annotations - or even potentially serialize directly to proto format using the annotations. Personally I'd recommend creating the .proto file in some way rather than effectively rewriting the PB project - otherwise there's a significant risk of introducing bugs where there's already thoroughly tested code.
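As a very rough sketch of the annotation idea: the @ProtoField annotation, the Person class and the type mapping below are all invented for this example and are not part of the protobuf project. Only a handful of Java types are mapped, and the output is just the message block, without the syntax or package lines a real .proto file needs.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ProtoSketch {

    /** Hypothetical annotation carrying the protobuf field number. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface ProtoField {
        int value();
    }

    /** Hypothetical class to export; unannotated fields are skipped. */
    static class Person {
        @ProtoField(1) String name;
        @ProtoField(2) int age;
        @ProtoField(3) byte[] photo;
        String nickname;
    }

    /** Maps a few Java types to protobuf scalar types (assumed mapping). */
    static String protoType(Class<?> t) {
        if (t == int.class) return "int32";
        if (t == long.class) return "int64";
        if (t == boolean.class) return "bool";
        if (t == double.class) return "double";
        if (t == float.class) return "float";
        if (t == String.class) return "string";
        if (t == byte[].class) return "bytes";
        throw new IllegalArgumentException("no mapping for " + t.getName());
    }

    /** Emits a message definition with explicit, stable field numbers. */
    static String toProto(Class<?> c) {
        List<Field> fields = new ArrayList<>();
        for (Field f : c.getDeclaredFields()) {
            if (f.isAnnotationPresent(ProtoField.class)) {
                fields.add(f);
            }
        }
        // Reflection order is not guaranteed, so order by the declared number.
        fields.sort(Comparator.comparingInt((Field f) -> f.getAnnotation(ProtoField.class).value()));
        StringBuilder sb = new StringBuilder("message " + c.getSimpleName() + " {\n");
        for (Field f : fields) {
            sb.append("  ").append(protoType(f.getType())).append(' ')
              .append(f.getName()).append(" = ")
              .append(f.getAnnotation(ProtoField.class).value()).append(";\n");
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(toProto(Person.class));
    }
}
```

Taking the number from the annotation rather than from field order is what keeps the generated file compatible when fields are added or reordered later.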
If you do create an annotation system, I'm sure Kenton Varda (and the rest of the PB community) would be interested in seeing it.
One way I can think of is to have a string field in a protobuf and serialize a Java class to that field using Java's built-in serialization. That way, assuming the receiver of the message knows how to read/deserialize it, I can easily serialize Java to Java messages.
There are downsides to this technique, though. To name a few:
It's only Java to Java (no C++, Python or others)
It's not as efficient as native protobufs are (neither parsing/serialization wise nor message size wise)
You have the logic of the data structures scattered around in several places, some are in the protobufs definition file, some in other Java classes and this makes things harder to maintain.
But - it gets the job done for the short term.
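The Java side of this workaround might look like the sketch below (JavaBlobCarrier is a made-up name, and no protobuf API is used here). Because protobuf string fields are meant to hold UTF-8 text, the raw serialized bytes are Base64-encoded first; with a bytes field you could skip that step and store the byte array directly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;

class JavaBlobCarrier {

    /** Java-serializes the value and returns text that is safe for a string field. */
    static String toCarrier(Serializable value) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(buf.toByteArray());
    }

    /** Reverses toCarrier; only use it on data from a sender you trust. */
    static Object fromCarrier(String carrier) {
        byte[] bytes = Base64.getDecoder().decode(carrier);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ArrayList<String> original = new ArrayList<>(Arrays.asList("a", "b"));
        String carrier = toCarrier(original);
        System.out.println(fromCarrier(carrier).equals(original));
    }
}
```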