Not serializable class with strings only [duplicate] - java

We work heavily with serialization and having to specify Serializable tag on every object we use is kind of a burden. Especially when it's a 3rd-party class that we can't really change.
The question is: since Serializable is an empty interface and Java provides robust serialization once you add implements Serializable - why didn't they make everything serializable and that's it?
What am I missing?

Serialization is fraught with pitfalls. Automatic serialization support of this form makes the class internals part of the public API (which is why javadoc gives you the persisted forms of classes).
For long-term persistence, the class must be able to decode this form, which restricts the changes you can make to class design. This breaks encapsulation.
Serialization can also lead to security problems. By being able to serialize any object it has a reference to, a class can access data it would not normally be able to (by parsing the resultant byte data).
There are other issues, such as the serialized form of inner classes not being well defined.
Making all classes serializable would exacerbate these problems. Check out Effective Java Second Edition, in particular Item 74: Implement Serializable judiciously.

I think both the Java and .NET people got it wrong this time around. It would have been better to make everything serializable by default and only mark those classes that can't be safely serialized.
For example in Smalltalk (a language created in 70s) every object is serializable by default. I have no idea why this is not the case in Java, considering the fact that the vast majority of objects are safe to serialize and just a few of them aren't.
Marking an object as serializable (with an interface) doesn't magically make that object serializable. It was serializable all along; you have just expressed something that the system could have found out on its own. So I see no real good reason for serialization being the way it is now.
I think it was either a poor decision made by designers or serialization was an afterthought, or the platform was never ready to do serialization by default on all objects safely and consistently.

Not everything is genuinely serializable. Take a network socket connection, for example. You could serialize the data/state of your socket object, but the essence of an active connection would be lost.

The main role of Serializable in Java is to actually make, by default, all other objects nonserializable. Serialization is a very dangerous mechanism, especially in its default implementation. Hence, like friendship in C++, it is off by default, even if it costs a little to make things serializable.
Serialization adds constraints and potential problems since structure compatibility is not ensured. It is good that it is off by default.
I have to admit that I have seen very few nontrivial classes where standard serialization does what I want it to, especially in the case of complex data structures. So the effort you'd spend making the class serializable properly dwarfs the cost of adding the interface.

For some classes, especially those that represent something more physical like a File, a Socket, a Thread, or a DB connection, it makes absolutely no sense to serialize instances. For many others, Serialization may be problematic because it destroys uniqueness constraints or simply forces you to deal with instances of different versions of a class, which you may not want to.
Arguably, it might have been better to make everything Serializable by default and make classes non-serializable through a keyword or marker interface - but then, those who should use that option probably would not think about it. The way it is, if you need to implement Serializable, you'll be told so by an Exception.
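To make the "told so by an Exception" point concrete, here is a minimal sketch. PlainPoint, MarkedPoint and MarkerDemo are hypothetical names invented for this illustration; only the java.io classes are real.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical class without the marker interface.
class PlainPoint {
    int x = 1;
    int y = 2;
}

// Same shape, but explicitly opted in to serialization.
class MarkedPoint implements Serializable {
    private static final long serialVersionUID = 1L;
    int x = 1;
    int y = 2;
}

class MarkerDemo {
    // Returns true if default serialization accepts the object.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            // The exception message names the class that lacks the marker.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("PlainPoint accepted:  " + canSerialize(new PlainPoint()));
        System.out.println("MarkedPoint accepted: " + canSerialize(new MarkedPoint()));
    }
}
```

The only difference between the two classes is the marker, which is exactly the opt-in switch the answers above are arguing about.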

I think the thought was to make sure you, as the programmer, know that your object may be serialized.

Apparently everything was serializable in some preliminary designs, but because of security and correctness concerns the final design ended up as we all know.
Source: Why must classes implement Serializable in order to be written to an ObjectOutputStream?.

By making you state explicitly that instances of a certain class are Serializable, the language forces you to think about whether you should allow that. For simple value objects serialization is trivial, but in more complex cases you need to really think things through.
By just relying on the standard serialization support of the JVM you expose yourself to all kinds of nasty versioning issues.
Uniqueness, references to 'real' resources, timers and lots of other types of artifacts are NOT candidates for serialization.

Read this to understand the Serializable interface, why we should make only a few classes Serializable, and where to use the transient keyword when we want to exclude certain fields from being stored.
http://www.codingeek.com/java/io/object-streams-serialization-deserialization-java-example-serializable-interface/

Well, my answer is that this is for no good reason. And from your comments I can see that you've already learned that. Other languages happily try serializing everything that doesn't jump on a tree after you've counted to 10. An Object should default to be serializable.
So, what you basically need to do is read all the properties of your 3rd-party class yourself. Or, if that's an option for you: decompile, put the damn keyword there, and recompile.
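A rough sketch of the "read all the properties yourself" route, assuming the third-party class exposes its strings through getters and a constructor. ThirdPartyContact is a made-up stand-in for the class you can't change, and ContactSnapshot and SnapshotDemo are hypothetical helpers of your own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Stand-in for a third-party class that is not Serializable (hypothetical).
final class ThirdPartyContact {
    private final String name;
    private final String email;

    ThirdPartyContact(String name, String email) {
        this.name = name;
        this.email = email;
    }

    String getName() { return name; }
    String getEmail() { return email; }
}

// Our own serializable copy of the third-party object's string properties.
final class ContactSnapshot implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String name;
    private final String email;

    ContactSnapshot(ThirdPartyContact contact) {
        this.name = contact.getName();
        this.email = contact.getEmail();
    }

    // Rebuilds the third-party object through its public constructor.
    ThirdPartyContact toContact() {
        return new ThirdPartyContact(name, email);
    }
}

final class SnapshotDemo {
    // Serializes the snapshot instead of the third-party object, then restores it.
    static ThirdPartyContact roundTrip(ThirdPartyContact original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(new ContactSnapshot(original));
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return ((ContactSnapshot) in.readObject()).toContact();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ThirdPartyContact copy = roundTrip(new ThirdPartyContact("Ada", "ada@example.org"));
        System.out.println(copy.getName() + " <" + copy.getEmail() + ">");
    }
}
```

This only works when the public API gives you everything needed to rebuild the object; hidden state is lost.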

There are some things in Java that simply cannot be serialized because they are runtime specific. Things like streams, threads, runtime, etc. and even some GUI classes (which are connected to the underlying OS) cannot be serialized.

While I agree with the points made in other answers here, the real problem is with deserialisation: If the class definition changes then there's a real risk the deserialisation won't work. Never modifying existing fields is a pretty major commitment for the author of a library to make! Maintaining API compatibility is enough of a chore as it is.

A class whose objects need to be persisted to a file or other medium has to implement the Serializable interface so that the JVM allows its objects to be serialized.
Why isn't the Object class itself serializable, so that no class would need to implement the interface? After all, the JVM serializes an object only when I use ObjectOutputStream, which means the control is still in my hands.
The reason the Object class is not serializable by default is that class versioning is the major issue. Therefore each class that is interested in serialization has to be marked as Serializable explicitly and should provide a version number, serialVersionUID.
If serialVersionUID is not declared, the runtime computes a default value from the structure of the class, so even a small change to the class can alter it. When the value in the stream doesn't match the value of the locally loaded class, deserialization fails with InvalidClassException. That is why a class should implement Serializable and declare serialVersionUID explicitly: it makes sure the class versions at both ends are treated as compatible.
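As a small illustration of the version number, the sketch below reads the serialVersionUID that the serialization runtime associates with a class, using ObjectStreamClass.lookup. Versioned, Unversioned and UidDemo are hypothetical names, and the value 42L is arbitrary.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Declares its version explicitly; the value itself is arbitrary.
class Versioned implements Serializable {
    private static final long serialVersionUID = 42L;
    String name;
}

// Declares nothing, so the runtime derives a value from the class structure.
class Unversioned implements Serializable {
    String name;
}

class UidDemo {
    // Returns the serialVersionUID that would be written to the stream for this class.
    static long uidOf(Class<?> serializableClass) {
        return ObjectStreamClass.lookup(serializableClass).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("Versioned:   " + uidOf(Versioned.class));
        System.out.println("Unversioned: " + uidOf(Unversioned.class));
    }
}
```

Adding a field or a non-private method to Unversioned would typically change its computed value, while Versioned keeps 42 until its author decides otherwise.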

Related

Side effects of using Serializable?

Reviewing server logs I encountered NotSerializableException for a domain object during some RMI cache transfer function. I noticed that a domain object doesn't implement Serializable interface; however I am a bit sceptical about implementing Serializable as I have no idea about its possible side effects. Would it break at some point?
If there are no side effects, why all the objects are not Serializable by their own?
Implementing Serializable has no side-effects ... apart from the obvious one of making the serialization mechanism consider serializing it.
(Of course, that fact that you implement the Serializable interface doesn't necessarily mean that serialization will work. For example, if your class has instance fields that are not serializable, and those fields are not declared as transient, then the normal serialization mechanism will fail.)
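A minimal sketch of that failure mode, using a Thread field because Thread is not serializable. BrokenJob, WorkingJob and TransientDemo are hypothetical names made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Implements Serializable, but holds a non-serializable, non-transient field.
class BrokenJob implements Serializable {
    private static final long serialVersionUID = 1L;
    String name = "nightly";
    Thread worker = new Thread();
}

// Same class with the problematic field excluded from the serialized form.
class WorkingJob implements Serializable {
    private static final long serialVersionUID = 1L;
    String name = "nightly";
    transient Thread worker = new Thread();
}

class TransientDemo {
    static byte[] toBytes(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    // True if default serialization refuses the object.
    static boolean isRejected(Object o) {
        try {
            toBytes(o);
            return false;
        } catch (NotSerializableException e) {
            return true;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static WorkingJob roundTrip(WorkingJob job) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(toBytes(job)))) {
            return (WorkingJob) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("BrokenJob rejected:  " + isRejected(new BrokenJob()));
        System.out.println("WorkingJob rejected: " + isRejected(new WorkingJob()));
        // Transient fields are skipped on write and come back as null.
        System.out.println("worker after round trip: " + roundTrip(new WorkingJob()).worker);
    }
}
```

Note that the field initializer does not run again on deserialization, so the restored object has to recreate its worker itself if it needs one.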
If there are no side effects why all the objects are not Serializable by their own?
One reason is that some objects have state that cannot be captured and represented by serialization. Examples include all kinds of Streams that are connected to data sources or sinks outside of the JVM, Java threads, and Java processes.
A second reason is that (arguably) the programmer should decide whether it is appropriate for a class to be serializable. Examples where it might be inappropriate include classes that hold sensitive information or classes whose internals are liable to change ... making deserialization problematic [1].
[1] It is possible to deal with this, to a degree, but the programmer may want to say "I don't want to be forced to deal with this" ... for a class that he or she thinks should not be serialized.

How encapsulation is broken while accepting default Serialization?

I often hear people saying that Serialization breaks encapsulation and this loss of encapsulation can be somewhat minimized by providing custom serialization. Can someone provide a concrete example that justifies the loss of encapsulation due to default serialization and how can this loss be minimized by resorting to custom serialization?
I am tagging this question as Java related but the answer can be language agnostic as I think this is a common problem across platforms and languages.
Excellent question! First, let's get a definition for encapsulation and go from there. This wikipedia article defines encapsulation in the following way:
A language mechanism for restricting access to some of the object's components.
A language construct that facilitates the bundling of data with the methods (or other functions) operating on that data.
Serialization, at least the way Java does it, has ramifications for both of these notions. When you implement the Serializable interface in Java, you are essentially telling the JVM that all of your non-transient member variables, identified by name and type, define the contract by which objects can be reconstructed from a byte stream. This works recursively if and only if all of your member variables' class definitions also implement Serializable, and this is where you can get into trouble.
The Encapsulation Problem
Based on the previous definition of encapsulation, particularly the first item, encapsulation prevents you from knowing anything about how the object you are dealing with actually works under the hood, with respect to its member variables. Implementing Serializable "correctly" forces you as a developer to know more about the objects you are dealing with than you probably care about in the functional sense. In this sense, implementing Serializable directly opposes encapsulation.
Custom Serialization
In every case, serialization requires knowledge about what data constitutes an "object" of a particular type. Java's Serializable interface takes this to the extreme by forcing you to know the transient state of every member variable of every Object you hope to serialize. You could get around this by defining a serialization mechanism external to the types that need to be serialized, but there will be design tradeoffs - e.g. you'd probably need to deal with Objects at the level of the interface(s) they implement instead of direct interaction with their member variables, and you may lose some of the ability to reconstruct the exact Object type from a serialized byte stream.
Java's default serialization writes and reads objects field by field. In doing so it exposes the object's internal structure, which breaks encapsulation. If you change the class's internal structure, you might not be able to restore the object state correctly. With custom serialization, on the other hand, if you change the class you can try to change readObject so that previously saved objects can still be restored correctly.
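Here is one way this could look, as a sketch rather than a recipe. The hypothetical Period class stores a start and a length internally, but its writeObject and readObject methods put a logical start/end pair on the stream, so the private fields can change later without changing the stream format. Period and PeriodDemo are invented for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class Period implements Serializable {
    private static final long serialVersionUID = 1L;

    // Internal representation, kept out of the default serialized form.
    private transient long startMillis;
    private transient long lengthMillis;

    Period(long start, long end) {
        this.startMillis = start;
        this.lengthMillis = end - start;
    }

    long start() { return startMillis; }
    long end() { return startMillis + lengthMillis; }

    // Writes the logical state (start, end) instead of the private fields.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject(); // no non-transient fields, but keeps the format extensible
        out.writeLong(start());
        out.writeLong(end());
    }

    // Reads the logical state, validates it, and rebuilds the internal fields.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        long start = in.readLong();
        long end = in.readLong();
        if (end < start) {
            throw new InvalidObjectException("end before start");
        }
        this.startMillis = start;
        this.lengthMillis = end - start;
    }
}

final class PeriodDemo {
    static Period roundTrip(Period period) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(period);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Period) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Period copy = roundTrip(new Period(10, 25));
        System.out.println(copy.start() + " .. " + copy.end());
    }
}
```

If the class later switched to storing start and end directly, only the two private methods would need to keep reading and writing the same two longs.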

Why is it necessary to use a marker interface to serialize an object?

Why can't I just avoid this if I want all objects in my app to be serializable?
Update: I know that some class cannot be serialized like thread but the java system KNOWS also that Thread is not serializable, why doesn't it manage this automatically ?
I'd like to know if there are some fundamental reasons.
Why can't I just avoid this if I want all objects in my app to be serializable?
Simply, because that's the way Java serialization works.
Consider that it does not make sense to serialize all objects.
Thread instances and instances of most Stream classes include critical state that simply cannot be serialized.
Some classes depend on class statics, and they are not serialized.
Some classes are non serializable because they critically depend on unserializable classes.
Some classes you simply don't want or need to serialize.
So given that, how should the application programmer control what gets serialized? How does he stop all sorts of unnecessary stuff from being serialized by accident? Answer: by declaring the classes he wants to be serializable as implementing Serializable.
If you implement all your serialization-related code yourself, you don't need it, but as long as you do it using standard library functions, there must be some way to communicate that your classes are designed and ready for serialization. Just because all the classes in your program are serializable doesn't mean they are in other's programs.
Because that's the way the language was designed? Questions like this are fundamentally pointless. It would have been possible, and indeed easier, to make all classes serializable, but it wasn't done that way. There are lots of reasons why not, and they are given in some FAQ or Gosling interview somewhere, that I read about 12 years ago. Security was certainly one of them. But at this stage it's a futile discussion really.

Is it safe to use bytecode enhancement techniques on classes that might be serialized and why?

I haven't tried this yet, but it seems risky. The case I'm thinking of is instrumenting simple VO classes with JiBX. These VOs are going to be serialized over AMF and possibly other schemes. Can anyone confirm or deny my suspicions that doing behind-the-back stuff like bytecode enhancement might mess something up in general, and provide some background information as to why? Also, I'm interested in the specific case of JiBX.
Behind the scenes, serialization uses reflection. Your bytecode manipulation is presumably adding fields. So, unless you mark these fields as transient, they will get serialised just like normal fields.
So, provided you have performed the same bytecode manipulation on both sides, you'll be fine.
If you haven't you'll need to read the serialisation documentation to understand how the backwards compatibility features work. Essentially, I think you can send fields that aren't expected by the receiver and you're fine; and you can miss out fields and they'll get their default values on the receiving end. But you should check this in the spec!
If you're just adding methods, then they have no effect on serialisation, unless they are things like readResolve(), etc. which are specifically used by the serialisation mechanism.
Adding/changing/removing public or protected fields or methods to a class will affect its ability to be deserialized. As will adding interfaces. These are used among other things to generate a serialVersionUID which is written to the stream as part of the serialization process. If the serialVersionUID of the class doesn't match the loaded class during deserialization, then it will fail.
If you explicitly set the serialVersionUID in your class definition you can get by this. You may want to implement readObject and writeObject as well.
In the extreme case you can implement Externalizable and have full control of all serialization of the object.
Absolute worst case scenario (though incredibly useful in some situations) is to implement writeReplace on a complex object to swap it out with a sort of simpler value object in serialization. Then in deserialization the simpler value object can implement readResolve to either rebuild or locate the complex object on the other side. It's rare when you need to pull that out, but awfully fun when you do.
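A sketch of the writeReplace / readResolve trick, under the assumption that the "complex" object can be located again from a simple key. Country, its nested CodeProxy and ProxyDemo are hypothetical; the registry stands in for whatever makes the real object hard to serialize directly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// One canonical instance per code, kept in a registry.
final class Country implements Serializable {
    private static final long serialVersionUID = 1L;
    private static final Map<String, Country> REGISTRY = new ConcurrentHashMap<>();

    private final String code;

    private Country(String code) { this.code = code; }

    static Country of(String code) {
        return REGISTRY.computeIfAbsent(code, Country::new);
    }

    String code() { return code; }

    // Serialization writes the small proxy instead of this object.
    private Object writeReplace() {
        return new CodeProxy(code);
    }

    // A Country must never be read directly from a stream.
    private void readObject(ObjectInputStream in) throws InvalidObjectException {
        throw new InvalidObjectException("proxy required");
    }

    // Simple value object that travels over the wire.
    private static final class CodeProxy implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String code;

        CodeProxy(String code) { this.code = code; }

        // On the receiving side, swap the proxy for the canonical instance.
        private Object readResolve() {
            return Country.of(code);
        }
    }
}

final class ProxyDemo {
    static Country roundTrip(Country country) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(country);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Country) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Country original = Country.of("NZ");
        System.out.println("same instance: " + (roundTrip(original) == original));
    }
}
```

Within one JVM the round trip hands back the registry's own instance, which is how this pattern preserves uniqueness constraints that default serialization would break.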

Is implementing java.io.Serializable even in simple POJO Java classes a best practice?

In general, is it a best practice to have simple POJO Java classes implement java.io.Serializable?
Generally not. Joshua Bloch says to implement Serializable judiciously. A summary of drawbacks that he describes:
decreases flexibility of changing class implementation later - the serialized form is part of the class's API
makes some bugs and security holes more likely - an attacker can access class internals within the serialized byte stream
increases test burden - now you have to test serialization!
burdens authors of subclasses - they have to make their subclasses Serializable too
Of course, sometimes you need a POJO to implement Serializable, say for RMI, but if the need isn't there, your code will be simpler and more secure without it.
Only if you need to be able to serialise them. It's not worth the effort otherwise.
It depends more on the needs. In the context of web applications, some web servers (eg. Tomcat 6) even make it mandatory to serialize the classes whose objects we store in sessions.
One thing I've done to address the fact that the serialized form is not backwards compatible (say when dynamically reloading a class on a running system), is load the fields I want to save into a hashmap and then serializing that. That way, I can always deserialize in the data, even if there are missing fields. You might have to provide defaults for missing keys, but it's better than messing up field order.
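Roughly what that hashmap approach could look like. Settings and MapDemo are hypothetical names, and the keys and default values are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;

// Not Serializable itself; its state is saved as a map of field name to value.
final class Settings {
    String host = "localhost";
    int port = 8080;

    Map<String, Object> toMap() {
        Map<String, Object> map = new HashMap<>();
        map.put("host", host);
        map.put("port", port);
        return map;
    }

    // Missing keys fall back to the defaults of a fresh instance.
    static Settings fromMap(Map<String, Object> map) {
        Settings settings = new Settings();
        settings.host = (String) map.getOrDefault("host", settings.host);
        settings.port = (Integer) map.getOrDefault("port", settings.port);
        return settings;
    }
}

final class MapDemo {
    // Serializes a plain HashMap, which is Serializable as long as its contents are.
    @SuppressWarnings("unchecked")
    static Map<String, Object> roundTrip(Map<String, Object> map) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(new HashMap<>(map));
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Map<String, Object>) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Simulates data saved by an older version that had no "port" field yet.
        Map<String, Object> oldData = new HashMap<>();
        oldData.put("host", "example.org");

        Settings restored = Settings.fromMap(roundTrip(oldData));
        System.out.println(restored.host + ":" + restored.port);
    }
}
```

The trade-off is that you give up type checking on the stored values and have to keep the key names stable yourself.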
