How is encapsulation broken by accepting default serialization? - java

I often hear people saying that Serialization breaks encapsulation and this loss of encapsulation can be somewhat minimized by providing custom serialization. Can someone provide a concrete example that justifies the loss of encapsulation due to default serialization and how can this loss be minimized by resorting to custom serialization?
I am tagging this question as Java related but the answer can be language agnostic as I think this is a common problem across platforms and languages.

Excellent question! First, let's get a definition for encapsulation and go from there. This Wikipedia article defines encapsulation in the following way:
A language mechanism for restricting access to some of the object's components.
A language construct that facilitates the bundling of data with the methods (or other functions) operating on that data.
Serialization, at least the way Java does it, has ramifications for both of these notions. When you implement the Serializable interface in Java, you are essentially telling the JVM that all of your non-transient, non-static member variables (their names and types) define the contract by which objects can be reconstructed from a byte stream. This works recursively only if the classes of all the objects those member variables refer to also implement Serializable, and this is where you can get into trouble.
The Encapsulation Problem
Based on the previous definition of encapsulation, particularly the first item, encapsulation prevents you from knowing anything about how the object you are dealing with actually works under the hood, with respect to its member variables. Implementing Serializable "correctly" forces you as a developer to know more about the objects you are dealing with than you probably care about in the functional sense. In this sense, implementing Serializable directly opposes encapsulation.
Custom Serialization
In every case, serialization requires knowledge about what data constitutes an "object" of a particular type. Java's Serializable interface takes this to the extreme by forcing you to know the transient state of every member variable of every Object you hope to serialize. You could get around this by defining a serialization mechanism external to the types that need to be serialized, but there will be design tradeoffs - e.g. you'd probably need to deal with Objects at the level of the interface(s) they implement instead of direct interaction with their member variables, and you may lose some of the ability to reconstruct the exact Object type from a serialized byte stream.

Java default serialization writes and reads objects field by field. In this way it exposes the object's internal structure, which breaks encapsulation. If you change the class's internal structure, you might not be able to restore the object state correctly. With custom serialization, if you change the class you can also change readObject so that previously saved objects can still be restored correctly.
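A minimal sketch of that idea, assuming a made-up class Label whose internal char[] is an implementation detail. The class and helper names are hypothetical; only writeObject, readObject, defaultWriteObject and defaultReadObject come from the standard serialization mechanism. The stream stores the logical value (one string), so the internal field can change later without invalidating saved objects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class CustomFormDemo {

    // Hypothetical class: callers see text(), never the char[] behind it.
    static final class Label implements Serializable {
        private static final long serialVersionUID = 1L;

        // transient: default serialization skips this internal field.
        private transient char[] chars;

        Label(String text) {
            this.chars = text.toCharArray();
        }

        String text() {
            return new String(chars);
        }

        // Write the logical value (one string) instead of the field layout.
        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
            out.writeUTF(text());
        }

        // Rebuild whatever internal representation the current class uses.
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            this.chars = in.readUTF().toCharArray();
        }
    }

    // Serializes to an in-memory buffer and reads the copy back.
    static Label roundTrip(Label original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Label) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new Label("hello")).text());
    }
}
```

If Label later switched to, say, a StringBuilder internally, only the two private methods would need to change; the bytes already on disk would stay readable.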

Related

Not serializable class with strings only [duplicate]

We work heavily with serialization and having to specify Serializable tag on every object we use is kind of a burden. Especially when it's a 3rd-party class that we can't really change.
The question is: since Serializable is an empty interface and Java provides robust serialization once you add implements Serializable - why didn't they make everything serializable and that's it?
What am I missing?
Serialization is fraught with pitfalls. Automatic serialization support of this form makes the class internals part of the public API (which is why javadoc gives you the persisted forms of classes).
For long-term persistence, the class must be able to decode this form, which restricts the changes you can make to class design. This breaks encapsulation.
Serialization can also lead to security problems. By being able to serialize any object it has a reference to, a class can access data it would not normally be able to (by parsing the resultant byte data).
There are other issues, such as the serialized form of inner classes not being well defined.
Making all classes serializable would exacerbate these problems. Check out Effective Java Second Edition, in particular Item 74: Implement Serializable judiciously.
I think both Java and .Net people got it wrong this time around. It would have been better to make everything serializable by default and only require marking those classes that can't be safely serialized.
For example, in Smalltalk (a language created in the 70s) every object is serializable by default. I have no idea why this is not the case in Java, considering that the vast majority of objects are safe to serialize and just a few of them aren't.
Marking an object as serializable (with an interface) doesn't magically make that object serializable. It was serializable all along; it's just that now you have expressed something that the system could have found out on its own. So I see no really good reason for serialization being the way it is now.
I think it was either a poor decision made by designers or serialization was an afterthought, or the platform was never ready to do serialization by default on all objects safely and consistently.
Not everything is genuinely serializable. Take a network socket connection, for example. You could serialize the data/state of your socket object, but the essence of an active connection would be lost.
The main role of Serializable in Java is to actually make, by default, all other objects nonserializable. Serialization is a very dangerous mechanism, especially in its default implementation. Hence, like friendship in C++, it is off by default, even if it costs a little to make things serializable.
Serialization adds constraints and potential problems, since structural compatibility is not ensured. It is good that it is off by default.
I have to admit that I have seen very few nontrivial classes where standard serialization does what I want it to, especially in the case of complex data structures. So the effort you'd spend making the class properly serializable dwarfs the cost of adding the interface.
For some classes, especially those that represent something more physical like a File, a Socket, a Thread, or a DB connection, it makes absolutely no sense to serialize instances. For many others, Serialization may be problematic because it destroys uniqueness constraints or simply forces you to deal with instances of different versions of a class, which you may not want to.
Arguably, it might have been better to make everything Serializable by default and make classes non-serializable through a keyword or marker interface - but then, those who should use that option probably would not think about it. The way it is, if you need to implement Serializable, you'll be told so by an Exception.
I think the thought was to make sure you, as the programmer, know that your object may be serialized.
Apparently everything was serializable in some preliminary designs, but because of security and correctness concerns the final design ended up as we all know.
Source: Why must classes implement Serializable in order to be written to an ObjectOutputStream?.
By making you state explicitly that instances of a certain class are Serializable, the language forces you to think about whether you should allow that. For simple value objects serialization is trivial, but in more complex cases you need to really think things through.
By just relying on the standard serialization support of the JVM you expose yourself to all kinds of nasty versioning issues.
Uniqueness, references to 'real' resources, timers and lots of other types of artifacts are NOT candidates for serialization.
Read this to understand the Serializable interface, why we should make only a few classes Serializable, and where to use the transient keyword when we want to exclude some fields from being stored.
http://www.codingeek.com/java/io/object-streams-serialization-deserialization-java-example-serializable-interface/
Well, my answer is that this is for no good reason. And from your comments I can see that you've already learned that. Other languages happily try serializing everything that doesn't jump on a tree after you've counted to 10. An Object should default to be serializable.
So, what you basically need to do is read all the properties of your 3rd-party class yourself. Or, if that's an option for you: decompile, put the damn keyword there, and recompile.
There are some things in Java that simply cannot be serialized because they are runtime specific. Things like streams, threads, runtime, etc. and even some GUI classes (which are connected to the underlying OS) cannot be serialized.
While I agree with the points made in other answers here, the real problem is with deserialisation: If the class definition changes then there's a real risk the deserialisation won't work. Never modifying existing fields is a pretty major commitment for the author of a library to make! Maintaining API compatibility is enough of a chore as it is.
A class which needs to be persisted to a file or other media has to implement the Serializable interface, so that the JVM allows objects of the class to be serialized.
So why isn't the Object class itself Serializable, so that no class would need to implement the interface? After all, the JVM serializes an object only when I use ObjectOutputStream, which means the control is still in my hands.
The reason the Object class is not serializable by default is that class versioning is a major issue. Therefore each class that is interested in serialization has to be marked as Serializable explicitly and should provide a version number, serialVersionUID.
If serialVersionUID is not declared, the runtime computes one from the structure of the class, so even small changes to the class can give unexpected results when deserializing the object; the JVM throws InvalidClassException if the serialVersionUID values don't match. Therefore every class has to implement the Serializable interface and should declare serialVersionUID to make sure the class definitions at both ends are compatible.
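As a small illustration of the version number, here is a sketch that asks the runtime which serialVersionUID it associates with a class. The classes Declared and Computed and the helper uidOf are hypothetical names; ObjectStreamClass.lookup and getSerialVersionUID are standard java.io calls.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class VersionUidDemo {

    // Hypothetical class with an explicitly declared version number.
    static class Declared implements Serializable {
        private static final long serialVersionUID = 42L;
        String name;
    }

    // Hypothetical class without one: the runtime derives a value
    // from the structure of the class instead.
    static class Computed implements Serializable {
        String name;
    }

    // Returns the version number the serialization runtime uses for a class.
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("declared: " + uidOf(Declared.class));
        System.out.println("computed: " + uidOf(Computed.class));
    }
}
```

The declared value stays fixed at 42 however the class evolves, while the computed one is tied to the class's current shape, which is why an undeclared serialVersionUID tends to break old streams after refactoring.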

Side effects of using Serializable?

Reviewing server logs I encountered NotSerializableException for a domain object during some RMI cache transfer function. I noticed that a domain object doesn't implement Serializable interface; however I am a bit sceptical about implementing Serializable as I have no idea about its possible side effects. Would it break at some point?
If there are no side effects, why aren't all objects Serializable by default?
Implementing Serializable has no side-effects ... apart from the obvious one of making the serialization mechanism consider serializing it.
(Of course, the fact that you implement the Serializable interface doesn't necessarily mean that serialization will work. For example, if your class has instance fields that are not serializable, and those fields are not declared as transient, then the normal serialization mechanism will fail.)
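A minimal sketch of that failure mode, using hypothetical classes: Connection stands in for any non-serializable type, Broken holds it in an ordinary field, and Fixed marks the field transient. The helper canSerialize is also made up for the illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class TransientFieldDemo {

    // Hypothetical resource type that does not implement Serializable.
    static class Connection { }

    // Implements Serializable, but one instance field is not serializable.
    static class Broken implements Serializable {
        private static final long serialVersionUID = 1L;
        Connection conn = new Connection();
    }

    // Same shape, but the problematic field is excluded with transient.
    static class Fixed implements Serializable {
        private static final long serialVersionUID = 1L;
        transient Connection conn = new Connection();
    }

    // Returns false if writing the object raises NotSerializableException.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out =
                new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Broken: " + canSerialize(new Broken()));
        System.out.println("Fixed:  " + canSerialize(new Fixed()));
    }
}
```

Note that a transient field comes back as null after deserialization, so the class still has to decide how to re-establish it.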
If there are no side effects, why aren't all objects Serializable by default?
One reason is that some objects have state that cannot be captured and represented by serialization. Examples include all kinds of Streams that are connected to data sources or sinks outside of the JVM, Java threads, and Java processes.
A second reason is that (arguably) the programmer should decide whether it is appropriate for a class to be serializable. Examples where it might be inappropriate include classes that hold sensitive information or classes whose internals are liable to change, making deserialization problematic [1].
[1] It is possible to deal with this, to a degree, but the programmer may want to say "I don't want to be forced to deal with this" for a class that he or she thinks should not be serialized.

When is a reference to the object class required?

What is the function of the class Object in Java? All the "objects" of any user-defined class have the same functions as the aforementioned class. So why did the creators of Java create this class?
In which situations should one use the class 'Object'?
Since all classes in Java are obligated to derive (directly or indirectly) from Object, it allows for a default implementation for a number of behaviours that are needed or useful for all objects (e.g. conversion to a string, or a hash generation function).
Furthermore, having all objects in the system with a common lineage allows one to work with objects in a general sense. This is very useful for developing all sorts of general applications and utilities. For example, you can build a general purpose cache utility that works with any possible object, without requiring users to implement a special interface.
Pretty much the only time that Object is used raw is when it's used as a lock object (as in Object foo = new Object(); synchronized(foo){...}). The ability to use an object as the subject of a synchronized block is built in to Object, and there's no point in using anything more heavyweight there.
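The lock-object idiom can be sketched like this. Counter and countWithThreads are hypothetical names chosen for the illustration; the only point is that a plain Object is enough to guard a synchronized block.

```java
public class LockDemo {

    // Hypothetical counter guarded by a private, raw Object used only as a lock.
    static final class Counter {
        private final Object lock = new Object();
        private int count;

        void increment() {
            synchronized (lock) {
                count++;
            }
        }

        int get() {
            synchronized (lock) {
                return count;
            }
        }
    }

    // Runs several threads against one counter and returns the final value.
    static int countWithThreads(int threads, int perThread) {
        Counter counter = new Counter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.increment();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 1000));
    }
}
```

Keeping the lock private means no outside code can synchronize on it, which would not be true if the counter locked on itself.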
Object provides an interface with functionality that the Java language designers felt all Java objects should provide. You can use Object when you don't know the subtype of a class, and just want to treat it in a generic manner. This was especially important before the Java language had generics support.
There's an interesting post on programmers.stackexchange.com about why this choice was made for .NET, and those decisions most likely hold relevance for the Java language.
What Java implements is sometimes called a "cosmic hierarchy". It means that all classes in Java share a common root.
This has merit by itself, for use in "generic" containers. Without templates or language supported generics these would be harder to implement.
It also provides some basic behaviour that all classes automatically share, like the toString method.
Back in 1996, having this common superclass was seen as a bit of a novelty and a cool thing that helped Java get popular (although there were proponents for this cosmic hierarchy as well).

Is it safe to use bytecode enhancement techniques on classes that might be serialized and why?

I haven't tried this yet, but it seems risky. The case I'm thinking of is instrumenting simple VO classes with JiBX. These VOs are going to be serialized over AMF and possibly other schemes. Can anyone confirm or deny my suspicions that doing behind-the-back stuff like bytecode enhancement might mess something up in general, and provide some background information as to why? Also, I'm interested in the specific case of JiBX.
Behind the scenes, serialization uses reflection. Your bytecode manipulation is presumably adding fields. So, unless you mark these fields as transient, they will get serialised just like normal fields.
So, provided you have performed the same bytecode manipulation on both sides, you'll be fine.
If you haven't you'll need to read the serialisation documentation to understand how the backwards compatibility features work. Essentially, I think you can send fields that aren't expected by the receiver and you're fine; and you can miss out fields and they'll get their default values on the receiving end. But you should check this in the spec!
If you're just adding methods, then they have no effect on serialisation, unless they are things like readResolve(), etc. which are specifically used by the serialisation mechanism.
Adding, changing or removing public or protected fields or methods in a class will affect its ability to be deserialized, as will adding interfaces. If the class does not declare a serialVersionUID, these are used, among other things, to generate one, which is written to the stream as part of the serialization process. If the serialVersionUID in the stream doesn't match that of the loaded class during deserialization, then it will fail.
If you explicitly set the serialVersionUID in your class definition you can get around this. You may want to implement readObject and writeObject as well.
In the extreme case you can implement Externalizable and have full control of all serialization of the object.
Absolute worst case scenario (though incredibly useful in some situations) is to implement writeReplace on a complex object to swap it out with a sort of simpler value object in serialization. Then in deserialization the simpler value object can implement readResolve to either rebuild or locate the complex object on the other side. It's rare when you need to pull that out, but awfully fun when you do.
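A rough sketch of the writeReplace / readResolve swap, assuming a hypothetical Range class as the "complex" object and a nested Proxy as the simpler value object. The names are invented; writeReplace, readResolve and readObject are the hook methods the serialization mechanism looks for.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ProxyDemo {

    // Hypothetical class with an invariant: low <= high.
    static final class Range implements Serializable {
        private static final long serialVersionUID = 1L;
        private final int low;
        private final int high;

        Range(int low, int high) {
            if (low > high) {
                throw new IllegalArgumentException("low > high");
            }
            this.low = low;
            this.high = high;
        }

        int low() { return low; }
        int high() { return high; }

        // Serialization writes the proxy instead of this object.
        private Object writeReplace() {
            return new Proxy(low, high);
        }

        // A stream that claims to contain a raw Range is rejected.
        private void readObject(ObjectInputStream in) throws InvalidObjectException {
            throw new InvalidObjectException("proxy required");
        }

        // The simpler value object that actually travels in the stream.
        private static final class Proxy implements Serializable {
            private static final long serialVersionUID = 1L;
            private final int low;
            private final int high;

            Proxy(int low, int high) {
                this.low = low;
                this.high = high;
            }

            // Deserialization swaps the proxy for a Range built by the constructor.
            private Object readResolve() {
                return new Range(low, high);
            }
        }
    }

    // Serializes to an in-memory buffer and reads the copy back.
    static Object roundTrip(Object original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Range copy = (Range) roundTrip(new Range(1, 5));
        System.out.println(copy.low() + ".." + copy.high());
    }
}
```

Because the Range is rebuilt through its constructor, the low <= high check runs again on the receiving side.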

Difference between serializing and deserializing and writing internals to a file and then reading them and passing them in constructor

Let's say we have a class:
import java.io.Serializable;
import java.util.Date;
class A implements Serializable {
    String s;
    int i;
    Date d;
    public A() {
    }
    public A(String s, int i, Date d) {
        this.s = s;
        this.i = i;
        this.d = d;
    }
}
Now let's say that, as one approach, I store all the internal values s, i and d in a file, read them back, pass them to the constructor and create a new object. As a second approach, I serialize the object and then deserialize it into a new object. What is the basic difference between the two approaches?
I know serialization will be slow and secure, and the other approach is not. Are there any other differences?
Read this article, which explains pretty well what serialization is about (it is for Java RMI, but the serialization explanation and problems are the same): http://oreilly.com/catalog/javarmi/chapter/ch10.html
The main differences I see are:
(As the other answers say) you are responsible for serializing and deserializing. What is going to happen when one of the properties is another big complex class? What are you going to do then? Save its value as well?
Serialization depends on reflection, while the file thing depends on getters/setters/constructors. With reflection you don't need public setters/getters or a constructor with parameters. With the file thing you need them.
Extracted from the link above:
Using Serialization
Serialization is a mechanism built into the core Java libraries for writing a graph of objects into a stream of data. This stream of data can then be programmatically manipulated, and a deep copy of the objects can be made by reversing the process. This reversal is often called deserialization.
In particular, there are three main uses of serialization:
As a persistence mechanism. If the stream being used is FileOutputStream, then the data will automatically be written to a file.
As a copy mechanism. If the stream being used is ByteArrayOutputStream, then the data will be written to a byte array in memory. This byte array can then be used to create duplicates of the original objects.
As a communication mechanism. If the stream being used comes from a socket, then the data will automatically be sent over the wire to the receiving socket, at which point another program will decide what to do.
The important thing to note is that the use of serialization is independent of the serialization algorithm itself. If we have a serializable class, we can save it to a file or make a copy of it simply by changing the way we use the output of the serialization mechanism.
In your first approach, you are responsible for maintaining the logical relationship between the data values (in the sense that you store the data and then read it back and construct the object back).
In the second approach, Java does this for you behind the scenes.
Serialization and Deserialization in Java
Serialization is a process by which we can store the state of an object into any storage medium. We can store the state of the object into a file, into a database table etc. Deserialization is the opposite process of serialization where we retrieve the object back from the storage medium.
Eg1: Assume you have a Java bean object and its variables are having some values. Now you want to store this object into a file or into a database table. This can be achieved using serialization. Now you can retrieve this object again from the file or database at any point of time when you need it. This can be achieved using deserialization: (Post by Bobin Goswami).
No real difference, other than that you are implementing a custom serialization scheme, so that will typically involve more code, since default serialization requires just an interface declaration.
You can achieve something very similar with Externalizable - you are in control of exactly what data is saved, so you can choose to save just the constructor arguments and construct the object from that. (You could achieve this also with serialization by marking non-constructor arguments as transient.)
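For illustration, here is a sketch of the Externalizable route applied to a class shaped like the A from the question. The describe method and the roundTrip helper are additions made up for the demo, and the sketch assumes s and d are never null.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.util.Date;

public class ExternalizableDemo {

    public static class A implements Externalizable {
        private String s;
        private int i;
        private Date d;

        // Externalizable requires a public no-argument constructor.
        public A() { }

        public A(String s, int i, Date d) {
            this.s = s;
            this.i = i;
            this.d = d;
        }

        // Write exactly the constructor arguments (assumes s and d are non-null).
        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeUTF(s);
            out.writeInt(i);
            out.writeLong(d.getTime());
        }

        // Read them back in the same order they were written.
        @Override
        public void readExternal(ObjectInput in) throws IOException {
            s = in.readUTF();
            i = in.readInt();
            d = new Date(in.readLong());
        }

        // Hypothetical helper so the demo can show the restored state.
        public String describe() {
            return s + "/" + i + "/" + d.getTime();
        }
    }

    // Serializes to an in-memory buffer and reads the copy back.
    static A roundTrip(A original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (A) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new A("x", 7, new Date(0L))).describe());
    }
}
```

The trade-off is that the class now owns its wire format completely: adding a field means updating both methods and deciding how to read older data.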
The section on Serialization in Joshua Bloch's Effective Java, 2nd Ed. is really a good read on this subject. Something that is very important to keep in mind:
Using your own homegrown persistence method is intralinguistic. When you read data back from a store, you control how an object's state is restored. Very often this is with constructors and/or static factories. The invariants of the object's state are preserved. Encapsulation is maintained because you don't necessarily need to disclose implementation details as part of the custom store. The downside, of course, is that data very often needs to go places and @pakore nicely outlined those situations in which serialization is useful.
Serialization is an extralinguistic mechanism. Bloch makes compelling arguments for why serialization (in particular, the Serializable interface) should be invoked only with the greatest of care. Serialization can bypass constructors because reconstitution of objects does not depend on one. There are profound possible security concerns. The invariants of your object's state are vulnerable. Moreover, using Serializable tends to lock you into supporting a particular class implementation (i.e., it destroys encapsulation) because much of your object's state becomes part of the class's exported API once it becomes Serializable (this can be proactively deferred by marking certain instance fields as transient).
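The point about bypassing constructors can be observed directly. In this sketch the Account class, its static call counter and the helper method are all hypothetical; the counter is static, so it is not part of the serialized state.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ConstructorBypassDemo {

    // Hypothetical class whose constructor enforces an invariant.
    static final class Account implements Serializable {
        private static final long serialVersionUID = 1L;

        // Counts how often the constructor body actually runs.
        static int constructorCalls = 0;

        private final int balance;

        Account(int balance) {
            constructorCalls++;
            if (balance < 0) {
                throw new IllegalArgumentException("negative balance");
            }
            this.balance = balance;
        }

        int balance() { return balance; }
    }

    // Creates one Account, round-trips it, and reports the constructor count.
    static int constructorCallsForRoundTrip() {
        Account.constructorCalls = 0;
        Account original = new Account(100);
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                Account copy = (Account) in.readObject();
                if (copy.balance() != 100) {
                    throw new IllegalStateException("state was not restored");
                }
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        // Two Account objects now exist, but the constructor ran only once.
        return Account.constructorCalls;
    }

    public static void main(String[] args) {
        System.out.println("constructor calls: " + constructorCallsForRoundTrip());
    }
}
```

Since the negative-balance check never runs for the deserialized copy, a class that cares about such invariants would need to repeat the validation in a readObject method or use a serialization proxy.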
TL;DR: Serialization is a common and even fundamental aspect of modern Java-based computing. Data these days must go places, and serialization provides a commonly used mechanism for communication. Because of the vulnerabilities that serialization may introduce and because it may cause much (or all) of your object's internal state to become part of its exported API, the Serializable interface should be used with the greatest of care.
