Java Serialization


I am posting a doubt that I came across while reading Effective Java. I apologize if it's a really simple and straightforward doubt. In Item 74 - Implement Serializable judiciously, he says that even after implementing good information hiding on your class using private and package-private fields, it is prone to lose effectiveness. Whatever I read in the past said that all serialization does is convert objects into byte stream form, and that after deserialization the same object is restored. How does it lose data hiding in this process?

You could potentially have access to the value of the internal state of an object using serialization and deserialization.
By serializing an object, you might be able to read the values of the private fields that you otherwise shouldn't. Conversely, if you create a well-crafted byte array that you deserialize into an instance, you might be able to initialize it in an illegal state.

The data hiding problem with serialization in the context of OOP is pointed out by @candiru.
But there is another aspect of serialization as well.
You can send a serialized file across the network, so it can be inspected in transit and things which are supposed to be private can easily be compromised.
Below is the content of a bean class which I serialized (using the default technique). I could view the content by opening the serialized file in a text editor.
¬í sr SerializationPractice1 I ageL extrat Ljava/lang/String;L nameq ~ xp
pt SidKumarq ~ x
Now you can easily find out the following things without even knowing about the class:
Name of the class: SerializationPractice1
A String attribute named name, whose value is SidKumar
These things you can notice for sure; other details are not so clear. And the above information is correct.
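The observation above can be reproduced without a text editor. The following is a minimal sketch, assuming a hypothetical Account class with a single private field (neither the class nor the field comes from the answer); it serializes an instance with the default mechanism and searches the raw bytes for the class name, the field name and the private value.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

// Hypothetical bean; the class and field names are illustrative only.
class Account implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String password; // private, but still written to the stream

    Account(String password) {
        this.password = password;
    }
}

class PeekDemo {
    // Serializes the object with the default mechanism and returns the raw bytes as text.
    static String serializedAsText(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        // ISO-8859-1 maps every byte to exactly one char, so no byte is lost.
        return new String(bytes.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        String dump = serializedAsText(new Account("s3cret"));
        System.out.println(dump.contains("Account"));  // the class name is in the stream
        System.out.println(dump.contains("password")); // so is the field name
        System.out.println(dump.contains("s3cret"));   // and the private value itself
    }
}
```

All three checks print true, which is the same leak as in the SidKumar dump: anyone holding the bytes can read the private state.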

I do believe that serialization has the potential to expose private data to the outside world. That is where externalizing (using Externalizable instances) comes in very handy. By implementing the Externalizable interface's writeExternal(...) method, the developer has full control over the serialization process rather than relying completely on the default serialization runtime implementation. Below is a sketch of my idea (the encryption itself is left as comments, since the code is only intended to put across the broader idea):
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
public class SensitiveData implements Externalizable {
    private int sensitiveInteger;
    public SensitiveData() {} // Externalizable requires a public no-arg constructor
    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // encrypt sensitiveInteger here, then write the encrypted value
        out.writeInt(sensitiveInteger);
        // do other processing
    }
    @Override
    public void readExternal(ObjectInput in) throws IOException {
        sensitiveInteger = in.readInt(); // decrypt the value after reading it
    }
}
In fact, why just encryption? We might also want to compress the serialized bytes before writing them to a persistent store in situations where the instance to be serialized is 'big'.

Related

Serialization of Java Objects

I'm trying to evaluate the usefulness of using Serialized objects in a Java application I'm developing. I'm trying to determine if the following makes sense for an Object-serialization implementation, or if I should custom-build the transport. Here's my scenario:
Objects will be transported over TCP from one application to another.
The serialized object will be an instance of a class stored in a common library.
Sample Class:
public class Room implements Serializable {
// Instance Variables
private Room roomWithinRoom;
// ...
}
So my question is that since I will have several instance variables that reference back to the Room class, can I use Java serialization to accomplish the transfer of Room objects? If I am able to accomplish this, will the pointers be preserved?
Example:
Room mainRoom = new Room();
Room closet = new Room();
mainRoom.addRoom(closet);
If I send over the object "mainRoom," will it also serialize and send the "closet" instance (preserving the roomWithinRoom instance variable in mainRoom?)
Thanks a bunch, everyone!

Yes, Java serialization does this really well. Maybe too well. I've had great luck writing to files. With TCP, be sure to call reset after each write. Java will send a whole object graph in one go: your object, all the objects it references, all the objects they reference, and so on. One writeObject can send gigabytes of data. But without reset, if the next object you write was included in the first graph, all that will go across will be a reference to the previously sent object. Any updated data will be ignored. (Did I mention you should call reset after each writeObject?)
I'd suggest using the Externalizable interface. This gives you a bit more control over what gets written. (Actually, this is done on a class basis, so some classes can be Serializable and some Externalizable with no problem.) This means you can write in the same format even when the class changes a bit. It lets you skip data you don't need, and sometimes pass trivial objects as data primitives. I also use version numbers so newer versions of the class can read stuff written by older versions. (This is a bit more important when writing to files than with TCP.) A warning: when reading a class, be aware that the object reference you just read may not reference any data yet (that is, the referenced object has not been read in yet; you're looking at an object whose fields have not been populated).
You'll probably fumble around with this a bit, but it does work really well once you understand it.
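The reset behaviour described above is easy to check in memory. This is a minimal sketch, assuming a hypothetical Counter class and a byte array standing in for the TCP socket; it writes the same object twice, changes it in between, and reports which value the reader sees for the second copy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical mutable object that is sent more than once.
class Counter implements Serializable {
    private static final long serialVersionUID = 1L;
    int value;
}

class ResetDemo {
    // Writes one Counter twice (value 1, then value 2) and returns the value
    // the reader gets from the second readObject call.
    static int secondValueSeen(boolean callReset) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            Counter c = new Counter();
            c.value = 1;
            out.writeObject(c);
            if (callReset) {
                out.reset(); // forget which objects were already written
            }
            c.value = 2;
            out.writeObject(c);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            in.readObject(); // first copy
            return ((Counter) in.readObject()).value;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(secondValueSeen(false)); // 1: only a back-reference was sent
        System.out.println(secondValueSeen(true));  // 2: the updated object was sent again
    }
}
```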

References will be preserved within the object graph, and any nested objects will also be serialized correctly, provided they implement the Serializable interface and are not marked transient.
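To illustrate the point about references, here is a minimal sketch, assuming a simplified Room class with direct field access instead of the question's addRoom method (the parent field is a hypothetical addition to create a cycle). It round-trips the main room through a byte array and checks that the closet came along and still points back to the same copy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Simplified stand-in for the Room class in the question.
class Room implements Serializable {
    private static final long serialVersionUID = 1L;
    Room roomWithinRoom; // nested room, as in the question
    Room parent;         // hypothetical back-reference, to create a cycle
}

class GraphDemo {
    // Serializes a room to a byte array and reads it back as a new graph.
    static Room roundTrip(Room room) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(room);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Room) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Room mainRoom = new Room();
        Room closet = new Room();
        mainRoom.roomWithinRoom = closet;
        closet.parent = mainRoom;

        Room copy = roundTrip(mainRoom);
        System.out.println(copy.roomWithinRoom != null);        // true: the closet was sent too
        System.out.println(copy.roomWithinRoom.parent == copy); // true: the cycle is intact
    }
}
```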
I advise against built in java serialization as it is more verbose than other binary protocols. Also there is always the potential that serialization/deserialization routines could change between runtimes.
For these reasons, I suggest ProtoBufs. ProtoBuf is small, fast, and can even be used to quickly build serialization/deserialization routines in languages other than Java, given you find a protobuf "compiler" for that language.

Write object with transient attributes to stream (Java)

I want to write an object into a stream (or byte array) with its transient attributes to be able to reconstruct it in another VM. I don't want to modify its attributes because that object is a part of legacy application.
Standard Java serialization mechanism doesn't help. What other options do I have?
Update:
The reason I'm asking the question is that I want to modify an existing Spring application. It called a bean's method in-process earlier but now I want to move the bean on a separate machine and use Spring remoting through HTTP invoker. And I have a problem with parameters that have transient fields that need to be passed to this method but not needed to be serialized in other parts of the app.

Hmm - if an attribute is marked as transient, that means exactly that it's not meant to be considered part of the object's persistent state, e.g. for serialization. The fact that you want to do this at all is a code smell, and the correct solution is to stop those fields being transient.
Let's say though that for whatever reason you can't modify the target classes themselves. My first thought was that you could customise the serialisation by implementing readObject() and writeObject() methods, but that would also require changes to the target class.
In that case, you'll need to work with some kind of reflection-based or metadata-based API in order to do this. There are many libraries that will convert objects to and from XML or JSON or DB rows, etc. Your best bet would be to use one of these to convert the object to and from "hydrated" form (and likely you'll need to customise them, as any sane serialiser will ignore transient fields). Which one to pick depends on your current software stack, and your precise requirements.

I assume you cannot change the legacy code. In this case I think you will have to resort to going over the object fields with reflection and DataOutputStream.
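A rough sketch of that reflection approach follows. Everything in it is an assumption made for illustration: the Legacy class stands in for the unmodifiable legacy type, the name/value wire format is made up, and only non-null String and int fields are handled. It is not a general-purpose serializer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical stand-in for the legacy class that cannot be changed.
class Legacy {
    private transient int cacheSize;
    private transient String label;

    Legacy(int cacheSize, String label) {
        this.cacheSize = cacheSize;
        this.label = label;
    }

    int cacheSize() { return cacheSize; }
    String label() { return label; }
}

class TransientFields {
    // Writes (field name, value) pairs for every transient int or String field.
    static byte[] write(Object target) throws IOException, IllegalAccessException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (Field f : target.getClass().getDeclaredFields()) {
                int mods = f.getModifiers();
                if (!Modifier.isTransient(mods) || Modifier.isStatic(mods)) {
                    continue;
                }
                f.setAccessible(true); // the fields are private
                if (f.getType() == int.class) {
                    out.writeUTF(f.getName());
                    out.writeInt(f.getInt(target));
                } else if (f.getType() == String.class) {
                    out.writeUTF(f.getName());
                    out.writeUTF((String) f.get(target)); // assumes a non-null value
                }
            }
        }
        return bytes.toByteArray();
    }

    // Reads the pairs back and assigns them to the matching fields of target.
    static void read(byte[] data, Object target)
            throws IOException, ReflectiveOperationException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (in.available() > 0) {
                Field f = target.getClass().getDeclaredField(in.readUTF());
                f.setAccessible(true);
                if (f.getType() == int.class) {
                    f.setInt(target, in.readInt());
                } else {
                    f.set(target, in.readUTF());
                }
            }
        }
    }
}
```

On the receiving side you would first rebuild the object by the normal route (which leaves the transient fields at their defaults) and then call read with the extra bytes to fill them in.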

transient variables are supposed to be those that aren't serializable or are easily recalculated.
My first suggestion is to look for methods on this object to recalculate the transient fields.

Difference between serializing and deserializing and writing internals to a file and then reading them and passing them in constructor

Let's say we have a class
import java.io.Serializable;
import java.util.Date;
class A implements Serializable {
    String s;
    int i;
    Date d;
    public A() {
    }
    public A(String s, int i, Date d) {
        this.s = s;
        this.i = i;
        this.d = d;
    }
}
Now let's say that, in the first approach, I store all the internal values s, i, d in a file, read them back, pass them to the constructor and create a new object. In the second, I serialize and then deserialize to a new object. What is the basic difference between the two approaches?
I know serialization will be slow and secure and the other approach is not. Are there any other differences?

Read this article, which explains pretty well what serialization is about (it is about Java RMI, but the serialization explanation and problems are the same): http://oreilly.com/catalog/javarmi/chapter/ch10.html
The main differences I see are:
(As the other answers say) you are responsible for serializing and deserializing. What is going to happen when one of the properties is another big, complex class? What are you going to do then? Save its values as well?
Serialization depends on reflection, while the file approach depends on getters/setters/constructors. With reflection you don't need public setters/getters or a constructor with parameters. With the file approach you need them.
Extracted from the link above:
Using Serialization
Serialization is a mechanism built into the core Java libraries for writing a graph of objects into a stream of data. This stream of data can then be programmatically manipulated, and a deep copy of the objects can be made by reversing the process. This reversal is often called deserialization.
In particular, there are three main uses of serialization:
As a persistence mechanism. If the stream being used is FileOutputStream, then the data will automatically be written to a file.
As a copy mechanism. If the stream being used is ByteArrayOutputStream, then the data will be written to a byte array in memory. This byte array can then be used to create duplicates of the original objects.
As a communication mechanism. If the stream being used comes from a socket, then the data will automatically be sent over the wire to the receiving socket, at which point another program will decide what to do.
The important thing to note is that the use of serialization is independent of the serialization algorithm itself. If we have a serializable class, we can save it to a file or make a copy of it simply by changing the way we use the output of the serialization mechanism.
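The "copy mechanism" use mentioned in the quoted text can be sketched as follows. The deepCopy helper name is made up for this example; it writes the object graph to a ByteArrayOutputStream and reads it straight back, which yields an independent copy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

class CopyDemo {
    // Hypothetical helper: copies a serializable object graph via an in-memory byte array.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T deepCopy(T original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ArrayList<StringBuilder> list = new ArrayList<>();
        list.add(new StringBuilder("a"));

        ArrayList<StringBuilder> copy = deepCopy(list);
        copy.get(0).append("b"); // changes only the copy's element

        System.out.println(list.get(0)); // a
        System.out.println(copy.get(0)); // ab
    }
}
```

Swapping the byte array streams for file or socket streams gives the persistence and communication uses without touching the class being serialized.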

In your first approach, you are responsible for maintaining the logical relationship between the data values (in the sense that you store the data and then read it back and construct the object back).
In the second approach, Java does this for you behind the scenes.

Serialization and Deserialization in Java
Serialization is a process by which we can store the state of an object into any storage medium. We can store the state of the object into a file, into a database table etc. Deserialization is the opposite process of serialization where we retrieve the object back from the storage medium.
Eg1: Assume you have a Java bean object and its variables are having some values. Now you want to store this object into a file or into a database table. This can be achieved using serialization. Now you can retrieve this object again from the file or database at any point of time when you need it. This can be achieved using deserialization: (Post by Bobin Goswami).

No real difference, other than that you are implementing a custom serialization scheme, which will typically involve more code, since default serialization requires just an interface declaration.
You can achieve something very similar with Externalizable - you are in control of exactly what data is saved, so you can choose to save just the constructor arguments and construct the object from that. (You could achieve this also with serialization by marking non-constructor arguments as transient.)

The section on Serialization in Joshua Bloch's Effective Java, 2nd Ed. is really a good read on this subject. Something that is very important to keep in mind:
Using your own homegrown persistence method is intralinguistic. When you read data back from a store, you control how an object's state is restored. Very often this is with constructors and/or static factories. The invariants of the object's state are preserved. Encapsulation is maintained because you don't necessarily need to disclose implementation details as part of the custom store. The downside, of course, is that data very often needs to go places, and @pakore nicely outlined those situations in which serialization is useful.
Serialization is an extralinguistic mechanism. Bloch makes compelling arguments for why serialization (in particular, the Serializable interface) should be invoked only with the greatest of care. Serialization can bypass constructors because reconstitution of objects does not depend on one. There are profound possible security concerns. The invariants of your object's state are vulnerable. Moreover, using Serializable tends to lock you into supporting a particular class implementation (i.e., it destroys encapsulation) because much of your object's state becomes part of the class's exported API once it becomes Serializable (this can be mitigated by marking certain instance fields as transient).
TL;DR: Serialization is a common and even fundamental aspect of modern Java-based computing. Data these days must go places, and serialization provides a commonly used mechanism for communication. Because of the vulnerabilities that serialization may introduce, and because it may cause much (or all) of your object's internal state to become part of its exported API, the Serializable interface should be used with the greatest of care.
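The claim that deserialization bypasses constructors can be seen with a small experiment. This is only a sketch, assuming a hypothetical Percentage class whose constructor validates its argument and counts how often it runs; it is not taken from Effective Java.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical value class whose constructor enforces an invariant.
class Percentage implements Serializable {
    private static final long serialVersionUID = 1L;
    static int constructorCalls = 0; // static fields are never serialized
    final int value;

    Percentage(int value) {
        if (value < 0 || value > 100) {
            throw new IllegalArgumentException("out of range: " + value);
        }
        constructorCalls++;
        this.value = value;
    }
}

class BypassDemo {
    // Serializes to a byte array and deserializes again.
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Percentage original = new Percentage(50);
        Percentage copy = (Percentage) roundTrip(original);
        System.out.println(copy != original);            // true: a second instance exists
        System.out.println(copy.value);                  // 50: with the same state
        System.out.println(Percentage.constructorCalls); // 1: the constructor ran only once
    }
}
```

Since the range check never runs on the way back in, a hand-edited byte stream could produce a Percentage outside 0..100, which is the invariant problem described above.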

How to cache any object type to memory/disk in java?

Is there a generic way to cache any type of object (be in a java class, or a word document etc.) to memory or disk?
Is simply serializing the object, and retaining the file extension (if it has one) enough to rebuild the object?

You seem to be using the word Object to describe two different things.
If your object is a Java object, then having that object implement Serializable is enough, provided you then use the Java methods to serialize/deserialize the object.
If you want to cache arbitrary data from the filesystem, the best way is to read it into a byte array (or ArrayList). Then you can just write the array back to disk or wherever you want it.

If you're talking about the inbuilt Java serialization, then you wouldn't even need to retain the file extension. The serialized form has enough information such that the deserialization process will produce an identical object without any additional help. I suppose that depending on how your code is structured, though, you might need to store some metadata for your own benefit so that you know what to cast the resulting Object as.
Note that Java serialization doesn't seem to fit your requirements, though - it cannot serialize any type of object, only those that implement Serializable. Perhaps you need to think a little more about what you mean by "simply serializing the object", since that's the rub.

No.
There is a class of objects which cannot be deserialized in a meaningful way. Think of an open network connection which is in the middle of transferring a file. You can not store that to disk, close your app, open your app, deserialize that connection and expect that it "just continues".
Java has an interface Serializable which indicates that an object can be serialized. It's up to you to ensure that is indeed possible. Typically an object is Serializable if all the data it holds is Serializable, or that data which is not Serializable is marked transient.
This is not to say that you could not, theoretically, dump the memory contents to a file as a byte stream, and read it back again later. You could build something like that I suppose. But to expect that it works is a different thing altogether.
In short, it is not possible to serialize any type. However, there is a generic way to serialize Java objects which are marked to be Serializable.
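The rule about non-serializable data and transient can be sketched like this. Both session classes are hypothetical, and Thread merely stands in for any resource that cannot meaningfully be written to disk, such as the open connection mentioned above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical class: the non-serializable field is transient, so it is skipped.
class Session implements Serializable {
    private static final long serialVersionUID = 1L;
    String user = "alice";
    transient Thread worker = new Thread(() -> { });
}

// Hypothetical class: same field without transient, so writing it fails.
class BrokenSession implements Serializable {
    private static final long serialVersionUID = 1L;
    String user = "alice";
    Thread worker = new Thread(() -> { });
}

class TransientDemo {
    // Returns false if the default mechanism rejects some object in the graph.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false; // thrown for the first non-serializable object encountered
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new Session()));       // true
        System.out.println(canSerialize(new BrokenSession())); // false
    }
}
```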

Not sure what you mean by "or a word document". Serialization can be used for disk caching; I'm not sure what the purpose of using it in memory would be, since it would probably be far faster to simply keep the original object.
A more robust solution might be ehcache; it can manage the size of the cache as well as move it between memory and disk.

If you're wondering about the cross platform (disk or memory) persistence part of the question, look at Java's Preferences class.

My, what a lot of answers!
Any object can make itself serializable by implementing java.io.Serializable.
But:
A default serialiser is implemented in ObjectOutputStream, which simply walks the object tree. This is fine for simple javabean type objects, but it can have undesirable effects such as system objects being serialised (I once inspected a serialised java object file and found that it was including all of the system timezone objects). And, of course, if your object has objects inside it that are not serializable (and not transient), then ObjectOutputStream will throw an exception.
(Actually, even for JavaBean objects the default serializer is awful - it emits the class name of java.lang.String for every string field.)
So if your object is complicated, then you really should implement Externalizable and write a serialiser and deserializer with some smarts.
http://download.oracle.com/javase/6/docs/platform/serialization/spec/serial-arch.html#7185
So basically - no, you can't serialise any old object. You have to design objects that are intended to be serialised and, ideally, that have some smarts about how they get themselves to and from a stream.

You cannot serialize just any object in Java. Moreover, Java serialization follows the references an object holds (a deep traversal rather than a shallow copy), so if you want to serialize something like a HashMap, its keys and values must be serializable too, or your data will not be saved.

How does Java's serialization work and when it should be used instead of some other persistence technique?

I've been lately trying to learn more and generally test Java's serialization for both work and personal projects and I must say that the more I know about it, the less I like it. This may be caused by misinformation though so that's why I'm asking these two things from you all:
1: On byte level, how does serialization know how to match serialized values with some class?
One of my problems right here is that I made a small test with an ArrayList containing the values "one", "two", "three". After serialization the byte array took 78 bytes, which seems an awful lot for such a small amount of information (19+3+3+4 bytes). Granted, there's bound to be some overhead, but this leads to my second question:
2: Can serialization be considered a good method for persisting objects at all? Now obviously if I'd use some homemade XML format the persistence data would be something like this
<object>
<class="java.util.ArrayList">
<!-- Object array inside Arraylist is called elementData -->
<field name="elementData">
<value>One</value>
<value>Two</value>
<value>Three</value>
</field>
</object>
which, like XML in general, is a bit bloated and takes 138 bytes(without whitespaces, that is). The same in JSON could be
{
"java.util.ArrayList": {
"elementData": [
"one",
"two",
"three"
]
}
}
which is 75 bytes so already slightly smaller than Java's serialization. With these text-based formats it's of course obvious that there has to be a way to represent your basic data as text, numbers or any combination of both.
So to recap, how does serialization work on byte/bit level, when it should be used and when it shouldn't be used and what are real benefits of serialization besides that it comes standard in Java?

I would personally try to avoid Java's "built-in" serialization:
It's not portable to other platforms
It's not hugely efficient
It's fragile - getting it to cope with multiple versions of a class is somewhat tricky. Even changing compilers can break serialization unless you're careful.
For details of what the actual bytes mean, see the Java Object Serialization Specification.
There are various alternatives, such as:
XML and JSON, as you've shown (various XML flavours, of course)
YAML
Facebook's Thrift (RPC as well as serialization)
Google Protocol Buffers
Hessian (web services as well as serialization)
Apache Avro
Your own custom format
(Disclaimer: I work for Google, and I'm doing a port of Protocol Buffers to C# as my 20% project, so clearly I think that's a good bit of technology :)
Cross-platform formats are almost always more restrictive than platform-specific formats for obvious reasons - Protocol Buffers has a pretty limited set of native types, for example - but the interoperability can be incredibly useful. You also need to consider the impact of versioning, with backward and forward compatibility, etc. The text formats are generally hand-editable, but tend to be less efficient in both space and time.
Basically, you need to look at your requirements carefully.

The main advantage of serialization is that it is extremely easy to use, relatively fast, and preserves actual Java object meshes.
But you have to realize that it's not really meant to be used for storing data, but mainly as a way for different JVM instances to communicate over a network using the RMI protocol.

See the Java Object Serialization Stream Protocol for a description of the file format and grammar used for serialized objects.
Personally I think the built-in serialization is acceptable for persisting short-lived data (e.g. storing the state of a session object between two HTTP requests) which is not relevant outside your application.
For data that has a longer lifetime or should be used outside your application, I'd persist either into a database or at least use a more commonly used format...

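How does Java's built-in serialization work?
Whenever we want to serialize an object, we implement the java.io.Serializable interface. The interface has no methods to implement; we implement it only to mark the class for the serialization runtime (it is known as a marker interface). The JVM does not add any methods to such a class. Instead, ObjectOutputStream and ObjectInputStream use reflection to write and read its non-static, non-transient fields. For a class with two String fields, name and address, the effect is roughly what the following two methods would do if the class declared them itself (a class may declare private methods with exactly these signatures to customise its serialized form):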
private void writeObject(java.io.ObjectOutputStream stream)
throws IOException {
stream.writeObject(name); // object property
stream.writeObject(address); // object property
}
private void readObject(java.io.ObjectInputStream stream)
throws IOException, ClassNotFoundException {
name = (String) stream.readObject(); // object property
address = (String) stream.readObject();// object property
}
When it should be used instead of some other persistence technique?
The built-in serialization is useful when the sender and receiver are both Java. If they are not, we use XML or JSON with the help of frameworks.
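As a sketch of how the writeObject/readObject pair shown in this answer is used in practice, here is a hypothetical Person class that declares both methods. It lets the default mechanism handle the ordinary field and writes a transient field by hand; the class, its fields and the roundTrip helper are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class that customises its serialized form.
class Person implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String name;
    private transient String address; // skipped by the default mechanism

    Person(String name, String address) {
        this.name = name;
        this.address = address;
    }

    String name() { return name; }
    String address() { return address; }

    // Called reflectively by ObjectOutputStream because of its exact signature.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();  // writes the non-transient fields (name)
        out.writeObject(address);  // then the transient field, by hand
    }

    // Called reflectively by ObjectInputStream; must mirror writeObject.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        address = (String) in.readObject();
    }
}

class HooksDemo {
    static Person roundTrip(Person p) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Person) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Person copy = roundTrip(new Person("Sid", "Main Street 1"));
        System.out.println(copy.name() + ", " + copy.address()); // Sid, Main Street 1
    }
}
```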

I bumped into this dilemma about a month ago (see the question I asked).
The main lesson I learned from it is to use Java serialization only when necessary and if there's no other option. Like Jon said, it has its downsides, while other serialization techniques are much easier, faster and more portable.

Serializing means that you put the structured data in your classes into a flat sequence of bytes in order to save it.
You should generally use techniques other than the built-in Java method. It is made to work out of the box, but if the contents or field order of your serialized classes change in the future, you get into trouble because you can no longer load the old data correctly.

The advantage of Java Object Serialization (JOS) is that it just works. There are also tools out there that do the same as JOS, but use an XML format instead of a binary format.
About the length: JOS writes some class information at the start, instead of as part of each instance - e.g. the full field names are recorded once, and an index into that list of names is used for instances of the class. This makes the output longer if you write only one instance of the class, but is more efficient if you write several (different) instances of it. It's not clear to me if your example actually uses a class, but this is the general reason why JOS is longer than one would expect.
BTW: this is incidental, but I don't think JSON records class names (as you have in your example), and so it might not do what you need.

The reason why a tiny amount of information is relatively large in serial form is that it stores information about the classes of the objects it is serialising. If you store a duplicate of your list, then you'll see that the file hasn't grown by much. Store the same object twice and the difference is tiny.
The important pros are: relatively easy to use, quite fast and can evolve (just like XML). However, the data is rather opaque, it is Java-only, tightly couples data to classes and untrusted data can easily cause DoS. You should think about the serialised form, rather than just slapping implements Serializable everywhere.
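The size behaviour described in this answer can be measured directly. The sketch below uses a made-up serializedSize helper and the three-element list from the question; it compares writing the list once, writing it plus an equal duplicate, and writing the same instance twice. Exact byte counts are left out on purpose, since they can depend on the JDK version.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SizeDemo {
    // Hypothetical helper: writes all objects to one stream and returns the byte count.
    static int serializedSize(Object... objects) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (Object o : objects) {
                out.writeObject(o);
            }
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        List<String> list = new ArrayList<>(Arrays.asList("one", "two", "three"));
        List<String> duplicate = new ArrayList<>(list); // equal content, different object

        int once = serializedSize(list);
        int withDuplicate = serializedSize(list, duplicate);
        int sameTwice = serializedSize(list, list);

        System.out.println("list once:        " + once);
        System.out.println("list + duplicate: " + withDuplicate); // well under 2 * once
        System.out.println("same list twice:  " + sameTwice);     // only a few bytes more
    }
}
```

The class description is written once per stream, so the duplicate costs far less than the first list, and the repeated instance costs only a back-reference.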

If you don't have too much data, you can save objects into a java.util.Properties object. An example of a key/value pair would be user_1234_firstname = Peter. Using reflection to save and load objects can make things easier.
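A minimal sketch of the Properties idea, using the key format from the answer. The save and loadFirstName helpers are hypothetical names, and a String stands in for the file you would normally write to.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;

class PropertiesDemo {
    // Stores one field of a user under a key such as user_1234_firstname.
    static String save(String userId, String firstName) throws IOException {
        Properties props = new Properties();
        props.setProperty("user_" + userId + "_firstname", firstName);
        StringWriter out = new StringWriter();
        props.store(out, null); // null means no header comment
        return out.toString();
    }

    // Loads the stored text again and looks up the same key.
    static String loadFirstName(String text, String userId) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props.getProperty("user_" + userId + "_firstname");
    }

    public static void main(String[] args) throws IOException {
        String text = save("1234", "Peter");
        System.out.println(loadFirstName(text, "1234")); // Peter
    }
}
```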
