I'm trying to evaluate the usefulness of serialized objects in a Java application I'm developing. I want to determine whether the following makes sense for a Java object-serialization implementation, or whether I should custom-build the transport. Here's my scenario:
Objects will be transported over TCP from one application to another.
The serialized object will be an instance of a class stored in a common library.
Sample Class:
public class Room implements Serializable {
// Instance Variables
private Room roomWithinRoom;
// ...
}
So my question is: since I will have several instance variables that reference the Room class, can I use Java serialization to transfer Room objects? If so, will the references be preserved?
Example:
Room mainRoom = new Room();
Room closet = new Room();
mainRoom.addRoom(closet);
If I send over the object "mainRoom," will it also serialize and send the "closet" instance (preserving the roomWithinRoom instance variable in mainRoom?)
Thanks a bunch, everyone!
Yes, Java serialization does this really well. Maybe too well. I've had great luck writing to files. With TCP, be sure to call reset() after each write. Java sends a whole object graph in one go: your object, all the objects it references, all the objects they reference, and so on. One writeObject call can send gigabytes of data. But without reset(), if the next object you write was already included in an earlier graph, all that goes across is a reference to the previously sent object, and any updated data is ignored. (Did I mention you should call reset() after each writeObject?)
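A minimal sketch of the back-reference behavior described above, assuming a hypothetical Counter class. It writes to a ByteArrayOutputStream instead of a socket so it runs on its own; over TCP you would wrap socket.getOutputStream() the same way.

```java
import java.io.*;

public class ResetDemo {
    // Hypothetical mutable class standing in for any object you resend.
    static class Counter implements Serializable {
        private static final long serialVersionUID = 1L;
        int value;
    }

    // Writes the same Counter twice, changing it in between, and returns
    // the two values the reader sees.
    static int[] sendTwice(boolean resetBetweenWrites) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            Counter c = new Counter();
            c.value = 1;
            out.writeObject(c);
            if (resetBetweenWrites) {
                out.reset(); // forget which objects were already written
            }
            c.value = 2;
            out.writeObject(c); // without reset, only a back-reference is written
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            Counter first = (Counter) in.readObject();
            Counter second = (Counter) in.readObject();
            return new int[] { first.value, second.value };
        }
    }

    public static void main(String[] args) throws Exception {
        int[] noReset = sendTwice(false);
        int[] withReset = sendTwice(true);
        System.out.println("without reset: " + noReset[0] + ", " + noReset[1]); // 1, 1
        System.out.println("with reset:    " + withReset[0] + ", " + withReset[1]); // 1, 2
    }
}
```

The reset marker is also seen by the reader, which clears its own table of previously read objects, so each graph written after a reset is read back as new objects.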
I'd suggest using the Externalizable interface. It gives you more control over what gets written. (This is chosen per class, so some classes can be Serializable and others Externalizable with no problem.) This means you can keep writing the same format even when the class changes a bit. It lets you skip data you don't need, and sometimes send trivial objects as primitive data. I also write version numbers so that newer versions of a class can read data written by older versions. (This matters more when writing to files than with TCP.) A warning: when reading a class, be aware that an object reference you just read may point to an object whose data has not been read in yet, so its fields may still hold their default values.
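A minimal sketch of an Externalizable class that writes a format version number first, as suggested above. The Room fields (name, area) and the version history (version 1 without area, version 2 with it) are hypothetical.

```java
import java.io.*;

public class VersionedRoomDemo {
    // Hypothetical class: format version 1 wrote only the name, version 2 added the area.
    public static class Room implements Externalizable {
        private static final long serialVersionUID = 1L;
        private static final int FORMAT_VERSION = 2;

        String name;
        double area;

        public Room() { } // Externalizable requires a public no-arg constructor

        Room(String name, double area) {
            this.name = name;
            this.area = area;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(FORMAT_VERSION); // lets later readers detect older data
            out.writeUTF(name);
            out.writeDouble(area);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            int version = in.readInt();
            name = in.readUTF();
            // Data written by version 1 has no area; fall back to a default.
            area = version >= 2 ? in.readDouble() : 0.0;
        }
    }

    static Room roundTrip(Room room) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(room);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Room) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Room copy = roundTrip(new Room("kitchen", 12.5));
        System.out.println(copy.name + " " + copy.area); // kitchen 12.5
    }
}
```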
You'll probably fumble around with this a bit, but it does work really well once you understand it.
References will be preserved within the object graph, and any nested objects will also be serialized correctly, provided their classes implement the Serializable interface and the fields holding them are not marked transient.
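Applied to the question's Room example, a sketch like the following should confirm that the closet travels with mainRoom and that shared references come back intact. The addRoom method body and the parent back-reference are assumptions added for illustration (the question does not show addRoom).

```java
import java.io.*;

public class RoomGraphDemo {
    // Simplified Room from the question; parent is a hypothetical back-reference.
    static class Room implements Serializable {
        private static final long serialVersionUID = 1L;
        private Room roomWithinRoom;
        private Room parent;

        // Hypothetical implementation of the question's addRoom method.
        void addRoom(Room inner) {
            roomWithinRoom = inner;
            inner.parent = this; // creates a cycle: mainRoom -> closet -> mainRoom
        }

        Room getRoomWithinRoom() { return roomWithinRoom; }
        Room getParent() { return parent; }
    }

    static Room roundTrip(Room room) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(room); // writes room plus everything reachable from it
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Room) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Room mainRoom = new Room();
        Room closet = new Room();
        mainRoom.addRoom(closet);

        Room copy = roundTrip(mainRoom);
        Room closetCopy = copy.getRoomWithinRoom();
        System.out.println(closetCopy != null);             // true: closet was sent too
        System.out.println(closetCopy.getParent() == copy); // true: cycle restored
        System.out.println(copy != mainRoom);               // true: a new object, not the original
    }
}
```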
I advise against built-in Java serialization, as it is more verbose than other binary protocols. There is also always the potential that the serialization/deserialization routines could change between runtimes.
For these reasons, I suggest Protocol Buffers (protobuf). Protobuf is compact and fast, and it can generate serialization/deserialization code for languages other than Java, provided you find a protobuf compiler (code generator) for that language.
Related
I don't currently understand why I would choose to serialize an object instead of just doing a file output and then having a function read that file. What do I gain from serializing an object?
If you serialize to XML, you gain an industry-standard way of reading and writing an object's data, using a W3C-approved data exchange format with readers and writers available in almost every programming language.
Serialization makes it easy to store the state of objects, and of the objects inside them (if they are Serializable and not marked transient).
The benefit in your case:
Imagine you have a lot of different classes. Coding a custom file-to-class parser for each of them is likely harder than calling readObject().
When you serialize an object, Java writes the values of its fields, and of every object it references, into a stream. When you deserialize that stream, you get a new object with the same state. It is not literally the same object (it has a new identity), but the relationships between the objects in the graph are rebuilt for you, which you would not get if you had written the properties of the object to a file and then read them back in and interpreted them.
This means that if you serialize a collection of objects that reference each other, they will still reference each other after you deserialize them. This also helps with debugging: if an exception occurs, you can serialize the relevant objects on the user's computer and, if they send you the file, deserialize it and inspect the state that may have caused the problem.
It is also easier to serialize a complex object with many properties to a stream than to build a string of representative data, which you would then have to read back, parse, and use to construct a new object.
Really, what you gain is that it is easier, quicker, and better for debugging.
I'm looking for some info on the best approach to serializing a graph of objects, given the following requirements (Java):
Two objects of the same class whose state is equal must serialize to bit-for-bit identical output. (This must not depend on JVM field ordering.)
Collections are modeled only with arrays (no java.util Collections).
All instances are immutable
Serialization format should be in byte[] format instead of text based.
I am in control of all the classes in the graph.
I don't want to put an empty constructor in the classes just to support serialization.
I have looked at implementing a solution based on my own traversal and on Objenesis, but my problem does not seem that unique, so it seems better to check for an existing, complete solution first.
Updated details:
First, thanks for your help!
Objects must serialize to exactly the same bit sequence, based on the object's state. This is important because the binary content will be digitally signed. The serialized form will be reconstructed from the state of the object, not by storing the original bits.
Interoperability between different technologies is important. I do see the software running on e.g. .NET in the future, so no Java flavour in the serialized format.
Note on the comments about immutability: the values of the arrays are copied from the constructor arguments into the internal fields. Less important.
Best regards,
Niclas Lindberg
You could write the data yourself, using reflection or hand-coded methods. I use methods that look hand-coded, except that they are generated. (You get the performance of hand-coded methods and the convenience of not having to rewrite the code when the classes change.)
Developers often talk about the built-in Java serialization, but you can write custom serialization that does whatever you want, any way you want.
To give you a more detailed answer, it would depend on what exactly you want to do.
BTW: you can serialize your data into a byte[] and still make it human-readable and editable in a text editor. All you have to do is use a binary format that looks like text. ;)
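A hand-coded writer for the requirements in the question might look roughly like this sketch. The Measurement class and the byte layout (big-endian fixed-width integers, length-prefixed UTF-8 strings, length-prefixed arrays) are assumptions for illustration. Because the field order is fixed by the code rather than by reflection, equal state gives identical bytes, the format is simple to reproduce from .NET, and reading goes through the normal constructor, so no empty constructor is needed.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CanonicalBytesDemo {
    // Hypothetical immutable class; the byte layout is fixed by toBytes(), not by JVM field order.
    static final class Measurement {
        private final String name;
        private final long timestamp;
        private final int[] samples;

        Measurement(String name, long timestamp, int[] samples) {
            this.name = name;
            this.timestamp = timestamp;
            this.samples = samples.clone(); // defensive copy keeps the instance immutable
        }

        byte[] toBytes() throws IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes); // big-endian, fixed widths
            byte[] nameUtf8 = name.getBytes(StandardCharsets.UTF_8);
            out.writeInt(nameUtf8.length); // standard UTF-8, not Java's modified UTF-8
            out.write(nameUtf8);
            out.writeLong(timestamp);
            out.writeInt(samples.length);
            for (int s : samples) {
                out.writeInt(s);
            }
            out.flush();
            return bytes.toByteArray();
        }

        static Measurement fromBytes(byte[] data) throws IOException {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            byte[] nameUtf8 = new byte[in.readInt()];
            in.readFully(nameUtf8);
            long timestamp = in.readLong();
            int[] samples = new int[in.readInt()];
            for (int i = 0; i < samples.length; i++) {
                samples[i] = in.readInt();
            }
            // Rebuild through the regular constructor, so its checks and copies apply.
            return new Measurement(new String(nameUtf8, StandardCharsets.UTF_8), timestamp, samples);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] a = new Measurement("temp", 1000L, new int[] {1, 2, 3}).toBytes();
        byte[] b = new Measurement("temp", 1000L, new int[] {1, 2, 3}).toBytes();
        System.out.println(Arrays.equals(a, b)); // true: equal state, identical bytes
        Measurement m = Measurement.fromBytes(a);
        System.out.println(m.name + " " + m.timestamp + " " + Arrays.toString(m.samples)); // temp 1000 [1, 2, 3]
    }
}
```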
Maybe you want to familiarize yourself with the serialization frameworks available for Java. A good starting point is the thrift-protobuf-compare project, whose name is misleading: it compares the performance of more than 10 ways of serializing data in Java.
It seems that your hardest constraint is interoperability between different technologies. I know that Google's Protocol Buffers and Thrift deliver here. Avro might also fit.
The important thing to know about serialization is that it is not guaranteed to be consistent across multiple versions of Java. It's not meant as a way to store data on a disk or anywhere permanent.
It's used internally to send objects from one JVM to another during RMI or other network protocols. These are the kinds of applications you should use serialization for. If this describes your problem (short-term communication between two different JVMs), then you should try to get serialization going.
If you're looking for a way to store the data more permanently, or you need the data to survive into future versions of Java, then you should find your own solution. Given your requirements, you should create some method of converting each object into a byte stream yourself and reading it back into objects. You will then be responsible for making sure the format stays forward compatible with future objects and features.
I highly recommend Chapter 11 of Effective Java by Joshua Bloch.
Is the Externalizable interface what you're looking for? You fully control how your objects are persisted, and you do it in OO style, with methods that are inherited and all (unlike the private readObject/writeObject methods used with Serializable). But you still cannot get rid of the requirement for a public no-arg constructor.
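The no-arg constructor requirement can be seen in a short sketch, assuming a hypothetical Token class whose only no-arg constructor is private. Writing succeeds, but reading fails with InvalidClassException because the runtime cannot find a public no-arg constructor to create the instance before calling readExternal.

```java
import java.io.*;

public class NoArgConstructorDemo {
    // Hypothetical Externalizable class whose only no-arg constructor is private.
    static class Token implements Externalizable {
        private static final long serialVersionUID = 1L;
        private String value;

        private Token() { } // not public, so deserialization cannot use it

        Token(String value) { this.value = value; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeUTF(value);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            value = in.readUTF();
        }
    }

    // Returns true if reading the Token back fails for lack of a public no-arg constructor.
    static boolean deserializationFails() throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new Token("abc")); // writing still succeeds
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            in.readObject();
            return false;
        } catch (InvalidClassException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(deserializationFails()); // true
    }
}
```

By contrast, a plain Serializable class does not need a no-arg constructor of its own; only its first non-serializable superclass (often just Object) needs an accessible one.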
The only way you would get this is:
A/ Use UTF-8 text, i.e. XML or JSON, with binary data encoded as base64 (the HTTP/XML-safe variety).
B/ Enforce UTF-8 binary ordering of all data.
C/ Pack the contents by removing all unescaped whitespace.
D/ Hash the content and put that hash in a standard position in the file.
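Steps A to D can be sketched in Java for the simple case of a flat map of string fields. This is only an illustration under assumptions: nested objects, arrays, and base64 encoding of binary data are not handled, the JSON escaping is minimal, SHA-256 is an arbitrary choice of hash, and where the hash goes in the file is left open.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.*;

public class CanonicalJsonDemo {
    // Step B: order keys by their UTF-8 bytes. This can differ from String.compareTo
    // (UTF-16 order) when supplementary characters meet characters in U+E000..U+FFFF.
    static final Comparator<String> UTF8_ORDER = (a, b) -> Arrays.compareUnsigned(
            a.getBytes(StandardCharsets.UTF_8), b.getBytes(StandardCharsets.UTF_8));

    // Minimal JSON string escaping for quotes, backslashes and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Steps A and C: UTF-8 JSON text with sorted keys and no insignificant whitespace.
    static byte[] canonicalJson(Map<String, String> fields) {
        List<String> keys = new ArrayList<>(fields.keySet());
        keys.sort(UTF8_ORDER);
        StringJoiner joiner = new StringJoiner(",", "{", "}");
        for (String key : keys) {
            joiner.add(quote(key) + ":" + quote(fields.get(key)));
        }
        return joiner.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Step D: hash the canonical bytes.
    static String sha256Hex(byte[] data) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        Map<String, String> first = new LinkedHashMap<>();
        first.put("b", "2");
        first.put("a", "1");
        Map<String, String> second = new LinkedHashMap<>();
        second.put("a", "1");
        second.put("b", "2");

        byte[] canonical = canonicalJson(first);
        System.out.println(new String(canonical, StandardCharsets.UTF_8)); // {"a":"1","b":"2"}
        // Insertion order does not matter: both maps give the same bytes and hash.
        System.out.println(Arrays.equals(canonical, canonicalJson(second))); // true
        System.out.println(sha256Hex(canonical));
    }
}
```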
I'm posting a question I came across while reading Effective Java. I apologize if it's a really simple and straightforward one. In Item 74, "Implement Serializable judiciously", Bloch says that even after implementing good information hiding in your class using private and package-private fields, that information hiding is prone to lose its effectiveness. What I read in the past was that all serialization does is convert objects into a byte stream, and after deserialization the same object is restored. How does data hiding get lost in this process?
You could potentially have access to the value of the internal state of an object using serialization and deserialization.
By serializing an object, you might be able to read the values of the private fields that you otherwise shouldn't. Conversely, if you create a well-crafted byte array that you deserialize into an instance, you might be able to initialize it in an illegal state.
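The "illegal state" risk follows from the fact that deserialization does not call the class's constructor. A minimal sketch, assuming a hypothetical Period class (similar in spirit to the Period example in Effective Java), counts constructor calls and re-checks the invariant in a private readObject method:

```java
import java.io.*;

public class ConstructorBypassDemo {
    static int constructorCalls = 0;

    // Hypothetical immutable class whose constructor enforces start <= end.
    static final class Period implements Serializable {
        private static final long serialVersionUID = 1L;
        private final long start;
        private final long end;

        Period(long start, long end) {
            if (start > end) {
                throw new IllegalArgumentException(start + " after " + end);
            }
            constructorCalls++;
            this.start = start;
            this.end = end;
        }

        long start() { return start; }
        long end() { return end; }

        // Deserialization does not run the constructor above, so the invariant is
        // re-checked here; a hand-crafted byte stream could otherwise yield start > end.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            if (start > end) {
                throw new InvalidObjectException(start + " after " + end);
            }
        }
    }

    static Period roundTrip(Period p) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Period) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Period copy = roundTrip(new Period(1, 5));
        System.out.println(constructorCalls);                 // 1: the copy skipped the constructor
        System.out.println(copy.start() + ".." + copy.end()); // 1..5
    }
}
```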
The data-hiding problem with serialization in the context of OOP was pointed out by #candiru.
But there is another aspect to serialization as well.
A serialized file can be sent across the network, where it can be inspected, so things that are supposed to be private can easily be compromised.
Below is the content of a bean class that I serialized (using the default technique). I could view the content by opening the serialized file in a text editor.
’ sr SerializationPractice1 I ageL extrat Ljava/lang/String;L nameq ~ xp
pt SidKumarq ~ x
Now you can easily find the following things without even knowing about the class:
Name of the class: SerializationPractice1
A String attribute called name, whose value is SidKumar
These things you can notice for sure; other details are not so clear. And the information above is correct.
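The same effect can be checked programmatically. This sketch, assuming a hypothetical Person class with a private field and no getter, serializes an instance and searches the raw bytes for the field's name and value:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class PeekDemo {
    // Hypothetical bean with a private field and no getter.
    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String name;
        Person(String name) { this.name = name; }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    // Naive byte search: does needle occur anywhere in haystack?
    static boolean contains(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = serialize(new Person("SidKumar"));
        // Both the private value and the field name appear as plain ASCII bytes.
        System.out.println(contains(bytes, "SidKumar".getBytes(StandardCharsets.US_ASCII))); // true
        System.out.println(contains(bytes, "name".getBytes(StandardCharsets.US_ASCII)));     // true
    }
}
```

Every such stream also starts with the magic bytes 0xAC 0xED (ObjectStreamConstants.STREAM_MAGIC), which is why the first characters of the dump above look like garbage in a text editor.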
I do believe that serialization has the potential to expose private data to the outside world. That is where externalization (using Externalizable instances) comes in very handy. By implementing the Externalizable interface's writeExternal(...) method, the developer has full control over the serialization process rather than relying completely on the default serialization runtime. Below is a sketch of my idea (the encryption step is left as a comment; the point is the broader idea):
import java.io.*;

public class SensitiveData implements Externalizable {
    private int sensitiveInteger;
    public SensitiveData() { } // Externalizable needs a public no-arg constructor

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // encrypt sensitiveInteger here, then write the encrypted value to any persistent store
        out.writeInt(sensitiveInteger);
        // do other processing
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        sensitiveInteger = in.readInt(); // decrypt here
    }
}
In fact, it need not be just encryption: in some situations where the instance being serialized is 'big', we might also want to compress the bytes written to the persistent store.
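The sketch below uses a different technique from the writeExternal approach above: instead of compressing inside each class, it wraps the whole object stream in GZIPOutputStream from java.util.zip. The sample data is made up.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressedSerializationDemo {
    static byte[] serialize(Serializable obj, boolean compress) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        OutputStream sink = compress ? new GZIPOutputStream(bytes) : bytes;
        try (ObjectOutputStream out = new ObjectOutputStream(sink)) {
            out.writeObject(obj);
        } // closing the ObjectOutputStream also finishes the GZIP stream
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data, boolean compressed) throws IOException, ClassNotFoundException {
        InputStream source = new ByteArrayInputStream(data);
        if (compressed) {
            source = new GZIPInputStream(source);
        }
        try (ObjectInputStream in = new ObjectInputStream(source)) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Repetitive sample data compresses well.
        ArrayList<String> big = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            big.add("row " + (i % 100));
        }
        byte[] plain = serialize(big, false);
        byte[] gzipped = serialize(big, true);
        System.out.println("plain: " + plain.length + " bytes, gzipped: " + gzipped.length + " bytes");
        System.out.println(big.equals(deserialize(gzipped, true))); // true
    }
}
```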
Let's say we have a class:
import java.io.Serializable;
import java.util.Date;

class A implements Serializable {
    String s;
    int i;
    Date d;

    public A() {
    }

    public A(String s, int i, Date d) {
        this.s = s;
        this.i = i;
        this.d = d;
    }
}
Now, one way is to store all the internal values of s, i and d in a file, read them back, pass them to the constructor and create a new object. The second way is to serialize the object and then deserialize it into a new object. What is the basic difference between the two approaches?
I know serialization will be slow and secure, while the other approach is not. Are there any other differences?
Read this article; it explains pretty well what serialization is about (it is for Java RMI, but the serialization explanation and problems are the same): http://oreilly.com/catalog/javarmi/chapter/ch10.html
The main differences I see are:
(As the other answers say) you are responsible for serializing and deserializing. What is going to happen when one of the properties is another big, complex class? What are you going to do then? Save its values as well?
Serialization depends on reflection, while the file thing depends on getters/setters/constructors. With reflection you don't need public setters/getters or a constructor with parameters. With the file thing you need them.
Extracted from the link above:
Using Serialization
Serialization is a mechanism built into the core Java libraries for writing a graph of objects into a stream of data. This stream of data can then be programmatically manipulated, and a deep copy of the objects can be made by reversing the process. This reversal is often called deserialization.
In particular, there are three main uses of serialization:
As a persistence mechanism. If the stream being used is FileOutputStream, then the data will automatically be written to a file.
As a copy mechanism. If the stream being used is ByteArrayOutputStream, then the data will be written to a byte array in memory. This byte array can then be used to create duplicates of the original objects.
As a communication mechanism. If the stream being used comes from a socket, then the data will automatically be sent over the wire to the receiving socket, at which point another program will decide what to do.
The important thing to note is that the use of serialization is independent of the serialization algorithm itself. If we have a serializable class, we can save it to a file or make a copy of it simply by changing the way we use the output of the serialization mechanism.
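The copy mechanism from the quoted list can be sketched in a few lines; the deepCopy helper name is hypothetical. Every object in the graph must be Serializable for this to work.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

public class DeepCopyDemo {
    // Copies a serializable object graph by writing it to a byte array and reading it back.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T deepCopy(T original) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ArrayList<ArrayList<String>> original = new ArrayList<>();
        original.add(new ArrayList<>(List.of("a", "b")));

        ArrayList<ArrayList<String>> copy = deepCopy(original);
        copy.get(0).add("c"); // changes only the copy's inner list

        System.out.println(original); // [[a, b]]
        System.out.println(copy);     // [[a, b, c]]
    }
}
```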
In your first approach, you are responsible for maintaining the logical relationship between the data values (in the sense that you store the data and then read it back and construct the object back).
In the second approach, Java does this for you behind the scenes.
Serialization and Deserialization in Java
Serialization is a process by which we can store the state of an object into any storage medium. We can store the state of the object into a file, into a database table etc. Deserialization is the opposite process of serialization where we retrieve the object back from the storage medium.
Example: assume you have a Java bean object whose variables hold some values. Now you want to store this object in a file or a database table. This can be achieved using serialization. You can then retrieve the object from the file or database whenever you need it, which is achieved using deserialization. (Post by Bobin Goswami)
There's no real difference, other than that you are implementing a custom serialization scheme, which will typically involve more code, since default serialization requires just an interface declaration.
You can achieve something very similar with Externalizable: you control exactly what data is saved, so you can choose to save just the constructor arguments and construct the object from them. (You could also achieve this with serialization by marking the fields that are not constructor arguments as transient.)
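The transient variant mentioned in parentheses might look like this, assuming a hypothetical Circle class whose area is derived from its constructor argument. Only the radius is written; the derived field is rebuilt in a private readObject method.

```java
import java.io.*;

public class TransientDerivedDemo {
    // Hypothetical class: only the constructor argument is serialized.
    static class Circle implements Serializable {
        private static final long serialVersionUID = 1L;
        private final double radius;
        private transient double area; // derived, not written to the stream

        Circle(double radius) {
            this.radius = radius;
            this.area = computeArea(radius);
        }

        private static double computeArea(double r) {
            return Math.PI * r * r;
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();     // restores radius
            area = computeArea(radius); // transient fields come back as 0.0, so rebuild it
        }

        double radius() { return radius; }
        double area() { return area; }
    }

    static Circle roundTrip(Circle c) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(c);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Circle) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Circle original = new Circle(2.0);
        Circle copy = roundTrip(original);
        System.out.println(copy.area() == original.area()); // true
    }
}
```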
The section on Serialization in Joshua Bloch's Effective Java, 2nd Ed. is really a good read on this subject. Something that is very important to keep in mind:
Using your own homegrown persistence method is intralinguistic. When you read data back from a store, you control how an object's state is restored. Very often this is with constructors and/or static factories. The invariants of the object's state are preserved. Encapsulation is maintained because you don't necessarily need to disclose implementation details as part of the custom store. The downside, of course, is that data very often needs to go places and #pakore nicely outlined those situations in which serialization is useful.
Serialization is an extralinguistic mechanism. Bloch makes compelling arguments for why serialization (in particular, the Serializable interface) should be used only with the greatest of care. Serialization can bypass constructors because reconstituting an object does not depend on one. There are profound potential security concerns. The invariants of your object's state are vulnerable. Moreover, using Serializable tends to lock you into supporting a particular class implementation (i.e., it destroys encapsulation), because much of your object's state becomes part of the class's exported API once it becomes Serializable (this can be proactively mitigated by marking certain instance fields as transient).
TL;DR: Serialization is a common and even fundamental aspect of modern Java-based computing. Data these days must go places, and serialization provides a commonly used mechanism for communication. Because of the vulnerabilities that serialization may introduce, and because it may cause much (or all) of your object's internal state to become part of its exported API, the Serializable interface should be used with the greatest of care.
Is there a generic way to cache any type of object (be it a Java class, a Word document, etc.) to memory or disk?
Is simply serializing the object, and retaining the file extension (if it has one), enough to rebuild the object?
You seem to be using the word "object" to describe two different things.
If your object is a Java object, then having it implement Serializable is enough, provided you then use the Java methods to serialize and deserialize it.
If you want to cache arbitrary data from the filesystem, the best way is to read it into a byte array (or an ArrayList). Then you can just write the array back to disk or wherever you want it.
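A minimal sketch of that idea using java.nio.file.Files (standing in for manual byte array or ArrayList handling); the temp-file names and placeholder bytes are made up. The bytes are never interpreted, so this works the same for a Word document or any other file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class FileBytesCacheDemo {
    // Reads any file into memory as raw bytes.
    static byte[] load(Path file) throws IOException {
        return Files.readAllBytes(file);
    }

    // Writes the cached bytes back out unchanged.
    static void store(byte[] cached, Path target) throws IOException {
        Files.write(target, cached);
    }

    public static void main(String[] args) throws IOException {
        Path original = Files.createTempFile("report", ".docx");
        Files.write(original, new byte[] {1, 2, 3, 4}); // placeholder content
        byte[] cached = load(original);

        Path restored = Files.createTempFile("restored", ".docx");
        store(cached, restored);
        System.out.println(Arrays.equals(Files.readAllBytes(original), Files.readAllBytes(restored))); // true

        Files.deleteIfExists(original);
        Files.deleteIfExists(restored);
    }
}
```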
If you're talking about the built-in Java serialization, then you wouldn't even need to retain the file extension. The serialized form has enough information that deserialization will produce an equivalent object without any additional help. I suppose that, depending on how your code is structured, you might need to store some metadata for your own benefit so that you know what to cast the resulting Object to.
Note that Java serialization doesn't seem to fit your requirements, though - it cannot serialize any type of object, only those that implement Serializable. Perhaps you need to think a little more about what you mean by "simply serializing the object", since that's the rub.
No.
There is a class of objects which cannot be deserialized in a meaningful way. Think of an open network connection which is in the middle of transferring a file. You can not store that to disk, close your app, open your app, deserialize that connection and expect that it "just continues".
Java has an interface Serializable which indicates that an object can be serialized. It's up to you to ensure that is indeed possible. Typically an object is Serializable if all the data it holds is Serializable, or that data which is not Serializable is marked transient.
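A short sketch of both cases, using java.lang.Thread as the non-serializable field type; the Worker and SafeWorker classes are hypothetical.

```java
import java.io.*;

public class TransientFieldDemo {
    // Thread does not implement Serializable, so this class cannot be written.
    static class Worker implements Serializable {
        private static final long serialVersionUID = 1L;
        String name = "worker-1";
        Thread thread = new Thread(() -> { });
    }

    // Same data, but the non-serializable field is skipped.
    static class SafeWorker implements Serializable {
        private static final long serialVersionUID = 1L;
        String name = "worker-1";
        transient Thread thread = new Thread(() -> { });
    }

    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        try {
            roundTrip(new Worker());
        } catch (NotSerializableException e) {
            System.out.println("failed: " + e.getMessage()); // names the offending class
        }
        SafeWorker copy = (SafeWorker) roundTrip(new SafeWorker());
        System.out.println(copy.name + " " + copy.thread); // worker-1 null
    }
}
```

Note that field initializers do not run during deserialization, which is why thread comes back as null rather than as a new Thread.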
This is not to say that you could not, theoretically, dump the memory contents to a file as a byte stream, and read it back again later. You could build something like that I suppose. But to expect that it works is a different thing altogether.
In short, it is not possible to serialize any type. However, there is a generic way to serialize Java objects which are marked to be Serializable.
Not sure what you mean by "or a word document". Serialization can be used for disk caching; I'm not sure what the purpose of using it in memory would be, since it would probably be far faster to simply keep the original object.
A more robust solution might be Ehcache, which can manage the size of the cache as well as move entries between memory and disk.
If you're wondering about the cross platform (disk or memory) persistence part of the question, look at Java's Preferences class.
My, what a lot of answers!
Any object can make itself serializable by implementing java.io.Serializable.
But:
A default serialiser is implemented in ObjectOutputStream, which simply walks the object tree. This is fine for simple javabean type objects, but it can have undesirable effects such as system objects being serialised (I once inspected a serialised java object file and found that it was including all of the system timezone objects). And, of course, if your object has objects inside it that are not serializable (and not transient), then ObjectOutputStream will throw an exception.
(Actually, even for JavaBean objects the default serializer is awful: it emits the class name java.lang.String for every string field.)
So if your object is complicated, then you really should implement Externalizable and write a serialiser and deserializer with some smarts.
http://download.oracle.com/javase/6/docs/platform/serialization/spec/serial-arch.html#7185
So basically, no, you can't serialise any old object. You have to design objects that are intended to be serialised and, ideally, that have some smarts about how they get themselves to and from a stream.
You cannot serialize just any object in Java. Moreover, serialization follows every reference in the object graph, so if you serialize something like a HashMap, it will only succeed if every key and value in it is also serializable.