Why would I serialize an object instead of doing a file output?

I don't currently understand why I would choose to serialize an object instead of just doing a file output and then having a function read that file. What do I gain from serializing an object?

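You gain a standard, built-in way of reading and writing an object's data, without having to design and maintain a file format of your own. (Java's native serialization format is a Java-specific binary format; if you need a W3C-approved data exchange format with readers and writers in almost every programming language, serialize to XML instead.)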

Serialization makes it easy to store the state of objects, and of the objects inside them (if they are Serializable and not marked as transient).
The benefit in your case:
Imagine you have a lot of different classes. Coding a custom file-to-class parser for each of them is probably harder than calling readObject().

When you serialize an object, you are writing its state (the values of its fields, plus those of every object it references) into a stream. When you de-serialize that stream you get back a new object with the same state, with the references between objects restored as well, which you would not get for free if you had written the properties of the object to a file, and then read it back in and interpreted it.
This means, if you serialize a collection of objects that reference each other, when you de-serialize them, they will still maintain their references to each other. This is good also for debugging a program. If an exception occurs you can save the state of the relevant objects on the user's computer, and if they send it to you, then you can see directly what was in memory and the problems that may have been caused.
It is also easier to serialize a complex object with many properties to a stream than it is to build some string of representative data, which you will then have to read back, parse and use to construct a new object.
Really what you gain is that it is easier/quicker and better for debugging.
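
As a minimal sketch of the point about references, the following round-trips two objects that point at each other through an in-memory stream. The Person class and the roundTrip helper are hypothetical names made up for this illustration; only the java.io classes are real.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class GraphRoundTrip {
    // Hypothetical class, used only for illustration.
    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        Person friend;

        Person(String name) {
            this.name = name;
        }
    }

    // Serializes the array to memory and reads it back as a new object graph.
    static Person[] roundTrip(Person[] people) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(people);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Person[]) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Person alice = new Person("Alice");
        Person bob = new Person("Bob");
        alice.friend = bob;
        bob.friend = alice; // the two objects reference each other

        Person[] copy = roundTrip(new Person[] { alice, bob });
        System.out.println(copy[0].friend == copy[1]); // true: references are restored
        System.out.println(copy[1].friend == copy[0]); // true
        System.out.println(copy[0] == alice);          // false: the copies are new objects
    }
}
```

The copies are distinct objects from the originals, but inside the deserialized graph they point at each other just as the originals did.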

Related

Serialization vs toString()

Since I'm writing/reading from files, I was wondering if there's any difference, or any best practice, between directly writing objects and writing their representation as strings, which in my case I personally find easier to handle.
So when should I serialize instead of writing/reading objects as String?

There's typically not enough information in the string representation of an object to be used to recreate it.
Java serialization "just works", but does not give you a human-readable representation, if that's what you are looking for.
Another alternative is to read / write JSON representations of your objects. There are several popular JSON serialization / deserialization libraries for Java, including Gson and Jackson.

The answer is in the javadoc for Object.toString().
Returns a string representation of the object. In general, the toString method returns a string that "textually represents" this object. The result should be a concise but informative representation that is easy for a person to read.
Note that it says:
concise,
informative, and
easy for a person to read.
But it does NOT say:
complete,
unambiguous, or
easy for a computer to read.
Serialization is about producing a linear (not necessarily textual) form that can be read by a computer and used to reconstruct the state of the original object.
So a typical serialization is either not particularly human readable (e.g. JSON, XML, YAML) or completely unreadable (e.g. Java Object Serialization, ASN.1). But the flip-side is that the information needed to reconstruct an object should all be present, in an unambiguous form.
(There is a lot more that could be said about the various kinds of serialization, their properties and their utility. However, it is beyond the scope of your question.)
Does this preclude toString() from being used for serializing data?
No, it doesn't.
But if you take that approach, you need to code your toString() methods carefully to make sure that what they produce is complete and unambiguous. Then you need to write a corresponding method to parse the toString() output and create a new object from it.
... or using their representation as strings on files which in my case I personally find it easier to handle.
I think that as you write larger and more complicated programs, you will get to the stage where that kind of code is tedious and time consuming to write, test and maintain.
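
To make the "complete and unambiguous, plus a matching parser" requirement concrete, here is a small sketch of a hand-written text format. The Item class, its toLine() method and its parse() method are hypothetical and exist only for this example; the format is an assumption chosen so that a comma inside the name cannot confuse the parser.

```java
class TextFormatDemo {
    // Hypothetical class, used only for illustration.
    static class Item {
        final String name;
        final int quantity;

        Item(String name, int quantity) {
            this.name = name;
            this.quantity = quantity;
        }

        // Machine-readable form: the quantity first, then everything else is the name.
        String toLine() {
            return quantity + "," + name;
        }

        // The quantity cannot contain a comma, so splitting at the FIRST comma
        // stays unambiguous even when the name itself contains commas.
        // A name containing a line break would still break a line-based file.
        static Item parse(String line) {
            int comma = line.indexOf(',');
            int quantity = Integer.parseInt(line.substring(0, comma));
            return new Item(line.substring(comma + 1), quantity);
        }
    }

    public static void main(String[] args) {
        Item original = new Item("nuts, bolts", 3);
        Item copy = Item.parse(original.toLine());
        System.out.println(copy.quantity + " x " + copy.name); // 3 x nuts, bolts
    }
}
```

Even for two fields there is already a design decision (field order, delimiter, what happens with line breaks) and a parser to test, and every new field or nested object adds more of the same.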

Serialization allows you to convert the state of an object into a stream of bytes, which can then be saved to a file on the local disk, sent over the network to any other machine, or saved to a database. Deserialization allows you to reverse the process, which means converting the serialized byte stream back into an object. Keep in mind that numbers and other types are not as easy to write to files, or to treat, as Strings are, and that their original state is not guaranteed to be preserved unless they are serialized.
Thus, using Strings is convenient in simpler situations where serialization is not really needed, such as a college project. In general, however, it is not recommended, as there are better solutions.

Is it possible to gain control on the amount of objects created when deserializing a file

Suppose I have a file with a lot (most likely 100K+, potentially millions) of serialized objects of the same class. I read these objects and do something with them:
//open stream
try {
    while (true) {
        Object o = ois.readObject();
        foo(o);
    }
} catch (EOFException e) {
    // end of file reached
}
//close stream...
When this is done, an uncomfortably large number of objects has been created. My problem is that I don't have control over those objects, and they won't be freed until the GC decides to do it.
Is there a way to put an upper limit on the number of new objects created? For example, if my file has 100K serialized objects, is there a way to tweak the readObject mechanism so that a fixed-size pool is used?
More Details
The ~100K object file is the merged result of many smaller files. What this small process is doing, is to create a sorted CSV file.

None of the suggested comments or answers so far will work (most of them are also unnecessary), because the ObjectInputStream itself holds a reference to every object it has ever deserialized, for preservation of object graphs.
You need to constrain how much data is written to the file, so you don't have to handle 100,000 objects per file, and if possible you should also make use of ObjectOutputStream.reset() or ObjectOutputStream.writeUnshared(), for the reasons described in their respective Javadoc comments.
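
A minimal sketch of the reset() suggestion, assuming you control the code that writes the file. The Row class, the helper names and the choice of resetting every 100 objects are assumptions made for this illustration; an in-memory byte array stands in for the file. According to the ObjectOutputStream Javadoc, reset() discards the stream's record of objects already written and marks that point in the stream so that the reading ObjectInputStream is reset there as well.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ResetStreamDemo {
    // Hypothetical record type, used only for illustration.
    static class Row implements Serializable {
        private static final long serialVersionUID = 1L;
        final int id;

        Row(int id) {
            this.id = id;
        }
    }

    // Writes `count` rows, calling reset() after every `resetEvery` rows so that
    // neither side keeps references to every object seen since the start.
    static byte[] writeRows(int count, int resetEvery) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeInt(count); // lets the reader stop without relying on EOFException
            for (int i = 0; i < count; i++) {
                out.writeObject(new Row(i));
                if ((i + 1) % resetEvery == 0) {
                    out.reset();
                }
            }
        }
        return bytes.toByteArray();
    }

    // Reads the rows one at a time; each row can be collected once it has been processed
    // and the next reset marker has been passed.
    static long sumIds(byte[] data) throws IOException, ClassNotFoundException {
        long sum = 0;
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                Row row = (Row) in.readObject();
                sum += row.id;
            }
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumIds(writeRows(1000, 100))); // 499500
    }
}
```

The trade-off is that class descriptors are written again after each reset, so resetting very frequently makes the file somewhat larger.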

You can try creating a fixed-sized collection of PhantomReferences to each of the objects from the file.
Once the collection is full, you only read another object from the file if and only if an existing PhantomReference can be retrieved/removed (as a blocking call) from the ReferenceQueue, after which you remove it from the fixed-sized collection and allow another to be created.
Remember to call 'clear()' on the PhantomReference after you remove it from the ReferenceQueue.
Hope this helps!
Refer to this document for more information regarding Phantom References:
https://weblogs.java.net/blog/kcpeppe/archive/2011/09/29/mysterious-phantom-reference
And Here:
http://java.dzone.com/articles/finalization-and-phantom

I guess you also have some influence on the design of the program that wrote those serialized objects. Isn't this kind of problem suggesting that the Java serialization format is not a good fit for your problem? Perhaps you should write and read the objects in some other format, which allows you to discard old objects as garbage during processing of the stream?

If you have to read Objects, you have to create Objects, there is little you can do about this. Changing your code to foo(ois.readObject()); gives the compiler a hint that it doesn't need to store the reference, but still Objects are created.
That leaves you two options:
You trust the Garbage Collector to be highly efficient and well designed.
You change your underlying data structure to not store Objects, but design it in a form that completely relies on primitive data types.
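
The second option could look roughly like the sketch below, which stores each record as raw primitives with DataOutputStream and reads them back with DataInputStream without creating one object per record. The record layout (an int id followed by a double score) and the method names are assumptions for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class PrimitiveRecordsDemo {
    // Writes a record count followed by (int id, double score) pairs.
    static byte[] write(int[] ids, double[] scores) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(ids.length);
            for (int i = 0; i < ids.length; i++) {
                out.writeInt(ids[i]);
                out.writeDouble(scores[i]);
            }
        }
        return bytes.toByteArray();
    }

    // Processes the records as primitives; no per-record object is allocated.
    static double totalScore(byte[] data) throws IOException {
        double total = 0;
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                in.readInt(); // the id is read but not needed for the total
                total += in.readDouble();
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = write(new int[] { 1, 2, 3 }, new double[] { 1.5, 2.5, 3.0 });
        System.out.println(data.length);      // 40 bytes: 4 for the count, 12 per record
        System.out.println(totalScore(data)); // 7.0
    }
}
```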

As I understand it, EJP proposed regenerating your input files using writeUnshared instead of writeObject, to make the objects available for GC during reading. If regenerating the original content is an option, then maybe you could switch to some other serializer like Kryo?
Java's built-in serialization is slow, inefficient, and has many well-known problems (see Effective Java, by Josh Bloch, p. 213).
Their promised serialized object size is 5x smaller than standard Java's, so memory consumption should be at least 5x smaller, I think.
EDIT
Better wording: 5x-7x heavier serialized objects most probably mean that ObjectInputStream is a memory eater, e.g. uses too much for the job, though frees that memory in the end.

Serialization of Java Objects

I'm trying to evaluate the usefulness of using Serialized objects in a Java application I'm developing. I'm trying to determine if the following makes sense for an Object-serialization implementation, or if I should custom-build the transport. Here's my scenario:
Objects will be transported over TCP from one application to another.
The serialized object will be an instance of a class stored in a common library.
Sample Class:
public class Room implements Serializable {
// Instance Variables
private Room roomWithinRoom;
// ...
}
So my question is that since I will have several instance variables that reference back to the Room class, can I use Java serialization to accomplish the transfer of Room objects? If I am able to accomplish this, will the pointers be preserved?
Example:
Room mainRoom = new Room();
Room closet = new Room();
mainRoom.addRoom(closet);
If I send over the object "mainRoom," will it also serialize and send the "closet" instance (preserving the roomWithinRoom instance variable in mainRoom?)
Thanks a bunch, everyone!

Yes, Java serialization does this really well. Maybe too well. I've had great luck writing to files. With TCP be sure to call reset after each write. Java will send a whole object graph in one go--your object, all the objects it references, all they reference, and so on. One writeObject can send gigabytes of data. But without reset, if the next object you write was included in the first graph, all that will go across will be a reference to the previously sent object. Any updated data will be ignored. (Did I mention you should call reset after each writeObject?)
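
The stale-data effect is easy to reproduce. In this sketch an in-memory stream stands in for the TCP connection, and the Counter class and sendTwice helper are hypothetical names for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class StaleWriteDemo {
    // Hypothetical class, used only for illustration.
    static class Counter implements Serializable {
        private static final long serialVersionUID = 1L;
        int value;
    }

    // Writes the same object twice, changing it in between, and returns
    // the two values seen by the reader.
    static int[] sendTwice(boolean resetBetweenWrites)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Counter counter = new Counter();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            counter.value = 1;
            out.writeObject(counter);
            if (resetBetweenWrites) {
                out.reset(); // forget that `counter` was already written
            }
            counter.value = 2;
            out.writeObject(counter);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            Counter first = (Counter) in.readObject();
            Counter second = (Counter) in.readObject();
            return new int[] { first.value, second.value };
        }
    }

    public static void main(String[] args) throws Exception {
        int[] stale = sendTwice(false);
        int[] fresh = sendTwice(true);
        System.out.println(stale[0] + ", " + stale[1]); // 1, 1 (the update is lost)
        System.out.println(fresh[0] + ", " + fresh[1]); // 1, 2
    }
}
```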
I'd suggest using the Externalizable interface. This gives you a bit more control over what gets written. (Actually, this is done on a class basis, so some classes can be Serializable and some Externalizable with no problem.) This means you can write in the same format even when the class changes a bit. It lets you skip data you don't need, and sometimes pass trivial objects as data primitives. I also use version numbers so newer versions of the class can read stuff written by older versions. (This is a bit more important when writing to files than with TCP.) A warning: when reading a class, be aware that the object reference you just read may not reference any data yet (that is, the referenced object has not been read in yet; you're looking at uninitialized memory where the object's data will go).
You'll probably fumble around with this a bit, but it does work really well once you understand it.
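
A rough sketch of the Externalizable-with-version-number idea follows. The Room fields, the FORMAT_VERSION constant and the rule that version 1 had no capacity field are all assumptions invented for this example, not the asker's actual class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

class ExternalizableDemo {
    // Hypothetical class, used only for illustration.
    public static class Room implements Externalizable {
        private static final int FORMAT_VERSION = 2;
        String name;
        int capacity; // assumed to have been added in format version 2

        // Externalizable classes need a public no-argument constructor.
        public Room() {
        }

        Room(String name, int capacity) {
            this.name = name;
            this.capacity = capacity;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(FORMAT_VERSION); // the version goes first
            out.writeUTF(name);
            out.writeInt(capacity);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            int version = in.readInt();
            name = in.readUTF();
            // Data written by the older format has no capacity; fall back to a default.
            capacity = (version >= 2) ? in.readInt() : 0;
        }
    }

    static Room roundTrip(Room room) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(room);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Room) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Room copy = roundTrip(new Room("closet", 4));
        System.out.println(copy.name + " " + copy.capacity); // closet 4
    }
}
```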

References will be preserved within the object graph, and any nested objects will also be serialized correctly, provided they implement the Serializable interface and are not marked transient.
I advise against built-in Java serialization as it is more verbose than other binary protocols. Also there is always the potential that serialization/deserialization routines could change between runtimes.
For these reasons, I suggest ProtoBufs. ProtoBuf is small, fast, and can even be used to quickly build serialization/deserialization routines in languages other than Java, given you find a protobuf "compiler" for that language.

Difference between serializing and deserializing and writing internals to a file and then reading them and passing them in constructor

Let's say we have a class
import java.io.Serializable;
import java.util.Date;

class A implements Serializable {
    String s;
    int i;
    Date d;

    public A() {
    }

    public A(String s, int i, Date d) {
        this.s = s;
        this.i = i;
        this.d = d;
    }
}
Now let's say that in the first approach I store all the internal values of s, i and d to a file, read them back, pass them to the constructor and create a new object. In the second approach I serialize and then deserialize to a new object. What is the basic difference between the two approaches?
I know serialization will be slow and secure and the other approach is not. Any other differences?

Read this article; it explains pretty well what serialization is about (it is for Java RMI, but the serialization explanation and problems are the same): http://oreilly.com/catalog/javarmi/chapter/ch10.html
The main differences I see are:
(As the other answers say) you are responsible for serializing and deserializing. What is going to happen when one of the properties is another big complex class? What are you going to do then? Save its values as well?
Serialization depends on reflection, while the file approach depends on getters/setters/constructors. With reflection you don't need public setters/getters or a constructor with parameters. With the file approach you need them.
Extracted from the link above:
Using Serialization
Serialization is a mechanism built into the core Java libraries for writing a graph of objects into a stream of data. This stream of data can then be programmatically manipulated, and a deep copy of the objects can be made by reversing the process. This reversal is often called deserialization.
In particular, there are three main uses of serialization:
As a persistence mechanism. If the stream being used is FileOutputStream, then the data will automatically be written to a file.
As a copy mechanism. If the stream being used is ByteArrayOutputStream, then the data will be written to a byte array in memory. This byte array can then be used to create duplicates of the original objects.
As a communication mechanism. If the stream being used comes from a socket, then the data will automatically be sent over the wire to the receiving socket, at which point another program will decide what to do.
The important thing to note is that the use of serialization is independent of the serialization algorithm itself. If we have a serializable class, we can save it to a file or make a copy of it simply by changing the way we use the output of the serialization mechanism.
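
The "copy mechanism" use from the quoted list can be sketched as a small deep-copy helper. The deepCopy name is hypothetical; the technique is simply writing to a ByteArrayOutputStream and reading the bytes straight back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

class DeepCopyDemo {
    // Copies an object graph by serializing it to memory and deserializing it again.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T deepCopy(T original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ArrayList<StringBuilder> original = new ArrayList<>();
        original.add(new StringBuilder("a"));

        ArrayList<StringBuilder> copy = deepCopy(original);
        copy.get(0).append("b"); // changes only the copy's element

        System.out.println(original.get(0)); // a
        System.out.println(copy.get(0));     // ab
    }
}
```

Swapping the ByteArrayOutputStream for a FileOutputStream or a socket's output stream gives the persistence and communication uses without touching the serializable class itself.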

In your first approach, you are responsible for maintaining the logical relationship between the data values (in the sense that you store the data and then read it back and construct the object back).
In the second approach, Java does this for you behind the scenes.

Serialization and Deserialization in Java
Serialization is a process by which we can store the state of an object into any storage medium. We can store the state of the object into a file, into a database table etc. Deserialization is the opposite process of serialization where we retrieve the object back from the storage medium.
Eg1: Assume you have a Java bean object and its variables are having some values. Now you want to store this object into a file or into a database table. This can be achieved using serialization. Now you can retrieve this object again from the file or database at any point of time when you need it. This can be achieved using deserialization: (Post by Bobin Goswami).

No real difference, other than that you are implementing a custom serialization scheme, which will typically involve more code, since default serialization requires just an interface declaration.
You can achieve something very similar with Externalizable - you are in control of exactly what data is saved, so you can choose to save just the constructor arguments and construct the object from that. (You could achieve this also with serialization by marking non-constructor arguments as transient.)

The section on Serialization in Joshua Bloch's Effective Java, 2nd Ed. is really a good read on this subject. Something that is very important to keep in mind:
Using your own homegrown persistence method is intralinguistic. When you read data back from a store, you control how an object's state is restored. Very often this is with constructors and/or static factories. The invariants of the object's state are preserved. Encapsulation is maintained because you don't necessarily need to disclose implementation details as part of the custom store. The downside, of course, is that data very often needs to go places and @pakore nicely outlined those situations in which serialization is useful.
Serialization is an extralinguistic mechanism. Bloch makes compelling arguments for why serialization (in particular, the Serializable interface) should be invoked only with the greatest of care. Serialization can bypass constructors because reconstitution of objects does not depend on one. There are profound possible security concerns. The invariants of your object's state are vulnerable. Moreover, using Serializable tends to lock you into supporting a particular class implementation (i.e., it destroys encapsulation) because much of your object's state becomes part of the class's exported API once it becomes Serializable (this can be proactively deferred by marking certain instance fields as transient).
TL;DR: Serialization is a common and even fundamental aspect of modern Java-based computing. Data these days must go places, and serialization provides a commonly used mechanism for communication. Because of the vulnerabilities that serialization may introduce and because it may cause much (or all) of your object's internal state to become part of its exported API, the Serializable interface should be used with the greatest of care.
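
The claim that serialization bypasses constructors can be checked with a small experiment. The Percentage class and its call counter are hypothetical and exist only to make the effect visible.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ConstructorBypassDemo {
    // Hypothetical class, used only for illustration.
    static class Percentage implements Serializable {
        private static final long serialVersionUID = 1L;
        static int constructorCalls = 0; // static, so it is not part of the serialized state
        final int value;

        Percentage(int value) {
            // The invariant is enforced here, and only here.
            if (value < 0 || value > 100) {
                throw new IllegalArgumentException("value must be in 0..100");
            }
            constructorCalls++;
            this.value = value;
        }
    }

    static Percentage roundTrip(Percentage p) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Percentage) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Percentage copy = roundTrip(new Percentage(50));
        System.out.println(copy.value);                  // 50
        System.out.println(Percentage.constructorCalls); // 1: deserialization did not call it
    }
}
```

Because the constructor never runs for the deserialized copy, its range check never sees the incoming data, which is why a byte stream from an untrusted source can produce an object that violates the class's invariants unless the class validates its state during deserialization.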

How to cache any object type to memory/disk in java?

Is there a generic way to cache any type of object (be in a java class, or a word document etc.) to memory or disk?
Is simply serializing the object, and retaining the file extension (if it has one) enough to rebuild the object?

You seem to be using the word Object to describe two different things.
If your object is a Java object, then having that object implement Serializable is enough, provided you then use the Java methods to serialize/de-serialize the object.
If you want to cache arbitrary data from the filesystem, the best way is to read it into a byte array (or ArrayList). Then you can just write the array back to the disk or wherever you want it.
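
For the arbitrary-file case, a minimal sketch might look like this. The FileBytesCache class and its method names are made up for illustration; Files.readAllBytes and Files.write are the standard java.nio.file calls.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

class FileBytesCache {
    // In-memory cache of raw file contents, keyed by path.
    private final Map<Path, byte[]> cache = new HashMap<>();

    // Returns the file's bytes, reading them from disk only on the first request.
    byte[] get(Path file) throws IOException {
        byte[] content = cache.get(file);
        if (content == null) {
            content = Files.readAllBytes(file);
            cache.put(file, content);
        }
        return content;
    }

    // Writes the cached bytes of `file` to `target`, unchanged.
    void writeTo(Path file, Path target) throws IOException {
        Files.write(target, get(file));
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("source", ".doc");
        Path target = Files.createTempFile("target", ".doc");
        Files.write(source, "any file content".getBytes(StandardCharsets.UTF_8));

        FileBytesCache cache = new FileBytesCache();
        cache.get(source);             // load into memory
        Files.delete(source);          // the original can disappear
        cache.writeTo(source, target); // and still be written back from the cache

        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
    }
}
```

Nothing here interprets the bytes, so it works the same for a Word document, an image or a serialized Java object.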

If you're talking about the inbuilt Java serialization, then you wouldn't even need to retain the file extension. The serialized form has enough information such that the deserialization process will produce an identical object without any additional help. I suppose that depending on how your code is structured, though, you might need to store some metadata for your own benefit so that you know what to cast the resulting Object as.
Note that Java serialization doesn't seem to fit your requirements, though - it cannot serialize any type of object, only those that implement Serializable. Perhaps you need to think a little more about what you mean by "simply serializing the object", since that's the rub.

No.
There is a class of objects which cannot be deserialized in a meaningful way. Think of an open network connection which is in the middle of transferring a file. You can not store that to disk, close your app, open your app, deserialize that connection and expect that it "just continues".
Java has an interface Serializable which indicates that an object can be serialized. It's up to you to ensure that is indeed possible. Typically an object is Serializable if all the data it holds is Serializable, or that data which is not Serializable is marked transient.
This is not to say that you could not, theoretically, dump the memory contents to a file as a byte stream, and read it back again later. You could build something like that I suppose. But to expect that it works is a different thing altogether.
In short, it is not possible to serialize any type. However, there is a generic way to serialize Java objects which are marked to be Serializable.
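
As an illustration of the transient rule, the sketch below uses a hypothetical Session class that holds a Thread, which is not Serializable. Marking the field transient lets the rest of the object be serialized; the field simply comes back as null.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientDemo {
    // Hypothetical class, used only for illustration.
    static class Session implements Serializable {
        private static final long serialVersionUID = 1L;
        final String user;
        // Thread is not Serializable; transient fields are skipped when writing.
        transient Thread worker;

        Session(String user) {
            this.user = user;
            this.worker = new Thread();
        }
    }

    static Session roundTrip(Session session) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(session);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Session) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Session copy = roundTrip(new Session("alice"));
        System.out.println(copy.user);           // alice
        System.out.println(copy.worker == null); // true: the transient field was not stored
    }
}
```

The deserialized object is only "meaningful" if the code using it can cope with the missing field, for example by recreating the worker on demand.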

Not sure what you mean by "or a word document". Serialization can be used for disk caching, but I'm not sure what the purpose of using it in memory would be, since it would probably be far faster to simply keep the original object.
A more robust solution might be Ehcache, which can manage the size of the cache as well as moving it between memory and disk.

If you're wondering about the cross platform (disk or memory) persistence part of the question, look at Java's Preferences class.

My, what a lot of answers!
Any object can make itself serializable by implementing java.io.Serializable.
But:
A default serialiser is implemented in ObjectOutputStream, which simply walks the object tree. This is fine for simple javabean type objects, but it can have undesirable effects such as system objects being serialised (I once inspected a serialised java object file and found that it was including all of the system timezone objects). And, of course, if your object has objects inside it that are not serializable (and not transient), then ObjectOutputStream will throw an exception.
(Actually, even for JavaBean objects the default serializer is awful - the default serializer emits the classname of java.lang.String for every string field.)
So if your object is complicated, then you really should implement Externalizable and write a serialiser and deserializer with some smarts.
http://download.oracle.com/javase/6/docs/platform/serialization/spec/serial-arch.html#7185
So basically - no, you can't serialise any old object. You have to design objects that are intended to be serialised and, ideally, that have some smarts about how they get themselves to and from a stream.
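
The exception mentioned above is java.io.NotSerializableException. A small sketch, with hypothetical Car and Engine classes, shows when it is thrown:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class NotSerializableDemo {
    // Hypothetical classes, used only for illustration.
    static class Engine {
        // does not implement Serializable
    }

    static class Car implements Serializable {
        private static final long serialVersionUID = 1L;
        Engine engine = new Engine(); // neither Serializable nor transient
    }

    // Tries to serialize the object to a throwaway in-memory stream.
    static boolean canSerialize(Object candidate) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(candidate);
            return true;
        } catch (NotSerializableException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(canSerialize("plain string")); // true
        System.out.println(canSerialize(new Car()));      // false: Engine is not Serializable
    }
}
```

Implementing Serializable on the outer class is therefore only a declaration; whether an instance can actually be written depends on everything reachable from it.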

You cannot serialize any object in Java. Moreover, Java uses shallow copying (or is it called something else) for serialization, so if you want to serialize something like a HashMap, it might not save your data.
