Java serialization of very large objects - java

I'm facing the following issue: I would like to serialize a very large object (several hundreds of MB in memory) to a file.
As I understood that Kryo is one of the best serialization libraries out there, I've been using this to serialize my object:
OutputStream fOutStream = new FileOutputStream(ParamsProvider.STORAGE_HH_FILENAME);
Output out = new Output(fOutStream);
kryo.writeObject(out, data);
out.close();
fOutStream.close();
This generates an OutOfMemory exception: I guess that the entire object is first serialized in memory before being written to file.
Hence my question: is there a way / library to serialize an entire object while it is written to file per chunks?
I would like to avoid the workaround of decomposing the object in smaller elements before serializing as:
I would like the serialization implementation to be independent of the data object structure
My object contains multiple references to the same objects. I suspect that if I serialize those elements independently, the serializer will instantiate them as different objects (consuming even more memory)
UPDATE:
In the meantime I implemented a serialization approach where the different elements of the very large object are serialized one by one, and this does indeed avoid the OutOfMemory exception.
However, I'm afraid that with this approach I will lose the advantage (in terms of memory footprint) of multiple references to the same object (i.e. each reference will end up with its own instance of the object). Any idea on this?
New code snippet (where, for instance, field1 contains references to the same objects as field2):
OutputStream fOutStream = new FileOutputStream(ParamsProvider.STORAGE_HH_FILENAME);
Output out = new Output(fOutStream);
kryo.writeObject(out, data.field1);
kryo.writeObject(out, data.field2);
(...)
kryo.writeObject(out, data.field50);
out.close();
fOutStream.close();
Any hint / help would be greatly appreciated!
Thanks,
Tom
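On the shared-reference concern, here is a minimal sketch that uses the JDK's own ObjectOutputStream rather than Kryo (how Kryo tracks references across separate writeObject calls depends on its configuration, so that part would need to be checked against the Kryo documentation). With the JDK stream, everything written to the same stream shares one handle table, so an object reachable from both fields is written once and comes back as a single instance. The class name SharedRefDemo and the use of StringBuilder elements are assumptions made only for illustration.

```java
import java.io.*;
import java.util.*;

final class SharedRefDemo {
    /** Writes two fields one after the other to the SAME stream, then reads them back. */
    static List<List<StringBuilder>> roundTrip(List<StringBuilder> field1, List<StringBuilder> field2)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream(); // stands in for a FileOutputStream
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(field1);
            out.writeObject(field2); // objects already written are emitted as back-references
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            @SuppressWarnings("unchecked")
            List<StringBuilder> a = (List<StringBuilder>) in.readObject();
            @SuppressWarnings("unchecked")
            List<StringBuilder> b = (List<StringBuilder>) in.readObject();
            return Arrays.asList(a, b);
        }
    }
}
```

The important detail is that both writeObject calls go to one ObjectOutputStream. Opening a new stream per field (or calling reset() in between) would discard the handle table and duplicate the shared objects on read.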

Related

Java - Framework to detect if Object has changed / dirty detection mechanism

Dear all, I am currently implementing a Java client that relies heavily on third-party web services. To improve performance, I would like to call the web services only when the objects on my client side have been modified (become dirty).
Rather than writing my own framework to detect whether an object is dirty, is there an open, generic framework that can be reused and is not bound to a core product (e.g. Hibernate)?
I presume with object you don't mean a single scalar value but a bean.
Technically you can do all sorts of fancy stuff to detect the bean mutation, for example changing the byte code and add some code whenever the object field is updated.
Another option is to keep a copy of the old bean instance and compare it.
So actually the problem reduces to comparing two beans, which was asked here: how to generically compare entire java beans? Probably you find more, there are a lot of frameworks dealing with beans in general.
However, since you call webservices, you must have a mechanism to serialize your objects. You could use the serialized form of the old and new object to compare for identity before sending the update request.
Change notification: I don't recommend attaching change listeners to every bean. This might hurt your general performance and introduce memory leaks. There is also a transaction problem: if more than one bean property is updated, when is the update of the bean complete? So you need an explicit call after the mutation anyway.
Note to myself and other caching guys: Actually this is the use case for providing a method like Cache.putIfNotEquals(key, value) on a cache, which is not much effort. The cache already stores the previous value and only calls the cache writer (in a write-through setup) if the value changed.
To give others a starting point, here is how this can look:
TestBean bean1 = new TestBean("AAA");
TestBean bean2 = new TestBean("BBB");
log.info("serialize...");
ByteArrayOutputStream str1 = new ByteArrayOutputStream();
ObjectOutputStream oos1 = new ObjectOutputStream(str1);
oos1.writeObject(bean1);
oos1.close(); // close (and thereby flush) before reading the bytes
byte[] serialized1 = str1.toByteArray();
ByteArrayOutputStream str2 = new ByteArrayOutputStream();
ObjectOutputStream oos2 = new ObjectOutputStream(str2);
oos2.writeObject(bean2);
oos2.close(); // close (and thereby flush) before reading the bytes
byte[] serialized2 = str2.toByteArray();
log.info("compare");
boolean same = Arrays.equals(serialized1, serialized2);
Advantages of this approach:
You serialize the whole object structure, so it supports complex hierarchies out of the box and you do not have to care about cycles (parent / children).
You can use the "transient" keyword on members you want to exclude from the serialization / comparison.
You do not have to add boilerplate code to your bean classes (apart from having the class implement the Serializable interface).
Disadvantages:
It is not fine-grained, so you don't get which fields are dirty out of the box. There are more sophisticated approaches that use a custom serialization format.
I have not yet thought about performance, but I imagine other approaches do not come for free either, so you need to test and tune the approach yourself.
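The comparison above can be folded into a small helper. This is only a sketch under the same assumptions as the answer (the bean is Serializable and its serialized form is stable between calls); the names DirtyCheck, snapshot and isDirty are made up for illustration.

```java
import java.io.*;
import java.util.Arrays;

final class DirtyCheck {
    /** Serializes the bean to a byte array; the stream is closed before the bytes are read. */
    static byte[] snapshot(Serializable bean) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(bean);
        }
        return bytes.toByteArray();
    }

    /** True when the bean's current serialized form differs from an earlier snapshot. */
    static boolean isDirty(byte[] earlierSnapshot, Serializable bean) throws IOException {
        return !Arrays.equals(earlierSnapshot, snapshot(bean));
    }
}
```

Take a snapshot right after loading the bean and call isDirty just before deciding whether to send the update. Byte equality is stricter than equals(), so a bean can be reported as dirty even when it is logically unchanged (for instance if a collection's internal ordering differs).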

Is it recommended to serialize & deserialize objects that are stored in arrayList?

In my small bank application, users have to input some values (name, SSN, amount, etc.) and they get stored in an ArrayList. The ArrayList size is dynamic.
The problem is that I lose all data once I terminate the application. That leads me to think about writing to and reading from a file (file I/O).
I have also come across something called serialization and deserialization, though I am not quite sure in what situations it needs to be implemented.
Do I need it in my particular case, or will simply writing to and reading from a file be enough?
What do serialization and deserialization have to do with file I/O?
[NOTE: I will give more info if necessary]
This is where a database comes into the picture. To start with, you can use MySQL: it's an excellent free database for small to medium-sized business apps. Later, if you intend to deploy your app to production with a large number of users and advanced features, and are ready to pay a price for it, you might consider other databases like Oracle etc.
Storing info to files ((De)Serialization) is not recommended for any practical application.
Serialization is a mechanism where an object can be represented as a sequence of bytes that includes the object's data as well as information about the object's type and the types of data stored in the object.
ArrayList already implements Serializable, so in your example you could write something like this:
ArrayList<String> al=new ArrayList<String>();
al.add("Jean");
al.add("Pierre");
al.add("John");
try{
FileOutputStream fos= new FileOutputStream("myfile.txt");
ObjectOutputStream oos= new ObjectOutputStream(fos);
oos.writeObject(al);
oos.close();
fos.close();
}catch(IOException ioe){
ioe.printStackTrace();
}
Here we save the list al in the file myfile.txt.
To read the file and get your ArrayList back, you would use ObjectInputStream:
FileInputStream fis = new FileInputStream("myfile.txt");
ObjectInputStream ois = new ObjectInputStream(fis);
ArrayList<String> list = (ArrayList<String>) ois.readObject();
ois.close();
Serialization is required when you want to write instances of your own class to a file. In your case, you can create a java class to hold all the values about customer, then override hashCode() and equals(), and then write your object to file. http://www.tutorialspoint.com/java/java_serialization.htm
Also, if you want, you can store individual fields in the file as int or String values.
Though I would suggest to use a database to store all this information. But it seems you are a student and still in learning phase. So, interacting with DB right away might not be a good approach as of now.
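To make the "create a class for the customer and write it to a file" suggestion concrete, here is a rough sketch. The Customer class, its fields, and the CustomerStore helper are hypothetical names chosen for the example, not something from the question.

```java
import java.io.*;
import java.util.*;

final class CustomerStore {
    /** Hypothetical record for the bank example; the fields are illustrative. */
    static class Customer implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final double amount;

        Customer(String name, double amount) {
            this.name = name;
            this.amount = amount;
        }
    }

    /** Writes the whole list in one go; try-with-resources closes the streams. */
    static void save(List<Customer> customers, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(new ArrayList<>(customers));
        }
    }

    /** Reads the list back; the cast is unchecked because the stream carries no generics. */
    @SuppressWarnings("unchecked")
    static List<Customer> load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (List<Customer>) in.readObject();
        }
    }
}
```

Calling save when the application exits and load when it starts would be enough to keep the data between runs for a small learning project.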
Yes, you can serialize and deserialize an ArrayList.
Whenever you want to write an object to a file and read it back from the file, the object's class needs to be serializable, and the object is written to the file as a byte stream. That means your data is kept intact in the stream. You can use the Serializable interface:
To persist data for future use.
To send data to a remote computer using such client/server Java technologies as RMI or socket programming.
To "flatten" an object into array of bytes in memory.
To exchange data between applets and servlets.
To store user session in Web applications.
To activate/passivate enterprise java beans.
To send objects between the servers in a cluster.
and more............

Transfer of a Java Serialized Object

Is it possible to declare an instance of a serializable object in one Java program / class, then repeat the definitions of the internal objects in an entirely different program / class, and load a big complex object from a data file? The goal is to be able to write an item editor that's kept locally on my build machine, then write the game itself and distribute it to people who would like to play the game.
I'm writing a game in Java as a hobbyist project. Within my game, there's a family of classes that extend a parent class, GameItem. Items might be in various families like HealingPotion, Bomb, KeyItem, and so on.
class GameItem implements Serializable {
String ItemName;
String ImageResourceLocation;
// ....
}
What I want to do is include definitions of how to create each item in a particular family of items, but then have a big class called GameItemList, which contains all possible items that can occur as you play the game.
class GameItemList implements Serializable {
LinkedList<GameItem> gameItemList;
//methods here like LookUpByName, LookUpByIndex that return references to an item
}
Maybe at some point - as the player starts a new game, or as the game launches, do something like:
//create itemList
FileInputStream fileIn = new FileInputStream("items.dat");
ObjectInputStream in = new ObjectInputStream(fileIn);
GameItemList allItems = (GameItemList)in.readObject();
in.close();
//Now I have an object called allItems that can be used for lookups.
Thanks guys, any comments or help would be greatly appreciated.
When you serialize an object, every field of the object is serialized, unless marked with transient. And this behavior is of course recursive. So yes, you can serialize an object, then deserialize it, and the deserialized object will have the same state as the serialized one. A different behavior would make serialization useless.
I wouldn't use native serialization for long-term storage of data, though. Serialized objects are hard to inspect, impossible to modify using a text editor, and maintaining backward compatibility with older versions of the classes is hard. I would use a more open format like XML or JSON.
Yes, that is possible. If an object is correctly serialized, it can be deserialized on any other machine as long as the application running there knows the definition of the class to be deserialized.
This will work, but Java serialization is notorious for making it hard to "evolve" classes -- the internal representation is explicitly tied to the on-disk format. You can work around this with custom reader / writer methods, but you might consider a more portable format like JSON or XML instead of object serialization.
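A small sketch of the two points made in the answers above: fields marked transient are skipped, and an explicit serialVersionUID keeps the editor and the game agreeing on the class version as long as the fields stay compatible. The Item class and the imageCache field are hypothetical stand-ins, not the asker's actual classes.

```java
import java.io.*;

final class TransientDemo {
    /** Hypothetical item; imageCache stands in for data that can be rebuilt after loading. */
    static class Item implements Serializable {
        private static final long serialVersionUID = 1L; // pins the stream version of this class
        String itemName;
        String imageResourceLocation;
        transient Object imageCache; // skipped by serialization, null after reading

        Item(String itemName, String imageResourceLocation) {
            this.itemName = itemName;
            this.imageResourceLocation = imageResourceLocation;
            this.imageCache = new Object();
        }
    }

    /** Serializes the item to memory and reads it back, as the editor and game would do via a file. */
    static Item copyViaSerialization(Item item) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(item);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Item) in.readObject();
        }
    }
}
```

Both programs need the same class (same package and name) on their classpath; sharing the item classes in a common jar is one way to achieve that.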

Why should i use Serialization instead of File I/O in java

In the serialization mechanism, we write the object into a stream using ObjectInputStream and ObjectOutputStream. These objects are passed across the network. This mechanism uses object input/output streams. So can I use file input/output streams instead of implementing the Serializable marker interface?
I guess you are mixing up serialization and general I/O.
Serialization is a way to transform objects into byte sequences (and back, which is called deserialization). This way, you can transmit serializable objects over the network and store them in files.
File input/output streams are for storing/reading any kind of data to/from files.
When you need to transfer your object over the network, you need to serialize it. The link below might be useful for you.
http://java.sun.com/developer/technicalArticles/Programming/serialization/
File I/O and serialization are two different things. File I/O is used to read/write a file. The Serializable interface is used to produce a binary representation of an object. So no, you can't use file streams for sending over the network. (Maybe there is some workaround for sending data over the network using file streams, but it's like trying to fly with a car.)
First let's concentrate on the definition:
Serialization: It is the process of converting object state into a format that can be stored and reconstructed later in the same way.
Whereas with plain file I/O alone you cannot store a data structure or object and reconstruct it later in the same way.
That's why we use serialization or database query methods (like SQL, MongoDB).
JSON/XML can also be used for serialization, using a parser.
Take an example in JavaScript (not Java, but treat it as language-agnostic):
var obj = { // it's an object in javascript (same like json)
a: "something",
b: 3,
c: "another"
};
Now if you try to use file i/o in this to save in a file (say abc.txt), it will be saved as a string
which means it can't be accessed later in other code by reading this file (abc.txt) like this:
// readThisFile();
// obj.a;
But if you use serialization (in javascript using JSON natively), you can read it from the file
Since streams are additive, you can do something like
FileOutputStream fos = new FileOutputStream("/some/file/to/write/to");
ObjectOutputStream oos = new ObjectOutputStream(fos);
oos.writeObject(someObject);
Not sure this is what you were asking, but it's hard to tell.
Serialization/deserialization is used to write and read objects. The result is compact binary data, which is not human-readable. File I/O is used for reading and writing files in general. It appears that you do not want to serialize; if you don't, then do not use it. Read and write your files as text.
In serialization mechanism, we write the object into a stream using
ObjectInputStream and ObjectOutputStream.
Ok
These objects are passed across the network. This mechanism uses an
ObjectInput/Output stream.
I am following you.
So can I use File Input/Output streams instead of calling
serialization marker interface?
Here you lost me. Do you mean to send an object over the network or just to serialize it?
Of course you can use whichever Input/Output streams along with ObjectInput/ObjectOutput streams to serialize objects to different media.
For instance:
ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("jedis.bin"));
out.writeObject(new Jedi("Luke"));
Would serialize the object into a file called jedis.bin
And the code
ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
ObjectOutputStream out = new ObjectOutputStream(byteStream);
out.writeObject(new Jedi("Luke"));
Would serialize the object into a memory array.
So, anything that is an output/input stream can be used as the underlying stream of an ObjectOutputStream/ObjectInputStream.

Implementing my own serialization in java

How can I implement serialization on my own? Meaning, I don't want my class to implement Serializable, but I want to implement serialization myself, so that without implementing Serializable I can transfer objects over the network or write them to a file and later retrieve them in the same state. I want to do it since I want to learn and explore things.
Serialization is the process of translating the structure of an object into another format that can be easily transferred across a network or stored in a file. Java serializes objects into a binary format. This is not necessary if bandwidth/disk space is not a problem. You can simply encode your objects as XML:
// Code is for illustration purpose only, I haven't compiled it!!!
public class Person {
private String name;
private int age;
// ...
public String serializeToXml() {
StringBuilder xml = new StringBuilder();
xml.append("<person>");
xml.append("<attribute name=\"age\" type=\"int\">").append(age);
xml.append("</attribute>");
xml.append("<attribute name=\"name\" type=\"string\">").append(name);
xml.append("</attribute>");
xml.append("</person>");
return xml.toString();
}
Now you can get an object's XML representation and "serialize" it to a file or a network connection. A program written in any language that can parse XML can "deserialize" this object into its own data structure.
If you need a more compact representation, you can think of binary encoding:
// A naive binary serializer.
public byte[] serializeToBytes() {
ByteArrayOutputStream bytes = new ByteArrayOutputStream();
// Object name and number of attributes.
// Write the 4 byte length of the string and the string itself to
// the ByteArrayOutputStream.
writeString("Person", bytes);
bytes.write(2); // number of attributes;
// Serialize age
writeString("age", bytes);
bytes.write(1); // type = 1 (i.e, int)
writeString(Integer.toString(age), bytes);
// serialize name
writeString("name", bytes);
bytes.write(2); // type = 2 (i.e, string)
writeString(name, bytes);
return bytes.toByteArray();
}
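private static void writeString(String s, ByteArrayOutputStream bytes) {
byte[] encoded = s.getBytes(java.nio.charset.StandardCharsets.UTF_8);
// 4-byte big-endian length prefix, then the encoded bytes.
for (int shift = 24; shift >= 0; shift -= 8) bytes.write(encoded.length >>> shift);
bytes.write(encoded, 0, encoded.length);
}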
To learn about a more compact binary serialization scheme, see the Java implementation of Google Protocol Buffers.
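For comparison, the JDK already ships a simple binary writer and reader, DataOutputStream and DataInputStream, that handle length prefixes and byte order for you. The sketch below is not the format used by the serializer above; it just writes the two Person fields with the JDK's own encoding and reads them back. PersonCodec is a hypothetical name.

```java
import java.io.*;

final class PersonCodec {
    /** Encodes the two fields in a fixed order: name first, then age. */
    static byte[] encode(String name, int age) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(name); // 2-byte length prefix followed by modified UTF-8
            out.writeInt(age);  // 4 bytes, big-endian
        }
        return bytes.toByteArray();
    }

    /** Returns {name, age}, read back in the order they were written. */
    static Object[] decode(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String name = in.readUTF();
            int age = in.readInt();
            return new Object[] { name, age };
        }
    }
}
```

Unlike the attribute-tagged format above, this layout carries no field names or types, so reader and writer must agree on the field order out of band.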
You can use Externalizable and implement your own serialization mechanism. One of the difficult aspects of serialization is versioning so this can be a challenging exercise to implement. You can also look at protobuf and Avro as binary serialization formats.
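A minimal Externalizable sketch, assuming a simple two-field class (the Person class here is hypothetical and unrelated to any versioning scheme). The class decides exactly what goes into the stream, and it must offer a public no-arg constructor because deserialization calls it before readExternal.

```java
import java.io.*;

final class ExternalizableDemo {
    /** Hypothetical class; Externalizable requires a public no-arg constructor. */
    public static class Person implements Externalizable {
        private String name;
        private int age;

        public Person() { }

        public Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeUTF(name);
            out.writeInt(age);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            // Must read fields in exactly the order writeExternal wrote them.
            name = in.readUTF();
            age = in.readInt();
        }

        public String getName() { return name; }
        public int getAge() { return age; }
    }

    /** Writes the object with ObjectOutputStream and reads it back. */
    static Person roundTrip(Person p) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Person) in.readObject();
        }
    }
}
```

A common way to prepare for versioning is to write a format version number first in writeExternal and branch on it in readExternal; that is left out here to keep the sketch short.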
You start with reflection. Get the object's class and declared fields of its class and all superclasses. Then obtain value of each field and write it to dump.
When deserializing, just reverse the process: get class name from your serialized form, instantiate an object and set its fields accordingly to the dump.
That's the simplistic approach if you just want to learn. There's many issues that can come up if you want to do it "for real":
Versioning. What if one end of the application is running new version, but the other end has an older class definition with some fields missing or renamed?
Overwriting default behavior. What if some object is more complex and cannot be recreated on a simple field-by-field basis?
Recreating dependencies between objects, including cyclic ones.
... and probably many more.
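The simplistic reflection approach described above might look roughly like this. It is a learning sketch only: it dumps fields into a map instead of a byte format, handles none of the issues listed above, and all names (ReflectionDump, Sample) are invented for the example.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

final class ReflectionDump {
    /** Collects non-static, non-transient fields of the object's class and its superclasses. */
    static Map<String, Object> dump(Object target) throws IllegalAccessException {
        Map<String, Object> values = new LinkedHashMap<>();
        for (Class<?> c = target.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                int mods = f.getModifiers();
                if (Modifier.isStatic(mods) || Modifier.isTransient(mods)) continue;
                f.setAccessible(true); // needed for private fields
                values.put(c.getName() + "." + f.getName(), f.get(target));
            }
        }
        return values;
    }

    /** Copies dumped values into an already created instance of the same class. */
    static void restore(Object target, Map<String, Object> values) throws IllegalAccessException {
        for (Class<?> c = target.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                String key = c.getName() + "." + f.getName();
                if (!values.containsKey(key)) continue;
                f.setAccessible(true);
                f.set(target, values.get(key));
            }
        }
    }

    /** Hypothetical sample class used only to exercise the sketch. */
    static class Sample {
        private String name;
        private int age;

        Sample() { }

        Sample(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        int getAge() { return age; }
    }
}
```

Keys are prefixed with the declaring class name so that a field in a subclass does not overwrite a same-named field in a superclass.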
Get the Java source code and understand how serialization is implemented. I did this some months ago, and now have a serialization that uses only 16% of the space and 20% of the time of "normal" serialization, at the cost of assuming that the classes that wrote the serialized data have not changed. I use this for client-server serialization where I can use this assumption.
As a supplement to @Konrad Garus' answer. There is one issue that is a show-stopper for a full reimplementation of Java serialization.
When you deserialize an object, you need to use one of the object's class's constructors to recreate an instance. But which constructor should you use? If there is a no-args constructor, you could conceivably use that. However, the no-args constructor (or indeed any constructor) might do something with the object in addition to creating it. For example, it might send a notification to something else that a new instance has been created ... passing the instance that isn't yet completely deserialized.
In fact, it is really difficult to replicate what the standard Java deserialization code does. What it does is this:
1. It determines the class to be created.
2. It creates an instance of the class without calling any of its constructors.
3. It uses reflection to fill in the instance's fields, including private fields, with objects and values reconstructed from the serialization.
The problem is that step 2 involves some "black magic" that a normal Java class is not permitted to do.
(If you want to understand the gory details, read the serialization spec and take a look at the implementation in the OpenJDK codebase.)
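The constructor-skipping behaviour is easy to observe with the standard streams. In this sketch (all names hypothetical) the only constructor of a Serializable class bumps a counter; the counter does not move when an instance is deserialized, because the JDK runs only the no-arg constructor of the nearest non-serializable superclass, here Object.

```java
import java.io.*;

final class ConstructorSkipDemo {
    /** Counts how often Tracked's constructor has run. */
    static int constructorCalls = 0;

    static class Tracked implements Serializable {
        private static final long serialVersionUID = 1L;
        final String label;

        Tracked(String label) {
            this.label = label;
            constructorCalls++; // side effect that standard deserialization does not trigger
        }
    }

    /** Round-trips the object through standard Java serialization in memory. */
    static Tracked copyViaSerialization(Tracked original) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Tracked) in.readObject();
        }
    }
}
```

Note that even the final field label is populated in the copy, which is another thing ordinary reflective code has to work harder to achieve.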
