As far as I know, Kryo serialization/deserialization happens per object. Is it possible to serialize multiple objects into a single file? One workaround suggested in a similar SO question was to use an array of objects. Given the huge amount of data that needs to be serialized, I feel this would not be as efficient as it should be. Is that a correct assumption?
Does the Kryo API take an OutputStream? If so, just feed it the same OutputStream to serialize multiple objects. Do the same with an InputStream when reading. A good serialization format will have length encodings or termination symbols and will not rely on EOF for anything.
The array approach would also work with minimal overhead as long as all of these objects are already in memory. You are talking about adding just a few bytes per object to create an array to hold them. If they aren't all in memory, you would have to load them all into memory first to create an array around them. That could definitely become a problem given a large enough data set.
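The "same stream, termination symbol" idea can be sketched without Kryo. The example below is only an illustration that swaps in the JDK's ObjectOutputStream so it needs no extra dependency; the class name MultiObjectStream and the boolean "another object follows" marker are assumptions made for this sketch, not part of any library format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper: writes any number of objects to one stream, one at a time.
class MultiObjectStream {

    // Each object is preceded by 'true'; a final 'false' marks the end,
    // so the reader never has to rely on EOF.
    static byte[] writeAll(Iterator<?> source) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            while (source.hasNext()) {
                out.writeBoolean(true);
                out.writeObject(source.next());
            }
            out.writeBoolean(false);
        }
        return bytes.toByteArray();
    }

    // Reads objects until the end marker is seen.
    static List<Object> readAll(byte[] data) throws IOException, ClassNotFoundException {
        List<Object> result = new ArrayList<>();
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            while (in.readBoolean()) {
                result.add(in.readObject());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        byte[] data = writeAll(Arrays.asList("first", 42, 3.5).iterator());
        System.out.println(readAll(data));
    }
}
```

Because the writer takes an Iterator, the objects do not have to sit in one array or collection beforehand; the count does not need to be known up front either.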
As Kryo supports streaming, there is nothing to stop you writing/reading more than one object to Kryo "at the top level". For example, the following program writes two unrelated objects to a file and then deserializes them again:
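import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.util.ArrayList;
import java.util.Collection;

public class TestClass {
    public static void main(String[] args) throws FileNotFoundException {
        serialize();
        deSerialize();
    }

    public static void serialize() throws FileNotFoundException {
        Collection<String> collection = new ArrayList<>();
        int otherData = 12;
        collection.add("This is a serialized collection of strings");
        // Note: Kryo 5 and later require class registration by default; with those
        // versions, register ArrayList first or call setRegistrationRequired(false).
        Kryo kryo = new Kryo();
        Output output = new Output(new FileOutputStream("testfile"));
        kryo.writeClassAndObject(output, collection);
        kryo.writeClassAndObject(output, otherData); // we could add as many of these as we like
        output.close();
    }

    @SuppressWarnings("unchecked")
    public static void deSerialize() throws FileNotFoundException {
        Collection<String> collection;
        int otherData;
        Kryo kryo = new Kryo();
        Input input = new Input(new FileInputStream("testfile"));
        // Objects must be read back in the same order they were written
        collection = (Collection<String>) kryo.readClassAndObject(input);
        otherData = (Integer) kryo.readClassAndObject(input);
        input.close();
        for (String string : collection) {
            System.out.println(string);
        }
        System.out.println("There are other things too! like; " + otherData);
    }
}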
Related
I have a serializable class with custom writeObject() and readObject() methods.
When an object serializes, it needs to write two byte arrays, one after another. When something deserializes it, it needs to read those two arrays.
This is my code:
private void writeObject (final ObjectOutputStream out) throws IOException {
    ..
    out.writeByte(this.signature.getV()); // one byte
    out.writeObject(this.signature.getR()); // an array of bytes
    out.writeObject(this.signature.getS()); // an array of bytes
    out.close();
}

private void readObject (final ObjectInputStream in) throws IOException, ClassNotFoundException {
    ..
    v = in.readByte();
    r = (byte[])in.readObject();
    s = (byte[])in.readObject();
    // creating a new object because Sign.SignatureData is not serializable
    this.signature = new Sign.SignatureData(v, r, s);
    in.close();
}
When the object is being deserialized (readObject method) it throws an EOFException and all three variables are null/undefined.
Relating to the question title, I saw a class called ByteArrayOutputStream, but to use it, it has to be enclosed in an ObjectOutputStream, which I cannot do, as I have an OutputStream given and must write with it.
1. How does one properly write a byte array using ObjectOutputStream and properly read it using ObjectInputStream?
2. Why does the code above throw an EOFException without reading even one variable?
EDIT: I need to clarify: readObject() and writeObject() are called by the JVM itself while deserializing and serializing the object.
Second, SignatureData is a nested class of Sign, which comes from a third-party library, and that's why it's not serializable.
Third, the problem probably lies in reading and writing the byte arrays with the ObjectInput/ObjectOutput streams, not in the Sign.SignatureData class.
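For reference, here is a minimal sketch of a custom writeObject()/readObject() pair that writes one byte and two byte arrays and reads them back in the same order. It is not a diagnosis of the code above: SignedMessage and its fields are hypothetical stand-ins for the asker's class, and the one deliberate difference is that neither method closes the stream, since the stream belongs to whoever started the serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

// Hypothetical stand-in for the asker's class; v, r and s mimic the signature parts.
class SignedMessage implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String text;
    // transient: skipped by default serialization and written by hand below
    private transient byte v;
    private transient byte[] r;
    private transient byte[] s;

    SignedMessage(String text, byte v, byte[] r, byte[] s) {
        this.text = text;
        this.v = v;
        this.r = r;
        this.s = s;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();   // writes the non-transient fields (text)
        out.writeByte(v);           // one byte
        out.writeObject(r);         // byte[] is serializable as an object
        out.writeObject(s);
        // Deliberately no out.close(): the caller owns the stream.
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();     // restores text
        v = in.readByte();          // same order as in writeObject
        r = (byte[]) in.readObject();
        s = (byte[]) in.readObject();
        // Deliberately no in.close().
    }

    String getText() { return text; }
    byte getV() { return v; }
    byte[] getR() { return r; }
    byte[] getS() { return s; }

    // Serializes to memory and back, as the JVM would do through the private hooks.
    static SignedMessage roundTrip(SignedMessage message) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(message);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (SignedMessage) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SignedMessage copy = roundTrip(new SignedMessage("hello", (byte) 27, new byte[] {1, 2, 3}, new byte[] {4, 5}));
        System.out.println(copy.getText() + " " + Arrays.toString(copy.getR()) + " " + Arrays.toString(copy.getS()));
    }
}
```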
I have an array of bytes whose length equals XXX. It contains a serialized object that I want to deserialize (i.e. I want to create a copy of this object from the stored bytes).
But I have a constraint: the useful length of my byte array. I want to take it into account when deserializing (i.e. the serialized object can be shorter than the array).
I hope my two little methods make this easier to understand (the first serializes, the second deserializes):
byte[] toBytes() throws IOException {
    byte[] array_bytes;
    ByteArrayOutputStream byte_array_output_stream = new ByteArrayOutputStream();
    ObjectOutput object_output = new ObjectOutputStream(byte_array_output_stream);
    object_output.writeObject(this);
    object_output.close();
    array_bytes = byte_array_output_stream.toByteArray();
    return array_bytes;
}
And the current deserialization method (which is "wrong" for the moment because it doesn't use the useful length):
static Message fromBytes(byte[] bytes, int length) throws IOException, ClassNotFoundException, ClassCastException {
    Message message;
    ByteArrayInputStream byte_array_input_stream = new ByteArrayInputStream(bytes);
    ObjectInput object_input = new ObjectInputStream(byte_array_input_stream);
    message = (Message) object_input.readObject();
    object_input.close();
    return message;
}
As you can see, readObject doesn't take a length, but I need one. That's a problem, and perhaps I should NOT use this method.
Thus, my question is: with or without readObject, how can I take into account the useful length (i.e. the "payload") of my byte array?
I assume that your Message class implements Serializable.
In this case, when you write your message, it gets serialized automatically by the Java runtime, as explained in the documentation of the Serializable interface.
I cannot be sure how or why you might find part of the generated byte array as not useful, since it is all part of the serialized instance.
However, I would suggest going the Externalizable route:
Have your Message class implement Externalizable. You then control exactly how your class gets serialized and deserialized in the writeExternal(ObjectOutput out) and readExternal(ObjectInput in) methods respectively, where you can write the length you want to the stream, read it back, and/or keep only the required number of bytes.
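A minimal sketch of that suggestion follows. The class LengthPrefixedMessage and its payload field are hypothetical (the real Message class is not shown in the question); the sketch writes an explicit length before the bytes and, when reading from an oversized array, wraps only the first `length` bytes using the three-argument ByteArrayInputStream constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical message type that controls its own wire format.
class LengthPrefixedMessage implements Externalizable {
    private byte[] payload = new byte[0];

    // Externalizable requires a public no-argument constructor.
    public LengthPrefixedMessage() { }

    LengthPrefixedMessage(byte[] payload) {
        this.payload = payload.clone();
    }

    byte[] getPayload() {
        return payload.clone();
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(payload.length);   // explicit length first
        out.write(payload);             // then exactly that many bytes
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        int length = in.readInt();
        payload = new byte[length];
        in.readFully(payload);          // reads exactly 'length' bytes
    }

    byte[] toBytes() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(this);
        }
        return bytes.toByteArray();
    }

    // Only the first 'length' bytes of the array are visible to the stream.
    static LengthPrefixedMessage fromBytes(byte[] bytes, int length)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes, 0, length))) {
            return (LengthPrefixedMessage) in.readObject();
        }
    }
}
```

A caller holding a larger receive buffer would pass that buffer together with the number of bytes actually filled, for example `LengthPrefixedMessage.fromBytes(buffer, bytesReceived)`, where both names are placeholders.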
I would like to know how to save an ArrayList of abstract Objects to a file.
So far I have only saved primitive types or ArrayLists of primitive types by converting them to a comma-separated String and storing it with a buffered writer.
But now I have an ArrayList of Game Elements, which have very different properties and constructors, so my normal approach won't work. There has to be something nicer than storing each object in its own file, storing each type of object in its own file, or adding plenty of separator levels.
How do I do this in a nice way?
Have a look at Serialization, there are plenty of tutorials out there so I am not going to post any code:
http://www.tutorialspoint.com/java/java_serialization.htm
You cannot instantiate an abstract class, so you will need a child class that extends it. The abstract class should also implement Serializable. Then, using ObjectOutputStream, you can write the ArrayList directly with the writeObject() method.
Below is a sample application:
public abstract class Parent implements Serializable {
    public abstract String getValue(); // Just to show that the value persists
}

public class Child extends Parent {
    String value = null;

    Child(String value) {
        this.value = value;
    }

    public String getValue() {
        return value;
    }
}

// Declares the checked exceptions thrown by the stream calls below
public static void main(String[] args) throws FileNotFoundException,
        IOException, ClassNotFoundException {
    // Create the ArrayList
    ArrayList<Parent> parents = new ArrayList<Parent>();
    parents.add(new Child("test"));
    // Store
    ObjectOutputStream objectOutputStream = new ObjectOutputStream(
            new FileOutputStream("test.txt"));
    objectOutputStream.writeObject(parents);
    objectOutputStream.close();
    // Read back
    ObjectInputStream objectInputStream = new ObjectInputStream(
            new FileInputStream("test.txt"));
    ArrayList<Parent> readObjects = (ArrayList<Parent>)objectInputStream.readObject();
    objectInputStream.close();
    System.out.println(readObjects.get(0).getValue());
}
There are two possible answers, depending on how the file will be used later.
ANS 1: If you want the object values to be saved temporarily in the file and reloaded from it later, then serialization is the best option.
ANS 2: If the file is an output of the program, then try one of the options below.
option#1: Start each line in the file with the unique object name:
OBJECT1, blue, pink, yellow....
OBJECT2, rose, dairy, sunflower, cauliflower..
option#2: Instead of a flat file (txt), you can use the Apache POI framework to write the objects in a more organised way.
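The flat-file layout from option#1 could be produced along these lines. This is only a sketch: FlatFileWriter, toLine and the map of object names to values are names invented for the illustration, and real game elements would need their own mapping to a list of strings.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical writer for the "name first, then values" line format.
class FlatFileWriter {

    // Builds one line: the unique object name followed by its values.
    static String toLine(String objectName, List<String> values) {
        return objectName + ", " + String.join(", ", values);
    }

    // Writes one line per entry, in the map's iteration order.
    static void write(Path file, Map<String, List<String>> objects) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : objects.entrySet()) {
            lines.add(toLine(entry.getKey(), entry.getValue()));
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
    }
}
```

Values that may themselves contain a comma would need quoting or a different separator, which this sketch does not handle.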
I am using an inner class that is a subclass of a HashMap. I have a String as the key and double[] as the values. I store about 200 doubles per double[]. I should be using around 700 MB to store the keys, the pointers and the doubles. However, memory analysis reveals that I need a lot more than that (a little over 2 GB).
Using TIJmp (profiling tool) I saw there was a char[] that was using almost half of the total memory. TIJmp said that char[] came from Serializable and Cloneable. The values in it ranged from a list of fonts and default paths to messages and single characters.
What is the exact behavior of Serializable in the JVM? Is it keeping a "persistent" copy at all times, thus doubling the size of my memory footprint? How can I write binary copies of an object at runtime without turning the JVM into a memory hog?
PS: The method where the memory consumption increases the most is the one below. The file has around 229,000 lines and 202 fields per line.
public void readThetas(String filename) throws Exception
{
    long t1 = System.currentTimeMillis();
    documents = new HashMapX<String,double[]>(); //Document names to indices.
    Scanner s = new Scanner(new File(filename));
    int docIndex = 0;
    if (s.hasNextLine())
        System.out.println(s.nextLine()); // Consume useless first line :)
    while(s.hasNextLine())
    {
        String[] fields = s.nextLine().split("\\s+");
        String docName = fields[1];
        numTopics = fields.length/2-1;
        double[] thetas = new double[numTopics];
        for (int i=2;i<numTopics;i=i+2)
            thetas[Integer.valueOf(fields[i].trim())] = Double.valueOf(fields[i+1].trim());
        documents.put(docName,thetas);
        docIndex++;
        if (docIndex%10000==0)
            System.out.print("*"); //progress bar ;)
    }
    s.close();
    long t2 = System.currentTimeMillis();
    System.out.println("\nRead file in "+ (t2-t1) +" ms");
}
Oh, and HashMapX is an inner class declared like this:
public static class HashMapX< K, V> extends HashMap<K,V> {
    public V get(Object key, V altVal) {
        if (this.containsKey(key))
            return this.get(key);
        else
            return altVal;
    }
}
This may not address all of your questions, but here is one way in which serialization can significantly increase memory usage: http://java.sun.com/javase/technologies/core/basic/serializationFAQ.jsp#OutOfMemoryError.
In short, if you keep an ObjectOutputStream open then none of the objects that have been written to it can be garbage-collected unless you explicitly call its reset() method.
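The bookkeeping behind this can be observed in a small experiment. An ObjectOutputStream remembers every object it has written so that it can emit a back-reference instead of a second copy; reset() discards that table. In the sketch below (ResetDemo is a made-up name), the same array is written twice with a modification in between, once without and once with reset().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Arrays;

class ResetDemo {

    // Writes the same int[] twice, changing it in between, and returns the
    // value read back for the first and the second write.
    static int[] roundTrip(boolean callReset) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int[] counter = {1};
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(counter);
            counter[0] = 2;
            if (callReset) {
                out.reset();   // forget previously written objects
            }
            out.writeObject(counter);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            int[] first = (int[]) in.readObject();
            int[] second = (int[]) in.readObject();
            return new int[] {first[0], second[0]};
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(Arrays.toString(roundTrip(false))); // [1, 1]: second write was a back-reference
        System.out.println(Arrays.toString(roundTrip(true)));  // [1, 2]: reset() forced a fresh copy
    }
}
```

The same table of remembered objects is what keeps them reachable for as long as the stream stays open, so a long-running writer would typically call reset() every so often, at the cost of losing shared references across the reset point.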
So, I found the answer. It is a memory leak in my code. It had nothing to do with Serializable or Cloneable.
This code is trying to parse a file. Each line contains a set of values which I am trying to extract. Then, I keep some of those values and store them in a HashMapX or some other structure.
The core of the problem is here:
String[] fields = s.nextLine().split("\\s+");
String docName = fields[1];
and I propagate it here:
documents.put(docName,thetas);
What happens is that docName is one of the strings produced by split, and I keep that reference for the life of the program (by storing it in the global HashMap documents). On older JVMs (before Java 7 update 6), a string produced by split or substring shares the backing char[] of the whole line it was cut from, so as long as I keep docName alive, the character data of the entire line cannot be garbage collected. The solution:
String docName = new String(fields[1]); // A copy, not a reference.
This copies only the characters of the name and drops the reference to the shared array. In this way, the garbage collector can free the memory used by each line once I have processed its fields.
I hope this will be useful to all of those who parse large text files using split and store some of the fields in global variables.
Thanks everybody for their comments. They guided me in the right direction.
I understand the theory behind incompatible serialVersionUIDs (i.e. you can discriminate different compilation versions of the same class) but I am seeing an issue that I don't understand and doesn't fall into the obvious error causes (different compiled version of the same class).
I am testing a serialization/deserialization process. All code is running on one machine, in the same VM, and both serialization and deserialization methods are using the same version of the compiled class. Serialization works fine. The class being serialized is quite complex, contains a number of other classes (java types and UDTs), and contains reference cycles. I haven't declared my own UID in any class.
This is the code:
public class Test {
    public static void main(String[] args) throws Exception {
        ContextNode context = WorkflowBuilder.getSimpleSequentialContextNode();
        String contextString = BinarySerialization.serializeToString(context);
        ContextNode contextD = BinarySerialization.deserializeFromString(ContextNode.class, contextString);
    }
}

public class BinarySerialization {
    public static synchronized String serializeToString(Object obj) throws Exception {
        ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(byteStream);
        oos.writeObject(obj);
        oos.close();
        return byteStream.toString();
    }

    public static synchronized <T> T deserializeFromString(Class<T> type, String byteString) throws Exception {
        T object = null;
        ByteArrayInputStream byteStream = new ByteArrayInputStream(byteString.getBytes());
        ObjectInputStream in = new ObjectInputStream(byteStream);
        object = (T)in.readObject();
        in.close();
        return object;
    }
}
I am getting an InvalidClassException (local class incompatible: stream classdesc serialVersionUID = -7189235121689378989, local class serialVersionUID = -7189235121689362093) when deserializing.
What is the underlying issue? And how should I fix it?
Thanks
Edit
I should state the purpose of this. The serialized data will both need to be stored in a sqlite database and sent across the wire to other clients. If String is the wrong format for passing around the serialized data, what should I be using instead that will let me store and pass the data about? Thanks again.
First rule: never use String or char[] or Reader or Writer when handling binary data.
You're handling binary data and trying to put it into a String. Don't do that; it's an inherently broken operation, because decoding arbitrary bytes into characters and encoding them back does not reliably reproduce the original bytes.
Next: the return value of byteStream.toString() doesn't in any way represent the actual content of the ByteArrayOutputStream. You'll want to use .toByteArray() and pass the byte[] around (remember: treat binary data as binary data and not as a String).
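A byte[]-based version of the two helper methods might look like the sketch below. BinarySerializationSketch and its method names are illustrative, not the asker's actual class. The Base64 helpers are an addition that goes beyond the answer above: they are one common way to carry the bytes through a text-only channel, while a BLOB column or a binary wire format can take the byte[] as it is.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

// Hypothetical replacement for the String-based helpers in the question.
class BinarySerializationSketch {

    // Returns the raw serialized bytes; no character decoding involved.
    static byte[] serializeToBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(byteStream)) {
            oos.writeObject(obj);
        }
        return byteStream.toByteArray();
    }

    static <T> T deserializeFromBytes(Class<T> type, byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return type.cast(in.readObject());
        }
    }

    // Optional text-safe wrapping (an addition, only for channels that require text).
    static String toBase64(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    static byte[] fromBase64(String text) {
        return Base64.getDecoder().decode(text);
    }
}
```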