Implementing my own serialization in java - java

How can I implement serialization on my own? That is, I don't want my class to implement Serializable; I want to implement the serialization myself, so that without implementing Serializable I can transfer objects over a network or write them to a file and later retrieve them in the same state. I want to do this to learn and explore.

Serialization is the process of translating the structure of an object into another format that can easily be transferred across a network or stored in a file. Java serializes objects into a binary format. This is not necessary if bandwidth or disk space is not a problem. You can simply encode your objects as XML:
// Code is for illustration purposes only, I haven't compiled it!!!
public class Person {
    private String name;
    private int age;
    // ...

    // Note: name is appended as-is; characters such as '<' or '&' would need XML escaping.
    public String serializeToXml() {
        StringBuilder xml = new StringBuilder();
        xml.append("<person>");
        xml.append("<attribute name=\"age\" type=\"int\">").append(age);
        xml.append("</attribute>");
        xml.append("<attribute name=\"name\" type=\"string\">").append(name);
        xml.append("</attribute>");
        xml.append("</person>");
        return xml.toString();
    }
}
Now you can get an object's XML representation and "serialize" it to a file or a network connection. A program written in any language that can parse XML can "deserialize" this object into its own data structure.
If you need a more compact representation, you can think of binary encoding:
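// A naive binary serializer (these methods go inside Person).
// Needs: java.io.ByteArrayOutputStream, java.nio.ByteBuffer, java.nio.charset.StandardCharsets
public byte[] serializeToBytes() {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    // Object name and number of attributes.
    // writeString writes the 4-byte length of the string and then the
    // string itself to the ByteArrayOutputStream.
    writeString("Person", bytes);
    bytes.write(2); // number of attributes
    // Serialize age
    writeString("age", bytes);
    bytes.write(1); // type = 1 (i.e., int)
    writeString(Integer.toString(age), bytes);
    // Serialize name
    writeString("name", bytes);
    bytes.write(2); // type = 2 (i.e., string)
    writeString(name, bytes);
    return bytes.toByteArray();
}

private static void writeString(String s, ByteArrayOutputStream bytes) {
    byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
    // 4-byte big-endian length of the encoded bytes, then the bytes themselves.
    // write(byte[], int, int) is used because it does not declare IOException.
    bytes.write(ByteBuffer.allocate(4).putInt(encoded.length).array(), 0, 4);
    bytes.write(encoded, 0, encoded.length);
}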
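For completeness, here is a minimal round-trip sketch of a hand-written binary format, including the reading side. It is not the format used above: it assumes a hypothetical PersonCodec class and leans on DataOutputStream/DataInputStream from the JDK, which handle the length prefix for strings and the byte layout for ints.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for Person, with a writer and a matching reader.
final class PersonCodec {
    final String name;
    final int age;

    PersonCodec(String name, int age) {
        this.name = name;
        this.age = age;
    }

    // Writes the fields in a fixed order: name (length-prefixed), then age (4 bytes).
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(name);
            out.writeInt(age);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads the fields back in exactly the order they were written.
    static PersonCodec fromBytes(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String name = in.readUTF();
            int age = in.readInt();
            return new PersonCodec(name, age);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling PersonCodec.fromBytes(new PersonCodec("Alice", 30).toBytes()) should give back an object with the same name and age. The reader and writer must agree on field order, which is exactly what makes versioning such formats hard.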
To learn about a more compact binary serialization scheme, see the Java implementation of Google Protocol Buffers.

You can use Externalizable and implement your own serialization mechanism. One of the difficult aspects of serialization is versioning so this can be a challenging exercise to implement. You can also look at protobuf and Avro as binary serialization formats.
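As a rough sketch of the Externalizable route, the class below writes its own fields and puts a format version first, so that a reader can reject (or later adapt to) layouts it does not know. The class name VersionedPoint and the FORMAT_VERSION constant are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical example class; the layout is: version byte, x, y.
final class VersionedPoint implements Externalizable {
    private static final int FORMAT_VERSION = 1;

    private int x;
    private int y;

    // Externalizable requires a public no-arg constructor.
    public VersionedPoint() { }

    VersionedPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int x() { return x; }
    int y() { return y; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeByte(FORMAT_VERSION);
        out.writeInt(x);
        out.writeInt(y);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        int version = in.readByte();
        if (version != FORMAT_VERSION) {
            throw new IOException("Unsupported format version: " + version);
        }
        x = in.readInt();
        y = in.readInt();
    }

    // Serializes to a byte array and reads the object back.
    static VersionedPoint roundTrip(VersionedPoint point) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(point);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (VersionedPoint) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that the object stream still records the class name; only the field data is under your control.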

You start with reflection. Get the object's class and the declared fields of that class and all its superclasses. Then obtain the value of each field and write it to the dump.
When deserializing, just reverse the process: get the class name from your serialized form, instantiate an object, and set its fields according to the dump.
That's the simplistic approach if you just want to learn. There are many issues that can come up if you want to do it "for real":
Versioning. What if one end of the application is running a new version, but the other end has an older class definition with some fields missing or renamed?
Overriding default behavior. What if some object is more complex and cannot be recreated on a simple field-by-field basis?
Recreating dependencies between objects, including cyclic ones.
... and probably many more.
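The simplistic approach could look roughly like the sketch below. It is only an illustration: the "dump" is an in-memory map rather than bytes, values are copied by reference (no recursion into nested objects), and the target class is assumed to have a no-arg constructor. ReflectiveDumper, Animal, and Dog are hypothetical names.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

final class ReflectiveDumper {
    // Collects every instance field of the object's class and its superclasses.
    static Map<String, Object> dump(Object source) {
        Map<String, Object> values = new LinkedHashMap<>();
        try {
            for (Class<?> c = source.getClass(); c != Object.class; c = c.getSuperclass()) {
                for (Field f : c.getDeclaredFields()) {
                    if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
                    f.setAccessible(true); // needed for private fields
                    values.put(c.getName() + "." + f.getName(), f.get(source));
                }
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return values;
    }

    // Creates an instance via the no-arg constructor and fills in the dumped fields.
    static <T> T restore(Class<T> type, Map<String, Object> values) {
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            T target = ctor.newInstance();
            for (Class<?> c = type; c != Object.class; c = c.getSuperclass()) {
                for (Field f : c.getDeclaredFields()) {
                    String key = c.getName() + "." + f.getName();
                    if (Modifier.isStatic(f.getModifiers()) || !values.containsKey(key)) continue;
                    f.setAccessible(true);
                    f.set(target, values.get(key));
                }
            }
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Hypothetical classes used to try the dumper on a small hierarchy.
class Animal {
    private String name = "unnamed";
    String name() { return name; }
    void rename(String newName) { name = newName; }
}

class Dog extends Animal {
    private int age;
    int age() { return age; }
    void setAge(int newAge) { age = newAge; }
}
```

Because restore goes through a constructor, it runs whatever side effects that constructor has, which is one of the "for real" issues listed above.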

Get the Java source code and understand how serialization is implemented. I did this some months ago, and now have a serialization that uses only 16% of the space and 20% of the time of "normal" serialization, at the cost of assuming that the classes that wrote the serialized data have not changed. I use this for client-server serialization, where I can rely on this assumption.

As a supplement to @Konrad Garus' answer: there is one issue that is a show-stopper for a full reimplementation of Java serialization.
When you deserialize an object, you need to use one of the object's class's constructors to recreate an instance. But which constructor should you use? If there is a no-args constructor, you could conceivably use that. However, the no-args constructor (or indeed any constructor) might do something with the object in addition to creating it. For example, it might send a notification to something else that a new instance has been created ... passing the instance that isn't yet completely deserialized.
In fact, it is really difficult to replicate what the standard Java deserialization code does. What it does is this:
1. It determines the class to be created.
2. It creates an instance of the class without calling any of its constructors.
3. It uses reflection to fill in the instance's fields, including private fields, with objects and values reconstructed from the serialization.
The problem is that step 2 involves some "black magic" that a normal Java class is not permitted to do.
(If you want to understand the gory details, read the serialization spec and take a look at the implementation in the OpenJDK codebase.)
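The constructor-bypassing behaviour can be observed with a small experiment. In this sketch (TrackedInstance is a hypothetical class), the constructor counts its invocations; after a round trip through ObjectOutputStream/ObjectInputStream there is a second instance, yet the counter has not moved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class TrackedInstance implements Serializable {
    private static final long serialVersionUID = 1L;

    // Counts how often the constructor below has run.
    static int constructorCalls = 0;

    private final String label;

    TrackedInstance(String label) {
        this.label = label;
        constructorCalls++;
    }

    String label() { return label; }

    // Standard Java serialization round trip through a byte array.
    static TrackedInstance roundTrip(TrackedInstance original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (TrackedInstance) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A hand-written deserializer that goes through a constructor cannot reproduce this without relying on JDK internals.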

Related

Writing data classes with builder pattern

I have a data class which uses a builder to create the object and stores the data in a buffer in serialized form. I am planning to change the class to add and remove some fields. There are systems that will use both versions of the class to create data, i.e. the current version with all fields and the newer version with removed/added fields. What is the best way to do this so that it is backward compatible (without breaking any consumer)?
I have a couple of ideas on how to do this, but I am having a difficult time picking one over the others.
Requirements:
The data stored has to be in binary.
The length of a serialized record is the same in both versions.
Existing code:
public class A implements Comparable<A>, Serializable {
    private final ByteBuffer buffer;

    public static final class Builder {
        private Header header; // header with version
        private long creationTime;
        private SomeObject someObject; // this is removed in the next version
        private OtherObject otherObject; // this is added in the next version

        public Builder() { }

        // bunch of getters and setters for the fields

        public A build() { return new A(this); }
    }

    private A(Builder b) {
        // build the object and put it into the buffer
        validate();
    }

    public A(ByteBuffer buf) {
        this.buffer = buf;
        validate();
    }

    public A(String encodedString) {
        // assumes the string is Base64-encoded, as described below
        this(ByteBuffer.wrap(Base64.getDecoder().decode(encodedString)));
    }

    private void validate() {
        // validates the object
    }

    // consumers use this to get creationTime for object A
    public long getCreationTime() {
        return buffer.getLong(OFFSET_CREATION_DATE);
    }

    // compareTo(A) and the OFFSET_* constants are omitted here
}
Solution 1: Add the new fields to the builder and use the version in the header to decide which fields to use at build time (in the build method) to create the object. The problem with this approach is that all the methods will exist at compile time for consumers, and every object will appear valid unless they test their code. So it will be difficult to reason about which fields are required for which version at build time.
Solution 2: Add a new builder to the class with the fields that you want. Some fields will duplicate those in the existing builder.
Consumers can then use the builder that they want. This seems cleaner because the builders will be completely independent. The problem with this approach is that since we are adding and removing fields, fields will be at different offsets, so the getters will have to change to use an offset based on the version type. This is also problematic for future versions, because we will end up with a gigantic class with lots of builders and per-version logic in the getters.
Solution 3: Create a new class (let's say B) that extends A and has its own builder. This way the code is more modular. The problem is that there will need to be some logic somewhere to differentiate and know which constructor to call. For example, if a consumer is passing Base64 to get an object A, it will need to figure out which version it is.
String encodedString = "someString form of A";
A a = new A(encodedString);
Is there a recommended way to code these data classes with the builder pattern so that they are both forward and backward compatible?
Approach 2 combined with approach 1, plus the right serialized representation, is the answer. As for choosing the format, the simplest thing would be to pick JSON. Make concrete builders for V1 and V2 objects and use the byte buffer to construct them. Each builder/factory would be interested only in the fields it recognizes. You may consider using a version field; if a builder/factory attempts to deserialize the wrong version, an exception may be thrown. A concrete builder/factory will build only objects of the version it recognizes.
Subclassing is unnecessary in my opinion. You can separate the builder/factory class from the object class. See "StreamSerializer" from Hazelcast as an example: a class completely external to the entity, dedicated only to marshaling.
Using a proper format will fix your problem with offsets from approach 2. If you must have it in binary form, then a workaround would be to use a flat format where the record size is bigger than necessary and you have reserved free space for changes. In the old COBOL days this is how they did it. I don't recommend that, though. Use JSON :) it is the simplest, though maybe not the most efficient. You can also check Protocol Buffers: https://developers.google.com/protocol-buffers/
Depending on the layout you choose for serialization, when demarshaling you may configure a chain of responsibility in which each link attempts to deserialize a portion of the stream. When you deprecate a version, its marshaler is simply removed/deactivated from the chain.
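A chain of responsibility keyed on a version byte might be sketched as follows. Everything here is an assumption made for illustration: the 16-byte record layout (version byte, 8-byte creation time, 4-byte value, padding), the Reader interface, and the V1Reader/V2Reader names are not taken from the question's code.

```java
import java.nio.ByteBuffer;
import java.util.List;

final class VersionedDecoder {
    static final int RECORD_SIZE = 16; // assumed fixed record length for every version

    // One link in the chain: knows exactly one version of the layout.
    interface Reader {
        boolean supports(int version);
        String read(ByteBuffer body);
    }

    static final class V1Reader implements Reader {
        public boolean supports(int version) { return version == 1; }
        public String read(ByteBuffer body) {
            return "v1 creationTime=" + body.getLong() + " some=" + body.getInt();
        }
    }

    static final class V2Reader implements Reader {
        public boolean supports(int version) { return version == 2; }
        public String read(ByteBuffer body) {
            return "v2 creationTime=" + body.getLong() + " other=" + body.getInt();
        }
    }

    private final List<Reader> chain;

    VersionedDecoder(List<Reader> chain) {
        this.chain = chain;
    }

    // Reads the version byte, then hands the rest to the first reader that accepts it.
    String decode(byte[] record) {
        ByteBuffer buffer = ByteBuffer.wrap(record);
        int version = buffer.get();
        for (Reader reader : chain) {
            if (reader.supports(version)) {
                return reader.read(buffer);
            }
        }
        throw new IllegalArgumentException("No reader for version " + version);
    }

    // Helpers that build sample records in the assumed layout.
    static byte[] sampleV1(long creationTime, int some) {
        return ByteBuffer.allocate(RECORD_SIZE).put((byte) 1).putLong(creationTime).putInt(some).array();
    }

    static byte[] sampleV2(long creationTime, int other) {
        return ByteBuffer.allocate(RECORD_SIZE).put((byte) 2).putLong(creationTime).putInt(other).array();
    }
}
```

Deprecating a version then amounts to constructing the decoder without that reader; records of the retired version are rejected with an exception instead of being misread.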

Use of Serializable other than writing and reading objects to/from a file

In which cases is it good coding practice to implement Serializable, other than for writing and reading objects to/from a file? In a project I went through, a class implements Serializable even though nothing in that class or project writes or reads objects to/from a file.
If the object leaves the JVM it was created in, the class should implement Serializable.
Serialization is a method by which an object can be represented as a sequence of bytes that includes the object's data as well as information about the object's type and the types of data stored in the object.
After a serialized object has been written into a file, it can be read from the file and deserialized; that is, the type information and bytes that represent the object and its data can be used to recreate the object in memory.
This is the main purpose of deserialization: to recover the object's data, its type, and the types of its variables from a written (loosely speaking) representation of the object. Serialization is what makes this possible in the first place.
So whenever your object might leave the JVM the program is executing in, you should make the class implement Serializable.
This covers reading/writing objects to files and passing an object over the internet or any other type of connection. Whenever the object leaves the JVM it was created in, it should implement Serializable, so that it can be serialized and then deserialized once it enters another (or the same) JVM.
Many good reads at :
1: Why Java needs Serializable interface?
2: What is the purpose of Serialization in Java?
Benefits of serialization:
To persist data for future use.
To send data to a remote computer using client/server Java technologies like RMI , socket programming etc.
To flatten an object into array of bytes in memory.
To send objects between the servers in a cluster.
To exchange data between applets and servlets.
To store user session in Web applications
To activate/passivate enterprise java beans.
You can refer to this article for more details.
If you ever expect your objects to be used as data in an RMI setting, they should be serializable, as RMI needs objects either to be Serializable (if they are to be serialized and sent to the remote side) or to be a UnicastRemoteObject if you need a remote reference.
In earlier versions of Java (before Java 5), marker interfaces were a good way to declare metadata, but now we have annotations, which are a more powerful way to declare metadata for classes.
Annotations are flexible, and we can configure whether the annotation metadata is kept only in the bytecode or is also available at run time.
If you are not going to read and write objects, then the only purpose left for Serializable is to declare metadata for the class, and if that is your goal, I personally suggest you use an annotation instead of Serializable.
An annotation is a better choice than a marker interface, and JUnit is a perfect example of using annotations, e.g. @Test for marking tests. The same could also be achieved by using a Test marker interface.
Another example showing that annotations are the better choice: @ThreadSafe looks a lot better than implementing a ThreadSafe marker interface.
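To make the comparison concrete, here is a small sketch that tags one class with a marker interface and another with a runtime-retained annotation, and then checks each tag. The ThreadSafeMarker interface and the ThreadSafe annotation are defined locally for illustration; they are not the ones from any particular library.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

final class MarkerVersusAnnotation {
    // Marker interface: no methods, only a tag in the type hierarchy.
    interface ThreadSafeMarker { }

    // Annotation: RUNTIME retention makes it visible to reflection.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface ThreadSafe { }

    static final class CounterWithMarker implements ThreadSafeMarker { }

    @ThreadSafe
    static final class CounterWithAnnotation { }

    // A marker interface is checked with instanceof.
    static boolean taggedByInterface(Object o) {
        return o instanceof ThreadSafeMarker;
    }

    // An annotation is checked through reflection on the class.
    static boolean taggedByAnnotation(Object o) {
        return o.getClass().isAnnotationPresent(ThreadSafe.class);
    }
}
```

One practical difference: the marker becomes part of the type and is inherited by subclasses, while the annotation leaves the type hierarchy alone and can carry parameters.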
There are other cases in which you want to send an object by value instead of by reference:
Sending objects over the network.
Can't really send objects by reference here.
Multithreading, particularly in Android
Android uses Serializable/Parcelable to send information between Activities. It has something to do with memory mapping and multithreading. I don't really understand this though.
In addition to Martin C's answer, I want to add that if you use Serializable, you can easily save and load your whole object graph. For example, say you have a Student class which has a Department. If you serialize your Student, the Department is saved as well. Moreover, it also allows you:
1. to rename variables in a serialized class while maintaining backwards-compatibility.
2. to access data from deleted fields in a new version (in other words, change the internal representation of your data while maintaining backwards-compatibility).
Some frameworks/environments might depend upon data objects being serializable. For example in J2EE, the HttpSession attributes must be serializable in order to benefit from Session Persistence. Also RMI and other dark ages artifacts use serialization.
Therefore, though you might not immediately need your data objects to be serializable, it might make sense to declare Serializable just in case (It is almost free, unless you need to go through the pain of declaring readObject/writeObject methods)

Serializing objects with changing class source code

Note: Due to the lack of questions like this on SO, I've decided to put one up myself as a Q&A
Serializing objects (using an ObjectOutputStream and an ObjectInputStream) is a method for storing an instance of a Java Object as data that can be later deserialized for use. This can cause problems and frustration when the Class used to deserialize the data does not remain the same (source-code changes; program updates).
So how can an Object be serialized and deserialized with an updated / downgraded version of a Class?
Here are a few common ways of serializing an object that can be deserialized in a backwards-compatible way.
1. Store the data in the JSON format using import and export methods designed to save all fields needed to recreate the instance. This can be made backwards-compatible by including a version key, so that an update algorithm can be called if the version is too low. A common library for this is Google's Gson, which can convert Java objects to and from JSON as well as read and edit JSON data directly.
2. Use the built-in Java Properties class in a way similar to the method described above. Properties objects can later be written to a stream (store()) as a regular Java properties file, or saved as XML (storeToXML()).
3. Sometimes simple objects can be easily represented with key-value pairs in a place where storing them in a JSON, XML, or Properties file is either too complicated or not necessary (overkill, one could say). In this case, an effective way of serializing the object could be to use the ObjectOutputStream class to serialize a HashMap object containing key-value pairs, where the key could be a String and the value could be an Object (HashMap<String,Object>). This allows all of the object's fields to be stored, along with a version key, while providing much versatility.
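The third option, with a version key and an upgrade step, might look like the sketch below. The key names ("version", "fullName", "name") and the rule that version 2 renamed fullName to name are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;

final class VersionedMapStore {
    static final int CURRENT_VERSION = 2;

    // Serializes a copy of the field map with ObjectOutputStream.
    static byte[] write(Map<String, Object> fields) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(new HashMap<>(fields));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return buffer.toByteArray();
    }

    // Deserializes the map and upgrades it if it was written by an older version.
    @SuppressWarnings("unchecked")
    static Map<String, Object> read(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            HashMap<String, Object> fields = (HashMap<String, Object>) in.readObject();
            return upgrade(fields);
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical upgrade rule: version 1 stored "fullName", version 2 calls it "name".
    private static Map<String, Object> upgrade(Map<String, Object> fields) {
        int version = (Integer) fields.getOrDefault("version", 1);
        if (version < 2) {
            fields.put("name", fields.remove("fullName"));
            fields.put("version", CURRENT_VERSION);
        }
        return fields;
    }
}
```

Because only HashMap, String, and boxed values end up in the stream, changes to your own class do not affect whether old data can be read.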
Note: Although serializing an object using ObjectOutputStream for persistent storage is normally considered bad practice, it can still be used as long as the class' source code remains the same.
Note about versioning: Changes to a class can be made safely, without disrupting deserialization with an ObjectInputStream, as long as they are compatible changes. As stated in the Versioning of Serializable Objects chapter of the Object Serialization Specification:
A compatible change is a change that does not affect the contract between the class and its callers.

Transfer of a Java Serialized Object

Is it possible to declare an instance of a serializable object in one Java program / class, then repeat the definitions of the internal objects in a different program /class entirely, and load in a big complex object from a data file? The goal is to be able to write an editor for items that's kept locally on my build machine, then write the game itself and distribute it to people who would like to play the game.
I'm writing a game in Java as a hobbyist project. Within my game, there's a family of classes that extend a parent class, GameItem. Items might be in various families like HealingPotion, Bomb, KeyItem, and so on.
class GameItem implements Serializable {
    String ItemName;
    String ImageResourceLocation;
    // ....
}
What I want to do is include definitions of how to create each item in a particular family of items, but then have a big class called GameItemList, which contains all possible items that can occur as you play the game.
class GameItemList implements Serializable {
    LinkedList<GameItem> gameItemList;
    // methods here like LookUpByName, LookUpByIndex that return references to an item
}
Maybe at some point - as the player starts a new game, or as the game launches, do something like:
//create itemList
FileInputStream fileIn = new FileInputStream("items.dat");
ObjectInputStream in = new ObjectInputStream(fileIn);
GameItemList allItems = (GameItemList)in.readObject();
in.close();
//Now I have an object called allItems that can be used for lookups.
Thanks guys, any comments or help would be greatly appreciated.
When you serialize an object, every field of the object is serialized, unless marked with transient. And this behavior is of course recursive. So yes, you can serialize an object, then deserialize it, and the deserialized object will have the same state as the serialized one. A different behavior would make serialization useless.
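A small sketch of both points, recursion and transient: the catalog below holds a list of items, which is serialized along with it, plus a transient counter, which is not. ItemCatalog and Item are hypothetical stand-ins for the question's GameItemList and GameItem.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

final class ItemCatalog implements Serializable {
    private static final long serialVersionUID = 1L;

    static final class Item implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Item(String name) { this.name = name; }
    }

    // Serialized recursively: the list and every Item inside it.
    private final List<Item> items = new ArrayList<>();

    // Not serialized: comes back as 0 after deserialization.
    private transient int lookups;

    void add(String name) { items.add(new Item(name)); }

    Item find(String name) {
        lookups++;
        for (Item item : items) {
            if (item.name.equals(name)) return item;
        }
        return null;
    }

    int lookups() { return lookups; }

    // Writes the catalog to a byte array and reads it back.
    static ItemCatalog roundTrip(ItemCatalog catalog) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(catalog);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (ItemCatalog) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The same round trip works through a FileOutputStream/FileInputStream pair, which is what an editor-plus-game setup would use.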
I wouldn't use native serialization for long-term storage of data, though. Serialized objects are hard to inspect, impossible to modify using a text editor, and maintaining backward compatibility with older versions of the classes is hard. I would use a more open format like XML or JSON.
Yes, that is possible. If an object is correctly serialized, it can be deserialized on any other machine as long as the application running there knows the definition of the class to be deserialized.
This will work, but Java serialization is notorious for making it hard to "evolve" classes -- the internal representation is explicitly tied to the on-disk format. You can work around this with custom reader / writer methods, but you might consider a more portable format like JSON or XML instead of object serialization.

Why is it required to mark a class as Serializable?

If a similar question has already been posted on Stack Overflow, please just post the link.
Why do objects that are to be serialized need to implement the Serializable interface (which has no methods)?
The Java API says:
- If it's not implemented, then java.io.NotSerializableException will be thrown.
That's because of the following code in ObjectOutputStream.java
// ...
writeObject0(Object obj, boolean unshared) {
    // ...
    } else if (cl.isArray()) {
        writeArray(obj, desc, unshared);
    } else if (obj instanceof Serializable) {
        writeOrdinaryObject(obj, desc, unshared);
    } else {
        throw new NotSerializableException(cl.getName());
    }
    // ...
But my question is why it is necessary to implement Serializable and thereby tell Java/the JVM that a class can be serialized. (Is it only to avoid the exception?)
If that is the case, and we write similar functionality that writes objects to streams without checking whether the class is an instance of Serializable, will the objects of a class not implementing Serializable be serialized?
Any help is appreciated.
It's a good question. Serializable is known as a marker interface, and can be viewed as a tag on a class that identifies it as having certain capabilities or behaviours. For example, you can use it to find classes that you want to serialise but that don't have a serialVersionUID defined (which may be an error).
Note that the commonly used serialisation library XStream (and others) don't require Serializable to be defined.
It is needed so that the JVM can know whether or not a class can be safely serialized. Some things (database connections for example) contain state or connections to external resources that cannot really be serialized.
Also, you'll want to make sure that you put a serialVersionUID member in every serializable class to ensure that serialized objects can be de-serialized after a code change or recompile:
// Set to some arbitrary number.
// Change if the definition/structure of the class changes.
private static final long serialVersionUID = 1;
The serialization allows you to save objects directly to binary files without having to parse them to text, write the string out and then create a new object, and parse the string inputs when reading back in. The primary purpose is to allow you to save objects with all their data to a binary file. I found it to be extremely useful when having to work with linked lists containing lots of objects of the same type and I needed to save them and open them.
The reason is that not all classes can be serialized. Examples:
I/O stuff: InputStream, HTTP connections, channels. They depend on objects created outside the scope of the Java VM and there is no simple way to restore them.
OS resources like windows, images, etc.
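The effect of the marker can be tried out directly. In this sketch (all class names are hypothetical), canSerialize attempts to write an object with ObjectOutputStream and reports whether NotSerializableException was thrown. It covers a class without the marker, a class with it, and a class that has the marker but holds an I/O object that cannot be serialized.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializableCheck {
    // No marker: ObjectOutputStream refuses to write it.
    static final class Plain {
        int value = 1;
    }

    // Marker present and all fields serializable.
    static final class Tagged implements Serializable {
        private static final long serialVersionUID = 1L;
        int value = 1;
    }

    // Marker present, but the stream field's class is not Serializable.
    static final class HoldsStream implements Serializable {
        private static final long serialVersionUID = 1L;
        InputStream source = new ByteArrayInputStream(new byte[0]);
    }

    // Returns false if writing the object fails with NotSerializableException.
    static boolean canSerialize(Object candidate) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(candidate);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Marking the stream field transient would let HoldsStream be written, at the price of having to recreate the stream after deserialization.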
