I'm writing an application that needs to write an object into a database.
For simplicity, I want to serialize the object.
But ObjectOutputStream, which is needed for this, has only one public constructor, and it takes any subclass of OutputStream as its parameter.
What parameter should be passed to it?
You can pass a ByteArrayOutputStream and then store the resulting stream.toByteArray() in the database as blob.
Make sure you specify a serialVersionUID for the class, because otherwise you'll have a hard time when you add or remove a field.
Also consider the XML version of object serialization, XMLEncoder, if you need slightly more human-readable data.
And ultimately, you may want to translate your object model to the relational model via an ORM framework. JPA implementations (Hibernate, EclipseLink, OpenJPA) provide object-relational mapping so that you work with objects, but their fields and relations are persisted in an RDBMS.
Using ByteArrayOutputStream should be a simple enough way to convert to a byte[] (call toByteArray after you've flushed). Alternatively there is Blob.setBinaryStream (which actually returns an OutputStream).
You might also want to reconsider using the database as a database...
E.g. create a ByteArrayOutputStream and pass it to the ObjectOutputStream constructor.
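A minimal sketch of that approach follows. The Customer class and the BlobSerializer helper are hypothetical names made up for illustration; the resulting byte[] is what you would bind to a BLOB column.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class BlobSerializer {

    // Hypothetical example class; any Serializable type works the same way.
    public static class Customer implements Serializable {
        private static final long serialVersionUID = 1L; // pinned explicitly
        public final String name;
        public final int age;

        public Customer(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Serializes an object into a byte[] that can be stored as a blob.
    public static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The object stream has been closed (and so flushed) at this point.
        return buffer.toByteArray();
    }

    // Reverses toBytes: rebuilds the object from the stored bytes.
    public static Object fromBytes(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] blob = toBytes(new Customer("Ada", 36));
        Customer copy = (Customer) fromBytes(blob);
        System.out.println(copy.name + " " + copy.age);
    }
}
```

The byte[] returned by toBytes could then be passed to something like PreparedStatement.setBytes when inserting the row.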
One thing to add to this: Java serialization is a good, general-use tool. However, it can be a bit verbose. You might want to try gzipping the serialized data. You can do this by putting a GZIP stream between the object stream and the byte stream. This will use a small amount of extra CPU, but that is often a worthy tradeoff against shipping the extra bytes over the network and shoving them into a DB.
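As a rough sketch, the GZIP stream sits between the ObjectOutputStream and the ByteArrayOutputStream, and the read side mirrors it. GzipSerializer is a hypothetical helper name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipSerializer {

    // object stream -> GZIP stream -> byte stream
    public static byte[] toCompressedBytes(Serializable obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new GZIPOutputStream(buffer))) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the object stream also finishes the GZIP trailer.
        return buffer.toByteArray();
    }

    // byte stream -> GZIP stream -> object stream
    public static Object fromCompressedBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(
                 new GZIPInputStream(new ByteArrayInputStream(data)))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Whether the compression pays off depends on the data; small or already-compact objects may not shrink much.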
I am using Redis as a centralized cache for a distributed system. Currently I use Jedis to connect to a Redis cluster, and I store values as byte[] instead of String. My question is: does storing a plain string versus a byte[] have any impact on retrieving the data? In my application I serialize my Java POJO, convert it to byte[], and store that. Alternatively, I could convert it to JSON and store that, so that when reading from Redis I can readily use the object instead of deserializing it. I have tried both, but the only difference I can see is the extra deserialization step.
In Redis, everything is a byte[]. What Redis calls strings are actually byte[] in programming languages.
When you store JSON, you still need to serialize it to byte[] before saving to redis, and do the reverse when you read back. This is no different from serializing a java object. In other words, you always have to pay the cost of serialization and deserialization.
That said, different libraries have different serialization costs. Java serialization is known to be slow and inefficient. JSON is likely to be better than Java serialization, but it wastes memory in Redis because it is text-based. You can choose a better serialization library.
Kryo is a faster replacement for the java serializer. Message Pack is like JSON but faster. Protocol Buffers / Flat Buffers are even better, but require you to declare a schema upfront. There are other serialization formats as well, each with their tradeoffs.
The general recommendation - try to use the hash datatype. It is efficient, and lets you request specific fields instead of the whole object. Only if hash does not work for you, pick something else based on your needs.
P.S. If you are into benchmarks, this website has several - https://github.com/eishay/jvm-serializers/wiki
In which cases is it good coding practice to use implements Serializable, other than writing and reading objects to/from a file? In a project, I went through the code and found a class that implements Serializable even though nothing in that class or project writes or reads objects to/from a file.
If the object leaves the JVM it was created in, the class should implement Serializable.
Serialization is a method by which an object can be represented as a sequence of bytes that includes the object's data as well as information about the object's type and the types of data stored in the object.
After a serialized object has been written into a file, it can be read from the file and deserialized; that is, the type information and bytes that represent the object and its data can be used to recreate the object in memory.
This is the main purpose of deserialization: to recover the object's data, its type, and the types of its fields from a written (loosely speaking) representation of the object. Serialization is what makes this possible in the first place.
So whenever your object may leave the JVM the program is executing in, you should make its class implement Serializable.
Examples include reading and writing objects to files, or passing an object over the internet or any other type of connection. Whenever an object leaves the JVM it was created in, its class should implement Serializable, so that the object can be serialized and then deserialized when it enters another JVM (or the same one again).
Many good reads at :
1: Why Java needs Serializable interface?
2: What is the purpose of Serialization in Java?
Benefits of serialization:
To persist data for future use.
To send data to a remote computer using client/server Java technologies like RMI, socket programming, etc.
To flatten an object into an array of bytes in memory.
To send objects between the servers in a cluster.
To exchange data between applets and servlets.
To store user session in Web applications
To activate/passivate enterprise java beans.
You can refer to this article for more details.
If you ever expect your objects to be used as data in an RMI setting, they should be serializable, as RMI needs objects either to be Serializable (if they are to be serialized and sent to the remote side) or to be a UnicastRemoteObject if you need a remote reference.
In earlier versions of Java (before Java 5), marker interfaces were a good way to declare metadata, but we now have annotations, which are a more powerful way to declare metadata for classes.
Annotations are flexible, and we can configure an annotation's retention so that its metadata is kept only in the bytecode or is also available at run time.
If you are not going to read and write objects, then the only purpose left for Serializable is to declare metadata for the class. If that is what you want, I personally suggest you don't use Serializable and go for an annotation instead.
An annotation is a better choice than a marker interface, and JUnit is a perfect example of using annotations, e.g. @Test for marking tests. Something similar could also be achieved with a Test marker interface on the test class.
Another example that suggests annotations are the better choice: @ThreadSafe looks a lot better than implementing a ThreadSafe marker interface.
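To make the comparison concrete, here is a small sketch of both styles. ThreadSafeMarker and the ThreadSafe annotation are defined locally purely for illustration and are not taken from any library.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class MarkerVsAnnotation {

    // Marker interface: no methods, it only tags the type.
    public interface ThreadSafeMarker {}

    // Annotation equivalent (hypothetical, defined here for illustration).
    // RUNTIME retention keeps it visible to reflection.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface ThreadSafe {}

    public static class CounterA implements ThreadSafeMarker {}

    @ThreadSafe
    public static class CounterB {}

    // A marker interface is checked with instanceof.
    public static boolean isMarked(Object o) {
        return o instanceof ThreadSafeMarker;
    }

    // An annotation is checked through reflection.
    public static boolean isAnnotated(Object o) {
        return o.getClass().isAnnotationPresent(ThreadSafe.class);
    }
}
```

With RetentionPolicy.CLASS or SOURCE instead of RUNTIME, the reflective check would no longer see the annotation.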
There are other cases in which you want to send an object by value instead of by reference:
Sending objects over the network.
Can't really send objects by reference here.
Multithreading, particularly in Android
Android uses Serializable/Parcelable to send information between Activities. It has something to do with memory mapping and multithreading. I don't really understand this though.
Along with Martin C's answer, I want to add that if you use Serializable you can easily save your object graph and load it back into memory. For example, say you have a Student class that has a Department. If you serialize the Student, the Department is saved as well. Moreover, it also allows you:
1. to rename variables in a serialized class while maintaining backwards-compatibility.
2. to access data from deleted fields in a new version (in other words, change the internal representation of your data while maintaining backwards-compatibility).
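The Student/Department case mentioned above might look like the following sketch. The class layout and the roundTrip helper are assumptions made for illustration; it only shows that the referenced Department travels with the Student.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ObjectGraphDemo {

    public static class Department implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String name;

        public Department(String name) {
            this.name = name;
        }
    }

    public static class Student implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String name;
        public final Department department; // serialized along with the Student

        public Student(String name, Department department) {
            this.name = name;
            this.department = department;
        }
    }

    // Serializes the student to memory and reads an independent copy back.
    public static Student roundTrip(Student student) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(student);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Student) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

If Department did not implement Serializable, writeObject would fail with a NotSerializableException.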
Some frameworks/environments might depend upon data objects being serializable. For example in J2EE, the HttpSession attributes must be serializable in order to benefit from Session Persistence. Also RMI and other dark ages artifacts use serialization.
Therefore, though you might not immediately need your data objects to be serializable, it might make sense to declare Serializable just in case (It is almost free, unless you need to go through the pain of declaring readObject/writeObject methods)
I am aware of what Serialization is however I have not found any real practical example describing the latter one (saving an object in a database taking advantage of the JAVA_OBJECT mapping).
Do I have first to serialize the object and then save it to the database?
In the case of MySQL, you don't have to serialize the object first; the driver will do it for you. Just use the PreparedStatement.setObject method.
For example, first in MySQL create the table:
create table blobs (b blob);
Then in a Java program create a prepared statement, set the parameters, and execute:
PreparedStatement preps;
preps = connection.prepareStatement("insert into blobs (b) values (?)");
preps.setObject(1, new CustomObject());
preps.execute();
Don't forget that the class of the object that you want to store has to implement the Serializable interface.
Serialization is used to save the state of an object, marshal it to a stream, and share it with a remote process. The other process just needs to have the same class version to deserialize the stream back into an object.
The problem with the database approach is that you will need to expose the database even to the remote process. This is generally not done for various reasons, mainly security.
Does that mean the class should extend the ObjectWritable class? Then how can I pass it from the client to the Map and Reduce tasks? Thanks
I assume you mean to pass an object from your client code to your Mappers and Reducers?
You will have to use some form of serialization to do that, since the data is going over the wire. There are a few possibilities depending on your scenario:
Probably the best solution would be to instantiate the object in the Mappers/Reducers. To pass the information required for the constructor call, you can use the Job-Configuration.
conf.setInt("foo", 32);
conf.set("bar", "bazz");
If your object is serializable and quite small you can serialize it and include a base64 encoded version of it in the JobConf.
If the serialized objects are too big, you can use the distributed cache: http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#DistributedCache
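For the Base64 option, a sketch using only the JDK could look like this. Base64Config is a hypothetical helper, and the configuration key in the comment is made up; the encoded string is what you would pass to conf.set.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

public class Base64Config {

    // Serializes the object and returns it as a Base64 string, e.g. for
    // conf.set("my.object", encode(obj)) with a key name of your choosing.
    public static String encode(Serializable obj) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(obj);
            }
            return Base64.getEncoder().encodeToString(buffer.toByteArray());
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Decodes the string read back from the configuration into an object.
    public static Object decode(String text) {
        byte[] data = Base64.getDecoder().decode(text);
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In the Mapper or Reducer you would read the string back from the configuration and call decode on it.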
How can I implement serialization on my own? That is, I don't want my class to implement Serializable; I want to implement serialization myself, so that without implementing Serializable I can transfer objects over a network or write them to a file and later retrieve them in the same state. I want to do this to learn and explore.
Serialization is the process of translating the structure of an object into another format that can be easily transferred across a network or stored in a file. Java serializes objects into a binary format. This is not necessary if bandwidth/disk space is not a problem. You can simply encode your objects as XML:
// Code is for illustration purposes only, I haven't compiled it!!!
public class Person {
    private String name;
    private int age;
    // ...
    public String serializeToXml() {
        StringBuilder xml = new StringBuilder();
        xml.append("<person>");
        xml.append("<attribute name=\"age\" type=\"int\">").append(age);
        xml.append("</attribute>");
        xml.append("<attribute name=\"name\" type=\"string\">").append(name);
        xml.append("</attribute>");
        xml.append("</person>");
        return xml.toString();
    }
}
Now you can get an object's XML representation and "serialize" it to a file or a network connection. A program written in any language that can parse XML can "deserialize" this object into its own data structure.
If you need a more compact representation, you can think of binary encoding:
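// A naive binary serializer (a method of Person; needs
// java.io.ByteArrayOutputStream and java.nio.charset.StandardCharsets).
public byte[] serializeToBytes() {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    // Object name and number of attributes.
    // writeString writes the 4 byte length of the string and the string
    // itself to the ByteArrayOutputStream.
    writeString("Person", bytes);
    bytes.write(2); // number of attributes
    // Serialize age
    writeString("age", bytes);
    bytes.write(1); // type = 1 (i.e., int)
    writeString(Integer.toString(age), bytes);
    // Serialize name
    writeString("name", bytes);
    bytes.write(2); // type = 2 (i.e., string)
    writeString(name, bytes);
    return bytes.toByteArray();
}
private static void writeString(String s, ByteArrayOutputStream bytes) {
    byte[] data = s.getBytes(StandardCharsets.UTF_8);
    // 4-byte big-endian length prefix; write(int) keeps only the low 8 bits.
    bytes.write(data.length >>> 24);
    bytes.write(data.length >>> 16);
    bytes.write(data.length >>> 8);
    bytes.write(data.length);
    // Then the encoded string itself.
    bytes.write(data, 0, data.length);
}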
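For completeness, here is a sketch of a hand-rolled binary format that also reads the data back, using DataOutputStream and DataInputStream rather than raw byte writes. PersonCodec and its nested Person class are hypothetical, and the layout (length-prefixed UTF-8 name followed by a 4-byte age) is a simplification of the scheme above, not the same format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class PersonCodec {

    // Hypothetical value class used only for this sketch.
    public static final class Person {
        public final String name;
        public final int age;

        public Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Layout: 4-byte name length, UTF-8 name bytes, 4-byte age.
    public static byte[] serialize(Person p) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            byte[] nameBytes = p.name.getBytes(StandardCharsets.UTF_8);
            out.writeInt(nameBytes.length);
            out.write(nameBytes);
            out.writeInt(p.age);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads the fields back in exactly the order they were written.
    public static Person deserialize(byte[] data) {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(data))) {
            byte[] nameBytes = new byte[in.readInt()];
            in.readFully(nameBytes);
            int age = in.readInt();
            return new Person(new String(nameBytes, StandardCharsets.UTF_8), age);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A format like this carries no field names or type tags, so reader and writer must agree on the layout in advance.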
To learn about a more compact binary serialization scheme, see the Java implementation of Google Protocol Buffers.
You can use Externalizable and implement your own serialization mechanism. One of the difficult aspects of serialization is versioning so this can be a challenging exercise to implement. You can also look at protobuf and Avro as binary serialization formats.
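A minimal Externalizable sketch, assuming a hypothetical Point class, might look like this. The leading version byte is one simple (assumed) way to leave room for format changes, not something the interface requires.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

public class ExternalizableDemo {

    public static class Point implements Externalizable {
        private int x;
        private int y;

        // Externalizable requires a public no-arg constructor.
        public Point() {}

        public Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        public int getX() { return x; }
        public int getY() { return y; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeByte(1); // hypothetical format version
            out.writeInt(x);
            out.writeInt(y);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            byte version = in.readByte();
            if (version != 1) {
                throw new IOException("Unsupported version: " + version);
            }
            x = in.readInt();
            y = in.readInt();
        }
    }

    // Writes the point with ObjectOutputStream and reads a copy back.
    public static Point roundTrip(Point p) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(p);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Point) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Unlike plain Serializable, the no-arg constructor is invoked on deserialization here, and you control every byte of the object's own payload.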
You start with reflection. Get the object's class and the declared fields of that class and all its superclasses. Then obtain the value of each field and write it to the dump.
When deserializing, just reverse the process: get the class name from your serialized form, instantiate an object, and set its fields according to the dump.
That's the simplistic approach if you just want to learn. There are many issues that can come up if you want to do it "for real":
Versioning. What if one end of the application is running new version, but the other end has an older class definition with some fields missing or renamed?
Overwriting default behavior. What if some object is more complex and cannot be recreated on a simple field-by-field basis?
Recreating dependencies between objects, including cyclic ones.
... and probably many more.
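The simplistic reflection approach described above could be sketched roughly as follows. ReflectionDump, Animal, and Dog are hypothetical names; the "dump" is just an in-memory map, the classes are assumed to have a no-arg constructor, and none of the listed issues (versioning, cycles, nested objects) are handled.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

public class ReflectionDump {

    public static class Animal {
        private String name;
        public Animal() {}
        public Animal(String name) { this.name = name; }
        public String getName() { return name; }
    }

    public static class Dog extends Animal {
        private int age;
        public Dog() {}
        public Dog(String name, int age) { super(name); this.age = age; }
        public int getAge() { return age; }
    }

    // Records the class name and every instance field, superclasses included.
    public static Map<String, Object> dump(Object obj) {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("@class", obj.getClass().getName());
        try {
            for (Class<?> c = obj.getClass(); c != Object.class; c = c.getSuperclass()) {
                for (Field f : c.getDeclaredFields()) {
                    if (Modifier.isStatic(f.getModifiers())) continue;
                    f.setAccessible(true);
                    out.put(c.getName() + "." + f.getName(), f.get(obj));
                }
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return out;
    }

    // Instantiates the class via its no-arg constructor and sets the fields.
    public static Object restore(Map<String, Object> dump) {
        try {
            Class<?> type = Class.forName((String) dump.get("@class"));
            Constructor<?> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            Object obj = ctor.newInstance();
            for (Class<?> c = type; c != Object.class; c = c.getSuperclass()) {
                for (Field f : c.getDeclaredFields()) {
                    if (Modifier.isStatic(f.getModifiers())) continue;
                    f.setAccessible(true);
                    f.set(obj, dump.get(c.getName() + "." + f.getName()));
                }
            }
            return obj;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Writing the map out as bytes or text is a separate step; the point here is only the field-by-field walk in both directions.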
Get the Java source code and understand how serialization is implemented. I did this some months ago, and now have a serialization that uses only 16% of the space and 20% of the time of "normal" serialization, at the cost of assuming that the classes that wrote the serialized data have not changed. I use this for client-server serialization where I can use this assumption.
As a supplement to @Konrad Garus' answer: there is one issue that is a show-stopper for a full reimplementation of Java serialization.
When you deserialize an object, you need to use one of the object's class's constructors to recreate an instance. But which constructor should you use? If there is a no-args constructor, you could conceivably use that. However, the no-args constructor (or indeed any constructor) might do something with the object in addition to creating it. For example, it might send a notification to something else that a new instance has been created ... passing the instance that isn't yet completely deserialized.
In fact, it is really difficult to replicate what the standard Java deserialization code does. What it does is this:
1. It determines the class to be created.
2. It creates an instance of the class without calling any of its constructors.
3. It uses reflection to fill in the instance's fields, including private fields, with objects and values reconstructed from the serialization.
The problem is that step 2 involves some "black magic" that a normal Java class is not permitted to do.
(If you want to understand the gory details, read the serialization spec and take a look at the implementation in the OpenJDK codebase.)
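One way to observe the "no constructor call" behaviour is a small experiment like the sketch below. NoConstructorDemo and Tracked are hypothetical names; the static counter only exists to show that the class's constructor does not run again during standard deserialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class NoConstructorDemo {

    public static class Tracked implements Serializable {
        private static final long serialVersionUID = 1L;

        // Counts how often the constructor runs (static, so not serialized).
        public static int constructorCalls = 0;

        public final String label;

        public Tracked(String label) {
            constructorCalls++;
            this.label = label;
        }
    }

    // Standard serialization round trip through an in-memory buffer.
    public static Tracked roundTrip(Tracked original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Tracked) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Tracked.constructorCalls = 0;
        Tracked copy = roundTrip(new Tracked("a"));
        // The copy exists and has its field set, yet the counter shows
        // that only the explicit "new Tracked" ran the constructor.
        System.out.println(copy.label + " " + Tracked.constructorCalls);
    }
}
```

Even the final field is populated on the copy, which a hand-written deserializer restricted to ordinary constructors could not do in the same way.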