How to pass objects from Client to Map and Reduce? - java

Should the class extend the ObjectWritable class? And how can I then pass it from the client to the Map and Reduce tasks? Thanks.

I assume you mean to pass an object from your client code to your Mappers and Reducers?
You will have to use some form of serialization to do that, since the data is going over the wire. There are a few possibilities depending on your scenario:
Probably the best solution is to instantiate the object in the Mappers/Reducers. To pass the information required for the constructor call, you can use the job configuration:
conf.setInt("foo", 32);
conf.set("bar", "bazz");
If your object is serializable and quite small, you can serialize it and include a Base64-encoded version of it in the JobConf.
If the serialized objects are too big, you can use the distributed cache: http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#DistributedCache
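A minimal sketch of the Base64 option, using only the JDK. The class name ConfigObjectCodec, the Settings type, and the key "my.settings" are made up for illustration; the Hadoop calls appear only in comments and assume a configuration object named conf like the one in the snippet above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Base64;

final class ConfigObjectCodec {

    // Hypothetical settings object; any small Serializable class would do.
    public static class Settings implements Serializable {
        private static final long serialVersionUID = 1L;
        public final int foo;
        public final String bar;

        public Settings(int foo, String bar) {
            this.foo = foo;
            this.bar = bar;
        }
    }

    // Client side: serialize the object and Base64-encode the bytes.
    public static String encode(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Mapper/Reducer side: decode the string and deserialize the object.
    public static Object decode(String encoded) {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("class not on the classpath", e);
        }
    }

    public static void main(String[] args) {
        String encoded = encode(new Settings(32, "bazz"));
        // Assumed usage with Hadoop on the classpath:
        //   client:  conf.set("my.settings", encoded);
        //   mapper:  Settings s = (Settings) decode(conf.get("my.settings"));
        Settings copy = (Settings) decode(encoded);
        System.out.println(copy.foo + " " + copy.bar);
    }
}
```

The class of the serialized object has to be on the classpath of the tasks as well, otherwise decoding fails.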

Related

Serializable class

What is the point of a serializable class? To my understanding, it's so that you can send objects across a network and know that the object is verified as the correct object on both ends. For example, if I have a server with a serializable class and want to send data to an app via an ObjectOutputStream, can I use the serializable class with the same UID on both ends to verify that the object is legitimate and not hacked? Please correct me if I'm wrong, but that's how I understand the documentation of the Serializable interface.
Security and serialization are two different things.
Java serialization is to convert the objects to bytes. Period.
The optional UID field is to assure the serialized and deserialized object (structure) versions match.
Serialization is useful for writing an object to a file and loading it back into an object later, and of course you can also send that byte stream over the network.
You're correct, but you can think of it more broadly.
You can convert a serializable class to bytes
You can add an object of this type to a serializable collection and it will be properly serialized (e.g. you can make a list of them and serialize the list if the list is serializable)
By the way, the serialVersionUID is optional. If you don't declare one, the runtime generates one on its own, though that is more fragile: if you change, for example, a (non-private) method signature, the JVM computes a different UID and treats the class as incompatible with previously serialized versions, even if you haven't changed any data fields. If you declare your own, you're essentially overriding this mechanism.
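To see which UID the runtime would use for a class, you can ask ObjectStreamClass. A small sketch; the class names SerialVersionDemo, Explicit and Implicit are invented for the example:

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

final class SerialVersionDemo {

    // Declares its version explicitly, so the value stays fixed when methods change.
    public static class Explicit implements Serializable {
        private static final long serialVersionUID = 42L;
        int value;
    }

    // No declaration: the runtime derives a value from the structure of the class.
    public static class Implicit implements Serializable {
        int value;
    }

    // Returns the serialVersionUID that serialization uses for the given Serializable class.
    public static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("Explicit: " + uidOf(Explicit.class)); // the declared 42
        System.out.println("Implicit: " + uidOf(Implicit.class)); // a computed hash
    }
}
```

The serialver tool that ships with the JDK reports the same value from the command line.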

Use of Serializable other than Writing & Reading object to/from File

In which cases is it good coding practice to implement Serializable, other than for writing and reading objects to/from a file? In a project I went through, a class implements Serializable even though nothing in that class or project writes or reads objects to/from a file.
If the object leaves the JVM it was created in, the class should implement Serializable.
Serialization is a method by which an object can be represented as a sequence of bytes that includes the object's data as well as information about the object's type and the types of data stored in the object.
After a serialized object has been written to a file, it can be read from the file and deserialized; that is, the type information and the bytes that represent the object and its data can be used to recreate the object in memory.
This is the main purpose of deserialization: to recover the object's data, its type, and the types of its fields from a written (loosely speaking) representation of the object. Serialization is what makes this possible in the first place.
So whenever your object might leave the JVM the program is executing in, you should make the class implement Serializable.
That covers reading/writing objects to files, or passing an object over the internet or any other type of connection. Whenever the object leaves the JVM it was created in, it should implement Serializable so that it can be serialized and later deserialized when it enters another (or the same) JVM.
Many good reads at :
1: Why Java needs Serializable interface?
2: What is the purpose of Serialization in Java?
Benefits of serialization:
To persist data for future use.
To send data to a remote computer using client/server Java technologies like RMI, socket programming etc.
To flatten an object into array of bytes in memory.
To send objects between the servers in a cluster.
To exchange data between applets and servlets.
To store user session in Web applications
To activate/passivate enterprise java beans.
You can refer to this article for more details.
If you ever expect your objects to be used as data in an RMI setting, they should be serializable: RMI needs objects either to be Serializable (if they are to be serialized and sent to the remote side) or to be a UnicastRemoteObject if you need a remote reference.
In earlier versions of Java (before Java 5), marker interfaces were a good way to declare metadata, but we now have annotations, which are a more powerful way to declare metadata for classes.
Annotations are flexible, and we can configure whether an annotation's metadata is kept only in the bytecode or is also available at run time.
If you are not going to read and write objects, the only purpose left for Serializable is to declare metadata for the class, and if that is all you want, I personally suggest you use an annotation instead.
Annotations are a better choice than marker interfaces, and JUnit is a perfect example: @Test for marking tests. The same could also be achieved with a Test marker interface.
Another example that shows annotations are the better choice: @ThreadSafe looks a lot better than implementing a ThreadSafe marker interface.
There are other cases in which you want to send an object by value instead of by reference:
Sending objects over the network.
Can't really send objects by reference here.
Multithreading, particularly in Android
Android uses Serializable/Parcelable to send information between Activities. It has something to do with memory mapping and multithreading. I don't really understand this though.
Along with Martin C's answer, I want to add that if you use Serializable you can easily load your object graph into memory. For example, say you have a Student class which has a Department. If you serialize your Student, then the Department is saved as well. Moreover, it also allows you:
1. to rename variables in a serialized class while maintaining backwards-compatibility.
2. to access data from deleted fields in a new version (in other words, change the internal representation of your data while maintaining backwards-compatibility).
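The object-graph point (a Student that holds a Department) can be sketched roughly like this. The classes and the copyViaSerialization helper are invented for the example; only the java.io calls are real API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class ObjectGraphDemo {

    public static class Department implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String name;

        public Department(String name) {
            this.name = name;
        }
    }

    public static class Student implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String name;
        public final Department department; // referenced object, serialized along with the Student

        public Student(String name, Department department) {
            this.name = name;
            this.department = department;
        }
    }

    // Writes the Student to a byte array and reads it back as a new object graph.
    public static Student copyViaSerialization(Student student) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(student);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Student) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Student original = new Student("Ann", new Department("Physics"));
        Student copy = copyViaSerialization(original);
        System.out.println(copy.name + " / " + copy.department.name);
    }
}
```

Note that Department has to be Serializable too; otherwise writing the Student fails.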
Some frameworks/environments might depend upon data objects being serializable. For example in J2EE, the HttpSession attributes must be serializable in order to benefit from Session Persistence. Also RMI and other dark ages artifacts use serialization.
Therefore, though you might not immediately need your data objects to be serializable, it might make sense to declare Serializable just in case. (It is almost free, unless you need to go through the pain of declaring readObject/writeObject methods.)

Storing Serializable Objects in the Database

I'm writing an application which needs to write an object into database.
For simplicity, I want to serialize the object.
But ObjectOutputStream, which is needed for this, has only one constructor, which takes any subclass of OutputStream as a parameter.
What parameter should be passed to it?
You can pass a ByteArrayOutputStream and then store the resulting stream.toByteArray() in the database as blob.
Make sure you specify a serialVersionUID for the class, because otherwise you'll have a hard time when you add or remove a field.
Also consider the xml version for object serialization - XMLEncoder, if you need a bit more human-readable data.
And ultimately, you may want to translate your object model to the relational model via an ORM framework. JPA (Hibernate/EclipseLink/OpenJPA) provide object-relational mapping so that you work with objects, but their fields and relations are persisted in a RDBMS.
Using ByteArrayOutputStream should be a simple enough way to convert to a byte[] (call toByteArray after you've flushed). Alternatively there is Blob.setBinaryStream (which actually returns an OutputStream).
You might also want to reconsider using the database as a database...
e.g. create a ByteArrayOutputStream and pass it to the ObjectOutputStream constructor
One thing to add to this: Java serialization is a good, general-purpose tool. However, it can be a bit verbose. You might want to try gzipping the serialized data. You can do this by putting a GZIP stream between the object stream and the byte stream. This will use a small amount of extra CPU, but that is often a worthwhile tradeoff against shipping the extra bytes over the network and shoving them into a DB.
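A sketch of that stream chain (object stream, then GZIP stream, then byte stream). The class and method names are invented for the example, and the JDBC call in the comment assumes a PreparedStatement named stmt with a blob column as its first parameter.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipSerialization {

    // Serializes the object through a GZIP stream into a byte array.
    public static byte[] toCompressedBytes(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the object stream also finishes the GZIP stream, so the trailer gets written.
        try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reverses toCompressedBytes: decompresses and deserializes.
    public static Object fromCompressedBytes(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new GZIPInputStream(new ByteArrayInputStream(data)))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("class not on the classpath", e);
        }
    }

    public static void main(String[] args) {
        ArrayList<String> words = new ArrayList<>(Collections.nCopies(1000, "hadoop"));
        byte[] packed = toCompressedBytes(words);
        // Assumed usage with JDBC: stmt.setBytes(1, packed);
        System.out.println(words.equals(fromCompressedBytes(packed)));
    }
}
```

Whether the compression pays off depends on the data, so it is worth measuring the sizes for your own objects.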

need of Serializable interface in java? As there are no methods in the interface. and how does it maintain state of an object?

What exactly do classes that implement the Serializable interface implement, given that there are no methods in the interface? And how does it help in maintaining the state of an object across a network?
It is a marker interface. Additional discussion can be found here: What is the use of marker interfaces in Java?
In short, the interface is used by reflection-based code that inspects the type information at run time, and if the object in question implements that interface then certain actions are taken (in the case of Serializable: the object is saved/loaded to/from a stream).
Serializable is a marker interface. It tells Java that objects of the implementing class can be serialized (that is, their byte representation can be written to files or any other channel). So if you want objects of a class to be serializable, you have to flag that class with the Serializable interface. Otherwise, serialization fails with a NotSerializableException (a subclass of IOException).
Why is this exception thrown? Because the developer should decide whether serializing an object and deserializing it later makes any sense. When it does not, the developer won't want the object to be serialized, either by themselves or by any other developer using the class. Take the Socket class, for example: it doesn't implement Serializable. Suppose you could serialize a socket, close the application, launch it again and deserialize the same socket object, and in the meantime the server connected through the socket went down. Would serializing the socket object be of any use?
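The exception in question is java.io.NotSerializableException, a subclass of IOException. A small sketch of the marker check in action; the holder classes and the canSerialize helper are made up for the example:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class MarkerCheckDemo {

    // Deliberately does NOT implement Serializable.
    public static class PlainHolder {
        int value = 1;
    }

    // Same data, but flagged with the marker interface.
    public static class MarkedHolder implements Serializable {
        private static final long serialVersionUID = 1L;
        int value = 1;
    }

    // Returns true if the object can be written, false if the runtime rejects it.
    public static boolean canSerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false; // the class (or something it references) lacks the marker
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new PlainHolder()));  // false
        System.out.println(canSerialize(new MarkedHolder())); // true
    }
}
```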
It doesn't maintain state itself. But it marks the class as requiring serialisation, and the runtime then knows to serialise that class, and its components (excluding fields marked as transient).
It's useful to explicitly mark classes as being serialisable and fields as transient (i.e. not to be serialised). Otherwise you could inadvertently serialise everything in your program for transmission over the network. That likely is not what you want. You wouldn't want to serialise entities like factories. Nor credentials like passwords. Not to mention the payload size :-)
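A short sketch of what transient does to a credential field. The Login class and the roundTrip helper are invented for the example:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class TransientDemo {

    public static class Login implements Serializable {
        private static final long serialVersionUID = 1L;
        public String user;
        public transient String password; // skipped by serialization

        public Login(String user, String password) {
            this.user = user;
            this.password = password;
        }
    }

    // Serializes the object to a byte array and reads it back.
    public static Login roundTrip(Login login) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(login);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Login) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Login copy = roundTrip(new Login("alice", "secret"));
        System.out.println(copy.user);     // alice
        System.out.println(copy.password); // null, the transient field was not written
    }
}
```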
No interface maintains state, of course. It's a marker interface, like Remote.
From Wikipedia:
.... serialization is the process of converting a data structure or object
into a sequence of bits so that it can be stored in a file, a memory buffer,
or transmitted across a network connection link .....
You can do things with serializable objects that you cannot do with non-serializable ones. Instead of using a web service to pass data from client to server, you can put all of the info into a serializable bean and avoid any XML parsing and binding.
You can take a serialized bean and write it out to a file, or save it in a database as a blob.
The Serializable interface gives you the ability to implement a level of persistence and durability.

Will a serialized object contain metadata?

When we deserialize an object, it is hard to understand how the object is restored to a particular state. Does the serialized form contain any metadata about the object?
When an object is serialized, the object's class is written to the stream along with the contents of the object's non-transient fields. The deserializer will attempt to load that class (and there are several mechanisms for it to do that), then populate the non-transient fields.
The protocol spec is here: http://java.sun.com/javase/6/docs/platform/serialization/spec/protocol.html
If by "metadata" you're referring to annotations on the class, then no, they are not serialized with the object itself, but are available on the class. If you mean something else, please describe what you mean.
At a high level, the serialization stream contains the data inside the object and the name of the classes involved, as well as a version number to ensure the class didn't change. It uses that information to make a new instance of an object and fills it with the same data as the old instance. It does this avoiding all of the usual constraints on object creation (the need to call constructors, for example).
One point that confuses people is the idea that the class definition itself is serialized. It is not: only the data it contains is serialized, along with enough information to know which objects to recreate when it is deserialized. When the object is deserialized, it has to match an existing class on the classpath; the serialized binary data does not contain the class.
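The remark about bypassing constructors can be checked with a small experiment. This is only a sketch; the Counter class and its static call counter are invented for the example:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class ConstructorBypassDemo {

    public static class Counter implements Serializable {
        private static final long serialVersionUID = 1L;
        public static int constructorCalls = 0; // static, so not part of the serialized data
        public final int value;

        public Counter(int value) {
            constructorCalls++;
            this.value = value;
        }
    }

    // Serializes the object to a byte array and reads it back.
    public static Counter roundTrip(Counter counter) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(counter);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Counter) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Counter copy = roundTrip(new Counter(7));
        System.out.println("value: " + copy.value);
        // Only the explicit "new Counter(7)" ran the constructor.
        System.out.println("constructor calls: " + Counter.constructorCalls);
    }
}
```

The field values come straight from the stream, which is also why validation done in a constructor does not run on deserialized objects.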
