Requirement: I need to dump the object graph of a Java application at various points of time during the application's execution. But most importantly, the dump needs to contain metadata for all objects in the graph e.g. the reference ids as obtained from System.identityHashCode(object).
What I tried: I found out serialization is a goto solution for obtaining the object graph.
I tried a couple of different solutions, including the built in Serialization facility in Java, JAXB and XStream. But I have no clue on how to get any of these libraries to spit out the objects' metadata such as the original reference IDs for the objects getting serialized.
Question: I was wondering if there was an easy way to include additional information/metadata about each object in an object graph that gets serialized?
While i am interested in specifically storing the objects' reference id, as obtained by System.identityHashCode(), any pointers on storing any data in general will be acceptable.
In case adding such metadata is not possible, that information will be useful as well.
NOTE: I ask this question despite the knowledge that the reference ids obtained from the System.identityHashCode() method are not required to be unique across different objects.
P.S. I am a newbie with Java serialization.
Related
I'm having a problem when serializing and deserializing my objects in my project. I'm writing the object to a name.dat file.
However whenever i make a change in the Name class i can nolonger deserialize it, since it's two different objects.
Is there any way around this?
Your best options are:
Don't change your classes :-)
Throw away any serialized objects each time you change your classes.
Don't use Java object serialization.
Given that 1) and 2) are probably out of the question, option 3) should be given serious consideration. There a variety of alternatives to Java serialization, depending on the nature of the data you are persisting. These include:
Using Java properties files
Storing the data in a classical database (using SQL and the JDBC API)
Using an object-relational database mapping such as Hibernate
Using XML or JSON and a "binding" technology so that you can serialize / deserialize POJOs.
Finally, it is possible to implement class versioning using Java object serialization. However, it is tricky. And if you are continually changing the classes, then it is not going to be pleasant. Start by reading Versioning of Serializable Objects.
In Which Cases it is a good coding practice to use implements serializable other than Writing & Reading object to/from file.In a project i went through code. A class using implements serializable even if in that class/project no any Writing/Reading objects to/from file?
If the object leaves the JVM it was created in, the class should implement Serializable.
Serialization is a method by which an object can be represented as a sequence of bytes that includes the object's data as well as information about the object's type and the types of data stored in the object.
After a serialized object has been written into a file, it can be read from the file and deserialized that is, the type information and bytes that represent the object and its data can be used to recreate the object in memory.
This is the main purpose of de-serialization. To get the object information, object type, variable type information from a written(loosely speaking) representation of an object. And hence serialization is required in the first place, to make this possible.
So, whenever, your object has a possibility of leaving the JVM, the program is being executed in, you should make the class, implement Serializable.
Reading/Writing objects into files (Memory), or passing an object over internet or any other type of connection. Whenever the object, leaves the JVM it was created in, it should implement Serializable, so that it can be serialized and deserialized for recognition once it enters back into another/same JVM.
Many good reads at :
1: Why Java needs Serializable interface?
2: What is the purpose of Serialization in Java?
Benefits of serialization:
To persist data for future use.
To send data to a remote computer using client/server Java technologies like RMI , socket programming etc.
To flatten an object into array of bytes in memory.
To send objects between the servers in a cluster.
To exchange data between applets and servlets.
To store user session in Web applications
To activate/passivate enterprise java beans.
You can refer to this article for more details.
If you ever expect your object to be used as data in a RMI setting, they should be serializable, as RMI either needs objects Serializable (if they are to be serialized and sent to the remote side) or to be a UnicastRemoteObject if you need a remote reference.
In earlier versions of java (before java 5) marker interfaces were good way to declare meta data but currently we having annotation which are more powerful to declare meta data for classes.
Annotation provides the very flexible and dynamic capability and we can provide the configuration for annotation meta deta that either we want to send that information in byte code or at run time.
Here If you are not willing to read & write object then there is one purpose left of serialization is, declare metadata for class and if you are goint to declare meta data for class then personally I suggest you don't use serialization just go for annotation.
Annotation is better choice than marker interface and JUnit is a perfect example of using Annotation e.g. #Test for specifying a Test Class. Same can also be achieved by using Test marker interface.
There is one more example which indicate that Annotations are better choice #ThreadSafe looks lot better than implementing ThraedSafe marker interface.
There are other cases in which you want to send an object by value instead of by reference:
Sending objects over the network.
Can't really send objects by reference here.
Multithreading, particularly in Android
Android uses Serializable/Parcelable to send information between Activities. It has something to do with memory mapping and multithreading. I don't really understand this though.
Along with Martin C's answer I want to add that - if you use Serializable then you can easily load your Object graph to memory. For example you have a Student class which have a Deportment. So if you serialize your Student then the Department also be saved. Moreover it also allow you -
1. to rename variables in a serialized class while maintaining backwards-compatibility.
2. to access data from deleted fields in a new version (in other words, change the internal representation of your data while maintaining backwards-compatibility).
Some frameworks/environments might depend upon data objects being serializable. For example in J2EE, the HttpSession attributes must be serializable in order to benefit from Session Persistence. Also RMI and other dark ages artifacts use serialization.
Therefore, though you might not immediately need your data objects to be serializable, it might make sense to declare Serializable just in case (It is almost free, unless you need to go through the pain of declaring readObject/writeObject methods)
Note: Due to the lack of questions like this on SO, I've decided to put one up myself as a Q&A
Serializing objects (using an ObjectOutputStream and an ObjectInputStream) is a method for storing an instance of a Java Object as data that can be later deserialized for use. This can cause problems and frustration when the Class used to deserialize the data does not remain the same (source-code changes; program updates).
So how can an Object be serialized and deserialized with an updated / downgraded version of a Class?
Here are a few common ways of serializing an object that can be deserialized in a backwards-compatible way.
1. Store the data in the JSON format using import and export methods designed to save all fields needed to recreate the instance. This can be made backwards-compatible by including a version key that allows for an update algorithm to be called if the version is too low. A common library for this is the Google Gson library which can represent Java objects in JSON as well as normally editing a JSON file.
2. Use the built-in java Properties class in a way similar to the method described above. Properties objects can be later stored using a stream (store()) written as a regular Java Properties file, or saved in XML (storeToXML()).
3. Sometimes simple objects can be easily represented with key-value pairs in a place where storing them in a JSON, XML, or Properties file is either too complicated or not neccessary (overkill one could say). In this case, an effective way of serializing the object could be using the ObjectOutputStream class to serialize a HashMap object containing key-value pairs where the key could be a String and the value could be an Object (HashMap<String,Object>). This allows for all of the object's fields to be stored as well as including a version key while providing much versatility.
Note: Although serializing an object using the ObjectOutputStream for persistence storage is normally considered bad convention, it can be used either way as long as the class' source code remains the same.
Also Note about versioning: Changes to a class can be safely made without disrupting deserialization using an ObjectOutputStream as long as they are a compatible change. As mentioned in the Versioning of Serializable Objects chapter of the Object Serialization Specification:
A compatible change is a change that does not affect the contract
between the class and its callers.
Is a Data Transfer Object the same as a Value Object or are they different? If they are different then where should we use a DTO and where should we use a VO?
The programming language we are talking about is Java and the context is - there is a web application, which fetches data from a database and then processes it and ultimately the processed information is displayed on the front-end.
A value object is a simple object whose equality isn't based on identity.
A data transfer object is an object used to transfer data between software application subsystems, usually between business layers and UI. It is focused just on plain data, so it doesn't have any behaviour.
A Data Transfer Object is a kludge for moving a bunch of data from one layer or tier to another, the goal is to minimize the number of calls back and forth by packing a bunch of stuff into the same data structure and sending it together. Some people also use it, like Michael points out in his post here, so that the classes used by one layer are not exposed to the layer calling it. When I refer to DTO as a kludge, I mean there's not a precise abstract concept getting implemented, it's a practical workaround for helping with communication between application layers.
A Value Object is something where we're only interested in its value, like a monetary amount, a date range, or a code from a lookup table. It does not have an identity, meaning you would not be concerned, if you had several of them, of keeping track of which is which, because they are not things in themselves.
Contrast Value Objects to things that do have a unique identity in your system, which are called Entities. If you have a system where it tracks a customer making a payment, the customer and the payment are entities, because they represent specific things, but the monetary amount on the payment is just a value, it doesn't have an existence by itself, as far as your system is concerned. How something relates to your system determines if it is a Value Object or an Entity.
use a DTO at the boundary of your services if you don't want to send the actual domain object to the service's clients - this helps reduce dependencies between the client and service.
values objects are simply objects whose equality isn't based on identity e.g. java.lang.Integer
DTOs and value objects aren't really alternatives to each other.
They are different, but I've even used the two interchangeably in the past, which is wrong. I read that DTO (Data Transfer Object) was called a VO ( Value Object) in the first edition of the Core J2EE Patterns book, but wasn't able to find that reference.
A DTO, which I've sometimes called a Dumb Transfer Object to help me remember it's a container and shouldn't have any business logic is used to transport data between layers and tiers. It just should be an object with attributes that has getters/setters.
A VO however is similar to a JAVA Enum and represents a fixed set of data. A VO doesn't have object identity (the address of the object instance in memory), it is identified by its value and is immutable.
Martin Fowler, talking about Data Transfer Objects (DTOs):
Many people in the Sun community use the term "Value Object" for this pattern. I use it to mean something else.
So the term "Value Object" has been used to mean DTO, but as of him (and the other posters), its use as a DTO seems discouraged.
Good detailed answer in Matthias Noback article Is it a DTO or a Value Object?
In short a DTO:
Declares and enforces a schema for data: names and types
Offers no guarantees about correctness of values
A value object:
Wraps one or more values or value objects
Provides evidence of the correctness of these values
Maybe because of lack of experience, but I would put it this way: It's the matter of scope.
DTO has word transfer in it so it means some parts of the system will communicate using it.
Value object has smaller scope, you will pass set of data in value object instead in array from one service to the other.
As much as I understood niether of them is "object whose equality isn't based on identity".
When we are deserializing an object, its very difficult to understand that, how it is retriving the object in some certain state? Does it contain any Meta data of the object?
When an object is serialized, the object's class is written to the stream along with the contents of the object's non-transient fields. The deserializer will attempt to load that class (and there are several mechanisms for it to do that), then populate the non-transient fields.
The protocol spec is here: http://java.sun.com/javase/6/docs/platform/serialization/spec/protocol.html
If by "metadata" you're referring to annotations on the class, then no, they are not serialized with the object itself, but are available on the class. If you mean something else, please describe what you mean.
At a high level, the serialization stream contains the data inside the object and the name of the classes involved, as well as a version number to ensure the class didn't change. It uses that information to make a new instance of an object and fills it with the same data as the old instance. It does this avoiding all of the usual constraints on object creation (the need to call constructors, for example).
One confusing point people have is that they can think the class definition itself is serialized. It is not, just the data it contains with enough information to know which objects to recreate when deserilalized. When the object is deserialized, it has to match the existing class on the class path, the serialization binary data does not contain the class.