How to reconstruct an object from byte[] not created from ObjectOutputStream?

I need to reconstruct an object on the client side from a byte[] which stores bytes coming from an InputStream (TCP/IP). The server is written in C, and structures are sent across as bytes. It is from this series of bytes that I have to reconstruct the object.
I can do this by reading chunks of bytes and converting them to variables of the object I want to reconstruct, but this method is tedious and I was wondering if there is an easy way out?

But this method is tedious and I was wondering if there is an easy way out?
Not that I'm aware of. But if you find yourself writing the same code multiple times, you may well find that if you extract some helper methods it actually becomes pretty simple. Yes, you'll need to call a method to read each field value... but the code should end up being easy to read and understand, and not rely on anything magical.
You could do all of this with reflection, possibly using annotations to specify the order in which fields have been serialized etc. But that's likely to be a lot of code to write - unless you've got a lot of different types to deserialize, it will probably be more code - and more complicated code - than the "dumb but straightforward" approach.
I hope the format of the bytes from the C side of things is well-specified though: if it's basically just dumping the in-memory representation, that can end up being pretty fragile in the face of change.
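
As a rough sketch of the "helper methods" approach described above, the bytes can be wrapped in a java.nio.ByteBuffer and read field by field. The struct layout, the little-endian byte order, and the names SensorReading and readFixedString are assumptions made up for this example; the real layout has to come from the C side's specification.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical wire layout, assumed for illustration (packed, little-endian):
//   int32_t id; float temperature; char name[8];  (name is NUL-padded)
final class SensorReading {
    static final int WIRE_SIZE = 4 + 4 + 8;

    final int id;
    final float temperature;
    final String name;

    SensorReading(int id, float temperature, String name) {
        this.id = id;
        this.temperature = temperature;
        this.name = name;
    }

    // Decodes one record from the start of the given bytes.
    static SensorReading fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        int id = buf.getInt();
        float temperature = buf.getFloat();
        String name = readFixedString(buf, 8);
        return new SensorReading(id, temperature, name);
    }

    // Helper: reads a fixed-width, NUL-padded ASCII field.
    static String readFixedString(ByteBuffer buf, int width) {
        byte[] raw = new byte[width];
        buf.get(raw);
        int len = 0;
        while (len < width && raw[len] != 0) {
            len++;
        }
        return new String(raw, 0, len, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Build sample bytes the way the assumed C sender would lay them out.
        ByteBuffer out = ByteBuffer.allocate(WIRE_SIZE).order(ByteOrder.LITTLE_ENDIAN);
        out.putInt(42).putFloat(21.5f).put("probe".getBytes(StandardCharsets.US_ASCII));
        SensorReading r = fromBytes(out.array());
        System.out.println(r.id + " " + r.temperature + " " + r.name);
    }
}
```

Each additional field type needs only one more small helper, so the per-class code stays a short list of reads in wire order. Struct padding and alignment on the C side are not handled here and would need explicit skips.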

Take a look at JNA. You'll have to dig around a bit. JNA is designed to map C shared libraries (.DLL, .so, etc.) into Java. But it has various helper classes and methods that can be used to map a C structure in memory to a Java object of similar structure. I am almost 100% certain you could read these structures off the wire, write the bytes into a ByteBuffer (direct or otherwise), and then map a Java object over them.

Related

Serialization vs toString()

Since I'm writing to and reading from files, I was wondering whether there is any difference, or any best practice, between writing objects directly and writing their string representations to files, which in my case I personally find easier to handle.
So when should I serialize instead of writing/reading objects as String?

There's typically not enough information in the string representation of an object to be used to recreate it.
Java serialization "just works", but does not give you a human-readable representation, if that's what you are looking for.
Another alternative is to read and write JSON representations of your objects. There are several popular JSON serialization / deserialization libraries for Java, including Gson and Jackson.

The answer is in the javadoc for Object.toString().
Returns a string representation of the object. In general, the toString method returns a string that "textually represents" this object. The result should be a concise but informative representation that is easy for a person to read.
Note that it says:
concise,
informative, and
easy for a person to read.
But it does NOT say:
complete,
unambiguous, or
easy for a computer to read.
Serialization is about producing a linear (not necessarily textual) form that can be read by a computer and used to reconstruct the state of the original object.
So a typical serialization is either not particularly human readable (e.g. JSON, XML, YAML) or completely unreadable (e.g. Java Object Serialization, ASN.1). But the flip-side is that the information needed to reconstruct an object should all be present, in an unambiguous form.
(There is a lot more that could be said about the various kinds of serialization, their properties and their utility. However, it is beyond the scope of your question.)
Does this preclude toString() from being used for serializing data?
No, it doesn't.
But if you take that approach, you need to code your toString() methods carefully to make sure that what they produce is complete and unambiguous. Then you need to write a corresponding method to parse the toString() output and create a new object from it.
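
To make that concrete, here is a minimal sketch of a toString() that is complete and unambiguous, paired with a parse method that reverses it. The Point class and its text format are invented for this example.

```java
// Hypothetical value class whose toString() output is complete and
// unambiguous, paired with a parse method that reverses it.
final class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public String toString() {
        return "Point(" + x + "," + y + ")";
    }

    // Rebuilds a Point from the exact format produced by toString().
    static Point parse(String text) {
        if (!text.startsWith("Point(") || !text.endsWith(")")) {
            throw new IllegalArgumentException("Not a Point: " + text);
        }
        String[] parts = text.substring("Point(".length(), text.length() - 1).split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected two coordinates: " + text);
        }
        return new Point(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
    }

    public static void main(String[] args) {
        Point copy = Point.parse(new Point(3, -4).toString());
        System.out.println(copy);
    }
}
```

Even for two int fields the parser is longer than the formatter, and a String field would additionally need escaping of the separator characters.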
... or using their representation as strings on files which in my case I personally find it easier to handle.
I think that as you write larger and more complicated programs, you will get to the stage where that kind of code is tedious and time consuming to write, test and maintain.

Serialization allows you to convert the state of an object into a stream of bytes, which can then be saved to a file on the local disk, sent over the network to any other machine, or saved to a database. Deserialization reverses the process: it converts the serialized byte stream back into an object. Keep in mind that numbers and other non-String types are not as easy to write to files, or to treat, as Strings, and their original state is not guaranteed to be preserved unless they are serialized.
Thus, Strings are convenient for simpler situations where serialization is not really needed, such as a college project. In general, however, this approach is not recommended, as there are better solutions.

Why would I serialize an object instead of doing a file output?

I don't currently understand why I would choose to serialize an object instead of just doing a file output and then having a function read that file. What do I gain from serializing an object?

You gain an industry-standard way of reading and writing an object's data, using a W3C approved data exchange format that has almost universal support for readers and writers in almost every programming language.

Serialization makes it easy to store the state of objects,
and of the objects inside them (if they are Serializable and not marked as transient).
The benefits in your case:
Imagine you have a lot of different classes. Coding a custom file-to-class parser for each of them may well be harder than calling readObject().

When you serialize an object, you are copying the actual byte data in memory into a stream. When you de-serialize that stream back into an object you get the identical object back including its internal object ID, which you would not get if you had written the properties of the object to a file, and then read it back in and interpreted it.
This means that if you serialize a collection of objects that reference each other, they will still maintain their references to each other when you de-serialize them. This is also good for debugging a program. If an exception occurs you can create a memory dump on the user's computer, and if they send it to you, you can see directly what was in memory and the problems that may have been caused.
It is also easier to serialize a complex object with many properties to a stream than it is to build some string of representative data, which you will then have to read back, parse and use to construct a new object.
Really what you gain, is that it is easier/quicker and better for debugging.
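
The point about references can be checked with a small sketch. The Person and Address classes are invented for this example, and the round trip goes through an in-memory byte array rather than a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SharedReferenceDemo {
    // Hypothetical classes for illustration.
    static final class Address implements Serializable {
        private static final long serialVersionUID = 1L;
        final String city;
        Address(String city) { this.city = city; }
    }

    static final class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final Address home;
        Person(String name, Address home) { this.name = name; this.home = home; }
    }

    // Serializes the array to memory and reads it back as a deep copy.
    static Person[] roundTrip(Person[] people) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(people);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Person[]) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Address shared = new Address("Oslo");
        Person[] copy = roundTrip(new Person[] {
            new Person("Ann", shared), new Person("Bob", shared) });
        // Both copies still point at one Address instance, not two equal ones.
        System.out.println(copy[0].home == copy[1].home);
    }
}
```

Note that the deserialized objects are new instances: the graph's shape is preserved, but `copy[0]` is not the same object as the original Person.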

Object Stream vs Text Stream - Who's Faster? Java

I am working on an assignment in which I need to measure the time it takes to write and read with Object Streams and with Text Streams. I was expecting Object Streams to be faster than Text Streams, but my results were exactly the opposite in both cases (read and write).
Can someone tell me which is normally faster?
Thanks.

Why did you think that Object streams would be faster? They have high overhead. Many people prefer other serialization mechanisms.

Object streams carry quite a bit of overhead since they need to serialize and deserialize class information. They can be reasonably efficient for large object graphs and arrays where the number of unique classes is small, but are notoriously bad for small messages. Object serialisation also has to do quite a bit of bookkeeping (e.g. to detect cycles in object graphs and to ensure that each object is sent only once when there are multiple references to it).
Text streams on the other hand are very simple and carry little overhead. It's not surprising that they are faster in your tests.
It does depend a lot on how you encode your data into text, though: some naive text representations of object graphs would actually be much worse than regular Java object serialisation. Basically, it would be a bad idea to try to reinvent Java object serialisation in text form.
If you are interested in fast and efficient serialisation of objects, you should also consider:
Advanced object serialization libraries like Google's Protocol Buffers or Kryo
Efficient textual data representation formats like JSON or Clojure s-expressions (both of which have good library support and are proven in the field)
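
The per-message overhead of object streams discussed above can be made visible by comparing output sizes for one small object. This is only a size comparison, not a timing benchmark, and the Message class and the "id,text" line format are assumptions chosen for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class OverheadDemo {
    // Hypothetical small message type.
    static final class Message implements Serializable {
        private static final long serialVersionUID = 1L;
        final int id;
        final String text;
        Message(int id, String text) { this.id = id; this.text = text; }
    }

    // Size in bytes when written with Java object serialization.
    static int objectStreamSize(Message m) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(m);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Size in bytes of a naive "id,text" line encoded as UTF-8.
    static int textSize(Message m) {
        return (m.id + "," + m.text + "\n").getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        Message m = new Message(1, "hello");
        System.out.println("object stream: " + objectStreamSize(m) + " bytes");
        System.out.println("text line:     " + textSize(m) + " bytes");
    }
}
```

The object stream has to carry the stream header and a class descriptor (class name, serialVersionUID, field names and types) before the actual values, which is why a single small message comes out much larger than its text line.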

Serialization framework (no no-arg constructor)

I'm looking for some info on the best approach to serialize a graph of objects, based on the following requirements (Java):
Two objects of the same class whose state is equal must serialize to binary-equal (bit-by-bit) output. (This must not depend on JVM field ordering.)
Collections are modeled only with arrays (no Collections classes).
All instances are immutable
Serialization format should be in byte[] format instead of text based.
I am in control of all the classes in the graph.
I don't want to put an empty constructor in the classes just to support serialization.
I have looked at implementing a solution based on my own traversal and on Objenesis, but my problem does not seem that unique, so I would rather check for an existing, complete solution first.
Updated details:
First, thanks for your help!
Objects must serialize to exactly the same bit order based on the objects state. This is important since the binary content will be digitally signed. Reconstruction of the serialized format will be based on the state of the object and not that the original bits are stored.
Interoperability between different technologies is important. I can see the software running on, for example, .NET in the future, so there should be no Java flavour in the serialized format.
Note on comments of immutability: The values of the arrays are copied from the argument to the inner fields in the constructor. Less important.
Best regards,
Niclas Lindberg

You could write the data yourself, using reflection or hand-coded methods. I use methods that look hand-coded, except that they are generated. (This gives the performance of hand-coded methods and the convenience of not having to rewrite the code when it changes.)
Developers often talk about the built-in Java serialization, but you can write a custom serialization that does whatever you want, any way you want.
A more detailed answer would depend on what exactly you want to do.
BTW: You can serialize your data into byte[] and still make it human readable/text like/editable in a text editor. All you have to do is use a binary format which looks like text. ;)
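
A hand-coded approach along these lines might look like the following sketch. The Account class, its fields, and the chosen encoding (big-endian integers, length-prefixed UTF-8 strings and arrays) are assumptions for illustration, not a format taken from the question. DataOutputStream.writeUTF is deliberately avoided because it writes Java's modified UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical immutable class with no no-arg constructor.
final class Account {
    private final long id;
    private final String owner;
    private final int[] limits;

    Account(long id, String owner, int[] limits) {
        this.id = id;
        this.owner = owner;
        this.limits = limits.clone(); // defensive copy keeps the instance immutable
    }

    // Writes fields in a fixed, documented order (big-endian, length-prefixed),
    // so equal state always yields identical bytes.
    byte[] toBytes() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeLong(id);
            byte[] ownerUtf8 = owner.getBytes(StandardCharsets.UTF_8);
            out.writeInt(ownerUtf8.length);
            out.write(ownerUtf8);
            out.writeInt(limits.length);
            for (int limit : limits) {
                out.writeInt(limit);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the fields back in the same order and calls the real constructor.
    static Account fromBytes(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            long id = in.readLong();
            byte[] ownerUtf8 = new byte[in.readInt()];
            in.readFully(ownerUtf8);
            int[] limits = new int[in.readInt()];
            for (int i = 0; i < limits.length; i++) {
                limits[i] = in.readInt();
            }
            return new Account(id, new String(ownerUtf8, StandardCharsets.UTF_8), limits);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] a = new Account(7L, "alice", new int[] {10, 20}).toBytes();
        byte[] b = Account.fromBytes(a).toBytes();
        System.out.println(Arrays.equals(a, b));
    }
}
```

Because the write order is spelled out in code rather than derived from reflection, the output does not depend on JVM field ordering, and reading goes through the ordinary constructor, so no empty constructor is needed.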

Maybe you want to familiarize yourself with the serialization frameworks available for Java. A good starting point for that is the thrift-protobuf-compare project, whose name is misleading: it compares the performance of more than 10 ways of serializing data using Java.
It seems that the hardest constraint you have is interoperability between different technologies. I know that Google's Protocol Buffers and Thrift deliver here. Avro might also fit.

The important thing to know about serialization is that it is not guaranteed to be consistent across multiple versions of Java. It's not meant as a way to store data on a disk or anywhere permanent.
It's used internally to send objects from one JVM to another during RMI or some other network protocol. These are the types of applications that you should use Serialization for. If this describes your problem (short-term communication between two different JVMs), then you should try to get Serialization going.
If you're looking for a way to store the data more permanently or you will need the data to survive in forward versions of Java, then you should find your own solution. Given your requirements, you should create some sort of method of converting each object into a byte stream yourself and reading it back into objects. You will then be responsible for making sure the format is forward compatible with future objects and features.
I highly recommend Chapter 11 of Effective Java by Joshua Bloch.

Is the Externalizable interface what you're looking for? You fully control the way your objects are persisted, and you do it OO-style, with methods that are inherited and all (unlike the private readObject/writeObject methods used with Serializable). But still, you cannot get rid of the requirement for an accessible no-arg constructor.
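
For reference, a minimal Externalizable class could look like the sketch below. The Tag class is invented for this example. It shows both the control over the written fields and the constraint mentioned above: a public no-arg constructor, and fields that cannot be final.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical class for illustration.
final class Tag implements Externalizable {
    private String label;   // cannot be final: readExternal() assigns it
    private int weight;

    public Tag() { }        // public no-arg constructor required by Externalizable

    Tag(String label, int weight) {
        this.label = label;
        this.weight = weight;
    }

    String label() { return label; }
    int weight() { return weight; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeUTF(label);
        out.writeInt(weight);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        // Must read in exactly the order writeExternal() wrote.
        label = in.readUTF();
        weight = in.readInt();
    }

    // Serializes to memory and reads the object back.
    static Tag roundTrip(Tag tag) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(tag);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Tag) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Tag copy = roundTrip(new Tag("urgent", 3));
        System.out.println(copy.label() + " " + copy.weight());
    }
}
```

The output still goes through ObjectOutputStream, so it keeps the Java stream header and class descriptor; Externalizable only controls how the fields themselves are written.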

The only way you would get this is:
A/ Use UTF-8 text, i.e. XML or JSON, with binary data encoded as Base64 (the HTTP/XML-safe variety).
B/ Enforce UTF-8 binary ordering of all data.
C/ Pack the contents by removing all unescaped white space.
D/ Hash the content and provide that hash in a positionally standard location in the file.

How does Java's serialization work and when it should be used instead of some other persistence technique?

I've been lately trying to learn more and generally test Java's serialization for both work and personal projects and I must say that the more I know about it, the less I like it. This may be caused by misinformation though so that's why I'm asking these two things from you all:
1: On byte level, how does serialization know how to match serialized values with some class?
One of my problems here is that I made a small test with an ArrayList containing the values "one", "two", "three". After serialization the byte array took 78 bytes, which seems like an awful lot for such a small amount of information (19+3+3+4 bytes). Granted, there's bound to be some overhead, but this leads to my second question:
2: Can serialization be considered a good method for persisting objects at all? Now obviously if I'd use some homemade XML format the persistence data would be something like this
<object>
<class="java.util.ArrayList">
<!-- Object array inside Arraylist is called elementData -->
<field name="elementData">
<value>One</value>
<value>Two</value>
<value>Three</value>
</field>
</object>
which, like XML in general, is a bit bloated and takes 138 bytes(without whitespaces, that is). The same in JSON could be
{
"java.util.ArrayList": {
"elementData": [
"one",
"two",
"three"
]
}
}
which is 75 bytes so already slightly smaller than Java's serialization. With these text-based formats it's of course obvious that there has to be a way to represent your basic data as text, numbers or any combination of both.
So to recap, how does serialization work on byte/bit level, when it should be used and when it shouldn't be used and what are real benefits of serialization besides that it comes standard in Java?

I would personally try to avoid Java's "built-in" serialization:
It's not portable to other platforms
It's not hugely efficient
It's fragile - getting it to cope with multiple versions of a class is somewhat tricky. Even changing compilers can break serialization unless you're careful.
For details of what the actual bytes mean, see the Java Object Serialization Specification.
There are various alternatives, such as:
XML and JSON, as you've shown (various XML flavours, of course)
YAML
Facebook's Thrift (RPC as well as serialization)
Google Protocol Buffers
Hessian (web services as well as serialization)
Apache Avro
Your own custom format
(Disclaimer: I work for Google, and I'm doing a port of Protocol Buffers to C# as my 20% project, so clearly I think that's a good bit of technology :)
Cross-platform formats are almost always more restrictive than platform-specific formats for obvious reasons - Protocol Buffers has a pretty limited set of native types, for example - but the interoperability can be incredibly useful. You also need to consider the impact of versioning, with backward and forward compatibility, etc. The text formats are generally hand-editable, but tend to be less efficient in both space and time.
Basically, you need to look at your requirements carefully.

The main advantage of serialization is that it is extremely easy to use, relatively fast, and preserves actual Java object meshes.
But you have to realize that it's not really meant to be used for storing data, but mainly as a way for different JVM instances to communicate over a network using the RMI protocol.

See the Java Object Serialization Stream Protocol for a description of the file format and grammar used for serialized objects.
Personally I think the built-in serialization is acceptable for persisting short-lived data (e.g. storing the state of a session object between two HTTP requests) which is not relevant outside your application.
For data that has a longer lifetime or should be used outside your application, I'd persist either into a database or at least use a more commonly used format...

How does Java's built-in serialization work?
Whenever we want to serialize an object, its class implements the java.io.Serializable interface. The interface has no methods to implement; it exists only to mark the class as serializable (a so-called marker interface). The JVM does not add any methods to such a class. By default, ObjectOutputStream and ObjectInputStream write and read the class's non-static, non-transient fields themselves. If the class needs to customize this, it can declare the following two private methods, which the serialization mechanism then calls instead:
    private void writeObject(java.io.ObjectOutputStream stream)
            throws IOException {
        stream.writeObject(name);    // object property
        stream.writeObject(address); // object property
    }

    private void readObject(java.io.ObjectInputStream stream)
            throws IOException, ClassNotFoundException {
        name = (String) stream.readObject();    // object property
        address = (String) stream.readObject(); // object property
    }
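
For context, a complete class using these two hooks might look like the sketch below. The Contact class and its transient display field are assumptions for illustration. Unlike the fragment above, this variant calls defaultWriteObject() and defaultReadObject() to handle the ordinary fields, then rebuilds the transient field by hand.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class for illustration.
final class Contact implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String name;
    private final String address;
    private transient String display; // derived value, not written to the stream

    Contact(String name, String address) {
        this.name = name;
        this.address = address;
        this.display = name + " <" + address + ">";
    }

    String display() { return display; }

    // Called by ObjectOutputStream instead of its default field writing.
    private void writeObject(ObjectOutputStream stream) throws IOException {
        stream.defaultWriteObject(); // writes the non-transient fields
    }

    // Called by ObjectInputStream instead of its default field reading.
    private void readObject(ObjectInputStream stream)
            throws IOException, ClassNotFoundException {
        stream.defaultReadObject();                // restores name and address
        display = name + " <" + address + ">";     // rebuild the transient field
    }

    // Serializes to memory and reads the object back.
    static Contact roundTrip(Contact c) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(c);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Contact) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new Contact("Ann", "ann@example.org")).display());
    }
}
```

Without the readObject hook, the transient display field would come back as null, because deserialization of a Serializable class does not run its constructor.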
When it should be used instead of some other persistence technique?
The built-in serialization is useful when both sender and receiver are Java programs. When that is not the case, or when you want to avoid hand-written code like the above, XML or JSON can be used with the help of frameworks.

I bumped into this dilemma about a month ago (see the question I asked).
The main lesson I learned from it is to use Java serialization only when necessary and if there's no other option. As Jon said, it has its downsides, while other serialization techniques are much easier, faster and more portable.

Serializing means that you put the structured data in your classes into a flat sequence of bytes in order to save it.
You should generally use techniques other than the built-in Java mechanism. It is made to work out of the box, but if the contents or the field order of your serialized classes change in the future, you get into trouble because you can no longer load the old data correctly.

The advantage of Java Object Serialization (JOS) is that it just works. There are also tools out there that do the same as JOS, but use an XML format instead of a binary format.
About the length: JOS writes some class information at the start, instead of as part of each instance - e.g. the full field names are recorded once, and an index into that list of names is used for instances of the class. This makes the output longer if you write only one instance of the class, but is more efficient if you write several (different) instances of it. It's not clear to me if your example actually uses a class, but this is the general reason why JOS is longer than one would expect.
BTW: this is incidental, but I don't think JSON records class names (as you have in your example), and so it might not do what you need.

The reason why storing a tiny amount of information in serial form is relatively large is that it stores information about the classes of the objects it is serialising. If you store a duplicate of your list, then you'll see that the file hasn't grown by much. Store the same object twice and the difference is tiny.
The important pros are: relatively easy to use, quite fast and can evolve (just like XML). However, the data is rather opaque, it is Java-only, tightly couples data to classes and untrusted data can easily cause DoS. You should think about the serialised form, rather than just slapping implements Serializable everywhere.
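
The claim about duplicates can be checked with a small measurement. The class name RepeatedWriteDemo and the helper sizes are made up for this sketch, and the exact byte counts are left for the program to print rather than stated here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RepeatedWriteDemo {
    // Returns { stream size after writing first, size after also writing second }.
    static int[] sizes(Object first, Object second) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(first);
            out.flush();
            int afterFirst = bytes.size();
            out.writeObject(second);
            out.flush();
            return new int[] { afterFirst, bytes.size() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("one", "two", "three"));
        List<String> duplicate = new ArrayList<>(list); // equal content, different object

        int[] same = sizes(list, list);
        int[] dup = sizes(list, duplicate);
        System.out.println("first write:          " + same[0] + " bytes");
        System.out.println("same object again:   +" + (same[1] - same[0]) + " bytes");
        System.out.println("duplicate list:      +" + (dup[1] - dup[0]) + " bytes");
    }
}
```

The second write of the same object only needs a back-reference, and the duplicate list can refer back to the class descriptor and the string instances already in the stream, so both add far less than the first write did.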

If you don't have too much data, you can save objects into a java.util.Properties object. An example of a key/value pair would be user_1234_firstname = Peter. Using reflection to save and load objects can make things easier.
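
A minimal sketch of the Properties idea, using the key from the example above. The helper names save and load are invented here, and the text is kept in a String instead of a file so the example is self-contained. Reflection is left out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

final class PropertiesDemo {
    // Renders the properties in the standard key=value text format.
    static String save(Properties props) {
        try {
            StringWriter writer = new StringWriter();
            props.store(writer, null); // null means no extra comment line
            return writer.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Parses text previously produced by save().
    static Properties load(String text) {
        try {
            Properties props = new Properties();
            props.load(new StringReader(text));
            return props;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("user_1234_firstname", "Peter");
        String text = save(props);
        System.out.println(load(text).getProperty("user_1234_firstname"));
    }
}
```

All values are stored as strings, so numbers and nested objects have to be converted by hand on the way in and out.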
