What are the key points in a checklist to be checked while implementing good serialization in Java? (I'm not really looking for "implements Serializable", writeObject, readObject, etc.)
Instead: how to reduce the size of the object, perhaps how to compress the object (e.g. zip format) and send it over the network, etc.
How to ensure the secure mode of transfer.
Any others like this..
How to reduce the size of the object: new ObjectOutputStream(new GZIPOutputStream(new BufferedOutputStream(out))). But this is a space-time tradeoff. You may find that it makes performance worse by adding latency.
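For example, a minimal sketch of that wrapping (here round-tripping through a byte array instead of a real socket, so it runs standalone):

```java
import java.io.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipSerializationDemo {
    public static void main(String[] args) throws Exception {
        // Sender side: compress the serialized bytes with GZIP.
        // Closing the ObjectOutputStream finishes the GZIP trailer.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(
                new GZIPOutputStream(new BufferedOutputStream(bytes)))) {
            oos.writeObject("some payload"); // any Serializable object
        }

        // Receiver side: mirror the wrapping order on the way in.
        try (ObjectInputStream ois = new ObjectInputStream(
                new GZIPInputStream(new BufferedInputStream(
                        new ByteArrayInputStream(bytes.toByteArray()))))) {
            System.out.println(ois.readObject());
        }
    }
}
```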
How to ensure the secure mode of transfer: SSLSocket or an HTTPS URL.
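A minimal SSLSocket sketch (host, port and payload are placeholders; certificate/trust-store configuration is omitted):

```java
import java.io.ObjectOutputStream;
import java.io.Serializable;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class SecureSend {
    // Open a TLS-encrypted connection and serialize the object over it.
    static void send(Serializable payload) throws Exception {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket socket = (SSLSocket) factory.createSocket("example.com", 8443)) {
            ObjectOutputStream oos = new ObjectOutputStream(socket.getOutputStream());
            oos.writeObject(payload); // encrypted in transit by TLS
            oos.flush();
        }
    }
}
```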
Any others like this
Any others like what? You will need to be specific.
Do not use serialization to "persist" objects - this makes schema management (i.e. changing what constitutes a class's state) almost unworkable
Always declare a serialVersionUID field; otherwise you will not be able to add methods or change the class in any way (even in non-state-changing ways) without old versions of your code being unable to deserialize the objects (an InvalidClassException will be thrown)
Override readResolve if you are deserializing a logical "enum" instance of a class (typesafe enumeration pattern) - see the sketch after this list
Make sure you are 100% happy with the name of the variables which make up the state of your class. Once you have serialized instances around, you cannot change the variable names
Do not implement Serializable unless you really have to
Do not make your interfaces Serializable - the implementation classes may be, but the interfaces should not be
Do not make serialization part of the way your library passes objects around, unless you are the only producer and consumer of the objects (e.g. server-GUI communication). Use a binary/wire protocol instead (e.g. protobuf)
To minimize what is sent over the wire, you could use swizzling. That is, perhaps you have a Product class; the serialized form might just be a unique int id field. All other methods could then be made to construct relevant state as required (perhaps as a database call, or call to some central service)
Make sure, if you are serializing out an object which contains some collection of elements as part of its state, that you synchronize on the collection. Otherwise you may find that someone modifies the collection as it is being serialized out, resulting in a ConcurrentModificationException
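To illustrate the serialVersionUID and readResolve points above, a sketch of a hypothetical typesafe enumeration (the class and constant names are made up for illustration):

```java
import java.io.ObjectStreamException;
import java.io.Serializable;

public final class Status implements Serializable {
    // Declared explicitly so that compatible class changes do not break
    // deserialization of previously serialized instances.
    private static final long serialVersionUID = 1L;

    public static final Status ACTIVE = new Status("ACTIVE");
    public static final Status CLOSED = new Status("CLOSED");

    private final String name;

    private Status(String name) { this.name = name; }

    // Without readResolve, deserialization would create fresh copies and
    // break == comparisons against the canonical constants.
    private Object readResolve() throws ObjectStreamException {
        return "ACTIVE".equals(name) ? ACTIVE : CLOSED;
    }
}
```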
Probably how to make the object in zip format and send over the network etc..
Check out how the game developers for network games implement networking. They know how to transmit data quickly. Have a look at e.g. http://code.google.com/p/kryonet/
How to ensure the secure mode of transfer. any others like this..
There are a lot of interpretations of "secure mode". If you need reliability, use TCP, otherwise UDP. If you need encryption, use TLS, otherwise rot13 may fit. If you need to ensure integrity, append a keyed MAC to the message (a plain hash is not enough, since a tamperer can simply recompute it).
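A sketch of the keyed-MAC variant (the key literal is a placeholder; real keys are shared out of band):

```java
import java.nio.charset.StandardCharsets;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class IntegrityTag {
    // Compute an HMAC-SHA256 tag over the message bytes; the receiver
    // recomputes the tag with the shared key and compares.
    static byte[] hmac(byte[] message) throws Exception {
        byte[] key = "shared-secret".getBytes(StandardCharsets.UTF_8); // placeholder
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        return mac.doFinal(message); // send the message followed by this tag
    }
}
```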
How to reduce the size of the object ,
Analyse your data and strip down the objects so that you only have the necessary data in there. This is very context specific, as the best optimisation may lie in the domain. E.g. you can check if it is possible to send only deltas of the changes.
It is an interesting question, but you have to be more specific about your goal or domain to get an answer that fits best.
I was wondering how the serialization of MicroStream works in detail.
Since it is described as "Super-Fast", it has to rely on code generation, right? Or is it based on reflection?
How would it perform in comparison to Protobuf serialization, which relies on code generation that reads directly out of the Java fields and writes them into a ByteBuffer and vice versa?
Using reflection would drastically decrease performance when serializing objects on a huge scale, wouldn't it?
I'm looking for a fast way to transmit and persist objects for a multiplayer-game and every millisecond counts. :)
Thanks in advance!
PS: Since I don't have enough reputation, I cannot create the "microstream" tag. https://microstream.one/
I am the lead developer of MicroStream.
(This is not an alias account. I really just created it. I'm reading on StackOverflow for 10 years or so but never had a reason to create an account. Until now.)
On every initialization, MicroStream analyzes the current runtime's versions of all required entity and value type classes and derives optimized metadata from them.
The same is done when encountering a class at runtime that was unknown so far.
The analysis is done via reflection, but since it is only done once for every handled class, the reflection performance cost is negligible.
The actual storing and loading or serialization and deserialization is done via optimized framework code based on the created metadata.
If a class layout changes, the type analysis creates a mapping from the field layout that the class' instances are stored in to that of the current class.
This happens automatically if possible (for unambiguous changes, or via some configurable heuristics), otherwise via a user-provided mapping. Performance stays the same, since the JVM does not care whether it (simply speaking) copies a loaded value #3 to position #3 or to position #5. It's all in the metadata.
ByteBuffers are used, more precisely direct ByteBuffers, but only as an anchor for off-heap memory to work on via direct "Unsafe" low-level operations. If you are not familiar with "Unsafe" operations, a short and simple notion is: "It's as direct and fast as C++ code.". You can do anything you want very fast and close to memory, but you are also responsible for everything. For more details, google "sun.misc.Unsafe".
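To give a flavour of what such "Unsafe" low-level access looks like, a small illustrative sketch (this is not MicroStream's actual code):

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class UnsafeDemo {
    public static void main(String[] args) throws Exception {
        // Obtain the Unsafe singleton reflectively (there is no public accessor).
        Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
        theUnsafe.setAccessible(true);
        Unsafe unsafe = (Unsafe) theUnsafe.get(null);

        // Allocate 8 bytes off-heap and access them directly: no bounds
        // checks, no object header, and manual freeing required.
        long address = unsafe.allocateMemory(8);
        unsafe.putLong(address, 42L);
        System.out.println(unsafe.getLong(address)); // 42
        unsafe.freeMemory(address);
    }
}
```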
No code is generated. No bytecode hacking, tacit replacement of instances by proxies, or similar monkey business is used. On the technical level, it's just a Java library (including "Unsafe" usage), but with a lot of properly devised logic.
As a side note: reflection is not as slow as it is commonly considered to be. Not any more. It used to be, but it has been optimized considerably in past Java versions.
It's only slow if every operation has to do all the class analysis, field lookups, etc. anew (which an awful lot of frameworks seem to do because they are just badly written). If the fields are collected (set accessible, etc.) once and then cached, reflection is actually surprisingly fast.
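A minimal sketch of that "analyse once, cache, reuse" idea (the class and method names here are made up):

```java
import java.lang.reflect.Field;

// The expensive parts (field lookup, setAccessible) run once per class;
// the hot path is only a cheap, JIT-friendly Field.get call.
final class CachedFieldAccess {
    private final Field field;

    CachedFieldAccess(Class<?> type, String fieldName) throws NoSuchFieldException {
        this.field = type.getDeclaredField(fieldName);
        this.field.setAccessible(true); // done once, not per access
    }

    Object read(Object instance) throws IllegalAccessException {
        return field.get(instance);
    }
}
```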
Regarding the comparison to Protobuf-Serialization:
I can't say anything specific about it since I haven't used Protocol Buffers and I don't know how it works internally.
As usual with complex technologies, a truly meaningful comparison might be pretty difficult to do since different technologies have different optimization priorities and limitations.
Most serialization approaches give up referential consistency and only store "data": if two objects reference a third one, deserialization will create TWO instances of that third object.
Like this: A->C<-B ==serialization==> A->C1 B->C2.
This basically breaks/ruins/destroys object graphs and makes serialization of cyclic graphs impossible, since it would create an endlessly cascading replication. See JSON serialization, for example. Funny stuff.
Even Brian Goetz' draft for a Java "Serialization 2.0" includes that limitation (see "Limitations" at http://cr.openjdk.java.net/~briangoetz/amber/serialization.html) (and another one which breaks the separation of concerns).
MicroStream does not have that limitation. It handles arbitrary object graphs properly without ruining their references.
Keeping referential consistency intact is by far not "trying to do too much", as he writes. It is "doing it properly". One just has to know how to do it properly. And it is even rather trivial if done correctly.
So, depending on how many limitations Protobuf-Serialization has ("pacts with the devil"), it might be hardly or even not at all comparable to MicroStream in general.
Of course, you can always create some performance comparison tests for your particular requirements and see which technology suits you best. Just make sure you are aware of the limitations a certain technology imposes on you (ruined referential consistency, forbidden types, required annotations, required default constructor / getters / setters, etc.).
MicroStream has none*.
(*) within reason: Serializing/storing system-internals (e.g. Thread) or non-entities (like lambdas or proxy instances) is, while technically possible, intentionally excluded.
Someone thought it would be a good idea to store Objects in the database in a blob column using Java's default serialization methods.
The structure of these objects is controlled by another group and they changed a field type from BigDecimal to a Long,
but the data in our database remains the same.
Now we can't read the objects back because it causes ClassCastExceptions.
I tried to override it by writing my own readObject method,
but that throws a StreamCorruptedException because it doesn't match what was written by the default writeObject method.
How do I make my readObject call behave like Java's default one?
Is there a certain number of bytes I can skip to get to my data?
Externalizable allows you to take full control of serialization/deserialization, but it means you're responsible for writing and reading every field, as in the sketch below.
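A sketch of what that contract looks like, using a hypothetical Account class (the field names are made up):

```java
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

public class Account implements Externalizable {
    private String owner;
    private long balance;

    public Account() {} // Externalizable requires a public no-arg constructor

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // Every field must be written explicitly...
        out.writeUTF(owner);
        out.writeLong(balance);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        // ...and read back in exactly the same order.
        owner = in.readUTF();
        balance = in.readLong();
    }
}
```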
Where it gets difficult, though, is when something was written out using the default serialization and you want to read it via Externalizable. (Or rather, it's impossible: if you try to read an object serialized with the default method using Externalizable, it'll just throw an exception.)
If you've got absolutely no control on the output, your only option is to keep two versions of the class: use the default deserialization of the old version, then convert to the new. The upside of this solution is that it keeps the "dirty" code in one place, separate from your nice and clean objects.
Again, unless you want to make things really complicated, your best option is to keep the old class as the "transport" bean and rename the class your code really uses to something else.
If you want to read what's already in your database your only option is to get them to change the class back again, and to institute some awareness that you're relying on the class definition as it was when the class was serialized. Merely implementing your own readObject() call can't fix this, and if the class is under someone else's control you can't do that anyway.
If you're prepared to throw away the existing data you have many other choices starting with custom Serialization, writeReplace()/readResolve(), Externalizable, ... or a different mechanism such as XML.
But if you're going to have third parties changing things whenever they feel like it you're always going to have problems of one kind or another.
BigDecimal to Long sounds like a retrograde step anyway.
Implement the readObject and readObjectNoData methods in your class.
Read the appropriate type using ObjectInputStream.readObject and convert it to the new type.
See the Serializable interface API for details.
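One way that conversion might look, using ObjectInputStream.readFields() to get at whatever the stream actually contains (Order and the amount field are hypothetical stand-ins for the question's class, and this assumes the stream's serialVersionUID still matches):

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.Serializable;
import java.math.BigDecimal;

public class Order implements Serializable {
    private static final long serialVersionUID = 1L;
    private Long amount; // used to be a BigDecimal

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        // readFields() exposes the stream's persistent fields by name, so the
        // old BigDecimal value can be read and converted to the new type.
        ObjectInputStream.GetField fields = in.readFields();
        Object raw = fields.get("amount", null);
        amount = (raw instanceof BigDecimal)
                ? ((BigDecimal) raw).longValue() // old format
                : (Long) raw;                    // new format
    }
}
```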
More Details
You can only fix this easily if you control the source of the class that was serialized into the blob.
If you do not control this class, then you have only a few limited and difficult options:
Have the controlling party give you a version of the class that reads the old format and writes the new format.
Write your own form of serialization (as in: you read the blob and convert the bytes to classes yourself) that can read the old format and generate new versions of the classes.
Write your own version of the class in question (removing the other from the classpath) which reads the old format and produces some intermediate form (perhaps JSON).
Next you have to do one of these
Convince the powers that be that the blob technique is shitty and should be done away with; use the current class change as evidence. Almost any technique is better than this. Writing JSON to the blob in the db would be better.
Stop depending on shitty classes from other people ("shitty" being a judgement which I can only suspect, not know, is true). Instead, create a suite of classes that represent the data in the database, and convert from the externally controlled classes to the new data classes before writing to the database.
I have written a math game in Java and have distributed some copies to a few beta-testers. The problem is that the version I have given them saves the GameData via object serialization, which I found out is mainly for sending objects (or in this case, ArrayLists of GameData) over a network. It is NOT persistence; that is what a relational database is for. Knowing this, I would like to know whether it would be better to create a database on the beta-testers' machines (and rewrite the game), or to continue with the object-serialization version of the game and then retrieve the objects when they are ready to send the data?
My guess would be to just move their data to a database that is created on their computer, and then give them the database version of the game. That way, the data can be persisted and be much easier to manipulate. What turns me away from that idea is the question of how I am going to write their database into mine (in the future)?
Although relatively rare, there are still lots of applications that use serialization for storage and retrieval of objects. It's not wrong to do this, just slightly unusual. If it's working for you, stick with it, because DBs are a heavyweight solution. What you found out about serialization is only an opinion, and an ill-formed one at that.
In terms of using an embedded database, two options to consider are SQLite and HyperSQL. However, serialization is also an option, and in my opinion it should be your default option if you've already implemented it. Some considerations:
With serialization you've generally got to retrieve the entire object, which is slow if you've got an object with several dozen fields and you only want to read one of them. If you're making queries like these, then use a database. I suspect that you're just reading in all of your serialized objects at startup and serializing them back out to disk at shutdown, in which case there's no reason to use a database instead of serialization.
Java's default serialization mechanism is fairly slow. You may want to consider another serialization mechanism, such as Kryo or Jackson, but only if you're not happy with your program's serialization performance.
It is difficult to advise on the best choice of technology without knowing what you are persisting and why.
If the state is simply a snapshot of your game state (i.e. a save file) or a "best scores" table, then you don't need a database. Serializing using JSON, XML or ... Java Object serialization is sufficient.
If the state needs to be read or updated incrementally or shared with other applications ... or users on other machines ... then a database is more appropriate.
Serialization mechanisms are problematic if the requirements include incremental changes, etcetera. You end up building a database-like layer over the top of the serialization.
As to whether you should stick with Java serialization ... or switch to JSON or XML or something like that:
Object serialization is simple, but it can be fragile if you change the classes that you are serializing. This fragility can be mitigated, but it is messy and you lose the simplicity. (You need to write custom readObject and writeObject methods that know how to read "old versions" of the serialized objects.)
JSON and XML are a bit more complicated, but still relatively simple if you use an object binding mechanism.
It is worth noting that changes to the persisted object classes (or the database schemas) are potentially problematic no matter what you do. There is no easy universal solution to this problem.
UPDATE
Given the additional information that you provided in your first comment (below), it seems like you don't need a database in the game itself. All you need is something that can read and analyse the session state save files that your beta testers provide for you. Indeed, it doesn't even seem like the actual app needs to be able read the files. (But that's unclear, because you've not said what the real purpose of these files is ... or at least, not what the entire purpose is.)
It is also worth noting that you are probably saving the wrong information if your aim is to tune the sets of questions. What you really need to do is record, for each individual question, whether the user got the right or wrong answer and how long they took. And you probably need to know what the actual answer given was ... so that you can spot cases where the user's answer was actually right and you "marked" it as wrong ... or vice versa.
"What turns me away from that idea is the question of how am I going to write their database into mine (in the future)?"
Exactly. If you hadn't prematurely "analysed" the data, you wouldn't have this problem.
But ignoring that, it seems like that a simple state saving mechanism is sufficient to meet your (still hypothetical / inferred) requirement of keeping a personal score board for the end user. Your "tuning" stuff would be better implemented using a custom log file. I cannot see any value in incorporating a database as part of the app itself.
I presume you are doing Java serialisation. If so, there is nothing wrong with it. Just be aware of its limitations: different versions of Java might not be able to retrieve the file.
Also, if you change the class, previously saved data cannot be retrieved.
If you decide to change, you could look at XML, JSON, Protocol Buffers, Thrift, Avro, etc., as well as a DB.
Note:
XML support is built into Java
Java DB (Derby) also ships with the JDK
Other serialisation schemes require a separate library.
I'm using an API providing access to a special server environment. This API has a wide range of data objects you can retrieve from it, for example APICar.
Now I'd like to have "my own" data object (MyCar) containing all the information of that data object, but I'd like to either leave out some properties, augment it, or simply rename some of them.
This is because I need those data objects in a JSON-driven client application. So when someone changes the API mentioned above and renames properties, my client application will break immediately.
My question is:
Is there a best practice or a design pattern to copy objects like this? Like when you have one object and want to transfer it into another object of another class? I've seen something like that in Eclipse called "AdapterFactory" and was wondering if it's a widely used thing.
To make it more clear: I have ObjectA and i need ObjectB. ObjectA comes from the API and its class can change frequently. I need a method or an Object or a Class somewhere which is capable of turning an ObjectA into ObjectB.
I think you are looking for the Adapter design pattern.
It's really just wrapping an instance of class A in an instance of class B, to provide a different way of using it / different type.
"I think" because you mention copying issues, so it may not be as much a class/type thing as a persistence / transmission thing.
Depending on your situation you may also be interested in dynamic proxying, but that's a Java feature.
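A minimal sketch of the adapter/factory idea, using stand-ins for the question's APICar/MyCar types (all member names here are assumptions):

```java
// Stand-in for the API's type, which may change frequently.
interface APICar {
    String getName();
    int getPowerPs();
}

// MyCar owns its own property names, so a JSON client bound to MyCar is
// insulated from renames in the API's APICar type.
final class MyCar {
    private final String displayName;
    private final int horsepower;

    private MyCar(String displayName, int horsepower) {
        this.displayName = displayName;
        this.horsepower = horsepower;
    }

    // The only place that knows about APICar: when the API changes,
    // only this factory method needs updating.
    static MyCar fromApi(APICar source) {
        return new MyCar(source.getName(), source.getPowerPs());
    }
}
```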
I have been studying Java networking for a while.
I am using ObjectInputStream and ObjectOutputStream for the I/O between sockets.
Is it possible to transfer an Entity or Model from server to client and vice versa?
How can I implement this? Am I supposed to make the Entity or Model implement Serializable?
Your response is highly appreciated.
I am not sure what sort of special thing you mean to denote by capital-E Entity and capital-M Model; these terms don't have any fixed, privileged meaning in Java (although they might with respect to a certain API or framework.) In general, if by these you just mean some specific Java objects, then yes, you can send any sort of objects this way, and yes, they would be required to implement Serializable. The only limitations would be if these objects contained members whose values wouldn't make sense on the other end of the pipe -- like file paths, etc.
Note that if you send one object, you'll end up sending every other object it holds a non-transient reference to, as well.
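A minimal sketch with a hypothetical Player entity (host and port are placeholders):

```java
import java.io.*;
import java.net.ServerSocket;
import java.net.Socket;

// A hypothetical entity; both sides need this class on their classpath.
class Player implements Serializable {
    private static final long serialVersionUID = 1L;
    String name;
    int score;
    Player(String name, int score) { this.name = name; this.score = score; }
}

public class ObjectTransfer {
    // Server side: accept a connection and send the entity.
    static void serve() throws IOException {
        try (ServerSocket server = new ServerSocket(9000);
             Socket socket = server.accept();
             ObjectOutputStream oos = new ObjectOutputStream(socket.getOutputStream())) {
            oos.writeObject(new Player("Alice", 1200));
        }
    }

    // Client side: connect and read the entity back.
    static Player fetch() throws IOException, ClassNotFoundException {
        try (Socket socket = new Socket("localhost", 9000);
             ObjectInputStream ois = new ObjectInputStream(socket.getInputStream())) {
            return (Player) ois.readObject();
        }
    }
}
```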
First of all... why send an object through an I/O stream? What's wrong with XML?
However, you can always send/receive an object through an I/O stream as long as the sender can serialize the object and the receiver can deserialize it. Hope it helps.
You definitely need to look at one of these two libraries
Google gson: http://code.google.com/p/google-gson/
Converts Java objects to JSON and back. The advantage is that the objects can be consumed or generated by JavaScript. I have also used this for Java-to-Java RPC, but it gives you flexibility if you want to target browsers later.
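A minimal gson round-trip sketch (Player is a hypothetical plain Java class):

```java
import com.google.gson.Gson;

public class GsonRoundTrip {
    // A hypothetical plain Java class.
    static class Player {
        String name;
        int score;
        Player(String name, int score) { this.name = name; this.score = score; }
    }

    public static void main(String[] args) {
        Gson gson = new Gson();
        String json = gson.toJson(new Player("Alice", 1200)); // Java -> JSON text
        Player copy = gson.fromJson(json, Player.class);      // JSON text -> Java
        System.out.println(json + " / " + copy.name);
    }
}
```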
Google protocol buffers: http://code.google.com/apis/protocolbuffers/
This is what Google uses for RPC. Implementations exist for Java, C++, and Python. If you need performance and the smallest size, this is the one to go with. (The trade-off is that you can't easily look at the data to debug problems, like you can with gson, which generates plain-text JSON.)