Serializing objects with the Java Preferences API

When I first started using the Java Preferences API, the one glaring omission from the API was a putObject() method. I've always wondered why they did not include it.
So, I did some googling and I found this article from IBM which shows you how to do it: http://www.ibm.com/developerworks/library/j-prefapi/
The method they're using seems a bit hackish to me, because you have to break the Object up into byte matrices, store them, and reassemble them later.
My question is: has anyone tried this approach? Can you confirm that it is a good way to store and retrieve objects?
I'm also curious why the Java devs left putObject() out of the API. Does anyone have valuable insight?

I'm also curious why the Java devs left putObject() out of the API.
Does anyone have valuable insight?
From: http://docs.oracle.com/javase/7/docs/technotes/guides/preferences/designfaq.html
Why doesn't this API contain methods to read and write arbitrary
serializable objects?
Serialized objects are somewhat fragile: if the version of the program
that reads such a property differs from the version that wrote it, the
object may not deserialize properly (or at all). It is not impossible
to store serialized objects using this API, but we do not encourage
it, and have not provided a convenience method.

The article describes a reliable way to do it. There are a couple of things I would do differently: for example, I would store the number of pieces alongside the pieces themselves, so that retrieval is straightforward.
Your comment about serialization is wrong, though: the object you want to store has to be Serializable. That is how the ObjectOutputStream used in the article does its job.
So yes, it looks like a reliable mechanism, provided your objects are Serializable. I imagine putObject and getObject are not part of the API for two reasons:
It is not native to the way Windows registries work.
It risks people putting huge amounts of data in the registry.
Storing serialized objects in the registry strikes me as somewhat concerning because they can be so big. I would only use it when there is no way to reconstruct the object through its constructors and the serialized version is relatively small.
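For what it's worth, here is a minimal sketch of the "pieces plus a count" idea described above. It is not the code from the IBM article; the class name PrefObjectStore and the key names "count" and "chunk.N" are made up for illustration. It relies only on the documented limit that putByteArray() rejects arrays longer than three quarters of Preferences.MAX_VALUE_LENGTH.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;
import java.util.prefs.Preferences;

// Hypothetical helper, not part of the Preferences API.
final class PrefObjectStore {
    // Largest byte[] that putByteArray() accepts (the bytes are stored Base64-encoded).
    static final int CHUNK_SIZE = Preferences.MAX_VALUE_LENGTH * 3 / 4;

    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    // Cut the data into pieces of at most chunkSize bytes.
    static byte[][] split(byte[] data, int chunkSize) {
        int count = (data.length + chunkSize - 1) / chunkSize;
        byte[][] chunks = new byte[count][];
        for (int i = 0; i < count; i++) {
            int from = i * chunkSize;
            chunks[i] = Arrays.copyOfRange(data, from, Math.min(data.length, from + chunkSize));
        }
        return chunks;
    }

    static byte[] join(byte[][] chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    // Stores the pieces under a child node, together with their count.
    static void putObject(Preferences node, String key, Serializable obj) throws IOException {
        byte[][] chunks = split(toBytes(obj), CHUNK_SIZE);
        Preferences child = node.node(key);
        child.putInt("count", chunks.length);
        for (int i = 0; i < chunks.length; i++) {
            child.putByteArray("chunk." + i, chunks[i]);
        }
    }

    static Object getObject(Preferences node, String key) throws IOException, ClassNotFoundException {
        Preferences child = node.node(key);
        int count = child.getInt("count", 0);
        byte[][] chunks = new byte[count][];
        for (int i = 0; i < count; i++) {
            chunks[i] = child.getByteArray("chunk." + i, null);
            if (chunks[i] == null) {
                throw new IOException("missing chunk " + i);
            }
        }
        return fromBytes(join(chunks));
    }
}
```

Usage would be something like PrefObjectStore.putObject(Preferences.userNodeForPackage(MyApp.class), "settings", obj), where MyApp is your own class. Note that chunks left over from an earlier, larger object are not cleaned up in this sketch; the stored count simply makes them unreachable.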

Related

Continue with Object serialization or use database?

I have written a math game in Java, and have distributed some copies to a few beta-testers. The problem is that the version I have given them saves the GameData via object serialization, which I found out is mainly for sending objects (in this case, ArrayLists of GameData) over a network. It is NOT persistence; that is what a relational database is for. Knowing this, would it be better to create a database on the beta-tester's machine (and rewrite the game), or to continue with the object serialization version of the game and retrieve the objects when the testers are ready to send the data?
My guess would be to just move their data to a database that is created on their computer, and then give them the database version of the game. That way, the data can be persisted and be much easier to manipulate. What turns me away from that idea is the question of how am I going to write their database into mine (in the future)?

Although relatively rare, there are still lots of applications that use serialization for storage and retrieval of objects. It's not wrong to do this, just slightly unusual. If it's working for you, stick with it, because databases are a heavyweight solution. What you found out about serialization is only an opinion, and an ill-informed one at that.

In terms of using an embedded database, two options to consider are SQLite and HyperSQL. However, serialization is also an option, and in my opinion it should be your default option if you've already implemented it. Some considerations:
With serialization you've generally got to retrieve the entire object, which is slow if you've got an object with several dozen fields and you only want to read one of them. If you're making queries like these, then use a database. I suspect that you're just reading in all of your serialized objects at startup and serializing them back out to disk at shutdown, in which case there's no reason to use a database instead of serialization.
Java's default serialization mechanism is fairly slow. You may want to consider another serialization mechanism, such as Kryo or Jackson, but only if you're not happy with your program's serialization performance.

It is difficult to advise on the best choice of technology without knowing what you are persisting and why.
If the state is simply a snapshot of your game state (i.e. a save file) or a "best scores" table, then you don't need a database. Serializing using JSON, XML or ... Java Object serialization is sufficient.
If the state needs to be read or updated incrementally or shared with other applications ... or users on other machines ... then a database is more appropriate.
Serialization mechanisms are problematic if the requirements include incremental changes, etcetera. You end up building a database-like layer over the top of the serialization.
As to whether you should stick with Java serialization ... or switch to JSON or XML or something like that:
Object serialization is simple, but it can be fragile if you change the classes that you are serializing. This fragility can be mitigated, but it is messy and you lose the simplicity. (You need to write custom readObject and writeObject methods that know how to read "old versions" of the serialized objects.)
JSON and XML are a bit more complicated, but still relatively simple if you use an object binding mechanism.
It is worth noting that changes to the persisted object classes (or the database schemas) are potentially problematic no matter what you do. There is no easy universal solution to this problem.
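To make the point about custom readObject methods concrete, here is a minimal sketch. The GameData fields and the "difficulty" field added in a later release are assumptions for illustration, not the asker's actual class. Pinning serialVersionUID keeps old files readable after compatible changes, and readObject fills in a default for a field that old files do not contain.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical save-file record; field names are illustrative only.
final class GameData implements Serializable {
    // A fixed ID stops the JVM from deriving a new one whenever the class changes.
    private static final long serialVersionUID = 1L;

    private final String player;
    private final int score;
    // Assumed to be added in a later release: older save files have no value for it.
    private String difficulty;

    GameData(String player, int score, String difficulty) {
        this.player = player;
        this.score = score;
        this.difficulty = difficulty;
    }

    String player() { return player; }
    int score() { return score; }
    String difficulty() { return difficulty; }

    // Called by the serialization runtime; patches up data written by older versions.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        if (difficulty == null) {
            difficulty = "normal";
        }
    }

    static void save(List<GameData> games, OutputStream target) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(target)) {
            out.writeObject(new ArrayList<>(games));
        }
    }

    @SuppressWarnings("unchecked")
    static List<GameData> load(InputStream source) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(source)) {
            return (List<GameData>) in.readObject();
        }
    }
}
```

This only covers the easy case of an added field. Renamed or retyped fields need more work, which is where the messiness mentioned above comes from.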
UPDATE
Given the additional information that you provided in your first comment (below), it seems like you don't need a database in the game itself. All you need is something that can read and analyse the session state save files that your beta testers provide for you. Indeed, it doesn't even seem like the actual app needs to be able read the files. (But that's unclear, because you've not said what the real purpose of these files is ... or at least, not what the entire purpose is.)
It is also worth noting that you are probably saving the wrong information if your aim is to tune the sets of questions. What you really need to do is record the length of time and whether the user got the right or wrong answer and the time ... for each individual question. And you probably need to know what the actual answer given was ... so that you can spot cases where the user's answer was actually right and you "marked" it as wrong ... or vice versa.
"What turns me away from that idea is the question of how am I going to write their database into mine (in the future)?"
Exactly. If you hadn't prematurely "analysed" the data, you wouldn't have this problem.
But ignoring that, it seems like that a simple state saving mechanism is sufficient to meet your (still hypothetical / inferred) requirement of keeping a personal score board for the end user. Your "tuning" stuff would be better implemented using a custom log file. I cannot see any value in incorporating a database as part of the app itself.

I presume you are using Java serialisation. If so, there is nothing wrong with it. Just be aware of its limitations: a different version of Java might not be able to read the file.
Also, if you change the class, previously saved data may no longer be readable.
If you decide to change, you could look at XML, JSON, Protocol Buffers, Thrift, Avro etc., as well as a database.
Note:
XML support is built into Java
Java DB (Derby) is also in Java
Other serialisation schemes require a separate library.

Can I (easily) use a third-party library to handle serialization for Java RMI?

I very much like the simplicity of calling remote methods via Java's RMI, but the verbosity of its serialization format is a major buzz kill (Yes, I have benchmarked, thanks). It seems that the architects at Sun did the obvious right thing when designing the RPC (speaking loosely) component, but pulled an epic fail when it came to implementing serialization.
Conversely, it seems the architects of Thrift, Avro, Kryo (especially), protocol buffers (not so much), etc. generally did the obvious right thing when designing their serialization formats, but either do not provide a RPC mechanism, provide one that is needlessly convoluted (or immature), or else one that is more geared toward data transfer than invoking remote methods (perfectly fine for many purposes, but not what I'm looking for).
So, the obvious question: How can I use RMI's method-invocation loveliness but employ one of the above libraries for the wire protocol? Is this possible without a lot of work? Am I evaluating one of the aforementioned libraries too harshly (N.B. I very much dislike code generation, in general; I dislike unnecessary annotations somewhat, and XML configuration quite a bit more; any sort of "beans" make me cringe--I don't need the weight; ideally, I'm looking to just implement an interface for my remote objects, as with RMI).

Once upon a time, I had the same requirement. I changed the argument and return types of my RMI methods to byte[].
I serialized the objects to a byte array with my preferred serializer, then called the modified RMI methods.
As you mentioned, Java serialization is too verbose, so five years ago I implemented a space-efficient serialization algorithm. It saves a lot of space if you are sending a very complex object graph. Recently I had to port this serialization implementation to GWT, because GWT serialization in Dev mode is incredibly slow.
As an example, this RMI method:
public void saveEmployee(Employee emp) {
    // business code
}
should be changed as follows:
public void saveEmployee(byte[] empBytes) {
    // YourPreferredSerializer and its factory stand in for whichever library you choose
    YourPreferredSerializer serializer = YourPreferredSerializerFactory.createSerializer();
    Employee emp = (Employee) serializer.deSerialize(empBytes);
    // business code
}
EDIT:
You should check out MessagePack. It looks promising.

I don't think there is a way to re-wire RMI, but specific replacement projects (I am thinking of DiRMI) might allow it. The project owners might also be interested in helping with this (Brian, its author, is a very competent software engineer from Amazon.com).
Another interesting project is Protostuff. Its author is building an RPC framework too (I think); even without that, it supports an impressive range of data formats, and does so very efficiently (as per https://github.com/eishay/jvm-serializers/wiki/).
By the way, I personally think the biggest mistake most projects (like PB and Avro) have made is not keeping the RPC and serialization aspects properly separate.
So the ability to do RPC using pluggable data formats or serialization providers seems like a good idea to me.

writeReplace() and readResolve() is probably the best combo for doing so. Mighty powerful in the right hands.

Java serialization is only verbose where it describes the classes and fields it's serializing. Overall, the format is as "self describing" as XML. You can actually override this and replace it with something else. This is what the writeClassDescriptor and readClassDescriptor methods are for. Dirmi overrides these methods, and so it is able to use standard object serialization with less wire overhead.
The way it works is related to how its sessions work. Both endpoints may have different versions of the object, and so simply throwing away the class descriptors won't work. Instead, additional data is exchanged (in the background) so that the serialized descriptor is replaced with a session-specific identifier. Upon seeing the identifier, a lookup table is examined to find the descriptor object. Because the data is exchanged in the background, there's a brief "warm up period" after a session is created and for every time an object type is written for the first time.
Dirmi has no way to replace the wire format at this time.
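To illustrate the writeClassDescriptor/readClassDescriptor hooks mentioned above, here is a minimal sketch. It is not Dirmi's implementation (Dirmi exchanges session-specific identifiers, as described); this simplified version just writes the class name and looks the descriptor up locally, which assumes both endpoints have identical versions of plain Serializable classes. The names CompactStreams, Out, In and Point are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.OutputStream;
import java.io.Serializable;

// Hypothetical example; assumes identical classes on both sides.
final class CompactStreams {
    static final class Out extends ObjectOutputStream {
        Out(OutputStream out) throws IOException { super(out); }

        // Replace the full descriptor (name, UID, flags, field list) with just the name.
        @Override
        protected void writeClassDescriptor(ObjectStreamClass desc) throws IOException {
            writeUTF(desc.getName());
        }
    }

    static final class In extends ObjectInputStream {
        In(InputStream in) throws IOException { super(in); }

        // Rebuild the descriptor from the locally loaded class.
        @Override
        protected ObjectStreamClass readClassDescriptor() throws IOException, ClassNotFoundException {
            Class<?> type = Class.forName(readUTF());
            ObjectStreamClass desc = ObjectStreamClass.lookup(type);
            if (desc == null) {
                throw new InvalidClassException(type.getName(), "class is not serializable");
            }
            return desc;
        }
    }

    // Small sample type used to compare sizes.
    static final class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static byte[] write(Object obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new Out(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    static Object read(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new In(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }
}
```

The trade-off is exactly the one described above: throwing away the descriptors removes the information that lets the reader cope with a different class version, so this is only safe when both ends are deployed together.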

Homemade vs. Java Serialization

I have a certain POJO which needs to be persisted on a database, current design specifies its field as a single string column, and adding additional fields to the table is not an option.
Meaning, the objects need to be serialized in some way. So for the basic implementation I designed my own serialized form of the object, which meant concatenating all its fields into one string, separated by a delimiter I chose. But this is rather ugly, and can cause problems if, say, one of the fields contains my delimiter.
So I tried basic Java serialization, but from a basic test I conducted, this somehow becomes a very costly operation (building a ByteArrayOutputStream, an ObjectOutputStream, and so on, same for the deserialization).
So what are my options? What is the preferred way for serializing objects to go on a database?
Edit: this is going to be a very common operation in my project, so overhead must be kept to a minimum, and performance is crucial. Also, third-party solutions are nice, but irrelevant (and usually generate overhead which I am trying to avoid)

Elliotte Rusty Harold wrote up a nice argument against using Java Object serialization for the objects in his XOM library. The same principles apply to you. The built-in Java serialization is Java-specific, fragile, and slow, and so is best avoided.
You have roughly the right idea in using a String-based format. The problem, as you state, is that you're running into formatting/syntax problems with delimiters. The solution is to use a format that is already built to handle this. If this is a standardized format, then you can also potentially use other libraries/languages to manipulate it. Also, a string-based format means that you have a hope of understanding it just by eyeballing the data; binary formats remove that option.
XML and JSON are two great options here; they're standardized, text-based, flexible, readable, and have lots of library support. They'll also perform surprisingly well (sometimes even faster than Java serialization).

You might try Protocol Buffers, an open-source project from Google. It is said to be fast (it generates a shorter serialized form than XML, and works faster). It also handles the addition of new fields gracefully (missing fields get default values).

You need to consider versioning in your solution. Data incompatibility is a problem you will experience with any solution that involves the use of a binary serialization of the Object. How do you load an older row of data into a newer version of the object?
So the solutions above that involve serializing to name/value pairs are the approach you probably want to use.
One solution is to include a version number as one of the field values. As fields are added, modified or removed, the version number can be incremented.
When deserializing the data, you can have different deserialization handlers for each version which can be used to convert data from one version to another.
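A minimal sketch of such per-version handling, assuming the row has already been parsed into name/value pairs. The history used here (version 1 stored a single "name", version 2 stores "firstName" and "lastName") and the class name VersionedRecord are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical upgrade step for name/value data that carries a "version" field.
final class VersionedRecord {
    static final int CURRENT_VERSION = 2;

    // Returns a copy of the stored pairs, converted to the current layout.
    static Map<String, String> upgrade(Map<String, String> stored) {
        Map<String, String> data = new HashMap<>(stored);
        // Rows written before versioning existed are treated as version 1.
        int version = Integer.parseInt(data.getOrDefault("version", "1"));
        if (version > CURRENT_VERSION) {
            throw new IllegalArgumentException("row written by a newer release: " + version);
        }
        if (version == 1) {
            // Assumed change between v1 and v2: "name" was split into two fields.
            String name = data.remove("name");
            if (name == null) {
                name = "";
            }
            int space = name.indexOf(' ');
            data.put("firstName", space < 0 ? name : name.substring(0, space));
            data.put("lastName", space < 0 ? "" : name.substring(space + 1));
            version = 2;
        }
        data.put("version", String.valueOf(version));
        return data;
    }
}
```

Each later release would add one more "if (version == N)" step, so a row of any age is brought forward one version at a time.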

XStream or YAML or OGNL come to mind as easy serialization techniques. XML has been the most common, but OGNL provides the most flexibility with the least amount of metadata.

Consider putting the data in a Properties object and use its load()/store() serialization. That's a text-based technique so it's still readable in the database:
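import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public String getFieldsAsString() throws IOException {
    Properties data = new Properties();
    data.setProperty( "foo", this.getFoo() );
    data.setProperty( "bar", this.getBar() );
    ...
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    data.store( out, "" );
    // store(OutputStream, ...) always writes ISO-8859-1
    return new String( out.toByteArray(), StandardCharsets.ISO_8859_1 );
}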
To load from string, do similar using a new Properties object and load() the data.
This is better than Java serialization because it's very readable and compact.
If you need support for different data types (i.e. not just String), use BeanUtils to convert each field to and from a string representation.
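For completeness, here is a minimal sketch of the round trip in both directions. It uses the Writer/Reader overloads of store() and load() instead of the byte stream ones, which avoids the encoding question altogether; the class and method names are made up.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;

// Hypothetical helper for keeping a Properties object in one string column.
final class PropertiesCodec {
    static String toColumn(Properties data) throws IOException {
        StringWriter out = new StringWriter();
        // null means no custom header comment; store() still emits a timestamp comment line.
        data.store(out, null);
        return out.toString();
    }

    static Properties fromColumn(String column) throws IOException {
        Properties data = new Properties();
        data.load(new StringReader(column));
        return data;
    }
}
```

Characters such as '=', ':' and line breaks inside keys or values are escaped by store() and restored by load(), so the delimiter problem from the question does not arise.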

I'd say your initial approach is not all that bad if your POJO consists of Strings and primitive types. You could enforce escaping of the delimiter to prevent corruption. Also, if you use Hibernate, you can encapsulate the serialization in a custom type.
If you do not mind another dependency, Hessian is supposedly a more efficient way of serializing Java objects.
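A minimal sketch of the delimiter escaping mentioned above, assuming '|' as the delimiter and a backslash as the escape character (both arbitrary choices for this example):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical codec: joins fields with '|' and escapes any '|' or '\' inside a field.
final class DelimitedCodec {
    private static final char DELIMITER = '|';
    private static final char ESCAPE = '\\';

    static String join(List<String> fields) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                out.append(DELIMITER);
            }
            for (char c : fields.get(i).toCharArray()) {
                if (c == DELIMITER || c == ESCAPE) {
                    out.append(ESCAPE);
                }
                out.append(c);
            }
        }
        return out.toString();
    }

    static List<String> split(String encoded) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == ESCAPE && i + 1 < encoded.length()) {
                // Take the next character literally, whatever it is.
                current.append(encoded.charAt(++i));
            } else if (c == DELIMITER) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }
}
```

One caveat of this simple form: an empty list and a list holding one empty string both encode to the empty string, so it suits records with a fixed, known number of fields.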

How about the standard JavaBeans persistence mechanism:
java.beans.XMLEncoder
java.beans.XMLDecoder
These are able to create Java POJOs from XML (which have been persisted to XML). From memory, it looks (something) like...
<object class="java.util.HashMap">
  <void method="put">
    <string>Hello</string>
    <float>1</float>
  </void>
</object>
You have to provide PersistenceDelegate classes so that it knows how to persist user-defined classes. Assuming you don't remove any public methods, it is resilient to schema changes.
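A minimal sketch of that round trip using java.beans.XMLEncoder and java.beans.XMLDecoder. The wrapper class BeanXml is made up; a HashMap is used because the JDK already knows how to persist it, so no custom PersistenceDelegate is needed here.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical wrapper around the JavaBeans XML persistence classes.
final class BeanXml {
    static String toXml(Object bean) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the encoder flushes the document and writes the closing tag.
        try (XMLEncoder encoder = new XMLEncoder(buffer)) {
            encoder.writeObject(bean);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    static Object fromXml(String xml) {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        try (XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(bytes))) {
            return decoder.readObject();
        }
    }
}
```

The resulting string can go straight into a text column. For your own POJO, it should work without a delegate as long as the class is a public bean with a public no-argument constructor and getter/setter pairs.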

You can optimize the serialization by making your object Externalizable. That gives you complete control over how it is serialized and improves the performance of the process. This is simple to do as long as your POJO is simple (i.e. doesn't have references to other objects); otherwise you can easily break serialization.
EDIT: Not implying this is the preferred approach, but you are very limited in your options if it is performance critical and you can only use a string column in the table.
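A minimal sketch of an Externalizable class, with a made-up Point type standing in for the POJO. The class writes exactly two ints and nothing else, and it needs a public no-argument constructor because the runtime creates the instance before calling readExternal().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical example type; the real POJO's fields would go here.
final class ExternalizableDemo {
    public static final class Point implements Externalizable {
        private int x;
        private int y;

        // Required: deserialization instantiates the object through this constructor.
        public Point() { }

        public Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        public int x() { return x; }
        public int y() { return y; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(x);
            out.writeInt(y);
        }

        // Must read the fields in exactly the order writeExternal() wrote them.
        @Override
        public void readExternal(ObjectInput in) throws IOException {
            x = in.readInt();
            y = in.readInt();
        }
    }

    static byte[] toBytes(Object obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }
}
```

Since the target is a string column, the resulting bytes would still need a text encoding such as Base64 before being stored.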

If you are using a delimiter you could use a character which you know would never occur in your text such as \0, or special symbols http://unicode.org/charts/symbols.html
However, the time spent sending the data to the database and persisting it is likely to be much larger than the cost of serialization. So I would suggest starting with something simple and easy to read (like XStream), then looking at where your application spends most of its time and optimising that.

I have a certain POJO which needs to be persisted on a database, current design specifies its field as a single string column, and adding additional fields to the table is not an option.
Could you create a new table and put a foreign key into that column!?!? :)
I suspect not, but let's cover all the bases!
Serialization:
We've recently had this discussion so that if our application crashes we can resurrect it in the same state as before. We essentially dispatch a persistence event onto a queue; a consumer then grabs the object, locks it, and serializes it. This seems pretty quick. How much data are you serializing? Can you make any variables transient (i.e. cached variables)? Can you consider splitting up your serialization?
Beware: what happens if your objects change (locking) or your classes change (different serialization ID)? You'll need to upgrade everything that's serialized to the latest classes. Perhaps you only need to store this overnight, so it doesn't matter?
XML:
You could use something like XStream to achieve this. Building something custom is doable (a nice interview question!), but I'd probably not do it myself. Why bother? Remember to handle cyclic links and objects that are referenced more than once. Rebuilding the objects isn't quite so trivial.
Database storage:
If you're using Oracle 10g to store blobs, upgrade to the latest version, since c/blob performance is massively increased. If we're talking large amounts of data, then perhaps zip the output stream?
Is this a realtime app, or will there be pauses of a second or two where you can safely persist the actual object? If you've got time, then you could clone it and persist the clone on another thread. What's the persistence for? Is it critical that it's done inside a transaction?

Consider changing your schema. Even if you find a quick way to serialize a POJO to a string, how do you handle different versions? How do you migrate the database from X->Y? Or worse, from A->D? I am seeing issues where we stored a serialized object in a BLOB field and have to migrate a customer across multiple versions.

Have you looked into JAXB? It is a mechanism by which you can define a suite of java objects that are created from an XML Schema. It allows you to marshal from an object hierarchy to XML or unmarshal the XML back into an object hierarchy.

I'll second the suggestion to use JAXB, or possibly XStream (the former is faster, the latter focuses more on the object serialization part).
Plus, I'll further suggest a decent JSON-based alternative, Jackson (http://jackson.codehaus.org/Tutorial), which can fully serialize/deserialize beans to JSON text to store in the column.
Oh, and I absolutely agree: do not use Java binary serialization under any circumstances for long-term data storage. The same goes for Protocol Buffers; both are too fragile for this purpose (they are better for data transfer between tightly coupled systems).

You might try Preon. Preon aims to be to binary encoded data what Hibernate is to relational databases and JAXB to XML.

Serialize Java objects into Java code

Does somebody know a Java library which serializes a Java object hierarchy into Java code which generates this object hierarchy? Like Object/XML serialization, only that the output format is not binary/XML but Java code.

Serialised data represents the internal data of objects. There isn't enough information to work out what methods you would need to call on the objects to reproduce the internal state.
There are two obvious approaches:
Encode the serialised data in a literal String and deserialise that.
Use java.beans XML persistence, which should be easy enough to process with your favourite XML->Java source technique.
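A minimal sketch of the first approach, assuming the object is Serializable: the serialized bytes are Base64-encoded so they fit safely inside a Java string literal, and a line of source code is generated that decodes them again. The class and method names are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

// Hypothetical helper: serialized object <-> Base64 text usable as a string literal.
final class LiteralCodec {
    static String toLiteral(Serializable obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        // The Base64 alphabet contains no quotes or backslashes, so no escaping is needed.
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    static Object fromLiteral(String literal) throws IOException, ClassNotFoundException {
        byte[] bytes = Base64.getDecoder().decode(literal);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    // Emits one Java statement that recreates the object when compiled and run.
    static String toJavaSource(String variableName, Serializable obj) throws IOException {
        return "Object " + variableName + " = LiteralCodec.fromLiteral(\"" + toLiteral(obj) + "\");";
    }
}
```

The generated code is opaque rather than readable, and a single string constant in a class file is limited to 65535 bytes, so large object graphs would have to be split across several literals.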

I am not aware of any libraries that will do this out of the box but you should be able to take one of the many object to XML serialisation libraries and customise the backend code to generate Java. Would probably not be much code.
For example, a quick google turned up XStream. I've never used it, but it seems to support multiple backends other than XML, e.g. JSON. You can implement your own writer and just write out the Java code needed to recreate the hierarchy.
I'm sure you could do the same with other libraries, in particular if you can hook into a SAX event stream.
See:
HierarchicalStreamWriter

Great question. I was thinking about serializing objects into java code to make testing easier. The use case would be to load some data into a db, then generate the code creating an object and later use this code in test methods to initialize data without the need to access the DB.
It is somehow true that the object state doesn't contain enough info to know how it's been created and transformed, however, for simple java beans there is no reason why this shouldn't be possible.
Do you feel like writing a small library for this purpose? I'll start coding soon!

XStream is a serialization library I used for serialization to XML. It should be possible and rather easy to extend it so that it writes Java code.

Deserialize in a different language

The log4j network adapter sends events as a serialised java object. I would like to be able to capture this object and deserialise it in a different language (python). Is this possible?
NOTE: The network capturing is easy; it's just a TCP socket and reading in a stream. The difficulty is the deserialising part.

Generally, no.
The stream format for Java serialization is defined in this document, but you need access to the original class definitions (and a Java runtime to load them into) to turn the stream data back into something approaching the original objects. For example, classes may define writeObject() and readObject() methods to customise their own serialized form.
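As a small illustration of the documented stream format: every stream produced by ObjectOutputStream starts with a fixed magic number and a version, both exposed as constants in java.io.ObjectStreamConstants. A minimal sketch (the class and method names are made up) that checks captured bytes for that header:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.ObjectStreamConstants;

// Hypothetical check for the 4-byte header of a Java serialization stream.
final class StreamHeader {
    static boolean looksLikeJavaSerialization(byte[] data) throws IOException {
        if (data.length < 4) {
            return false;
        }
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            // Header is two big-endian shorts: magic 0xACED, then version 5.
            return in.readShort() == ObjectStreamConstants.STREAM_MAGIC
                    && in.readShort() == ObjectStreamConstants.STREAM_VERSION;
        }
    }
}
```

A reader in another language can perform the same check on the first four bytes (AC ED 00 05), but everything after the header runs into the class-definition problem described above.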
(edit: lubos hasko suggests having a little java program to deserialize the objects in front of Python, but the problem is that for this to work, your "little java program" needs to load the same versions of all the same classes that it might deserialize. Which is tricky if you're receiving log messages from one app, and really tricky if you're multiplexing more than one log stream. Either way, it's not going to be a little program any more. edit2: I could be wrong here, I don't know what gets serialized. If it's just log4j classes you should be fine. On the other hand, it's possible to log arbitrary exceptions, and if they get put in the stream as well my point stands.)
It would be much easier to customise the log4j network adapter and replace the raw serialization with some more easily-deserialized form (for example you could use XStream to turn the object into an XML representation)

Theoretically, it's possible. The Java Serialization, like pretty much everything in Javaland, is standardized. So, you could implement a deserializer according to that standard in Python. However, the Java Serialization format is not designed for cross-language use, the serialization format is closely tied to the way objects are represented inside the JVM. While implementing a JVM in Python is surely a fun exercise, it's probably not what you're looking for (-:
There are other (data) serialization formats that are specifically designed to be language agnostic. They usually work by stripping the data formats down to the bare minimum (number, string, sequence, dictionary and that's it) and thus requiring a bit of work on both ends to represent a rich object as a graph of dumb data structures (and vice versa).
Two examples are JSON (JavaScript Object Notation) and YAML (YAML Ain't Markup Language).
ASN.1 (Abstract Syntax Notation One) is another data serialization format. Instead of dumbing the format down to a point where it can be easily understood, ASN.1 is self-describing, meaning all the information needed to decode a stream is encoded within the stream itself.
And, of course, XML (eXtensible Markup Language), will work too, provided that it is not just used to provide textual representation of a "memory dump" of a Java object, but an actual abstract, language-agnostic encoding.
So, to make a long story short: your best bet is to either try to coerce log4j into logging in one of the above-mentioned formats, replace log4j with something that does that or try to somehow intercept the objects before they are sent over the wire and convert them before leaving Javaland.
Libraries that implement JSON, YAML, ASN.1 and XML are available for both Java and Python (and pretty much every programming language known to man).

I would recommend moving to a third-party format (by creating your own log4j adapters etc) that both languages understand and can easily marshal / unmarshal, e.g. XML.

In theory it's possible. Now how difficult in practice it might be depends on whether Java serialization format is documented or not. I guess, it's not. edit: oops, I was wrong, thanks Charles.
Anyway, this is what I suggest you do:
Capture the data from log4j and deserialize the Java object in your own little Java program.
Now that you have the object again, serialize it using your own custom formatter.
Tip: Maybe you don't even have to write your own custom formatter. For example, JSON (scroll down for libs) has libraries for Python and Java, so you could in theory use a Java library to serialize your objects and the equivalent Python library to deserialize them.
Send the output stream to your Python application and deserialize it there.
Charles wrote:
the problem is that for this to work, your "little java program" needs to load the same versions of all the same classes that it might deserialize. Which is tricky if you're receiving log messages from one app, and really tricky if you're multiplexing more than one log stream. Either way, it's not going to be a little program any more.
Can't you just simply reference Java log4j libraries in your own java process? I'm just giving general advice here that is applicable to any pair of languages (name of the question is pretty language agnostic so I just provided one of the generic solutions). Anyway, I'm not familiar with log4j and don't know whether you can "inject" your own serializer into it. If you can, then of course your suggestion is much better and cleaner.

Well, I am not a Python expert, so I can't comment on how to solve your problem, but if you have a program in .NET you can use IKVM.NET to deserialize Java objects easily. I have tried this by creating a .NET client for Log4J log messages written to a Socket appender, and it worked really well.
I am sorry, if this answer does not make sense here.

If you can have a JVM on the receiving side and the class definitions for the serialized data, and you only want to use Python and no other language, then you may use Jython:
you would deserialize what you received using the correct Java methods
and then you process what you get with your Python code
