Java: Serialization/Deserialization to/from XML instead of binary

I have a complex set of data models that currently implement java.io.Serializable, and I have successfully serialized and deserialized them with ObjectOutputStream and ObjectInputStream.
However, the results are binary files (as expected), and I was wondering whether Java supports serialization and deserialization in the same manner to a non-binary format, such as XML.
I see that C# has this feature: XML vs Binary performance for Serialization/Deserialization.
Performance (speed/efficiency) is not a consideration in this case.

Further, I would suggest looking at the Simple and XStream frameworks; I found both good. You can go with either one, or maybe XMLEncoder as suggested by Jack.

Yes, Java has it, and it's called XML encoding (java.beans.XMLEncoder and java.beans.XMLDecoder).
The approach is quite similar to normal serialization.
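A minimal sketch of the XML-encoding approach, assuming a simple JavaBean: XMLEncoder and XMLDecoder take the place of ObjectOutputStream and ObjectInputStream, but they only handle JavaBeans (a public class with a public no-arg constructor and getter/setter pairs). The XmlBeanDemo and Person names are hypothetical, chosen for illustration.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class XmlBeanDemo {

    // XMLEncoder works on JavaBeans: a public class with a public no-arg
    // constructor and a getter/setter pair for each property to persist.
    public static class Person {
        private String name;
        private int age;

        public Person() { }

        public Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }
    }

    // Encodes a bean graph as XML text (the counterpart of ObjectOutputStream).
    static String toXml(Object bean) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(out)) {
            encoder.writeObject(bean);
        } // close() flushes and writes the closing </java> tag
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Rebuilds the bean graph from XML (the counterpart of ObjectInputStream).
    static Object fromXml(String xml) {
        ByteArrayInputStream in =
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        // Pass this class's loader explicitly so the decoder can find Person.
        try (XMLDecoder decoder =
                new XMLDecoder(in, null, null, XmlBeanDemo.class.getClassLoader())) {
            return decoder.readObject();
        }
    }

    public static void main(String[] args) {
        String xml = toXml(new Person("Ada", 36));
        System.out.println(xml);
        Person copy = (Person) fromXml(xml);
        System.out.println(copy.getName() + " is " + copy.getAge());
    }
}
```

XMLEncoder only writes properties whose values differ from those of a freshly constructed instance, which is one reason it needs the no-arg constructor.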

Related

Differences between Java Serialization, JSON, JAXB?

Is an object's implementation of the Serializable interface in any way related to that object's ability to be serialized into JSON or XML?
Is there a name for the text format that Java serialization uses?
If not, should we not use the word "serialization" to describe exporting an object to JSON or XML, to avoid confusion?
At AO, what uses are typical for each of these three serialization methods?
I know that JAXB is usually used to convert XML to Java, not the other way around, but I heard that the reverse is possible too.
Serialization simply refers to exporting an object from a process-specific in-memory format to an inter-process format that can be read and understood by a different process or application. It may be text or it may be binary; it doesn't matter. It's all serialization. The reverse process (reading and parsing a serialized inter-process format into an in-memory, in-process format) is called deserialization.
In that sense, serializing an object into an ObjectOutputStream is just as much serialization as serializing it to JSON or XML. ObjectOutputStream serialization is very difficult for non-Java programs (and for humans) to understand or parse; it is not "human-readable". It is used because it works on pretty much any object that implements Serializable, without any special markup.
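To see why the ObjectOutputStream format is not human-readable, here is a small sketch (the class name BinaryPeek is hypothetical) that dumps the bytes Java's built-in serialization produces for a String. Every such stream begins with the magic number 0xACED followed by the stream version 5.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

public class BinaryPeek {

    // Serializes an object with Java's built-in (binary) serialization.
    static byte[] javaSerialize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : javaSerialize("hello")) {
            hex.append(String.format("%02X ", b));
        }
        System.out.println(hex.toString().trim());
    }
}
```

For "hello" this prints AC ED 00 05 74 00 05 68 65 6C 6C 6F: the stream header, a type tag (0x74 for a string), a two-byte length and the characters, which only a reader that knows Java's format can interpret.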
JSON/XML, on the other hand, require extra work to tell the parser how to map objects to and from JSON/XML, but they are very portable: pretty much every language can understand JSON/XML, and so can humans; it is "human-readable".
One purpose of serialization of Java objects is being able to write them to a (binary) file from which some Java program can read them back, getting the same objects into its memory. This usage is usually limited to Java applications, writing and reading, although some non-Java app might be written to understand the binary format.
Another frequently used serialization of Java objects is to write them to a text (or binary) file from which some (note the absence of: Java) program can read and reconstruct an object or data structure equivalent to the POJO. This, of course, also works in the reverse direction. (I'm adding "binary", because there are some binary formats not defined by Java that are architecture-independent, e.g., ASN.1.)
And, yes, JAXB works either way, but there are some difficulties if the XML is rather "outlandish", i.e., far away from what JAXB can handle or handle easily. But if you can design either the XML Schema or the Java classes, it works very well. Since JAXB was part of the JDK (Java 6 through 10; it was removed in Java 11 and is now a separate dependency), you might prefer it over other serializations if you need to go from Java to XML or back. There are other language bindings for XML.

Serialization framework (no no-arg constructor)

I'm looking for some info on the best approach to serialize a graph of objects based on the following requirements (Java):
Two objects of the same class must serialize to binary-equal (bit-by-bit identical) output if their state is equal. (This must not depend on JVM field ordering.)
Collections are only modeled with arrays (no java.util Collections).
All instances are immutable.
The serialization format should be byte[] rather than text-based.
I am in control of all the classes in the graph.
I don't want to put an empty constructor in the classes just to support serialization.
I have looked at implementing a solution based on my own traversal and on Objenesis, but my problem does not seem that unique, so I'd rather check for an existing, complete solution first.
Updated details:
First, thanks for your help!
Objects must serialize to exactly the same bit sequence based on the object's state. This is important since the binary content will be digitally signed. Reconstruction of the serialized format will be based on the state of the object, not on storing the original bits.
Interoperability between different technologies is important. I can see the software running on, e.g., .NET in the future, so there should be no Java flavour in the serialized format.
Note on the comments about immutability: the values of the arrays are copied from the constructor arguments into the inner fields. (Less important.)
Best regards,
Niclas Lindberg
You could write the data yourself, using reflection or hand-coded methods. I use methods which look hand-coded, except they are generated. (You get the performance of hand-coded methods and the convenience of not having to rewrite the code when the classes change.)
Often developers talk about the built-in Java serialization, but you can have custom serialization do whatever you want, any way you want.
To give you a more detailed answer, it would depend on what exactly you want to do.
BTW: You can serialize your data into byte[] and still make it human readable/text like/editable in a text editor. All you have to do is use a binary format which looks like text. ;)
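A minimal sketch of the hand-coded approach for the deterministic requirements in the question, assuming a hypothetical immutable Account class: fields are written in an order fixed by the code (never by reflection), integers are big-endian as DataOutputStream defines them, strings are length-prefixed UTF-8, and arrays are a count followed by the elements. Decoding calls the real constructor, so no no-arg constructor is needed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CanonicalCodec {

    // Hypothetical immutable class: no no-arg constructor, arrays instead of Collections.
    public static final class Account {
        private final String owner;
        private final long[] balances;

        public Account(String owner, long[] balances) {
            this.owner = owner;
            this.balances = balances.clone(); // copy in, so the instance stays immutable
        }

        public String owner() { return owner; }
        public long[] balances() { return balances.clone(); } // copy out as well
    }

    // Field order is fixed by this code, so equal state always yields equal bytes.
    static byte[] encode(Account a) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            byte[] owner = a.owner().getBytes(StandardCharsets.UTF_8);
            out.writeInt(owner.length);        // length prefix, then raw UTF-8
            out.write(owner);
            long[] balances = a.balances();
            out.writeInt(balances.length);     // array = count + elements
            for (long b : balances) {
                out.writeLong(b);              // 8 bytes, big-endian
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the fields back in the same order and calls the real constructor.
    static Account decode(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            byte[] owner = new byte[in.readInt()];
            in.readFully(owner);
            long[] balances = new long[in.readInt()];
            for (int i = 0; i < balances.length; i++) {
                balances[i] = in.readLong();
            }
            return new Account(new String(owner, StandardCharsets.UTF_8), balances);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] first = encode(new Account("Niclas", new long[] {10, 20}));
        byte[] second = encode(new Account("Niclas", new long[] {10, 20}));
        System.out.println(Arrays.equals(first, second)); // equal state, equal bytes
        System.out.println(decode(first).owner());
    }
}
```

Writing the length and raw UTF-8 bytes by hand, rather than using DataOutputStream.writeUTF, avoids Java's modified UTF-8 encoding, which a .NET reader would otherwise have to special-case.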
Maybe you want to familiarize yourself with the serialization frameworks available for Java. A good starting point is the thrift-protobuf-compare project, whose name is misleading: it compares the performance of more than 10 ways of serializing data using Java.
It seems that the hardest constraint you have is interoperability between different technologies. I know that Google's Protocol Buffers and Thrift deliver here. Avro might also fit.
The important thing to know about serialization is that it is not guaranteed to be consistent across multiple versions of Java. It's not meant as a way to store data on a disk or anywhere permanent.
It's used internally to send objects from one JVM to another during RMI or some other network protocol. These are the types of applications that you should use Serialization for. If this describes your problem (short-term communication between two different JVMs), then you should try to get Serialization going.
If you're looking for a way to store the data more permanently or you will need the data to survive in forward versions of Java, then you should find your own solution. Given your requirements, you should create some sort of method of converting each object into a byte stream yourself and reading it back into objects. You will then be responsible for making sure the format is forward compatible with future objects and features.
I highly recommend Chapter 11 of Effective Java by Joshua Bloch.
Is the Externalizable interface what you're looking for? You fully control the way your objects are persisted, and you do it OO-style, with public methods that subclasses inherit (unlike the private readObject/writeObject methods used with Serializable). But you still cannot get rid of the requirement for a public no-arg constructor.
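A small sketch of Externalizable with a hypothetical Point class: the class alone decides what is written and read, but deserialization first creates the instance through its public no-arg constructor and only then calls readExternal.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

public class ExternalDemo {

    public static class Point implements Externalizable {
        private int x; // cannot be final: readExternal assigns it after construction
        private int y;

        // Required: deserialization calls this first, then readExternal().
        public Point() { }

        public Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        public int getX() { return x; }
        public int getY() { return y; }

        // The class decides exactly what goes into the stream, and in what order.
        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(x);
            out.writeInt(y);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            x = in.readInt();
            y = in.readInt();
        }
    }

    // Writes a Point with ObjectOutputStream and reads it back.
    static Point roundTrip(Point p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(p);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Point) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Point copy = roundTrip(new Point(3, 4));
        System.out.println(copy.getX() + "," + copy.getY());
    }
}
```

Because readExternal assigns the fields after construction, they cannot be final, which sits uneasily with the immutability requirement in the question.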
The only way you would get this is:
A/ Use UTF-8 text, i.e., XML or JSON, with binary data converted to Base64 (the HTTP/XML-safe variety).
B/ Enforce a UTF-8 binary ordering of all data.
C/ Pack the contents, removing all unescaped whitespace.
D/ Hash the content and put that hash in a standard position in the file.
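Steps B to D above can be sketched as follows, under simplifying assumptions: the data is a flat map of string fields (a hypothetical restriction; nested values, numbers and the Base64 fields from step A are left out), keys are sorted by their unsigned UTF-8 bytes, the output has no whitespace, only quotes and backslashes are escaped, and the hash is SHA-256. Arrays.compareUnsigned needs Java 9 or later. CanonicalHash is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class CanonicalHash {

    // B/ and C/: keys sorted by unsigned UTF-8 bytes, no whitespace between tokens.
    static String canonicalJson(Map<String, String> fields) {
        Map<String, String> sorted = new TreeMap<>((a, b) -> Arrays.compareUnsigned(
                a.getBytes(StandardCharsets.UTF_8), b.getBytes(StandardCharsets.UTF_8)));
        sorted.putAll(fields);
        StringBuilder json = new StringBuilder("{");
        for (Map.Entry<String, String> e : sorted.entrySet()) {
            if (json.length() > 1) {
                json.append(',');
            }
            json.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
        }
        return json.append('}').toString();
    }

    // Minimal escaping for this sketch: only backslash and double quote.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // D/: SHA-256 of the canonical UTF-8 bytes, as lowercase hex.
    static String sha256Hex(String canonical) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(canonical.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", "Ada");
        fields.put("age", "36");
        String canonical = canonicalJson(fields);
        System.out.println(canonical); // {"age":"36","name":"Ada"}
        System.out.println(sha256Hex(canonical));
    }
}
```

Because the canonical text does not depend on insertion order, equal data always produces the same hash, which is what makes signing it meaningful.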

Can objects be written to and read from files?

I think the answer is yes, but I just wanted to make sure.
Anyone's help would be greatly appreciated.
Yes, it's called serialization. It typically involves creating a String representation of the class's data, and then creating a method which can parse the saved data to recreate an equivalent Object. The code for saving and restoring can either be part of the Object's Class or provided elsewhere in a larger framework.
An object itself can't really be stored to a file. If you want, you can serialize the data in the object to some kind of document, such as an XML file. You can define how the data is stored in it. Then when you want to read it, you just need to open and parse the XML document back into your object, the opposite from how you saved it.
http://java.sys-con.com/node/37550
Serialization is the process of converting an object's state to a sequence of bytes. These bytes can then be stored on disk as a file, sent across sockets, stored in a DB as a BLOB, etc. The inverse process is called deserialization.
Not all objects can be serialized with Java's built-in mechanism, though: only those whose classes implement the Serializable interface.
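A minimal sketch of writing an object to a file and reading it back with Java's built-in serialization; Note and FileRoundTrip are hypothetical names. Passing an object whose class does not implement Serializable makes writeObject throw NotSerializableException.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileRoundTrip {

    // Hypothetical example class; implementing Serializable is what allows it to be written.
    public static class Note implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String text;

        public Note(String text) { this.text = text; }

        public String getText() { return text; }
    }

    // Writes the object graph to the file in Java's binary serialization format.
    static void save(Object obj, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(obj);
        }
    }

    // Reads the object graph back; the class must be loadable at this point.
    static Object load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return in.readObject();
        }
    }

    // Saves to a temporary file, loads it again and deletes the file.
    static Object roundTripViaTempFile(Object obj) {
        try {
            Path file = Files.createTempFile("demo", ".ser");
            try {
                save(obj, file);
                return load(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Note copy = (Note) roundTripViaTempFile(new Note("remember the milk"));
        System.out.println(copy.getText());
    }
}
```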
There are various serialization types, like binary serialization (compact, faster, etc.) and textual serialization (slower, might take more space, but human-readable).
Java's serialization format is not portable and has some problems. There are better alternatives to Java's native serialization, and based on your requirements you can choose the best one. Here are a few: Protocol Buffers, Thrift, JSON, XML, YAML.
Beyond the default JDK serialization already mentioned, and XML serialization (using either the suggested XStream or the faster JAXB, which is included in JDK 6; see package javax.xml.bind), there are many other options.
For example, JSON serialization using Jackson is very efficient and also a bit more compact and readable (the latter is subjective, of course) than XML serialization.
Absolutely! Like others pointed out, it's called serialization. Take a look at the XStream library. I think it's great for serializing to XML. It helped me a lot in my projects, and it's very easy to use.
To use default serialization, the classes must implement the Serializable interface.

What is the best way to interoperably serialize a message?

I'm considering message serialization support for spring-integration. This would be useful for various wire level transports to implement guaranteed delivery, but also to allow interoperability with other messaging systems (e.g. through AMQP).
The fundamental problem that arises is that a message containing Java objects in its payload and headers should be converted to a byte[] and/or written to a stream. Java's own serialization is clearly not going to cut it because it is not interoperable. My preference would be to create an interface that allows the user to implement the needed logic for all Objects that take part in serialization.
This means I don't want to require the client developer to generate his domain code, but rather define a serializer for objects that need it. The interfaces would be something like:
public interface PayloadSerializer<T> {
byte[] bytesForObject(T source);
T objectFromBytes(byte[] bytes);
//similar methods for streaming potentially
}
//add HeaderSerializer, MessageSerializer
Is this a sensible idea and what would the perfect interface look like? Is there a standard interoperable way to serialize Objects that would make sense in this context?
There is a whole set of frameworks in Java which generate XML, like JAXB, ... Some of the formats they produce might just as well be binary blobs and are difficult to use from other platforms, so it pays to try before you buy. There are also very easy-to-use XML serializers.
JSON is very popular nowadays because it gives easy interop with the browser, and it is readable and less verbose than XML.
Protocol Buffers and Thrift are popular when performance is important. These are binary formats but well specified and well supported on multiple platforms.
I would try to serialize the Java objects into an XML representation and convert this into a byte array for stream I/O.
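One way to combine the interface from the question with this XML suggestion, as a sketch: XmlPayloadSerializer is a hypothetical implementation built on java.beans.XMLEncoder/XMLDecoder, so it only covers types those classes can persist (JavaBeans, strings, boxed primitives, standard collections). The bytes are XML text, so a non-Java consumer can read them, though the element vocabulary is still Java-specific.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class XmlPayloadDemo {

    // The interface from the question, with the missing parameter name added.
    public interface PayloadSerializer<T> {
        byte[] bytesForObject(T source);
        T objectFromBytes(byte[] bytes);
    }

    // Hypothetical implementation: XML text via XMLEncoder, returned as its UTF-8 bytes.
    public static final class XmlPayloadSerializer<T> implements PayloadSerializer<T> {
        private final Class<T> type;

        public XmlPayloadSerializer(Class<T> type) {
            this.type = type;
        }

        @Override
        public byte[] bytesForObject(T source) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (XMLEncoder encoder = new XMLEncoder(out)) {
                encoder.writeObject(source);
            }
            return out.toByteArray(); // XMLEncoder writes UTF-8 by default
        }

        @Override
        public T objectFromBytes(byte[] bytes) {
            try (XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(bytes))) {
                return type.cast(decoder.readObject()); // fails fast on an unexpected type
            }
        }
    }

    public static void main(String[] args) {
        PayloadSerializer<String> serializer = new XmlPayloadSerializer<>(String.class);
        byte[] wire = serializer.bytesForObject("order #42 shipped");
        System.out.println(new String(wire, StandardCharsets.UTF_8));
        System.out.println(serializer.objectFromBytes(wire));
    }
}
```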

Unserialize in Java a serialized php object

Does anyone know if it is possible (actually, if it has been done) to serialize an object in PHP and unserialize it in Java (Java-PHP communication)? Maybe an adapter will be needed.
What do you think?
Thanks
There is serialized-php-parser, which is a Java implementation that can parse PHP-serialized objects. In general, if you have the choice, I wouldn't recommend PHP serialization as an exchange format, because it isn't ASCII-safe (it contains null bytes). Go with a format like XML or JSON instead. If you need a bit of type information, XML-RPC is a good choice. It has good implementations for both PHP and Java.
PHP and Java both use their own (obviously different) serialization schemes. You could, however, use an interchange format that both can read and write.
The two most obvious examples are XML and JSON.
There are others, however, such as Google Protocol Buffers.
Another Java project to work with the PHP serialization format is Pherialize.
Let's say you are serializing an array like this:
array(3) {
[0]=>
string(8) "A string"
[1]=>
int(12345)
[2]=>
bool(true)
}
Then you can unserialize it in Java with Pherialize like this:
MixedArray list = Pherialize.unserialize(data).toArray();
System.out.println("Item 1: " + list.getString(0));
System.out.println("Item 2: " + list.getInteger(1));
System.out.println("Item 3: " + list.getBoolean(2));
Theoretically, it's certainly possible. It's just bytes after all, and they can be parsed. Of course, the deserialized object would contain only data, not any of the PHP methods. If you want that, you'd have to rewrite the behaviour as Java classes that correspond directly with the PHP classes.
In practice, the main problem seems to be that the PHP serialization format does not seem to be formally specified - at least there is no link to a specification in the manual.
So you might have to dig through the code to understand the format.
All in all, it sounds like it would be much easier and more stable to use something like XML serialization; I'm sure both languages have libraries that facilitate this.
The JSON format would be a good place to start. There are implementations for Java, PHP and many other languages.
While initially based on the JavaScript object literal notation, JSON proved convenient for lightweight data transfer between all types of systems.
Add this to pom.xml:
<dependency>
<groupId>de.ailis.pherialize</groupId>
<artifactId>pherialize</artifactId>
<version>1.2.1</version>
</dependency>
Then, in code, use:
MixedArray list = Pherialize.unserialize(data).toArray(); // data is the PHP-serialized String
You can somehow make use of PHP's var_export() function for this, which returns a parseable string representation of the object you want to serialize.
I remember a snippet for Drupal (PHP CMS) where this functionality was needed. Just found it, so take a look at Serialized drupal node objects to java (should work with any PHP serialized object).
Maybe you can use that. I don't know whether there are issues with newer versions of PHP.
Serializing an object in PHP will dump the object properties. The resulting string isn't terribly complicated.
echo serialize(
array(1, null, "mystring", array("key"=>"value"))
);
Results in:
a:4:{i:0;i:1;i:1;N;i:2;s:8:"mystring";i:3;a:1:{s:3:"key";s:5:"value";}}
The string identifies datatypes, array lengths, array indexes and values, string lengths... Wouldn't take too much effort to reverse-engineer it and come up with your own parser, I think.
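Building on the format shown above, a minimal hand-written parser might look like this (a sketch, not a complete implementation; PhpUnserializer is a hypothetical name). It handles only N, b, i, d, s and a, maps arrays to a LinkedHashMap and integers to Long, does not support objects (O:) or references (r:/R:), and does no bounds checking on malformed input. The s:<n> length is a byte count, so the parser works on the bytes, assuming the PHP strings are UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class PhpUnserializer {
    private final byte[] in;
    private int pos;

    private PhpUnserializer(byte[] in) {
        this.in = in;
    }

    // Parses PHP serialize() output for the N, b, i, d, s and a types only.
    public static Object parse(String serialized) {
        return new PhpUnserializer(serialized.getBytes(StandardCharsets.UTF_8)).value();
    }

    private Object value() {
        char type = (char) in[pos];
        if (type == 'N') {
            expect("N;");
            return null;
        }
        pos += 2; // skip the type letter and ':'
        switch (type) {
            case 'b':
                return readUntil(';').equals("1");
            case 'i':
                return Long.parseLong(readUntil(';'));
            case 'd':
                return Double.parseDouble(readUntil(';'));
            case 's': {
                int len = Integer.parseInt(readUntil(':'));
                expect("\"");
                // s:<n> counts bytes, not characters, so slice the raw bytes.
                String s = new String(in, pos, len, StandardCharsets.UTF_8);
                pos += len;
                expect("\";");
                return s;
            }
            case 'a': {
                int count = Integer.parseInt(readUntil(':'));
                expect("{");
                Map<Object, Object> map = new LinkedHashMap<>(); // keeps PHP's order
                for (int i = 0; i < count; i++) {
                    Object key = value();
                    map.put(key, value());
                }
                expect("}");
                return map;
            }
            default:
                throw new IllegalArgumentException("Unsupported type '" + type + "' at " + (pos - 2));
        }
    }

    // Returns the ASCII text up to the delimiter and moves past the delimiter.
    private String readUntil(char delimiter) {
        int start = pos;
        while (in[pos] != delimiter) {
            pos++;
        }
        String token = new String(in, start, pos - start, StandardCharsets.US_ASCII);
        pos++;
        return token;
    }

    private void expect(String literal) {
        for (char c : literal.toCharArray()) {
            if (in[pos++] != c) {
                throw new IllegalArgumentException("Expected '" + c + "' at " + (pos - 1));
            }
        }
    }

    public static void main(String[] args) {
        String php = "a:4:{i:0;i:1;i:1;N;i:2;s:8:\"mystring\";i:3;a:1:{s:3:\"key\";s:5:\"value\";}}";
        System.out.println(parse(php)); // {0=1, 1=null, 2=mystring, 3={key=value}}
    }
}
```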
Like previous answers have mentioned, I would avoid PHP object serialization if possible. Use JSON (which is actually faster than serialize() in PHP), thrift or some other format that is more universal.
If you have no choice I have been working on a Jackson Module to enable reading and writing serialized PHP from Java. Jackson is a great JSON parser and since PHP serialization format is pretty similar it seemed like a good fit. It's not quite complete yet (writing is still a work in progress).
A better choice is to parse the PHP-serialized string into a JSONArray; this repo (https://github.com/superalsrk/PhpSerialization) may help you.
Note that there's a Java implementation of PHP. So you may be able to serialise the object and pass it to your Java-PHP instance, deserialise and then call into your Java infrastructure.
It all sounds a bit of an unholy mess, but perhaps worth looking at!
Use Web Services (REST, RPC, SOAP) or any other solution storing plain text that will allow you to read/rebuild the data from Java.
You may also be interested in the PHP/Java Bridge (http://php-java-bridge.sourceforge.net/). It has its own protocol, and its site describes it as a fast bridge implementation.
Try xstream (converts Java objects into readable XML) to serialize and then write your own PHP code to deserialize.
