I'm coming to Java from a PHP background, and am surprised to see that JSON to object conversion is so constrained. In all the Jackson tutorials I came across, it looks like the object to be read needs to be pre-defined. Thus, if my data is in, say, JSON API format, I need to write boilerplate code to strip out everything except the "data" part, and then somehow convert all the strings into objects one by one.
I really miss PHP's json_decode function, which will read any JSON and give you a PHP object to play with. It also builds the necessary structure into the object, adding arrays and sub-objects as needed. Of course I understand that Java is a compiled language, but I'm wondering how this can be made easier.
As a strongly typed language, Java often has fewer of these "just give it to me" features, but that doesn't mean they don't exist. Even Jackson can deserialize JSON without a predefined schema, giving you Maps and Lists instead of domain objects.
Just remember that if you're working on "real" projects, there are plenty of advantages to having the schemas defined. They weren't invented to annoy you, but to make sure that you can trust that your data is in the correct form (or find out early if it's not).
Related
Is an object's implementation of the Serializable interface in any way related to that object's ability to be serialized into JSON or XML?
Is there a name for the text format that Java serialization uses?
If not, should we not use the word "serialization" to describe exporting an object to JSON or XML, to avoid confusion?
At AO, what uses are typical for each of these three serialization methods?
I know that JAXB is usually used to convert XML to Java, not the other way around, but I heard that the reverse is possible too.
Serialization simply refers to exporting an object from a process-specific in-memory format to an inter-process format that can be read and understood by a different process or application. It may be text or it may be binary; it doesn't matter. It's all serialization. The reverse process (reading and parsing a serialized inter-process format into an in-memory, in-process format) is called deserialization.
In that sense, serializing an object into an ObjectStream is just as much serialization as serializing it to JSON or XML. ObjectStream serialization is very difficult for anything other than Java to understand or parse (including humans: it is not "human-readable"), but it is used because it can be done on pretty much any object without any special markup.
JSON and XML, on the other hand, require extra work to tell the parser how to map objects to and from the format, but they are very portable: pretty much every language can understand JSON and XML, and so can humans, since both are "human-readable".
One purpose of serialization of Java objects is being able to write them to a (binary) file from which some Java program can read them back, getting the same objects into its memory. This usage is usually limited to Java applications, writing and reading, although some non-Java app might be written to understand the binary format.
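To make the Java-to-Java case concrete, here is a minimal sketch of a round trip through the built-in object streams. The class and method names (RoundTripDemo, Point, serialize, deserialize) are made up for illustration, and an in-memory byte array stands in for the binary file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class RoundTripDemo {
    // Hypothetical value class; implementing Serializable is the only "markup" needed.
    public static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        public final int x;
        public final int y;

        public Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Writes the object graph rooted at o into a byte array.
    public static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    // Rebuilds an equivalent object from the bytes produced by serialize().
    public static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Point copy = (Point) deserialize(serialize(new Point(3, 4)));
        System.out.println(copy.x + "," + copy.y); // prints 3,4
    }
}
```

The bytes produced this way are only meaningful to a program that has a compatible Point class on its classpath, which is why this form is usually limited to Java on both ends.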
Another frequently used serialization of Java objects is to write them to a text (or binary) file from which some (note the absence of: Java) program can read and reconstruct an object or data structure equivalent to the POJO. This, of course, also works in the reverse direction. (I'm adding "binary", because there are some binary formats not defined by Java that are architecture-independent, e.g., ASN.1.)
And, yes, JAXB works either way, but there are some difficulties if the XML is rather "outlandish", i.e., far away from what JAXB can handle or handle easily. But if you can design either the XML Schema or the Java classes, it works very well. Since JAXB shipped with the JDK (Java 6 through 10; from Java 11 on it is a separate dependency), you might prefer using it over other serializations if you need to go from Java to X or back. There are other language bindings for XML.
I've seen several posts and topics regarding marshaling and serialization and I'm looking to gain some additional understanding/clarify my thought process.
I read "What is the difference between Serialization and Marshaling?" and a lot of the responses show that the two terms are synonymous in a sense. But I think there may be some differences, which I'm trying to clarify.
My understanding is that java serialization takes an object and makes it into a binary stream which can then be deserialized, as shown in the following example http://www.tutorialspoint.com/java/java_serialization.htm
For marshaling/demarshaling, I've seen classes get converted into an xml representation of the bean and have the information passed between a client and server, then recreated on the other end.
Based on the above my question(s) are:
Does serialization always go to binary format? If so, do we have to worry about different machine architectures like big-endian vs. little-endian, or does Java handle this for us?
If we represent our data over the wire as xml or json, is this always referred to as marshaling/demarshaling?
If the above bullets are true, then is there an advantage to one over the other?
I think it is just a matter of taste and context.
Most times I encounter either of the terms, it means that you want to turn an object into a string of 0s and 1s.
But sometimes a specification might attach a slightly different meaning to it.
See the java case on wikipedia.
http://en.wikipedia.org/wiki/Marshalling_(computer_science)
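On the byte-order part of the question: DataOutputStream is documented to write multi-byte values high byte first (big-endian) regardless of the CPU it runs on, and the built-in serialization stream format is likewise independent of the host machine. A small sketch to see this for yourself (ByteOrderDemo and its method names are made up for illustration; the little-endian variant is only there for contrast):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

class ByteOrderDemo {
    // DataOutputStream.writeInt always emits the most significant byte first.
    public static byte[] bigEndian(int value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
        }
        return bytes.toByteArray();
    }

    // For comparison: an explicitly little-endian encoding via ByteBuffer.
    public static byte[] littleEndian(int value) {
        return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(value).array();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(Arrays.toString(bigEndian(0x01020304)));    // [1, 2, 3, 4]
        System.out.println(Arrays.toString(littleEndian(0x01020304))); // [4, 3, 2, 1]
    }
}
```

So byte order only becomes your problem when you define your own binary format and the other side is not using the same conventions.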
I'm looking for some info on the best approach to serialize a graph of objects based on the following (Java):
Two objects of the same class must serialize to identical bytes (compare equal bit by bit) if their state is equal. (Must not depend on JVM field ordering.)
Collections are only modeled with arrays (nothing from the Collections framework).
All instances are immutable
Serialization format should be in byte[] format instead of text based.
I am in control of all the classes in the graph.
I don't want to put an empty constructor in the classes just to support serialization.
I have looked at implementing a solution based on my own traversal and on Objenesis, but my problem does not seem that unique. Better to check for an existing, complete solution first.
Updated details:
First, thanks for your help!
Objects must serialize to exactly the same bit sequence based on the object's state. This is important since the binary content will be digitally signed. The serialized form will be reconstructed from the state of the object; the original bits are not stored.
Interoperability between different technologies is important. I can see the software running on, e.g., .NET in the future. No Java flavour in the serialized format.
Note on comments of immutability: The values of the arrays are copied from the argument to the inner fields in the constructor. Less important.
Best regards,
Niclas Lindberg
You could write the data yourself, using reflection or hand-coded methods. I use methods which look hand-coded, except that they are generated. (You get the performance of hand-coded methods and the convenience of not having to rewrite the code when the classes change.)
Developers often talk about the built-in Java serialization, but you can have a custom serialization that does whatever you want, any way you want.
To give you a more detailed answer, it would depend on what exactly you want to do.
BTW: You can serialize your data into byte[] and still make it human readable/text like/editable in a text editor. All you have to do is use a binary format which looks like text. ;)
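A hand-coded version of this could look roughly like the sketch below. Person, toBytes and fromBytes are made-up names, and the layout (length-prefixed UTF-8 string, then a length-prefixed int array, everything big-endian) is just one possible choice, not a standard. The field order is fixed in code, so equal state always produces equal bytes, and reading goes through the normal constructor, so no empty constructor is needed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical immutable class with an array field, as in the question.
final class Person {
    private final String name;
    private final int[] scores;

    public Person(String name, int[] scores) {
        this.name = name;
        this.scores = scores.clone(); // defensive copy keeps the instance immutable
    }

    // Fields are written in a fixed order chosen here, not by the JVM.
    public byte[] toBytes() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        out.writeInt(utf8.length);
        out.write(utf8);
        out.writeInt(scores.length);
        for (int score : scores) {
            out.writeInt(score);
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Reads the same layout back and calls the regular constructor.
    public static Person fromBytes(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        byte[] utf8 = new byte[in.readInt()];
        in.readFully(utf8);
        int[] scores = new int[in.readInt()];
        for (int i = 0; i < scores.length; i++) {
            scores[i] = in.readInt();
        }
        return new Person(new String(utf8, StandardCharsets.UTF_8), scores);
    }
}
```

Because nothing Java-specific ends up in the bytes, a reader on another platform only needs to know the layout.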
Maybe you want to familiarize yourself with the serialization frameworks available for Java. A good starting point for that is the thrift-protobuf-compare project, whose name is misleading: it compares the performance of more than 10 ways of serializing data using Java.
It seems that the hardest constraint you have is interoperability between different technologies. I know that Google's Protocol Buffers and Thrift deliver here. Avro might also fit.
The important thing to know about serialization is that it is not guaranteed to be consistent across multiple versions of Java. It's not meant as a way to store data on a disk or anywhere permanent.
It's used internally to send objects from one JVM to another during RMI or some other network protocol. These are the types of applications that you should use Serialization for. If this describes your problem (short-term communication between two different JVMs), then you should try to get Serialization going.
If you're looking for a way to store the data more permanently or you will need the data to survive in forward versions of Java, then you should find your own solution. Given your requirements, you should create some sort of method of converting each object into a byte stream yourself and reading it back into objects. You will then be responsible for making sure the format is forward compatible with future objects and features.
I highly recommend Chapter 11 of Effective Java by Joshua Bloch.
Is the Externalizable interface what you're looking for? You fully control the way your objects are persisted, and you do it OO-style, with methods that are inherited and all (unlike the private readObject/writeObject methods used with Serializable). But still, you cannot get rid of the requirement for an accessible no-arg constructor.
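For reference, a minimal Externalizable sketch. ExternalizableDemo, Temperature and roundTrip are made-up names. Note the public no-arg constructor and the non-final field: the instance is created empty first and filled in by readExternal, which is exactly what clashes with the immutability and no-empty-constructor wishes in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

class ExternalizableDemo {
    // Hypothetical class that decides for itself what gets written.
    public static class Temperature implements Externalizable {
        private double celsius; // cannot be final: readExternal assigns it

        public Temperature() {
            // required: deserialization calls this before readExternal
        }

        public Temperature(double celsius) {
            this.celsius = celsius;
        }

        public double celsius() {
            return celsius;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeDouble(celsius);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            celsius = in.readDouble();
        }
    }

    // Writes the object through the standard object streams and reads it back.
    public static Temperature roundTrip(Temperature t) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(t);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Temperature) in.readObject();
        }
    }
}
```

The stream still carries Java-specific headers and the class name, so this alone does not give you a format without "Java flavour".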
The only way you would get this is:
A/ Use UTF-8 text, i.e. XML or JSON, with binary data converted to Base64 (the HTTP/XML-safe variety).
B/ Enforce UTF-8 binary ordering of all data.
C/ Pack the contents, removing all unescaped white space.
D/ Hash the content and provide that hash in a positionally standard location in the file.
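Steps C and D could be sketched like this for JSON text. CanonicalHashDemo, pack and sha256Hex are made-up names, SHA-256 is my own pick for the hash (the answer doesn't name one), and the ordering from step B is left out, so keys are assumed to be in canonical order already.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class CanonicalHashDemo {
    // Step C: drop white space that sits outside JSON string literals.
    public static String pack(String json) {
        StringBuilder packed = new StringBuilder();
        boolean inString = false;
        boolean escaped = false;
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                packed.append(c);
                if (escaped) {
                    escaped = false;      // character after a backslash is taken literally
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;     // unescaped quote closes the literal
                }
            } else if (c == '"') {
                inString = true;
                packed.append(c);
            } else if (!Character.isWhitespace(c)) {
                packed.append(c);
            }
        }
        return packed.toString();
    }

    // Step D: hash the UTF-8 bytes of the packed text, rendered as lowercase hex.
    public static String sha256Hex(String text) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(text.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Two documents that differ only in insignificant white space then pack to the same text and therefore get the same hash.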
I am trying to create and parse JSON, and I get by with some samples found on Google/SO or trial-and-error. But I need some help with JSON basics, parsing, creating arrays inside JSON strings, and so on. I read about the JSONStringer and such, but I need information about parsing and creating complex JSON.
EDIT: I use Java.
Thanks.
The first step typically is to look beyond the bare-bones Java lib from org.json; other related questions therefore are, for example:
https://stackoverflow.com/questions/338586/a-better-java-json-library
https://stackoverflow.com/questions/1668862/good-json-java-library
The reason for this is that there is no point in worrying too much about low-level details; rather you usually want to operate either with Java collections (List, Maps, wrapper types) or with basic Java objects. Other libraries can offer such abstractions.
My personal favorite is Jackson, and its tutorial is found here.
which language-script?
For example, if you are using JavaScript, jQuery offers you a few functions for JSON (http://api.jquery.com/jQuery.parseJSON/).
There isn't much to it. You got objects, arrays and primitives such as string, number, boolean and null. The syntax can be picked up by googling JSON.
The handling of JSON is more down to frameworks and server - are you translating a server side domain model to JSON? What server technology?
Client side pretty much any decent framework has helper methods for parsing JSON to get around certain browser differences (native JSON parsing being one). Check out jQuery.getJSON.
You can learn about JSON here.
On the Java side, you should actually not be writing/parsing JSON yourself. That's only a lot of tedious work and a waste of effort since there are plenty of libraries for this. Just pick a library which is able to convert a complex Java object to a JSON string (and vice versa) in a single call. This way you can concentrate on writing clean Java code, not on fiddling with JSON syntax in plain Java strings.
See also:
Converting complex JSON to Java
How to use Ajax/JSON in JSP/Servlet
Does anyone know whether it is possible (or better, whether it has already been done) to serialize an object in PHP and unserialize it in Java (Java-PHP communication)? Maybe an adapter will be needed.
What do you think?
Thanks
There is serialized-php-parser, which is a Java implementation that can parse php-serialized objects. In general, if you have the choice, I wouldn't recommend php-serialized as an exchange format, because it isn't ascii-safe (It contains null-bytes). Go with a format like xml or json instead. If you need a bit of type-information, xmlrpc is a good choice. It has good implementations for both php and Java.
PHP and Java both use their own (obviously different) serialization schemes. You could however use an interchange format both could read and write.
The two most obvious examples are XML and JSON.
There are others however such as Google Protocol Buffers.
Another Java project to work with the PHP serialization format is Pherialize.
Let's say you are serializing an array like this:
array(3) {
  [0]=>
  string(8) "A string"
  [1]=>
  int(12345)
  [2]=>
  bool(true)
}
Then you can unserialize it in Java with Pherialize like this:
MixedArray list = Pherialize.unserialize(data).toArray();
System.out.println("Item 1: " + list.getString(0));
System.out.println("Item 2: " + list.getInteger(1));
System.out.println("Item 3: " + list.getBoolean(2));
Theoretically, it's certainly possible. It's just bytes after all, and they can be parsed. Of course, the deserialized object would contain only data, not any of the PHP methods. If you want that, you'd have to rewrite the behaviour as Java classes that correspond directly with the PHP classes.
In practice, the main problem seems to be that the PHP serialization format does not seem to be formally specified - at least there is no link to a specification in the manual.
So you might have to dig through the code to understand the format.
All in all, it sounds like it would be much easier and more stable to use something like XML serialization - I'm sure both languages have libraries that facilitate this.
The JSON format would be a good place to start. There are implementations for Java, PHP and many other languages.
While initially based on the JavaScript object literal notation, JSON proved convenient for lightweight data transfer between all types of systems.
Add this to pom.xml:
<dependency>
    <groupId>de.ailis.pherialize</groupId>
    <artifactId>pherialize</artifactId>
    <version>1.2.1</version>
</dependency>
Then, in code, use:
MixedArray list = Pherialize.unserialize(data).toArray(); // data is a String
You can somehow make use of PHP's var_export() function for this, which returns a parseable string representation of the object you want to serialize.
I remember a snippet for Drupal (PHP CMS) where this functionality was needed. Just found it, so take a look at Serialized drupal node objects to java (should work with any PHP serialized object).
Maybe you can use that. I don't know whether there are issues with newer versions of PHP.
Serializing an object in PHP will dump the object properties. The resulting string isn't terribly complicated.
echo serialize(
array(1, null, "mystring", array("key"=>"value"))
);
Results in:
a:4:{i:0;i:1;i:1;N;i:2;s:8:"mystring";i:3;a:1:{s:3:"key";s:5:"value";}}
The string identifies datatypes, array lengths, array indexes and values, string lengths... Wouldn't take too much effort to reverse-engineer it and come up with your own parser, I think.
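To give an idea of the effort, here is a rough sketch of such a parser. PhpUnserializeSketch is a made-up name, and the sketch only covers null, booleans, integers, strings and arrays. It assumes well-formed, ASCII-only input: PHP writes string lengths in bytes, so multi-byte characters would need the input handled as a byte array instead. Arrays come back as a LinkedHashMap, integers as Long.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PhpUnserializeSketch {
    private final String s;
    private int pos = 0;

    private PhpUnserializeSketch(String s) {
        this.s = s;
    }

    public static Object parse(String serialized) {
        return new PhpUnserializeSketch(serialized).value();
    }

    // Reads one value starting at pos and leaves pos just behind it.
    private Object value() {
        char type = s.charAt(pos);
        switch (type) {
            case 'N':                       // N;
                pos += 2;
                return null;
            case 'b':                       // b:0; or b:1;
                pos += 2;
                return readUntil(';').equals("1");
            case 'i':                       // i:<digits>;
                pos += 2;
                return Long.parseLong(readUntil(';'));
            case 's': {                     // s:<length>:"<characters>";
                pos += 2;
                int length = Integer.parseInt(readUntil(':'));
                pos++;                      // skip the opening quote
                String text = s.substring(pos, pos + length);
                pos += length + 2;          // skip the text, the closing quote and ';'
                return text;
            }
            case 'a': {                     // a:<count>:{<key><value>...}
                pos += 2;
                int count = Integer.parseInt(readUntil(':'));
                pos++;                      // skip '{'
                Map<Object, Object> map = new LinkedHashMap<>();
                for (int i = 0; i < count; i++) {
                    Object key = value();
                    map.put(key, value());
                }
                pos++;                      // skip '}'
                return map;
            }
            default:
                throw new IllegalArgumentException("Unsupported type '" + type + "' at " + pos);
        }
    }

    // Returns the text up to the delimiter and moves pos past the delimiter.
    private String readUntil(char delimiter) {
        int end = s.indexOf(delimiter, pos);
        String token = s.substring(pos, end);
        pos = end + 1;
        return token;
    }

    public static void main(String[] args) {
        System.out.println(parse(
            "a:4:{i:0;i:1;i:1;N;i:2;s:8:\"mystring\";i:3;a:1:{s:3:\"key\";s:5:\"value\";}}"));
    }
}
```

Objects, floats and references are left out here, and that is where most of the remaining work would be.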
Like previous answers have mentioned, I would avoid PHP object serialization if possible. Use JSON (which is actually faster than serialize() in PHP), thrift or some other format that is more universal.
If you have no choice, I have been working on a Jackson module to enable reading and writing serialized PHP from Java. Jackson is a great JSON parser, and since the PHP serialization format is pretty similar, it seemed like a good fit. It's not quite complete yet (writing is still a work in progress).
A better choice is to parse php serialized string to JSONArray, this repo (https://github.com/superalsrk/PhpSerialization) may help you
Note that there's a Java implementation of PHP. So you may be able to serialise the object and pass it to your Java-PHP instance, deserialise and then call into your Java infrastructure.
It all sounds a bit of an unholy mess, but perhaps worth looking at!
Use Web Services (REST, RPC, SOAP) or any other solution storing plain text that will allow you to read/rebuild the data from Java.
You may also be interested in using the PHP/Java bridge (http://php-java-bridge.sourceforge.net/). It has its own protocol. Their site says that it is a fast implementation of a bridge.
Try xstream (converts Java objects into readable XML) to serialize and then write your own PHP code to deserialize.