Deserialize in a different language - java

The log4j network adapter sends events as a serialised java object. I would like to be able to capture this object and deserialise it in a different language (python). Is this possible?
NOTE: The network capturing is easy; it's just a TCP socket and reading in a stream. The difficulty is the deserialising part.

Generally, no.
The stream format for Java serialization is defined in this document, but you need access to the original class definitions (and a Java runtime to load them into) to turn the stream data back into something approaching the original objects. For example, classes may define writeObject() and readObject() methods to customise their own serialized form.
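To make the dependence on class definitions concrete, here is a minimal sketch that serializes a small object and inspects the first bytes of the stream. The `Event` class is a hypothetical stand-in, not a log4j class. Every stream opens with a two-byte magic number (0xACED) and a two-byte version, followed by a class descriptor that names the Java class; a reader in another language can parse that much, but it still has to know what the named class means.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class StreamHeaderDemo {
    // Hypothetical stand-in for a logged event; not a log4j class.
    static final class Event implements Serializable {
        private static final long serialVersionUID = 1L;
        final String message;

        Event(String message) {
            this.message = message;
        }
    }

    // Serializes any object with the standard Java mechanism.
    static byte[] serialize(Object o) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // The stream starts with two big-endian 16-bit fields: magic, then version.
    static int magic(byte[] b) {
        return ((b[0] & 0xFF) << 8) | (b[1] & 0xFF);
    }

    static int version(byte[] b) {
        return ((b[2] & 0xFF) << 8) | (b[3] & 0xFF);
    }

    // True if the fully qualified class name is embedded in the stream.
    static boolean mentionsClass(byte[] b, Class<?> type) {
        return new String(b, StandardCharsets.ISO_8859_1).contains(type.getName());
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(new Event("hello"));
        System.out.printf("magic=0x%04X version=%d%n", magic(bytes), version(bytes));
        System.out.println("names its class: " + mentionsClass(bytes, Event.class));
    }
}
```

A Python reader could check the same four header bytes easily; the hard part is everything after the class descriptor.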
(edit: lubos hasko suggests having a little java program to deserialize the objects in front of Python, but the problem is that for this to work, your "little java program" needs to load the same versions of all the same classes that it might deserialize. Which is tricky if you're receiving log messages from one app, and really tricky if you're multiplexing more than one log stream. Either way, it's not going to be a little program any more. edit2: I could be wrong here, I don't know what gets serialized. If it's just log4j classes you should be fine. On the other hand, it's possible to log arbitrary exceptions, and if they get put in the stream as well my point stands.)
It would be much easier to customise the log4j network adapter and replace the raw serialization with some more easily-deserialized form (for example you could use XStream to turn the object into an XML representation)

Theoretically, it's possible. Java Serialization, like pretty much everything in Javaland, is standardized, so you could implement a deserializer according to that standard in Python. However, the Java Serialization format is not designed for cross-language use; it is closely tied to the way objects are represented inside the JVM. While implementing a JVM in Python is surely a fun exercise, it's probably not what you're looking for (-:
There are other (data) serialization formats that are specifically designed to be language agnostic. They usually work by stripping the data formats down to the bare minimum (number, string, sequence, dictionary and that's it) and thus requiring a bit of work on both ends to represent a rich object as a graph of dumb data structures (and vice versa).
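The "bit of work on both ends" can be sketched roughly as follows. The `LogEntry` class and its field names are hypothetical; the point is only that a rich object is flattened into maps, lists and strings, which any JSON or YAML library can then encode, and rebuilt from them on the other side.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PlainDataDemo {
    // Hypothetical rich object; the field names are illustrative only.
    static final class LogEntry {
        final String logger;
        final String level;
        final String message;
        final List<String> stackTrace;

        LogEntry(String logger, String level, String message, List<String> stackTrace) {
            this.logger = logger;
            this.level = level;
            this.message = message;
            this.stackTrace = new ArrayList<>(stackTrace);
        }
    }

    // Rich object -> graph of "dumb" structures (dictionary, sequence, string).
    static Map<String, Object> toPlain(LogEntry e) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("logger", e.logger);
        m.put("level", e.level);
        m.put("message", e.message);
        m.put("stackTrace", new ArrayList<>(e.stackTrace));
        return m;
    }

    // Dumb structures -> rich object again.
    static LogEntry fromPlain(Map<String, Object> m) {
        List<String> trace = new ArrayList<>();
        for (Object line : (List<?>) m.get("stackTrace")) {
            trace.add(String.valueOf(line));
        }
        return new LogEntry((String) m.get("logger"), (String) m.get("level"),
                (String) m.get("message"), trace);
    }

    public static void main(String[] args) {
        LogEntry e = new LogEntry("app.db", "ERROR", "connection lost",
                List.of("at app.Db.connect(Db.java:42)"));
        System.out.println(toPlain(e));
    }
}
```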
Two examples are JSON (JavaScript Object Notation) and YAML (YAML Ain't Markup Language).
ASN.1 (Abstract Syntax Notation One) is another data serialization format. Instead of dumbing the format down to a point where it can be easily understood, ASN.1 encodings such as BER tag every value with its type and length, so the structure of a stream can be decoded from the stream itself (although you still need the schema to know what the fields mean).
And, of course, XML (eXtensible Markup Language), will work too, provided that it is not just used to provide textual representation of a "memory dump" of a Java object, but an actual abstract, language-agnostic encoding.
So, to make a long story short: your best bet is to either try to coerce log4j into logging in one of the above-mentioned formats, replace log4j with something that does that or try to somehow intercept the objects before they are sent over the wire and convert them before leaving Javaland.
Libraries that implement JSON, YAML, ASN.1 and XML are available for both Java and Python (and pretty much every programming language known to man).

I would recommend moving to a third-party format (by creating your own log4j adapters etc) that both languages understand and can easily marshal / unmarshal, e.g. XML.

In theory it's possible. Now how difficult in practice it might be depends on whether Java serialization format is documented or not. I guess, it's not. edit: oops, I was wrong, thanks Charles.
Anyway, this is what I suggest you do:
Capture from log4j and deserialize the Java object in your own little Java program.
Now that you have the object again, serialize it using your own custom formatter.
Tip: Maybe you don't even have to write your own custom formatter. For example, JSON (scroll down for libs) has libraries for Python and Java, so you could in theory use a Java library to serialize your objects and the equivalent Python library to deserialize them.
Send the output stream to your Python application and deserialize it there.
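The steps above can be sketched as a small bridge. This is only an outline under assumptions: `Event` is a hypothetical class standing in for whatever log4j actually puts on the wire, and the JSON is written by hand to keep the example free of third-party libraries (a real bridge would more likely use a JSON library).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class BridgeDemo {
    // Hypothetical event type; the real class must be on the bridge's classpath.
    static final class Event implements Serializable {
        private static final long serialVersionUID = 1L;
        final String level;
        final String message;

        Event(String level, String message) {
            this.level = level;
            this.message = message;
        }
    }

    // Minimal JSON string escaping for quotes, backslashes and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Steps 1 and 2: deserialize the Java object, then re-serialize it as one JSON line.
    static String toJsonLine(InputStream in) {
        try (ObjectInputStream ois = new ObjectInputStream(in)) {
            Event e = (Event) ois.readObject();
            return "{\"level\":" + quote(e.level) + ",\"message\":" + quote(e.message) + "}";
        } catch (IOException | ClassNotFoundException ex) {
            throw new IllegalStateException("cannot decode event", ex);
        }
    }

    // Helper that plays the role of the sending application in this sketch.
    static byte[] serialize(Event e) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(e);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        byte[] wire = serialize(new Event("INFO", "say \"hi\""));
        // Step 3: this line is what would be sent on to the Python process.
        System.out.println(toJsonLine(new ByteArrayInputStream(wire)));
    }
}
```

On the Python side, each line could then be decoded with the standard `json` module.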
Charles wrote:
"the problem is that for this to work, your "little java program" needs to load the same versions of all the same classes that it might deserialize. Which is tricky if you're receiving log messages from one app, and really tricky if you're multiplexing more than one log stream. Either way, it's not going to be a little program any more."
Can't you just simply reference Java log4j libraries in your own java process? I'm just giving general advice here that is applicable to any pair of languages (name of the question is pretty language agnostic so I just provided one of the generic solutions). Anyway, I'm not familiar with log4j and don't know whether you can "inject" your own serializer into it. If you can, then of course your suggestion is much better and cleaner.

Well, I am not a Python expert, so I can't comment on how to solve your problem, but if you have a program in .NET you may use IKVM.NET to deserialize Java objects easily. I tried this by creating a .NET client for Log4J log messages written to a socket appender, and it worked really well.
I am sorry, if this answer does not make sense here.

If you can have a JVM on the receiving side and the class definitions for the serialized data, and you only want to use Python and no other language, then you may use Jython:
you would deserialize what you received using the correct Java methods
and then you process what you get with your Python code

Related

Java Serialization and references

Say I have a serialized class that is used to save the state of my game. This serialized text is stored in a text file. If I restart my computer, reinstall Java, etc., and then try to deserialize that text, will it restore everything that the class references? For the purpose of the question, assume the class has multiple ArrayLists of entities and map elements.
Class -> Serialization -> Text
Text -> Deserialization -> Class
As long as the serialized classes don't change their definitions, there won't be any problem. You may even move these serialized files to another OS that deserializes to the same class definitions and it will work with no problem (unless you use libraries specific to an OS, thus breaking portability).
The serialized file will retain its state regardless of whether or not you reinstall Java or restart your machine. It would be pretty useless otherwise: the point of serialization is to capture state in a persistent form for archival or transport in a fashion that can be used to recreate that state later.
So, assuming the serialization method you are using originally saves all the references you care about, then you'll always be able to restore that object and all its references from that serialized data. Unless, perhaps, you try in five years using a new version of Java that no longer supports that serialization format or something.
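A minimal sketch of that round trip, assuming hypothetical `GameState` and `Entity` classes: the whole reachable object graph is written, and a reference that is shared before saving is still shared after loading. The bytes are kept in memory here; writing them to a file and reading them back after a restart works the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class SaveGameDemo {
    // Hypothetical game classes; a fixed serialVersionUID keeps old saves readable
    // as long as the class changes stay compatible.
    static final class Entity implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;

        Entity(String name) {
            this.name = name;
        }
    }

    static final class GameState implements Serializable {
        private static final long serialVersionUID = 1L;
        final List<Entity> entities = new ArrayList<>();
        Entity player; // also stored inside entities
    }

    static byte[] save(GameState state) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    static GameState load(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (GameState) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("class definition missing", e);
        }
    }

    public static void main(String[] args) {
        GameState state = new GameState();
        state.player = new Entity("hero");
        state.entities.add(state.player);
        state.entities.add(new Entity("goblin"));

        GameState restored = load(save(state));
        System.out.println("entities restored: " + restored.entities.size());
        System.out.println("player is still the first entity: "
                + (restored.player == restored.entities.get(0)));
    }
}
```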
Are you asking about the standard Java serialization mechanism? It serializes data to a binary format, not text. Java keeps serialization/deserialization compatible between versions as long as you use only standard library classes. So yes, in your case serialization will work fine.
But I'm not sure that standard Java serialization is good for your purposes, because:
It's a completely unreadable format.
If you change language later, you must fully reimplement the save format.
As an alternative, you can use one of the XML serialization libraries for Java. In that case, the save file will have a readable format (good for debugging).

Serializing objects with the Java Preferences API

When I first started using the Java Preferences API, the one glaring omission from the API was a putObject() method. I've always wondered why they did not include it.
So, I did some googling and I found this article from IBM which shows you how to do it: http://www.ibm.com/developerworks/library/j-prefapi/
The method they're using seems a bit hackish to me, because you have to break the Object up into byte matrices, store them, and reassemble them later.
My question is, has anyone tried this approach? Can you testify that it is a good way to store/retrieve objects?
I'm also curious why the Java devs left putObject() out of the API. Does anyone have valuable insight?
I'm also curious why the Java devs left putObject() out of the API.
Does anyone have valuable insight?
From: http://docs.oracle.com/javase/7/docs/technotes/guides/preferences/designfaq.html
Why doesn't this API contain methods to read and write arbitrary
serializable objects?
Serialized objects are somewhat fragile: if the version of the program
that reads such a property differs from the version that wrote it, the
object may not deserialize properly (or at all). It is not impossible
to store serialized objects using this API, but we do not encourage
it, and have not provided a convenience method.
The article describes a reliable way to do it. I see there are a couple of things I may do differently (like I would store the count of the number of pieces as well as the pieces themselves so that I can figure things out easily when I retrieve them).
Your comment about Serialization is wrong though: the object you want to store has to be Serializable. That's how the ObjectOutputStream that the document uses does its job.
So, yes, it looks like a reliable mechanism, provided you have Serializable objects. I imagine that putObject and getObject are not part of the API for two reasons:
It's not part of the way that is native to Windows registries.
It risks people putting huge amounts of data in the registry.
Storing serialized objects in the registry strikes me as being somewhat concerning because they can be so big. I would only use it for occasions when there is no way to reconstruct the Object from constructors, and the serialized version is relatively small.
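The piece-plus-count idea mentioned above could look roughly like this. It is a sketch, not the code from the IBM article: the key naming scheme (`key.count`, `key.0`, `key.1`, ...) is an assumption made for illustration. The input is assumed to be the bytes produced by an ObjectOutputStream; the API limits a byte array stored with `putByteArray` to three quarters of `Preferences.MAX_VALUE_LENGTH`, which is why the data has to be split.

```java
import java.util.Arrays;
import java.util.prefs.Preferences;

final class PrefsChunkDemo {
    // Largest byte[] that putByteArray accepts for a single key.
    static final int CHUNK = Preferences.MAX_VALUE_LENGTH * 3 / 4;

    // Splits data into pieces of at most chunkSize bytes.
    static byte[][] split(byte[] data, int chunkSize) {
        int count = (data.length + chunkSize - 1) / chunkSize;
        byte[][] pieces = new byte[count][];
        for (int i = 0; i < count; i++) {
            int from = i * chunkSize;
            pieces[i] = Arrays.copyOfRange(data, from, Math.min(data.length, from + chunkSize));
        }
        return pieces;
    }

    // Concatenates the pieces back into one array.
    static byte[] join(byte[][] pieces) {
        int total = 0;
        for (byte[] p : pieces) {
            total += p.length;
        }
        byte[] out = new byte[total];
        int pos = 0;
        for (byte[] p : pieces) {
            System.arraycopy(p, 0, out, pos, p.length);
            pos += p.length;
        }
        return out;
    }

    // Stores the piece count first, then each piece under its own key (assumed naming).
    static void putBytes(Preferences node, String key, byte[] data) {
        byte[][] pieces = split(data, CHUNK);
        node.putInt(key + ".count", pieces.length);
        for (int i = 0; i < pieces.length; i++) {
            node.putByteArray(key + "." + i, pieces[i]);
        }
    }

    // Reads the count, then exactly that many pieces.
    static byte[] getBytes(Preferences node, String key) {
        int count = node.getInt(key + ".count", 0);
        byte[][] pieces = new byte[count][];
        for (int i = 0; i < count; i++) {
            pieces[i] = node.getByteArray(key + "." + i, new byte[0]);
        }
        return join(pieces);
    }

    public static void main(String[] args) {
        // Exercises only the pure split/join logic, so nothing is written to the store.
        byte[] data = new byte[15000];
        byte[][] pieces = split(data, CHUNK);
        System.out.println("pieces: " + pieces.length);
        System.out.println("round trip ok: " + Arrays.equals(data, join(pieces)));
    }
}
```

Storing the count makes retrieval a simple loop and avoids probing for keys until one is missing.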

Transferring typed objects across platforms

I'd like to create a web API of some kind (I don't have a preference for the protocol), where the server uses Java and the client uses PHP.
I want the request and response to both be objects (instances of classes, not JSON-style hashes). The objects' fields can be primitive types or other objects. I would define all the necessary classes in both the client and server code. PHP and Java have similar object models, so it shouldn't be hard to write corresponding classes in both languages.
To make this work, there would need to be some automated way to serialize an object on one side, and unserialize it on the other. It would need to know which PHP class maps to which Java class, and how to convert the fields. I could write something, but is there an existing protocol for transferring objects like this? Can this be done with SOAP?
Java and PHP objects are not interchangeable. You will have to define the object types on both ends, and the transfer protocol could be anything you like. Serialization and deserialization makes the whole process transparent. The transport medium could be JSON, XML, YAML, or anything else for that matter.
For record-like objects:
{"_type":"MyCoolObjectType", "a":1, "b":2, "c":3}
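On the Java side, the `_type` field can drive a small lookup table that maps type names to constructors. This is a sketch under assumptions: the record is taken to be already parsed into a `Map` by whatever JSON library is in use, and the registry and class below are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class TypeRegistryDemo {
    // Hypothetical Java counterpart of the PHP class with the same name.
    static final class MyCoolObjectType {
        final int a;
        final int b;
        final int c;

        MyCoolObjectType(int a, int b, int c) {
            this.a = a;
            this.b = b;
            this.c = c;
        }
    }

    // Maps the "_type" tag to a factory that builds the typed object.
    private static final Map<String, Function<Map<String, Object>, Object>> FACTORIES =
            new HashMap<>();

    static {
        FACTORIES.put("MyCoolObjectType", m ->
                new MyCoolObjectType(intField(m, "a"), intField(m, "b"), intField(m, "c")));
    }

    // JSON parsers usually hand numbers back as some Number subtype.
    static int intField(Map<String, Object> m, String key) {
        return ((Number) m.get(key)).intValue();
    }

    // Turns a parsed record into an instance of the class its "_type" names.
    static Object decode(Map<String, Object> record) {
        Object type = record.get("_type");
        Function<Map<String, Object>, Object> factory = FACTORIES.get(type);
        if (factory == null) {
            throw new IllegalArgumentException("unknown _type: " + type);
        }
        return factory.apply(record);
    }

    public static void main(String[] args) {
        Map<String, Object> record = new HashMap<>();
        record.put("_type", "MyCoolObjectType");
        record.put("a", 1);
        record.put("b", 2);
        record.put("c", 3);
        MyCoolObjectType o = (MyCoolObjectType) decode(record);
        System.out.println(o.a + " " + o.b + " " + o.c);
    }
}
```

The PHP client would need the mirror image of this table, which is the "define the object types on both ends" cost mentioned above.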
If you're wanting to write once and use everywhere, I'd recommend using the same language on both ends, otherwise you'll have to have a compiler that can translate between your choice languages.
A SOAP web service can handle the basic abstraction as long as the request/response is not very complex. You can create the classes in java and then get the API to export a WSDL for them.
You need to have them both serialize to the same string. The PHP format and Java format for serialization are different, and therefore incompatible. You need a common exchange format, and I recommend that you DON'T use PHP's. However, the functions to serialize in PHP are fairly simple and are contained in the ext/standard/var.c file in the PHP source, if you choose to use it.
See the following:
Unserialize in Java a serialized php object - A similar question to yours.
http://en.wikipedia.org/wiki/Serialization#Serialization_formats
http://en.wikipedia.org/wiki/XML
XML, API, CSV, SOAP! Understanding the Alphabet Soup of Data Exchange
From http://en.wikipedia.org/wiki/XML (emphasis mine):
Although the design of XML focuses on documents, it is widely used for the representation of arbitrary data structures, for example in web services.

Creating multithreaded Java server and clients, but messages have to be in XML format

I've got to write a multithreaded chat program, using a server and clients but each message sent has to be in XML.
Is it simpler/easier just to write out all the code in Java, and then try to somehow alter it so the messages are sent in XML format, or would it be simpler just to try and go for it in XML and hope it works? I'll admit I don't know that much about XML. :)
Also any links to any relevant online help/tutorials would be much appreciated.
Thanks.
When messing with XML in Java, PLEASE consider using JAXB or something similar. It allows you to work with a normal object graph in memory and then serialize that to XML in one operation (and the other way around).
Manipulating XML through the DOM API is a slow way to lose your sanity, do not do it for any non-trivial amount of XML.
I fail to see what the program being multithreaded or a server have to do with it though...
Check out XStream. You can use this to marshal a normal Java object into XML, and back again into an object, without having to do anything intrusive like define interfaces or specify a schema, i.e. it works out of the box for objects you already have defined. For most cases it's seamless in its default mode.
XStream produces a direct XML serialised representation of a Java object (i.e. XML elements represent each field of a Java object directly). You can customise this further as/when you require. If you want to define persisted objects in terms of schema (XSD) then it's not appropriate. However if you're transporting objects where persistence is short-term and you're not worried about conforming to some schema then it's definitely of use.
e.g.
Person person = new Person("Brian Agnew");
XStream xStream = new XStream();
System.out.println(xStream.toXML(person));
and conversion from XML to the Person object is similarly trivial.
(note XStream is thread-safe)
There is something called XML-RPC. This example pretty much shows what you're looking for:
http://docstore.mik.ua/orelly/xml/jxml/ch11_02.htm
It would be simpler to use existing XMPP clients and servers and not write your own at all.
If this is in fact homework, then I would suggest writing the client and server as you have suggested, using all java, but use a String as the message. You can then easily add parsing of the string to/from XML when all other parts are working.
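Once the String-based version works, the conversion step can be isolated in two small methods. The sketch below uses the StAX classes that ship with the JDK; the `<message from="...">text</message>` layout and the `ChatMessage` class are assumptions made for illustration, since the assignment's actual XML format is not given.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

final class ChatXmlDemo {
    // Hypothetical message type; the field names are illustrative.
    static final class ChatMessage {
        final String from;
        final String text;

        ChatMessage(String from, String text) {
            this.from = from;
            this.text = text;
        }
    }

    // Writes <message from="...">text</message>; the writer escapes special characters.
    static String toXml(ChatMessage m) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("message");
            w.writeAttribute("from", m.from);
            w.writeCharacters(m.text);
            w.writeEndElement();
            w.flush();
            w.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("cannot write message", e);
        }
    }

    // Reads the first <message> element back into a ChatMessage.
    static ChatMessage fromXml(String xml) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Input arrives from the network, so DTD processing is switched off.
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            XMLStreamReader r = factory.createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && "message".equals(r.getLocalName())) {
                    String from = r.getAttributeValue(null, "from");
                    String text = r.getElementText();
                    return new ChatMessage(from, text);
                }
            }
            throw new IllegalArgumentException("no <message> element found");
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed message", e);
        }
    }

    public static void main(String[] args) {
        String xml = toXml(new ChatMessage("alice", "a < b & c"));
        System.out.println(xml);
        System.out.println(fromXml(xml).text);
    }
}
```

The server and client threads then only ever pass Strings over the socket, and the XML handling stays in one place.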
I would also suggest having a look at Betwixt and Digester. For Digester there are some tutorials which can be found in the Digester wiki. Betwixt provides some pretty good tutorials right on its website.
Additionally to these two tools there is a list of alternatives that can be found in the Reference section of http://wiki.apache.org/commons/Digester/WhyUseDigester
You're on the right track trying to break the task into smaller pieces.

Serialize Java objects into Java code

Does somebody know a Java library which serializes a Java object hierarchy into Java code which generates this object hierarchy? Like Object/XML serialization, only that the output format is not binary/XML but Java code.
Serialised data represents the internal data of objects. There isn't enough information to work out what methods you would need to call on the objects to reproduce the internal state.
There are two obvious approaches:
Encode the serialised data in a literal String and deserialise that.
Use java.beans XML persistence, which should be easy enough to process with your favourite XML->Java source technique.
I am not aware of any libraries that will do this out of the box but you should be able to take one of the many object to XML serialisation libraries and customise the backend code to generate Java. Would probably not be much code.
For example, a quick google turned up XStream. I've never used it but it seems to support multiple backends other than XML, e.g. JSON. You can implement your own writer and just write out the Java code needed to recreate the hierarchy.
I'm sure you could do the same with other libraries, in particular if you can hook into a SAX event stream.
See:
HierarchicalStreamWriter
Great question. I was thinking about serializing objects into java code to make testing easier. The use case would be to load some data into a db, then generate the code creating an object and later use this code in test methods to initialize data without the need to access the DB.
It is somewhat true that the object state doesn't contain enough info to know how it's been created and transformed; however, for simple Java beans there is no reason why this shouldn't be possible.
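For the simple-bean case, a rough sketch using plain reflection might look like this. It assumes a public no-argument constructor and matching `getX`/`setX` pairs, handles only a few value types, and ignores nested beans and collections; `Person` is a hypothetical example class, and this is not an existing library.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Comparator;

final class BeanToCodeDemo {
    // Hypothetical simple bean used as input.
    public static final class Person {
        private String name;
        private int age;

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }
    }

    // Renders a property value as a Java source literal (small subset of types).
    static String literal(Object v) {
        if (v == null) return "null";
        if (v instanceof String) {
            return "\"" + ((String) v).replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        }
        if (v instanceof Integer || v instanceof Boolean || v instanceof Double) {
            return v.toString();
        }
        if (v instanceof Long) return v + "L";
        throw new IllegalArgumentException("unsupported property type: " + v.getClass());
    }

    // Emits a constructor call followed by one setter call per readable property.
    static String toJavaCode(Object bean, String var) {
        Class<?> type = bean.getClass();
        String typeName = type.getCanonicalName();
        StringBuilder code = new StringBuilder();
        code.append(typeName).append(' ').append(var)
            .append(" = new ").append(typeName).append("();\n");
        Method[] methods = type.getMethods();
        Arrays.sort(methods, Comparator.comparing(Method::getName)); // stable output order
        for (Method setter : methods) {
            String name = setter.getName();
            if (!name.startsWith("set") || setter.getParameterCount() != 1) continue;
            try {
                Method getter = type.getMethod("get" + name.substring(3));
                code.append(var).append('.').append(name).append('(')
                    .append(literal(getter.invoke(bean))).append(");\n");
            } catch (NoSuchMethodException e) {
                // Write-only property: there is no value to read back, so skip it.
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("cannot read property via " + name, e);
            }
        }
        return code.toString();
    }

    public static void main(String[] args) {
        Person p = new Person();
        p.setName("Ann");
        p.setAge(30);
        System.out.print(toJavaCode(p, "p"));
    }
}
```

The generated text could be pasted into a test method to rebuild the object without touching the database.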
Do you feel like writing a small library for this purpose? I'll start coding soon!
XStream is a serialization library I used for serialization to XML. It should be possible and rather easy to extend it so that it writes Java code.
