Hi I am serializing an object using different VM (Oracle hotspot,jse) and deserializing it with android VM(dalvik). will there be any problem?
Assuming that by "serialization" you mean Serializable, then yes. Serialization is not guaranteed to be the same across distinct VMs. Please use something else (e.g., XML, JSON).
UPDATE
Your first comment is so flawed that I cannot fit my response in 500 characters.
ofcourse yes. without implementing Serializable we cannot serialize
Talented programmers can. Talented programmers can serialize data to XML, JSON, Protocol Buffers, Thrift, ASN.l, YAML, and any number of other formats.
what actually i am doing is i am writing an object on to the network using ObjectOutputStream(oracle hotspot) and reading that object on the android using ObjectInputStream
Talented programmers use platform-independent serialization approaches, such as any of the ones I listed above. That is because talented programmers realize that, in the future, there may be need to have clients or servers that are not based in Java.
So you mean to say as of now this is fine but in the future it is not guaranteed.
No. I wrote:
Serialization is not guaranteed to be the same across distinct VMs.
An object serialized using one VM (e.g., Oracle) should be able to be de-serialized using that VM. There is no guarantee that an object serialized with one VM can be de-serialized using another VM. In fact, developers have gotten in trouble trying to do precisely what you are trying to do. This is another example of why talented programmers use platform-independent serialization structures.
Related
We're looking for a high performance compact serialization solution for Java objects on GAE.
Native Java serialization doesn't perform all that well and it's terrible at compatibility i.e. it can't unserialize an old object if a field is added to the class or removed.
We tried Kryo which performs well in other environments and supports back compatibility when fields are added, but unfortunately the GAE SecurityManager slows it down terribly by adding a check to every method call in the recursion. I'm concerned that might be the issue with all serialization libraries.
Any ideas please? Thanks!
Beware, premature optimisation is the root of all evil.
You should first try one of the standard solutions and then decide if it fits your performance requirements. I did test several serialization solutions on GAE (java serialisation, JSON, JSON+ZIP) and they were an order of magnitude faster than datastore access.
So if serialising data takes 10ms and writing it to datastore takes 100ms, there is very little added benefit in trying to optimise the 10ms.
Btw, did you try Jackson?
Also, all API calls on GAE are implemented as RPC calls to other servers, where payload is serialised as protobuf.
Do you need cross-language ability? and re. high-performance are you referring to speed only, or including optimized memory management for less GC, or including serialized object size?
If you need cross-language I think Google's protobuf is a solution. However, it can hardly be called "high performance" because the UTF-8 strings created on Java side causes constant GCs.
In case the data you are supporting is mostly simple objects and you don't need composition, I would recommend you to write your own serialization layer (not kidding).
Using an enum to index your fields so you can serialize fields that contains value only
Create maps for primitive types using trove4j collections.
Using cached ByteBuffer objects if you could predict size for most of your objects to be under a certain value.
Using string dictionary to reduce string object re-creation and use cached StringBuilder during deserialization
That's what we did for our "high-performance" java serialization layer. Essentially we could achieve almost object-less serialization/de-serialization on a reasonably good timing.
When I first started using the Java Preferences API, the one glaring omission from the API was a putObject() method. I've always wondered why they did not include it.
So, I did some googling and I found this article from IBM which shows you how to do it: http://www.ibm.com/developerworks/library/j-prefapi/
The method they're using seems a bit hackish to me, because you have to break the Object up into byte matrices, store them, and reassemble them later.
My question is, has anyone tried this approach? Can you testify that it is a good way to store/retrieve objects?.
I'm also curious why the Java devs left putObject() out of the API. Does anyone have valuable insight?
I'm also curious why the Java devs left putObject() out of the API.
Does anyone have valuable insight?
From: http://docs.oracle.com/javase/7/docs/technotes/guides/preferences/designfaq.html
Why doesn't this API contain methods to read and write arbitrary
serializable objects?
Serialized objects are somewhat fragile: if the version of the program
that reads such a property differs from the version that wrote it, the
object may not deserialize properly (or at all). It is not impossible
to store serialized objects using this API, but we do not encourage
it, and have not provided a convenience method.
The article describes a reliable way to do it. I see there are a couple of things I may do differently (like I would store the count of the number of pieces as well as the pieces themselves so that I can figure things out easily when I retrieve them).
Your comment about Serialization is wrong though.... the object you want to store has to be Serializable.... that's how the ObjectOutputStream that the document uses does it's job.
So, Yes, it looks like a reliable mechanism, you need to have Serializable objects, and I imagine that the reason that putObject and getObject are not part of the API for two reasons:
it's not part of the way that is native to Windows registries
It risks people putting huge amounts of data in the registry.
Storing serialized objects in the registry strikes me as being somewhat concerning because they can be so big. I would only use it for occasions when there is no way to reconstruct the Object from constructors, and the serialized version is relatively small.
I have been studying Java networking for a while.
I am using ObjectInputStream and ObjectOutputStream for the I/O between sockets.
Is this possible to transfer a Entity or Model from server to client and vise versa?
How can I implement this? Am I suppose to implement the Entity or Model to Serializable?
Your response is highly appreciated.
I am not sure what sort of special thing you mean to denote by capital-E Entity and capital-M Model; these terms don't have any fixed, privileged meaning in Java (although they might with respect to a certain API or framework.) In general, if by these you just mean some specific Java objects, then yes, you can send any sort of objects this way, and yes, they would be required to implement Serializable. The only limitations would be if these objects contained members whose values wouldn't make sense on the other end of the pipe -- like file paths, etc.
Note that if you send one object, you'll end up sending every other object it holds a non-transient reference to, as well.
First of all... why sending an object through I/O stream? What's wrong with XML?
However, you can always send/receive an object through I/O stream as long as the sender can serialize the object and the receiver can deserialize the object. Hope it helps
You definitely need to look at one of these two libraries
Google gson: http://code.google.com/p/google-gson/
Converts Java object to JSON and back. advantage is that the object can be consumed or generated by Javascript. I have also used this for Java-Java RPC, but it gives you flexibility if you want to target browsers later
Google protocol buffers: http://code.google.com/apis/protocolbuffers/
This is what google uses for RPC. Implementations for Java, C, Python. If you need performance and the smallest size, this is the one to go with (The trade off is you can't look at the data easily to debug problems, like you can with gson, which generates plaint text JSON).
I very much like the simplicity of calling remote methods via Java's RMI, but the verbosity of its serialization format is a major buzz kill (Yes, I have benchmarked, thanks). It seems that the architects at Sun did the obvious right thing when designing the RPC (speaking loosely) component, but pulled an epic fail when it came to implementing serialization.
Conversely, it seems the architects of Thrift, Avro, Kryo (especially), protocol buffers (not so much), etc. generally did the obvious right thing when designing their serialization formats, but either do not provide a RPC mechanism, provide one that is needlessly convoluted (or immature), or else one that is more geared toward data transfer than invoking remote methods (perfectly fine for many purposes, but not what I'm looking for).
So, the obvious question: How can I use RMI's method-invocation loveliness but employ one of the above libraries for the wire protocol? Is this possible without a lot of work? Am I evaluating one of the aforementioned libraries too harshly (N.B. I very much dislike code generation, in general; I dislike unnecessary annotations somewhat, and XML configuration quite a bit more; any sort of "beans" make me cringe--I don't need the weight; ideally, I'm looking to just implement an interface for my remote objects, as with RMI).
Once upon a time, I did have the same requirement. I had changed rmi methods arguments and return types to byte[].
I had serialized objects with my preferred serializer to byte array, then called my modified rmi methods.
Well, as you mentioned java serialization is too verbose, therefore 5 years ago I did implement a space efficient serialization algorithm. It saves too much space, if you are sending a very complex object graph.. Recently, I have to port this serialization implementation to GWT, because GWT serialization in Dev mode is incredibly slow.
As an example;
rmi method
public void saveEmployee(Employee emp){
//business code
}
you should change it like below ,
public void saveEmployee(byte[] empByte) {
YourPreferredSerializer serialier = YourPreferredSerializerFactory.creteSerializer();
Employee emp = (Employee) serializer.deSerialize(empByte);
//business code
}
EDIT :
You should check MessagePack . it looks promising.
I don't think there is a way to re-wire RMI, but it might be that specific replacement projects -- I am specifically thinking of DiRMI -- might? And/or project owners might be interest in helping with this (Brian, its author, is a very competent s/w engineer from Amazon.com).
Another interesting project is Protostuff -- its author is building a RPC framework too (I think); but even without it supports an impressive range of data formats; and does this very efficiently (as per https://github.com/eishay/jvm-serializers/wiki/).
Btw, I personally think biggest mistake most projects have made (like PB, Avro) is not keeping proper separation between RPC and serialization aspects nicely separate.
So ability to do RPC using a pluggable data format or serialization providers seems like a good idea to me.
writeReplace() and readResolve() is probably the best combo for doing so. Mighty powerful in the right hands.
Java serialization is only verbose where it describes the classes and fields it's serializing. Overall, the format is as "self describing" as XML. You can can actually override this and replace it with something else. This is what the writeClassDescriptor and readClassDescriptor methods are for. Dirmi overrides these methods, and so it is able to use standard object serialization with less wire overhead.
The way it works is related to how its sessions work. Both endpoints may have different versions of the object, and so simply throwing away the class descriptors won't work. Instead, additional data is exchanged (in the background) so that the serialized descriptor is replaced with a session-specific identifier. Upon seeing the identifier, a lookup table is examined to find the descriptor object. Because the data is exchanged in the background, there's a brief "warm up period" after a session is created and for every time an object type is written for the first time.
Dirmi has no way to replace the wire format at this time.
The log4j network adapter sends events as a serialised java object. I would like to be able to capture this object and deserialise it in a different language (python). Is this possible?
NOTE The network capturing is easy; its just a TCP socket and reading in a stream. The difficulty is the deserialising part
Generally, no.
The stream format for Java serialization is defined in this document, but you need access to the original class definitions (and a Java runtime to load them into) to turn the stream data back into something approaching the original objects. For example, classes may define writeObject() and readObject() methods to customise their own serialized form.
(edit: lubos hasko suggests having a little java program to deserialize the objects in front of Python, but the problem is that for this to work, your "little java program" needs to load the same versions of all the same classes that it might deserialize. Which is tricky if you're receiving log messages from one app, and really tricky if you're multiplexing more than one log stream. Either way, it's not going to be a little program any more. edit2: I could be wrong here, I don't know what gets serialized. If it's just log4j classes you should be fine. On the other hand, it's possible to log arbitrary exceptions, and if they get put in the stream as well my point stands.)
It would be much easier to customise the log4j network adapter and replace the raw serialization with some more easily-deserialized form (for example you could use XStream to turn the object into an XML representation)
Theoretically, it's possible. The Java Serialization, like pretty much everything in Javaland, is standardized. So, you could implement a deserializer according to that standard in Python. However, the Java Serialization format is not designed for cross-language use, the serialization format is closely tied to the way objects are represented inside the JVM. While implementing a JVM in Python is surely a fun exercise, it's probably not what you're looking for (-:
There are other (data) serialization formats that are specifically designed to be language agnostic. They usually work by stripping the data formats down to the bare minimum (number, string, sequence, dictionary and that's it) and thus requiring a bit of work on both ends to represent a rich object as a graph of dumb data structures (and vice versa).
Two examples are JSON (JavaScript Object Notation) and YAML (YAML Ain't Markup Language).
ASN.1 (Abstract Syntax Notation One) is another data serialization format. Instead of dumbing the format down to a point where it can be easily understood, ASN.1 is self-describing, meaning all the information needed to decode a stream is encoded within the stream itself.
And, of course, XML (eXtensible Markup Language), will work too, provided that it is not just used to provide textual representation of a "memory dump" of a Java object, but an actual abstract, language-agnostic encoding.
So, to make a long story short: your best bet is to either try to coerce log4j into logging in one of the above-mentioned formats, replace log4j with something that does that or try to somehow intercept the objects before they are sent over the wire and convert them before leaving Javaland.
Libraries that implement JSON, YAML, ASN.1 and XML are available for both Java and Python (and pretty much every programming language known to man).
I would recommend moving to a third-party format (by creating your own log4j adapters etc) that both languages understand and can easily marshal / unmarshal, e.g. XML.
In theory it's possible. Now how difficult in practice it might be depends on whether Java serialization format is documented or not. I guess, it's not. edit: oops, I was wrong, thanks Charles.
Anyway, this is what I suggest you to do
capture from log4j & deserialize Java object in your own little Java program.
now when you have the object again, serialize it using your own custom formatter.
Tip: Maybe you don't even have to write your own custom formatter. for example, JSON (scroll down for libs) has libraries for Python and Java, so you could in theory use Java library to serialize your objects and Python equivalent library to deserialize it
send output stream to your python application and deserialize it
Charles wrote:
the problem is that for this
to work, your "little java program"
needs to load the same versions of all
the same classes that it might
deserialize. Which is tricky if you're
receiving log messages from one app,
and really tricky if you're
multiplexing more than one log stream.
Either way, it's not going to be a
little program any more.
Can't you just simply reference Java log4j libraries in your own java process? I'm just giving general advice here that is applicable to any pair of languages (name of the question is pretty language agnostic so I just provided one of the generic solutions). Anyway, I'm not familiar with log4j and don't know whether you can "inject" your own serializer into it. If you can, then of course your suggestion is much better and cleaner.
Well I am not Python expert so I can't comment on how to solve your problem but if you have program in .NET you may use IKVM.NET to deserialize Java objects easily. I have experimented this by creating .NET Client for Log4J log messages written to Socket appender and it worked really well.
I am sorry, if this answer does not make sense here.
If you can have a JVM on the receiving side and the class definitions for the serialized data, and you only want to use Python and no other language, then you may use Jython:
you would deserialize what you received using the correct Java methods
and then you process what you get with you Python code