I was wondering how the serialization of MicroStream works in detail.
Since it is described as "Super-Fast", it has to rely on code generation, right? Or is it based on reflection?
How would it perform in comparison to Protobuf serialization, which relies on code generation that reads directly from the Java fields and writes them into a ByteBuffer, and vice versa?
Using reflection would drastically decrease the performance when serializing objects on a huge scale, wouldn't it?
I'm looking for a fast way to transmit and persist objects for a multiplayer game, and every millisecond counts. :)
Thanks in advance!
PS: Since I don't have enough reputation, I can not create the "microstream"-tag. https://microstream.one/
I am the lead developer of MicroStream.
(This is not an alias account. I really just created it. I've been reading StackOverflow for 10 years or so but never had a reason to create an account. Until now.)
On every initialization, MicroStream analyzes the current runtime's versions of all required entity and value type classes and derives optimized metadata from them.
The same is done when encountering a class at runtime that was unknown so far.
The analysis is done via reflection, but since it is only done once for every handled class, the reflection performance cost is negligible.
The actual storing and loading or serialization and deserialization is done via optimized framework code based on the created metadata.
If a class layout changes, the type analysis creates a mapping from the field layout that the class' instances are stored in to that of the current class.
Automatically if possible (for unambiguous changes or via some configurable heuristics), otherwise via a user-provided mapping. Performance stays the same, since the JVM does not care whether it (roughly speaking) copies a loaded value #3 to position #3 or to position #5. It's all in the metadata.
ByteBuffers are used, more precisely direct ByteBuffers, but only as an anchor for off-heap memory to work on via direct "Unsafe" low-level operations. If you are not familiar with "Unsafe" operations, a short and simple notion is: "It's as direct and fast as C++ code." You can do anything you want very fast and close to memory, but you are also responsible for everything. For more details, google "sun.misc.Unsafe".
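For illustration only (this is not MicroStream's actual code), a minimal sketch of the pattern described above: obtain sun.misc.Unsafe once, cache a field offset as "metadata", then do raw off-heap reads and writes. Note that these methods are deprecated on recent JDKs and may eventually be restricted:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class UnsafeSketch {
    static class Point { int x = 7; }

    public static void main(String[] args) throws Exception {
        // Obtain the Unsafe singleton via reflection (the usual, unsupported way).
        Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
        theUnsafe.setAccessible(true);
        Unsafe unsafe = (Unsafe) theUnsafe.get(null);

        // "Metadata" created once per class: the raw offset of the field.
        long xOffset = unsafe.objectFieldOffset(Point.class.getDeclaredField("x"));
        // Per-instance access is then a direct memory read, no reflective lookup.
        int x = unsafe.getInt(new Point(), xOffset);

        // Off-heap memory, addressed directly -- we must free it ourselves.
        long address = unsafe.allocateMemory(8);
        try {
            unsafe.putLong(address, 42L);
            long value = unsafe.getLong(address);
            System.out.println(x + " " + value); // 7 42
        } finally {
            unsafe.freeMemory(address);
        }
    }
}
```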
No code is generated. No byte code hacking, tacit replacement of instances by proxies or similar monkey business is used. On the technical level, it's just a Java library (including "Unsafe" usage), but with a lot of properly devised logic.
As a side note: reflection is not as slow as it is commonly considered to be. Not any more. It was, but it has been optimized pretty much in some past Java version(s?).
It's only slow if every operation has to do all the class analysis, field lookups, etc. anew (which an awful lot of frameworks seem to do because they are just badly written). If the fields are collected (set accessible, etc.) once and then cached, reflection is actually surprisingly fast.
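As a sketch of that point (with a hypothetical Vec class): collect and open the fields once, cache them, and every subsequent access is just a cheap Field.get without lookups or access checks:

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

public class CachedReflection {
    static class Vec { double x = 1.5; double y = -2.0; }

    // Done ONCE per class: collect and open the fields ("the metadata").
    static final Field[] VEC_FIELDS;
    static {
        List<Field> fields = new ArrayList<>();
        for (Field f : Vec.class.getDeclaredFields()) {
            f.setAccessible(true);
            fields.add(f);
        }
        VEC_FIELDS = fields.toArray(new Field[0]);
    }

    // Done PER INSTANCE: cheap cached-Field reads, no class analysis anew.
    static double sumFields(Vec v) throws IllegalAccessException {
        double sum = 0;
        for (Field f : VEC_FIELDS) sum += f.getDouble(v);
        return sum;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumFields(new Vec())); // -0.5
    }
}
```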
Regarding the comparison to Protobuf-Serialization:
I can't say anything specific about it since I haven't used Protocol Buffers and I don't know how it works internally.
As usual with complex technologies, a truly meaningful comparison might be pretty difficult to do since different technologies have different optimization priorities and limitations.
Most serialization approaches give up referential consistency and only store "data": if two objects reference a third, deserialization will create TWO instances of that third object.
Like this: A->C<-B ==serialization==> A->C1 B->C2.
This basically breaks/ruins/destroys object graphs and makes serialization of cyclic graphs impossible, since it creates an endlessly cascading replication. See JSON serialization, for example. Funny stuff.
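The duplication can be demonstrated with plain Java serialization when A and B are serialized independently, as per-message serializers do (within a single ObjectOutputStream, Java serialization does preserve shared references):

```java
import java.io.*;

public class SharedRefDemo {
    static class Node implements Serializable { Node ref; }

    static byte[] toBytes(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) { out.writeObject(o); }
        return bos.toByteArray();
    }

    static Object fromBytes(byte[] b) throws Exception {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(b))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Node c = new Node();
        Node a = new Node(); a.ref = c;
        Node b = new Node(); b.ref = c;   // A->C<-B: both share ONE C

        // Serializing A and B in separate streams, as per-message formats do:
        Node a2 = (Node) fromBytes(toBytes(a));
        Node b2 = (Node) fromBytes(toBytes(b));
        System.out.println(a2.ref == b2.ref); // false: C was duplicated
    }
}
```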
Even Brian Goetz' draft for a Java "Serialization 2.0" includes that limitation (see "Limitations" at http://cr.openjdk.java.net/~briangoetz/amber/serialization.html) (and another one which breaks the separation of concerns).
MicroStream does not have that limitation. It handles arbitrary object graphs properly without ruining their references.
Keeping referential consistency intact is by far not "trying to do too much", as he writes. It is "doing it properly". One just has to know how to do it properly. And it is even rather trivial if done correctly.
So, depending on how many limitations Protobuf-Serialization has ("pacts with the devil"), it might be hardly or even not at all comparable to MicroStream in general.
Of course, you can always create some performance comparison tests for your particular requirements and see which technology suits you best. Just make sure you are aware of the limitations a certain technology imposes on you (ruined referential consistency, forbidden types, required annotations, required default constructor / getters / setters, etc.).
MicroStream has none*.
(*) within reason: Serializing/storing system-internals (e.g. Thread) or non-entities (like lambdas or proxy instances) is, while technically possible, intentionally excluded.
Related
We're looking for a high performance compact serialization solution for Java objects on GAE.
Native Java serialization doesn't perform all that well, and it's terrible at compatibility, i.e. it can't deserialize an old object if a field is added to or removed from the class.
We tried Kryo which performs well in other environments and supports back compatibility when fields are added, but unfortunately the GAE SecurityManager slows it down terribly by adding a check to every method call in the recursion. I'm concerned that might be the issue with all serialization libraries.
Any ideas please? Thanks!
Beware, premature optimisation is the root of all evil.
You should first try one of the standard solutions and then decide if it fits your performance requirements. I did test several serialization solutions on GAE (java serialisation, JSON, JSON+ZIP) and they were an order of magnitude faster than datastore access.
So if serialising data takes 10ms and writing it to datastore takes 100ms, there is very little added benefit in trying to optimise the 10ms.
Btw, did you try Jackson?
Also, all API calls on GAE are implemented as RPC calls to other servers, where payload is serialised as protobuf.
Do you need cross-language ability? And regarding high performance: are you referring to speed only, or does that include optimized memory management for less GC, or serialized object size?
If you need cross-language support, I think Google's protobuf is a solution. However, it can hardly be called "high performance", because the UTF-8 strings created on the Java side cause constant GCs.
In case the data you are supporting is mostly simple objects and you don't need composition, I would recommend that you write your own serialization layer (not kidding):
Use an enum to index your fields, so you serialize field values only.
Create maps for primitive types using trove4j collections.
Use cached ByteBuffer objects if you can predict the size of most of your objects to be under a certain value.
Use a string dictionary to reduce string object re-creation, and a cached StringBuilder during deserialization.
That's what we did for our "high-performance" java serialization layer. Essentially we could achieve almost object-less serialization/de-serialization on a reasonably good timing.
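A minimal sketch of such a layer, with a hypothetical Employee entity (trove4j maps and buffer caching omitted for brevity): an enum indexes the fields, and only tagged values go into the ByteBuffer, no class metadata:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ManualSerDemo {
    // Hypothetical entity with enum-indexed fields, as suggested above.
    enum Field { ID, AGE, NAME }

    static class Employee {
        long id; int age; String name;
        Employee(long id, int age, String name) { this.id = id; this.age = age; this.name = name; }
    }

    // Write each field prefixed by its enum ordinal; values only, no class metadata.
    static void write(Employee e, ByteBuffer buf) {
        buf.put((byte) Field.ID.ordinal()).putLong(e.id);
        buf.put((byte) Field.AGE.ordinal()).putInt(e.age);
        byte[] name = e.name.getBytes(StandardCharsets.UTF_8);
        buf.put((byte) Field.NAME.ordinal()).putShort((short) name.length).put(name);
    }

    static Employee read(ByteBuffer buf) {
        long id = 0; int age = 0; String name = null;
        while (buf.hasRemaining()) {
            switch (Field.values()[buf.get()]) {
                case ID:   id = buf.getLong(); break;
                case AGE:  age = buf.getInt(); break;
                case NAME:
                    byte[] b = new byte[buf.getShort()];
                    buf.get(b);
                    name = new String(b, StandardCharsets.UTF_8);
                    break;
            }
        }
        return new Employee(id, age, name);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(64); // in practice: cached and reused
        write(new Employee(42L, 30, "Ada"), buf);
        buf.flip();
        Employee e = read(buf);
        System.out.println(e.id + " " + e.age + " " + e.name); // 42 30 Ada
    }
}
```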
Is there a method where I can iterate a Collection and retrieve just a subset of attributes without loading/unloading each full object to the cache? It seems like a waste to load/unload the WHOLE (possibly big) object when I need only some attribute(s), especially if the objects are big. It might cause unnecessary cache conflicts when loading such unnecessary data, right?
By 'load to cache' I mean 'process' that object via the processor. Say there are objects with e.g. 10 attributes, and in the iterating loop I only use 1 of those. In such a scenario, I think it's a waste to load all the other 9 attributes from memory to the processor. Isn't there a solution to extract only the needed attributes without loading the full object?
Also, does something like Google's Guava solve the problem internally?
THANK YOU!
It's not usually the first place to look, but it's certainly not impossible that you're running into cache sharing problems. If you're really convinced (from realistic profiling or analysis of hardware counters) that this is a bottleneck worth addressing, you might consider altering your data structures to use parallel arrays of primitives (akin to column-based database storage in some DB architectures), e.g. one 'column' as a float[], another as a short[], a third as a String[], all indexed by the same identifier. This structure allows you to 'query' individual columns without loading into cache any columns that aren't currently needed.
I have some low-level algorithmic code that would really benefit from C's struct. I ran some microbenchmarks on various alternatives and found that parallel arrays was the most effective option for my algorithms (that may or may not apply to your own).
Note that a parallel-array structure will be considerably more complex to maintain and mutate than using Objects in java.util collections. So I'll reiterate - I'd only take this approach after you've convinced yourself that the benefit will be worth the pain.
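A minimal sketch of the parallel-array layout, with made-up attributes:

```java
public class ParallelArrays {
    public static void main(String[] args) {
        final int n = 4;
        // One 'column' per attribute, all indexed by the same identifier:
        float[]  price = { 1.0f, 2.5f, 4.0f, 0.5f };
        short[]  count = { 10, 20, 30, 40 };
        String[] name  = { "a", "b", "c", "d" };

        // A query touching only 'price' streams through one compact float[]
        // -- the 'count' and 'name' columns never enter the CPU cache.
        float total = 0;
        for (int i = 0; i < n; i++) total += price[i];
        System.out.println(total); // 8.0
    }
}
```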
There is no way in Java to manage loading to processor caches, and there is no way to change how the JVM works with objects, so the answer is no.
Java is not a low-level language and hides such details from the programmer.
The JVM will decide how much of the object it loads. It might load the whole object as some kind of read-ahead optimization, or load only the fields you actually access, or analyze the code during JIT compilation and do a combination of both.
Also, how large are these objects you are worried about? I have rarely seen classes with more than a few fields, so I would not consider that big.
Background
There is a well-known tool called Wireshark. I've been using it for ages. It is great, but performance is the problem. A common usage scenario includes several data-preparation steps in order to extract a data subset to be analyzed later. Without that step it takes minutes to do filtering (with big traces, Wireshark is next to unusable).
The actual idea is to create a better solution, fast, parallel and efficient, to be used as a data aggregator/storage.
Requirements
The actual requirement is to use all the power provided by modern hardware. I should say there is room for different types of optimization, and I hope I did a good job on the upper layers, but technology is the main question right now. According to the current design, there are several flavors of packet decoders (dissectors):
interactive decoders: decoding logic can be easily changed at runtime. Such an approach can be quite useful for protocol developers -- decoding speed is not that critical, but flexibility and fast results are more important
embeddable decoders: can be used as a library. This type is supposed to have good performance and be flexible enough to use all available CPUs and cores
decoders as a service: can be accessed through a clean API. This type should provide best-of-breed performance and efficiency
Results
My current solution is JVM-based decoders. The actual idea is to reuse the code, eliminate porting, etc., but still have good efficiency.
Interactive decoders: implemented in Groovy
Embeddable decoders: implemented in Java
Decoders as a service: Tomcat + optimizations + embeddable decoders wrapped into a servlet (binary in, XML out)
Problems to be solved
Groovy provides way too much power and everything, but lacks expressiveness in this particular case
Decoding a protocol into a tree structure is a dead end -- too many resources are simply wasted
Memory consumption is somewhat hard to control. I did several optimizations but am still not happy with the profiling results
Tomcat with various bells and whistles still introduces too much overhead (mainly connection handling)
Am I doing right using JVM everywhere? Do you see any other good and elegant way to achieve the initial goal: get easy-to-write highly scalable and efficient protocol decoders?
The protocol, format of the results, etc are not fixed.
I've found several possible improvements:
Interactive decoders
Groovy expressiveness can be greatly improved by extending the Groovy syntax using
AST transformations. This would make it possible to simplify decoder authoring while still providing good performance. AST (which stands for Abstract Syntax Tree) transformation is a compile-time technique.
When the Groovy compiler compiles Groovy scripts and classes, at some
point in the process, the source code will end up being represented in
memory in the form of a Concrete Syntax Tree, then transformed into an
Abstract Syntax Tree. The purpose of AST Transformations is to let
developers hook into the compilation process to be able to modify the
AST before it is turned into bytecode that will be run by the JVM.
I do not want to reinvent the wheel introducing yet another language to define/describe a protocol structure (it is enough to have ASN.1). The idea is to simplify decoders development in order to provide some fast prototyping technique. Basically, some kind of DSL is to be introduced.
Further reading
Embeddable decoders
Java can introduce some additional overhead. There are several libraries to address that issue:
HPPC
Trove
Javolution
Commons-primitives
Frankly speaking I do not see any other option except Java for this layer.
Decoders as a service
No Java is needed on this layer. Finally I have a good option to go with, but the price is quite high. GWan looks really good.
Some additional porting will be required, but it is definitely worth it.
This problem seems to share the same characteristic of many high-performance I/O implementation problems, which is that the number of memory copies dominates performance. The scatter-gather interface patterns for asynchronous I/O follow from this principle. With scatter-gather, blocks of memory are operated on in place. As long as the protocol decoders take block streams as input rather than byte streams, you'll have eliminated a lot of the performance overhead of moving memory around to preserve the byte stream abstraction. The byte stream is a very good abstraction for saving engineering time, but not so good for high-performance I/O.
In a related issue, I'd beware of the JVM just because of the basic type String. I can't say I'm familiar with how String is implemented in the JVM, but I do imagine that there's no way of making a string out of a block list without doing a memory copy. On the other hand, a native kind of string that could be built without copying, and which interoperated compatibly with the JVM String, could be a way of splitting the difference.
The other aspect of this problem that seems relevant is that of formal languages. In the spirit of not copying blocks of memory, you also don't want to be scanning the same block of memory over and over. Since you want to make run-time changes, that means you probably don't want to use a precompiled state machine, but rather a recursive descent parser that can dispatch to an appropriate protocol interpreter at each level of descent. There are some complications involved when an outer layer does not specify the type of an inner layer. Those complications are worse when you don't even get the length of the inner content, since then you're relying on the inner content to be well formed to prevent runaway. Nevertheless, it's worth putting some attention into understanding how many times a single block will be scanned.
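To illustrate the block-oriented, recursive-descent idea with a toy TLV (type-length-value) protocol of my own invention: the decoder works in place on a ByteBuffer, descends into nested content via a bounded view, and never copies the payload:

```java
import java.nio.ByteBuffer;

public class TlvDecoder {
    // Hypothetical type tags of a toy TLV protocol.
    static final byte T_INT = 1, T_NESTED = 2;

    // Recursive descent over the buffer IN PLACE: no byte[] copies; each
    // level dispatches on the type tag and descends into nested content.
    static int decode(ByteBuffer buf, int depth, StringBuilder out) {
        int count = 0;
        while (buf.hasRemaining()) {
            byte type = buf.get();
            int len = buf.getShort();           // length prefix prevents runaway
            int end = buf.position() + len;
            if (type == T_INT) {
                out.append(" ".repeat(depth)).append("int=").append(buf.getInt()).append('\n');
            } else if (type == T_NESTED) {
                out.append(" ".repeat(depth)).append("nested:\n");
                ByteBuffer inner = buf.duplicate();   // a view, not a copy
                inner.limit(end);
                decode(inner, depth + 1, out);
            }
            buf.position(end);                  // skip to the next sibling
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(32);
        buf.put(T_INT).putShort((short) 4).putInt(7);
        buf.put(T_NESTED).putShort((short) 7)
           .put(T_INT).putShort((short) 4).putInt(9);
        buf.flip();
        StringBuilder out = new StringBuilder();
        decode(buf, 0, out);
        System.out.print(out);
        // int=7
        // nested:
        //  int=9
    }
}
```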
Network traffic is growing (some analytics), so there will be a need to process more and more data per second.
The only way to achieve that is to use more CPU power, but CPU frequency is stable; only the number of cores is growing. It looks like the only way is to use the available cores more efficiently and scale better.
I very much like the simplicity of calling remote methods via Java's RMI, but the verbosity of its serialization format is a major buzz kill (Yes, I have benchmarked, thanks). It seems that the architects at Sun did the obvious right thing when designing the RPC (speaking loosely) component, but pulled an epic fail when it came to implementing serialization.
Conversely, it seems the architects of Thrift, Avro, Kryo (especially), protocol buffers (not so much), etc. generally did the obvious right thing when designing their serialization formats, but either do not provide an RPC mechanism, provide one that is needlessly convoluted (or immature), or else one that is more geared toward data transfer than invoking remote methods (perfectly fine for many purposes, but not what I'm looking for).
So, the obvious question: How can I use RMI's method-invocation loveliness but employ one of the above libraries for the wire protocol? Is this possible without a lot of work? Am I evaluating one of the aforementioned libraries too harshly (N.B. I very much dislike code generation, in general; I dislike unnecessary annotations somewhat, and XML configuration quite a bit more; any sort of "beans" make me cringe--I don't need the weight; ideally, I'm looking to just implement an interface for my remote objects, as with RMI).
Once upon a time, I had the same requirement. I changed the RMI methods' arguments and return types to byte[].
I serialized objects with my preferred serializer to a byte array, then called my modified RMI methods.
Well, as you mentioned, Java serialization is too verbose, so 5 years ago I implemented a space-efficient serialization algorithm. It saves a lot of space if you are sending a very complex object graph. Recently, I had to port this serialization implementation to GWT, because GWT serialization in dev mode is incredibly slow.
As an example;
RMI method:
public void saveEmployee(Employee emp) {
    // business code
}
You should change it like below:
public void saveEmployee(byte[] empBytes) {
    YourPreferredSerializer serializer = YourPreferredSerializerFactory.createSerializer();
    Employee emp = (Employee) serializer.deSerialize(empBytes);
    // business code
}
EDIT:
You should check out MessagePack. It looks promising.
I don't think there is a way to re-wire RMI itself, but specific replacement projects -- I am specifically thinking of DiRMI -- might allow it. And/or the project owners might be interested in helping with this (Brian, its author, is a very competent s/w engineer from Amazon.com).
Another interesting project is Protostuff -- its author is building an RPC framework too (I think); but even without that, it supports an impressive range of data formats, and does this very efficiently (as per https://github.com/eishay/jvm-serializers/wiki/).
Btw, I personally think the biggest mistake most projects (like PB and Avro) have made is not keeping the RPC and serialization aspects properly separate.
So the ability to do RPC with pluggable data formats or serialization providers seems like a good idea to me.
writeReplace() and readResolve() are probably the best combo for doing so. Mighty powerful in the right hands.
Java serialization is only verbose where it describes the classes and fields it's serializing. Overall, the format is as "self-describing" as XML. You can actually override this and replace it with something else. This is what the writeClassDescriptor and readClassDescriptor methods are for. Dirmi overrides these methods, and so it is able to use standard object serialization with less wire overhead.
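A minimal sketch of that override hook (not Dirmi's session-based variant; this naive version just writes the class name, and so skips the version-compatibility checking a full descriptor provides):

```java
import java.io.*;

public class CompactDescriptorDemo {
    static class Msg implements Serializable { int value; Msg(int v) { value = v; } }

    // Replace the verbose class descriptor with just the class name.
    static class CompactOut extends ObjectOutputStream {
        CompactOut(OutputStream out) throws IOException { super(out); }
        @Override protected void writeClassDescriptor(ObjectStreamClass desc) throws IOException {
            writeUTF(desc.getName());
        }
    }

    // Restore the descriptor by looking the class up locally.
    static class CompactIn extends ObjectInputStream {
        CompactIn(InputStream in) throws IOException { super(in); }
        @Override protected ObjectStreamClass readClassDescriptor()
                throws IOException, ClassNotFoundException {
            return ObjectStreamClass.lookup(Class.forName(readUTF()));
        }
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (CompactOut out = new CompactOut(bos)) { out.writeObject(new Msg(42)); }
        try (CompactIn in = new CompactIn(new ByteArrayInputStream(bos.toByteArray()))) {
            Msg m = (Msg) in.readObject();
            System.out.println(m.value); // 42
        }
    }
}
```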
The way it works is related to how its sessions work. Both endpoints may have different versions of the object, and so simply throwing away the class descriptors won't work. Instead, additional data is exchanged (in the background) so that the serialized descriptor is replaced with a session-specific identifier. Upon seeing the identifier, a lookup table is examined to find the descriptor object. Because the data is exchanged in the background, there's a brief "warm up period" after a session is created and for every time an object type is written for the first time.
Dirmi has no way to replace the wire format at this time.
I find modeling physical containers using collections very intuitive. I override/delegate add methods with added capacity constraints based on physical attributes such as volume of added elements, sort based on physical attributes, locate elements by using maps of position to element and so on.
However, when I read the documentation of collection classes, I get the impression that it's not the intended use, that it's just a mathematical construct and a bounded queue is just meant to be constrained by the number of elements and so forth.
Indeed I think that unless I'm able to model this collection coherently, I should perhaps not expose this class as a collection but only delegate to it internally. Opinions?
Many structures in software development do not have a physical counterpart. In fact, some structures and algorithms are quite abstract, and do not model objects directly in the physical world. So just because an object does not serve as a suitable model for physical objects in the real world does not necessarily mean it cannot be used effectively to solve problems within a computer program.
Indeed I think that unless I'm able to model this collection coherently, I should perhaps not expose this class as a collection but only delegate to it internally. Opinions?
Firstly, you don't want to get too hung up with the modeling side of software engineering. UML style models (usually) serve primarily as a way of organizing and expressing the developer's high level ideas about how an application should be implemented. There is no need to have a strict one-to-one relationship between the classes in the model and the implementation classes in the application code.
Second, you don't want to get too hung up about modeling "real world" (i.e. physical) objects and their behavior. Most of the "objects" used in typical applications have no real connection with the real world. For example, a "folder" or "directory" is really little more than an analogy to the physical objects with the same names. There's typically no need for the computer concept to be constrained by the physical behavior of the real-world objects.
Finally, there are a number of software engineering reasons why it is a bad idea to have your Java domain classes extend the standard collection types. For example:
The collections have a generic behavior that it is typically not appropriate to expose in a domain object. For instance, you typically don't want components of a domain object to be added and removed willy-nilly.
By extending a collection type, you are implicitly giving permission for some part of your application to treat domain objects as just lists or sets or whatever.
By extending collection classes, you would be hard-wiring implementation details into your domain APIs. For example, you would need to decide between extending ArrayList or LinkedList, and changing your mind would result (at least) in a binary API incompatibility ... and possibly worse.
Not entirely sure that I've understood you correctly. I gather that you want to know if you should expose the collection (subclassing) or wrap it (have a private field).
As Robert says, it really depends on the case. It's pretty much your choice. Nonetheless, I'd say that in many cases the better choice is to not expose the collection, because the constraints define the object you are modelling and are not fully congruent with the underlying collection. In short: users of your object shouldn't need to know that they are dealing with a collection, unless it really is a collection with some speciality, e.g. it has all the properties of a collection but allows only a certain number of objects.
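A sketch of the delegation approach, with a made-up example: the container enforces its physical constraint and never exposes itself as a Collection:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// A container modelled on a physical box: delegates to a List internally
// instead of extending one, and enforces a volume constraint on add().
public class CargoHold {
    public static class Crate {
        final double volume;
        public Crate(double volume) { this.volume = volume; }
    }

    private final List<Crate> crates = new ArrayList<>();
    private final double capacity;
    private double used;

    public CargoHold(double capacity) { this.capacity = capacity; }

    public boolean add(Crate c) {
        if (used + c.volume > capacity) return false; // physical constraint
        used += c.volume;
        return crates.add(c);
    }

    // Expose a read-only view instead of the mutable list itself.
    public Collection<Crate> contents() { return List.copyOf(crates); }

    public static void main(String[] args) {
        CargoHold hold = new CargoHold(10.0);
        boolean first = hold.add(new Crate(7.0));   // fits
        boolean second = hold.add(new Crate(5.0));  // rejected: would overflow
        System.out.println(first + " " + second + " " + hold.contents().size());
        // true false 1
    }
}
```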