interpreting Java-native communication performance - java

Right now I'm using JNA for Java-native communication and am pleased with its simplicity. However I do need to optimize performance and am considering using other bindings.
My question is this: what part of Java-native communication is the "expensive" part? Is it the passing of data between them?
Let me put it another way. Right now the functions my JNA interface is calling don't pass any data to Java at all, and the functions aren't even called that often. In other words, Java calls a library call and then the library call does its own thing for a while and returns a primitive type. Will JNI/Swig/etc be any faster than JNA in that kind of situation?

Given your use-case JNI won't be any faster than JNA.
What's expensive about the Java-native interaction is transferring large amounts of memory. In particular, it can be very expensive to make Java memory available to the native code; IIRC this is partly because Java can choose to segment the memory however it likes, but native code will expect contiguous chunks of memory -- the movement/copying of memory takes some time.
If you're concerned about performance you should make sure that your JNA code uses the "direct" style access rather than the original interface style access.
Additionally, if you do need to transfer large amounts of memory between Java and native code you should consider using a single initial direct allocation (if possible) and avoid reallocating that memory on a regular basis. This way you pay the allocation cost only once, and at the beginning, so over a large number of calls that cost becomes negligible.
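As a rough illustration of the "allocate once, reuse" advice, here is a minimal sketch using a direct ByteBuffer from the standard library. The class name ReusableNativeBuffer and the sum method are hypothetical; sum only stands in for whatever native function would actually receive the buffer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Sketch: allocate one direct buffer up front and reuse it for every call. */
final class ReusableNativeBuffer {
    private final ByteBuffer buffer;

    ReusableNativeBuffer(int capacityBytes) {
        // Direct buffers live outside the Java heap, so native code can read
        // them in place. Allocation is comparatively slow, so do it once.
        this.buffer = ByteBuffer.allocateDirect(capacityBytes)
                                .order(ByteOrder.nativeOrder());
    }

    /** Refills the same buffer with new samples; no reallocation happens. */
    ByteBuffer fill(double[] samples) {
        buffer.clear();                 // resets position/limit, keeps the memory
        for (double s : samples) {
            buffer.putDouble(s);
        }
        buffer.flip();                  // ready to be read by the consumer
        return buffer;
    }

    /** Hypothetical stand-in for a native function that would receive the buffer. */
    static double sum(ByteBuffer data) {
        double total = 0;
        while (data.hasRemaining()) {
            total += data.getDouble();
        }
        return total;
    }
}
```

Every call to fill hands back the same buffer instance, so the allocation cost is paid only in the constructor.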

Related

Direct operations on off-heap arrays

Recently I've been looking for a way to store large chunks of data in memory for scientific computing. I've looked at scala-offheap and LArray. One thing I noticed is that if I have an existing function operating on a native Java array, I cannot apply it directly to an off-heap array; both libraries require a copy from the off-heap array to a normal one.
I don't know if this is a real limitation of the memory model, or simply a limitation imposed by the library APIs. Is it possible to get a Java array "view" of an off-heap array?
jillegal claims to be able to do that, but it's basically one big hack because it violates assumptions of the garbage collectors and relies on particular collectors not going up in flames when they encounter those violations. It's probably not a good idea for production use.
If you only need to access primitive types, ByteBuffer is currently the abstraction that provides the same API for on-heap and off-heap access, but you have to extract fields one by one.
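To make the ByteBuffer point concrete, here is a small sketch in which the same method reads from a heap buffer and from a direct (off-heap) buffer. The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;

final class BufferViews {
    /** Sums `count` ints starting at byte 0; works for heap and direct buffers alike. */
    static long sumInts(ByteBuffer buf, int count) {
        long total = 0;
        for (int i = 0; i < count; i++) {
            total += buf.getInt(i * Integer.BYTES);   // absolute read, one field at a time
        }
        return total;
    }

    /** Writes the values 1..count as consecutive ints and returns the same buffer. */
    static ByteBuffer filled(ByteBuffer buf, int count) {
        for (int i = 0; i < count; i++) {
            buf.putInt(i * Integer.BYTES, i + 1);
        }
        return buf;
    }
}
```

Calling BufferViews.sumInts with ByteBuffer.allocate(16) or ByteBuffer.allocateDirect(16) goes through the same code path from the caller's point of view. What you do not get is an int[] view of the off-heap memory.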

A better-performing way of retrieving select attributes from Collections of large objects in Java

Is there a method where I can iterate a Collection and retrieve just a subset of attributes without loading/unloading each full object to cache? It seems like a waste to load/unload the WHOLE (possibly big) object when I need only some attribute(s). It might cause unnecessary cache conflicts when loading such unnecessary data, right?
By 'load to cache' I mean 'process' that object via the processor. So there would be objects of e.g. 10 attributes. In the iterating loop I only use 1 of those. In such a scenario, I think it's a waste to load all the other 9 attributes from memory into the processor. Isn't there a solution to extract only the needed attributes without loading the full object?
Also, does something like Google's Guava solve the problem internally?
THANK YOU!
It's not usually the first place to look, but it's certainly not impossible that you're running into cache sharing problems. If you're really convinced (from realistic profiling or analysis of hardware counters) that this is a bottleneck worth addressing, you might consider altering your data structures to use parallel arrays of primitives (akin to column-based database storage in some DB architectures). For example, one 'column' as a float[], another as a short[], a third as a String[], all indexed by the same identifier. This structure allows you to 'query' individual columns without loading into cache any columns that aren't currently needed.
I have some low-level algorithmic code that would really benefit from C's struct. I ran some microbenchmarks on various alternatives and found that parallel arrays were the most effective option for my algorithms (that may or may not apply to your own).
Note that a parallel-array structure will be considerably more complex to maintain and mutate than using Objects in java.util collections. So I'll reiterate - I'd only take this approach after you've convinced yourself that the benefit will be worth the pain.
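A minimal sketch of the parallel-array layout described above might look like this. The class ParticleColumns and its three columns are invented for illustration and are not taken from any real code base.

```java
/** Sketch of a column-oriented ("parallel arrays") layout; all names are hypothetical. */
final class ParticleColumns {
    // One array per attribute; index i across all arrays is one logical record.
    final float[] mass;
    final short[] kind;
    final String[] label;

    ParticleColumns(int size) {
        mass = new float[size];
        kind = new short[size];
        label = new String[size];
    }

    /** Writes all three attributes of record i. */
    void set(int i, float m, short k, String l) {
        mass[i] = m;
        kind[i] = k;
        label[i] = l;
    }

    /** Touches only the mass column; kind and label are never read. */
    double totalMass() {
        double total = 0;
        for (float m : mass) {
            total += m;
        }
        return total;
    }
}
```

The loop in totalMass walks one contiguous float[] and nothing else. The maintenance cost mentioned above is visible too: inserting or removing a record means keeping every column in sync by hand.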
There is no way in Java to manage loading to processor caches, and there is no way to change how the JVM works with objects, so the answer is no.
Java is not a low-level language and hides such details from the programmer.
The JVM will decide how much of the object it loads. It might load the whole object as some kind of read-ahead optimization, or load only the fields you actually access, or analyze the code during JIT compilation and do a combination of both.
Also, how large do you worry your objects are? I have rarely seen classes with more than a few fields, so I would not consider that big.

Efficient decoding of binary and text structures (packets)

Background
There is a well-known tool called Wireshark. I've been using it for ages. It is great, but performance is a problem. A common usage scenario includes several data preparation steps in order to extract a data subset to be analyzed later. Without that step it takes minutes to do filtering (with big traces Wireshark is next to unusable).
The actual idea is to create a better solution, fast, parallel and efficient, to be used as a data aggregator/storage.
Requirements
The actual requirement is to use all the power provided by modern hardware. I should say there is room for different types of optimization, and I hope I did a good job on the upper layers, but technology is the main question right now. According to the current design there are several flavors of packet decoders (dissectors):
interactive decoders: decoding logic can be easily changed at runtime. Such an approach can be quite useful for protocol developers -- decoding speed is not that critical, but flexibility and fast results are more important
embeddable decoders: can be used as a library. This type is supposed to have good performance and be flexible enough to use all available CPUs and cores
decoders as a service: can be accessed through a clean API. This type should provide best-of-breed performance and efficiency
Results
My current solution is JVM-based decoders. The actual idea is to reuse the code, eliminate porting, etc, but still have good efficiency.
Interactive decoders: implemented in Groovy
Embeddable decoders: implemented in Java
Decoders as a service: Tomcat + optimizations + embeddable decoders wrapped into a servlet (binary in, XML out)
Problems to be solved
Groovy provides way too much power and everything, but lacks expressiveness in this particular case
Decoding protocol into a tree structure is a dead end -- too many resources are simply wasted
Memory consumption is somewhat hard to control. I did several optimizations but am still not happy with the profiling results
Tomcat with various bells and whistles still introduces too much overhead (mainly connection handling)
Am I right to use the JVM everywhere? Do you see any other good and elegant way to achieve the initial goal: get easy-to-write, highly scalable and efficient protocol decoders?
The protocol, format of the results, etc are not fixed.
I've found several possible improvements:
Interactive decoders
Groovy expressiveness can be greatly improved by extending Groovy syntax using AST Transformations. So it would be possible to simplify decoders authoring still providing good performance. AST (stands for Abstract Syntax Tree) is a compile-time technique.
When the Groovy compiler compiles Groovy scripts and classes, at some point in the process, the source code will end up being represented in memory in the form of a Concrete Syntax Tree, then transformed into an Abstract Syntax Tree. The purpose of AST Transformations is to let developers hook into the compilation process to be able to modify the AST before it is turned into bytecode that will be run by the JVM.
I do not want to reinvent the wheel introducing yet another language to define/describe a protocol structure (it is enough to have ASN.1). The idea is to simplify decoders development in order to provide some fast prototyping technique. Basically, some kind of DSL is to be introduced.
Embeddable decoders
Java can introduce some additional overhead. There are several libraries to address that issue:
HPPC
Trove
Javolution
Commons-primitives
Frankly speaking I do not see any other option except Java for this layer.
Decoders as a service
No Java is needed on this layer. I finally have a good option to go with, but the price is quite high. G-WAN looks really good.
Some additional porting will be required, but it is definitely worth it.
This problem seems to share the same characteristic of many high-performance I/O implementation problems, which is that the number of memory copies dominates performance. The scatter-gather interface patterns for asynchronous I/O follow from this principle. With scatter-gather, blocks of memory are operated on in place. As long as the protocol decoders take block streams as input rather than byte streams, you'll have eliminated a lot of the performance overhead of moving memory around to preserve the byte stream abstraction. The byte stream is a very good abstraction for saving engineering time, but not so good for high-performance I/O.
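To show what operating on blocks in place can mean on the JVM, here is a small sketch that hands out the payload of a frame as a view into the original ByteBuffer instead of copying it. The frame layout (2-byte type, 2-byte length, payload) is a toy format assumed for this example, and FrameView is a made-up name.

```java
import java.nio.ByteBuffer;

final class FrameView {
    final int type;
    final ByteBuffer payload;   // a view into the original block, not a copy

    private FrameView(int type, ByteBuffer payload) {
        this.type = type;
        this.payload = payload;
    }

    /**
     * Reads one frame from the block's current position. Assumed toy layout:
     * 2-byte type, 2-byte payload length, then the payload bytes.
     */
    static FrameView read(ByteBuffer block) {
        int type = block.getShort() & 0xFFFF;
        int length = block.getShort() & 0xFFFF;
        ByteBuffer payload = block.slice();           // shares the same memory
        payload.limit(length);                        // restrict the view to this frame
        block.position(block.position() + length);    // advance past the payload
        return new FrameView(type, payload);
    }
}
```

Because slice() shares storage with the block, a later write to the block is visible through the payload view. That is the price of avoiding the copy, and it is why such views are usually treated as read-only by convention.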
In a related issue, I'd beware of the JVM just because of the basic type String. I can't say I'm familiar with how String is implemented in the JVM, but I imagine there's no way of making a String out of a block list without doing a memory copy. On the other hand, a native kind of string that could do so, and that interoperated compatibly with the JVM String, could be a way of splitting the difference.
The other aspect of this problem that seems relevant is that of formal languages. In the spirit of not copying blocks of memory, you also don't want to be scanning the same block of memory over and over. Since you want to make run-time changes, that means you probably don't want to use a precompiled state machine, but rather a recursive descent parser that can dispatch to an appropriate protocol interpreter at each level of descent. There are some complications involved when an outer layer does not specify the type of an inner layer. Those complications are worse when you don't even get the length of the inner content, since then you're relying on the inner content to be well formed to prevent runaway. Nevertheless, it's worth putting some attention into understanding how many times a single block will be scanned.
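A very reduced sketch of that dispatch idea follows: each layer decodes its own header once, reports which layer comes next, and the decoder descends until no registered interpreter matches. The LayeredDecoder class, the Layer interface and the integer layer ids are all hypothetical, and real protocols would need the length and error handling discussed above.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LayeredDecoder {
    /** Decodes one layer's header and returns the id of the inner layer, or -1 if none. */
    interface Layer {
        int decode(ByteBuffer in, List<String> out);
    }

    // Mutable registry: layers can be added or replaced at run time.
    private final Map<Integer, Layer> layers = new HashMap<>();

    void register(int id, Layer layer) {
        layers.put(id, layer);
    }

    /** Decodes starting at the given outer layer; the buffer is consumed front to back. */
    List<String> decode(int outerId, ByteBuffer in) {
        List<String> out = new ArrayList<>();
        descend(outerId, in, out);
        return out;
    }

    private void descend(int id, ByteBuffer in, List<String> out) {
        Layer layer = layers.get(id);
        if (layer == null || !in.hasRemaining()) {
            // Unknown inner type or no data left: record what remains and stop.
            out.add("opaque[" + in.remaining() + "]");
            return;
        }
        int next = layer.decode(in, out);
        if (next >= 0) {
            descend(next, in, out);
        }
    }
}
```

A layer can be registered as a lambda, for example `decoder.register(1, (in, out) -> { int next = in.get() & 0xFF; out.add("outer"); return next; })`. Since the registry is a plain map, swapping an interpreter while the application is running is just another register call.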
Network traffic is growing (some analytics), so there will be a need to process more and more data per second.
The only way to achieve that is to use more CPU power, but CPU frequency is stable. Only the number of cores is growing. It looks like the only way is to use available cores more efficiently and scale better.

Capacity adapting Java collections

Are there any Java libraries for maps and sets that alter their representation strategy based upon the capacity? I have an application where we have many many maps and sets, but most of the time they are small, usually 6 elements or less.
As such we've been able to extract some good memory improvements by writing some specialized maps and sets that just use arrays for small sizes and then default to standard Java Sets and Maps for larger capacities.
However, rolling our own specialized versions of set and maps seems kind of silly if there is already something off the shelf. I've looked at guava and the Apache collections and they do not seem to offer anything like this. Trove sounds like it is more memory efficient than the JDK's collections in general, but it isn't clear if it will attempt to minimize memory usage like this.
You may want to look at Clojure's persistent data structures. Although the "persistent" part may be overkill for you, it does exactly what you are looking for and is still really fast. There is a PersistentArrayMap that is promoted to a PersistentHashMap once the collection outgrows a small threshold (the array map's internal threshold is 16 array slots, i.e. 8 key/value pairs).
I'm not aware of any such library.
The problem is that the representations that use the least amount of memory tend to:
be incompatible with the Java Collections APIs which makes integration hard, and
break down the abstraction boundaries; e.g. by adding link fields to element types.
These make it difficult to create a general purpose library along these lines. Then we add the problem that a representation that adapts to minimize heap space usage as the collection grows and shrinks will inevitably create a lot more garbage ... and that will have CPU performance implications.
Your approach is kind of interesting, though it doesn't give you anything like minimal memory usage. I assume that your classes are effectively wrappers for the standard implementation classes when the collections get big. If it works for you, I suggest that you stick with it.
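For readers wondering what the approach from the question could look like, here is a minimal sketch: a map that uses two small arrays and a linear scan until it holds more than a threshold number of entries, then switches to a HashMap. SmallMap is a hypothetical name, the threshold of 6 is taken from the sizes mentioned in the question, keys are assumed non-null, and it deliberately does not implement java.util.Map.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: linear-scan arrays while small, a HashMap once a threshold is passed. */
final class SmallMap<K, V> {
    private static final int THRESHOLD = 6;   // assumed cut-over size

    private Object[] keys = new Object[THRESHOLD];
    private Object[] values = new Object[THRESHOLD];
    private int size;
    private Map<K, V> big;                    // null until promotion

    @SuppressWarnings("unchecked")
    V get(K key) {
        if (big != null) return big.get(key);
        for (int i = 0; i < size; i++) {
            if (keys[i].equals(key)) return (V) values[i];
        }
        return null;
    }

    @SuppressWarnings("unchecked")
    void put(K key, V value) {
        if (big != null) { big.put(key, value); return; }
        for (int i = 0; i < size; i++) {
            if (keys[i].equals(key)) { values[i] = value; return; }   // overwrite
        }
        if (size < THRESHOLD) {
            keys[size] = key;
            values[size] = value;
            size++;
            return;
        }
        // Promote: copy the small arrays into a HashMap and drop them.
        big = new HashMap<>();
        for (int i = 0; i < size; i++) big.put((K) keys[i], (V) values[i]);
        big.put(key, value);
        keys = null;
        values = null;
    }

    int size() { return big != null ? big.size() : size; }

    boolean isPromoted() { return big != null; }
}
```

This version never demotes back to arrays when entries are removed, which sidesteps the extra garbage mentioned above but also means a map that was once large stays a HashMap.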

When to use List<Long> instead of long[]?

There's something I really don't understand: a lot (see my comment) of people complain that Java isn't really OO because you still have access to primitives and primitive arrays. Some people go as far as saying that these should go away...
However I don't get it... Could you efficiently do things like signal processing (say write an FFT, for starters), write efficient encryption algorithms, write fast image manipulation libraries, etc. in Java if you didn't have access to, say, int[] and long[]?
Should I start writing my Java software by using List<Long> instead of long[]?
If the answer is "simply use higher-level libraries doing what you need" (for example, say, signal processing), then how are these libraries supposed to be written?
I personally use List most of the time, because it gives you a lot of convenience. You can also have concurrent collections, but not concurrent raw arrays.
Almost the only situation where I use raw arrays is when I'm reading a large chunk of binary data, as in image processing. I'm concerned about instantiating e.g. Byte objects 100M times, though I have to confess I never tried working with a Byte list that huge. I noticed that with something like a 100KB file, List<Byte> works OK.
Most of the image processing examples etc. use array as well, so in this field it's actually more convenient to use raw arrays.
So in conclusion, my practical answer to this is
Use wrappers unless you are:
Working with a very large array, like length > 10M (I'm too lazy to write a benchmark!),
Working in a field where many examples or people prefer raw arrays (e.g. network programming, image processing),
You found out, by doing experiments, that there is a significant performance gain from changing to raw arrays.
If for whatever reason it's easier to work with raw arrays on that problem for you.
In high performance computing, arrays of objects (as well as primitives) are essential as they map more robustly onto the underlying CPU architecture and behave more predictably for things such as cache access and garbage collection. With such techniques, Java is being used very successfully in areas where the received wisdom is that the language is not suitable.
However, if your goal is solely to write code that is highly maintainable and provably self consistent, then the higher level constructs are the obvious way to go. In your direct comparison, the List object hides the issue of memory allocation, growing your list and so on, as well as providing (in various implementations) additional facilities such as particular access patterns like stacks or queues. The use of generics also allows you to carry out refactoring with a far greater level of confidence and the full support of your IDE and toolchain.
An enlightened developer should make the appropriate choice for the use case they are approaching. Suggesting that a language is not "OO enough" because it allows such choices would lead me to suspect that the person either doesn't trust that other developers are as smart as they are or has not had a particularly wide experience of different application domains.
It's a judgment call, really. Lists tend to play better with generic libraries and have stuff like add, contains, etc, while arrays generally are faster and have built-in language support and can be used as varargs. Select whatever you find serves your purpose better.
Okay.
You need to know the size of an array at the time that it is created, and you cannot change its size after it has been created. A list, on the other hand, can grow dynamically after it has been created, and it has the add() method to do that.
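The difference can be seen in a few lines. This is only an illustrative sketch with a made-up class name: appending to a long[] means copying into a new, longer array, while a List<Long> grows in place at the cost of boxing each element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ArrayVsList {
    /** An array's length is fixed, so "growing" means copying into a new array. */
    static long[] append(long[] source, long value) {
        long[] grown = Arrays.copyOf(source, source.length + 1);
        grown[source.length] = value;
        return grown;
    }

    /** A List grows in place, but every element is boxed as a Long object. */
    static List<Long> append(List<Long> source, long value) {
        source.add(value);          // autoboxes value to Long
        return source;
    }

    /** Convenience for building a growable list from initial values. */
    static List<Long> listOf(Long... values) {
        return new ArrayList<>(Arrays.asList(values));
    }
}
```

The array version leaves the original array untouched and returns a new one. The list version mutates and returns the list it was given.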
Have you gone through these links?
A nice comparison of arrays vs. lists:
Array or List in Java. Which is faster?
List vs. Array
