MKL Accelerated Math Libraries for Java - java

I've looked at the related threads on StackOverflow and Googled with not much luck. I'm also very new to Java (I'm coming from a C# and .NET background) so please bear with me. There is so much available in the Java world it's pretty overwhelming.
I'm starting on a new Java-on-Linux project that requires some heavy and highly repetitious numerical calculations (i.e. statistics, FFT, Linear Algebra, Matrices, etc.). So maximizing the performance of the mathematical operations is a requirement, as is ensuring the math is correct. So hence I have an interest in finding a Java library that perhaps leverages native acceleration such as MKL, and is proven (so commercial options are definitely a possibility here).
In the .NET space there are highly optimized and MKL accelerated commercial Mathematical libraries such as Centerspace NMath and Extreme Optimization. Is there anything comparable in Java?
Most of the math libraries I have found for Java either do not seem to be actively maintained (such as Colt) or do not appear to leverage MKL or other native acceleration (such as Apache Commons Math).
I have considered trying to leverage MKL directly from Java myself (e.g. JNI), but me being new to Java (let alone interoperating between Java and native libraries) it seemed smarter finding a Java library that has already done this correctly, efficiently, and is proven.
Again I apologize if I am mistaken or misguided (even in regarding any libraries I've mentioned) and my ignorance of the Java offerings. It's a whole new world for me coming from the heavily commercialized Microsoft stack so I could easily be mistaken on where to look and regarding the Java libraries I've mentioned. I would greatly appreciate any help or advice.

For things like FFT (bulk operations on arrays), the range check in java might kill your performance (at least recently it did). You probably want to look for libraries which optimize the provability of their index bounds.
According to the The HotSpot spec
The Java programming language
specification requires array bounds
checking to be performed with each
array access. An index bounds check
can be eliminated when the compiler
can prove that an index used for an
array access is within bounds.
I would actually look at JNI, and do your bulk operations there if they are individually very large. The longer the operation takes (i.e. solving a large linear system, or large FFT) the more its worth it to use JNI (even if you have to memcpy there and back).

Personally, I agree with your general approach, offloading the heavyweight maths from Java to a commercial-grade library.
Googling around for Java / MKL integration I found this so what you propose is technically possible. Another option to consider would be the NAG libraries. I use the MKL all the time, though I program in Fortran so there are no integration issues. I can certainly recommend their quality and performance. We tested, for instance, the MKL version of FFTW against a version we built from sources ourselves. The MKL implementation was faster by a small integer multiple.
If you have concerns about the performance of calling a library through JNI, then you should plan to structure your application to make fewer larger calls in preference to more smaller ones. As to the difficulties of using JNI, my view (I've done some JNI programming) is that the initial effort you have to make in learning how to use the interface will be well rewarded.
I note that you don't seem to be overwhelmed yet with suggestions of what Java maths libraries you could use. Like you I would be suspicious of research-quality, low-usage Java libraries trawled from the net.

You'd probably be better off avoiding them I think. I could be wrong, it's not a bit I'm too familiar with, so don't take too much from this unless a few others agree with me, but calling up the JNI has quite a large overhead, since it has to go outside of the JRE and everything to do it, so unless you're grouping a lot of things together into a single function to put through at once, the slight benefit of the external library's will be outweighed hugely by the cost of calling them. I'd give up looking for an MKL library and find an optimized pure Java library. I can't say I know of any better than the standard one to recommend though, sorry.

Related

Java best practices for vectorized computations

I'm researching methods for computing expensive vector operations in Java, e.g. dot-products or multiplications between large matrices. There are a few good threads on here on this topic, like this and this. It appears that there is no reliable way of having the JIT compile code to use CPU vector instructions (SSE2, AVX, MMX...). Moreover, high-performance linear algebra libraries (ND4J, jblas, ...) do in fact make JNI calls to BLAS/LAPACK libraries for the core routines. And I understand BLAS/LAPACK packages to be the de facto standard choices for native linear algebra computations.
On the other hand others (JAMA, ...) implement algorithms in pure Java without native calls.
My questions are:
What are the best practices here?
Is making native calls to BLAS/LAPACK actually a recommended choice? Are there other libraries worth considering?
Is the overhead of JNI calls negligible compared to the performance gain? Does anyone have experience as to where the threshold lies (e.g. how small an input should be to make JNI calls more expensive than a pure Java routine?)
How big is the portability tradeoff?
I hope this question could be of help both for those who develop their own computation routines, and for those who just want to make an educated choice between different implementations.
Insights are appreciated!
There are no clear best practices for every case. Whether you could/should use a pure Java solution (not using SIMD instructions) or (optimized with SIMD) native code through JNI depends on your particular application and specifically the size of your arrays and possible restrictions on the target system.
There could be a requirement that you are not allowed to install specific native libraries in the target system and BLAS is not already installed. In that case you simply have to use a Java library.
Pure Java libraries tend to perform better for arrays with length much smaller than 100 and at some point after that you get better performance using native libraries through JNI. As always, your mileage may vary.
Pertinent benchmarks have been performed (in random order):
http://ojalgo.org/performance_ejml.html
http://lessthanoptimal.github.io/Java-Matrix-Benchmark/
Performance of Java matrix math libraries?
These benchmarks can be confusing as they are informative. One library may be faster for some operation and slower for some other. Also keep in mind that there may be more than one implementation of BLAS available for your system. I currently have 3 installed on my system blas, atlas and openblas. Apart from choosing a Java library wrapping a BLAS implementation you also have to choose the underlying BLAS implementation.
This answer has a fairly up to date list except it doesn't mention nd4j that is rather new. Keep in mind that jeigen depends on eigen so not on BLAS.

Is there a Java library for accelerated vector computations?

I'm looking for a Java lib that permits to do some fast computations with vector (and maybe matrices too).
By fast I mean that it takes advantage of GPU processing and/or SSE instructions. I'm wondering if it can be possible to find something more portable as possible. I recognize that the JVM provides a thick abstraction layer of the hardware.
I've come across JCUDA, but there's a drawback: on a computer without an Nnvidia graphic card it should be run in emulation mode (so I come to believe it will be not efficient as expected). Has anyone already tried it?
What about OpenCL? It should provide you a good starting point for this kind of optimized operations.
There exist many bindings for Java, starting from jocl (but take a loot also at JavaCL or LWJGL that added support from 2.6)
If by fast you mean high speed rather than requiring support for your particular hardware, I'd recommend Colt. Vectors are called 1-d matrices in this library.
I'd recommend using UJMP (wraps most if not all of the high-speed Java matrix libraries) and wait for a decent GPGPU implementation to be written for it (I started hacking it with JavaCL a while ago, but it needs some serious rewrite, maybe using ScalaCLv2 that's in the works).

Using NumPy and Cpython with Jython

I must use a commercial Java library, and would like to do it from Python. Jython is robust and I am fine with it being a few dot releases behind. However, I would like to use NumPy as well, which obviously does not work with Jython. Options like CPype and Java numeric libraries are unappealing. The former is essentially dead. The latter are mostly immature and lack the ease of use and wide acceptance of NumPy. My question is: How can one have Jython and Python code interoperate? It would be acceptable for me to call Jython from Cpython or the other way around.
It's ironic, considering that Jython and Numeric (NumPy's ancestor) were initiated by the same developer (Jim Hugunin, who then moved on to also initiate IronPython and now holds some kind of senior architect position at Microsoft, working on all kind of dynamic languages support for .NET and Silverlight), that there's no really good way to use numpy in Jython. The closest thing to that, which I know of, is the "jnumerical" project -- the (scarce) docs are on sourceforge, but the updated sources are on bitbucket.
"Numeric Python", what jnumerical implements, is not as slick and streamlined as its numpy descendant, but it has about the same functionality and shares a lot of the concepts and philosophy, so maybe you could find it usable -- worth checking out, at least.
Consider using execnet, which allows you to combine the strengths of both Jython and CPython, including current NumPy. The disadvantage here is that you will have to pay for the cost of serializing/deserializing objects between the two interpreters in two different process spaces. (You can avoid network overhead by using its support for subprocess.) But such a combination may work well, given that you're considering JPype, which would have similar (and probably higher) overhead. Just ensure you've partitioned the work appropriately.
The Jython developers (and I'm one of them) are looking at supporting NumPy in the future, via support of the C Extension API, but this is very much preliminary planning indeed.
I look very much formard to the Jython C Extension API! That would be awesome!
Until, that point, I think you have two alternatives:
http://jepp.sourceforge.net/ for embedding python in java, it has a nice console. The disadvatage, for me a too big disadvatage, is that it needs to be compiled against your own python. And with the python upgrade, you have to recompile (I don't want to compile python, in order to compile and use the extension - it is also not possible, especially if the code should be executed on different machines, on grid for example)
http://lucene.apache.org/pylucene/jcc/ - this is used for lucene, and for many other projects. I personally use it to wrap GATE NLP engine and also solr. To make that available to Python. Jcc is much faster than the (dead) JPype, probably because some data structures (like lists) are optimized and also because it is interfacing python<->java via C++ extension (according to this: http://www.slideshare.net/onyame/mixing-python-and-java page 30) I have tried moving 6mil of integers in the list between python and java, JPype was orders of magnitude slower (but i don't remember the numbers)
However, using Jcc, you can wrap only public methods, and sometimes it is tricky, especially if that method is receiving or returning certain java objects (in short, JCC must compile wrappers also for the passed-in objects, otherwise all the methods using/returning such methods are not accessible). So unless you need to distribute your code, you are better of with JEPP.
Disclaimer: Have not had persnal experience with it yet
Seems like JyNI – Jython Native Interface is the way to go.
There's also a newer question posted which may have newer alternatives.
If you stick to vector and matrix maths, I suggest to have a look onto vectorz.
It is a pure Java implementation and shall be 100% usable from within jython. I still didn't try it, but will soon, since I have the same necessity in finding a numpy alternative.

Alternative to Java

I need an alternative to Java, because I am working on a genetics-calculation project.
It takes a lot of memory and the most of the cpu time. And therefore it won´t work when I deploy it on a server, because many people use the program at the same time.
Does anybody know another language that is not running in a virtual machine and is similar to Java (object-oriented, using exceptions and type-safety)?
Best regards,
Jonathan
To answer the direct question: there are dozens of languages that fit your explicit requirements. AmmoQ listed a few; Wikipedia has many more.
And I think that you'll be disappointed with every one of them.
Despite what Java haters want you to think, Java's performance is not much different than any other compiled language. Just changing languages won't improve performance much.
You'll probably do better by getting a profiler, and looking at the algorithms that you used.
Good luck!
If your apps is consuming most of the CPU and memory on a single-user workstation, I'm skeptical that translating it into some non-VM language is going to help much. With Java, you're depending on the VM for things like memory management; you're going to have to re-implement their equivalents in your non-VM language. Also, Java's memory management is pretty good. Your application probably isn't real-time sensitive, so having it pause once in a while isn't a problem. Besides, you're going to be running this on a multi-user system anyway, right?
Memory usage will have more to do with your underlying data structures and algorithms rather than something magical about the language. Unless you've got a really great memory allocator library for your chosen language, you may find you uses just as much memory (if not more) due to bugs in your program.
Since your app is compute-intensive, some other language is unlikely to make it less so, unless you insert some strategic sleep() calls throughout the code to deliberately make it yield the CPU more often. This will slow it down, but will be nicer to the other users.
Try running your app with Java's -server option. That will engage a VM designed for long-running programs and includes a JIT that will compile your Java into native code. It may make your program run a bit faster, but it will still be CPU and memory bound.
If you don't like C++, you might consider D, ObjectiveC or the new Go language from google.
You may try C++, it satisfies all your requirements.
Use Python along with numpy, scipy and matplotlib packages. numpy is a Python package which has all the number crunching code implemented in C. Hence runtime performance (bcoz of Python Virtual Machine) won't be an issue.
If you want compiled, statically typed language only, have a look at Haskell.
Can your algorithms be parallelised?
No matter what language you use you may come up against limitations at some point if you use a single process. Using something like Hadoop will mean you can retain Java and ease of use but you can run in parallel across many machines.
On the same theme as #Barry Brown's answer:
If your application is compute / memory intensive in Java, it will probably be compute / memory intensive in C++ or any other "more efficient" language. You might get some extra leeway ... but you'll soon run into the same performance wall.
IMO, you need to do the following things:
You need profile your application, and look for any major performance bottlenecks. You might find some real surprises.
In the light of the previous step, review the design and algorithms, paying attention to space and time complexity issues. Do some research to see if someone has discovered better algorithms for doing the computations that are problematic from a performance perspective.
If the previous steps don't get you ahead of the curve, see if you can upgrade your platform; get a bigger machine with more processors, more memory, etc.
If you are still stuck, your only other option is a scale-out design. Assuming that individual user requests are processed in a single-threaded, re-architect your system so that you can run "workers" across multiple servers, with a load balancer on the front. If you have a persistent back-end, look into how you can replicate that. And so on.
Figure out if the key algorithms can be parallelized / distributed so that the resource intensive parts of a user request execute in parallel on multiple processors / multiple servers; e.g. using a "map-reduce" framework.
OK, so there is no easy answer. But simply changing programming languages is NOT a good answer.
Regardless of language your program will need to share with others when running in multiple instances on a single machine. That is simply the way computers work.
The best way to allow your current program to scale to use the available hardware resources is to chop your amount of work into small, independent pieces, and make them implement the Callable interface. These can then be executed by a suitable Executor which can then be chosen according to the available hardware. See the Executors class for many preconfigured versions. THis is what I would recommend you to do here.
If you want to switch language then Mac OS X 10.6 allows for programming in the way described above with C and ObjectiveC and if you do it properly OS X can distribute the code over all available computing resources (both CPU and GPU and what have we).
If none of the above is interesting to you, then consider one of the Grid frameworks. Terracotta may be a good place to start.
F# or ruby, or python, they are very good for calculations, and many other things
NASA uses python
Well.. I think you are looking for C#.
C# is Object Oriented and has excellent support for Generics. You can use it do write both WinForm and server-side applications.
You can read more about C# generics here: http://msdn.microsoft.com/en-us/library/ms379564(VS.80).aspx
Edit:
My mistake, geneTIcs, not geneRIcs. It does not change the fact C# will do the job, and using generics will reduce load significantly.
You might find the computer language shootout here interesting.
For example, here's Java vs C++.
You might find Ocaml (from which F# is derived) worth a look; it meets your requirements for OO, exceptions, static types and it has a native compiler, however according to the shootout you may be trading less memory for lower speed.

What's the easiest way to use C source code in a Java application?

I found this open-source library that I want to use in my Java application. The library is written in C and was developed under Unix/Linux, and my application will run on Windows. It's a library of mostly mathematical functions, so as far as I can tell it doesn't use anything that's platform-dependent, it's just very basic C code. Also, it's not that big, less than 5,000 lines.
What's the easiest way to use the library in my application? I know there's JNI, but that involves finding a compiler to compile the library under Windows, getting up-to-date with the JNI framework, writing the code, etc. Doable, but not that easy. Is there an easier way? Considering the small size of the library, I'm tempted to just translate it to Java. Are there any tools that can help with that?
EDIT
I ended up translating the part of the library that I needed to Java. It's about 10% of the library so far, though it'll probably increase with time. C and Java are pretty similar, so it only took a few hours. The main difficulty is fixing the bugs that get introduced by mistakes in the translation.
Thank you everyone for your help. The proposed solutions all seemed interesting and I'll look into them when I need to link to larger libraries. For a small piece of C code, manual translation was the simplest solution.
On the Java GNU Scientific Library project I used Swig to generate the JNI wrapper classes around the C libraries. Great tool, and can also generate wrapper code in several languages including Python. Highly recommended.
Your best bet is probably to grab a good c book (K&R: The C Progranmming language) a cup of tea and start translating! I would be skeptical about trusting a translation program, more often then not the best translator is yourself! If you do this one, then its done and you don't need to keep re-doing it. There might be some complications if the library is open source, you'll need to check the licence carefully about this. Another point to consider is that there is always going to be some element of risk and potential error in the translation, therefore it might be necessary to consider writing some tests to ensure that the translation is correct.
Are there no JAVA equivelent Math functions?
As you yourself comment the JNI way is possible, as for a c compiler you could probably use 'Bloodshead Dev-c++' might work, but it is a lot of effort for ~5000 lines.
I'd compile it and use JNA.
JNA (Java Native Access) is basically does in runtime what JNI at compile time and doesnt need any non-java code (not much java either).
I don't know about its performance or usability in your case but I'd give it a try.
Are you sure you want to use the C library, even if it is that small?
Once 64 bit gets a little more common, you'll need to start building/deploying both 32 bit and 64 bit versions of the library as well. And depending on what the C code is like, you may or may not need to update the code to make it build as 64 bit.
If the C library is simple, it may be easier to just port the C library to pure java and not have to deal with building/deploying a JNI library, the C library and the java code.
Well, there is AMPC. It is a C compiler for Windows, MacOS X and Linux, that can compile C code into Java Byte Code (the kind of code, that runs on a Java virtual machine).
AMPC
However, it is commercial and costs $199 per license. I doubt that pays off for you ;) I don't know of any free compiler like that.
OTOH, Java and C are pretty similar. You could probably refactor the C Code to Java (structs can be replaced with objects with public instance variables) and pointer operations can usually be translated to something else (array operations for example). Though I guess you don't want to go through 5,000 lines of code, do you?
Using JNI makes the code platform dependent, however if you say it is platform independent C, there is no reason why your Java code should be platform dependent. OTOH, depending on how costly these calculations are, using JNI might actually buy you a performance gain, as when it comes to raw number crunching throughput, C can still beat Java in speed. However JNI calls are very costly, so if the calculation is just a very simple, quick calculation, the JNI call itself might take equally long (or even longer) than the calculation performed, in which case using JNI will buy you nothing, but slowing down your app and causing memory overhead.
Indeed, JNA looks impressive, it requires less effort than directly using JNI. But in any case you'd lose the platform independence, and since you're probably only using a small part of it, you might consider translating what you actually need.
Have you tried using:
System.loadLibrary("mylibrary.dll");
Not sure if this will work with a pure C library but it's probably worth a shot. :)

Categories