I must use a commercial Java library, and would like to do it from Python. Jython is robust and I am fine with it being a few dot releases behind. However, I would like to use NumPy as well, which obviously does not work with Jython. Options like JPype and Java numeric libraries are unappealing. The former is essentially dead. The latter are mostly immature and lack the ease of use and wide acceptance of NumPy. My question is: How can one have Jython and Python code interoperate? It would be acceptable for me to call Jython from CPython or the other way around.
It's ironic that there's no really good way to use numpy in Jython, considering that Jython and Numeric (NumPy's ancestor) were initiated by the same developer (Jim Hugunin, who then moved on to also initiate IronPython and now holds some kind of senior architect position at Microsoft, working on all kinds of dynamic-language support for .NET and Silverlight). The closest thing that I know of is the "jnumerical" project -- the (scarce) docs are on sourceforge, but the updated sources are on bitbucket.
"Numeric Python", what jnumerical implements, is not as slick and streamlined as its numpy descendant, but it has about the same functionality and shares a lot of the concepts and philosophy, so maybe you could find it usable -- worth checking out, at least.
Consider using execnet, which allows you to combine the strengths of both Jython and CPython, including current NumPy. The disadvantage here is that you will have to pay for the cost of serializing/deserializing objects between the two interpreters in two different process spaces. (You can avoid network overhead by using its support for subprocess.) But such a combination may work well, given that you're considering JPype, which would have similar (and probably higher) overhead. Just ensure you've partitioned the work appropriately.
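To make the "partition the work" advice concrete, here is a minimal sketch of the two-process pattern using only the standard library. It is not execnet's API: the `Worker` class, the JSON-lines protocol, and `sum_in_worker` are all made up for illustration, and the child process is just another CPython standing in for a Jython process that would wrap the Java library. The point is that every call pays one serialize/deserialize round trip, so you want few calls carrying large batches.

```python
import json
import subprocess
import sys

# Stand-in for the "other" interpreter. In the setup discussed above this
# would be a Jython process wrapping the Java library; here a CPython child
# that sums each batch keeps the sketch runnable anywhere.
WORKER_SOURCE = r"""
import json, sys
for line in sys.stdin:
    batch = json.loads(line)
    sys.stdout.write(json.dumps(sum(batch)) + "\n")
    sys.stdout.flush()
"""


class Worker:
    """Long-lived child process speaking one JSON document per line."""

    def __init__(self, argv):
        self.proc = subprocess.Popen(
            argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
        )

    def call(self, payload):
        # One serialize/deserialize round trip per call, so send big batches.
        self.proc.stdin.write(json.dumps(payload) + "\n")
        self.proc.stdin.flush()
        return json.loads(self.proc.stdout.readline())

    def close(self):
        # Closing stdin ends the child's read loop, then we reap it.
        self.proc.stdin.close()
        self.proc.wait()


def sum_in_worker(numbers, batch_size=1000):
    """Hypothetical helper: ship the data to the worker in batches."""
    worker = Worker([sys.executable, "-c", WORKER_SOURCE])
    try:
        return sum(
            worker.call(numbers[i:i + batch_size])
            for i in range(0, len(numbers), batch_size)
        )
    finally:
        worker.close()
```

With a real Jython worker you would replace the `argv` passed to `Worker` and keep NumPy on the CPython side; raising `batch_size` reduces the number of round trips.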
The Jython developers (and I'm one of them) are looking at supporting NumPy in the future, via support of the C Extension API, but this is very much preliminary planning indeed.
I look very much forward to the Jython C Extension API! That would be awesome!
Until that point, I think you have two alternatives:
http://jepp.sourceforge.net/ for embedding Python in Java; it has a nice console. The disadvantage, for me too big a disadvantage, is that it needs to be compiled against your own Python. And with each Python upgrade, you have to recompile (I don't want to compile Python in order to compile and use the extension; it is also not possible, especially if the code should be executed on different machines, on a grid for example).
http://lucene.apache.org/pylucene/jcc/ - this is used for Lucene and for many other projects. I personally use it to wrap the GATE NLP engine and also Solr, to make them available to Python. JCC is much faster than the (dead) JPype, probably because some data structures (like lists) are optimized and also because it interfaces Python<->Java via a C++ extension (according to this: http://www.slideshare.net/onyame/mixing-python-and-java page 30). I have tried moving 6 million integers in a list between Python and Java; JPype was orders of magnitude slower (but I don't remember the numbers).
However, using JCC, you can wrap only public methods, and sometimes it is tricky, especially if a method receives or returns certain Java objects (in short, JCC must also compile wrappers for the passed-in objects, otherwise all the methods using/returning such objects are not accessible). So unless you need to distribute your code, you are better off with JEPP.
Disclaimer: I have not had personal experience with it yet.
Seems like JyNI – Jython Native Interface is the way to go.
There's also a newer question posted which may have newer alternatives.
If you stick to vector and matrix maths, I suggest taking a look at vectorz.
It is a pure Java implementation and should be 100% usable from within Jython. I haven't tried it yet, but will soon, since I have the same need to find a numpy alternative.
I am trying to optimize the performance of some natural language processing in a python project I am currently working on. Basically I would like to outsource the computationally intensive parts to use apache OpenNLP, which is written in Java.
My question is what would be the recommended way to link Java functions/classes back to my python code? The three main ways I have thought about are
using C/C++ bindings in python and then embedding a JVM in my C program. This is what I am leaning towards because I am somewhat familiar writing C extensions to python, but using a triangle of languages where C only functions as an intermediary doesn't seem right somehow.
using Jython. My main concern with this is that CPython is the overwhelmingly popular python implementation as far as I know and I don't want to break compatibility with other collaborators or packages.
streaming input and output to the binaries that come with OpenNLP. Apache provides tokenizers and such as stand-alone binaries that you can pipe data to and from. This would probably be the easiest option to implement, but it also seems like the most crude.
I'm wondering if anyone who has experience interfacing python and java knows how much the performance is likely to differ between these options, and which one is "recommended" or considered best practice in such a situation - or of course if there is an entirely different way to do it that I haven't thought of.
I did search SO for existing answers and found this, but it's an answer from 3.5 years ago and mentions some projects that are either dead, hard to integrate/configure/install or still under development.
Some comments mentioned that the overhead for all three methods is likely to be insignificant compared to the time required to run the actual NLP code. This is probably true, but I'm still interested in what the answer is from a more general perspective.
Thanks!
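For what it's worth, the third option above (streaming to a stand-alone binary) is only a few lines from the Python side. The sketch below is an assumption-laden illustration: `FAKE_TOKENIZER` is a made-up stand-in (a tiny Python child process that splits on whitespace) so the example runs without OpenNLP installed, and `tokenize_lines` is a hypothetical helper name. You would substitute the actual command line of whatever tool you pipe to.

```python
import subprocess
import sys

# Hypothetical stand-in for an external tokenizer binary: a child process
# that whitespace-tokenizes each input line and prints the tokens separated
# by single spaces. Swap in the real command for your installation.
FAKE_TOKENIZER = [
    sys.executable, "-c",
    "import sys\nfor line in sys.stdin:\n    print(' '.join(line.split()))",
]


def tokenize_lines(lines, command=FAKE_TOKENIZER):
    """Pipe all lines through an external process in one batch."""
    result = subprocess.run(
        command,
        input="\n".join(lines) + "\n",
        capture_output=True,
        text=True,
        check=True,  # raise if the external tool exits with an error
    )
    # One output line per input line, tokens separated by whitespace.
    return [out.split() for out in result.stdout.splitlines()]
```

Sending the whole input in one `subprocess.run` call means the process start-up cost (JVM start-up, for a Java tool) is paid once per batch rather than once per sentence.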
Consider building a Java server with an existing language-independent RPC mechanism (Thrift, ...), and use Python as the RPC client to talk to the server. This keeps the two sides loosely coupled.
So... in an attempt to use preexisting wheels, rather than reinvent my own at every turn, I've been trying to get a decent Common Lisp environment working with [a particular Java's library]. My ABCL adventures actually went reasonably well and I was able, eventually, to get ABCL talking nicely to [it]. Of course I wanted more than just that, I wanted interoperability between the [it] and my half-round wheel, chemicl, a cheminformatics package I started writing in Common Lisp. This is where the train began to fall off the tracks.
ABCL and cxml-stp
A while back, in an earlier, aborted attempt to get some of my chem/bioinformatics (https://github.com/slyrus/cl-bio) stuff working with ABCL I noticed that plexippus-xpath couldn't be loaded into ABCL. This was fixed, so I was encouraged that things might work with ABCL. However, cxml-stp seems to break ABCL.
Hopefully this is a fixable bug and some future version of ABCL will work with cxml-stp.
In the meantime...
Other CL and Java
So, I figured I'd try some other approaches to getting Java and a Common Lisp implementation to play nice. I know, you're thinking "why doesn't the dude just use clojure? After all, that's what clojure was designed for!" Well, that's a good question. I did use clojure for some earlier explorations with [this Java library] and, while the java integration generally works well, I have a bunch of existing Common Lisp code I'd like to use and, at the time at least, it seemed like all of the clojure wrappers were thin wrappers around ugly Java libraries. I've grown to know and love many Common Lisp libraries, many of which are nicely available in QuickLisp, and I'd like to be able to use those (things like cxml-stp, plexippus-xpath, opticl, etc...).
Clozure Common-Lisp (CCL), for five years now, has shipped with a fully ported distribution of JFLI (JFLI previously depended on the LispWorks FFI) as a standard component of the "examples" provided with the CCL source distribution. JFLI (by Rich Hickey, creator of Clojure) uses an in-process model and will likely be at least an order of magnitude more performant than anything you might put together from the model employed by Hickey's next attempt, a more widely compatible socket-based solution he named FOIL.
Have a look at the following URL to browse the JFLI source code as it currently exists in the Clozure development trunk:
https://github.com/Clozure/ccl/tree/master/examples/jfli
Rich Hickey introduced JFLI with the following summary of the approach he had taken
(Substitute CCL's FFI where he references LW-FFI obviously):
My objective was to provide comprehensive, safe, dynamic and Lisp-y access to
Java and Java libraries as if they were Lisp libraries, for use in Lisp programs,
i.e. with an emphasis on working in Lisp rather than in Java.
The approach I took was to embed a JVM instance in the Lisp process using JNI. I
was able to do this using LispWorks' own FLI and no C (or Java! *) code, which
is a tribute to the LW FLI. On top of the JNI layer (essentially a wrapper
around the entire JNI API), I built this user-level API using Java Reflection.
From Clojure it is easy enough to use Java libraries...but what libraries does Clojure not have that are best done with Java?
It isn't easy to give a straightforward answer to this question, because it would first be necessary to define the difference between a Clojure library and a Java library. (Even more so, because Clojure is a Java library :))
Ok, let's start with a premise that a Clojure library is any library written in Clojure and simply ignore the Java code in Clojure implementation itself. But what if a given library uses some Java dependency, like say one of the Apache Commons libraries? Would it still qualify as a Clojure and not Java library?
My own criterion (and I am guessing yours, too) for the difference between the two is whether or not the library exposes a Clojure-style interface with namespaces, functions, sequences or a Java-style interface with classes, methods and collections.
It is almost trivial to write Clojure wrappers around such Java libraries. In my experience that is very useful if you want to fit in functionality of the library in overall functional design of your application. A simple example would be if you want to map a Java method against a sequence. You can either use an ad-hoc defined anonymous function to wrap the method call, or a named function from your wrapper layer. If you do such things very often the second approach may be more suited, at least for most commonly used methods.
So, my conclusion is that any Java library should be easy to convert to a Clojure library. All that is needed is to write a wrapper for it.
Another conclusion is that it may not be needed at all. If all you want is to call the method, you may still just call the method and avoid all the architecture astronautics. :)
One potential answer may be a bytecode library like ASM http://asm.ow2.org/
But honestly, with time, any library in Java can be written in Clojure. Even Java code that compiles to different bytecode can be replicated if Clojure uses ASM underneath.
I strongly prefer Clojure as a language for development in general, but there are several good reasons I have found for using Java libraries or writing Java code in preference to Clojure:
Leveraging mature Java libraries - some Java libraries are truly excellent and very mature. From a pragmatic perspective, you are much better off directly using Java libraries like Netty, Swing or Joda Time rather than trying to utilise or invent some Clojure alternative. Sometimes there are Clojure wrappers for these libraries but these are mostly still in a somewhat experimental / immature state.
High performance code - I do quite a lot of data and image processing where maximum performance is essential. This rules out pretty much any approach that adds overhead (such as lazy sequences, temporary object creation) so idiomatic Clojure won't fit the bill. You could probably get there with very unidiomatic Clojure (lots of tight imperative loops and primitive array manipulation for example...) but if you're going to write this kind of code it's often actually simpler and cleaner in Java.
APIs with mutable semantics - if the APIs you are relying upon depend upon mutable objects, Clojure code to interface with these APIs can become a bit ugly and unidiomatic. Sometimes writing Java in these cases is simpler.
The good news is that because the interoperability between Clojure and Java is so good, there isn't really any issue with mixing Clojure and Java code in a project. As a result, most of my projects are a mix of Clojure and Java code - I use whichever one is most appropriate for the task at hand.
Libraries for building GUIs come to mind.
Lots of APIs. In fact, Clojure itself is built on top of many sturdy Java APIs like the java.util.Collection API. And well-known Clojure APIs like Incanter are built on top of libraries like Parallel Colt and JFreeChart.
I can't find the quote at the moment, but Rich said something to the effect of "clojure should use java where possible" and not wrap Java unnecessarily. The principle being to embrace the Java platform instead of fighting it. So the general advice becomes:
If a good java library exists use it, if not write one in clojure.
I'm about to port a smallish library from Java to Python and wanted some advice (smallish ~ a few thousand lines of code). I've studied the Java code a little, and noticed some design patterns that are common in both languages. However, there were definitely some Java-only idioms (singletons, etc) present that are generally not-well-received in Python-world.
I know at least one tool (j2py) exists that will turn a .java file into a .py file by walking the AST. Some initial experimentation yielded less than favorable results.
Should I even be considering using an automated tool to generate some code, or are the languages different enough that any tool would create enough re-work to have justified writing from scratch?
If tools aren't the devil, are there any besides j2py that can at least handle same-project import management? I don't expect any tool to match 3rd party libraries from one language to a substitute in another.
If it were me, I'd consider doing the work by hand. A couple thousand lines of code isn't a lot of code, and by rewriting it yourself (rather than translating it automatically), you'll be in a position to decide how to take advantage of Python idioms appropriately. (FWIW, I worked in Java almost exclusively for 9 years, and I'm now working in Python, so I know the kind of translation you'd have to do.)
Code is always better the second time you write it anyway....
Plus a few thousand lines of Java can probably be translated into a few hundred of Python.
Have a look at Jython. It can fairly seamlessly integrate Python on top of Java, and provide access to Java libraries but still let you act on them dynamically.
Automatic translators (f2c, j2py, whatever) normally emit code you wouldn't want to touch by hand. This is fine when all you need to do is use the output (for example, if you have a C compiler and no Fortran compiler, f2c allows you to compile Fortran programs), but terrible when you need to do anything to the code afterwards. If you intend to use this as anything other than a black box, translate it by hand. At that size, it won't be too hard.
I would write it again by hand. I don't know of any automated tools that would generate non-disgusting looking Python, and having ported Java code to Python myself, I found the result was both higher quality than the original and considerably shorter.
You gain quality because Python is more expressive (for example, anonymous inner class MouseAdapters and the like go away in favor of simple first class functions), and you also gain the benefit of writing it a second time.
It also is considerably shorter: for example, 99% of getters/setters can just be left out in favor of directly accessing the fields. For the other 1% which actually do something you can use property().
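As a rough illustration of those two points, here is how it tends to look after a port. All class names below (`Point`, `Temperature`, `Button`) are made up for the example, and I've used the `@property` decorator spelling, which is the same mechanism as calling `property()` directly.

```python
class Point:
    """Trivial Java getters/setters are simply dropped: plain attributes."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class Temperature:
    """The rare accessor that actually does something becomes a property."""

    def __init__(self, celsius=0.0):
        self.celsius = celsius  # goes through the property setter below

    @property
    def celsius(self):
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        # Validation that a Java setCelsius() would have carried.
        if value < -273.15:
            raise ValueError("below absolute zero")
        self._celsius = float(value)


class Button:
    """A listener is just a callable; no adapter class is needed."""

    def __init__(self):
        self.on_click = None

    def click(self):
        if self.on_click is not None:
            self.on_click(self)
```

Callers write `point.x = 5` and `temp.celsius = 25` either way, so a plain attribute can later be upgraded to a property without touching the call sites. The listener is wired up with `button.on_click = some_function` or a lambda.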
However as David mentioned, if you don't ever need to read or maintain the code, an automatic translator would be fine.
Jython's not what you're looking for in the final solution, but it will make the porting go much smoother.
My approach would be:
If there are existing tests (unit or otherwise), rewrite them in Jython (using Python's unittest)
Write some characterization tests in Jython (tests that record the current behavior)
Start porting class by class:
For each class, subclass it in Jython and port the methods one by one, making the method in the superclass abstract
After each change, run the tests!
You'll now have working Jython code that hopefully has minimal dependencies on Java.
Run the tests in CPython and fix whatever's left.
Refactor - you'll want to Pythonify the code, probably simplifying it a lot with Python idioms. This is safe and easy because of the tests.
I've done this in the past with great success.
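A characterization test from the steps above might look roughly like this. Everything here is hypothetical: `legacy_format_name` stands in for a call into the original Java class (which you'd import directly under Jython), `ported_format_name` is the new pure-Python version, and the `RECORDED` table holds outputs captured by running the existing code, not values derived from a spec. The code sticks to plain `unittest` features so it should run under both Jython and CPython.

```python
import unittest


def legacy_format_name(first, last):
    # Stand-in for the code being ported. In the workflow above this would
    # call the original Java class from Jython.
    return (last.strip().upper() + ", " + first.strip()).strip(", ")


def ported_format_name(first, last):
    # The new pure-Python implementation under test.
    parts = [last.strip().upper(), first.strip()]
    return ", ".join(p for p in parts if p)


# Recorded by running the existing implementation once. A characterization
# test pins down what the code does today, right or wrong.
RECORDED = {
    ("Ada", "Lovelace"): "LOVELACE, Ada",
    (" Ada ", "lovelace"): "LOVELACE, Ada",
    ("", "Hopper"): "HOPPER",
}


class CharacterizationTest(unittest.TestCase):
    # staticmethod keeps the function from being bound as a method.
    implementation = staticmethod(legacy_format_name)

    def test_matches_recorded_behavior(self):
        for args, expected in RECORDED.items():
            actual = self.implementation(*args)
            self.assertEqual(actual, expected, "inputs: %r" % (args,))


class PortedTest(CharacterizationTest):
    # Same recorded expectations, pointed at the port.
    implementation = staticmethod(ported_format_name)
```

Running the module with `python -m unittest` exercises both classes; once the port passes, the legacy class and its test can be deleted.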
I've used Java2Python. It's not too bad; you still need to understand the code, as it doesn't do everything correctly, but it does help.
I found this open-source library that I want to use in my Java application. The library is written in C and was developed under Unix/Linux, and my application will run on Windows. It's a library of mostly mathematical functions, so as far as I can tell it doesn't use anything that's platform-dependent, it's just very basic C code. Also, it's not that big, less than 5,000 lines.
What's the easiest way to use the library in my application? I know there's JNI, but that involves finding a compiler to compile the library under Windows, getting up-to-date with the JNI framework, writing the code, etc. Doable, but not that easy. Is there an easier way? Considering the small size of the library, I'm tempted to just translate it to Java. Are there any tools that can help with that?
EDIT
I ended up translating the part of the library that I needed to Java. It's about 10% of the library so far, though it'll probably increase with time. C and Java are pretty similar, so it only took a few hours. The main difficulty is fixing the bugs that get introduced by mistakes in the translation.
Thank you everyone for your help. The proposed solutions all seemed interesting and I'll look into them when I need to link to larger libraries. For a small piece of C code, manual translation was the simplest solution.
On the Java GNU Scientific Library project I used Swig to generate the JNI wrapper classes around the C libraries. Great tool, and can also generate wrapper code in several languages including Python. Highly recommended.
Your best bet is probably to grab a good C book (K&R: The C Programming Language) and a cup of tea, and start translating! I would be skeptical about trusting a translation program; more often than not the best translator is yourself! If you do this once, then it's done and you don't need to keep redoing it. There might be some complications if the library is open source; you'll need to check the licence carefully about this. Another point to consider is that there is always going to be some element of risk and potential error in the translation, so it might be necessary to write some tests to ensure that the translation is correct.
Are there no equivalent Java math functions?
As you yourself comment, the JNI way is possible. As for a C compiler, 'Bloodshed Dev-C++' would probably work, but it is a lot of effort for ~5000 lines.
I'd compile it and use JNA.
JNA (Java Native Access) basically does at runtime what JNI does at compile time, and doesn't need any non-Java code (and not much Java either).
I don't know about its performance or usability in your case but I'd give it a try.
Are you sure you want to use the C library, even if it is that small?
Once 64 bit gets a little more common, you'll need to start building/deploying both 32 bit and 64 bit versions of the library as well. And depending on what the C code is like, you may or may not need to update the code to make it build as 64 bit.
If the C library is simple, it may be easier to just port the C library to pure java and not have to deal with building/deploying a JNI library, the C library and the java code.
Well, there is AMPC. It is a C compiler for Windows, Mac OS X and Linux that can compile C code into Java bytecode (the kind of code that runs on a Java virtual machine).
However, it is commercial and costs $199 per license. I doubt that pays off for you ;) I don't know of any free compiler like that.
OTOH, Java and C are pretty similar. You could probably refactor the C code to Java: structs can be replaced with objects with public instance variables, and pointer operations can usually be translated to something else (array operations, for example). Though I guess you don't want to go through 5,000 lines of code, do you?
Using JNI makes the code platform dependent; however, if you say it is platform-independent C, there is no reason why your Java code should be platform dependent. OTOH, depending on how costly these calculations are, using JNI might actually buy you a performance gain, since when it comes to raw number-crunching throughput, C can still beat Java in speed. However, JNI calls are very costly, so if the calculation is just a very simple, quick one, the JNI call itself might take as long as (or even longer than) the calculation performed, in which case using JNI will buy you nothing but a slower app and extra memory overhead.
Indeed, JNA looks impressive, it requires less effort than directly using JNI. But in any case you'd lose the platform independence, and since you're probably only using a small part of it, you might consider translating what you actually need.
Have you tried using:
System.loadLibrary("mylibrary");  // name without the ".dll" extension; looked up on java.library.path
Not sure if this will work with a pure C library but it's probably worth a shot. :)