Porting library from Java to Python

I'm about to port a smallish library from Java to Python and wanted some advice (smallish ~ a few thousand lines of code). I've studied the Java code a little and noticed some design patterns that are common to both languages. However, there were definitely some Java-only idioms (singletons, etc.) that are generally not well received in the Python world.
I know at least one tool (j2py) exists that will turn a .java file into a .py file by walking the AST. Some initial experimentation yielded less-than-favorable results.
Should I even be considering an automated tool to generate some code, or are the languages different enough that any tool would create so much rework that writing from scratch would have been justified?
If tools aren't the devil, are there any besides j2py that can at least handle same-project import management? I don't expect any tool to map third-party libraries in one language to substitutes in the other.

If it were me, I'd consider doing the work by hand. A couple thousand lines of code isn't a lot of code, and by rewriting it yourself (rather than translating it automatically), you'll be in a position to decide how to take advantage of Python idioms appropriately. (FWIW, I worked in Java almost exclusively for 9 years, and I'm now working in Python, so I know the kind of translation you'd have to do.)

Code is always better the second time you write it anyway...
Plus, a few thousand lines of Java can probably be translated into a few hundred lines of Python.

Have a look at Jython. It can fairly seamlessly integrate Python on top of Java, and provide access to Java libraries but still let you act on them dynamically.

Automatic translators (f2c, j2py, whatever) normally emit code you wouldn't want to touch by hand. This is fine when all you need to do is use the output (for example, if you have a C compiler and no Fortran compiler, f2c allows you to compile Fortran programs), but terrible when you need to do anything to the code afterwards. If you intend to use this as anything other than a black box, translate it by hand. At that size, it won't be too hard.

I would write it again by hand. I don't know of any automated tools that would generate non-disgusting looking Python, and having ported Java code to Python myself, I found the result was both higher quality than the original and considerably shorter.
You gain quality because Python is more expressive (for example, anonymous inner class MouseAdapters and the like go away in favor of simple first class functions), and you also gain the benefit of writing it a second time.
It also is considerably shorter: for example, 99% of getters/setters can just be left out in favor of directly accessing the fields. For the other 1% which actually do something you can use property().
However, as David mentioned, if you don't ever need to read or maintain the code, an automatic translator would be fine.

Jython's not what you're looking for in the final solution, but it will make the porting go much smoother.
My approach would be:
1. If there are existing tests (unit or otherwise), rewrite them in Jython (using Python's unittest).
2. Write some characterization tests in Jython (tests that record the current behavior).
3. Start porting class by class:
   - For each class, subclass it in Jython and port the methods one by one, making the method in the superclass abstract.
   - After each change, run the tests!
4. You'll now have working Jython code that hopefully has minimal dependencies on Java.
5. Run the tests in CPython and fix whatever's left.
6. Refactor: you'll want to Pythonify the code, probably simplifying it a lot with Python idioms. This is safe and easy because of the tests.
I've done this in the past with great success.

I've used Java2Python. It's not too bad; you still need to understand the code, as it doesn't do everything correctly, but it does help.

GUI in Java, Backend in SML?

I'm a big fan of functional programming languages (namely Standard ML and its dialects), mainly because of their expressiveness, which allows for very concise, clean code. I can solve many problems dramatically faster with ML than with, say, Java.
However, Java is really great when it comes to programming GUIs (-> SWT). I would definitely not want to do that in a functional language.
This brings us to my actual question: Is there a good way to write a program in ML and then wrap it with a GUI written in Java?
What I have come up with so far is the following:
Compile the ML program (e.g. with MLton or Poly/ML) and execute the binary as an external program from Java (http://www.rgagnon.com/javadetails/java-0014.html).
Problem: The only way the frontend and backend can communicate is via strings. This might require tons of (difficult) encoding/decoding.
Use JNI/JNA. From what I read, this will allow you to transfer integers, arrays, etc. I think the external programs have to be written in C/C++ for this to work. With MLton's Foreign Function Interface I can write an interface to my functional program in C and statically link the whole thing.
Problem: Apparently, this only works with dynamic libraries, that is, DLLs on Windows. However, MLton will only let me compile the ML/C program to an executable. When trying to create a DLL, I get a whole bunch of errors.
Does anyone have experience with this? Is there a better way to do this?
Thanks in advance! -Steffen
EDIT: I know about Scala, which tries to bring concepts from functional programming to Java. I have tried it, but I don't think it can compete with an actual functional programming language (in terms of expressiveness).
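The encoding/decoding in the first option need not be difficult if both sides agree on a simple line-oriented format. A minimal Java sketch of the front-end side follows, assuming a hypothetical backend binary that reads one line of space-separated integers from standard input and replies with one such line on standard output; MlBridge and its methods are illustrative names, not an existing library.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical front-end helper for a line-based protocol with an external ML binary.
final class MlBridge {
    // Encode an int array as one line of space-separated decimal numbers.
    static String encode(int[] values) {
        return Arrays.stream(values)
                .mapToObj(v -> Integer.toString(v))
                .collect(Collectors.joining(" "));
    }

    // Decode one such line back into an int array (a blank line means an empty array).
    static int[] decode(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new int[0];
        }
        return Arrays.stream(trimmed.split("\\s+"))
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    // Start the backend, send one request line, read one reply line.
    // The command (e.g. the path to an MLton-compiled binary) is an assumption.
    static int[] call(String[] command, int[] request) throws IOException {
        Process process = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT) // show backend errors on our stderr
                .start();
        try (BufferedWriter toBackend = new BufferedWriter(
                     new OutputStreamWriter(process.getOutputStream(), StandardCharsets.UTF_8));
             BufferedReader fromBackend = new BufferedReader(
                     new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            toBackend.write(encode(request));
            toBackend.newLine();
            toBackend.flush();
            String reply = fromBackend.readLine();
            if (reply == null) {
                throw new IOException("backend exited without replying");
            }
            return decode(reply);
        } finally {
            process.destroy();
        }
    }
}
```

On the ML side, the backend would read a line, compute, print its reply followed by a newline and flush its output. Anything richer than flat lists of numbers needs a richer format, such as the YXML approach described in one of the answers.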

That's not quite the exact answer, but there is a functional language for the JVM that is very ML-oriented: Yeti.
So if you like coding in ML, then that's probably the closest you can currently get on the JVM, and it of course integrates very well with all the Java APIs.

Is there a good way to write a program in ML and then wrap it with a GUI written in Java?
I don't know if this is a good way for small applications, but it is definitely a way, one that works for big IDE style stuff: Isabelle/ML vs. Isabelle/Scala/JVM. This is an application of interactive theorem proving, but plain SML programming is a trivial instance of that, in a sense.
So you can write basic Isabelle/ML code that emits some messages in the manner of the old-fashioned REPL, but the output can be interpreted by GUI components on the JVM side. Isabelle/jEdit does that routinely for pretty-printing of colored text, with a tiny little bit of rich text (sub/superscripts and bold).
Concerning the explicit encoding of functional values as strings over pipes/sockets: that turns out to be quite simple in Isabelle/ML/Scala, due to some imitation of the way SML would represent typed values in untyped memory, but using untyped XML trees instead of bits. The XML transfer syntax is specialized to keep things simple: YXML instead of the official quasi-human-readable XML. All of that fits into approx. 8000 bytes of SML source. I am tempted to post the sources here, but you are better off searching the web for "Isabelle YXML" or "YXML PIDE".
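To make the YXML idea more concrete, here is a small Java sketch of the encoding direction, based on my understanding of the published format: two control characters, X (code 5) and Y (code 6), delimit markup, so an element is written as X Y name, then Y key=value for each attribute, then X, followed by its body and finally X Y X, while text is copied verbatim. The Tree, Elem, Text and Yxml classes are hypothetical; this is not Isabelle's own implementation, and decoding is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical untyped XML tree: either an element or a piece of text.
interface Tree {}

final class Text implements Tree {
    final String content;
    Text(String content) { this.content = content; }
}

final class Elem implements Tree {
    final String name;
    final Map<String, String> attributes;
    final List<Tree> body;
    Elem(String name, Map<String, String> attributes, List<Tree> body) {
        this.name = name;
        this.attributes = new LinkedHashMap<>(attributes); // preserves the given map's iteration order
        this.body = new ArrayList<>(body);
    }
}

final class Yxml {
    static final char X = (char) 5;
    static final char Y = (char) 6;

    // Encode a list of trees (an XML "body") as one YXML string.
    static String encode(List<Tree> body) {
        StringBuilder out = new StringBuilder();
        for (Tree t : body) {
            encodeTree(out, t);
        }
        return out.toString();
    }

    private static void encodeTree(StringBuilder out, Tree t) {
        if (t instanceof Text) {
            out.append(((Text) t).content); // text is copied verbatim
        } else {
            Elem e = (Elem) t;
            out.append(X).append(Y).append(e.name); // start of element
            for (Map.Entry<String, String> a : e.attributes.entrySet()) {
                out.append(Y).append(a.getKey()).append('=').append(a.getValue());
            }
            out.append(X);
            for (Tree child : e.body) {
                encodeTree(out, child);
            }
            out.append(X).append(Y).append(X); // end of element
        }
    }
}
```

The format relies on ordinary text never containing those two control characters, which is why no escaping is needed and a decoder can simply split on them.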
Since Scala/JVM alone has been mentioned as a standalone alternative: it definitely works, and Scala is also very powerful and flexible in imitating many programming styles (higher-order functional, object-oriented), but for sophisticated symbolic applications like theorem proving, it just won't reach the purity and stability of SML. (Note that the underlying SML platform here is Poly/ML.)

Using NumPy and CPython with Jython

I must use a commercial Java library, and would like to do it from Python. Jython is robust and I am fine with it being a few dot releases behind. However, I would like to use NumPy as well, which obviously does not work with Jython. Options like JPype and Java numeric libraries are unappealing. The former is essentially dead. The latter are mostly immature and lack the ease of use and wide acceptance of NumPy. My question is: how can one have Jython and Python code interoperate? It would be acceptable for me to call Jython from CPython or the other way around.

It's ironic that there's no really good way to use numpy in Jython, considering that Jython and Numeric (NumPy's ancestor) were initiated by the same developer (Jim Hugunin, who then moved on to also initiate IronPython and now holds some kind of senior architect position at Microsoft, working on all kinds of dynamic language support for .NET and Silverlight). The closest thing to that which I know of is the "jnumerical" project: the (scarce) docs are on SourceForge, but the updated sources are on Bitbucket.
"Numeric Python", what jnumerical implements, is not as slick and streamlined as its numpy descendant, but it has about the same functionality and shares a lot of the concepts and philosophy, so maybe you could find it usable -- worth checking out, at least.

Consider using execnet, which allows you to combine the strengths of both Jython and CPython, including current NumPy. The disadvantage here is that you will have to pay the cost of serializing/deserializing objects between the two interpreters in two different process spaces. (You can avoid network overhead by using its support for subprocess.) But such a combination may work well, given that you're considering JPype, which would have similar (and probably higher) overhead. Just ensure you've partitioned the work appropriately.
The Jython developers (and I'm one of them) are looking at supporting NumPy in the future, via support of the C Extension API, but this is very much preliminary planning indeed.

I look very much forward to the Jython C Extension API! That would be awesome!
Until then, I think you have two alternatives:
http://jepp.sourceforge.net/ for embedding Python in Java; it has a nice console. The disadvantage (for me, too big a disadvantage) is that it needs to be compiled against your own Python, and with every Python upgrade you have to recompile. (I don't want to compile Python in order to compile and use the extension; it is also not always possible, especially if the code should be executed on different machines, on a grid for example.)
http://lucene.apache.org/pylucene/jcc/ - this is used for Lucene and for many other projects. I personally use it to make the GATE NLP engine and Solr available to Python. JCC is much faster than the (dead) JPype, probably because some data structures (like lists) are optimized and also because it interfaces Python<->Java via a C++ extension (according to http://www.slideshare.net/onyame/mixing-python-and-java, page 30). I have tried moving 6 million integers in a list between Python and Java, and JPype was orders of magnitude slower (but I don't remember the numbers).
However, with JCC you can wrap only public methods, and sometimes it is tricky, especially if a method receives or returns certain Java objects (in short, JCC must also compile wrappers for the passed-in objects; otherwise all the methods using/returning such objects are not accessible). So unless you need to distribute your code, you are better off with JEPP.

Disclaimer: I have not had personal experience with it yet.
Seems like JyNI – Jython Native Interface is the way to go.
There's also a newer question posted which may have newer alternatives.

If you stick to vector and matrix maths, I suggest having a look at vectorz.
It is a pure Java implementation and should be 100% usable from within Jython. I haven't tried it yet, but will soon, since I have the same need to find a NumPy alternative.

Are there compelling reasons not to use Groovy?

I'm developing a LoB application in Java after a long absence from the platform (having spent the last 8 years or so entrenched in Fortran, C, a smidgin of C++ and latterly .Net).
Java, the language, is not much changed from how I remember it. I like its strengths and I can work around its weaknesses. The platform has grown, and deciding among the myriad of different frameworks that appear to do much the same thing as one another is a different story, but that can wait for another day. All in all, I'm comfortable with Java. However, over the last couple of weeks I've become enamoured with Groovy, purely from a selfish point of view, but not just because it makes development against the JVM a more succinct and entertaining (and, well, "groovy") proposition than Java (the language).
What strikes me most about Groovy is its inherent maintainability. We all (I hope!) strive to write well documented, easy to understand code. However, sometimes the languages we use themselves defeat us. An example: in 2001 I wrote a library in C to translate EDIFACT EDI messages into ANSI X12 messages. This is not a particularly complicated process, if slightly involved, and I thought at the time I had documented the code properly - and I probably had - but some six years later when I revisited the project (and after becoming acclimatised to C#) I found myself lost in so much C boilerplate (mallocs, pointers, etc. etc.) that it took three days of thoughtful analysis before I finally understood what I'd been doing six years previously.
This evening I've written about 2000 lines of Java (it is the day of rest, after all!). I've documented as best I know how, but of those 2000 lines of Java a significant proportion is Java boilerplate.
This is where I see Groovy and other dynamic languages winning through - maintainability and later comprehension. Groovy lets you concentrate on your intent without getting bogged down on the platform specific implementation; it's almost, but not quite, self documenting. I see this as being a huge boon to me when I revisit my current project (which I'll port to Groovy asap) in several years time and to my successors who will inherit it and carry on the good work.
So, are there any reasons not to use Groovy?

There are two reasons I can think of not to use Groovy (or Jython, or JRuby):
If you really, truly need performance
If you will miss static type checking
Those are both big ifs. Performance is probably less of a factor in most apps than people think, and static type checking is a religious issue. That said, one strength of all of these languages is their ability to mix and match with native Java code. Best of both worlds and all that.
Since I'm not responsible for your business, I say "Go for it".

If you use Groovy, you're basically throwing away useful information about types. This leaves your code "groovy": nice and concise.
Bird b
becomes
def b
Plus you get to play with all the meta-class stuff and dynamic method calls which are a torture in Java.
However -- and yes I have tried IntelliJ, Netbeans and Eclipse extensively -- serious automatic refactoring is not possible in Groovy. It's not IntelliJ's fault: the type information just isn't there. Developers will say, "but if you have unit tests for every single code path (hmmmm), then you can refactor more easily." But don't believe the hype: adding more code (unit tests) will add to the safety of massive refactoring, but they don't make the work easier. Now you have to hand fix the original code and the unit tests.
So this means that you don't refactor as often in Groovy, especially when a project is mature. While your code will be concise and easy to read, it will not be as brilliant as code that has been automatically refactored daily, hourly and weekly.
When you realize that a concept represented by a class in Java is no longer necessary, you can just delete it. In Eclipse or Netbeans or whatever, your project hierarchy lights up like a Christmas tree, telling you exactly what you've screwed up with this change. def thing tells the compiler (and therefore your IDE) nothing about how a variable will be used, whether the method exists, etc. And IDEs can only do so much guessing.
In the end, Java code is filled with "boilerplate," but it's been kneaded into its final form after many refactorings. And to me, that's the only way to get high-quality, readable code for future programmers, including when those future programmers are you-in-the-future.
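In plain Java terms, the refactoring argument is roughly the difference between an ordinary method call and a call resolved by name at run time. The sketch below uses reflection purely as an analogy for an untyped call (Groovy's own dispatch goes through its metaobject protocol, not through this code); Bird and Calls are illustrative names.

```java
import java.lang.reflect.Method;

// Hypothetical domain class standing in for the Bird in the answer above.
class Bird {
    public String fly() { return "flap"; }
}

final class Calls {
    // Statically typed call: if Bird.fly() is renamed or deleted, this line no longer
    // compiles, which is what lets an IDE find and rewrite every call site.
    static String typed(Bird b) {
        return b.fly();
    }

    // Rough analogue of a call on an untyped variable: the method is looked up by
    // name at run time, so a missing method is only reported when this line runs.
    static String dynamic(Object target, String methodName) throws ReflectiveOperationException {
        Method m = target.getClass().getMethod(methodName);
        return (String) m.invoke(target);
    }
}
```

Renaming fly() breaks typed() at compile time, so the IDE can point straight at it, while dynamic(bird, "fly") keeps compiling and only fails with NoSuchMethodException when it actually runs.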

Two reasons why Scala might be a compelling alternative to Groovy:
Performance on par with Java
Static typing without clutter

One of the biggest things you lose when you use dynamic languages, especially in a large codebase, is the ability to use an IDE to refactor. Languages that allow dynamically adding code to objects simply can't be parsed by today's IDEs to allow the kind of easy refactoring methods you can get from Eclipse, etc. for Java, C++, etc.
It's not really a case of "Dynamic languages are better than Static". Use what's best for you. The really cool thing about Groovy in particular is you can mix and match Java and Groovy in the same project, and it all runs on the VM. Yes, Scala is another example.

I think the biggest issue is the lack of IDE support compared to Java; however, the plugins for Eclipse and NetBeans are getting better all the time. Also, if I remember correctly, Groovy does not support anonymous inner classes, if you really need them for some reason. I would personally choose Groovy any time, though.

How to rewrite or convert C# code in Java code?

I started writing a client-server application using .NET (C#) for both the client and server side.
Unfortunately, my company refuses to pay for a Windows licence on the server box, meaning that I need to rewrite my code in Java or go the Mono way.
Is there any good way to translate C# code into Java? The server application uses no .NET-specific features, only cross-language tools like Spring.net, Hibernate.net and log4net.
Thanks.

I'd suggest building for Mono. You'll run into some gray area, but overall it's great. However, if you want to build for Java, you might check out Grasshopper. It's a commercial product, but it claims to be able to translate CIL (the output of the C# compiler) to Java bytecodes.

Possible solutions aside, direct translations of programs written in one language to a different language is generally considered a Bad Idea™ -- especially if this translation is done in some automated fashion. Even when done by a "real" programmer, translating an application line by line often results in a less than desirable end result because each language has its own idioms, strengths and weaknesses that require things be done in a slightly different way.
As painful as it may be, it's probably in your best interest and those who have to maintain this application to rewrite it in Java if that's what your employer requires.

I only know of the other way around: db4o is developed in Java, and the C# version is generated from the Java sources automatically.

There is no good way. My recommendation is to start over in Java, or like you said use Mono.

Although I think the first mistake was choosing an implementation language without ensuring a suitable deployment environment, there's nothing that can be done about that now. I would think the Mono way would be better. Having to rewrite code would only increase the cost of the project, especially if you already have a good amount of code written in C#. I, personally, try to avoid rewriting code whenever possible.

Java and C# are pretty close in syntax and semantics. The real problem is the little differences. They will bite you when you don't expect it.
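Two examples of the kind of little difference meant here, as a hedged Java sketch (PortingPitfalls is an illustrative name): string comparison with ==, and the signedness of byte.

```java
// Two places where Java and C# look alike but behave differently.
final class PortingPitfalls {
    // In C#, == on two strings compares their contents; in Java it compares references.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    // The Java way to compare string contents.
    static boolean sameContents(String a, String b) {
        return a.equals(b);
    }

    // C#'s byte is unsigned (0..255); Java's byte is signed (-128..127).
    // Masking with 0xFF recovers the unsigned value a C# byte would hold.
    static int unsignedValue(byte b) {
        return b & 0xFF;
    }
}
```

A line-by-line translation of C# code such as if (name == "admin") therefore still compiles in Java but compares references, so it may evaluate to false even when the contents match.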

Grasshopper is really the best solution at this time, if the licensing works for you (the free version has some significant limitations). It's completely based on the Mono class libraries (which are actually pretty good), but runs on top of standard Java VMs. That's good, as the Java VMs are generally a bit faster and more stable than Mono, in my experience. However, it does have more weaknesses than Mono when it comes to Forms/Graphics-related APIs, as much of this hasn't been ported to Java from the Mono VM.
In the cases where it works, it can be wonderful, though. The performance is sometimes even better than when running the same code on MS's VM on Windows. :)

From a maintenance standpoint, I would say rewrite the code. It's going to bring the initial cost of the project up, but it will be less labor-intensive later for whoever is looking at the code. As previous posters stated, anything automated like this can't do as good a job as a "real" programmer, and line-by-line conversion won't help much either. You don't want to produce code that works but is hell to maintain.

What's the easiest way to use C source code in a Java application?

I found this open-source library that I want to use in my Java application. The library is written in C and was developed under Unix/Linux, and my application will run on Windows. It's a library of mostly mathematical functions, so as far as I can tell it doesn't use anything that's platform-dependent, it's just very basic C code. Also, it's not that big, less than 5,000 lines.
What's the easiest way to use the library in my application? I know there's JNI, but that involves finding a compiler to compile the library under Windows, getting up-to-date with the JNI framework, writing the code, etc. Doable, but not that easy. Is there an easier way? Considering the small size of the library, I'm tempted to just translate it to Java. Are there any tools that can help with that?
EDIT
I ended up translating the part of the library that I needed to Java. It's about 10% of the library so far, though it'll probably increase with time. C and Java are pretty similar, so it only took a few hours. The main difficulty is fixing the bugs that get introduced by mistakes in the translation.
Thank you everyone for your help. The proposed solutions all seemed interesting and I'll look into them when I need to link to larger libraries. For a small piece of C code, manual translation was the simplest solution.

On the Java GNU Scientific Library project I used Swig to generate the JNI wrapper classes around the C libraries. Great tool, and can also generate wrapper code in several languages including Python. Highly recommended.

Your best bet is probably to grab a good C book (K&R, The C Programming Language) and a cup of tea, and start translating! I would be skeptical about trusting a translation program; more often than not, the best translator is yourself! If you do it once, then it's done and you don't need to keep redoing it. There might be some complications if the library is open source; you'll need to check the licence carefully. Another point to consider is that there is always going to be some element of risk and potential error in the translation, so it might be necessary to write some tests to ensure that the translation is correct.
Are there no equivalent Java Math functions?
As you yourself comment, the JNI way is possible; as for a C compiler, 'Bloodshed Dev-C++' might work, but it is a lot of effort for ~5000 lines.
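A sketch of the kind of test suggested above, assuming the reference values come from running the original C code on the same inputs. polyEval is a hypothetical example of a ported routine, not a function from the library in question, and the tolerance check is one common choice rather than the only one.

```java
// Hypothetical port of a small C routine, plus a tolerance check for comparing it
// against reference values produced by the original C code.
final class PortCheck {
    // Horner evaluation of c[0] + c[1]*x + ... + c[n-1]*x^(n-1), as it might be
    // translated from a C function taking (const double *c, int n, double x).
    static double polyEval(double[] c, double x) {
        double result = 0.0;
        for (int i = c.length - 1; i >= 0; i--) {
            result = result * x + c[i];
        }
        return result;
    }

    // Relative tolerance (absolute near zero): C and Java floating-point results can
    // differ in the last bits, so exact equality is usually too strict.
    static boolean close(double expected, double actual, double relTol) {
        return Math.abs(expected - actual) <= relTol * Math.max(1.0, Math.abs(expected));
    }
}
```

In practice you would loop over a table of (input, expected) pairs recorded from the C build and call close() on each one.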

I'd compile it and use JNA.
JNA (Java Native Access) basically does at run time what JNI does at compile time, and it doesn't need any non-Java glue code (and not much Java either).
I don't know about its performance or usability in your case, but I'd give it a try.

Are you sure you want to use the C library, even if it is that small?
Once 64 bit gets a little more common, you'll need to start building/deploying both 32 bit and 64 bit versions of the library as well. And depending on what the C code is like, you may or may not need to update the code to make it build as 64 bit.
If the C library is simple, it may be easier to just port the C library to pure java and not have to deal with building/deploying a JNI library, the C library and the java code.
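If you do go the JNI route, one way to deal with the 32/64-bit issue is to ship both builds and pick one at run time. This is a minimal sketch assuming a hypothetical naming scheme (baseName + "32" or "64"); it keys off the os.arch system property, which as far as I know describes the architecture the JVM itself runs as, and that is what the native library has to match.

```java
// Hypothetical helper: choose the 32- or 64-bit build of a native library.
// The naming scheme ("mymath32" / "mymath64") is an assumption for illustration.
final class NativeVariant {
    static String libraryNameFor(String baseName, String osArch) {
        // Common 64-bit values of os.arch include "amd64", "x86_64" and "aarch64";
        // 32-bit x86 JVMs typically report "x86" or "i386".
        boolean is64Bit = osArch.contains("64");
        return baseName + (is64Bit ? "64" : "32");
    }

    // Loads e.g. mymath64.dll / libmymath64.so from java.library.path.
    static void load(String baseName) {
        System.loadLibrary(libraryNameFor(baseName, System.getProperty("os.arch")));
    }
}
```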

Well, there is AMPC. It is a C compiler for Windows, MacOS X and Linux, that can compile C code into Java Byte Code (the kind of code, that runs on a Java virtual machine).
However, it is commercial and costs $199 per license. I doubt that pays off for you ;) I don't know of any free compiler like that.
OTOH, Java and C are pretty similar. You could probably port the C code to Java: structs can be replaced with objects with public instance variables, and pointer operations can usually be translated to something else (array operations, for example). Though I guess you don't want to go through 5,000 lines of code, do you?
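To illustrate the rewrites this answer mentions, here is a hypothetical C fragment (shown in comments) and one possible Java translation: the struct becomes a class with public fields, pointer walking becomes an array plus an explicit offset, and a C-style output pointer becomes a returned object. None of this comes from the library in question.

```java
// C original (hypothetical):
//   typedef struct { double min, max; } range;
//   double sum(const double *p, int n) { double s = 0; while (n--) s += *p++; return s; }
//   void bounds(const double *p, int n, range *out);

// struct range -> a small class with public fields
final class Range {
    public double min;
    public double max;
}

final class Translated {
    // Pointer arithmetic (*p++) becomes an explicit offset into the array.
    static double sum(double[] a, int offset, int n) {
        double s = 0.0;
        for (int i = 0; i < n; i++) {
            s += a[offset + i];
        }
        return s;
    }

    // The output parameter (range *out) becomes a returned object; n must be at least 1.
    static Range bounds(double[] a, int offset, int n) {
        Range r = new Range();
        r.min = a[offset];
        r.max = a[offset];
        for (int i = 1; i < n; i++) {
            double v = a[offset + i];
            if (v < r.min) r.min = v;
            if (v > r.max) r.max = v;
        }
        return r;
    }
}
```

Passing (array, offset) pairs keeps call sites close to the C original, which makes a line-by-line comparison during review easier.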
Using JNI makes the code platform-dependent; however, if you say it is platform-independent C, there is no reason why your Java code should be platform-dependent. OTOH, depending on how costly these calculations are, using JNI might actually buy you a performance gain, since C can still beat Java in raw number-crunching throughput. However, JNI calls are very costly, so if the calculation is a very simple, quick one, the JNI call itself might take as long as (or even longer than) the calculation, in which case using JNI buys you nothing except a slower app and extra memory overhead.

Indeed, JNA looks impressive, it requires less effort than directly using JNI. But in any case you'd lose the platform independence, and since you're probably only using a small part of it, you might consider translating what you actually need.

Have you tried using:
System.loadLibrary("mylibrary"); // finds mylibrary.dll on Windows; pass the name without prefix or extension
Not sure if this will work with a pure C library but it's probably worth a shot. :)
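Note that System.loadLibrary expects the bare library name and adds the platform-specific prefix and extension itself, which System.mapLibraryName makes visible. Also, a native method declared in Java only links against a C function exported under the JNI naming convention, so a plain C library generally needs a thin JNI wrapper (or JNA) in between. A minimal sketch, with NativeMath and mylibrary as hypothetical names:

```java
// Hypothetical JNI binding for a C library called "mylibrary".
final class NativeMath {
    // This only links if the loaded library exports a matching JNI function
    // (Java_NativeMath_add for this class in the default package).
    static native double add(double a, double b);

    // The file name the JVM will search for, e.g. "mylibrary.dll" on Windows
    // or "libmylibrary.so" on Linux.
    static String expectedFileName(String name) {
        return System.mapLibraryName(name);
    }

    // Try to load the library from java.library.path; report failure instead of throwing.
    static boolean tryLoad(String name) {
        try {
            System.loadLibrary(name);
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }
}
```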
