I am trying to optimize the performance of some natural language processing in a Python project I am currently working on. Basically, I would like to outsource the computationally intensive parts to Apache OpenNLP, which is written in Java.
My question is: what would be the recommended way to link Java functions/classes back to my Python code? The three main ways I have thought about are:
using C/C++ bindings in Python and then embedding a JVM in my C program. This is what I am leaning towards because I am somewhat familiar with writing C extensions for Python, but using a triangle of languages where C only functions as an intermediary doesn't seem right somehow.
using Jython. My main concern with this is that CPython is the overwhelmingly popular Python implementation as far as I know, and I don't want to break compatibility with other collaborators or packages.
streaming input and output to the binaries that come with OpenNLP. Apache provides tokenizers and such as stand-alone binaries that you can pipe data to and from. This would probably be the easiest option to implement, but it also seems like the most crude.
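For reference, the streaming option can be sketched in a few lines of Python. This is only a rough illustration: `pipe_through` is a hypothetical helper, and the command used here is a stand-in Python one-liner rather than the real OpenNLP launcher, whose exact command line is not shown in this question.

```python
import subprocess
import sys


def pipe_through(command, text):
    """Send text to an external program's stdin and return what it writes to stdout."""
    result = subprocess.run(
        command,
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


# Stand-in for the real tool: a one-liner that splits on whitespace and prints
# one token per line. In practice `command` would be the OpenNLP launcher plus
# its arguments (an assumption here; check the OpenNLP CLI documentation).
STAND_IN_TOKENIZER = [
    sys.executable,
    "-c",
    "import sys; print('\\n'.join(sys.stdin.read().split()))",
]

tokens = pipe_through(STAND_IN_TOKENIZER, "Pierre Vinken will join the board").splitlines()
```

Each call starts a new process, so with a JVM-based tool it would probably pay to send many sentences per call rather than one at a time.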
I'm wondering if anyone who has experience interfacing python and java knows how much the performance is likely to differ between these options, and which one is "recommended" or considered best practice in such a situation - or of course if there is an entirely different way to do it that I haven't thought of.
I did search SO for existing answers and found this, but it's an answer from 3.5 years ago and mentions some projects that are either dead, hard to integrate/configure/install or still under development.
Some comments mentioned that the overhead for all three methods is likely to be insignificant compared to the time required to run the actual NLP code. This is probably true, but I'm still interested in what the answer is from a more general perspective.
Thanks!
Consider building a Java server with an existing language-independent RPC mechanism (Thrift, ...) and using Python as the RPC client to talk to the server. This keeps the two sides loosely coupled.
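A minimal sketch of the client/server split, under two loud assumptions: XML-RPC from the Python standard library is swapped in for Thrift (which would need generated stubs), and the server below is a Python stand-in for what would really be the Java process wrapping OpenNLP. The `tokenize` function and its name are hypothetical.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def tokenize(text):
    # Stand-in for the real NLP work the Java server would do.
    return text.split()


def start_stub_server():
    """Start a throwaway XML-RPC server on a free local port."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(tokenize, "tokenize")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_remote_tokenizer(text):
    """Python-side client: it only knows the URL and the method name."""
    server = start_stub_server()
    try:
        host, port = server.server_address
        client = ServerProxy(f"http://{host}:{port}/")
        return client.tokenize(text)
    finally:
        server.shutdown()
        server.server_close()
```

The point of the design is that the Python code depends only on the wire protocol, so the server can be replaced or moved to another machine without touching the client.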
Well, I've taken help of Google, Stackoverflow and whatever else I could find, did as much as I could, but it seems that I am unable to find out an exact answer! I have multiple queries, and I would love to have answers from the database-people as well as from the programmers and framework users.
Of the programming languages, I know C/C++, Java and Python. I have undertaken a CMS project that would require frequent Creates and Reads (the C and R of CRUD). The project would have at least 50k users. The project has been planned out from head to toe, and now I need to code it and take it live.
Well, I want to use Neo4j as my database, as its data representation model (nodes and relationships) is closest to the real project model. Now, Neo4j has bindings for various languages, and one of them is Python (though the Python bindings are quite old, and JPype hasn't been updated in ages). I was thinking of going for some Java-based framework, but dropped the idea as I personally haven't heard much about Java frameworks. One of my partners tells me to go for Zend (PHP), as it has some kind of functionality that lets us execute Java code. Won't this slow the code down? I mean, executing one language's code in another language...
So, it all comes to this:
1) Database: I would want to go for Neo4j. But does it hold up when scalability becomes a factor? (From what I could gather from Google, there are no scalability issues.)
2) What framework should I use with Neo4j? I would require a framework that is able to handle tons of requests and large data, as the users of the project would be creating and reading data a lot.
P.S.: I know it is a long question, but I couldn't put it in fewer words!
I can't speak about the scalability or suitability of Neo4J for your particular project.
However, I'd strongly advise you against trying to mix and match languages like Java and PHP. It's so much easier to stick to the best one for your particular task. I'd also strongly advise you against using JNI for anything unless you have no other option. Java is fast enough that you should almost never need JNI for performance.
That said, it's OK to run Neo4j in its "full server" mode and then have your PHP or Python application access it using some driver over the network. I just wouldn't recommend making an ugly hybrid of PHP and Java at your application layer.
Some decent Java frameworks you could check out include:
Spring
Google Guice with Sitebricks
Apache Struts 2
They're pretty standard in the industry and there are tons of good resources available on all of them.
In regards to the mini-question about language interoperability, Java provides the JNI interface, which allows the JVM and user code to make calls into other languages and vice versa. When the native code (e.g. C code called by Java, or Java called from C) runs, it is actually running in its natural environment, so there's no performance loss in terms of actual execution.
Neo4j as a standalone server also has a REST API: http://docs.neo4j.org/chunked/milestone/rest-api.html. If you can express your requests as single REST queries, there is no need to use the native embedded Neo4j. And if there is no need to use the embedded Neo4j, you can use any language of your choice.
Regarding scalability, Neo4j can recently be used on Azure, so it should be quite easy to scale. To learn more about how to scale Neo4j, go to this page on neo4j.org.
UPDATE: the newest version of Neo4j adds support for a new query language - http://blog.neo4j.org/2011/06/kiruna-stol-14-milestone-4.html.
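To make the REST route concrete, here is a rough Python sketch using only the standard library. The endpoint path, port, and the `{id}` parameter syntax are assumptions that match older Neo4j releases and differ in later ones, so check the REST documentation for your version. `build_cypher_request` and `run_cypher` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: a local Neo4j server exposing the legacy Cypher endpoint.
# The path differs between Neo4j releases.
CYPHER_URL = "http://localhost:7474/db/data/cypher"


def build_cypher_request(query, params=None, url=CYPHER_URL):
    """Build (but do not send) a POST request carrying one parameterised query."""
    body = json.dumps({"query": query, "params": params or {}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def run_cypher(query, params=None, url=CYPHER_URL):
    """Send the query and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(build_cypher_request(query, params, url)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Illustrative query only; the exact query syntax depends on the server version.
request = build_cypher_request("MATCH (n) WHERE id(n) = {id} RETURN n", {"id": 1})
```

Since this is plain HTTP and JSON, the same request could be issued from PHP, Python, or any other language with an HTTP client.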
I have a Ruby on Rails application and am thinking about porting it to Java. What are the things I should consider before that? How hard is that task in terms of changes required?
Any advice from the people who have walked this path is greatly appreciated.
Motivation:
I have two web applications using the same data. One is in Java, the other in Rails. As a result, they both have databases, and lots of stuff is sent back and forth and stored in copied tables. On top of that, it is extremely slow. I can't move the Java app to RoR, so I'm thinking about what it'll take to move RoR to Java (the JVM, that is).
If I were in your position, I would try running your Ruby code in JRuby which is an implementation of Ruby that runs on the JVM. It supports rails, which means you should be able to take your code and run it on the JVM.
Once that's done, you can start writing new features in Java, and it should work with your old code transparently. You can also begin the task of rewriting some of your code in Java, without breaking compatibility.
What's the motivation for porting this ?
If you need to integrate with Java libraries, there are numerous options available other than porting the whole app to Java.
If you need a direct port, then (as Chad has illustrated) JRuby may be the way to go.
If you want to do a complete rewrite, but keep the RoR paradigm, check out Grails, which is a JVM-based RoR equivalent using Groovy (a Java Virtual Machine-compatible language that allows you to bind in Java libraries)
In general switching out a core component or framework means that you will essentially have to reimplement some or even lots of your application. Hence, you usually want a good reason to do so.
If I understand your question correctly you need to deploy on a platform without Ruby but with a JVM. In that case I would make it run with JRuby as the very first priority as this is with a very high probability the approach needing the least amount of work.
This may seem like an obvious solution, but have you tried running both applications with the same database? My company is currently migrating our software from PHP to Rails, and while we're re-coding components one-by-one, we let both applications use the same database. No need to send data back and forth, as long as you make sure the applications don't conflict.
Check out playframework.org - it's a sweet web framework completely written in Java and the best Rails rip-off in Java I've seen to date. I ported a fairly simple app over to the Playframework in a few days. In some ways it's sweeter than Rails because of the way it uses Annotations to mix in code in a type safe manner. If you're a rails programmer with a Java background, you'll be productive almost instantly because the framework maps directly to the Rails world.
Hi guys does anyone know why the programming language C++ is used more widely in biometric security applications compared to the programming language Java? The answers that I have collected so far are 1) Virtual Compilers 2) OpenCV Library provided by C++. Can anyone help with this question??
Maybe it's the hardware support: I wrote an app that uses a fingerprint sensor. The library support for the device is C++, so I wrote the app in C++. Now they have a .NET version, so my next app will be C#.
I don't know specifically about biometric applications, but in general when security is important Java can be a stumbling block. Depending on how the security requirements are written, they can cover things that one must do manually in C++, but which are done automatically by Java. This poses a problem because one would need to demonstrate that Java properly (and in a timely manner!) satisfies the requirement. It is a lot easier to show that these requirements are met in C++ code, because the code that meets the requirement is part of the program in question.
If the security person/requirements/customer make it clear that relying on Java for some security features is acceptable, then this is no big deal. We could go round-and-round about whether or not it is reasonable to rely on/trust Java to satisfy security requirements, it really just depends on the specific security needs.
I am willing to put money on the reason being simply that the access APIs for the hardware are written in C++. Most of the modern/higher-level languages are not going to easily communicate with hardware originally exposed through a C/C++ API.
On a somewhat related note, Vala has all the language features expected of a modern/high-level language (and then some), but compiles to C source and a native binary, and can easily make use of any library written in C (not sure about C++). Check it out; I haven't used it much, but it's pretty cool.
Implementing a library in C++ offers a lot over Java. Once written, a C++ library can run on almost any platform (including embedded ones), and can be made available as a native import to a variety of other languages through tools like SWIG. Java can only run on something with enough speed and memory to run a JVM, and only other Java programs can include the code as a native import. For biometric applications especially, I think running on embedded systems would be a large concern, since you could build this into a small sensor.
The more glib answer would be no one wants to wait for your garbage collection cycle to launch the friggen missiles.
You could replace Java with any other language there. Probably it has more to do with the APIs and hardware.
Also, Java is more suited to web applications. It's not the best choice for desktop applications.
For some biometric applications, execution speed is crucial.
For instance, let's say you're doing facial recognition for a checkpoint, and Java takes twice the time to run the algorithm that a compiled language like C++ does. That means if you go with Java, either:
The checkpoint lines will be twice as long,
You'll have to pay to staff twice as many checkpoints, or
Your system will do half as good a job at recognizing faces
None of those are usually acceptable options, which makes using Java a non-starter.
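The trade-off above is simple queueing arithmetic, sketched here in Python with made-up numbers. The arrival rate and per-match times are purely hypothetical (the "twice the time" figure is the example's premise, not a measurement), and `lanes_needed` is an illustrative helper.

```python
import math


def lanes_needed(arrivals_per_minute, seconds_per_match):
    """Number of checkpoint lanes required to keep up with the arrival rate."""
    # Each lane can process 60 / seconds_per_match people per minute.
    per_lane_per_minute = 60.0 / seconds_per_match
    return math.ceil(arrivals_per_minute / per_lane_per_minute)


# Hypothetical figures: 30 people arrive per minute.
fast_lanes = lanes_needed(30, 2.0)  # 2 s per match -> 30 per lane per minute -> 1 lane
slow_lanes = lanes_needed(30, 4.0)  # 4 s per match -> 15 per lane per minute -> 2 lanes
```

Doubling the time per match halves the throughput of each lane, so either the number of lanes doubles or the queue grows.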
Currently Google App Engine supports both Python & Java. Java support is less mature. However, Java seems to have a longer list of libraries and especially support for Java bytecode regardless of the languages used to write that code. Which language will give better performance and more power? Please advise. Thank you!
Edit:
http://groups.google.com/group/google-appengine-java/web/will-it-play-in-app-engine?pli=1
Edit:
By "power" I mean better expandability and inclusion of available libraries outside the framework. Python allows only pure Python libraries, though.
I'm biased (being a Python expert but pretty rusty in Java) but I think the Python runtime of GAE is currently more advanced and better developed than the Java runtime -- the former has had one extra year to develop and mature, after all.
How things will proceed going forward is of course hard to predict -- demand is probably stronger on the Java side (especially since it's not just about Java, but other languages perched on top of the JVM too, so it's THE way to run e.g. PHP or Ruby code on App Engine); the Python App Engine team however does have the advantage of having on board Guido van Rossum, the inventor of Python and an amazingly strong engineer.
In terms of flexibility, the Java engine, as already mentioned, does offer the possibility of running JVM bytecode made by different languages, not just Java -- if you're in a multi-language shop that's a pretty large positive. Vice versa, if you loathe Javascript but must execute some code in the user's browser, Java's GWT (generating the Javascript for you from your Java-level coding) is far richer and more advanced than Python-side alternatives (in practice, if you choose Python, you'll be writing some JS yourself for this purpose, while if you choose Java GWT is a usable alternative if you loathe writing JS).
In terms of libraries it's pretty much a wash -- the JVM is restricted enough (no threads, no custom class loaders, no JNI, no relational DB) to hamper the simple reuse of existing Java libraries as much, or more, than existing Python libraries are similarly hampered by the similar restrictions on the Python runtime.
In terms of performance, I think it's a wash, though you should benchmark on tasks of your own -- don't rely on the performance of highly optimized JIT-based JVM implementations discounting their large startup times and memory footprints, because the app engine environment is very different (startup costs will be paid often, as instances of your app are started, stopped, moved to different hosts, etc, all transparently to you -- such events are typically much cheaper with Python runtime environments than with JVMs).
The XPath/XSLT situation (to be euphemistic...) is not exactly perfect on either side, sigh, though I think it may be a tad less bad in the JVM (where, apparently, substantial subsets of Saxon can be made to run, with some care). I think it's worth opening issues on the Appengine Issues page with XPath and XSLT in their titles -- right now there are only issues asking for specific libraries, and that's myopic: I don't really care HOW a good XPath/XSLT is implemented, for Python and/or for Java, as long as I get to use it. (Specific libraries may ease migration of existing code, but that's less important than being able to perform such tasks as "rapidly apply XSLT transformation" in SOME way!-). I know I'd star such an issue if well phrased (especially in a language-independent way).
Last but not least: remember that you can have different versions of your app (using the same datastore) some of which are implemented with the Python runtime, some with the Java runtime, and you can access versions that differ from the "default/active" one with explicit URLs. So you could have both Python and Java code (in different versions of your app) use and modify the same data store, granting you even more flexibility (though only one will have the "nice" URL such as foobar.appspot.com -- which is probably important only for access by interactive users on browsers, I imagine;-).
Watch this app for changes in Python and Java performance:
http://gaejava.appspot.com/
(edit: apologies, link is broken now. But following para still applied when I saw it running last)
Currently, Python and using the low-level API in Java are faster than JDO on Java, for this simple test. At least if the underlying engine changes, that app should reflect performance changes.
Based on experience with running these VMs on other platforms, I'd say that you'll probably get more raw performance out of Java than Python. Don't underestimate Python's selling points, however: The Python language is much more productive in terms of lines of code - the general agreement is that Python requires a third of the code of an equivalent Java program, while remaining as or more readable. This benefit is multiplied by the ability to run code immediately without an explicit compile step.
With regards to available libraries, you'll find that much of the extensive Python runtime library works out of the box (as does Java's). The popular Django Web framework (http://www.djangoproject.com/) is also supported on AppEngine.
With regards to 'power', it's difficult to know what you mean, but Python is used in many different domains, especially the Web: YouTube is written in Python, as is Sourceforge (as of last week).
June 2013: This video is a very good answer by a google engineer:
http://www.youtube.com/watch?v=tLriM2krw2E
TLDR; is:
Pick the language that you and your team are most productive with
If you want to build something for production: Java or Python (not Go)
If you have a big team and a complex code base: Java (because of static code analysis and refactoring)
Small teams that iterate quickly: Python (although Java is also okay)
An important question to consider in deciding between Python and Java is how you will use the datastore in each language (and most other angles to the original question have already been covered quite well in this topic).
For Java, the standard method is to use JDO or JPA. These are great for portability but are not very well suited to the datastore.
A low-level API is available but this is too low level for day-to-day use - it is more suitable for building 3rd party libraries.
For Python there is an API designed specifically to provide applications with easy but powerful access to the datastore. It is great except that it is not portable so it locks you into GAE.
Fortunately, there are solutions being developed for the weaknesses listed for both languages.
For Java, the low-level API is being used to develop persistence libraries that are much better suited to the datastore than JDO/JPA (IMO). Examples include the Siena project, and Objectify.
I've recently started using Objectify and am finding it to be very easy to use and well suited to the datastore, and its growing popularity has translated into good support. For example, Objectify is officially supported by Google's new Cloud Endpoints service. On the other hand, Objectify only works with the datastore, while Siena is 'inspired' by the datastore but is designed to work with a variety of both SQL databases and NoSQL datastores.
For Python, there are efforts being made to allow the use of the Python GAE datastore API off of the GAE. One example is the SQLite backend that Google released for use with the SDK, but I doubt they intend this to grow into something production ready. The TyphoonAE project probably has more potential, but I don't think it is production ready yet either (correct me if I am wrong).
If anyone has experience with any of these alternatives or knows of others, please add them in a comment. Personally, I really like the GAE datastore - I find it to be a considerable improvement over the AWS SimpleDB - so I wish for the success of these efforts to alleviate some of the issues in using it.
I'm strongly recommending Java for GAE and here's why:
Performance: Java is potentially faster than Python.
Python development suffers from a lack of third-party libraries. For example, there is no XSLT for Python/GAE at all. Almost all Python libraries are C bindings (and those are unsupported by GAE).
Memcache API: the Java SDK has more interesting abilities than the Python SDK.
Datastore API: JDO is very slow, but native Java datastore API is very fast and easy.
I'm using Java/GAE in development right now.
As you've identified, using a JVM doesn't restrict you to using the Java language. A list of JVM languages and links can be found here. However, the Google App Engine does restrict the set of classes you can use from the normal Java SE set, and you will want to investigate if any of these implementations can be used on the app engine.
EDIT: I see you've found such a list
I can't comment on the performance of Python. However, the JVM is a very powerful platform performance-wise, given its ability to dynamically compile and optimise code during the run time.
Ultimately performance will depend on what your application does, and how you code it. In the absence of further info, I think it's not possible to give any more pointers in this area.
I've been amazed at how clean, straightforward, and problem free the Python/Django SDK is. However I started running into situations where I needed to start doing more JavaScript and thought I might want to take advantage of the GWT and other Java utilities. I've gotten just half way through the GAE Java tutorial, and have had one problem after another: Eclipse configuration issues, JRE versionitis, the mind-numbing complexity of Java, and a confusing and possibly broken tutorial. Checking out this site and others linked from here clinched it for me. I'm going back to Python, and I'll look into Pyjamas to help with my JavaScript challenges.
I'm a little late to the conversation, but here are my two cents. I really had a hard time choosing between Python and Java, since I am well versed in both languages. As we all know, there are advantages and disadvantages for both, and you have to take in account your requirements and the frameworks that work best for your project.
As I usually do with this type of dilemma, I look for numbers to support my decision. I decided to go with Python for many reasons, but in my case, there was one plot that was the tipping point. If you search "Google App Engine" in GitHub as of September 2014, you will find the following figure:
There could be many biases in these numbers, but overall, there are three times more GAE Python repositories than GAE Java repositories. Not only that, but if you list the projects by the "number of stars" you will see that a majority of the Python projects appear at the top (you have to take into account that Python has been around longer). To me, this makes a strong case for Python because I take into account community adoption & support, documentation, and availability of open-source projects.
It's a good question, and I think many of the responses have given good view points of pros and cons on both sides of the fence. I've tried both Python and JVM-based AppEngine (in my case I was using Gaelyk which is a Groovy application framework built for AppEngine). When it comes to performance on the platform, one thing I hadn't considered until it was staring me in the face is the implication of "Loading Requests" that occur on the Java side of the fence. When using Groovy these loading requests are a killer.
I put a post together on the topic (http://distractable.net/coding/google-appengine-java-vs-python-performance-comparison/) and I'm hoping to find a way of working around the problem, but if not I think I'll be going back to a Python + Django combination until cold starting java requests has less of an impact.
Based on how much I hear Java people complain about AppEngine compared to Python users, I would say Python is much less stressful to use.
There's also project Unladen Swallow, which is apparently Google-funded if not Google-owned. They're trying to implement a LLVM-based backend for Python 2.6.1 bytecode, so they can use a JIT and various nice native code/GC/multi-core optimisations. (Nice quote: "We aspire to do no original work, instead using as much of the last 30 years of research as possible.") They're looking for a 5x speed-up to CPython.
Of course this doesn't answer your immediate question, but points towards a "closing of the gap" (if any) in the future (hopefully).
The beauty of Python nowadays is how well it communicates with other languages. For instance, you can have both Python and Java on the same table with Jython. Of course, even though Jython fully supports Java libraries, it does not fully support Python libraries. But it's an ideal solution if you want to mess with Java libraries. It even allows you to mix it with Java code with no extra coding.
But even Python itself has made some steps forward. See ctypes, for example: near-C speed and direct access to C libraries, all without leaving the comfort of Python coding. Cython goes one step further, allowing you to mix C code with Python code with ease; or, if you don't want to mess with C or C++, you can still code in Python but use statically typed variables, making your Python programs as fast as C apps. Cython is both used and supported by Google, by the way.
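As a small illustration of the ctypes point, here is a sketch that calls the C library's `strlen` directly from Python. It assumes a POSIX-like system where the C library can be located this way; on other platforms the library lookup would need to change.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX-like system. find_library("c") locates the C runtime;
# if it returns None, CDLL(None) falls back to the symbols already loaded
# into the current process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"hello from C")
```

Declaring `argtypes` and `restype` is optional for simple cases but makes ctypes check and convert the arguments instead of guessing.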
Yesterday I even found tools for Python to inline C or even assembly (see CorePy); you can't get any more powerful than that.
Python is surely a very mature language, not only standing on its own, but able to cooperate with any other language with ease. I think that is what makes Python an ideal solution even in very advanced and demanding scenarios.
With Python you can have access to C/C++, Java, .NET and many other libraries with almost zero additional coding, while also getting a language that minimises, simplifies and beautifies coding. It's a very tempting language.
Gone with Python even though GWT seems a perfect match for the kind of app I'm developing. JPA is pretty messed up on GAE (e.g. no @Embeddable and other obscure undocumented limitations). Having spent a week, I can tell that Java just doesn't feel right on GAE at the moment.
One thing to take into account is the frameworks you intend to use. Not all frameworks on the Java side are well suited for applications running on App Engine, which is somewhat different from traditional Java app servers.
One thing to consider is the application startup time. With traditional Java web apps you don't really need to think about this. The application starts and then it just runs. It doesn't really matter if the startup takes 5 seconds or a couple of minutes. With App Engine you might end up in a situation where the application is only started when a request comes in. This means the user is waiting while your application boots up. New GAE features like reserved instances help here, but check first.
Another thing is the different limitations GAE poses on Java. Not all frameworks are happy with the limitations on what classes you can use, or the fact that threads are not allowed, or that you can't access the local filesystem. These issues are probably easy to find out by just googling about GAE compatibility.
I've also seen some people complaining about issues with session size on modern UI frameworks (Wicket, namely). In general these frameworks tend to do certain trade-offs in order to make development fun, fast and easy. Sometimes this may lead to conflicts with the App Engine limitations.
I initially started developing on GAE with Java, but then switched to Python for these reasons. My personal feeling is that Python is a better choice for App Engine development. I think Java is more "at home" on, for example, Amazon's Elastic Beanstalk.
BUT with App Engine things are changing very rapidly. GAE is changing itself and as it becomes more popular, the frameworks are also changing to work around its limitations.