For example, there could be a bit of Java bytecode mixed together with some C. The JVM would execute the Java bytecode and turn execution over to the OS when a C part is hit. Is this technically possible, and is it done in practice?
Generally, you can write C code that creates a JVM and executes (execve) the provided bytecode. You can then either run the two parts in separate threads with some IPC between them, or use JNA/JNI to exchange data or to invoke operations and wait for their completion.
I have come across some projects using this approach (for example parts of the Android system, Cloudera Impala and some others), but the code there is overcomplicated and hard to trace. It surely took a lot of effort to make it work properly. Sometimes it is better either to run two processes using different technologies with good IPC and data serialization (Thrift, protobuf), or to use only one of the technologies.
If you still need to run both, I'd prefer to build the system in Java and call native functions through JNI rather than the other way round.
After searching for a way to run Java code from a Django application (Python), I found that Py4J is the best option for me. I tried Jython, JPype and Python subprocess, and each of them has certain limitations:
Jython. My app runs in python.
JPype is buggy. You can start the JVM only once; after that it fails to start again.
Python subprocess. You cannot pass Java objects between Python and Java, because it is just a regular console call.
The Py4J web site says:
In terms of performance, Py4J has a bigger overhead than both of the previous solutions (Jython and JPype) because it relies on sockets, but if performance is critical to your application, accessing Java objects from Python programs might not be the best idea.
In my application performance is critical, because I'm working with the machine learning framework Mahout. My question is: will Mahout also run slower because of the Py4J gateway server, or does this overhead just mean that invoking Java methods from Python functions is slower? (In the latter case the performance of Mahout will not be a problem and I can use Py4J.)
The JPype issue that @HIP_HOP mentioned, with the JVM getting detached from new threads, can be overcome with the following hack (add it before the first call to Java objects in a new thread that is not yet attached to the JVM):
import jpype

# ensure that current thread is attached to JVM
# (essential to prevent JVM / entire container crashes
# due to "JPJavaEnv::FindClass" errors)
if not jpype.isThreadAttachedToJVM():
    jpype.attachThreadToJVM()
PySpark uses Py4J quite successfully. If all the heavy lifting is done in Spark (or Mahout in your case) itself, and you just want to return the result to the "driver"/Python code, then Py4J might work very well for you too.
Py4J has a slightly bigger overhead for huge results (that's not necessarily the case for Spark workloads, as you only return summaries/aggregates of the dataframes). There is also an improvement discussion for Py4J about switching to binary serialization to remove that overhead for higher-bandwidth requirements: https://github.com/bartdag/py4j/issues/159
My solutions
java thread/process <-> Pipes <-> py subprocess
Use pipes via Java's ProcessBuilder: start the Python process with the "-u" argument (unbuffered output) and transfer data through its pipes.
Here is a good example of this practice:
https://github.com/JULIELab/java-stdio-ipc
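The Python end of such a pipe setup could look roughly like the sketch below. It is not taken from the linked project: the newline-delimited JSON protocol, the message fields ("id", "values", "result") and the names handle_request and serve are all assumptions made for illustration. The Java side would start the script through ProcessBuilder with "-u" and write one JSON line per request to the child's stdin.

```python
import json


def handle_request(request):
    """Hypothetical handler: sum the numbers sent by the parent process."""
    return {"id": request.get("id"), "result": sum(request.get("values", []))}


def serve(instream, outstream):
    """Answer each JSON request line with exactly one JSON response line."""
    for line in instream:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        response = handle_request(json.loads(line))
        outstream.write(json.dumps(response) + "\n")
        outstream.flush()  # push every reply through the pipe immediately


# In the real worker script the last line would be:
#     serve(sys.stdin, sys.stdout)   (after "import sys")
```

Keeping serve() independent of sys.stdin makes it possible to try the protocol with in-memory streams before wiring it to a Java parent process.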
Here are the results of my rough research on "java <-> py":
[Jython] A Java implementation of Python.
[Jpype] JPype is designed to allow the user to exercise Java as fluidly as possible from within Python. We can break this down into a few specific design goals.
Unlike Jython, JPype does not achieve this by re-implementing Python, but instead by interfacing both virtual machines at the native level. This shared memory based approach achieves good computing performance while providing the access to the entirety of CPython and Java libraries.
[Runtime] The Runtime class in java (old method).
[Process] Java ProcessBuilder class gives more structure to the arguments.
[Pipes] Named pipes could be the answer for you. Use subprocess.Popen to start the Java process and establish pipes to communicate with it.
Try the mkfifo() implementation in Python.
https://jj09.net/interprocess-communication-python-java/
-> java<-> Pipes <-> py https://github.com/JULIELab/java-stdio-ipc
[Protobuf] This is the opensource solution Google uses to do IPC between Java and Python. For serializing and deserializing data efficiently in a language-neutral, platform-neutral, extensible way, take a look at Protocol Buffers.
[Socket] Client-server architecture through sockets
Server(Python) - Client(Java) communication using sockets
https://jj09.net/interprocess-communication-python-java/
Send File From Python Server to Java Client
[procbridge]
A super-lightweight IPC (Inter-Process Communication) protocol over TCP socket. https://github.com/gongzhang/procbridge
https://github.com/gongzhang/procbridge-python
https://github.com/gongzhang/procbridge-java
[hessian binary web service protocol] using python client and java server.
[Jython]
Jython is a reimplementation of Python in Java. As a result it has much lower costs to share data structures between Java and Python and potentially much higher level of integration. Noted downsides of Jython are that it has lagged well behind the state of the art in Python; it has a limited selection of modules that can be used; and the Python object thrashing is not particularly well fit in Java virtual machine leading to some known performance issues.
[Py4J]
Py4J uses a remote tunnel to operate the JVM. This has the advantage that the remote JVM does not share the same memory space and multiple JVMs can be controlled. It provides a fairly general API, but the overall integration to Python is as one would expect when operating a remote channel operating more like an RPC front-end. It seems well documented and capable. Although I haven’t done benchmarking, a remote access JVM will have a transfer penalty when moving data.
[Jep]
Jep stands for Java embedded Python. It is a mirror image of JPype. Rather than focusing on accessing Java from within Python, this project is geared towards allowing Java to access Python as a sub-interpreter. The syntax for accessing Java resources from within the embedded Python is quite similar to support for imports. Notable downsides are that although Python supports multiple interpreters many Python modules do not, thus some of the advantages of the use of Python may be hard to realize. In addition, the documentation is a bit underwhelming, thus it is difficult to see how capable it is from the limited examples.
[PyJnius]
PyJnius is another Python to Java only bridge. Syntax is somewhat similar to JPype in that classes can be loaded in and then have mostly Java native syntax. Like JPype, it provides an ability to customize Java classes so that they appear more like native classes. PyJnius seems to be focused on Android. It is written using Cython .pxi files for speed. It does not include a method to represent primitive arrays, thus a Python list must be converted whenever an array needs to be passed as an argument or a return. This seems pretty prohibitive for scientific code. PyJnius appears to still be in active development.
[Javabridge]
Javabridge is direct low level JNI control from Python. The integration level is quite low on this, but it does serve the purpose of providing the JNI API to Python rather than attempting to wrap Java in a Python skin. The downside being of course you would really have to know a lot of JNI to make effective use of it.
[jpy]
This is the most similar package to JPype in terms of project goals. They have achieved more capabilities in terms of a Java from Python than JPype which does not support any reverse capabilities. It is currently unclear if this project is still active as the most recent release is dated 2014. The integration level with Python is fairly low currently though what they do provide is a similar API to JPype.
[JCC]
JCC is a C++ code generator that produces a C++ object interface wrapping a Java library via Java’s Native Interface (JNI). JCC also generates C++ wrappers that conform to Python’s C type system making the instances of Java classes directly available to a Python interpreter. This may be handy if your goal is not to make use of all of Java but rather have a specific library exposed to Python.
[VOC] https://beeware.org/project/projects/bridges/voc/
A transpiler that converts Python bytecode into Java bytecode, part of the BeeWare project. This may be useful for getting a smallish piece of Python code hooked into Java. It currently lists itself as early development. This is more in the reverse direction, as its goals are making Python code available in Java rather than providing interaction between the two.
[p2j]
This lists itself as “A (restricted) python to java source translator”. It appears to try to convert Python code into Java. It has not been actively maintained since 2013. Like VOC, this is primarily for code translation rather than bridging.
[GraalVM]
Source: https://github.com/oracle/graal
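To make the [Socket] entry in the list above more concrete, here is a minimal sketch of a Python server that answers one newline-terminated request. The line-based protocol and the function names are assumptions for illustration, not taken from the linked articles, and the client in demo() is a Python stand-in for what would be a Java client using java.net.Socket.

```python
import socket
import threading


def serve_once(server_sock):
    """Accept one client, read one line, reply with the line in upper case."""
    conn, _addr = server_sock.accept()
    with conn, conn.makefile("r", encoding="utf-8") as reader:
        request = reader.readline().rstrip("\n")
        conn.sendall((request.upper() + "\n").encode("utf-8"))


def demo():
    # Port 0 asks the OS for any free port; a real service would use a fixed one.
    server_sock = socket.create_server(("127.0.0.1", 0))
    port = server_sock.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server_sock,))
    worker.start()

    # Stand-in for the Java client: connect, send one line, read one line.
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        client.sendall(b"hello from java\n")
        with client.makefile("r", encoding="utf-8") as reader:
            reply = reader.readline().strip()

    worker.join()
    server_sock.close()
    return reply
```

Terminating every message with a newline lets both sides use plain line readers, which is about the simplest framing that works the same way in Java and Python.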
I don't know Mahout, but consider this: at least with JPype and Py4J you will take a performance hit when converting types from Java to Python and vice versa. Try to minimize calls between the languages. One alternative may be to write a thin wrapper in Java that condenses many Java calls into one Python-to-Java call.
Because performance also depends on your usage scenario (how often you call the script and how much data is moved), and because the different solutions have their own specific benefits and drawbacks, I have created an API that lets you switch between different implementations without having to change your Python script: https://github.com/subes/invesdwin-context-python
This makes it easy to test what works best, or just to stay flexible about what you deploy to.
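The advice to condense many cross-language calls into one can be illustrated without any bridge installed. The sketch below uses a hypothetical CountingBridge class as a stand-in for a Py4J or JPype gateway; it does not use either library's API and only counts how often the language boundary would be crossed.

```python
class CountingBridge:
    """Stand-in for a Java gateway that counts cross-language round trips."""

    def __init__(self):
        self.crossings = 0

    def square(self, x):
        self.crossings += 1  # one round trip per value
        return x * x

    def square_all(self, values):
        self.crossings += 1  # one round trip for the whole batch
        return [v * v for v in values]


def per_item(bridge, values):
    """Chatty style: call across the boundary once per element."""
    return [bridge.square(v) for v in values]


def batched(bridge, values):
    """Thin-wrapper style: ship the whole list across in a single call."""
    return bridge.square_all(values)
```

Both functions return the same list, but for 1000 values per_item crosses the boundary 1000 times while batched crosses it once, which is where the per-call conversion cost adds up.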
I'm new to Java, and was told to use the Java Native Interface to run some code I wrote in C.
Now, this might be a stupid question, but what's the point of the JNI ? Can't I simply execute my process from a Java UI program and get its stdout to parse ?
Also, I've read that the use of JNI might cause security issues. Do these issues directly depend on the quality of the invoked code ? Or is this something deeper ?
Thanks.
what's the point of the JNI ?
It enables you to mix C and Java code within the same process.
Can't I simply execute my process from a Java UI program and get its stdout to parse ?
A lot of things that can be achieved by using JNI can also be achieved by using inter-process communication (IPC). However, you'd have to ship all the input data to the other process, and then ship all the results back. This can be pretty expensive, which makes IPC impractical for many situations where JNI can be used (e.g. wrapping existing C libraries).
Also, I've read that the use of JNI might cause security issues. Do these issues directly depend on the quality of the invoked code ? Or is this something deeper ?
The point here is that the JVM does a lot of work to ensure that whatever Java code is thrown at it, things like buffer overruns, stack smashing attacks etc can't occur. For example, it performs bounds checking on all array accesses (which C doesn't).
On the other hand, JNI code is a black box to the JVM. If there's a problem with the C code (e.g. a buffer overrun), all bets are off.
Can't I simply execute my process from a Java UI program and get its stdout to parse ?
Do you think it's always appropriate to start a new process every time you want to execute any native code? Do you really want to be transferring potentially large amounts of data between processes? (Imagine a native image transformation.)
Also, I've read that the use of JNI might cause security issues. Do these issues directly depend on the quality of the invoked code?
Yes. Basically native code has less security sandboxing than Java running in a JVM. If the code has security bugs (e.g. buffer overflows) then clearly that will affect the security of your overall app.
I should say that it's relatively rare for Java developers to need to worry about JNI - I've certainly only touched it a couple of times in my career. You may also want to look at SWIG if the need arises.
Can't I simply execute my process from a Java UI program and get its stdout to parse ?
That would depend on what you are calling.
Note that JNI cannot call programs, only library code.
In addition to that, spawning new processes is relatively expensive and managing multiple processes is complicated.
Does anybody know of a way to lock down individual threads within a Java process to specific CPU cores (on Linux)? I've done this in C, but can't find how to do this in Java. My instincts are that this will require a JNI call, but I was hoping someone here might have some insight or might have done it before.
Thanks!
You can't do this in pure Java. But if you really need it, you can use JNI to call native code which does the job. These are places to start:
http://ovatman.blogspot.com/2010/02/using-java-jni-to-set-thread-affinity.html
http://blog.toadhead.net/index.php/2011/01/22/cputhread-affinity-in-java/
UPD: After some thinking, I've decided to create my own class for this: ThreadAffinity.java. It's JNA-based and very simple, so if you want to use it in production, you should probably spend some time making it more stable, but for benchmarking and testing it works well as is.
UPD 2: There is another library for working with thread affinity in Java. It uses the same method as noted previously, but has a different interface.
I know it's been a while, but if anyone comes across this thread, here's how I solved this problem. I wrote a script that would do the following:
"jstack -l "
Take the results, find the "nid"'s of the threads I want to manually lock down to cores.
Taskset those threads.
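The three steps above could be scripted roughly as follows. This is a sketch, not the original poster's script: the function names and the sample thread names are made up, the regular expression assumes the usual jstack header format with the thread name in quotes followed by a nid= field, and nid is accepted both as hex (0x...) and as decimal because the format differs between JDK versions. The commands are only built, not executed.

```python
import re

# Matches the quoted thread name and the native id in a jstack thread header.
THREAD_LINE = re.compile(r'^"(?P<name>[^"]+)".*\bnid=(?P<nid>0x[0-9a-fA-F]+|\d+)')


def native_thread_ids(jstack_output, name_pattern):
    """Map thread names matching name_pattern to their native thread ids."""
    wanted = re.compile(name_pattern)
    ids = {}
    for line in jstack_output.splitlines():
        match = THREAD_LINE.match(line)
        if match and wanted.search(match.group("name")):
            # base 0 accepts both "0x1a2b" and plain decimal
            ids[match.group("name")] = int(match.group("nid"), 0)
    return ids


def taskset_commands(thread_ids, cores):
    """Pair each thread (sorted by name) with a core; one command per thread."""
    return [
        ["taskset", "-cp", str(core), str(tid)]
        for (_name, tid), core in zip(sorted(thread_ids.items()), cores)
    ]


SAMPLE = """\
"main" #1 prio=5 os_prio=0 tid=0x00007f3c nid=0x1a2a waiting on condition
"worker-1" #12 prio=5 os_prio=0 tid=0x00007f3d nid=0x1a2b runnable
"worker-2" #13 prio=5 os_prio=0 tid=0x00007f3e nid=0x1a2c runnable
"""
```

Each resulting command list could be passed to subprocess.run once the thread ids have been checked by hand; the pinning lasts only as long as those threads live.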
You might want to take a look at https://github.com/peter-lawrey/Java-Thread-Affinity/blob/master/src/test/java/com/higherfrequencytrading/affinity/AffinityLockBindMain.java
IMO, this will not be possible unless you use native calls. The JVM is supposed to be platform independent, and any system calls made to achieve this will not result in portable code.
It's not possible (at least with plain Java).
You can use thread pools to limit the amount of threads (and therefore cores) used for different types of work, but there is no way to specify a core to use.
There is even the (small) possibility that your Java runtime doesn't support native threading for your OS or hardware. In this case, green threads are used and only one core will be used for the whole JVM.
I am thinking of building cluster software in Java on Linux. I want to control the CPU load: for example, if the CPU load is higher than a threshold, I reduce the number of execution threads. I thought I could check the CPU load once every second or every couple of seconds from a daemon thread. How can I implement this in Java? And how can I check in Java whether a particular process is dead, and whether the port it opened is gone?
There's no way (AFAIK) to do this in pure Java. You could potentially write some native C to do this and then interface using JNA / JNI (which would be the most robust solution.)
Alternatively, a quick hacky easy approach (if you're just using Linux) would be to use Runtime.exec() to call one of these native approaches which you could then parse from within Java.
In terms of checking whether a process is dead or not, you could use the above approach but with ps.
EDIT: This may be something that helps.
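The logic of the three checks asked about (load threshold, process liveness, port liveness) is sketched below in Python rather than Java, purely to show the decisions involved. The function names, the 0.8 threshold and the "shrink or grow by one" policy are assumptions for illustration. On Linux the load figure could come from os.getloadavg()[0], and a Java version would obtain the same numbers through the native approaches mentioned above.

```python
import os
import socket


def next_pool_size(load, cores, current, minimum=1, threshold=0.8):
    """Shrink the pool by one while load per core exceeds the threshold,
    otherwise grow it by one, never beyond the number of cores."""
    if load / cores > threshold:
        return max(minimum, current - 1)
    return min(cores, current + 1)


def process_alive(pid):
    """True if a process with this pid exists (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def port_open(host, port, timeout=1.0):
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A monitoring thread would call these once per second or so and resize the worker pool with the value returned by next_pool_size.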
I would like to pick up a new programming language, Java, having used Python for some time. But it seems most things that can be done with Java can be done with Python. So I would like to know:
What kind of things can be done with Java but not Python?
mobile programming (Android).
POSIX Threads Programming.
Conversely, What kind of things can be done with Python but not Java if any?
clarification:
I hope to get an answer from a practical point of view but not a theoretical point of view and it should be about the current status, not future. So theoretically all programming languages can perform any task, practically each is limited in some way.
Both languages are Turing complete, both have vast libraries, and both support extensions written in C so that you can access low level code if needed. The main difference is where they are currently supported. Java in general has wider support than Python.
Your example of Android is one place where Java is the standard choice, although Python also has some support in the form of Android Scripting Environment. Java is already installed on most home computers. You can write Java applets and expect them to work in most browsers.
One thing you can't easily do in Java is quickly write short scripts that perform useful tasks. Python is more suitable for scripting than Java (although there are of course other alternatives too).
I guess using Jython, you can do anything with Python that you can do in Java.
Conversely, Python has the PyPy compiler, which is pretty cool - a virtual machine with multiple backends (Java Runtime, LLVM, .net, and Python IIRC), multiple garbage collectors, multiple implementations (Stackless), etc. I know Java has a big choice of virtual machines, but the growth of PyPy is amazing, due to it being written in RPython - a fairly productive language.
Also, can Java do this in one file and fewer than 20 lines, with no library imports? Obviously both languages have libraries that can do this, but I'm just talking about the flexibility of the languages.
class Logger(object):  # boilerplate code
    def log(self, level, msg, *args, **kwargs):  # *args, **kwargs = flexible arguments
        self._log(level, msg, *args, **kwargs)  # call with flexible arguments
    def _log(self, level, msg, *args, **kwargs):
        # override me at runtime :)
        # I think Java people call this Dependency Runtime Injection
        if level > 1:
            print(msg, args, kwargs)
logger = Logger()  # boilerplate code
def logged(level):  # what pattern do you call this?
    def logged_decorator(function):  # and this?
        def func(*args, **kwargs):
            name = function.__name__  # look ma, reflective metaprogramming!
            logger.log(level, name, *args, **kwargs)
            return function(*args, **kwargs)  # call the wrapped function, not the wrapper
        return func  # boilerplate code
    return logged_decorator  # boilerplate code
Example use:
@logged(2)  # level 2 passes the "level > 1" check in _log
def my_func(arg1, arg2):
    # your code here
    pass
You would surely love reading the comparisons made below between these 2 languages.
Check them :
Java is Dead ! Long live Python
Python-Java : A side-by-side comparison
Python is NOT java
What Python Can do Java Can't -
Nothing.
What Java Can do but Python Can't -
Java is a multithreading boss. If you are writing a web server where multiple requests arrive at the same time, Java can just spawn multiple threads and the CPU switches between them. That's why almost all big companies (Expedia, LinkedIn, Goldman, Amazon, Netflix, CITY, JPMC, VISA, almost all of them) use a JVM web server for their main application.
CPython has something called the GIL, which prevents it from using OS-level threads efficiently. How do Python application servers (Gunicorn, Django ...) work then? Well, they fork new processes instead of threads. Threads are lightweight, so forking new processes instead of threads works up to a threshold, but it is not a very scalable solution.
Sheer execution speed - When you write a = 1 and then a = 10.07, you have just stored a float value in a variable that previously stored an integer. When a variable is assigned a new value, Python internally creates a new object to hold the value, and a pointer to that object is then assigned to the variable; this is called dynamic binding. Dynamic binding (Python) is slower than static binding (Java), as it requires object creation. Java or C/C++ primitive types are an order of magnitude faster because of static binding.
Space used (RAM usage) - The default space python uses to store a variable is huge.
>>> import math, sys
>>> a=0
>>> sys.getsizeof(a)
24
But in Java you can use a byte-sized variable, which takes just one byte. So nobody writes an application in Python when asked to write memory-efficient software (imagine being asked to create a database like Cassandra, or to design a compute engine like Spark).
Packaging - In Java, you can create something like a JAR, which can run on any machine where a JVM is installed, and that JAR contains all the dependencies. In Python you can't just ship something like a JAR; you have to write a script to install the dependencies on every machine you want to run your code on.
Imagine if Android apps were Python applications: before installing an app from the Play Store, you would have to install the dependencies of that app separately.
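The figure reported by sys.getsizeof above is the size of one standalone int object, header included. As a small aside, the sketch below compares that with the standard library's array module and bytes type, which store small integers inline at one byte each. The variable names are arbitrary and the exact getsizeof result depends on the CPython version and platform, so no output is shown for it.

```python
import sys
from array import array

values = list(range(100))

as_array = array("b", values)   # signed 8-bit integers stored inline
as_bytes = bytes(values)        # immutable sequence of unsigned 8-bit values

print(sys.getsizeof(0))         # one int object, object header included
print(as_array.itemsize)        # bytes used per element inside the array
print(len(as_bytes))            # payload length: one byte per element
```

This does not remove the per-object overhead of ordinary Python variables; it only shows that packed storage is available when a large number of small values has to be kept in memory.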
Performance . Performance . Performance -
Java’s efficiency largely comes from its Just-In-Time (JIT) compiler and support for concurrency. The JIT compiler is a part of the Java Runtime Environment. It improves the performance of Java programs by compiling bytecode into native machine code “just in time” to run. Once a method has been compiled, the Java Virtual Machine (JVM) calls the compiled code directly instead of interpreting it. Compilation itself does cost processor time and memory, but in theory this can make a Java program nearly as fast as a native application.
While Java bytecode is compiled to machine code by the JIT, CPython interprets its bytecode, which slows Python programs down at runtime. Determining variable types at runtime increases the workload of the interpreter. Also, remembering the object type of objects retrieved from container objects contributes to memory usage.
We can go on and on ...
People use Python because it is easy to learn and easy to use, and because it has a lot of libraries for ML, data science etc. that save you coding. But if you are asked to write a performant, scalable, durable, long-term application, people who understand computer science will always choose Java or C/C++.
CPython has a lot of libraries with bindings to native libraries--not so Java.