Mixed Code (native, managed): how does it (technically) interoperate?

I basically understand the idea of managed and native code and their difference. But how is it technically possible for them to communicate with each other? Imagine the following example:
I have a static or dynamic C++ library which is compiled for a specific platform. Now I write a Java program. Inside this code I call the library functions through methods declared with the 'native' keyword. I build a jar file with the bytecode, and the C++ library files stay separate. The result is no longer platform-independent.
But how does the Java program know whether the called native methods exist?
How is the whole program executed at runtime? I know that the bytecode will be interpreted or compiled by the JIT.
How does this all fit into the sandboxing paradigm? Is the native code also executed inside the sandbox?
Does it work because both the Java and the C++ code are machine code in the end?
Maybe this is a dumb question. But I was always wondering...
EDIT: I got 3 good answers. really can't decide which helped me the most. But i will mark this question as answered to close this topic from my side.

It doesn't know until you call the method. The native code resides in a .dll or .so; the Java runtime looks for specific entry points that correspond to the native methods you declared (if you're using JNI, a tool such as javah, or javac -h on newer JDKs, can parse the methods and create function stubs that result in those entry points when compiled). If the wanted entry point is not there, an UnsatisfiedLinkError is thrown.
The code generated by the JIT is not entirely self-sufficient; from time to time it has to call external native code (for low-level runtime routines or for OS services). The same mechanism is used to invoke the code for your native methods.
No. You can do everything you'd do in a pure C/C++ program there. The only things that'll stop it from doing any damage are the external security measures you have (login privilege restrictions, other OS protections, security software, etc.). But the VM won't protect you.
No, JNI existed even before the JIT appeared. The mechanism is the same: if the bytecode is being run by an interpreter and you want this interpreter to invoke native code, you just need some logic in it to determine that a given method is "external" and should be called as native code. This information is contained in the compiled .class file, and when the interpreter or JIT loads it, it creates an in-memory representation that makes it easy to direct the call upon a method lookup.
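To make the first point concrete, here is a minimal sketch of what "it doesn't know until you call the method" looks like from the Java side. The class and method names are made up for illustration, and no native library is loaded on purpose, so the lookup of the entry point is expected to fail at call time rather than at class-load time.

```java
final class NativeLookupDemo {
    // Hypothetical native method: no library implementing it is ever loaded.
    static native int nativeAnswer();

    // Calls the native method and reports what happened.
    static String callNative() {
        try {
            return "returned " + nativeAnswer();
        } catch (UnsatisfiedLinkError e) {
            // Raised on the call, when the runtime cannot find the entry point.
            return "UnsatisfiedLinkError";
        }
    }

    public static void main(String[] args) {
        // Loading and initializing the class works fine; only the call fails.
        System.out.println("class loaded: " + NativeLookupDemo.class.getName());
        System.out.println("call result: " + callNative());
    }
}
```

With a real library you would add a System.loadLibrary call (typically in a static initializer) before the first use of the method.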

The JVM will check the libraries you defined and see if the method is there.
Bytecode will be interpreted or JITted and a call to native code is added. This may include boxing/unboxing values and other things needed to convert the data into a suitable format. The libraries have a certain interface which is explained to the Java compiler and it will produce the required interface logic.
Depends on the sandbox. By default native code is native code. It doesn't call Java APIs so the JVM cannot govern it in any way. But there may be other limitations, for example the JVM could run the native code with libraries that provide sandboxing, or the operating system might have a way of sandboxing.
It depends on what you mean. In the end anything the computer does is machine code, but it doesn't really matter in this case. What matters is the translation and execution part. That is the glue that makes everything work.
Think of the system as people. Person A only speaks Japanese, but wants to reserve a hotel in Paris. The receptionist B only speaks French. Person A can get a translator that will translate their commands to French, command receptionist B and in return translate what B produced into a form person A understands. This is the JNI part.

It depends on the platform. On Linux, Solaris, etc., the JRE uses dlopen and dlsym. On Windows, it uses LoadLibraryEx and GetProcAddress. If the JRE is running in interpreted mode, it calls the function it found; in compiled mode, it compiles Java bytecode into native code that calls that function.
On all JREs I'm familiar with, you can't call a native function in a static library directly; only one in a dynamic library.
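As a rough illustration of the dynamic-library lookup, the sketch below asks the runtime which file name it would search for on the current platform and then tries to load a library that is assumed not to exist. The library name is invented for the example; System.mapLibraryName and System.loadLibrary are the standard API calls.

```java
final class LibraryNameDemo {
    // Platform-specific file name for a library, e.g. as used by System.loadLibrary.
    static String platformFileName(String libName) {
        return System.mapLibraryName(libName);
    }

    // Tries to load a dynamic library; returns false if none can be found.
    static boolean tryLoad(String libName) {
        try {
            System.loadLibrary(libName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical name, assumed not to exist on this machine.
        String name = "hypothetical_demo_lib_4711";
        System.out.println("file name searched for: " + platformFileName(name));
        System.out.println("loaded: " + tryLoad(name));
        System.out.println("search path: " + System.getProperty("java.library.path"));
    }
}
```

The printed file name differs per platform (a .so, .dll or .dylib style name), which is one visible trace of the platform-specific loading described above.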
Native code doesn't have to be limited to a single platform; if it's standard C, you can probably compile it with a cross-compiler for every platform on which a JRE is available.

Related

Does the JVM use system calls to seek OS functionalities?

We know that the JVM calls on the underlying system to allocate memory and CPU time, access files, and much more. How does it work internally to achieve this?
Does the JVM use system calls?

Does the JVM use system calls?
Yes.
How does it work internally to achieve its activities?
The typical pattern is that some of the methods in a Java class are labelled as native. When the JVM encounters a call to a native method it makes a call into C or C++ code that is part of the JVM executable. The native method implementation typically does the following:
Checks the arguments from Java and translates them into a C / C++ compatible form. For example, String arguments need to be converted to zero-terminated form.
Calls the standard C / C++ library function with the arguments it needs.
The library function makes the syscall.
The OS does its stuff and the syscall returns.
The standard C / C++ library function returns.
The native method implementation checks the 'errno'. If there was an error, it creates a Java exception object and throws it.
Otherwise, the native method implementation converts results, etc into Java objects and returns them to the caller of the Java method.
The details vary, depending on what the native method does.
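The steps above can be observed from the Java side with a file that does not exist. This is only a sketch: the temp directory and file name are invented for the example, and the claim that the exception originates from a failed native open call is my reading of how FileInputStream is typically implemented in OpenJDK, not something the code itself proves.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class ErrnoToExceptionDemo {
    // Tries to open a file that does not exist and reports what the caller sees.
    static String openMissingFile() {
        try {
            Path dir = Files.createTempDirectory("errno-demo");
            try (FileInputStream in = new FileInputStream(dir.resolve("missing.txt").toFile())) {
                return "opened";
            } catch (FileNotFoundException e) {
                // The failed OS-level open surfaces as a Java exception object.
                return e.getClass().getSimpleName();
            } finally {
                Files.delete(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(openMissingFile());
    }
}
```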
If you want to get a deeper understanding, I recommend that you check out a copy of the OpenJDK source tree and start trawling. (You need to do the hard yards yourself ....)

Indeed, the JVM relies on system calls, which are the operating system's way of letting processes interact with underlying system resources.
You can run strace java -version to see a bunch of system calls (mmap, mprotect, openat, etc.) executed even during this very limited java/jvm run.
Another good way to find out more is to dig through the JVM sources for native methods.
One example is the implementation of the FileChannel#force method,
which internally calls the fsync system call (for example): https://github.com/AdoptOpenJDK/openjdk-jdk11u/blob/5f01925b80ed851b133ee26fbcb07026ac04149e/src/java.base/unix/native/libnio/ch/FileDispatcherImpl.c#L172
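For reference, this is roughly what the Java side of that call looks like. It is a small sketch with an invented temp file; running it under strace on Linux should, as far as I know, show the corresponding sync call, but that depends on the platform and JDK build.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ForceDemo {
    // Writes text to a temp file, forces it to storage, and returns the file size.
    static long writeAndForce(String text) {
        try {
            Path file = Files.createTempFile("force-demo", ".txt");
            try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
                ByteBuffer buf = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
                while (buf.hasRemaining()) {
                    channel.write(buf);
                }
                // Asks the OS to flush content and metadata to the storage device.
                channel.force(true);
                return channel.size();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes written and forced: " + writeAndForce("hello"));
    }
}
```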

Yes, system calls are the only way an OS gives any program access to its services.
In the case of Java, this is why some OS-specific “features” show through, which spoils the ideal of write-once-run-anywhere. For example, I’ve had a program that I developed on a Windows box fail when run on a Linux box.
The problem turned out to be that in the resources directory the filename was all lower case, but my program had the file name in mixed case. The program worked on Windows since filenames on Windows are case-insensitive, but on Linux they are case-sensitive.
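A quick way to see this difference is to create a file under one spelling and look it up under another. The sketch below uses invented file names in a temp directory; the second result depends on the file system it runs on, so no particular output is promised for it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class CaseSensitivityDemo {
    // Creates a file under one name and checks whether another spelling finds it.
    static boolean lookup(String createdAs, String requestedAs) {
        try {
            Path dir = Files.createTempDirectory("case-demo");
            Path created = Files.createFile(dir.resolve(createdAs));
            try {
                return Files.exists(dir.resolve(requestedAs));
            } finally {
                Files.delete(created);
                Files.delete(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("same case:  " + lookup("config.txt", "config.txt"));
        // Typically true on a case-insensitive file system, false on a case-sensitive one.
        System.out.println("mixed case: " + lookup("config.txt", "Config.TXT"));
    }
}
```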

Dynamic Link Library for Java/Python to access in C/C++?

A quick question that may seem out of the ordinary. (in reverse)
Instead of calling native code from an interpreted language; is there a way to compile Java or Python code to a .dll/.so and call the code from C/C++?
I'm willing to accept even answers such as manually spawning the interpreter or JVM and force it to read the .class/.py files. (is this a good solution?)
Thank you.

gcj can compile most Java source to native code (linked with a libgcj shared library) instead of to JVM bytecode.
There are a number of Python projects that are similar, like shedskin, but none as mature or active.
Cython is similar, but not quite the same—it compiles modules written in a Python-like language into native C extension modules for CPython. But if you put that together with embedding Python in a C app, it gives you most of what you want. But you are still running a Python interpreter loop to tie all those compiled-to-C functions together.
You can also do the same thing with Java—embed the JVM into your app, use gcj to compile any parts you want to native code, while compiling other parts to bytecode, and using JNI to communicate between them.
And of course you can use Jython to embed your Python code into the JVM, which you can embed into your C program, and because you can use JNI directly from Jython any pair of the three languages can effectively talk to each other without going through the third.
The idea of spawning a JVM or a CPython interpreter as a subprocess, which I think you were suggesting in your question, also works just fine. However, the only interface you will have to it in that case will be the child process's stdin/stdout/stderr (or any pipes or sockets you create manually), which isn't as flexible as being able to call methods directly on objects, etc. (Then again, sometimes that extra indirection can be a good thing, forcing you to define a cleanly-separated API between your components.)
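To show how narrow the subprocess interface is, here is a sketch that starts a child JVM and reads everything it prints. The parent is written in Java only to keep the example self-contained; a C/C++ host would do the same thing with its own process API. Locating the launcher through the java.home property is an assumption that holds for ordinary JDK/JRE layouts.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;

final class ChildJvmDemo {
    // Starts "java -version" as a child process and returns what it printed.
    static String childVersionOutput() {
        String javaBin = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        try {
            Process child = new ProcessBuilder(javaBin, "-version")
                    .redirectErrorStream(true) // -version prints to stderr
                    .start();
            // The only channel back from the child is its output stream.
            String output = new String(child.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            int exitCode = child.waitFor();
            if (exitCode != 0) {
                throw new IllegalStateException("child JVM exited with " + exitCode);
            }
            return output;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(childVersionOutput());
    }
}
```

Everything the two sides exchange has to be serialized into those streams, which is the flexibility trade-off mentioned above.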

You can embed a Python interpreter in your C/C++ program.
http://docs.python.org/2/extending/embedding.html
With Java you probably want the Java Native Interface (which works in both directions).
http://en.wikipedia.org/wiki/Java_Native_Interface

You can also look into Lua. While not as widely used as a lot of other scripting languages, it was meant to be embedded easily into executables. It's relatively small and fast. Just another option. If you want to call other languages from your C/C++, look into SWIG.

How is Java code mixed with native code platform independent?

In the code of many Java library classes I can see native methods. Even in the Object class.
If Java is platform independent when Java code is converted into byte codes, then what about native code? Is it also converted into byte code?
Does this native code call go to the OS or is it coming from downloaded or installation of Java itself?

Java library code does make native calls, and these calls are fulfilled by the JVM. Note that each system has an OS-specific JVM, so all the system-dependent native calls are ultimately served by the system-dependent JVM implementation.

There are different flavours of native method:
The native methods in the standard Java libraries will all be implemented (by Oracle and / or the vendor of your Java implementation) for the platform that you are running on. Doing this is part of the process of developing Java for the platform. By the time you get to use Java (on that platform) the porting work has been done. (The methods are implemented in the JVM and its associated native code libraries / DLLs.)
Native methods in your code or in 3rd party libraries are a different matter. The native code that implements these methods does indeed represent a portability impediment, because it needs to (at least) be recompiled for each platform. And in a lot of cases, the porting process may even extend to a complete rewrite of the (native) code.
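For the first flavour, you can ask the runtime itself which methods of a standard class are declared native. The helper below is a small reflection sketch with invented names; the exact list it prints for java.lang.Object varies between JDK versions.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Set;
import java.util.TreeSet;

final class ObjectNativeMethods {
    // Collects the names of all methods declared native in the given class.
    static Set<String> nativeMethodNames(Class<?> type) {
        Set<String> names = new TreeSet<>();
        for (Method m : type.getDeclaredMethods()) {
            if (Modifier.isNative(m.getModifiers())) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("native methods in Object: " + nativeMethodNames(Object.class));
    }
}
```

None of these need a library from you: their implementations ship with the Java installation for your platform.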
If Java is platform independent when Java code is converted into byte codes, then what about native code? Is it also converted into byte code?
No [1]. Native methods are implemented in some other programming language; e.g. C or C++.
(If the native methods could be translated to bytecodes, there would be no need for them to be written as "native" in the first place!)
Does this native code call go to the OS or is it coming from downloaded or installation of Java itself?
It is unlikely that a Java native method will map directly to a system call or a call to one of the standard OS-provided libraries. Native methods are usually implemented either by the JVM implementation, or by customer or 3rd party native libraries. See above.
[1] Actually, there is one exception to this. On the JNode platform, most methods in the Java core libraries that are marked as native do in fact map back to Java code. But that is because almost the entire JNode operating system is implemented in Java. JNode's native code compiler implements some "clever tricks" to allow this to happen.

Can C++ code read Java .class files?

I have been working in Java for a while now, and want to learn how C++ works when it comes to compilation and execution.
I was wondering if there is a way to convert compiled C++ classes into Java .class files and vice versa. I am interested in a single format that can be used both by Java code and by C++ code and that can be directly executed to see the results.

Almost everything can be done if you are ambitious and stubborn enough; however, the question usually lies in time, cost and features.
At its core, the Java language is roughly a subset of C++. There is some syntactic sugar added that may make you feel that there is "something more", like anonymous class implementations or hidden pointers to outer classes, but it is just a thin layer of syntax, which is irrelevant once the code gets compiled.
After compilation, C++ code is represented by machine code. Java's bytecode is of course translatable to machine code, simply by the fact that the JVM executes it and that the JIT compiler can recompile it on the fly into machine code.
So, roughly speaking, all Java code, compiled or not, is translatable to C++.
However, there are some code constructs in C++ that can be compiled into machine code, and that may be representable in Java's bytecode, but that cannot be represented back in the Java language. There are lots of them: from some that are easy to work around, like passing parameters by reference, to more complex ones like pointer (TheList->) arithmetic, to some that are really painful to translate, like multiple inheritance, custom memory management (that is, overloaded operators new and delete), or some wicked types like unions.
So, clearly, C++ code is not generally translatable to Java. Compiled C++ code is even less translatable, as C++ compilers often optimize their output thoroughly, so that it is very hard to guess what the 'classes' or 'functions' were like.
However, if you limit the C++ language, and restrict yourself to not using any of those hard-to-translate constructs (see 'TheList'), then you can make the C++ code translatable. Again: code, but not binaries.
This is not all, though. 'Translatability' is one thing, but the other is: will it run? The most distinctive runtime difference is the garbage collector. Let's say you actually managed to translate some Java code into C++, and you linked it with a C++ application. Your Java/C++ code executes and creates some objects. Who will clean them up? Typically, there's no GC in C++. Your Java code will therefore leak, or you will have to provide/implement some kind of GC for the Java/C++ code. Not pretty. Of course you can limit the Java code to not create any objects, d'oh.
Do not get me wrong: even those hard-to-translate things like pointer arithmetic etc. are translatable: you can generate tons of helper/wire-up code that will replace them with 'proper things of the second platform'. It will, however, be ridiculously complex and slow. I don't think anyone sane will ever try.
So, the only thing that would seem to be left available is very-limited-C++-code <-> somewhat-limited-Java-code. If we cut down the question to this, then yes, that should be translatable. But...
What does it mean to translate code? How would you do it? You have to read, process, and analyze the source code, and then somehow produce the other code in the other language. Well, OK. Producing code is simple, it's just text. But have you ever tried to analyze code? Long story short, let me just tell you that reading/parsing Java code is at least an order of magnitude easier than reading/parsing C++ code. Java was partly designed to be easily parsable by relatively simple algorithms. If you drop any attempts to optimize, writing a Java parser/compiler is a relatively simple thing. C++, on the other hand, was not designed that way. As with Java, to parse C++ properly you'd effectively have to create a custom C++ compiler, but also a preprocessor. To some extent, you might also need to implement some parts of the linker. To make things a little worse, C++ evolved from older languages and is literally packed with once-in-a-lifetime features that make the syntax really difficult to process accurately (e.g. have you ever used the alternative token set? ...and this is only the beginning :)).
Do not get too scared, though! I just want to give you a feeling for what you are trying to touch. You probably wouldn't need to write it yourself. Such tools already exist, both for Java (really many, actually) and for C++ (few, and I bet the reasonable ones are not fully free... or maybe you could use the GCC toolset). They produce machine-processable representations of the source code, and if you really want to do some translating job, I'd suggest you start there.
Of course my knowledge can be off by a few years, and maybe someone has already written some more-or-less working translator - I'd love to see it!
If not, I think it is not worth it. Try embedding Java's runtime in your C++ app, or talk from Java to C++ DLLs via JNI. It is much simpler!

Compiled C++ code is loaded by the OS. That is, the C++ linker generates OS-dependent executable modules, whereas a Java .class file is loaded by the JVM. The JVM executes Java bytecode.
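On the literal question of reading .class files: the format is a documented binary layout, so any language can parse it. As a small sketch (in Java, with invented helper names), the code below reads the first eight bytes of a class file, which hold the magic number and the format version. It assumes a top-level class whose .class file is reachable as a resource, which I believe is the case for java.lang.Object on current JDKs.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class ClassFileHeaderDemo {
    // Returns {magic, minorVersion, majorVersion} of a top-level class's .class file.
    static int[] readHeader(Class<?> type) {
        String resource = type.getSimpleName() + ".class";
        try (InputStream raw = type.getResourceAsStream(resource)) {
            if (raw == null) {
                throw new IllegalStateException("class file not found: " + resource);
            }
            DataInputStream in = new DataInputStream(raw);
            int magic = in.readInt();           // 4 bytes, big-endian
            int minor = in.readUnsignedShort(); // 2 bytes
            int major = in.readUnsignedShort(); // 2 bytes
            return new int[] {magic, minor, major};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] header = readHeader(Object.class);
        System.out.printf("magic=0x%08X minor=%d major=%d%n", header[0], header[1], header[2]);
    }
}
```

A C++ program could read the same bytes with plain file I/O; reading the format is the easy part, executing the bytecode is what needs a JVM.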
If you want to make a Java bytecode loader/runner, you could start from the JVM source code.
If you aim to load and execute compiled C++ code in a Java environment, the following are required
(assuming a 32-bit Windows platform):
. PE parser/loader
. x86 CPU instruction parser(?)
. x86 instruction to Java bytecode translator
In short, it's almost impossible.
For simple C/C++ vs. Java interoperation, JNI will do.

I think JNI will be useful:
"The Java Native Interface (JNI) is a programming framework that enables Java code running in a Java Virtual Machine (JVM) to call, and to be called by, native applications (programs specific to a hardware and operating system platform) and libraries written in other languages such as C, C++ and assembly."
Here are good tips on how to program a simple example using the Java Native Interface (write a Java application that calls a C function).

Converting already compiled C++ code to JVM bytecode is not an easy task, especially if you want to be able to use compiled C++ code from multiple platforms.
It will probably be easier to make a JVM backend for clang instead. The drawback here is that people who want this functionality must use your compiler.
In short, if you want to write code targeting the Java virtual machine, then use a language and compiler already made to do it. Like Java...

Does each Java API function map to a Java native method?

Does each Java API function map to a Java native method?
If not, how do those functions get at the functionality of the operating system?

Some of them do, and others build on top of the former (or on functionality offered by the JVM itself). Only very few methods map directly to native code, as it is platform specific, and the whole point of the JVM is to offer a platform neutral stage for the code to run.
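The layering can be checked with reflection. In this sketch (helper names are mine), java.util.ArrayList stands in for an API class written in plain Java, and System.arraycopy for the native building block such classes rely on; the zero count for ArrayList reflects the OpenJDK versions I know, not a guarantee of the specification.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;

final class NativeLayeringDemo {
    // Counts the methods a class declares with the native modifier.
    static long countNative(Class<?> type) {
        long count = 0;
        for (Method m : type.getDeclaredMethods()) {
            if (Modifier.isNative(m.getModifiers())) {
                count++;
            }
        }
        return count;
    }

    // Checks whether System.arraycopy is declared native.
    static boolean arraycopyIsNative() {
        try {
            Method m = System.class.getDeclaredMethod(
                    "arraycopy", Object.class, int.class, Object.class, int.class, int.class);
            return Modifier.isNative(m.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("native methods in ArrayList: " + countNative(ArrayList.class));
        System.out.println("System.arraycopy is native: " + arraycopyIsNative());
    }
}
```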

A Java program runs in a JVM: Java Virtual Machine. The actual executed program is the JVM (launched by the java command). This JVM is written in C and/or C++. Its role is to load Java bytecode, interpret it (and compile it to native code), and run it.
Some Java methods have the native modifier, and this means that they don't contain any byte-code to execute, but are directly mapped to a native function written in C or C++.

No, as you can see in the source code or by decompilation. The truth is, only very few methods map to native code.

Think about your question: it doesn't make complete sense, as not all functions (methods) use functions of the operating system.
