What does a JVM have to do when calling a native method?

What are the usual steps that the JVM runtime has to perform when calling a Java method that is declared as native?
How does a HotSpot 1.8.0 JVM implement a JNI function call? What checking steps are involved (e.g. unhandled exceptions after return?), what bookkeeping has the JVM to perform (e.g. a local reference registry?), and where does the control go after the call of the native Java method? I would also appreciate it if someone could provide the entry point or important methods from the native HotSpot 1.8.0 code.
Disclaimer: I know that I can read the code myself but a prior explanation helps in quickly finding my way through the code. Additionally, I found this question worthwhile to be Google searchable. ;)

Calling a JNI method from Java is rather expensive compared to a simple C function call.
HotSpot typically performs most of the following steps to invoke a JNI method:
Create a stack frame.
Move arguments to the proper registers or stack locations according to the ABI.
Wrap object references in JNI handles.
Obtain JNIEnv* and, for static methods, jclass, and pass them as additional arguments.
Check whether the method_entry trace function should be called.
Lock an object monitor if the method is synchronized.
Check if the native function is linked already. Function lookup and linking is performed lazily.
Switch thread from in_java to in_native state.
Call the native function.
Check whether a safepoint is needed.
Return thread to in_java state.
Unlock monitor if locked.
Notify method_exit.
Unwrap object result and reset JNI handles block.
Handle JNI exceptions.
Remove the stack frame.
The source code for this procedure can be found at SharedRuntime::generate_native_wrapper.
As you can see, the overhead may be significant. But in many cases most of the above steps are not necessary, for example when a native method just performs some encoding/decoding on a byte array, throws no exceptions, and calls no other JNI functions. For these cases HotSpot has a non-standard (and little-known) convention called Critical Natives, discussed here.
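The lazy lookup and linking step from the list above can be observed from plain Java. The following is a minimal sketch, not HotSpot's own code: the class name LazyLinkDemo and the native method checksum are hypothetical, and no native library is assumed to exist. The class loads and runs normally; the link failure only surfaces when the native method is first invoked.

```java
class LazyLinkDemo {
    // Hypothetical native method: deliberately, no library implements it.
    static native int checksum(byte[] data);

    // Returns true if the first call fails to link the native function.
    static boolean linkFailsOnFirstCall() {
        try {
            checksum(new byte[0]);
            return false;
        } catch (UnsatisfiedLinkError e) {
            // The JVM tried to resolve the native symbol only at this point.
            return true;
        }
    }

    public static void main(String[] args) {
        // Reaching this line shows that loading the class did not require
        // the native function to be linked.
        System.out.println("class loaded, native method not linked yet");
        System.out.println("link error on first call: " + linkFailsOnFirstCall());
    }
}
```

With a real library loaded through System.loadLibrary, the same first call would instead resolve the symbol and later calls would skip the lookup.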

Related

Why are there JVM instructions `monitorenter/monitorexit` but no `wait/notifyAll` (they are native calls)?

When we write synchronized(some_object){} we can see two JVM instructions monitorenter/monitorexit issued as the byte code.
When we write synchronized(some_object){some_object.wait()} I would expect to see special JVM instructions like wait, but there are none. Instead, wait/notify are implemented as native C functions.
Why is there such an inconsistency (either have them all as JNI or all as Java bytecode)? Was there a particular (historical) reason, or is it just a matter of taste?
Context: I am interested in this because having all of monitorenter/monitorexit/wait/notify in the bytecode would allow 'Java bytecode program correctness verifiers that do not handle JNI' to verify concurrent Java programs that do not use JNI. Currently, such a hypothetical tool has to work around wait/notify.
i would expect to see special JVM instructions like wait
I wouldn't. That would be inconsistent, in my view - in the source code, you're just calling a method, so it makes sense that you're just calling a method in the bytecode as well. Otherwise the compiler would have to have special knowledge of those methods, where it doesn't at the moment.
Arguably it would make more sense for monitorenter and monitorexit to be implemented via method calls as well (as they are in .NET, for example). Certain methods will always be native and deeply tied to the JVM itself - I don't see anything unreasonable about that, and I wouldn't want each of those to be implemented via a separate bytecode operation. However, I don't have too much issue with synchronized having special bytecode supporting it, given that it's a language construct (like try/catch/finally) rather than just a regular method call.
There is no need for a verification program to deal with JNI, as the semantics of the wait and notify calls are well specified. That's no different from dedicated bytecode instructions. The same applies to how the HotSpot optimizer deals with a lot of well-known method invocations, which may include wait and notify. It does not necessarily generate a costly JNI invocation but rather generates code performing these low-level operations directly. Methods handled this way are called intrinsic methods (see also here or here).
There are so many that you couldn't call it bytecode anymore if you tried to reserve an opcode for each of them. Further, which methods are handled this way depends on the actual JVM implementation and the hardware architecture on which it runs. It might also change between versions, so it makes no sense to carve them in stone by defining bytecode instructions for them.
You wrote "Currently, such hypothetical tool has to workaround wait/notify". In fact, handling these special methods is not a workaround. It's what such an audit tool has to do for a lot of methods, such as those declared in Lock and Condition, which have similar threading-related semantics, and there are many other well-known concurrency tools nowadays that have to be handled as well.
The exact decision to create monitorenter and monitorexit instructions but make wait and notify methods on Object is historical (it dates back over 20 years). Today, the decision might look different if the developers had to make it again. But I guess it would rather go in the direction of making even monitorenter and monitorexit special methods that are invoked under the hood rather than bytecode instructions. First, they are not the only thread synchronization tool anymore. Second, it's how most of the new features were added in recent JVMs: preferably as methods, even if they are expected to be intrinsified by most, if not all, implementations.
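The split discussed in this thread can be seen in a small class. This is only an illustrative sketch (the class MonitorDemo and its methods are made up for the example): the synchronized blocks are compiled to monitorenter/monitorexit, while wait and notifyAll appear as ordinary method invocations on java.lang.Object. Running javap -c on the compiled class should show both.

```java
class MonitorDemo {
    private final Object lock = new Object();
    private boolean ready = false;

    void signalReady() {
        synchronized (lock) {       // compiled to monitorenter
            ready = true;
            lock.notifyAll();       // plain invokevirtual of Object.notifyAll
        }                           // compiled to monitorexit
    }

    void awaitReady() throws InterruptedException {
        synchronized (lock) {
            while (!ready) {        // loop guards against spurious wakeups
                lock.wait();        // plain invokevirtual of Object.wait
            }
        }
    }

    // Starts a signalling thread, waits for it, and reports whether the flag was set.
    static boolean runOnce() {
        MonitorDemo demo = new MonitorDemo();
        Thread signaller = new Thread(demo::signalReady);
        signaller.start();
        try {
            demo.awaitReady();
            signaller.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return demo.ready;
    }

    public static void main(String[] args) {
        System.out.println("ready flag observed: " + runOnce());
    }
}
```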

What is the correct way to use v8::Locker, and why must I use it?

I'm trying to embed v8 in an Android application using NDK.
I have a JNI module that looks something like this (JNI mapping code not shown):
#include <jni.h>
#include <android/log.h>
#include <v8.h>
using namespace v8;
static jlong getMagicNumber() {
    HandleScope handle_scope;
    Persistent<Context> context = Context::New();
    Context::Scope context_scope(context);
    Handle<String> source = String::New("40 + 2");
    Handle<Script> script = Script::Compile(source);
    Handle<Value> result = script->Run();
    context.Dispose();
    return result->NumberValue();
}
The first time I run getMagicNumber, it correctly runs and returns 42. The second time I try to run it, it crashes.
Specifically, this ASSERT seen in v8's isolate.h fails:
// Returns the isolate inside which the current thread is running.
INLINE(static Isolate* Current()) {
    Isolate* isolate = reinterpret_cast<Isolate*>(
        Thread::GetExistingThreadLocal(isolate_key_));
    ASSERT(isolate != NULL);
    return isolate;
}
It sounds a lot like this problem, which suggests using v8::Locker to obtain "exclusive access to the isolate".
By adding a simple Locker l; to the top of getMagicNumber, the crash no longer occurs. Problems that fix themselves that easily tend to break themselves when I'm not paying attention.
I only have the most tenuous understanding of why this fixes my problem, and I'm getting compiler warnings that I'm using v8::Locker in a deprecated fashion. The recommended method is to provide it with a v8::Isolate as an argument to v8::Locker's constructor, but I have no idea how I'm supposed to "obtain" an isolate.
Ultimately: What is the proper way to solve this problem according to the current state of v8, and why?
As I understand it, a V8 isolate is an instance of the V8 runtime, complete with a heap, a garbage collector, and zero or more V8 contexts. Isolates are not thread-safe and must be protected via v8::Locker.
In general, to use V8 you must first create an isolate:
v8::Isolate* isolate = v8::Isolate::New();
Then, to use the isolate from any thread:
v8::Locker locker(isolate);
v8::Isolate::Scope isolateScope(isolate);
At this point the thread owns the isolate and is free to create contexts, execute scripts, etc.
Now, for the benefit of very simple applications, V8 provides a default isolate and relaxes the locking requirement, but you can only use these crutches if you always access V8 from the same thread. My guess is that your application failed because the second call was made from a different thread.
I am just learning V8 now, but I think you need to call:
v8::Locker locker(isolate);
This will create a stack-allocated Locker object which will block the Isolate from being used on another thread. When the current function returns, this stack object's destructor will be called automatically, causing the Isolate to be unlocked.
Then you need to call:
v8::Isolate::Scope isolateScope(isolate);
This sets the current thread to run this Isolate. An Isolate can only be used by one thread at a time. The Locker enforces this, but the Isolate itself needs to be configured for the current thread. This creates a stack-allocated object which specifies which Isolate is associated with the current thread. Just like the Locker, when this variable goes out of scope (when the current function returns) the Scope destructor gets called to un-set the Isolate as the default. I believe this is needed because many of the V8 API calls need a reference to an Isolate, but don't take one as a parameter. Therefore they need one they can access directly (probably through per-thread variables).
All the Isolate::Scope class does is call Isolate::Enter() in the constructor and Isolate::Exit() in the destructor. Therefore, if you want more control, you can call Enter()/Exit() yourself.

.NET GC stuck on JNI call from finalizer()

I have a .NET application that uses JNI to call Java code. In the .NET finalizer we make a JNI call to clean up the connected resource on the Java side. But from time to time this JNI call gets stuck.
As expected, this hangs the entire .NET process, which never recovers.
Below you can see the thread dump we got from .NET:
NET Call Stack
Function
.JNIEnv_.NewByteArray(JNIEnv_*, Int32)
Bridge.NetToJava.JVMBridge.ExecutePBSCommand(Byte[], Int32, Byte[])
Bridge.Core.Internal.Pbs.Commands.PbsDispatcher.Execute(Bridge.Core.Internal.Pbs.PbsOutputStream, Bridge.Core.Internal.DispatcherObjectProxy)
Bridge.Core.Internal.Pbs.Commands.PbsCommandsBundle.ExecuteGenericDestructCommand(Byte, Int64, Boolean)
Bridge.Core.Internal.DispatcherObjectProxy.Dispose(Boolean)
Bridge.Core.Internal.Transaction.Dispose(Boolean)
Bridge.Core.Internal.DispatcherObjectProxy.Finalize()
Full Call Stack
Function
ntdll!KiFastSystemCallRet
ntdll!NtWaitForSingleObject+c
kernel32!WaitForSingleObjectEx+ac
kernel32!WaitForSingleObject+12
jvm!JVM_FindSignal+5cc49
jvm!JVM_FindSignal+4d0be
jvm!JVM_FindSignal+4d5fa
jvm!JVM_FindSignal+beb8e
jvm+115b
jvm!JNI_GetCreatedJavaVMs+1d26
Bridge_NetToJava+1220
clr!MethodTable::SetObjCreateDelegate+bd
clr!MethodTable::CallFinalizer+ca
clr!SVR::CallFinalizer+a7
clr!WKS::GCHeap::TraceGCSegments+239
clr!WKS::GCHeap::TraceGCSegments+415
clr!WKS::GCHeap::FinalizerThreadWorker+cd
clr!Thread::DoExtraWorkForFinalizer+114
clr!Thread::ShouldChangeAbortToUnload+101
clr!Thread::ShouldChangeAbortToUnload+399
clr!ManagedThreadBase_NoADTransition+35
clr!ManagedThreadBase::FinalizerBase+f
clr!WKS::GCHeap::FinalizerThreadStart+10c
clr!Thread::intermediateThreadProc+4b
kernel32!BaseThreadStart+34
I have no idea whether .NET finalizers are as bad an idea as Java finalizers, but calling potentially (dead)locking code (I see a Win32 wait call at the very top of the native stack) from anything like a finalizer, regardless of the platform, is definitely a bad idea. You need to rid your native code of any potential locking, or have an emergency-brake timeout at the .NET level.
As I didn't find a question, I won't post a formal answer here but rather tell a story about something similar I once went through:
We created C objects via JNI that were backed by Java objects, and we decided to clean up the C objects within the finalize method. However, we foresaw deadlocks, as finalize is called from a non-application thread, the garbage collector. As the entire world is stopped while collecting the garbage, whenever the finalizer meets a lock it's immediately a deadlock. Thus we decided to use a Java mechanism called phantom references. It's possible to bind a number (the C pointer) to each of these 'references', and when the VM removes a referenced object it puts such a reference into a queue. One can then pull this data whenever appropriate and remove the C object.
I think at least your problem is the same.

Is it possible to make GC manage native object's lifetime?

With C++ and C# experience and some little Java knowledge I'm now starting a Java+JNI (C++) project (Android, if that matters).
I have a native method that creates an instance of some C++ class and returns a pointer to it as a Java long value (say, a handle). Other native methods, called from Java code here and there, then take the handle as a parameter to do some native operations on this object. The C++ side does not own the object; the Java side does. But in the current architecture design it's hard to define who exactly owns the object and when to delete it. So it would probably be nice to make the Java VM garbage collector manage the object's lifetime somehow. The C++ class does not consume any resources except a small piece of memory. So it's OK if several such objects are never destructed.
In C# I would probably wrap the native IntPtr handle in some managed wrapper class and override its finalizer to call the native object's destructor when the managed wrapper is garbage collected. SafeHandle, AddMemoryPressure, etc. might also be of help here.
This is a different story with Java's finalize. The second thing you know after 'Hello world' in Java, is that using finalize is bad. Are there any other ways to accomplish this in Java? Maybe using PhantomReference?
Well, let's consider the reason WHY finalize and Co are problematic: as you know, there's no guarantee that finalize will be called before the VM is shut down, which means that special cleanup code won't necessarily run (imo a bad decision, I don't see any problem with running through the finalize queue at cleanup, but well, that's how it is). This is also exactly the same situation in C#.
Now your objects only consume memory, which will be cleaned up by the OS anyhow when the VM is destroyed, so the only case where finalize is problematic won't matter for you. So yes, you can indeed use this variant and it'll work perfectly fine, but it may not exactly be considered a great architectural design, and as soon as you add resources to your C++ code where the OS doesn't handle the cleanup correctly, you will run into problems.
Also note that implementing a finalizer results in some additional overhead for the GC and means it takes two cycles to clean up one of these objects (and whatever you do, don't ever save an object in the finalize method).
If you understand why you should avoid using Java's finalize method, you will also understand how to use it correctly. Using finalize for closing system resources (files and handles) is bad because you don't actually know when those resources will be closed and released. Using complex finalize logic is bad as your object reference can leak out and get pinned in memory again.
For your scenario, it is perfectly fine to use finalize.
Using a wrapper with a finalizer is a decent solution here,
but if you really don't want to do that you can use a PhantomReference with a ReferenceQueue to clean it up (you will need a separate thread to poll the queue).
So how can we achieve this using a phantom reference?
Create a wrapper object for your native intPtr object.
Create a phantom reference (with a reference queue) on the wrapper object.
Create and maintain a map of phantom reference to intPtr.
Create a thread that monitors the reference queue for finalized wrapper object instances.
This thread gets the phantom reference from the reference queue, looks up the intPtr using the phantom reference, and calls the destructor on the native object referenced by intPtr.
While all this is happening, you can go about happily using the wrapper object in your Java code.
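The steps above could be sketched roughly as follows. This is an illustration under assumptions, not a tested production pattern: the class NativeHandleReaper and its method names are made up, and the native destructor is represented by a LongConsumer callback that a real project would back with a JNI call taking the handle.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongConsumer;

class NativeHandleReaper {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Map of phantom reference -> native handle. Storing the references here
    // also keeps the reference objects themselves alive until they are processed.
    private final Map<Reference<?>, Long> handles = new ConcurrentHashMap<>();
    // Stand-in for the native destructor (assumed to be a JNI call in practice).
    private final LongConsumer destructor;

    NativeHandleReaper(LongConsumer destructor) {
        this.destructor = destructor;
    }

    // Associates a wrapper object with the native handle it owns.
    void register(Object wrapper, long handle) {
        handles.put(new PhantomReference<>(wrapper, queue), handle);
    }

    // Number of handles that have not been released yet.
    int pendingCount() {
        return handles.size();
    }

    // Starts the monitoring thread described in the steps above.
    Thread startReaperThread() {
        Thread reaper = new Thread(() -> {
            try {
                while (true) {
                    // Blocks until the GC enqueues a reference whose wrapper is gone.
                    Reference<?> ref = queue.remove();
                    Long handle = handles.remove(ref);
                    if (handle != null) {
                        destructor.accept(handle);
                    }
                }
            } catch (InterruptedException e) {
                // Interrupted: stop monitoring.
            }
        }, "native-handle-reaper");
        reaper.setDaemon(true);
        reaper.start();
        return reaper;
    }

    public static void main(String[] args) {
        NativeHandleReaper reaper =
            new NativeHandleReaper(h -> System.out.println("freeing handle " + h));
        reaper.startReaperThread();
        Object wrapper = new Object();
        reaper.register(wrapper, 42L);   // 42L is a placeholder handle value
        System.out.println("pending handles: " + reaper.pendingCount());
    }
}
```

When, or whether, the destructor runs depends on the garbage collector, so the sketch only fits cases like the one in the question, where late or missing cleanup is acceptable.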

Thread-aware heap allocation tracking with JVMTI

While writing a profiler, I would also like to implement the typical task of heap profiling. Specifically, I would like to track which thread has allocated how much data. Using JVMTI, I thought it would be sufficient to hook into the VM Object Allocation and Object Free events. Sadly, I read that the first event is not triggered for allocations made via new.
The last idea I had was to check in the MethodExit event whether the method's name is <init> and thus treat this call as an object allocation. However, within this event I cannot get the object and thus I cannot invoke GetObjectSize.
Simply iterating over the heap yields no information about which object was allocated by which thread. Does anyone have an idea how to implement this?
A quick glance into the _new implementation of the HotSpot VM (templateTable_x86_64.cpp) seems to indicate that _new doesn't offer any hooks for JVMTI (not even in the slow case, it seems). So if your trick doesn't work, I don't see any other possibility, but I'm by no means an expert on JVMTI.
I assume compiling your own Hotspot VM with a small patch isn't especially useful for you?
This heapTracker demo illustrates how to track all the objects in the heap.
Because the VMObjectAlloc event is not sent for ordinary bytecode allocations (it covers cases such as reflection), the demo uses bytecode instrumentation to track new object allocations.
You can use the GetCurrentThread function to find out which thread allocated the object.
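The Java side of such an instrumentation approach might look like the sketch below. It is not the heapTracker demo's code: the class AllocationTracker and its methods are hypothetical, the instrumented bytecode is assumed to call recordAllocation after each allocation, and the object size is assumed to be supplied by the agent (for example from GetObjectSize) rather than computed here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

class AllocationTracker {
    // Bytes recorded per thread id; LongAdder keeps concurrent updates cheap.
    private static final Map<Long, LongAdder> BYTES_BY_THREAD = new ConcurrentHashMap<>();

    // Hypothetical hook: injected bytecode would call this on the allocating
    // thread, passing the size reported by the agent.
    static void recordAllocation(long sizeInBytes) {
        long threadId = Thread.currentThread().getId();
        BYTES_BY_THREAD.computeIfAbsent(threadId, id -> new LongAdder()).add(sizeInBytes);
    }

    // Total bytes recorded so far for the given thread id.
    static long bytesAllocatedBy(long threadId) {
        LongAdder total = BYTES_BY_THREAD.get(threadId);
        return total == null ? 0L : total.sum();
    }

    public static void main(String[] args) {
        recordAllocation(16);   // placeholder sizes for illustration
        recordAllocation(24);
        long me = Thread.currentThread().getId();
        System.out.println("bytes recorded for this thread: " + bytesAllocatedBy(me));
    }
}
```

Because the hook runs on the allocating thread, attributing the allocation to Thread.currentThread() needs no extra lookup.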
Is there some reason you can't call GetObjectSize from the MethodEntry event for a constructor?
If you're interested in executing code before a method returns, then you can listen for the MethodEntry event, and if the method is named <init>, you can call NotifyFramePop to listen for the FramePop event for the current frame. This event is similar to the MethodExit event, but occurs before the method returns so you can still get the this object.
