I am continuing my path toward a deep understanding of Java threads. Unfortunately my Java certification didn't cover that part, so the only way to learn is to post a series of dumb questions. After so many years of Java development, I sometimes wonder how much I still have to learn :-)
In particular, my attention is now on the Reference Handler thread.
"Reference Handler" daemon prio=10 tid=0x02da3400 nid=0xb98 in Object.wait() [0x0302f000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x1aac0320> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:485)
at java.lang.ref.Reference$ReferenceHandler.run(Unknown Source)
- locked <0x1aac0320> (a java.lang.ref.Reference$Lock)
Some questions follow. For some of them I know the answer, but I am not posting it because I would like to hear other people's opinions:
What is the Reference Handler thread supposed to do?
A thread dump should be read bottom-up, so why does the stack trace start with locked? Shouldn't the lock statement appear only after the thread has run?
What does "Native Method" mean?
Why "Unknown Source"? In which cases can a thread dump not recall the source location?
Lastly, the waiting on and locked lines show the same address (<0x1aac0320>), why?
As usual, I kindly ask that all the questions be answered, so that I can mark the question as answered.
I suspect it handles running finalizers for the JVM. It's an implementation detail and as such not specified in the JVM spec.
This only means that the java.lang.ref.Reference$Lock was locked in the method mentioned in the line preceding it (i.e. in ReferenceHandler.run()).
"Native Method" simply means that the method is implemented in native (i.e. non-Java) code (think JNI).
Unknown Source only means that the .class file doesn't contain any source-location information (at least for this specific point). This can happen either when the method is a synthetic one (which doesn't look like the case here) or when the class was compiled without debug information.
When a thread waits on some object, it must have locked that object at some point further down the call stack, so you can't really have a waiting on without a corresponding locked.
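This pairing is easy to reproduce yourself. Here is a minimal sketch (class and field names are mine, not from any of the dumps above): run it and take a thread dump with jstack <pid>, and you will see a locked and a waiting on line with the same address, just like the Reference Handler trace.

public class WaitDemo {
    private static final Object LOCK = new Object();

    public static void main(String[] args) throws InterruptedException {
        // Entering the synchronized block produces the "- locked <...>" line;
        // wait() then releases the monitor and produces "- waiting on <...>"
        // for the very same address, while the thread state shows
        // WAITING (on object monitor).
        synchronized (LOCK) {
            LOCK.wait();
        }
    }
}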
1) The Finalizer Thread calls finalizer methods.
The Reference Thread has a similar purpose.
http://www.java2s.com/Open-Source/Java-Document/6.0-JDK-Core/lang/java/lang/ref/Reference.java.htm
The OpenJDK source states it is a
High-priority thread to enqueue pending References
The GC builds a simple linked list of references that need to be processed, and this thread quickly transfers them onto their proper queues. The work is split into two phases so that the GC does nothing but discover the References; this thread then calls the code that handles them, e.g. invoking Cleaners and notifying ReferenceQueue listeners.
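You can watch this second phase from plain Java code. A minimal sketch (my own example, not OpenJDK internals): once the GC clears the weakly reachable referent, the Reference Handler thread moves the Reference onto the queue it was registered with.

import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

public class QueueDemo {
    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        WeakReference<Object> ref = new WeakReference<>(new Object(), queue);

        System.gc(); // request a collection (not guaranteed, but usually enough here)

        // The Reference Handler enqueues the cleared reference;
        // remove() blocks until it shows up (here, at most 10 seconds).
        Reference<?> enqueued = queue.remove(10_000);
        System.out.println(enqueued == ref); // prints true if the GC ran
    }
}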
2) A lock is acquired for a synchronized method before it is entered.
3-5) covered by Joachim ;)
Wow, too deep for me. I can only answer one or two of your questions.
"Native Method" simply means the implementation of that method is in some native (i.e. C or C++) library. Once the call stack has "gone native", the JVM can no longer monitor it. No way for it to provide additional stack information.
"Unknown Source" likely means the code was compiled with optimization turned on and debugging info turned off (-g flag?). This eliminates the file/line information from the .class file.
Related
Recently I have often encountered situations where the CPU reaches 100%, so I checked the CPU usage:
It seems that the memory is full, so I dumped the heap and tried to analyze it using MAT.
I first noticed that the memory retained by Finalizer is very abnormal: more than 2 GB.
For comparison, it is about 500 MB under normal circumstances.
I initially thought it was because the Finalizer thread was blocked, but Java thread dump: BLOCKED thread without "waiting to lock ..." seems to mean that this is not the source of the problem:
"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007f83c008c000 nid=0x1c190 waiting for monitor entry [0x00007f83a1bba000]
java.lang.Thread.State: BLOCKED (on object monitor)
at java.util.zip.Deflater.deflateBytes(Native Method)
at java.util.zip.Deflater.deflate(Deflater.java:444)
- locked <0x00000006d1b327a0> (a java.util.zip.ZStreamRef)
at java.util.zip.Deflater.deflate(Deflater.java:366)
at java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:251)
at java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:211)
at java.util.zip.GZIPOutputStream.write(GZIPOutputStream.java:145)
- locked <0x00000006d1b32748> (a java.util.zip.GZIPOutputStream)
at org.apache.coyote.http11.filters.GzipOutputFilter.doWrite(GzipOutputFilter.java:72)
at org.apache.coyote.http11.Http11OutputBuffer.doWrite(Http11OutputBuffer.java:199)
at org.apache.coyote.Response.doWrite(Response.java:538)
at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:328)
at org.apache.catalina.connector.OutputBuffer.flushByteBuffer(OutputBuffer.java:748)
at org.apache.catalina.connector.OutputBuffer.realWriteChars(OutputBuffer.java:433)
at org.apache.catalina.connector.OutputBuffer.flushCharBuffer(OutputBuffer.java:753)
at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:284)
at org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:261)
at org.apache.catalina.connector.CoyoteOutputStream.flush(CoyoteOutputStream.java:118)
at javax.imageio.stream.FileCacheImageOutputStream.close(FileCacheImageOutputStream.java:238)
at javax.imageio.stream.ImageInputStreamImpl.finalize(ImageInputStreamImpl.java:874)
at java.lang.System$2.invokeFinalize(System.java:1270)
at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:98)
at java.lang.ref.Finalizer.access$100(Finalizer.java:34)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:210)
Following the references all the way down, I think I found the object that takes up the memory:
So it seems that URLJarFile creates a lot of unused buffers, but I don't know how to continue tracking down the source of the problem from here. Maybe I need to monitor the call stacks that allocate here?
Addendum: I think I found some useful information; most of these are loaded from jstl-1.2.jar.
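For reference, the pile-up mechanism can be reproduced in miniature (a deliberately contrived sketch, not the ImageIO code above): there is only one Finalizer thread, so a single slow finalize() stalls every other object waiting for finalization, and the memory retained by Finalizer grows.

public class SlowFinalizeDemo {
    @Override
    protected void finalize() throws Throwable {
        // Simulates a finalizer stuck on slow I/O (like the GZIP flush above).
        // While this sleeps, the single Finalizer thread can process nothing
        // else, and unfinalized garbage accumulates on the heap.
        Thread.sleep(60_000);
    }

    public static void main(String[] args) {
        while (true) {
            new SlowFinalizeDemo(); // garbage with a pending finalizer
            System.gc();            // push instances toward the finalizer queue
        }
    }
}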
I have a dump from JProfiler, where I was looking at performance and found blocked threads during a test run. It looks like this:
According to the question Understanding the Reference Handler thread, the lock on java.lang.ref.Reference$Lock would mean the object is locked for running finalizers for GC. Is this assumption correct?
Also, why does the second row state that this lock is owned by another of my own threads? Is that possible?
Does anyone know what is happening here?
My application uses Gson 2.2 for converting POJOs to JSON. When I was running a load test I stumbled upon a lot of threads blocked in the Gson constructor:
"http-apr-28201-exec-28" #370 daemon prio=5 os_prio=0 tid=0x0000000001ee7800 nid=0x62cb waiting for monitor entry [0x00007fe64df9a000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.google.gson.Gson.<init>(Gson.java:200)
at com.google.gson.Gson.<init>(Gson.java:179)
The thread dump does NOT show any thread holding the [0x00007fe64df9a000] monitor.
How can I find out who holds it?
Gson code at line 200 looks pretty innocent:
// built-in type adapters that cannot be overridden
factories.add(TypeAdapters.STRING_FACTORY);
factories.add(TypeAdapters.INTEGER_FACTORY);
I'm using JRE 1.8.0_91 on Linux
tl;dr I think you are running into GC-related behavior, where threads are put into a waiting state to allow for garbage collection.
I do not have the whole truth but I hope to provide some pieces of insight.
The first thing to realize is that the number in brackets, [0x00007fe64df9a000], is not the address of a monitor. The number in brackets can be seen for all threads in a dump, even threads in the running state, and it does not change. An example from my test dump:
main" #1 prio=5 os_prio=0 tid=0x00007fe27c009000 nid=0x27e5c runnable [0x00007fe283bc2000]
java.lang.Thread.State: RUNNABLE
at Foo.main(Foo.java:12)
I am not sure what the number means, but this page hints that it is:
... the pointer to the Java VM internal thread structure. It is generally of no interest unless you are debugging a live Java VM or core file.
The format of the trace explained there is a bit different, though, so I am not sure I am correct.
The way a dump looks when the address of the actual monitor is shown:
"qtp48612937-70" #70 prio=5 os_prio=0 tid=0x00007fbb845b4800 nid=0x133c waiting for monitor entry [0x00007fbad69e8000]
java.lang.Thread.State: BLOCKED (on object monitor)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:233)
- waiting to lock <0x00000005b8d68e90> (a java.lang.Object)
Notice the waiting to lock line in the trace and that the address of the monitor is different from the number in brackets.
The fact that we cannot see the address of the monitor involved indicates that the monitor exists only in native code.
Secondly, the Gson code involved does not contain any synchronization at all. The code just adds an element to an ArrayList (assuming no bytecode manipulation has been done and nothing fishy is being done at low level). I.e., it would not make sense to see the thread waiting for a standard synchronization monitor at this call.
I found some indications that threads can be shown as waiting for a monitor entry when there is a lot of GC going on.
I wrote a simple test program to try to reproduce it by just adding a lot of elements to an array list:
import java.util.ArrayList;
import java.util.List;

public class Foo {
    public static void main(String[] args) {
        List<String> l = new ArrayList<>();
        while (true) {
            for (int i = 0; i < 100_100; i++) {
                l.add("" + i); // allocate heavily to trigger frequent GCs
            }
            l = new ArrayList<>(); // drop the list so it becomes garbage
        }
    }
}
Then I took thread dumps of this program. Occasionally I ran into the following trace:
"main" #1 prio=5 os_prio=0 tid=0x00007f35a8009000 nid=0x12448 waiting on condition [0x00007f35ac335000]
java.lang.Thread.State: RUNNABLE
at Foo.main(Foo.java:10) <--- Line of l.add()
While not identical to the OP's trace, it is interesting to have a thread waiting on condition when no synchronization is involved. I experienced it more frequently with a smaller heap, indicating that it might be GC related.
Another possibility could be that code that contains synchronization has been JIT compiled and that prevents you from seeing the actual address of the monitor. However, I think that is less likely since you experience it on ArrayList.add. If that is the case, I know of no way to find out the actual holder of the monitor.
If you don't have GC issues, then maybe some thread has actually acquired a lock on an object, and the stuck thread is waiting to acquire a lock on the same object. The way to figure this out is to look for
- waiting to lock <some_hex_address> (a <java_class>)
example would be
- waiting to lock <0x00000000f139bb98> (a java.util.concurrent.ConcurrentHashMap)
in the thread dump, in the entry that says waiting for monitor entry. Once you have found it, you can search for the thread that has already acquired a lock on the object with address <some_hex_address>; for this example it would look something like this -
- locked <0x00000000f139bb98> (a java.util.concurrent.ConcurrentHashMap)
Now you can look at the stack trace of that thread to figure out which line of code acquired the lock.
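If you want to see such a pair in a controlled setting, the following sketch (all names are made up) produces one; run it and take a dump with jstack <pid>.

public class LockPairDemo {
    public static void main(String[] args) throws InterruptedException {
        final Object shared = new Object();

        // "holder" grabs the monitor and never lets go: its dump entry
        // shows "- locked <some_hex_address> (a java.lang.Object)".
        new Thread(() -> {
            synchronized (shared) {
                try {
                    Thread.sleep(Long.MAX_VALUE);
                } catch (InterruptedException ignored) {
                }
            }
        }, "holder").start();

        Thread.sleep(100); // make it very likely that "holder" wins the race

        // "blocked" shows BLOCKED (on object monitor) together with
        // "- waiting to lock <some_hex_address>" for the same address.
        new Thread(() -> {
            synchronized (shared) {
            }
        }, "blocked").start();
    }
}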
I am aware that JavaScript by design does not support multithreading, but we use JavaScript code like a service: we compile the JavaScript code using Nashorn and invoke one of the methods of the compiled script instance concurrently, with different inputs, to get the desired output. Our JavaScript code is thread-safe; it never accesses or manipulates any global data, and there is no closure manipulation.
Occasionally, one of the threads gets stuck in WeakHashMap and blocks all the other concurrent threads forever. As of now, we don't have any workaround or solution, since the WeakHashMap::getEntry() method gets stuck in a tight loop and there is no way to interrupt and safely kill the thread. This forces us to bounce the box now and then, and it also hurts Nashorn's adoption in Tier-1, high-revenue systems.
Thread Dump:
"Thread-3" #7647 prio=5 os_prio=0 tid=0x00007f023c2d0800 nid=0x9384 runnable [0x00007f03feee9000]
java.lang.Thread.State: RUNNABLE
at java.util.WeakHashMap.getEntry(WeakHashMap.java:431)
at java.util.WeakHashMap.containsKey(WeakHashMap.java:417)
at jdk.nashorn.internal.runtime.PropertyListeners$WeakPropertyMapSet.contains(PropertyListeners.java:217)
at jdk.nashorn.internal.runtime.PropertyListeners.containsListener(PropertyListeners.java:115)
- locked <0x000000063c9ecd68> (a jdk.nashorn.internal.runtime.PropertyListeners)
at jdk.nashorn.internal.runtime.PropertyListeners.addListener(PropertyListeners.java:95)
at jdk.nashorn.internal.runtime.PropertyMap.addListener(PropertyMap.java:247)
at jdk.nashorn.internal.runtime.ScriptObject.getProtoSwitchPoint(ScriptObject.java:2112)
at jdk.nashorn.internal.runtime.ScriptObject.createEmptyGetter(ScriptObject.java:2409)
at jdk.nashorn.internal.runtime.ScriptObject.noSuchProperty(ScriptObject.java:2353)
at jdk.nashorn.internal.runtime.ScriptObject.findGetMethod(ScriptObject.java:1960)
at jdk.nashorn.internal.runtime.ScriptObject.lookup(ScriptObject.java:1828)
at jdk.nashorn.internal.runtime.linker.NashornLinker.getGuardedInvocation(NashornLinker.java:104)
at jdk.nashorn.internal.runtime.linker.NashornLinker.getGuardedInvocation(NashornLinker.java:98)
at jdk.internal.dynalink.support.CompositeTypeBasedGuardingDynamicLinker.getGuardedInvocation(CompositeTypeBasedGuardingDynamicLinker.java:176)
at jdk.internal.dynalink.support.CompositeGuardingDynamicLinker.getGuardedInvocation(CompositeGuardingDynamicLinker.java:124)
at jdk.internal.dynalink.support.LinkerServicesImpl.getGuardedInvocation(LinkerServicesImpl.java:154)
at jdk.internal.dynalink.DynamicLinker.relink(DynamicLinker.java:253)
at java.lang.invoke.LambdaForm$DMH/1376533963.invokeSpecial_LLIL_L(LambdaForm$DMH)
at java.lang.invoke.LambdaForm$BMH/1644775282.reinvoke(LambdaForm$BMH)
at java.lang.invoke.LambdaForm$MH/1967400458.exactInvoker(LambdaForm$MH)
at java.lang.invoke.LambdaForm$reinvoker/1083020379.dontInline(LambdaForm$reinvoker)
//Trimmed Purposely
"Thread-2" #7646 prio=5 os_prio=0 tid=0x00007f023c2d6800 nid=0x9383 waiting for monitor entry [0x00007f03fefea000]
java.lang.Thread.State: BLOCKED (on object monitor)
at jdk.nashorn.internal.runtime.PropertyListeners.containsListener(PropertyListeners.java:111)
- waiting to lock <0x000000063c9ecd68> (a jdk.nashorn.internal.runtime.PropertyListeners)
at jdk.nashorn.internal.runtime.PropertyListeners.addListener(PropertyListeners.java:95)
at jdk.nashorn.internal.runtime.PropertyMap.addListener(PropertyMap.java:247)
at jdk.nashorn.internal.runtime.ScriptObject.getProtoSwitchPoint(ScriptObject.java:2112)
at jdk.nashorn.internal.runtime.ScriptObject.createEmptyGetter(ScriptObject.java:2409)
at jdk.nashorn.internal.runtime.ScriptObject.noSuchProperty(ScriptObject.java:2353)
at jdk.nashorn.internal.runtime.ScriptObject.findGetMethod(ScriptObject.java:1960)
at jdk.nashorn.internal.runtime.ScriptObject.lookup(ScriptObject.java:1828)
at jdk.nashorn.internal.runtime.linker.NashornLinker.getGuardedInvocation(NashornLinker.java:104)
at jdk.nashorn.internal.runtime.linker.NashornLinker.getGuardedInvocation(NashornLinker.java:98)
at jdk.internal.dynalink.support.CompositeTypeBasedGuardingDynamicLinker.getGuardedInvocation(CompositeTypeBasedGuardingDynamicLinker.java:176)
at jdk.internal.dynalink.support.CompositeGuardingDynamicLinker.getGuardedInvocation(CompositeGuardingDynamicLinker.java:124)
at jdk.internal.dynalink.support.LinkerServicesImpl.getGuardedInvocation(LinkerServicesImpl.java:154)
at jdk.internal.dynalink.DynamicLinker.relink(DynamicLinker.java:253)
at java.lang.invoke.LambdaForm$DMH/1376533963.invokeSpecial_LLIL_L(LambdaForm$DMH)
at java.lang.invoke.LambdaForm$BMH/1644775282.reinvoke(LambdaForm$BMH)
at java.lang.invoke.LambdaForm$MH/1967400458.exactInvoker(LambdaForm$MH)
at java.lang.invoke.LambdaForm$reinvoker/1083020379.dontInline(LambdaForm$reinvoker)
at java.lang.invoke.LambdaForm$MH/363682507.guard(LambdaForm$MH)
at java.lang.invoke.LambdaForm$reinvoker/1083020379.dontInline(LambdaForm$reinvoker)
//Trimmed Purposely
An almost identical issue is reported in the following bug, but I am not able to +1 or add more details to it. As stated in the bug, it is really hard to reproduce from a developer system.
https://bugs.openjdk.java.net/browse/JDK-8146274
Questions:
Is there any better workaround to address this problem?
What if the JDK team replaced the WeakHashMap with a ConcurrentHashMap? WeakHashMap certainly does not support thread safety.
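As for workarounds in your own code (this does not fix Nashorn's internal map, which only a JDK change can address): the standard way to use a WeakHashMap from multiple threads is to wrap it in a synchronized view, sketched below.

import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

public class SafeWeakMap {
    // Every access goes through the synchronized wrapper, so no thread can
    // observe the entry chains in a corrupted mid-rehash state, which is the
    // kind of corruption that makes getEntry() spin in a tight loop.
    private static final Map<Object, String> CACHE =
            Collections.synchronizedMap(new WeakHashMap<>());

    public static void main(String[] args) {
        Object key = new Object();
        CACHE.put(key, "value");
        System.out.println(CACHE.get(key)); // prints "value"
    }
}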
I have a method in Java that calls several other methods. This method is called from several threads in a fixed thread pool. The number of workers is the same as the number of available processors (cores).
public void query() {
    method1();
    method2();
}
When I profile the program execution using VisualVM, the times of method1() and method2() are very short, but the self-time of query() is very long, even though the method contains no code other than the two calls. There might be synchronization inside method1() and method2(), but nothing obvious in the code that I have control of.
When I reduce the number of workers in the pool to 1, this self-time almost disappears, and the single-threaded and multi-threaded execution times of the whole program are nearly the same. I think this means that my query() method is waiting for something.
There are no deadlocks, and the execution finishes fine. The two methods method1() and method2() call a lot of other things, including library classes in obfuscated jars, so it is not easy for me to debug. However, the query() method is called directly from the worker threads, using a java.util.concurrent.ExecutorService.
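For reference, a minimal sketch of the setup (the pool wiring is mine; the method bodies stand in for the real, obfuscated calls):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class QueryRunner {
    // One worker per core, as described above.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(
            Runtime.getRuntime().availableProcessors());

    public void query() {
        method1(); // short in the profiler
        method2(); // short in the profiler
        // ...yet the profiler attributes a long self-time to query() itself.
    }

    private void method1() { /* calls into obfuscated jars */ }
    private void method2() { /* calls into obfuscated jars */ }

    public static void main(String[] args) {
        QueryRunner r = new QueryRunner();
        for (int i = 0; i < 1_000; i++) {
            POOL.submit(r::query);
        }
        POOL.shutdown();
    }
}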
Send the running process a kill at signal level 3 (SIGQUIT). All threads will dump a stack trace to standard out and the app will continue running.
kill -3 <pid>
Note, you won't see anything on the console where you issued the kill command; the Java app itself produces the output. You might need to check the logs, depending on where the app redirects its output.
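If sending a signal is inconvenient (for instance on Windows, or when stdout is being swallowed), the same information can be captured from inside the JVM with the standard management API; a small sketch:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumper {
    public static void dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // true, true = also report locked monitors and locked ownable
        // synchronizers, like the "- locked <...>" lines in jstack output.
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            System.out.print(info); // note: toString() truncates deep stacks
        }
    }

    public static void main(String[] args) {
        dump();
    }
}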
I have found the problem in a proxy class that was wrapping another class in a custom locking mechanism.
I went on to create a series of thread dumps. Since I was using JVisualVM for profiling, I created a handful of thread dumps during the process. Ctrl+Break worked too, the same as the kill -3 <pid> mentioned by Synesso in his answer.
I used the Thread Dump Analyzer mentioned in the comments to analyze them. I did not know what to look for at first, but thanks to the linking of objects and monitors in the TDA, I found something like this:
"pool-9-thread-32" #304 prio=5 os_prio=0 tid=0x000000002a706800 nid=0x348c waiting for monitor entry [0x000000003f06e000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.example.MyClass.method1(MyClass.java:400)
- waiting to lock <0x0000000680837b90> (a com.example.DifferentClass)
at com.example.MyClass.query(MyClass.java:500)
... omitted ...
at java.util.concurrent.FutureTask.run(FutureTask.java:270)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:618)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- <0x000000075bc59aa8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
DifferentClass extends the abstract MyClass, and there is a call from method1() into DifferentClass, where a DTO object is passed to a method that does a lot of processing, logging, and finally saving to a database. The proxy class was used during the creation of one of the database-handling classes.
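To illustrate the kind of construct involved, here is a hypothetical reconstruction (not the actual proxy; all names are invented): a wrapper that funnels every call through a single lock serializes all worker threads that touch it.

public class LockingProxy {
    private final DatabaseHandler delegate;   // invented stand-in class
    private final Object lock = new Object(); // one lock shared by ALL callers

    public LockingProxy(DatabaseHandler delegate) {
        this.delegate = delegate;
    }

    public void save(Object dto) {
        // Every worker thread queues up here, which shows up in the dump
        // as "waiting to lock <...>" and as long self-time in query().
        synchronized (lock) {
            delegate.save(dto); // processing, logging, database write
        }
    }
}

// Invented stand-in for the real, obfuscated class.
class DatabaseHandler {
    void save(Object dto) { /* slow work */ }
}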
Your best option is to find a way to get a stack trace of the running program. Here is one possible way.
I suggest running the program in debug mode in your IDE and putting breakpoints next to what appears to be the problem. Then step into the calls (e.g. F7 in NetBeans) at the point where the program delays. You can step all the way into obfuscated code, though you may not be able to fix the issue there. However, you will at least know where the delay is.