My application uses Gson 2.2 for converting POJOs to JSON. During a load test I stumbled upon a lot of threads blocked in the Gson constructor:
"http-apr-28201-exec-28" #370 daemon prio=5 os_prio=0 tid=0x0000000001ee7800 nid=0x62cb waiting for monitor entry [0x00007fe64df9a000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.google.gson.Gson.<init>(Gson.java:200)
at com.google.gson.Gson.<init>(Gson.java:179)
The thread dump does NOT show any thread holding the [0x00007fe64df9a000] monitor.
How can I find out who holds it?
Gson code at line 200 looks pretty innocent:
// built-in type adapters that cannot be overridden
factories.add(TypeAdapters.STRING_FACTORY);
factories.add(TypeAdapters.INTEGER_FACTORY);
I'm using JRE 1.8.0_91 on Linux
tl;dr I think you are running into GC-related behavior, where threads are being put into a waiting state to allow for garbage collection.
I do not have the whole truth but I hope to provide some pieces of insight.
First thing to realize is that the number in brackets, [0x00007fe64df9a000], is not the address of a monitor. The number in brackets can be seen for all threads in a dump, even threads that are in running state. The number also does not change. Example from my test dump:
main" #1 prio=5 os_prio=0 tid=0x00007fe27c009000 nid=0x27e5c runnable [0x00007fe283bc2000]
java.lang.Thread.State: RUNNABLE
at Foo.main(Foo.java:12)
I am not sure what the number means, but this page hints that it is:
... the pointer to the Java VM internal thread structure. It is generally of no interest unless you are debugging a live Java VM or core file.
Although the format of the trace explained there is a bit different so I am not sure I am correct.
The way a dump looks when the address of the actual monitor is shown:
"qtp48612937-70" #70 prio=5 os_prio=0 tid=0x00007fbb845b4800 nid=0x133c waiting for monitor entry [0x00007fbad69e8000]
java.lang.Thread.State: BLOCKED (on object monitor)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:233)
- waiting to lock <0x00000005b8d68e90> (a java.lang.Object)
Notice the waiting to lock line in the trace and that the address of the monitor is different from the number in brackets.
The fact that we cannot see the address of the monitor involved indicates that the monitor exists only in native code.
Secondly, the Gson code involved does not contain any synchronization at all. The code just adds an element to an ArrayList (assuming no bytecode manipulation has been done and nothing fishy is being done at a low level). I.e., it would not make sense to see the thread waiting for a standard synchronization monitor at this call.
I found some indications that threads can be shown as waiting for a monitor entry when there is a lot of GC going on.
I wrote a simple test program to try to reproduce it by just adding a lot of elements to an array list:
List<String> l = new ArrayList<>();
while (true) {
    for (int i = 0; i < 100_100; i++) {
        l.add("" + i);
    }
    l = new ArrayList<>();
}
Then I took thread dumps of this program. Occasionally I ran into the following trace:
"main" #1 prio=5 os_prio=0 tid=0x00007f35a8009000 nid=0x12448 waiting on condition [0x00007f35ac335000]
java.lang.Thread.State: RUNNABLE
at Foo.main(Foo.java:10) <--- Line of l.add()
While not identical to the OP's trace, it is interesting to have a thread waiting on condition when no synchronization is involved. I experienced it more frequently with a smaller heap, indicating that it might be GC related.
Another possibility could be that code that contains synchronization has been JIT compiled and that prevents you from seeing the actual address of the monitor. However, I think that is less likely since you experience it on ArrayList.add. If that is the case, I know of no way to find out the actual holder of the monitor.
If you don't have GC issues, then maybe there actually is some thread that has acquired a lock on an object, and the stuck thread is waiting to acquire a lock on the same object. The way to figure this out is to look for
- waiting to lock <some_hex_address> (a <java_class>)
An example would be
- waiting to lock <0x00000000f139bb98> (a java.util.concurrent.ConcurrentHashMap)
in the thread dump, in the entry that says waiting for monitor entry. Once you have found it, you can search for the thread that has already acquired a lock on the object with address <some_hex_address>; for the example it would look something like this:
- locked <0x00000000f139bb98> (a java.util.concurrent.ConcurrentHashMap)
Now you can look at the stack trace of that thread to figure out which line of code acquired the lock.
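For illustration, here is a minimal program (all names are made up) that produces exactly this pattern: one thread holds the monitor of a shared object while another is blocked trying to enter a synchronized block on it, so jstack shows a matching locked / waiting to lock pair.

public class LockHolderDemo {
    private static final Object SHARED = new Object();

    public static void main(String[] args) throws InterruptedException {
        // Thread A grabs the monitor and holds it indefinitely.
        Thread holder = new Thread(() -> {
            synchronized (SHARED) {
                try {
                    Thread.sleep(Long.MAX_VALUE); // simulate long work while holding the lock
                } catch (InterruptedException ignored) {
                }
            }
        }, "holder");

        // Thread B blocks on the same monitor; a dump shows it as
        // BLOCKED (on object monitor) with "- waiting to lock <...> (a java.lang.Object)".
        Thread waiter = new Thread(() -> {
            synchronized (SHARED) {
                System.out.println("acquired");
            }
        }, "waiter");

        holder.start();
        Thread.sleep(500); // make sure "holder" wins the race for the monitor
        waiter.start();
        holder.join();
    }
}

Taking a jstack dump of this program shows waiter with a waiting to lock line whose address matches the locked line in holder's trace, which is the correspondence described above.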
Related
I have an application with 1 writer thread and 8 reader threads accessing a shared resource, which is behind a ReentrantReadWriteLock. It froze for about an hour, producing no log output and not responding to requests. This is on Java 8.
Before killing it someone took thread dumps, which look like this:
Writer thread:
"writer-0" #83 prio=5 os_prio=0 tid=0x00007f899c166800 nid=0x2b1f waiting on condition [0x00007f898d3ba000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000002b8dd4ea8> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
Reader:
"reader-1" #249 daemon prio=5 os_prio=0 tid=0x00007f895000c000 nid=0x33d6 waiting on condition [0x00007f898edcf000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000002b8dd4ea8> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
This looks like a deadlock; however, there are a couple of things that make me doubt that:
I can't find another thread that could possibly be holding the same lock
Taking a thread dump 4 seconds later yields the same result, but all threads now report parking to wait for <0x00000002a7daa878>, which is different from 0x00000002b8dd4ea8 in the first dump.
Is this a deadlock? I see that there is some change in the threads' state, but it could only be internal to the lock implementation. What else could be causing this behaviour?
It turned out it was a deadlock. The thread holding the lock was not reported as holding any locks in the thread dump, which made it difficult to diagnose.
The only way to understand that was to inspect a heap dump of the application. For those interested in how, here's the process step-by-step:
A heap dump was taken at roughly the same time as the thread dumps.
I opened it using Java VisualVM, which comes with JDK.
In the "Classes" view I filtered by the class name of the class that contains the lock as a field.
I double-clicked on the class to be taken to the "Instances" view
Thankfully, there were only a few instances of that class, so I was able to find the one that was causing problems.
I inspected the ReentrantReadWriteLock object kept in a field of the class. In particular, the sync field of that lock holds its state; in this case it was a ReentrantReadWriteLock$FairSync.
Its state property was 65536. This encodes both the number of shared and exclusive holds of the lock. The shared hold count is stored in the high 16 bits of the state, and is retrieved as state >>> 16. The exclusive hold count is in the low 16 bits, and is retrieved as state & ((1 << 16) - 1). From this we can see that there is 1 shared hold and 0 exclusive holds on the lock.
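If it helps, the same arithmetic as a tiny standalone sketch (the constant names simply mirror the shift and mask described above):

public class RwLockStateDecoder {
    static final int SHARED_SHIFT = 16;
    static final int EXCLUSIVE_MASK = (1 << SHARED_SHIFT) - 1;

    public static void main(String[] args) {
        int state = 65536; // value read from the sync object in the heap dump

        int sharedHolds = state >>> SHARED_SHIFT;    // read-lock holds, high 16 bits
        int exclusiveHolds = state & EXCLUSIVE_MASK; // write-lock holds, low 16 bits

        System.out.println("shared holds = " + sharedHolds);       // prints 1
        System.out.println("exclusive holds = " + exclusiveHolds); // prints 0
    }
}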
You can see the threads waiting for the lock in the head field. It is a queue; each node's thread field contains the waiting thread, and next points to the next node in the queue. Going through it I found writer-0 and 7 of the 8 reader-n threads, confirming what we know from the thread dump.
The firstReader field of the sync object contains the thread that has acquired the read lock. From the comment in the code: firstReader is the first thread to have acquired the read lock; firstReaderHoldCount is firstReader's hold count. More precisely, firstReader is the unique thread that last changed the shared count from 0 to 1, and has not released the read lock since then; null if there is no such thread.
In this case the thread holding the lock was one of the reader threads. It was blocked on something entirely different, which would have required one of the other reader threads to progress. Ultimately it was caused by a bug in which a reader thread would not properly release the lock, keeping it forever. That I found by analyzing the code and by adding tracking and logging of when the lock was acquired and released.
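As an aside, the usual way to make that kind of leak impossible is to pair every lock() with an unlock() in a finally block. A minimal sketch of the pattern (class and method names are made up, not the application's actual code):

import java.util.concurrent.locks.ReentrantReadWriteLock;

public class GuardedRead {
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock(true); // fair, as in the dump

    public void readShared() {
        rwLock.readLock().lock();
        try {
            // ... read the shared resource ...
        } finally {
            // Always release in finally; a missed unlock on any code path
            // leaks a read hold and eventually starves the writer.
            rwLock.readLock().unlock();
        }
    }
}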
Is this a deadlock?
I don't think this is evidence of a deadlock. At least, not in the classic sense of the term.
The stack dump shows two threads waiting on the same ReentrantReadWriteLock. One thread is waiting to acquire the read lock. The other is waiting to acquire the write lock.
Now if no thread currently held any locks, then one of these threads would be able to proceed.
If some other thread currently held the write lock, that would be sufficient to block both of these threads. But that isn't a deadlock. It would only be a deadlock if that third thread was itself waiting on a different lock ... and there was a circularity in the blocking.
So what about the possibility of these two threads blocking each other? I don't think that is possible. The reentrancy rules in the javadocs allow a thread that has the write lock to acquire the read lock without blocking. Likewise it can acquire the write lock if it already holds it.
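For reference, this is roughly what those rules permit, following the downgrading example in the ReentrantReadWriteLock javadoc (a sketch, not code from the application):

import java.util.concurrent.locks.ReentrantReadWriteLock;

public class DowngradeSketch {
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();

    public void updateThenRead() {
        rwLock.writeLock().lock();
        try {
            // ... mutate the shared state ...
            rwLock.readLock().lock(); // allowed: the write-lock holder may take the read lock without blocking
        } finally {
            rwLock.writeLock().unlock(); // downgrade: keep the read lock, drop the write lock
        }
        try {
            // ... read the state while other readers may proceed ...
        } finally {
            rwLock.readLock().unlock();
        }
    }
}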
The other piece of evidence is that things have changed in the thread dump you took a bit later. If there was a genuine deadlock, there would be no change.
If it is not a deadlock between (just) these two threads, what else could it be?
One possibility is that a third thread is holding the write lock (for a long time) and that is gumming things up. Just too much contention on this read-write lock.
If the (assumed) third thread is using tryLock, it is possible that you have a livelock ... which could explain the "change" evidence. But on the flip side, that thread should have been parked too ... which you say you don't see.
Another possibility is that you have too many active threads ... and the OS is struggling to schedule them to cores.
But this is all speculation.
Recently I have often encountered situations where the CPU reaches 100%, so I checked the CPU usage:
It seems that the memory is full, so I dumped the heap and tried to analyze it using MAT.
I first noticed that the memory retained by Finalizer is very abnormal, using more than 2 GB of memory.
For comparison, it is about 500 MB under normal circumstances.
I initially thought it was because the Finalizer thread was blocked, but Java thread dump: BLOCKED thread without "waiting to lock ..." seems to suggest that this is not the source of the problem:
"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007f83c008c000 nid=0x1c190 waiting for monitor entry [0x00007f83a1bba000]
java.lang.Thread.State: BLOCKED (on object monitor)
at java.util.zip.Deflater.deflateBytes(Native Method)
at java.util.zip.Deflater.deflate(Deflater.java:444)
- locked <0x00000006d1b327a0> (a java.util.zip.ZStreamRef)
at java.util.zip.Deflater.deflate(Deflater.java:366)
at java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:251)
at java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:211)
at java.util.zip.GZIPOutputStream.write(GZIPOutputStream.java:145)
- locked <0x00000006d1b32748> (a java.util.zip.GZIPOutputStream)
at org.apache.coyote.http11.filters.GzipOutputFilter.doWrite(GzipOutputFilter.java:72)
at org.apache.coyote.http11.Http11OutputBuffer.doWrite(Http11OutputBuffer.java:199)
at org.apache.coyote.Response.doWrite(Response.java:538)
at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:328)
at org.apache.catalina.connector.OutputBuffer.flushByteBuffer(OutputBuffer.java:748)
at org.apache.catalina.connector.OutputBuffer.realWriteChars(OutputBuffer.java:433)
at org.apache.catalina.connector.OutputBuffer.flushCharBuffer(OutputBuffer.java:753)
at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:284)
at org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:261)
at org.apache.catalina.connector.CoyoteOutputStream.flush(CoyoteOutputStream.java:118)
at javax.imageio.stream.FileCacheImageOutputStream.close(FileCacheImageOutputStream.java:238)
at javax.imageio.stream.ImageInputStreamImpl.finalize(ImageInputStreamImpl.java:874)
at java.lang.System$2.invokeFinalize(System.java:1270)
at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:98)
at java.lang.ref.Finalizer.access$100(Finalizer.java:34)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:210)
Following the references all the way down, I think I found the object that takes up the memory:
So it seems that URLJarFile creates a lot of unused buffers, but I don't know how to continue tracking down the source of the problem from here. Maybe I need to monitor the call stacks that allocate these buffers?
Added: I think I found some useful information; most of these are loaded from jstl-1.2.jar.
I am aware that JavaScript by design does not support multithreading, but we use JavaScript code like a service: we compile the JavaScript code using Nashorn and invoke one of the methods of the compiled script instance concurrently, with different inputs, to get the desired output. Our JavaScript code is thread-safe: it never accesses or manipulates any global data, and there is no closure manipulation.
Occasionally, one of the threads gets stuck in WeakHashMap and blocks all other concurrent threads forever. As of now, we don't have any workaround or solution, since the WeakHashMap::getEntry() method gets stuck in a tight loop and there is no way to interrupt and safely kill the thread. This forces us to bounce the box now and then, which also hurts Nashorn's adoption in Tier-1 high-revenue systems.
Thread Dump:
"Thread-3" #7647 prio=5 os_prio=0 tid=0x00007f023c2d0800 nid=0x9384 runnable [0x00007f03feee9000]
java.lang.Thread.State: RUNNABLE
at java.util.WeakHashMap.getEntry(WeakHashMap.java:431)
at java.util.WeakHashMap.containsKey(WeakHashMap.java:417)
at jdk.nashorn.internal.runtime.PropertyListeners$WeakPropertyMapSet.contains(PropertyListeners.java:217)
at jdk.nashorn.internal.runtime.PropertyListeners.containsListener(PropertyListeners.java:115)
- locked <0x000000063c9ecd68> (a jdk.nashorn.internal.runtime.PropertyListeners)
at jdk.nashorn.internal.runtime.PropertyListeners.addListener(PropertyListeners.java:95)
at jdk.nashorn.internal.runtime.PropertyMap.addListener(PropertyMap.java:247)
at jdk.nashorn.internal.runtime.ScriptObject.getProtoSwitchPoint(ScriptObject.java:2112)
at jdk.nashorn.internal.runtime.ScriptObject.createEmptyGetter(ScriptObject.java:2409)
at jdk.nashorn.internal.runtime.ScriptObject.noSuchProperty(ScriptObject.java:2353)
at jdk.nashorn.internal.runtime.ScriptObject.findGetMethod(ScriptObject.java:1960)
at jdk.nashorn.internal.runtime.ScriptObject.lookup(ScriptObject.java:1828)
at jdk.nashorn.internal.runtime.linker.NashornLinker.getGuardedInvocation(NashornLinker.java:104)
at jdk.nashorn.internal.runtime.linker.NashornLinker.getGuardedInvocation(NashornLinker.java:98)
at jdk.internal.dynalink.support.CompositeTypeBasedGuardingDynamicLinker.getGuardedInvocation(CompositeTypeBasedGuardingDynamicLinker.java:176)
at jdk.internal.dynalink.support.CompositeGuardingDynamicLinker.getGuardedInvocation(CompositeGuardingDynamicLinker.java:124)
at jdk.internal.dynalink.support.LinkerServicesImpl.getGuardedInvocation(LinkerServicesImpl.java:154)
at jdk.internal.dynalink.DynamicLinker.relink(DynamicLinker.java:253)
at java.lang.invoke.LambdaForm$DMH/1376533963.invokeSpecial_LLIL_L(LambdaForm$DMH)
at java.lang.invoke.LambdaForm$BMH/1644775282.reinvoke(LambdaForm$BMH)
at java.lang.invoke.LambdaForm$MH/1967400458.exactInvoker(LambdaForm$MH)
at java.lang.invoke.LambdaForm$reinvoker/1083020379.dontInline(LambdaForm$reinvoker)
//Trimmed Purposely
"Thread-2" #7646 prio=5 os_prio=0 tid=0x00007f023c2d6800 nid=0x9383 waiting for monitor entry [0x00007f03fefea000]
java.lang.Thread.State: BLOCKED (on object monitor)
at jdk.nashorn.internal.runtime.PropertyListeners.containsListener(PropertyListeners.java:111)
- waiting to lock <0x000000063c9ecd68> (a jdk.nashorn.internal.runtime.PropertyListeners)
at jdk.nashorn.internal.runtime.PropertyListeners.addListener(PropertyListeners.java:95)
at jdk.nashorn.internal.runtime.PropertyMap.addListener(PropertyMap.java:247)
at jdk.nashorn.internal.runtime.ScriptObject.getProtoSwitchPoint(ScriptObject.java:2112)
at jdk.nashorn.internal.runtime.ScriptObject.createEmptyGetter(ScriptObject.java:2409)
at jdk.nashorn.internal.runtime.ScriptObject.noSuchProperty(ScriptObject.java:2353)
at jdk.nashorn.internal.runtime.ScriptObject.findGetMethod(ScriptObject.java:1960)
at jdk.nashorn.internal.runtime.ScriptObject.lookup(ScriptObject.java:1828)
at jdk.nashorn.internal.runtime.linker.NashornLinker.getGuardedInvocation(NashornLinker.java:104)
at jdk.nashorn.internal.runtime.linker.NashornLinker.getGuardedInvocation(NashornLinker.java:98)
at jdk.internal.dynalink.support.CompositeTypeBasedGuardingDynamicLinker.getGuardedInvocation(CompositeTypeBasedGuardingDynamicLinker.java:176)
at jdk.internal.dynalink.support.CompositeGuardingDynamicLinker.getGuardedInvocation(CompositeGuardingDynamicLinker.java:124)
at jdk.internal.dynalink.support.LinkerServicesImpl.getGuardedInvocation(LinkerServicesImpl.java:154)
at jdk.internal.dynalink.DynamicLinker.relink(DynamicLinker.java:253)
at java.lang.invoke.LambdaForm$DMH/1376533963.invokeSpecial_LLIL_L(LambdaForm$DMH)
at java.lang.invoke.LambdaForm$BMH/1644775282.reinvoke(LambdaForm$BMH)
at java.lang.invoke.LambdaForm$MH/1967400458.exactInvoker(LambdaForm$MH)
at java.lang.invoke.LambdaForm$reinvoker/1083020379.dontInline(LambdaForm$reinvoker)
at java.lang.invoke.LambdaForm$MH/363682507.guard(LambdaForm$MH)
at java.lang.invoke.LambdaForm$reinvoker/1083020379.dontInline(LambdaForm$reinvoker)
//Trimmed Purposely
An almost identical issue is reported in the following bug, but I am not able to +1 or add more details to it. As stated in the bug, it is really hard to reproduce from a developer system.
https://bugs.openjdk.java.net/browse/JDK-8146274
Questions:
Is there any better workaround to address this problem?
What if the JDK team replaced the WeakHashMap with a ConcurrentHashMap? WeakHashMap certainly does not support thread-safe access.
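Not something we can apply inside Nashorn itself, but to illustrate that last point: the standard way to use a WeakHashMap from multiple threads is external synchronization, for example wrapping it with Collections.synchronizedMap; without it, a concurrent resize can corrupt the bucket chains and produce exactly this kind of endless loop in getEntry(). A minimal sketch (names are made up):

import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

public class SafeWeakCache {
    // The wrapper serializes every access behind a single monitor; a plain
    // WeakHashMap gives no such guarantee and may loop forever if mutated concurrently.
    private final Map<Object, Boolean> listeners =
            Collections.synchronizedMap(new WeakHashMap<>());

    public void add(Object listener) {
        listeners.put(listener, Boolean.TRUE);
    }

    public boolean contains(Object listener) {
        return listeners.containsKey(listener);
    }
}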
I have the following thread dump I get using jstack and would like to know what the hex value next to the word runnable shows. I have seen the same value used in other places showing as:
waiting on condition [0x00000000796e9000]
Does this mean the other threads are waiting on this thread?
runnable [0x00000000796e9000]
thread dump
"ajp-bio-8009-exec-2925" daemon prio=10 tid=0x0000000015ca7000 nid=0x53c7 runnable [0x00000000796e9000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
I have the following thread dump I get using jstack and would like to
know what the hex value next to the word runnable shows. I have seen
the same value used in other places showing as:
waiting on condition [0x00000000796e9000]
Does this mean the other threads are waiting on this thread?
Yes. This indicates one thread holds a lock and another thread is waiting to obtain that lock. This is fairly similar to the synchronized keyword conceptually, but can be quite a bit more powerful (and complicated).
Take a look at the javadoc for Condition to get a better understanding of conditions.
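A minimal, illustrative use of a Condition (one thread awaits a flag, another signals it); a thread parked in await() shows up in a dump as waiting on condition:

import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class ConditionDemo {
    private final Lock lock = new ReentrantLock();
    private final Condition ready = lock.newCondition();
    private boolean flag = false;

    public void awaitReady() throws InterruptedException {
        lock.lock();
        try {
            while (!flag) {
                ready.await(); // parks here until signalled; the loop guards against spurious wakeups
            }
        } finally {
            lock.unlock();
        }
    }

    public void signalReady() {
        lock.lock();
        try {
            flag = true;
            ready.signalAll();
        } finally {
            lock.unlock();
        }
    }
}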
This question/answer gives a description of the attributes in a thread dump (for java 6).
I have a Java EE based application running on Tomcat, and I am seeing that all of a sudden the application hangs after running for a couple of hours.
I collected a thread dump from the application just before it hung and put it in TDA for analysis:
TDA (Thread Dump Analyzer) gives the following message for the above monitor:
A lot of threads are waiting for this monitor to become available again.
This might indicate a congestion. You also should analyze other locks
blocked by threads waiting for this monitor as there might be much more
threads waiting for it.
And here is the stacktrace of the thread highlighted above:
"MY_THREAD" prio=10 tid=0x00007f97f1918800 nid=0x776a
waiting for monitor entry [0x00007f9819560000]
java.lang.Thread.State: BLOCKED (on object monitor)
at java.util.Hashtable.get(Hashtable.java:356)
- locked <0x0000000680038b68> (a java.util.Properties)
at java.util.Properties.getProperty(Properties.java:951)
at java.lang.System.getProperty(System.java:709)
at com.MyClass.myMethod(MyClass.java:344)
I want to know what the "waiting for monitor entry" state means. I would also appreciate any pointers to help me debug this issue.
One of your threads acquired a monitor (an exclusive lock on an object). That means the thread is executing synchronized code and for whatever reason is stuck there, possibly waiting for other threads. But the other threads cannot continue their execution, because they encountered a synchronized block and asked for the lock (monitor); they cannot get it until it is released by the other thread. So... probably a deadlock.
Please look for this string in the whole thread dump:
- locked <0x00007f9819560000>
If you can find it, that thread is in a deadlock with the thread tid=0x00007f97f1918800.
Monitor = synchronized. You have lots of threads trying to get the lock on the same object.
Maybe you should switch from using a Hashtable to using a HashMap.
This means that your thread is trying to take a lock (on the Hashtable), but some other thread is already accessing it and holds the lock. So it is waiting for the lock to be released. Check what your other threads are doing, especially the thread with tid=0x00007f9819560000.
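One common way to cut contention on that Properties Hashtable is to read the property once and reuse the value, rather than calling System.getProperty on every request. A sketch, assuming the property does not change at runtime (the class and property names here are made up):

public final class MyConfig {
    // Read once at class initialization; System.getProperty goes through the
    // synchronized Properties Hashtable, so hot request paths should not call it repeatedly.
    private static final String MY_PROPERTY =
            System.getProperty("my.property", "default-value");

    public static String myProperty() {
        return MY_PROPERTY;
    }
}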