tomcat: Waiting on condition thread - java

I'm investigating the strange issue with tomcat shutdown process: after runnig shutdown.sh the java process still appears(I check it by using ps -ef|grep tomcat)
The situation is a bit complicated, because I have very limitted access to the server(no debug, for example)
I took thread dump(by using kill -3 <PID>) and heap dump by using remote jConsole and Hotspot features.
After looking into thread dump I found this:
"pool-2-thread-1" #74 prio=5 os_prio=0 tid=0x00007f3354359800 nid=0x7b46 waiting on condition [0x00007f333e55d000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000c378d330> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1081)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
So, my understanding of the problem is follows: There is a resource(DB connection or something else) which is used in CachedThreadpool, and this resource is now locked,
and prevent to thread pool-2-thread-1 to stop. Assuming that this thread isn't deamon - JVM cannot gracefully stop.
Is there a way to find out which resource is locked, from where is it locked and how to avoid that? Another question is - how to prevent this situation?
Another this is: what the adress 0x00007f333e55d000 is for?
Thanks!

After a wail of struggling I found out, that the freezing thread always have the same name pool-2-thread-1. I grep all the sources of the project to find any places where any scheduled thread pool is started: Executors#newScheduledThreadPool. After a huge amount of time I logged all that places, with thread name in thread pool.
One more restart server and gotcha!
I found out one thread pool, that is started with only one thread and used following code:
private void shutdownThreadpool(int tries) {
try {
boolean terminated;
LOGGER.debug("Trying to shutdown thread pool in {} tries", tries);
pool.shutdown();
do {
terminated = pool.awaitTermination(WAIT_ON_SHUTDOWN,TimeUnit.MILLISECONDS);
if (--tries == 0
&& !terminated) {
LOGGER.debug("After 10 attempts couldn't shutdown thread pool, force shutdown");
pool.shutdownNow();
terminated = pool.awaitTermination(WAIT_ON_SHUTDOWN, TimeUnit.MILLISECONDS);
if (!terminated) {
LOGGER.debug("Cannot stop thread pool even with force");
LOGGER.trace("Some of the workers doesn't react to Interruption event properly");
terminated = true;
}
} else {
LOGGER.info("After {} attempts doesn't stop", tries);
}
} while (!terminated);
LOGGER.debug("Successfully stop thread pool");
} catch (final InterruptedException ie) {
LOGGER.warn("Thread pool shutdown interrupted");
}
}
After that the issue was solved.

Related

Thread dump for threads which are managed by executor framework

I have written following program. Basically I am using executor framework to manage threads. I've also used a BlockingQueue and deliberately keeping it empty so that the thread remains in waiting state.
The below is the program:
package com.example.executors;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
public class ExecutorDemo {
public static void main(String[] args) throws InterruptedException {
ScheduledExecutorService scheduledThreadPool = null;
BlockingQueue<Integer> bq = new LinkedBlockingQueue<>();
scheduledThreadPool = Executors.newSingleThreadScheduledExecutor((Runnable run) -> {
Thread t = Executors.defaultThreadFactory().newThread(run);
t.setDaemon(true);
t.setName("Worker-pool-" + Thread.currentThread().getName());
t.setUncaughtExceptionHandler(
(thread, e) -> System.out.println("thread is --> " + thread + "exception is --> " + e));
return t;
});
ScheduledFuture<?> f = scheduledThreadPool.scheduleAtFixedRate(() -> {
System.out.println("Inside thread.. working");
try {
bq.take();
} catch (InterruptedException e) {
e.printStackTrace();
}
}, 2000, 30000, TimeUnit.MILLISECONDS);
System.out.println("f.isDone() ---> " + f.isDone());
Thread.sleep(100000000000L);
}
}
Once the program runs, main thread remains in TIMED_WAITING state, due to Thread.sleep(). In thread, which is managed by executor, i am making it to read an empty blocking queue, and this thread remain in WAITING state for ever. I wanted to see how does the thread dump looks in this scenario. I have captured it below:
"Worker-pool-main" #10 daemon prio=5 os_prio=31 tid=0x00007f7ef393d800 nid=0x5503 waiting on condition [0x000070000a3d8000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000007955f7110> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at com.example.cs.executors.CSExecutorUnderstanding.lambda$2(CSExecutorUnderstanding.java:34)
at com.example.cs.executors.CSExecutorUnderstanding$$Lambda$2/1705736037.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
As expected thread Worker-pool-main remains in WAITING state. My doubt is on the thread dump.
As it is executor service which manages the life-cycle of thread in executor framework, then how this thread dump starts with Thread.run() method.
Shouldn't it be that first some portion of executor appearing and then Thread.run()
Basically , the doubt is: when life-cycle is managed by executor, then how come Thread.run() is appearing first and up the stack see portions of executors. Isn't executors starting these threads, so how they are appearing up in the stack?
When you start a new Thread, it will execute its run method on a completely new call stack. That is the entrypoint for the code in that Thread. It is completely decoupled from the thread that called start. The "parent" thread continues to run its own code on its own stack independently, and if either of the two threads crashes or completes it does not impact the other.
The only thing that shows up in a thread's stack frames is whatever gets called inside of run. You don't get to see who called run (the JVM did that). Unless of course, you confused start with run and called the run directly from your own code. Then there is no new thread involved at all.
Here, the thread is not created by your own code directly, but by the executor service. But that one does not do anything different, it also has to create threads by calling constructors and start them using start. The end result is the same.
What run usually does is delegate to a Runnable that has been set in its constructor. You see that here: The executor service has installed a ThreadPoolExecutor$Worker instance. This one contains all the code to be run on the new thread and control its interactions with the executor.
That ThreadPoolExecutor$Worker in turn will then call into its payload code, your application code, the tasks that have been submitted to the executor. In your case, that is com.example.cs.executors.CSExecutorUnderstanding$$Lambda$2/1705736037.

Is the "(daemon)" in Java Thread-Dumps reliable?

We use multiple Thread-Pools in our Java Server,
and one of the Thread-Pool should only contain "deamon" Threads.
I use a Thread Factory, which calls setDaemon(true) for that pool,
and I use my own Thread sub-class, which looks like this:
private static final class DaemonThread extends Thread {
public DaemonThread(ThreadGroup group, Runnable target,
String name, long stackSize) {
super(group, target, name, stackSize);
}
#Override
public void run() {
if (!isDaemon()) {
LOG.error("Wrong state for isDaemon(): false");
}
super.run();
}
}
So, I'm expecting only to see something like this,
when I do a Thread-Dump:
"DAEMON-pool-1-thread-1234" Id=61 in WAITING on lock
=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject#2030c72c (daemon)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
at com.company.thread.task.Task$TaskThreadFactory$DaemonThread.run(Task.java:194)
Locked synchronizers: count = 0
But I also see something like this too (no "(daemon)"):
"DAEMON-pool-1-thread-3287" Id=3383 in TIMED_WAITING on lock
=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject#69e4d79b
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.poll(ScheduledThreadPoolExecutor.java:1134)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.poll(ScheduledThreadPoolExecutor.java:809)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
at com.company.thread.task.Task$TaskThreadFactory$DaemonThread.run(Task.java:194)
Locked synchronizers: count = 0
As you might have noticed from the bottom line,
the Thread that does not say "(daemon)" is of my sub-class.
Firstly, it should be daemon, because the thread-pool sets it,
and secondly, if it is not daemon, then I should see an error log,
which I cannot find in our logs either.
So, idk if the thread-dump output is "wrong", and the thread is daemon,
or if it is not daemon, and the log output is somehow lost,
or if somehow the daemon state is changed after the thread is created,
which is, AFAIK, not allowed:
https://docs.oracle.com/javase/8/docs/api/java/lang/Thread.html#setDaemon-boolean-
This method must be invoked before the thread is started.
Why do I want to know this? Because our Java server often doesn't shut down when it should, and I suspect it might be related to some threads not being daemon threads, when they should.

How to determine where Java thread interrupts are coming from?

I've got a UI automation framework that launches tests using TestNG and runs through pages using Selenium/WebDriver. Oftentimes the pages I'm testing make AJAX calls that modify the DOM upon returning. In these cases I use Selenium explicit waits to declare a DOM condition that I want to be met before the automation can proceed (IE: some button gets enabled).
Internally Selenium's FluentWait.until method handles this by polling the DOM for my ExpectedCondition every 500ms and calling Thread.sleep() in-between these checks.
When I run two tests back to back in a TestNG suite this works perfectly fine for the first test, but starts to fail with an InterruptedException about halfway through each subsequent test. This is consistent. The exceptions look like this:
Associated Throwable Type: class org.openqa.selenium.WebDriverException Associated Throwable Message: java.lang.InterruptedException: sleep interrupted
The strange thing is that there's no multi-threading going on here. I've disabled Selenium Grid, BrowserMob Proxy, and every other bit of code that could be conflicting. I've read both of these questions:
https://stackoverflow.com/questions/24495176/why-is-thread-sleep-being-interrupted - Closed for not providing enough detail, but one of the proposed answers states that one should override the Thread.interrupt method for debugging.
Who interrupts my thread? - Accepted answer also states that one should override the Thread.interrupt method for debugging.
My problem with this solution is that placing a breakpoint inside the existing Thread.interrupt method does not reveal any calls around the time that the thread is interrupted. This includes calls from all of my third party dependencies (IE: TestNG and Selenium). Whatever is calling this thread interrupt appears to be external to my framework.
I've also tried calling Thread.currentThread.isInterrupted() at every point prior to the FluentWait.until call and it consistently returns false. I've even used IntelliJ's evaluate function to check for isInterrupted inside the Selenium code itself. This thread is only being interrupted once the Thread.sleep call occurs inside FluentWait.until.
I've seen this happen on multiple Windows build servers as well as on my Macbook, so this does not appear to be machine specific.
I thought for a while that this might be caused by a TestNG timeout, but reducing the TestNG timeout in my suite yielded a different behavior than these interruptions.
Currently I'm working around this issue with the following code which swallows the exception and resumes the explicit wait:
public static boolean waitForElementStatus(Stuff)
{
/* snip - setup for ExpectedCondition (change) */
long startSeconds = new Date().getTime() / 1000;
long currentSeconds = startSeconds;
long remainingSeconds = maxElementStatusChangeSeconds;
WebDriverWait waitForElement = new WebDriverWait(driver, maxElementStatusChangeSeconds);
boolean changed = false;
boolean firstWait = true; // If specified time is 0 we still want to check once.
out:while(firstWait || remainingSeconds > 0)
{
firstWait = false;
Boolean exceptionThrown = false;
try
{
waitForElement.until(change);
}
catch(Throwable t)
{
exceptionThrown = true;
if(t.getCause()) != null
{
t = t.getCause(); // InterruptedException is wrapped inside a WebDriverException
}
if(t.getClass().equals(InterruptedException.class))
{
Thread.interrupted(); // clear interrupt status for this thread
currentSeconds = new Date().getTime() / 1000;
remainingSeconds = startSeconds + maxElementStatusChangeSeconds - currentSeconds;
if(remainingSeconds > 0)
{
String warning = String.format("Caught unidentified interrupt inside Selenium " +
"FluentWait.until call. Swallowing interrupt and repeating call with [%s] seconds " +
"remaining.", remainingSeconds);
CombinedLogger.warn(warning);
waitForElement = new WebDriverWait(driver, remainingSeconds);
}
else
{
// If a timeout exception would have been thrown instead of the interruption then
// we'll allow the WebDriverWait to execute one last time so it can throw the
// timeout instead.
waitForElement = new WebDriverWait(driver, 0);
}
}
else if(haltOnFailure) // for any other exception type such as TimeoutException
{
CombinedLogger.error(stuff + "...FAILURE(HALTING)", t);
break out;
}
else // for any other exception type such as TimeoutException
{
CombinedLogger.info(stuff + "...failure(non-halting)");
break out;
}
}
if(!exceptionThrown)
{
changed = true;
CombinedLogger.info(stuff + "...success ");
break out;
}
}
return changed;
}
This workaround does function, and fortunately these mystery interrupts are only occurring sporadically afterwards (they don't happen repeatedly), so the tests are able to proceed. However, I understand that swallowing InterruptedException is bad form. If possible, I'd like to determine where and why these interrupts are taking place so that I can put an end to them instead of using this hack.
Simply propagating the exceptions is not an option since these tests need to continue running instead of obediently crashing.
Are there any known utilities, JVM arguments, or libraries that I could use which would help me track down Java thread interruptions that are caused by code which is out of my control?
Update 12/10/2014: I've captured two thread dumps. One is from immediately before the interrupt and one is from immediately after it. The only difference between the two is the line number of the interrupted thread (it goes from the try block to the catch block after being interrupted). Not sure what this tells me, but here's the data:
Full thread dump (immediately before interrupt)
"TestNG#1359" prio=5 tid=0xc nid=NA runnable
java.lang.Thread.State: RUNNABLE
at org.openqa.selenium.support.ui.FluentWait.until(FluentWait.java:232)
/* snip - company stuff */
at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-1)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:84)
at org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:46)
at org.testng.internal.InvokeMethodRunnable.run(InvokeMethodRunnable.java:37)
at org.testng.internal.MethodInvocationHelper.invokeWithTimeoutWithNoExecutor(MethodInvocationHelper.java:240)
at org.testng.internal.MethodInvocationHelper.invokeWithTimeout(MethodInvocationHelper.java:229)
at org.testng.internal.Invoker.invokeMethod(Invoker.java:724)
at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:901)
at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1231)
at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127)
at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111)
at org.testng.TestRunner.privateRun(TestRunner.java:767)
at org.testng.TestRunner.run(TestRunner.java:617)
at org.testng.SuiteRunner.runTest(SuiteRunner.java:348)
at org.testng.SuiteRunner.access$000(SuiteRunner.java:38)
at org.testng.SuiteRunner$SuiteWorker.run(SuiteRunner.java:382)
at org.testng.internal.thread.ThreadUtil$2.call(ThreadUtil.java:64)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
"main#1" prio=5 tid=0x1 nid=NA waiting
java.lang.Thread.State: WAITING
at sun.misc.Unsafe.park(Unsafe.java:-1)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:422)
at java.util.concurrent.FutureTask.get(FutureTask.java:199)
at java.util.concurrent.AbstractExecutorService.invokeAll(AbstractExecutorService.java:289)
at org.testng.internal.thread.ThreadUtil.execute(ThreadUtil.java:72)
at org.testng.SuiteRunner.runInParallelTestMode(SuiteRunner.java:367)
at org.testng.SuiteRunner.privateRun(SuiteRunner.java:308)
at org.testng.SuiteRunner.run(SuiteRunner.java:254)
at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86)
at org.testng.TestNG.runSuitesSequentially(TestNG.java:1224)
at org.testng.TestNG.runSuitesLocally(TestNG.java:1149)
at org.testng.TestNG.run(TestNG.java:1057)
at org.testng.remote.RemoteTestNG.run(RemoteTestNG.java:111)
at org.testng.remote.RemoteTestNG.initAndRun(RemoteTestNG.java:204)
at org.testng.remote.RemoteTestNG.main(RemoteTestNG.java:175)
at org.testng.RemoteTestNGStarter.main(RemoteTestNGStarter.java:125)
"Thread-8#2432" daemon prio=5 tid=0x15 nid=NA runnable
java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(FileInputStream.java:-1)
at java.io.FileInputStream.read(FileInputStream.java:272)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
- locked <0xe08> (a java.lang.UNIXProcess$ProcessPipeInputStream)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at org.apache.commons.exec.StreamPumper.run(StreamPumper.java:105)
at java.lang.Thread.run(Thread.java:745)
"Thread-7#2431" daemon prio=5 tid=0x14 nid=NA runnable
java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(FileInputStream.java:-1)
at java.io.FileInputStream.read(FileInputStream.java:272)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
- locked <0xe09> (a java.lang.UNIXProcess$ProcessPipeInputStream)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at org.apache.commons.exec.StreamPumper.run(StreamPumper.java:105)
at java.lang.Thread.run(Thread.java:745)
"Thread-6#2424" prio=5 tid=0x13 nid=NA waiting
java.lang.Thread.State: WAITING
at java.lang.Object.wait(Object.java:-1)
at java.lang.Object.wait(Object.java:503)
at java.lang.UNIXProcess.waitFor(UNIXProcess.java:261)
at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:347)
at org.apache.commons.exec.DefaultExecutor.access$200(DefaultExecutor.java:46)
at org.apache.commons.exec.DefaultExecutor$1.run(DefaultExecutor.java:188)
"process reaper#2008" daemon prio=10 tid=0x10 nid=NA runnable
java.lang.Thread.State: RUNNABLE
at java.lang.UNIXProcess.waitForProcessExit(UNIXProcess.java:-1)
at java.lang.UNIXProcess.access$500(UNIXProcess.java:54)
at java.lang.UNIXProcess$4.run(UNIXProcess.java:225)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
"ReaderThread#645" prio=5 tid=0xb nid=NA runnable
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(SocketInputStream.java:-1)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
- locked <0xe0b> (a java.io.InputStreamReader)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:154)
at java.io.BufferedReader.readLine(BufferedReader.java:317)
at java.io.BufferedReader.readLine(BufferedReader.java:382)
at org.testng.remote.strprotocol.BaseMessageSender$ReaderThread.run(BaseMessageSender.java:245)
"Finalizer#2957" daemon prio=8 tid=0x3 nid=NA waiting
java.lang.Thread.State: WAITING
at java.lang.Object.wait(Object.java:-1)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)
"Reference Handler#2958" daemon prio=10 tid=0x2 nid=NA waiting
java.lang.Thread.State: WAITING
at java.lang.Object.wait(Object.java:-1)
at java.lang.Object.wait(Object.java:503)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
"Signal Dispatcher#2956" daemon prio=9 tid=0x4 nid=NA runnable
java.lang.Thread.State: RUNNABLE
There is not much that can be inferred from the thread dump as in what caused it.
But in reality you cannot rely on Thread.sleep() too much ,it might be interrupted for known/unknown reason.OS might be the reason in the later case.
Thread.sleep() is one of the few methods which takes interrupt seriously. As a thread cannot handle InterruptedException while it is sleeping ,you need to handle it.
What you are doing right now might not be a workaround but a way to go in such cases,where we cannot do without Thread.sleep().
A bit outdated but I have similar problem and with the help of your previously posted link (https://stackoverflow.com/a/2476246) I put a breakpoint into the Thread.interrupt() method.
It reveals that the interruption was made by StoryManager.waitUntilAllDoneOrFailed() method that triggers future.cancel() method after the timeout set on whole story.
My whole setup is:
page.getPageObject().withTimeoutOf(convertDuration(duration)).waitFor(by);
where duration is about 60 secs. (the minute is due to some async stuff)
and
configuredEmbedder().embedderControls().useStoryTimeouts("30");
And the stackTrace is:
at java.util.concurrent.FutureTask.cancel(FutureTask.java:174)
at org.jbehave.core.embedder.StoryManager.waitUntilAllDoneOrFailed(StoryManager.java:184)
at org.jbehave.core.embedder.StoryManager.performStories(StoryManager.java:121)
at org.jbehave.core.embedder.StoryManager.runStories(StoryManager.java:107)
and that interrupts later Thread.sleep() method in ThucydidesFluentWait.doWait() (basically in the underneath Sleeper instance method sleep())
Increasing the story timeout or proper setup of waitFor(...) timeout vs. story timeout solves the problem on my side.

SelectorImpl is BLOCKED

I use a lot of client sends a request to the server about 1000 requests per second a client, the server's CPU soon rose to 600% (8 cores), and always maintain this state. When I use jstack printing process content, I found SelectorImpl is BLOCKED state. Records are as follows:
nioEventLoopGroup-4-1 prio=10 tid=0x00007fef28001800 nid=0x1dbf waiting for monitor entry [0x00007fef9eec7000]
java.lang.Thread.State: BLOCKED (on object monitor)
at sun.nio.ch.EPollSelectorImpl.doSelect(Unknown Source)
- waiting to lock <0x00000000c01f1af8> (a java.lang.Object)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)
- locked <0x00000000c01d9420> (a io.netty.channel.nio.SelectedSelectionKeySet)
- locked <0x00000000c01f1948> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000c01d92c0> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(Unknown Source)
at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:635)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:319)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
at java.lang.Thread.run(Unknown Source)
High CPU has something to do with this? Another problem is that when I connect a lot of clients, find some client will connect, an error is as follows:
"nioEventLoopGroup-4-1" prio=10 tid=0x00007fef28001800 nid=0x1dbf waiting for monitor entry [0x00007fef9eec7000]
java.lang.Thread.State: BLOCKED (on object monitor)
at sun.nio.ch.EPollSelectorImpl.doSelect(Unknown Source)
- waiting to lock <0x00000000c01f1af8> (a java.lang.Object)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)
- locked <0x00000000c01d9420> (a io.netty.channel.nio.SelectedSelectionKeySet)
- locked <0x00000000c01f1948> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000c01d92c0> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(Unknown Source)
at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:635)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:319)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
at java.lang.Thread.run(Unknown Source)
Generate client is accomplished by using a thread pool, and has set up a connection timeout, but why frequent connection timeout? Is to serve the cause of the suit?
public void run() {
System.out.println(tnum + " connecting...");
try {
Bootstrap bootstrap = new Bootstrap();
bootstrap.group(group)
.channel(NioSocketChannel.class)
.option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 30000)
.handler(loadClientInitializer);
// Start the connection attempt.
ChannelFuture future = bootstrap.connect(host, port);
future.channel().attr(AttrNum).set(tnum);
future.sync();
if (future.isSuccess()) {
System.out.println(tnum + " login success.");
goSend(tnum, future.channel());
} else {
System.out.println(tnum + " login failed.");
}
} catch (Exception e) {
XLog.error(e);
} finally {
// group.shutdownGracefully();
}
}
High CPU has something to do with this?
It might be. I'd diagnose this problem following way (on a Linux box):
Find threads which are eating CPU
Using pidstat I'd find which threads are eating CPU and in what mode (user/kernel) time is spent.
$ pidstat -p [java-process-pid] -tu 1 | awk '$9 > 50'
This command shows threads eating at least 50% of CPU time. You can inspect what those threads are doing using jstack, VisualVM or Java Flight Recorder.
If CPU-hungry threads and BLOCKED threads are the same, CPU usage seems to do something with contention.
Find reason for connection timeout
Basically you will get connection timeout if two OS'es can't finish TCP-handshake in a given time. Several reasons for this:
network link saturation. Can be diagnosed using sar -n DEV 1 and comparing rxkB/s and txkB/s columns to your link maximum throughput.
server (Netty) doesn't respond with accept() call in given timeout. This thread can be BLOCKED or starving for CPU time. You can find which threads are calling accept() (therefore finishing TCP-handshake) using strace -f -e trace=accept -p [java-pid]. And after that check for possible reasons using pidstat/jstack.
Also you can find number of received requests for connection open (but not confirmed) with netstat -an | grep -c SYN_RECV
If you can elaborate more on what your Netty is doing it could be helpful. Regardless - please make sure you are closing the channels. Notice from the Channel javadoc:
It is important to call close() or close(ChannelPromise) to release all resources once you are done with the Channel. This ensures all resources are released in a proper way, i.e. filehandles
If you are closing the channels, then the problem may be with the logic it self - running into infinite loops or similar - which may be able to explain the high CPU.

Which Java thread is hogging the CPU?

Let's say your Java program is taking 100% CPU. It has 50 threads. You need to find which thread is guilty. I have not found a tool that can help. Currently I use the following very time consuming routine:
Run jstack <pid>, where pid is the process id of a Java process. The easy way to find it is to run another utility included in the JDK - jps. It is better to redirect jstack's output to a file.
Search for "runnable" threads. Skip those that wait on a socket (for some reason they are still marked runnable).
Repeat steps 1 and 2 a couple of times and see if you can locate a pattern.
Alternatively, you could attach to a Java process in Eclipse and try to suspend threads one by one, until you hit the one that hogs CPU. On a one-CPU machine, you might need to first reduce the Java process's priority to be able to move around. Even then, Eclipse often isn't able to attach to a running process due to a timeout.
I would have expected Sun's visualvm tool to do this.
Does anybody know of a better way?
Identifying which Java Thread is consuming most CPU in production server.
Most (if not all) productive systems doing anything important will use more than 1 java thread. And when something goes crazy and your cpu usage is on 100%, it is hard to identify which thread(s) is/are causing this. Or so I thought. Until someone smarter than me showed me how it can be done. And here I will show you how to do it and you too can amaze your family and friends with your geek skills.
A Test Application
In order to test this, we need a test application. So I will give you one. It consists of 3 classes:
A HeavyThread class that does something CPU intensive (computing MD5 hashes)
A LightThread class that does something not-so-cpu-intensive (counting and sleeping).
A StartThreads class to start 1 cpu intensive and several light threads.
Here is code for these classes:
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;
/**
* thread that does some heavy lifting
*
* #author srasul
*
*/
public class HeavyThread implements Runnable {
private long length;
public HeavyThread(long length) {
this.length = length;
new Thread(this).start();
}
#Override
public void run() {
while (true) {
String data = "";
// make some stuff up
for (int i = 0; i < length; i++) {
data += UUID.randomUUID().toString();
}
MessageDigest digest;
try {
digest = MessageDigest.getInstance("MD5");
} catch (NoSuchAlgorithmException e) {
throw new RuntimeException(e);
}
// hash the data
digest.update(data.getBytes());
}
}
}
import java.util.Random;
/**
* thread that does little work. just count & sleep
*
* #author srasul
*
*/
public class LightThread implements Runnable {
public LightThread() {
new Thread(this).start();
}
#Override
public void run() {
Long l = 0l;
while(true) {
l++;
try {
Thread.sleep(new Random().nextInt(10));
} catch (InterruptedException e) {
e.printStackTrace();
}
if(l == Long.MAX_VALUE) {
l = 0l;
}
}
}
}
/**
* start it all
*
* #author srasul
*
*/
public class StartThreads {
public static void main(String[] args) {
// lets start 1 heavy ...
new HeavyThread(1000);
// ... and 3 light threads
new LightThread();
new LightThread();
new LightThread();
}
}
Assuming that you have never seen this code, and all you have a PID of a runaway Java process that is running these classes and is consuming 100% CPU.
First let's start the StartThreads class.
$ ls
HeavyThread.java LightThread.java StartThreads.java
$ javac *
$ java StartThreads &
At this stage a Java process is running should be taking up 100 cpu. In my top I see:
In top press Shift-H which turns on Threads. The man page for top says:
-H : Threads toggle
Starts top with the last remembered 'H' state reversed. When
this toggle is On, all individual threads will be displayed.
Otherwise, top displays a summation of all threads in a
process.
And now in my top with Threads display turned ON i see:
And I have a java process with PID 28294. Lets get the stack dump of this process using jstack:
$ jstack 28924
2010-11-18 13:05:41
Full thread dump Java HotSpot(TM) 64-Bit Server VM (17.0-b16 mixed mode):
"Attach Listener" daemon prio=10 tid=0x0000000040ecb000 nid=0x7150 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"DestroyJavaVM" prio=10 tid=0x00007f9a98027800 nid=0x70fd waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Thread-3" prio=10 tid=0x00007f9a98025800 nid=0x710d waiting on condition [0x00007f9a9d543000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at LightThread.run(LightThread.java:21)
at java.lang.Thread.run(Thread.java:619)
"Thread-2" prio=10 tid=0x00007f9a98023800 nid=0x710c waiting on condition [0x00007f9a9d644000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at LightThread.run(LightThread.java:21)
at java.lang.Thread.run(Thread.java:619)
"Thread-1" prio=10 tid=0x00007f9a98021800 nid=0x710b waiting on condition [0x00007f9a9d745000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at LightThread.run(LightThread.java:21)
at java.lang.Thread.run(Thread.java:619)
"Thread-0" prio=10 tid=0x00007f9a98020000 nid=0x710a runnable [0x00007f9a9d846000]
java.lang.Thread.State: RUNNABLE
at sun.security.provider.DigestBase.engineReset(DigestBase.java:139)
at sun.security.provider.DigestBase.engineUpdate(DigestBase.java:104)
at java.security.MessageDigest$Delegate.engineUpdate(MessageDigest.java:538)
at java.security.MessageDigest.update(MessageDigest.java:293)
at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:197)
- locked <0x00007f9aa457e400> (a sun.security.provider.SecureRandom)
at sun.security.provider.NativePRNG$RandomIO.implNextBytes(NativePRNG.java:257)
- locked <0x00007f9aa457e708> (a java.lang.Object)
at sun.security.provider.NativePRNG$RandomIO.access$200(NativePRNG.java:108)
at sun.security.provider.NativePRNG.engineNextBytes(NativePRNG.java:97)
at java.security.SecureRandom.nextBytes(SecureRandom.java:433)
- locked <0x00007f9aa4582fc8> (a java.security.SecureRandom)
at java.util.UUID.randomUUID(UUID.java:162)
at HeavyThread.run(HeavyThread.java:27)
at java.lang.Thread.run(Thread.java:619)
"Low Memory Detector" daemon prio=10 tid=0x00007f9a98006800 nid=0x7108 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"CompilerThread1" daemon prio=10 tid=0x00007f9a98004000 nid=0x7107 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"CompilerThread0" daemon prio=10 tid=0x00007f9a98001000 nid=0x7106 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" daemon prio=10 tid=0x0000000040de4000 nid=0x7105 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Finalizer" daemon prio=10 tid=0x0000000040dc4800 nid=0x7104 in Object.wait() [0x00007f9a97ffe000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00007f9aa45506b0> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
- locked <0x00007f9aa45506b0> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
"Reference Handler" daemon prio=10 tid=0x0000000040dbd000 nid=0x7103 in Object.wait() [0x00007f9a9de92000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00007f9aa4550318> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:485)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
- locked <0x00007f9aa4550318> (a java.lang.ref.Reference$Lock)
"VM Thread" prio=10 tid=0x0000000040db8800 nid=0x7102 runnable
"GC task thread#0 (ParallelGC)" prio=10 tid=0x0000000040d6e800 nid=0x70fe runnable
"GC task thread#1 (ParallelGC)" prio=10 tid=0x0000000040d70800 nid=0x70ff runnable
"GC task thread#2 (ParallelGC)" prio=10 tid=0x0000000040d72000 nid=0x7100 runnable
"GC task thread#3 (ParallelGC)" prio=10 tid=0x0000000040d74000 nid=0x7101 runnable
"VM Periodic Task Thread" prio=10 tid=0x00007f9a98011800 nid=0x7109 waiting on condition
JNI global references: 910
From my top I see that the PID of the top thread is 28938. And 28938 in hex is 0x710A. Notice that in the stack dump, each thread has an nid which is dispalyed in hex. And it just so happens that 0x710A is the id of the thread:
"Thread-0" prio=10 tid=0x00007f9a98020000 nid=0x710a runnable [0x00007f9a9d846000]
java.lang.Thread.State: RUNNABLE
at sun.security.provider.DigestBase.engineReset(DigestBase.java:139)
at sun.security.provider.DigestBase.engineUpdate(DigestBase.java:104)
at java.security.MessageDigest$Delegate.engineUpdate(MessageDigest.java:538)
at java.security.MessageDigest.update(MessageDigest.java:293)
at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:197)
- locked <0x00007f9aa457e400> (a sun.security.provider.SecureRandom)
at sun.security.provider.NativePRNG$RandomIO.implNextBytes(NativePRNG.java:257)
- locked <0x00007f9aa457e708> (a java.lang.Object)
at sun.security.provider.NativePRNG$RandomIO.access$200(NativePRNG.java:108)
at sun.security.provider.NativePRNG.engineNextBytes(NativePRNG.java:97)
at java.security.SecureRandom.nextBytes(SecureRandom.java:433)
- locked <0x00007f9aa4582fc8> (a java.security.SecureRandom)
at java.util.UUID.randomUUID(UUID.java:162)
at HeavyThread.run(HeavyThread.java:27)
at java.lang.Thread.run(Thread.java:619)
And so you can confirm that the thread which is running the HeavyThread class is consuming most CPU.
In read world situations, it will probably be a bunch of threads that consume some portion of CPU and these threads put together will lead to the Java process using 100% CPU.
Summary
Run top
Press Shift-H to enable Threads View
Get PID of the thread with highest CPU
Convert PID to HEX
Get stack dump of java process
Look for thread with the matching HEX PID.
jvmtop can show you the top consuming threads:
TID NAME STATE CPU TOTALCPU
25 http-8080-Processor13 RUNNABLE 4.55% 1.60%
128022 RMI TCP Connection(18)-10.101. RUNNABLE 1.82% 0.02%
36578 http-8080-Processor164 RUNNABLE 0.91% 2.35%
128026 JMX server connection timeout TIMED_WAITING 0.00% 0.00%
Try looking at the Hot Thread Detector plugin for visual VM -- it uses the ThreadMXBean API to take multiple CPU consumption samples to find the most active threads. It's based on a command-line equivalent from Bruce Chapman which might also be useful.
Just run up JVisualVM, connect to your app and and use the thread view. The one which remains continually active is your most likely culprit.
Have a look at the Top Threads plugin for JConsole.
I would recommend taking a look at Arthas tool open sourced by Alibaba.
It contains a bunch of useful commands that can help you debug your production code:
Dashboard: Overview of Your Java Process
SC: Search Class Loaded by Your JVM
Jad: Decompile Class Into Source Code
Watch: View Method Invocation Input and Results
Trace: Find the Bottleneck of Your Method Invocation
Monitor: View Method Invocation Statistics
Stack: View Call Stack of the Method
Tt: Time Tunnel of Method Invocations
Example of the dashboard:
If you're running under Windows, try Process Explorer. Bring up the properties dialog for your process, then select the Threads tab.
Take a thread dump. Wait for 10 seconds. Take another thread dump. Repeat one more time.
Inspect the thread dumps and see which threads are stuck at the same place, or processing the same request.
This is a manual way of doing it, but often useful.
Use ps -eL or top -H -p <pid>, or if you need to see and monitor in real time, run top (then shift H), to get the Light Weight Process ( LWP aka threads) associated with the java process.
root#xxx:/# ps -eL
PID LWP TTY TIME CMD
1 1 ? 00:00:00 java
1 7 ? 00:00:01 java
1 8 ? 00:07:52 java
1 9 ? 00:07:52 java
1 10 ? 00:07:51 java
1 11 ? 00:07:52 java
1 12 ? 00:07:52 java
1 13 ? 00:07:51 java
1 14 ? 00:07:51 java
1 15 ? 00:07:53 java
…
1 164 ? 00:00:02 java
1 166 ? 00:00:02 java
1 169 ? 00:00:02 java
Note LWP= Lightweight Process; In Linux, a thread is associated with a process so that it can be managed in the kernel; LWP shares files and other resources with the parent process.
Now let us see the threads that are taking most time
1 8 ? 00:07:52 java
1 9 ? 00:07:52 java
1 10 ? 00:07:51 java
1 11 ? 00:07:52 java
1 12 ? 00:07:52 java
1 13 ? 00:07:51 java
1 14 ? 00:07:51 java
1 15 ? 00:07:53 java
Jstack is a JDK utility to print Java Stack; It prints thread of the form.
Familiarize yourself with others cool JDK tools as well (jcmd jstat jhat jmap jstack etc — https://docs.oracle.com/javase/8/docs/technotes/tools/unix/)
jstack -l <process id>
The nid, Native thread id in the stack trace is the one that is connected to LWT in linux (https://gist.github.com/rednaxelafx/843622)
“GC task thread#0 (ParallelGC)” os_prio=0 tid=0x00007fc21801f000 nid=0x8 runnable
The nid is given in Hex; So we convert the thread id taking the most time
8,9,10,11,12,13,14,15 in DEC = 8,9,A, B,C,D,E,F in HEX.
(note that this particular stack was taken from Java in a Docker container, with a convenient process if of 1 )
Let us see the thread with this ids..
“GC task thread#0 (ParallelGC)” os_prio=0 tid=0x00007fc21801f000 nid=0x8 runnable
“GC task thread#1 (ParallelGC)” os_prio=0 tid=0x00007fc218020800 nid=0x9 runnable
“GC task thread#2 (ParallelGC)” os_prio=0 tid=0x00007fc218022800 nid=0xa runnable
“GC task thread#3 (ParallelGC)” os_prio=0 tid=0x00007fc218024000 nid=0xb runnable
“GC task thread#4 (ParallelGC)” os_prio=0 tid=0x00007fc218026000 nid=0xc runnable
“GC task thread#5 (ParallelGC)” os_prio=0 tid=0x00007fc218027800 nid=0xd runnable
“GC task thread#6 (ParallelGC)” os_prio=0 tid=0x00007fc218029800 nid=0xe runnable
“GC task thread#7 (ParallelGC)” os_prio=0 tid=0x00007fc21802b000 nid=0xf runnable
All GC related threads; No wonder it was taking lot of CPU time; But then is GC a problem here.
Use jstat (not jstack !) utility to have a quick check for GC.
jstat -gcutil <pid>
This is a kind of hacky way, but it seems like you could fire the application up in a debugger, and then suspend all the threads, and go through the code and find out which one isn't blocking on a lock or an I/O call in some kind of loop. Or is this like what you've already tried?
An option you could consider is querying your threads for the answer from within application. Via the ThreadMXBean you can query CPU usage of threads from within your Java application and query stack traces of the offending thread(s).
The ThreadMXBean option allows you to build this kind of monitoring into your live application. It has negligible impact and has the distinct advantage that you can make it do exactly what you want.
If you suspect VisualVM is a good tool, try it (because it does this) Find out the threads(s) only helps you in the general direction of why it is consuming so much CPU.
However, if its that obvious I would go straight to using a profiler to find out why you are consuming so much CPU.

Categories