I have written the following program. It uses the executor framework to manage threads, along with a BlockingQueue that I deliberately keep empty so that the worker thread stays in a waiting state.
Here is the program:
package com.example.executors;

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class ExecutorDemo {

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduledThreadPool = null;
        BlockingQueue<Integer> bq = new LinkedBlockingQueue<>();

        scheduledThreadPool = Executors.newSingleThreadScheduledExecutor((Runnable run) -> {
            Thread t = Executors.defaultThreadFactory().newThread(run);
            t.setDaemon(true);
            t.setName("Worker-pool-" + Thread.currentThread().getName());
            t.setUncaughtExceptionHandler(
                    (thread, e) -> System.out.println("thread is --> " + thread + "exception is --> " + e));
            return t;
        });

        ScheduledFuture<?> f = scheduledThreadPool.scheduleAtFixedRate(() -> {
            System.out.println("Inside thread.. working");
            try {
                bq.take();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }, 2000, 30000, TimeUnit.MILLISECONDS);

        System.out.println("f.isDone() ---> " + f.isDone());
        Thread.sleep(100000000000L);
    }
}
Once the program runs, the main thread stays in the TIMED_WAITING state because of Thread.sleep(). The thread managed by the executor reads from an empty blocking queue, so it remains in the WAITING state forever. I wanted to see what the thread dump looks like in this scenario. I have captured it below:
"Worker-pool-main" #10 daemon prio=5 os_prio=31 tid=0x00007f7ef393d800 nid=0x5503 waiting on condition [0x000070000a3d8000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000007955f7110> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at com.example.cs.executors.CSExecutorUnderstanding.lambda$2(CSExecutorUnderstanding.java:34)
at com.example.cs.executors.CSExecutorUnderstanding$$Lambda$2/1705736037.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
As expected, the thread Worker-pool-main remains in the WAITING state. My question is about the thread dump.
Since the executor service manages the life cycle of threads in the executor framework, why does this stack start with the Thread.run() method?
Shouldn't some portion of the executor appear first, and then Thread.run()?
In short: when the life cycle is managed by the executor, why is Thread.run() the first call, with the executor frames appearing further up the stack? Isn't the executor starting these threads, so why do its frames appear higher up in the stack?
When you start a new Thread, it will execute its run method on a completely new call stack. That is the entry point for the code in that Thread. It is completely decoupled from the thread that called start. The "parent" thread continues to run its own code on its own stack independently, and if either of the two threads crashes or completes, it does not impact the other.
The only thing that shows up in a thread's stack frames is whatever gets called inside of run. You don't get to see who called run (the JVM did that). Unless, of course, you confused start with run and called run directly from your own code. Then there is no new thread involved at all.
Here, the thread is not created by your own code directly, but by the executor service. But that one does not do anything different, it also has to create threads by calling constructors and start them using start. The end result is the same.
What run usually does is delegate to a Runnable that has been set in its constructor. You see that here: The executor service has installed a ThreadPoolExecutor$Worker instance. This one contains all the code to be run on the new thread and control its interactions with the executor.
That ThreadPoolExecutor$Worker in turn will then call into its payload code, your application code, the tasks that have been submitted to the executor. In your case, that is com.example.cs.executors.CSExecutorUnderstanding$$Lambda$2/1705736037.
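If you want to check this yourself, here is a minimal sketch. The class name StackBottomDemo and the method captureWorkerStack are made up for illustration; the rest is plain java.util.concurrent. It runs a task on an executor-managed thread, captures the stack from inside the task, and prints it. On the JDKs I am aware of, the oldest frame of such a platform thread should be java.lang.Thread.run, with the ThreadPoolExecutor frames above it.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class StackBottomDemo {

    // Runs a task on an executor-managed thread and returns the stack trace
    // captured from inside that task.
    static StackTraceElement[] captureWorkerStack() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<StackTraceElement[]> task = () -> Thread.currentThread().getStackTrace();
            Future<StackTraceElement[]> future = pool.submit(task);
            return future.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        StackTraceElement[] frames = captureWorkerStack();
        // Frames are ordered from the most recent call (index 0) to the oldest.
        for (StackTraceElement frame : frames) {
            System.out.println("at " + frame);
        }
        StackTraceElement bottom = frames[frames.length - 1];
        System.out.println("oldest frame: " + bottom.getClassName() + "." + bottom.getMethodName());
    }
}
```

The thread that called submit (main here) never appears in the printed frames, which is the decoupling described above.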
In my Java application, I need to copy the contents of one directory to another. But occasionally (very rarely) copyDirectory hangs forever and the code after it never executes, which results in high CPU usage.
I checked the jstack output of my application several times and found that the same thread stays in the RUNNABLE state for a long time. Below is the stack trace of that thread.
"pool-2-thread-3" #17 prio=5 os_prio=0 tid=0x00007fab5585c000 nid=0xa81 runnable [0x00007fab0af6f000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileDispatcherImpl.size0(Native Method)
at sun.nio.ch.FileDispatcherImpl.size(FileDispatcherImpl.java:84)
at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:310)
- locked <0x00000000c5f59728> (a java.lang.Object)
at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:705)
at org.apache.commons.io.FileUtils.doCopyFile(FileUtils.java:1147)
at org.apache.commons.io.FileUtils.doCopyDirectory(FileUtils.java:1428)
at org.apache.commons.io.FileUtils.copyDirectory(FileUtils.java:1389)
at org.apache.commons.io.FileUtils.copyDirectory(FileUtils.java:1261)
at org.apache.commons.io.FileUtils.copyDirectory(FileUtils.java:1230)
I tried copying the same directory manually with a shell command, and it copied successfully. There is also one more thread that has been in the RUNNABLE state for a long time, with the following stack trace.
"pool-2-thread-52" #81581 prio=5 os_prio=0 tid=0x00007fab55951800 nid=0x5db runnable [0x00007faafb2f0000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileChannelImpl.position0(Native Method)
at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:288)
- locked <0x00000000c5ffde60> (a java.lang.Object)
at sun.nio.ch.FileChannelImpl.transferFromFileChannel(FileChannelImpl.java:651)
- locked <0x00000000c5ffde60> (a java.lang.Object)
at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:708)
at org.apache.commons.io.FileUtils.doCopyFile(FileUtils.java:1147)
at org.apache.commons.io.FileUtils.doCopyDirectory(FileUtils.java:1428)
at org.apache.commons.io.FileUtils.copyDirectory(FileUtils.java:1389)
at org.apache.commons.io.FileUtils.copyDirectory(FileUtils.java:1261)
at org.apache.commons.io.FileUtils.copyDirectory(FileUtils.java:1230)
I have no clue why the thread is stuck in that native call. Is this an environmental issue, or is it related to the machine?
I found the clue in IO-385.
I was using Apache commons-io version 2.4, which has a bug where FileUtils.doCopyFile can potentially end up in an infinite loop:
for (long count = 0L; pos < size; pos += output.transferFrom(input, pos, count))
{
    count = size - pos > 31457280L ? 31457280L : size - pos;
}
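As far as I understand the issue, the loop above never terminates if transferFrom keeps returning 0 (for instance, if the source file becomes shorter than size while the copy is in progress), because pos then never advances. Below is a minimal sketch of a copy loop that guards against that. It is not the actual commons-io fix; the class name SafeChannelCopy and the method copy are hypothetical, and only standard java.nio calls are used.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SafeChannelCopy {

    // Chunk size per transferFrom call (30 MB, the 31457280L constant in the loop above).
    static final long CHUNK = 30L * 1024 * 1024;

    // Copies source to target and returns the number of bytes actually copied.
    static long copy(Path source, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long pos = 0;
            while (pos < size) {
                long count = Math.min(size - pos, CHUNK);
                long copied = out.transferFrom(in, pos, count);
                if (copied == 0) {
                    // No progress (for example, the source shrank while copying):
                    // stop instead of spinning forever.
                    break;
                }
                pos += copied;
            }
            return pos;
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("copy-src", ".bin");
        Path dst = Files.createTempFile("copy-dst", ".bin");
        Files.write(src, new byte[4096]);
        System.out.println("bytes copied: " + copy(src, dst));
        Files.delete(src);
        Files.delete(dst);
    }
}
```

A caller can compare the returned value with the expected size to detect a short copy. In practice, upgrading commons-io to a release that includes the IO-385 fix is the simpler route.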
I have a Spring application with clustered Quartz jobs that run at a periodic interval (1 minute). When the server starts everything seems fine, but after some time the jobs stop being triggered. Restarting the server makes the jobs run again, but the issue recurs after some time.
I suspected thread exhaustion, and in a thread dump I noticed that all 10 of my Quartz threads are in TIMED_WAITING.
Config:
org.quartz.threadPool.class = org.quartz.simpl.SimpleThreadPool
org.quartz.threadPool.threadCount = 10
org.quartz.threadPool.threadPriority = 5
Thread dump:
quartzScheduler_Worker-10 - priority:10 - threadId:0x00007f8ae534d800 - nativeId:0x13c78 - state:TIMED_WAITING stackTrace:
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x000000066cd73220> (a java.lang.Object)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:568)
- locked <0x000000066cd73220> (a java.lang.Object)
I am using Quartz 2.2.1 (I doubt it is a version-specific issue).
I verified from the logs that there are no DB connectivity issues.
Please help me diagnose the problem. Is it possible that I have maxed out system resources (the number of threads)? My jobs are synchronous and exit only when all of their child threads have completed their tasks, and I also use the @DisallowConcurrentExecution annotation.
The root cause was that we had too many misfires in our Quartz job. Quartz fires the job every minute, but the job does not always complete within a minute, so the runs pile up as misfires and Quartz tries to execute them first.
During this process, the operation that updates the misfired triggers takes a long time, which causes Quartz to get stuck. This is evident from the thread dump, in which all of our Quartz threads are in the TIMED_WAITING state, as below:
quartzScheduler_Worker-10 - priority:10 - threadId:0x00007f8ae534d800 - nativeId:0x13c78 - state:TIMED_WAITING
stackTrace:
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x000000066cd73220> (a java.lang.Object)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:568)
- locked <0x000000066cd73220> (a java.lang.Object)
Refer : https://jira.terracotta.org/jira/si/jira.issueviews:issue-html/QTZ-357/QTZ-357.html
For our use case, misfires can be ignored and picked up by the next run. Hence I changed the misfire instruction to ignore, as below:
<property name="misfireInstructionName" value="MISFIRE_INSTRUCTION_IGNORE_MISFIRE_POLICY" />
I've got a UI automation framework that launches tests using TestNG and runs through pages using Selenium/WebDriver. Oftentimes the pages I'm testing make AJAX calls that modify the DOM upon returning. In these cases I use Selenium explicit waits to declare a DOM condition that I want to be met before the automation can proceed (e.g., some button gets enabled).
Internally Selenium's FluentWait.until method handles this by polling the DOM for my ExpectedCondition every 500ms and calling Thread.sleep() in-between these checks.
When I run two tests back to back in a TestNG suite this works perfectly fine for the first test, but starts to fail with an InterruptedException about halfway through each subsequent test. This is consistent. The exceptions look like this:
Associated Throwable Type: class org.openqa.selenium.WebDriverException Associated Throwable Message: java.lang.InterruptedException: sleep interrupted
The strange thing is that there's no multi-threading going on here. I've disabled Selenium Grid, BrowserMob Proxy, and every other bit of code that could be conflicting. I've read both of these questions:
https://stackoverflow.com/questions/24495176/why-is-thread-sleep-being-interrupted - Closed for not providing enough detail, but one of the proposed answers states that one should override the Thread.interrupt method for debugging.
Who interrupts my thread? - Accepted answer also states that one should override the Thread.interrupt method for debugging.
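For reference, the "override Thread.interrupt" suggestion from those answers can be sketched roughly as follows. This is only an illustration: the class name InterruptLoggingThread and its accessor are made up, and the technique only helps for threads you construct yourself (or can inject through a ThreadFactory), not for threads created inside a third-party library.

```java
public class InterruptLoggingThread extends Thread {

    // Stack trace of the most recent interrupt() caller, kept for inspection.
    private volatile Throwable lastInterruptSite;

    public InterruptLoggingThread(Runnable target, String name) {
        super(target, name);
    }

    @Override
    public void interrupt() {
        // Record who is interrupting this thread before delegating.
        lastInterruptSite = new Throwable("interrupt() called on " + getName()
                + " by " + Thread.currentThread().getName());
        lastInterruptSite.printStackTrace();
        super.interrupt();
    }

    public Throwable getLastInterruptSite() {
        return lastInterruptSite;
    }

    public static void main(String[] args) throws InterruptedException {
        InterruptLoggingThread worker = new InterruptLoggingThread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                System.out.println("sleep interrupted");
            }
        }, "worker");
        worker.start();
        worker.interrupt(); // prints the caller's stack trace, then interrupts
        worker.join();
    }
}
```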
My problem with this solution is that placing a breakpoint inside the existing Thread.interrupt method does not reveal any calls around the time that the thread is interrupted. This includes calls from all of my third party dependencies (i.e., TestNG and Selenium). Whatever is calling this thread interrupt appears to be external to my framework.
I've also tried calling Thread.currentThread().isInterrupted() at every point prior to the FluentWait.until call and it consistently returns false. I've even used IntelliJ's evaluate function to check for isInterrupted inside the Selenium code itself. This thread is only being interrupted once the Thread.sleep call occurs inside FluentWait.until.
I've seen this happen on multiple Windows build servers as well as on my Macbook, so this does not appear to be machine specific.
I thought for a while that this might be caused by a TestNG timeout, but reducing the TestNG timeout in my suite yielded a different behavior than these interruptions.
Currently I'm working around this issue with the following code which swallows the exception and resumes the explicit wait:
public static boolean waitForElementStatus(Stuff)
{
    /* snip - setup for ExpectedCondition (change) */

    long startSeconds = new Date().getTime() / 1000;
    long currentSeconds = startSeconds;
    long remainingSeconds = maxElementStatusChangeSeconds;
    WebDriverWait waitForElement = new WebDriverWait(driver, maxElementStatusChangeSeconds);
    boolean changed = false;
    boolean firstWait = true; // If specified time is 0 we still want to check once.

    out:while(firstWait || remainingSeconds > 0)
    {
        firstWait = false;
        Boolean exceptionThrown = false;
        try
        {
            waitForElement.until(change);
        }
        catch(Throwable t)
        {
            exceptionThrown = true;
            if(t.getCause() != null)
            {
                t = t.getCause(); // InterruptedException is wrapped inside a WebDriverException
            }
            if(t.getClass().equals(InterruptedException.class))
            {
                Thread.interrupted(); // clear interrupt status for this thread
                currentSeconds = new Date().getTime() / 1000;
                remainingSeconds = startSeconds + maxElementStatusChangeSeconds - currentSeconds;
                if(remainingSeconds > 0)
                {
                    String warning = String.format("Caught unidentified interrupt inside Selenium " +
                        "FluentWait.until call. Swallowing interrupt and repeating call with [%s] seconds " +
                        "remaining.", remainingSeconds);
                    CombinedLogger.warn(warning);
                    waitForElement = new WebDriverWait(driver, remainingSeconds);
                }
                else
                {
                    // If a timeout exception would have been thrown instead of the interruption then
                    // we'll allow the WebDriverWait to execute one last time so it can throw the
                    // timeout instead.
                    waitForElement = new WebDriverWait(driver, 0);
                }
            }
            else if(haltOnFailure) // for any other exception type such as TimeoutException
            {
                CombinedLogger.error(stuff + "...FAILURE(HALTING)", t);
                break out;
            }
            else // for any other exception type such as TimeoutException
            {
                CombinedLogger.info(stuff + "...failure(non-halting)");
                break out;
            }
        }
        if(!exceptionThrown)
        {
            changed = true;
            CombinedLogger.info(stuff + "...success ");
            break out;
        }
    }
    return changed;
}
This workaround does function, and fortunately these mystery interrupts are only occurring sporadically afterwards (they don't happen repeatedly), so the tests are able to proceed. However, I understand that swallowing InterruptedException is bad form. If possible, I'd like to determine where and why these interrupts are taking place so that I can put an end to them instead of using this hack.
Simply propagating the exceptions is not an option since these tests need to continue running instead of obediently crashing.
Are there any known utilities, JVM arguments, or libraries that I could use which would help me track down Java thread interruptions that are caused by code which is out of my control?
Update 12/10/2014: I've captured two thread dumps. One is from immediately before the interrupt and one is from immediately after it. The only difference between the two is the line number of the interrupted thread (it goes from the try block to the catch block after being interrupted). Not sure what this tells me, but here's the data:
Full thread dump (immediately before interrupt)
"TestNG#1359" prio=5 tid=0xc nid=NA runnable
java.lang.Thread.State: RUNNABLE
at org.openqa.selenium.support.ui.FluentWait.until(FluentWait.java:232)
/* snip - company stuff */
at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-1)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:84)
at org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:46)
at org.testng.internal.InvokeMethodRunnable.run(InvokeMethodRunnable.java:37)
at org.testng.internal.MethodInvocationHelper.invokeWithTimeoutWithNoExecutor(MethodInvocationHelper.java:240)
at org.testng.internal.MethodInvocationHelper.invokeWithTimeout(MethodInvocationHelper.java:229)
at org.testng.internal.Invoker.invokeMethod(Invoker.java:724)
at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:901)
at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1231)
at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127)
at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111)
at org.testng.TestRunner.privateRun(TestRunner.java:767)
at org.testng.TestRunner.run(TestRunner.java:617)
at org.testng.SuiteRunner.runTest(SuiteRunner.java:348)
at org.testng.SuiteRunner.access$000(SuiteRunner.java:38)
at org.testng.SuiteRunner$SuiteWorker.run(SuiteRunner.java:382)
at org.testng.internal.thread.ThreadUtil$2.call(ThreadUtil.java:64)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
"main#1" prio=5 tid=0x1 nid=NA waiting
java.lang.Thread.State: WAITING
at sun.misc.Unsafe.park(Unsafe.java:-1)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:422)
at java.util.concurrent.FutureTask.get(FutureTask.java:199)
at java.util.concurrent.AbstractExecutorService.invokeAll(AbstractExecutorService.java:289)
at org.testng.internal.thread.ThreadUtil.execute(ThreadUtil.java:72)
at org.testng.SuiteRunner.runInParallelTestMode(SuiteRunner.java:367)
at org.testng.SuiteRunner.privateRun(SuiteRunner.java:308)
at org.testng.SuiteRunner.run(SuiteRunner.java:254)
at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86)
at org.testng.TestNG.runSuitesSequentially(TestNG.java:1224)
at org.testng.TestNG.runSuitesLocally(TestNG.java:1149)
at org.testng.TestNG.run(TestNG.java:1057)
at org.testng.remote.RemoteTestNG.run(RemoteTestNG.java:111)
at org.testng.remote.RemoteTestNG.initAndRun(RemoteTestNG.java:204)
at org.testng.remote.RemoteTestNG.main(RemoteTestNG.java:175)
at org.testng.RemoteTestNGStarter.main(RemoteTestNGStarter.java:125)
"Thread-8#2432" daemon prio=5 tid=0x15 nid=NA runnable
java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(FileInputStream.java:-1)
at java.io.FileInputStream.read(FileInputStream.java:272)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
- locked <0xe08> (a java.lang.UNIXProcess$ProcessPipeInputStream)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at org.apache.commons.exec.StreamPumper.run(StreamPumper.java:105)
at java.lang.Thread.run(Thread.java:745)
"Thread-7#2431" daemon prio=5 tid=0x14 nid=NA runnable
java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(FileInputStream.java:-1)
at java.io.FileInputStream.read(FileInputStream.java:272)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
- locked <0xe09> (a java.lang.UNIXProcess$ProcessPipeInputStream)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at org.apache.commons.exec.StreamPumper.run(StreamPumper.java:105)
at java.lang.Thread.run(Thread.java:745)
"Thread-6#2424" prio=5 tid=0x13 nid=NA waiting
java.lang.Thread.State: WAITING
at java.lang.Object.wait(Object.java:-1)
at java.lang.Object.wait(Object.java:503)
at java.lang.UNIXProcess.waitFor(UNIXProcess.java:261)
at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:347)
at org.apache.commons.exec.DefaultExecutor.access$200(DefaultExecutor.java:46)
at org.apache.commons.exec.DefaultExecutor$1.run(DefaultExecutor.java:188)
"process reaper#2008" daemon prio=10 tid=0x10 nid=NA runnable
java.lang.Thread.State: RUNNABLE
at java.lang.UNIXProcess.waitForProcessExit(UNIXProcess.java:-1)
at java.lang.UNIXProcess.access$500(UNIXProcess.java:54)
at java.lang.UNIXProcess$4.run(UNIXProcess.java:225)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
"ReaderThread#645" prio=5 tid=0xb nid=NA runnable
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(SocketInputStream.java:-1)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
- locked <0xe0b> (a java.io.InputStreamReader)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:154)
at java.io.BufferedReader.readLine(BufferedReader.java:317)
at java.io.BufferedReader.readLine(BufferedReader.java:382)
at org.testng.remote.strprotocol.BaseMessageSender$ReaderThread.run(BaseMessageSender.java:245)
"Finalizer#2957" daemon prio=8 tid=0x3 nid=NA waiting
java.lang.Thread.State: WAITING
at java.lang.Object.wait(Object.java:-1)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)
"Reference Handler#2958" daemon prio=10 tid=0x2 nid=NA waiting
java.lang.Thread.State: WAITING
at java.lang.Object.wait(Object.java:-1)
at java.lang.Object.wait(Object.java:503)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
"Signal Dispatcher#2956" daemon prio=9 tid=0x4 nid=NA runnable
java.lang.Thread.State: RUNNABLE
Not much can be inferred from the thread dump about what caused the interrupt.
In practice, though, you cannot rely too heavily on Thread.sleep(): it may be interrupted for known or unknown reasons. In the latter case, the OS might be the cause.
Thread.sleep() is one of the few methods that take interruption seriously. Since the thread cannot deal with the interrupt while it is sleeping, the InterruptedException is thrown to your code and you need to handle it.
What you are doing right now may be not so much a workaround as the way to go in cases where we cannot do without Thread.sleep().
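If you do have to keep waiting through an interrupt, one common pattern is to resume sleeping for the remaining time and re-assert the interrupt flag afterwards, rather than swallowing it silently. The sketch below is an assumption-laden illustration, not Selenium code; the class name ResumableSleep and the method sleepFully are hypothetical.

```java
import java.util.concurrent.TimeUnit;

public class ResumableSleep {

    // Sleeps for the full duration even if interrupted, then restores the
    // interrupt flag so callers further up can still observe the interruption.
    // Returns the number of interrupts seen while sleeping.
    static int sleepFully(long millis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(millis);
        int interrupts = 0;
        while (true) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                break;
            }
            try {
                TimeUnit.NANOSECONDS.sleep(remainingNanos);
            } catch (InterruptedException e) {
                interrupts++; // the interrupt flag is cleared when the exception is thrown
            }
        }
        if (interrupts > 0) {
            Thread.currentThread().interrupt(); // re-assert instead of swallowing
        }
        return interrupts;
    }

    public static void main(String[] args) {
        Thread.currentThread().interrupt(); // simulate a pending interrupt
        int seen = sleepFully(200);
        System.out.println("interrupts seen: " + seen
                + ", flag restored: " + Thread.interrupted());
    }
}
```

Whether re-asserting the flag is appropriate depends on who sent the interrupt; if it is a framework asking the test to stop, continuing to wait only delays the shutdown.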
This is a bit outdated, but I had a similar problem, and with the help of the link you posted earlier (https://stackoverflow.com/a/2476246) I put a breakpoint in the Thread.interrupt() method.
It revealed that the interruption was made by the StoryManager.waitUntilAllDoneOrFailed() method, which calls future.cancel() once the timeout set on the whole story expires.
My whole setup is:
page.getPageObject().withTimeoutOf(convertDuration(duration)).waitFor(by);
where duration is about 60 seconds (the minute is due to some async stuff)
and
configuredEmbedder().embedderControls().useStoryTimeouts("30");
And the stackTrace is:
at java.util.concurrent.FutureTask.cancel(FutureTask.java:174)
at org.jbehave.core.embedder.StoryManager.waitUntilAllDoneOrFailed(StoryManager.java:184)
at org.jbehave.core.embedder.StoryManager.performStories(StoryManager.java:121)
at org.jbehave.core.embedder.StoryManager.runStories(StoryManager.java:107)
and that later interrupts the Thread.sleep() call in ThucydidesFluentWait.doWait() (specifically, in the sleep() method of the underlying Sleeper instance).
Increasing the story timeout, or setting the waitFor(...) timeout properly relative to the story timeout, solved the problem on my side.
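The mechanism can be reproduced without JBehave. The following is a minimal sketch (the class name CancelInterruptsSleep and its method are made up for illustration) in which a task sleeping on an executor thread is cancelled with future.cancel(true), which interrupts the thread running it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class CancelInterruptsSleep {

    // Submits a task that sleeps, cancels it with mayInterruptIfRunning = true,
    // and reports whether the sleep was interrupted.
    static boolean sleepInterruptedByCancel() throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);
        AtomicBoolean interrupted = new AtomicBoolean(false);
        try {
            Future<?> future = pool.submit(() -> {
                started.countDown();
                try {
                    Thread.sleep(60_000); // stands in for the long explicit wait
                } catch (InterruptedException e) {
                    interrupted.set(true);
                } finally {
                    finished.countDown();
                }
            });
            started.await();      // make sure the task is running
            future.cancel(true);  // what a story or test timeout does under the hood
            finished.await(5, TimeUnit.SECONDS);
            return interrupted.get();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("sleep interrupted by cancel: " + sleepInterruptedByCancel());
    }
}
```

So if a framework wraps your test in a Future with a timeout shorter than your explicit wait, an "unexplained" InterruptedException in Thread.sleep is the expected outcome.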
Let's say your Java program is taking 100% CPU. It has 50 threads. You need to find which thread is guilty. I have not found a tool that can help. Currently I use the following very time consuming routine:
1. Run jstack <pid>, where pid is the process id of a Java process. The easy way to find it is to run another utility included in the JDK - jps. It is better to redirect jstack's output to a file.
2. Search for "runnable" threads. Skip those that wait on a socket (for some reason they are still marked runnable).
3. Repeat steps 1 and 2 a couple of times and see if you can locate a pattern.
Alternatively, you could attach to a Java process in Eclipse and try to suspend threads one by one, until you hit the one that hogs CPU. On a one-CPU machine, you might need to first reduce the Java process's priority to be able to move around. Even then, Eclipse often isn't able to attach to a running process due to a timeout.
I would have expected Sun's visualvm tool to do this.
Does anybody know of a better way?
Identifying which Java Thread is consuming most CPU in production server.
Most (if not all) production systems doing anything important will use more than one Java thread. And when something goes crazy and your CPU usage is at 100%, it is hard to identify which thread(s) is/are causing this. Or so I thought, until someone smarter than me showed me how it can be done. Here I will show you how to do it, so you too can amaze your family and friends with your geek skills.
A Test Application
In order to test this, we need a test application. So I will give you one. It consists of 3 classes:
A HeavyThread class that does something CPU intensive (computing MD5 hashes)
A LightThread class that does something not-so-cpu-intensive (counting and sleeping).
A StartThreads class to start 1 cpu intensive and several light threads.
Here is code for these classes:
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

/**
 * thread that does some heavy lifting
 *
 * @author srasul
 *
 */
public class HeavyThread implements Runnable {

    private long length;

    public HeavyThread(long length) {
        this.length = length;
        new Thread(this).start();
    }

    @Override
    public void run() {
        while (true) {
            String data = "";

            // make some stuff up
            for (int i = 0; i < length; i++) {
                data += UUID.randomUUID().toString();
            }

            MessageDigest digest;
            try {
                digest = MessageDigest.getInstance("MD5");
            } catch (NoSuchAlgorithmException e) {
                throw new RuntimeException(e);
            }

            // hash the data
            digest.update(data.getBytes());
        }
    }
}
import java.util.Random;

/**
 * thread that does little work. just count & sleep
 *
 * @author srasul
 *
 */
public class LightThread implements Runnable {

    public LightThread() {
        new Thread(this).start();
    }

    @Override
    public void run() {
        Long l = 0l;
        while(true) {
            l++;
            try {
                Thread.sleep(new Random().nextInt(10));
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            if(l == Long.MAX_VALUE) {
                l = 0l;
            }
        }
    }
}
/**
 * start it all
 *
 * @author srasul
 *
 */
public class StartThreads {

    public static void main(String[] args) {
        // lets start 1 heavy ...
        new HeavyThread(1000);

        // ... and 3 light threads
        new LightThread();
        new LightThread();
        new LightThread();
    }
}
Assume that you have never seen this code, and all you have is the PID of a runaway Java process that is running these classes and consuming 100% CPU.
First let's start the StartThreads class.
$ ls
HeavyThread.java LightThread.java StartThreads.java
$ javac *
$ java StartThreads &
At this stage a Java process is running and should be taking up 100% CPU. In my top I see:
In top, press Shift-H, which turns on the threads display. The man page for top says:
-H : Threads toggle
Starts top with the last remembered 'H' state reversed. When
this toggle is On, all individual threads will be displayed.
Otherwise, top displays a summation of all threads in a
process.
And now, in my top with the threads display turned on, I see:
And I have a Java process with PID 28924. Let's get the stack dump of this process using jstack:
$ jstack 28924
2010-11-18 13:05:41
Full thread dump Java HotSpot(TM) 64-Bit Server VM (17.0-b16 mixed mode):
"Attach Listener" daemon prio=10 tid=0x0000000040ecb000 nid=0x7150 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"DestroyJavaVM" prio=10 tid=0x00007f9a98027800 nid=0x70fd waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Thread-3" prio=10 tid=0x00007f9a98025800 nid=0x710d waiting on condition [0x00007f9a9d543000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at LightThread.run(LightThread.java:21)
at java.lang.Thread.run(Thread.java:619)
"Thread-2" prio=10 tid=0x00007f9a98023800 nid=0x710c waiting on condition [0x00007f9a9d644000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at LightThread.run(LightThread.java:21)
at java.lang.Thread.run(Thread.java:619)
"Thread-1" prio=10 tid=0x00007f9a98021800 nid=0x710b waiting on condition [0x00007f9a9d745000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at LightThread.run(LightThread.java:21)
at java.lang.Thread.run(Thread.java:619)
"Thread-0" prio=10 tid=0x00007f9a98020000 nid=0x710a runnable [0x00007f9a9d846000]
java.lang.Thread.State: RUNNABLE
at sun.security.provider.DigestBase.engineReset(DigestBase.java:139)
at sun.security.provider.DigestBase.engineUpdate(DigestBase.java:104)
at java.security.MessageDigest$Delegate.engineUpdate(MessageDigest.java:538)
at java.security.MessageDigest.update(MessageDigest.java:293)
at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:197)
- locked <0x00007f9aa457e400> (a sun.security.provider.SecureRandom)
at sun.security.provider.NativePRNG$RandomIO.implNextBytes(NativePRNG.java:257)
- locked <0x00007f9aa457e708> (a java.lang.Object)
at sun.security.provider.NativePRNG$RandomIO.access$200(NativePRNG.java:108)
at sun.security.provider.NativePRNG.engineNextBytes(NativePRNG.java:97)
at java.security.SecureRandom.nextBytes(SecureRandom.java:433)
- locked <0x00007f9aa4582fc8> (a java.security.SecureRandom)
at java.util.UUID.randomUUID(UUID.java:162)
at HeavyThread.run(HeavyThread.java:27)
at java.lang.Thread.run(Thread.java:619)
"Low Memory Detector" daemon prio=10 tid=0x00007f9a98006800 nid=0x7108 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"CompilerThread1" daemon prio=10 tid=0x00007f9a98004000 nid=0x7107 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"CompilerThread0" daemon prio=10 tid=0x00007f9a98001000 nid=0x7106 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" daemon prio=10 tid=0x0000000040de4000 nid=0x7105 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Finalizer" daemon prio=10 tid=0x0000000040dc4800 nid=0x7104 in Object.wait() [0x00007f9a97ffe000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00007f9aa45506b0> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
- locked <0x00007f9aa45506b0> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
"Reference Handler" daemon prio=10 tid=0x0000000040dbd000 nid=0x7103 in Object.wait() [0x00007f9a9de92000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00007f9aa4550318> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:485)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
- locked <0x00007f9aa4550318> (a java.lang.ref.Reference$Lock)
"VM Thread" prio=10 tid=0x0000000040db8800 nid=0x7102 runnable
"GC task thread#0 (ParallelGC)" prio=10 tid=0x0000000040d6e800 nid=0x70fe runnable
"GC task thread#1 (ParallelGC)" prio=10 tid=0x0000000040d70800 nid=0x70ff runnable
"GC task thread#2 (ParallelGC)" prio=10 tid=0x0000000040d72000 nid=0x7100 runnable
"GC task thread#3 (ParallelGC)" prio=10 tid=0x0000000040d74000 nid=0x7101 runnable
"VM Periodic Task Thread" prio=10 tid=0x00007f9a98011800 nid=0x7109 waiting on condition
JNI global references: 910
From top I can see that the PID of the busiest thread is 28938, and 28938 in hex is 0x710A. Notice that in the stack dump each thread has an nid, which is displayed in hex. And it just so happens that 0x710A is the nid of this thread:
"Thread-0" prio=10 tid=0x00007f9a98020000 nid=0x710a runnable [0x00007f9a9d846000]
java.lang.Thread.State: RUNNABLE
at sun.security.provider.DigestBase.engineReset(DigestBase.java:139)
at sun.security.provider.DigestBase.engineUpdate(DigestBase.java:104)
at java.security.MessageDigest$Delegate.engineUpdate(MessageDigest.java:538)
at java.security.MessageDigest.update(MessageDigest.java:293)
at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:197)
- locked <0x00007f9aa457e400> (a sun.security.provider.SecureRandom)
at sun.security.provider.NativePRNG$RandomIO.implNextBytes(NativePRNG.java:257)
- locked <0x00007f9aa457e708> (a java.lang.Object)
at sun.security.provider.NativePRNG$RandomIO.access$200(NativePRNG.java:108)
at sun.security.provider.NativePRNG.engineNextBytes(NativePRNG.java:97)
at java.security.SecureRandom.nextBytes(SecureRandom.java:433)
- locked <0x00007f9aa4582fc8> (a java.security.SecureRandom)
at java.util.UUID.randomUUID(UUID.java:162)
at HeavyThread.run(HeavyThread.java:27)
at java.lang.Thread.run(Thread.java:619)
And so you can confirm that the thread which is running the HeavyThread class is consuming most CPU.
In real-world situations, it will probably be a bunch of threads that each consume some portion of CPU, and these threads put together will lead to the Java process using 100% CPU.
Summary
Run top
Press Shift-H to enable Threads View
Get PID of the thread with highest CPU
Convert PID to HEX
Get stack dump of java process
Look for thread with the matching HEX PID.
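The "convert to hex and look it up" steps can also be scripted. Below is a minimal sketch, not part of the original answer: the class name NidLookup and its helper methods are made up for illustration, and it assumes the dump uses the usual HotSpot header format in which every thread line starts with the quoted thread name and contains an nid=0x... token.

```java
import java.util.Optional;

class NidLookup {
    // Converts the decimal thread id shown by top (threads view) into jstack's nid notation.
    static String toNid(long nativeThreadId) {
        return "0x" + Long.toHexString(nativeThreadId);
    }

    // Returns the first thread header line of the dump whose nid matches, if there is one.
    static Optional<String> findThread(String dump, long nativeThreadId) {
        String needle = "nid=" + toNid(nativeThreadId);
        for (String line : dump.split("\\R")) {
            // Assumption: thread header lines begin with the quoted thread name.
            if (!line.startsWith("\"")) {
                continue;
            }
            // Compare whole tokens so that nid=0x71 does not match nid=0x710a.
            for (String token : line.split("\\s+")) {
                if (token.equals(needle)) {
                    return Optional.of(line);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String dump = String.join("\n",
            "\"Thread-0\" prio=10 tid=0x00007f9a98020000 nid=0x710a runnable [0x00007f9a9d846000]",
            "   java.lang.Thread.State: RUNNABLE",
            "\"Finalizer\" daemon prio=10 tid=0x0000000040dc4800 nid=0x7104 in Object.wait() [0x00007f9a97ffe000]");
        System.out.println(toNid(28938));
        System.out.println(findThread(dump, 28938).orElse("no matching thread"));
    }
}
```

In practice you would feed it the saved jstack output instead of the hard-coded sample lines.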
jvmtop can show you the top consuming threads:
   TID NAME                            STATE            CPU TOTALCPU
    25 http-8080-Processor13           RUNNABLE       4.55%    1.60%
128022 RMI TCP Connection(18)-10.101.  RUNNABLE       1.82%    0.02%
 36578 http-8080-Processor164          RUNNABLE       0.91%    2.35%
128026 JMX server connection timeout   TIMED_WAITING  0.00%    0.00%
Try looking at the Hot Thread Detector plugin for VisualVM. It uses the ThreadMXBean API to take multiple CPU consumption samples to find the most active threads. It's based on a command-line equivalent from Bruce Chapman, which might also be useful.
Just start up JVisualVM, connect to your app and use the thread view. The thread which remains continually active is your most likely culprit.
Have a look at the Top Threads plugin for JConsole.
I would recommend taking a look at Arthas, a tool open-sourced by Alibaba.
It contains a bunch of useful commands that can help you debug your production code:
Dashboard: Overview of Your Java Process
SC: Search Class Loaded by Your JVM
Jad: Decompile Class Into Source Code
Watch: View Method Invocation Input and Results
Trace: Find the Bottleneck of Your Method Invocation
Monitor: View Method Invocation Statistics
Stack: View Call Stack of the Method
Tt: Time Tunnel of Method Invocations
Example of the dashboard:
If you're running under Windows, try Process Explorer. Bring up the properties dialog for your process, then select the Threads tab.
Take a thread dump. Wait for 10 seconds. Take another thread dump. Repeat one more time.
Inspect the thread dumps and see which threads are stuck at the same place, or processing the same request.
This is a manual way of doing it, but often useful.
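If you can run code inside the JVM, the same compare-several-dumps idea can be approximated in-process. The sketch below is only an illustration under that assumption: RepeatedDumps and its methods are hypothetical names, it uses Thread.getAllStackTraces() rather than real jstack dumps, and it only compares each thread's state and topmost frame. Threads that are merely parked or waiting will also show up as unchanged, so you still have to read the result with that in mind.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RepeatedDumps {
    // One snapshot: thread id -> "name [STATE] at topFrame" (threads with no frames are skipped).
    static Map<Long, String> snapshot() {
        Map<Long, String> top = new HashMap<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            StackTraceElement[] frames = e.getValue();
            if (frames.length > 0) {
                Thread t = e.getKey();
                top.put(t.getId(), t.getName() + " [" + t.getState() + "] at " + frames[0]);
            }
        }
        return top;
    }

    // Keeps only the threads whose description is identical in every snapshot.
    static Map<Long, String> unchanged(List<Map<Long, String>> snapshots) {
        Map<Long, String> result = new TreeMap<>(snapshots.get(0));
        for (Map<Long, String> later : snapshots.subList(1, snapshots.size())) {
            result.entrySet().removeIf(e -> !e.getValue().equals(later.get(e.getKey())));
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        // The advice above uses roughly 10 seconds between dumps; pass 10000 to match it.
        long pauseMillis = args.length > 0 ? Long.parseLong(args[0]) : 1000;
        List<Map<Long, String>> snapshots = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            if (i > 0) {
                Thread.sleep(pauseMillis);
            }
            snapshots.add(snapshot());
        }
        unchanged(snapshots).forEach((id, where) -> System.out.println(id + ": " + where));
    }
}
```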
Use ps -eL or top -H -p <pid> (or, if you need to watch in real time, run top and then press Shift-H) to get the lightweight processes (LWPs, aka threads) associated with the Java process.
root#xxx:/# ps -eL
PID LWP TTY TIME CMD
1 1 ? 00:00:00 java
1 7 ? 00:00:01 java
1 8 ? 00:07:52 java
1 9 ? 00:07:52 java
1 10 ? 00:07:51 java
1 11 ? 00:07:52 java
1 12 ? 00:07:52 java
1 13 ? 00:07:51 java
1 14 ? 00:07:51 java
1 15 ? 00:07:53 java
…
1 164 ? 00:00:02 java
1 166 ? 00:00:02 java
1 169 ? 00:00:02 java
Note: LWP = lightweight process. In Linux, a thread is associated with a process so that it can be managed in the kernel; an LWP shares files and other resources with the parent process.
Now let us see the threads that are taking the most time:
1 8 ? 00:07:52 java
1 9 ? 00:07:52 java
1 10 ? 00:07:51 java
1 11 ? 00:07:52 java
1 12 ? 00:07:52 java
1 13 ? 00:07:51 java
1 14 ? 00:07:51 java
1 15 ? 00:07:53 java
jstack is a JDK utility that prints Java stack traces; each thread is printed in the form shown below.
Familiarize yourself with the other cool JDK tools as well (jcmd, jstat, jhat, jmap, jstack, etc.: https://docs.oracle.com/javase/8/docs/technotes/tools/unix/)
jstack -l <process id>
The nid (native thread id) in the stack trace is the one that corresponds to the LWP in Linux (https://gist.github.com/rednaxelafx/843622)
"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007fc21801f000 nid=0x8 runnable
The nid is given in hex, so we convert the ids of the threads taking the most time:
8, 9, 10, 11, 12, 13, 14, 15 in decimal = 8, 9, A, B, C, D, E, F in hex.
(Note that this particular stack was taken from Java in a Docker container, with a convenient process ID of 1.)
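Picking the busiest LWPs out of the ps -eL output and converting them to hex can be done by hand, as above, or with a few lines of code. Here is a rough sketch; BusiestLwps is a made-up helper, and it assumes the five-column layout shown earlier (PID LWP TTY TIME CMD) with TIME formatted as [DD-]HH:MM:SS.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class BusiestLwps {
    static final class Lwp {
        final long id;
        final long cpuSeconds;

        Lwp(long id, long cpuSeconds) {
            this.id = id;
            this.cpuSeconds = cpuSeconds;
        }
    }

    // Parses a ps TIME value of the form [DD-]HH:MM:SS into seconds.
    static long toSeconds(String time) {
        long days = 0;
        int dash = time.indexOf('-');
        if (dash >= 0) {
            days = Long.parseLong(time.substring(0, dash));
            time = time.substring(dash + 1);
        }
        String[] p = time.split(":");
        return ((days * 24 + Long.parseLong(p[0])) * 60 + Long.parseLong(p[1])) * 60
                + Long.parseLong(p[2]);
    }

    // Parses "PID LWP TTY TIME CMD" rows and returns them sorted by CPU time, busiest first.
    static List<Lwp> parse(String psOutput) {
        List<Lwp> result = new ArrayList<>();
        for (String line : psOutput.split("\\R")) {
            String[] f = line.trim().split("\\s+");
            // Skip the header row, elided rows and anything else that is not a data row.
            if (f.length < 5 || !f[1].matches("\\d+")) {
                continue;
            }
            result.add(new Lwp(Long.parseLong(f[1]), toSeconds(f[3])));
        }
        result.sort(Comparator.comparingLong((Lwp l) -> l.cpuSeconds).reversed());
        return result;
    }

    // The LWP id in the notation jstack uses for nid.
    static String toNid(long lwp) {
        return "0x" + Long.toHexString(lwp);
    }

    public static void main(String[] args) {
        String ps = String.join("\n",
            "PID LWP TTY TIME CMD",
            "1 1 ? 00:00:00 java",
            "1 7 ? 00:00:01 java",
            "1 8 ? 00:07:52 java",
            "1 15 ? 00:07:53 java");
        for (Lwp l : parse(ps)) {
            System.out.println("LWP " + l.id + " nid=" + toNid(l.id) + " cpu=" + l.cpuSeconds + "s");
        }
    }
}
```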
Let us see the threads with these ids:
"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007fc21801f000 nid=0x8 runnable
"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007fc218020800 nid=0x9 runnable
"GC task thread#2 (ParallelGC)" os_prio=0 tid=0x00007fc218022800 nid=0xa runnable
"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007fc218024000 nid=0xb runnable
"GC task thread#4 (ParallelGC)" os_prio=0 tid=0x00007fc218026000 nid=0xc runnable
"GC task thread#5 (ParallelGC)" os_prio=0 tid=0x00007fc218027800 nid=0xd runnable
"GC task thread#6 (ParallelGC)" os_prio=0 tid=0x00007fc218029800 nid=0xe runnable
"GC task thread#7 (ParallelGC)" os_prio=0 tid=0x00007fc21802b000 nid=0xf runnable
These are all GC-related threads; no wonder they were taking a lot of CPU time. But is GC actually the problem here?
Use the jstat (not jstack!) utility for a quick check on GC.
jstat -gcutil <pid>
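jstat inspects another process from the outside. If you would rather check from inside the application, the standard GarbageCollectorMXBean gives a similar quick signal. The sketch below is an assumption-laden approximation, not a replacement for jstat: GcOverhead is a hypothetical class, it only sees the JVM it runs in, and getCollectionTime() reports approximate accumulated elapsed time rather than CPU time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcOverhead {
    // Sums the accumulated collection time (ms) of all collectors; undefined values (-1) are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // GC time as a percentage of JVM uptime; returns 0 when the uptime is not positive.
    static double percentOfUptime(long gcMillis, long uptimeMillis) {
        return uptimeMillis <= 0 ? 0.0 : 100.0 * gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.println("GC share of uptime: " + percentOfUptime(totalGcMillis(), uptime) + "%");
    }
}
```

A share of uptime that keeps climbing between calls would point in the same direction as the GC threads dominating the ps output.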
This is kind of a hacky way, but it seems like you could fire the application up in a debugger, suspend all the threads, and go through the code to find out which one isn't blocked on a lock or an I/O call and is instead spinning in some kind of loop. Or is this what you've already tried?
An option you could consider is querying your threads for the answer from within the application. Via the ThreadMXBean you can query the CPU usage of threads from within your Java application and query the stack traces of the offending thread(s).
The ThreadMXBean option allows you to build this kind of monitoring into your live application. It has negligible impact and has the distinct advantage that you can make it do exactly what you want.
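As a minimal sketch of the ThreadMXBean approach (HotThreadSampler and its method names are made up here; the ThreadMXBean calls themselves are the standard java.lang.management API), you can take two CPU-time samples, pick the thread with the biggest increase, and print its stack. It assumes thread CPU time measurement is supported and enabled on your JVM, and bails out otherwise.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

class HotThreadSampler {
    // CPU time in nanoseconds per live thread id; threads reporting -1 are skipped.
    static Map<Long, Long> cpuTimes(ThreadMXBean mx) {
        Map<Long, Long> times = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            long nanos = mx.getThreadCpuTime(id);
            if (nanos >= 0) {
                times.put(id, nanos);
            }
        }
        return times;
    }

    // Id of the thread whose CPU time grew the most between two samples, or -1 if there is none.
    static long hottest(Map<Long, Long> before, Map<Long, Long> after) {
        long bestId = -1;
        long bestDelta = -1;
        for (Map.Entry<Long, Long> e : after.entrySet()) {
            long delta = e.getValue() - before.getOrDefault(e.getKey(), 0L);
            if (delta > bestDelta) {
                bestDelta = delta;
                bestId = e.getKey();
            }
        }
        return bestId;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            System.out.println("Thread CPU time is not available on this JVM");
            return;
        }
        Map<Long, Long> before = cpuTimes(mx);
        Thread.sleep(1000); // sampling interval, chosen arbitrarily for this sketch
        Map<Long, Long> after = cpuTimes(mx);

        long id = hottest(before, after);
        ThreadInfo info = id < 0 ? null : mx.getThreadInfo(id, 20);
        if (info == null) {
            System.out.println("No live thread found");
            return;
        }
        System.out.println("Hottest thread: " + info.getThreadName());
        for (StackTraceElement frame : info.getStackTrace()) {
            System.out.println("    at " + frame);
        }
    }
}
```

In a real application you would run the sampling on a scheduled background thread and log or expose the result, instead of doing it once from main.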
If you suspect VisualVM is a good tool, try it (because it does this). Finding out the thread(s) only points you in the general direction of why the process is consuming so much CPU.
However, if it's that obvious, I would go straight to using a profiler to find out why you are consuming so much CPU.