I have a very strange situation that I can't get my head around. I have defined a thread pool and use it like this:
ExecutorService fixedThreadPool = Executors.newFixedThreadPool(5);
....some code.....
logger.info("Event:{}, message:[{}]", Event.MESSAGE.name(), message);
fixedThreadPool.submit(new Runnable() {
@Override
public void run() {
...some code...
}
});
logger.info("Submitted: Event:{}, message:[{}]", Event.MESSAGE.name(), message);
Now here is my output for the log messages
2017-07-25 20:44:41,020 [New I/O worker #1] XXXXXXXXX.XXXXXXXServiceImpl - Event:MESSAGE, message:[{"delegateTaskId":"_5ejQ7gtTXyfh6qnPrUeJg","sync":true,"accountId":"kmpySmUISimoRrJL6NL73w"}]
2017-07-25 20:45:42,356 [New I/O worker #1] XXXXXXXXX.XXXXXXXServiceImpl - Submitted: Event:MESSAGE, message:[{"delegateTaskId":"_5ejQ7gtTXyfh6qnPrUeJg","sync":true,"accountId":"kmpySmUISimoRrJL6NL73w"}]
Look at the timestamps of the two messages. I expected submitting a task to the thread pool's queue to be immediate, yet almost a minute passes between the two messages. I have tried to eliminate other causes, for example by printing GC logs (no major GC with pauses) and checking the load pattern.
When this happens there is no load on the system and CPU use is minimal. It is running on an Amazon EC2 t2.large instance, and I can see that there is not much CPU usage.
I read the Javadocs and googled around but couldn't find anything helpful. This is very puzzling. Any pointer is greatly appreciated.
------EDIT-----
I added the time in the log message to make sure that there is no issue of logging. The updated code is
logger.info("Event:{}, time:{}, message:[{}]", Event.MESSAGE.name(), new Date(), message);
fixedThreadPool.submit(new Runnable() {
@Override
public void run() {
...some code...
}
});
logger.info("Submitted: Event:{}, time:{}, message:[{}]", Event.MESSAGE.name(), new Date(), message);
Here is the output
Event:MESSAGE, time:Wed Jul 26 17:50:18 UTC 2017, message:[{"delegateTaskId":"pN7UzXfzSWajjJY33LbM1A","sync":true,"accountId":"kmpySmUISimoRrJL6NL73w"}]
Submitted: Event:MESSAGE, time:Wed Jul 26 17:51:19 UTC 2017, message:[{"delegateTaskId":"pN7UzXfzSWajjJY33LbM1A","sync":true,"accountId":"kmpySmUISimoRrJL6NL73w"}]
As you can see, submitting the task to the thread pool appears to take almost a minute.
I've tested the following code, also on an AWS EC2 t2.large instance:
import java.time.Instant;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
public class RaghvendraSinghTest
{
public static void main(String[] args)
throws Exception
{
ExecutorService fixedThreadPool = Executors.newFixedThreadPool(5);
System.out.printf("[%s] Before fixedThreadPool.submit()%n", Instant.now());
fixedThreadPool.submit(new Runnable() {
@Override
public void run()
{
System.out.printf("[%s] In run()%n", Instant.now());
}
});
System.out.printf("[%s] After fixedThreadPool.submit()%n", Instant.now());
fixedThreadPool.shutdown();
fixedThreadPool.awaitTermination(30, TimeUnit.SECONDS);
System.out.printf("[%s] After fixedThreadPool.shutdown()%n", Instant.now());
}
}
Running the code results in the following output:
[2017-07-26T20:11:56.730Z] Before fixedThreadPool.submit()
[2017-07-26T20:11:56.803Z] After fixedThreadPool.submit()
[2017-07-26T20:11:56.803Z] In run()
[2017-07-26T20:11:56.804Z] After fixedThreadPool.shutdown()
Which shows that the entire run of the program takes less than 75ms. One thing about your question that stands out to me is the name of your thread - "New I/O worker #1" - which indicates to me that there are multiple ExecutorServices at play here.
If you run the code I have included - and just the code I have included - do you see results similar to mine? If you do (and I suspect that you will), you should include enough code so that we can replicate your problem. Otherwise this certainly appears to be specific to your environment.
I figured out the issue. We have the following in our logback.xml
<appender name="SYSLOG-TLS" class="software.wings.logging.CloudBeesSyslogAppender">
<layout class="ch.qos.logback.classic.PatternLayout">
<pattern>%date{ISO8601} %boldGreen(${process_id}) %boldCyan(${version}) %green([%thread]) %highlight(%-5level) %cyan(%logger) - %msg %n</pattern>
</layout>
<host>XXXXXXXX</host>
<port>XXXXXXXX</port>
<programName>XXXXXXXXX</programName>
<key>XXXXXXXXXXX</key>
<threshold>TRACE</threshold>
</appender>
With this configuration, every logger.info call posts the log line to LogDNA. The way our system was configured, that post was synchronous and sometimes took up to 60 seconds, so the delay was in the logging call rather than in submit(), and our tasks were timing out.
Now I need to figure out why these LogDNA posting calls are synchronous.
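To illustrate the effect, here is a minimal sketch in plain Java. It does not use logback or LogDNA: slowSink is a hypothetical stand-in for a slow remote log shipper, and the 500 ms delay is an arbitrary assumption. It only shows why a synchronous sink stalls the calling thread, and how handing the line to a background thread lets the caller return quickly.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SlowSinkDemo {
    // Hypothetical stand-in for a remote log shipper; the delay is an assumption.
    static void slowSink(String line) {
        try {
            Thread.sleep(500);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // The caller waits for the sink to finish: the behaviour described above.
    static long syncLogMillis(String line) {
        long start = System.nanoTime();
        slowSink(line);
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // The caller only enqueues the line; a background thread pays for the slow write.
    static long asyncLogMillis(ExecutorService shipper, String line) {
        long start = System.nanoTime();
        shipper.execute(() -> slowSink(line));
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        ExecutorService shipper = Executors.newSingleThreadExecutor();
        System.out.println("sync call took ms:  " + syncLogMillis("hello"));
        System.out.println("async call took ms: " + asyncLogMillis(shipper, "hello"));
        shipper.shutdown(); // the queued line is still written before the thread exits
    }
}
```

logback ships an AsyncAppender built on the same idea (a queue drained by a worker thread). Whether it can wrap the custom appender shown above is not something I have verified.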
I'm trying to understand CompletableFuture.
Basically, I supply a task to the CompletableFuture and then put the worker thread running that task to sleep for 2 seconds. I expect the worker thread to resume execution after 2 seconds and return the result.
public class CompletableFutureDemo {
public static void main(String[] args) {
System.err.println("Application started");
CompletableFuture
.supplyAsync(()->work1())
.thenAccept(op-> System.out.println(op));
System.err.println("Application ended");
}
public static int work1() {
System.out.println(Thread.currentThread().getName());
try {
Thread.sleep(2000);
} catch (InterruptedException e) {
e.printStackTrace();
}
System.out.println("work1 called");
return (int) (Math.random() * 100);
}
}
Output:
Application started
ForkJoinPool.commonPool-worker-1
Application ended
Why is the worker thread not resuming?
But if I remove the sleep statement from the worker thread, then I get the desired output.
Application started
ForkJoinPool.commonPool-worker-1
work1 called
Application ended
64
As @Slaw already pointed out in the comments, the main thread completes and the application exits while the worker thread is still sleeping. You can call join() to keep the main thread waiting until the worker thread completes:
System.err.println("Application started");
CompletableFuture
.supplyAsync(()->work1())
.thenAccept(op-> System.out.println(op)).join();
System.err.println("Application ended");
Output :
ForkJoinPool.commonPool-worker-3
Application started
work1 called
12
Application ended
Or you can make the main thread wait after it has finished its own work:
System.err.println("Application started");
CompletableFuture<Void> completableFuture = CompletableFuture
.supplyAsync(()->work1())
.thenAccept(op-> System.out.println(op));
System.err.println("Application ended");
completableFuture.join();
Output :
ForkJoinPool.commonPool-worker-3
Application started
Application ended
work1 called
25
If you have multiple CompletableFuture objects, you can use allOf to wait until all of them have completed (the tasks themselves still run asynchronously in the background):
CompletableFuture.allOf(completableFuture1, completableFuture2).join();
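A minimal, self-contained sketch of the allOf pattern; the class name and the two trivial suppliers are made up for illustration:

```java
import java.util.concurrent.CompletableFuture;

class AllOfDemo {
    static int runBoth() {
        CompletableFuture<Integer> first = CompletableFuture.supplyAsync(() -> 1);
        CompletableFuture<Integer> second = CompletableFuture.supplyAsync(() -> 2);
        // allOf returns a CompletableFuture<Void>; join() blocks until both are done.
        CompletableFuture.allOf(first, second).join();
        // Both futures are complete here, so these join() calls return immediately.
        return first.join() + second.join();
    }

    public static void main(String[] args) {
        System.out.println(runBoth()); // prints 3
    }
}
```

Note that allOf itself carries no result, so the individual values still have to be read from each future afterwards.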
I got the task to run asynchronously without it ending up on a daemon thread by supplying my own Executor instance (any flavour of Executor will do):
CompletableFuture
.supplyAsync(()->work1(), Executors.newFixedThreadPool(2))
.thenAccept(op-> System.out.println(op));
I think this avoids daemon threads, because the pool created by Executors.newFixedThreadPool uses non-daemon threads, as with any ordinary ExecutorService.
Thank you @Slaw for providing the information on the daemon thread. I would like to find out more about why the ForkJoin framework marks its threads as daemon by default.
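The daemon difference can be checked directly. This is a small sketch (the class and method names are mine) that runs a probe on a worker of each executor and reports Thread.isDaemon(). It uses execute plus a latch rather than joining the task, so the probe cannot end up running on the calling thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.atomic.AtomicBoolean;

class DaemonCheck {
    // Runs a probe on one of the executor's threads and reports its daemon flag.
    static boolean workerIsDaemon(Executor executor) {
        AtomicBoolean daemon = new AtomicBoolean();
        CountDownLatch done = new CountDownLatch(1);
        executor.execute(() -> {
            daemon.set(Thread.currentThread().isDaemon());
            done.countDown();
        });
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return daemon.get();
    }

    public static void main(String[] args) {
        ExecutorService fixed = Executors.newFixedThreadPool(1);
        System.out.println("commonPool worker daemon: " + workerIsDaemon(ForkJoinPool.commonPool()));
        System.out.println("fixed pool worker daemon: " + workerIsDaemon(fixed));
        // Non-daemon pool threads keep the JVM alive until the pool is shut down.
        fixed.shutdown();
    }
}
```

One consequence worth keeping in mind: the snippet above that passes Executors.newFixedThreadPool(2) inline never shuts that pool down, so the JVM will not exit on its own once the work is finished. Keeping a reference to the pool and calling shutdown() avoids that.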
I'm writing a scheduled task in Thorntail that will run for a long time (approx. 30 minutes). However, it appears that Thorntail limits the execution time to 30 seconds.
My code looks like this (I've removed code that I believe is irrelevant):
@Singleton
public class ReportJobProcessor {
@Schedule(hour = "*", minute = "*/30", persistent = false)
public void processJobs() {
// Acquire a list of jobs
jobs.forEach(this::processJob);
}
private void processJob(ReportJob job) {
// A long running process
}
}
After 30 seconds, I see the following in my logs:
2019-10-01 16:15:14,097 INFO [org.jboss.as.ejb3.timer] (EJB default - 2) WFLYEJB0021: Timer: [id=... timedObjectId=... auto-timer?:true persistent?:false timerService=org.jboss.as.ejb3.timerservice.TimerServiceImpl@42478b98 initialExpiration=null intervalDuration(in milli sec)=0 nextExpiration=Tue Oct 01 16:20:00 CEST 2019 timerState=IN_TIMEOUT info=null] will be retried
Another 30 seconds later, an exception is thrown because the job still didn't complete.
I have no idea how to increase the timeout, and googling my issue returns nothing helpful.
How can I increase the timeout beyond 30 seconds?
I suggest a slightly different approach.
The scheduled task distributes the jobs to asynchronously running stateless session beans (SLSB) called ReportJobExecutor and returns immediately after distributing them, so it does not time out. The number of simultaneously running SLSBs can be adjusted in project-defaults.yml; the default is 16, IIRC. This is a very basic example, but it demonstrates Java EE execution with a predefined bean pool invoked from an EJB timer. A more complicated variant would pool the executors manually, which would let you control their lifecycle (e.g. killing them after a specified time).
@Singleton
public class ReportJobProcessor {
@Inject ReportJobExecutor reportJobExecutor;
@Schedule(hour = "*", minute = "*/30", persistent = false)
public void processJobs() {
// Acquire a list of jobs
jobs.forEach(job -> reportJobExecutor.run(job));
}
}
@Stateless
@Asynchronous
public class ReportJobExecutor {
public void run(ReportJob job) {
//do whatever with job
}
}
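For the "manual pooling" variant mentioned above, here is a rough plain Java SE sketch of the idea, not EJB code: JobDispatchSketch and runAll are hypothetical names, and inside a Java EE container you would normally use container-managed concurrency rather than creating your own pool. It submits every job to a fixed pool and cancels any job that has not finished when its wait times out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class JobDispatchSketch {
    // Submits all jobs, waits up to timeoutMillis on each handle in turn,
    // and returns how many jobs finished normally within their wait.
    static int runAll(List<Runnable> jobs, int poolSize, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        List<Future<?>> handles = new ArrayList<>();
        for (Runnable job : jobs) {
            handles.add(pool.submit(job));
        }
        int finished = 0;
        for (Future<?> handle : handles) {
            try {
                handle.get(timeoutMillis, TimeUnit.MILLISECONDS);
                finished++;
            } catch (TimeoutException e) {
                handle.cancel(true); // interrupts the worker running this job
            } catch (ExecutionException e) {
                // The job itself threw; it is simply not counted as finished.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        pool.shutdownNow();
        return finished;
    }
}
```

The waits happen one after another, so a later job effectively gets more time than an earlier one. Cancellation only works if the job responds to interruption.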
Idea #2:
Another approach would be to use the Java Batch Processing API (JSR 352). Unfortunately, I am not familiar with this API.
I've seen other references to this issue, such as here and here, although these refer to different versions of Netty. I tried this with the latest release in the 4.0 branch (4.0.29) and in the 5.0 alpha branch (5.0-Alpha3). Locally (non-Linux), with JDK 1.8.0_40, it runs fine. On the remote Linux machine, with JDK 1.8.0_25-b17, I get 100% CPU.
Linux kernel version 2.6.32.
Tried using EpollEventLoopGroup();
Tried calling
workerGroup = new NioEventLoopGroup();
workerGroup.rebuildSelectors();
Can anyone offer any suggestions? I've seen references to this bug w/different versions of Netty. Jdk bug? Netty bug? Process goes to 100% immediately on startup and stays there.
Update: Upgraded to Java 1.8.0_45, same result.
JStack output of all runnable threads (there's some rabbitmq stuff in there, only included for completeness - that's common to other applications, and is not the cause of the problem).
As we identified in the comments, the thread that consumed CPU is busy in the following stack:
"pool-9-thread-1" #49 prio=5 os_prio=0 tid=0x00007ffd508e8000 nid=0x3a0c runnable [0x00007ffd188b6000]
java.lang.Thread.State: RUNNABLE
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.poll(ScheduledThreadPoolExecutor.java:809)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I have managed to reproduce similar behavior by creating a ScheduledThreadPoolExecutor, configuring it to allow core threads to time out, and scheduling a lot of repeating tasks with a short delay. It uses a lot of CPU on my machine, and the jstack output is similar (sometimes deeper into the poll method). This code reproduces it:
ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
executor.setKeepAliveTime(1, TimeUnit.MINUTES);
executor.allowCoreThreadTimeOut(true);
for (long i = 0; i < 1000; i++) {
executor.scheduleAtFixedRate(new Runnable() {
@Override
public void run() {
}
}, 0, 1, TimeUnit.NANOSECONDS);
}
Now we just have to identify which code sets up a broken ScheduledThreadPoolExecutor. I searched through the RabbitMQ and Netty source code without finding anything obvious. Could it be something you do in your own code?
Edit: As mentioned in the comments, the root cause was a ScheduledThreadPoolExecutor initialized with a core pool size of 0, which apparently can cause a CPU spin on some platforms. This was done in the OP's code.
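A defensive sketch of the fix, assuming the pool size comes from configuration and might be 0. SchedulerSetup and newScheduler are hypothetical names; the only point is that the core pool size passed to the constructor is never below 1.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class SchedulerSetup {
    // Clamps the requested size so the executor always has at least one core thread.
    static ScheduledThreadPoolExecutor newScheduler(int requestedCoreSize) {
        return new ScheduledThreadPoolExecutor(Math.max(1, requestedCoreSize));
    }

    // Schedules one task on a scheduler that was requested with size 0 and returns its result.
    static int runOnce() {
        ScheduledThreadPoolExecutor scheduler = newScheduler(0);
        try {
            ScheduledFuture<Integer> result =
                    scheduler.schedule(() -> 42, 10, TimeUnit.MILLISECONDS);
            return result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            scheduler.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runOnce()); // prints 42
    }
}
```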
I am wondering what is the difference between these two methods of Executors class? I have a web application where I'm checking some data every 100 ms so that's why I'm using this scheduler with scheduleWithFixedDelay method. I want to know which method should I use in this case (newScheduledThreadPool or newSingleThreadScheduledExecutor)?
I also have one more question - in VisualVM where I monitor my Glassfish server I noticed that I have some threads in PARK state - for example:
java.lang.Thread.State: WAITING
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <3cb9965d> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Is it possible that these threads are connected with the scheduler? I don't have any idea what else would create them. These threads are never destroyed, so I am afraid this could cause trouble. Here is a screenshot (a new Thread-35 will be created in 15 minutes, and so on...):
As documentation states:
Unlike the otherwise equivalent newScheduledThreadPool(1) the returned executor is guaranteed not to be reconfigurable to use additional threads.
So when using newScheduledThreadPool(1) you will be able to add more threads later.
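A small sketch of what "reconfigurable" means in practice. It relies on an implementation detail of the JDK's Executors class (newScheduledThreadPool returning a ScheduledThreadPoolExecutor, newSingleThreadScheduledExecutor returning a wrapper), so treat it as an illustration rather than a contract. The class and method names are mine.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledThreadPoolExecutor;

class ReconfigureDemo {
    // True if the service can be cast back to the configurable implementation class.
    static boolean canReconfigure(ScheduledExecutorService service) {
        return service instanceof ScheduledThreadPoolExecutor;
    }

    public static void main(String[] args) {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(1);
        ScheduledExecutorService single = Executors.newSingleThreadScheduledExecutor();

        System.out.println(canReconfigure(pool));   // true
        System.out.println(canReconfigure(single)); // false

        if (pool instanceof ScheduledThreadPoolExecutor) {
            // Only possible for the pool variant: grow it to four core threads.
            ((ScheduledThreadPoolExecutor) pool).setCorePoolSize(4);
        }
        pool.shutdown();
        single.shutdown();
    }
}
```

For a single task that polls every 100 ms, either factory method works; the single-thread variant just rules out accidental resizing.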
The executor returned by newSingleThreadScheduledExecutor() is wrapped in a delegate, as you can see in Executors.java:
public static ScheduledExecutorService newSingleThreadScheduledExecutor() {
return new DelegatedScheduledExecutorService(new ScheduledThreadPoolExecutor(1));
}
Differences (from javadoc):
if this single thread terminates due to a failure during execution prior to shutdown, a new one will take its place if needed to execute subsequent tasks.
Unlike the otherwise equivalent {@code newScheduledThreadPool(1)} the returned executor is guaranteed not to be reconfigurable to use additional threads.
In reply to your comment:
Does this also apply to newScheduledThreadPool(1) or not?
No, you need to take care of thread failure yourself.
As for Unsafe.park(), see this.
As pointed out by wings and tagir, the difference is in how failure is managed.
As for your threads: they are in the WAITING state. park is not a state but a method that puts a thread into the waiting state; see also
How to detect thread being blocked by IO?
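To see the relationship between park and the thread state, here is a minimal sketch (class and method names are mine): it starts a thread that calls LockSupport.park(), polls until the JVM reports it as WAITING, then wakes it up by interrupting it.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

class ParkStateDemo {
    // Returns the state observed for a thread that is parked in LockSupport.park().
    static Thread.State stateWhileParked() {
        Thread worker = new Thread(() -> {
            // park() may return spuriously, so loop until the thread is interrupted.
            while (!Thread.currentThread().isInterrupted()) {
                LockSupport.park();
            }
        });
        worker.setDaemon(true);
        worker.start();

        // Poll (for at most five seconds) until the worker is reported as WAITING.
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        Thread.State state = worker.getState();
        while (state != Thread.State.WAITING && System.nanoTime() < deadline) {
            Thread.yield();
            state = worker.getState();
        }
        worker.interrupt(); // park() returns once the thread is interrupted
        return state;
    }

    public static void main(String[] args) {
        System.out.println(stateWhileParked()); // prints WAITING
    }
}
```

An idle scheduler worker looks the same in a thread dump: it is parked inside DelayedWorkQueue.take() waiting for the next task, which is normal and uses no CPU.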
However, let me suggest a different way to implement a scheduled thread on Java EE; you should take a look at EJB's TimerService
@Singleton
public class TimerSessionBean {
@Resource
TimerService timerService;
public void setTimer(long intervalDuration) {
Timer timer = timerService.createTimer(intervalDuration,
"Created new programmatic timer");
}
@Timeout
public void lookForData(Timer timer) {
//this.setLastProgrammaticTimeout(new Date());
....
}
//OR
@Schedule(minute = "*/1", hour = "*")
public void runEveryMinute() {
...
}
}
I have a web app being served by jetty + mysql. I'm running into an issue where my database connection pool gets exhausted, and all threads start blocking waiting for a connection. I've tried two database connection pool libraries: (1) bonecp (2) hikari. Both exhibit the same behavior with my app.
I've done several thread dumps when I see this state, and all the blocked threads are in this state (not picking on bonecp, I'm sure it's something on my end now):
"qtp1218743501-131" prio=10 tid=0x00007fb858295800 nid=0x669b waiting on condition [0x00007fb8cd5d3000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000763f42d20> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
at com.jolbox.bonecp.DefaultConnectionStrategy.getConnectionInternal(DefaultConnectionStrategy.java:82)
at com.jolbox.bonecp.AbstractConnectionStrategy.getConnection(AbstractConnectionStrategy.java:90)
at com.jolbox.bonecp.BoneCP.getConnection(BoneCP.java:553)
at com.me.Foo.start(Foo.java:30)
...
I'm not sure where to go from here. I expected to see some stack traces in the thread dump where my code was stuck doing some lengthy operation, not waiting for a connection. For example, if my code looks like this:
public class Foo {
public void start() {
Connection conn = threadPool.getConnection();
work(conn);
conn.close();
}
public void work(Connection conn) {
.. something lengthy like scan every row in the database etc ..
}
}
I would expect one of the threads above to have a stack trace that shows it working away in the work() method:
...
at com.me.mycode.Foo.work()
at com.me.mycode.Foo.start()
but instead they're just all waiting for a connection:
...
at com.jolbox.bonecp.BoneCP.getConnection() // ?
at com.me.mycode.Foo.work()
at com.me.mycode.Foo.start()
Any thoughts on how to continue debugging would be great.
Some other background: the app operates normally for about 45 minutes, mem and thread dumps show nothing out of the ordinary. Then the condition is triggered and the thread count spikes up. I started thinking it might be some combination of sql statements the app is trying to perform which turn into some sort of lock on the mysql side, but again I would expect some of the threads in the stack traces above to show me that they're in that part of the code.
The thread dumps were taken using visualvm.
Thanks
Take advantage of the configuration options for the connection pool (see BoneCPConfig / HikariConfig). First of all, set a connection time-out (HikariCP connectionTimeout) and a leak detection time-out (HikariCP leakDetectionThreshold; I could not find the counterpart in BoneCP). There might be more configuration options that dump stack traces when something is not quite right.
My guess is that your application does not always return a connection to the pool and after 45 minutes has no connection in the pool anymore (and thus blocks forever trying to get a connection from the pool). Treat a connection like opening/closing a file, i.e. always use try/finally:
public void start() {
Connection conn = null;
try {
work(conn = dbPool.getConnection());
} finally {
if (conn != null) {
conn.close();
}
}
}
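Since Java 7 the same guarantee can be written with try-with-resources, because java.sql.Connection is AutoCloseable. The sketch below does not use JDBC or a real pool: TinyPool and Lease are hypothetical stand-ins so the example runs on its own. It shows that the lease goes back to the pool even when the work throws.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class TinyPool {
    // Hypothetical stand-in for a pooled connection; close() returns it to the pool.
    final class Lease implements AutoCloseable {
        @Override
        public void close() {
            free.add(this);
        }
    }

    private final BlockingQueue<Lease> free;

    TinyPool(int size) {
        free = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            free.add(new Lease());
        }
    }

    Lease borrow() {
        Lease lease = free.poll();
        if (lease == null) {
            throw new IllegalStateException("pool exhausted");
        }
        return lease;
    }

    int available() {
        return free.size();
    }

    // Borrows a lease, fails during the "work", and reports how many leases are free afterwards.
    static int availableAfterFailingWork(int size) {
        TinyPool pool = new TinyPool(size);
        try (Lease lease = pool.borrow()) {
            throw new IllegalArgumentException("work failed");
        } catch (IllegalArgumentException expected) {
            // close() has already run by the time we get here.
        }
        return pool.available();
    }

    public static void main(String[] args) {
        System.out.println(availableAfterFailingWork(2)); // prints 2
    }
}
```

With a real pool the equivalent shape would be try (Connection conn = dbPool.getConnection()) { work(conn); }, where dbPool is the pool object from the snippet above.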
Finally, both connection pools have options to allow JMX monitoring. You can use this to monitor for strange behavior in the pool.
I question the whole design.
If threads block while waiting in multithreaded network I/O, you need a better implementation of the connection handling.
I suggest you take a look at non-blocking I/O (java.nio and its channels package), or make your locks more granular.