I have a thread pool with 8 threads interacting with the database, doing selects and inserts. The DataSource is set up by extending BasicDataSource and setting maxActive to 30 and maxIdle to the same value.
The app works fine, but when I look at JVisualVM I see a lot of waiting for a database connection, which puzzles me since there are at most 8 threads, all using a shared JdbcTemplate.
Database: Oracle 12cr2
How can I resolve this?
Tried setting maxIdle to the same number as maxActive; it makes no difference, I still see the delays.
I thought maybe a single connection was being shared because of some misconfiguration of the JdbcTemplate, so I set a breakpoint after one of the threads had acquired a connection and watched whether the other threads were being starved. That was not the case: the operations continued, but with the same delays as before.
dbcp2 config:
// initialSize: connections created when the pool starts up
data = property.getProperty(Constants.DATABASE_INITIAL_SIZE);
if (data != null && !data.isEmpty())
    setInitialSize(Integer.parseInt(data));
else
    setInitialSize(10);

// maxTotal (the old maxActive): upper bound on connections handed out at once
data = property.getProperty(Constants.DATABASE_MAX_ACTIVE);
if (data != null && !data.isEmpty())
    setMaxTotal(Integer.parseInt(data));
else
    setMaxTotal(20);

// keep up to 20 idle connections instead of closing them
setMaxIdle(20);
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000540289128> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(Unknown Source)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Unknown Source)
at org.apache.commons.pool2.impl.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:590)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:441)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:362)
at org.apache.commons.dbcp2.PoolingDataSource.getConnection(PoolingDataSource.java:134)
I see a lot of these in the thread dumps.
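One way to check whether connections are being borrowed and never returned is DBCP2's abandoned-connection tracking; a minimal sketch in the same BasicDataSource subclass (the timeout value is illustrative, not a recommendation):

// reclaim connections that look leaked when the pool runs dry
setRemoveAbandonedOnBorrow(true);
// seconds a connection may be checked out before it counts as abandoned
setRemoveAbandonedTimeout(60);
// log the stack trace of the code that borrowed the abandoned connection
setLogAbandoned(true);

If nothing shows up in the abandoned log, the waits likely have another cause.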
Related
I'm hitting a race condition between two threads simultaneously trying to close a JMS Session that was created for an IBM MQ broker. There appears to be an option to prevent different threads from using the same connection handle simultaneously. The option is called 'MQCNO_HANDLE_SHARE_NO_BLOCK' (see https://www.ibm.com/support/knowledgecenter/en/SSFKSJ_7.5.0/com.ibm.mq.ref.dev.doc/q095480_.htm).
I'm looking for a way to configure this property through the MQConnectionFactory in the IBM MQ Java client (v9.1.1.0).
I've already tried using connectionFactory.setMQConnectionOptions and OR-ing in the 'com.ibm.mq.constants.CMQC#MQCNO_HANDLE_SHARE_NO_BLOCK' constant, but the client fails to start, telling me that the connection options are not valid.
// re-setting the current options works fine...
connectionFactory.setMQConnectionOptions(connectionFactory.getMQConnectionOptions());
// ...but OR-ing in the share-no-block flag makes the client reject the options as invalid
connectionFactory.setMQConnectionOptions(connectionFactory.getMQConnectionOptions() | MQCNO_HANDLE_SHARE_NO_BLOCK);
I've found that in a Go adaptation of the IBM MQ client the flag is set here:
if gocno == nil {
    // Because Go programs are always threaded, and we cannot
    // tell on which thread we might get dispatched, allow handles always to
    // be shareable.
    gocno = NewMQCNO()
    gocno.Options = MQCNO_HANDLE_SHARE_NO_BLOCK
} else {
    if (gocno.Options & (MQCNO_HANDLE_SHARE_NO_BLOCK |
        MQCNO_HANDLE_SHARE_BLOCK)) == 0 {
        gocno.Options |= MQCNO_HANDLE_SHARE_NO_BLOCK
    }
}
copyCNOtoC(&mqcno, gocno)
C.MQCONNX((*C.MQCHAR)(mqQMgrName), &mqcno, &qMgr.hConn, &mqcc, &mqrc)
Has anyone dealt with this problem, or used this flag?
EDIT - Added thread dumps from two locked threads.
I have the following threads locked:
1) A JMSCCThreadPoolWorker, i.e. the IBM worker thread handling the exception raised by an IBM TCP receiver thread:
"JMSCCThreadPoolWorker-493": inconsistent?, holding [0x00000006d605a0b8, 0x00000006d5f6b9e8, 0x00000005c631e140]
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at com.ibm.mq.jmqi.remote.util.ReentrantMutex.acquire(ReentrantMutex.java:167)
at com.ibm.mq.jmqi.remote.util.ReentrantMutex.acquire(ReentrantMutex.java:73)
at com.ibm.mq.jmqi.remote.api.RemoteHconn.requestDispatchLock(RemoteHconn.java:1219)
at com.ibm.mq.jmqi.remote.api.RemoteFAP.MQCTL(RemoteFAP.java:2576)
at com.ibm.mq.jmqi.monitoring.JmqiInterceptAdapter.MQCTL(JmqiInterceptAdapter.java:333)
at com.ibm.msg.client.wmq.internal.WMQConsumerOwnerShadow.controlAsyncService(WMQConsumerOwnerShadow.java:169)
at com.ibm.msg.client.wmq.internal.WMQConsumerOwnerShadow.stop(WMQConsumerOwnerShadow.java:471)
at com.ibm.msg.client.wmq.internal.WMQSession.stop(WMQSession.java:1894)
at com.ibm.msg.client.jms.internal.JmsSessionImpl.stop(JmsSessionImpl.java:2515)
at com.ibm.msg.client.jms.internal.JmsSessionImpl.stop(JmsSessionImpl.java:2498)
at com.ibm.msg.client.jms.internal.JmsConnectionImpl.stop(JmsConnectionImpl.java:1263)
at com.ibm.mq.jms.MQConnection.stop(MQConnection.java:473)
at org.springframework.jms.connection.SingleConnectionFactory.closeConnection(SingleConnectionFactory.java:491)
at org.springframework.jms.connection.SingleConnectionFactory.resetConnection(SingleConnectionFactory.java:383)
at org.springframework.jms.connection.CachingConnectionFactory.resetConnection(CachingConnectionFactory.java:199)
at org.springframework.jms.connection.SingleConnectionFactory.onException(SingleConnectionFactory.java:361)
at org.springframework.jms.connection.SingleConnectionFactory$AggregatedExceptionListener.onException(SingleConnectionFactory.java:715)
at com.ibm.msg.client.jms.internal.JmsProviderExceptionListener.run(JmsProviderExceptionListener.java:413)
at com.ibm.msg.client.commonservices.workqueue.WorkQueueItem.runTask(WorkQueueItem.java:319)
at com.ibm.msg.client.commonservices.workqueue.SimpleWorkQueueItem.runItem(SimpleWorkQueueItem.java:99)
at com.ibm.msg.client.commonservices.workqueue.WorkQueueItem.run(WorkQueueItem.java:343)
at com.ibm.msg.client.commonservices.workqueue.WorkQueueManager.runWorkQueueItem(WorkQueueManager.java:312)
at com.ibm.msg.client.commonservices.j2se.workqueue.WorkQueueManagerImplementation$ThreadPoolWorker.run(WorkQueueManagerImplementation.java:1227)
2) A message-handling thread, which happens to be processing a message and, in a catch clause, attempts to close the session/producer/consumer (there's a JMS REPLY_TO handling situation).
"[MuleRuntime].cpuLight.07: CPU_LITE #76c382e9": awaiting notification on [0x00000006d5f6b9c8], holding [0x0000000718d73900]
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
at com.ibm.msg.client.jms.internal.JmsSessionImpl.close(JmsSessionImpl.java:383)
at com.ibm.msg.client.jms.internal.JmsSessionImpl.close(JmsSessionImpl.java:349)
at com.ibm.mq.jms.MQSession.close(MQSession.java:275)
at org.springframework.jms.connection.CachingConnectionFactory$CachedSessionInvocationHandler.physicalClose(CachingConnectionFactory.java:481)
at org.springframework.jms.connection.CachingConnectionFactory$CachedSessionInvocationHandler.invoke(CachingConnectionFactory.java:311)
at com.sun.proxy.$Proxy197.close(Unknown Source)
at org.mule.jms.commons.internal.connection.session.DefaultJmsSession.close(DefaultJmsSession.java:65)
at org.mule.jms.commons.internal.common.JmsCommons.closeQuietly(JmsCommons.java:165)
at org.mule.jms.commons.internal.source.JmsListener.doReply(JmsListener.java:326)
at MORE STUFF BELOW
I have Spring with Quartz jobs (clustered) running at a periodic interval (1 minute). When the server starts everything seems fine, but after some time the jobs stop getting triggered. Restarting the server makes the jobs run again, but the issue re-occurs after some time.
I suspected a thread-exhaustion issue, and from a thread dump I noticed that all my Quartz threads (10) are in TIMED_WAITING.
Config:
org.quartz.threadPool.class = org.quartz.simpl.SimpleThreadPool
org.quartz.threadPool.threadCount = 10
org.quartz.threadPool.threadPriority = 5
Thread dump:
quartzScheduler_Worker-10 - priority:10 - threadId:0x00007f8ae534d800 - nativeId:0x13c78 - state:TIMED_WAITING stackTrace:
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x000000066cd73220> (a java.lang.Object)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:568)
- locked <0x000000066cd73220> (a java.lang.Object)
Using Quartz 2.2.1 (I doubt it is a version-specific issue).
I verified from the logs that there are no DB connectivity issues.
Kindly help in diagnosing the problem. Is there a possibility that I have maxed out system resources (number of threads)? But my jobs are synchronous and exit only when all their child threads have completed their task, and I also use the @DisallowConcurrentExecution annotation.
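For reference, that annotation goes on the job class itself; a minimal sketch (the class name and body are made up):

import org.quartz.DisallowConcurrentExecution;
import org.quartz.Job;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;

// Quartz will not run two instances of this JobDetail concurrently,
// but a run that overshoots its 1-minute interval still produces misfires.
@DisallowConcurrentExecution
public class SyncJob implements Job {
    @Override
    public void execute(JobExecutionContext context) throws JobExecutionException {
        // spawn child threads and wait for them to finish, as described above
    }
}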
The root cause was that we had too many misfires of our Quartz job. Quartz kicks in every minute, but the job doesn't really complete within that minute, so runs pile up as misfires and Quartz tries to execute those first.
During this process there's an update of the misfires that takes a long time, which causes Quartz to get stuck. This is evident from the thread dump, where all our Quartz threads are in TIMED_WAITING state, as below:
quartzScheduler_Worker-10 - priority:10 - threadId:0x00007f8ae534d800 - nativeId:0x13c78 - state:TIMED_WAITING
stackTrace:
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x000000066cd73220> (a java.lang.Object)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:568)
- locked <0x000000066cd73220> (a java.lang.Object)
Refer : https://jira.terracotta.org/jira/si/jira.issueviews:issue-html/QTZ-357/QTZ-357.html
For our use case misfires can be ignored and picked up on the next run, so I changed the misfire instruction to ignore, as below:
<property name="misfireInstructionName" value="MISFIRE_INSTRUCTION_IGNORE_MISFIRE_POLICY" />
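If the trigger is built in code rather than through the Spring bean property above, the equivalent would be roughly this sketch (the trigger name and cron expression are illustrative):

import org.quartz.CronScheduleBuilder;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;

// MISFIRE_INSTRUCTION_IGNORE_MISFIRE_POLICY: missed firings are skipped
// and the trigger simply fires again at its next scheduled time.
Trigger trigger = TriggerBuilder.newTrigger()
        .withIdentity("everyMinuteJobTrigger")
        .withSchedule(CronScheduleBuilder.cronSchedule("0 * * * * ?")
                .withMisfireHandlingInstructionIgnoreMisfires())
        .build();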
I have a web app served by Jetty + MySQL. I'm running into an issue where my database connection pool gets exhausted and all threads start blocking while waiting for a connection. I've tried two connection pool libraries: (1) BoneCP (2) HikariCP. Both exhibit the same behavior with my app.
I've taken several thread dumps when I see this state, and all the blocked threads look like this (not picking on BoneCP, I'm sure it's something on my end):
"qtp1218743501-131" prio=10 tid=0x00007fb858295800 nid=0x669b waiting on condition [0x00007fb8cd5d3000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000763f42d20> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
at com.jolbox.bonecp.DefaultConnectionStrategy.getConnectionInternal(DefaultConnectionStrategy.java:82)
at com.jolbox.bonecp.AbstractConnectionStrategy.getConnection(AbstractConnectionStrategy.java:90)
at com.jolbox.bonecp.BoneCP.getConnection(BoneCP.java:553)
at com.me.Foo.start(Foo.java:30)
...
I'm not sure where to go from here. I was expecting to see stack traces where my code was stuck in some lengthy operation, not waiting for a connection. For example, if my code looks like this:
public class Foo {
    public void start() {
        Connection conn = threadPool.getConnection();
        work(conn);
        conn.close();   // never reached if work() throws
    }

    public void work(Connection conn) {
        // .. something lengthy, like scanning every row in the database etc. ..
    }
}
I would expect one of the threads above to have a stack trace that shows it working away in the work() method:
...
at com.me.mycode.Foo.work()
at com.me.mycode.Foo.start()
but instead they're just all waiting for a connection:
...
at com.jolbox.bonecp.BoneCP.getConnection() // ?
at com.me.mycode.Foo.work()
at com.me.mycode.Foo.start()
Any thoughts on how to continue debugging would be great.
Some other background: the app operates normally for about 45 minutes; memory and thread dumps show nothing out of the ordinary. Then the condition is triggered and the thread count spikes. I started thinking it might be some combination of SQL statements the app is trying to perform that turns into some sort of lock on the MySQL side, but again I would expect some of the threads in the stack traces above to show that they're in that part of the code.
The thread dumps were taken using visualvm.
Thanks
Take advantage of the configuration options of the connection pool (see BoneCPConfig / HikariCPConfig). First of all, set a connection timeout (HikariCP connectionTimeout) and a leak-detection timeout (HikariCP leakDetectionThreshold; I could not find a counterpart in BoneCP). There may be more configuration options that dump stack traces when something is not quite right.
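A minimal HikariCP sketch of those two settings (the URL, pool size and timeout values are illustrative, not recommendations):

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:mysql://localhost:3306/mydb");
config.setMaximumPoolSize(20);
// fail getConnection() after 30 s instead of blocking forever
config.setConnectionTimeout(30_000);
// log a stack trace if a connection is held for more than 60 s
config.setLeakDetectionThreshold(60_000);
HikariDataSource ds = new HikariDataSource(config);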
My guess is that your application does not always return a connection to the pool, and after 45 minutes there are no connections left in the pool (so it blocks forever trying to get one). Treat a connection like opening/closing a file, i.e. always use try/finally:
public void start() {
    Connection conn = null;
    try {
        work(conn = dbPool.getConnection());
    } finally {
        if (conn != null) {
            conn.close();   // always returned to the pool, even if work() throws
        }
    }
}
Finally, both connection pools have options to enable JMX monitoring. You can use this to watch for strange behavior in the pool.
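For HikariCP, for example, JMX registration is a single flag on the config object (a sketch, reusing the config from above):

// expose the pool's MBeans (active/idle/waiting counts) to JConsole or VisualVM
config.setRegisterMbeans(true);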
I question the whole design.
If you have a blocking wait in multithreaded network IO, you need a better implementation of the connection handling.
I suggest you take a look at non-blocking IO (java.nio, the channels package), or make your locks more granular.
I have some worker threads running, with MySQL and mysql-connector-java-5.1.20.
When I kill a SQL statement (using kill "connection id" from the mysql client), the Java thread hangs, even though it should throw an exception.
jstack prints:
"quartzBase$child#45e3dd3c_Worker-3" prio=10 tid=0x00007f960004c800 nid=0x713d runnable [0x00007f943b3a0000]
java.lang.Thread.State: RUNNABLE
at java.net.PlainSocketImpl.socketAvailable(Native Method)
at java.net.PlainSocketImpl.available(PlainSocketImpl.java:472)
- locked <0x00007f9e11fe13a8> (a java.net.SocksSocketImpl)
at java.net.SocketInputStream.available(SocketInputStream.java:217)
at com.mysql.jdbc.util.ReadAheadInputStream.available(ReadAheadInputStream.java:232)
at com.mysql.jdbc.MysqlIO.clearInputStream(MysqlIO.java:981)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2426)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2651)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2677)
- locked <0x00007f9e17de2b50> (a com.mysql.jdbc.JDBC4Connection)
at com.mysql.jdbc.ConnectionImpl.rollbackNoChecks(ConnectionImpl.java:4863)
at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:4749)
- locked <0x00007f9e17de2b50> (a com.mysql.jdbc.JDBC4Connection)
at org.apache.commons.dbcp.DelegatingConnection.rollback(DelegatingConnection.java:368)
at org.apache.commons.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.rollback(PoolingDataSource.java:323)
at org.hibernate.transaction.JDBCTransaction.rollbackAndResetAutoCommit(JDBCTransaction.java:217)
at org.hibernate.transaction.JDBCTransaction.rollback(JDBCTransaction.java:196)
at org.springframework.orm.hibernate3.HibernateTransactionManager.doRollback(HibernateTransactionManager.java:676)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.processRollback(AbstractPlatformTransactionManager.java:845)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.rollback(AbstractPlatformTransactionManager.java:822)
at org.springframework.transaction.interceptor.TransactionAspectSupport.completeTransactionAfterThrowing(TransactionAspectSupport.java:430)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:112)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
at $Proxy1021.process(Unknown Source)
Using jvmtop, I saw this:
JvmTop 0.8.0 alpha - 22:48:37, amd64, 24 cpus, Linux 2.6.32-35, load avg 11.53
http://code.google.com/p/jvmtop
Profiling PID 27403: com.caucho.server.resin.Resin --root-dir
36.41% ( 0.22s) com.mysql.jdbc.util.ReadAheadInputStream.available()
33.42% ( 0.20s) ....opensymphony.xwork2.conversion.impl.DefaultTypeConve()
30.17% ( 0.18s) com.mysql.jdbc.util.ReadAheadInputStream.fill()
0.00% ( 0.00s) com.rabbitmq.client.impl.Frame.readFrom()
The worker threads never accept new tasks afterwards.
Any idea?
According to the MySQL documentation, "kill connection thread_id" should terminate the connection associated with the given thread_id. But it looks like that is not happening (in which case the Java thread will wait for an answer forever). Maybe you can verify with a network tool (e.g. netstat) whether the connection is actually closed.
I've run into hanging MySQL connections before and had to resort to the socketTimeout JDBC connection parameter (but be careful: socketTimeout needs to be larger than the time it takes to complete the longest-running query). You could also try setting a query timeout on the prepared statement.
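A sketch of both options (host, schema and the timeout values are made up; socketTimeout is in milliseconds, setQueryTimeout in seconds):

// Connector/J URL parameter: give up on a dead socket after 5 minutes
String url = "jdbc:mysql://dbhost:3306/mydb?socketTimeout=300000&connectTimeout=10000";

// per-statement alternative: let the driver cancel the statement after 60 seconds
PreparedStatement ps = connection.prepareStatement("SELECT ...");
ps.setQueryTimeout(60);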
I have a lot of clients, each sending about 1000 requests per second to the server. The server's CPU quickly rises to 600% (8 cores) and stays there. When I print the threads with jstack, I find SelectorImpl in the BLOCKED state. The records are as follows:
nioEventLoopGroup-4-1 prio=10 tid=0x00007fef28001800 nid=0x1dbf waiting for monitor entry [0x00007fef9eec7000]
java.lang.Thread.State: BLOCKED (on object monitor)
at sun.nio.ch.EPollSelectorImpl.doSelect(Unknown Source)
- waiting to lock <0x00000000c01f1af8> (a java.lang.Object)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)
- locked <0x00000000c01d9420> (a io.netty.channel.nio.SelectedSelectionKeySet)
- locked <0x00000000c01f1948> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000c01d92c0> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(Unknown Source)
at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:635)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:319)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
at java.lang.Thread.run(Unknown Source)
Does the high CPU have something to do with this? Another problem is that when I connect a lot of clients, some clients fail to connect; the error is as follows:
"nioEventLoopGroup-4-1" prio=10 tid=0x00007fef28001800 nid=0x1dbf waiting for monitor entry [0x00007fef9eec7000]
java.lang.Thread.State: BLOCKED (on object monitor)
at sun.nio.ch.EPollSelectorImpl.doSelect(Unknown Source)
- waiting to lock <0x00000000c01f1af8> (a java.lang.Object)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)
- locked <0x00000000c01d9420> (a io.netty.channel.nio.SelectedSelectionKeySet)
- locked <0x00000000c01f1948> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000c01d92c0> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(Unknown Source)
at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:635)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:319)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
at java.lang.Thread.run(Unknown Source)
The clients are created using a thread pool, and a connection timeout has been set, but why are there frequent connection timeouts? Is the server the cause?
public void run() {
    System.out.println(tnum + " connecting...");
    try {
        Bootstrap bootstrap = new Bootstrap();
        bootstrap.group(group)
                 .channel(NioSocketChannel.class)
                 .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 30000)
                 .handler(loadClientInitializer);
        // Start the connection attempt.
        ChannelFuture future = bootstrap.connect(host, port);
        future.channel().attr(AttrNum).set(tnum);
        future.sync();
        if (future.isSuccess()) {
            System.out.println(tnum + " login success.");
            goSend(tnum, future.channel());
        } else {
            System.out.println(tnum + " login failed.");
        }
    } catch (Exception e) {
        XLog.error(e);
    } finally {
        // group.shutdownGracefully();
    }
}
Does the high CPU have something to do with this?
It might. I'd diagnose this problem the following way (on a Linux box):
Find threads which are eating CPU
Using pidstat, I'd find which threads are eating CPU and in which mode (user/kernel) the time is spent:
$ pidstat -p [java-process-pid] -tu 1 | awk '$9 > 50'
This command shows threads eating at least 50% of CPU time. You can inspect what those threads are doing using jstack, VisualVM or Java Flight Recorder.
If the CPU-hungry threads and the BLOCKED threads are the same ones, the CPU usage probably has something to do with contention.
Find reason for connection timeout
Basically, you will get a connection timeout if the two operating systems can't finish the TCP handshake in the given time. Several possible reasons:
Network link saturation. Can be diagnosed using sar -n DEV 1 and comparing the rxkB/s and txkB/s columns to your link's maximum throughput.
The server (Netty) doesn't get to its accept() call within the given timeout. That thread can be BLOCKED or starved of CPU time. You can find which threads are calling accept() (and therefore finishing the TCP handshake) using strace -f -e trace=accept -p [java-pid], and then check for possible reasons using pidstat/jstack.
You can also find the number of connection requests received but not yet confirmed with netstat -an | grep -c SYN_RECV
If you can elaborate more on what your Netty application is doing, that could be helpful. Regardless, please make sure you are closing the channels. Note from the Channel javadoc:
It is important to call close() or close(ChannelPromise) to release all resources once you are done with the Channel. This ensures all resources are released in a proper way, i.e. filehandles
If you are closing the channels, then the problem may be with the logic itself (running into infinite loops or similar), which may explain the high CPU.
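A minimal sketch of releasing those resources in the client above (it reuses the bootstrap and group from the posted run() method; whether this matches your real message flow is an assumption):

Channel channel = bootstrap.connect(host, port).sync().channel();
try {
    // ... send requests on the channel ...
} finally {
    // release the channel's resources (file handles, buffers)
    channel.close().sync();
}
// once all clients are finished, also release the event loop threads
group.shutdownGracefully();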