I have an issue where, even though the query has finished in Oracle, the Java (Spring) thread is still waiting on, or reading from, the DB connection. The query runs for more than 30 minutes in the Oracle DB; after it ended, I saw that the web container thread in Java is still waiting on the database.
All I see in the thread dump is "- locked <33019fca> (a oracle.jdbc.driver.T4CConnection)", but I can't see any other thread related to 33019fca, and the web container thread is in the RUNNABLE state, which puzzles me.
My question is: what do I need to check? I know the query is the root cause, since it should not run that long, but why is the web container thread still waiting when the query has already ended on the DB side?
Application:
Java 6
OJDBC 6
Oracle DB
Spring
Websphere 8
Java Thread:
"WebContainer : 11" - Thread t#191
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:128)
at oracle.net.ns.Packet.receive(Packet.java:311)
at oracle.net.ns.DataPacket.receive(DataPacket.java:105)
at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:305)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:249)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:171)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:89)
at oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:123)
at oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:79)
at oracle.jdbc.driver.T4CMAREngineStream.unmarshalUB1(T4CMAREngineStream.java:429)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:397)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:257)
at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:587)
at oracle.jdbc.driver.T4CCallableStatement.doOall8(T4CCallableStatement.java:220)
at oracle.jdbc.driver.T4CCallableStatement.doOall8(T4CCallableStatement.java:48)
at oracle.jdbc.driver.T4CCallableStatement.executeForRows(T4CCallableStatement.java:938)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1150)
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:4798)
at oracle.jdbc.driver.OraclePreparedStatement.execute(OraclePreparedStatement.java:4901)
**- locked <33019fca> (a oracle.jdbc.driver.T4CConnection)**
at oracle.jdbc.driver.OracleCallableStatement.execute(OracleCallableStatement.java:5631)
**- locked <33019fca> (a oracle.jdbc.driver.T4CConnection)**
at oracle.jdbc.driver.OraclePreparedStatementWrapper.execute(OraclePreparedStatementWrapper.java:1385)
at com.ibm.ws.rsadapter.jdbc.WSJdbcPreparedStatement.pmiExecute(WSJdbcPreparedStatement.java:956)
at com.ibm.ws.rsadapter.jdbc.WSJdbcPreparedStatement.execute(WSJdbcPreparedStatement.java:623)
at com.xx.xx.xxxx.dao.xxDataQueryJdbcDAO.getData(xxDataQueryJdbcDAO.java:346)
Locked ownable synchronizers:
- None
From Fiddler's perspective, the client is still waiting for the response.
From the Oracle side, I can no longer see the query running, and all connections related to the schema are inactive.
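A socket read that never returns usually means the thread will wait until the TCP connection itself dies, so one common safeguard (separate from fixing the query itself) is to bound how long the driver may block. Below is a minimal sketch for the Oracle thin driver; the connection URL, credentials, table name, and the timeout values are assumptions for illustration only:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Properties;

public class BoundedQueryExample {
    public static void main(String[] args) throws Exception {
        Class.forName("oracle.jdbc.OracleDriver"); // explicit driver load, safe on Java 6

        Properties props = new Properties();
        props.setProperty("user", "appuser");        // placeholder credentials
        props.setProperty("password", "secret");
        // Abort the blocking socket read if no data arrives within 60s (value in milliseconds).
        props.setProperty("oracle.jdbc.ReadTimeout", "60000");

        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/ORCL", props); // hypothetical URL
        try {
            PreparedStatement ps = conn.prepareStatement("SELECT * FROM some_table"); // placeholder query
            ps.setQueryTimeout(120); // standard JDBC per-statement timeout, in seconds
            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                // process rows
            }
        } finally {
            conn.close();
        }
    }
}

With a read timeout or query timeout in place, a hang like the one above surfaces as a SQLException rather than a thread stuck in socketRead0 indefinitely.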
Related
I have Spring with Quartz jobs (clustered) running at a periodic interval (1 minute). When the server starts everything seems fine, but the jobs stop getting triggered after some time. Restarting the server makes the jobs run again, but the issue re-occurs after a while.
I suspected a thread exhaustion issue, and from a thread dump I noticed that all my Quartz threads (10) are in TIMED_WAITING.
Config:
org.quartz.threadPool.class = org.quartz.simpl.SimpleThreadPool
org.quartz.threadPool.threadCount = 10
org.quartz.threadPool.threadPriority = 5
Thread dump:
quartzScheduler_Worker-10 - priority:10 - threadId:0x00007f8ae534d800 - nativeId:0x13c78 - state:TIMED_WAITING stackTrace:
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x000000066cd73220> (a java.lang.Object)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:568)
- locked <0x000000066cd73220> (a java.lang.Object)
Using Quartz 2.2.1 (I doubt it is a version-specific issue).
I verified from the logs that there are no DB connectivity issues.
Kindly help me diagnose the problem. Is it possible that I have maxed out system resources (number of threads)? But my jobs are synchronous and exit only when all of their child threads have completed their tasks, and I also have the @DisallowConcurrentExecution annotation.
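For reference, this is roughly how that annotation is applied; the job class name and body here are placeholders, not the actual job:

import org.quartz.DisallowConcurrentExecution;
import org.quartz.Job;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;

// Quartz will not fire a second instance of this job (same JobKey)
// while a previous execution is still running.
@DisallowConcurrentExecution
public class SyncReportJob implements Job {   // hypothetical job name

    public void execute(JobExecutionContext context) throws JobExecutionException {
        // synchronous work; returns only after all child threads have finished
    }
}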
The root cause was that we had too many misfires of our Quartz job. Quartz kicks in every 1 minute, but the job does not really complete within that minute, so the missed executions pile up as misfires and Quartz tries to execute them first.
During this process there is an update of the misfired triggers that takes a lot of time, which causes Quartz to get stuck. This is evident from the thread dump, where all our Quartz threads are in TIMED_WAITING state, as below:
quartzScheduler_Worker-10 - priority:10 - threadId:0x00007f8ae534d800 - nativeId:0x13c78 - state:TIMED_WAITING
stackTrace:
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x000000066cd73220> (a java.lang.Object)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:568)
- locked <0x000000066cd73220> (a java.lang.Object)
Refer : https://jira.terracotta.org/jira/si/jira.issueviews:issue-html/QTZ-357/QTZ-357.html
For our use case misfires can be ignored and simply picked up on the next run. Hence I changed the misfire instruction to ignore, as below:
<property name="misfireInstructionName" value="MISFIRE_INSTRUCTION_IGNORE_MISFIRE_POLICY" />
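For completeness, the equivalent when building the trigger with the Quartz 2.x fluent API looks roughly like this; the trigger name, group, and the 1-minute interval are assumptions for illustration:

import org.quartz.SimpleScheduleBuilder;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;

public class TriggerConfig {
    public static Trigger everyMinuteIgnoringMisfires() {
        return TriggerBuilder.newTrigger()
                .withIdentity("dataSyncTrigger", "batch")   // hypothetical name/group
                .withSchedule(SimpleScheduleBuilder.simpleSchedule()
                        .withIntervalInMinutes(1)
                        .repeatForever()
                        // MISFIRE_INSTRUCTION_IGNORE_MISFIRE_POLICY: skip the
                        // backlog of missed firings and keep the normal schedule.
                        .withMisfireHandlingInstructionIgnoreMisfires())
                .build();
    }
}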
Calls made from Java to SQL Server using the jTDS driver are stuck. We have checked that there is no locking and there are no long-running queries on the SQL Server side.
Following is the stack trace from the Java thread dump. Does anyone have any idea?
"core-CommandInvoker-thread-7133" prio=5 tid=7362 #### RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at net.sourceforge.jtds.jdbc.SharedSocket.readPacket(SharedSocket.java:850)
at net.sourceforge.jtds.jdbc.SharedSocket.getNetPacket(SharedSocket.java:731)
at net.sourceforge.jtds.jdbc.ResponseStream.getPacket(ResponseStream.java:477)
at net.sourceforge.jtds.jdbc.ResponseStream.read(ResponseStream.java:146)
at net.sourceforge.jtds.jdbc.ResponseStream.read(ResponseStream.java:128)
at net.sourceforge.jtds.jdbc.TdsData.readData(TdsData.java:767)
at net.sourceforge.jtds.jdbc.TdsCore.tdsRowToken(TdsCore.java:3172)
at net.sourceforge.jtds.jdbc.TdsCore.nextToken(TdsCore.java:2430)
at net.sourceforge.jtds.jdbc.TdsCore.getNextRow(TdsCore.java:802)
at net.sourceforge.jtds.jdbc.JtdsResultSet.next(JtdsResultSet.java:608)
at org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
at org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
I had the same problem today on Java 8 / JDBI / PostgreSQL / HikariCP.
Mysteriously, adding an ORDER BY to the query solved the problem.
To be more specific, the result set needs to be in a deterministic order. If there are any ties in the ORDER BY, it will get stuck on socketRead0.
Here is an old report that led me to this solution.
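In other words, if the ORDER BY can produce ties, appending a unique column (for example the primary key) makes the ordering deterministic. A sketch of the kind of change, with the table and column names as placeholders:

public final class EventQueries {
    // Ties on "created_at" leave the row order unspecified between runs.
    static final String AMBIGUOUS_ORDER =
            "SELECT id, payload FROM events ORDER BY created_at";

    // Adding the primary key as a tie-breaker makes the ordering deterministic.
    static final String DETERMINISTIC_ORDER =
            "SELECT id, payload FROM events ORDER BY created_at, id";
}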
I got this kind of thread dump on Tomcat; all threads are in the WAITING state, so the application slows down.
Please suggest a solution.
I am using Tomcat 7 and Java 7.
"ImageLoadWorker(653)" prio=5 tid=0x2089 nid=0x829 in Object.wait() - stats: cpu=0 blk=-1 wait=-1
java.lang.Thread.State: WAITING
at java.lang.Object.wait(Native Method)
- waiting on org.xhtmlrenderer.swing.ImageLoadQueue#4651e7d2
at java.lang.Object.wait(Object.java:503)
at org.xhtmlrenderer.swing.ImageLoadQueue.getTask(ImageLoadQueue.java:83)
at org.xhtmlrenderer.swing.ImageLoadWorker.run(ImageLoadWorker.java:53)
Locked synchronizers: count = 0
There is a thread leak in this class of the Flying Saucer jar. I found the answer to this problem at the URL below.
https://technotailor.wordpress.com/2017/04/17/thread-leak-with-imageloadworker-in-flying-saucer-jar/
I have some worker threads running, with MySQL and mysql-connector-java-5.1.20.
When I kill a SQL statement (using kill "connection id" from the mysql client), the Java thread hangs instead of throwing an exception.
jstack prints:
"quartzBase$child#45e3dd3c_Worker-3" prio=10 tid=0x00007f960004c800 nid=0x713d runnable [0x00007f943b3a0000]
java.lang.Thread.State: RUNNABLE
at java.net.PlainSocketImpl.socketAvailable(Native Method)
at java.net.PlainSocketImpl.available(PlainSocketImpl.java:472)
- locked <0x00007f9e11fe13a8> (a java.net.SocksSocketImpl)
at java.net.SocketInputStream.available(SocketInputStream.java:217)
at com.mysql.jdbc.util.ReadAheadInputStream.available(ReadAheadInputStream.java:232)
at com.mysql.jdbc.MysqlIO.clearInputStream(MysqlIO.java:981)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2426)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2651)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2677)
- locked <0x00007f9e17de2b50> (a com.mysql.jdbc.JDBC4Connection)
at com.mysql.jdbc.ConnectionImpl.rollbackNoChecks(ConnectionImpl.java:4863)
at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:4749)
- locked <0x00007f9e17de2b50> (a com.mysql.jdbc.JDBC4Connection)
at org.apache.commons.dbcp.DelegatingConnection.rollback(DelegatingConnection.java:368)
at org.apache.commons.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.rollback(PoolingDataSource.java:323)
at org.hibernate.transaction.JDBCTransaction.rollbackAndResetAutoCommit(JDBCTransaction.java:217)
at org.hibernate.transaction.JDBCTransaction.rollback(JDBCTransaction.java:196)
at org.springframework.orm.hibernate3.HibernateTransactionManager.doRollback(HibernateTransactionManager.java:676)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.processRollback(AbstractPlatformTransactionManager.java:845)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.rollback(AbstractPlatformTransactionManager.java:822)
at org.springframework.transaction.interceptor.TransactionAspectSupport.completeTransactionAfterThrowing(TransactionAspectSupport.java:430)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:112)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
at $Proxy1021.process(Unknown Source)
Using jvmtop, I saw this:
JvmTop 0.8.0 alpha - 22:48:37, amd64, 24 cpus, Linux 2.6.32-35, load avg 11.53
http://code.google.com/p/jvmtop
Profiling PID 27403: com.caucho.server.resin.Resin --root-dir
36.41% ( 0.22s) com.mysql.jdbc.util.ReadAheadInputStream.available()
33.42% ( 0.20s) ....opensymphony.xwork2.conversion.impl.DefaultTypeConve()
30.17% ( 0.18s) com.mysql.jdbc.util.ReadAheadInputStream.fill()
0.00% ( 0.00s) com.rabbitmq.client.impl.Frame.readFrom()
The worker threads will never accept new tasks.
Any ideas?
According to the MySQL documentation, "KILL CONNECTION thread_id" should terminate the connection associated with the given thread_id. But it looks like that is not happening, in which case the Java thread will wait for an answer forever. Maybe you can verify that the connection is actually closed using a network tool (e.g. netstat).
I've run into hanging MySQL connections before and had to resort to the socketTimeout JDBC connection parameter (but be careful: socketTimeout needs to be larger than the time it takes to complete the longest-running query). You could also try setting a query timeout on the prepared statement.
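A minimal sketch of both options for Connector/J; the host, schema, credentials, query, and the timeout values are assumptions and need to be tuned to your workload:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class MySqlTimeoutExample {
    public static void main(String[] args) throws Exception {
        Class.forName("com.mysql.jdbc.Driver"); // driver class used by Connector/J 5.1

        // socketTimeout (ms) bounds any blocking socket read; it must be larger
        // than the longest legitimate query, or that query will be cut off.
        String url = "jdbc:mysql://dbhost:3306/app"      // hypothetical host/schema
                + "?connectTimeout=10000&socketTimeout=300000";

        Connection conn = DriverManager.getConnection(url, "appuser", "secret"); // placeholder credentials
        try {
            PreparedStatement ps = conn.prepareStatement("SELECT 1"); // placeholder query
            ps.setQueryTimeout(120); // per-statement timeout, in seconds
            ps.executeQuery();
        } finally {
            conn.close();
        }
    }
}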
I have a lot of clients, each sending about 1000 requests per second to the server. The server's CPU quickly rose to 600% (8 cores) and stays at that level. When I print the process's threads with jstack, I find SelectorImpl in the BLOCKED state. The records are as follows:
nioEventLoopGroup-4-1 prio=10 tid=0x00007fef28001800 nid=0x1dbf waiting for monitor entry [0x00007fef9eec7000]
java.lang.Thread.State: BLOCKED (on object monitor)
at sun.nio.ch.EPollSelectorImpl.doSelect(Unknown Source)
- waiting to lock <0x00000000c01f1af8> (a java.lang.Object)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)
- locked <0x00000000c01d9420> (a io.netty.channel.nio.SelectedSelectionKeySet)
- locked <0x00000000c01f1948> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000c01d92c0> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(Unknown Source)
at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:635)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:319)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
at java.lang.Thread.run(Unknown Source)
Does the high CPU have something to do with this? Another problem is that when I connect a lot of clients, I find some clients cannot connect; the error is as follows:
"nioEventLoopGroup-4-1" prio=10 tid=0x00007fef28001800 nid=0x1dbf waiting for monitor entry [0x00007fef9eec7000]
java.lang.Thread.State: BLOCKED (on object monitor)
at sun.nio.ch.EPollSelectorImpl.doSelect(Unknown Source)
- waiting to lock <0x00000000c01f1af8> (a java.lang.Object)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)
- locked <0x00000000c01d9420> (a io.netty.channel.nio.SelectedSelectionKeySet)
- locked <0x00000000c01f1948> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000c01d92c0> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(Unknown Source)
at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:635)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:319)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
at java.lang.Thread.run(Unknown Source)
The clients are created using a thread pool, and a connection timeout has been set, so why are there frequent connection timeouts? Is the server the cause?
public void run() {
    System.out.println(tnum + " connecting...");
    try {
        Bootstrap bootstrap = new Bootstrap();
        bootstrap.group(group)
                 .channel(NioSocketChannel.class)
                 .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 30000)
                 .handler(loadClientInitializer);
        // Start the connection attempt.
        ChannelFuture future = bootstrap.connect(host, port);
        future.channel().attr(AttrNum).set(tnum);
        future.sync();
        if (future.isSuccess()) {
            System.out.println(tnum + " login success.");
            goSend(tnum, future.channel());
        } else {
            System.out.println(tnum + " login failed.");
        }
    } catch (Exception e) {
        XLog.error(e);
    } finally {
        // group.shutdownGracefully();
    }
}
Does the high CPU have something to do with this?
It might be. I'd diagnose this problem the following way (on a Linux box):
Find threads which are eating CPU
Using pidstat I'd find which threads are eating CPU and in what mode (user/kernel) time is spent.
$ pidstat -p [java-process-pid] -tu 1 | awk '$9 > 50'
This command shows threads eating at least 50% of CPU time. You can inspect what those threads are doing using jstack, VisualVM or Java Flight Recorder.
If the CPU-hungry threads and the BLOCKED threads are the same ones, the CPU usage probably has something to do with contention.
Find reason for connection timeout
Basically, you will get a connection timeout if the two OSes can't finish the TCP handshake in the given time. Several possible reasons for this:
network link saturation. Can be diagnosed using sar -n DEV 1 and comparing the rxkB/s and txkB/s columns to your link's maximum throughput.
the server (Netty) doesn't respond with an accept() call within the given timeout. That thread can be BLOCKED or starving for CPU time. You can find which threads are calling accept() (and therefore finishing the TCP handshake) using strace -f -e trace=accept -p [java-pid], and then check for possible reasons using pidstat/jstack.
Also, you can find the number of connection requests received but not yet confirmed with netstat -an | grep -c SYN_RECV
If you can elaborate more on what your Netty application is doing, it could be helpful. Regardless, please make sure you are closing the channels. Note this from the Channel javadoc:
It is important to call close() or close(ChannelPromise) to release all resources once you are done with the Channel. This ensures all resources are released in a proper way, i.e. filehandles
If you are closing the channels, then the problem may be with the logic itself - running into infinite loops or similar - which may explain the high CPU.
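As a minimal sketch of what "closing the channel" means here, based on the client code in the question (the goSend method and its behaviour are assumptions that mirror the question's code):

import io.netty.bootstrap.Bootstrap;
import io.netty.channel.Channel;
import io.netty.channel.ChannelFuture;

public class ClientRunner {
    // Connect, do the work, and always release the channel afterwards.
    static void connectAndSend(Bootstrap bootstrap, String host, int port, int tnum)
            throws InterruptedException {
        ChannelFuture future = bootstrap.connect(host, port).sync();
        Channel channel = future.channel();
        try {
            goSend(tnum, channel);      // assumed to block until all messages are sent
        } finally {
            channel.close().sync();     // releases the socket and its selector registration
        }
    }

    static void goSend(int tnum, Channel channel) {
        // placeholder for the question's goSend(...) logic
    }
}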