Thread dump showing a thread not releasing a lock - java

As shown in the thread dump below, the thread qtp336276309-556300 acquired the lock on org.apache.log4j.spi.RootLogger and did not release it.
This causes other threads to be BLOCKED.
This is sometimes accompanied by CPU utilization on the host machine steadily increasing over a week, from 5% to 30%.
I would like to know whether threads in the BLOCKED state could cause the CPU spike. If yes, how should I approach fixing the problem?
If not, what else should be reviewed to resolve the rising CPU utilization?
The thread dump is as follows:
"qtp336276309-561036" #561036 prio=5 os_prio=0 tid=0x00007efe80576800 nid=0x20a7 waiting for monitor entry [0x00007efe19fbd000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.log4j.Category.callAppenders(Category.java:204)
- waiting to lock <0x00000000e02333c0> (a org.apache.log4j.spi.RootLogger)
at org.apache.log4j.Category.forcedLog(Category.java:391)
at org.apache.log4j.Category.log(Category.java:856)
at org.slf4j.impl.Log4jLoggerAdapter.log(Log4jLoggerAdapter.java:601)
at org.slf4j.bridge.SLF4JBridgeHandler.callLocationAwareLogger(SLF4JBridgeHandler.java:224)
at org.slf4j.bridge.SLF4JBridgeHandler.publish(SLF4JBridgeHandler.java:301)
at java.util.logging.Logger.log(Logger.java:738)
at java.util.logging.Logger.doLog(Logger.java:765)
at java.util.logging.Logger.log(Logger.java:788)
at java.util.logging.Logger.info(Logger.java:1490)
at org.apache.mesos.chronos.scheduler.api.TaskManagementResource.updateStatus(TaskManagementResource.scala:43)
"qtp336276309-561035" #561035 prio=5 os_prio=0 tid=0x00007efe80193000 nid=0x20a6 waiting for monitor entry [0x00007efe248f4000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.log4j.Category.callAppenders(Category.java:204)
- waiting to lock <0x00000000e02333c0> (a org.apache.log4j.spi.RootLogger)
at org.apache.log4j.Category.forcedLog(Category.java:391)
at org.apache.log4j.Category.info(Category.java:666)
at mesosphere.chaos.http.ChaosRequestLog.write(ChaosRequestLog.scala:15)
at org.eclipse.jetty.server.NCSARequestLog.log(NCSARequestLog.java:591)
at org.eclipse.jetty.server.handler.RequestLogHandler.handle(RequestLogHandler.java:92)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:370)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
at org.eclipse.jetty.io.nio.SslConnection.handle(SslConnection.java:196)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:696)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:53)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None
"Thread-328668" #561032 prio=5 os_prio=0 tid=0x00007efe50052000 nid=0x62 waiting for monitor entry [0x00007efe6b0cb000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.log4j.Category.callAppenders(Category.java:204)
- waiting to lock <0x00000000e02333c0> (a org.apache.log4j.spi.RootLogger)
at org.apache.log4j.Category.forcedLog(Category.java:391)
at org.apache.log4j.Category.log(Category.java:856)
at org.slf4j.impl.Log4jLoggerAdapter.log(Log4jLoggerAdapter.java:601)
at org.slf4j.bridge.SLF4JBridgeHandler.callLocationAwareLogger(SLF4JBridgeHandler.java:224)
at org.slf4j.bridge.SLF4JBridgeHandler.publish(SLF4JBridgeHandler.java:301)
at java.util.logging.Logger.log(Logger.java:738)
at java.util.logging.Logger.doLog(Logger.java:765)
at java.util.logging.Logger.log(Logger.java:788)
at java.util.logging.Logger.info(Logger.java:1490)
at org.apache.mesos.chronos.scheduler.mesos.MesosJobFramework.statusUpdate(MesosJobFramework.scala:224)
at sun.reflect.GeneratedMethodAccessor89.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.google.inject.internal.DelegatingInvocationHandler.invoke(DelegatingInvocationHandler.java:37)
at com.sun.proxy.$Proxy30.statusUpdate(Unknown Source)
Locked ownable synchronizers:
- None
"qtp336276309-556300" #556300 prio=5 os_prio=0 tid=0x00007efe81654800 nid=0x2047 runnable [0x00007efe1a1c0000]
java.lang.Thread.State: RUNNABLE
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
- locked <0x00000000e0234470> (a java.io.BufferedOutputStream)
at java.io.PrintStream.write(PrintStream.java:480)
- locked <0x00000000e0234450> (a java.io.PrintStream)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
- locked <0x00000000e0234438> (a java.io.OutputStreamWriter)
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
at org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:59)
at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:324)
at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
- locked <0x00000000e0233cc0> (a org.apache.log4j.ConsoleAppender)
at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
at org.apache.log4j.Category.callAppenders(Category.java:206)
- locked <0x00000000e02333c0> (a org.apache.log4j.spi.RootLogger)
at org.apache.log4j.Category.forcedLog(Category.java:391)
at org.apache.log4j.Category.info(Category.java:666)
at mesosphere.chaos.http.ChaosRequestLog.write(ChaosRequestLog.scala:15)
at org.eclipse.jetty.server.NCSARequestLog.log(NCSARequestLog.java:591)
at org.eclipse.jetty.server.handler.RequestLogHandler.handle(RequestLogHandler.java:92)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:370)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
at org.eclipse.jetty.io.nio.SslConnection.handle(SslConnection.java:196)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:696)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:53)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None

As shown in the thread dump above, the thread qtp336276309-556300 acquired the lock on org.apache.log4j.spi.RootLogger and did not release it. This causes other threads to be BLOCKED.
This is normal: only one thread can log to a file at any one time. If multiple threads try to write at once, they have to wait.
This is sometimes accompanied by CPU utilization on the host machine steadily increasing over a week, from 5% to 30%.
While logging is a very common cause of slowness, it usually doesn't get worse over time.
What often causes a problem is logging too much, and you might be logging more over time, which could explain the increase; however, I would check whether something else is causing more activity, i.e. logging might just be the symptom.
I would like to know whether threads in the BLOCKED state could cause the CPU spike.
BLOCKED threads don't use much CPU. While the contention can result in higher latencies, you might not see any increase in CPU usage (in fact it might go down).
If not, what else should be reviewed to resolve the rising CPU utilization?
I would look at what the other threads are doing at the time. I would also try reducing the logging to cut down the amount of "noise", so you can see what the application is doing most of the time.
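If logging contention itself turns out to be the bottleneck, one common mitigation is to put a log4j AsyncAppender in front of the existing appenders, so application threads hand events to a background thread instead of holding the RootLogger lock while a ConsoleAppender flushes (which is what the RUNNABLE thread above is doing). A minimal programmatic sketch for log4j 1.x, assuming you can adjust the logging setup at startup (the buffer size and blocking policy below are illustrative; the same can be expressed in log4j.properties/xml):

import java.util.Enumeration;

import org.apache.log4j.Appender;
import org.apache.log4j.AsyncAppender;
import org.apache.log4j.Logger;

public class MakeRootLoggingAsync {
    public static void main(String[] args) {
        Logger root = Logger.getRootLogger();

        // Wrap whatever appenders are currently configured in an AsyncAppender.
        AsyncAppender async = new AsyncAppender();
        async.setBufferSize(8192);   // events buffered before the policy below kicks in
        async.setBlocking(false);    // drop events instead of blocking callers when the buffer is full

        Enumeration<?> appenders = root.getAllAppenders();
        while (appenders.hasMoreElements()) {
            async.addAppender((Appender) appenders.nextElement());
        }

        root.removeAllAppenders();
        root.addAppender(async);
    }
}

Note the trade-off: with blocking disabled, log events are silently discarded under bursts, which trades log completeness for throughput; with blocking enabled, callers still wait, just behind a larger buffer.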

Related

Request going pending due to threads in WAITING and TIMED_WAITING state

Requests to a Java Spring Boot application go into a pending state as threads are held in the WAITING and TIMED_WAITING states.
Jstack logs:
"qtp886341817-1399" #1399 prio=5 os_prio=0 tid=0x00007f02142ae800 nid=0x22f904 waiting on condition [0x00007f01c3fa8000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000684588e00> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:392)
at org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:656)
at org.eclipse.jetty.util.thread.QueuedThreadPool.access$800(QueuedThreadPool.java:49)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:720)
at java.lang.Thread.run(Thread.java:748)
"threadPoolTaskExecutor-1" #114 prio=5 os_prio=0 tid=0x00007f02140b4800 nid=0x229d78 waiting on condition [0x00007f01c55b2000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000684588e58> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"qtp886341817-717" #717 prio=5 os_prio=0 tid=0x00007f021c102000 nid=0x22c546 in Object.wait() [0x00007f01ee774000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at org.eclipse.paho.client.mqttv3.internal.Token.waitUntilSent(Token.java:248)
- locked <0x0000000689516c80> (a java.lang.Object)
at org.eclipse.paho.client.mqttv3.MqttTopic.publish(MqttTopic.java:117)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:364)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
at java.lang.Thread.run(Thread.java:748)
Details:
In this situation, the application is unable to serve API requests, the number of live threads went up to 250+, and many threads appear to be stuck in a deadlock-like state.
This Spring application is hosted on an AWS t2.medium instance with Xms=1g, Xmx=2g and UseG1GC, and we are using the Jetty server.
The application generally serves long-running APIs; it takes 12 to 60 seconds to respond to some of them.
Questions:
Is there any way to find out how many threads a Spring application/JVM/Jetty server can handle?
How can we tune this application to avoid such a situation (the application becoming non-responsive)?
How can we restrict the APIs before this hang occurs?
Look here:
at org.eclipse.paho.client.mqttv3.internal.Token.waitUntilSent(Token.java:248)
- locked <0x0000000689516c80>
there is a lock held while the thread waits for the MQTT message to be sent. Try making this call asynchronous.
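A hedged sketch of one way to do that: hand the blocking publish to a small executor owned by the application, so the Jetty request thread is released immediately (the class, method and field names below are illustrative, not taken from the original code):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.eclipse.paho.client.mqttv3.MqttException;
import org.eclipse.paho.client.mqttv3.MqttMessage;
import org.eclipse.paho.client.mqttv3.MqttTopic;

public class AsyncMqttPublisher {
    // Single background thread that absorbs the blocking waitUntilSent call.
    private final ExecutorService publishPool = Executors.newSingleThreadExecutor();

    public void publishAsync(MqttTopic topic, byte[] payload) {
        publishPool.submit(() -> {
            try {
                // Still blocks until the message is sent, but on the pool thread,
                // not on the HTTP request thread.
                topic.publish(new MqttMessage(payload));
            } catch (MqttException e) {
                // Decide whether to log, retry, or drop the message here.
                e.printStackTrace();
            }
        });
    }
}

Alternatively, the Paho MqttAsyncClient API publishes without blocking the caller at all, at the cost of handling delivery callbacks yourself.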

Why is java.exe consuming high CPU due to “org.xnio.nio” threads?

We are running WildFly 12 with JDK 1.8. We faced portal downtime / slowness in our web application due to high CPU usage (above 90%) by java.exe. We then took a thread dump using jstack, and most of the high-CPU threads point to "org.xnio.nio".
The portal is running over HTTPS
CPU: 8 cores
Total RAM: 24 GB
Allocated JVM memory: from 1 GB (min) to 5 GB (max)
Occupied JVM memory when the thread dump was taken: 4 GB
The DB is on a separate machine
Below are the thread dump results (high-CPU threads):
"default I/O-1" #95 prio=5 os_prio=0 tid=0x00000000260bf800 nid=0x1988 runnable [0x000000002e49f000]
java.lang.Thread.State: RUNNABLE
at io.undertow.server.protocol.http.HttpReadListener.handleEventWithNoRunningRequest(HttpReadListener.java:147)
at io.undertow.server.protocol.http.HttpReadListener.handleEvent(HttpReadListener.java:136)
at io.undertow.server.protocol.http.HttpReadListener.handleEvent(HttpReadListener.java:59)
at org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:92)
at org.xnio.conduits.ReadReadyHandler$ChannelListenerHandler.readReady(ReadReadyHandler.java:66)
at io.undertow.conduits.ReadTimeoutStreamSourceConduit$2.readReady(ReadTimeoutStreamSourceConduit.java:89)
at io.undertow.protocols.ssl.SslConduit$SslReadReadyHandler.readReady(SslConduit.java:1145)
at org.xnio.nio.NioSocketConduit.handleReady(NioSocketConduit.java:89)
at org.xnio.nio.WorkerThread.run(WorkerThread.java:591)
"default I/O-13" #108 prio=5 os_prio=0 tid=0x0000000022186000 nid=0x2310 runnable [0x000000002f09f000]
java.lang.Thread.State: RUNNABLE
at java.lang.Object.notifyAll(Native Method)
at sun.nio.ch.WindowsSelectorImpl$StartLock.startThreads(WindowsSelectorImpl.java:189)
- locked <0x00000006618ae440> (a sun.nio.ch.WindowsSelectorImpl$StartLock)
at sun.nio.ch.WindowsSelectorImpl$StartLock.access$300(WindowsSelectorImpl.java:181)
at sun.nio.ch.WindowsSelectorImpl.doSelect(WindowsSelectorImpl.java:153)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x000000066508ea18> (a sun.nio.ch.Util$3)
- locked <0x000000066508ea08> (a java.util.Collections$UnmodifiableSet)
- locked <0x0000000665082b98> (a sun.nio.ch.WindowsSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at org.xnio.nio.WorkerThread.run(WorkerThread.java:551)
"default I/O-13" #108 prio=5 os_prio=0 tid=0x0000000022186000 nid=0x2310 runnable [0x000000002f09e000]
java.lang.Thread.State: RUNNABLE
at java.util.AbstractCollection.toArray(AbstractCollection.java:183)
at sun.nio.ch.Util$3.toArray(Util.java:322)
at org.xnio.nio.WorkerThread.run(WorkerThread.java:570)
- locked <0x000000066508ea18> (a sun.nio.ch.Util$3)
- locked <0x0000000665082b98> (a sun.nio.ch.WindowsSelectorImpl)
"default I/O-5" #100 prio=5 os_prio=0 tid=0x00000000260c0000 nid=0x858 runnable [0x000000002e89e000]
java.lang.Thread.State: RUNNABLE
at org.xnio.nio.WorkerThread.run(WorkerThread.java:480)
The issue occurs once every 2 to 5 days.
Thanks in advance

Flyway regularly hangs (MariaDB connector, RDS)

I've been seeing frequent hangs on deployment, at the migration step. It is a Java/Scala application packaged as a WAR for Tomcat. The database is RDS Aurora, accessed with the MariaDB connector (https://downloads.mariadb.org/connector-java/).
This probably has nothing to do with Flyway; it is a generic problem getting a connection.
The migration is run from a shell in the container:
java -cp `echo WEB-INF/lib/*|tr ' ' :` foo.Migrate
Migration code looks like:
def main(args: Array[String]): Unit = {
  Environment.dbFlywayPassword.foreach { pass =>
    val flyway = new Flyway
    flyway.setDataSource(Environment.jdbcUrl, "flyway", pass)
    flyway.migrate
  }
}
Connection string:
jdbc:mysql:aurora://%RDS_HOST%/xxx?serverSslCert=/rds-ca-2015-root.pem&useSSL=true&connectTimeout=10000
I've tried increasing logging level in Flyway, but nothing is logged after this line:
15:57:35.115 [main] INFO o.f.c.internal.util.VersionPrinter - Flyway 4.2.0 by Boxfuse
So I got a thread dump, which looks like this:
15:57:35.115 [main] INFO o.f.c.internal.util.VersionPrinter - Flyway 4.2.0 by Boxfuse
2017-06-08 15:57:56
Full thread dump OpenJDK 64-Bit Server VM (25.121-b13 mixed mode):
"MariaDb-failover-1" #8 daemon prio=5 os_prio=0 tid=0x00005555f80ae000 nid=0x14 waiting on condition [0x00007fc330b8f000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000f5c59b10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
"Service Thread" #7 daemon prio=9 os_prio=0 tid=0x00005555f70bf000 nid=0x12 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C1 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00005555f7063000 nid=0x11 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00005555f7060800 nid=0x10 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00005555f705e800 nid=0xf waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00005555f702f000 nid=0xe in Object.wait() [0x00007fc331616000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000f5a30c58> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
- locked <0x00000000f5a30c58> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)
"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00005555f702c000 nid=0xd in Object.wait() [0x00007fc331717000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000f5a30e10> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:502)
at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
- locked <0x00000000f5a30e10> (a java.lang.ref.Reference$Lock)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)
"main" #1 prio=5 os_prio=0 tid=0x00005555f6f85000 nid=0xb runnable [0x00007fc34341d000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.read(InputRecord.java:503)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
- locked <0x00000000f0a66090> (a java.lang.Object)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
- locked <0x00000000f0a81eb0> (a sun.security.ssl.AppInputStream)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
- locked <0x00000000f06b5118> (a java.io.BufferedInputStream)
at org.mariadb.jdbc.internal.io.input.StandardPacketInputStream.getPacketArray(StandardPacketInputStream.java:125)
at org.mariadb.jdbc.internal.io.input.StandardPacketInputStream.getPacket(StandardPacketInputStream.java:95)
at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.readPacket(AbstractQueryProtocol.java:1002)
at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.getResult(AbstractQueryProtocol.java:982)
at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.readRequestSessionVariables(AbstractConnectProtocol.java:498)
at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.readPipelineAdditionalData(AbstractConnectProtocol.java:544)
at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.connect(AbstractConnectProtocol.java:410)
at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.connect(AbstractConnectProtocol.java:357)
at org.mariadb.jdbc.internal.protocol.AuroraProtocol.loop(AuroraProtocol.java:149)
at org.mariadb.jdbc.internal.failover.impl.AuroraListener.reconnectFailedConnection(AuroraListener.java:179)
at org.mariadb.jdbc.internal.failover.impl.MastersSlavesListener.initializeConnection(MastersSlavesListener.java:154)
at org.mariadb.jdbc.internal.failover.FailoverProxy.<init>(FailoverProxy.java:94)
at org.mariadb.jdbc.internal.util.Utils.retrieveProxy(Utils.java:464)
at org.mariadb.jdbc.Driver.connect(Driver.java:103)
at org.flywaydb.core.internal.util.jdbc.DriverDataSource.getConnectionFromDriver(DriverDataSource.java:416)
at org.flywaydb.core.internal.util.jdbc.DriverDataSource.getConnection(DriverDataSource.java:381)
at org.flywaydb.core.internal.util.jdbc.JdbcUtils.openConnection(JdbcUtils.java:51)
at org.flywaydb.core.Flyway.execute(Flyway.java:1418)
at org.flywaydb.core.Flyway.migrate(Flyway.java:971)
at tgam.service.data.Migrate$.$anonfun$main$1(Migrate.scala:11)
at foo.Migrate$.$anonfun$main$1$adapted(Migrate.scala:8)
at foo.Migrate$$$Lambda$4/458209687.apply(Unknown Source)
at scala.Option.foreach(Option.scala:257)
at foo.Migrate$.main(Migrate.scala:8)
at foo.Migrate.main(Migrate.scala)
"VM Thread" os_prio=0 tid=0x00005555f7022000 nid=0xc runnable
"VM Periodic Task Thread" os_prio=0 tid=0x00005555f70da800 nid=0x13 waiting on condition
JNI global references: 232
Heap
def new generation total 4928K, used 1611K [0x00000000f0600000, 0x00000000f0b50000, 0x00000000f5950000)
eden space 4416K, 24% used [0x00000000f0600000, 0x00000000f0712c10, 0x00000000f0a50000)
from space 512K, 100% used [0x00000000f0a50000, 0x00000000f0ad0000, 0x00000000f0ad0000)
to space 512K, 0% used [0x00000000f0ad0000, 0x00000000f0ad0000, 0x00000000f0b50000)
tenured generation total 10944K, used 4187K [0x00000000f5950000, 0x00000000f6400000, 0x0000000100000000)
the space 10944K, 38% used [0x00000000f5950000, 0x00000000f5d66e78, 0x00000000f5d67000, 0x00000000f6400000)
Metaspace used 13061K, capacity 13306K, committed 13568K, reserved 1060864K
class space used 1381K, capacity 1449K, committed 1536K, reserved 1048576K
Looks like an I/O hang in org.mariadb.jdbc.Driver.connect, but I do have a connectTimeout set (10 seconds). This timeout doesn't seem to be effective (would I need a socketTimeout per https://github.com/brettwooldridge/HikariCP/issues/754 ?)
This has been happening for a while. The same thing happened when I was using Tomcat's contextInitialized hook to do migrations. I decided to refactor out into a separate invocation before starting Tomcat, which looks like a better idea in general, but it hasn't affected this behaviour.
What will typically happen is that the code will hang, after 2-3 minutes ECS will timeout, and trigger a redeploy. After some number of these retries (e.g. up to 10), Flyway will run successfully and the service will start.
OK, it seems this is a known incompatibility with RDS Aurora, and it is alluded to in the MariaDB Connector documentation (surely it should be a runtime warning, though!):
https://mariadb.com/kb/en/mariadb/about-mariadb-connector-j/#infrequently-used
usePipelineAuth: Not compatible with Aurora. During connection, different queries are executed. When the option is active those queries are sent using a pipeline (all queries are sent, then all the results are read), permitting faster connection creation. Default: true. Since 1.6.0.
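Based on that, a plausible workaround (not verified here) is to disable pipelining explicitly on the connection string used above, for example:

jdbc:mysql:aurora://%RDS_HOST%/xxx?serverSslCert=/rds-ca-2015-root.pem&useSSL=true&connectTimeout=10000&usePipelineAuth=false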
Also kudos to wlad_ in Freenode #maria who pointed me in the right direction.

Logback hangs forever

I am writing a Java application to route a high number of concurrent messages. The application uses the Logback framework for logging and I am seeing a surprising behavior where the application hangs. In a stack trace, I can see that application threads are stuck in logging calls:
"New I/O client worker #1-1" #125 prio=5 os_prio=0 tid=0x00007f0524017000 nid=0x29f3 waiting on condition [0x00007f052ecea000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00007f089c4a7e88> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
at java.util.concurrent.ArrayBlockingQueue.remainingCapacity(ArrayBlockingQueue.java:468)
at ch.qos.logback.core.AsyncAppenderBase.isQueueBelowDiscardingThreshold(AsyncAppenderBase.java:152)
at ch.qos.logback.core.AsyncAppenderBase.append(AsyncAppenderBase.java:144)
at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:84)
at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:51)
at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:270)
at ch.qos.logback.classic.Logger.callAppenders(Logger.java:257)
at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:421)
at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:383)
at ch.qos.logback.classic.Logger.info(Logger.java:579)
at com.application.ClientListener$6.operationComplete(***.java:514)
- locked <0x00007f089c372b60> (a com.application.ClientListener)
at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:381)
at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:372)
at org.jboss.netty.channel.DefaultChannelFuture.setSuccess(DefaultChannelFuture.java:316)
at org.jboss.netty.channel.socket.nio.NioWorker$RegisterTask.run(NioWorker.java:776)
at org.jboss.netty.channel.socket.nio.NioWorker.processRegisterTaskQueue(NioWorker.java:257)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:199)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- <0x00007f08a80fc118> (a java.util.concurrent.ThreadPoolExecutor$Worker)
It seems that the logging call is blocked trying to acquire a lock <0x00007f089c4a7e88> inside the java.util.concurrent.ArrayBlockingQueue instance used by AsyncAppenderBase.
In the thread dump, I can see that the lock <0x00007f089c4a7e88> is held by another thread in a thread pool that is idle:
"dispatcher-3" #90 prio=5 os_prio=0 tid=0x00007f04d0004800 nid=0x29d2 waiting on condition [0x00007f0534ed3000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00007f089cbbaae8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1081)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- <0x00007f089c4a7e88> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
It looks like the internal lock of the ArrayBlockingQueue was held by that thread and subsequently not released.
What is going on here? A race condition in java.util.concurrent.ArrayBlockingQueue? A bug in Logback?
I am using Java 8u40 and Logback 1.2.1.
You need to set the AsyncAppender's neverBlock option to true, so that logging calls discard events instead of blocking when the queue is full.
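This is normally configured on the AsyncAppender element in logback.xml; a minimal programmatic sketch, assuming the async appender is attached to the root logger under the (hypothetical) name "ASYNC":

import ch.qos.logback.classic.AsyncAppender;
import ch.qos.logback.classic.Logger;
import ch.qos.logback.classic.LoggerContext;
import org.slf4j.LoggerFactory;

public class NeverBlockConfig {
    public static void main(String[] args) {
        LoggerContext context = (LoggerContext) LoggerFactory.getILoggerFactory();
        Logger root = context.getLogger(Logger.ROOT_LOGGER_NAME);

        // "ASYNC" is an assumed appender name; use whatever your configuration declares.
        AsyncAppender async = (AsyncAppender) root.getAppender("ASYNC");
        if (async != null) {
            // Never block callers: when the queue is full, events are dropped instead.
            async.setNeverBlock(true);
        }
    }
}

The trade-off is that log events are lost under bursts, rather than stalling application threads such as the Netty I/O worker shown above.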

Java process doesn't exit on SIGTERM under heavy load

In normal operation my application exits fine when sent a 'kill -s SIGTERM <pid>'.
However, under load sometimes the process does not exit.
I'm just wondering whether http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6392332 could be the reason for this, or if it could be something else.
Here are some parts of the stack trace of the process in question, showing the shutdown methods; any help is much appreciated.
Note, this is a Java process running on 64-bit RHEL 6.3.
2013-05-22 08:01:33
Full thread dump Java HotSpot(TM) 64-Bit Server VM (23.21-b01 mixed mode):
...
"Thread-15" prio=10 tid=0x000000001994d000 nid=0x4d5a waiting on condition [0x00007f4da08a3000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000079fd9fce8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1468)
at org.jboss.netty.util.internal.ExecutorUtil.terminate(ExecutorUtil.java:109)
at org.jboss.netty.util.internal.ExecutorUtil.terminate(ExecutorUtil.java:49)
at org.jboss.netty.channel.socket.nio.AbstractNioWorkerPool.releaseExternalResources(AbstractNioWorkerPool.java:77)
at org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory.releaseExternalResources(NioServerSocketChannelFactory.java:164)
at com.test.services.radius.server.RadiusServerImpl.stop(RadiusServerImpl.java:87)
at com.test.services.radius.ServiceProvider.unload(ServiceProvider.java:61)
at com.test.spf.ServiceProviderCacheImpl.clearCurrentCache(ServiceProviderCacheImpl.java:150)
- locked <0x00000006b8968038> (a com.test.spf.ServiceProviderCacheImpl)
at com.test.spf.ServiceProviderCacheImpl.unload(ServiceProviderCacheImpl.java:170)
at com.test.spf.SPAImpl.stop(SPAImpl.java:178)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.springframework.beans.factory.support.DisposableBeanAdapter.invokeCustomDestroyMethod(DisposableBeanAdapter.java:273)
at org.springframework.beans.factory.support.DisposableBeanAdapter.destroy(DisposableBeanAdapter.java:199)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroyBean(DefaultSingletonBeanRegistry.java:487)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroySingleton(DefaultSingletonBeanRegistry.java:463)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroyBean(DefaultSingletonBeanRegistry.java:480)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroySingleton(DefaultSingletonBeanRegistry.java:463)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroyBean(DefaultSingletonBeanRegistry.java:480)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroySingleton(DefaultSingletonBeanRegistry.java:463)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroyBean(DefaultSingletonBeanRegistry.java:480)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroySingleton(DefaultSingletonBeanRegistry.java:463)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroySingletons(DefaultSingletonBeanRegistry.java:431)
- locked <0x00000006da966290> (a java.util.LinkedHashMap)
at org.springframework.context.support.AbstractApplicationContext.destroyBeans(AbstractApplicationContext.java:1048)
at org.springframework.context.support.AbstractApplicationContext.doClose(AbstractApplicationContext.java:1022)
at org.springframework.context.support.AbstractApplicationContext.close(AbstractApplicationContext.java:970)
- locked <0x000000073ad1a920> (a java.lang.Object)
at org.springframework.web.context.ContextLoader.closeWebApplicationContext(ContextLoader.java:384)
at org.springframework.web.context.ContextLoaderListener.contextDestroyed(ContextLoaderListener.java:78)
at org.apache.catalina.core.StandardContext.listenerStop(StandardContext.java:4245)
at org.apache.catalina.core.StandardContext.stop(StandardContext.java:4886)
- locked <0x00000006b8968118> (a org.apache.catalina.core.StandardContext)
at org.apache.catalina.core.ContainerBase.removeChild(ContainerBase.java:936)
at org.apache.catalina.startup.HostConfig.undeployApps(HostConfig.java:1359)
at org.apache.catalina.startup.HostConfig.stop(HostConfig.java:1330)
at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:326)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
at org.apache.catalina.core.ContainerBase.stop(ContainerBase.java:1098)
- locked <0x00000006b89682f8> (a org.apache.catalina.core.StandardHost)
at org.apache.catalina.core.ContainerBase.stop(ContainerBase.java:1110)
- locked <0x000000068e63ff10> (a org.apache.catalina.core.StandardEngine)
at org.apache.catalina.core.StandardEngine.stop(StandardEngine.java:468)
at org.apache.catalina.core.StandardService.stop(StandardService.java:604)
- locked <0x000000068e63ff10> (a org.apache.catalina.core.StandardEngine)
at org.apache.catalina.core.StandardServer.stop(StandardServer.java:788)
at org.apache.catalina.startup.Catalina.stop(Catalina.java:662)
at org.apache.catalina.startup.Catalina$CatalinaShutdownHook.run(Catalina.java:706)
"SIGTERM handler" daemon prio=10 tid=0x0000000025453000 nid=0x4d58 in Object.wait() [0x00007f4da0b2f000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x000000072c10db20> (a org.apache.catalina.startup.Catalina$CatalinaShutdownHook)
at java.lang.Thread.join(Thread.java:1258)
- locked <0x000000072c10db20> (a org.apache.catalina.startup.Catalina$CatalinaShutdownHook)
at java.lang.Thread.join(Thread.java:1332)
at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:106)
at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
at java.lang.Shutdown.runHooks(Shutdown.java:123)
at java.lang.Shutdown.sequence(Shutdown.java:167)
at java.lang.Shutdown.exit(Shutdown.java:212)
- locked <0x0000000707855738> (a java.lang.Class for java.lang.Shutdown)
at java.lang.Terminator$1.handle(Terminator.java:52)
at sun.misc.Signal$1.run(Signal.java:212)
at java.lang.Thread.run(Thread.java:722)
The JVM will not exit on SIGTERM until its shutdown hooks and all non-daemon threads have finished properly, i.e. returned from their run methods. A hang under high load could have its roots in a deadlock (deadlocked threads usually never end) or in an endless loop in some thread.
By the way, the proper way to shut down Tomcat is to use the bundled shutdown script. It can fail in some extraordinary situations, but then you can just SIGKILL the process.
SIGTERM or SIGKILL is often a business question. A proper shutdown can easily take more than 15 minutes for a complex application, especially one that has been swapped out, and so on. So: can you tolerate a 15-minute outage, or would you rather kill it and restart within the next 2 minutes?
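If the hang is in your own shutdown code (the dump above is parked in ThreadPoolExecutor.awaitTermination via Netty's ExecutorUtil.terminate), one defensive pattern is to bound how long a stop() method waits before forcing termination. A minimal sketch for executors your code owns (the timeout value is illustrative; it does not change how Netty's releaseExternalResources manages its internal pools):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

public final class ShutdownUtil {
    private ShutdownUtil() {}

    /** Wait briefly for in-flight tasks, then interrupt whatever is still running. */
    public static void shutdownWithTimeout(ExecutorService pool, long timeoutSeconds) {
        pool.shutdown();                        // stop accepting new tasks
        try {
            if (!pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
                pool.shutdownNow();             // interrupt stuck workers so shutdown can proceed
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
    }
}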
