Flyway regularly hangs (MariaDB connector, RDS) - java

I've been seeing frequent hangs on deployment, at the migration step. It's a Java/Scala application packaged as a WAR for Tomcat. The database is RDS Aurora, accessed through the MariaDB connector (https://downloads.mariadb.org/connector-java/).
This probably has nothing to do with Flyway itself and is a generic problem with getting a connection.
Migration is run from shell in container:
java -cp `echo WEB-INF/lib/*|tr ' ' :` foo.Migrate
Migration code looks like:
def main(args: Array[String]): Unit = {
  Environment.dbFlywayPassword.foreach { pass =>
    val flyway = new Flyway
    flyway.setDataSource(Environment.jdbcUrl, "flyway", pass)
    flyway.migrate
  }
}
Connection string:
jdbc:mysql:aurora://%RDS_HOST%/xxx?serverSslCert=/rds-ca-2015-root.pem&useSSL=true&connectTimeout=10000
I've tried increasing logging level in Flyway, but nothing is logged after this line:
15:57:35.115 [main] INFO o.f.c.internal.util.VersionPrinter - Flyway 4.2.0 by Boxfuse
So I got a thread dump, that looks like this:
15:57:35.115 [main] INFO o.f.c.internal.util.VersionPrinter - Flyway 4.2.0 by Boxfuse
2017-06-08 15:57:56
Full thread dump OpenJDK 64-Bit Server VM (25.121-b13 mixed mode):
"MariaDb-failover-1" #8 daemon prio=5 os_prio=0 tid=0x00005555f80ae000 nid=0x14 waiting on condition [0x00007fc330b8f000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000f5c59b10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
"Service Thread" #7 daemon prio=9 os_prio=0 tid=0x00005555f70bf000 nid=0x12 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C1 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00005555f7063000 nid=0x11 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00005555f7060800 nid=0x10 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00005555f705e800 nid=0xf waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00005555f702f000 nid=0xe in Object.wait() [0x00007fc331616000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000f5a30c58> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
- locked <0x00000000f5a30c58> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)
"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00005555f702c000 nid=0xd in Object.wait() [0x00007fc331717000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000f5a30e10> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:502)
at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
- locked <0x00000000f5a30e10> (a java.lang.ref.Reference$Lock)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)
"main" #1 prio=5 os_prio=0 tid=0x00005555f6f85000 nid=0xb runnable [0x00007fc34341d000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.read(InputRecord.java:503)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
- locked <0x00000000f0a66090> (a java.lang.Object)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
- locked <0x00000000f0a81eb0> (a sun.security.ssl.AppInputStream)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
- locked <0x00000000f06b5118> (a java.io.BufferedInputStream)
at org.mariadb.jdbc.internal.io.input.StandardPacketInputStream.getPacketArray(StandardPacketInputStream.java:125)
at org.mariadb.jdbc.internal.io.input.StandardPacketInputStream.getPacket(StandardPacketInputStream.java:95)
at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.readPacket(AbstractQueryProtocol.java:1002)
at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.getResult(AbstractQueryProtocol.java:982)
at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.readRequestSessionVariables(AbstractConnectProtocol.java:498)
at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.readPipelineAdditionalData(AbstractConnectProtocol.java:544)
at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.connect(AbstractConnectProtocol.java:410)
at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.connect(AbstractConnectProtocol.java:357)
at org.mariadb.jdbc.internal.protocol.AuroraProtocol.loop(AuroraProtocol.java:149)
at org.mariadb.jdbc.internal.failover.impl.AuroraListener.reconnectFailedConnection(AuroraListener.java:179)
at org.mariadb.jdbc.internal.failover.impl.MastersSlavesListener.initializeConnection(MastersSlavesListener.java:154)
at org.mariadb.jdbc.internal.failover.FailoverProxy.<init>(FailoverProxy.java:94)
at org.mariadb.jdbc.internal.util.Utils.retrieveProxy(Utils.java:464)
at org.mariadb.jdbc.Driver.connect(Driver.java:103)
at org.flywaydb.core.internal.util.jdbc.DriverDataSource.getConnectionFromDriver(DriverDataSource.java:416)
at org.flywaydb.core.internal.util.jdbc.DriverDataSource.getConnection(DriverDataSource.java:381)
at org.flywaydb.core.internal.util.jdbc.JdbcUtils.openConnection(JdbcUtils.java:51)
at org.flywaydb.core.Flyway.execute(Flyway.java:1418)
at org.flywaydb.core.Flyway.migrate(Flyway.java:971)
at tgam.service.data.Migrate$.$anonfun$main$1(Migrate.scala:11)
at foo.Migrate$.$anonfun$main$1$adapted(Migrate.scala:8)
at foo.Migrate$$$Lambda$4/458209687.apply(Unknown Source)
at scala.Option.foreach(Option.scala:257)
at foo.Migrate$.main(Migrate.scala:8)
at foo.Migrate.main(Migrate.scala)
"VM Thread" os_prio=0 tid=0x00005555f7022000 nid=0xc runnable
"VM Periodic Task Thread" os_prio=0 tid=0x00005555f70da800 nid=0x13 waiting on condition
JNI global references: 232
Heap
def new generation total 4928K, used 1611K [0x00000000f0600000, 0x00000000f0b50000, 0x00000000f5950000)
eden space 4416K, 24% used [0x00000000f0600000, 0x00000000f0712c10, 0x00000000f0a50000)
from space 512K, 100% used [0x00000000f0a50000, 0x00000000f0ad0000, 0x00000000f0ad0000)
to space 512K, 0% used [0x00000000f0ad0000, 0x00000000f0ad0000, 0x00000000f0b50000)
tenured generation total 10944K, used 4187K [0x00000000f5950000, 0x00000000f6400000, 0x0000000100000000)
the space 10944K, 38% used [0x00000000f5950000, 0x00000000f5d66e78, 0x00000000f5d67000, 0x00000000f6400000)
Metaspace used 13061K, capacity 13306K, committed 13568K, reserved 1060864K
class space used 1381K, capacity 1449K, committed 1536K, reserved 1048576K
Looks like an I/O hang in org.mariadb.jdbc.Driver.connect, but I do have a connectTimeout set (10 seconds). This timeout doesn't seem to be effective (would I need a socketTimeout per https://github.com/brettwooldridge/HikariCP/issues/754 ?)
This has been happening for a while. The same thing happened when I was using Tomcat's contextInitialized hook to do migrations. I decided to refactor out into a separate invocation before starting Tomcat, which looks like a better idea in general, but it hasn't affected this behaviour.
What typically happens is that the code hangs, ECS times out after 2-3 minutes and triggers a redeploy. After some number of these retries (e.g. up to 10), Flyway runs successfully and the service starts.
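For context on the connectTimeout question: at the java.net.Socket level, the connect timeout and the read timeout (SO_TIMEOUT) are separate settings, and the "main" thread in the dump above is blocked in socketRead0, which is a read on an already established connection. The sketch below uses plain sockets, not the MariaDB driver, and the class and method names are made up for illustration. It only shows that a read against a silent peer is bounded by SO_TIMEOUT rather than by the connect timeout; whether the connector's socketTimeout option covers this particular handshake read is an assumption I have not verified.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class; it does not touch the MariaDB driver.
public class ReadTimeoutDemo {

    // Connects to a local server that never sends a byte and reports
    // whether the blocking read was cut short by SO_TIMEOUT.
    public static boolean readTimesOut(int connectTimeoutMs, int readTimeoutMs) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket()) {
            // Bounds only the TCP connect; the kernel completes it from the listen backlog.
            client.connect(
                    new InetSocketAddress(server.getInetAddress(), server.getLocalPort()),
                    connectTimeoutMs);
            // Bounds each blocking read; 0 would mean "wait forever".
            client.setSoTimeout(readTimeoutMs);
            try {
                client.getInputStream().read();
                return false; // data or EOF arrived before the timeout
            } catch (SocketTimeoutException e) {
                return true;  // the read gave up after readTimeoutMs
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(10_000, 200));
    }
}
```

Without the setSoTimeout call the read in this sketch would block indefinitely, no matter how short the connect timeout is.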

OK, it seems this is a known bug in RDS Aurora, and it's alluded to in the MariaDB Connector docs (surely it should be a runtime warning, though!):
https://mariadb.com/kb/en/mariadb/about-mariadb-connector-j/#infrequently-used
usePipelineAuth: "Not compatible with aurora. During connection, different queries are executed. When option is active those queries are send using pipeline (all queries are send, then only all results are reads), permitting faster connection creation. Default: true. Since 1.6.0"
Also kudos to wlad_ in Freenode #maria who pointed me in the right direction.
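Given that description, the workaround is presumably to switch the option off in the connection string. A minimal sketch of what that could look like: the helper class is made up for illustration, and only the option name usePipelineAuth comes from the connector doc quoted above.

```java
// Hypothetical helper; only the option name comes from the connector docs.
public class AuroraJdbcUrl {

    // Appends key=value to a JDBC URL, using '?' or '&' as appropriate.
    public static String withOption(String url, String key, String value) {
        String separator = url.contains("?") ? "&" : "?";
        return url + separator + key + "=" + value;
    }

    public static void main(String[] args) {
        // Connection string from the question, with pipelined auth disabled.
        String base = "jdbc:mysql:aurora://%RDS_HOST%/xxx"
                + "?serverSslCert=/rds-ca-2015-root.pem&useSSL=true&connectTimeout=10000";
        System.out.println(withOption(base, "usePipelineAuth", "false"));
    }
}
```

In practice you would simply add &usePipelineAuth=false to the existing URL by hand; the helper only makes the separator handling explicit.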

Related

Why is java.exe consuming more CPU due to "org.xnio.nio" threads?

We are running WildFly 12 on JDK 1.8. Our web application's portal became slow or unavailable because of high CPU usage (above 90%) by java.exe. We then took a thread dump using jstack, and most of the high-CPU threads point to "org.xnio.nio".
The portal is served over HTTPS
CPU: 8 cores
Total RAM: 24 GB
Allocated JVM memory: from 1 GB (min) to 5 GB (max)
Occupied JVM memory when the thread dump was taken: 4 GB
The DB is on a separate machine
Below are the thread dump results (High CPU threads):
"default I/O-1" #95 prio=5 os_prio=0 tid=0x00000000260bf800 nid=0x1988 runnable [0x000000002e49f000]
java.lang.Thread.State: RUNNABLE
at io.undertow.server.protocol.http.HttpReadListener.handleEventWithNoRunningRequest(HttpReadListener.java:147)
at io.undertow.server.protocol.http.HttpReadListener.handleEvent(HttpReadListener.java:136)
at io.undertow.server.protocol.http.HttpReadListener.handleEvent(HttpReadListener.java:59)
at org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:92)
at org.xnio.conduits.ReadReadyHandler$ChannelListenerHandler.readReady(ReadReadyHandler.java:66)
at io.undertow.conduits.ReadTimeoutStreamSourceConduit$2.readReady(ReadTimeoutStreamSourceConduit.java:89)
at io.undertow.protocols.ssl.SslConduit$SslReadReadyHandler.readReady(SslConduit.java:1145)
at org.xnio.nio.NioSocketConduit.handleReady(NioSocketConduit.java:89)
at org.xnio.nio.WorkerThread.run(WorkerThread.java:591)
"default I/O-13" #108 prio=5 os_prio=0 tid=0x0000000022186000 nid=0x2310 runnable [0x000000002f09f000]
java.lang.Thread.State: RUNNABLE
at java.lang.Object.notifyAll(Native Method)
at sun.nio.ch.WindowsSelectorImpl$StartLock.startThreads(WindowsSelectorImpl.java:189)
- locked <0x00000006618ae440> (a sun.nio.ch.WindowsSelectorImpl$StartLock)
at sun.nio.ch.WindowsSelectorImpl$StartLock.access$300(WindowsSelectorImpl.java:181)
at sun.nio.ch.WindowsSelectorImpl.doSelect(WindowsSelectorImpl.java:153)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x000000066508ea18> (a sun.nio.ch.Util$3)
- locked <0x000000066508ea08> (a java.util.Collections$UnmodifiableSet)
- locked <0x0000000665082b98> (a sun.nio.ch.WindowsSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at org.xnio.nio.WorkerThread.run(WorkerThread.java:551)
"default I/O-13" #108 prio=5 os_prio=0 tid=0x0000000022186000 nid=0x2310 runnable [0x000000002f09e000]
java.lang.Thread.State: RUNNABLE
at java.util.AbstractCollection.toArray(AbstractCollection.java:183)
at sun.nio.ch.Util$3.toArray(Util.java:322)
at org.xnio.nio.WorkerThread.run(WorkerThread.java:570)
- locked <0x000000066508ea18> (a sun.nio.ch.Util$3)
- locked <0x0000000665082b98> (a sun.nio.ch.WindowsSelectorImpl)
"default I/O-5" #100 prio=5 os_prio=0 tid=0x00000000260c0000 nid=0x858 runnable [0x000000002e89e000]
java.lang.Thread.State: RUNNABLE
at org.xnio.nio.WorkerThread.run(WorkerThread.java:480)
The issue occurs once every 2 to 5 days.
Thanks in advance
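As a cross-check on which threads are actually burning CPU, the JVM can report per-thread CPU time through ThreadMXBean. The following is only a sketch, assuming it runs inside the same JVM as the application (for example from a diagnostic endpoint); the class and method names are invented for illustration. It reports cumulative CPU time since each thread started, so two samples taken a few seconds apart are needed to see the current rate.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical diagnostic helper, not part of WildFly or XNIO.
public class BusyThreads {

    // Returns up to `limit` lines of "<cpu ms>  <thread name>", busiest thread first.
    public static List<String> top(int limit) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<long[]> samples = new ArrayList<>(); // each entry: {threadId, cpuNanos}
        if (mx.isThreadCpuTimeSupported() && mx.isThreadCpuTimeEnabled()) {
            for (long id : mx.getAllThreadIds()) {
                long cpuNanos = mx.getThreadCpuTime(id); // -1 if the thread is gone
                if (cpuNanos >= 0) {
                    samples.add(new long[] {id, cpuNanos});
                }
            }
        }
        samples.sort(Comparator.comparingLong((long[] s) -> s[1]).reversed());

        List<String> lines = new ArrayList<>();
        for (long[] s : samples.subList(0, Math.min(limit, samples.size()))) {
            ThreadInfo info = mx.getThreadInfo(s[0]);
            if (info != null) { // the thread may have exited since sampling
                lines.add((s[1] / 1_000_000) + " ms  " + info.getThreadName());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : top(5)) {
            System.out.println(line);
        }
    }
}
```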

Neo4j import tool - OutOfMemory error: GC overhead limit exceeded

I am using the neo4j-import tool (Windows) to import ~1 million nodes with ~20 million relationships, all of which should be unique. The process proceeds smoothly until it gets to the "Relationship Count" task, where it loads all the way up to 20M (seemingly all of the relationships) but then hangs for a while (30 minutes to 1 hour), eventually returning "java.lang.OutOfMemoryError: GC overhead limit exceeded".
I have loaded large graph databases successfully before (39M nodes, 21M relationships) so I'm not sure what the issue is. Is it because the graph database is more densely connected compared to the previous database that I loaded?
Or could there be a memory leak? In my task manager, the Java Platform SE Binary process requires an increasingly large amount of memory (up to 12-13 GB out of 16 GB of RAM) as the import loads, especially towards the end. This seems suspiciously large, especially since the 39M node/21M relationship graph database imported successfully and relatively quickly with the import tool (it didn't hang at relationship count).
Any thoughts as to what could be going wrong? Thanks in advance!
If it helps to look at my nodes/relationships files, here is a link to them:
https://drive.google.com/open?id=0Bw7N-SlJA3ZCei0ycEhoa2YwNUU
Here is the neo4j shell output:
C:\Users\Username\Documents\Neo4j>neo4jImport -into graphDB1.graphdb --nodes D:\concept.csv --relationships D:\predicate.csv --stacktrace --idtype integer
WARNING! This batch script has been deprecated. Please use the provided PowerShell scripts instead: http://neo4j.com/docs/stable/powershell.html
The system cannot find the path specified.
Importing the contents of these files into graphDB1.graphdb:
Nodes:
D:\concept.csv
Relationships:
D:\predicate.csv
Available memory:
Free machine memory: 13.50 GB
Max heap memory : 12.75 GB
Nodes
[>:|PR|NOD|*LABEL SCAN---------------------------------|v:6.79 MB/s----------------------------] 1M
Done in 40s 562ms
Prepare node index
[*DETECT:20.37 MB------------------------------------------------------------------------------] 1M
Done in 802ms
Calculate dense nodes
[*>:59.38 MB/s----------------------------------|PREPARE(3)====================================] 20M
Done in 12s 566ms
Relationships
[>:2.01 |PREPARE-----------|P|RELATIONSHI|*v:4.05 MB/s-----------------------------------------] 20M
Done in 6m 3s 655ms
Node --> Relationship
[>:3.19 MB/s--------------------------|L|*v:2.39 MB/s------------------------------------------] 1M
Done in 8s 421ms
Relationship --> Relationship
[*>:6.82 MB/s--------------------------------------|LINK-----------|v:6.82 MB/s----------------] 20M
Done in 1m 36s 849ms
Node counts
[*COUNT:91.55 MB-------------------------------------------------------------------------------] 1M
Done in 3m 35s 21ms
Relationship counts
[*>:8.62 MB/s-----------------------------------------------------------|COUNT-----------------] 20M
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOf(Unknown Source)
at java.util.ArrayList.toArray(Unknown Source)
at java.util.ArrayList.<init>(Unknown Source)
at org.neo4j.unsafe.impl.batchimport.stats.StepStats.<init>(StepStats.java:39)
at org.neo4j.unsafe.impl.batchimport.staging.AbstractStep.stats(AbstractStep.java:220)
at org.neo4j.unsafe.impl.batchimport.staging.StageExecution$1.compare(StageExecution.java:123)
at org.neo4j.unsafe.impl.batchimport.staging.StageExecution$1.compare(StageExecution.java:118)
at java.util.TimSort.countRunAndMakeAscending(Unknown Source)
at java.util.TimSort.sort(Unknown Source)
at java.util.TimSort.sort(Unknown Source)
at java.util.Arrays.sort(Unknown Source)
at java.util.Collections.sort(Unknown Source)
at org.neo4j.unsafe.impl.batchimport.staging.StageExecution.stepsOrderedBy(StageExecution.java:117)
at org.neo4j.unsafe.impl.batchimport.staging.DynamicProcessorAssigner.assignProcessorsToPotentialBottleNeck(DynamicProcessorAssigner.java:94)
at org.neo4j.unsafe.impl.batchimport.staging.DynamicProcessorAssigner.check(DynamicProcessorAssigner.java:81)
at org.neo4j.unsafe.impl.batchimport.staging.MultiExecutionMonitor.check(MultiExecutionMonitor.java:106)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.supervise(ExecutionSupervisor.java:65)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisors.superviseExecution(ExecutionSupervisors.java:80)
at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.executeStages(ParallelBatchImporter.java:224)
at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.doImport(ParallelBatchImporter.java:185)
at org.neo4j.tooling.ImportTool.main(ImportTool.java:363)
at org.neo4j.tooling.ImportTool.main(ImportTool.java:279)
UPDATE 1:
Here is the thread dump at the moment(s) that the import hangs at relationship counts:
2016-02-17 08:28:12
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.80-b11 mixed mode):
"MuninnPageCache[1]-FlushTask" daemon prio=6 tid=0x0000000026855800 nid=0xfe0 waiting on condition [0x00000000288fe000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000004c0189810> (a org.neo4j.io.pagecache.impl.muninn.MuninnPageCache)
at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
at org.neo4j.io.pagecache.impl.muninn.MuninnPageCache.continuouslyFlushPages(MuninnPageCache.java:909)
at org.neo4j.io.pagecache.impl.muninn.FlushTask.run(FlushTask.java:36)
at org.neo4j.io.pagecache.impl.muninn.BackgroundTask.run(BackgroundTask.java:45)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
"MuninnPageCache[1]-EvictionTask" daemon prio=6 tid=0x0000000026904000 nid=0x3bd4 runnable [0x00000000287fe000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000004c0189810> (a org.neo4j.io.pagecache.impl.muninn.MuninnPageCache)
at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
at org.neo4j.io.pagecache.impl.muninn.MuninnPageCache.parkEvictor(MuninnPageCache.java:697)
at org.neo4j.io.pagecache.impl.muninn.MuninnPageCache.parkUntilEvictionRequired(MuninnPageCache.java:751)
at org.neo4j.io.pagecache.impl.muninn.MuninnPageCache.continuouslySweepPages(MuninnPageCache.java:732)
at org.neo4j.io.pagecache.impl.muninn.EvictionTask.run(EvictionTask.java:39)
at org.neo4j.io.pagecache.impl.muninn.BackgroundTask.run(BackgroundTask.java:45)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
"Service Thread" daemon prio=6 tid=0x0000000024ee8000 nid=0x301c runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread1" daemon prio=10 tid=0x0000000024ee6000 nid=0x3060 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" daemon prio=10 tid=0x0000000024ee2800 nid=0x2198 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Attach Listener" daemon prio=10 tid=0x0000000024ee2000 nid=0x1ae4 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" daemon prio=10 tid=0x0000000024ee1000 nid=0x135c waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Finalizer" daemon prio=8 tid=0x0000000024ed9000 nid=0x3480 in Object.wait() [0x00000000278ff000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000004c000d4b0> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(Unknown Source)
- locked <0x00000004c000d4b0> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(Unknown Source)
at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)
"Reference Handler" daemon prio=10 tid=0x0000000024ed8000 nid=0x1ae8 in Object.wait() [0x00000000277ff000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000004c000d300> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:503)
at java.lang.ref.Reference$ReferenceHandler.run(Unknown Source)
- locked <0x00000004c000d300> (a java.lang.ref.Reference$Lock)
"main" prio=6 tid=0x00000000023c2800 nid=0x2e7c waiting on condition [0x00000000023bf000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.neo4j.io.fs.FileUtils.waitAndThenTriggerGC(FileUtils.java:253)
at org.neo4j.io.fs.FileUtils.deleteFile(FileUtils.java:110)
at org.neo4j.io.fs.DefaultFileSystemAbstraction.deleteFile(DefaultFileSystemAbstraction.java:127)
at org.neo4j.kernel.impl.storemigration.FileOperation$3.perform(FileOperation.java:93)
at org.neo4j.kernel.impl.storemigration.StoreFile.fileOperation(StoreFile.java:267)
at org.neo4j.tooling.ImportTool.main(ImportTool.java:389)
at org.neo4j.tooling.ImportTool.main(ImportTool.java:279)
"VM Thread" prio=10 tid=0x0000000024ed1800 nid=0x3058 runnable
"GC task thread#0 (ParallelGC)" prio=6 tid=0x00000000023d7000 nid=0x313c runnable
"GC task thread#1 (ParallelGC)" prio=6 tid=0x00000000023d9000 nid=0x3144 runnable
"GC task thread#2 (ParallelGC)" prio=6 tid=0x00000000023da800 nid=0x974 runnable
"GC task thread#3 (ParallelGC)" prio=6 tid=0x00000000023dc000 nid=0x3a3c runnable
"GC task thread#4 (ParallelGC)" prio=6 tid=0x00000000023de800 nid=0x3684 runnable
"GC task thread#5 (ParallelGC)" prio=6 tid=0x00000000023e1000 nid=0x35b8 runnable
"GC task thread#6 (ParallelGC)" prio=6 tid=0x00000000023e4000 nid=0x3950 runnable
"GC task thread#7 (ParallelGC)" prio=6 tid=0x00000000023e5800 nid=0x318c runnable
"GC task thread#8 (ParallelGC)" prio=6 tid=0x00000000023e8800 nid=0x30b8 runnable
"GC task thread#9 (ParallelGC)" prio=6 tid=0x00000000023e9800 nid=0x32dc runnable
"VM Periodic Task Thread" prio=10 tid=0x0000000024eed800 nid=0x3710 waiting on condition
JNI global references: 377
Heap
PSYoungGen total 2071552K, used 0K [0x0000000780000000, 0x0000000800000000, 0x0000000800000000)
eden space 2043904K, 0% used [0x0000000780000000,0x0000000780000000,0x00000007fcc00000)
from space 27648K, 0% used [0x00000007fe500000,0x00000007fe500000,0x0000000800000000)
to space 25600K, 0% used [0x00000007fcc00000,0x00000007fcc00000,0x00000007fe500000)
ParOldGen total 11534336K, used 10982258K [0x00000004c0000000, 0x0000000780000000, 0x0000000780000000)
object space 11534336K, 95% used [0x00000004c0000000,0x000000075e4dcb50,0x0000000780000000)
PSPermGen total 21504K, used 13521K [0x00000004bae00000, 0x00000004bc300000, 0x00000004c0000000)
object space 21504K, 62% used [0x00000004bae00000,0x00000004bbb34588,0x00000004bc300000)
2016-02-17 08:28:20
That is very strange on such a small dataset. How many unique relationships and labels do you expect there to be in this dataset? Also can you provide a thread dump some way into that pause when it happens?
EDIT: the problem was that a column containing property values was used as LABEL. This produced an enormous number of labels by mistake, and the counting step doesn't scale with that.
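A quick way to catch this kind of mistake before importing is to count the distinct values in the label column. The sketch below is a rough check under stated assumptions: plain comma-separated lines with no quoted commas, a header column ending in :LABEL, and ';' separating multiple labels in one cell. The class name and the sample rows are invented for illustration and are not taken from the linked files.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical pre-import sanity check, not part of the Neo4j tooling.
public class LabelCardinality {

    // Collects the distinct labels found in the column whose header ends with ":LABEL".
    public static Set<String> distinctLabels(List<String> csvLines) {
        String[] header = csvLines.get(0).split(",", -1);
        int labelColumn = -1;
        for (int i = 0; i < header.length; i++) {
            if (header[i].trim().endsWith(":LABEL")) {
                labelColumn = i;
            }
        }
        Set<String> labels = new TreeSet<>();
        if (labelColumn < 0) {
            return labels; // no label column declared
        }
        for (String line : csvLines.subList(1, csvLines.size())) {
            String[] cells = line.split(",", -1);
            if (labelColumn >= cells.length) {
                continue; // short row, nothing in the label column
            }
            for (String label : cells[labelColumn].split(";")) {
                if (!label.trim().isEmpty()) {
                    labels.add(label.trim());
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Made-up sample rows for illustration only.
        List<String> sample = Arrays.asList(
                "conceptId:ID,name,:LABEL",
                "1,aspirin,Concept",
                "2,fever,Concept;Symptom");
        System.out.println(distinctLabels(sample).size() + " distinct labels");
    }
}
```

If the count is anywhere near the number of rows, the column is almost certainly a property rather than a label.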

How to interpret this thread dump from a hung Java Swing application?

I have the following thread dump from a hung Java Swing application. It hung after a button was clicked, and the GUI went blank. Other threads handling socket communication and task management are still working (I can tell from the log file). I have removed some irrelevant output.
Thread #13, AWT-EventQueue-0, should send out a command through the socket, but it seems to have failed there. Threads #20 and #21 are named AWT-EventQueue-0-SharedResourceRunner; are they not the same as #13? There seems to be no deadlock, but the GUI is unresponsive and went blank.
Do you see any useful information about the cause of the hang?
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.20-b23 mixed mode):
"DestroyJavaVM" #32 prio=5 os_prio=0 tid=0x00007f286c009800 nid=0xa41 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"TimerQueue" #22 daemon prio=5 os_prio=0 tid=0x00007f28002a8800 nid=0xa65 waiting on condition [0x00007f284c56f000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000088a8f5c0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.DelayQueue.take(DelayQueue.java:211)
at javax.swing.TimerQueue.run(TimerQueue.java:171)
at java.lang.Thread.run(Thread.java:745)
"AWT-EventQueue-0-SharedResourceRunner" #21 daemon prio=6 os_prio=0 tid=0x00007f280021d000 nid=0xa64 in Object.wait() [0x00007f284d434000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x0000000088aec748> (a jogamp.opengl.SharedResourceRunner)
at java.lang.Object.wait(Object.java:502)
at jogamp.opengl.SharedResourceRunner.run(SharedResourceRunner.java:276)
- locked <0x0000000088aec748> (a jogamp.opengl.SharedResourceRunner)
at java.lang.Thread.run(Thread.java:745)
"AWT-EventQueue-0-SharedResourceRunner" #20 daemon prio=6 os_prio=0 tid=0x00007f28001f3000 nid=0xa63 in Object.wait() [0x00007f284f7f5000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x0000000088aed588> (a jogamp.opengl.SharedResourceRunner)
at java.lang.Object.wait(Object.java:502)
at jogamp.opengl.SharedResourceRunner.run(SharedResourceRunner.java:276)
- locked <0x0000000088aed588> (a jogamp.opengl.SharedResourceRunner)
at java.lang.Thread.run(Thread.java:745)
"AWT-EventQueue-0" #13 prio=6 os_prio=0 tid=0x00007f286c444800 nid=0xa59 in Object.wait() [0x00007f2858913000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000dc467018> (a java.lang.Object)
at java.lang.Object.wait(Object.java:502)
at com.mycp.common.task.BMBTaskBase.startTask(BMBTaskBase.java:551)
- locked <0x00000000dc467018> (a java.lang.Object)
at com.mycp.uiapp.workmgmt.WorkMgmtMgr.sendBegCmd(WorkMgmtMgr.java:334)
at com.mycp.uiapp.workmgmt.WorkMgmtPanelBase.prepareAndSendBegWork(WorkMgmtPanelBase.java:559)
at com.mycp.uiapp.workmmgmt.WorkMgmtPanel.prepareAndSendBegWork(WorkMgmtPanel.java:1479)
at com.mycp.uiapp.workmgmt.WorkMgmtPanelBase.btnPrepareClicked(WorkMgmtPanelBase.java:363)
at com.mycp.uiapp.workmgmt.WorkMgmtPanel.btnPrepareClicked(WorkMgmtPanel.java:1412)
at com.mycp.uiapp.workmgmt.WorkMgmtPanel.actionPerformed(WorkMgmtPanel.java:1336)
at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022)
at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2346)
"AWT-Shutdown" #14 prio=5 os_prio=0 tid=0x00007f286c443000 nid=0xa58 in Object.wait() [0x00007f2858a17000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x0000000088ae8c28> (a java.lang.Object)
at java.lang.Object.wait(Object.java:502)
at sun.awt.AWTAutoShutdown.run(AWTAutoShutdown.java:295)
- locked <0x0000000088ae8c28> (a java.lang.Object)
at java.lang.Thread.run(Thread.java:745)
"AWT-XAWT" #12 daemon prio=6 os_prio=0 tid=0x00007f286c384000 nid=0xa51 runnable [0x00007f285914f000]
java.lang.Thread.State: RUNNABLE
at sun.awt.X11.XToolkit.waitForEvents(Native Method)
at sun.awt.X11.XToolkit.run(XToolkit.java:559)
at sun.awt.X11.XToolkit.run(XToolkit.java:523)
at java.lang.Thread.run(Thread.java:745)
"Java2D Disposer" #10 daemon prio=10 os_prio=0 tid=0x00007f286c35e000 nid=0xa50 in Object.wait() [0x00007f2859250000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x0000000087ab7ec0> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:142)
- locked <0x0000000087ab7ec0> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:158)
at sun.java2d.Disposer.run(Disposer.java:148)
at java.lang.Thread.run(Thread.java:745)
"Thread-0" #9 prio=5 os_prio=0 tid=0x00007f286c234800 nid=0xa4f waiting on condition [0x00007f2859af5000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at com.mycp.logging.BMBLogging$Task.run(BMBLogging.java:1072)
at java.lang.Thread.run(Thread.java:745)
"Service Thread" #8 daemon prio=9 os_prio=0 tid=0x00007f286c0cf800 nid=0xa4d runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C1 CompilerThread2" #7 daemon prio=9 os_prio=0 tid=0x00007f286c0b2000 nid=0xa4c waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00007f286c0b0000 nid=0xa4b waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00007f286c0ad800 nid=0xa4a waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00007f286c0ab000 nid=0xa49 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007f286c07c000 nid=0xa48 in Object.wait() [0x00007f285a2dd000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x0000000087a7e6c8> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:142)
- locked <0x0000000087a7e6c8> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:158)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)
"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007f286c07a000 nid=0xa47 in Object.wait() [0x00007f285a3de000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x0000000087a7e708> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:502)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:157)
- locked <0x0000000087a7e708> (a java.lang.ref.Reference$Lock)
"VM Thread" os_prio=0 tid=0x00007f286c072800 nid=0xa46 runnable
"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007f286c01e800 nid=0xa42 runnable
"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007f286c020800 nid=0xa43 runnable
"GC task thread#2 (ParallelGC)" os_prio=0 tid=0x00007f286c022000 nid=0xa44 runnable
"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007f286c024000 nid=0xa45 runnable
"VM Periodic Task Thread" os_prio=0 tid=0x00007f286c0d2000 nid=0xa4e waiting on condition
JNI global references: 485
Heap
PSYoungGen total 118272K, used 98176K [0x00000000d6e00000, 0x00000000de700000, 0x0000000100000000)
eden space 113152K, 82% used [0x00000000d6e00000,0x00000000dc8e00c8,0x00000000ddc80000)
from space 5120K, 100% used [0x00000000de180000,0x00000000de680000,0x00000000de680000)
to space 5120K, 0% used [0x00000000ddc80000,0x00000000ddc80000,0x00000000de180000)
ParOldGen total 159744K, used 76671K [0x0000000084a00000, 0x000000008e600000, 0x00000000d6e00000)
object space 159744K, 47% used [0x0000000084a00000,0x00000000894dfc50,0x000000008e600000)
Metaspace used 30027K, capacity 30212K, committed 30464K, reserved 1077248K
class space used 3528K, capacity 3582K, committed 3584K, reserved 1048576K
The thread dump shows stack traces from about 22 different threads. Many of them look like application threads (as opposed to JVM internal threads). Most of the application threads are waiting for something. Which of those threads should not be waiting?
I'd start by looking at thread 13: Looks like the Swing EDT, and it's waiting inside a call to a button's actionPerformed(...) handler. That can't be good.
I think I am late to the game. Anyway, your thread dump shows a thread parked in the WAITING state:
"TimerQueue" #22 daemon prio=5 os_prio=0 tid=0x00007f28002a8800 nid=0xa65 waiting on condition [0x00007f284c56f000]
java.lang.Thread.State: WAITING (parking)
This thread is waiting to take an element from a DelayQueue.
DelayQueue: an unbounded blocking queue of Delayed elements, in which an element can only be taken when its delay has expired.
In the same TimerQueue stack trace we have:
at java.util.concurrent.DelayQueue.take(DelayQueue.java:211)
take() blocks until an element with an expired delay is available on the queue.
This could be why the application hangs: the thread is still waiting and never shuts down, so live threads remain. To resolve this, you need to stop those threads.
You can either call ExecutorService.shutdown() or simply call System.exit().
I would recommend System.exit().
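As a rough illustration of the ExecutorService.shutdown() option, here is a minimal sketch. The class and method names (ShutdownSketch, runAndShutdown) are made up for this example and do not come from the question's code. It assumes the lingering threads belong to an executor you own. Note that only non-daemon threads keep a JVM alive; the "TimerQueue" thread quoted above is marked daemon in the dump, so it may not be what blocks exit.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ShutdownSketch {

    /** Schedules one delayed task, shuts the pool down, and reports whether it terminated. */
    static boolean runAndShutdown() {
        // Threads from the default factory are non-daemon: without shutdown(),
        // the idle worker would park in the work queue's take() and keep the JVM alive.
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(1);
        pool.schedule(() -> System.out.println("delayed task ran"), 50, TimeUnit.MILLISECONDS);

        // shutdown() rejects new work; by default, already-scheduled delayed tasks still run.
        pool.shutdown();
        try {
            // Wait for the worker thread to exit; true means the pool fully terminated.
            return pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("pool terminated: " + runAndShutdown());
    }
}
```

System.exit() is the blunter alternative: it ends the process regardless of which threads are still parked, which is why it also works when the threads belong to a library you cannot shut down cleanly.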

How to handle a SSLSocketImpl Deadlock properly?

Very rarely I get a deadlock while using wiki-java. The full thread dump (acquired via kill -3 $JAVA-PID) suggests that the deadlock originates somewhere in SSLSocketImpl. I'd prefer to avoid the deadlock in the first place (instead of doing some hacky recovery), but I am unsure how to find the cause and prevent it. Is there a way to set a timeout on SSLSocketImpl, or to have an exception thrown in case of a deadlock? (It would be pretty straightforward to catch it in the main loop and redo the last call.)
Full thread dump OpenJDK 64-Bit Server VM (24.51-b03 mixed mode):
"Service Thread" daemon prio=10 tid=0x00007f3cd816b000 nid=0x102c runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread1" daemon prio=10 tid=0x00007f3cd8168800 nid=0x102b waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" daemon prio=10 tid=0x00007f3cd8165800 nid=0x102a waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" daemon prio=10 tid=0x00007f3cd8163800 nid=0x1029 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Finalizer" daemon prio=10 tid=0x00007f3cd8140800 nid=0x1028 waiting on condition [0x00007f3ccb9f7000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000007d77a9080> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:799)
at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:672)
at sun.security.ssl.SSLSocketImpl.sendAlert(SSLSocketImpl.java:2005)
at sun.security.ssl.SSLSocketImpl.warning(SSLSocketImpl.java:1832)
at sun.security.ssl.SSLSocketImpl.closeInternal(SSLSocketImpl.java:1600)
- locked <0x00000007d77a8d78> (a sun.security.ssl.SSLSocketImpl)
at sun.security.ssl.SSLSocketImpl.close(SSLSocketImpl.java:1538)
at sun.security.ssl.BaseSSLSocketImpl.finalize(BaseSSLSocketImpl.java:249)
at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method)
at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:101)
at java.lang.ref.Finalizer.access$100(Finalizer.java:32)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:190)
"Reference Handler" daemon prio=10 tid=0x00007f3cd813e800 nid=0x1027 in Object.wait() [0x00007f3ccbaf9000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x000000078495a2c8> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:503)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
- locked <0x000000078495a2c8> (a java.lang.ref.Reference$Lock)
"main" prio=10 tid=0x00007f3cd8008000 nid=0x1021 waiting for monitor entry [0x00007f3cdfdb7000]
java.lang.Thread.State: BLOCKED (on object monitor)
at sun.security.ssl.SSLSocketImpl.getConnectionState(SSLSocketImpl.java:649)
- waiting to lock <0x00000007d77a8d78> (a sun.security.ssl.SSLSocketImpl)
at sun.security.ssl.SSLSocketImpl.isClosed(SSLSocketImpl.java:1446)
at java.net.Socket.getTcpNoDelay(Socket.java:965)
at sun.security.ssl.BaseSSLSocketImpl.getTcpNoDelay(BaseSSLSocketImpl.java:345)
at sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:819)
at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:801)
at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:122)
- locked <0x00000007d77a8d60> (a sun.security.ssl.AppOutputStream)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
- locked <0x00000007d77a8d48> (a java.io.BufferedOutputStream)
at java.io.PrintStream.flush(PrintStream.java:338)
- locked <0x00000007d77a8d28> (a java.io.PrintStream)
at sun.net.www.MessageHeader.print(MessageHeader.java:297)
- locked <0x00000007d6d057b0> (a sun.net.www.MessageHeader)
at sun.net.www.http.HttpClient.writeRequests(HttpClient.java:599)
at sun.net.www.http.HttpClient.writeRequests(HttpClient.java:610)
at sun.net.www.protocol.http.HttpURLConnection.writeRequests(HttpURLConnection.java:619)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1321)
- locked <0x00000007d6d05640> (a sun.net.www.protocol.https.DelegateHttpsURLConnection)
at sun.net.www.protocol.http.HttpURLConnection.getHeaderFieldKey(HttpURLConnection.java:2731)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getHeaderFieldKey(HttpsURLConnectionImpl.java:307)
at shared.Wiki.grabCookies(Wiki.java:6907)
at shared.Wiki.fetch(Wiki.java:6462)
at shared.Wiki.getPageText(Wiki.java:1465)
at smallBots.Bot1.getText(Bot1.java:204)
at smallBots.Bot1.crawlCategory(Bot1.java:74)
at smallBots.Bot1.main(Bot1.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
"VM Thread" prio=10 tid=0x00007f3cd813a000 nid=0x1026 runnable
"GC task thread#0 (ParallelGC)" prio=10 tid=0x00007f3cd801d800 nid=0x1022 runnable
"GC task thread#1 (ParallelGC)" prio=10 tid=0x00007f3cd801f800 nid=0x1023 runnable
"GC task thread#2 (ParallelGC)" prio=10 tid=0x00007f3cd8021800 nid=0x1024 runnable
"GC task thread#3 (ParallelGC)" prio=10 tid=0x00007f3cd8023000 nid=0x1025 runnable
"VM Periodic Task Thread" prio=10 tid=0x00007f3cd8175800 nid=0x102d waiting on condition
JNI global references: 205
Found one Java-level deadlock:
=============================
"Finalizer":
waiting for ownable synchronizer 0x00000007d77a9080, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
which is held by "main"
"main":
waiting to lock monitor 0x00007f3cac0015c8 (object 0x00000007d77a8d78, a sun.security.ssl.SSLSocketImpl),
which is held by "Finalizer"
Java stack information for the threads listed above:
===================================================
"Finalizer":
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000007d77a9080> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:799)
at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:672)
at sun.security.ssl.SSLSocketImpl.sendAlert(SSLSocketImpl.java:2005)
at sun.security.ssl.SSLSocketImpl.warning(SSLSocketImpl.java:1832)
at sun.security.ssl.SSLSocketImpl.closeInternal(SSLSocketImpl.java:1600)
- locked <0x00000007d77a8d78> (a sun.security.ssl.SSLSocketImpl)
at sun.security.ssl.SSLSocketImpl.close(SSLSocketImpl.java:1538)
at sun.security.ssl.BaseSSLSocketImpl.finalize(BaseSSLSocketImpl.java:249)
at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method)
at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:101)
at java.lang.ref.Finalizer.access$100(Finalizer.java:32)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:190)
"main":
at sun.security.ssl.SSLSocketImpl.getConnectionState(SSLSocketImpl.java:649)
- waiting to lock <0x00000007d77a8d78> (a sun.security.ssl.SSLSocketImpl)
at sun.security.ssl.SSLSocketImpl.isClosed(SSLSocketImpl.java:1446)
at java.net.Socket.getTcpNoDelay(Socket.java:965)
at sun.security.ssl.BaseSSLSocketImpl.getTcpNoDelay(BaseSSLSocketImpl.java:345)
at sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:819)
at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:801)
at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:122)
- locked <0x00000007d77a8d60> (a sun.security.ssl.AppOutputStream)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
- locked <0x00000007d77a8d48> (a java.io.BufferedOutputStream)
at java.io.PrintStream.flush(PrintStream.java:338)
- locked <0x00000007d77a8d28> (a java.io.PrintStream)
at sun.net.www.MessageHeader.print(MessageHeader.java:297)
- locked <0x00000007d6d057b0> (a sun.net.www.MessageHeader)
at sun.net.www.http.HttpClient.writeRequests(HttpClient.java:599)
at sun.net.www.http.HttpClient.writeRequests(HttpClient.java:610)
at sun.net.www.protocol.http.HttpURLConnection.writeRequests(HttpURLConnection.java:619)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1321)
- locked <0x00000007d6d05640> (a sun.net.www.protocol.https.DelegateHttpsURLConnection)
at sun.net.www.protocol.http.HttpURLConnection.getHeaderFieldKey(HttpURLConnection.java:2731)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getHeaderFieldKey(HttpsURLConnectionImpl.java:307)
at shared.Wiki.grabCookies(Wiki.java:6907)
at shared.Wiki.fetch(Wiki.java:6462)
at shared.Wiki.getPageText(Wiki.java:1465)
at smallBots.Bot1.getText(Bot1.java:204)
at smallBots.Bot1.crawlCategory(Bot1.java:74)
at smallBots.Bot1.main(Bot1.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Found 1 deadlock.
Heap
PSYoungGen total 10752K, used 801K [0x00000007d6d00000, 0x00000007d7880000, 0x0000000800000000)
eden space 9728K, 1% used [0x00000007d6d00000,0x00000007d6d1fc20,0x00000007d7680000)
from space 1024K, 65% used [0x00000007d7780000,0x00000007d7828b40,0x00000007d7880000)
to space 1024K, 0% used [0x00000007d7680000,0x00000007d7680000,0x00000007d7780000)
ParOldGen total 93696K, used 69956K [0x0000000784800000, 0x000000078a380000, 0x00000007d6d00000)
object space 93696K, 74% used [0x0000000784800000,0x0000000788c51160,0x000000078a380000)
PSPermGen total 21504K, used 9537K [0x000000077a200000, 0x000000077b700000, 0x0000000784800000)
object space 21504K, 44% used [0x000000077a200000,0x000000077ab50720,0x000000077b700000)
The asker's answer: update to Java 8 to fix it.
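On the "throw an exception in case of the deadlock" part of the question: a thread blocked entering a synchronized block cannot be timed out or interrupted, so the deadlocked call itself cannot be made to fail. What a program can do is detect the cycle from another thread, using the same information the JVM prints under "Found one Java-level deadlock". The sketch below is one way to do that with ThreadMXBean. The class name DeadlockWatchdog and its methods are hypothetical, and provokeAndCount() exists only to reproduce a monitor/ReentrantLock cycle like the one in the dump.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class DeadlockWatchdog {

    /** Logs and counts the threads that are currently deadlocked (0 if none). */
    static int countDeadlockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Covers both object monitors and ownable synchronizers such as ReentrantLock.
        long[] ids = mx.findDeadlockedThreads();
        if (ids == null) {
            return 0;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info != null) {
                System.err.println("deadlocked: " + info.getThreadName()
                        + " waiting on " + info.getLockName());
            }
        }
        return ids.length;
    }

    /** Demo only: deadlocks two daemon threads, then polls the detector. */
    static int provokeAndCount() {
        final Object monitor = new Object();
        final ReentrantLock lock = new ReentrantLock();
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Like "Finalizer" in the dump: holds the monitor, wants the ReentrantLock.
        Thread first = new Thread(() -> {
            synchronized (monitor) {
                arriveAndWait(bothHoldFirstLock);
                lock.lock(); // never returns: the other thread holds it
            }
        }, "holds-monitor");

        // Like "main" in the dump: holds the ReentrantLock, wants the monitor.
        Thread second = new Thread(() -> {
            lock.lock();
            arriveAndWait(bothHoldFirstLock);
            synchronized (monitor) { // never entered: the other thread holds it
                lock.unlock();
            }
        }, "holds-lock");

        // Daemon threads so the deliberately stuck pair cannot keep the JVM alive.
        first.setDaemon(true);
        second.setDaemon(true);
        first.start();
        second.start();

        // Poll for up to about five seconds until the cycle is visible.
        for (int i = 0; i < 50; i++) {
            int found = countDeadlockedThreads();
            if (found > 0) {
                return found;
            }
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    private static void arriveAndWait(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + provokeAndCount());
    }
}
```

A watchdog thread could call countDeadlockedThreads() periodically and, on a non-zero result, log the dump and exit the process so that an outer supervisor restarts it. That is still recovery rather than prevention; it does not remove the lock-ordering problem inside SSLSocketImpl.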

Java thread lock on System.err()

I'm a little baffled.
I have two threads writing to System.err (and System.out), and eventually they get stuck in a thread lock. Are System.err and System.out really not thread-safe?
The structure of the program at that point is:
main thread having launched two Reader threads to do some processing and printout, called via a ThreadPoolExecutor.
one Reader thread ("s3gp-2") that is blocked on System.err.format(), from within a separate IOStats object, in a synchronized method called IOStats.add().
another Reader thread that is (correctly) blocked trying to call the same IOStats.add() method that is blocked on System.err.format().
Note that each Reader typically writes to System.err and System.out.
Should I have a separate singleton object that handles all output with synchronized methods? Any insight is much appreciated, as usual.
Update
I realize that I should give a bit more context: that java program is invoked in parallel by all the n segments of a Greenplum DB (via a create external web table ... execute prgm). The locking behavior is extremely rare and usually all goes well. When the problem occurs, all the n processes are blocked. I am focusing on just one of them for the debugging.
Some more notes:
For a few minutes, I thought that the fact that I also use log4j.Logger might cause trouble. But, in fact, there is no Console appender used for that program so I'm less inclined to think that is the problem.
Using lsof as well as ls -l /proc/$pid/fd, I observe that the stderr of the blocked java process is a pipe that is also read by postgres. As one commenter suggested, it might be that this pipe is not actually drained by postgres, and that this blocks the System.err printout.
In fact here are the relevant lines:
% lsof | head -1
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
% lsof | grep $pid | grep ' 2w '
java 10047 gpadmin 2w FIFO 0,6 26854739 pipe
% node=26854739
% lsof | grep $node
postgres 10028 gpadmin 26r FIFO 0,6 26854739 pipe
sh 10046 gpadmin 2w FIFO 0,6 26854739 pipe
java 10047 gpadmin 2w FIFO 0,6 26854739 pipe
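The undrained-pipe hypothesis can be illustrated in-process. A write to a full pipe blocks until the reader consumes data, so if the reading side stops reading, every System.err call eventually stalls. The sketch below is only an analogy: it uses java.io piped streams with a deliberately tiny 16-byte buffer in place of the OS pipe, and the class name UndrainedPipeSketch is invented for this example.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class UndrainedPipeSketch {

    /** Returns { writer was blocked before draining, writer finished after draining }. */
    static boolean[] demo() {
        try {
            PipedOutputStream out = new PipedOutputStream();
            // Tiny buffer standing in for the (much larger) OS pipe buffer.
            PipedInputStream in = new PipedInputStream(out, 16);

            Thread writer = new Thread(() -> {
                try {
                    out.write(new byte[64]); // more than the buffer can hold
                    out.close();
                } catch (IOException e) {
                    // Ignored in this sketch.
                }
            }, "writer");
            writer.setDaemon(true);
            writer.start();

            // Nobody reads yet, so the writer should still be stuck in write().
            writer.join(500);
            boolean blockedBeforeDrain = writer.isAlive();

            // Drain the pipe until the writer closes it; this unblocks the writer.
            byte[] buffer = new byte[16];
            while (in.read(buffer) != -1) {
                // Discard the data.
            }
            writer.join(5000);
            boolean finishedAfterDrain = !writer.isAlive();

            return new boolean[] { blockedBeforeDrain, finishedAfterDrain };
        } catch (IOException | InterruptedException e) {
            return new boolean[] { false, false };
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("blocked before drain: " + result[0]);
        System.out.println("finished after drain: " + result[1]);
    }
}
```

With a real OS pipe the blocked writer sits inside a native write call, and the JVM reports such a thread as RUNNABLE rather than BLOCKED, so a RUNNABLE thread whose top frame is a native write does not rule this explanation out.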
Details: The stack dump (jstack -l pid) gives me:
2012-04-17 12:23:02
Full thread dump Java HotSpot(TM) 64-Bit Server VM (21.0-b17 mixed mode):
"Attach Listener" daemon prio=10 tid=0x00000000061a3800 nid=0x75c5 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
Locked ownable synchronizers:
- None
"s3gp-2" prio=10 tid=0x00002aaabc207000 nid=0x281f runnable [0x0000000041718000]
java.lang.Thread.State: RUNNABLE
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:318)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
- locked <0x00000000f601c4f8> (a java.io.BufferedOutputStream)
at java.io.PrintStream.write(PrintStream.java:482)
- locked <0x00000000f601c4d8> (a java.io.PrintStream)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
at sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:104)
- locked <0x00000000f601c6f0> (a java.io.OutputStreamWriter)
at java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:185)
at java.io.PrintStream.write(PrintStream.java:527)
- locked <0x00000000f601c4d8> (a java.io.PrintStream)
at java.io.PrintStream.print(PrintStream.java:669)
at java.io.PrintStream.append(PrintStream.java:1065)
at java.io.PrintStream.append(PrintStream.java:57)
at java.util.Formatter$FixedString.print(Formatter.java:2563)
at java.util.Formatter.format(Formatter.java:2476)
at java.io.PrintStream.format(PrintStream.java:970)
- locked <0x00000000f601c4d8> (a java.io.PrintStream)
at com.foo.serv.util.Loader$IOStats.add(Loader.java:136)
- locked <0x00000000f6015870> (a com.foo.serv.util.Loader$IOStats)
at com.foo.serv.util.Loader$Reader.run(Loader.java:199)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Locked ownable synchronizers:
- <0x00000000f6024470> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"s3gp-1" prio=10 tid=0x00002aaabc1fa000 nid=0x281e waiting for monitor entry [0x00000000405d9000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.foo.serv.util.Loader$IOStats.add(Loader.java:129)
- waiting to lock <0x00000000f6015870> (a com.foo.serv.util.Loader$IOStats)
at com.foo.serv.util.Loader$Reader.run(Loader.java:199)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Locked ownable synchronizers:
- <0x00000000f601ea50> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"Service Thread" daemon prio=10 tid=0x00002aaab0054000 nid=0x27e7 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
Locked ownable synchronizers:
- None
"C2 CompilerThread1" daemon prio=10 tid=0x00002aaab0051800 nid=0x27e6 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
Locked ownable synchronizers:
- None
"C2 CompilerThread0" daemon prio=10 tid=0x00002aaab004e800 nid=0x27e4 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
Locked ownable synchronizers:
- None
"Signal Dispatcher" daemon prio=10 tid=0x00002aaab004c000 nid=0x27e3 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
Locked ownable synchronizers:
- None
"Finalizer" daemon prio=10 tid=0x00002aaab0001000 nid=0x2762 in Object.wait() [0x00000000428cc000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000f6015730> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
- locked <0x00000000f6015730> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:177)
Locked ownable synchronizers:
- None
"Reference Handler" daemon prio=10 tid=0x0000000005edc800 nid=0x2760 in Object.wait() [0x00000000427cb000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000f603b9c0> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:503)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
- locked <0x00000000f603b9c0> (a java.lang.ref.Reference$Lock)
Locked ownable synchronizers:
- None
"main" prio=10 tid=0x0000000005e3f800 nid=0x2744 waiting on condition [0x0000000041abe000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000f61df188> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1433)
at com.foo.serv.util.Loader.execute(Loader.java:107)
at com.foo.serv.util.Loader.main(Loader.java:571)
Locked ownable synchronizers:
- None
"VM Thread" prio=10 tid=0x0000000005ed5000 nid=0x275e runnable
"GC task thread#0 (ParallelGC)" prio=10 tid=0x0000000005e4a000 nid=0x2745 runnable
"GC task thread#1 (ParallelGC)" prio=10 tid=0x0000000005e4c000 nid=0x2746 runnable
"GC task thread#2 (ParallelGC)" prio=10 tid=0x0000000005e4e000 nid=0x2747 runnable
"GC task thread#3 (ParallelGC)" prio=10 tid=0x0000000005e4f800 nid=0x2748 runnable
"GC task thread#4 (ParallelGC)" prio=10 tid=0x0000000005e51800 nid=0x2749 runnable
"GC task thread#5 (ParallelGC)" prio=10 tid=0x0000000005e53800 nid=0x274a runnable
"GC task thread#6 (ParallelGC)" prio=10 tid=0x0000000005e55000 nid=0x274b runnable
"GC task thread#7 (ParallelGC)" prio=10 tid=0x0000000005e57000 nid=0x274c runnable
"GC task thread#8 (ParallelGC)" prio=10 tid=0x0000000005e59000 nid=0x274d runnable
"GC task thread#9 (ParallelGC)" prio=10 tid=0x0000000005e5b000 nid=0x274e runnable
"GC task thread#10 (ParallelGC)" prio=10 tid=0x0000000005e5c800 nid=0x274f runnable
"GC task thread#11 (ParallelGC)" prio=10 tid=0x0000000005e5e800 nid=0x2750 runnable
"GC task thread#12 (ParallelGC)" prio=10 tid=0x0000000005e60800 nid=0x2751 runnable
"VM Periodic Task Thread" prio=10 tid=0x00002aaab005e800 nid=0x27eb waiting on condition
JNI global references: 207
