Questions about native call trace using Java

We're debugging an error that causes a crash in a Tomcat web application.
The application uses two third-party products over JNI. One of them uses SmartHeap (a memory management library for C/C++ applications); the other (webMethods Broker version 5) does not.
The strange thing is that in the crash log I see webMethods calling its native methods to initiate a connection to the broker server, but when I print the call trace of the thread where the crash happened using WinDbg (loading the minidump file created when the JVM crashed), it contains calls to SmartHeap functions (in fact, a memory allocation is being called). Now I feel a bit lost, because I've checked and found no references to this DLL from the webMethods binaries.
My question is: how is this possible?
Can anybody describe how this part works? I thought that the interpreted/compiled and native frames are called in a fixed order (which would be logical).
Maybe the call stack is invalid? (We now have many dump files with almost the same call trace.)
Or maybe the call trace (the calling order of the native functions) is valid, and only some of the functions were reordered before being called (for example, a lazy object that has to be generated before sending it to the webMethods broker, but I don't see any sign of this).
I'm querying the call trace on the dump file by calling ".ecxr" and "kv"; the output is:
0:060> .ecxr
eax=4d330554 ebx=4d350010 ecx=4d330010 edx=00000000 esi=4d350010 edi=00000000
eip=4c912f15 esp=4bf1dad0 ebp=3574884d iopl=0 nv up ei pl nz na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010206
shsmp!shi_allocSmall2+0x195:
4c912f15 8b4d00 mov ecx,dword ptr [ebp] ss:0023:3574884d=????????
0:060> k
*** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr
4bf1daec 4c912bbd shsmp!shi_allocSmall2+0x195
4bf1dafc 4c91b973 shsmp!MemAllocPtr+0x5d
*** WARNING: Unable to verify checksum for awssl50jn.dll
*** ERROR: Symbol file could not be found. Defaulted to export symbols for awssl50jn.dll -
4bf1db14 49abc38d shsmp!shi_malloc_dbg+0x23
WARNING: Stack unwind information not available. Following frames may be wrong.
4bf1db3c 49abeca2 awssl50jn!Java_COM_activesw_api_client_ssl_AwSSLNative_getSecurityInfo+0xa1cd
4bf1db48 49ab5e66 awssl50jn!Java_COM_activesw_api_client_ssl_AwSSLNative_getSecurityInfo+0xcae2
4bf1db4c 49ab5e55 awssl50jn!Java_COM_activesw_api_client_ssl_AwSSLNative_getSecurityInfo+0x3ca6
4bf1db60 49ab667d awssl50jn!Java_COM_activesw_api_client_ssl_AwSSLNative_getSecurityInfo+0x3c95
4bf1db80 49abdbbc awssl50jn!Java_COM_activesw_api_client_ssl_AwSSLNative_getSecurityInfo+0x44bd
4bf1dc20 4c912f4f awssl50jn!Java_COM_activesw_api_client_ssl_AwSSLNative_getSecurityInfo+0xb9fc
4bf1dc78 49abd607 shsmp!shi_allocSmall2+0x1cf
00000000 00000000 awssl50jn!Java_COM_activesw_api_client_ssl_AwSSLNative_getSecurityInfo+0xb447
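Note that WinDbg itself warns above that "Stack unwind information not available. Following frames may be wrong.", and awssl50jn.dll resolved to export symbols only, so every frame in that module is shown as the single exported JNI entry point plus a large offset. Before trusting the shsmp/awssl50jn interleaving, it may be worth checking what the raw addresses actually point at, for example with standard WinDbg commands like these (addresses copied from the registers and trace above):
0:060> lm m shsmp
0:060> lm m awssl50jn
0:060> ln 4c912f4f
0:060> dds 4bf1dad0 4bf1dc80
lm shows each module's load range (so you can see which module a return address really falls in), ln shows the nearest symbol to an address, and dds dumps the raw stack so stale or misattributed return addresses stand out.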
Any help would be appreciated!

Related

Ehcache Initial table allocation failed

One of our web applications runs within Tomcat 7 deployed on an AS400 server, and it uses Ehcache as its cache component to swap data to disk and reduce memory usage.
A few weeks ago, when we tried to deploy this application for one of our customers, it failed at startup, and the log shows:
Caused by: java.lang.IllegalStateException: Cache 'data' creation in EhcacheManager failed.
at org.ehcache.core.EhcacheManager.createCache(EhcacheManager.java:288)
at org.ehcache.core.EhcacheManager.init(EhcacheManager.java:567)
... 7 more
Caused by: org.ehcache.StateTransitionException: Initial table allocation failed.
Initial Table Size (slots) : 64
Allocation Will Require : 1KB
Table Page Source : org.terracotta.offheapstore.disk.paging.MappedPageSource@bc8a4ca2
at org.ehcache.core.StatusTransitioner$Transition.succeeded(StatusTransitioner.java:209)
at org.ehcache.core.Ehcache.init(Ehcache.java:567)
at org.ehcache.core.EhcacheManager.createCache(EhcacheManager.java:261)
... 8 more
Caused by: java.lang.IllegalArgumentException: Initial table allocation failed.
Initial Table Size (slots) : 64
Allocation Will Require : 1KB
Table Page Source : org.terracotta.offheapstore.disk.paging.MappedPageSource@bc8a4ca2
at org.terracotta.offheapstore.OffHeapHashMap.<init>(OffHeapHashMap.java:219)
at org.terracotta.offheapstore.AbstractLockedOffHeapHashMap.<init>(AbstractLockedOffHeapHashMap.java:71)
at org.terracotta.offheapstore.AbstractOffHeapClockCache.<init>(AbstractOffHeapClockCache.java:76)
at org.terracotta.offheapstore.disk.persistent.AbstractPersistentOffHeapCache.<init>(AbstractPersistentOffHeapCache.java:43)
at org.terracotta.offheapstore.disk.persistent.PersistentReadWriteLockedOffHeapClockCache.<init>(PersistentReadWriteLockedOffHeapClockCache.java:36)
at org.ehcache.impl.internal.store.disk.factories.EhcachePersistentSegmentFactory$EhcachePersistentSegment.<init>(EhcachePersistentSegmentFactory.java:73)
at org.ehcache.impl.internal.store.disk.factories.EhcachePersistentSegmentFactory.newInstance(EhcachePersistentSegmentFactory.java:60)
at org.ehcache.impl.internal.store.disk.factories.EhcachePersistentSegmentFactory.newInstance(EhcachePersistentSegmentFactory.java:37)
at org.terracotta.offheapstore.concurrent.AbstractConcurrentOffHeapMap.<init>(AbstractConcurrentOffHeapMap.java:106)
at org.terracotta.offheapstore.concurrent.AbstractConcurrentOffHeapCache.<init>(AbstractConcurrentOffHeapCache.java:48)
at org.terracotta.offheapstore.disk.persistent.AbstractPersistentConcurrentOffHeapCache.<init>(AbstractPersistentConcurrentOffHeapCache.java:52)
at org.ehcache.impl.internal.store.disk.EhcachePersistentConcurrentOffHeapClockCache.<init>(EhcachePersistentConcurrentOffHeapClockCache.java:52)
at org.ehcache.impl.internal.store.disk.OffHeapDiskStore.createBackingMap(OffHeapDiskStore.java:279)
at org.ehcache.impl.internal.store.disk.OffHeapDiskStore.getBackingMap(OffHeapDiskStore.java:167)
at org.ehcache.impl.internal.store.disk.OffHeapDiskStore.access$600(OffHeapDiskStore.java:95)
at org.ehcache.impl.internal.store.disk.OffHeapDiskStore$Provider.init(OffHeapDiskStore.java:460)
at org.ehcache.impl.internal.store.disk.OffHeapDiskStore$Provider.initStore(OffHeapDiskStore.java:456)
at org.ehcache.impl.internal.store.disk.OffHeapDiskStore$Provider.initAuthoritativeTier(OffHeapDiskStore.java:507)
at org.ehcache.impl.internal.store.tiering.TieredStore$Provider.initStore(TieredStore.java:472)
at org.ehcache.core.EhcacheManager$8.init(EhcacheManager.java:499)
at org.ehcache.core.StatusTransitioner.runInitHooks(StatusTransitioner.java:135)
at org.ehcache.core.StatusTransitioner.access$000(StatusTransitioner.java:33)
at org.ehcache.core.StatusTransitioner$Transition.succeeded(StatusTransitioner.java:194)
The code that triggered this is:
CacheConfiguration<String, String[]> dconf = CacheConfigurationBuilder
        .newCacheConfigurationBuilder(String.class, String[].class,
                ResourcePoolsBuilder.heap(11).disk(3, MemoryUnit.GB, false))
        .withExpiry(Expirations.timeToLiveExpiration(Duration.of(30, TimeUnit.MINUTES)))
        .build();
dataCacheManager = CacheManagerBuilder.newCacheManagerBuilder()
        .with(CacheManagerBuilder.persistence(new File(cacheFolder, "requestdata"))) //$NON-NLS-1$
        .withCache(CACHE_NAME_DATA, dconf)
        .build(true);
This surprised us because it has never happened before; we have deployed it on several other customers' servers (Windows, AS400, Linux), and none of them had this issue.
This is a real headache. We spent weeks trying to figure it out: reading source code, tuning JVM parameters, googling around... nothing, except one unanswered post: https://groups.google.com/forum/#!topic/ehcache-users/ApFAe5nYxuA
Can anyone help us with this? Thanks in advance!
The Ehcache 3 disk store uses java.nio.MappedByteBuffer, which requires access to direct memory.
There is no documented default MaxDirectMemorySize in Java, and the same JVM can behave differently on different operating systems.
If you have not already set the flag -XX:MaxDirectMemorySize=3G when launching your application, that could be the cause of the exception you see.
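For example (a sketch only; with Tomcat the flag typically goes into CATALINA_OPTS in bin/setenv.sh, though your startup script may differ):
CATALINA_OPTS="$CATALINA_OPTS -XX:MaxDirectMemorySize=3G"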

org.h2.jdbc.JdbcSQLException: General error: "java.lang.StackOverflowError" [50000-176]

Stack overflow error while using the H2 database in a multi-threaded environment.
Our application has a service layer querying an H2 database and retrieving the result set.
The service layer connects to the H2 database using the open-source clustering middleware "Sequoia" (which offers load balancing and transparent failover) and also manages database connections.
https://sourceforge.net/projects/sequoiadb/
Our service layer has 50 service methods, and we have exposed the service methods as EJBs. When invoking the EJBs, we get the response from the service (which includes an H2 read) with an average response time of 0.2 seconds.
The DAO layer queries the database using Hibernate Criteria, and we also use a JPA 2.0 entity manager to manage the datasource.
For load testing, we created a test class (with a main method) that invokes all 50 EJB methods.
50 threads were created, and all of the threads invoked the test class. The first execution was OK: all 50 threads successfully completed all 50 EJB method invocations.
When we triggered the test class again, we encountered a StackOverflowError. The detailed stack trace is shown below:
org.h2.jdbc.JdbcSQLException: General error: "java.lang.StackOverflowError" [50000-176]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:344)
at org.h2.message.DbException.get(DbException.java:167)
at org.h2.message.DbException.convert(DbException.java:290)
at org.h2.server.TcpServerThread.sendError(TcpServerThread.java:222)
at org.h2.server.TcpServerThread.run(TcpServerThread.java:155)
at java.lang.Thread.run(Thread.java:784)
Caused by: java.lang.StackOverflowError
at java.lang.Character.digit(Character.java:4505)
at java.lang.Integer.parseInt(Integer.java:458)
at java.lang.Integer.parseInt(Integer.java:510)
at java.text.MessageFormat.makeFormat(MessageFormat.java:1348)
at java.text.MessageFormat.applyPattern(MessageFormat.java:469)
at java.text.MessageFormat.<init>(MessageFormat.java:361)
at java.text.MessageFormat.format(MessageFormat.java:822)
at org.h2.message.DbException.translate(DbException.java:92)
at org.h2.message.DbException.getJdbcSQLException(DbException.java:343)
at org.h2.message.DbException.get(DbException.java:167)
at org.h2.message.DbException.convert(DbException.java:290)
at org.h2.command.Command.executeUpdate(Command.java:262)
at org.h2.jdbc.JdbcPreparedStatement.execute(JdbcPreparedStatement.java:199)
at org.h2.server.TcpServer.addConnection(TcpServer.java:140)
at org.h2.server.TcpServerThread.run(TcpServerThread.java:152)
... 1 more
at org.h2.engine.SessionRemote.done(SessionRemote.java:606)
at org.h2.engine.SessionRemote.initTransfer(SessionRemote.java:129)
at org.h2.engine.SessionRemote.connectServer(SessionRemote.java:430)
at org.h2.engine.SessionRemote.connectEmbeddedOrServer(SessionRemote.java:311)
at org.h2.jdbc.JdbcConnection.<init>(JdbcConnection.java:107)
at org.h2.jdbc.JdbcConnection.<init>(JdbcConnection.java:91)
at org.h2.Driver.connect(Driver.java:74)
at org.continuent.sequoia.controller.connection.DriverManager.getConnectionForDriver(DriverManager.java:266)
We then even added a random thread sleep (10-25 seconds) between EJB invocations. The execution was successful three times (all 50 EJB invocations), and when we triggered it for the 4th time, it failed with the above error.
We see the above failure even with a thread count of 25.
The failure is random, and there doesn't seem to be a pattern. Kindly let us know if we have missed any configuration.
Please let me know if you need any additional information. Thanks in advance for any help.
Technology stack:
1) Java 1.6
2) h2-1.3.176
3) Sequoia middleware that manages DB connection open and close
   - variable connection pool manager
   - initial pool size 250
Thanks Lance Java for your suggestions. Increasing the stack size didn't help in our scenario; the additional stack only allowed a few more executions, for the following reason.
In our app we are using the JPA entity manager, and the transaction attribute was not set. Hence each query to the database created a thread to carry out the execution. In JVisualVM we observed the DB threads: the number of live threads was equal to the total number of threads started.
Eventually our app created more than 30K threads, which resulted in the StackOverflowError.
Upon setting the transaction attribute (sketched below), the threads are released after DB execution, and all transactions are then managed by only 25-30 threads.
The issue is resolved now.
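For reference, a minimal sketch of what "setting the transaction attribute" can look like on a session bean (the bean name and method are hypothetical; the real application's beans will differ):
import javax.ejb.Stateless;
import javax.ejb.TransactionAttribute;
import javax.ejb.TransactionAttributeType;

@Stateless
@TransactionAttribute(TransactionAttributeType.REQUIRED)
public class DataService {
    // With container-managed transactions, each call runs in a managed
    // transaction, and the connection/thread is returned to the pool when
    // the method completes instead of lingering per query.
    public void queryData() {
        // ... JPA entity manager work ...
    }
}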
There are two main causes for a stack overflow error:
1) A bug containing a non-terminating recursive call
2) The allocated stack size for the JVM isn't big enough
Looking at your stack trace, it doesn't look recursive, so I'm guessing you are running out of space. Have you set the -Xss flag for your JVM? You might need to increase this value.
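For example (the value and launch command are illustrative only; in an application server the flag belongs in the server's JVM options):
java -Xss2m -jar your-app.jar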

error when opening a lucene index: java.io.IOException: Map failed

I have a problem which I think is the same as that described here:
Error when opening a lucene index: Map failed
However, the solution does not apply in this case, so I am providing more details and asking again.
The index is created using Solr 5.3.
The line of code causing the exception is:
IndexReader indexReader = DirectoryReader.open(FSDirectory.open(Paths.get("the_path")));
The exception stack trace is:
Exception in thread "main" java.io.IOException: Map failed: MMapIndexInput(path="/mnt/fastdata/ac1zz/JATE/solr-5.3.0/server/solr/jate/data_aclrd/index/_5t.tvd") [this may be caused by lack of enough unfragmented virtual address space or too restrictive virtual memory limits enforced by the operating system, preventing us to map a chunk of 434505698 bytes. Please review 'ulimit -v', 'ulimit -m' (both should return 'unlimited'), and 'sysctl vm.max_map_count'. More information: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html]
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:907)
at org.apache.lucene.store.MMapDirectory.map(MMapDirectory.java:265)
at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:239)
at org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.<init>(CompressingTermVectorsReader.java:144)
at org.apache.lucene.codecs.compressing.CompressingTermVectorsFormat.vectorsReader(CompressingTermVectorsFormat.java:91)
at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:120)
at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:65)
at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:58)
at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:50)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:731)
at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:50)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63)
at uk.ac.shef.dcs.jate.app.AppATTF.extract(AppATTF.java:39)
at uk.ac.shef.dcs.jate.app.AppATTF.main(AppATTF.java:33)
The solutions suggested in the exception message do not work in this case, because I am running the application on a server where I do not have permission to change those settings.
Namely,
ulimit -v unlimited
prints: "-bash: ulimit: virtual memory: cannot modify limit: Operation not permitted"
and
sysctl -w vm.max_map_count=10000000
gives: "error: permission denied on key 'vm.max_map_count'"
Is there any way I can solve this?
Thanks
I have found a solution, so I am answering my own question.
If you really cannot set ulimit or vm.max_map_count, the only solution, according to http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html, is to configure Solr (or, if you work with the Lucene API directly, to choose explicitly) to use SimpleFSDirectory (on Windows) or NIOFSDirectory. Both are slower than the default.
For example:
DirectoryReader.open(new NIOFSDirectory(Paths.get("path_to_index"), FSLockFactory.getDefault()))
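Spelled out with imports (a minimal sketch assuming Lucene 5.x on the classpath; "path_to_index" is a placeholder):
import java.nio.file.Paths;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.FSLockFactory;
import org.apache.lucene.store.NIOFSDirectory;

// NIOFSDirectory reads through java.nio file channels instead of mmap,
// so it is not subject to virtual address space or max_map_count limits.
IndexReader indexReader = DirectoryReader.open(
        new NIOFSDirectory(Paths.get("path_to_index"), FSLockFactory.getDefault()));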

com.fasterxml.jackson.core.util.InternCache.intern() deadlock under high load

Env:
Amazon
CentOS
Apache-tomcat-7.0.53
Java 8
Jackson-core-2.2.3
Problem
When we test our servers with a load of ~7000 CCU, we see several of the following when we profile our app servers using YourKit:
http-apr-8080-exec-952 <--- Frozen for at least 17 sec
com.fasterxml.jackson.core.util.InternCache.intern(String) InternCache.java:43
com.fasterxml.jackson.core.sym.CharsToNameCanonicalizer.findSymbol(char[], int, int, int) CharsToNameCanonicalizer.java:506
com.fasterxml.jackson.core.json.ReaderBasedJsonParser._parseFieldName(int) ReaderBasedJsonParser.java:1182
com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken() ReaderBasedJsonParser.java:602
com.fasterxml.jackson.core.base.ParserMinimalBase.nextValue() ParserMinimalBase.java:128
What can we do to improve the performance of this library?
I found the reason: we were not closing the parser instance. After closing the parser instance in a finally block, this issue vanished.
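A minimal sketch of the fix (illustrative only; the real parsing code in the application differs):
import java.io.IOException;

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;

public class JsonScan {
    private static final JsonFactory FACTORY = new JsonFactory();

    public static void scan(String json) throws IOException {
        JsonParser parser = FACTORY.createParser(json);
        try {
            while (parser.nextToken() != null) {
                // ... handle tokens ...
            }
        } finally {
            // Closing the parser releases its buffers and symbol table back
            // to the factory; parsers that are never closed keep re-interning
            // field names through the shared InternCache, which is where the
            // threads above were frozen.
            parser.close();
        }
    }
}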

Java Quartz Ibatis Cron Issues

I have a Java webapp using an iBATIS row handler to load a very large dataset (1 million rows in an InnoDB table). The process runs as a nightly cron job under the Quartz scheduler. However, after processing for 6 minutes, it dies with the following stack trace:
WARN [DefaultQuartzScheduler_Worker-8] MethodInvokingJobDetailFactoryBean$MethodInvokingJob.executeInternal(168) | Could not invoke method 'doBatch' on target object [org.myCron#4adb34]
org.springframework.jdbc.UncategorizedSQLException: SqlMapClient operation: encountered SQLException [
--- The error occurred in org/myCron/mySqlMap.xml.
--- The error occurred while applying a result map.
--- Check the mySqlMap.outputMapping.
--- The error happened while setting a property on the result object.
--- Cause: com.mysql.jdbc.CommunicationsException: Communications link failure due to underlying exception:
** BEGIN NESTED EXCEPTION **
java.io.EOFException
STACKTRACE:
java.io.EOFException
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1903)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2402)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2860)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:771)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1289)
at com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:362)
at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:352)
at com.mysql.jdbc.ResultSet.next(ResultSet.java:6106)
at org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:168)
at sun.reflect.GeneratedMethodAccessor71.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:592)
at com.ibatis.common.jdbc.logging.ResultSetLogProxy.invoke(ResultSetLogProxy.java:47)
at $Proxy10.next(Unknown Source)
at com.ibatis.sqlmap.engine.execution.SqlExecutor.handleResults(SqlExecutor.java:380)
at com.ibatis.sqlmap.engine.execution.SqlExecutor.handleMultipleResults(SqlExecutor.java:301)
at com.ibatis.sqlmap.engine.execution.SqlExecutor.executeQuery(SqlExecutor.java:190)
at com.ibatis.sqlmap.engine.mapping.statement.GeneralStatement.sqlExecuteQuery(GeneralStatement.java:205)
at com.ibatis.sqlmap.engine.mapping.statement.GeneralStatement.executeQueryWithCallback(GeneralStatement.java:173)
at com.ibatis.sqlmap.engine.mapping.statement.GeneralStatement.executeQueryWithRowHandler(GeneralStatement.java:133)
at com.ibatis.sqlmap.engine.impl.SqlMapExecutorDelegate.queryWithRowHandler(SqlMapExecutorDelegate.java:649)
at com.ibatis.sqlmap.engine.impl.SqlMapSessionImpl.queryWithRowHandler(SqlMapSessionImpl.java:156)
at com.ibatis.sqlmap.engine.impl.SqlMapClientImpl.queryWithRowHandler(SqlMapClientImpl.java:133)
at org.springframework.orm.ibatis.SqlMapClientTemplate$5.doInSqlMapClient(SqlMapClientTemplate.java:267)
at org.springframework.orm.ibatis.SqlMapClientTemplate.execute(SqlMapClientTemplate.java:165)
at org.springframework.orm.ibatis.SqlMapClientTemplate.queryWithRowHandler(SqlMapClientTemplate.java:265)
at org.myCron.doBatch(MyCron.java:57)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:592)
at org.springframework.util.MethodInvoker.invoke(MethodInvoker.java:248)
at org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean$MethodInvokingJob.executeInternal(MethodInvokingJobDetailFactoryBean.java:165)
at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:66)
at org.quartz.core.JobRunShell.run(JobRunShell.java:191)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:516)
** END NESTED EXCEPTION **
The stack trace is very vague. The only hint I see is 'the error happened while setting a property on the result object'. There are only two properties on the result object: a String and an Integer. Both of them permit null values, but my select statements indicate that neither of them has any null values. They both have a proper getter/setter (which makes sense, since the process runs successfully for a while before dying). Every time the cron runs, it dies at a random point (so it isn't stuck on a particular row).
Note: the method 'doBatch' does exist, since that is the method that starts the cron process. If it couldn't find doBatch, it couldn't successfully process the first thousand rows.
I've also tried running the job outside of Quartz, and it fails there as well. We tried increasing our MySQL net_read_timeout, net_write_timeout, and delayed_insert_timeout, but none of these settings helped. I also set my log4j level to DEBUG and did not get any helpful info.
Any other ideas about what I could try?
Sounds like MySQL closed the connection for some reason. Check the MySQL log to see if anything shows up; turn on various logging options for MySQL if necessary.
Also, start printing debug data (including timestamps) from your app. Just print everything, then see what the last action was; perhaps you have some rarely triggered condition in your code that has a bug.
That is, every single time you talk to MySQL, log it before and after.
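A minimal sketch of that kind of bracketing log (names are hypothetical; adapt it to the row-handler code):
import java.util.concurrent.Callable;
import java.util.logging.Logger;

public class DbTrace {
    private static final Logger LOG = Logger.getLogger(DbTrace.class.getName());

    // Wrap every MySQL interaction so the log shows a BEGIN with no matching
    // END right before the connection drops.
    public static <T> T traced(String label, Callable<T> dbCall) throws Exception {
        LOG.info(System.currentTimeMillis() + " BEGIN " + label);
        try {
            return dbCall.call();
        } finally {
            LOG.info(System.currentTimeMillis() + " END " + label);
        }
    }
}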
