JTA Transaction Timeout Troubleshooting


Setup:
Oracle 12 DB
JBoss EAP7
Web service running on JBoss, inserts into the DB
Batch program calling the web service from multiple threads, about 130,000 times in the span of an hour
The problem:
2018-04-26 18:20:44,675 +0200 [WARN ] [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check timeout for TX 0:ffffac110923:-4c44ed1d:5ac9329e:6866ea in state RUN
2018-04-26 18:20:44,675 +0200 [WARN ] [com.arjuna.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012095: Abort of action id 0:ffffac110923:-4c44ed1d:5ac9329e:6866ea invoked while multiple threads active within it.
2018-04-26 18:20:44,679 +0200 [WARN ] [com.arjuna.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012381: Action id 0:ffffac110923:-4c44ed1d:5ac9329e:6866ea completed with multiple threads - thread default task-48 was in progress with xxx.BaseEntity.getNextValue(BaseEntity.java:28)
This happens routinely in the production environment under heavy load, not when processing fewer records and not in an identical test environment with the exact same load.
The last line shows that this transaction timeout (300s) occurs while fetching the next value from a sequence:
CREATE SEQUENCE "XXX_S" MINVALUE xxx MAXVALUE xxx INCREMENT BY 1 START WITH xxx CACHE 2 NOORDER NOCYCLE NOPARTITION ;
I know Oracle needs to lock/unlock the sequence in order to keep it consistent, so my parallel webservice calls must somehow run into a deadlock or massive contention, producing the timeout.
How do I find the root of this problem? Which parameters can I try to manipulate?

Issue is now resolved, though very unsatisfyingly. We removed parallelism.

Related

Spring boot metrics shows HikariCP connection creation count 1, when HikariCP debug log's connection total is 2

I use Spring Boot 2.0.2 to build a web application with the default connection pool, HikariCP.
The HikariCP debug log shows a total of 2 connections, but the Spring Boot metrics show a connection creation count of 1.
Did I misunderstand something?
Thanks in advance.
My application.yml is below:
spring:
  datasource:
    minimum-idle: 2
    maximum-pool-size: 7
Log:
DEBUG 8936 --- [l-1 housekeeper] com.zaxxer.hikari.pool.HikariPool : HikariPool-1 - After cleanup stats (total=2, active=0, idle=2, waiting=0)
URL for metrics: http://localhost:8080/xxx/metrics/hikaricp.connections.creation
Response:
{
  name: "hikaricp.connections.creation",
  measurements:
  [
    {
      statistic: "COUNT",
      value: 1 <--- I think this should be 2
    },
    ...
  ]
}

What you are seeing is HikariCP's fail-fast check and how it interacts with metrics tracking at startup.
(I actually dug into this, as I didn't know the answer beforehand.)
During the fail-fast check a MetricsTracker isn't set yet, so the initial connection creation isn't counted. If that initial connection can be established, HikariCP simply keeps it in the pool. In your case only the next connection creation is counted.
If you really want the metric value to be "correct", you can set spring.datasource.hikari.initialization-fail-timeout=-1. The behaviour is described in HikariCP's README under initializationFailTimeout.
Whether you really need a "correct" value is debatable, as you'll only miss that initial count. Ideally you'll want to reason about connection creation counts in a specific time window, e.g. the rate of connection creations per minute, to determine whether connections are being disposed of from the pool too early.
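To illustrate the "rate per time window" idea, here is a minimal sketch that turns two samples of the creation counter into a per-minute rate. The class and method names (CreationRate, perMinute) are hypothetical and not part of HikariCP or Spring Boot; the two counts are assumed to be values read from the hikaricp.connections.creation metric at the start and end of the window.

```java
import java.time.Duration;

/** Hypothetical helper: converts two counter samples into a per-minute rate. */
public final class CreationRate {
    private CreationRate() {}

    /**
     * @param countAtStart counter value sampled at the start of the window
     * @param countAtEnd   counter value sampled at the end of the window
     * @param window       length of the sampling window (must be positive)
     * @return connection creations per minute over the window
     */
    public static double perMinute(long countAtStart, long countAtEnd, Duration window) {
        if (window.isZero() || window.isNegative()) {
            throw new IllegalArgumentException("window must be positive");
        }
        if (countAtEnd < countAtStart) {
            // A smaller end value usually means the counter was reset (e.g. a restart).
            throw new IllegalArgumentException("counter went backwards");
        }
        double minutes = window.toMillis() / 60_000.0;
        return (countAtEnd - countAtStart) / minutes;
    }
}
```

Because only the difference between the two samples matters, the uncounted initial connection does not affect the result.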

Hazelcast custom timeout for operations

I am using the "hazelcast.operation.call.timeout.millis = 100" setting to time out Hazelcast operations.
But at Hazelcast startup some of the map size operations time out because of this setting. I only want to time out the operations that run after the map has loaded, which are basically map get operations. Is there any way to set a custom operation timeout for just those map.get() operations?
Or is there any other way to get this done?
com.hazelcast.core.OperationTimeoutException: HDMapSizeOperation got rejected before execution due to not starting within the operation-call-timeout of: 100ms. Current time: 2017-05-15 11:41:47.503. Start time: 2017-05-15 11:41:44.189. Total elapsed time: 3314 ms. Invocation{op=com.hazelcast.map.impl.operation.HDMapSizeOperation{serviceName='hz:impl:mapService', identityHash=1941379381, partitionId=0, replicaIndex=0, callId=-24461, invocationTime=1494828707296 (2017-05-15 11:41:47.296), waitTimeout=-1, callTimeout=100, name=blockMap}, tryCount=250, tryPauseMillis=500, invokeCount=11, callTimeoutMillis=100, firstInvocationTimeMs=1494828704189, firstInvocationTime='2017-05-15 11:41:44.189', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 05:30:00.000', target=[192.168.2.204]:5701, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=null}
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.newOperationTimeoutException(InvocationFuture.java:151)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolve(InvocationFuture.java:99)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveAndThrowIfException(InvocationFuture.java:75)
at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:155)
at com.hazelcast.spi.impl.operationservice.impl.InvokeOnPartitions.retryFailedPartitions(InvokeOnPartitions.java:143)
at com.hazelcast.spi.impl.operationservice.impl.InvokeOnPartitions.invoke(InvokeOnPartitions.java:73)
at com.hazelcast.spi.impl.operationservice.impl.OperationServiceImpl.invokeOnAllPartitions(OperationServiceImpl.java:371)
at com.hazelcast.map.impl.proxy.MapProxySupport.size(MapProxySupport.java:628)
at com.hazelcast.map.impl.proxy.MapProxyImpl.size(MapProxyImpl.java:102)
at it.XXXX.tbx.server.MapLoader.run(MapLoader.java:36)
Regards,
Tharinda

If you are trying to control how long you wait for the result of, e.g., a map.get, you could have a look at the asynchronous version, map.getAsync. It returns a future, and you can control how long you want to wait for the result.
Modifying the call timeout is not advised.
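A minimal sketch of the bounded wait, written against the standard java.util.concurrent.Future interface so it runs without Hazelcast on the classpath. The helper name (BoundedWait.getOrFallback) is hypothetical; the assumption is that the future passed in is the one returned by map.getAsync(key), whose exact return type depends on the Hazelcast version.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: waits a bounded time for an asynchronous result. */
public final class BoundedWait {
    private BoundedWait() {}

    /**
     * Returns the future's value, or the fallback if it is not available
     * within the timeout. The future itself is left running, not cancelled.
     */
    public static <V> V getOrFallback(Future<V> future, long timeout, TimeUnit unit, V fallback) {
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            return fallback;
        } catch (InterruptedException e) {
            // Preserve the interrupt flag for the caller.
            Thread.currentThread().interrupt();
            return fallback;
        } catch (ExecutionException e) {
            throw new IllegalStateException("lookup failed", e.getCause());
        }
    }
}
```

With this approach the 100 ms limit applies only to the calls that go through the helper, so startup operations such as size() keep the default call timeout.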

Thread associated with transaction suspended for a long time in JBoss 6.1.0

I am facing a serious problem in JBoss 6.1.0. It is a multi-threaded application, and I am using stateless EJBs with BMT and a Sybase DB. The JDK is 1.7.76u. A user transaction is started and the queries run, but the associated thread only tries to commit after ONE HOUR. I do not know what happened to the executing thread in between. It was certainly suspended, but not by the code.
Can anyone give me a pointer as to why the thread was suspended for more than an hour? Obviously, when the thread resumes after an hour, the COMMIT or ROLLBACK fails, and has failed, because the default transaction timeout is 300 seconds (the JBoss 6 default).
2017-01-09 10:01:49,389 DEBUG [TestDAO] [EventId: ] [pool-63-thread-6] SQL SELECT QUERY
2017-01-09 10:01:49,391 DEBUG [TestDAO] [EventId: ] [pool-63-thread-6] ['dao.rowsProcessed']: 1 rows processed
2017-01-09 10:01:49,389 DEBUG [TestDAO] [EventId: ] [pool-63-thread-6] SQL UPDATE QUERY
2017-01-09 10:01:49,391 DEBUG [TestDAO] [EventId: ] [pool-63-thread-6] ['dao.rowsUpdated']: 1 row updated
2017-01-09 11:05:48,213 DEBUG [DAOUtils] [EventId: ] [pool-63-thread-6] commitTx
2017-01-09 11:05:48,214 ERROR [DAOUtils] [EventId: ] [pool-63-thread-6] commitTx() ARJUNA-16063 The transaction is not active!
2017-01-09 11:05:48,215 DEBUG [DAOUtils] [EventId: ] [pool-63-thread-6] rollbackTx
2017-01-09 11:05:48,215 ERROR [DAOUtils] [EventId: ] [pool-63-thread-6] rollbackTx() java.lang.IllegalStateException - BaseTransaction.rollback - ARJUNA-16074 no transaction!

It seems you have a long-running transaction that is timing out.
"The transaction is not active!" errors are caused by a transaction timeout. When a transaction times out, the transaction manager rolls it back asynchronously, and when a component then tries to access the transaction again (e.g. to commit it or roll it back), it won't be able to, according to the JTA spec.
The default transaction timeout is defined by the "default-timeout" attribute of the "transactions" subsystem in the application server configuration.
The default is 300 seconds (5 minutes).
You may increase this value to extend the default transaction timeout.
You may set the value to 0 to disable the transaction reaper / transaction timeout.
The application server VM must be restarted for the default-timeout change to take effect.
<subsystem xmlns="urn:jboss:domain:transactions:1.4">
    <coordinator-environment default-timeout="300"/> <!-- HERE -->
</subsystem>
It looks to me like it is taking longer than 5 minutes to process the message, so its transaction is timing out.
I would recommend increasing the transaction timeout to avoid this situation. It would also be good to refactor the application code to reduce the time taken to complete a transaction. So it may be that the application logic is correctly handling the scenario in this case.

As I mentioned in the JBoss forum, this is not an issue with the transaction timeout.
There is no point in extending the transaction timeout, as that blocks all the other applications because locks in the database are held by the transaction.
The threads executing the transaction are frozen. Any hints on why this thread is blocked from committing would be of great help.
Rgds
Manohar

EJB JPA CMT - Flush failure on large dataset

I've got a JBoss EAP 6.3, JPA 2.0, EJB 3.1, CMT JTA web app. The DB is MSSQL 2008 R2, using the MS JDBC driver, with Hibernate 4.2.14 under the hood.
I've got a method that looks kind of like this, to duplicate a million Prices entities:
public void doStuff(Date newDate)
{
    List<Prices> prices = dao.getPrices(); //<< 1000000+ prices
    for (Prices price : prices)
    {
        Prices copy = price.clone();
        copy.setDate(newDate);
        entityManager.persist(copy);
        if (newDate.before(someDate))
        {
            price.setDate(someDate);
            entityManager.merge(price);
        }
    }
}
I set the JBoss EJB coordinator timeout to an hour, to let it run. I increased the heap size to -Xmx 3G after it ran out of memory the first time.
The code starts at 1:24am, finishes at 1:36am, and then at 2:24am it fails with a transaction error and rolls back. The stack trace says it's during the flush.
at org.hibernate.ejb.AbstractEntityManagerImpl$CallbackExceptionMapperImpl.mapManagedFlushFailure(AbstractEntityManagerImpl.java:1510) [hibernate-entitymanager-4.2.14.SP1-redhat-1.jar:4.2.14.SP1-redhat-1]
I can see that if I break up the million into chunks of 10000 and flush after each, it doesn't even get near a million during the hour. So flushing is clearly an expensive task. But I suppose it starts flushing implicitly during JTA's post-intercept commit.
Should I just increase the timeout and try again? It is a DEV database being used by a few others, and my code seems to lock the prices table, making it unqueryable from SSMS, so it's not something I want to let run indefinitely. But is this just an issue of needing more time?
Start of stacktrace:
02:24:45,157 WARN [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check timeout for TX 0:ffff0a14021f:3d218bb8:56009132:22 in state RUN
02:24:45,169 WARN [com.arjuna.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012095: Abort of action id 0:ffff0a14021f:3d218bb8:56009132:22 invoked while multiple threads active within it.
02:24:45,169 WARN [com.arjuna.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012108: CheckedAction::check - atomic action 0:ffff0a14021f:3d218bb8:56009132:22 aborting with 1 threads active!
02:24:45,667 WARN [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check timeout for TX 0:ffff0a14021f:3d218bb8:56009132:22 in state CANCEL
02:24:46,209 WARN [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check timeout for TX 0:ffff0a14021f:3d218bb8:56009132:22 in state CANCEL_INTERRUPTED
02:24:46,210 WARN [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012120: TransactionReaper::check worker Thread[Transaction Reaper Worker 0,5,main] not responding to interrupt when cancelling TX 0:ffff0a14021f:3d218bb8:56009132:22 -- worker marked as zombie and TX scheduled for mark-as-rollback
02:24:46,210 WARN [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012110: TransactionReaper::check successfuly marked TX 0:ffff0a14021f:3d218bb8:56009132:22 as rollback only
02:25:07,968 WARN [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (http-/0.0.0.0:8080-1) SQL Error: 0, SQLState: null
02:25:07,968 ERROR [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (http-/0.0.0.0:8080-1) Transaction cannot proceed STATUS_ROLLEDBACK
02:25:08,085 WARN [com.arjuna.ats.arjuna] (http-/0.0.0.0:8080-1) ARJUNA012125: TwoPhaseCoordinator.beforeCompletion - failed for SynchronizationImple< 0:ffff0a14021f:3d218bb8:56009132:24, org.hibernate.engine.transaction.synchronization.internal.RegisteredSynchronization#2d633a18 >: javax.persistence.PersistenceException: org.hibernate.exception.GenericJDBCException: could not prepare statement

Well, I rewrote it as SQL and used two entityManager.createNativeQuery calls instead of programmatic JPA, and it finished in 30 seconds or so.
So the lesson is: don't bother with JPA for large data sets. Work out the solution in SQL, and then grab the straight JDBC connection to do it.
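For reference, the chunked variant described in the question (process 10000 entities, then flush) can be sketched as the generic loop below. This is only an illustration of the chunking control flow, not the SQL fix that actually solved the problem. The names ChunkedWork and forEachChunk are hypothetical; in a JPA setting the afterChunk callback is assumed to be something like a call to entityManager.flush() followed by entityManager.clear().

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: runs a callback after every chunk of processed items. */
public final class ChunkedWork {
    private ChunkedWork() {}

    /**
     * Applies perItem to every element, running afterChunk once per full
     * chunk and once more for a trailing partial chunk.
     *
     * @return the number of times afterChunk was run
     */
    public static <T> int forEachChunk(List<T> items, int chunkSize,
                                       Consumer<T> perItem, Runnable afterChunk) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int chunks = 0;
        int inChunk = 0;
        for (T item : items) {
            perItem.accept(item);
            if (++inChunk == chunkSize) {
                afterChunk.run(); // assumed: flush and clear the persistence context here
                chunks++;
                inChunk = 0;
            }
        }
        if (inChunk > 0) {
            afterChunk.run(); // handle the last, partially filled chunk
            chunks++;
        }
        return chunks;
    }
}
```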

Transaction is alternating Timeouts

I am using JBoss 5.1.x and EJB 3.0.
I have an MDB which listens to a JMS queue. When the MDB takes a message, it dispatches a message via TCP to a modem.
Sometimes the modem doesn't respond while the server is waiting for an answer:
byte[] byteData = receive(is);
because I can't set a timeout on the InputStream.
So, thanks to the EJB container, the transaction timeout (which is there by default) rolls back the operation, and then a retry is executed.
By default this mechanism works fine for me. The problem is:
Sometimes the transaction never times out, and after a long time I get the following message in the console:
15:18:22,578 WARN [arjLoggerI18N] [com.arjuna.ats.arjuna.coordinator.TransactionReaper_18] - TransactionReaper::check timeout for TX a6b2232:5f8:4d3591c6:76 in state RUN
15:18:22,578 WARN [arjLoggerI18N] [com.arjuna.ats.arjuna.coordinator.BasicAction_58] - Abort of action id a6b2232:5f8:4d3591c6:76 invoked while multiple threads active within it.
15:18:22,578 WARN [arjLoggerI18N] [com.arjuna.ats.arjuna.coordinator.CheckedAction_2] - CheckedAction::check - atomic action a6b2232:5f8:4d3591c6:76 aborting with 1 threads active!
15:18:22,578 WARN [arjLoggerI18N] [com.arjuna.ats.arjuna.coordinator.TransactionReaper_7] - TransactionReaper::doCancellations worker Thread[Thread-10,5,jboss] successfully canceled TX a6b2232:5f8:4d3591c6:76
Any idea what's wrong? And why does it sometimes work and sometimes not?
thanks,
ray.

JBoss AS uses the Arjuna transaction manager. In EJB3, the interceptor chain begins to unroll and eventually hits the transaction manager interceptors, whose job it is to abort the transaction.
For MDBs you can annotate the bean with @ActivationConfigProperty(propertyName="transactionTimeout", value="1500")
For other beans you can put @TransactionTimeout(1500) at class level or method level.
When the transaction manager detects that the transaction has timed out, it aborts it from within an asynchronous thread (different from the thread running the method), but it never sends an interrupt to the currently running thread.
That results in: invoked while multiple threads active within it ... aborting with 1 threads active!
Edit:
//---
// Walk up to the root thread group, then visit every thread below it.
ThreadGroup root = Thread.currentThread().getThreadGroup();
while (root.getParent() != null)
    root = root.getParent();
findAllThread(root, 0);
//---
public static void findAllThread(ThreadGroup threadGroup, int level) {
    // activeCount() is only an estimate, so over-allocate the array.
    int actCount = threadGroup.activeCount();
    Thread[] threads = new Thread[actCount * 2];
    // false: only threads directly in this group, not in its subgroups.
    actCount = threadGroup.enumerate(threads, false);
    for (int i = 0; i < actCount; i++) {
        Thread thread = threads[i];
        // Caution: this interrupts every thread found, including the caller's.
        thread.interrupt();
    }
    int groupCount = threadGroup.activeGroupCount();
    ThreadGroup[] groups = new ThreadGroup[groupCount * 2];
    groupCount = threadGroup.enumerate(groups, false);
    // Recurse into each direct subgroup.
    for (int i = 0; i < groupCount; i++)
        findAllThread(groups[i], level + 1);
}
//---
It will also list other active threads such as Reference Handler, Finalizer, Signal Dispatcher, etc.
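As a side note on the question's remark that no timeout can be set on the InputStream: if the stream is obtained from a java.net.Socket (an assumption, since the question does not show how the connection is opened), a read timeout can be set on the socket itself with setSoTimeout. The sketch below uses hypothetical names (ReadWithTimeout, readOrTimeout, TIMED_OUT) and a local silent server to stand in for the unresponsive modem.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical helper: a blocking socket read that gives up after a timeout. */
public final class ReadWithTimeout {
    /** Marker returned when no data arrived in time (distinct from -1, end of stream). */
    public static final int TIMED_OUT = -2;

    private ReadWithTimeout() {}

    /** Reads into buffer; returns the byte count, -1 on end of stream, or TIMED_OUT. */
    public static int readOrTimeout(Socket socket, byte[] buffer, int timeoutMillis) throws IOException {
        socket.setSoTimeout(timeoutMillis); // applies to subsequent reads on this socket
        InputStream in = socket.getInputStream();
        try {
            return in.read(buffer);
        } catch (SocketTimeoutException e) {
            return TIMED_OUT;
        }
    }

    /** Demo: connects to a local peer that never sends anything and reports whether the read timed out. */
    public static boolean silentPeerTimesOut() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            return readOrTimeout(client, new byte[16], 200) == TIMED_OUT;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a read timeout in place, the MDB thread returns on its own instead of relying on the transaction reaper, which, as noted above, does not interrupt the blocked thread.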
