Spring batch long running tasklet database timeout - java

I have the following Spring Batch configuration for my workflow job, and I am using a SQL Server database for the Spring Batch tables:
public class MyConfiguration extends AbstractConfiguration {
    @Bean
    @Qualifier("pollStep")
    public Step pollStep() {
        return stepBuilderFactory.get("pollStep")
                .tasklet(filePollingTasklet())
                .listener(promoteContextListener())
                .build();
    }
    @Bean
    @StepScope
    private Tasklet filePollingTasklet() {
        return ((stepContribution, chunkContext) -> getStatus(stepContribution, chunkContext));
    }
    private RepeatStatus getStatus(StepContribution stepContribution, ChunkContext chunkContext) {
        // some code
        Map<String, Boolean> result = poller.pollForFile(myContext, sourceInfo);
        return RepeatStatus.FINISHED;
    }
}
My application polls for a file on a remote server. If it can't find the file after 100 minutes, poller.pollForFile() throws a runtime exception, my step status ends up as UNKNOWN, and the application exits with these exceptions:
c.m.s.j.SQLServerException: Connection reset
at c.m.s.j.SQLServerConnection.terminate(SQLServerConnection.java:1667)
at c.m.s.j.SQLServerConnection.terminate(SQLServerConnection.java:1654)
at c.m.s.j.TDSChannel.write(IOBuffer.java:1805)
at c.m.s.jdbc.TDSWriter.flush(IOBuffer.java:3581)
at c.m.s.jdbc.TDSWriter.writePacket(IOBuffer.java:3482)
at c.m.s.jdbc.TDSWriter.endMessage(IOBuffer.java:3062)
at c.m.s.j.TDSCommand.startResponse(IOBuffer.java:6120)
at c.m.s.j.TDSCommand.startResponse(IOBuffer.java:6106)
at c.m.s.j.SQLServerConnection$1ConnectionCommand.doExecute(SQLServerConnection.java:1756)
at c.m.s.j.TDSCommand.execute(IOBuffer.java:5696)
at c.m.s.j.SQLServerConnection.executeCommand(SQLServerConnection.java:1715)
at c.m.s.j.SQLServerConnection.connectionCommand(SQLServerConnection.java:1761)
at c.m.s.j.SQLServerConnection.rollback(SQLServerConnection.java:1964)
at c.z.h.p.ProxyConnection.rollback(ProxyConnection.java:375)
at c.z.h.p.HikariProxyConnection.rollback(HikariProxyConnection.java)
at o.h.r.j.i.AbstractLogicalConnectionImplementor.rollback(AbstractLogicalConnectionImplementor.java:116)
... 50 common frames omitted
Wrapped by: <#7f0e356a> o.h.TransactionException: Unable to rollback against JDBC Connection
at ...
I think the SQL Server connection has timed out and been closed, so Spring Batch is unable to perform the rollback and the database updates. Ideally I want the status to be FAILED, which is what I get when I run locally with H2. What strategy or technique can I use to get the same result on this instance? The exit message doesn't contain the error from the exception thrown by pollForFile(); instead it is org.springframework.transaction.TransactionSystemException: Could not roll back JPA transaction; nested exception is org.hibernate.TransactionException: Unable to rollback against JDBC Connection
Is there a way to fix this issue? What if I moved from a tasklet to a chunk-oriented step and performed the poll logic in the read() method of an ItemReader?

Your thinking is correct. When the commit fails, Spring Batch is unable to correctly update the step status, which ends up as UNKNOWN instead of FAILED. There is an open issue for that here: https://github.com/spring-projects/spring-batch/issues/1826. While your exception is different, the problem is the same. I attempted to fix that here: https://github.com/spring-projects/spring-batch/pull/591 but I decided to discard it (you can find more details about the reasons in that PR).
To work around the issue, you need to make sure any (runtime) exception is handled in the tasklet (or in the item writer in the case of a chunk-oriented step). In your case, you can increase the timeout of your transaction and catch the runtime exception in the tasklet (you can wrap it in a meaningful exception that you re-throw from the tasklet to make the step fail).
EDIT: example of increasing the transaction timeout
@Bean
@Qualifier("pollStep")
public Step pollStep() {
    DefaultTransactionAttribute attribute = new DefaultTransactionAttribute();
    // the timeout is expressed in seconds: 60 * 100 = 100 minutes
    attribute.setTimeout(60 * 100);
    // set other transaction attributes
    return stepBuilderFactory.get("pollStep")
            .tasklet(filePollingTasklet())
            .transactionAttribute(attribute)
            .listener(promoteContextListener())
            .build();
}
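The advice above about catching the runtime exception and re-throwing a meaningful one can be sketched without any framework types. This is only an illustration: FilePoller, FilePollingFailedException and PollingStep are hypothetical names, not Spring Batch classes or the asker's API. In the real job the body of pollOrFail would presumably sit inside the tasklet.

```java
import java.util.Map;

// Hypothetical stand-in for the asker's poller; not a Spring Batch type.
interface FilePoller {
    Map<String, Boolean> pollForFile(String source);
}

// Hypothetical domain exception carrying a readable message and the original cause.
class FilePollingFailedException extends RuntimeException {
    FilePollingFailedException(String message, Throwable cause) {
        super(message, cause);
    }
}

class PollingStep {
    private final FilePoller poller;

    PollingStep(FilePoller poller) {
        this.poller = poller;
    }

    // Catches whatever runtime exception the poller throws and re-throws it
    // with context, keeping the original exception as the cause.
    Map<String, Boolean> pollOrFail(String source) {
        try {
            return poller.pollForFile(source);
        } catch (RuntimeException e) {
            throw new FilePollingFailedException(
                    "Polling for a file at '" + source + "' failed: " + e.getMessage(), e);
        }
    }
}
```

Keeping the original exception as the cause means the stack trace of the poller failure is still available to whoever inspects the wrapped exception.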

Related

spring batch rollback all steps in case of exception in one of the steps

Job lensJob(JobBuilderFactory jbf, StepBuilderFactory sbf) throws Exception {
    return jbf
            .get("myJob")
            .incrementer(new RunIdIncrementer())
            .listener(jobResultListener)
            .start(step1Lens())
            .next(step2Lens())
            .build();
}
In my case the job contains two steps that read from the same file and insert into different database tables. What I'm looking for is a way to automatically roll back all the steps when an exception is thrown, so that any records already inserted are removed automatically.
That's not possible; there are no inter-step transactions.
"reads from the same file and insert in different table"
You can have two writers (one writer for each table) configured as delegates in a CompositeItemWriter. With this configuration, a transaction rollback will roll back the items written to both tables.
Hope this helps.
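The delegation idea behind a composite writer can be sketched in plain Java. This is not Spring Batch's CompositeItemWriter; SimpleWriter and SimpleCompositeWriter are hypothetical names used only to show how one write call fans out to one delegate per table. The rollback itself would come from the transaction surrounding the chunk, assuming both tables are written through the same transactional resource.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal writer contract, standing in for an item writer.
interface SimpleWriter<T> {
    void write(List<T> items);
}

// Passes every chunk of items to each delegate in order.
class SimpleCompositeWriter<T> implements SimpleWriter<T> {
    private final List<SimpleWriter<T>> delegates;

    SimpleCompositeWriter(List<SimpleWriter<T>> delegates) {
        this.delegates = new ArrayList<>(delegates);
    }

    @Override
    public void write(List<T> items) {
        // An exception from any delegate propagates to the caller, which is
        // where a surrounding transaction would be rolled back.
        for (SimpleWriter<T> delegate : delegates) {
            delegate.write(items);
        }
    }
}
```

Because both delegates run inside the same write call, they share whatever transaction is active around that call, which is what makes a single rollback cover both tables.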

How to roll back a transaction after a timeout in a Spring Boot application in the same way as on WebLogic

In my WebLogic application we are using a jtaWeblogicTransactionManager. There is a default timeout, which can be overridden with the annotation @Transactional(timeout = 60). I created an infinite loop that reads data from the DB, and it times out correctly:
29 Apr 2018 20:44:55,458 WARN [[ACTIVE] ExecuteThread: '9' for queue: 'weblogic.kernel.Default (self-tuning)'] org.springframework.jdbc.support.SQLErrorCodesFactory : Error while extracting database name - falling back to empty error codes
org.springframework.jdbc.support.MetaDataAccessException: Error while extracting DatabaseMetaData; nested exception is java.sql.SQLException: Unexpected exception while enlisting XAConnection java.sql.SQLException: Transaction rolled back: Transaction timed out after 240 seconds
BEA1-2C705D7476A3E21D0AB1
at weblogic.jdbc.jta.DataSource.enlist(DataSource.java:1760)
at weblogic.jdbc.jta.DataSource.refreshXAConnAndEnlist(DataSource.java:1645)
at weblogic.jdbc.wrapper.JTAConnection.getXAConn(JTAConnection.java:232)
at weblogic.jdbc.wrapper.JTAConnection.checkConnection(JTAConnection.java:94)
at weblogic.jdbc.wrapper.JTAConnection.checkConnection(JTAConnection.java:77)
at weblogic.jdbc.wrapper.Connection.preInvocationHandler(Connection.java:107)
at weblogic.jdbc.wrapper.Connection.getMetaData(Connection.java:560)
at org.springframework.jdbc.support.JdbcUtils.extractDatabaseMetaData(JdbcUtils.java:331)
at org.springframework.jdbc.support.JdbcUtils.extractDatabaseMetaData(JdbcUtils.java:366)
at org.springframework.jdbc.support.SQLErrorCodesFactory.getErrorCodes(SQLErrorCodesFactory.java:212)
at org.springframework.jdbc.support.SQLErrorCodeSQLExceptionTranslator.setDataSource(SQLErrorCodeSQLExceptionTranslator.java:134)
at org.springframework.jdbc.support.SQLErrorCodeSQLExceptionTranslator.<init>(SQLErrorCodeSQLExceptionTranslator.java:97)
at org.springframework.jdbc.support.JdbcAccessor.getExceptionTranslator(JdbcAccessor.java:99)
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:655)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:690)
Now I would like to get the same behavior in my Spring Boot application, so I tried this:
@EnableTransactionManagement
.
.
.
@Bean(name = "ds1")
@ConfigurationProperties(prefix = "datasource.ds1")
public DataSource logDataSource() {
    AtomikosDataSourceBean ds = new AtomikosDataSourceBean();
    return ds;
}
@Bean(name = "ds2")
@ConfigurationProperties(prefix = "datasource.ds2")
public DataSource refDataSource() {
    AtomikosDataSourceBean ds = new AtomikosDataSourceBean();
    return ds;
}
tm:
@Bean(name = "userTransaction")
public UserTransaction userTransaction() throws Throwable {
    UserTransactionImp userTransactionImp = new UserTransactionImp();
    userTransactionImp.setTransactionTimeout(120);
    return userTransactionImp;
}
@Bean(name = "atomikosTransactionManager", initMethod = "init", destroyMethod = "close")
public TransactionManager atomikosTransactionManager() throws Throwable {
    UserTransactionManager userTransactionManager = new UserTransactionManager();
    userTransactionManager.setForceShutdown(false);
    userTransactionManager.setTransactionTimeout(120);
    return userTransactionManager;
}
@Bean(name = "transactionManager")
@DependsOn({ "userTransaction", "atomikosTransactionManager" })
public JtaTransactionManager transactionManager() throws Throwable {
    UserTransaction userTransaction = userTransaction();
    TransactionManager atomikosTransactionManager = atomikosTransactionManager();
    return new JtaTransactionManager(userTransaction, atomikosTransactionManager);
}
and application.properties:
datasource.ref.xa-data-source-class-name=oracle.jdbc.xa.client.OracleXADataSource
datasource.ref.unique-resource-name=ref
datasource.ref.xa-properties.URL=jdbc:oracle:thin:@...
datasource.ref.xa-properties.user=...
#datasource.ref.xa-properties.databaseName=...
datasource.ref.password=301d24ae7d0d69614734a499df85f1e2
datasource.ref.test-query=SELECT 1 FROM DUAL
datasource.ref.max-pool-size=5
datasource.log.xa-data-source-class-name=oracle.jdbc.xa.client.OracleXADataSource
datasource.log.unique-resource-name=log
datasource.log.xa-properties.URL=jdbc:oracle:thin:@...
datasource.log.xa-properties.user=...
#datasource.log.xa-properties.databaseName=...
datasource.log.password=e58605c2a0b840b7c6d5b20b3692c5db
datasource.log.test-query=SELECT 1 FROM DUAL
datasource.log.max-pool-size=5
spring.jta.atomikos.properties.log-base-dir=target/transaction-logs/
spring.jta.enabled=true
spring.jta.atomikos.properties.service=com.atomikos.icatch.standalone.UserTransactionServiceFactory
spring.jta.atomikos.properties.max-timeout=600000
spring.jta.atomikos.properties.default-jta-timeout=10000
spring.transaction.default-timeout=900
But with no success: my infinite loop never ends (I waited about 15 minutes and then stopped my app). The only time I saw a rollback was when I tried Thread.sleep; after the sleep the transaction timed out and rolled back, but this is not what I want. Is there a way to interrupt the process after the timeout (set in the annotation or the default one), the same way it happens in my WebLogic application?
UPDATE
I tested it like this:
public class MyService {
    public void customMethod() {
        customDao.readSomething();
    }
}
public class CustomDao {
    @Transactional(timeout = 120)
    public void readSomething() {
        while (true) {
            // read data from the DB; the app on WebLogic throws a timeout, while the Spring Boot app in Docker did nothing, so after 15 minutes I gave up and killed it
        }
    }
}
UPDATE2
When I turn on atomikos debug I can see there is warning during init and some atomikos timer:
2018-05-03 14:00:54.833 [main] WARN c.a.r.xa.XaResourceRecoveryManager - Error while retrieving xids from resource - will retry later...
javax.transaction.xa.XAException: null
at oracle.jdbc.xa.OracleXAResource.recover(OracleXAResource.java:730)
at com.atomikos.datasource.xa.RecoveryScan.recoverXids(RecoveryScan.java:32)
at com.atomikos.recovery.xa.XaResourceRecoveryManager.retrievePreparedXidsFromXaResource(XaResourceRecoveryManager.java:158)
at com.atomikos.recovery.xa.XaResourceRecoveryManager.recover(XaResourceRecoveryManager.java:67)
at com.atomikos.datasource.xa.XATransactionalResource.recover(XATransactionalResource.java:449)
at com.atomikos.datasource.xa.XATransactionalResource.setRecoveryService(XATransactionalResource.java:416)
at com.atomikos.icatch.config.Configuration.notifyAfterInit(Configuration.java:466)
at com.atomikos.icatch.config.Configuration.init(Configuration.java:450)
at com.atomikos.icatch.config.UserTransactionServiceImp.initialize(UserTransactionServiceImp.java:105)
at com.atomikos.icatch.config.UserTransactionServiceImp.init(UserTransactionServiceImp.java:219)
at com.atomikos.icatch.jta.UserTransactionImp.checkSetup(UserTransactionImp.java:59)
at com.atomikos.icatch.jta.UserTransactionImp.setTransactionTimeout(UserTransactionImp.java:127)
Maybe this is the reason. How can I fix this? I am using Oracle 12 with the ojdbc8 driver.
UPDATE 3
After fixing the problem from UPDATE2 by granting the user permissions on the DB, I can see this warning in the log:
2018-05-03 15:16:30.207 [Atomikos:4] WARN c.a.icatch.imp.ActiveStateHandler - Transaction 127.0.1.1.tm152535336001600001 has timed out and will rollback.
The problem is that the app is still reading data from the DB after this timeout. Why is it not rolled back?
UPDATE 4
So I found that when the timeout occurs, ActiveStateHandler runs this code:
...
setState ( TxState.ACTIVE );
...
and AtomikosConnectionProxy checks for the timeout this way:
if ( ct.getState().equals(TxState.ACTIVE) ) ct.registerSynchronization(new JdbcRequeueSynchronization( this , ct ));
else AtomikosSQLException.throwAtomikosSQLException("The transaction has timed out - try increasing the timeout if needed");
So why does the timeout set a state that does not cause an exception in AtomikosConnectionProxy?
UPDATE 5
So I found that the property
com.atomikos.icatch.threaded_2pc
solves my problem, and now the rollback starts the way I want. But I still don't understand why I should set this to true, because I am now testing it on a task that should run in a single thread.
Setting com.atomikos.icatch.threaded_2pc=true in jta.properties fixed my problem. I don't know why this default value was changed to false in a web application.
* @param single_threaded_2pc (!com.atomikos.icatch.threaded_2pc)
* If true then commit is done in the same thread as the one that
* started the tx.
XA transactions are horribly complicated and you really want to have a very good reason for using them (i.e. it's literally impossible to add some business process that removes the need for XA), because you are going to get into trouble out in the wild...
That said, my guess is that it's about timeout discrepancies between XA phases.
With XA there are 2 timeouts: a timeout for the 1st phase, known as the voting phase (which is typically the one set by the @Transactional annotation, but this depends on the JTA provider), and another timeout for the 2nd phase, known as the commit phase. The second is typically a lot longer, because the transaction manager has already got the agreement from all parties that the commit is ready to go, and it therefore provides greater leeway for things like transient network failures and so on.
My guess is that the WebLogic JTA simply behaves differently from Atomikos in how it handles the 2nd phase notifications back from the participants, until Atomikos is changed to use the multithreaded ack.
If your application is just you and the database, then you can probably get away without an XA transaction manager. I'd expect this would behave the way you want for timeouts.
Good Luck!
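The updates in the question describe a timeout that is only noticed when the connection proxy is next used; nothing stops the working thread by itself. That kind of cooperative check can be sketched in plain Java. This is not Atomikos code: DeadlineGuard, DeadlineExceededException and ReadLoop are hypothetical names, and the clock is injected only so the sketch can be exercised deterministically.

```java
import java.util.function.LongSupplier;

// Hypothetical exception for the sketch; not an Atomikos or Spring type.
class DeadlineExceededException extends RuntimeException {
    DeadlineExceededException(String message) {
        super(message);
    }
}

// Hypothetical guard: records a deadline and fails the first check made after it.
class DeadlineGuard {
    private final LongSupplier clockMillis;
    private final long deadlineMillis;

    DeadlineGuard(LongSupplier clockMillis, long timeoutMillis) {
        this.clockMillis = clockMillis;
        this.deadlineMillis = clockMillis.getAsLong() + timeoutMillis;
    }

    // Called before each unit of work; nothing interrupts the thread between calls.
    void check() {
        if (clockMillis.getAsLong() >= deadlineMillis) {
            throw new DeadlineExceededException("Work exceeded its deadline");
        }
    }
}

class ReadLoop {
    // Runs until the guard trips and returns how many reads completed.
    static int readUntilTimeout(DeadlineGuard guard, Runnable readOnce) {
        int reads = 0;
        try {
            while (true) {
                guard.check();
                readOnce.run();
                reads++;
            }
        } catch (DeadlineExceededException e) {
            return reads;
        }
    }
}
```

With a real clock the guard could be created as new DeadlineGuard(System::currentTimeMillis, 120_000). If the check never throws, as described in UPDATE 4, a loop like this keeps running no matter what the timer thread has logged.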

weblogic.transaction.internal.AppSetRollbackOnlyException: setRollbackOnly called on transaction

I have the case below, where I call doSomeTask() of BeanA; if doSomeTask() fails, I want to persist an ErrorInfo into another table by calling saveError(ErrorInfo) of BeanA. Both methods have @TransactionAttribute(REQUIRES_NEW).
class BeanA {
    @TransactionAttribute(REQUIRES_NEW)
    public void doSomeTask() {
        if (someCondition) {
            throw new SomeException();
        }
        // do task
    }
    @TransactionAttribute(REQUIRES_NEW)
    public void saveError(ErrorInfo error) {
        // save error info if doSomeTask fails
    }
}
class BeanB {
    BeanA beanA;
    void performTask() {
        try {
            beanA.doSomeTask();
        } catch (Exception e) {
            ErrorInfo error = getErrorInfo(e);
            beanA.saveError(error);
        }
    }
}
But when doSomeTask() throws an exception, saveError() doesn't work and throws this exception:
Caused by: weblogic.transaction.internal.AppSetRollbackOnlyException: setRollbackOnly called on transaction
What am I doing wrong and how to fix this error? Thanks in advance for any help.
Sorry for the late answer. The problem was resolved.
The actual error was hidden. In my case, it was just a JSR 303 validation error on the ErrorInfo instance while persisting. I had to add
-Dweblogic.transaction.allowOverrideSetRollbackReason=true
to <domain_home>/bin/setDomainEnv.sh to find out the actual error and fix it. Thanks to this answer: https://stackoverflow.com/a/38584687/1563286
I debugged a similar issue not so long ago. In my case the situation was the following:
There was a top-level transaction open when the REQUIRES_NEW method was called.
After an exception in the nested transaction, the top-level one failed to commit because it was "marked as rollback only".
It turned out that although a new transaction is started, the connection holder is shared at the TransactionManager level. When an exception is thrown inside the nested transaction, the connection itself is marked as rollback only, which causes the issue later.
I was able to resolve the issue by using savepoints (available since JDBC 3.0). Savepoints are usually disabled by default in many environments/ORMs, and using them requires additional configuration.
Hope this is of some help.

Spring UnknownServiceException

I have a scheduled task in a separate thread that accesses a @Transactional service.
In this task I try to call a method like this
List<Obj> objects = new ArrayList<Obj>(objService.getObjWithStatus(Status.PROCESSING));
and sometimes get the exception
org.springframework.transaction.CannotCreateTransactionException: Could not open Hibernate Session for transaction; nested exception is org.hibernate.service.UnknownServiceException: Unknown service requested [org.hibernate.engine.jdbc.connections.spi.ConnectionProvider]
at org.springframework.orm.hibernate4.HibernateTransactionManager.doBegin(HibernateTransactionManager.java:544)
Caused by: org.hibernate.service.UnknownServiceException: Unknown service requested [org.hibernate.engine.jdbc.connections.spi.ConnectionProvider]
I tried to synchronize access to the service across all threads, but the exception still occurs randomly. Is there any way to prevent this kind of error?
It seems that Spring doesn't stop scheduled tasks on exit, and they keep running without a context. Adding
@PreDestroy
private void destroyListener() {
    synchronized (this) {
        scheduledTask.cancel(false);
    }
}
seems to fix this issue
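The same idea can be tried outside a container with plain JDK scheduling. This is only a sketch: PeriodicWorker is a hypothetical class, and in a Spring bean the stop() method would be the one annotated with @PreDestroy. Shutting down the executor as well is an addition of this sketch, not part of the fix described above.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical holder for a periodic task.
class PeriodicWorker {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();
    private ScheduledFuture<?> scheduledTask;

    synchronized void start(Runnable work, long periodMillis) {
        scheduledTask = executor.scheduleAtFixedRate(
                work, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Cancels future runs without interrupting a run already in progress,
    // then shuts the executor down so its thread does not outlive the owner.
    synchronized void stop() {
        if (scheduledTask != null) {
            scheduledTask.cancel(false);
        }
        executor.shutdown();
    }

    synchronized boolean isStopped() {
        return executor.isShutdown()
                && (scheduledTask == null || scheduledTask.isCancelled());
    }
}
```

Passing false to cancel lets a run that has already started finish its current work instead of being interrupted halfway through.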

Spring Quartz Scheduler race condition

What I suspect the problem to be is SchedulerFactoryBean's setOverwriteExistingJobs not offering enough protection.
One node will be initializing the scheduler and it will decide to replace the trigger (breakpoint org.quartz.impl.jdbcjobstore.SimpleTriggerPersistenceDelegate#deleteExtendedTriggerProperties )
Right after it executes this method, the trigger won't be in the database any longer so when another node in the cluster will try to read it (org.quartz.impl.jdbcjobstore.JobStoreSupport#retrieveTrigger) it will fail with the exception below. Because of this exception, the whole application will fail to start (not just the scheduler).
Caused by: org.quartz.JobPersistenceException: Couldn't retrieve
trigger: No record found for selection of Trigger with key:
The logs can be found at https://github.com/apixandru/case-study/tree/master/spring-boot-quartz/logs
(The exception can be found on the Server-1 node after the 4th restart)
For the whole project that demonstrates this issue go to https://github.com/apixandru/case-study/tree/master/spring-boot-quartz
The way that we configure the scheduler is here
@Bean
JobDetailFactoryBean jobFactoryBean() {
    JobDetailFactoryBean bean = new JobDetailFactoryBean();
    bean.setDurability(true);
    bean.setName("Sampler");
    bean.setJobClass(SampleJob.class);
    return bean;
}
@Bean
SimpleTriggerFactoryBean triggerFactoryBean(JobDetailFactoryBean jobFactoryBean) {
    SimpleTriggerFactoryBean bean = new SimpleTriggerFactoryBean();
    bean.setName("Sampler Trigger");
    bean.setRepeatInterval(20_000);
    bean.setJobDetail(jobFactoryBean.getObject());
    return bean;
}
@Bean
SchedulerFactoryBean schedulerFactoryBean(SimpleTriggerFactoryBean triggerFactoryBean, DataSource dataSource, Dependency dependency) {
    Properties props = new Properties();
    props.put("org.quartz.scheduler.instanceId", "AUTO");
    props.put("org.quartz.jobStore.isClustered", "true");
    SchedulerFactoryBean bean = new SchedulerFactoryBean();
    bean.setTriggers(triggerFactoryBean.getObject());
    bean.setSchedulerName("Demo Scheduler");
    bean.setSchedulerContextAsMap(Collections.singletonMap("dependency", dependency));
    bean.setOverwriteExistingJobs(true);
    bean.setDataSource(dataSource);
    bean.setQuartzProperties(props);
    return bean;
}
This happens a lot on our work servers but it's a lot harder to get locally (possibly due to the fact that the actual servers are dedicated and have a lot more power than my local machine?)
To get the bug on any machine, start one server in debug mode and put a breakpoint on SimpleTriggerPersistenceDelegate.deleteExtendedTriggerProperties and just after it executes, start the second server and you will get this exception
Anyway, I managed to get this error locally as well after about 40 redeploys to my local clustered weblogic server.
The problem is the fact that by default no transaction manager is used, so no locking is used.
To solve the issue, it is required to call the schedulerFactoryBean's setTransactionManager method.
