I am currently trying to investigate the total number of transactions of my Spring Batch job.
As far as I can see, a StepExecution has a commit_count property that tells how many transactions were committed by the step.
I have a job that consists of two steps:
Read a file and map the content to Java objects (= should be non-transactional, right?)
This step uses a Tasklet, which is only called once, so according to my understanding only one transaction is going to be created for it. Basically, the step does some specific processing of the created objects and persists them to the database afterwards (= should be transactional, right?)
After the execution of my job I can see that both steps have a commit_count of 1.
But I expected only the second step to have a commit_count of 1. The other one should have a commit_count of 0, right?
I know that, next to the business transactions, Spring does some of its own transactional work in order to persist the job and step execution metadata and so on. But I have read on the internet that this is not technically wrapped in a transaction, and thus I don't expect it to be included in the commit_count of a step, right?
In order to see the total number of committed transactions I have also tried configuring logging.level.org.springframework.transaction.interceptor: TRACE, but this just produces lots of log statements like the following:
Completing transaction for [org.springframework.batch.core.repository.support.SimpleJobRepository.update]
Getting transaction for [org.springframework.batch.core.repository.support.SimpleJobRepository.updateExecutionContext]
Completing transaction sounds to me like a transaction has been committed. I am seeing lots of statements like this in the logs, so does this mean my batch job is creating lots of transactions? Actually, I expected the batch job metadata updates to be enclosed in a single transaction.
Can someone please explain this? Thanks in advance!
A commit is made after every chunk. The chunk size of a Tasklet (non-chunk-based) step is always 1. That is the reason why the commit count is 1 for a Tasklet (non-chunk-based) step.
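To make the difference concrete, here is a minimal sketch using Spring Batch 4-style builders (bean names and item types are made up, not taken from the question): a tasklet step runs its single invocation inside one transaction, so it reports a commit_count of 1 even if it only reads, while a chunk-oriented step commits once per chunk.

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class CommitCountExampleConfig {

    // Runs once, in one transaction -> commit_count of 1, even for a "read-only" tasklet.
    @Bean
    public Step taskletStep(StepBuilderFactory steps) {
        return steps.get("taskletStep")
                .tasklet((contribution, chunkContext) -> RepeatStatus.FINISHED)
                .build();
    }

    // One transaction (and therefore one commit) per chunk of 100 items.
    @Bean
    public Step chunkStep(StepBuilderFactory steps,
                          ItemReader<String> reader,
                          ItemWriter<String> writer) {
        return steps.get("chunkStep")
                .<String, String>chunk(100)
                .reader(reader)
                .writer(writer)
                .build();
    }
}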
1. Cron job starts.
2. Create Entity1 and save it to the DB.
3. Fetch transactionEntity records from the DB.
4. Use the fetched transactions as transactionIds:
for (Transaction id : transactionIds) {
    a. create Entity2 and save it to the DB
    b. fetch paymentEntity from the DB
    c. response = POST request (REST API call)
    d. update Entity2 with the response
}
5. Update Entity1.
Problem statement: I am getting 5000+ transactions from the DB in transactionIds on each cron run, which need to be processed as given above. With the above approach, while the previous loop is still running, the next 5000+ transactions come into the loop, because the cron job runs every 2 minutes.
I have looked at multiple solutions (.parallelStream() with ForkJoinPool, ListenableFuture), but I am unable to decide which is the best way to scale the above code. Can I use Spring Batch for this, and if yes, how? Which of the above steps would go into the reader, processor and writer?
One way to approach this problem would be to use Kafka for consuming the messages. You can increase the number of pods (hopefully you are using microservices), and each pod can be part of a consumer group. This effectively removes the loop from your code, and consumers can be scaled up on demand to handle any load.
Another advantage of a message-based approach is that you can choose between multiple delivery modes (at least once, at most once, etc.), and there are many open-source tools available to view the stats of a topic (e.g. the lag between production and consumption of messages).
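As a rough sketch of the consumer-group idea (the topic name, group id and processing code are assumptions, shown with the plain kafka-clients API): every pod runs a consumer with the same group.id, so Kafka splits the topic's partitions, and therefore the incoming transactions, across the pods.

import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class TransactionConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // All pods sharing this group id divide the topic's partitions between them.
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "transaction-processors");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("transactions"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // create Entity2, call the payment API, update Entity2 ...
                }
                consumer.commitSync(); // offsets committed only after processing: at-least-once delivery
            }
        }
    }
}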
If this is not possible,
The REST call should not happen for every transaction; you should post the transactions as a batch. API calls are expensive, so fewer round trips will make a huge difference in the time it takes to complete the loop.
Instead of updating the DB directly before and after the API call, you can change the loop to collect the entities and save them in one go:
repository.saveAll(yourEntityCollection); // only one DB call after the loop, can be batched
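A rough sketch of what that restructured loop could look like (the entity, repository and client names below are placeholders, not taken from the question):

List<Entity2> processed = new ArrayList<>();
for (TransactionEntity tx : transactions) {
    Entity2 entity = createEntity2(tx);                 // build in memory instead of saving one by one
    ApiResponse response = paymentClient.post(entity);  // ideally itself a batched call, as noted above
    entity.applyResponse(response);
    processed.add(entity);
}
entity2Repository.saveAll(processed);                   // a single (batchable) DB round trip after the loop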
I suggest you move to a producer-consumer strategy in the near future.
We are currently having a discussion about our architecture.
The idea is that we have a database and multiple processing units. One transaction includes the following steps:
Query an entry based on a flag
Request exclusive lock for this entry
Do the processing
Update the Flag and some other columns
Release lock and commit transaction
But what happens if the second processing unit queries an entry while the first one holds a lock?
Updating the flag during the transaction does not do the job, due to transaction isolation (the other unit would only see the uncommitted change with a dirty read).
In my opinion the possible outcomes of this situation are:
The second processing unit gets the same entry and an exception is raised during the lock request.
The second one gets the next available entry and everything is fine.
The second processing unit gets the same entry.
However, whether an exception is thrown or the lock acquisition blocks until the lock is released by the first transaction depends on how you ask for the lock in the second transaction (for example, with a timeout, NO WAIT, or something similar).
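Purely as an illustration of such a non-blocking lock request (the WorkItem entity and repository below are hypothetical), with Spring Data JPA a pessimistic write lock combined with a lock timeout of 0 translates to a NOWAIT-style request on databases that support it:

import java.util.List;

import javax.persistence.LockModeType;
import javax.persistence.QueryHint;

import org.springframework.data.jpa.repository.Lock;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.jpa.repository.QueryHints;
import org.springframework.data.repository.Repository;

public interface WorkItemRepository extends Repository<WorkItem, Long> {

    // Fails immediately (instead of blocking) if another transaction holds the lock.
    @Lock(LockModeType.PESSIMISTIC_WRITE)
    @QueryHints(@QueryHint(name = "javax.persistence.lock.timeout", value = "0"))
    @Query("select w from WorkItem w where w.processed = false")
    List<WorkItem> findUnprocessedForUpdate();
}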
The second scenario (The second one gets the next available entry) is a bit harder to implement. Concurrent transactions are isolated from each other, so they basically see the same snapshot of data until one of them commits.
You can take a look at some database-specific features, like Oracle Advanced Queuing, or you could change the way you read data (for example, read entries in batches and then dispatch the processing to multiple threads/transactions). All of this highly depends on what exactly you are solving: whether there are processing-order constraints, how failures/rollbacks/retries are handled, etc.
So I have a simple batch job with just one step which consists of a MongoItemReader to read in objects from MongoDB of course, a custom item processor (which for now just sets an 'isProcessed' boolean flag to true), and a MongoItemWriter.
Thing is, I want to be able to backup my jobs to a DB whenever they fail (for cases like server downtime), so I have implemented Mongo documents that basically store JobExecution, StepExecution, JobInstance, and ExecutionContext objects. They seem to create their respective objects correctly since I am able to use them to restart a job (after adding them to the job repository), but they are restarting from the very beginning when I instead want them to start from where they left off.
So I'm wondering then what I'm missing. Where exactly does a failed job store data of when/where it failed? I thought the readCount, readSkipCount, processSkipCount, etc variables would have something to do with it, but those are included in my StepExecution document (along with everything else the StepExecution class has a 'get' method for). I thought then maybe it was the execution context, but that was empty for both the job and its one step.
When a job is restarted, stateful components (those that implement ItemStream) receive the step's ExecutionContext during the open call allowing them to reset the state based on the last run. It's then up to the component to reset the state as well as maintain the state during the calls to ItemStream#update as processing occurs.
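A minimal, hypothetical sketch of such a stateful component (the names are made up): the reader restores its position in open() and records it in update(), which the step calls before each chunk is committed.

import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemStreamException;
import org.springframework.batch.item.ItemStreamReader;

public class PositionTrackingReader implements ItemStreamReader<String> {

    private static final String CURRENT_INDEX = "current.index";

    private int currentIndex = 0;

    @Override
    public void open(ExecutionContext executionContext) throws ItemStreamException {
        // On a restart, the step's last persisted ExecutionContext is passed in here.
        if (executionContext.containsKey(CURRENT_INDEX)) {
            currentIndex = executionContext.getInt(CURRENT_INDEX);
        }
    }

    @Override
    public void update(ExecutionContext executionContext) throws ItemStreamException {
        // Whatever is stored here is what survives a failure.
        executionContext.putInt(CURRENT_INDEX, currentIndex);
    }

    @Override
    public void close() throws ItemStreamException {
    }

    @Override
    public String read() {
        // Read the item at currentIndex from the underlying source, then advance.
        currentIndex++;
        return null; // null signals "no more items" in this bare-bones sketch
    }
}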
So based on the above, a failed job doesn't persist its state when it fails... it's actually persisting it all along while it is succeeding. That way, when it does fail, things should be rolled back to the last successful point. Which leads me to...
Mongo isn't transactional. Are you sure the state is being persisted correctly? We don't have a Mongo based job repository for this reason...
I am revamping a data loader that reads from a flat file and does a batch insert via JdbcTemplate for every 500 items. I am using a Java executor fixed thread pool that submits tasks, each of which reads a file and does the batch update. For example, when reading the first file, if it fails during the 3rd batch insert, all the previous batch inserts for this file need to be rolled back. The task should then continue with the next file and create a new transaction for its inserts. I need code that can do this. Currently I am using TransactionTemplate, wrapping the batch insert code inside a TransactionCallbackWithoutResult's doInTransactionWithoutResult and, on an exception, calling TransactionStatus.setRollbackOnly() in the catch block. But I need code that creates a new transaction for the next file irrespective of whether the last file failed or succeeded. Does setting the propagation to REQUIRES_NEW solve this?
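For reference, the approach described above could look roughly like this (class and variable names are made up). Note that with no outer transaction active on the worker thread, REQUIRES_NEW behaves the same as the default REQUIRED here, because each execute() call already gets its own transaction:

import java.util.List;

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.transaction.PlatformTransactionManager;
import org.springframework.transaction.TransactionDefinition;
import org.springframework.transaction.TransactionStatus;
import org.springframework.transaction.support.TransactionCallbackWithoutResult;
import org.springframework.transaction.support.TransactionTemplate;

public class FileLoaderTask implements Runnable {

    private final JdbcTemplate jdbcTemplate;
    private final TransactionTemplate transactionTemplate;
    private final List<String> files;

    public FileLoaderTask(PlatformTransactionManager txManager,
                          JdbcTemplate jdbcTemplate,
                          List<String> files) {
        this.jdbcTemplate = jdbcTemplate;
        this.transactionTemplate = new TransactionTemplate(txManager);
        // Each execute() call below then runs in its own, fresh transaction.
        this.transactionTemplate.setPropagationBehavior(TransactionDefinition.PROPAGATION_REQUIRES_NEW);
        this.files = files;
    }

    @Override
    public void run() {
        for (String file : files) {
            try {
                transactionTemplate.execute(new TransactionCallbackWithoutResult() {
                    @Override
                    protected void doInTransactionWithoutResult(TransactionStatus status) {
                        // read the file and issue jdbcTemplate.batchUpdate(...) per 500 items;
                        // a RuntimeException thrown here rolls back only this file's inserts
                    }
                });
            } catch (RuntimeException ex) {
                // log and carry on with the next file in a new transaction
            }
        }
    }
}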
As Sean commented, you should not reinvent the whole thing, and use Spring Batch instead.
Spring Batch will allow you to:
partition the execution (e.g. using a thread pool executor)
map records in the file(s) to objects
set the right commit interval, where it commits a "chunk" of processed records and rolls back in case any of them are "wrong" (see the sketch below)
specify which errors are skippable or retryable
and much more
And it is already there => coded, tested and awesome.
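For example, a minimal and purely illustrative step definition along those lines (FileRecord and BadRecordException are hypothetical types; the reader/writer could be a FlatFileItemReader and a JdbcBatchItemWriter):

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class LoaderStepConfig {

    // Commits every 500 records, rolls back the failing chunk, and skips up to
    // 10 records that raise the (hypothetical) BadRecordException.
    @Bean
    public Step loadFileStep(StepBuilderFactory steps,
                             ItemReader<FileRecord> reader,
                             ItemWriter<FileRecord> writer) {
        return steps.get("loadFileStep")
                .<FileRecord, FileRecord>chunk(500)
                .reader(reader)
                .writer(writer)
                .faultTolerant()
                .skip(BadRecordException.class)
                .skipLimit(10)
                .build();
    }
}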
I am using JBoss 5.1.x and EJB 3.0.
I have a transaction which goes like this:
An MDB listens to a JMS queue.
The MDB takes the message from JMS and writes it to the database.
In some of the catch clauses I throw new EJBException(...) in order to trigger a rollback when specific exceptions occur.
Besides that, I have configured a retry mechanism; after 3 attempts the message goes to an error queue.
What I want to achieve is:
When a rollback happens, I want to increase the current retry number, so that anyone observing the database can see the current retry number online.
The problem is: when I roll back, even the "insert_number_of_retry" query is rolled back itself, which prevents me from adding the current retry number to the database.
How can I solve this?
Thanks,
ray.
You can try to execute your logging method inside a separate transaction by annotating it with @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW).
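A rough sketch of that idea (the MessageAudit entity and the bean below are made up): because of REQUIRES_NEW, the counter update is committed in its own transaction and survives the MDB's rollback. Note that the method has to be invoked through the container, e.g. via an injected @EJB reference, not as a plain local call, for the annotation to take effect.

import javax.ejb.Stateless;
import javax.ejb.TransactionAttribute;
import javax.ejb.TransactionAttributeType;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

@Stateless
public class RetryAuditBean {

    @PersistenceContext
    private EntityManager em;

    // Suspends the caller's transaction and commits this update independently.
    @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
    public void recordRetry(String messageId) {
        em.createQuery("update MessageAudit m set m.retryCount = m.retryCount + 1 "
                     + "where m.messageId = :id")
          .setParameter("id", messageId)
          .executeUpdate();
    }
}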
You need a separate transaction in a separate thread (you may use a dedicated thread/pool for it, or spawn one if need be). You then have the option to wait for the forked transaction to end, or to forfeit it (and just continue with the rollback and a fast exit); that depends on your extra logic and so on.