I would like to know if it is possible to have a transaction span multiple threads.
The problem is that I need to persist a large number of records. Doing it sequentially takes too long, so I partition the entity objects and persist them in parallel, which is faster and works great. The only trade-off is that each thread runs in its own transaction: if one of them fails, it rolls back its own transaction but not the others.
Is there any way to run all the threads in a single transaction, so that I can control the transaction myself?
I am using a JpaRepository to persist and delete.
Code sample:

@Transactional
public boolean executeTransaction(final EntityWrapper entityWrapper) {
    transactionTemplate = new TransactionTemplate(transactionManagerRepo);
    try {
        transactionTemplate.execute(new TransactionCallbackWithoutResult() {
            protected void doInTransactionWithoutResult(TransactionStatus status) {
                repo.saveAll(list); // saves everything at once in one batch commit, which takes a long time

                // the other way: run in a ThreadPoolExecutor, saving 200 records per thread in parallel
                entitiesList = partition(entitiesMasterList, 200);
                executor = ThreadPoolExecutor.initializeThreadPoolExecutor(entitiesList.size());
Please suggest.
Related
I've seen articles saying that we should try to limit the scope of a transaction, e.g. instead of doing this:

@Transactional
public void save(User user) {
    queryData();
    addData();
    updateData();
}
We should exclude queryData from the transaction by using Spring's TransactionTemplate (or just move it out of the transactional method):
@Autowired
private TransactionTemplate transactionTemplate;

public void save(final User user) {
    queryData();
    transactionTemplate.execute(status -> {
        addData();
        updateData();
        return Boolean.TRUE;
    });
}
But my understanding is that since JDBC will always need a transaction for all operations, if I use the second way there will be two transactions opened and closed: one for queryData (opened by JDBC) and another for the code inside transactionTemplate.execute, opened by our class. If so, won't this be a waste of resources now that one transaction has been split into two?
Once a transaction starts, it occupies a DB connection. So we generally want the transaction to complete as fast as possible, and to delay starting it until we really need to access the DB, so that the connection pool has more available connections for other requests to use.
So if part of the workflow inside your method takes some time to finish and does not need to access the DB, it is true that it is better to limit the scope of the transaction to exclude that part of the code.
But in your example, since both parts are executed in series and both need to access the DB, I don't see any point in separating them into two different transactions.
Also, in terms of Hibernate, it is very normal to load and update entities in the same transaction, so that you do not need to deal with detached entities, which happens when the entities you update were loaded in another, already closed transaction. Dealing with detached entities is not easy if you are not familiar with Hibernate.
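To illustrate that point, here is a minimal hypothetical sketch (buildReport, reportRepository, auditRepository and AuditEntry are made-up names, not from the question) of the case where excluding work from the transaction does pay off: the slow, non-DB step runs before the transaction is opened, so no connection is held while it executes.

public void saveReport(User user) {
    Report report = buildReport(user);           // slow, no DB access, so no connection is held here
    transactionTemplate.execute(status -> {
        reportRepository.save(report);           // only the DB writes run inside the transaction
        auditRepository.save(new AuditEntry(user, report));
        return null;
    });
}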
I need to perform a batch operation with a pessimistic lock so that no one else can read or write this row during the operation. I also want to increment the page number after every batch, so that if the transaction fails or the instance dies, I can continue from the last page. But with the code below, the page is not updated until all batches have completed, so when the job restarts it processes from page=0.
@Transactional
public void process(long id) {
    Entity entity = repo.findById(id).orElseThrow();
    processBatch(entity);
}

void processBatch(Entity entity) {
    int totalPages = otherMicroService.getTotalPages(entity.getId());
    int lastPage = entity.getPage();
    for (int page = lastPage; page < totalPages; page++) {
        doOperation();
        entity.setPage(page);
        repo.save(entity);
    }
}

@Lock(LockModeType.PESSIMISTIC_WRITE)
Optional<Entity> findById(long id);
Is there a way to update the page after every batch with PESSIMISTIC_WRITE enabled?
Thanks in Advance.
I'd add @Transactional(propagation = Propagation.REQUIRES_NEW) to processBatch so the data is committed at every iteration (by the way, save and saveAndFlush will work the same in this case).
Be aware that after adding this annotation, the processBatch method needs to be moved to a separate bean (or the bean has to inject itself) in order for Spring AOP to work correctly, because a plain self-invocation bypasses the transactional proxy.
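A minimal sketch of the separate-bean variant, with illustrative names (PageProcessor, processPage) that are not from the original post:

@Service
public class PageProcessor {

    private final EntityRepository repo;   // the repository with the @Lock-ed findById

    public PageProcessor(EntityRepository repo) {
        this.repo = repo;
    }

    // Runs in its own transaction, so the updated page number is committed as soon
    // as this method returns, even if a later page fails.
    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public void processPage(Entity entity, int page) {
        doOperation();                     // the per-page work from the question
        entity.setPage(page);
        repo.save(entity);                 // merged into the new transaction's persistence context
    }

    private void doOperation() {
        // placeholder for the actual batch work
    }
}

The outer process(long id) method would then loop over the pages and call pageProcessor.processPage(entity, page) on the injected bean, so the call goes through the Spring proxy and REQUIRES_NEW is honored. One thing to watch: if the outer transaction still holds the PESSIMISTIC_WRITE lock on that row, the inner transaction's update will wait for it, so the lock may need to be acquired inside the per-page transaction instead.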
We have a method that reads from and writes to MySQL, and it can be called by multiple threads. The DB operations look like this:
public List<Record> getAndUpdate() {
    Task task = taskMapper.selectByPrimaryKey(id);
    if (task.getStatus() == 0) {
        insertRecords();
        task.setStatus(1);
        taskMapper.update(task);
    }
    // some queries and return data
    return someRecordMapper.selectByXXX();
}

private void insertRecords() {
    // read some files and create someRecords
    someRecordMapper.insertBatch(someRecords);
}
The method reads a task's status; if the status is 0, it inserts a bunch of records (for that task) into the Records table and then sets the task's status to 1.
I want those DB operations to be transactional and exclusive, meaning that when one thread enters the transaction, other threads trying to read the same task should block. Otherwise, they will see the task status as 0 and insertRecords() will be called multiple times, resulting in duplicated data.
The @Transactional annotation doesn't seem to block transactions from other threads; it only ensures rollback in case of failure. So I think the above issue cannot be avoided with @Transactional alone.
I'm using MySQL with MyBatis. I think MySQL itself can provide this kind of synchronization between threads, so I'd rather not introduce extra components such as a Redis lock. How can I do this in Spring?
I ended up using a "SELECT ... FOR UPDATE" query. Once this query has executed, other transactions that try to lock or modify the same row are blocked until the current transaction commits or rolls back. The method also needs to be annotated with @Transactional, but the row lock and the transaction are two different concerns. The test results are satisfactory.
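For illustration, here is roughly what this could look like with MyBatis annotations; the selectByPrimaryKeyForUpdate method name is made up for this sketch, and the table/column names are assumed to match the existing mapper:

@Mapper
public interface TaskMapper {

    // FOR UPDATE takes a row lock: any other transaction that selects the same
    // task FOR UPDATE (or updates it) blocks until this transaction ends.
    @Select("SELECT * FROM task WHERE id = #{id} FOR UPDATE")
    Task selectByPrimaryKeyForUpdate(@Param("id") long id);

    // existing update(...) and other mapper methods stay as they are
}

@Transactional
public List<Record> getAndUpdate() {
    Task task = taskMapper.selectByPrimaryKeyForUpdate(id);  // concurrent callers block here
    if (task.getStatus() == 0) {
        insertRecords();
        task.setStatus(1);
        taskMapper.update(task);
    }
    return someRecordMapper.selectByXXX();
}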
There is a batch job that looks like this:

@Transactional
public void myBatchJob() {
    // retrieves thousands of entries and locks them
    // to prevent other jobs from touching this dataset
    entries = getEntriesToProcessWithLock();
    additional = doPrepWork(); // interacts with DB
    processor = applicationContext.getBean(getClass());
    while (!entries.isEmpty()) {
        result = doActualProcessing(entries, additional); // takes as many entries as it needs; removes them from the collection afterwards
        resultDao.save(result);
    }
}
However, I occasionally get the error below if the entries collection is big enough.
ORA-01000: maximum open cursors exceeded
I suspect the doActualProcessing() and save() methods, as they can end up creating hundreds of BLOBs in one transaction.
The obvious way out seems to be to split the processing into multiple transactions: one for fetching and locking the entries, and several others for processing and persisting, like this:
@Transactional
public void myBatchJob() {
    // retrieves thousands of entries and locks them
    // to prevent other jobs from touching this dataset
    entries = getEntriesToProcessWithLock();
    additional = doPrepWork(); // interacts with DB
    processor = applicationContext.getBean(getClass());
    while (!entries.isEmpty()) {
        processor.doProcess(entries, additional);
    }
}

@Transactional(propagation = Propagation.REQUIRES_NEW)
public void doProcess(entries, additional) {
    result = doActualProcessing(entries, additional); // takes as many entries as it needs; removes them from the collection afterwards
    resultDao.save(result);
}
and now whenever doProcess is called I get:
Caused by: org.hibernate.HibernateException: illegally attempted to associate a proxy with two open Sessions
How do I make HibernateTransactionManager do what the REQUIRES_NEW javadoc suggests: suspend the current transaction and start a new one?
In my opinion the problem is that you retrieve the entities in the outer transaction and, while they are still associated with that transaction's session, pass them (as proxies) to a method that runs in a separate transaction.
I think you could try two options:
1) Detach the entities before invoking processor.doProcess(entries, additional):
session.evict(entity); // loop through the list and do this
then, inside the inner transaction, merge them back:
session.merge(entity);
2) The second option would be to retrieve ids instead of entities in getEntriesToProcessWithLock. You would then pass plain primitive values, which won't cause proxy problems, and re-load the proper entities inside the inner transaction. A sketch of this variant is below.
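A rough sketch of option 2 under those assumptions; the class and type names (MyBatchJob, Additional, Entry, Result), the id-based lookups (getEntryIdsToProcessWithLock, entryDao.findByIds) and the partition helper (e.g. Guava's Lists.partition) are illustrative, not from the original code:

@Transactional
public void myBatchJob() {
    List<Long> entryIds = getEntryIdsToProcessWithLock();  // e.g. SELECT id ... FOR UPDATE
    Additional additional = doPrepWork();
    MyBatchJob processor = applicationContext.getBean(getClass());
    for (List<Long> chunk : partition(entryIds, 50)) {     // chunk size is arbitrary here
        processor.doProcess(chunk, additional);
    }
}

@Transactional(propagation = Propagation.REQUIRES_NEW)
public void doProcess(List<Long> chunkIds, Additional additional) {
    List<Entry> entries = entryDao.findByIds(chunkIds);    // loaded by the new transaction's session
    Result result = doActualProcessing(entries, additional);
    resultDao.save(result);
}

Because only ids cross the transaction boundary, no proxy from the outer session is ever handed to the inner one, which avoids the "two open Sessions" error.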
I am trying to perform batch inserts with data that is currently being inserted into the DB one statement per transaction. The transaction code looks similar to the snippet below. Currently, the addHolding() method is called for each quote that comes in from an external feed, and these quote updates arrive about 150 times per second.
public class HoldingServiceImpl {

    @Autowired
    private HoldingDAO holdingDao;

    @Transactional(propagation = Propagation.REQUIRES_NEW, rollbackFor = Exception.class)
    public void addHolding(Quote quote) {
        Holding holding = transformQuote(quote);
        holdingDao.addHolding(holding);
    }
}
And the DAO gets the current session from the Hibernate SessionFactory and calls save on the object:
public class HoldingDAOImpl {

    @Autowired
    private SessionFactory sessionFactory;

    public void addHolding(Holding holding) {
        sessionFactory.getCurrentSession().save(holding);
    }
}
I have looked at the Hibernate batching documentation, but it is not clear from the document how I would organize the code for batch inserting in this case, since I don't have the full list of data at hand but am instead waiting for it to stream in.
Does merely setting the Hibernate batching properties in the properties file (e.g. hibernate.jdbc.batch_size=20) "magically" batch-insert these? Or will I need to, say, capture each quote update in a synchronized list, then insert the list's contents and clear it once the batch size limit is reached?
Also, the whole purpose of implementing batching is to see whether performance improves. If there is a better way to handle inserts in this scenario, let me know.
Setting the property hibernate.jdbc.batch_size=20 tells Hibernate to group the SQL inserts into JDBC batches of 20 when the session is flushed.
When you call session.save(), the insert is only queued in the in-memory Hibernate session; only when the session is flushed does Hibernate synchronize these changes with the database. Hence setting the Hibernate batch size is enough to get batched inserts. Fine-tune the batch size according to your needs.
Also make sure your transactions are handled properly: committing a transaction also forces Hibernate to flush the session.
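For reference, a minimal sketch of what that could look like here, assuming the holdings for one batch are saved inside a single transaction; the bulk addHoldings method and the accumulation of quotes into a list are assumptions, not part of the original code:

# Hibernate settings (or the matching spring.jpa.properties.* entries)
hibernate.jdbc.batch_size=20
hibernate.order_inserts=true

public void addHoldings(List<Holding> holdings) {
    Session session = sessionFactory.getCurrentSession();
    int i = 0;
    for (Holding holding : holdings) {
        session.save(holding);
        if (++i % 20 == 0) {   // same value as hibernate.jdbc.batch_size
            session.flush();   // sends the queued inserts to the DB as a JDBC batch
            session.clear();   // detaches them so the session does not keep growing
        }
    }
}

Two caveats: with the per-quote REQUIRES_NEW transaction in addHolding, every insert is committed on its own, so nothing can be grouped; the quotes have to be saved within the same session/transaction for batching to apply. Also, Hibernate disables JDBC insert batching for entities that use an IDENTITY id generator, so a sequence- or table-based generator is needed for batch_size to take effect.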