I need to perform a batch operation with a pessimistic lock so that no other transaction can read or write this row during the operation. I also want to increment the page number after every batch so that if the transaction fails or the instance dies, I can continue from the last page. But with the code below, the page is not updated until all batches are completed, so when the job restarts it processes from page=0.
@Transactional
public void process(long id) {
    Entity entity = repo.findById(id).orElseThrow();
    processBatch(entity);
}
void processBatch(Entity entity) {
    int totalPages = otherMicroService.getTotalPages(entity.getId());
    int lastPage = entity.getPage();
    for (int page = lastPage; page < totalPages; page++) {
        doOperation();
        entity.setPage(page);
        repo.save(entity);
    }
}
@Lock(LockModeType.PESSIMISTIC_WRITE)
Optional<Entity> findById(long id);
Is there a way to update the page after every batch with PESSIMISTIC_WRITE enabled?
Thanks in Advance.
I'd add @Transactional(propagation = Propagation.REQUIRES_NEW) to processBatch so the data is committed at every iteration (by the way, save and saveAndFlush will work the same in this case).
Be aware that after adding this annotation, processBatch needs to be moved to a separate bean (or the bean needs to self-inject) in order for Spring AOP to apply the transactional proxy correctly.
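A minimal sketch of that arrangement, with the per-page work pulled into its own bean so the REQUIRES_NEW proxy is actually applied (the class, method, and repository names here are assumptions for illustration, not taken from the question):

@Service
public class PageBatchProcessor {

    private final EntityRepository repo;

    public PageBatchProcessor(EntityRepository repo) {
        this.repo = repo;
    }

    // Runs in its own transaction: the updated page number is committed as soon as
    // this method returns, even if a later batch fails or the instance dies.
    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public void processPage(long id, int page) {
        Entity entity = repo.findById(id).orElseThrow(); // re-acquires the PESSIMISTIC_WRITE lock
        doOperation();
        entity.setPage(page);
        repo.save(entity);
    }
}

The loop in the original bean then calls pageBatchProcessor.processPage(id, page) for each page instead of doing the work inline.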
Related
I am using Spring Boot with JPA and have enabled batching by ensuring the following lines are in my application.properties:
spring.jpa.properties.hibernate.jdbc.batch_size=1000
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true
I now have a loop where I am doing a findById on an entity and then saving that entity like so:
var entity = dao.findById(id).orElseThrow();
// Do some stuff
dao.save(entity); // This line is not really required but I am being explicit here
Putting the above in a loop I see that the save(update) statements are batched to the DB.
My issue is that if I do a findOneByX, where X is a property on the entity, the batching does not work (batch size of 1) and the updates are sent one at a time, i.e.:
var entity = dao.findOneByX(x);
// Do some stuff
dao.save(entity);
Why is this happening? Is JPA/JDBC only able to batch when we use findById?
Solution
Refer to How to implement batch update using Spring Data Jpa?
Fetch the entities you want to update into a list
Update as desired
Call saveAll (see the sketch below)
PS: beware of memory usage with this solution when your list is large.
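For illustration, a minimal sketch of that approach using the Student entity shown later in this answer (the method name, the derived query findAllByName, and the new value are assumptions, not from the question):

@Transactional
public void renameStudents(String oldName, String newName) {
    // 1. Fetch everything you want to update in a single query
    List<Student> students = dao.findAllByName(oldName);

    // 2. Update as desired, in memory
    students.forEach(s -> s.setName(newName));

    // 3. Save them together; with hibernate.jdbc.batch_size set, the resulting
    //    UPDATE statements are grouped into JDBC batches on flush
    dao.saveAll(students);
}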
Why do findById and findOneByX behave differently?
As suggested by M. Deinum, Hibernate will auto-flush your changes
prior to executing a JPQL/HQL query that overlaps with the queued entity actions
Since both findById and findOneByX execute a query, what is the difference between them?
First, the reason to flush is to make sure the session and the database are in the same state, so that you get consistent results whether they come from the session cache (if available) or from the database.
When calling findById, Hibernate will first try to get the entity from the session cache and only fetch it from the database if it is not there. For findOneByX, it always needs to query the database, as it is impossible to cache an entity by X.
Then we can consider the example below:
@Entity
@Getter
@Setter
@NoArgsConstructor
@AllArgsConstructor
public class Student {
    @Id
    private Long id;
    private String name;
    private int age;
}
Suppose we already have a Student row with id = 1, name = "Amy", age = 10.
@Transactional
public void findByIdAndUpdate() {
    dao.save(new Student(2L, "Dennis", 14));
    // no need to flush as we can get the entity from the session
    for (int i = 0; i < 100; i++) {
        Student dennis = dao.findById(2L).orElseThrow();
        dennis.setAge(i);
        dao.save(dennis);
    }
}
Will result in
412041 nanoseconds spent executing 2 JDBC batches;
1 for the insert and 1 for the update.
Hibernate: I'm sure the result can be fetched from the session (without flushing), or from the database if the record is not in the session, so let's skip flushing as it is slow!
@Transactional
public void findOneByNameAndUpdate() {
    Student amy = dao.findOneByName("Amy");
    // this affects the later query
    amy.setName("Tammy");
    dao.save(amy);
    for (int i = 0; i < 100; i++) {
        // do you expect to get a result here?
        Student tammy = dao.findOneByName("Tammy");
        // Hibernate is not smart enough to notice that this will not affect the later result.
        tammy.setAge(i);
        dao.save(tammy);
    }
}
Will result in
13964088 nanoseconds spent executing 101 JDBC batches;
1 for the first update and 100 for the updates in the loop.
Hibernate: Hmm, I'm not sure whether the queued update will affect the result; better flush it, or I will be blamed by the developer.
I would like to know if there is a possibility to have a transaction spanning multiple threads.
The problem is that I need to persist a large number of records. Doing it sequentially takes a long time, so I partition the entity objects and persist them in parallel, which is faster and works great. The only trade-off is that each thread runs in its own transaction: if any of them fails, it rolls back its own transaction but not the others.
Is there any way to run all the threads in a single transaction, so that I can control the transaction as a whole?
I am using a JpaRepository to persist and delete.
code sample
@Transactional
public boolean executeTransaction(final EntityWrapper entityWrapper) {
    transactionTemplate = new TransactionTemplate(transactionManagerRepo);
    try {
        transactionTemplate.execute(new TransactionCallbackWithoutResult() {
            @Override
            protected void doInTransactionWithoutResult(TransactionStatus status) {
                repo.saveAll(list); // saves all at once in a single batch commit, which takes a long time
                // the other way: executing in a ThreadPoolExecutor, saving 200 records in parallel
                entitiesList = partition(entitiesMasterList, 200);
                executor = ThreadPoolExecutor.initializeThreadPoolExecutor(entitiesList.size());
Please suggest.
So I have this method:
@Transactional
public void savePostTitle(Long postId, String title) {
    Post post = postRepository.findOne(postId);
    post.setTitle(title);
}
As per this post:
The save method serves no purpose. Even if we remove it, Hibernate
will still issue the UPDATE statement since the entity is managed and
any state change is propagated as long as the currently running
EntityManager is open.
and indeed the update statement is issued, but if I run the method without the @Transactional annotation:
public void savePostTitle(Long postId, String title) {
    Post post = postRepository.findOne(postId);
    post.setTitle(title);
}
Hibernate will not issue the update statement, so one has to call postRepository.save(post) explicitly.
What is the difference between using #Transactional or not in this specific scenario?
In a standard configuration, the scope of a persistence context is bound to the transaction.
If you don't have an explicit transaction defined by means of the annotation, your (non-existing) transaction spans just the reading call to the database.
After that, the entity just loaded is no longer managed.
This means changes to it won't get tracked or saved.
Flushing won't help because there are no changes tracked.
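As a hedged illustration of what that means for the method from the question: without the transaction, the loaded Post is detached, so an explicit save is needed to issue the UPDATE.

// No @Transactional: the persistence context is closed right after findOne() returns,
// so 'post' is detached and setTitle() is not tracked by Hibernate.
public void savePostTitle(Long postId, String title) {
    Post post = postRepository.findOne(postId);
    post.setTitle(title);
    postRepository.save(post); // merges the detached entity and issues the UPDATE
}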
I have the following implementation.
@Transactional
public void saveAndGenerateResult(Data data) {
    saveDataInTableA(data.someAmountForA);
    saveDataInTableB(data.someAmountForB);
    callAnAggregatedFunction(data);
}

public void saveDataInTableA(DataA a) {
    tableARepository.saveAndFlush(a);
}

public void saveDataInTableB(DataB b) {
    tableBRepository.saveAndFlush(b);
}

public void callAnAggregatedFunction(Data data) {
    // Do something based on the data saved from the beginning in Table A and Table B
}
It is important to use saveAndFlush so that the data is immediately available to callAnAggregatedFunction, which computes an aggregated result and saves it to another table. That is why I am not using save, which, as far as I know, does not flush the changes to the database immediately.
However, I am using a @Transactional annotation on saveAndGenerateResult, as I want to roll back the database changes made in that function in case of any failure, which is normally ensured by having a @Transactional annotation on a method.
What will happen in this specific case? I am using saveAndFlush, which flushes the data immediately to the database tables. If the last function (i.e. callAnAggregatedFunction) fails to write its data, will the previous write operations to table A and table B be rolled back?
Will the previous write operations in table A and table B be rolled back?
Yes, unless your saveAndFlush() methods have their own transactions (i.e. with propagation = REQUIRES_NEW).
If they're all part of the transaction you started in saveAndGenerateResult(), all modifications made to the database will be rolled back in case of failure.
For more information: Spring - #Transactional - What happens in background?
Spring #Transactional - isolation, propagation
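For contrast, a sketch of the exception mentioned above: if saveDataInTableA were given its own transaction, its insert would commit when the method returns and would not be rolled back by a later failure. This is an illustrative variant, not the question's code:

// Variant of the question's method: REQUIRES_NEW suspends the outer transaction and
// commits the insert into table A immediately, so a later failure in
// callAnAggregatedFunction() would no longer roll it back.
@Transactional(propagation = Propagation.REQUIRES_NEW)
public void saveDataInTableA(DataA a) {
    tableARepository.saveAndFlush(a);
}

As with the first answer above, the annotation only takes effect when the method is invoked through a Spring proxy (i.e. from another bean), not via self-invocation from saveAndGenerateResult.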
I am trying to perform batch inserts with data that is currently being inserted into the DB one statement per transaction. The transaction code looks similar to the snippet below. Currently, the addHolding() method is called for each quote that comes in from an external feed, and these quote updates happen about 150 times per second.
public class HoldingServiceImpl {

    @Autowired
    private HoldingDAO holdingDao;

    @Transactional(propagation = Propagation.REQUIRES_NEW, rollbackFor = Exception.class)
    public void addHolding(Quote quote) {
        Holding holding = transformQuote(quote);
        holdingDao.addHolding(holding);
    }
}
And the DAO gets the current session from the Hibernate SessionFactory and calls save on the object.
public class HoldingDAOImpl {

    @Autowired
    private SessionFactory sessionFactory;

    public void addHolding(Holding holding) {
        sessionFactory.getCurrentSession().save(holding);
    }
}
I have looked at the Hibernate batching documentation, but it is not clear from the document how I would organize the code for batch inserting in this case, since I don't have the full list of data at hand but am waiting for it to stream in.
Does merely setting the Hibernate batching properties in the properties file (e.g. hibernate.jdbc.batch_size=20) "magically" batch these inserts? Or will I need to, say, capture each quote update in a synchronized list, then insert the list's contents and clear the list when the batch size limit is reached?
Also, the whole purpose of implementing batching is to see if performance improves. If there is a better way to handle inserts in this scenario, let me know.
Setting the property hibernate.jdbc.batch_size=20 is an indication for Hibernate to group the queued statements into JDBC batches of 20. In your case, the saved records are written to the database 20 at a time when the session is flushed.
When you call session.save(), the insert is only recorded in Hibernate's in-memory session cache. Only once a flush occurs does Hibernate synchronize these changes with the database. Hence setting the Hibernate batch size is enough to do batch inserts; fine-tune the batch size according to your needs.
Also make sure your transactions are handled properly: committing a transaction also forces Hibernate to flush the session.
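If you do end up buffering the streamed quotes and inserting them inside one transaction, the usual idiom from the Hibernate batching documentation is to flush and clear the session every batch_size entities. A sketch under that assumption (transformQuote comes from the question's service; the bufferedQuotes collection is an illustrative placeholder):

Session session = sessionFactory.getCurrentSession();
int count = 0;
for (Quote quote : bufferedQuotes) {    // however the streamed quotes were buffered
    session.save(transformQuote(quote));
    if (++count % 20 == 0) {            // same value as hibernate.jdbc.batch_size
        session.flush();                // send the queued INSERTs as one JDBC batch
        session.clear();                // detach them so the session does not keep growing
    }
}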