Good morning,
I am trying to create a scheduled task that updates database entities cyclically. I am using Spring MVC and Hibernate as the ORM.
Problem
The scheduled task should update entities in the background, but the changes are not persisted to the database.
Structure of the system
I have a Batch entity with basic information and plenty of sensors inserting records into the DB every few seconds.
Related to the Batch entity there is a TrackedBatch entity, which contains many calculated fields derived from the Batch entity itself. The scheduled task takes each Batch one by one, updates the related sensor data with lotto = lottoService.updateBatchRelations(batch), and then updates the TrackedBatch entity with the newly computed data.
A user can modify the Batch basic information, after which the system should recompute the TrackedBatch data and update the entity (this is done by the controller, which calls the updateBatchFollowingModification method). This step works correctly with an async method; the problem arises when the scheduled task has to recompute the same information.
Async method used to update entities after a user modification (working correctly)
@Async("threadPoolTaskExecutor")
@Transactional
public void updateBatchFollowingModification(Lotto lotto)
{
    logger.debug("Daemon started");
    Lotto batch = lottoService.findBatchById(lotto.getId_lotto(), false);
    lotto = lottoService.updateBatchRelations(batch);
    lotto.setTrackedBatch(trackableBatchService.modifyTrackedBatch(batch.getTrackedBatch(), batch));
    logger.debug("Daemon ended");
}
Scheduled methods to update entities cyclically (Not working as expected)
@Scheduled(fixedDelay = 10000)
public void updateActiveBatchesWithDaemon()
{
    logger.info("updating active batches in background");
    List<Integer> idsOfActiveBatches = lottoService.findIdsOfActiveBatchesInAllSectors();
    if(!idsOfActiveBatches.isEmpty())
    {
        logger.info("found " + idsOfActiveBatches.size() + " active batches");
        for(Integer id : idsOfActiveBatches)
        {
            logger.debug("update batch " + id + " in background");
            updateBatch(id);
        }
    }
    else
    {
        logger.info("no active batches found");
    }
}

@Transactional
public void updateBatch(Integer id)
{
    Lotto activeLotto = lottoService.findBatchById(id, false);
    updateBatchFollowingModification(activeLotto);
}
As a premise, I can state that the scheduled method is fired/configured correctly and runs continuously (the same stands for the async method: following a user modification, all entities are updated correctly). At the line updateBatchFollowingModification(activeLotto) in the updateBatch method, the related entities are modified correctly (even the TrackedBatch, I checked with the debugger), but the changes are not persisted to the database when the method ends, and no exception is thrown.
Looking around the internet I didn't find any solution to this problem, nor does it seem to be a known problem or bug in Hibernate or Spring.
Reading the Spring documentation about scheduling didn't help either; I also tried calling save in the scheduled task to save the entity again (but it obviously didn't work).
Further considerations
I do not know whether the @Scheduled annotation needs some extra configuration to handle @Transactional methods; on the web, developers use those annotations together with no problems, and the documentation mentions no caveats.
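For reference, the usual annotation-driven setup that enables all three features together looks roughly like this (a generic sketch, not my exact configuration):

import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.transaction.annotation.EnableTransactionManagement;

@Configuration
@EnableScheduling              // activates @Scheduled methods
@EnableAsync                   // activates @Async methods
@EnableTransactionManagement   // activates @Transactional proxies
public class SchedulingConfig {
    // task executor and transaction manager beans omitted
}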
I also do not think it is a concurrency problem: if the async method is modifying the data, the scheduled one should be held back by the implicit optimistic locking mechanism and finish only after the first transaction commits, and the same holds if the scheduled method is the first to acquire the lock (correct me if I am wrong).
I cannot figure out why the changes are not persisted when the scheduled method is used. Can someone link documentation or tutorials on this topic so I can find a solution? Or, better, if someone has faced a similar problem, how was it solved?
Finally I managed to resolve the issue by explicitly defining the isolation level for the transaction involved in the process and by eliminating the updateBatch method (it was a duplicated feature, as updateBatchFollowingModification does the same thing). In particular, I set the isolation level for updateBatchFollowingModification with @Transactional(isolation = Isolation.SERIALIZABLE).
This obviously works in my case because no scalability is needed, so serializing the actions does not cause any problems for the application.
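For reference, a minimal sketch of what the fix looks like, reusing the method and service names from the snippets above (the surrounding service wiring is assumed):

// updateBatch(...) was removed; the scheduled loop now calls this method directly.
@Async("threadPoolTaskExecutor")
@Transactional(isolation = Isolation.SERIALIZABLE)
public void updateBatchFollowingModification(Lotto lotto)
{
    Lotto batch = lottoService.findBatchById(lotto.getId_lotto(), false);
    lotto = lottoService.updateBatchRelations(batch);
    lotto.setTrackedBatch(trackableBatchService.modifyTrackedBatch(batch.getTrackedBatch(), batch));
}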
Related
In a recent task, after I created an object I flushed the result to the database. The database table had a unique constraint, meaning that if I tried to flush the same record for the second time, I would get a ConstraintViolationException. A sample snippet is shown below:
createEntityAndFlush(result);
sendAsyncRequestToThirdSystem(param);
The code for the createEntityAndFlush:
private T createEntityAndFlush(final T entity) throws ServiceException {
    log.debug("Persisting {}", entity.getClass().getSimpleName());
    getEntityManager().persist(entity);
    getEntityManager().flush();
    return entity;
}
The reason I used flush was that I wanted a ConstraintViolationException to be thrown before the transaction finished, and thus before sendAsyncRequestToThirdSystem was called. But that was not the case, since sendAsyncRequestToThirdSystem was still called after the exception was thrown.
To test the code under race conditions, I used the ManagedExecutorService and created two runnable tasks (Future<?> submit(Runnable task)) to replicate the incoming request.
Eventually the problem was solved by taking a lock on a new table for each unique request id, but I would like to know where I went wrong in my first approach (e.g. wrong use of flush, or the ManagedExecutorService being responsible for the awkward behaviour). Thanks in advance!
The issue is that while flush() does flush the changes to the database, the transaction is still open, and the unique constraint is checked when the transaction is committed (this may depend on the database, but it holds at least for Postgres and other MVCC-based databases).
So you will need to make sure that createEntityAndFlush(result); runs in its own transaction, possibly with @Transactional(propagation = Propagation.REQUIRES_NEW) (or the equivalent, if not using Spring), to see whether the unique index is violated.
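A minimal sketch of that approach with Spring (the class name EntityCreationService is illustrative, not from the original code):

import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class EntityCreationService {

    @PersistenceContext
    private EntityManager entityManager;

    // Runs in its own transaction: the commit happens when this method returns,
    // so a unique-constraint violation surfaces here, before any async call is made.
    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public <T> T createEntityAndFlush(T entity) {
        entityManager.persist(entity);
        entityManager.flush();
        return entity;
    }
}

With this, the constraint violation is raised when createEntityAndFlush returns, so sendAsyncRequestToThirdSystem is never reached for a duplicate.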
We have a method that reads from and writes to MySQL; the method can be called by multiple threads. The DB operations look like this:
public List<Record> getAndUpdate() {
    Task task = taskMapper.selectByPrimaryKey(id);
    if (task.getStatus() == 0) {
        insertRecords();
        task.setStatus(1);
        taskMapper.update(task);
    }
    // some queries and return data
    return someRecordMapper.selectByXXX();
}

private void insertRecords() {
    // read some files and create someRecords
    someRecordMapper.insertBatch(someRecords);
}
The method reads a task's status; if the status is 0, it inserts a bunch of records (for that task) into the Records table and then sets the task's status to 1.
I want those DB operations to be transactional and exclusive, meaning that when one thread enters the transaction, other threads trying to read the same task should block. Otherwise they will see the task status as 0, insertRecords() will be called multiple times, and the data will be duplicated.
The @Transactional annotation doesn't seem to block transactions from other threads; it only ensures a rollback in case of an abort. So I think the above issue cannot be avoided with @Transactional alone.
I'm using MySQL with MyBatis. I think MySQL itself can achieve this kind of synchronization between threads, so I'd rather not introduce extra components such as a Redis lock. How can I do it in Spring?
I ended up using a "SELECT ... FOR UPDATE" query. While the row lock it takes is held, other transactions trying to write or lock the same row block until the current transaction commits or rolls back. The method also needs to be annotated with @Transactional, although the row lock and the transaction are two different concerns. The test results are satisfactory.
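A minimal sketch of that approach with MyBatis annotations (the mapper method name selectByPrimaryKeyForUpdate and the table/column names in the SQL are illustrative):

import org.apache.ibatis.annotations.Param;
import org.apache.ibatis.annotations.Select;

public interface TaskMapper {
    // Takes a row lock on the task for the duration of the enclosing transaction.
    @Select("SELECT * FROM task WHERE id = #{id} FOR UPDATE")
    Task selectByPrimaryKeyForUpdate(@Param("id") Long id);
}

@Transactional
public List<Record> getAndUpdate() {
    // Other threads block on this line until the first transaction commits or
    // rolls back, after which they see status = 1 and skip the insert.
    Task task = taskMapper.selectByPrimaryKeyForUpdate(id);
    if (task.getStatus() == 0) {
        insertRecords();
        task.setStatus(1);
        taskMapper.update(task);
    }
    return someRecordMapper.selectByXXX();
}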
Currently we are using Play 1.2.5 with Java and MySQL. We have a simple JPA model (a Play entity extending the Model class) that we save to the database:
SimpleModel test = new SimpleModel();
test.foo = "bar";
test.save();
At each web request we save multiple instances of the SimpleModel, for example:
JPAPlugin.startTx(false);
for (int i = 0; i < 5000; i++)
{
    SimpleModel test = new SimpleModel();
    test.foo = "bar";
    test.save();
}
JPAPlugin.closeTx(false);
We are using the JPAPlugin.startTx and closeTx to manually start and end the transaction.
Everything works fine if there is only one request executing the transaction.
What we noticed is that if a second request tries to execute the loop simultaneously, the second request gets a "Lock wait timeout exceeded; try restarting transaction" error (javax.persistence.PersistenceException: org.hibernate.exception.GenericJDBCException: could not insert: [SimpleModel]), since the first request locks the table and is not done before the second request times out.
This results in multiple:
ERROR AssertionFailure:45 - an assertion failure occured (this may indicate a bug in Hibernate, but is more likely due to unsafe use of the session)
org.hibernate.AssertionFailure: null id in SimpleModel entry (don't flush the Session after an exception occurs)
Another side effect is that the CPU usage goes crazy during the inserts.
To fix this, I'm thinking of creating a transaction-aware queue to insert the entities sequentially, but that would result in huge insert times.
What is the correct way to handle this situation?
The JPAPlugin in Play Framework 1.2.5 is not thread-safe, so you will not resolve this with that version of Play.
The problem is fixed in Play 2.x; if you can't migrate, try using Hibernate directly.
You should not need to handle transactions yourself in this scenario.
Instead, put your inserts either in a controller method or, if the task is time consuming, in an asynchronous job.
Jobs and controllers both handle transactions for you.
However, check that this is really what you are trying to achieve. Each HTTP request creating 5000 records does not seem realistic. Perhaps it would make more sense to have a container model with a collection?
Do you really need a transaction for the entire insert? Does it matter if the database is not locked during the data import?
You can simply create a job and execute it for each insert:
for (int i = 0; i < 5000; i++)
{
    new Job() {
        public void doJob() {
            SimpleModel test = new SimpleModel();
            test.foo = "bar";
            test.save();
        }
    }.now();
}
This will create a separate transaction for each insert and get rid of your database lock issue.
I have an entity named Transaction, and its related table in the database is TAB_TRANSACTIONS. The whole system is working pretty well; now a new requirement has come up in which the client demands that all transactions older than 30 days be moved to an archive table, e.g. TAB_TRANSACTIONS_HIST.
Currently, as a workaround, I have given them a script scheduled to run every 24 hours, which simply moves the data from the source table to the destination table.
I was wondering, is there any better solution to this using Hibernate?
Can I fetch Transaction entities and then store them in TAB_TRANSACTIONS_HISTORY? I have looked at many similar questions but couldn't find a solution; any suggestions would help.
You may want to create a Quartz scheduler for this task. Here is the Job for the scheduler:
public class DatabaseBackupJob implements Job {

    public void execute(JobExecutionContext jec) throws JobExecutionException {
        Configuration cfg = new Configuration();
        cfg.configure("hibernate.cfg.xml");
        Session session = cfg.buildSessionFactory().openSession();
        Query q = session
                .createQuery("insert into Tab_Transaction_History(trans) select t.trans as trans from Tab_Transaction t where t.date < :date")
                .setParameter("date", reqDate); // reqDate = cutoff date, e.g. 30 days ago
        try {
            Transaction t = session.beginTransaction();
            q.executeUpdate();
            t.commit();
        } catch (Exception e) {
            // log the failure
        } finally {
            session.close();
        }
    }
}
P.S. Hibernate does not provide a scheduler, so you cannot perform this activity with core Hibernate alone; you need an external API such as the Quartz scheduler.
The solution you are looking for can, I think, only be achieved by relying on TWO different persistence contexts.
A single persistence context maps entities to tables in a non-dynamic way, so you can't perform a runtime switch from one mapped table to another.
But you can create a different persistence context (or a parallel configuration in Hibernate instead of two different contexts), load this new configuration in a different EntityManager, and perform all your tasks there.
That's the only solution that comes to mind at the moment. I really don't know if it's adequate...
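A minimal sketch of that idea, assuming a hypothetical second persistence unit named "archivePU" is declared in persistence.xml and maps the Transaction entity to the history table:

import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class TransactionArchiver {

    // Second persistence context, dedicated to the archive mapping ("archivePU" is hypothetical).
    private final EntityManagerFactory archiveEmf =
            Persistence.createEntityManagerFactory("archivePU");

    public void archive(List<Transaction> expiredTransactions) {
        EntityManager archiveEm = archiveEmf.createEntityManager();
        try {
            archiveEm.getTransaction().begin();
            for (Transaction old : expiredTransactions) {
                // Written through the archive mapping of the second context.
                archiveEm.merge(old);
            }
            archiveEm.getTransaction().commit();
        } finally {
            archiveEm.close();
        }
    }
}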
I think it's a good idea to run the script every 24 hours; you could decrease the interval if you're not happy with it. But if you already have a working script, what is your actual problem?
Checking the age of all transactions and moving the ones older than 30 days to another list or map is the best way, I think.
You will need some kind of scheduling mechanism: either a thread that is woken up periodically, or some other trigger that is appropriate for you (see the sketch below).
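For example, a plain JDK scheduler could trigger the move once a day (a minimal sketch; ArchiveService and its moveExpiredTransactions() method are placeholders for whatever performs the actual move):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ArchiveScheduler {

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void start(ArchiveService archiveService) {
        // Wake up every 24 hours and move transactions older than 30 days.
        scheduler.scheduleAtFixedRate(
                archiveService::moveExpiredTransactions, 0, 24, TimeUnit.HOURS);
    }
}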
You can also use a bulk insert operation:
Query q = session.createQuery(
        "insert into TabTransactionHistory tth (.....) " +
        "select .... from TabTransaction tt");
int createdObjects = q.executeUpdate();
(Replace the ..... with the actual fields.)
You can also add a where clause to trim down the result based on how old the entries are, as shown below.
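For example, restricted to entries older than a cutoff date (the transactionDate property and the cutoffDate variable are illustrative):

// Copy only the rows older than the cutoff date.
Query q = session.createQuery(
        "insert into TabTransactionHistory tth (.....) " +
        "select .... from TabTransaction tt where tt.transactionDate < :cutoff");
q.setParameter("cutoff", cutoffDate); // e.g. a date 30 days in the past
int createdObjects = q.executeUpdate();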
I have a Roo project that works "fine" with transactions, but each .merge() or .persist() takes longer and longer, so that what should have taken 10 ms takes 5000 ms towards the end of the transaction. Luckily, my changes are individually idempotent, so I don't really need a transaction.
But when I throw out the transaction handling I run into the classic "The context has been closed" when I call myObject.merge().
The job I'm running is from the command line as a batch, so here is what I usually do:
public static void main(final String[] args) {
    context = new ClassPathXmlApplicationContext("META-INF/spring/applicationContext.xml");
    JpaTransactionManager txMgr = (JpaTransactionManager) context.getBean("transactionManager");
    TransactionTemplate txTemplate = new TransactionTemplate(txMgr);
    txTemplate.execute(new TransactionCallback() {
        @SuppressWarnings("finally")
        public Object doInTransaction(TransactionStatus txStatus) {
            try {
                ImportUnitFromDisk importer = new ImportUnitFromDisk();
                int status = importer.run(args[0]);
                System.out.println("Import data complete status: " + status);
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                return null;
            }
        }
    });
    System.out.println("All done!");
    System.exit(0);
}
But what I really want to do is something like this:
public static void main(final String[] args) {
    ImportUnitFromDisk importer = new ImportUnitFromDisk();
    int status = importer.run(args[0]);
    System.out.println("Import data complete status: " + status);
    System.out.println("All done!");
    System.exit(0);
}
What can I do to allow me to persist() and merge() without using transactions, given that the entities are generated with Spring Roo (using OpenJPA and MySQL)?
Cheers
Nik
Even if your changes are idempotent, you will still need a transaction.
As far as performance is concerned:
How tightly coupled are your entity objects? (For instance, if all table FK references are mapped to entity relationships, then it's pretty tightly coupled.)
Maybe you should remove some unwanted bidirectional relationships.
Identify master tables and remove the entity mappings to master records.
What are your cascade options? Check whether you have cascade-all options everywhere.
To me it looks like the entity map is far too tightly coupled (everyone knows someone who...) and the cascade options kick off a merge of the whole object graph. (Log your JPA SQL; that can validate my assumption.)
I have experienced exactly the same performance problem with a Spring / Hibernate batch process. Note that this has nothing to do with Spring Roo or even Spring; it is down to how Hibernate / JPA works.
The basic problem is that Hibernate maintains a session cache of all the Java entities that are part of the transaction, and for new entities (for which bytecode instrumentation has not been done) Hibernate must scan the entities on each flush to see if there were updates. This is at least O(n) for n = number of new entities in the session. If the batch process is primarily adding new entities, this turns into O(n^2) behavior for the overall batch.
One solution, if you want to keep the whole process in one transaction, is to periodically flush (to perform the inserts/updates) and then evict the entities you no longer need in the session (see the sketch below). Another solution is to split the batch process into multiple transactions.
See http://www.basilv.com/psd/blog/2010/avoiding-caching-to-improve-hibernate-performance for more details.
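A minimal sketch of the flush-and-evict variant inside a single transaction (the batch size of 50 and the ImportedRecord entity name are illustrative):

import javax.persistence.EntityManager;

public class BatchImporter {

    private static final int BATCH_SIZE = 50; // illustrative value

    // Assumes it is called inside an active transaction.
    public void importAll(EntityManager em, Iterable<ImportedRecord> records) {
        int count = 0;
        for (ImportedRecord record : records) {
            em.persist(record);
            if (++count % BATCH_SIZE == 0) {
                em.flush(); // push pending inserts to the database
                em.clear(); // evict flushed entities so dirty checking stays cheap
            }
        }
        em.flush();
        em.clear();
    }
}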