Best way to handle multiple inserts - java

Currently we are using play 1.2.5 with Java and MySQL. We have a simple JPA model (a Play entity extending Model class) we save to the database.
SimpleModel test = new SimpleModel();
test.foo = "bar";
test.save();
At each web request we save multiple instances of the SimpleModel, for example:
JPAPlugin.startTx(false);
for (int i = 0; i < 5000; i++) {
    SimpleModel test = new SimpleModel();
    test.foo = "bar";
    test.save();
}
JPAPlugin.closeTx(false);
We are using the JPAPlugin.startTx and closeTx to manually start and end the transaction.
Everything works fine if there is only one request executing the transaction.
What we noticed is that if a second request tries to execute the loop simultaneously, the second request gets a "Lock wait timeout exceeded; try restarting transaction javax.persistence.PersistenceException: org.hibernate.exception.GenericJDBCException: could not insert: [SimpleModel]" since the first request locks the table but is not done until the second request times out.
This results in multiple:
ERROR AssertionFailure:45 - an assertion failure occured (this may indicate a bug in Hibernate, but is more likely due to unsafe use of the session)
org.hibernate.AssertionFailure: null id in SimpleModel entry (don't flush the Session after an exception occurs)
Another side effect is that CPU usage goes through the roof during the inserts.
To fix this, I'm considering creating a transaction-aware queue to insert the entities sequentially, but that would result in huge insert times.
What is the correct way to handle this situation?

JPAPlugin on Play Framework 1.2.5 is not thread-safe, so you will not resolve this with that version of Play.
The problem is fixed in Play 2.x, but if you can't migrate, try using Hibernate directly.
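If you drop down to Hibernate, a common way to keep memory use and lock times down is JDBC batching: flush and clear the session every N inserts. A minimal sketch, assuming a `SessionFactory` is available and `hibernate.jdbc.batch_size=50` is set in the Hibernate configuration (both are assumptions, not part of the question's code):

```java
// Sketch: batch inserts with a plain Hibernate Session.
// Assumes sessionFactory is configured and hibernate.jdbc.batch_size=50 is set.
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
try {
    for (int i = 0; i < 5000; i++) {
        SimpleModel test = new SimpleModel();
        test.foo = "bar";
        session.save(test);
        if (i % 50 == 0) {       // match the JDBC batch size
            session.flush();     // push the pending batch to the database
            session.clear();     // detach entities to free session memory
        }
    }
    tx.commit();
} catch (RuntimeException e) {
    tx.rollback();
    throw e;
} finally {
    session.close();
}
```

Flushing and clearing in chunks keeps the first-level cache small, so the session does not accumulate 5000 managed entities before commit.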

You should not need to handle transactions yourself in this scenario.
Instead, either put your inserts in a controller method or, if the task is time-consuming, in an asynchronous job.
Jobs and controllers both handle transactions for you.
However, check that this is really what you are trying to achieve: each HTTP request creating 5000 records does not seem realistic. Perhaps it would make more sense to have a container model with a collection?

Do you really need a transaction for the entire insert? Does it matter if the database is not locked during the data import?
You can simply create a job and execute it for each insert:
for (int i = 0; i < 5000; i++) {
    new Job() {
        public void doJob() {
            SimpleModel test = new SimpleModel();
            test.foo = "bar";
            test.save();
        }
    }.now();
}
This will create a single transaction for each insert and get rid of your database lock issue.
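If one job per row proves too heavy (5000 jobs per request), a middle ground is one job per chunk, so each transaction covers a batch of rows rather than one or all of them. A sketch under the same assumptions as the answer above (`CHUNK` and `TOTAL` are illustrative constants):

```java
// Sketch: one Play job per chunk of inserts instead of one per row.
// Each doJob() runs in its own transaction, keeping locks short-lived.
final int CHUNK = 500;
final int TOTAL = 5000;
for (int start = 0; start < TOTAL; start += CHUNK) {
    new Job() {
        public void doJob() {
            for (int i = 0; i < CHUNK; i++) {
                SimpleModel test = new SimpleModel();
                test.foo = "bar";
                test.save();
            }
        }
    }.now();
}
```

This trades a little lock contention (each transaction holds locks for CHUNK rows) for far less per-transaction overhead.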

Related

Scheduled Spring MVC task not updating DB entity

Good Morning,
I am trying to create a scheduled task which has to update database entities cyclically; I am using Spring MVC and Hibernate as the ORM.
Problem
The scheduled task should update entities in background, but changes are not persisted in the Database.
Structure of the system
I have a Batch entity with basic information and plenty of sensors inserting records into the DB every few seconds.
Related to the Batch entity there is a TrackedBatch entity, which contains many calculated fields derived from the Batch itself; the scheduled task takes each Batch one by one, updates the related sensor data with lotto = lottoService.updateBatchRelations(batch), and then updates the TrackedBatch entity with the newly computed data.
A user can modify a Batch's basic information, after which the system should recompute the TrackedBatch data and update the entity (this is done by the controller, which calls the updateBatchFollowingModification method). This step works correctly with an async method; the problem arises when the scheduled task recomputes the same information.
Async method used to update entities after a user modification (working correctly)
@Async("threadPoolTaskExecutor")
@Transactional
public void updateBatchFollowingModification(Lotto lotto)
{
logger.debug("Daemon started");
Lotto batch = lottoService.findBatchById(lotto.getId_lotto(), false);
lotto = lottoService.updateBatchRelations(batch);
lotto.setTrackedBatch(trackableBatchService.modifyTrackedBatch(batch.getTrackedBatch(), batch));
logger.debug("Daemon ended");
}
Scheduled methods to update entities cyclically (not working as expected)
@Scheduled(fixedDelay = 10000)
public void updateActiveBatchesWithDaemon()
{
logger.info("updating active batches in background");
List<Integer> idsOfActiveBatches = lottoService.findIdsOfActiveBatchesInAllSectors();
if(!idsOfActiveBatches.isEmpty())
{
logger.info("found " + idsOfActiveBatches.size() + " active batches");
for(Integer id : idsOfActiveBatches)
{
logger.debug("update batch " + id + " in background");
updateBatch(id);
}
}
else
{
logger.info("no active batches found");
}
}
@Transactional
public void updateBatch(Integer id)
{
Lotto activeLotto = lottoService.findBatchById(id, false);
updateBatchFollowingModification(activeLotto);
}
As a premise, I can state that the scheduled method is fired/configured correctly and runs continuously (the same holds for the async method: following a user modification, all entities are updated correctly). At the line updateBatchFollowingModification(activeLotto) in the updateBatch method, the related entities are modified correctly (even the TrackedBatch; I checked with the debugger), yet the changes are not persisted to the database when the method ends, and no exception is thrown.
Looking around the internet, I didn't find any solution to this problem, nor does it seem to be a known problem or bug in Hibernate or Spring.
Reading the Spring documentation about scheduling didn't help either; I also tried calling the save method in the scheduled task to save the entity again, but it obviously didn't work.
Further considerations
I do not know if the @Scheduled annotation needs some extra configuration to handle @Transactional methods; on the web, developers use those annotations together with no problem, and the documentation mentions no caveats.
I also do not think it is a concurrency problem: if the async method is modifying the data, the scheduled one should be held back by the implicit optimistic locking system so that it finishes after the first transaction commits, and the same holds if the scheduled method is the first to acquire the lock (correct me if I am wrong).
I cannot figure out why changes are not persisted when the scheduled method is used. Can someone link documentation or tutorials on this topic so I can find a solution? Better still, if someone has faced a similar problem, how was it solved?
Finally I managed to resolve the issue by explicitly defining the isolation level for the transaction involved and by eliminating the updateBatch method (it duplicated what updateBatchFollowingModification already does). In particular, I annotated updateBatchFollowingModification with @Transactional(isolation = Isolation.SERIALIZABLE).
This obviously works in my case because no scalability is needed, so serializing the operations does not cause any problem for the application.
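The fix as described, in annotation form (service, entity, and method names come from the question; the surrounding class and wiring are assumed). Note that @Transactional and @Async only take effect when the method is invoked through the Spring proxy, so the intermediate same-class updateBatch wrapper is removed and the scheduled loop calls this method directly:

```java
import org.springframework.scheduling.annotation.Async;
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Transactional;

// The updateBatch wrapper from the question is gone; the scheduled loop now
// calls this method directly, with an explicit isolation level.
@Async("threadPoolTaskExecutor")
@Transactional(isolation = Isolation.SERIALIZABLE)
public void updateBatchFollowingModification(Lotto lotto) {
    Lotto batch = lottoService.findBatchById(lotto.getId_lotto(), false);
    lotto = lottoService.updateBatchRelations(batch);
    lotto.setTrackedBatch(
        trackableBatchService.modifyTrackedBatch(batch.getTrackedBatch(), batch));
}
```

SERIALIZABLE makes the scheduled and user-triggered recomputations run as if one at a time, which is acceptable here only because, as stated above, scalability is not a concern.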

JPA use flush to trigger exception and halt execution

In a recent task, after I created an object I flushed the result to the database. The database table had a unique constraint, meaning that if I tried to flush the same record for the second time, I would get a ConstraintViolationException. A sample snippet is shown below:
createEntityAndFlush(result);
sendAsyncRequestToThirdSystem(param);
The code for the createEntityAndFlush:
private T createEntityAndFlush(final T entity) throws ServiceException {
log.debug("Persisting {}", entity.getClass().getSimpleName());
getEntityManager().persist(entity);
getEntityManager().flush();
return entity;
}
The reason I used flush was that I wanted to make sure that a ConstraintViolationException would be thrown prior to finishing the transaction and thus calling the sendAsyncRequestToThirdSystem. But that was not the case, since sendAsyncRequestToThirdSystem was called after the exception was thrown.
To test the code in racing conditions, I used the ManagedExecutorService and created two runnable tasks (Future<?> submit(Runnable task)) to replicate the incoming request.
Eventually the problem was solved by taking a lock on a new table for each unique request id, but I would like to know where my first approach went wrong (e.g. wrong use of flush, or the ManagedExecutorService being responsible for the awkward behaviour). Thanks in advance!
The issue is that while flush() does flush the changes to the database, the transaction is still open, and the unique constraint will be checked when the transaction is committed (this may depend on the database, but it holds at least for Postgres and other MVCC-based databases).
So you will need to make sure that createEntityAndFlush(result); runs in its own transaction, possibly with @Transactional(propagation = Propagation.REQUIRES_NEW) (or the equivalent, if not using Spring), to see whether the unique index is violated.
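A sketch of that suggestion with Spring annotations (method names follow the question; the caller-side handling is an assumption, and REQUIRES_NEW only applies when the method is invoked through the Spring proxy, not via self-invocation):

```java
import javax.persistence.PersistenceException;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

// Runs the insert in its own transaction, so the commit (and therefore the
// final constraint check) happens before control returns to the caller.
@Transactional(propagation = Propagation.REQUIRES_NEW)
public <T> T createEntityInNewTransaction(final T entity) {
    getEntityManager().persist(entity);
    getEntityManager().flush();
    return entity;   // the new transaction commits when this method returns
}

// Caller: a constraint violation now surfaces here, before the async call.
public void process(final MyEntity result, final String param) {
    try {
        createEntityInNewTransaction(result);
    } catch (PersistenceException e) {
        return;   // duplicate request: do not notify the third system
    }
    sendAsyncRequestToThirdSystem(param);
}
```

The key point is that the insert's transaction boundary now sits between the persist and the downstream call, rather than around both.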

Use Transactions for Synchronization in Spring

We have a method that has reads and writes to MySql, the method can be called by multiple threads. The db operations are like:
public List<Record> getAndUpdate() {
Task task = taskMapper.selectByPrimaryKey(id);
if (task.getStatus() == 0) {
insertRecords();
task.setStatus(1);
taskMapper.update(task);
}
// some queries and return data
return someRecordMapper.selectByXXX();
}
private void insertRecords() {
// read some files and create someRecords
someRecordMapper.insertBatch(someRecords);
}
The method reads a task's status, if the status is 0, it then inserts a bunch of records (of that task) to the Records table, and then set the status of the task to 1.
I want those DB operations to be transactional and exclusive, meaning that when one thread enters the transaction, other threads trying to read the same
task should block. Otherwise, they will see task status as 0 and insertRecords() will be called multiple times, resulting in duplicated data.
The #Transactional annotation doesn't seem to block transactions from other threads, it only ensures rollback in case of abortion. So I think with #Transactional alone, the above issue cannot be avoided.
I'm using MySql with mybatis, I think MySql itself can achieve such synchronization between threads so I try not to introduce extra components such as redis lock to do it. I wonder how can I do it in Spring?
I ended up using a "SELECT ... FOR UPDATE" query. While the transaction that executed it is open, other transactions trying to read (with a locking read) or write the locked rows block until it commits or rolls back. The method also needs the @Transactional annotation, though note that the row lock and the transaction are two different concerns. The test results are satisfactory.
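A sketch of that approach with MyBatis annotations (mapper, table, and method names follow the question; the SQL and the annotation-based mapping are illustrative assumptions):

```java
import java.util.List;
import org.apache.ibatis.annotations.Param;
import org.apache.ibatis.annotations.Select;
import org.springframework.transaction.annotation.Transactional;

public interface TaskMapper {
    // Locks the row until the surrounding transaction commits or rolls back.
    @Select("SELECT * FROM task WHERE id = #{id} FOR UPDATE")
    Task selectByPrimaryKeyForUpdate(@Param("id") long id);
}

// Service method: a second thread blocks on the SELECT ... FOR UPDATE and,
// once unblocked, sees status == 1, so insertRecords() runs only once.
@Transactional
public List<Record> getAndUpdate(long id) {
    Task task = taskMapper.selectByPrimaryKeyForUpdate(id);
    if (task.getStatus() == 0) {
        insertRecords();
        task.setStatus(1);
        taskMapper.update(task);
    }
    return someRecordMapper.selectByXXX();
}
```

The lock is released at transaction end, which is why @Transactional on the service method (rather than on the mapper call alone) is essential: without it, each statement would commit individually and the lock would not span the check-then-insert sequence.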

JAX-WS Webservice with JPA transactions

I'm going to become mad with JPA...
I have a JAX-WS Webservice like that
@WebService
public class MyService
{
    @EJB private MyDbService myDbService;
    ...
    System.out.println(myDbService.read());
    ...
}
My EJB contains
@Stateless
public class MyDbService
{
    @PersistenceContext(unitName = "mypu")
    private EntityManager entityManager;

    public MyEntity read()
    {
        MyEntity myEntity;
        String queryString = "SELECT ... WHERE e.name = :type";
        TypedQuery<MyEntity> query = entityManager.createQuery(queryString, MyEntity.class);
        query.setParameter("type", "xyz");
        try
        {
            myEntity = query.getSingleResult();
        }
        catch (Exception e)
        {
            myEntity = null;
        }
        return myEntity;
    }
}
In my persistence.xml the mypu has transaction-type="JTA" and a jta-data-source
If I call the webservice, it's working. The entity is retrieved from the db.
Now, using an external tool, I'm changing the value of one field in my record.
I'm calling the webservice again and ... the entity displayed contains the old value.
If I redeploy, or if I add an entityManager.refresh(myEntity) after the request, I get the right value again.
In @MyTwoCents' answer, Option 2 is to NOT use your 'external' tool for changes; use your application instead. Caching is of more use if your application knows about all the changes going on, or has some way of being informed of them. This is the better option, but only if your application can be the single access point for the data.
Forcing a refresh, via EntityManager.refresh() or through provider specific query hints on specific queries, or by invalidating the cache as described here https://wiki.eclipse.org/EclipseLink/Examples/JPA/Caching#How_to_refresh_the_cache is another option. This forces JPA to go past the cache and access the database on the specific query. Problems with this are you must either know when the cache is stale and needs to be refreshed, or put it on queries that cannot tolerate stale data. If that is fairly frequent or on every query, then your application is going through all the work of maintaining a cache that isn't used.
The last option is to turn off the second level cache. This forces queries to always load entities into an EntityManager from the database data, not a second level cache. You reduce the risk of stale data (but not eliminate it, as the EntityManager is required to have its own first level cache for managed entities, representing a transactional cache), but at the cost of reloading and rebuilding entities, sometimes unnecessarily if they have been read before by other threads.
Which is best depends entirely on the application and its expected use cases.
Don't be mad, it's fine.
The flow goes like this:
You fired a query saying where type = "xyz".
Hibernate keeps this query result in its cache, so if you fire the same query again it returns the same value as long as the state has not changed.
Now you update the data from some external resource.
Hibernate doesn't have any clue about that.
So when you fire the query again, it answers from the cache.
When you do a refresh, Hibernate gets the details from the database again.
Solution:
Either add a refresh before the get call,
OR
change the table values through Hibernate methods in the application, so that Hibernate is aware of the changes,
OR
disable the Hibernate cache so it queries the DB each time (not recommended, as it will slow things down).
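A minimal sketch of the refresh and cache-bypass options (entity and service names follow the question; refresh() requires the entity to be managed in the current persistence context, and the query hint is the standard JPA 2.0 one):

```java
import javax.persistence.CacheRetrieveMode;
import javax.persistence.TypedQuery;

// Option 1: reload one managed entity straight from the database,
// overwriting any cached state.
MyEntity myEntity = myDbService.read();
entityManager.refresh(myEntity);

// Option 2: bypass the shared (second-level) cache for a single query.
TypedQuery<MyEntity> query = entityManager.createQuery(
        "SELECT e FROM MyEntity e WHERE e.name = :type", MyEntity.class);
query.setParameter("type", "xyz");
query.setHint("javax.persistence.cache.retrieveMode", CacheRetrieveMode.BYPASS);
MyEntity fresh = query.getSingleResult();
```

Option 2 is the middle ground between refreshing everywhere and disabling the cache entirely: only the queries that cannot tolerate stale data pay the cost of hitting the database.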

Why isn't my Hibernate insert reflected in my Hibernate query?

I've been asked to write some coded tests for a hibernate-based data access object.
I figure that I'd start with a trivial test: when I save a model, it should be in the collection returned by dao.getTheList(). The problem is, no matter what, when I call dao.getTheList(), it is always an empty collection.
The application code is already working in production, so let's assume that the problem is just with my test code.
@Test
@Transactional("myTransactionManager")
public void trivialTest() throws Exception {
...
// create the model to insert
...
session.save(model);
session.flush();
final Collection<Model> actual = dao.getTheList();
assertEquals(1, actual.size());
}
The test output is expected:<1> but was:<0>
So far, I've tried explicitly committing after the insert, and disabling the cache, but that hasn't worked.
I'm not looking to become a master of Hibernate, and I haven't been given enough time to read the entire documentation. Without really knowing where to start, this seemed like this might be a good question for the community.
What can I do to make sure that my Hibernate insert is flushed/committed/de-cached/or whatever it is, before the verification step of the test executes?
[edit] Some additional info on what I've tried. I tried manually committing the transaction between the insert and the call to dao.getTheList(), but I just get the error Could not roll back Hibernate transaction; nested exception is org.hibernate.TransactionException: Transaction not successfully started
@Test
@Transactional("myTransactionManager")
public void trivialTest() throws Exception {
...
// create the model to insert
...
final Transaction firstTransaction = session.beginTransaction();
session.save(model);
session.flush();
firstTransaction.commit();
final Transaction secondTransaction = session.beginTransaction();
final Collection<SystemConfiguration> actual = dao.getTheList();
secondTransaction.commit();
assertEquals(1, actual.size());
}
I've also tried taking the @Transactional annotation off the test method and annotating each of two helper methods instead, one for each Hibernate job. With that, though, I get the error: No Hibernate Session bound to thread, and configuration does not allow creation of non-transactional one here.
[/edit]
I think the underlying DBMS might hide the change from other transactions as long as the changing transaction has not completed. Is getTheList running in a separate transaction? Are you using Oracle or Postgres?
