I have a Roo project that works "fine" with transactions, but each .merge() or .persist() takes longer and longer, so that what should have taken 10ms takes 5000ms towards the end of the transaction. Luckily, my changes are individually idempotent, so I don't really need a transaction.
But when I throw out the transaction handling I run into the classic "The context has been closed" error when I call myObject.merge().
The job I'm running is from the command line as a batch, so here is what I usually do:
public static void main(final String[] args) {
    context = new ClassPathXmlApplicationContext("META-INF/spring/applicationContext.xml");
    JpaTransactionManager txMgr = (JpaTransactionManager) context.getBean("transactionManager");
    TransactionTemplate txTemplate = new TransactionTemplate(txMgr);
    txTemplate.execute(new TransactionCallback() {
        @SuppressWarnings("finally")
        public Object doInTransaction(TransactionStatus txStatus) {
            try {
                ImportUnitFromDisk importer = new ImportUnitFromDisk();
                int status = importer.run(args[0]);
                System.out.println("Import data complete status: " + status);
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                return null;
            }
        }
    });
    System.out.println("All done!");
    System.exit(0);
}
But what I really want to do is something like this:
public static void main(final String[] args) {
    ImportUnitFromDisk importer = new ImportUnitFromDisk();
    int status = importer.run(args[0]);
    System.out.println("Import data complete status: " + status);
    System.out.println("All done!");
    System.exit(0);
}
What can I do to allow me to persist() and merge() without using transactions, given that the entities are generated with Spring Roo (using OpenJPA and MySQL)?
Cheers
Nik
Even if your changes are idempotent, you will still need a transaction.
As far as performance is concerned:
How tightly coupled are your entity objects? (For instance, if all table FK references are mapped as entity relationships, then it is pretty tightly coupled.)
Maybe you should remove some unwanted bidirectional relationships.
Identify master tables and remove the entity mappings to master records.
What are your cascade options? Check whether you have cascade-all everywhere.
To me it looks like the entity map is far too tightly coupled (everyone knows someone who has ...) and the cascade options kick off a merge of the whole object graph (log your JPA SQL; that can validate my assumption).
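To illustrate the cascade point, here is a minimal mapping sketch; the entity names are invented, not taken from the question:
import java.util.List;
import javax.persistence.CascadeType;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.OneToMany;

@Entity
public class Unit {

    @Id
    private Long id;

    // cascade = CascadeType.ALL here would make every merge() on Unit walk the
    // whole graph of measurements; narrowing the cascade keeps merge() local.
    @OneToMany(mappedBy = "unit", cascade = CascadeType.PERSIST)
    private List<Measurement> measurements;
}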
I have experienced exactly the same performance problem with a Spring / Hibernate batch process. Note that this has nothing to do with Spring Roo or even Spring - it is due to the workings of Hibernate / JPA.
The basic problem is that Hibernate maintains a session cache of all the Java entities that are part of the transaction, and for new entities (for which bytecode instrumentation has not been done) Hibernate must scan the entities on each flush to see if there were updates. This is at least O(n) for n = # of new entities in the session. If the batch process is primarily adding new entities, then this turns into O(n^2) behavior for the overall batch.
One solution if you want to maintain the whole process in one transaction is to periodically flush (to do inserts/updates) and then evict entities that you no longer need to keep in the session. Another solution is to split the batch process into multiple transactions.
See http://www.basilv.com/psd/blog/2010/avoiding-caching-to-improve-hibernate-performance for more details.
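For illustration, a minimal sketch of the periodic flush-and-evict approach described above; the entity handling and batch size are made up, not from the original code:
import javax.persistence.EntityManager;

public class ImportLoop {

    private static final int BATCH_SIZE = 50; // assumed flush interval

    // Persist many new entities without letting the persistence context
    // (and thus the per-flush dirty-checking cost) grow without bound.
    public void importAll(EntityManager em, Iterable<Object> newEntities) {
        int i = 0;
        for (Object entity : newEntities) {
            em.persist(entity);
            if (++i % BATCH_SIZE == 0) {
                em.flush();  // push the pending inserts to the database
                em.clear();  // evict the managed entities from the session
            }
        }
        em.flush();
    }
}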
Sample Scenario
I have a limit that controls the total value of a column. If I try to save something that exceeds this limit, I want it to throw an exception. For example:
Suppose I have already added the following data: LIMIT = 20
id | code | value
 1 | A    |    15
 2 | A    |     5
 3 | B    |    12
 4 | B    |     3
If I insert (A,2) it exceeds the limit and I want to get an exception.
If I insert (B,4) the transaction should succeed, since it doesn't exceed the limit.
code and value are interrelated.
What can I do?
I can check this scenario with the required queries: for example, I could write a method for it and call it from the save method. That's it.
However, I'm looking for a more useful solution than that.
For example, is there any annotation I can use when designing the entity?
Can I do this without having to call the checking method every time?
What kind of examples can I give?
@UniqueConstraint, for instance, checks whether duplicate values are being added.
Using a transaction
The most common and long-accepted way is to simply abstract, in a suitable form (in a class, a library, a service, ...), the business rules that govern the behavior you describe, and run them within a transaction:
@Transactional(propagation = Propagation.REQUIRED)
public RetType operation(ReqType args) {
    ...
    perform operations;
    ...
    if (fail post conditions)
        throw ...;
    ...
}
In this case, if there is already an open transaction when the method is called, that transaction will be used (and there will be no deadlocks); if no transaction exists, a new one will be created, so that both the operations and the postcondition check are performed within the same transaction.
Note that with this strategy the operation and the invariant check can span multiple transactional resources managed by the TransactionManager (e.g. Redis, MySQL, MQS, ... simultaneously and in a coordinated manner).
Using only the database
It has not been used much for a long time (in favor of the first approach), but using TRIGGERS was the canonical option for checking postconditions some decades ago. This solution is usually coupled to the specific database engine (e.g. PostgreSQL or MySQL).
It can be useful when the client making the modifications is unable or unwilling (not safe) to check postconditions within a transaction (e.g. bash processes), but nowadays that is infrequent.
The use of TRIGGERS may also be preferable in scenarios where efficiency matters, since some optimizations can be applied inside the database scripts.
Neither Hibernate nor Spring Data JPA has anything built in for this scenario. You have to program the transaction logic in your repository yourself:
@PersistenceContext
private EntityManager em;

public void addValue(String code, int value) {
    // SUM(...) returns a Long, or null when there are no rows for this code yet
    Long current = em.createQuery(
            "SELECT SUM(e.value) FROM Entity e WHERE e.code = :code", Long.class)
        .setParameter("code", code)
        .getSingleResult();
    if ((current == null ? 0L : current) + value > 20) {
        throw new LimitExceededException("attempted to exceed limit for " + code);
    }
    var newEntity = new Entity();
    newEntity.setCode(code);
    newEntity.setValue(value);
    em.persist(newEntity);
}
Then (it's important!) you have to set the SERIALIZABLE isolation level on the @Transactional annotations of the methods that work with this table.
Read more about the serializable isolation level here; they have an oddly similar example.
Note that you have to consider retrying a failed transaction. No idea how to do this with Spring, though.
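For illustration of the isolation-level part, here is a minimal sketch, assuming the repository method above sits behind a Spring service bean (the ValueService and ValueRepository names are made up for this example):
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class ValueService {

    private final ValueRepository repository; // hypothetical repository containing addValue(...)

    public ValueService(ValueRepository repository) {
        this.repository = repository;
    }

    // SERIALIZABLE prevents two concurrent transactions from both reading
    // the same SUM and both concluding that the insert stays under the limit.
    @Transactional(isolation = Isolation.SERIALIZABLE)
    public void addValue(String code, int value) {
        repository.addValue(code, value);
    }
}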
You should use a singleton (javax.ejb.Singleton):
@Singleton
public class Register {

    @Lock(LockType.WRITE)
    public void register(String code, int value) {
        if (i_can_insert_modify(code, value)) {
            // use the entityManager or some DAO
        } else {
            // do something
        }
    }
}
We are doing a data migration from one database to another using Hibernate and Spring Batch. The example below is slightly disguised.
Therefore, we are using the standard processing pipeline:
return jobBuilderFactory.get("migrateAll")
        .incrementer(new RunIdIncrementer())
        .listener(listener)
        .flow(DConfiguration.migrateD())
        .end()
        .build();
and migrateD consists of three steps:
@Bean(name = "migrateDsStep")
public Step migrateDs() {
    return stepBuilderFactory.get("migrateDs")
            .<org.h2.D, org.mssql.D>chunk(100)
            .reader(dReader())
            .processor(dItemProcessor)
            .writer(dWriter())
            .listener(chunkLogger)
            .build();
}
Now assume that this table has a many-to-many relationship to another table. How can I persist that? I basically have a JPA entity class for each of my entities and fill them in the processor, which does the actual migration from the old database objects to the new ones.
@Component
@Import({mssqldConfiguration.class, H2dConfiguration.class})
public class ClassificationItemProcessor implements ItemProcessor<org.h2.D, org.mssql.D> {

    public ClassificationItemProcessor() {
        super();
    }

    public org.mssql.D process(org.h2.D a) throws Exception {
        org.mssql.D di = new org.mssql.D();
        di.setA(a.getA());
        di.setB(a.getB());
        // asking for the related objects would be possible e.g. via a repository,
        // but this does not work:
        // Set<E> es = eRepository.findById(a.getEs());
        di.setEs(es);
        ...
        // How to model an m:n?
        return di;
    }
}
So I could basically ask for the related objects via another database call (a repository) and add them to di. But when I do that, I tend to run into LazyInitializationExceptions, or, when it does succeed, the data in the intermediate (join) tables sometimes has not been filled in.
What is the best practice for modeling this?
This is not a Spring Batch issue, it is rather a Hibernate mapping issue. As far as Spring Batch is concerned, your input items are of type org.h2.D and your output items are of type org.mssql.D. It is up to you to define what an item is and how to "enrich" it in your item processor.
You need to make sure that the items received by the writer are completely "filled in", meaning that you have already set any other entities on them (be it a single entity or a set of entities, such as di.setEs(es) in your example). If this leads to lazy initialization exceptions, you need to change your model to be eagerly initialized instead, because Spring Batch cannot help at that level.
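As an illustration of the "eagerly initialized" part, here is a minimal mapping sketch; the entity names D and E and the join table/column names are assumptions, not taken from the question:
import java.util.HashSet;
import java.util.Set;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.JoinTable;
import javax.persistence.ManyToMany;

@Entity
public class D {

    @Id
    private Long id;

    // EAGER makes Hibernate load the association together with the item,
    // so the writer never touches an uninitialized proxy outside a session.
    @ManyToMany(fetch = FetchType.EAGER)
    @JoinTable(name = "d_e",
               joinColumns = @JoinColumn(name = "d_id"),
               inverseJoinColumns = @JoinColumn(name = "e_id"))
    private Set<E> es = new HashSet<>();

    public Set<E> getEs() { return es; }
    public void setEs(Set<E> es) { this.es = es; }
}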
Good Morning,
I am trying to create a scheduled task that has to update database entities cyclically. I am using Spring MVC and Hibernate as the ORM.
Problem
The scheduled task should update the entities in the background, but the changes are not persisted to the database.
Structure of the system
I have a Batch entity with basic information, and plenty of sensors insert records into the DB every few seconds.
Related to the Batch entity there is a TrackedBatch entity, which contains many fields calculated from the Batch entity itself. The scheduled task takes each Batch one by one, updates the related sensor data with lotto = lottoService.updateBatchRelations(batch), and then updates the TrackedBatch entity with the newly computed data.
A user can modify the Batch basic information; the system should then recompute the TrackedBatch data and update the entity (this is done by the controller, which calls the updateBatchFollowingModification method). This step works correctly with an async method; the problem comes when the scheduled task has to recompute the same information.
Async method used to update entities after a user modification (working correctly)
@Async("threadPoolTaskExecutor")
@Transactional
public void updateBatchFollowingModification(Lotto lotto)
{
    logger.debug("Daemon started");
    Lotto batch = lottoService.findBatchById(lotto.getId_lotto(), false);
    lotto = lottoService.updateBatchRelations(batch);
    lotto.setTrackedBatch(trackableBatchService.modifyTrackedBatch(batch.getTrackedBatch(), batch));
    logger.debug("Daemon ended");
}
Scheduled methods used to update entities cyclically (not working as expected)
@Scheduled(fixedDelay = 10000)
public void updateActiveBatchesWithDaemon()
{
    logger.info("updating active batches in background");
    List<Integer> idsOfActiveBatches = lottoService.findIdsOfActiveBatchesInAllSectors();
    if (!idsOfActiveBatches.isEmpty())
    {
        logger.info("found " + idsOfActiveBatches.size() + " active batches");
        for (Integer id : idsOfActiveBatches)
        {
            logger.debug("update batch " + id + " in background");
            updateBatch(id);
        }
    }
    else
    {
        logger.info("no active batches found");
    }
}

@Transactional
public void updateBatch(Integer id)
{
    Lotto activeLotto = lottoService.findBatchById(id, false);
    updateBatchFollowingModification(activeLotto);
}
As a premise, I can state that the scheduled method is fired and configured correctly and runs continuously (the same holds for the async method: following a user modification, all entities are updated correctly). At the line updateBatchFollowingModification(activeLotto) in the updateBatch method, the related entities are modified correctly (even the TrackedBatch, I have checked with the debugger), but the changes are not persisted to the database when the method ends, and no exception is thrown.
Looking around the internet I didn't find any solution to this problem, nor does it seem to be a known problem or bug in Hibernate or Spring.
Reading the Spring documentation about scheduling didn't help either. I also tried calling the save method in the scheduled task to save the entity again (but it obviously didn't work).
Further considerations
I do not know whether the @Scheduled annotation needs some extra configuration to handle @Transactional methods; on the web, developers use those annotations together without problems, and the documentation mentions no caveats.
I also do not think it is a concurrency problem: if the async method is modifying the data, the scheduled one should be stopped by the implicit optimistic locking mechanism and finish after the first transaction commits, and the same holds if the scheduled method is the first to acquire the lock (correct me if I am wrong).
I cannot figure out why the changes are not persisted when the scheduled method runs. Can someone link documentation or tutorials on this topic so I can find a solution? Or, better, if someone has faced a similar problem, how did you solve it?
Finally I managed to resolve the issue by explicitly defining the isolation level for the transaction involved in the process and by eliminating the updateBatch method (it was a duplicated feature, since updateBatchFollowingModification does the same thing). In particular, I set the isolation level for updateBatchFollowingModification to @Transactional(isolation = Isolation.SERIALIZABLE).
This works in my case because no scalability is needed, so serializing these actions does not cause any problem for the application.
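In code, the change described above amounts to something like this (a sketch of the method shown earlier, with only the annotations changed):
import org.springframework.scheduling.annotation.Async;
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Transactional;

// ...

@Async("threadPoolTaskExecutor")
@Transactional(isolation = Isolation.SERIALIZABLE)
public void updateBatchFollowingModification(Lotto lotto)
{
    // same body as before; the serializable isolation level means the scheduled
    // run and a concurrent user-triggered run cannot interleave their reads and
    // writes on the same Batch/TrackedBatch rows
}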
There is one batch job looking like this:
@Transactional
public void myBatchJob() {
    // retrieves thousands of entries and locks them
    // to prevent other jobs from touching this dataset
    entries = getEntriesToProcessWithLock();
    additional = doPrepWork(); // interacts with DB
    processor = applicationContext.getBean(getClass());
    while (!entries.isEmpty()) {
        result = doActualProcessing(entries, additional); // takes as many entries as it needs; removes them from the collection afterwards
        resultDao.save(result);
    }
}
However, I occasionally get the error below if the entries collection is big enough.
ORA-01000: maximum open cursors exceeded
I decided to blame the doActualProcessing() and save() methods, as they can end up creating hundreds of BLOBs in one transaction.
The obvious way out seems to be splitting the processing into multiple transactions: one for fetching and locking the entries, and multiple others for processing and persisting. Like this:
@Transactional
public void myBatchJob() {
    // retrieves thousands of entries and locks them
    // to prevent other jobs from touching this dataset
    entries = getEntriesToProcessWithLock();
    additional = doPrepWork(); // interacts with DB
    processor = applicationContext.getBean(getClass());
    while (!entries.isEmpty()) {
        processor.doProcess(entries, additional);
    }
}

@Transactional(propagation = Propagation.REQUIRES_NEW)
public void doProcess(entries, additional) {
    result = doActualProcessing(entries, additional); // takes as many entries as it needs; removes them from the collection afterwards
    resultDao.save(result);
}
and now, whenever doProcess is called, I get:
Caused by: org.hibernate.HibernateException: illegally attempted to associate a proxy with two open Sessions
How do I make HibernateTransactionManager do what the REQUIRES_NEW javadoc suggests: suspend the current transaction and start a new one?
In my opinion the problem lies in the fact that you retrieve the entities in the outer transaction and, while they are still associated with that transaction, you pass them (as proxies) to a method that runs in a separate transaction.
I think you could try two options:
1) Detach the entities before invoking processor.doProcess(entries, additional):
session.evict(entity); // loop through the list and do this
then, inside the inner transaction, merge them back:
session.merge(entity);
2) The second option would be to retrieve ids instead of entities in getEntriesToProcessWithLock. You would then be passing plain primitive values, which won't cause proxy problems, and you would load the actual entities inside the inner transaction (see the sketch below).
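A rough sketch of that second option follows; the types and names used here (EntryRepository, AdditionalData, chunksOf, ...) are hypothetical, not from the original code:
@Transactional
public void myBatchJob() {
    // lock the rows, but keep only their ids in this transaction
    List<Long> entryIds = getEntryIdsToProcessWithLock();
    AdditionalData additional = doPrepWork();
    // go through the Spring proxy so REQUIRES_NEW is honoured
    MyBatchJob processor = applicationContext.getBean(MyBatchJob.class);
    for (List<Long> chunk : chunksOf(entryIds, 100)) { // chunksOf is a hypothetical partitioning helper
        processor.doProcess(chunk, additional);
    }
}

@Transactional(propagation = Propagation.REQUIRES_NEW)
public void doProcess(List<Long> ids, AdditionalData additional) {
    // the entities are loaded here, so they are attached only to the new session
    List<Entry> entries = entryRepository.findAllById(ids);
    Result result = doActualProcessing(entries, additional);
    resultDao.save(result);
}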
I have one entity named Transaction, and its related table in the database is TAB_TRANSACTIONS. The whole system is working pretty well; now a new requirement has come up in which the client demands that all transactions older than 30 days be moved to an archive table, e.g. TAB_TRANSACTIONS_HIST.
Currently, as a workaround, I have given them a script scheduled to run every 24 hours, which simply moves the data from the source to the destination.
I was wondering, is there any better solution to this using Hibernate?
Can I fetch Transaction entities and then store them in TAB_TRANSACTIONS_HISTORY? I have looked at many similar questions but couldn't find a solution; any suggestions would help.
You may want to create a Quartz scheduler for this task. Here is the Job for the scheduler:
public class DatabaseBackupJob implements Job {

    public void execute(JobExecutionContext jec) throws JobExecutionException {
        Configuration cfg = new Configuration();
        cfg.configure("hibernate.cfg.xml");
        Session session = cfg.buildSessionFactory().openSession();
        Query q = session.createQuery("insert into Tab_Transaction_History(trans) select t.trans as trans from Tab_Transaction t where t.date < :date")
                .setParameter("date", reqDate); // reqDate = the cut-off date, e.g. 30 days ago
        try {
            Transaction t = session.beginTransaction();
            q.executeUpdate();
            t.commit();
        } catch (Exception e) {
            // log / handle the failure
        } finally {
            session.close();
        }
    }
}
P.S. Hibernate does not provide a scheduler, so you cannot perform this activity with core Hibernate alone; you need an external API such as the Quartz scheduler.
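For completeness, a minimal sketch of wiring that Job into Quartz; the job and trigger names and the daily 02:00 schedule are arbitrary choices for this example (Quartz 2.x API assumed):
import org.quartz.CronScheduleBuilder;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.Scheduler;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

public class ArchiveScheduler {

    public static void main(String[] args) throws Exception {
        JobDetail job = JobBuilder.newJob(DatabaseBackupJob.class)
                .withIdentity("archiveTransactions")
                .build();

        // run once a day at 02:00
        Trigger trigger = TriggerBuilder.newTrigger()
                .withIdentity("archiveTransactionsTrigger")
                .withSchedule(CronScheduleBuilder.dailyAtHourAndMinute(2, 0))
                .build();

        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        scheduler.start();
        scheduler.scheduleJob(job, trigger);
    }
}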
The solution you are looking for can be achieved only if you rely on TWO different persistence contexts, I think.
A single persistence context maps entities to tables in a non-dynamic way, so you can't perform a "runtime switch" from one mapped table to another.
But you can create a different persistence context (or a parallel configuration in Hibernate, instead of using two different contexts), load this new configuration in a different EntityManager, and perform all your tasks there.
That's the only solution that comes to mind at the moment. I really don't know whether it's adequate...
I think it's a good idea to run the script every 24 hours.
You could decrease the interval if you're not happy with it.
But if you already have a working script, what is your actual problem?
Checking the age of all transactions and moving the ones older than 30 days to another list or map is the best way, I think.
You will need some kind of scheduling mechanism: either a thread that is woken up periodically, or some other trigger that is appropriate for you (see the sketch below).
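To illustrate the "thread that is woken up periodically" variant, here is a minimal sketch; archiveOldTransactions() is a placeholder for whatever actually performs the move:
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ArchiveDaemon {

    public static void main(String[] args) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        // wake up once every 24 hours and run the archiving task
        executor.scheduleAtFixedRate(
                ArchiveDaemon::archiveOldTransactions,
                0, 24, TimeUnit.HOURS);
    }

    private static void archiveOldTransactions() {
        // placeholder: move transactions older than 30 days to the archive table
    }
}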
You can also use a bulk insert operation:
Query q = session.createQuery(
        "insert into TabTransactionHistory (.....) " +
        "select .... from TabTransaction tt");
int createdObjects = q.executeUpdate();
(Replace the ..... with the actual fields.)
You can also use a where clause to trim the result down based on how old the entries are.
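For example, a hedged sketch of what the full statement could look like, assuming hypothetical amount and createdDate properties on both entities:
// Bulk-copy rows older than a cut-off date in a single HQL statement.
// Entity and property names here are assumptions for illustration only.
Query q = session.createQuery(
        "insert into TabTransactionHistory (amount, createdDate) " +
        "select tt.amount, tt.createdDate from TabTransaction tt " +
        "where tt.createdDate < :cutoff")
        .setParameter("cutoff", cutoffDate); // e.g. a date 30 days in the past
int createdObjects = q.executeUpdate();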