I need to receive and save a huge amount of data using Spring Data over Hibernate. Our server does not have enough RAM allocated to hold all the entities at the same time, so persisting everything at once would definitely end in an OutOfMemoryError.
So we obviously need to save the data in batches. We also need to use @Transactional to be sure that either all data is persisted or none of it is, in case of even a single error.
So, the question: during a @Transactional method, does Spring Data keep storing entities in RAM, or are entities that were flushed accessible to the garbage collector?
And what is the best approach to process a huge amount of data with Spring Data? Maybe Spring Data isn't the right approach for problems like this.
Does Spring Data during a @Transactional method keep storing entities in
RAM, or are entities which were flushed accessible to the garbage
collector?
The entities will keep being stored in RAM (i.e. in the EntityManager) until the transaction commits or rolls back, or until the EntityManager is cleared. That means the entities only become eligible for GC once the transaction commits/rolls back or
entityManager.clear() is called.
So, what is the best approach to process a huge amount of data with
Spring Data?
The general strategy to prevent OOM is to load and process the data batch by batch. At the end of each batch, you should flush and clear the EntityManager so that it releases its managed entities for GC. The general code flow should be something like this:
@Component
public class BatchProcessor {

    // Spring ensures this EntityManager is the same one that started
    // the transaction, thanks to @Transactional
    @PersistenceContext
    private EntityManager em;

    @Autowired
    private FooRepository fooRepository;

    @Transactional
    public void startProcess() {
        processBatch(1, 100);
        processBatch(101, 200);
        processBatch(201, 300);
        // and so on...
    }

    private void processBatch(int fromFooId, int toFooId) {
        List<Foo> foos = fooRepository.findFooIdBetween(fromFooId, toFooId);
        for (Foo foo : foos) {
            // process a foo
        }
        /*
         * Flush to send the pending update SQL to the DB; otherwise the
         * updates would be lost when we clear the entity manager below.
         */
        em.flush();
        em.clear();
    }
}
Note that this practice is only for preventing OOM, not for achieving high performance. So if performance is not your concern, you can safely use this strategy.
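Since the question is about saving incoming data, here is a minimal sketch of the same flush-and-clear idea applied to inserts. The BatchInserter class, the chunk size, and the saveAll method are assumptions for illustration, not part of the original code:

@Component
public class BatchInserter {

    private static final int BATCH_SIZE = 50; // assumed chunk size; tune to your heap

    @PersistenceContext
    private EntityManager em;

    @Transactional
    public void saveAll(Iterable<Foo> incoming) {
        int count = 0;
        for (Foo foo : incoming) {
            em.persist(foo);
            // Flush pending inserts and detach them so they become GC-eligible
            if (++count % BATCH_SIZE == 0) {
                em.flush();
                em.clear();
            }
        }
        // Push the last partial batch
        em.flush();
        em.clear();
    }
}

Combining this with Hibernate's hibernate.jdbc.batch_size setting lets Hibernate also group the inserts into JDBC batches, which helps performance as well as memory.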
Alright folks. I've read quite a bit of the docs and I'm at my wits' end about why this is happening.
Like many other people, we have a service interfacing with an Oracle DB. This is a pretty standard setup:
@Service
public class DaoService {

    private JdbcTemplate jdbcTemplate;
    private SimpleJdbcInsert insertA;
    private SimpleJdbcInsert insertB;
    private SimpleJdbcInsert insertC;

    @Autowired
    public DaoService(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
        // Assume all inserts are initialized
    }

    @Transactional(rollbackFor = Exception.class)
    public void insertStuff(Stuff stuff) {
        insertA.execute(stuff.A);
        // Suppose a connection failure happens here
        insertB.executeBatch(stuff.B);
        insertC.executeBatch(stuff.C);
    }
}
Now here lies our issue. 99% of all rollbacks are a-ok.
Our problems occur when a connection gets closed for unknown reasons,
with the exception message being:
TransactionSystemException: Could not roll back JDBC transaction;
nested exception is java.sql.SQLException: "The connection is closed"
See, it tries to roll back as it's supposed to. But the issue is that stuffA lingers in the DB while stuffB and stuffC do not. This happens only about 1% of the time (i.e. sometimes, despite the rollback failure, no 'stuff' ends up in the DB).
Am I misunderstanding something? I thought Spring only commits at the end of a successful transaction. Does anyone have an idea how I can stop these partial commits despite the method being marked @Transactional?
P.S. For what it's worth, autocommit defaults to on. However, I read that it's not taken into consideration when something is @Transactional.
From the Oracle Database JDBC Developer's Guide:
If the auto-commit mode is disabled and you close the connection without explicitly committing or rolling back your last changes, then an implicit COMMIT operation is run.
Maybe this is the case you are running into.
According to this answer, Spring only offers you an interface for the transaction manager and the implementation may vary. However, most transaction manager implementations will turn off autocommit during a @Transactional call and then restore the previous state after commit (a pool-level way to control this is sketched after the next paragraph).
That being said, on some databases there are ways to execute autonomous transactions inside an outer transaction that may not be visible to your application or to Spring. I know Oracle allows this. Have you checked all of the triggers on the tables? I worked on one application that had an audit trigger that would force orphaned data into some of the tables in situations like this. Can you tell us which database you are using?
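On the autocommit point above, one way to rule out driver-level autocommit surprises is to disable it at the connection pool. A minimal sketch, assuming HikariCP is the pool in use (the question does not say which pool it is, and the JDBC URL below is hypothetical):

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:oracle:thin:@//dbhost:1521/SERVICE"); // hypothetical URL
config.setAutoCommit(false); // hand out connections with autocommit already off
HikariDataSource dataSource = new HikariDataSource(config);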
Try this:
@Transactional(propagation = Propagation.REQUIRES_NEW)
Hope it will work.
My application loads a list of entities that should be processed. This happens in a class that uses a scheduler:
@Component
class TaskScheduler {

    @Autowired
    private TaskRepository taskRepository;

    @Autowired
    private HandlingService handlingService;

    @Scheduled(fixedRate = 15000)
    @Transactional
    public void triggerTransactionStatusChangeHandling() {
        taskRepository.findByStatus(Status.OPEN).stream()
                .forEach(handlingService::handle);
    }
}
My HandlingService processes each task in isolation, using REQUIRES_NEW as the propagation level.
@Component
class HandlingService {

    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public void handle(Task task) {
        try {
            processTask(task); // here the actual processing would take place
            task.setStatus(Status.PROCESSED);
        } catch (RuntimeException e) {
            task.setStatus(Status.ERROR);
        }
    }
}
The code works only because I started the parent transaction in the TaskScheduler class. If I remove the @Transactional annotation, the entities are no longer managed and the update to the task entity is not propagated to the DB. I don't find it natural to make the scheduled method transactional.
From what I see, I have two options:
1. Keep the code as it is today.
Maybe it's just me and this is a correct approach.
This variant has the fewest trips to the database.
2. Remove the @Transactional annotation from the scheduler, pass the id of the task, and reload the task entity in the HandlingService.
@Component
class HandlingService {

    @Autowired
    private TaskRepository taskRepository;

    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public void handle(Long taskId) {
        Task task = taskRepository.findOne(taskId);
        try {
            processTask(task); // here the actual processing would take place
            task.setStatus(Status.PROCESSED);
        } catch (RuntimeException e) {
            task.setStatus(Status.ERROR);
        }
    }
}
Has more trips to the database (one extra query per element).
Can be executed using @Async.
Can you please offer your opinion on which is the correct way of tackling this kind of problem, or maybe suggest another approach that I don't know about?
If your intention is to process each task in a separate transaction, then your first approach actually does not work because everything is committed at the end of the scheduler transaction.
The reason for that is that in the nested transactions Task instances are basically detached entities (Sessions started in the nested transactions are not aware of those instances). At the end of the scheduler transaction Hibernate performs dirty check on the managed instances and synchronizes changes with the database.
This approach is also very risky, because there may be troubles if you try to access an uninitialized proxy on a Task instance in the nested transaction. And there may be troubles if you change the Task object graph in the nested transaction by adding to it some other entity instance loaded in the nested transaction (because that instance will now be detached when the control returns to the scheduler transaction).
On the other hand, your second approach is correct and straightforward and helps avoid all of the above pitfalls. Only, I would read the ids and commit the transaction (there is no need to keep it suspended while the tasks are being processed). The easiest way to achieve it is to remove the Transactional annotation from the scheduler and make the repository method transactional (if it isn't transactional already).
If (and only if) the performance of the second approach is an issue, as you already mentioned you could go with asynchronous processing or even parallelize the processing to some degree. Also, you may want to take a look at extended sessions (conversations), maybe you could find it suitable for your use case.
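A minimal sketch of that suggestion, i.e. reading only the ids in a short transaction of their own and handing them to the REQUIRES_NEW handler (findIdsByStatus is an assumed query method name):

@Component
class TaskScheduler {

    @Autowired
    private TaskRepository taskRepository;

    @Autowired
    private HandlingService handlingService;

    // No @Transactional here: the repository call runs (and commits) its own
    // short transaction, so no outer transaction stays suspended
    @Scheduled(fixedRate = 15000)
    public void triggerTransactionStatusChangeHandling() {
        List<Long> ids = taskRepository.findIdsByStatus(Status.OPEN); // assumed query method
        ids.forEach(handlingService::handle); // each handle() runs in its own transaction
    }
}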
The current code processes the task in the nested transaction, but updates the status of the task in the outer transaction (because the Task object is managed by the outer transaction). Because these are different transactions, it is possible that one succeeds while the other fails, leaving the database in an inconsistent state. In particular, with this code, completed tasks remain in status OPEN if processing another task throws an exception, or if the server is restarted before all tasks have been processed.
As your example shows, passing managed entities to another transaction makes it ambiguous which transaction should update these entities, and is therefore best avoided. Instead, you should be passing ids (or detached entities), and avoid unnecessary nesting of transactions.
Assuming that processTask(task) is a method in the HandlingService class (the same class as the handle(task) method), removing @Transactional in HandlingService won't work because of the natural behavior of Spring's dynamic proxies.
Quoting from the spring.io forum:
When Spring loads your TestService it wraps it with a proxy. If you call a method of the TestService from outside the TestService, the proxy will be invoked instead and your transaction will be managed correctly. However, if you call your transactional method from a method in the same object, you will not invoke the proxy but the target of the proxy directly, and you won't execute the code wrapped around your service to manage the transaction.
Here is one SO thread about this topic, and here are some articles about it:
http://tutorials.jenkov.com/java-reflection/dynamic-proxies.html
http://tutorials.jenkov.com/java-persistence/advanced-connection-and-transaction-demarcation-and-propagation.html
http://blog.jhades.org/how-does-spring-transactional-really-work/
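To illustrate the pitfall those links describe, here is a minimal sketch (the class and method names are made up for illustration):

@Service
public class TestService {

    @Transactional
    public void transactionalWork() {
        // runs in a transaction ONLY when invoked through the Spring proxy
    }

    public void selfInvoking() {
        // this call bypasses the proxy, so NO transaction is started
        transactionalWork();
    }
}

Calling testService.transactionalWork() from another bean goes through the proxy and is transactional; calling testService.selfInvoking() runs transactionalWork() without any transaction.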
If you really don't like adding the @Transactional annotation to your @Scheduled method, you could get the transaction from the EntityManager and manage it programmatically, for example:
// works with a resource-local EntityManager
EntityTransaction tx = entityManager.getTransaction();
tx.begin();
try {
    processTask(task);
    task.setStatus(Status.PROCESSED);
    tx.commit();
} catch (Exception e) {
    tx.rollback();
}
But I doubt you will go this way (well, I wouldn't). In the end,
Can you please offer your opinion on which is the correct way of tackling this kind of problem
There's no single correct way in this case. My personal opinion is that an annotation (for example, @Transactional) is just a marker, and you need an annotation processor (Spring, in this case) to make @Transactional work. An annotation has no effect at all without its processor.
I would worry more about, for example, why task.setStatus(Status.PROCESSED); lives outside of processTask(task) when it looks like part of the same piece of work, etc.
HTH.
I'm working with Spring's JdbcTemplate in some desktop applications.
I'm trying to roll back some database operations, but I don't know how to manage the transaction with this object (JdbcTemplate). I'm doing multiple inserts and updates through a sequence of methods. When any operation fails, I need to roll back all previous operations.
Any idea?
Update: I tried to use @Transactional, but the rollback doesn't happen.
Do I need some previous configuration on my JdbcTemplate?
My example:
@Transactional(rollbackFor = Exception.class, propagation = Propagation.REQUIRES_NEW)
public void testing() {
    jdbcTemplate.update("INSERT INTO Acess_Level (IdLevel, Description) VALUES (1, 'Admin')");
    jdbcTemplate.update("INSERT INTO Acess_Level (IdLevel, Description) VALUES (STRING, 'Admin')");
}
IdLevel is a NUMERIC column, so the second command will throw an exception. When I look at the table in the database afterwards, I can see the first insert... but I think that operation should have been rolled back.
What is wrong?
JdbcTemplate doesn't handle transactions by itself. You should use a TransactionTemplate or @Transactional annotations: with these, you can group operations within a transaction and roll back all of them in case of an error.
@Transactional
public void someMethod() {
    jdbcTemplate.update(..)
}
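The answer also names TransactionTemplate; here is a minimal sketch of that programmatic variant, reusing the question's Acess_Level table (the DAO class name and the second row's values are made up for illustration):

public class AccessLevelDao {

    private final JdbcTemplate jdbcTemplate;
    private final TransactionTemplate transactionTemplate;

    public AccessLevelDao(JdbcTemplate jdbcTemplate, PlatformTransactionManager txManager) {
        this.jdbcTemplate = jdbcTemplate;
        this.transactionTemplate = new TransactionTemplate(txManager);
    }

    public void insertBoth() {
        // Both inserts run in one transaction; any exception rolls both back
        transactionTemplate.execute(status -> {
            jdbcTemplate.update("INSERT INTO Acess_Level (IdLevel, Description) VALUES (1, 'Admin')");
            jdbcTemplate.update("INSERT INTO Acess_Level (IdLevel, Description) VALUES (2, 'User')");
            return null;
        });
    }
}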
In Spring, private methods don't get proxied, so the annotation will not work. See this question: Does Spring @Transactional attribute work on a private method?
Create a service and put @Transactional on the public methods that you want to be transactional. It would be more normal-looking Spring code to have simple DAO objects that each do one job, and inject several of them, than to have one complicated DAO object that performs multiple SQL calls within its own transaction.
I use Spring + HibernateTemplate to process entities. There is a rather large number of entities to be loaded all at once, so I retrieve an iterator from the Hibernate template.
Every entity should be processed as a single unit of work. I tried to put the entity processing in a separate transaction (propagation = REQUIRES_NEW), but I ended up with an exception stating that a proxy is bound to two sessions. This is due to the suspended transaction used for the iterator's lazy loading. I am using the same bean for lazy loading and for processing entities. (Maybe it should be refactored into two separate DAOs: one for lazy loading and one for processing?)
Then I tried to use a single transaction that is committed after each entity is processed. Much better: there are no exceptions during entity processing, but after processing finishes and the method returns, an exception is thrown from Spring's transaction-management code.
@Transactional
public void processManyManyEntities() {
    org.hibernate.Session hibernateSession = myDao.getHibernateTemplate().getSessionFactory().getCurrentSession();
    Iterator<Entity> entities = myDao.findEntitesForProcessing();
    while (entities.hasNext()) {
        Entity entity = entities.next();
        hibernateSession.beginTransaction();
        anotherBean.processSingleEntity(entity);
        hibernateSession.getTransaction().commit();
    }
}
processSingleEntity is a method in another bean annotated with @Transactional, so there is one transaction for all entities. I checked which transaction causes the exception: it is the very first transaction returned from hibernateSession.beginTransaction(), so it is just not updated in the transaction manager.
There are several questions:
is it possible to avoid the session-binding exception without refactoring the DAO? not a relevant question, as the problem is with the Hibernate Session, not with the DAO
is it possible to update the transaction in the transaction manager? solved
is it possible to use the same transaction (for anotherBean.processSingleEntity(entity);) without the @Transactional annotation on processManyManyEntities?
I would prefer to remove the @Transactional annotation on processManyManyEntities() and eagerly load all data in findEntitesForProcessing.
If you want each entity to be handled transactionally in processSingleEntity, the @Transactional on processSingleEntity is fine; you don't have to annotate processManyManyEntities() with @Transactional. But with lazy loading in play, eager loading is a necessary means of preventing the source data from being loaded in another session, say, in processSingleEntity.
Since there is a transaction per entity, the transaction boundary around loading the source data is not part of your intended transactions. Don't let lazy loading complicate your intent of one transaction per data modification.
It is possible to update the transaction in the transaction manager. All you need to do is get the SessionHolder instance from TransactionSynchronizationManager.getResource(sessionFactory) and set the transaction in the session holder to the current one. That makes it possible to commit a transaction whenever necessary, and still allows the last transaction to be committed by the Spring transaction manager after the method returns.
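A minimal sketch of that workaround inside the loop from the question (SessionHolder here is Spring's Hibernate support class; treat the exact cast and package as assumptions that depend on your Spring/Hibernate versions):

// inside the while loop, instead of a bare beginTransaction()
Transaction tx = hibernateSession.beginTransaction();
SessionHolder holder =
        (SessionHolder) TransactionSynchronizationManager.getResource(sessionFactory);
holder.setTransaction(tx); // let Spring's transaction manager track the new transaction
anotherBean.processSingleEntity(entity);
tx.commit();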
In our J2EE application, we use an EJB 3 stateful bean to allow the front-end code to create, modify and save persistent entities (managed through JPA 2).
It looks something like this:
@LocalBean
@Stateful
@TransactionAttribute(TransactionAttributeType.NEVER)
public class MyEntityController implements Serializable
{
    @PersistenceContext(type = PersistenceContextType.EXTENDED)
    private EntityManager em;

    private MyEntity current;

    public void create()
    {
        this.current = new MyEntity();
        em.persist(this.current);
    }

    public void load(Long id)
    {
        this.current = em.find(MyEntity.class, id);
    }

    @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
    public void save()
    {
        em.flush();
    }
}
Very important: to avoid committing too early, only the save() method runs within a transaction, so calling create() inserts nothing into the database.
Curiously, in the save() method we have to call em.flush() to actually hit the database. In fact, I tried and found that we can also call em.isOpen() or em.getFlushMode(), or anything that is "em-related".
I don't understand this point. As save() runs in a transaction, I thought that at the end of the method the transaction would be committed, and so the persistence context automatically flushed. Why do I have to flush it manually?
Thanks,
Xavier
To be direct and to the metal: there will be no javax.transaction.Synchronization objects registered for the EntityManager in question until you actually use it in a transaction.
We in app-server-land create one of these objects to do the flush() and register it with the javax.transaction.TransactionSynchronizationRegistry or javax.transaction.Transaction. This can't be done unless there is an active transaction.
That's the long and short of it.
Yes, an app server could very well keep a list of the resources it gave the stateful bean and auto-enroll them in every transaction that stateful bean starts or participates in. The downside is that you completely lose the ability to decide which things go in which transactions. Maybe you have two or three different transactions to run on different persistence units and are aggregating the work in your extended persistence context for one very specific transaction. It's really a design issue, and the app server should leave such decisions to the app itself.
You use it in a transaction and we'll enroll it in the transaction. That's the basic contract.
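Roughly what the container does once you touch the EntityManager inside a transaction, sketched with the standard JTA API (illustrative only; the real app-server code is of course more involved, and "registry" stands in for the container's javax.transaction.TransactionSynchronizationRegistry):

registry.registerInterposedSynchronization(new Synchronization() {
    @Override
    public void beforeCompletion() {
        em.flush(); // push pending changes right before the transaction commits
    }

    @Override
    public void afterCompletion(int status) {
        // cleanup hooks would go here
    }
});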
Side note: depending on how the underlying EntityManager is handled, any persistence call to the EntityManager may be enough to cause a complete flush at the end of the transaction. Certainly, flush() is the most direct and clear, but a persist() or even a find() might do it.
If you use an extended persistence context, all operations on managed entities done inside non-transactional methods are queued to be written to the database. Once you call flush() on the entity manager within a transaction context, all queued changes are written to the database. In other words, the fact that you have a transactional method doesn't itself commit the changes when the method exits (as with CMT); flushing the entity manager actually does. You can find a full explanation of this process here.
Because there is no way to know "when" the client is done with the session (extended scope).