I need to insert a lot of data into a database using Hibernate. I was looking at batch inserts in Hibernate, and what I am using is similar to the example in the manual:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for ( int i = 0; i < 100000; i++ ) {
    Customer customer = new Customer(.....);
    session.save(customer);
    if ( i % 20 == 0 ) { // 20, same as the JDBC batch size
        // flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}
tx.commit();
session.close();
But I see that flush doesn't write the data to the database.
Reading about it, if the code is inside a transaction then nothing is committed to the database until the transaction performs a commit.
So what is the need for flush/clear? It seems useless: if the data is not written to the database, then it is still in memory.
How can I force Hibernate to write the data to the database?
Thanks
The data is sent to the database, and is not in memory anymore. It's just not made definitively persistent until the transaction commit. It's exactly the same as if you executed the following sequence of statements in any database tool:
begin;
insert into ...
insert into ...
insert into ...
-- here, three inserts have been done on the database. But they will only be made
-- definitively persistent at commit time
...
commit;
The flush consists of executing the insert statements.
The commit consists of executing the commit statement.
The data will be written to the database, but depending on the transaction isolation level, other transactions will not see it until the transaction is committed.
Use an SQL statement logger that prints the statements transported over the database connection, and you will see that the statements are sent to the database.
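For example, Hibernate's own statement logging will show the SQL as it is executed (a minimal sketch; show_sql and format_sql are standard Hibernate settings, and a JDBC-level proxy such as P6Spy can additionally log the commit calls):
hibernate.show_sql=true
hibernate.format_sql=true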
For best performance you also have to commit transactions. Flushing and clearing the session clears Hibernate's caches, but the data is moved to the JDBC connection's buffers and is still uncommitted (different RDBMS / drivers show different behaviour); you are just shifting the problem to another place without a real improvement in performance.
Having flush() at the location mentioned saves you memory too, as your session will be cleared regularly. Otherwise you will have 100,000 objects in memory and might run out of memory for larger counts. Check out this article.
I am trying to persist many records in the database, read from a file with many lines.
I'm using a forEach to iterate over the list of objects read from the file:
logs.stream().forEach(log -> save(log));
private LogData save(LogData log) {
    return repository.persist(log);
}
But the inserts are slow.
Is there a way to speed up the inserts?
Your way takes a long time because you persist element by element, so you go to the database n times. Use batch processing instead, so you use one transaction instead of n transactions; the persist method can then be:
public void persist(List<Logs> logs) {
    Session session = sessionFactory.openSession();
    Transaction tx = session.beginTransaction();
    logs.forEach(log -> session.save(log)); // from the comment of @shmosel
    tx.commit();
    session.close();
}
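With a very large list, this single transaction will still accumulate every entity in the session. A sketch combining it with the flush/clear pattern from the first answer above (assuming hibernate.jdbc.batch_size is set to 20):
public void persist(List<Logs> logs) {
    Session session = sessionFactory.openSession();
    Transaction tx = session.beginTransaction();
    for (int i = 0; i < logs.size(); i++) {
        session.save(logs.get(i));
        if (i % 20 == 0) { // same as the JDBC batch size
            // push the pending inserts to the driver and free session memory
            session.flush();
            session.clear();
        }
    }
    tx.commit();
    session.close();
}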
Use a batch insert. Google "Hibernate Batch Insert", or substitute the name of whatever ORM you are using if it's not Hibernate.
https://www.tutorialspoint.com/hibernate/hibernate_batch_processing.htm
Inserting at every line makes this program slow. Why not collect n lines and insert them together at once?
I stumbled upon a problem with locking a row in an Oracle DB. The purpose of the lock is to prevent more than one transaction from reading data from the DB, because this data influences the generation of new data and is changed within the transaction.
In order to take the lock, I've put the @Lock annotation on the Spring Data find method which retrieves the data that participates in the transaction.
@Lock(LockModeType.PESSIMISTIC_WRITE)
User findUserById(@Param("id") String operatorId);
After this code is implemented, I get the log message
org.hibernate.loader.Loader - HHH000444: Encountered request for locking however dialect reports that database prefers locking be done in a separate select (follow-on locking); results will be locked after initial query executes
Besides, it has no effect and causes
org.springframework.dao.DataIntegrityViolationException: could not execute batch; SQL [insert into ...]
The issue can be solved by rewriting the lock using the entity manager:
entityManager.lock(userByIdWithLockOnReadWrite, LockModeType.PESSIMISTIC_WRITE);
or
entityManager.unwrap(Session.class).lock(userByIdWithLockOnReadWrite, LockMode.PESSIMISTIC_WRITE);
The issue doesn't appear on MariaDB (MySQL).
Maybe there are some special rules for using the annotation?
You said that:
The purpose of the lock is to prevent more than one transaction reading data from the DB because this data influences the generation of new data and is changed in terms of a transaction.
Oracle uses MVCC (Multiversion Concurrency Control) so Readers don't block Writers and Writers don't block Readers. Even if you acquire a row-level lock with Oracle, and you modify that row without committing, other transactions can still read the last committed value.
Related to this log message:
org.hibernate.loader.Loader - HHH000444: Encountered request for locking however dialect reports that database prefers locking be done in a separate select (follow-on locking); results will be locked after initial query executes
The follow-on locking mechanism is due to Oracle not being able to apply the lock when using Oracle 11g-style pagination, DISTINCT, or UNION ALL.
If you're using Oracle 12c, then you can update the Hibernate dialect to Oracle12cDialect, and pagination and locking will work fine, since Oracle 12c uses the SQL standard pagination and no longer requires a derived table query.
This does not happen in MariaDB or any other database. It's just an Oracle pre-12 limitation.
If you are using Hibernate 5.2.1, we added a new hint HINT_FOLLOW_ON_LOCKING which disables this mechanism.
So, your Spring Data query becomes:
@QueryHints(value = { @QueryHint(name = "hibernate.query.followOnLocking", value = "false") }, forCounting = false)
@Lock(LockModeType.PESSIMISTIC_WRITE)
User findUserById(@Param("id") String operatorId);
You can also apply it manually:
User user = entityManager.createQuery(
        "select u from User u where id = :id", User.class)
    .setParameter("id", id)
    .unwrap( Query.class )
    .setLockOptions(
        new LockOptions( LockMode.PESSIMISTIC_WRITE )
            .setFollowOnLocking( false ) )
    .getSingleResult();
Is there any option to make a transaction (TxB) wait for some time (without throwing a lock acquisition exception) for another transaction (TxA) to release the lock?
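For what it's worth, JPA exposes a pessimistic lock timeout hint that asks the database to wait rather than fail immediately; whether it is honored depends on the database and dialect. A minimal sketch, assuming a plain EntityManager and the User entity from above:
// TxB will block here for up to 5 seconds if TxA holds the row lock
Map<String, Object> hints = new HashMap<>();
hints.put("javax.persistence.lock.timeout", 5000); // milliseconds
User user = entityManager.find(User.class, id, LockModeType.PESSIMISTIC_WRITE, hints);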
I noticed weird behavior in my application. It looks like committed data is not visible right after the commit. The algorithm looks like this:
connection1 - insert into table row with id = 5
connection1 - commit, close
connection2 - open
connection2 - select from table row with id = 5 (no results)
connection2 - insert into table row with id = 5 (PRIMARY KEY VIOLATION, result is in db)
If the select on connection2 returns no results then I do an insert, otherwise an update.
The server has many databases (~200). It looks like the commit is done but the changes show up in the DB later. I use Java and JDBC. Any ideas would be appreciated.
This behavior corresponds to the REPEATABLE READ isolation mode, see SET TRANSACTION:
REPEATABLE READ
All statements of the current transaction can only see rows committed before the first query or data-modification statement was executed in this transaction.
Try connection.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED) to see if it makes a difference.
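For example, a sketch of connection2 (setTransactionIsolation is standard JDBC; the table and column names are illustrative):
Connection connection2 = dataSource.getConnection();
// Each statement now sees the latest committed data instead of a snapshot
// taken when the transaction started.
connection2.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
PreparedStatement ps = connection2.prepareStatement("select 1 from mytable where id = ?");
ps.setInt(1, 5);
ResultSet rs = ps.executeQuery();
boolean exists = rs.next(); // now reflects connection1's committed insert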
I have an application using hibernate. One of its modules calls a native SQL (StoredProc) in batch process. Roughly what it does is that every time it writes a file it updates a field in the database. Right now I am not sure how many files would need to be written as it is dependent on the number of transactions per day so it could be zero to a million.
If I use this code snippet in a loop, will I have any problems?
@Transactional
public void test()
{
    // The for loop represents a list of records that needs to be processed.
    for (int i = 0; i < 1000000; i++)
    {
        // Process the records and write the information into a file.
        ...
        // Update a field(s) in the database using a stored procedure based on the processed information.
        updateField(String.valueOf(i));
    }
}

@Transactional(propagation = Propagation.MANDATORY)
public void updateField(String value)
{
    Session session = getSession();
    SQLQuery sqlQuery = session.createSQLQuery("exec spUpdate :value");
    sqlQuery.setParameter("value", value);
    sqlQuery.executeUpdate();
}
Will I need any other configurations for my data source and transaction manager?
Will I need to set hibernate.jdbc.batch_size and hibernate.cache.use_second_level_cache?
Will I need to use session flush and clear for this? The samples in the Hibernate tutorial use POJOs and not native SQL, so I am not sure if it is also applicable.
Please note that another part of the application is already using Hibernate, so as much as possible I would like to stick with Hibernate.
Thank you for your time, and I am hoping for a quick response. If possible, a code snippet would be really useful for me.
Application Work Flow
1) Query Database for the transaction information. (Transaction date, Type of account, currency, etc..)
2) For each account process transaction information. (Discounts, Current Balance, etc..)
3) Write the transaction information and processed information to a file.
4) Update a database field based on the process information
5) Go back to step 2 while there are still accounts. (Assuming that no exceptions are thrown.)
The code snippet will open and close the session for each iteration, which is definitely not a good practice.
Is it possible for you to have a job which checks how many new files were added to the folder?
The job should run, say, every 15/25 minutes, check how many files were changed/added in the last 15/25 minutes, and update the database in batch.
Something like that will lower the number of opened/closed session connections. It should be much faster than this.
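A sketch of such a job, assuming Spring's scheduling support is available (the helper methods are hypothetical placeholders):
@Scheduled(fixedDelay = 15 * 60 * 1000) // run every 15 minutes
public void updateChangedFiles() {
    // collect the files written since the last run (hypothetical helper)...
    List<String> changedFiles = findFilesChangedSinceLastRun();
    // ...and update the corresponding rows in one batched transaction (hypothetical helper)
    updateFieldsInBatch(changedFiles);
}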
I want to inquire about what the flush method actually does in the following case:
for (int i = 0; i < myList.size(); i++) {
    Car c = new Car( car.get(i).getId(), car.get(i).getName() );
    getCurrentSession().save(c);
    if (i % 20 == 0)
        getCurrentSession().flush();
}
Does this mean that after iteration 20, the cache is flushed, and the 20 objects held in memory are then actually saved in the database?
Can someone please explain to me what will happen when the condition is true?
From the javadoc of Session#flush:
Force this session to flush. Must be called at the end of a unit of work, before committing the transaction and closing the session (depending on flush-mode, Transaction.commit() calls this method).
Flushing is the process of synchronizing the underlying persistent store with persistable state held in memory.
In other words, flush tells Hibernate to execute the SQL statements needed to synchronize the JDBC connection's state with the state of objects held in the session-level cache. And the condition if (i % 20 == 0) will make it happen for every i multiple of 20.
But, still, the new Car instances will be held in the session-level cache and, for a big myList.size(), you're going to eat all the memory and ultimately get an OutOfMemoryError. To avoid this situation, the pattern described in the documentation is to flush AND clear the session at regular intervals (same size as the JDBC batch size) to persist the changes and then detach the instances so that they can be garbage collected:
13.1. Batch inserts
When making new objects persistent, flush() and then clear() the session regularly in order to control the size of the first-level cache.
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for ( int i = 0; i < 100000; i++ ) {
    Customer customer = new Customer(.....);
    session.save(customer);
    if ( i % 20 == 0 ) { // 20, same as the JDBC batch size
        // flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}
tx.commit();
session.close();
The documentation mentions in the same chapter how to set the JDBC batch size.
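For reference, it boils down to a single property (a standard Hibernate setting; the key can go in hibernate.cfg.xml or hibernate.properties):
hibernate.jdbc.batch_size=20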
See also
10.10. Flushing the Session
Chapter 13. Batch processing
It depends on how the FlushMode is set up.
In the default configuration, Hibernate tries to sync up with the database at three points:
1. before querying data
2. on committing a transaction
3. when flush is called explicitly
If the FlushMode is set to FlushMode.MANUAL, the programmer is informing Hibernate that he/she will handle when to pass the data to the database. Under this configuration the session.flush() call will save the object instances to the database.
A session.clear() call can then be used to clear the persistence context.
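For instance, a minimal sketch of enabling manual flushing (setFlushMode is a standard Session method):
Session session = sessionFactory.openSession();
// Hibernate will no longer flush automatically before queries or at commit;
// the application decides when to write by calling session.flush().
session.setFlushMode(FlushMode.MANUAL);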
// Assume the list contains 50 elements
for (int i = 0; i < 50; i++) {
    Car c = new Car( car.get(i).getId(), car.get(i).getName() );
    getCurrentSession().save(c);
    // every 20 Car objects saved in memory are synchronized with the DB
    if (i % 20 == 0)
        getCurrentSession().flush();
}
A few more pointers on why the flush interval should match the batch size.
To enable batching you need to set the JDBC batch size; in your case:
hibernate.jdbc.batch_size=20
One common pitfall in using batching: if you insert or update a single entity type, everything goes fine. But if you work with multiple entity types, leading to interleaved inserts/updates, you will have to explicitly enable the ordering mechanism.
For example
// Assume the list contains 50 elements
for (int i = 0; i < 50; i++) {
    Car c = new Car( car.get(i).getId(), car.get(i).getName() );
    // also adding an accessory to the car here
    Accessories a = new Accessories("I am new one");
    c.add(a);
    // now you have two entities to be persisted: car and accessory
    // two SQL inserts
    getCurrentSession().save(c);
    // every 20 Car objects saved in memory are synchronized with the DB;
    // flushing here clears the Car objects from the first-level cache
    if (i % 20 == 0) {
        getCurrentSession().flush();
        getCurrentSession().clear();
    }
}
In this case, two SQL statements are generated:
1 insert for the car
1 insert for the accessory
For proper batching you will have to set
<prop key="hibernate.order_inserts">true</prop>
so that all the inserts for Car are grouped together and all the inserts for Accessories are grouped together. By doing so you will have 20 inserts firing in a batch rather than 1 SQL statement firing at a time.
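Putting it together with the batch size from above (hibernate.order_updates is the analogous standard setting for ordering updates):
<prop key="hibernate.jdbc.batch_size">20</prop>
<prop key="hibernate.order_inserts">true</prop>
<prop key="hibernate.order_updates">true</prop>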
For the different operations under one transaction, you can have a look at http://docs.jboss.org/hibernate/core/3.2/api/org/hibernate/event/def/AbstractFlushingEventListener.html
Yes, every 20 iterations of the loop, SQL is generated and executed for the unsaved objects. You should also set the JDBC batch size to 20 to increase performance.