I want to ask what the flush method actually does in the following case:
for (int i = 0; i < myList.size(); i++) {
    Car c = new Car(myList.get(i).getId(), myList.get(i).getName());
    getCurrentSession().save(c);
    if (i % 20 == 0)
        getCurrentSession().flush();
}
Does this mean that after iteration 20 the cache is flushed, and that the 20 objects held in memory are then actually saved in the database?
Can someone please explain what happens when the condition is true?
From the javadoc of Session#flush:
Force this session to flush. Must be called at the end of a unit of work, before committing the transaction and closing the session (depending on flush-mode, Transaction.commit() calls this method).

Flushing is the process of synchronizing the underlying persistent store with persistable state held in memory.
In other words, flush tells Hibernate to execute the SQL statements needed to synchronize the JDBC connection's state with the state of objects held in the session-level cache. And the condition if (i % 20 == 0) will make it happen for every i that is a multiple of 20 (including i = 0, i.e. on the first iteration).
But, still, the new Car instances will be held in the session-level cache and, for a big myList.size(), you're going to eat all the memory and ultimately get an OutOfMemoryError. To avoid this situation, the pattern described in the documentation is to flush AND clear the session at regular intervals (same size as the JDBC batch size) to persist the changes and then detach the instances so that they can be garbage collected:
13.1. Batch inserts

When making new objects persistent, flush() and then clear() the session regularly in order to control the size of the first-level cache.
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

for ( int i = 0; i < 100000; i++ ) {
    Customer customer = new Customer(.....);
    session.save(customer);
    if ( i % 20 == 0 ) { //20, same as the JDBC batch size
        //flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}

tx.commit();
session.close();
The documentation mentions in the same chapter how to set the JDBC batch size.
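For reference, here is a minimal sketch of setting that property programmatically; hibernate.jdbc.batch_size is the documented property name, while the Configuration-based bootstrap shown here is just one way to set it (it can equally go into hibernate.cfg.xml or a properties file):

// Hibernate 3.x style bootstrap; adapt to your own configuration mechanism.
Configuration cfg = new Configuration().configure();  // reads hibernate.cfg.xml
cfg.setProperty("hibernate.jdbc.batch_size", "20");   // same value as the flush interval above
SessionFactory sessionFactory = cfg.buildSessionFactory();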
See also
10.10. Flushing the Session
Chapter 13. Batch processing
It depends on how the FlushMode is set up.

In the default configuration, Hibernate tries to synchronize with the database at three points:
1. before querying data
2. on committing a transaction
3. when flush is called explicitly

If the FlushMode is set to FlushMode.MANUAL, the programmer is telling Hibernate that he/she will decide when to push the data to the database. Under this configuration, the session.flush() call will save the object instances to the database, and a session.clear() call can be used to clear the persistence context.
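For illustration, a minimal sketch of MANUAL flush mode (FlushMode.MANUAL exists since Hibernate 3.2; the Car(id, name) constructor and the sample values are borrowed from the question and are illustrative):

Session session = sessionFactory.openSession();
session.setFlushMode(FlushMode.MANUAL);  // Hibernate will not flush on its own
Transaction tx = session.beginTransaction();
session.save(new Car(1L, "Audi"));       // queued in the persistence context only
session.flush();                         // now the INSERT is actually sent to the database
tx.commit();
session.close();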
// Assume the list has 50 elements
for (int i = 0; i < 50; i++) {
    Car c = new Car(car.get(i).getId(), car.get(i).getName());
    getCurrentSession().save(c);
    // every 20 iterations, the Car objects held in memory are synchronized with the DB
    if (i % 20 == 0)
        getCurrentSession().flush();
}
A few more pointers on why the flush interval should match the batch size.

To enable batching you need to set the JDBC batch size:

// In your case
hibernate.jdbc.batch_size=20

One common pitfall with batching: if you insert or update a single entity type, everything works fine. But if you work with multiple entity types, leading to interleaved inserts/updates, you have to explicitly tell Hibernate to order the statements.

For example:
// Assume the list has 50 elements
for (int i = 0; i < 50; i++) {
    Car c = new Car(car.get(i).getId(), car.get(i).getName());
    // Adding an accessory to the car here as well
    Accessories a = new Accessories("I am new one");
    c.add(a);
    // Now you have two entities to be persisted: car and accessory
    // Two SQL inserts
    getCurrentSession().save(c);
    // every 20 iterations, the objects held in memory are synchronized with the DB,
    // and clear() then evicts them from the first-level cache
    if (i % 20 == 0) {
        getCurrentSession().flush();
        getCurrentSession().clear();
    }
}
In this case, two SQL statements are generated per iteration:
1 insert for the car
1 insert for the accessory

For proper batching you will have to set
<prop key="hibernate.order_inserts">true</prop>
so that all the inserts for cars are sorted together and all the inserts for accessories are sorted together. By doing so you get 20 inserts firing in a batch rather than one SQL statement at a time.
For different operations under one transaction, you can have a look at http://docs.jboss.org/hibernate/core/3.2/api/org/hibernate/event/def/AbstractFlushingEventListener.html
Yes, every 20 iterations SQL is generated and executed for the unsaved objects. You should also set the JDBC batch size to 20 to improve performance.
Related
I would like to know what is the best way to do the following:
A client sends a JSON payload of 100 records to the Spring Boot application to insert into the DB.
But before inserting I have to execute a query to verify some data for EACH of the 100 records, and then insert.
I currently have this:
for (int i = 0; i < productos.size(); i++) {
    productos.get(i).setIdvehiculo(productoRepository.findTesting("49878", 3)); // ----> NATIVE QUERY EXECUTION TAKES 100ms I THINK
    productoRepository.save(productos.get(i)); // ----> INSERT
}
//productoRepository.saveAll(productos);
entityManager.flush();
entityManager.clear();
And it takes 10 seconds ... doing the select and inserting. 100 records, 10 seconds, isn't that a long time?
Don't insert 1:1 inside the for loop; just construct the model there, add it to an ArrayList, and once you are done processing the records, call saveAll(productos) outside the loop.
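A rough sketch of that approach, keeping the names from the question (Producto, productoRepository, findTesting) and assuming Spring Data JPA's saveAll:

@Transactional
public void insertAll(List<Producto> productos) {
    List<Producto> toInsert = new ArrayList<>();
    for (Producto p : productos) {
        p.setIdvehiculo(productoRepository.findTesting("49878", 3)); // still one lookup per record
        toInsert.add(p);                                             // but no insert inside the loop
    }
    productoRepository.saveAll(toInsert); // one call; the provider can batch the INSERTs
}

For the inserts to actually be batched at the JDBC level you still need hibernate.jdbc.batch_size set (in Spring Boot, spring.jpa.properties.hibernate.jdbc.batch_size), and note that Hibernate cannot batch inserts for IDENTITY-generated ids.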
Try enabling the L2 cache; that would reduce the validation time. Depending on how critical your data is, you can also cache the entity at the application level (see the sketch after this list).
Save the entities inside a transaction. This allows the database to leverage its concurrency control.
See if you can change the architecture to introduce a queue (could be a Kafka queue), with another application consuming the queue and writing to the database.
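As a sketch of the "cache on the application level" idea: the validation lookup can be memoized in a local map when the same parameters repeat. The names come from the question; the cache key and the Long return type of findTesting are assumptions:

Map<String, Long> lookupCache = new HashMap<>();
for (Producto p : productos) {
    Long idVehiculo = lookupCache.computeIfAbsent("49878:3",
            key -> productoRepository.findTesting("49878", 3)); // hits the DB only once per distinct key
    p.setIdvehiculo(idVehiculo);
}
productoRepository.saveAll(productos);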
I am trying to persist many records in the database, read from a file with many lines.
I'm using a forEach to iterate over the list of objects parsed from the file:
logs.stream().forEach(log -> save(log));

private LogData save(LogData log) {
    return repository.persist(log);
}
But the inserts are slow
Is there a way to speed up the inserts?
Your approach takes a long time because you persist element by element, so you make N round trips to the database. Use batch processing instead, with one transaction instead of N transactions; the persist method could be:
public void persist(List<Logs> logs) {
    Session session = sessionFactory.openSession();
    Transaction tx = session.beginTransaction();
    logs.forEach(log -> session.save(log)); // from the comment of #shmosel
    tx.commit();
    session.close();
}
Use a batch insert: Google "Hibernate Batch Insert", or substitute the name of your ORM if it's not Hibernate.
https://www.tutorialspoint.com/hibernate/hibernate_batch_processing.htm
Inserting at every line makes this program slow. Why not collect n lines and insert those n lines together at once, as sketched below?
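A sketch combining these suggestions with the flush/clear pattern shown earlier; LogData, logs and sessionFactory are taken from the question, and the interval of 50 is arbitrary (it should match hibernate.jdbc.batch_size):

Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
int i = 0;
for (LogData log : logs) {
    session.save(log);
    if (++i % 50 == 0) {  // every n records...
        session.flush();  // ...push the batched INSERTs to the JDBC driver
        session.clear();  // ...and detach the saved entities to keep memory flat
    }
}
tx.commit();
session.close();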
I have a classic Java EE system, Web tier with JSF, EJB 3 for the BL, and Hibernate 3 doing the data access to a DB2 database. I am struggling with the following scenario: A user will initiate a process which involves retrieving a large data set from the database. The retrieval process takes some time and so the user does not receive an immediate response, gets impatient and opens a new browser and initiates the retrieval again, sometimes multiple times. The EJB container is obviously unaware of the fact that the first retrievals are no longer relevant, and when the database returns a result set, Hibernate starts populating a set of POJOs which take up vast amounts of memory, eventually causing an OutOfMemoryError.
A potential solution that I thought of was to use the Hibernate Session's cancelQuery method. However, the cancelQuery method only works before the database returns a result set. Once the database returns a result set and Hibernate begins populating the POJOs, the cancelQuery method no longer has an effect. In this case, the database queries themselves return rather quickly, and the bulk of the performance overhead seems to reside in populating the POJOs, at which point we can no longer call the cancelQuery method.
The solution implemented ended up looking like this:
The general idea was to maintain a map of all the Hibernate sessions that are currently running queries to the HttpSession of the user who initiated them, so that when the user would close the browser we would be able to kill the running queries.
There were two main challenges to overcome here. One was propagating the HTTP session-id from the web tier to the EJB tier without interfering with all the method calls along the way - i.e. not tampering with existing code in the system. The second challenge was to figure out how to cancel the queries once the database had already started returning results and Hibernate was populating objects with the results.
The first problem was overcome based on our realization that all methods being called along the stack were being handled by the same thread. This makes sense, as our application lives entirely within one container and does not make any remote calls. That being the case, we created a Servlet Filter that intercepts every call to the application and stores the current HTTP session-id in a ThreadLocal variable. This way the HTTP session-id is available to each of the method calls lower down the line.
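A simplified sketch of such a filter (class and field names here are illustrative, not the actual implementation):

import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;

public class HttpSessionIdFilter implements Filter {

    private static final ThreadLocal<String> CURRENT_SESSION_ID = new ThreadLocal<String>();

    // lower tiers can call this to find out which HTTP session the current thread serves
    public static String currentHttpSessionId() {
        return CURRENT_SESSION_ID.get();
    }

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        try {
            CURRENT_SESSION_ID.set(((HttpServletRequest) request).getSession().getId());
            chain.doFilter(request, response);
        } finally {
            CURRENT_SESSION_ID.remove(); // don't leak the id on pooled threads
        }
    }

    public void init(FilterConfig filterConfig) {}
    public void destroy() {}
}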
The second challenge was a little more sticky. We discovered that the Hibernate method responsible for running the queries and subsequently populating the POJOs was called doQuery and located in the org.hibernate.loader.Loader.java class. (We happen to be using Hibernate 3.5.3, but the same holds true for newer versions of Hibernate.):
private List doQuery(
        final SessionImplementor session,
        final QueryParameters queryParameters,
        final boolean returnProxies) throws SQLException, HibernateException {

    final RowSelection selection = queryParameters.getRowSelection();
    final int maxRows = hasMaxRows( selection ) ?
            selection.getMaxRows().intValue() :
            Integer.MAX_VALUE;

    final int entitySpan = getEntityPersisters().length;

    final ArrayList hydratedObjects = entitySpan == 0 ? null : new ArrayList( entitySpan * 10 );
    final PreparedStatement st = prepareQueryStatement( queryParameters, false, session );
    final ResultSet rs = getResultSet( st, queryParameters.hasAutoDiscoverScalarTypes(), queryParameters.isCallable(), selection, session );

    final EntityKey optionalObjectKey = getOptionalObjectKey( queryParameters, session );
    final LockMode[] lockModesArray = getLockModes( queryParameters.getLockOptions() );
    final boolean createSubselects = isSubselectLoadingEnabled();
    final List subselectResultKeys = createSubselects ? new ArrayList() : null;
    final List results = new ArrayList();

    try {
        handleEmptyCollections( queryParameters.getCollectionKeys(), rs, session );

        EntityKey[] keys = new EntityKey[entitySpan]; //we can reuse it for each row

        if ( log.isTraceEnabled() ) log.trace( "processing result set" );

        int count;
        for ( count = 0; count < maxRows && rs.next(); count++ ) {

            if ( log.isTraceEnabled() ) log.debug("result set row: " + count);

            Object result = getRowFromResultSet(
                    rs,
                    session,
                    queryParameters,
                    lockModesArray,
                    optionalObjectKey,
                    hydratedObjects,
                    keys,
                    returnProxies
            );
            results.add( result );

            if ( createSubselects ) {
                subselectResultKeys.add(keys);
                keys = new EntityKey[entitySpan]; //can't reuse in this case
            }
        }

        if ( log.isTraceEnabled() ) {
            log.trace( "done processing result set (" + count + " rows)" );
        }
    }
    finally {
        session.getBatcher().closeQueryStatement( st, rs );
    }

    initializeEntitiesAndCollections( hydratedObjects, rs, session, queryParameters.isReadOnly( session ) );

    if ( createSubselects ) createSubselects( subselectResultKeys, queryParameters, session );

    return results; //getResultList(results);
}
In this method you can see that the results are first brought from the database in the form of a good old fashioned java.sql.ResultSet, after which it runs in a loop over each row and creates an object from it. Some additional initialization is performed in the initializeEntitiesAndCollections() method called after the loop. After debugging a little, we discovered that the bulk of the performance overhead was in these sections of the method, and not in the part that gets the java.sql.ResultSet from the database - but the cancelQuery method was only effective on that first part. The solution therefore was to add an additional condition to the for loop, to check whether the thread has been interrupted, like this:
for ( count = 0; count < maxRows && rs.next() && !Thread.currentThread().isInterrupted(); count++ ) {
    // ...
}
as well as to perform the same check before calling the initializeEntitiesAndCollections() method:
if (!Thread.interrupted()) {
    initializeEntitiesAndCollections(hydratedObjects, rs, session,
            queryParameters.isReadOnly(session));
    if (createSubselects) {
        createSubselects(subselectResultKeys, queryParameters, session);
    }
}
Additionally, by calling Thread.interrupted() for the second check, the interrupt flag is cleared and does not affect the further functioning of the program. Now, when a query is to be canceled, the canceling method looks up the Hibernate session and thread stored in a map keyed by the HTTP session-id, calls the cancelQuery method on the session, and calls the interrupt method of the thread, as sketched below.
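A simplified sketch of that registry (the map, its key and the registration points are assumptions of this write-up; Session.cancelQuery() and Thread.interrupt() are the real APIs involved):

import java.util.AbstractMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.hibernate.Session;

public class RunningQueryRegistry {

    private static final Map<String, Map.Entry<Session, Thread>> RUNNING =
            new ConcurrentHashMap<String, Map.Entry<Session, Thread>>();

    // called just before the query is executed, on the worker thread
    public static void register(String httpSessionId, Session session) {
        RUNNING.put(httpSessionId,
                new AbstractMap.SimpleEntry<Session, Thread>(session, Thread.currentThread()));
    }

    // called when the user closes the browser / abandons the request
    public static void cancel(String httpSessionId) {
        Map.Entry<Session, Thread> entry = RUNNING.remove(httpSessionId);
        if (entry != null) {
            entry.getKey().cancelQuery();  // effective while the statement is still executing
            entry.getValue().interrupt();  // stops the patched POJO-population loop above
        }
    }
}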
I got a similar problem in a totally different environment. I did the following: before adding a new job to my queue I first check whether the 'same job' is already enqueued by that user. If so, I do not accept the second job and inform the user about it.
This doesn't answer your question on how to protect the user from an OutOfMemoryError if the data is too big to fit in the available RAM. But it's a good trick to protect your server from doing useless work.
Too complicated for me :-) I would create a separate service for "heavy" queries and store in it information about the query parameters, and perhaps the results, valid for a limited time. If the query execution takes too long, the user receives a message that the task will take considerable time, and he may wait or cancel it. Such a scenario works fine for analytic queries, and it gives you simple access to the tasks running on the server so you can kill them.
But if you have this problem with Hibernate, then I suspect the problem is not in analytic queries but in ordinary business queries. If their execution takes too long, can you try to use the L2 cache (a cold start may be very slow, but hot data would be returned instantly)? Or optimize the Hibernate/JDBC parameters?
I have the equivalent of the following code and hibernate configuration (basically, StreamRef belongs to tape, and has to be unique on that tape):
<class name="StreamRef" table="StreamRefToTape">
    <composite-id>
        <key-property name="UUID"/>
        <key-many-to-one class="Tape" name="tape">
            <column name="Tape_TapeId" not-null="true"/>
        </key-many-to-one>
    </composite-id>
    ...
</class>

<class name="Tape" table="Tape">
    <id column="TapeId" name="tapeId"/>
</class>
I have millions of these StreamRef's, and I want to save them all within the same transaction, but I also want to save on RAM during this transaction.
So I attempted the following code, my assumption being that if I turn off CacheMode, then it won't track objects internally, so it will save a lot of RAM (this seems to help, to some degree). But when testing this hypothesis, like this:
session = sessionFactory.openSession();
session.setCacheMode(CacheMode.IGNORE); // disable the first level cache
session.beginTransaction();
Tape t = new Tape();
StreamRef s1 = new StreamRef("same uuid");
StreamRef s2 = new StreamRef("same uuid"); // force a primary key collision
session.saveOrUpdate(t);
for (StreamRef s : t.getStreams()) {
    session.save(s);
}
session.getTransaction().commit();
I would have expected this not to throw, because I turned off CacheMode (but it raises a NonUniqueObjectException, https://gist.github.com/4542569). Could somebody please confirm that 1) the Hibernate internal cache is not disable-able, and 2) this exception has nothing to do with CacheMode? Is there any way to accomplish what I want here (to not use up tons of Hibernate RAM within a transaction)?
somewhat related: https://stackoverflow.com/a/3543740/32453
(As a side question... does it matter in what order setCacheMode is called relative to beginTransaction? I assume it doesn't?)
Many thanks.
The exception makes sense. You're violating the rules you told Hibernate you were going to play by. If you really want to do what you've coded you'll need to use the StatelessSession API or createSQLQuery API. As it stands, Session.setCacheMode is for interacting with the second level cache, not the session cache.
Regarding memory usage, you'll want to incrementally flush batches of records to the database so Hibernate can purge its ActionQueue.
Here is an example from the section on batch updates in the user's guide:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

for ( int i = 0; i < 100000; i++ ) {
    Customer customer = new Customer(.....);
    session.save(customer);
    if ( i % 20 == 0 ) { //20, same as the JDBC batch size
        //flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}

tx.commit();
session.close();
You can also read about stateless sessions in the same chapter.
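For completeness, a minimal StatelessSession sketch (Tape and StreamRef come from the question, streamRefs is an assumed collection; a stateless session performs no cascades and no dirty checking and keeps no first-level cache, so memory usage stays flat):

// the parent Tape is assumed to be persisted already; streamRefs is the collection to insert
StatelessSession session = sessionFactory.openStatelessSession();
Transaction tx = session.beginTransaction();
for (StreamRef s : streamRefs) {
    session.insert(s);  // each INSERT is executed immediately, nothing is cached in memory
}
tx.commit();
session.close();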
Hibernate will save all of the session's objects at once... The cache also stores objects for other sessions... so you can't disable it for a single session... If you can't do that, try to use merge()...
I need to insert a lot of data into a database using Hibernate. I was looking at Hibernate's batch inserts; what I am using is similar to the example in the manual:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

for ( int i = 0; i < 100000; i++ ) {
    Customer customer = new Customer(.....);
    session.save(customer);
    if ( i % 20 == 0 ) { //20, same as the JDBC batch size
        //flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}

tx.commit();
session.close();
but I see that flush doesn't write the data to the database.
Reading about it, if the code is inside a transaction then nothing will be committed to the database until the transaction performs a commit.
So what is the need for flush/clear? It seems useless: if the data is not written to the database, then it is still in memory.
How can I force Hibernate to write the data to the database?
Thanks
The data is sent to the database, and is not in memory anymore. It's just not made definitively persistent until the transaction commit. It's exactly the same as if you executed the following sequence of statements in any database tool:
begin;
insert into ...
insert into ...
insert into ...
-- here, three inserts have been done on the database. But they will only be made
-- definitively persistent at commit time
...
commit;
The flush consists of executing the insert statements.
The commit consists of executing the commit statement.
The data will be written to the database, but depending on the transaction isolation level you will not see it (in other transactions) until the transaction is committed.
Use an SQL statement logger that prints the statements transported over the database connection; then you will see that the statements are sent to the database.
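For example, with Hibernate's own SQL logging enabled you can watch the INSERTs go out at flush time, before the commit; hibernate.show_sql and hibernate.format_sql are the standard properties, and the Configuration-based setup shown here is just one way to switch them on:

Configuration cfg = new Configuration().configure();
cfg.setProperty("hibernate.show_sql", "true");    // print every statement sent over the connection
cfg.setProperty("hibernate.format_sql", "true");  // pretty-print the statements
SessionFactory sessionFactory = cfg.buildSessionFactory();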
For best performance you also have to commit transactions. Flushing and clearing the session clears Hibernate's caches, but the data is moved to the JDBC connection/driver buffers and is still uncommitted (different RDBMS / drivers show different behaviour) - you are just shifting the problem elsewhere without a real improvement in performance.
Having flush() at the location mentioned saves you memory too, as your session will be cleared regularly. Otherwise you will have 100,000 objects in memory and might run out of memory for larger counts. Check out this article.