How is Hibernate able to throw a NonUniqueObjectException with setCacheMode(CacheMode.IGNORE)?

I have the equivalent of the following code and Hibernate configuration (basically, a StreamRef belongs to a Tape and has to be unique on that tape):
<class name="StreamRef" table="StreamRefToTape">
<composite-id> <key-property name="UUID"/>
<key-many-to-one class="Tape" name="tape">
<column name="Tape_TapeId" not-null="true"/>
</key-many-to-one>
</composite-id>
...</class>
<class name="Tape" table="Tape">
<id column="TapeId" name="tapeId"/></class>
I have millions of these StreamRefs, and I want to save them all within the same transaction, but I also want to save on RAM during this transaction.
So I attempted the following code, my assumption being that if I set the cache mode to IGNORE, Hibernate won't track objects internally and will therefore use far less RAM (this does seem to help, to some degree). But when I test this hypothesis like this:
session = sessionFactory.openSession();
session.setCacheMode(CacheMode.IGNORE); // disable the first level cache
session.beginTransaction();
Tape t = new Tape();
StreamRef s1 = new StreamRef("same uuid");
StreamRef s2 = new StreamRef("same uuid"); // force a primary key collision
t.getStreams().add(s1); // associate the streams with the tape so the loop below has something to iterate
t.getStreams().add(s2);
session.saveOrUpdate(t);
for (StreamRef s : t.getStreams()) {
    session.save(s);
}
session.getTransaction().commit();
I would have expected this not to throw, because I turned the cache mode off, but it raises a NonUniqueObjectException (https://gist.github.com/4542569). Could somebody please confirm that 1) the Hibernate internal (first-level) cache cannot be disabled, and 2) this exception has nothing to do with CacheMode? Is there any way to accomplish what I want here (i.e. not use up tons of Hibernate RAM within a transaction)?
somewhat related: https://stackoverflow.com/a/3543740/32453
(As a side question: does the order in which setCacheMode is called in relation to beginTransaction matter? I assume it doesn't?)
Many thanks.

The exception makes sense. You're violating the rules you told Hibernate you were going to play by. If you really want to do what you've coded, you'll need to use the StatelessSession API or the createSQLQuery API. As it stands, Session.setCacheMode is for interacting with the second-level cache, not the session cache.
Regarding memory usage, you'll want to incrementally flush batches of records to disk so Hibernate can purge its ActionQueue.
Here is an example from the section on batch updates in the user's guide:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

for ( int i = 0; i < 100000; i++ ) {
    Customer customer = new Customer(.....);
    session.save(customer);
    if ( i % 20 == 0 ) { // 20, same as the JDBC batch size
        // flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}

tx.commit();
session.close();
You can also read about stateless sessions in the same chapter.
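If the StatelessSession route fits your case, here is a minimal sketch (hedged: it assumes the StreamRef objects are reachable from the Tape via getStreams(), as in your loop; a stateless session keeps no first-level cache, does no dirty checking, and does not cascade, so every object must be inserted explicitly):

// Sketch only: StatelessSession keeps no persistence context, so memory stays flat,
// but it fires no cascades and no interceptors/events.
StatelessSession ss = sessionFactory.openStatelessSession();
Transaction tx = ss.beginTransaction();
try {
    ss.insert(t);                       // insert the Tape first
    for (StreamRef s : t.getStreams()) {
        ss.insert(s);                   // duplicate keys now fail in the database, not in Hibernate's cache
    }
    tx.commit();
} catch (RuntimeException e) {
    tx.rollback();
    throw e;
} finally {
    ss.close();
}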

Hibernate keeps every object you save in the session's first-level cache, and that cache cannot be disabled for a single session. The cache mode you set only affects the second-level cache, which stores objects for other sessions. If you need to re-attach a detached instance, try merge().

Related

Insert a list of objects in Java

I am trying to persist many records in the database, reading them from a file with many lines.
I'm using a forEach to iterate over the list of objects loaded from the file:
logs.stream().forEach(log -> save(log));
private LogData save(LogData log) {
    return repository.persist(log);
}
But the inserts are slow.
Is there a way to speed them up?
Your approach takes a long time because you persist element by element, so you make n round trips to the database. Use batch processing instead, with one transaction instead of N transactions; the persist method can then be:
public void persist(List<Logs> logs) {
    Session session = sessionFactory.openSession();
    Transaction tx = session.beginTransaction();
    logs.forEach(log -> session.save(log)); // from the comment of @shmosel
    tx.commit();
    session.close();
}
Use a batch insert. Google "Hibernate batch insert", or substitute the name of your ORM if it's not Hibernate.
https://www.tutorialspoint.com/hibernate/hibernate_batch_processing.htm
Inserting one row per line makes this program slow; consider collecting n lines and inserting them together at once.
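A rough sketch of that idea, assuming a plain Hibernate Session (the sessionFactory, the helper method, and the batch size of 50 are illustrative; hibernate.jdbc.batch_size should be set to the same value):

private static final int BATCH_SIZE = 50; // illustrative; match hibernate.jdbc.batch_size

public void persistAll(List<LogData> logs) {
    Session session = sessionFactory.openSession();
    Transaction tx = session.beginTransaction();
    try {
        for (int i = 0; i < logs.size(); i++) {
            session.save(logs.get(i));
            if (i > 0 && i % BATCH_SIZE == 0) {
                session.flush();  // push the pending INSERTs to the database as a JDBC batch
                session.clear();  // detach the saved objects so they can be garbage collected
            }
        }
        tx.commit();
    } catch (RuntimeException e) {
        tx.rollback();
        throw e;
    } finally {
        session.close();
    }
}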

Hibernate performance issue while inserting massive data

We will migrate large amounts of data (a single type of entity) from Amazon's DynamoDB into a MySQL DB. We are using Hibernate to map this class to a MySQL entity. There are around 3 million entities (excluding rows of the list property). Here is our class mapping summary:
@Entity
@Table(name = "CUSTOMER")
public class Customer {

    @Id
    @Column(name = "id")
    private String id;

    // Other properties, all of which are primitive types/String

    @ElementCollection
    @CollectionTable(name = "CUSTOMER_USER", joinColumns = @JoinColumn(name = "customer_id"))
    @Column(name = "userId")
    private List<String> users;

    // CONSTRUCTORS, GETTERS, SETTERS, etc.
}
users is a list of String. We have created two mysql tables like following:
CREATE TABLE CUSTOMER(id VARCHAR(100), PRIMARY KEY(id));
CREATE TABLE CUSTOMER_USER(customer_id VARCHAR(100), userId VARCHAR(100), PRIMARY KEY(customer_id, userId), FOREIGN KEY (customer_id) REFERENCES CUSTOMER(id));
Note: we do not have Hibernate generate any id values; we assign our own IDs to the Customer entities, and they are guaranteed to be unique.
Here is our hibernate.cfg.xml:
<hibernate-configuration>
  <session-factory>
    <property name="hibernate.dialect">org.hibernate.dialect.MySQLDialect</property>
    <property name="hibernate.connection.driver_class">com.mysql.jdbc.Driver</property>
    <property name="hibernate.connection.url">jdbc:mysql://localhost/xxx</property>
    <property name="hibernate.connection.username">xxx</property>
    <property name="hibernate.connection.password">xxx</property>
    <property name="hibernate.connection.provider_class">org.hibernate.c3p0.internal.C3P0ConnectionProvider</property>
    <property name="hibernate.jdbc.batch_size">50</property>
    <property name="hibernate.cache.use_second_level_cache">false</property>
    <property name="c3p0.min_size">30</property>
    <property name="c3p0.max_size">70</property>
  </session-factory>
</hibernate-configuration>
We are creating a number of threads, each reading data from Dynamo and inserting it into our MySQL DB via Hibernate. Here is what each thread does:
// Each single thread brings resultItems from DynamoDB
Session session = factory.openSession();
Transaction tx = session.beginTransaction();
for (int i = 0; i < resultItems.size(); i++) {
    Customer cust = new Customer(resultItems.get(i));
    session.save(cust);
    if (i % BATCH_SIZE == 0) {
        session.flush();
        session.clear();
    }
}
tx.commit();
session.close();
We have our own performance monitoring functions and we continuously log the overall read/write performance. The problem is that the migration starts at about 1500 items/sec (read/write, on average), but keeps slowing down as the number of rows in the CUSTOMER and CUSTOMER_USER tables grows (after a few minutes, the r/w speed was around 500 items/sec). I am not experienced with Hibernate, so here are my questions:
What should hibernate.cfg.xml look like for a multi-threaded task like ours? Is the content I gave above suitable for such a task, or is anything wrong or missing?
There are exactly 50 threads, and each does the following: read from DynamoDB, insert the results into the MySQL DB, read from Dynamo again, and so on. Therefore, communication with Hibernate is not continuous. Under these circumstances, what min_size and max_size do you recommend for the c3p0 connection pool? Should I also set the remaining c3p0-related properties in hibernate.cfg.xml to make sense of this?
What can be done to maximize the speed of bulk inserting?
NOTE 1: I did not list all of the properties, because apart from the list of users, the remaining ones are all int, boolean, String, etc.
NOTE 2: All of these points have been tested and have no negative effect on performance. When we don't insert anything into the MySQL DB, the read speed stays stable for hours.
NOTE 3: Any recommendation/guidance about the structure of the MySQL tables, configuration settings, sessions/transactions, number of connection pools, batch sizes, etc. would be really helpful!
Assuming you are not doing anything else in the Hibernate transaction than inserting the data into these two tables, you can use StatelessSession session = sessionFactory.openStatelessSession(); instead of a normal session, which removes the overhead of maintaining the first-level cache. But then you will have to save the nested collection objects separately.
Refer https://docs.jboss.org/hibernate/orm/3.3/reference/en/html/batch.html
So it could be something like this:
// Each single thread brings resultItems from DynamoDB
StatelessSession session = factory.openStatelessSession();
Transaction tx = session.beginTransaction();
for (int i = 0; i < resultItems.size(); i++) {
    Customer cust = new Customer(resultItems.get(i));
    session.insert(cust); // StatelessSession exposes insert(), not save(); the assigned id is already set on cust
    // TODO: create the related customer-user rows, assign them the customer id,
    // and insert them in the same transaction.
    // Note: a StatelessSession keeps no persistence context, so there is nothing to flush() or clear() here.
}
tx.commit();
session.close();
In your scenario there are 25 threads batch-inserting data into one table simultaneously. MySQL has to maintain ACID properties while 25 transactions for many records in one table remain open or are being committed. That can cause a huge overhead.
While migrating data from databases, network latency can cause significant delays when there are many back-and-forth communications with the database. In this case, using multiple threads can be beneficial. But when doing batch fetches and batch inserts, there is little to gain as the database drivers will (or should) communicate data without doing much back-and-forth communications.
In the batch-scenario, start with 1 thread that reads data, prepares a batch and puts it in a queue for 1 thread that is writing data from the prepared batches. Keep the batches small (100 to 1 000 records) and commit often (every 100 records or so). This will minimize the overhead for maintaining the table. If network latency is a problem, try using 2 threads for reading and 2 for writing (but any performance gain might be offset by the overhead for maintaining the table used by 2 threads simultaneously).
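A minimal sketch of that hand-off, with one reader thread and one writer thread (hedged: readBatchFromDynamo() and writeBatch() are hypothetical placeholders for your own DynamoDB read and Hibernate/JDBC write code; the queue capacity and poison-pill pattern are illustrative):

import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// One reader prepares batches, one writer persists them; the bounded queue
// keeps memory flat and decouples network latency from insert speed.
BlockingQueue<List<Customer>> queue = new ArrayBlockingQueue<>(10);
final List<Customer> POISON = Collections.emptyList(); // signals "no more batches"

Thread reader = new Thread(() -> {
    try {
        List<Customer> batch;
        while (!(batch = readBatchFromDynamo(500)).isEmpty()) { // hypothetical helper
            queue.put(batch);
        }
        queue.put(POISON);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
});

Thread writer = new Thread(() -> {
    try {
        List<Customer> batch;
        while ((batch = queue.take()) != POISON) {
            writeBatch(batch); // hypothetical helper: one transaction per batch, committed right away
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
});

reader.start();
writer.start();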
Since there is no generated ID, you should benefit from the hibernate.jdbc.batch_size option already in your hibernate configuration. The hibernate.jdbc.fetch_size option (set this to 250 or so) might also be of interest.
As @hermant1900 mentions, using the StatelessSession is also a good idea. But by far the fastest method is mentioned by @Rob in the comments: use database tools to export the data to a file and import it in MySQL. I'm quite sure this is also the preferred method: it takes less time, less processing and there are fewer variables involved - overall a lot more reliable.

hibernate batch insert - how flush works?

I need to insert a lot of data into a database using Hibernate. I was looking at Hibernate's batch inserts; what I am using is similar to the example in the manual:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

for ( int i = 0; i < 100000; i++ ) {
    Customer customer = new Customer(.....);
    session.save(customer);
    if ( i % 20 == 0 ) { // 20, same as the JDBC batch size
        // flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}

tx.commit();
session.close();
But I see that flush doesn't write the data to the database.
From what I've read, if the code runs inside a transaction then nothing is committed to the database until the transaction commits.
So what is the point of flush/clear? It seems useless: if the data is not written to the database, then it is still in memory.
How can I force Hibernate to write the data to the database?
Thanks
The data is sent to the database and is no longer in memory. It's just not made definitively persistent until the transaction commits. It's exactly the same as if you executed the following sequence of statements in any database tool:
begin;
insert into ...
insert into ...
insert into ...
-- here, three inserts have been done on the database. But they will only be made
-- definitively persistent at commit time
...
commit;
Flushing consists of executing the insert statements.
Committing consists of executing the commit statement.
The data will be written to the database, but depending on the transaction isolation level you will not see it (in other transactions) until the transaction is committed.
Use an SQL statement logger that prints the statements sent over the database connection; then you will see that the statements are indeed sent to the database.
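For example, one way to see the statements as they are flushed (a sketch using programmatic configuration; the same two properties can go in hibernate.cfg.xml instead):

import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

Configuration cfg = new Configuration().configure(); // reads hibernate.cfg.xml
cfg.setProperty("hibernate.show_sql", "true");       // print each SQL statement as it is executed
cfg.setProperty("hibernate.format_sql", "true");     // pretty-print the statements
SessionFactory sessionFactory = cfg.buildSessionFactory();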
For best performance you also have to commit transactions. Flushing and clearing the session clears Hibernate's caches, but the data is moved to the JDBC connection's buffers and is still uncommitted (different RDBMSs/drivers show different behaviour); you are just shifting the problem to another place without a real improvement in performance.
Having flush() at the location mentioned also saves memory, since your session is cleared regularly. Otherwise you will have 100,000 objects in memory and might run out of memory for larger counts. Check out this article.

Bulk insert or update with Hibernate?

I need to consume a rather large amount of data from a daily CSV file. The CSV contains around 120K records. This slows to a crawl when using Hibernate. Basically, it seems Hibernate does a SELECT before every single INSERT (or UPDATE) when using saveOrUpdate(): for every instance persisted with saveOrUpdate(), a SELECT is issued before the actual INSERT or UPDATE. I can understand why it does this, but it's terribly inefficient for bulk processing, and I'm looking for alternatives.
I'm confident that the performance issue lies with the way I'm using Hibernate for this, since I got another version working with native SQL (which parses the CSV in the exact same manner), and it literally runs circles around this new version.
So, to the actual question: does a Hibernate alternative to MySQL's "INSERT ... ON DUPLICATE" syntax exist?
Or, if I choose to do native SQL for this, can I execute native SQL within a Hibernate transaction? Meaning, will it support commits/rollbacks?
There are many possible bottlenecks in bulk operations. The best approach depends heavily on what your data looks like. Have a look at the Hibernate Manual section on batch processing.
At a minimum, make sure you are using the following pattern (copied from the manual):
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

for ( int i = 0; i < 100000; i++ ) {
    Customer customer = new Customer(.....);
    session.save(customer);
    if ( i % 20 == 0 ) { // 20, same as the JDBC batch size
        // flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}

tx.commit();
session.close();
If you are mapping a flat file to a very complex object graph you may have to get more creative, but the basic principle is that you have to find a balance between pushing good-sized chunks of data to the database with each flush/commit and avoiding exploding the size of the session-level cache.
Lastly, if you don't need Hibernate to handle any collections or cascading for your data to be correctly inserted, consider using a StatelessSession.
From Hibernate Batch Processing.
For updates I used the following:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

ScrollableResults employeeCursor = session.createQuery("FROM Employee").scroll();
int count = 0;

while ( employeeCursor.next() ) {
    Employee employee = (Employee) employeeCursor.get(0);
    employee.updateEmployee();
    session.update(employee);
    if ( ++count % 50 == 0 ) {
        session.flush();
        session.clear();
    }
}
tx.commit();
session.close();
But for inserts I would go with jcwayne's answer.
According to an answer to a similar question, it can be done by configuring Hibernate to insert objects using a custom stored procedure which uses your database's upsert functionality. It's not pretty, though.
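If you instead go the native-SQL route from the question, here is a minimal sketch (the CUSTOMER table and its id/name columns are illustrative, not from the original post). The statement runs inside a normal Hibernate transaction, so commit/rollback behave as usual:

Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
try {
    session.createSQLQuery(
            "INSERT INTO CUSTOMER (id, name) VALUES (:id, :name) " +
            "ON DUPLICATE KEY UPDATE name = :name")
        .setParameter("id", "42")
        .setParameter("name", "Alice")
        .executeUpdate();
    tx.commit();     // the native statement participates in the Hibernate transaction
} catch (RuntimeException e) {
    tx.rollback();   // ...and rolls back with it
    throw e;
} finally {
    session.close();
}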
High-throughput data export
If you only want to import data without doing any processing or transformation, then a tool like PostgreSQL COPY is the fastest way to import data.
Batch processing
However, if you need to do the transformation, data aggregation, correlation/merging between existing data and the incoming one, then you need application-level batch processing.
In this case, you want to flush-clear-commit regularly:
int entityCount = 50;
int batchSize = 25;

EntityManager entityManager = entityManagerFactory().createEntityManager();
EntityTransaction entityTransaction = entityManager.getTransaction();

try {
    entityTransaction.begin();

    for (int i = 0; i < entityCount; i++) {
        if (i > 0 && i % batchSize == 0) {
            entityTransaction.commit();
            entityTransaction.begin();
            entityManager.clear();
        }

        Post post = new Post(
            String.format("Post %d", i + 1)
        );

        entityManager.persist(post);
    }

    entityTransaction.commit();
} catch (RuntimeException e) {
    if (entityTransaction.isActive()) {
        entityTransaction.rollback();
    }
    throw e;
} finally {
    entityManager.close();
}
Also, make sure you enable JDBC batching as well using the following configuration properties:
<property name="hibernate.jdbc.batch_size" value="25"/>
<property name="hibernate.order_inserts" value="true"/>
<property name="hibernate.order_updates" value="true"/>
Bulk processing
Bulk processing is suitable when all rows match pre-defined filtering criteria, so you can use a single UPDATE to change all records.
However, using bulk updates that modify millions of records can increase the size of the redo log or end up taking lots of locks on database systems that still use 2PL (Two-Phase Locking), like SQL Server.
So, while the bulk update is the most efficient way to change many records, you have to pay attention to how many records are to be changed to avoid a long-running transaction.
Also, you can combine bulk update with optimistic locking so that other OLTP transactions won't lose the update done by the bulk processing process.
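As a sketch of that combination (an assumption on my part: the Post entity from the snippet above has a @Version attribute and a title property), Hibernate's HQL accepts a versioned bulk update, which bumps the version column as it updates so concurrent optimistic-locking transactions detect the change:

EntityManager em = entityManagerFactory().createEntityManager();
EntityTransaction tx = em.getTransaction();
try {
    tx.begin();
    // "update versioned" is a Hibernate HQL extension; it increments @Version on every updated row
    int updatedRows = em.createQuery(
            "update versioned Post " +
            "set title = concat('[archived] ', title) " +
            "where title like :pattern")
        .setParameter("pattern", "Post %")
        .executeUpdate();
    tx.commit();
} catch (RuntimeException e) {
    if (tx.isActive()) {
        tx.rollback();
    }
    throw e;
} finally {
    em.close();
}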
If you use the sequence or native generator, Hibernate will use a select to get the id:
<id name="id" column="ID">
<generator class="native" />
</id>
You should use hilo or seqHiLo generator:
<id name="id" type="long" column="id">
<generator class="seqhilo">
<param name="sequence">SEQ_NAME</param>
<param name="max_lo">100</param>
</generator>
</id>
The "extra" select is to generate the unique identifier for your data.
Switch to HiLo sequence generation and you can reduce the sequence roundtrips to the database by the number of the allocation size. Please note, there will be a gap in primary keys unless you adjust your sequence value for the HiLo generator
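For annotation-mapped entities, a rough equivalent sketch (the entity and generator names are illustrative, not from the question): allocationSize plays the role of max_lo, so Hibernate only hits the sequence once per block of ids.

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.SequenceGenerator;

@Entity
public class Post {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "post_seq")
    @SequenceGenerator(name = "post_seq", sequenceName = "SEQ_NAME", allocationSize = 100)
    private Long id;

    // getters, setters, etc.
}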

Question about Hibernate session.flush()

I want to ask what the flush method actually does in the following case:
for (int i = 0; i < myList.size(); i++) {
    Car c = new Car( car.get(i).getId(), car.get(i).getName() );
    getCurrentSession().save(c);
    if (i % 20 == 0)
        getCurrentSession().flush();
}
Does this mean that after iteration 20, the cache is flushed and the 20 objects held in memory are actually saved in the database?
Can someone please explain to me what happens when the condition is true?
From the javadoc of Session#flush:
Force this session to flush. Must be called at the end of a unit of work, before committing the transaction and closing the session (depending on flush-mode, Transaction.commit() calls this method).
Flushing is the process of synchronizing the underlying persistent store with persistable state held in memory.
In other words, flush tells Hibernate to execute the SQL statements needed to synchronize the JDBC connection's state with the state of the objects held in the session-level cache. And the condition if (i % 20 == 0) makes that happen for every i that is a multiple of 20.
But, still, the new Car instances will be held in the session-level cache and, for a big myList.size(), you're going to eat all the memory and eventually get an OutOfMemoryError. To avoid this situation, the pattern described in the documentation is to flush AND clear the session at regular intervals (the same size as the JDBC batch size) to persist the changes and then detach the instances so that they can be garbage collected:
13.1. Batch inserts
When making new objects persistent, flush() and then clear() the session regularly in order to control the size of the first-level cache.
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

for ( int i = 0; i < 100000; i++ ) {
    Customer customer = new Customer(.....);
    session.save(customer);
    if ( i % 20 == 0 ) { // 20, same as the JDBC batch size
        // flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}

tx.commit();
session.close();
The documentation mentions in the same chapter how to set the JDBC batch size.
See also
10.10. Flushing the Session
Chapter 13. Batch processing
It depends on how the FlushMode is set up.
In the default configuration, Hibernate tries to sync up with the database at three points:
1. before querying data
2. on committing a transaction
3. when flush is called explicitly
If the FlushMode is set to FlushMode.MANUAL, the programmer is telling Hibernate that he/she will decide when to push the data to the database. Under this configuration, the session.flush() call saves the object instances to the database.
A session.clear() call can then be used to clear the persistence context.
// Assume the list has 50 elements
for (int i = 0; i < 50; i++) {
    Car c = new Car( car.get(i).getId(), car.get(i).getName() );
    getCurrentSession().save(c);
    // every 20 iterations, the Car objects held in memory are synchronized with the DB
    if (i % 20 == 0)
        getCurrentSession().flush();
}
A few more pointers on why the flush interval should match the batch size.
To enable batching you need to set the JDBC batch size:
// In your case
hibernate.jdbc.batch_size = 20
One common pitfall with batching: if you insert or update a single entity type, it works fine, but if you persist multiple entity types, leading to interleaved inserts/updates, then you have to explicitly enable statement ordering.
For example
// Assume the list has 50 elements
for (int i = 0; i < 50; i++) {
    Car c = new Car( car.get(i).getId(), car.get(i).getName() );
    // Also add an accessory to the car here
    Accessories a = new Accessories("I am new one");
    c.add(a);
    // Now you have two entities to be persisted: car and accessory
    // Two SQL inserts
    getCurrentSession().save(c);
    // Every 20 iterations the pending Car/Accessories objects are synchronized with the DB:
    // flush writes them, clear removes them from the first-level (session) cache
    if (i % 20 == 0) {
        getCurrentSession().flush();
        getCurrentSession().clear();
    }
}
In this case, two SQL statements are generated per iteration:
1 insert for the car
1 insert for the accessory
For proper batching you will have to set
<prop key="hibernate.order_inserts">true</prop>
so that all the inserts for cars are grouped together and all the inserts for accessories are grouped together. By doing so you get 20 inserts firing in one batch rather than one SQL statement at a time.
For the different operations that happen within one transaction, have a look at http://docs.jboss.org/hibernate/core/3.2/api/org/hibernate/event/def/AbstractFlushingEventListener.html
Yes, every 20 iterations SQL is generated and executed for the unsaved objects. You should also set the JDBC batch size to 20 to improve performance.
