Batch inserts with JPA/EJB3 - java

Does the JPA/EJB3 framework provide a standard way to do batch insert operations...?
We use Hibernate as our persistence framework, so I could fall back to the Hibernate Session and use a combination of session.save()/session.flush() to achieve batch inserts. But I would like to know whether EJB3 has support for this...

Neither JPA nor Hibernate provides particular support for batch inserts, and the idiom for batch inserts with JPA is the same as with Hibernate:
EntityManager em = ...;
EntityTransaction tx = em.getTransaction();
tx.begin();
for (int i = 0; i < 100000; i++) {
    Customer customer = new Customer(.....);
    em.persist(customer);
    if (i % 20 == 0) { // 20, same as the JDBC batch size
        // flush a batch of inserts and release memory:
        em.flush();
        em.clear();
    }
}
tx.commit();
em.close();
Using Hibernate's proprietary API in this case doesn't provide any advantage IMO.
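Note that for the flush/clear loop above to translate into actual JDBC batching, Hibernate's JDBC batch size must be configured as well. A minimal sketch (the property key is Hibernate-specific, and the persistence unit name "pu" is made up):
Map<String, String> props = new HashMap<String, String>();
// Hibernate-specific setting: group up to 20 inserts into one JDBC batch
props.put("hibernate.jdbc.batch_size", "20");
EntityManagerFactory emf = Persistence.createEntityManagerFactory("pu", props);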
References
JPA 1.0 Specification
Section 4.10 "Bulk Update and Delete Operations"
Hibernate Core reference guide
Chapter 13. Batch processing

For Hibernate specifically, the whole of chapter 13 of the core manual explains the batching methods.
But you say you want the EJB approach through Hibernate, so the Entity Manager documentation also has a chapter on that here. I suggest that you read both (the core and the entity manager).
In EJB3 it is simply about using EJB-QL (with some limitations). Hibernate provides more mechanics, though, if you need more flexibility.
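For illustration, here is what a bulk operation through EJB-QL/JPQL looks like; this is a sketch only, with a made-up Customer entity and status field:
EntityTransaction tx = em.getTransaction();
tx.begin();
// executed as a single SQL statement in the database,
// bypassing the persistence context entirely
int updated = em.createQuery(
        "UPDATE Customer c SET c.status = 'NEW' WHERE c.status IS NULL")
        .executeUpdate();
tx.commit();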

With a medium number of records you can use this approach:
em.getTransaction().begin();
for (int i = 1; i <= 100000; i++) {
    Point point = new Point(i, i);
    em.persist(point);
    if ((i % 10000) == 0) {
        em.flush();
        em.clear();
    }
}
em.getTransaction().commit();
But with a large number of records you should perform the task in multiple transactions:
em.getTransaction().begin();
for (int i = 1; i <= 1000000; i++) {
    Point point = new Point(i, i);
    em.persist(point);
    if ((i % 10000) == 0) {
        em.getTransaction().commit();
        em.clear();
        em.getTransaction().begin();
    }
}
em.getTransaction().commit();
Ref: JPA Batch Store

Yes, you can fall back to your JPA implementation if you wish, in order to get the control you described.
JPA 1.0 is rich in JPQL support but light on Criteria API support; however, this has been addressed in 2.0.
// unwrap the Hibernate Session behind the JPA EntityManager
Session session = (Session) entityManager.getDelegate();
session.setFlushMode(FlushMode.MANUAL);
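Keep in mind that with FlushMode.MANUAL nothing is written to the database until you flush explicitly, so a sketch of the write path would be:
session.setFlushMode(FlushMode.MANUAL);
// ... make your changes ...
session.flush(); // you must push the pending SQL to the database yourself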

Pascal,
in your example the 100,000 records are inserted within a single transaction, as commit() is only called at the end. Doesn't that put a lot of pressure on the database? Furthermore, in case of a rollback, the cost would be too high.
Would the following approach be better?
EntityManager em = ...;
for (int i = 0; i < 100000; i++) {
    if (!em.getTransaction().isActive()) {
        em.getTransaction().begin();
    }
    Customer customer = new Customer(.....);
    em.persist(customer);
    if ((i + 1) % 20 == 0) { // 20, same as the JDBC batch size
        // flush and commit the batch of inserts and release memory:
        em.getTransaction().commit();
        em.clear();
    }
}
if (em.getTransaction().isActive()) {
    em.getTransaction().commit(); // commit any trailing partial batch
}
em.close();

Related

How to Do Batch Update in Hibernate Effectively

I have read many articles and found several ways to do batch processing.
One of them is using flush and clear; here is the code:
long t1 = System.currentTimeMillis();
Session session = getSession();
Transaction transaction = session.beginTransaction();
try {
    Query query = session.createQuery("FROM PersonEntity WHERE id > " + lastMaxId + " ORDER BY id");
    query.setMaxResults(1000);
    rows = query.list();
    int count = 0;
    if (rows == null || rows.size() == 0) {
        return;
    }
    LOGGER.info("fetched {} rows from db", rows.size());
    for (Object row : rows) {
        PersonEntity personEntity = (PersonEntity) row;
        personEntity.setName(randomAlphaNumeric(30));
        lastMaxId = personEntity.getId();
        session.saveOrUpdate(personEntity);
        if (++count % 50 == 0) {
            session.flush();
            session.clear();
            LOGGER.info("Flushed and Cleared");
        }
    }
} finally {
    if (session != null && session.isOpen()) {
        LOGGER.info("Closing Session and committing transaction");
        transaction.commit();
        session.close();
    }
}
long t2 = System.currentTimeMillis();
LOGGER.info("time taken {}s", (t2 - t1) / 1000);
In the above code we are processing records in batches of 1000 and updating them in the same transaction.
That is fine when we only have to do the batch update.
But I have the following questions regarding it:
There can be a case where some other thread (T2) is accessing the same set of rows for runtime update operations; until the batch of 1000 is committed, T2 remains stuck.
So how should we handle this case?
Possible thoughts/solutions from my side:
We could do the update in a different session with a small batch size of, say, 50.
Use a different StatelessSession for the update and commit the transactions one by one, but close the session when a batch of 1000 completes.
Please help me find a better solution.
Do you mean to say this:
there is a batch update in progress inside a transaction
in the meanwhile another thread starts updating one of the records that is in the batch as well
because of this, the batch will wait until the update in point 2 is complete, which causes the rest of the records in the batch to wait as well
So far, it appears all good. However, the important point here is that the transaction was used to make the update to a large set of records "faster", whereas transactions are usually used to ensure consistency/atomicity.
How does one design this piece: fast updates to multiple records in one go, with atomicity not being the primary criterion, while a likely update to a record in the batch is also requested by another thread?
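One way to sketch that trade-off (my suggestion, not something settled in this thread): commit each small batch in its own transaction, so row locks are held only briefly, at the cost of earlier batches staying committed if a later one fails. Reusing the names from the question (entities stands in for the fetched rows):
int batchSize = 50; // small batches mean row locks are released quickly
Session session = getSession();
Transaction tx = session.beginTransaction();
int count = 0;
for (PersonEntity p : entities) {
    p.setName(randomAlphaNumeric(30));
    session.saveOrUpdate(p);
    if (++count % batchSize == 0) {
        tx.commit();                    // releases the row locks of this batch
        session.clear();                // detach flushed entities to free memory
        tx = session.beginTransaction();
    }
}
tx.commit();
session.close();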

Hibernate Batch Operation performance

I have around 5000 records to update. I am trying to measure the performance of the operation. It starts at around 100 ms, but after every thousand updates the operation time increases by around 80 ms. Why is it slowing down? The JVM?
StatelessSession session = dao.getStatelessSession();
Transaction transaction = session.beginTransaction();
try {
    List<Entity> list = dao.findAll();
    int counter = 0;
    for (Entity each : list) {
        final Date startTime = Clock.getTime();
        webService.execute(each);
        session.update(each);
        counter += 1;
        final Date endTime = Clock.getTime();
        LOGGER.info("***** " + getMilliSecondsDifference(startTime, endTime) + " for count: " + counter + "*****");
    }
} catch (Exception e) {
    LOGGER.info("***** Exception occurred: ", e);
} finally {
    transaction.commit();
    session.close();
}
Hüseyin,
it doesn't have to be a Hibernate problem at all if we look at your code.
I suggest you comment out the line with the web-service call and then run the batch again.
Maybe the networking is what is getting slower.
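A quick sketch of that test, using the loop from the question with the web-service call disabled so that only the Hibernate work is timed:
for (Entity each : list) {
    final Date startTime = Clock.getTime();
    // webService.execute(each); // disabled to isolate Hibernate's cost
    session.update(each);
    final Date endTime = Clock.getTime();
    LOGGER.info("update took " + getMilliSecondsDifference(startTime, endTime) + " ms");
}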
You have one transaction dealing with a large number of objects, so you will probably run into memory and performance problems.
The object references stay in memory until a session flush is executed (at commit), so you end up with a large number of objects in memory, on top of the information about object changes that Hibernate also keeps in the session; that can hurt performance as well (I'm not a Hibernate expert, but you should consider this point).
I think you should consider using multiple transactions.
See these interesting links:
Transaction Management for bulk operations
Hibernate session and Transaction Management Guidelines
Good luck
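For what it's worth, a hedged sketch of the loop from the question with periodic commits (note that a StatelessSession keeps no persistence context, so there is nothing to clear; the commit interval of 1000 is arbitrary):
StatelessSession session = dao.getStatelessSession();
Transaction transaction = session.beginTransaction();
int counter = 0;
try {
    for (Entity each : dao.findAll()) {
        webService.execute(each);
        session.update(each);
        if (++counter % 1000 == 0) {
            transaction.commit();                     // end this unit of work
            transaction = session.beginTransaction(); // start a fresh one
        }
    }
} finally {
    transaction.commit();
    session.close();
}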

Using update queries with objectdb

The following code:
EntityManagerFactory emf = Persistence.createEntityManagerFactory("test.odb");
EntityManager em = emf.createEntityManager();
em.getTransaction().begin();
Point p = new Point(0, 0);
em.persist(p);
em.getTransaction().commit();
em.getTransaction().begin();
Query query = em.createQuery("UPDATE Point SET x = 1001 where x = 0");
int updateCount = query.executeUpdate();
em.getTransaction().commit();
TypedQuery<Point> myquery = em.createQuery("SELECT p from Point p where p.x = 1001", Point.class);
List<Point> results = myquery.getResultList();
System.out.println("X coordinate is: " + results.get(0).getX());
em.close();
prints out: X coordinate is: 0
which is wrong, because the X coordinate should be 1001.
But if I change the code to:
EntityManagerFactory emf = Persistence.createEntityManagerFactory("test.odb");
EntityManager em = emf.createEntityManager();
em.getTransaction().begin();
Point p = new Point(0, 0);
em.persist(p);
em.getTransaction().commit();
em.getTransaction().begin();
Query query = em.createQuery("UPDATE Point SET x = 1001 where x = 0");
int updateCount = query.executeUpdate();
em.getTransaction().commit();
em.close();
em = emf.createEntityManager();
TypedQuery<Point> myquery = em.createQuery("SELECT p from Point p where p.x = 1001", Point.class);
List<Point> results = myquery.getResultList();
System.out.println("X coordinate is: " + results.get(0).getX());
em.close();
The result is as expected:
X coordinate is: 1001
What have I done wrong in the first code snippet?
UPDATE queries bypass the EntityManager, which means that the EntityManager may not have an up-to-date view of the actual objects in the database.
As explained in the UPDATE queries page in the ObjectDB Manual:
"Updating entity objects in the database using an UPDATE query may be slightly more efficient than retrieving entity objects and then updating them, but it should be used cautiously because bypassing the EntityManager may break its synchronization with the database. For example, the EntityManager may not be aware that a cached entity object in its persistence context has been modified by an UPDATE query. Therefore, it is a good practice to use a separate EntityManager for UPDATE queries."
Using a separate EntityManager is exactly what you did by closing and opening a new EntityManager in your revised code.
Alternatively, if you want to use the same EntityManager, you may clear its persistence context (i.e. its cache), after running the UPDATE query and before running the SELECT query.
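A minimal sketch of that alternative, clearing the persistence context between the UPDATE and the SELECT:
em.getTransaction().begin();
em.createQuery("UPDATE Point SET x = 1001 where x = 0").executeUpdate();
em.getTransaction().commit();
em.clear(); // drop the stale cached Point so the next query reflects the database
TypedQuery<Point> myquery = em.createQuery("SELECT p from Point p where p.x = 1001", Point.class);
System.out.println("X coordinate is: " + myquery.getResultList().get(0).getX());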

EntityManager throws OptimisticLockException when trying to delete a locked entity in the same transaction

Here is my code:
EntityManager em = JPAUtil.createEntityManager();
try {
    EntityTransaction tx = em.getTransaction();
    try {
        // do some stuff here
        tx.begin();
        List<MyEntity> es = em.createNamedQuery("getMyEntities", MyEntity.class).getResultList();
        for (MyEntity e : es) {
            em.lock(e, LockModeType.OPTIMISTIC);
        }
        if (es.size() != 0) {
            em.remove(es.get(0));
        }
        tx.commit();
    } finally {
        if (tx.isActive()) {
            tx.rollback();
        }
    }
} finally {
    em.close();
}
When I execute that code I get:
...
..........
Caused by: javax.persistence.OptimisticLockException: Newer version [null] of entity [[MyEntity#63]] found in database
at org.hibernate.ejb.AbstractEntityManagerImpl.wrapLockException(AbstractEntityManagerImpl.java:1427)
at org.hibernate.ejb.AbstractEntityManagerImpl.convert(AbstractEntityManagerImpl.java:1324)
at org.hibernate.ejb.AbstractEntityManagerImpl.convert(AbstractEntityManagerImpl.java:1300)
at org.hibernate.ejb.TransactionImpl.commit(TransactionImpl.java:80)
... 23 more
Can anybody explain to me why that happens?
I suppose that you added the @Version-annotated column after you already had some entries in the database, so that null values were created for the already-existing records.
Now Hibernate can't compare the versions.
I would try setting the version column to 1 for all null-versioned entities.
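For example, that backfill could be a one-off native SQL statement; a sketch only, since the table and column names depend on your actual mapping:
em.getTransaction().begin();
// hypothetical table/column names; adjust to your mapping
em.createNativeQuery("UPDATE my_entity SET version = 1 WHERE version IS NULL")
        .executeUpdate();
em.getTransaction().commit();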
I think this error is thrown because I try to delete a record that has a lock on it. Deleting this row would set the version to null, but the version in the database remains set to the former number. It seems that Hibernate core considers a null value unreliable for this kind of operation.
If I have to do this kind of operation, I have to release the lock on this entity first.
Anyone with better knowledge of this, please clarify the issue.
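If the lock really is the trigger, one workaround worth trying (my speculation, in line with the above): do not take the OPTIMISTIC lock on the entity you are about to remove:
tx.begin();
List<MyEntity> es = em.createNamedQuery("getMyEntities", MyEntity.class).getResultList();
for (int i = 1; i < es.size(); i++) { // skip es.get(0), which will be removed
    em.lock(es.get(i), LockModeType.OPTIMISTIC);
}
if (!es.isEmpty()) {
    em.remove(es.get(0));
}
tx.commit();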

java persistence memory leaks

I have 1M rows in a table and I want to get all of them, but when I try to fetch all rows with JPA using pagination, I get a Java heap error. Do you think I am missing something? Any advice?
int counter = 0;
while (counter >= 0) {
    javax.persistence.EntityManager em = javax.persistence.Persistence
            .createEntityManagerFactory("MyPU")
            .createEntityManager();
    Query query = em.createQuery("select m from mytable m");
    java.util.Collection<MyEntity> data = query
            .setFirstResult(counter).setMaxResults(1000).getResultList();
    for (MyEntity obj : data) {
        System.out.println(obj);
    }
    counter += 1000;
    data.clear();
    em.clear();
    em.close();
}
Since you use native SQL anyway, can't you specify LIMIT :counter, 1000 (or ROWNUM BETWEEN :counter AND 1000 if using Oracle) directly in your SQL statement?
Note that you create a new instance of EntityManagerFactory in each iteration but never close it. It would be better to use a single factory instead:
int counter = 0;
EntityManagerFactory emf = javax.persistence.Persistence.createEntityManagerFactory("MyPU");
while (counter >= 0) {
    javax.persistence.EntityManager em = emf.createEntityManager();
    ...
}
