I have read many articles and found several ways to do batch processing.
One of them is using flush and clear; the following is the code:
long t1 = System.currentTimeMillis();
Session session = getSession();
Transaction transaction = session.beginTransaction();
try {
    Query query = session.createQuery("FROM PersonEntity WHERE id > " + lastMaxId + " ORDER BY id");
    query.setMaxResults(1000);
    rows = query.list();
    int count = 0;
    if (rows == null || rows.size() == 0) {
        return;
    }
    LOGGER.info("fetched {} rows from db", rows.size());
    for (Object row : rows) {
        PersonEntity personEntity = (PersonEntity) row;
        personEntity.setName(randomAlphaNumeric(30));
        lastMaxId = personEntity.getId();
        session.saveOrUpdate(personEntity);
        if (++count % 50 == 0) {
            session.flush();
            session.clear();
            LOGGER.info("Flushed and Cleared");
        }
    }
} finally {
    if (session != null && session.isOpen()) {
        LOGGER.info("Closing Session and committing transaction");
        transaction.commit();
        session.close();
    }
}
long t2 = System.currentTimeMillis();
LOGGER.info("time taken {}s", (t2 - t1) / 1000);
In the above code we process records in batches of 1000 and update them in the same transaction.
That is fine when we only have to do a batch update.
But I have the following questions regarding it:
There can be a case when some other thread (T2) is accessing the same set of rows for some runtime update operations, but in that case, until the batch of 1000 is committed, T2 remains stuck.
So, how should we handle this case?
Possible thoughts/solutions from my side:
I think we can do the update in a different session with a small batch of, say, 50.
Use a different stateless connection (a StatelessSession) for the update and commit the transactions one by one, but close the session when a batch of 1000 completes (a rough sketch of this idea follows).
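A minimal sketch of that second idea, assuming the same PersonEntity mapping, a SessionFactory field named sessionFactory, and a hypothetical loadNextChunk() helper (none of these names are from the original code):

// Sketch only: commit every small chunk in its own short transaction so row
// locks are held briefly; sessionFactory and loadNextChunk are assumptions.
StatelessSession statelessSession = sessionFactory.openStatelessSession();
try {
    List<PersonEntity> rows = loadNextChunk(lastMaxId, 1000); // hypothetical helper
    int count = 0;
    Transaction tx = statelessSession.beginTransaction();
    for (PersonEntity personEntity : rows) {
        personEntity.setName(randomAlphaNumeric(30));
        lastMaxId = personEntity.getId();
        statelessSession.update(personEntity);
        if (++count % 50 == 0) {
            tx.commit();                               // release locks after every 50 rows
            tx = statelessSession.beginTransaction();  // start the next small transaction
        }
    }
    tx.commit();
} finally {
    statelessSession.close();
}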
Please help me find a better solution.
Do you mean to say this:
there is a batch update in progress inside a transaction
meanwhile, another thread starts updating one of the records that is in the batch as well
because of this, the batch will wait till the update in point 2 is complete. This causes the rest of the records in the batch to also wait.
So far, it appears all good. However, the important point here is that the transaction was used to make the update to a large set of records "faster". Usually, transactions are used to ensure "consistency/atomicity".
How does one design this piece - fast updates to multiple records in one go, with atomicity not being the primary criterion, while a likely update to a record in the batch is also requested by another thread?
I use Hibernate 5 and Oracle 12.
With the query below I want to randomly select an Entity from a set of entities:
Query query = getSession().createQuery("SELECT e FROM Entity e ... <CONDITIONS> ... AND ROWNUM = 1");
Optional<Entity> entity = query.list().stream().findAny();
// Change the entity in some way. The changes will also make sure that the entity won't appear in the next query run based on <CONDITIONS>
...
This works but only if all the transactions that execute the code run sequentially. Thus I also want to make sure that the entity that has already been read won't be read in another transaction.
I tried it with locking:
Query query = getSession().createQuery("SELECT e FROM Entity e ... <CONDITIONS> ... AND ROWNUM = 1")
.setLockMode("this", LockMode.PESSIMISTIC_READ);
But it seems that Hibernate converts this construct to SELECT ... FOR UPDATE, which doesn't prevent the other transaction from reading the entity; it just waits until the transaction holding the lock commits and then applies its own changes to the entity.
Is it possible to set some kind of lock on the entity so that it disappears guaranteed from the query result in another transaction?
I've written some experimental code to understand how locking works in Hibernate. It simulates two transactions whose key steps (select and commit) can be executed in a different order by adjusting the parameters of the transaction() method. This time Field is used instead of Entity, but that doesn't matter. Each transaction reads the same Field, updates its description attribute, and commits.
private static final LockMode lockMode = LockMode.PESSIMISTIC_WRITE;
enum Order {T1_READS_EARLIER_COMMITS_LATER, T2_READS_EARLIER_COMMITS_LATER};

@Test
public void firstReadsTheOtherRejected() {
    ExecutorService es = Executors.newFixedThreadPool(3);
    // It looks like the transaction that commits first is the only transaction that can make changes.
    // The changes of the other one will be ignored.
    final Order order = Order.T1_READS_EARLIER_COMMITS_LATER;
    // final Order order = Order.T2_READS_EARLIER_COMMITS_LATER;
    es.execute(() -> {
        switch (order) {
            case T1_READS_EARLIER_COMMITS_LATER:
                transaction("T1", 1, 8);
                break;
            case T2_READS_EARLIER_COMMITS_LATER:
                transaction("T1", 4, 1);
                break;
        }
    });
    es.execute(() -> {
        switch (order) {
            case T1_READS_EARLIER_COMMITS_LATER:
                transaction("T2", 4, 1);
                break;
            case T2_READS_EARLIER_COMMITS_LATER:
                transaction("T2", 1, 8);
                break;
        }
    });
    es.shutdown();
    try {
        es.awaitTermination(1, TimeUnit.MINUTES);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}
private void transaction(String name, int delayBeforeRead, int delayBeforeCommit) {
    Transaction tx = null;
    Session session = null;
    try {
        session = factory.openSession();
        tx = session.beginTransaction();
        try {
            TimeUnit.SECONDS.sleep(delayBeforeRead);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        Query query = session.createQuery("SELECT f FROM Field f WHERE f.description=?1").setLockMode("this", lockMode);
        query.setString("1", DESC);
        Field field = (Field) query.uniqueResult();
        String description1 = field.getDescription();
        System.out.println(name + " : FIELD READ " + description1);
        try {
            TimeUnit.SECONDS.sleep(delayBeforeCommit);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        field.setDescription(name);
        session.update(field);
        System.out.println(name + " : FIELD UPDATED");
        tx.commit();
    } catch (Exception e) {
        fail();
        if (tx != null) {
            tx.rollback();
        }
    } finally {
        session.close();
    }
    System.out.println(name + " : COMMITTED");
}
and the output:
T1 : FIELD READ This is a field for testing
Apr 19, 2019 5:28:01 PM org.hibernate.loader.Loader determineFollowOnLockMode
WARN: HHH000445: Alias-specific lock modes requested, which is not currently supported with follow-on locking; all acquired locks will be [PESSIMISTIC_WRITE]
Apr 19, 2019 5:28:01 PM org.hibernate.loader.Loader shouldUseFollowOnLocking
WARN: HHH000444: Encountered request for locking however dialect reports that database prefers locking be done in a separate select (follow-on locking); results will be locked after initial query executes
Hibernate: select field0_.ID as ID1_9_, field0_.DESCRIPTION as DESCRIPTION2_9_, field0_.NAME as NAME3_9_, field0_.TYPE as TYPE4_9_ from FIELD field0_ where field0_.DESCRIPTION=?
Hibernate: select ID from FIELD where ID =? for update
T1 : FIELD UPDATED
Hibernate: update FIELD set DESCRIPTION=?, NAME=?, TYPE=? where ID=?
T2 : FIELD READ This is a field for testing
T1 : COMMITTED
Apr 19, 2019 5:28:07 PM org.hibernate.engine.jdbc.connections.internal.DriverManagerConnectionProviderImpl stop
T2 : FIELD UPDATED
Hibernate: update FIELD set DESCRIPTION=?, NAME=?, TYPE=? where ID=?
INFO: HHH000030: Cleaning up connection pool [jdbc:oracle:thin:@localhost:1521:oracle]
T2 : COMMITTED
Process finished with exit code 0
After the execution the column description contains T2. It looks like PESSIMISTIC_WRITE mode works: the transaction that wrote first won, and that was T2. But what happened to T1? T1 : COMMITTED is also seen in the output. As long as T1 doesn't change anything it's acceptable for me, but I need an indicator that T1 failed, so that I can retry the read/select.
I was wrong. I ran the code multiple times and got different results: sometimes the column description contains T1, sometimes T2.
You say you want to make sure that other transactions will NOT READ the queried entities.
For that, you need LockMode.PESSIMISTIC_WRITE. It disallows both READs and UPDATEs by other transactions, whereas LockMode.PESSIMISTIC_READ disallows only UPDATEs.
A lock with LockModeType.PESSIMISTIC_WRITE can be obtained on an
entity instance to force serialization among transactions attempting
to update the entity data.
A lock with LockModeType.PESSIMISTIC_WRITE can be used when querying
data and there is a high likelihood of deadlock or update failure
among concurrent updating transactions.
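For illustration, a minimal sketch of requesting that lock mode through the standard JPA API; the condition e.processed = false merely stands in for the asker's <CONDITIONS>, and entityManager is an assumed EntityManager:

// Sketch: ask for an exclusive (FOR UPDATE) lock when reading, so a second
// transaction attempting the same locked read has to wait for the first commit.
// The entity name, condition, and entityManager are placeholders/assumptions.
Entity e = entityManager
        .createQuery("SELECT e FROM Entity e WHERE e.processed = false", Entity.class)
        .setLockMode(LockModeType.PESSIMISTIC_WRITE)
        .setMaxResults(1)
        .getSingleResult();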
I am having trouble getting SELECT FOR UPDATE to work in Hibernate and Oracle.
When I have two threads with one EntityManager per thread, the second thread seems to be able to read the same row as the first. I can see this by adding traces, which show that the second thread reads the same row while the first is between query.getSingleResult() and entityManager.getTransaction().commit(). My expectation was that once a SELECT FOR UPDATE has been issued, no one else should be able to read the same row until it is committed by the first thread. But this is not happening.
I can resort to an alternative implementation. What I want to achieve is that only one process is able to read and update a row in an Oracle table, so that it behaves like a queue, given that the consumer processes can be on different machines.
Here is a minimal example of my code:
public MyMessage getNextMessage() {
    String sql = "SELECT * FROM MESSAGE WHERE MESSAGE_STATUS = 'Pending' AND rownum=1 FOR UPDATE OF MESSAGE_STATUS";
    entityManager.getTransaction().begin();
    Query query = entityManager.createNativeQuery(sql, MyMessage.class);
    query.setLockMode(LockModeType.PESSIMISTIC_WRITE);
    MyMessage msg = null;
    try {
        msg = (MyMessage) query.getSingleResult();
    } catch (NoResultException nodatafound) {
        // Ignore when no data found, just return null
    }
    if (msg != null) {
        msg.setMessageStatus("In Progress");
        entityManager.persist(msg);
    }
    entityManager.getTransaction().commit();
    return msg;
}
We are extracting data from various database types (Oracle, MySQL, SQL Server, ...). Once it is successfully written to a file we want to mark it as transmitted, so we update a specific column.
Our problem is that a user can change the data in the meantime but might forget to commit. The record is then blocked by a SELECT FOR UPDATE statement, so it can happen that we mark something as transmitted that actually is not.
This is an excerpt from our code:
Statement stmt = conn.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE, ResultSet.CONCUR_UPDATABLE);
ResultSet extractedData = stmt.executeQuery(sql);
writeDataToFile(extractedData);
extractedData.beforeFirst();
while (extractedData.next()) {
    if (!extractedData.rowUpdated()) {
        extractedData.updateString("COLUMNNAME", "TRANSMITTED");
        // code will stop here if user has changed data but did not commit
        extractedData.updateRow();
        // once committed the changed data is marked as transmitted
    }
}
The method extractedData.rowUpdated() returns false, because technically the user didn't change anything yet.
Is there any way to skip the update and detect, at this late stage, that the data was changed?
Unfortunately I cannot change the program the user is using to change the data.
So you want to
Run through all rows of the table that have not been exported
Export this data somewhere
Mark these rows exported so your next iteration will not export them again
As there might be pending changes on a row, you don't want to mess with that information
How about:
You iterate over all rows.
for every row
generate a hash value for the contents of the row
compare column "UPDATE_STATUS" with calculated hash
if no match
export row
store hash into "UPDATE_STATUS"
if store fails (row locked)
-> no worries, will be exported again next time
if store succeeds (on data already changed by user)
-> no worries, will be exported again as hash will not match
This might further slow your export, as you'll have to iterate over everything instead of only over rows WHERE UPDATE_STATUS IS NULL, but you might be able to run two jobs - one (fast) iterating over WHERE UPDATE_STATUS IS NULL and one slow and thorough over WHERE UPDATE_STATUS IS NOT NULL (with the hash re-checking in place). A rough sketch of the hash comparison follows below.
If you want to avoid store failures/waits, you might want to store the hash/updated information in a second table, copying the primary key plus the hash field value - that way user locks on the main table would not interfere with your updates at all (as those would be on another table).
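A rough sketch of the hash idea in plain JDBC, assuming a single-column primary key named ID and a VARCHAR column UPDATE_STATUS that stores the hash; the table name, exportRow() helper, and hash choice are all illustrative, not from the original code:

// Sketch only: export a row when its current content hash differs from the
// stored hash, then try to store the new hash. ID, UPDATE_STATUS, SOURCE_TABLE
// and exportRow are assumptions.
try (Statement stmt = conn.createStatement();
     ResultSet rs = stmt.executeQuery("SELECT * FROM SOURCE_TABLE")) {
    ResultSetMetaData md = rs.getMetaData();
    while (rs.next()) {
        StringBuilder content = new StringBuilder();
        for (int i = 1; i <= md.getColumnCount(); i++) {
            if (!"UPDATE_STATUS".equals(md.getColumnName(i))) {
                content.append(rs.getString(i)).append('|');
            }
        }
        String hash = Integer.toHexString(content.toString().hashCode()); // any stable hash works
        if (!hash.equals(rs.getString("UPDATE_STATUS"))) {
            exportRow(rs); // hypothetical export method
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPDATE SOURCE_TABLE SET UPDATE_STATUS = ? WHERE ID = ?")) {
                ps.setString(1, hash);
                ps.setObject(2, rs.getObject("ID"));
                ps.executeUpdate(); // if this blocks or fails, the row is simply re-exported next run
            }
        }
    }
}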
"a user [...] might forget to commit" > A user either commits or he doesn't. "Forgetting" to commit is tantamount to a bug in his software.
To work around that you need to either:
Start a transaction with isolation level SERIALIZABLE, and within that transaction:
Read the data and export it. Data read this way is blocked from being updated.
Update the data you processed. Note: don't do that with an updatable ResultSet, do that with an UPDATE statement. That way you don't need a CONCUR_UPDATABLE + TYPE_SCROLL_SENSITIVE result set, which is much slower than CONCUR_READ_ONLY + TYPE_FORWARD_ONLY (a JDBC sketch of this approach follows below).
Commit the transaction.
That way the buggy software will be blocked from updating data you are processing.
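A minimal sketch of that first approach with plain JDBC; the table and COLUMNNAME are carried over from the question's excerpt, while the ID primary key and writeRowToFile() helper are assumptions, since the real schema isn't shown:

// Sketch: SERIALIZABLE transaction, forward-only read-only cursor for the
// export, then a plain UPDATE statement to mark the rows.
conn.setAutoCommit(false);
conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
try (Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
     ResultSet extractedData = stmt.executeQuery(sql);
     PreparedStatement mark = conn.prepareStatement(
             "UPDATE SOURCE_TABLE SET COLUMNNAME = 'TRANSMITTED' WHERE ID = ?")) {
    while (extractedData.next()) {
        writeRowToFile(extractedData);                 // hypothetical per-row export
        mark.setObject(1, extractedData.getObject("ID"));
        mark.executeUpdate();
    }
    conn.commit();                                     // releases the locks taken during the transaction
} catch (SQLException e) {
    conn.rollback();
    throw e;
}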
Another way
Start a TRANSACTION at a lower isolation level (default READ COMMITTED) and within that transaction:
Select the data with proper table hints, e.g. for SQL Server these: TABLOCKX + HOLDLOCK (large datasets), or ROWLOCK + XLOCK + HOLDLOCK (small datasets), or PAGLOCK + XLOCK + HOLDLOCK. Having HOLDLOCK as a table hint is practically equivalent to having a SERIALIZABLE transaction. Note that lock escalation may escalate the latter two to table locks if the number of locks becomes too high.
Update the data you processed. Note: use an UPDATE statement. Lose the updatable/scroll-sensitive ResultSet.
Commit the TRANSACTION.
Same deal: the buggy software will be blocked from updating data you are processing. A sketch of the hinted select is below.
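For the SQL Server variant, a hedged sketch of what such a hinted select might look like; the table name, COLUMNNAME filter, and the chosen hint combination are only examples based on the hints named above:

// Sketch: read with exclusive, held row locks (SQL Server hints), then mark
// the rows with a plain UPDATE in the same transaction. Names are illustrative.
conn.setAutoCommit(false); // default isolation level (READ COMMITTED) is fine here
try (Statement stmt = conn.createStatement();
     ResultSet rows = stmt.executeQuery(
             "SELECT * FROM SOURCE_TABLE WITH (ROWLOCK, XLOCK, HOLDLOCK) WHERE COLUMNNAME IS NULL")) {
    writeDataToFile(rows); // export as in the original excerpt
}
try (Statement mark = conn.createStatement()) {
    mark.executeUpdate("UPDATE SOURCE_TABLE SET COLUMNNAME = 'TRANSMITTED' WHERE COLUMNNAME IS NULL");
}
conn.commit(); // locks are held until here because of HOLDLOCK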
In the end we had to implement optimistic locking. In some tables we already have a column that stores the version number. Some other tables have a timestamp column that holds the time of the last change (set by a trigger).
While a timestamp might not always be a reliable source for optimistic locking, we went with it anyway. Several changes during a single second are not very realistic in our environment.
Since we have to know the primary key without describing it beforehand, we had to access the result set metadata. Some of our databases do not support this (DB/2 legacy tables, for example). We are still using the old system for those.
Note: The tableMetaData is an XML-config file where our description of the table is stored. This is not directly related to the metadata of the table in the database.
Statement stmt = conn.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE, ResultSet.CONCUR_UPDATABLE);
ResultSet extractedData = stmt.executeQuery(sql);
writeDataToFile(extractedData);
extractedData.beforeFirst();
while (extractedData.next()) {
    if (tableMetaData.getVersion() != null) {
        markDataAsExported(extractedData, tableMetaData);
    } else {
        markResultSetAsExported(extractedData, tableMetaData);
    }
}

// new way with building of an update statement including the version column in the where clause
private void markDataAsExported(ResultSet extractedData, TableMetaData tableMetaData) throws SQLException {
    ResultSet resultSetPrimaryKeys = null;
    PreparedStatement versionedUpdateStatement = null;
    try {
        ResultSetMetaData extractedMetaData = extractedData.getMetaData();
        resultSetPrimaryKeys = conn.getMetaData().getPrimaryKeys(null, null, tableMetaData.getTable());
        ArrayList<String> primaryKeyList = new ArrayList<String>();
        String sqlStatement = "update " + tableMetaData.getTable() + " set " + tableMetaData.getUpdateColumn()
                + " = ? where ";
        if (resultSetPrimaryKeys.isBeforeFirst()) {
            while (resultSetPrimaryKeys.next()) {
                primaryKeyList.add(resultSetPrimaryKeys.getString(4));
                sqlStatement += resultSetPrimaryKeys.getString(4) + " = ? and ";
            }
            sqlStatement += tableMetaData.getVersionColumn() + " = ?";
            versionedUpdateStatement = conn.prepareStatement(sqlStatement);
            while (extractedData.next()) {
                versionedUpdateStatement.setString(1, tableMetaData.getUpdateValue());
                for (int i = 0; i < primaryKeyList.size(); i++) {
                    versionedUpdateStatement.setObject(i + 2, extractedData.getObject(primaryKeyList.get(i)),
                            extractedMetaData.getColumnType(extractedData.findColumn(primaryKeyList.get(i))));
                }
                versionedUpdateStatement.setObject(primaryKeyList.size() + 2,
                        extractedData.getObject(tableMetaData.getVersionColumn()), tableMetaData.getVersionType());
                if (versionedUpdateStatement.executeUpdate() == 0) {
                    // version column no longer matches: the data was changed in the meantime
                    logger.warn(Message.COLLECTOR_DATA_CHANGED, tableMetaData.getTable());
                }
            }
        } else {
            logger.warn(Message.COLLECTOR_PK_ERROR, tableMetaData.getTable());
            markResultSetAsExported(extractedData, tableMetaData);
        }
    } finally {
        if (resultSetPrimaryKeys != null) {
            resultSetPrimaryKeys.close();
        }
        if (versionedUpdateStatement != null) {
            versionedUpdateStatement.close();
        }
    }
}

// the old way as fallback
private void markResultSetAsExported(ResultSet extractedData, TableMetaData tableMetaData) throws SQLException {
    while (extractedData.next()) {
        extractedData.updateString(tableMetaData.getUpdateColumn(), tableMetaData.getUpdateValue());
        extractedData.updateRow();
    }
}
I have around 5000 records to update. I am trying to measure the performance of the operation. It starts at around 100 ms, but after every thousand updates the operation time increases by around 80 ms. Why is it slowing down? The JVM?
StatelessSession session = dao.getStatelessSession();
Transaction transaction = session.beginTransaction();
try {
    List<Entity> list = dao.findAll();
    int counter = 0;
    for (Entity each : list) {
        final Date startTime = Clock.getTime();
        webService.execute(each);
        session.update(each);
        counter += 1;
        final Date endTime = Clock.getTime();
        LOGGER.info("***** " + getMilliSecondsDifference(startTime, endTime) + " for count: " + counter + "*****");
    }
} catch (Exception e) {
    LOGGER.info("***** Exception occurred : ", e);
} finally {
    transaction.commit();
    session.close();
}
Hüseyin,
It doesn't have to be a Hibernate problem at all if we look at your code.
I suggest you comment out the line related to the web service call.
Then please try running the batch HQL again.
Maybe the network is getting slower.
You have one transaction dealing with a large number of objects. Here you will probably have a memory leak and a performance issue too.
The object references will stay in memory until a session flush is executed (commit), so you will have a big number of objects in memory, in addition to the large amount of information about the object changes that is also kept in the Hibernate session and that can hurt performance too (I'm not a Hibernate expert, but you should consider this point).
I think you should consider using a lot of (smaller) transactions; a rough sketch follows.
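A rough sketch of that idea applied to the asker's loop, committing every N updates so a single transaction stays small; the chunk size of 500 is only an example, and dao, webService and Entity come from the question:

// Sketch: commit in chunks instead of one huge transaction.
StatelessSession session = dao.getStatelessSession();
Transaction transaction = session.beginTransaction();
try {
    List<Entity> list = dao.findAll();
    int counter = 0;
    for (Entity each : list) {
        webService.execute(each);
        session.update(each);
        if (++counter % 500 == 0) {
            transaction.commit();                     // end the current small transaction
            transaction = session.beginTransaction(); // and start the next one
        }
    }
} finally {
    transaction.commit();
    session.close();
}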
See these interesting links:
Transaction Management for bulk operations
Hibernate session and Transaction Management Guidelines
Good luck
Does the JPA/EJB3 framework provide a standard way to do batch insert operations...?
We use Hibernate as the persistence framework, so I can fall back to the Hibernate Session and use a combination of session.save()/session.flush() to achieve batch inserts. But I would like to know if EJB3 has support for this...
Neither JPA nor Hibernate provide particular support for batch inserts, and the idiom for batch inserts with JPA would be the same as with Hibernate:
EntityManager em = ...;
EntityTransaction tx = em.getTransaction();
tx.begin();
for ( int i=0; i<100000; i++ ) {
    Customer customer = new Customer(.....);
    em.persist(customer);
    if ( i % 20 == 0 ) { //20, same as the JDBC batch size
        //flush a batch of inserts and release memory:
        em.flush();
        em.clear();
    }
}
tx.commit();
em.close();
Using Hibernate's proprietary API in this case doesn't provide any advantage IMO.
References
JPA 1.0 Specification
Section 4.10 "Bulk Update and Delete Operations"
Hibernate Core reference guide
Chapter 13. Batch processing
For Hibernate specifically, the whole Chapter 13 of the core manual explains the methods.
But you are saying that you want the EJB method through Hibernate, so the entity manager documentation also has a chapter on that. I suggest that you read both (the core and the entity manager).
In EJB, it is simply about using EJB-QL (with some limitations). Hibernate provides more mechanics, though, if you need more flexibility. A small example of an EJB-QL bulk update is shown below.
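For illustration, a minimal sketch of such a bulk operation through the EntityManager; the Customer entity and its status field are only placeholders, not taken from the question:

// Sketch: a bulk UPDATE executed directly in the database, bypassing the
// persistence context. Customer and "status" are placeholder names.
EntityTransaction tx = em.getTransaction();
tx.begin();
int updated = em.createQuery(
        "UPDATE Customer c SET c.status = :newStatus WHERE c.status = :oldStatus")
        .setParameter("newStatus", "PROCESSED")
        .setParameter("oldStatus", "PENDING")
        .executeUpdate();
tx.commit();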
With a medium number of records you can use this approach:
em.getTransaction().begin();
for (int i = 1; i <= 100000; i++) {
    Point point = new Point(i, i);
    em.persist(point);
    if ((i % 10000) == 0) {
        em.flush();
        em.clear();
    }
}
em.getTransaction().commit();
But with a large number of records you should perform this task in multiple transactions:
em.getTransaction().begin();
for (int i = 1; i <= 1000000; i++) {
    Point point = new Point(i, i);
    em.persist(point);
    if ((i % 10000) == 0) {
        em.getTransaction().commit();
        em.clear();
        em.getTransaction().begin();
    }
}
em.getTransaction().commit();
Ref: JPA Batch Store
Yes, you can fall back to your JPA implementation if you wish, in order to get the control you described.
JPA 1.0 is rich in EL-HQL but light on Criteria API support; however, this has been addressed in 2.0.
Session session = (Session) entityManager.getDelegate();
session.setFlushMode(FlushMode.MANUAL);
Pascal
In your example of inserting 100000 records, it is done within a single transaction, as commit() is only called at the end. Does it put a lot of pressure on the database? Furthermore, in case there is a rollback, the cost will be too high.
Would the following approach be better?
EntityManager em = ...;
for ( int i=0; i<100000; i++ ) {
    if(!em.getTransaction().isActive()) {
        em.getTransaction().begin();
    }
    Customer customer = new Customer(.....);
    em.persist(customer);
    if ((i+1) % 20 == 0 ) { //20, same as the JDBC batch size
        //flush and commit of inserts and release memory:
        em.getTransaction().commit();
        em.clear();
    }
}
em.close();