fast insertion of many entities in Hibernate - java

I want to insert a list of 170,000 entities into my locally installed MySQL 8.0 database using Hibernate 4.2.
Currently I'm doing this via the Session#save method, but inserting that many entities takes very long. Is there a way to do this faster?
for (Agagf x : list) {
    create(x);
}
// ------------------------
public static void create(Object obj) throws DatabaseException {
    Session hsession = null;
    try {
        hsession = SqlDataHibernateUtil.getSessionFactory().openSession();
        Transaction htransaction = hsession.beginTransaction();
        hsession.save(obj);
        htransaction.commit();
    } catch (HibernateException ex) {
        throw new DatabaseException(ex);
    } finally {
        if (hsession != null)
            hsession.close();
    }
}

There is this chapter in the Hibernate documentation: http://docs.jboss.org/hibernate/orm/4.2/manual/en-US/html/ch15.html
According to it, you would need something like this:
create(list);
// ------------------------
public static void create(List<Agagf> objList) throws DatabaseException {
    Session hsession = null;
    Transaction htransaction = null;
    try {
        hsession = SqlDataHibernateUtil.getSessionFactory().openSession();
        htransaction = hsession.beginTransaction();
        int count = 0;
        for (Agagf x : objList) {
            hsession.save(x);
            if (++count % 20 == 0) { // 20, same as the JDBC batch size
                // flush a batch of inserts and release memory:
                hsession.flush();
                hsession.clear();
            }
        }
        htransaction.commit();
    } catch (HibernateException ex) {
        if (htransaction != null) {
            htransaction.rollback();
        }
        throw new DatabaseException(ex);
    } finally {
        if (hsession != null) {
            hsession.close();
        }
    }
}
Also, the configuration to enable batch processing:
if you are undertaking batch processing you will need to enable the use of JDBC
batching. This is absolutely essential if you want to achieve optimal
performance. Set the JDBC batch size to a reasonable number (10-50, for example):
hibernate.jdbc.batch_size 20
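For example, with an XML-based configuration the same setting could be added to hibernate.cfg.xml roughly like this (a hibernate.properties entry with the same key works as well):
<property name="hibernate.jdbc.batch_size">20</property>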
Edit: In your case, play with the batch size to better fit your volume. Just remember to use the same number in both the configuration and the if statement that triggers the flush.

Related

Java unique code generation fails when calling the recursive function

We have to implement logic for unique code generation in Java. The idea is that when we generate a code, the system checks whether that code has already been generated. If it has, the system creates a new code and checks again. But this logic fails in some cases and we are not able to identify what the issue is.
Here is the code to create the unique code
private Integer createCode() {
    Integer code = null;
    try {
        int max = 999999;
        int min = 100000;
        code = (int) Math.round(Math.random() * (max - min + 1) + min);
        PreOrders preObj = null;
        preObj = WebServiceDao.getInstance().preOrderObj(code.toString());
        if (preObj != null) {
            createCode();
        }
    } catch (Exception e) {
        exceptionCaught();
        e.printStackTrace();
        log.error("Exception in method createCode() - " + e.toString());
    }
    return code;
}
The function preOrderObj calls a function that checks whether the code exists in the database and, if it does, returns the object. We are using Hibernate to map the database and MySQL on the backend.
Here is the function preOrderObj
public PreOrders preOrderObj(String code) {
    PreOrders preOrderObj = null;
    List<PreOrders> preOrderList = null;
    SessionFactory sessionFactory =
            (SessionFactory) ServletActionContext.getServletContext().getAttribute(HibernateListener.KEY_NAME);
    Session hibernateSession = sessionFactory.openSession();
    try {
        hibernateSession.beginTransaction();
        preOrderList = hibernateSession.createCriteria(PreOrders.class).add(Restrictions.eq("code", code)).list(); // removed .add(Restrictions.eq("status", true))
        if (!preOrderList.isEmpty()) {
            preOrderObj = (PreOrders) preOrderList.iterator().next();
        }
        hibernateSession.getTransaction().commit();
        hibernateSession.flush();
    } catch (Exception e) {
        hibernateSession.getTransaction().rollback();
        log.debug("This is my debug message.");
        log.info("This is my info message.");
        log.warn("This is my warn message.");
        log.error("This is my error message.");
        log.fatal("Fatal error " + e.getStackTrace().toString());
    } finally {
        hibernateSession.close();
    }
    return preOrderObj;
}
Please guide us to identify the issue.
In the createCode method, when the randomly generated code already exists in the database, you call createCode again. However, the return value of the recursive call is never assigned to the code variable, so the colliding code is still returned and causes the error.
To fix the problem, update the method as follows:
...
if (preObj != null) {
    //createCode();
    code = createCode();
}
...
That way, code is updated with the newly generated value.
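An alternative that avoids recursion entirely is to loop until an unused code is found. A rough sketch, reusing the same WebServiceDao lookup from the question (exception handling omitted for brevity):
private Integer createCode() {
    int max = 999999;
    int min = 100000;
    Integer code;
    PreOrders preObj;
    do {
        code = (int) Math.round(Math.random() * (max - min + 1) + min);
        preObj = WebServiceDao.getInstance().preOrderObj(code.toString());
    } while (preObj != null); // keep generating until no existing order uses this code
    return code;
}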
By the way, using a random number to generate a unique value and testing uniqueness with a query is a bit unusual. You may want to use an auto-increment column if you just need a unique value.
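If you go that route, a typical JPA/Hibernate mapping for an auto-incremented key (a sketch assuming annotation mapping and a MySQL AUTO_INCREMENT column; the field name is illustrative) looks roughly like this:
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY) // maps to a MySQL AUTO_INCREMENT column
private Long id;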

Neo4j data consistency issue

I get an "empty result" error when accessing a Neo4j database with the following setup:
Neo4j runs in Docker.
The uploader process runs continuously, with a 0.5 second sleep between uploads.
The reader process runs continuously as well.
Reader:
Driver driver = GraphDatabase.driver("bolt://db_address:7687", AuthTokens.basic("user", "password"));
while (true) {
    try (Session session = driver.session(AccessMode.READ)) {
        for (int i = 1; i <= 100; i++) {
            session.run("Match (n:Number) where n.value=$value return ID(n)", parameters("value", i)).single().get(0).asInt();
        }
    }
    Thread.sleep(500);
}
Uploader:
Driver driver = GraphDatabase.driver("bolt://db_address:7687", AuthTokens.basic("user", "password"));
while (true) {
    try (Session session = driver.session(AccessMode.WRITE)) {
        try (Transaction tx = session.beginTransaction()) {
            tx.run("MATCH (n) DELETE n");
            for (int i = 1; i <= 100; i++) {
                tx.run("CREATE (n:Number {value: $value}) return ID(n)", parameters("value", i)).single().get(0).asInt();
            }
            tx.success();
        }
    }
    Thread.sleep(500);
}
After a few cycles I get this error in the reader process:
Cannot retrieve a single record, because this result is empty.
At the start, the database contains the requested data.
Based on the description of "write transaction" and the code above, the empty result seems strange.
Did I miss something with the transaction handling in Neo4j?
You need to call tx.success() to commit the transaction.
PS: not sure why the database is cleared on every upload run

How to scan and delete millions of rows in HBase

What Happened
All the data from last month was corrupted due to a bug in the system, so we have to delete and re-enter these records manually. Basically, I want to delete all the rows inserted during a certain period of time. However, I found it difficult to scan and delete millions of rows in HBase.
Possible Solutions
I found two ways to bulk delete:
The first one is to set a TTL, so that all the outdated records are deleted automatically by the system. But I want to keep the records inserted before last month, so this solution does not work for me.
The second option is to write a client using the Java API:
public static void deleteTimeRange(String tableName, Long minTime, Long maxTime) {
    Table table = null;
    Connection connection = null;
    try {
        Scan scan = new Scan();
        scan.setTimeRange(minTime, maxTime);
        connection = HBaseOperator.getHbaseConnection();
        table = connection.getTable(TableName.valueOf(tableName));
        ResultScanner rs = table.getScanner(scan);
        List<Delete> list = getDeleteList(rs);
        if (list.size() > 0) {
            table.delete(list);
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (null != table) {
            try {
                table.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        if (connection != null) {
            try {
                connection.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
private static List<Delete> getDeleteList(ResultScanner rs) {
    List<Delete> list = new ArrayList<>();
    try {
        for (Result r : rs) {
            Delete d = new Delete(r.getRow());
            list.add(d);
        }
    } finally {
        rs.close();
    }
    return list;
}
But in this approach, all the records returned by the ResultScanner are collected into a single list of Deletes, so the heap usage would be huge. And if the program crashes, it has to start from the beginning.
So, is there a better way to achieve the goal?
I don't know how many 'millions' you are dealing with in your table, but the simplest thing is to not try to put them all into a List at once and instead work in more manageable steps by using the .next(n) function. Something like this:
for (Result row : rs.next(numRows))
{
    Delete del = new Delete(row.getRow());
    ...
}
This way, you can control how many rows get returned from the server via a single RPC through the numRows parameter. Make sure it's large enough so as not to make too many round-trips to the server, but at the same time not too large to kill your heap. You can also use the BufferedMutator to operate on multiple Deletes at once.
Hope this helps.
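To make that concrete, here is a rough sketch of a chunked delete loop built around rs.next(n), reusing the rs scanner and table from the question's code (numRows is a tuning value you would choose; IOException handling is omitted for brevity):
int numRows = 1000; // rows fetched per RPC; tune to your key size and heap
Result[] batch;
while ((batch = rs.next(numRows)).length > 0) {
    List<Delete> deletes = new ArrayList<>(batch.length);
    for (Result row : batch) {
        deletes.add(new Delete(row.getRow()));
    }
    table.delete(deletes); // delete this chunk before fetching the next one
}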
I would suggest two improvements:
Use BufferedMutator to batch your deletes; it does exactly what you need – it keeps an internal buffer of mutations and flushes it to HBase when the buffer fills up, so you do not have to worry about keeping, sizing, and flushing your own list.
Improve your scan:
Use KeyOnlyFilter – since you do not need the values, there is no need to retrieve them.
Use scan.setCacheBlocks(false) – since you are doing a full-table scan, caching all the blocks on the region server does not make much sense.
Tune scan.setCaching(N) and scan.setBatch(N) – N will depend on the size of your keys; you should keep a balance between caching more and the memory it requires. But since you only transfer keys, N can be quite large, I suppose.
Here's an updated version of your code:
public static void deleteTimeRange(String tableName, Long minTime, Long maxTime) {
    try (Connection connection = HBaseOperator.getHbaseConnection();
         final Table table = connection.getTable(TableName.valueOf(tableName));
         final BufferedMutator mutator = connection.getBufferedMutator(TableName.valueOf(tableName))) {
        Scan scan = new Scan();
        scan.setTimeRange(minTime, maxTime);
        scan.setFilter(new KeyOnlyFilter());
        scan.setCaching(1000);
        scan.setBatch(1000);
        scan.setCacheBlocks(false);
        try (ResultScanner rs = table.getScanner(scan)) {
            for (Result result : rs) {
                mutator.mutate(new Delete(result.getRow()));
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Note the use of try-with-resources – if you omit that, make sure to .close() the mutator, rs, table, and connection.

Optimistic locking is not working in EclipseLink

I am working on a project and, due to a certain issue, I am switching from pessimistic locking to optimistic locking. While doing so, I get an error when updating the entity, even though I ran the code as a standalone application, so there is no chance that two threads are updating it simultaneously. I checked the value of the version field in different parts of the code; it shows 0. But when calling flush or commit, an exception is thrown saying it has been updated to 1.
Note: I already added a @Version int version_id field in the entity for optimistic locking.
The code is as below:
WorkItem tmp_workItem = entityManager.find(WorkItem.class, workItem.getEntityKey());
logger.info("Before merge " + tmp_workItem.getVersion());
entityManager.merge(workItem);
tmp_workItem = entityManager.find(WorkItem.class, workItem.getEntityKey());
logger.info("After merge " + tmp_workItem.getVersion() + " " + workItem.getVersion());
//logger.info(entityManager.getLockMode(WorkItem.class).toString());
entityManager.flush();
logger.info("After flush " + tmp_workItem.getVersion());
response = true;
This code throws an exception:
getVersion : 1
] cannot be updated because it has changed or been deleted since it was last read.
Class> com.csg.ib.cobra.domain.WorkItem Primary Key> [9553]
at org.eclipse.persistence.internal.sessions.RepeatableWriteUnitOfWork.commitToDatabase(RepeatableWriteUnitOfWork.java:549)
at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.commitToDatabaseWithPreBuiltChangeSet(UnitOfWorkImpl.java:1559)
The values of the version field in the logger are:
before merge: 0 0
after merge: 0 0
before flush: 0 0
Then how can it be increased by 1 when calling entityManager.flush()?
In general, in contrast with the id field, which is updated on the persist operation, the version field is updated when data is actually flushed to the database, either with commit or flush. This is the reason why the version field value is 0 until entityManager.flush is called. You could alternatively use transaction.commit if you are using resource-local transactions.
Regarding the exception that you are getting in the application code, it would be useful to have a more complete code example, maybe in the form of a JUnit test. I have created the following simple tests to demonstrate the correct behavior; hope they are helpful:
@Test
public void testOptimisticLocking() {
    OptimisticEntity entity = new OptimisticEntity();
    entity.setName("Some name");
    EntityTransaction transaction = entityManager.getTransaction();
    transaction.begin();
    try {
        entityManager.persist(entity);
        transaction.commit();
    } catch (Exception x) {
        transaction.rollback();
        Assert.fail("Failed to commit: " + x.getMessage());
    }
    int id = entity.getId();
    int version = entity.getVersion();
    Assert.assertTrue(id > 0);
    Assert.assertTrue(version > 0);
    entity = entityManager.find(OptimisticEntity.class, id);
    Assert.assertNotNull("Entity could not be retrieved", entity);
    Assert.assertEquals("Entity version retrieved not expected", version, entity.getVersion());
    entity.setName("Another name");
    transaction.begin();
    try {
        entityManager.merge(entity);
        transaction.commit();
    } catch (Exception x) {
        transaction.rollback();
        Assert.fail("Failed to merge: " + x.getMessage());
    }
    Assert.assertEquals("Entity version not incremented after merge", version + 1, entity.getVersion());
}
@Test
public void testOptimisticLockingOneTransaction() {
    OptimisticEntity entity = new OptimisticEntity();
    entity.setName("Some name");
    EntityTransaction transaction = entityManager.getTransaction();
    transaction.begin();
    try {
        entityManager.persist(entity);
        int id = entity.getId();
        int version = entity.getVersion();
        Assert.assertTrue(id > 0);
        Assert.assertEquals(0, version);
        OptimisticEntity retrievedEntity = entityManager.find(OptimisticEntity.class, id);
        Assert.assertNotNull("Entity could not be retrieved", retrievedEntity);
        Assert.assertEquals("Entity version retrieved not expected", 0, retrievedEntity.getVersion());
        entity.setName("Another name");
        entityManager.merge(entity);
        Assert.assertEquals("Entity version changed after merge", 0, entity.getVersion());
        retrievedEntity = entityManager.find(OptimisticEntity.class, id);
        Assert.assertNotNull("Entity could not be retrieved", retrievedEntity);
        Assert.assertEquals("Entity version retrieved not expected", 0, retrievedEntity.getVersion());
        entityManager.flush();
        Assert.assertEquals("Entity version not incremented after flush", 1, entity.getVersion());
        transaction.commit();
    } catch (Exception x) {
        transaction.rollback();
        Assert.fail("An error occurred: " + x.getMessage());
    }
}
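For reference, the OptimisticEntity class used in these tests is not shown above; a minimal sketch of such an entity, assuming standard JPA annotations, could look like this:
@Entity
public class OptimisticEntity {
    @Id
    @GeneratedValue
    private int id;
    @Version
    private int version; // incremented by the provider when changes are flushed or committed
    private String name;
    // getters and setters omitted
}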

Hibernate causes out of memory exception when saving a large number of entities

In my application I'm using CSVReader and Hibernate to import a large number of entities (1,500,000 or more) from a CSV file into the database. The code looks like this:
Session session = headerdao.getSessionFactory().openSession();
Transaction tx = session.beginTransaction();
int count = 0;
String[] nextLine;
while ((nextLine = reader.readNext()) != null) {
    try {
        if (nextLine.length == 23
                && Integer.parseInt(nextLine[0]) > lastIdInDB) {
            JournalHeader current = parseJournalHeader(nextLine);
            current.setChain(chain);
            session.save(current);
            count++;
            if (count % 100 == 0) {
                session.flush();
                tx.commit();
                session.clear();
                tx.begin();
            }
            if (count % 10000 == 0) {
                LOG.info(count);
            }
        }
    } catch (NumberFormatException e) {
        e.printStackTrace();
    } catch (ParseException e) {
        e.printStackTrace();
    }
}
tx.commit();
session.close();
With large enough files (somewhere around 700,000 lines) I get an out-of-memory exception (heap space).
The problem seems to be Hibernate-related, because if I comment out just the line session.save(current); it runs fine. If it's uncommented, the task manager shows continuously increasing memory usage of javaw, and at some point the parsing gets really slow and the program crashes.
parseJournalHeader() does nothing special; it just parses an entity from the String[] that the CSV reader provides.
The Session keeps persisted objects in its first-level cache. You are doing the right things to deal with the first-level cache, but there are more things that can prevent garbage collection from happening.
Try using a StatelessSession instead.
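A minimal sketch of the same import loop rewritten with a StatelessSession (assuming the same reader, parseJournalHeader, chain, and lastIdInDB from the question) might look like this; a StatelessSession has no first-level cache, so there is nothing to flush or clear:
StatelessSession session = headerdao.getSessionFactory().openStatelessSession();
Transaction tx = session.beginTransaction();
String[] nextLine;
while ((nextLine = reader.readNext()) != null) {
    try {
        if (nextLine.length == 23 && Integer.parseInt(nextLine[0]) > lastIdInDB) {
            JournalHeader current = parseJournalHeader(nextLine);
            current.setChain(chain);
            session.insert(current); // issues the INSERT immediately; nothing is cached in memory
        }
    } catch (NumberFormatException e) {
        e.printStackTrace();
    } catch (ParseException e) {
        e.printStackTrace();
    }
}
tx.commit();
session.close();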
