I am trying to persist many records to a database by reading a file with many lines.
I'm using forEach to iterate over the list of objects parsed from the file:
logs.stream().forEach(log -> save(log));
private LogData save(LogData log) {
    return repository.persist(log);
}
But the inserts are slow.
Is there a way to speed up the inserts?
Your approach is slow because you persist element by element, so you go to the database n times. I would use batch processing instead, with one transaction rather than N transactions, so the persist method could be:
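public void persist(List<LogData> logs) {
    Session session = sessionFactory.openSession();
    Transaction tx = session.beginTransaction();
    try {
        logs.forEach(log -> session.save(log)); // suggested in a comment by @shmosel
        tx.commit();
    } catch (RuntimeException e) {
        tx.rollback(); // undo the partial work if any save or the commit fails
        throw e;
    } finally {
        session.close();
    }
}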
Use a batch insert. Search for "Hibernate batch insert", or substitute the name of your ORM if it is not Hibernate.
https://www.tutorialspoint.com/hibernate/hibernate_batch_processing.htm
Inserting on every line makes the program slow. Why not collect n lines and insert them all together at once?
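The "collect n lines, then insert them together" idea can be sketched without any ORM. This is only an illustration: the class name `Batches` and the method `partition` are hypothetical helpers, not part of Hibernate or any other library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a list into consecutive chunks for batch persistence.
final class Batches {
    /** Returns consecutive sublists of at most batchSize elements, in the original order. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }
}
```

Each chunk returned by `Batches.partition(logs, 50)` could then be handed to a method like the list-based `persist` shown earlier, so that one transaction covers 50 rows instead of one.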
Related
I would like to know what is the best way to do the following:
A client sends a JSON payload of 100 records to the Spring Boot application to insert into the DB.
But before inserting I have to execute a query to verify some data of EACH record of the 100 records. And then insert.
I currently have this:
for (int i = 0; i < productos.size(); i++) {
    productos.get(i).setIdvehiculo(productoRepository.findTesting("49878", 3)); // ----> NATIVE QUERY EXECUTION TAKES 100ms I THINK
    productoRepository.save(productos.get(i)); // ----> INSERT
}
//productoRepository.saveAll(productos);
entityManager.flush();
entityManager.clear();
And it takes 10 seconds ... doing the select and inserting. 100 records, 10 seconds, isn't that a long time?
Don't insert one record at a time inside the for loop. Just build the model there and add it to an ArrayList; once you are done processing the records, call saveAll(productos) outside the loop.
Try enabling the L2 (second-level) cache. That would reduce the validation time. Depending on how critical your data is, you can also cache the entity at the application level.
Create a transaction to save the entities. This will allow the database to leverage its concurrency control.
See if you can change the architecture to introduce a queue (for example Kafka), with another application consuming the queue and writing to the database.
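The first two suggestions (a single saveAll after the loop, and an application-level cache for the lookup) could be combined roughly as follows. This is a sketch under assumptions: `ProductoStore`, `Producto.codigo` and `findVehiculoId` are hypothetical stand-ins for the real repository, entity field and native query, which the question does not show in full.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the Spring Data repository in the question.
interface ProductoStore {
    long findVehiculoId(String codigo);     // plays the role of the slow native query
    void saveAll(List<Producto> productos); // plays the role of saveAll(...)
}

// Hypothetical, minimal version of the entity.
final class Producto {
    final String codigo;
    long idVehiculo;

    Producto(String codigo) {
        this.codigo = codigo;
    }
}

final class ProductoImporter {
    static void importAll(List<Producto> productos, ProductoStore store) {
        // Remember lookup results so that each distinct key hits the database only once.
        Map<String, Long> lookupCache = new HashMap<>();
        for (Producto p : productos) {
            p.idVehiculo = lookupCache.computeIfAbsent(p.codigo, store::findVehiculoId);
        }
        // One call after the loop instead of one save per record.
        store.saveAll(productos);
    }
}
```

With Spring Data JPA on Hibernate, saveAll on its own may still send one INSERT per entity; JDBC batching usually also needs `hibernate.jdbc.batch_size` to be set, and Hibernate does not batch inserts for entities that use IDENTITY id generation.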
I am using the below set of code for an update:
private void updateAvatarPath(Integer param1, String param2, String param3, boolean param4){
    Transaction avatarUpdatePathTransaction = session.beginTransaction();
    String updateQuery = "query goes here with param";
    Query query = session.createSQLQuery(updateQuery);
    query.executeUpdate();
    avatarUpdatePathTransaction.commit();
    session.flush();
}
This function is called from a loop, so the update is slow because it hits the DB on every iteration. To improve performance I am planning to execute the updates in batches instead, but I have no idea how to do it.
session.doWork() is one of the solutions I found. I would like to know whether any other options are available.
You should move Transaction avatarUpdatePathTransaction = session.beginTransaction(); before the start of your loop and avatarUpdatePathTransaction.commit(); after the end of your loop.
The recommended pattern is to have one session per "unit of work", in your case this seems to be modifying multiple entities in a single session/transaction.
The session.flush() call is not necessary, I think; committing the transaction should flush the session.
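Since session.doWork() was mentioned in the question, one way it could be used is plain JDBC batching on the connection that Hibernate hands over. A minimal sketch, assuming a hypothetical table `user_avatar` with columns `path` and `user_id`; the class name and the batch size of 50 are also made up, and the real UPDATE statement would replace the SQL constant.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

final class AvatarPathBatchUpdater {
    // Hypothetical statement; substitute the real update query.
    static final String SQL = "UPDATE user_avatar SET path = ? WHERE user_id = ?";
    static final int BATCH_SIZE = 50;

    /** Queues one UPDATE per entry and sends them in groups of BATCH_SIZE. Returns the number queued. */
    static int updateAll(Connection connection, Map<Integer, String> pathsByUserId) throws SQLException {
        int queued = 0;
        try (PreparedStatement ps = connection.prepareStatement(SQL)) {
            for (Map.Entry<Integer, String> entry : pathsByUserId.entrySet()) {
                ps.setString(1, entry.getValue());
                ps.setInt(2, entry.getKey());
                ps.addBatch(); // queue the statement instead of executing it immediately
                if (++queued % BATCH_SIZE == 0) {
                    ps.executeBatch(); // send a full batch to the database
                }
            }
            if (queued % BATCH_SIZE != 0) {
                ps.executeBatch(); // send the trailing partial batch
            }
        }
        return queued;
    }
}
```

Inside a single transaction this could be invoked as `session.doWork(connection -> AvatarPathBatchUpdater.updateAll(connection, pathsByUserId));`, so the loop no longer begins and commits a transaction for every row.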
I have an application using Hibernate. One of its modules calls a native SQL stored procedure in a batch process. Roughly, every time it writes a file it updates a field in the database. Right now I am not sure how many files will need to be written, as that depends on the number of transactions per day, so it could be anywhere from zero to a million.
If I use this code snippet in a loop, will I have any problems?
@Transactional
public void test()
{
    //The for loop represents a list of records that needs to be processed.
    for (int i = 0; i < 1000000; i++ )
    {
        //Process the records and write the information into a file.
        ...
        //Update a field(s) in the database using a stored procedure based on the processed information.
        updateField(String.valueOf(i));
    }
}
@Transactional(propagation = Propagation.MANDATORY)
public void updateField(String value)
{
    Session session = getSession();
    SQLQuery sqlQuery = session.createSQLQuery("exec spUpdate :value");
    sqlQuery.setParameter("value", value);
    sqlQuery.executeUpdate();
}
Will I need any other configurations for my data source and transaction manager?
Will I need to set hibernate.jdbc.batch_size and hibernate.cache.use_second_level_cache?
Will I need to use session flush and clear for this? The samples in the Hibernate tutorial use POJOs and not native SQL, so I am not sure whether they also apply.
Please note that another part of the application already uses Hibernate, so I would like to stick with Hibernate as much as possible.
Thank you for your time; I am hoping for a quick response. If possible, a code snippet would be really useful for me.
Application Work Flow
1) Query Database for the transaction information. (Transaction date, Type of account, currency, etc..)
2) For each account process transaction information. (Discounts, Current Balance, etc..)
3) Write the transaction information and processed information to a file.
4) Update a database field based on the process information
5) Go back to step 2 while there are still accounts. (Assuming that no exceptions are thrown)
The code snippet will open and close the session on each iteration, which is definitely not good practice.
Would it be possible to have a job that checks how many new files were added to the folder?
The job could run, say, every 15 or 25 minutes, check how many files were changed or added in that interval, and update the database in a batch.
Something like that would reduce the number of sessions opened and closed. It should be much faster than this.
I need to insert a lot of data into a database using Hibernate. I was looking at Hibernate's batch insert, and what I am using is similar to the example in the manual:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for ( int i=0; i<100000; i++ ) {
    Customer customer = new Customer(.....);
    session.save(customer);
    if ( i % 20 == 0 ) { //20, same as the JDBC batch size
        //flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}
tx.commit();
session.close();
But I see that flush doesn't write the data to the database.
From what I have read, if the code is inside a transaction, nothing is committed to the database until the transaction commits.
So what is the point of flush/clear? It seems useless: if the data is not written to the database, then it is still in memory.
How can I force Hibernate to write the data to the database?
Thanks
The data is sent to the database and is not in memory anymore. It is just not made definitively persistent until the transaction commits. It's exactly the same as if you executed the following sequence of statements in any database tool:
begin;
insert into ...
insert into ...
insert into ...
// here, three inserts have been done on the database. But they will only be made
// definitively persistent at commit time
...
commit;
The flush consists of executing the insert statements.
The commit consists of executing the commit statement.
The data will be written to the database, but depending on the transaction isolation level you will not see it (in other transactions) until the transaction is committed.
Use an SQL statement logger that prints the statements sent over the database connection; then you will see that the statements are sent to the database.
For best performance you also have to commit transactions. Flushing and clearing the session empties the Hibernate caches, but the data is moved to the JDBC connection caches and is still uncommitted (different RDBMSs and drivers behave differently), so you are just shifting the problem elsewhere without a real improvement in performance.
Calling flush() and clear() at the location mentioned also saves memory, because the session is emptied regularly. Otherwise you will have 100000 objects in memory and might run out of memory for larger counts. Check out this article.
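To make the memory argument concrete, here is a toy model of the loop from the question. It is not Hibernate code: `ToySession` is a made-up class that only counts how many objects a session would be holding and how many INSERT statements would have been sent (but not yet committed) at each point.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: counts what a session would hold. This is not Hibernate code. */
final class ToySession {
    private final List<Object> managed = new ArrayList<>(); // stands in for the first-level cache
    private int unflushed = 0;      // saved but not yet sent as INSERT statements
    private int statementsSent = 0; // INSERTs sent to the database, still uncommitted
    private int peakManaged = 0;    // largest number of objects held at once

    void save(Object entity) {
        managed.add(entity);
        unflushed++;
        peakManaged = Math.max(peakManaged, managed.size());
    }

    void flush() {
        statementsSent += unflushed;
        unflushed = 0;
    }

    void clear() {
        managed.clear();
    }

    int statementsSent() { return statementsSent; }
    int peakManaged() { return peakManaged; }

    /** Replays the loop from the question; flushEvery <= 0 means never flush or clear. */
    static ToySession run(int total, int flushEvery) {
        ToySession session = new ToySession();
        for (int i = 0; i < total; i++) {
            session.save(new Object());
            if (flushEvery > 0 && i % flushEvery == 0) {
                session.flush();
                session.clear();
            }
        }
        return session;
    }
}
```

Under this model, `ToySession.run(100000, 20)` never holds more than 20 objects at a time, while `ToySession.run(100000, 0)` holds all 100000 until the end. In both cases nothing is visible to other transactions until the commit.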
I have a program that replicates/mirrors the main tables (around 20) from Oracle to MSSQL 2005 via a web service (REST).
The program periodically reads XML data from the web service and converts it to a list of JPA entities. This list of entities is then stored in MSSQL via JPA.
All JPA entities are provided by the team that created the web service.
There are two issues that I have noticed and that seem unsolvable after some searching.
1st issue: The performance of inserting/updating via JDBC jpa is very slow, it takes around 0.1s per row...
Doing the same via C# -> DataTable -> bulk insert into a new table in the DB -> calling a stored procedure to do a mass insert/update based on joins takes 0.01 s for 4000 records.
(Each table will have around 500-5000 records every 5 minutes)
Below is a snapshot of the Java code that does the task. Persistence library: EclipseLink JPA 2.0
private void GetEntityA(OurClient client, EntityManager em, DBWriter dbWriter){
    //code to log time and others
    List<EntityA> response = client.findEntityA_XML();
    em.setFlushMode(FlushModeType.COMMIT);
    em.getTransaction().begin();
    int count = 0;
    for (EntityA object : response) {
        count++;
        em.merge(object);
        //Batch commit
        if (count % 1000 == 0){
            try{
                em.getTransaction().commit();
                em.getTransaction().begin();
                commitRecords = count;
            } catch (Exception e) {
                em.getTransaction().rollback();
            }
        }
    }
    try{
        em.getTransaction().commit();
    } catch (Exception e) {
        em.getTransaction().rollback();
    }
    //dbWriter write log to DB
}
Is anything done wrong here that causes the slowness? How can I improve the insert/update speed?
2nd issue: There are around 20 tables to replicate and I have created the same number of methods similar to above, basically copying above method 20 times and replace EntityA with EntityB and so on, you get the idea...
Is there any way to generalize the method so that I can pass in any entity?
The performance of inserting/updating via JDBC jpa is very slow,
O/R mappers are generally slow for bulk inserts, by definition. You want speed? Use another approach.
In general an ORM does not cater for the bulk insert / stored procedure approach and thus gets slaughtered here. You are using the wrong approach for high-performance inserts.
There are around 20 tables to replicate and I have created the same number of methods similar to
above, basically copying above method 20 times and replace EntityA with EntityB and so on, you get
the idea...
Generics. They have been part of Java for some time now.
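As a rough illustration of the generics suggestion, the per-entity loop can be written once with a type parameter. The names `BatchProcessor` and `forEachInBatches` are hypothetical, and the persistence calls are passed in as callbacks so that the sketch does not depend on JPA.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: one generic loop instead of 20 copies that differ only in the entity type.
final class BatchProcessor {
    /**
     * Applies perItem to every element and runs afterBatch after each full batch,
     * plus once more for a trailing partial batch. Returns the number of batches.
     */
    static <T> int forEachInBatches(List<T> items, int batchSize,
                                    Consumer<? super T> perItem, Runnable afterBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int count = 0;
        int batches = 0;
        for (T item : items) {
            perItem.accept(item);
            if (++count % batchSize == 0) {
                afterBatch.run();
                batches++;
            }
        }
        if (count % batchSize != 0) {
            afterBatch.run();
            batches++;
        }
        return batches;
    }
}
```

With JPA, `perItem` could be `em::merge` and `afterBatch` could flush and clear the persistence context (`em.flush(); em.clear();`), all inside one transaction, and the same call would work for EntityA, EntityB and the rest.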
You can execute SQL, stored procedures, or JPQL update-all queries through JPA as well. I'm not sure where these objects are coming from, but if you are just migrating one table to another in the same database, you can do in Java with JPA the same thing you were doing in C#.
If you want to process the objects in JPA, then see,
http://java-persistence-performance.blogspot.com/2011/06/how-to-improve-jpa-performance-by-1825.html
For #2, change EntityA to Object, and you have a generic method.