Java SQL batch updates - best way

I have a service that updates multiple tables through multiple services within a transaction boundary; if one of them fails, everything needs to be rolled back. One of these SQL updates touches around 2k+ records and is done in two batches of 1000 records at a time. The problem is that this update sometimes takes too long, around 2 minutes, and the transaction times out. Is there a way this SQL can be performed across multiple threads, with each thread updating 100 records? Thanks in advance.

Related

Multi-threaded transactional inserts with Spring Data JDBC

I'm using NamedParameterJdbcTemplate. I need to insert data into 5 different tables within a transaction.
The sequential execution of the inserts takes a long time, and I need to optimize it.
One possible option is to make all the inserts parallel using threads. As far as I understand, a transaction does not propagate to multiple threads.
How can I improve the time taken for this operation within a transaction boundary?
I don't think what you are trying to do can possibly work.
As far as I know, a database transaction is always bound to a single connection.
And the JDBC connection API is blocking, i.e. you can only execute a single statement at a time. So even if you share the Spring transaction across multiple threads, you'll still execute your SQL sequentially.
I therefore see the following options, which might be combined, available to you:
Tune your database/SQL: batched inserts, disabled constraints, and adding or removing indexes might all have an effect on the execution time (see the sketch after this list).
Drop the transactional constraint.
If you can break your process into multiple independent processes, you might be able to run them in parallel and actually gain performance.
Tune/parallelise the parts happening in your Java application so you can do other work while your SQL statements are running.
To decide which approach is most promising, we'd need to know more about your actual scenario.
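
A minimal sketch of the batched-inserts option, using the NamedParameterJdbcTemplate the question mentions. The orders table, its columns, and the Order record are assumptions for illustration:

    import java.math.BigDecimal;
    import java.util.List;
    import org.springframework.jdbc.core.namedparam.MapSqlParameterSource;
    import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;
    import org.springframework.jdbc.core.namedparam.SqlParameterSource;

    public class OrderBatchDao {
        private static final String SQL =
                "INSERT INTO orders (id, customer_id, total) "
              + "VALUES (:id, :customerId, :total)";

        private final NamedParameterJdbcTemplate jdbc;

        public OrderBatchDao(NamedParameterJdbcTemplate jdbc) {
            this.jdbc = jdbc;
        }

        // Hypothetical row type for the sketch.
        public record Order(long id, long customerId, BigDecimal total) {}

        // Sends all rows as one JDBC batch: one round trip per batch
        // instead of one per row.
        public int[] insertAll(List<Order> orders) {
            SqlParameterSource[] batch = orders.stream()
                    .map(o -> new MapSqlParameterSource()
                            .addValue("id", o.id())
                            .addValue("customerId", o.customerId())
                            .addValue("total", o.total()))
                    .toArray(SqlParameterSource[]::new);
            return jdbc.batchUpdate(SQL, batch);
        }
    }

Whether the driver really collapses the batch into fewer round trips depends on the database; MySQL, for example, usually needs rewriteBatchedStatements=true on the connection URL to see the gain.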

MongoDB Spring inserting multiple documents in bulk is slow

I have a method that calls MongoTemplate.insert(List<Object>) to insert multiple objects into my Mongo database.
During performance testing, calling the method concurrently multiple times drastically increases the response times. For example, running 100 threads each bulk-inserting ~100 documents raises the response time for each thread to about 5 seconds.
This doesn't seem right. Am I missing something?

Should I COMMIT after every executeBatch()?

I have a file with 1 trillion records. The batch size is 1000, after which the batch is executed.
Should I commit after each batch? Or commit just once after all 1 trillion records have been executed in batches of 1000?
// Loop for 1 trillion records (pseudocode: iterate over the file)
while (hasNextRecord) {
    statement.addBatch();
    if (++count % 1000 == 0) {
        statement.executeBatch();
        // SHOULD I COMMIT HERE AFTER EACH BATCH ???
    }
} // End loop
// SHOULD I COMMIT HERE ONCE ONLY ????
A commit marks the end of a successful transaction. So the commit should theoretically happen after all rows have been executed successfully.
If the executed statements are completely independent, then every one should have its own commit (in theory).
But there may be limitations imposed by the database system that require splitting the rows into several batches, each with its own commit. Since a database has to reserve space to be able to roll back until changes are committed, the "cost" of a huge transaction can be very high.
So the answer is: it depends on your requirements, your database, and your environment.
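
For the commit-per-batch variant, a minimal JDBC sketch; the events table, its column, and the surrounding class are assumptions for illustration:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;
    import javax.sql.DataSource;

    public class BatchCommitLoader {
        public void load(DataSource ds, List<String> payloads) throws SQLException {
            try (Connection conn = ds.getConnection();
                 PreparedStatement ps = conn.prepareStatement(
                         "INSERT INTO events (payload) VALUES (?)")) {
                conn.setAutoCommit(false);
                int count = 0;
                for (String payload : payloads) {
                    ps.setString(1, payload);
                    ps.addBatch();
                    if (++count % 1000 == 0) {
                        ps.executeBatch();
                        conn.commit(); // bounds rollback space, but a later
                                       // failure leaves earlier batches applied
                    }
                }
                ps.executeBatch(); // flush the final partial batch
                conn.commit();
            }
        }
    }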
Mostly it depends on what you want to achieve; usually you have to compromise on something to gain something else. For example, I am deleting 3 million records that are no longer being accessed by my users, using a stored procedure.
If I execute the delete query all at once, the lock gets escalated to a table lock, and my other users start getting timeouts in our applications because SQL Server has locked the whole table (I know the question is not specific to SQL Server, but this may help debug the problem). If you have such a case, you should never go for a batch bigger than 5000 rows (see Lock Escalation Threshold).
With my current plan, I am deleting 3000 rows per batch and only key locks are taken, which is good; I commit after every half a million records processed.
So, if you do not have simultaneous users hitting the table, you can delete a huge number of records at once provided your database server has enough log space and processing speed, but 1 trillion records are a mess. You'd better proceed with a batch-wise deletion, or, if 1 trillion records are the total in the table and you want to delete all of them, then I'd suggest going for a truncate table.
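
A rough sketch of that batch-wise delete driven from JDBC, assuming SQL Server (DELETE TOP (n) is SQL Server syntax) and a made-up audit_log table and predicate:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class BatchDelete {
        public static void purge(Connection conn) throws SQLException {
            conn.setAutoCommit(false);
            long sinceCommit = 0;
            try (PreparedStatement ps = conn.prepareStatement(
                    "DELETE TOP (3000) FROM audit_log WHERE archived = 1")) {
                int deleted;
                do {
                    deleted = ps.executeUpdate();
                    sinceCommit += deleted;
                    if (sinceCommit >= 500_000) { // commit every half million rows
                        conn.commit();
                        sinceCommit = 0;
                    }
                } while (deleted > 0);
                conn.commit(); // commit whatever is left
            }
        }
    }

Keeping each chunk under 5000 rows stays below SQL Server's lock escalation threshold, so only key locks are taken rather than a table lock.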

Java JDBC design pattern: handle many inserts

I would like to ask for some advice concerning my problem.
I have a batch job that does some computation (in a multi-threaded environment) and does some inserts into a table.
I would like to do something like a batch insert, meaning that once I get a query, I wait until I have, say, 1000 queries, and then execute the batch insert (instead of doing them one by one).
I was wondering if there is any design pattern on this.
I have a solution in mind, but it's a bit complicated:
build a method that will receive the queries
add them to a list (the string and/or the statements)
do not execute until the list has 1000 items
The problem: how do I handle the end?
What I mean is: when do I execute the last 999 queries, since I'll never get to 1000?
What should I do ?
I'm thinking of a thread that wakes up every 5 minutes and checks the number of items in the list. If it wakes up twice and the number is the same, it executes the pending queries.
Does anyone have a better idea?
Your database driver needs to support batch inserting. See this.
Have you established that your system is choking on network traffic because there is too much communication between the service and the database? If not, I wouldn't worry about batching until you are sure you need it.
You mention that in your plan you want to check every 5 minutes. That's an eternity. If you are only getting 1000 items per 5 minutes, you shouldn't need batching; that's ~3 a second.
Assuming you do want to batch, have a process wake up every 2 seconds and commit whatever is queued up (sketched below). Don't wait five minutes. It might commit 0 rows, it might commit 10... who cares. With this approach, you don't need to worry that your arbitrary threshold hasn't been met.
I'm assuming the inserts come in one at a time. If your incoming data arrives n at a time, I would just commit on every incoming request, no matter how many inserts it contains. If your messages come in through some sort of messaging system, it's asynchronous anyway, so you shouldn't need to worry about batching. Under high load, the incoming messages simply wait until there is capacity to handle them.
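
A sketch of that wake-up-and-flush approach; the sink callback stands in for the real JDBC batch execution plus commit, and all names are assumptions:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.Executors;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.function.Consumer;

    public class PeriodicBatchWriter<T> {
        private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        public PeriodicBatchWriter(Consumer<List<T>> sink) {
            // Every 2 seconds, drain whatever is queued; whether that is
            // 0 rows or 1000 doesn't matter, so no threshold must be met.
            scheduler.scheduleWithFixedDelay(() -> {
                List<T> drained = new ArrayList<>();
                queue.drainTo(drained);
                if (!drained.isEmpty()) sink.accept(drained);
            }, 2, 2, TimeUnit.SECONDS);
        }

        public void submit(T row) { queue.add(row); } // called by the writers

        public void shutdown() { scheduler.shutdown(); }
    }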
Add a commit kind of method to that API that will be called to confirm that all items have been added. Also, the optimum batch size is somewhere in the range of 20-50; beyond that, the potential gain is outweighed by the bookkeeping needed for a growing number of statements. You don't mention it explicitly, but of course you must use the dedicated batch API in JDBC.
If you need to keep track of many writers, each in its own thread, then you'll also need a begin kind of method, and you can count how many times it was called compared to how many times commit was called, something like reference counting. When you reach zero, you know you can flush your statement buffer (a sketch follows).
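
A rough sketch of that reference-counted buffer; the sink callback standing in for the real JDBC batch execution and all names are assumptions:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.Consumer;

    public class RefCountedBatchBuffer<T> {
        private static final int BATCH_SIZE = 50; // upper end of the 20-50 range

        private final List<T> rows = new ArrayList<>();
        private final Consumer<List<T>> sink;
        private int activeWriters = 0;

        public RefCountedBatchBuffer(Consumer<List<T>> sink) {
            this.sink = sink;
        }

        public synchronized void begin() { activeWriters++; }

        public synchronized void add(T row) {
            rows.add(row);
            if (rows.size() >= BATCH_SIZE) flush();
        }

        public synchronized void commit() {
            if (--activeWriters == 0) flush(); // last writer flushes the tail
        }

        private void flush() {
            if (!rows.isEmpty()) {
                sink.accept(new ArrayList<>(rows)); // e.g. one JDBC batch + commit
                rows.clear();
            }
        }
    }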
This is a situation I have faced many times. According to your problem, you are building a batch of 1000 or more insert queries, repeatedly inserting into the same table.
To avoid this type of situation, you can build the insert query like this:
INSERT INTO table1 VALUES('4','India'),('5','Odisha'),('6','Bhubaneswar')
It executes only once with multiple values. So you can keep all the values in a collection (an ArrayList, for example), then build a single query like the one above and execute it once (see the sketch below).
You can also use the JDBC transaction API (commit(), rollback(), setAutoCommit(), etc.).
Hope it will help you.
All the best.
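
A hedged sketch of building that multi-row INSERT from Java, using bind parameters instead of string-concatenated literals; the table1 columns follow the example above:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    public class MultiRowInsert {
        public static void insertAll(Connection conn, List<String[]> rows)
                throws SQLException {
            if (rows.isEmpty()) return;
            // Build "INSERT INTO table1 VALUES (?, ?), (?, ?), ..."
            // with one placeholder pair per row.
            StringBuilder sql = new StringBuilder("INSERT INTO table1 VALUES ");
            for (int i = 0; i < rows.size(); i++) {
                sql.append(i == 0 ? "(?, ?)" : ", (?, ?)");
            }
            try (PreparedStatement ps = conn.prepareStatement(sql.toString())) {
                int p = 1;
                for (String[] row : rows) {
                    ps.setString(p++, row[0]); // id
                    ps.setString(p++, row[1]); // name
                }
                ps.executeUpdate(); // all rows go in a single statement
            }
        }
    }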

Java Message Driven Bean (MDB) simultaneous database update issue

I have a Java batch process which publishes messages to MQ for processing. The MDB associated with the queue processes each message. Each message contains 10 records. I need to update a database table to keep track of the records processed, successful and failed. There is only one row in the table for each batch run. So the problem is that since multiple MDB instances are trying to update it, we are facing concurrency issues. We tried row-level locking as well, but the issue persists.
I am looking for a solution where I can keep track of the counter on the Java side and then do a single update after reaching a certain threshold. Let's say 500 messages were published and each message processes 10 records. The MDB should update this counter after processing all records within its message. The counter will then spawn a thread (if the threshold is met) that updates the database.
Please let me know what options are available to me.
App server: WAS 5.6, DB2 9.1 on z/OS. Access to DB2 is through stored procedures.
Thanks!
Have you tried doing the update entirely on the DB server? For example:
UPDATE COUNT_TABLE SET COUNTER = COUNTER + 1 WHERE ...
The DB server should be able to manage concurrent update statements like this.
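
A small sketch of driving that from JDBC, widened to the asker's processed/success/failure counts; the table and column names are assumptions:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class BatchRunCounters {
        // Each MDB instance calls this once per message; the database
        // serializes the concurrent increments, so no Java-side lock is needed.
        public void record(Connection conn, String runId,
                           int processed, int succeeded, int failed)
                throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPDATE BATCH_RUN SET PROCESSED = PROCESSED + ?, "
                  + "SUCCEEDED = SUCCEEDED + ?, FAILED = FAILED + ? "
                  + "WHERE RUN_ID = ?")) {
                ps.setInt(1, processed);
                ps.setInt(2, succeeded);
                ps.setInt(3, failed);
                ps.setString(4, runId);
                ps.executeUpdate();
            }
        }
    }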
The simplest solution would be to have only a single MDB instance running; your concurrency problem then disappears.
This would be a little slower, since you would be doing DB updates 10 at a time instead of your proposed 500, but unless that is a problem, I would just keep it simple.
