I have a requirement like this.
protected Integer[] updateFullTable(final Class<?> clazz) {
    // First query: collect the ids of every row matching the criteria.
    final ProjectionList projectionList = Projections.projectionList()
            .add(Projections.property("id"), "id");
    final Criteria criteria = session.createCriteria(clazz)
            .add(Restrictions.eq("typeOfOperation", 1))
            .add(Restrictions.eq("performUpdate", true));
    criteria.setProjection(projectionList);
    final List idsList = criteria.list();
    final Integer[] ids = transformObjectArrayIntoIntegerArray(idsList);
    // Second query: update the rows with those ids.
    final Query query = session.createQuery(
            "update " + clazz.getName()
            + " set activeRegister = true, updateTime = :updateTime"
            + " where id in (:ids)")
            .setParameter("updateTime", new Date())
            .setParameterList("ids", ids);
    query.executeUpdate();
    return ids;
}
As you can see, I need to update all rows in a table. I first query all the row ids and later apply the update to those ids in a separate query, but the tables have a lot of records, so this sometimes takes between 30 seconds and 10 minutes, depending on the table.
I have changed this code to a single update, like this:
final Query query = session.createQuery("update " + clazz.getName()
        + " set activeRegister = true, updateTime = :updateTime"
        + " where typeOfOperation = 1 and performUpdate = true");
With that single query I avoid the first select, but I can no longer return the affected ids. Later the requirement was changed: a
final StringBuilder logRevert;
parameter was added,
which needs to store the updated ids so that a direct reverse update can be applied to the DB if required.
But with my single update I can no longer get the ids. My question is: how can I get or return the affected ids, using a stored procedure or some workaround in the DB or Hibernate? In other words, how can I keep the first behaviour with only one query, or with enhanced code?
Any tips?
I have tried:
Using Criteria
Using HQL
Using a named query
Using SQLQuery
Not using a transformer, returning a raw Object[]
But the times are still somewhat high.
I want something like
query.executeUpdate(); // RETURNS THE COUNT OF THE AFFECTED ROWS
But I need the affected ids.
Sorry if the question is simple.
UPDATE
With @dmitry-senkovich's approach I could do it using raw SQL, but not with Hibernate; a separate question was asked here:
https://stackoverflow.com/questions/44641851/java-hibernate-org-hibernate-exception-sqlgrammarexception-could-not-extract-re
What about the following solution?
SET @ids = NULL;
UPDATE SOME_TABLE
SET activeRegister = true, updateTime = :updateTime
WHERE typeOfOperation = 1 AND performUpdate = true
AND (SELECT @ids := CONCAT_WS(',', id, @ids));
SELECT @ids;
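If the raw SQL route is acceptable, it can be driven from Hibernate through the JDBC connection the session already holds; the @ids user variable is scoped to the connection, so all three statements must run on the same one. A minimal sketch, assuming a MySQL dialect and using NOW() in place of the :updateTime parameter (SOME_TABLE is a placeholder for the mapped table):

session.doWork(connection -> {
    try (java.sql.Statement st = connection.createStatement()) {
        st.execute("SET @ids = NULL");
        st.executeUpdate(
            "UPDATE SOME_TABLE"
            + " SET activeRegister = true, updateTime = NOW()"
            + " WHERE typeOfOperation = 1 AND performUpdate = true"
            + " AND (SELECT @ids := CONCAT_WS(',', id, @ids))");
        try (java.sql.ResultSet rs = st.executeQuery("SELECT @ids")) {
            if (rs.next()) {
                final String affectedIds = rs.getString(1); // e.g. "42,17,3", or null if no rows matched
                // parse and store in logRevert as needed
            }
        }
    }
});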
If updateTime is a datetime, you can select all the affected record ids afterwards with a select:
Date updateTime = new Date(); // the same time passed to the update
select id from clazz.getName() where updateTime = :updateTime and activeRegister = true and typeOfOperation = 1 and performUpdate = true
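Put together in Hibernate, that could look like the sketch below, assuming updateTime is stored with enough precision that no unrelated write can carry exactly the same value:

final Date updateTime = new Date();
// One bulk update, stamped with a single fixed timestamp.
session.createQuery("update " + clazz.getName()
        + " set activeRegister = true, updateTime = :updateTime"
        + " where typeOfOperation = 1 and performUpdate = true")
       .setParameter("updateTime", updateTime)
       .executeUpdate();
// Read back the ids of exactly the rows stamped above.
final List<Integer> affectedIds = session.createQuery(
        "select id from " + clazz.getName()
        + " where updateTime = :updateTime and activeRegister = true"
        + " and typeOfOperation = 1 and performUpdate = true")
       .setParameter("updateTime", updateTime)
       .list();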
Updating a large number of rows in a table is a slow operation. This is due to needing to capture the 'old' value of each row in case of a ROLLBACK (due to an explicit ROLLBACK, failure of the UPDATE, failure of a subsequent query in the same transaction, or a power failure before the UPDATE finishes).
The usual fix is to rethink the application design that necessitated the large UPDATE.
On the other hand, there is a possible fix to the schema. Please provide SHOW CREATE TABLE so I don't have to do as much 'hand waving' in the following paragraph...
It might be better to move the column(s) that need to be updated into a separate, parallel table ("vertical partitioning"); see the sketch after this list. This might be beneficial if:
The original table has lots of wide columns (TEXT, BLOB, etc) -- by not having to make bulky copies.
The original table is being updated simultaneously -- by the updates not blocking each other.
There are SELECTs hitting the non-updated columns -- by avoiding certain other blockings.
You can still get the original set of columns -- by JOINing the two tables together.
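A minimal sketch of what that split could look like, with hypothetical table and column names since the real schema wasn't posted:

// The wide, rarely-changing columns stay in main_wide; the frequently
// rewritten flags live in a narrow parallel table keyed by the same id.
try (java.sql.Statement st = connection.createStatement()) {
    st.execute("CREATE TABLE main_flags ("
            + " id INT PRIMARY KEY,"
            + " activeRegister BOOLEAN,"
            + " updateTime DATETIME)");
    // Bulk updates now rewrite only these narrow rows:
    st.executeUpdate("UPDATE main_flags SET activeRegister = true,"
            + " updateTime = NOW() WHERE id IN"
            + " (SELECT id FROM main_wide"
            + "  WHERE typeOfOperation = 1 AND performUpdate = true)");
    // The original shape comes back with a JOIN:
    java.sql.ResultSet rs = st.executeQuery(
            "SELECT w.*, f.activeRegister, f.updateTime"
            + " FROM main_wide w JOIN main_flags f ON f.id = w.id");
}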
Related
So I'm a bit lost and don't really know how to handle this one...
Consider that I have 2 DB tables in Talend. First,
a table invoices_only, which has as fields the invoiceNummer and the authors, like this
Then, a table invoices_table with the fields (invoiceNummer, article, quantity and price), where for one invoice I can have many articles, for example
Through a tMap I want to obtain a table invoice_table_result with new columns: one for the article position and one for the total price. For the position I know that I can use something like the Numeric.sequence("s1",1,1) function, but I don't know how to restart my counter when a new invoiceNummer is found; of course, for the total price it is just a basic multiplication.
So my result should be something like this
Here is a draft of my Talend job; I'm doing a lookup on the invoiceNummer between the tables invoices_only and invoices.
Any advice? Thanks.
A trick I use is to do the sequence like this:
Numeric.sequence("s" + row.InvoiceNummer, 1, 1)
This way, the sequence gets incremented while you're still on the same InvoiceNummer, and a new one is started whenever a new InvoiceNummer is found.
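Conceptually this is just one counter per distinct key. A plain-Java illustration of the idea (not the Talend implementation):

import java.util.HashMap;
import java.util.Map;

// One counter per key, starting at 1 and incremented on each call,
// which is what naming the sequence after the InvoiceNummer achieves.
class PerKeySequence {
    private final Map<String, Integer> counters = new HashMap<>();

    int next(String invoiceNummer) {
        return counters.merge(invoiceNummer, 1, Integer::sum);
    }
}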
There are two ways to achieve it:
tJavaFlex
Sql
tJavaFlex
You can compare the current row with the previous row and reset the sequence value using the function below. The condition is a sketch; previousInvoiceNummer is a variable you maintain yourself between rows:
if (!row.InvoiceNummer.equals(previousInvoiceNummer)) {
    Numeric.resetSequence(seqName, startValue);
}
Sql
Once the data is loaded into the tables, create a post-job that uses an update query to update the records: select the records, take the rank of the values, and perform the update on top of that select.
select invoicenumber, row_number() over (partition by invoicenumber order by invoicenumber) from table_name where -- conditions, if any
Update statements vary with respect to the database; please mention which database you are using so that I can provide the update query.
I would recommend achieving this through SQL.
I need suggestions about a piece of logic. There is an update query in the application, like the one below:
UPDATE TABLE
SET FLAG = CASE
WHEN FLAG = 'IP' THEN 'P'
WHEN FLAG = 'IH' THEN 'H'
WHEN FLAG = 'IM' THEN 'M'
END
WHERE ADJUSTMENT_ID IN (SELECT Query )
This update is executed from a Java function which returns void.
Now I have a requirement to also get details of the updated records (a few columns from table TABLE) and return a LIST from the function instead of void.
Running a SELECT first and then updating the records in a loop is not an option for performance reasons; the records are updated with a single UPDATE statement because it's supposed to run faster.
What would be options for me keeping comparable performance? Should I go with a stored procedure?
SELECT ... FROM FINAL TABLE (UPDATE ....)
will do the job. As it is a single SQL statement, the performance will also be good.
See also
http://www.idug.org/p/bl/et/blogid=278&blogaid=422
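From Java, a sketch of how that could be wired up with plain JDBC (DB2 syntax; TABLE and the IN-subquery are the placeholders from the question):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

static List<Object[]> updateAndFetch(Connection connection) throws SQLException {
    final String sql =
        "SELECT ADJUSTMENT_ID, FLAG FROM FINAL TABLE ("
        + " UPDATE TABLE SET FLAG = CASE"
        + "  WHEN FLAG = 'IP' THEN 'P'"
        + "  WHEN FLAG = 'IH' THEN 'H'"
        + "  WHEN FLAG = 'IM' THEN 'M'"
        + " END"
        + " WHERE ADJUSTMENT_ID IN (SELECT ...)" // keep the original subquery here
        + ")";
    final List<Object[]> updated = new ArrayList<>();
    try (PreparedStatement ps = connection.prepareStatement(sql);
         ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            // each row reflects the post-update values
            updated.add(new Object[] { rs.getLong(1), rs.getString(2) });
        }
    }
    return updated;
}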
BoundStatement UpdateTable = new BoundStatement(preparedStatement);
UpdateTable.bind(productId, productname, time);
session.execute(UpdateTable);
I am using the following commands to update Cassandra tables. Sometimes it updates and sometimes it doesn't.
UPDATE product SET count = count + 1 where productId = ? AND productname = ? AND time = ?;
It never throws an error.
Why is this ?
EDIT
Table structure
CREATE TABLE IF NOT EXISTS product (productId int, productname text, time timestamp, count counter, PRIMARY KEY (productid, productname, time));
By looking at your (Java?) code, I can't really tell what kind of object insertUpdateTable is. But the bind method should return a BoundStatement object that can be executed. And while UpdateTable is indeed a BoundStatement, I don't see that you're actually binding your variables to it.
Based on the limited amount of code shown, I see two solutions here:
Call the bind method on UpdateTable inside your session.execute:
session.execute(UpdateTable.bind(productId, productname, time));
Wrap your insertUpdateTable.bind inside a session.execute:
session.execute(insertUpdateTable.bind(productId, productname, time));
Check out the DataStax documentation on Using Bound Statements with the Java driver for more information.
Sometimes it updates and sometimes it doesn't.
If you had posted your Cassandra table definition, it might shed some more light on this. But it is important to remember that Cassandra PRIMARY KEYs are unique, and that INSERTs and UPDATEs are essentially the same (an INSERT can "update" existing values and an UPDATE can "insert" new values). Sometimes an UPDATE may appear to not work, when it may be performing a write with the same key values. Just something to look out for.
Also important to note, is that UPDATE product SET count = count + 1 will only work under two conditions:
count is a counter column.
product is a counter table, consisting of only keys and counter columns (all non-counter columns must be a part of the PRIMARY KEY).
Worth noting is that counter columns underwent a big change/improvement with Cassandra 2.1. If you need to use counters and are still on Cassandra 2.0, it may be worth upgrading.
You said the update sometimes works, but note that if you ever delete a row with a counter, you'll be unable to modify that row again without dropping the table and recreating it. The update will appear to fail silently. For more, see CASSANDRA-8491.
I had a similar issue during high-frequency writes and updates.
As the number of concurrent requests goes up, there is a good chance that the latest bind will overwrite the previously bound params. So instead of reusing a single BoundStatement, use preparedStatement.bind() inside session.execute.
Can you try the following?
Instead of using:
UpdateTable.bind(productId, productname, time);
session.execute(UpdateTable);
Use:
session.execute(preparedStatement.bind(productId, productname, time));
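For completeness, the full pattern looks roughly like this (a sketch; session and the bound values are the ones from the question):

// Prepare once per application; prepared statements are safe to share.
PreparedStatement preparedStatement = session.prepare(
        "UPDATE product SET count = count + 1"
        + " WHERE productId = ? AND productname = ? AND time = ?");

// Per request: bind() returns a fresh BoundStatement each time,
// so concurrent callers never overwrite each other's parameters.
session.execute(preparedStatement.bind(productId, productname, time));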
I had to clean up a database (a few tables with a given condition, where the columns for the conditions are always the same), e.g.
delete from table1 where date < given_date1 and id = given_id
delete from table2 where date < given_date2 and id = given_id
where the given_id and given_date relation varies both from table to table and from id to id.
The actual delete condition is not always where date < given_date; I just wrote that as an example. Say one id has 300 days of data and another 500 days of data: the where condition is allowed to delete the oldest 10 days of data, where 10 is a variable based on user input. So in one iteration all nodes are processed, deleting the oldest 10 days of data, and thus the query changes for each id, but the fact is that it always runs on the same set of tables.
Earlier that script was written as a SQL script and did its job, but it was taking time. Now I have implemented a multithreaded Java application, where the new code looks like:
for (int i = 0; i < idCount; i++) {
    // launch a new thread and against that thread call
    delete(date, currentId);
}

void delete(Date date, int id) {
    // delete from table1 where date < given_date and id = given_id
    // delete from table2 where date < given_date and id = given_id
}
After implementing this I found deadlocks on the SQL tables, which were solved by indexing the tables, but it's still not as fast as it is supposed to be. If I have 500 threads, they are all launched one after the other and obviously run on the same set of tables; is SQL not actually executing in parallel on each table?
When I monitor my java.exe and sqlserver.exe, they are not busy at all, though I expect them to be.
Could anyone tell me what the best approach would be to implement a multithreaded delete on the same set of tables, so that I can bump up the thread count, do the deletion in parallel, and consume the available resources?
If all the actions are deletes for given ids, then I would just do one delete on each table covering all the ids at once,
e.g.
delete from table1 where date < given_date and id in (given_id1, given_id2 ..... )
If there are lots of given_ids, first insert them into a temporary table, then execute each delete by joining the table to be deleted from with the temporary table; a sketch follows below.
Also, if trying to use multiple threads, an improvement is really only expected if each thread acts on a different table, so that there is no contention in the database.
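A rough sketch of the temporary-table variant with JDBC (T-SQL flavour, since the question mentions sqlserver.exe; table and column names are illustrative, and a single cutoff date is used for brevity):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

static void deleteOldRows(Connection connection, int[] givenIds,
                          java.sql.Date givenDate) throws SQLException {
    // Temp table is scoped to this connection.
    try (Statement st = connection.createStatement()) {
        st.execute("CREATE TABLE #ids (id INT PRIMARY KEY)");
    }
    // Batch-insert all the ids once.
    try (PreparedStatement ins =
             connection.prepareStatement("INSERT INTO #ids (id) VALUES (?)")) {
        for (int id : givenIds) {
            ins.setInt(1, id);
            ins.addBatch();
        }
        ins.executeBatch();
    }
    // One set-based delete per table, joined against the temp table.
    for (String table : new String[] { "table1", "table2" }) {
        try (PreparedStatement del = connection.prepareStatement(
                "DELETE t FROM " + table + " t"
                + " JOIN #ids i ON t.id = i.id WHERE t.date < ?")) {
            del.setDate(1, givenDate);
            del.executeUpdate();
        }
    }
}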
Ignoring the problem you created...
Why not use the IN statement?
delete from table1 where date < given_date and id IN (id1, id2, id3, ...)
Update based on clarification:
Based on the explanation in the comments, my guess is that you don't have good indexes and every delete statement is resulting in a table scan. Each table scan locks the table and thus the database can only process one statement at a time. Index the date and id columns along with any other column used in the where clause of your delete statement.
In my personal experience, I make a class to manage my queries and the communication with the database. I use a thread pool to manage my threads and simply have the threads make calls to my static database manager. The manager should have a synchronized method that acquires a lock on the database connection. The threads will then be able to access the database and their actions won't conflict with each other.
If you don't care about making all the commands one transaction unit, then put each delete in its own (small) transaction.
I have a webservice in Java that receives a list of information to be inserted or updated in a database. I don't know which entries are inserts and which are updates.
Which is the best approach to obtain better performance:
Iterate over the list (an object list, with the table PK in it) and try to insert each entry into the database; if the insert fails, run an update.
Try to load the entry from the database; if a result is retrieved, update it, if not, insert the entry.
Another option? Tell me about it :)
In the first calls, I believe most of the entries will be new DB entries, but there will be a saturation point after which most of the entries will be updates.
I'm talking about a DB table that could reach over 100 million entries in a mature form.
What will be your approach? Performance is my most important goal.
If your database supports MERGE, I would have thought that was most efficient (and treats all the data as a single set).
See:
http://www.oracle.com/technology/products/oracle9i/daily/Aug24.html
https://web.archive.org/web/1/http://blogs.techrepublic%2ecom%2ecom/datacenter/?p=194
If performance is your goal, then first get rid of the word iterate from your vocabulary! Learn to do things in sets.
If you need to update or insert, always do the update first; otherwise it is easy to find yourself updating the record you just inserted by accident. If you are doing this, it helps to have an identifier you can look at to see if the record exists: if the identifier exists, do the update, otherwise do the insert.
The important thing is to understand the balance, or ratio, between the number of inserts and the number of updates in the list you receive. IMHO you should implement an abstract strategy that says "persist this in the database". Then create concrete strategies that (for example):
Check for the primary key; if zero records are found, do the insert, else update.
Do the update and, if it fails, do the insert.
Others.
Then pull the strategy to use (the fully qualified class name, for example) from a configuration file. This way you can switch from one strategy to another easily. If it is feasible (it could be, depending on your domain), you can add a heuristic that selects the best strategy based on the input entities in the set.
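A minimal sketch of that idea (Entry, Dao and the strategy names are hypothetical, not from the question):

import java.util.List;

// The abstract strategy: "persist this in the database" somehow.
interface PersistStrategy {
    void persist(List<Entry> entries, Dao dao); // Entry/Dao are hypothetical types
}

// Concrete strategy: try the update first, fall back to insert.
class UpdateFirstStrategy implements PersistStrategy {
    @Override
    public void persist(List<Entry> entries, Dao dao) {
        for (Entry e : entries) {
            if (dao.update(e) == 0) { // zero rows updated -> row is new
                dao.insert(e);
            }
        }
    }
}

The concrete class can then be instantiated from configuration, e.g. via Class.forName(fullyQualifiedName).getDeclaredConstructor().newInstance(), so swapping strategies needs no code change.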
MySQL supports this:
INSERT INTO foo
SET bar='baz', howmanybars=1
ON DUPLICATE KEY UPDATE howmanybars=howmanybars+1
Option 2 is not going to be the most efficient. The database will already be making this check for you when you do the actual insert or update in order to enforce the primary key. By making this check yourself you are incurring the overhead of a table lookup twice as well as an extra round trip from your Java code. Choose which case is the most likely and code optimistically.
Expanding on option 1, you can use a stored procedure to handle the insert/update. This example with PostgreSQL syntax assumes the insert is the normal case.
CREATE FUNCTION insert_or_update(_id INTEGER, _col1 INTEGER) RETURNS void
AS $$
BEGIN
    INSERT INTO
        my_table (id, col1)
    SELECT
        _id, _col1;
EXCEPTION WHEN unique_violation THEN
    UPDATE
        my_table
    SET
        col1 = _col1
    WHERE
        id = _id;
END;
$$
LANGUAGE plpgsql;
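Calling it from Java is then a single round trip per row (a sketch; a void-returning PostgreSQL function can be invoked via SELECT):

try (PreparedStatement ps =
         connection.prepareStatement("SELECT insert_or_update(?, ?)")) {
    ps.setInt(1, id);
    ps.setInt(2, col1);
    ps.execute();
}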
You could also make the update the normal case and then check the number of rows affected by the update statement to determine if the row is actually new and you need to do an insert.
As alluded to in some other answers, the most efficient way to handle this operation is in one batch:
Take all of the rows passed to the web service and bulk insert them into a temporary table
Update rows in the master table from the temp table
Insert new rows in the master table from the temp table
Dispose of the temp table
The type of temporary table to use and most efficient way to manage it will depend on the database you are using.
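As an illustration only, here is roughly what those steps could look like with JDBC and PostgreSQL-flavoured SQL (staging, my_table and the Row type are hypothetical names; ON COMMIT DROP assumes auto-commit is off):

try (java.sql.Statement st = connection.createStatement()) {
    // 1. temporary table, disposed of automatically at commit
    st.execute("CREATE TEMP TABLE staging (id INT PRIMARY KEY, col1 INT)"
            + " ON COMMIT DROP");
}
try (java.sql.PreparedStatement ins = connection.prepareStatement(
        "INSERT INTO staging (id, col1) VALUES (?, ?)")) {
    for (Row r : rows) { // bulk insert the web service payload
        ins.setInt(1, r.id);
        ins.setInt(2, r.col1);
        ins.addBatch();
    }
    ins.executeBatch();
}
try (java.sql.Statement st = connection.createStatement()) {
    // 2. update existing rows from the temp table
    st.executeUpdate("UPDATE my_table m SET col1 = s.col1"
            + " FROM staging s WHERE m.id = s.id");
    // 3. insert the genuinely new rows
    st.executeUpdate("INSERT INTO my_table (id, col1)"
            + " SELECT s.id, s.col1 FROM staging s WHERE NOT EXISTS"
            + " (SELECT 1 FROM my_table m WHERE m.id = s.id)");
}
connection.commit(); // 4. the commit also drops the temp table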