I have a table with a unique constraint on one of its fields. I need to insert a large number of records into this table. To make it faster I'm using JDBC batch updates (driver version 8.3-603).
Is there a way to do the following on every batch execute:
write into the table all the records from the batch that don't violate the unique index;
receive the records from the batch that were not inserted into the DB, so I can save the "wrong" records?
The most efficient way of doing this would be something like this:
create a staging table with the same structure as the target table but without the unique constraint
batch insert all rows into that staging table. The most efficient way is to use COPY via the CopyManager (although I don't know whether that is already supported in your ancient driver version).
Once that is done you copy the valid rows into the target table:
insert into target_table (id, col_1, col_2)
select id, col_1, col_2
from staging_table
where not exists (select *
                  from target_table
                  where target_table.id = staging_table.id);
Note that the above is not concurrency safe! If other processes do the same thing you might still get unique key violations. To prevent that you need to lock the target table.
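If the CopyManager route is available, loading the staging table from Java could look roughly like the sketch below. This is only a sketch: the connection URL, the CSV input, and the staging_table columns are assumptions, and the cast to PGConnection requires a plain (non-pooled) pgjdbc connection.
import java.io.Reader;
import java.io.StringReader;
import java.sql.Connection;
import java.sql.DriverManager;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class StagingLoad {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "secret")) {
            // COPY reads the rows in CSV form from STDIN
            CopyManager copy = ((PGConnection) conn).getCopyAPI();
            Reader rows = new StringReader("1,foo,bar\n2,baz,qux\n");
            long loaded = copy.copyIn(
                "COPY staging_table (id, col_1, col_2) FROM STDIN WITH CSV", rows);
            System.out.println(loaded + " rows loaded into staging_table");
        }
    }
}
If CopyManager turns out not to be available in that driver version, a plain batched INSERT into the (constraint-free) staging table still works, just more slowly.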
If you want to remove the copied rows, you could do that using a writeable CTE:
with inserted as (
  insert into target_table (id, col_1, col_2)
  select id, col_1, col_2
  from staging_table
  where not exists (select *
                    from target_table
                    where target_table.id = staging_table.id)
  returning id
)
delete from staging_table
where id in (select id from inserted);
A (non-unique) index on staging_table.id should help performance.
I'm trying to write a Java SQL query. The simplified table would be table(name, version) with a unique constraint on (name, version).
I'm trying to insert a row into my database with a conditional statement, meaning that when an entry with the same name exists, it should insert the row with the same name and the version increased by 1.
I have tried with the following:
INSERT INTO table(name,version)
VALUES(?, CASE WHEN EXISTS(SELECT name from table where name=?)
THEN (SELECT MAX(version) FROM table WHERE name = ?) +1
ELSE 1 END)
The values are sent by the user.
My question is, how can I access the 'name' inside the values so I could compare them?
If you want to write this as a single query:
INSERT INTO table (name, version)
SELECT ?, COALESCE(MAX(t2.version) + 1, 1)
FROM table t2
WHERE t2.name = ?;
That said, this is dangerous. Two threads could execute this query "at the same time" and possibly create the same version number. You can prevent this from happening by adding a unique index/constraint on (name, version).
With the unique index/constraint, one of the updates will fail if there is a conflict.
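In JDBC this could look roughly like the sketch below. It is only a sketch: the table name mytable and the three-attempt retry loop are assumptions; the retry handles the case where the unique constraint rejects a duplicate version produced by a concurrent insert.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class VersionedInsert {

    private static final String INSERT_SQL =
        "INSERT INTO mytable (name, version) " +
        "SELECT ?, COALESCE(MAX(t2.version) + 1, 1) FROM mytable t2 WHERE t2.name = ?";

    /** Inserts the next version for the given name, retrying if a concurrent
     *  insert grabbed the same version number first. */
    public static void insertNextVersion(Connection conn, String name) throws SQLException {
        for (int attempt = 0; attempt < 3; attempt++) {
            try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
                ps.setString(1, name);
                ps.setString(2, name);
                ps.executeUpdate();
                return; // success
            } catch (SQLException e) {
                // SQLState 23xxx = integrity/unique constraint violation; retry, else rethrow
                if (e.getSQLState() == null || !e.getSQLState().startsWith("23")) {
                    throw e;
                }
            }
        }
        throw new SQLException("Could not insert a new version for " + name);
    }
}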
I see at least two approaches:
1. For each pair of name and version you first query the max version:
SELECT MAX(VERSION) as MAX FROM <table> WHERE NAME = <name>
And then you insert the result + 1 with a corresponding insert query:
INSERT INTO <table>(NAME,VERSION) VALUES (<name>,result+1)
This approach is straightforward and easy to read and implement; however, it is not very performant because of the number of queries required (see the JDBC sketch after this list).
2. You can achieve this with SQL alone using analytic (window) functions, e.g.:
SELECT NAME, ROW_NUMBER() OVER (PARTITION BY NAME ORDER BY NAME) AS VERSION FROM <table>
You can then save the result of this query as a table using CREATE TABLE as SELECT...
(The assumption here is that the first version is 1; if that is not the case, the query can be slightly reworked.) This solution would be very performant even for large datasets.
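A minimal JDBC sketch of approach 1 above (table and column names are placeholders; the two statements are kept in one transaction, and the unique constraint on (name, version) still catches any concurrent duplicate):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class TwoStepVersionInsert {

    /** Approach 1: read MAX(version) for the name, then insert max + 1. */
    public static void insert(Connection conn, String name) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try {
            int nextVersion = 1;
            try (PreparedStatement select = conn.prepareStatement(
                    "SELECT MAX(version) FROM mytable WHERE name = ?")) {
                select.setString(1, name);
                try (ResultSet rs = select.executeQuery()) {
                    if (rs.next() && rs.getObject(1) != null) {
                        nextVersion = rs.getInt(1) + 1;
                    }
                }
            }
            try (PreparedStatement insert = conn.prepareStatement(
                    "INSERT INTO mytable (name, version) VALUES (?, ?)")) {
                insert.setString(1, name);
                insert.setInt(2, nextVersion);
                insert.executeUpdate();
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}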
You should fetch the name before the insert. In your case, if something goes wrong, how would you know about it? So get the name before the insert query.
I'm not sure, but you could try something like this:
declare version int;
if exists (SELECT name FROM table WHERE name = ?) then
    version = (SELECT MAX(version) FROM table WHERE name = ?) + 1;
else
    version = 1;
end if
Regards.
This is actually a bad plan: you might be changing the user's specified data. That is likely not what is desired; maybe they are not trying to create a new version but are simply unaware that the version they want already exists. But you can create a function, which your Java code calls, that inserts either the requested version or max+1 if the requested version already exists. Moreover, it returns the actual values inserted.
-- create table
create table nv( name text
, version integer
, constraint nv_uk unique (name, version)
);
-- function to create version or 1+max if requested exists
create or replace function new_version
( name_in text
, version_in integer
)
returns record
language plpgsql strict
as $$
declare
violated_constraint text;
return_name_version record;
begin
insert into nv(name,version)
values (name_in,version_in)
returning name, version into return_name_version;
return return_name_version;
exception
when unique_violation
then
GET STACKED DIAGNOSTICS violated_constraint = CONSTRAINT_NAME;
if violated_constraint like '%nv\_uk%'
then
insert into nv(name,version)
select name_in, 1+max(version)
from nv
where name = name_in
group by name_in
returning name, version into return_name_version;
return return_name_version;
end if;
end;
$$;
-- create some data
insert into nv(name,version)
select 'n1', gn
from generate_series( 1,3) gn ;
-- test: a new name, then an existing (name, version)
select new_version('n2',1);
select new_version('n1',1);
select *
from nv
order by name, version;
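From Java, the function could be invoked with a plain query. Since it returns an anonymous record, the simplest option is to read the result as a single column; this is only a sketch and assumes the function above has been created in the connected database.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class NewVersionCall {

    /** Calls new_version(name, version) and returns the inserted (name, version)
     *  pair in the textual form of the returned record, e.g. "(n1,4)". */
    public static String newVersion(Connection conn, String name, int version) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement("SELECT new_version(?, ?)")) {
            ps.setString(1, name);
            ps.setInt(2, version);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getString(1);
            }
        }
    }
}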
I have an update query which I am trying to execute through the batchUpdate method of Spring's JdbcTemplate. This update query can potentially match thousands of rows in the EVENT_DYNAMIC_ATTRIBUTE table that need to be updated. Will updating thousands of rows in a table cause any issue in the production database apart from a timeout? For example, will it crash the database or slow down the entire database engine for other connections, etc.?
Is there a better way to achieve this than firing a single update query through the Spring JDBC template or JPA? I have the following settings for the JDBC template:
this.jdbc = new JdbcTemplate(ds);
jdbc.setFetchSize(1000);
jdbc.setQueryTimeout(0); // zero means there is no limit
The update query:
UPDATE EVENT_DYNAMIC_ATTRIBUTE eda
SET eda.ATTRIBUTE_VALUE = 'claim',
eda.LAST_UPDATED_DATE = SYSDATE,
eda.LAST_UPDATED_BY = 'superUsers'
WHERE eda.DYNAMIC_ATTRIBUTE_NAME_ID = 4002
AND eda.EVENT_ID IN
(WITH category_data
AS ( SELECT c.CATEGORY_ID
FROM CATEGORY c
START WITH CATEGORY_ID = 495984
CONNECT BY PARENT_ID = PRIOR CATEGORY_ID)
SELECT event_id
FROM event e
WHERE EXISTS
(SELECT 't'
FROM category_data cd
WHERE cd.CATEGORY_ID = e.PRIMARY_CATEGORY_ID))
If it is a one-time thing, I normally first select the records that need to be updated into a temporary table or a CSV, making sure to save the primary keys of those records. Then I read the records in batches from the temporary table or CSV and update the main table using the primary key. This way the table is not locked for a long time, you control how many records go into each batch, and the updates are done by primary key, so they are very fast. And if any update fails, you know which records failed by logging the failed records' primary keys to a log file or an error table. I have followed this approach many times for updating millions of records in a PROD database, as it is a very safe approach.
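A sketch of what that could look like with Spring's JdbcTemplate, assuming the matching primary keys of EVENT_DYNAMIC_ATTRIBUTE were collected up front (the key column name EVENT_DYNAMIC_ATTRIBUTE_ID is an assumption):
import java.util.ArrayList;
import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;

public class ChunkedAttributeUpdate {

    private static final int CHUNK_SIZE = 1000;

    /** Updates EVENT_DYNAMIC_ATTRIBUTE in chunks keyed by primary key,
     *  so no single statement touches thousands of rows at once. */
    public static void update(JdbcTemplate jdbc, List<Long> attributeIds) {
        String sql = "UPDATE EVENT_DYNAMIC_ATTRIBUTE "
                   + "SET ATTRIBUTE_VALUE = 'claim', "
                   + "    LAST_UPDATED_DATE = SYSDATE, "
                   + "    LAST_UPDATED_BY = 'superUsers' "
                   + "WHERE EVENT_DYNAMIC_ATTRIBUTE_ID = ?";

        for (int from = 0; from < attributeIds.size(); from += CHUNK_SIZE) {
            int to = Math.min(from + CHUNK_SIZE, attributeIds.size());
            List<Object[]> args = new ArrayList<>();
            for (Long id : attributeIds.subList(from, to)) {
                args.add(new Object[] { id });
            }
            int[] counts = jdbc.batchUpdate(sql, args);
            // counts[i] is the number of rows updated by the i-th statement;
            // ids with a count of 0 can be logged to an error table or file.
        }
    }
}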
I want to know how to find the primary keys of deleted records through a JDBC connection.
If my query is like the following, then what will be the primary key of the deleted records?
String sqlDelete = "DELETE FROM devicesequences WHERE deviceId = 20";
What is the primary key of the deleted record?
You need a second query. And of course you need to execute it before running the DELETE query.
SELECT id FROM devicesequences WHERE deviceId = 20;
assuming that id is the name of your primary key column.
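Put together, that could look roughly like the sketch below (the numeric deviceId type is an assumption; to be exact, both statements should run in one transaction so no rows change between the select and the delete):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class DeleteWithKeys {

    /** Collects the primary keys first, then deletes, so the deleted ids are known. */
    public static List<Long> deleteDevice(Connection conn, long deviceId) throws SQLException {
        List<Long> deletedIds = new ArrayList<>();
        try (PreparedStatement select = conn.prepareStatement(
                "SELECT id FROM devicesequences WHERE deviceId = ?")) {
            select.setLong(1, deviceId);
            try (ResultSet rs = select.executeQuery()) {
                while (rs.next()) {
                    deletedIds.add(rs.getLong("id"));
                }
            }
        }
        try (PreparedStatement delete = conn.prepareStatement(
                "DELETE FROM devicesequences WHERE deviceId = ?")) {
            delete.setLong(1, deviceId);
            delete.executeUpdate();
        }
        return deletedIds;
    }
}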
Since you can delete any arbitrary row (or set of rows) at any time: no, there's no way to tell WHICH row (or rows) you most recently deleted.
You can, however, create a "trigger" to save this information for you.
http://dev.mysql.com/doc/refman/5.0/en/triggers.html
Using an Oracle DB, I need to select all the IDs from a table where a condition exists, then delete the rows from multiple tables where that ID exists. The pseudocode would be something like:
SELECT ID FROM TABLE1 WHERE AGE > ?
DELETE FROM TABLE1 WHERE ID = <all IDs received from SELECT>
DELETE FROM TABLE2 WHERE ID = <all IDs received from SELECT>
DELETE FROM TABLE3 WHERE ID = <all IDs received from SELECT>
What is the best and most efficient way to do this?
I was thinking something like the following, but wanted to know if there was a better way.
PreparedStatement selectStmt = conn.prepareStatement("SELECT ID FROM TABLE1 WHERE AGE > ?");
selectStmt.setInt(1, age);
ResultSet rs = selectStmt.executeQuery();
PreparedStatement delStmt1 = conn.prepareStatement("DELETE FROM TABLE1 WHERE ID = ?");
PreparedStatement delStmt2 = conn.prepareStatement("DELETE FROM TABLE2 WHERE ID = ?");
PreparedStatement delStmt3 = conn.prepareStatement("DELETE FROM TABLE3 WHERE ID = ?");
while(rs.next())
{
String id = rs.getString("ID");
delStmt1.setString(1, id);
delStmt1.addBatch();
delStmt2.setString(1, id);
delStmt2.addBatch();
delStmt3.setString(1, id);
delStmt3.addBatch();
}
delStmt1.executeBatch();
delStmt2.executeBatch();
delStmt3.executeBatch();
Is there a better/more efficient way?
You could do it with one DELETE statement if two of your 3 tables (for example "table2" and "table3") are child tables of the parent table (for example "table1") that have a "ON DELETE CASCADE" option.
This means that the two child tables have a column (example column "id" of "table2" and "table3") that has a foreign key constraint with "ON DELETE CASCADE" option that references the primary key column of the parent table (example column "id" of "table1"). This way only deleting from the parent table would automatically delete associated rows in the child tables.
Check this out in more detail: http://www.techonthenet.com/oracle/foreign_keys/foreign_delete.php
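As a sketch of the one-time setup (the constraint name fk_table2_table1 is mine; the id columns follow the example above, and this is DDL you would normally run once, not from application code):
import java.sql.Connection;
import java.sql.Statement;

public class CascadeSetup {
    /** One-time DDL: make table2.id reference table1.id with ON DELETE CASCADE,
     *  so deleting a parent row also removes its child rows. */
    public static void addCascade(Connection conn) throws Exception {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(
                "ALTER TABLE table2 ADD CONSTRAINT fk_table2_table1 "
              + "FOREIGN KEY (id) REFERENCES table1 (id) ON DELETE CASCADE");
        }
    }
}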
If you delete only a few records of a large table, ensure that an index on the column ID is defined.
To delete the records from TABLE2 and TABLE3, the best strategy is to use CASCADE DELETE as proposed by #ivanzg; if this is not possible, see below.
To delete from TABLE1, a far superior option than a row-by-row batch delete is a single delete using the age-based predicate:
PreparedStatement stmt = con.prepareStatement("DELETE FROM TABLE1 WHERE age > ?");
stmt.setInt(1, 60);
int rowCount = stmt.executeUpdate();
If you can't cascade delete, use for TABLE2 and TABLE3 the same concept as above, but with the following statement:
DELETE FROM TABLE2/*or 3*/ WHERE ID in (SELECT ID FROM TABLE1 WHERE age > ?)
General best practice: minimum logic in the client, the whole logic in the database server. The database should then be able to produce a reasonable execution plan; see the index note above.
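Pulled together, this could look roughly like the following sketch (a sketch only; the statements mirror the queries above, run inside one transaction with children deleted before the parent):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class AgeBasedCleanup {

    /** Deletes children first (TABLE2, TABLE3), then the parent rows in TABLE1,
     *  all driven by the single age-based predicate and inside one transaction. */
    public static void deleteOlderThan(Connection conn, int age) throws SQLException {
        String[] childDeletes = {
            "DELETE FROM TABLE2 WHERE ID IN (SELECT ID FROM TABLE1 WHERE age > ?)",
            "DELETE FROM TABLE3 WHERE ID IN (SELECT ID FROM TABLE1 WHERE age > ?)"
        };
        conn.setAutoCommit(false);
        try {
            for (String sql : childDeletes) {
                try (PreparedStatement ps = conn.prepareStatement(sql)) {
                    ps.setInt(1, age);
                    ps.executeUpdate();
                }
            }
            try (PreparedStatement ps = conn.prepareStatement(
                    "DELETE FROM TABLE1 WHERE age > ?")) {
                ps.setInt(1, age);
                ps.executeUpdate();
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}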
A DELETE statement operates on one table per statement. However, the main implementations support triggers or other mechanisms that perform subordinate modifications, for example Oracle's CREATE TRIGGER.
However, developers might end up having to figure out what the database is doing behind their backs. (When/Why to use Cascading in SQL Server?)
Alternatively, if you need to use an intermediate result in your delete statements, you might use a temporary table in your batch (as proposed here).
As a side note, I see no transaction control (setAutoCommit(false) ... commit()) in your example code. I guess that might be for the sake of simplicity.
Also, you are executing 3 different delete batches (one for each table) instead of one. That might negate the benefit of using PreparedStatement.
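For the transaction-control point, a sketch of what wrapping the three batches from the question could look like (the method and parameter names are mine; the statements are assumed to be already filled via addBatch):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchTransaction {

    /** Runs the three already-filled delete batches as one unit of work. */
    public static void executeAtomically(Connection conn,
                                         PreparedStatement delStmt1,
                                         PreparedStatement delStmt2,
                                         PreparedStatement delStmt3) throws SQLException {
        conn.setAutoCommit(false);
        try {
            delStmt1.executeBatch();
            delStmt2.executeBatch();
            delStmt3.executeBatch();
            conn.commit();   // all three tables change together
        } catch (SQLException e) {
            conn.rollback(); // or none of them
            throw e;
        }
    }
}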
I'm pretty new to MySQL. I have two related tables, a quite common case: Klients(KID, name, surname) and Visits(VID, VKID, dateOfVisit), where VKID is the Klient ID. I have a problem with a suitable INSERT query; this is what I want to do:
1. Check if a Klient with a specific name and surname exists (let's assume there are no people with the same surname).
2. If yes, get the ID and do the INSERT into the Visits table.
3. If no, INSERT a new Klient, get the ID, and INSERT into Visits.
Is it possible to do in one query?
You would need to use EXISTS / NOT EXISTS with a subquery to check the table. See the reference below:
http://dev.mysql.com/doc/refman/5.0/en/exists-and-not-exists-subqueries.html
HTH
The INSERT statement allows only a single target table.
So the query you're looking for is simply impossible unless you use triggers or stored procedures.
But such a problem is commonly solved using the following small algorithm:
1) insert a record in table [Visits] assuming the parent record does exist in table [Klients]
INSERT INTO Visits (VKID, dateOfVisit)
SELECT KID, NOW()
FROM Klients
WHERE (name=#name) AND (surname=#surname)
2) check the number of inserted records after query (1)
3) if no record has been inserted, then add a new record to table [Klients], and then run (1) again.
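One way this algorithm could look in JDBC (a sketch only; the #name/#surname placeholders become bind parameters, and dateOfVisit is filled with NOW()):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class VisitRecorder {

    private static final String INSERT_VISIT =
        "INSERT INTO Visits (VKID, dateOfVisit) " +
        "SELECT KID, NOW() FROM Klients WHERE name = ? AND surname = ?";

    /** Steps 1-3: try the Visits insert, and if no Klient matched,
     *  create the Klient and try once more. */
    public static void recordVisit(Connection conn, String name, String surname) throws SQLException {
        if (insertVisit(conn, name, surname) == 0) {
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO Klients (name, surname) VALUES (?, ?)")) {
                ps.setString(1, name);
                ps.setString(2, surname);
                ps.executeUpdate();
            }
            insertVisit(conn, name, surname);
        }
    }

    private static int insertVisit(Connection conn, String name, String surname) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_VISIT)) {
            ps.setString(1, name);
            ps.setString(2, surname);
            return ps.executeUpdate(); // number of Visits rows inserted
        }
    }
}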
Try something like this:
IF NOT EXISTS (SELECT 1 FROM `sometable` WHERE name = 'somename' AND surname = 'somesurname') THEN
    INSERT INTO Table1 (name, surname) VALUES ('somename', 'somesurname');
ELSE
    INSERT INTO visits (kid, name, surname)
    SELECT kid, name, surname FROM Table1 WHERE name = 'somename' AND surname = 'somesurname';
END IF;
There is no need to specify VALUES on the second insert.
I have not tested it, but this is the general idea of what you are trying to accomplish.
These should be two queries in a transaction:
INSERT INTO Klients (name, surname)
VALUES ('John', 'Doe')
ON DUPLICATE KEY UPDATE
KID = LAST_INSERT_ID(KID);
INSERT INTO Visits (VKID, dateOfVisit)
VALUES (LAST_INSERT_ID(), NOW());
The first statement is an upsert, where the update part uses a not widely known but purpose-built feature of LAST_INSERT_ID(): an explicitly passed value is stored so that it can be retrieved afterwards.
UPD: I forgot to mention that you would need to add a unique constraint on (surname, name).
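From Java, the two statements could be run like this (a sketch; LAST_INSERT_ID() is per-connection in MySQL, so both statements must use the same connection, and the column and table names follow the question):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class KlientVisitUpsert {

    /** Runs the Klients upsert and the Visits insert as one transaction,
     *  relying on LAST_INSERT_ID() inside the second statement. */
    public static void addVisit(Connection conn, String name, String surname) throws SQLException {
        conn.setAutoCommit(false);
        try {
            try (PreparedStatement upsert = conn.prepareStatement(
                    "INSERT INTO Klients (name, surname) VALUES (?, ?) " +
                    "ON DUPLICATE KEY UPDATE KID = LAST_INSERT_ID(KID)")) {
                upsert.setString(1, name);
                upsert.setString(2, surname);
                upsert.executeUpdate();
            }
            try (PreparedStatement visit = conn.prepareStatement(
                    "INSERT INTO Visits (VKID, dateOfVisit) VALUES (LAST_INSERT_ID(), NOW())")) {
                visit.executeUpdate();
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}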