How to query a boolean column using an integer with Postgres? - java

I post a similar question previously, but have opened another question to be more specific as the previous one gave me a solution but I have now encountered another problem.
We have an existing Oracle database that had boolean columns defined like so
CREATE TABLE MY_TABLE
(
SOME_TABLE_COLUMN NUMBER(1) DEFAULT 0 NOT NULL,
etc..
And with the corresponding java field defined like private boolean someTableColumn; I've come to understand this is because Oracle does not have a Boolean datatype, but under the hood does the conversion from boolean to integer when inserting data, and the reverse when retriving data.
This has caused an issue when I have been working on migrating our database from Oracle to Postgres. Thanks to answers on my previous question, I have migrated the column type from NUMBER(1) to BOOLEAN. This has solved the issue with inserting data. However, our codebase uses JDBCTemplate and we unfortunately have hundereds of hardcoded queries in the code that make queries like SELECT * FROM MY_TABLE WHERE TABLE_COLUMN=1.
When these run against the Postgres DB, I get the following error ERROR: operator does not exist: boolean = integer. We have a requirement to have backwards compatability with Oracle, so I can't simply update these queries to replace 1 and 0 with TRUE and FALSE respectively.
Is there a way I can configure Postgres so it can do a conversion behind the scenes to resolve this? I have looked at casts but I don't really understand the documentation and the examples given don't seem to match my use case. Any help is appreciated.

Can you try to use '0' and '1' instead of 0 and 1 in your requests ?
I used to work on apps compliant with both Oracle and Postgresql using this syntax. Apps were using JPA but can say with certainty that we were also using this syntax with nativeQuery = true.
Note: I would have posted this as a comment but I don't have the required reputation to do so, hence the post as an answer

Update:
Duh! Brain Freeze. On later thought there is a way to get this conversion in both directions. Process via a View.
Steps:
Rename your table.
Create a view having the same name as the old table. In this view
translate the boolean column to the appropriate integer.
Create an trigger function and an instead of trigger on the view for insert/update dml. In the trigger function translate the column value to boolean as appropriate.
See revised demo.
alter table testb rename to testb_tab;
create or replace view testb (id, name, is_ok)
as
select i,
, name
, is_ok::int
from testb_tab;
create or replace
function testb_act_dml()
returns trigger
language plpgsql
as $$
begin
if tg_op = 'INSERT' then
insert into testb_tab(name,is_ok)
values (new.name, new.is_ok::boolean) ;
else
update testb_tab
set name = new.name
, is_ok = new.is_ok::boolean
where id = old.id;
end if;
return new;
end;
$$;
create trigger testb_biuri -- before insert update row instead of
instead of insert or update on testb
for each row execute function testb_act_dml();
Finally, there is another option which probably has the lease work. Do not change the column description to Boolean. Either leave it as an integer or define it as a smallint. Either way a check constraint may come in useful. So something along the line of:
create table tests( id int generated always as identity primary key
, name text
, is_ok smallint
, constraint is_ok_ck
check ( is_ok in (0,1) or is_ok is null)
);
This is one of the issues I have with the concept of "database independence". It simply does not exist. Vendors implementation often simply vary too much. In this case Postures to the rescue, perhaps but also perhaps extreme: create you own create your own operators. Proceed with caution:
-- function to compare boolean = integer
create or replace
function"__bool=int"( b boolean, i int)
returns boolean
language sql
as $$
select (b=i::boolean);
$$;
-- create the Operator for boolean = integer
create operator = (
leftarg = boolean
, rightarg = int
, function = "__bool=int"
, commutator = =
);
The above will not allow your code: SELECT * FROM MY_TABLE WHERE TABLE_COLUMN=1 (see demo here).
However, this road may lead to unexpected twists, and lots of function/operator pairs. For example the above does not support SELECT * FROM MY_TABLE WHERE TABLE_COLUMN<>1. That requires another function/operator combination. Further I do not see a retrieval function that converts a boolean back to an integer. If you follow this path be sure to massively test your boolean-to-integer (integer-to-boolean) operations. It may just be better to just byte the bullet and updated those few queries (you did say hundreds not thousands) as #mlogario suggests.

Related

JDBC PreparedStatement in batch when columns to set vary from row to row

I am trying to batch updates using PreparedStatement where the data you get in varies in what you have available.
Simple example:
You have a table with column x and y. Your inputdata is a representation of rows where only the modified column is present. So for row 1 you have x='foo' and for row 2 you have y='bar'. In order to batch these with a PreparedStatement you need one SQL statement: "update table where x=?,y=? where etc".
The solution that I am working towards is setting the column where you dont have any value to the value that is already present, but I'm not sure if this is possible. In raw SQL you could write "update table where x=x and y='foo' where etc", however I haven't found any ways to achieve this by setting the "?" parameter to be a reference to the column, it seems it's not possible.
Is it possible to handle this case at all with PreparedStatements? I apologize if my explanation is poor.
Assuming the values you want to set are all non-null, you could use a statement like
update sometable set column1 = coalesce(?, column1), colum2 = coalesce(?, column2)
Then when the value should not be updated, use either PreparedStatement.setNull with an appropriate java.sql.Types value or a PreparedStatement.setXXX of the appropriate type with a null as value.
As discussed in the comments, an alternative if you do need to update to null, is to use a custom function with a sentinel value (either for updating to null or for using the current value). Something like:
update sometable set column1 = conditional_update(?, column1), colum2 = conditional_update(?, column2)
Where conditional_update would be something like (using Firebird 3 PSQL syntax):
create function conditional_update(new_value varchar(500), original_value varchar(500))
returns varchar(500)
as
begin
if (new_value = '$$NO_UPDATE$$') then
return original_value;
else
return new_value;
end
Where using $$NO_UPDATE$$ is a sentinel value for not updating. A potential downside of this solution is the typing of columns as a string type. You always need to use string types (I'm not sure if there are databases that would supports dynamic parameters and return types in this situation).
Let me rephrase your problem and you please tell me if this is right:
You want to be able to have the update process
sometimes you only update for a specific x
sometimes you only update for a specific y
sometimes you update for specific x and y combined
Correct? If 'yes' then the next issue is implementation possibilities.
You suggested trying to put the column itself into the parameter. Even if that's possible I'd still recommend against it. The code will get confusing.
I suggest you create a java method that builds and returns the query string (or at least the 'where' clause) based on your available parameters (x, y, etc)
Your code for invoking the JDBC Prepared statement will invoke that method to get a query String.
You will still benefit from using prepared statements. The cost is that your database will be caching several different statements instead of one, but I imagine that is a minimal issue.
You can think of auxiliar "?" variable for each column which will manage if the column should be updated or not.
update table
set x = case when 0 = ? then x else ? end,
y = case when 0 = ? then y else ? end
where etc;
Passing 0 for each 0 = ? will not update the column whereas passing 1 will update it to the value you specified.

How to implement Lookup Tables?

I am unable to grasp the concept of the lookup table.
I am currently working on a project wherein I am using two tables.
The first table consists of two columns- name(varchar) and value(varchar).
The second table also has two rows- Result(varchar) and value(varchar).
Result is used to store the values which are obtained from a Java code. Whenever the Result of the Java code matches the name in the first table, I need to update the second table with the corresponding value in the first table.
Does using lookup table help in any way? If it does, can it be explained with an example?If not, is there any other way?
Just imagine a table person with a column GenderIsMale BIT. You can set this value to 1 (yes, it is a boy) or to 0 (no, a girl). This was easy in earlier days.
Now we have more categories. According to this link facebook offers more than 50 differing categories...
There the lookup-table comes into play: You create a table which has - as minium - a unique key and a value. In most cases this is an ID INT IDENTITY and a Content VARCHAR(100) NOT NULL. You can add more columns like Abbreviation or any other additional content (e.g. other languages or codes of external code systems read about mapping tables also) directly bound to this value.
The next step is, to take the GenderIsMale-column away and replace it with a
GenderID INT NOT NULL
CONSTRAINT FK_Person_GenderID FOREIGN KEY REFERENCES GenderLookUpTable(GenderID)
The person table will store the GenderID only, the related values are stored in the side table and can be looked up.
The simple lookup table is the basic construct of how to create a relational database model in min. 3.NF or BCNF (which should be a minium reuqirement for professional database design).
Whenever the Result of the Java code matches the name in the first
table, I need to update the second table with the corresponding value
in the first table.
That's a perfect use case for database trigger, which can be used to perform various things when a change (insert, update, delete) happens in a table.
Assuming you're inserting the value of your Java calculations to your (result, value) table (let's call it foo, and the other table is bar), you can write a trigger that replaces the value being written with the value from the other table. Example given for Postgres, if using another db refer to your particular RDBMS manual to see the syntax.
CREATE FUNCTION get_value_from_lookup_table() RETURNS trigger AS $$
BEGIN
IF EXISTS (SELECT 1 FROM bar WHERE name = NEW.result) THEN
RETURN SELECT name, value FROM bar WHERE name = NEW.result;
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER lookup_value
INSTEAD OF INSERT ON foo
FOR EACH ROW
EXECUTE PROCEDURE get_value_from_lookup_table();
Every time an INSERT is done on foo, a check is done to see if a row exists in bar where name=result. If so, that row is inserted, otherwise the insert goes on normally. That's the basic gist of it. The actual solution depends on table constraints, whether you need to handle inserts and updates, etc.

Decode in SQL vs. If... Else in Java

I'm looking for a solution to a simple scenario. I need to check if a value is present in a table, and if present I need Y else N
I can do it in two ways, either fetch the count of rows from the database, and code the logic in java, or use DECODE(COUNT(*),0,'N','Y')
Which is better? Is there any advantage of one over the other? Or more specifically, is there any disadvantage of using DECODE() instead of doing it in Java?
The database I have is DB2.
You should use exists. I would tend to do this as:
select (case when exists (select 1 from . . . .)
then 'Y' else 'N'
end) as flag
from sysibm.sysdummy1;
The reason you want to use exists is because it is faster. When you use count(*), the SQL engine has to process all the (appropriate) data to get the count. With exists, it can stop at the first one.
The reason to prefer case over decode() is that the former is ANSI standard SQL, available in basically all databases.
It shouldn't be any considerable difference between those 2 ways that you mentioned.
1) The DECODE will be simple and the IF will be simple.
2) You will be receiving an Int32 versus a CHAR(1) - which is not a significant difference.
So, I would consider another aspect: Which of those 2 will make your code more CLEAR?
And one more thing: if this is the ONLY thing that you're selecting on that query, you could try something like:
SELECT 'Y' FROM DUAL WHERE EXISTS (SELECT 1 FROM YOURTABLE WHERE YOURCONDITION = 1); --Oracle SQL - but should be fairly easy to translate it to DB2
This is an option to not make the DB count for every occurrence of your condition just to check if it exists.
Aggregated functions like count can be optimized with MQT - Materilized Query Tables
https://www.ibm.com/developerworks/data/library/techarticle/dm-0509melnyk/
connect to sample
alter table employee add unique (empno)
alter table department add unique (deptno)
create table count_emp_dpto_1 as (select d.deptno, e.empno, count(*) from employee e, department d where d.deptno = 1 and e.workdept = d.deptno) data initially deferred refresh immediate
set integrity for count_emp_dpto_1 immediate checked not incremental
select * from count_emp_dpto_1
connect reset

Modifying PLSQL function to return multiple rows from same column

I am a beginner PLSQL user, and I have what might be a rather simple question.
I have created the following SQL Function, which returns the created date of the process whose corporate ID matches the corporate ID that I have given it. I have this connected to my JDBC, and it returns values just fine.
However, I just realized I overlooked an important issue--it's entirely possible that more than one process will have a corporate ID that matches the ID value I've inputted, and in cases like that I will need to be able to access all of the created date values where the IDs return a match.
CREATE OR REPLACE FUNCTION FUNCTION_1(
c_id IN INT)
RETURN INT
AS
p_date process.date_created%TYPE;
BEGIN
SELECT process.date_created
FROM PROCESS
WHERE process.corporate_id = c_id
ORDER BY process.corporate_id;
RETURN p_id;
END FUNCTION_1;
/
Is there a way that I can modify my function to return multiple rows from the same column, and then call that function to return some sort of array using JDBC? Or, if that isn't possible, is there a way I can return what I need using PLSQL procedures or just plain SQL combined with JDBC? I've looked through other questions here, but none seemed to be about quite what I need to know.
Thanks to anyone who can help!
you need make some changes in your function. on java side it will be simple select
you need change type of your function from int to collection
read about the table functions here Table Functions
user oracle table() function to convert result of your function to table
it let you use your function in queries. read more about the syntax here: Table Collections: Examples
here the example how to call your function from java:
select t.column_value as process_id
from table(FUNCTION_1(1)) t
--result
PROCESS_ID
1 1
2 2
--we need create new type - table of integers
CREATE OR REPLACE TYPE t_process_ids IS TABLE OF int;
--and make changes in function
CREATE OR REPLACE FUNCTION FUNCTION_1(
c_id IN INT)
RETURN t_process_ids
AS
l_ids t_process_ids := t_process_ids();
BEGIN
--here I populated result of select into the local variables
SELECT process.id
bulk collect into l_ids
FROM PROCESS
WHERE process.corporate_id = c_id
ORDER BY process.corporate_id;
--return the local var
return l_ids;
END FUNCTION_1;
--the script that I used for testing
create table process(id int, corporate_id int, date_created date);
insert into process values(1, 1, sysdate);
insert into process values(2, 1, sysdate);
insert into process values(3, 2, sysdate);

How to PreparedStatement sql with ON DUPLICATE KEY UPDATE? [duplicate]

Several months ago I learned from an answer on Stack Overflow how to perform multiple updates at once in MySQL using the following syntax:
INSERT INTO table (id, field, field2) VALUES (1, A, X), (2, B, Y), (3, C, Z)
ON DUPLICATE KEY UPDATE field=VALUES(Col1), field2=VALUES(Col2);
I've now switched over to PostgreSQL and apparently this is not correct. It's referring to all the correct tables so I assume it's a matter of different keywords being used but I'm not sure where in the PostgreSQL documentation this is covered.
To clarify, I want to insert several things and if they already exist to update them.
PostgreSQL since version 9.5 has UPSERT syntax, with ON CONFLICT clause. with the following syntax (similar to MySQL)
INSERT INTO the_table (id, column_1, column_2)
VALUES (1, 'A', 'X'), (2, 'B', 'Y'), (3, 'C', 'Z')
ON CONFLICT (id) DO UPDATE
SET column_1 = excluded.column_1,
column_2 = excluded.column_2;
Searching postgresql's email group archives for "upsert" leads to finding an example of doing what you possibly want to do, in the manual:
Example 38-2. Exceptions with UPDATE/INSERT
This example uses exception handling to perform either UPDATE or INSERT, as appropriate:
CREATE TABLE db (a INT PRIMARY KEY, b TEXT);
CREATE FUNCTION merge_db(key INT, data TEXT) RETURNS VOID AS
$$
BEGIN
LOOP
-- first try to update the key
-- note that "a" must be unique
UPDATE db SET b = data WHERE a = key;
IF found THEN
RETURN;
END IF;
-- not there, so try to insert the key
-- if someone else inserts the same key concurrently,
-- we could get a unique-key failure
BEGIN
INSERT INTO db(a,b) VALUES (key, data);
RETURN;
EXCEPTION WHEN unique_violation THEN
-- do nothing, and loop to try the UPDATE again
END;
END LOOP;
END;
$$
LANGUAGE plpgsql;
SELECT merge_db(1, 'david');
SELECT merge_db(1, 'dennis');
There's possibly an example of how to do this in bulk, using CTEs in 9.1 and above, in the hackers mailing list:
WITH foos AS (SELECT (UNNEST(%foo[])).*)
updated as (UPDATE foo SET foo.a = foos.a ... RETURNING foo.id)
INSERT INTO foo SELECT foos.* FROM foos LEFT JOIN updated USING(id)
WHERE updated.id IS NULL;
See a_horse_with_no_name's answer for a clearer example.
Warning: this is not safe if executed from multiple sessions at the same time (see caveats below).
Another clever way to do an "UPSERT" in postgresql is to do two sequential UPDATE/INSERT statements that are each designed to succeed or have no effect.
UPDATE table SET field='C', field2='Z' WHERE id=3;
INSERT INTO table (id, field, field2)
SELECT 3, 'C', 'Z'
WHERE NOT EXISTS (SELECT 1 FROM table WHERE id=3);
The UPDATE will succeed if a row with "id=3" already exists, otherwise it has no effect.
The INSERT will succeed only if row with "id=3" does not already exist.
You can combine these two into a single string and run them both with a single SQL statement execute from your application. Running them together in a single transaction is highly recommended.
This works very well when run in isolation or on a locked table, but is subject to race conditions that mean it might still fail with duplicate key error if a row is inserted concurrently, or might terminate with no row inserted when a row is deleted concurrently. A SERIALIZABLE transaction on PostgreSQL 9.1 or higher will handle it reliably at the cost of a very high serialization failure rate, meaning you'll have to retry a lot. See why is upsert so complicated, which discusses this case in more detail.
This approach is also subject to lost updates in read committed isolation unless the application checks the affected row counts and verifies that either the insert or the update affected a row.
With PostgreSQL 9.1 this can be achieved using a writeable CTE (common table expression):
WITH new_values (id, field1, field2) as (
values
(1, 'A', 'X'),
(2, 'B', 'Y'),
(3, 'C', 'Z')
),
upsert as
(
update mytable m
set field1 = nv.field1,
field2 = nv.field2
FROM new_values nv
WHERE m.id = nv.id
RETURNING m.*
)
INSERT INTO mytable (id, field1, field2)
SELECT id, field1, field2
FROM new_values
WHERE NOT EXISTS (SELECT 1
FROM upsert up
WHERE up.id = new_values.id)
See these blog entries:
Upserting via Writeable CTE
WAITING FOR 9.1 – WRITABLE CTE
WHY IS UPSERT SO COMPLICATED?
Note that this solution does not prevent a unique key violation but it is not vulnerable to lost updates.
See the follow up by Craig Ringer on dba.stackexchange.com
In PostgreSQL 9.5 and newer you can use INSERT ... ON CONFLICT UPDATE.
See the documentation.
A MySQL INSERT ... ON DUPLICATE KEY UPDATE can be directly rephrased to a ON CONFLICT UPDATE. Neither is SQL-standard syntax, they're both database-specific extensions. There are good reasons MERGE wasn't used for this, a new syntax wasn't created just for fun. (MySQL's syntax also has issues that mean it wasn't adopted directly).
e.g. given setup:
CREATE TABLE tablename (a integer primary key, b integer, c integer);
INSERT INTO tablename (a, b, c) values (1, 2, 3);
the MySQL query:
INSERT INTO tablename (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE c=c+1;
becomes:
INSERT INTO tablename (a, b, c) values (1, 2, 10)
ON CONFLICT (a) DO UPDATE SET c = tablename.c + 1;
Differences:
You must specify the column name (or unique constraint name) to use for the uniqueness check. That's the ON CONFLICT (columnname) DO
The keyword SET must be used, as if this was a normal UPDATE statement
It has some nice features too:
You can have a WHERE clause on your UPDATE (letting you effectively turn ON CONFLICT UPDATE into ON CONFLICT IGNORE for certain values)
The proposed-for-insertion values are available as the row-variable EXCLUDED, which has the same structure as the target table. You can get the original values in the table by using the table name. So in this case EXCLUDED.c will be 10 (because that's what we tried to insert) and "table".c will be 3 because that's the current value in the table. You can use either or both in the SET expressions and WHERE clause.
For background on upsert see How to UPSERT (MERGE, INSERT ... ON DUPLICATE UPDATE) in PostgreSQL?
I was looking for the same thing when I came here, but the lack of a generic "upsert" function botherd me a bit so I thought you could just pass the update and insert sql as arguments on that function form the manual
that would look like this:
CREATE FUNCTION upsert (sql_update TEXT, sql_insert TEXT)
RETURNS VOID
LANGUAGE plpgsql
AS $$
BEGIN
LOOP
-- first try to update
EXECUTE sql_update;
-- check if the row is found
IF FOUND THEN
RETURN;
END IF;
-- not found so insert the row
BEGIN
EXECUTE sql_insert;
RETURN;
EXCEPTION WHEN unique_violation THEN
-- do nothing and loop
END;
END LOOP;
END;
$$;
and perhaps to do what you initially wanted to do, batch "upsert", you could use Tcl to split the sql_update and loop the individual updates, the preformance hit will be very small see http://archives.postgresql.org/pgsql-performance/2006-04/msg00557.php
the highest cost is executing the query from your code, on the database side the execution cost is much smaller
There is no simple command to do it.
The most correct approach is to use function, like the one from docs.
Another solution (although not that safe) is to do update with returning, check which rows were updates, and insert the rest of them
Something along the lines of:
update table
set column = x.column
from (values (1,'aa'),(2,'bb'),(3,'cc')) as x (id, column)
where table.id = x.id
returning id;
assuming id:2 was returned:
insert into table (id, column) values (1, 'aa'), (3, 'cc');
Of course it will bail out sooner or later (in concurrent environment), as there is clear race condition in here, but usually it will work.
Here's a longer and more comprehensive article on the topic.
I use this function merge
CREATE OR REPLACE FUNCTION merge_tabla(key INT, data TEXT)
RETURNS void AS
$BODY$
BEGIN
IF EXISTS(SELECT a FROM tabla WHERE a = key)
THEN
UPDATE tabla SET b = data WHERE a = key;
RETURN;
ELSE
INSERT INTO tabla(a,b) VALUES (key, data);
RETURN;
END IF;
END;
$BODY$
LANGUAGE plpgsql
Personally, I've set up a "rule" attached to the insert statement. Say you had a "dns" table that recorded dns hits per customer on a per-time basis:
CREATE TABLE dns (
"time" timestamp without time zone NOT NULL,
customer_id integer NOT NULL,
hits integer
);
You wanted to be able to re-insert rows with updated values, or create them if they didn't exist already. Keyed on the customer_id and the time. Something like this:
CREATE RULE replace_dns AS
ON INSERT TO dns
WHERE (EXISTS (SELECT 1 FROM dns WHERE ((dns."time" = new."time")
AND (dns.customer_id = new.customer_id))))
DO INSTEAD UPDATE dns
SET hits = new.hits
WHERE ((dns."time" = new."time") AND (dns.customer_id = new.customer_id));
Update: This has the potential to fail if simultaneous inserts are happening, as it will generate unique_violation exceptions. However, the non-terminated transaction will continue and succeed, and you just need to repeat the terminated transaction.
However, if there are tons of inserts happening all the time, you will want to put a table lock around the insert statements: SHARE ROW EXCLUSIVE locking will prevent any operations that could insert, delete or update rows in your target table. However, updates that do not update the unique key are safe, so if you no operation will do this, use advisory locks instead.
Also, the COPY command does not use RULES, so if you're inserting with COPY, you'll need to use triggers instead.
Similar to most-liked answer, but works slightly faster:
WITH upsert AS (UPDATE spider_count SET tally=1 WHERE date='today' RETURNING *)
INSERT INTO spider_count (spider, tally) SELECT 'Googlebot', 1 WHERE NOT EXISTS (SELECT * FROM upsert)
(source: http://www.the-art-of-web.com/sql/upsert/)
I custom "upsert" function above, if you want to INSERT AND REPLACE :
`
CREATE OR REPLACE FUNCTION upsert(sql_insert text, sql_update text)
RETURNS void AS
$BODY$
BEGIN
-- first try to insert and after to update. Note : insert has pk and update not...
EXECUTE sql_insert;
RETURN;
EXCEPTION WHEN unique_violation THEN
EXECUTE sql_update;
IF FOUND THEN
RETURN;
END IF;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
ALTER FUNCTION upsert(text, text)
OWNER TO postgres;`
And after to execute, do something like this :
SELECT upsert($$INSERT INTO ...$$,$$UPDATE... $$)
Is important to put double dollar-comma to avoid compiler errors
check the speed...
According the PostgreSQL documentation of the INSERT statement, handling the ON DUPLICATE KEY case is not supported. That part of the syntax is a proprietary MySQL extension.
I have the same issue for managing account settings as name value pairs.
The design criteria is that different clients could have different settings sets.
My solution, similar to JWP is to bulk erase and replace, generating the merge record within your application.
This is pretty bulletproof, platform independent and since there are never more than about 20 settings per client, this is only 3 fairly low load db calls - probably the fastest method.
The alternative of updating individual rows - checking for exceptions then inserting - or some combination of is hideous code, slow and often breaks because (as mentioned above) non standard SQL exception handling changing from db to db - or even release to release.
#This is pseudo-code - within the application:
BEGIN TRANSACTION - get transaction lock
SELECT all current name value pairs where id = $id into a hash record
create a merge record from the current and update record
(set intersection where shared keys in new win, and empty values in new are deleted).
DELETE all name value pairs where id = $id
COPY/INSERT merged records
END TRANSACTION
CREATE OR REPLACE FUNCTION save_user(_id integer, _name character varying)
RETURNS boolean AS
$BODY$
BEGIN
UPDATE users SET name = _name WHERE id = _id;
IF FOUND THEN
RETURN true;
END IF;
BEGIN
INSERT INTO users (id, name) VALUES (_id, _name);
EXCEPTION WHEN OTHERS THEN
UPDATE users SET name = _name WHERE id = _id;
END;
RETURN TRUE;
END;
$BODY$
LANGUAGE plpgsql VOLATILE STRICT
For merging small sets, using the above function is fine. However, if you are merging large amounts of data, I'd suggest looking into http://mbk.projects.postgresql.org
The current best practice that I'm aware of is:
COPY new/updated data into temp table (sure, or you can do INSERT if the cost is ok)
Acquire Lock [optional] (advisory is preferable to table locks, IMO)
Merge. (the fun part)
UPDATE will return the number of modified rows. If you use JDBC (Java), you can then check this value against 0 and, if no rows have been affected, fire INSERT instead. If you use some other programming language, maybe the number of the modified rows still can be obtained, check documentation.
This may not be as elegant but you have much simpler SQL that is more trivial to use from the calling code. Differently, if you write the ten line script in PL/PSQL, you probably should have a unit test of one or another kind just for it alone.
Edit: This does not work as expected. Unlike the accepted answer, this produces unique key violations when two processes repeatedly call upsert_foo concurrently.
Eureka! I figured out a way to do it in one query: use UPDATE ... RETURNING to test if any rows were affected:
CREATE TABLE foo (k INT PRIMARY KEY, v TEXT);
CREATE FUNCTION update_foo(k INT, v TEXT)
RETURNS SETOF INT AS $$
UPDATE foo SET v = $2 WHERE k = $1 RETURNING $1
$$ LANGUAGE sql;
CREATE FUNCTION upsert_foo(k INT, v TEXT)
RETURNS VOID AS $$
INSERT INTO foo
SELECT $1, $2
WHERE NOT EXISTS (SELECT update_foo($1, $2))
$$ LANGUAGE sql;
The UPDATE has to be done in a separate procedure because, unfortunately, this is a syntax error:
... WHERE NOT EXISTS (UPDATE ...)
Now it works as desired:
SELECT upsert_foo(1, 'hi');
SELECT upsert_foo(1, 'bye');
SELECT upsert_foo(3, 'hi');
SELECT upsert_foo(3, 'bye');
PostgreSQL >= v15
Big news on this topic as in PostgreSQL v15, it is possible to use MERGE command. In fact, this long awaited feature was listed the first of the improvements of the v15 release.
This is similar to INSERT ... ON CONFLICT but more batch-oriented. It has a powerful WHEN MATCHED vs WHEN NOT MATCHED structure that gives the ability to INSERT, UPDATE or DELETE on such conditions.
It not only eases bulk changes, but it even adds more control that tradition UPSERT and INSERT ... ON CONFLICT
Take a look at this very complete sample from official page:
MERGE INTO wines w
USING wine_stock_changes s
ON s.winename = w.winename
WHEN NOT MATCHED AND s.stock_delta > 0 THEN
INSERT VALUES(s.winename, s.stock_delta)
WHEN MATCHED AND w.stock + s.stock_delta > 0 THEN
UPDATE SET stock = w.stock + s.stock_delta
WHEN MATCHED THEN
DELETE;
PostgreSQL v9, v10, v11, v12, v13, v14
If version is under v15 and over v9.5 , probably best choice is to use UPSERT syntax, with ON CONFLICT clause
Here is the example how to do upsert with params and without special sql constructions
if you have special condition (sometimes you can't use 'on conflict' because you can't create constraint)
WITH upd AS
(
update view_layer set metadata=:metadata where layer_id = :layer_id and view_id = :view_id returning id
)
insert into view_layer (layer_id, view_id, metadata)
(select :layer_id layer_id, :view_id view_id, :metadata metadata FROM view_layer l
where NOT EXISTS(select id FROM upd WHERE id IS NOT NULL) limit 1)
returning id
maybe it will be helpful

Categories