Doing an idempotent insert with Postgres - sql

I'd like something like
INSERT VALUES(1,2,3) INTO sometable ON CONFLICT DO NOTHING IF EXACTLY SAME ROW
So I'd like the following behavior:
#CREATE TABLE sometable (a int primary key, b int, c int);
CREATE TABLE
#INSERT INTO sometable (1,2,3) ON CONFLICT DO NOTHING IF EXACTLY SAME ROW
INSERT 0 1
#INSERT INTO sometable (1,2,3) ON CONFLICT DO NOTHING IF EXACTLY SAME ROW
INSERT 0 0
#INSERT INTO sometable (1,3,2) ON CONFLICT DO NOTHING IF EXACTLY SAME ROW
ERROR: duplicate key value violates unique constraint "sometable_pkey"
DETAIL: Key (a)=(1) already exists.
Desiring this seems a very natural thing, because a client application can't assume it will know if an insert succeeded (if postgres or the client crashes or the network fails, the request might have been processed but the client never receives confirmation). So any well written application needs to deal with this case somehow.
However, the least bad way of achieving this that I have found is still very annoying:
INSERT INTO sometable (a,b,c) VALUES(1,2,3) ON CONFLICT (a) DO UPDATE SET b=2 WHERE sometable.b=2 AND sometable.c=3;
In other words, do a no-op update, but only if the values are what you would have inserted, and then throw an error if 0 rows (rather than 1) were touched.
Is there a better way?

You can use an INSERT based on a select:
insert into sometable
select *
from ( values (1,2,3) ) as data(a,b,c)
where not exists (select *
from sometable
where data = sometable);
Yes, the condition where data = sometable is valid in Postgres and simply compares all columns.
This can also be extended to multiple rows:
insert into sometable
select *
from (
values
(1,2,3),
(4,5,6),
(7,8,9)
) as data(a,b,c)
where not exists (select *
from sometable
where data = sometable);
This does not prevent PK violation errors (as on conflict does) if done from multiple transactions though. You still need to handle those errors.
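The behavior asked for in the question can be sketched from the client side. This is only a minimal sketch under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for Postgres, it compares the columns one by one because SQLite has no whole-row comparison like data = sometable, and the helper name idempotent_insert is hypothetical.

```python
import sqlite3


def idempotent_insert(conn, a, b, c):
    """Insert (a, b, c) unless the identical row already exists.

    Returns the number of rows inserted (1 or 0). If a row with the same
    key but different values exists, the primary key violation propagates
    as sqlite3.IntegrityError.
    """
    before = conn.total_changes
    conn.execute(
        """
        INSERT INTO sometable (a, b, c)
        SELECT ?, ?, ?
        WHERE NOT EXISTS (
            SELECT 1 FROM sometable WHERE a = ? AND b = ? AND c = ?
        )
        """,
        (a, b, c, a, b, c),
    )
    return conn.total_changes - before


# In-memory SQLite database standing in for the Postgres table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sometable (a INTEGER PRIMARY KEY, b INTEGER, c INTEGER)")

first = idempotent_insert(conn, 1, 2, 3)   # new row is inserted
second = idempotent_insert(conn, 1, 2, 3)  # identical row already there
try:
    idempotent_insert(conn, 1, 3, 2)       # same key, different values
    conflict = False
except sqlite3.IntegrityError:
    conflict = True
```

As noted above, with concurrent transactions the key violation can still occur even for an identical row, so the caller has to be prepared to handle that error.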

Related

PostgreSQL IF value doesn't exist THEN INSERT value

I've tried multiple variations of this code and it keeps giving me the SQL state: 42601 error. I've looked at the docs and tried using their examples which sometimes work, but do not do what I want.
IF (SELECT guildid FROM Servers <> 1) THEN
INSERT INTO Servers (guildid) VALUES (1)
END IF;
My Servers table is simple. I just have a single column called guildid. I do not want a CASE statement because that only outputs messages.
If you don't have to worry about race conditions (which the if approach suggests), then:
INSERT INTO Servers (guildid)
SELECT guildid
FROM (VALUES (1)) v(guildid)
WHERE NOT EXISTS (SELECT 1 FROM Servers s WHERE s.guildid = v.guildid);
To protect the table, I would suggest adding a unique index on it. In fact, just declare guildid to be the primary key. In that case, your original insert will simply fail, if you attempt to insert a duplicate value.
If you have a unique constraint on that column, you can use on conflict do nothing:
INSERT INTO Servers (guildid) VALUES (1)
ON CONFLICT DO NOTHING;

Updating SQL Server table with composite key

I have a SQL Server table with three columns; the first two columns are the primary key. I'm writing a stored procedure that will update the last two columns en masse, and it works fine as long as there are no primary key violations, but when there is a primary key violation it throws an error and stops executing.
How can I make it skip the offending row and continue updating the remaining records as long as there is no primary key violation?
Is there a better way to approach this problem? I'm only doing a simple update with a WHERE clause such as column2 = somevalue AND column3 = somevalue.
In SQL Server you'd use MERGE to upsert (i.e. insert or update):
MERGE mytable
USING (SELECT 1 as key1, 2 as key2, 3 as col1, 4 as col2) AS src
ON (mytable.key1 = src.key1 AND mytable.key2 = src.key2)
WHEN MATCHED THEN
UPDATE SET col1 = src.col1, col2 = src.col2
WHEN NOT MATCHED THEN
INSERT (key1, key2, col1, col2) VALUES (src.key1, src.key2, src.col1, src.col2);
There is nothing inherently wrong with your question, despite the rather loud protestations. Your question is confusing, especially when you refer to columns by position. That is a big no-no. So, a script that demonstrates your problem is generally the best way to both demonstrate your problem and get useful suggestions.
The short answer to your question is - you can't. A statement either succeeds or fails as a whole. If you want to update each row individually and ignore certain errors, then you need to write your tsql to do that.
And despite the protests (again), there are situations where it is necessary to update columns that are part of the primary key. It is unusual - very unusual - but you should also be wary of any absolute statement about tsql. When you find yourself doing unusual things, you should review your schema (and your approach) because it is quite possible that there are better ways to accomplish your goal.
And in this case, I suggest that you SHOULD really think about what you are trying to accomplish. If you want to update a set of rows in a particular way and the statement fails, that means there is a flaw somewhere. Typically, this error implies that your update logic is not correct. Perhaps you assume something about your data that is not accurate? It is impossible to know from a distance. The error message will tell you what set of values caused the conflict, so that should give you sufficient information to investigate. As another tool, write a select statement that demonstrates your proposed update and look for the values in the error message. E.g.
set nocount on;
create table #x (a smallint not null, b smallint not null, c varchar(10) not null, constraint xx primary key(a, b));
insert #x (a, b, c) values (1, 1, 'test'), (1, 2, 'zork');
select * from #x;
update #x set b = 2, c = 'dork';
select a, b, c, cast(2 as smallint) as new_b, 'dork' as new_c
from #x
order by a, new_b;
drop table #x;
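The earlier point about updating each row individually and ignoring certain errors can be sketched as follows. This is a minimal sketch, not T-SQL: it assumes Python's built-in sqlite3 module as a stand-in for SQL Server, reuses the two sample rows from the script above in a regular table named x, and issues one UPDATE per row so that a key violation only affects that row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE x (a INTEGER NOT NULL, b INTEGER NOT NULL, "
    "c TEXT NOT NULL, PRIMARY KEY (a, b))"
)
conn.executemany(
    "INSERT INTO x (a, b, c) VALUES (?, ?, ?)",
    [(1, 1, "test"), (1, 2, "zork")],
)

skipped = []
# Snapshot the keys first, then update one row per statement.
for a, b in conn.execute("SELECT a, b FROM x ORDER BY a, b").fetchall():
    try:
        conn.execute(
            "UPDATE x SET b = 2, c = 'dork' WHERE a = ? AND b = ?", (a, b)
        )
    except sqlite3.IntegrityError:
        # Only this row's statement failed; remember it and continue.
        skipped.append((a, b))

rows = conn.execute("SELECT a, b, c FROM x ORDER BY a, b").fetchall()
```

Collecting the skipped keys rather than silently discarding them keeps the advice above intact: the rows that fail are usually the ones worth investigating.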

Return rows from INSERT with ON CONFLICT without needing to update

I have a situation where I very frequently need to get a row from a table with a unique constraint, and if none exists then create it and return.
For example my table might be:
CREATE TABLE names(
id SERIAL PRIMARY KEY,
name TEXT,
CONSTRAINT names_name_key UNIQUE (name)
);
And it contains:
id | name
1 | bob
2 | alice
Then I'd like to:
INSERT INTO names(name) VALUES ('bob')
ON CONFLICT DO NOTHING RETURNING id;
Or perhaps:
INSERT INTO names(name) VALUES ('bob')
ON CONFLICT (name) DO NOTHING RETURNING id
and have it return bob's id 1. However, RETURNING only returns either inserted or updated rows. So, in the above example, it wouldn't return anything. In order to have it function as desired I would actually need to:
INSERT INTO names(name) VALUES ('bob')
ON CONFLICT ON CONSTRAINT names_name_key DO UPDATE
SET name = 'bob'
RETURNING id;
which seems kind of cumbersome. I guess my questions are:
What is the reasoning for not allowing the (my) desired behaviour?
Is there a more elegant way to do this?
It's the recurring problem of SELECT or INSERT, related to (but different from) an UPSERT. The new UPSERT functionality in Postgres 9.5 is still instrumental.
WITH ins AS (
INSERT INTO names(name)
VALUES ('bob')
ON CONFLICT ON CONSTRAINT names_name_key DO UPDATE
SET name = NULL
WHERE FALSE -- never executed, but locks the row
RETURNING id
)
SELECT id FROM ins
UNION ALL
SELECT id FROM names
WHERE name = 'bob' -- only executed if no INSERT
LIMIT 1;
This way you do not actually write a new row version without need.
I assume you are aware that in Postgres every UPDATE writes a new version of the row due to its MVCC model - even if name is set to the same value as before. This would make the operation more expensive, add to possible concurrency issues / lock contention in certain situations and bloat the table additionally.
However, there is still a tiny corner case for a race condition. Concurrent transactions may have added a conflicting row, which is not yet visible in the same statement. Then INSERT and SELECT come up empty.
Proper solution for single-row UPSERT:
Is SELECT or INSERT in a function prone to race conditions?
General solutions for bulk UPSERT:
How to use RETURNING with ON CONFLICT in PostgreSQL?
Without concurrent write load
If concurrent writes (from a different session) are not possible you don't need to lock the row and can simplify:
WITH ins AS (
INSERT INTO names(name)
VALUES ('bob')
ON CONFLICT ON CONSTRAINT names_name_key DO NOTHING -- no lock needed
RETURNING id
)
SELECT id FROM ins
UNION ALL
SELECT id FROM names
WHERE name = 'bob' -- only executed if no INSERT
LIMIT 1;
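For the case without concurrent writes, the same select-or-insert idea can also be written as two plain statements from the client. The following is a minimal sketch under stated assumptions: it uses Python's built-in sqlite3 module (SQLite 3.24 or later for ON CONFLICT) as a stand-in for Postgres, and the helper name get_or_create_id is hypothetical.

```python
import sqlite3


def get_or_create_id(conn, name):
    """Return the id for name, inserting the row first if it is missing."""
    # Does nothing when the name already exists (needs SQLite 3.24+).
    conn.execute(
        "INSERT INTO names (name) VALUES (?) ON CONFLICT (name) DO NOTHING",
        (name,),
    )
    # The row exists now, whether or not this call inserted it.
    row = conn.execute(
        "SELECT id FROM names WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
conn.executemany("INSERT INTO names (name) VALUES (?)", [("bob",), ("alice",)])

bob_id = get_or_create_id(conn, "bob")        # existing row
carol_id = get_or_create_id(conn, "carol")    # newly inserted row
carol_again = get_or_create_id(conn, "carol") # same id on repeat
```

With concurrent writers in Postgres, the race condition described above still applies to this two-statement form.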

What happens with duplicates when inserting multiple rows?

I am running a python script that inserts a large amount of data into a Postgres database, I use a single query to perform multiple row inserts:
INSERT INTO table (col1,col2) VALUES ('v1','v2'),('v3','v4') ... etc
I was wondering what would happen if it hits a duplicate key for the insert. Will it stop the entire query and throw an exception? Or will it merely ignore the insert of that specific row and move on?
The INSERT will just insert all rows and nothing special will happen, unless you have some kind of constraint disallowing duplicate / overlapping values (PRIMARY KEY, UNIQUE, CHECK or EXCLUDE constraint) - which you did not mention in your question. But that's what you are probably worried about.
Assuming a UNIQUE or PK constraint on (col1,col2), you are dealing with a textbook UPSERT situation. Many related questions and answers to find here.
Generally, if any constraint is violated, an exception is raised which (unless trapped in subtransaction like it's possible in a procedural server-side language like plpgsql) will roll back not only the statement, but the whole transaction.
Without concurrent writes
I.e.: No other transactions will try to write to the same table at the same time.
Exclude rows that are already in the table with WHERE NOT EXISTS ... or any other applicable technique:
Select rows which are not present in other table
And don't forget to remove duplicates within the inserted set as well, which would not be excluded by the semi-anti-join WHERE NOT EXISTS ...
One technique to deal with both at once would be EXCEPT:
INSERT INTO tbl (col1, col2)
VALUES
(text 'v1', text 'v2') -- explicit type cast may be needed in 1st row
, ('v3', 'v4')
, ('v3', 'v4') -- beware of dupes in source
EXCEPT SELECT col1, col2 FROM tbl;
EXCEPT without the key word ALL folds duplicate rows in the source. If you know there are no dupes, or you don't want to fold duplicates silently, use EXCEPT ALL (or one of the other techniques). See:
Using EXCEPT clause in PostgreSQL
Generally, if the target table is big, WHERE NOT EXISTS in combination with DISTINCT on the source will probably be faster:
INSERT INTO tbl (col1, col2)
SELECT *
FROM (
SELECT DISTINCT *
FROM (
VALUES
(text 'v1', text 'v2')
, ('v3', 'v4')
, ('v3', 'v4') -- dupes in source
) t(c1, c2)
) t
WHERE NOT EXISTS (
SELECT FROM tbl
WHERE col1 = t.c1 AND col2 = t.c2
);
If there can be many dupes, it pays to fold them in the source first. Else use one subquery less.
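The same two steps, folding duplicates in the source and then skipping rows that already exist, can also be driven from a client script. This is a minimal sketch under stated assumptions: Python's built-in sqlite3 module stands in for Postgres, the table has a UNIQUE constraint on (col1, col2), no other session writes concurrently, and the helper name insert_new_rows is hypothetical.

```python
import sqlite3


def insert_new_rows(conn, rows):
    """Insert only rows that are not in tbl yet; return how many were added."""
    # Fold duplicates in the source, keeping first-seen order.
    unique_rows = list(dict.fromkeys(rows))
    before = conn.total_changes
    conn.executemany(
        """
        INSERT INTO tbl (col1, col2)
        SELECT ?, ?
        WHERE NOT EXISTS (SELECT 1 FROM tbl WHERE col1 = ? AND col2 = ?)
        """,
        [(c1, c2, c1, c2) for c1, c2 in unique_rows],
    )
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (col1 TEXT, col2 TEXT, UNIQUE (col1, col2))")
conn.execute("INSERT INTO tbl (col1, col2) VALUES ('v1', 'v2')")

added = insert_new_rows(conn, [("v1", "v2"), ("v3", "v4"), ("v3", "v4")])
total = conn.execute("SELECT COUNT(*) FROM tbl").fetchone()[0]
```

Unlike the single-statement queries above, executemany runs one INSERT per row, so this form trades some speed for simplicity.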
Related:
Select rows which are not present in other table
With concurrent writes
Use the Postgres UPSERT implementation INSERT ... ON CONFLICT ... in Postgres 9.5 or later:
INSERT INTO tbl (col1,col2)
SELECT DISTINCT * -- still can't insert the same row more than once
FROM (
VALUES
(text 'v1', text 'v2')
, ('v3','v4')
, ('v3','v4') -- you still need to fold dupes in source!
) t(c1, c2)
ON CONFLICT DO NOTHING; -- ignores rows with *any* conflict!
Further reading:
How to use RETURNING with ON CONFLICT in PostgreSQL?
How do I insert a row which contains a foreign key?
Documentation:
The manual
The commit page
The Postgres Wiki page
Craig's reference answer for UPSERT problems:
How to UPSERT (MERGE, INSERT ... ON DUPLICATE UPDATE) in PostgreSQL?
Will it stop the entire query and throw an exception? Yes.
To avoid that, see the related SO question, which describes how to keep Postgres from throwing an error for multiple inserts when some of the inserted keys already exist in the database.
You should basically do this:
INSERT INTO DBtable
(id, field1)
SELECT 1, 'value'
WHERE
NOT EXISTS (
SELECT id FROM DBtable WHERE id = 1
);

Does DB2 have an "insert or update" statement?

From my code (Java) I want to ensure that a row exists in the database (DB2) after my code is executed.
My code now does a select and if no result is returned it does an insert. I really don't like this code since it exposes me to concurrency issues when running in a multi-threaded environment.
What I would like to do is to put this logic in DB2 instead of in my Java code.
Does DB2 have an insert-or-update statement? Or anything like it that I can use?
For example:
insertupdate into mytable values ('myid')
Another way of doing it would probably be to always do the insert and catch "SQL-code -803 primary key already exists", but I would like to avoid that if possible.
Yes, DB2 has the MERGE statement, which will do an UPSERT (update or insert).
MERGE INTO target_table USING source_table ON match-condition
{WHEN [NOT] MATCHED
THEN [UPDATE SET ...|DELETE|INSERT VALUES ....|SIGNAL ...]}
[ELSE IGNORE]
See:
http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=/com.ibm.db2.udb.admin.doc/doc/r0010873.htm
https://www.ibm.com/support/knowledgecenter/en/SS6NHC/com.ibm.swg.im.dashdb.sql.ref.doc/doc/r0010873.html
https://www.ibm.com/developerworks/community/blogs/SQLTips4DB2LUW/entry/merge?lang=en
I found this thread because I really needed a one-liner for DB2 INSERT OR UPDATE.
The following syntax seems to work, without requiring a separate temp table.
It works by using VALUES() to create a table structure. The SELECT * seems surplus IMHO, but without it I get syntax errors.
MERGE INTO mytable AS mt USING (
SELECT * FROM TABLE (
VALUES
(123, 'text')
)
) AS vt(id, val) ON (mt.id = vt.id)
WHEN MATCHED THEN
UPDATE SET val = vt.val
WHEN NOT MATCHED THEN
INSERT (id, val) VALUES (vt.id, vt.val)
;
If you have to insert more than one row, the VALUES part can be repeated without having to duplicate the rest.
VALUES
(123, 'text'),
(456, 'more')
The result is a single statement that can INSERT OR UPDATE one or many rows presumably as an atomic operation.
This response is to hopefully fully answer the query MrSimpleMind had in use-update-and-insert-in-same-query and to provide a working simple example of the DB2 MERGE statement with a scenario of inserting AND updating in one go (record with ID 2 is updated and record ID 3 inserted).
CREATE TABLE STAGE.TEST_TAB ( ID INTEGER, DATE DATE, STATUS VARCHAR(10) );
COMMIT;
INSERT INTO TEST_TAB VALUES (1, '2013-04-14', NULL), (2, '2013-04-15', NULL); COMMIT;
MERGE INTO TEST_TAB T USING (
SELECT
3 NEW_ID,
CURRENT_DATE NEW_DATE,
'NEW' NEW_STATUS
FROM
SYSIBM.DUAL
UNION ALL
SELECT
2 NEW_ID,
NULL NEW_DATE,
'OLD' NEW_STATUS
FROM
SYSIBM.DUAL
) AS S
ON
S.NEW_ID = T.ID
WHEN MATCHED THEN
UPDATE SET
(T.STATUS) = (S.NEW_STATUS)
WHEN NOT MATCHED THEN
INSERT
(T.ID, T.DATE, T.STATUS) VALUES (S.NEW_ID, S.NEW_DATE, S.NEW_STATUS);
COMMIT;
Another way is to execute these two queries, which is simpler than writing a MERGE statement:
update TABLE_NAME set FIELD_NAME=xxxxx where MyId=xxxx;
INSERT INTO TABLE_NAME (MyField1,MyField2)
SELECT xxx,xxxxx FROM SYSIBM.SYSDUMMY1
WHERE NOT EXISTS(select 1 from TABLE_NAME where MyId=xxxx);
The first query just updates the field you need, if the MyId exists.
The second inserts the row into the db if MyId does not exist.
The result is that both queries run, but only one of them changes any data.
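The update-then-insert pair can be sketched from client code as follows. This is a minimal sketch under stated assumptions: Python's built-in sqlite3 module stands in for DB2, the table and column names (table_name, my_id, field_name) are illustrative, and the helper name update_then_insert is hypothetical.

```python
import sqlite3


def update_then_insert(conn, my_id, value):
    """Run an UPDATE followed by a guarded INSERT in one transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        # Changes a row only if my_id already exists.
        conn.execute(
            "UPDATE table_name SET field_name = ? WHERE my_id = ?",
            (value, my_id),
        )
        # Adds a row only if my_id does not exist yet.
        conn.execute(
            """
            INSERT INTO table_name (my_id, field_name)
            SELECT ?, ?
            WHERE NOT EXISTS (SELECT 1 FROM table_name WHERE my_id = ?)
            """,
            (my_id, value, my_id),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (my_id INTEGER PRIMARY KEY, field_name TEXT)")

update_then_insert(conn, 1, "a")  # no row yet: the INSERT takes effect
update_then_insert(conn, 1, "b")  # row exists: the UPDATE takes effect
update_then_insert(conn, 2, "c")
result = conn.execute(
    "SELECT my_id, field_name FROM table_name ORDER BY my_id"
).fetchall()
```

Two sessions running this at the same time can still both attempt the INSERT, so the duplicate-key error mentioned in the question remains possible and should be handled.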
I started with a Hibernate project, where Hibernate lets you call saveOrUpdate().
When I converted that project to plain JDBC, the problem was with save and update:
I wanted to save and update in a single statement using JDBC.
So I did some research and came across ON DUPLICATE KEY UPDATE:
String sql = "INSERT INTO tblstudent (firstName, lastName, gender) VALUES (?, ?, ?) "
        + "ON DUPLICATE KEY UPDATE "
        + "firstName = VALUES(firstName), "
        + "lastName = VALUES(lastName), "
        + "gender = VALUES(gender)";
The issue with the above code was that it updated the primary key twice, which is consistent with the MySQL documentation:
The affected rows is just a return code. 1 row means you inserted, 2 means you updated, 0 means nothing happened.
I introduced id and incremented it by 1 myself, so now I was incrementing the value of id and not MySQL.
String sql = "INSERT INTO tblstudent (id, firstName, lastName, gender) VALUES (?, ?, ?, ?) "
        + "ON DUPLICATE KEY UPDATE "
        + "id = id + 1, "
        + "firstName = VALUES(firstName), "
        + "lastName = VALUES(lastName), "
        + "gender = VALUES(gender)";
The above code worked for me for both insert and update.
Hope it works for you as well.