INSERT into a table from SELECT only if value doesn't exist - sql

I want to insert table1.id into table2.t1col only if table1.id doesn't exist in table2.t1col, yet.
I think I have to use :
insert into table2 name (t1col) value (select id from table1)
but I want to add only if that id doesn't exist in table2 already.

A unique/index constraint guarantees the uniqueness of values. So, it is recommended.
Unfortunately, though, a constraint violation causes the entire insert to fail. So, you can do:
insert into table2(t1col)
select id
from table1 t1
where not exists (select 1 from table2 t2 where t2.t1col = t1.id);
You should also have a unique index/constraint to prevent problems in the future.

If you have a UNIQUE or PRIMARY KEY constraint on table2.t1col, like you most probably should, there is a more elegant solution for Postgres 9.5 (currently beta, to be released real soon now). Use the new UPSERT implementation INSERT ... ON CONFLICT DO NOTING. Quoting the manual:
The optional ON CONFLICT clause specifies an alternative action to
raising a unique violation or exclusion constraint violation error.
For each individual row proposed for insertion, either the insertion
proceeds, or, if an arbiter constraint or index specified by
conflict_target is violated, the alternative conflict_action is taken.
ON CONFLICT DO NOTHING simply avoids inserting a row as its alternative action.
Bold emphasis mine.
So you can simply:
INSERT INTO table2(t1col)
SELECT id FROM table1
ON CONFLICT DO NOTHING;
If table1.id is not defined unique, make it unique:
INSERT INTO table2(t1col)
SELECT DISTINCT id FROM table1
ON CONFLICT DO NOTHING;
For Postgres 9.4 you can find a overview of techniques here:
Select rows which are not present in other table.

You can use a Unique Index for preventing duplicated rows.
But if you want to insert rows filtering it to not insert duplicated rows, you can do this.
INSERT INTO table2 (idcol)
SELECT id FROM table1
EXCEPT
SELECT idcol FROM table2;

Create a unique index on that column and be done with it. No need to check.
CREATE UNIQUE INDEX name ON table (column [, ...]);
http://www.postgresql.org/docs/9.4/static/indexes-unique.html

Use this query
INSERT INTO table2 name (t1col) value
(
SELECT t1.id FROM table1 t1, table2 t2
WHERE t1.id <> t2.id
)

Related

Enforce inserting a record into a secondary table every time a record is inserted into a primary table

In a SQL Server 2008 R2 setup, I have table1 (id, col1-1, col1-2, ...) and table2 (id, table1_id, col2-1, col2-2, ...), where table2.table1_id points to table1.id (i.e. a foreign key to primary key relation).
How do I enforce the rule that every time somebody insert a record into table1, they are also required to create a record into table2?
I would guess that the obvious mechanism to use here are triggers, but this can lead to the chicken and the egg problem since the trigger will require a record in table2 in order to create a record in table1, but in order to create a record in table2 first I need the table1.id value to already exists. How to overcome this?
I need the enforcement to be automatic and built-in into the database. A frequently running job to check for records in table1 without corresponding records in table2 will not make the cut and the rule must be enforced at real time when somebody is actually creating a record in table1.
You could simply use OUTPUT clause to do that. Here is a simple example
CREATE TABLE Parent(Id INT, Col VARCHAR(45));
CREATE TABLE Child(Id INT, Col VARCHAR(45));
INSERT INTO Parent(Id, Col)
OUTPUT INSERTED.Id, INSERTED.Col INTO Child
VALUES(1, 'SomeValue');
SELECT *
FROM Parent P
JOIN Child C ON P.Id = C.Id;
Note that the extended support for SQL Server 2008/2008 R2 ends on July 9, 2019. You should upgrade as soon as possible.
Chicken and the egg problem is preventable, but I tested on Windows SQL 2016:
CREATE TRIGGER [dbo].[Table1_Insert]
ON [dbo].[Table1RR] AFTER INSERT
AS
IF rowcount_big() = 0 RETURN --leave trigger if no record (-s) inserted in the Table1
SET NOCOUNT ON
ALTER TABLE dbo.Table2RR NOCHECK CONSTRAINT FK_Table2_Table1 -- resolve chicken and the egg problem
INSERT INTO dbo.Table2RR select id,null,null from inserted order by id
ALTER TABLE dbo.Table2RR CHECK CONSTRAINT FK_Table2_Table1
GO
ALTER TABLE [dbo].[Table1RR] ENABLE TRIGGER [Table1_Insert]
GO

PostgreSQL insert multiple on conflict targets [duplicate]

I have two columns in table col1, col2, they both are unique indexed (col1 is unique and so is col2).
I need at insert into this table, use ON CONFLICT syntax and update other columns, but I can't use both column in conflict_targetclause.
It works:
INSERT INTO table
...
ON CONFLICT ( col1 )
DO UPDATE
SET
-- update needed columns here
But how to do this for several columns, something like this:
...
ON CONFLICT ( col1, col2 )
DO UPDATE
SET
....
ON CONFLICT requires a unique index* to do the conflict detection. So you just need to create a unique index on both columns:
t=# create table t (id integer, a text, b text);
CREATE TABLE
t=# create unique index idx_t_id_a on t (id, a);
CREATE INDEX
t=# insert into t values (1, 'a', 'foo');
INSERT 0 1
t=# insert into t values (1, 'a', 'bar') on conflict (id, a) do update set b = 'bar';
INSERT 0 1
t=# select * from t;
id | a | b
----+---+-----
1 | a | bar
* In addition to unique indexes, you can also use exclusion constraints. These are a bit more general than unique constraints. Suppose your table had columns for id and valid_time (and valid_time is a tsrange), and you wanted to allow duplicate ids, but not for overlapping time periods. A unique constraint won't help you, but with an exclusion constraint you can say "exclude new records if their id equals an old id and also their valid_time overlaps its valid_time."
A sample table and data
CREATE TABLE dupes(col1 int primary key, col2 int, col3 text,
CONSTRAINT col2_unique UNIQUE (col2)
);
INSERT INTO dupes values(1,1,'a'),(2,2,'b');
Reproducing the problem
INSERT INTO dupes values(3,2,'c')
ON CONFLICT (col1) DO UPDATE SET col3 = 'c', col2 = 2
Let's call this Q1. The result is
ERROR: duplicate key value violates unique constraint "col2_unique"
DETAIL: Key (col2)=(2) already exists.
What the documentation says
conflict_target can perform unique index inference. When performing
inference, it consists of one or more index_column_name columns and/or
index_expression expressions, and an optional index_predicate. All
table_name unique indexes that, without regard to order, contain
exactly the conflict_target-specified columns/expressions are inferred
(chosen) as arbiter indexes. If an index_predicate is specified, it
must, as a further requirement for inference, satisfy arbiter indexes.
This gives the impression that the following query should work, but it does not because it would actually require a together unique index on col1 and col2. However such an index would not guarantee that col1 and col2 would be unique individually which is one of the OP's requirements.
INSERT INTO dupes values(3,2,'c')
ON CONFLICT (col1,col2) DO UPDATE SET col3 = 'c', col2 = 2
Let's call this query Q2 (this fails with a syntax error)
Why?
Postgresql behaves this way is because what should happen when a conflict occurs on the second column is not well defined. There are number of possibilities. For example in the above Q1 query, should postgresql update col1 when there is a conflict on col2? But what if that leads to another conflict on col1? how is postgresql expected to handle that?
A solution
A solution is to combine ON CONFLICT with old fashioned UPSERT.
CREATE OR REPLACE FUNCTION merge_db(key1 INT, key2 INT, data TEXT) RETURNS VOID AS
$$
BEGIN
LOOP
-- first try to update the key
UPDATE dupes SET col3 = data WHERE col1 = key1 and col2 = key2;
IF found THEN
RETURN;
END IF;
-- not there, so try to insert the key
-- if someone else inserts the same key concurrently, or key2
-- already exists in col2,
-- we could get a unique-key failure
BEGIN
INSERT INTO dupes VALUES (key1, key2, data) ON CONFLICT (col1) DO UPDATE SET col3 = data;
RETURN;
EXCEPTION WHEN unique_violation THEN
BEGIN
INSERT INTO dupes VALUES (key1, key2, data) ON CONFLICT (col2) DO UPDATE SET col3 = data;
RETURN;
EXCEPTION WHEN unique_violation THEN
-- Do nothing, and loop to try the UPDATE again.
END;
END;
END LOOP;
END;
$$
LANGUAGE plpgsql;
You would need to modify the logic of this stored function so that it updates the columns exactly the way you want it to. Invoke it like
SELECT merge_db(3,2,'c');
SELECT merge_db(1,2,'d');
In nowadays is (seems) impossible. Neither the last version of the ON CONFLICT syntax permits to repeat the clause, nor with CTE is possible: not is possible to breack the INSERT from ON CONFLICT to add more conflict-targets.
If you are using postgres 9.5, you can use the EXCLUDED space.
Example taken from What's new in PostgreSQL 9.5:
INSERT INTO user_logins (username, logins)
VALUES ('Naomi',1),('James',1)
ON CONFLICT (username)
DO UPDATE SET logins = user_logins.logins + EXCLUDED.logins;
Vlad got the right idea.
First you have to create a table unique constraint on the columns col1, col2 Then once you do that you can do the following:
INSERT INTO dupes values(3,2,'c')
ON CONFLICT ON CONSTRAINT dupes_pkey
DO UPDATE SET col3 = 'c', col2 = 2
ON CONFLICT ( col1, col2 )
DO UPDATE
SET
works fine. but you should not update col1, col2 in the SET section.
Create a constraint (foreign index, for example).
OR/AND
Look at existing constraints (\d in psq).
Use ON CONSTRAINT(constraint_name) in the INSERT clause.
You can typically (I would think) generate a statement with only one on conflict that specifies the one and only constraint that is of relevance, for the thing you are inserting.
Because typically, only one constraint is the "relevant" one, at a time. (If many, then I'm wondering if something is weird / oddly-designed, hmm.)
Example:
(License: Not CC0, only CC-By)
// there're these unique constraints:
// unique (site_id, people_id, page_id)
// unique (site_id, people_id, pages_in_whole_site)
// unique (site_id, people_id, pages_in_category_id)
// and only *one* of page-id, category-id, whole-site-true/false
// can be specified. So only one constraint is "active", at a time.
val thingColumnName = thingColumnName(notfificationPreference)
val insertStatement = s"""
insert into page_notf_prefs (
site_id,
people_id,
notf_level,
page_id,
pages_in_whole_site,
pages_in_category_id)
values (?, ?, ?, ?, ?, ?)
-- There can be only one on-conflict clause.
on conflict (site_id, people_id, $thingColumnName) <—— look
do update set
notf_level = excluded.notf_level
"""
val values = List(
siteId.asAnyRef,
notfPref.peopleId.asAnyRef,
notfPref.notfLevel.toInt.asAnyRef,
// Only one of these is non-null:
notfPref.pageId.orNullVarchar,
if (notfPref.wholeSite) true.asAnyRef else NullBoolean,
notfPref.pagesInCategoryId.orNullInt)
runUpdateSingleRow(insertStatement, values)
And:
private def thingColumnName(notfPref: PageNotfPref): String =
if (notfPref.pageId.isDefined)
"page_id"
else if (notfPref.pagesInCategoryId.isDefined)
"pages_in_category_id"
else if (notfPref.wholeSite)
"pages_in_whole_site"
else
die("TyE2ABK057")
The on conflict clause is dynamically generated, depending on what I'm trying to do. If I'm inserting a notification preference, for a page — then there can be a unique conflict, on the site_id, people_id, page_id constraint. And if I'm configuring notification prefs, for a category — then instead I know that the constraint that can get violated, is site_id, people_id, category_id.
So I can, and fairly likely you too, in your case?, generate the correct on conflict (... columns ), because I know what I want to do, and then I know which single one of the many unique constraints, is the one that can get violated.
Kind of hacky but I solved this by concatenating the two values from col1 and col2 into a new column, col3 (kind of like an index of the two) and compared against that. This only works if you need it to match BOTH col1 and col2.
INSERT INTO table
...
ON CONFLICT ( col3 )
DO UPDATE
SET
-- update needed columns here
Where col3 = the concatenation of the values from col1 and col2.
I get I am late to the party but for the people looking for answers I found this:
here
INSERT INTO tbl_Employee
VALUES (6,'Noor')
ON CONFLICT (EmpID,EmpName)
DO NOTHING;
ON CONFLICT is very clumsy solution, run
UPDATE dupes SET key1=$1, key2=$2 where key3=$3
if rowcount > 0
INSERT dupes (key1, key2, key3) values ($1,$2,$3);
works on Oracle, Postgres and all other database

postgresql insert into from select

I have two tables table1 and test_table1 which have the same schema.
Both tables have rows/data and pk id's starting from 1.
I would like to do:
insert into test_table1 select * from table1;
but this fails due to the pk values from table1 existing in test_table1.
Way around it would be to specify columns and leave the pk column out, but for some reason thats not working either:
e.g.
NOTE - no pk columns in query below
insert into test_table1 (col1, col2,..., coln) select col1,col2,...,coln from table1;
returns
ERROR: duplicate key value violates unique constraint "test_table1_pkey"
DETAIL: Key (id)=(1) already exists.
I know this works in MySql, is this just due to Postgresql? Anyway around it?
EDIT:
Both tables have primary keys and sequence set.
Since it wasn't clear - tables don't have the same data.
I would just like to add rows from table1 to test_table1.
For answers telling me to exclude primary key from the query - I did as I said before.
Just remove pk column from columns of query
insert into test_table1 (col2,..., coln) select col2,...,coln from table1;
If it still fails maybe you have not sequence on pk columns.
Create sequence on already existing pk column
create sequence test_table1_seq;
ALTER TABLE test_table1
ALTER COLUMN col1 SET DEFAULT nextval('test_table1_seq'::regclass);
And update sequence value to current
SELECT setval('test_table1_seq', (SELECT MAX(col1) FROM test_table1));
This post helped me solve my problem, not sure what went wrong:
How to fix PostgreSQL error "duplicate key violates unique constraint"
If you get this message when trying to insert data into a PostgreSQL database:
ERROR: duplicate key violates unique constraint
That likely means that the primary key sequence in the table you're working with has somehow become out of sync, likely because of a mass import process (or something along those lines). Call it a "bug by design", but it seems that you have to manually reset the a primary key index after restoring from a dump file. At any rate, to see if your values are out of sync, run these two commands:
SELECT MAX(the_primary_key) FROM the_table;
SELECT nextval('the_primary_key_sequence');
If the first value is higher than the second value, your sequence is out of sync. Back up your PG database (just in case), then run thisL
SELECT setval('the_primary_key_sequence', (SELECT MAX(the_primary_key) FROM the_table)+1);
That will set the sequence to the next available value that's higher than any existing primary key in the sequence.
You rather would want to do a UPDATE JOIN like
UPDATE test_table1 AS v
SET col1 = s.col1,
col2 = s.col2,
col3 = s.col3,
.....
colN = s.colN
FROM table1 AS s
WHERE v.id = s.id;
what you want to do is an upsert.
with upsert as (
update test_table1 tt
set col1 = t.col1,
col2 = t.col2,
col3 = t.col3
from table1 t
where t.id = tt.id
returning *
)
insert into test_table1(id, col1, col2, col3)
select id, col1,col2,col3
from table1
where not exists (select * from upsert)

What happens with duplicates when inserting multiple rows?

I am running a python script that inserts a large amount of data into a Postgres database, I use a single query to perform multiple row inserts:
INSERT INTO table (col1,col2) VALUES ('v1','v2'),('v3','v4') ... etc
I was wondering what would happen if it hits a duplicate key for the insert. Will it stop the entire query and throw an exception? Or will it merely ignore the insert of that specific row and move on?
The INSERT will just insert all rows and nothing special will happen, unless you have some kind of constraint disallowing duplicate / overlapping values (PRIMARY KEY, UNIQUE, CHECK or EXCLUDE constraint) - which you did not mention in your question. But that's what you are probably worried about.
Assuming a UNIQUE or PK constraint on (col1,col2), you are dealing with a textbook UPSERT situation. Many related questions and answers to find here.
Generally, if any constraint is violated, an exception is raised which (unless trapped in subtransaction like it's possible in a procedural server-side language like plpgsql) will roll back not only the statement, but the whole transaction.
Without concurrent writes
I.e.: No other transactions will try to write to the same table at the same time.
Exclude rows that are already in the table with WHERE NOT EXISTS ... or any other applicable technique:
Select rows which are not present in other table
And don't forget to remove duplicates within the inserted set as well, which would not be excluded by the semi-anti-join WHERE NOT EXISTS ...
One technique to deal with both at once would be EXCEPT:
INSERT INTO tbl (col1, col2)
VALUES
(text 'v1', text 'v2') -- explicit type cast may be needed in 1st row
, ('v3', 'v4')
, ('v3', 'v4') -- beware of dupes in source
EXCEPT SELECT col1, col2 FROM tbl;
EXCEPT without the key word ALL folds duplicate rows in the source. If you know there are no dupes, or you don't want to fold duplicates silently, use EXCEPT ALL (or one of the other techniques). See:
Using EXCEPT clause in PostgreSQL
Generally, if the target table is big, WHERE NOT EXISTS in combination with DISTINCT on the source will probably be faster:
INSERT INTO tbl (col1, col2)
SELECT *
FROM (
SELECT DISTINCT *
FROM (
VALUES
(text 'v1', text'v2')
, ('v3', 'v4')
, ('v3', 'v4') -- dupes in source
) t(c1, c2)
) t
WHERE NOT EXISTS (
SELECT FROM tbl
WHERE col1 = t.c1 AND col2 = t.c2
);
If there can be many dupes, it pays to fold them in the source first. Else use one subquery less.
Related:
Select rows which are not present in other table
With concurrent writes
Use the Postgres UPSERT implementation INSERT ... ON CONFLICT ... in Postgres 9.5 or later:
INSERT INTO tbl (col1,col2)
SELECT DISTINCT * -- still can't insert the same row more than once
FROM (
VALUES
(text 'v1', text 'v2')
, ('v3','v4')
, ('v3','v4') -- you still need to fold dupes in source!
) t(c1, c2)
ON CONFLICT DO NOTHING; -- ignores rows with *any* conflict!
Further reading:
How to use RETURNING with ON CONFLICT in PostgreSQL?
How do I insert a row which contains a foreign key?
Documentation:
The manual
The commit page
The Postgres Wiki page
Craig's reference answer for UPSERT problems:
How to UPSERT (MERGE, INSERT ... ON DUPLICATE UPDATE) in PostgreSQL?
Will it stop the entire query and throw an exception? Yes.
To avoid that, you can look on the following SO question here, which describes how to avoid Postgres from throwing an error for multiple inserts when some of the inserted keys already exist on the DB.
You should basically do this:
INSERT INTO DBtable
(id, field1)
SELECT 1, 'value'
WHERE
NOT EXISTS (
SELECT id FROM DBtable WHERE id = 1
);

Duplicate key error with PostgreSQL INSERT with subquery

There are some similar questions on StackOverflow, but they don't seem to exactly match my case. I am trying to bulk insert into a PostgreSQL table with composite unique constraints. I created a temporary table (temptable) without any constraints, and loaded the data (with possible some duplicate values) in it. So far, so good.
Now, I am trying to transfer the data to the actual table (realtable) with unique index. For this, I used an INSERT statement with a subquery:
INSERT INTO realtable
SELECT * FROM temptable WHERE NOT EXISTS (
SELECT 1 FROM realtable WHERE temptable.added_date = realtable.added_date
AND temptable.product_name = realtable.product_name
);
However, I am getting duplicate key errors:
ERROR: duplicate key value violates unique constraint "realtable_added_date_product_name_key"
SQL state: 23505
Detail: Key (added_date, product_name)=(20000103, TEST) already exists.
My question is, shouldn't the WHERE NOT EXISTS clause prevent this from happening? How can I fix it?
The NOT EXISTS clause only prevents rows from temptable conflicting with existing rows from realtable; it will not prevent multiple rows from temptable from conflicting with each other. This is because the SELECT is calculated once based on the initial state of realtable, not re-calculated after each row is inserted.
One solution would be to use a GROUP BY or DISTINCT ON in the SELECT query, to omit duplicates, e.g.
INSERT INTO realtable
SELECT DISTINCT ON (added_date, product_name) *
FROM temptable WHERE NOT EXISTS (
SELECT 1 FROM realtable WHERE temptable.added_date = realtable.added_date
AND temptable.product_name = realtable.product_name
)
ORDER BY ???; -- this ORDER BY will determine which of a set of duplicates is kept by the DISTINCT ON