Postgres upsert without incrementing serial IDs? - sql

Consider the following table:
CREATE TABLE key_phrase(
id SERIAL PRIMARY KEY NOT NULL,
body TEXT UNIQUE
)
I'd like to do the following:
Create a record with the given body if it doesn't already exist.
Return the id of the newly created record, or the id of the existing record if a new record was not created.
Ensure the serial id is not incremented on conflicts.
I've tried a few methods, the simplest being basic usage of DO NOTHING:
INSERT INTO key_phrase(body) VALUES ('example') ON CONFLICT DO NOTHING RETURNING id
However, this will only return an id if a new record is created.
I've also tried the following:
WITH ins AS (
INSERT INTO key_phrase (body)
VALUES (:phrase)
ON CONFLICT (body) DO UPDATE
SET body = NULL
WHERE FALSE
RETURNING id
)
SELECT id FROM ins
UNION ALL
SELECT id FROM key_phrase
WHERE body = :phrase
LIMIT 1;
This will return the id of a newly created record or the id of the existing record. However, it bumps the serial primary key on every conflict, which leaves gaps between the ids of newly created records.
So how can one perform a conditional insert (upsert) that fulfills the 3 requirements mentioned earlier?

I suspect that you are looking for something like:
with
data as (select :phrase as body),
ins as (
insert into key_phrase (body)
select body
from data d
where not exists (select 1 from key_phrase kp where kp.body = d.body)
returning id
)
select id from ins
union all
select kp.id
from key_phrase kp
inner join data d on d.body = kp.body
The main difference with your original code is that this uses not exists to skip already inserted phrases rather than on conflict. I moved the declaration of the parameter to a CTE to make things easier to follow, but it doesn't have to be that way, we could do:
with
ins as (
insert into key_phrase (body)
select body
from (values(:phrase)) d(body)
where not exists (select 1 from key_phrase where body = :phrase)
returning id
)
select id from ins
union all
select kp.id from key_phrase kp where kp.body = :phrase
Not using on conflict reduces the number of sequence values that are burned. It should be highlighted, however, that there is no way to guarantee that serials will be consistently sequential. There can be gaps for other reasons (a rolled-back insert, for example, still consumes a value). This is by design; the purpose of serials is to guarantee uniqueness, not sequentiality. If you really want an auto-increment with no holes, then consider row_number() and a view.
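To make the row_number() suggestion concrete, here is a minimal sketch. It runs on Python's built-in sqlite3 module as a stand-in for Postgres (an assumption made only so the example needs no server; it requires an SQLite build with window functions). The view name key_phrase_numbered and the sample phrases are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE key_phrase (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,
        body TEXT UNIQUE
    );
    INSERT INTO key_phrase (body) VALUES ('alpha'), ('beta'), ('gamma');

    -- Deleting a row leaves a hole in the stored ids (1, 3).
    DELETE FROM key_phrase WHERE body = 'beta';

    -- Hypothetical view: seq is computed at query time, so it has no holes.
    CREATE VIEW key_phrase_numbered AS
    SELECT row_number() OVER (ORDER BY id) AS seq, id, body
    FROM key_phrase;
""")

rows = conn.execute(
    "SELECT seq, id, body FROM key_phrase_numbered ORDER BY seq"
).fetchall()
print(rows)
```

Note that seq is recomputed on every query, so the numbers shift when rows are deleted; it is suitable for display, not as a stored key.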

Related

PostgreSQL: Insert if not exist and then Select

Question
Imagine having the following PostgreSQL table:
CREATE TABLE setting (
user_id bigint PRIMARY KEY NOT NULL,
language lang NOT NULL DEFAULT 'english',
foo bool NOT NULL DEFAULT true,
bar bool NOT NULL DEFAULT true
);
From my research, I know that an INSERT of a row with the default values, performed only if no row exists for the specific user, would look something like this:
INSERT INTO setting (user_id)
SELECT %s
WHERE NOT EXISTS (SELECT 1 FROM setting WHERE user_id = %s)
(where the %s are placeholders where I would provide the User's ID)
I also know to get the user's setting (aka to SELECT) I can do the following:
SELECT * FROM setting WHERE user_id = %s
However, I am trying to combine the two, where I can retrieve the user's setting, and if the setting for the particular user does not exist yet, INSERT default values and return those values.
Example
So it would look something like this:
Imagine Alice has her setting already saved in the database but Bob is a new user and does not have it.
When we execute the magical SQL query with Alice's user ID, it will return Alice's setting stored in the database. If we execute the same magical SQL query with Bob's user ID, it will detect that Bob does not have any setting saved in the database, so it will INSERT a setting record with all default values and then return Bob's newly created setting.
Given that there is a UNIQUE or PRIMARY KEY constraint on user_id (as Frank Heikens said), try the insert; if it violates the constraint, do nothing. The t CTE returns the inserted row (if any); union it with a 'proper' select and pick the first row only. The optimizer will take care that no extra select is done if the insert returns a row.
with t as
(
insert into setting (user_id) values (%s)
on conflict do nothing
returning *
)
select * from t
union all
select * from setting where user_id = %s
limit 1;
No magic necessary. Use returning and union all:
with inparms as ( -- Put your input parameters in a CTE so you bind only once
select %s::bigint as user_id
), cond_insert as ( -- Insert the record if it does not exist, returning *
insert into setting (user_id)
select i.user_id
from inparms i
where not exists (select 1 from setting where user_id = i.user_id)
returning *
)
select * -- If a record was inserted, get it
from cond_insert
union all
select s.* -- If not, then get the pre-existing record
from inparms i
join setting s on s.user_id = i.user_id;

Create a record in table A and assign its id to table B

I have a set of companies and for each of them, I need to generate a UUID in another table.
companies table
details_id (currently NULL for all companies)
details table
id
uuid
date
...
I'd like to update all companies with newly created detail like this:
UPDATE companies
SET details_id =
(
INSERT INTO details
(uuid, date)
VALUES (uuid_generate_v1(), now()::date)
RETURNING id
)
But that gives me a syntax error since I can't use INSERT INTO inside UPDATE.
What is a proper way to create a row in the details table and immediately set the newly created id in the companies table?
(I am using Postgres)
You can use a data-modifying CTE:
with new_rows as (
INSERT INTO details ("uuid", "date")
VALUES (uuid_generate_v1(), current_date)
RETURNING id
)
update companies
set details_id = new_rows.id
from new_rows
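One thing to watch: as written, the CTE inserts a single details row, and the update has no join condition, so every company would end up pointing at that same row. If each company needs its own details row, one option is to loop on the client. Below is a minimal sketch of that idea, using Python's built-in sqlite3 as a stand-in for Postgres (an assumption so it runs without a server). The column types, the companies.name column, and the sample company names are hypothetical; uuid.uuid1() plays the role of uuid_generate_v1().

```python
import sqlite3
import uuid
from datetime import date

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE details (id INTEGER PRIMARY KEY, uuid TEXT, date TEXT);
    CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT, details_id INTEGER);
    INSERT INTO companies (name) VALUES ('acme'), ('globex'), ('initech');
""")

with conn:  # one transaction: commit if everything succeeds, else roll back
    company_ids = [
        row[0]
        for row in conn.execute("SELECT id FROM companies WHERE details_id IS NULL")
    ]
    for company_id in company_ids:
        # Create one details row per company ...
        cur = conn.execute(
            "INSERT INTO details (uuid, date) VALUES (?, ?)",
            (str(uuid.uuid1()), date.today().isoformat()),
        )
        # ... and point the company at the row that was just created.
        conn.execute(
            "UPDATE companies SET details_id = ? WHERE id = ?",
            (cur.lastrowid, company_id),
        )

assigned = [row[0] for row in conn.execute("SELECT details_id FROM companies ORDER BY id")]
print(assigned)
```

This costs two statements per company, which is fine for a one-off backfill but slower than a set-based statement on large tables.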

Query ID for name, if not found insert name and return ID

I am struggling to find an efficient way, in a single PostgreSQL statement, to ask for the primary key id where the name is X and, if it is not found, insert name X and return the new primary key id. I'm doing it from Python using psycopg2.
For simplicity, I have a two-column table: a primary key and a character name.
The code below works if it is found and returns me the ID.
SELECT part_id FROM parts WHERE part_name='testing2'
The code below works if it is not found and returns me the new ID
INSERT INTO parts(part_name)
SELECT 'testing2'
WHERE NOT EXISTS (SELECT * FROM parts WHERE part_name='testing2')
RETURNING part_id
I would like to have one call where the server checks everything instead of first asking for the key then checking if it returns anything then sending a new command where I insert the new name and extract the ID for the name.
I simply could not get any of the similar questions online to work. Some of them use IFs, but I could not get anything to work.
Thanks in advance
F
In Postgres you can use CTEs, which can include INSERT:
WITH p AS (
SELECT part_id
FROM parts
WHERE part_name = 'testing2'
),
i AS (
INSERT INTO parts (part_name)
SELECT 'testing2'
WHERE NOT EXISTS (SELECT 1 FROM p)
RETURNING part_id
)
SELECT p.part_id FROM p
UNION ALL
SELECT i.part_id FROM i;
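Since the question mentions calling this from Python, here is a rough sketch of the same get-or-create logic wrapped in a helper. It uses the built-in sqlite3 module as a stand-in for psycopg2 and Postgres (an assumption so the example runs without a server), and SQLite has no data-modifying CTEs, so the insert and the select are two statements inside one transaction instead of one query. The helper name get_or_create_part_id and the UNIQUE constraint on part_name are also assumptions.

```python
import sqlite3

def get_or_create_part_id(conn, name):
    """Return the id for `name`, inserting the row first if it is missing."""
    with conn:  # commit on success, roll back on error
        # Insert only when no row with this name exists yet.
        conn.execute(
            "INSERT INTO parts (part_name) "
            "SELECT :name WHERE NOT EXISTS "
            "(SELECT 1 FROM parts WHERE part_name = :name)",
            {"name": name},
        )
        # Either way, the row is there now, so look up its id.
        row = conn.execute(
            "SELECT part_id FROM parts WHERE part_name = :name",
            {"name": name},
        ).fetchone()
    return row[0]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parts ("
    "part_id INTEGER PRIMARY KEY AUTOINCREMENT, part_name TEXT UNIQUE)"
)

first = get_or_create_part_id(conn, "testing2")  # inserts the row
again = get_or_create_part_id(conn, "testing2")  # finds the existing row
other = get_or_create_part_id(conn, "testing3")  # inserts a second row
print(first, again, other)
```

On Postgres with concurrent sessions, two callers can both pass the NOT EXISTS check for the same name at the same time; with a unique constraint in place one of them then fails with a unique violation, so the caller should be prepared to retry.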

Copy selections from two related tables

I have two tables in a database. In the first table (tab1) I have a list of items. In the second table I have a many to many relationship between these items.
CREATE TABLE tab1(id INTEGER PRIMARY KEY ASC,set INTEGER, name TEXT);
CREATE TABLE tab2(id INTEGER PRIMARY KEY ASC,id1 INTEGER,id2 INTEGER,relationship TEXT);
The items in the first table are comprised of sets that all have the same value for the set field. I want to duplicate any given set with a new set id, such that the new set contains the same elements and relationships of the original set. If all the items in the set have sequential ids, I can do it as follows. First, find the highest id in the set (in this case, set 3):
SELECT id FROM tab1 WHERE set=3 ORDER BY id DESC LIMIT 1
I assign this to a variable $oldid. Next, I duplicate the items in tab1 matching the specified set, giving them a new set (in this case 37)
INSERT INTO tab1 (set,name) SELECT 37, name FROM tab1 WHERE set=3 ORDER BY id ASC
I then get the id of the last row inserted, and assign it to the variable $newid:
SELECT last_insert_rowid()
I then assign $diff = $newid-$oldid. Since the original set has sequential ids, I can simply select the original relationships for set=3, then add the difference:
INSERT INTO tab2 (id1,id2,relationship) SELECT id1+$diff,id2+$diff,relationship FROM tab2 WHERE id1 IN (SELECT id FROM tab1 WHERE set=3)
But this does not work if the set does not consist of sequential ids in tab1. I could query all of the original ids, create a 1:1 mapping to the newly inserted ids for set 37, compute the difference for each row, and then insert the newly computed rows in the table. But this requires loading all the selections to the client and doing all the work on the client. Is there some way to create a query that does it on the server in the general case?
Assuming that (set, name) is a candidate key for tab1, you can use these columns to look up the corresponding values:
INSERT INTO tab2(id1, id2, relationship)
SELECT (SELECT id
FROM tab1
WHERE "set" = 37
AND name = (SELECT name
FROM tab1
WHERE id = tab2.id1)),
(SELECT id
FROM tab1
WHERE "set" = 37
AND name = (SELECT name
FROM tab1
WHERE id = tab2.id2)),
relationship
FROM tab2
WHERE id1 IN (SELECT id
FROM tab1
WHERE "set" = 3)
OR id2 IN (SELECT id
FROM tab1
WHERE "set" = 3)
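To check the approach end to end, here is a small sketch that runs the same query from Python's built-in sqlite3 module on a set whose ids are not sequential. The sample rows, the table aliases n and o, and the named parameters :old and :new are made up for illustration; it also assumes, as the answer does, that (set, name) is unique and that each relationship links two items of the same set.

```python
import sqlite3

COPY_ITEMS = """
INSERT INTO tab1 ("set", name)
SELECT :new, name FROM tab1 WHERE "set" = :old ORDER BY id
"""

# Same shape as the query above: map each old id to the new id via the name.
COPY_RELATIONSHIPS = """
INSERT INTO tab2 (id1, id2, relationship)
SELECT (SELECT n.id FROM tab1 n
        WHERE n."set" = :new
          AND n.name = (SELECT o.name FROM tab1 o WHERE o.id = tab2.id1)),
       (SELECT n.id FROM tab1 n
        WHERE n."set" = :new
          AND n.name = (SELECT o.name FROM tab1 o WHERE o.id = tab2.id2)),
       relationship
FROM tab2
WHERE id1 IN (SELECT id FROM tab1 WHERE "set" = :old)
   OR id2 IN (SELECT id FROM tab1 WHERE "set" = :old)
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (id INTEGER PRIMARY KEY ASC, "set" INTEGER, name TEXT);
    CREATE TABLE tab2 (id INTEGER PRIMARY KEY ASC, id1 INTEGER, id2 INTEGER,
                       relationship TEXT);
    -- Set 3 uses ids 1, 4 and 7, so a constant offset would not work.
    INSERT INTO tab1 (id, "set", name) VALUES
        (1, 3, 'a'), (2, 5, 'x'), (4, 3, 'b'), (7, 3, 'c');
    INSERT INTO tab2 (id1, id2, relationship) VALUES
        (1, 4, 'parent'), (4, 7, 'sibling');
""")

params = {"old": 3, "new": 37}
with conn:  # copy items and relationships in one transaction
    conn.execute(COPY_ITEMS, params)
    conn.execute(COPY_RELATIONSHIPS, params)

copied = conn.execute("""
    SELECT a.name, b.name, r.relationship
    FROM tab2 r
    JOIN tab1 a ON a.id = r.id1
    JOIN tab1 b ON b.id = r.id2
    WHERE a."set" = 37
    ORDER BY r.id
""").fetchall()
print(copied)
```

Everything runs on the database side; the client only supplies the old and new set ids.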

how to delete duplicates from a database table based on a certain field

I have a table that somehow got duplicated. I basically want to delete all records that are duplicates, where a duplicate is defined by a field in my table called SourceId. There should only be one record for each SourceId.
Is there any SQL I can write that will delete every duplicate so that I only have one record per SourceId?
Assuming you have a column ID that can tie-break the duplicate sourceid's, you can use this. Using min(id) causes it to keep just the min(id) per sourceid batch.
delete from tbl
where id NOT in
(
select min(id)
from tbl
group by sourceid
)
delete from table
where pk in (
select i2.pk
from table i1
inner join table i2
on i1.SourceId = i2.SourceId
and i2.pk > i1.pk
)
Good practice is to start with
select * from … and only later replace it with delete from …
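The "select first, delete later" habit can be sketched like this, using Python's built-in sqlite3 module and the min(id) query from the first answer. The sample rows and the payload column are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (id INTEGER PRIMARY KEY, sourceid INTEGER, payload TEXT);
    INSERT INTO tbl (id, sourceid, payload) VALUES
        (1, 10, 'first'), (2, 10, 'dup'), (3, 20, 'only'), (4, 10, 'dup again');
""")

# One shared predicate, so the preview and the delete cannot drift apart.
DUPLICATES = "id NOT IN (SELECT min(id) FROM tbl GROUP BY sourceid)"

# Step 1: preview which rows would be removed.
to_delete = conn.execute(
    f"SELECT id FROM tbl WHERE {DUPLICATES} ORDER BY id"
).fetchall()
print(to_delete)

# Step 2: once the preview looks right, run the delete with the same predicate.
with conn:
    conn.execute(f"DELETE FROM tbl WHERE {DUPLICATES}")

remaining = conn.execute("SELECT id, sourceid FROM tbl ORDER BY id").fetchall()
print(remaining)
```

Keeping the predicate in one place means the rows you previewed are exactly the rows that get deleted.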