PostgreSQL: Insert if not exists and then Select

Question
Imagine having the following PostgreSQL table:
CREATE TABLE setting (
user_id bigint PRIMARY KEY NOT NULL,
language lang NOT NULL DEFAULT 'english',
foo bool NOT NULL DEFAULT true,
bar bool NOT NULL DEFAULT true
);
From my research, I know that inserting a row with the default values, if no row exists for the specific user, would look something like this:
INSERT INTO setting (user_id)
SELECT %s
WHERE NOT EXISTS (SELECT 1 FROM setting WHERE user_id = %s)
(where each %s is a placeholder for the user's ID)
I also know that to get the user's setting (i.e. to SELECT), I can do the following:
SELECT * FROM setting WHERE user_id = %s
However, I am trying to combine the two: retrieve the user's setting and, if no setting exists for that user yet, INSERT the default values and return them.
Example
So it would look something like this:
Imagine Alice has her setting already saved in the database but Bob is a new user and does not have it.
When we execute the magical SQL query with Alice's user ID, it will return Alice's setting stored in the database. If we execute the same magical SQL query with Bob's user ID, it will detect that Bob does not have any setting saved in the database, INSERT a setting record with all default values, and then return Bob's newly created setting.

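Given that there is a UNIQUE or PK constraint on user_id (as Frank Heikens said), try to insert; if that violates the constraint, do nothing. The t CTE returns the inserted row (if any); union it with a 'proper' select and pick the first row only. The second select runs against the same snapshot as the insert, so it cannot see a row inserted by t; for a new user the row therefore comes from t, and for an existing user it comes from the select. Because of LIMIT 1, no extra select is done if the insert returns a row.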
with t as
(
insert into setting (user_id) values (%s)
on conflict do nothing
returning *
)
select * from t
union all
select * from setting where user_id = %s
limit 1;
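A minimal, self-contained sketch of the ON CONFLICT DO NOTHING + UNION ALL pattern above, runnable in psql. Assumptions: the %s placeholders are replaced by literal IDs, user 1 plays "Alice" and user 2 plays "Bob" (hypothetical data), and language is plain text because the definition of the lang type is not shown.

```sql
-- Assumption: language is text here; the original lang type is not shown.
DROP TABLE IF EXISTS setting CASCADE;
CREATE TABLE setting (
    user_id  bigint PRIMARY KEY NOT NULL,
    language text NOT NULL DEFAULT 'english',
    foo      bool NOT NULL DEFAULT true,
    bar      bool NOT NULL DEFAULT true
);

-- Hypothetical data: "Alice" (user_id 1) already has non-default settings.
INSERT INTO setting (user_id, language, foo) VALUES (1, 'french', false);

-- Alice: the INSERT conflicts on the primary key, inserts nothing and
-- returns no row, so the row comes from the second SELECT.
WITH t AS (
    INSERT INTO setting (user_id) VALUES (1)
    ON CONFLICT DO NOTHING
    RETURNING *
)
SELECT * FROM t
UNION ALL
SELECT * FROM setting WHERE user_id = 1
LIMIT 1;

-- "Bob" (user_id 2): the INSERT succeeds and RETURNING supplies the new
-- row with all default values; LIMIT 1 stops before the second SELECT.
WITH t AS (
    INSERT INTO setting (user_id) VALUES (2)
    ON CONFLICT DO NOTHING
    RETURNING *
)
SELECT * FROM t
UNION ALL
SELECT * FROM setting WHERE user_id = 2
LIMIT 1;
```

The first query returns Alice's stored row (language 'french'); the second creates and returns Bob's row with the defaults.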

No magic necessary. Use returning and union all:
with inparms as ( -- Put your input parameters in CTE so you bind only once
select %s::bigint as user_id
), cond_insert as ( -- Insert the record if not exists, returning *
insert into setting (user_id)
select i.user_id
from inparms i
where not exists (select 1 from setting where user_id = i.user_id)
returning *
)
select * -- If a record was inserted, get it
from cond_insert
union all
select s.* -- If not, then get the pre-existing record
from inparms i
join setting s on s.user_id = i.user_id;

Related

postgres insert into multiple tables after each other and return everything

Given postgres database with 3 tables:
users(user_id: uuid, ...)
urls(slug_id:int8 pkey, slug:text unique not null, long_url:text not null)
userlinks(user_id:fkey users.user_id, slug_id:fkey urls.slug_id)
pkey(user_id, slug_id)
The userlinks table exists as a cross reference to associate url slugs to one or more users.
When a user creates a new slug, I'd like to INSERT into the urls table, take the slug_id that was created there, and INSERT into userlinks with the current user's ID and that slug_id.
Then, if possible, return both results as a table of records.
The current user's ID is accessible with auth.uid().
I'm doing this with a stored procedure in Supabase.
I've gotten this far but I'm stuck:
WITH urls_row as (
INSERT INTO urls(slug, long_url)
VALUES ('testslug2', 'testlong_url2')
RETURNING slug_id
)
INSERT INTO userlinks(user_id, slug_id)
VALUES (auth.uid(), urls_row.slug_id)
--RETURNING *
--RETURNING (urls_record, userlinks_record)
Try this:
WITH urls_row as (
INSERT INTO urls(slug, long_url)
VALUES ('testslug2', 'testlong_url2')
RETURNING * -- return the whole urls record, not just slug_id
), userlink_row AS (
INSERT INTO userlinks(user_id, slug_id)
SELECT auth.uid(), urls_row.slug_id
FROM urls_row
RETURNING *
)
SELECT *
FROM urls_row AS ur
INNER JOIN userlink_row AS us
ON ur.slug_id = us.slug_id
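A self-contained sketch of the chained-CTE approach outside Supabase. Assumptions: auth.uid() is replaced by a fixed, hypothetical UUID literal, the tables keep only the columns the question mentions, and urls.slug_id gets its values from a bigserial default (the question only says int8 pkey). RETURNING * in the first CTE returns the whole urls record, so the final SELECT can show columns from both tables.

```sql
DROP TABLE IF EXISTS userlinks CASCADE;
DROP TABLE IF EXISTS urls CASCADE;
DROP TABLE IF EXISTS users CASCADE;

CREATE TABLE users (user_id uuid PRIMARY KEY);
CREATE TABLE urls (
    slug_id  bigserial PRIMARY KEY,   -- assumption: int8 with a sequence default
    slug     text UNIQUE NOT NULL,
    long_url text NOT NULL
);
CREATE TABLE userlinks (
    user_id uuid   NOT NULL REFERENCES users (user_id),
    slug_id bigint NOT NULL REFERENCES urls (slug_id),
    PRIMARY KEY (user_id, slug_id)
);

-- Hypothetical stand-in for the user that auth.uid() would return.
INSERT INTO users (user_id) VALUES ('00000000-0000-0000-0000-000000000001');

WITH urls_row AS (
    INSERT INTO urls (slug, long_url)
    VALUES ('testslug2', 'testlong_url2')
    RETURNING *                          -- the whole new urls record
), userlink_row AS (
    INSERT INTO userlinks (user_id, slug_id)
    SELECT '00000000-0000-0000-0000-000000000001'::uuid, ur.slug_id
    FROM urls_row AS ur                  -- reading the CTE supplies slug_id
    RETURNING *
)
SELECT ur.slug_id, ur.slug, ur.long_url, ul.user_id
FROM urls_row AS ur
JOIN userlink_row AS ul ON ul.slug_id = ur.slug_id;
```

The key change from the question's attempt is INSERT ... SELECT ... FROM urls_row: a CTE can only be referenced in a FROM clause, not directly in a VALUES list.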

Postgres upsert without incrementing serial IDs?

Consider the following table:
CREATE TABLE key_phrase(
id SERIAL PRIMARY KEY NOT NULL,
body TEXT UNIQUE
)
I'd like to do the following:
Create a record with the given body if it doesn't already exist.
Return the id of the newly created record, or the id of the existing record if a new record was not created.
Ensure the serial id is not incremented on conflicts.
I've tried a few methods, the simplest being basic usage of DO NOTHING:
INSERT INTO key_phrase(body) VALUES ('example') ON CONFLICT DO NOTHING RETURNING id
However, this will only return an id if a new record is created.
I've also tried the following:
WITH ins AS (
INSERT INTO key_phrase (body)
VALUES (:phrase)
ON CONFLICT (body) DO UPDATE
SET body = NULL
WHERE FALSE
RETURNING id
)
SELECT id FROM ins
UNION ALL
SELECT id FROM key_phrase
WHERE body = :phrase
LIMIT 1;
This returns the id of the newly created record or of the existing record. However, every call consumes a value from the serial primary key's sequence, even when the row already exists, so gaps appear whenever a new record is eventually created.
So how can one perform a conditional insert (upsert) that fulfills the 3 requirements mentioned earlier?
I suspect that you are looking for something like:
with
data as (select :phrase as body),
ins as (
insert into key_phrase (body)
select body
from data d
where not exists (select 1 from key_phrase kp where kp.body = d.body)
returning id
)
select id from ins
union all
select kp.id
from key_phrase kp
inner join data d on d.body = kp.body
The main difference from your original code is that this uses not exists, rather than on conflict, to skip phrases that are already stored. I moved the declaration of the parameter to a CTE to make things easier to follow, but it doesn't have to be that way; we could do:
with
ins as (
insert into key_phrase (body)
select body
from (values(:phrase)) d(body)
where not exists (select 1 from key_phrase where body = :phrase)
returning id
)
select id from ins
union all
select id from key_phrase where body = :phrase
Not using on conflict will reduce the number of sequence values that are burned. It should be highlighted, however, that there is no way to guarantee that serials will be consistently sequential; there can be gaps for other reasons, such as rolled-back transactions. This is by design: the purpose of serials is to guarantee uniqueness, not "sequentiality". If you really want an auto-increment with no holes, then consider row_number() and a view.
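A sketch of both points above, with literal phrases in place of :phrase: the NOT EXISTS form consumes no sequence value when the phrase already exists, ON CONFLICT DO NOTHING does, and a view using row_number() presents a gap-free number. The view name key_phrase_numbered is hypothetical.

```sql
DROP TABLE IF EXISTS key_phrase CASCADE;   -- CASCADE also drops the view on re-runs
CREATE TABLE key_phrase (
    id   serial PRIMARY KEY NOT NULL,
    body text UNIQUE
);

-- NOT EXISTS insert, run twice for the same phrase: the second run produces
-- no row, so the id default (nextval) is never evaluated.
INSERT INTO key_phrase (body)
SELECT 'alpha' WHERE NOT EXISTS (SELECT 1 FROM key_phrase WHERE body = 'alpha');
INSERT INTO key_phrase (body)
SELECT 'alpha' WHERE NOT EXISTS (SELECT 1 FROM key_phrase WHERE body = 'alpha');

-- ON CONFLICT evaluates the default before detecting the conflict,
-- so this burns id 2 even though nothing is inserted.
INSERT INTO key_phrase (body) VALUES ('alpha') ON CONFLICT DO NOTHING;

INSERT INTO key_phrase (body) VALUES ('beta');   -- receives id 3

-- Hypothetical view: a number with no holes, computed at query time.
CREATE VIEW key_phrase_numbered AS
SELECT row_number() OVER (ORDER BY id) AS seq_no,
       id,
       body
FROM key_phrase;

SELECT * FROM key_phrase_numbered ORDER BY seq_no;
-- seq_no | id | body
--      1 |  1 | alpha
--      2 |  3 | beta
```

Because seq_no is recomputed on every query, it shifts when rows are deleted, so it suits display or reporting rather than use as a stable key.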

Why PostgreSQL CTE with DELETE is not working?

I was trying to use two CTEs to delete a record from my stock table if the update in the same table results in quantity 0.
The upserts are working, but the delete is not producing the result I expected: the quantity in the stock table changes to zero, but the record is not deleted.
Table structure:
CREATE TABLE IF NOT EXISTS stock_location (
stock_location_id SERIAL
, site_code VARCHAR(10) NOT NULL
, location_code VARCHAR(50) NOT NULL
, status CHAR(1) NOT NULL DEFAULT 'A'
, CONSTRAINT pk_stock_location PRIMARY KEY (stock_location_id)
, CONSTRAINT ui_stock_location__keys UNIQUE (site_code, location_code)
);
CREATE TABLE IF NOT EXISTS stock (
stock_id SERIAL
, stock_location_id INT NOT NULL
, item_code VARCHAR(50) NOT NULL
, quantity FLOAT NOT NULL
, CONSTRAINT pk_stock PRIMARY KEY (stock_id)
, CONSTRAINT ui_stock__keys UNIQUE (stock_location_id, item_code)
, CONSTRAINT fk_stock__stock_location FOREIGN KEY (stock_location_id)
REFERENCES stock_location (stock_location_id)
ON DELETE CASCADE ON UPDATE CASCADE
);
This is what the statement looks like:
WITH stock_location_upsert AS (
INSERT INTO stock_location (
site_code
, location_code
, status
) VALUES (
inSiteCode
, inLocationCode
, inStatus
)
ON CONFLICT ON CONSTRAINT ui_stock_location__keys
DO UPDATE SET
status = inStatus
RETURNING stock_location_id
)
, stock_upsert AS (
INSERT INTO stock (
stock_location_id
, item_code
, quantity
)
SELECT
slo.stock_location_id
, inItemCode
, inQuantity
FROM stock_location_upsert slo
ON CONFLICT ON CONSTRAINT ui_stock__keys
DO UPDATE SET
quantity = stock.quantity + inQuantity
RETURNING stock_id, quantity
)
DELETE FROM stock stk
USING stock_upsert stk2
WHERE stk.stock_id = stk2.stock_id
AND stk.quantity = 0;
Does anyone know what's going on?
This is an example of what I'm trying to do:
DROP TABLE IF EXISTS test1;
CREATE TABLE IF NOT EXISTS test1 (
id serial
, code VARCHAR(10) NOT NULL
, description VARCHAR(100) NOT NULL
, quantity INT NOT NULL
, CONSTRAINT pk_test1 PRIMARY KEY (id)
, CONSTRAINT ui_test1 UNIQUE (code)
);
-- UPSERT
WITH test1_upsert AS (
INSERT INTO test1 (
code, description, quantity
) VALUES (
'01', 'DESC 01', 1
)
ON CONFLICT ON CONSTRAINT ui_test1
DO UPDATE SET
description = 'DESC 02'
, quantity = 0
RETURNING test1.id, test1.quantity
)
DELETE FROM test1
USING test1_upsert
WHERE test1.id = test1_upsert.id
AND test1_upsert.quantity = 0;
The second time the UPSERT command runs, it should delete the record from test1, since the quantity will have been updated to zero.
Makes sense?
Here, DELETE is working the way it was designed to work. The answer is actually pretty straightforward and documented; I experienced the same behaviour years ago.
The reason your DELETE does not remove the data is that its WHERE condition does not match what is stored in the table, as far as the DELETE statement can see.
All sub-statements within a CTE (Common Table Expression) are executed with the same snapshot of data, so they cannot see one another's effects on the target table. In this case, the upsert (INSERT ... ON CONFLICT DO UPDATE) and the DELETE see the same data, and the DELETE does not see the quantity that the upsert just wrote.
How can you work around that? You need to separate the UPDATE and the DELETE into two independent statements.
If you need to pass on information about what to delete, you could, for example: (1) create a temporary table and insert the primary keys of the updated rows, so that your later query can join to it (DELETE based on data that was UPDATEd); (2) achieve the same result by adding a column to the updated table and changing its value to mark updated rows; or (3) use any other approach that gets the job done. The examples above should give you a feel for what needs to be done.
Quoting the manual to support my findings:
7.8.2. Data-Modifying Statements in WITH
The sub-statements in WITH are executed concurrently with each other and with the main query. Therefore, when using data-modifying statements in WITH, the order in which the specified updates actually happen is unpredictable. All the statements are executed with the same snapshot (see Chapter 13), so they cannot “see” one another's effects on the target tables.
(...)
This also applies to deleting a row that was already updated in the same statement: only the update is performed
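A minimal sketch of the "two independent statements" workaround, using the question's test1 table. Within one transaction, a later statement always sees the changes made by earlier statements, so the DELETE sees quantity = 0. Identifying the row by code = '01' is an assumption that fits this test table; in the stock function you would more likely capture stock_id with RETURNING ... INTO a PL/pgSQL variable and delete by that.

```sql
DROP TABLE IF EXISTS test1 CASCADE;
CREATE TABLE test1 (
    id          serial,
    code        varchar(10)  NOT NULL,
    description varchar(100) NOT NULL,
    quantity    int          NOT NULL,
    CONSTRAINT pk_test1 PRIMARY KEY (id),
    CONSTRAINT ui_test1 UNIQUE (code)
);

-- Stand-in for the first run of the upsert: the row now exists with quantity 1.
INSERT INTO test1 (code, description, quantity) VALUES ('01', 'DESC 01', 1);

BEGIN;

-- Statement 1: the upsert on its own; the conflict path sets quantity to 0.
INSERT INTO test1 (code, description, quantity)
VALUES ('01', 'DESC 01', 1)
ON CONFLICT ON CONSTRAINT ui_test1
DO UPDATE SET description = 'DESC 02', quantity = 0;

-- Statement 2: a separate statement sees the upsert's result,
-- so quantity = 0 matches and the row is deleted.
DELETE FROM test1
WHERE code = '01'
  AND quantity = 0;

COMMIT;
```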
Adding to the helpful explanation above: whenever possible, it is best to break data-modifying operations out into their own statements.
However, when the CTE has multiple data-modifying statements that reference the same subquery, and temporary tables are not ideal (such as in stored procedures), you need another solution.
In that case, if you'd like a simple trick for enforcing a bit of order, consider this example:
WITH
to_insert AS
(
SELECT
*
FROM new_values
)
, first AS
(
DELETE FROM some_table
WHERE
id in (SELECT id FROM to_insert)
RETURNING *
)
INSERT INTO some_other_table
SELECT * FROM new_values
WHERE
exists (SELECT count(*) FROM first)
;
The trick here is the exists (SELECT count(*) FROM first) part, which must be evaluated before the insert can happen. Since count(*) without GROUP BY always returns exactly one row, the condition itself is always true; its only purpose is to make the INSERT depend on first. This is a way (which I wouldn't consider too hacky) to enforce an order while keeping everything within one CTE.
But this is just the concept; there are more optimal ways of doing the same thing for a given context.

Returning row if value exists, inserting otherwise

I have a table that is created like so:
CREATE TABLE IF NOT EXISTS my_table (
p_id uuid DEFAULT uuid_generate_v4() NOT NULL,
my_column varchar(250) NOT NULL,
PRIMARY KEY (p_id),
CONSTRAINT unique_mycolumn UNIQUE (my_column)
);
What I would like to do is return p_id if a particular value matches what is in my_column and, if not, insert that value and return the newly generated p_id. I can "sort of" get around this from within my code by calling the respective queries (query for the value, and if that returns null, insert it), but that's a fairly big race condition and I'd like to do it within SQL. I've seen a lot of similar questions, but this is different given the unique constraint on that column.
EDIT: This is what I'm looking to accomplish: Say I have the following data:
1 | foo
2 | bar
If I query for "bar", I'd like to get 2 back. However, If I query for "baz", I'd like to have the table look like the following:
1 | foo
2 | bar
3 | baz
So I guess it would be better described as "return the ID if the value exists, and otherwise insert the value to create a new row".
You could try something like this:
with ins as (insert into my_table (my_column)
select 'baz' where not exists
(select 1 from my_table where my_column = 'baz')
returning p_id)
select p_id from ins
union all
select p_id from my_table where my_column = 'baz';
Note, however, that this can fail with a unique-constraint violation if two transactions try to insert the value 'baz' at the same time.
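If the race condition matters, one commonly used pattern is a small PL/pgSQL function that retries: SELECT first, then INSERT ... ON CONFLICT DO NOTHING, and loop if another transaction won. This is a sketch, not the answer above. Assumptions: the function name get_or_create_p_id is hypothetical, gen_random_uuid() (built into PostgreSQL 13 and later) stands in for uuid_generate_v4() so the uuid-ossp extension is not needed, and the session uses the default READ COMMITTED isolation level, where each new statement sees rows committed by other transactions.

```sql
DROP TABLE IF EXISTS my_table CASCADE;
CREATE TABLE my_table (
    p_id      uuid DEFAULT gen_random_uuid() NOT NULL,  -- assumption: PG 13+
    my_column varchar(250) NOT NULL,
    PRIMARY KEY (p_id),
    CONSTRAINT unique_mycolumn UNIQUE (my_column)
);
INSERT INTO my_table (my_column) VALUES ('foo'), ('bar');

-- Hypothetical helper: returns the p_id for _val, creating the row if needed.
CREATE OR REPLACE FUNCTION get_or_create_p_id(_val varchar)
  RETURNS uuid
  LANGUAGE plpgsql AS
$$
DECLARE
    _id uuid;
BEGIN
    LOOP
        -- Common case: the value is already there.
        SELECT p_id INTO _id FROM my_table WHERE my_column = _val;
        EXIT WHEN FOUND;

        -- Not found: try to insert. If a concurrent transaction inserts the
        -- same value first, ON CONFLICT waits for it and then inserts nothing.
        INSERT INTO my_table (my_column) VALUES (_val)
        ON CONFLICT ON CONSTRAINT unique_mycolumn DO NOTHING
        RETURNING p_id INTO _id;
        EXIT WHEN FOUND;

        -- Lost the race: loop so the next SELECT (a new statement)
        -- sees the row the other transaction committed.
    END LOOP;
    RETURN _id;
END
$$;

SELECT get_or_create_p_id('bar');  -- existing value: returns its p_id
SELECT get_or_create_p_id('baz');  -- new value: inserts it and returns the new p_id
```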

PostgreSQL: Get last updates by joining 2 tables

I have 2 tables that I need to join to get the last/latest update in the 2nd table based on valid rows in the 1st table.
The code below is an example.
Table 1: Registered users
This table contains a list of users registered in the system.
When a user is registered, a row is added to this table with the user's name and registration time.
A user can get de-registered from the system. When this is done, the de-registration column gets updated to the time that the user was removed. If this value is NULL, it means that the user is still registered.
CREATE TABLE users (
entry_idx SERIAL PRIMARY KEY,
name TEXT NOT NULL,
reg_time TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
dereg_time TIMESTAMP WITH TIME ZONE DEFAULT NULL
);
Table 2: User updates
This table contains updates on the users. Each time a user changes a property (for example, position), the change is stored in this table. No updates may be removed, since there is a requirement to keep history in the table.
CREATE TABLE user_updates (
entry_idx SERIAL PRIMARY KEY,
name TEXT NOT NULL,
position INTEGER NOT NULL,
time TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
Required output
So given the above information, I need to get a new table that contains only the last update for the current registered users.
Test Data
The following data can be used as test data for the above tables:
-- Register 3 users
INSERT INTO users(name) VALUES ('Person1');
INSERT INTO users(name) VALUES ('Person2');
INSERT INTO users(name) VALUES ('Person3');
-- Add some updates for all users
INSERT INTO user_updates(name, position) VALUES ('Person1', 0);
INSERT INTO user_updates(name, position) VALUES ('Person1', 1);
INSERT INTO user_updates(name, position) VALUES ('Person1', 2);
INSERT INTO user_updates(name, position) VALUES ('Person2', 1);
INSERT INTO user_updates(name, position) VALUES ('Person3', 1);
-- Unregister the 2nd user
UPDATE users SET dereg_time = NOW() WHERE name = 'Person2';
From the above, I want the last updates for Person 1 and Person 3.
Failed attempt
I have tried using joins and other methods, but the results are not what I am looking for. The question is almost the same as one asked here. I have used the solution in answer 1 and it does give the correct answer, but it takes too long to get to the answer in my system.
Based on the above link I have created the following query that 'works':
SELECT
t1.*
, t2.*
FROM
users t1
JOIN (
SELECT
t.*,
row_number()
OVER (
PARTITION BY
t.name
ORDER BY t.entry_idx DESC
) rn
FROM user_updates t
) t2
ON
t1.name = t2.name
AND
t2.rn = 1
WHERE
t1.dereg_time IS NULL;
Problem
The problem with the above query is that it takes a very long time to complete. Table 1 contains a small list of users, while table 2 contains a huge number of updates. I think the query might be inefficient in the way it handles the 2 tables (based on my limited understanding of the query). According to pgAdmin's EXPLAIN output, it does a lot of sorting and aggregation on the updates first, before joining with the registered-users table.
Question
How can I write a query that gets the latest updates for registered users quickly and efficiently?
PostgreSQL has a special DISTINCT ON syntax for this type of query:
select distinct on(t1.name)
--it's better to specify columns explicitly, * just for example
t1.*, t2.*
from users as t1
left outer join user_updates as t2 on t2.name = t1.name
where t1.dereg_time is null
order by t1.name, t2.entry_idx desc
You can try it, but in my opinion your query should work fine too.
I am using q1 to get the last update of each user, then joining with users to remove entries that have been deregistered, and then joining with q2 to get the rest of the user_updates fields.
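Since user_updates is large and users is small, one option worth trying is an index on user_updates (name, entry_idx DESC) combined with a LATERAL subquery that fetches only the newest update for each registered user. This is a sketch on the question's test data, not a benchmarked result: the index name is hypothetical, and whether the planner actually uses the index should be checked with EXPLAIN on real data.

```sql
DROP TABLE IF EXISTS user_updates CASCADE;
DROP TABLE IF EXISTS users CASCADE;
CREATE TABLE users (
    entry_idx  SERIAL PRIMARY KEY,
    name       TEXT NOT NULL,
    reg_time   TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    dereg_time TIMESTAMP WITH TIME ZONE DEFAULT NULL
);
CREATE TABLE user_updates (
    entry_idx SERIAL PRIMARY KEY,
    name      TEXT NOT NULL,
    position  INTEGER NOT NULL,
    time      TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- The question's test data.
INSERT INTO users (name) VALUES ('Person1'), ('Person2'), ('Person3');
INSERT INTO user_updates (name, position)
VALUES ('Person1', 0), ('Person1', 1), ('Person1', 2), ('Person2', 1), ('Person3', 1);
UPDATE users SET dereg_time = NOW() WHERE name = 'Person2';

-- Hypothetical index name: lets each per-user lookup below read the
-- newest update straight from the index instead of sorting the table.
CREATE INDEX user_updates_name_entry_idx ON user_updates (name, entry_idx DESC);

-- For each still-registered user, fetch only that user's newest update.
SELECT u.name, u.reg_time, lu.entry_idx, lu.position, lu.time
FROM users AS u
CROSS JOIN LATERAL (
    SELECT uu.entry_idx, uu.position, uu.time
    FROM user_updates AS uu
    WHERE uu.name = u.name
    ORDER BY uu.entry_idx DESC
    LIMIT 1
) AS lu
WHERE u.dereg_time IS NULL
ORDER BY u.name;
```

On the test data this returns Person1 (position 2) and Person3 (position 1). CROSS JOIN LATERAL drops registered users who have no updates; LEFT JOIN LATERAL (...) AS lu ON true keeps them, like the left outer join in the DISTINCT ON query.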
select users.*,q2.* from users
join
(select name,max(time) t from user_updates group by name) q1
on users.name=q1.name
join user_updates q2 on q1.t=q2.time and q1.name=q2.name
where
users.dereg_time is null
(I haven't tested it, and I have edited some things.)