Returning row if value exists, inserting otherwise - sql

I have a table that is created like so:
CREATE TABLE IF NOT EXISTS my_table (
p_id uuid DEFAULT uuid_generate_v4() NOT NULL,
my_column varchar(250) NOT NULL,
PRIMARY KEY (p_id),
CONSTRAINT unique_mycolumn UNIQUE (my_column)
);
What I would like to do is return p_id if a particular value matches whatever is in my_column, but if not, insert that particular value and return the newly generated p_id. I can sort of get around this from within my code by first querying for the value and, if that returns null, inserting it, but that leaves a fairly big race condition, and I'd like to do it within SQL. I've seen a lot of similar questions, but this one is different given the unique constraint on that column.
EDIT: This is what I'm looking to accomplish: Say I have the following data:
1 | foo
2 | bar
If I query for "bar", I'd like to get 2 back. However, if I query for "baz", I'd like the table to look like the following:
1 | foo
2 | bar
3 | baz
So I guess it would be better described as "return the ID if the value exists, and otherwise insert the value to create a new row".

You could try something like this:
with ins as (insert into my_table (my_column)
select 'baz' where not exists
(select 1 from my_table where my_column = 'baz')
returning p_id)
select p_id from ins
union all
select p_id from my_table where my_column = 'baz';
Note however that this can fail if two transactions try to insert the value 'baz' at the same time.
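One way to avoid that failure mode is to let the unique constraint arbitrate. The following is a minimal sketch, assuming PostgreSQL 9.5 or later (for ON CONFLICT) and the unique constraint on my_column from the question; it is not the answer's original method. It trades the race for a no-op write on the existing row, which also fires any UPDATE triggers on the table.

```sql
-- Sketch only: requires PostgreSQL 9.5+ and a unique constraint on my_column.
INSERT INTO my_table (my_column)
VALUES ('baz')
ON CONFLICT (my_column) DO UPDATE
    -- No-op assignment: it exists only so that RETURNING also
    -- yields the row that was already present.
    SET my_column = EXCLUDED.my_column
RETURNING p_id;
```

Whether the extra write is acceptable depends on how often the value already exists; for mostly-read workloads the SELECT-first variant above may still be preferable.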

Related

PostgreSQL: Insert if not exist and then Select

Question
Imagine having the following PostgreSQL table:
CREATE TABLE setting (
user_id bigint PRIMARY KEY NOT NULL,
language lang NOT NULL DEFAULT 'english',
foo bool NOT NULL DEFAULT true,
bar bool NOT NULL DEFAULT true
);
From my research, I know that an INSERT of a row with the default values, if the row for the specific user does not exist, would look something like this:
INSERT INTO setting (user_id)
SELECT %s
WHERE NOT EXISTS (SELECT 1 FROM setting WHERE user_id = %s)
(where the %s are placeholders where I would provide the User's ID)
I also know that to get the user's setting (i.e. to SELECT) I can do the following:
SELECT * FROM setting WHERE user_id = %s
However, I am trying to combine the two, where I can retrieve the user's setting, and if the setting for the particular user does not exist yet, INSERT default values and return those values.
Example
So it would look something like this:
Imagine Alice has her setting already saved in the database but Bob is a new user and does not have it.
When we execute the magical SQL query with Alice's user ID, it will return Alice's setting stored in the database. If we execute the same identical magical SQL query on Bob's user ID, it will detect that Bob does not have any setting saved in the database, thus it will INSERT a setting record with all default values, and then return Bob's newly created setting.
Given that there is a UNIQUE or PK constraint on user_id, as Frank Heikens said, try the insert; if it violates the constraint, do nothing. Return the inserted row (if any) from the t CTE, union it with a 'proper' select, and pick the first row only. The optimizer will take care that no extra select is done if the insert returns a row.
with t as
(
insert into setting (user_id) values (%s)
on conflict do nothing
returning *
)
select * from t
union all
select * from setting where user_id = %s
limit 1;
No magic necessary. Use returning and union all:
with inparms as ( -- Put your input parameters in CTE so you bind only once
select %s::bigint as user_id
), cond_insert as ( -- Insert the record if not exists, returning *
insert into setting (user_id)
select i.user_id
from inparms i
where not exists (select 1 from setting where user_id = i.user_id)
returning *
)
select * -- If a record was inserted, get it
from cond_insert
union all
select s.* -- If not, then get the pre-existing record
from inparms i
join setting s on s.user_id = i.user_id;

Postgres upsert without incrementing serial IDs?

Consider the following table:
CREATE TABLE key_phrase(
id SERIAL PRIMARY KEY NOT NULL,
body TEXT UNIQUE
)
I'd like to do the following:
Create a record with the given body if it doesn't already exist.
Return the id of the newly created record, or the id of the existing record if a new record was not created.
Ensure the serial id is not incremented on conflicts.
I've tried a few methods, the most simple including basic usage of DO NOTHING:
INSERT INTO key_phrase(body) VALUES ('example') ON CONFLICT DO NOTHING RETURNING id
However, this will only return an id if a new record is created.
I've also tried the following:
WITH ins AS (
INSERT INTO key_phrase (body)
VALUES (:phrase)
ON CONFLICT (body) DO UPDATE
SET body = NULL
WHERE FALSE
RETURNING id
)
SELECT id FROM ins
UNION ALL
SELECT id FROM key_phrase
WHERE body = :phrase
LIMIT 1;
This will return the id of a newly created record or the id of the existing record. However, it causes the serial primary key to be bumped on every conflict, leaving gaps the next time a new record is created.
So how can one perform a conditional insert (upsert) that fulfills the 3 requirements mentioned earlier?
I suspect that you are looking for something like:
with
data as (select :phrase as body),
ins as (
insert into key_phrase (body)
select body
from data d
where not exists (select 1 from key_phrase kp where kp.body = d.body)
returning id
)
select id from ins
union all
select kp.id
from key_phrase kp
inner join data d on d.body = kp.body
The main difference with your original code is that this uses not exists to skip already inserted phrases rather than on conflict. I moved the declaration of the parameter to a CTE to make things easier to follow, but it doesn't have to be that way; we could do:
with
ins as (
insert into key_phrase (body)
select body
from (values(:phrase)) d(body)
where not exists (select 1 from key_phrase where body = :phrase)
returning id
)
select id from ins
union all
select kp.id from key_phrase kp where kp.body = :phrase
Not using on conflict will reduce the number of sequence values that are burned. It should be highlighted, however, that there is no way to guarantee that serials will be consistently sequential. There could be gaps for other reasons. This is by design; the purpose of serials is to guarantee uniqueness, not sequentiality. If you really want an auto-increment with no holes, then consider row_number() and a view.
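The row_number() suggestion can be sketched as follows. The view name key_phrase_numbered and the column name seq_no are hypothetical, chosen only for illustration; the numbering is computed at query time, so it shifts whenever earlier rows are deleted.

```sql
-- Sketch only: key_phrase_numbered and seq_no are made-up names.
CREATE VIEW key_phrase_numbered AS
SELECT row_number() OVER (ORDER BY id) AS seq_no,  -- gapless 1, 2, 3, ...
       id,                                         -- real key, may have gaps
       body
FROM key_phrase;

-- Usage: read the gapless number, but keep joining on the real id.
SELECT seq_no, id, body
FROM key_phrase_numbered
ORDER BY seq_no;
```

Because seq_no is not stable across deletes, it is suited to display purposes rather than to foreign keys.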

PostgreSQL nested INSERTs / WITHs for foreign key insertions

I'm using PostgreSQL 9.3, and I'm trying to write a SQL script to insert some data for unit tests, and I've run into a bit of a problem.
Let's say we have three tables, structured like this:
------- Table A -------     -------- Table B --------     -------- Table C --------
id  | serial NOT NULL       id   | serial NOT NULL        id   | serial NOT NULL
foo | character varying     a_id | integer NOT NULL       b_id | integer NOT NULL
                            bar  | character varying      baz  | character varying
The columns B.a_id and C.b_id are foreign keys to the id column of tables A and B, respectively.
What I'm trying to do is to insert a row into each of these three tables with pure SQL, without having the IDs hard-coded into the SQL (making assumptions about the database before this script is run seems undesirable, since if those assumptions change I'll have to go back and re-compute the proper IDs for all of the test data).
Note that I do realize I could do this programmatically, but in general writing pure SQL is way less verbose than writing program code to execute SQL, so it makes more sense for test suite data.
Anyway, here's the query I wrote which I figured would work:
WITH X AS (
WITH Y AS (
INSERT INTO A (foo)
VALUES ('abc')
RETURNING id
)
INSERT INTO B (a_id, bar)
SELECT id, 'def'
FROM Y
RETURNING id
)
INSERT INTO C (b_id, baz)
SELECT id, 'ghi'
FROM X;
However, this doesn't work, and results in PostgreSQL telling me:
ERROR: WITH clause containing a data-modifying statement must be at the top level
Is there a correct way to write this type of query in general, without hard-coding the ID values?
(You can find a fiddle here which contains this example.)
Don't nest the common table expressions, just write one after the other:
WITH Y AS (
INSERT INTO A (foo)
VALUES ('abc')
RETURNING id
), x as (
INSERT INTO B (a_id, bar)
SELECT id, 'def'
FROM Y
RETURNING id
)
INSERT INTO C (b_id, baz)
SELECT id, 'ghi'
FROM X;

Problems with a PostgreSQL upsert query

I'm trying to update the database by either updating or inserting a new record into the vote_user_table. The table is defined as follows:
Column | Type | Modifiers
-----------+---------+--------------------------------------------------------------
id | integer | not null default nextval('vote_user_table_id_seq'::regclass)
review_id | integer | not null
user_id | integer | not null
positive | boolean | default false
negative | boolean | default false
Indexes:
"vote_user_table_pkey" PRIMARY KEY, btree (id)
Here's the query I'm using. I'm actually using parameters for all of the values, but for testing purposes I replaced them with values that I'm sure should work. The username aaa#aaa.com exists.
WITH updated AS (
UPDATE vote_user_table
SET
positive = 'true',
negative = 'false'
FROM usuario
WHERE
review_id = 6 AND
user_id = usuario.id AND
usuario.username ILIKE 'aaa#aaa.com'
RETURNING vote_user_table.id
)
INSERT INTO vote_user_table
(review_id, user_id, positive, negative)
SELECT 6, usuario.id, 'true', 'false'
FROM usuario, updated
WHERE
updated.id IS NULL AND
usuario.username ILIKE 'aaa#aaa.com'
The output I obtain after running the query is:
INSERT 0 0
However, it is supposed to insert the new value, because the row doesn't exist in the database yet. When I inserted a row manually, using only the insert clause, and then performed the upsert query shown above, it updated the row correctly.
I'm using PostgreSQL version 9.2.4.
I got the idea of writing this query from this other question:
Insert, on duplicate update in PostgreSQL?
Any ideas on what I could be doing wrong?
Any suggestion on how can I achieve what I want to do?
Create statement for vote_user_table:
CREATE TABLE vote_user_table (
id integer NOT NULL,
review_id integer NOT NULL,
user_id integer NOT NULL,
positive boolean DEFAULT false,
negative boolean DEFAULT false
);
CREATE SEQUENCE vote_user_table_id_seq
START WITH 1
INCREMENT BY 1
NO MINVALUE
NO MAXVALUE
CACHE 1;
ALTER SEQUENCE vote_user_table_id_seq OWNED BY vote_user_table.id;
The UPDATE in the first CTE, updated, produces no row. That means you don't get a NULL value for updated.id either. When joining to updated, you get nothing, so no INSERT happens.
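The effect of joining to an empty set can be seen in isolation. This is a minimal sketch that swaps the real tables for inline stand-ins: the VALUES list and the always-empty updated CTE are assumptions made purely for illustration.

```sql
-- "updated" stands in for a CTE whose UPDATE matched nothing: zero rows.
WITH updated AS (
    SELECT 1 AS id WHERE false
)
SELECT u.id
FROM (VALUES (42)) AS u(id), updated   -- cross join with an empty set
WHERE updated.id IS NULL;              -- no joined rows exist to filter
-- Returns 0 rows, so an INSERT ... SELECT built on it inserts nothing.
```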
Should work with NOT EXISTS:
WITH updated AS (
UPDATE vote_user_table v
SET positive = TRUE -- use boolean values ..
,negative = FALSE -- .. instead of quoted string literals
FROM usuario u
WHERE v.review_id = 6 -- guessing origin
AND v.user_id = u.id
AND u.username ILIKE 'aaa#aaa.com'
RETURNING v.id
)
INSERT INTO vote_user_table (review_id, user_id, positive, negative)
SELECT 6, u.id, TRUE, FALSE
FROM usuario u
WHERE NOT EXISTS (SELECT 1 FROM updated)
AND u.username ILIKE 'aaa#aaa.com';
Be aware that there is still a very tiny chance for a race condition under heavy concurrent load. Details under this related question:
Upsert with a transaction

Factor (string) to Numeric in PostgreSQL

Similar to this, is it possible to convert a String field to Numeric in PostgreSQL? For instance,
create table test (name text);
insert into test (name) values ('amy');
insert into test (name) values ('bob');
insert into test (name) values ('bob');
insert into test (name) values ('celia');
and add a field that is
name | num
-------+-----
amy | 1
bob | 2
bob | 2
celia | 3
The most effective "hash"-function of all is a serial primary key - giving you a unique number like you wished for in the question.
I also deal with duplicates in this demo:
CREATE TEMP TABLE string (
string_id serial PRIMARY KEY
,string text NOT NULL UNIQUE -- no dupes
,ct int NOT NULL DEFAULT 1 -- count instead of dupe rows
);
Then you would enter new strings like this:
(Data-modifying CTE requires PostgreSQL 9.1 or later.)
WITH x AS (SELECT 'abc'::text AS nu)
, y AS (
UPDATE string s
SET ct = ct + 1
FROM x
WHERE s.string = x.nu
RETURNING TRUE
)
INSERT INTO string (string)
SELECT nu
FROM x
WHERE NOT EXISTS (SELECT 1 FROM y);
If the string nu already exists, the count (ct) is increased by 1. If not, a new row is inserted, starting with a count of 1.
The UNIQUE also adds an index on the column string.string automatically, which leads to optimal performance for this query.
Add additional logic (triggers ?) for UPDATE / DELETE to make this bullet-proof - if needed.
Note, there is a super-tiny race condition here, if two concurrent transactions try to add the same string at the same moment in time. To be absolutely sure, you could use SERIALIZABLE transactions. More info and links under this related question.
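A minimal sketch of the SERIALIZABLE variant, assuming the string table from the demo above. The retry itself has to live in the calling application, which is not shown here; depending on the PostgreSQL version, a concurrent duplicate may also surface as a unique violation rather than a serialization failure, so the caller may need to retry on both.

```sql
BEGIN ISOLATION LEVEL SERIALIZABLE;

WITH x AS (SELECT 'abc'::text AS nu)
, y AS (
   UPDATE string s
   SET    ct = ct + 1
   FROM   x
   WHERE  s.string = x.nu
   RETURNING TRUE
   )
INSERT INTO string (string)
SELECT nu
FROM   x
WHERE  NOT EXISTS (SELECT 1 FROM y);

COMMIT;
-- If any statement (including COMMIT) fails with SQLSTATE 40001
-- (serialization_failure), roll back and rerun the whole transaction.
```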
Live demo at sqlfiddle.
How 'bout a hash, such as md5, of name?
create table test (name text, hash text);
-- later
update test set hash = md5(name);
If you need to convert that md5 text to a number: Hashing a String to a Numeric Value in PostgresSQL
If they are all single characters, you could do this:
ALTER TABLE test ADD COLUMN num int;
UPDATE test SET num = ascii(name);
Though that would only return the code of the first character if the string was more than a single character long.
The exact case shown in your request can be produced with the dense_rank window function:
regress=# SELECT name, dense_rank() OVER (ORDER BY name) FROM test;
name | dense_rank
-------+------------
amy | 1
bob | 2
bob | 2
celia | 3
(4 rows)
so if you were adding a number for each row, you'd be able to do something like:
ALTER TABLE test ADD COLUMN some_num integer;
WITH gen(gen_name, gen_num) AS
(SELECT name, dense_rank() OVER (ORDER BY name) FROM test GROUP BY name)
UPDATE test SET some_num = gen_num FROM gen WHERE name = gen_name;
ALTER TABLE test ALTER COLUMN some_num SET NOT NULL;
however I think it's much more sensible to use a hash or to assign generated keys. I'm just showing that your example can be achieved.
The biggest problem with this approach is that inserting new data is a pain. It's a ranking (like your example shows) so if you INSERT INTO test (name) VALUES ('billy'); then the ranking changes.
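The shift is easy to reproduce. This sketch assumes the original four-row test table from the question, without the some_num column added (the NOT NULL constraint on it would reject the insert as written).

```sql
INSERT INTO test (name) VALUES ('billy');

SELECT name, dense_rank() OVER (ORDER BY name) AS num
FROM test
ORDER BY name;
-- amy 1, billy 2, bob 3, bob 3, celia 4
-- 'bob' moved from 2 to 3 and 'celia' from 3 to 4.
```

Any ranks previously stored in a column would now be stale and would have to be recomputed for every affected row.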