Problems with a PostgreSQL upsert query

I'm trying to update the database by either updating or inserting a new record into the vote_user_table. The table is defined as follows:
  Column   |  Type   |                          Modifiers
-----------+---------+--------------------------------------------------------------
 id        | integer | not null default nextval('vote_user_table_id_seq'::regclass)
 review_id | integer | not null
 user_id   | integer | not null
 positive  | boolean | default false
 negative  | boolean | default false
Indexes:
    "vote_user_table_pkey" PRIMARY KEY, btree (id)
Here's the query I'm using. I'm actually using parameters for all of the values, but for testing purposes I replaced them with values that I'm sure should work. The username aaa#aaa.com exists.
WITH updated AS (
UPDATE vote_user_table
SET
positive = 'true',
negative = 'false'
FROM usuario
WHERE
review_id = 6 AND
user_id = usuario.id AND
usuario.username ILIKE 'aaa#aaa.com'
RETURNING vote_user_table.id
)
INSERT INTO vote_user_table
(review_id, user_id, positive, negative)
SELECT 6, usuario.id, 'true', 'false'
FROM usuario, updated
WHERE
updated.id IS NULL AND
usuario.username ILIKE 'aaa#aaa.com'
The output I obtain after running the query is:
INSERT 0 0
However, it is supposed to insert the new row, because that row doesn't exist in the database yet. When I inserted a row manually, using only the INSERT clause, and then ran the upsert query shown above, it updated the row correctly.
I'm using PostgreSQL version 9.2.4.
I got the idea of writing this query from this other question:
Insert, on duplicate update in PostgreSQL?
Any ideas on what I could be doing wrong?
Any suggestions on how I can achieve what I want to do?
Create statement for vote_user_table:
CREATE TABLE vote_user_table (
id integer NOT NULL,
review_id integer NOT NULL,
user_id integer NOT NULL,
positive boolean DEFAULT false,
negative boolean DEFAULT false
);
CREATE SEQUENCE vote_user_table_id_seq
START WITH 1
INCREMENT BY 1
NO MINVALUE
NO MAXVALUE
CACHE 1;
ALTER SEQUENCE vote_user_table_id_seq OWNED BY vote_user_table.id;

The UPDATE in the first CTE, updated, produces no row. That means you don't get a NULL value for updated.id either: the CTE is simply empty. Cross-joining usuario to an empty set yields no rows, so no INSERT happens either.
This should work with NOT EXISTS:
WITH updated AS (
UPDATE vote_user_table v
SET positive = TRUE -- use boolean values ..
,negative = FALSE -- .. instead of quoted string literals
FROM usuario u
WHERE v.review_id = 6 -- guessing origin
AND v.user_id = u.id
AND u.username ILIKE 'aaa#aaa.com'
RETURNING v.id
)
INSERT INTO vote_user_table (review_id, user_id, positive, negative)
SELECT 6, u.id, TRUE, FALSE
FROM usuario u
WHERE NOT EXISTS (SELECT 1 FROM updated)
AND u.username ILIKE 'aaa#aaa.com';
Be aware that there is still a very tiny chance for a race condition under heavy concurrent load. Details under this related question:
Upsert with a transaction
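On PostgreSQL 9.5 or later, INSERT ... ON CONFLICT closes that race condition, because the conflict check and the update happen atomically. It needs a unique constraint to detect the conflict, so this minimal sketch assumes one on (review_id, user_id), which the schema posted above does not have (only id is unique there). The usuario table is reduced to the two columns the query uses; both tables are throwaway demo versions.

```sql
-- Demo schema: usuario is reduced to id and username.
CREATE TEMP TABLE usuario (
   id       serial PRIMARY KEY,
   username text NOT NULL
);
CREATE TEMP TABLE vote_user_table (
   id        serial PRIMARY KEY,
   review_id integer NOT NULL,
   user_id   integer NOT NULL,
   positive  boolean DEFAULT false,
   negative  boolean DEFAULT false,
   UNIQUE (review_id, user_id)   -- assumed; not part of the original schema
);
INSERT INTO usuario (username) VALUES ('aaa#aaa.com');

-- Insert the vote, or overwrite the existing vote for this review/user pair.
INSERT INTO vote_user_table (review_id, user_id, positive, negative)
SELECT 6, u.id, TRUE, FALSE
FROM   usuario u
WHERE  u.username ILIKE 'aaa#aaa.com'
ON CONFLICT (review_id, user_id)
DO UPDATE SET positive = EXCLUDED.positive
            , negative = EXCLUDED.negative;
```

Running the same statement again with different values updates the one existing row instead of inserting a second one.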

Related

PostgreSQL: Insert if not exist and then Select

Question
Imagine having the following PostgreSQL table:
CREATE TABLE setting (
user_id bigint PRIMARY KEY NOT NULL,
language lang NOT NULL DEFAULT 'english',
foo bool NOT NULL DEFAULT true,
bar bool NOT NULL DEFAULT true
);
From my research, I know that INSERTing a row with the default values, if no row exists yet for the specific user, would look something like this:
INSERT INTO setting (user_id)
SELECT %s
WHERE NOT EXISTS (SELECT 1 FROM setting WHERE user_id = %s)
(where the %s are placeholders where I would provide the User's ID)
I also know that to get the user's setting (i.e., to SELECT it), I can do the following:
SELECT * FROM setting WHERE user_id = %s
However, I am trying to combine the two: retrieve the user's setting and, if no setting exists yet for that user, INSERT the default values and return them.
Example
So it would look something like this:
Imagine Alice has her setting already saved in the database but Bob is a new user and does not have it.
When we execute the magical SQL query with Alice's user ID, it will return Alice's setting stored in the database. If we execute the same query with Bob's user ID, it will detect that Bob does not have any setting saved in the database, INSERT a setting record with all default values, and then return Bob's newly created setting.
Given that there is a UNIQUE or PK constraint on user_id (as Frank Heikens said), try to insert; if that violates the constraint, do nothing. The t CTE returns the inserted row, if any. UNION ALL it with a 'proper' SELECT and pick the first row only. LIMIT 1 stops execution after the first row, so the second SELECT is not run if the insert returns a row. (ON CONFLICT requires PostgreSQL 9.5 or later.)
with t as
(
insert into setting (user_id) values (%s)
on conflict do nothing
returning *
)
select * from t
union all
select * from setting where user_id = %s
limit 1;
No magic necessary. Use returning and union all:
with inparms as ( -- Put your input parameters in CTE so you bind only once
select %s::bigint as user_id
), cond_insert as ( -- Insert the record if not exists, returning *
insert into setting (user_id)
select i.user_id
from inparms i
where not exists (select 1 from setting where user_id = i.user_id)
returning *
)
select * -- If a record was inserted, get it
from cond_insert
union all
select s.* -- If not, then get the pre-existing record
from inparms i
join setting s on s.user_id = i.user_id;

SQL query to filter out if a flag is disabled

I have a table with values that look something like this:
user_id | value | disabled
--------------------------
| hello | false
15 | hello | true
Rows without a user_id are generated automatically, with a default value of disabled = false.
Users can then select values to disable so that those words will not appear in their list, which is only visible to them.
My goal is to be able to return a combination of all system-generated values plus all the values a user has set, where the values aren't disabled.
The query currently looks something like this:
SELECT DISTINCT ON (value)
user_id, value, disabled
FROM table
WHERE value LIKE '%%'
AND disabled = false
AND (user_id IS NULL OR user_id=15)
ORDER BY value, user_id;
However, this query still returns hello: even though it sees that the user-generated row has disabled = true, the system-generated row, whose disabled is false, still passes the filter.
What it should do instead: since there is a user-generated value that is disabled, it shouldn't return the system-generated one anymore.
Any ideas or suggestions will be greatly appreciated!
I think you want this:
SELECT DISTINCT value
FROM table t
WHERE value LIKE '%%' AND
disabled = false AND
(user_id IS NULL OR user_id = 15) AND
NOT EXISTS (SELECT 1 FROM table t2 WHERE t2.value = t.value AND t2.user_id = 15 AND t2.disabled = true);
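Alternatively, the original DISTINCT ON query can be kept if the disabled filter moves outside it: first let the user's own row win over the system row for each value, then drop the values whose winning row is disabled. A minimal sketch, using a hypothetical table name words (table itself is a reserved word) and sample rows shaped like the question's example:

```sql
-- Hypothetical table and sample data, shaped like the question's example.
CREATE TEMP TABLE words (
   user_id  integer,                        -- NULL for system-generated rows
   value    text    NOT NULL,
   disabled boolean NOT NULL DEFAULT false
);
INSERT INTO words (user_id, value, disabled) VALUES
   (NULL, 'hello', false),
   (15,   'hello', true),
   (NULL, 'world', false);

-- Pick one row per value, preferring the user's own row (NULLs sort last),
-- and only then filter out values whose chosen row is disabled.
SELECT value
FROM  (
   SELECT DISTINCT ON (value) value, disabled
   FROM   words
   WHERE  user_id IS NULL OR user_id = 15
   ORDER  BY value, user_id NULLS LAST
   ) sub
WHERE  NOT disabled
ORDER  BY value;
```

For user 15 this returns only world; for a user with no rows of their own, both system values come back.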

Returning row if value exists, inserting otherwise

I have a table that is created like so:
CREATE TABLE IF NOT EXISTS my_table (
p_id uuid DEFAULT uuid_generate_v4() NOT NULL,
my_column varchar(250) NOT NULL,
PRIMARY KEY (p_id),
CONSTRAINT unique_mycolumn UNIQUE (my_column)
);
What I would like to do is return p_id if a particular value matches whatever is in my_column, but if not, insert that value and return the newly generated p_id. I can sort of get around this from within my code by first querying for the value and, if that returns null, inserting it, but that's a fairly big race condition and I'd like to do it within SQL. I've seen a lot of similar questions, but this one is different given the unique constraint on that column.
EDIT: This is what I'm looking to accomplish: Say I have the following data:
1 | foo
2 | bar
If I query for "bar", I'd like to get 2 back. However, If I query for "baz", I'd like to have the table look like the following:
1 | foo
2 | bar
3 | baz
So I guess it would be better described as "return the ID if the value exists; otherwise, insert the value to create a new row".
You could try something like this:
with ins as (insert into my_table (my_column)
select 'baz' where not exists
(select 1 from my_table where my_column = 'baz')
returning p_id)
select p_id from ins
union all
select p_id from my_table where my_column = 'baz';
Note however that this can fail if two transactions try to insert the value 'baz' at the same time.
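On PostgreSQL 9.5 or later, the existing unique constraint can close that race with INSERT ... ON CONFLICT. A no-op DO UPDATE makes RETURNING report p_id for an already existing row as well; the trade-off is that the existing row is still rewritten (a new row version is written and UPDATE triggers fire). A minimal sketch, assuming gen_random_uuid() (built in since PostgreSQL 13) in place of uuid_generate_v4() from the uuid-ossp extension:

```sql
-- Demo copy of the table; gen_random_uuid() is an assumed substitute.
CREATE TEMP TABLE my_table (
   p_id      uuid DEFAULT gen_random_uuid() NOT NULL,
   my_column varchar(250) NOT NULL,
   PRIMARY KEY (p_id),
   CONSTRAINT unique_mycolumn UNIQUE (my_column)
);
INSERT INTO my_table (my_column) VALUES ('foo'), ('bar');

-- Inserts 'baz' and returns its new p_id; for an existing value the
-- no-op update lets RETURNING report the existing p_id instead.
INSERT INTO my_table (my_column)
VALUES ('baz')
ON CONFLICT (my_column)
DO UPDATE SET my_column = EXCLUDED.my_column
RETURNING p_id;
```

If the extra row version matters, the ON CONFLICT DO NOTHING plus UNION ALL pattern from the setting question above avoids the rewrite.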

Return rows of a table that actually changed in an UPDATE

Using Postgres, I can perform an UPDATE statement and return the rows affected by the command.
UPDATE accounts
SET status = merge_accounts.status,
field1 = merge_accounts.field1,
field2 = merge_accounts.field2,
etc.
FROM merge_accounts WHERE merge_accounts.uid =accounts.uid
RETURNING accounts.*
This gives me a list of all records that matched the WHERE clause, but it will not tell me which rows were actually changed by the operation.
In this simplified use case it would of course be trivial to simply add another guard, AND status != 'Closed', but my real-world use case involves updating potentially dozens of fields from a merge table with 10,000+ rows, and I want to be able to detect which rows were actually changed and which are identical to their previous version. (The expectation is that very few rows will actually have changed.)
The best I've got so far is
UPDATE accounts
SET x=..., y=...
FROM accounts AS old, merge_accounts
WHERE old.uid = accounts.uid
AND merge_accounts.uid = accounts.uid
RETURNING accounts, old
Which will return a tuple of old and new rows that can then be diff'ed inside my Java codebase itself - however this requires significant additional network traffic and is potentially error prone.
The ideal scenario is to be able to have postgres return just the rows that actually had any values changed - is this possible?
Here on GitHub is a more real-world example of what I'm doing, incorporating some of the suggestions so far.
Using Postgres 9.1, but can use 9.4 if required. The requirements are effectively:
Be able to perform an upsert of new data
Where we may only know the specific key/value pair to update on any given row
Get back a result containing just the rows that were actually changed by the upsert
Bonus - get a copy of the old records as well.
Since this question was opened I've gotten most of this working now, although I'm unsure if my approach is a good idea or not - it's a bit hacked together.
Only update rows that actually change
That saves expensive updates and expensive checks after the UPDATE.
To update every column with the new value provided (if anything changes):
UPDATE accounts a
SET (status, field1, field2) -- short syntax for ..
= (m.status, m.field1, m.field2) -- .. updating multiple columns
FROM merge_accounts m
WHERE m.uid = a.uid
AND (a.status IS DISTINCT FROM m.status OR
a.field1 IS DISTINCT FROM m.field1 OR
a.field2 IS DISTINCT FROM m.field2)
RETURNING a.*;
Due to PostgreSQL's MVCC model any change to a row writes a new row version. Updating a single column is almost as expensive as updating every column in the row at once. Rewriting the rest of the row comes at practically no cost, as soon as you have to update anything.
Details:
How do I (or can I) SELECT DISTINCT on multiple columns?
UPDATE a whole row in PL/pgSQL
Shorthand for whole rows
If the row types of accounts and merge_accounts are identical and you want to adopt everything from merge_accounts into accounts, there is a shortcut comparing the whole row type:
UPDATE accounts a
SET (status, field1, field2)
= (m.status, m.field1, m.field2)
FROM merge_accounts m
WHERE a.uid = m.uid
AND m IS DISTINCT FROM a
RETURNING a.*;
This even works for NULL values. Details in the manual.
But it's not going to work for your home-grown solution where (quoting your comment):
merge_accounts is identical, save that all non-pk columns are array types
It requires compatible row types, i.e. each column shares the same data type or there is at least an implicit cast between the two types.
For your special case
UPDATE accounts a
SET (status, field1, field2)
= (COALESCE(m.status[1], a.status) -- default to original ..
, COALESCE(m.field1[1], a.field1) -- .. if m.column[1] IS NULL
, COALESCE(m.field2[1], a.field2))
FROM merge_accounts m
WHERE m.uid = a.uid
AND (m.status[1] IS NOT NULL AND a.status IS DISTINCT FROM m.status[1]
OR m.field1[1] IS NOT NULL AND a.field1 IS DISTINCT FROM m.field1[1]
OR m.field2[1] IS NOT NULL AND a.field2 IS DISTINCT FROM m.field2[1])
RETURNING a.*
m.status IS NOT NULL works if columns that shouldn't be updated are NULL in merge_accounts.
m.status <> '{}' if you operate with empty arrays.
m.status[1] IS NOT NULL covers both options.
Related:
Return pre-UPDATE column values using SQL only
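For the bonus requirement, the IS DISTINCT FROM filter can be combined with a self-join, so that RETURNING reports the old and the new version of only the rows that actually changed. The self-joined copy (aliased prev here) is read from the snapshot taken before the UPDATE, so it carries the pre-UPDATE values. A minimal sketch with a reduced, hypothetical column set (status, field1); under concurrent writes the old values could be stale unless the rows are locked first.

```sql
-- Hypothetical demo tables with a reduced column set.
CREATE TEMP TABLE accounts       (uid integer PRIMARY KEY, status text, field1 integer);
CREATE TEMP TABLE merge_accounts (uid integer PRIMARY KEY, status text, field1 integer);
INSERT INTO accounts       VALUES (1, 'open', 10), (2, 'open',   20), (3, 'closed', NULL);
INSERT INTO merge_accounts VALUES (1, 'open', 10), (2, 'closed', 20), (3, 'closed', NULL);

-- Only uid 2 differs (NULL vs NULL counts as "not distinct"),
-- so only uid 2 is updated and returned, with its old values alongside.
UPDATE accounts a
SET    (status, field1) = (m.status, m.field1)
FROM   merge_accounts m, accounts prev
WHERE  m.uid = a.uid
AND    prev.uid = a.uid
AND    (a.status, a.field1) IS DISTINCT FROM (m.status, m.field1)
RETURNING prev.status AS old_status, prev.field1 AS old_field1, a.*;
```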
If you aren't relying on side effects of the update (such as triggers), only update the records that need to change:
UPDATE accounts
SET status = merge_accounts.status,
field1 = merge_accounts.field1,
field2 = merge_accounts.field2,
etc.
FROM merge_accounts WHERE merge_accounts.uid =accounts.uid
AND NOT (accounts.status IS NOT DISTINCT FROM merge_accounts.status
AND accounts.field1 IS NOT DISTINCT FROM merge_accounts.field1
AND accounts.field2 IS NOT DISTINCT FROM merge_accounts.field2
)
RETURNING accounts.*
I would recommend using the information_schema.columns table to introspect the columns dynamically, and then use those within a plpgsql function to dynamically generate the UPDATE statement.
For example, given this DDL:
create table foo
(
id serial,
val integer,
name text
);
insert into foo (val, name) VALUES (10, 'foo'), (20, 'bar'), (30, 'baz');
And this query:
select column_name
from information_schema.columns
where table_name = 'foo'
order by ordinal_position;
will yield the columns of the table in the order in which they were defined in the DDL.
Essentially, you would use the above SELECT within the function, iterating over its results in a FOR loop to build up both the SET and WHERE clauses of your UPDATE statement dynamically.
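A minimal sketch of that idea, assuming both tables live in the current schema, share column names, and are keyed by a single column. The function name build_changed_update is hypothetical; it only returns the SQL text, which you can inspect first and then run with EXECUTE. format() with %I quotes each identifier where necessary. It does not handle the array-typed merge columns from the question.

```sql
-- Hypothetical helper: builds (but does not run) an UPDATE that copies every
-- non-key column from src_table into dst_table, touching only rows that differ.
CREATE OR REPLACE FUNCTION build_changed_update(dst_table text, src_table text, key_col text)
  RETURNS text
  LANGUAGE plpgsql AS
$func$
DECLARE
   col       text;
   set_list  text := '';
   diff_list text := '';
BEGIN
   -- Iterate over dst_table's columns in definition order, skipping the key.
   FOR col IN
      SELECT c.column_name::text
      FROM   information_schema.columns c
      WHERE  c.table_schema::text = current_schema()
      AND    c.table_name::text = dst_table
      AND    c.column_name::text <> key_col
      ORDER  BY c.ordinal_position
   LOOP
      IF set_list <> '' THEN
         set_list  := set_list  || ', ';
         diff_list := diff_list || ' OR ';
      END IF;
      set_list  := set_list  || format('%I = s.%I', col, col);
      diff_list := diff_list || format('d.%I IS DISTINCT FROM s.%I', col, col);
   END LOOP;

   IF set_list = '' THEN
      RAISE EXCEPTION 'no non-key columns found for table %', dst_table;
   END IF;

   RETURN format('UPDATE %I d SET %s FROM %I s WHERE s.%I = d.%I AND (%s) RETURNING d.*',
                 dst_table, set_list, src_table, key_col, key_col, diff_list);
END
$func$;
```

Inside another PL/pgSQL block you would then run it with EXECUTE build_changed_update('accounts', 'merge_accounts', 'uid'), optionally with INTO or a loop to consume the returned rows.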
Some variation of this?
SELECT * FROM old;
id | val
----+-----
1 | 1
2 | 2
4 | 5
5 | 1
6 | 2
SELECT * FROM new;
id | val
----+-----
1 | 2
2 | 2
3 | 2
5 | 1
6 | 1
SELECT * FROM old JOIN new ON old.id = new.id;
id | val | id | val
----+-----+----+-----
1 | 1 | 1 | 2
2 | 2 | 2 | 2
5 | 1 | 5 | 1
6 | 2 | 6 | 1
(4 rows)
WITH sel AS (
SELECT o.id , o.val FROM old o JOIN new n ON o.id=n.id ),
upd AS (
UPDATE old SET val = new.val FROM new WHERE new.id=old.id RETURNING old.* )
SELECT * from sel, upd WHERE sel.id = upd.id AND sel.val <> upd.val;
id | val | id | val
----+-----+----+-----
1 | 1 | 1 | 2
6 | 2 | 6 | 1
(2 rows)
Refer to this SO answer and read the entire discussion.
If you are updating a single table and want to know if the row is actually changed you can use this query:
with rows_affected as (
update mytable set (field1, field2, field3)=('value1', 'value2', 3) where id=1 returning *
)
select count(*)>0 as is_modified from rows_affected
join mytable on mytable.id=rows_affected.id
where rows_affected is distinct from mytable;
And you can wrap your existing queries into this one without the need to modify the actual update statements.

Factor (string) to Numeric in PostgreSQL

Similar to this, is it possible to convert a String field to Numeric in PostgreSQL. For instance,
create table test (name text);
insert into test (name) values ('amy');
insert into test (name) values ('bob');
insert into test (name) values ('bob');
insert into test (name) values ('celia');
and add a field so that the result looks like this:
name | num
-------+-----
amy | 1
bob | 2
bob | 2
celia | 3
The most effective "hash"-function of all is a serial primary key - giving you a unique number like you wished for in the question.
I also deal with duplicates in this demo:
CREATE TEMP TABLE string (
string_id serial PRIMARY KEY
,string text NOT NULL UNIQUE -- no dupes
,ct int NOT NULL DEFAULT 1 -- count instead of dupe rows
);
Then you would enter new strings like this:
(Data-modifying CTE requires PostgreSQL 9.1 or later.)
WITH x AS (SELECT 'abc'::text AS nu)
, y AS (
UPDATE string s
SET ct = ct + 1
FROM x
WHERE s.string = x.nu
RETURNING TRUE
)
INSERT INTO string (string)
SELECT nu
FROM x
WHERE NOT EXISTS (SELECT 1 FROM y);
If the string nu already exists, the count (ct) is increased by 1. If not, a new row is inserted, starting with a count of 1.
The UNIQUE also adds an index on the column string.string automatically, which leads to optimal performance for this query.
Add additional logic (triggers ?) for UPDATE / DELETE to make this bullet-proof - if needed.
Note that there is a super-tiny race condition here if two concurrent transactions try to add the same string at the same moment. To be absolutely sure, you could use SERIALIZABLE transactions. More info and links under this related question.
Live demo at sqlfiddle.
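On PostgreSQL 9.5 or later, INSERT ... ON CONFLICT does the same insert-or-increment in a single statement and without that race, using the UNIQUE constraint the table already has on string. A minimal sketch with the same demo table, recreated so it runs on its own. One side effect: the serial default is evaluated before the conflict is detected, so updates still consume string_id values and leave gaps.

```sql
-- Same demo table as above, recreated so this sketch is self-contained.
CREATE TEMP TABLE string (
   string_id serial PRIMARY KEY
 , string    text NOT NULL UNIQUE
 , ct        int  NOT NULL DEFAULT 1
);

-- Insert the string with ct = 1, or increment ct if it already exists.
INSERT INTO string AS s (string)
VALUES ('abc')
ON CONFLICT (string)
DO UPDATE SET ct = s.ct + 1;
```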
How 'bout a hash, such as md5, of name?
create table test (name text, hash text);
-- later
update test set hash = md5(name);
If you need to convert that md5 text to a number: Hashing a String to a Numeric Value in PostgresSQL
If they are all single characters, you could do this:
ALTER TABLE test ADD COLUMN num int;
UPDATE test SET num = ascii(name);
Though that would only return the code of the first character if the string were longer than one character.
The exact case shown in your request can be produced with the dense_rank window function:
regress=# SELECT name, dense_rank() OVER (ORDER BY name) FROM test;
name | dense_rank
-------+------------
amy | 1
bob | 2
bob | 2
celia | 3
(4 rows)
so if you were adding a number for each row, you'd be able to do something like:
ALTER TABLE test ADD COLUMN some_num integer;
WITH gen(gen_name, gen_num) AS
(SELECT name, dense_rank() OVER (ORDER BY name) FROM test GROUP BY name)
UPDATE test SET some_num = gen_num FROM gen WHERE name = gen_name;
ALTER TABLE test ALTER COLUMN some_num SET NOT NULL;
However, I think it's much more sensible to use a hash or to assign generated keys. I'm just showing that your example can be achieved.
The biggest problem with this approach is that inserting new data is a pain. It's a ranking (like your example shows) so if you INSERT INTO test (name) VALUES ('billy'); then the ranking changes.