Is it safe to insert row, with value incremented by CTE select? - sql

Say we have this table:
create table if not exists template (
id serial primary key,
label text not null,
version integer not null default 1,
created_at timestamp not null default current_timestamp,
unique(label, version)
);
The goal is to insert a new record, incrementing the version when a row with the same label already exists. My first idea was something like this:
with v as (
select coalesce(max(version), 0) + 1 as new_version
from template t where label = 'label1'
)
insert into template (label, version)
values ('label1', (select new_version from v))
returning *;
Although it works, I'm pretty sure it wouldn't be safe in case of the simultaneous inserts. Am I right?
If I am, should I wrap this query in a transaction?
Or is there a better way to implement this kind of versioning?

Gap-less serial IDs per label are hard to come by. Your simple approach can easily fail with concurrent writes due to inherent race conditions. And "value-locking" is not generally implemented in Postgres.
But there is a way. Introduce a parent table label - if you don't already have one - and take a lock on the parent row. This keeps locking to a minimum and should avoid excessive costs from lock contention.
CREATE TABLE label (
label text PRIMARY KEY
);
CREATE TABLE version (
id serial PRIMARY KEY
, label text NOT NULL REFERENCES label
, version integer NOT NULL DEFAULT 1
, created_at timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP
, UNIQUE(label, version)
);
Then, in a single transaction:
BEGIN;
INSERT INTO label (label)
VALUES ('label1')
ON CONFLICT (label) DO UPDATE
SET label = NULL WHERE false -- never executed, but still locks the row
RETURNING *; -- optional; returns a row only if the label was newly inserted
INSERT INTO version (label, version)
SELECT 'label1', coalesce(max(v.version), 0) + 1
FROM version v
WHERE v.label = 'label1'
RETURNING *;
COMMIT;
The first UPSERT inserts a new label if it's not there, yet, or locks the row if it is. Either way, the transaction now holds a lock on that label, excluding concurrent writes.
The second INSERT adds a new version, or the first one if there are none, yet.
You could also move the UPSERT into a CTE attached to the INSERT, thus making it a single command and hence always a single transaction implicitly. But the CTE is not needed per se.
This is safe under concurrent write load and works for all corner cases. You just have to make sure that all possibly competing write access takes the same route.
You might wrap this into a function. This ...
... ensures a single transaction
... simplifies the call, with a single mention of the label value
... allows to revoke write privileges from the tables and only grant it to this function if desired, enforcing the right access pattern.
CREATE FUNCTION f_new_label (_label text)
RETURNS TABLE (label text, version int)
LANGUAGE sql STRICT AS
$func$
INSERT INTO label (label)
VALUES (_label)
ON CONFLICT (label) DO UPDATE
SET label = NULL WHERE false; -- never executed, but still locks the row
INSERT INTO version AS v (label, version)
SELECT _label, coalesce(max(v1.version), 0) + 1
FROM version v1
WHERE v1.label = _label
RETURNING v.label, v.version;
$func$;
Call:
SELECT * FROM f_new_label('label1');
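The privilege setup mentioned above could look roughly like this. It is a sketch under assumptions: app_user is a hypothetical, already existing application role, both tables live in schema public, and the function is owned by a role that may write to them. For the function to write on behalf of a caller without table privileges, it has to run as SECURITY DEFINER.

```sql
-- Assumption: app_user is a hypothetical role; tables are in schema public.
-- Run the function with its owner's privileges and pin search_path,
-- which is the usual precaution for SECURITY DEFINER functions.
ALTER FUNCTION f_new_label(text)
   SECURITY DEFINER
   SET search_path = public, pg_temp;

-- Take direct write access to both tables away from the application role ...
REVOKE INSERT, UPDATE, DELETE, TRUNCATE ON label, version FROM app_user;

-- ... and allow writes only through the function.
-- EXECUTE is granted to PUBLIC by default, so revoke that first.
REVOKE EXECUTE ON FUNCTION f_new_label(text) FROM PUBLIC;
GRANT  EXECUTE ON FUNCTION f_new_label(text) TO app_user;
```

With this in place, every write from app_user goes through the UPSERT that locks the label row first.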
Related:
How to use RETURNING with ON CONFLICT in PostgreSQL?
UPDATE n:m relation in view as array (operations)

Yes, simultaneous inserts can collide.
Wrapping the query in a transaction can lead to lock contention, because you want to preserve the state of the table until the insert with the new version number occurs. Apparently, Postgres can run into deadlocks with multiple concurrent inserts.
You can use a BEFORE INSERT trigger, which would guarantee that every insert gets a higher version number, as it goes row by row.
But remember that CPUs and the SQL server can rearrange the order of computation, so the rule "first come, first served" may not apply.

Related

Postgres: SELECT or INSERT in high concurrent write load DB

We have a DB for which we need a "selsert" (not upsert) function.
The function should take a text value and return the id column of the existing row (SELECT), or insert the value and return the id of the new row (INSERT).
There are multiple processes that will need to perform this operation (selsert).
I have been experimenting with pg_advisory_lock and the ON CONFLICT clause for INSERT but am still not sure which approach would work best (even after looking at some of the other answers).
So far I have come up with the following:
WITH
selected AS (
SELECT id FROM test.body_parts WHERE (lower(trim(part))) = lower(trim('finger')) LIMIT 1
),
inserted AS (
INSERT INTO test.body_parts (part)
SELECT trim('finger')
WHERE NOT EXISTS ( SELECT * FROM selected )
-- ON CONFLICT (lower(trim(part))) DO NOTHING -- not sure if this is needed
RETURNING id
)
SELECT id, 'inserted' FROM inserted
UNION
SELECT id, 'selected' FROM selected
Will the above query (within a function) ensure consistency under highly concurrent write workloads?
Are there any other issues I must consider (locking, etc.)?
BTW, I can ensure that there are no duplicate values of (part) by creating a unique index. That is not an issue. What I am after is that SELECT returns the existing value if another process does the INSERT (I hope I am explaining this right).
The unique index would have the following definition:
CREATE UNIQUE INDEX body_parts_part_ux
ON test.body_parts
USING btree
(lower(trim(part)));

Return rows from INSERT with ON CONFLICT without needing to update

I have a situation where I very frequently need to get a row from a table with a unique constraint, and if none exists then create it and return.
For example my table might be:
CREATE TABLE names(
id SERIAL PRIMARY KEY,
name TEXT,
CONSTRAINT names_name_key UNIQUE (name)
);
And it contains:
id | name
1 | bob
2 | alice
Then I'd like to:
INSERT INTO names(name) VALUES ('bob')
ON CONFLICT DO NOTHING RETURNING id;
Or perhaps:
INSERT INTO names(name) VALUES ('bob')
ON CONFLICT (name) DO NOTHING RETURNING id
and have it return bob's id 1. However, RETURNING only returns either inserted or updated rows. So, in the above example, it wouldn't return anything. In order to have it function as desired I would actually need to:
INSERT INTO names(name) VALUES ('bob')
ON CONFLICT ON CONSTRAINT names_name_key DO UPDATE
SET name = 'bob'
RETURNING id;
which seems kind of cumbersome. I guess my questions are:
What is the reasoning for not allowing the (my) desired behaviour?
Is there a more elegant way to do this?
It's the recurring problem of SELECT or INSERT, related to (but different from) an UPSERT. The new UPSERT functionality in Postgres 9.5 is still instrumental.
WITH ins AS (
INSERT INTO names(name)
VALUES ('bob')
ON CONFLICT ON CONSTRAINT names_name_key DO UPDATE
SET name = NULL
WHERE FALSE -- never executed, but locks the row
RETURNING id
)
SELECT id FROM ins
UNION ALL
SELECT id FROM names
WHERE name = 'bob' -- only executed if no INSERT
LIMIT 1;
This way you do not actually write a new row version without need.
I assume you are aware that in Postgres every UPDATE writes a new version of the row due to its MVCC model - even if name is set to the same value as before. This would make the operation more expensive, add to possible concurrency issues / lock contention in certain situations and bloat the table additionally.
However, there is still a tiny corner case for a race condition. Concurrent transactions may have added a conflicting row, which is not yet visible in the same statement. Then INSERT and SELECT come up empty.
Proper solution for single-row UPSERT:
Is SELECT or INSERT in a function prone to race conditions?
General solutions for bulk UPSERT:
How to use RETURNING with ON CONFLICT in PostgreSQL?
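One way to close the remaining corner case for a single row is to retry until either the SELECT or the INSERT yields a row. The following is a minimal sketch, assuming the names table from the question and the default READ COMMITTED isolation level. The function name f_name_id and its parameter names are illustrative and not part of the original.

```sql
-- Sketch: SELECT or INSERT with retry (assumes READ COMMITTED).
-- f_name_id, _name and _id are hypothetical names chosen for this example.
CREATE OR REPLACE FUNCTION f_name_id(_name text, OUT _id int)
  LANGUAGE plpgsql AS
$func$
BEGIN
   LOOP
      -- Cheap path first: the row usually exists already.
      SELECT id FROM names WHERE name = _name INTO _id;
      EXIT WHEN FOUND;

      -- Not found: try to insert. A concurrent insert of the same name
      -- makes this a no-op instead of raising a unique violation.
      INSERT INTO names AS n (name)
      VALUES (_name)
      ON CONFLICT (name) DO NOTHING
      RETURNING n.id INTO _id;
      EXIT WHEN FOUND;

      -- Neither found nor inserted: a concurrent transaction holds the
      -- conflicting row. Loop and look again.
   END LOOP;
END
$func$;
```

Call:

```sql
SELECT f_name_id('bob');
```

Each statement inside the loop sees rows committed in the meantime, which is why the retry eventually finds the row that blocked the INSERT.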
Without concurrent write load
If concurrent writes (from a different session) are not possible you don't need to lock the row and can simplify:
WITH ins AS (
INSERT INTO names(name)
VALUES ('bob')
ON CONFLICT ON CONSTRAINT names_name_key DO NOTHING -- no lock needed
RETURNING id
)
SELECT id FROM ins
UNION ALL
SELECT id FROM names
WHERE name = 'bob' -- only executed if no INSERT
LIMIT 1;

How to retrieve the actual default value for a column before insertion

I have a table in my postgres database that looks like this when I describe it.
Table "public.statistical_outputs"
Column | Type | Modifiers
-------------------+--------------------------+------------------------------------------------------------------
id | bigint | not null default nextval('statistical_outputs_id_seq'::regclass)
I want to know what value will be inserted into the id column if I use a statement like
insert into statistical_outputs VALUES (DEFAULT);
I have tried things like
select nextval('id') from statistical_outputs;
but it does not work.
Possibly related questions:
postgresql sequence nextval in schema
PostgreSQL nextval and currval in same query
This question is a possible duplicate of:
Get the default values of table columns in Postgres?
However, the answer given by Chris is the one I want without having to look at the information schema (which I think I tried but didn't work).
There's no way to do what you want directly - you can't preview the value.
Imagine:
regress=> CREATE TABLE crazy (blah integer, rand float4 default random());
CREATE TABLE
regress=> insert into crazy(blah, rand) values (1, DEFAULT);
INSERT 0 1
regress=> select * from crazy;
blah | rand
------+----------
1 | 0.932575
(1 row)
random() is a volatile function that returns a different value each time. So any attempt to preview the value would only get you a different value to the one that'll be inserted.
The same is true of nextval as concurrent transactions can affect the value - even if you directly read the current sequence position, which PostgreSQL tries to prevent you from doing (because it'll produce wrong results). It's just more obvious to think about this problem with random than nextval.
So, with a volatile default, all you can do is:
Evaluate the default expression yourself, then supply the value in the insert, i.e. call SELECT nextval('statistical_outputs_id_seq') then INSERT INTO ... VALUES (..., 'the value from nextval()');
Use RETURNING to obtain the generated value
I suggest the latter. The former is annoying and difficult in the general case, since a default can be any arbitrary expression.
Example for RETURNING:
regress=> insert into crazy(blah, rand) values (1, DEFAULT) RETURNING rand;
rand
----------
0.975092
(1 row)
INSERT 0 1
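Applied to the table from the question, the two options could be sketched as follows. This assumes that every other column of statistical_outputs is nullable or has a default; the sequence name is taken from the column default shown in the question.

```sql
-- Option 2 (suggested): let the default fire and read the value back.
-- Assumes all other columns are nullable or have defaults.
INSERT INTO statistical_outputs DEFAULT VALUES
RETURNING id;

-- Option 1: reserve a value from the sequence first, then supply it
-- explicitly. nextval() is evaluated once in the CTE.
WITH reserved AS (
   SELECT nextval('statistical_outputs_id_seq') AS id
)
INSERT INTO statistical_outputs (id)
SELECT id FROM reserved
RETURNING id;
```

In both cases the value is only known once it has been consumed from the sequence. There is no way to peek at it without taking it.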
The default value for a column that is a sequence will vary depending on the transaction and its relation to the current state of the MVCC.
That is to say, it will all depend on when you first get the value of the sequence, and what other transactions currently involve that sequence. i.e. the default will vary over time, dependent heavily on how other transactions are using that sequence.
The closest way to determine the default value (and again, this will vary over time) is to select the currval of the sequence, with the understanding that it's theoretically possible for another transaction to call nextval afterward and change it. The usefulness of calling currval will depend on exactly what you want to do with the value.
Edit in response to comment:
@Craig Ringer points out that calling currval first requires a call to nextval, which is a fair point.

DB2 locking when no record yet exists

I have a table, something like:
create table state (foo int not null, bar int not null, baz varchar(32));
create unique index on state(foo,bar);
I'd like to lock for a unique record in this table. However, if there's no existing record I'd like to prevent anyone else from inserting a record, but without inserting myself.
I'd use "FOR UPDATE WITH RS USE AND KEEP EXCLUSIVE LOCKS" but that only seems to work if the record exists.
A) You can let DB2 create every ID number. Let's say you have defined your Customer table
CREATE TABLE Customers
( CustomerID Int NOT NULL
GENERATED ALWAYS AS IDENTITY
PRIMARY KEY
, Name Varchar(50)
, Billing_Type Char(1)
, Balance Dec(9,2) NOT NULL DEFAULT
);
Insert rows without specifying the CustomerID, since DB2 will always produce the value for you.
INSERT INTO Customers
(Name, Billing_Type)
VALUES
(:cname, :billtype);
If you need to know what the last value assigned in your session was, you can then use the IDENTITY_VAL_LOCAL() function.
B) In my environment, I generally specify GENERATED BY DEFAULT. This is in part due to the nature of our principal programming language, ILE RPG-IV, where developers have traditionally allowed the compiler to use the entire record definition. This leads me to I can tell everyone to use a sequence to generate ID values for a given table or set of tables.
You can grant select to only you, but if there are others with secadm or other privileges, they could insert.
You can do something with a trigger, something like check the current session, and if the user is your user, then it inserts the row.
if (SESSION_USER <> 'Alex') then
rollback -- or generate an exception
end if;
It seems that you also want to keep just one row, then, you can control that also in a trigger:
select count(0) into value from state;
if (value > 1) then
rollback -- or generate an exception
end if;

SQL constraint to prevent updating a column based on its prior value

Can a Check Constraint (or some other technique) be used to prevent a value from being set that contradicts its prior value when its record is updated.
One example would be a NULL timestamp indicating something happened, like "file_exported". Once a file has been exported and has a non-NULL value, it should never be set to NULL again.
Another example would be a hit counter, where an integer is only permitted to increase, but can never decrease.
If it helps I'm using postgresql, but I'd like to see solutions that fit any SQL implementation
Use a trigger. This is a perfect job for a simple PL/pgSQL BEFORE UPDATE ... FOR EACH ROW trigger, which can see both the NEW and OLD values.
See trigger procedures.
lfLoop has the best approach to the question. But to continue Craig Ringer's approach using triggers, here is an example. Essentially, you are setting the value of the column back to the original (old) value before you update.
CREATE OR REPLACE FUNCTION example_trigger()
RETURNS trigger AS
$BODY$
BEGIN
new.valuenottochange := old.valuenottochange;
new.valuenottochange2 := old.valuenottochange2;
RETURN new;
END
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
DROP TRIGGER IF EXISTS trigger_name ON tablename;
CREATE TRIGGER trigger_name BEFORE UPDATE ON tablename
FOR EACH ROW EXECUTE PROCEDURE example_trigger();
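The trigger above silently restores the old values. If the update should be rejected instead, a variant can compare OLD and NEW and raise an exception. Below is a minimal sketch for the two examples from the question; the table file, its columns and the name file_guard are hypothetical and only serve the illustration.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE file (
   id            serial PRIMARY KEY,
   file_exported timestamptz,                 -- NULL until the file is exported
   hit_count     integer NOT NULL DEFAULT 0   -- may only grow
);

CREATE FUNCTION file_guard()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   -- Once set, the export timestamp must not go back to NULL.
   IF OLD.file_exported IS NOT NULL AND NEW.file_exported IS NULL THEN
      RAISE EXCEPTION 'file_exported cannot be reset to NULL (id = %)', OLD.id;
   END IF;

   -- The counter may stay the same or increase, never decrease.
   IF NEW.hit_count < OLD.hit_count THEN
      RAISE EXCEPTION 'hit_count cannot decrease (id = %)', OLD.id;
   END IF;

   RETURN NEW;
END
$func$;

CREATE TRIGGER file_guard
BEFORE UPDATE ON file
FOR EACH ROW EXECUTE PROCEDURE file_guard();
```

An UPDATE that violates either rule now fails and rolls back, so the caller learns about the problem instead of having the change quietly discarded.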
One example would be a NULL timestamp indicating something happened,
like "file_exported". Once a file has been exported and has a non-NULL
value, it should never be set to NULL again.
Another example would be a hit counter, where an integer is only
permitted to increase, but can never decrease.
In both of these cases, I simply wouldn't record these changes as attributes on the annotated table; the 'exported' or 'hit count' is a distinct idea, representing related but orthogonal real world notions from the objects they relate to:
So they would simply be different relations. Since we only want "file_exported" to occur once:
CREATE TABLE thing_file_exported(
thing_id INTEGER PRIMARY KEY REFERENCES thing(id),
file_name VARCHAR NOT NULL
);
The hit counter is similarly a different table:
CREATE TABLE thing_hits(
thing_id INTEGER NOT NULL REFERENCES thing(id),
hit_date TIMESTAMP NOT NULL,
PRIMARY KEY (thing_id, hit_date)
);
And you might query with
SELECT thing.col1, thing.col2, tfe.file_name, count(th.thing_id)
FROM thing
LEFT OUTER JOIN thing_file_exported tfe
ON (thing.id = tfe.thing_id)
LEFT OUTER JOIN thing_hits th
ON (thing.id = th.thing_id)
GROUP BY thing.col1, thing.col2, tfe.file_name
Stored procedures and functions in PostgreSQL have access to both old and new values, and that code can access arbitrary tables and columns. It's not hard to build simple (crude?) finite state machines in stored procedures. You can even build table-driven state machines that way.