Reference value of serial column in another column during same INSERT - sql

I have a table with a SERIAL primary key, and also an ltree column, whose value I want to be the concatenation of those primary keys. e.g.
id | path
----------
1 1
2 1.2
3 1.2.3
4 1.4
5 1.5
I'm curious if there's a way to do such an insert in one query, e.g.
INSERT INTO foo (id, ltree) VALUES (DEFAULT, THIS.id::text)
I'm probably overreaching here and trying to do in one query what I should be doing in two (grouped in a transaction).

You could use a CTE to retrieve the value from the sequence once and use it repeatedly:
WITH cte AS (
SELECT nextval('foo_id_seq') AS id
)
INSERT INTO foo (id, ltree)
SELECT id, '1.' || id
FROM cte;
The CTE with a data-modifying command requires Postgres 9.1 or later.
If you are not sure about the name of the sequence, use pg_get_serial_sequence() instead:
WITH i AS (
SELECT nextval(pg_get_serial_sequence('foo', 'id')) AS id
)
INSERT INTO foo (id, ltree)
SELECT id, '1.' || id
FROM i;
If the table name "foo" might not be unique across all schemas in the DB, schema-qualify it. And if the spelling of any name is non-standard, you have to double-quote:
pg_get_serial_sequence('"My_odd_Schema".foo', 'id')
Quick tests indicated @Mark's idea with lastval() might work too:
INSERT INTO foo (ltree) VALUES ('1.' || lastval());
You can just leave id out of the query; the serial column will be assigned automatically. It makes no difference.
There shouldn't be a race condition between rows. I quote the manual:
currval
Return the value most recently obtained by nextval for this sequence in the current session. (An error is reported if nextval has
never been called for this sequence in this session.) Because this is
returning a session-local value, it gives a predictable answer whether
or not other sessions have executed nextval since the current session
did.
This function requires USAGE or SELECT privilege on the sequence.
lastval
Return the value most recently returned by nextval in the current session. This function is identical to currval, except that instead of
taking the sequence name as an argument it refers to whichever
sequence nextval was most recently applied to in the current session.
It is an error to call lastval if nextval has not yet been called in
the current session.
This function requires USAGE or SELECT privilege on the last used sequence.
Bold emphasis mine.
But, as @Bernard commented, it can fail after all: there is no guarantee that the default value is filled (and nextval() called in the process) before lastval() is called to fill the 2nd column ltree. So stick with the first solution and nextval() to be sure.
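The examples above hard-code the prefix '1.'. As a minimal sketch of the same nextval() technique with the prefix taken from a parent row instead, the following assumes the ltree extension is installed, that foo looks like the table in the question (id serial, ltree ltree), and that the parent id 2 is just a stand-in input value:

```sql
-- Sketch only: foo(id serial, ltree ltree) and the parent id 2 are assumptions.
WITH new_id AS (
   -- draw the sequence value exactly once
   SELECT nextval(pg_get_serial_sequence('foo', 'id')) AS id
)
INSERT INTO foo (id, ltree)
SELECT n.id
     , (p.ltree::text || '.' || n.id::text)::ltree  -- parent path + new id
FROM   new_id n
JOIN   foo p ON p.id = 2                            -- hypothetical parent row
RETURNING id, ltree;
```

If no parent row matches, nothing is inserted, so the caller can tell from the empty result that the parent id was wrong.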

This worked in my test:
INSERT INTO foo (id, ltree) VALUES (DEFAULT, (SELECT last_value from foo_id_seq));
I think there's a race condition there if two INSERTs are happening at the same time, since this references the last sequence value, instead of the current row. I would personally be more inclined to do this (pseudo-code):
my $id = SELECT nextval('foo_id_seq');
INSERT INTO foo (id, ltree) VALUES ($id, '$id');

Related

Is it safe to insert row, with value incremented by CTE select?

Say we have this table:
create table if not exists template (
id serial primary key,
label text not null,
version integer not null default 1,
created_at timestamp not null default current_timestamp,
unique(label, version)
);
The goal is to insert a new record, incrementing the version if a record with the same label already exists. My first attempt looks like this:
with v as (
select coalesce(max(version), 0) + 1 as new_version
from template t where label = 'label1'
)
insert into template (label, version)
values ('label1', (select new_version from v))
returning *;
Although it works, I'm pretty sure it isn't safe under simultaneous inserts. Am I right?
If I am, should I wrap this query in a transaction?
Or is there a better way to implement this kind of versioning?
Gap-less serial IDs per label are hard to come by. Your simple approach can easily fail with concurrent writes due to inherent race conditions. And "value-locking" is not generally implemented in Postgres.
But there is a way. Introduce a parent table label - if you don't already have one - and take a lock on the parent row. This keeps locking to a minimum and should avoid excessive costs from lock contention.
CREATE TABLE label (
label text PRIMARY KEY
);
CREATE TABLE version (
id serial PRIMARY KEY
, label text NOT NULL REFERENCES label
, version integer NOT NULL DEFAULT 1
, created_at timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP
, UNIQUE(label, version)
);
Then, in a single transaction:
BEGIN;
INSERT INTO label (label)
VALUES ('label1')
ON CONFLICT (label) DO UPDATE
SET label = NULL WHERE false -- never executed, but still locks the row
RETURNING *; -- optional; returns a row only if the label was newly inserted
INSERT INTO version (label, version)
SELECT 'label1', coalesce(max(v.version), 0) + 1
FROM version v
WHERE v.label = 'label1'
RETURNING *;
COMMIT;
The first UPSERT inserts a new label if it's not there, yet, or locks the row if it is. Either way, the transaction now holds a lock on that label, excluding concurrent writes.
The second INSERT adds a new version, or the first one if there are none, yet.
You could also move the UPSERT into a CTE attached to the INSERT, thus making it a single command and hence always a single transaction implicitly. But the CTE is not needed per se.
This is safe under concurrent write load and works for all corner cases. You just have to make sure that all possibly competing write access takes the same route.
You might wrap this into a function. This ...
... ensures a single transaction
... simplifies the call, with a single mention of the label value
... allows you to revoke write privileges from the tables and only grant them to this function if desired, enforcing the right access pattern.
CREATE FUNCTION f_new_label (_label text)
RETURNS TABLE (label text, version int)
LANGUAGE sql STRICT AS
$func$
INSERT INTO label (label)
VALUES (_label)
ON CONFLICT (label) DO UPDATE
SET label = NULL WHERE false; -- never executed, but still locks the row
INSERT INTO version AS v (label, version)
SELECT _label, coalesce(max(v1.version), 0) + 1
FROM version v1
WHERE v1.label = _label
RETURNING v.label, v.version;
$func$;
Call:
SELECT * FROM f_new_label('label1');
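The privilege idea mentioned above could look roughly like this. It is a sketch under assumptions, not part of the original answer: the tables and the function are assumed to live in schema public, app_user is a hypothetical role, and the statements are run by the owner of the tables and the function. The function has to be SECURITY DEFINER so that it can write on behalf of callers who have no write privileges themselves.

```sql
-- Assumptions: schema public, hypothetical role app_user, run as the owner.
REVOKE INSERT, UPDATE, DELETE ON label, version FROM PUBLIC, app_user;

-- Run the function with the owner's privileges and pin the search_path.
ALTER FUNCTION f_new_label(text)
   SECURITY DEFINER
   SET search_path = public, pg_temp;

-- Functions are executable by PUBLIC by default; narrow that down.
REVOKE EXECUTE ON FUNCTION f_new_label(text) FROM PUBLIC;
GRANT  EXECUTE ON FUNCTION f_new_label(text) TO app_user;
```

After this, app_user can still read the tables if SELECT has been granted, but can only add versions through f_new_label().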
Related:
How to use RETURNING with ON CONFLICT in PostgreSQL?
UPDATE n:m relation in view as array (operations)
Yes, there could be collisions with simultaneous inserts.
Transactions could lead to locks, as you want to keep the state of the table until the insert with the new version number occurs. Apparently, Postgres can run into a deadlock with multiple concurrent inserts.
You can use a BEFORE INSERT trigger, which would guarantee that every insert gets a higher version number, as it goes row by row.
But keep in mind that CPUs and the SQL server can rearrange the order of computation, so the rule "first come, first served" may not apply.

How to insert a row if not exists otherwise select and return its ID in both cases in MariaDB?

I have a table with ID primary key (autoincrement) and a unique column Name. Is there an efficient way in MariaDB to insert a row into this table if the same Name doesn't exist, otherwise select the existing row and, in both cases, return the ID of the row with this Name?
Here's a solution for Postgres. However, it seems MariaDB doesn't have the RETURNING id clause.
What I have tried so far is brute-force:
INSERT IGNORE INTO services (Name) VALUES ('JohnDoe');
SELECT ID FROM services WHERE Name='JohnDoe';
UPDATE: MariaDB 10.5 has a RETURNING clause. However, the queries I have tried so far throw a syntax error:
WITH i AS (INSERT IGNORE INTO services (`Name`) VALUES ('John') RETURNING ID)
SELECT ID FROM i
UNION
SELECT ID FROM services WHERE `Name`='John'
For a single row, assuming id is AUTO_INCREMENT.
INSERT INTO t (name)
VALUES ('JohnDoe')
ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id);
SELECT LAST_INSERT_ID();
That looks kludgy, but it is an example in the documentation.
Caution: Most forms of INSERT will "burn" auto_inc ids. That is, they grab the next id(s) before realizing that the id won't be used. This could lead to overflowing the max auto_inc size.
It is also wise not to put the normalization inside the transaction that does the "meat" of the code. It ties up the table unnecessarily long and runs extra risk of burning ids in the case of rollback.
For batch updating of a 'normalization' table like that, see my notes here: http://mysql.rjweb.org/doc.php/staging_table#normalization (It avoids burning ids.)

Get Id from a conditional INSERT

For a table like this one:
CREATE TABLE Users(
id SERIAL PRIMARY KEY,
name TEXT UNIQUE
);
What would be the correct one-query insert for the following operation:
Given a user name, insert a new record and return the new id. But if the name already exists, just return the id.
I am aware of the new syntax within PostgreSQL 9.5 for ON CONFLICT(column) DO UPDATE/NOTHING, but I can't figure out how, if at all, it can help, given that I need the id to be returned.
It seems that RETURNING id and ON CONFLICT do not belong together.
The UPSERT implementation is hugely complex to be safe against concurrent write access. Take a look at this Postgres Wiki that served as log during initial development. The Postgres hackers decided not to include "excluded" rows in the RETURNING clause for the first release in Postgres 9.5. They might build something in for the next release.
This is the crucial statement in the manual to explain your situation:
The syntax of the RETURNING list is identical to that of the output
list of SELECT. Only rows that were successfully inserted or updated
will be returned. For example, if a row was locked but not updated
because an ON CONFLICT DO UPDATE ... WHERE clause condition was not
satisfied, the row will not be returned.
Bold emphasis mine.
For a single row to insert:
Without concurrent write load on the same table
WITH ins AS (
INSERT INTO users(name)
VALUES ('new_usr_name') -- input value
ON CONFLICT(name) DO NOTHING
RETURNING users.id
)
SELECT id FROM ins
UNION ALL
SELECT id FROM users -- 2nd SELECT never executed if INSERT successful
WHERE name = 'new_usr_name' -- input value a 2nd time
LIMIT 1;
With possible concurrent write load on the table
Consider this instead (for single row INSERT):
Is SELECT or INSERT in a function prone to race conditions?
To insert a set of rows:
How to use RETURNING with ON CONFLICT in PostgreSQL?
How to include excluded rows in RETURNING from INSERT ... ON CONFLICT
All three with very detailed explanation.
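For the concurrent case, one common pattern is a small PL/pgSQL function that retries until it either finds the row or inserts it. The following is a minimal sketch of that pattern, not a quote from the linked answers; the function name f_user_id is hypothetical and the table is the users table from the question:

```sql
-- Sketch: f_user_id is a hypothetical name; users(id serial, name text UNIQUE) as above.
CREATE OR REPLACE FUNCTION f_user_id(_name text, OUT _id int)
  LANGUAGE plpgsql AS
$func$
BEGIN
   LOOP
      -- 1. Try to find an existing row.
      SELECT id INTO _id FROM users WHERE name = _name;
      EXIT WHEN FOUND;

      -- 2. Not found: try to insert. A concurrent insert of the same name
      --    makes this a no-op, FOUND stays false, and the loop retries.
      INSERT INTO users (name)
      VALUES (_name)
      ON CONFLICT (name) DO NOTHING
      RETURNING id INTO _id;
      EXIT WHEN FOUND;
   END LOOP;
END
$func$;
```

Call it with SELECT f_user_id('new_usr_name'); the name is then mentioned only once per call.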
For a single row insert and no update:
with i as (
insert into users (name)
select 'the name'
where not exists (
select 1
from users
where name = 'the name'
)
returning id
)
select id
from users
where name = 'the name'
union all
select id from i
The manual about the primary and the with subqueries parts:
The primary query and the WITH queries are all (notionally) executed at the same time
Although that sounds like "same snapshot" to me, I'm not sure, since I don't know what "notionally" means in that context.
But there is also:
The sub-statements in WITH are executed concurrently with each other and with the main query. Therefore, when using data-modifying statements in WITH, the order in which the specified updates actually happen is unpredictable. All the statements are executed with the same snapshot
If I understand correctly, that "same snapshot" bit prevents a race condition. But again, I'm not sure whether "all the statements" refers only to the statements in the WITH subqueries, excluding the main query. To avoid any doubt, move the select from the previous query into a WITH subquery:
with s as (
select id
from users
where name = 'the name'
), i as (
insert into users (name)
select 'the name'
where not exists (select 1 from s)
returning id
)
select id from s
union all
select id from i

How to retrieve the actual default value for a column before insertion

I have a table in my postgres database that looks like this when I describe it.
Table "public.statistical_outputs"
Column | Type | Modifiers
-------------------+--------------------------+------------------------------------------------------------------
id | bigint | not null default nextval('statistical_outputs_id_seq'::regclass)
I want to know what value will be inserted into the id column if I use a statement like
insert into statistical_outputs VALUES (DEFAULT);
I have tried things like
select nextval('id') from statistical_outputs;
but it does not work.
Possibly related questions:
postgresql sequence nextval in schema
PostgreSQL nextval and currval in same query
This question is a possible duplicate of:
Get the default values of table columns in Postgres?
However, the answer given by Chris is the one I want without having to look at the information schema (which I think I tried but didn't work).
There's no way to do what you want directly - you can't preview the value.
Imagine:
regress=> CREATE TABLE crazy (blah integer, rand float4 default random());
CREATE TABLE
regress=> insert into crazy(blah, rand) values (1, DEFAULT);
INSERT 0 1
regress=> select * from crazy;
blah | rand
------+----------
1 | 0.932575
(1 row)
random() is a volatile function that returns a different value each time. So any attempt to preview the value would only get you a different value to the one that'll be inserted.
The same is true of nextval as concurrent transactions can affect the value - even if you directly read the current sequence position, which PostgreSQL tries to prevent you from doing (because it'll produce wrong results). It's just more obvious to think about this problem with random than nextval.
So, with a volatile default, all you can do is:
Evaluate the default expression yourself, then supply the value in the insert, i.e. call SELECT nextval('statistical_outputs_id_seq') then INSERT INTO ... VALUES (..., 'the value from nextval()');
Use RETURNING to obtain the generated value
I suggest the latter. The former is annoying and difficult in the general case, since a default can be any arbitrary expression.
Example for RETURNING:
regress=> insert into crazy(blah, rand) values (1, DEFAULT) RETURNING rand;
rand
----------
0.975092
(1 row)
INSERT 0 1
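Applied to the table from the question, the two options might look like the sketch below. It assumes the sequence name shown in the table description, that all other columns of statistical_outputs are nullable or have defaults, and (for the first option only) that the statements are run in psql, whose \gset meta-command stores a query result in a variable:

```sql
-- Option 1: evaluate the default yourself, then supply the value (psql only).
SELECT nextval('statistical_outputs_id_seq') AS new_id \gset
INSERT INTO statistical_outputs (id) VALUES (:new_id);

-- Option 2: let the default fire and read back what was generated.
INSERT INTO statistical_outputs (id) VALUES (DEFAULT) RETURNING id;
```

Either way the value is known only once nextval() has actually been called; there is still no way to peek at it in advance.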
The default value for a column that is a sequence will vary depending on the transaction and its relation to the current state of the MVCC.
That is to say, it will all depend on when you first get the value of the sequence, and what other transactions currently involve that sequence. i.e. the default will vary over time, dependent heavily on how other transactions are using that sequence.
The closest way to determine the default value (and again, this will vary over time) is to select the currval of the sequence (with the understanding that it's theoretically possible for another transaction to call nextval afterward and change it, although the usefulness of calling currval will depend on exactly what you want to do with the value).
Edit in response to comment:
@Craig Ringer points out that to call currval first requires a call to nextval, which is a fair point.

How to obtain a DB2 Sequence Value in a Multithreaded Application

I am working on a multithreaded application that uses DB2 for its primary database. In the past we've mostly used Identity columns for tables where we needed an auto-generated unique identifier. To do that we would run the below 2 queries in the same transaction:
INSERT INTO tbname (IDENTITY_COL, ...) VALUES (DEFAULT, ...);
SELECT IDENTITY_VAL_LOCAL() FROM SYSIBM.SYSDUMMY1;
We are now being pressured to switch to Sequence instead. I know you can use "NEXT VALUE FOR colname" in both INSERT and SELECT statements, but I can't figure out how to both INSERT and SELECT with the same value without risking a race condition in a multithreaded application. For example, if I use:
INSERT INTO tbname (SEQUENCE_COL, ...) VALUES (NEXT VALUE FOR SEQUENCE_COL, ...);
SELECT PREVIOUS VALUE FOR SEQUENCE_COL;
Then there's a possibility another INSERT was run between the above INSERT and SELECT, hence providing me the incorrect value. If I try:
SELECT NEXT VALUE FOR SEQUENCE_COL;
store the value in a variable and pass that in to the INSERT:
INSERT INTO tbname (SEQUENCE_COL, ...) VALUES (variable_value, ...);
Then there's a possibility another thread got the same NEXT VALUE and tries to insert the same value, resulting in a DB2 -803 error. Is it possible to use SEQUENCE columns in a multithreaded environment, or do I need to fight to keep my IDENTITY columns?
In addition to what Michael Sharek (correctly) said:
INSERT INTO tbname (SEQUENCE_COL, ...) VALUES (NEXT VALUE FOR SEQUENCE_COL, ...);
SELECT PREVIOUS VALUE FOR SEQUENCE_COL;
Your assumption "Then there's a possibility another INSERT was run between the above INSERT and SELECT, hence providing me the incorrect value" regarding the above sequence of statements is incorrect.
The "next value" and "previous value" are connection specific.
Access to a sequence from different threads will never create a "race" condition. Each connection has a completely isolated "environment" for the sequence.
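A way to convince yourself of this is to run the following from two separate connections. This is only an illustrative sketch: MY_SEQUENCE is a hypothetical sequence name, and the comments describe the relationship between the results rather than concrete numbers:

```sql
-- Connection A
VALUES NEXT VALUE FOR MY_SEQUENCE;      -- returns some value a

-- Connection B
VALUES NEXT VALUE FOR MY_SEQUENCE;      -- returns a different value b

-- Connection A again
VALUES PREVIOUS VALUE FOR MY_SEQUENCE;  -- still returns a, unaffected by connection B
```

In a multithreaded application this holds as long as each thread works on its own connection for the INSERT and the following SELECT.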
You've got a mistaken assumption in your question.
If I try:
SELECT NEXT VALUE FOR SEQUENCE_COL;
store the value in a variable and pass that in to the INSERT:
INSERT INTO tbname (SEQUENCE_COL, ...) VALUES (variable_value, ...);
Then there's a possibility another thread got the same NEXT VALUE and tries to insert the same value
That's not correct. The second thread would get a different NEXTVAL and not the same value as the first thread.
I also want to add my opinion on this part:
We are now being pressured to switch to Sequence instead.
I can't imagine there being a really good reason to switch to sequences from identity. They're basically the same thing.
In addition to the other correct answers, you can also just use a single statement to insert a row and return inserted values as follows:
SELECT SEQUENCE_COL FROM NEW TABLE (
INSERT INTO tbname (SEQUENCE_COL, ...) VALUES (NEXT VALUE FOR MY_SEQUENCE, ...)
)