For a table like this one:
CREATE TABLE Users(
id SERIAL PRIMARY KEY,
name TEXT UNIQUE
);
What would be the correct one-query insert for the following operation:
Given a user name, insert a new record and return the new id. But if the name already exists, just return the id.
I am aware of the new syntax within PostgreSQL 9.5 for ON CONFLICT(column) DO UPDATE/NOTHING, but I can't figure out how, if at all, it can help, given that I need the id to be returned.
It seems that RETURNING id and ON CONFLICT do not belong together.
The UPSERT implementation is hugely complex in order to be safe against concurrent write access. Take a look at this Postgres Wiki page, which served as a log during initial development. The Postgres hackers decided not to include "excluded" rows in the RETURNING clause for the first release in Postgres 9.5. They might build something in for the next release.
This is the crucial statement in the manual to explain your situation:
The syntax of the RETURNING list is identical to that of the output list of SELECT. Only rows that were successfully inserted or updated will be returned. For example, if a row was locked but not updated because an ON CONFLICT DO UPDATE ... WHERE clause condition was not satisfied, the row will not be returned.
For a single row to insert:
Without concurrent write load on the same table
WITH ins AS (
INSERT INTO users(name)
VALUES ('new_usr_name') -- input value
ON CONFLICT(name) DO NOTHING
RETURNING users.id
)
SELECT id FROM ins
UNION ALL
SELECT id FROM users -- 2nd SELECT never executed if INSERT successful
WHERE name = 'new_usr_name' -- input value a 2nd time
LIMIT 1;
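To make the control flow of the query above concrete, here is a minimal sketch in Python that uses the standard-library sqlite3 module as a stand-in for Postgres (an assumption made only so the example runs anywhere). SQLite's INSERT OR IGNORE plays the role of ON CONFLICT (name) DO NOTHING, and the cursor's rowcount decides whether the fallback SELECT is needed, just as the second SELECT above only runs when the INSERT returned nothing. The helper name get_or_create_user_id is hypothetical, and like the query it mirrors, it assumes no concurrent writers.

```python
import sqlite3


def get_or_create_user_id(conn: sqlite3.Connection, name: str) -> int:
    """Insert `name` if it is new, otherwise return the id of the existing row.

    Hypothetical helper; SQLite stand-in for the Postgres CTE above.
    Not safe against concurrent writers.
    """
    # INSERT OR IGNORE ~ ON CONFLICT (name) DO NOTHING
    cur = conn.execute("INSERT OR IGNORE INTO users (name) VALUES (?)", (name,))
    if cur.rowcount == 1:  # a new row was inserted, so its id is already known
        return cur.lastrowid
    # Nothing inserted: the name exists, so fall back to a lookup.
    row = conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
first = get_or_create_user_id(conn, "new_usr_name")  # inserted
again = get_or_create_user_id(conn, "new_usr_name")  # conflict, existing id
other = get_or_create_user_id(conn, "other_name")    # inserted
print(first, again, other)  # 1 1 2
```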
With possible concurrent write load on the table
Consider this instead (for single row INSERT):
Is SELECT or INSERT in a function prone to race conditions?
To insert a set of rows:
How to use RETURNING with ON CONFLICT in PostgreSQL?
How to include excluded rows in RETURNING from INSERT ... ON CONFLICT
All three come with detailed explanations.
For a single row insert and no update:
with i as (
insert into users (name)
select 'the name'
where not exists (
select 1
from users
where name = 'the name'
)
returning id
)
select id
from users
where name = 'the name'
union all
select id from i
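The same insert-where-not-exists shape can be tried out in SQLite from Python. SQLite has no data-modifying WITH clauses, so this sketch (the helper name select_or_insert is hypothetical) runs the conditional INSERT and then reads the id back in a second statement. It illustrates the logic only and says nothing about Postgres snapshot behaviour.

```python
import sqlite3


def select_or_insert(conn: sqlite3.Connection, name: str) -> int:
    # Insert only if no row with this name exists yet (same shape as above).
    conn.execute(
        """
        INSERT INTO users (name)
        SELECT ?
        WHERE NOT EXISTS (SELECT 1 FROM users WHERE name = ?)
        """,
        (name, name),
    )
    # No data-modifying CTEs in SQLite, so read the id back separately.
    return conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
ids = [select_or_insert(conn, n) for n in ("the name", "the name", "another name")]
print(ids)  # [1, 1, 2]
```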
The manual on the primary query and the WITH subqueries:
The primary query and the WITH queries are all (notionally) executed at the same time
Although that sounds to me like "same snapshot", I'm not sure, since I don't know what "notionally" means in that context.
But there is also:
The sub-statements in WITH are executed concurrently with each other and with the main query. Therefore, when using data-modifying statements in WITH, the order in which the specified updates actually happen is unpredictable. All the statements are executed with the same snapshot
If I understand correctly, that same-snapshot guarantee prevents a race condition. But again, I'm not sure whether "all the statements" refers only to the statements in the WITH subqueries, excluding the main query. To avoid any doubt, move the SELECT in the previous query into a WITH subquery:
with s as (
select id
from users
where name = 'the name'
), i as (
insert into users (name)
select 'the name'
where not exists (select 1 from s)
returning id
)
select id from s
union all
select id from i
I have a table with ID primary key (autoincrement) and a unique column Name. Is there an efficient way in MariaDB to insert a row into this table if the same Name doesn't exist, otherwise select the existing row and, in both cases, return the ID of the row with this Name?
Here's a solution for Postgres. However, it seems MariaDB doesn't have the RETURNING id clause.
What I have tried so far is brute-force:
INSERT IGNORE INTO services (Name) VALUES ('JohnDoe');
SELECT ID FROM services WHERE Name='JohnDoe';
UPDATE: MariaDB 10.5 has a RETURNING clause; however, the queries I have tried so far throw a syntax error:
WITH i AS (INSERT IGNORE INTO services (`Name`) VALUES ('John') RETURNING ID)
SELECT ID FROM i
UNION
SELECT ID FROM services WHERE `Name`='John'
For a single row, assuming id is AUTO_INCREMENT.
INSERT INTO t (name)
VALUES ('JohnDoe')
ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id);
SELECT LAST_INSERT_ID();
That looks kludgy, but it is an example in the documentation.
Caution: Most forms of INSERT will "burn" auto_inc ids. That is, they grab the next id(s) before realizing that the id won't be used. This could lead to overflowing the max auto_inc size.
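The LAST_INSERT_ID(id) trick and the "burned" ids can be illustrated with a toy model. This is a deliberately simplified, hypothetical Python model, not MariaDB code: it assumes every INSERT attempt reserves the next id before the duplicate-key check (the behaviour the caution above describes), and that a duplicate then sets the session's LAST_INSERT_ID() to the existing row's id.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyAutoIncTable:
    """Hypothetical, simplified model of an AUTO_INCREMENT id plus a UNIQUE name.

    Not MariaDB code: it only mimics the two behaviours described in the text.
    """

    next_id: int = 1  # next AUTO_INCREMENT value to hand out
    ids_by_name: Dict[str, int] = field(default_factory=dict)
    last_insert_id: int = 0  # what SELECT LAST_INSERT_ID() would report

    def insert_on_duplicate(self, name: str) -> int:
        """Model INSERT ... ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id)."""
        reserved = self.next_id  # id is grabbed before the duplicate check
        self.next_id += 1
        if name in self.ids_by_name:
            # Duplicate: the reserved id is never used ("burned"), and
            # LAST_INSERT_ID(id) records the existing row's id instead.
            self.last_insert_id = self.ids_by_name[name]
        else:
            self.ids_by_name[name] = reserved
            self.last_insert_id = reserved
        return self.last_insert_id


t = ToyAutoIncTable()
a = t.insert_on_duplicate("JohnDoe")  # new row, id 1
b = t.insert_on_duplicate("JohnDoe")  # duplicate: returns 1, burns id 2
c = t.insert_on_duplicate("JaneDoe")  # new row gets id 3, leaving a gap
print(a, b, c)  # 1 1 3
```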
It is also wise not to put the normalization inside the transaction that does the "meat" of the code. It ties up the table unnecessarily long and runs extra risk of burning ids in the case of rollback.
For batch updating of a 'normalization' table like that, see my notes here: http://mysql.rjweb.org/doc.php/staging_table#normalization (It avoids burning ids.)
I'm trying to split what was a large table update into multiple inserts into working tables. One of the queries uses the row number. On an INSERT in Oracle, can I add ROWNUM as an explicit column? This is a working table ultimately used in a reporting operation with a nasty OVER (PARTITION BY ...) clause, and having a true row number is helpful.
create table MY_TABLE(KEY number,SOMEVAL varchar2(30),EXPLICIT_ROW_NUMBER NUMBER);
INSERT /*+PARALLEL(AUTO) */ INTO MY_TABLE(KEY,SOMEVAL,EXPLICIT_ROW_NUMBER) (
SELECT /*+PARALLEL(AUTO) */ KEY,SOMEVAL,ROWNUM
FROM PREVIOUS_VERSION_OF_MY_TABLE
);
where PREVIOUS_VERSION_OF_MY_TABLE has both KEY and SOMEVAL columns.
I'd like it to number the rows in the order that the inner select statement does it. So, the first row in the select, had it been explicitly run, would have a ROWNUM of 1, etc. I don't want it reversed, etc.
The table above has over 80MM records. Originally I used an UPDATE, and when I ran it, I got an ORA error saying that I had run out of UNDO space. I no longer have the exact error message.
I'm trying to accomplish with multiple working tables the same thing I would have done with one or more UPDATEs. According to our company's DB team, adding UNDO space for this query is difficult or impossible without making me a DBA or spending about $100 on a hard drive and attaching it to the instance. So I need to write a more involved query to get around this limitation. The goal is to have a session id and the timestamps within that session and, for each timestamp within a session (except the last one), to show the next timestamp. The original queries are below:
update sc_hub_session_activity schat
set session_time_stamp_rank = (
select /*+parallel(AUTO) */ order_number
from (
select /*+parallel(AUTO) */ schat_all.explicit_row_number as explicit_row_number,row_number() over (partition by schat_all.session_key order by schat_all.session_key,schat_all.time_stamp) as order_number
from sc_hub_session_activity schat_all
where schat_all.session_key=schat.session_key
) schat_all_group
where schat.explicit_row_number = schat_all_group.explicit_row_number
);
commit;
update sc_hub_session_activity schat
set session_next_time_stamp = (
select /*+parallel(AUTO) */ time_stamp
from sc_hub_session_activity schat2
where (schat2.session_time_stamp_rank = schat.session_time_stamp_rank+1) and (schat2.session_key = schat.session_key)
);
commit;
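For reference, here is what the two UPDATEs above compute, as a small in-memory Python sketch with rows reduced to hypothetical (session_key, time_stamp) pairs: each timestamp's rank within its session, and the next timestamp in the same session (None for the last one). The sketch derives both values in one pass over each session's sorted timestamps instead of two correlated updates; in SQL, the analogous single-pass formulation would use ROW_NUMBER() and LEAD() with OVER (PARTITION BY session_key ORDER BY time_stamp). Ties in time_stamp are not handled specially here.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical simplified row shape: (session_key, time_stamp)
Row = Tuple[str, int]


def rank_and_next(rows: List[Row]) -> List[Tuple[str, int, int, Optional[int]]]:
    """Return (session_key, time_stamp, rank, next_time_stamp) for every row.

    rank ~ session_time_stamp_rank: row_number() within the session by time_stamp.
    next_time_stamp ~ session_next_time_stamp: time_stamp of rank + 1, else None.
    """
    by_session: Dict[str, List[int]] = defaultdict(list)
    for session_key, ts in rows:
        by_session[session_key].append(ts)

    result = []
    for session_key, stamps in by_session.items():
        stamps.sort()  # order by time_stamp within the partition
        for i, ts in enumerate(stamps):
            nxt = stamps[i + 1] if i + 1 < len(stamps) else None
            result.append((session_key, ts, i + 1, nxt))
    return result


rows = [("s1", 30), ("s2", 5), ("s1", 10), ("s1", 20), ("s2", 7)]
for r in rank_and_next(rows):
    print(r)
```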
We have a DB for which we need a "selsert" (not upsert) function.
The function should take a text value and either return the id of the existing row (SELECT) or insert the value and return the id of the new row (INSERT).
There are multiple processes that will need to perform this operation (selsert).
I have been experimenting with pg_advisory_lock and ON CONFLICT clause for INSERT but am still not sure what approach would work best (even when looking at some of the other answers).
So far I have come up with the following:
WITH
selected AS (
SELECT id FROM test.body_parts WHERE (lower(trim(part))) = lower(trim('finger')) LIMIT 1
),
inserted AS (
INSERT INTO test.body_parts (part)
SELECT trim('finger')
WHERE NOT EXISTS ( SELECT * FROM selected )
-- ON CONFLICT (lower(trim(part))) DO NOTHING -- not sure if this is needed
RETURNING id
)
SELECT id, 'inserted' FROM inserted
UNION
SELECT id, 'selected' FROM selected
Will the above query (within a function) ensure consistency under high-concurrency write workloads?
Are there any other issues I must consider (locking, etc.)?
BTW, I can ensure that there are no duplicate values of (part) by creating a unique index. That is not an issue. What I am after is that the SELECT returns the existing value if another process does the INSERT (I hope I am explaining this right).
The unique index would have the following definition:
CREATE UNIQUE INDEX body_parts_part_ux
ON test.body_parts
USING btree
(lower(trim(part)));
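A single-process sketch of the same selsert against SQLite, via Python's sqlite3 module, shows what the expression index buys you: values that differ only in case or surrounding spaces collide on lower(trim(part)), so INSERT OR IGNORE skips them and the lookup finds the existing row. Assumptions: SQLite stands in for Postgres, the helper name selsert is hypothetical, and SQLite's lower() only folds ASCII letters (Postgres's lower() is locale-aware). It does not test concurrency.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE body_parts (id INTEGER PRIMARY KEY, part TEXT NOT NULL)")
# Expression index analogous to body_parts_part_ux above.
conn.execute("CREATE UNIQUE INDEX body_parts_part_ux ON body_parts (lower(trim(part)))")


def selsert(conn: sqlite3.Connection, part: str) -> int:
    # Skipped silently if lower(trim(part)) already exists in the index.
    conn.execute("INSERT OR IGNORE INTO body_parts (part) VALUES (trim(?))", (part,))
    # The row exists either way now; look it up by the same normalized key.
    row = conn.execute(
        "SELECT id FROM body_parts WHERE lower(trim(part)) = lower(trim(?))",
        (part,),
    ).fetchone()
    return row[0]


a = selsert(conn, "finger")
b = selsert(conn, "  Finger ")  # same key after lower(trim(...))
c = selsert(conn, "thumb")
print(a, b, c)  # 1 1 2
```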
I have a situation where I very frequently need to get a row from a table with a unique constraint, and if none exists then create it and return.
For example my table might be:
CREATE TABLE names(
id SERIAL PRIMARY KEY,
name TEXT,
CONSTRAINT names_name_key UNIQUE (name)
);
And it contains:
id | name
1 | bob
2 | alice
Then I'd like to:
INSERT INTO names(name) VALUES ('bob')
ON CONFLICT DO NOTHING RETURNING id;
Or perhaps:
INSERT INTO names(name) VALUES ('bob')
ON CONFLICT (name) DO NOTHING RETURNING id
and have it return bob's id 1. However, RETURNING only returns either inserted or updated rows. So, in the above example, it wouldn't return anything. In order to have it function as desired I would actually need to:
INSERT INTO names(name) VALUES ('bob')
ON CONFLICT ON CONSTRAINT names_name_key DO UPDATE
SET name = 'bob'
RETURNING id;
which seems kind of cumbersome. I guess my questions are:
What is the reasoning for not allowing the (my) desired behaviour?
Is there a more elegant way to do this?
It's the recurring problem of SELECT or INSERT, related to (but different from) an UPSERT. The new UPSERT functionality in Postgres 9.5 is still instrumental.
WITH ins AS (
INSERT INTO names(name)
VALUES ('bob')
ON CONFLICT ON CONSTRAINT names_name_key DO UPDATE
SET name = NULL
WHERE FALSE -- never executed, but locks the row
RETURNING id
)
SELECT id FROM ins
UNION ALL
SELECT id FROM names
WHERE name = 'bob' -- only executed if no INSERT
LIMIT 1;
This way you do not actually write a new row version without need.
I assume you are aware that in Postgres every UPDATE writes a new version of the row due to its MVCC model - even if name is set to the same value as before. This would make the operation more expensive, add to possible concurrency issues / lock contention in certain situations and bloat the table additionally.
However, there is still a tiny corner case for a race condition. Concurrent transactions may have added a conflicting row, which is not yet visible in the same statement. Then INSERT and SELECT come up empty.
Proper solution for single-row UPSERT:
Is SELECT or INSERT in a function prone to race conditions?
General solutions for bulk UPSERT:
How to use RETURNING with ON CONFLICT in PostgreSQL?
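The race described above is usually closed with a retry loop: look for the row, try to insert it, and if the insert loses to a concurrent writer, look again. The following is a generic Python sketch of that loop (the helper and its max_attempts parameter are hypothetical), not the code from the linked answers. It uses sqlite3 only so it is self-contained; a single SQLite connection cannot race with itself, so the IntegrityError branch is not exercised by the usage lines. In Postgres this pattern relies on READ COMMITTED, where each new statement sees rows committed by other transactions.

```python
import sqlite3


def select_or_insert_with_retry(conn: sqlite3.Connection, name: str,
                                max_attempts: int = 5) -> int:
    """Return the id for `name`, inserting it if missing; retry if a concurrent
    writer inserts the same name between our SELECT and our INSERT."""
    for _ in range(max_attempts):
        row = conn.execute("SELECT id FROM names WHERE name = ?", (name,)).fetchone()
        if row is not None:
            return row[0]  # already there
        try:
            cur = conn.execute("INSERT INTO names (name) VALUES (?)", (name,))
            return cur.lastrowid  # we inserted it
        except sqlite3.IntegrityError:
            # UNIQUE violation: someone else inserted it first; select again.
            continue
    raise RuntimeError(f"gave up on {name!r} after {max_attempts} attempts")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
conn.executemany("INSERT INTO names (name) VALUES (?)", [("bob",), ("alice",)])
bob_id = select_or_insert_with_retry(conn, "bob")      # existing row
carol_id = select_or_insert_with_retry(conn, "carol")  # inserted
print(bob_id, carol_id)  # 1 3
```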
Without concurrent write load
If concurrent writes (from a different session) are not possible you don't need to lock the row and can simplify:
WITH ins AS (
INSERT INTO names(name)
VALUES ('bob')
ON CONFLICT ON CONSTRAINT names_name_key DO NOTHING -- no lock needed
RETURNING id
)
SELECT id FROM ins
UNION ALL
SELECT id FROM names
WHERE name = 'bob' -- only executed if no INSERT
LIMIT 1;
I'm trying to avoid writing separate SQL queries to achieve the following scenario:
I have a Table called Values:
Values:
id INT (PK)
data TEXT
I would like to check if certain data exists in the table, and if it does, return its id, otherwise insert it and return its id.
The (very) naive way would be:
select id from Values where data = "SOME_DATA";
if id is not null, take it.
if id is null then:
insert into Values(data) values("SOME_DATA");
and then select it again to see its id or use the returned id.
I am trying to do the above in a single statement.
I think I'm getting close, but I couldn't make it yet:
So far I got this:
select id from Values where data=(COALESCE((select data from Values where data="SOME_DATA"), (insert into Values(data) values("SOME_DATA"));
I'm trying to take advantage of the fact that the second select will return null and then the second argument to COALESCE will be returned. No success so far. What am I missing?
Your command does not work because in SQL, INSERT does not return a value.
If you have a unique constraint/index on the data column, you can use that to prevent duplicates if you blindly insert the value; this uses SQLite's INSERT OR IGNORE extension:
INSERT OR IGNORE INTO "Values"(data) VALUES('SOME_DATA');
SELECT id FROM "Values" WHERE data = 'SOME_DATA';
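Run from Python with the built-in sqlite3 module, the answer's two statements can be wrapped in a small helper (the name get_id is hypothetical). "Values" has to stay quoted because VALUES is an SQL keyword, and the UNIQUE constraint on data is what makes INSERT OR IGNORE skip duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "Values" must be quoted: VALUES is an SQL keyword.
conn.execute('CREATE TABLE "Values" (id INTEGER PRIMARY KEY, data TEXT UNIQUE)')


def get_id(conn: sqlite3.Connection, data: str) -> int:
    # Statement 1: insert, or do nothing if the UNIQUE constraint on data fires.
    conn.execute('INSERT OR IGNORE INTO "Values"(data) VALUES (?)', (data,))
    # Statement 2: the row now exists either way, so this always finds it.
    return conn.execute('SELECT id FROM "Values" WHERE data = ?', (data,)).fetchone()[0]


first = get_id(conn, "SOME_DATA")
second = get_id(conn, "SOME_DATA")
print(first, second)  # 1 1
```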