Get count of records affected by INSERT or UPDATE in PostgreSQL - sql

My database driver for PostgreSQL 8/9 does not return a count of records affected when executing INSERT or UPDATE.
PostgreSQL offers the non-standard RETURNING clause, which looks like a good workaround. But what would the syntax be? The documentation example returns the ID of a record, but I need a count.
INSERT INTO distributors (did, dname) VALUES (DEFAULT, 'XYZ Widgets')
RETURNING did;

I know this question is oooolllllld and my solution is arguably overly complex, but that's my favorite kind of solution!
Anyway, I had to do the same thing and got it working like this:
-- Get count from INSERT
WITH rows AS (
    INSERT INTO distributors (did, dname)
    VALUES (DEFAULT, 'XYZ Widgets'),
           (DEFAULT, 'ABC Widgets')
    RETURNING 1
)
SELECT count(*) FROM rows;

-- Get count from UPDATE
WITH rows AS (
    UPDATE distributors
    SET    dname = 'JKL Widgets'
    WHERE  did <= 10
    RETURNING 1
)
SELECT count(*) FROM rows;
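The same trick extends to DELETE; a minimal sketch (data-modifying statements in WITH need PostgreSQL 9.1 or later), reusing the distributors table:

-- Get count from DELETE
WITH rows AS (
    DELETE FROM distributors
    WHERE did > 10
    RETURNING 1
)
SELECT count(*) FROM rows;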
One of these days I really have to get around to writing a love sonnet to PostgreSQL's WITH clause ...

I agree with Milen: your driver should do this for you. What driver are you using, and for which language? That said, if you are in PL/pgSQL, you can use GET DIAGNOSTICS my_var = ROW_COUNT;
http://www.postgresql.org/docs/current/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-DIAGNOSTICS

You can read ROW_COUNT after an UPDATE or INSERT with this code; note it only works inside a PL/pgSQL function or DO block:
insert into distributors (did, dname) values (DEFAULT, 'XYZ Widgets');
get diagnostics v_cnt = row_count;
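For example, a minimal runnable sketch as a DO block (available since PostgreSQL 9.0), assuming the distributors table from the question:

DO $$
DECLARE
    v_cnt integer;
BEGIN
    INSERT INTO distributors (did, dname)
    VALUES (DEFAULT, 'XYZ Widgets');

    GET DIAGNOSTICS v_cnt = ROW_COUNT;
    RAISE NOTICE 'inserted % row(s)', v_cnt;  -- prints: inserted 1 row(s)
END
$$;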

It's not clear from your question how you're calling the statement. Assuming you're using something like JDBC, you may be calling it as a query rather than an update. From JDBC's executeQuery:
Executes the given SQL statement, which returns a single ResultSet
object.
This is therefore appropriate when you execute a statement that returns query results, such as SELECT or INSERT ... RETURNING. If you are making an update to the database and want to know how many tuples were affected, you need to use executeUpdate, which returns:
either (1) the row count for SQL Data Manipulation Language (DML)
statements or (2) 0 for SQL statements that return nothing
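In other words, which call applies depends on the SQL you send. Roughly (a sketch reusing the distributors table from the first question):

-- Produces a result set, so execute it as a query (executeQuery):
INSERT INTO distributors (did, dname)
VALUES (DEFAULT, 'XYZ Widgets')
RETURNING did;

-- Produces only a command tag with the row count, so execute it as an
-- update (executeUpdate) and read the method's return value:
INSERT INTO distributors (did, dname)
VALUES (DEFAULT, 'XYZ Widgets');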

You could wrap your statement in a transaction, and the client should show you the count before you ROLLBACK or COMMIT. Example:
BEGIN TRANSACTION;
INSERT .... ;
ROLLBACK TRANSACTION;
If you run the first two lines of the above, it should give you the count. You can then ROLLBACK (undo) the insert if you find that the number of affected rows isn't what you expected. If you're satisfied that the INSERT is correct, you can run the same thing but replace line 3 with COMMIT TRANSACTION;.
Important note: After you run any BEGIN TRANSACTION; you must either ROLLBACK; or COMMIT; the transaction; otherwise the open transaction holds locks that can slow down or even cripple an entire system if you're running in a production environment.
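For instance, in psql the count appears in the command tag before you have to decide; a sketch using the distributors table from the question:

BEGIN TRANSACTION;
INSERT INTO distributors (did, dname)
VALUES (DEFAULT, 'XYZ Widgets'),
       (DEFAULT, 'ABC Widgets');
-- psql answers with the command tag: INSERT 0 2
ROLLBACK TRANSACTION;  -- or COMMIT TRANSACTION; if the count looks right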

Related

SELECT after WITH returns no rows if in an explicit transaction

WITH generated_id AS (
INSERT INTO ... RETURNING id
)
SELECT id FROM generated_id;
returns 42, whereas
BEGIN;
WITH generated_id AS (
INSERT INTO ... RETURNING id
)
SELECT id FROM generated_id;
COMMIT;
returns nothing. Why?
Update:
I just found that WITH is irrelevant, because even a single select doesn't work:
SELECT something FROM some_table;
returns rows.
BEGIN;
SELECT something FROM some_table;
COMMIT;
returns no rows.
Update 2:
I thought BEGIN and START TRANSACTION were exactly the same thing. Anyway, I tried all possible combinations, but none of them work for me.
I'm using this free Postgres service, but I have now tested it with SQL Fiddle as well, and SQL Fiddle does not complain.
It is strange, however, that if I don't put a ; at the end of the SELECT line, my DB engine gives me a syntax error, while if I do put it there in SQL Fiddle, it tells me that explicit commits are not allowed.
So it's still very unclear to me what exactly is happening. Perhaps it only works in SQL Fiddle because SQL Fiddle isn't really running my query in an explicit transaction; if it were, the result would be the same: no rows, just as my DB engine behaves.
I'm unfortunately not able to test it on other servers, but if someone has a reliable Postgres setup, maybe they could try it and tell me whether it runs.
This will return nothing in most clients because you only see what the last command returned:
BEGIN;
SELECT something FROM some_table;
COMMIT;
Try instead:
BEGIN;
SELECT something FROM some_table;
But don't forget to COMMIT or ROLLBACK later to terminate the open transaction.
SQL Fiddle does not allow explicit transaction wrappers. I quote:
All SQL queries are run within a transaction that gets immediately rolled-back after the SQL executes.
BEGIN; issued inside an open transaction only issues a WARNING - which is not displayed in SQL Fiddle (you see a result with 0 rows).
COMMIT; raises the error you saw.
And in your attempt where you omit the semicolon after the SELECT, COMMIT is interpreted as a table alias:
BEGIN;
SELECT something FROM some_table
COMMIT;
... is equivalent to:
BEGIN;
SELECT something FROM some_table AS commit;
So that's another misunderstanding.
Did you try BEGIN WORK? I think you need to start with that; then you can end it with COMMIT.
Try to start with:
BEGIN TRANSACTION;
Then end with:
END TRANSACTION;

SQLITE: stop execution if select returns specific value

Is there any way to write an SQL input file for SQLite that would somehow "throw" an error, e.g. exit the transaction with a rollback, if a condition isn't met?
I have a script that is supposed to do something, but only if there is a certain row in one table. If it's not there, executing the script might have fatal results and corrupt the db.
The script is only started on demand right now, but I would prefer to add a fail-safe that would prevent its execution in case there is some issue.
Basically what I need is something like
/* IF */ SELECT value FROM meta WHERE key = 'version' /* != hardcoded_version_string THROW SOME EXCEPTION */
Is there any way to accomplish that? In Postgres / Oracle this could be done using PL/SQL, but I am not sure whether SQLite supports any such thing.
Triggers can use the RAISE function to generate errors:
CREATE VIEW test AS SELECT NULL AS value;

CREATE TRIGGER test_insert
INSTEAD OF INSERT ON test
BEGIN
    SELECT RAISE(FAIL, 'wrong value')
    WHERE NEW.value != 'fixed_value';
END;

INSERT INTO test SELECT 'fixed_value';
INSERT INTO test SELECT 'whatever';
Error: wrong value
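Adapted to the version check from the question, the same idea might look like this sketch (the meta table with key/value columns and the version_check names are my assumptions):

CREATE VIEW version_check AS SELECT NULL AS expected;

CREATE TRIGGER version_check_insert
INSTEAD OF INSERT ON version_check
BEGIN
    SELECT RAISE(ROLLBACK, 'wrong schema version')
    WHERE NOT EXISTS (
        SELECT 1 FROM meta
        WHERE "key" = 'version' AND value = NEW.expected
    );
END;

-- First statement of the script: aborts and rolls back on a mismatch.
INSERT INTO version_check SELECT 'hardcoded_version_string';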
Is there any way to write an SQL input file for SQLite that would
somehow "throw" an error, e.g. exit the transaction with a rollback, if
a condition isn't met?
One workaround may be to create a dummy table and explicitly violate its NOT NULL constraint:
CREATE TABLE meta("key" VARCHAR(100));
INSERT INTO meta("key") VALUES ('version');
CREATE TEMPORARY TABLE dummy(col INT NOT NULL);
Transaction:
BEGIN TRANSACTION;

INSERT INTO dummy(col)
SELECT NULL              -- explicit insert of NULL
FROM meta
WHERE "key" = 'version';
-- Error: NOT NULL constraint failed: dummy.col

-- rest of the script
INSERT INTO meta("key") VALUES ('val1');
INSERT INTO meta("key") VALUES ('val2');
-- ...

COMMIT;
SqlFiddleDemo
Keep in mind that SQLite is not a procedural language, and this solution is a bit ugly.

Is SELECT or INSERT in a function prone to race conditions?

I wrote a function to create posts for a simple blogging engine:
CREATE FUNCTION CreatePost(VARCHAR, TEXT, VARCHAR[])
RETURNS INTEGER AS $$
DECLARE
    InsertedPostId INTEGER;
    TagName        VARCHAR;
BEGIN
    INSERT INTO Posts (Title, Body)
    VALUES ($1, $2)
    RETURNING Id INTO InsertedPostId;

    FOREACH TagName IN ARRAY $3 LOOP
        DECLARE
            InsertedTagId INTEGER;
        BEGIN
            -- I am concerned about this part.
            BEGIN
                INSERT INTO Tags (Name)
                VALUES (TagName)
                RETURNING Id INTO InsertedTagId;
            EXCEPTION WHEN UNIQUE_VIOLATION THEN
                SELECT INTO InsertedTagId Id
                FROM Tags
                WHERE Name = TagName
                FETCH FIRST ROW ONLY;
            END;

            INSERT INTO Taggings (PostId, TagId)
            VALUES (InsertedPostId, InsertedTagId);
        END;
    END LOOP;

    RETURN InsertedPostId;
END;
$$ LANGUAGE 'plpgsql';
Is this prone to race conditions when multiple users delete tags and create posts at the same time?
Specifically, do transactions (and thus functions) prevent such race conditions from happening?
I'm using PostgreSQL 9.2.3.
It's the recurring problem of SELECT or INSERT under possible concurrent write load, related to (but different from) UPSERT (which is INSERT or UPDATE).
This PL/pgSQL function uses UPSERT (INSERT ... ON CONFLICT ...) to INSERT or SELECT a single row:
CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT _tag_id int)
  LANGUAGE plpgsql AS
$func$
BEGIN
   SELECT tag_id  -- only if row existed before
   FROM   tag
   WHERE  tag = _tag
   INTO   _tag_id;

   IF NOT FOUND THEN
      INSERT INTO tag AS t (tag)
      VALUES (_tag)
      ON     CONFLICT (tag) DO NOTHING
      RETURNING t.tag_id
      INTO   _tag_id;
   END IF;
END
$func$;
There is still a tiny window for a race condition. To make absolutely sure we get an ID:
CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT _tag_id int)
  LANGUAGE plpgsql AS
$func$
BEGIN
   LOOP
      SELECT tag_id
      FROM   tag
      WHERE  tag = _tag
      INTO   _tag_id;

      EXIT WHEN FOUND;

      INSERT INTO tag AS t (tag)
      VALUES (_tag)
      ON     CONFLICT (tag) DO NOTHING
      RETURNING t.tag_id
      INTO   _tag_id;

      EXIT WHEN FOUND;
   END LOOP;
END
$func$;
db<>fiddle here
This keeps looping until either INSERT or SELECT succeeds.
Call:
SELECT f_tag_id('possibly_new_tag');
If subsequent commands in the same transaction rely on the existence of the row and it is actually possible that other transactions update or delete it concurrently, you can lock an existing row in the SELECT statement with FOR SHARE.
If the row gets inserted instead, it is locked (or not visible for other transactions) until the end of the transaction anyway.
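A sketch of what that FOR SHARE variant might look like (my adaptation, not part of the original function above):

CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT _tag_id int)
  LANGUAGE plpgsql AS
$func$
BEGIN
   LOOP
      SELECT tag_id INTO _tag_id
      FROM   tag
      WHERE  tag = _tag
      FOR    SHARE;  -- lock the existing row until the end of the transaction

      EXIT WHEN FOUND;

      INSERT INTO tag AS t (tag)
      VALUES (_tag)
      ON     CONFLICT (tag) DO NOTHING
      RETURNING t.tag_id
      INTO   _tag_id;

      EXIT WHEN FOUND;
   END LOOP;
END
$func$;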
Start with the common case (INSERT vs SELECT) to make it faster.
Related:
Get Id from a conditional INSERT
How to include excluded rows in RETURNING from INSERT ... ON CONFLICT
Related (pure SQL) solution to INSERT or SELECT multiple rows (a set) at once:
How to use RETURNING with ON CONFLICT in PostgreSQL?
What's wrong with this pure SQL solution?
CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT _tag_id int)
  LANGUAGE sql AS
$func$
WITH ins AS (
   INSERT INTO tag AS t (tag)
   VALUES (_tag)
   ON     CONFLICT (tag) DO NOTHING
   RETURNING t.tag_id
   )
SELECT tag_id FROM ins
UNION  ALL
SELECT tag_id FROM tag WHERE tag = _tag
LIMIT  1;
$func$;
Not entirely wrong, but it fails to seal a loophole, as @FunctorSalad worked out. The function can come up with an empty result if a concurrent transaction tries to do the same at the same time. The manual:
All the statements are executed with the same snapshot
If a concurrent transaction inserts the same new tag a moment earlier, but hasn't committed, yet:
The UPSERT part comes up empty, after waiting for the concurrent transaction to finish. (If the concurrent transaction should roll back, it still inserts the new tag and returns a new ID.)
The SELECT part also comes up empty, because it's based on the same snapshot, where the new tag from the (yet uncommitted) concurrent transaction is not visible.
We get nothing. Not as intended. That's counter-intuitive to naive logic (and I got caught there), but that's how the MVCC model of Postgres works - has to work.
So do not use this if multiple transactions can try to insert the same tag at the same time. Or loop until you actually get a row. The loop will hardly ever be triggered in common work loads anyway.
Postgres 9.4 or older
Given this (slightly simplified) table:
CREATE table tag (
tag_id serial PRIMARY KEY
, tag text UNIQUE
);
An almost 100% secure function to insert a new tag / select an existing one could look like this:
CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT tag_id int)
  LANGUAGE plpgsql AS
$func$
BEGIN
   LOOP
      BEGIN
         WITH sel AS (SELECT t.tag_id FROM tag t WHERE t.tag = _tag FOR SHARE)
         ,    ins AS (INSERT INTO tag(tag)
                      SELECT _tag
                      WHERE  NOT EXISTS (SELECT 1 FROM sel)  -- only if not found
                      RETURNING tag.tag_id)  -- qualified so no conflict with param
         SELECT sel.tag_id FROM sel
         UNION  ALL
         SELECT ins.tag_id FROM ins
         INTO   tag_id;

      EXCEPTION WHEN UNIQUE_VIOLATION THEN  -- insert in concurrent session?
         RAISE NOTICE 'It actually happened!';  -- hardly ever happens
      END;

      EXIT WHEN tag_id IS NOT NULL;  -- else keep looping
   END LOOP;
END
$func$;
db<>fiddle here
Old sqlfiddle
Why not 100%? Consider the notes in the manual for the related UPSERT example:
https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE
Explanation
Try the SELECT first. This way you avoid the considerably more expensive exception handling 99.99% of the time.
Use a CTE to minimize the (already tiny) time slot for the race condition.
The time window between the SELECT and the INSERT within one query is super tiny. If you don't have heavy concurrent load, or if you can live with an exception once a year, you could just ignore the case and use the SQL statement, which is faster.
No need for FETCH FIRST ROW ONLY (= LIMIT 1). The tag name is obviously UNIQUE.
Remove FOR SHARE in my example if you don't usually have concurrent DELETE or UPDATE on the table tag. Costs a tiny bit of performance.
Never quote the language name: 'plpgsql'. plpgsql is an identifier. Quoting may cause problems and is only tolerated for backwards compatibility.
Don't use non-descriptive column names like id or name. When joining a couple of tables (which is what you do in a relational DB) you end up with multiple identical names and have to use aliases.
Built into your function
Using this function you could largely simplify your FOREACH LOOP to:
...
FOREACH TagName IN ARRAY $3 LOOP
    INSERT INTO taggings (PostId, TagId)
    VALUES (InsertedPostId, f_tag_id(TagName));
END LOOP;
...
Faster, though, as a single SQL statement with unnest():
INSERT INTO taggings (PostId, TagId)
SELECT InsertedPostId, f_tag_id(tag)
FROM unnest($3) tag;
Replaces the whole loop.
Alternative solution
This variant builds on the behavior of UNION ALL with a LIMIT clause: as soon as enough rows are found, the rest is never executed:
Way to try multiple SELECTs till a result is available?
Building on this, we can outsource the INSERT into a separate function. Only there we need exception handling. Just as safe as the first solution.
CREATE OR REPLACE FUNCTION f_insert_tag(_tag text, OUT tag_id int)
  RETURNS int
  LANGUAGE plpgsql AS
$func$
BEGIN
   INSERT INTO tag(tag) VALUES (_tag) RETURNING tag.tag_id INTO tag_id;
EXCEPTION WHEN UNIQUE_VIOLATION THEN  -- catch exception, NULL is returned
END
$func$;
Which is used in the main function:
CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT _tag_id int)
  LANGUAGE plpgsql AS
$func$
BEGIN
   LOOP
      SELECT tag_id FROM tag WHERE tag = _tag
      UNION  ALL
      SELECT f_insert_tag(_tag)  -- only executed if tag not found
      LIMIT  1  -- not strictly necessary, just to be clear
      INTO   _tag_id;

      EXIT WHEN _tag_id IS NOT NULL;  -- else keep looping
   END LOOP;
END
$func$;
This is a bit cheaper if most of the calls only need SELECT, because the more expensive block with INSERT containing the EXCEPTION clause is rarely entered. The query is also simpler.
FOR SHARE is not possible here (not allowed in UNION query).
LIMIT 1 would not be necessary (tested in pg 9.4). Postgres derives LIMIT 1 from INTO _tag_id and only executes until the first row is found.
There's still something to watch out for, even when using the ON CONFLICT clause introduced in Postgres 9.5. Using the same function and example table as in @Erwin Brandstetter's answer, if we do:
Session 1: begin;
Session 2: begin;
Session 1: select f_tag_id('a');
f_tag_id
----------
11
(1 row)
Session 2: select f_tag_id('a');
[Session 2 blocks]
Session 1: commit;
[Session 2 returns:]
f_tag_id
----------
NULL
(1 row)
So f_tag_id returned NULL in session 2, which would be impossible in a single-threaded world!
If we raise the transaction isolation level to repeatable read (or the stronger serializable), session 2 throws ERROR: could not serialize access due to concurrent update instead. So no "impossible" results at least, but unfortunately we now need to be prepared to retry the transaction.
Edit: With repeatable read or serializable, if session 1 inserts tag a, then session 2 inserts b, then session 1 tries to insert b and session 2 tries to insert a, one session detects a deadlock:
ERROR: deadlock detected
DETAIL: Process 14377 waits for ShareLock on transaction 1795501; blocked by process 14363.
Process 14363 waits for ShareLock on transaction 1795503; blocked by process 14377.
HINT: See server log for query details.
CONTEXT: while inserting index tuple (0,3) in relation "tag"
SQL function "f_tag_id" statement 1
After the session that received the deadlock error rolls back, the other session continues. So I guess we should treat deadlock just like serialization_failure and retry, in a situation like this?
Alternatively, insert the tags in a consistent order, but this is not easy if they don't all get added in one place.
I think there is a slight chance that, when the tag already existed, it might be deleted by another transaction after your transaction found it. Using SELECT ... FOR UPDATE should solve that.

PLSQL Insert into with subquery and returning clause

I can't figure out the correct syntax for the following pseudo-sql:
INSERT INTO some_table
(column1,
column2)
SELECT col1_value,
col2_value
FROM other_table
WHERE ...
RETURNING id
INTO local_var;
I would like to insert something with the values of a subquery.
After inserting I need the new generated id.
Here's what the Oracle docs say:
Insert Statement
Returning Into
OK, I think it is not possible with only the VALUES clause...
Is there an alternative?
You cannot use the RETURNING BULK COLLECT from an INSERT.
This methodology can work with updates and deletes, however:
create table test2 (aa number)
/
insert into test2 (aa)
select level
from dual
connect by level < 100
/
set serveroutput on
declare
    TYPE t_Numbers IS TABLE OF test2.aa%TYPE
        INDEX BY BINARY_INTEGER;
    v_Numbers t_Numbers;
    v_count   number;
begin
    update test2
    set aa = aa + 1
    returning aa bulk collect into v_Numbers;

    for v_count in 1..v_Numbers.count loop
        dbms_output.put_line('v_Numbers := ' || v_Numbers(v_count));
    end loop;
end;
/
You can get it to work with a few extra steps (doing a FORALL INSERT utilizing TREAT)
as described in this article:
returning with insert..select
To utilize the example they create, apply it to the test2 test table:
CREATE or replace TYPE ot AS OBJECT
( aa number);
/
CREATE TYPE ntt AS TABLE OF ot;
/
set serveroutput on
DECLARE
    nt_passed_in ntt;
    nt_to_return ntt;

    FUNCTION pretend_parameter RETURN ntt IS
        nt ntt;
    BEGIN
        SELECT ot(level) BULK COLLECT INTO nt
        FROM dual
        CONNECT BY level <= 5;
        RETURN nt;
    END pretend_parameter;
BEGIN
    nt_passed_in := pretend_parameter();

    FORALL i IN 1 .. nt_passed_in.COUNT
        INSERT INTO test2(aa)
        VALUES (TREAT(nt_passed_in(i) AS ot).aa)
        RETURNING ot(aa) BULK COLLECT INTO nt_to_return;

    FOR i IN 1 .. nt_to_return.COUNT LOOP
        DBMS_OUTPUT.PUT_LINE(
            'Sequence value = [' || TO_CHAR(nt_to_return(i).aa) || ']'
        );
    END LOOP;
END;
/
Unfortunately that's not possible. RETURNING is only available for INSERT...VALUES statements. See this Oracle forum thread for a discussion of this subject.
You can't, BUT at least in Oracle 19c, you can specify a SELECT subquery inside the VALUES clause and so use RETURNING! This can be a good workaround, even though you may have to repeat the WHERE clause for every column:
INSERT INTO some_table
(column1,
column2)
VALUES((SELECT col1_value FROM other_table WHERE ...),
(SELECT col2_value FROM other_table WHERE ...))
RETURNING id
INTO local_var;
Because the insert is based on a select, Oracle assumes you are permitting a multiple-row insert with that syntax. In that case, look at the multiple-row version of the RETURNING clause documentation, as it demonstrates that you need to use BULK COLLECT to retrieve the values from all inserted rows into a collection of results.
After all, if your insert query creates two rows, which returned value would it put into a single variable?
EDIT - Turns out this doesn't work as I had thought.... darn it!
This isn't as easy as you may think, and certainly not as easy as it is in MySQL. Oracle doesn't keep track of the last inserts in a way that lets you ping back the result.
You will need to work out some other way of doing this, you can do it using ROWID - but this has its pitfalls.
This link discussed the issue: http://forums.oracle.com/forums/thread.jspa?threadID=352627

Nested transactions in postgresql 8.2?

I'm working on scripts that apply database schema updates. I've set up all my SQL update scripts using start transaction/commit. I pass these scripts to psql on the command line.
I now need to apply multiple scripts at the same time, and in one transaction. So far the only solution I've come up with is to remove the start transaction/commit from the original set of scripts, then jam them together inside a new start transaction/commit block. I'm writing perl scripts to do this on the fly.
Effectively I want nested transactions, which I can't figure out how to do in PostgreSQL.
Is there any way to do or simulate nested transactions for this purpose? I have things setup to automatically bail out on any error, so I don't need to continue in the top level transaction if any of the lower ones fail.
Well, you can simulate nested transactions in PostgreSQL using savepoints; a PL/pgSQL exception block sets one implicitly.
Take this code example:
CREATE TABLE t1 (a integer PRIMARY KEY);

CREATE FUNCTION test_exception() RETURNS boolean LANGUAGE plpgsql AS
$$BEGIN
    INSERT INTO t1 (a) VALUES (1);
    INSERT INTO t1 (a) VALUES (2);
    INSERT INTO t1 (a) VALUES (1);
    INSERT INTO t1 (a) VALUES (3);
    RETURN TRUE;
EXCEPTION
    WHEN integrity_constraint_violation THEN
        RAISE NOTICE 'Rollback to savepoint';
        RETURN FALSE;
END;$$;
BEGIN;
SELECT test_exception();
NOTICE: Rollback to savepoint
test_exception
----------------
f
(1 row)
COMMIT;
SELECT count(*) FROM t1;
count
-------
0
(1 row)
Maybe this will help you out a little bit.
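For the script-splicing use case, explicit savepoints may be closer to what you want; a minimal sketch (reusing the t1 table from above):

BEGIN;
INSERT INTO t1 (a) VALUES (1);

SAVEPOINT script_two;
INSERT INTO t1 (a) VALUES (1);     -- fails: duplicate key

ROLLBACK TO SAVEPOINT script_two;  -- undo only the failed part
INSERT INTO t1 (a) VALUES (2);

COMMIT;                            -- t1 now contains 1 and 2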
I've ended up 'solving' my problem out of band: I use a Perl script to rework the input scripts, eliminating their start transaction/commit calls, then push them all into one file, which gets its own start transaction/commit.