Is SELECT "faster" than function with nested INSERT?

I'm using a function that inserts a row to a table if it doesn't exist, then returns the id of the row.
Whenever I put the function inside a SELECT statement, with values that don't exist in the table yet, e.g.:
SELECT * FROM table WHERE id = function(123);
... it returns an empty row. However, running it again with the same values will return the row with the values I want to see.
Why does this happen? Is the INSERT running behind the SELECT speed? Or does PostgreSQL cache the table when it didn't exist, and at next run, it displays the result?
Here's a ready to use example of how this issue can occur:
CREATE TABLE IF NOT EXISTS test_table(
id INTEGER,
tvalue boolean
);
CREATE OR REPLACE FUNCTION test_function(user_id INTEGER)
RETURNS integer
LANGUAGE 'plpgsql'
AS $$
DECLARE
__user_id INTEGER;
BEGIN
EXECUTE format('SELECT * FROM test_table WHERE id = $1')
USING user_id
INTO __user_id;
IF __user_id IS NOT NULL THEN
RETURN __user_id;
ELSE
INSERT INTO test_table(id, tvalue)
VALUES (user_id, TRUE)
RETURNING id
INTO __user_id;
RETURN __user_id;
END IF;
END;
$$;
Call:
SELECT * FROM test_table WHERE id = test_function(4);
To reproduce the issue, pass any integer that doesn't exist in the table, yet.

The example is broken in multiple places.
No need for dynamic SQL with EXECUTE.
SELECT * in the function is wrong.
Your table definition should have a UNIQUE or PRIMARY KEY constraint on (id).
Most importantly, the final SELECT statement is bound to fail. Since the function is VOLATILE (it has to be), it is evaluated once for every existing row in the table. Even if that worked, it would be a performance nightmare. But it does not work. As @user2864740 commented, there is also a problem with visibility: Postgres checks every existing row against the result of the function, which in turn adds one or more rows, and those rows are not yet in the snapshot the SELECT is operating on.
SELECT * FROM test_table WHERE id = test_function(4);
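To see the per-row evaluation for yourself, here is a minimal sketch. The table demo_rows and the function noisy_id() are made-up names for illustration, not part of the question. The function only raises a NOTICE and returns its argument, so every call becomes visible:

```sql
-- hypothetical demo table with three rows
CREATE TABLE demo_rows (id int PRIMARY KEY);
INSERT INTO demo_rows SELECT generate_series(1, 3);

-- hypothetical VOLATILE function that announces every call
CREATE FUNCTION noisy_id(_id int)
  RETURNS int
  LANGUAGE plpgsql VOLATILE AS
$func$
BEGIN
   RAISE NOTICE 'noisy_id(%) called', _id;
   RETURN _id;
END
$func$;

-- the function sits in the WHERE clause, like test_function(4) above
SELECT * FROM demo_rows WHERE id = noisy_id(2);
```

With three rows in demo_rows, the notice should appear once per row rather than once per query. On an empty table there is no row to check, so I would expect the function not to be called at all.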
This would work (but see below!):
CREATE TABLE test_table (
id int PRIMARY KEY --!
, tvalue bool
);
CREATE OR REPLACE FUNCTION test_function(_user_id int)
RETURNS test_table LANGUAGE sql AS
$func$
WITH ins AS (
INSERT INTO test_table(id, tvalue)
VALUES (_user_id, TRUE)
ON CONFLICT DO NOTHING
RETURNING *
)
TABLE ins
UNION ALL
SELECT * FROM test_table WHERE id = _user_id
LIMIT 1
$func$;
And replace your SELECT with just:
SELECT * FROM test_function(1);
Related:
Return a value if no record is found
How to use RETURNING with ON CONFLICT in PostgreSQL?
There is still a race condition for concurrent calls. If that can happen, consider:
Is SELECT or INSERT in a function prone to race conditions?

Related

Postgresql function (upsert and delete): how to pass a set of rows of table type to function call

I have a table
CREATE TABLE items(
id SERIAL PRIMARY KEY,
group_id INT NOT NULL,
item_id INT NOT NULL,
name TEXT,
.....
.....
);
I am creating a function that
takes a set of row values for a single group_id, and fails if multiple group_ids are present in the input rows
compares it with matching values in the table (only for that group_id)
updates changed values (only for the input group_id)
inserts new values
deletes table rows that are absent in the row input (comparing rows by group_id and item_id, only for the input group_id)
this is my function definition
CREATE OR REPLACE FUNCTION update_items(rows_input items[]) RETURNS boolean as $$
DECLARE
rows items[];
group_id_input integer;
BEGIN
-- get single group_id from input rows, fail if multiple group_id's present in input
-- read items of that group_id in table
-- compare input rows and table rows (of the same group_id)
-- create transaction
-- delete absent rows
-- upsert
-- return success of transaction (boolean)
END;
$$ LANGUAGE plpgsql;
I am trying to call the function in a query
select update_items(
(38,1,1283,"Name1"),
(39,1,1471,"Name2"),
(40,1,1333,"Name3")
);
I get the following error
Failed to run sql query: column "Name1" does not exist
I tried removing the id column values: that gives me the same error
What is the correct way to pass row values to a function that accepts table type array as arguments?
updates changed values
inserts new values deletes table rows that are
absent in the row input (compare rows with group_id and item_id)
To upsert, you need a unique constraint for ON CONFLICT to act on.
Here there are two: the primary key (id) and the unique constraint on (group_id, item_id).
The INSERT ... ON CONFLICT statements have to take both constraints into account.
Since you want to pass an items[] array to the function, any row whose id is not in the input array will be deleted.
drop table if exists items cascade;
begin;
CREATE TABLE items(
id bigint GENERATED BY DEFAULT as identity PRIMARY KEY,
group_id INT NOT NULL,
item_id INT NOT NULL,
name TEXT
,unique(group_id,item_id)
);
insert into items values
(38,1,1283,'original_38'),
(39,1,1471,'original_39'),
(40,1,1333,'original_40'),
(42,1,1332,'original_42');
end;
main function:
CREATE OR REPLACE FUNCTION update_items (in_items items[])
RETURNS boolean
AS $FUNC$
DECLARE
iter items;
saved_ids bigint[];
BEGIN
saved_ids := (SELECT ARRAY (SELECT (unnest(in_items)).id));
DELETE FROM items
WHERE NOT (id = ANY (saved_ids));
FOREACH iter IN ARRAY in_items LOOP
INSERT INTO items
SELECT
iter.*
ON CONFLICT (id)
DO NOTHING;
INSERT INTO items
SELECT
iter.*
ON CONFLICT (group_id,
item_id)
DO UPDATE SET
name = EXCLUDED.name;
RAISE NOTICE 'rec.groupid: %, rec.items_id:%', iter.group_id, iter.item_id;
END LOOP;
RETURN TRUE;
END
$FUNC$
LANGUAGE plpgsql;
call it:
SELECT
*
FROM
update_items ('{"(38,1,1283,Name1)","(39,1,1471,Name2)","(40,1,1333,Name3)"}'::items[]);
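The error in the question comes from the double quotes: "Name1" is parsed as an identifier (a column name), while string literals take single quotes. As an alternative to an array literal, here is a sketch using an ARRAY constructor with ROW values. It assumes items has exactly the four columns (id, group_id, item_id, name) used in this answer:

```sql
-- each ROW(...) is cast to the table's row type through the array cast
SELECT update_items(ARRAY[
          ROW(38, 1, 1283, 'Name1')
        , ROW(39, 1, 1471, 'Name2')
        , ROW(40, 1, 1333, 'Name3')
       ]::items[]);
```

The constructor form avoids the quoting rules of composite literals, where whitespace inside the parentheses becomes part of the field value.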
references:
Iterating over integer[] in PL/pgSQL
How to match elements in an array of composite type?
IN vs ANY operator in PostgreSQL
Here's how I achieved UPSERT with DELETE missing rows, if anyone is looking to do the same.
CREATE OR REPLACE FUNCTION update_items(in_rows items[]) RETURNS INT AS $$
DECLARE
in_groups INTEGER[];
in_group_id INTEGER;
in_item_ids INTEGER[];
BEGIN
-- get single group id from input rows, fail if multiple group ids present in input
in_groups = (SELECT ARRAY (SELECT distinct(group_id) FROM UNNEST(in_rows)));
IF ARRAY_LENGTH(in_groups,1)>1 THEN
RAISE EXCEPTION 'Multiple group_ids found in input items: %', in_groups;
END IF;
in_group_id = in_groups[1];
-- delete items of this group that are absent in in_rows
in_item_ids := (SELECT ARRAY (SELECT (UNNEST(in_rows)).item_id));
DELETE FROM items
WHERE
group_id = in_group_id
AND item_id <> ALL (in_item_ids); -- "<> ANY" would be true for almost every row
-- upsert in_rows
INSERT INTO items
SELECT * FROM UNNEST(in_rows)
ON CONFLICT (group_id,item_id)
DO UPDATE SET
parent_group_id = EXCLUDED.parent_group_id,
mat_centre_id = EXCLUDED.mat_centre_id,
NAME = EXCLUDED.NAME,
opening_date = EXCLUDED.opening_date;
-- return the group id that was processed
RETURN in_group_id;
END;
$$ LANGUAGE plpgsql;
This function removes rows that are missing from your in_rows

Dynamic query that uses CTE gets "syntax error at end of input"

I have a table that looks like this:
CREATE TABLE label (
hid UUID PRIMARY KEY DEFAULT UUID_GENERATE_V4(),
name TEXT NOT NULL UNIQUE
);
I want to create a function that takes a list of names and inserts multiple rows into the table, ignoring duplicate names, and returns an array of the IDs generated for the rows it inserted.
This works:
CREATE OR REPLACE FUNCTION insert_label(nms TEXT[])
RETURNS UUID[]
AS $$
DECLARE
ids UUID[];
BEGIN
CREATE TEMP TABLE tmp_names(name TEXT);
INSERT INTO tmp_names SELECT UNNEST(nms);
WITH new_names AS (
INSERT INTO label(name)
SELECT tn.name
FROM tmp_names tn
WHERE NOT EXISTS(SELECT 1 FROM label h WHERE h.name = tn.name)
RETURNING hid
)
SELECT ARRAY_AGG(hid) INTO ids
FROM new_names;
DROP TABLE tmp_names;
RETURN ids;
END;
$$ LANGUAGE PLPGSQL;
I have many tables with the exact same columns as the label table, so I would like to have a function that can insert into any of them. I'd like to create a dynamic query to do that. I tried that, but this does not work:
CREATE OR REPLACE FUNCTION insert_label(h_tbl REGCLASS, nms TEXT[])
RETURNS UUID[]
AS $$
DECLARE
ids UUID[];
query_str TEXT;
BEGIN
CREATE TEMP TABLE tmp_names(name TEXT);
INSERT INTO tmp_names SELECT UNNEST(nms);
query_str := FORMAT('WITH new_names AS ( INSERT INTO %1$I(name) SELECT tn.name FROM tmp_names tn WHERE NOT EXISTS(SELECT 1 FROM %1$I h WHERE h.name = tn.name) RETURNING hid)', h_tbl);
EXECUTE query_str;
SELECT ARRAY_AGG(hid) INTO ids FROM new_names;
DROP TABLE tmp_names;
RETURN ids;
END;
$$ LANGUAGE PLPGSQL;
This is the output I get when I run that function:
psql=# select insert_label('label', array['how', 'now', 'brown', 'cow']);
ERROR: syntax error at end of input
LINE 1: ...SELECT 1 FROM label h WHERE h.name = tn.name) RETURNING hid)
^
QUERY: WITH new_names AS ( INSERT INTO label(name) SELECT tn.name FROM tmp_names tn WHERE NOT EXISTS(SELECT 1 FROM label h WHERE h.name = tn.name) RETURNING hid)
CONTEXT: PL/pgSQL function insert_label(regclass,text[]) line 19 at EXECUTE
The query generated by the dynamic SQL looks like it should be exactly the same as the query from static SQL.
I got the function to work by changing the return value from an array of UUIDs to a table of UUIDs and not using CTE:
CREATE OR REPLACE FUNCTION insert_label(h_tbl REGCLASS, nms TEXT[])
RETURNS TABLE (hid UUID)
AS $$
DECLARE
query_str TEXT;
BEGIN
CREATE TEMP TABLE tmp_names(name TEXT);
INSERT INTO tmp_names SELECT UNNEST(nms);
query_str := FORMAT('INSERT INTO %1$I(name) SELECT tn.name FROM tmp_names tn WHERE NOT EXISTS(SELECT 1 FROM %1$I h WHERE h.name = tn.name) RETURNING hid', h_tbl);
RETURN QUERY EXECUTE query_str;
DROP TABLE tmp_names;
RETURN;
END;
$$ LANGUAGE PLPGSQL;
I don't know if one way is better than the other, returning an array of UUIDs or a table of UUIDs, but at least I got it to work one of those ways. Plus, possibly not using a CTE is more efficient, so it may be better to stick with the version that returns a table of UUIDs.
What I would like to know is why the dynamic query did not work when using a CTE. The query it produced looked like it should have worked.
If anyone can let me know what I did wrong, I would appreciate it.
... why the dynamic query did not work when using a CTE. The query it produced looked like it should have worked.
No, it was only the CTE without (required) outer query. (You had SELECT ARRAY_AGG(hid) INTO ids FROM new_names in the static version.)
There are more problems, but just use this query instead:
INSERT INTO label(name)
SELECT unnest(nms)
ON CONFLICT DO NOTHING
RETURNING hid;
label.name is defined UNIQUE NOT NULL, so this simple UPSERT can replace your function insert_label() completely.
It's much simpler and faster. It also defends against possible duplicates from within your input array that you didn't cover, yet. And it's safe under concurrent write load - as opposed to your original, which might run into race conditions. Related:
How to use RETURNING with ON CONFLICT in PostgreSQL?
I would just use the simple query and replace the table name.
But if you still want a dynamic function:
CREATE OR REPLACE FUNCTION insert_label(_tbl regclass, _nms text[])
RETURNS TABLE (hid uuid)
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE format(
$$
INSERT INTO %s(name)
SELECT unnest($1)
ON CONFLICT DO NOTHING
RETURNING hid
$$, _tbl)
USING _nms;
END
$func$;
If you don't need an array as result, stick with the set (RETURNS TABLE ...). Simpler.
Pass values (_nms) to EXECUTE in a USING clause.
The tablename (_tbl) is type regclass, so the format specifier %I for format() would be wrong. Use %s instead. See:
Table name as a PostgreSQL function parameter
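If an array is still wanted as the result, the CTE can stay in the dynamic statement as long as the outer query is part of the same string. A minimal sketch, assuming the target table has the same columns as label; the function name insert_label_arr is made up here to avoid clashing with the versions above:

```sql
CREATE OR REPLACE FUNCTION insert_label_arr(_tbl regclass, _nms text[])
  RETURNS uuid[]
  LANGUAGE plpgsql AS
$func$
DECLARE
   _ids uuid[];
BEGIN
   -- CTE and outer SELECT form one complete statement
   EXECUTE format(
      $q$
      WITH new_names AS (
         INSERT INTO %s(name)
         SELECT unnest($1)
         ON CONFLICT DO NOTHING
         RETURNING hid
         )
      SELECT array_agg(hid) FROM new_names
      $q$, _tbl)
   INTO  _ids
   USING _nms;

   RETURN _ids;  -- NULL if no row was inserted
END
$func$;
```

A call would look like SELECT insert_label_arr('label', '{how,now,brown,cow}'). Note that array_agg() over zero rows yields NULL, not an empty array.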

PostgreSQL: Increment otherwise insert

I have rows which are updated with an increment very often, but inserted very rarely. Is it possible to switch the order of the new INSERT ... ON CONFLICT statement to optimize for updates instead of inserts?
Right now I'm doing this:
INSERT INTO ?? (??) VALUES (?) ON CONFLICT(??) DO UPDATE SET ?? = ?? + 1 RETURNING ??
While this works, it also increases the sequence for the primary key each time even if the insert fails.
Is it possible to rewrite the query in a way that the first operation would be an update, and only if no update executed an insert would be performed?
I know of no built-in command for that, but you can write a function that does it:
CREATE OR REPLACE FUNCTION update_or_insert(in_parameter1 INTEGER, ...) RETURNS SETOF my_table AS $$
DECLARE
result my_table%ROWTYPE;
BEGIN
-- try the common case first: update an existing row
WITH updated_rows AS (
UPDATE my_table SET ... WHERE ... RETURNING *
)
SELECT * INTO result FROM updated_rows;
IF FOUND THEN
RETURN NEXT result;
ELSE
-- no row was updated, so insert a new one
WITH inserted_rows AS (
INSERT INTO my_table (...) VALUES (...) RETURNING *
)
SELECT * INTO result FROM inserted_rows;
RETURN NEXT result;
END IF;
RETURN;
END;
$$ LANGUAGE plpgsql;
You can call this function as follows:
SELECT * FROM update_or_insert(123, ...);
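For a concrete picture, here is a minimal sketch of the update-first pattern. The table counter, its columns, and the function increment_or_insert() are all assumptions made up for illustration, since the question does not show its schema:

```sql
-- hypothetical table: a serial primary key plus a unique name
CREATE TABLE counter (
   counter_id serial PRIMARY KEY
 , name       text UNIQUE NOT NULL
 , hits       int NOT NULL DEFAULT 1
);

CREATE OR REPLACE FUNCTION increment_or_insert(_name text)
  RETURNS int
  LANGUAGE plpgsql AS
$func$
DECLARE
   _hits int;
BEGIN
   -- common case: the row exists, no sequence value is consumed
   UPDATE counter
   SET    hits = hits + 1
   WHERE  name = _name
   RETURNING hits INTO _hits;

   IF FOUND THEN
      RETURN _hits;
   END IF;

   -- rare case: insert; ON CONFLICT covers a concurrent insert of the same name
   INSERT INTO counter (name)
   VALUES (_name)
   ON CONFLICT (name) DO UPDATE
   SET    hits = counter.hits + 1
   RETURNING counter.hits INTO _hits;

   RETURN _hits;
END
$func$;
```

Calling SELECT increment_or_insert('foo') twice should return 1 and then 2. The sequence is only touched when the UPDATE finds no row, which is the point of switching the order.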

Selecting and passing a record as a function argument

It may look like a duplicate of existing questions (e.g. This one) but they only deal with passing "new" arguments, not selecting rows from the database.
I have a table, for example:
CREATE TABLE my_table (
id bigserial NOT NULL,
name text,
CONSTRAINT my_table_pkey PRIMARY KEY (id)
);
And a function:
CREATE FUNCTION do_something(row_in my_table) RETURNS void AS
$$
BEGIN
-- does something
END;
$$
LANGUAGE plpgsql;
I would like to run it on data already existing in the database. It's no problem if I would like to use it from another PL/pgSQL stored procedure, for example:
-- ...
SELECT * INTO row_var FROM my_table WHERE id = 123; -- row_var is of type my_table%rowtype
PERFORM do_something(row_var);
-- ...
However, I have no idea how to do it using an "ordinary" query, e.g.
SELECT do_something(SELECT * FROM my_table WHERE id = 123);
ERROR: syntax error at or near "SELECT"
LINE 1: SELECT FROM do_something(SELECT * FROM my_table ...
Is there a way to execute such query?
You need to pass a scalar record to that function. This requires enclosing the actual select in another pair of parentheses:
SELECT do_something( (SELECT * FROM my_table WHERE id = 123) );
However the above will NOT work, because the function only expects a single column (a record of type my_table) whereas select * returns multiple columns (which is something different than a single record with multiple fields).
In order to return a record from the select you need to use the following syntax:
SELECT do_something( (SELECT my_table FROM my_table WHERE id = 123) );
Note that this might still fail if you don't make sure the select returns exactly one row.
If you want to apply the function to more than one row, you can do that like this:
select do_something(my_table)
from my_table;
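A variant for the single-row case, sketched under the assumption that my_table and do_something() are defined as in the question: refer to the whole row through a table alias and filter in the WHERE clause instead of using a scalar subquery.

```sql
-- "t" stands for the whole row of my_table
SELECT do_something(t)
FROM   my_table t
WHERE  t.id = 123;
```

If no row matches, the function is simply not called, and several matching rows do not raise an error.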

How to return a record from function, executed by INSERT/UPDATE rule (trigger)?

I have the following schema in my database:
create sequence data_sequence;
create table data_table
(
id integer primary key,
field varchar(100)
);
create view data_view as
select id, field from data_table;
create function data_insert(_new data_view) returns data_view as
$$declare
_id integer;
_result data_view%rowtype;
begin
_id := nextval('data_sequence');
insert into data_table(id, field) values(_id, _new.field);
select * into _result from data_view where id = _id;
return _result;
end;
$$
language plpgsql;
create rule insert as on insert to data_view do instead
select data_insert(new);
Then type in psql:
insert into data_view(field) values('abc');
Would like to see something like:
id | field
----+---------
1 | abc
Instead see:
data_insert
-------------
(1, "abc")
Is it possible to fix this somehow?
Thanks for any ideas.
Ultimate idea is to use this in other functions, so that I could obtain id of just inserted record without selecting for it from scratch. Something like:
insert into data_view(field) values('abc') returning id into my_variable
would be nice but doesn't work with error:
ERROR: cannot perform INSERT RETURNING on relation "data_view"
HINT: You need an unconditional ON INSERT DO INSTEAD rule with a RETURNING clause.
I don't really understand that HINT. I use PostgreSQL 8.4.
What you want to do is already built into postgres. It allows you to include a RETURNING clause on INSERT statements.
CREATE TABLE data_table (
id SERIAL,
field VARCHAR(100),
CONSTRAINT data_table_pkey PRIMARY KEY (id)
);
INSERT INTO data_table (field) VALUES ('testing') RETURNING id, field;
If you feel you must use a view, check this thread on the postgres mailing list before going any further.
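As for the HINT in the error message: it asks for an INSTEAD rule that performs the insert itself and carries its own RETURNING clause. A minimal sketch, assuming the sequence, table and view from the question; the rule name data_view_insert is made up here:

```sql
-- remove the rule from the question, so only one INSTEAD rule fires
DROP RULE IF EXISTS "insert" ON data_view;

-- the RETURNING list has to match the columns of the view
CREATE RULE data_view_insert AS
ON INSERT TO data_view DO INSTEAD
INSERT INTO data_table (id, field)
VALUES (nextval('data_sequence'), NEW.field)
RETURNING data_table.id, data_table.field;

-- RETURNING on the view is then allowed
INSERT INTO data_view (field) VALUES ('abc') RETURNING id;
```

Inside a PL/pgSQL function, the last statement could end with RETURNING id INTO my_variable, which is what the question was after.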