How to use upsert with Postgres

I want to convert this Postgres code into something shorter that does the same thing. I read about upsert, but I couldn't work out a good way to apply it to my code.
What I wrote works fine, but I want to find a more elegant way to write it.
I hope someone here can help. This is the query:
CREATE OR REPLACE FUNCTION insert_table(
in_guid character varying,
in_x_value character varying,
in_y_value character varying
)
RETURNS TABLE(response boolean) LANGUAGE 'plpgsql'
DECLARE _id integer;
BEGIN
-- guid exists and it's been 10 minutes from created_date:
IF ((SELECT COUNT (*) FROM public.tbl_client_location WHERE guid = in_guid AND created_date < NOW() - INTERVAL '10 MINUTE') > 0) THEN
RETURN QUERY (SELECT FALSE);
-- guid exists but 10 minutes haven't passed yet:
ELSEIF ((SELECT COUNT (*) FROM public.tbl_client_location WHERE guid = in_guid) > 0) THEN
UPDATE
public.tbl_client_location
SET
x_value = in_x_value,
y_value = in_y_value,
updated_date = now()
WHERE
guid = in_guid;
RETURN QUERY (SELECT TRUE);
-- guid does not exist:
ELSE
INSERT INTO public.tbl_client_location
( guid , x_value , y_value )
VALUES
( in_guid, in_x_value, in_y_value )
RETURNING id INTO _id;
RETURN QUERY (SELECT TRUE);
END IF;
END

This can indeed be a lot simpler:
CREATE OR REPLACE FUNCTION insert_table(in_guid text
, in_x_value text
, in_y_value text
, OUT response bool) -- ④
-- RETURNS record -- optional noise -- ④
LANGUAGE plpgsql AS -- ①
$func$ -- ②
-- DECLARE
-- _id integer; -- what for?
BEGIN
INSERT INTO tbl AS t
( guid, x_value, y_value)
VALUES (in_guid, in_x_value, in_y_value)
ON CONFLICT (guid) DO UPDATE -- guid exists
SET ( x_value, y_value, updated_date)
= (EXCLUDED.x_value, EXCLUDED.y_value, now()) -- ⑤
WHERE t.created_date >= now() - interval '10 minutes' -- ③ have not passed yet
-- RETURNING id INTO _id -- what for?
;
response := FOUND; -- ⑥
END
$func$;
Assuming guid is defined UNIQUE or PRIMARY KEY, and created_date is defined NOT NULL DEFAULT now().
① Language name is an identifier - better without quotes.
② Quotes around function body were missing (invalid command). See:
What are '$$' used for in PL/pgSQL
③ UPDATE only if 10 min have not passed yet. Keep in mind that timestamps are those from the beginning of the respective transactions. So keep transactions short and simple. See:
Difference between now() and current_timestamp
④ A function with OUT parameter(s) and no RETURNS clause returns a single row (record) automatically. Your original was declared as set-returning function (0-n returned rows), which didn't make sense. See:
Return multiple fields as a record in PostgreSQL with PL/pgSQL
⑤ It's generally better to use the special EXCLUDED row than to spell out values again. See:
How could this UPSERT query be made shorter?
⑤ Also using short syntax for updating multiple columns. See:
Update multiple columns that start with a specific string
⑥ To see whether a row was written, use the special variable FOUND. Subtle difference: unlike your original, you get true or false after the fact, reporting whether a row has actually been written. In your original, the INSERT or UPDATE might still be skipped (without raising an exception) by a trigger or rule, and the function result would be misleading in that case. See:
IS NOT NULL test for a record does not return TRUE when variable is set
Further reading:
Postgres ON CONFLICT ON CONSTRAINT triggering errors in the error log
How to use RETURNING with ON CONFLICT in PostgreSQL?
You might just run the single SQL statement instead, providing your values once:
INSERT INTO tbl AS t(guid, x_value,y_value)
VALUES ($in_guid, $in_x_value, $in_y_value) -- your values here, once
ON CONFLICT (guid) DO UPDATE
SET (x_value,y_value, updated_date)
= (EXCLUDED.x_value, EXCLUDED.y_value, now())
WHERE t.created_date >= now() - interval '10 minutes';
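To try the function, a table along the following lines is needed. This is only a sketch under the assumptions stated above (guid defined UNIQUE, created_date defined NOT NULL DEFAULT now()); the id column, the column types and the sample values are placeholders, not taken from the question:

```sql
-- Hypothetical table matching the stated assumptions
CREATE TABLE tbl (
  id           serial PRIMARY KEY
, guid         text NOT NULL UNIQUE                 -- arbiter for ON CONFLICT (guid)
, x_value      text
, y_value      text
, created_date timestamptz NOT NULL DEFAULT now()
, updated_date timestamptz
);

SELECT insert_table('abc', '1', '2');  -- no row yet: INSERT, response is true
SELECT insert_table('abc', '3', '4');  -- row younger than 10 minutes: UPDATE, response is true

-- Simulate an old row to exercise the WHERE clause of DO UPDATE
UPDATE tbl SET created_date = now() - interval '11 minutes' WHERE guid = 'abc';

SELECT insert_table('abc', '5', '6');  -- row too old: nothing is written, response is false
```

The last call shows the effect of FOUND: the conflicting row exists, but the WHERE clause filters it out, so no row is affected and the function reports false.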

I finally solved it. I wrote another function that checks whether the guid already exists and how old the row is; when it returns true, I can run the upsert without any problems.
This is what I ended up with:
CREATE OR REPLACE FUNCTION fnc_check_table(
in_guid character varying)
RETURNS TABLE(response boolean)
LANGUAGE 'plpgsql'
COST 100
VOLATILE
ROWS 1000
AS $BODY$
BEGIN
IF EXISTS (SELECT FROM tbl WHERE guid = in_guid AND created_date < NOW() - INTERVAL '10 MINUTE' ) THEN
RETURN QUERY (SELECT FALSE);
ELSEIF EXISTS (SELECT FROM tbl WHERE guid = in_guid AND created_date > NOW() - INTERVAL '10 MINUTE') THEN
RETURN QUERY (SELECT TRUE);
ELSE
RETURN QUERY (SELECT TRUE);
END IF;
END
$BODY$;
CREATE OR REPLACE FUNCTION fnc_insert_table(
in_guid character varying,
in_x_value character varying,
in_y_value character varying)
RETURNS TABLE(response boolean)
LANGUAGE 'plpgsql'
COST 100
VOLATILE
ROWS 1000
AS $BODY$
BEGIN
IF (fnc_check_table(in_guid)) THEN
INSERT INTO tbl (guid, x_value, y_value)
VALUES (in_guid,in_x_value,in_y_value)
ON CONFLICT (guid)
DO UPDATE SET x_value=in_x_value, y_value=in_y_value, updated_date=now();
RETURN QUERY (SELECT TRUE);
ELSE
RETURN QUERY (SELECT FALSE);
END IF;
END
$BODY$;

Related

Is SELECT "faster" than function with nested INSERT?

I'm using a function that inserts a row to a table if it doesn't exist, then returns the id of the row.
Whenever I put the function inside a SELECT statement, with values that don't exist in the table yet, e.g.:
SELECT * FROM table WHERE id = function(123);
... it returns an empty row. However, running it again with the same values will return the row with the values I want to see.
Why does this happen? Is the INSERT running behind the SELECT speed? Or does PostgreSQL cache the table when it didn't exist, and at next run, it displays the result?
Here's a ready to use example of how this issue can occur:
CREATE TABLE IF NOT EXISTS test_table(
id INTEGER,
tvalue boolean
);
CREATE OR REPLACE FUNCTION test_function(user_id INTEGER)
RETURNS integer
LANGUAGE 'plpgsql'
AS $$
DECLARE
__user_id INTEGER;
BEGIN
EXECUTE format('SELECT * FROM test_table WHERE id = $1')
USING user_id
INTO __user_id;
IF __user_id IS NOT NULL THEN
RETURN __user_id;
ELSE
INSERT INTO test_table(id, tvalue)
VALUES (user_id, TRUE)
RETURNING id
INTO __user_id;
RETURN __user_id;
END IF;
END;
$$;
Call:
SELECT * FROM test_table WHERE id = test_function(4);
To reproduce the issue, pass any integer that doesn't exist in the table, yet.

The example is broken in multiple places.
No need for dynamic SQL with EXECUTE.
SELECT * in the function is wrong.
Your table definition should have a UNIQUE or PRIMARY KEY constraint on (id).
Most importantly, the final SELECT statement is bound to fail. Since the function is VOLATILE (has to be), it is evaluated once for every existing row in the table. Even if that worked, it would be a performance nightmare. But it does not. Like @user2864740 commented, there is also a problem with visibility. Postgres checks every existing row against the result of the function, which in turn adds 1 or more rows, and those rows are not yet in the snapshot the SELECT is operating on.
SELECT * FROM test_table WHERE id = test_function(4);
This would work (but see below!):
CREATE TABLE test_table (
id int PRIMARY KEY --!
, tvalue bool
);
CREATE OR REPLACE FUNCTION test_function(_user_id int)
RETURNS test_table LANGUAGE sql AS
$func$
WITH ins AS (
INSERT INTO test_table(id, tvalue)
VALUES (_user_id, TRUE)
ON CONFLICT DO NOTHING
RETURNING *
)
TABLE ins
UNION ALL
SELECT * FROM test_table WHERE id = _user_id
LIMIT 1
$func$;
And replace your SELECT with just:
SELECT * FROM test_function(1);
Related:
Return a value if no record is found
How to use RETURNING with ON CONFLICT in PostgreSQL?
There is still a race condition for concurrent calls. If that can happen, consider:
Is SELECT or INSERT in a function prone to race conditions?
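As a quick check of the rewritten test_function() above, the following calls can be run against the test_table defined with the PRIMARY KEY. The value 4 is just the sample id from the question; this is a sketch for a single session, not a test of the concurrent case:

```sql
SELECT * FROM test_function(4);  -- first call: the row is inserted and returned from the CTE
SELECT * FROM test_function(4);  -- second call: INSERT does nothing, the existing row is returned

SELECT count(*) FROM test_table WHERE id = 4;  -- still a single row
```

Both calls return the row, because either the CTE or the plain SELECT branch of the UNION ALL supplies it, and LIMIT 1 keeps a single copy.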

How do I store date Variable in Postgres function?

I am trying to create a function that deletes one month's worth of records from a table. The table is in Postgres. As Postgres does not have stored procedures, I am trying to declare a function that inserts the records into a history table and then deletes them from the live table. I have the following code:
CREATE FUNCTION DeleteAndInsertTransaction(Integer)
RETURNS Void
AS $Body$
SELECT now() into saveTime;
SELECT * INTO public.hist_table
FROM (select * from public.live_table
WHERE update < ((SELECT * FROM saveTime) - ($1::text || ' months')::interval)) as sub;
delete from public.live_table
where update < ((SELECT * FROM saveTime) - ($1::text || ' months')::interval);
DROP TABLE saveTime;
$Body$
Language 'sql';
The above code compiles fine, but when I invoke it with DeleteAndInsertTransaction(27), it gives me
Error: relation "savetime" does not exist, and I have no clue what is going on here.
If I take SELECT now() into saveTime; out of the function block and run it before invoking the function, then it runs fine. But I need to store the current date in a variable and use it as a constant for both the insert and the delete: this runs against a huge table, and there could be a significant time difference between the insert and the delete. Any pointers as to what is going on here?

select .. into .. is the deprecated syntax for create table ... as select ... which creates a new table.
So, SELECT now() into saveTime; actually creates a new table (named savetime), and is equivalent to: create table savetime as select now(); - it's not storing something in a variable.
To store a value in a variable, you first need to declare the variable; then you can assign the value. But you can only do that in PL/pgSQL, not in SQL:
CREATE FUNCTION DeleteAndInsertTransaction(p_num_months integer)
returns void
as
$Body$
declare
l_now timestamp;
begin
l_now := now();
...
end;
$Body$
language plpgsql;
To insert into an existing table, you need:
insert into public.hist_table
select *
from public.live_table;
To select the rows older than x months, there is no need to store the current date and time in a variable to begin with. It's also easier to use make_interval() to generate an interval based on a specified unit.
You can simply use
select *
from live_table
where updated_at <= current_date - make_interval(months => p_num_months);
And as you don't need a variable, you can actually do all that with a language sql function.
So the function would look something like this:
CREATE FUNCTION DeleteAndInsertTransaction(p_num_months integer)
RETURNS Void
AS
$Body$
insert into public.hist_table
select *
from live_table
where updated_at < current_date - make_interval(months => p_num_months);
delete from public.live_table
where updated_at < current_date - make_interval(months => p_num_months);
$Body$
Language sql;
Note that the language name is an identifier and should not be quoted.
You can actually do the DELETE and INSERT in a single statement:
with deleted as (
delete from public.live_table
where updated_at <= current_date - make_interval(months => p_num_months)
returning *
)
insert into hist_table
select *
from deleted;
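Wrapped in a function, the single-statement variant could look roughly like this. It is a sketch, not a drop-in replacement: the function name archive_old_rows and the returned row count are additions for illustration, hist_table is assumed to have the same column layout as live_table, and the named notation with => assumes Postgres 9.5 or later:

```sql
-- Hypothetical wrapper: moves old rows to the history table in one statement
CREATE OR REPLACE FUNCTION archive_old_rows(p_num_months integer)
  RETURNS bigint            -- number of rows moved (an addition for illustration)
  LANGUAGE sql AS
$func$
WITH deleted AS (
   DELETE FROM public.live_table
   WHERE  updated_at < current_date - make_interval(months => p_num_months)
   RETURNING *
   )
, inserted AS (
   INSERT INTO public.hist_table
   SELECT * FROM deleted     -- assumes identical column layout in both tables
   RETURNING 1
   )
SELECT count(*) FROM inserted;
$func$;

SELECT archive_old_rows(27);  -- archive everything older than 27 months
```

Because DELETE and INSERT run as one statement, both work on the same snapshot and the same cutoff, which addresses the concern about a time gap between the two steps.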

How to use the result of the query within a function to execute UPDATE

I'm very new to SQL and I'm currently using Postgres to execute a function. Essentially, I want the function to 1) first perform a query, 2) depending on the condition, update the specified field, and 3) return the result.
CREATE OR REPLACE FUNCTION get_overdue()
RETURNS TABLE (
overdue boolean,
due_date DATE
)
$$
BEGIN
SELECT overdue, due_date FROM booking;
IF NOW()::DATE > due_date::date then
-- I want to execute and return the following the query result: UPDATE booking SET overdue = true WHERE (the result of the above query)
END IF;
end$$

If I understand the logic well, you can do this in a single query:
update booking
set overdue = true
where due_date < current_date
returning *;
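If the statement still has to live in a function, as in the question, a plain SQL function can return the updated rows directly. A minimal sketch, assuming the booking table from the question; the function name mark_overdue is made up for illustration:

```sql
-- Hypothetical wrapper around the single UPDATE
CREATE OR REPLACE FUNCTION mark_overdue()
  RETURNS SETOF booking      -- one result row per updated booking
  LANGUAGE sql AS
$func$
UPDATE booking
SET    overdue = true
WHERE  due_date < current_date
RETURNING *;
$func$;

SELECT overdue, due_date FROM mark_overdue();
```

An SQL function may end with an UPDATE that has a RETURNING clause, so no PL/pgSQL block, variable or IF is needed here.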

Optimising function which extracts records with a minimum gap in timestamps

I have a big table of timestamps in Postgres 9.4.5:
CREATE TABLE vessel_position (
posid serial NOT NULL,
mmsi integer NOT NULL,
"timestamp" timestamp with time zone,
the_geom geometry(PointZ,4326),
CONSTRAINT "PK_posid_mmsi" PRIMARY KEY (posid, mmsi)
);
Additional index:
CREATE INDEX vessel_position_timestamp_idx ON vessel_position ("timestamp");
I want to extract every row where the timestamp is at least x minutes after the previous row. I've tried a few different SELECT statements using LAG(), which all kind of worked but didn't give me the exact result I require. The function below gives me what I need, but I feel it could be quicker:
CREATE OR REPLACE FUNCTION _getVesslTrackWithInterval(mmsi integer, startTime character varying (25) ,endTime character varying (25), interval_min integer)
RETURNS SETOF vessel_position AS
$func$
DECLARE
count integer DEFAULT 0;
posids varchar DEFAULT '';
tbl CURSOR FOR
SELECT
posID
,EXTRACT(EPOCH FROM (timestamp - lag(timestamp) OVER (ORDER BY posid asc)))::int as diff
FROM vessel_position vp WHERE vp.mmsi = $1 AND vp.timestamp BETWEEN $2::timestamp AND $3::timestamp;
BEGIN
FOR row IN tbl
LOOP
count := coalesce(row.diff,0) + count;
IF count >= $4*60 OR count = 0 THEN
posids:= posids || row.posid || ',';
count:= 0;
END IF;
END LOOP;
RETURN QUERY EXECUTE 'SELECT * from vessel_position where posid in (' || TRIM(TRAILING ',' FROM posids) || ')';
END
$func$ LANGUAGE plpgsql;
I can't help thinking getting all the posids as a string and then selecting them all again at the very end is slowing things down.
Within the IF statement, I already have access to each row I want to keep, so could potentially store them in a temp table and then return temp table at the end of the loop.
Can this function be optimised - to improve performance in particular?

Query
Your function has all kinds of expensive, unnecessary overhead. A single query should be many times faster, doing the same:
CREATE OR REPLACE FUNCTION _get_vessel_track_with_interval
(mmsi int, starttime timestamptz, endtime timestamptz, min_interval interval)
RETURNS SETOF vessel_position AS
$func$
BEGIN
RETURN QUERY
SELECT (vp).* -- parentheses required for decomposing row type
FROM (
SELECT vp -- whole row (!)
, timestamp - lag(timestamp) OVER (ORDER BY posid) AS diff
FROM vessel_position vp
WHERE vp.mmsi = $1
AND vp.timestamp >= $2 -- typically you'd include the lower bound
AND vp.timestamp < $3 -- ... and exclude the upper
ORDER BY posid
) sub
WHERE diff >= $4;
END
$func$ LANGUAGE plpgsql STABLE;
Could also just be an SQL function or the bare SELECT without any wrapper (maybe as a prepared statement).
Note how starttime and endtime are passed as timestamptz. (It makes no sense to pass them as text and cast.) And the minimum interval min_interval is an actual interval. Pass any interval of your choosing.
Index
If the predicate on mmsi is in any way selective, the two indexes you currently have (PK ON (posid, mmsi) and idx on (timestamp)) are not very useful. If you reverse the column order of your PK to (mmsi, posid), it becomes far more useful for the query at hand. See:
Is a composite index also good for queries on the first field?
The optimal index for this would typically be on vessel_position(mmsi, timestamp). Related:
Multicolumn index and performance
PostgreSQL performance with (col = value or col is NULL)
Query does not hit the index - are these the proper columns to index?
Aside: Avoid keywords as identifiers. That's asking for trouble. Plus, a column timestamp that actually holds timestamptz is misleading.
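The index suggested above could be created along these lines. The index name, the mmsi value and the time range are placeholders; EXPLAIN is only there to check whether the planner actually picks the new index on your data:

```sql
-- Hypothetical index name; "timestamp" is quoted as in the table definition
CREATE INDEX vessel_position_mmsi_timestamp_idx
ON vessel_position (mmsi, "timestamp");

-- Placeholder filter values, same shape as the WHERE clause of the query
EXPLAIN
SELECT *
FROM   vessel_position
WHERE  mmsi = 123456789
AND    "timestamp" >= '2016-01-01'
AND    "timestamp" <  '2016-01-02';
```

With the equality column first and the range column second, a single index scan can cover both predicates.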

How to pass the value of NEW using dollar quoting?

I cannot access the value of the NEW row inside my crosstab() query string.
CREATE OR REPLACE FUNCTION insert_fx()
RETURNS TRIGGER AS
$BODY$
BEGIN
INSERT INTO outputtb (serial,date, judge)
VALUES (NEW.serial, NEW.date, NEW.tjudge) RETURNING serial INTO newserial;
UPDATE outputtb
SET (reading1,
reading2,
reading3) =
(SELECT ct."reading1",
ct."reading2",
ct."reading3"
FROM crosstab( $$
SELECT tb2. serial,tb2. readings,tb2. value
FROM DATA AS tb2
INNER JOIN outputtb AS tb1 USING (serial)
WHERE tb2.serial = $$||NEW.serno||$$
ORDER BY 1 ASC $$, $$
VALUES ('reading1'),('reading2'),('reading3')$$
) ct ("Serial" VARCHAR(50),"Reading1" FLOAT8, "Reading2" FLOAT8, "Reading3" FLOAT8))
WHERE sn = NEW.serno;
RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;
CREATE TRIGGER insert_tg
BEFORE INSERT ON details
FOR EACH ROW EXECUTE PROCEDURE insert_fx();
It returns this error:
ERROR: syntax error at or near "CC1027HCA0GESKN00CC000FT0000"
LINE 6: tb2. serial = 043611007853619CC1027HCA0GESKN00CC000FT...
I think it does not accept characters, it only accepts integers. Maybe the quoting need some modification and I'm not that familiar with pgsql quoting.
I need help to finish my project. I'm stuck on this part.

The immediate cause of the error message is that you concatenated the string NEW.serno without quoting it. To fix it safely, use format(), quote_literal() or quote_nullable().
...
UPDATE outputtb
SET (reading1, reading2, reading3)
= (SELECT ct.reading1, ct.reading2, ct.reading3
FROM crosstab(
'SELECT serial, t2.readings, t2.value
FROM data t2
JOIN outputtb t1 USING (serial)
WHERE serial = ' || quote_nullable(NEW.serno) || '
ORDER BY 1'
, $$VALUES ('reading1'),('reading2'),('reading3')$$
) ct (serial text, reading1 float8, reading2 float8, reading3 float8))
WHERE sn = NEW.serno;
...
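The same quoting can be done with format() and the %L specifier, which quotes its argument as an SQL literal and, like quote_nullable(), renders a null value as NULL. A small sketch with a placeholder value (not the serial number from the question) that contains a single quote:

```sql
-- 'AB''12' is a placeholder value with an embedded single quote
SELECT format('SELECT serial, readings, value FROM data WHERE serial = %L ORDER BY 1'
            , 'AB''12') AS query_text;
-- query_text: SELECT serial, readings, value FROM data WHERE serial = 'AB''12' ORDER BY 1

-- In the trigger function, the first crosstab() argument could then be built as:
--   format('SELECT serial, readings, value FROM data WHERE serial = %L ORDER BY 1', NEW.serno)
```

Either way, the value ends up as a properly quoted literal inside the query string, which avoids both the syntax error and SQL injection.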
Basics:
Insert text with single quotes in PostgreSQL
In passing I also fixed your incorrect mixed-case identifiers:
Are PostgreSQL column names case-sensitive?
But there are more problems:
newserial has not been declared and is also not used.
outputtb is pointless noise in the query passed to crosstab().
Like @a_horse commented, you shouldn't need an INSERT and an UPDATE, and crosstab() also seems like overkill.
This is a big mess.
Going out on a limb, my educated guess is you want this:
CREATE OR REPLACE FUNCTION insert_fx()
RETURNS TRIGGER AS
$func$
BEGIN
INSERT INTO outputtb (serial, date, judge, reading1, reading2, reading3)
SELECT NEW.serial, NEW.date, NEW.tjudge, ct.reading1, ct.reading2, ct.reading3 -- ct.serial is not inserted
FROM (SELECT 1) dummy
LEFT JOIN crosstab (
'SELECT serial, readings, value
FROM data
WHERE serial = ' || quote_nullable(NEW.serno) || '
ORDER BY 1'
, $$VALUES ('reading1'),('reading2'),('reading3')$$
) ct (serial text, reading1 float8, reading2 float8, reading3 float8) ON true;
RETURN NEW;
END
$func$ LANGUAGE plpgsql;
The LEFT JOIN to a dummy table prevents losing the INSERT when crosstab() comes up empty.
Which can be simplified to:
CREATE OR REPLACE FUNCTION insert_fx()
RETURNS TRIGGER AS
$func$
BEGIN
INSERT INTO outputtb (serial, date, judge, reading1, reading2, reading3)
SELECT NEW.serial, NEW.date, NEW.tjudge
, min(value) FILTER (WHERE readings = 'reading1')
, min(value) FILTER (WHERE readings = 'reading2')
, min(value) FILTER (WHERE readings = 'reading3')
FROM data
WHERE serial = NEW.serno;
RETURN NEW;
END
$func$ LANGUAGE plpgsql;
Since we aggregate now, a result row is guaranteed, and we don't have to defend against losing it.
Aside:
"serial" is not a reserved word. But it's the name of a common pseudo data-type, so I still wouldn't use it as column name to avoid confusing error situations.
