How do I store a date variable in a Postgres function?

I am trying to create a function that deletes one month's worth of records from a table in Postgres. As Postgres does not have stored procedures, I am trying to declare a function that inserts those records into a history table and then deletes them from the live table. I have the following code:
CREATE FUNCTION DeleteAndInsertTransaction(Integer)
RETURNS Void
AS $Body$
SELECT now() into saveTime;
SELECT * INTO public.hist_table
FROM (select * from public.live_table
WHERE update < ((SELECT * FROM saveTime) - ($1::text || ' months')::interval)) as sub;
delete from public.live_table
where update < ((SELECT * FROM saveTime) - ($1::text || ' months')::interval);
DROP TABLE saveTime;
$Body$
Language 'sql';
The above code compiles fine, but when I try to run it by invoking DeleteAndInsertTransaction(27), it gives me:
Error: relation "savetime" does not exist
and I have no clue what is going on here.
If I take the SELECT now() into saveTime; out of the function block and run it before invoking the function, then it runs fine. But I need to store the current date in a variable and use that as a constant for both the insert and the delete: this runs against a huge table, and there could be a significant time difference between the insert and the delete. Any pointers as to what is going on here?

select .. into .. is the deprecated syntax for create table ... as select ... which creates a new table.
So, SELECT now() into saveTime; actually creates a new table (named savetime), and is equivalent to: create table savetime as select now(); - it's not storing something in a variable.
To store a value in a variable, you need to first declare the variable, then you can assign the value. But you can only do that in PL/pgSQL, not in plain SQL:
CREATE FUNCTION DeleteAndInsertTransaction(p_num_months integer)
returns void
as
$Body$
declare
l_now timestamp;
begin
l_now := now();
...
end;
$Body$
language plpgsql;
To insert into an existing table you need
insert into public.hist_table
select *
from public.live_table;
To select the rows older than x months, there is no need to store the current date and time in a variable to begin with: now() and current_date return the same value for the whole transaction, so the insert and the delete see the same cutoff. It's also easier to use make_interval() to generate an interval based on a specified unit.
You can simply use
select *
from live_table
where updated_at < current_date - make_interval(months => p_num_months);
And as you don't need a variable, you can actually do all that with a language sql function.
So the function would look something like this:
CREATE FUNCTION DeleteAndInsertTransaction(p_num_months integer)
RETURNS Void
AS
$Body$
insert into public.hist_table
select *
from live_table
where updated_at < current_date - make_interval(months => p_num_months);
delete from public.live_table
where updated_at < current_date - make_interval(months => p_num_months);
$Body$
Language sql;
Note that the language name is an identifier and should not be quoted.
You can actually do the DELETE and INSERT in a single statement:
with deleted as (
delete from public.live_table
where updated_at < current_date - make_interval(months => p_num_months)
returning *
)
insert into hist_table
select *
from deleted;
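Putting the pieces together, the single-statement variant can be wrapped in a language sql function. This is only a sketch: the function name archive_old_transactions and the two-column stand-in tables are assumptions made for illustration, and it presumes hist_table has the same column layout as live_table.

```sql
-- Stand-in tables for illustration only; the real tables have more columns.
create table if not exists public.live_table (id integer, updated_at timestamp);
create table if not exists public.hist_table (id integer, updated_at timestamp);

-- Hypothetical function name; parameter name taken from the answer above.
create or replace function archive_old_transactions(p_num_months integer)
  returns void
as
$body$
  -- move rows older than p_num_months months in one statement
  with deleted as (
    delete from public.live_table
    where updated_at < current_date - make_interval(months => p_num_months)
    returning *
  )
  insert into public.hist_table
  select *
  from deleted;
$body$
language sql;

-- usage: archive everything older than 27 months
select archive_old_transactions(27);
```

Because the delete and the insert are one statement, there is no window in which a row could be copied but not deleted, or the other way round.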

Related

Dynamic query that uses CTE gets "syntax error at end of input"

I have a table that looks like this:
CREATE TABLE label (
hid UUID PRIMARY KEY DEFAULT UUID_GENERATE_V4(),
name TEXT NOT NULL UNIQUE
);
I want to create a function that takes a list of names and inserts multiple rows into the table, ignoring duplicate names, and returns an array of the IDs generated for the rows it inserted.
This works:
CREATE OR REPLACE FUNCTION insert_label(nms TEXT[])
RETURNS UUID[]
AS $$
DECLARE
ids UUID[];
BEGIN
CREATE TEMP TABLE tmp_names(name TEXT);
INSERT INTO tmp_names SELECT UNNEST(nms);
WITH new_names AS (
INSERT INTO label(name)
SELECT tn.name
FROM tmp_names tn
WHERE NOT EXISTS(SELECT 1 FROM label h WHERE h.name = tn.name)
RETURNING hid
)
SELECT ARRAY_AGG(hid) INTO ids
FROM new_names;
DROP TABLE tmp_names;
RETURN ids;
END;
$$ LANGUAGE PLPGSQL;
I have many tables with the exact same columns as the label table, so I would like to have a function that can insert into any of them. I'd like to create a dynamic query to do that. I tried that, but this does not work:
CREATE OR REPLACE FUNCTION insert_label(h_tbl REGCLASS, nms TEXT[])
RETURNS UUID[]
AS $$
DECLARE
ids UUID[];
query_str TEXT;
BEGIN
CREATE TEMP TABLE tmp_names(name TEXT);
INSERT INTO tmp_names SELECT UNNEST(nms);
query_str := FORMAT('WITH new_names AS ( INSERT INTO %1$I(name) SELECT tn.name FROM tmp_names tn WHERE NOT EXISTS(SELECT 1 FROM %1$I h WHERE h.name = tn.name) RETURNING hid)', h_tbl);
EXECUTE query_str;
SELECT ARRAY_AGG(hid) INTO ids FROM new_names;
DROP TABLE tmp_names;
RETURN ids;
END;
$$ LANGUAGE PLPGSQL;
This is the output I get when I run that function:
psql=# select insert_label('label', array['how', 'now', 'brown', 'cow']);
ERROR: syntax error at end of input
LINE 1: ...SELECT 1 FROM label h WHERE h.name = tn.name) RETURNING hid)
^
QUERY: WITH new_names AS ( INSERT INTO label(name) SELECT tn.name FROM tmp_names tn WHERE NOT EXISTS(SELECT 1 FROM label h WHERE h.name = tn.name) RETURNING hid)
CONTEXT: PL/pgSQL function insert_label(regclass,text[]) line 19 at EXECUTE
The query generated by the dynamic SQL looks like it should be exactly the same as the query from static SQL.
I got the function to work by changing the return value from an array of UUIDs to a table of UUIDs and not using CTE:
CREATE OR REPLACE FUNCTION insert_label(h_tbl REGCLASS, nms TEXT[])
RETURNS TABLE (hid UUID)
AS $$
DECLARE
query_str TEXT;
BEGIN
CREATE TEMP TABLE tmp_names(name TEXT);
INSERT INTO tmp_names SELECT UNNEST(nms);
query_str := FORMAT('INSERT INTO %1$I(name) SELECT tn.name FROM tmp_names tn WHERE NOT EXISTS(SELECT 1 FROM %1$I h WHERE h.name = tn.name) RETURNING hid', h_tbl);
RETURN QUERY EXECUTE query_str;
DROP TABLE tmp_names;
RETURN;
END;
$$ LANGUAGE PLPGSQL;
I don't know if one way is better than the other, returning an array of UUIDs or a table of UUIDs, but at least I got it to work one of those ways. Plus, possibly not using a CTE is more efficient, so it may be better to stick with the version that returns a table of UUIDs.
What I would like to know is why the dynamic query did not work when using a CTE. The query it produced looked like it should have worked.
If anyone can let me know what I did wrong, I would appreciate it.
... why the dynamic query did not work when using a CTE. The query it produced looked like it should have worked.
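No, the dynamic string contained only the CTE, without the (required) outer query. EXECUTE runs that string as a complete statement of its own, so the separate SELECT ... FROM new_names that follows cannot see the CTE. (In the static version, SELECT ARRAY_AGG(hid) INTO ids FROM new_names was part of the same statement.)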
There are more problems, but just use this query instead:
INSERT INTO label(name)
SELECT unnest(nms)
ON CONFLICT DO NOTHING
RETURNING hid;
label.name is defined UNIQUE NOT NULL, so this simple UPSERT can replace your function insert_label() completely.
It's much simpler and faster. It also defends against possible duplicates from within your input array that you didn't cover, yet. And it's safe under concurrent write load - as opposed to your original, which might run into race conditions. Related:
How to use RETURNING with ON CONFLICT in PostgreSQL?
I would just use the simple query and replace the table name.
But if you still want a dynamic function:
CREATE OR REPLACE FUNCTION insert_label(_tbl regclass, _nms text[])
RETURNS TABLE (hid uuid)
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE format(
$$
INSERT INTO %s(name)
SELECT unnest($1)
ON CONFLICT DO NOTHING
RETURNING hid
$$, _tbl)
USING _nms;
END
$func$;
If you don't need an array as result, stick with the set (RETURNS TABLE ...). Simpler.
Pass values (_nms) to EXECUTE in a USING clause.
The tablename (_tbl) is type regclass, so the format specifier %I for format() would be wrong. Use %s instead. See:
Table name as a PostgreSQL function parameter
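If an array result is needed after all, one way is to keep the CTE and its outer SELECT in the same dynamic statement and capture the result with EXECUTE ... INTO. A minimal sketch, assuming every target table has the same hid and name columns as label; the function name insert_label_array is made up for this example:

```sql
-- Hypothetical name; same idea as the function above, but returning uuid[].
CREATE OR REPLACE FUNCTION insert_label_array(_tbl regclass, _nms text[])
  RETURNS uuid[]
  LANGUAGE plpgsql AS
$func$
DECLARE
   _ids uuid[];
BEGIN
   -- CTE and outer SELECT travel together in one dynamic statement
   EXECUTE format(
      $q$
      WITH new_names AS (
         INSERT INTO %s(name)
         SELECT unnest($1)
         ON     CONFLICT DO NOTHING
         RETURNING hid
         )
      SELECT array_agg(hid) FROM new_names
      $q$, _tbl)
   INTO  _ids
   USING _nms;

   RETURN _ids;
END
$func$;
```

It would be called like select insert_label_array('label', array['how', 'now', 'brown', 'cow']);. Note that array_agg() returns NULL rather than an empty array when no row was inserted.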

How to use upsert with Postgres

I want to convert this code in Postgres to something shorter that will do the same. I read about upsert but I couldn't understand a good way to implement that on my code.
What I wrote works fine, but I want to find a more elegant way to write it.
Hope someone here can help me! This is the query:
CREATE OR REPLACE FUNCTION insert_table(
in_guid character varying,
in_x_value character varying,
in_y_value character varying
)
RETURNS TABLE(response boolean) LANGUAGE 'plpgsql'
DECLARE _id integer;
BEGIN
-- guid exists and it's been 10 minutes from created_date:
IF ((SELECT COUNT (*) FROM public.tbl_client_location WHERE guid = in_guid AND created_date < NOW() - INTERVAL '10 MINUTE') > 0) THEN
RETURN QUERY (SELECT FALSE);
-- guid exists but 10 minutes haven't passed yet:
ELSEIF ((SELECT COUNT (*) FROM public.tbl_client_location WHERE guid = in_guid) > 0) THEN
UPDATE
public.tbl_client_location
SET
x_value = in_x_value,
y_value = in_y_value,
updated_date = now()
WHERE
guid = in_guid;
RETURN QUERY (SELECT TRUE);
-- guid does not exist:
ELSE
INSERT INTO public.tbl_client_location
( guid , x_value , y_value )
VALUES
( in_guid, in_x_value, in_y_value )
RETURNING id INTO _id;
RETURN QUERY (SELECT TRUE);
END IF;
END
This can indeed be a lot simpler:
CREATE OR REPLACE FUNCTION insert_table(in_guid text
, in_x_value text
, in_y_value text
, OUT response bool) -- ④
-- RETURNS bool -- optional noise -- ④
LANGUAGE plpgsql AS -- ①
$func$ -- ②
-- DECLARE
-- _id integer; -- what for?
BEGIN
INSERT INTO tbl AS t
( guid, x_value, y_value)
VALUES (in_guid, in_x_value, in_y_value)
ON CONFLICT (guid) DO UPDATE -- guid exists
SET ( x_value, y_value, updated_date)
= (EXCLUDED.x_value, EXCLUDED.y_value, now()) -- ⑤
WHERE t.created_date >= now() - interval '10 minutes' -- ③ have not passed yet
-- RETURNING id INTO _id -- what for?
;
response := FOUND; -- ⑥
END
$func$;
Assuming guid is defined UNIQUE or PRIMARY KEY, and created_date is defined NOT NULL DEFAULT now().
① Language name is an identifier - better without quotes.
② Quotes around function body were missing (invalid command). See:
What are '$$' used for in PL/pgSQL
③ UPDATE only if 10 min have not passed yet. Keep in mind that timestamps are those from the beginning of the respective transactions. So keep transactions short and simple. See:
Difference between now() and current_timestamp
④ A function with OUT parameter(s) and no RETURNS clause returns a single row automatically (here a single boolean, since there is just one OUT parameter). Your original was declared as set-returning function (0-n returned rows), which didn't make sense. See:
Return multiple fields as a record in PostgreSQL with PL/pgSQL
⑤ It's generally better to use the special EXCLUDED row than to spell out values again. See:
How could this UPSERT query be made shorter?
⑤ Also using short syntax for updating multiple columns. See:
Update multiple columns that start with a specific string
⑥ To see whether a row was written use the special variable FOUND. Subtle difference: different from your original, you get true or false after the fact, saying that a row has actually been written (or not). In your original, the INSERT or UPDATE might still be skipped (without raising an exception) by a trigger or rule, and the function result would be misleading in this case. See:
IS NOT NULL test for a record does not return TRUE when variable is set
Further reading:
Postgres ON CONFLICT ON CONSTRAINT triggering errors in the error log
How to use RETURNING with ON CONFLICT in PostgreSQL?
You might just run the single SQL statement instead, providing your values once:
INSERT INTO tbl AS t(guid, x_value,y_value)
VALUES ($in_guid, $in_x_value, $in_y_value) -- your values here, once
ON CONFLICT (guid) DO UPDATE
SET (x_value,y_value, updated_date)
= (EXCLUDED.x_value, EXCLUDED.y_value, now())
WHERE t.created_date >= now() - interval '10 minutes';
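To see the three cases side by side, the statement can be run against a small stand-in table. This is a sketch under the assumptions stated earlier (guid UNIQUE, created_date NOT NULL DEFAULT now()); the column types and the added RETURNING clause are choices made for this illustration, not part of the original question:

```sql
-- Stand-in table; column types are assumed for the example.
CREATE TABLE tbl (
   id           serial PRIMARY KEY
 , guid         text NOT NULL UNIQUE
 , x_value      text
 , y_value      text
 , created_date timestamptz NOT NULL DEFAULT now()
 , updated_date timestamptz
);

-- 1) guid not present yet: the row is inserted, RETURNING yields one row
INSERT INTO tbl AS t (guid, x_value, y_value)
VALUES ('a', '1', '2')
ON CONFLICT (guid) DO UPDATE
SET   (x_value, y_value, updated_date)
    = (EXCLUDED.x_value, EXCLUDED.y_value, now())
WHERE t.created_date >= now() - interval '10 minutes'
RETURNING guid, x_value;

-- 2) same guid, created less than 10 minutes ago: the row is updated
INSERT INTO tbl AS t (guid, x_value, y_value)
VALUES ('a', '3', '4')
ON CONFLICT (guid) DO UPDATE
SET   (x_value, y_value, updated_date)
    = (EXCLUDED.x_value, EXCLUDED.y_value, now())
WHERE t.created_date >= now() - interval '10 minutes'
RETURNING guid, x_value;

-- 3) pretend the row is older than 10 minutes: nothing is written,
--    RETURNING yields no row (and FOUND would be false in PL/pgSQL)
UPDATE tbl SET created_date = now() - interval '11 minutes' WHERE guid = 'a';

INSERT INTO tbl AS t (guid, x_value, y_value)
VALUES ('a', '5', '6')
ON CONFLICT (guid) DO UPDATE
SET   (x_value, y_value, updated_date)
    = (EXCLUDED.x_value, EXCLUDED.y_value, now())
WHERE t.created_date >= now() - interval '10 minutes'
RETURNING guid, x_value;
```

An empty RETURNING result is the plain-SQL counterpart of FOUND being false in the function.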
I finally solved it. I wrote a second function that checks whether the guid already exists and how old the row is; once that check passes, the upsert runs without any problems.
This is what I ended up with:
CREATE OR REPLACE FUNCTION fnc_check_table(
in_guid character varying)
RETURNS TABLE(response boolean)
LANGUAGE 'plpgsql'
COST 100
VOLATILE
ROWS 1000
AS $BODY$
BEGIN
IF EXISTS (SELECT FROM tbl WHERE guid = in_guid AND created_date < NOW() - INTERVAL '10 MINUTE' ) THEN
RETURN QUERY (SELECT FALSE);
ELSEIF EXISTS (SELECT FROM tbl WHERE guid = in_guid AND created_date > NOW() - INTERVAL '10 MINUTE') THEN
RETURN QUERY (SELECT TRUE);
ELSE
RETURN QUERY (SELECT TRUE);
END IF;
END
$BODY$;
CREATE OR REPLACE FUNCTION fnc_insert_table(
in_guid character varying,
in_x_value character varying,
in_y_value character varying)
RETURNS TABLE(response boolean)
LANGUAGE 'plpgsql'
COST 100
VOLATILE
ROWS 1000
AS $BODY$
BEGIN
IF (fnc_check_table(in_guid)) THEN
INSERT INTO tbl (guid, x_value, y_value)
VALUES (in_guid,in_x_value,in_y_value)
ON CONFLICT (guid)
DO UPDATE SET x_value=in_x_value, y_value=in_y_value, updated_date=now();
RETURN QUERY (SELECT TRUE);
ELSE
RETURN QUERY (SELECT FALSE);
END IF;
END
$BODY$;

Postgres: How to set column default value as another column value while altering the table

I have a Postgres table with millions of records in it. Now I want to add a new column called "time_modified" to that table, filled with the value of another column, "last_event_time". Running a migration script takes a long time, so I need a simple solution that can run in production.
Assuming that the columns are timestamps you can try:
alter table my_table add time_modified text;
alter table my_table alter time_modified type timestamp using last_event_time;
I suggest using a function with pg_sleep, which waits between iterations of the loop:
SELECT pg_sleep(seconds);
This avoids the table-wide exclusive lock that rewriting the table would take; only the updated rows are locked, although they stay locked until the function's transaction ends. The downside is that execution takes a long time.
alter table my_table add time_modified timestamp;
CREATE OR REPLACE FUNCTION update_new_column()
RETURNS void AS
$BODY$
DECLARE
rec record;
BEGIN
-- copy the value row by row, pausing briefly after each update
for rec in (select id, last_event_time from my_table) loop
update my_table set time_modified = rec.last_event_time where id = rec.id;
PERFORM pg_sleep(0.01);
end loop;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;
and execute the function:
select update_new_column();
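A variation on the same idea, assuming PostgreSQL 11 or later, is a procedure that fills the column in batches and commits after each batch, so row locks are released as it goes. This is a sketch, not a tested migration: the procedure name backfill_time_modified and the batch size are made up here, and it assumes my_table has a unique id column and that time_modified is still NULL wherever it has not been filled.

```sql
-- Hypothetical procedure; requires PostgreSQL 11+ (COMMIT inside a procedure).
CREATE OR REPLACE PROCEDURE backfill_time_modified(p_batch integer DEFAULT 10000)
LANGUAGE plpgsql AS
$proc$
DECLARE
   _rows bigint;
BEGIN
   LOOP
      -- fill at most p_batch rows that have not been copied yet
      UPDATE my_table
      SET    time_modified = last_event_time
      WHERE  id IN (
         SELECT id
         FROM   my_table
         WHERE  time_modified IS NULL
         AND    last_event_time IS NOT NULL
         LIMIT  p_batch
         );

      GET DIAGNOSTICS _rows = ROW_COUNT;
      EXIT WHEN _rows = 0;

      COMMIT;  -- release the row locks taken by this batch
   END LOOP;
END
$proc$;
```

It would be started with CALL backfill_time_modified(10000); outside of an explicit transaction block, because the procedure issues its own COMMIT.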

PostgreSQL inherited table and insert triggers

I'm trying to follow the advice here to create a vertically partitioned table for storing time series data.
So far, my schema looks like this:
CREATE TABLE events
(
topic text,
t timestamp,
value integer,
primary key(topic, t)
);
CREATE TABLE events_2014
(
primary key (topic, t),
check (t between '2014-01-01' and '2015-01-01')
) INHERITS (events);
Now I'm trying to create an INSTEAD OF INSERT trigger so that events can be inserted on the events table and the row will end up in the right sub-table. But the documentation says that INSTEAD OF INSERT triggers can only be created on views, not tables (or subtables):
CREATE OR REPLACE FUNCTION insert_events () RETURNS TRIGGER AS $insert_events$ BEGIN
IF new.t between '2014-01-01' and '2015-01-01' THEN
INSERT INTO events_2014 SELECT new.*;
...
END IF;
RETURN NULL;
END;
$insert_events$ LANGUAGE PLPGSQL;
CREATE TRIGGER insert_events INSTEAD OF INSERT ON events FOR EACH ROW EXECUTE PROCEDURE insert_events();
ERROR: "events" is a table
DETAIL: Tables cannot have INSTEAD OF triggers.
What's the right way of doing this?
You need to declare BEFORE INSERT triggers.
Documentation on partitioning is a great source of knowledge in this matter and is full of examples.
Example function from docs
CREATE OR REPLACE FUNCTION measurement_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
IF ( NEW.logdate >= DATE '2006-02-01' AND
NEW.logdate < DATE '2006-03-01' ) THEN
INSERT INTO measurement_y2006m02 VALUES (NEW.*);
ELSIF ( NEW.logdate >= DATE '2006-03-01' AND
NEW.logdate < DATE '2006-04-01' ) THEN
INSERT INTO measurement_y2006m03 VALUES (NEW.*);
...
ELSIF ( NEW.logdate >= DATE '2008-01-01' AND
NEW.logdate < DATE '2008-02-01' ) THEN
INSERT INTO measurement_y2008m01 VALUES (NEW.*);
ELSE
RAISE EXCEPTION 'Date out of range. Fix the measurement_insert_trigger() function!';
END IF;
RETURN NULL;
END;
$$
LANGUAGE plpgsql;
Example trigger from docs
CREATE TRIGGER insert_measurement_trigger
BEFORE INSERT ON measurement
FOR EACH ROW EXECUTE PROCEDURE measurement_insert_trigger();
Returning NULL from BEFORE trigger will keep the parent table empty.
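Applied to the events schema from the question, the same pattern might look like the sketch below. The trigger function name events_insert_trigger and the half-open range check (>= and <, instead of the question's between) are choices made for this example; further years would need their own child table and an extra ELSIF branch.

```sql
CREATE TABLE events
(
   topic text,
   t     timestamp,
   value integer,
   primary key (topic, t)
);

CREATE TABLE events_2014
(
   primary key (topic, t),
   -- half-open range so that 2015-01-01 00:00 belongs to the next partition
   check (t >= '2014-01-01' and t < '2015-01-01')
) INHERITS (events);

-- Hypothetical function name, modeled on the example from the docs.
CREATE OR REPLACE FUNCTION events_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
   IF NEW.t >= timestamp '2014-01-01' AND NEW.t < timestamp '2015-01-01' THEN
      INSERT INTO events_2014 VALUES (NEW.*);
   ELSE
      RAISE EXCEPTION 'Timestamp % out of range. Fix the events_insert_trigger() function!', NEW.t;
   END IF;
   RETURN NULL;  -- skip the insert into the parent table
END;
$$
LANGUAGE plpgsql;

CREATE TRIGGER events_insert_trigger
BEFORE INSERT ON events
FOR EACH ROW EXECUTE PROCEDURE events_insert_trigger();

-- usage: the row is routed to events_2014, the parent stays empty
INSERT INTO events (topic, t, value) VALUES ('temperature', '2014-06-01 12:00', 21);
```

A SELECT on events still returns the row, because queries on the parent include its child tables unless ONLY is specified.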

How to create sequence which start from 1 in each day

The sequence should return values 1, 2, 3 etc., starting from 1 every day.
current_date should be used to determine the day.
For example, called for the first time today it should return 1, the second time 2, etc.
Tomorrow, the first call should return 1 again, the second call 2, etc.
Postgres 9.1 is used.
Use a table to keep the sequence:
create table daily_sequence (
day date, s integer, primary key (day, s)
);
This function will retrieve the next value:
create or replace function daily_sequence()
returns int as $$
insert into daily_sequence (day, s)
select current_date, coalesce(max(s), 0) + 1
from daily_sequence
where day = current_date
returning s
;
$$ language sql;
select daily_sequence();
Be prepared to retry in case of an improbable duplicate key value error. If previous days' sequences are not necessary delete them to keep the table and the index as light as possible:
create or replace function daily_sequence()
returns int as $$
with d as (
delete from daily_sequence
where day < current_date
)
insert into daily_sequence (day, s)
select current_date, coalesce(max(s), 0) + 1
from daily_sequence
where day = current_date
returning s
;
$$ language sql;
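On PostgreSQL 9.5 or later (so not on the 9.1 from the question), the retry on duplicate keys can be avoided by keeping one counter row per day and incrementing it with an upsert. A minimal sketch; the table name daily_counter and the function name next_daily_value are made up for this example:

```sql
-- One row per day instead of one row per issued value.
create table daily_counter (
   day date primary key,
   s   integer not null
);

-- Hypothetical function: returns 1 for the first call of a day, then 2, 3, ...
create or replace function next_daily_value()
returns integer as $$
   insert into daily_counter as c (day, s)
   values (current_date, 1)
   on conflict (day) do update
   set s = c.s + 1
   returning s
   ;
$$ language sql;

select next_daily_value();
select next_daily_value();
```

Concurrent callers queue up on the row lock of today's row, so each one gets a distinct value. As with the table-based version above, a rolled-back transaction gives its number back, which a real sequence would not do.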
You just need to think of a cron job as running a shell command at a specified time or day.
Shell command for the cron job:
psql --host host.domain.com --port 32098 --dbname databaseName < my.sql
You can then just add this to your crontab (I recommend you use crontab -e to avoid breaking things):
# It will run your command at 00:00 every day
# min hour mday month wday command-to-run
0 0 * * * psql --host host.domain.com --port 32098 --dbname databaseName < my.sql
It is quite an interesting task.
Let's try to use an additional sequence for the date and an alternative function to get the next value:
-- We will use anonymous block here because it is impossible to use
-- variables and functions in DDL directly
do language plpgsql $$
begin
execute 'create sequence my_seq_day start with ' || (current_date - '1900-01-01')::varchar;
end; $$;
-- Initialize sequence
select nextval('my_seq_day');
create sequence my_seq;
create or replace function nextval_daily(in p_seq varchar) returns bigint as $$
declare
dd bigint;
lv bigint;
begin
select current_date - '1900-01-01'::date into dd;
-- Here we should retrieve the current value from the sequence
-- properties instead of the currval function to make it session-independent
execute 'select last_value from '||p_seq||'_day' into lv;
if dd - lv > 0 then
-- If the next day has come
-- Reset the main sequence
execute 'alter sequence '||p_seq||' restart';
-- And set the day sequence to the current day
execute 'alter sequence '||p_seq||'_day restart with '||dd::varchar;
execute 'select nextval('''||p_seq||'_day'')' into lv;
end if;
return nextval(p_seq);
end; $$ language plpgsql;
Then use the function nextval_daily instead of nextval.
Hope it was helpful.
I came across an almost identical requirement.
I handled the logic in the query rather than modifying the sequence:
use setval() to reset the sequence to 1 if it is the first entry in the table for the day,
otherwise use nextval() of the sequence.
Below is a sample query:
SELECT
CASE WHEN NOT EXISTS (
SELECT primary_key FROM schema.table WHERE date(updated_datetime) = #{systemDate} limit 1)
THEN
setval('schema.job_seq', 1)
ELSE
nextval('schema.job_seq')
END;
UPDATE privilege is required for the user to execute setval.
GRANT UPDATE ON ALL SEQUENCES IN SCHEMA ur_schema TO user;