How to use a counter in a Postgresql function loop when generating a timestamp type for each iteration - postgresql-9.5

CREATE OR REPLACE FUNCTION inserts_table_foo(startId INT,
endId INT,
stepTimeZone INT default 0,
period INT default 0) RETURNS void AS
$$
DECLARE
nameVar TEXT;
dateVar TIMESTAMP;
begin
for idVar in startId..endId
loop
nameVar := md5(random()::text);
dateVar :=
(now()::timestamp with time zone +
make_interval(hours := stepTimeZone) +
make_interval(days := period)
);
period := period + 1;
insert into foo(id, name, data) values (idVar, nameVar, dateVar);
end loop;
end ;
$$ LANGUAGE plpgsql;
Unfortunately, the variable -> period, does not work as intended. I assumed that the counter would increase by 1 and each time, the date generation would be different, but in fact, the date turns out to be the same for the entire duration of the cycle.
How to make the date, for each iteration, be generated anew? This won't do here
generate_series ( TIMESTAMP WITHOUT TIME ZONE '2022-01-01', CURRENT_DATE, '1 day' )
because, you need to bet 100,000,000 records, in a table where the date field can be in the table in several places (dateStart, dateEnd, startDateReg, etc.) and are placed out of order. Therefore, I want to make a function that accepts a number of variables, according to which I can set the periods of the generated values, in this example, this is the date and time.
I would like to clearly see which variable the value comes into and also see the code for inserting this value into the table (that is, using the INSERT into entry .....)
What is the problem here and how could it be solved ?

Related

How to use upsert with Postgres

I want to convert this code in Postgres to something shorter that will do the same. I read about upsert but I couldn't understand a good way to implement that on my code.
What I wrote works fine, but I want to find a more elegant way to write it.
Hope someone here can help me! This is the query:
CREATE OR REPLACE FUNCTION insert_table(
in_guid character varying,
in_x_value character varying,
in_y_value character varying
)
RETURNS TABLE(response boolean) LANGUAGE 'plpgsql'
DECLARE _id integer;
BEGIN
-- guid exists and it's been 10 minutes from created_date:
IF ((SELECT COUNT (*) FROM public.tbl_client_location WHERE guid = in_guid AND created_date < NOW() - INTERVAL '10 MINUTE') > 0) THEN
RETURN QUERY (SELECT FALSE);
-- guid exists but 10 minutes hasen't passed yet:
ELSEIF ((SELECT COUNT (*) FROM public.tbl_client_location WHERE guid = in_guid) > 0) THEN
UPDATE
public.tbl_client_location
SET
x_value = in_x_value,
y_value = in_y_value,
updated_date = now()
WHERE
guid = in_guid;
RETURN QUERY (SELECT TRUE);
-- guid not exist:
ELSE
INSERT INTO public.tbl_client_location
( guid , x_value , y_value )
VALUES
( in_guid, in_x_value, in_y_value )
RETURNING id INTO _id;
RETURN QUERY (SELECT TRUE);
END IF;
END
This can indeed be a lot simpler:
CREATE OR REPLACE FUNCTION insert_table(in_guid text
, in_x_value text
, in_y_value text
, OUT response bool) -- ④
-- RETURNS record -- optional noise -- ④
LANGUAGE plpgsql AS -- ①
$func$ -- ②
-- DECLARE
-- _id integer; -- what for?
BEGIN
INSERT INTO tbl AS t
( guid, x_value, y_value)
VALUES (in_guid, in_x_value, in_y_value)
ON CONFLICT (guid) DO UPDATE -- guid exists
SET ( x_value, y_value, updated_date)
= (EXCLUDED.x_value, EXCLUDED.y_value, now()) -- ⑤
WHERE t.created_date >= now() - interval '10 minutes' -- ③ have not passed yet
-- RETURNING id INTO _id -- what for?
;
response := FOUND; -- ⑥
END
$func$;
Assuming guid is defined UNIQUE or PRIMARY KEY, and created_date is defined NOT NULL DEFAULT now().
① Language name is an identifier - better without quotes.
② Quotes around function body were missing (invalid command). See:
What are '$$' used for in PL/pgSQL
③ UPDATE only if 10 min have not passed yet. Keep in mind that timestamps are those from the beginning of the respective transactions. So keep transactions short and simple. See:
Difference between now() and current_timestamp
④ A function with OUT parameter(s) and no RETURNS clause returns a single row (record) automatically. Your original was declared as set-returning function (0-n returned rows), which didn't make sense. See:
Return multiple fields as a record in PostgreSQL with PL/pgSQL
⑤ It's generally better to use the special EXCLUDED row than to spell out values again. See:
How could this UPSERT query be made shorter?
⑤ Also using short syntax for updating multiple columns. See:
Update multiple columns that start with a specific string
⑥ To see whether a row was written use the special variable FOUND. Subtle difference: different from your original, you get true or false after the fact, saying that a row has actually been written (or not). In your original, the INSERT or UPDATE might still be skipped (without raising an exception) by a trigger or rule, and the function result would be misleading in this case. See:
IS NOT NULL test for a record does not return TRUE when variable is set
Further reading:
Postgres ON CONFLICT ON CONSTRAINT triggering errors in the error log
How to use RETURNING with ON CONFLICT in PostgreSQL?
You might just run the single SQL statement instead, providing your values once:
INSERT INTO tbl AS t(guid, x_value,y_value)
VALUES ($in_guid, $in_x_value, $in_y_value) -- your values here, once
ON CONFLICT (guid) DO UPDATE
SET (x_value,y_value, updated_date)
= (EXCLUDED.x_value, EXCLUDED.y_value, now())
WHERE t.created_date >= now() - interval '10 minutes';
I finally solved it. I made another function that'll be called and checked if it's already exists and the time and then I can do upsert without any problems.
That's what I did at the end:
CREATE OR REPLACE FUNCTION fnc_check_table(
in_guid character varying)
RETURNS TABLE(response boolean)
LANGUAGE 'plpgsql'
COST 100
VOLATILE
ROWS 1000
AS $BODY$
BEGIN
IF EXISTS (SELECT FROM tbl WHERE guid = in_guid AND created_date < NOW() - INTERVAL '10 MINUTE' ) THEN
RETURN QUERY (SELECT FALSE);
ELSEIF EXISTS (SELECT FROM tbl WHERE guid = in_guid AND created_date > NOW() - INTERVAL '10 MINUTE') THEN
RETURN QUERY (SELECT TRUE);
ELSE
RETURN QUERY (SELECT TRUE);
END IF;
END
$BODY$;
CREATE OR REPLACE FUNCTION fnc_insert_table(
in_guid character varying,
in_x_value character varying,
in_y_value character varying)
RETURNS TABLE(response boolean)
LANGUAGE 'plpgsql'
COST 100
VOLATILE
ROWS 1000
AS $BODY$
BEGIN
IF (fnc_check_table(in_guid)) THEN
INSERT INTO tbl (guid, x_value, y_value)
VALUES (in_guid,in_x_value,in_y_value)
ON CONFLICT (guid)
DO UPDATE SET x_value=in_x_value, y_value=in_y_value, updated_date=now();
RETURN QUERY (SELECT TRUE);
ELSE
RETURN QUERY (SELECT FALSE);
END IF;
END
$BODY$;

Add partition in existing table Greenplum

I am trying to add monthly partition on a table for an year or so. But the issue is I cannot add them in a single query. While creating the table in the past, I have added the partition for each month for couple of years.
CREATE TABLE Calls (
callid varchar(200) NOT NULL,
calltime timestamp NOT NULL,
Duration varchar(50) NULL
)
DISTRIBUTED BY (callid)
PARTITION BY RANGE(calltime)
(
START ('2019-04-01 00:00:00'::timestamp without time zone) END ('2022-01-01 00:00:00'::timestamp without time zone) EVERY ('1 mon'::interval)
I read different articles and blogs on it but could not found any solution to add monthly partitions for year or so. The only possible way is to add manually one by one for each month.
alter table Calls
Add partition
start (date '2022-01-01') inclusive
end (date '2022-02-01') exclusive
--And Again for next month
alter table Calls
Add partition
start (date '2022-02-01') inclusive
end (date '2022-03-01') exclusive
I have around 50 60 tables and doing it manually for each table will take a lot of time and effort. I am trying to make a generic way to add partitions. Any solution?
the quickest i have always found was a dirty perl script that looped over the different ranges and just pre-writes all the SQL something like this:
$SQL = "alter table Calls Add partition start (date 'START') inclusive end (date 'END') exclusive;";
for ($loop=<start>; $loop <= <end>; $loop+=<interval>)
{
$SQL =~ s/START/$loop/;
$SQL =~ s/END/END/;
print $SQL, "\n";
}
Hope that pseudo code helps
There is no direct way for adding multiple partitions without specifying range for each partition. It will simply through error like below.
gpadmin=# alter table calls add partition START ('2022-01-01 00:00:00'::timestamp without time zone) END ('2023-01-01 00:00:00'::timestamp without time zone) EVERY ('1 mon'::interval);
ERROR: cannot specify EVERY when adding RANGE partition to relation "calls"
gpadmin=#
You can create new table with different partition range then replace the Calls table with new table but it a resource consuming process and requires manual intervention.
Thanks,
Anil
Managed to create a dynamic function that takes datestart text,dateend text,table_schema text,table_name text as parameter and loops over to add a partition.
CREATE OR REPLACE FUNCTION partitionfunction(datestart text,dateend text,table_schema text,table_name text)
RETURNS void
LANGUAGE plpgsql
VOLATILE
AS $$
declare
var_Table_schema text;
var_Table_name text;
var_dateStart date;
var_dateEnd date;
var_endPartition date;
BEGIN
var_Table_schema := Table_schema;
var_Table_name := Table_name;
var_dateStart := to_date(dateStart,'YYYY-MM-DD');
var_dateEnd := to_date(dateEnd,'YYYY-MM-DD');
WHILE var_dateStart < var_dateEnd
loop
var_endPartition = var_dateStart + interval '1 MONTH'; --You can also change it for weeks or days
execute ' alter table '|| Table_schema ||'.'|| Table_name || '
add partition start('''||var_dateStart||''') end(
'''||var_endPartition||''')';
var_dateStart = var_dateStart + interval '1 MONTH'; --You can also change it for weeks or days
END loop;
end;
$$
EXECUTE ON ANY;

How do I store date Variable in Postgres function?

So I am working to create a function that will delete the 1 month worth records from a table. The table is in postgres. As postgres does not have stored procedures I am trying to declare a function with the logic that will insert the 1 month records into a history table and then delete the records from the live table. I have the following code :
CREATE FUNCTION DeleteAndInsertTransaction(Integer)
RETURNS Void
AS $Body$
SELECT now() into saveTime;
SELECT * INTO public.hist_table
FROM (select * from public.live_table
WHERE update < ((SELECT * FROM saveTime) - ($1::text || ' months')::interval)) as sub;
delete from public.live_table
where update < ((SELECT * FROM saveTime) - ($1::text || ' months')::interval);
DROP TABLE saveTime;
$Body$
Language 'sql';
So the above code compiles fine but when I try to run it by invoking it :- DeleteAndInsertTransaction(27) it gives me an
Error: relation "savetime" does not exist and I have no clue what is going on here.
If I take out the SELECT now() into saveTime; out of the function bloc and declare it before invoking the function then it runs fine but I need to store the current date into a variable and use that as a constant for the insert and delete and this is going against a huge table and there could be significant time difference between the insert and deletes. Any pointers as to what is going on here ?
select .. into .. is the deprecated syntax for create table ... as select ... which creates a new table.
So, SELECT now() into saveTime; actually creates a new table (named savetime), and is equivalent to: create table savetime as select now(); - it's not storing something in a variable.
To store a value in a variable, you need to first declare the variable, then you can assign the value. But you can only do that in PL/pgSQL, not SQL
CREATE FUNCTION DeleteAndInsertTransaction(p_num_months integer)
returns void
as
$Body$
declare
l_now timestamp;
begin
l_now := now();
...
end;
$body$
language plpgsql;
To insert into an existing table you need
insert into public.hist_table
select *
from public.live_table.
To select the rows from the last x month, there is no need to store the current date and time in a variable to begin with. It's also easier to use make_interval() to generate an interval based on a specified unit.
You can simply use
select *
from live_table
where updated_at <= current_date - make_interval(mons => p_pum_months);
And as you don't need a variable, you can actually do all that with a language sql function.
So the function would look something like this:
CREATE FUNCTION DeleteAndInsertTransaction(p_num_months integer)
RETURNS Void
AS
$Body$
insert into public.hist_table
select *
from live_table
where updated_at < current_date - make_interval(months => p_pum_months);
delete from public.live_table
where updated_at < current_date - make_interval(months => p_pum_months);
$Body$
Language sql;
Note that the language name is an identifier and should not be quoted.
You can actually do the DELETE and INSERT in a single statement:
with deleted as (
delete from public.live_table
where updated_at <= current_date - make_interval(months => p_pum_months)
returning *
)
insert into hist_table
select *
from deleted;

Optimising function which extracts records with a minimum gap in timestamps

I have a big table of timestamps in Postgres 9.4.5:
CREATE TABLE vessel_position (
posid serial NOT NULL,
mmsi integer NOT NULL,
"timestamp" timestamp with time zone,
the_geom geometry(PointZ,4326),
CONSTRAINT "PK_posid_mmsi" PRIMARY KEY (posid, mmsi)
);
Additional index:
CREATE INDEX vessel_position_timestamp_idx ON vessel_position ("timestamp");
I want to extract every row where the timestamp is at least x minutes after the previous row. I've tried a few different SELECT statements using LAG() which all kind of worked, but didn't give me the exact result I require. The below functions gives me what I need, but I feel it could be quicker:
CREATE OR REPLACE FUNCTION _getVesslTrackWithInterval(mmsi integer, startTime character varying (25) ,endTime character varying (25), interval_min integer)
RETURNS SETOF vessel_position AS
$func$
DECLARE
count integer DEFAULT 0;
posids varchar DEFAULT '';
tbl CURSOR FOR
SELECT
posID
,EXTRACT(EPOCH FROM (timestamp - lag(timestamp) OVER (ORDER BY posid asc)))::int as diff
FROM vessel_position vp WHERE vp.mmsi = $1 AND vp.timestamp BETWEEN $2::timestamp AND $3::timestamp;
BEGIN
FOR row IN tbl
LOOP
count := coalesce(row.diff,0) + count;
IF count >= $4*60 OR count = 0 THEN
posids:= posids || row.posid || ',';
count:= 0;
END IF;
END LOOP;
RETURN QUERY EXECUTE 'SELECT * from vessel_position where posid in (' || TRIM(TRAILING ',' FROM posids) || ')';
END
$func$ LANGUAGE plpgsql;
I can't help thinking getting all the posids as a string and then selecting them all again at the very end is slowing things down.
Within the IF statement, I already have access to each row I want to keep, so could potentially store them in a temp table and then return temp table at the end of the loop.
Can this function be optimised - to improve performance in particular?
Query
Your function has all kinds of expensive, unnecessary overhead. A single query should be many times faster, doing the same:
CREATE OR REPLACE FUNCTION _get_vessel_track_with_interval
(mmsi int, starttime timestamptz, endtime timestamptz, min_interval interval)
RETURNS SETOF vessel_position AS
$func$
BEGIN
SELECT (vp).* -- parentheses required for decomposing row type
FROM (
SELECT vp -- whole row (!)
, timestamp - lag(timestamp) OVER (ORDER BY posid) AS diff
FROM vessel_position vp
WHERE vp.mmsi = $1
AND vp.timestamp >= $2 -- typically you'd include the lower bound
AND vp.timestamp < $3; -- ... and exlude the upper
ORDER BY posid
) sub
WHERE diff >= $4;
END
$func$ LANGUAGE plpgsql STABLE;
Could also just be an SQL function or the bare SELECT without any wrapper (Maybe as prepared statement? Example.)
Note how starttime and endtime are passed as timestamp. (Makes no sense to pass as text and cast.) And the minimum interval min_interval is an actual interval. Pass any interval of your choosing.
Index
If the predicate on mmsi is in any way selective, the two indexes you currently have (PK ON (posid, mmsi) and idx on (timestamp)) are not very useful. If you reverse the column order of your PK to (mmsi, posid), it becomes far more useful for the query at hand. See:
Is a composite index also good for queries on the first field?
The optimal index for this would typically be on vessel_position(mmsi, timestamp). Related:
Multicolumn index and performance
PostgreSQL performance with (col = value or col is NULL)
Query does not hit the index - are these the proper columns to index?
Aside: Avoid keywords as identifiers. That's asking for trouble. Plus, a column timestamp that actually holds timestamptz is misleading.

How to create sequence which start from 1 in each day

Sequence should return values 1,2,3 etc starting for 1 for every day.
current_date should used for day determination.
For example, calling today first time it shoudl return 1, in second time 2 etc.
Tomorrow, first call shoud return again 1, second call 2 etc.
Postgres 9.1 is used.
Use a table to keep the sequence:
create table daily_sequence (
day date, s integer, primary key (day, s)
);
This function will retrieve the next value:
create or replace function daily_sequence()
returns int as $$
insert into daily_sequence (day, s)
select current_date, coalesce(max(s), 0) + 1
from daily_sequence
where day = current_date
returning s
;
$$ language sql;
select daily_sequence();
Be prepared to retry in case of an improbable duplicate key value error. If previous days' sequences are not necessary delete them to keep the table and the index as light as possible:
create or replace function daily_sequence()
returns int as $$
with d as (
delete from daily_sequence
where day < current_date
)
insert into daily_sequence (day, s)
select current_date, coalesce(max(s), 0) + 1
from daily_sequence
where day = current_date
returning s
;
$$ language sql;
You just need to think of cronjob as running a shell command at a specified time or day.
Shell Command for running cron job
psql --host host.domain.com --port 32098 --db_name databaseName < my.sql
You can then just add this to your crontab (I recommend you use crontab -e to avoid breaking things)
# It will run your command at 00:00 every day
# min hour wday month mday command-to-run
0 0 * * * psql --host host.domain.com --port 32098 --db_name databaseName < my.sql
It is quite interesting task.
Lets try to use additional sequence for the date and alternative function to get next value:
-- We will use anonymous block here because it is impossible to use
-- variables and functions in DDL directly
do language plpgsql $$
begin
execute 'create sequence my_seq_day start with ' || (current_date - '1900-01-01')::varchar;
end; $$;
-- Initialize sequence
select nextval('my_seq_day');
create sequence my_seq;
create or replace function nextval_daily(in p_seq varchar) returns bigint as $$
declare
dd bigint;
lv bigint;
begin
select current_date - '1900-01-01'::date into dd;
-- Here we should to retrieve current value from sequence
-- properties instead of currval function to make it session-independent
execute 'select last_value from '||p_seq||'_day' into lv;
if dd - lv > 0 then
-- If next day has come
-- Reset main sequens
execute 'alter sequence '||p_seq||' restart';
-- And set the day sequence to the current day
execute 'alter sequence '||p_seq||'_day restart with '||dd::varchar;
execute 'select nextval('''||p_seq||'_day'')' into lv;
end if;
return nextval(p_seq);
end; $$ language plpgsql;
Then use function nextval_daily instead of nextval.
Hope it was helpful.
I have came across with almost similar requirement.
Handled the logic from query rather than modifying the sequence.
used setval() to reset the sequence to 0 if its the first entry to the table for the day.
Else nextval() of the sequence.
Below is the sample query :
SELECT
CASE WHEN NOT EXISTS (
SELECT primary_key FROM schema.table WHERE date(updated_datetime) = #{systemDate} limit 1)
THEN
setval('scheam.job_seq', 1)
ELSE
nextval('scheam.job_seq')
END
UPDATE privilege is required for the user to execute setval.
GRANT UPDATE ON ALL SEQUENCES IN SCHEMA ur_schema TO user;