Add partition in existing table Greenplum

I am trying to add monthly partitions to a table for a year or so, but I cannot add them in a single query. When I created the table, I defined a partition for each month over a couple of years:
CREATE TABLE Calls (
callid varchar(200) NOT NULL,
calltime timestamp NOT NULL,
Duration varchar(50) NULL
)
DISTRIBUTED BY (callid)
PARTITION BY RANGE(calltime)
(
START ('2019-04-01 00:00:00'::timestamp without time zone) END ('2022-01-01 00:00:00'::timestamp without time zone) EVERY ('1 mon'::interval)
);
I read various articles and blogs but could not find any way to add monthly partitions for a year or so in one statement. The only way I found is to add them manually, one month at a time:
alter table Calls
Add partition
start (date '2022-01-01') inclusive
end (date '2022-02-01') exclusive;
-- And again for the next month
alter table Calls
Add partition
start (date '2022-02-01') inclusive
end (date '2022-03-01') exclusive;
I have around 50 to 60 tables, and doing this manually for each table will take a lot of time and effort. I am trying to find a generic way to add partitions. Any solution?

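The quickest way I have found is a dirty Perl script that loops over the ranges and pre-writes all the SQL, something like this:
use strict; use warnings;
my ($year, $month) = (2022, 1); # first partition to add (example value)
for (1 .. 12) { # number of monthly partitions (example value)
my $start = sprintf '%04d-%02d-01', $year, $month;
# step to the next month, rolling over into the next year after December
($year, $month) = $month == 12 ? ($year + 1, 1) : ($year, $month + 1);
my $end = sprintf '%04d-%02d-01', $year, $month;
print "alter table Calls Add partition start (date '$start') inclusive end (date '$end') exclusive;\n";
}
Hope that helps.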

There is no direct way to add multiple partitions without specifying the range for each one. Using EVERY in ADD PARTITION simply throws an error like the one below.
gpadmin=# alter table calls add partition START ('2022-01-01 00:00:00'::timestamp without time zone) END ('2023-01-01 00:00:00'::timestamp without time zone) EVERY ('1 mon'::interval);
ERROR: cannot specify EVERY when adding RANGE partition to relation "calls"
gpadmin=#
You can create a new table with a different partition range and then replace the Calls table with it, but that is a resource-consuming process and requires manual intervention.
Thanks,
Anil

I managed to create a dynamic function that takes datestart text, dateend text, table_schema text and table_name text as parameters and loops over the date range, adding one partition per month.
CREATE OR REPLACE FUNCTION partitionfunction(datestart text,dateend text,table_schema text,table_name text)
RETURNS void
LANGUAGE plpgsql
VOLATILE
AS $$
declare
var_Table_schema text;
var_Table_name text;
var_dateStart date;
var_dateEnd date;
var_endPartition date;
BEGIN
var_Table_schema := Table_schema;
var_Table_name := Table_name;
var_dateStart := to_date(dateStart,'YYYY-MM-DD');
var_dateEnd := to_date(dateEnd,'YYYY-MM-DD');
WHILE var_dateStart < var_dateEnd
loop
var_endPartition = var_dateStart + interval '1 MONTH'; --You can also change it for weeks or days
execute ' alter table '|| Table_schema ||'.'|| Table_name || '
add partition start('''||var_dateStart||''') end(
'''||var_endPartition||''')';
var_dateStart = var_dateStart + interval '1 MONTH'; --You can also change it for weeks or days
END loop;
end;
$$
EXECUTE ON ANY;
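For the 50 to 60 tables mentioned in the question, the same loop can also be run outside the database to pre-generate the statements, in the spirit of the Perl answer. The following is a minimal Python sketch under stated assumptions: the helper names next_month and add_partition_statements and the table names are hypothetical, and start is assumed to be the first day of a month. It only prints SQL; running the statements is left to psql or a similar client.

```python
from datetime import date

def next_month(d: date) -> date:
    """Return the first day of the month after d (rolls over after December)."""
    return date(d.year + d.month // 12, d.month % 12 + 1, 1)

def add_partition_statements(tables, start: date, end: date):
    """Build one ALTER TABLE ... ADD PARTITION statement per table and month.

    Assumes start is the first day of a month; end is exclusive.
    """
    statements = []
    for table in tables:
        first = start
        while first < end:
            upper = next_month(first)
            statements.append(
                f"ALTER TABLE {table} ADD PARTITION "
                f"START (date '{first}') INCLUSIVE END (date '{upper}') EXCLUSIVE;"
            )
            first = upper
    return statements

# Hypothetical table names, used only for illustration.
for stmt in add_partition_statements(
    ["public.calls", "public.messages"], date(2022, 1, 1), date(2023, 1, 1)
):
    print(stmt)
```

Generating the statements as text keeps the step reviewable: the output can be inspected or saved to a file before anything is executed against the cluster.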

Related

How to use a counter in a Postgresql function loop when generating a timestamp type for each iteration

CREATE OR REPLACE FUNCTION inserts_table_foo(startId INT,
endId INT,
stepTimeZone INT default 0,
period INT default 0) RETURNS void AS
$$
DECLARE
nameVar TEXT;
dateVar TIMESTAMP;
begin
for idVar in startId..endId
loop
nameVar := md5(random()::text);
dateVar :=
(now()::timestamp with time zone +
make_interval(hours := stepTimeZone) +
make_interval(days := period)
);
period := period + 1;
insert into foo(id, name, data) values (idVar, nameVar, dateVar);
end loop;
end ;
$$ LANGUAGE plpgsql;
Unfortunately, the variable period does not work as intended. I assumed that the counter would increase by 1 on each iteration and that the generated date would be different each time, but in fact the date turns out to be the same for the entire duration of the loop.
How can I make the date be generated anew for each iteration? This won't do here:
generate_series ( TIMESTAMP WITHOUT TIME ZONE '2022-01-01', CURRENT_DATE, '1 day' )
because I need to insert 100,000,000 records into a table where date fields appear in several places (dateStart, dateEnd, startDateReg, etc.) and are not in order. Therefore, I want a function that accepts a number of variables with which I can set the periods of the generated values; in this example, that is the date and time.
I would like to see clearly which variable each value goes into, and also see the code that inserts the value into the table (that is, using an INSERT INTO statement).
What is the problem here and how can it be solved?

SQLite adding minute field to date field

Let's say I have a simple database like this one:
CREATE TABLE test (start TEXT, end TEXT, duration INTEGER);
The start and end columns are dates and duration is in minutes. I would like to check whether start + duration < end.
I know I can use the date function datetime(start, '+14 minutes'), but how would I do that with another column's value?

If you want to create a constraint so that start + duration is always less than end in your table then this is the syntax:
CREATE TABLE test (
start TEXT,
end TEXT,
duration INTEGER,
CHECK (DATETIME(start, duration || ' minute') < end)
);
There is no need to concatenate '+'.
This works only if the date columns are stored in a format SQLite's date functions recognize. Because the comparison is done on text, end should use YYYY-MM-DD hh:mm:ss, which is the format DATETIME() returns.
Note that if the table already exists you can't add the constraint.
You will have to create a new table with the above schema (including the constraint), copy the data of the old table to the new table (if they meet the condition of the constraint), delete the old table and rename the new table.
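The constraint can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, not part of the original answer: it uses an in-memory database, the column names dt_start and dt_end (to sidestep keyword clashes), the explicit '+' form of the modifier, and a hypothetical helper try_insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE test (
        dt_start TEXT,
        dt_end   TEXT,
        duration INTEGER,
        -- start + duration minutes must be strictly before end
        CHECK (datetime(dt_start, '+' || duration || ' minutes') < dt_end)
    )
    """
)

def try_insert(start: str, end: str, minutes: int) -> bool:
    """Return True if the row satisfies the CHECK constraint, False otherwise."""
    try:
        conn.execute("INSERT INTO test VALUES (?, ?, ?)", (start, end, minutes))
        return True
    except sqlite3.IntegrityError:
        # SQLite reports a failed CHECK constraint as an integrity error.
        return False

accepted = try_insert("2022-01-01 10:00:00", "2022-01-01 11:00:00", 14)
rejected = try_insert("2022-01-01 10:00:00", "2022-01-01 10:10:00", 14)
print(accepted, rejected)
```

The second insert fails because 10:00 plus 14 minutes is 10:14, which is not before 10:10.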

You can construct a string:
datetime(start, '+' || duration || ' minute')

Is this what you want?
datetime(dt_start, '+' || duration || ' minute') < dt_end
Note that start and end are language keywords in most databases, hence not good choices for column names. I renamed those columns dt_start and dt_end in the above code.

How can I add a date with integer

In PostgreSQL we have two tables:
The first table, "users", holds member information such as user_id, acc_type_id (an integer identifying the account type, which determines how many months the account is valid) and expiry_date (a timestamp showing when the account expires).
The second table, "account_type", contains account_type_id and validity, an integer giving the number of months.
When a row is written to the users table, I want a trigger to fire on insert and update:
When the user chooses an acc_type_id, the expiry_date should be calculated automatically with the formula below.
For example, if today is 2020-01-06 and they choose the gold account, which is valid for 9 months (defined as an integer), their expiry_date should be 2020-10-06. What is the best way to write that code concisely?
declare val integer;
declare user_id_users integer:=0;
declare s integer;
BEGIN
user_id_users:=(select user_id from users where user_id=new.user_id ) ;
if ( user_id_users <> 0) then
s:=(select acc_type_id from users where user_id=new.user_id);
val:=(select validity from account_type where account_type_id=s);
update users set expiry_date= (select current_date +interval '1 month' * val where user_id=user_id_users);
end if;
return new;
END;
I have written the code for insert, but when I insert a new user it seems to go into a loop and repeatedly shows many similar errors like the one below:
SQL statement "update users set expiry_date= (select current_date +interval '1 month' * val where user_id=user_id_users)" PL/pgSQL function acc_type_expiary() line 17 at SQL statement

You are overcomplicating things. You don't need multiple selects; a single select is enough to get the value from the account_type table. If the trigger is on the users table, there is no need to run an UPDATE.
Select the value and assign it to the new record:
DECLARE
l_months integer;
BEGIN
select at.validity
into l_months
from account_type at
where at.account_type_id = new.acc_type_id;
-- no UPDATE, just assign the value.
new.expiry_date := current_date + (interval '1 month' * l_months);
return new;
END;
Instead of (interval '1 month' * l_months) you can also use make_interval(months => l_months).
This requires the trigger to be defined as BEFORE INSERT OR UPDATE ... FOR EACH ROW, because assigning to new only has an effect in a row-level BEFORE trigger.
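To make the date arithmetic concrete, the sketch below mirrors what current_date + interval '1 month' * n computes, using only the Python standard library. The helper add_months is hypothetical. It clamps the day to the end of the target month, which is also how PostgreSQL handles a date such as January 31 plus one month.

```python
import calendar
from datetime import date

def add_months(d: date, months: int) -> date:
    """Add a number of calendar months to d, clamping to the month's last day."""
    # Count months from year 0 so that year rollover is plain integer division.
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))

# The example from the question: a 9-month account chosen on 2020-01-06.
print(add_months(date(2020, 1, 6), 9))
```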

Looks like your update should read:
update users set expiry_date = (current_date + interval '1 month' * val) where user_id = user_id_users;

Optimising function which extracts records with a minimum gap in timestamps

I have a big table of timestamps in Postgres 9.4.5:
CREATE TABLE vessel_position (
posid serial NOT NULL,
mmsi integer NOT NULL,
"timestamp" timestamp with time zone,
the_geom geometry(PointZ,4326),
CONSTRAINT "PK_posid_mmsi" PRIMARY KEY (posid, mmsi)
);
Additional index:
CREATE INDEX vessel_position_timestamp_idx ON vessel_position ("timestamp");
I want to extract every row where the timestamp is at least x minutes after the previous row. I've tried a few different SELECT statements using LAG() which all kind of worked, but didn't give me the exact result I require. The function below gives me what I need, but I feel it could be quicker:
CREATE OR REPLACE FUNCTION _getVesslTrackWithInterval(mmsi integer, startTime character varying (25) ,endTime character varying (25), interval_min integer)
RETURNS SETOF vessel_position AS
$func$
DECLARE
count integer DEFAULT 0;
posids varchar DEFAULT '';
tbl CURSOR FOR
SELECT
posID
,EXTRACT(EPOCH FROM (timestamp - lag(timestamp) OVER (ORDER BY posid asc)))::int as diff
FROM vessel_position vp WHERE vp.mmsi = $1 AND vp.timestamp BETWEEN $2::timestamp AND $3::timestamp;
BEGIN
FOR row IN tbl
LOOP
count := coalesce(row.diff,0) + count;
IF count >= $4*60 OR count = 0 THEN
posids:= posids || row.posid || ',';
count:= 0;
END IF;
END LOOP;
RETURN QUERY EXECUTE 'SELECT * from vessel_position where posid in (' || TRIM(TRAILING ',' FROM posids) || ')';
END
$func$ LANGUAGE plpgsql;
I can't help thinking getting all the posids as a string and then selecting them all again at the very end is slowing things down.
Within the IF statement, I already have access to each row I want to keep, so could potentially store them in a temp table and then return temp table at the end of the loop.
Can this function be optimised - to improve performance in particular?

Query
Your function has all kinds of expensive, unnecessary overhead. A single query should be many times faster, doing the same:
CREATE OR REPLACE FUNCTION _get_vessel_track_with_interval
(mmsi int, starttime timestamptz, endtime timestamptz, min_interval interval)
RETURNS SETOF vessel_position AS
$func$
BEGIN
RETURN QUERY -- a plpgsql function needs RETURN QUERY to return the rows
SELECT (vp).* -- parentheses required for decomposing row type
FROM (
SELECT vp -- whole row (!)
, timestamp - lag(timestamp) OVER (ORDER BY posid) AS diff
FROM vessel_position vp
WHERE vp.mmsi = $1
AND vp.timestamp >= $2 -- typically you'd include the lower bound
AND vp.timestamp < $3 -- ... and exclude the upper
ORDER BY posid
) sub
WHERE diff >= $4;
END
$func$ LANGUAGE plpgsql STABLE;
This could also just be an SQL function or the bare SELECT without any wrapper (maybe as a prepared statement).
Note how starttime and endtime are passed as timestamptz. (It makes no sense to pass them as text and cast.) And the minimum interval min_interval is an actual interval. Pass any interval of your choosing.
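The filter that the lag()-based query applies can be illustrated in plain Python. This is only a sketch of the comparison logic on an in-memory list, with a hypothetical helper name rows_after_gap; it is not a substitute for running the query in the database. Note that the first row has no predecessor, so its diff is NULL in SQL and it is not returned; the sketch behaves the same way.

```python
from datetime import datetime, timedelta

def rows_after_gap(timestamps, min_interval):
    """Keep each timestamp whose gap to the immediately preceding one is >= min_interval.

    timestamps is assumed to be sorted in the same order the query uses.
    """
    kept = []
    for previous, current in zip(timestamps, timestamps[1:]):
        if current - previous >= min_interval:
            kept.append(current)
    return kept

base = datetime(2022, 1, 1, 12, 0)
positions = [
    base,
    base + timedelta(minutes=2),
    base + timedelta(minutes=12),  # 10 minutes after the previous row
    base + timedelta(minutes=13),
]
print(rows_after_gap(positions, timedelta(minutes=10)))
```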
Index
If the predicate on mmsi is in any way selective, the two indexes you currently have (PK ON (posid, mmsi) and idx on (timestamp)) are not very useful. If you reverse the column order of your PK to (mmsi, posid), it becomes far more useful for the query at hand. See:
Is a composite index also good for queries on the first field?
The optimal index for this would typically be on vessel_position(mmsi, timestamp). Related:
Multicolumn index and performance
PostgreSQL performance with (col = value or col is NULL)
Query does not hit the index - are these the proper columns to index?
Aside: Avoid keywords as identifiers. That's asking for trouble. Plus, a column timestamp that actually holds timestamptz is misleading.

How to create sequence which start from 1 in each day

The sequence should return the values 1, 2, 3, etc., starting from 1 every day.
current_date should be used to determine the day.
For example, the first call today should return 1, the second call 2, and so on.
Tomorrow, the first call should return 1 again, the second call 2, and so on.
Postgres 9.1 is used.

Use a table to keep the sequence:
create table daily_sequence (
day date, s integer, primary key (day, s)
);
This function will retrieve the next value:
create or replace function daily_sequence()
returns int as $$
insert into daily_sequence (day, s)
select current_date, coalesce(max(s), 0) + 1
from daily_sequence
where day = current_date
returning s
;
$$ language sql;
select daily_sequence();
Be prepared to retry in case of an improbable duplicate key value error. If previous days' sequences are not necessary delete them to keep the table and the index as light as possible:
create or replace function daily_sequence()
returns int as $$
with d as (
delete from daily_sequence
where day < current_date
)
insert into daily_sequence (day, s)
select current_date, coalesce(max(s), 0) + 1
from daily_sequence
where day = current_date
returning s
;
$$ language sql;

You just need to think of a cron job as running a shell command at a specified time or day.
Shell command for the cron job:
psql --host host.domain.com --port 32098 --dbname databaseName < my.sql
You can then just add this to your crontab (I recommend you use crontab -e to avoid breaking things):
# It will run your command at 00:00 every day
# min hour mday month wday command-to-run
0 0 * * * psql --host host.domain.com --port 32098 --dbname databaseName < my.sql

It is quite an interesting task.
Let's try to use an additional sequence for the date and an alternative function to get the next value:
-- We will use anonymous block here because it is impossible to use
-- variables and functions in DDL directly
do language plpgsql $$
begin
execute 'create sequence my_seq_day start with ' || (current_date - '1900-01-01')::varchar;
end; $$;
-- Initialize sequence
select nextval('my_seq_day');
create sequence my_seq;
create or replace function nextval_daily(in p_seq varchar) returns bigint as $$
declare
dd bigint;
lv bigint;
begin
select current_date - '1900-01-01'::date into dd;
-- Here we retrieve the current value from the sequence's
-- properties instead of the currval function to make it session-independent
execute 'select last_value from '||p_seq||'_day' into lv;
if dd - lv > 0 then
-- If the next day has come
-- Reset the main sequence
execute 'alter sequence '||p_seq||' restart';
-- And set the day sequence to the current day
execute 'alter sequence '||p_seq||'_day restart with '||dd::varchar;
execute 'select nextval('''||p_seq||'_day'')' into lv;
end if;
return nextval(p_seq);
end; $$ language plpgsql;
Then use function nextval_daily instead of nextval.
Hope it was helpful.
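The reset-on-new-day logic used by nextval_daily can be sketched in a few lines of Python. This is an in-memory, single-process illustration only: the class name DailySequence is hypothetical, and it gives none of the cross-session guarantees a database sequence or table provides. The today parameter is there so the day change can be simulated.

```python
from datetime import date

class DailySequence:
    """Counter that restarts at 1 whenever the day changes."""

    def __init__(self):
        self._day = None   # day of the most recent call
        self._value = 0    # last value handed out on that day

    def next(self, today=None):
        if today is None:
            today = date.today()
        if today != self._day:
            # A new day has started: remember it and restart the counter.
            self._day = today
            self._value = 0
        self._value += 1
        return self._value

seq = DailySequence()
first_day = [seq.next(date(2022, 1, 1)) for _ in range(3)]
second_day = seq.next(date(2022, 1, 2))
print(first_day, second_day)
```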

I came across an almost identical requirement.
I handled the logic in the query rather than modifying the sequence:
use setval() to reset the sequence to 1 if it is the first entry in the table for the day,
otherwise use nextval() of the sequence.
Below is a sample query:
SELECT
CASE WHEN NOT EXISTS (
SELECT primary_key FROM schema.table WHERE date(updated_datetime) = #{systemDate} limit 1)
THEN
setval('schema.job_seq', 1)
ELSE
nextval('schema.job_seq')
END
UPDATE privilege is required for the user to execute setval.
GRANT UPDATE ON ALL SEQUENCES IN SCHEMA ur_schema TO user;
