What would be the right steps for horizontal partitioning in PostgreSQL?

We have an e-commerce portal with a PostgreSQL 9.1 database. One very important table currently has 32 million records. If we want to deliver all items, this table would grow to 320 million records, mostly dates, which would be too heavy.
So we are thinking about horizontal partitioning / sharding. We can divide the items in this table into 12 pieces horizontally (1 per month). What would be the best steps and techniques to do so? Would horizontal partitioning within the database be good enough, or do we have to start thinking about sharding?

While 320 million is not small, it's not really huge either.
It largely depends on the queries you run on the table. If you always include the partition key in your queries, then "regular" partitioning would probably work.
An example for this can be found in the PostgreSQL wiki:
http://wiki.postgresql.org/wiki/Month_based_partitioning
The manual also explains some of the caveats of partitioning:
http://www.postgresql.org/docs/current/interactive/ddl-partitioning.html
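A minimal sketch of the inheritance-based approach those pages describe (PostgreSQL 9.1 predates declarative partitioning; the table and column names here are illustrative):
-- Parent table holds no rows itself; one child table per month.
CREATE TABLE items (id bigint, created_at date, payload text);
CREATE TABLE items_2012_01 (
    CHECK (created_at >= DATE '2012-01-01' AND created_at < DATE '2012-02-01')
) INHERITS (items);
CREATE TABLE items_2012_02 (
    CHECK (created_at >= DATE '2012-02-01' AND created_at < DATE '2012-03-01')
) INHERITS (items);
-- ...and so on. With constraint_exclusion = partition (the default),
-- queries that filter on created_at skip the non-matching children.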
If you are thinking about sharding, you might read how Instagram (which is powered by PostgreSQL) has implemented that:
http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram
If you have mostly read queries, another option might be to use streaming replication to set up multiple servers, distributing the read queries by connecting to the hot standby for read access and to the master for write access. I think pgpool-II can do that (somewhat) automatically. That can be combined with partitioning to further reduce query runtime.
If you are adventurous and don't have very immediate needs, you might also consider Postgres-XC, which promises to support transparent horizontal scaling:
http://postgres-xc.sourceforge.net/
There is no final release yet, but it looks like it isn't too far off.

Here is my sample code for partitioning:
t_master is a view that your application selects from, inserts into, updates, and deletes from.
t_1 and t_2 are the underlying tables that actually store the data.
CREATE TABLE t_1
(
    id bigint PRIMARY KEY,
    col1 text
);
CREATE TABLE t_2
(
    id bigint PRIMARY KEY,
    col1 text
);
-- The tables must exist before the view that unions them can be created.
CREATE OR REPLACE VIEW t_master(id, col1)
AS
    SELECT id, col1 FROM t_1
    UNION ALL
    SELECT id, col1 FROM t_2;
CREATE OR REPLACE FUNCTION t_insert_partition_function()
RETURNS TRIGGER AS $$
BEGIN
    RAISE NOTICE '%', 'hello';  -- debug output
    -- Route the row to t_1 or t_2 based on the parity of its id.
    EXECUTE 'insert into t_'
        || ( mod(NEW.id, 2) + 1 )
        || ' values ( $1, $2 )'
    USING NEW.id, NEW.col1;
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION t_update_partition_function()
RETURNS TRIGGER AS $$
BEGIN
    RAISE NOTICE '%', 'hello';  -- debug output
    -- Note: this assumes the UPDATE does not change id itself
    -- (the row would otherwise need to move between partitions).
    EXECUTE 'update t_'
        || ( mod(NEW.id, 2) + 1 )
        || ' set id = $1, col1 = $2 where id = $1'
    USING NEW.id, NEW.col1;
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION t_delete_partition_function()
RETURNS TRIGGER AS $$
BEGIN
    RAISE NOTICE '%', 'hello';  -- debug output
    -- Delete from the partition that holds OLD.id.
    EXECUTE 'delete from t_'
        || ( mod(OLD.id, 2) + 1 )
        || ' where id = $1'
    USING OLD.id;
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;
CREATE TRIGGER t_insert_partition_trigger INSTEAD OF INSERT
    ON t_master FOR EACH ROW
    EXECUTE PROCEDURE t_insert_partition_function();
CREATE TRIGGER t_update_partition_trigger INSTEAD OF UPDATE
    ON t_master FOR EACH ROW
    EXECUTE PROCEDURE t_update_partition_function();
CREATE TRIGGER t_delete_partition_trigger INSTEAD OF DELETE
    ON t_master FOR EACH ROW
    EXECUTE PROCEDURE t_delete_partition_function();

If you don't mind upgrading to PostgreSQL 9.4, then you could use the pg_shard extension, which lets you transparently shard a PostgreSQL table across many machines. Every shard is stored as a regular PostgreSQL table on another PostgreSQL server and replicated to other servers. It uses hash-partitioning to decide which shard(s) to use for a given query. pg_shard would work well if your queries have a natural partition dimension (e.g., customer ID).
More info: https://github.com/citusdata/pg_shard
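From memory of the pg_shard README, getting started looks roughly like this; treat the function names and arguments as assumptions and verify them against the repository above:
CREATE EXTENSION pg_shard;
-- Distribute an existing table on its partition column, then create shards:
SELECT master_create_distributed_table('customer_reviews', 'customer_id');
SELECT master_create_worker_shards('customer_reviews', 16, 2);  -- 16 shards, replication factor 2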

Related

Partitioned tables cannot have ROW triggers: adding a FOR EACH ROW trigger that calls a function which creates a partition before inserting (Postgres)

I am new to databases, so I am seeking some advice or help.
I have a table that is partitioned by LIST (host), as shown below.
CREATE TABLE public.services
(
    id integer,
    service_name character varying(128),
    host character varying(128)
) PARTITION BY LIST (host);
To insert into the table, I have created a function that checks whether the partition is present and, if not, creates it before inserting, and I am trying to attach the function to a trigger.
CREATE OR REPLACE FUNCTION service_function()
RETURNS TRIGGER AS $$
DECLARE
    partition_name TEXT;
BEGIN
    partition_name := 'services_' || NEW.host;
    IF NOT EXISTS
        (SELECT 1
         FROM information_schema.tables
         WHERE table_name = partition_name)
    THEN
        RAISE NOTICE 'A partition has been created %', partition_name;
        EXECUTE format('CREATE TABLE %I PARTITION OF services FOR VALUES IN (%L)',
                       partition_name, NEW.host);
    END IF;
    EXECUTE format('INSERT INTO %I (id, service_name, host) VALUES ($1, $2, $3)', partition_name)
        USING NEW.id, NEW.service_name, NEW.host;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;
Now when I am trying to add the trigger:
CREATE TRIGGER insert_service_trigger
BEFORE INSERT ON services
FOR EACH ROW EXECUTE PROCEDURE service_function();
the following error is thrown:
ERROR: "services" is a partitioned table
DETAIL: Partitioned tables cannot have ROW triggers.
Any suggestions or solutions?
What I am trying to achieve: the services table will be 100 GB+, and host will always be in the WHERE clause of all the SELECT queries, so I thought of partitioning by host. Is this the right approach?
You must be using PostgreSQL v10, which is the only release that ever sported that error message. Triggers on partitioned tables were introduced in v11.
But you won't be able to achieve what you want in any PostgreSQL version. By the time your trigger has started processing, it is too late to change the table definition. You'd get the following error:
ERROR: cannot CREATE TABLE .. PARTITION OF "services" because it is being used by active queries in this session
There is no way to achieve what you want in PostgreSQL. For a thorough discussion of this and possible workarounds, read my article.
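A common workaround (my sketch, not taken from the linked article) is to create the partitions ahead of time, from application code or a scheduled job, and to add a DEFAULT partition (available from v11) so rows for unanticipated hosts don't fail:
CREATE TABLE services_web1 PARTITION OF services FOR VALUES IN ('web1');
CREATE TABLE services_web2 PARTITION OF services FOR VALUES IN ('web2');
CREATE TABLE services_default PARTITION OF services DEFAULT;  -- catch-all, v11+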

PL/pgSQL function to return the output of various SELECT queries from different database

I have found this very interesting article: Refactor a PL/pgSQL function to return the output of various SELECT queries
by Erwin Brandstetter, which describes how to return all columns of various tables with only one function:
CREATE OR REPLACE FUNCTION data_of(_table_name anyelement, _where_part text)
RETURNS SETOF anyelement AS
$func$
BEGIN
RETURN QUERY EXECUTE
'SELECT * FROM ' || pg_typeof(_table_name)::text || ' WHERE ' || _where_part;
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM data_of(NULL::tablename,'1=1 LIMIT 1');
This works pretty well. I need a very similar solution, but for getting data from a table in a different database via dblink. That means the call NULL::tablename will fail, since the table does not exist in the database where the call is made. I wonder how to make this work. Every attempt to connect to the other database via dblink inside the function has failed to get the result of NULL::tablename. It seems the polymorphic function needs a polymorphic parameter, which implicitly determines the return type of the function.
I would appreciate it very much if anybody could help me.
Thanks a lot
Kind regards
Brian
Edit: it seems this request is more difficult to explain than I thought. Here is a second try, with a test setup:
Database 1
First we create a test table with some data on database 1:
CREATE TABLE db1_test
(
id integer NOT NULL,
txt text
)
WITH (
OIDS=TRUE
);
INSERT INTO db1_test (id, txt) VALUES(1,'one');
INSERT INTO db1_test (id, txt) VALUES(2,'two');
INSERT INTO db1_test (id, txt) VALUES(3,'three');
Now we create the polymorphic function on database 1:
-- create a polymorphic function with a polymorphic parameter "_table_name" on database 1
-- the return type is set implicitly by calling the function "data_of" with the parameter "NULL::[tablename]" and a WHERE part
CREATE OR REPLACE FUNCTION data_of(_table_name anyelement, _where_part text)
RETURNS SETOF anyelement AS
$func$
BEGIN
RETURN QUERY EXECUTE
'SELECT * FROM ' || pg_typeof(_table_name)::text || ' WHERE ' || _where_part;
END
$func$ LANGUAGE plpgsql;
Now we make a test call to check that everything works as expected on database 1:
SELECT * FROM data_of(NULL::db1_test, 'id=2');
It works. Please notice I do NOT specify any columns of the table db1_test. Now we switch over to database 2.
Database 2
Here I need to make exactly the same call to data_of on database 1 as before, although WITHOUT knowing the columns of the selected table at call time. Unfortunately that is not going to work; the only call that works is something like this:
SELECT *
FROM dblink('dbname=[database1] port=[port] user=[user] password=[password]'::text,
            'SELECT * FROM data_of(NULL::db1_test, ''id=2'')'::text)
    AS t1(id integer, txt text);
Conclusion
This call works, but as you can see, I need to specify at least once what all the columns of the table I want to select from look like. I am looking for a way to bypass this and make the call WITHOUT knowing all of the columns of the table on database 1.
Final goal
My final goal is to create a function on database 2 which can be called like
SELECT * FROM data_of_dblink('table_name', 'where_part')
and which internally calls data_of() on database 1, making it possible to select from a table in a different database with a WHERE part as parameter. It should work like a static view, but with the possibility to pass a WHERE part as parameter.
I am extremely open to suggestions.
Thanks a lot
Brian

Check if table is empty in runtime

I am trying to write a script which drops some obsolete tables in a Postgres database. I want to be sure the tables are empty before dropping them. I also want the script to be safe to keep in our migration scripts, where it may be run even after these tables have actually been dropped.
Here is my script:
CREATE OR REPLACE FUNCTION __execute(TEXT) RETURNS VOID AS $$
BEGIN EXECUTE $1; END;
$$ LANGUAGE plpgsql STRICT;
CREATE OR REPLACE FUNCTION __table_exists(TEXT, TEXT) RETURNS bool as $$
SELECT exists(SELECT 1 FROM information_schema.tables WHERE (table_schema, table_name, table_type) = ($1, $2, 'BASE TABLE'));
$$ language sql STRICT;
CREATE OR REPLACE FUNCTION __table_is_empty(TEXT) RETURNS bool as $$
SELECT not exists(SELECT 1 FROM $1 );
$$ language sql STRICT;
-- Start migration here
SELECT __execute($$
DROP TABLE oldtable1;
$$)
WHERE __table_exists('public', 'oldtable1')
AND __table_is_empty('oldtable1');
-- drop auxiliary functions here
And finally I got:
ERROR: syntax error at or near "$1"
LINE 11: SELECT not exists(SELECT 1 FROM $1 );
Is there any other way?
You must use EXECUTE if you want to pass a table name as a parameter in a Postgres function.
Example:
CREATE OR REPLACE FUNCTION __table_is_empty(param character varying)
RETURNS bool
AS $$
DECLARE
    v int;
BEGIN
    EXECUTE 'select 1 WHERE EXISTS( SELECT 1 FROM ' || quote_ident(param) || ' ) '
    INTO v;
    IF v THEN return false; ELSE return true; END IF;
END;
$$ LANGUAGE plpgsql;
Demo: http://sqlfiddle.com/#!12/09cb0/1
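Usage, assuming a table named mytable exists:
SELECT __table_is_empty('mytable');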
No, no, no. For many reasons.
@kordirko already pointed out the immediate cause of the error message: in plain SQL, variables can only be used for values, not for keywords or identifiers. You can fix that with dynamic SQL, but that still doesn't make your code right.
You are applying programming paradigms from other programming languages. With PL/pgSQL, it is extremely inefficient to split your code into multiple separate tiny sub-functions. The overhead is huge in comparison.
Your actual call is also a time bomb. Expressions in the WHERE clause are executed in any order, so this may or may not raise an exception for non-existing table names:
WHERE __table_exists('public', 'oldtable1')
AND __table_is_empty('oldtable1');
... which will roll back your whole transaction.
Finally, you are completely open to race conditions. Like @Frank already commented, a table can be in use by concurrent transactions, in which case open locks may stall your attempt to drop the table. It could also lead to deadlocks (which the system resolves by rolling back all but one of the competing transactions). Take out an exclusive lock yourself before you check whether the table is (still) empty.
Proper function
This is safe for concurrent use. It takes an array of table names (and optionally a schema name) and only drops existing, empty tables that are not locked in any way:
CREATE OR REPLACE FUNCTION f_drop_tables(_tbls text[] = '{}'
                                       , _schema text = 'public'
                                       , OUT drop_ct int) AS
$func$
DECLARE
   _tbl   text;  -- loop var
   _empty bool;  -- for empty check
BEGIN
   drop_ct := 0;  -- init!
   FOR _tbl IN
      SELECT quote_ident(table_schema) || '.'
          || quote_ident(table_name)   -- qualified & escaped table name
      FROM   information_schema.tables
      WHERE  table_schema = _schema
      AND    table_type = 'BASE TABLE'
      AND    table_name = ANY(_tbls)
   LOOP
      EXECUTE 'SELECT NOT EXISTS (SELECT 1 FROM ' || _tbl || ')'
      INTO _empty;  -- check first, only lock if empty
      IF _empty THEN
         EXECUTE 'LOCK TABLE ' || _tbl;  -- now table is ripe for the plucking
         EXECUTE 'SELECT NOT EXISTS (SELECT 1 FROM ' || _tbl || ')'
         INTO _empty;  -- recheck after lock
         IF _empty THEN
            EXECUTE 'DROP TABLE ' || _tbl;  -- go in for the kill
            drop_ct := drop_ct + 1;         -- count tables actually dropped
         END IF;
      END IF;
   END LOOP;
END
$func$ LANGUAGE plpgsql STRICT;
Call:
SELECT f_drop_tables('{foo1,foo2,foo3,foo4}');
To call with a different schema than the default 'public':
SELECT f_drop_tables('{foo1,foo2,foo3,foo4}', 'my_schema');
Major points
Reports the number of tables actually dropped. (Adapt to report info of your choice.)
Using the information schema like in your original. Seems the right choice here, but be aware of subtle limitations:
How to check if a table exists in a given schema
For use under heavy concurrent load (with long transactions), consider the NOWAIT option for the LOCK command and possibly catch exceptions from it.
Per documentation on "Table-level Locks":
ACCESS EXCLUSIVE
Conflicts with locks of all modes (ACCESS SHARE, ROW SHARE, ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE). This mode guarantees that the holder is the only transaction accessing the table in any way.
Acquired by the ALTER TABLE, DROP TABLE, TRUNCATE, REINDEX, CLUSTER, and VACUUM FULL commands. This is also the default lock mode for LOCK TABLE statements that do not specify a mode explicitly.
Bold emphasis mine.
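For the NOWAIT variant mentioned above, a sketch of how it could look around the LOCK inside the loop (lock_not_available is the condition name for SQLSTATE 55P03):
BEGIN
   EXECUTE 'LOCK TABLE ' || _tbl || ' IN ACCESS EXCLUSIVE MODE NOWAIT';
EXCEPTION WHEN lock_not_available THEN
   CONTINUE;  -- table is busy; skip it instead of waiting
END;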
I work on DB2 so I can't say for sure if this will work on a Postgres database, but try this:
select case when count(*) > 0 then True else False end from $1
...in place of:
SELECT not exists(SELECT 1 FROM $1 )
If Postgres does not have a CASE / END expression capability, I'd be shocked if it didn't have some kind of similar IF/THEN/ELSE like expression ability to use as a substitute.

History table referencing other values in the table / accessing package table variables

I have a system for tracking usage of computers in a lab. Slightly simplified, it works out to:
Machines are associated with a lab.
Machines have a binary logged_in state, which gets updated automatically when users log in and out.
There is a view keyed on the lab which gathers the total number of seats associated with the lab, and the current number in use for that lab.
What I would like to do is add a history or audit table, which would track changes to lab population over time. I had a trigger on the machine table to store the time and the total lab population in my lab history table every time the machine table changed. The problem is that, in order to get the new total for the lab, I have to examine the other values in the machine table. This results in a table mutating error.
Some things I found here and elsewhere suggested that I should create a package to track the labs being changed: use a before-statement trigger to clear the list, a row trigger to store each labid being changed, and an after-statement trigger to update the history table with new values for only those labs whose ids are in the list. I've tried that, but I can't figure out how to access the values I've stored in the package table (or even whether it is storing them properly in the first place).
As will no doubt be obvious, I'm unfamiliar with PL/SQL packages and table variables; the whole syntax of referring to table entries like arrays struck me as vaguely heretical, though incredibly useful if it works. So most of the code below is copied and adapted from other solutions I've found, but none of them stretched as far as how to actually use my table of changed lablocids, assuming it is being created properly in the first place. The following simply tells me that pg_machine_in_use_pkg.changedlablocids does not exist when I try to compile the final trigger.
create or replace package labstats_adm.pg_machine_in_use_pkg
as
type arr is table of number index by binary_integer;
changedlablocids arr;
empty arr;
end;
/
create or replace trigger labstats_adm.pg_machine_in_use_init
before insert or update
on labstats_adm.pg_machine
begin
-- begin each update with a blank list of changed lablocids
pg_machine_in_use_pkg.changedlablocids := pg_machine_in_use_pkg.empty;
end;
/
--
create or replace trigger labstats_adm.pg_machine_in_use_update
after insert or update of in_use,lablocid
on labstats_adm.pg_machine
for each row
begin
-- record lablocids - old and new - of changed machines
if :new.lablocid is not null then
pg_machine_in_use_pkg.changedlablocids( pg_machine_in_use_pkg.changedlablocids.count+1 ) := :new.lablocid;
end if;
if :old.lablocid is not null and :old.lablocid != :new.lablocid then
pg_machine_in_use_pkg.changedlablocids( pg_machine_in_use_pkg.changedlablocids.count+1 ) := :old.lablocid;
end if;
end;
/
create or replace trigger labstats_adm.pg_machine_lab_history
after insert or update of in_use,lablocid
on labstats_adm.pg_machine
begin
-- for each lablocation we just logged a change to, update that labs history
insert into labstats_adm.pg_lab_history (labid, time, total_seats, used_seats)
select labid, systimestamp, total_seats, used_seats
from labstats_adm.lab_usage
where labid in (
select distinct labid from pg_machine_in_use_pkg.changedlablocids
);
end;
/
On the other hand, if there is a better overall approach than the package, I'm all ears.
After some reflection I've got to go with @tbone on this one. In my experience a history table should be a copy of the data in the "real" table, with fields added to show when a particular 'version' of the data shown by a row in the history table was in effect. So if the "real" table is something like
CREATE TABLE REAL_TABLE
(ID_REAL_TABLE NUMBER PRIMARY KEY,
COL2 NUMBER,
COL3 VARCHAR2(50));
then I'd create the history table as
CREATE TABLE HIST_TABLE
(ID_HIST_TABLE NUMBER PRIMARY KEY,
ID_REAL_TABLE NUMBER,
COL2 NUMBER,
COL3 VARCHAR2(50),
EFFECTIVE_START_DT TIMESTAMP(9) NOT NULL,
EFFECTIVE_END_DT TIMESTAMP(9));
and I'd have the following triggers to get everything populated:
CREATE TRIGGER REAL_TABLE_BI
   BEFORE INSERT ON REAL_TABLE
   REFERENCING OLD AS OLD
               NEW AS NEW
   FOR EACH ROW
BEGIN
   IF :NEW.ID_REAL_TABLE IS NULL THEN
      :NEW.ID_REAL_TABLE := REAL_TABLE_SEQUENCE.NEXTVAL;
   END IF;
END REAL_TABLE_BI;

CREATE TRIGGER HIST_TABLE_BI
   BEFORE INSERT ON HIST_TABLE
   FOR EACH ROW
BEGIN
   IF :NEW.ID_HIST_TABLE IS NULL THEN
      :NEW.ID_HIST_TABLE := HIST_TABLE_SEQUENCE.NEXTVAL;
   END IF;
END HIST_TABLE_BI;
CREATE TRIGGER REAL_TABLE_AIUD
   AFTER INSERT OR UPDATE OR DELETE ON REAL_TABLE
   FOR EACH ROW
DECLARE
   tsEffective_start_date TIMESTAMP(9) := SYSTIMESTAMP;
   tsEffective_end_date   TIMESTAMP(9) := tsEffective_start_date - INTERVAL '0.000000001' SECOND;
BEGIN
   IF UPDATING OR DELETING THEN
      UPDATE HIST_TABLE
         SET EFFECTIVE_END_DT = tsEffective_end_date
       WHERE ID_REAL_TABLE = NVL(:NEW.ID_REAL_TABLE, :OLD.ID_REAL_TABLE)  -- :NEW is null when deleting
         AND EFFECTIVE_END_DT IS NULL;
   END IF;
   IF INSERTING OR UPDATING THEN
      INSERT INTO HIST_TABLE (ID_REAL_TABLE, COL2, COL3, EFFECTIVE_START_DT)
      VALUES (:NEW.ID_REAL_TABLE, :NEW.COL2, :NEW.COL3, tsEffective_start_date);
   END IF;
END REAL_TABLE_AIUD;
Using this method the "history" table has all historical versions of the data in the "real" table PLUS a complete copy of the "current" data from the "real" table; this is done to simplify queries which need to report on all versions of the data in the table up to and including present values.
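For example, to read the state of a given row as of a point in time (an illustrative query against the schema above; the id and timestamp are made up):
SELECT *
  FROM HIST_TABLE
 WHERE ID_REAL_TABLE = 42
   AND EFFECTIVE_START_DT <= TIMESTAMP '2012-06-01 00:00:00'
   AND (EFFECTIVE_END_DT IS NULL
        OR EFFECTIVE_END_DT >= TIMESTAMP '2012-06-01 00:00:00');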
The advantage of using triggers to do all this is that the maintenance of the primary keys and the history table becomes automatic and can't be easily circumvented or forgotten.
Share and enjoy.
Sorry to be so slow getting back; it's taken a bit of fiddling, and I haven't had a lot of time to work on it.
Thanks to Bob Jarvis for pointing me at compound triggers, which cleaned up the overall structure significantly. After that, I just had to sanitise the way I'm getting values back out of my table variable. On the off chance that someone else stumbles over this looking for the answer to the same problem, I'll post my final solution here:
create or replace
trigger pg_machine_in_use_update
for insert or update or delete of in_use,lablocid
on labstats_adm.pg_machine
compound trigger
type arr is table of number index by binary_integer;
changedlabids arr;
idx binary_integer;
after each row is
newlabid labstats_adm.pg_labs.labid%TYPE;
oldlabid labstats_adm.pg_labs.labid%TYPE;
begin
-- store the labids of any changed locations
-- PL/SQL does not like us testing for the existence of something that isn't there, so just set it twice if necessary
if ( :new.lablocid is not null ) then
select labid into newlabid from labstats_adm.pg_lablocation where lablocid = :new.lablocid;
changedlabids( newlabid ) := 1;
end if;
if ( :old.lablocid is not null ) then
select labid into oldlabid from labstats_adm.pg_lablocation where lablocid = :old.lablocid;
changedlabids( oldlabid ) := 1;
end if;
end after each row;
after statement is
begin
idx := changedlabids.FIRST;
while idx is not null loop
insert into labstats_adm.pg_lab_history (labid, time, total_seats, used_seats)
select labid, systimestamp, total_seats, used_seats
from labstats_adm.lab_usage
where labid = idx;
idx := changedlabids.NEXT(idx);
end loop;
end after statement;
end pg_machine_in_use_update;
/

Update multiple columns in a trigger function in plpgsql

Given the following schema:
create table account_type_a (
id SERIAL UNIQUE PRIMARY KEY,
some_column VARCHAR
);
create table account_type_b (
id SERIAL UNIQUE PRIMARY KEY,
some_other_column VARCHAR
);
create view account_type_a_view AS select * from account_type_a;
create view account_type_b_view AS select * from account_type_b;
I try to create a generic trigger function in plpgsql, which enables updating the view:
create trigger trUpdate instead of UPDATE on account_type_a_view
for each row execute procedure updateAccount();
create trigger trUpdate instead of UPDATE on account_type_b_view
for each row execute procedure updateAccount();
An unsuccessful effort of mine was:
create function updateAccount() returns trigger as $$
declare
target_table varchar := substring(TG_TABLE_NAME from '(.+)_view');
cols varchar;
begin
execute 'select string_agg(column_name,$1) from information_schema.columns
where table_name = $2' using ',', target_table into cols;
execute 'update ' || target_table || ' set (' || cols || ') = select ($1).*
where id = ($1).id' using NEW;
return NULL;
end;
$$ language plpgsql;
The problem is the update statement. I am unable to come up with a syntax that would work here. I have successfully implemented this in PL/Perl, but would be interested in a plpgsql-only solution.
Any ideas?
Update
As @Erwin Brandstetter suggested, here is the code for my PL/Perl solution. I incorporated some of his suggestions.
create function f_tr_up() returns trigger as $$
use strict;
use warnings;
my $target_table = quote_ident($_TD->{'table_name'}) =~ s/^([\w]+)_view$/$1/r;
my $NEW = $_TD->{'new'};
my $cols = join(',', map { quote_ident($_) } keys %$NEW);
my $vals = join(',', map { quote_literal($_) } values %$NEW);
my $query = sprintf(
"update %s set (%s) = (%s) where id = %d",
$target_table,
$cols,
$vals,
$NEW->{'id'});
spi_exec_query($query);
return;
$$ language plperl;
While @Gary's answer is technically correct, it fails to mention that PostgreSQL does support this form:
UPDATE tbl
SET (col1, col2, ...) = (expression1, expression2, ..)
Read the manual on UPDATE.
It's still tricky to get this done with dynamic SQL. I'll assume a simple case where views consist of the same columns as their underlying tables.
CREATE VIEW tbl_view AS SELECT * FROM tbl;
Problems
The special record NEW is not visible inside EXECUTE. I pass NEW as a single parameter with the USING clause of EXECUTE.
As discussed, UPDATE with list-form needs individual values. I use a subselect to split the record into individual columns:
UPDATE ...
FROM (SELECT ($1).*) x
(The parentheses around $1 are not optional.) This allows me to simply use two column lists built with string_agg() from the catalog table: one with and one without table qualification.
It's not possible to assign a row value as a whole to individual columns. The manual:
According to the standard, the source value for a parenthesized sub-list of target column names can be any row-valued expression yielding the correct number of columns. PostgreSQL only allows the source value to be a row constructor or a sub-SELECT.
INSERT is simpler to implement. If the structure of view and table are identical we can omit the column definition list. (This can be improved; see below.)
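Spelled out statically for a single table with illustrative names, the UPDATE that the trigger function below generates has this shape:
UPDATE tbl t
SET    (col1, col2) = (x.col1, x.col2)
FROM  (SELECT ($1).*) x
WHERE  t.id = x.id;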
Solution
I made a couple of updates to your approach to make it shine.
Trigger function for UPDATE:
CREATE OR REPLACE FUNCTION f_trg_up()
  RETURNS TRIGGER
  LANGUAGE plpgsql AS
$func$
DECLARE
   _tbl  regclass := quote_ident(TG_TABLE_SCHEMA) || '.'
                  || quote_ident(substring(TG_TABLE_NAME from '(.+)_view$'));
   _cols text;
   _vals text;
BEGIN
   SELECT INTO _cols, _vals
          string_agg(quote_ident(attname), ', ')
        , string_agg('x.' || quote_ident(attname), ', ')
   FROM   pg_attribute
   WHERE  attrelid = _tbl
   AND    NOT attisdropped  -- no dropped (dead) columns
   AND    attnum > 0;       -- no system columns

   EXECUTE format('
      UPDATE %s t
      SET   (%s) = (%s)
      FROM  (SELECT ($1).*) x
      WHERE  t.id = x.id', _tbl, _cols, _vals)
   USING NEW;

   RETURN NEW;  -- Don't return NULL unless you know what you're doing
END
$func$;
Trigger function for INSERT:
CREATE OR REPLACE FUNCTION f_trg_ins()
  RETURNS TRIGGER
  LANGUAGE plpgsql AS
$func$
DECLARE
   _tbl regclass := quote_ident(TG_TABLE_SCHEMA) || '.'
                 || quote_ident(substring(TG_TABLE_NAME FROM '(.+)_view$'));
BEGIN
   EXECUTE format('INSERT INTO %s SELECT ($1).*', _tbl)
   USING NEW;
   RETURN NEW;  -- Don't return NULL unless you know what you're doing
END
$func$;
Triggers:
CREATE TRIGGER trg_instead_up
INSTEAD OF UPDATE ON a_view
FOR EACH ROW EXECUTE FUNCTION f_trg_up();
CREATE TRIGGER trg_instead_ins
INSTEAD OF INSERT ON a_view
FOR EACH ROW EXECUTE FUNCTION f_trg_ins();
Before Postgres 11 the syntax (oddly) was EXECUTE PROCEDURE instead of EXECUTE FUNCTION - which also still works.
db<>fiddle here - demonstrating INSERT and UPDATE
Old sqlfiddle
Major points
Include the schema name to make the table reference unambiguous. There can be multiple tables of the same name in one database, in multiple schemas!
Query pg_catalog.pg_attribute instead of information_schema.columns. Less portable, but much faster, and it allows using the table OID.
How to check if a table exists in a given schema
Table names are NOT safe against SQL injection when concatenated as strings for dynamic SQL. Escape with quote_ident() or format() or with an object-identifier type. This includes the special trigger function variables TG_TABLE_SCHEMA and TG_TABLE_NAME!
Cast to the object identifier type regclass to assert the table name is valid and get the OID for the catalog look-up.
Optionally use format() to build the dynamic query string safely.
No need for dynamic SQL for the first query on the catalog tables. Faster, simpler.
Use RETURN NEW instead of RETURN NULL in these trigger functions unless you know what you are doing. (NULL would cancel the INSERT for the current row.)
This simple version assumes that every table (and view) has a unique column named id. A more sophisticated version might use the primary key dynamically.
The function for UPDATE allows the columns of view and table to be in any order, as long as the set is the same.
The function for INSERT expects the columns of view and table to be in identical order. If you want to allow arbitrary order, add a column definition list to the INSERT command, just like with UPDATE (see the sketch after this list).
Updated version also covers changes to the id column by using OLD additionally.
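A sketch of that INSERT variant, reusing the _cols and _vals lists built in the UPDATE function above:
EXECUTE format('INSERT INTO %s (%s) SELECT %s FROM (SELECT ($1).*) x'
             , _tbl, _cols, _vals)
USING NEW;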
PostgreSQL doesn't support updating multiple columns using the set (col1, col2) = select val1, val2 syntax.
To achieve the same in postgresql you'd use
update target_table
set col1 = d.val1,
col2 = d.val2
from source_table d
where d.id = target_table.id
This is going to make the dynamic query a bit more complex to build as you'll need to iterate the column name list you're using into individual fields. I'd suggest you use array_agg instead of string_agg as an array is easier to process than splitting the string again.
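For example (illustrative; the array_agg suggestion works similarly), the per-column assignment list could be built from the catalog and spliced into the dynamic UPDATE:
SELECT string_agg(format('%I = d.%I', attname, attname), ', ')
FROM   pg_attribute
WHERE  attrelid = 'target_table'::regclass
AND    attnum > 0
AND    NOT attisdropped;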
Postgresql UPDATE syntax
documentation on array_agg function