Using a variable in a Postgres function [duplicate] - sql

It must be simple, but I'm making my first steps into Postgres functions and I can't find anything that works...
I'd like to create a function that will modify a table and / or column and I can't find the right way of specifying my tables and columns as arguments in my function.
Something like:
CREATE OR REPLACE FUNCTION foo(t table)
RETURNS void AS $$
BEGIN
alter table t add column c1 varchar(20);
alter table t add column c2 varchar(20);
alter table t add column c3 varchar(20);
alter table t add column c4 varchar(20);
END;
$$ LANGUAGE PLPGSQL;
select foo(some_table)
In another case, I'd like to have a function that alters a certain column from a certain table:
CREATE OR REPLACE FUNCTION foo(t table, c column)
RETURNS void AS $$
BEGIN
UPDATE t SET c = "This is a test";
END;
$$ LANGUAGE PLPGSQL;
Is it possible to do that?

You must defend against SQL injection whenever you turn user input into code. That includes table and column names coming from system catalogs or from direct user input alike. This way you also prevent trivial exceptions with non-standard identifiers. There are basically three built-in methods:
1. format()
1st query, sanitized:
CREATE OR REPLACE FUNCTION foo(_t text)
RETURNS void
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE format('
ALTER TABLE %I ADD COLUMN c1 varchar(20)
, ADD COLUMN c2 varchar(20)', _t);
END
$func$;
format() requires Postgres 9.1 or later. Use it with the %I format specifier.
The table name alone may be ambiguous. You may have to provide the schema name to avoid changing the wrong table by accident. Related:
INSERT with dynamic table name in trigger function
How does the search_path influence identifier resolution and the "current schema"
Aside: adding multiple columns with a single ALTER TABLE command is cheaper.
2. regclass
You can also use a cast to a registered class (regclass) for the special case of existing table names. Optionally schema-qualified. This fails immediately and gracefully for table names that are not be valid and visible to the calling user. The 1st query sanitized with a cast to regclass:
CREATE OR REPLACE FUNCTION foo(_t regclass)
RETURNS void
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE 'ALTER TABLE ' || _t || ' ADD COLUMN c1 varchar(20)
, ADD COLUMN c2 varchar(20)';
END
$func$;
Call:
SELECT foo('table_name');
Or:
SELECT foo('my_schema.table_name'::regclass);
Aside: consider using just text instead of varchar(20).
3. quote_ident()
The 2nd query sanitized:
CREATE OR REPLACE FUNCTION foo(_t regclass, _c text)
RETURNS void
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE 'UPDATE ' || _t -- sanitized with regclass
|| ' SET ' || quote_ident(_c) || ' = ''This is a test''';
END
$func$;
For multiple concatenations / interpolations, format() is cleaner ...
Related answers:
Table name as a PostgreSQL function parameter
Postgres functions vs prepared queries
Case sensitive!
Be aware that unquoted identifiers are not cast to lower case here. When used as identifier in SQL [Postgres casts to lower case automatically][7]. But here we pass strings for dynamic SQL. When escaped as demonstrated, CaMel-case identifiers (like UserS) will be preserved by doublequoting ("UserS"), just like other non-standard names like "name with space" "SELECT"etc. Hence, names are case sensitive in this context.
My standing advice is to use legal lower case identifiers exclusively and never worry about that.
Aside: single quotes are for values, double quotes are for identifiers. See:
Are PostgreSQL column names case-sensitive?

Related

Extract column values from one table and insert with modifications into another

I have created a PL/pgSQL function that accepts two column names, a "relation", and two table names. It finds distinct rows in one table and inserts them in to a temporary table, deletes any row with a null value, and sets all values of one column to relation. I have the first part of the process using this function.
create or replace function alt_edger(s text, v text, relation text, tbl text, tbl_src text)
returns void
language plpgsql as
$func$
begin
raise notice 's: %, v: %, tbl: %, tbl_src: %', s,v,tbl,tbl_src;
execute ('insert into '||tbl||' ("source", "target") select distinct "'||s||'","'||v||'" from '||tbl_src||'');
execute ('DELETE FROM '||tbl||' WHERE "source" IS null or "target" is null');
end
$func$;
It is executed as follows:
-- create a temporary table and execute the function twice
drop table if exists temp_stack;
create temporary table temp_stack("label" text, "source" text, "target" text, "attr" text, "graph" text);
select alt_edger('x_x', 'y_y', ':associated_with', 'temp_stack','pg_check_table' );
select alt_edger('Document Number', 'x_x', ':documents', 'temp_stack','pg_check_table' );
select * from temp_stack;
Note that I didn't use relation, yet. The INSERT shall also assign relation, but I can't figure out how to make that happen to get something like:
label
source
target
attr
graph
:associated_with
638000
ARAS
:associated_with
202000
JASE
:associated_with
638010
JASE
:associated_with
638000
JASE
:associated_with
202100
JASE
:documents
A
638010
:documents
A
202000
:documents
A
202100
:documents
B
638000
:documents
A
638000
:documents
B
124004
:documents
B
202100
My challenges are:
How to integrate relation in the INSERT? When I try to use VALUES and comma separation I get an "error near select".
How to allow strings starting with ":" in relation? I'm anticipating here, the inclusion of the colon has given me challenges in the past.
How can I do this? Or is there a better approach?
Toy data model:
drop table if exists pg_check_table;
create temporary table pg_check_table("Document Number" text, x_x int, y_y text);
insert into pg_check_table values ('A',202000,'JASE'),
('A',202100,'JASE'),
('A',638010,'JASE'),
('A',Null,'JASE'),
('A',Null,'JASE'),
('A',202100,'JASE'),
('A',638000,'JASE'),
('A',202100,'JASE'),
('B',638000,'JASE'),
('B',202100,null),
('B',638000,'JASE'),
('B',null,'ARAS'),
('B',638000,'ARAS'),
('B',null,'ARAS'),
('B',638000,null),
('B',124004,null);
alter table pg_check_table add row_num serial;
select * from pg_check_table;
-- DROP FUNCTION alt_edger(_s text, _v text, _relation text, _tbl text, _tbl_src text)
CREATE OR REPLACE FUNCTION alt_edger(_s text, _v text, _relation text, _tbl text, _tbl_src text, OUT row_count int)
LANGUAGE plpgsql AS
$func$
DECLARE
_sql text := format(
'INSERT INTO pg_temp.%3$I (label, source, target)
SELECT DISTINCT $1, %1$I, %2$I FROM pg_temp.%4$I
WHERE (%1$I, %2$I) IS NOT NULL'
, _s, _v, _tbl, _tbl_src);
BEGIN
-- RAISE NOTICE '%', _sql; -- debug
EXECUTE _sql USING _relation;
GET DIAGNOSTICS row_count = ROW_COUNT; -- return number of inserted rows
END
$func$;
db<>fiddle here
Most importantly, use format() to concatenate your dynamic SQL commands safely. And use the format specifier %I for identifiers. This way, SQL injection is not possible and identifiers are double-quoted properly - preserving non-standard names like Document Number. That's where your original failed.
We could concatenate _relation as string to be inserted into label, too. But the preferable way to pass values to EXECUTE is with the USING clause. $1 inside the SQL string passed to EXECUTE is a placeholder for the first USING argument. Not to be confused with $1 referencing function parameters in the context of the function body outside EXECUTE! (You can pass any string, leading colon (:) does not matter, the string is not interpreted when done right.)
See:
Format specifier for integer variables in format() for EXECUTE?
Table name as a PostgreSQL function parameter
I replaced the DELETE in your original with a WHERE clause to the SELECT of the INSERT. Don't insert rows in the first place, instead of deleting them again later.
(%1$I, %2$I) IS NOT NULL only qualifies when both values are NOT NULL.
About that:
Check if a Postgres composite field is null/empty
Don't use the prefix "pg_" for your table names. That's what Postgres uses for system tables. Don't mess with those.
I schema-qualify known temporary tables with pg_temp. That's typically optional as the temporary schema comes first in the search_path by default. But that can be changed (maliciously), and then the table name would resolve to any existing regular table of the same name in the search_path. So better safe than sorry. See:
How does the search_path influence identifier resolution and the "current schema"
I made the function return the number of inserted rows. That's totally optional!
Since I do that with an OUT parameter, I am allowed to skip the RETURNS clause. See:
Can I make a plpgsql function return an integer without using a variable?

Writing dynamic SQL safely, with an example function that estimates rows in any view

Last week I wrote myself a client side pivot table generator. Included on the screen is a list of tables and views. Along the way, I found it helpful to have an estimate of the number of rows in a selected table or view. It turns out to be easy to get an estimate of a table count, but I didn't know how to get an estimate of the rows in a view. Today, I was reading
COUNT(*) MADE FAST
Laurenz Albe's blog post ends with this tidy piece of clever:
CREATE FUNCTION row_estimator(query text) RETURNS bigint
LANGUAGE plpgsql AS
$$DECLARE
plan jsonb;
BEGIN
EXECUTE 'EXPLAIN (FORMAT JSON) ' || query INTO plan;
RETURN (plan->0->'Plan'->>'Plan Rows')::bigint;
END;$$;
That. Is. Nice. I realized that this would work as a view estimator, so I wrote up (read "hacked together") a function:
DROP FUNCTION IF EXISTS api.view_count_estimate (text, text);
CREATE OR REPLACE FUNCTION api.view_count_estimate (
schema_name text,
view_name text)
RETURNS BIGINT
AS $$
DECLARE
plan jsonb;
query text;
BEGIN
EXECUTE
'select definition
from pg_views
where schemaname = $1 and
viewname = $2'
USING schema_name,view_name
INTO query;
EXECUTE
'EXPLAIN (FORMAT JSON) ' || query
INTO plan;
RETURN (plan->0->'Plan'->>'Plan Rows')::bigint;
END;
$$ LANGUAGE plpgsql;
ALTER FUNCTION api.view_count_estimate(text, text) OWNER TO user_change_structure;
This brings me to one of the areas I'm a bit nervous about in Postgres: Creating dynamic SQL safely. I'm not really clear about the magic regclass castings, or if I should be using something like quote_ident() above. Is the built SQL with the USING list safe? I don't see how it could be.
I'm using Postgres 11.4.x.
As Laurenz said, your current code is perfectly safe, but given the amount of paranoia around SQL injection nowadays, it's worth elaborating.
Consider the naive version of your function:
CREATE FUNCTION api.view_count_estimate(schema_name text, view_name text) RETURNS BIGINT
AS $$
DECLARE result BIGINT;
BEGIN
EXECUTE 'SELECT COUNT(*) FROM ' || schema_name || '.' || view_name INTO result;
RETURN result;
END
$$ LANGUAGE plpgsql;
This is obviously wide open to SQL injection, but can be easily secured in a couple of ways. The most straightforward is to use quote_ident():
EXECUTE 'SELECT COUNT(*) FROM ' || quote_ident(schema_name) || '.' || quote_ident(view_name)
INTO result;
This guarantees that the concatenated strings are syntactically valid identifiers, by double-quoting them if they contain any symbols, whitespace, or uppercase characters, so there is no risk of the user input being interpreted as an unwanted SQL expression or keyword (though it has the potential downside of making your function inputs case-sensitive).
A much more succinct alternative to quote_ident() is to use the equivalent %I format specifier:
EXECUTE format('SELECT COUNT(*) FROM %I.%I', schema_name, view_name) INTO result;
There is also a %L specifier for embedding string literals, equivalent to the quote_literal() function.
An even nicer approach is to pass view references via a regclass parameter:
CREATE FUNCTION api.view_count_estimate(view_id regclass) RETURNS BIGINT
AS $$
DECLARE result BIGINT;
BEGIN
EXECUTE 'SELECT COUNT(*) FROM ' || view_id::text INTO result;
RETURN result;
END
$$ LANGUAGE plpgsql;
The "magic" behind the regclass type is actually pretty straightforward; the value itself is just the integer primary key of the pg_class table, and it behaves like an integer in most respects, but the casts to and from string values will query pg_class to find the name of the table, following the same quoting rules as quote_ident() and respecting the current schema search path.
In other words, you can call the function with
SELECT view_count_estimate('my_view')
or
SELECT view_count_estimate('"public"."my_view"')
or
SELECT view_count_estimate('public.My_View')
...and the regclass conversion will resolve the identifier just as it would in a query (while the text cast within the function will quote/qualify the identifier as needed).
But all of this is only necessary if you need to substitute an identifier into your query. If you just need to substitute values into a query (as is the case in your function) then a parameterised query (e.g. EXECUTE ... USING) is completely safe. Query parameterisation is not simple string substitution; it maintains a clear separation between code and data (in fact, the entire SQL statement is parsed, and all identifiers resolved, before the parameter values are even considered), so there is no possibility of SQL injection.
In fact, since all of the variables in this case are simple parameters, you don't need a dynamic query at all; your function can just run the query directly, without the EXECUTE:
SELECT definition
FROM pg_views
WHERE schemaname = schema_name
AND viewname = view_name
INTO query;
Under the hood, PL/pgSQL will build exactly the same parameterised query (i.e. ...WHERE schemaname = $1 AND viewname = $2) and bind the parameters to your function inputs. This approach has the added benefit of only preparing the query the first time you call the function within a session, and just re-binding the parameter values on subsequent calls.
Actually, in this case you don't really need the query at all. The pg_get_viewdef() function will return the view definition for a given regclass, so the whole thing can be reduced to:
CREATE FUNCTION api.view_count_estimate(view_id regclass) RETURNS bigint
AS $$
DECLARE
plan jsonb;
BEGIN
EXECUTE 'EXPLAIN (FORMAT JSON) ' || pg_get_viewdef(view_id) INTO plan;
RETURN (plan->0->'Plan'->>'Plan Rows')::bigint;
END;
$$ LANGUAGE plpgsql;
Your function is safe from SQL injection.

INSERT with dynamic column names

I have column names stored in variable colls, next I execute code:
DO $$
DECLARE
v_name text := quote_ident('colls');
BEGIN
EXECUTE 'insert into table1 select '|| colls ||' from table2 ';
-- EXECUTE 'insert into table1 select '|| v_name ||' from table2 ';
END$$;
I have got error: column "colls" does not exist. Program used colls as name not as variable. What am I doing wrong?
I have found similar example in documentation:
https://www.postgresql.org/docs/8.1/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN
I have column names stored in variable colls
No, you don't. You have a variable v_name - which holds a single word: 'colls'. About variables in SQL:
User defined variables in PostgreSQL
Read the chapters Identifiers and Key Words and Constants in the manual.
And if you had multiple column names in a single variable, you could not use quote_ident() like that. It would escape the whole string as a single identifier.
I guess the basic misunderstanding is this: 'colls' is a string constant, not a variable. There are no other variables in a DO statement than the ones you declare in the DECLARE section. You might be looking for a function that takes a variable number of column names as parameter(s) ...
CREATE OR REPLACE FUNCTION f_insert_these_columns(VARIADIC _cols text[])
RETURNS void AS
$func$
BEGIN
EXECUTE (
SELECT 'INSERT INTO table1 SELECT '
|| string_agg(quote_ident(col), ', ')
|| ' FROM table2'
FROM unnest(_cols) col
);
END
$func$ LANGUAGE plpgsql;
Call:
SELECT f_insert_these_columns('abd', 'NeW Deal'); -- column names case sensitive!
SELECT f_insert_these_columns(VARIADIC '{abd, NeW Deal}'); -- column names case sensitive!
Note how I unnest the array of column names and escape them one by one.
A VARIADIC parameter should be perfect for your use case. You can either pass a list of column names or an array.
Either way, be vary of SQL injection.
Related, with more explanation:
Pass multiple values in single parameter
Table name as a PostgreSQL function parameter

Update multiple columns in a trigger function in plpgsql

Given the following schema:
create table account_type_a (
id SERIAL UNIQUE PRIMARY KEY,
some_column VARCHAR
);
create table account_type_b (
id SERIAL UNIQUE PRIMARY KEY,
some_other_column VARCHAR
);
create view account_type_a view AS select * from account_type_a;
create view account_type_b view AS select * from account_type_b;
I try to create a generic trigger function in plpgsql, which enables updating the view:
create trigger trUpdate instead of UPDATE on account_view_type_a
for each row execute procedure updateAccount();
create trigger trUpdate instead of UPDATE on account_view_type_a
for each row execute procedure updateAccount();
An unsuccessful effort of mine was:
create function updateAccount() returns trigger as $$
declare
target_table varchar := substring(TG_TABLE_NAME from '(.+)_view');
cols varchar;
begin
execute 'select string_agg(column_name,$1) from information_schema.columns
where table_name = $2' using ',', target_table into cols;
execute 'update ' || target_table || ' set (' || cols || ') = select ($1).*
where id = ($1).id' using NEW;
return NULL;
end;
$$ language plpgsql;
The problem is the update statement. I am unable to come up with a syntax that would work here. I have successfully implemented this in PL/Perl, but would be interested in a plpgsql-only solution.
Any ideas?
Update
As #Erwin Brandstetter suggested, here is the code for my PL/Perl solution. I incoporated some of his suggestions.
create function f_tr_up() returns trigger as $$
use strict;
use warnings;
my $target_table = quote_ident($_TD->{'table_name'}) =~ s/^([\w]+)_view$/$1/r;
my $NEW = $_TD->{'new'};
my $cols = join(',', map { quote_ident($_) } keys $NEW);
my $vals = join(',', map { quote_literal($_) } values $NEW);
my $query = sprintf(
"update %s set (%s) = (%s) where id = %d",
$target_table,
$cols,
$vals,
$NEW->{'id'});
spi_exec_query($query);
return;
$$ language plperl;
While #Gary's answer is technically correct, it fails to mention that PostgreSQL does support this form:
UPDATE tbl
SET (col1, col2, ...) = (expression1, expression2, ..)
Read the manual on UPDATE.
It's still tricky to get this done with dynamic SQL. I'll assume a simple case where views consist of the same columns as their underlying tables.
CREATE VIEW tbl_view AS SELECT * FROM tbl;
Problems
The special record NEW is not visible inside EXECUTE. I pass NEW as a single parameter with the USING clause of EXECUTE.
As discussed, UPDATE with list-form needs individual values. I use a subselect to split the record into individual columns:
UPDATE ...
FROM (SELECT ($1).*) x
(Parenthesis around $1 are not optional.) This allows me to simply use two column lists built with string_agg() from the catalog table: one with and one without table qualification.
It's not possible to assign a row value as a whole to individual columns. The manual:
According to the standard, the source value for a parenthesized
sub-list of target column names can be any row-valued expression
yielding the correct number of columns. PostgreSQL only allows the
source value to be a row constructor or a sub-SELECT.
INSERT is implemented simpler. If the structure of view and table are identical we can omit the column definition list. (Can be improved, see below.)
Solution
I made a couple of updates to your approach to make it shine.
Trigger function for UPDATE:
CREATE OR REPLACE FUNCTION f_trg_up()
RETURNS TRIGGER
LANGUAGE plpgsql AS
$func$
DECLARE
_tbl regclass := quote_ident(TG_TABLE_SCHEMA) || '.'
|| quote_ident(substring(TG_TABLE_NAME from '(.+)_view$'));
_cols text;
_vals text;
BEGIN
SELECT INTO _cols, _vals
string_agg(quote_ident(attname), ', ')
, string_agg('x.' || quote_ident(attname), ', ')
FROM pg_attribute
WHERE attrelid = _tbl
AND NOT attisdropped -- no dropped (dead) columns
AND attnum > 0; -- no system columns
EXECUTE format('
UPDATE %s
SET (%s) = (%s)
FROM (SELECT ($1).*) x', _tbl, _cols, _vals)
USING NEW;
RETURN NEW; -- Don't return NULL unless you knwo what you're doing
END
$func$;
Trigger function for INSERT:
CREATE OR REPLACE FUNCTION f_trg_ins()
RETURNS TRIGGER
LANGUAGE plpgsql AS
$func$
DECLARE
_tbl regclass := quote_ident(TG_TABLE_SCHEMA) || '.'
|| quote_ident(substring(TG_TABLE_NAME FROM '(.+)_view$'));
BEGIN
EXECUTE format('INSERT INTO %s SELECT ($1).*', _tbl)
USING NEW;
RETURN NEW; -- Don't return NULL unless you know what you're doing
END
$func$;
Triggers:
CREATE TRIGGER trg_instead_up
INSTEAD OF UPDATE ON a_view
FOR EACH ROW EXECUTE FUNCTION f_trg_up();
CREATE TRIGGER trg_instead_ins
INSTEAD OF INSERT ON a_view
FOR EACH ROW EXECUTE FUNCTION f_trg_ins();
Before Postgres 11 the syntax (oddly) was EXECUTE PROCEDURE instead of EXECUTE FUNCTION - which also still works.
db<>fiddle here - demonstrating INSERT and UPDATE
Old sqlfiddle
Major points
Include the schema name to make the table reference unambiguous. There can be multiple table of the same name in one database with multiple schemas!
Query pg_catalog.pg_attribute instead of information_schema.columns. Less portable, but much faster and allows to use the table-OID.
How to check if a table exists in a given schema
Table names are NOT safe against SQLi when concatenated as strings for dynamic SQL. Escape with quote_ident() or format() or with an object-identifer type. This includes the special trigger function variables TG_TABLE_SCHEMA and TG_TABLE_NAME!
Cast to the object identifier type regclass to assert the table name is valid and get the OID for the catalog look-up.
Optionally use format() to build the dynamic query string safely.
No need for dynamic SQL for the first query on the catalog tables. Faster, simpler.
Use RETURN NEW instead of RETURN NULL in these trigger functions unless you know what you are doing. (NULL would cancel the INSERT for the current row.)
This simple version assumes that every table (and view) has a unique column named id. A more sophisticated version might use the primary key dynamically.
The function for UPDATE allows the columns of view and table to be in any order, as long as the set is the same.
The function for INSERT expects the columns of view and table to be in identical order. If you want to allow arbitrary order, add a column definition list to the INSERT command, just like with UPDATE.
Updated version also covers changes to the id column by using OLD additionally.
Postgresql doesn't support updating multiple columns using the set (col1,col2) = select val1,val2 syntax.
To achieve the same in postgresql you'd use
update target_table
set col1 = d.val1,
col2 = d.val2
from source_table d
where d.id = target_table.id
This is going to make the dynamic query a bit more complex to build as you'll need to iterate the column name list you're using into individual fields. I'd suggest you use array_agg instead of string_agg as an array is easier to process than splitting the string again.
Postgresql UPDATE syntax
documentation on array_agg function

use selected value as table name in postgres

i've serarched for the answer, but did't find.
so i've got a table types
CREATE TABLE types
(
type_id serial NOT NULL,
type_name character varying,
CONSTRAINT un_type_name UNIQUE (type_name)
)
which holds type names, lets say users - and this is the name of corresponding table users. this design may be a bit ugly, but it was made to allow users create their own types. (is there better way to acheve this?)
now i want to perform a query like this one:
select type_name, (select count(*) from ???) from types
to get list of all type names and count of objects of each type.
can this be done?
You cannot do it directly in SQL
You can use a PLpgSQL function and dynamic SQL
CREATE OR REPLACE FUNCTION tables_count(OUT type_name character varying, OUT rows bigint)
RETURNS SETOF record AS $$
BEGIN
FOR tables_count.type_name IN SELECT types.type_name FROM types
LOOP
EXECUTE 'SELECT COUNT(*) FROM ' || quote_ident(tables_count.type_name) INTO tables_count.rows;
RETURN NEXT;
END LOOP;
RETURN;
END;
$$ LANGUAGE plpgsql;
SELECT * FROM tables_count();
I don't have enough information, but I do suspect that something is off with your design. You shouldn't need an extra table for every type.
Be that as it may, what you want to do cannot be done - in pure SQL.
It can be done with a plpgsql function executing dynamic SQL, though:
CREATE OR REPLACE FUNCTION f_type_ct()
RETURNS TABLE (type_name text, ct bigint) AS
$BODY$
DECLARE
tbl text;
BEGIN
FOR tbl IN SELECT t.type_name FROM types t ORDER BY t.type_name
LOOP
RETURN QUERY EXECUTE
'SELECT $1, count(*) FROM ' || tbl::regclass
USING tbl;
END LOOP;
END;
$BODY$
LANGUAGE plpgsql;
Call:
SELECT * FROM f_type_ct();
You'll need to study most of the chapter about plpgsql in the manual to understand what's going on here.
One special hint: the cast to regclass is a safeguard against SQLi. You could also use the more generally applicable quote_ident() for that, but that does not properly handle schema-qualified table names, while the cast to regclass does. It also only accepts table names that are visible to the calling user.