Dynamic column name as date in postgresql crosstab [duplicate]

I implemented this function in my Postgres database: http://www.cureffi.org/2013/03/19/automatically-creating-pivot-table-column-names-in-postgresql/
Here's the function:
create or replace function xtab (tablename varchar, rowc varchar, colc varchar, cellc varchar, celldatatype varchar) returns varchar language plpgsql as $$
declare
dynsql1 varchar;
dynsql2 varchar;
columnlist varchar;
begin
-- 1. retrieve list of column names.
dynsql1 = 'select string_agg(distinct '||colc||'||'' '||celldatatype||''','','' order by '||colc||'||'' '||celldatatype||''') from '||tablename||';';
execute dynsql1 into columnlist;
-- 2. set up the crosstab query
dynsql2 = 'select * from crosstab (
''select '||rowc||','||colc||','||cellc||' from '||tablename||' group by 1,2 order by 1,2'',
''select distinct '||colc||' from '||tablename||' order by 1''
)
as ct (
'||rowc||' varchar,'||columnlist||'
);';
return dynsql2;
end
$$;
So now I can call the function:
select xtab('globalpayments','month','currency','(sum(total_fees)/sum(txn_amount)*100)::decimal(48,2)','text');
Which returns (because the return type of the function is varchar):
select * from crosstab (
'select month,currency,(sum(total_fees)/sum(txn_amount)*100)::decimal(48,2)
from globalpayments
group by 1,2
order by 1,2'
, 'select distinct currency
from globalpayments
order by 1'
) as ct ( month varchar,CAD text,EUR text,GBP text,USD text );
How can I get this function to not only generate the code for the dynamic crosstab, but also execute it? Right now I have to copy, paste and run the generated statement manually. I want the function to assemble the dynamic query and execute it in one step.
Edit 1
This function comes close, but I need it to return more than just the first column of the first record
Taken from: Are there any way to execute a query inside the string value (like eval) in PostgreSQL?
create or replace function eval( sql text ) returns text as $$
declare
as_txt text;
begin
if sql is null then return null ; end if ;
execute sql into as_txt ;
return as_txt ;
end;
$$ language plpgsql;
usage: select * from eval($$select * from analytics limit 1$$)
However, it returns only the first column of the first record:
eval
----
2015
when the actual result looks like this:
Year, Month, Date, TPV_USD
---- ----- ------ --------
2016, 3, 2016-03-31, 100000

What you ask for is impossible. SQL is a strictly typed language. PostgreSQL functions need to declare a return type (RETURNS ..) at the time of creation.
A limited way around this is with polymorphic functions, if you can provide the return type at the time of the function call. But that's not evident from your question.
Refactor a PL/pgSQL function to return the output of various SELECT queries
You can return a completely dynamic result with anonymous records. But then you are required to provide a column definition list with every call. And how do you know about the returned columns? Catch 22.
There are various workarounds, depending on what you need or can work with. Since all your data columns seem to share the same data type, I suggest to return an array: text[]. Or you could return a document type like hstore or json. Related:
Dynamic alternative to pivot with CASE and GROUP BY
Dynamically convert hstore keys into columns for an unknown set of keys
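As an illustration, here is a minimal sketch of the json variant for your example, assuming globalpayments has the columns month, currency, total_fees and txn_amount as in your call. No column definition list is needed; each row carries one json document:
SELECT month, json_object_agg(currency, pct) AS data
FROM  (
   SELECT month, currency
        , (sum(total_fees) / sum(txn_amount) * 100)::decimal(48,2) AS pct
   FROM   globalpayments
   GROUP  BY 1, 2
   ) sub
GROUP  BY 1
ORDER  BY 1;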
But it might be simpler to just use two calls: 1: Let Postgres build the query. 2: Execute and retrieve returned rows.
Selecting multiple max() values using a single SQL statement
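If you work in psql, the two steps can even be chained with \gexec (available since psql 9.6), which executes every row of the preceding result as a statement. A sketch, reusing your original call:
SELECT xtab('globalpayments', 'month', 'currency'
          , '(sum(total_fees)/sum(txn_amount)*100)::decimal(48,2)', 'text') \gexec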
I would not use the function from Eric Minikel as presented in your question at all. It is not safe against SQL injection by way of maliciously malformed identifiers. Use format() to build query strings unless you are running an outdated version older than Postgres 9.1.
A shorter and cleaner implementation could look like this:
CREATE OR REPLACE FUNCTION xtab(_tbl regclass, _row text, _cat text
, _expr text -- still vulnerable to SQL injection!
, _type regtype)
RETURNS text
LANGUAGE plpgsql AS
$func$
DECLARE
_cat_list text;
_col_list text;
BEGIN
-- generate categories for xtab param and col definition list
EXECUTE format(
$$SELECT string_agg(quote_literal(x.cat), '), (')
, string_agg(quote_ident (x.cat), %L)
FROM (SELECT DISTINCT %I AS cat FROM %s ORDER BY 1) x$$
, ' ' || _type || ', ', _cat, _tbl)
INTO _cat_list, _col_list;
-- generate query string
RETURN format(
'SELECT * FROM crosstab(
$q$SELECT %I, %I, %s
FROM %I
GROUP BY 1, 2 -- only works if the 3rd column is an aggregate expression
ORDER BY 1, 2$q$
, $c$VALUES (%5$s)$c$
) ct(%1$I text, %6$s %7$s)'
, _row, _cat, _expr -- expr must be an aggregate expression!
, _tbl, _cat_list, _col_list, _type);
END
$func$;
Same function call as your original version. The function crosstab() is provided by the additional module tablefunc which has to be installed. Basics:
PostgreSQL Crosstab Query
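For reference, the call for your example stays the same; the string arguments are cast to regclass and regtype implicitly:
SELECT xtab('globalpayments', 'month', 'currency'
          , '(sum(total_fees)/sum(txn_amount)*100)::decimal(48,2)', 'text');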
This handles column and table names safely. Note the use of object identifier types regclass and regtype. Also works for schema-qualified names.
Table name as a PostgreSQL function parameter
However, it is not completely safe as long as you pass a string to be executed as an expression (_expr, cellc in your original query). This kind of input is inherently unsafe against SQL injection and should never be exposed to the general public.
SQL injection in Postgres functions vs prepared queries
It scans the table only once for both lists of categories, so it should be a bit faster.
It still can't return completely dynamic row types, since that's strictly not possible.

Not quite impossible: you can still execute the generated string and return SETOF RECORD.
You then have to specify the return record format at call time, because the planner needs to know the return format before it can make certain decisions (materialization comes to mind).
So in this case you would EXECUTE the query and return the rows as SETOF RECORD.
For example, we could do something like this with a wrapper function but the same logic could be folded into your function:
CREATE OR REPLACE FUNCTION crosstab_wrapper
(tablename varchar, rowc varchar, colc varchar,
cellc varchar, celldatatype varchar)
returns setof record language plpgsql as $$
DECLARE outrow record;
BEGIN
FOR outrow IN EXECUTE xtab($1, $2, $3, $4, $5)
LOOP
RETURN NEXT outrow;
END LOOP;
END;
$$;
Then you supply the record structure when calling the function, just like you do with crosstab or connectby: the call has to include a column definition list, e.g. AS ct (col1 type, col2 type, ...).
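For your example, a call might look like this (a sketch; the column definition list has to match the categories actually present in the data):
SELECT *
FROM   crosstab_wrapper('globalpayments', 'month', 'currency'
                      , '(sum(total_fees)/sum(txn_amount)*100)::decimal(48,2)', 'text')
       AS ct (month varchar, "CAD" text, "EUR" text, "GBP" text, "USD" text);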

Related

Compute an aggregated tsrange from a set of entries?

I am trying to compute an aggregated tsrange from a set of rows that I extract with an SQL query. The problem is that I keep getting errors that the input parameter is not being passed in.
CREATE OR REPLACE AGGREGATE range_merge(anyrange)
(
sfunc = range_merge,
stype = anyrange
);
DROP FUNCTION IF EXISTS aggregate_validity(entity_name regclass, entry bigint);
CREATE OR REPLACE FUNCTION aggregate_validity(entity_name regclass, entry bigint) returns tsrange AS
$$
DECLARE
result tsrange;
BEGIN
EXECUTE format('select range_merge(valid) from %s where entity_id = %U', entity_name, entry) into result;
return result;
END
$$ LANGUAGE plpgsql;
When I do:
select * from aggregate_validity(country, 1);
I get an error stating that the entity name and entry do not exist. It does not seem to parameterize the input into the statement properly.
Function:
EXECUTE format('select range_merge(valid) from %s where entity_id=%U',entity_name, entry)
into result;
=>
EXECUTE format('select range_merge(valid) from %I where entity_id=%s',entity_name, entry)
into result;
--%I for identifier, %s for value
Call:
select * from aggregate_validity(country, 1)
=>
select * from aggregate_validity('country', 1);
db<>fiddle demo
CREATE OR REPLACE AGGREGATE range_merge(anyrange) (
SFUNC = range_merge
, STYPE = anyrange
);
-- DROP FUNCTION IF EXISTS aggregate_validity(entity_name regclass, entry bigint);
CREATE OR REPLACE FUNCTION aggregate_validity(entity_name regclass, entry bigint, OUT result tsrange)
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE 'SELECT range_merge(valid) FROM ' || entity_name || ' WHERE entity_id = $1'
INTO result
USING entry;
END
$func$;
Call:
SELECT aggregate_validity('country', 1);
db<>fiddle here
The call does not need SELECT * FROM, as the function returns a single value per definition.
I used an OUT parameter to simplify (OUT result tsrange). See:
Returning from a function with OUT parameter
Don't concatenate the entry value into the SQL string. Pass it as value with the USING clause. Cleaner, faster.
Since entity_name is passed as regclass, it's safe to simply concatenate (which is a bit cheaper). See:
Table name as a PostgreSQL function parameter
Plus, missing quotes and incorrect format specifiers, as Lukasz already provided.
Your custom aggregate function range_merge() has some caveats:
I wouldn't name it "range_merge", that being the name of the plain function range_merge(), too. While that's legal, it still invites confusing errors.
You are aware that the function range_merge() includes gaps between input ranges in the output range?
range_merge() returns NULL for any NULL input. So if your table has any NULL values in the column valid, the result is always NULL. I strongly suggest that any involved columns shall be defined as NOT NULL.
If you are at liberty to install additional modules, consider range_agg by Paul Jungwirth, who is also here on Stack Overflow. It provides the superior function range_agg(), addressing some of the mentioned issues.
If you don't want to include gaps, consider the Postgres Wiki page on range aggregation.
I would probably not use aggregate_validity() at all. It obscures the nested functionality from the Postgres query planner and may lead to suboptimal query plans. Typically, you can replace it with a correlated or a LATERAL subquery, which can be planned and optimized by Postgres in the context of the outer query. I appended a demo to the fiddle:
db<>fiddle here
Related:
What is the difference between LATERAL and a subquery in PostgreSQL?
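For illustration, a minimal sketch of such a LATERAL rewrite, assuming a table country with columns entity_id and valid (and reusing the custom aggregate from above); the driving set would normally be whatever outer query you join to:
SELECT c.entity_id, v.validity
FROM  (SELECT DISTINCT entity_id FROM country) c
CROSS  JOIN LATERAL (
   SELECT range_merge(valid) AS validity
   FROM   country
   WHERE  entity_id = c.entity_id
   ) v;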

Writing dynamic SQL safely, with an example function that estimates rows in any view

Last week I wrote myself a client side pivot table generator. Included on the screen is a list of tables and views. Along the way, I found it helpful to have an estimate of the number of rows in a selected table or view. It turns out to be easy to get an estimate of a table count, but I didn't know how to get an estimate of the rows in a view. Today, I was reading
COUNT(*) MADE FAST
Laurenz Albe's blog post ends with this tidy piece of cleverness:
CREATE FUNCTION row_estimator(query text) RETURNS bigint
LANGUAGE plpgsql AS
$$DECLARE
plan jsonb;
BEGIN
EXECUTE 'EXPLAIN (FORMAT JSON) ' || query INTO plan;
RETURN (plan->0->'Plan'->>'Plan Rows')::bigint;
END;$$;
That. Is. Nice. I realized that this would work as a view estimator, so I wrote up (read "hacked together") a function:
DROP FUNCTION IF EXISTS api.view_count_estimate (text, text);
CREATE OR REPLACE FUNCTION api.view_count_estimate (
schema_name text,
view_name text)
RETURNS BIGINT
AS $$
DECLARE
plan jsonb;
query text;
BEGIN
EXECUTE
'select definition
from pg_views
where schemaname = $1 and
viewname = $2'
USING schema_name,view_name
INTO query;
EXECUTE
'EXPLAIN (FORMAT JSON) ' || query
INTO plan;
RETURN (plan->0->'Plan'->>'Plan Rows')::bigint;
END;
$$ LANGUAGE plpgsql;
ALTER FUNCTION api.view_count_estimate(text, text) OWNER TO user_change_structure;
This brings me to one of the areas I'm a bit nervous about in Postgres: Creating dynamic SQL safely. I'm not really clear about the magic regclass castings, or if I should be using something like quote_ident() above. Is the built SQL with the USING list safe? I don't see how it could be.
I'm using Postgres 11.4.x.
As Laurenz said, your current code is perfectly safe, but given the amount of paranoia around SQL injection nowadays, it's worth elaborating.
Consider the naive version of your function:
CREATE FUNCTION api.view_count_estimate(schema_name text, view_name text) RETURNS BIGINT
AS $$
DECLARE result BIGINT;
BEGIN
EXECUTE 'SELECT COUNT(*) FROM ' || schema_name || '.' || view_name INTO result;
RETURN result;
END
$$ LANGUAGE plpgsql;
This is obviously wide open to SQL injection, but can be easily secured in a couple of ways. The most straightforward is to use quote_ident():
EXECUTE 'SELECT COUNT(*) FROM ' || quote_ident(schema_name) || '.' || quote_ident(view_name)
INTO result;
This guarantees that the concatenated strings are syntactically valid identifiers, by double-quoting them if they contain any symbols, whitespace, or uppercase characters, so there is no risk of the user input being interpreted as an unwanted SQL expression or keyword (though it has the potential downside of making your function inputs case-sensitive).
A much more succinct alternative to quote_ident() is to use the equivalent %I format specifier:
EXECUTE format('SELECT COUNT(*) FROM %I.%I', schema_name, view_name) INTO result;
There is also a %L specifier for embedding string literals, equivalent to the quote_literal() function.
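A trivial sketch with made-up names, to show the two specifiers side by side:
SELECT format('SELECT count(*) FROM %I.%I WHERE name = %L'
            , 'public', 'my_view', 'O''Brien');
-- produces: SELECT count(*) FROM public.my_view WHERE name = 'O''Brien'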
An even nicer approach is to pass view references via a regclass parameter:
CREATE FUNCTION api.view_count_estimate(view_id regclass) RETURNS BIGINT
AS $$
DECLARE result BIGINT;
BEGIN
EXECUTE 'SELECT COUNT(*) FROM ' || view_id::text INTO result;
RETURN result;
END
$$ LANGUAGE plpgsql;
The "magic" behind the regclass type is actually pretty straightforward; the value itself is just the integer primary key of the pg_class table, and it behaves like an integer in most respects, but the casts to and from string values will query pg_class to find the name of the table, following the same quoting rules as quote_ident() and respecting the current schema search path.
In other words, you can call the function with
SELECT view_count_estimate('my_view')
or
SELECT view_count_estimate('"public"."my_view"')
or
SELECT view_count_estimate('public.My_View')
...and the regclass conversion will resolve the identifier just as it would in a query (while the text cast within the function will quote/qualify the identifier as needed).
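You can see the cast mechanics in isolation (assuming a view public.my_view exists):
SELECT 'public.my_view'::regclass::oid;   -- resolves the name to its OID in pg_class
SELECT 'public.my_view'::regclass::text;  -- renders it back, quoted and qualified as needed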
But all of this is only necessary if you need to substitute an identifier into your query. If you just need to substitute values into a query (as is the case in your function) then a parameterised query (e.g. EXECUTE ... USING) is completely safe. Query parameterisation is not simple string substitution; it maintains a clear separation between code and data (in fact, the entire SQL statement is parsed, and all identifiers resolved, before the parameter values are even considered), so there is no possibility of SQL injection.
In fact, since all of the variables in this case are simple parameters, you don't need a dynamic query at all; your function can just run the query directly, without the EXECUTE:
SELECT definition
FROM pg_views
WHERE schemaname = schema_name
AND viewname = view_name
INTO query;
Under the hood, PL/pgSQL will build exactly the same parameterised query (i.e. ...WHERE schemaname = $1 AND viewname = $2) and bind the parameters to your function inputs. This approach has the added benefit of only preparing the query the first time you call the function within a session, and just re-binding the parameter values on subsequent calls.
Actually, in this case you don't really need the query at all. The pg_get_viewdef() function will return the view definition for a given regclass, so the whole thing can be reduced to:
CREATE FUNCTION api.view_count_estimate(view_id regclass) RETURNS bigint
AS $$
DECLARE
plan jsonb;
BEGIN
EXECUTE 'EXPLAIN (FORMAT JSON) ' || pg_get_viewdef(view_id) INTO plan;
RETURN (plan->0->'Plan'->>'Plan Rows')::bigint;
END;
$$ LANGUAGE plpgsql;
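Usage is then simply (with a hypothetical view name):
SELECT api.view_count_estimate('api.my_view');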
Your function is safe from SQL injection.

Trying to create dynamic query strings with PL/PgSQL to make DRY functions in PostgreSQL 9.6

I have tables that contain the same type of data for every year, but the data gathered varies slightly in that they may not have the same fields.
d_abc_2016
d_def_2016
d_ghi_2016
d_jkl_2016
There are certain constants for each table: company_id, employee_id, salary.
However, each one might or might not have these fields that are used to calculate total incentives: bonus, commission, cash_incentives. There are a lot more, but I'm just using these as examples. All are numeric.
I should note at this point, users only have the ability to run SELECT statements.
What I would like to be able to do is this:
Give the user the ability to call the function in a SELECT and specify their own fields in addition to the call
Pass the table name into the function, to use in conditional logic that determines how the query string for the eventual total_incentives calculation is constructed, and also pass the whole table row so a ton of arguments don't have to be passed into the function
Basically this:
SELECT employee_id, salary, total_incentives(t, 'd_abc_2016')
FROM d_abc_2016 t;
So the function being called will calculate total_incentives which is numeric for that employee_id and also show their salary. But the user might choose to add other fields to look at.
For the function, because the fields used in the total_incentives function will vary from table to table, I need to create logic to construct the query string dynamically.
CREATE OR REPLACE FUNCTION total_incentives(ANYELEMENT, t text)
RETURNS numeric AS
$$
DECLARE
-- table name lower case in case user typed wrong
tbl varchar(255) := lower($2);
-- parse out the table code to use in conditional logic
tbl_code varchar(255) := split_part(tbl, '_', 2);
-- the starting point if the query string
base_calc varchar(255) := 'salary + ';
-- query string
query_string varchar(255);
-- have to declare this to put computation INTO
total_incentives_calc numeric;
BEGIN
IF tbl_code = 'abc' THEN
query_string := base_calc || 'bonus';
ELSIF tbl_code = 'def' THEN
query_string := base_calc || 'bonus + commission';
ELSIF tbl_code = 'ghi' THEN
-- etc...
END IF;
EXECUTE format('SELECT $1 FROM %I', tbl)
INTO total_incentives_calc
USING query_string;
RETURN total_incentives_calc;
END;
$$
LANGUAGE plpgsql;
This results in an:
ERROR: invalid input syntax for type numeric: "salary + bonus"
CONTEXT: PL/pgSQL function total_incentives(anyelement,text) line 16 at EXECUTE
Since it should be returning a set of numeric values, I change it to the following:
CREATE OR REPLACE FUNCTION total_incentives(ANYELEMENT, t text)
RETURNS SETOF numeric AS
$$
...
RETURN;
Get the same error.
I figure, well, maybe it is a table it is trying to return.
CREATE OR REPLACE FUNCTION total_incentives(ANYELEMENT, t text)
RETURNS TABLE(tot_inc numeric) AS
$$
...
Get the same error.
Really, any variation produces that result, so I'm not sure how to get this to work.
Look at RETURN QUERY, RETURN NEXT, or RETURN QUERY EXECUTE.
https://www.postgresql.org/docs/9.6/static/plpgsql-control-structures.html
RETURN QUERY won't work because it takes a hard-coded query from what I can tell, which won't take in variables.
RETURN NEXT iterates through each record, which I don't think will be suitable for my needs and seems like it will be really slow... and it also takes a hard-coded query from what I can tell.
RETURN QUERY EXECUTE sounds promising.
-- EXECUTE format('SELECT $1 FROM %I', tbl)
-- INTO total_incentives_calc
-- USING query_string;
RETURN QUERY
EXECUTE format('SELECT $1 FROM %I', tbl)
USING query_string;
And get:
ERROR: structure of query does not match function result type
DETAIL: Returned type character varying does not match expected type numeric in column 1.
CONTEXT: PL/pgSQL function total_incentives(anyelement,text) line 20 at RETURN QUERY
It should be returning numeric.
Lastly, I can get this to work, but it won't be DRY. I'd rather not make a bunch of separate functions for each table with duplicative code. Most of the working examples I have seen have the whole query in the function and are called like this:
SELECT total_incentives(d_abc_2016, 'd_abc_2016');
So any additional columns would have to be specified in the function as:
EXECUTE format('SELECT employee_id...)
Given that the users will only be able to run SELECT queries, this really isn't an option. They need to specify any additional columns they want to see inside the query.
I've posted a similar question but was told it was unclear, so hopefully this lengthier version will more clearly explain what I am trying to do.
Column names and table names should not be passed as query parameters via the USING clause.
Probably lines:
RETURN QUERY
EXECUTE format('SELECT $1 FROM %I', tbl)
USING query_string;
should be:
RETURN QUERY
EXECUTE format('SELECT %s FROM %I', query_string, tbl);
This case is an example of why an overly DRY approach is sometimes problematic. If you write it directly, your code will be simpler, cleaner and probably shorter.
Dynamic SQL should be a last resort, not a first one. Use dynamic SQL only when your code will be significantly shorter with it than without it.
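To illustrate the non-dynamic route, a minimal sketch for one of the tables (hypothetical; it assumes d_def_2016 has the columns salary, bonus and commission, and you would write one such overload per yearly table):
CREATE OR REPLACE FUNCTION total_incentives(r d_def_2016)
  RETURNS numeric
  LANGUAGE sql STABLE AS
$$ SELECT r.salary + r.bonus + r.commission $$;

-- users keep full control over the SELECT list:
SELECT t.employee_id, t.salary, total_incentives(t)
FROM   d_def_2016 t;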

Passing a ResultSet into a Postgresql Function

Is it possible to pass the results of a postgres query as an input into another function?
As a very contrived example, say I have one query like
SELECT id, name
FROM users
LIMIT 50
and I want to create a function my_function that takes the resultset of the first query and returns the minimum id. Is this possible in pl/pgsql?
SELECT my_function(SELECT id, name FROM Users LIMIT 50); --returns 50
You could use a cursor, but that is very impractical for computing a minimum.
I would use a temporary table for that purpose, and pass the table name for use in dynamic SQL:
CREATE OR REPLACE FUNCTION f_min_id(_tbl regclass, OUT min_id int) AS
$func$
BEGIN
EXECUTE 'SELECT min(id) FROM ' || _tbl
INTO min_id;
END
$func$ LANGUAGE plpgsql;
Call:
CREATE TEMP TABLE foo ON COMMIT DROP AS
SELECT id, name
FROM users
LIMIT 50;
SELECT f_min_id('foo');
Major points
The first parameter is of type regclass to prevent SQL injection. More info in this related answer on dba.SE.
I made the temp table ON COMMIT DROP to limit its lifetime to the current transaction. May or may not be what you want.
You can extend this example to take more parameters. Search for code examples for dynamic SQL with EXECUTE.
-> SQLfiddle demo
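Extending the example with more parameters might, for instance, look like this (a hypothetical sketch; the column name is quoted with format() since it is not covered by the regclass cast):
CREATE OR REPLACE FUNCTION f_min_col(_tbl regclass, _col text, OUT min_val int) AS
$func$
BEGIN
   EXECUTE format('SELECT min(%I) FROM %s', _col, _tbl)
   INTO min_val;
END
$func$ LANGUAGE plpgsql;

SELECT f_min_col('foo', 'id');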
I would approach the problem from the other side, calling an aggregate function for each record of the result set. It's not as flexible, but it may give you a hint to work with.
As an example, following your sample problem:
CREATE OR REPLACE FUNCTION myMin ( int,int ) RETURNS int AS $$
SELECT CASE WHEN $1 < $2 THEN $1 ELSE $2 END;
$$ LANGUAGE SQL STRICT IMMUTABLE;
CREATE AGGREGATE my_function ( int ) (
SFUNC = myMin, STYPE = int, INITCOND = 2147483647 --maxint
);
SELECT my_function(id) from (SELECT * FROM Users LIMIT 50) x;
It is not possible to pass an array of the generic type RECORD to a plpgsql function, which is essentially what you are trying to do.
What you can do is pass in an array of a specific user-defined TYPE or of a particular table's row type. In the example below you could also swap the argument data type for the table row type, users[] (though this would obviously mean passing all columns of the users row).
CREATE TYPE trivial AS (
   "ID"   integer,
   "NAME" text
);
CREATE OR REPLACE FUNCTION trivial_func(data trivial[])
RETURNS integer AS
$BODY$
DECLARE
BEGIN
--Implementation here using data
return 1;
END$BODY$
LANGUAGE plpgsql VOLATILE;
I think there's no way to pass a record set or table into a function (but I'd be glad to be wrong). The best I can suggest is to pass an array:
create or replace function my_function(data int[])
returns int
as
$$
select min(x) from unnest(data) as x
$$
language SQL;
sql fiddle demo
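Usage might then look like this, aggregating the ids into an array first:
SELECT my_function(array_agg(id))
FROM  (SELECT id FROM users LIMIT 50) x;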

use selected value as table name in postgres

I've searched for the answer, but didn't find one.
So I've got a table types:
CREATE TABLE types
(
type_id serial NOT NULL,
type_name character varying,
CONSTRAINT un_type_name UNIQUE (type_name)
)
which holds type names, let's say users - and this is also the name of the corresponding table users. This design may be a bit ugly, but it was made to allow users to create their own types. (Is there a better way to achieve this?)
Now I want to perform a query like this one:
select type_name, (select count(*) from ???) from types
to get a list of all type names and the count of objects of each type.
Can this be done?
You cannot do it directly in SQL.
You can use a PL/pgSQL function and dynamic SQL:
CREATE OR REPLACE FUNCTION tables_count(OUT type_name character varying, OUT rows bigint)
RETURNS SETOF record AS $$
BEGIN
FOR tables_count.type_name IN SELECT types.type_name FROM types
LOOP
EXECUTE 'SELECT COUNT(*) FROM ' || quote_ident(tables_count.type_name) INTO tables_count.rows;
RETURN NEXT;
END LOOP;
RETURN;
END;
$$ LANGUAGE plpgsql;
SELECT * FROM tables_count();
I don't have enough information, but I do suspect that something is off with your design. You shouldn't need an extra table for every type.
Be that as it may, what you want to do cannot be done - in pure SQL.
It can be done with a plpgsql function executing dynamic SQL, though:
CREATE OR REPLACE FUNCTION f_type_ct()
RETURNS TABLE (type_name text, ct bigint) AS
$BODY$
DECLARE
tbl text;
BEGIN
FOR tbl IN SELECT t.type_name FROM types t ORDER BY t.type_name
LOOP
RETURN QUERY EXECUTE
'SELECT $1, count(*) FROM ' || tbl::regclass
USING tbl;
END LOOP;
END;
$BODY$
LANGUAGE plpgsql;
Call:
SELECT * FROM f_type_ct();
You'll need to study most of the chapter about plpgsql in the manual to understand what's going on here.
One special hint: the cast to regclass is a safeguard against SQLi. You could also use the more generally applicable quote_ident() for that, but that does not properly handle schema-qualified table names, while the cast to regclass does. It also only accepts table names that are visible to the calling user.
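A small sketch of the difference, with a hypothetical schema-qualified name:
SELECT quote_ident('my_schema.users');      -- "my_schema.users"   (quoted as one identifier)
SELECT 'my_schema.users'::regclass::text;   -- my_schema.users     (resolved; errors if the table does not exist)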