Trying to create dynamic query strings with PL/PgSQL to make DRY functions in PostgreSQL 9.6 - sql

I have tables that contain the same type of data for every year, but the data gathered varies slightly in that they may not have the same fields.
d_abc_2016
d_def_2016
d_ghi_2016
d_jkl_2016
There are certain constants for each table: company_id, employee_id, salary.
However, each one might or might not have these fields that are used to calculate total incentives: bonus, commission, cash_incentives. There are a lot more, but just using these as a examples. All numeric
I should note at this point, users only have the ability to run SELECT statements.
What I would like to be able to do is this:
Give the user the ability to call in SELECT and specify their own fields in addition to the call
Pass the table name being used into the function to use in conditional logic to determine how the query string should be constructed for the eventual total_incentives calculation in addition to passing the whole table so a ton of arguments don't have to be passed into the function
Basically this:
SELECT employee_id, salary, total_incentives(t, 'd_abc_2016')
FROM d_abc_2016 t;
So the function being called will calculate total_incentives which is numeric for that employee_id and also show their salary. But the user might choose to add other fields to look at.
For the function, because the fields used in the total_incentives function will vary from table to table, I need to create logic to construct the query string dynamically.
CREATE OR REPLACE FUNCTION total_incentives(ANYELEMENT, t text)
RETURNS numeric AS
$$
DECLARE
-- table name lower case in case user typed wrong
tbl varchar(255) := lower($2;
-- parse out the table code to use in conditional logic
tbl_code varchar(255) := split_part(survey, '_', 2);
-- the starting point if the query string
base_calc varchar(255) := 'salary + '
-- query string
query_string varchar(255);
-- have to declare this to put computation INTO
total_incentives_calc numeric;
BEGIN
IF tbl_code = 'abc' THEN
query_string := base_calc || 'bonus';
ELSIF tbl_code = 'def' THEN
query_string := base_calc || 'bonus + commission';
ELSIF tbl_code = 'ghi' THEN
-- etc...
END IF;
EXECUTE format('SELECT $1 FROM %I', tbl)
INTO total_incentives_calc
USING query_string;
RETURN total_incentives_calc;
END;
$$
LANGUAGE plpgsql;
This results in an:
ERROR: invalid input syntax for type numeric: "salary + bonus"
CONTEXT: PL/pgSQL function total_incentives(anyelement,text) line 16 at EXECUTE
Since it should be returning a set of numeric values. Change it to the following:
CREATE OR REPLACE FUNCTION total_incentives(ANYELEMENT, t text)
RETURNS SETOF numeric AS
$$
...
RETURN;
Get the same error.
Figure well, maybe it is a table it is trying to return.
CREATE OR REPLACE FUNCTION total_incentives(ANYELEMENT, t text)
RETURNS TABLE(tot_inc numeric) AS
$$
...
Get the same error.
Really, any variation produces that result. So really not sure how to get this to work.
Look at RESULT QUERY, RESULT NEXT, or RESULT QUERY EXECUTE.
https://www.postgresql.org/docs/9.6/static/plpgsql-control-structures.html
RESULT QUERY won't work because it takes a hard coded query from what I can tell, which won't take in variables.
RESULT NEXT iterates through each record, which I don't think will be suitable for my needs and seems like it will be really slow... and it takes a hard coded query from what I can tell.
RESULT QUERY EXECUTE sounds promising.
-- EXECUTE format('SELECT $1 FROM %I', tbl)
-- INTO total_incentives_calc
-- USING query_string;
RETURN QUERY
EXECUTE format('SELECT $1 FROM %I', tbl)
USING query_string;
And get:
ERROR: structure of query does not match function result type
DETAIL: Returned type character varying does not match expected type numeric in column 1.
CONTEXT: PL/pgSQL function total_incentives(anyelement,text) line 20 at RETURN QUERY
It should be returning numeric.
Lastly, I can get this to work, but it won't be DRY. I'd rather not make a bunch of separate functions for each table with duplicative code. Most of the working examples I have seen have the whole query in the function and are called like such:
SELECT total_incentives(d_abc_2016, 'd_abc_2016');
So any additional columns would have to be specified in the function as:
EXECUTE format('SELECT employee_id...)
Given the users will only be able to run SELECT in query this really isn't an option. They need to specify any additional columns they want to see inside a query.
I've posted a similar question but was told it was unclear, so hopefully this lengthier version will more clearly explain what I am trying to do.

The column names and tables names should not be used as query parameters passed by USING clause.
Probably lines:
RETURN QUERY
EXECUTE format('SELECT $1 FROM %I', tbl)
USING query_string;
should be:
RETURN QUERY
EXECUTE format('SELECT %s FROM %I', query_string, tbl);
This case is example why too DRY principle is sometimes problematic. If you write it directly, then your code will be simpler, cleaner and probably shorter.
Dynamic SQL is one from last solution - not first. Use dynamic SQL only when your code will be significantly shorter with dynamic sql than without dynamic SQL.

Related

Compute an aggregated tsrange from a set of entries?

I am trying to compute a aggregated tsrange from a set of row that I extract from an SQL query. Problem is that I keep getting errors that the input parameter is not being passed in.
CREATE OR REPLACE AGGREGATE range_merge(anyrange)
(
sfunc = range_merge,
stype = anyrange
);
DROP FUNCTION IF EXISTS aggregate_validity(entity_name regclass, entry bigint);
CREATE OR REPLACE FUNCTION aggregate_validity(entity_name regclass, entry bigint) returns tsrange AS
$$
DECLARE
result tsrange;
BEGIN
EXECUTE format('select range_merge(valid) from %s where entity_id = %U', entity_name, entry) into result;
return result;
END
$$ LANGUAGE plpgsql;
When I do:
select * from aggregate_validity(country, 1);
I get an error stating that the entity name and entry do not exist. It does not seem to parameterize the input into the statement properly.
Function:
EXECUTE format('select range_merge(valid) from %s where entity_id=%U',entity_name, entry)
into result;
=>
EXECUTE format('select range_merge(valid) from %I where entity_id=%s',entity_name, entry)
into result;
--%I for identifier, %s for value
Call:
select * from aggregate_validity(country, 1)
=>
select * from aggregate_validity('country', 1);
db<>fiddle demo
CREATE OR REPLACE AGGREGATE range_merge(anyrange) (
SFUNC = range_merge
, STYPE = anyrange
);
-- DROP FUNCTION IF EXISTS aggregate_validity(entity_name regclass, entry bigint);
CREATE OR REPLACE FUNCTION aggregate_validity(entity_name regclass, entry bigint, OUT result tsrange)
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE 'SELECT range_merge(valid) FROM ' || entity_name || ' WHERE entity_id = $1'
INTO result
USING entry;
END
$func$;
Call:
SELECT aggregate_validity('country', 1);
db<>fiddle here
The call does not need SELECT * FROM, as the function returns a single value per definition.
I used an OUT parameter to simplify (OUT result tsrange). See:
Returning from a function with OUT parameter
Don't concatenate the entry value into the SQL string. Pass it as value with the USING clause. Cleaner, faster.
Since entity_name is passed as regclass, it's safe to simply concatenate (which is a bit cheaper). See:
Table name as a PostgreSQL function parameter
Plus, missing quotes and incorrect format specifiers, as Lukasz already provided.
Your custom aggregate function range_merge() has some caveats:
I wouldn't name it "range_merge", that being the name of the plain function range_merge(), too. While that's legal, it still invites confusing errors.
You are aware that the function range_merge() includes gaps between input ranges in the output range?
range_merge() returns NULL for any NULL input. So if your table has any NULL values in the column valid, the result is always NULL. I strongly suggest that any involved columns shall be defined as NOT NULL.
If you are at liberty to install additional modules, consider range_agg by Paul Jungwirth who is also here on Stackovflow. It provides the superior function range_agg() addressing some of the mentioned issues.
If you don't want to include gaps, consider the Postgres Wiki page on range aggregation.
I would probably not use aggregate_validity() at all. It obscures the nested functionality from the Postgres query planner and may lead so suboptimal query plans. Typically, you can replace it with a correlated or a LATERAL subquery, which can be planned and optimized by Postgres in context of the outer query. I appended a demo to the fiddle:
db<>fiddle here
Related:
What is the difference between LATERAL and a subquery in PostgreSQL?

Writing dynamic SQL safely, with an example function that estimates rows in any view

Last week I wrote myself a client side pivot table generator. Included on the screen is a list of tables and views. Along the way, I found it helpful to have an estimate of the number of rows in a selected table or view. It turns out to be easy to get an estimate of a table count, but I didn't know how to get an estimate of the rows in a view. Today, I was reading
COUNT(*) MADE FAST
Laurenz Albe's blog post ends with this tidy piece of clever:
CREATE FUNCTION row_estimator(query text) RETURNS bigint
LANGUAGE plpgsql AS
$$DECLARE
plan jsonb;
BEGIN
EXECUTE 'EXPLAIN (FORMAT JSON) ' || query INTO plan;
RETURN (plan->0->'Plan'->>'Plan Rows')::bigint;
END;$$;
That. Is. Nice. I realized that this would work as a view estimator, so I wrote up (read "hacked together") a function:
DROP FUNCTION IF EXISTS api.view_count_estimate (text, text);
CREATE OR REPLACE FUNCTION api.view_count_estimate (
schema_name text,
view_name text)
RETURNS BIGINT
AS $$
DECLARE
plan jsonb;
query text;
BEGIN
EXECUTE
'select definition
from pg_views
where schemaname = $1 and
viewname = $2'
USING schema_name,view_name
INTO query;
EXECUTE
'EXPLAIN (FORMAT JSON) ' || query
INTO plan;
RETURN (plan->0->'Plan'->>'Plan Rows')::bigint;
END;
$$ LANGUAGE plpgsql;
ALTER FUNCTION api.view_count_estimate(text, text) OWNER TO user_change_structure;
This brings me to one of the areas I'm a bit nervous about in Postgres: Creating dynamic SQL safely. I'm not really clear about the magic regclass castings, or if I should be using something like quote_ident() above. Is the built SQL with the USING list safe? I don't see how it could be.
I'm using Postgres 11.4.x.
As Laurenz said, your current code is perfectly safe, but given the amount of paranoia around SQL injection nowadays, it's worth elaborating.
Consider the naive version of your function:
CREATE FUNCTION api.view_count_estimate(schema_name text, view_name text) RETURNS BIGINT
AS $$
DECLARE result BIGINT;
BEGIN
EXECUTE 'SELECT COUNT(*) FROM ' || schema_name || '.' || view_name INTO result;
RETURN result;
END
$$ LANGUAGE plpgsql;
This is obviously wide open to SQL injection, but can be easily secured in a couple of ways. The most straightforward is to use quote_ident():
EXECUTE 'SELECT COUNT(*) FROM ' || quote_ident(schema_name) || '.' || quote_ident(view_name)
INTO result;
This guarantees that the concatenated strings are syntactically valid identifiers, by double-quoting them if they contain any symbols, whitespace, or uppercase characters, so there is no risk of the user input being interpreted as an unwanted SQL expression or keyword (though it has the potential downside of making your function inputs case-sensitive).
A much more succinct alternative to quote_ident() is to use the equivalent %I format specifier:
EXECUTE format('SELECT COUNT(*) FROM %I.%I', schema_name, view_name) INTO result;
There is also a %L specifier for embedding string literals, equivalent to the quote_literal() function.
An even nicer approach is to pass view references via a regclass parameter:
CREATE FUNCTION api.view_count_estimate(view_id regclass) RETURNS BIGINT
AS $$
DECLARE result BIGINT;
BEGIN
EXECUTE 'SELECT COUNT(*) FROM ' || view_id::text INTO result;
RETURN result;
END
$$ LANGUAGE plpgsql;
The "magic" behind the regclass type is actually pretty straightforward; the value itself is just the integer primary key of the pg_class table, and it behaves like an integer in most respects, but the casts to and from string values will query pg_class to find the name of the table, following the same quoting rules as quote_ident() and respecting the current schema search path.
In other words, you can call the function with
SELECT view_count_estimate('my_view')
or
SELECT view_count_estimate('"public"."my_view"')
or
SELECT view_count_estimate('public.My_View')
...and the regclass conversion will resolve the identifier just as it would in a query (while the text cast within the function will quote/qualify the identifier as needed).
But all of this is only necessary if you need to substitute an identifier into your query. If you just need to substitute values into a query (as is the case in your function) then a parameterised query (e.g. EXECUTE ... USING) is completely safe. Query parameterisation is not simple string substitution; it maintains a clear separation between code and data (in fact, the entire SQL statement is parsed, and all identifiers resolved, before the parameter values are even considered), so there is no possibility of SQL injection.
In fact, since all of the variables in this case are simple parameters, you don't need a dynamic query at all; your function can just run the query directly, without the EXECUTE:
SELECT definition
FROM pg_views
WHERE schemaname = schema_name
AND viewname = view_name
INTO query;
Under the hood, PL/pgSQL will build exactly the same parameterised query (i.e. ...WHERE schemaname = $1 AND viewname = $2) and bind the parameters to your function inputs. This approach has the added benefit of only preparing the query the first time you call the function within a session, and just re-binding the parameter values on subsequent calls.
Actually, in this case you don't really need the query at all. The pg_get_viewdef() function will return the view definition for a given regclass, so the whole thing can be reduced to:
CREATE FUNCTION api.view_count_estimate(view_id regclass) RETURNS bigint
AS $$
DECLARE
plan jsonb;
BEGIN
EXECUTE 'EXPLAIN (FORMAT JSON) ' || pg_get_viewdef(view_id) INTO plan;
RETURN (plan->0->'Plan'->>'Plan Rows')::bigint;
END;
$$ LANGUAGE plpgsql;
Your function is safe from SQL injection.

Dynamic column name as date in postgresql crosstab [duplicate]

I implemented this function in my Postgres database: http://www.cureffi.org/2013/03/19/automatically-creating-pivot-table-column-names-in-postgresql/
Here's the function:
create or replace function xtab (tablename varchar, rowc varchar, colc varchar, cellc varchar, celldatatype varchar) returns varchar language plpgsql as $$
declare
dynsql1 varchar;
dynsql2 varchar;
columnlist varchar;
begin
-- 1. retrieve list of column names.
dynsql1 = 'select string_agg(distinct '||colc||'||'' '||celldatatype||''','','' order by '||colc||'||'' '||celldatatype||''') from '||tablename||';';
execute dynsql1 into columnlist;
-- 2. set up the crosstab query
dynsql2 = 'select * from crosstab (
''select '||rowc||','||colc||','||cellc||' from '||tablename||' group by 1,2 order by 1,2'',
''select distinct '||colc||' from '||tablename||' order by 1''
)
as ct (
'||rowc||' varchar,'||columnlist||'
);';
return dynsql2;
end
$$;
So now I can call the function:
select xtab('globalpayments','month','currency','(sum(total_fees)/sum(txn_amount)*100)::decimal(48,2)','text');
Which returns (because the return type of the function is varchar):
select * from crosstab (
'select month,currency,(sum(total_fees)/sum(txn_amount)*100)::decimal(48,2)
from globalpayments
group by 1,2
order by 1,2'
, 'select distinct currency
from globalpayments
order by 1'
) as ct ( month varchar,CAD text,EUR text,GBP text,USD text );
How can I get this function to not only generate the code for the dynamic crosstab, but also execute the result? I.e., the result when I manually copy/paste/execute is this. But I want it to execute without that extra step: the function shall assemble the dynamic query and execute it:
Edit 1
This function comes close, but I need it to return more than just the first column of the first record
Taken from: Are there any way to execute a query inside the string value (like eval) in PostgreSQL?
create or replace function eval( sql text ) returns text as $$
declare
as_txt text;
begin
if sql is null then return null ; end if ;
execute sql into as_txt ;
return as_txt ;
end;
$$ language plpgsql
usage: select * from eval($$select * from analytics limit 1$$)
However it just returns the first column of the first record :
eval
----
2015
when the actual result looks like this:
Year, Month, Date, TPV_USD
---- ----- ------ --------
2016, 3, 2016-03-31, 100000
What you ask for is impossible. SQL is a strictly typed language. PostgreSQL functions need to declare a return type (RETURNS ..) at the time of creation.
A limited way around this is with polymorphic functions. If you can provide the return type at the time of the function call. But that's not evident from your question.
Refactor a PL/pgSQL function to return the output of various SELECT queries
You can return a completely dynamic result with anonymous records. But then you are required to provide a column definition list with every call. And how do you know about the returned columns? Catch 22.
There are various workarounds, depending on what you need or can work with. Since all your data columns seem to share the same data type, I suggest to return an array: text[]. Or you could return a document type like hstore or json. Related:
Dynamic alternative to pivot with CASE and GROUP BY
Dynamically convert hstore keys into columns for an unknown set of keys
But it might be simpler to just use two calls: 1: Let Postgres build the query. 2: Execute and retrieve returned rows.
Selecting multiple max() values using a single SQL statement
I would not use the function from Eric Minikel as presented in your question at all. It is not safe against SQL injection by way of maliciously malformed identifiers. Use format() to build query strings unless you are running an outdated version older than Postgres 9.1.
A shorter and cleaner implementation could look like this:
CREATE OR REPLACE FUNCTION xtab(_tbl regclass, _row text, _cat text
, _expr text -- still vulnerable to SQL injection!
, _type regtype)
RETURNS text
LANGUAGE plpgsql AS
$func$
DECLARE
_cat_list text;
_col_list text;
BEGIN
-- generate categories for xtab param and col definition list
EXECUTE format(
$$SELECT string_agg(quote_literal(x.cat), '), (')
, string_agg(quote_ident (x.cat), %L)
FROM (SELECT DISTINCT %I AS cat FROM %s ORDER BY 1) x$$
, ' ' || _type || ', ', _cat, _tbl)
INTO _cat_list, _col_list;
-- generate query string
RETURN format(
'SELECT * FROM crosstab(
$q$SELECT %I, %I, %s
FROM %I
GROUP BY 1, 2 -- only works if the 3rd column is an aggregate expression
ORDER BY 1, 2$q$
, $c$VALUES (%5$s)$c$
) ct(%1$I text, %6$s %7$s)'
, _row, _cat, _expr -- expr must be an aggregate expression!
, _tbl, _cat_list, _col_list, _type);
END
$func$;
Same function call as your original version. The function crosstab() is provided by the additional module tablefunc which has to be installed. Basics:
PostgreSQL Crosstab Query
This handles column and table names safely. Note the use of object identifier types regclass and regtype. Also works for schema-qualified names.
Table name as a PostgreSQL function parameter
However, it is not completely safe while you pass a string to be executed as expression (_expr - cellc in your original query). This kind of input is inherently unsafe against SQL injection and should never be exposed to the general public.
SQL injection in Postgres functions vs prepared queries
Scans the table only once for both lists of categories and should be a bit faster.
Still can't return completely dynamic row types since that's strictly not possible.
Not quite impossible, you can still execute it (from a query execute the string and return SETOF RECORD.
Then you have to specify the return record format. The reason in this case is that the planner needs to know the return format before it can make certain decisions (materialization comes to mind).
So in this case you would EXECUTE the query, return the rows and return SETOF RECORD.
For example, we could do something like this with a wrapper function but the same logic could be folded into your function:
CREATE OR REPLACE FUNCTION crosstab_wrapper
(tablename varchar, rowc varchar, colc varchar,
cellc varchar, celldatatype varchar)
returns setof record language plpgsql as $$
DECLARE outrow record;
BEGIN
FOR outrow IN EXECUTE xtab($1, $2, $3, $4, $5)
LOOP
RETURN NEXT outrow
END LOOP;
END;
$$;
Then you supply the record structure on calling the function just like you do with crosstab.
Then when you all the query you would have to supply a record structure (as (col1 type, col2 type, etc) like you do with connectby.

Postgres run SQL statement from string

I would like to execute a SQL statement based on a string. I have created the SQL statement through a function which returns a string. Can anybody explain how I can execute the statement that is returned? I know that you can't do it in plain SQL, so I was thinking about putting it in a function. The only issue is that the columns in the statement aren't always the same, so I don't know which data types to use for the columns. I'm using Postgres 9.1.0.
For example, suppose the SQL string returned from my function the is:
Select open, closed, discarded from abc
But, it can also be:
Select open from abc
Or
Select open, closed from abc
How can I execute any of these strings, so that the results would be returned as a table with only the columns listed in the statement?
Edit: the function is written in PL/pgSQL. And the results will be used for reporting where they don't want to see columns that have no values. So the function that I wrote returns the names of all columns that have values and then add it to the SQL statement.
Thanks for your help!
I don't think you can return the rows directly from a function, because its return type would be unknown. Even if you specified the return type as RECORD, you'd have to list the returned columns at call time. Based on Wayne Conrad's idea, you could do this:
CREATE FUNCTION my_create(cmd TEXT) RETURNS VOID AS $$
BEGIN
EXECUTE 'CREATE TEMPORARY TABLE temp_result AS ' || cmd;
END;
$$ VOLATILE LANGUAGE plpgsql;
Then use the function like this:
BEGIN;
SELECT my_create(...);
SELECT * FROM temp_result;
ROLLBACK; -- or COMMIT

using comma separated values inside IN clause for NUMBER column

I have 2 procedures inside a package. I am calling one procedure to get a comma separated list of user ids.
I am storing the result in a VARCHAR variable. Now when I am using this comma separated list to put inside an IN clause in it is throwing "ORA-01722:INVALID NUMBER" exception.
This is how my variable looks like
l_userIds VARCHAR2(4000) := null;
This is where i am assigning the value
l_userIds := getUserIds(deptId); -- this returns a comma separated list
And my second query is like -
select * from users_Table where user_id in (l_userIds);
If I run this query I get INVALID NUMBER error.
Can someone help here.
Do you really need to return a comma-separated list? It would generally be much better to declare a collection type
CREATE TYPE num_table
AS TABLE OF NUMBER;
Declare a function that returns an instance of this collection
CREATE OR REPLACE FUNCTION get_nums
RETURN num_table
IS
l_nums num_table := num_table();
BEGIN
for i in 1 .. 10
loop
l_nums.extend;
l_nums(i) := i*2;
end loop;
END;
and then use that collection in your query
SELECT *
FROM users_table
WHERE user_id IN (SELECT * FROM TABLE( l_nums ));
It is possible to use dynamic SQL as well (which #Sebas demonstrates). The downside to that, however, is that every call to the procedure will generate a new SQL statement that needs to be parsed again before it is executed. It also puts pressure on the library cache which can cause Oracle to purge lots of other reusable SQL statements which can create lots of other performance problems.
You can search the list using like instead of in:
select *
from users_Table
where ','||l_userIds||',' like '%,'||cast(user_id as varchar2(255))||',%';
This has the virtue of simplicity (no additional functions or dynamic SQL). However, it does preclude the use of indexes on user_id. For a smallish table this shouldn't be a problem.
The problem is that oracle does not interprete the VARCHAR2 string you're passing as a sequence of numbers, it is just a string.
A solution is to make the whole query a string (VARCHAR2) and then execute it so the engine knows he has to translate the content:
DECLARE
TYPE T_UT IS TABLE OF users_Table%ROWTYPE;
aVar T_UT;
BEGIN
EXECUTE IMMEDIATE 'select * from users_Table where user_id in (' || l_userIds || ')' INTO aVar;
...
END;
A more complex but also elegant solution would be to split the string into a table TYPE and use it casted directly into the query. See what Tom thinks about it.
DO NOT USE THIS SOLUTION!
Firstly, I wanted to delete it, but I think, it might be informative for someone to see such a bad solution. Using dynamic SQL like this causes multiple execution plans creation - 1 execution plan per 1 set of data in IN clause, because there is no binding used and for the DB, every query is a different one (SGA gets filled with lots of very similar execution plans, every time the query is run with a different parameter, more memory is needlessly used in SGA).
Wanted to write another answer using Dynamic SQL more properly (with binding variables), but Justin Cave's answer is the best, anyway.
You might also wanna try REF CURSOR (haven't tried that exact code myself, might need some little tweaks):
DECLARE
deptId NUMBER := 2;
l_userIds VARCHAR2(2000) := getUserIds(deptId);
TYPE t_my_ref_cursor IS REF CURSOR;
c_cursor t_my_ref_cursor;
l_row users_Table%ROWTYPE;
l_query VARCHAR2(5000);
BEGIN
l_query := 'SELECT * FROM users_Table WHERE user_id IN ('|| l_userIds ||')';
OPEN c_cursor FOR l_query;
FETCH c_cursor INTO l_row;
WHILE c_cursor%FOUND
LOOP
-- do something with your row
FETCH c_cursor INTO l_row;
END LOOP;
END;
/