select function() in PostgreSQL makes too many calls to function()

Let's assume we have this function:
create or replace function foo(a integer)
returns table (b integer, c integer)
language plpgsql
as $$
begin
raise notice 'foo()';
return query select a*2, a*4;
return query select a*6, a*8;
return query select a*10, a*12;
end;
$$;
The "raise notice 'foo()'" statement shows how many times the function is called.
If I call the function this way:
postgres=# SELECT i, foo(i) as bla FROM generate_series(1,3) as i;
NOTICE: foo()
NOTICE: foo()
NOTICE: foo()
i | bla
---+---------
1 | (2,4)
1 | (6,8)
1 | (10,12)
2 | (4,8)
2 | (12,16)
2 | (20,24)
3 | (6,12)
3 | (18,24)
3 | (30,36)
(9 rows)
We can see that, as expected, foo() is called 3 times.
But if I call the function this way (so that foo()'s result is split into separate columns):
postgres=# SELECT i, (foo(i)).* FROM generate_series(1,3) as i;
NOTICE: foo()
NOTICE: foo()
NOTICE: foo()
NOTICE: foo()
NOTICE: foo()
NOTICE: foo()
i | b | c
---+----+----
1 | 2 | 4
1 | 6 | 8
1 | 10 | 12
2 | 4 | 8
2 | 12 | 16
2 | 20 | 24
3 | 6 | 12
3 | 18 | 24
3 | 30 | 36
(9 rows)
We can see that foo() is called 6 times. If foo() returned 3 columns, it would be called 9 times. Clearly, foo() is called once for every i and every column it returns.
I don't understand why Postgres does not optimise this. It is a problem for me because my (real) foo() may be CPU intensive. Any ideas?
Edit:
Using an "immutable" function or a function that does not return multiple rows gives the same behaviour:
create or replace function foo(a integer)
returns table (b integer, c integer, d integer)
language plpgsql
immutable
as $$
begin
raise notice 'foo';
return query select a*2, a*3, a*4;
end;
$$;
postgres=# select i, (foo(i)).* from generate_series(1,2) as i;
NOTICE: foo
NOTICE: foo
NOTICE: foo
NOTICE: foo
NOTICE: foo
NOTICE: foo
i | b | c | d
---+---+---+---
1 | 2 | 3 | 4
2 | 4 | 6 | 8
(2 rows)

This is a known issue.
SELECT (f(x)).*
is macro-expanded at parse-time into
SELECT (f(x)).a, (f(x)).b, ...
and PostgreSQL doesn't coalesce multiple calls to the same function down to a single call.
To avoid the issue you can wrap it in another layer of subquery so that the macro-expansion occurs on a simple reference to the function's result rather than the function invocation:
select i, (f).*
FROM (
SELECT i, foo(i) f from generate_series(1,2) as i
) x(i, f)
or use a lateral call in the FROM clause, which is preferred for newer versions:
select i, f.*
from generate_series(1,2) as i
CROSS JOIN LATERAL foo(i) f;
The CROSS JOIN LATERAL may be omitted, using legacy comma joins and an implicit lateral join, but I find it considerably clearer to include it, especially when you're mixing other join types.

In general, avoid calling functions that return more than one value (especially set-returning functions) in the select list.
Postgres does not optimize such calls.
Place your function in the FROM clause instead:
SELECT i, f.* FROM generate_series(1,3) as i, foo(i) f;
In the documentation you can find the note (emphasis mine):
Currently, functions returning sets can also be called in the select
list of a query. For each row that the query generates by itself, the
function returning set is invoked, and an output row is generated for
each element of the function's result set. Note, however, that this
capability is deprecated and might be removed in future releases.

Related

PostgreSQL Calculate product of an array

Write a function that takes an array and returns the product of all the other elements for each element. For example, f([2, 3, 4, 5]) -> [3x4x5, 2x4x5, 2x3x5, 2x3x4] -> [60, 40, 30, 24].
I know that a product can be calculated with exp(sum(ln(value))), but I am unsure about the rest.
If someone could help, it would be appreciated.
While it's doable with SQL, I think in this case PL/pgSQL might be easier to deal with:
create function multiply_elements(p_input int[])
returns int[]
as
$$
declare
l_result int[];
l_idx1 int;
l_idx2 int;
begin
for l_idx1 in 1..cardinality(p_input) loop
l_result[l_idx1] := 1;
for l_idx2 in 1..cardinality(p_input) loop
if l_idx1 <> l_idx2 then
l_result[l_idx1] := l_result[l_idx1] * p_input[l_idx2];
end if;
end loop;
end loop;
return l_result;
end;
$$
language plpgsql
immutable;
OK, here is one solution using just an SQL SELECT. It first generates a prep table of the form
rn | num | arr
1 | 2 | {2,3,4,5}
2 | 3 | {2,3,4,5}
3 | 4 | {2,3,4,5}
4 | 5 | {2,3,4,5}
The arrays are then unnested, which produces an intermediate result of the following form (table t):
rn | num | x
1 | 2 | 3
1 | 2 | 4
1 | 2 | 5
2 | 3 | 2
2 | 3 | 4
2 | 3 | 5
...
This intermediate result is grouped and the required product is calculated. The final step builds an array from the rows (the array_agg function):
create aggregate product(integer)
(stype=bigint,sfunc=int84mul,initcond=1);
create function product_array(p_array int[])
returns int[]
as
$$
declare
r int[];
begin
select array_agg(p) into r
from (
select product(x) p
from (
select row_number() over () rn,
num,
p_array arr
from unnest(p_array) num
) prep,
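lateral unnest(arr) with ordinality as e(x, pos)
-- compare positions rather than values, so that duplicate elements are handled
where rn != pos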
group by rn, num
order by rn
) t;
return r;
end;
$$
language plpgsql
immutable;
select product_array(array[2,3,4,5]);
The product is computed by the aggregate function product, defined at the beginning of the script, since PostgreSQL has no such built-in function.
The query uses the row_number function to preserve the order of the input array's elements, since the SQL switches from arrays to relations and back.
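The exp(sum(ln(value))) trick mentioned in the question can also be used without a custom aggregate. The following is a minimal sketch, assuming PostgreSQL 9.4 or later (for WITH ORDINALITY) and strictly positive elements, since ln() is undefined otherwise. The aliases a, b, val, idx and products are illustrative.

```sql
-- Pair every element with all elements at other positions, then multiply
-- each group via exp(sum(ln(...))). round() absorbs floating-point error.
select array_agg(p order by idx) as products
from (
    select a.idx,
           round(exp(sum(ln(b.val::double precision))))::int as p
    from unnest(array[2,3,4,5]) with ordinality as a(val, idx)
    join unnest(array[2,3,4,5]) with ordinality as b(val, idx)
      on a.idx <> b.idx
    group by a.idx
) s;
```

Because the join compares positions rather than values, duplicate elements are handled. Very large products may exceed what double precision can represent exactly, in which case an integer-based approach is the safer choice.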

Execute statement provided as a value

I have the following table:
test=# CREATE TABLE stmts(id SERIAL, selector VARCHAR(255));
CREATE TABLE
test=# INSERT INTO stmts(selector) VALUES('5 > 3'),('5 < 3');
INSERT 0 2
test=# SELECT selector FROM stmts;
selector
----------
5 > 3
5 < 3
(2 rows)
I want to amend the SELECT so that it evaluates the selector for each row. The desired effect is:
SELECT selector, magic FROM stmts;
selector | magic
--------------------
5 > 3 | true
5 < 3 | false
(2 rows)
It would be great if this were executed in the context of the row, so that we could evaluate, for example, the expression id = 5.
Is this even possible?
You can do this dynamically in a plpgsql function:
create or replace function eval_bool(expr text)
returns boolean language plpgsql as $$
declare
rslt boolean;
begin
execute format('select %s', expr) into rslt;
return rslt;
end $$;
select id, selector, eval_bool(selector)
from stmts;
id | selector | eval_bool
----+----------+-----------
1 | 5 > 3 | t
2 | 5 < 3 | f
(2 rows)
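To evaluate the expression in the context of its own row, as the question asks, the expression can be run against the table itself. This is only a sketch, assuming the stmts table from the question; eval_bool_for_row and its parameter names are hypothetical. The expression is spliced into the statement as raw SQL, so it should come only from trusted sources.

```sql
create or replace function eval_bool_for_row(expr text, row_id integer)
returns boolean language plpgsql as $$
declare
    rslt boolean;
begin
    -- expr is inserted verbatim, so it may reference columns of stmts
    -- (for example 'id = 5'); the row itself is picked via the $1 parameter.
    execute format('select (%s)::boolean from stmts where id = $1', expr)
    into rslt
    using row_id;
    return rslt;
end $$;

select id, selector, eval_bool_for_row(selector, id) as magic
from stmts;
```

Passing the id as a parameter through USING keeps at least that part of the statement out of the string concatenation.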

Postgres return null values on function error/failure when casting

I am attempting to convert text values to timestamp values.
For the following table called a:
id | c1
----+--------------------
1 | 03-03-2000
2 | 01-01-2000
3 | 12/4/1990
4 | 12 Sept 2011
5 | 12-1-1999 12:33:12
6 | 24-04-89 2:33 am
I am attempting to perform a select with a cast as follows:
select id, c1, c1::timestamp as c2 from a;
This works correctly if there were only the first 5 rows, but for the 6th row where c1 is 24-04-89 2:33 am it throws the following error:
ERROR: date/time field value out of range: "24-04-89 2:33 am"
HINT: Perhaps you need a different "datestyle" setting.
What I want is null for those values which cannot be cast to timestamp, instead of the command failing altogether. Like this:
id | c1 | c2
----+--------------------+---------------------
1 | 03-03-2000 | 2000-03-03 00:00:00
2 | 01-01-2000 | 2000-01-01 00:00:00
3 | 12/4/1990 | 1990-12-04 00:00:00
4 | 12 Sept 2011 | 2011-09-12 00:00:00
5 | 12-1-1999 12:33:12 | 1999-12-01 12:33:12
6 | 24-04-89 2:33 am | (null)
(6 rows)
EDIT:
Also, is there a generic way to implement this? That is (based on klin's answer), a plpgsql wrapper function that returns null if the function it wraps throws an error.
For example, a function set_null_on_error that can be used like this:
select id, c1, set_null_on_error(c1::timestamp) as c2 from a;
or
select id, c1, set_null_on_error(to_number(c1, '99')) as c2 from a;
This can be done by trapping an exception in a plpgsql function.
create or replace function my_to_timestamp(arg text)
returns timestamp language plpgsql
as $$
begin
begin
return arg::timestamp;
exception when others then
return null;
end;
end $$;
select id, c1, my_to_timestamp(c1) as c2 from a;
Regarding a generic function:
Assume that you defined a function set_null_on_error(anyelement). Calling
select set_null_on_error('foo'::timestamp);
raises the error before the function is executed, because the argument expression is evaluated first.
You can try something like this:
create or replace function set_null_on_error(kind text, args anyarray)
returns anyelement language plpgsql
as $$
begin
begin
if kind = 'timestamp' then
return args[1]::timestamp;
elseif kind = 'number' then
return to_number(args[1], args[2]);
end if;
exception when others then
return null;
end;
end; $$;
select set_null_on_error('timestamp', array['2014-01-01']);
select set_null_on_error('number', array['1.22444', '9999D99']);
In my opinion such a solution is too complicated, quite inconvenient to use, and likely to cause problems that are hard to debug.
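For casts specifically, a commonly used generic alternative is to pass the value as text together with a typed default, and let dynamic SQL perform the cast inside the function, where the error can be trapped. The following is a sketch under that assumption; try_cast and its parameter names are illustrative and not built in.

```sql
create or replace function try_cast(_in text, inout _out anyelement)
language plpgsql as $$
begin
    -- pg_typeof(_out) supplies the target type; %L quotes the input as a literal
    execute format('select %L::%s', _in, pg_typeof(_out))
    into _out;
exception when others then
    -- the cast failed: _out keeps the default value that was passed in
    null;
end $$;

select id, c1, try_cast(c1, null::timestamp) as c2 from a;
```

This covers only casts from text; it does not wrap arbitrary function calls such as to_number(c1, '99').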

How can you expand a "condensed" PostgreSQL row into separate columns?

I have a function which returns a table.
If you run SELECT * FROM some_function(12345) the result is:
object_id | name
----------------
12345 | "B"
If you run SELECT some_function(12345) the result is:
some_function
-------------
(12345,"B")
The problem is that I want the original form (so that I can access individual column values), but have the argument to some_function() come from a column in a table. I can execute SELECT some_function(thing_id) FROM things but this returns:
some_function
-------------
(12345,"B")
(12346,"C")
(12347,"D")
Whereas what I want returned is:
object_id | name
----------------
12345 | "B"
12346 | "C"
12347 | "D"
So how can one "unnest" or "expand" such a condensed row?
9.3 and above: lateral query
In PostgreSQL 9.3 or newer use an implicit lateral query:
SELECT f.* FROM things t, some_function(t.thing_id) f;
Prefer this formulation for all new queries. The above is the standard formulation.
It also works properly with functions declared RETURNS TABLE or RETURNS SETOF RECORD, as well as functions with OUT parameters that are declared RETURNS RECORD.
It's shorthand for:
SELECT f.*
FROM things t
CROSS JOIN LATERAL some_function(t.thing_id) f;
Pre-9.3: wildcard expansion (with care)
In prior versions, the following causes multiple evaluation of some_function and does not work if some_function returns a set. Do not use this:
SELECT (some_function(thing_id)).* FROM things;
In prior versions, the following avoids multiple evaluation of some_function by using a second layer of indirection. Only use this if you must support quite old PostgreSQL versions:
SELECT (f).*
FROM (
SELECT some_function(thing_id) f
FROM things
) sub(f);
Demo:
Setup:
CREATE FUNCTION some_function(i IN integer, x OUT integer, y OUT text, z OUT text) RETURNS record LANGUAGE plpgsql AS $$
BEGIN
RAISE NOTICE 'evaluated with %',i;
x := i;
y := i::text;
z := 'dummy';
RETURN;
END;
$$;
create table things(thing_id integer);
insert into things(thing_id) values (1),(2),(3);
test run:
demo=> SELECT f.* FROM things t, some_function(t.thing_id) f;
NOTICE: evaluated with 1
NOTICE: evaluated with 2
NOTICE: evaluated with 3
x | y | z
---+---+-------
1 | 1 | dummy
2 | 2 | dummy
3 | 3 | dummy
(3 rows)
demo=> SELECT (some_function(thing_id)).* FROM things;
NOTICE: evaluated with 1
NOTICE: evaluated with 1
NOTICE: evaluated with 1
NOTICE: evaluated with 2
NOTICE: evaluated with 2
NOTICE: evaluated with 2
NOTICE: evaluated with 3
NOTICE: evaluated with 3
NOTICE: evaluated with 3
x | y | z
---+---+-------
1 | 1 | dummy
2 | 2 | dummy
3 | 3 | dummy
(3 rows)
demo=> SELECT (f).*
FROM (
SELECT some_function(thing_id) f
FROM things
) sub(f);
NOTICE: evaluated with 1
NOTICE: evaluated with 2
NOTICE: evaluated with 3
x | y | z
---+---+-------
1 | 1 | dummy
2 | 2 | dummy
3 | 3 | dummy
(3 rows)
SELECT (x.some_function).* FROM (SELECT some_function(thing_id) FROM things) x;
The subselect SELECT some_function(thing_id) FROM things returns a row for each record found. The outer select "uncompresses" the row into separate columns.

How to substring and join with another table with the substring result

I have 2 tables: errorlookup and errors.
errorlookup has 2 columns: codes and description.
The codes are of length 2.
errors has 2 columns id and errorcodes.
The errorcodes are of length 40, meaning they could store 20 error codes for each id.
I need to display all the descriptions associated with an id by splitting errorcodes into substrings and matching them against codes in the errorlookup table.
Sample data for errorlookup:
codes:description
12:Invalid
22:Inactive
21:active
Sample data for errors:
id:errorcodes
1:1221
2:2112
3:1222
I can't use LIKE, as it would result in too many errors. I want the errorcodes column to be broken down into strings of length 2 and then joined with errorlookup.
How can it be done?
If you really cannot alter the table structure, here's another approach:
Create an auxiliary numbers table:
CREATE TABLE numbers
( i INT NOT NULL
, PRIMARY KEY (i)
) ;
INSERT INTO numbers VALUES
( 1 ) ;
INSERT INTO numbers VALUES
( 2 ) ;
--- ...
INSERT INTO numbers VALUES
( 100 ) ;
Then you could use this:
SELECT err.id
, err.errorcodes
, num.i
, look.codes
, look.descriptionid
FROM
( SELECT i, 2*i-1 AS pos --- odd numbers
FROM numbers
WHERE i <= 20 --- 20 pairs
) num
CROSS JOIN
errors err
JOIN
errorlookup look
ON look.codes = SUBSTR(err.errorcodes, pos, 2)
ORDER BY
err.errorcodes
, num.i ;
Test at: SQL-Fiddle
ID ERRORCODES I CODES DESCRIPTIONID
1 1221 1 12 Invalid
1 1221 2 21 Active
3 1222 1 12 Invalid
3 1222 2 22 Inactive
2 2112 1 21 Active
2 2112 2 12 Invalid
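If the data lives in PostgreSQL 9.3 or later (an assumption; the answers here target other systems as well), the numbers table can be replaced by generate_series in a lateral join. This is a sketch using the question's table and column names, assuming errorcodes is a character column; the alias pos is illustrative.

```sql
-- For each row, generate the start positions 1, 3, 5, ... of the
-- two-character codes, then look each code up in errorlookup.
SELECT err.id, err.errorcodes, p.pos, look.codes, look.description
FROM errors err
CROSS JOIN LATERAL generate_series(1, length(err.errorcodes) - 1, 2) AS p(pos)
JOIN errorlookup look
  ON look.codes = substr(err.errorcodes, p.pos, 2)
ORDER BY err.id, p.pos;
```

Since the series is derived from each row's own length, no fixed upper bound of 20 pairs is needed.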
I think the cleanest solution is to "normalize" your errorcodes table using a PL/SQL function. That way you can keep the current (broken) table design, but still access its content as if it were properly normalized.
create type error_code_type as object (id integer, code varchar(2))
/
create or replace type error_table as table of error_code_type
/
create or replace function unnest_errors
return error_table pipelined
is
codes_l integer;
i integer;
one_row error_code_type := error_code_type(null, null);
begin
for err_rec in (select id, errorcodes from errors) loop
codes_l := length(err_rec.errorcodes);
i := 1;
while i < codes_l loop
one_row.id := err_rec.id;
one_row.code := substr(err_rec.errorcodes, i, 2);
pipe row (one_row);
i := i + 2;
end loop;
end loop;
end;
/
Now with this function you can do something like this:
select er.id, er.code, el.description
from table(unnest_errors) er
join errorlookup el on el.codes = er.code;
You can also create a view based on the function to make the statements a bit easier to read:
create or replace view normalized_errorcodes
as
select *
from table(unnest_errors);
Then you can simply reference the view in the real statement.
(I tested this on 11.2 but I believe it should work on 10.x as well)
I think you're on the right track with LIKE. MySQL has an RLIKE function that allows matching by regular expression (I don't know if it's present in Oracle.) You could use errorlookup.code as a pattern to match against errors.errorcodes. The (..)* pattern is used to prevent things like "1213" from matching, for example, "21".
SELECT *
FROM error
JOIN errorlookup
WHERE errorcodes RLIKE CONCAT('^(..)*',code)
ORDER BY id;
+------+----------+------+
| id | errorcode| code |
+------+----------+------+
| 1 | 11 | 11 |
| 2 | 1121 | 11 |
| 2 | 1121 | 21 |
| 3 | 21313245 | 21 |
| 3 | 21313245 | 31 |
| 3 | 21313245 | 32 |
| 4 | 21 | 21 |
+------+----------+------+