Performance when calling a function in SELECT statement - sql

I have a query where I need to call a SQL function to format a particular column in the query. The formatting needed is very similar to formatting a phone number, ie. changing 1234567890 into (123)456-7890.
I've read that calling a function from a select statement could be a performance killer, and it was kind of reflected in my situation, the time the query took more than tripled and I did not think the function would take this much longer. The function runs in linear time but does use SQL loops. To give an idea of the size of the database this particular query returns about 220,000 rows. The run time of the query went from < 3s to > 9s when running without calling the function vs. running calling the function. The column that needs formatting isn't indexed or used in a join condition or where clause.
Is the performance drop here expected or is there something I can do to improve it?
This is the function in question:
CREATE OR REPLACE FUNCTION fn(bigint)
RETURNS character varying LANGUAGE plpgsql AS
$BODY$
DECLARE
v_chars varchar[];
v_ret varchar;
v_length int4;
v_count int4;
BEGIN
if ($1 isnull or $1 = 0) then
return null;
end if;
v_chars := regexp_split_to_array($1::varchar,'');
v_ret := '';
v_length := array_upper (v_chars,1);
v_count := 0;
for v_index in 1..11 loop
v_count := v_count + 1;
if (v_index <= v_length) then
v_ret := v_chars[v_length - (v_index - 1)] || v_ret;
else
v_ret := '0' || v_ret;
end if;
if (v_count <= 6 and (v_count % 2) = 0) then
v_ret := '.' || v_ret;
end if;
end loop;
return v_ret;
END
$BODY$

It depends on the specifics of the function. To find out how much a bare function call will cost, create dummy functions like:
CREATE FUNCTION f_bare_plpgsql(text)
RETURNS text LANGUAGE plpgsql IMMUTABLE AS
$BODY$
BEGIN
RETURN $1;
END
$BODY$;
CREATE FUNCTION f_bare_sql(text)
RETURNS text LANGUAGE sql IMMUTABLE AS
$BODY$
SELECT $1;
$BODY$;
And try your query again.
If then you wonder why your function is slow, add it to your question.
Solution for updated question
Your function could be improved in many places, but there is a more radical solution:
SELECT to_char(12345678901, '00000"."00"."00"."00')
Many times faster, obviously. More about to_char() in the manual.
Consider the following demo:
WITH x(n) AS (
VALUES (1::bigint), (12), (123), (1234), (12345), (123456), (1234567)
,(12345678), (123456789), (1234567890), (12345678901), (123456789012)
)
SELECT n, x.fn(n), to_char(n, '00000"."00"."00"."00')
FROM x
n | fn | to_char
--------------+----------------+-----------------
1 | 00000.00.00.01 | 00000.00.00.01
12 | 00000.00.00.12 | 00000.00.00.12
123 | 00000.00.01.23 | 00000.00.01.23
1234 | 00000.00.12.34 | 00000.00.12.34
12345 | 00000.01.23.45 | 00000.01.23.45
123456 | 00000.12.34.56 | 00000.12.34.56
1234567 | 00001.23.45.67 | 00001.23.45.67
12345678 | 00012.34.56.78 | 00012.34.56.78
123456789 | 00123.45.67.89 | 00123.45.67.89
1234567890 | 01234.56.78.90 | 01234.56.78.90
12345678901 | 12345.67.89.01 | 12345.67.89.01
123456789012 | 23456.78.90.12 | #####.##.##.##
to_char() is only prepared for up to 11 decimal digits, as you can see.
Can easily be extended, if need should be.

If you really must perform the formatting in the database then modify your table to include a field to store the formatted number.
A trigger can call your function to generate the formatted number when the value changes, then you only (slightly) increase the time taken to INSERT or UPDATE a few rows at a time, rather than all of them.
Your query returning all 220k rows then becomes a simple SELECT of the formatted value and should be nice and quick.

Related

Scope of column names, aliases, and OUT parameters in PL/pgSQL function

I have a hard time understanding why I can refer to the output columns in returns table(col type).
There is a subtle bug in the below code, the order by var refers to res in returns, not to data1 which we aliased to res. res in where is always null and we get 0 rows.
Why can I refer to the column name in output?
In what cases do I want this?
CREATE OR REPLACE FUNCTION public.test(var INTEGER)
RETURNS table(res int )
LANGUAGE plpgsql
AS $function$
begin
return query
select data1 res
from table_with_data
where res < var;
end
$function$
Why can I refer to the column name in output
From the manual, the section about function parameters:
column_name The name of an output column in the RETURNS TABLE syntax. This is effectively another way of declaring a named OUT parameter, except that RETURNS TABLE also implies RETURNS SETOF.
What this means is that in your case res is effectively a writeable variable, which type you plan to return a set of. As any other variable without a default value assigned, it starts off as null.
In what case do I want this
You can return multiple records from a function of this type with a single return query, but another way is by a series of multiple return query or return next - in the second case, filling out the fields in a record of your output table each time. You could have expected a return statement to end the function, but in this scenario only a single return; without anything added would have that effect.
create table public.test_res (data integer);
CREATE OR REPLACE FUNCTION public.test(var INTEGER)
RETURNS table(res int )
LANGUAGE plpgsql
AS $function$
begin
insert into public.test_res select res;--to inspect its initial value later
select 1 into res;
return next;
return next;--note that res isn't reset after returning next
return query select 2;--doesn't affect the current value of res
return next;--returning something else earlier didn't affect res either
return;--it will finish here
select 3 into res;
return next;
end
$function$;
select * from test(0);
-- res
-------
-- 1
-- 1
-- 2
-- 1
--(4 rows)
table public.test_res; --this was the initial value of res within the function
-- data
--------
-- null
--(1 row)
Which is the most useful with LOOPs
CREATE OR REPLACE FUNCTION public.test(var INTEGER)
RETURNS table(comment text,res int) LANGUAGE plpgsql AS $function$
declare rec record;
array_slice int[];
begin
return query select 'return query returned these multiple records in one go', a from generate_series(1,3,1) a(a);
res:=0;
comment:='loop exit when res>4';
loop exit when res>4;
select res+1 into res;
return next;
end loop;
comment:='while res between 5 and 8 loop';
while res between 5 and 8 loop
select res+2 into res;
return next;
end loop;
comment:='for element in reverse 3 .. -3 by 2 loop';
for element in reverse 3 .. -3 by 2 loop
select element into res;
return next;
end loop;
comment:='for <record> in <expression> loop';
for rec in select pid from pg_stat_activity where state<>'idle' loop
select rec.pid into res;
return next;
end loop;
comment:='foreach array_slice slice 1 in array arr loop';
foreach array_slice SLICE 1 in array ARRAY[[1,2,3],[11,12,13],[21,22,23]] loop
select array_slice[1] into res;
return next;
end loop;
end
$function$;
Example results
select * from public.test(0);
-- comment | res
----------------------------------------------------------+--------
-- return query returned these multiple records in one go | 1
-- return query returned these multiple records in one go | 2
-- return query returned these multiple records in one go | 3
-- loop exit when res>4 | 1
-- loop exit when res>4 | 2
-- loop exit when res>4 | 3
-- loop exit when res>4 | 4
-- loop exit when res>4 | 5
-- while res between 5 and 8 loop | 7
-- while res between 5 and 8 loop | 9
-- for element in reverse 3 .. -3 by 2 loop | 3
-- for element in reverse 3 .. -3 by 2 loop | 1
-- for element in reverse 3 .. -3 by 2 loop | -1
-- for element in reverse 3 .. -3 by 2 loop | -3
-- for <record> in <expression> loop | 118786
-- foreach array_slice slice 1 in array arr loop | 1
-- foreach array_slice slice 1 in array arr loop | 11
-- foreach array_slice slice 1 in array arr loop | 21
--(18 rows)
True, OUT parameters (including field names in a RETURNS TABLE (...) clause) are visible in all SQL DML statements in a PL/pgSQL function body, just like other variables. Find details in the manual chapters Variable Substitution and Returning from a Function for PL/pgSQL.
However, a more fundamental misunderstanding comes first here. The syntax of your nested SELECT is invalid to begin with. The PL/pgSQL variable happens to mask this problem (with a different problem). In SQL, you cannot refer to output column names (column aliases in the SELECT clause) in the WHERE clause. This is invalid:
select data1 res
from table_with_data
where res < var;
The manual:
An output column's name can be used to refer to the column's value in
ORDER BY and GROUP BY clauses, but not in the WHERE or HAVING clauses;
there you must write out the expression instead.
This is different for ORDER BY, which you mention in the text, but don't include in the query. See:
GROUP BY + CASE statement
Fixing immediate issue
Could be repaired like this:
CREATE OR REPLACE FUNCTION public.test1(var int)
RETURNS TABLE(res int)
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY
SELECT data1 AS res -- column alias is just noise (or documentation)
FROM table_with_data
WHERE data1 < var; -- original column name!
END
$func$
fiddle
See:
Real number comparison for trigram similarity
The column alias is just noise in this case. The name of the column returned from the function is res in any case - as defined in the RETURNS TABLE clause.
Aside: It's recommended not to omit the AS keyword for column aliases (unlike table aliases). See:
Query to ORDER BY the number of rows returned from another SELECT
If there was actual ambiguity between column and variable name - say, you declared an OUT parameter or variable named data1 - you'd get an error message like this:
ERROR: column reference "data1" is ambiguous
LINE 2: select data1
^
DETAIL: It could refer to either a PL/pgSQL variable or a table column.
Brute force fix
Could be fixed with a special command at the start of the function body:
CREATE OR REPLACE FUNCTION public.test3(var int)
RETURNS TABLE(data1 int)
LANGUAGE plpgsql AS
$func$
#variable_conflict use_column -- ! to resolve conflicts
BEGIN
RETURN QUERY
SELECT data1
FROM table_with_data
WHERE data1 < var; -- !
END
$func$
See:
Naming conflict between function parameter and result of JOIN with USING clause
Proper fix
Table-qualify column names, and avoid conflicting variable names to begin with.
CREATE OR REPLACE FUNCTION public.test4(_var int)
RETURNS TABLE(res int)
LANGUAGE plpgsql STABLE AS
$func$
BEGIN
RETURN QUERY
SELECT t.data1 -- table-qualify column name
FROM table_with_data t
WHERE t.data1 < _var; -- !
END
$func$
Example:
Calling a PostgreSQL function from Java

ORA-01830 when converting number to words

High value is in decimal format eg.- 100.10, I want to convert it into word so I write below script but not getting execution by this..
SELECT SYMBOL, HIGH, UPPER(TO_CHAR(TO_DATE(HIGH,'J'),'JSP'))
AMT_IN_WORDS FROM BHAV;
getting error of
ORA-01830
please correct this where am wrong....
Thank you in advance...
You can creation a function.
CREATE OR REPLACE FUNCTION big_amt_in_words (p_input VARCHAR2) RETURN VARCHAR2
IS
v_running_input NUMBER;
v_num NUMBER;
v_amt_in_words VARCHAR2(2000);
BEGIN
v_running_input := P_input;
FOR i IN (
SELECT RPAD(1, (rownum*3)+1, 0) num_value,
CASE LENGTH(RPAD(1, (rownum*3)+1, 0))
WHEN 4 THEN 'THOUSAND'
WHEN 7 THEN 'MILLION'
WHEN 10 THEN 'BILLION'
WHEN 13 THEN 'TRILLION'
WHEN 16 THEN 'QUADRILLION'
WHEN 19 THEN 'QUINTILLION'
WHEN 22 THEN 'SEXTILLION'
WHEN 25 THEN 'SEPTILLION'
WHEN 28 THEN 'OCTILLION'
END place_value
FROM DUAL
CONNECT BY rownum < 10
ORDER BY rownum desc)
LOOP
v_num := TRUNC(v_running_input/i.num_value,0);
IF v_num > 0 THEN
v_amt_in_words := v_amt_in_words||' '||TO_CHAR(TO_DATE(v_num,'J'), 'JSP')||' '||i.place_value;
v_running_input := v_running_input - (v_num * i.num_value);
END IF;
END LOOP;
v_amt_in_words := v_amt_in_words||' '||TO_CHAR(TO_DATE(TRUNC(v_running_input),'J'), 'JSP')
||' AND '||UPPER(TO_CHAR(TO_DATE((ROUND(v_running_input-TRUNC(v_running_input),2)*100),'J'),'JSP'))||' CENTS';
RETURN TRIM(v_amt_in_words);
END;
/
To use it,
SELECT BIG_AMT_IN_WORDS(65763245345658.12) amt_in_words
FROM DUAL;
Output
---------------------------------------------
SIXTY-FIVE TRILLION SEVEN HUNDRED SIXTY-THREE BILLION TWO HUNDRED FORTY-FIVE MILLION THREE HUNDRED FORTY-FIVE THOUSAND SIX HUNDRED FIFTY-EIGHT AND TWELVE CENTS
The error is raised since the value of high that you have shown is a decimal, that cannot be cast as an integer implicitly, unlike 100.00. So, it cannot be converted to Julian date.
SELECT UPPER(TO_CHAR(TO_DATE(100.10,'J'),'JSP'))AMT_IN_WORDS FROM DUAL;
This causes
ORA-01830: date format picture ends before converting entire input
string
This can be resolved by rounding the decimal to the nearest integer.
SELECT UPPER(TO_CHAR(TO_DATE(ROUND(100.10),'J'),'JSP'))AMT_IN_WORDS FROM DUAL;
| AMT_IN_WORDS |
|--------------|
| ONE HUNDRED |
Demo
If you really want the float component as well, although limited, you may refer this answer's EDIT2: How to convert number to words - ORACLE

function oracle to postgresql

I'm migrating an Oracle database to PostgreSQL, to transfer the tables I had no problem, however I'm having trouble transcribing a function for it to run in Postgres, below the function found in Oracle:
create or replace FUNCTION "FN_HOUR_MINUTE" (P_HOUR IN NUMBER)
RETURN NUMBER
IS
-- PL/SQL Specification
V_RETORN NUMBER(4);
-- Convert hour to minute
-- PL/SQL Block
BEGIN
V_RETORN := 60*TO_NUMBER(SUBSTR(LTRIM(TO_CHAR(P_HOUR,'0000'),' ') ,1,2))+
TO_NUMBER(SUBSTR(LTRIM(TO_CHAR(P_HOUR, '0000'),' '), 3,2));
RETURN V_RETORN;
EXCEPTION
WHEN OTHERS THEN
RETURN NULL ;
END;
I tried writing in postgres as follows:
CREATE OR REPLACE FUNCTION fn_hour_minute(p_hour in NUMERIC)
RETURNS NUMERIC(4) AS $$
DECLARE
v_retorn NUMERIC(4);
BEGIN
v_retorn := 60*TO_NUMBER(SUBSTR(LTRIM(TO_CHAR(p_hour,'0000'),' ') ,1,2))+
TO_NUMBER(SUBSTR(LTRIM(TO_CHAR(p_hour, '0000'),' '), 3,2));
RETURN v_retorn;
END;
$$ LANGUAGE plpgsql;
But gives an error that says the to_number function does not exist.
If you spread the expression into factors:
select TO_CHAR(1234,'0000'),
ltrim(TO_CHAR(1234,'0000')),
substr(ltrim(TO_CHAR(1234,'0000')),1,2),
substr(ltrim(TO_CHAR(1234,'0000')),3,2)
from dual;
TO_CH LTRIM SU SU
----- ----- -- --
1234 1234 12 34
you will see that this is just a very advanced way to calculate such an expression
60 * TRUNC( p_hour / 100 ) + p_hour % 100
You forgot to include the formating for the TO_NUMBER section. Update the TO_NUMBER to TO_NUMBER(SUBSTR(LTRIM(TO_CHAR(p_hour,'0000'),' ') ,1,2), '0000') and it should work.

Postgresql record to array

I need to convert from an array to rows and back to an array for filtering records.
I'm using information_schema._pg_expandarray in a SELECT query to get one row per value in the array.
Given the following array :
"char"[]
{i,i,o,t,b}
_pg_expandarray retuns 5 rows with 1 column of type record :
record
(i,1)
(i,2)
(o,3)
(t,4) <= to be filtered out later
(b,5)
I need to filter this result set to exclude record(s) that contains 't'.
How can I do that ? Should I convert back to an array ?
Is there a way to filter on the array directly ?
Thanks in advance.
If your objective is to produce a set of rows as above but with the row containing 't' removed, then this does the trick:
test=> select *
from information_schema._pg_expandarray(array['i','i','o','t','b']) as a(i)
where a.i!='t';
i | n
---+---
i | 1
i | 2
o | 3
b | 5
(4 rows)
As an aside, unless you particularly want the index returned as a second column, I'd be inclined to use unnest() over information_schema._pg_expandarray(), which does not appear to be documented and judging by the leading '_' in the name is probably intended for internal usage.
There does not seem to be any built in function to filter arrays. Your question implies you may want the result in array form - if that is the case then writing a simple function is trivial. Here is an example:
CREATE OR REPLACE FUNCTION array_filter(anyarray, anyelement) RETURNS anyarray
AS $$
DECLARE
inArray ALIAS FOR $1;
filtValue ALIAS FOR $2;
outArray ALIAS FOR $0;
outIndex int=0;
BEGIN
FOR I IN array_lower(inArray, 1)..array_upper(inArray, 1) LOOP
IF inArray[I] != filtValue THEN
outArray[outIndex] := inArray[I];
outIndex=outIndex+1;
END IF;
END LOOP;
RETURN outArray;
END;
$$ LANGUAGE plpgsql
STABLE
RETURNS NULL ON NULL INPUT;
Usage:
test=> select array_filter(array['i','i','o','t','b'],'t');
array_filter
-----------------
[0:3]={i,i,o,b}
(1 row)

How to calculate an exponential moving average on postgres?

I'm trying to implement an exponential moving average (EMA) on postgres, but as I check documentation and think about it the more I try the more confused I am.
The formula for EMA(x) is:
EMA(x1) = x1
EMA(xn) = α * xn + (1 - α) * EMA(xn-1)
It seems to be perfect for an aggregator, keeping the result of the last calculated element is exactly what has to be done here. However an aggregator produces one single result (as reduce, or fold) and here we need a list (a column) of results (as map). I have been checking how procedures and functions work, but AFAIK they produce one single output, not a column. I have seen plenty of procedures and functions, but I can't really figure out how does this interact with relational algebra, especially when doing something like this, an EMA.
I did not have luck searching the Internets so far. But the definition for an EMA is quite simple, I hope it is possible to translate this definition into something that works in postgres and is simple and efficient, because moving to NoSQL is going to be excessive in my context.
Thank you.
PD: here you can see an example:
https://docs.google.com/spreadsheet/ccc?key=0AvfclSzBscS6dDJCNWlrT3NYdDJxbkh3cGJ2S2V0cVE
You can define your own aggregate function and then use it with a window specification to get the aggregate output at each stage rather than a single value.
So an aggregate is a piece of state, and a transform function to modify that state for each row, and optionally a finalising function to convert the state to an output value. For a simple case like this, just a transform function should be sufficient.
create function ema_func(numeric, numeric) returns numeric
language plpgsql as $$
declare
alpha numeric := 0.5;
begin
-- uncomment the following line to see what the parameters mean
-- raise info 'ema_func: % %', $1, $2;
return case
when $1 is null then $2
else alpha * $2 + (1 - alpha) * $1
end;
end
$$;
create aggregate ema(basetype = numeric, sfunc = ema_func, stype = numeric);
which gives me:
steve#steve#[local] =# select x, ema(x, 0.1) over(w), ema(x, 0.2) over(w) from data window w as (order by n asc) limit 5;
x | ema | ema
-----------+---------------+---------------
44.988564 | 44.988564 | 44.988564
39.5634 | 44.4460476 | 43.9035312
38.605724 | 43.86201524 | 42.84396976
38.209646 | 43.296778316 | 41.917105008
44.541264 | 43.4212268844 | 42.4419368064
These numbers seem to match up to the spreadsheet you added to the question.
Also, you can define the function to pass alpha as a parameter from the statement:
create or replace function ema_func(state numeric, inval numeric, alpha numeric)
returns numeric
language plpgsql as $$
begin
return case
when state is null then inval
else alpha * inval + (1-alpha) * state
end;
end
$$;
create aggregate ema(numeric, numeric) (sfunc = ema_func, stype = numeric);
select x, ema(x, 0.5 /* alpha */) over (order by n asc) from data
Also, this function is actually so simple that it doesn't need to be in plpgsql at all, but can be just a sql function, although you can't refer to parameters by name in one of those:
create or replace function ema_func(state numeric, inval numeric, alpha numeric)
returns numeric
language sql as $$
select case
when $1 is null then $2
else $3 * $2 + (1-$3) * $1
end
$$;
This type of query can be solved with a recursive CTE - try:
with recursive cte as (
select n, x ema from my_table where n = 1
union all
select m.n, alpha * m.x + (1 - alpha) * cte.ema
from cte
join my_table m on cte.n = m.n - 1
cross join (select ? alpha) a)
select * from cte;
--$1 Stock code
--$2 exponential;
create or replace function fn_ema(text,numeric)
returns numeric as
$body$
declare
alpha numeric := 0.5;
var_r record;
result numeric:=0;
n int;
p1 numeric;
begin
alpha=2/(1+$2);
n=0;
for var_r in(select *
from stock_old_invest
where code=$1 order by stock_time desc)
loop
if n>0 then
result=result+(1-alpha)^n*var_r.price_now;
else
p1=var_r.price_now;
end if;
n=n+1;
end loop;
result=alpha*(result+p1);
return result;
end
$body$
language plpgsql volatile
cost 100;
alter function fn_ema(text,numeric)
owner to postgres;