I'm trying to implement an exponential moving average (EMA) in Postgres, but the more I read the documentation and experiment, the more confused I get.
The formula for EMA(x) is:
EMA(x_1) = x_1
EMA(x_n) = α * x_n + (1 - α) * EMA(x_{n-1})
It seems to be perfect for an aggregator, since keeping the result of the last calculated element is exactly what has to be done here. However, an aggregator produces a single result (like reduce or fold), and here we need a list (a column) of results (like map). I have been checking how procedures and functions work, but AFAIK they produce a single output, not a column. I have seen plenty of procedures and functions, but I can't really figure out how they interact with relational algebra, especially when doing something like an EMA.
I have had no luck searching the Internet so far. The definition of an EMA is quite simple, though, so I hope it can be translated into something that works in Postgres and is simple and efficient, because moving to NoSQL would be excessive in my context.
Thank you.
PS: here you can see an example:
https://docs.google.com/spreadsheet/ccc?key=0AvfclSzBscS6dDJCNWlrT3NYdDJxbkh3cGJ2S2V0cVE
You can define your own aggregate function and then use it with a window specification to get the aggregate output at each stage rather than a single value.
So an aggregate is a piece of state, and a transform function to modify that state for each row, and optionally a finalising function to convert the state to an output value. For a simple case like this, just a transform function should be sufficient.
create function ema_func(numeric, numeric) returns numeric
language plpgsql as $$
declare
alpha numeric := 0.5;
begin
-- uncomment the following line to see what the parameters mean
-- raise info 'ema_func: % %', $1, $2;
return case
when $1 is null then $2
else alpha * $2 + (1 - alpha) * $1
end;
end
$$;
create aggregate ema(basetype = numeric, sfunc = ema_func, stype = numeric);
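which gives me the following (this query uses the two-argument form of ema, defined further down, which takes alpha as a parameter):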
steve#steve#[local] =# select x, ema(x, 0.1) over(w), ema(x, 0.2) over(w) from data window w as (order by n asc) limit 5;
     x     |      ema      |      ema
-----------+---------------+---------------
 44.988564 |     44.988564 |     44.988564
   39.5634 |    44.4460476 |    43.9035312
 38.605724 |   43.86201524 |   42.84396976
 38.209646 |  43.296778316 |  41.917105008
 44.541264 | 43.4212268844 | 42.4419368064
These numbers seem to match up to the spreadsheet you added to the question.
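As a cross-check outside the database, the same state-transition logic can be sketched in Python. This is only an illustration under assumptions: the three input values are copied from the output above, and `ema_step` and `running_ema` are hypothetical names that mirror `ema_func` and the windowed aggregate call.

```python
from decimal import Decimal


def ema_step(state, inval, alpha):
    """Mirror of ema_func: state is None (NULL) on the first row."""
    if state is None:
        return inval
    return alpha * inval + (1 - alpha) * state


def running_ema(values, alpha):
    """Return the state after every row, like ema(x, alpha) over (order by n)."""
    state = None  # the aggregate starts from a NULL state
    out = []
    for value in values:
        state = ema_step(state, value, alpha)
        out.append(state)
    return out


xs = [Decimal("44.988564"), Decimal("39.5634"), Decimal("38.605724")]
for x, e in zip(xs, running_ema(xs, Decimal("0.1"))):
    print(x, e)
```

The point of the sketch is that a plain fold keeps only the last state, while the window form emits the state after each row.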
Also, you can define the function to pass alpha as a parameter from the statement:
create or replace function ema_func(state numeric, inval numeric, alpha numeric)
returns numeric
language plpgsql as $$
begin
return case
when state is null then inval
else alpha * inval + (1-alpha) * state
end;
end
$$;
create aggregate ema(numeric, numeric) (sfunc = ema_func, stype = numeric);
select x, ema(x, 0.5 /* alpha */) over (order by n asc) from data
Also, this function is actually so simple that it doesn't need to be in plpgsql at all, but can be just a sql function, although before PostgreSQL 9.2 you can't refer to parameters by name in one of those:
create or replace function ema_func(state numeric, inval numeric, alpha numeric)
returns numeric
language sql as $$
select case
when $1 is null then $2
else $3 * $2 + (1-$3) * $1
end
$$;
This type of query can be solved with a recursive CTE - try:
with recursive cte as (
select n, x ema from my_table where n = 1
union all
select m.n, alpha * m.x + (1 - alpha) * cte.ema
from cte
join my_table m on cte.n = m.n - 1
cross join (select ? alpha) a)
select * from cte;
--$1 Stock code
--$2 exponential;
create or replace function fn_ema(text,numeric)
returns numeric as
$body$
declare
alpha numeric := 0.5;
var_r record;
result numeric:=0;
n int;
p1 numeric;
begin
alpha=2/(1+$2);
n=0;
for var_r in(select *
from stock_old_invest
where code=$1 order by stock_time desc)
loop
if n>0 then
result=result+(1-alpha)^n*var_r.price_now;
else
p1=var_r.price_now;
end if;
n=n+1;
end loop;
result=alpha*(result+p1);
return result;
end
$body$
language plpgsql volatile
cost 100;
alter function fn_ema(text,numeric)
owner to postgres;
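As far as I can tell, the loop above evaluates the expanded form alpha * sum over n of (1 - alpha)^n * price_n (newest price first) rather than the recursion from the question. A small Python sketch with made-up prices and exact fractions suggests the two agree except for the weight on the oldest price: the recursion gives it (1 - alpha)^(N-1), the expanded sum gives it alpha * (1 - alpha)^(N-1). The function names are hypothetical.

```python
from fractions import Fraction


def ema_recursive(prices_oldest_first, alpha):
    """EMA(x_1) = x_1, EMA(x_n) = alpha * x_n + (1 - alpha) * EMA(x_{n-1})."""
    ema = prices_oldest_first[0]
    for price in prices_oldest_first[1:]:
        ema = alpha * price + (1 - alpha) * ema
    return ema


def ema_expanded(prices_oldest_first, alpha):
    """alpha * sum((1 - alpha)^n * price_n), with n = 0 for the newest price."""
    newest_first = prices_oldest_first[::-1]
    return alpha * sum((1 - alpha) ** n * p for n, p in enumerate(newest_first))


prices = [Fraction(10), Fraction(12), Fraction(11), Fraction(13)]  # made-up data
alpha = Fraction(2, 1 + 3)  # same 2 / (1 + N) rule as the function, with N = 3
gap = ema_recursive(prices, alpha) - ema_expanded(prices, alpha)
print(gap, (1 - alpha) ** len(prices) * prices[0])
```

For long price histories the gap shrinks geometrically, so the difference mostly matters for short series.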
Given the following example function:
CREATE OR REPLACE FUNCTION add_max_value(_x BIGINT)
RETURNS BIGINT
LANGUAGE sql
AS $$
SELECT 9223372036854775807 + _x;
$$;
If this function is called with any positive value, the following error is returned:
SELECT add_max_value(1); -- Expecting -9223372036854775808 if math wrapped
-- SQL Error [22003]: ERROR: bigint out of range
How can I do wrap-on-overflow integer math in Postgres?
Please note:
I want to do this in the database, not in the application
I don't want it to promote to an arbitrary precision integer (NUMERIC)
Although the example only does addition, in practice I'm interested in other operations as well
As a SQL function there isn't a way. SQL functions cannot process exceptions. But a plpgsql function can:
CREATE OR REPLACE FUNCTION add_max_value(_x BIGINT)
RETURNS BIGINT
LANGUAGE plpgsql
AS $$
declare
bigx bigint;
begin
bigx = 9223372036854775807 + _x;
return bigx;
exception
when sqlstate '22003' then
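-- subtract 2^64 as an exact numeric literal; the expression 2^64 is double precision and would lose the low bits
return (9223372036854775807::numeric + _x - 18446744073709551616)::bigint;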
end;
$$;
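The arithmetic behind the exception handler is a reduction modulo 2^64 into the signed 64-bit range. A minimal Python sketch follows; Python integers are arbitrary precision, so this only illustrates the arithmetic and is not Postgres code, and `wrap_int64` is a hypothetical helper. It also shows why the correction has to be carried out in exact arithmetic rather than in double precision.

```python
INT64_MIN = -(2 ** 63)
TWO_64 = 2 ** 64


def wrap_int64(value):
    """Reduce any integer to the signed 64-bit range, two's-complement style."""
    return (value - INT64_MIN) % TWO_64 + INT64_MIN


def add_max_value(x):
    return wrap_int64(9223372036854775807 + x)


print(add_max_value(1))  # wraps to the most negative bigint
print(add_max_value(2))  # one above the most negative bigint

# Doing the same correction in double precision rounds away the low bit:
print(int(float(9223372036854775807 + 2) - 2.0 ** 64))
```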
This is easily in my top five of the most wasteful SQL I have ever written:
create or replace function add_max_value(_x bigint)
returns bigint
language sql
as $$
with recursive inputs as (
select s.rn, r.a::int, s.b::int, (r.a::int + s.b::int) % 2 as sumbit,
(r.a::bit & s.b::bit)::int as carry
from regexp_split_to_table((9223372036854775807::bit(64))::text, '') with ordinality as r(a, rn)
join regexp_split_to_table((_x::bit(64))::text, '') with ordinality as s(b, rn)
on s.rn = r.rn
), addition as (
select rn, sumbit, sumbit as s2, carry, carry as upcarry
from inputs
where rn = 64
union
select i.rn, i.sumbit, (i.sumbit + a.upcarry) % 2, i.carry,
(i.carry::bit | a.upcarry::bit)::int
from addition a
join inputs i on i.rn = a.rn - 1
)
select (string_agg(s2::text, '' order by rn)::bit(64))::bigint
from addition
$$;
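The query above is, in effect, a ripple-carry adder over the 64 bits of the two operands. For comparison, a generic version of that technique can be sketched in Python. This is not a translation of the SQL, and the helper names are made up.

```python
def to_bits(value, width=64):
    """Two's-complement bits of value, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Interpret LSB-first bits as a signed two's-complement integer."""
    unsigned = sum(bit << i for i, bit in enumerate(bits))
    return unsigned - (1 << len(bits)) if bits[-1] else unsigned


def ripple_add(a, b, width=64):
    """Add bit by bit with a full adder; the final carry is dropped, so the sum wraps."""
    carry = 0
    out = []
    for x, y in zip(to_bits(a, width), to_bits(b, width)):
        out.append(x ^ y ^ carry)
        carry = (x & y) | (carry & (x ^ y))
    return from_bits(out)


print(ripple_add(9223372036854775807, 1))
```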
I have a function defined by
CREATE OR REPLACE FUNCTION public.div(dividend INTEGER, divisor INTEGER)
RETURNS INTEGER
LANGUAGE 'sql'
IMMUTABLE
LEAKPROOF
STRICT
SECURITY DEFINER
PARALLEL SAFE
AS $BODY$
SELECT ($1 + $2/2) / $2;
$BODY$;
It should calculate a commercially rounded result. Most of the time it does the job. I don't know why, but select div(5, 3) gives me the correct answer, while it doesn't when one parameter is calculated by an aggregate; e.g. select div(sum(val), 3) from (select 1 as val UNION SELECT 4) list is sufficient to trigger that.
How can I fix div? I don't want to cast every input.
BTW, using SELECT (cast($1 as integer) + cast($2 as integer)/2) / cast($2 as integer); as the definition of div didn't help.
Allow floats as parameters, then explicitly cast at the calculation, otherwise you have an implied conversion whilst passing the parameter.
CREATE OR REPLACE FUNCTION my_div(dividend FLOAT, divisor FLOAT)
RETURNS INTEGER
LANGUAGE 'sql'
IMMUTABLE
-- LEAKPROOF -- not allowed at dbfiddle.uk
STRICT
SECURITY DEFINER
PARALLEL SAFE
AS $BODY$
SELECT --($1 + $2/2) / $2;
(cast($1 as integer) + cast($2 as integer)/2) / cast($2 as integer)
$BODY$;
select my_div(sum(val), 3)
from (select 1 as val UNION SELECT 4) x
| my_div |
| -----: |
| 2 |
Change the name of the function.
The function div(numeric, numeric) is a builtin Postgres function and there is an ambiguity which function you want to call:
select div(5, 3) -- calls your function public.div(integer, integer)
select div(5::bigint, 3) -- calls pg_catalog.div(numeric, numeric)
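In the second case there is no exact match, so the arguments have to be coerced. bigint is not implicitly cast to integer, but it is implicitly cast to numeric, so the system function is the one chosen.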
Note that the function sum(integer) gives bigint as a result.
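To make the symptom concrete, here is a small Python sketch of the two calculations for positive inputs. It assumes, as documented for the builtin, that div(numeric, numeric) returns the quotient truncated towards zero; the function names are hypothetical, and Python's // floors rather than truncates, so the comparison only holds for non-negative values.

```python
def rounded_div(dividend, divisor):
    """The question's formula: ($1 + $2/2) / $2 with integer division."""
    return (dividend + divisor // 2) // divisor


def truncating_div(dividend, divisor):
    """Integer quotient without rounding, as the builtin div would give."""
    return dividend // divisor


total = 1 + 4  # sum(val) over the two rows in the question
print(rounded_div(total, 3), truncating_div(total, 3))
```

So the user-defined function yields 2 for these inputs while a truncating quotient yields 1, which matches the discrepancy described in the question.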
I'm using PostgreSQL 9.3.
The table partner.partner_statistic contains the following columns:
id reg_count
serial integer
I wrote the function convert(integer):
CREATE FUNCTION convert(d integer) RETURNS integer AS $$
BEGIN
--Do something and return integer result
END
$$ LANGUAGE plpgsql;
And now I need to write a function that returns an array of integers, as follows:
CREATE FUNCTION res() RETURNS integer[] AS $$
<< outerblock >>
DECLARE
arr integer[]; --That array of integers I need to fill in depends on the result of query
r partner.partner_statistic%rowtype;
table_name varchar DEFAULT 'partner.partner_statistic';
BEGIN
FOR r IN
SELECT * FROM partner.partner_statistic offset 0 limit 100
LOOP
--
-- I need to add convert(r[reg_count]) to arr where r[id] = 0 (mod 5)
--
-- How can I do that?
END LOOP;
RETURN;
END;
$$ LANGUAGE plpgsql;
You don't need (and shouldn't use) PL/PgSQL loops for this. Just use an aggregate. I'm kind of guessing about what you mean by "where r[id] = 0 (mod 5)", but I'm assuming you mean "where id is evenly divisible by 5". (Note that this is NOT the same thing as "every fifth row", because generated IDs have gaps.)
Something like:
SELECT array_agg(r.reg_count)
FROM partner.partner_statistic r
WHERE r.id % 5 = 0
LIMIT 100
probably meets your needs.
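The idea of replacing the loop with a filter followed by an aggregate can be illustrated outside SQL as well. In this Python sketch the rows are made-up data and `convert` is a hypothetical stand-in for the question's convert(integer), whose body was not shown.

```python
def convert(reg_count):
    """Hypothetical stand-in for the question's convert(integer)."""
    return reg_count * 2


# (id, reg_count) pairs; made-up data with gaps in the ids
rows = [(5, 10), (6, 11), (10, 12), (12, 13), (15, 14)]

# Filter on id, then collect the converted values into one array
arr = [convert(reg_count) for row_id, reg_count in rows if row_id % 5 == 0]
print(arr)
```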
If you want to return the value, use RETURN QUERY SELECT ... or preferably use a simple sql language function.
If you want a dynamic table name, use:
-- %s because a regclass value is already quoted as needed; %% is a literal % inside format()
RETURN QUERY EXECUTE format('
SELECT array_agg(r.reg_count)
FROM %s r
WHERE r.id %% 5 = 0
LIMIT 100', table_name::regclass);
I want to write an SQL query that computes the iterated sequence of some function until it reaches a fixed point, and then returns that fixed point:
f(f(f(f ..(x)))) = x0 = f(x0)
E.g. let f(x) = (256/x + x)/2:
create function f(x float) returns float as $$
select (256/x + x) / 2
$$ language sql;
Here is my attempt to write the query:
create function f_sequence(x float) returns table(x0 float) as $$
with recursive
t(a,b) as
(select x, f(x)
union all
select b, f(b) from t where a <> b)
select a from t;
$$ language sql;
Now it is possible to get the iterated sequence, which converges to a fixed point:
=# select f_sequence(333);
f_sequence
------------------
333
166.884384384384
84.2091902577822
43.6246192451207
24.7464326525125
17.5456790321891
16.0680829640781
16.0001442390486
16.0000000006501
16
(10 rows)
(Actually it converges to √256 because this is the Babylonian method of calculating square roots.)
Now I need one additional query to get only the last row from the sequence:
=# with
res as (select array_agg(f_sequence) as res from f_sequence(333))
select res[array_length(res,1)] from res;
res
-----
16
(1 row)
The question is: how can I write this more concisely?
In particular, I don't like the separate query needed to get the last value (or having to accumulate all intermediate values in an array).
Leave the definition of f as it is.
In your sequence generation, retain the value of f(x) in the output.
create function f_sequence(x float) returns table(x0 float, fx float) as $$
with recursive
t(a,b) as
(select x, f(x)
union all
select b, f(b) from t where a <> b)
select a, b from t;
$$ language sql;
Then limit your result to just the fixed value.
select x0 from f_sequence(256) where x0 = fx;
Edit: Add a procedural version.
create function iterf(x float) returns float as $$
declare fx float := f(x);
begin
while fx != x loop
x := fx;
fx := f(x);
end loop;
return fx;
end;
$$ language plpgsql;
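For comparison, the same loop can be sketched in Python, which uses the same IEEE double arithmetic as float in Postgres. The `max_iter` guard is my own addition, on the assumption that a function might cycle between two floating-point values instead of settling; `fixed_point` is a hypothetical name.

```python
def f(x):
    """Babylonian step for the square root of 256."""
    return (256 / x + x) / 2


def fixed_point(func, x, max_iter=1000):
    """Iterate func from x until func(x) == x, or give up after max_iter steps."""
    for _ in range(max_iter):
        fx = func(x)
        if fx == x:
            return x
        x = fx
    raise RuntimeError("no fixed point reached within max_iter iterations")


print(fixed_point(f, 333.0))
```

Comparing floats for exact equality works here because the iteration settles on a representable value; for other functions a tolerance may be the safer stopping rule.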
I created a function like the one below:
CREATE TYPE points AS (
"gid" double precision
, "elevation" double precision
, "distance" double precision
, "x" double precision
, "y" double precision
);
CREATE OR REPLACE FUNCTION public.nd_test4()
RETURNS SETOF points AS $$
DECLARE
sql text;
rec points;
BEGIN
sql := 'select
"gid"
, "elev" as "elevation"
, st_distance(ST_MakePoint(1093147, 1905632) , "the_geom" ) as "distance"
, st_x("the_geom") as "x"
, st_y("the_geom")as "y"
from
"elevation-test"
where
st_within("the_geom" ,st_buffer(ST_MakePoint(1093147, 1905632), 15) )
order by distance limit 4';
FOR rec IN EXECUTE(sql) LOOP
RETURN NEXT rec;
END LOOP;
END;
$$ LANGUAGE plpgsql VOLATILE;
And when I run the function like select nd_test4();, I get a result with no field names.
How can I get a result with field names like this:
gid | elevation | distance | x | y
----+-----------+----------+---------+-------
1 | 350.0 | 10 | 12345.1 | 12435
Call the function with:
SELECT * FROM nd_test4();
Also, your function definition is needlessly convoluted. Simplify to:
CREATE OR REPLACE FUNCTION public.nd_test4()
RETURNS SETOF points AS
$func$
BEGIN
RETURN QUERY
SELECT gid
,elev -- AS elevation
,st_distance(ST_MakePoint(1093147, 1905632), the_geom) AS distance -- alias kept: ORDER BY refers to it
,st_x(the_geom) -- AS x
,st_y(the_geom) -- AS y
FROM "elevation-test"
WHERE st_within(the_geom, st_buffer(ST_MakePoint(1093147, 1905632), 15))
ORDER BY distance
LIMIT 4;
END
$func$ LANGUAGE plpgsql;
Or better yet, use a plain SQL function here:
CREATE OR REPLACE FUNCTION public.nd_test4()
RETURNS SETOF points AS
$func$
SELECT gid
,elev -- AS elevation
,st_distance(ST_MakePoint(1093147, 1905632), the_geom) AS distance -- alias kept: ORDER BY refers to it
,st_x(the_geom) -- AS x
,st_y(the_geom) -- AS y
FROM "elevation-test"
WHERE st_within(the_geom, st_buffer(ST_MakePoint(1093147, 1905632), 15))
ORDER BY distance
LIMIT 4
$func$ LANGUAGE sql;
No need for dynamic SQL.
I also stripped the gratuitous double quotes. Not needed for legal, lower-case identifiers. Exception is "elevation-test". You shouldn't use an operator (-) as part of a table name. That's just begging for trouble.
Aliases in the function body are replaced by column names of the composite type. They are only visible inside the function and therefore just documentation in your case.