I've written an aggregate function that finds the minimum value in a column and then adds Laplacian noise to it.
I am using PostgreSQL's PL/pgSQL language.
The aggregate works perfectly, but I'd like to know if there is any way to improve the code that I wrote.
/*
* PL/pgSQL function that behaves like the MIN(col) aggregate and then adds Laplacian noise.
* For the sensitivity (the upper bound of the query), we use half of the maximum value of the column.
* The array passed in contains all the column values, which are compared to find the minimum.
* Then we compute the Laplacian term (sensitivity/epsilon). This value is added to the minimum to perturb
* the final result.
*/
CREATE OR REPLACE FUNCTION addLaplacianNoiseMin (real[]) RETURNS real AS $$
DECLARE
i real;
minVal real; --minimum value which is found in the column and then disturbed
laplaceNoise real; --laplacian distribution which is computed finding the halfed maximum value, divided by an arbitrary epsilon (small value)
epsilon real := 1.2;
sensivity real; --our computed upper bound
maxVal real;
BEGIN
minVal := $1[1];
maxVal := $1[1];
IF ARRAY_LENGTH($1,1) > 0 THEN --Checking whether the array is empty or not
<<confrontoMinimo>>
FOREACH i IN ARRAY $1 LOOP --Looping through the entire array, passed as parameter
IF i < minVal THEN --Track the minimum and the maximum independently
minVal := i;
END IF;
IF i > maxVal THEN
maxVal := i;
END IF;
END LOOP confrontoMinimo;
ELSE
RAISE NOTICE 'Invalid parameter % passed to the aggregate function',$1;
--Raising exception if the parameter passed as argument points to null.
RAISE EXCEPTION 'Cannot find MIN value. Parameter % is null', $1
USING HINT = 'You cannot pass a null array! Check the passed parameter';
END IF;
sensivity := maxVal/2;
laplaceNoise := sensivity/(epsilon);
RAISE NOTICE 'minVal: %, maxVal: %, sensivity: %, laplaceNoise: %', minVal, maxVal, sensivity,laplaceNoise;
minVal := laplaceNoise + minVal;
RETURN minVal;
END;
$$ LANGUAGE plpgsql;
CREATE AGGREGATE searchMinValueArray (real)
(
sfunc = array_append,
stype = real[],
finalfunc = addLaplacianNoiseMin,
initCond = '{}'
);
Yes, you can improve that by not using an array as state for the aggregate, but a composite type like:
CREATE TYPE aggstate AS (minval real, maxval real);
Then you can perform the operation from your loop in the SFUNC and don't have to keep an array in memory that can possibly become very large. The FINALFUNC would then become very simple.
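A minimal sketch of what that could look like is below. The names minmax_step, noisy_min_final and noisy_min are made up for illustration, and the final function simply reuses the question's formula (minimum plus half the maximum divided by a fixed epsilon of 1.2); it is not a statement about how the noise ought to be computed.

```sql
-- Composite state from the answer above (skip this line if the type already exists).
CREATE TYPE aggstate AS (minval real, maxval real);

-- Hypothetical transition function: folds one input value into the running (min, max) pair.
CREATE OR REPLACE FUNCTION minmax_step(state aggstate, val real)
RETURNS aggstate
LANGUAGE plpgsql IMMUTABLE AS $$
BEGIN
    IF val IS NULL THEN
        RETURN state;  -- skip NULL inputs, as the built-in MIN does
    END IF;
    -- LEAST/GREATEST ignore NULL arguments, so the first row initialises both fields.
    RETURN ROW(LEAST(state.minval, val), GREATEST(state.maxval, val))::aggstate;
END;
$$;

-- Hypothetical final function: same arithmetic as the question's addLaplacianNoiseMin.
CREATE OR REPLACE FUNCTION noisy_min_final(state aggstate)
RETURNS real
LANGUAGE plpgsql IMMUTABLE AS $$
DECLARE
    epsilon CONSTANT real := 1.2;  -- fixed epsilon, copied from the question
BEGIN
    IF state.minval IS NULL THEN
        RETURN NULL;  -- no non-NULL input rows
    END IF;
    RETURN state.minval + (state.maxval / 2) / epsilon;
END;
$$;

CREATE AGGREGATE noisy_min (real)
(
    sfunc = minmax_step,
    stype = aggstate,
    finalfunc = noisy_min_final
);

-- Example call on inline data: min = 2, max = 10, so the result is 2 + (10 / 2) / 1.2.
SELECT noisy_min(v::real) FROM (VALUES (4), (10), (2)) AS t(v);
```

Because no initCond is given, the state starts out as NULL and the transition function has to cope with that, which is why it leans on LEAST and GREATEST skipping NULL arguments.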
Related
Can someone tell me what's wrong with my code? I need to create a function that returns the number of digits in a given number, but I keep getting a "missing in and out parameter" error. I am using Oracle SQL. Thank you.
SET SERVEROUTPUT ON;
CREATE OR REPLACE FUNCTION Digit (n1 IN OUT INTEGER) RETURN INTEGER IS
Counter INTEGER := 0;
BEGIN
WHILE (n1 != 0 ) LOOP
n1 := n1 /10;
Counter := Counter + 1;
END LOOP;
RETURN Counter;
END;
Test block:
DECLARE
n1 INTEGER := 0;
BEGIN:
n1 := &n1;
DBMS_OUTPUT.PUT_LINE('The number of digit = ' ||Digit(Counter));
END;
The error is probably because of the stray : character after begin in your test block.
I would write it like this:
create or replace function digits
( p_num integer )
return integer
as
pragma udf;
i simple_integer := p_num;
l_digits simple_integer := 0;
begin
while i <> 0 loop
i := i / 10;
l_digits := l_digits + 1;
end loop;
return l_digits;
end digits;
I made the parameter in only, instead of in out. This means you can use it in SQL queries, and also in PL/SQL code without needing to pass in a variable whose value will get changed to 0 by the function.
pragma udf tells the compiler to optimise the function for use in SQL.
I used simple_integer as in theory it's slightly more efficient for arithmetic operations, although I doubt any improvement is measurable in the real world (and I'm rather trusting the optimising compiler to cast my literal 10 as a simple_integer, as otherwise the overhead of type conversion will defeat any arithmetic efficiency).
I've found a Verhoeff checksum function for PostgreSQL at: https://github.com/HIISORG/SNOMED-CT-PostgreSQL/blob/master/Verhoeff.sql
CREATE OR REPLACE FUNCTION verhoeff_generate (
input numeric = NULL::numeric
)
RETURNS smallint AS $$
DECLARE
_c SMALLINT := 0;
_m SMALLINT;
_i SMALLINT := 0;
_n VARCHAR(255);
-- Declare array
_d CHAR(100) := '0123456789123406789523401789563401289567401239567859876043216598710432765982104387659321049876543210';
_p CHAR(80) := '01234567891576283094580379614289160435279453126870428657390127938064157046913258';
_v CHAR(10) := '0432156789';
BEGIN
_n := REVERSE(input::TEXT);
WHILE _i<length(_n) LOOP
_m := CAST(SUBSTRING(_p,(((_i + 1)%8)*10) + CAST(SUBSTRING(_n, _i+1, 1) AS SMALLINT) + 1, 1) AS SMALLINT);
_c := CAST (substring(_d, (_c *10 + _m + 1), 1) AS SMALLINT);
_i := _i + 1;
END LOOP;
RETURN CONCAT(input, CAST(substring(_v,_c+1,1) as SMALLINT));
END; $$
LANGUAGE 'plpgsql'
IMMUTABLE
RETURNS NULL ON NULL INPUT;
I've modified the RETURN, so that it would concatenate the INPUT with the Checksum digit:
RETURN CONCAT(input, CAST(substring(_v,_c+1,1) as SMALLINT));
And I get the error:
[2020-02-20 11:53:19] [22003] ERROR: value "331010000014" is out of range for type smallint
[2020-02-20 11:53:19] Where: PL/pgSQL function verhoeff_generate(numeric) while casting return value to function's return type
I've tried:
RETURN CONCAT(input, CAST(substring(_v,_c+1,1) as BIGINT));
Still getting the same error.
You've changed the expression that produces the return value from the original smallint to a string. CONCAT always outputs a string: you can cast numbers as many times as you like before you feed them into CONCAT, but they will be converted to strings and concatenated, and the result is a string no matter what you feed into it.
CONCAT is now returning a string with too many digits (it is too numerically large) to fit into a smallint, a conversion that PG attempts to carry out implicitly because the function is declared as RETURNS smallint. This represents the core problem:
CREATE OR REPLACE FUNCTION return_big_number ()
RETURNS smallint AS $$
BEGIN
RETURN '32769'; --string of a number that is too big for a smallint
END; $$
LANGUAGE plpgsql;
'32769' is a string that cannot be converted to a smallint because it is simply too large: smallint caps out at 32767. Similarly, by using CONCAT you are generating a string of digits that represents a number too large for a smallint.
Either change the function declaration at the top so that it returns a suitable string:
RETURNS smallint AS $$
^^^^^^^^
change this to perhaps "RETURNS text AS $$"
Or if having the output as numeric would suit you better, change the function so it declares to return a numeric datatype that can represent more digits than a smallint, and change the return value calculation to keep it numeric (multiply the input by some power of 10, and add the checksum, rather than changing the input to string and concatenating the checksum)
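The numeric option can be sketched as follows. The helper append_check_digit is hypothetical and not part of the original Verhoeff.sql, and it assumes the input is a non-negative whole number.

```sql
-- Hypothetical helper: appends one check digit to a number without going through text.
CREATE OR REPLACE FUNCTION append_check_digit(input numeric, check_digit smallint)
RETURNS numeric
LANGUAGE sql IMMUTABLE AS $$
    SELECT $1 * 10 + $2;  -- shift the input one decimal place to the left, then add the digit
$$;

-- 33101000001 with check digit 4 gives 331010000014, the value from the error message.
SELECT append_check_digit(33101000001, 4::smallint);
```

Inside verhoeff_generate the same idea would mean declaring RETURNS numeric and returning input * 10 + CAST(substring(_v,_c+1,1) AS SMALLINT) instead of calling CONCAT.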
I'm new to PostgreSQL (I'm programming in PL/pgSQL, which isn't much different from SQL).
I wrote my own custom aggregate function, which has to find the min or max value in an array of numeric.
This is the aggregate function code:
CREATE OR REPLACE FUNCTION searchMaxValue (numeric[]) RETURNS numeric AS $$
DECLARE
i numeric;
maxVal numeric;
BEGIN
maxVal = $1[1];
IF ARRAY_LENGHT($1,1) > 0 THEN --Checking whether the array is empty or not
<<confrontoMassimo>>
FOREACH i IN ARRAY $1 LOOP --Looping through the entire array, passed as parameter
IF maxVal <= $1[i] THEN
maxVal := $1[i];
END IF;
END LOOP confrontoMassimo;
ELSE
RAISE NOTICE 'Invalid parameter % passed to the aggregate function',$1;
--Raising exception if the parameter passed as argument points to null.
RAISE EXCEPTION 'Cannot find Max value. Parameter % is null', $1
USING HINT = 'You cannot pass a null array! Check the passed parameter';
END IF;
RETURN maxVal;
END;
$$ LANGUAGE plpgsql;
CREATE AGGREGATE searchMaxValueArray (numeric)
(
sfunc = array_append,
stype = numeric[],
finalfunc = searchMaxValue,
initCond = '{}'
);
The problem is, that it doesn't work as expected. What is the problem?
As mentioned above, there's a small typo in your function; ARRAY_LENGHT should be ARRAY_LENGTH.
Aside from that, the only issue I can see is here:
FOREACH i IN ARRAY $1 LOOP
IF maxVal <= $1[i] THEN
...
In a FOREACH loop, the target variable i isn't the array index, it's the array element itself. In other words, the loop should be:
FOREACH i IN ARRAY $1 LOOP
IF maxVal <= i THEN
maxVal := i;
END IF;
END LOOP;
With those changes, it seems to work as expected: https://rextester.com/FTWB14034
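To see the difference between iterating over elements and iterating over indexes, here is a small standalone sketch. The array contents and variable names are made up for illustration.

```sql
DO $$
DECLARE
    arr  numeric[] := ARRAY[3.5, 9.25, 1];
    elem numeric;
BEGIN
    -- FOREACH assigns each array element itself to the loop variable.
    FOREACH elem IN ARRAY arr LOOP
        RAISE NOTICE 'element: %', elem;
    END LOOP;

    -- An integer FOR loop over the array bounds yields indexes, so arr[idx] is valid here.
    FOR idx IN array_lower(arr, 1) .. array_upper(arr, 1) LOOP
        RAISE NOTICE 'index: %, element: %', idx, arr[idx];
    END LOOP;
END;
$$;
```

Using $1[i] inside FOREACH, as in the question, treats the element value as a subscript, which is why the original loop did not return the expected maximum.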
How can I achieve the functionality below using Oracle SQL or PL/SQL?
This stored procedure gives the same result as NORMDIST function in Calc.
The parameters that need to be passed are x, mean, standard deviation and cumulative.
Cumulative parameter gives a choice to get normal distribution value at x (0) or cumulative probability of value<=x (1).
create or replace FUNCTION NORMDIST(x_value number,mean_value number,stddev_value number, cumulative NUMBER DEFAULT 0)
RETURN NUMBER IS
x number;
t number;
z number;
ans number;
BEGIN
IF (stddev_value = 0) THEN
RETURN 1;
END IF;
x := (x_value-mean_value)/stddev_value;
IF cumulative = 1 THEN
z := abs(x)/SQRT(2);
t := 1/(1+0.5*z);
ans := t*exp(-z*z-1.26551223+t*(1.00002368+t*(0.37409196+t*(0.09678418+t*(-0.18628806+t*(0.27886807+t*(-1.13520398+t*(1.48851587+t*(-0.82215223+t*0.17087277)))))))))/2;
If (x <= 0)
Then RETURN ans;
Else return 1-ans;
End if;
ELSE
RETURN 1/(sqrt(2*3.14159265358979)*stddev_value)*Exp(-(Power(x_value-mean_value,2)/(2*Power(stddev_value,2)) ));
END IF;
END;
/
This is a quick solution; I have not tried to maximise precision or performance. Depending on your requirements, you might need to tweak the number format, precision, calculation logic, etc.
create or replace function calc_sn_pdf(x in number) return number
is
pi CONSTANT NUMBER := 3.14159265358979;
begin
return 1/sqrt(2*pi) * exp(-x*x/2);
end;
/
The cdf must be approximated (it is an integral with no simple closed-form formula); one possible approximation is implemented as follows. Many other approximations can be found on Wikipedia.
create or replace function calc_sn_cdf(x in number) return number
is
b0 CONSTANT NUMBER := 0.2316419;
b1 CONSTANT NUMBER := 0.319381530;
b2 CONSTANT NUMBER := -0.356563782;
b3 CONSTANT NUMBER := 1.781477937;
b4 CONSTANT NUMBER := -1.821255978;
b5 CONSTANT number := 1.330274429;
v_t number;
begin
--see 26.2.17 at http://people.math.sfu.ca/~cbm/aands/page_932.htm
--see https://en.wikipedia.org/wiki/Normal_distribution#Numerical_approximations_for_the_normal_CDF
--Zelen & Severo (1964) approximation
if x < 0 then
--this approximation works for x>=0, but the cdf is symmetric about x=0, so cdf(x) = 1 - cdf(-x):
return 1 - calc_sn_cdf(-x);
else
v_t := 1 / (1 + b0*x);
return 1 - calc_sn_pdf(x)*(b1*v_t + b2*v_t*v_t + b3*v_t*v_t*v_t + b4*v_t*v_t*v_t*v_t + b5*v_t*v_t*v_t*v_t*v_t);
end if;
end;
/
Btw, if you need to call these functions many times, it would be useful to turn on native PL/SQL compilation.
--I wrote this function in PL/SQL and it works. I compared results with the NORMDIST
--function in Excel and the results match very closely. You will need to pass the
--following parameters to the function:
-- 1. Value of X
-- 2. Value of Mean
-- 3. Value of Standard Deviation
--This function returns the same result as passing cumulative=TRUE in Excel.
create or replace FUNCTION NORMSDIST(x_value number,mean_value number,stddev_value number)
RETURN NUMBER IS
x number;
t number;
z number;
ans number;
BEGIN
IF (stddev_value = 0) THEN
RETURN 1;
END IF;
x := (x_value-mean_value)/stddev_value;
z := abs(x)/SQRT(2);
t := 1.0/(1.0+0.5*z);
ans := t*exp(-z*z-1.26551223+t*(1.00002368+t*(0.37409196+t*(0.09678418+t*(-0.18628806+t*(0.27886807+t*(-1.13520398+t*(1.48851587+t*(-0.82215223+t*0.17087277)))))))))/2.0;
If (x <= 0)
Then RETURN ans;
Else return 1-ans;
End if;
END NORMSDIST;
I'm trying to implement an exponential moving average (EMA) on postgres, but as I check documentation and think about it the more I try the more confused I am.
The formula for EMA(x) is:
EMA(x1) = x1
EMA(xn) = α * xn + (1 - α) * EMA(xn-1)
It seems to be perfect for an aggregator, keeping the result of the last calculated element is exactly what has to be done here. However an aggregator produces one single result (as reduce, or fold) and here we need a list (a column) of results (as map). I have been checking how procedures and functions work, but AFAIK they produce one single output, not a column. I have seen plenty of procedures and functions, but I can't really figure out how does this interact with relational algebra, especially when doing something like this, an EMA.
I did not have luck searching the Internets so far. But the definition for an EMA is quite simple, I hope it is possible to translate this definition into something that works in postgres and is simple and efficient, because moving to NoSQL is going to be excessive in my context.
Thank you.
P.S.: here you can see an example:
https://docs.google.com/spreadsheet/ccc?key=0AvfclSzBscS6dDJCNWlrT3NYdDJxbkh3cGJ2S2V0cVE
You can define your own aggregate function and then use it with a window specification to get the aggregate output at each stage rather than a single value.
So an aggregate is a piece of state, and a transform function to modify that state for each row, and optionally a finalising function to convert the state to an output value. For a simple case like this, just a transform function should be sufficient.
create function ema_func(numeric, numeric) returns numeric
language plpgsql as $$
declare
alpha numeric := 0.5;
begin
-- uncomment the following line to see what the parameters mean
-- raise info 'ema_func: % %', $1, $2;
return case
when $1 is null then $2
else alpha * $2 + (1 - alpha) * $1
end;
end
$$;
create aggregate ema(basetype = numeric, sfunc = ema_func, stype = numeric);
which gives me:
steve#steve#[local] =# select x, ema(x, 0.1) over(w), ema(x, 0.2) over(w) from data window w as (order by n asc) limit 5;
x | ema | ema
-----------+---------------+---------------
44.988564 | 44.988564 | 44.988564
39.5634 | 44.4460476 | 43.9035312
38.605724 | 43.86201524 | 42.84396976
38.209646 | 43.296778316 | 41.917105008
44.541264 | 43.4212268844 | 42.4419368064
These numbers seem to match up to the spreadsheet you added to the question.
Also, you can define the function to pass alpha as a parameter from the statement:
create or replace function ema_func(state numeric, inval numeric, alpha numeric)
returns numeric
language plpgsql as $$
begin
return case
when state is null then inval
else alpha * inval + (1-alpha) * state
end;
end
$$;
create aggregate ema(numeric, numeric) (sfunc = ema_func, stype = numeric);
select x, ema(x, 0.5 /* alpha */) over (order by n asc) from data;
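Also, this function is actually so simple that it doesn't need to be in plpgsql at all; it can be a plain sql function. The version below refers to its parameters by position ($1, $2, $3), which works everywhere; referring to them by name inside a sql function requires PostgreSQL 9.2 or later: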
create or replace function ema_func(state numeric, inval numeric, alpha numeric)
returns numeric
language sql as $$
select case
when $1 is null then $2
else $3 * $2 + (1-$3) * $1
end
$$;
This type of query can be solved with a recursive CTE - try:
with recursive cte as (
select n, x ema from my_table where n = 1
union all
select m.n, alpha * m.x + (1 - alpha) * cte.ema
from cte
join my_table m on cte.n = m.n - 1
cross join (select ? alpha) a)
select * from cte;
--$1 Stock code
--$2 exponential;
create or replace function fn_ema(text,numeric)
returns numeric as
$body$
declare
alpha numeric := 0.5;
var_r record;
result numeric:=0;
n int;
p1 numeric;
begin
alpha=2/(1+$2);
n=0;
for var_r in(select *
from stock_old_invest
where code=$1 order by stock_time desc)
loop
if n>0 then
result=result+(1-alpha)^n*var_r.price_now;
else
p1=var_r.price_now;
end if;
n=n+1;
end loop;
result=alpha*(result+p1);
return result;
end
$body$
language plpgsql volatile
cost 100;
alter function fn_ema(text,numeric)
owner to postgres;