Custom aggregate function

Custom aggregate function - sql

I am trying to understand aggregate functions and I need help.
So for instance the following sample:
CREATE OR REPLACE FUNCTION array_median(timestamp[])
RETURNS timestamp AS
$$
SELECT CASE WHEN array_upper($1,1) = 0 THEN null ELSE asorted[ceiling(array_upper(asorted,1)/2.0)] END
FROM (SELECT ARRAY(SELECT ($1)[n] FROM
generate_series(1, array_upper($1, 1)) AS n
WHERE ($1)[n] IS NOT NULL
ORDER BY ($1)[n]
) As asorted) As foo ;
$$
LANGUAGE 'sql' IMMUTABLE;
CREATE AGGREGATE median(timestamp) (
SFUNC=array_append,
STYPE=timestamp[],
FINALFUNC=array_median
)
I am not understanding the structure/logic that needs to go into the select statement in the aggregate function itself. Can someone explain what the flow/logic is?
I am writing an aggregate, a strange one, that the return is always the first string it ever sees.

You're showing a median calculation, but want the first text value you see?
Below is how to do that. Assuming you want the first non-null value, that is. If not, you'll need to keep track of if you've got a value already or not.
The accumulator function is written as plpgsql and sql - the plpgsql one lets you use variable names and debug it too. It simply uses COALESCE against the previous accumulated value and the new value and returns the first non-null. So - as soon as you have a non-null in the accumulator everything else gets ignored.
You may also want to consider the "first_value" window function for this sort of thing if you're on a modern (8.4+) version of PostgreSQL.
http://www.postgresql.org/docs/9.1/static/functions-window.html
HTH
BEGIN;
CREATE FUNCTION remember_first(acc text, newval text) RETURNS text AS $$
BEGIN
RAISE NOTICE '% vs % = %', acc, newval, COALESCE(acc, newval);
RETURN COALESCE(acc, newval);
END;
$$ LANGUAGE plpgsql IMMUTABLE;
CREATE FUNCTION remember_first_sql(text,text) RETURNS text AS $$
SELECT COALESCE($1, $2);
$$ LANGUAGE SQL IMMUTABLE;
-- No "initcond" means we start out with null
--
CREATE AGGREGATE first(text) (
sfunc = remember_first,
stype = text
);
CREATE TEMP TABLE tt (t text);
INSERT INTO tt VALUES ('abc'),('def'),('ghi');
SELECT first(t) FROM tt;
ROLLBACK;

Related

Compute an aggregated tsrange from a set of entries?

I am trying to compute a aggregated tsrange from a set of row that I extract from an SQL query. Problem is that I keep getting errors that the input parameter is not being passed in.
CREATE OR REPLACE AGGREGATE range_merge(anyrange)
(
sfunc = range_merge,
stype = anyrange
);
DROP FUNCTION IF EXISTS aggregate_validity(entity_name regclass, entry bigint);
CREATE OR REPLACE FUNCTION aggregate_validity(entity_name regclass, entry bigint) returns tsrange AS
$$
DECLARE
result tsrange;
BEGIN
EXECUTE format('select range_merge(valid) from %s where entity_id = %U', entity_name, entry) into result;
return result;
END
$$ LANGUAGE plpgsql;
When I do:
select * from aggregate_validity(country, 1);
I get an error stating that the entity name and entry do not exist. It does not seem to parameterize the input into the statement properly.

Function:
EXECUTE format('select range_merge(valid) from %s where entity_id=%U',entity_name, entry)
into result;
=>
EXECUTE format('select range_merge(valid) from %I where entity_id=%s',entity_name, entry)
into result;
--%I for identifier, %s for value
Call:
select * from aggregate_validity(country, 1)
=>
select * from aggregate_validity('country', 1);
db<>fiddle demo

CREATE OR REPLACE AGGREGATE range_merge(anyrange) (
SFUNC = range_merge
, STYPE = anyrange
);
-- DROP FUNCTION IF EXISTS aggregate_validity(entity_name regclass, entry bigint);
CREATE OR REPLACE FUNCTION aggregate_validity(entity_name regclass, entry bigint, OUT result tsrange)
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE 'SELECT range_merge(valid) FROM ' || entity_name || ' WHERE entity_id = $1'
INTO result
USING entry;
END
$func$;
Call:
SELECT aggregate_validity('country', 1);
db<>fiddle here
The call does not need SELECT * FROM, as the function returns a single value per definition.
I used an OUT parameter to simplify (OUT result tsrange). See:
Returning from a function with OUT parameter
Don't concatenate the entry value into the SQL string. Pass it as value with the USING clause. Cleaner, faster.
Since entity_name is passed as regclass, it's safe to simply concatenate (which is a bit cheaper). See:
Table name as a PostgreSQL function parameter
Plus, missing quotes and incorrect format specifiers, as Lukasz already provided.
Your custom aggregate function range_merge() has some caveats:
I wouldn't name it "range_merge", that being the name of the plain function range_merge(), too. While that's legal, it still invites confusing errors.
You are aware that the function range_merge() includes gaps between input ranges in the output range?
range_merge() returns NULL for any NULL input. So if your table has any NULL values in the column valid, the result is always NULL. I strongly suggest that any involved columns shall be defined as NOT NULL.
If you are at liberty to install additional modules, consider range_agg by Paul Jungwirth who is also here on Stackovflow. It provides the superior function range_agg() addressing some of the mentioned issues.
If you don't want to include gaps, consider the Postgres Wiki page on range aggregation.
I would probably not use aggregate_validity() at all. It obscures the nested functionality from the Postgres query planner and may lead so suboptimal query plans. Typically, you can replace it with a correlated or a LATERAL subquery, which can be planned and optimized by Postgres in context of the outer query. I appended a demo to the fiddle:
db<>fiddle here
Related:
What is the difference between LATERAL and a subquery in PostgreSQL?

<column> is ambiguous in column comparison between two tables

I want this postgres function to work :
CREATE OR REPLACE FUNCTION difference_of_match_ids_in_match_history_and_match_results()
returns table(match_id BIGINT)
as
$$
BEGIN
return QUERY
SELECT *
FROM sports.match_history
WHERE match_id NOT IN (SELECT match_id
FROM sports.match_results);
END $$
LANGUAGE 'plpgsql';
This stand alone query works just fine:
SELECT *
FROM sports.match_history
WHERE match_id NOT IN (SELECT match_id FROM sports.match_results);
But when I put it into this function and try to run it like this:
select *
from difference_of_match_ids_in_match_history_and_match_results();
I get this:
SQL Error [42702]: ERROR: column reference "match_id" is ambiguous
Detail: It could refer to either a PL/pgSQL variable or a table
column. Where: PL/pgSQL function
difference_of_match_ids_in_match_history_and_match_results() line 3 at
RETURN QUERY
I've seen other questions with this same error, and they suggest naming the sub queries to specify which instance of a column you're referring to, however, those examples use joins and my query works fine outside of the function.
If I do need to name the column, how would I do so with only one sub-query?
If that isn't the issue, then I'm assuming that there's something wrong with the way I'm defining a function.

You query is fine. The ambiguity is on the match_id in returns table(match_id BIGINT) rename it or prefix the columns with the table name in your query
CREATE OR REPLACE FUNCTION difference_of_match_ids_in_match_history_and_match_results()
returns table(new_name BIGINT)
as
$$
BEGIN
return QUERY
SELECT *
FROM sports.match_history
WHERE match_id NOT IN (SELECT match_id
FROM sports.match_results);
END $$
LANGUAGE 'plpgsql';
or
CREATE OR REPLACE FUNCTION difference_of_match_ids_in_match_history_and_match_results()
returns table(match_id BIGINT)
as
$$
BEGIN
return QUERY
SELECT sports.match_history.match_id
FROM sports.match_history
WHERE sports.match_history.match_id NOT IN (SELECT sports.match_results.match_id
FROM sports.match_results);
END $$
LANGUAGE 'plpgsql';
Didn't test the code.

The structure of the result set must match the function result type. If you want to get only match_ids:
CREATE OR REPLACE FUNCTION difference_of_match_ids_in_match_history_and_match_results()
RETURNS TABLE(m_id BIGINT) -- !!
AS
$$
BEGIN
RETURN QUERY
SELECT match_id -- !!
FROM sports.match_history
WHERE match_id NOT IN (SELECT match_id
FROM sports.match_results);
END $$
LANGUAGE 'plpgsql';
If you want to get whole rows as a result:
DROP FUNCTION difference_of_match_ids_in_match_history_and_match_results();
CREATE OR REPLACE FUNCTION difference_of_match_ids_in_match_history_and_match_results()
RETURNS SETOF sports.match_history -- !!
AS
$$
BEGIN
RETURN QUERY
SELECT * -- !!
FROM sports.match_history
WHERE match_id NOT IN (SELECT match_id
FROM sports.match_results);
END $$
LANGUAGE 'plpgsql';

As others have answerd, it's an ambiguity between the result definition and PL/pgSQL variables. The column name in a set returning function is in fact also a variable inside the function.
But you don't need PL/pgSQL for this in the first place. If you use a plain SQL function it will be more efficient and the problem will go away as well:
CREATE OR REPLACE FUNCTION difference_of_match_ids_in_match_history_and_match_results()
returns table(match_id BIGINT)
as
$$
SELECT match_id --<< do not return * - only return one column
FROM sports.match_history
WHERE match_id NOT IN (SELECT match_id
FROM sports.match_results);
$$
LANGUAGE sql;
Note that the language name is an identifier and should not be quoted at all.

The naming conflict between column names and plpgsql OUT parameters has been addressed. More details here:
Postgresql - INSERT RETURNING INTO ambiguous column reference
I would also use a different query style. NOT IN (SELECT ...) is typically slowest and carries traps with NULL values. Use NOT EXISTS instead:
SELECT match_id
FROM sports.match_history h
WHERE NOT EXISTS (
SELECT match_id
FROM sports.match_results
WHERE match_id = h.match_id
);
More:
Select rows which are not present in other table

Pass extra parameter to PostgreSQL aggregate final function

Is the only way to pass an extra parameter to the final function of a PostgreSQL aggregate to create a special TYPE for the state value?
e.g.:
CREATE TYPE geomvaltext AS (
geom public.geometry,
val double precision,
txt text
);
And then to use this type as the state variable so that the third parameter (text) finally reaches the final function?
Why aggregates can't pass extra parameters to the final function themselves? Any implementation reason?
So we could easily construct, for example, aggregates taking a method:
SELECT ST_MyAgg(accum_number, 'COMPUTE_METHOD') FROM blablabla
Thanks

You can define an aggregate with more than one parameter.
I don't know if that solves your problem, but you could use it like this:
CREATE OR REPLACE FUNCTION myaggsfunc(integer, integer, text) RETURNS integer
IMMUTABLE STRICT LANGUAGE sql AS
$f$
SELECT CASE $3
WHEN '+' THEN $1 + $2
WHEN '*' THEN $1 * $2
ELSE NULL
END
$f$;
CREATE AGGREGATE myagg(integer, text) (
SFUNC = myaggsfunc(integer, integer, text),
STYPE = integer
);
It could be used like this:
CREATE TABLE mytab
AS SELECT * FROM generate_series(1, 10) i;
SELECT myagg(i, '+') FROM mytab;
myagg
-------
55
(1 row)
SELECT myagg(i, '*') FROM mytab;
myagg
---------
3628800
(1 row)

I solved a similar issue by making a custom aggregate function that did all the operations at once and stored their states in an array.
CREATE AGGREGATE myagg(integer)
(
INITCOND = '{ 0, 1 }',
STYPE = integer[],
SFUNC = myaggsfunc
);
and:
CREATE OR REPLACE FUNCTION myaggsfunc(agg_state integer[], agg_next integer)
RETURNS integer[] IMMUTABLE STRICT LANGUAGE 'plpgsql' AS $$
BEGIN
agg_state[1] := agg_state[1] + agg_next;
agg_state[2] := agg_state[2] * agg_next;
RETURN agg_state;
END;
$$;
Then made another function that selected one of the results based on the second argument:
CREATE OR REPLACE FUNCTION myagg_pick(agg_state integer[], agg_fn character varying)
RETURNS integer IMMUTABLE STRICT LANGUAGE 'plpgsql' AS $$
BEGIN
CASE agg_fn
WHEN '+' THEN RETURN agg_state[1];
WHEN '*' THEN RETURN agg_state[2];
ELSE RETURN 0;
END CASE;
END;
$$;
Usage:
SELECT myagg_pick(myagg("accum_number"), 'COMPUTE_METHOD') FROM "mytable" GROUP BY ...
Obvious downside of this is the overhead of performing all the functions instead of just one. However when dealing with simple operations such as adding, multiplying etc. it should be acceptable in most cases.

You would have to rewrite the final function itself, and in that case you might as well write a set of new aggregate functions, one for each possible COMPUTE_METHOD. If the COMPUTE_METHOD is a data value or implied by a data value, then a CASE statement can be used to select the appropriate aggregate method. Alternatively, you may want to create a custom composite type with fields for accum_number and COMPUTE_METHOD, and write a single new aggregate function that uses this new data type.

Passing multiple values in single parameter

Let's say I have this function:
CREATE OR REPLACE FUNCTION test_function(character varaying)
RETURNS integer AS
$BODY$
DECLARE
some_integer integer;
begin
Select column2 from test_table where column1 in ($1) into some_integer;
end;
Return some_integer;
$BODY$
LANGUAGE plpgsql VOLATILE COST 100;
And I want to call it like this:
Select * from test_function ('data1', 'data2','data3');
Of course, it cannot be done this way, because Postgres tries to find function with this name and three parameter which doesn't exists.
I tried to put quotes around commas but in that case parameter is interpreted wrong: data1', 'data2','data3, like one string.
Is there a way to put multiple values in parameter so IN clause can recognized it?

(Your function wouldn't be created. RETURN after END is syntactical nonsense.)
A function with a VARIADIC parameter does exactly what you ask for:
CREATE OR REPLACE FUNCTION test_function(date, date, VARIADIC varchar[])
RETURNS SETOF integer
LANGUAGE sql AS
$func$
SELECT col1
FROM test_table
WHERE col3 > $1
AND col4 < $2
AND col2 = ANY($3)
$func$;
db<>fiddle here - demo with additional parameters
Old sqlfiddle
Call (as desired):
SELECT * FROM test_function('data1', 'data2', 'data3');
Using a simple SQL function, plpgsql is not required for the simple example. But VARIADIC works for plpgsql functions, too.
Using RETURNS SETOF integer since this can obviously return multiple rows.
Details:
Pass multiple values in single parameter
Return rows matching elements of input array in plpgsql function
VARIADIC parameter must be the last input parameter
Return rows matching elements of input array in plpgsql function

Passing a ResultSet into a Postgresql Function

Is it possible to pass the results of a postgres query as an input into another function?
As a very contrived example, say I have one query like
SELECT id, name
FROM users
LIMIT 50
and I want to create a function my_function that takes the resultset of the first query and returns the minimum id. Is this possible in pl/pgsql?
SELECT my_function(SELECT id, name FROM Users LIMIT 50); --returns 50

You could use a cursor, but that very impractical for computing a minimum.
I would use a temporary table for that purpose, and pass the table name for use in dynamic SQL:
CREATE OR REPLACE FUNCTION f_min_id(_tbl regclass, OUT min_id int) AS
$func$
BEGIN
EXECUTE 'SELECT min(id) FROM ' || _tbl
INTO min_id;
END
$func$ LANGUAGE plpgsql;
Call:
CREATE TEMP TABLE foo ON COMMIT DROP AS
SELECT id, name
FROM users
LIMIT 50;
SELECT f_min_id('foo');
Major points
The first parameter is of type regclass to prevent SQL injection. More info in this related answer on dba.SE.
I made the temp table ON COMMIT DROP to limit its lifetime to the current transaction. May or may not be what you want.
You can extend this example to take more parameters. Search for code examples for dynamic SQL with EXECUTE.
-> SQLfiddle demo

I would take the problem on the other side, calling an aggregate function for each record of the result set. It's not as flexible but can gives you an hint to work on.
As an exemple to follow your sample problem:
CREATE OR REPLACE FUNCTION myMin ( int,int ) RETURNS int AS $$
SELECT CASE WHEN $1 < $2 THEN $1 ELSE $2 END;
$$ LANGUAGE SQL STRICT IMMUTABLE;
CREATE AGGREGATE my_function ( int ) (
SFUNC = myMin, STYPE = int, INITCOND = 2147483647 --maxint
);
SELECT my_function(id) from (SELECT * FROM Users LIMIT 50) x;

It is not possible to pass an array of generic type RECORD to a plpgsql function which is essentially what you are trying to do.
What you can do is pass in an array of a specific user defined TYPE or of a particular table row type. In the example below you could also swap out the argument data type for the table name users[] (though this would obviously mean getting all data in the users table row).
CREATE TYPE trivial {
"ID" integer,
"NAME" text
}
CREATE OR REPLACE FUNCTION trivial_func(data trivial[])
RETURNS integer AS
$BODY$
DECLARE
BEGIN
--Implementation here using data
return 1;
END$BODY$
LANGUAGE 'plpgsql' VOLATILE;

I think there's no way to pass recordset or table into function (but I'd be glad if i'm wrong). Best I could suggest is to pass array:
create or replace function my_function(data int[])
returns int
as
$$
select min(x) from unnest(data) as x
$$
language SQL;
sql fiddle demo

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Custom aggregate function - sql

Related

Compute an aggregated tsrange from a set of entries?

<column> is ambiguous in column comparison between two tables

Pass extra parameter to PostgreSQL aggregate final function

Passing multiple values in single parameter

Passing a ResultSet into a Postgresql Function

Categories

Resources