How to return result of a SELECT inside a function in PostgreSQL? - sql

I have this function in PostgreSQL, but I don't know how to return the result of the query:
CREATE OR REPLACE FUNCTION wordFrequency(maxTokens INTEGER)
RETURNS SETOF RECORD AS
$$
BEGIN
SELECT text, count(*), 100 / maxTokens * count(*)
FROM (
SELECT text
FROM token
WHERE chartype = 'ALPHABETIC'
LIMIT maxTokens
) as tokens
GROUP BY text
ORDER BY count DESC
END
$$
LANGUAGE plpgsql;
But I don't know how to return the result of the query inside the PostgreSQL function.
I found that the return type should be SETOF RECORD, right? But the return command is not right.
What is the right way to do this?

Use RETURN QUERY:
CREATE OR REPLACE FUNCTION word_frequency(_max_tokens int)
RETURNS TABLE (txt text -- also visible as OUT param in function body
, cnt bigint
, ratio bigint)
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY
SELECT t.txt
, count(*) AS cnt -- column alias only visible in this query
, (count(*) * 100) / _max_tokens -- I added parentheses
FROM (
SELECT t.txt
FROM token t
WHERE t.chartype = 'ALPHABETIC'
LIMIT _max_tokens
) t
GROUP BY t.txt
ORDER BY cnt DESC; -- potential ambiguity
END
$func$;
Call:
SELECT * FROM word_frequency(123);
Defining the return type explicitly is much more practical than returning a generic record. This way you don't have to provide a column definition list with every function call. RETURNS TABLE is one way to do that. There are others. Data types of OUT parameters have to match exactly what is returned by the query.
Choose names for OUT parameters carefully. They are visible in the function body almost anywhere. Table-qualify columns of the same name to avoid conflicts or unexpected results. I did that for all columns in my example.
But note the potential naming conflict between the OUT parameter cnt and the column alias of the same name. In this particular case (RETURN QUERY SELECT ...) Postgres uses the column alias over the OUT parameter either way. This can be ambiguous in other contexts, though. There are various ways to avoid any confusion:
Use the ordinal position of the item in the SELECT list: ORDER BY 2 DESC. Example:
Select first row in each GROUP BY group?
Repeat the expression ORDER BY count(*).
(Not required here.) Set the configuration parameter plpgsql.variable_conflict or use the special command #variable_conflict error | use_variable | use_column in the function. See:
Naming conflict between function parameter and result of JOIN with USING clause
Don't use "text" or "count" as column names. Both are legal to use in Postgres, but "count" is a reserved word in standard SQL and a basic function name and "text" is a basic data type. Can lead to confusing errors. I use txt and cnt in my examples, you may want more explicit names.
Added a missing ; and corrected a syntax error in the header. (_max_tokens int), not (int maxTokens) - data type after name.
While working with integer division, it's better to multiply first and divide later, to minimize the rounding error. Or work with numeric or a floating point type. See below.
Alternative
This is what I think your query should actually look like (calculating a relative share per token):
CREATE OR REPLACE FUNCTION word_frequency(_max_tokens int)
RETURNS TABLE (txt text
, abs_cnt bigint
, relative_share numeric)
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY
SELECT t.txt, t.cnt
, round((t.cnt * 100) / (sum(t.cnt) OVER ()), 2) -- AS relative_share
FROM (
SELECT t.txt, count(*) AS cnt
FROM token t
WHERE t.chartype = 'ALPHABETIC'
GROUP BY t.txt
ORDER BY cnt DESC
LIMIT _max_tokens
) t
ORDER BY t.cnt DESC;
END
$func$;
The expression sum(t.cnt) OVER () is a window function. You could use a CTE instead of the subquery. Pretty, but a subquery is typically cheaper in simple cases like this one (mostly before Postgres 12).
A final explicit RETURN statement is not required (but allowed) when working with OUT parameters or RETURNS TABLE (which makes implicit use of OUT parameters).
round() with two parameters only works for numeric types. count() in the subquery produces a bigint result and a sum() over this bigint produces a numeric result, thus we deal with a numeric number automatically and everything just falls into place.

Please see the following link for documentation:
https://www.postgresql.org/docs/current/xfunc-sql.html
Example:
CREATE FUNCTION sum_n_product_with_tab (x int)
RETURNS TABLE(sum int, product int) AS $$
SELECT $1 + tab.y, $1 * tab.y FROM tab;
$$ LANGUAGE SQL;

Related

Column doesn't exist when using WITH statement PostgreSQL

I want to create a function to be used to get the node traversal path.
CREATE TYPE IDType AS (id uuid);
drop function F_ItemPath;
CREATE OR REPLACE FUNCTION F_ItemPath (item record)
RETURNS TABLE (item_id uuid, depth numeric)
AS $$
BEGIN
return QUERY
WITH recursive item_path AS (
SELECT ic.parent_item_id, depth=1
from item_combination ic, item i
WHERE ic.child_item_id=i.id
UNION all
SELECT ic.parent_item_id, depth=ip.depth + 1
FROM item_path ip, item_combination ic WHERE ip.parent_item_id=ic.child_item_id
)
SELECT item_id=ip.parent_item_id, depth=ip.depth FROM item_path ip;
END; $$
LANGUAGE plpgsql;
select * from F_ItemPath(('55D6F516-7D8F-4DF3-A4E5-1E3F505837A1', 'FFE2A4D3-267C-465F-B4B4-C7BB2582F1BC'))
there has two problems:
I tried using user-defined type to set parameter type CREATE TYPE IDType AS (id uuid);, but I don't know how to call the function with table argument
there has an error that says:
SQL Error [42703]: ERROR: column ip.depth does not exist
Where: PL/pgSQL function f_itempath(record) line 3 at RETURN QUERY
what I expected is I can use the function normally and the argument can be supplied from other tables.
this is the full query that you can try:
http://sqlfiddle.com/#!15/9caba/1
I made the query in DBEAVER app, it will have some different error message.
I suggest you can experiment with it outside sqlfiddle.
The expression depth=1 tests if the column depth equals the value 1 and returns a boolean value. But you never give that boolean expression a proper name.
Additionally you can't add numbers to boolean values, so the expression depth=ip.depth + 1 tries to add 1 to a value of true or false - which fails obviously. If it did work, it would then compare that value with the value in the column depth again.
Did you intend to alias the value 1 with the name depth? Then you need to use 1 as depth and ip.depth + 1 as depth in the recursive part.
In the final select you have the same error - using boolean expressions instead of a column alias
It's also highly recommended to use explicit JOIN operators which were introduced in the SQL standard over 30 years ago.
Using PL/pgSQL to wrap a SQL query is also a bit of an overkill. A SQL function is enough.
Using an untyped record as a parameter seems highly dubious. It won't allow you to access columns using e.g. item.id. But given your example call, it seems you simply want to pass multiple IDs for the anchor (no-recursive) part of the query. That's better done using an array or a varadic parameter which allows listing multiple parameters with commas.
So you probably want something like this:
drop function f_itempath;
CREATE OR REPLACE FUNCTION f_itempath(variadic p_root_id uuid[])
RETURNS TABLE (item_id uuid, depth integer)
as
$$
WITH recursive item_path AS (
SELECT ic.parent_item_id, 1 as depth
FROM item_combination ic
WHERE ic.child_item_id = any(p_root_id) --<< no join needed to access the parameter
UNION all
SELECT ic.parent_item_id, ip.depth + 1
FROM item_path ip
JOIN item_combination ic ON ip.parent_item_id = ic.child_item_id
)
SELECT ip.parent_item_id as item_id, ip.depth
FROM item_path ip;
$$
language sql
stable;
Then you can call it like this (note: no parentheses around the parameters)
select *
from f_itempath('55d6f516-7d8f-4df3-a4e5-1e3f505837a1', 'ffe2a4d3-267c-465f-b4b4-c7bb2582f1bc');
select *
from f_itempath('55d6f516-7d8f-4df3-a4e5-1e3f505837a1', 'ffe2a4d3-267c-465f-b4b4-c7bb2582f1bc', 'df366232-f200-4254-bad5-94e11ea35379');
select *
from f_itempath('55d6f516-7d8f-4df3-a4e5-1e3f505837a1');

Compute an aggregated tsrange from a set of entries?

I am trying to compute a aggregated tsrange from a set of row that I extract from an SQL query. Problem is that I keep getting errors that the input parameter is not being passed in.
CREATE OR REPLACE AGGREGATE range_merge(anyrange)
(
sfunc = range_merge,
stype = anyrange
);
DROP FUNCTION IF EXISTS aggregate_validity(entity_name regclass, entry bigint);
CREATE OR REPLACE FUNCTION aggregate_validity(entity_name regclass, entry bigint) returns tsrange AS
$$
DECLARE
result tsrange;
BEGIN
EXECUTE format('select range_merge(valid) from %s where entity_id = %U', entity_name, entry) into result;
return result;
END
$$ LANGUAGE plpgsql;
When I do:
select * from aggregate_validity(country, 1);
I get an error stating that the entity name and entry do not exist. It does not seem to parameterize the input into the statement properly.
Function:
EXECUTE format('select range_merge(valid) from %s where entity_id=%U',entity_name, entry)
into result;
=>
EXECUTE format('select range_merge(valid) from %I where entity_id=%s',entity_name, entry)
into result;
--%I for identifier, %s for value
Call:
select * from aggregate_validity(country, 1)
=>
select * from aggregate_validity('country', 1);
db<>fiddle demo
CREATE OR REPLACE AGGREGATE range_merge(anyrange) (
SFUNC = range_merge
, STYPE = anyrange
);
-- DROP FUNCTION IF EXISTS aggregate_validity(entity_name regclass, entry bigint);
CREATE OR REPLACE FUNCTION aggregate_validity(entity_name regclass, entry bigint, OUT result tsrange)
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE 'SELECT range_merge(valid) FROM ' || entity_name || ' WHERE entity_id = $1'
INTO result
USING entry;
END
$func$;
Call:
SELECT aggregate_validity('country', 1);
db<>fiddle here
The call does not need SELECT * FROM, as the function returns a single value per definition.
I used an OUT parameter to simplify (OUT result tsrange). See:
Returning from a function with OUT parameter
Don't concatenate the entry value into the SQL string. Pass it as value with the USING clause. Cleaner, faster.
Since entity_name is passed as regclass, it's safe to simply concatenate (which is a bit cheaper). See:
Table name as a PostgreSQL function parameter
Plus, missing quotes and incorrect format specifiers, as Lukasz already provided.
Your custom aggregate function range_merge() has some caveats:
I wouldn't name it "range_merge", that being the name of the plain function range_merge(), too. While that's legal, it still invites confusing errors.
You are aware that the function range_merge() includes gaps between input ranges in the output range?
range_merge() returns NULL for any NULL input. So if your table has any NULL values in the column valid, the result is always NULL. I strongly suggest that any involved columns shall be defined as NOT NULL.
If you are at liberty to install additional modules, consider range_agg by Paul Jungwirth who is also here on Stackovflow. It provides the superior function range_agg() addressing some of the mentioned issues.
If you don't want to include gaps, consider the Postgres Wiki page on range aggregation.
I would probably not use aggregate_validity() at all. It obscures the nested functionality from the Postgres query planner and may lead so suboptimal query plans. Typically, you can replace it with a correlated or a LATERAL subquery, which can be planned and optimized by Postgres in context of the outer query. I appended a demo to the fiddle:
db<>fiddle here
Related:
What is the difference between LATERAL and a subquery in PostgreSQL?

Passing a ResultSet into a Postgresql Function

Is it possible to pass the results of a postgres query as an input into another function?
As a very contrived example, say I have one query like
SELECT id, name
FROM users
LIMIT 50
and I want to create a function my_function that takes the resultset of the first query and returns the minimum id. Is this possible in pl/pgsql?
SELECT my_function(SELECT id, name FROM Users LIMIT 50); --returns 50
You could use a cursor, but that very impractical for computing a minimum.
I would use a temporary table for that purpose, and pass the table name for use in dynamic SQL:
CREATE OR REPLACE FUNCTION f_min_id(_tbl regclass, OUT min_id int) AS
$func$
BEGIN
EXECUTE 'SELECT min(id) FROM ' || _tbl
INTO min_id;
END
$func$ LANGUAGE plpgsql;
Call:
CREATE TEMP TABLE foo ON COMMIT DROP AS
SELECT id, name
FROM users
LIMIT 50;
SELECT f_min_id('foo');
Major points
The first parameter is of type regclass to prevent SQL injection. More info in this related answer on dba.SE.
I made the temp table ON COMMIT DROP to limit its lifetime to the current transaction. May or may not be what you want.
You can extend this example to take more parameters. Search for code examples for dynamic SQL with EXECUTE.
-> SQLfiddle demo
I would take the problem on the other side, calling an aggregate function for each record of the result set. It's not as flexible but can gives you an hint to work on.
As an exemple to follow your sample problem:
CREATE OR REPLACE FUNCTION myMin ( int,int ) RETURNS int AS $$
SELECT CASE WHEN $1 < $2 THEN $1 ELSE $2 END;
$$ LANGUAGE SQL STRICT IMMUTABLE;
CREATE AGGREGATE my_function ( int ) (
SFUNC = myMin, STYPE = int, INITCOND = 2147483647 --maxint
);
SELECT my_function(id) from (SELECT * FROM Users LIMIT 50) x;
It is not possible to pass an array of generic type RECORD to a plpgsql function which is essentially what you are trying to do.
What you can do is pass in an array of a specific user defined TYPE or of a particular table row type. In the example below you could also swap out the argument data type for the table name users[] (though this would obviously mean getting all data in the users table row).
CREATE TYPE trivial {
"ID" integer,
"NAME" text
}
CREATE OR REPLACE FUNCTION trivial_func(data trivial[])
RETURNS integer AS
$BODY$
DECLARE
BEGIN
--Implementation here using data
return 1;
END$BODY$
LANGUAGE 'plpgsql' VOLATILE;
I think there's no way to pass recordset or table into function (but I'd be glad if i'm wrong). Best I could suggest is to pass array:
create or replace function my_function(data int[])
returns int
as
$$
select min(x) from unnest(data) as x
$$
language SQL;
sql fiddle demo

In PostgreSQL What is g(i) in: FROM generate_subscripts($1, 1) g(i)?

This is in the postgres manual:
CREATE or replace FUNCTION mleast(a VARIADIC numeric[])
RETURNS numeric
AS $$
SELECT min($1[i]) FROM generate_subscripts($1, 1) g(i);
$$ LANGUAGE SQL;
SELECT mleast(10, -1, 5, 4.4);
if I write: ( ommiting g(i) )
CREATE or replace FUNCTION mleast(a VARIADIC numeric[])
RETURNS numeric
AS $$
SELECT min($1[i]) FROM generate_subscripts($1, 1);
$$ LANGUAGE SQL;
SELECT mleast(10, -1, 5, 4.4);
I receive: Error does not exist the column «i»
What exactly is g(i)?
generate_subscripts is a "set-returning function", which will return multiple rows when you call it. That's why it's most often put in the FROM clause.
By default, the results from generate_subscripts, which comes built-in with Postgres is anonymous and does automatically have any name to use as a handle in order to refer to it in the rest of the query. This is what the g(i) is; it's an alias for the table (g) and the column (i) returned by generate_subscripts. So this expression:
FROM generate_subscripts($1, 1) g(i)
means:
execute the function generate_subscripts and assign its results to a table called "g" with a single column called "i"
or in SQL form:
CREATE TABLE g ( i integer );
INSERT INTO g SELECT * FROM generate_subscripts(some_array, 1);
SELECT i FROM g ORDER BY i;
Without that alias, the SELECT portion of your query has no idea how to reference the results produced by generate_subscripts.
The use of g(i) and gs(i) is a convention in the Postgres world. "g" or "gs" stands for "generate_series" or "generate_subscripts". "i" is the traditional programming variable for "iterator". You can actually use any aliases you want, and I encourage you to use aliases which actually reference what you're trying to do, in order to improve future code maintenance. For example:
FROM generate_subscripts( $1, 1 ) as features(feature_no)
These functions are detailed in the PostgreSQL docs, and it's also useful to reference the docs on how to write Set Returning Functions (scroll down to 35.4.8. SQL Functions Returning Sets). Once you write one yourself, it becomes clearer why you need an alias to reference its results.
g(i) defines the structure of the table (resultset) that is returned by the generate_series() function.
It assigns the alias g to the resultset with a single column named i

Set a default return value for a Postgres function

I have the following function in Postgres:
CREATE OR REPLACE FUNCTION point_total(user_id integer, gametime date)
RETURNS bigint AS
$BODY$
SELECT sum(points) AS result
FROM picks
WHERE user_id = $1
AND picks.gametime > $2
AND points IS NOT NULL;
$BODY$
LANGUAGE sql VOLATILE;
It works correctly, but when a user starts out and has no points, it very reasonably returns NULL. How can I modify it so that it returns 0 instead.
Changing the body of the function to that below results in an "ERROR: syntax error at or near "IF".
SELECT sum(points) AS result
FROM picks
WHERE user_id = $1
AND picks.gametime > $2
AND points IS NOT NULL;
IF result IS NULL
SELECT 0 AS result;
END;
You need to change the language from sqlto plpgsql if you want to use the procedural features of PL/pgSQL. The function body changes, too.
Be aware that all parameter names are visible in the function body, including all levels of SQL statements. If you create a naming conflict, you may need to table-qualify column names like this: table.col, to avoid confusion. Since you refer to function parameters by positional reference ($n) anyway, I just removed parameter names to make it work.
Finally, THEN was missing in the IF statement - the immediate cause of the error message.
One could use COALESCE to substitute for NULL values. But that only works if there is at least one resulting row. COALESCE can't fix "no row" it can only replace actual NULL values.
There are several ways to cover all NULL cases. In plpgsql functions:
CREATE OR REPLACE FUNCTION point_total(integer, date, OUT result bigint)
RETURNS bigint AS
$func$
BEGIN
SELECT sum(p.points) -- COALESCE would make sense ...
INTO result
FROM picks p
WHERE p.user_id = $1
AND p.gametime > $2
AND p.points IS NOT NULL; -- ... if NULL values were not ruled out
IF NOT FOUND THEN -- If no row was found ...
result := 0; -- ... set to 0 explicitly
END IF;
END
$func$ LANGUAGE plpgsql;
Or you can enclose the whole query in a COALESCE expression in an outer SELECT. "No row" from the inner SELECT results in a NULL in the expression. Work as plain SQL, or you can wrap it in an sql function:
CREATE OR REPLACE FUNCTION point_total(integer, date)
RETURNS bigint AS
$func$
SELECT COALESCE(
(SELECT sum(p.points)
FROM picks p
WHERE p.user_id = $1
AND p.gametime > $2
-- AND p.points IS NOT NULL -- redundant here
), 0)
$func$ LANGUAGE sql;
Related answer:
How to display a default value when no match found in a query?
Concerning naming conflicts
One problem was the naming conflict most likely. There have been major changes in version 9.0. I quote the release notes:
E.8.2.5. PL/pgSQL
PL/pgSQL now throws an error if a variable name conflicts with a
column name used in a query (Tom Lane)
Later versions have refined the behavior. In obvious spots the right alternative is picked automatically. Reduces the potential for conflicts, but it's still there. The advice still applies in Postgres 9.3.