I am trying to create a pl/pgsql equivalent to the pandas 'ffill' function. The function should forward fill null values. In the example I can do a forward fill but I get errors when I try to create a function from my procedure. The function seems to reflect exactly the procedure but I get a syntax error at the portion ... as $1.
Why? What should I be reading to clarify?
-- Forward fill experiment
DROP TABLE IF EXISTS example;
create temporary table example(id int, str text, val integer);
insert into example values
(1, 'a', null),
(1, null, 1),
(2, 'b', 2),
(2,null ,null );
select * from example
select (case
when str is null
then lag(str,1) over (order by id)
else str
end) as str,
(case
when val is null
then lag(val,1) over (order by id)
else val
end) as val
from example
-- Forward fill function
create or replace function ffill(text, text, text) -- takes column to fill, the table, and the ordering column
returns text as $$
begin
select (case
when $1 is null
then lag($1 ,1) over (order by $3)
else $1
end) as $1
from $2;
end;
$$ LANGUAGE plpgsql;
Update 1: I did some additional experimenting taking a different approach. The code is below. It uses the same example table as above.
CREATE OR REPLACE FUNCTION GapFillInternal(
s anyelement,
v anyelement) RETURNS anyelement AS
$$
declare
temp alias for $0 ;
begin
RAISE NOTICE 's= %, v= %', s, v;
if v is null and s notnull then
temp := s;
elsif s is null and v notnull then
temp := v;
elsif s notnull and v notnull then
temp := v;
else
temp := null;
end if;
RAISE NOTICE 'temp= %', temp;
return temp;
END;
$$ LANGUAGE PLPGSQL;
CREATE AGGREGATE GapFill(anyelement) (
SFUNC=GapFillInternal,
STYPE=anyelement
);
select id, str, val, GapFill(val) OVER (ORDER by id) as valx
from example;
The resulting table is this:
I don't understand where the '1' in the first row of valx column comes from. From the raise notice output it should be Null and that seems a correct expectation from the CREATE AGGREGATE docs.
Correct call
Seems like your displayed query is incorrect, and the test case is just too reduced to show it.
Assuming you want to "forward fill" partitioned by id, you'll have to say so:
SELECT row_num, id
, str, gap_fill(str) OVER w AS strx
, val, gap_fill(val) OVER w AS valx
FROM example
WINDOW w AS (PARTITION BY id ORDER BY row_num); -- !
The WINDOW clause is just a syntactical convenience to avoid spelling out the same window frame repeatedly. The important part is the added PARTITION clause.
Simpler function
Much simpler, actually:
CREATE OR REPLACE FUNCTION gap_fill_internal(s anyelement, v anyelement)
RETURNS anyelement
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN COALESCE(v, s); -- that's all!
END
$func$;
CREATE AGGREGATE gap_fill(anyelement) (
SFUNC = gap_fill_internal,
STYPE = anyelement
);
Slightly faster in a quick test.
Standard SQL
Without custom function:
SELECT row_num, id
, str, first_value(str) OVER (PARTITION BY id, ct_str ORDER BY row_num) AS strx
, val, first_value(val) OVER (PARTITION BY id, ct_val ORDER BY row_num) AS valx
FROM (
SELECT *, count(str) OVER w AS ct_str, count(val) OVER w AS ct_val
FROM example
WINDOW w AS (PARTITION BY id ORDER BY row_num)
) sub;
The query becomes more complex with a subquery. Performance is similar. Slightly slower in a quick test.
More explanation in these related answers:
Carry over long sequence of missing values with Postgres
Retrieve last known value for each column of a row
SQL group table by "leading rows" without pl/sql
db<>fiddle here - showing all with extended test case
Related
Using Postgres 13.1, I want to apply a forward fill function to all columns of a table. The forward fill function is explained in my earlier question:
How to do forward fill as a PL/PGSQL function
However, in that case the columns and table are specified. I want to take that code and apply it to an arbitrary table, ie. specify a table and the forward fill is applied to each of the columns.
Using this table as an example:
CREATE TABLE example(row_num int, id int, str text, val integer);
INSERT INTO example VALUES
(1, 1, '1a', NULL)
, (2, 1, NULL, 1)
, (3, 2, '2a', 2)
, (4, 2, NULL, NULL)
, (5, 3, NULL, NULL)
, (6, 3, '3a', 31)
, (7, 3, NULL, NULL)
, (8, 3, NULL, 32)
, (9, 3, '3b', NULL)
, (10,3, NULL, NULL)
;
I start with the following working base for the function. I call it passing in some variable names. Note the first is a table name not a column name. The function takes the table name and creates an array of all the column names and then outputs the names.
create or replace function col_collect(tbl text, id text, row_num text)
returns text[]
language plpgsql as
$func$
declare
tmp text[];
col text;
begin
select array (
select column_name
from information_schema."columns" c
where table_name = tbl
) into tmp;
foreach col in array tmp
loop
raise notice 'col: %', col;
end loop;
return tmp;
end
$func$;
I want to apply the "forward fill" function I got from my earlier question to each column of a table. UPDATE seems to be the correct approach. So this is the preceding function where I replace raise notice by an update using execute so I can pass in the table name:
create or replace function col_collect(tbl text, id text, row_num text)
returns void
language plpgsql as
$func$
declare
tmp text[];
col text;
begin
select array (
select column_name
from information_schema."columns" c
where table_name = tbl
) into tmp;
foreach col in array tmp
loop
execute 'update '||tbl||'
set '||col||' = gapfill('||col||') OVER w AS '||col||'
where '||tbl||'.row_num = '||col||'.row_num
window w as (PARTITION BY '||id||' ORDER BY '||row_num||')
returning *;';
end loop;
end
$func$;
-- call the function
select col_collect('example','id','row_num')
The preceding errors out with a syntax error. I have tried many variations on this but they all fail. Helpful answers on SO were here and here. The aggregate function I'm trying to apply (as window function) is:
CREATE OR REPLACE FUNCTION gap_fill_internal(s anyelement, v anyelement)
RETURNS anyelement
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN COALESCE(v, s); -- that's all!
END
$func$;
CREATE AGGREGATE gap_fill(anyelement) (
SFUNC = gap_fill_internal,
STYPE = anyelement
);
My questions are:
is this a good approach and if so what am I doing wrong; or
is there a better way to do this?
What you ask is not a trivial task. You should be comfortable with PL/pgSQL. I do not advise this kind of dynamic SQL queries for beginners, too powerful.
That said, let's dive in. Buckle up!
CREATE OR REPLACE FUNCTION f_gap_fill_update(_tbl regclass, _id text, _row_num text, OUT nullable_columns int, OUT updated_rows int)
LANGUAGE plpgsql AS
$func$
DECLARE
_pk text := quote_ident(_row_num);
_sql text;
BEGIN
SELECT INTO _sql, nullable_columns
concat_ws(E'\n'
, 'UPDATE ' || _tbl || ' t'
, 'SET (' || string_agg( quote_ident(a.attname), ', ') || ')'
, ' = (' || string_agg('u.' || quote_ident(a.attname), ', ') || ')'
, 'FROM ('
, ' SELECT ' || _pk
, ' , ' || string_agg(format('gap_fill(%1$I) OVER w AS %1$I', a.attname), ', ')
, ' FROM ' || _tbl
, format(' WINDOW w AS (PARTITION BY %I ORDER BY %s)', _id, _pk)
, ' ) u'
, format('WHERE t.%1$s = u.%1$s', _pk)
, 'AND (' || string_agg('t.' || quote_ident(a.attname), ', ') || ') IS DISTINCT FROM'
, ' (' || string_agg('u.' || quote_ident(a.attname), ', ') || ')'
)
, count(*) -- AS _col_ct
FROM (
SELECT a.attname
FROM pg_attribute a
WHERE a.attrelid = _tbl
AND a.attnum > 0
AND NOT a.attisdropped
AND NOT a.attnotnull
ORDER BY a.attnum
) a;
IF nullable_columns = 0 THEN
RAISE EXCEPTION 'No nullable columns found in table >>%<<', _tbl;
ELSIF _sql IS NULL THEN
RAISE EXCEPTION 'SQL string is NULL. Should not occur!';
END IF;
-- RAISE NOTICE '%', _sql; -- debug
EXECUTE _sql; -- execute
GET DIAGNOSTICS updated_rows = ROW_COUNT;
END
$func$;
Example call:
SELECT * FROM f_gap_fill_update('example', 'id', 'row_num');
db<>fiddle here
The function is state of the art.
Generates and executes a query of the form:
UPDATE tbl t
SET (str, val, col1)
= (u.str, u.val, u.col1)
FROM (
SELECT row_num
, gap_fill(str) OVER w AS str, gap_fill(val) OVER w AS val
, gap_fill(col1) OVER w AS col1
FROM tbl
WINDOW w AS (PARTITION BY id ORDER BY row_num)
) u
WHERE t.row_num = u.row_num
AND (t.str, t.val, t.col1) IS DISTINCT FROM
(u.str, u.val, u.col1)
Using pg_catalog.pg_attribute instead of the information schema. See:
"Information schema vs. system catalogs"
Note the final WHERE clause to prevent (possibly expensive) empty updates. Only rows that actually change will be written. See:
How do I (or can I) SELECT DISTINCT on multiple columns?
Moreover, only nullable columns (not defined NOT NULL) will even be considered, to avoid unnecessary work.
Using ROW syntax in UPDATE to keep the code simple. See:
SQL update fields of one table from fields of another one
The function returns two integer values: nullable_columns and updated_rows, reporting what the names suggest.
The function defends against SQL injection properly. See:
Table name as a PostgreSQL function parameter
SQL injection in Postgres functions vs prepared queries
About GET DIAGNOSTICS:
Calculate number of rows affected by batch query in PostgreSQL
The above function updates, but does not return rows. Here is a basic demo how to return rows of varying type:
CREATE OR REPLACE FUNCTION f_gap_fill_select(_tbl_type anyelement, _id text, _row_num text)
RETURNS SETOF anyelement
LANGUAGE plpgsql AS
$func$
DECLARE
_tbl regclass := pg_typeof(_tbl_type)::text::regclass;
_sql text;
BEGIN
SELECT INTO _sql
'SELECT ' || string_agg(CASE WHEN a.attnotnull
THEN format('%I', a.attname)
ELSE format('gap_fill(%1$I) OVER w AS %1$I', a.attname) END
, ', ' ORDER BY a.attnum)
|| E'\nFROM ' || _tbl
|| format(E'\nWINDOW w AS (PARTITION BY %I ORDER BY %I)', _id, _row_num)
FROM pg_attribute a
WHERE a.attrelid = _tbl
AND a.attnum > 0
AND NOT a.attisdropped;
IF _sql IS NULL THEN
RAISE EXCEPTION 'SQL string is NULL. Should not occur!';
END IF;
RETURN QUERY EXECUTE _sql;
-- RAISE NOTICE '%', _sql; -- debug
END
$func$;
Call (note special syntax!):
SELECT * FROM f_gap_fill_select(NULL::example, 'id', 'row_num');
db<>fiddle here
About returning a polymorphic row type:
Refactor a PL/pgSQL function to return the output of various SELECT queries
I am trying to execute this query but for some reason I seem to get this error
ERROR: syntax error at or near "$1"
LINE 8: FROM $1
Which I am not sure makes sense?
The function which should execute it is this
CREATE OR REPLACE FUNCTION tsrange_aggregate(_tbl regclass, selected_entity_id uuid, foreign_tsrange tsrange, OUT result Boolean)
RETURNS BOOLEAN AS
$$
BEGIN
EXECUTE
'SELECT $3 <# (SELECT tsrange(min(COALESCE(lower(valid), ''-infinity'')), max(COALESCE(upper(valid), ''infinity'')))
FROM (
SELECT *, count(nextstart > enddate OR NULL) OVER (ORDER BY valid DESC NULLS LAST) AS grp
FROM (
SELECT valid
, max(COALESCE(upper(valid), ''infinity'')) OVER (ORDER BY valid) AS enddate
, lead(lower(valid)) OVER (ORDER BY valid) As nextstart
FROM $1
where entity_id = $2
) a
) b
GROUP BY grp
ORDER BY 1);'
INTO result
USING _tbl, selected_entity_id, foreign_tsrange ;
END
$$ LANGUAGE plpgsql;
SELECT tsrange_aggregate('parent_registration'::regclass, '0006f79d-5af7-4b29-a200-6aef3bb0105f', tsrange('2011-05-23 02:00:00', '2013-05-23 02:00:00' ));
is it not possible to function parameter to inner query? or am I missing something here?
The error message suggests that postgres cannot cope with a table called $1. You have to concatenate the table name in the variable with the string, e.g. using FORMAT:
CREATE OR REPLACE FUNCTION myfunc(text)
RETURNS BOOLEAN AS
$$
DECLARE res boolean;
BEGIN
EXECUTE FORMAT('SELECT b FROM %I',$1) INTO res;
RETURN res;
END
$$ LANGUAGE plpgsql;
%I treats the argument value as an SQL identifier.
Demo: db<>fiddle
I have an upsert function that I modified from the documentation. However I have been trying to return the updated or inserted row. I'm calling this function from a node application and I need to keep track of which record has either been updated or inserted especially during sync script.
Here is the function:
create or replace function upsert_test(d TEXT, sys INT, val INT, p INT, inter BOOLEAN)
returns void as $$
begin
update test_table set description=d, code=sys, val_id=val, provider_id=p, connect=inter where code=sys and val_id=val and provider_id=p;
if found then
return;
end if;
begin
insert into test_table (description, code, val_id, provider_id, connect) values (d, sys, val, p, inter);
return;
exception when unique_violation then
end;
return;
end;
$$ language plpgsql;
I have tried to change the return type and have the function return a record but I can't seem to get it working.
Use the RETURNING clause. You can combine it with RETURN QUERY ...
CREATE OR REPLACE FUNCTION upsert_t2(d text, sys int, val int, p int, inter bool)
RETURNS SETOF test_table AS
$func$
BEGIN
RETURN QUERY
UPDATE test_table t
SET description = d
,code = sys
,val_id = val
,provider_id = p
,connect = inter
WHERE t.code = sys
AND t.val_id = val
AND t.provider_id = p
RETURNING t.*;
IF FOUND THEN
RETURN;
END IF;
BEGIN
RETURN QUERY
INSERT INTO test_table (description, code, val_id, provider_id, connect)
VALUES (d, sys, val, p, inter)
RETURNING *;
EXCEPTION WHEN UNIQUE_VIOLATION THEN
END;
RETURN;
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM upsert_t2(...)
Reply to comment
I would try to avoid updates completely that do not change anything. Also, I would look to a data-modifying CTE in a loop:
CREATE OR REPLACE FUNCTION upsert_cte(d text, sys int, val int, p int
, inter bool)
RETURNS SETOF test_table AS
$func$
BEGIN
LOOP
BEGIN
RETURN QUERY
WITH sel AS (
SELECT t.pk_col -- primary key column
FROM test_table t
WHERE t.code = sys
AND t.val_id = val
AND t.provider_id = p
FOR SHARE -- lock
)
, ins AS (
INSERT INTO test_table (description, code, val_id, provider_id, connect)
SELECT d, sys, val, p, inter
WHERE NOT EXISTS (SELECT 1 FROM sel) -- if not found
RETURNING *
)
, upd AS (
UPDATE test_table t
SET description = d
,code = sys
,val_id = val
,provider_id = p
,connect = inter
FROM sel
WHERE sel.pk_col = t.pk_col -- if found (possibly mult. rows)
AND t.description IS DISTINCT FROM d
,t.code IS DISTINCT FROM sys
,t.val_id IS DISTINCT FROM val
,t.provider_id IS DISTINCT FROM p
,t.connect IS DISTINCT FROM inter -- only if anything changes
RETURNING t.*
)
SELECT * FROM ins
UNION ALL
SELECT * FROM upd;
RETURN; -- No error occurred, exit loop
EXCEPTION WHEN UNIQUE_VIOLATION THEN -- inserted in concurrent session
RAISE NOTICE 'It happened!'; -- hardly ever happens, keep looping
END;
END LOOP;
END
$func$ LANGUAGE plpgsql;
Explanation and links in this related answer:
Is SELECT or INSERT in a function prone to race conditions?
I have written a stored procedure with the help of the SO community. I have cobbled/hacked together the answers of various questions to write my function.
However, when I try to create my function in the db (PostgreSQL 8.4), I get the following errors:
ERROR: syntax error at or near "$4"
LINE 1: ...ank() OVER (ORDER BY $1 , $2 ) / $3 )::int as $4 , $1 ,...
^
AND
QUERY: SELECT ceil(rank() OVER (ORDER BY $1 , $2 ) / $3 )::int as $4 , $1 , $2 , $5 , $6 , $7 , $8 , $9 , $10 FROM foobar WHERE $2 BETWEEN $11 AND $12 AND $1 = ANY( $13 ) ORDER BY $1 , $2 , $4
CONTEXT: SQL statement in PL/PgSQL function "custom_group" near line 9
This is the code for the function I am trying to create:
CREATE OR REPLACE FUNCTION custom_group(start_date DATE, end_date DATE, grouping_period INTEGER, _ids int[] DEFAULT '{}')
RETURNS TABLE (
grp INTEGER,
id INTEGER,
entry_date DATE,
pop REAL,
hip REAL,
lop REAL,
pcl REAL,
vop BIGINT,
poi BIGINT) AS
$BODY$
BEGIN
IF _ids <> '{}'::int[] THEN -- excludes empty array and NULL
RETURN QUERY
SELECT ceil(rank() OVER (ORDER BY id, entry_date) / $3)::int as grp
,id, entry_date, pop, hip, lop, pcl, vop, poi
FROM foobar
WHERE entry_date BETWEEN start_date AND end_date AND id = ANY(_ids)
ORDER BY id, entry_date, grp ;
ELSE
RETURN QUERY
SELECT ceil(rank() OVER (ORDER BY id, entry_date) / $3)::int as grp
,id, entry_date, pop, hip, lop, pcl, vop, poi
FROM foobar
WHERE entry_date BETWEEN start_date AND end_date
ORDER BY id, entry_date, grp ;
END IF;
END;
$BODY$ LANGUAGE plpgsql;
Can anyone understand why I am getting these errors - and how do I fix them?
The error comes from a naming conflict.
The variable grp is defined implicitly by the RETURNS TABLE clause. In the function body, you try to use the same identifier as column alias, which conflicts.
Just use a different name for grp - the column alias will not be visible outside of the function anyway.
And table-qualify the other columns:
CREATE OR REPLACE FUNCTION custom_group(_start_date DATE
,_end_date DATE
,_grouping_period INTEGER,
,_ids int[] DEFAULT '{}')
RETURNS TABLE (grp int, id int, entry_date date, pop real, hip real,
lop real, pcl real, vop bigint, poi bigint) AS
$BODY$
BEGIN
IF _ids <> '{}'::int[] THEN -- excludes empty array and NULL
RETURN QUERY
SELECT ceil(rank() OVER (ORDER BY f.id, f.entry_date) / $3)::int AS _grp
,f.id, f.entry_date, f.pop, f.hip, f.lop, f.pcl, f.vop, f.poi
FROM foobar f
WHERE f.entry_date BETWEEN _start_date AND _end_date AND id = ANY(_ids)
ORDER BY f.id, f.entry_date, _grp;
ELSE
RETURN QUERY
SELECT ceil(rank() OVER (ORDER BY f.id, f.entry_date) / $3)::int -- no alias
,f.id, f.entry_date, f.pop, f.hip, f.lop, f.pcl, f.vop, f.poi
FROM foobar f
WHERE f.entry_date BETWEEN _start_date AND _end_date
ORDER BY f.id, f.entry_date, 1; -- ordinal pos. instead of col alias
END IF;
END;
$BODY$ LANGUAGE plpgsql;
The reason why I prefix IN parameters with _ is the same: avoid such naming conflicts.
You don't even have to use an alias for the computed column at all in this case. You could use the ordinal position in ORDER BY like I demonstrate in the second query. I quote the manual here:
Each expression can be the name or ordinal number of an output column
(SELECT list item), or it can be an arbitrary expression formed from
input-column values.
I am writing a SP, using PL/pgSQL.
I want to return a record, comprised of fields from several different tables. Could look something like this:
CREATE OR REPLACE FUNCTION get_object_fields(name text)
RETURNS RECORD AS $$
BEGIN
-- fetch fields f1, f2 and f3 from table t1
-- fetch fields f4, f5 from table t2
-- fetch fields f6, f7 and f8 from table t3
-- return fields f1 ... f8 as a record
END
$$ language plpgsql;
How may I return the fields from different tables as fields in a single record?
[Edit]
I have realized that the example I gave above was slightly too simplistic. Some of the fields I need to be retrieving, will be saved as separate rows in the database table being queried, but I want to return them in the 'flattened' record structure.
The code below should help illustrate further:
CREATE TABLE user (id int, school_id int, name varchar(32));
CREATE TYPE my_type AS (
user1_id int,
user1_name varchar(32),
user2_id int,
user2_name varchar(32)
);
CREATE OR REPLACE FUNCTION get_two_users_from_school(schoolid int)
RETURNS my_type AS $$
DECLARE
result my_type;
temp_result user;
BEGIN
-- for purpose of this question assume 2 rows returned
SELECT id, name INTO temp_result FROM user where school_id = schoolid LIMIT 2;
-- Will the (pseudo)code below work?:
result.user1_id := temp_result[0].id ;
result.user1_name := temp_result[0].name ;
result.user2_id := temp_result[1].id ;
result.user2_name := temp_result[1].name ;
return result ;
END
$$ language plpgsql
Don't use CREATE TYPE to return a polymorphic result. Use and abuse the RECORD type instead. Check it out:
CREATE FUNCTION test_ret(a TEXT, b TEXT) RETURNS RECORD AS $$
DECLARE
ret RECORD;
BEGIN
-- Arbitrary expression to change the first parameter
IF LENGTH(a) < LENGTH(b) THEN
SELECT TRUE, a || b, 'a shorter than b' INTO ret;
ELSE
SELECT FALSE, b || a INTO ret;
END IF;
RETURN ret;
END;$$ LANGUAGE plpgsql;
Pay attention to the fact that it can optionally return two or three columns depending on the input.
test=> SELECT test_ret('foo','barbaz');
test_ret
----------------------------------
(t,foobarbaz,"a shorter than b")
(1 row)
test=> SELECT test_ret('barbaz','foo');
test_ret
----------------------------------
(f,foobarbaz)
(1 row)
This does wreak havoc on code, so do use a consistent number of columns, but it's ridiculously handy for returning optional error messages with the first parameter returning the success of the operation. Rewritten using a consistent number of columns:
CREATE FUNCTION test_ret(a TEXT, b TEXT) RETURNS RECORD AS $$
DECLARE
ret RECORD;
BEGIN
-- Note the CASTING being done for the 2nd and 3rd elements of the RECORD
IF LENGTH(a) < LENGTH(b) THEN
ret := (TRUE, (a || b)::TEXT, 'a shorter than b'::TEXT);
ELSE
ret := (FALSE, (b || a)::TEXT, NULL::TEXT);
END IF;
RETURN ret;
END;$$ LANGUAGE plpgsql;
Almost to epic hotness:
test=> SELECT test_ret('foobar','bar');
test_ret
----------------
(f,barfoobar,)
(1 row)
test=> SELECT test_ret('foo','barbaz');
test_ret
----------------------------------
(t,foobarbaz,"a shorter than b")
(1 row)
But how do you split that out in to multiple rows so that your ORM layer of choice can convert the values in to your language of choice's native data types? The hotness:
test=> SELECT a, b, c FROM test_ret('foo','barbaz') AS (a BOOL, b TEXT, c TEXT);
a | b | c
---+-----------+------------------
t | foobarbaz | a shorter than b
(1 row)
test=> SELECT a, b, c FROM test_ret('foobar','bar') AS (a BOOL, b TEXT, c TEXT);
a | b | c
---+-----------+---
f | barfoobar |
(1 row)
This is one of the coolest and most underused features in PostgreSQL. Please spread the word.
You need to define a new type and define your function to return that type.
CREATE TYPE my_type AS (f1 varchar(10), f2 varchar(10) /* , ... */ );
CREATE OR REPLACE FUNCTION get_object_fields(name text)
RETURNS my_type
AS
$$
DECLARE
result_record my_type;
BEGIN
SELECT f1, f2, f3
INTO result_record.f1, result_record.f2, result_record.f3
FROM table1
WHERE pk_col = 42;
SELECT f3
INTO result_record.f3
FROM table2
WHERE pk_col = 24;
RETURN result_record;
END
$$ LANGUAGE plpgsql;
If you want to return more than one record you need to define the function as returns setof my_type
Update
Another option is to use RETURNS TABLE() instead of creating a TYPE which was introduced in Postgres 8.4
CREATE OR REPLACE FUNCTION get_object_fields(name text)
RETURNS TABLE (f1 varchar(10), f2 varchar(10) /* , ... */ )
...
To return a single row
Simpler with OUT parameters:
CREATE OR REPLACE FUNCTION get_object_fields(_school_id int
, OUT user1_id int
, OUT user1_name varchar(32)
, OUT user2_id int
, OUT user2_name varchar(32)) AS
$func$
BEGIN
SELECT INTO user1_id, user1_name
u.id, u.name
FROM users u
WHERE u.school_id = _school_id
LIMIT 1; -- make sure query returns 1 row - better in a more deterministic way?
user2_id := user1_id + 1; -- some calculation
SELECT INTO user2_name
u.name
FROM users u
WHERE u.id = user2_id;
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM get_object_fields(1);
You don't need to create a type just for the sake of this plpgsql function. It may be useful if you want to bind multiple functions to the same composite type. Else, OUT parameters do the job.
There is no RETURN statement. OUT parameters are returned automatically with this form that returns a single row. RETURN is optional.
Since OUT parameters are visible everywhere inside the function body (and can be used just like any other variable), make sure to table-qualify columns of the same name to avoid naming conflicts! (Better yet, use distinct names to begin with.)
Simpler yet - also to return 0-n rows
Typically, this can be simpler and faster if queries in the function body can be combined. And you can use RETURNS TABLE() (since Postgres 8.4, long before the question was asked) to return 0-n rows.
The example from above can be written as:
CREATE OR REPLACE FUNCTION get_object_fields2(_school_id int)
RETURNS TABLE (user1_id int
, user1_name varchar(32)
, user2_id int
, user2_name varchar(32)) AS
$func$
BEGIN
RETURN QUERY
SELECT u1.id, u1.name, u2.id, u2.name
FROM users u1
JOIN users u2 ON u2.id = u1.id + 1
WHERE u1.school_id = _school_id
LIMIT 1; -- may be optional
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM get_object_fields2(1);
RETURNS TABLE is effectively the same as having a bunch of OUT parameters combined with RETURNS SETOF record, just shorter.
The major difference: this function can return 0, 1 or many rows, while the first version always returns 1 row.
Add LIMIT 1 like demonstrated to only allow 0 or 1 row.
RETURN QUERY is simple way to return results from a query directly.
You can use multiple instances in a single function to add more rows to the output.
db<>fiddle here (demonstrating both)
Varying row-type
If your function is supposed to dynamically return results with a different row-type depending on the input, read more here:
Refactor a PL/pgSQL function to return the output of various SELECT queries
If you have a table with this exact record layout, use its name as a type, otherwise you will have to declare the type explicitly:
CREATE OR REPLACE FUNCTION get_object_fields
(
name text
)
RETURNS mytable
AS
$$
DECLARE f1 INT;
DECLARE f2 INT;
…
DECLARE f8 INT;
DECLARE retval mytable;
BEGIN
-- fetch fields f1, f2 and f3 from table t1
-- fetch fields f4, f5 from table t2
-- fetch fields f6, f7 and f8 from table t3
retval := (f1, f2, …, f8);
RETURN retval;
END
$$ language plpgsql;
You can achieve this by using simply as a returns set of records using return query.
CREATE OR REPLACE FUNCTION schemaName.get_two_users_from_school(schoolid bigint)
RETURNS SETOF record
LANGUAGE plpgsql
AS $function$
begin
return query
SELECT id, name FROM schemaName.user where school_id = schoolid;
end;
$function$
And call this function as : select * from schemaName.get_two_users_from_school(schoolid) as x(a bigint, b varchar);
you can do this using OUT parameter and CROSS JOIN
CREATE OR REPLACE FUNCTION get_object_fields(my_name text, OUT f1 text, OUT f2 text)
AS $$
SELECT t1.name, t2.name
FROM table1 t1
CROSS JOIN table2 t2
WHERE t1.name = my_name AND t2.name = my_name;
$$ LANGUAGE SQL;
then use it as a table:
select get_object_fields( 'Pending') ;
get_object_fields
-------------------
(Pending,code)
(1 row)
or
select * from get_object_fields( 'Pending');
f1 | f
---------+---------
Pending | code
(1 row)
or
select (get_object_fields( 'Pending')).f1;
f1
---------
Pending
(1 row)
CREATE TABLE users(user_id int, school_id int, name text);
insert into users values (1, 10,'alice')
,(5, 10,'boy')
,(13, 10,'cassey')
,(17, 10,'delores')
,(4, 11,'elaine');
I setted the user_id as arbitrary int. The function input parameter is the school_id. So if the school_id is 10 you hope to get the following result:
user_id | name | user_id | name
---------+-------+---------+------
1 | alice | 5 | boy
So your query should be something like:
with a as (
select u1.user_id,
u1.name from users u1
where school_id = 10 order by user_id limit 1),
b as
(select u2.user_id,u2.name from users u2
where school_id = 10 order by user_id limit 1 offset 1 )
select * from a cross JOIN b ;
So let's wrap the query to the plpgsql function.
CREATE OR REPLACE FUNCTION
get_object_fields2(_school_id int)
RETURNS TABLE (user1_id int
, user1_name text
, user2_id int
, user2_name text)
LANGUAGE plpgsql AS
$func$
DECLARE countu integer;
BEGIN
countu := (
select count(*) from users where school_id = _school_id);
IF countu >= 2 THEN
RETURN QUERY
with a as (
select u1.user_id,
u1.name from users u1
where school_id = _school_id
order by user_id limit 1),
b as(
select u2.user_id,u2.name from users u2
where school_id = _school_id
order by user_id limit 1 offset 1 )
select * from a cross JOIN b;
elseif countu = 1 then
return query
select u1.user_id, u1.name,u1.user_id, u1.name
from users u1 where school_id = _school_id;
else
RAISE EXCEPTION 'not found';
end if;
END
$func$;