iterative depth of tree - sql

The problematic code:
CREATE OR REPLACE FUNCTION foo(searchid INTEGER)
RETURNS INTEGER AS
$$
DECLARE
level INTEGER := 0;
mid INTEGER := searchid;
BEGIN
WHILE EXISTS(SELECT id INTO mid FROM tbl1 WHERE parent_id=mid) LOOP
level := level + 1;
END LOOP;
RETURN level;
END;
$$
LANGUAGE 'plpgsql' IMMUTABLE;
I need to find the tree depth of the element with id searchid. I've written a slightly different function than the one above, which uses mid NOTNULL as the condition of the WHILE loop, and it works.
However, when I try to use EXISTS directly in the WHILE condition, as in the code above, PostgreSQL says:
SQL error:
ERROR: syntax error at or near "$1"
LINE 1: SELECT EXISTS(SELECT id INTO $1 FROM tbl1 WHERE ...
So it applies some strange transformation to my code that makes it syntactically wrong.
How can I fix it?
It runs on PostgreSQL 8.3.17.

Just for the record:
If you were using an up-to-date version of Postgres (recursive CTEs arrived in 8.4), you could do this more efficiently with a single statement:
with recursive tree as (
select id, parent_id, 1 as level
from tbl1
where id = 1
union all
select c.id, c.parent_id, p.level + 1
from tbl1 c
join tree p on c.parent_id = p.id
)
select max(level)
from tree;
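For comparison with foo(), the same CTE can be wrapped in a function. This is a minimal sketch, assuming the tbl1(id, parent_id) layout from the question; the sample rows and the name tree_depth are hypothetical. Starting the start node at level 0 makes the count match foo(). Unlike the loop, which follows a single child per level, the CTE walks every branch, so max(level) is the depth of the deepest branch.

```sql
-- Hypothetical sample rows in the question's tbl1(id, parent_id) layout:
-- 1 has children 2 and 4, and 2 has child 3.
CREATE TEMP TABLE IF NOT EXISTS tbl1 (id int PRIMARY KEY, parent_id int);
TRUNCATE tbl1;
INSERT INTO tbl1 VALUES (1, NULL), (2, 1), (3, 2), (4, 1);

-- Depth of the deepest branch below the start node (hypothetical name).
CREATE OR REPLACE FUNCTION tree_depth(int)
  RETURNS int
  LANGUAGE sql STABLE AS
$func$
WITH RECURSIVE tree AS (
   SELECT id, 0 AS level           -- start node is level 0, as in foo()
   FROM   tbl1
   WHERE  id = $1
   UNION ALL
   SELECT c.id, p.level + 1        -- one level per parent -> child step
   FROM   tbl1 c
   JOIN   tree p ON c.parent_id = p.id
)
SELECT max(level) FROM tree        -- NULL if the start id does not exist
$func$;

SELECT tree_depth(1);  -- 2 (path 1 -> 2 -> 3)
```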

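The key mistake is that you cannot assign a variable with SELECT INTO inside an EXISTS construct; the select list inside EXISTS is ignored anyway. That also explains the "strange transformation": PL/pgSQL evaluates the WHILE condition as an SQL expression (SELECT EXISTS(...)) and replaces the variable mid with the parameter placeholder $1. INTO is only understood by PL/pgSQL at the top level of a statement; inside an expression it reaches the SQL parser, which rejects SELECT ... INTO $1 as a syntax error.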
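If you want to keep the WHILE EXISTS structure, let EXISTS only test for a child and do the assignment in a separate statement inside the loop. This is a sketch, assuming the tbl1(id, parent_id) layout from the question; the function name foo_while and the sample rows are hypothetical. The function declares a local variable instead of assigning to its parameter; the setup line CREATE TEMP TABLE IF NOT EXISTS needs 9.1 or later. ORDER BY id makes the choice of child deterministic. Note that it runs two queries per level, so it is slower than the FOUND-based loop.

```sql
-- Hypothetical sample rows in the question's tbl1(id, parent_id) layout.
CREATE TEMP TABLE IF NOT EXISTS tbl1 (id int PRIMARY KEY, parent_id int);
TRUNCATE tbl1;
INSERT INTO tbl1 VALUES (1, NULL), (2, 1), (3, 2), (4, 1);

CREATE OR REPLACE FUNCTION foo_while(searchid integer)
  RETURNS integer AS
$$
DECLARE
   level integer := 0;
   mid   integer := searchid;   -- current node
BEGIN
   -- EXISTS only checks whether the current node has a child
   WHILE EXISTS (SELECT 1 FROM tbl1 WHERE parent_id = mid) LOOP
      -- the assignment happens in its own SELECT INTO statement
      SELECT INTO mid id
      FROM   tbl1
      WHERE  parent_id = mid
      ORDER  BY id               -- follow the lowest-id child
      LIMIT  1;
      level := level + 1;
   END LOOP;
   RETURN level;
END
$$ LANGUAGE plpgsql STABLE;

SELECT foo_while(1);  -- 2 (path 1 -> 2 -> 3)
```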
I rewrote the function to simplify and make it more secure:
CREATE OR REPLACE FUNCTION foo(_searchid int, OUT _level int)
RETURNS int
LANGUAGE plpgsql STABLE AS
$func$
BEGIN
_level := 0;
LOOP
SELECT INTO _searchid t.id
FROM tbl1 t
WHERE t.parent_id = _searchid;
EXIT WHEN NOT FOUND;
_level := _level + 1;
END LOOP;
END
$func$;
Call:
SELECT foo(1);
Major points
It is your responsibility to prevent the loop from being infinite (for instance, a cycle in parent_id would never end).
The _ prefix of parameters is to avoid naming conflicts with potential columns of the used table.
I use the special variable FOUND, which is set to TRUE after (and only after) certain SQL statements (like SELECT INTO) found a row.
Use the EXIT command to exit the loop when no row was found.
Increment _level in the loop body, after the EXIT check. Don't fold the increment into the SELECT INTO: when no row is found, SELECT INTO sets all its targets to NULL, so the function would return NULL instead of the depth.
Since PostgreSQL 9.1 you can assign to IN parameters, so I (ab)use _searchid and don't need to DECLARE any additional variables. Don't do this if you need the original parameter value later in the function.
The function should not be declared IMMUTABLE, since it accesses a table. I made it STABLE instead. You can make the function IMMUTABLE to "cheat" and be able to use it in index creation (for instance) - but it's on you if such an index breaks after a change in the underlying table.
Recursive CTE
With PostgreSQL 8.4 or later you could also use a recursive CTE for the job. That's what @a_horse hinted at in his comment, and what he has now posted as an answer.
Another example (of many on SO) here.

Instead of writing your own function, you could use a PostgreSQL extension whose function was created to present hierarchical data stored in a table.
The function is called connectby, and it is part of the tablefunc extension. You can find how to use it here.
To install the extension (CREATE EXTENSION requires PostgreSQL 9.1 or later):
CREATE EXTENSION tablefunc;
It gives you several options: the key value of the row to start at, the maximum depth to descend to (or zero for unlimited depth), and the string used to separate keys in the branch output.
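A minimal sketch of connectby against the tbl1(id, parent_id) layout from the question, with hypothetical sample rows. Because connectby returns SETOF record, each call needs a column definition list, and its key columns must match the type of the key column (int here). The start value is passed as text, and the start row comes back at level 0, so max(level) counts the same steps as foo(). tablefunc is a contrib module, so it has to be available on the server.

```sql
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Hypothetical sample rows in the question's tbl1(id, parent_id) layout.
CREATE TEMP TABLE IF NOT EXISTS tbl1 (id int PRIMARY KEY, parent_id int);
TRUNCATE tbl1;
INSERT INTO tbl1 VALUES (1, NULL), (2, 1), (3, 2), (4, 1);

-- Start at id 1 (passed as text), unlimited depth (0),
-- and '~' as the separator in the branch column.
SELECT *
FROM   connectby('tbl1', 'id', 'parent_id', '1', 0, '~')
       AS t(id int, parent_id int, level int, branch text);

-- Without branch_delim there is no branch column; max(level) is the depth.
SELECT max(level)
FROM   connectby('tbl1', 'id', 'parent_id', '1', 0)
       AS t(id int, parent_id int, level int);  -- 2
```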

Related

Column doesn't exist when using WITH statement PostgreSQL

I want to create a function to be used to get the node traversal path.
CREATE TYPE IDType AS (id uuid);
drop function F_ItemPath;
CREATE OR REPLACE FUNCTION F_ItemPath (item record)
RETURNS TABLE (item_id uuid, depth numeric)
AS $$
BEGIN
return QUERY
WITH recursive item_path AS (
SELECT ic.parent_item_id, depth=1
from item_combination ic, item i
WHERE ic.child_item_id=i.id
UNION all
SELECT ic.parent_item_id, depth=ip.depth + 1
FROM item_path ip, item_combination ic WHERE ip.parent_item_id=ic.child_item_id
)
SELECT item_id=ip.parent_item_id, depth=ip.depth FROM item_path ip;
END; $$
LANGUAGE plpgsql;
select * from F_ItemPath(('55D6F516-7D8F-4DF3-A4E5-1E3F505837A1', 'FFE2A4D3-267C-465F-B4B4-C7BB2582F1BC'))
There are two problems:
I tried using a user-defined type (CREATE TYPE IDType AS (id uuid);) as the parameter type, but I don't know how to call the function with a table argument.
There is also an error that says:
SQL Error [42703]: ERROR: column ip.depth does not exist
Where: PL/pgSQL function f_itempath(record) line 3 at RETURN QUERY
What I expect is to be able to use the function normally, with the argument supplied from other tables.
This is the full query that you can try:
http://sqlfiddle.com/#!15/9caba/1
I wrote the query in the DBeaver app, where the error message is slightly different.
I suggest experimenting with it outside SQL Fiddle.
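The expression depth=1 tests whether the column depth equals the value 1 and returns a boolean value. But you never give that boolean expression a proper name, so the CTE column gets a generated name instead of depth. That is why the recursive part fails with "column ip.depth does not exist".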
Additionally you can't add numbers to boolean values, so the expression depth=ip.depth + 1 tries to add 1 to a value of true or false - which fails obviously. If it did work, it would then compare that value with the value in the column depth again.
Did you intend to alias the value 1 with the name depth? Then you need to use 1 as depth and ip.depth + 1 as depth in the recursive part.
In the final SELECT you have the same error: boolean expressions instead of column aliases.
It's also highly recommended to use explicit JOIN operators which were introduced in the SQL standard over 30 years ago.
Using PL/pgSQL to wrap a SQL query is also a bit of an overkill. A SQL function is enough.
Using an untyped record as a parameter seems highly dubious. It won't allow you to access columns using e.g. item.id. But given your example call, it seems you simply want to pass multiple IDs for the anchor (non-recursive) part of the query. That's better done using an array or a variadic parameter, which allows listing multiple arguments separated by commas.
So you probably want something like this:
drop function f_itempath;
CREATE OR REPLACE FUNCTION f_itempath(variadic p_root_id uuid[])
RETURNS TABLE (item_id uuid, depth integer)
as
$$
WITH recursive item_path AS (
SELECT ic.parent_item_id, 1 as depth
FROM item_combination ic
WHERE ic.child_item_id = any(p_root_id) --<< no join needed to access the parameter
UNION all
SELECT ic.parent_item_id, ip.depth + 1
FROM item_path ip
JOIN item_combination ic ON ip.parent_item_id = ic.child_item_id
)
SELECT ip.parent_item_id as item_id, ip.depth
FROM item_path ip;
$$
language sql
stable;
Then you can call it like this (note: no parentheses around the arguments):
select *
from f_itempath('55d6f516-7d8f-4df3-a4e5-1e3f505837a1', 'ffe2a4d3-267c-465f-b4b4-c7bb2582f1bc');
select *
from f_itempath('55d6f516-7d8f-4df3-a4e5-1e3f505837a1', 'ffe2a4d3-267c-465f-b4b4-c7bb2582f1bc', 'df366232-f200-4254-bad5-94e11ea35379');
select *
from f_itempath('55d6f516-7d8f-4df3-a4e5-1e3f505837a1');
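To feed IDs from another table into a variadic parameter, pass an array and mark it with the VARIADIC keyword. The helper count_ids below is hypothetical and only shows the three calling forms; a VALUES list stands in for the other table.

```sql
-- Hypothetical helper, only to show how a VARIADIC uuid[] parameter is called.
CREATE OR REPLACE FUNCTION count_ids(VARIADIC p_ids uuid[])
  RETURNS int
  LANGUAGE sql IMMUTABLE AS
$$ SELECT array_length(p_ids, 1) $$;

-- 1) Separate arguments, like the calls above:
SELECT count_ids('55d6f516-7d8f-4df3-a4e5-1e3f505837a1',
                 'ffe2a4d3-267c-465f-b4b4-c7bb2582f1bc');  -- 2

-- 2) An existing array must be marked with VARIADIC:
SELECT count_ids(VARIADIC '{55d6f516-7d8f-4df3-a4e5-1e3f505837a1}'::uuid[]);  -- 1

-- 3) IDs collected from another table (a VALUES list stands in for it):
SELECT count_ids(VARIADIC ARRAY(
          SELECT v.id
          FROM  (VALUES ('55d6f516-7d8f-4df3-a4e5-1e3f505837a1'::uuid),
                        ('df366232-f200-4254-bad5-94e11ea35379'::uuid)) v(id)));  -- 2
```

Assuming the item table from the question, the same form for the function above would be select * from f_itempath(variadic array(select i.id from item i));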

Iterate through table, perform calculation on each row

I would like to preface this by saying I am VERY new to SQL, but my work now requires that I work in it.
I have a dataset containing topographical point data (x,y,z). I am trying to build a KNN model based on this data. For every point 'P', I search for the 100 points in the data set nearest P (nearest meaning geographically nearest). I then average the values of these points (this average is known as a residual), and add this value to the table in the 'resid' column.
As a proof of concept, I am trying to simply iterate over the table, and set the value of the 'resid' column to 1.0 in every row.
My query is this:
CREATE OR REPLACE FUNCTION LoopThroughTable() RETURNS VOID AS '
DECLARE row table%rowtype;
BEGIN
FOR row in SELECT * FROM table LOOP
SET row.resid = 1.0;
END LOOP;
END
' LANGUAGE 'plpgsql';
SELECT LoopThroughTable() as output;
This code executes and returns successfully, but when I check the table, no alterations have been made. What is my error?
Doing updates row-by-row in a loop is almost always a bad idea and will be extremely slow and won't scale. You should really find a way to avoid that.
Having said that:
All your function does is change a value in memory; at most you are modifying the contents of a variable, never a row of the table. To change the data, you need an UPDATE statement inside the loop:
CREATE OR REPLACE FUNCTION LoopThroughTable()
RETURNS VOID
AS
$$
DECLARE
t_row the_table%rowtype;
BEGIN
FOR t_row in SELECT * FROM the_table LOOP
update the_table
set resid = 1.0
where pk_column = t_row.pk_column; --<<< !!! important !!!
END LOOP;
END;
$$
LANGUAGE plpgsql;
Note that you have to add a WHERE condition on the primary key to the UPDATE statement; otherwise you would update all rows in each iteration of the loop.
A slightly more efficient solution is to use a cursor and then do the update using WHERE CURRENT OF:
CREATE OR REPLACE FUNCTION LoopThroughTable()
RETURNS VOID
AS $$
DECLARE
t_curs cursor for
select * from the_table;
t_row the_table%rowtype;
BEGIN
FOR t_row in t_curs LOOP
update the_table
set resid = 1.0
where current of t_curs;
END LOOP;
END;
$$
LANGUAGE plpgsql;
So if I execute the UPDATE query after the loop has finished, will that commit the changes to the table?
No. The call to the function runs in the context of the calling transaction. So you need to commit after running SELECT LoopThroughTable() if you have disabled auto commit in your SQL client.
Note that the language name is an identifier; do not put single quotes around it. You should also avoid using keywords like row as variable names.
Using dollar quoting (as I did) also makes writing the function body easier.
I'm not sure if the proof of concept example does what you want. In general, with SQL, you almost never need a FOR loop. While you can use a function, if you have PostgreSQL 9.3 or later, you can use a LATERAL subquery to perform subqueries for each row.
For example, create 10,000 random 3D points with a random value column:
CREATE TABLE points(
gid serial primary key,
geom geometry(PointZ),
value numeric
);
CREATE INDEX points_geom_gist ON points USING gist (geom);
INSERT INTO points(geom, value)
SELECT ST_SetSRID(ST_MakePoint(random()*1000, random()*1000, random()*100), 0), random()
FROM generate_series(1, 10000);
For each point, search for the 100 nearest points (except the point in question), and find the residual between the points' value and the average of the 100 nearest:
SELECT p.gid, p.value - avg(l.value) residual
FROM points p,
LATERAL (
SELECT value
FROM points j
WHERE j.gid <> p.gid
ORDER BY p.geom <-> j.geom
LIMIT 100
) l
GROUP BY p.gid
ORDER BY p.gid;
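The question ultimately wants the residual stored in the resid column. This is a sketch of doing that in one set-based UPDATE, joining residuals computed by a LATERAL query like the one above back to the table by gid. To keep it runnable without PostGIS, it swaps the geometry column for hypothetical plain x/y columns and orders by squared Euclidean distance, which cannot use the GiST KNN index; on the real table you would keep ORDER BY p.geom <-> j.geom and LIMIT 100. The table pts, its rows, and k = 2 are assumptions for the demo.

```sql
-- Hypothetical stand-in for the points table: plain x/y columns instead of
-- a PostGIS geometry, plus the resid column the question wants to fill.
CREATE TEMP TABLE IF NOT EXISTS pts (
   gid   int PRIMARY KEY,
   x     float8,
   y     float8,
   value numeric,
   resid numeric
);
TRUNCATE pts;
INSERT INTO pts (gid, x, y, value) VALUES
   (1, 0, 0, 10), (2, 1, 0, 20), (3, 3, 0, 40), (4, 7, 0, 100);

-- Compute all residuals in one statement and write them back by gid.
UPDATE pts AS t
SET    resid = r.residual
FROM  (
   SELECT p.gid, p.value - avg(l.value) AS residual
   FROM   pts p
   CROSS  JOIN LATERAL (
      SELECT j.value
      FROM   pts j
      WHERE  j.gid <> p.gid                     -- exclude the point itself
      ORDER  BY (j.x - p.x)^2 + (j.y - p.y)^2   -- squared distance
      LIMIT  2                                  -- k nearest; 100 in the question
   ) l
   GROUP  BY p.gid, p.value
) r
WHERE  t.gid = r.gid;

SELECT gid, resid FROM pts ORDER BY gid;  -- resid numerically -20, -5, 25, 70
```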
Following is a simple example of updating rows in a table.
Assuming the row ID column is id, update all rows whose id appears in staff:
UPDATE my_table SET field1='some value'
WHERE id IN (SELECT id FROM staff);
Selective row update:
UPDATE my_table SET field1='some value'
WHERE id IN (SELECT id FROM staff WHERE field2='same value');
You don't need a function for that.
All you need is to run this query (the_table stands for your table's name; table itself is a reserved word):
UPDATE the_table SET resid = 1.0;
If you want to do it with a function, you can use an SQL function:
CREATE OR REPLACE FUNCTION LoopThroughTable()
RETURNS VOID AS
$BODY$
UPDATE the_table SET resid = 1.0;
$BODY$
LANGUAGE sql VOLATILE;
If you want to use plpgsql, the function would be:
CREATE OR REPLACE FUNCTION LoopThroughTable()
RETURNS void AS
$BODY$
begin
UPDATE the_table SET resid = 1.0;
end;
$BODY$
LANGUAGE plpgsql VOLATILE;
Note that it is not recommended to use plpgsql functions for tasks that can be done with SQL functions.

Input table for PL/pgSQL function

I would like to use a plpgsql function with a table and several columns as input parameter. The idea is to split the table in chunks and do something with each part.
I tried the following function:
CREATE OR REPLACE FUNCTION my_func(Integer)
RETURNS SETOF my_part
AS $$
DECLARE
out my_part;
BEGIN
FOR i IN 0..$1 LOOP
FOR out IN
SELECT * FROM my_func2(SELECT * FROM table1 WHERE id = i)
LOOP
RETURN NEXT out;
END LOOP;
END LOOP;
RETURN;
END;
$$
LANGUAGE plpgsql;
my_func2() is the function that does some work on each smaller part.
CREATE or REPLACE FUNCTION my_func2(table1)
RETURNS SETOF my_part2 AS
$$
BEGIN
RETURN QUERY
SELECT * FROM table1;
END
$$
LANGUAGE plpgsql;
If I run:
SELECT * FROM my_func(99);
I guess I should receive the first 99 IDs processed for each id.
But it says there is an error for the following line:
SELECT * FROM my_func2(select * from table1 where id = i)
The error is:
The subquery is only allowed to return one column
Why does this happen? Is there an easy way to fix this?
There are multiple misconceptions here. Study the basics before you try advanced magic.
Postgres does not have "table variables". You can only pass one value (a single column value or one row) at a time to a function. Use a temporary table or a refcursor (as commented by @Daniel) to pass a whole table. The syntax is invalid in multiple places, so it's unclear whether that's what you are actually trying.
Even if it is: it would probably be better to process one row at a time or rethink your approach and use a set-based operation (plain SQL) instead of passing cursors.
The data types my_part and my_part2 are undefined in your question. May be a shortcoming of the question or a problem in the test case.
You seem to expect that the table name table1 in the function body of my_func2() refers to the function parameter of the same (type!) name, but this is fundamentally wrong in at least two ways:
You can only pass values. A table name is an identifier, not a value. You would need to build a query string dynamically and execute it with EXECUTE in a plpgsql function. Try a search; there are many related answers here on SO. Then again, that may also not be what you wanted.
table1 in CREATE or REPLACE FUNCTION my_func2(table1) is a type name, not a parameter name. It means your function expects a value of the type table1. Obviously, you have a table of the same name, so it's supposed to be the associated row type.
The RETURN type of my_func2() must match what you actually return. Since you are returning SELECT * FROM table1, make that RETURNS SETOF table1.
It can just be a simple SQL function.
All of that put together:
CREATE or REPLACE FUNCTION my_func2(_row table1)
RETURNS SETOF table1 AS
'SELECT ($1).*' LANGUAGE sql;
Note the parentheses, which are essential for decomposing a row type. Per documentation:
The parentheses are required here to show that compositecol is a column name not a table name
But there is more ...
Don't use out as a variable name; it's a keyword of the CREATE FUNCTION statement.
The syntax of your main query my_func() is more like pseudo-code. Too much doesn't add up.
Proof of concept
Demo table:
CREATE TABLE table1(table1_id serial PRIMARY KEY, txt text);
INSERT INTO table1(txt) VALUES ('a'),('b'),('c'),('d'),('e'),('f'),('g');
Helper function:
CREATE or REPLACE FUNCTION my_func2(_row table1)
RETURNS SETOF table1 AS
'SELECT ($1).*' LANGUAGE sql;
Main function:
CREATE OR REPLACE FUNCTION my_func(int)
RETURNS SETOF table1 AS
$func$
DECLARE
rec table1;
BEGIN
FOR i IN 0..$1 LOOP
FOR rec IN
SELECT * FROM table1 WHERE table1_id = i
LOOP
RETURN QUERY
SELECT * FROM my_func2(rec);
END LOOP;
END LOOP;
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM my_func(99);
SQL Fiddle.
But it's really just a proof of concept. Nothing useful, yet.
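Following the advice to prefer a set-based operation, the two nested loops of my_func() can be replaced by one query that calls my_func2() for each qualifying row through a LATERAL join (PostgreSQL 9.3 or later). This sketch recreates the demo table and helper from above; the explicit IDs and ON CONFLICT (9.5 or later) are an assumption added only so the setup can be rerun safely.

```sql
-- Demo table and helper from the proof of concept above.
CREATE TABLE IF NOT EXISTS table1 (table1_id serial PRIMARY KEY, txt text);
INSERT INTO table1 (table1_id, txt)
VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e'), (6, 'f'), (7, 'g')
ON CONFLICT (table1_id) DO NOTHING;

CREATE OR REPLACE FUNCTION my_func2(_row table1)
  RETURNS SETOF table1 AS
'SELECT ($1).*' LANGUAGE sql;

-- One query instead of two nested loops: my_func2() runs once per row
-- whose id lies in the range that my_func(99) loops over.
SELECT f.*
FROM   table1 t
CROSS  JOIN LATERAL my_func2(t) f
WHERE  t.table1_id BETWEEN 0 AND 99
ORDER  BY f.table1_id;
```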
As the error message tells you, a subquery used as an expression can return only one column, so you have to change it to something like this (a subquery passed as a function argument also needs its own parentheses):
SELECT my_func2((SELECT specific_column_you_need FROM table1 WHERE id = i))
A possible solution is to pass my_func2() the primary key of the row it needs, and then fetch the data with SELECT * inside the function.

PL/pgSQL Return SETOF Records Error

I am relatively new to PostgreSQL and battling my way to get familiar with it. I ran into an error while writing a new PL/pgSQL function: ERROR: type "ordered_parts" does not exist
CREATE OR REPLACE FUNCTION get_ordered_parts(var_bill_to integer)
RETURNS SETOF ordered_parts AS
$BODY$
declare
var_ordered_id record;
var_part ordered_parts;
begin
for var_ordered in select order_id from view_orders where bill_to = var_bill_to
loop
for var_part select os.po_num,os.received,os.customer_note,orders.part_num,orders.description,orders.order_id,orders.remaining_quantity from (select vli.part_num,vli.description,vli.order_id,vli.quantity - vli.quantity_shipped as remaining_quantity from view_line_items as vli where vli.order_id in (select order_id from view_orders where bill_to = var_bill_to and order_id = var_ordered.order_id) and vli.quantity - vli.quantity_shipped > 0)as orders left join order_sales as os on orders.order_id = os.order_id
then
-- Then we've found a leaf part
return next var_part;
end if;
end loop;
end;
$BODY$
LANGUAGE 'plpgsql' VOLATILE
COST 100
ROWS 1000;
ALTER FUNCTION get_ordered_parts(integer) OWNER TO postgres;
Just a note: your code is a perfect example of how never to write a stored procedure. For larger results it can be extremely slow. At a minimum, the two loops can be merged into one; better still, you can use a single RETURN QUERY statement. The next issue is the complete lack of formatting in the embedded SQL: a good line length is between 70 and 100 characters, and writing a long SQL statement on one line ruins readability and maintainability.
A relational database is not an array, and every query has some cost, so don't use nested FOR loops unless you really need them. Sorry for going off-topic.
The error message is telling you that you have declared the return type of your function to be SETOF ordered_parts, but it doesn't know what kind of thing ordered_parts is. Within your Declare block you also have a variable declared as this same type (var_part ordered_parts).
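If you had a table or view called ordered_parts, its "row type" would be created automatically as a type, but that is not the case here. If you just want to use an arbitrary row from a result set, you can use the generic type record.
So in this case your function should say RETURNS SETOF record, and your DECLARE block var_part record. Callers must then supply a column definition list (SELECT * FROM get_ordered_parts(1) AS t(col1 type1, col2 type2, ...)), because a function returning SETOF record does not declare its result columns.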
Bonus tip: rather than looping over the result of your query and running RETURN NEXT on each row, you can use RETURN QUERY to throw the whole result set into the returned set in one go. See this Postgres manual page.
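A sketch of the RETURN QUERY approach, using RETURNS TABLE so that the function declares its own columns: no separate type is needed, and callers need no column definition list. The table line_items_demo, its columns, and the function name open_parts_demo are hypothetical stand-ins for view_line_items; the real column list and types would have to be adapted.

```sql
-- Hypothetical stand-in for view_line_items.
CREATE TEMP TABLE IF NOT EXISTS line_items_demo (
   order_id int, part_num text, quantity int, quantity_shipped int);
TRUNCATE line_items_demo;
INSERT INTO line_items_demo VALUES
   (1, 'A-100', 5, 5), (1, 'B-200', 4, 1), (2, 'C-300', 2, 0);

-- RETURNS TABLE declares the result columns, so no type ordered_parts
-- is needed and callers need no column definition list.
CREATE OR REPLACE FUNCTION open_parts_demo(_order_id int)
  RETURNS TABLE (part_num text, remaining_quantity int) AS
$func$
BEGIN
   -- One RETURN QUERY instead of looping with RETURN NEXT.
   -- Columns are table-qualified to avoid clashes with the OUT columns.
   RETURN QUERY
   SELECT li.part_num, li.quantity - li.quantity_shipped
   FROM   line_items_demo li
   WHERE  li.order_id = _order_id
   AND    li.quantity - li.quantity_shipped > 0;
END
$func$ LANGUAGE plpgsql STABLE;

SELECT * FROM open_parts_demo(1);  -- one row: B-200 | 3
```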

Set a default return value for a Postgres function

I have the following function in Postgres:
CREATE OR REPLACE FUNCTION point_total(user_id integer, gametime date)
RETURNS bigint AS
$BODY$
SELECT sum(points) AS result
FROM picks
WHERE user_id = $1
AND picks.gametime > $2
AND points IS NOT NULL;
$BODY$
LANGUAGE sql VOLATILE;
It works correctly, but when a user starts out and has no points, it very reasonably returns NULL. How can I modify it so that it returns 0 instead?
Changing the body of the function to that below results in an "ERROR: syntax error at or near "IF".
SELECT sum(points) AS result
FROM picks
WHERE user_id = $1
AND picks.gametime > $2
AND points IS NOT NULL;
IF result IS NULL
SELECT 0 AS result;
END;
You need to change the language from sql to plpgsql if you want to use the procedural features of PL/pgSQL. The function body changes, too.
Be aware that all parameter names are visible in the function body, including all levels of SQL statements. If you create a naming conflict, you may need to table-qualify column names like this: table.col, to avoid confusion. Since you refer to function parameters by positional reference ($n) anyway, I just removed parameter names to make it work.
Finally, THEN was missing in the IF statement - the immediate cause of the error message.
One could use COALESCE to substitute for NULL values. For a plain query, that only works if there is at least one resulting row: COALESCE can't fix "no row", it can only replace actual NULL values. Here, however, sum() is an aggregate without GROUP BY, so the query always returns exactly one row, and that row holds NULL when no pick matches. FOUND is therefore always true after this SELECT INTO; test the result for NULL instead.
There are several ways to cover all NULL cases. In plpgsql functions:
CREATE OR REPLACE FUNCTION point_total(integer, date, OUT result bigint)
RETURNS bigint AS
$func$
BEGIN
SELECT sum(p.points) -- aggregate: always returns exactly one row
INTO result
FROM picks p
WHERE p.user_id = $1
AND p.gametime > $2
AND p.points IS NOT NULL; -- optional: sum() ignores NULL values anyway
IF result IS NULL THEN -- no matching points ...
result := 0; -- ... set to 0 explicitly
END IF;
END
$func$ LANGUAGE plpgsql;
Or you can enclose the whole query in a COALESCE expression in an outer SELECT. "No row" from the inner SELECT results in a NULL in the expression. This works as plain SQL, or you can wrap it in an SQL function:
CREATE OR REPLACE FUNCTION point_total(integer, date)
RETURNS bigint AS
$func$
SELECT COALESCE(
(SELECT sum(p.points)
FROM picks p
WHERE p.user_id = $1
AND p.gametime > $2
-- AND p.points IS NOT NULL -- redundant here
), 0)
$func$ LANGUAGE sql;
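Because sum() without GROUP BY always returns exactly one row (holding NULL when nothing matches), the simplest fix keeps LANGUAGE sql and wraps the aggregate itself in COALESCE. A sketch with a hypothetical picks table and a hypothetical function name point_total_simple, so it does not clash with the versions above:

```sql
-- Hypothetical stand-in for the picks table.
CREATE TEMP TABLE IF NOT EXISTS picks (user_id int, gametime date, points int);
TRUNCATE picks;
INSERT INTO picks VALUES (1, '2024-01-10', 3), (1, '2024-01-20', NULL);

-- COALESCE on the aggregate turns "no matching points" into 0.
CREATE OR REPLACE FUNCTION point_total_simple(integer, date)
  RETURNS bigint AS
$func$
SELECT COALESCE(sum(p.points), 0)
FROM   picks p
WHERE  p.user_id = $1
AND    p.gametime > $2
$func$ LANGUAGE sql STABLE;

SELECT point_total_simple(1, '2024-01-01');  -- 3 (the NULL is ignored by sum)
SELECT point_total_simple(2, '2024-01-01');  -- 0 (user has no picks yet)
```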
Related answer:
How to display a default value when no match found in a query?
Concerning naming conflicts
Most likely, one problem was the naming conflict. There have been major changes in version 9.0. I quote the release notes:
E.8.2.5. PL/pgSQL
PL/pgSQL now throws an error if a variable name conflicts with a
column name used in a query (Tom Lane)
Later versions have refined the behavior: in obvious spots, the right alternative is picked automatically. That reduces the potential for conflicts, but it's still there. The advice still applies in Postgres 9.3.