PLPGSQL Function to Calculate Bearing - sql

Basically I can't get my head around the syntax of plpgsql and would appreciate some help with the following efforts.
I have a table containing 1000's of wgs84 points. The following SQL will retrieve a set of points within a bounding box on this table:
SELECT id, ST_X(wgs_geom), ST_Y(wgs_geom), ST_Z(wgs_geom)
FROM points_table
INNER JOIN
(SELECT ST_Transform(ST_GeomFromText('POLYGON((-1.73576102027 1.5059743629,
-1.73591122397 51.5061067655,-1.73548743495 51.5062838333,-1.73533186682
1.5061514313,-1.73576102027 51.5059743629))', 4326, 27700)
) AS bgeom
) AS t2
ON ST_Within(local_geom, t2.bgeom)
What I need to do is add a bearing/azimuth column to the results that describes the bearing at each point in the returned data set.
So the approach I'm trying to implement is to build a plpgsql function that can select the data as per above and calculate the bearing between each set of points in a loop.
However my efforts at understanding basic data access and handling within a plpgsql function are failing miserably.
An example of the current version of the function I'm trying to create is as follows:
CREATE TYPE bearing_type AS (x numeric, y numeric, z numeric, bearing numeric);
--DROP FUNCTION IF EXISTS get_bearings_from_points();
CREATE OR REPLACE FUNCTION get_bearings_from_points()
RETURNS SETOF bearing_type AS
$BODY$
DECLARE
rowdata points_table%rowtype;
returndata bearing_type;
BEGIN
FOR rowdata IN
SELECT nav_id, wgs_geom
FROM points_table INNER JOIN
(SELECT ST_Transform(ST_GeomFromText('POLYGON((-1.73576102027
3.5059743629,-1.73591122397 53.5061067655,-1.73548743495
53.5062838333,-1.73533186682 53.5061514313,-1.73576102027
53.5059743629))', 4326), 27700)
AS bgeom)
AS t2 ON ST_Within(local_geom, t2.bgeom)
LOOP
returndata.x := ST_X(rowdata.wgs_geom);
returndata.y := ST_Y(rowdata.wgs_geom);
returndata.z := ST_Z(rowdata.wgs_geom);
returndata.bearing := ST_Azimuth(<current_point> , <next_point>)
RETURN NEXT returndata;
END LOOP;
RETURN;
END
$BODY$
LANGUAGE plpgsql;
I would like to just call this function as follows:
SELECT get_bearings_from_points();
and get the desired result.
Basically the problems are understanding how to access the rowdata properly such that I can read the current and next points.
In the above example I've had various problems from how to call the ST_X etc SQL functions and have tried EXECUTE select statements with errors re geometry data types.
Any insights/help would be much appreciated.

In PL/pgSQL it's most effective to do as much as is elegantly possible in basic SQL queries at once. You can largely simplify.
I didn't get a definition of the sort order out of your question and left ??? to fill in for you:
CREATE OR REPLACE FUNCTION get_bearings_from_points(_bgeom geometry)
RETURNS TABLE (x numeric, y numeric, z numeric, bearing numeric) AS
$func$
BEGIN
FOR x, y, z, bearing IN
SELECT ST_X(t.wgs_geom), ST_Y(t.wgs_geom), ST_Z(t.wgs_geom)
, ST_Azimuth(t.wgs_geom, lead(t.wgs_geom) OVER (ORDER BY ???))
FROM points_table t
WHERE ST_Within(t.local_geom, _bgeom)
ORDER BY ???
LOOP
RETURN NEXT;
END LOOP;
END
$func$ LANGUAGE plpgsql;
The window function lead() references a column from the next row according to sort order.
This can be simplified further to a single SQL query - possibly wrapped into an SQL function:
CREATE OR REPLACE FUNCTION get_bearings_from_points(_bgeom geometry)
RETURNS TABLE (x numeric, y numeric, z numeric, bearing numeric) AS
$func$
SELECT ST_X(t.wgs_geom), ST_Y(t.wgs_geom), ST_Z(t.wgs_geom)
, ST_Azimuth(t.wgs_geom, lead(t.wgs_geom) OVER (ORDER BY ???))
FROM points_table t
WHERE ST_Within(t.local_geom, $1) -- use numbers in pg 9.1 or older
ORDER BY ???
$func$ LANGUAGE sql;
Parameter names can be referenced in pg 9.2 or later. Per release notes of pg 9.2:
Allow SQL-language functions to reference parameters by name (Matthew
Draper)

Related

PostgreSQL: Why calling function (NUMERIC -> NUMERIC) in SELECT is too expensive?

I have a query with a long select part with many cases when statement and I tried to put them to function for clarity, but the query start running longer than before.
So, for testing, I create one simple table
CREATE TABLE t1 AS
select random()::numeric from generate_series(1, 5000000, 1)
and one very simple function
create funtion f1 (i NUMERIC)
returns NUMERIC
LANGUAGE plpgsql
AS
$body$
begin
return i+1
end
$body$
and running two queries:
1: SELECT random, random+1 from T1
2: SELECT random, f1(random) from T1
and I don't know why the first is running two times faster than the second, so please help.
Sorry for my English :(
That is because the function f1 has to be called for every row, and calling a PL/pgSQL function is quite expensive.
If you define it as an SQL function, it could be inlined, which would be much cheaper:
FREATE FUNCTION f1(i NUMERIC) RETURNS numeric
LANGUAGE sql AS
'SELECT i + 1';

Iterate through table, perform calculation on each row

I would like to preface this by saying I am VERY new to SQL, but my work now requires that I work in it.
I have a dataset containing topographical point data (x,y,z). I am trying to build a KNN model based on this data. For every point 'P', I search for the 100 points in the data set nearest P (nearest meaning geographically nearest). I then average the values of these points (this average is known as a residual), and add this value to the table in the 'resid' column.
As a proof of concept, I am trying to simply iterate over the table, and set the value of the 'resid' column to 1.0 in every row.
My query is this:
CREATE OR REPLACE FUNCTION LoopThroughTable() RETURNS VOID AS '
DECLARE row table%rowtype;
BEGIN
FOR row in SELECT * FROM table LOOP
SET row.resid = 1.0;
END LOOP;
END
' LANGUAGE 'plpgsql';
SELECT LoopThroughTable() as output;
This code executes and returns successfully, but when I check the table, no alterations have been made. What is my error?
Doing updates row-by-row in a loop is almost always a bad idea and will be extremely slow and won't scale. You should really find a way to avoid that.
After having said that:
All your function is doing is to change the value of the column value in memory - you are just modifying the contents of a variable. If you want to update the data you need an update statement:
You need to use an UPDATE inside the loop:
CREATE OR REPLACE FUNCTION LoopThroughTable()
RETURNS VOID
AS
$$
DECLARE
t_row the_table%rowtype;
BEGIN
FOR t_row in SELECT * FROM the_table LOOP
update the_table
set resid = 1.0
where pk_column = t_row.pk_column; --<<< !!! important !!!
END LOOP;
END;
$$
LANGUAGE plpgsql;
Note that you have to add a where condition on the primary key to the update statement otherwise you would update all rows for each iteration of the loop.
A slightly more efficient solution is to use a cursor, and then do the update using where current of
CREATE OR REPLACE FUNCTION LoopThroughTable()
RETURNS VOID
AS $$
DECLARE
t_curs cursor for
select * from the_table;
t_row the_table%rowtype;
BEGIN
FOR t_row in t_curs LOOP
update the_table
set resid = 1.0
where current of t_curs;
END LOOP;
END;
$$
LANGUAGE plpgsql;
So if I execute the UPDATE query after the loop has finished, will that commit the changes to the table?
No. The call to the function runs in the context of the calling transaction. So you need to commit after running SELECT LoopThroughTable() if you have disabled auto commit in your SQL client.
Note that the language name is an identifier, do not use single quotes around it. You should also avoid using keywords like row as variable names.
Using dollar quoting (as I did) also makes writing the function body easier
I'm not sure if the proof of concept example does what you want. In general, with SQL, you almost never need a FOR loop. While you can use a function, if you have PostgreSQL 9.3 or later, you can use a LATERAL subquery to perform subqueries for each row.
For example, create 10,000 random 3D points with a random value column:
CREATE TABLE points(
gid serial primary key,
geom geometry(PointZ),
value numeric
);
CREATE INDEX points_geom_gist ON points USING gist (geom);
INSERT INTO points(geom, value)
SELECT ST_SetSRID(ST_MakePoint(random()*1000, random()*1000, random()*100), 0), random()
FROM generate_series(1, 10000);
For each point, search for the 100 nearest points (except the point in question), and find the residual between the points' value and the average of the 100 nearest:
SELECT p.gid, p.value - avg(l.value) residual
FROM points p,
LATERAL (
SELECT value
FROM points j
WHERE j.gid <> p.gid
ORDER BY p.geom <-> j.geom
LIMIT 100
) l
GROUP BY p.gid
ORDER BY p.gid;
Following is a simple example to update rows in a table:
Assuming the row id field id
Update all rows:
UPDATE my_table SET field1='some value'
WHERE id IN (SELECT id FROM staff)
Selective row update
UPDATE my_table SET field1='some value'
WHERE id IN (SELECT id FROM staff WHERE field2='same value')
You don't need a function for that.
All you need is to run this query:
UPDATE table SET resid = 1.0;
if you want to do it with a function you can use SQL function:
CREATE OR REPLACE FUNCTION LoopThroughTable()
RETURNS VOID AS
$BODY$
UPDATE table SET resid = 1.0;
$BODY$
LANGUAGE sql VOLATILE
if you want to use plpgsql then function would be:
CREATE OR REPLACE FUNCTION LoopThroughTable()
RETURNS void AS
$BODY$
begin
UPDATE table SET resid = 1.0;
end;
$BODY$
LANGUAGE plpgsql VOLATILE
Note that it is not recommended to use plpgsql functions for tasks that can be done with Sql functions.

Input table for PL/pgSQL function

I would like to use a plpgsql function with a table and several columns as input parameter. The idea is to split the table in chunks and do something with each part.
I tried the following function:
CREATE OR REPLACE FUNCTION my_func(Integer)
RETURNS SETOF my_part
AS $$
DECLARE
out my_part;
BEGIN
FOR i IN 0..$1 LOOP
FOR out IN
SELECT * FROM my_func2(SELECT * FROM table1 WHERE id = i)
LOOP
RETURN NEXT out;
END LOOP;
END LOOP;
RETURN;
END;
$$
LANGUAGE plpgsql;
my_func2() is the function that does some work on each smaller part.
CREATE or REPLACE FUNCTION my_func2(table1)
RETURNS SETOF my_part2 AS
$$
BEGIN
RETURN QUERY
SELECT * FROM table1;
END
$$
LANGUAGE plpgsql;
If I run:
SELECT * FROM my_func(99);
I guess I should receive the first 99 IDs processed for each id.
But it says there is an error for the following line:
SELECT * FROM my_func2(select * from table1 where id = i)
The error is:
The subquery is only allowed to return one column
Why does this happen? Is there an easy way to fix this?
There are multiple misconceptions here. Study the basics before you try advanced magic.
Postgres does not have "table variables". You can only pass 1 column or row at a time to a function. Use a temporary table or a refcursor (like commented by #Daniel) to pass a whole table. The syntax is invalid in multiple places, so it's unclear whether that's what you are actually trying.
Even if it is: it would probably be better to process one row at a time or rethink your approach and use a set-based operation (plain SQL) instead of passing cursors.
The data types my_part and my_part2 are undefined in your question. May be a shortcoming of the question or a problem in the test case.
You seem to expect that the table name table1 in the function body of my_func2() refers to the function parameter of the same (type!) name, but this is fundamentally wrong in at least two ways:
You can only pass values. A table name is an identifier, not a value. You would need to build a query string dynamically and execute it with EXECUTE in a plpgsql function. Try a search, many related answers her on SO. Then again, that may also not be what you wanted.
table1 in CREATE or REPLACE FUNCTION my_func2(table1) is a type name, not a parameter name. It means your function expects a value of the type table1. Obviously, you have a table of the same name, so it's supposed to be the associated row type.
The RETURN type of my_func2() must match what you actually return. Since you are returning SELECT * FROM table1, make that RETURNS SETOF table1.
It can just be a simple SQL function.
All of that put together:
CREATE or REPLACE FUNCTION my_func2(_row table1)
RETURNS SETOF table1 AS
'SELECT ($1).*' LANGUAGE sql;
Note the parentheses, which are essential for decomposing a row type. Per documentation:
The parentheses are required here to show that compositecol is a column name not a table name
But there is more ...
Don't use out as variable name, it's a keyword of the CREATE FUNCTION statement.
The syntax of your main query my_func() is more like psudo-code. Too much doesn't add up.
Proof of concept
Demo table:
CREATE TABLE table1(table1_id serial PRIMARY KEY, txt text);
INSERT INTO table1(txt) VALUES ('a'),('b'),('c'),('d'),('e'),('f'),('g');
Helper function:
CREATE or REPLACE FUNCTION my_func2(_row table1)
RETURNS SETOF table1 AS
'SELECT ($1).*' LANGUAGE sql;
Main function:
CREATE OR REPLACE FUNCTION my_func(int)
RETURNS SETOF table1 AS
$func$
DECLARE
rec table1;
BEGIN
FOR i IN 0..$1 LOOP
FOR rec IN
SELECT * FROM table1 WHERE table1_id = i
LOOP
RETURN QUERY
SELECT * FROM my_func2(rec);
END LOOP;
END LOOP;
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM my_func(99);
SQL Fiddle.
But it's really just a a proof of concept. Nothing useful, yet.
As the error log is telling you.. you can return only one column in a subquery, so you have to change it to
SELECT my_func2(SELECT Specific_column_you_need FROM hasval WHERE wid = i)
a possible solution can be that you pass to funct2 the primary key of the table your funct2 needs and then you can obtain the whole table by making the SELECT * inside the function

Passing a ResultSet into a Postgresql Function

Is it possible to pass the results of a postgres query as an input into another function?
As a very contrived example, say I have one query like
SELECT id, name
FROM users
LIMIT 50
and I want to create a function my_function that takes the resultset of the first query and returns the minimum id. Is this possible in pl/pgsql?
SELECT my_function(SELECT id, name FROM Users LIMIT 50); --returns 50
You could use a cursor, but that very impractical for computing a minimum.
I would use a temporary table for that purpose, and pass the table name for use in dynamic SQL:
CREATE OR REPLACE FUNCTION f_min_id(_tbl regclass, OUT min_id int) AS
$func$
BEGIN
EXECUTE 'SELECT min(id) FROM ' || _tbl
INTO min_id;
END
$func$ LANGUAGE plpgsql;
Call:
CREATE TEMP TABLE foo ON COMMIT DROP AS
SELECT id, name
FROM users
LIMIT 50;
SELECT f_min_id('foo');
Major points
The first parameter is of type regclass to prevent SQL injection. More info in this related answer on dba.SE.
I made the temp table ON COMMIT DROP to limit its lifetime to the current transaction. May or may not be what you want.
You can extend this example to take more parameters. Search for code examples for dynamic SQL with EXECUTE.
-> SQLfiddle demo
I would take the problem on the other side, calling an aggregate function for each record of the result set. It's not as flexible but can gives you an hint to work on.
As an exemple to follow your sample problem:
CREATE OR REPLACE FUNCTION myMin ( int,int ) RETURNS int AS $$
SELECT CASE WHEN $1 < $2 THEN $1 ELSE $2 END;
$$ LANGUAGE SQL STRICT IMMUTABLE;
CREATE AGGREGATE my_function ( int ) (
SFUNC = myMin, STYPE = int, INITCOND = 2147483647 --maxint
);
SELECT my_function(id) from (SELECT * FROM Users LIMIT 50) x;
It is not possible to pass an array of generic type RECORD to a plpgsql function which is essentially what you are trying to do.
What you can do is pass in an array of a specific user defined TYPE or of a particular table row type. In the example below you could also swap out the argument data type for the table name users[] (though this would obviously mean getting all data in the users table row).
CREATE TYPE trivial {
"ID" integer,
"NAME" text
}
CREATE OR REPLACE FUNCTION trivial_func(data trivial[])
RETURNS integer AS
$BODY$
DECLARE
BEGIN
--Implementation here using data
return 1;
END$BODY$
LANGUAGE 'plpgsql' VOLATILE;
I think there's no way to pass recordset or table into function (but I'd be glad if i'm wrong). Best I could suggest is to pass array:
create or replace function my_function(data int[])
returns int
as
$$
select min(x) from unnest(data) as x
$$
language SQL;
sql fiddle demo

Custom aggregate function

I am trying to understand aggregate functions and I need help.
So for instance the following sample:
CREATE OR REPLACE FUNCTION array_median(timestamp[])
RETURNS timestamp AS
$$
SELECT CASE WHEN array_upper($1,1) = 0 THEN null ELSE asorted[ceiling(array_upper(asorted,1)/2.0)] END
FROM (SELECT ARRAY(SELECT ($1)[n] FROM
generate_series(1, array_upper($1, 1)) AS n
WHERE ($1)[n] IS NOT NULL
ORDER BY ($1)[n]
) As asorted) As foo ;
$$
LANGUAGE 'sql' IMMUTABLE;
CREATE AGGREGATE median(timestamp) (
SFUNC=array_append,
STYPE=timestamp[],
FINALFUNC=array_median
)
I am not understanding the structure/logic that needs to go into the select statement in the aggregate function itself. Can someone explain what the flow/logic is?
I am writing an aggregate, a strange one, that the return is always the first string it ever sees.
You're showing a median calculation, but want the first text value you see?
Below is how to do that. Assuming you want the first non-null value, that is. If not, you'll need to keep track of if you've got a value already or not.
The accumulator function is written as plpgsql and sql - the plpgsql one lets you use variable names and debug it too. It simply uses COALESCE against the previous accumulated value and the new value and returns the first non-null. So - as soon as you have a non-null in the accumulator everything else gets ignored.
You may also want to consider the "first_value" window function for this sort of thing if you're on a modern (8.4+) version of PostgreSQL.
http://www.postgresql.org/docs/9.1/static/functions-window.html
HTH
BEGIN;
CREATE FUNCTION remember_first(acc text, newval text) RETURNS text AS $$
BEGIN
RAISE NOTICE '% vs % = %', acc, newval, COALESCE(acc, newval);
RETURN COALESCE(acc, newval);
END;
$$ LANGUAGE plpgsql IMMUTABLE;
CREATE FUNCTION remember_first_sql(text,text) RETURNS text AS $$
SELECT COALESCE($1, $2);
$$ LANGUAGE SQL IMMUTABLE;
-- No "initcond" means we start out with null
--
CREATE AGGREGATE first(text) (
sfunc = remember_first,
stype = text
);
CREATE TEMP TABLE tt (t text);
INSERT INTO tt VALUES ('abc'),('def'),('ghi');
SELECT first(t) FROM tt;
ROLLBACK;