PostgreSQL Function not working with WHERE clause

I am trying to write a function in Postgres to make my queries faster than going through the Django ORM. The problem is that I only get results when there is no WHERE clause in the query.
This is the function and its call, which yields 0 rows:
CREATE OR REPLACE FUNCTION public.standard_search(search_term text, similarity_to integer)
RETURNS TABLE(obj_id integer, app_num integer, app_for text, similarity integer)
LANGUAGE 'plpgsql'
AS $BODY$
DECLARE
similarity integer;
BEGIN
RETURN QUERY
SELECT mark.id, mark.app_num, mark.app_for::text, levenshtein($1, focusword.word) AS similarity
FROM mark
INNER JOIN focusword ON (mark.id = focusword.mark_id)
WHERE similarity <= $2
ORDER BY similarity, mark.app_for, mark.app_num;
END
$BODY$;
select * from public.standard_search('millennium', 4)
This version returns results, but it is slow because the filtering is done in the calling query rather than inside the function:
CREATE OR REPLACE FUNCTION public.standard_search(search_term text, similarity_to integer)
RETURNS TABLE(obj_id integer, app_num integer, app_for text, similarity integer)
LANGUAGE 'plpgsql'
AS $BODY$
DECLARE
similarity integer;
BEGIN
RETURN QUERY
SELECT mark.id, mark.app_num, mark.app_for::text, levenshtein($1, focusword.word) AS similarity
FROM mark
INNER JOIN focusword ON (mark.id = focusword.mark_id)
ORDER BY similarity, mark.app_for, mark.app_num;
END
$BODY$;
select * from public.standard_search('millennium', 4) where similarity <= 4
Can anyone shed some light on what is actually going wrong here? After this I can work on the performance improvements.
I was unable to perform this via VIEWS as it required at least one parameter, i.e., search_term to be passed into the levenshtein() function.
I also had a problem passing a tuple as a parameter to the function, to be used in the WHERE clause like this:
WHERE mark.class in (1,2,3,4,5)
I was doing this previously via the RawSQL feature of the Django ORM, but I am trying to do it here for the performance gains.

The identifier similarity is used in a confusing variety of ways. Not all make sense ...
CREATE OR REPLACE FUNCTION public.standard_search(search_term text, similarity_to integer)
RETURNS TABLE(obj_id integer, app_num integer, app_for text, similarity integer) -- ①
LANGUAGE 'plpgsql'
AS $BODY$
DECLARE
similarity integer; -- ②
BEGIN
RETURN QUERY
SELECT mark.id, mark.app_num, mark.app_for::text, levenshtein($1, focusword.word) AS similarity -- ③
FROM mark
INNER JOIN focusword ON (mark.id = focusword.mark_id)
WHERE similarity <= $2 -- ④
ORDER BY similarity, mark.app_for, mark.app_num; -- ⑤
END
$BODY$;
① ... as column in the return type defined by RETURNS TABLE - effectively an OUT parameter.
② ... as variable - overruling visibility of the OUT parameter. But why?
③ ... as column alias in the SELECT list.
④ ... in the WHERE clause, which makes no sense. It does not refer to ③ (like you seem to assume). Output names are not visible in the WHERE clause. There you can only refer to input column names - or variables and parameters. Since the variable (hiding the parameter, but that makes no difference here) is NULL, no rows are returned. Ever.
⑤ ... in ORDER BY, which resolves to the column alias defined in ③
This is madness. Even I had a hard time figuring out which is visible where, and I have some experience with this. Avoid naming conflicts like this. Follow some naming convention and use distinct names for parameters, variables, column names and aliases!
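To illustrate point ④ in isolation, here is a minimal sketch that uses an inline VALUES list instead of the real tables (the names t, w, sub and sim are made up for the demo). The commented-out query fails because an output name is not visible in WHERE; wrapping the expression in a subquery, or repeating the expression, works:

```sql
-- Fails with: column "sim" does not exist
-- SELECT w, length(w) AS sim
-- FROM   (VALUES ('a'), ('abcd')) t(w)
-- WHERE  sim <= 2;

-- Works: in the outer query, sim is an input column
SELECT w, sim
FROM  (
   SELECT w, length(w) AS sim
   FROM   (VALUES ('a'), ('abcd')) t(w)
   ) sub
WHERE  sim <= 2
ORDER  BY sim;
```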
Related:
PostgreSQL does not accept column alias in WHERE clause
Postgres Function NULL value for row that references NEW
How to return result of a SELECT inside a function in PostgreSQL?
Possible solution
This would make more sense:
CREATE OR REPLACE FUNCTION public.standard_search(_search_term text, _similarity_to integer)
RETURNS TABLE(obj_id integer, app_num integer, app_for text, similarity integer) AS
$func$
SELECT m.id, m.app_num, m.app_for::text, levenshtein(_search_term, f.word) AS sim
FROM mark m
JOIN focusword f ON m.id = f.mark_id
WHERE levenshtein(_search_term, f.word) <= _similarity_to
ORDER BY sim, m.app_for, m.app_num;
$func$ LANGUAGE sql;
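The question also asks about passing a list like (1,2,3,4,5) into the function. One common way, sketched here under assumptions, is an integer array parameter combined with = ANY(). The function name standard_search_in_classes, the parameter _classes and the integer type of mark.class are assumptions, not taken from the question. levenshtein() requires the fuzzystrmatch extension.

```sql
CREATE OR REPLACE FUNCTION public.standard_search_in_classes(
   _search_term text, _similarity_to integer, _classes integer[])  -- hypothetical signature
RETURNS TABLE(obj_id integer, app_num integer, app_for text, similarity integer)
LANGUAGE sql AS
$func$
SELECT m.id, m.app_num, m.app_for::text, levenshtein(_search_term, f.word) AS sim
FROM   mark m
JOIN   focusword f ON m.id = f.mark_id
WHERE  levenshtein(_search_term, f.word) <= _similarity_to
AND    m.class = ANY (_classes)   -- stands in for: m.class IN (1,2,3,4,5)
ORDER  BY sim, m.app_for, m.app_num;
$func$;

-- Call with an array literal
SELECT * FROM public.standard_search_in_classes('millennium', 4, '{1,2,3,4,5}');
```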

Related

Compute an aggregated tsrange from a set of entries?

I am trying to compute an aggregated tsrange from a set of rows that I extract from an SQL query. The problem is that I keep getting errors saying that the input parameter is not being passed in.
CREATE OR REPLACE AGGREGATE range_merge(anyrange)
(
sfunc = range_merge,
stype = anyrange
);
DROP FUNCTION IF EXISTS aggregate_validity(entity_name regclass, entry bigint);
CREATE OR REPLACE FUNCTION aggregate_validity(entity_name regclass, entry bigint) returns tsrange AS
$$
DECLARE
result tsrange;
BEGIN
EXECUTE format('select range_merge(valid) from %s where entity_id = %U', entity_name, entry) into result;
return result;
END
$$ LANGUAGE plpgsql;
When I do:
select * from aggregate_validity(country, 1);
I get an error stating that the entity name and entry do not exist. It does not seem to parameterize the input into the statement properly.
Function:
EXECUTE format('select range_merge(valid) from %s where entity_id=%U',entity_name, entry)
into result;
=>
EXECUTE format('select range_merge(valid) from %I where entity_id=%s',entity_name, entry)
into result;
--%I for identifier, %s for value
Call:
select * from aggregate_validity(country, 1)
=>
select * from aggregate_validity('country', 1);
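For reference, format() knows the specifiers %s (plain string), %I (identifier, quoted if needed) and %L (literal, quoted and escaped); there is no %U. A small standalone sketch with made-up values shows what each one produces:

```sql
SELECT format('SELECT * FROM %I WHERE name = %L AND id = %s'
            , 'my table', 'O''Reilly', 1);
-- SELECT * FROM "my table" WHERE name = 'O''Reilly' AND id = 1
```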
CREATE OR REPLACE AGGREGATE range_merge(anyrange) (
SFUNC = range_merge
, STYPE = anyrange
);
-- DROP FUNCTION IF EXISTS aggregate_validity(entity_name regclass, entry bigint);
CREATE OR REPLACE FUNCTION aggregate_validity(entity_name regclass, entry bigint, OUT result tsrange)
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE 'SELECT range_merge(valid) FROM ' || entity_name || ' WHERE entity_id = $1'
INTO result
USING entry;
END
$func$;
Call:
SELECT aggregate_validity('country', 1);
The call does not need SELECT * FROM, as the function returns a single value per definition.
I used an OUT parameter to simplify (OUT result tsrange). See:
Returning from a function with OUT parameter
Don't concatenate the entry value into the SQL string. Pass it as value with the USING clause. Cleaner, faster.
Since entity_name is passed as regclass, it's safe to simply concatenate (which is a bit cheaper). See:
Table name as a PostgreSQL function parameter
Plus, missing quotes and incorrect format specifiers, as Lukasz already provided.
Your custom aggregate function range_merge() has some caveats:
I wouldn't name it "range_merge", that being the name of the plain function range_merge(), too. While that's legal, it still invites confusing errors.
You are aware that the function range_merge() includes gaps between input ranges in the output range?
range_merge() returns NULL for any NULL input. So if your table has any NULL values in the column valid, the result is always NULL. I strongly suggest that any involved columns shall be defined as NOT NULL.
If you are at liberty to install additional modules, consider range_agg by Paul Jungwirth, who is also here on Stack Overflow. It provides the superior function range_agg() addressing some of the mentioned issues.
If you don't want to include gaps, consider the Postgres Wiki page on range aggregation.
I would probably not use aggregate_validity() at all. It obscures the nested functionality from the Postgres query planner and may lead to suboptimal query plans. Typically, you can replace it with a correlated or a LATERAL subquery, which can be planned and optimized by Postgres in context of the outer query. I appended a demo to the fiddle:
db<>fiddle here
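A rough sketch of what the LATERAL variant could look like. It assumes a hypothetical outer table entity(id), which is not part of the question, plus the table country(entity_id, valid) from the call above, and it uses the custom aggregate range_merge(anyrange) defined earlier:

```sql
SELECT e.id, v.validity
FROM   entity e                      -- hypothetical outer table
LEFT   JOIN LATERAL (
   SELECT range_merge(c.valid) AS validity   -- custom aggregate from above
   FROM   country c
   WHERE  c.entity_id = e.id
   ) v ON true;
```

Since the subquery is plain SQL, the planner can see through it, which is not the case for the dynamic SQL inside the function.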
Related:
What is the difference between LATERAL and a subquery in PostgreSQL?

How to convert Postgres plpgsql user-defined function to LANGUAGE SQL user-defined function?

My understanding is that in Postgres we can write user-defined functions in SQL style and in PL/pgSQL style, and that it should be possible to translate from one to the other. First off, am I conceptually wrong?
Here is an example:
I was trying to convert such code below:
CREATE OR REPLACE FUNCTION getNthHighestSalary(N integer) RETURNS integer
AS $$
BEGIN
return (
select distinct salary
from employee
order by salary
limit 1 offset $1-1);
END;$$ LANGUAGE plpgsql;
into something like:
CREATE OR REPLACE FUNCTION getNthHighestSalary(N integer) RETURNS integer
AS
BEGIN
return (
select distinct salary
from employee
order by salary
limit 1 offset $1-1);
END; LANGUAGE SQL;
No matter what I tried, the converted code won't work in Postgres and always throws a weird syntax error.
So how do I convert the code above into a working SQL function that runs in Postgres? In particular, please explain where the problem is and what the major differences are between SQL and PL/pgSQL function syntax in Postgres. Thanks a lot.
BTW, here's the code for creating test table and inserting test data:
create table Employee
(
id varchar(255) PRIMARY KEY,
Salary numeric
);
insert into Employee values('1',100),('2',200),('3',300);
If you want to use LANGUAGE SQL, then there are a couple of changes you have to make.
First is to get rid of the BEGIN and END.
Second is to simply state the SELECT query without the RETURN keyword.
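There were some other problems: you should order by salary desc, the return type is numeric rather than integer (the salary column is numeric), and the function body still has to be quoted so that the ; inside it does not end the CREATE FUNCTION statement. Enclose it in $$ just as you do for the plpgsql function.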
CREATE OR REPLACE FUNCTION getNthHighestSalary(N integer) RETURNS numeric
AS $$
select distinct salary
from employee
order by salary desc
limit 1 offset $1-1;
$$ LANGUAGE SQL;
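A quick check against the test data from the question (salaries 100, 200 and 300):

```sql
SELECT getNthHighestSalary(1);  -- 300
SELECT getNthHighestSalary(2);  -- 200
```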

PLPGSQL Function to Calculate Bearing

Basically I can't get my head around the syntax of plpgsql and would appreciate some help with the following efforts.
I have a table containing 1000's of wgs84 points. The following SQL will retrieve a set of points within a bounding box on this table:
SELECT id, ST_X(wgs_geom), ST_Y(wgs_geom), ST_Z(wgs_geom)
FROM points_table
INNER JOIN
(SELECT ST_Transform(ST_GeomFromText('POLYGON((-1.73576102027 51.5059743629,
-1.73591122397 51.5061067655,-1.73548743495 51.5062838333,-1.73533186682
51.5061514313,-1.73576102027 51.5059743629))', 4326), 27700)
AS bgeom
) AS t2
ON ST_Within(local_geom, t2.bgeom)
What I need to do is add a bearing/azimuth column to the results that describes the bearing at each point in the returned data set.
So the approach I'm trying to implement is to build a plpgsql function that can select the data as per above and calculate the bearing between each set of points in a loop.
However my efforts at understanding basic data access and handling within a plpgsql function are failing miserably.
An example of the current version of the function I'm trying to create is as follows:
CREATE TYPE bearing_type AS (x numeric, y numeric, z numeric, bearing numeric);
--DROP FUNCTION IF EXISTS get_bearings_from_points();
CREATE OR REPLACE FUNCTION get_bearings_from_points()
RETURNS SETOF bearing_type AS
$BODY$
DECLARE
rowdata points_table%rowtype;
returndata bearing_type;
BEGIN
FOR rowdata IN
SELECT nav_id, wgs_geom
FROM points_table INNER JOIN
(SELECT ST_Transform(ST_GeomFromText('POLYGON((-1.73576102027
53.5059743629,-1.73591122397 53.5061067655,-1.73548743495
53.5062838333,-1.73533186682 53.5061514313,-1.73576102027
53.5059743629))', 4326), 27700)
AS bgeom)
AS t2 ON ST_Within(local_geom, t2.bgeom)
LOOP
returndata.x := ST_X(rowdata.wgs_geom);
returndata.y := ST_Y(rowdata.wgs_geom);
returndata.z := ST_Z(rowdata.wgs_geom);
returndata.bearing := ST_Azimuth(<current_point> , <next_point>)
RETURN NEXT returndata;
END LOOP;
RETURN;
END
$BODY$
LANGUAGE plpgsql;
I would like to just call this function as follows:
SELECT get_bearings_from_points();
and get the desired result.
Basically the problems are understanding how to access the rowdata properly such that I can read the current and next points.
In the above example I've had various problems from how to call the ST_X etc SQL functions and have tried EXECUTE select statements with errors re geometry data types.
Any insights/help would be much appreciated.
In PL/pgSQL it's most effective to do as much as is elegantly possible in basic SQL queries at once. You can largely simplify.
I didn't get a definition of the sort order out of your question and left ??? to fill in for you:
CREATE OR REPLACE FUNCTION get_bearings_from_points(_bgeom geometry)
RETURNS TABLE (x numeric, y numeric, z numeric, bearing numeric) AS
$func$
BEGIN
FOR x, y, z, bearing IN
SELECT ST_X(t.wgs_geom), ST_Y(t.wgs_geom), ST_Z(t.wgs_geom)
, ST_Azimuth(t.wgs_geom, lead(t.wgs_geom) OVER (ORDER BY ???))
FROM points_table t
WHERE ST_Within(t.local_geom, _bgeom)
ORDER BY ???
LOOP
RETURN NEXT;
END LOOP;
END
$func$ LANGUAGE plpgsql;
The window function lead() references a column from the next row according to sort order.
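A tiny standalone sketch of how lead() behaves, using a made-up VALUES list (the names t, id and val are only for the demo). The last row has no successor, so lead() returns NULL there:

```sql
SELECT id, val, lead(val) OVER (ORDER BY id) AS next_val
FROM  (VALUES (1, 'a'), (2, 'b'), (3, 'c')) AS t(id, val)
ORDER  BY id;
-- id | val | next_val
--  1 | a   | b
--  2 | b   | c
--  3 | c   | NULL
```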
This can be simplified further to a single SQL query - possibly wrapped into an SQL function:
CREATE OR REPLACE FUNCTION get_bearings_from_points(_bgeom geometry)
RETURNS TABLE (x numeric, y numeric, z numeric, bearing numeric) AS
$func$
SELECT ST_X(t.wgs_geom), ST_Y(t.wgs_geom), ST_Z(t.wgs_geom)
, ST_Azimuth(t.wgs_geom, lead(t.wgs_geom) OVER (ORDER BY ???))
FROM points_table t
WHERE ST_Within(t.local_geom, $1) -- use numbers in pg 9.1 or older
ORDER BY ???
$func$ LANGUAGE sql;
Parameter names can be referenced in pg 9.2 or later. Per release notes of pg 9.2:
Allow SQL-language functions to reference parameters by name (Matthew Draper)

Passing a ResultSet into a Postgresql Function

Is it possible to pass the results of a postgres query as an input into another function?
As a very contrived example, say I have one query like
SELECT id, name
FROM users
LIMIT 50
and I want to create a function my_function that takes the resultset of the first query and returns the minimum id. Is this possible in pl/pgsql?
SELECT my_function(SELECT id, name FROM Users LIMIT 50); --returns 50
You could use a cursor, but that is very impractical for computing a minimum.
I would use a temporary table for that purpose, and pass the table name for use in dynamic SQL:
CREATE OR REPLACE FUNCTION f_min_id(_tbl regclass, OUT min_id int) AS
$func$
BEGIN
EXECUTE 'SELECT min(id) FROM ' || _tbl
INTO min_id;
END
$func$ LANGUAGE plpgsql;
Call:
CREATE TEMP TABLE foo ON COMMIT DROP AS
SELECT id, name
FROM users
LIMIT 50;
SELECT f_min_id('foo');
Major points
The first parameter is of type regclass to prevent SQL injection. More info in this related answer on dba.SE.
I made the temp table ON COMMIT DROP to limit its lifetime to the current transaction. May or may not be what you want.
You can extend this example to take more parameters. Search for code examples for dynamic SQL with EXECUTE.
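One caveat with ON COMMIT DROP: if the client runs in autocommit mode (an assumption about your setup, but the default in psql), each statement is its own transaction, so the temp table would already be gone when the function is called. A sketch of the same call wrapped in an explicit transaction:

```sql
BEGIN;

CREATE TEMP TABLE foo ON COMMIT DROP AS
SELECT id, name
FROM   users
LIMIT  50;

SELECT f_min_id('foo');

COMMIT;  -- the temp table foo is dropped here
```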
I would approach the problem from the other side and call an aggregate function on each record of the result set. It's not as flexible, but it can give you a hint to work from.
As an example following your sample problem:
CREATE OR REPLACE FUNCTION myMin ( int,int ) RETURNS int AS $$
SELECT CASE WHEN $1 < $2 THEN $1 ELSE $2 END;
$$ LANGUAGE SQL STRICT IMMUTABLE;
CREATE AGGREGATE my_function ( int ) (
SFUNC = myMin, STYPE = int, INITCOND = 2147483647 --maxint
);
SELECT my_function(id) from (SELECT * FROM Users LIMIT 50) x;
It is not possible to pass an array of generic type RECORD to a plpgsql function which is essentially what you are trying to do.
What you can do is pass in an array of a specific user defined TYPE or of a particular table row type. In the example below you could also swap out the argument data type for the table name users[] (though this would obviously mean getting all data in the users table row).
CREATE TYPE trivial AS (
"ID" integer,
"NAME" text
);
CREATE OR REPLACE FUNCTION trivial_func(data trivial[])
RETURNS integer AS
$BODY$
DECLARE
BEGIN
--Implementation here using data
return 1;
END$BODY$
LANGUAGE 'plpgsql' VOLATILE;
I think there's no way to pass a recordset or table into a function (but I'd be glad if I'm wrong). The best I can suggest is to pass an array:
create or replace function my_function(data int[])
returns int
as
$$
select min(x) from unnest(data) as x
$$
language SQL;
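A possible call for the array variant, building the array from the query in the question with an ARRAY constructor (this assumes users.id is of type integer):

```sql
SELECT my_function(ARRAY(SELECT id FROM users LIMIT 50));
```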

How to return result of a SELECT inside a function in PostgreSQL?

I have this function in PostgreSQL, but I don't know how to return the result of the query:
CREATE OR REPLACE FUNCTION wordFrequency(maxTokens INTEGER)
RETURNS SETOF RECORD AS
$$
BEGIN
SELECT text, count(*), 100 / maxTokens * count(*)
FROM (
SELECT text
FROM token
WHERE chartype = 'ALPHABETIC'
LIMIT maxTokens
) as tokens
GROUP BY text
ORDER BY count DESC
END
$$
LANGUAGE plpgsql;
But I don't know how to return the result of the query inside the PostgreSQL function.
I found that the return type should be SETOF RECORD, right? But the return command is not right.
What is the right way to do this?
Use RETURN QUERY:
CREATE OR REPLACE FUNCTION word_frequency(_max_tokens int)
RETURNS TABLE (txt text -- also visible as OUT param in function body
, cnt bigint
, ratio bigint)
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY
SELECT t.txt
, count(*) AS cnt -- column alias only visible in this query
, (count(*) * 100) / _max_tokens -- I added parentheses
FROM (
SELECT t.txt
FROM token t
WHERE t.chartype = 'ALPHABETIC'
LIMIT _max_tokens
) t
GROUP BY t.txt
ORDER BY cnt DESC; -- potential ambiguity
END
$func$;
Call:
SELECT * FROM word_frequency(123);
Defining the return type explicitly is much more practical than returning a generic record. This way you don't have to provide a column definition list with every function call. RETURNS TABLE is one way to do that. There are others. Data types of OUT parameters have to match exactly what is returned by the query.
Choose names for OUT parameters carefully. They are visible in the function body almost anywhere. Table-qualify columns of the same name to avoid conflicts or unexpected results. I did that for all columns in my example.
But note the potential naming conflict between the OUT parameter cnt and the column alias of the same name. In this particular case (RETURN QUERY SELECT ...) Postgres uses the column alias over the OUT parameter either way. This can be ambiguous in other contexts, though. There are various ways to avoid any confusion:
Use the ordinal position of the item in the SELECT list: ORDER BY 2 DESC. Example:
Select first row in each GROUP BY group?
Repeat the expression ORDER BY count(*).
(Not required here.) Set the configuration parameter plpgsql.variable_conflict or use the special command #variable_conflict error | use_variable | use_column in the function. See:
Naming conflict between function parameter and result of JOIN with USING clause
Don't use "text" or "count" as column names. Both are legal to use in Postgres, but "count" is a reserved word in standard SQL and a basic function name and "text" is a basic data type. Can lead to confusing errors. I use txt and cnt in my examples, you may want more explicit names.
Added a missing ; and corrected a syntax error in the header. (_max_tokens int), not (int maxTokens) - data type after name.
While working with integer division, it's better to multiply first and divide later, to minimize the rounding error. Or work with numeric or a floating point type. See below.
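The effect of the evaluation order is easy to see with made-up numbers (123 tokens in total, a count of 50). Integer division truncates, so dividing first loses everything:

```sql
SELECT 100 / 123 * 50             AS divide_first    -- 0
     , (50 * 100) / 123           AS multiply_first  -- 40
     , round(50 * 100.0 / 123, 2) AS as_numeric;     -- 40.65
```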
Alternative
This is what I think your query should actually look like (calculating a relative share per token):
CREATE OR REPLACE FUNCTION word_frequency(_max_tokens int)
RETURNS TABLE (txt text
, abs_cnt bigint
, relative_share numeric)
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY
SELECT t.txt, t.cnt
, round((t.cnt * 100) / (sum(t.cnt) OVER ()), 2) -- AS relative_share
FROM (
SELECT t.txt, count(*) AS cnt
FROM token t
WHERE t.chartype = 'ALPHABETIC'
GROUP BY t.txt
ORDER BY cnt DESC
LIMIT _max_tokens
) t
ORDER BY t.cnt DESC;
END
$func$;
The expression sum(t.cnt) OVER () is a window function. You could use a CTE instead of the subquery. Pretty, but a subquery is typically cheaper in simple cases like this one (mostly before Postgres 12).
A final explicit RETURN statement is not required (but allowed) when working with OUT parameters or RETURNS TABLE (which makes implicit use of OUT parameters).
round() with two parameters only works for numeric types. count() in the subquery produces a bigint result and a sum() over this bigint produces a numeric result, thus we deal with a numeric number automatically and everything just falls into place.
Please see the following link for documentation:
https://www.postgresql.org/docs/current/xfunc-sql.html
Example:
CREATE FUNCTION sum_n_product_with_tab (x int)
RETURNS TABLE(sum int, product int) AS $$
SELECT $1 + tab.y, $1 * tab.y FROM tab;
$$ LANGUAGE SQL;