PostgreSQL function returning a data cube

First off, the iceberg-cube query is defined as in an image that is not reproduced here.
Let's say I have a relation sales with the columns item, location, year, supplier, unit_sales,
and I would like to write a plpgsql function as
a wrapper around the query in the image, to specify the parameter N,
like so:
create or replace function iceberg_query( percentage integer )
returns cube
/* Code here */
as
$$
declare
numrows int;
begin
select count(*) into numrows from sales;
select item, location, year, count(*)
from sales
group by cube(item,location,year)
having count(*) >= numrows*percentage/100;
end;
$$ language 'plpgsql'
What do I need to add to the "Code here" part to make this work? How do I specify a data cube as a return type in plpgsql?

To make your plpgsql function work, you need a RETURNS clause matching what you return. And you need to actually return something. I suppose:
CREATE OR REPLACE FUNCTION iceberg_query ( percentage numeric)
RETURNS TABLE (item ?TYPE?, location ?TYPE?, year ?TYPE?, ct bigint)
AS
$func$
DECLARE
numrows bigint := (SELECT count(*) FROM sales);
BEGIN
RETURN QUERY
SELECT s.item, s.location, s.year, count(*)
FROM sales s
GROUP BY cube(s.item,s.location,s.year)
HAVING count(*) >= numrows * percentage / 100;
END
$func$ LANGUAGE plpgsql;
Replace the placeholders ?TYPE? with actual (undisclosed) data types.
Call the function with:
SELECT * FROM iceberg_query (10);
Note how I table-qualify all column names in the query to avoid naming collisions with the new OUT parameters of the same name.
And note the use of numeric instead of integer as pointed out by Scoots in a comment.
Related:
How to return result of a SELECT inside a function in PostgreSQL?
plpgsql error "RETURN NEXT cannot have a parameter in function with OUT parameters" in table-returning function
Aside: you don't need a function for this. This plain SQL query does the same:
SELECT s.item, s.location, s.year, count(*)
FROM sales s
GROUP BY cube(s.item,s.location,s.year)
HAVING count(*) >= (SELECT count(*) * $percentage / 100 FROM sales); -- your pct here
Provide a numeric literal (10.0, not 10) to avoid integer division and the rounding that comes with it.
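To illustrate the difference, here is a minimal sketch with made-up numbers (7 rows in total and a threshold of 15 percent; neither figure comes from the question):

```sql
-- Hypothetical figures: 7 rows in total, threshold of 15 percent.
SELECT 7 * 15   / 100 AS integer_threshold   -- integer / integer truncates the fraction
     , 7 * 15.0 / 100 AS numeric_threshold;  -- a numeric literal keeps the fraction
```

With the truncated threshold, a group holding a single row would pass the HAVING test even though it falls short of the exact 15 percent.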

Related

Return a composite type or multiple columns from a PostgreSQL function

My aim is to write a function that takes one parameter and returns two values. The query itself works perfectly, but when executed via the function, I receive an error saying that a subquery must not return multiple columns.
My function is as follows:
CREATE TYPE double_integer_type AS (p1 integer, p2 integer);
DROP FUNCTION next_dvd_in_queue;
CREATE OR REPLACE FUNCTION next_dvd_in_queue (member_id_p1 integer) RETURNS double_integer_type as $$
BEGIN
RETURN(
select temp2.dvdid,
temp2.movie_title
from
(select temp1.dvdid,
temp1.movie_title,
temp1.customer_priority
from
(select *
from rentalqueue
where rentalqueue.memberid=member_id_p1) temp1
inner join dvd on dvd.dvdid=temp1.dvdid
where dvd.dvdquantityonhand>0) temp2
order by temp2.customer_priority asc
limit 1
);
END; $$ LANGUAGE PLPGSQL
Call:
select dvdid from next_dvd_in_queue(3);
The query, when executed with a hard-coded value, is:
select temp2.dvdid,
temp2.movie_title
from
(select temp1.dvdid,
temp1.movie_title,
temp1.customer_priority
from
(select *
from rentalqueue
where rentalqueue.memberid=3) temp1
inner join dvd on dvd.dvdid=temp1.dvdid
where dvd.dvdquantityonhand>0) temp2
order by temp2.customer_priority asc
limit 1
The above query works fine.
However, when I call the function in the following way:
select * from next_dvd_in_queue(3);
I get the following error:
ERROR: subquery must return only one column
LINE 1: SELECT (
^
QUERY: SELECT (
select temp2.dvdid,
temp2.movie_title
from
(select temp1.dvdid,
temp1.movie_title,
temp1.customer_priority
from
(select *
from rentalqueue
where rentalqueue.memberid=3) temp1
inner join dvd on dvd.dvdid=temp1.dvdid
where dvd.dvdquantityonhand>0) temp2
order by temp2.customer_priority asc
limit 1
)
CONTEXT: PL/pgSQL function next_dvd_in_queue(integer) line 3 at RETURN
You can fix the syntax error with an explicit cast to the composite type:
CREATE OR REPLACE FUNCTION next_dvd_in_queue (member_id_p1 integer)
RETURNS double_integer_type AS
$func$
BEGIN
RETURN (
SELECT ROW(temp2.dvdid, temp2.movie_title)::double_integer_type
FROM ...
);
END
$func$ LANGUAGE plpgsql;
But I would remove the needless complication with the composite type and use OUT parameters instead:
CREATE OR REPLACE FUNCTION pg_temp.next_dvd_in_queue (member_id_p1 integer
, OUT p1 integer
, OUT p2 varchar(100)) AS
$func$
BEGIN
SELECT INTO p1, p2
temp2.dvdid, temp2.movie_title
FROM ...;
END
$func$ LANGUAGE plpgsql;
Avoid naming collisions between parameter names and column names. I like to stick to a naming convention where I prefix all parameter names with _, so _member_id_p1, _p1, _p2.
Related:
Returning from a function with OUT parameter
How to return result of a SELECT inside a function in PostgreSQL?
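For a version that can be run end to end, here is a minimal sketch of the OUT-parameter approach. The table definitions, sample rows, and the names next_dvd, _member_id, _dvdid and _title are assumptions made up for illustration; the real schema is not shown in the question, and the nested subqueries are flattened into a single join.

```sql
-- Hypothetical tables, reduced to the columns the query touches.
CREATE TEMP TABLE dvd (dvdid int PRIMARY KEY, dvdquantityonhand int);
CREATE TEMP TABLE rentalqueue (memberid int, dvdid int, movie_title text, customer_priority int);

INSERT INTO dvd VALUES (10, 0), (11, 2);
INSERT INTO rentalqueue VALUES (3, 10, 'Out of stock', 1), (3, 11, 'In stock', 2);

-- Parameters are prefixed with _ so they cannot collide with column names.
CREATE FUNCTION pg_temp.next_dvd(_member_id int, OUT _dvdid int, OUT _title text)
  LANGUAGE plpgsql AS
$func$
BEGIN
   SELECT r.dvdid, r.movie_title
   INTO   _dvdid, _title               -- assign both OUT parameters at once
   FROM   rentalqueue r
   JOIN   dvd d ON d.dvdid = r.dvdid
   WHERE  r.memberid = _member_id
   AND    d.dvdquantityonhand > 0      -- skip titles that are out of stock
   ORDER  BY r.customer_priority
   LIMIT  1;
END
$func$;

-- Functions in the temporary schema must be called schema-qualified.
SELECT * FROM pg_temp.next_dvd(3);
```

No RETURN statement is needed: the function returns whatever the OUT parameters hold when the body ends.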

how to get the count of rows returned of last executed selected query in postgresql

I have just started with PostgreSQL and created a function that returns a table based on a SELECT query with a WHERE clause and LIMIT.
I also want the function to return the count of all rows that satisfy the condition, ignoring the LIMIT clause.
Here is the function
CREATE OR REPLACE FUNCTION public.get_notifications(
search_text character,
page_no integer,
count integer)
RETURNS TABLE(id integer, head character, description text, img_url text, link text, created_on timestamp without time zone)
LANGUAGE 'plpgsql'
COST 100
VOLATILE
ROWS 1000
AS $BODY$
DECLARE
query text;
skip_records int = page_no * count;
BEGIN
query = concat('SELECT id,head,description,img_url,link,created_on from notifications where head ilike ''%',
search_text,'%''','offset ',skip_records,'limit ',count);
RETURN QUERY Execute query;
END;
$BODY$;
Here is the call
select * from get_notifications('s',0,5)
You can use a window function to return the count of the entire rowset:
select col1
, col2
, count(*) over () as total_rows -- < Window function
from YourTable
order by
col1
limit 10
The over clause specifies the window for count to operate on. The over () is shorthand for over (rows between unbounded preceding and unbounded following).
Working example on SQL Fiddle.
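Applied to a table shaped like the one in the question, the idea can be sketched as follows. The two-column notifications table and its 12 generated rows are assumptions for illustration only.

```sql
-- Hypothetical stand-in for the notifications table.
CREATE TEMP TABLE notifications (id int, head text);
INSERT INTO notifications
SELECT g, 'subject ' || g FROM generate_series(1, 12) g;

SELECT id, head
     , count(*) OVER () AS total_rows   -- computed before LIMIT / OFFSET are applied
FROM   notifications
WHERE  head ILIKE '%s%'
ORDER  BY id
LIMIT  5 OFFSET 0;
```

Every returned row carries the same total_rows value, which counts all rows matching the WHERE clause rather than only the rows on the current page.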

Passing a ResultSet into a Postgresql Function

Is it possible to pass the results of a postgres query as an input into another function?
As a very contrived example, say I have one query like
SELECT id, name
FROM users
LIMIT 50
and I want to create a function my_function that takes the resultset of the first query and returns the minimum id. Is this possible in pl/pgsql?
SELECT my_function(SELECT id, name FROM Users LIMIT 50); --returns 50
You could use a cursor, but that is very impractical for computing a minimum.
I would use a temporary table for that purpose, and pass the table name for use in dynamic SQL:
CREATE OR REPLACE FUNCTION f_min_id(_tbl regclass, OUT min_id int) AS
$func$
BEGIN
EXECUTE 'SELECT min(id) FROM ' || _tbl
INTO min_id;
END
$func$ LANGUAGE plpgsql;
Call:
CREATE TEMP TABLE foo ON COMMIT DROP AS
SELECT id, name
FROM users
LIMIT 50;
SELECT f_min_id('foo');
Major points
The first parameter is of type regclass to prevent SQL injection. More info in this related answer on dba.SE.
I made the temp table ON COMMIT DROP to limit its lifetime to the current transaction. May or may not be what you want.
You can extend this example to take more parameters. Search for code examples for dynamic SQL with EXECUTE.
-> SQLfiddle demo
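As one possible way to take more parameters, the sketch below adds a value parameter and passes it with USING, so only the table name is interpolated into the query string. The function name f_min_id_above, the parameter _floor, and the sample table are hypothetical.

```sql
-- Hypothetical sample data: ids 1 to 50.
CREATE TEMP TABLE foo AS
SELECT g AS id FROM generate_series(1, 50) g;

CREATE FUNCTION pg_temp.f_min_id_above(_tbl regclass, _floor int, OUT min_id int)
  LANGUAGE plpgsql AS
$func$
BEGIN
   -- %s is safe here because a regclass value is rendered as a valid, quoted identifier.
   EXECUTE format('SELECT min(id) FROM %s WHERE id > $1', _tbl)
   INTO  min_id
   USING _floor;                        -- values are passed separately, never concatenated
END
$func$;

SELECT pg_temp.f_min_id_above('foo', 10);
```

A string that does not name an existing table fails at the cast to regclass, before any dynamic SQL is built.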
I would approach the problem from the other side, calling an aggregate function for each record of the result set. It's not as flexible, but it can give you a hint to work on.
As an example following your sample problem:
CREATE OR REPLACE FUNCTION myMin ( int,int ) RETURNS int AS $$
SELECT CASE WHEN $1 < $2 THEN $1 ELSE $2 END;
$$ LANGUAGE SQL STRICT IMMUTABLE;
CREATE AGGREGATE my_function ( int ) (
SFUNC = myMin, STYPE = int, INITCOND = 2147483647 --maxint
);
SELECT my_function(id) from (SELECT * FROM Users LIMIT 50) x;
It is not possible to pass an array of generic type RECORD to a plpgsql function which is essentially what you are trying to do.
What you can do is pass in an array of a specific user defined TYPE or of a particular table row type. In the example below you could also swap out the argument data type for the table name users[] (though this would obviously mean getting all data in the users table row).
CREATE TYPE trivial AS (
"ID" integer,
"NAME" text
);
CREATE OR REPLACE FUNCTION trivial_func(data trivial[])
RETURNS integer AS
$BODY$
DECLARE
BEGIN
--Implementation here using data
return 1;
END$BODY$
LANGUAGE 'plpgsql' VOLATILE;
I think there's no way to pass a recordset or table into a function (but I'd be glad if I'm wrong). The best I can suggest is to pass an array:
create or replace function my_function(data int[])
returns int
as
$$
select min(x) from unnest(data) as x
$$
language SQL;
sql fiddle demo
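To feed a query result into such a function, the subquery can be turned into an array with the ARRAY constructor. A minimal sketch, assuming a three-row users table and a temporary function named min_of (both made up for illustration):

```sql
-- Hypothetical sample data.
CREATE TEMP TABLE users (id int, name text);
INSERT INTO users VALUES (3, 'c'), (1, 'a'), (2, 'b');

CREATE FUNCTION pg_temp.min_of(data int[]) RETURNS int
  LANGUAGE sql AS
$$ SELECT min(x) FROM unnest(data) AS x $$;

-- ARRAY(subquery) collects a single-column result set into an array.
SELECT pg_temp.min_of(ARRAY(SELECT id FROM users LIMIT 50));
```

This only works for a single column; passing several columns would require an array of a composite type, as described in the previous answer.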

How to return result of a SELECT inside a function in PostgreSQL?

I have this function in PostgreSQL, but I don't know how to return the result of the query:
CREATE OR REPLACE FUNCTION wordFrequency(maxTokens INTEGER)
RETURNS SETOF RECORD AS
$$
BEGIN
SELECT text, count(*), 100 / maxTokens * count(*)
FROM (
SELECT text
FROM token
WHERE chartype = 'ALPHABETIC'
LIMIT maxTokens
) as tokens
GROUP BY text
ORDER BY count DESC
END
$$
LANGUAGE plpgsql;
But I don't know how to return the result of the query inside the PostgreSQL function.
I found that the return type should be SETOF RECORD, right? But the return command is not right.
What is the right way to do this?
Use RETURN QUERY:
CREATE OR REPLACE FUNCTION word_frequency(_max_tokens int)
RETURNS TABLE (txt text -- also visible as OUT param in function body
, cnt bigint
, ratio bigint)
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY
SELECT t.txt
, count(*) AS cnt -- column alias only visible in this query
, (count(*) * 100) / _max_tokens -- I added parentheses
FROM (
SELECT t.txt
FROM token t
WHERE t.chartype = 'ALPHABETIC'
LIMIT _max_tokens
) t
GROUP BY t.txt
ORDER BY cnt DESC; -- potential ambiguity
END
$func$;
Call:
SELECT * FROM word_frequency(123);
Defining the return type explicitly is much more practical than returning a generic record. This way you don't have to provide a column definition list with every function call. RETURNS TABLE is one way to do that. There are others. Data types of OUT parameters have to match exactly what is returned by the query.
Choose names for OUT parameters carefully. They are visible in the function body almost anywhere. Table-qualify columns of the same name to avoid conflicts or unexpected results. I did that for all columns in my example.
But note the potential naming conflict between the OUT parameter cnt and the column alias of the same name. In this particular case (RETURN QUERY SELECT ...) Postgres uses the column alias over the OUT parameter either way. This can be ambiguous in other contexts, though. There are various ways to avoid any confusion:
Use the ordinal position of the item in the SELECT list: ORDER BY 2 DESC. Example:
Select first row in each GROUP BY group?
Repeat the expression ORDER BY count(*).
(Not required here.) Set the configuration parameter plpgsql.variable_conflict or use the special command #variable_conflict error | use_variable | use_column in the function. See:
Naming conflict between function parameter and result of JOIN with USING clause
Don't use "text" or "count" as column names. Both are legal to use in Postgres, but "count" is a reserved word in standard SQL and a basic function name and "text" is a basic data type. Can lead to confusing errors. I use txt and cnt in my examples, you may want more explicit names.
Added a missing ; and corrected a syntax error in the header. (_max_tokens int), not (int maxTokens) - data type after name.
While working with integer division, it's better to multiply first and divide later, to minimize the rounding error. Or work with numeric or a floating point type. See below.
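The effect of the evaluation order can be seen with plain integers. The numbers below (7 occurrences among 123 tokens) are made up for illustration:

```sql
-- Hypothetical values: 7 occurrences of a word among 123 tokens.
SELECT 100 / 123 * 7   AS divide_first     -- 100 / 123 is truncated to 0 before multiplying
     , (7 * 100) / 123 AS multiply_first;  -- 700 / 123 is truncated only once, at the end
```

Dividing first loses the whole result, while multiplying first is off by less than one.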
Alternative
This is what I think your query should actually look like (calculating a relative share per token):
CREATE OR REPLACE FUNCTION word_frequency(_max_tokens int)
RETURNS TABLE (txt text
, abs_cnt bigint
, relative_share numeric)
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY
SELECT t.txt, t.cnt
, round((t.cnt * 100) / (sum(t.cnt) OVER ()), 2) -- AS relative_share
FROM (
SELECT t.txt, count(*) AS cnt
FROM token t
WHERE t.chartype = 'ALPHABETIC'
GROUP BY t.txt
ORDER BY cnt DESC
LIMIT _max_tokens
) t
ORDER BY t.cnt DESC;
END
$func$;
The expression sum(t.cnt) OVER () is a window function. You could use a CTE instead of the subquery. Pretty, but a subquery is typically cheaper in simple cases like this one (mostly before Postgres 12).
A final explicit RETURN statement is not required (but allowed) when working with OUT parameters or RETURNS TABLE (which makes implicit use of OUT parameters).
round() with two parameters only works for numeric types. count() in the subquery produces a bigint result and a sum() over this bigint produces a numeric result, thus we deal with a numeric number automatically and everything just falls into place.
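The type chain described above can be checked with pg_typeof(). This is a small sketch over generated rows rather than the token table from the question:

```sql
SELECT pg_typeof(s.cnt)                 AS count_type   -- bigint
     , pg_typeof(sum(s.cnt) OVER ())    AS sum_type     -- numeric
     , round((s.cnt * 100) / sum(s.cnt) OVER (), 2) AS relative_share
FROM  (SELECT count(*) AS cnt FROM generate_series(1, 3)) s;
```

Because the divisor is numeric, the whole division is carried out in numeric, and the two-argument form of round() accepts the result without an explicit cast.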
Please see the following link for documentation:
https://www.postgresql.org/docs/current/xfunc-sql.html
Example:
CREATE FUNCTION sum_n_product_with_tab (x int)
RETURNS TABLE(sum int, product int) AS $$
SELECT $1 + tab.y, $1 * tab.y FROM tab;
$$ LANGUAGE SQL;

PostgreSQL ORDER BY values in IN() clause

Ok, there are some answers out there on how to do this. But all of the answers are assuming that the query is selecting all. If you have a distinct select, the methods no longer work.
See here for that method: Simulating MySQL's ORDER BY FIELD() in Postgresql
Basically I have
SELECT DISTINCT id
FROM items
WHERE id IN (5,2,9)
ORDER BY
CASE id
WHEN 5 THEN 1
WHEN 2 THEN 2
WHEN 9 THEN 3
END
Of course, this breaks and says
"PGError: ERROR: for SELECT DISTINCT, ORDER BY expressions must
appear in select list"
Is there any way to order your query results in PostgreSQL by the order of the values in the IN clause?
You can wrap it into a derived table:
SELECT *
FROM (
SELECT DISTINCT id
FROM items
WHERE id IN (5,2,9)
) t
ORDER BY
CASE id
WHEN 5 THEN 1
WHEN 2 THEN 2
WHEN 9 THEN 3
END
From documentation:
Tip: Grouping without aggregate expressions effectively calculates the
set of distinct values in a column. This can also be achieved using
the DISTINCT clause (see Section 7.3.3).
SQL query:
SELECT id
FROM items
WHERE id IN (5,2,9)
GROUP BY id
ORDER BY
CASE id
WHEN 5 THEN 1
WHEN 2 THEN 2
WHEN 9 THEN 3
END;
I created this function in PL/pgSQL, and it is a lot easier to use.
-- Function: manualidsort(int, int[])
-- DROP FUNCTION manualidsort(int, int[]);
CREATE OR REPLACE FUNCTION manualidsort(input_id int, sort_array int[])
RETURNS int AS
$BODY$
DECLARE
each_item int;
index int;
BEGIN
index := 1;
FOREACH each_item IN ARRAY sort_array
LOOP
IF each_item = input_id THEN
RETURN index;
END IF;
index := index+1;
END LOOP;
RETURN NULL; -- input_id is not in sort_array
END;
$BODY$
LANGUAGE plpgsql;
ALTER FUNCTION manualidsort(int, int[])
OWNER TO staff;
And when you run a query, go like this...
SELECT * FROM my_table ORDER BY manualidsort(my_table_id, ARRAY[25, 66, 54])
Looking around I couldn't find a complete solution to this question.
I think the better solution is to use this query
SELECT *
FROM items
WHERE id IN (5,2,9)
ORDER BY idx(array[5,2,9]::integer[], items.id)
If you are using PostgreSQL >= 9.2, you can make the idx function available by enabling the intarray extension:
CREATE EXTENSION intarray;
Otherwise you can create it with the following:
CREATE OR REPLACE FUNCTION idx(anyarray, anyelement)
RETURNS INT AS
$$
SELECT i FROM (
SELECT generate_series(array_lower($1,1),array_upper($1,1))
) g(i)
WHERE $1[i] = $2
LIMIT 1;
$$ LANGUAGE SQL IMMUTABLE;
I strongly suggest using ::integer[] in the query, because if you build the array dynamically it may have no elements, resulting in idx(array[], items.id), which raises an exception in PostgreSQL.
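On PostgreSQL 9.5 or later, the built-in array_position() function can play the role of idx() without an extension or a custom function. This is a different technique from the one in the answer above, sketched here against a made-up items table, with GROUP BY standing in for DISTINCT as in the earlier answer:

```sql
-- Hypothetical sample data with a duplicate and a non-matching id.
CREATE TEMP TABLE items (id int);
INSERT INTO items VALUES (2), (9), (5), (5), (7);

SELECT id
FROM   items
WHERE  id = ANY (ARRAY[5,2,9])
GROUP  BY id                                   -- removes duplicates, like DISTINCT
ORDER  BY array_position(ARRAY[5,2,9], id);    -- position of id in the list
```

The list appears twice in this sketch; in application code it would typically be passed once as a single array parameter.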