How to make GROUP BY as a parameter without using CASE WHEN? - sql

I have the following table, sample values, and result type.
create table example (
fname text,
lname text,
salary int);
insert into example values
('doge','coin',123),
('bit','coin',434),
('lite','coin',565),
('doge','meme',183),
('bit','meme',453),
('lite','meme',433);
create type resultrow as (
nam text,
amount int);
I would like to write a function that groups by a parameter I pass to it.
This example works:
do $$
declare
my_parameter text;
results resultrow[];
begin
my_parameter = 'last';
results := array(select row( case when my_parameter = 'first' then fname
when my_parameter = 'last' then lname
end,
sum(salary))::resultrow
from example
group by case when my_parameter = 'first' then fname
when my_parameter = 'last' then lname
end);
raise notice '%', results;
end;
$$ language plpgsql;
I have been told that CASE WHEN decisions are really expensive. One obvious solution would be to write the select statement twice:
if my_parameter = 'first' then
results := array(select row(fname,sum(salary))::resultrow
from example
group by fname);
end if;
if my_parameter = 'last' then
results := array(select row(lname,sum(salary))::resultrow
from example
group by lname);
end if;
But this leads to a lot of ugly duplicated code.
Is there another solution to make the group by parameterisable?

If you don't want to use case, you can use this:
with cte(name, salary) as (
select fname, salary from example where my_parameter = 'first'
union all
select lname, salary from example where my_parameter = 'last'
)
select name, sum(salary)
from cte
group by name
But it's better to test this yourself; I have not heard that CASE is expensive.
If you find that CASE is not expensive, I still suggest using a subquery or CTE to avoid code duplication, like:
with cte(name, salary) as (
select
case
when my_parameter = 'first' then fname
when my_parameter = 'last' then lname
end as name,
salary
from example
)
select name, sum(salary)
from cte
group by name

Simplify what you have:
DO
$do$
DECLARE
_param text := 'last'; -- one can assign at declaration time
results resultrow[];
BEGIN
results := ARRAY(
SELECT t::resultrow -- refer to table alias to get whole row
FROM (
SELECT CASE _param -- simple "switched" CASE
WHEN 'first' THEN fname
WHEN 'last' THEN lname
END
,sum(salary)
FROM example
GROUP BY 1 -- simpler with positional reference
) t
);
RAISE NOTICE '%', results;
END
$do$ LANGUAGE plpgsql;
This uses the simple ("switched") CASE syntax variant. That way the expression is only evaluated once and the syntax is simpler. I mention it because your question refers to CASE, even if that's hardly the relevant part.
It also uses a positional reference in the GROUP BY clause, which seems relevant to the title of your question. More explanation in these related answers:
Select first row in each GROUP BY group?
GROUP BY + CASE statement
This kind of query can be very inefficient. The problem is not the (very cheap!) CASE expression per se. It's that the planner has to provide for varying input in the first column and may be forced to use a generic, less optimized plan.
Dynamic SQL
I assume the actual goal is to write a function that takes my_parameter. Use dynamic SQL with EXECUTE, which will likely result in a superior query plan, i.e. superior performance. There are lots of code examples here, try a search.
Also, I return a set of resultrow instead of the awkward ARRAY you had in your example (which you only needed because you cannot return from a DO statement):
CREATE FUNCTION f_salaray_for_param(_param text)
RETURNS SETOF resultrow AS
$func$
DECLARE
_fld text :=
CASE _param
WHEN 'first' THEN 'fname' -- SQL injection not possible
WHEN 'last' THEN 'lname'
END;
BEGIN
IF _fld IS NULL THEN -- exception for invalid params
RAISE EXCEPTION 'Unexpected value for _param: %', _param;
END IF;
RETURN QUERY EXECUTE '
SELECT ' || _fld || ', sum(salary)::int
FROM example
GROUP BY 1'; -- query is very simple now
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM f_salaray_for_param('first');
BTW, the documented plpgsql assignment operator is := (plain = is accepted as well, but := is the clearer choice).
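The same whitelist idea carries over to client code that assembles the statement itself. Below is a minimal sketch in Python, assuming SQLite (standard sqlite3 module) as a stand-in for PostgreSQL. The helper name sum_salary_by and the in-memory table are hypothetical and only serve the illustration. The caller's parameter is used purely as a dictionary key, so no user-supplied text ever reaches the SQL string.

```python
import sqlite3

# Whitelist: maps the caller-facing parameter to a real column name.
GROUP_COLUMNS = {"first": "fname", "last": "lname"}


def sum_salary_by(conn, param):
    """Return (name, total salary) rows grouped by the whitelisted column."""
    column = GROUP_COLUMNS.get(param)
    if column is None:
        # Mirrors the RAISE EXCEPTION for invalid parameters.
        raise ValueError(f"Unexpected value for param: {param!r}")
    # Only the whitelisted identifier is interpolated, never raw input.
    query = (
        f"SELECT {column}, sum(salary) FROM example "
        f"GROUP BY {column} ORDER BY {column}"
    )
    return conn.execute(query).fetchall()


# Hypothetical in-memory copy of the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (fname TEXT, lname TEXT, salary INT)")
conn.executemany(
    "INSERT INTO example VALUES (?, ?, ?)",
    [
        ("doge", "coin", 123), ("bit", "coin", 434), ("lite", "coin", 565),
        ("doge", "meme", 183), ("bit", "meme", 453), ("lite", "meme", 433),
    ],
)

by_last = sum_salary_by(conn, "last")
by_first = sum_salary_by(conn, "first")
```

With the sample rows, by_last holds one total per last name and by_first one total per first name; an unknown parameter raises an error instead of building a query.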

Related

Impossible to get NULL from empty value in pl/sql

My table "Group" has no row with name = John, so I expect to get NULL in "new_name".
SELECT name INTO new_name FROM Group WHERE name="John";
in pl/sql:
if(new_name IS NULL) then -- here is the problem: I never enter the "if"; instead of name = NULL I have name = " ", and when I try if(new_name = " ") there's an error
some code .....
end if;
So how do I check in the "if" statement whether it is NULL or not (without using EXISTS)?
A simple method is to use aggregation:
SELECT MAX(name) INTO new_name
FROM Group
WHERE name = 'John';
An aggregation query with no GROUP BY is guaranteed to return one row. If all rows are filtered out, then the results of the aggregations are NULL.
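The "always one row" behaviour is easy to check. Here is a minimal sketch in Python, assuming SQLite (standard sqlite3 module) as a stand-in for Oracle; the table name people and its rows are hypothetical, chosen to avoid the reserved word Group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany("INSERT INTO people VALUES (?)", [("Alice",), ("Bob",)])

# Plain SELECT: no row matches, so the result set is empty.
plain_rows = conn.execute(
    "SELECT name FROM people WHERE name = 'John'"
).fetchall()

# Aggregate without GROUP BY: exactly one row, and MAX over zero rows is NULL.
agg_rows = conn.execute(
    "SELECT MAX(name) FROM people WHERE name = 'John'"
).fetchall()

print(plain_rows)
print(agg_rows)
```

The plain query yields no rows at all (the situation that raises no_data_found with SELECT ... INTO in PL/SQL), while the aggregate yields a single row whose only column is NULL.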
It looks like you don't have NULL but a space; if that's what you are saying, then you should enclose it into single quotes:
if new_name = ' ' then ...
However, back to the beginning: if there's no row whose name = 'John', then select won't return null but will raise the no_data_found exception, which you should handle somehow.
One option is to actually "handle" it:
begin
begin
select name into new_name from group where name = 'John';
exception
when no_data_found then new_name := null;
end;
if new_name is null then
some code ...
end if;
end;
Or, if you use an aggregate, you'll avoid the exception handling section (which means "less typing", but it kind of hides what's going on):
begin
select max(name) into new_name from group where name = 'John';
if new_name is null then
some code ...
end if;
end;
I suggest you handle it properly.
Your IF condition should be fine, but as Littlefoot mentioned, your logic will never satisfy it, since NAME will never be NULL at that point.
You can also use NVL() if you want to assign a value other than NULL to your new_name variable, and then test for that value in your IF condition.

Return default rows from a function when first SELECT does not return rows

I have this function http://rextester.com/VIHMIG61446
CREATE OR REPLACE FUNCTION myTestProcedure(namevalue character varying)
RETURNS TABLE(id integer, name character varying, isdefault boolean)
LANGUAGE plpgsql
AS $function$
BEGIN
IF EXISTS(SELECT
Domain.id,
Domain.name,
Domain.isdefault
FROM Domain
where lower(Domain.name) like namevalue)
THEN
RETURN QUERY SELECT
Domain.id,
Domain.name,
Domain.isdefault
FROM Domain
where lower(Domain.name) like namevalue;
ELSE
RETURN QUERY SELECT
Domain.id,
Domain.name,
Domain.isdefault
FROM Domain
where Domain.isdefault = true;
END IF;
END
$function$;
and I'm looking for a way to avoid repeating the whole query in the IF, so I decided to use WITH ... AS to store the result, but it does not work for me http://rextester.com/MVMVA73088
How should I use WITH ... AS?
CREATE OR REPLACE FUNCTION myTestProcedure(namevalue character varying)
RETURNS TABLE(id integer, name character varying, isdefault boolean)
LANGUAGE plpgsql
AS $function$
BEGIN
with temporal_result as (
SELECT
Domain.id,
Domain.name,
Domain.isdefault
FROM Domain
where lower(Domain.name) like namevalue
)
IF EXISTS(temporal_result)
THEN
RETURN QUERY SELECT * from temporal_result;
ELSE
RETURN QUERY SELECT
Domain.id,
Domain.name,
Domain.isdefault
FROM Domain
where Domain.isdefault = true;
END IF;
END
$function$;
The if-else logic can be avoided completely. The equivalent result can be written as a single query. The function BOOL_AND is an aggregate function that returns false if any of the values are false, otherwise it returns true.
The following query will work correctly even if multiple rows are matched with the lower(name) like '<namevalue>' condition, or if you have multiple default values.
SELECT subquery.id, subquery.name, subquery.isdefault
FROM (SELECT d.id,
d.name,
d.isdefault,
BOOL_AND(d.isdefault) OVER () default_and
FROM domain d
WHERE lower(d.name) like 'robert' or isdefault) subquery
WHERE isdefault = default_and
As to why you get an error with IF EXISTS(temporal_result), that is not valid sql. It's illegal to do such branching from within an sql statement. What you can do instead is to save the result of the first query into a temporary table, and do the if-else branching referring to the temporary table. A correct version of your stored procedure is below:
CREATE OR REPLACE FUNCTION mytestprocedure(namevalue character varying)
RETURNS TABLE(id integer, name character varying, isdefault boolean)
LANGUAGE plpgsql
AS $function$
BEGIN
CREATE TEMPORARY TABLE temporal_result as
SELECT
d.id,
d.name,
d.isdefault
FROM domain d
where lower(d.name) like namevalue
;
IF EXISTS(SELECT TRUE FROM temporal_result) THEN
RETURN QUERY SELECT * from temporal_result;
ELSE
RETURN QUERY SELECT
d.id,
d.name,
d.isdefault
FROM domain d
where d.isdefault = true;
END IF;
DROP TABLE temporal_result;
RETURN;
END;
$function$;
Note that it is necessary to drop the table at the end of the procedure.
Also note that postgresql ignores upper / lower cases in entity names unless quoted, so it is generally considered poor style to use camel case when naming tables / fields / functions.
I suggest checking the special PL/pgSQL variable FOUND instead:
CREATE OR REPLACE FUNCTION my_test_func(namevalue varchar)
RETURNS TABLE(id integer, name varchar, isdefault boolean)
LANGUAGE plpgsql STABLE AS
$func$
BEGIN
RETURN QUERY
SELECT d.id, d.name, d.isdefault
FROM domain d
WHERE lower(d.name) LIKE namevalue;
IF NOT FOUND THEN
RETURN QUERY
SELECT d.id, d.name, d.isdefault
FROM domain d
WHERE d.isdefault;
END IF;
END
$func$;
Clean and fast. The first query is simple and as fast as possible. The second query is never executed when the first returns any rows. Related (chapter "Other cases"):
How to return a value from a function if no value is found
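The control flow of the FOUND-based function can be mirrored in client code. This is a minimal sketch in Python, assuming SQLite (standard sqlite3 module) as a stand-in for PostgreSQL; the helper find_domains and the sample rows are hypothetical. The second query runs only when the first one returns nothing.

```python
import sqlite3


def find_domains(conn, pattern):
    """Return rows matching the pattern, or the default rows if none match."""
    rows = conn.execute(
        "SELECT id, name, isdefault FROM domain WHERE lower(name) LIKE ?",
        (pattern,),
    ).fetchall()
    if not rows:  # same role as IF NOT FOUND in the PL/pgSQL version
        rows = conn.execute(
            "SELECT id, name, isdefault FROM domain WHERE isdefault ORDER BY id"
        ).fetchall()
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE domain (id INTEGER, name TEXT, isdefault INTEGER)")
conn.executemany(
    "INSERT INTO domain VALUES (?, ?, ?)",
    [(1, "Robert", 0), (2, "Fallback", 1)],
)

matched = find_domains(conn, "robert")
fallback = find_domains(conn, "nobody")
```

Here matched contains only the row for Robert, and fallback contains only the default row.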
I would probably use d.name ILIKE namevalue and support it with a trigram index. See:
LOWER LIKE vs iLIKE
A partial index for default rows in the 2nd query might pay, too. If you can get index-only scans out of it, include the (small!) columns you need to retrieve as index columns:
CREATE INDEX domain_defaults_idx ON domain (id, name, isdefault) WHERE isdefault;
Yes, we have to include isdefault, even though logically redundant. Postgres is not currently (pg 14) smart enough to derive the value from the WHERE condition.
If you can't get index-only scans, an index with a constant expression is cheaper:
CREATE INDEX domain_defaults_idx ON domain ((TRUE)) WHERE isdefault;
Related:
How to index a query with WHERE field IS NULL?

Quickest way to find out if record exist

I have three functions which are all doing the same. I like to know whether SELECT ... FROM EMP WHERE DEPT_ID = v_dept returns any row.
Which one would be the fastest way?
CREATE OR REPLACE FUNCTION RecordsFound1(v_dept IN EMP.DEPT_ID%TYPE) RETURN BOOLEAN IS
n INTEGER;
BEGIN
SELECT COUNT(*) INTO n FROM EMP WHERE DEPT_ID = v_dept;
RETURN n > 0;
END;
/
CREATE OR REPLACE FUNCTION RecordsFound2(v_dept IN EMP.DEPT_ID%TYPE) RETURN BOOLEAN IS
CURSOR curEmp IS
SELECT DEPT_ID FROM EMP WHERE DEPT_ID = v_dept;
dept EMP.DEPT_ID%TYPE;
res BOOLEAN;
BEGIN
OPEN curEmp;
FETCH curEmp INTO dept;
res := curEmp%FOUND;
CLOSE curEmp;
RETURN res;
END;
/
CREATE OR REPLACE FUNCTION RecordsFound3(v_dept IN EMP.DEPT_ID%TYPE) RETURN BOOLEAN IS
dept EMP.DEPT_ID%TYPE;
BEGIN
SELECT DEPT_ID INTO dept FROM EMP WHERE DEPT_ID = v_dept;
RETURN TRUE;
EXCEPTION
WHEN NO_DATA_FOUND THEN
RETURN FALSE;
WHEN TOO_MANY_ROWS THEN
RETURN TRUE;
END;
/
Assume table EMP is very big and condition WHERE DEPT_ID = v_dept could match on thousands of rows.
Usually I would expect RecordsFound2 to be the fastest, because it has to fetch (maximum) only one single row. So in terms of I/O it should be the best.
For the non-believers: the exists() version:
CREATE OR REPLACE FUNCTION RecordsFound0(v_dept IN EMP.DEPT_ID%TYPE) RETURN BOOLEAN IS
BEGIN
RETURN EXISTS( SELECT 1 FROM EMP WHERE DEPT_ID = v_dept);
END;
The Postgresql version:
CREATE OR REPLACE FUNCTION RecordsFound0(v_dept IN EMP.DEPT_ID%TYPE) RETURNS BOOLEAN AS
$func$
BEGIN
RETURN EXISTS( SELECT 1 FROM EMP WHERE DEPT_ID = v_dept);
END
$func$ LANGUAGE plpgsql;
And in Postgres the function can be implemented in pure SQL, without the need for plpgsql (in Postgres the select does not need a ... FROM DUAL):
CREATE OR REPLACE FUNCTION RecordsFound0s(v_dept IN EMP.DEPT_ID%TYPE) RETURNS BOOLEAN AS
$func$
SELECT EXISTS( SELECT NULL FROM EMP WHERE DEPT_ID = v_dept);
$func$ LANGUAGE sql;
Note: the unary EXISTS(...) operator yields a Boolean, which is exactly what you want.
Note2: I hope I have the Oracle syntax correct. (keywords RETURN <-->RETURNS and AS <-->IS)
Your solution 1 Count all occurrences:
You have the DBMS do much more work than needed. Why let it scan the table and count all occurrences when you only want to know whether at least one exists? This is slow. (But on a small emp table with an index on dept_id this may still look fast :-)
Your solution 2 Open a Cursor and only fetch the first record
A good idea and probably rather fast, as you stop, once you found a record. However, the DBMS doesn't know that you only want to look for the mere existence and may decide for a slow execution plan, as it expects you to fetch all matches.
Your solution 3 Fetch the one record or get an exception
This may be a tad faster, as the DBMS expects to find one record only. However, it must test for further matches in order to raise TOO_MANY_ROWS if there are any. So in spite of having found a record already, it must keep looking.
Solution 4 Use COUNT and ROWNUM
By adding AND ROWNUM = 1 you show the DBMS that you want one record only. At a minimum the DBMS knows it can stop at some point; at best it even notices that only one record is needed. So depending on the implementation the DBMS may find the optimal execution plan.
Solution 5 Use EXISTS
EXISTS is made to check for mere existence, so the DBMS can find the optimal execution plan. EXISTS is an SQL keyword, not a PL/SQL keyword, and the SQL engine doesn't know BOOLEAN, so the function gets a bit clumsy:
CREATE OR REPLACE FUNCTION RecordsFound1(v_dept IN EMP.DEPT_ID%TYPE) RETURN BOOLEAN IS
v_1_is_yes_0_is_no INTEGER;
BEGIN
SELECT COUNT(*) INTO v_1_is_yes_0_is_no
FROM DUAL
WHERE EXISTS (SELECT * FROM EMP WHERE DEPT_ID = v_dept);
RETURN v_1_is_yes_0_is_no = 1;
END;
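Where the SQL layer can hand back the result of EXISTS directly, the wrapper gets shorter. Below is a minimal sketch in Python, assuming SQLite (standard sqlite3 module) as a stand-in database; the helper records_found and the sample rows are hypothetical. It only illustrates the shape of the query and says nothing about Oracle's execution plans.

```python
import sqlite3


def records_found(conn, dept_id):
    """Return True if at least one emp row exists for the department."""
    row = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM emp WHERE dept_id = ?)",
        (dept_id,),
    ).fetchone()
    # EXISTS always yields exactly one row holding 0 or 1.
    return bool(row[0])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_id INTEGER, dept_id INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [(1, 10), (2, 10), (3, 20)])

has_dept_10 = records_found(conn, 10)
has_dept_99 = records_found(conn, 99)
```

Because the query always returns one row, no exception handling for "no data found" is needed on the calling side.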
The absolute fastest way is to not call the count function at all.
A typical pattern is
count the number of rows
if cnt = 0 then do something
else read chunk of data and process
Instead, simply read the data and then perform the count test on what you read.
You can make a query stop early by adding an extra condition to your where-clause:
where ...
and rownum = 1;
This stops immediately once one record is found, and it is as fast as the "exists" operator.
See the following sample code:
create or replace function test_record_exists(pi_some_parameter in varchar2) return boolean is
l_dummy varchar2(10);
begin
select 'x'
into l_dummy
from <your table>
where <column where you want to filter for> = pi_some_parameter
and rownum = 1;
return (true);
exception
when no_data_found then
return (false);
end;
If you use:
select count(*)
from my_table
where ...
and rownum = 1;
... then the query will:
be executed in the most efficient fashion
always return a single row
return either 0 or 1
These three factors make it very fast and very easy to use in PL/SQL, as you do not have to concern yourself with whether a row is returned or not.
The returned value is also amenable to use as a true/false boolean, of course.
If you wanted to list the departments that either do or do not have any records in the emp table then I would certainly use EXISTS, as the semi-(anti)join is the most efficient means of executing the query:
select *
from dept
where [NOT] exists (
select null
from emp
where emp.dept_id = dept.id);

Cursor inside SQL query

In Oracle, it's possible to return a cursor inside a SQL query, using the cursor keyword, like this:
select owner, table_name,
cursor (select column_name
from all_tab_columns
where owner = allt.owner
and table_name = allt.table_name) as columns
from all_tables allt
The questions are:
Does anyone know where can I find documentation for this?
Does PostgreSQL (or any other open source DBMS) have a similar feature?
It's called a CURSOR EXPRESSION, and it is documented in the obvious place, the Oracle SQL Reference. Find it here.
As for your second question, the closest thing PostgreSQL offers to match this functionality is "scalar sub-queries". However, as @tbrugz points out, these only return one row and one column, so they aren't much like Cursor Expressions. Read about them in the documentation here. MySQL also has Scalar Sub-queries, again limited to one column and one row. Docs here. Likewise SQL Server and DB2 (not open source but for completeness).
That rules out all the obvious contenders. So, it seems unlikely any other DBMS offers the jagged result set we get from Oracle's cursor expression.
Postgres provides cursor expressions but the syntax is a bit less handy than Oracle's.
First you need to create function for array to refcursor conversion:
create or replace function arr2crs(arr anyarray) returns refcursor as $$
declare crs refcursor;
begin
open crs for select * from unnest(arr);
return crs;
end;
$$ language plpgsql volatile;
Now let's create some test data
create table dep as
select 1 depid, 'Sales' depname
union all
select 2 depid, 'IT' depname;
create table emp as
select 1 empid, 1 depid, 'John' empname union all
select 2 empid, 1 depid, 'James' empname union all
select 3 empid, 2 depid, 'Rob';
You can query it like this
select
dep.*,
arr2crs(array(
select row(emp.*)::emp from emp
where emp.depid = dep.depid
)) emps
from dep
And process it on the client side like this (Java)
public static List Rs2List(ResultSet rs) throws SQLException{
List result = new ArrayList();
ResultSetMetaData meta = rs.getMetaData();
while(rs.next()){
Map row = new HashMap();
for (int i = 1; i <= meta.getColumnCount(); i++){
Object o = rs.getObject(i);
row.put(
meta.getColumnName(i),
(o instanceof ResultSet)?Rs2List((ResultSet)o):o);
}
result.add(row);
}
return result;
}
Note that you must explicitly cast the row to a particular type. You can use CREATE TYPE to create the necessary types.

How can I perform an AND on an unknown number of booleans in postgresql?

I have a table with a foreign key and a boolean value (and a bunch of other columns that aren't relevant here), as such:
CREATE TABLE myTable
(
someKey integer,
someBool boolean
);
insert into myTable values (1, 't'),(1, 't'),(2, 'f'),(2, 't');
Each someKey could have 0 or more entries. For any given someKey, I need to know if a) all the entries are true, or b) any of the entries are false (basically an AND).
I've come up with the following function:
CREATE FUNCTION do_and(int4) RETURNS boolean AS
$func$
declare
rec record;
retVal boolean = 't'; -- necessary, or true is returned as null (it's weird)
begin
if not exists (select someKey from myTable where someKey = $1) then
return null; -- and because we had to initialise retVal, if no rows are found true would be returned
end if;
for rec in select someBool from myTable where someKey = $1 loop
retVal := rec.someBool AND retVal;
end loop;
return retVal;
end;
$func$ LANGUAGE 'plpgsql' VOLATILE;
... which gives the correct results:
select do_and(1) => t
select do_and(2) => f
select do_and(3) => null
I'm wondering if there's a nicer way to do this. It doesn't look too bad in this simple scenario, but once you include all the supporting code it gets lengthier than I'd like. I had a look at casting the someBool column to an array and using the ALL construct, but I couldn't get it working... any ideas?
No need to redefine functions PostgreSQL already provides: bool_and() will do the job:
select bool_and(someBool)
from myTable
where someKey = $1
group by someKey;
(Sorry, can't test it now)
Similar to the previous one, but in one query. This will do the trick; however, it is neither clean nor easily understandable code:
SELECT someKey,
CASE WHEN sum(CASE WHEN someBool THEN 1 ELSE 0 END) = count(*)
THEN true
ELSE false END as boolResult
FROM myTable
GROUP BY someKey
This will get all the responses at once; if you only want one key, just add a WHERE clause.
I just installed PostgreSQL for the first time this week, so you'll need to clean up the syntax, but the general idea here should work:
return_value = NULL
IF EXISTS
(
SELECT
*
FROM
My_Table
WHERE
some_key = $1
)
BEGIN
IF EXISTS
(
SELECT
*
FROM
My_Table
WHERE
some_key = $1 AND
some_bool = 'f'
)
SELECT return_value = 'f'
ELSE
SELECT return_value = 't'
END
The idea is that you only need to look at one row to see if any exist and if at least one row exists you then only need to look until you find a false value to determine that the final value is false (or you get to the end and it's true). Assuming that you have an index on some_key, performance should be good I would think.
(Very minor side-point: I think your function should be declared STABLE rather than VOLATILE, since it just uses data from the database to determine its result.)
As someone mentioned, you can stop scanning as soon as you encounter a "false" value. If that's a common case, you can use a cursor to actually provoke a "fast finish":
CREATE FUNCTION do_and(key int) RETURNS boolean
STABLE LANGUAGE 'plpgsql' AS $$
DECLARE
v_selector CURSOR(cv_key int) FOR
SELECT someBool FROM myTable WHERE someKey = cv_key;
v_result boolean;
v_next boolean;
BEGIN
OPEN v_selector(key);
LOOP
FETCH v_selector INTO v_next;
IF not FOUND THEN
EXIT;
END IF;
IF v_next = false THEN
v_result := false;
EXIT;
END IF;
v_result := true;
END LOOP;
CLOSE v_selector;
RETURN v_result;
END
$$;
This approach also means that you are only doing a single scan on myTable. Mind you, I suspect you need loads and loads of rows in order for the difference to be appreciable.
You can also use every, which is just an alias to bool_and:
select every(someBool)
from myTable
where someKey = $1
group by someKey;
Using every makes your query more readable. For example, to show all persons who eat only apples every day:
select personId
from personDailyDiet
group by personId
having every(fruit = 'apple');
every is semantically the same as bool_and, but it's certainly clear that every is more readable than bool_and:
select personId
from personDailyDiet
group by personId
having bool_and(fruit = 'apple');
Maybe count 'all' items with somekey=somevalue and use it in a boolean comparison with the count of all 'True' occurrences for somekey?
Some non-tested pseudo-sql to show what I mean...
select foo1.count_all_items = foo2.count_key_true_items
from
(select count(someBool) as count_all_items from myTable where someKey = '1') as foo1,
(select count(someBool) as count_key_true_items from myTable where someKey = '1' and someBool) as foo2
CREATE FUNCTION do_and(int4)
RETURNS boolean AS
$BODY$
SELECT
MAX(bar)::bool
FROM (
SELECT
someKey,
MIN(someBool::int) AS bar
FROM
myTable
WHERE
someKey=$1
GROUP BY
someKey
UNION
SELECT
$1,
NULL
) AS foo;
$BODY$
LANGUAGE 'sql' STABLE;
In case you don't need the NULL value (when there aren't any rows), simply use the query below:
SELECT
someKey,
MIN(someBool::int)::bool AS bar
FROM
myTable
WHERE
someKey=$1
GROUP BY
someKey
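The MIN-based idea can be tried outside PostgreSQL as well. This is a minimal sketch in Python, assuming SQLite (standard sqlite3 module), which stores booleans as the integers 0 and 1, so no cast is needed; the helper do_and is a hypothetical client-side counterpart, not the function from the answer. Dropping GROUP BY makes the aggregate return one row even for a missing key, which gives the NULL case for free.

```python
import sqlite3


def do_and(conn, key):
    """AND over someBool for one key; None when the key has no rows."""
    row = conn.execute(
        "SELECT MIN(someBool) FROM myTable WHERE someKey = ?",
        (key,),
    ).fetchone()
    # MIN over 0/1 is 0 as soon as one value is false; NULL for zero rows.
    return None if row[0] is None else bool(row[0])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (someKey INTEGER, someBool INTEGER)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?)",
    [(1, 1), (1, 1), (2, 0), (2, 1)],
)

results = {key: do_and(conn, key) for key in (1, 2, 3)}
```

With the sample data from the question, key 1 gives True, key 2 gives False, and key 3 (no rows) gives None, matching the three results the question asks for.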
SELECT DISTINCT ON (someKey) someKey, someBool
FROM myTable m
ORDER BY
someKey, someBool NULLS FIRST
This will select the first ordered boolean value for each someKey.
If there is a single FALSE or a NULL, it will be returned first, meaning that the AND failed.
If the first boolean is a TRUE, then all other booleans are also TRUE for this key.
Unlike the aggregate, this will use the index on (someKey, someBool).
To return an OR, just reverse the ordering:
SELECT DISTINCT ON (someKey) someKey, someBool
FROM myTable m
ORDER BY
someKey, someBool DESC NULLS FIRST