How can I perform an AND on an unknown number of booleans in PostgreSQL?

I have a table with a foreign key and a boolean value (and a bunch of other columns that aren't relevant here), as such:
CREATE TABLE myTable
(
someKey integer,
someBool boolean
);
insert into myTable values (1, 't'),(1, 't'),(2, 'f'),(2, 't');
Each someKey could have 0 or more entries. For any given someKey, I need to know if a) all the entries are true, or b) any of the entries are false (basically an AND).
I've come up with the following function:
CREATE FUNCTION do_and(int4) RETURNS boolean AS
$func$
declare
    rec record;
    retVal boolean = 't'; -- necessary, or true is returned as null (it's weird)
begin
    if not exists (select someKey from myTable where someKey = $1) then
        return null; -- and because we had to initialise retVal, if no rows are found true would be returned
    end if;
    for rec in select someBool from myTable where someKey = $1 loop
        retVal := rec.someBool AND retVal;
    end loop;
    return retVal;
end;
$func$ LANGUAGE 'plpgsql' VOLATILE;
... which gives the correct results:
select do_and(1) => t
select do_and(2) => f
select do_and(3) => null
I'm wondering if there's a nicer way to do this. It doesn't look too bad in this simple scenario, but once you include all the supporting code it gets lengthier than I'd like. I had a look at casting the someBool column to an array and using the ALL construct, but I couldn't get it working... any ideas?

No need to redefine functions PostgreSQL already provides: bool_and() will do the job:
select bool_and(someBool)
from myTable
where someKey = $1
group by someKey;
(Sorry, can't test it now)
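One detail worth noting: with the GROUP BY, a someKey that has no rows produces an empty result set rather than NULL. If you want the NULL behaviour of the question's do_and(3), drop the GROUP BY; an aggregate without one always returns exactly one row (a small untested sketch):
select bool_and(someBool)
from myTable
where someKey = 3; -- yields NULL: bool_and over zero rows is NULL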

Similar to the previous one, but in one query. This will do the trick, though it is neither clean nor easily understandable code:
SELECT someKey,
       CASE WHEN sum(CASE WHEN someBool THEN 1 ELSE 0 END) = count(*)
            THEN true
            ELSE false END AS boolResult
FROM myTable
GROUP BY someKey
This will get all the responses at once; if you only want one key, just add a WHERE clause.
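For what it's worth, on PostgreSQL 9.4 and later the same sum-versus-count comparison can be written with an aggregate FILTER clause; a sketch against the question's myTable, not part of the original answer:
SELECT someKey,
       count(*) FILTER (WHERE someBool) = count(*) AS boolResult
FROM myTable
GROUP BY someKey;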

I just installed PostgreSQL for the first time this week, so double-check the syntax, but the general idea here should work:
return_value := NULL;

IF EXISTS
(
    SELECT 1
    FROM my_table
    WHERE some_key = $1
)
THEN
    -- rows exist for this key: the result is false as soon as any row is false
    IF EXISTS
    (
        SELECT 1
        FROM my_table
        WHERE some_key = $1
          AND some_bool = 'f'
    )
    THEN
        return_value := 'f';
    ELSE
        return_value := 't';
    END IF;
END IF;
-- if no rows exist at all, return_value stays NULL
The idea is that you only need to look at one row to see if any exist, and if at least one row exists, you then only need to look until you find a false value to determine that the final value is false (or you get to the end and it's true). Assuming you have an index on some_key, performance should be good, I would think.
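Wrapped into a complete function, the same two-EXISTS idea might look like this; a sketch only, assuming the question's myTable, with the inner branch folded into a single NOT EXISTS (the function name do_and_exists is mine):
CREATE FUNCTION do_and_exists(int4) RETURNS boolean AS
$func$
    SELECT CASE
               WHEN NOT EXISTS (SELECT 1 FROM myTable WHERE someKey = $1)
                   THEN NULL -- no rows at all for this key
               ELSE NOT EXISTS (SELECT 1 FROM myTable
                                WHERE someKey = $1 AND NOT someBool)
           END;
$func$ LANGUAGE sql STABLE;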

(Very minor side-point: I think your function should be declared STABLE rather than VOLATILE, since it just uses data from the database to determine its result.)
As someone mentioned, you can stop scanning as soon as you encounter a "false" value. If that's a common case, you can use a cursor to actually provoke a "fast finish":
CREATE FUNCTION do_and(key int) RETURNS boolean
STABLE LANGUAGE plpgsql AS $$
DECLARE
    v_selector CURSOR(cv_key int) FOR
        SELECT someBool FROM myTable WHERE someKey = cv_key;
    v_result boolean;
    v_next boolean;
BEGIN
    OPEN v_selector(key);
    LOOP
        FETCH v_selector INTO v_next;
        IF NOT FOUND THEN
            EXIT;
        END IF;
        IF v_next = false THEN
            v_result := false;
            EXIT;
        END IF;
        v_result := true;
    END LOOP;
    CLOSE v_selector;
    RETURN v_result;
END
$$;
This approach also means that you are only doing a single scan on myTable. Mind you, I suspect you need loads and loads of rows in order for the difference to be appreciable.

You can also use every, which is just an alias for bool_and:
select every(someBool)
from myTable
where someKey = $1
group by someKey;
Using every makes your query more readable. For example, to show all persons who eat only apples every day:
select personId
from personDailyDiet
group by personId
having every(fruit = 'apple');
every is semantically the same as bool_and, but arguably more readable:
select personId
from personDailyDiet
group by personId
having bool_and(fruit = 'apple');
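To make the diet example concrete, here is a tiny hypothetical personDailyDiet (my own test data, not from the original answer):
create table personDailyDiet (personId int, day date, fruit text);
insert into personDailyDiet values
    (1, '2024-01-01', 'apple'),
    (1, '2024-01-02', 'apple'),  -- person 1 eats only apples
    (2, '2024-01-01', 'apple'),
    (2, '2024-01-02', 'banana'); -- person 2 does not
Both HAVING variants above should return only personId = 1.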

Maybe count all items with someKey = someValue and compare that, in a boolean expression, with the count of all true occurrences for that someKey?
Some untested pseudo-SQL to show what I mean...
select foo1.count_all_items = foo2.count_key_true_items
from
    (select count(someBool) as count_all_items from myTable where someKey = '1') as foo1,
    (select count(someBool) as count_key_true_items from myTable where someKey = '1' and someBool) as foo2

CREATE FUNCTION do_and(int4)
RETURNS boolean AS
$BODY$
SELECT
MAX(bar)::bool
FROM (
SELECT
someKey,
MIN(someBool::int) AS bar
FROM
myTable
WHERE
someKey=$1
GROUP BY
someKey
UNION
SELECT
$1,
NULL
) AS foo;
$BODY$
LANGUAGE 'sql' STABLE;
In case you don't need the NULL value (when there aren't any rows), simply use the query below:
SELECT
someKey,
MIN(someBool::int)::bool AS bar
FROM
myTable
WHERE
someKey=$1
GROUP BY
someKey
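As a quick sanity check of the cast trick against the question's sample data (a sketch): min(someBool::int) drops to 0 as soon as any row is false, and ::bool turns that back into a boolean.
SELECT MIN(someBool::int)::bool FROM myTable WHERE someKey = 1; -- t
SELECT MIN(someBool::int)::bool FROM myTable WHERE someKey = 2; -- f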

SELECT DISTINCT ON (someKey) someKey, someBool
FROM myTable m
ORDER BY
someKey, someBool NULLS FIRST
This will select the first ordered boolean value for each someKey.
If there is a single FALSE or a NULL, it will be returned first, meaning that the AND failed.
If the first boolean is a TRUE, then all other booleans are also TRUE for this key.
Unlike the aggregate, this will use the index on (someKey, someBool).
To return an OR, just reverse the ordering:
SELECT DISTINCT ON (someKey) someKey, someBool
FROM myTable m
ORDER BY
someKey, someBool DESC NULLS FIRST
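The index this answer relies on would be something along these lines (my sketch, not from the original answer):
CREATE INDEX mytable_key_bool_idx ON myTable (someKey, someBool);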

Related

How to make GROUP BY as a parameter without using CASE WHEN?

I have the following table with the following values and types.
create table example (
fname text,
lname text,
salary int);
insert into example values
('doge','coin',123),
('bit','coin',434),
('lite','coin',565),
('doge','meme',183),
('bit','meme',453),
('lite','meme',433);
create type resultrow as (
nam text,
amount int);
I would like to write a function that groups by a parameter I give to the function.
This example works:
do $$
declare
    my_parameter text;
    results resultrow[];
begin
    my_parameter = 'last';
    results := array(select row( case when my_parameter = 'first' then fname
                                      when my_parameter = 'last' then lname
                                 end,
                                 sum(salary))::resultrow
                     from example
                     group by case when my_parameter = 'first' then fname
                                   when my_parameter = 'last' then lname
                              end);
    raise notice '%', results;
end;
$$ language plpgsql;
I have been told that CASE WHEN decisions are really expensive. One obvious solution would be to write the select statement twice:
if my_parameter = 'first' then
    results := array(select row(fname, sum(salary))::resultrow
                     from example
                     group by fname);
end if;
if my_parameter = 'last' then
    results := array(select row(lname, sum(salary))::resultrow
                     from example
                     group by lname);
end if;
But this leads to a lot of ugly duplicated code.
Is there another solution to make the group by parameterisable?
If you don't want to use case, you can use this:
with cte(name, salary) as (
select fname, salary from example where my_parameter = 'first'
union all
select lname, salary from example where my_parameter = 'last'
)
select name, sum(salary)
from cte
group by name
But actually, it's better to test; I've not heard that CASE is expensive.
And if you find that CASE is not expensive, I still suggest using a subquery or CTE to avoid code duplication, like:
with cte(name, salary) as (
select
case
when my_parameter = 'first' then fname
when my_parameter = 'last' then lname
end as name,
salary
from example
)
select name, sum(salary)
from cte
group by name
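For reference, with the question's sample data and my_parameter = 'last', either form should produce (by my arithmetic: 123 + 434 + 565 and 183 + 453 + 433):
 name | sum
------+------
 coin | 1122
 meme | 1069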
Simplify what you have:
DO
$do$
DECLARE
    _param text := 'last'; -- one can assign at declaration time
    results resultrow[];
BEGIN
    results := ARRAY(
        SELECT t::resultrow -- refer to table alias to get whole row
        FROM (
            SELECT CASE _param -- simple "switched" CASE
                        WHEN 'first' THEN fname
                        WHEN 'last' THEN lname
                   END
                 , sum(salary)
            FROM example
            GROUP BY 1 -- simpler with positional reference
        ) t
    );
    RAISE NOTICE '%', results;
END
$do$ LANGUAGE plpgsql;
This uses the simple ("switched") CASE syntax variant: the expression is evaluated only once and the syntax is simpler. I mention it since your question refers to CASE, even if that's hardly relevant here.
I'm also using a positional reference in the GROUP BY clause, which seems relevant to the title of your question. More explanation in these related answers:
Select first row in each GROUP BY group?
GROUP BY + CASE statement
This kind of query can be very inefficient. It's not a problem of the (very cheap!) CASE statement per se. It's because the planner has to provide for varying input in the first column and may be forced to use a generic, less optimized plan.
Dynamic SQL
I assume the actual goal is to write a function that takes my_parameter. Use dynamic SQL with EXECUTE, which will likely result in a superior query plan, i.e. superior performance. There are lots of code examples here; try a search.
Also, I return a set of resultrow instead of the awkward ARRAY you had in your example (since you cannot return from a DO statement):
CREATE FUNCTION f_salary_for_param(_param text)
RETURNS SETOF resultrow AS
$func$
DECLARE
    _fld text :=
        CASE _param
            WHEN 'first' THEN 'fname' -- SQL injection not possible
            WHEN 'last' THEN 'lname'
        END;
BEGIN
    IF _fld IS NULL THEN -- exception for invalid params
        RAISE EXCEPTION 'Unexpected value for _param: %', _param;
    END IF;

    RETURN QUERY EXECUTE '
        SELECT ' || _fld || ', sum(salary)
        FROM example
        GROUP BY 1'; -- query is very simple now
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM f_salary_for_param('first');
BTW, the plpgsql assignment operator is := (not =).
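If the parameter already holds the column name, format() with %I (available since PostgreSQL 9.1) gives safe identifier quoting; a sketch of just the EXECUTE line, keeping the whitelist above for validation:
RETURN QUERY EXECUTE format('SELECT %I, sum(salary) FROM example GROUP BY 1', _fld);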

Return setof record with 1 row

I'm altering a PL/pgSQL function and I'm having a small problem. First let me post its declaration:
CREATE OR REPLACE FUNCTION permissions(_principal text)
RETURNS SETOF record AS
$BODY$
DECLARE
    id integer := 0;
    rolerow record;
In this function there are a few cases, controlled by IF statements, and in all of them the return value is the UNION of more than one query, such as this:
FOR rolerow IN (
    (SELECT 'role1' AS role FROM table1 WHERE id = table1.id)
    UNION (SELECT 'role2' AS role FROM table2 WHERE id = table2.id)
    UNION (SELECT 'role3' AS role FROM table3 WHERE id = table3.id)
)
LOOP
    RETURN NEXT rolerow;
END LOOP;
RETURN;
And it all works fine, but in one case I need to return a single query result (a SETOF record with only one item), so I did it like this:
FOR rolerow IN (
    SELECT 'role4' AS role FROM table4 WHERE id = table4.id
)
LOOP
    RETURN NEXT rolerow;
END LOOP;
RETURN;
I also tried
RETURN QUERY SELECT 'role4' AS role FROM table4 WHERE id = table4.id;
But in both cases I get the same error as a response:
ERROR: structure of query does not match function result type
DETAIL: Returned type unknown does not match expected type text in column 1.
Does anyone have any idea how I can fix this?
I'll provide extra information in case this isn't enough.
You need an explicit cast for the string literal 'role4', which is of type "unknown", not text as you seem to expect:
SELECT 'role4'::text AS role FROM ...
Generally, looping is more expensive than necessary for simple cases like yours. Once the cast is in place, use RETURN QUERY like you already tested.
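Applied to the failing statement, the fixed RETURN QUERY would look like this:
RETURN QUERY SELECT 'role4'::text AS role FROM table4 WHERE id = table4.id;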

Proper way of checking if row exists in table in PL/SQL block

I was writing some tasks yesterday and it struck me that I don't really know THE PROPER and ACCEPTED way of checking if a row exists in a table when I'm using PL/SQL.
For examples sake let's use table:
PERSON (ID, Name);
Obviously I can't do (unless there's some secret method) something like:
BEGIN
IF EXISTS SELECT id FROM person WHERE ID = 10;
-- do things when exists
ELSE
-- do things when doesn't exist
END IF;
END;
So my standard way of solving it was:
DECLARE
tmp NUMBER;
BEGIN
SELECT id INTO tmp FROM person WHERE id = 10;
--do things when record exists
EXCEPTION
WHEN no_data_found THEN
--do things when record doesn't exist
END;
However, I don't know if it's the accepted way of doing it, or if there's any better way of checking; I would really appreciate it if someone could share their wisdom with me.
I wouldn't push regular code into an exception block. Just check whether any rows exist that meet your condition, and proceed from there:
declare
    any_rows_found number;
begin
    select count(*)
    into any_rows_found
    from my_table
    where rownum = 1 and
          ... other conditions ...;

    if any_rows_found = 1 then
        ...
    else
        ...
    end if;
end;
IMO code with a stand-alone SELECT used to check to see if a row exists in a table is not taking proper advantage of the database. In your example you've got a hard-coded ID value but that's not how apps work in "the real world" (at least not in my world - yours may be different :-). In a typical app you're going to use a cursor to find data - so let's say you've got an app that's looking at invoice data, and needs to know if the customer exists. The main body of the app might be something like
FOR aRow IN (SELECT * FROM INVOICES WHERE DUE_DATE < TRUNC(SYSDATE)-60)
LOOP
-- do something here
END LOOP;
and in the -- do something here you want to find if the customer exists, and if not print an error message.
One way to do this would be to put in some kind of singleton SELECT, as in
-- Check to see if the customer exists in PERSON
BEGIN
SELECT 'TRUE'
INTO strCustomer_exists
FROM PERSON
WHERE PERSON_ID = aRow.CUSTOMER_ID;
EXCEPTION
WHEN NO_DATA_FOUND THEN
strCustomer_exists := 'FALSE';
END;
IF strCustomer_exists = 'FALSE' THEN
DBMS_OUTPUT.PUT_LINE('Customer does not exist!');
END IF;
but IMO this is relatively slow and error-prone. A Better Way (tm) is to incorporate it in the main cursor:
FOR aRow IN (SELECT i.*, p.ID AS PERSON_ID
FROM INVOICES i
LEFT OUTER JOIN PERSON p
ON (p.ID = i.CUSTOMER_PERSON_ID)
WHERE DUE_DATE < TRUNC(SYSDATE)-60)
LOOP
-- Check to see if the customer exists in PERSON
IF aRow.PERSON_ID IS NULL THEN
DBMS_OUTPUT.PUT_LINE('Customer does not exist!');
END IF;
END LOOP;
This code counts on PERSON.ID being declared as the PRIMARY KEY on PERSON (or at least as being NOT NULL); the logic is that if the PERSON table is outer-joined to the query, and the PERSON_ID comes up as NULL, it means no row was found in PERSON for the given CUSTOMER_ID because PERSON.ID must have a value (i.e. is at least NOT NULL).
Share and enjoy.
Many ways to skin this cat. I put a simple function in each table's package...
function exists( id_in in yourTable.id%type ) return boolean is
res boolean := false;
begin
for c1 in ( select 1 from yourTable where id = id_in and rownum = 1 ) loop
res := true;
exit; -- only care about one record, so exit.
end loop;
return( res );
end exists;
Makes your checks really clean...
IF pkg.exists(someId) THEN
...
ELSE
...
END IF;
select nvl(max(1), 0) from mytable;
This statement yields 0 if there are no rows, 1 if you have at least one row in that table. It can be way faster than doing a select count(*), since the optimizer only needs to fetch a single row to answer the question (the variant further down adds rownum = 1 to make that explicit).
Here's a (verbose) little example:
declare
YES constant signtype := 1;
NO constant signtype := 0;
v_table_has_rows signtype;
begin
select nvl(max(YES), NO)
into v_table_has_rows
from mytable -- where ...
;
if v_table_has_rows = YES then
DBMS_OUTPUT.PUT_LINE ('mytable has at least one row');
end if;
end;
If you are using an explicit cursor, it should be as follows.
DECLARE
    CURSOR get_id IS
        SELECT id
        FROM person
        WHERE id = 10;
    id_value_ person.id%TYPE; -- %TYPE: the column's type, not %ROWTYPE
BEGIN
    OPEN get_id;
    FETCH get_id INTO id_value_;
    IF (get_id%FOUND) THEN
        DBMS_OUTPUT.PUT_LINE('Record Found.');
    ELSE
        DBMS_OUTPUT.PUT_LINE('Record Not Found.');
    END IF;
    CLOSE get_id;
EXCEPTION
    WHEN no_data_found THEN
        --do things when record doesn't exist
        NULL;
END;
You can use EXISTS in Oracle PL/SQL, as long as it appears inside a query rather than directly in an IF condition. You can do the following:
DECLARE
    n_rowExist NUMBER := 0;
BEGIN
    SELECT CASE WHEN EXISTS (
               SELECT 1
               FROM person
               WHERE ID = 10
           ) THEN 1 ELSE 0 END -- END closes the CASE before INTO
    INTO n_rowExist
    FROM DUAL;
    IF n_rowExist = 1 THEN
        -- do things when it exists
        NULL;
    ELSE
        -- do things when it doesn't exist
        NULL;
    END IF;
END;
/
Explanation:
In the nested query, the part starting with SELECT CASE WHEN EXISTS evaluates the parenthesised subquery (SELECT 1 FROM person WHERE ID = 10), which returns a row if it finds a person with an ID of 10. If the subquery returns a row, the value 1 is assigned to the n_rowExist variable; otherwise it is assigned 0. Afterwards, the IF statement checks whether the returned value equals 1 (the row exists) or 0 (it doesn't).
Select 'YOU WILL SEE ME' as ANSWER from dual
where exists (select 1 from dual where 1 = 1);
Select 'YOU CAN NOT SEE ME' as ANSWER from dual
where exists (select 1 from dual where 1 = 0);
Select 'YOU WILL SEE ME, TOO' as ANSWER from dual
where not exists (select 1 from dual where 1 = 0);
select max( 1 )
into my_if_has_data
from MY_TABLE X
where X.my_field = my_condition
and rownum = 1;
This does not iterate through all records.
If MY_TABLE has no data, my_if_has_data is set to NULL.

Same queries in PostgreSQL stored procedure

So, I'm trying to create a procedure that is going to find a specific row in my table, save the row in a result to be returned, delete the row and afterwards return the result.
The best thing I managed to do was the following:
CREATE OR REPLACE FUNCTION sth(foo integer)
RETURNS TABLE(a integer, b integer, ... other fields) AS $$
DECLARE
    to_delete_id integer;
BEGIN
    SELECT id INTO to_delete_id FROM my_table WHERE sth_id = foo LIMIT 1;
    RETURN QUERY SELECT * FROM my_table WHERE sth_id = foo LIMIT 1;
    DELETE FROM my_table where id = to_delete_id;
END;
$$ LANGUAGE plpgsql;
As you see, I have 2 SELECT operations that pretty much do the same thing (extra overhead). Is there a way to just have the second SELECT and also set the to_delete_id so I can delete the row afterwards?
You just want a DELETE...RETURNING.
DELETE FROM my_table WHERE sth_id=foo LIMIT 1 RETURNING *
Edit based on ahwnn's comment. Quite right too - teach me to cut + paste the query without reading it properly.
DELETE FROM my_table WHERE id = (SELECT id ... LIMIT 1) RETURNING *
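Spelled out against the question's function, with the subquery taken from the question's own code (a sketch, untested):
DELETE FROM my_table
WHERE id = (SELECT id FROM my_table WHERE sth_id = foo LIMIT 1)
RETURNING *;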
This can be done much more easily:
CREATE OR REPLACE FUNCTION sth(foo integer)
RETURNS SETOF my_table
AS
$$
BEGIN
    return query
        DELETE FROM my_table p
        where sth_id = foo
        returning *;
END;
$$
LANGUAGE plpgsql;
Select all the columns into variables, return them, then delete using the id:
Declare a variable for each column (named, by convention, the same as the column but with a leading underscore), then:
SELECT id, col1, col2, ...
INTO _id, _col1, _col2, ...
FROM my_table
WHERE sth_id = foo
LIMIT 1;

RETURN QUERY SELECT _id, _col1, _col2, ...;

DELETE FROM my_table where id = _id;

How to structure SQL - select first X rows for each value of a column?

I have a table with the following type of data:
create table store (
n_id serial not null primary key,
n_place_id integer not null references place(n_id),
dt_modified timestamp not null,
t_tag varchar(4),
n_status integer not null default 0
...
(about 50 more fields)
);
There are indices on n_id, n_place_id, dt_modified and all other fields used in the query below.
This table contains about 100,000 rows at present, but may grow to closer to a million or even more. Yet, for now let's assume we're staying at around the 100K mark.
I'm trying to select rows from this table where one of two conditions is met:
All rows where n_place_id is in a specific subset (this part is easy); or
For all other n_place_id values the first ten rows sorted by dt_modified (this is where it becomes more complicated).
Doing it in one SQL statement seems to be too painful, so I'm happy with a stored function for this. I have my function defined thus:
create or replace function api2.fn_api_mobile_objects()
returns setof store as
$body$
declare
    maxres_free integer := 10;
    resulter store%rowtype;
    mcnt integer := 0;
    previd integer := 0;
begin
    create temporary table paid on commit drop as
        select n_place_id from payments
        where t_reference is not null and now()::date between dt_paid and dt_valid;
    for resulter in
        select * from store
        where n_status > 0 and t_tag is not null
        order by n_place_id, dt_modified desc
    loop
        if resulter.n_place_id in (select n_place_id from paid) then
            return next resulter;
        else
            if previd <> resulter.n_place_id then
                mcnt := 0;
                previd := resulter.n_place_id;
            end if;
            if mcnt < maxres_free then
                return next resulter;
                mcnt := mcnt + 1;
            end if;
        end if;
    end loop;
end;$body$
language 'plpgsql' volatile;
The problem is that
select * from api2.fn_api_mobile_objects()
takes about 6-7 seconds to execute. Considering that this resultset then needs to be joined to 3 other tables, with a bunch of additional conditions and further sorting applied, this is clearly unacceptable.
Well, I still do need to get this data, so either I am missing something in the function or I need to rethink the entire algorithm. Either way, I need help with this.
CREATE TABLE store
( n_id serial not null primary key
, n_place_id integer not null -- references place(n_id)
, dt_modified timestamp not null
, t_tag varchar(4)
, n_status integer not null default 0
);
INSERT INTO store(n_place_id,dt_modified,n_status)
SELECT n,d,n%4
FROM generate_series(1,100) n
, generate_series('2012-01-01'::date ,'2012-10-01'::date, '1 day'::interval ) d
;
WITH zzz AS (
SELECT n_id AS n_id
, rank() OVER (partition BY n_place_id ORDER BY dt_modified) AS rnk
FROM store
)
SELECT st.*
FROM store st
JOIN zzz ON zzz.n_id = st.n_id
WHERE st.n_place_id IN ( 1,22,333)
OR zzz.rnk <=10
;
Update: here is the same selfjoin construct as a subquery (CTEs are treated a bit differently by the planner):
SELECT st.*
FROM store st
JOIN ( SELECT sx.n_id AS n_id
, rank() OVER (partition BY sx.n_place_id ORDER BY sx.dt_modified) AS zrnk
FROM store sx
) xxx ON xxx.n_id = st.n_id
WHERE st.n_place_id IN ( 1,22,333)
OR xxx.zrnk <=10
;
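One caveat to this answer: rank() gives equal rows equal rank, so if dt_modified can tie within an n_place_id, a key may yield more than ten rows. If exactly ten are wanted, row_number() is the tie-free variant; a sketch, ordered descending as the question's function does:
SELECT st.*
FROM store st
JOIN ( SELECT n_id
            , row_number() OVER (partition BY n_place_id ORDER BY dt_modified DESC) AS rn
       FROM store
     ) xxx ON xxx.n_id = st.n_id
WHERE st.n_place_id IN ( 1,22,333)
OR xxx.rn <= 10
;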
After much struggle, I managed to get the stored function to return the results in just over 1 second (which is a huge improvement). Now the function looks like this (I added the additional condition, which didn't affect the performance much):
create or replace function api2.fn_api_mobile_objects(t_search varchar)
returns setof store as
$body$
declare
    maxres_free integer := 10;
    resulter store%rowtype;
    mid integer := 0;
begin
    create temporary table paid on commit drop as
        select n_place_id from payments
        where t_reference is not null and now()::date between dt_paid and dt_valid
        union
        select n_place_id from store
        where n_status > 0 and t_tag is not null
        group by n_place_id having count(1) <= 10;
    for resulter in
        select * from store
        where n_status > 0 and t_tag is not null
          and (t_name ~* t_search or t_description ~* t_search)
          and n_place_id in (select n_place_id from paid)
    loop
        return next resulter;
    end loop;
    for mid in
        select distinct n_place_id from store
        where n_place_id not in (select n_place_id from paid)
    loop
        for resulter in
            select * from store
            where n_status > 0 and t_tag is not null and n_place_id = mid
            order by dt_modified desc limit maxres_free
        loop
            return next resulter;
        end loop;
    end loop;
end;$body$
language 'plpgsql' volatile;
This runs in just over 1 second on my local machine and in about 0.8-1.0 seconds on live. For my purpose, this is good enough, although I am not sure what will happen as the amount of data grows.
As a simple suggestion, the way I like to do this sort of troubleshooting is to construct a query that gets me most of the way there, and optimize it properly, and then add the necessary pl/pgsql stuff around it. The major advantage to this approach is that you can optimize based on query plans.
Also, if you aren't dealing with a lot of rows, array_agg() and unnest() are your friends: they allow you (on Pg 8.4 and later!) to dispense with the temporary table management overhead and simply construct and query an array of tuples in memory as a relation. It may also perform better, since you are hitting an array in memory instead of a temp table (less planning overhead and less query overhead too).
Also on your updated query I would look at replacing that final loop with a subquery or a join, allowing the planner to decide when to do a nested loop lookup or when to try to find a better way.
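A minimal sketch of that array idea, assuming the paid list from the question's function (the _paid variable is mine, declared as integer[]):
-- build the list once, no temp table needed:
SELECT array_agg(n_place_id) INTO _paid
FROM payments
WHERE t_reference IS NOT NULL
  AND now()::date BETWEEN dt_paid AND dt_valid;
-- membership test inside the loop:
IF resulter.n_place_id = ANY (_paid) THEN ... END IF;
-- or query the array as a relation:
SELECT * FROM unnest(_paid) AS p(n_place_id);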