I have three functions that all do the same thing: I'd like to know whether SELECT ... FROM EMP WHERE DEPT_ID = v_dept returns any rows.
Which one would be the fastest way?
CREATE OR REPLACE FUNCTION RecordsFound1(v_dept IN EMP.DEPT_ID%TYPE) RETURN BOOLEAN IS
n INTEGER;
BEGIN
SELECT COUNT(*) INTO n FROM EMP WHERE DEPT_ID = v_dept;
RETURN n > 0;
END;
/
CREATE OR REPLACE FUNCTION RecordsFound2(v_dept IN EMP.DEPT_ID%TYPE) RETURN BOOLEAN IS
CURSOR curEmp IS
SELECT DEPT_ID FROM EMP WHERE DEPT_ID = v_dept;
dept EMP.DEPT_ID%TYPE;
res BOOLEAN;
BEGIN
OPEN curEmp;
FETCH curEmp INTO dept;
res := curEmp%FOUND;
CLOSE curEmp;
RETURN res;
END;
/
CREATE OR REPLACE FUNCTION RecordsFound3(v_dept IN EMP.DEPT_ID%TYPE) RETURN BOOLEAN IS
dept EMP.DEPT_ID%TYPE;
BEGIN
SELECT DEPT_ID INTO dept FROM EMP WHERE DEPT_ID = v_dept;
RETURN TRUE;
EXCEPTION
WHEN NO_DATA_FOUND THEN
RETURN FALSE;
WHEN TOO_MANY_ROWS THEN
RETURN TRUE;
END;
/
Assume table EMP is very big and the condition WHERE DEPT_ID = v_dept could match thousands of rows.
I would usually expect RecordsFound2 to be the fastest, because it has to fetch at most one row, so in terms of I/O it should be the best.
For the non-believers: the exists() version:
CREATE OR REPLACE FUNCTION RecordsFound0(v_dept IN EMP.DEPT_ID%TYPE) RETURN BOOLEAN IS
BEGIN
RETURN EXISTS( SELECT 1 FROM EMP WHERE DEPT_ID = v_dept);
END;
The PostgreSQL version:
CREATE OR REPLACE FUNCTION RecordsFound0(v_dept IN EMP.DEPT_ID%TYPE) RETURNS BOOLEAN AS
$func$
BEGIN
RETURN EXISTS( SELECT 1 FROM EMP WHERE DEPT_ID = v_dept);
END
$func$ LANGUAGE plpgsql;
In Postgres the function can also be implemented in plain SQL, without the need for PL/pgSQL (in Postgres a SELECT does not need FROM DUAL):
CREATE OR REPLACE FUNCTION RecordsFound0s(v_dept IN EMP.DEPT_ID%TYPE) RETURNS BOOLEAN AS
$func$
SELECT EXISTS( SELECT NULL FROM EMP WHERE DEPT_ID = v_dept);
$func$ LANGUAGE sql;
Note: the unary EXISTS(...) operator yields a Boolean, which is exactly what you want.
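Note 2: I hope I have the Oracle syntax correct (keywords RETURN vs RETURNS and AS vs IS). Be aware that in Oracle, EXISTS can normally only be used inside a SQL statement (PL/SQL rejects it with PLS-00204), so the Oracle version may have to put the EXISTS into a SELECT ... FROM DUAL instead.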
Your solution 1: Count all occurrences
You make the DBMS do much more work than needed. Why let it scan the table and count all occurrences when you only want to know whether at least one exists? This is slow. (But on a small emp table with an index on dept_id it may still look fast :-)
Your solution 2: Open a cursor and fetch only the first record
A good idea and probably rather fast, as you stop once you have found a record. However, the DBMS doesn't know that you only want to check for existence and may choose a slow execution plan, since it expects you to fetch all matches.
Your solution 3: Fetch the one record or get an exception
This may be a tad faster, as the DBMS expects to find only one record. However, it must look for a second match in order to raise TOO_MANY_ROWS if there is one. So even though it has already found a record, it must keep looking.
Solution 4: Use COUNT and ROWNUM
By adding AND ROWNUM = 1 you tell the DBMS that you want only one record. At a minimum the DBMS knows it can stop after the first match; at best the optimizer even takes into account that only one row is needed. So, depending on the implementation, the DBMS may find the optimal execution plan.
Solution 5: Use EXISTS
EXISTS is made to check for mere existence, so the DBMS can find the optimal execution plan. EXISTS is an SQL keyword, not a PL/SQL one, and the SQL engine doesn't know BOOLEAN, so the function gets a bit clumsy:
CREATE OR REPLACE FUNCTION RecordsFound1(v_dept IN EMP.DEPT_ID%TYPE) RETURN BOOLEAN IS
v_1_is_yes_0_is_no INTEGER;
BEGIN
SELECT COUNT(*) INTO v_1_is_yes_0_is_no
FROM DUAL
WHERE EXISTS (SELECT * FROM EMP WHERE DEPT_ID = v_dept);
RETURN v_1_is_yes_0_is_no = 1;
END;
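To make the cost argument above concrete, here is a toy model in Java (the language of the client-side snippet at the end of this page). It is only a sketch under strong assumptions: no indexes, no I/O and no execution plans, just a full scan over a list that stands in for EMP.DEPT_ID, counting how many rows each strategy has to inspect before it can answer. The class and method names are made up for illustration.

```java
import java.util.List;

// Toy in-memory model, NOT Oracle: ignores indexes, I/O and execution plans and
// only counts how many rows of a full scan each strategy inspects.
// deptIds is a hypothetical stand-in for the EMP.DEPT_ID column.
public class ExistenceStrategies {

    // The boolean answer plus the number of rows the strategy inspected.
    public static final class Outcome {
        public final boolean found;
        public final int rowsInspected;

        Outcome(boolean found, int rowsInspected) {
            this.found = found;
            this.rowsInspected = rowsInspected;
        }
    }

    // Solution 1, COUNT(*): every row is inspected before the count is compared with 0.
    public static Outcome countAll(List<Integer> deptIds, int dept) {
        int matches = 0;
        for (int d : deptIds) {
            if (d == dept) matches++;
        }
        return new Outcome(matches > 0, deptIds.size());
    }

    // Solution 3, SELECT ... INTO: after the first match it keeps looking,
    // because only a second match lets it raise TOO_MANY_ROWS.
    public static Outcome selectInto(List<Integer> deptIds, int dept) {
        int matches = 0, inspected = 0;
        for (int d : deptIds) {
            inspected++;
            if (d == dept && ++matches == 2) break;
        }
        return new Outcome(matches > 0, inspected);
    }

    // Solutions 2, 4 and 5 (fetch one row, ROWNUM = 1, EXISTS): stop at the first match.
    public static Outcome firstMatch(List<Integer> deptIds, int dept) {
        int inspected = 0;
        for (int d : deptIds) {
            inspected++;
            if (d == dept) return new Outcome(true, inspected);
        }
        return new Outcome(false, inspected);
    }

    public static void main(String[] args) {
        List<Integer> deptIds = List.of(10, 20, 10, 30, 10);
        System.out.println(countAll(deptIds, 10).rowsInspected + " "
                + selectInto(deptIds, 10).rowsInspected + " "
                + firstMatch(deptIds, 10).rowsInspected); // prints "5 3 1"
    }
}
```

With the sample list the model inspects 5 rows for COUNT(*), 3 for SELECT ... INTO (it stops at the second match) and 1 for the stop-at-first-match strategies. A real optimizer can change these numbers considerably, for example by using an index on DEPT_ID.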
The absolute fastest way is not to call the count function at all.
A typical pattern is:
count the number of rows
if cnt = 0 then do something
else read a chunk of data and process it
Instead, simply read the data and then perform the count test on what you have read.
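A minimal Java sketch of that pattern, assuming an Iterator stands in for the open cursor; processRow and onEmpty are hypothetical callbacks, not part of any database API. The rows are read once, and the emptiness test falls out of the number of rows processed, so no separate COUNT(*) query is needed.

```java
import java.util.Iterator;
import java.util.function.Consumer;

// Sketch of "read the data, then test the count" instead of "COUNT(*) first, then read".
// The Iterator stands in for an open cursor; processRow and onEmpty are hypothetical
// callbacks, not part of any database API.
public class ReadThenCount {

    // Reads every row exactly once and returns how many there were;
    // the "nothing found" branch runs only if the cursor produced no rows.
    public static <T> int processAll(Iterator<T> cursor, Consumer<T> processRow, Runnable onEmpty) {
        int cnt = 0;
        while (cursor.hasNext()) {
            processRow.accept(cursor.next());
            cnt++;
        }
        if (cnt == 0) {
            onEmpty.run();
        }
        return cnt;
    }
}
```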
You can make a SQL query stop early by adding an extra condition to your WHERE clause:
where ...
and rownum = 1;
This stops immediately once a record is found, and it is as fast as the EXISTS operator.
See the following sample code:
create or replace function test_record_exists(pi_some_parameter in varchar2) return boolean is
l_dummy varchar2(10);
begin
select 'x'
into l_dummy
from <your table>
where <column where you want to filter for> = pi_some_parameter
and rownum = 1;
return (true);
exception
when no_data_found then
return (false);
end;
If you use:
select count(*)
from my_table
where ...
and rownum = 1;
... then the query will:
be executed in the most efficient fashion
always return a single row
return either 0 or 1
These three factors make it very fast and very easy to use in PL/SQL, as you do not have to concern yourself with whether a row is returned or not.
The returned value is also amenable to use as a true/false boolean, of course.
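The difference between the exception-based function and the COUNT(*) ... AND ROWNUM = 1 query is mostly one of control flow. The following Java sketch models both, assuming a list of IDs stands in for the filtered column; NoDataFound is a hypothetical stand-in for PL/SQL's NO_DATA_FOUND, and the other names are invented as well.

```java
import java.util.List;

// Toy model (not Oracle) contrasting the two ROWNUM = 1 approaches.
// ids stands in for the filtered column; all names here are hypothetical.
public class CappedCount {

    // Hypothetical stand-in for PL/SQL's NO_DATA_FOUND exception.
    public static class NoDataFound extends RuntimeException {}

    // SELECT 'x' INTO l_dummy ... AND ROWNUM = 1: returns the row or raises NO_DATA_FOUND.
    static String selectOneInto(List<Integer> ids, int wanted) {
        for (int id : ids) {
            if (id == wanted) return "x"; // ROWNUM = 1: stop at the first match
        }
        throw new NoDataFound();
    }

    // Exception-driven existence test, in the style of test_record_exists.
    public static boolean existsViaException(List<Integer> ids, int wanted) {
        try {
            selectOneInto(ids, wanted);
            return true;
        } catch (NoDataFound e) {
            return false;
        }
    }

    // SELECT COUNT(*) ... AND ROWNUM = 1: always exactly one result, holding 0 or 1.
    public static int countCapped(List<Integer> ids, int wanted) {
        for (int id : ids) {
            if (id == wanted) return 1;
        }
        return 0;
    }

    // No exception handling needed: the 0/1 value maps directly onto a boolean.
    public static boolean existsViaCount(List<Integer> ids, int wanted) {
        return countCapped(ids, wanted) == 1;
    }
}
```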
If you wanted to list the departments that either do or do not have any records in the emp table then I would certainly use EXISTS, as the semi-(anti)join is the most efficient means of executing the query:
select *
from dept
where [NOT] exists (
select null
from emp
where emp.dept_id = dept.id);
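As an in-memory analogy (not how the database actually executes it), the semi-join keeps each department for which at least one matching employee exists, and the anti-join keeps those for which none does. In this Java sketch, anyMatch and noneMatch stop at the first matching employee, much as EXISTS can; a real optimizer would more likely choose a hash or merge semi-join than this nested loop. The lists of IDs are hypothetical stand-ins for DEPT.ID and EMP.DEPT_ID.

```java
import java.util.List;
import java.util.stream.Collectors;

// In-memory analogy (not a query plan) of WHERE [NOT] EXISTS on dept/emp.
// deptIds and empDeptIds are hypothetical stand-ins for DEPT.ID and EMP.DEPT_ID.
public class SemiJoin {

    // EXISTS: departments with at least one employee (anyMatch stops at the first hit).
    public static List<Integer> withEmployees(List<Integer> deptIds, List<Integer> empDeptIds) {
        return deptIds.stream()
                .filter(id -> empDeptIds.stream().anyMatch(id::equals))
                .collect(Collectors.toList());
    }

    // NOT EXISTS: departments with no employees (noneMatch also stops at the first hit).
    public static List<Integer> withoutEmployees(List<Integer> deptIds, List<Integer> empDeptIds) {
        return deptIds.stream()
                .filter(id -> empDeptIds.stream().noneMatch(id::equals))
                .collect(Collectors.toList());
    }
}
```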
CREATE OR REPLACE PROCEDURE test_max_rows (
max_rows IN NUMBER DEFAULT 1000
)
IS
CURSOR cur_test ( max_rows IN number ) IS
SELECT id FROM test_table
WHERE user_id = 'ABC'
AND ROWNUM <= max_rows;
id test_table.id%TYPE;
BEGIN
OPEN cur_test(max_rows) ;
LOOP
FETCH cur_test INTO id;
EXIT WHEN cur_test%NOTFOUND;
DBMS_OUTPUT.PUT_LINE('ID:' || id);
END LOOP;
END;
My requirement is to modify the code above so that when I pass -1 for max_rows, the procedure returns all the rows returned by the query. Otherwise, it should limit the number of rows to max_rows.
For example:
EXECUTE test_max_rows(-1);
This command should return all the rows returned by the SELECT statement above.
EXECUTE test_max_rows(10);
This command should return only 10 rows.
You can do this with an OR condition; change:
AND ROWNUM <= max_rows;
to:
AND (max_rows < 1 OR ROWNUM <= max_rows);
Then passing zero, -1, or any other negative number will fetch all rows, and any positive number will return a restricted list. You could also replace DEFAULT 1000 with DEFAULT NULL and test for null instead, which might be a bit more obvious:
AND (max_rows is null OR ROWNUM <= max_rows);
Note that which rows you get with a passed value will be indeterminate because you don't have an order by clause at the moment.
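The two predicates behave differently for negative values, which is easy to miss. This Java sketch models them, assuming Oracle's rule that a row only receives the next ROWNUM when it passes the predicate (so the candidate ROWNUM is always the number of rows accepted so far plus one). The list order stands in for whatever order the database happens to return, which without ORDER BY is indeterminate. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Toy model (not Oracle) of the two max_rows predicates.
// A row gets the next ROWNUM only if it is accepted, so the candidate
// ROWNUM is "rows accepted so far + 1".
public class RownumLimit {

    static <T> List<T> filter(List<T> rows, IntPredicate keep) {
        List<T> out = new ArrayList<>();
        for (T row : rows) {
            int candidateRownum = out.size() + 1;
            if (keep.test(candidateRownum)) out.add(row);
        }
        return out;
    }

    // AND (max_rows < 1 OR ROWNUM <= max_rows): zero or negative means "all rows".
    public static <T> List<T> limitNonPositiveMeansAll(List<T> rows, int maxRows) {
        return filter(rows, rownum -> maxRows < 1 || rownum <= maxRows);
    }

    // AND (max_rows IS NULL OR ROWNUM <= max_rows): only NULL means "all rows";
    // a negative value now returns no rows at all.
    public static <T> List<T> limitNullMeansAll(List<T> rows, Integer maxRows) {
        return filter(rows, rownum -> maxRows == null || rownum <= maxRows);
    }
}
```

With the NULL variant, passing -1 returns no rows at all, so callers would have to switch from passing -1 to passing NULL.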
Doing this in a procedure also seems a bit odd, and you're assuming whoever calls it will be able to see the output - i.e. will have done set serveroutput on or the equivalent for their client - which is not a very safe assumption. An alternative, if you can't specify the row limit in a simple query, might be to use a pipelined function instead - you could at least then call that from plain SQL.
CREATE OR REPLACE FUNCTION test_max_rows (max_rows IN NUMBER DEFAULT NULL)
RETURN sys.odcinumberlist PIPELINED
AS
BEGIN
FOR r IN (
SELECT id FROM test_table
WHERE user_id = 'ABC'
AND (max_rows IS NULL OR ROWNUM <= max_rows)
) LOOP
PIPE ROW (r.id);
END LOOP;
END;
/
And then call it as:
SELECT * FROM TABLE(test_max_rows);
or
SELECT * FROM TABLE(test_max_rows(10));
But you should still consider whether you can do the whole thing in plain SQL and avoid PL/SQL altogether.
I have the following table, values and type:
create table example (
fname text,
lname text,
salary int);
insert into example values
('doge','coin',123),
('bit','coin',434),
('lite','coin',565),
('doge','meme',183),
('bit','meme',453),
('lite','meme',433);
create type resultrow as (
nam text,
amount int);
I would like to write a function that groups by a column chosen through a parameter I pass to the function.
This example works:
do $$
declare
my_parameter text;
results resultrow[];
begin
my_parameter = 'last';
results := array(select row( case when my_parameter = 'first' then fname
when my_parameter = 'last' then lname
end,
sum(salary))::resultrow
from example
group by case when my_parameter = 'first' then fname
when my_parameter = 'last' then lname
end);
raise notice '%', results;
end;
$$ language plpgsql;
I have been told that CASE WHEN expressions are really expensive. One obvious solution would be to write the SELECT statement twice:
if my_parameter = 'first' then
results := array(select row(fname,sum(salary))::resultrow
from example
group by fname);
end if;
if my_parameter = 'last' then
results := array(select row(lname,sum(salary))::resultrow
from example
group by lname);
end if;
But this leads to a lot of ugly duplicated code.
Is there another solution to make the group by parameterisable?
If you don't want to use CASE, you can use this:
with cte(name, salary) as (
select fname, salary from example where my_parameter = 'first'
union all
select lname, salary from example where my_parameter = 'last'
)
select name, sum(salary)
from cte
group by name
But actually, it's better to test this; I haven't heard that CASE is expensive.
If you find that CASE is not expensive, I still suggest using a subquery or CTE to avoid code duplication, like:
with cte(name, salary) as (
select
case
when my_parameter = 'first' then fname
when my_parameter = 'last' then lname
end as name,
salary
from example
)
select name, sum(salary)
from cte
group by name
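The idea behind both queries, choosing the grouping key once from the parameter and then running a single grouping step, can be sketched in memory in Java. This is only an analogy, not what PostgreSQL does internally; the Example class mirrors the example table (using the salary column that the queries refer to), and keyFor and sumSalaryBy are invented names.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// In-memory analogy of the parameterised GROUP BY: the parameter picks the key
// once, and a single grouping pipeline does the rest (no duplicated query).
public class GroupByParam {

    // One row of the example table (fname, lname, salary).
    public static final class Example {
        final String fname, lname;
        final int salary;

        public Example(String fname, String lname, int salary) {
            this.fname = fname;
            this.lname = lname;
            this.salary = salary;
        }
    }

    // Plays the role of CASE my_parameter WHEN 'first' ... WHEN 'last' ... END.
    static Function<Example, String> keyFor(String param) {
        switch (param) {
            case "first": return e -> e.fname;
            case "last":  return e -> e.lname;
            default: throw new IllegalArgumentException("Unexpected value for param: " + param);
        }
    }

    // Equivalent of SELECT <key>, sum(salary) ... GROUP BY <key>, sorted by key.
    public static Map<String, Integer> sumSalaryBy(List<Example> rows, String param) {
        Function<Example, String> key = keyFor(param);
        return rows.stream()
                .collect(Collectors.groupingBy(key, TreeMap::new, Collectors.summingInt(e -> e.salary)));
    }
}
```

For the question's sample data, grouping by 'last' gives coin = 1122 and meme = 1069.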
Simplify what you have:
DO
$do$
DECLARE
_param text := 'last'; -- one can assign at declaration time
results resultrow[];
BEGIN
results := ARRAY(
SELECT t::resultrow -- refer to table alias to get whole row
FROM (
SELECT CASE _param -- simple "switched" CASE
WHEN 'first' THEN fname
WHEN 'last' THEN lname
END
,sum(salary)
FROM example
GROUP BY 1 -- simpler with positional reference
) t
);
RAISE NOTICE '%', results;
END
$do$ LANGUAGE plpgsql;
This uses the simple CASE syntax variant, so the switch expression (_param) is evaluated only once and the syntax is simpler. I mention it since your question refers to CASE, even if that's hardly relevant.
It also uses a positional reference in the GROUP BY clause, which seems relevant to the title of your question. More explanation in these related answers:
Select first row in each GROUP BY group?
GROUP BY + CASE statement
This kind of query can be very inefficient. The problem is not the (very cheap!) CASE expression per se. It's that the planner has to provide for varying input in the first column and may be forced to use a generic, less optimized plan.
Dynamic SQL
I assume the actual goal is to write a function that takes my_parameter. Use dynamic SQL with EXECUTE, which will likely result in a superior query plan, i.e. superior performance. There are lots of code examples on this site; try a search.
Also, I return a set of resultrow instead of the awkward ARRAY you had in your example (since you cannot return from a DO statement):
CREATE FUNCTION f_salaray_for_param(_param text)
RETURNS SETOF resultrow AS
$func$
DECLARE
_fld text :=
CASE _param
WHEN 'first' THEN 'fname' -- SQL injection not possible
WHEN 'last' THEN 'lname'
END;
BEGIN
IF _fld IS NULL THEN -- exception for invalid params
RAISE EXCEPTION 'Unexpected value for _param: %', _param;
END IF;
RETURN QUERY EXECUTE '
SELECT ' || _fld || ', sum(salary)::int
FROM example
GROUP BY 1'; -- query is very simple now; ::int because sum(int) returns bigint but resultrow.amount is int
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM f_salaray_for_param('first');
BTW, prefer the PL/pgSQL assignment operator := over =. (PL/pgSQL accepts = as well, but := is the documented, PL/SQL-compatible form.)
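If the SQL text is built on the client instead, the same whitelist idea as in f_salaray_for_param applies: map the parameter to one of a fixed set of column names and never concatenate the caller's string into the statement. A minimal Java sketch with hypothetical names; it only builds the statement text, and executing it (for example through JDBC) is left out.

```java
// Hypothetical client-side counterpart of the whitelist in f_salaray_for_param:
// the parameter is mapped to one of two fixed column names, so caller input is
// never concatenated into the SQL text.
public class DynamicGroupBy {

    static String columnFor(String param) {
        if ("first".equals(param)) return "fname";
        if ("last".equals(param)) return "lname";
        // Same idea as RAISE EXCEPTION for an unexpected _param.
        throw new IllegalArgumentException("Unexpected value for _param: " + param);
    }

    // Builds the simple, parameter-specific statement that would then be executed.
    public static String buildQuery(String param) {
        return "SELECT " + columnFor(param) + ", sum(salary) FROM example GROUP BY 1";
    }
}
```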
I'm doing some testing to see if I can speed up a particular result set, but I can't seem to get this particular solution working. I have data coming from a few different tables and want to combine it. I want to try this without using a UNION select to see if I get a performance improvement.
When I use a custom table/object type in a function, each subsequent SELECT into the table-type variable seems to delete the data already in it. Is there a way to do subsequent selects into it without having the previous data deleted?
I don't think that approach will be faster, in fact I expect it to be much slower.
But if you do want to do it, you need to put the rows from the second select into an intermediate collection and then combine both using MULTISET UNION.
Something like this:
create or replace function
academic_history(p_student_id number)
return ah_tab_type
is
result ah_tab_type;
t ah_tab_type;
begin
select ah_obj_type(student_id,course_code,grade)
bulk collect into result
from completed_courses
where student_id = p_student_id;
select ah_obj_type(student_id,course_code,'P')
bulk collect into T
from trans_courses
where student_id = p_student_id;
result := result multiset union t;
return result;
end;
/
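One behavioral detail worth knowing: MULTISET UNION defaults to MULTISET UNION ALL and keeps duplicates, while MULTISET UNION DISTINCT, like SQL UNION, removes them. So a row that comes out identical from both selects (e.g. the same course with grade 'P') appears twice in the result of the function above, but only once with a UNION-based query. The Java sketch below models the two operators on plain lists; the "student/course/grade" strings are invented sample rows, and the order shown is just insertion order (SQL UNION promises no order).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Toy model of the collection operators, not Oracle itself:
// MULTISET UNION [ALL] keeps duplicates; MULTISET UNION DISTINCT (like SQL UNION)
// removes them. LinkedHashSet just keeps the first occurrence of each element.
public class MultisetUnion {

    public static <T> List<T> unionAll(List<T> a, List<T> b) {
        List<T> result = new ArrayList<>(a);
        result.addAll(b);
        return result;
    }

    public static <T> List<T> unionDistinct(List<T> a, List<T> b) {
        return new ArrayList<>(new LinkedHashSet<>(unionAll(a, b)));
    }
}
```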
As well as the multiset approach, if you really wanted to do this you could also make it a pipelined function:
create or replace function
academic_history(p_student_id number)
return ah_tab_type pipelined
is
T ah_tab_type;
begin
select ah_obj_type(student_id,course_code,grade)
bulk collect
into T
from completed_courses
where student_id = p_student_id;
for i in 1..T.count loop
pipe row (T(i));
end loop;
select ah_obj_type(student_id,course_code,'P')
bulk collect
into T
from trans_courses
where student_id = p_student_id;
for i in 1..T.count loop
pipe row (T(i));
end loop;
return;
end;
Thanks to a_horse_with_no_name for pointing out that doing the multiple selects one at a time will probably be slower. I was able to reduce the execution time by filtering each select by student_id and then UNION-ing (rather than UNION-ing everything and then filtering). On the data set I'm working with, this solution was the fastest, taking less than 1/10 of a second:
create or replace function
academic_history(p_student_id number)
return ah_tab_type
is
T ah_tab_type;
begin
select ah_obj_type(student_id,course_code,grade)
bulk collect
into T
from (
select student_id,course_code,grade
from completed_courses
where student_id = p_student_id
union
select student_id,course_code,'P'
from trans_courses
where student_id = p_student_id);
return T;
end;
/
select *
from table(academic_history(1));
...while this took 2-3 seconds to execute:
create view vw_academic_history as
select student_id,course_code,grade
from completed_courses
union
select student_id,course_code,'P'
from trans_courses;
select *
from vw_academic_history
where student_id = 1;
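A simplified model of why filtering before the UNION can help: UNION has to de-duplicate its input, and if the student_id filter is applied only after the UNION, every row of both tables goes through that step. This Java sketch assumes exactly that (Oracle can often push such predicates into a view's branches, so the real cause should be confirmed from the execution plan); rows are hypothetical "studentId:course" strings.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Toy model, NOT Oracle: assumes the student_id filter is applied only after the
// UNION (no predicate pushing into the view) and counts the rows that go through
// duplicate elimination. Rows are hypothetical "studentId:course" strings.
public class FilterBeforeUnion {

    // Distinct result rows plus the number of rows fed into de-duplication.
    public static final class Result {
        public final Set<String> rows;
        public final int rowsDeduplicated;

        Result(Set<String> rows, int rowsDeduplicated) {
            this.rows = rows;
            this.rowsDeduplicated = rowsDeduplicated;
        }
    }

    static boolean forStudent(String row, int studentId) {
        return row.startsWith(studentId + ":");
    }

    // Like querying the view: UNION everything, then filter by student.
    public static Result unionThenFilter(List<String> a, List<String> b, int studentId) {
        List<String> all = Stream.concat(a.stream(), b.stream()).collect(Collectors.toList());
        Set<String> rows = new LinkedHashSet<>(all).stream()
                .filter(r -> forStudent(r, studentId))
                .collect(Collectors.toCollection(LinkedHashSet::new));
        return new Result(rows, all.size());
    }

    // Like the faster function: filter each branch first, then UNION the small results.
    public static Result filterThenUnion(List<String> a, List<String> b, int studentId) {
        List<String> matching = Stream.concat(a.stream(), b.stream())
                .filter(r -> forStudent(r, studentId))
                .collect(Collectors.toList());
        return new Result(new LinkedHashSet<>(matching), matching.size());
    }
}
```

Both variants return the same rows; only the amount of work in the de-duplication step differs.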
I'm writing a function in PL/pgSQL, and I'm looking for the simplest way to check if a row exists.
Right now I'm SELECTing an integer into a boolean, which doesn't really work. I'm not experienced with PL/pgSQL enough yet to know the best way of doing this.
Here's part of my function:
DECLARE person_exists boolean;
BEGIN
person_exists := FALSE;
SELECT "person_id" INTO person_exists
FROM "people" p
WHERE p.person_id = my_person_id
LIMIT 1;
IF person_exists THEN
-- Do something
END IF;
END; $$ LANGUAGE plpgsql;
Update - I'm doing something like this for now:
DECLARE person_exists integer;
BEGIN
person_exists := 0;
SELECT count("person_id") INTO person_exists
FROM "people" p
WHERE p.person_id = my_person_id
LIMIT 1;
IF person_exists < 1 THEN
-- Do something
END IF;
Simpler, shorter, faster: EXISTS.
IF EXISTS (SELECT FROM people p WHERE p.person_id = my_person_id) THEN
-- do something
END IF;
The query planner can stop at the first row found - as opposed to count(), which scans all (qualifying) rows regardless. Makes a big difference with big tables. The difference is small for a condition on a unique column: only one row qualifies and there is an index to look it up quickly.
Only the existence of at least one qualifying row matters. The SELECT list can be empty (allowed since Postgres 9.4); in fact, that's shortest and cheapest. (Some other RDBMS don't allow an empty SELECT list on principle.)
Improved with @a_horse_with_no_name's comments.
Use count(*)
declare
cnt integer;
begin
SELECT count(*) INTO cnt
FROM people
WHERE person_id = my_person_id;
IF cnt > 0 THEN
-- Do something
END IF;
Edit (for the downvoter who didn't read the statement, and for others who might be doing something similar):
The solution is only effective because there is a WHERE clause on a column, and the name of the column suggests that it's the primary key, so the WHERE clause is highly selective.
Because of that WHERE clause there is no need to use LIMIT or anything else to test for the presence of a row that is identified by its primary key. It is an effective way to test this.
In Oracle, it's possible to return a cursor inside a SQL query, using the cursor keyword, like this:
select owner, table_name,
cursor (select column_name
from all_tab_columns
where owner = allt.owner
and table_name = allt.table_name) as columns
from all_tables allt
The questions are:
Does anyone know where can I find documentation for this?
Does PostgreSQL (or any other open source DBMS) have a similar feature?
It's called a CURSOR expression, and it is documented in the obvious place, the Oracle SQL Language Reference (section "CURSOR Expressions").
As for your second question, the closest thing PostgreSQL offers to match this functionality is "scalar subqueries". However, as @tbrugz points out, these only return one row and one column, so they aren't much like cursor expressions. Read about them in the PostgreSQL documentation on scalar subqueries. MySQL also has scalar subqueries, again limited to one column and one row (see the MySQL documentation). Likewise SQL Server and DB2 (not open source, but included for completeness).
That rules out all the obvious contenders. So, it seems unlikely any other DBMS offers the jagged result set we get from Oracle's cursor expression.
Postgres can emulate cursor expressions by returning refcursors, but the syntax is a bit less handy than Oracle's.
First you need to create a function that converts an array to a refcursor:
create or replace function arr2crs(arr anyarray) returns refcursor as $$
declare crs refcursor;
begin
open crs for select * from unnest(arr);
return crs;
end;
$$ language plpgsql volatile;
Now let's create some test data:
create table dep as
select 1 depid, 'Sales' depname
union all
select 2 depid, 'IT' depname;
create table emp as
select 1 empid, 1 depid, 'John' empname union all
select 2 empid, 1 depid, 'James' empname union all
select 3 empid, 2 depid, 'Rob';
You can query it like this:
select
dep.*,
arr2crs(array(
select row(emp.*)::emp from emp
where emp.depid = dep.depid
)) emps
from dep
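And process it on the client side like this (Java). Keep auto-commit turned off for this, since the refcursors are closed at the end of the transaction that opened them: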
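import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Converts a result set into a list of column-name -> value maps,
// recursing into columns that hold nested result sets (the refcursors).
public static List<Map<String, Object>> Rs2List(ResultSet rs) throws SQLException {
    List<Map<String, Object>> result = new ArrayList<>();
    ResultSetMetaData meta = rs.getMetaData();
    while (rs.next()) {
        Map<String, Object> row = new HashMap<>();
        for (int i = 1; i <= meta.getColumnCount(); i++) {
            Object o = rs.getObject(i);
            row.put(meta.getColumnName(i),
                    (o instanceof ResultSet) ? Rs2List((ResultSet) o) : o);
        }
        result.add(row);
    }
    return result;
}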
Note that you must explicitly cast the row to a particular type; you can use CREATE TYPE to create the necessary types.
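Because Rs2List recurses into nested result sets, the client ends up with a jagged structure: a list of department maps, each holding a list of employee maps under "emps". The Java sketch below builds that same shape in memory from the sample rows, with no database or JDBC involved, mirroring the WHERE emp.depid = dep.depid correlation; row and nest are hypothetical helpers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory sketch, no database or JDBC: builds the jagged shape that Rs2List
// produces, i.e. each department row gets a nested list of employee rows in place
// of its refcursor column. "emps" mirrors the column alias in the query;
// row() and nest() are hypothetical helpers.
public class JaggedResult {

    // Builds one row as an ordered column-name -> value map: row("depid", 1, "depname", "IT").
    public static Map<String, Object> row(Object... namesAndValues) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < namesAndValues.length; i += 2) {
            m.put((String) namesAndValues[i], namesAndValues[i + 1]);
        }
        return m;
    }

    // For each department, attach the employees WHERE emp.depid = dep.depid.
    public static List<Map<String, Object>> nest(List<Map<String, Object>> deps,
                                                 List<Map<String, Object>> emps) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> dep : deps) {
            List<Map<String, Object>> children = new ArrayList<>();
            for (Map<String, Object> emp : emps) {
                if (emp.get("depid").equals(dep.get("depid"))) {
                    children.add(emp);
                }
            }
            Map<String, Object> out = new LinkedHashMap<>(dep);
            out.put("emps", children);
            result.add(out);
        }
        return result;
    }
}
```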