Optimizing SQL query with a subselect list in an IN clause

I'm using oracle 11g and trying to optimize a query.
The basic structure of the query is:
SELECT val1, val2, val3
FROM table_name
WHERE val1 in (subselect statement is here; it selects a list of possible values
               for val1 from another table)
  and val5>=X and val5<=Y
group by val1
order by val2 desc;
My issue is that when I use a subselect, the cost is 3130.
If I fill in the results of the subselect by hand - so, for example
val1 in (1, 2, 3, 4, 5, 6)
where (1, 2, 3, 4, 5, 6) is the result of the subselect, which in this case is all possible values of val1 - the cost of the query is 14, and Oracle uses an INLIST ITERATOR for the group-by part of the query. The results of the two queries are identical.
My question is how to mimic the behaviour of manually listing the possible values of val1 while still using a subselect. The reason I don't hard-code those values in the query is that the possible values change based on one of the other fields, so the subselect pulls the possible values of val1 from a second table based on, say, another column.
I have an index on (val1, val5), so it isn't doing any full table scans - it does a range scan in both cases, but in the subselect case the range scan is much more expensive. The range scan isn't the most expensive part of the subselect query, though; the most expensive part is the GROUP BY, which is done as a HASH.
Edit - Yes, the query isn't syntactically correct - I didn't want to put up anything too specific. The actual query is fine - the selects use valid group by functions.
The subselect returns 6 values, but it can be anywhere from 1-50 or so based on the other value.
Edit2 - What I ended up doing was two separate queries so I could generate the list used in the subselect. I actually tried a similar test in SQLite, and it does the same thing, so this isn't just Oracle.
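Roughly, the two-query approach looks like this (the names are placeholders, and the aggregates stand in for the real group functions):
-- Query 1: fetch the allowed values for val1, driven by the other field.
SELECT allowed_val
FROM   second_table
WHERE  driving_col = :driving_value;

-- Query 2: splice the fetched values in as literals so Oracle can use an
-- INLIST ITERATOR, e.g. if query 1 returned 1..6:
SELECT val1, MAX(val2) AS val2, MAX(val3) AS val3
FROM   table_name
WHERE  val1 IN (1, 2, 3, 4, 5, 6)
AND    val5 >= :x AND val5 <= :y
GROUP  BY val1
ORDER  BY 2 DESC;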

What you are seeing is a result of the IN () list being subject to bind variable peeking and histogram-based estimates. When you have histograms and you write a query like "where a = 'a'", Oracle will use the histogram to estimate how many rows will be returned (the same idea applies to an INLIST operator, which iterates over each item and aggregates the rows). With no histograms it will make a guess of the form rows / distinct values.
In a subquery Oracle doesn't do this (in most cases - there is one unique case where it does).
For example:
SQL> create table test
2 (val1 number, val2 varchar2(20), val3 number);
Table created.
Elapsed: 00:00:00.02
SQL>
SQL> insert into test select 1, 'aaaaaaaaaa', mod(rownum, 5) from dual connect by level <= 100;
100 rows created.
Elapsed: 00:00:00.01
SQL> insert into test select 2, 'aaaaaaaaaa', mod(rownum, 5) from dual connect by level <= 1000;
1000 rows created.
Elapsed: 00:00:00.02
SQL> insert into test select 3, 'aaaaaaaaaa', mod(rownum, 5) from dual connect by level <= 100;
100 rows created.
Elapsed: 00:00:00.00
SQL> insert into test select 4, 'aaaaaaaaaa', mod(rownum, 5) from dual connect by level <= 100000;
100000 rows created.
So I have a table with 101,200 rows. For VAL1, 100 rows are "1", 1,000 are "2", 100 are "3" and 100,000 are "4".
Now, if histograms are gathered (and we do want them in this case):
SQL> exec dbms_stats.gather_table_stats(user , 'test', degree=>4, method_opt=>'for all indexed columns size 4', estimate_percent=>100);
SQL> exec dbms_stats.gather_table_stats(user , 'lookup', degree=>4, method_opt =>'for all indexed columns size 3', estimate_percent=>100);
we see the following:
SQL> explain plan for select * from test where val1 in (1, 2, 3) ;
Explained.
SQL> #explain ""
Plan hash value: 3165434153
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1200 | 19200 | 23 (0)| 00:00:01 |
| 1 | INLIST ITERATOR | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| TEST | 1200 | 19200 | 23 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | TEST1 | 1200 | | 4 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
vs
SQL> explain plan for select * from test where val1 in (select id from lookup where str = 'A') ;
Explained.
SQL> #explain ""
Plan hash value: 441162525
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25300 | 518K| 106 (3)| 00:00:02 |
| 1 | NESTED LOOPS | | 25300 | 518K| 106 (3)| 00:00:02 |
| 2 | TABLE ACCESS BY INDEX ROWID| LOOKUP | 1 | 5 | 1 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | LOOKUP1 | 1 | | 0 (0)| 00:00:01 |
|* 4 | TABLE ACCESS FULL | TEST | 25300 | 395K| 105 (3)| 00:00:02 |
----------------------------------------------------------------------------------------
where the LOOKUP table is:
SQL> select * From lookup;
ID STR
---------- ----------
1 A
2 B
3 C
4 D
(STR has a unique index and histograms on it).
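(The TEST1 index and the LOOKUP table/index creation aren't shown in the transcript; DDL along these lines - an assumption, not the original script - would reproduce what the plans reference:)
-- assumed setup, consistent with the plans and data shown above
create index test1 on test (val1);

create table lookup (id number primary key, str varchar2(10));
insert into lookup
  select rownum, chr(64 + rownum) from dual connect by level <= 4;
create unique index lookup1 on lookup (str);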
Notice the bang-on cardinality of 1200 for the inlist, and a good plan, but a wildly inaccurate estimate for the subquery? Oracle hasn't applied the histograms to the join condition; instead it has said "look, I don't know what ID will be, so I'll guess total rows (100k + 1000 + 100 + 100) / distinct values (4) = 25300" and used that. As a result it has picked a full table scan.
That's all great, but how do we fix it? If you know that this subquery will match a small number of rows (we do), then you have to hint the outer query to try to make it use an index, like:
SQL> explain plan for select /*+ index(t) */ * from test t where val1 in (select id from lookup where str = 'A') ;
Explained.
SQL> #explain
Plan hash value: 702117913
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25300 | 518K| 456 (1)| 00:00:06 |
| 1 | NESTED LOOPS | | 25300 | 518K| 456 (1)| 00:00:06 |
| 2 | TABLE ACCESS BY INDEX ROWID| LOOKUP | 1 | 5 | 1 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | LOOKUP1 | 1 | | 0 (0)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| TEST | 25300 | 395K| 455 (1)| 00:00:06 |
|* 5 | INDEX RANGE SCAN | TEST1 | 25300 | | 61 (2)| 00:00:01 |
----------------------------------------------------------------------------------------
Another thing, in my particular case: as val1 = 4 accounts for most of the table, let's say I have my standard query:
select * from test t where val1 in (select id from lookup where str = :B1);
for the possible :B1 inputs. If I know that the valid values passed in are A, B and C (i.e. not D, which maps to id = 4), I can add this trick:
SQL> explain plan for select * from test t where val1 in (select id from lookup where str = :b1 and id in (1, 2, 3)) ;
Explained.
SQL> #explain ""
Plan hash value: 771376936
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 250 | 5250 | 24 (5)| 00:00:01 |
|* 1 | HASH JOIN | | 250 | 5250 | 24 (5)| 00:00:01 |
|* 2 | VIEW | index$_join$_002 | 1 | 5 | 1 (100)| 00:00:01 |
|* 3 | HASH JOIN | | | | | |
|* 4 | INDEX RANGE SCAN | LOOKUP1 | 1 | 5 | 0 (0)| 00:00:01 |
| 5 | INLIST ITERATOR | | | | | |
|* 6 | INDEX UNIQUE SCAN | SYS_C002917051 | 1 | 5 | 0 (0)| 00:00:01 |
| 7 | INLIST ITERATOR | | | | | |
| 8 | TABLE ACCESS BY INDEX ROWID| TEST | 1200 | 19200 | 23 (0)| 00:00:01 |
|* 9 | INDEX RANGE SCAN | TEST1 | 1200 | | 4 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Now notice Oracle has got a reasonable cardinality (it has pushed the 1, 2, 3 onto the TEST table and got 1200 - not 100% accurate, as I was only filtering on one of them, but I've told Oracle it is certainly not 4!).

I have done some research and I think everything is explained in the Oracle docs.
Just look in "How the CBO Evaluates IN-List Iterators"
and compare it to "How the CBO Evaluates the IN Operator".
Your query with "val1 in (1, 2, 3, 4, 5, 6)" matches the first case, but the query with the subselect is rewritten by Oracle.
So every query with a subselect or join will have a cost similar to yours, unless you find a very tricky way to pass the result of the subquery in as literal parameters (see the sketch below).
You can always try to allocate more memory to sorts.
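If you do want to try that tricky route, a rough sketch (my own, not tested against your schema) is to build the IN list as literals with dynamic SQL in PL/SQL. It assumes 11.2 for LISTAGG and reuses the placeholder names from the question and the other answer:
CREATE OR REPLACE FUNCTION get_filtered_rows(p_str lookup.str%TYPE,
                                             p_lo  NUMBER,
                                             p_hi  NUMBER)
  RETURN SYS_REFCURSOR
IS
  l_list VARCHAR2(4000);
  l_cur  SYS_REFCURSOR;
BEGIN
  -- Turn the subquery result into a comma-separated list of literals.
  SELECT LISTAGG(id, ',') WITHIN GROUP (ORDER BY id)
    INTO l_list
    FROM lookup
   WHERE str = p_str;

  -- With literals in the IN list the optimizer can cost it like a hand-written list.
  -- This is only injection-safe because ID is numeric and comes from our own table.
  OPEN l_cur FOR
    'SELECT val1, MAX(val2) AS val2
       FROM table_name
      WHERE val1 IN (' || l_list || ')
        AND val5 BETWEEN :lo AND :hi
      GROUP BY val1
      ORDER BY 2 DESC'
    USING p_lo, p_hi;

  RETURN l_cur;
END;
/
The trade-off is a hard parse for every distinct list, so it only makes sense when the list is small and changes rarely.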

You might be able to fix the statement by adding indexes on the subselect. However, you would have to post the query and execution plan to understand that. By the way, how long does the subselect itself take?
You can try one of the following two versions:
select val1, val2, val3
from table_name join
(select distinct val from (subselect here)) t
on table_name.val1 = t.val
where val5>=X and val5<=Y
group by val1, val2, val3
order by val2 desc;
or:
select val1, val2, val3
from table_name
where val5>=X and val5<=Y and
exists (select 1 from (subselect here) t where t.val = table_name.val1)
group by val1, val2, val3
order by val2 desc;
These are semantically equivalent, and one of them might optimize better.
One other possibility that might work is to do the filtering after the group by. Something like:
select t.*
from (select val1, val2, val3
from table_name
where val5>=X and val5<=Y
group by val1, val2, val3
) t
where val1 in (subselect here)
order by val2 desc;

Related

ORA-01652: Why does an unused row limiter solve this?

If I run a query without a row limiter I get an ORA-01652 telling me I am out of temp tablespace. (I'm not the DBA and I admittedly don't fully understand this error.) If I add a rownum < 1000000000 it runs in a few seconds (yes, it's limited to a billion rows). My inner query only returns about 1,000 rows. How is an absurdly large row limiter, that is never reached, making this query run? There should be no difference between the limited and unlimited queries, no?
select
col1,
col2,
...
from
(
select
col1, col2,...
from table1 a
join table2 b -- limiter for performance
on a.column= b.column
or a.col= b.col
where
filter = 'Y'
and rownum < 1000000000 -- irrelevant, but the query doesn't run without it
) c
join table3 d
on c.id = d.id
We need to see the execution plan for the queries with and without the rownum condition. But as an example, adding a "rownum" can change an execution plan
SQL> create table t as select * from dba_objects
2 where object_id is not null;
Table created.
SQL>
SQL> create index ix on t ( object_id );
Index created.
SQL>
SQL> set autotrace traceonly explain
SQL> select * from t where object_id > 0 ;
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 82262 | 10M| 445 (2)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 82262 | 10M| 445 (2)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("OBJECT_ID">0)
SQL> select * from t where object_id > 0 and rownum < 10;
Execution Plan
----------------------------------------------------------
Plan hash value: 658510075
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9 | 1188 | 3 (0)| 00:00:01 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 9 | 1188 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | IX | | | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
This is a simplistic example, but you can get similar effects with joins and the like. In particular, the "rownum" clause might be preventing the innermost join from being folded into the outermost one, and thus yielding a different plan.
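If hints are an option in that environment, a NO_MERGE hint on the inline view is another way to keep the query blocks separate without the dummy ROWNUM predicate - a sketch only, mirroring the (elided) query from the question:
select
  col1,
  col2,
  ...
from
(
  select /*+ no_merge */ -- keep this inline view as its own query block
         col1, col2,...
  from table1 a
  join table2 b
    on a.column= b.column
    or a.col= b.col
  where filter = 'Y'
) c
join table3 d
  on c.id = d.id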

Trying to remove TABLE ACCESS FULL in Oracle

I have a simple query:
select round(sum(a.wt)) as a_wt
from db.abc a
where a.date is null
and a.col_no is not null
and a.pod_cd = '367'
and a.fant != a.rce
and I want to remove the TABLE ACCESS FULL. There are 3 indexes, on the following combinations of columns:
col_no
col_no,date,fant,pyc
wagno,batno
What can be done to remove the TABLE ACCESS FULL?
One option would be creating a function-based index:
create index idx_date_col_pod on ABC (nvl("date",date'1900-01-01'), nvl(col_no,0), pod_cd);
and converting the query to:
select round(sum(wt)) as a_wt
from abc
where nvl("date",date'1900-01-01') = date'1900-01-01' -- matching means "date" column is null assuming there exists no records with this ancient date.
and nvl(col_no,0) != 0 -- non-matching means "col_no" column is not null
and pod_cd = '367'
and fant != rce
Usually indexes do not index NULL values, so conditions like
where a.date is null
and a.col_no is not null
effectively mean "don't use an index to get the rows for these conditions".
However, there is an option in the CREATE INDEX statement allowing it to index NULL columns (starting from version 11, as far as I know):
create index abc_date_nulls on abc(date, 1); -- (xxx,1) is doing the trick
Thus you'll create an index that covers NULL values. This might be useful depending on the selectivity of the "date is null" condition.
Otherwise, or in addition, I'd suggest checking the selectivity of the condition "pod_cd = '367'" and building an index on the pod_cd column; a quick check is sketched below.
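For example, a quick selectivity check could look like this (the index name is made up, and whether you compare against 367 or '367' depends on the real datatype of pod_cd):
-- how many rows match, out of how many in total?
select count(*)                                   as total_rows,
       count(case when pod_cd = '367' then 1 end) as matching_rows
from   abc;

-- if matching_rows is a small fraction of total_rows, a plain index on the
-- column is likely to pay off
create index abc_pod_cd_idx on abc (pod_cd);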
If you are sure the index will help and the database doesn't use it, you can force Oracle to use it with a hint:
select /*+ index(table_alias index_name) */ ... from ...
This is good for tests or for checking the impact an index can provide, but please, please, please be careful using hints in production. Read the documentation and the discussions of the drawbacks of that approach. Don't tell anyone I told you to use hints in production.
Indexes can be used to check for nulls, and to compare two columns against each other.
Setup:
create table abc
( dt date
, col_no number
, pod_cd varchar2(5)
, fant number
, rce number
, wt number )
nologging;
insert /*+ append */ into abc (dt, col_no, pod_cd, fant, rce, wt)
select case mod(rownum,3) when 0 then date '2018-12-31' + mod(rownum,1000) end
, case mod(rownum,7) when 0 then rownum end
, case mod(rownum,2) when 1 then mod(rownum,1000) end
, round(dbms_random.value) + 10
, round(dbms_random.value) + 10
, 1
from xmltable('1 to 10000000');
create index x1 on abc (pod_cd, dt);
create index x2 on abc (fant, rce);
Test for nulls:
select count(*) from abc a
where a.pod_cd = '367'
and a.dt is null;
COUNT(*)
----------
6667
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2253536563
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 | 20 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 7 | | |
|* 2 | INDEX RANGE SCAN| X1 | 6667 | 46669 | 20 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("A"."POD_CD"='367' AND "A"."DT" IS NULL)
The query was executed using index X1, without touching the table.
Test for fant != rce:
select count(*)
from abc a
where a.fant != a.rce;
COUNT(*)
----------
5000666
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 29151601
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 6468 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
|* 2 | INDEX FAST FULL SCAN| X2 | 5000K| 28M| 6468 (1)| 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("A"."FANT"<>"A"."RCE")
The query was executed using index X2, also without touching the table.
Test for the full query:
create index x3 on abc(pod_cd, dt, fant, rce, col_no, wt);
select round(sum(a.wt)) as a_wt
from abc a
where a.dt is null
and a.col_no is not null
and a.pod_cd = '367'
and a.fant != a.rce;
A_WT
----------
481
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3828004431
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 16 | 28 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 16 | | |
|* 2 | INDEX RANGE SCAN| X3 | 476 | 7616 | 28 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("A"."POD_CD"='367' AND "A"."DT" IS NULL)
filter("A"."COL_NO" IS NOT NULL AND "A"."FANT"<>"A"."RCE")
Full table scans aren't always terrible, though.
SQL> drop index x1;
Index dropped.
SQL> drop index x2;
Index dropped.
SQL> drop index x3;
Index dropped.
select round(sum(a.wt)) as a_wt
from abc a
where a.dt is null
and a.col_no is not null
and a.pod_cd = '367'
and a.fant != a.rce;
A_WT
----------
481
1 row selected.
Elapsed: 00:00:00.18
Execution Plan
----------------------------------------------------------
Plan hash value: 1045519631
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 16 | 8188 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 16 | | |
|* 2 | TABLE ACCESS FULL| ABC | 476 | 7616 | 8188 (1)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("A"."COL_NO" IS NOT NULL AND "A"."POD_CD"='367' AND
"A"."DT" IS NULL AND "A"."FANT"<>"A"."RCE")
Table ABC has 10 million rows, and the full scan took 0.18 seconds. This is in a VM on a 4 year old laptop.

Distinct clause on large join

Following this question, suppose I've now set up the indexes, and I want to return only a certain field, without duplicates:
Select distinct A.cod
from A join B
on A.id1=B.id1 and
A.id2=B.id2
where A.year=2016
and B.year=2016
The problem now is that I'm getting something like 150k cod values, with only 1000 distinct values, so my query is very inefficient.
Question: how can I improve that? I.e., how can I tell the DB, for every row on A, to stop joining that row as soon as a match is found?
Thank you in advance!
I'm basing my answer on your question:
how can I tell the DB, for every row on A, to stop joining that row as soon as a match is found?
With the EXISTS clause, once it sees a match it stops and moves on to the next record to be checked.
Adding the DISTINCT will filter out any duplicate CODs (in case there are any).
select DISTINCT cod
from A ax
where year = 2016
and exists ( select 1
from B bx
WHERE Ax.ID1 = Bx.ID1
AND Ax.ID2 = Bx.ID2
AND Ax.YEAR = Bx.YEAR);
EDIT: I was curious which solution (IN or EXISTS) would give me a better explain plan.
Create the 1st Table Definition
Create table A
(
ID1 number,
ID2 number,
cod varchar2(100),
year number
);
Insert 4,000,000 sequential rows:
BEGIN
FOR i IN 1..4000000 loop
insert into A (id1, id2, cod, year)
values (i, i , i, i);
end loop;
END;
/
commit;
Create Table B and insert the same data into it
Create table B
as
select *
from A;
Reinsert Data from Table A to make duplicates
insert into B
select *
from A;
Build the indexes mentioned in the previous post (Index on join and where):
CREATE INDEX A_IDX ON A(year, id1, id2);
CREATE INDEX B_IDX ON B(year, id1, id2);
Update a bunch of rows to make it fetch multiple rows with the year 2016:
update B
set year = 2016
where rownum < 20000;
update A
set year = 2016
where rownum < 20000;
commit;
Check Explain plan using EXISTS
Plan hash value: 1052726981
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 44 | 7 (15)| 00:00:01 |
| 1 | HASH UNIQUE | | 1 | 44 | 7 (15)| 00:00:01 |
| 2 | NESTED LOOPS SEMI | | 1 | 44 | 6 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| A | 1 | 26 | 4 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | A_IDX | 1 | | 3 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | B_IDX | 2 | 36 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("YEAR"=2016)
5 - access("BX"."YEAR"=2016 AND "AX"."ID1"="BX"."ID1" AND "AX"."ID2"="BX"."ID2")
filter("AX"."YEAR"="BX"."YEAR")
Check Explain plan using IN
Plan hash value: 3002464630
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 44 | 7 (15)| 00:00:01 |
| 1 | HASH UNIQUE | | 1 | 44 | 7 (15)| 00:00:01 |
| 2 | NESTED LOOPS | | 1 | 44 | 6 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| A | 1 | 26 | 4 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | A_IDX | 1 | | 3 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | B_IDX | 1 | 18 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("YEAR"=2016)
5 - access("YEAR"=2016 AND "ID1"="ID1" AND "ID2"="ID2")
Although my test case is limited, I'm guessing that the IN and EXISTS clauses result in nearly the same execution.
On the face of it, what you are actually trying to do should be done like this:
select distinct cod
from A
where year = 2016
and (id1, id2) in (select id1, id2 from B where year = 2016)
The subquery in the WHERE condition is a non-correlated query, so it will be evaluated only once. And the IN condition is evaluated using short-circuiting; instead of a complete join, it will search through the results of the subquery only until a match is found.
EDIT: As Migs Isip points out, there may be duplicate codes in the original table, so a "distinct" may still be needed. I edited my code to add it back after Migs posted his answer.
Not sure about your existing indexes, but you can improve your query a bit by adding another JOIN condition, like:
Select distinct A.cod
from A join B
on A.id1=B.id1 and
A.id2=B.id2 and
A.year = B.year -- this one
where A.year=2016;

Unwanted queries merge in Oracle 10g

I am working on Oracle Database 10g Release 10.2.0.5.0. I have a view like:
CREATE OR REPLACE VIEW some_view
(
A,
B
)
AS
SELECT A, B
FROM table_a
WHERE condition_a
UNION ALL
SELECT A, B
FROM table_b
WHERE condition_b;
and a database function some_db_package.foo(). My problem is that when I execute the query:
SELECT A, some_db_package.foo(B) val
FROM some_view
WHERE some_db_package.foo(B) = 0;
Oracle merges the conditions from the query and from some_view, so I am getting something like:
SELECT A, some_db_package.foo(B) val
FROM table_a
WHERE some_db_package.foo(B) = 0 AND condition_a
UNION ALL
SELECT A, some_db_package.foo(B) val
FROM table_b
WHERE some_db_package.foo(B) = 0 AND condition_b;
some_db_package.foo() executes on all rows from table_a and table_b, and I would like to execute some_db_package.foo() only on the rows already filtered by condition_a and condition_b. Is there any way to do that (e.g. by changing the SQL query or the some_view definition), given that I cannot use optimizer hints in the query?
Problem solved. Just to summarize:
some_db_package.foo() - for a given event and date range it counts the event's errors which occurred between the dates (foo() accesses tables), so it is deterministic only when sysdate > dateTo.
select * from ( SELECT A, some_db_package.foo(B) val FROM some_view ) does not make a difference.
Actually I do not need UNION ALL, and I did test with UNION, but still the same result.
with some_view_set as (select A, B from some_view) select * from ( select A, some_db_package.foo(B) val from some_view_set ) where val = 0 does not make a difference.
I did test with optimizer hints and unfortunately Oracle ignored them.
Using ROWNUM >= 1 in some_view was the solution for my problem.
Thank you for help, I really appreciate it.
ROWNUM is usually the best way to stop optimizer transformations. Hints are difficult to get right - the syntax is weird and buggy, and there are many potential transformations that need to be stopped. There are other ways to re-write the query, but ROWNUM is generally the best because it is documented to work this way: since ROWNUM has to be evaluated last in order to be usable in Top-N queries, you can always rely on it to prevent query blocks from being merged.
Sample schema
drop table table_a;
drop table table_b;
create table table_a(a number, b number);
create table table_b(a number, b number);
insert into table_a select level, level from dual connect by level <= 10;
insert into table_b select level, level from dual connect by level <= 10;
begin
dbms_stats.gather_table_stats(user, 'table_a');
dbms_stats.gather_table_stats(user, 'table_b');
end;
/
--FOO takes 1 second each time it is executed.
create or replace function foo(p_value number) return number is
begin
dbms_lock.sleep(1);
return 0;
end;
/
--BAR is fast, but the optimizer doesn't know it.
create or replace function bar(p_value number) return number is
begin
return p_value;
end;
/
--This view returns 2 rows.
CREATE OR REPLACE VIEW some_view AS
SELECT A, B
FROM table_a
WHERE a = bar(1)
UNION ALL
SELECT A, B
FROM table_b
WHERE a = bar(2);
Slow query
This query takes 20 seconds to run, implying the function is evaluated 20 times.
SELECT A, foo(B) val
FROM some_view
WHERE foo(B) = 0;
The explain plan shows the conditions are merged, and it appears that the conditions are evaluated from left to right (but don't rely on this always being true!).
explain plan for
SELECT A, foo(B) val
FROM some_view
WHERE foo(B) = 0;
select * from table(dbms_xplan.display);
Plan hash value: 4139878329
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 5 (0)| 00:00:01 |
| 1 | VIEW | SOME_VIEW | 1 | 6 | 5 (0)| 00:00:01 |
| 2 | UNION-ALL | | | | | |
|* 3 | TABLE ACCESS FULL| TABLE_A | 1 | 6 | 3 (0)| 00:00:01 |
|* 4 | TABLE ACCESS FULL| TABLE_B | 1 | 6 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("FOO"("B")=0 AND "A"="BAR"(1))
4 - filter("FOO"("B")=0 AND "A"="BAR"(2))
Note
-----
- automatic DOP: skipped because of IO calibrate statistics are missing
Fast query
Add a seemingly redundant ROWNUM predicate that does nothing except prevent transformations.
CREATE OR REPLACE VIEW some_view2 AS
SELECT A, B
FROM table_a
WHERE a = bar(1)
AND ROWNUM >= 1 --Prevent optimizer transformations, for performance.
UNION ALL
SELECT A, B
FROM table_b
WHERE a = bar(2)
AND ROWNUM >= 1 --Prevent optimizer transformations, for performance.
;
Now the query only takes 4 seconds, the function is only run 4 times.
SELECT A, foo(B) val
FROM some_view2
WHERE foo(B) = 0;
In the new explain plan it's clear that the FOO function is evaluated last, after most of the filtering is complete.
explain plan for
SELECT A, foo(B) val
FROM some_view2
WHERE foo(B) = 0;
select * from table(dbms_xplan.display);
Plan hash value: 4228269064
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 52 | 6 (0)| 00:00:01 |
|* 1 | VIEW | SOME_VIEW2 | 2 | 52 | 6 (0)| 00:00:01 |
| 2 | UNION-ALL | | | | | |
| 3 | COUNT | | | | | |
|* 4 | FILTER | | | | | |
|* 5 | TABLE ACCESS FULL| TABLE_A | 1 | 6 | 3 (0)| 00:00:01 |
| 6 | COUNT | | | | | |
|* 7 | FILTER | | | | | |
|* 8 | TABLE ACCESS FULL| TABLE_B | 1 | 6 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("FOO"("B")=0)
4 - filter(ROWNUM>=1)
5 - filter("A"="BAR"(1))
7 - filter(ROWNUM>=1)
8 - filter("A"="BAR"(2))
Note
-----
- automatic DOP: skipped because of IO calibrate statistics are missing
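If the view itself cannot be modified, the same documented ROWNUM behaviour should work when applied at the query level instead - a sketch, worth confirming against the actual plan:
SELECT A, foo(B) val
FROM  (SELECT A, B
       FROM   some_view
       WHERE  ROWNUM >= 1)  -- stops foo(B) = 0 being pushed into the view
WHERE foo(B) = 0;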
Ben's idea to make the function DETERMINISTIC may also help reduce the function calls.
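For reference, a sketch of that change applied to the test function above; it is only appropriate when the function really does return the same output for the same input (in the original poster's case, only once sysdate is past dateTo):
--FOO declared DETERMINISTIC, so Oracle may cache and reuse results for repeated inputs.
create or replace function foo(p_value number) return number deterministic is
begin
  dbms_lock.sleep(1);
  return 0;
end;
/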

Why isn't index used for this query?

I had a query where an index was not used when I thought it could be, so I reproduced it out of curiosity:
Create a test_table with 1,000,000 rows (10 distinct values in col, 500 bytes of data in some_data).
CREATE TABLE test_table AS (
SELECT MOD(ROWNUM,10) col, LPAD('x', 500, 'x') some_data
FROM dual
CONNECT BY ROWNUM <= 1000000
);
Create an index and gather table stats:
CREATE INDEX test_index ON test_table ( col );
EXEC dbms_stats.gather_table_stats( 'MY_SCHEMA', 'TEST_TABLE' );
Try to get distinct values of col and the COUNT:
EXPLAIN PLAN FOR
SELECT col, COUNT(*)
FROM test_table
GROUP BY col;
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 30 | 15816 (1)| 00:03:10
| 1 | HASH GROUP BY | | 10 | 30 | 15816 (1)| 00:03:10
| 2 | TABLE ACCESS FULL| TEST_TABLE | 994K| 2914K| 15755 (1)| 00:03:10
---------------------------------------------------------------------------------
The index is not used; providing a hint does not change this.
I guess the index can't be used in this case, but why?
UPDATE:
Try making the col column NOT NULL. That is the reason it's not using the index. When it's not null, here's the plan.
SELECT STATEMENT, GOAL = ALL_ROWS 69 10 30
HASH GROUP BY 69 10 30
INDEX FAST FULL SCAN SANDBOX TEST_INDEX 56 98072 294216
If the optimizer determines that it's more efficient NOT to use the index (maybe due to rewriting the query), then it won't. Optimizer hints are just that, namely hints to tell Oracle about an index you'd like it to use. You can think of them as suggestions. But if the optimizer determines that it's better not to use the index (again, as a result of query rewrite, for example), then it's not going to.
Refer to this link: http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/hintsref.htm
"Specifying one of these hints causes the optimizer to choose the specified access path only if the access path is available based on the existence of an index or cluster and on the syntactic constructs of the SQL statement. If a hint specifies an unavailable access path, then the optimizer ignores it."
Since you are running a count(*) operation, the optimizer has determined that it's more efficient to just scan the whole table and hash instead of using your index.
Here's another handy link on hints:
http://www.dba-oracle.com/t_hint_ignored.htm
You forgot a really important piece of information: COL is NOT NULL.
If the column is nullable, the index cannot be used, because there might be unindexed rows.
SQL> ALTER TABLE test_table MODIFY (col NOT NULL);
Table altered
SQL> EXPLAIN PLAN FOR
2 SELECT col, COUNT(*) FROM test_table GROUP BY col;
Explained
SQL> SELECT * FROM table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1077170955
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 30 | 1954 (1)| 00:00:2
| 1 | SORT GROUP BY NOSORT| | 10 | 30 | 1954 (1)| 00:00:2
| 2 | INDEX FULL SCAN | TEST_INDEX | 976K| 2861K| 1954 (1)| 00:00:2
--------------------------------------------------------------------------------
I ran Peter's original stuff and reproduced his results. I then applied dcp's suggestion...
SQL> alter table test_table modify col not null;
Table altered.
SQL> EXEC dbms_stats.gather_table_stats( user, 'TEST_TABLE' , cascade=>true)
PL/SQL procedure successfully completed.
SQL> EXPLAIN PLAN FOR
2 SELECT col, COUNT(*)
3 FROM test_table
4 GROUP BY col;
Explained.
SQL> select * from table(dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
Plan hash value: 2099921975
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 30 | 574 (9)| 00:00:07 |
| 1 | HASH GROUP BY | | 10 | 30 | 574 (9)| 00:00:07 |
| 2 | INDEX FAST FULL SCAN| TEST_INDEX | 1000K| 2929K| 532 (2)| 00:00:07 |
------------------------------------------------------------------------------------
9 rows selected.
SQL>
The reason this matters is that NULL values are not included in a normal B-tree index, but the GROUP BY has to include NULL as a grouping "value" in your query. By telling the optimizer that there are no NULLs in COL, it is free to use the much more efficient index (I was getting an elapsed time of almost 3.55 seconds with the full table scan). This is a classic example of how metadata can influence the optimizer.
Incidentally, this is obviously a 10g or 11g database, because it uses the HASH GROUP BY algorithm, instead of the older SORT (GROUP BY) algorithm.
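If COL genuinely has to stay nullable, the constant-column trick shown in one of the answers above might give a similar benefit, because an entry is only excluded from a B-tree index when every indexed column is NULL. A sketch, not benchmarked here:
-- replace the single-column index with one that also carries a constant,
-- so rows where COL is NULL are still present in the index
DROP INDEX test_index;
CREATE INDEX test_index ON test_table ( col, 1 );
EXEC dbms_stats.gather_table_stats( user, 'TEST_TABLE', cascade=>true );

-- the GROUP BY may now be answered from the index alone, NULL group included
-- (check the plan to confirm)
SELECT col, COUNT(*)
FROM test_table
GROUP BY col;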
A bitmap index will do as well:
Execution Plan
----------------------------------------------------------
Plan hash value: 2200191467
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 30 | 15983 (2)| 00:03:12 |
| 1 | HASH GROUP BY | | 10 | 30 | 15983 (2)| 00:03:12 |
| 2 | TABLE ACCESS FULL| TEST_TABLE | 1013K| 2968K| 15825 (1)| 00:03:10 |
---------------------------------------------------------------------------------
SQL> create bitmap index test_index on test_table(col);
Index created.
SQL> EXEC dbms_stats.gather_table_stats( 'MY_SCHEMA', 'TEST_TABLE' );
PL/SQL procedure successfully completed.
SQL> SELECT col, COUNT(*)
2 FROM test_table
3 GROUP BY col
4 /
Execution Plan
----------------------------------------------------------
Plan hash value: 238193838
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 30 | 286 (0)| 00:00:04 |
| 1 | SORT GROUP BY NOSORT | | 10 | 30 | 286 (0)| 00:00:04 |
| 2 | BITMAP CONVERSION COUNT| | 1010K| 2961K| 286 (0)| 00:00:04 |
| 3 | BITMAP INDEX FULL SCAN| TEST_INDEX | | | | |
---------------------------------------------------------------------------------------