I would like to know whether INTERSECT or EXISTS have better performance in Oracle 11g. Consider I have the below two tables.
Student_Master
STUDENT_ID NAME
---------- ------
STUD01 ALEX
STUD02 JAMES
STUD03 HANS
Student_Status
STUDENT_ID STATUS
---------- ------
STUD01 Fail
STUD02 Pass
STUD03 Pass
Which of the below query will perform better considering that the table Student_Status will have more number of records compared to the table Student_Master.
SELECT STUDENT_ID FROM Student_Master
INTERSECT
SELECT STUDENT_ID FROM Student_Status
SELECT STUDENT_ID FROM Student_Master M
WHERE EXISTS
(SELECT STUDENT_ID FROM Student_Status S WHERE M.STUDENT_ID=S.STUDENT_ID)
A quick test would suggest the EXISTS option...
SELECT STUDENT_ID FROM Student_Master INTERSECT SELECT STUDENT_ID FROM
Student_Status
Plan hash value: 416197223
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 6 (100)| |
| 1 | INTERSECTION | | | | | |
| 2 | SORT UNIQUE | | 3 | 36 | 3 (34)| 00:00:01 |
| 3 | TABLE ACCESS FULL| STUDENT_MASTER | 3 | 36 | 2 (0)| 00:00:01 |
| 4 | SORT UNIQUE | | 3 | 36 | 3 (34)| 00:00:01 |
| 5 | TABLE ACCESS FULL| STUDENT_STATUS | 3 | 36 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
-
SELECT STUDENT_ID FROM Student_Master M WHERE EXISTS (SELECT STUDENT_ID
FROM Student_Status S WHERE M.STUDENT_ID=S.STUDENT_ID)
Plan hash value: 361045672
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 4 (100)| |
|* 1 | HASH JOIN SEMI | | 3 | 72 | 4 (0)| 00:00:01 |
| 2 | TABLE ACCESS FULL| STUDENT_MASTER | 3 | 36 | 2 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| STUDENT_STATUS | 3 | 36 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
The second form with EXIST operator does always full scan on Student_Master table, so I think, you'll get better performance using intersect. But you'll need use index on master table.
More info on EXIST can be found here: Ask TOM - IN & EXISTS
Related
I have query something like
WITH
str_table as (
SELECT stringtext, stringnumberid
FROM STRING_TABLE
WHERE LANGID IN (23,62)
),
data as (
select *
from employee emp
left outer join str_table st on emp.nameid = st.stringnumberid
)
select * from data
I know With clause will work in this manner
Step 1 : The SQL Query within the with clause is executed at first step.
Step 2 : The output of the SQL query is stored into temporary relation of with clause.
Step 3 : The Main query is executed with temporary relation produced at the last stage.
Now I want to ask whether the indexes created on the actual STRING_TABLE are going to help in temporary str_table produce by the With clause? I want to ask whether the indexes also have impact on str_table or not?
Oracle will not process CTE one by one. It will analyze the SQL as a whole. Your SQL is most likely the same as following in the eye of Oracle optimizer
select emp.*
from employee emp left outer join STRING_TABLE st
on emp.nameid = st.stringnumberid
where st.LANGID IN (23,62);
Oracle can use index on STRING_TABLE. Whether it will depends on the table statistics. For example, if the table has few rows (say a few hundred), Oracle will likely not use index.
It depends.
First of all, with clause is not a temporary table. As documentation says:
Oracle Database optimizes the query by treating the query name as either an inline view or as a temporary table.
Optimizer decides to materialize with subquery if either you forse it to do so by using /*+materialize*/ hint inside the subquery or you reuse this with subquery more than once.
In the example below Oracle uses with clause as inline view and merges it within the main query:
explain plan for
with a as (
select
s.textid,
s.textvalue,
a.id,
a.other_column
from string_table s
join another_tab a
on s.textid = a.textid
where langid in (1)
)
select *
from big_table b
join a a_name
on b.name_textid = a_name.textid
and b.job_textid = a_name.id
| PLAN_TABLE_OUTPUT |
| :----------------------------------------------------------------------------------- |
| Plan hash value: 1854049435 |
| |
| ------------------------------------------------------------------------------------ |
| | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | |
| ------------------------------------------------------------------------------------ |
| | 0 | SELECT STATEMENT | | 1 | 1147 | 74 (0)| 00:00:01 | |
| |* 1 | HASH JOIN | | 1 | 1147 | 74 (0)| 00:00:01 | |
| | 2 | TABLE ACCESS FULL | ANOTHER_TAB | 39 | 3042 | 3 (0)| 00:00:01 | |
| |* 3 | HASH JOIN | | 31 | 33139 | 71 (0)| 00:00:01 | |
| | 4 | TABLE ACCESS FULL| BIG_TABLE | 19 | 10279 | 3 (0)| 00:00:01 | |
| |* 5 | TABLE ACCESS FULL| STRING_TABLE | 1143 | 589K| 68 (0)| 00:00:01 | |
| ------------------------------------------------------------------------------------ |
But depending on the statistics and hints it may evaluate subquery first and then add it to the main query:
explain plan for
with a as (
select
s.textid,
s.textvalue,
a.id,
a.other_column
from string_table s
join another_tab a
on s.textid = a.textid
where langid in (1)
)
select /*+NO_MERGE(a_name)*/ *
from big_table b
join a a_name
on b.name_textid = a_name.textid
and b.job_textid = a_name.id
| PLAN_TABLE_OUTPUT |
| :------------------------------------------------------------------------------------ |
| Plan hash value: 4105667421 |
| |
| ------------------------------------------------------------------------------------- |
| | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | |
| ------------------------------------------------------------------------------------- |
| | 0 | SELECT STATEMENT | | 101 | 110K| 74 (0)| 00:00:01 | |
| |* 1 | HASH JOIN | | 101 | 110K| 74 (0)| 00:00:01 | |
| | 2 | TABLE ACCESS FULL | BIG_TABLE | 19 | 10279 | 3 (0)| 00:00:01 | |
| | 3 | VIEW | | 64 | 37120 | 71 (0)| 00:00:01 | |
| |* 4 | HASH JOIN | | 64 | 38784 | 71 (0)| 00:00:01 | |
| | 5 | TABLE ACCESS FULL| ANOTHER_TAB | 39 | 3042 | 3 (0)| 00:00:01 | |
| |* 6 | TABLE ACCESS FULL| STRING_TABLE | 1143 | 589K| 68 (0)| 00:00:01 | |
| ------------------------------------------------------------------------------------- |
When you use with subquery twice, optimizer decides to materialize it:
explain plan for
with a as (
select
s.textid,
s.textvalue
from string_table s
where langid in (1)
)
select *
from big_table b
join a a_name
on b.name_textid = a_name.textid
join a a_job
on b.job_textid = a_job.textid
| PLAN_TABLE_OUTPUT |
| :--------------------------------------------------------------------------------------------------------------------- |
| Plan hash value: 1371454296 |
| |
| ---------------------------------------------------------------------------------------------------------------------- |
| | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | |
| ---------------------------------------------------------------------------------------------------------------------- |
| | 0 | SELECT STATEMENT | | 63 | 98973 | 67 (0)| 00:00:01 | |
| | 1 | TEMP TABLE TRANSFORMATION | | | | | | |
| | 2 | LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D7224_469C01 | | | | | |
| | 3 | TABLE ACCESS BY INDEX ROWID BATCHED | STRING_TABLE | 999 | 515K| 22 (0)| 00:00:01 | |
| |* 4 | INDEX RANGE SCAN | IX | 999 | | 4 (0)| 00:00:01 | |
| |* 5 | HASH JOIN | | 63 | 98973 | 45 (0)| 00:00:01 | |
| |* 6 | HASH JOIN | | 35 | 36960 | 24 (0)| 00:00:01 | |
| | 7 | TABLE ACCESS FULL | BIG_TABLE | 19 | 10279 | 3 (0)| 00:00:01 | |
| | 8 | VIEW | | 999 | 502K| 21 (0)| 00:00:01 | |
| | 9 | TABLE ACCESS FULL | SYS_TEMP_0FD9D7224_469C01 | 999 | 502K| 21 (0)| 00:00:01 | |
| | 10 | VIEW | | 999 | 502K| 21 (0)| 00:00:01 | |
| | 11 | TABLE ACCESS FULL | SYS_TEMP_0FD9D7224_469C01 | 999 | 502K| 21 (0)| 00:00:01 | |
| ---------------------------------------------------------------------------------------------------------------------- |
So when there are some indexes on tables inside with subquery they may be used in all above cases: before materialization, when subquery is not merged and when subquery is merged and some idexes provide better query plan on merged subquery (even when those indexes are not used when you execute subquery alone).
What about idexes: if they provide high selectivity (i.e. number of rows retrieved by index is small compared to the overall number of rows), then Oracle will consider to use it. Note, that index access has two steps: read index blocks and then read table blocks that contain rowids found by index. If table size is not much bigger than index size, then Oracle may use table scan instead of index scan even for quite selective predicate (because of doubled IO).
In the below example I've used "small" texts (100 chars) and big_table table of 20 rows and this index for text table:
create index ix
on string_table(langid, textid)
Optimizer decides to use index range scan and read only blocks of the first level (first column of the index):
explain plan for
with a as (
select
s.textid,
s.textvalue
from string_table s
where langid in (1)
)
select *
from big_table b
join a a_name
on b.name_textid = a_name.textid
| PLAN_TABLE_OUTPUT |
| :---------------------------------------------------------------------------------------------------- |
| Plan hash value: 1660330381 |
| |
| ----------------------------------------------------------------------------------------------------- |
| | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | |
| ----------------------------------------------------------------------------------------------------- |
| | 0 | SELECT STATEMENT | | 29 | 31001 | 26 (0)| 00:00:01 | |
| |* 1 | HASH JOIN | | 29 | 31001 | 26 (0)| 00:00:01 | |
| | 2 | TABLE ACCESS FULL | BIG_TABLE | 19 | 10279 | 3 (0)| 00:00:01 | |
| | 3 | TABLE ACCESS BY INDEX ROWID BATCHED| STRING_TABLE | 999 | 515K| 23 (0)| 00:00:01 | |
| |* 4 | INDEX RANGE SCAN | IX | 999 | | 4 (0)| 00:00:01 | |
| ----------------------------------------------------------------------------------------------------- |
| |
| Predicate Information (identified by operation id): |
| --------------------------------------------------- |
| |
| 1 - access("B"."NAME_TEXTID"="S"."TEXTID") |
| 4 - access("LANGID"=1) | |
But when we reduce the number of rows in big_table, it uses both the columns for index scan:
delete from big_table
where id > 4
explain plan for
with a as (
select
s.textid,
s.textvalue
from string_table s
where langid in (1)
)
select *
from big_table b
join a a_name
on b.name_textid = a_name.textid
| PLAN_TABLE_OUTPUT |
| :-------------------------------------------------------------------------------------------- |
| Plan hash value: 1766926914 |
| |
| --------------------------------------------------------------------------------------------- |
| | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | |
| --------------------------------------------------------------------------------------------- |
| | 0 | SELECT STATEMENT | | 6 | 18216 | 11 (0)| 00:00:01 | |
| | 1 | NESTED LOOPS | | 6 | 18216 | 11 (0)| 00:00:01 | |
| | 2 | NESTED LOOPS | | 6 | 18216 | 11 (0)| 00:00:01 | |
| | 3 | TABLE ACCESS FULL | BIG_TABLE | 4 | 4032 | 3 (0)| 00:00:01 | |
| |* 4 | INDEX RANGE SCAN | IX | 1 | | 1 (0)| 00:00:01 | |
| | 5 | TABLE ACCESS BY INDEX ROWID| STRING_TABLE | 2 | 4056 | 2 (0)| 00:00:01 | |
| --------------------------------------------------------------------------------------------- |
| |
| Predicate Information (identified by operation id): |
| --------------------------------------------------- |
| |
| 4 - access("LANGID"=1 AND "B"."NAME_TEXTID"="S"."TEXTID") |
| |
You may check above code snippets in the db<>fiddle.
i am doubting about this case, but not clear why.
consider the following sql :
create table t1(tid int not null, t1 int not null);
create table t2(t2 int not null, tname varchar(30) null);
create unique index i_t2 on t2(t2);
create or replace view v_1 as
select t1.tid,t1.t1,max(t2.tname) as tname
from t1 left join t2
on t1.t1 = t2.t2
group by t1.tid,t1.t1;
then check the execution plan for select count(1) from v_1, the t2 is eliminated by the optimizer:
SQL> select count(1) from v_1;
Execution Plan
----------------------------------------------------------
Plan hash value: 3243658773
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 3 (34)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | | |
| 2 | VIEW | VM_NWVW_0 | 1 | | 3 (34)| 00:00:01 |
| 3 | HASH GROUP BY | | 1 | 26 | 3 (34)| 00:00:01 |
| 4 | TABLE ACCESS FULL| T1 | 1 | 26 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------
but if the index i_t2 is dropped or recreated without unique attribute,
the table t2 is not eliminated in execution plan:
SQL> drop index i_t2;
Index dropped.
SQL> select count(1) from v_1;
Execution Plan
----------------------------------------------------------
Plan hash value: 2710188186
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 5 (20)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | | |
| 2 | VIEW | VM_NWVW_0 | 1 | | 5 (20)| 00:00:01 |
| 3 | HASH GROUP BY | | 1 | 39 | 5 (20)| 00:00:01 |
|* 4 | HASH JOIN OUTER | | 1 | 39 | 4 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL| T1 | 1 | 26 | 2 (0)| 00:00:01 |
| 6 | TABLE ACCESS FULL| T2 | 1 | 13 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
it seems even if the index is removed,
the result of select count(1) from v_1 also equal to
select count(1) from (select tid,t1 from t1 group by tid,t1)
why the optimizer does not eliminate t2 in the second case?
is there any principle or actual data example discribing this?
thanks :)
This is an optimization called join elimination. Because t2.t2 us unique, the optimizer knows that every row retrieved from t1 can only ever retrieve one row from t2. Since there is nothing projected from t2, there is no need to perform the join.
If you do
select tid, t1 from v_1;
you will see that we do not perform the join. However, if we project from t2, then the join is needed.
I can reproduce the following behavior both with Oracle 11 (see SQL Fiddle) and Oracle 12.
CREATE TYPE my_tab IS TABLE OF NUMBER(3);
CREATE TABLE test AS SELECT ROWNUM AS id FROM dual CONNECT BY ROWNUM <= 1000;
CREATE UNIQUE INDEX idx_test ON test( id );
CREATE VIEW my_view AS
SELECT id, COUNT(1) OVER ( PARTITION BY id ) AS cnt
FROM test;
The following case uses the index as expected:
SELECT * FROM my_view
WHERE id IN ( 1, 2 );
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 52 | 2 (0)| 00:00:01 |
| 1 | VIEW | MY_VIEW | 2 | 52 | 2 (0)| 00:00:01 |
| 2 | WINDOW BUFFER | | 2 | 8 | 2 (0)| 00:00:01 |
| 3 | INLIST ITERATOR | | | | | |
|* 4 | INDEX UNIQUE SCAN| IDX_TEST | 2 | 8 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------
The following case does not use the index even though the cardinality hint is provided:
SELECT * FROM my_view
WHERE id IN ( SELECT /*+ CARDINALITY( tab 2 ) */ COLUMN_VALUE
FROM TABLE( NEW my_tab( 1, 2 ) ) tab );
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 28 | 33 (4)| 00:00:01 |
|* 1 | HASH JOIN RIGHT SEMI | | 1 | 28 | 33 (4)| 00:00:01 |
| 2 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 2 | 4 | 29 (0)| 00:00:01 |
| 3 | VIEW | MY_VIEW | 1000 | 26000 | 4 (25)| 00:00:01 |
| 4 | WINDOW SORT | | 1000 | 4000 | 4 (25)| 00:00:01 |
| 5 | TABLE ACCESS FULL | TEST | 1000 | 4000 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Edit:
Using an inline view and a JOIN instead of IN uses a similar plan:
SELECT /*+ CARDINALITY( tab, 2 ) */ *
FROM ( SELECT id, COUNT(1) OVER ( PARTITION BY id ) AS cnt FROM test ) t
JOIN TABLE( NEW my_tab( 1, 2 ) ) tab ON ( tab.COLUMN_VALUE = t.id );
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 56 | 33 (4)| 00:00:01 |
|* 1 | HASH JOIN | | 2 | 56 | 33 (4)| 00:00:01 |
| 2 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 2 | 4 | 29 (0)| 00:00:01 |
| 3 | VIEW | | 1000 | 26000 | 4 (25)| 00:00:01 |
| 4 | WINDOW SORT | | 1000 | 4000 | 4 (25)| 00:00:01 |
| 5 | TABLE ACCESS FULL | TEST | 1000 | 4000 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Replacing the analytic function by a LEFT JOIN with GROUP BY does not help either:
SELECT *
FROM ( SELECT t.id, s.cnt
FROM test t
LEFT JOIN ( SELECT id, COUNT(*) AS cnt
FROM test
GROUP BY id
) s ON ( s.id = t.id )
)
WHERE id IN ( SELECT /*+ CARDINALITY( tab 2 ) */ COLUMN_VALUE
FROM TABLE( NEW my_tab( 1, 2 ) ) tab );
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 64 | 34 (6)| 00:00:01 |
|* 1 | HASH JOIN OUTER | | 2 | 64 | 34 (6)| 00:00:01 |
| 2 | NESTED LOOPS | | 2 | 12 | 30 (4)| 00:00:01 |
| 3 | SORT UNIQUE | | 2 | 4 | 29 (0)| 00:00:01 |
| 4 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 2 | 4 | 29 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | IDX_TEST | 1 | 4 | 0 (0)| 00:00:01 |
| 6 | VIEW | | 1000 | 26000 | 4 (25)| 00:00:01 |
| 7 | HASH GROUP BY | | 1000 | 4000 | 4 (25)| 00:00:01 |
| 8 | TABLE ACCESS FULL | TEST | 1000 | 4000 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
Replacing the PL/SQL Collection by a subselect does not seem to help either. The CARDINALITY hint is considered (the plan says 2 rows), but the index is still ignored.
SELECT *
FROM ( SELECT id, cnt FROM my_view )
WHERE id IN ( SELECT /*+ CARDINALITY( tab 2 ) */ id FROM test tab );
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 60 | 4 (25)| 00:00:01 |
| 1 | NESTED LOOPS | | 2 | 60 | 4 (25)| 00:00:01 |
| 2 | VIEW | MY_VIEW | 1000 | 26000 | 4 (25)| 00:00:01 |
| 3 | WINDOW SORT | | 1000 | 4000 | 4 (25)| 00:00:01 |
| 4 | TABLE ACCESS FULL| TEST | 1000 | 4000 | 3 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | IDX_TEST | 1 | 4 | 0 (0)| 00:00:01 |
---------------------------------------------------------------------------------
Adding WHERE tab.id <= 2 to the in-list-subquery uses the index, so the optimizer seems to "not take the CARDINALITY hint serious enough" when selecting from a view with analytic functions (or another subselect) and filtering by a list of values.
How can I make these queries use the index as expected?
I think the one problem might be that the optimizer refuses to merge a view (and consider any indexes on the underlying tables) when the outer query block contains PL/SQL functions (e.g. TABLE()).
If you manually expand the view and query the table directly, it can access the index fine:
SELECT id, COUNT(1) OVER ( PARTITION BY id ) AS cnt
FROM test
WHERE id IN ( SELECT COLUMN_VALUE
FROM TABLE( NEW my_tab( 1, 2 ) ) tab )
;
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 31 (4)| 00:00:01 |
| 1 | WINDOW SORT | | 1 | 6 | 31 (4)| 00:00:01 |
|* 2 | HASH JOIN SEMI | | 1 | 6 | 30 (0)| 00:00:01 |
| 3 | INDEX FULL SCAN | IDX_TEST | 1000 | 4000 | 1 (0)| 00:00:01 |
| 4 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 8168 | 16336 | 29 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------
I'm not sure if there's a way to override this behavior, or if it's a limitation in the optimizer. I tried moving the TABLE function to a CTE, but that doesn't seem to help.
I'm having trouble optimizing an Oracle query after an upgrade to Oracle 11g and this problem is starting to drive me a little mad.
Note, this question has been now fully edited because I have more information after creating a simple test case. The original question is available here: https://stackoverflow.com/revisions/12304320/1.
This issue is that when joining two tables, one of which has a between condition on a date column, if the query joins to a remote table, bind peeking doesn't happen.
Here is a test case to help reproduce the problem. First set up two source tables. The first is a list of dates, being the first of the month, going back thirty years
create table mike_temp_etl_control
as
select
add_months(trunc(sysdate, 'MM'), 1-row_count) as reporting_date
from (
select level as row_count
from dual
connect by level < 360
);
Then some data sourced from dba_objects:
create table mike_temp_dba_objects as
select owner, object_name, subobject_name, object_id, created
from dba_objects
union all
select owner, object_name, subobject_name, object_id, created
from dba_objects;
Then create an empty table to run the data in to:
create table mike_temp_1
as
select
a.OWNER,
a.OBJECT_NAME,
a.SUBOBJECT_NAME,
a.OBJECT_ID,
a.CREATED,
b.REPORTING_DATE
from
mike_temp_dba_objects a
join mike_temp_etl_control b on (
b.reporting_date between add_months(a.created, -24) and a.created)
where 1=2;
Then run the code. You may need to create a larger version mike_temp_dba_objects to slow the query down (or use some other method to get the execution plan). While the query is running, I get an execution plan from the session by running select *
from table(dbms_xplan.display_cursor(sql_id => 'xxxxxxxxxxx')) from a different session.
declare
pv_report_start_date date := date '2002-01-01';
v_report_end_date date := date '2012-07-01';
begin
INSERT /*+ APPEND */
INTO mike_temp_5
select
a.OWNER,
a.OBJECT_NAME,
a.SUBOBJECT_NAME,
a.OBJECT_ID,
a.CREATED,
b.REPORTING_DATE
from
mike_temp_dba_objects a
join mike_temp_etl_control b on (
b.reporting_date between add_months(a.created, -24) and a.created)
cross join dual#emirrl -- This line causes problems...
where
b.reporting_date between add_months(pv_report_start_date, -12) and v_report_end_date;
rollback;
end;
By having a remote table in the query, the cardinality estimate for the mike_temp_etl_control table is completely wrong and bind peeking doesn't seem to be happening.
The execution plan for the query above is shown below:
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
---------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | | | 373 (100)|
| 1 | LOAD AS SELECT | | | | |
|* 2 | FILTER | | | | |
| 3 | MERGE JOIN | | 5 | 655 | 373 (21)|
| 4 | SORT JOIN | | 1096 | 130K| 370 (20)|
| 5 | MERGE JOIN CARTESIAN| | 1096 | 130K| 369 (20)|
| 6 | REMOTE | DUAL | 1 | | 2 (0)|
| 7 | BUFFER SORT | | 1096 | 130K| 367 (20)|
|* 8 | TABLE ACCESS FULL | MIKE_TEMP_DBA_OBJECTS | 1096 | 130K| 367 (20)|
|* 9 | FILTER | | | | |
|* 10 | SORT JOIN | | 2 | 18 | 3 (34)|
|* 11 | TABLE ACCESS FULL | MIKE_TEMP_ETL_CONTROL | 2 | 18 | 2 (0)|
---------------------------------------------------------------------------------------
If I then replace the remote dual with the local version I get the correct cardinality (139 instead of 2):
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | | | 10682 (100)|
| 1 | LOAD AS SELECT | | | | |
|* 2 | FILTER | | | | |
| 3 | MERGE JOIN | | 152K| 19M| 10682 (3)|
| 4 | SORT JOIN | | 438K| 51M| 10632 (2)|
| 5 | NESTED LOOPS | | 438K| 51M| 369 (20)|
| 6 | FAST DUAL | | 1 | | 2 (0)|
|* 7 | TABLE ACCESS FULL| MIKE_TEMP_DBA_OBJECTS | 438K| 51M| 367 (20)|
|* 8 | FILTER | | | | |
|* 9 | SORT JOIN | | 139 | 1251 | 3 (34)|
|* 10 | TABLE ACCESS FULL| MIKE_TEMP_ETL_CONTROL | 139 | 1251 | 2 (0)|
-------------------------------------------------------------------------------------
So, I guess the question is how can I get the correct cardinality to be estimated? Is this an Oracle bug or is this the expected behaviour?
I think you should mess about dynamic sampling. It works in 11g differently so may it is the reason of your troubles.
I have a query:
select count(1) CNT
from file_load_params a
where a.doc_type = (select b.doc_type
from file_load_header b
where b.indicator = 'XELFASI')
order by a.line_no
Which explain plan is:
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 | 3 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 7 | | |
|* 2 | TABLE ACCESS FULL | FILE_LOAD_PARAMS | 15 | 105 | 2 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| FILE_LOAD_HEADER | 1 | 12 | 1 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | FILE_LOAD_HEADER_UK | 1 | | 0 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
I thought that I could optimize this query and write this one:
select count(1) CNT
from file_load_params a,file_load_header b
where b.indicator = 'XELFASI'
and a.doc_type = b.doc_type
order by a.line_no
Its explain plan is:
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 19 | 3 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 19 | | |
| 2 | NESTED LOOPS | | 15 | 285 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| FILE_LOAD_HEADER | 1 | 12 | 1 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | FILE_LOAD_HEADER_UK | 1 | | 0 (0)| 00:00:01 |
|* 5 | TABLE ACCESS FULL | FILE_LOAD_PARAMS | 15 | 105 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
Is it good? I think not,but I expected better result...Do you have any idea?
From the explain plans, these appear to be tiny tables and the cost of the query is negligible. How long do they take to run and how quickly do you need them to run ?
But remove the ORDER BY. Since you are selecting a single row COUNT aggregate it is pointless.
One of the possible optimizations i see from your explain plan is
TABLE ACCESS FULL | FILE_LOAD_PARAMS
This seems to indicate that table file_load_params possibly does not have any index on doc_type
If that is the case, can you add an index for doc_type. If you already have indexes, can you post your table schema for file_load_params
The result is not the same for the two queries. The IN operator automatically also applies a DISTINCT to the inner query. And in this case it is probably not a key you are joining on (if it is, then make it an unique key), so it cannot be optimized away.
As for optimizing the query, then do as InSane says, add an index on Doc_Type in FILE_LOAD_PARAMS