SQL Multiple Minus vs Multiple Join Performance

I hope someone can explain the performance of joining multiple tables vs. using MINUS to eliminate records. I looked at a few other Stack Overflow questions but didn't see what I was looking for.
I thought these two queries would produce the same output. I have always heard "use joins, use joins!", particularly in Stack Overflow posts, because joins are expected to be faster...
This is the first query I ran, which I thought would be much slower, but it takes only a matter of minutes to run...
select some_id
from table1
MINUS
select some_id
from table2
where table2.value = 'some_value'
MINUS
select some_id
from table3
where table3.value = 'some_value'
group by some_id
This is the second query, which I thought would be faster, but it has been running for over 3 hours now (with no end in sight?)
select some_id
from table1
join table2 on table1.id=table2.id
join table3 on table1.id=table3.id
where table2.value = 'some_value'
or table3.value = 'some_value'
group by some_id
I should note that all 3 tables have more than 1 million records, up to 15 million records each.
EDIT:
Sorry, I meant to mention that I'm not looking for NOT EXISTS as an answer; I'm really curious about just these two scenarios.

Try this version:
select distinct t1.some_id
from table1 t1
where not exists (select 1 from table2 t2 where t2.some_id = t1.some_id and t2.value = 'some_value') and
      not exists (select 1 from table3 t3 where t3.some_id = t1.some_id and t3.value = 'some_value')
For best performance, you want indexes on table2(some_id, value) and table3(some_id, value).
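The MINUS chain removes an id when it matches in table2 *or* in table3. The NOT EXISTS rewrite therefore has to keep an id only when neither subquery finds a match, which means combining the two conditions with AND. Below is a small sketch using Python's built-in sqlite3 module that compares both connectives against the set-difference result. It makes several assumptions: SQLite's EXCEPT stands in for Oracle's MINUS, the rows are hypothetical, and the tables are keyed on a single some_id column. The composite indexes only affect speed, not the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (some_id INTEGER);
CREATE TABLE table2 (some_id INTEGER, value TEXT);
CREATE TABLE table3 (some_id INTEGER, value TEXT);
-- Composite indexes analogous to the ones suggested in the answer.
CREATE INDEX table2_id_value ON table2 (some_id, value);
CREATE INDEX table3_id_value ON table3 (some_id, value);
INSERT INTO table1 VALUES (1), (2), (3), (4);
INSERT INTO table2 VALUES (1, 'some_value'), (2, 'other');
INSERT INTO table3 VALUES (2, 'some_value'), (3, 'other');
""")

def not_exists_query(connective: str) -> str:
    """Build the NOT EXISTS rewrite, joining its two conditions with AND or OR."""
    return f"""
    SELECT DISTINCT t1.some_id
    FROM table1 t1
    WHERE NOT EXISTS (SELECT 1 FROM table2 t2
                      WHERE t2.some_id = t1.some_id AND t2.value = 'some_value')
      {connective}
          NOT EXISTS (SELECT 1 FROM table3 t3
                      WHERE t3.some_id = t1.some_id AND t3.value = 'some_value')
    """

def ids(sql: str) -> list:
    """Run a single-column query and return its values sorted."""
    return sorted(row[0] for row in conn.execute(sql))

# Reference result: the MINUS chain, written with SQLite's EXCEPT.
minus_ids = ids("""
    SELECT some_id FROM table1
    EXCEPT SELECT some_id FROM table2 WHERE value = 'some_value'
    EXCEPT SELECT some_id FROM table3 WHERE value = 'some_value'
""")
and_ids = ids(not_exists_query("AND"))
or_ids = ids(not_exists_query("OR"))
print(minus_ids, and_ids, or_ids)  # [3, 4] [3, 4] [1, 2, 3, 4]
```

With OR, an id survives as soon as it is missing from *either* filtered table, so nearly everything comes back. Only the AND form reproduces the MINUS result.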

First, make sure the right indexes are in place.
Look at the execution plan: if it shows full table scans, go ahead and create the indexes, or the query is going to take a long, long time.
If you have PL/SQL Developer, paste the query into a SQL window and press F5 to get the explain plan.
Or you can do this:
SCOTT#research 17-APR-15> EXPLAIN PLAN FOR
2 select empno
3 from emp
4 MINUS
5 select empno
6 from empp
7 where empp.empno = '7839'
8 MINUS
9 select empno
10 from emppp
11 where emppp.empno = '7902'
12 group by empno
13 ;
Explained.
SCOTT#research 17-APR-15> SET LINESIZE 130
SCOTT#research 17-APR-15> SET PAGESIZE 0
SCOTT#research 17-APR-15> SELECT *
2 FROM TABLE(DBMS_XPLAN.DISPLAY);
Plan hash value: 4222598102
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 82 | 10 (90)| 00:00:01 |
| 1 | MINUS | | | | | |
| 2 | MINUS | | | | | |
| 3 | SORT UNIQUE NOSORT | | 14 | 56 | 2 (50)| 00:00:01 |
| 4 | INDEX FULL SCAN | PK_EMP | 14 | 56 | 1 (0)| 00:00:01 |
| 5 | SORT UNIQUE NOSORT | | 1 | 13 | 4 (25)| 00:00:01 |
|* 6 | TABLE ACCESS FULL | EMPP | 1 | 13 | 3 (0)| 00:00:01 |
| 7 | SORT UNIQUE NOSORT | | 1 | 13 | 4 (25)| 00:00:01 |
| 8 | SORT GROUP BY NOSORT| | 1 | 13 | 4 (25)| 00:00:01 |
|* 9 | TABLE ACCESS FULL | EMPPP | 1 | 13 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
6 - filter("EMPP"."EMPNO"=7839)
9 - filter("EMPPP"."EMPNO"=7902)
Note
-----
- dynamic sampling used for this statement (level=2)
26 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2137789089
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 | 16336 | 29 (0)| 00:00:01 |
| 1 | COLLECTION ITERATOR PICKLER FETCH| DISPLAY | 8168 | 16336 | 29 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Or, if you want to use autotrace, run:
set autotrace on explain
This is how it looks:
SCOTT#research 17-APR-15> select empno
2 from emp
3 MINUS
4 select empno
5 from empp
6 where empp.empno = '7839'
7 MINUS
8 select empno
9 from emppp
10 where emppp.empno = '7902'
11 group by empno
12 ;
EMPNO
----------
234
7499
7521
7566
7654
7698
7782
7788
7844
7876
7900
7934
12 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 4222598102
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 82 | 10 (90)| 00:00:01 |
| 1 | MINUS | | | | | |
| 2 | MINUS | | | | | |
| 3 | SORT UNIQUE NOSORT | | 14 | 56 | 2 (50)| 00:00:01 |
| 4 | INDEX FULL SCAN | PK_EMP | 14 | 56 | 1 (0)| 00:00:01 |
| 5 | SORT UNIQUE NOSORT | | 1 | 13 | 4 (25)| 00:00:01 |
|* 6 | TABLE ACCESS FULL | EMPP | 1 | 13 | 3 (0)| 00:00:01 |
| 7 | SORT UNIQUE NOSORT | | 1 | 13 | 4 (25)| 00:00:01 |
| 8 | SORT GROUP BY NOSORT| | 1 | 13 | 4 (25)| 00:00:01 |
|* 9 | TABLE ACCESS FULL | EMPPP | 1 | 13 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
6 - filter("EMPP"."EMPNO"=7839)
9 - filter("EMPPP"."EMPNO"=7902)
Note
-----
- dynamic sampling used for this statement (level=2)
SCOTT#research 17-APR-15>
SCOTT#research 17-APR-15> select emp.empno
2 from emp
3 join empp on emp.empno=empp.empno
4 join emppp on emp.empno=emppp.empno
5 where empp.empno = '7839'
6 or emppp.empno = '7902'
7 group by emp.empno
8 ;
EMPNO
----------
7839
7902
Execution Plan
----------------------------------------------------------
Plan hash value: 1435156579
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 30 | 8 (25)| 00:00:01 |
| 1 | HASH GROUP BY | | 1 | 30 | 8 (25)| 00:00:01 |
|* 2 | HASH JOIN | | 1 | 30 | 7 (15)| 00:00:01 |
| 3 | NESTED LOOPS | | 6 | 102 | 3 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| EMPPP | 6 | 78 | 3 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN| PK_EMP | 1 | 4 | 0 (0)| 00:00:01 |
| 6 | TABLE ACCESS FULL | EMPP | 10 | 130 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("EMP"."EMPNO"="EMPP"."EMPNO")
filter("EMPP"."EMPNO"=7839 OR "EMPPP"."EMPNO"=7902)
5 - access("EMP"."EMPNO"="EMPPP"."EMPNO")
Note
-----
- dynamic sampling used for this statement (level=2)
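The two transcripts above return 12 rows and 2 rows respectively. That already hints that the MINUS query and the JOIN query answer different questions. MINUS keeps the ids that have *no* matching 'some_value' row. The inner joins keep only ids present in all three tables, and then keep the ones that *do* match.

A minimal sketch in Python's built-in sqlite3 module makes this visible on a few hypothetical rows. It assumes the tables share a single some_id key (the question joins on id but selects some_id). It uses EXCEPT, SQLite's spelling of MINUS. It only compares results and says nothing about Oracle's performance.

```python
import sqlite3

# Hypothetical miniature versions of table1/table2/table3 from the question,
# assuming a single shared key column some_id.
SETUP = """
CREATE TABLE table1 (some_id INTEGER);
CREATE TABLE table2 (some_id INTEGER, value TEXT);
CREATE TABLE table3 (some_id INTEGER, value TEXT);
INSERT INTO table1 VALUES (1), (2), (3), (4);
INSERT INTO table2 VALUES (1, 'some_value'), (2, 'other'), (3, 'other');
INSERT INTO table3 VALUES (2, 'some_value'), (3, 'other');
"""

# SQLite's EXCEPT plays the role of Oracle's MINUS (distinct set difference).
MINUS_QUERY = """
SELECT some_id FROM table1
EXCEPT
SELECT some_id FROM table2 WHERE value = 'some_value'
EXCEPT
SELECT some_id FROM table3 WHERE value = 'some_value'
"""

# The join version from the question, with the key column qualified.
JOIN_QUERY = """
SELECT t1.some_id
FROM table1 t1
JOIN table2 t2 ON t1.some_id = t2.some_id
JOIN table3 t3 ON t1.some_id = t3.some_id
WHERE t2.value = 'some_value' OR t3.value = 'some_value'
GROUP BY t1.some_id
"""

def run(conn: sqlite3.Connection, sql: str) -> list:
    """Return the single-column result as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
minus_rows = run(conn, MINUS_QUERY)
join_rows = run(conn, JOIN_QUERY)
print("MINUS (EXCEPT):", minus_rows)  # ids with no 'some_value' match
print("JOIN ... OR:  ", join_rows)    # ids in all three tables with a match
```

On real data, the inner joins can also multiply rows before GROUP BY collapses them. That is one plausible reason the join version runs so much longer on tables of this size.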

Related

Avoid Full Table Scan with subquery or analytic function in view

I can reproduce the following behavior both with Oracle 11 (see SQL Fiddle) and Oracle 12.
CREATE TYPE my_tab IS TABLE OF NUMBER(3);
CREATE TABLE test AS SELECT ROWNUM AS id FROM dual CONNECT BY ROWNUM <= 1000;
CREATE UNIQUE INDEX idx_test ON test( id );
CREATE VIEW my_view AS
SELECT id, COUNT(1) OVER ( PARTITION BY id ) AS cnt
FROM test;
The following case uses the index as expected:
SELECT * FROM my_view
WHERE id IN ( 1, 2 );
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 52 | 2 (0)| 00:00:01 |
| 1 | VIEW | MY_VIEW | 2 | 52 | 2 (0)| 00:00:01 |
| 2 | WINDOW BUFFER | | 2 | 8 | 2 (0)| 00:00:01 |
| 3 | INLIST ITERATOR | | | | | |
|* 4 | INDEX UNIQUE SCAN| IDX_TEST | 2 | 8 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------
The following case does not use the index even though the cardinality hint is provided:
SELECT * FROM my_view
WHERE id IN ( SELECT /*+ CARDINALITY( tab 2 ) */ COLUMN_VALUE
FROM TABLE( NEW my_tab( 1, 2 ) ) tab );
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 28 | 33 (4)| 00:00:01 |
|* 1 | HASH JOIN RIGHT SEMI | | 1 | 28 | 33 (4)| 00:00:01 |
| 2 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 2 | 4 | 29 (0)| 00:00:01 |
| 3 | VIEW | MY_VIEW | 1000 | 26000 | 4 (25)| 00:00:01 |
| 4 | WINDOW SORT | | 1000 | 4000 | 4 (25)| 00:00:01 |
| 5 | TABLE ACCESS FULL | TEST | 1000 | 4000 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Edit:
Using an inline view and a JOIN instead of IN uses a similar plan:
SELECT /*+ CARDINALITY( tab, 2 ) */ *
FROM ( SELECT id, COUNT(1) OVER ( PARTITION BY id ) AS cnt FROM test ) t
JOIN TABLE( NEW my_tab( 1, 2 ) ) tab ON ( tab.COLUMN_VALUE = t.id );
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 56 | 33 (4)| 00:00:01 |
|* 1 | HASH JOIN | | 2 | 56 | 33 (4)| 00:00:01 |
| 2 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 2 | 4 | 29 (0)| 00:00:01 |
| 3 | VIEW | | 1000 | 26000 | 4 (25)| 00:00:01 |
| 4 | WINDOW SORT | | 1000 | 4000 | 4 (25)| 00:00:01 |
| 5 | TABLE ACCESS FULL | TEST | 1000 | 4000 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Replacing the analytic function by a LEFT JOIN with GROUP BY does not help either:
SELECT *
FROM ( SELECT t.id, s.cnt
FROM test t
LEFT JOIN ( SELECT id, COUNT(*) AS cnt
FROM test
GROUP BY id
) s ON ( s.id = t.id )
)
WHERE id IN ( SELECT /*+ CARDINALITY( tab 2 ) */ COLUMN_VALUE
FROM TABLE( NEW my_tab( 1, 2 ) ) tab );
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 64 | 34 (6)| 00:00:01 |
|* 1 | HASH JOIN OUTER | | 2 | 64 | 34 (6)| 00:00:01 |
| 2 | NESTED LOOPS | | 2 | 12 | 30 (4)| 00:00:01 |
| 3 | SORT UNIQUE | | 2 | 4 | 29 (0)| 00:00:01 |
| 4 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 2 | 4 | 29 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | IDX_TEST | 1 | 4 | 0 (0)| 00:00:01 |
| 6 | VIEW | | 1000 | 26000 | 4 (25)| 00:00:01 |
| 7 | HASH GROUP BY | | 1000 | 4000 | 4 (25)| 00:00:01 |
| 8 | TABLE ACCESS FULL | TEST | 1000 | 4000 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
Replacing the PL/SQL Collection by a subselect does not seem to help either. The CARDINALITY hint is considered (the plan says 2 rows), but the index is still ignored.
SELECT *
FROM ( SELECT id, cnt FROM my_view )
WHERE id IN ( SELECT /*+ CARDINALITY( tab 2 ) */ id FROM test tab );
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 60 | 4 (25)| 00:00:01 |
| 1 | NESTED LOOPS | | 2 | 60 | 4 (25)| 00:00:01 |
| 2 | VIEW | MY_VIEW | 1000 | 26000 | 4 (25)| 00:00:01 |
| 3 | WINDOW SORT | | 1000 | 4000 | 4 (25)| 00:00:01 |
| 4 | TABLE ACCESS FULL| TEST | 1000 | 4000 | 3 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | IDX_TEST | 1 | 4 | 0 (0)| 00:00:01 |
---------------------------------------------------------------------------------
Adding WHERE tab.id <= 2 to the in-list subquery makes it use the index, so the optimizer seems to "not take the CARDINALITY hint seriously enough" when selecting from a view with analytic functions (or another subselect) and filtering by a list of values.
How can I make these queries use the index as expected?
I think one problem might be that the optimizer refuses to merge a view (and consider any indexes on the underlying tables) when the outer query block contains PL/SQL functions (e.g. TABLE()).
If you manually expand the view and query the table directly, it can access the index fine:
SELECT id, COUNT(1) OVER ( PARTITION BY id ) AS cnt
FROM test
WHERE id IN ( SELECT COLUMN_VALUE
FROM TABLE( NEW my_tab( 1, 2 ) ) tab )
;
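Expanding the view by hand is safe here because the filter is on id, the column in PARTITION BY. Removing whole partitions before the window is computed cannot change cnt for the rows that remain. The same move on any other column would change the counts, which is why an optimizer has to be cautious about pushing predicates into views with analytic functions.

The sketch below checks both cases with Python's sqlite3 module; window functions need SQLite 3.25 or later. The events table, its tag column, and the wanted table standing in for TABLE(NEW my_tab(...)) are hypothetical. Unlike test.id, the id here is deliberately non-unique, so cnt is not always 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER, tag TEXT);
INSERT INTO events VALUES (1, 'a'), (1, 'b'), (2, 'a'), (3, 'b');
-- Stand-in for the TABLE(NEW my_tab(1, 2)) collection in the question.
CREATE TABLE wanted (column_value INTEGER);
INSERT INTO wanted VALUES (1), (2);
-- Same shape as my_view: a per-id count computed with an analytic function.
CREATE VIEW my_view AS
    SELECT id, tag, COUNT(*) OVER (PARTITION BY id) AS cnt FROM events;
""")

def rows(sql: str) -> list:
    """Run a query and return all rows sorted."""
    return sorted(conn.execute(sql).fetchall())

# Filter on the PARTITION BY column: outside the view vs. pushed inside.
outside_id = rows("SELECT * FROM my_view WHERE id IN (SELECT column_value FROM wanted)")
inside_id = rows("""
    SELECT id, tag, COUNT(*) OVER (PARTITION BY id) AS cnt
    FROM events WHERE id IN (SELECT column_value FROM wanted)
""")

# Filter on a non-partition column: pushing it inside changes cnt.
outside_tag = rows("SELECT * FROM my_view WHERE tag = 'a'")
inside_tag = rows("""
    SELECT id, tag, COUNT(*) OVER (PARTITION BY id) AS cnt
    FROM events WHERE tag = 'a'
""")

print(outside_id == inside_id)    # True
print(outside_tag == inside_tag)  # False
```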
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 31 (4)| 00:00:01 |
| 1 | WINDOW SORT | | 1 | 6 | 31 (4)| 00:00:01 |
|* 2 | HASH JOIN SEMI | | 1 | 6 | 30 (0)| 00:00:01 |
| 3 | INDEX FULL SCAN | IDX_TEST | 1000 | 4000 | 1 (0)| 00:00:01 |
| 4 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 8168 | 16336 | 29 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------
I'm not sure if there's a way to override this behavior, or if it's a limitation in the optimizer. I tried moving the TABLE function to a CTE, but that doesn't seem to help.

Oracle SQL indexed query 100% cpu usage

I'm running a relatively simple query
SELECT * FROM confirm_v c
JOIN person p ON c.created_by=p.id
INNER JOIN invoice_confirm ic ON ic.confirm_id=c.id
WHERE c.id = (SELECT id FROM
(SELECT c2.id FROM confirm c2
JOIN invoice_confirm ic2 ON ic2.confirm_id=c2.id
WHERE ic2.invoice_id=11954081
AND c2.previous=0
AND c2.canceled=0
AND c2.confirm_type='INVOICE'
ORDER BY c2.id)
WHERE rownum=1);
which results in 100% CPU usage by the database. The confirm_type column is a varchar2(50 char), and the rest are number(10), if that matters.
The invoice_confirm and confirm tables are covered by indexes, and there are no full table scans visible in the execution plan for this query.
This query isn't executed often, but it accounts for nearly 100% of total CPU usage. Any ideas are appreciated.
EDIT:
Here is the explain plan for the query, in text form:
EXPLAIN PLAN FOR ...
SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());
Plan hash value: 1705859247
------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 69 | 10 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 69 | 10 (0)| 00:00:01 |
| 2 | NESTED LOOPS | | 1 | 69 | 10 (0)| 00:00:01 |
| 3 | NESTED LOOPS | | 1 | 57 | 7 (0)| 00:00:01 |
| 4 | NESTED LOOPS | | 1 | 30 | 5 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID | CONFIRM | 1 | 24 | 3 (0)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | PK_CONFIRM | 1 | | 2 (0)| 00:00:01 |
|* 7 | COUNT STOPKEY | | | | | |
| 8 | VIEW | | 4 | 52 | 27 (4)| 00:00:01 |
|* 9 | SORT ORDER BY STOPKEY | | 4 | 132 | 27 (4)| 00:00:01 |
| 10 | NESTED LOOPS | | 4 | 132 | 26 (0)| 00:00:01 |
| 11 | NESTED LOOPS | | 11 | 132 | 26 (0)| 00:00:01 |
| 12 | TABLE ACCESS BY INDEX ROWID BATCHED| INVOICE_CONFIRM | 3 | 36 | 4 (0)| 00:00:01 |
|* 13 | INDEX RANGE SCAN | FKI_INVOICE_CONFIRM_INVOICE | 2 | | 3 (0)| 00:00:01 |
|* 14 | INDEX UNIQUE SCAN | PK_CONFIRM | 1 | | 1 (0)| 00:00:01 |
|* 15 | TABLE ACCESS BY INDEX ROWID | CONFIRM | 1 | 21 | 2 (0)| 00:00:01 |
|* 16 | INDEX RANGE SCAN | FKI_INVOICE_CONFIRM_CONFIRM | 1 | 6 | 2 (0)| 00:00:01 |
| 17 | TABLE ACCESS BY INDEX ROWID | PERSON | 1 | 27 | 2 (0)| 00:00:01 |
|* 18 | INDEX UNIQUE SCAN | PK_KASUTAJA | 1 | | 1 (0)| 00:00:01 |
|* 19 | INDEX RANGE SCAN | FKI_INVOICE_CONFIRM_CONFIRM | 1 | | 2 (0)| 00:00:01 |
| 20 | TABLE ACCESS BY INDEX ROWID | INVOICE_CONFIRM | 1 | 12 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
6 - access("CONFIRM"."ID"= (SELECT "ID" FROM (SELECT "C2"."ID" "ID" FROM "INVOICE_CONFIRM" "IC2","CONFIRM" "C2"
WHERE "IC2"."CONFIRM_ID"="C2"."ID" AND "C2"."CANCELED"=0 AND "C2"."PREVIOUS"=0 AND "C2"."CONFIRM_TYPE"='INVOICE' AND
"IC2"."INVOICE_ID"=11954081 ORDER BY "C2"."ID") "from$_subquery$_006" WHERE ROWNUM=1))
7 - filter(ROWNUM=1)
9 - filter(ROWNUM=1)
13 - access("IC2"."INVOICE_ID"=11954081)
14 - access("IC2"."CONFIRM_ID"="C2"."ID")
15 - filter("C2"."CANCELED"=0 AND "C2"."PREVIOUS"=0 AND "C2"."CONFIRM_TYPE"='INVOICE')
16 - access("IC"."CONFIRM_ID"="CONFIRM"."ID")
18 - access("CONFIRM"."CREATED_BY"="P"."ID")
19 - access("IC"."CONFIRM_ID"="CONFIRM"."ID")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- 2 Sql Plan Directives used for this statement
Gather optimizer statistics on the relevant tables and investigate why the statistics were missing.
begin
dbms_stats.gather_table_stats(user, 'CONFIRM');
dbms_stats.gather_table_stats(user, 'INVOICE_CONFIRM');
dbms_stats.gather_table_stats(user, 'PERSON');
end;
/
Optimizer statistics are critical for Oracle to achieve good performance. The note dynamic statistics used: dynamic sampling (level=2) implies that there are tables with missing optimizer statistics. That should never happen unless the tables were created within the last day.
Oracle automatically gathers stale and missing statistics. Check whether the job is running with the following query. If there are no recent rows, ask your DBA to re-enable the task.
select *
from dba_optstat_operations
where operation like '%auto%'
order by start_time desc;
The autotask is good enough for most tables. But if there's a large batch process that updates a lot of rows then the statistics should be manually collected as soon as the job is complete.

Comparing two join queries in Oracle

I have 2 queries do the same job:
SELECT * FROM student_info
INNER JOIN class
ON student_info.id = class.studentId
WHERE student_info.name = 'Ken'
SELECT * FROM (SELECT * FROM student_info WHERE name = 'Ken') studInfo
INNER JOIN class
ON studInfo.id = class.studentId
Which one is faster? I guess the second, but I'm not sure. I am using Oracle 11g.
UPDATED:
My tables are not indexed, and I can confirm that the two PLAN_TABLE_OUTPUTs are almost the same.
In recent versions of Oracle, the optimizer is smart enough to do its job, so it won't matter: both of your queries will be optimized internally to do the task efficiently. The optimizer may rewrite the query and opt for an efficient execution plan.
Let's understand this with a small example of EMP and DEPT table. I will use two similar queries like yours in the question.
I will take two cases, first a predicate having a non-indexed column, second with an indexed column.
Case 1 - predicate having a non-indexed column
SQL> explain plan for
2 SELECT * FROM emp e
3 INNER JOIN dept d
4 ON e.deptno = d.deptno
5 where ename = 'SCOTT';
Explained.
SQL>
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
Plan hash value: 3625962092
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 59 | 4 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 1 | 59 | 4 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL | EMP | 1 | 39 | 3 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | PK_DEPT | 1 | | 0 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| DEPT | 1 | 20 | 1 (0)| 00:00:01 |
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("E"."ENAME"='SCOTT')
4 - access("E"."DEPTNO"="D"."DEPTNO")
Note
-----
- this is an adaptive plan
22 rows selected.
SQL>
SQL> explain plan for
2 SELECT * FROM (SELECT * FROM emp WHERE ename = 'SCOTT') e
3 INNER JOIN dept d
4 ON e.deptno = d.deptno;
Explained.
SQL>
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
Plan hash value: 3625962092
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 59 | 4 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 1 | 59 | 4 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL | EMP | 1 | 39 | 3 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | PK_DEPT | 1 | | 0 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| DEPT | 1 | 20 | 1 (0)| 00:00:01 |
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("ENAME"='SCOTT')
4 - access("EMP"."DEPTNO"="D"."DEPTNO")
Note
-----
- this is an adaptive plan
22 rows selected.
SQL>
Case 2 - predicate having an indexed column
SQL> explain plan for
2 SELECT * FROM emp e
3 INNER JOIN dept d
4 ON e.deptno = d.deptno
5 where empno = 7788;
Explained.
SQL>
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
Plan hash value: 2385808155
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 59 | 2 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 59 | 2 (0)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| EMP | 1 | 39 | 1 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | PK_EMP | 1 | | 0 (0)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| DEPT | 1 | 20 | 1 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | PK_DEPT | 1 | | 0 (0)| 00:00:01 |
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("E"."EMPNO"=7788)
5 - access("E"."DEPTNO"="D"."DEPTNO")
18 rows selected.
SQL>
SQL> explain plan for
2 SELECT * FROM (SELECT * FROM emp where empno = 7788) e
3 INNER JOIN dept d
4 ON e.deptno = d.deptno;
Explained.
SQL>
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
Plan hash value: 2385808155
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 59 | 2 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 59 | 2 (0)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| EMP | 1 | 39 | 1 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | PK_EMP | 1 | | 0 (0)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| DEPT | 1 | 20 | 1 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | PK_DEPT | 1 | | 0 (0)| 00:00:01 |
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("EMPNO"=7788)
5 - access("EMP"."DEPTNO"="D"."DEPTNO")
18 rows selected.
SQL>
Is there any difference between the explain plans in each case respectively? No.
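Identical plans are expected because the two queries are logically the same. Filtering student_info inside an inline view or in the outer WHERE clause selects the same rows, and the optimizer is free to merge the view. Below is a quick result check with Python's sqlite3 module on a few hypothetical rows. It shows equivalence of results only, not Oracle timings, and the second query joins through the studInfo alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student_info (id INTEGER, name TEXT);
CREATE TABLE class (studentId INTEGER, subject TEXT);
INSERT INTO student_info VALUES (1, 'Ken'), (2, 'Amy'), (3, 'Ken');
INSERT INTO class VALUES (1, 'Math'), (1, 'Art'), (2, 'Math'), (3, 'Bio');
""")

# Form 1: filter in the outer WHERE clause.
q1 = """
SELECT * FROM student_info
INNER JOIN class ON student_info.id = class.studentId
WHERE student_info.name = 'Ken'
"""

# Form 2: filter inside an inline view; the join must use the view's alias.
q2 = """
SELECT * FROM (SELECT * FROM student_info WHERE name = 'Ken') studInfo
INNER JOIN class ON studInfo.id = class.studentId
"""

rows1 = sorted(conn.execute(q1).fetchall())
rows2 = sorted(conn.execute(q2).fetchall())
print(rows1 == rows2, len(rows1))  # True 3
```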
You'd need to show us the query plans and the execution statistics to be certain. That said, assuming name is indexed and statistics are reasonably accurate, I'd be shocked if the two queries didn't generate the same plan (and, thus, the same performance). With either query, Oracle is free to evaluate the predicate before or after it evaluates the join so it is unlikely that it would choose differently in the two cases.
I would definitely lean towards the first query.
When selects are nested, Oracle has fewer optimization opportunities. It generally has to evaluate the inner select into a temporary view and then apply the outer select to that. That is rarely faster than a JOIN where Oracle will evaluate everything together.
Showing your EXPLAIN PLAN would provide extra info for us as well.

SQL subquery and Joins giving same or different result (oracle)

I'm working on optimizing queries because of the huge amount of data in Oracle.
Here is one such query.
With subquery :
SELECT
STG.ID1,
STG.ID2
FROM (SELECT
DISTINCT
H1.ID1,
H2.ID2
FROM T_STGDV STG
INNER JOIN T_HUB1 H1 ON STG.BK1 = H1.BK1
INNER JOIN T_HUB2 H2 ON STG.BK2 = H2.BK2 ) STG
LEFT OUTER JOIN T_LINK L ON L.ID1 = STG.ID1 AND L.ID2 = STG.ID2
WHERE L.IDL IS NULL;
This is my optimization:
SELECT
DISTINCT
H1.ID1,
H2.ID2
FROM T_STGDV STG
INNER JOIN T_HUB1 H1 ON STG.BK1 = H1.BK1
INNER JOIN T_HUB2 H2 ON STG.BK2 = H2.BK2
LEFT OUTER JOIN T_LINK L ON L.ID1 = H1.ID1 AND L.ID2 = H2.ID2
WHERE L.IDL IS NULL;
I want to know whether the result and the behavior will be the same.
I ran some tests and didn't find a difference, but maybe I missed a test case?
Any idea what could be the difference between those queries?
Thanks.
Some details: here are the explain plans for those test tables (the costs are not representative of the real tables).
The first query:
Plan hash value: 2680307749
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 65 | 11 (28)| 00:00:01 |
|* 1 | FILTER | | | | | |
|* 2 | HASH JOIN OUTER | | 1 | 65 | 11 (28)| 00:00:01 |
| 3 | VIEW | | 1 | 26 | 8 (25)| 00:00:01 |
| 4 | HASH UNIQUE | | 1 | 134 | 8 (25)| 00:00:01 |
|* 5 | HASH JOIN | | 1 | 134 | 7 (15)| 00:00:01 |
|* 6 | HASH JOIN | | 1 | 94 | 5 (20)| 00:00:01 |
| 7 | TABLE ACCESS FULL| T_STGDV | 1 | 54 | 2 (0)| 00:00:01 |
| 8 | TABLE ACCESS FULL| T_HUB1 | 2 | 80 | 2 (0)| 00:00:01 |
| 9 | TABLE ACCESS FULL | T_HUB2 | 2 | 80 | 2 (0)| 00:00:01 |
| 10 | TABLE ACCESS FULL | T_LINK | 3 | 117 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("L"."IDL" IS NULL)
2 - access("L"."ID2"(+)="STG"."ID2" AND "L"."ID1"(+)="STG"."ID1")
5 - access("STG"."BK2"="H2"."BK2")
6 - access("STG"."BK1"="H1"."BK1")
Note
-----
- dynamic sampling used for this statement (level=2)
The second query:
Plan hash value: 2149614538
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 65 | 11 (28)| 00:00:01 |
| 1 | HASH UNIQUE | | 1 | 65 | 11 (28)| 00:00:01 |
|* 2 | FILTER | | | | | |
|* 3 | HASH JOIN OUTER | | 1 | 65 | 10 (20)| 00:00:01 |
| 4 | VIEW | | 1 | 26 | 7 (15)| 00:00:01 |
|* 5 | HASH JOIN | | 1 | 134 | 7 (15)| 00:00:01 |
|* 6 | HASH JOIN | | 1 | 94 | 5 (20)| 00:00:01 |
| 7 | TABLE ACCESS FULL| T_STGDV | 1 | 54 | 2 (0)| 00:00:01 |
| 8 | TABLE ACCESS FULL| T_HUB1 | 2 | 80 | 2 (0)| 00:00:01 |
| 9 | TABLE ACCESS FULL | T_HUB2 | 2 | 80 | 2 (0)| 00:00:01 |
| 10 | TABLE ACCESS FULL | T_LINK | 3 | 117 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("L"."IDL" IS NULL)
3 - access("L"."ID2"(+)="H2"."ID2" AND "L"."ID1"(+)="H1"."ID1")
5 - access("STG"."BK2"="H2"."BK2")
6 - access("STG"."BK1"="H1"."BK1")
Note
-----
- dynamic sampling used for this statement (level=2)
The queries look equivalent to me, because of the where clause.
Without the where clause they are not equivalent. Duplicates in t_link (relative to the join keys) would result in duplicate rows. However, you are looking for no matches, so this is not an issue. When there is no match, the two versions should be equivalent.
If you want to test them against your current dataset, you can use MINUS:
query 1
MINUS
query 2
If any results are displayed, they are not the same.
You have to flip them around and try the other direction too:
query 2
MINUS
query 1
If both tests return no records, the queries have the same effect on your current dataset.
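The two-way MINUS check can be automated. Below is a minimal sketch with Python's sqlite3 module that uses EXCEPT in place of MINUS. The rows are hypothetical and include a duplicate staging row and a duplicate link row, the cases most likely to expose a difference between the two DISTINCT placements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T_STGDV (BK1 TEXT, BK2 TEXT);
CREATE TABLE T_HUB1 (ID1 INTEGER, BK1 TEXT);
CREATE TABLE T_HUB2 (ID2 INTEGER, BK2 TEXT);
CREATE TABLE T_LINK (IDL INTEGER, ID1 INTEGER, ID2 INTEGER);
-- Duplicate staging row and duplicate link row to exercise the DISTINCT.
INSERT INTO T_STGDV VALUES ('a', 'x'), ('a', 'x'), ('b', 'y'), ('c', 'z');
INSERT INTO T_HUB1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO T_HUB2 VALUES (10, 'x'), (20, 'y'), (30, 'z');
INSERT INTO T_LINK VALUES (100, 2, 20), (101, 2, 20);
""")

QUERY_1 = """
SELECT STG.ID1, STG.ID2
FROM (SELECT DISTINCT H1.ID1, H2.ID2
      FROM T_STGDV STG
      INNER JOIN T_HUB1 H1 ON STG.BK1 = H1.BK1
      INNER JOIN T_HUB2 H2 ON STG.BK2 = H2.BK2) STG
LEFT OUTER JOIN T_LINK L ON L.ID1 = STG.ID1 AND L.ID2 = STG.ID2
WHERE L.IDL IS NULL
"""

QUERY_2 = """
SELECT DISTINCT H1.ID1, H2.ID2
FROM T_STGDV STG
INNER JOIN T_HUB1 H1 ON STG.BK1 = H1.BK1
INNER JOIN T_HUB2 H2 ON STG.BK2 = H2.BK2
LEFT OUTER JOIN T_LINK L ON L.ID1 = H1.ID1 AND L.ID2 = H2.ID2
WHERE L.IDL IS NULL
"""

def rows(sql: str) -> list:
    """Run a query and return all rows sorted."""
    return sorted(conn.execute(sql).fetchall())

# "query 1 MINUS query 2" and the flipped version, with EXCEPT for MINUS.
diff_1_2 = rows(f"SELECT * FROM ({QUERY_1}) EXCEPT SELECT * FROM ({QUERY_2})")
diff_2_1 = rows(f"SELECT * FROM ({QUERY_2}) EXCEPT SELECT * FROM ({QUERY_1})")
print(rows(QUERY_1), diff_1_2, diff_2_1)
```

An empty difference in both directions only proves equivalence for that particular dataset. The general case is covered by the argument about the WHERE clause.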
This might be the difference: look at these lines in your execution plans:
2 - access("L"."ID2"(+)="STG"."ID2" AND "L"."ID1"(+)="STG"."ID1")
and
3 - access("L"."ID2"(+)="H2"."ID2" AND "L"."ID1"(+)="H1"."ID1")
STG is a temporary table created by Oracle for the duration of the query (that ambiguity between the T_STGDV alias and the subquery alias was reason enough on its own to rewrite the query), and this temporary table is, of course, unindexed. After your refactoring, the Oracle optimiser starts joining T_LINK with H1 and H2 instead of a temporary table, which allows it to use the indexes built on those tables, thus giving you the 20x increase in speed.
After testing, they give the same result, and the second one is more efficient.

sql query optimisation

Please compare the following:
INNER JOIN table1 t1 ON t1.someID LIKE 'search.%' AND
t1.someID = ( 'search.' || t0.ID )
vs.
INNER JOIN table1 t1 ON t1.someID = ( 'search.' || t0.ID )
I've been told that the first case is optimized, but I cannot understand why. As far as I understand, the second example should run faster.
We use Oracle, but I suppose it does not matter at the moment.
Please explain if I'm wrong.
Thank you
So, here is the explain plan for a query which joins on just the concatenated string:
SQL> explain plan for
2 select e.* from emp e
3 join big_table bt on bt.col2 = 'search'||trim(to_char(e.empno))
4 /
Explained.
SQL> select * from table(dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 179424166
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1052 | 65224 | 43 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1052 | 65224 | 43 (0)| 00:00:01 |
| 2 | TABLE ACCESS FULL| EMP | 20 | 780 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | BIG_VC_I | 53 | 1219 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("BT"."COL2"='search'||TRIM(TO_CHAR("E"."EMPNO")))
15 rows selected.
SQL>
Compare and contrast with the plan for a query which includes the LIKE clause in its join:
SQL> explain plan for
2 select e.* from emp e
3 join big_table bt on (bt.col2 like 'search%'
4 and bt.col2 = 'search'||trim(to_char(e.empno)))
5 /
Explained.
SQL> select * from table(dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 179424166
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 62 | 5 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 62 | 5 (0)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| EMP | 1 | 39 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | BIG_VC_I | 1 | 23 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter('search'||TRIM(TO_CHAR("E"."EMPNO")) LIKE 'search%')
3 - access("BT"."COL2"='search'||TRIM(TO_CHAR("E"."EMPNO")))
filter("BT"."COL2" LIKE 'search%')
17 rows selected.
SQL>
The cost of the second query is much lower than the first. But this is because the optimizer estimates that the second query will return far fewer rows than the first. More information allows the database to make a more accurate prediction. (In fact, the query will return no rows.)
Of course this does presume the joined column is indexed, otherwise it won't make any difference.
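The equality already forces t1.someID to start with 'search.', so the extra LIKE cannot change which rows are returned. It only gives the optimizer more information for its estimates. Below is a minimal check with Python's sqlite3 module on hypothetical rows. SQLite's planner is not Oracle's, so this says nothing about cost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t0 (ID INTEGER);
CREATE TABLE table1 (someID TEXT);
-- Index on the joined column, as the answer presumes.
CREATE INDEX table1_someid ON table1 (someID);
INSERT INTO t0 VALUES (1), (2), (3);
INSERT INTO table1 VALUES ('search.1'), ('search.3'), ('other.2'), ('search.x');
""")

EQUALITY_ONLY = """
SELECT t0.ID, t1.someID FROM t0
INNER JOIN table1 t1 ON t1.someID = ('search.' || t0.ID)
"""

WITH_LIKE = """
SELECT t0.ID, t1.someID FROM t0
INNER JOIN table1 t1 ON t1.someID LIKE 'search.%'
                    AND t1.someID = ('search.' || t0.ID)
"""

eq_rows = sorted(conn.execute(EQUALITY_ONLY).fetchall())
like_rows = sorted(conn.execute(WITH_LIKE).fetchall())
print(eq_rows == like_rows, eq_rows)
```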
The other thing to bear in mind is that the columns which are queried can affect the plan. This version selects from BIG_TABLE rather than EMP.
SQL> explain plan for
2 select bt.* from emp e
3 join big_table bt on (bt.col2 like 'search%'
4 and bt.col2 = 'search'||trim(to_char(e.empno)))
5 /
Explained.
SQL> select * from table(dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------
Plan hash value: 4042413806
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 46 | 4 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 1 | 46 | 4 (0)| 00:00:01 |
|* 3 | INDEX FULL SCAN | PK_EMP | 1 | 4 | 1 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | BIG_VC_I | 1 | | 2 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| BIG_TABLE | 1 | 42 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter('search'||TRIM(TO_CHAR("E"."EMPNO")) LIKE 'search%')
4 - access("BT"."COL2"='search'||TRIM(TO_CHAR("E"."EMPNO")))
filter("BT"."COL2" LIKE 'search%')
19 rows selected.
SQL>
The query analysis of the various database engines would really tell the story, but my first instinct is that the first form is in fact optimized. The reason is that the compiler cannot guess the result of the concatenation. It must do more work to determine the value to match against, which would likely result in a table scan. The first form still has to do that; however, it can narrow the result set using the LIKE operator first (presuming an index exists on the someID column) and thus has to do fewer concatenations.