I have 3 tables which I am using in subqueries like the one below.
INSERT INTO TABLE_A (SELECT * FROM TABLE_B WHERE (X,Y) NOT IN (SELECT X,Y FROM TABLE_A));
But the query took ages to run, as the tables have 400,000 to 500,000 rows.
The query below, by contrast, doesn't take much time.
INSERT INTO TABLE_A (SELECT * FROM TABLE_B WHERE X||Y NOT IN (SELECT X||Y FROM TABLE_A));
After seeing the difference in execution time, I am no longer sure the two queries are equivalent.
Why is one slower than the other?
Are these queries the same?
The key here is the "INTERNAL_FUNCTION" in the plan, which probably means you are comparing columns with different data types (which is always a bad idea).
For example:
SQL> create table t1 ( x int, y int );
Table created.
SQL> create table t2 ( x varchar2(10), y varchar2(10));
Table created.
SQL>
SQL> insert into t1 select rownum,rownum from dual
2 connect by level <= 1000;
1000 rows created.
SQL>
SQL> insert into t2 select rownum,rownum from dual
2 connect by level <= 1000;
1000 rows created.
SQL> set autotrace traceonly explain
SQL> select *
2 from t1
3 where (x,y) not in (select x,y from t2);
Execution Plan
----------------------------------------------------------
Plan hash value: 2177415756
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000 | 16000 | 8 (25)| 00:00:01 |
| 1 | MERGE JOIN ANTI NA | | 1000 | 16000 | 8 (25)| 00:00:01 |
| 2 | SORT JOIN | | 1000 | 8000 | 4 (25)| 00:00:01 |
| 3 | TABLE ACCESS FULL| T1 | 1000 | 8000 | 3 (0)| 00:00:01 |
|* 4 | SORT UNIQUE | | 1000 | 8000 | 4 (25)| 00:00:01 |
| 5 | TABLE ACCESS FULL| T2 | 1000 | 8000 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access(INTERNAL_FUNCTION("X")=TO_NUMBER("X") AND
INTERNAL_FUNCTION("Y")=TO_NUMBER("Y"))
filter(INTERNAL_FUNCTION("Y")=TO_NUMBER("Y") AND
INTERNAL_FUNCTION("X")=TO_NUMBER("X"))
We had to clean up the data types before we could do a proper join, and hence we did not use the hash join.
When you changed this to concatenation, that operator made everything a string, and hence a hash join could be used.
SQL> select *
2 from t1
3 where (x||y) not in (select x||y from t2);
Execution Plan
----------------------------------------------------------
Plan hash value: 1275484728
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 990 | 15840 | 6 (0)| 00:00:01 |
|* 1 | HASH JOIN ANTI NA | | 990 | 15840 | 6 (0)| 00:00:01 |
| 2 | TABLE ACCESS FULL| T1 | 1000 | 8000 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| T2 | 1000 | 8000 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access(TO_CHAR("X")||TO_CHAR("Y")="X"||"Y")
SQL>
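But as William has pointed out, you run the risk of inaccurate results: different (X, Y) pairs can concatenate to the same string. For example, (1, 23) and (12, 3) both give '123'.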
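To make that risk concrete, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for Oracle (an assumption made only so the example runs anywhere; the sample rows are invented). It compares the pairwise NOT IN with the concatenated NOT IN on two pairs that concatenate to the same string.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (x INTEGER, y INTEGER);
    CREATE TABLE table_b (x INTEGER, y INTEGER);
    INSERT INTO table_a VALUES (12, 3);
    INSERT INTO table_b VALUES (12, 3), (1, 23);
""")

# Pairwise comparison: (1, 23) is not in table_a, so it is returned.
pair_rows = conn.execute(
    "SELECT x, y FROM table_b "
    "WHERE (x, y) NOT IN (SELECT x, y FROM table_a)"
).fetchall()

# Concatenated comparison: both (12, 3) and (1, 23) become '123',
# so (1, 23) is wrongly treated as already present.
concat_rows = conn.execute(
    "SELECT x, y FROM table_b "
    "WHERE (x || y) NOT IN (SELECT x || y FROM table_a)"
).fetchall()

print("pairwise:", pair_rows)
print("concatenated:", concat_rows)
conn.close()
```

If concatenation is used at all, putting a separator that cannot occur in the data between the two values avoids this particular collision, but comparing the columns as a pair is the safer form.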
I am puzzled by the following case and cannot work out why it behaves this way.
Consider the following SQL:
create table t1(tid int not null, t1 int not null);
create table t2(t2 int not null, tname varchar(30) null);
create unique index i_t2 on t2(t2);
create or replace view v_1 as
select t1.tid,t1.t1,max(t2.tname) as tname
from t1 left join t2
on t1.t1 = t2.t2
group by t1.tid,t1.t1;
Then check the execution plan for select count(1) from v_1. Table t2 is eliminated by the optimizer:
SQL> select count(1) from v_1;
Execution Plan
----------------------------------------------------------
Plan hash value: 3243658773
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 3 (34)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | | |
| 2 | VIEW | VM_NWVW_0 | 1 | | 3 (34)| 00:00:01 |
| 3 | HASH GROUP BY | | 1 | 26 | 3 (34)| 00:00:01 |
| 4 | TABLE ACCESS FULL| T1 | 1 | 26 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------
But if the index i_t2 is dropped, or recreated without the unique attribute,
table t2 is not eliminated from the execution plan:
SQL> drop index i_t2;
Index dropped.
SQL> select count(1) from v_1;
Execution Plan
----------------------------------------------------------
Plan hash value: 2710188186
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 5 (20)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | | |
| 2 | VIEW | VM_NWVW_0 | 1 | | 5 (20)| 00:00:01 |
| 3 | HASH GROUP BY | | 1 | 39 | 5 (20)| 00:00:01 |
|* 4 | HASH JOIN OUTER | | 1 | 39 | 4 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL| T1 | 1 | 26 | 2 (0)| 00:00:01 |
| 6 | TABLE ACCESS FULL| T2 | 1 | 13 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
It seems that even when the index is removed,
the result of select count(1) from v_1 is still equal to
select count(1) from (select tid,t1 from t1 group by tid,t1)
Why does the optimizer not eliminate t2 in the second case?
Is there any principle or actual data example describing this?
Thanks :)
This is an optimization called join elimination. Because t2.t2 is unique, the optimizer knows that every row retrieved from t1 can match at most one row in t2. Since nothing is projected from t2, there is no need to perform the join.
If you do
select tid, t1 from v_1;
you will see that we do not perform the join. However, if we project from t2, then the join is needed.
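Why uniqueness matters can be seen from row counts alone. The sketch below uses Python's built-in sqlite3 module purely as a convenient stand-in (an assumption: it does not reproduce Oracle's optimizer, and the sample rows and the helper name outer_join_count are made up). With a duplicate key in t2, the outer join returns more rows than t1 holds; with unique keys, each t1 row matches at most one t2 row and the join cannot change the row count.

```python
import sqlite3

def outer_join_count(t2_rows):
    """Return (rows in t1, rows produced by t1 LEFT JOIN t2) for the given t2 rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t1 (tid INTEGER NOT NULL, t1 INTEGER NOT NULL);
        CREATE TABLE t2 (t2 INTEGER NOT NULL, tname TEXT);
        INSERT INTO t1 VALUES (1, 10), (2, 20);
    """)
    conn.executemany("INSERT INTO t2 VALUES (?, ?)", t2_rows)
    base = conn.execute("SELECT COUNT(*) FROM t1").fetchone()[0]
    joined = conn.execute(
        "SELECT COUNT(*) FROM t1 LEFT JOIN t2 ON t1.t1 = t2.t2"
    ).fetchone()[0]
    conn.close()
    return base, joined

# Unique join keys in t2: the join leaves the row count unchanged.
unique_keys = outer_join_count([(10, "a"), (20, "b")])
# Key 10 appears twice in t2: the join now produces an extra row.
duplicate_keys = outer_join_count([(10, "a"), (10, "b"), (20, "c")])

print("unique keys:", unique_keys)
print("duplicate keys:", duplicate_keys)
```

In the view from the question the GROUP BY would collapse the extra row again. The answer above, however, ties the elimination to the unique index, which is consistent with the second plan keeping the join.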
I am running the exact same join query using two different tables, but the first one (table A) times out whereas the second (table B) does not.
SELECT * FROM table_X
INNER JOIN table_A
ON table_A.point_origin = table_X.item_id
WHERE ROWNUM < 10;
SELECT * FROM table_X
INNER JOIN table_B
ON table_B.point_origin = table_X.item_id
WHERE ROWNUM < 10;
As far as I know, table A is a subset of table B. Neither table A nor table B has point_origin indexed.
(Edit for clarification: table A is only a subset of table B in terms of row identifiers, not in terms of exact column data.)
For what it's worth, I'm dealing with very large tables and item_id is indexed.
Is there anything else that would affect performance here or am I definitely wrong about some information provided?
Edit: Additional information per a comment below
table_A:
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Pstart| Pstop |
---------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9 | 4743 | 12 (0)| | |
|* 1 | COUNT STOPKEY | | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| table_X | 1 | 227 | 1 (0)| | |
| 3 | NESTED LOOPS | | 11 | 5797 | 12 (0)| | |
| 4 | PARTITION RANGE ALL | | 10M| 2969M| 2 (0)| 1 | 4 |
| 5 | TABLE ACCESS FULL | table_A | 10M| 2969M| 2 (0)| 1 | 4 |
|* 6 | INDEX RANGE SCAN | table_X_IP_PK | 1 | | 1 (0)| | |
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<10)
6 - access("table_A"."POINT_ORIGIN"="table_X"."ITEM_ID")
Note
-----
- 'PLAN_TABLE' is old version
table_B:
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9 | 3879 | 11 (0)|
|* 1 | COUNT STOPKEY | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| table_X | 1 | 227 | 1 (0)|
| 3 | NESTED LOOPS | | 10 | 4310 | 11 (0)|
| 4 | TABLE ACCESS FULL | table_B | 118M| 22G| 2 (0)|
|* 5 | INDEX RANGE SCAN | table_X_IP_PK | 1 | | 1 (0)|
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<10)
5 - access("table_B"."POINT_ORIGIN"="table_X"."ITEM_ID")
Note
-----
- 'PLAN_TABLE' is old version
It appears that table_a is partitioned and that the query scans its 4 partitions, while table_b is not partitioned and must be read in its entirety. The optimizer estimates that the 4 partitions of table_a hold 10 million rows while table_b holds 118 million rows. You're using a nested loop, so you'd expect roughly O(n) performance; based on those statistics, it would make sense for the second query to take about 11.8 times as long as the first.
Are the optimizer's estimates accurate? The optimizer is only as good as the statistics you've given it and it is possible that one or both tables have stale statistics.
I hope someone can explain the performance of joining multiple tables vs. using MINUS to eliminate records. I looked at a few other Stack Overflow questions but didn't see what I was looking for.
I thought these two queries would produce the same output, and I have always heard "use joins, use joins!", particularly in Stack Overflow posts, on the grounds that joins are expected to be faster...
This is the first query I ran which I thought would be much slower, but it takes only a matter of minutes to run...
select some_id
from table1
MINUS
select some_id
from table2
where table2.value = 'some_value'
MINUS
select some_id
from table3
where table3.value = 'some_value'
group by some_id
This is the second query which I thought would be faster, but it has been running for over 3 hours now (with no end in sight?)
select some_id
from table1
join table2 on table1.id=table2.id
join table3 on table1.id=table3.id
where table2.value = 'some_value'
or table3.value = 'some_value'
group by some_id
I should note all 3 tables have > 1 Million records, up to 15 Million records each.
EDIT:
Sorry - I meant to let you know I was avoiding the use of NOT EXISTS in this question as a response, as I really am curious about just these two scenarios.
Try this version:
select some_id
from table1 t1
where not exists (select 1 from table2 t2 where t1.id = t2.id and t2.value = 'some_value') or
not exists (select 1 from table3 t3 where t1.id = t3.id and t3.value = 'some_value')
For best performance, you want indexes on table2(id, value) and table3(id, value).
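It is also worth checking whether the two queries in the question ask for the same thing. A minimal sketch with Python's sqlite3 module as a stand-in (assumptions: SQLite spells MINUS as EXCEPT, some_id doubles as the join key for brevity, and the rows are invented). The MINUS chain keeps ids that have no 'some_value' row in table2 and none in table3, whereas the inner join keeps only ids present in all three tables with a matching value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (some_id INTEGER);
    CREATE TABLE table2 (some_id INTEGER, value TEXT);
    CREATE TABLE table3 (some_id INTEGER, value TEXT);
    INSERT INTO table1 VALUES (1), (2), (3), (4);
    INSERT INTO table2 VALUES (1, 'some_value'), (2, 'other');
    INSERT INTO table3 VALUES (3, 'some_value'), (1, 'other');
""")

# Set difference (EXCEPT is SQLite's spelling of Oracle's MINUS).
minus_ids = sorted(r[0] for r in conn.execute("""
    SELECT some_id FROM table1
    EXCEPT
    SELECT some_id FROM table2 WHERE value = 'some_value'
    EXCEPT
    SELECT some_id FROM table3 WHERE value = 'some_value'
"""))

# Inner joins with OR, shaped like the second query in the question.
join_ids = sorted(r[0] for r in conn.execute("""
    SELECT table1.some_id
    FROM table1
    JOIN table2 ON table1.some_id = table2.some_id
    JOIN table3 ON table1.some_id = table3.some_id
    WHERE table2.value = 'some_value' OR table3.value = 'some_value'
    GROUP BY table1.some_id
"""))

# Two NOT EXISTS tests combined with AND.
not_exists_ids = sorted(r[0] for r in conn.execute("""
    SELECT t1.some_id
    FROM table1 t1
    WHERE NOT EXISTS (SELECT 1 FROM table2 t2
                      WHERE t2.some_id = t1.some_id AND t2.value = 'some_value')
      AND NOT EXISTS (SELECT 1 FROM table3 t3
                      WHERE t3.some_id = t1.some_id AND t3.value = 'some_value')
"""))

print("MINUS:", minus_ids)
print("JOIN:", join_ids)
print("NOT EXISTS with AND:", not_exists_ids)
conn.close()
```

On this sample data the join returns a different set of ids from the MINUS chain, so comparing their run times is not a like-for-like comparison. It is AND, not OR, between the two NOT EXISTS tests that matches the MINUS result here.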
First, make sure you have the indexes in place.
Look at the execution plan: if it is using full table scans, go ahead and create the indexes; otherwise the query is going to take a long, long time.
If you have PL/SQL Developer, paste the query into a SQL window and press F5 to get the explain plan.
Alternatively, you can do this:
SCOTT#research 17-APR-15> EXPLAIN PLAN FOR
2 select empno
3 from emp
4 MINUS
5 select empno
6 from empp
7 where empp.empno = '7839'
8 MINUS
9 select empno
10 from emppp
11 where emppp.empno = '7902'
12 group by empno
13 ;
Explained.
SCOTT#research 17-APR-15> SET LINESIZE 130
SCOTT#research 17-APR-15> SET PAGESIZE 0
SCOTT#research 17-APR-15> SELECT *
2 FROM TABLE(DBMS_XPLAN.DISPLAY);
Plan hash value: 4222598102
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 82 | 10 (90)| 00:00:01 |
| 1 | MINUS | | | | | |
| 2 | MINUS | | | | | |
| 3 | SORT UNIQUE NOSORT | | 14 | 56 | 2 (50)| 00:00:01 |
| 4 | INDEX FULL SCAN | PK_EMP | 14 | 56 | 1 (0)| 00:00:01 |
| 5 | SORT UNIQUE NOSORT | | 1 | 13 | 4 (25)| 00:00:01 |
|* 6 | TABLE ACCESS FULL | EMPP | 1 | 13 | 3 (0)| 00:00:01 |
| 7 | SORT UNIQUE NOSORT | | 1 | 13 | 4 (25)| 00:00:01 |
| 8 | SORT GROUP BY NOSORT| | 1 | 13 | 4 (25)| 00:00:01 |
|* 9 | TABLE ACCESS FULL | EMPPP | 1 | 13 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
6 - filter("EMPP"."EMPNO"=7839)
9 - filter("EMPPP"."EMPNO"=7902)
Note
-----
- dynamic sampling used for this statement (level=2)
26 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2137789089
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 | 16336 | 29 (0)| 00:00:01 |
| 1 | COLLECTION ITERATOR PICKLER FETCH| DISPLAY | 8168 | 16336 | 29 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Or, if you want to use autotrace, run:
set autotrace on explain
This is how it would look:
SCOTT#research 17-APR-15> select empno
2 from emp
3 MINUS
4 select empno
5 from empp
6 where empp.empno = '7839'
7 MINUS
8 select empno
9 from emppp
10 where emppp.empno = '7902'
11 group by empno
12 ;
EMPNO
----------
234
7499
7521
7566
7654
7698
7782
7788
7844
7876
7900
7934
12 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 4222598102
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 82 | 10 (90)| 00:00:01 |
| 1 | MINUS | | | | | |
| 2 | MINUS | | | | | |
| 3 | SORT UNIQUE NOSORT | | 14 | 56 | 2 (50)| 00:00:01 |
| 4 | INDEX FULL SCAN | PK_EMP | 14 | 56 | 1 (0)| 00:00:01 |
| 5 | SORT UNIQUE NOSORT | | 1 | 13 | 4 (25)| 00:00:01 |
|* 6 | TABLE ACCESS FULL | EMPP | 1 | 13 | 3 (0)| 00:00:01 |
| 7 | SORT UNIQUE NOSORT | | 1 | 13 | 4 (25)| 00:00:01 |
| 8 | SORT GROUP BY NOSORT| | 1 | 13 | 4 (25)| 00:00:01 |
|* 9 | TABLE ACCESS FULL | EMPPP | 1 | 13 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
6 - filter("EMPP"."EMPNO"=7839)
9 - filter("EMPPP"."EMPNO"=7902)
Note
-----
- dynamic sampling used for this statement (level=2)
SCOTT#research 17-APR-15>
SCOTT#research 17-APR-15> select emp.empno
2 from emp
3 join empp on emp.empno=empp.empno
4 join emppp on emp.empno=emppp.empno
5 where empp.empno = '7839'
6 or emppp.empno = '7902'
7 group by emp.empno
8 ;
EMPNO
----------
7839
7902
Execution Plan
----------------------------------------------------------
Plan hash value: 1435156579
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 30 | 8 (25)| 00:00:01 |
| 1 | HASH GROUP BY | | 1 | 30 | 8 (25)| 00:00:01 |
|* 2 | HASH JOIN | | 1 | 30 | 7 (15)| 00:00:01 |
| 3 | NESTED LOOPS | | 6 | 102 | 3 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| EMPPP | 6 | 78 | 3 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN| PK_EMP | 1 | 4 | 0 (0)| 00:00:01 |
| 6 | TABLE ACCESS FULL | EMPP | 10 | 130 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("EMP"."EMPNO"="EMPP"."EMPNO")
filter("EMPP"."EMPNO"=7839 OR "EMPPP"."EMPNO"=7902)
5 - access("EMP"."EMPNO"="EMPPP"."EMPNO")
Note
-----
- dynamic sampling used for this statement (level=2)
I have two queries that do the same job:
SELECT * FROM student_info
INNER JOIN class
ON student_info.id = class.studentId
WHERE student_info.name = 'Ken'
SELECT * FROM (SELECT * FROM student_info WHERE name = 'Ken') studInfo
INNER JOIN class
ON studInfo.id = class.studentId
Which one is faster? I would guess the second, but I am not sure. I am using Oracle 11g.
UPDATED:
My tables are not indexed, and I can confirm that the two PLAN_TABLE_OUTPUTs are almost the same.
In the latest versions of Oracle, the optimizer is smart enough to do its job. So it won't matter: both of your queries will be internally optimized to do the task efficiently. The optimizer might rewrite the query and choose an efficient execution plan.
Let's understand this with a small example of EMP and DEPT table. I will use two similar queries like yours in the question.
I will take two cases, first a predicate having a non-indexed column, second with an indexed column.
Case 1 - predicate having a non-indexed column
SQL> explain plan for
2 SELECT * FROM emp e
3 INNER JOIN dept d
4 ON e.deptno = d.deptno
5 where ename = 'SCOTT';
Explained.
SQL>
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
Plan hash value: 3625962092
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 59 | 4 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 1 | 59 | 4 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL | EMP | 1 | 39 | 3 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | PK_DEPT | 1 | | 0 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| DEPT | 1 | 20 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("E"."ENAME"='SCOTT')
4 - access("E"."DEPTNO"="D"."DEPTNO")
Note
-----
- this is an adaptive plan
22 rows selected.
SQL>
SQL> explain plan for
2 SELECT * FROM (SELECT * FROM emp WHERE ename = 'SCOTT') e
3 INNER JOIN dept d
4 ON e.deptno = d.deptno;
Explained.
SQL>
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
Plan hash value: 3625962092
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 59 | 4 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 1 | 59 | 4 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL | EMP | 1 | 39 | 3 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | PK_DEPT | 1 | | 0 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| DEPT | 1 | 20 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("ENAME"='SCOTT')
4 - access("EMP"."DEPTNO"="D"."DEPTNO")
Note
-----
- this is an adaptive plan
22 rows selected.
SQL>
Case 2 - predicate having an indexed column
SQL> explain plan for
2 SELECT * FROM emp e
3 INNER JOIN dept d
4 ON e.deptno = d.deptno
5 where empno = 7788;
Explained.
SQL>
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
Plan hash value: 2385808155
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 59 | 2 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 59 | 2 (0)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| EMP | 1 | 39 | 1 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | PK_EMP | 1 | | 0 (0)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| DEPT | 1 | 20 | 1 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | PK_DEPT | 1 | | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("E"."EMPNO"=7788)
5 - access("E"."DEPTNO"="D"."DEPTNO")
18 rows selected.
SQL>
SQL> explain plan for
2 SELECT * FROM (SELECT * FROM emp where empno = 7788) e
3 INNER JOIN dept d
4 ON e.deptno = d.deptno;
Explained.
SQL>
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
Plan hash value: 2385808155
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 59 | 2 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 59 | 2 (0)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| EMP | 1 | 39 | 1 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | PK_EMP | 1 | | 0 (0)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| DEPT | 1 | 20 | 1 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | PK_DEPT | 1 | | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("EMPNO"=7788)
5 - access("EMP"."DEPTNO"="D"."DEPTNO")
18 rows selected.
SQL>
Is there any difference between the explain plans in each case respectively? No.
You'd need to show us the query plans and the execution statistics to be certain. That said, assuming name is indexed and statistics are reasonably accurate, I'd be shocked if the two queries didn't generate the same plan (and, thus, the same performance). With either query, Oracle is free to evaluate the predicate before or after it evaluates the join so it is unlikely that it would choose differently in the two cases.
I would definitely lean towards the first query.
When selects are nested, Oracle has fewer optimization opportunities. It generally has to evaluate the inner select into a temporary view and then apply the outer select to that. That is rarely faster than a JOIN where Oracle will evaluate everything together.
Showing your EXPLAIN PLAN would provide extra info for us as well.
Please compare the following:
INNER JOIN table1 t1 ON t1.someID LIKE 'search.%' AND
t1.someID = ( 'search.' || t0.ID )
vs.
INNER JOIN table1 t1 ON t1.someID = ( 'search.' || t0.ID )
I've been told that the first case is optimized, but I cannot understand why. As far as I understand, the second example should run faster.
We use Oracle, but I suppose it does not matter at the moment.
Please explain if I'm wrong.
Thank you
So, here is the explain plan for a query which joins on just the concatenated string:
SQL> explain plan for
2 select e.* from emp e
3 join big_table bt on bt.col2 = 'search'||trim(to_char(e.empno))
4 /
Explained.
SQL> select * from table(dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 179424166
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1052 | 65224 | 43 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1052 | 65224 | 43 (0)| 00:00:01 |
| 2 | TABLE ACCESS FULL| EMP | 20 | 780 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | BIG_VC_I | 53 | 1219 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("BT"."COL2"='search'||TRIM(TO_CHAR("E"."EMPNO")))
15 rows selected.
SQL>
Compare and contrast with the plan for a query which includes the LIKE clause in its join:
SQL> explain plan for
2 select e.* from emp e
3 join big_table bt on (bt.col2 like 'search%'
4 and bt.col2 = 'search'||trim(to_char(e.empno)))
5 /
Explained.
SQL> select * from table(dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 179424166
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 62 | 5 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 62 | 5 (0)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| EMP | 1 | 39 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | BIG_VC_I | 1 | 23 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter('search'||TRIM(TO_CHAR("E"."EMPNO")) LIKE 'search%')
3 - access("BT"."COL2"='search'||TRIM(TO_CHAR("E"."EMPNO")))
filter("BT"."COL2" LIKE 'search%')
17 rows selected.
SQL>
The cost of the second query is much lower than the first. But this is because the optimizer is estimating that the second query will return far fewer rows than the first query. More information allows the database to make a more accurate prediction. (In fact, the query will return no rows.)
Of course this does presume the joined column is indexed, otherwise it won't make any difference.
The other thing to bear in mind is that the columns which are queried can affect the plan. This version selects from BIG_TABLE rather than EMP.
SQL> explain plan for
2 select bt.* from emp e
3 join big_table bt on (bt.col2 like 'search%'
4 and bt.col2 = 'search'||trim(to_char(e.empno)))
5 /
Explained.
SQL> select * from table(dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------
Plan hash value: 4042413806
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 46 | 4 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 1 | 46 | 4 (0)| 00:00:01 |
|* 3 | INDEX FULL SCAN | PK_EMP | 1 | 4 | 1 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | BIG_VC_I | 1 | | 2 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| BIG_TABLE | 1 | 42 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter('search'||TRIM(TO_CHAR("E"."EMPNO")) LIKE 'search%')
4 - access("BT"."COL2"='search'||TRIM(TO_CHAR("E"."EMPNO")))
filter("BT"."COL2" LIKE 'search%')
19 rows selected.
SQL>
The query analysis of the various database engines would really tell the story, but my first instinct would be that the first form is in fact optimized. The reason is that the compiler cannot guess as to the results of the concatenation. It must do more work to determine the value against which to do the match, and that would likely result in a table scan. The first form still must do that; however, it is able to narrow the result set using the LIKE operator first (presuming an index exists on the someID column) and thus has to do fewer concatenations.