Please compare the following:
INNER JOIN table1 t1 ON t1.someID LIKE 'search.%' AND
t1.someID = ( 'search.' || t0.ID )
vs.
INNER JOIN table1 t1 ON t1.someID = ( 'search.' || t0.ID )
I've been told, that the first case is optimized. But you know, I can not understand why it is. As far as I understand the 2nd example should run faster.
We use Oracle, but I suppose it does not matter at the moment.
Please explain if I'm wrong.
Thank you
So, here is the explain plan for a query which joins on just the concatenated string:
SQL> explain plan for
2 select e.* from emp e
3 join big_table bt on bt.col2 = 'search'||trim(to_char(e.empno))
4 /
Explained.
SQL> select * from table(dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 179424166
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1052 | 65224 | 43 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1052 | 65224 | 43 (0)| 00:00:01 |
| 2 | TABLE ACCESS FULL| EMP | 20 | 780 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | BIG_VC_I | 53 | 1219 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("BT"."COL2"='search'||TRIM(TO_CHAR("E"."EMPNO")))
15 rows selected.
SQL>
Compare and contrast with the plan for a query which includes the LIKE clause in its join:
SQL> explain plan for
2 select e.* from emp e
3 join big_table bt on (bt.col2 like 'search%'
4 and bt.col2 = 'search'||trim(to_char(e.empno)))
5 /
Explained.
SQL> select * from table(dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 179424166
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 62 | 5 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 62 | 5 (0)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| EMP | 1 | 39 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | BIG_VC_I | 1 | 23 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter('search'||TRIM(TO_CHAR("E"."EMPNO")) LIKE 'search%')
3 - access("BT"."COL2"='search'||TRIM(TO_CHAR("E"."EMPNO")))
filter("BT"."COL2" LIKE 'search%')
17 rows selected.
SQL>
The cost of the second query is much lower than the first. But this is because the optimizer is estimating that the second query will return far fewer rows than the first query. More information allows the database to make a more accurate prediction. (In fact the query will return no rows).
Of course this does presume the joined column is indexed, otherwise it won't make any difference.
The other thing to bear in mind is that the columns which are queried can affect the plan. This version selects from BIG_TABLE rather than EMP.
SQL> explain plan for
2 select bt.* from emp e
3 join big_table bt on (bt.col2 like 'search%'
4 and bt.col2 = 'search'||trim(to_char(e.empno)))
5 /
Explained.
SQL> select * from table(dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------
Plan hash value: 4042413806
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 46 | 4 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 1 | 46 | 4 (0)| 00:00:01 |
|* 3 | INDEX FULL SCAN | PK_EMP | 1 | 4 | 1 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | BIG_VC_I | 1 | | 2 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| BIG_TABLE | 1 | 42 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter('search'||TRIM(TO_CHAR("E"."EMPNO")) LIKE 'search%')
4 - access("BT"."COL2"='search'||TRIM(TO_CHAR("E"."EMPNO")))
filter("BT"."COL2" LIKE 'search%')
19 rows selected.
SQL>
The query analysis of the various database engines would really tell the story but my first instinct would be that the first form is in fact optimized. The reason is that the compiler cannot guess as the to results of the concatenation. It must do more work to determine the value against which to do the match and would likely result in a table scan. The first still must do that, however, it is able to narrow the resultset using the LIKE operator (presuming an index exists on the someID column) first and thus has to do fewer concatenations.
Related
I have 3 tables which I am using in sub queries like below.
INSERT INTO TABLE_A (SELECT * FROM TABLE_B WHERE (X,Y) NOT IN (SELECT X,Y FROM TABLE_A));
But it took ages to run the query as the tables have 400,000 to 500,000 rows.
Whereas, when I do the below query, it doesn't take much time.
INSERT INTO TABLE_A (SELECT * FROM TABLE_B WHERE X||Y NOT IN (SELECT X||Y FROM TABLE_A));
I got a doubt whether both are same or not after seeing the execution time.
Why is one slower than the other?
Are these queries same?
The key here is the "INTERNAL FUNCTION" in the plan, which probably means you are comparing columns with different data types (which is always a bad idea)
For example
SQL> create table t1 ( x int, y int );
Table created.
SQL> create table t2 ( x varchar2(10), y varchar2(10));
Table created.
SQL>
SQL> insert into t1 select rownum,rownum from dual
2 connect by level <= 1000;
1000 rows created.
SQL>
SQL> insert into t2 select rownum,rownum from dual
2 connect by level <= 1000;
1000 rows created.
SQL> set autotrace traceonly explain
SQL> select *
2 from t1
3 where (x,y) not in (select x,y from t2);
Execution Plan
----------------------------------------------------------
Plan hash value: 2177415756
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000 | 16000 | 8 (25)| 00:00:01 |
| 1 | MERGE JOIN ANTI NA | | 1000 | 16000 | 8 (25)| 00:00:01 |
| 2 | SORT JOIN | | 1000 | 8000 | 4 (25)| 00:00:01 |
| 3 | TABLE ACCESS FULL| T1 | 1000 | 8000 | 3 (0)| 00:00:01 |
|* 4 | SORT UNIQUE | | 1000 | 8000 | 4 (25)| 00:00:01 |
| 5 | TABLE ACCESS FULL| T2 | 1000 | 8000 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access(INTERNAL_FUNCTION("X")=TO_NUMBER("X") AND
INTERNAL_FUNCTION("Y")=TO_NUMBER("Y"))
filter(INTERNAL_FUNCTION("Y")=TO_NUMBER("Y") AND
INTERNAL_FUNCTION("X")=TO_NUMBER("X"))
We had to clean up the data types before we could do a proper join, and hence we did not use the hash join.
When you changed this to be concatentation, that operator made everything a string, and hence a hash join could be used.
SQL> select *
2 from t1
3 where (x||y) not in (select x||y from t2);
Execution Plan
----------------------------------------------------------
Plan hash value: 1275484728
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 990 | 15840 | 6 (0)| 00:00:01 |
|* 1 | HASH JOIN ANTI NA | | 990 | 15840 | 6 (0)| 00:00:01 |
| 2 | TABLE ACCESS FULL| T1 | 1000 | 8000 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| T2 | 1000 | 8000 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access(TO_CHAR("X")||TO_CHAR("Y")="X"||"Y")
SQL>
but as William has pointed out, you run the risk of inaccurate results.
I am running the exact same join query using two different tables, but the first one (table A) times out whereas the second (table B) does not.
SELECT * FROM table_X
INNER JOIN table_A
ON table_A.point_origin = table_X.item_id
WHERE ROWNUM < 10;
SELECT * FROM table_X
INNER JOIN table_B
ON table_B.point_origin = table_X.item_id
WHERE ROWNUM < 10;
As far as I know, table A is a subset of table B. Neither table A nor table B have point_origin indexed.
(Edit for clarification: table A is a only a subset of table B in terms of row identifiers, not in terms of exact column data.)
For what it's worth, I'm dealing with very large tables and item_id is indexed.
Is there anything else that would affect performance here or am I definitely wrong about some information provided?
Edit: Additional information per a comment below
table_A:
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Pstart| Pstop |
---------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9 | 4743 | 12 (0)| | |
|* 1 | COUNT STOPKEY | | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| table_X | 1 | 227 | 1 (0)| | |
| 3 | NESTED LOOPS | | 11 | 5797 | 12 (0)| | |
| 4 | PARTITION RANGE ALL | | 10M| 2969M| 2 (0)| 1 | 4 |
| 5 | TABLE ACCESS FULL | table_A | 10M| 2969M| 2 (0)| 1 | 4 |
|* 6 | INDEX RANGE SCAN | table_X_IP_PK | 1 | | 1 (0)| | |
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<10)
6 - access("table_A"."POINT_ORIGIN"="table_X"."ITEM_ID")
Note
-----
- 'PLAN_TABLE' is old version
table_B:
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9 | 3879 | 11 (0)|
|* 1 | COUNT STOPKEY | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| table_X | 1 | 227 | 1 (0)|
| 3 | NESTED LOOPS | | 10 | 4310 | 11 (0)|
| 4 | TABLE ACCESS FULL | table_B | 118M| 22G| 2 (0)|
|* 5 | INDEX RANGE SCAN | table_X_IP_PK | 1 | | 1 (0)|
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<10)
5 - access("table_B"."POINT_ORIGIN"="table_X"."ITEM_ID")
Note
-----
- 'PLAN_TABLE' is old version
It appears that table_a is partitioned and that the query only needs to scan 4 partitions while table_b is not partitioned and must be read in its entirety. The optimizer estimates that 4 partitions of table_a have 10 million rows while table_b has 118 million rows. You're using a nested loop so you'd expect O(n) performance so based on the statistics, it would make sense that the second query would take ~11.8 times as long as the first query.
Are the optimizer's estimates accurate? The optimizer is only as good as the statistics you've given it and it is possible that one or both tables have stale statistics.
I have 2 queries do the same job:
SELECT * FROM student_info
INNER JOIN class
ON student_info.id = class.studentId
WHERE student_info.name = 'Ken'
SELECT * FROM (SELECT * FROM student_info WHERE name = 'Ken') studInfo
INNER JOIN class
ON student_info.id = class.studentId
Which one is faster? I guess the second but not sure, I am using Oracle 11g.
UPDATED:
My tables are non-indexed and I confirm two PLAN_TABLE_OUTPUTs are almost same:
Full size image
In the latest versions of Oracle, the optimizer is smart enough to do its job. So it won't matter and both of your queries would be internally optimized to do the task efficiently. Optimizer might do a query re-write and opt an efficient execution plan.
Let's understand this with a small example of EMP and DEPT table. I will use two similar queries like yours in the question.
I will take two cases, first a predicate having a non-indexed column, second with an indexed column.
Case 1 - predicate having a non-indexed column
SQL> explain plan for
2 SELECT * FROM emp e
3 INNER JOIN dept d
4 ON e.deptno = d.deptno
5 where ename = 'SCOTT';
Explained.
SQL>
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
Plan hash value: 3625962092
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 59 | 4 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 1 | 59 | 4 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL | EMP | 1 | 39 | 3 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | PK_DEPT | 1 | | 0 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| DEPT | 1 | 20 | 1 (0)| 00:00:01 |
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("E"."ENAME"='SCOTT')
4 - access("E"."DEPTNO"="D"."DEPTNO")
Note
-----
- this is an adaptive plan
22 rows selected.
SQL>
SQL> explain plan for
2 SELECT * FROM (SELECT * FROM emp WHERE ename = 'SCOTT') e
3 INNER JOIN dept d
4 ON e.deptno = d.deptno;
Explained.
SQL>
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
Plan hash value: 3625962092
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 59 | 4 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 1 | 59 | 4 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL | EMP | 1 | 39 | 3 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | PK_DEPT | 1 | | 0 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| DEPT | 1 | 20 | 1 (0)| 00:00:01 |
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("ENAME"='SCOTT')
4 - access("EMP"."DEPTNO"="D"."DEPTNO")
Note
-----
- this is an adaptive plan
22 rows selected.
SQL>
Case 2 - predicate having an indexed column
SQL> explain plan for
2 SELECT * FROM emp e
3 INNER JOIN dept d
4 ON e.deptno = d.deptno
5 where empno = 7788;
Explained.
SQL>
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
Plan hash value: 2385808155
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 59 | 2 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 59 | 2 (0)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| EMP | 1 | 39 | 1 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | PK_EMP | 1 | | 0 (0)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| DEPT | 1 | 20 | 1 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | PK_DEPT | 1 | | 0 (0)| 00:00:01 |
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("E"."EMPNO"=7788)
5 - access("E"."DEPTNO"="D"."DEPTNO")
18 rows selected.
SQL>
SQL> explain plan for
2 SELECT * FROM (SELECT * FROM emp where empno = 7788) e
3 INNER JOIN dept d
4 ON e.deptno = d.deptno;
Explained.
SQL>
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
Plan hash value: 2385808155
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 59 | 2 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 59 | 2 (0)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| EMP | 1 | 39 | 1 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | PK_EMP | 1 | | 0 (0)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| DEPT | 1 | 20 | 1 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | PK_DEPT | 1 | | 0 (0)| 00:00:01 |
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("EMPNO"=7788)
5 - access("EMP"."DEPTNO"="D"."DEPTNO")
18 rows selected.
SQL>
Is there any difference between the explain plans in each case respectively? No.
You'd need to show us the query plans and the execution statistics to be certain. That said, assuming name is indexed and statistics are reasonably accurate, I'd be shocked if the two queries didn't generate the same plan (and, thus, the same performance). With either query, Oracle is free to evaluate the predicate before or after it evaluates the join so it is unlikely that it would choose differently in the two cases.
I would definitely lean towards the first query.
When selects are nested, Oracle has fewer optimization opportunities. It generally has to evaluate the inner select into a temporary view and then apply the outer select to that. That is rarely faster than a JOIN where Oracle will evaluate everything together.
Showing your EXPLAIN PLAN would provide extra info for us as well.
I'm using Oracle 11g, the main table has about 10m records. Here is my query:
SELECT COUNT (*)
FROM CONTACT c INNER JOIN STATUS S ON C.STATUS = S.STATUS
WHERE C.USER = 1 AND S.REQUIRE = 1 AND ROWNUM = 1;
The Cost is 3736, but when I changed it to this form:
SELECT COUNT (*) FROM
(SELECT 1 FROM CONTACT c INNER JOIN STATUS S ON C.STATUS = S.STATUS
WHERE C.USER = 1 AND S.REQUIRE = 1 AND ROWNUM = 1);
The Cost became 5! What's the difference between these 2 queries?
Here are the explain plan for both query:
The first query:
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 3736 (1)| 00:00:45 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
|* 2 | COUNT STOPKEY | | | | | |
| 3 | NESTED LOOPS | | 4627 | 46270 | 3736 (1)| 00:00:45 |
| 4 | TABLE ACCESS BY INDEX ROWID| CONTACT | 6610 | 33050 | 3736 (1)| 00:00:45 |
|* 5 | INDEX RANGE SCAN | IX_CONTACT_USR | 6610 | | 20 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | IX_CONTACT_STATUS | 1 | 5 | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(ROWNUM=1)
5 - access("C"."USER"=1)
6 - access("C"."STATUS"="S"."STATUS" AND "S"."REQUIRE"=1)
The second query:
-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 5 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | | |
| 2 | VIEW | | 1 | | 5 (0)| 00:00:01 |
|* 3 | COUNT STOPKEY | | | | | |
| 4 | NESTED LOOPS | | 2 | 20 | 5 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| CONTACT | 3 | 15 | 5 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | IX_CONTACT_USR | 6610 | | 3 (0)| 00:00:01 |
|* 7 | INDEX RANGE SCAN | IX_CONTACT_STATUS | 1 | 5 | 0 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(ROWNUM=1)
6 - access("C"."USER"=1)
7 - access("C"."STATUS"="S"."STATUS" AND "S"."REQUIRE"=1)
I executed 2 queries, the first one sometimes cost 45s+ (e.g. first run or change the user id), otherwise it will cost <1s. I totally don't know why it's such different, maybe db cache?
When I executed the second query, I can always get result in 1s. So I think the second one is better, but I don't the reason why it improves a lot.
You can see where the difference comes in by comparing the line in the execution plans that access the CONTACT table (looks at the rows column, the first one).
First:
| 4 | TABLE ACCESS BY INDEX ROWID| CONTACT | 6610 | 33050 | 3736 (1)| 00:00:45 |
Second:
| 5 | TABLE ACCESS BY INDEX ROWID| CONTACT | 3 | 15 | 5 (0)| 00:00:01 |
In the first example, the ROWNUM = 1 predicate isn't applied until after the CONTACT table has been accessed, so you're getting 6610 rows returned from this table. Whereas in your second query optimizer has only returned 3. This is many orders of magnitude less, which is why you're seeing the second query complete quicker.
As to why the second execution of the "slow" query is "fast", you're thinking is correct - the data has been loaded from disk into the buffer cache so access is much quicker.
Most likely that's just estimation difference and they will have same execution statistics. Trace both + tkprof to get real data.
Also if you want some more details behind optimizer logic - do hard parse with event 10053.
Cost is not only the factor for the queries, some times it depends on the server also, which u r showing is it a CPU cost or I/O Cost, some times cost my vary, because of the Column Cardinality, Conditions of the query. if u wanna see the much clarification on the queries, get the explain plan or TKPROOF, so that u 'll get to know , it's going for full table scan or which index is picking up and execution time.
In Oracle (10g), when I use a View (not Materialized View), does Oracle take into account the where clause when it executes the view?
Let's say I have:
MY_VIEW =
SELECT *
FROM PERSON P, ORDERS O
WHERE P.P_ID = O.P_ID
And I then execute the following:
SELECT *
FROM MY_VIEW
WHERE MY_VIEW.P_ID = '1234'
When this executes, does oracle first execute the query for the view and THEN filter it based on my where clause (where MY_VIEW.P_ID = '1234') or does it do this filtering as part of the execution of the view? If it does not do the latter, and P_ID had an index, would I also lose out on the indexing capability since Oracle would be executing my query against the view which doesn't have the index rather than the base table which has the index?
It will not execute the query first. If you have a index on P_ID, it will be used.
Execution plan is the same as if you would merge both view-code and WHERE-clause into a single select statement.
You can try this for yourself:
EXPLAIN PLAN FOR
SELECT *
FROM MY_VIEW
WHERE MY_VIEW.P_ID = '1234'
followed by
SELECT * FROM TABLE( dbms_xplan.display );
---------------------------------------------------------------------------------
|Id | Operation | Name |Rows| Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 52 | 2 (0)| 00:00:01|
| 1 | NESTED LOOPS | | 1 | 52 | 2 (0)| 00:00:01|
| 2 | TABLE ACCESS BY INDEX ROWID| PERSON | 1 | 26 | 2 (0)| 00:00:01|
| 3 | INDEX UNIQUE SCAN | PK_P | 1 | | 1 (0)| 00:00:01|
| 4 | TABLE ACCESS BY INDEX ROWID| ORDERS | 1 | 26 | 0 (0)| 00:00:01|
| 5 | INDEX RANGE SCAN | IDX_O | 1 | | 0 (0)| 00:00:01|
---------------------------------------------------------------------------------
WOW!! This is interesting.. I have two different explain plan depends on different data volumn & query inside logical view(This is my assumption)
The original question case : It is definitely doing filtering first.
I have small number of data(<10 in total) in this test table.
`
| 0 | SELECT STATEMENT | | 2 | 132 | 2 (0)|
00:00:01 |
| 1 | NESTED LOOPS | | 2 | 132 | 2 (0)|
00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| PERSON | 1 | 40 | 1 (0)|
00:00:01 |
|* 3 | INDEX UNIQUE SCAN | PERSON_PK | 1 | | 0 (0)|
00:00:01 |
|* 4 | INDEX RANGE SCAN | ORDERS_PK | 2 | 52 | 1 (0)|
00:00:01 |
Predicate Information (identified by operation id)
3 - access("P"."P_ID"=1)
4 - access("O"."P_ID"=1)
Note
dynamic sampling used for this statement
`
However, when the data become larger(just several hundreads though, 300 ~ 400) and the view query become complex(using "connect by" etc..), the plan changed, I think...
| 0 | SELECT STATEMENT | | 1 | 29 | 2
(0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 29 | 2
(0)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| RP_TRANSACTION | 1 | 12 | 1
(0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | RP_TRANSACTION_PK | 1 | | 0
(0)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| RP_REQUEST | 279 | 4743 | 1
(0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | RP_REQUEST_PK | 1 | | 0
(0)| 00:00:01 |
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("TRANSACTION_ID"=18516648)
5 - access("REQ"."REQUEST_ID"="TRANS"."REQUEST_ID")
---- Below is my original post
In my knowledge, the oracle first execute the view(logical view) using temporary space and then do the filter.. So your query is basically same as
SELECT *
FROM ( SELECT *
FROM PERSON P, ORDERS O
WHERE P.P_ID = O.P_ID
) where P_ID='1234'
I don't think you can create index on logical view(Materialized view uses index)
Also, you should be aware, you would execute the query for MY_VIEW, everytime you using
select *
from MY_VIEW
where P_ID = '1234'.
I mean every single time. Naturally, it is not a good idea for the performance matter