Unexpected query results in Oracle db - sql

We have an Oracle 12.2.0.1.0 database. We create a simple table like this:
CREATE TABLE TABLE1 (DATE1 TIMESTAMP (6));
INSERT INTO TABLE1 VALUES (TIMESTAMP'2018-05-30 00:00:00');
INSERT INTO TABLE1 VALUES (TIMESTAMP'2018-05-30 00:00:00');
When we query it with the following two SELECT statements, we get different results. The first one returns two rows as expected, while the second one returns none.
SELECT T1.*, NVL(T2.DATE1, TIMESTAMP'1900-01-01 00:00:00')
FROM TABLE1 T1
LEFT JOIN TABLE1 T2
ON 1 = 0
WHERE T1.DATE1 > NVL(T2.DATE1, TIMESTAMP'1900-01-01 00:00:00');
SELECT T1.*, NVL(T2.DATE1, TIMESTAMP'1900-01-01 00:00:00')
FROM TABLE1 T1
LEFT JOIN TABLE1 T2
ON T1.DATE1 || '---' = '-'
WHERE T1.DATE1 > NVL(T2.DATE1, TIMESTAMP'1900-01-01 00:00:00');
T1 and T2 are the same TABLE1; we are joining the table to itself.
Please advise why this happens. Thanks.

It seems the optimizer gets confused by the several levels of obfuscation in the join condition.
The first query results in the following execution plan:
SQL_ID 9k6g3m0xs31w7, child number 1
-------------------------------------
select t1.*, nvl(t2.date1, timestamp'1900-01-01 00:00:00') from table1
t1 left join table1 t2 on 1 = 0 where t1.date1 > nvl(t2.date1,
timestamp'1900-01-01 00:00:00')
Plan hash value: 963482612
-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 3 (100)| 2 |00:00:00.01 | 7 |
|* 1 | TABLE ACCESS FULL| TABLE1 | 1 | 2 | 26 | 3 (0)| 2 |00:00:00.01 | 7 |
-----------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$F7AF7B7D / T1#SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("T1"."DATE1">TIMESTAMP' 1900-01-01 00:00:00.000000000')
So the planner correctly sees that the self join is unnecessary and replaces the NVL() condition on the joined table with a condition on the column itself.
Apparently this "replacing" of the condition does not work correctly in 12.2.
The second query results in the following plan:
SQL_ID 3twykk3kcyyxy, child number 1
-------------------------------------
select t1.*, nvl(t2.date1, timestamp'1900-01-01 00:00:00') from table1
t1 left join table1 t2 on t1.date1 || '---' = '-' where t1.date1 >
nvl(t2.date1, timestamp'1900-01-01 00:00:00')
Plan hash value: 736255932
----------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 8 (100)| 0 |00:00:00.01 | 7 | | | |
|* 1 | FILTER | | 1 | | | | 0 |00:00:00.01 | 7 | | | |
| 2 | MERGE JOIN OUTER | | 1 | 1 | 26 | 8 (25)| 2 |00:00:00.01 | 7 | | | |
| 3 | SORT JOIN | | 1 | 2 | 26 | 4 (25)| 2 |00:00:00.01 | 7 | 2048 | 2048 | 2048 (0)|
| 4 | TABLE ACCESS FULL | TABLE1 | 1 | 2 | 26 | 3 (0)| 2 |00:00:00.01 | 7 | | | |
|* 5 | SORT JOIN | | 2 | 2 | 26 | 4 (25)| 0 |00:00:00.01 | 0 | 1024 | 1024 | |
| 6 | VIEW | VW_LAT_C83A7ED5 | 2 | 2 | 26 | 3 (0)| 0 |00:00:00.01 | 0 | | | |
|* 7 | FILTER | | 2 | | | | 0 |00:00:00.01 | 0 | | | |
| 8 | TABLE ACCESS FULL| TABLE1 | 0 | 2 | 26 | 3 (0)| 0 |00:00:00.01 | 0 | | | |
----------------------------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$F7AF7B7D
4 - SEL$F7AF7B7D / T1#SEL$1
6 - SEL$BCD4421C / VW_LAT_AE9E49E8#SEL$AE9E49E8
7 - SEL$BCD4421C
8 - SEL$BCD4421C / T2#SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("T1"."DATE1">NVL("ITEM_1",TIMESTAMP' 1900-01-01 00:00:00.000000000'))
5 - access(INTERNAL_FUNCTION("T1"."DATE1")>NVL("ITEM_1",TIMESTAMP' 1900-01-01 00:00:00.000000000'))
7 - filter(INTERNAL_FUNCTION("T1"."DATE1")||'---'='-')
So the optimizer replaced the reference to the table column with an ITEM_1 placeholder, and the step access(INTERNAL_FUNCTION("T1"."DATE1")>NVL("ITEM_1",TIMESTAMP' 1900-01-01 00:00:00.000000000')) messes things up.
With 12.1 the plan is essentially the same; the only difference is that the access() part is missing from the predicates, so I guess that replacement is somewhat buggy in 12.2 (to be precise, my version is 12.2.0.1.0).
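As a sanity check on what the two statements should return, here is a minimal sketch that runs both against a different engine. It assumes SQLite (via Python's sqlite3 module) as a stand-in for Oracle, uses COALESCE in place of NVL, and stores the timestamps as ISO-8601 text; none of that comes from the original setup. It only illustrates that both join conditions are always false, so both queries are logically equivalent.

```python
import sqlite3

# SQLite stands in for Oracle here (an assumption for illustration only):
# COALESCE replaces NVL, and timestamps are stored as ISO-8601 text,
# which compares correctly as plain strings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (date1 TEXT);
    INSERT INTO table1 VALUES ('2018-05-30 00:00:00');
    INSERT INTO table1 VALUES ('2018-05-30 00:00:00');
""")

QUERY = """
    SELECT t1.date1, COALESCE(t2.date1, '1900-01-01 00:00:00')
    FROM table1 t1
    LEFT JOIN table1 t2 ON {join_condition}
    WHERE t1.date1 > COALESCE(t2.date1, '1900-01-01 00:00:00')
"""

# Both join conditions are false for every row, so t2 is always NULL-extended
# and the WHERE clause compares t1.date1 with the 1900 default.
rows_constant = conn.execute(QUERY.format(join_condition="1 = 0")).fetchall()
rows_concat = conn.execute(
    QUERY.format(join_condition="t1.date1 || '---' = '-'")
).fetchall()

print(len(rows_constant), len(rows_concat))
```

Both variants should return the same two rows, which is why the empty result of the second statement on 12.2 looks like a wrong-results bug rather than a subtlety of the SQL.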

Related

Optimize view with CASE and GROUP BY

I'm trying to create a view which will be queried against with a simple WHERE clause. I have the following tables (simplified) :
CREATE TABLE table_1 (
"ID" VARCHAR2(38 BYTE),
"USER" VARCHAR2(255 BYTE),
"FIELD_TYPE_1" VARCHAR2(255 BYTE),
"FIELD_TYPE_2" VARCHAR2(255 BYTE)
);
CREATE TABLE table_2 (
"FK_ID" VARCHAR2(38 BYTE),
"TYPE" NUMBER
);
table_1 has 12M rows and table_2 has 25M rows.
My current view is defined as :
CREATE OR REPLACE VIEW vw_t1 AS
SELECT
CASE
WHEN t2.TYPE = 1 THEN t1.field_type_1
ELSE t1.field_type_2
END FIELD_TYPE,
t1."USER", -- USER is a reserved word, so the column name must stay quoted
COUNT(*) AS cnt -- an expression in a view needs a column alias
FROM table_1 t1 JOIN table_2 t2 ON t1.ID = t2.FK_ID
GROUP BY t1."USER",
CASE WHEN t2.TYPE = 1 THEN t1.field_type_1 ELSE t1.field_type_2 END;
and queries against the view perform poorly taking 10s to return:
SELECT * FROM vw_t1 WHERE field_type LIKE 'some text'
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6867K| 491M| | 394K (1)| 00:00:16 |
| 1 | HASH GROUP BY | | 6867K| 491M| 553M| 394K (1)| 00:00:16 |
|* 2 | FILTER | | | | | | |
|* 3 | HASH JOIN RIGHT OUTER | | 6867K| 491M| 57M| 282K (1)| 00:00:12 |
| 4 | TABLE ACCESS STORAGE FULL| table_1 | 2420K| 30M| | 6891 (1)| 00:00:01 |
|* 5 | TABLE ACCESS STORAGE FULL| table_2 | 6835K| 404M| | 248K (1)| 00:00:10 |
-------------------------------------------------------------------------------------------------------
I have thought of returning the fields 'field_type_1', 'field_type_2', and 'type', and have the users do the CASE themselves, but the queries won't return the same results due to the GROUP BY.
You should check the view's execution plan under different indexing setups. I can't test this at your scale (my test tables hold just a few rows), but here are three options, each with the execution plans for the view and for a query selecting from the view:
1. No indexes on any table
2. Primary key on TABLE_1
3. Primary key on TABLE_1 and non-unique index on TABLE_2
-- V i e w ------------------------------------
SELECT
CASE WHEN t2.TYPE = 1 THEN t1.field_type_1
ELSE t1.field_type_2
END "FIELD_TYPE",
--
t1.USER_ID,
--
COUNT(*) "CNT"
FROM table_1 t1 JOIN table_2 t2 ON t1.ID = t2.FK_ID
GROUP BY t1.USER_ID, CASE WHEN t2.TYPE = 1 THEN t1.field_type_1 else t1.field_type_2 END;
1. ******************************************************************************
Table_1 - none
Table_2 - none
View
----------------------------------------------------------
| Id | Operation | Name | Rows | Time |
----------------------------------------------------------
| 0 | SELECT STATEMENT | | 8 | 00:00:01 |
| 1 | HASH GROUP BY | | 8 | 00:00:01 |
|* 2 | HASH JOIN | | 8 | 00:00:01 |
| 3 | TABLE ACCESS FULL| TABLE_1 | 3 | 00:00:01 |
| 4 | TABLE ACCESS FULL| TABLE_2 | 8 | 00:00:01 |
----------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T1"."ID"="T2"."FK_ID")
=== Query ==================================================================================
SELECT * FROM a__a WHERE field_type LIKE('%FLD%');
----------------------------------------------------------
| Id | Operation | Name | Rows | Time |
----------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 00:00:01 |
| 1 | HASH GROUP BY | | 1 | 00:00:01 |
|* 2 | HASH JOIN | | 1 | 00:00:01 |
| 3 | TABLE ACCESS FULL| TABLE_1 | 3 | 00:00:01 |
| 4 | TABLE ACCESS FULL| TABLE_2 | 8 | 00:00:01 |
----------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T1"."ID"="T2"."FK_ID")
filter(CASE "T2"."TYPE" WHEN 1 THEN "T1"."FIELD_TYPE_1" ELSE
"T1"."FIELD_TYPE_2" END LIKE '%FLD%')
...
2. ******************************************************************************
Table_1 - PK (ID)
Table_2 - None
View
-----------------------------------------------------------------------
| Id | Operation | Name | Rows | Time |
-----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8 | 00:00:01 |
| 1 | HASH GROUP BY | | 8 | 00:00:01 |
| 2 | MERGE JOIN | | 8 | 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| TABLE_1 | 3 | 00:00:01 |
| 4 | INDEX FULL SCAN | TABLE_1_PK | 3 | 00:00:01 |
|* 5 | SORT JOIN | | 8 | 00:00:01 |
| 6 | TABLE ACCESS FULL | TABLE_2 | 8 | 00:00:01 |
-----------------------------------------------------------------------
=== Query ==================================================================================
SELECT * FROM a__a WHERE field_type LIKE('%FLD%');
-----------------------------------------------------------------------
| Id | Operation | Name | Rows | Time |
-----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 00:00:01 |
| 1 | HASH GROUP BY | | 1 | 00:00:01 |
| 2 | MERGE JOIN | | 1 | 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| TABLE_1 | 3 | 00:00:01 |
| 4 | INDEX FULL SCAN | TABLE_1_PK | 3 | 00:00:01 |
|* 5 | FILTER | | | |
|* 6 | SORT JOIN | | 8 | 00:00:01 |
| 7 | TABLE ACCESS FULL | TABLE_2 | 8 | 00:00:01 |
-----------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - filter(CASE "T2"."TYPE" WHEN 1 THEN "T1"."FIELD_TYPE_1" ELSE
"T1"."FIELD_TYPE_2" END LIKE '%FLD%')
6 - access("T1"."ID"="T2"."FK_ID")
filter("T1"."ID"="T2"."FK_ID")
...
3. ******************************************************************************
Table_1 - PK (ID)
Table_2 - INDEX NonUnique (FK_ID, TYPE)
VIEW
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8 | 00:00:01 |
| 1 | HASH GROUP BY | | 8 | 00:00:01 |
| 2 | MERGE JOIN | | 8 | 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| TABLE_1 | 3 | 00:00:01 |
| 4 | INDEX FULL SCAN | TABLE_1_PK | 3 | 00:00:01 |
|* 5 | SORT JOIN | | 8 | 00:00:01 |
| 6 | INDEX FULL SCAN | TABLE_2_INDEX_FK_ID_TYPE | 8 | 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("T1"."ID"="T2"."FK_ID")
filter("T1"."ID"="T2"."FK_ID")
=== Query ==================================================================================
SELECT * FROM a__a WHERE field_type LIKE('%FLD%');
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 00:00:01 |
| 1 | HASH GROUP BY | | 1 | 00:00:01 |
| 2 | MERGE JOIN | | 1 | 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| TABLE_1 | 3 | 00:00:01 |
| 4 | INDEX FULL SCAN | TABLE_1_PK | 3 | 00:00:01 |
|* 5 | FILTER | | | |
|* 6 | SORT JOIN | | 8 | 00:00:01 |
| 7 | INDEX FULL SCAN | TABLE_2_INDEX_FK_ID_TYPE | 8 | 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - filter(CASE "T2"."TYPE" WHEN 1 THEN "T1"."FIELD_TYPE_1" ELSE
"T1"."FIELD_TYPE_2" END LIKE '%FLD%')
6 - access("T1"."ID"="T2"."FK_ID")
filter("T1"."ID"="T2"."FK_ID")

Why does the Oracle optimizer not eliminate the join in this case?

I am puzzled by this case and don't understand why it behaves this way.
Consider the following SQL:
create table t1(tid int not null, t1 int not null);
create table t2(t2 int not null, tname varchar(30) null);
create unique index i_t2 on t2(t2);
create or replace view v_1 as
select t1.tid,t1.t1,max(t2.tname) as tname
from t1 left join t2
on t1.t1 = t2.t2
group by t1.tid,t1.t1;
Then check the execution plan for select count(1) from v_1; t2 is eliminated by the optimizer:
SQL> select count(1) from v_1;
Execution Plan
----------------------------------------------------------
Plan hash value: 3243658773
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 3 (34)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | | |
| 2 | VIEW | VM_NWVW_0 | 1 | | 3 (34)| 00:00:01 |
| 3 | HASH GROUP BY | | 1 | 26 | 3 (34)| 00:00:01 |
| 4 | TABLE ACCESS FULL| T1 | 1 | 26 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------
But if the index i_t2 is dropped or recreated without the unique attribute,
the table t2 is not eliminated from the execution plan:
SQL> drop index i_t2;
Index dropped.
SQL> select count(1) from v_1;
Execution Plan
----------------------------------------------------------
Plan hash value: 2710188186
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 5 (20)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | | |
| 2 | VIEW | VM_NWVW_0 | 1 | | 5 (20)| 00:00:01 |
| 3 | HASH GROUP BY | | 1 | 39 | 5 (20)| 00:00:01 |
|* 4 | HASH JOIN OUTER | | 1 | 39 | 4 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL| T1 | 1 | 26 | 2 (0)| 00:00:01 |
| 6 | TABLE ACCESS FULL| T2 | 1 | 13 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
It seems that even with the index removed,
the result of select count(1) from v_1 is still equal to
select count(1) from (select tid,t1 from t1 group by tid,t1)
Why does the optimizer not eliminate t2 in the second case?
Is there a principle or a concrete data example that explains this?
Thanks :)
This is an optimization called join elimination. Because t2.t2 is unique, the optimizer knows that every row retrieved from t1 can match at most one row in t2. Since nothing is projected from t2, there is no need to perform the join. Once the unique index is gone, that guarantee is lost, so the join stays in the plan.
If you do
select tid, t1 from v_1;
you will see that we do not perform the join. However, if we project from t2, then the join is needed.
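To make the role of uniqueness concrete, here is a small sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in for Oracle, and the sample rows are made up for illustration. It does not show the optimizer at work; it only shows the data-level fact that join elimination depends on: an outer join to a unique key keeps the row count of t1 unchanged, while a duplicate key multiplies rows.

```python
import sqlite3

# SQLite stands in for Oracle; the rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (tid INT NOT NULL, t1 INT NOT NULL);
    CREATE TABLE t2 (t2 INT NOT NULL, tname VARCHAR(30));
    INSERT INTO t1 VALUES (1, 10), (2, 20);
    INSERT INTO t2 VALUES (10, 'a'), (20, 'b');
""")


def count_rows(sql):
    """Run a COUNT(*) query and return the single number it produces."""
    return conn.execute(sql).fetchone()[0]


JOINED = "SELECT COUNT(*) FROM t1 LEFT JOIN t2 ON t1.t1 = t2.t2"
ALONE = "SELECT COUNT(*) FROM t1"

# While t2.t2 holds unique values, the outer join neither drops nor
# duplicates t1 rows, so the join is irrelevant for the count.
unique_joined = count_rows(JOINED)
unique_alone = count_rows(ALONE)

# Without a unique index nothing prevents a duplicate key in t2 ...
conn.execute("INSERT INTO t2 VALUES (10, 'c')")
# ... and then the join multiplies the matching t1 row.
dup_joined = count_rows(JOINED)

print(unique_joined, unique_alone, dup_joined)
```

In the view from the question the GROUP BY on the t1 columns would collapse such duplicates again, which is why the counts still agree there; the point of the sketch is only that the join by itself is no longer guaranteed to be row-preserving.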

Efficient table comparison

I know that there are the following ways to select values present in one table but not in other.
LEFT JOIN, NOT IN and NOT EXISTS
Which is the recommended option to use?
There probably isn't a universal answer, so I would appreciate the use cases where each is advisable.
(I am not looking for the syntax of the above options - just a comparison of the approaches)
In short, LEFT JOIN took slightly more time than the other two, while NOT EXISTS and NOT IN took almost the same time.
I prefer LEFT JOIN when I need to use values from the other table in the select clause. Otherwise I prefer NOT EXISTS.
I suggest you replicate the test below on your own machine, as mine is a home machine with Oracle 12c and hardly anything else running. In a bigger environment the test may give more accurate results.
Test in detail:
To test it in practice, I create two tables, fill the first one with 1 million rows, and fill the second one from the first with a condition, so that some rows are not inserted into the second table.
--Create first table
create table test_data_left (empno integer, ename varchar2(10),CONSTRAINT tdl_pk primary key(empno));
--PL/SQL block to insert 1 million rows into test_data_left
declare v_max_empno integer;
BEGIN
-- start numbering after the highest existing empno (0 if the table is empty)
select coalesce(max(EMPNO),0) into v_max_empno from test_data_left;
FOR i IN 1..1000000 LOOP -- add 1 million rows
insert into test_data_left(empno,ename) values (
i+v_max_empno,
DBMS_RANDOM.string('U',TRUNC(DBMS_RANDOM.value(10,11)))
);
END LOOP;
END;
/
commit;
--Create second table and populate with some condition to block some rows from first table
create table test_data_right (empno integer, ename varchar2(10),CONSTRAINT tdr_pk primary key(empno));
insert into test_data_right (empno,ename)
select empno,ename from test_data_left
where ename not like 'JK%';
These are the queries I am using to get the data.
NOTE: I am not using t1.* in the select statements, as SQL Developer only displays the first 50 rows and you cannot run explain plan on it. Hence I am using count(*).
select count(*) from test_data_left t1 left join test_data_right t2 on
t1.empno=t2.empno where t2.empno is null;
select count(*) from test_data_left t1
where t1.empno not in (select empno from test_data_right);
select count(*) from test_data_left t1
where not exists (select 1 from test_data_right t2 where t1.empno=t2.empno);
To gather the statistics of the last executed query, I used this command.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(null,null,'ALLSTATS LAST')) ;
To be sure that Oracle was not doing any funny business while calculating, I reset the database connection before running every query.
Below are the statistics after each query. I repeated the runs in reverse order to give LEFT JOIN a fair chance.
As far as I can see, LEFT JOIN is the slowest, but NOT IN and
NOT EXISTS are almost the same (based on a couple more iterations which I wasn't able to capture). Note that all three statements share the same plan (plan hash value 2082679279), so the differences are small.
Iteration 1
LEFT JOIN
SQL_ID 0qz2qtza4yrr0, child number 0
-------------------------------------
select count(*) from test_data_left t1 left join test_data_right t2 on
t1.empno=t2.empno where t2.empno is null
Plan hash value: 2082679279
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:01.41 | 5012 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:01.41 | 5012 |
| 2 | NESTED LOOPS ANTI | | 1 | 1206K| 900K|00:00:01.32 | 5012 |
| 3 | INDEX FAST FULL SCAN| TDL_PK | 1 | 1206K| 1000K|00:00:00.22 | 1891 |
|* 4 | INDEX UNIQUE SCAN | TDR_PK | 1000K| 1 | 99865 |00:00:00.54 | 3121 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("T1"."EMPNO"="T2"."EMPNO")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
NOT EXISTS
SQL_ID c498qdbzw5dxv, child number 0
-------------------------------------
select count(*) from test_data_left t1 where not exists (select 1 from
test_data_right t2 where t1.empno=t2.empno)
Plan hash value: 2082679279
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:01.27 | 5012 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:01.27 | 5012 |
| 2 | NESTED LOOPS ANTI | | 1 | 1206K| 900K|00:00:01.19 | 5012 |
| 3 | INDEX FAST FULL SCAN| TDL_PK | 1 | 1206K| 1000K|00:00:00.21 | 1891 |
|* 4 | INDEX UNIQUE SCAN | TDR_PK | 1000K| 1 | 99865 |00:00:00.49 | 3121 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("T1"."EMPNO"="T2"."EMPNO")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
NOT IN
SQL_ID gwm775xqnufgm, child number 0
-------------------------------------
select count(*) from test_data_left t1 where t1.empno not in (select
empno from test_data_right)
Plan hash value: 2082679279
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:01.23 | 5012 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:01.23 | 5012 |
| 2 | NESTED LOOPS ANTI | | 1 | 1206K| 900K|00:00:01.15 | 5012 |
| 3 | INDEX FAST FULL SCAN| TDL_PK | 1 | 1206K| 1000K|00:00:00.19 | 1891 |
|* 4 | INDEX UNIQUE SCAN | TDR_PK | 1000K| 1 | 99865 |00:00:00.47 | 3121 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("T1"."EMPNO"="EMPNO")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Iteration 2
NOT IN
SQL_ID gwm775xqnufgm, child number 0
-------------------------------------
select count(*) from test_data_left t1 where t1.empno not in (select
empno from test_data_right)
Plan hash value: 2082679279
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:01.19 | 5012 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:01.19 | 5012 |
| 2 | NESTED LOOPS ANTI | | 1 | 1206K| 900K|00:00:01.11 | 5012 |
| 3 | INDEX FAST FULL SCAN| TDL_PK | 1 | 1206K| 1000K|00:00:00.19 | 1891 |
|* 4 | INDEX UNIQUE SCAN | TDR_PK | 1000K| 1 | 99865 |00:00:00.46 | 3121 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("T1"."EMPNO"="EMPNO")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
NOT EXISTS
SQL_ID c498qdbzw5dxv, child number 0
-------------------------------------
select count(*) from test_data_left t1 where not exists (select 1 from
test_data_right t2 where t1.empno=t2.empno)
Plan hash value: 2082679279
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:01.19 | 5012 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:01.19 | 5012 |
| 2 | NESTED LOOPS ANTI | | 1 | 1206K| 900K|00:00:01.12 | 5012 |
| 3 | INDEX FAST FULL SCAN| TDL_PK | 1 | 1206K| 1000K|00:00:00.19 | 1891 |
|* 4 | INDEX UNIQUE SCAN | TDR_PK | 1000K| 1 | 99865 |00:00:00.46 | 3121 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("T1"."EMPNO"="T2"."EMPNO")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
LEFT JOIN
SQL_ID 0qz2qtza4yrr0, child number 0
-------------------------------------
select count(*) from test_data_left t1 left join test_data_right t2 on
t1.empno=t2.empno where t2.empno is null
Plan hash value: 2082679279
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:01.33 | 5012 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:01.33 | 5012 |
| 2 | NESTED LOOPS ANTI | | 1 | 1206K| 900K|00:00:01.24 | 5012 |
| 3 | INDEX FAST FULL SCAN| TDL_PK | 1 | 1206K| 1000K|00:00:00.22 | 1891 |
|* 4 | INDEX UNIQUE SCAN | TDR_PK | 1000K| 1 | 99865 |00:00:00.50 | 3121 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("T1"."EMPNO"="T2"."EMPNO")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
This will return everything from table a for which there is no corresponding record in table b:
SELECT a.col FROM a WHERE a.col NOT IN (SELECT b.col from b)
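One caveat worth checking before picking NOT IN: if the subquery can return a NULL, NOT IN yields no rows at all, while NOT EXISTS and the LEFT JOIN ... IS NULL form are unaffected. The benchmark tables earlier in this thread use a primary key on empno, so the case cannot arise there. The sketch below assumes SQLite (through Python's sqlite3 module) as a stand-in for Oracle, with made-up sample rows; the behaviour shown follows from standard SQL three-valued logic.

```python
import sqlite3

# SQLite stands in for Oracle; tables a and b hold hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (col INT);
    CREATE TABLE b (col INT);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (1), (NULL);
""")

# NOT IN: "col <> NULL" is unknown, so no row of a can ever qualify.
not_in = conn.execute(
    "SELECT a.col FROM a WHERE a.col NOT IN (SELECT b.col FROM b) "
    "ORDER BY a.col"
).fetchall()

# NOT EXISTS: the NULL row in b simply never matches.
not_exists = conn.execute(
    "SELECT a.col FROM a "
    "WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.col = a.col) "
    "ORDER BY a.col"
).fetchall()

# LEFT JOIN anti-join: same result as NOT EXISTS.
left_join = conn.execute(
    "SELECT a.col FROM a LEFT JOIN b ON a.col = b.col "
    "WHERE b.col IS NULL ORDER BY a.col"
).fetchall()

print(not_in, not_exists, left_join)
```

So when the compared column is nullable, NOT EXISTS (or the LEFT JOIN form) is the safer default; when it is declared NOT NULL, the three forms are interchangeable in terms of results.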
Try the query below:
select tabA.* from tabA left join tabB on tabA.id = tabB.tabA_id
where tabB.tabA_id is null
Hope this helps.

CTE vs subquery, which one is efficient?

SELECT col, (SELECT COUNT(*) FROM table) as total_count FROM table
This query executes subquery for every row, right?
Now if we have
;WITH CTE(total_count) AS (
SELECT COUNT(*) FROM table
)
SELECT col, (SELECT total_count FROM CTE) FROM table;
Will the second method be more efficient? Will the CTE execute COUNT(*) only once, with the SELECT then using it as a prepared value? Or is COUNT(*) also executed for each row in the second case?
For Oracle, the surest way is to observe the behaviour of the statements with extended statistics.
To do so, first increase the statistics level to ALL:
alter session set statistics_level=all;
Then run both statements (fetching all rows) and find the SQL_ID of each statement.
Finally, display the statistics using the following statement (passing the proper SQL_ID):
select * from table(dbms_xplan.display_cursor('your SQL_ID here',null,'ALLSTATS LAST'));
This gives for my test table
SQL_ID 5n0sdcu8347j9, child number 0
-------------------------------------
SELECT col, (SELECT COUNT(*) FROM t1) as total_count FROM t1
Plan hash value: 1306093980
-------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1000 |00:00:00.01 | 351 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 338 |
| 2 | TABLE ACCESS FULL| T1 | 1 | 1061 | 1000 |00:00:00.01 | 338 |
| 3 | TABLE ACCESS FULL | T1 | 1 | 1061 | 1000 |00:00:00.01 | 351 |
-------------------------------------------------------------------------------------
and
SQL_ID fs0h660f08bj6, child number 0
-------------------------------------
WITH CTE(total_count) AS ( SELECT COUNT(*) FROM t1 ) SELECT col,
(SELECT total_count FROM CTE) FROM t1
Plan hash value: 1223456497
--------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1000 |00:00:00.01 | 351 |
| 1 | VIEW | | 1 | 1 | 1 |00:00:00.01 | 338 |
| 2 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 338 |
| 3 | TABLE ACCESS FULL| T1 | 1 | 1061 | 1000 |00:00:00.01 | 338 |
| 4 | TABLE ACCESS FULL | T1 | 1 | 1061 | 1000 |00:00:00.01 | 351 |
--------------------------------------------------------------------------------------
So the plans are slightly different, but in both cases the FULL TABLE SCAN is started only once (column Starts = 1), so there is no real difference.
For comparison I also ran a correlated subquery, which gives a completely different picture with a high number of Starts (of the FTS):
SQL_ID cbvwd6pm6699m, child number 0
-------------------------------------
SELECT col, (SELECT COUNT(*) FROM t1 where col = a.col) as total_count
FROM t1 a
Plan hash value: 1306093980
-------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1000 |00:00:00.01 | 351 |
| 1 | SORT AGGREGATE | | 1000 | 1 | 1000 |00:00:00.31 | 338K|
|* 2 | TABLE ACCESS FULL| T1 | 1000 | 11 | 1000 |00:00:00.31 | 338K|
| 3 | TABLE ACCESS FULL | T1 | 1 | 1061 | 1000 |00:00:00.01 | 351 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("COL"=:B1)
I believe that the query optimizer in both Oracle and SQL Server will recognize that the count query is not correlated, compute it once, and then use the cached result throughout the execution of the outer query.
Also, the CTE won't change anything as far as I know, since at execution time the code inside it will basically just be inlined into the actual outer query.
An Oracle reference mentions that a non-correlated subquery will be executed once and cached, except in cases where the outer query only has a few rows. In that case, it might not be cached because there isn't much of a penalty in executing the count subquery multiple times.

How does SQL join work?

I am trying to understand how joins work internally. What is the difference between the way the following two queries would run?
For example
(A)
Select *
FROM TABLE1
FULL JOIN TABLE2 ON TABLE1.ID = TABLE2.ID
FULL JOIN TABLE3 ON TABLE1.ID = TABLE3.ID
And
(B)
Select *
FROM TABLE1
FULL JOIN TABLE2 ON TABLE1.ID = TABLE2.ID
FULL JOIN TABLE3 ON TABLE2.ID = TABLE3.ID
Edit: I am talking about oracle here.
Consider a record that is present in table 2 and table 3 but not in table 1: query A would give two rows for that record, but B would give only one row.
Your DBMS's optimiser will determine how best to perform the query. Usually this is done by "cost based optimisation", where a number of different query plans are considered and the most efficient one selected. If your two queries are logically identical, it is most likely that the optimiser will end up using the same query plan whichever way you write it. In fact, it would be a poor optimiser these days that produced different query plans based on such minor differences in the SQL.
However, full outer joins are a different matter (in Oracle at least), since the way the columns are joined influences the result, i.e. the two queries are not interchangeable.
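The difference can be reproduced without a database. The following is a minimal pure-Python sketch of full outer join semantics, not of how Oracle executes the join; the helper full_join and the sample data (ID 1 present in TABLE2 and TABLE3 but not in TABLE1) are hypothetical and chosen to match the case described in the question.

```python
def full_join(left_rows, left_cols, right_ids, right_col, on_col):
    """FULL JOIN a single-column table (right_ids) to left_rows,
    matching on left_rows[...][on_col] = right id."""
    result, matched = [], set()
    for row in left_rows:
        # NULL (None) never equals anything, so such rows find no partner.
        hits = [i for i, rid in enumerate(right_ids)
                if row[on_col] is not None and row[on_col] == rid]
        if hits:
            for i in hits:
                matched.add(i)
                result.append({**row, right_col: right_ids[i]})
        else:
            # Left row without a partner: pad the right column with NULL.
            result.append({**row, right_col: None})
    # Right rows without a partner are kept too, padded with NULLs on the left.
    for i, rid in enumerate(right_ids):
        if i not in matched:
            result.append({**dict.fromkeys(left_cols), right_col: rid})
    return result


# Hypothetical data: ID 1 exists in TABLE2 and TABLE3 only.
t1, t2, t3 = [], [1], [1]

t1_rows = [{"t1": v} for v in t1]
t1_t2 = full_join(t1_rows, ["t1"], t2, "t2", on_col="t1")

# (A) ... FULL JOIN TABLE3 ON TABLE1.ID = TABLE3.ID
query_a = full_join(t1_t2, ["t1", "t2"], t3, "t3", on_col="t1")
# (B) ... FULL JOIN TABLE3 ON TABLE2.ID = TABLE3.ID
query_b = full_join(t1_t2, ["t1", "t2"], t3, "t3", on_col="t2")

print(query_a)
print(query_b)
```

In (A) the intermediate row has a NULL TABLE1.ID, so the TABLE3 row finds no partner and is emitted separately, giving two rows. In (B) the comparison uses TABLE2.ID, which is populated, so the rows combine into one.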
You can use AUTOTRACE in SQL*Plus to see the different plans:
SQL> select *
2 from t1
3 full join t2 on t2.id = t1.id
4 full join t3 on t3.id = t2.id;
ID ID ID
---------- ---------- ----------
1 1
1 row selected.
Execution Plan
----------------------------------------------------------
---------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
---------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 117 | 29 (11)|
| 1 | VIEW | | 3 | 117 | 29 (11)|
| 2 | UNION-ALL | | | | |
|* 3 | HASH JOIN OUTER | | 2 | 142 | 15 (14)|
| 4 | VIEW | | 2 | 90 | 11 (10)|
| 5 | UNION-ALL | | | | |
|* 6 | HASH JOIN OUTER | | 1 | 91 | 6 (17)|
| 7 | TABLE ACCESS FULL| T1 | 1 | 52 | 2 (0)|
| 8 | TABLE ACCESS FULL| T2 | 1 | 39 | 3 (0)|
|* 9 | HASH JOIN ANTI | | 1 | 26 | 6 (17)|
| 10 | TABLE ACCESS FULL| T2 | 1 | 13 | 3 (0)|
| 11 | TABLE ACCESS FULL| T1 | 1 | 13 | 2 (0)|
| 12 | TABLE ACCESS FULL | T3 | 1 | 26 | 3 (0)|
|* 13 | HASH JOIN ANTI | | 1 | 26 | 15 (14)|
| 14 | TABLE ACCESS FULL | T3 | 1 | 13 | 3 (0)|
| 15 | VIEW | | 2 | 26 | 11 (10)|
| 16 | UNION-ALL | | | | |
|* 17 | HASH JOIN OUTER | | 1 | 39 | 6 (17)|
| 18 | TABLE ACCESS FULL| T1 | 1 | 26 | 2 (0)|
| 19 | TABLE ACCESS FULL| T2 | 1 | 13 | 3 (0)|
|* 20 | HASH JOIN ANTI | | 1 | 26 | 6 (17)|
| 21 | TABLE ACCESS FULL| T2 | 1 | 13 | 3 (0)|
| 22 | TABLE ACCESS FULL| T1 | 1 | 13 | 2 (0)|
---------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("T3"."ID"(+)="T2"."ID")
6 - access("T2"."ID"(+)="T1"."ID")
9 - access("T2"."ID"="T1"."ID")
13 - access("T3"."ID"="T2"."ID")
17 - access("T2"."ID"(+)="T1"."ID")
20 - access("T2"."ID"="T1"."ID")
SQL> select *
2 from t1
3 full join t2 on t2.id = t1.id
4 full join t3 on t3.id = t1.id;
ID ID ID
---------- ---------- ----------
1
1
2 rows selected.
Execution Plan
----------------------------------------------------------
---------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
---------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 117 | 29 (11)|
| 1 | VIEW | | 3 | 117 | 29 (11)|
| 2 | UNION-ALL | | | | |
|* 3 | HASH JOIN OUTER | | 2 | 142 | 15 (14)|
| 4 | VIEW | | 2 | 90 | 11 (10)|
| 5 | UNION-ALL | | | | |
|* 6 | HASH JOIN OUTER | | 1 | 91 | 6 (17)|
| 7 | TABLE ACCESS FULL| T1 | 1 | 52 | 2 (0)|
| 8 | TABLE ACCESS FULL| T2 | 1 | 39 | 3 (0)|
|* 9 | HASH JOIN ANTI | | 1 | 26 | 6 (17)|
| 10 | TABLE ACCESS FULL| T2 | 1 | 13 | 3 (0)|
| 11 | TABLE ACCESS FULL| T1 | 1 | 13 | 2 (0)|
| 12 | TABLE ACCESS FULL | T3 | 1 | 26 | 3 (0)|
|* 13 | HASH JOIN ANTI | | 1 | 26 | 15 (14)|
| 14 | TABLE ACCESS FULL | T3 | 1 | 13 | 3 (0)|
| 15 | VIEW | | 2 | 26 | 11 (10)|
| 16 | UNION-ALL | | | | |
|* 17 | HASH JOIN OUTER | | 1 | 39 | 6 (17)|
| 18 | TABLE ACCESS FULL| T1 | 1 | 26 | 2 (0)|
| 19 | TABLE ACCESS FULL| T2 | 1 | 13 | 3 (0)|
|* 20 | HASH JOIN ANTI | | 1 | 26 | 6 (17)|
| 21 | TABLE ACCESS FULL| T2 | 1 | 13 | 3 (0)|
| 22 | TABLE ACCESS FULL| T1 | 1 | 13 | 2 (0)|
---------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("T3"."ID"(+)="T1"."ID")
6 - access("T2"."ID"(+)="T1"."ID")
9 - access("T2"."ID"="T1"."ID")
13 - access("T3"."ID"="T1"."ID")
17 - access("T2"."ID"(+)="T1"."ID")
20 - access("T2"."ID"="T1"."ID")
In fact, the query plans are identical except for the predicate information: steps 3 and 13 join T3 to "T2"."ID" in the first query and to "T1"."ID" in the second.
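Output of this kind can be produced with a short SQL*Plus session along these lines. This is a sketch that assumes tables t1, t2 and t3 already exist and that the session has access to a plan table; the privileges AUTOTRACE needs depend on the installation.

```sql
-- SQL*Plus command: show the query results followed by the execution plan.
SET AUTOTRACE ON EXPLAIN

SELECT *
FROM t1
FULL JOIN t2 ON t2.id = t1.id
FULL JOIN t3 ON t3.id = t2.id;

-- Switch the report off again when done.
SET AUTOTRACE OFF
```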
Use EXPLAIN PLAN:
http://download.oracle.com/docs/cd/B10500_01/server.920/a96533/ex_plan.htm
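As a rough illustration of that approach, the statement is explained first and the stored plan is then formatted with DBMS_XPLAN. The table names t1, t2 and t3 are assumed here, not taken from the question.

```sql
-- Store the plan for the statement in the plan table without running it.
EXPLAIN PLAN FOR
SELECT t1.id AS t1_id, t2.id AS t2_id, t3.id AS t3_id
FROM t1
FULL JOIN t2 ON t2.id = t1.id
FULL JOIN t3 ON t3.id = t1.id;

-- Display the most recently explained plan, including predicate information.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Running this once per variant of the final ON clause lets you compare the predicate sections side by side.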
You stated interest in "internals", and then asked for an example that illustrates "semantics". I'm answering semantics.
Consider these tables.
Table1 : 1, 4, 6
Table2 : 2, 4, 5
Table3 : 3, 5, 6
Both examples perform the same join first, so I'll perform that here.
FirstResult = T1 FULL JOIN T2 : (T1, T2)
(1, null)
(4, 4)
(6, null)
(null, 2)
(null, 5)
Example (A)
FirstResult FULL JOIN T3 ON FirstItem : (T1, T2, T3)
(1, null, null)
(4, 4, null)
(6, null, 6) <----
(null, 2, null)
(null, 5, null) <----
(null, null, 3)
(null, null, 5) <----
Example (B)
FirstResult FULL JOIN T3 ON SecondItem : (T1, T2, T3)
(1, null, null)
(4, 4, null)
(6, null, null) <----
(null, 2, null)
(null, 5, 5) <----
(null, null, 3)
(null, null, 6) <----
This shows you logically how to produce the results from the joins.
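To check the worked example against a real database, something like the following could be run in Oracle. It is only a sketch: the inline views tab1, tab2 and tab3 are hypothetical names standing in for Table1, Table2 and Table3, built from DUAL so that no tables need to be created.

```sql
-- Example (A): T3 is joined on the T1 column.
-- For Example (B), change the last ON clause to: tab3.id = tab2.id
WITH tab1 AS (
  SELECT 1 AS id FROM dual UNION ALL
  SELECT 4 FROM dual UNION ALL
  SELECT 6 FROM dual
),
tab2 AS (
  SELECT 2 AS id FROM dual UNION ALL
  SELECT 4 FROM dual UNION ALL
  SELECT 5 FROM dual
),
tab3 AS (
  SELECT 3 AS id FROM dual UNION ALL
  SELECT 5 FROM dual UNION ALL
  SELECT 6 FROM dual
)
SELECT tab1.id AS t1_id, tab2.id AS t2_id, tab3.id AS t3_id
FROM tab1
FULL JOIN tab2 ON tab1.id = tab2.id
FULL JOIN tab3 ON tab3.id = tab1.id;
```

The rows to compare between the two variants are those involving the values 5 and 6, since those are the T3 values that also appear in T2 or T1.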
For "internals", there's something called a query optimizer, which will produce these same results, but it will make implementation choices so that the computation and I/O are fast. These choices include:
which tables to access first
whether to read a table through an index or a full table scan
which join implementation type to use (nested loop, merge, hash).
Also note: because the optimizer makes these choices, and changes them based on what it considers optimal, the order of the results can change. Without an explicit ordering, rows come back in whatever order is easiest for the chosen plan, and that order is not guaranteed. If you need a particular ordering, you need to specify it in your query.
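For instance, a stable ordering for a three-way full join could be requested like this. The table names t1, t2 and t3 are assumed; NULLS LAST is spelled out so that the placement of the null-extended rows is explicit.

```sql
SELECT t1.id AS t1_id, t2.id AS t2_id, t3.id AS t3_id
FROM t1
FULL JOIN t2 ON t2.id = t1.id
FULL JOIN t3 ON t3.id = t2.id
-- Sort by each column in turn, putting rows with no match at the end.
ORDER BY t1.id NULLS LAST, t2.id NULLS LAST, t3.id NULLS LAST;
```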
To see exactly what the optimizer will do with a query (at that moment, because it can change its mind), you need to view the execution plan.
With query A, the result includes entries in table1 that have a corresponding entry in table3 but no corresponding entry in table2 (nulls for the t2 columns).
With query B you don't get those combined rows, because you only reach table3 through table2. If you don't have a corresponding entry in table2, table2.id will be null and will never match a table3.id.
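The "null never matches" rule can be seen in isolation with a one-line query against DUAL. This is a small sketch; the literal 3 simply stands in for some table3.id value.

```sql
-- An equality test against NULL is never true, so the ELSE branch is taken
-- and the query returns 'no match'.
SELECT CASE
         WHEN CAST(NULL AS NUMBER) = 3 THEN 'match'
         ELSE 'no match'
       END AS result
FROM dual;
```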