SQL subquery and joins giving the same or different results (Oracle)

I'm working on optimizing queries because of the huge amount of data in Oracle.
Here is one of them.
With a subquery:
SELECT
STG.ID1,
STG.ID2
FROM (SELECT
DISTINCT
H1.ID1,
H2.ID2
FROM T_STGDV STG
INNER JOIN T_HUB1 H1 ON STG.BK1 = H1.BK1
INNER JOIN T_HUB2 H2 ON STG.BK2 = H2.BK2 ) STG
LEFT OUTER JOIN T_LINK L ON L.ID1 = STG.ID1 AND L.ID2 = STG.ID2
WHERE L.IDL IS NULL;
This is my optimized version:
SELECT
DISTINCT
H1.ID1,
H2.ID2
FROM T_STGDV STG
INNER JOIN T_HUB1 H1 ON STG.BK1 = H1.BK1
INNER JOIN T_HUB2 H2 ON STG.BK2 = H2.BK2
LEFT OUTER JOIN T_LINK L ON L.ID1 = H1.ID1 AND L.ID2 = H2.ID2
WHERE L.IDL IS NULL;
I want to know whether the result and the behavior will be the same.
I did some tests and didn't find any difference, but maybe I missed a test case?
Any idea what the difference between those queries could be?
Thanks.
Some details: here are the explain plans for my test tables (the costs are not representative of the real tables).
The first query:
Plan hash value: 2680307749
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 65 | 11 (28)| 00:00:01 |
|* 1 | FILTER | | | | | |
|* 2 | HASH JOIN OUTER | | 1 | 65 | 11 (28)| 00:00:01 |
| 3 | VIEW | | 1 | 26 | 8 (25)| 00:00:01 |
| 4 | HASH UNIQUE | | 1 | 134 | 8 (25)| 00:00:01 |
|* 5 | HASH JOIN | | 1 | 134 | 7 (15)| 00:00:01 |
|* 6 | HASH JOIN | | 1 | 94 | 5 (20)| 00:00:01 |
| 7 | TABLE ACCESS FULL| T_STGDV | 1 | 54 | 2 (0)| 00:00:01 |
| 8 | TABLE ACCESS FULL| T_HUB1 | 2 | 80 | 2 (0)| 00:00:01 |
| 9 | TABLE ACCESS FULL | T_HUB2 | 2 | 80 | 2 (0)| 00:00:01 |
| 10 | TABLE ACCESS FULL | T_LINK | 3 | 117 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("L"."IDL" IS NULL)
2 - access("L"."ID2"(+)="STG"."ID2" AND "L"."ID1"(+)="STG"."ID1")
5 - access("STG"."BK2"="H2"."BK2")
6 - access("STG"."BK1"="H1"."BK1")
Note
-----
- dynamic sampling used for this statement (level=2)
The second query:
Plan hash value: 2149614538
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 65 | 11 (28)| 00:00:01 |
| 1 | HASH UNIQUE | | 1 | 65 | 11 (28)| 00:00:01 |
|* 2 | FILTER | | | | | |
|* 3 | HASH JOIN OUTER | | 1 | 65 | 10 (20)| 00:00:01 |
| 4 | VIEW | | 1 | 26 | 7 (15)| 00:00:01 |
|* 5 | HASH JOIN | | 1 | 134 | 7 (15)| 00:00:01 |
|* 6 | HASH JOIN | | 1 | 94 | 5 (20)| 00:00:01 |
| 7 | TABLE ACCESS FULL| T_STGDV | 1 | 54 | 2 (0)| 00:00:01 |
| 8 | TABLE ACCESS FULL| T_HUB1 | 2 | 80 | 2 (0)| 00:00:01 |
| 9 | TABLE ACCESS FULL | T_HUB2 | 2 | 80 | 2 (0)| 00:00:01 |
| 10 | TABLE ACCESS FULL | T_LINK | 3 | 117 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("L"."IDL" IS NULL)
3 - access("L"."ID2"(+)="H2"."ID2" AND "L"."ID1"(+)="H1"."ID1")
5 - access("STG"."BK2"="H2"."BK2")
6 - access("STG"."BK1"="H1"."BK1")
Note
-----
- dynamic sampling used for this statement (level=2)

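The queries look equivalent to me because of the WHERE clause.
Without the WHERE clause they are not equivalent: duplicates in T_LINK (relative to the join keys) would produce duplicate rows in the first query, while the final DISTINCT in the second query removes them. However, you are looking for rows with no match, so this is not an issue. When there is no match, the two versions should be equivalent, provided L.IDL is never NULL in T_LINK (for example, because it is the primary key). Otherwise matching link rows with a NULL IDL also pass the filter, and any duplicates among them reappear in the first query.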
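To see this reasoning in action, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Oracle. That is an assumption: SQLite is not Oracle, but LEFT JOIN, DISTINCT and IS NULL behave the same way for this purpose. The tiny data set is invented: T_STGDV contains a duplicate staging row, T_LINK links the pair (1, 10) twice, and IDL is assumed to be a NOT NULL key.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
SETUP = """
CREATE TABLE T_HUB1  (ID1 INTEGER, BK1 TEXT);
CREATE TABLE T_HUB2  (ID2 INTEGER, BK2 TEXT);
CREATE TABLE T_STGDV (BK1 TEXT, BK2 TEXT);
CREATE TABLE T_LINK  (IDL INTEGER NOT NULL PRIMARY KEY, ID1 INTEGER, ID2 INTEGER);
INSERT INTO T_HUB1 VALUES (1, 'a'), (2, 'b');
INSERT INTO T_HUB2 VALUES (10, 'x'), (20, 'y');
-- the duplicate staging row yields the pair (1, 10) twice before DISTINCT
INSERT INTO T_STGDV VALUES ('a', 'x'), ('a', 'x'), ('b', 'y'), ('a', 'y');
-- (1, 10) is linked twice, (2, 20) once, (1, 20) not at all
INSERT INTO T_LINK VALUES (100, 1, 10), (101, 1, 10), (102, 2, 20);
"""

Q1 = """
SELECT STG.ID1, STG.ID2
FROM (SELECT DISTINCT H1.ID1, H2.ID2
      FROM T_STGDV STG
      INNER JOIN T_HUB1 H1 ON STG.BK1 = H1.BK1
      INNER JOIN T_HUB2 H2 ON STG.BK2 = H2.BK2) STG
LEFT OUTER JOIN T_LINK L ON L.ID1 = STG.ID1 AND L.ID2 = STG.ID2
{where}
"""

Q2 = """
SELECT DISTINCT H1.ID1, H2.ID2
FROM T_STGDV STG
INNER JOIN T_HUB1 H1 ON STG.BK1 = H1.BK1
INNER JOIN T_HUB2 H2 ON STG.BK2 = H2.BK2
LEFT OUTER JOIN T_LINK L ON L.ID1 = H1.ID1 AND L.ID2 = H2.ID2
{where}
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)

def run(template, where=""):
    # sorted() makes the comparison independent of row order
    return sorted(conn.execute(template.format(where=where)).fetchall())

with_filter = (run(Q1, "WHERE L.IDL IS NULL"), run(Q2, "WHERE L.IDL IS NULL"))
without_filter = (run(Q1), run(Q2))
print(with_filter)
print(without_filter)
```

With the filter, both queries return only the unlinked pair (1, 20). Without it, the first query returns (1, 10) twice because of the two matching T_LINK rows, while the second query's DISTINCT collapses them.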

If you want to test them against your current data set, you can use MINUS:
query 1
MINUS
query 2
If any rows are returned, the queries are not the same.
You have to flip them around to test the other direction too:
query 2
MINUS
query 1
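If both tests return no rows, the queries return the same set of rows on your current data set. MINUS removes duplicates, so also compare the COUNT(*) of both queries if the number of rows matters.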
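The two-direction check can be wrapped in a small helper. This is a minimal sketch, again using sqlite3 as a stand-in for Oracle: SQLite spells MINUS as EXCEPT, and `compare_queries` is a hypothetical helper name. Because both set operators discard duplicates, the helper also compares row counts.

```python
import sqlite3

def compare_queries(conn, q1, q2):
    """Return (only_in_q1, only_in_q2, same_row_count) for two SELECT statements.

    EXCEPT (Oracle: MINUS) is a set operator and removes duplicates, so two
    queries can pass both EXCEPT checks and still return different row counts.
    """
    only_in_q1 = conn.execute(f"SELECT * FROM ({q1}) EXCEPT SELECT * FROM ({q2})").fetchall()
    only_in_q2 = conn.execute(f"SELECT * FROM ({q2}) EXCEPT SELECT * FROM ({q1})").fetchall()
    count1 = conn.execute(f"SELECT COUNT(*) FROM ({q1})").fetchone()[0]
    count2 = conn.execute(f"SELECT COUNT(*) FROM ({q2})").fetchone()[0]
    return only_in_q1, only_in_q2, count1 == count2

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (v INTEGER);
INSERT INTO t VALUES (1), (1), (2);
""")

# Same set of values but a different number of rows: EXCEPT alone cannot tell them apart.
result = compare_queries(conn, "SELECT v FROM t", "SELECT DISTINCT v FROM t")
print(result)
```

Here both EXCEPT directions come back empty, and only the count comparison reveals that the first query returns three rows while the second returns two.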

This might be the difference: look at these lines in your execution plans:
2 - access("L"."ID2"(+)="STG"."ID2" AND "L"."ID1"(+)="STG"."ID1")
and
3 - access("L"."ID2"(+)="H2"."ID2" AND "L"."ID1"(+)="H1"."ID1")
STG here is the intermediate result of the subquery, which Oracle builds for the duration of the query and which has no indexes of its own. (Reusing STG, the alias of T_STGDV, as the alias of the subquery was on its own a good reason to rewrite the query.) After your refactoring, the Oracle optimizer joins T_LINK directly with H1 and H2 instead of with that intermediate result, which lets it use indexes built on those tables, thus giving you the 20x increase in speed.

After testing, they give the same result, and the second one is more efficient.

Related

sql from clause tables

I have the following query. In the FROM clause there is a left join with ga, followed by several other tables.
Should we use the LEFT JOIN keyword for all the other tables after ga, or can we leave the query as it is? Are there any performance issues with this query?
query:
from
a#db_link st left join (Select a,b,c,d
from b#db_link where id = 'AD' and num = 4) ga
on st.compensationdate = ga.compensationdate
and st.salestransactionseq = ga.salestransactionseq ,
b#db_link ta,
c#db_link cr,
d#db_link crd_typ,
e#db_link evt_typ,
f#db_link disputes
where st.salestransactionseq = ta.salestransactionseq
and st.id = 'AD'
This is the query plan:
Plan hash value: 3767304471
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | Inst |
------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT REMOTE | | 1 | 661 | 342 (1)| 00:00:01 | | | |
| 1 | NESTED LOOPS | | 1 | 661 | 342 (1)| 00:00:01 | | | |
| 2 | NESTED LOOPS | | 1 | 661 | 342 (1)| 00:00:01 | | | |
| 3 | NESTED LOOPS | | 1 | 612 | 342 (1)| 00:00:01 | | | |
| 4 | NESTED LOOPS | | 1 | 564 | 342 (1)| 00:00:01 | | | |
|* 5 | HASH JOIN | | 1 | 549 | 342 (1)| 00:00:01 | | | |
| 6 | NESTED LOOPS | | 1 | 503 | 0 (0)| 00:00:01 | | | |
| 7 | NESTED LOOPS | | 1 | 503 | 0 (0)| 00:00:01 | | | |
| 8 | NESTED LOOPS | | 1 | 450 | 0 (0)| 00:00:01 | | | |
| 9 | NESTED LOOPS | | 1 | 407 | 0 (0)| 00:00:01 | | | |
| 10 | PARTITION RANGE SINGLE | | 1 | 217 | 0 (0)| 00:00:01 | 1357 | 1357 | |
|* 11 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| CS_SALESTRANSACTION | 1 | 217 | 0 (0)| 00:00:01 | 1357 | 1357 | PRD121 |
|* 12 | INDEX RANGE SCAN | CS_SALESTRANSACTION_PK | 1 | | 0 (0)| 00:00:01 | 1357 | 1357 | PRD121 |
| 13 | PARTITION RANGE SINGLE | | 1 | 190 | 0 (0)| 00:00:01 | 1356 | 1356 | |
| 14 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| CS_TRANSACTIONASSIGNMENT | 1 | 190 | 0 (0)| 00:00:01 | 1356 | 1356 | PRD121 |
|* 15 | INDEX RANGE SCAN | CS_TRANSACTIONASSIGNMENT_PK | 1 | | 0 (0)| 00:00:01 | 1356 | 1356 | PRD121 |
|* 16 | TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED | CS_GASALESTRANSACTION | 1 | 43 | 0 (0)| 00:00:01 | ROWID | ROWID | PRD121 |
|* 17 | INDEX RANGE SCAN | GASALESTRANSACTION_IDX | 3 | | 0 (0)| 00:00:01 | | | PRD121 |
| 18 | PARTITION RANGE SINGLE | | 1 | | 2 (0)| 00:00:01 | 8 | 8 | |
| 19 | PARTITION LIST ALL | | 1 | | 2 (0)| 00:00:01 | 1 | 268 | |
|* 20 | INDEX RANGE SCAN | OD_CREDIT_UTVALUE | 1 | | 2 (0)| 00:00:01 | 1347 | 1614 | PRD121 |
|* 21 | TABLE ACCESS BY LOCAL INDEX ROWID | CS_CREDIT | 1 | 53 | 3 (0)| 00:00:01 | 1 | 1 | PRD121 |
| 22 | TABLE ACCESS FULL | ADTV_FRS_DISPUTES | 27011 | 1213K| 341 (0)| 00:00:01 | | | PRD121 |
|* 23 | TABLE ACCESS BY INDEX ROWID | ADTV_FRS_CONTROL | 1 | 15 | 1 (0)| 00:00:01 | | | PRD121 |
|* 24 | INDEX UNIQUE SCAN | ADTV_FRS_CONTROL_PK | 1 | | 0 (0)| 00:00:01 | | | PRD121 |
| 25 | PARTITION LIST SINGLE | | 1 | 48 | 1 (0)| 00:00:01 | 2 | 2 | |
|* 26 | TABLE ACCESS BY LOCAL INDEX ROWID | CS_EVENTTYPE | 1 | 48 | 1 (0)| 00:00:01 | 2 | 2 | PRD121 |
|* 27 | INDEX UNIQUE SCAN | CS_EVENTTYPE_PK | 1 | | 0 (0)| 00:00:01 | 2 | 2 | PRD121 |
| 28 | PARTITION LIST SINGLE | | 1 | | 0 (0)| 00:00:01 | 2 | 2 | |
|* 29 | INDEX UNIQUE SCAN | CS_CREDITTYPE_PK | 1 | | 0 (0)| 00:00:01 | 2 | 2 | PRD121 |
|* 30 | TABLE ACCESS BY LOCAL INDEX ROWID | CS_CREDITTYPE | 1 | 49 | 1 (0)| 00:00:01 | 2 | 2 | PRD121 |
------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("GENERICATTRIBUTE13"="A1"."ACTIVITY_ID" AND "LINENUMBER"="A1"."ITEM_ID")
11 - filter("SUBLINENUMBER"<>2)
12 - access("TENANTID"='ADTV' AND "PROCESSINGUNITSEQ"=3.82805968326498E16)
15 - access("TENANTID"='ADTV' AND "PROCESSINGUNITSEQ"=3.82805968326498E16 AND "COMPENSATIONDATE"="COMPENSATIONDATE" AND
"SALESTRANSACTIONSEQ"="SALESTRANSACTIONSEQ")
16 - filter("COMPENSATIONDATE"="COMPENSATIONDATE" AND "PAGENUMBER"=4)
17 - access("SALESTRANSACTIONSEQ"="SALESTRANSACTIONSEQ" AND "TENANTID"='ADTV')
20 - access("TENANTID"='ADTV' AND "PROCESSINGUNITSEQ"=3.82805968326498E16)
21 - filter("SALESTRANSACTIONSEQ"="SALESTRANSACTIONSEQ" AND "SALESORDERSEQ"="SALESORDERSEQ")
23 - filter(UPPER("A8"."STATUS")='NEW')
24 - access("A1"."CASE_NO"="A8"."CASE_NO")
26 - filter("EVENTTYPEID"='PROTECTIONPLAN CHARGEBACK' OR "EVENTTYPEID"='PROTECTIONPLAN CHARGEBACK-FRS' OR "EVENTTYPEID"='PROTECTIONPLAN
INCENTIVE' OR "EVENTTYPEID"='PROTECTIONPLAN INCENTIVE-FRS' OR "EVENTTYPEID"='PROTECTIONPLAN KICKER' OR "EVENTTYPEID"='PROTECTIONPLAN KICKER-FRS' OR
"EVENTTYPEID"='UNIVERSAL BILLER' OR "EVENTTYPEID"='UNIVERSAL BILLER-FRS' OR "EVENTTYPEID"='WORK ORDER' OR "EVENTTYPEID"='WORK ORDER-FRS')
27 - access("TENANTID"='ADTV' AND "EVENTTYPESEQ"="DATATYPESEQ" AND "REMOVEDATE"=TO_DATE(' 2200-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
29 - access("TENANTID"='ADTV' AND "CREDITTYPESEQ"="DATATYPESEQ" AND "REMOVEDATE"=TO_DATE(' 2200-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
30 - filter("CREDITTYPEID"="A1"."CREDIT_TYPE" OR "CREDITTYPEID" LIKE "A1"."CREDIT_TYPE"||'%FRS')
Note
-----
- fully remote statement
- dynamic statistics used: dynamic sampling (level=7)
LEFT OUTER JOIN, to give it its full name, is a two-part thing:
OUTER JOIN is a special case where "if the join fails, permit the solid side table/resultset to exist in the output and fill the partial side with NULLs"
The LEFT is a direction to the database as to which side shall be considered "solid". All the rows from the solid side are present at least once.
Absent any parentheses or subqueries driving the execution direction:
SELECT *
FROM
a
LEFT OUTER JOIN b ON ...
a is on the left, so a is the solid side. All rows from a will be present. Columns from b will be filled in where the join predicates matched, or NULL where they matched no rows.
Once this is done, this whole result set of "a and b, NULLs, warts and all" becomes "the left side" for subsequent joins:
SELECT *
FROM
a
LEFT OUTER JOIN b
some_kind_of JOIN c
Is effectively the same as:
SELECT *
FROM
(
SELECT *
FROM
a
LEFT OUTER JOIN b
) newLeft
some_kind_of JOIN c
Remember, the OUTER specifier permits the join to fail and still keeps the rows from the declared solid side.
Whether you can use INNER or LEFT/RIGHT OUTER to join c depends on what you're joining it to.
If you're joining it to, say, a column from a, then it could be fine to use INNER or OUTER; you'd use whatever you'd use if b weren't even in the picture.
Will the join from a to c sometimes fail, and you still want the rows from a? Use an OUTER join.
Will it never fail, or do you not want any rows where it fails? Use an INNER join.
However, if you're joining it to a column that was provided by b, then you probably want an OUTER join as well; otherwise there was no point making the query do a LEFT OUTER JOIN to b. Rows from b will definitely have a NULL where the join failed, but you wanted to keep those rows. If you then INNER JOIN c to a column from b that is NULL because the join failed, the row disappears from the output: nothing is ever equal to NULL, so the INNER JOIN to c on that NULL fails too. In effect, the INNER JOIN undoes all the good work the OUTER join to b did in keeping a's data.
Doing
a
LEFT JOIN b ON a.b_id = b.id
LEFT JOIN c ON b.c_id = c.id
allows those rows from a-join-b where b.c_id is NULL (because the join failed) to stay in the output, because it's an outer join to c, not an inner one.
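A minimal sketch of the difference, using sqlite3 as a stand-in for Oracle (join semantics are the same here). The tables a, b and c follow the fragments above; their rows are invented. The row a.id = 2 points at a b row that does not exist, so its b.c_id is NULL after the outer join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, b_id INTEGER);
CREATE TABLE b (id INTEGER PRIMARY KEY, c_id INTEGER);
CREATE TABLE c (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO c VALUES (1, 'c1');
INSERT INTO b VALUES (10, 1);
-- a.id = 1 has a matching b; a.id = 2 points at a b row that does not exist
INSERT INTO a VALUES (1, 10), (2, 99);
""")

# The INNER JOIN compares c.id with the NULL b.c_id of row a.id = 2,
# which is never true, so that row disappears again.
inner_to_c = conn.execute("""
    SELECT a.id, c.name
    FROM a
    LEFT JOIN b ON a.b_id = b.id
    INNER JOIN c ON b.c_id = c.id
    ORDER BY a.id""").fetchall()

# Outer-joining c as well keeps every row of a.
left_to_c = conn.execute("""
    SELECT a.id, c.name
    FROM a
    LEFT JOIN b ON a.b_id = b.id
    LEFT JOIN c ON b.c_id = c.id
    ORDER BY a.id""").fetchall()

print(inner_to_c)  # [(1, 'c1')]
print(left_to_c)   # [(1, 'c1'), (2, None)]
```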
Generally we inner join everything we can, then switch to left joining everything else because it makes the queries easier to follow. In that "if c is being inner joined to a" scenario we would perhaps:
a
INNER JOIN c on a.c_id = c.id
LEFT JOIN b on a.b_id = b.id
Rather than:
a
LEFT JOIN b on a.b_id = b.id
INNER JOIN c on a.c_id = c.id
If a table is being joined to a table that was left joined, left join it too. Avoid RIGHT join because it goes against the evaluation direction of SQL and makes things harder to reason about; any time you think about using a right join, turn it around and rewrite it as a left.
Don't forget to use subqueries too. If you want every a joined to b, which is joined to c, but only when both the b and c sides match, it's probably clearest to write:
a
LEFT JOIN (
SELECT * FROM b INNER JOIN c ON b.c_id = c.id
) b_and_c
Try to see your SQL as building a result set that grows wider with every join and, at every join, becomes the new left side.
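The b_and_c pattern can be compared with a plain LEFT JOIN chain in a minimal sqlite3 sketch (SQLite as a stand-in for Oracle; invented rows). The fragment above has no ON clause, and its SELECT * would produce two id columns, so this sketch assumes the ON clause a.b_id = b_and_c.b_id and names the subquery's columns explicitly. Row a.id = 2 has a b row whose c is missing; row a.id = 3 has no b at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, b_id INTEGER);
CREATE TABLE b (id INTEGER PRIMARY KEY, c_id INTEGER);
CREATE TABLE c (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO c VALUES (1, 'c1');
INSERT INTO b VALUES (10, 1), (20, 2);       -- b 20 points at a missing c
INSERT INTO a VALUES (1, 10), (2, 20), (3, 99);
""")

# b and c are joined first; a only picks up pairs where BOTH sides matched.
b_and_c = conn.execute("""
    SELECT a.id, b_and_c.b_id, b_and_c.c_name
    FROM a
    LEFT JOIN (
        SELECT b.id AS b_id, c.name AS c_name
        FROM b INNER JOIN c ON b.c_id = c.id
    ) b_and_c ON a.b_id = b_and_c.b_id
    ORDER BY a.id""").fetchall()

# A plain chain of LEFT JOINs keeps b's columns even when c is missing.
chain = conn.execute("""
    SELECT a.id, b.id, c.name
    FROM a
    LEFT JOIN b ON a.b_id = b.id
    LEFT JOIN c ON b.c_id = c.id
    ORDER BY a.id""").fetchall()

print(b_and_c)  # [(1, 10, 'c1'), (2, None, None), (3, None, None)]
print(chain)    # [(1, 10, 'c1'), (2, 20, None), (3, None, None)]
```

Both forms keep every row of a. They differ only for a.id = 2: the subquery form hides the half-matched b row, while the chain shows it with a NULL from c.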

Does indexing work with "WITH" in Oracle?

I have a query like this:
WITH
str_table as (
SELECT stringtext, stringnumberid
FROM STRING_TABLE
WHERE LANGID IN (23,62)
),
data as (
select *
from employee emp
left outer join str_table st on emp.nameid = st.stringnumberid
)
select * from data
I understand that the WITH clause works like this:
Step 1: The SQL query within the WITH clause is executed first.
Step 2: The output of that query is stored in a temporary relation.
Step 3: The main query is executed against the temporary relation produced in the previous step.
Now, will the indexes created on the actual STRING_TABLE help with the temporary str_table produced by the WITH clause? In other words, do those indexes have any impact on str_table?
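Oracle will not process the CTEs one by one; it analyzes the SQL as a whole. In the eyes of the Oracle optimizer, your SQL is most likely the same as the following (the LANGID filter stays in the ON clause, so employees without a matching string are still returned):
select emp.*, st.stringtext, st.stringnumberid
from employee emp left outer join STRING_TABLE st
on emp.nameid = st.stringnumberid
and st.LANGID IN (23,62);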
Oracle can use an index on STRING_TABLE; whether it will depends on the table statistics. For example, if the table has few rows (say a few hundred), Oracle will likely not use the index.
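Where the LANGID filter ends up matters once the CTE is folded into an outer join. This is a minimal sketch using sqlite3 as a stand-in for Oracle; the rows are invented. It compares the CTE with the filter placed in the ON clause and with the filter placed in the WHERE clause. Employee 2's name exists only in a language the filter excludes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, nameid INTEGER);
CREATE TABLE string_table (stringnumberid INTEGER, langid INTEGER, stringtext TEXT);
INSERT INTO employee VALUES (1, 100), (2, 200);
-- employee 2's name only exists in language 7, which the filter excludes
INSERT INTO string_table VALUES (100, 23, 'Alice'), (200, 7, 'Bob');
""")

cte = conn.execute("""
    WITH str_table AS (
        SELECT stringtext, stringnumberid FROM string_table WHERE langid IN (23, 62)
    )
    SELECT emp.id, st.stringtext
    FROM employee emp LEFT OUTER JOIN str_table st ON emp.nameid = st.stringnumberid
    ORDER BY emp.id""").fetchall()

# Filter in the ON clause: still an outer join, same rows as the CTE.
filter_in_on = conn.execute("""
    SELECT emp.id, st.stringtext
    FROM employee emp LEFT OUTER JOIN string_table st
      ON emp.nameid = st.stringnumberid AND st.langid IN (23, 62)
    ORDER BY emp.id""").fetchall()

# Filter in the WHERE clause: applied after the join, so it drops employee 2.
filter_in_where = conn.execute("""
    SELECT emp.id, st.stringtext
    FROM employee emp LEFT OUTER JOIN string_table st
      ON emp.nameid = st.stringnumberid
    WHERE st.langid IN (23, 62)
    ORDER BY emp.id""").fetchall()

print(cte, filter_in_on, filter_in_where)
```

The CTE and the ON-clause version both return [(1, 'Alice'), (2, None)]. The WHERE-clause version returns only [(1, 'Alice')], because it effectively turns the outer join into an inner join.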
It depends.
First of all, a WITH clause is not a temporary table. As the documentation says:
Oracle Database optimizes the query by treating the query name as either an inline view or as a temporary table.
The optimizer materializes a WITH subquery if you force it to with the /*+ MATERIALIZE */ hint inside the subquery, or typically if you reference the subquery more than once.
In the example below, Oracle treats the WITH clause as an inline view and merges it into the main query:
explain plan for
with a as (
select
s.textid,
s.textvalue,
a.id,
a.other_column
from string_table s
join another_tab a
on s.textid = a.textid
where langid in (1)
)
select *
from big_table b
join a a_name
on b.name_textid = a_name.textid
and b.job_textid = a_name.id
| PLAN_TABLE_OUTPUT |
| :----------------------------------------------------------------------------------- |
| Plan hash value: 1854049435 |
| |
| ------------------------------------------------------------------------------------ |
| | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | |
| ------------------------------------------------------------------------------------ |
| | 0 | SELECT STATEMENT | | 1 | 1147 | 74 (0)| 00:00:01 | |
| |* 1 | HASH JOIN | | 1 | 1147 | 74 (0)| 00:00:01 | |
| | 2 | TABLE ACCESS FULL | ANOTHER_TAB | 39 | 3042 | 3 (0)| 00:00:01 | |
| |* 3 | HASH JOIN | | 31 | 33139 | 71 (0)| 00:00:01 | |
| | 4 | TABLE ACCESS FULL| BIG_TABLE | 19 | 10279 | 3 (0)| 00:00:01 | |
| |* 5 | TABLE ACCESS FULL| STRING_TABLE | 1143 | 589K| 68 (0)| 00:00:01 | |
| ------------------------------------------------------------------------------------ |
But depending on the statistics and hints, it may evaluate the subquery first and then join its result to the main query:
explain plan for
with a as (
select
s.textid,
s.textvalue,
a.id,
a.other_column
from string_table s
join another_tab a
on s.textid = a.textid
where langid in (1)
)
select /*+NO_MERGE(a_name)*/ *
from big_table b
join a a_name
on b.name_textid = a_name.textid
and b.job_textid = a_name.id
| PLAN_TABLE_OUTPUT |
| :------------------------------------------------------------------------------------ |
| Plan hash value: 4105667421 |
| |
| ------------------------------------------------------------------------------------- |
| | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | |
| ------------------------------------------------------------------------------------- |
| | 0 | SELECT STATEMENT | | 101 | 110K| 74 (0)| 00:00:01 | |
| |* 1 | HASH JOIN | | 101 | 110K| 74 (0)| 00:00:01 | |
| | 2 | TABLE ACCESS FULL | BIG_TABLE | 19 | 10279 | 3 (0)| 00:00:01 | |
| | 3 | VIEW | | 64 | 37120 | 71 (0)| 00:00:01 | |
| |* 4 | HASH JOIN | | 64 | 38784 | 71 (0)| 00:00:01 | |
| | 5 | TABLE ACCESS FULL| ANOTHER_TAB | 39 | 3042 | 3 (0)| 00:00:01 | |
| |* 6 | TABLE ACCESS FULL| STRING_TABLE | 1143 | 589K| 68 (0)| 00:00:01 | |
| ------------------------------------------------------------------------------------- |
When you reference the WITH subquery twice, the optimizer decides to materialize it:
explain plan for
with a as (
select
s.textid,
s.textvalue
from string_table s
where langid in (1)
)
select *
from big_table b
join a a_name
on b.name_textid = a_name.textid
join a a_job
on b.job_textid = a_job.textid
| PLAN_TABLE_OUTPUT |
| :--------------------------------------------------------------------------------------------------------------------- |
| Plan hash value: 1371454296 |
| |
| ---------------------------------------------------------------------------------------------------------------------- |
| | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | |
| ---------------------------------------------------------------------------------------------------------------------- |
| | 0 | SELECT STATEMENT | | 63 | 98973 | 67 (0)| 00:00:01 | |
| | 1 | TEMP TABLE TRANSFORMATION | | | | | | |
| | 2 | LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D7224_469C01 | | | | | |
| | 3 | TABLE ACCESS BY INDEX ROWID BATCHED | STRING_TABLE | 999 | 515K| 22 (0)| 00:00:01 | |
| |* 4 | INDEX RANGE SCAN | IX | 999 | | 4 (0)| 00:00:01 | |
| |* 5 | HASH JOIN | | 63 | 98973 | 45 (0)| 00:00:01 | |
| |* 6 | HASH JOIN | | 35 | 36960 | 24 (0)| 00:00:01 | |
| | 7 | TABLE ACCESS FULL | BIG_TABLE | 19 | 10279 | 3 (0)| 00:00:01 | |
| | 8 | VIEW | | 999 | 502K| 21 (0)| 00:00:01 | |
| | 9 | TABLE ACCESS FULL | SYS_TEMP_0FD9D7224_469C01 | 999 | 502K| 21 (0)| 00:00:01 | |
| | 10 | VIEW | | 999 | 502K| 21 (0)| 00:00:01 | |
| | 11 | TABLE ACCESS FULL | SYS_TEMP_0FD9D7224_469C01 | 999 | 502K| 21 (0)| 00:00:01 | |
| ---------------------------------------------------------------------------------------------------------------------- |
So indexes on the tables inside a WITH subquery may be used in all of the above cases: before materialization, when the subquery is not merged, and when the subquery is merged and some indexes give a better plan for the merged query (even if those indexes are not used when you execute the subquery alone).
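A quick way to check this kind of thing on your own system is to look at the plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN as a stand-in for Oracle's EXPLAIN PLAN. SQLite's planner is not Oracle's, so this only illustrates the general point: the filter written inside the WITH subquery is applied against the base table, so that table's index is available. The tables are empty, and the exact plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE string_table (textid INTEGER, langid INTEGER, textvalue TEXT);
CREATE TABLE big_table (id INTEGER PRIMARY KEY, name_textid INTEGER);
CREATE INDEX ix ON string_table (langid, textid);
""")

QUERY = """
WITH a AS (
    SELECT s.textid, s.textvalue FROM string_table s WHERE langid IN (1)
)
SELECT * FROM big_table b JOIN a a_name ON b.name_textid = a_name.textid
"""

# Each EXPLAIN QUERY PLAN row ends with a human-readable detail string.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
for line in plan:
    print(line)
```

In the plan output, the step that reads string_table should mention index ix, even though the query only refers to the CTE name a.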
As for indexes in general: if they provide high selectivity (i.e. the number of rows retrieved through the index is small compared to the total number of rows), Oracle will consider using them. Note that index access has two steps: read the index blocks, then read the table blocks that contain the rowids found in the index. If the table is not much bigger than the index, Oracle may use a full table scan instead of an index scan even for a fairly selective predicate (because of the doubled I/O).
In the example below I used "small" texts (100 characters), a big_table of 20 rows, and this index on the text table:
create index ix
on string_table(langid, textid)
The optimizer decides to use an index range scan with an access predicate on the first column of the index only:
explain plan for
with a as (
select
s.textid,
s.textvalue
from string_table s
where langid in (1)
)
select *
from big_table b
join a a_name
on b.name_textid = a_name.textid
| PLAN_TABLE_OUTPUT |
| :---------------------------------------------------------------------------------------------------- |
| Plan hash value: 1660330381 |
| |
| ----------------------------------------------------------------------------------------------------- |
| | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | |
| ----------------------------------------------------------------------------------------------------- |
| | 0 | SELECT STATEMENT | | 29 | 31001 | 26 (0)| 00:00:01 | |
| |* 1 | HASH JOIN | | 29 | 31001 | 26 (0)| 00:00:01 | |
| | 2 | TABLE ACCESS FULL | BIG_TABLE | 19 | 10279 | 3 (0)| 00:00:01 | |
| | 3 | TABLE ACCESS BY INDEX ROWID BATCHED| STRING_TABLE | 999 | 515K| 23 (0)| 00:00:01 | |
| |* 4 | INDEX RANGE SCAN | IX | 999 | | 4 (0)| 00:00:01 | |
| ----------------------------------------------------------------------------------------------------- |
| |
| Predicate Information (identified by operation id): |
| --------------------------------------------------- |
| |
| 1 - access("B"."NAME_TEXTID"="S"."TEXTID") |
| 4 - access("LANGID"=1) |
But when we reduce the number of rows in big_table, the optimizer switches to nested loops and uses both index columns for the range scan:
delete from big_table
where id > 4
explain plan for
with a as (
select
s.textid,
s.textvalue
from string_table s
where langid in (1)
)
select *
from big_table b
join a a_name
on b.name_textid = a_name.textid
| PLAN_TABLE_OUTPUT |
| :-------------------------------------------------------------------------------------------- |
| Plan hash value: 1766926914 |
| |
| --------------------------------------------------------------------------------------------- |
| | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | |
| --------------------------------------------------------------------------------------------- |
| | 0 | SELECT STATEMENT | | 6 | 18216 | 11 (0)| 00:00:01 | |
| | 1 | NESTED LOOPS | | 6 | 18216 | 11 (0)| 00:00:01 | |
| | 2 | NESTED LOOPS | | 6 | 18216 | 11 (0)| 00:00:01 | |
| | 3 | TABLE ACCESS FULL | BIG_TABLE | 4 | 4032 | 3 (0)| 00:00:01 | |
| |* 4 | INDEX RANGE SCAN | IX | 1 | | 1 (0)| 00:00:01 | |
| | 5 | TABLE ACCESS BY INDEX ROWID| STRING_TABLE | 2 | 4056 | 2 (0)| 00:00:01 | |
| --------------------------------------------------------------------------------------------- |
| |
| Predicate Information (identified by operation id): |
| --------------------------------------------------- |
| |
| 4 - access("LANGID"=1 AND "B"."NAME_TEXTID"="S"."TEXTID") |
| |
You can try the code snippets above in the db<>fiddle.

Slow inner join in Oracle

I have an Oracle database with a main table containing 9,000,000 rows and a second table with 19,000,000 rows.
When I run:
SELECT *
FROM main m
INNER JOIN second s ON m.id = s.fk_id AND s.cd = 'E' AND s.line = 1
It takes 45 seconds to get the first rows of the result, even with all the indexes below:
CREATE INDEX IDX_1 ON SECOND (LINE, CD, FK_ID, ID);
CREATE INDEX IDX_1 ON SECOND (LINE, CD);
MAIN (ID) AS PRIMARY KEY
Any idea how to make it faster? I tried some indexes and rebuilds, but it always takes 45 seconds.
Here is the execution plan:
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8850631 | 2133002071 | 696494 | 00:00:28 |
| * 1 | HASH JOIN | | 8850631 | 2133002071 | 696494 | 00:00:28 |
| * 2 | TABLE ACCESS FULL | SECOND | 8850631 | 646096063 | 143512 | 00:00:06 |
| 3 | TABLE ACCESS FULL | MAIN | 9227624 | 1550240832 | 153363 | 00:00:06 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 1 - access("M"."ID"="S"."FK_ID")
* 2 - filter("S"."CD"='D' AND "S"."LINE"=1)
Thanks
If you want to see the first rows quickly, you have to enable Oracle to use a NESTED LOOPS join.
This requires an index on second with the two columns you filter on in your query, and an index on main on the join column id (your primary key on MAIN(ID) already provides the latter).
create index second_idx on second(line,cd);
create index main_idx on main(id);
You'll see an execution plan similar to the one below:
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 87 | 8178 | 178 (0)| 00:00:03 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 87 | 8178 | 178 (0)| 00:00:03 |
| 3 | TABLE ACCESS BY INDEX ROWID| SECOND | 87 | 2523 | 4 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | SECOND_IDX | 1 | | 3 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | MAIN_IDX | 1 | | 1 (0)| 00:00:01 |
| 6 | TABLE ACCESS BY INDEX ROWID | MAIN | 1 | 65 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("S"."LINE"=1 AND "S"."CD"='E')
5 - access("M"."ID"="S"."FK_ID")
You will access via the index all rows in second with the requested line and cd (plan lines 4 and 3), and for each such row you'll access the main table via its index (lines 5 and 6).
This gives instant access to the first few rows and works fine if only a small number of rows in the second table have the selected line and cd. Otherwise (when there is a large number of rows with s.cd = 'E' AND s.line = 1, say 10k+), you will still see the first result rows quickly, but you'll wait ages to see the last row (it will take much more than 45 seconds to finish the query).
If this is a problem, you have to use a HASH JOIN (which you are probably doing now).
A hash join typically does not use indexes and produces an execution plan like the following:
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10182 | 1153K| 908 (1)| 00:00:11 |
|* 1 | HASH JOIN | | 10182 | 1153K| 908 (1)| 00:00:11 |
|* 2 | TABLE ACCESS FULL| SECOND | 10182 | 99K| 520 (2)| 00:00:07 |
| 3 | TABLE ACCESS FULL| MAIN | 90000 | 9316K| 387 (1)| 00:00:05 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("M"."ID"="S"."FK_ID")
2 - filter("S"."LINE"=1 AND "S"."CD"='E')
Summary
To use nested loops, the indexes must be available as described above.
The choice between nested loops and a hash join is made by the Oracle database (the CBO), provided that your table statistics and database configuration are fine.
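The time-to-first-row argument can be illustrated with a toy model of the two join algorithms. This is a plain-Python sketch, not Oracle's implementation: a dict stands in for the index on MAIN(ID), "rows read" is a crude stand-in for I/O, and the data sizes are invented.

```python
from collections import defaultdict

def nested_loops_join(second_rows, main_by_id, stats):
    """Toy NESTED LOOPS: drive from the filtered SECOND rows and probe MAIN
    through an index (modelled as a dict keyed by MAIN.ID).
    A joined row can be produced right after the first successful probe."""
    for s in second_rows:
        stats["rows_read"] += 1            # one SECOND row from its index
        m = main_by_id.get(s["fk_id"])     # one probe of the MAIN(ID) index
        stats["rows_read"] += 1
        if m is not None:
            yield m, s

def hash_join(second_rows, main_rows, stats):
    """Toy HASH JOIN: build a hash table on SECOND, then probe it with a
    full scan of MAIN. No row can be produced until the build is finished."""
    table = defaultdict(list)
    for s in second_rows:                  # build phase reads every filtered row
        stats["rows_read"] += 1
        table[s["fk_id"]].append(s)
    for m in main_rows:                    # probe phase: full scan of MAIN
        stats["rows_read"] += 1
        for s in table.get(m["id"], []):
            yield m, s

# Hypothetical data: 10,000 MAIN rows, 5,000 SECOND rows already filtered on line/cd.
main_rows = [{"id": i} for i in range(1, 10_001)]
main_by_id = {m["id"]: m for m in main_rows}
second_rows = [{"fk_id": i} for i in range(1, 5_001)]

nl_stats = {"rows_read": 0}
first_nl = next(nested_loops_join(second_rows, main_by_id, nl_stats))

hj_stats = {"rows_read": 0}
first_hj = next(hash_join(second_rows, main_rows, hj_stats))

print("rows read before the first result:", nl_stats["rows_read"], "vs", hj_stats["rows_read"])

# Both algorithms produce the same joined rows once fully consumed.
nl_all = sorted(m["id"] for m, s in nested_loops_join(second_rows, main_by_id, {"rows_read": 0}))
hj_all = sorted(m["id"] for m, s in hash_join(second_rows, main_rows, {"rows_read": 0}))
```

The toy treats every row read as equal, and that is exactly what it leaves out. In a real database each nested-loops probe tends to be a random single-block read, while the full scans feeding a hash join use multiblock reads. That per-row cost is why nested loops win on time to the first row but can lose badly on total time when many SECOND rows qualify.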

Oracle SQL indexed query 100% cpu usage

I'm running a relatively simple query:
SELECT * FROM confirm_v c
JOIN person p ON c.created_by=p.id
INNER JOIN invoice_confirm ic ON ic.confirm_id=c.id
WHERE c.id = (SELECT id FROM
(SELECT c2.id FROM confirm c2
JOIN invoice_confirm ic2 ON ic2.confirm_id=c2.id
WHERE ic2.invoice_id=11954081
AND c2.previous=0
AND c2.canceled=0
AND c2.confirm_type='INVOICE'
ORDER BY c2.id)
WHERE rownum=1);
which results in 100% CPU usage on the database. confirm_type is a VARCHAR2(50 CHAR) and the other columns are NUMBER(10), if that means anything.
The invoice_confirm and confirm tables are covered by indexes, and there are no full table scans visible in the execution plan for this query.
This query isn't executed often, but it accounts for nearly 100% of total CPU usage. Any ideas are appreciated.
EDIT:
The explain plan for the query in text form:
EXPLAIN PLAN FOR ...
SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());
Plan hash value: 1705859247
------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 69 | 10 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 69 | 10 (0)| 00:00:01 |
| 2 | NESTED LOOPS | | 1 | 69 | 10 (0)| 00:00:01 |
| 3 | NESTED LOOPS | | 1 | 57 | 7 (0)| 00:00:01 |
| 4 | NESTED LOOPS | | 1 | 30 | 5 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID | CONFIRM | 1 | 24 | 3 (0)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | PK_CONFIRM | 1 | | 2 (0)| 00:00:01 |
|* 7 | COUNT STOPKEY | | | | | |
| 8 | VIEW | | 4 | 52 | 27 (4)| 00:00:01 |
|* 9 | SORT ORDER BY STOPKEY | | 4 | 132 | 27 (4)| 00:00:01 |
| 10 | NESTED LOOPS | | 4 | 132 | 26 (0)| 00:00:01 |
| 11 | NESTED LOOPS | | 11 | 132 | 26 (0)| 00:00:01 |
| 12 | TABLE ACCESS BY INDEX ROWID BATCHED| INVOICE_CONFIRM | 3 | 36 | 4 (0)| 00:00:01 |
|* 13 | INDEX RANGE SCAN | FKI_INVOICE_CONFIRM_INVOICE | 2 | | 3 (0)| 00:00:01 |
|* 14 | INDEX UNIQUE SCAN | PK_CONFIRM | 1 | | 1 (0)| 00:00:01 |
|* 15 | TABLE ACCESS BY INDEX ROWID | CONFIRM | 1 | 21 | 2 (0)| 00:00:01 |
|* 16 | INDEX RANGE SCAN | FKI_INVOICE_CONFIRM_CONFIRM | 1 | 6 | 2 (0)| 00:00:01 |
| 17 | TABLE ACCESS BY INDEX ROWID | PERSON | 1 | 27 | 2 (0)| 00:00:01 |
|* 18 | INDEX UNIQUE SCAN | PK_KASUTAJA | 1 | | 1 (0)| 00:00:01 |
|* 19 | INDEX RANGE SCAN | FKI_INVOICE_CONFIRM_CONFIRM | 1 | | 2 (0)| 00:00:01 |
| 20 | TABLE ACCESS BY INDEX ROWID | INVOICE_CONFIRM | 1 | 12 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
6 - access("CONFIRM"."ID"= (SELECT "ID" FROM (SELECT "C2"."ID" "ID" FROM "INVOICE_CONFIRM" "IC2","CONFIRM" "C2"
WHERE "IC2"."CONFIRM_ID"="C2"."ID" AND "C2"."CANCELED"=0 AND "C2"."PREVIOUS"=0 AND "C2"."CONFIRM_TYPE"='INVOICE' AND
"IC2"."INVOICE_ID"=11954081 ORDER BY "C2"."ID") "from$_subquery$_006" WHERE ROWNUM=1))
7 - filter(ROWNUM=1)
9 - filter(ROWNUM=1)
13 - access("IC2"."INVOICE_ID"=11954081)
14 - access("IC2"."CONFIRM_ID"="C2"."ID")
15 - filter("C2"."CANCELED"=0 AND "C2"."PREVIOUS"=0 AND "C2"."CONFIRM_TYPE"='INVOICE')
16 - access("IC"."CONFIRM_ID"="CONFIRM"."ID")
18 - access("CONFIRM"."CREATED_BY"="P"."ID")
19 - access("IC"."CONFIRM_ID"="CONFIRM"."ID")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- 2 Sql Plan Directives used for this statement
Gather optimizer statistics on the relevant tables and investigate why the statistics were missing.
begin
dbms_stats.gather_table_stats(user, 'CONFIRM');
dbms_stats.gather_table_stats(user, 'INVOICE_CONFIRM');
dbms_stats.gather_table_stats(user, 'PERSON');
end;
/
Optimizer statistics are critical for Oracle to achieve good performance. The note dynamic statistics used: dynamic sampling (level=2) implies that there are tables with missing optimizer statistics. That should never happen unless the tables were created within the last day.
Oracle automatically gathers stale and missing statistics. Check whether the job is running with this query; if there are no recent rows, ask your DBA to re-enable the task.
select *
from dba_optstat_operations
where operation like '%auto%'
order by start_time desc;
The autotask is good enough for most tables. But if there's a large batch process that updates a lot of rows then the statistics should be manually collected as soon as the job is complete.

SQL tuning issue

I have a query:
select count(1) CNT
from file_load_params a
where a.doc_type = (select b.doc_type
from file_load_header b
where b.indicator = 'XELFASI')
order by a.line_no
Its explain plan is:
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 | 3 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 7 | | |
|* 2 | TABLE ACCESS FULL | FILE_LOAD_PARAMS | 15 | 105 | 2 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| FILE_LOAD_HEADER | 1 | 12 | 1 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | FILE_LOAD_HEADER_UK | 1 | | 0 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
I thought I could optimize this query by rewriting it like this:
select count(1) CNT
from file_load_params a,file_load_header b
where b.indicator = 'XELFASI'
and a.doc_type = b.doc_type
order by a.line_no
Its explain plan is:
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 19 | 3 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 19 | | |
| 2 | NESTED LOOPS | | 15 | 285 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| FILE_LOAD_HEADER | 1 | 12 | 1 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | FILE_LOAD_HEADER_UK | 1 | | 0 (0)| 00:00:01 |
|* 5 | TABLE ACCESS FULL | FILE_LOAD_PARAMS | 15 | 105 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
Is it good? I think not, but I expected a better result... Do you have any idea?
From the explain plans, these appear to be tiny tables and the cost of the query is negligible. How long do they take to run, and how quickly do you need them to run?
Also, remove the ORDER BY: since you are selecting a single-row COUNT aggregate, it is pointless.
One of the possible optimizations I see in your explain plan is this line:
TABLE ACCESS FULL | FILE_LOAD_PARAMS
This seems to indicate that table file_load_params possibly does not have any index on doc_type.
If that is the case, can you add an index on doc_type? If you already have indexes, can you post your table schema for file_load_params?
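The result is not necessarily the same for the two queries. A subquery used with IN behaves like a semi-join, as if a DISTINCT were applied to the inner query: each row of file_load_params is counted at most once, whereas the join counts it once per matching row of file_load_header. (With =, as in your query, Oracle raises an error if the subquery returns more than one row.) In this case it is probably not a key you are joining on (if it is, make it a unique key), so the optimizer cannot optimize this difference away.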
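The semi-join versus join difference can be checked with a minimal sqlite3 sketch (SQLite as a stand-in for Oracle; invented rows). It uses IN rather than = because Oracle raises ORA-01427 when a subquery compared with = returns more than one row, while SQLite would silently use the first row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE file_load_header (indicator TEXT, doc_type TEXT);
CREATE TABLE file_load_params (line_no INTEGER, doc_type TEXT);
INSERT INTO file_load_params VALUES (1, 'INV'), (2, 'INV'), (3, 'CRN');
""")

SEMI_JOIN = """
    SELECT COUNT(1) FROM file_load_params a
    WHERE a.doc_type IN (SELECT b.doc_type FROM file_load_header b
                         WHERE b.indicator = 'XELFASI')"""
JOIN = """
    SELECT COUNT(1) FROM file_load_params a, file_load_header b
    WHERE b.indicator = 'XELFASI' AND a.doc_type = b.doc_type"""

def counts():
    # (count via IN subquery, count via join)
    return conn.execute(SEMI_JOIN).fetchone()[0], conn.execute(JOIN).fetchone()[0]

conn.execute("INSERT INTO file_load_header VALUES ('XELFASI', 'INV')")
unique_header = counts()      # one matching header row: both forms count 2

conn.execute("INSERT INTO file_load_header VALUES ('XELFASI', 'INV')")
duplicate_header = counts()   # two identical header rows: the join counts each param twice

print(unique_header, duplicate_header)
```

With a single matching header row both forms return 2. Once the header row is duplicated, the IN form still returns 2 while the join returns 4, which is why the rewrite is only safe when the matched column is a unique key.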
As for optimizing the query, do as InSane says and add an index on doc_type in FILE_LOAD_PARAMS.