SQL Why don't use PK index? - sql

I have this complicated SQL query:
SELECT f1 (d1.prdecdde),
f2 (d1.prdecdde),
f3 (d1.prdecdde),
f4 (1, d1.prdecdde, d1.prdenpol),
d1.prdeisin,
f6 (d1.prdecdde, a.POLIRCTB),
NVL (a.poliagtb, a.poliagta),
d1.prdedtpr,
prdeticu
FROM ( SELECT prdecdde,
prdenpol,
prdeano,
SUM (NVL (prdeval, 0)) valantes,
NULL valdepois,
prdedtpr,
prdeticu,
prdeisin
FROM stat_pro_det
WHERE prdedprv = '20151101'
AND prdecdde IN (700,
100,
610,
600,
710,
900,
910)
AND prdeval > 0
GROUP BY prdecdde,
prdenpol,
prdeano,
prdedtpr,
prdeticu,
prdeisin
UNION ALL
SELECT prdecdde,
prdenpol,
prdeano,
NULL,
SUM (NVL (prdeval, 0)) valdepois,
prdedtpr,
prdeticu,
prdeisin
FROM stat_pro_det
WHERE prdedprv = '20160727'
AND prdecdde IN (700,
100,
610,
600,
710,
900,
910)
AND prdeval > 0
GROUP BY prdecdde,
prdenpol,
prdeano,
prdedtpr,
prdeticu,
prdeisin) d1,
sgss.dtpoli a
WHERE a.policdde = d1.prdecdde AND a.polinpol = d1.prdenpol
HAVING SUM (NVL (d1.valdepois, 0) - NVL (d1.valantes, 0)) <> 0
GROUP BY d1.prdecdde,
d1.prdenpol,
d1.prdeano,
a.polirctb,
a.poliagta,
a.poliagtb,
d1.prdedtpr,
d1.prdeticu,
d1.prdeisin;
The primary key for dtpoli table is this:
CREATE UNIQUE INDEX SGSS.PK_DTPOLI ON SGSS.DTPOLI
(POLICDDE, POLINPOL)
Here is the explain plan:
Plan hash value: 1960385779
--------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 19403 | 1705K| | 113K (1)| 00:00:05 |
|* 1 | FILTER | | | | | | |
| 2 | HASH GROUP BY | | 19403 | 1705K| 38M| 113K (1)| 00:00:05 |
|* 3 | HASH JOIN | | 388K| 33M| 23M| 111K (1)| 00:00:05 |
| 4 | TABLE ACCESS FULL | DTPOLI | 618K| 16M| | 6561 (7)| 00:00:01 |
| 5 | VIEW | | 388K| 22M| | 103K (1)| 00:00:05 |
| 6 | UNION-ALL | | | | | | |
| 7 | HASH GROUP BY | | 194K| 9304K| 13M| 52044 (1)| 00:00:03 |
| 8 | INLIST ITERATOR | | | | | | |
|* 9 | TABLE ACCESS BY INDEX ROWID| STAT_PRO_DET | 194K| 9304K| | 50003 (1)| 00:00:02 |
|* 10 | INDEX RANGE SCAN | STAT_PRO_DET_03 | 198K| | | 790 (2)| 00:00:01 |
| 11 | HASH GROUP BY | | 193K| 9264K| 13M| 51818 (1)| 00:00:03 |
| 12 | INLIST ITERATOR | | | | | | |
|* 13 | TABLE ACCESS BY INDEX ROWID| STAT_PRO_DET | 193K| 9264K| | 49784 (1)| 00:00:02 |
|* 14 | INDEX RANGE SCAN | STAT_PRO_DET_03 | 197K| | | 783 (2)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(SUM(NVL("D1"."VALDEPOIS",0)-NVL("D1"."VALANTES",0))<>0)
3 - access("POLICDDE"="D1"."PRDECDDE" AND "POLINPOL"="D1"."PRDENPOL")
9 - filter("PRDEVAL">0)
10 - access("PRDEDPRV"='20151101' AND ("PRDECDDE"=100 OR "PRDECDDE"=600 OR "PRDECDDE"=610 OR
"PRDECDDE"=700 OR "PRDECDDE"=710 OR "PRDECDDE"=900 OR "PRDECDDE"=910))
13 - filter("PRDEVAL">0)
14 - access("PRDEDPRV"='20160727' AND ("PRDECDDE"=100 OR "PRDECDDE"=600 OR "PRDECDDE"=610 OR
"PRDECDDE"=700 OR "PRDECDDE"=710 OR "PRDECDDE"=900 OR "PRDECDDE"=910))
Both columns are number datatype. Using the hint parallel(#) I can improve the performance but my focus is on dtpoli PK.
I can't find why this query doesn't use this primary key index and uses a Full Table Scan on DTPOLI table. It's because I have a Group by clause? I really don't understand.
Any help?
I'm using Oracle 11gR2.

It doesn't use the index because it would be less efficient to do so. An index is useful if you're retrieving a small proportion of the data from the table, but when you're getting a lot of it, using the index would be slower.
The reason is that finding the matching row in the index requires a disk access, to get the index block. That gives you the ROWID of the data record, and you then need another disk access to get that data block. Each index and data block has to be read at least once, and possibly many times.
The blocks may be in the buffer cache, but you're still hitting that twice, and because you'd be jumping around to different parts of the index and table you're increasing the likelihood of things having aged out - which means that even if you end up getting two rows from the same physical data block, you might end up having to read it from disk twice.
A full table scan will retrieve all of the data blocks for the table in one go, so it doesn't have to read any of them twice, and doesn't have the additional overhead of also reading the index blocks.
If you were only referring to the columns in the primary key (or any other index) then a full index scan might be used. But you're retrieving non-index data, such as poliagtb, so the data blocks have to be retrieved too.
Your primary key is enforcing referential integrity. It can also be used to retrieve specific data quickly, but only when appropriate. The optimiser does a pretty good job of deciding when it is and isn't appropriate.

Related

Oracle SQL Execution Plan Difference

I have the below query and I'm generating the execution plan for it:
(EDIT: I'm using SQL Developer).
EXPLAIN PLAN
FOR
WITH aux AS (
SELECT
*
FROM
cdc.uap_fkkvkp#rbip
WHERE
ezawe = 'D'
)
SELECT /* FULL(a) FULL(b) */
-- COUNT(1)
b.zzpayment_plan,
b.vkont,
b.gpart,
a.opbel,
a.opupw,
a.opupk,
a.opupz,
a.blart,
a.betrw
FROM
cdc.uap_dfkkop#rbip a
JOIN aux b ON b.vkont = a.vkont
WHERE
a.augst IS NULL
AND a.xanza IS NULL
AND a.stakz IS NULL
AND a.augrs IS NULL
AND a.abwtp IS NULL;
SELECT
*
FROM
TABLE ( dbms_xplan.display );
It gives the below plan:
Plan hash value: 289441478
-----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4404K| 634M| | 6752K (1)| 00:04:24 |
|* 1 | HASH JOIN | | 4404K| 634M| 291M| 6752K (1)| 00:04:24 |
|* 2 | TABLE ACCESS STORAGE FULL | UAP_FKKVKP | 4487K| 239M| | 559K (1)| 00:00:22 |
|* 3 | TABLE ACCESS BY INDEX ROWID BATCHED| UAP_DFKKOP | 4404K| 399M| | 6160K (1)| 00:04:01 |
| 4 | BITMAP CONVERSION TO ROWIDS | | | | | | |
| 5 | BITMAP AND | | | | | | |
|* 6 | BITMAP INDEX SINGLE VALUE | UAP_DFKKOP_AUGST_2 | | | | | |
|* 7 | BITMAP INDEX SINGLE VALUE | UAP_DFKKOP_NEW_STAKZ_2 | | | | | |
|* 8 | BITMAP INDEX SINGLE VALUE | UAP_DFKKOP_AUGRS_2 | | | | | |
-----------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("UAP_FKKVKP"."VKONT"="A"."VKONT")
2 - storage("EZAWE"=U'D')
filter("EZAWE"=U'D')
3 - filter("A"."XANZA" IS NULL AND "A"."ABWTP" IS NULL)
6 - access("A"."AUGST" IS NULL)
7 - access("A"."STAKZ" IS NULL)
8 - access("A"."AUGRS" IS NULL)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- 1 Sql Plan Directive used for this statement
However, if I generate the plan for the first part of the code, but for the COUNT(1) uncommented (and the other selections commented), the below shows (it's exactly the same as the previous execution plan):
Plan hash value: 2732266276
------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 66 | | 6736K (1)| 00:04:24 |
| 1 | SORT AGGREGATE | | 1 | 66 | | | |
|* 2 | HASH JOIN | | 4404K| 277M| 171M| 6736K (1)| 00:04:24 |
|* 3 | TABLE ACCESS STORAGE FULL | UAP_FKKVKP | 4487K| 119M| | 559K (1)| 00:00:22 |
|* 4 | TABLE ACCESS BY INDEX ROWID BATCHED| UAP_DFKKOP | 4404K| 159M| | 6160K (1)| 00:04:01 |
| 5 | BITMAP CONVERSION TO ROWIDS | | | | | | |
| 6 | BITMAP AND | | | | | | |
|* 7 | BITMAP INDEX SINGLE VALUE | UAP_DFKKOP_AUGST_2 | | | | | |
|* 8 | BITMAP INDEX SINGLE VALUE | UAP_DFKKOP_NEW_STAKZ_2 | | | | | |
|* 9 | BITMAP INDEX SINGLE VALUE | UAP_DFKKOP_AUGRS_2 | | | | | |
------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("UAP_FKKVKP"."VKONT"="A"."VKONT")
3 - storage("EZAWE"=U'D')
filter("EZAWE"=U'D')
4 - filter("A"."XANZA" IS NULL AND "A"."ABWTP" IS NULL)
7 - access("A"."AUGST" IS NULL)
8 - access("A"."STAKZ" IS NULL)
9 - access("A"."AUGRS" IS NULL)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- 1 Sql Plan Directive used for this statement
If I try to execute both queries, the first one takes about 3 secs while the second one takes almost 8 mins.
Why is there such a difference between the two and why is it not captured in the execution plan?
Is my full table scan here correctly applied? If not, what would be the best hint?
What would be the best (up to date) book/documentation/online tutorials to upgrade my SQL performance skills? So far, I've seen that Oracle has a dev gym for performance and there are also some books like Advanced Oracle SQL Tuning which looks interesting.
With explain there is no real execution. You will receive a lot more details with real execution after turning statistics_level to ALL on session level.
But run it in SQLPLUS:
set serverout off
alter session set statistics_level=all;
<execute the query here without explain>
select * from table(dbms_xplan.display_cursor(null,null,'RUNSTATS_LAST'));
If you provide two such plans instead of EXPLAIN PLAN it will be easier for your readers.
Short description of important columns(Tanel Poder site is good):
Starts - number of execution of the step
A-Rows - actual number of rows received from this step
Buffers - number of reads from RAM - you have to decrease this value
Reads - number of reads from disk
Writes - number of writes to disk - different from 0 - mainly for SELECTs when TEMP is used
This method for plan usage is shown for example here : https://tanelpoder.com/files/Oracle_SQL_Plan_Execution.pdf

Oracle Join a Million Record Table slow down query performance

My requirement is to find the idle period for the each customer.To find the idle customer first i have to fetch the
registration table and it has 1 million records. To find out the last transaction time for each customer i have to
join the transaction log table it has 60 million records.Below is my query for that.
SELECT CUSTOMERNAME,MOBILENUMBER,ACCOUNTNUMBER,
CUSTOMERID,LASTTXNDATE,
FLOOR(SYSDATE - to_date(TO_CHAR(LASTTXNDATE, 'DD/MM/YYYY'),'DD/MM/YYYY')) AS "IDLE DAYS"
FROM REGN_MAST
LEFT JOIN
( SELECT TXNMOBILENUMBER,MAX(TXNDT) AS LASTTXNDATE
FROM TXN_DETL
GROUP BY TXNMOBILENUMBER
)
ON MOBILENUMBER=TXNMOBILENUMBER;
explain plan for
SELECT CUSTOMERNAME,MOBILENUMBER,ACCOUNTNUMBER,
CUSTOMERID,LASTTXNDATE,
FLOOR(SYSDATE - to_date(TO_CHAR(LASTTXNDATE, 'DD/MM/YYYY'),'DD/MM/YYYY')) AS "IDLE DAYS"
FROM REGN_MAST
LEFT JOIN
( SELECT TXNMOBILENUMBER,MAX(TXNDT) AS LASTTXNDATE
FROM TXN_DETL
GROUP BY TXNMOBILENUMBER
)
ON MOBILENUMBER=TXNMOBILENUMBER;
Plan hash value: 403296370
------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1231K| 102M| | 1554K (1)| 05:10:59 | | |
|* 1 | HASH JOIN RIGHT OUTER | | 1231K| 102M| 58M| 1554K (1)| 05:10:59 | | |
| 2 | VIEW | | 1565K| 40M| | 1535K (1)| 05:07:07 | | |
| 3 | HASH GROUP BY | | 1565K| 37M| 2792M| 1535K (1)| 05:07:07 | | |
| 4 | PARTITION RANGE ALL | | 80M| 1926M| | 1321K (1)| 04:24:24 | 1 |1048575|
| 5 | PARTITION HASH ALL | | 80M| 1926M| | 1321K (1)| 04:24:24 | 1 | 4 |
| 6 | TABLE ACCESS FULL | TXN_DETL | 80M| 1926M| | 1321K (1)| 04:24:24 | 1 |1048575|
| 7 | PARTITION RANGE ALL | | 1231K| 70M| | 12237 (1)| 00:02:27 | 1 |1048575|
| 8 | PARTITION HASH ALL | | 1231K| 70M| | 12237 (1)| 00:02:27 | 1 | 4 |
| 9 | TABLE ACCESS BY LOCAL INDEX ROWID| REGN_MAST | 1231K| 70M| | 12237 (1)| 00:02:27 | 1 |1048575|
| 10 | BITMAP CONVERSION TO ROWIDS | | | | | | | | |
| 11 | BITMAP INDEX FULL SCAN | IDX_REGN_MAST_7 | | | | | | 1 |1048575|
------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("MOBILENUMBER"="TXNMOBILENUMBER"(+))
Note
-----
- dynamic sampling used for this statement (level=11)
------------------------------------------------------------------------------------------------------------------------------------------------
This query takes more than 25 minutes.How to improve the performance of this query.
Any help will be greatly appreciated!!!!!!
Your query uses all data from both tables, so the first choice is to chect the execution plan using the FULL TABLE SCAN.
Remember FULL TABLE SCAN is slow, but selecting all rows from a table with an INDEX is much slower...
So you should approach an execotion plan as follows:
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000K| 60M| | 176K (2)| 00:00:07 |
|* 1 | HASH JOIN OUTER | | 1000K| 60M| 41M| 176K (2)| 00:00:07 |
| 2 | TABLE ACCESS FULL | REGN_MAST | 1000K| 29M| | 1370 (1)| 00:00:01 |
| 3 | VIEW | | 1014K| 30M| | 170K (2)| 00:00:07 |
| 4 | HASH GROUP BY | | 1014K| 16M| 1610M| 170K (2)| 00:00:07 |
| 5 | TABLE ACCESS FULL| TXN_DETL | 60M| 972M| | 49771 (1)| 00:00:02 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("MOBILENUMBER"="TXNMOBILENUMBER"(+))
Depending on your HW and memory configuration the time will vary, but on a recent HW I'd expect elapces time below 10 minutes.
You may further limit it using
a) parallel query
b) keep a materialized view holding the last transaction date
Here my test with generated data leding to 5+ minutes (see below).
So my advice either remove all indexes or hint the FULL and retry.
SQL> set timi on
SQL> set autotrace traceonly
SQL> SELECT CUSTOMERNAME,MOBILENUMBER,ACCOUNTNUMBER,
2 CUSTOMERID,LASTTXNDATE,
3 FLOOR(SYSDATE - to_date(TO_CHAR(LASTTXNDATE, 'DD/MM/YYYY'),'DD/MM/YYYY')
) AS "IDLE DAYS"
4 FROM REGN_MAST
5 LEFT JOIN
6 ( SELECT TXNMOBILENUMBER,MAX(TXNDT) AS LASTTXNDATE
7 FROM TXN_DETL
8 GROUP BY TXNMOBILENUMBER
9 )
10 ON MOBILENUMBER=TXNMOBILENUMBER;
1000000 rows selected.
Elapsed: 00:05:42.23
Sample Data
create table REGN_MAST
as
select
'Name'||rownum CUSTOMERNAME,'00'||rownum MOBILENUMBER, 99*rownum ACCOUNTNUMBER, rownum CUSTOMERID
from dual connect by level <= 1000000;
create table TXN_DETL
as
with cust as (
select
'00'||rownum TXNMOBILENUMBER
from dual connect by level <= 1000000),
trans as (
select DATE'2018-01-01' + rownum TXNDT
from dual connect by level <= 60)
select TXNMOBILENUMBER, TXNDT
from cust CROSS join trans;
I would try rewriting the query as:
SELECT m.CUSTOMERNAME, m.MOBILENUMBER, m.ACCOUNTNUMBER,
m.CUSTOMERID, t.TXNDT,
FLOOR(SYSDATE - TRUNC(TXNDT)) AS IDLE_DAYS
FROM REGN_MAST m JOIN
TXN_DETL t
ON m.MOBILENUMBER = t.TXNMOBILENUMBER
WHERE t.TXNDT = (SELECT MAX(t2.TXNDT) FROM TXN_DETL t2 WHERE m.MOBILENUMBER = t2.TXNMOBILENUMBER);
Then, be sure that you have an index on TXN_DETL(TXNMOBILENUMBER, TXNDT) for performance.
I changed the LEFT JOIN to an INNER JOIN under the assumption that all customers have transactions.
This also simplifies the date arithmetic. That has less to do with performance than readability.
Create a covering index on TXN_DETL(TXNMOBILENUMBER,TXNDT).
According to the execution plan 86% of the cost is for the full table scan on TXN_DETL. If there is an index on all the relevant columns Oracle can use that index as a skinny table. An INDEX FAST FULL SCAN operation might run significantly faster than TABLE ACCESS FULL.

Oracle Optimizer Awkwardly Doesn't Prefer to Use Index

I'm joining a table with itself but although I expect this operation to use index, it seems it doesn't. There are 1 million records on the table(MY_TABLE) and the query I run is executing on about 10 thousand records.(So it is lower than %1 of whole table.)
Test Case:
explain plan for
SELECT *
FROM SCHM.MY_TABLE A1, SCHM.MY_TABLE A2
WHERE (A1.K_ID = '123abc')
AND A1.HDT = A2.HDT
AND A2.C_DATE BETWEEN A1.SYSDATE - 0.0004
AND A1.SYSDATE + 0.0004
AND A1.GKID = A2.GKID;
Plan hash value: 1210306805
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 81 | 28 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
|* 2 | HASH JOIN | | 3 | 81 | 28 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| MY_TABLE | 3 | 45 | 7 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | IX_MY_TABLE_C_DATE | 3 | | 4 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| MY_TABLE | 17 | 204 | 21 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | IX_MY_TABLE_K_ID | 17 | | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(SYSDATE#!+0.00004>=SYSDATE#!-0.00004)
2 - access("A1"."HDT"="A2"."HDT" AND
"A1"."GKID"="A2"."GKID")
4 - access("A2"."C_DATE">=SYSDATE#!-0.00004 AND
"A2"."C_DATE"<=SYSDATE#!+0.00004)
6 - access("A1"."K_ID"=U'123abc')
In the above statement, it can be seen that the index on C_DATE is used.
However, in the below statement, the index on C_DATE is not used. So, the query runs really slow.
Real Case:
explain plan for
SELECT *
FROM SCHM.MY_TABLE A1, SCHM.MY_TABLE A2
WHERE (A1.K_ID = '123abc')
AND A1.HDT = A2.HDT
AND A2.C_DATE BETWEEN A1.C_DATE - 0.0004
AND A1.C_DATE + 0.0004
AND A1.GKID = A2.GKID;
Plan hash value: 1063167343
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4187K| 998M| 6549K (1)| 00:04:16 |
|* 1 | HASH JOIN | | 4187K| 998M| 6549K (1)| 00:04:16 |
| 2 | JOIN FILTER CREATE | :BF0000 | 17 | 2125 | 21 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| MY_TABLE | 17 | 2125 | 21 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | IX_MY_TABLE_K_ID | 17 | | 4 (0)| 00:00:01 |
| 5 | JOIN FILTER USE | :BF0000 | 1429M| 166G| 6546K (1)| 00:04:16 |
|* 6 | TABLE ACCESS STORAGE FULL | MY_TABLE | 1429M| 166G| 6546K (1)| 00:04:16 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access(A1.HDT=A2.HDT AND
A1.GKID=A2.GKID)
filter(A2.C_DATE>=INTERNAL_FUNCTION(A1.C_DATE)-0.00004 AND
A2.C_DATE<=INTERNAL_FUNCTION(A1.C_DATE)+0.00004)
4 - access(A1.K_ID=U'123abc')
6 - storage(SYS_OP_BLOOM_FILTER(:BF0000,A2.HDT,A2.HDT))
filter(SYS_OP_BLOOM_FILTER(:BF0000,A2.HDT,A2.HDT))
If I use the hint /*+index(A2,IX_MY_TABLE_C_DATE )*/, everthing is fine, the index is used and the query runs fast as I want.
The query in the real case cannot be changed because it is created by application.
Index Information:
K_ID, not unique, position 1
HDT, not unique, position 1
C_DATE, not unique, position 1
ID Unique and Primary Key, position 1
What do I have to change in order the query in the real case to use index?
Well, the second query is slower since it's quite different from the first one. It has an extra join between tables:
AND A2.C_DATE BETWEEN A1.C_DATE - 0.0004
AND A1.C_DATE + 0.0004
and on a million rows this takes a toll.
The first query doesn't have this join condition and both tables are:
Filtered first. This is fast using indexes: only 3 rows and 17 rows.
Joining them second. Joining 3 and 17 rows doesn't take any time.
The second query needs to perform:
A huge (hash) join first, that returns probably 100K+ rows.
A filtering later.
This is way slower.
I suggest adding the following indexes, and try again:
create index ix_1 (k_id);
create index ix_2 (hdt, gkid, c_date);
You have three join criteria (HDT, GKID, C_DATE) and 1 non-join criteria (K_ID) in your self-join. So for me it would seem natural, if the DBMS started with the records matching K_ID and then looked up all matching other records.
For this case I'd suggest the following indexes:
create index idx1 on my_table(k_id, hdt, gkid, c_date);
create index idx2 on my_table(hdt, gkid, c_date);
If there are just few records per k_id, I am sure that Oracle will use the indexes. If there are many, Oracle may still use the second one for a hash join.
Just to add, whenever you have a case where you
have a slow SQL,
have identified a hint that would make it
faster
cannot change the SQL because the source is not available
then this is a perfect case for SQL Plan Baselines. These let you lock a "good" plan against an existing SQL without touching the SQL statement itself.
The entire series describing SPM is at the links below, but thelinke to "part 4" walks through an exact example of what you want to achieve.
https://blogs.oracle.com/optimizer/sql-plan-management-part-1-of-4-creating-sql-plan-baselines
https://blogs.oracle.com/optimizer/sql-plan-management-part-2-of-4-spm-aware-optimizer
https://blogs.oracle.com/optimizer/sql-plan-management-part-3-of-4:-evolving-sql-plan-baselines
https://blogs.oracle.com/optimizer/sql-plan-management-part-4-of-4:-user-interfaces-and-other-features

Using partition key in function Oracle

We have a partioned table in our Oracle database using this syntaxe:
...
PARTITION BY RANGE(saledate)
(PARTITION sal99q1 VALUES LESS THAN (TO_DATE('01-APR-1999', 'DD-MON-YYYY')),
PARTITION sal99q2 VALUES LESS THAN (TO_DATE('01-JUL-1999', 'DD-MON-YYYY')),
...
We usually use partition key in select statement like this:
Select * from table where saledate >= trunc(sysdate-3) and saledate < trunc(sysdate-2)
To have same result using less code, I usually use this query instead :
Select * from table where trunc(saledate) = trunc(sysdate-3)
My question is, by using partition key in a function, in this case trunc(), do we loose partioning performance ?
You are misinterpreting the plan shown by Toad (in your answer). For the two queries that shows, respectively:
Partition #: 2 Partitions determined by key values
and
Partition #: 1 Partitions accessed #1 - #17
The first query is accessing only the partitions it needs, based on the key value, which is the date; so it only has to do a full scan of the partition(s) that could contain your date.
The second query has to access all partitions because you are manipulating the key value with a function, meaning you are no longer really using the partition key. The key is saledate, not trunc(saledate). This is similar to what happens when you use a function on an indexed column; the index is no longer used in that case, and the partition key is no longer used here. And as you seem to suspect from your question, yes, you do lose efficiency.
You can also see that the cardinality has been guessed as 50 because of the function call, instead of the stats-provided value of 4966.
You can see the same thing from a dummy table using dbms_xplan; from your first query:
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4996 | 39968 | 14 (0)| 00:00:01 | | |
|* 1 | FILTER | | | | | | | |
| 2 | PARTITION RANGE ITERATOR| | 4996 | 39968 | 14 (0)| 00:00:01 | KEY | KEY |
|* 3 | TABLE ACCESS FULL | T42 | 4996 | 39968 | 14 (0)| 00:00:01 | KEY | KEY |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(TRUNC(SYSDATE#!-3)<TRUNC(SYSDATE#!-2))
3 - filter("SALEDATE">=TRUNC(SYSDATE#!-3) AND "SALEDATE"<TRUNC(SYSDATE#!-2))
And from your second query:
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 50 | 400 | 14 (0)| 00:00:01 | | |
| 1 | PARTITION RANGE ALL| | 50 | 400 | 14 (0)| 00:00:01 | 1 | 17 |
|* 2 | TABLE ACCESS FULL | T42 | 50 | 400 | 14 (0)| 00:00:01 | 1 | 17 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(TRUNC(INTERNAL_FUNCTION("SALEDATE"))=TRUNC(SYSDATE#!-3))
Notice the pstart/pstop values and the cardinality in each query.
Your first query is going to be more efficient because it can use the partition key to be selective about which partitions it does a full scan of, while the second cannot and has to scan them all.

Query Tuning - Advice

I need advice on the attached Query. The query executes for over an hour and has full table scan as per the Explain Plan. I am fairly new to query tuning and would appriciate some advice.
Firstly why would I get a full table scan even though all the columns I use have index created on them.
Secondly, is there any possibility where in I can reduce the execution time, all tables accessed are huge and contain millions of records, even then I would like to scope out some options. Appriciate your help.
Query:
select
distinct rtrim(a.cod_acct_no)||'|'||
a.cod_prod||'|'||
to_char(a.dat_acct_open,'Mon DD YYYY HH:MMAM')||'|'||
a.cod_acct_title||'|'||
a.cod_acct_stat||'|'||
ltrim(to_char(a.amt_od_limit,'99999999999999999990.999999'))||'|'||
ltrim(to_char(a.bal_book,'99999999999999999990.999999'))||'|'||
a.flg_idd_auth||'|'||
a.flg_mnt_status||'|'||
rtrim(c.cod_acct_no)||'|'||
c.cod_10||'|'||
d.nam_branch||'|'||
d.nam_cc_city||'|'||
d.nam_cc_state||'|'||
c.cod_1||'|'||
c.cod_14||'|'||
num_14||'|'||
a.cod_cust||'|'||
c.cod_last_mnt_chkrid||'|'||
c.dat_last_mnt||'|'||
c.ctr_updat_srlno||'|'||
c.cod_20||'|'||
c.num_16||'|'||
c.cod_14||'|'||
c.num_10 ||'|'||
a.flg_classif_reqd||'|'||
(select g.cod_classif_plan_id||'|'||
g.cod_classif_plan_id
from
ac_acct_preferences g
where
a.cod_acct_no=g.cod_acct_no AND g.FLG_MNT_STATUS = 'A' )||'|'||
(select e.dat_cam_expiry from flexprod_host.AC_ACCT_PLAN_CRITERIA e where a.cod_acct_no=e.cod_acct_no and e.FLG_MNT_STATUS ='A')||'|'||
c.cod_23||'|'||
lpad(trim(a.cod_cc_brn),4,0)||'|'||
(select min( o.dat_eff) from ch_acct_od_hist o where a.cod_acct_no=o.cod_acct_no )
from
ch_acct_mast a,
ch_acct_cbr_codes c,
ba_cc_brn_mast d
where
a.flg_mnt_status ='A'
and c.flg_mnt_status ='A'
and a.cod_acct_no= c.cod_acct_no(+)
and a.cod_cc_brn=d.cod_cc_brn
and a.cod_prod in (
299,200,804,863,202,256,814,232,182,844,279,830,802,833,864,
813,862,178,205,801,235,897,231,187,229,847,164,868,805,207,
250,837,274,253,831,893,201,809,846,819,820,845,811,843,285,
894,284,817,832,278,818,810,181,826,867,825,848,871,866,895,
770,806,827,835,838,881,853,188,816,293,298)
Query Plan:
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4253465430
------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 733K| 125M| | 468K (1)|999:59:59 | | |
| 1 | TABLE ACCESS BY INDEX ROWID | AC_ACCT_PREFERENCES | 1 | 26 | | 3 (0)| 00:01:05 | | |
|* 2 | INDEX UNIQUE SCAN | IN_AC_ACCT_PREFERENCES_1 | 1 | | | 2 (0)| 00:00:43 | | |
| 3 | PARTITION HASH SINGLE | | 1 | 31 | | 3 (0)| 00:01:05 | KEY | KEY |
| 4 | TABLE ACCESS BY LOCAL INDEX ROWID| AC_ACCT_PLAN_CRITERIA | 1 | 31 | | 3 (0)| 00:01:05 | KEY | KEY |
|* 5 | INDEX UNIQUE SCAN | IN_AC_ACCT_PLAN_CRITERIA_1 | 1 | | | 2 (0)| 00:00:43 | KEY | KEY |
| 6 | SORT AGGREGATE | | 1 | 29 | | | | | |
| 7 | FIRST ROW | | 1 | 29 | | 3 (0)| 00:01:05 | | |
|* 8 | INDEX RANGE SCAN (MIN/MAX) | IN_CH_ACCT_OD_HIST_1 | 1 | 29 | | 3 (0)| 00:01:05 | | |
| 9 | HASH UNIQUE | | 733K| 125M| 139M| 468K (1)|999:59:59 | | |
|* 10 | HASH JOIN | | 733K| 125M| | 439K (1)|999:59:59 | | |
|* 11 | TABLE ACCESS FULL | BA_CC_BRN_MAST | 3259 | 136K| | 31 (0)| 00:11:04 | | |
|* 12 | HASH JOIN | | 747K| 97M| 61M| 439K (1)|999:59:59 | | |
| 13 | PARTITION HASH ALL | | 740K| 52M| | 286K (1)|999:59:59 | 1 | 64 |
|* 14 | TABLE ACCESS FULL | CH_ACCT_MAST | 740K| 52M| | 286K (1)|999:59:59 | 1 | 64 |
|* 15 | TABLE ACCESS FULL | CH_ACCT_CBR_CODES | 9154K| 541M| | 117K (1)|699:41:01 | | |
------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("COD_ACCT_NO"=:B1 AND "FLG_MNT_STATUS"='A' AND "COD_ENTITY_VPD"=TO_NUMBER(NVL(SYS_CONTEXT('CLIENTCONTEXT','entity_co
de'),'0')))
5 - access("COD_ACCT_NO"=:B1 AND "FLG_MNT_STATUS"='A' AND "COD_ENTITY_VPD"=TO_NUMBER(NVL(SYS_CONTEXT('CLIENTCONTEXT','entity_co
de'),'0')))
8 - access("COD_ACCT_NO"=:B1)
filter("COD_ENTITY_VPD"=TO_NUMBER(NVL(SYS_CONTEXT('CLIENTCONTEXT','entity_code'),'0')))
10 - access("COD_CC_BRN"="COD_CC_BRN")
11 - filter("COD_ENTITY_VPD"=TO_NUMBER(NVL(SYS_CONTEXT('CLIENTCONTEXT','entity_code'),'0')))
12 - access("COD_ACCT_NO"="COD_ACCT_NO")
14 - filter(("COD_PROD"=164 OR "COD_PROD"=178 OR "COD_PROD"=181 OR "COD_PROD"=182 OR "COD_PROD"=187 OR "COD_PROD"=188 OR
"COD_PROD"=200 OR "COD_PROD"=201 OR "COD_PROD"=202 OR "COD_PROD"=205 OR "COD_PROD"=207 OR "COD_PROD"=229 OR "COD_PROD"=231 OR
"COD_PROD"=232 OR "COD_PROD"=235 OR "COD_PROD"=250 OR "COD_PROD"=253 OR "COD_PROD"=256 OR "COD_PROD"=274 OR "COD_PROD"=278 OR
"COD_PROD"=279 OR "COD_PROD"=284 OR "COD_PROD"=285 OR "COD_PROD"=293 OR "COD_PROD"=298 OR "COD_PROD"=299 OR "COD_PROD"=770 OR
"COD_PROD"=801 OR "COD_PROD"=802 OR "COD_PROD"=804 OR "COD_PROD"=805 OR "COD_PROD"=806 OR "COD_PROD"=809 OR "COD_PROD"=810 OR
"COD_PROD"=811 OR "COD_PROD"=813 OR "COD_PROD"=814 OR "COD_PROD"=816 OR "COD_PROD"=817 OR "COD_PROD"=818 OR "COD_PROD"=819 OR
"COD_PROD"=820 OR "COD_PROD"=825 OR "COD_PROD"=826 OR "COD_PROD"=827 OR "COD_PROD"=830 OR "COD_PROD"=831 OR "COD_PROD"=832 OR
"COD_PROD"=833 OR "COD_PROD"=835 OR "COD_PROD"=837 OR "COD_PROD"=838 OR "COD_PROD"=843 OR "COD_PROD"=844 OR "COD_PROD"=845 OR
"COD_PROD"=846 OR "COD_PROD"=847 OR "COD_PROD"=848 OR "COD_PROD"=853 OR "COD_PROD"=862 OR "COD_PROD"=863 OR "COD_PROD"=864 OR
"COD_PROD"=866 OR "COD_PROD"=867 OR "COD_PROD"=868 OR "COD_PROD"=871 OR "COD_PROD"=881 OR "COD_PROD"=893 OR "COD_PROD"=894 OR
"COD_PROD"=895 OR "COD_PROD"=897) AND "FLG_MNT_STATUS"='A' AND "COD_ENTITY_VPD"=TO_NUMBER(NVL(SYS_CONTEXT('CLIENTCONTEXT','entity_
code'),'0')))
15 - filter("FLG_MNT_STATUS"='A' AND "COD_ENTITY_VPD"=TO_NUMBER(NVL(SYS_CONTEXT('CLIENTCONTEXT','entity_code'),'0')))
Considering each table contains over 100 columns I am limited while uploading the entire table definition. however please find the below details for the columns accessed in the where clause. Hope this helps.
Columns Type Nullable
cod_acct_no CHAR(16) N
FLG_MNT_STATUS CHAR(1) N
cod_23 VARCHAR2(360) Y
cod_cc_brn NUMBER(5) N
cod_prod NUMBER N
I Hope this can bring the cost down.
select
distinct rtrim(a.cod_acct_no)||'|'||
a.cod_prod||'|'||
to_char(a.dat_acct_open,'Mon DD YYYY HH:MMAM')||'|'||
a.cod_acct_title||'|'||
a.cod_acct_stat||'|'||
ltrim(to_char(a.amt_od_limit,'99999999999999999990.999999'))||'|'||
ltrim(to_char(a.bal_book,'99999999999999999990.999999'))||'|'||
a.flg_idd_auth||'|'||
a.flg_mnt_status||'|'||
rtrim(c.cod_acct_no)||'|'||
c.cod_10||'|'||
d.nam_branch||'|'||
d.nam_cc_city||'|'||
d.nam_cc_state||'|'||
c.cod_1||'|'||
c.cod_14||'|'||
num_14||'|'||
a.cod_cust||'|'||
c.cod_last_mnt_chkrid||'|'||
c.dat_last_mnt||'|'||
c.ctr_updat_srlno||'|'||
c.cod_20||'|'||
c.num_16||'|'||
c.cod_14||'|'||
c.num_10 ||'|'||
a.flg_classif_reqd||'|'||
g.cod_classif_plan_id||'|'||g.cod_classif_plan_id
||'|'||
e.dat_cam_expiry ||'|'||
c.cod_23||'|'||
lpad(trim(a.cod_cc_brn),4,0)||'|'||
(select min( o.dat_eff) from ch_acct_od_hist o where a.cod_acct_no=o.cod_acct_no )
from
ch_acct_mast a
JOIN ch_acct_cbr_codes c
ON a.flg_mnt_status ='A'
and c.flg_mnt_status ='A'
and a.cod_acct_no= c.cod_acct_no(+)
JOIN ba_cc_brn_mast d
a.cod_cc_brn=d.cod_cc_brn
JOIN ac_acct_preferences g
ON a.cod_acct_no=g.cod_acct_no AND g.FLG_MNT_STATUS = 'A'
INNER JOIN flexprod_host.AC_ACCT_PLAN_CRITERIA e
ON a.cod_acct_no=e.cod_acct_no and e.FLG_MNT_STATUS ='A'
WHERE a.cod_prod in (
299,200,804,863,202,256,814,232,182,844,279,830,802,833,864,
813,862,178,205,801,235,897,231,187,229,847,164,868,805,207,
250,837,274,253,831,893,201,809,846,819,820,845,811,843,285,
894,284,817,832,278,818,810,181,826,867,825,848,871,866,895,
770,806,827,835,838,881,853,188,816,293,298)
1. Don't fear full table scans. If a large percent of the rows in a table are being accessed it is more efficient to use a hash join/full table scan than a nested loop/index scan.
2. Fix statistics and re-analyze objects. 999 hours to read a table? That's probably an optimizer bug, have a dba look at select * from sys.aux_stats$; for some ridiculous values.
The time isn't very useful, but if one of your forecasted values is so significantly off then you need to check all of them. You should probably re-gather stats on all the relevant tables. Use default settings unless there is a good reason. For example, exec dbms_stats.gather_table_stats('your_schema_name','CH_ACCT_MAST');.
3. Look at cardinalities. Are the Rows estimates in the ballpark? They'll almost never be perfect, but if they are off by more than
an order of magnitude or two it can cause problems. Look for the first significant difference and try to correct it.
4. Code change. #Santhosh had a good idea to re-write using ANSI joins and manually unnest a subquery. Although I think you should
try to unnest the other subquery instead. Oracle can automatically unnest subqueries, but not if subqueries "contain aggregate functions".
5. Disable VPD Looks like this query is being transformed. Make sure you understand exactly what it's doing and why. You may want to disable VPD temporarily, for yourself, while you debug this problem.
6. Parallelism. Since some of these tables are large, you may want to add a parallel hint. But be careful, it is easy to use up a lot
of resources. Try to get the plan right before you do this.