Oracle Optimizer Awkwardly Doesn't Prefer to Use Index - sql

I'm joining a table with itself but although I expect this operation to use index, it seems it doesn't. There are 1 million records on the table(MY_TABLE) and the query I run is executing on about 10 thousand records.(So it is lower than %1 of whole table.)
Test Case:
explain plan for
SELECT *
FROM SCHM.MY_TABLE A1, SCHM.MY_TABLE A2
WHERE (A1.K_ID = '123abc')
AND A1.HDT = A2.HDT
AND A2.C_DATE BETWEEN A1.SYSDATE - 0.0004
AND A1.SYSDATE + 0.0004
AND A1.GKID = A2.GKID;
Plan hash value: 1210306805
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 81 | 28 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
|* 2 | HASH JOIN | | 3 | 81 | 28 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| MY_TABLE | 3 | 45 | 7 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | IX_MY_TABLE_C_DATE | 3 | | 4 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| MY_TABLE | 17 | 204 | 21 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | IX_MY_TABLE_K_ID | 17 | | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(SYSDATE#!+0.00004>=SYSDATE#!-0.00004)
2 - access("A1"."HDT"="A2"."HDT" AND
"A1"."GKID"="A2"."GKID")
4 - access("A2"."C_DATE">=SYSDATE#!-0.00004 AND
"A2"."C_DATE"<=SYSDATE#!+0.00004)
6 - access("A1"."K_ID"=U'123abc')
In the above statement, it can be seen that the index on C_DATE is used.
However, in the below statement, the index on C_DATE is not used. So, the query runs really slow.
Real Case:
explain plan for
SELECT *
FROM SCHM.MY_TABLE A1, SCHM.MY_TABLE A2
WHERE (A1.K_ID = '123abc')
AND A1.HDT = A2.HDT
AND A2.C_DATE BETWEEN A1.C_DATE - 0.0004
AND A1.C_DATE + 0.0004
AND A1.GKID = A2.GKID;
Plan hash value: 1063167343
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4187K| 998M| 6549K (1)| 00:04:16 |
|* 1 | HASH JOIN | | 4187K| 998M| 6549K (1)| 00:04:16 |
| 2 | JOIN FILTER CREATE | :BF0000 | 17 | 2125 | 21 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| MY_TABLE | 17 | 2125 | 21 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | IX_MY_TABLE_K_ID | 17 | | 4 (0)| 00:00:01 |
| 5 | JOIN FILTER USE | :BF0000 | 1429M| 166G| 6546K (1)| 00:04:16 |
|* 6 | TABLE ACCESS STORAGE FULL | MY_TABLE | 1429M| 166G| 6546K (1)| 00:04:16 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access(A1.HDT=A2.HDT AND
A1.GKID=A2.GKID)
filter(A2.C_DATE>=INTERNAL_FUNCTION(A1.C_DATE)-0.00004 AND
A2.C_DATE<=INTERNAL_FUNCTION(A1.C_DATE)+0.00004)
4 - access(A1.K_ID=U'123abc')
6 - storage(SYS_OP_BLOOM_FILTER(:BF0000,A2.HDT,A2.HDT))
filter(SYS_OP_BLOOM_FILTER(:BF0000,A2.HDT,A2.HDT))
If I use the hint /*+index(A2,IX_MY_TABLE_C_DATE )*/, everthing is fine, the index is used and the query runs fast as I want.
The query in the real case cannot be changed because it is created by application.
Index Information:
K_ID, not unique, position 1
HDT, not unique, position 1
C_DATE, not unique, position 1
ID Unique and Primary Key, position 1
What do I have to change in order the query in the real case to use index?

Well, the second query is slower since it's quite different from the first one. It has an extra join between tables:
AND A2.C_DATE BETWEEN A1.C_DATE - 0.0004
AND A1.C_DATE + 0.0004
and on a million rows this takes a toll.
The first query doesn't have this join condition and both tables are:
Filtered first. This is fast using indexes: only 3 rows and 17 rows.
Joining them second. Joining 3 and 17 rows doesn't take any time.
The second query needs to perform:
A huge (hash) join first, that returns probably 100K+ rows.
A filtering later.
This is way slower.
I suggest adding the following indexes, and try again:
create index ix_1 (k_id);
create index ix_2 (hdt, gkid, c_date);

You have three join criteria (HDT, GKID, C_DATE) and 1 non-join criteria (K_ID) in your self-join. So for me it would seem natural, if the DBMS started with the records matching K_ID and then looked up all matching other records.
For this case I'd suggest the following indexes:
create index idx1 on my_table(k_id, hdt, gkid, c_date);
create index idx2 on my_table(hdt, gkid, c_date);
If there are just few records per k_id, I am sure that Oracle will use the indexes. If there are many, Oracle may still use the second one for a hash join.

Just to add, whenever you have a case where you
have a slow SQL,
have identified a hint that would make it
faster
cannot change the SQL because the source is not available
then this is a perfect case for SQL Plan Baselines. These let you lock a "good" plan against an existing SQL without touching the SQL statement itself.
The entire series describing SPM is at the links below, but thelinke to "part 4" walks through an exact example of what you want to achieve.
https://blogs.oracle.com/optimizer/sql-plan-management-part-1-of-4-creating-sql-plan-baselines
https://blogs.oracle.com/optimizer/sql-plan-management-part-2-of-4-spm-aware-optimizer
https://blogs.oracle.com/optimizer/sql-plan-management-part-3-of-4:-evolving-sql-plan-baselines
https://blogs.oracle.com/optimizer/sql-plan-management-part-4-of-4:-user-interfaces-and-other-features

Related

Can this query be made faster?

My Oracle query takes over 1.5 min and I do not know if it's because of inefficient query writing, bad choice of indexes or some other database issue that I cannot control.
Some tables and data were changed to protect IP.
SELECT /*+ PARALLEL (AUTO) */ COUNT(DISTINCT SUD_USERID)
FROM (
SELECT /*+ PARALLEL (AUTO) */
SUD_USERID ,
CASE WHEN SCH_PAGETYPE = 'Page' AND SUD_EVENTTYPE = 'S'
THEN 'EVENTTYPE1'
WHEN SCH_PAGETYPE = 'Page' AND SUD_EVENTTYPE = 'V'
THEN 'EVENTTYPE2'
WHEN SCH_PAGETYPE = 'Hub' AND SUD_EVENTTYPE = 'S'
THEN 'EVENTTYPE3'
END AS CALC_EVENT_SOURCE,
SUD_EVENT_SOURCE
FROM
(
SELECT /*+ PARALLEL (AUTO) */
UPPER(PAGETYPE)|| '-' || SCH.ID PAGETYPE_ID ,
SCH.PAGETYPE SCH_PAGETYPE
FROM TABLE1 SCH
WHERE SCH.PAGETYPE IN ('Page', 'Hub')
AND SCH.CATEGORY_NAME NOT IN ('archive', 'testcategory')
)
INNER JOIN (
SELECT /*+ PARALLEL (AUTO) */
DISTINCT SUD.TRACEID TRACEID ,
SUD.EVENTTYPE SUD_EVENTTYPE ,
SUD.USERID SUD_USERID,
SUD.EVENT_SOURCE SUD_EVENT_SOURCE
FROM
SOMESCHEMA.USAGE_DETAILS SUD
WHERE
SUD.EVENTTYPE IN ('S', 'V')
)
ON TRACEID = PAGETYPE_ID
INNER JOIN USER_JOB_FAMILY_MAPPING SFD
ON SUD_USERID = SFD.USERID
)
WHERE CALC_EVENT_SOURCE = SUD_EVENT_SOURCE
I could not copy the text of the explain plan (generated via DBeaver)
but here is a screenshot:
USAGE_DETAILS table has 3941810 rows
TABLE1 has 5908 rows
USER_JOB_FAMILY_MAPPING has 578233 rows
There are no keys on any of these tables.
USAGE_DETAILS.TRACEID is VARCHAR2(500) NOT NULL has function index=SUBSTR("TRACEID",1,4) and
another index declared as default but on that column.
USAGE_DETAILS.USERID is VARCHAR2(50) NOT NULL
USAGE_DETAILS.EVENTTYPE is VARCHAR2(10) NOT NULL and has default index
USAGE_DETAILS.EVENT_SOURCE is VARCHAR2(200) NOT NULL and has default index
I have tried doing inner joins on the full tables rather than the parenthetically generated (subselect?) tables, but that did not perform better and also limited my ability to use an alias in the WHERE clause.
I do not know what kind of machine this is running on, just that it's set up for development. I'd like this query to give me accurate answers in under 10s. Some times the query above though still does not return even after 10+ minutes.
Plan hash value: 2784166315
-----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 27 | | 258K (1)| 00:00:11 |
| 1 | SORT AGGREGATE | | 1 | 27 | | | |
| 2 | VIEW | VM_NWVW_1 | 1809K| 46M| | 258K (1)| 00:00:11 |
| 3 | HASH GROUP BY | | 1809K| 745M| | 258K (1)| 00:00:11 |
|* 4 | HASH JOIN | | 1809K| 745M| | 258K (1)| 00:00:11 |
|* 5 | TABLE ACCESS FULL | TABLE1S | 5875 | 172K| | 309 (0)| 00:00:01 |
| 6 | MERGE JOIN SEMI | | 3079K| 1180M| | 257K (1)| 00:00:11 |
| 7 | SORT JOIN | | 3079K| 1139M| | 254K (1)| 00:00:10 |
| 8 | VIEW | | 3079K| 1139M| | 254K (1)| 00:00:10 |
| 9 | HASH UNIQUE | | 3079K| 1139M| 1202M| 254K (1)| 00:00:10 |
| 10 | INLIST ITERATOR | | | | | | |
| 11 | TABLE ACCESS BY INDEX ROWID BATCHED| USAGE_DETAILS | 3079K| 1139M| | 46 (0)| 00:00:01 |
|* 12 | INDEX RANGE SCAN | IDX_UUD_EVENTTYPE | 13704 | | | 46 (0)| 00:00:01 |
|* 13 | SORT UNIQUE | | 578K| 7905K| 22M| 3558 (1)| 00:00:01 |
| 14 | INDEX FAST FULL SCAN | USERID_IDX | 578K| 7905K| | 704 (1)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("TRACEID"=UPPER("PAGETYPE")||'-'||TO_CHAR("SCH"."ID"))
filter("from$_subquery$_004"."SUD_EVENT_SOURCE"=CASE WHEN (("SCH"."PAGETYPE"='Page') AND
("from$_subquery$_004"."SUD_EVENTTYPE"='S')) THEN 'EVENTTYPE1' WHEN (("SCH"."PAGETYPE"='Page') AND
("from$_subquery$_004"."SUD_EVENTTYPE"='V')) THEN 'EVENTTYPE2' WHEN (("SCH"."PAGETYPE"='Hub') AND
("from$_subquery$_004"."SUD_EVENTTYPE"='S')) THEN 'EVENTTYPE3' END )
5 - filter("SCH"."CATEGORY_NAME"<>'archive' AND "SCH"."CATEGORY_NAME"<>'testcategory' AND ("SCH"."PAGETYPE"='Hub' OR
"SCH"."PAGETYPE"='Page'))
12 - access("SUD"."EVENTTYPE"='S' OR "SUD"."EVENTTYPE"='V')
13 - access("from$_subquery$_004"."SUD_USERID"="SFD"."USERID")
filter("from$_subquery$_004"."SUD_USERID"="SFD"."USERID")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation
After running
exec dbms_stats.gather_table_stats(ownname=>'SCHEMA1',tabname=>'USAGE_DETAILS');
exec dbms_stats.gather_table_stats(ownname=>'SCHEMA1',tabname=>'TABLE1');
I have this new plan:
SQL> select plan_table_output from table(dbms_xplan.display());
Plan hash value: 3419946982
----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 27 | | 70152 (1)| 00:00:03 |
| 1 | SORT AGGREGATE | | 1 | 27 | | | |
| 2 | VIEW | VM_NWVW_1 | 53144 | 1401K| | 70152 (1)| 00:00:03 |
| 3 | HASH GROUP BY | | 53144 | 21M| 21M| 70152 (1)| 00:00:03 |
|* 4 | HASH JOIN RIGHT SEMI | | 53144 | 21M| 14M| 65453 (1)| 00:00:03 |
| 5 | INDEX FAST FULL SCAN | USERID_IDX | 578K| 7905K| | 704 (1)| 00:00:01 |
|* 6 | HASH JOIN | | 53144 | 20M| | 62995 (1)| 00:00:03 |
| 7 | JOIN FILTER CREATE | :BF0000 | 5503 | 161K| | 309 (0)| 00:00:01 |
|* 8 | TABLE ACCESS FULL | TABLE1 | 5503 | 161K| | 309 (0)| 00:00:01 |
| 9 | VIEW | | 3549K| 1259M| | 62677 (1)| 00:00:03 |
| 10 | HASH UNIQUE | | 3549K| 159M| 203M| 62677 (1)| 00:00:03 |
| 11 | JOIN FILTER USE | :BF0000 | 3549K| 159M| | 21035 (1)| 00:00:01 |
|* 12 | TABLE ACCESS FULL| USAGE_DETAILS | 3549K| 159M| | 21035 (1)| 00:00:01 |
----------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("from$_subquery$_004"."SUD_USERID"="SFD"."USERID")
6 - access("TRACEID"=UPPER("PAGETYPE")||'-'||TO_CHAR("SCH"."ID"))
filter("from$_subquery$_004"."SUD_EVENT_SOURCE"=CASE WHEN (("SCH"."PAGETYPE"='Page') AND
("from$_subquery$_004"."SUD_EVENTTYPE"='S')) THEN 'EVENTTYPE1' WHEN (("SCH"."PAGETYPE"='Page') AND
("from$_subquery$_004"."SUD_EVENTTYPE"='V')) THEN 'EVENTTYPE2' WHEN (("SCH"."PAGETYPE"='Hub') AND
("from$_subquery$_004"."SUD_EVENTTYPE"='S')) THEN 'EVENTTYPE3' END )
8 - filter("SCH"."CATEGORY_NAME"<>'archive' AND "SCH"."CATEGORY_NAME"<>'testcategory' AND
("SCH"."PAGETYPE"='Hub' OR "SCH"."PAGETYPE"='Page'))
12 - filter(("SUD"."EVENTTYPE"='S' OR "SUD"."EVENTTYPE"='V') AND
SYS_OP_BLOOM_FILTER(:BF0000,"SUD"."TRACEID"))
Note
-----
- automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation
37 rows selected.
Is it more likely that the compute statistics massively helped this query or that someone did something else that I was not aware of? Yes, the query ran much better, but I'd feel better too if I knew why.
Hy,
after reviewing your SQL I noticed your statements are full of string comparisons and searches. For example
SELECT /*+ PARALLEL (AUTO) */
UPPER(PAGETYPE)|| '-' || SCH.ID PAGETYPE_ID ,
SCH.PAGETYPE SCH_PAGETYPE
FROM TABLE1 SCH
WHERE SCH.PAGETYPE IN ('Page', 'Hub')
AND SCH.CATEGORY_NAME NOT IN ('archive', 'testcategory')
This can be indexed int 2 ways.
First: Create table that has 'Page', 'Hub', and other types that you need, create for the a column Index and then "replace" basically adapt your query to resolve those indexes instead of string compare.
Tables can have multiple indices on columns those have to be treated with caution because they create problems in regards of database size.
Also I would check if what are the biggest tables and reorder their selections to the last. Meaning:
if one table has 12 rows and the other 100. First put the 12 row table then the 100 row. This will multiply in your case since the tables and nested and chained.
I made 1 more review and realized I made an oversight.
USAGE_DETAILS table has 3941810 rows
TABLE1 has 5908 rows
USER_JOB_FAMILY_MAPPING has 578233 rows
First filter the table 1, this is costly already, then Inner join raw USAGE_DETAILS and then select the join ID-s.
Then inner join USER_JOB_FAMILY_MAPPING, and select after that. The reason is that the joins are done on the ID which is probably int type.
Gather statistics on the relevant objects like this:
begin
dbms_stats.gather_table_stats(ownname => user, tabname => 'TABLE1');
dbms_stats.gather_table_stats(ownname => 'SOMESCHEMA', tabname => 'USAGE_DETAILS');
end;
/
This line in the execution plan implies that one of the tables is missing statistics:
- dynamic statistics used: dynamic sampling (level=2)
Not all uses of dynamic sampling imply missing statistics, but level 2 is highly suspicious. That sampling level is usually intended to "Apply dynamic sampling to all unanalyzed tables."
Optimizer statistics are necessary for Oracle to make good execution plans. The algorithms and access paths for joining small amounts of data are different than the algorithms and access paths for joining large amounts of data. The optimizer statistics help Oracle estimate the size of the results and build good plans.
If this solves your problem, you should also investigate the root cause. Optimizer statistics should always be gathered manually after a large change, and automatically by the system every night. If you have a large ETL process that significantly changes a table, it should include a call to DBMS_STATS at the end. The database by default gathers stats at 10PM every night, unless a DBA foolishly disabled the autotask.
If that doesn't solve the problem, then regenerate the execution plan with actual numbers using DBMS_SQLTUNE or the GATHER_PLAN_STATISTICS_HINT. SQL tuning is about optimizing the operations. Your SQL statement has 14 operations, each of which is like a miniature program. We need to know which one of the operations is causing the problem. Finding actual cardinalities and actual run times, and comparing them to estimates, helps tremendously with diagnosing SQL problems.
How do we know that gathering stats was what fixed the performance?
We can't be 100% sure. But it's a safe bet that gathering statistics was responsible for the improvement, for several reasons.
Bad or missing statistics are responsible for a large percentage of all Oracle performance problems. Ask any DBA and they'll have plenty of stories about missing statistics.
The Note section changes strongly imply there are no other weird things happening behind the scenes. There are lots of tricks to silently fix queries, like SQL profiles, baselines, adaptive reoptimization, dynamic sampling (shows up in the first plan, but not the second one, because stats are better), etc. But if those tricks were used they would show up in the Note section.

Distinct clause on large join

Following this question, suppose now I've set up the indexes, and now I want only to return certain field, without duplicates:
Select distinct A.cod
from A join B
on A.id1=B.id1 and
A.id2=B.id2
where A.year=2016
and B.year=2016
the problem now is I'm getting something like 150k cod, with only 1000 distinct values, so my query is very inefficient.
Question: how can I improve that? i.e, how can I tell the DB, for every row on A, to stop joining that row as soon as a match is found?
Thank you in advance!
I'm basing my answer on your question:
how can I tell the DB, for every row on A, to stop joining that row as soon as a match is found?
with the EXISTS clause, once it sees a match it will stop and check for the next record to be checked.
adding the DISTINCT will filter out any duplicate CODs (in case there is one).
select DISTINCT cod
from A ax
where year = 2016
and exists ( select 1
from B bx
WHERE Ax.ID1 = Bx.ID1
AND Ax.ID2 = Bx.ID2
AND Ax.YEAR = Bx.YEAR);
EDIT: Was curious which solution (IN or EXISTS) will give me a better Explain plan
Create the 1st Table Definition
Create table A
(
ID1 number,
ID2 number,
cod varchar2(100),
year number
);
insert 4000000 sequential numbers
BEGIN
FOR i IN 1..4000000 loop
insert into A (id1, id2, cod, year)
values (i, i , i, i);
end loop;
END;
commit;
Create Table B and insert the same data into to it
Create table B
as
select *
from A;
Reinsert Data from Table A to make duplicates
insert into B
select *
from A
Build the Indexes mentioned in the Previous Post Index on join and where
CREATE INDEX A_IDX ON A(year, id1, id2);
CREATE INDEX B_IDX ON B(year, id1, id2);
Update a bunch of rows to make it fetch multiple rows with the year 2016:
update B
set year = 2016
where rownum < 20000;
update A
set year = 2016
where rownum < 20000;
commit;
Check Explain plan using EXISTS
Plan hash value: 1052726981
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 44 | 7 (15)| 00:00:01 |
| 1 | HASH UNIQUE | | 1 | 44 | 7 (15)| 00:00:01 |
| 2 | NESTED LOOPS SEMI | | 1 | 44 | 6 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| A | 1 | 26 | 4 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | A_IDX | 1 | | 3 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | B_IDX | 2 | 36 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("YEAR"=2016)
5 - access("BX"."YEAR"=2016 AND "AX"."ID1"="BX"."ID1" AND "AX"."ID2"="BX"."ID2")
filter("AX"."YEAR"="BX"."YEAR")
Check Explain plan using IN
Plan hash value: 3002464630
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 44 | 7 (15)| 00:00:01 |
| 1 | HASH UNIQUE | | 1 | 44 | 7 (15)| 00:00:01 |
| 2 | NESTED LOOPS | | 1 | 44 | 6 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| A | 1 | 26 | 4 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | A_IDX | 1 | | 3 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | B_IDX | 1 | 18 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("YEAR"=2016)
5 - access("YEAR"=2016 AND "ID1"="ID1" AND "ID2"="ID2")
Although my test case is limited, i'm guessing that both the IN and EXISTS clause have nearly the same execution.
On the face of it, what you are actually trying to do should be done like this:
select distinct cod
from A
where year = 2016
and (id1, id2) in (select id1, id2 from B where year = 2016)
The subquery in the WHERE condition is a non-correlated query, so it will be evaluated only once. And the IN condition is evaluated using short-circuiting; instead of a complete join, it will search through the results of the subquery only until a match is found.
EDIT: As Migs Isip points out, there may be duplicate codes in the original table, so a "distinct" may still be needed. I edited my code to add it back after Migs posted his answer.
Not sure about your existing indexes but you can improve your query a bit by adding another JOIN condition like
Select distinct A.cod
from A join B
on A.id1=B.id1 and
A.id2=B.id2 and
A.year = B.year // this one
where A.year=2016;

Using partition key in function Oracle

We have a partioned table in our Oracle database using this syntaxe:
...
PARTITION BY RANGE(saledate)
(PARTITION sal99q1 VALUES LESS THAN (TO_DATE('01-APR-1999', 'DD-MON-YYYY')),
PARTITION sal99q2 VALUES LESS THAN (TO_DATE('01-JUL-1999', 'DD-MON-YYYY')),
...
We usually use partition key in select statement like this:
Select * from table where saledate >= trunc(sysdate-3) and saledate < trunc(sysdate-2)
To have same result using less code, I usually use this query instead :
Select * from table where trunc(saledate) = trunc(sysdate-3)
My question is, by using partition key in a function, in this case trunc(), do we loose partioning performance ?
You are misinterpreting the plan shown by Toad (in your answer). For the two queries that shows, respectively:
Partition #: 2 Partitions determined by key values
and
Partition #: 1 Partitions accessed #1 - #17
The first query is accessing only the partitions it needs, based on the key value, which is the date; so it only has to do a full scan of the partition(s) that could contain your date.
The second query has to access all partitions because you are manipulating the key value with a function, meaning you are no longer really using the partition key. The key is saledate, not trunc(saledate). This is similar to what happens when you use a function on an indexed column; the index is no longer used in that case, and the partition key is no longer used here. And as you seem to suspect from your question, yes, you do lose efficiency.
You can also see that the cardinality has been guessed as 50 because of the function call, instead of the stats-provided value of 4966.
You can see the same thing from a dummy table using dbms_xplan; from your first query:
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4996 | 39968 | 14 (0)| 00:00:01 | | |
|* 1 | FILTER | | | | | | | |
| 2 | PARTITION RANGE ITERATOR| | 4996 | 39968 | 14 (0)| 00:00:01 | KEY | KEY |
|* 3 | TABLE ACCESS FULL | T42 | 4996 | 39968 | 14 (0)| 00:00:01 | KEY | KEY |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(TRUNC(SYSDATE#!-3)<TRUNC(SYSDATE#!-2))
3 - filter("SALEDATE">=TRUNC(SYSDATE#!-3) AND "SALEDATE"<TRUNC(SYSDATE#!-2))
And from your second query:
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 50 | 400 | 14 (0)| 00:00:01 | | |
| 1 | PARTITION RANGE ALL| | 50 | 400 | 14 (0)| 00:00:01 | 1 | 17 |
|* 2 | TABLE ACCESS FULL | T42 | 50 | 400 | 14 (0)| 00:00:01 | 1 | 17 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(TRUNC(INTERNAL_FUNCTION("SALEDATE"))=TRUNC(SYSDATE#!-3))
Notice the pstart/pstop values and the cardinality in each query.
Your first query is going to be more efficient because it can use the partition key to be selective about which partitions it does a full scan of, while the second cannot and has to scan them all.

Query Tuning - Advice

I need advice on the attached Query. The query executes for over an hour and has full table scan as per the Explain Plan. I am fairly new to query tuning and would appriciate some advice.
Firstly why would I get a full table scan even though all the columns I use have index created on them.
Secondly, is there any possibility where in I can reduce the execution time, all tables accessed are huge and contain millions of records, even then I would like to scope out some options. Appriciate your help.
Query:
select
distinct rtrim(a.cod_acct_no)||'|'||
a.cod_prod||'|'||
to_char(a.dat_acct_open,'Mon DD YYYY HH:MMAM')||'|'||
a.cod_acct_title||'|'||
a.cod_acct_stat||'|'||
ltrim(to_char(a.amt_od_limit,'99999999999999999990.999999'))||'|'||
ltrim(to_char(a.bal_book,'99999999999999999990.999999'))||'|'||
a.flg_idd_auth||'|'||
a.flg_mnt_status||'|'||
rtrim(c.cod_acct_no)||'|'||
c.cod_10||'|'||
d.nam_branch||'|'||
d.nam_cc_city||'|'||
d.nam_cc_state||'|'||
c.cod_1||'|'||
c.cod_14||'|'||
num_14||'|'||
a.cod_cust||'|'||
c.cod_last_mnt_chkrid||'|'||
c.dat_last_mnt||'|'||
c.ctr_updat_srlno||'|'||
c.cod_20||'|'||
c.num_16||'|'||
c.cod_14||'|'||
c.num_10 ||'|'||
a.flg_classif_reqd||'|'||
(select g.cod_classif_plan_id||'|'||
g.cod_classif_plan_id
from
ac_acct_preferences g
where
a.cod_acct_no=g.cod_acct_no AND g.FLG_MNT_STATUS = 'A' )||'|'||
(select e.dat_cam_expiry from flexprod_host.AC_ACCT_PLAN_CRITERIA e where a.cod_acct_no=e.cod_acct_no and e.FLG_MNT_STATUS ='A')||'|'||
c.cod_23||'|'||
lpad(trim(a.cod_cc_brn),4,0)||'|'||
(select min( o.dat_eff) from ch_acct_od_hist o where a.cod_acct_no=o.cod_acct_no )
from
ch_acct_mast a,
ch_acct_cbr_codes c,
ba_cc_brn_mast d
where
a.flg_mnt_status ='A'
and c.flg_mnt_status ='A'
and a.cod_acct_no= c.cod_acct_no(+)
and a.cod_cc_brn=d.cod_cc_brn
and a.cod_prod in (
299,200,804,863,202,256,814,232,182,844,279,830,802,833,864,
813,862,178,205,801,235,897,231,187,229,847,164,868,805,207,
250,837,274,253,831,893,201,809,846,819,820,845,811,843,285,
894,284,817,832,278,818,810,181,826,867,825,848,871,866,895,
770,806,827,835,838,881,853,188,816,293,298)
Query Plan:
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4253465430
------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 733K| 125M| | 468K (1)|999:59:59 | | |
| 1 | TABLE ACCESS BY INDEX ROWID | AC_ACCT_PREFERENCES | 1 | 26 | | 3 (0)| 00:01:05 | | |
|* 2 | INDEX UNIQUE SCAN | IN_AC_ACCT_PREFERENCES_1 | 1 | | | 2 (0)| 00:00:43 | | |
| 3 | PARTITION HASH SINGLE | | 1 | 31 | | 3 (0)| 00:01:05 | KEY | KEY |
| 4 | TABLE ACCESS BY LOCAL INDEX ROWID| AC_ACCT_PLAN_CRITERIA | 1 | 31 | | 3 (0)| 00:01:05 | KEY | KEY |
|* 5 | INDEX UNIQUE SCAN | IN_AC_ACCT_PLAN_CRITERIA_1 | 1 | | | 2 (0)| 00:00:43 | KEY | KEY |
| 6 | SORT AGGREGATE | | 1 | 29 | | | | | |
| 7 | FIRST ROW | | 1 | 29 | | 3 (0)| 00:01:05 | | |
|* 8 | INDEX RANGE SCAN (MIN/MAX) | IN_CH_ACCT_OD_HIST_1 | 1 | 29 | | 3 (0)| 00:01:05 | | |
| 9 | HASH UNIQUE | | 733K| 125M| 139M| 468K (1)|999:59:59 | | |
|* 10 | HASH JOIN | | 733K| 125M| | 439K (1)|999:59:59 | | |
|* 11 | TABLE ACCESS FULL | BA_CC_BRN_MAST | 3259 | 136K| | 31 (0)| 00:11:04 | | |
|* 12 | HASH JOIN | | 747K| 97M| 61M| 439K (1)|999:59:59 | | |
| 13 | PARTITION HASH ALL | | 740K| 52M| | 286K (1)|999:59:59 | 1 | 64 |
|* 14 | TABLE ACCESS FULL | CH_ACCT_MAST | 740K| 52M| | 286K (1)|999:59:59 | 1 | 64 |
|* 15 | TABLE ACCESS FULL | CH_ACCT_CBR_CODES | 9154K| 541M| | 117K (1)|699:41:01 | | |
------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("COD_ACCT_NO"=:B1 AND "FLG_MNT_STATUS"='A' AND "COD_ENTITY_VPD"=TO_NUMBER(NVL(SYS_CONTEXT('CLIENTCONTEXT','entity_co
de'),'0')))
5 - access("COD_ACCT_NO"=:B1 AND "FLG_MNT_STATUS"='A' AND "COD_ENTITY_VPD"=TO_NUMBER(NVL(SYS_CONTEXT('CLIENTCONTEXT','entity_co
de'),'0')))
8 - access("COD_ACCT_NO"=:B1)
filter("COD_ENTITY_VPD"=TO_NUMBER(NVL(SYS_CONTEXT('CLIENTCONTEXT','entity_code'),'0')))
10 - access("COD_CC_BRN"="COD_CC_BRN")
11 - filter("COD_ENTITY_VPD"=TO_NUMBER(NVL(SYS_CONTEXT('CLIENTCONTEXT','entity_code'),'0')))
12 - access("COD_ACCT_NO"="COD_ACCT_NO")
14 - filter(("COD_PROD"=164 OR "COD_PROD"=178 OR "COD_PROD"=181 OR "COD_PROD"=182 OR "COD_PROD"=187 OR "COD_PROD"=188 OR
"COD_PROD"=200 OR "COD_PROD"=201 OR "COD_PROD"=202 OR "COD_PROD"=205 OR "COD_PROD"=207 OR "COD_PROD"=229 OR "COD_PROD"=231 OR
"COD_PROD"=232 OR "COD_PROD"=235 OR "COD_PROD"=250 OR "COD_PROD"=253 OR "COD_PROD"=256 OR "COD_PROD"=274 OR "COD_PROD"=278 OR
"COD_PROD"=279 OR "COD_PROD"=284 OR "COD_PROD"=285 OR "COD_PROD"=293 OR "COD_PROD"=298 OR "COD_PROD"=299 OR "COD_PROD"=770 OR
"COD_PROD"=801 OR "COD_PROD"=802 OR "COD_PROD"=804 OR "COD_PROD"=805 OR "COD_PROD"=806 OR "COD_PROD"=809 OR "COD_PROD"=810 OR
"COD_PROD"=811 OR "COD_PROD"=813 OR "COD_PROD"=814 OR "COD_PROD"=816 OR "COD_PROD"=817 OR "COD_PROD"=818 OR "COD_PROD"=819 OR
"COD_PROD"=820 OR "COD_PROD"=825 OR "COD_PROD"=826 OR "COD_PROD"=827 OR "COD_PROD"=830 OR "COD_PROD"=831 OR "COD_PROD"=832 OR
"COD_PROD"=833 OR "COD_PROD"=835 OR "COD_PROD"=837 OR "COD_PROD"=838 OR "COD_PROD"=843 OR "COD_PROD"=844 OR "COD_PROD"=845 OR
"COD_PROD"=846 OR "COD_PROD"=847 OR "COD_PROD"=848 OR "COD_PROD"=853 OR "COD_PROD"=862 OR "COD_PROD"=863 OR "COD_PROD"=864 OR
"COD_PROD"=866 OR "COD_PROD"=867 OR "COD_PROD"=868 OR "COD_PROD"=871 OR "COD_PROD"=881 OR "COD_PROD"=893 OR "COD_PROD"=894 OR
"COD_PROD"=895 OR "COD_PROD"=897) AND "FLG_MNT_STATUS"='A' AND "COD_ENTITY_VPD"=TO_NUMBER(NVL(SYS_CONTEXT('CLIENTCONTEXT','entity_
code'),'0')))
15 - filter("FLG_MNT_STATUS"='A' AND "COD_ENTITY_VPD"=TO_NUMBER(NVL(SYS_CONTEXT('CLIENTCONTEXT','entity_code'),'0')))
Considering each table contains over 100 columns I am limited while uploading the entire table definition. however please find the below details for the columns accessed in the where clause. Hope this helps.
Columns Type Nullable
cod_acct_no CHAR(16) N
FLG_MNT_STATUS CHAR(1) N
cod_23 VARCHAR2(360) Y
cod_cc_brn NUMBER(5) N
cod_prod NUMBER N
I Hope this can bring the cost down.
select
distinct rtrim(a.cod_acct_no)||'|'||
a.cod_prod||'|'||
to_char(a.dat_acct_open,'Mon DD YYYY HH:MMAM')||'|'||
a.cod_acct_title||'|'||
a.cod_acct_stat||'|'||
ltrim(to_char(a.amt_od_limit,'99999999999999999990.999999'))||'|'||
ltrim(to_char(a.bal_book,'99999999999999999990.999999'))||'|'||
a.flg_idd_auth||'|'||
a.flg_mnt_status||'|'||
rtrim(c.cod_acct_no)||'|'||
c.cod_10||'|'||
d.nam_branch||'|'||
d.nam_cc_city||'|'||
d.nam_cc_state||'|'||
c.cod_1||'|'||
c.cod_14||'|'||
num_14||'|'||
a.cod_cust||'|'||
c.cod_last_mnt_chkrid||'|'||
c.dat_last_mnt||'|'||
c.ctr_updat_srlno||'|'||
c.cod_20||'|'||
c.num_16||'|'||
c.cod_14||'|'||
c.num_10 ||'|'||
a.flg_classif_reqd||'|'||
g.cod_classif_plan_id||'|'||g.cod_classif_plan_id
||'|'||
e.dat_cam_expiry ||'|'||
c.cod_23||'|'||
lpad(trim(a.cod_cc_brn),4,0)||'|'||
(select min( o.dat_eff) from ch_acct_od_hist o where a.cod_acct_no=o.cod_acct_no )
from
ch_acct_mast a
JOIN ch_acct_cbr_codes c
ON a.flg_mnt_status ='A'
and c.flg_mnt_status ='A'
and a.cod_acct_no= c.cod_acct_no(+)
JOIN ba_cc_brn_mast d
a.cod_cc_brn=d.cod_cc_brn
JOIN ac_acct_preferences g
ON a.cod_acct_no=g.cod_acct_no AND g.FLG_MNT_STATUS = 'A'
INNER JOIN flexprod_host.AC_ACCT_PLAN_CRITERIA e
ON a.cod_acct_no=e.cod_acct_no and e.FLG_MNT_STATUS ='A'
WHERE a.cod_prod in (
299,200,804,863,202,256,814,232,182,844,279,830,802,833,864,
813,862,178,205,801,235,897,231,187,229,847,164,868,805,207,
250,837,274,253,831,893,201,809,846,819,820,845,811,843,285,
894,284,817,832,278,818,810,181,826,867,825,848,871,866,895,
770,806,827,835,838,881,853,188,816,293,298)
1. Don't fear full table scans. If a large percent of the rows in a table are being accessed it is more efficient to use a hash join/full table scan than a nested loop/index scan.
2. Fix statistics and re-analyze objects. 999 hours to read a table? That's probably an optimizer bug, have a dba look at select * from sys.aux_stats$; for some ridiculous values.
The time isn't very useful, but if one of your forecasted values is so significantly off then you need to check all of them. You should probably re-gather stats on all the relevant tables. Use default settings unless there is a good reason. For example, exec dbms_stats.gather_table_stats('your_schema_name','CH_ACCT_MAST');.
3. Look at cardinalities. Are the Rows estimates in the ballpark? They'll almost never be perfect, but if they are off by more than
an order of magnitude or two it can cause problems. Look for the first significant difference and try to correct it.
4. Code change. #Santhosh had a good idea to re-write using ANSI joins and manually unnest a subquery. Although I think you should
try to unnest the other subquery instead. Oracle can automatically unnest subqueries, but not if subqueries "contain aggregate functions".
5. Disable VPD Looks like this query is being transformed. Make sure you understand exactly what it's doing and why. You may want to disable VPD temporarily, for yourself, while you debug this problem.
6. Parallelism. Since some of these tables are large, you may want to add a parallel hint. But be careful, it is easy to use up a lot
of resources. Try to get the plan right before you do this.

How does IN clause affect performance in oracle?

UPDATE table1
SET col1 = 'Y'
WHERE col2 in (select col2 from table2)
In the above query, imagine the inner query returns 10000 rows. Does this query with IN clause affect performance?
If so, what can be done for faster execution?
if the subquery returns a large number of rows compared to the number of rows in TABLE1, the optimizer will likely produce a plan like this:
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 300K| 24M| | 1581 (1)| 00:0
| 1 | UPDATE | TABLE1 | | | | |
|* 2 | HASH JOIN SEMI | | 300K| 24M| 9384K| 1581 (1)| 00:0
| 3 | TABLE ACCESS FULL| TABLE1 | 300K| 5860K| | 355 (2)| 00:0
| 4 | TABLE ACCESS FULL| TABLE2 | 168K| 10M| | 144 (2)| 00:0
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("COL2"="COL2")
It will scan both tables once and update only the rows in TABLE1 common to both tables. This is a highly efficient plan if you need to update lots of rows.
Sometimes the inner query will have few rows compared to the number of rows in TABLE1. If you have an index on TABLE1(col2), you could then obtain a plan similar to this one:
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 93 | 4557 | 247 (1)| 00:00:03 |
| 1 | UPDATE | TABLE1 | | | | |
| 2 | NESTED LOOPS | | 93 | 4557 | 247 (1)| 00:00:03 |
| 3 | SORT UNIQUE | | 51 | 1326 | 142 (0)| 00:00:02 |
| 4 | TABLE ACCESS FULL| TABLE2 | 51 | 1326 | 142 (0)| 00:00:02 |
|* 5 | INDEX RANGE SCAN | IDX1 | 2 | 46 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("T1"."COL2"="T2"."COL2")
In that case Oracle will read the rows from TABLE2 and for each (unique) row, perform an index access on TABLE1.
Which access is faster depend upon the selectivity of the inner query and the clustering of the index on TABLE1 (are the rows with similar value of col2 in TABLE1 next to each other or randomly spread?). In any case, performance wise, if you need to perform this update this query is one of the fastest way to do it.
UPDATE table1 outer
SET col1 = 'Y'
WHERE EXISTS (select null
from table2
WHERE col2 = outer.col2)
This could be better
To get the idea which is better - look at the execution plan.
From Oracle:
11.5.3.4 Use of EXISTS versus IN for Subqueries
In certain circumstances, it is better
to use IN rather than EXISTS. In
general, if the selective predicate is
in the subquery, then use IN. If the
selective predicate is in the parent
query, then use EXISTS.
From my experience, I have seen better plans using EXISTS where subquery returns large amount of rows.
See here for more discussion from Oracle