Distinct clause on large join - sql

Following this question, suppose now I've set up the indexes, and now I want only to return certain field, without duplicates:
Select distinct A.cod
from A join B
on A.id1=B.id1 and
A.id2=B.id2
where A.year=2016
and B.year=2016
the problem now is I'm getting something like 150k cod, with only 1000 distinct values, so my query is very inefficient.
Question: how can I improve that? i.e, how can I tell the DB, for every row on A, to stop joining that row as soon as a match is found?
Thank you in advance!

I'm basing my answer on your question:
how can I tell the DB, for every row on A, to stop joining that row as soon as a match is found?
with the EXISTS clause, once it sees a match it will stop and check for the next record to be checked.
adding the DISTINCT will filter out any duplicate CODs (in case there is one).
select DISTINCT cod
from A ax
where year = 2016
and exists ( select 1
from B bx
WHERE Ax.ID1 = Bx.ID1
AND Ax.ID2 = Bx.ID2
AND Ax.YEAR = Bx.YEAR);
EDIT: Was curious which solution (IN or EXISTS) will give me a better Explain plan
Create the 1st Table Definition
Create table A
(
ID1 number,
ID2 number,
cod varchar2(100),
year number
);
insert 4000000 sequential numbers
BEGIN
FOR i IN 1..4000000 loop
insert into A (id1, id2, cod, year)
values (i, i , i, i);
end loop;
END;
commit;
Create Table B and insert the same data into to it
Create table B
as
select *
from A;
Reinsert Data from Table A to make duplicates
insert into B
select *
from A
Build the Indexes mentioned in the Previous Post Index on join and where
CREATE INDEX A_IDX ON A(year, id1, id2);
CREATE INDEX B_IDX ON B(year, id1, id2);
Update a bunch of rows to make it fetch multiple rows with the year 2016:
update B
set year = 2016
where rownum < 20000;
update A
set year = 2016
where rownum < 20000;
commit;
Check Explain plan using EXISTS
Plan hash value: 1052726981
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 44 | 7 (15)| 00:00:01 |
| 1 | HASH UNIQUE | | 1 | 44 | 7 (15)| 00:00:01 |
| 2 | NESTED LOOPS SEMI | | 1 | 44 | 6 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| A | 1 | 26 | 4 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | A_IDX | 1 | | 3 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | B_IDX | 2 | 36 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("YEAR"=2016)
5 - access("BX"."YEAR"=2016 AND "AX"."ID1"="BX"."ID1" AND "AX"."ID2"="BX"."ID2")
filter("AX"."YEAR"="BX"."YEAR")
Check Explain plan using IN
Plan hash value: 3002464630
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 44 | 7 (15)| 00:00:01 |
| 1 | HASH UNIQUE | | 1 | 44 | 7 (15)| 00:00:01 |
| 2 | NESTED LOOPS | | 1 | 44 | 6 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| A | 1 | 26 | 4 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | A_IDX | 1 | | 3 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | B_IDX | 1 | 18 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("YEAR"=2016)
5 - access("YEAR"=2016 AND "ID1"="ID1" AND "ID2"="ID2")
Although my test case is limited, i'm guessing that both the IN and EXISTS clause have nearly the same execution.

On the face of it, what you are actually trying to do should be done like this:
select distinct cod
from A
where year = 2016
and (id1, id2) in (select id1, id2 from B where year = 2016)
The subquery in the WHERE condition is a non-correlated query, so it will be evaluated only once. And the IN condition is evaluated using short-circuiting; instead of a complete join, it will search through the results of the subquery only until a match is found.
EDIT: As Migs Isip points out, there may be duplicate codes in the original table, so a "distinct" may still be needed. I edited my code to add it back after Migs posted his answer.

Not sure about your existing indexes but you can improve your query a bit by adding another JOIN condition like
Select distinct A.cod
from A join B
on A.id1=B.id1 and
A.id2=B.id2 and
A.year = B.year // this one
where A.year=2016;

Related

Trying to Remove table access full in oracle

i have a simple query
select round(sum(a.wt)) as a_wt
from db.abc a
where a.date is null
and a.col_no is not null
and a.pod_cd = '367'
and a.fant != a.rce
and I want to remove table access full.there are 3 index which are like these on following combination of column
col_no
col_no,date,fant,pyc
wagno,batno
what can be can be done to remove table access full.
One option would be creating a Function Based Index :
create index idx_date_col_pod on ABC (nvl("date",date'1900-01-01'), nvl(col_no,0), pod_cd);
and convert the query to :
select round(sum(wt)) as a_wt
from abc
where nvl("date",date'1900-01-01') = date'1900-01-01' -- matching means "date" column is null assuming there exists no records with this ancient date.
and nvl(col_no,0) != 0 -- non-matching means "col_no" column is not null
and pod_cd = 367
and fant != rce
Usually indexes are not indexing the null values, so the conditions like
where a.date is null
and a.col_no is not null
meaning just "don't use an index in order to get lines for these conditions"
However, there is an option in the create index statement allowing it to index null columns (starting from version 11 as far as I know)
create index abc_date_nulls on abc(date, 1); -- (xxx,1) is doing the trick
Thus you'll create an index that considers null values. This might be useful depending on selectivity of "date is null" condition.
Otherwise or in addition, I'd suggest you to check the selectivity for the condition "pod_cd = 367" and build an index on pod_cd column.
If you are sure the index will help and the database doesn't use it, you can force oracle to use an index using a hint
select /*+ index(index name) */ ... from ...
It is good for tests or for checking the impact indexes can provide but pleasepleaseplease be careful using them in production. Google the documentation and all the things about disadvantages for that approach. Don't tell anyone I told you to use hints on production
Indexes can be used to check for nulls, and to compare two columns against each other.
Setup:
create table abc
( dt date
, col_no number
, pod_cd varchar2(5)
, fant number
, rce number
, wt number )
nologging;
insert /*+ append */ into abc (dt, col_no, pod_cd, fant, rce, wt)
select case mod(rownum,3) when 0 then date '2018-12-31' + mod(rownum,1000) end
, case mod(rownum,7) when 0 then rownum end
, case mod(rownum,2) when 1 then mod(rownum,1000) end
, round(dbms_random.value) + 10
, round(dbms_random.value) + 10
, 1
from xmltable('1 to 10000000');
create index x1 on abc (pod_cd, dt);
create index x2 on abc (fant, rce);
Test for nulls:
select count(*) from abc a
where a.pod_cd = '367'
and a.dt is null;
COUNT(*)
----------
6667
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2253536563
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 | 20 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 7 | | |
|* 2 | INDEX RANGE SCAN| X1 | 6667 | 46669 | 20 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("A"."POD_CD"='367' AND "A"."DT" IS NULL)
The query was executed using index X1, without touching the table.
Test for fant != rce:
select count(*)
from abc a
where a.fant != a.rce;
COUNT(*)
----------
5000666
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 29151601
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 6468 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
|* 2 | INDEX FAST FULL SCAN| X2 | 5000K| 28M| 6468 (1)| 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("A"."FANT"<>"A"."RCE")
The query was executed using index X2, also without touching the table.
Test for the full query:
create index x3 on abc(pod_cd, dt, fant, rce, col_no, wt);
select round(sum(a.wt)) as a_wt
from abc a
where a.dt is null
and a.col_no is not null
and a.pod_cd = '367'
and a.fant != a.rce;
A_WT
----------
481
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3828004431
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 16 | 28 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 16 | | |
|* 2 | INDEX RANGE SCAN| X3 | 476 | 7616 | 28 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("A"."POD_CD"='367' AND "A"."DT" IS NULL)
filter("A"."COL_NO" IS NOT NULL AND "A"."FANT"<>"A"."RCE")
Full table scans aren't always terrible, though.
SQL> drop index x1;
Index dropped.
SQL> drop index x2;
Index dropped.
SQL> drop index x3;
Index dropped.
select round(sum(a.wt)) as a_wt
from abc a
where a.dt is null
and a.col_no is not null
and a.pod_cd = '367'
and a.fant != a.rce;
A_WT
----------
481
1 row selected.
Elapsed: 00:00:00.18
Execution Plan
----------------------------------------------------------
Plan hash value: 1045519631
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 16 | 8188 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 16 | | |
|* 2 | TABLE ACCESS FULL| ABC | 476 | 7616 | 8188 (1)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("A"."COL_NO" IS NOT NULL AND "A"."POD_CD"='367' AND
"A"."DT" IS NULL AND "A"."FANT"<>"A"."RCE")
Table ABC has 10 million rows, and the full scan took 0.18 seconds. This is in a VM on a 4 year old laptop.

Oracle Optimizer Awkwardly Doesn't Prefer to Use Index

I'm joining a table with itself but although I expect this operation to use index, it seems it doesn't. There are 1 million records on the table(MY_TABLE) and the query I run is executing on about 10 thousand records.(So it is lower than %1 of whole table.)
Test Case:
explain plan for
SELECT *
FROM SCHM.MY_TABLE A1, SCHM.MY_TABLE A2
WHERE (A1.K_ID = '123abc')
AND A1.HDT = A2.HDT
AND A2.C_DATE BETWEEN A1.SYSDATE - 0.0004
AND A1.SYSDATE + 0.0004
AND A1.GKID = A2.GKID;
Plan hash value: 1210306805
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 81 | 28 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
|* 2 | HASH JOIN | | 3 | 81 | 28 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| MY_TABLE | 3 | 45 | 7 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | IX_MY_TABLE_C_DATE | 3 | | 4 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| MY_TABLE | 17 | 204 | 21 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | IX_MY_TABLE_K_ID | 17 | | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(SYSDATE#!+0.00004>=SYSDATE#!-0.00004)
2 - access("A1"."HDT"="A2"."HDT" AND
"A1"."GKID"="A2"."GKID")
4 - access("A2"."C_DATE">=SYSDATE#!-0.00004 AND
"A2"."C_DATE"<=SYSDATE#!+0.00004)
6 - access("A1"."K_ID"=U'123abc')
In the above statement, it can be seen that the index on C_DATE is used.
However, in the below statement, the index on C_DATE is not used. So, the query runs really slow.
Real Case:
explain plan for
SELECT *
FROM SCHM.MY_TABLE A1, SCHM.MY_TABLE A2
WHERE (A1.K_ID = '123abc')
AND A1.HDT = A2.HDT
AND A2.C_DATE BETWEEN A1.C_DATE - 0.0004
AND A1.C_DATE + 0.0004
AND A1.GKID = A2.GKID;
Plan hash value: 1063167343
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4187K| 998M| 6549K (1)| 00:04:16 |
|* 1 | HASH JOIN | | 4187K| 998M| 6549K (1)| 00:04:16 |
| 2 | JOIN FILTER CREATE | :BF0000 | 17 | 2125 | 21 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| MY_TABLE | 17 | 2125 | 21 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | IX_MY_TABLE_K_ID | 17 | | 4 (0)| 00:00:01 |
| 5 | JOIN FILTER USE | :BF0000 | 1429M| 166G| 6546K (1)| 00:04:16 |
|* 6 | TABLE ACCESS STORAGE FULL | MY_TABLE | 1429M| 166G| 6546K (1)| 00:04:16 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access(A1.HDT=A2.HDT AND
A1.GKID=A2.GKID)
filter(A2.C_DATE>=INTERNAL_FUNCTION(A1.C_DATE)-0.00004 AND
A2.C_DATE<=INTERNAL_FUNCTION(A1.C_DATE)+0.00004)
4 - access(A1.K_ID=U'123abc')
6 - storage(SYS_OP_BLOOM_FILTER(:BF0000,A2.HDT,A2.HDT))
filter(SYS_OP_BLOOM_FILTER(:BF0000,A2.HDT,A2.HDT))
If I use the hint /*+index(A2,IX_MY_TABLE_C_DATE )*/, everthing is fine, the index is used and the query runs fast as I want.
The query in the real case cannot be changed because it is created by application.
Index Information:
K_ID, not unique, position 1
HDT, not unique, position 1
C_DATE, not unique, position 1
ID Unique and Primary Key, position 1
What do I have to change in order the query in the real case to use index?
Well, the second query is slower since it's quite different from the first one. It has an extra join between tables:
AND A2.C_DATE BETWEEN A1.C_DATE - 0.0004
AND A1.C_DATE + 0.0004
and on a million rows this takes a toll.
The first query doesn't have this join condition and both tables are:
Filtered first. This is fast using indexes: only 3 rows and 17 rows.
Joining them second. Joining 3 and 17 rows doesn't take any time.
The second query needs to perform:
A huge (hash) join first, that returns probably 100K+ rows.
A filtering later.
This is way slower.
I suggest adding the following indexes, and try again:
create index ix_1 (k_id);
create index ix_2 (hdt, gkid, c_date);
You have three join criteria (HDT, GKID, C_DATE) and 1 non-join criteria (K_ID) in your self-join. So for me it would seem natural, if the DBMS started with the records matching K_ID and then looked up all matching other records.
For this case I'd suggest the following indexes:
create index idx1 on my_table(k_id, hdt, gkid, c_date);
create index idx2 on my_table(hdt, gkid, c_date);
If there are just few records per k_id, I am sure that Oracle will use the indexes. If there are many, Oracle may still use the second one for a hash join.
Just to add, whenever you have a case where you
have a slow SQL,
have identified a hint that would make it
faster
cannot change the SQL because the source is not available
then this is a perfect case for SQL Plan Baselines. These let you lock a "good" plan against an existing SQL without touching the SQL statement itself.
The entire series describing SPM is at the links below, but thelinke to "part 4" walks through an exact example of what you want to achieve.
https://blogs.oracle.com/optimizer/sql-plan-management-part-1-of-4-creating-sql-plan-baselines
https://blogs.oracle.com/optimizer/sql-plan-management-part-2-of-4-spm-aware-optimizer
https://blogs.oracle.com/optimizer/sql-plan-management-part-3-of-4:-evolving-sql-plan-baselines
https://blogs.oracle.com/optimizer/sql-plan-management-part-4-of-4:-user-interfaces-and-other-features

Query Tuning - Advice

I need advice on the attached Query. The query executes for over an hour and has full table scan as per the Explain Plan. I am fairly new to query tuning and would appriciate some advice.
Firstly why would I get a full table scan even though all the columns I use have index created on them.
Secondly, is there any possibility where in I can reduce the execution time, all tables accessed are huge and contain millions of records, even then I would like to scope out some options. Appriciate your help.
Query:
select
distinct rtrim(a.cod_acct_no)||'|'||
a.cod_prod||'|'||
to_char(a.dat_acct_open,'Mon DD YYYY HH:MMAM')||'|'||
a.cod_acct_title||'|'||
a.cod_acct_stat||'|'||
ltrim(to_char(a.amt_od_limit,'99999999999999999990.999999'))||'|'||
ltrim(to_char(a.bal_book,'99999999999999999990.999999'))||'|'||
a.flg_idd_auth||'|'||
a.flg_mnt_status||'|'||
rtrim(c.cod_acct_no)||'|'||
c.cod_10||'|'||
d.nam_branch||'|'||
d.nam_cc_city||'|'||
d.nam_cc_state||'|'||
c.cod_1||'|'||
c.cod_14||'|'||
num_14||'|'||
a.cod_cust||'|'||
c.cod_last_mnt_chkrid||'|'||
c.dat_last_mnt||'|'||
c.ctr_updat_srlno||'|'||
c.cod_20||'|'||
c.num_16||'|'||
c.cod_14||'|'||
c.num_10 ||'|'||
a.flg_classif_reqd||'|'||
(select g.cod_classif_plan_id||'|'||
g.cod_classif_plan_id
from
ac_acct_preferences g
where
a.cod_acct_no=g.cod_acct_no AND g.FLG_MNT_STATUS = 'A' )||'|'||
(select e.dat_cam_expiry from flexprod_host.AC_ACCT_PLAN_CRITERIA e where a.cod_acct_no=e.cod_acct_no and e.FLG_MNT_STATUS ='A')||'|'||
c.cod_23||'|'||
lpad(trim(a.cod_cc_brn),4,0)||'|'||
(select min( o.dat_eff) from ch_acct_od_hist o where a.cod_acct_no=o.cod_acct_no )
from
ch_acct_mast a,
ch_acct_cbr_codes c,
ba_cc_brn_mast d
where
a.flg_mnt_status ='A'
and c.flg_mnt_status ='A'
and a.cod_acct_no= c.cod_acct_no(+)
and a.cod_cc_brn=d.cod_cc_brn
and a.cod_prod in (
299,200,804,863,202,256,814,232,182,844,279,830,802,833,864,
813,862,178,205,801,235,897,231,187,229,847,164,868,805,207,
250,837,274,253,831,893,201,809,846,819,820,845,811,843,285,
894,284,817,832,278,818,810,181,826,867,825,848,871,866,895,
770,806,827,835,838,881,853,188,816,293,298)
Query Plan:
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4253465430
------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 733K| 125M| | 468K (1)|999:59:59 | | |
| 1 | TABLE ACCESS BY INDEX ROWID | AC_ACCT_PREFERENCES | 1 | 26 | | 3 (0)| 00:01:05 | | |
|* 2 | INDEX UNIQUE SCAN | IN_AC_ACCT_PREFERENCES_1 | 1 | | | 2 (0)| 00:00:43 | | |
| 3 | PARTITION HASH SINGLE | | 1 | 31 | | 3 (0)| 00:01:05 | KEY | KEY |
| 4 | TABLE ACCESS BY LOCAL INDEX ROWID| AC_ACCT_PLAN_CRITERIA | 1 | 31 | | 3 (0)| 00:01:05 | KEY | KEY |
|* 5 | INDEX UNIQUE SCAN | IN_AC_ACCT_PLAN_CRITERIA_1 | 1 | | | 2 (0)| 00:00:43 | KEY | KEY |
| 6 | SORT AGGREGATE | | 1 | 29 | | | | | |
| 7 | FIRST ROW | | 1 | 29 | | 3 (0)| 00:01:05 | | |
|* 8 | INDEX RANGE SCAN (MIN/MAX) | IN_CH_ACCT_OD_HIST_1 | 1 | 29 | | 3 (0)| 00:01:05 | | |
| 9 | HASH UNIQUE | | 733K| 125M| 139M| 468K (1)|999:59:59 | | |
|* 10 | HASH JOIN | | 733K| 125M| | 439K (1)|999:59:59 | | |
|* 11 | TABLE ACCESS FULL | BA_CC_BRN_MAST | 3259 | 136K| | 31 (0)| 00:11:04 | | |
|* 12 | HASH JOIN | | 747K| 97M| 61M| 439K (1)|999:59:59 | | |
| 13 | PARTITION HASH ALL | | 740K| 52M| | 286K (1)|999:59:59 | 1 | 64 |
|* 14 | TABLE ACCESS FULL | CH_ACCT_MAST | 740K| 52M| | 286K (1)|999:59:59 | 1 | 64 |
|* 15 | TABLE ACCESS FULL | CH_ACCT_CBR_CODES | 9154K| 541M| | 117K (1)|699:41:01 | | |
------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("COD_ACCT_NO"=:B1 AND "FLG_MNT_STATUS"='A' AND "COD_ENTITY_VPD"=TO_NUMBER(NVL(SYS_CONTEXT('CLIENTCONTEXT','entity_co
de'),'0')))
5 - access("COD_ACCT_NO"=:B1 AND "FLG_MNT_STATUS"='A' AND "COD_ENTITY_VPD"=TO_NUMBER(NVL(SYS_CONTEXT('CLIENTCONTEXT','entity_co
de'),'0')))
8 - access("COD_ACCT_NO"=:B1)
filter("COD_ENTITY_VPD"=TO_NUMBER(NVL(SYS_CONTEXT('CLIENTCONTEXT','entity_code'),'0')))
10 - access("COD_CC_BRN"="COD_CC_BRN")
11 - filter("COD_ENTITY_VPD"=TO_NUMBER(NVL(SYS_CONTEXT('CLIENTCONTEXT','entity_code'),'0')))
12 - access("COD_ACCT_NO"="COD_ACCT_NO")
14 - filter(("COD_PROD"=164 OR "COD_PROD"=178 OR "COD_PROD"=181 OR "COD_PROD"=182 OR "COD_PROD"=187 OR "COD_PROD"=188 OR
"COD_PROD"=200 OR "COD_PROD"=201 OR "COD_PROD"=202 OR "COD_PROD"=205 OR "COD_PROD"=207 OR "COD_PROD"=229 OR "COD_PROD"=231 OR
"COD_PROD"=232 OR "COD_PROD"=235 OR "COD_PROD"=250 OR "COD_PROD"=253 OR "COD_PROD"=256 OR "COD_PROD"=274 OR "COD_PROD"=278 OR
"COD_PROD"=279 OR "COD_PROD"=284 OR "COD_PROD"=285 OR "COD_PROD"=293 OR "COD_PROD"=298 OR "COD_PROD"=299 OR "COD_PROD"=770 OR
"COD_PROD"=801 OR "COD_PROD"=802 OR "COD_PROD"=804 OR "COD_PROD"=805 OR "COD_PROD"=806 OR "COD_PROD"=809 OR "COD_PROD"=810 OR
"COD_PROD"=811 OR "COD_PROD"=813 OR "COD_PROD"=814 OR "COD_PROD"=816 OR "COD_PROD"=817 OR "COD_PROD"=818 OR "COD_PROD"=819 OR
"COD_PROD"=820 OR "COD_PROD"=825 OR "COD_PROD"=826 OR "COD_PROD"=827 OR "COD_PROD"=830 OR "COD_PROD"=831 OR "COD_PROD"=832 OR
"COD_PROD"=833 OR "COD_PROD"=835 OR "COD_PROD"=837 OR "COD_PROD"=838 OR "COD_PROD"=843 OR "COD_PROD"=844 OR "COD_PROD"=845 OR
"COD_PROD"=846 OR "COD_PROD"=847 OR "COD_PROD"=848 OR "COD_PROD"=853 OR "COD_PROD"=862 OR "COD_PROD"=863 OR "COD_PROD"=864 OR
"COD_PROD"=866 OR "COD_PROD"=867 OR "COD_PROD"=868 OR "COD_PROD"=871 OR "COD_PROD"=881 OR "COD_PROD"=893 OR "COD_PROD"=894 OR
"COD_PROD"=895 OR "COD_PROD"=897) AND "FLG_MNT_STATUS"='A' AND "COD_ENTITY_VPD"=TO_NUMBER(NVL(SYS_CONTEXT('CLIENTCONTEXT','entity_
code'),'0')))
15 - filter("FLG_MNT_STATUS"='A' AND "COD_ENTITY_VPD"=TO_NUMBER(NVL(SYS_CONTEXT('CLIENTCONTEXT','entity_code'),'0')))
Considering each table contains over 100 columns I am limited while uploading the entire table definition. however please find the below details for the columns accessed in the where clause. Hope this helps.
Columns Type Nullable
cod_acct_no CHAR(16) N
FLG_MNT_STATUS CHAR(1) N
cod_23 VARCHAR2(360) Y
cod_cc_brn NUMBER(5) N
cod_prod NUMBER N
I Hope this can bring the cost down.
select
distinct rtrim(a.cod_acct_no)||'|'||
a.cod_prod||'|'||
to_char(a.dat_acct_open,'Mon DD YYYY HH:MMAM')||'|'||
a.cod_acct_title||'|'||
a.cod_acct_stat||'|'||
ltrim(to_char(a.amt_od_limit,'99999999999999999990.999999'))||'|'||
ltrim(to_char(a.bal_book,'99999999999999999990.999999'))||'|'||
a.flg_idd_auth||'|'||
a.flg_mnt_status||'|'||
rtrim(c.cod_acct_no)||'|'||
c.cod_10||'|'||
d.nam_branch||'|'||
d.nam_cc_city||'|'||
d.nam_cc_state||'|'||
c.cod_1||'|'||
c.cod_14||'|'||
num_14||'|'||
a.cod_cust||'|'||
c.cod_last_mnt_chkrid||'|'||
c.dat_last_mnt||'|'||
c.ctr_updat_srlno||'|'||
c.cod_20||'|'||
c.num_16||'|'||
c.cod_14||'|'||
c.num_10 ||'|'||
a.flg_classif_reqd||'|'||
g.cod_classif_plan_id||'|'||g.cod_classif_plan_id
||'|'||
e.dat_cam_expiry ||'|'||
c.cod_23||'|'||
lpad(trim(a.cod_cc_brn),4,0)||'|'||
(select min( o.dat_eff) from ch_acct_od_hist o where a.cod_acct_no=o.cod_acct_no )
from
ch_acct_mast a
JOIN ch_acct_cbr_codes c
ON a.flg_mnt_status ='A'
and c.flg_mnt_status ='A'
and a.cod_acct_no= c.cod_acct_no(+)
JOIN ba_cc_brn_mast d
a.cod_cc_brn=d.cod_cc_brn
JOIN ac_acct_preferences g
ON a.cod_acct_no=g.cod_acct_no AND g.FLG_MNT_STATUS = 'A'
INNER JOIN flexprod_host.AC_ACCT_PLAN_CRITERIA e
ON a.cod_acct_no=e.cod_acct_no and e.FLG_MNT_STATUS ='A'
WHERE a.cod_prod in (
299,200,804,863,202,256,814,232,182,844,279,830,802,833,864,
813,862,178,205,801,235,897,231,187,229,847,164,868,805,207,
250,837,274,253,831,893,201,809,846,819,820,845,811,843,285,
894,284,817,832,278,818,810,181,826,867,825,848,871,866,895,
770,806,827,835,838,881,853,188,816,293,298)
1. Don't fear full table scans. If a large percent of the rows in a table are being accessed it is more efficient to use a hash join/full table scan than a nested loop/index scan.
2. Fix statistics and re-analyze objects. 999 hours to read a table? That's probably an optimizer bug, have a dba look at select * from sys.aux_stats$; for some ridiculous values.
The time isn't very useful, but if one of your forecasted values is so significantly off then you need to check all of them. You should probably re-gather stats on all the relevant tables. Use default settings unless there is a good reason. For example, exec dbms_stats.gather_table_stats('your_schema_name','CH_ACCT_MAST');.
3. Look at cardinalities. Are the Rows estimates in the ballpark? They'll almost never be perfect, but if they are off by more than
an order of magnitude or two it can cause problems. Look for the first significant difference and try to correct it.
4. Code change. #Santhosh had a good idea to re-write using ANSI joins and manually unnest a subquery. Although I think you should
try to unnest the other subquery instead. Oracle can automatically unnest subqueries, but not if subqueries "contain aggregate functions".
5. Disable VPD Looks like this query is being transformed. Make sure you understand exactly what it's doing and why. You may want to disable VPD temporarily, for yourself, while you debug this problem.
6. Parallelism. Since some of these tables are large, you may want to add a parallel hint. But be careful, it is easy to use up a lot
of resources. Try to get the plan right before you do this.

Join much slower using table() function

I am attempting to use a table() function on an object in order to do a join within a PL/SQL function. When using this function, a query may take up to 20 minutes to complete; when I enter the data directly into a table instead, it takes less than 5 seconds. I have not been able to figure out why there is such a significant difference, but my best hunch is that the index on the column from the joining table is not being used. The column definition for the tables and for the objects is the same.
Here is some example code:
create or replace type VARCHAR20_TYPE is OBJECT
(
val varchar2(20 byte);
);
create or replace type VARCHAR20_TABLE is table of VARCHAR20_TYPE;
create or replace FUNCTION test_function(
in_project_ids VARCHAR20_TABLE
) RETURN INTEGER
IS
l_result INTEGER;
BEGIN
SELECT count(*) into l_result FROM project p JOIN TABLE(in_project_ids) t ON p.project_id = t.val;
RETURN l_result;
END;
If I were to replace in_project_ids in the above example with a join to a real table with the same column definition, it significantly improves the performance of the function.
this is to be expected. when dealing with in memory arrays like this Oracle will assume 8k rows will be in that table.
try this to help it:
SELECT /*+ cardinality(t, 20) */ count(*) into l_result FROM project p JOIN TABLE(in_project_ids) t ON p.project_id = t.val;
where 20 should be a rough guess on the actual number of entries. this is one of the edge cases where hinting is "ok" (and required to help the optimizer).
edit
eg:
SQL> explain plan for SELECT /*+ cardinality(t, 1) */ * FROM project p JOIN TABLE(VARCHAR20_TABLE()) t ON p.project_id = t.val;
Explained.
SQL> select * From table(dbms_xplan.display);
Plan hash value: 858605789
--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 27 | 30 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 1 | 27 | 30 (0)| 00:00:01 |
| 3 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 1 | 2 | 29 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | SYS_C0011177 | 1 | | 0 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID | PROJECT | 1 | 25 | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("P"."PROJECT_ID"=TO_NUMBER(SYS_OP_ATG(VALUE(KOKBF$),1,2,2)))
Note
-----
- dynamic sampling used for this statement (level=2)
21 rows selected.
SQL> explain plan for SELECT * FROM project p JOIN TABLE(VARCHAR20_TABLE()) t ON p.project_id = t.val;
Explained.
SQL> select * From table(dbms_xplan.display);
Plan hash value: 583089723
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 | 215K| 33 (4)| 00:00:01 |
|* 1 | HASH JOIN | | 8168 | 215K| 33 (4)| 00:00:01 |
| 2 | TABLE ACCESS FULL | PROJECT | 2000 | 50000 | 3 (0)| 00:00:01 |
| 3 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 8168 | 16336 | 29 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("P"."PROJECT_ID"=TO_NUMBER(SYS_OP_ATG(VALUE(KOKBF$),1,2,2)))
Note
-----
- dynamic sampling used for this statement (level=2)
19 rows selected.
a trivial example but note the "Rows" on the collection fetch = 8168 without the hint and the change in plan as a result. check the explain plan with the real table vs the collection vs the hinted collection and helpfully, with a reasonable cardinality hint number your plan and performance should improve.

Optimizing sql query with subselect list in clause

I'm using oracle 11g and trying to optimize a query.
The basic structure of the query is:
SELECT val1, val2, val3,
FROM
table_name
WHERE
val1 in (subselect statement is here, it selects a list of possible values for
val1 from another table)
and val5>=X and val5<=Y
group by val1
order by val2 desc;
My issue is that when I use a subselect, the cost is 3130.
If I fill in the results of the subselect by hand - so, for example
field1 in (1, 2, 3, 4, 5, 6)
Where (1, 2, 3, 4, 5, 6) is the results of the subselect, which in this case is all possible values of field 1, the cost of the query is 14, and oracle uses an "inlist iterator" for the group by part of the query. The results of the two queries are identical.
My question is how to mimic the behaviour of manually listing the possible values of field1 with a subselect statement. The reason I don't list those values in the query is that the possible values change based on one of the other fields, so the subselect is pulling the possible values of field1 from a 2nd table based on, say, field2.
I have an index of val1, val5, so it isn't doing any full table scans - it does do a range scan in both cases, but in the subselect case the range scan is much more expensive. However it isn't the most expensive part of the subselect query. The most expensive part is the group by, which is a HASH.
Edit - Yes, the query isn't syntactically correct - I didn't want to put up anything too specific. The actual query is fine - the selects use valid group by functions.
The subselect returns 6 values, but it can be anywhere from 1-50 or so based on the other value.
Edit2 - What I ended up doing was 2 separate queries so I could generate the list used in the subselect. I actually tried a similar test in sqlite, and it does the same thing, so this isn't just Oracle.
what you are seeing is a result of the IN () bieng subject to bind variable peeking. when you have histograms you write a query like "where a = 'a'" oracle will use the histogram to guess how many rows will be returned (same idea with an inlist operator, which iterates for each item and aggregates rows). if no histograms it will make a guess in the form of rows/distinct values.
In a subquery oracle doesn't do this (in most cases..there is a unique case where it does).
for example:
SQL> create table test
2 (val1 number, val2 varchar2(20), val3 number);
Table created.
Elapsed: 00:00:00.02
SQL>
SQL> insert into test select 1, 'aaaaaaaaaa', mod(rownum, 5) from dual connect by level <= 100;
100 rows created.
Elapsed: 00:00:00.01
SQL> insert into test select 2, 'aaaaaaaaaa', mod(rownum, 5) from dual connect by level <= 1000;
1000 rows created.
Elapsed: 00:00:00.02
SQL> insert into test select 3, 'aaaaaaaaaa', mod(rownum, 5) from dual connect by level <= 100;
100 rows created.
Elapsed: 00:00:00.00
SQL> insert into test select 4, 'aaaaaaaaaa', mod(rownum, 5) from dual connect by level <= 100000;
100000 rows created.
so i have a table with 101200 rows. for VAL1 , 100 are "1" 1000 are "2" 100 are "3" and 100k are "4".
now if histograms are gathered (and we do want them in this case)
SQL> exec dbms_stats.gather_table_stats(user , 'test', degree=>4, method_opt=>'for all indexed columns size 4', estimate_percent=>100);
SQL> exec dbms_stats.gather_table_stats(user , 'lookup', degree=>4, method_opt =>'for all indexed columns size 3', estimate_percent=>100);
we see the following:
SQL> explain plan for select * from test where val1 in (1, 2, 3) ;
Explained.
SQL> #explain ""
Plan hash value: 3165434153
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1200 | 19200 | 23 (0)| 00:00:01 |
| 1 | INLIST ITERATOR | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| TEST | 1200 | 19200 | 23 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | TEST1 | 1200 | | 4 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
vs
SQL> explain plan for select * from test where val1 in (select id from lookup where str = 'A') ;
Explained.
SQL> #explain ""
Plan hash value: 441162525
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25300 | 518K| 106 (3)| 00:00:02 |
| 1 | NESTED LOOPS | | 25300 | 518K| 106 (3)| 00:00:02 |
| 2 | TABLE ACCESS BY INDEX ROWID| LOOKUP | 1 | 5 | 1 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | LOOKUP1 | 1 | | 0 (0)| 00:00:01 |
|* 4 | TABLE ACCESS FULL | TEST | 25300 | 395K| 105 (3)| 00:00:02 |
----------------------------------------------------------------------------------------
where lookup table is
SQL> select * From lookup;
ID STR
---------- ----------
1 A
2 B
3 C
4 D
(str is unique indexed and has histograms).
notice a bang on cardinality of 1200 for the inlist and a good plan, but a wildly inaccurate one on the sub query? Oracle hasn't computed histograms on the join condition, instead it has said "look, i dont know what id will be, so ill guess total rows(100k+1000+100+100)/distinct values(4) = 25300 and use that. as such its picked a full table scan.
that's all great, but how to fix it? if you know that this sub query will match a small number of rows (we do). then you have to hint the outer query to try to have it use an index. like:
SQL> explain plan for select /*+ index(t) */ * from test t where val1 in (select id from lookup where str = 'A') ;
Explained.
SQL> #explain
Plan hash value: 702117913
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25300 | 518K| 456 (1)| 00:00:06 |
| 1 | NESTED LOOPS | | 25300 | 518K| 456 (1)| 00:00:06 |
| 2 | TABLE ACCESS BY INDEX ROWID| LOOKUP | 1 | 5 | 1 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | LOOKUP1 | 1 | | 0 (0)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| TEST | 25300 | 395K| 455 (1)| 00:00:06 |
|* 5 | INDEX RANGE SCAN | TEST1 | 25300 | | 61 (2)| 00:00:01 |
----------------------------------------------------------------------------------------
another thing is in my particular case. as val1=4 is most of the table, lets say i have my standard query:
select * from test t where val1 in (select id from lookup where str = :B1);
for the possible :B1 inputs. if i know that the valid values passed in are A, B and C (ie not D which maps to id=4) . i can add this trick:
SQL> explain plan for select * from test t where val1 in (select id from lookup where str = :b1 and id in (1, 2, 3)) ;
Explained.
SQL> #explain ""
Plan hash value: 771376936
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 250 | 5250 | 24 (5)| 00:00:01 |
|* 1 | HASH JOIN | | 250 | 5250 | 24 (5)| 00:00:01 |
|* 2 | VIEW | index$_join$_002 | 1 | 5 | 1 (100)| 00:00:01 |
|* 3 | HASH JOIN | | | | | |
|* 4 | INDEX RANGE SCAN | LOOKUP1 | 1 | 5 | 0 (0)| 00:00:01 |
| 5 | INLIST ITERATOR | | | | | |
|* 6 | INDEX UNIQUE SCAN | SYS_C002917051 | 1 | 5 | 0 (0)| 00:00:01 |
| 7 | INLIST ITERATOR | | | | | |
| 8 | TABLE ACCESS BY INDEX ROWID| TEST | 1200 | 19200 | 23 (0)| 00:00:01 |
|* 9 | INDEX RANGE SCAN | TEST1 | 1200 | | 4 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
now notice oracle has got a reasonable card (its pushed the 1,2,3 onto the TEST table and got 1200..not 100% accurate, as i was only filtering on noe of them but ive told oralce CERTAINLY NOT 4!
I have done some research and I think everything is explained here: oracle docs.
Just look in "How the CBO Evaluates IN-List Iterators"
and compare it to "How the CBO Evaluates the IN Operator".
Your query with "field1 in (1, 2, 3, 4, 5, 6)" is matching first case but query with subselect is rewritten by Oracle.
So every query with subselect or join will have similar cost to yours unless you find very tricky way to put return from subquery as parameters.
You can always try to set more memory to sorts.
You might be able to fix the statement by adding indexes on the subselect. However, you would have to post the query and execution plan to understand that. By the way, how long does the subselect itself take?
You can try one of the following two versions:
select val1, val2, val3
from table_name join
(select distinct val from (subselect here)) t
on table_name.val1 = t.val
where val5>=X and val5<=Y
group by val1, val2, val3
order by val2 desc;
or:
select val1, val2, val3
from table_name
where val5>=X and val5<=Y and
exists (select 1 from (subselect here) t where t.val = table_name.val1)
group by val1, val2, val3
order by val2 desc;
These are semantically equivalent, and one of them might optimize better.
One other possibility that might work is to do the filtering after the group by. Something like:
select t.*
from (select val1, val2, val3
from table_name
where val5>=X and val5<=Y and
group by val1, val2, val3
) t
where val1 in (subselect here)
order by val2 desc;