Execution plan too expensive case when exists - sql

I have the below query, but when I execute it runs forever.
WITH aux AS (
SELECT
contract,
contract_account,
business_partner,
payment_plan,
installation,
contract_status
FROM
reta.mv_integrated_md a
WHERE
contract_status IN (
'LIVE',
'FINAL'
)
), aux1 AS (
SELECT
a.*,
CASE
WHEN EXISTS (
SELECT
NULL
FROM
aux b
WHERE
b.business_partner = a.business_partner
AND b.installation = a.installation
AND b.payment_plan = 'BMW'
) THEN
'X'
END h
FROM
aux a
)
SELECT
*
FROM
aux1;
My execution plan shows a huge cost which I cannot locate. How could I optimize this query? I have tried some hints but none of them have worked :(
Plan hash value: 1662974027
----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 19M| 2000M| 825G (1)|999:59:59 | | |
|* 1 | VIEW | | 19M| 990M| 41331 (1)| 00:00:02 | | |
| 2 | TABLE ACCESS STORAGE FULL | SYS_TEMP_0FDA49C92_9A7BE8DE | 19M| 1066M| 41331 (1)| 00:00:02 | | |
| 3 | TEMP TABLE TRANSFORMATION | | | | | | | |
| 4 | LOAD AS SELECT | SYS_TEMP_0FDA49C92_9A7BE8DE | | | | | | |
| 5 | PARTITION RANGE SINGLE | | 18M| 974M| 759K (1)| 00:00:30 | 1 | 1 |
|* 6 | TABLE ACCESS STORAGE FULL| MV_INTEGRATED_MD | 18M| 974M| 759K (1)| 00:00:30 | 1 | 1 |
| 7 | VIEW | | 19M| 2000M| 41331 (1)| 00:00:02 | | |
| 8 | TABLE ACCESS STORAGE FULL | SYS_TEMP_0FDA49C92_9A7BE8DE | 19M| 1066M| 41331 (1)| 00:00:02 | | |
----------------------------------------------------------------------------------------------------------------------------
Kindly let me know if any additional information needed.

Use window functions:
SELECT r.contract, r.contract_account, r.business_partner,
r.payment_plan, r.installation, r.contract_status,
MAX(CASE WHEN r.payment_plan = 'BMW' THEN 'X' END) OVER (PARTITION BY business_partner, installation) as h
FROM reta.mv_integrated_md#rbip r
WHERE r.contract_status IN ('LIVE', 'FINAL');
Not only is the query much simpler to write and read, but it should perform much better too.

Highest cost is due to FTS(Full table scan) on table/MV MV_INTEGRATED_MD.
Try to create index on contract_status and check if it reduces the cost and also, what is size of this mv/table in terms of block and it is 10 percent or more than total buffer cache size ?
TABLE ACCESS STORAGE FULL| MV_INTEGRATED_MD | 18M| 974M| 759K (1)| 00:00:30 | 1 | 1

If you run your query with the /*+ gather_plan_statistics */ hint (I'm simulating it with a 1000 row table) you imediately see the problem :
select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
-------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
-------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1000 |00:00:00.01 | 9 | 5 |
|* 1 | VIEW | | 1000 | 1000 | 1000 |00:00:00.09 | 0 | 0 |
| 2 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6737_1A17DE13 | 1000 | 1000 | 500K|00:00:00.08 | 0 | 0 |
| 3 | TEMP TABLE TRANSFORMATION | | 1 | | 1000 |00:00:00.01 | 9 | 5 |
| 4 | LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6737_1A17DE13 | 1 | | 0 |00:00:00.01 | 8 | 5 |
|* 5 | TABLE ACCESS FULL | MV_INTEGRATED_MD | 1 | 1000 | 1000 |00:00:00.01 | 7 | 5 |
| 6 | VIEW | | 1 | 1000 | 1000 |00:00:00.01 | 0 | 0 |
| 7 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6737_1A17DE13 | 1 | 1000 | 1000 |00:00:00.01 | 0 | 0 |
-------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(("B"."BUSINESS_PARTNER"=:B1 AND "B"."INSTALLATION"=:B2 AND "B"."PAYMENT_PLAN"='BMW'))
5 - filter("CONTRACT_STATUS"='LIVE')
It is in the line 2 where a full scan is activated in a loop for each line of the main table (see starts = 1000)
Typically you want to resolve the EXISTS with a semi join to preserve good performance, but here it seems that Oracle can not rewrite it.
So you'll need to rewrite the query yourself.
Despite the excelent proposal of #GordonLinoff (that I'll start with) you may try to use an outer join as follows
with bmw as (
select distinct business_partner, installation
from mv_integrated_md
where payment_plan = 'BMW')
SELECT
a.contract,
a.contract_account,
a.business_partner,
a.payment_plan,
a.installation,
a.contract_status,
case when b.business_partner is not null then 'X' end as h
FROM mv_integrated_md a
left outer join bmw b
on b.business_partner = a.business_partner and
b.installation = a.installation
WHERE a.contract_status IN ( 'LIVE', 'FINAL')
This will lead to two fulls scans, one deduplication and outer join.

Related

Oracle database slow execution of statement with multiple recursive common table expressions

I have an Oracle 19c database table with "resources" that are organized hierarchically like a nested folder tree. The table contains around 2.5 million rows and the tree is up to 10 levels deep.
create table RESOURCES (
ID_ NUMBER(10) not null constraint PK_RESOURCES primary key,
FOLDERID_ NUMBER(10) constraint FK_PARENTFOLDER references RESOURCES
);
create index FOLDERIDINDEX on RESOURCES (FOLDERID_);
I'm using SQL recursive common table expressions (aka recursive subquery factoring) to find all descendants of some given resources.
In general, this works quite nicely, but if I try to get descendants of multiple folders with one query, some queries don't perform at all using Oracle. I'd like to understand why this is the case, and if there's some easy way to speed things up (query hint?, bugfix?, ...?)
For example, this statement does not return within 60 minutes(!):
WITH cte1 (id_) AS (SELECT id_ FROM Resources where id_ = 11
UNION ALL
SELECT r.id_ FROM Resources r, cte1 c WHERE r.folderId_ = c.id_),
cte2 (id_) AS (SELECT id_ FROM Resources where id_ = 808965
UNION ALL
SELECT r.id_ FROM Resources r, cte2 c WHERE r.folderId_ = c.id_)
SELECT count(*)
FROM Resources r
WHERE (r.folderId_ IN (SELECT * FROM cte1) OR r.folderId_ IN (SELECT * FROM cte2));
If I replace the two sub-selects in the last line with a UNION, it just takes a few seconds:
WITH cte1 (id_) AS (SELECT id_ FROM Resources where id_ = 11
UNION ALL
SELECT r.id_ FROM Resources r, cte1 c WHERE r.folderId_ = c.id_),
cte2 (id_) AS (SELECT id_ FROM Resources where id_ = 808965
UNION ALL
SELECT r.id_ FROM Resources r, cte2 c WHERE r.folderId_ = c.id_)
SELECT count(*)
FROM Resources r
WHERE (r.folderId_ IN (SELECT * FROM cte1 UNION SELECT * FROM cte2));
While that could already be the solution, it's a bit hard to change in my project, because SQL is auto-generated by code from application queries, and at that point not so easy to change. Also, application queries could be much more complex and such a replacement might not even be possible. These are just simplified examples. Maybe there's some other way to speed things up?
Interestingly, the slow query works without any performance problems on other databases like MySQL 8, PostgreSQL 13, SQL Server 2016 (with small syntax changes for the databases). It's just Oracle which seems to have a problem here.
This is the query plan for the first query, i.e. the slow one:
------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 54G (1)|591:38:12 |
| 1 | SORT AGGREGATE | | 1 | 5 | | |
|* 2 | FILTER | | | | | |
| 3 | TABLE ACCESS FULL | RESOURCES | 2410K| 11M| 9389 (1)| 00:00:01 |
|* 4 | VIEW | | 239K| 3046K| 23128 (1)| 00:00:01 |
| 5 | UNION ALL (RECURSIVE WITH) BREADTH FIRST| | | | | |
|* 6 | INDEX UNIQUE SCAN | PK_RESOURCES | 1 | 6 | 2 (0)| 00:00:01 |
|* 7 | HASH JOIN | | 239K| 5623K| 23126 (1)| 00:00:01 |
| 8 | RECURSIVE WITH PUMP | | | | | |
| 9 | BUFFER SORT (REUSE) | | | | | |
| 10 | TABLE ACCESS FULL | RESOURCES | 2410K| 25M| 9386 (1)| 00:00:01 |
|* 11 | VIEW | | 239K| 3046K| 23128 (1)| 00:00:01 |
| 12 | UNION ALL (RECURSIVE WITH) BREADTH FIRST| | | | | |
|* 13 | INDEX UNIQUE SCAN | PK_RESOURCES | 1 | 6 | 2 (0)| 00:00:01 |
|* 14 | HASH JOIN | | 239K| 5623K| 23126 (1)| 00:00:01 |
| 15 | RECURSIVE WITH PUMP | | | | | |
| 16 | BUFFER SORT (REUSE) | | | | | |
| 17 | TABLE ACCESS FULL | RESOURCES | 2410K| 25M| 9386 (1)| 00:00:01 |
------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
" 2 - filter( EXISTS (SELECT 0 FROM ""CTE1"" ""CTE1"" WHERE ""CTE1"".""ID_""=:B1) OR EXISTS (SELECT 0 "
" FROM ""CTE2"" ""CTE2"" WHERE ""CTE2"".""ID_""=:B2))"
" 4 - filter(""CTE1"".""ID_""=:B1)"
" 6 - access(""ID_""=11)"
" 7 - access(""R"".""FOLDERID_""=""C"".""ID_"")"
" 11 - filter(""CTE2"".""ID_""=:B1)"
" 13 - access(""ID_""=808965)"
" 14 - access(""R"".""FOLDERID_""=""C"".""ID_"")"
For comparison, the faster query using a UNION seems to use a better plan:
------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 18 | | 55733 (1)| 00:00:03 |
| 1 | SORT AGGREGATE | | 1 | 18 | | | |
|* 2 | HASH JOIN | | 2806K| 48M| 11M| 55733 (1)| 00:00:03 |
| 3 | VIEW | VW_NSO_1 | 479K| 6092K| | 50820 (1)| 00:00:02 |
| 4 | SORT UNIQUE | | 479K| 6092K| 9424K| 50820 (1)| 00:00:02 |
| 5 | UNION-ALL | | | | | | |
| 6 | VIEW | | 239K| 3046K| | 23128 (1)| 00:00:01 |
| 7 | UNION ALL (RECURSIVE WITH) BREADTH FIRST| | | | | | |
|* 8 | INDEX UNIQUE SCAN | PK_RESOURCES | 1 | 6 | | 2 (0)| 00:00:01 |
|* 9 | HASH JOIN | | 239K| 5623K| | 23126 (1)| 00:00:01 |
| 10 | RECURSIVE WITH PUMP | | | | | | |
| 11 | BUFFER SORT (REUSE) | | | | | | |
| 12 | TABLE ACCESS FULL | RESOURCES | 2410K| 25M| | 9386 (1)| 00:00:01 |
| 13 | VIEW | | 239K| 3046K| | 23128 (1)| 00:00:01 |
| 14 | UNION ALL (RECURSIVE WITH) BREADTH FIRST| | | | | | |
|* 15 | INDEX UNIQUE SCAN | PK_RESOURCES | 1 | 6 | | 2 (0)| 00:00:01 |
|* 16 | HASH JOIN | | 239K| 5623K| | 23126 (1)| 00:00:01 |
| 17 | RECURSIVE WITH PUMP | | | | | | |
| 18 | BUFFER SORT (REUSE) | | | | | | |
| 19 | TABLE ACCESS FULL | RESOURCES | 2410K| 25M| | 9386 (1)| 00:00:01 |
| 20 | INDEX FAST FULL SCAN | FOLDERIDINDEX | 2410K| 11M| | 2392 (1)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
" 2 - access(""R"".""FOLDERID_""=""ID_"")"
" 8 - access(""ID_""=11)"
" 9 - access(""R"".""FOLDERID_""=""C"".""ID_"")"
" 15 - access(""ID_""=808965)"
" 16 - access(""R"".""FOLDERID_""=""C"".""ID_"")"

Recursive CTE much slower than CONNECT BY in Oracle

I have a table that looks like this:
Bundleref
Subbundleref
123
456
456
789
Starting from a certain reference (e.g. 123), I want a list of all descendants, all the way until the leaf values.
In Oracle, I can use a CONNECT BY clause like this:
select subbundleref from store.tw_bundles
start with bundleref = 2201114
connect by prior subbundleref = bundleref
For compatibility reasons, I am trying to convert it to a recursive CTE, like this:
WITH bundles(br,sr)
AS
(
SELECT bundleref, subbundleref
FROM store.tw_bundles where bundleref = 2201114
UNION ALL
SELECT bundleref, subbundleref
FROM store.tw_bundles twb
inner join bundles on twb.bundleref = bundles.sr
)
select sr from bundles
This gives me the same result. There is one problem though: the CONNECT BY query takes 300 ms, the recursive query takes about 50 seconds. Am I doing something inefficient here or is this not being optimized? (I'm using Oracle 19c.)
Explain plan for first query:
Plan hash value: 4216745508
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 218 | 5668 | 36 (6)| 00:00:01 |
|* 1 | CONNECT BY WITH FILTERING| | | | | |
|* 2 | INDEX RANGE SCAN | TW_BUNDLES_BUNDLEREF_IDX | 14 | 168 | 3 (0)| 00:00:01 |
| 3 | NESTED LOOPS | | 204 | 5100 | 31 (0)| 00:00:01 |
| 4 | CONNECT BY PUMP | | | | | |
|* 5 | INDEX RANGE SCAN | TW_BUNDLES_BUNDLEREF_IDX | 15 | 180 | 2 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("BUNDLEREF"=PRIOR "SUBBUNDLEREF")
2 - access("BUNDLEREF"=2201114)
5 - access("connect$_by$_pump$_002"."prior subbundleref "="BUNDLEREF")
Note
-----
- this is an adaptive plan
And for the second one:
Plan hash value: 1467025167
-------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 259K (3)| 00:00:11 |
| 1 | SORT AGGREGATE | | 1 | | | | |
| 2 | VIEW | | 1975M| | | 259K (3)| 00:00:11 |
| 3 | UNION ALL (RECURSIVE WITH) BREADTH FIRST| | | | | | |
|* 4 | INDEX RANGE SCAN | TW_BUNDLES_BUNDLEREF_IDX | 14 | 168 | | 3 (0)| 00:00:01 |
|* 5 | HASH JOIN | | 1975M| 45G| 258M| 259K (3)| 00:00:11 |
| 6 | BUFFER SORT (REUSE) | | | | | | |
| 7 | TABLE ACCESS FULL | TW_BUNDLES | 11M| 129M| | 9208 (1)| 00:00:01 |
| 8 | RECURSIVE WITH PUMP | | | | | | |
-------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("BUNDLEREF"=2201114)
5 - access("TWB"."BUNDLEREF"="BUNDLES"."SR")

Does indexing works with "WITH" in Oracle

I have query something like
WITH
str_table as (
SELECT stringtext, stringnumberid
FROM STRING_TABLE
WHERE LANGID IN (23,62)
),
data as (
select *
from employee emp
left outer join str_table st on emp.nameid = st.stringnumberid
)
select * from data
I know With clause will work in this manner
Step 1 : The SQL Query within the with clause is executed at first step.
Step 2 : The output of the SQL query is stored into temporary relation of with clause.
Step 3 : The Main query is executed with temporary relation produced at the last stage.
Now I want to ask whether the indexes created on the actual STRING_TABLE are going to help in temporary str_table produce by the With clause? I want to ask whether the indexes also have impact on str_table or not?
Oracle will not process CTE one by one. It will analyze the SQL as a whole. Your SQL is most likely the same as following in the eye of Oracle optimizer
select emp.*
from employee emp left outer join STRING_TABLE st
on emp.nameid = st.stringnumberid
where st.LANGID IN (23,62);
Oracle can use index on STRING_TABLE. Whether it will depends on the table statistics. For example, if the table has few rows (say a few hundred), Oracle will likely not use index.
It depends.
First of all, with clause is not a temporary table. As documentation says:
Oracle Database optimizes the query by treating the query name as either an inline view or as a temporary table.
Optimizer decides to materialize with subquery if either you forse it to do so by using /*+materialize*/ hint inside the subquery or you reuse this with subquery more than once.
In the example below Oracle uses with clause as inline view and merges it within the main query:
explain plan for
with a as (
select
s.textid,
s.textvalue,
a.id,
a.other_column
from string_table s
join another_tab a
on s.textid = a.textid
where langid in (1)
)
select *
from big_table b
join a a_name
on b.name_textid = a_name.textid
and b.job_textid = a_name.id
| PLAN_TABLE_OUTPUT |
| :----------------------------------------------------------------------------------- |
| Plan hash value: 1854049435 |
| |
| ------------------------------------------------------------------------------------ |
| | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | |
| ------------------------------------------------------------------------------------ |
| | 0 | SELECT STATEMENT | | 1 | 1147 | 74 (0)| 00:00:01 | |
| |* 1 | HASH JOIN | | 1 | 1147 | 74 (0)| 00:00:01 | |
| | 2 | TABLE ACCESS FULL | ANOTHER_TAB | 39 | 3042 | 3 (0)| 00:00:01 | |
| |* 3 | HASH JOIN | | 31 | 33139 | 71 (0)| 00:00:01 | |
| | 4 | TABLE ACCESS FULL| BIG_TABLE | 19 | 10279 | 3 (0)| 00:00:01 | |
| |* 5 | TABLE ACCESS FULL| STRING_TABLE | 1143 | 589K| 68 (0)| 00:00:01 | |
| ------------------------------------------------------------------------------------ |
But depending on the statistics and hints it may evaluate subquery first and then add it to the main query:
explain plan for
with a as (
select
s.textid,
s.textvalue,
a.id,
a.other_column
from string_table s
join another_tab a
on s.textid = a.textid
where langid in (1)
)
select /*+NO_MERGE(a_name)*/ *
from big_table b
join a a_name
on b.name_textid = a_name.textid
and b.job_textid = a_name.id
| PLAN_TABLE_OUTPUT |
| :------------------------------------------------------------------------------------ |
| Plan hash value: 4105667421 |
| |
| ------------------------------------------------------------------------------------- |
| | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | |
| ------------------------------------------------------------------------------------- |
| | 0 | SELECT STATEMENT | | 101 | 110K| 74 (0)| 00:00:01 | |
| |* 1 | HASH JOIN | | 101 | 110K| 74 (0)| 00:00:01 | |
| | 2 | TABLE ACCESS FULL | BIG_TABLE | 19 | 10279 | 3 (0)| 00:00:01 | |
| | 3 | VIEW | | 64 | 37120 | 71 (0)| 00:00:01 | |
| |* 4 | HASH JOIN | | 64 | 38784 | 71 (0)| 00:00:01 | |
| | 5 | TABLE ACCESS FULL| ANOTHER_TAB | 39 | 3042 | 3 (0)| 00:00:01 | |
| |* 6 | TABLE ACCESS FULL| STRING_TABLE | 1143 | 589K| 68 (0)| 00:00:01 | |
| ------------------------------------------------------------------------------------- |
When you use with subquery twice, optimizer decides to materialize it:
explain plan for
with a as (
select
s.textid,
s.textvalue
from string_table s
where langid in (1)
)
select *
from big_table b
join a a_name
on b.name_textid = a_name.textid
join a a_job
on b.job_textid = a_job.textid
| PLAN_TABLE_OUTPUT |
| :--------------------------------------------------------------------------------------------------------------------- |
| Plan hash value: 1371454296 |
| |
| ---------------------------------------------------------------------------------------------------------------------- |
| | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | |
| ---------------------------------------------------------------------------------------------------------------------- |
| | 0 | SELECT STATEMENT | | 63 | 98973 | 67 (0)| 00:00:01 | |
| | 1 | TEMP TABLE TRANSFORMATION | | | | | | |
| | 2 | LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D7224_469C01 | | | | | |
| | 3 | TABLE ACCESS BY INDEX ROWID BATCHED | STRING_TABLE | 999 | 515K| 22 (0)| 00:00:01 | |
| |* 4 | INDEX RANGE SCAN | IX | 999 | | 4 (0)| 00:00:01 | |
| |* 5 | HASH JOIN | | 63 | 98973 | 45 (0)| 00:00:01 | |
| |* 6 | HASH JOIN | | 35 | 36960 | 24 (0)| 00:00:01 | |
| | 7 | TABLE ACCESS FULL | BIG_TABLE | 19 | 10279 | 3 (0)| 00:00:01 | |
| | 8 | VIEW | | 999 | 502K| 21 (0)| 00:00:01 | |
| | 9 | TABLE ACCESS FULL | SYS_TEMP_0FD9D7224_469C01 | 999 | 502K| 21 (0)| 00:00:01 | |
| | 10 | VIEW | | 999 | 502K| 21 (0)| 00:00:01 | |
| | 11 | TABLE ACCESS FULL | SYS_TEMP_0FD9D7224_469C01 | 999 | 502K| 21 (0)| 00:00:01 | |
| ---------------------------------------------------------------------------------------------------------------------- |
So when there are some indexes on tables inside with subquery they may be used in all above cases: before materialization, when subquery is not merged and when subquery is merged and some idexes provide better query plan on merged subquery (even when those indexes are not used when you execute subquery alone).
What about idexes: if they provide high selectivity (i.e. number of rows retrieved by index is small compared to the overall number of rows), then Oracle will consider to use it. Note, that index access has two steps: read index blocks and then read table blocks that contain rowids found by index. If table size is not much bigger than index size, then Oracle may use table scan instead of index scan even for quite selective predicate (because of doubled IO).
In the below example I've used "small" texts (100 chars) and big_table table of 20 rows and this index for text table:
create index ix
on string_table(langid, textid)
Optimizer decides to use index range scan and read only blocks of the first level (first column of the index):
explain plan for
with a as (
select
s.textid,
s.textvalue
from string_table s
where langid in (1)
)
select *
from big_table b
join a a_name
on b.name_textid = a_name.textid
| PLAN_TABLE_OUTPUT |
| :---------------------------------------------------------------------------------------------------- |
| Plan hash value: 1660330381 |
| |
| ----------------------------------------------------------------------------------------------------- |
| | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | |
| ----------------------------------------------------------------------------------------------------- |
| | 0 | SELECT STATEMENT | | 29 | 31001 | 26 (0)| 00:00:01 | |
| |* 1 | HASH JOIN | | 29 | 31001 | 26 (0)| 00:00:01 | |
| | 2 | TABLE ACCESS FULL | BIG_TABLE | 19 | 10279 | 3 (0)| 00:00:01 | |
| | 3 | TABLE ACCESS BY INDEX ROWID BATCHED| STRING_TABLE | 999 | 515K| 23 (0)| 00:00:01 | |
| |* 4 | INDEX RANGE SCAN | IX | 999 | | 4 (0)| 00:00:01 | |
| ----------------------------------------------------------------------------------------------------- |
| |
| Predicate Information (identified by operation id): |
| --------------------------------------------------- |
| |
| 1 - access("B"."NAME_TEXTID"="S"."TEXTID") |
| 4 - access("LANGID"=1) | |
But when we reduce the number of rows in big_table, it uses both the columns for index scan:
delete from big_table
where id > 4
explain plan for
with a as (
select
s.textid,
s.textvalue
from string_table s
where langid in (1)
)
select *
from big_table b
join a a_name
on b.name_textid = a_name.textid
| PLAN_TABLE_OUTPUT |
| :-------------------------------------------------------------------------------------------- |
| Plan hash value: 1766926914 |
| |
| --------------------------------------------------------------------------------------------- |
| | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | |
| --------------------------------------------------------------------------------------------- |
| | 0 | SELECT STATEMENT | | 6 | 18216 | 11 (0)| 00:00:01 | |
| | 1 | NESTED LOOPS | | 6 | 18216 | 11 (0)| 00:00:01 | |
| | 2 | NESTED LOOPS | | 6 | 18216 | 11 (0)| 00:00:01 | |
| | 3 | TABLE ACCESS FULL | BIG_TABLE | 4 | 4032 | 3 (0)| 00:00:01 | |
| |* 4 | INDEX RANGE SCAN | IX | 1 | | 1 (0)| 00:00:01 | |
| | 5 | TABLE ACCESS BY INDEX ROWID| STRING_TABLE | 2 | 4056 | 2 (0)| 00:00:01 | |
| --------------------------------------------------------------------------------------------- |
| |
| Predicate Information (identified by operation id): |
| --------------------------------------------------- |
| |
| 4 - access("LANGID"=1 AND "B"."NAME_TEXTID"="S"."TEXTID") |
| |
You may check above code snippets in the db<>fiddle.

Avoid Full Table Scan with subquery or analytic function in view

I can reproduce the following behavior both with Oracle 11 (see SQL Fiddle) and Oracle 12.
CREATE TYPE my_tab IS TABLE OF NUMBER(3);
CREATE TABLE test AS SELECT ROWNUM AS id FROM dual CONNECT BY ROWNUM <= 1000;
CREATE UNIQUE INDEX idx_test ON test( id );
CREATE VIEW my_view AS
SELECT id, COUNT(1) OVER ( PARTITION BY id ) AS cnt
FROM test;
The following case uses the index as expected:
SELECT * FROM my_view
WHERE id IN ( 1, 2 );
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 52 | 2 (0)| 00:00:01 |
| 1 | VIEW | MY_VIEW | 2 | 52 | 2 (0)| 00:00:01 |
| 2 | WINDOW BUFFER | | 2 | 8 | 2 (0)| 00:00:01 |
| 3 | INLIST ITERATOR | | | | | |
|* 4 | INDEX UNIQUE SCAN| IDX_TEST | 2 | 8 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------
The following case does not use the index even though the cardinality hint is provided:
SELECT * FROM my_view
WHERE id IN ( SELECT /*+ CARDINALITY( tab 2 ) */ COLUMN_VALUE
FROM TABLE( NEW my_tab( 1, 2 ) ) tab );
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 28 | 33 (4)| 00:00:01 |
|* 1 | HASH JOIN RIGHT SEMI | | 1 | 28 | 33 (4)| 00:00:01 |
| 2 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 2 | 4 | 29 (0)| 00:00:01 |
| 3 | VIEW | MY_VIEW | 1000 | 26000 | 4 (25)| 00:00:01 |
| 4 | WINDOW SORT | | 1000 | 4000 | 4 (25)| 00:00:01 |
| 5 | TABLE ACCESS FULL | TEST | 1000 | 4000 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Edit:
Using an inline view and a JOIN instead of IN uses a similar plan:
SELECT /*+ CARDINALITY( tab, 2 ) */ *
FROM ( SELECT id, COUNT(1) OVER ( PARTITION BY id ) AS cnt FROM test ) t
JOIN TABLE( NEW my_tab( 1, 2 ) ) tab ON ( tab.COLUMN_VALUE = t.id );
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 56 | 33 (4)| 00:00:01 |
|* 1 | HASH JOIN | | 2 | 56 | 33 (4)| 00:00:01 |
| 2 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 2 | 4 | 29 (0)| 00:00:01 |
| 3 | VIEW | | 1000 | 26000 | 4 (25)| 00:00:01 |
| 4 | WINDOW SORT | | 1000 | 4000 | 4 (25)| 00:00:01 |
| 5 | TABLE ACCESS FULL | TEST | 1000 | 4000 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Replacing the analytic function by a LEFT JOIN with GROUP BY does not help either:
SELECT *
FROM ( SELECT t.id, s.cnt
FROM test t
LEFT JOIN ( SELECT id, COUNT(*) AS cnt
FROM test
GROUP BY id
) s ON ( s.id = t.id )
)
WHERE id IN ( SELECT /*+ CARDINALITY( tab 2 ) */ COLUMN_VALUE
FROM TABLE( NEW my_tab( 1, 2 ) ) tab );
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 64 | 34 (6)| 00:00:01 |
|* 1 | HASH JOIN OUTER | | 2 | 64 | 34 (6)| 00:00:01 |
| 2 | NESTED LOOPS | | 2 | 12 | 30 (4)| 00:00:01 |
| 3 | SORT UNIQUE | | 2 | 4 | 29 (0)| 00:00:01 |
| 4 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 2 | 4 | 29 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | IDX_TEST | 1 | 4 | 0 (0)| 00:00:01 |
| 6 | VIEW | | 1000 | 26000 | 4 (25)| 00:00:01 |
| 7 | HASH GROUP BY | | 1000 | 4000 | 4 (25)| 00:00:01 |
| 8 | TABLE ACCESS FULL | TEST | 1000 | 4000 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
Replacing the PL/SQL Collection by a subselect does not seem to help either. The CARDINALITY hint is considered (the plan says 2 rows), but the index is still ignored.
SELECT *
FROM ( SELECT id, cnt FROM my_view )
WHERE id IN ( SELECT /*+ CARDINALITY( tab 2 ) */ id FROM test tab );
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 60 | 4 (25)| 00:00:01 |
| 1 | NESTED LOOPS | | 2 | 60 | 4 (25)| 00:00:01 |
| 2 | VIEW | MY_VIEW | 1000 | 26000 | 4 (25)| 00:00:01 |
| 3 | WINDOW SORT | | 1000 | 4000 | 4 (25)| 00:00:01 |
| 4 | TABLE ACCESS FULL| TEST | 1000 | 4000 | 3 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | IDX_TEST | 1 | 4 | 0 (0)| 00:00:01 |
---------------------------------------------------------------------------------
Adding WHERE tab.id <= 2 to the in-list-subquery uses the index, so the optimizer seems to "not take the CARDINALITY hint serious enough" when selecting from a view with analytic functions (or another subselect) and filtering by a list of values.
How can I make these queries use the index as expected?
I think the one problem might be that the optimizer refuses to merge a view (and consider any indexes on the underlying tables) when the outer query block contains PL/SQL functions (e.g. TABLE()).
If you manually expand the view and query the table directly, it can access the index fine:
SELECT id, COUNT(1) OVER ( PARTITION BY id ) AS cnt
FROM test
WHERE id IN ( SELECT COLUMN_VALUE
FROM TABLE( NEW my_tab( 1, 2 ) ) tab )
;
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 31 (4)| 00:00:01 |
| 1 | WINDOW SORT | | 1 | 6 | 31 (4)| 00:00:01 |
|* 2 | HASH JOIN SEMI | | 1 | 6 | 30 (0)| 00:00:01 |
| 3 | INDEX FULL SCAN | IDX_TEST | 1000 | 4000 | 1 (0)| 00:00:01 |
| 4 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 8168 | 16336 | 29 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------
I'm not sure if there's a way to override this behavior, or if it's a limitation in the optimizer. I tried moving the TABLE function to a CTE, but that doesn't seem to help.

Physical reads caused by WITH clause

I have a set of complex optimized selects that suffer from physical reads. Without them they would be even faster!
These physical reads occur due to the WITH clause, one physical_read_request per WITH sub-query. They seem totally unnecessary to me, I'd prefer Oracle keeping the sub-query results in memory instead of writing them down to disk and reading them again.
I'm looking for a way how to get rid of these phys reads.
A simple sample having the same problems is this:
Edit: Example replaced with simpler one that does not use dictionary views.
alter session set STATISTICS_LEVEL=ALL;
create table T as
select level NUM from dual
connect by level <= 1000;
with /*a2*/ TT as (
select NUM from T
where NUM between 100 and 110
)
select * from TT
union all
select * from TT
;
SELECT * FROM TABLE(dbms_xplan.display_cursor(
(select sql_id from v$sql
where sql_fulltext like 'with /*a2*/ TT%'
and sql_fulltext not like '%v$sql%'
and sql_fulltext not like 'explain%'),
NULL, format=>'allstats last'));
and the corresponding execution plan is
SQL_ID bpqnhfdmxnqvp, child number 0
-------------------------------------
with /*a2*/ TT as ( select NUM from T where NUM between 100 and
110 ) select * from TT union all select * from TT
Plan hash value: 4255080040
---------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 22 |00:00:00.01 | 20 | 1 | 1 | | | |
| 1 | TEMP TABLE TRANSFORMATION | | 1 | | 22 |00:00:00.01 | 20 | 1 | 1 | | | |
| 2 | LOAD AS SELECT | | 1 | | 0 |00:00:00.01 | 8 | 0 | 1 | 266K| 266K| 266K (0)|
|* 3 | TABLE ACCESS FULL | T | 1 | 11 | 11 |00:00:00.01 | 4 | 0 | 0 | | | |
| 4 | UNION-ALL | | 1 | | 22 |00:00:00.01 | 9 | 1 | 0 | | | |
| 5 | VIEW | | 1 | 11 | 11 |00:00:00.01 | 6 | 1 | 0 | | | |
| 6 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6646_63A776 | 1 | 11 | 11 |00:00:00.01 | 6 | 1 | 0 | | | |
| 7 | VIEW | | 1 | 11 | 11 |00:00:00.01 | 3 | 0 | 0 | | | |
| 8 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6646_63A776 | 1 | 11 | 11 |00:00:00.01 | 3 | 0 | 0 | | | |
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(("NUM">=100 AND "NUM"<=110))
Note
-----
- dynamic sampling used for this statement (level=2)
See the (phys) Write upon each WITH view creation, and the (phys) Read upon the first view usage. I also tried the RESULT_CACHE hint (which is not reflected in this sample select, but was reflected in my original queries), but it doesn't remove the disk accesses either (which is understandable).
How can I get rid of the phys writes/reads?