High performance approach to greatest-n-per-group SQL query

I'm attempting to build an infrastructure for quickly running regressions on demand, pulling apache requests from a database that contains all historic activity on our webservers. To improve coverage by making sure that we still regress requests from our smaller clients, I'd like to ensure a distribution of requests by retrieving at most n (for the sake of this question, say 10) requests for each client.
I found a number of similar questions answered here, the closest of which seemed to be SQL query to return top N rows per ID across a range of IDs, but the answers were for the most part performance-agnostic solutions
that I had already tried. For instance, the row_number() analytic function gets us exactly the data we're looking for:
SELECT *
FROM (
    SELECT
        dailylogdata.*,
        row_number() over (partition by dailylogdata.contextid order by occurrencedate) rn
    FROM
        dailylogdata
    WHERE
        shorturl in (?)
)
WHERE
    rn <= 10;
However, given that this table contains millions of entries for a given day and this approach necessitates reading all rows from the index that match our selection criteria in order to apply the row_number analytic function, performance is terrible. We end up selecting nearly a million rows, only to throw out the vast majority of them because their row_number exceeded 10. Stats from executing the above query:
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | Used-Mem | Used-Tmp|
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 12222 |00:09:08.94 | 895K| 584K| 301 | | | | |
|* 1 | VIEW | | 1 | 4427K| 12222 |00:09:08.94 | 895K| 584K| 301 | | | | |
|* 2 | WINDOW SORT PUSHED RANK | | 1 | 4427K| 13536 |00:09:08.94 | 895K| 584K| 301 | 2709K| 743K| 97M (1)| 4096 |
| 3 | PARTITION RANGE SINGLE | | 1 | 4427K| 932K|00:22:27.90 | 895K| 584K| 0 | | | | |
| 4 | TABLE ACCESS BY LOCAL INDEX ROWID| DAILYLOGDATA | 1 | 4427K| 932K|00:22:27.61 | 895K| 584K| 0 | | | | |
|* 5 | INDEX RANGE SCAN | DAILYLOGDATA_URLCONTEXT | 1 | 17345 | 932K|00:00:00.75 | 1448 | 0 | 0 | | | | |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

 1 - filter("RN"<=:SYS_B_2)
 2 - filter(ROW_NUMBER() OVER ( PARTITION BY "DAILYLOGDATA"."CONTEXTID" ORDER BY "OCCURRENCEDATE")<=:SYS_B_2)
 5 - access("SHORTURL"=:P1)
However, if instead we only query for the first 10 results for a specific contextid, we can execute this dramatically faster:
SELECT *
FROM (
    SELECT
        dailylogdata.*
    FROM
        dailylogdata
    WHERE
        shorturl in (?)
        and contextid = ?
)
WHERE
    rownum <= 10;
Stats from running this query:
-------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 10 |00:00:00.01 | 14 |
|* 1 | COUNT STOPKEY | | 1 | | 10 |00:00:00.01 | 14 |
| 2 | PARTITION RANGE SINGLE | | 1 | 10 | 10 |00:00:00.01 | 14 |
| 3 | TABLE ACCESS BY LOCAL INDEX ROWID| DAILYLOGDATA | 1 | 10 | 10 |00:00:00.01 | 14 |
|* 4 | INDEX RANGE SCAN | DAILYLOGDATA_URLCONTEXT | 1 | 1 | 10 |00:00:00.01 | 5 |
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

 1 - filter(ROWNUM<=10)
 4 - access("SHORTURL"=:P1 AND "CONTEXTID"=TO_NUMBER(:P2))
In this instance, Oracle is smart enough to stop retrieving data after getting 10 results. I could gather a complete set of contextids and programmatically generate a query consisting of one instance of this query for each contextid and union all the whole mess together, but given the sheer number of contextids, we might run into an internal Oracle limitation, and even if not, this approach reeks of kludge.
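For illustration, such a generated statement would look roughly like the sketch below (the contextid literals are made up; there would be one branch per contextid):
select * from (
    select * from dailylogdata where shorturl in (?) and contextid = 1001
) where rownum <= 10
union all
select * from (
    select * from dailylogdata where shorturl in (?) and contextid = 1002
) where rownum <= 10;
-- ...and so on for every remaining contextid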
Does anyone know of an approach that maintains the simplicity of the first query, while retaining performance commensurate with the second query? Also note that I don't actually care about retrieving a stable set of rows; so long as they satisfy my criteria, they're fine for the purposes of a regression.
Edit: Adam Musch's suggestion did the trick. I'm appending performance results with his changes up here, since I can't fit them in a comment response to his answer. I'm also using a larger data set for testing this time; here are the (cached) stats from my original row_number approach, for comparison:
-------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 12624 |00:00:22.34 | 1186K| 931K| | | |
|* 1 | VIEW | | 1 | 1163K| 12624 |00:00:22.34 | 1186K| 931K| | | |
|* 2 | WINDOW NOSORT | | 1 | 1163K| 1213K|00:00:21.82 | 1186K| 931K| 3036M| 17M| |
| 3 | TABLE ACCESS BY INDEX ROWID| TWTEST | 1 | 1163K| 1213K|00:00:20.41 | 1186K| 931K| | | |
|* 4 | INDEX RANGE SCAN | TWTEST_URLCONTEXT | 1 | 1163K| 1213K|00:00:00.81 | 8568 | 0 | | | |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

 1 - filter("RN"<=10)
 2 - filter(ROW_NUMBER() OVER ( PARTITION BY "CONTEXTID" ORDER BY NULL )<=10)
 4 - access("SHORTURL"=:P1)
I've taken the liberty of boiling down Adam's suggestion a bit; here's the modified query...
select *
from
    twtest
where
    rowid in (
        select rowid
        from (
            select
                rowid,
                shorturl,
                row_number() over (partition by shorturl, contextid
                                   order by null) rn
            from
                twtest
        )
        where rn <= 10
        and shorturl in (?)
    );
...and stats from its (cached) evaluation:
--------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 12624 |00:00:01.33 | 19391 | | | |
| 1 | NESTED LOOPS | | 1 | 1 | 12624 |00:00:01.33 | 19391 | | | |
| 2 | VIEW | VW_NSO_1 | 1 | 1163K| 12624 |00:00:01.27 | 6770 | | | |
| 3 | HASH UNIQUE | | 1 | 1 | 12624 |00:00:01.27 | 6770 | 1377K| 1377K| 5065K (0)|
|* 4 | VIEW | | 1 | 1163K| 12624 |00:00:01.25 | 6770 | | | |
|* 5 | WINDOW NOSORT | | 1 | 1163K| 1213K|00:00:01.09 | 6770 | 283M| 5598K| |
|* 6 | INDEX RANGE SCAN | TWTEST_URLCONTEXT | 1 | 1163K| 1213K|00:00:00.40 | 6770 | | | |
| 7 | TABLE ACCESS BY USER ROWID| TWTEST | 12624 | 1 | 12624 |00:00:00.04 | 12621 | | | |
--------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

 4 - filter("RN"<=10)
 5 - filter(ROW_NUMBER() OVER ( PARTITION BY "SHORTURL","CONTEXTID" ORDER BY NULL NULL )<=10)
 6 - access("SHORTURL"=:P1)

Note
-----
 - dynamic sampling used for this statement (level=2)
As advertised, we're only accessing the dailylogdata table for the fully-filtered rows. I'm concerned that it appears to still be doing a full scan of the urlcontext index, based on the number of rows it claims to be selecting (1213K), but given that it's only using 6770 buffers (this number remains constant even if I increase the number of context-specific results), this may be misleading.

This is kind of a janky solution, but it seems to do what you want: shortcut the index scan as soon as possible, and not read data until it's been qualified both by the filtering conditions and by the top-n query criteria.
Note that it was tested with a shorturl = condition, not a shorturl IN condition.
with rowid_list as
 (select rowid
    from (select *
            from (select rowid,
                         shorturl,
                         row_number() over (partition by shorturl, contextid
                                            order by null) rn
                    from dailylogdata
                 )
           where rn <= 10
         )
   where shorturl = ?
 )
select *
from dailylogdata
where rowid in (select rowid from rowid_list)
The with clause grabs the first 10 rowids, filtered out of a WINDOW NOSORT, for each unique combination of shorturl and contextid that meets your criteria. Then it loops over that set of rowids, fetching each row by rowid.
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 286 | 1536 (1)| 00:00:19 |
| 1 | NESTED LOOPS | | 1 | 286 | 1536 (1)| 00:00:19 |
| 2 | VIEW | VW_NSO_1 | 136K| 1596K| 910 (1)| 00:00:11 |
| 3 | HASH UNIQUE | | 1 | 3326K| | |
|* 4 | VIEW | | 136K| 3326K| 910 (1)| 00:00:11 |
|* 5 | WINDOW NOSORT | | 136K| 2794K| 910 (1)| 00:00:11 |
|* 6 | INDEX RANGE SCAN | TABLE_REDACTED_INDEX | 136K| 2794K| 910 (1)| 00:00:11 |
| 7 | TABLE ACCESS BY USER ROWID| TABLE_REDACTED | 1 | 274 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter("RN"<=10)
5 - filter(ROW_NUMBER() OVER ( PARTITION BY "CLIENT_ID","SCE_ID" ORDER BY NULL NULL
)<=10)
6 - access("TABLE_REDACTED"."SHORTURL"=:b1)

It seems to be the sort taking all the time. Is occurrenceDate your clustered index, and if not, is it much quicker if you change to order by your clustered index? I.e. if it's clustered by a sequential id, then order by that.
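A sketch of that suggestion against the original query, assuming the table really does have a sequential key (the column name logid below is purely hypothetical):
SELECT *
FROM (
    SELECT
        dailylogdata.*,
        -- order the window by the hypothetical sequential key instead of occurrencedate
        row_number() over (partition by dailylogdata.contextid order by logid) rn
    FROM
        dailylogdata
    WHERE
        shorturl in (?)
)
WHERE
    rn <= 10;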

Last time, I simply cached the latest, most interesting rows in a smaller table. With my data distribution, it was cheaper to update the cache table on every insert rather than to query the bulk table.
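A rough sketch of that idea applied to this question's table (the object names, column list, and types below are hypothetical; the real trimming policy depends on the data):
-- small side table holding only the rows worth regressing
CREATE TABLE dailylogdata_cache (
    contextid       NUMBER,
    shorturl        VARCHAR2(4000),
    occurrencedate  DATE
);

-- keep it in step with the bulk table on every insert
CREATE OR REPLACE TRIGGER dailylogdata_cache_trg
AFTER INSERT ON dailylogdata
FOR EACH ROW
BEGIN
    INSERT INTO dailylogdata_cache (contextid, shorturl, occurrencedate)
    VALUES (:NEW.contextid, :NEW.shorturl, :NEW.occurrencedate);
END;
/
-- a periodic job would then trim the cache back to the newest n rows per contextid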

I think you should also check other ways/queries to achieve the same result set.
Self-JOIN / GROUP BY
SELECT
    d.*
  , COUNT(*) AS rn
FROM
    dailylogdata AS d
LEFT OUTER JOIN
    dailylogdata AS d2
    ON  d.contextid = d2.contextid
    AND d.occurrencedate >= d2.occurrencedate
    AND d2.shorturl IN (?)
WHERE
    d.shorturl IN (?)
GROUP BY
    d.*
HAVING
    COUNT(*) <= 10
And another one which I have no idea if it works correctly:
SELECT
    d.*
  , COUNT(*) AS rn
FROM
    ( SELECT DISTINCT
          contextid
      FROM
          dailylogdata
      WHERE
          shorturl IN (?)
    ) AS dd
JOIN
    dailylogdata AS d
    ON d.PK IN
       ( SELECT
             d10.PK
         FROM
             dailylogdata AS d10
         WHERE
             d10.contextid = dd.contextid
             AND d10.shorturl IN (?)
             AND rownum <= 10
         ORDER BY
             d10.occurrencedate
       )


Execution plan too expensive case when exists

I have the query below, but when I execute it, it runs forever.
WITH aux AS (
SELECT
contract,
contract_account,
business_partner,
payment_plan,
installation,
contract_status
FROM
reta.mv_integrated_md a
WHERE
contract_status IN (
'LIVE',
'FINAL'
)
), aux1 AS (
SELECT
a.*,
CASE
WHEN EXISTS (
SELECT
NULL
FROM
aux b
WHERE
b.business_partner = a.business_partner
AND b.installation = a.installation
AND b.payment_plan = 'BMW'
) THEN
'X'
END h
FROM
aux a
)
SELECT
*
FROM
aux1;
My execution plan shows a huge cost whose source I cannot locate. How could I optimize this query? I have tried some hints but none of them have worked :(
Plan hash value: 1662974027
----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 19M| 2000M| 825G (1)|999:59:59 | | |
|* 1 | VIEW | | 19M| 990M| 41331 (1)| 00:00:02 | | |
| 2 | TABLE ACCESS STORAGE FULL | SYS_TEMP_0FDA49C92_9A7BE8DE | 19M| 1066M| 41331 (1)| 00:00:02 | | |
| 3 | TEMP TABLE TRANSFORMATION | | | | | | | |
| 4 | LOAD AS SELECT | SYS_TEMP_0FDA49C92_9A7BE8DE | | | | | | |
| 5 | PARTITION RANGE SINGLE | | 18M| 974M| 759K (1)| 00:00:30 | 1 | 1 |
|* 6 | TABLE ACCESS STORAGE FULL| MV_INTEGRATED_MD | 18M| 974M| 759K (1)| 00:00:30 | 1 | 1 |
| 7 | VIEW | | 19M| 2000M| 41331 (1)| 00:00:02 | | |
| 8 | TABLE ACCESS STORAGE FULL | SYS_TEMP_0FDA49C92_9A7BE8DE | 19M| 1066M| 41331 (1)| 00:00:02 | | |
----------------------------------------------------------------------------------------------------------------------------
Kindly let me know if any additional information is needed.
Use window functions:
SELECT r.contract, r.contract_account, r.business_partner,
r.payment_plan, r.installation, r.contract_status,
MAX(CASE WHEN r.payment_plan = 'BMW' THEN 'X' END) OVER (PARTITION BY business_partner, installation) as h
FROM reta.mv_integrated_md#rbip r
WHERE r.contract_status IN ('LIVE', 'FINAL');
Not only is the query much simpler to write and read, but it should perform much better too.
The highest cost is due to the FTS (full table scan) on the table/MV MV_INTEGRATED_MD.
Try creating an index on contract_status and check whether it reduces the cost. Also, what is the size of this MV/table in terms of blocks, and is it 10 percent or more of the total buffer cache size?
TABLE ACCESS STORAGE FULL| MV_INTEGRATED_MD | 18M| 974M| 759K (1)| 00:00:30 | 1 | 1
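A minimal sketch of those two checks (the index name is made up, and the buffer cache query assumes access to v$sga_dynamic_components):
CREATE INDEX mv_integrated_md_status_ix ON mv_integrated_md (contract_status);

-- size of the MV/table in blocks and megabytes
SELECT blocks, bytes / 1024 / 1024 AS mb
FROM user_segments
WHERE segment_name = 'MV_INTEGRATED_MD';

-- current size of the default buffer cache, for comparison
SELECT current_size / 1024 / 1024 AS buffer_cache_mb
FROM v$sga_dynamic_components
WHERE component = 'DEFAULT buffer cache';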
If you run your query with the /*+ gather_plan_statistics */ hint (I'm simulating it with a 1000-row table), you immediately see the problem.
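For reference, the hint just goes in the outermost query block of the statement under test; applied to the original query it would look like this (a sketch that only adds the hint):
WITH aux AS (
    SELECT contract, contract_account, business_partner,
           payment_plan, installation, contract_status
    FROM reta.mv_integrated_md a
    WHERE contract_status IN ('LIVE', 'FINAL')
), aux1 AS (
    SELECT a.*,
           CASE WHEN EXISTS (
                    SELECT NULL
                    FROM aux b
                    WHERE b.business_partner = a.business_partner
                      AND b.installation = a.installation
                      AND b.payment_plan = 'BMW'
                ) THEN 'X'
           END h
    FROM aux a
)
SELECT /*+ gather_plan_statistics */ * FROM aux1;
After fetching all the rows, pull the runtime plan with: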
select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
-------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
-------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1000 |00:00:00.01 | 9 | 5 |
|* 1 | VIEW | | 1000 | 1000 | 1000 |00:00:00.09 | 0 | 0 |
| 2 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6737_1A17DE13 | 1000 | 1000 | 500K|00:00:00.08 | 0 | 0 |
| 3 | TEMP TABLE TRANSFORMATION | | 1 | | 1000 |00:00:00.01 | 9 | 5 |
| 4 | LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6737_1A17DE13 | 1 | | 0 |00:00:00.01 | 8 | 5 |
|* 5 | TABLE ACCESS FULL | MV_INTEGRATED_MD | 1 | 1000 | 1000 |00:00:00.01 | 7 | 5 |
| 6 | VIEW | | 1 | 1000 | 1000 |00:00:00.01 | 0 | 0 |
| 7 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6737_1A17DE13 | 1 | 1000 | 1000 |00:00:00.01 | 0 | 0 |
-------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(("B"."BUSINESS_PARTNER"=:B1 AND "B"."INSTALLATION"=:B2 AND "B"."PAYMENT_PLAN"='BMW'))
5 - filter("CONTRACT_STATUS"='LIVE')
It is line 2 where a full scan is activated in a loop, once for each row of the main table (see Starts = 1000).
Typically you want to resolve the EXISTS with a semi join to preserve good performance, but here it seems that Oracle cannot rewrite it.
So you'll need to rewrite the query yourself.
Despite the excellent proposal of @GordonLinoff (which I'd start with), you may try to use an outer join as follows:
with bmw as (
select distinct business_partner, installation
from mv_integrated_md
where payment_plan = 'BMW')
SELECT
a.contract,
a.contract_account,
a.business_partner,
a.payment_plan,
a.installation,
a.contract_status,
case when b.business_partner is not null then 'X' end as h
FROM mv_integrated_md a
left outer join bmw b
on b.business_partner = a.business_partner and
b.installation = a.installation
WHERE a.contract_status IN ( 'LIVE', 'FINAL')
This will lead to two full scans, one deduplication, and an outer join.

Indexes not used in query with paging with Oracle 10g

I have a data retrieval query in an Oracle 10g database with several indexes. When performing the pagination, Oracle stops using the indexes.
For the query of the first page, only one subquery is generated:
explain plan for
select * from (
SELECT e.id, e.data, e.date, e.modifdate, e.origin, e.type, e.priority, e.view, e.state
FROM ev e
WHERE exists (
SELECT null
FROM dev d
WHERE (e.cId = d.deviceId OR e.cId = d.id) AND d.id = 152465)
ORDER BY e.date DESC )
where rownum <= 30
In the execution plan, I can see how the indexes are being used:
----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 30 | 68730 | 35616 (1)| 00:07:08 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | VIEW | | 30 | 68730 | 35616 (1)| 00:07:08 |
|* 3 | FILTER | | | | | |
| 4 | TABLE ACCESS BY INDEX ROWID | EV | 90578 | 138M| 31351 (1)| 00:06:17 |
| 5 | INDEX FULL SCAN | EV_DATE_DESC_IDX | 35291 | | 128 (6)| 00:00:02 |
|* 6 | TABLE ACCESS BY INDEX ROWID | DEV | 1 | 20 | 4 (0)| 00:00:01 |
| 7 | BITMAP CONVERSION TO ROWIDS | | | | | |
| 8 | BITMAP OR | | | | | |
| 9 | BITMAP CONVERSION FROM ROWIDS| | | | | |
|* 10 | INDEX RANGE SCAN | DEV_DEVICE_ID_IDX | | | 1 (0)| 00:00:01 |
| 11 | BITMAP CONVERSION FROM ROWIDS| | | | | |
|* 12 | INDEX RANGE SCAN | SYS_C00443004 | | | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------------------
30 rows in 0.759 seconds
For the following pages, two subqueries are generated and the indexes are no longer used:
explain plan for
select * from (
select row_.*, rownum rownum_ from (
SELECT e.id, e.data, e.date, e.modifdate, e.origin, e.type, e.priority, e.view, e.state
FROM ev e
WHERE exists (
SELECT null
FROM dev d
WHERE (e.cId = d.deviceId OR e.cId = d.id) AND d.id = 152465)
ORDER BY e.date DESC )
row_ where rownum <= 60)
where rownum_ > 30
Execution plan:
--------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 60 | 135K| | 63815 (2)| 00:12:46 |
|* 1 | VIEW | | 60 | 135K| | 63815 (2)| 00:12:46 |
|* 2 | COUNT STOPKEY | | | | | | |
| 3 | VIEW | | 77 | 172K| | 63815 (2)| 00:12:46 |
|* 4 | SORT ORDER BY STOPKEY | | 77 | 120K| 283M| 63815 (2)| 00:12:46 |
|* 5 | FILTER | | | | | | |
| 6 | TABLE ACCESS FULL | EV | 90578 | 138M| | 6798 (3)| 00:01:22 |
|* 7 | TABLE ACCESS BY INDEX ROWID | DEV | 1 | 20 | | 4 (0)| 00:00:01 |
| 8 | BITMAP CONVERSION TO ROWIDS | | | | | | |
| 9 | BITMAP OR | | | | | | |
| 10 | BITMAP CONVERSION FROM ROWIDS| | | | | | |
|* 11 | INDEX RANGE SCAN | DEV_DEVICE_ID_IDX | | | | 1 (0)| 00:00:01 |
| 12 | BITMAP CONVERSION FROM ROWIDS| | | | | | |
|* 13 | INDEX RANGE SCAN | SYS_C00443004 | | | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------------------------
30 rows in 3.386 seconds
For paging, I write the query by hand, but I set the page number with the entity manager:
Query querySelect = entityManager.createNativeQuery(query, Event.class);
querySelect.setMaxResults(pageSize);
querySelect.setFirstResult(pageSize * page);
Why doesn't Oracle use the indexes in the second query? How can I get the indexes used with pagination in Oracle?
Assuming these are normal heap tables, I am just speculating here.
The first query sorts the records using the ev.date column and fetches the first 30 records. This favours an Index Full Scan (IFS). An IFS scans the ROWIDs in the same order as the entries in the leaf blocks (left to right, a full scan), so the scanned records are already ordered. As you were fetching the first 30 rows out of these, the optimizer seems to have favoured the IFS. Note that there is a table access after the IFS, which is already an overhead.
The second query fetches 30 rows from the middle of the scanned records, unlike the first 30 records in the first query. This creates new starting and ending points within the scanned result set. In spite of scanning in order, the query has to discard the initially scanned rows that fall before the new starting point; the records scanned before that point are redundant, as they will be filtered out anyway. So it seems the IFS is considered an overhead here, which might have pushed the optimizer to favour a Full Table Scan (FTS) over the IFS. The scanned records then need to be sorted to obtain the required order, using 283M of temporary tablespace.
You may want to check the following details to understand the execution plan better.
The number of rows in EV
SELECT num_rows, last_analyzed, blocks, sample_size
FROM user_tables
WHERE table_name = 'EV';
The number of rows in DEV
SELECT num_rows, last_analyzed, blocks, sample_size
FROM user_tables
WHERE table_name = 'DEV';
The number of rows in DEV with ID = 152465
The number of rows in EV with ID or DEVICEID = 152465
The list of indexes in each table with columns in order, uniqueness
SELECT table_name, index_name, uniqueness, clustering_factor, last_analyzed, sample_size
FROM user_indexes
WHERE table_name in ('EV','DEV') AND status = 'VALID'
ORDER BY 1,2;
SELECT table_name,index_name,column_name
FROM user_ind_columns
WHERE table_name in ('EV','DEV')
ORDER BY 1,2,column_position;
Check if histogram statistics are available on any columns.
SELECT table_name, column_name, histogram
FROM user_tab_col_statistics
WHERE table_name in ('EV','DEV')
ORDER BY 1,2;
Also, the following parameters might help; a query to check their current values is sketched after this list.
optimizer_index_caching
optimizer_index_cost_Adj
db_file_multiblock_read_count
pga_aggregate_target
optimizer_mode
cursor_sharing
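A quick way to check their current values (a sketch; querying v$parameter requires the appropriate privileges):
SELECT name, value, isdefault
FROM v$parameter
WHERE name IN ('optimizer_index_caching', 'optimizer_index_cost_adj',
               'db_file_multiblock_read_count', 'pga_aggregate_target',
               'optimizer_mode', 'cursor_sharing');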

CTE vs subquery: which one is more efficient?

SELECT col, (SELECT COUNT(*) FROM table) as total_count FROM table
This query executes the subquery for every row, right?
Now if we have
;WITH CTE(total_count) AS (
SELECT COUNT(*) FROM table
)
SELECT col, (SELECT total_count FROM CTE) FROM table;
Will the second method be more efficient? Will the CTE execute COUNT(*) only once, with the outer SELECT using it as a precomputed value, or does the second form also execute COUNT(*) for each row?
For Oracle, the surest way is to observe the behaviour of the statements with extended statistics.
To do so, first increase the statistics level to ALL:
alter session set statistics_level=all;
Then run both statements (fetching all rows) and find the SQL_ID of each statement.
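One way to locate them is to search v$sql for the statement text (a sketch; adjust the LIKE pattern to your actual statements):
SELECT sql_id, child_number, sql_text
FROM v$sql
WHERE sql_text LIKE 'SELECT col, (SELECT COUNT(*)%'
ORDER BY last_active_time DESC;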
Finally, display the statistics using the following statement (passing the proper SQL_ID):
select * from table(dbms_xplan.display_cursor('your SQL_ID here',null,'ALLSTATS LAST'));
This gives for my test table
SQL_ID 5n0sdcu8347j9, child number 0
-------------------------------------
SELECT col, (SELECT COUNT(*) FROM t1) as total_count FROM t1
Plan hash value: 1306093980
-------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1000 |00:00:00.01 | 351 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 338 |
| 2 | TABLE ACCESS FULL| T1 | 1 | 1061 | 1000 |00:00:00.01 | 338 |
| 3 | TABLE ACCESS FULL | T1 | 1 | 1061 | 1000 |00:00:00.01 | 351 |
-------------------------------------------------------------------------------------
and
SQL_ID fs0h660f08bj6, child number 0
-------------------------------------
WITH CTE(total_count) AS ( SELECT COUNT(*) FROM t1 ) SELECT col,
(SELECT total_count FROM CTE) FROM t1
Plan hash value: 1223456497
--------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1000 |00:00:00.01 | 351 |
| 1 | VIEW | | 1 | 1 | 1 |00:00:00.01 | 338 |
| 2 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 338 |
| 3 | TABLE ACCESS FULL| T1 | 1 | 1061 | 1000 |00:00:00.01 | 338 |
| 4 | TABLE ACCESS FULL | T1 | 1 | 1061 | 1000 |00:00:00.01 | 351 |
--------------------------------------------------------------------------------------
So the plans are slightly different, but in both cases the FULL TABLE SCAN is started only once (column Starts = 1), which makes no real difference.
For the purpose of comparison, I also ran a correlated subquery, which gives a completely different picture, with a high number of Starts for the FTS:
SQL_ID cbvwd6pm6699m, child number 0
-------------------------------------
SELECT col, (SELECT COUNT(*) FROM t1 where col = a.col) as total_count
FROM t1 a
Plan hash value: 1306093980
-------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1000 |00:00:00.01 | 351 |
| 1 | SORT AGGREGATE | | 1000 | 1 | 1000 |00:00:00.31 | 338K|
|* 2 | TABLE ACCESS FULL| T1 | 1000 | 11 | 1000 |00:00:00.31 | 338K|
| 3 | TABLE ACCESS FULL | T1 | 1 | 1061 | 1000 |00:00:00.01 | 351 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("COL"=:B1)
I believe that the query optimizer in both Oracle and SQL Server will recognize that the count query is not correlated, compute it once, and then use the cached result throughout the execution of the outer query.
Also, the CTE won't change anything as far as I know, since at execution time the code inside it will basically just be inlined into the actual outer query.
Here is a reference for Oracle which mentions that a non correlated subquery will be executed once and cached, except in cases where the outer query only has a few rows. In that case, it might not be cached because there isn't much of a penalty in executing the count subquery multiple times.

Physical reads caused by WITH clause

I have a set of complex, optimized selects that suffer from physical reads. Without them, they would be even faster!
These physical reads occur due to the WITH clause: one physical_read_request per WITH sub-query. They seem totally unnecessary to me; I'd prefer Oracle to keep the sub-query results in memory instead of writing them to disk and reading them back again.
I'm looking for a way to get rid of these physical reads.
A simple example exhibiting the same problem:
Edit: Example replaced with simpler one that does not use dictionary views.
alter session set STATISTICS_LEVEL=ALL;
create table T as
select level NUM from dual
connect by level <= 1000;
with /*a2*/ TT as (
select NUM from T
where NUM between 100 and 110
)
select * from TT
union all
select * from TT
;
SELECT * FROM TABLE(dbms_xplan.display_cursor(
(select sql_id from v$sql
where sql_fulltext like 'with /*a2*/ TT%'
and sql_fulltext not like '%v$sql%'
and sql_fulltext not like 'explain%'),
NULL, format=>'allstats last'));
and the corresponding execution plan is
SQL_ID bpqnhfdmxnqvp, child number 0
-------------------------------------
with /*a2*/ TT as ( select NUM from T where NUM between 100 and
110 ) select * from TT union all select * from TT
Plan hash value: 4255080040
---------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 22 |00:00:00.01 | 20 | 1 | 1 | | | |
| 1 | TEMP TABLE TRANSFORMATION | | 1 | | 22 |00:00:00.01 | 20 | 1 | 1 | | | |
| 2 | LOAD AS SELECT | | 1 | | 0 |00:00:00.01 | 8 | 0 | 1 | 266K| 266K| 266K (0)|
|* 3 | TABLE ACCESS FULL | T | 1 | 11 | 11 |00:00:00.01 | 4 | 0 | 0 | | | |
| 4 | UNION-ALL | | 1 | | 22 |00:00:00.01 | 9 | 1 | 0 | | | |
| 5 | VIEW | | 1 | 11 | 11 |00:00:00.01 | 6 | 1 | 0 | | | |
| 6 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6646_63A776 | 1 | 11 | 11 |00:00:00.01 | 6 | 1 | 0 | | | |
| 7 | VIEW | | 1 | 11 | 11 |00:00:00.01 | 3 | 0 | 0 | | | |
| 8 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6646_63A776 | 1 | 11 | 11 |00:00:00.01 | 3 | 0 | 0 | | | |
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(("NUM">=100 AND "NUM"<=110))
Note
-----
- dynamic sampling used for this statement (level=2)
See the (phys) Write upon each WITH view creation, and the (phys) Read upon the first view usage. I also tried the RESULT_CACHE hint (which is not reflected in this sample select, but was reflected in my original queries), but it doesn't remove the disk accesses either (which is understandable).
How can I get rid of the phys writes/reads?

Missing STOPKEY per partition in Oracle plan for paging by local index

There is the following partitioned table:
CREATE TABLE "ERMB_LOG_TEST_BF"."OUT_SMS"(
"TRX_ID" NUMBER(19,0) NOT NULL ENABLE,
"CREATE_TS" TIMESTAMP (3) DEFAULT systimestamp NOT NULL ENABLE,
/* other fields... */
) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
STORAGE(BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "ERMB_LOG_TEST_BF"
PARTITION BY RANGE ("TRX_ID") INTERVAL (281474976710656)
(PARTITION "SYS_P1358" VALUES LESS THAN (59109745109237760) SEGMENT CREATION IMMEDIATE
PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
NOCOMPRESS LOGGING
STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "ERMB_LOG_TEST_BF");
CREATE INDEX "ERMB_LOG_TEST_BF"."OUT_SMS_CREATE_TS_TRX_ID_IX" ON "ERMB_LOG_TEST_BF"."OUT_SMS" ("CREATE_TS" DESC, "TRX_ID" DESC)
PCTFREE 10 INITRANS 2 MAXTRANS 255
STORAGE(
BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) LOCAL
(PARTITION "SYS_P1358"
PCTFREE 10 INITRANS 2 MAXTRANS 255 LOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "ERMB_LOG_TEST_BF");
I have a SQL query which selects 20 records ordered by date and transaction:
select rd from (
select /*+ INDEX(OUT_SMS OUT_SMS_CREATE_TS_TRX_ID_IX) */ rowid rd
from OUT_SMS
where TRX_ID between 34621422135410688 and 72339069014638591
and CREATE_TS between to_timestamp('2013-02-01 00:00:00', 'yyyy-mm-dd hh24:mi:ss')
and to_timestamp('2013-03-06 08:57:00', 'yyyy-mm-dd hh24:mi:ss')
order by CREATE_TS DESC, TRX_ID DESC
) where rownum <= 20
Oracle has generated the following plan:
-----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 20 | 240 | | 4788K (1)| 00:05:02 | | |
|* 1 | COUNT STOPKEY | | | | | | | | |
| 2 | VIEW | | 312M| 3576M| | 4788K (1)| 00:05:02 | | |
|* 3 | SORT ORDER BY STOPKEY | | 312M| 9G| 12G| 4788K (1)| 00:05:02 | | |
| 4 | PARTITION RANGE ITERATOR| | 312M| 9G| | 19 (0)| 00:00:01 | 1 | 48 |
|* 5 | COUNT STOPKEY | | | | | | | | |
|* 6 | INDEX RANGE SCAN | OUT_SMS_CREATE_TS_TRX_ID_IX | 312M| 9G| | 19 (0)| 00:00:01 | 1 | 48 |
-----------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=20)
3 - filter(ROWNUM<=20)
5 - filter(ROWNUM<=20)
6 - access(SYS_OP_DESCEND("CREATE_TS")>=HEXTORAW('878EFCF9F6C5FEFAFF') AND
SYS_OP_DESCEND("TRX_ID")>=HEXTORAW('36F7E7D7F8A4F0BFA9A3FF') AND
SYS_OP_DESCEND("CREATE_TS")<=HEXTORAW('878EFDFEF8FEF8FF') AND
SYS_OP_DESCEND("TRX_ID")<=HEXTORAW('36FBD0E9D4E9DBD5F8A6FF') )
filter(SYS_OP_UNDESCEND(SYS_OP_DESCEND("CREATE_TS"))<=TIMESTAMP' 2013-03-06 08:57:00,000000000' AND
SYS_OP_UNDESCEND(SYS_OP_DESCEND("TRX_ID"))<=72339069014638591 AND
SYS_OP_UNDESCEND(SYS_OP_DESCEND("TRX_ID"))>=34621422135410688 AND
SYS_OP_UNDESCEND(SYS_OP_DESCEND("CREATE_TS"))>=TIMESTAMP' 2013-02-01 00:00:00,000000000')
It works perfectly.
By the way, the OUT_SMS table is partitioned by the TRX_ID field, and OUT_SMS_CREATE_TS_TRX_ID_IX is a local index (CREATE_TS DESC, TRX_ID DESC) on each partition.
But if I convert this query to a prepared statement:
select rd from (
select /*+ INDEX(OUT_SMS OUT_SMS_CREATE_TS_TRX_ID_IX) */ rowid rd
from OUT_SMS
where TRX_ID between ? and ?
and CREATE_TS between ? and ?
order by CREATE_TS DESC, TRX_ID DESC
) where rownum <= 20
Oracle generates the following plan:
----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 20 | 240 | 14743 (1)| 00:00:01 | | |
|* 1 | COUNT STOPKEY | | | | | | | |
| 2 | VIEW | | 1964 | 23568 | 14743 (1)| 00:00:01 | | |
|* 3 | SORT ORDER BY STOPKEY | | 1964 | 66776 | 14743 (1)| 00:00:01 | | |
|* 4 | FILTER | | | | | | | |
| 5 | PARTITION RANGE ITERATOR| | 1964 | 66776 | 14742 (1)| 00:00:01 | KEY | KEY |
|* 6 | INDEX RANGE SCAN | OUT_SMS_CREATE_TS_TRX_ID_IX | 1964 | 66776 | 14742 (1)| 00:00:01 | KEY | KEY |
----------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=20)
3 - filter(ROWNUM<=20)
4 - filter(TO_TIMESTAMP(:RR,'yyyy-mm-dd hh24:mi:ss')<=TO_TIMESTAMP(:T,'yyyy-mm-dd hh24:mi:ss') AND
TO_NUMBER(:ABC)<=TO_NUMBER(:EBC))
6 - access(SYS_OP_DESCEND("CREATE_TS")>=SYS_OP_DESCEND(TO_TIMESTAMP(:T,'yyyy-mm-dd hh24:mi:ss')) AND
SYS_OP_DESCEND("TRX_ID")>=SYS_OP_DESCEND(TO_NUMBER(:EBC)) AND
SYS_OP_DESCEND("CREATE_TS")<=SYS_OP_DESCEND(TO_TIMESTAMP(:RR,'yyyy-mm-dd hh24:mi:ss')) AND
SYS_OP_DESCEND("TRX_ID")<=SYS_OP_DESCEND(TO_NUMBER(:ABC)))
filter(SYS_OP_UNDESCEND(SYS_OP_DESCEND("TRX_ID"))>=TO_NUMBER(:ABC) AND
SYS_OP_UNDESCEND(SYS_OP_DESCEND("TRX_ID"))<=TO_NUMBER(:EBC) AND
SYS_OP_UNDESCEND(SYS_OP_DESCEND("CREATE_TS"))>=TO_TIMESTAMP(:RR,'yyyy-mm-dd hh24:mi:ss') AND
SYS_OP_UNDESCEND(SYS_OP_DESCEND("CREATE_TS"))<=TO_TIMESTAMP(:T,'yyyy-mm-dd hh24:mi:ss'))
The COUNT STOPKEY operation disappears from the plan. It should appear right after the index scan so that only 20 rows are fetched from each partition, as in the first query.
How can I compose the prepared statement so that COUNT STOPKEY appears in the plan?
When you use bind variables, Oracle is forced to use dynamic partition pruning instead of static partition pruning. The result of this is that Oracle doesn't know at parse time which partitions will be accessed, as this changes based on your input variables.
This means that when using literal values (instead of bind variables), we know which partitions will be accessed by your local index. Therefore the count stopkey can be applied to the output of the index before we prune the partitions.
When using bind variables, the partition range iterator has to figure out which partitions you're accessing. It then has a check to ensure that the first of your variables in the between operations do actually have a lower value then the second one (the filter operation in the second plan).
This can easily be reproduced, as the following test case shows:
create table tab (
x date,
y integer,
filler varchar2(100)
) partition by range(x) (
partition p1 values less than (date'2013-01-01'),
partition p2 values less than (date'2013-02-01'),
partition p3 values less than (date'2013-03-01'),
partition p4 values less than (date'2013-04-01'),
partition p5 values less than (date'2013-05-01'),
partition p6 values less than (date'2013-06-01')
);
insert into tab (x, y)
select add_months(trunc(sysdate, 'y'), mod(rownum, 5)), rownum, dbms_random.string('x', 50)
from dual
connect by level <= 1000;
create index i on tab(x desc, y desc) local;
exec dbms_stats.gather_table_stats(user, 'tab', cascade => true);
explain plan for
SELECT * FROM (
SELECT rowid FROM tab
where x between date'2013-01-01' and date'2013-02-02'
and y between 50 and 100
order by x desc, y desc
)
where rownum <= 5;
SELECT * FROM table(dbms_xplan.display(null, null, 'BASIC +ROWS +PARTITION'));
--------------------------------------------------------------------
| Id | Operation | Name | Rows | Pstart| Pstop |
--------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | |
| 1 | COUNT STOPKEY | | | | |
| 2 | VIEW | | 1 | | |
| 3 | SORT ORDER BY STOPKEY | | 1 | | |
| 4 | PARTITION RANGE ITERATOR| | 1 | 2 | 3 |
| 5 | COUNT STOPKEY | | | | |
| 6 | INDEX RANGE SCAN | I | 1 | 2 | 3 |
--------------------------------------------------------------------
explain plan for
SELECT * FROM (
SELECT rowid FROM tab
where x between to_date(:st, 'dd/mm/yyyy') and to_date(:en, 'dd/mm/yyyy')
and y between :a and :b
order by x desc, y desc
)
where rownum <= 5;
SELECT * FROM table(dbms_xplan.display(null, null, 'BASIC +ROWS +PARTITION'));
---------------------------------------------------------------------
| Id | Operation | Name | Rows | Pstart| Pstop |
---------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | |
| 1 | COUNT STOPKEY | | | | |
| 2 | VIEW | | 1 | | |
| 3 | SORT ORDER BY STOPKEY | | 1 | | |
| 4 | FILTER | | | | |
| 5 | PARTITION RANGE ITERATOR| | 1 | KEY | KEY |
| 6 | INDEX RANGE SCAN | I | 1 | KEY | KEY |
---------------------------------------------------------------------
As in your example, the second query can only filter the partitions to a key at parse time, rather than the exact partitions as in the first example.
This is one of those rare cases where literal values can provide better performance than bind variables. You should investigate whether this is a possibility for you.
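One way to investigate that is sketched below, using the literal values from the question: build the statement text with (validated) literals instead of binds, for example from PL/SQL. Bear in mind that every distinct set of literals produces a separate cursor, so the plan benefit has to outweigh the extra hard parsing.
DECLARE
    l_lo_id  NUMBER    := 34621422135410688;
    l_hi_id  NUMBER    := 72339069014638591;
    l_from   TIMESTAMP := TO_TIMESTAMP('2013-02-01 00:00:00', 'yyyy-mm-dd hh24:mi:ss');
    l_to     TIMESTAMP := TO_TIMESTAMP('2013-03-06 08:57:00', 'yyyy-mm-dd hh24:mi:ss');
    l_cur    SYS_REFCURSOR;
BEGIN
    -- concatenate the validated values into the statement as literals
    OPEN l_cur FOR
           'select rd from ('
        || ' select /*+ INDEX(OUT_SMS OUT_SMS_CREATE_TS_TRX_ID_IX) */ rowid rd'
        || ' from OUT_SMS'
        || ' where TRX_ID between ' || l_lo_id || ' and ' || l_hi_id
        || ' and CREATE_TS between timestamp ''' || TO_CHAR(l_from, 'yyyy-mm-dd hh24:mi:ss') || ''''
        || ' and timestamp ''' || TO_CHAR(l_to, 'yyyy-mm-dd hh24:mi:ss') || ''''
        || ' order by CREATE_TS desc, TRX_ID desc'
        || ') where rownum <= 20';
    -- fetch the rowids from l_cur as usual, then:
    CLOSE l_cur;
END;
/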
Finally, you say you want 20 rows from each partition. Your query as it stands won't do this; it'll just return the first 20 rows according to your ordering. For 20 rows per partition, you need to do something like this:
select rd from (
select rowid rd,
row_number() over (partition by trx_id order by create_ts desc) rn
from OUT_SMS
where TRX_ID between ? and ?
and CREATE_TS between ? and ?
order by CREATE_TS DESC, TRX_ID DESC
) where rn <= 20
UPDATE
The reason you're not getting the count stopkey is to do with the filter operation in line 4 of the "bad" plan. You can see this more clearly if you repeat the example above, but with no partitioning.
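A sketch of that repeat, reusing the earlier test data (the tab_np object names are hypothetical):
create table tab_np as select * from tab;

create index i_np on tab_np(x desc, y desc);

exec dbms_stats.gather_table_stats(user, 'tab_np', cascade => true);

-- then run the same two explain plan statements as before, against tab_np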
This gives you the following plans:
----------------------------------------
| Id | Operation | Name |
----------------------------------------
| 0 | SELECT STATEMENT | |
|* 1 | COUNT STOPKEY | |
| 2 | VIEW | |
|* 3 | SORT ORDER BY STOPKEY| |
|* 4 | TABLE ACCESS FULL | TAB |
----------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=5)
3 - filter(ROWNUM<=5)
4 - filter("X">=TO_DATE(' 2013-01-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "X"<=TO_DATE(' 2013-02-02 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "Y">=50 AND "Y"<=100)
----------------------------------------
| Id | Operation | Name |
----------------------------------------
| 0 | SELECT STATEMENT | |
|* 1 | COUNT STOPKEY | |
| 2 | VIEW | |
|* 3 | SORT ORDER BY STOPKEY| |
|* 4 | FILTER | |
|* 5 | TABLE ACCESS FULL | TAB |
----------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=5)
3 - filter(ROWNUM<=5)
4 - filter(TO_NUMBER(:A)<=TO_NUMBER(:B) AND
TO_DATE(:ST,'dd/mm/yyyy')<=TO_DATE(:EN,'dd/mm/yyyy'))
5 - filter("Y">=TO_NUMBER(:A) AND "Y"<=TO_NUMBER(:B) AND
"X">=TO_DATE(:ST,'dd/mm/yyyy') AND "X"<=TO_DATE(:EN,'dd/mm/yyyy'))
As you can see, there's an extra filter operation when you use bind variables appearing before the sort order by stopkey. This happens after accessing the index. This is checking that the values for the variables will allow data to be returned (the first variable in your between does actually have a lower value than the second). This isn't necessary when using literals because the optimizer already knows that 50 is less than 100 (in this case). It doesn't know whether :a is less than :b at parse time however.
Why exactly this is I don't know. It could be intentional design by Oracle - there's no point doing the stopkey check if the values set for the variables result in zero rows - or just an oversight.
I can reproduce your findings on 11.2.0.3. Here's my test case:
SQL> -- Table with 100 partitions of 100 rows
SQL> CREATE TABLE out_sms
2 PARTITION BY RANGE (trx_id)
3 INTERVAL (100) (PARTITION p0 VALUES LESS THAN (0))
4 AS
5 SELECT ROWNUM trx_id,
6 trunc(SYSDATE) + MOD(ROWNUM, 50) create_ts
7 FROM dual CONNECT BY LEVEL <= 10000;
Table created
SQL> CREATE INDEX OUT_SMS_IDX ON out_sms (create_ts desc, trx_id desc) LOCAL;
Index created
[static plan]
SELECT rd
FROM (SELECT /*+ INDEX(OUT_SMS OUT_SMS_IDX) */
rowid rd
FROM out_sms
WHERE create_ts BETWEEN systimestamp AND systimestamp + 10
AND trx_id BETWEEN 1 AND 500
ORDER BY create_ts DESC, trx_id DESC)
WHERE rownum <= 20;
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Pstart| Pstop |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | |
|* 1 | COUNT STOPKEY | | | | |
| 2 | VIEW | | 1 | | |
|* 3 | SORT ORDER BY STOPKEY | | 1 | | |
| 4 | PARTITION RANGE ITERATOR| | 1 | 2 | 7 |
|* 5 | COUNT STOPKEY | | | | |
|* 6 | INDEX RANGE SCAN | OUT_SMS_IDX | 1 | 2 | 7 |
---------------------------------------------------------------------------
[dynamic]
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Pstart| Pstop |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | |
|* 1 | COUNT STOPKEY | | | | |
| 2 | VIEW | | 1 | | |
|* 3 | SORT ORDER BY STOPKEY | | 1 | | |
|* 4 | FILTER | | | | |
| 5 | PARTITION RANGE ITERATOR| | 1 | KEY | KEY |
|* 6 | INDEX RANGE SCAN | OUT_SMS_IDX | 1 | KEY | KEY |
----------------------------------------------------------------------------
As in your example, the ROWNUM predicate is pushed inside the partition index range scan in the first case but not in the second. When using literal values, the plan shows that Oracle fetches only 20 rows per partition, whereas with bind variables Oracle will fetch all rows that satisfy the WHERE clause in each partition. I couldn't find a setting or statistics configuration that would get the predicate pushed when using bind variables.
I hoped you could use bind-variable filters combined with wider literal limits to game the system, but it seems the ROWNUM predicate isn't used inside individual partitions as soon as bind variables are present:
SELECT rd
FROM (SELECT /*+ INDEX(OUT_SMS OUT_SMS_IDX) */
rowid rd
FROM out_sms
WHERE nvl(create_ts+:5, sysdate) BETWEEN :1 AND :2
AND nvl(trx_id+:6, 0) BETWEEN :3 AND :4
AND trx_id BETWEEN 1 AND 500
AND create_ts BETWEEN systimestamp AND systimestamp + 10
ORDER BY create_ts DESC, trx_id DESC)
WHERE rownum <= 20
Plan hash value: 2740263591
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Pstart| Pstop |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | |
|* 1 | COUNT STOPKEY | | | | |
| 2 | VIEW | | 1 | | |
|* 3 | SORT ORDER BY STOPKEY | | 1 | | |
|* 4 | FILTER | | | | |
| 5 | PARTITION RANGE ITERATOR| | 1 | 2 | 7 |
|* 6 | INDEX RANGE SCAN | OUT_SMS_IDX | 1 | 2 | 7 |
----------------------------------------------------------------------------
If this query is important and its performance is critical, you could transform the index into a global index. That will increase partition maintenance effort, but most partition operations can be performed online with recent Oracle versions. A global index will behave as it would on a standard non-partitioned table in this case:
SQL> drop index out_sms_idx;
Index dropped
SQL> CREATE INDEX OUT_SMS_IDX ON out_sms (create_ts DESC, trx_id desc);
Index created
SELECT rd
FROM (SELECT
rowid rd
FROM out_sms
WHERE create_ts BETWEEN :1 AND :2
AND trx_id BETWEEN :3 AND :4
ORDER BY create_ts DESC, trx_id DESC)
WHERE rownum <= 20
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 12 | 2 (0)|
|* 1 | COUNT STOPKEY | | | | |
| 2 | VIEW | | 1 | 12 | 2 (0)|
|* 3 | FILTER | | | | |
|* 4 | INDEX RANGE SCAN| OUT_SMS_IDX | 1 | 34 | 2 (0)|
------------------------------------------------------------------------
I can confirm that the issue in question is still a problem on Oracle 12.1.0.2.0.
And even hardcoded partition elimination bounds are not enough.
Here is the test table in my case:
CREATE TABLE FR_MESSAGE_PART (
ID NUMBER(38) NOT NULL CONSTRAINT PK_FR_MESSAGE_PART PRIMARY KEY USING INDEX LOCAL,
TRX_ID NUMBER(38) NOT NULL, TS TIMESTAMP NOT NULL, TEXT CLOB)
PARTITION BY RANGE (ID) (PARTITION PART_0 VALUES LESS THAN (0));
CREATE INDEX IX_FR_MESSAGE_PART_TRX_ID ON FR_MESSAGE_PART(TRX_ID) LOCAL;
CREATE INDEX IX_FR_MESSAGE_PART_TS ON FR_MESSAGE_PART(TS) LOCAL;
The table is populated with several million records of OLTP production data spanning several months. Each month belongs to a separate partition.
Primary key values of this table always include a time part in the higher bits, which allows ID to be used for range partitioning by calendar periods. All messages inherit the higher time bits of TRX_ID, which ensures that all messages belonging to the same business operation always fall into the same partition.
Let's start with a hardcoded query that selects a page of the most recent messages for a given time period, with partition elimination bounds applied:
select * from (select * from FR_MESSAGE_PART
where TS >= DATE '2017-11-30' and TS < DATE '2017-12-02'
and ID >= 376894993815568384 and ID < 411234940974268416
order by TS DESC) where ROWNUM <= 40;
But even with freshly gathered table statistics, the Oracle optimizer still wrongly estimates that sorting two entire monthly partitions would be faster than a range scan over two days using the existing local index:
-----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 40 | 26200 | | 103K (1)| 00:00:05 | | |
|* 1 | COUNT STOPKEY | | | | | | | | |
| 2 | VIEW | | 803K| 501M| | 103K (1)| 00:00:05 | | |
|* 3 | SORT ORDER BY STOPKEY | | 803K| 70M| 92M| 103K (1)| 00:00:05 | | |
| 4 | PARTITION RANGE ITERATOR| | 803K| 70M| | 86382 (1)| 00:00:04 | 2 | 3 |
|* 5 | TABLE ACCESS FULL | FR_MESSAGE_PART | 803K| 70M| | 86382 (1)| 00:00:04 | 2 | 3 |
-----------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=40)
3 - filter(ROWNUM<=40)
5 - filter("TS"<TIMESTAMP' 2017-12-01 00:00:00' AND "TS">=TIMESTAMP' 2017-11-29 00:00:00' AND
"ID">=376894993815568384)
The actual execution time turns out to be an order of magnitude longer than estimated in the plan.
So we have to apply a hint to force usage of the index:
select * from (select /*+ FIRST_ROWS(40) INDEX(FR_MESSAGE_PART (TS)) */ * from FR_MESSAGE_PART
where TS >= DATE '2017-11-30' and TS < DATE '2017-12-02'
and ID >= 376894993815568384 and ID < 411234940974268416
order by TS DESC) where ROWNUM <= 40;
Now the plan uses the index but still involves a slow sort of two entire partitions:
-----------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 40 | 26200 | | 615K (1)| 00:00:25 | | |
|* 1 | COUNT STOPKEY | | | | | | | | |
| 2 | VIEW | | 803K| 501M| | 615K (1)| 00:00:25 | | |
|* 3 | SORT ORDER BY STOPKEY | | 803K| 70M| 92M| 615K (1)| 00:00:25 | | |
| 4 | PARTITION RANGE ITERATOR | | 803K| 70M| | 598K (1)| 00:00:24 | 2 | 3 |
|* 5 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| FR_MESSAGE_PART | 803K| 70M| | 598K (1)| 00:00:24 | 2 | 3 |
|* 6 | INDEX RANGE SCAN | IX_FR_MESSAGE_PART_TS | 576K| | | 2269 (1)| 00:00:01 | 2 | 3 |
-----------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=40)
3 - filter(ROWNUM<=40)
5 - filter("ID">=376894993815568384)
6 - access("TS">=TIMESTAMP' 2017-11-30 00:00:00' AND "TS"<TIMESTAMP' 2017-12-02 00:00:00')
After some struggling through the Oracle hints reference and Google, it turned out that we also have to explicitly specify the descending direction for the index range scan with the INDEX_DESC or INDEX_RS_DESC hint:
select * from (select /*+ FIRST_ROWS(40) INDEX_RS_DESC(FR_MESSAGE_PART (TS)) */ * from FR_MESSAGE_PART
where TS >= DATE '2017-11-30' and TS < DATE '2017-12-02'
and ID >= 376894993815568384 and ID < 411234940974268416
order by TS DESC) where ROWNUM <= 40;
This at last gives a fast plan with COUNT STOPKEY per partition, which scans the partitions in descending order and sorts at most 40 rows from each partition:
------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 40 | 26200 | | 615K (1)| 00:00:25 | | |
|* 1 | COUNT STOPKEY | | | | | | | | |
| 2 | VIEW | | 803K| 501M| | 615K (1)| 00:00:25 | | |
|* 3 | SORT ORDER BY STOPKEY | | 803K| 70M| 92M| 615K (1)| 00:00:25 | | |
| 4 | PARTITION RANGE ITERATOR | | 803K| 70M| | 598K (1)| 00:00:24 | 3 | 2 |
|* 5 | COUNT STOPKEY | | | | | | | | |
|* 6 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| FR_MESSAGE_PART | 803K| 70M| | 598K (1)| 00:00:24 | 3 | 2 |
|* 7 | INDEX RANGE SCAN DESCENDING | IX_FR_MESSAGE_PART_TS | 576K| | | 2269 (1)| 00:00:01 | 3 | 2 |
------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=40)
3 - filter(ROWNUM<=40)
5 - filter(ROWNUM<=40)
6 - filter("ID">=376894993815568384)
7 - access("TS">=TIMESTAMP' 2017-11-30 00:00:00' AND "TS"<TIMESTAMP' 2017-12-02 00:00:00')
filter("TS">=TIMESTAMP' 2017-11-30 00:00:00' AND "TS"<TIMESTAMP' 2017-12-02 00:00:00')
This runs blazingly fast, but the estimated plan cost is still wrongly high.
So far so good. Now let's try to parametrize the query for use in our custom ORM framework:
select * from (select /*+ FIRST_ROWS(40) INDEX_RS_DESC(FR_MESSAGE_PART (TS)) */ * from FR_MESSAGE_PART
where TS >= :1 and TS < :2
and ID >= :3 and ID < :4
order by TS DESC) where ROWNUM <= 40;
But then the per-partition COUNT STOPKEY disappears from the plan, as stated in the question and confirmed in the other answer:
----------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 40 | 26200 | 82349 (1)| 00:00:04 | | |
|* 1 | COUNT STOPKEY | | | | | | | |
| 2 | VIEW | | 153 | 97K| 82349 (1)| 00:00:04 | | |
|* 3 | SORT ORDER BY STOPKEY | | 153 | 14076 | 82349 (1)| 00:00:04 | | |
|* 4 | FILTER | | | | | | | |
| 5 | PARTITION RANGE ITERATOR | | 153 | 14076 | 82348 (1)| 00:00:04 | KEY | KEY |
|* 6 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| FR_MESSAGE_PART | 153 | 14076 | 82348 (1)| 00:00:04 | KEY | KEY |
|* 7 | INDEX RANGE SCAN DESCENDING | IX_FR_MESSAGE_PART_TS | 110K| | 450 (1)| 00:00:01 | KEY | KEY |
----------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=40)
3 - filter(ROWNUM<=40)
4 - filter(TO_NUMBER(:4)>TO_NUMBER(:3) AND TO_TIMESTAMP(:2)>TO_TIMESTAMP(:1))
6 - filter("ID">=TO_NUMBER(:3) AND "ID"<TO_NUMBER(:4))
7 - access("TS">=TO_TIMESTAMP(:1) AND "TS"<TO_TIMESTAMP(:2))
filter("TS">=TO_TIMESTAMP(:1) AND "TS"<TO_TIMESTAMP(:2))
Then I tried to fall back to hardcoded, monthly-aligned partition elimination bounds while still retaining parametrized timestamp bounds, to minimize plan cache pollution.
select * from (select /*+ FIRST_ROWS(40) INDEX_RS_DESC(FR_MESSAGE_PART (TS)) */ * from FR_MESSAGE_PART
where TS >= :1 and TS < :2
and ID >= 376894993815568384 and ID < 411234940974268416
order by TS DESC) where ROWNUM <= 40;
But I still got a slow plan:
------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 40 | 26200 | | 83512 (1)| 00:00:04 | | |
|* 1 | COUNT STOPKEY | | | | | | | | |
| 2 | VIEW | | 61238 | 38M| | 83512 (1)| 00:00:04 | | |
|* 3 | SORT ORDER BY STOPKEY | | 61238 | 5501K| 7216K| 83512 (1)| 00:00:04 | | |
|* 4 | FILTER | | | | | | | | |
| 5 | PARTITION RANGE ITERATOR | | 61238 | 5501K| | 82214 (1)| 00:00:04 | 3 | 2 |
|* 6 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| FR_MESSAGE_PART | 61238 | 5501K| | 82214 (1)| 00:00:04 | 3 | 2 |
|* 7 | INDEX RANGE SCAN DESCENDING | IX_FR_MESSAGE_PART_TS | 79076 | | | 316 (1)| 00:00:01 | 3 | 2 |
------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=40)
3 - filter(ROWNUM<=40)
4 - filter(TO_TIMESTAMP(:2)>TO_TIMESTAMP(:1))
6 - filter("ID">=376894993815568384)
7 - access("TS">=TO_TIMESTAMP(:1) AND "TS"<TO_TIMESTAMP(:2))
filter("TS">=TO_TIMESTAMP(:1) AND "TS"<TO_TIMESTAMP(:2))
#ChrisSaxon, in his answer here, mentioned that the missing nested COUNT STOPKEY has something to do with the filter(TO_TIMESTAMP(:2)>TO_TIMESTAMP(:1)) operation, which validates that the upper bound is actually greater than the lower one.
Taking this into account, I tried to trick the optimizer by transforming TS between :a and :b into the equivalent :b between TS and TS + (:b - :a). And this worked!
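A sketch of what that rewrite looks like is below (bind names are illustrative, and it uses the closed BETWEEN interval the trick is stated in, while the real query keeps half-open bounds):
select * from (select /*+ FIRST_ROWS(40) INDEX_RS_DESC(FR_MESSAGE_PART (TS)) */ * from FR_MESSAGE_PART
where :2 between TS and TS + (:2 - :1) -- selects the same rows as TS between :1 and :2, assuming TIMESTAMP binds
and ID >= 376894993815568384 and ID < 411234940974268416
order by TS DESC) where ROWNUM <= 40;
Because the lower bound is now wrapped in an expression on TS, the index access predicate keeps only the upper bound, and the lower bound becomes a plain filter.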
After some additional investigation of the root cause, I found that simply replacing TS >= :1 and TS < :2 with TS + 0 >= :1 and TS < :2 is enough to get the optimal execution plan.
select * from (select /*+ FIRST_ROWS(40) INDEX_RS_DESC(FR_MESSAGE_PART (TS)) */ * from FR_MESSAGE_PART
where TS + 0 >= :1 and TS < :2
and ID >= 376894993815568384 and ID < 411234940974268416
order by TS DESC) where ROWNUM <= 40;
The plan now has the proper per-partition COUNT STOPKEY, and the INTERNAL_FUNCTION("TS")+0 predicate apparently prevented the problematic extra bounds-checking filter.
------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 40 | 26200 | | 10120 (1)| 00:00:01 | | |
|* 1 | COUNT STOPKEY | | | | | | | | |
| 2 | VIEW | | 61238 | 38M| | 10120 (1)| 00:00:01 | | |
|* 3 | SORT ORDER BY STOPKEY | | 61238 | 5501K| 7216K| 10120 (1)| 00:00:01 | | |
| 4 | PARTITION RANGE ITERATOR | | 61238 | 5501K| | 8822 (1)| 00:00:01 | 3 | 2 |
|* 5 | COUNT STOPKEY | | | | | | | | |
|* 6 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| FR_MESSAGE_PART | 61238 | 5501K| | 8822 (1)| 00:00:01 | 3 | 2 |
|* 7 | INDEX RANGE SCAN DESCENDING | IX_FR_MESSAGE_PART_TS | 7908 | | | 631 (1)| 00:00:01 | 3 | 2 |
------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=40)
3 - filter(ROWNUM<=40)
5 - filter(ROWNUM<=40)
6 - filter("ID">=376894993815568384)
7 - access("TS"<TO_TIMESTAMP(:2))
filter(INTERNAL_FUNCTION("TS")+0>=:1 AND "TS"<TO_TIMESTAMP(:2))
We had to implement the Oracle-specific + 0 workaround and the hardcoded partition elimination bounds in our custom ORM framework. This lets us retain the same fast paging performance after switching to partitioned tables with local indexes.
But I wish plenty of patience and sanity to anyone who attempts the same switch without full control over the SQL-building code.
It appears Oracle has too many pitfalls when partitioning and paging are mixed. For example, we found that Oracle 12's new OFFSET ... ROWS / FETCH NEXT ... ROWS ONLY syntactic sugar is almost unusable with locally indexed partitioned tables, as are most of the analytic windowing functions it is built upon.
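For comparison, the "natural" Oracle 12c form of such a page fetch would be something like the sketch below (bounds reuse the example literals from above); as noted, we could not make queries of this shape perform acceptably here:
select * from FR_MESSAGE_PART
where TS >= :1 and TS < :2
and ID >= 376894993815568384 and ID < 411234940974268416
order by TS DESC
offset 180 rows fetch next 20 rows only;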
The shortest working query we found to fetch a page beyond the first one is:
select * from (select * from (
select /*+ FIRST_ROWS(200) INDEX_RS_DESC(FR_MESSAGE_PART (TS)) */* from FR_MESSAGE_PART
where TS + 0 >= :1 and TS < :2
and ID >= 376894993815568384 and ID < 411234940974268416
order by TS DESC) where ROWNUM <= 200) offset 180 rows;
Here is an example of actual execution plan after running such query:
SQL_ID c67mmq4wg49sx, child number 0
-------------------------------------
select * from (select * from (select /*+ FIRST_ROWS(200)
INDEX_RS_DESC("FR_MESSAGE_PART" ("TS")) GATHER_PLAN_STATISTICS */ "ID",
"MESSAGE_TYPE_ID", "TS", "REMOTE_ADDRESS", "TRX_ID",
"PROTOCOL_MESSAGE_ID", "MESSAGE_DATA_ID", "TEXT_OFFSET", "TEXT_SIZE",
"BODY_OFFSET", "BODY_SIZE", "INCOMING" from "FR_MESSAGE_PART" where
"TS" + 0 >= :1 and "TS" < :2 and "ID" >= 376894993815568384 and "ID" <
411234940974268416 order by "TS" DESC) where ROWNUM <= 200) offset 180
rows
Plan hash value: 2499404919
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| E-Time | Pstart| Pstop | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | | 640K(100)| | | | 20 |00:00:00.01 | 322 | | | |
|* 1 | VIEW | | 1 | 200 | 130K| | 640K (1)| 00:00:26 | | | 20 |00:00:00.01 | 322 | | | |
| 2 | WINDOW NOSORT | | 1 | 200 | 127K| | 640K (1)| 00:00:26 | | | 200 |00:00:00.01 | 322 | 142K| 142K| |
| 3 | VIEW | | 1 | 200 | 127K| | 640K (1)| 00:00:26 | | | 200 |00:00:00.01 | 322 | | | |
|* 4 | COUNT STOPKEY | | 1 | | | | | | | | 200 |00:00:00.01 | 322 | | | |
| 5 | VIEW | | 1 | 780K| 487M| | 640K (1)| 00:00:26 | | | 200 |00:00:00.01 | 322 | | | |
|* 6 | SORT ORDER BY STOPKEY | | 1 | 780K| 68M| 89M| 640K (1)| 00:00:26 | | | 200 |00:00:00.01 | 322 | 29696 | 29696 |26624 (0)|
| 7 | PARTITION RANGE ITERATOR | | 1 | 780K| 68M| | 624K (1)| 00:00:25 | 3 | 2 | 400 |00:00:00.01 | 322 | | | |
|* 8 | COUNT STOPKEY | | 2 | | | | | | | | 400 |00:00:00.01 | 322 | | | |
|* 9 | TABLE ACCESS BY LOCAL INDEX ROWID| FR_MESSAGE_PART | 2 | 780K| 68M| | 624K (1)| 00:00:25 | 3 | 2 | 400 |00:00:00.01 | 322 | | | |
|* 10 | INDEX RANGE SCAN DESCENDING | IX_FR_MESSAGE_PART_TS | 2 | 559K| | | 44368 (1)| 00:00:02 | 3 | 2 | 400 |00:00:00.01 | 8 | | | |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Outline Data
-------------
/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('12.1.0.2')
DB_VERSION('12.1.0.2')
OPT_PARAM('optimizer_dynamic_sampling' 0)
OPT_PARAM('_optimizer_dsdir_usage_control' 0)
FIRST_ROWS(200)
OUTLINE_LEAF(#"SEL$3")
OUTLINE_LEAF(#"SEL$2")
OUTLINE_LEAF(#"SEL$1")
OUTLINE_LEAF(#"SEL$4")
NO_ACCESS(#"SEL$4" "from$_subquery$_004"#"SEL$4")
NO_ACCESS(#"SEL$1" "from$_subquery$_001"#"SEL$1")
NO_ACCESS(#"SEL$2" "from$_subquery$_002"#"SEL$2")
INDEX_RS_DESC(#"SEL$3" "FR_MESSAGE_PART"#"SEL$3" ("FR_MESSAGE_PART"."TS"))
END_OUTLINE_DATA
*/
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("from$_subquery$_004"."rowlimit_$$_rownumber">180)
4 - filter(ROWNUM<=200)
6 - filter(ROWNUM<=200)
8 - filter(ROWNUM<=200)
9 - filter("ID">=376894993815568384)
10 - access("TS"<:2)
filter((INTERNAL_FUNCTION("TS")+0>=:1 AND "TS"<:2))
Note how much better the actual row counts and timings are than the optimizer's estimates.
Update
Beware that even this optimal plan can degrade to a slow full scan of the local index if the lower partition elimination bound is guessed so low that the lowest partition doesn't contain enough records matching the query filters.
rleishman's Tuning "BETWEEN" Queries states:
The problem is that an index can only scan on one column with a range
predicate (<, >, LIKE, BETWEEN). So even if an index contained both
the lower_bound and upper_bound columns, the index scan will return
all of the rows matching lower_bound <= :b, and then filter the rows
that do not match upper_bound >= :b.
In the case where the sought value is somewhere in the middle, the
range scan will return half of the rows in the table in order to find
a single row. In the worst case where the most commonly sought rows
are at the top (highest values), the index scan will process almost
every row in the table for every lookup.
This means that, unfortunately, Oracle doesn't take the lower bound of a range scan filter into account until it either satisfies the COUNT STOPKEY condition or scans the whole partition!
So we had to limit the lower partition elimination bound heuristic to the month that the lower timestamp bound falls into.
This protects against full index scans at the expense of a small risk of not showing some delayed transaction messages in the list.
But that can easily be resolved by extending the supplied time period if needed.
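As a minimal illustration of that heuristic (the date literal is just an example; the mapping from a month to the smallest ID it can contain lives in our SQL-building code and is not shown), the lower timestamp bound is first snapped to the start of its month:
-- snap the lower timestamp bound down to the first day of its month; the SQL
-- builder then translates that month into the smallest ID it can contain and
-- inlines this value as the hardcoded lower partition elimination bound
select trunc(to_timestamp('2017-11-30 00:00:00', 'YYYY-MM-DD HH24:MI:SS'), 'MM') as month_floor
from dual;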
I also tried applying the same + 0 trick to force the optimal plan while keeping the partition elimination bounds as bind variables:
select * from (select /*+ FIRST_ROWS(40) INDEX_RS_DESC(FR_MESSAGE_PART (TS)) */ * from FR_MESSAGE_PART
where TS+0 >= :1 and TS < :2
and ID >= :3 and ID+0 < :4
order by TS DESC) where ROWNUM <= 40;
The plan then still retains the proper per-partition COUNT STOPKEY, but partition elimination for the upper bound is lost, as can be seen in the Pstart column of the plan table:
----------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 40 | 26200 | 9083 (1)| 00:00:01 | | |
|* 1 | COUNT STOPKEY | | | | | | | |
| 2 | VIEW | | 153 | 97K| 9083 (1)| 00:00:01 | | |
|* 3 | SORT ORDER BY STOPKEY | | 153 | 14076 | 9083 (1)| 00:00:01 | | |
| 4 | PARTITION RANGE ITERATOR | | 153 | 14076 | 9082 (1)| 00:00:01 | 10 | KEY |
|* 5 | COUNT STOPKEY | | | | | | | |
|* 6 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| FR_MESSAGE_PART | 153 | 14076 | 9082 (1)| 00:00:01 | 10 | KEY |
|* 7 | INDEX RANGE SCAN DESCENDING | IX_FR_MESSAGE_PART_TS | 11023 | | 891 (1)| 00:00:01 | 10 | KEY |
----------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=40)
3 - filter(ROWNUM<=40)
5 - filter(ROWNUM<=40)
6 - filter("ID">=TO_NUMBER(:3) AND "ID"+0<TO_NUMBER(:4))
7 - access("TS"<TO_TIMESTAMP(:2))
filter(INTERNAL_FUNCTION("TS")+0>=:1 AND "TS"<TO_TIMESTAMP(:2))
Is Dynamic SQL an option? That way you could "inject" the TRX_ID and CREATE_TS filter values eliminating the use of bind variables. Maybe then the generated plan would include COUNT STOPKEY.
By Dynamic SQL I mean constructing the SQL statement dynamically and then executing it with EXECUTE IMMEDIATE or opening it with OPEN ... FOR. This way you can use your filter values directly, without bind variables. Example:
v_sql VARCHAR2(1000) :=
'select rd from (
select /*+ INDEX(OUT_SMS OUT_SMS_CREATE_TS_TRX_ID_IX) */ rowid rd
from OUT_SMS
where TRX_ID between ' || v_trx_id_min || ' and ' || v_trx_id_max || '
and CREATE_TS between ' || v_create_ts_min|| ' and ' || v_create_ts_max || '
order by CREATE_TS DESC, TRX_ID DESC
) where rownum <= 20';
then invoke it using:
EXECUTE IMMEDIATE v_sql;
or even:
OPEN cursor_out FOR v_sql;
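For completeness, here is a minimal self-contained sketch of that approach (the variable types, example values, TO_TIMESTAMP format, and the fetch loop are my assumptions; adapt them to the actual schema):
declare
  -- illustrative filter values; in real code they come from the caller
  v_trx_id_min     number       := 376894993815568384;
  v_trx_id_max     number       := 411234940974268416;
  v_create_ts_min  varchar2(19) := '2017-11-30 00:00:00';
  v_create_ts_max  varchar2(19) := '2017-12-02 00:00:00';
  v_sql            varchar2(1000);
  cursor_out       sys_refcursor;
  v_rd             rowid;
begin
  -- inline the filter values as literals so the optimizer sees the actual
  -- bounds; note the quoted to_timestamp literals for the timestamp range
  v_sql :=
    'select rd from (
       select /*+ INDEX(OUT_SMS OUT_SMS_CREATE_TS_TRX_ID_IX) */ rowid rd
       from OUT_SMS
       where TRX_ID between ' || v_trx_id_min || ' and ' || v_trx_id_max || '
       and CREATE_TS between to_timestamp(''' || v_create_ts_min || ''', ''YYYY-MM-DD HH24:MI:SS'')
                         and to_timestamp(''' || v_create_ts_max || ''', ''YYYY-MM-DD HH24:MI:SS'')
       order by CREATE_TS DESC, TRX_ID DESC
     ) where rownum <= 20';

  open cursor_out for v_sql;
  loop
    fetch cursor_out into v_rd;
    exit when cursor_out%notfound;
    dbms_output.put_line(rowidtochar(v_rd));  -- placeholder for real row processing
  end loop;
  close cursor_out;
end;
/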