Why does explain plan show the wrong number of rows?

I am trying to simulate this and to do that I have created the following procedure to insert a large number of rows:
create or replace PROCEDURE a_lot_of_rows is
  i carte.cod%TYPE;
  a carte.autor%TYPE        := 'Author #';
  t carte.titlu%TYPE        := 'Book #';
  p carte.pret%TYPE         := 3.23;
  e carte.nume_editura%TYPE := 'Penguin Random House';
begin
  for i in 8..1000 loop
    insert into carte
    values (i, e, a || i, t || i, p, 'hardcover');
    commit;
  end loop;
  for i in 1001..1200 loop
    insert into carte
    values (i, e, a || i, t || i, p, 'paperback');
    commit;
  end loop;
end;
I have created a bitmap index on the tip_coperta column (which can only have the values 'hardcover' and 'paperback') and then inserted 1200 more rows. However, the result given by the explain plan is the following (before the insert procedure, the table had 7 rows, of which 4 had the tip_coperta = 'paperback'):
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 284 | 34 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| CARTE | 4 | 284 | 34 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("TIP_COPERTA"='paperback')

Motto: Bad Statistics are Worse than No Statistics
TL;DR: your statistics are stale and need to be recollected. When you create an index, the index statistics are gathered automatically, but not the table statistics, which are the ones relevant in your case.
Let's simulate your example with the following script, which creates the table and fills it with 1,000 hardcovers and 200 paperbacks.
create table CARTE
(cod int,
autor VARCHAR2(100),
titlu VARCHAR2(100),
pret NUMBER,
nume_editura VARCHAR2(100),
tip_coperta VARCHAR2(100)
);
insert into CARTE
(cod,autor,titlu,pret,nume_editura,tip_coperta)
select rownum,
'Author #'||rownum ,
'Book #'||rownum,
3.23,
'Penguin Random Number',
case when rownum <=1000 then 'hardcover'
else 'paperback' end
from dual connect by level <= 1200;
commit;
This leaves the new table without optimizer object statistics, which you can verify with the following query; it returns only NULLs:
select NUM_ROWS, LAST_ANALYZED from user_tables where table_name = 'CARTE';
So, let's check what Oracle's impression of the table is:
EXPLAIN PLAN SET STATEMENT_ID = 'jara1' into plan_table FOR
select * from CARTE
where tip_coperta = 'paperback'
;
--
SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', 'jara1','ALL'));
The script above produces the execution plan for your query asking for paperbacks, and you see that the Rows estimate is fine (= 200). How is this possible?
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 200 | 46800 | 5 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| CARTE | 200 | 46800 | 5 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("TIP_COPERTA"='paperback')
The explanation is in the Note section of the plan output: dynamic sampling was used.
While parsing the query, Oracle basically executes an additional sampling query to estimate the number of rows that match the filter predicate.
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Dynamic sampling is fine for tables that are used seldom, but if the table is queried regularly, we need optimizer statistics to avoid the overhead of dynamic sampling on every hard parse.
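If a table really is queried only occasionally, you can also influence dynamic sampling per statement with the DYNAMIC_SAMPLING hint instead of gathering statistics. A minimal sketch (the level 4 chosen here is only illustrative):
-- Ask for a deeper dynamic sample (level 4 instead of the default level 2)
-- for this statement only; the table and alias are the CARTE table from above.
select /*+ dynamic_sampling(c 4) */ *
from CARTE c
where tip_coperta = 'paperback';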
So let's collect statistics
exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'CARTE' );
Now you can see that the statistics are gathered, the total number of rows is correct, and
in the column statistics a frequency histogram was created - this is what allows the optimizer to estimate the number of rows with a specific value!
select NUM_ROWS, LAST_ANALYZED from user_tables where table_name = 'CARTE';
NUM_ROWS LAST_ANALYZED
---------- -------------------
1200 09.01.2021 16:48:26
select NUM_DISTINCT,HISTOGRAM from user_tab_columns where table_name = 'CARTE' and column_name = 'TIP_COPERTA';
NUM_DISTINCT HISTOGRAM
------------ ---------------
2 FREQUENCY
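If you want to look at the histogram itself, USER_TAB_HISTOGRAMS exposes the buckets; a small sketch (ENDPOINT_ACTUAL_VALUE is populated for character columns in recent releases):
-- Endpoint numbers are cumulative row counts, so the count for a value is
-- the difference between its endpoint and the previous one.
select endpoint_number, endpoint_actual_value
from user_tab_histograms
where table_name = 'CARTE'
and column_name = 'TIP_COPERTA'
order by endpoint_number;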
Let's verify how the statistics work now in the execution plan:
EXPLAIN PLAN SET STATEMENT_ID = 'jara1' into plan_table FOR
select * from CARTE
where tip_coperta = 'paperback'
;
--
SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', 'jara1','ALL'));
We see essentially the same correct result:
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 200 | 12400 | 5 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| CARTE | 200 | 12400 | 5 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("TIP_COPERTA"='paperback')
Now we delete all but four of the 'paperback' rows from the table:
delete from CARTE
where tip_coperta = 'paperback' and cod > 1004;
commit;
select count(*) from CARTE
where tip_coperta = 'paperback';
COUNT(*)
----------
4
With this action the statistics become stale and yield a wrong estimate based on obsolete data. This wrong estimate will persist until the statistics are recollected.
EXPLAIN PLAN SET STATEMENT_ID = 'jara1' into plan_table FOR
select * from CARTE
where tip_coperta = 'paperback'
;
--
SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', 'jara1','ALL'));
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 200 | 12400 | 5 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| CARTE | 200 | 12400 | 5 (0)| 00:00:01 |
---------------------------------------------------------------------------
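For completeness, regathering the statistics after the bulk delete brings the estimate back in line with the data; a minimal sketch (the statement ID is arbitrary):
-- Recollect the table and column statistics, then re-explain the query;
-- the Rows estimate should now be 4 instead of 200.
exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'CARTE' );

EXPLAIN PLAN SET STATEMENT_ID = 'jara2' into plan_table FOR
select * from CARTE
where tip_coperta = 'paperback'
;
SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', 'jara2','ALL'));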
Set up a policy that keeps your statistics up to date!

It is the table statistics that are important for this cardinality.
You will need to wait for the automatic stats gathering task to fire and gather the new statistics, or you can do it yourself:
exec dbms_stats.gather_table_stats(null,'carte',method_opt=>'for all columns size 1 for columns size 254 TIP_COPERTA')
This will force a histogram on the TIP_COPERTA column and not on the others (you may wish to use for all columns size skewonly or for all columns size auto, or even just let it default to whatever the preferred method_opt parameter is set to). Have a read of this article for details about this parameter.
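If you want that method_opt choice to be used by future gathers as well (including the automatic statistics task), you can record it as a table preference; a sketch, assuming the same owner and the CARTE table as above:
begin
  -- Every future gather on CARTE will now build a histogram only on TIP_COPERTA.
  dbms_stats.set_table_prefs(
    ownname => user,
    tabname => 'CARTE',
    pname   => 'METHOD_OPT',
    pvalue  => 'for all columns size 1 for columns size 254 TIP_COPERTA');
end;
/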
In some later versions of Oracle, depending on where you are running it, you may also have Real-Time Statistics, where Oracle keeps your statistics up to date even after conventional DML.
It's important to remember that cardinality estimates do not need to be completely accurate for you to obtain reasonable execution plans. A common rule of thumb is that it should be within an order of magnitude, and even then you will probably be fine most of the time.

To obtain an estimate of the number of rows, Oracle needs you to gather statistics on (analyze) the table or index. When you create an index, its statistics are gathered automatically.
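A quick way to see this asymmetry (a sketch, assuming the CARTE table and the bitmap index from the question): right after CREATE INDEX the index statistics are populated, while the table-level statistics the optimizer needs for this cardinality stay empty until you gather them.
create bitmap index bix_carte_tip_coperta on carte (tip_coperta);

-- Index statistics are collected as a side effect of CREATE INDEX ...
select index_name, num_rows, last_analyzed
from user_indexes
where table_name = 'CARTE';

-- ... but the table statistics remain NULL until dbms_stats is run.
select num_rows, last_analyzed
from user_tables
where table_name = 'CARTE';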

Related

Execution plan of table in cluster expects one row when it should expect multiple

I have created a cluster and a table in the cluster with the following definitions:
create cluster roald_dahl_titles (
title varchar2(100)
);
create index i_roald_dahl_titles
on cluster roald_dahl_titles
;
create table ROALD_DAHL_NOVELS (
title varchar2(100),
published_year number
)
cluster roald_dahl_titles (title)
;
Notably, this index is not created as unique, and it's quite possible to insert duplicate values into the table ROALD_DAHL_NOVELS:
insert into roald_dahl_novels (title, published_year) values ('Esio Trot', 1990);
insert into roald_dahl_novels (title, published_year) values ('Esio Trot', 1990);
I then gather statistics on both the table and the index, and look at an execution plan that uses the index:
begin
dbms_stats.gather_table_stats(user, 'ROALD_DAHL_NOVELS');
dbms_stats.gather_INDEX_stats(user, 'I_ROALD_DAHL_TITLES');
end;
/
explain plan for
select published_year
from roald_dahl_novels
where title = 'Esio Trot';
select *
from table(dbms_xplan.display(format => 'ALL'));
The contents of the execution plan I find a bit confusing, though:
Plan hash value: 2187850431
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 28 | 1 (0)| 00:00:01 |
| 1 | TABLE ACCESS CLUSTER| ROALD_DAHL_NOVELS | 2 | 28 | 1 (0)| 00:00:01 |
|* 2 | INDEX UNIQUE SCAN | I_ROALD_DAHL_TITLES | 1 | | 0 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1 / ROALD_DAHL_NOVELS#SEL$1
2 - SEL$1 / ROALD_DAHL_NOVELS#SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("TITLE"='Esio Trot')
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "ROALD_DAHL_NOVELS".ROWID[ROWID,10], "TITLE"[VARCHAR2,100],
"PUBLISHED_YEAR"[NUMBER,22]
2 - "ROALD_DAHL_NOVELS".ROWID[ROWID,10]
As part of operation 2, it performs an index unique scan, which tells me that 'Esio Trot' is expected to appear only once in the cluster. The execution plan also says that for that operation, it expects to return only one row.
The column projection information tells me that it expects to return a single column (which will be a ROWID for the table ROALD_DAHL_NOVELS), so this tells me that the total number of ROWIDs returned from that operation will be 1 (1 row at 1 ROWID per row). Since each of the two rows in the table ROALD_DAHL_NOVELS has a different ROWID, then this operation can only be used to return a single row from the table.
When the TABLE ACCESS CLUSTER operation is performed, the execution plan then (correctly) expects two rows to be returned, which is what I find confusing. If these rows are being accessed by ROWID, then I would expect the previous operation to return (at least) two ROWIDs. If they are not being accessed by ROWID, I would not expect the previous operation to return any ROWIDs.
Also, in the TABLE ACCESS CLUSTER, the ROWID of the table ROALD_DAHL_NOVELS is listed in the column projection information section. I am not attempting to select the ROWID, so I would not expect it to be returned from that operation. If anywhere, I would expect it to be in the predicate information section.
Additional investigation
I tried inserting the same row into the table repeatedly, until it contained 65536 identical copies of the same row. After gathering stats and querying USER_INDEXES for the index I_ROALD_DAHL_TITLES, we got the following:
UNIQUENESS DISTINCT_KEYS AVG_DATA_BLOCKS_PER_KEY
UNIQUE 1 109
As I understand it, this tells us:
The index is unique, so we expect each key to appear once in the index
The index has only one distinct key ('Esio Trot'), so must have exactly one entry
The index expects our one key to match to several rows in the table, across 109 blocks
This seems paradoxical - for one key to match to several rows in the table would mean that there must be several entries in the index for that key (each matching to a different ROWID), which would contradict the index being unique.
When checking USER_EXTENTS, the index only uses a single extent of 65536 bytes, which is not enough space to hold information for each of the ROWIDs in the table.
It's not a bug.
Run this query in your database:
select UNIQUENESS from dba_indexes where index_name = upper('i_roald_dahl_titles');
UNIQUENES
---------
UNIQUE
The reason for this is that B-tree cluster indexes only store the database block address of the cluster block that holds the data -- they do not store full rowid values the way a normal index would.
So, while your various rows for title = 'Esio Trot' might have rowid values like:
select rowid row_id, title from roald_dahl_novels n;
ROW_ID TITLE
------------------ ----------------------------------------------------------------------------------------------------
ABocNnACmAABWsWAAL Esio Trot
ABocNnACmAABWsWAAM Esio Trot
ABocNnACmAABWsWAAN Esio Trot
The B-tree cluster index only stores one entry: "Esio Trot", with the corresponding database block address. You can confirm this in your database with:
select num_rows from dba_indexes where index_Name = 'I_ROALD_DAHL_TITLES';
NUM_ROWS
----------
1
That is why a UNIQUE SCAN is reported: that is what it is doing, as far as the index is concerned.
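If you want to relate those rowids to blocks yourself, DBMS_ROWID can decode them (a sketch): in the two-row example both rows sit in the same cluster block, and that single block address is all the one index entry has to point at.
-- Decode the data block each 'Esio Trot' row lives in; the cluster index
-- stores one entry for the key with a block address, not one entry per rowid.
select rowid as row_id,
       dbms_rowid.rowid_block_number(rowid) as block_no
from roald_dahl_novels
where title = 'Esio Trot';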
The same issue appears in the actual execution plan (tested in 19.5).
Maybe it is a limitation or bug of the displayed execution plan for cluster objects; I would ask this question on asktom.oracle.com to get some kind of official (and free) answer from Oracle.
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID f41cf1x2zdyyr, child number 0
-------------------------------------
select published_year from roald_dahl_novels where title = 'Esio
Trot'
Plan hash value: 2187850431
--------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
--------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 1 (100)| | 2 |00:00:00.01 | 3 |
| 1 | TABLE ACCESS CLUSTER| ROALD_DAHL_NOVELS | 1 | 2 | 28 | 1 (0)| 00:00:01 | 2 |00:00:00.01 | 3 |
|* 2 | INDEX UNIQUE SCAN | I_ROALD_DAHL_TITLES | 1 | 1 | | 0 (0)| | 1 |00:00:00.01 | 1 |
--------------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1 / ROALD_DAHL_NOVELS#SEL$1
2 - SEL$1 / ROALD_DAHL_NOVELS#SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("TITLE"='Esio Trot')
Column Projection Information (identified by operation id):
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------
1 - "ROALD_DAHL_NOVELS".ROWID[ROWID,10], "TITLE"[VARCHAR2,100], "PUBLISHED_YEAR"[NUMBER,22]
2 - "ROALD_DAHL_NOVELS".ROWID[ROWID,10]
32 rows selected.

where column is null taking longer time to execute

I am executing a select statement like the one below, and it is taking more than 6 minutes to run.
select * from table where col1 is null;
whereas:
select * from table;
returns results in a few seconds. The table contains 25 million records. No indexes are used; there is a composite PK, but not on the column used. The same query, when executed on a different table with 50 million records, returns results in a few seconds; only this table poses a problem.
I rebuilt the table to check whether something was wrong with it, but I am still facing the same issue.
Can someone help me understand why it is taking so long?
datatype: VARCHAR2(40)
PLAN:
Plan hash value: 2838772322
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 794 | 60973 (16)| 00:00:03 |
|* 1 | TABLE ACCESS STORAGE FULL| table | 1 | 794 | 60973 (16)| 00:00:03 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - storage("column" IS NULL)
filter("column" IS NULL)
select * from table;
Oracle SQL Developer has a default setting that fetches only 50 records for display unless it has been changed. So the entire 25 million records are not fetched, because you don't need all of them for display.
select * from table where col1 is null;
But when you filter for null values, the entire set of 25 million rows has to be scanned to apply the filter and get your 81 records satisfying that predicate. Hence your second query takes longer.
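If the IS NULL lookup is frequent and only a few rows are NULL, one common workaround (a sketch; the table and column names below are placeholders for yours) is an index that appends a constant, because a single-column B-tree index stores no entry at all for rows whose key is NULL:
-- The constant second expression forces an index entry even when col1 is NULL.
create index ix_mytable_col1_nulls on your_table (col1, 0);

-- With the handful of NULL rows indexed, this can become an index range scan
-- instead of a full scan over 25 million rows.
select * from your_table where col1 is null;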

Performance of a sql query

I'm executing a command on a large table. It has around 7 million rows.
The command is like this:
select * from mytable;
Now I'm restricting the number of rows to around 3 million, using this command:
select * from mytable where timest > add_months( sysdate, -12*4 )
I have an index on the timest column, but the costs are almost the same. I would expect them to decrease. What am I doing wrong?
Any clue?
Thank you in advance!
Here are the explain plans:
Using an index for 3 out of 7 million rows would most probably be even more expensive, so Oracle does a full table scan for both queries, which is IMO correct.
You may try a parallel FTS (full table scan) - it should be faster, BUT it will put your Oracle server under higher load, so don't do it on heavily loaded multi-user databases.
Here is an example:
select /*+full(t) parallel(t,4)*/ *
from mytable t
where timest > add_months( sysdate, -12*4 );
To select a very small number of records from a table, use an index. To select a non-trivial part of it, use partitioning.
In your case, effective access would be enabled by range partitioning on the timest column.
The big advantage is that only the relevant partitions are accessed.
Here is an example:
create table test(ts date, s varchar2(4000))
PARTITION BY RANGE (ts)
(PARTITION t1p1 VALUES LESS THAN (TO_DATE('2010-01-01', 'YYYY-MM-DD')),
PARTITION t1p2 VALUES LESS THAN (TO_DATE('2015-01-01', 'YYYY-MM-DD')),
PARTITION t1p4 VALUES LESS THAN (MAXVALUE)
);
Query
select * from test where ts < to_date('2009-01-01','yyyy-mm-dd');
will access only partition 1, i.e. only data from before '2010-01-01'.
See Pstart and Pstop in the execution plan:
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 10055 | 9 (0)| 00:00:01 | | |
| 1 | PARTITION RANGE SINGLE| | 5 | 10055 | 9 (0)| 00:00:01 | 1 | 1 |
|* 2 | TABLE ACCESS FULL | TEST | 5 | 10055 | 9 (0)| 00:00:01 | 1 | 1 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("TS"<TO_DATE(' 2009-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
There are (at least) two problems.
add_months( sysdate, -12*4 ) is a function call rather than a plain literal, so the optimizer cannot estimate its selectivity as precisely as for a constant value.
Choosing 3 million out of 7 million rows via an index is not a good idea anyway. Yes, you would traverse the index tree quickly, yet for every match you still have to visit the table block (because you select * = all columns). That means there is no sense in using this index.
Thus the index plays no role here.
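If you want to verify the second point, force the index with a hint and compare the reported costs of the two plans; a sketch (the index name is a placeholder for yours):
-- Force the range scan on the timest index; for roughly 3 million of
-- 7 million rows its cost is normally higher than the full table scan.
explain plan for
select /*+ index(t ix_mytable_timest) */ *
from mytable t
where timest > add_months( sysdate, -12*4 );

select * from table(dbms_xplan.display);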

GATHER_PLAN_STATISTICS does not generate basic plan statistics

all,
I am learning query tuning now. When I ran the following:
select /*+ gather_plan_statistics */ * from emp;
select * from table(dbms_xplan.display(FORMAT=>'ALLSTATS LAST'));
The result always says:
Warning: basic plan statistics not available. These are only collected when:
hint 'gather_plan_statistics' is used for the statement or
parameter 'statistics_level' is set to 'ALL', at session or system level
I tried alter session set statistics_level = ALL; in SQL*Plus too, but that did not change anything in the result.
Could anyone please let me know what I might have missed?
Thanks so much.
The DISPLAY function displays the content of PLAN_TABLE generated (filled) by the EXPLAIN PLAN FOR command. So you can use it to generate and display a (theoretical) plan using EXPLAIN PLAN FOR, for example in this way:
create table emp as select * from all_objects;
explain plan for
select /*+ gather_plan_statistics */ count(*) from emp where object_id between 100 and 150;
select * from table(dbms_xplan.display );
Plan hash value: 2083865914
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 351 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 5 | | |
|* 2 | TABLE ACCESS FULL| EMP | 12 | 60 | 351 (1)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("OBJECT_ID"<=150 AND "OBJECT_ID">=100)
The /*+ gather_plan_statistics */ hint does not save data into PLAN_TABLE; it stores execution statistics in the V$SQL_PLAN_STATISTICS_ALL performance view.
To display these data you can use the method described here: http://www.dba-oracle.com/t_gather_plan_statistics.htm, but it does not always work, because you must execute the second command immediately after the SQL query.
A better method is to query the V$SQL view to obtain the SQL_ID of the query, and then use the DISPLAY_CURSOR function, for example in this way:
select /*+ gather_plan_statistics */ count(*) from emp where object_id between 100 and 150;
select sql_id, plan_hash_value, child_number, executions, fetches, cpu_time, elapsed_time, physical_read_requests, physical_read_bytes
from v$sql s
where sql_fulltext like 'select /*+ gather_plan_statistics */ count(*)%from emp%'
and sql_fulltext not like '%from v$sql' ;
SQL_ID PLAN_HASH_VALUE CHILD_NUMBER EXECUTIONS FETCHES CPU_TIME ELAPSED_TIME PHYSICAL_READ_REQUESTS PHYSICAL_READ_BYTES
------------- --------------- ------------ ---------- ---------- ---------- ------------ ---------------------- -------------------
9jjm288hx7buz 2083865914 0 1 1 15625 46984 26 10305536
The above query returns SQL_ID=9jjm288hx7buz and CHILD_NUMBER=0 (the child number is just a cursor number). Use these values to query the collected plan:
SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR('9jjm288hx7buz', 0, 'ALLSTATS'));
SQL_ID 9jjm288hx7buz, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from emp where object_id
between 100 and 150
Plan hash value: 2083865914
-------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | | 2 |00:00:00.05 | 10080 |
| 1 | SORT AGGREGATE | | 2 | 1 | 2 |00:00:00.05 | 10080 |
|* 2 | TABLE ACCESS FULL| EMP | 2 | 47 | 24 |00:00:00.05 | 10080 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("OBJECT_ID"<=150 AND "OBJECT_ID">=100))
If all you ran were the two statements in your question:
select /*+ gather_plan_statistics */ * from emp;
select * from table(dbms_xplan.display(FORMAT=>'ALLSTATS LAST'));
Then I think your problem is your use of DBMS_XPLAN.DISPLAY. The way you are using it, you are printing the plan of the last statement you explained, not the last statement you executed. And "explain" will not execute the query, so it will not benefit from a gather_plan_statistics hint.
This works for me in 12c:
select /*+ gather_plan_statistics */ count(*) from dba_objects;
SELECT *
FROM TABLE (DBMS_XPLAN.display_cursor (null, null, 'ALLSTATS LAST'));
i.e., display_cursor instead of just display.
What I learned from the answers so far:
When a query is parsed, the optimizer estimates how many rows are produced during each step of the query plan. Sometimes it is necessary to check how good the prediction was. If the estimates are off by more than an order of magnitude, this might lead to the wrong plan being used.
To compare estimated and actual numbers, the following steps are necessary:
You need read access to V$SQL_PLAN, V$SESSION and V$SQL_PLAN_STATISTICS_ALL. These privileges are included in the SELECT_CATALOG role. (source)
Switch on statistics gathering, either by
ALTER SESSION SET STATISTICS_LEVEL = ALL;
or by using the hint /*+ gather_plan_statistics */ in the query.
There seems to be a certain performance overhead.
See for instance Jonathan's blog.
Run the query. You'll need to find it later, so it's best to include an arbitrary marker in the hint comment:
SELECT /*+ gather_plan_statistics HelloAgain */ * FROM scott.emp;
EXPLAIN PLAN FOR SELECT ... is not sufficient, as it will only create the estimates without running the actual query.
Furthermore, as #Matthew suggested (thanks!), it is important to actually fetch all rows. Most GUIs will show only the first 50 rows or so. In SQL Developer, you can use the shortcut ctrl+End in the query result window.
Find the query in the cursor cache and note its SQL_ID:
SELECT sql_id, child_number, sql_text
FROM V$SQL
WHERE sql_text LIKE '%HelloAgain%';
dbqbqxp9srftn 0 SELECT /*+ gather_plan...
Format the result:
SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('dbqbqxp9srftn',0,'ALLSTATS LAST'));
Steps 4. and 5. can be combined:
SELECT x.*
FROM v$sql s,
TABLE(DBMS_XPLAN.DISPLAY_CURSOR(s.sql_id, s.child_number)) x
WHERE s.sql_text LIKE '%HelloAgain%';
The result shows the estimated rows (E-Rows) and the actual rows (A-Rows):
SQL_ID dbqbqxp9srftn, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics HelloAgain */ * FROM scott.emp
Plan hash value: 3956160932
------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 14 |00:00:00.01 | 6 |
| 1 | TABLE ACCESS FULL| EMP | 1 | 14 | 14 |00:00:00.01 | 6 |
------------------------------------------------------------------------------------
ALLSTATS LAST starts working after you have run the statement twice.

performance difference between to_char and to_date [duplicate]

I have a simple SQL query on Oracle 10g. I want to know the difference between these queries:
select * from employee where id = 123 and
to_char(start_date, 'yyyyMMdd') >= '2013101' and
to_char(end_date, 'yyyyMMdd') <= '20121231';
select * from employee where id = 123 and
start_date >= to_date('2013101', 'yyyyMMdd') and
end_date <= to_date('20121231', 'yyyyMMdd');
Questions:
1. Are these queries the same? start_date and end_date are indexed date columns.
2. Does one perform better than the other?
Please let me know. Thanks.
The latter is almost certain to be faster.
It avoids data type conversions on a column value.
Oracle will estimate the number of possible values between two dates better than between two strings that merely represent dates.
Note that neither query will return any rows, as the lower limit appears to be higher than the upper limit according to the numbers you've given. Also, you've missed a digit in 2013101.
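If for some reason the comparison has to stay on the to_char() side, a function-based index on the same expression can make it indexable again; a sketch, assuming the employee table and columns from the question:
-- Function-based index matching the expression used in the WHERE clause.
create index ix_employee_start_char on employee (to_char(start_date, 'yyyyMMdd'));

-- This predicate can now use the index because it matches the indexed
-- expression exactly (note the corrected 8-digit literal).
select * from employee
where id = 123
and to_char(start_date, 'yyyyMMdd') >= '20130101';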
One of the biggest flaws of converting, casting, or wrapping columns in expressions (e.g. NVL, COALESCE, etc.) in the WHERE clause is that the CBO will not be able to use an index on that column. I slightly modified your example to show the difference:
SQL> create table t_test as
2 select * from all_objects;
Table created
SQL> create index T_TEST_INDX1 on T_TEST(CREATED, LAST_DDL_TIME);
Index created
Created table and index for our experiment.
SQL> execute dbms_stats.set_table_stats(ownname => 'SCOTT',
tabname => 'T_TEST',
numrows => 100000,
numblks => 10000);
PL/SQL procedure successfully completed
We are making the CBO think that our table is a rather big one.
SQL> explain plan for
2 select *
3 from t_test tt
4 where tt.owner = 'SCOTT'
5 and to_char(tt.last_ddl_time, 'yyyyMMdd') >= '20130101'
6 and to_char(tt.created, 'yyyyMMdd') <= '20121231';
Explained
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2796558804
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 300 | 2713 (1)| 00:00:33 |
|* 1 | TABLE ACCESS FULL| T_TEST | 3 | 300 | 2713 (1)| 00:00:33 |
----------------------------------------------------------------------------
A full table scan is used, which would be costly on a big table.
SQL> explain plan for
2 select *
3 from t_test tt
4 where tt.owner = 'SCOTT'
5 and tt.last_ddl_time >= to_date('20130101', 'yyyyMMdd')
6 and tt.created <= to_date('20121231', 'yyyyMMdd');
Explained
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1868991173
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 300 | 4 (0)| 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID| T_TEST | 3 | 300 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | T_TEST_INDX1 | 8 | | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
See, now it's an index range scan and the cost is significantly lower.
SQL> drop table t_test;
Table dropped
Finally, clean up.
For output (display) purposes, use to_char.
For date handling (insert, update, compare, etc.), use to_date.
I don't have any performance link to share, but using to_date in the above query should run faster.
With to_char, the date column has to be converted to a string for every row before the comparison can be made, so there is a small performance loss.
With to_date, no per-row conversion of the column is needed; the date type is used directly.
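A small side-by-side illustration (a sketch; an ANSI date literal is another way to avoid converting the column at all):
-- The literal is converted to DATE once; the indexed column is untouched.
select * from employee
where start_date >= to_date('20130101', 'yyyymmdd');

-- Same idea with an ANSI date literal.
select * from employee
where start_date >= DATE '2013-01-01';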