Performance of a SQL query

I'm executing a command on a large table. It has around 7 million rows.
The command is like this:
select * from mytable;
Now I'm restricting the number of rows to around 3 million. I'm using this command:
select * from mytable where timest > add_months( sysdate, -12*4 )
I have an index on the timest column, but the costs are almost the same. I would expect them to decrease. What am I doing wrong?
Any clue?
Thank you in advance!
Here are the explain plans:

Using an index to read 3 out of 7 million rows would most probably be even more expensive, so Oracle does a full table scan for both queries, which is, IMO, correct.
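Here is a back-of-the-envelope I/O count in Python that puts rough numbers on this. Every figure in it is an assumption: 100 rows per block, 16 blocks per multiblock read, and an index of height 3 with 400 entries per leaf block. This is not the formula Oracle's cost-based optimizer uses. It only shows why reading 3 of 7 million rows through an index loses to a single sequential pass over the table.

```python
import math

# Hypothetical figures for the 7-million-row table in the question.
TOTAL_ROWS = 7_000_000
MATCHING_ROWS = 3_000_000
ROWS_PER_BLOCK = 100          # assumption
MULTIBLOCK_READ_COUNT = 16    # blocks fetched per full-scan I/O call (assumption)
INDEX_HEIGHT = 3              # root-to-leaf levels (assumption)
ENTRIES_PER_LEAF = 400        # index entries per leaf block (assumption)


def full_scan_reads() -> int:
    """I/O calls for a full table scan: every block once, many blocks per call."""
    table_blocks = math.ceil(TOTAL_ROWS / ROWS_PER_BLOCK)
    return math.ceil(table_blocks / MULTIBLOCK_READ_COUNT)


def index_access_reads(well_clustered: bool) -> int:
    """Single-block I/O calls for an index range scan plus table access by rowid."""
    leaf_blocks = math.ceil(MATCHING_ROWS / ENTRIES_PER_LEAF)
    if well_clustered:
        # Rows in index order sit together, so each table block is read once.
        table_visits = math.ceil(MATCHING_ROWS / ROWS_PER_BLOCK)
    else:
        # Worst case: a separate table block read for every matching row.
        table_visits = MATCHING_ROWS
    return INDEX_HEIGHT + leaf_blocks + table_visits


print("full scan:              ", full_scan_reads())
print("index, well clustered:  ", index_access_reads(True))
print("index, poorly clustered:", index_access_reads(False))
```

With these assumptions the full scan needs 4,375 multiblock reads. The index path needs 37,503 single-block reads even if the table is perfectly clustered on timest, and about 3 million if it is not.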
You may try a parallel FTS (Full Table Scan). It should be faster, BUT it will put your Oracle server under a higher load, so don't do it on heavily loaded multi-user DBs.
Here is an example:
select /*+full(t) parallel(t,4)*/ *
from mytable t
where timest > add_months( sysdate, -12*4 );

To select a very small number of records from a table, use an index. To select a non-trivial part of it, use partitioning.
In your case, range partitioning on the timest column would enable effective access.
The big advantage is that only the relevant partitions are accessed.
Here is an example:
create table test(ts date, s varchar2(4000))
PARTITION BY RANGE (ts)
(PARTITION t1p1 VALUES LESS THAN (TO_DATE('2010-01-01', 'YYYY-MM-DD')),
PARTITION t1p2 VALUES LESS THAN (TO_DATE('2015-01-01', 'YYYY-MM-DD')),
PARTITION t1p4 VALUES LESS THAN (MAXVALUE)
);
Query
select * from test where ts < to_date('2009-01-01','yyyy-mm-dd');
will access only partition 1, i.e. only the rows before '2010-01-01'.
See Pstart and Pstop in the execution plan:
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 10055 | 9 (0)| 00:00:01 | | |
| 1 | PARTITION RANGE SINGLE| | 5 | 10055 | 9 (0)| 00:00:01 | 1 | 1 |
|* 2 | TABLE ACCESS FULL | TEST | 5 | 10055 | 9 (0)| 00:00:01 | 1 | 1 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("TS"<TO_DATE(' 2009-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
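The Pstart and Pstop columns come from partition pruning. The optimizer compares the predicate with the partition bounds and skips every partition that cannot contain matching rows. Below is a minimal Python sketch of that decision for the example table. It assumes a single DATE partition key and a trailing MAXVALUE partition, and it models only the bookkeeping, not Oracle's implementation.

```python
from bisect import bisect_left, bisect_right
from datetime import date
from typing import List, Tuple

# Exclusive upper bounds of the example table's finite partitions, in order.
# The last partition (t1p4) is VALUES LESS THAN (MAXVALUE).
BOUNDS = [date(2010, 1, 1), date(2015, 1, 1)]
NAMES = ["t1p1", "t1p2", "t1p4"]


def prune_less_than(x: date) -> Tuple[int, int]:
    """(Pstart, Pstop), 1-based, for the predicate ts < x."""
    # A partition can hold rows with ts < x only if its lower bound is below x;
    # bisect_left counts the finite bounds strictly below x.
    return 1, 1 + bisect_left(BOUNDS, x)


def prune_at_least(x: date) -> Tuple[int, int]:
    """(Pstart, Pstop), 1-based, for the predicate ts >= x."""
    # x falls in the partition whose upper bound is the first one above x.
    return 1 + bisect_right(BOUNDS, x), len(BOUNDS) + 1


def partitions(pstart: int, pstop: int) -> List[str]:
    """Names of the partitions that still have to be scanned."""
    return NAMES[pstart - 1:pstop]


print(partitions(*prune_less_than(date(2009, 1, 1))))  # ['t1p1'], the example query
print(partitions(*prune_at_least(date(2012, 6, 1))))   # ['t1p2', 't1p4']
```

A predicate like timest > add_months( sysdate, -12*4 ) would be pruned the same way as the second call: every partition whose upper bound lies before the cutoff is skipped.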

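There are (at least) two points to consider.
add_months( sysdate, -12*4 ) is a function call, not a literal constant. However, it is applied to sysdate rather than to the indexed timest column, so it is evaluated once per execution and does not by itself stop the optimizer from using the index. That would only happen if a function were wrapped around timest itself (e.g. trunc(timest)).
Choosing 3 million of 7 million rows through an index is not a good idea anyway. Yes, you would walk the index tree quickly, yet for each entry you still have to go to the heap (because you need *, i.e. all columns). That means there is no sense in using this index.
Thus the index plays no role here.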

Related

Improve performance of NOT EXISTS in case of large tables

What I am trying to accomplish is getting rows from one table that do not match another table based on specific filters. The two tables are relatively huge so I am trying to filter them based on a certain time range.
The steps I went through so far.
Get the IDs from "T1" for the last 3 days
SELECT
id
FROM T1
WHERE STARTTIME BETWEEN '3 days ago' AND 'now';
Execution time is 4.5s.
Get the IDs from "T2" for the last 3 days
SELECT
id
FROM T2
WHERE STARTTIME BETWEEN '3 days ago' AND 'now';
Execution time is 2.5s.
Now I try to use NOT EXISTS to merge the results from both statements into one
SELECT
CID
FROM T1
WHERE STARTTIME BETWEEN '3 days ago' AND 'now'
AND NOT EXISTS (
SELECT NULL FROM T2
WHERE T1.ID = T2.ID
AND STARTTIME BETWEEN '3 days ago' AND 'now'
);
Execution time is 23s.
I also tried the INNER JOIN logic from this answer thinking it makes sense, but I get no results so I cannot properly evaluate.
Is there a better way to construct this statement that could possibly lead to a faster execution time?
19.01.2022 - Update based on comments
Expected result can contain any number of rows between 1 and 10 000
The used columns have the following indexes:
CREATE INDEX IX_T1_CSTARTTIME
ON T1 (CSTARTTIME ASC)
TABLESPACE MYHOSTNAME_DATA1;
CREATE INDEX IX_T2_CSTARTTIME
ON T2 (CSTARTTIME ASC)
TABLESPACE MYHOSTNAME_DATA2;
NOTE: Just noticed that the indexes are located on different table spaces, could this be a potential issue as well?
Following the excellent comments from Marmite Bomber here is the execution plan for the statement:
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 21773 | 2019K| | 1817K (1)| 00:01:12 |
|* 1 | HASH JOIN RIGHT ANTI| | 21773 | 2019K| 112M| 1817K (1)| 00:01:12 |
|* 2 | TABLE ACCESS FULL | T2 | 2100K| 88M| | 1292K (1)| 00:00:51 |
|* 3 | TABLE ACCESS FULL | T1 | 2177K| 105M| | 512K (1)| 00:00:21 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T2"."ID"="T1"."ID")
2 - filter("STARTTIME">=1642336690000 AND "T2"."ID" IS NOT NULL
AND "STARTTIME"<=1642595934000)
3 - filter("STARTTIME">=1642336690000 AND
"STARTTIME"<=1642595934000)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=1; rowset=256) "T1"."ID"[CHARACTER,38]
2 - (rowset=256) "T2"."ID"[CHARACTER,38]
3 - (rowset=256) "ID"[CHARACTER,38]
Is there a better way to construct this statement that could possibly lead to a faster execution time?
Your basic responsibility is to write the SQL statement; the basic responsibility of Oracle is to come up with an execution plan.
If you are not satisfied, your first step should be to verify the execution plan, not to try to rewrite the statement. Bear in mind that combining two sources with NOT EXISTS will take longer than the sum of the times needed to extract the data from each source.
See here for more details on how to proceed.
EXPLAIN PLAN SET STATEMENT_ID = 'stmt1' into plan_table FOR
SELECT
PAD
FROM T1
WHERE STARTTIME BETWEEN date'2021-01-11' AND date'2021-01-13'
AND NOT EXISTS (
SELECT NULL FROM T2
WHERE T1.ID = T2.ID
AND STARTTIME BETWEEN date'2021-01-11' AND date'2021-01-13'
);
SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', 'stmt1','ALL'));
This is what you should see
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1999 | 150K| 10175 (1)| 00:00:01 |
|* 1 | HASH JOIN RIGHT ANTI| | 1999 | 150K| 10175 (1)| 00:00:01 |
|* 2 | TABLE ACCESS FULL | T2 | 2002 | 26026 | 4586 (1)| 00:00:01 |
|* 3 | TABLE ACCESS FULL | T1 | 4002 | 250K| 5589 (1)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T1"."ID"="T2"."ID")
2 - filter("STARTTIME"<=TO_DATE(' 2021-01-13 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "STARTTIME">=TO_DATE(' 2021-01-11 00:00:00',
'syyyy-mm-dd hh24:mi:ss'))
3 - filter("STARTTIME"<=TO_DATE(' 2021-01-13 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "STARTTIME">=TO_DATE(' 2021-01-11 00:00:00',
'syyyy-mm-dd hh24:mi:ss'))
Note that the hash join (here an anti join, because of the NOT EXISTS) is the best way to join two large row sources. Note also that the plan does not use indexes, for the same reason: to access a large amount of data you do not want to go through an index.
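The HASH JOIN RIGHT ANTI in both plans can be sketched in a few lines of Python. Build a hash table from one row source (here T2, the first child in the plan), then stream the other source (T1) past it once and keep only the rows without a match. This is a simplified in-memory model. The real operation spills to temporary space when the hash table does not fit in memory (the TempSpc column in the question's plan).

```python
from typing import Any, Dict, Iterable, Iterator

Row = Dict[str, Any]


def hash_anti_join(probe: Iterable[Row], build: Iterable[Row], key: str) -> Iterator[Row]:
    """Yield the probe rows whose key has no match among the build rows."""
    # Build phase: hash the join keys of one row source once. NULL keys are
    # dropped, like the "T2"."ID" IS NOT NULL filter in the question's plan,
    # because NULL = NULL is never true.
    keys = {row[key] for row in build if row[key] is not None}
    # Probe phase: a single pass over the other source, one hash lookup per row.
    for row in probe:
        if row[key] not in keys:
            yield row


t1 = [{"ID": 1}, {"ID": 2}, {"ID": None}, {"ID": 3}]
t2 = [{"ID": 2}, {"ID": None}]
print([row["ID"] for row in hash_anti_join(t1, t2, "ID")])  # [1, None, 3]
```

Each source is read once, which is why the total time is roughly the two full scans plus the hashing work, rather than one index lookup per T1 row.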
This is in contrast to low-cardinality row sources (OLTP), where you expect to see index access and NESTED LOOPS ANTI.
Sometimes Oracle gets confused (e.g. by stale statistics) and decides to take the NESTED LOOPS path even for large data, which leads to long elapsed times.
This should at least help you decide whether you have a problem or not.
Perhaps a simple MINUS operation will accomplish what you are looking for:
select id
from ( select id
from t1
where starttime between '3 days ago' and 'now'
MINUS
select id
from t2
where starttime between '3 days ago' and 'now'
);
Use whatever definition of starttime between '3 days ago' and 'now' you actually have. This uses your current queries as they are: the MINUS operation removes from the first result those values that also exist in the second, and returns the rest. See the MINUS demo here.
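Before swapping NOT EXISTS for MINUS, note that the two return the same rows only when the IDs are unique and never NULL. MINUS is a set operator, so it removes duplicates and treats NULLs as equal to each other. The Python sketch below checks this with SQLite, where EXCEPT is the name for MINUS. SQLite is used only to show the semantics, not the timings, and the tiny data set is made up.

```python
import sqlite3
from typing import List, Optional, Tuple


def compare(t1_ids: List[Optional[int]], t2_ids: List[Optional[int]]) -> Tuple[list, list]:
    """Return (NOT EXISTS result, MINUS result) for the same two ID lists."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t1 (id INTEGER)")
    con.execute("CREATE TABLE t2 (id INTEGER)")
    con.executemany("INSERT INTO t1 VALUES (?)", [(i,) for i in t1_ids])
    con.executemany("INSERT INTO t2 VALUES (?)", [(i,) for i in t2_ids])
    not_exists = [r[0] for r in con.execute(
        "SELECT id FROM t1 "
        "WHERE NOT EXISTS (SELECT NULL FROM t2 WHERE t1.id = t2.id) "
        "ORDER BY id")]
    # SQLite spells Oracle's MINUS as EXCEPT.
    minus = [r[0] for r in con.execute(
        "SELECT id FROM t1 EXCEPT SELECT id FROM t2 ORDER BY id")]
    con.close()
    return not_exists, minus


print(compare([1, 1, 2, 3, None], [2, None]))  # ([None, 1, 1, 3], [1, 3])
```

With unique, non-NULL IDs both forms agree, e.g. compare([1, 2, 3], [2]) gives ([1, 3], [1, 3]).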

What is SYS_OP_UNDESCEND and SYS_OP_DESCEND in Oracle Explain Plan?

I have an Oracle explain plan that looks like this:
Plan hash value: 2484140766
--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 180K| 84M| 5 (0)| 00:00:01 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | VIEW | | 180K| 84M| 5 (0)| 00:00:01 |
|* 3 | TABLE ACCESS BY INDEX ROWID | OSTRICH | 6500K| 793M| 5 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN DESCENDING| OSTRICH_ENDDATE_IDX_2 | 1 | | 4 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=180000)
3 - filter("OSTRICH_STATUS_ID"=2)
4 - access(SYS_OP_DESCEND("END_DATE")>=SYS_OP_DESCEND(SYSDATE#!))
filter(SYS_OP_UNDESCEND(SYS_OP_DESCEND("END_DATE"))<=SYSDATE#!)
I have been trying to understand what is happening with these 2 lines at the bottom:
4 - access(SYS_OP_DESCEND("END_DATE")>=SYS_OP_DESCEND(SYSDATE#!))
filter(SYS_OP_UNDESCEND(SYS_OP_DESCEND("END_DATE"))<=SYSDATE#!)
What do SYS_OP_UNDESCEND and SYS_OP_DESCEND mean?
The index that the explain plan references is (I think) called a descending index. (I do not know a lot about Oracle indexing.) The DDL for that index is:
CREATE INDEX
OSTRICH_ENDDATE_IDX_2
ON
OSTRICH
(
"END_DATE" DESC
);
The actual query looks like this:
SELECT
l.id,
l.end_date,
l.status
FROM
(
SELECT
*
from OSTRICH l2
where END_DATE <= SYSDATE
and OSTRICH_STATUS_ID = 2
order by l2.END_DATE
) l
WHERE ROWNUM <= 180000;
What do SYS_OP_UNDESCEND and SYS_OP_DESCEND mean? This query is taking much longer than I would expect, and I am trying to understand what impact the descending and undescending has on the query?
Oracle implements the descending index "as if" it were a function-based index. Function-based indexes are invoked when a query uses the function call; thus an FBI on upper(col1) would be used when the WHERE clause filters on upper(col1) = 'WHATEVER'.
In this case I think SYS_OP_DESCEND is the "function" Oracle uses when creating a descending index. I think it is then invoking SYS_OP_UNDESCEND because your WHERE clause is unsuited to a descending index. It's not surprising the performance sucks.
There are very few use cases where a descending index is a good idea. Why are you using one on this column on this table?
Assuming there is a good reason for using the index and you can't just drop it, your best bet for improved performance would be to not use the index for this query. Something like this should prevent the optimiser from using the index:
SELECT
l.id,
l.end_date,
l.status
FROM
(
SELECT /*+ NO_INDEX(l2 OSTRICH_ENDDATE_IDX_2) */
*
from OSTRICH l2
where END_DATE <= SYSDATE
and OSTRICH_STATUS_ID = 2
order by l2.END_DATE
) l
WHERE ROWNUM <= 180000;
SYS_OP_UNDESCEND and SYS_OP_DESCEND are internal functions used by the CBO. They appear in the EXPLAIN PLAN when a function-based index is used or when a sort order has been specified inside an index definition.
In your case, you are using an index with a DESC sort clause:
CREATE INDEX
OSTRICH_ENDDATE_IDX_2
ON
OSTRICH
(
"END_DATE" DESC
);
Your plan shows these two operations:
access(SYS_OP_DESCEND("END_DATE")>=SYS_OP_DESCEND(SYSDATE#!))
filter(SYS_OP_UNDESCEND(SYS_OP_DESCEND("END_DATE"))<=SYSDATE#!)
The first operation is the access, based on the DESC clause of the index itself, and the second is the filter. Both appear because the query runs against the nature of the index.
I would never use this clause in an index unless the data is always accessed in that order. That is quite rare, because sorting in different ways is what SQL is normally used for.
There is also this bug (fixed in 20.1):
Bug 27589260 wrong sort order due to virtual column replacement in function based index
It degrades the performance of the query when a virtual column is present in the table and a function-based index has been used.

Oracle SQL query running slow, full table scan on primary key, why?

I have a problem with a piece of code. I can't understand why the query below is doing a full table scan on the WORKS table when wrk.cre_surr_id is the primary key. The statistics on both tables are up to date. Below are the indexes on both tables.
TABLE INDEXES
WORKS
INDEX NAME        UNIQUE  LOGGING  COLUMN NAME                                   ORDER
WRK_I1            N       NO       LOGICALLY_DELETED_Y                           Asc
WRK_ICE_WRK_KEY   N       YES      ICE_WRK_KEY                                   Asc
WRK_PK            Y       NO       CRE_SURR_ID                                   Asc
WRK_TUNECODE_UK   Y       NO       TUNECODE                                      Asc
TLE_TITLE_TOKENS
INDEX NAME        UNIQUE  LOGGING  COLUMN NAME                                   ORDER
TTT_I1            N       YES      TOKEN_TYPE, SEARCH_TOKEN, DN_WRK_CRE_SURR_ID  Asc
TTT_TLE_FK_1      N       YES      TLE_SURR_ID
The problem query is below. It has a cost of 245,876, which seems high. It is doing a FULL TABLE SCAN of the WORKS table, which has 21,938,384 rows, and an INDEX RANGE SCAN of the TLE_TITLE_TOKENS table, which has 19,923,002 rows. The explain plan also contains an INLIST ITERATOR. I haven't a clue what that means, but I think it is due to the "in ('E','N')" in my SQL query.
SELECT wrk.cre_surr_id
FROM works wrk,
tle_title_tokens ttt
WHERE ttt.dn_wrk_cre_surr_id = wrk.cre_surr_id
AND wrk.logically_deleted_y IS NULL
AND ttt.token_type in ('E','N')
AND ttt.search_token LIKE 'BELIEVE'||'%'
When I break the query down and do a simple select from the TLE_TITLE_TOKENS table I get 280,000 records back.
select ttt.dn_wrk_cre_surr_id
from tle_title_tokens ttt
where ttt.token_type in ('E','N')
and ttt.search_token LIKE 'BELIEVE'||'%'
How do I stop it doing a FULL TABLE scan on the WORKS table? I could put a hint on the query, but I would have thought Oracle would be clever enough to use the index without a hint.
Also, on the TLE_TITLE_TOKENS table, would it be better to create a function-based index on the SEARCH_TOKEN column, as users seem to do LIKE '%' searches on this field? What would that function-based index look like?
I'm running on an Oracle 11g database.
Thanks in Advance to any answers.
First, rewrite the query using a join:
SELECT wrk.cre_surr_id
FROM tle_title_tokens ttt JOIN
works wrk
ON ttt.dn_wrk_cre_surr_id = wrk.cre_surr_id
WHERE wrk.logically_deleted_y IS NULL AND
ttt.token_type in ('E', 'N') AND
ttt.search_token LIKE 'BELIEVE'||'%';
You should be able to speed up this query by using indexes. It is not clear what the best index is. I would suggest tle_title_tokens(search_token, token_type, dn_wrk_cre_surr_id) and works(cre_surr_id, logically_deleted_y).
Another possibility is to write the query using EXISTS, such as:
SELECT wrk.cre_surr_id
FROM works wrk
WHERE wrk.logically_deleted_y IS NULL AND
EXISTS (SELECT 1
FROM tle_title_tokens ttt
WHERE ttt.dn_wrk_cre_surr_id = wrk.cre_surr_id AND
ttt.token_type IN ('N', 'E') AND
ttt.search_token LIKE 'BELIEVE'||'%'
) ;
For this version, you want indexes on works(logically_deleted_y, cre_surr_id) and tle_title_tokens(dn_wrk_cre_surr_id, token_type, search_token).
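Check one thing before choosing between the JOIN and EXISTS versions: they are not interchangeable when a work has more than one matching token. The join returns the work once per matching token, while EXISTS returns it once. Here is a small Python/SQLite check with made-up rows. SQLite is only a stand-in for the semantics and says nothing about Oracle's plans.

```python
import sqlite3

SCHEMA = """
CREATE TABLE works (cre_surr_id INTEGER PRIMARY KEY, logically_deleted_y TEXT);
CREATE TABLE tle_title_tokens (dn_wrk_cre_surr_id INTEGER, token_type TEXT,
                               search_token TEXT);
INSERT INTO works VALUES (1, NULL), (2, NULL), (3, 'Y');
INSERT INTO tle_title_tokens VALUES
    (1, 'E', 'BELIEVE'), (1, 'N', 'BELIEVER'),   -- two matching tokens for work 1
    (2, 'E', 'BELIEVED'), (2, 'X', 'BELIEVE'),   -- token type X is filtered out
    (3, 'E', 'BELIEVE');                          -- work 3 is logically deleted
"""

JOIN_SQL = """
SELECT wrk.cre_surr_id
FROM tle_title_tokens ttt JOIN works wrk
  ON ttt.dn_wrk_cre_surr_id = wrk.cre_surr_id
WHERE wrk.logically_deleted_y IS NULL
  AND ttt.token_type IN ('E', 'N')
  AND ttt.search_token LIKE 'BELIEVE' || '%'
ORDER BY 1
"""

EXISTS_SQL = """
SELECT wrk.cre_surr_id
FROM works wrk
WHERE wrk.logically_deleted_y IS NULL
  AND EXISTS (SELECT 1
              FROM tle_title_tokens ttt
              WHERE ttt.dn_wrk_cre_surr_id = wrk.cre_surr_id
                AND ttt.token_type IN ('N', 'E')
                AND ttt.search_token LIKE 'BELIEVE' || '%')
ORDER BY 1
"""

con = sqlite3.connect(":memory:")
con.executescript(SCHEMA)
join_ids = [r[0] for r in con.execute(JOIN_SQL)]
exists_ids = [r[0] for r in con.execute(EXISTS_SQL)]
print(join_ids, exists_ids)  # [1, 1, 2] [1, 2]
```

If each work should appear only once, use the EXISTS form (or add DISTINCT to the join).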
try this:
SELECT /*+ leading(ttt) */ wrk.cre_surr_id
FROM works wrk,
tle_title_tokens ttt
WHERE ttt.dn_wrk_cre_surr_id = wrk.cre_surr_id
AND wrk.logically_deleted_y IS NULL
AND ttt.token_type in ('E','N')
AND ttt.search_token LIKE 'BELIEVE'||'%'
Out of the 19,923,002 rows in TLE_TITLE_TOKENS, how many records have TOKEN_TYPE 'E', and how many have 'N'?
Are there any other token types? If yes, how many rows do they account for together?
If E and N together form a small part of the total records, check whether the histogram statistics for that column are up to date.
The execution plan depends on how many of the 20M records in TLE_TITLE_TOKENS are selected by the given filters.
I'm assuming these index definitions:
create index works_idx on works (cre_surr_id,logically_deleted_y);
create index title_tokens_idx on tle_title_tokens(search_token,token_type,dn_wrk_cre_surr_id);
There are typically two possible ways to execute the join:
NESTED LOOPS, which accesses the inner table WORKS using the index, but repeatedly, once for each row of the outer table;
HASH JOIN, which accesses WORKS using a FULL SCAN, but only once.
It is not possible to say that one option is bad and the other good.
Nested loops is better if there are only a few rows in the outer table (few loops). As the number of records in the outer table (TOKENS) grows, it gets slower and slower, and at some number of rows the HASH JOIN becomes better.
How do you see which execution plan is better? Simply force Oracle, using hints, to run both scenarios and compare the elapsed times.
In your case you should see these two execution plans:
HASH JOIN
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 207K| 10M| | 2439 (1)| 00:00:30 |
|* 1 | HASH JOIN | | 207K| 10M| 7488K| 2439 (1)| 00:00:30 |
|* 2 | INDEX RANGE SCAN | TITLE_TOKENS_IDX | 207K| 5058K| | 29 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| WORKS | 893K| 22M| | 431 (2)| 00:00:06 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("TTT"."DN_WRK_CRE_SURR_ID"="WRK"."CRE_SURR_ID")
2 - access("TTT"."SEARCH_TOKEN" LIKE 'BELIEVE%')
filter("TTT"."SEARCH_TOKEN" LIKE 'BELIEVE%' AND ("TTT"."TOKEN_TYPE"='E' OR
"TTT"."TOKEN_TYPE"='N'))
3 - filter("WRK"."LOGICALLY_DELETED_Y" IS NULL)
NESTED LOOPS
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 207K| 10M| 414K (1)| 01:22:56 |
| 1 | NESTED LOOPS | | 207K| 10M| 414K (1)| 01:22:56 |
|* 2 | INDEX RANGE SCAN| TITLE_TOKENS_IDX | 207K| 5058K| 29 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN| WORKS_IDX | 1 | 26 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("TTT"."SEARCH_TOKEN" LIKE 'BELIEVE%')
filter("TTT"."SEARCH_TOKEN" LIKE 'BELIEVE%' AND
("TTT"."TOKEN_TYPE"='E' OR "TTT"."TOKEN_TYPE"='N'))
3 - access("TTT"."DN_WRK_CRE_SURR_ID"="WRK"."CRE_SURR_ID" AND
"WRK"."LOGICALLY_DELETED_Y" IS NULL)
My guess is that (with 280K loops) the hash join (i.e. the FULL TABLE SCAN) will be better, but you may find that nested loops should be used.
In that case the optimizer does not correctly recognise the switching point between nested loops and hash join.
A common cause of this is wrong or missing system statistics, or improper optimizer parameters.
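The switching point can be estimated from the two plans themselves. The Python arithmetic below takes the costs shown above: 29 for the token index range scan, 2 per WORKS_IDX probe, and 2439 for the whole hash join. As a simplification, it assumes that the nested loops cost grows linearly with the outer rows while the hash join cost stays fixed. The real optimizer also accounts for caching and CPU, so treat the result as an order of magnitude only.

```python
# Cost figures read off the two plans above; everything else is a simplification.
OUTER_COST = 29         # INDEX RANGE SCAN of TITLE_TOKENS_IDX
COST_PER_PROBE = 2      # one INDEX RANGE SCAN of WORKS_IDX per outer row
HASH_JOIN_COST = 2439   # total cost of the HASH JOIN plan, assumed fixed here


def nested_loops_cost(outer_rows: int) -> int:
    """Nested loops: pay one probe into WORKS_IDX for every outer row."""
    return OUTER_COST + outer_rows * COST_PER_PROBE


def break_even_rows() -> int:
    """Largest outer row count at which nested loops is not costlier than the hash join."""
    return (HASH_JOIN_COST - OUTER_COST) // COST_PER_PROBE


print(nested_loops_cost(207_000))  # 414029, about the 414K in the NESTED LOOPS plan
print(break_even_rows())           # 1205
```

With these numbers, nested loops stops paying off after about 1,200 outer rows. That is far below the 207K rows estimated here, which is consistent with the guess that the hash join is the better plan.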

Missing parenthesis error when creating Function Based Index

I am attempting to create a Function Based Index on a predicate that has a high cost (Oracle).
I want to create an index on the TIME_ID column in the A4ORDERS table that brings back values for month of December:
SELECT * FROM A4ORDERS WHERE TRIM(TO_CHAR(time_id, 'Month')) in ( 'December' );
Creating the FBI:
CREATE INDEX TIME_FIDX ON A4ORDERS(TRIM(TO_CHAR(time_id, 'Month')) in ( 'December' ));
I get a "Missing Right parenthesis" error and I can't figure out why? Any guidance you can provide would be appreciated.
Solution from Alex Poole's response below that worked:
CREATE INDEX TIME_FIDX ON A4ORDERS (TRIM(TO_CHAR(time_id, 'Month')));
Your create index statement should not have the in ( 'December' ) part, that only belongs in the query. If you create the index as:
CREATE INDEX TIME_FIDX ON A4ORDERS (TRIM(TO_CHAR(time_id, 'Month')));
... then that index can be used by your query:
EXPLAIN PLAN FOR
SELECT * FROM A4ORDERS WHERE TRIM(TO_CHAR(time_id, 'Month')) in ( 'December' );
SELECT plan_table_output FROM TABLE (dbms_xplan.display());
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 29 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| A4ORDERS | 1 | 29 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | TIME_FIDX | 1 | | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access(TRIM(TO_CHAR(INTERNAL_FUNCTION("TIME_ID"),'Month'))='December')
So you can see from the plan that TIME_FIDX is being used. Whether it will give you a significant performance gain remains to be seen of course, and the optimiser could decide it isn't selective enough anyway.
'Month' is NLS-sensitive though; it would be safer to either use the month number, or specify the NLS_DATE_LANGUAGE in the TO_CHAR call, but it has to be done consistently - which will be a little easier with numbers. You could also make it an indexed virtual column.
You can use:
CREATE INDEX TIME_FIDX ON A4ORDERS(TRIM(TO_CHAR(time_id, 'Month')));
But you can also make it a bit simpler:
CREATE INDEX TIME_FIDX ON A4ORDERS(TO_CHAR(time_id, 'mm'));
and write SQL:
SELECT * FROM A4ORDERS WHERE TO_CHAR(time_id, 'mm') in ( '12');
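The rule that the query must repeat the indexed expression can be tried out without Oracle. This Python sketch uses SQLite's indexes on expressions as a stand-in, with substr(time_id, 6, 2) playing the part of TO_CHAR(time_id, 'mm'). It assumes time_id is stored as ISO-8601 text. The query that repeats the indexed expression can use time_fidx. A different expression that computes the same month cannot.

```python
import sqlite3


def plan(con: sqlite3.Connection, sql: str) -> str:
    """Join the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return " | ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))


con = sqlite3.connect(":memory:")
# Assumption: time_id is stored as ISO-8601 text, 'YYYY-MM-DD...'.
con.execute("CREATE TABLE a4orders (order_id INTEGER, time_id TEXT)")
# Index the expression only; the IN (...) condition belongs in the query.
con.execute("CREATE INDEX time_fidx ON a4orders (substr(time_id, 6, 2))")

same_expr = plan(con, "SELECT * FROM a4orders WHERE substr(time_id, 6, 2) IN ('12')")
other_expr = plan(con, "SELECT * FROM a4orders WHERE strftime('%m', time_id) = '12'")
print(same_expr)   # mentions time_fidx
print(other_expr)  # a full scan of a4orders
```

Oracle matches function-based index expressions in the same spirit, which is why the index has to be created on exactly the expression the WHERE clause uses.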
But if you provide more information about your problem (workaround, SQL query, plans etc.) you can receive more help.

performance difference between to_char and to_date [duplicate]

I have a simple SQL query on Oracle 10g. I want to know the difference between these queries:
select * from employee where id = 123 and
to_char(start_date, 'yyyyMMdd') >= '2013101' and
to_char(end_date, 'yyyyMMdd') <= '20121231';
select * from employee where id = 123 and
start_date >= to_date('2013101', 'yyyyMMdd') and
end_date <= to_date('20121231', 'yyyyMMdd');
Questions:
1. Are these queries the same? start_date, end_date are indexed date columns.
2. Does one work better over the other?
Please let me know. thanks.
The latter is almost certain to be faster.
It avoids data type conversions on a column value.
Oracle will estimate better the number of possible values between two dates, rather than two strings that are representations of dates.
Note that neither is likely to return any rows, as the lower limit you've given is later than the upper limit, which is probably not what you intended. Also, you've missed a digit in 2013101.
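The missing digit has a subtle effect in the to_char version, where the comparison is between strings. '20131001' sorts before '2013101' because its seventh character is '0', not '1'. '20131010' sorts after it because it starts with '2013101' and is longer. Here is a quick Python check, with strftime('%Y%m%d') standing in for TO_CHAR(..., 'yyyyMMdd'):

```python
from datetime import date, timedelta

LOWER = "2013101"  # the 7-digit literal from the question


def to_char_yyyymmdd(d: date) -> str:
    """Stand-in for TO_CHAR(d, 'yyyyMMdd')."""
    return d.strftime("%Y%m%d")


def first_passing_date(start: date, days: int) -> date:
    """First date from start on whose string form compares >= LOWER."""
    for offset in range(days):
        d = start + timedelta(days=offset)
        if to_char_yyyymmdd(d) >= LOWER:  # string comparison, as in the to_char query
            return d
    raise ValueError("no date in the range passes the comparison")


print(first_passing_date(date(2013, 9, 25), 30))  # 2013-10-10
```

So the string version effectively starts at 2013-10-10, with no error to warn you.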
One of the biggest drawbacks of converting, casting or wrapping columns in expressions (e.g. NVL, COALESCE) in the WHERE clause is that the CBO will not be able to use an index on that column. I slightly modified your example to show the difference:
SQL> create table t_test as
2 select * from all_objects;
Table created
SQL> create index T_TEST_INDX1 on T_TEST(CREATED, LAST_DDL_TIME);
Index created
Created table and index for our experiment.
SQL> execute dbms_stats.set_table_stats(ownname => 'SCOTT',
tabname => 'T_TEST',
numrows => 100000,
numblks => 10000);
PL/SQL procedure successfully completed
This makes the CBO think that our table is a big one.
SQL> explain plan for
2 select *
3 from t_test tt
4 where tt.owner = 'SCOTT'
5 and to_char(tt.last_ddl_time, 'yyyyMMdd') >= '20130101'
6 and to_char(tt.created, 'yyyyMMdd') <= '20121231';
Explained
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2796558804
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 300 | 2713 (1)| 00:00:33 |
|* 1 | TABLE ACCESS FULL| T_TEST | 3 | 300 | 2713 (1)| 00:00:33 |
----------------------------------------------------------------------------
A full table scan is used, which would be costly on a big table.
SQL> explain plan for
2 select *
3 from t_test tt
4 where tt.owner = 'SCOTT'
5 and tt.last_ddl_time >= to_date('20130101', 'yyyyMMdd')
6 and tt.created <= to_date('20121231', 'yyyyMMdd');
Explained
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1868991173
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 300 | 4 (0)| 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID| T_TEST | 3 | 300 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | T_TEST_INDX1 | 8 | | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
See, now it's an index range scan, and the cost is significantly lower.
SQL> drop table t_test;
Table dropped
Finally, clean up.
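The same effect can be reproduced outside Oracle. This Python sketch uses SQLite's EXPLAIN QUERY PLAN as a stand-in for the experiment above. SQLite's optimizer is much simpler than Oracle's CBO, and dates are stored as ISO-8601 text here, so only the qualitative result carries over. Wrapping the indexed column in a conversion hides it from the index, while comparing the bare column with a constant does not.

```python
import sqlite3


def plan(con: sqlite3.Connection, sql: str) -> str:
    """Join the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return " | ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))


con = sqlite3.connect(":memory:")
# SQLite has no DATE type; ISO-8601 text sorts chronologically (assumption for this demo).
con.execute("CREATE TABLE t_test (owner TEXT, created TEXT, last_ddl_time TEXT)")
con.execute("CREATE INDEX t_test_indx1 ON t_test (created, last_ddl_time)")

# Column wrapped in a conversion, like to_char(tt.created, 'yyyyMMdd').
wrapped = plan(con, "SELECT * FROM t_test WHERE owner = 'SCOTT' "
                    "AND strftime('%Y%m%d', created) <= '20121231'")
# Bare column compared with a constant, like tt.created <= to_date(...).
bare = plan(con, "SELECT * FROM t_test WHERE owner = 'SCOTT' "
                 "AND created <= '2012-12-31'")
print(wrapped)  # a full scan of t_test
print(bare)     # a search using t_test_indx1
```
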
For output (display) purposes, use to_char.
For "date" handling (insert, update, compare etc.), use to_date.
I don't have any performance link to share, but using to_date in the above query should run faster!
With to_char, each date value is first converted to a string and only then compared. There will be a small performance loss.
With to_date, no conversion of the column is needed; the date type is used directly.