Performance difference between to_char and to_date [duplicate]

I have a simple SQL query on Oracle 10g. I want to know the difference between these two queries:
select * from employee where id = 123 and
to_char(start_date, 'yyyyMMdd') >= '2013101' and
to_char(end_date, 'yyyyMMdd') <= '20121231';
select * from employee where id = 123 and
start_date >= to_date('2013101', 'yyyyMMdd') and
end_date <= to_date('20121231', 'yyyyMMdd');
Questions:
1. Are these queries the same? start_date and end_date are indexed date columns.
2. Does one perform better than the other?
Please let me know. Thanks.

The latter is almost certain to be faster.
It avoids data type conversions on a column value.
Oracle will better estimate the number of possible values between two dates than between two strings that merely represent dates.
Note that neither query will return any rows, as the lower limit appears to be higher than the upper limit, going by the numbers you've given. Also, you've missed a numeral in '2013101'.
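For illustration, a corrected sketch; the exact bounds are an assumption on my part, since the question's literals look transposed (say the intent was to match dates within 2012):
select * from employee where id = 123 and
start_date >= to_date('20120101', 'yyyymmdd') and
end_date <= to_date('20121231', 'yyyymmdd');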

One of the biggest flaws of converting, casting, or wrapping columns in expressions (e.g. NVL, COALESCE, etc.) in the WHERE clause is that the CBO will not be able to use an index on that column. I slightly modified your example to show the difference:
SQL> create table t_test as
2 select * from all_objects;
Table created
SQL> create index T_TEST_INDX1 on T_TEST(CREATED, LAST_DDL_TIME);
Index created
Created table and index for our experiment.
SQL> execute dbms_stats.set_table_stats(ownname => 'SCOTT',
tabname => 'T_TEST',
numrows => 100000,
numblks => 10000);
PL/SQL procedure successfully completed
We are making the CBO think that our table is a big one.
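As an optional sanity check (my addition, not part of the original demo), you can confirm what the CBO now believes via the USER_TABLES dictionary view; it should report the 100000 rows and 10000 blocks we just set:
select num_rows, blocks from user_tables where table_name = 'T_TEST';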
SQL> explain plan for
2 select *
3 from t_test tt
4 where tt.owner = 'SCOTT'
5 and to_char(tt.last_ddl_time, 'yyyyMMdd') >= '20130101'
6 and to_char(tt.created, 'yyyyMMdd') <= '20121231';
Explained
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2796558804
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 300 | 2713 (1)| 00:00:33 |
|* 1 | TABLE ACCESS FULL| T_TEST | 3 | 300 | 2713 (1)| 00:00:33 |
----------------------------------------------------------------------------
A full table scan is used, which would be costly on a big table.
SQL> explain plan for
2 select *
3 from t_test tt
4 where tt.owner = 'SCOTT'
5 and tt.last_ddl_time >= to_date('20130101', 'yyyyMMdd')
6 and tt.created <= to_date('20121231', 'yyyyMMdd');
Explained
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1868991173
--------------------------------------------------------------------------------------------
| Id | Operation                   | Name         | Rows | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |              |    3 |   300 |     4  (0) | 00:00:01 |
|* 1 |  TABLE ACCESS BY INDEX ROWID| T_TEST       |    3 |   300 |     4  (0) | 00:00:01 |
|* 2 |   INDEX RANGE SCAN          | T_TEST_INDX1 |    8 |       |     3  (0) | 00:00:01 |
--------------------------------------------------------------------------------------------
See, now an index range scan is used and the cost is significantly lower.
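A hedged aside, not from the original answer: if the to_char predicate cannot be rewritten (for example, it is generated by a framework), a function-based index on the same expression would let the CBO use an index again. A minimal sketch against the same demo table; the index name is made up:
create index T_TEST_FIDX on T_TEST(to_char(created, 'yyyyMMdd'));
The expression in the WHERE clause must then match the indexed expression exactly for this index to be considered.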
SQL> drop table t_test;
Table dropped
Finally, cleaning up.

For output (display) purposes, use to_char.
For date handling (insert, update, compare, etc.), use to_date.
I don't have a performance link to share, but using to_date in the above query should run faster!
With to_char, the date column must be converted to a string for every row before the comparison can be made, so there is a performance loss.
With to_date, the literal is converted once and the column value is used directly as a date.
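A related tip (my addition, not from the original answer): ANSI date literals sidestep format-mask typos entirely and, like to_date on a constant, are evaluated once, so the indexed column stays usable:
select * from employee where id = 123 and
start_date >= date '2012-01-01' and
end_date <= date '2012-12-31';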

Related

Improve performance of NOT EXISTS in case of large tables

What I am trying to accomplish is getting rows from one table that do not match another table based on specific filters. The two tables are relatively huge so I am trying to filter them based on a certain time range.
The steps I went through so far:
Get the IDs from "T1" for the last 3 days
SELECT
id
FROM T1
WHERE STARTTIME BETWEEN '3 days ago' AND 'now';
Execution time is 4.5s.
Get the IDs from "T2" for the last 3 days
SELECT
id
FROM T2
WHERE STARTTIME BETWEEN '3 days ago' AND 'now';
Execution time is 2.5s.
Now I try to use NOT EXISTS to merge the results from both statements into one
SELECT
CID
FROM T1
WHERE STARTTIME BETWEEN '3 days ago' AND 'now'
AND NOT EXISTS (
SELECT NULL FROM T2
WHERE T1.ID = T2.ID
AND STARTTIME BETWEEN '3 days ago' AND 'now'
);
Execution time is 23s.
I also tried the INNER JOIN logic from this answer thinking it makes sense, but I get no results so I cannot properly evaluate.
Is there a better way to construct this statement that could possibly lead to a faster execution time?
19.01.2022 - Update based on comments
Expected result can contain any number of rows between 1 and 10 000
The used columns have the following indexes:
CREATE INDEX IX_T1_CSTARTTIME
ON T1 (CSTARTTIME ASC)
TABLESPACE MYHOSTNAME_DATA1;
CREATE INDEX IX_T2_CSTARTTIME
ON T2 (CSTARTTIME ASC)
TABLESPACE MYHOSTNAME_DATA2;
NOTE: I just noticed that the indexes are located in different tablespaces; could this be a potential issue as well?
Following the excellent comments from Marmite Bomber, here is the execution plan for the statement:
---------------------------------------------------------------------------------------
| Id | Operation            | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT     |      | 21773 | 2019K |       | 1817K  (1) | 00:01:12 |
|* 1 |  HASH JOIN RIGHT ANTI|      | 21773 | 2019K |  112M | 1817K  (1) | 00:01:12 |
|* 2 |   TABLE ACCESS FULL  | T2   | 2100K |   88M |       | 1292K  (1) | 00:00:51 |
|* 3 |   TABLE ACCESS FULL  | T1   | 2177K |  105M |       |  512K  (1) | 00:00:21 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T2"."ID"="T1"."ID")
2 - filter("STARTTIME">=1642336690000 AND "T2"."ID" IS NOT NULL
AND "STARTTIME"<=1642595934000)
3 - filter("STARTTIME">=1642336690000 AND
"STARTTIME"<=1642595934000)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=1; rowset=256) "T1"."ID"[CHARACTER,38]
2 - (rowset=256) "T2"."ID"[CHARACTER,38]
3 - (rowset=256) "ID"[CHARACTER,38]
Is there a better way to construct this statement that could possibly lead to a faster execution time?
Your basic responsibility is to write the SQL statement; the basic responsibility of Oracle is to come up with an execution plan.
If you are not satisfied (though you should know that combining two sources with NOT EXISTS will take longer than the sum of the times to extract the data from each source), your first step should be to verify the execution plan (and not to try to rewrite the statement).
See some more details on how to proceed here:
EXPLAIN PLAN SET STATEMENT_ID = 'stmt1' into plan_table FOR
SELECT
PAD
FROM T1
WHERE STARTTIME BETWEEN date'2021-01-11' AND date'2021-01-13'
AND NOT EXISTS (
SELECT NULL FROM T2
WHERE T1.ID = T2.ID
AND STARTTIME BETWEEN date'2021-01-11' AND date'2021-01-13'
);
SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', 'stmt1','ALL'));
This is what you should see
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1999 | 150K| 10175 (1)| 00:00:01 |
|* 1 | HASH JOIN RIGHT ANTI| | 1999 | 150K| 10175 (1)| 00:00:01 |
|* 2 | TABLE ACCESS FULL | T2 | 2002 | 26026 | 4586 (1)| 00:00:01 |
|* 3 | TABLE ACCESS FULL | T1 | 4002 | 250K| 5589 (1)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T1"."ID"="T2"."ID")
2 - filter("STARTTIME"<=TO_DATE(' 2021-01-13 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "STARTTIME">=TO_DATE(' 2021-01-11 00:00:00',
'syyyy-mm-dd hh24:mi:ss'))
3 - filter("STARTTIME"<=TO_DATE(' 2021-01-13 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "STARTTIME">=TO_DATE(' 2021-01-11 00:00:00',
'syyyy-mm-dd hh24:mi:ss'))
Note that the hash join (here an anti-join, due to the NOT EXISTS) is the best way to join two large row sources. Note also that the plan does not use indexes. The reason is the same: to access a large amount of data, you do not want to go via an index.
This is contrary to the case of low-cardinality row sources (OLTP), where you would expect to see index access and a NESTED LOOPS ANTI.
Sometimes Oracle gets confused (e.g. by stale statistics) and decides to go the NESTED LOOPS way even for large data, which leads to a long elapsed time.
This should help you at least to decide whether you have a problem or not.
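One standard way to compare the optimizer's estimates with reality (a general technique, not specific to this answer) is the GATHER_PLAN_STATISTICS hint together with DBMS_XPLAN.DISPLAY_CURSOR, sketched here keeping the question's placeholder bounds:
SELECT /*+ gather_plan_statistics */ id
FROM T1
WHERE STARTTIME BETWEEN '3 days ago' AND 'now'
AND NOT EXISTS (
SELECT NULL FROM T2
WHERE T1.ID = T2.ID
AND STARTTIME BETWEEN '3 days ago' AND 'now'
);
SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
The E-Rows (estimated) and A-Rows (actual) columns in the resulting plan show exactly where the estimates diverge.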
Perhaps a simple MINUS operation will accomplish what you are looking for:
select id
from ( select id
from t1
where starttime between '3 days ago' and 'now'
MINUS
select id
from t2
where starttime between '3 days ago' and 'now'
);
for however you actually define starttime between '3 days ago' and 'now'. This literally uses your current queries as-is: the MINUS operation removes from the first result set those values which also exist in the second, and returns what is left. See a MINUS demo here.
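Another equivalent formulation, sketched here with the question's placeholder bounds, is an outer-join anti-join; note that unlike MINUS it does not deduplicate ids:
select t1.id
from t1
left join t2
on t2.id = t1.id
and t2.starttime between '3 days ago' and 'now'
where t1.starttime between '3 days ago' and 'now'
and t2.id is null;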

Why does explain plan show the wrong number of rows?

I am trying to simulate this and to do that I have created the following procedure to insert a large number of rows:
create or replace PROCEDURE a_lot_of_rows is
  i carte.cod%TYPE;
  a carte.autor%TYPE := 'Author #';
  t carte.titlu%TYPE := 'Book #';
  p carte.pret%TYPE := 3.23;
  e carte.nume_editura%TYPE := 'Penguin Random House';
begin
  for i in 8..1000 loop
    insert into carte
    values (i, e, a || i, t || i, p, 'hardcover');
    commit;
  end loop;
  for i in 1001..1200 loop
    insert into carte
    values (i, e, a || i, t || i, p, 'paperback');
    commit;
  end loop;
end;
I have created a bitmap index on the tip_coperta column (which can only have the values 'hardcover' and 'paperback') and then inserted 1200 more rows. However, the result given by the explain plan is the following (before the insert procedure, the table had 7 rows, of which 4 had tip_coperta = 'paperback'):
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 284 | 34 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| CARTE | 4 | 284 | 34 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("TIP_COPERTA"='paperback')
Motto: Bad Statistics are Worse than No Statistics
TL;DR: your statistics are stale and need to be re-collected. If you create an index, the index statistics are gathered automatically, but not the table statistics, which are the ones relevant to your case.
Let's simulate your example with the following script, which creates the table and fills it with 1000 hardcovers and 200 paperbacks.
create table CARTE
(cod int,
autor VARCHAR2(100),
titlu VARCHAR2(100),
pret NUMBER,
nume_editura VARCHAR2(100),
tip_coperta VARCHAR2(100)
);
insert into CARTE
(cod,autor,titlu,pret,nume_editura,tip_coperta)
select rownum,
'Author #'||rownum ,
'Book #'||rownum,
3.23,
'Penguin Random Number',
case when rownum <=1000 then 'hardcover'
else 'paperback' end
from dual connect by level <= 1200;
commit;
This leaves the new table without optimizer object statistics, which you can verify with the following query, which returns only NULLs:
select NUM_ROWS, LAST_ANALYZED from user_tables where table_name = 'CARTE';
So let's check Oracle's impression of the table:
EXPLAIN PLAN SET STATEMENT_ID = 'jara1' into plan_table FOR
select * from CARTE
where tip_coperta = 'paperback'
;
--
SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', 'jara1','ALL'));
The script above produces the execution plan for your query asking for paperbacks, and you see that the Rows estimate is fine (= 200). How is this possible?
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 200 | 46800 | 5 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| CARTE | 200 | 46800 | 5 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("TIP_COPERTA"='paperback')
The explanation is in the Note section of the plan output: dynamic sampling was used.
Basically, while parsing the query, Oracle executes an additional sampling query to estimate the number of rows matching the filter predicate.
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Dynamic sampling is fine for tables that are used seldom, but if the table is queried regularly, we need optimizer statistics to save the overhead of dynamic sampling.
So let's collect statistics:
exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'CARTE' );
Now you see that the statistics are gathered, the total number of rows is correct, and in the column statistics a frequency histogram has been created; this is important for estimating the count of records with a specific value!
select NUM_ROWS, LAST_ANALYZED from user_tables where table_name = 'CARTE';
NUM_ROWS LAST_ANALYZED
---------- -------------------
1200 09.01.2021 16:48:26
select NUM_DISTINCT,HISTOGRAM from user_tab_columns where table_name = 'CARTE' and column_name = 'TIP_COPERTA';
NUM_DISTINCT HISTOGRAM
------------ ---------------
2 FREQUENCY
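If you are curious, the frequency buckets themselves can be inspected via the standard USER_TAB_HISTOGRAMS view (an optional check, not needed for the rest of the demo); for a frequency histogram, ENDPOINT_NUMBER is a cumulative row count:
select endpoint_actual_value, endpoint_number
from user_tab_histograms
where table_name = 'CARTE' and column_name = 'TIP_COPERTA';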
Let's verify how the statistics work now in the execution plan:
EXPLAIN PLAN SET STATEMENT_ID = 'jara1' into plan_table FOR
select * from CARTE
where tip_coperta = 'paperback'
;
--
SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', 'jara1','ALL'));
Basically we see the same correct result
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 200 | 12400 | 5 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| CARTE | 200 | 12400 | 5 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("TIP_COPERTA"='paperback')
Now we delete all but four 'paperback' rows from the table:
delete from CARTE
where tip_coperta = 'paperback' and cod > 1004;
commit;
select count(*) from CARTE
where tip_coperta = 'paperback'
COUNT(*)
----------
4
With this action the statistics went stale and now give a wrong estimate based on obsolete data. This wrong estimate will persist until the statistics are re-collected.
EXPLAIN PLAN SET STATEMENT_ID = 'jara1' into plan_table FOR
select * from CARTE
where tip_coperta = 'paperback'
;
--
SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', 'jara1','ALL'));
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 200 | 12400 | 5 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| CARTE | 200 | 12400 | 5 (0)| 00:00:01 |
---------------------------------------------------------------------------
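Re-collecting the statistics (the same call as before) brings the estimate back in line with the data; the same EXPLAIN PLAN should then report the actual count of 4 again:
exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'CARTE' );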
Set up a policy that keeps your statistics up to date!
It is the table statistics that are important for this cardinality.
You will need to wait for the automatic stats gathering task to fire and gather the new statistics, or you can do it yourself:
exec dbms_stats.gather_table_stats(null,'carte',method_opt=>'for all columns size 1 for columns size 254 TIP_COPERTA')
This will force a histogram on the TIP_COPERTA column and not on the others (you may wish to use for all columns size skewonly or for all columns size auto, or even just let it default to whatever the preferred method_opt parameter is set to). Have a read of this article for details about this parameter.
In some of the later versions of Oracle, depending on where you are running it, you may also have Real-Time Statistics. This is where Oracle will be keeping your statistics up to date even after conventional DML.
It's important to remember that cardinality estimates do not need to be completely accurate for you to obtain reasonable execution plans. A common rule of thumb is that they should be within an order of magnitude, and even then you will probably be fine most of the time.
To obtain an estimate for the number of rows, Oracle needs you to analyze the table (or index). When you create an index, an analyze happens automatically.
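As a quick way to check whether Oracle itself considers the table statistics stale (assuming 11g or later, where the STALE_STATS column exists), you can query USER_TAB_STATISTICS:
select table_name, num_rows, stale_stats
from user_tab_statistics
where table_name = 'CARTE';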

Performance of a sql query

I'm executing a command on a large table. It has around 7 million rows.
The command is like this:
select * from mytable;
Now I'm restricting the number of rows to around 3 million. I'm using this command:
select * from mytable where timest > add_months( sysdate, -12*4 )
I have an index on the timest column, but the costs are almost the same. I would expect them to decrease. What am I doing wrong?
Any clue?
Thank you in advance!
Here are the explain plans:
Using an index for 3 out of 7 million rows would most probably be even more expensive, so Oracle does a full table scan for both queries, which is IMO correct.
You may try a parallel FTS (full table scan): it should be faster, BUT it will put your Oracle server under higher load, so don't do it on heavily loaded multiuser DBs.
Here is an example:
select /*+full(t) parallel(t,4)*/ *
from mytable t
where timest > add_months( sysdate, -12*4 );
To select a very small number of records from a table, use an index. To select a non-trivial portion, use partitioning.
In your case, effective access would be enabled by range partitioning on the timest column.
The big advantage is that only the relevant partitions are accessed.
Here is an example:
create table test(ts date, s varchar2(4000))
PARTITION BY RANGE (ts)
(PARTITION t1p1 VALUES LESS THAN (TO_DATE('2010-01-01', 'YYYY-MM-DD')),
PARTITION t1p2 VALUES LESS THAN (TO_DATE('2015-01-01', 'YYYY-MM-DD')),
PARTITION t1p4 VALUES LESS THAN (MAXVALUE)
);
Query
select * from test where ts < to_date('2009-01-01','yyyy-mm-dd');
will access only partition 1, i.e. only rows before '2010-01-01'.
See Pstart and Pstop in the execution plan:
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 10055 | 9 (0)| 00:00:01 | | |
| 1 | PARTITION RANGE SINGLE| | 5 | 10055 | 9 (0)| 00:00:01 | 1 | 1 |
|* 2 | TABLE ACCESS FULL | TEST | 5 | 10055 | 9 (0)| 00:00:01 | 1 | 1 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("TS"<TO_DATE(' 2009-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
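On 11g and later, interval partitioning can create new range partitions automatically as data arrives; a hedged variant of the same demo, using a new table name to avoid clashing with the one above:
create table test_i(ts date, s varchar2(4000))
PARTITION BY RANGE (ts)
INTERVAL (NUMTOYMINTERVAL(1, 'YEAR'))
(PARTITION t1p1 VALUES LESS THAN (TO_DATE('2010-01-01', 'YYYY-MM-DD')));
Oracle then adds a yearly partition on demand, so no MAXVALUE partition is needed.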
There are (at least) two problems.
add_months(sysdate, -12*4) is a function call, not just a constant, so the optimizer can't use the index here.
Choosing 3 million out of 7 million rows via an index is not a good idea anyway. Yes, you would walk the index tree quickly, yet for each matching entry you would have to visit the table itself (because you need * = all columns). That means there is no sense in using this index.
Thus the index plays no role here.

Missing parenthesis error when creating Function Based Index

I am attempting to create a Function Based Index on a predicate that has a high cost (Oracle).
I want to create an index on the TIME_ID column in the A4ORDERS table that brings back values for month of December:
SELECT * FROM A4ORDERS WHERE TRIM(TO_CHAR(time_id, 'Month')) in ( 'December' );
Creating the FBI:
CREATE INDEX TIME_FIDX ON A4ORDERS(TRIM(TO_CHAR(time_id, 'Month')) in ( 'December' ));
I get a "missing right parenthesis" error and I can't figure out why. Any guidance you can provide would be appreciated.
Solution from Alex Poole's response below that worked:
CREATE INDEX TIME_FIDX ON A4ORDERS (TRIM(TO_CHAR(time_id, 'Month')));
Your create index statement should not have the in ( 'December' ) part; that only belongs in the query. If you create the index as:
CREATE INDEX TIME_FIDX ON A4ORDERS (TRIM(TO_CHAR(time_id, 'Month')));
... then that index can be used by your query:
EXPLAIN PLAN FOR
SELECT * FROM A4ORDERS WHERE TRIM(TO_CHAR(time_id, 'Month')) in ( 'December' );
SELECT plan_table_output FROM TABLE (dbms_xplan.display());
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 29 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| A4ORDERS | 1 | 29 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | TIME_FIDX | 1 | | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access(TRIM(TO_CHAR(INTERNAL_FUNCTION("TIME_ID"),'Month'))='December')
So you can see from the plan that TIME_FIDX is being used. Whether it will give you a significant performance gain remains to be seen of course, and the optimiser could decide it isn't selective enough anyway.
'Month' is NLS-sensitive though; it would be safer to either use the month number, or specify the NLS_DATE_LANGUAGE in the TO_CHAR call, but it has to be done consistently - which will be a little easier with numbers. You could also make it an indexed virtual column.
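A sketch of that indexed virtual column idea (assuming 11g or later; the column and index names here are made up):
ALTER TABLE A4ORDERS ADD (time_month NUMBER GENERATED ALWAYS AS (EXTRACT(MONTH FROM time_id)) VIRTUAL);
CREATE INDEX A4ORDERS_MONTH_IDX ON A4ORDERS (time_month);
SELECT * FROM A4ORDERS WHERE time_month = 12;
Because the expression is numeric, it is immune to NLS_DATE_LANGUAGE settings.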
You can use:
CREATE INDEX TIME_FIDX ON A4ORDERS(TRIM(TO_CHAR(time_id, 'Month')));
But you can also make it a bit simpler:
CREATE INDEX TIME_FIDX ON A4ORDERS(TO_CHAR(time_id, 'mm'));
and write SQL:
SELECT * FROM A4ORDERS WHERE TO_CHAR(time_id, 'mm') in ( '12');
But if you provide more information about your problem (workaround, SQL query, plans etc.) you can receive more help.

Alter session slows down the query through Hibernate

I'm using Oracle 11gR2 and Hibernate 4.2.1.
My application is a search application.
It only has SELECT operations, and all of them are native queries.
Oracle uses case-sensitive sorting by default.
I want to override it to be case-insensitive.
I saw a couple of options here: http://docs.oracle.com/cd/A81042_01/DOC/server.816/a76966/ch2.htm#91066
Now I'm running this query before any search executes:
ALTER SESSION SET NLS_SORT='BINARY_CI'
If I execute the above SQL before executing the search query, Hibernate takes about 15 minutes to return from the search query.
If I do the same in SQL Developer, it returns within a couple of seconds.
Why these two different behaviors, and what can I do to get rid of this slowness?
Note: I always open a new Hibernate session for each search.
Here is my SQL:
SELECT *
FROM (SELECT
row_.*,
rownum rownum_
FROM (SELECT
a, b, c, d, e,
RTRIM(XMLAGG(XMLELEMENT("x", f || ', ') ORDER BY f ASC)
.extract('//text()').getClobVal(), ', ') AS f,
RTRIM(
XMLAGG(XMLELEMENT("x", g || ', ') ORDER BY g ASC)
.extract('//text()').getClobVal(), ', ') AS g
FROM ( SELECT src.a, src.b, src.c, src.d, src.e, src.f, src.g
FROM src src
WHERE upper(pp) = 'PP'
AND upper(qq) = 'QQ'
AND upper(rr) = 'RR'
AND upper(ss) = 'SS'
AND upper(tt) = 'TT')
GROUP BY a, b, c, d, e
ORDER BY b ASC) row_
WHERE rownum <= 400
) WHERE rownum_ > 0;
There are many fields that come with a LIKE operation, and it is a dynamic SQL query. If I use ORDER BY upper(b) ASC, SQL Developer also takes the same time.
But the ORDER BY upper results are the same as with NLS_SORT=BINARY_CI. I have created UPPER(b) indexes, but nothing works for me.
A's length = 10-15 characters
B's length = 34-50 characters
C's length = 5-10 characters
A, B and C are sortable fields via the app.
This SRC table has 3 million+ records.
We finally ended up with an SRC table which is a materialized view.
The business logic of the SQL is completely fine.
All of the sortable fields and others have UPPER indexes.
UPPER() and BINARY_CI may produce the same results but Oracle cannot use them interchangeably. To use an index and BINARY_CI you must create an index like this:
create index src_nlssort_index on src(nlssort(b, 'nls_sort=''BINARY_CI'''));
Sample table and mixed case data
create table src(b varchar2(100) not null);
insert into src select 'MiXeD CAse '||level from dual connect by level <= 100000;
By default the upper() predicate can perform a range scan on the upper() index:
create index src_upper_index on src(upper(b));
explain plan for
select * from src where upper(b) = 'MIXED CASE 1';
select * from table(dbms_xplan.display(format => '-rows -bytes -cost -predicate -note'));
Plan hash value: 1533361696
------------------------------------------------------------------
| Id | Operation | Name | Time |
------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| SRC | 00:00:01 |
| 2 | INDEX RANGE SCAN | SRC_UPPER_INDEX | 00:00:01 |
------------------------------------------------------------------
BINARY_CI and LINGUISTIC will not use the index
alter session set nls_sort='binary_ci';
alter session set nls_comp='linguistic';
explain plan for
select * from src where b = 'MIXED CASE 1';
select * from table(dbms_xplan.display(format => '-rows -bytes -cost -note'));
Plan hash value: 3368256651
---------------------------------------------
| Id | Operation | Name | Time |
---------------------------------------------
| 0 | SELECT STATEMENT | | 00:00:02 |
|* 1 | TABLE ACCESS FULL| SRC | 00:00:02 |
---------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(NLSSORT("B",'nls_sort=''BINARY_CI''')=HEXTORAW('6D69786564
2063617365203100') )
A function-based index on NLSSORT() enables index range scans:
create index src_nlssort_index on src(nlssort(b, 'nls_sort=''BINARY_CI'''));
explain plan for
select * from src where b = 'MIXED CASE 1';
select * from table(dbms_xplan.display(format => '-rows -bytes -cost -note'));
Plan hash value: 478278159
--------------------------------------------------------------------
| Id | Operation | Name | Time |
--------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| SRC | 00:00:01 |
|* 2 | INDEX RANGE SCAN | SRC_NLSSORT_INDEX | 00:00:01 |
--------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access(NLSSORT("B",'nls_sort=''BINARY_CI''')=HEXTORAW('6D69786564
2063617365203100') )
I investigated and found that the parameters NLS_COMP and NLS_SORT may affect how Oracle builds execution plans for strings (when comparing or ordering).
It is not necessary to change the NLS session settings. Adding
ORDER BY NLSSORT(column_name, 'NLS_SORT=BINARY_CI')
and adding a matching NLS index is enough:
create index column_index_binary on my_table (nlssort(column_name, 'NLS_SORT=BINARY_CI'));
I found a clue to a problem in this issue, so I'm paying it back:
Why oracle stored procedure execution time is greatly increased depending on how it is executed?