Are functions considered in Execution Plan?

How will the execution plan be generated when the query has PL/SQL functions (user-defined functions) in the SELECT or WHERE clauses?
Does it calculate the cost for those functions as well and show it in the execution plan, or are the functions just ignored?
Thanks in advance for your help.

User-defined functions directly contribute little or no cost when they are used in either the SELECT or the WHERE clause. If we want the optimizer to make decisions based on the cost of functions, we must manually set a cost with the ASSOCIATE STATISTICS command.
Sample Schema
For this example, create the following medium sized table, and two simple functions - one that is obviously fast, and one that is obviously slow.
create table test1 as
select mod(level, 10) a, mod(level, 10) b
from dual
connect by level <= 100000;
begin
dbms_stats.gather_table_stats(user, 'test1');
end;
/
create or replace function fast_function(p_number number) return number is
begin
return p_number;
end;
/
create or replace function slow_function(p_number number) return number is
v_count number;
begin
select count(*)
into v_count
from all_tables;
return v_count;
end;
/
Functions in SELECT clause - no cost
Calling a function in the SELECT clause does not change the cost at all. The three queries below select a plain column, the fast function, and the slow function:
explain plan for select a from test1;
select * from table(dbms_xplan.display);
explain plan for select fast_function(a) from test1;
select * from table(dbms_xplan.display);
explain plan for select slow_function(a) from test1;
select * from table(dbms_xplan.display);
But all queries generate the same execution plan:
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100K| 292K| 47 (3)| 00:00:01 |
| 1 | TABLE ACCESS FULL| TEST1 | 100K| 292K| 47 (3)| 00:00:01 |
---------------------------------------------------------------------------
Functions in WHERE clause - little cost
When the WHERE clause calls a function instead of comparing plain columns, the cost increases slightly, from 48 to 70. But there is no cost difference between the fast function and the slow function.
explain plan for select * from test1 where a = b;
select * from table(dbms_xplan.display);
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10000 | 60000 | 48 (5)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST1 | 10000 | 60000 | 48 (5)| 00:00:01 |
---------------------------------------------------------------------------
explain plan for select * from test1 where fast_function(a) = b;
select * from table(dbms_xplan.display);
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10000 | 60000 | 70 (35)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST1 | 10000 | 60000 | 70 (35)| 00:00:01 |
---------------------------------------------------------------------------
explain plan for select * from test1 where slow_function(a) = b;
select * from table(dbms_xplan.display);
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10000 | 60000 | 70 (35)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST1 | 10000 | 60000 | 70 (35)| 00:00:01 |
---------------------------------------------------------------------------
ASSOCIATE STATISTICS
We can set the cpu_cost, io_cost, and network_cost for each call to the function. There's probably a way to find those specific costs using tracing, but the cost is an internal magic number that's hard to understand, and the optimizer generally only needs numbers within an order of magnitude to make good decisions. I found the total cost of the query inside the slow function, 1000, and divided it equally into the cpu_cost and io_cost like this:
associate statistics with functions slow_function default cost(500,500,0);
Now the total cost for the plan increases dramatically from 70 to 100,000,000:
explain plan for select * from test1 where b = slow_function(b);
select * from table(dbms_xplan.display);
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10000 | 60000 | 100M (1)| 01:05:07 |
|* 1 | TABLE ACCESS FULL| TEST1 | 10000 | 60000 | 100M (1)| 01:05:07 |
---------------------------------------------------------------------------
More importantly, Oracle can use this cost information to run the functions in the right order. In the below query, Oracle runs the fast function first, which costs almost nothing, and then runs the slow function on the remaining rows.
(It's a bit difficult to tell the order of function execution. The lower overall cost implies how the functions are run. And the order of the functions in the FILTER is another sign. In regular SQL, the two sides of an AND predicate could be run in any order. In an explain plan, the execution order seems to always be left-to-right.)
explain plan for select * from test1 where a = fast_function(a) and b = slow_function(b);
select * from table(dbms_xplan.display);
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000 | 6000 | 10M (1)| 00:06:31 |
|* 1 | TABLE ACCESS FULL| TEST1 | 1000 | 6000 | 10M (1)| 00:06:31 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("A"="FAST_FUNCTION"("A") AND "B"="SLOW_FUNCTION"("B"))
Selectivity
Despite the name "cost based optimizer", we should probably worry more about the cardinality than the cost. The number of rows returned by predicates drives most execution plan choices. Oracle makes a few default guesses about user defined functions. For example, in the below query, Oracle assumes that the function will only satisfy 1% of the rows - that's why the "Rows" in the execution plan says 1000 instead of 100000.
explain plan for select * from test1 where fast_function(a) = 1;
select * from table(dbms_xplan.display);
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000 | 6000 | 70 (35)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST1 | 1000 | 6000 | 70 (35)| 00:00:01 |
---------------------------------------------------------------------------
If we know that the function is much more selective, for example if we know that the function is more likely to only match 0.1% of all rows, we can also use ASSOCIATE STATISTICS to set the default selectivity. The below command sets the selectivity and then the number of rows drops from 1000 to 100.
associate statistics with functions fast_function default selectivity 0.1;
explain plan for select * from test1 where fast_function(a) = 1;
select * from table(dbms_xplan.display);
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100 | 600 | 70 (35)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST1 | 100 | 600 | 70 (35)| 00:00:01 |
---------------------------------------------------------------------------
In our simple plans, the cardinality doesn't matter. But in realistic queries, horrible cardinality estimates cause a chain reaction of bad decisions that leads to slow queries. Helping the optimizer make good cardinality estimates is often the most important part of performance tuning.
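To check what has been associated so far, or to undo it, something like the following should work. This is only a sketch: it assumes the associations were created in the current schema, and it uses the USER_ASSOCIATIONS data dictionary view to list them.

```sql
-- List the default cost and selectivity currently associated with our functions.
select object_name, def_selectivity, def_cpu_cost, def_io_cost, def_net_cost
from user_associations
where object_name in ('FAST_FUNCTION', 'SLOW_FUNCTION');

-- Remove the associations so the optimizer goes back to its built-in defaults.
disassociate statistics from functions slow_function;
disassociate statistics from functions fast_function;
```

Re-running the earlier EXPLAIN PLAN statements after the DISASSOCIATE commands is an easy way to confirm that the plans return to their original estimates.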
Other kinds of functions and statistics
This already long answer only scratches the surface of how functions can affect execution plans. Table functions, functions that return rows of data, are a whole other topic. And I'd bet there are dynamic reoptimization features in newer Oracle versions that will help improve the second or third execution, after the optimizer has learned from its mistakes.
I hope I didn't discourage you from using custom functions. The vast majority of the time, Oracle will make the right decisions without any effort. And when it doesn't, there are mechanisms to help correct those mistakes.

Related

ORA-01652 Why does unused row limiter solve this?

If I run a query without a row limiter I get an ORA-01652 telling me I am out of temp tablespace. (I'm not the DBA and I admittedly don't fully understand this error.) If I add a rownum < 1000000000 it runs in a few seconds (yes, it's limited to a billion rows). My inner query only returns about 1,000 rows. How is an absurdly large row limiter, one that is never reached, making this query run? There should be no difference between the limited and unlimited queries, no?
select
col1,
col2,
...
from
(
select
col1, col2,...
from table1 a
join table2 b-- limiter for performance
on a.column= b.column
or a.col= b.col
where
filter = 'Y'
and rownum <1000000000 -- irrelevant but query doesn't run without it.
) c
join table3 d
on c.id = d.id
We need to see the execution plans for the queries with and without the rownum condition. But as an example, adding a "rownum" predicate can change an execution plan:
SQL> create table t as select * from dba_objects
2 where object_id is not null;
Table created.
SQL>
SQL> create index ix on t ( object_id );
Index created.
SQL>
SQL> set autotrace traceonly explain
SQL> select * from t where object_id > 0 ;
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 82262 | 10M| 445 (2)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 82262 | 10M| 445 (2)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("OBJECT_ID">0)
SQL> select * from t where object_id > 0 and rownum < 10;
Execution Plan
----------------------------------------------------------
Plan hash value: 658510075
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9 | 1188 | 3 (0)| 00:00:01 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 9 | 1188 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | IX | | | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
This is a simplistic example, but you can get similar effects with joins and the like. In particular, the "rownum" predicate might be preventing the inner query from being merged into the outer one, and thus yielding a different plan.
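The effect on an inline view can be checked with the same table t. This is a rough sketch rather than a reproduction of the original query: the join back to t and the aliases v and t2 are made up for illustration, and the plans you get will depend on your version and statistics. Run it with autotrace switched off.

```sql
-- Inline view without ROWNUM: the optimizer is free to merge it
-- into the outer query and plan everything as one join.
explain plan for
select v.object_id, t2.object_name
from (select object_id from t where object_id > 0) v
join t t2 on t2.object_id = v.object_id;

select * from table(dbms_xplan.display);

-- Same query with a ROWNUM predicate that never filters anything out.
-- ROWNUM has to be assigned inside the inline view, so the view is
-- planned as a separate step before the join.
explain plan for
select v.object_id, t2.object_name
from (select object_id from t where object_id > 0 and rownum < 1000000000) v
join t t2 on t2.object_id = v.object_id;

select * from table(dbms_xplan.display);
```

If the second plan contains a VIEW step that the first one lacks, the inline view was not merged, which is the kind of difference that can change join order and temp space usage.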

Why does %cpu and cost increase for this execution plan?

I have a table that is populated with many rows of records. I explain and display the execution plan before the creation of the index for a query
explain plan for
SELECT l_partKey, count(*)
FROM LINEITEM
GROUP BY L_PARTKEY
HAVING COUNT(l_tax) > 2;
SELECT * FROM table(dbms_xplan.display);
And this is the output
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2487493660
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3023 | 15115 | 8821 (1)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | HASH GROUP BY | | 3023 | 15115 | 8821 (1)| 00:00:01 |
| 3 | TABLE ACCESS FULL| LINEITEM | 1800K| 8789K| 8775 (1)| 00:00:01 |
--------------------------------------------------------------------------------
Then I create this index:
CREATE INDEX lineItemIdx ON LINEITEM(l_partKey);
Explain and display the execution plan again and this is the output:
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------
Plan hash value: 573468153
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3023 | 15115 | 1130 (5)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | HASH GROUP BY | | 3023 | 15115 | 1130 (5)| 00:00:01 |
| 3 | INDEX FAST FULL SCAN| LINEITEMIDX | 1800K| 8789K| 1084 (1)| 00:00:01 |
--------------------------------------------------------------------------------------
Does anyone know why the %cpu goes from 1, 1, 1 to 5, 5, 1?
Afterwards, I dropped the index I had created, created a new index on l_partKey, l_tax, and displayed the execution plan again:
CREATE INDEX lineItemIdx ON LINEITEM(l_partKey, l_tax);
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
Plan hash value: 573468153
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3023 | 15115 | 1326 (4)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | HASH GROUP BY | | 3023 | 15115 | 1326 (4)| 00:00:01 |
| 3 | INDEX FAST FULL SCAN| LINEITEMIDX | 1800K| 8789K| 1281 (1)| 00:00:01 |
------------------------------------------------------------------------------------
Now there is a slight increase in cost from 1130, 1130, 1084 to 1326, 1326, 1281 when using the new index on l_partKey, l_tax, compared to the previous index I created. Why is that so? Shouldn't this index speed up the query more than the previous index?
Your query requires counting all the rows in a 1.8-million-row table. Therefore, Oracle has to do some kind of full scan to satisfy it.
Without a useful index, it needed a full table scan: it has to read the entire table. DBMSs have two things that slow them down: IO (reading an entire table from disk) and CPU (computing things). The %CPU figure in the plan is the optimizer's estimate of how much of the cost is CPU work rather than IO. Without an index, most of the estimated cost is the IO needed to deliver the contents of the whole table, so CPU accounts for a smaller percentage. With the index, the disk must deliver less data, so CPU takes a larger percentage of the total. CPU% is not a good measure of the overall cost of queries.
When you added your first index, you reduced the IO needed to satisfy the query, so CPU work became a larger share of the estimated cost.
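One way to see the split behind the %CPU figure is to read the IO and CPU components straight from PLAN_TABLE. This is a minimal sketch, assuming the standard PLAN_TABLE is in use; the statement_id 'lineitem_demo' is a made-up label.

```sql
explain plan set statement_id = 'lineitem_demo' for
select l_partkey, count(*)
from lineitem
group by l_partkey
having count(l_tax) > 2;

-- COST is the total estimate for each step. IO_COST is the IO part of it.
-- CPU_COST is reported in its own units, so compare COST with IO_COST
-- to see how much of the estimate is left over for CPU work.
select id, operation, options, cost, io_cost, cpu_cost
from plan_table
where statement_id = 'lineitem_demo'
order by id;
```

Running this before and after creating the index should show IO_COST dropping sharply while the CPU part stays in the same range, which is what pushes the percentage up.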
Your second index caused your query to cost almost exactly the same as your first. The index items are a little bit larger so Oracle has to do slightly more work to handle them; that may explain the slight cost increase.
Don't forget: Oracle is 43 years old and on version 19. Generations of programmers have worked to optimize it. Trying to guess "why" for a small cost difference is probably not worth your trouble.
Finally, there's something strange in your query. You do SELECT ... COUNT(*) and then HAVING COUNT(column) > 2. COUNT(column) is different from COUNT(*): the former counts the non-null entries in column, where COUNT(*) counts them all. Is that your intention?
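The difference between the two forms of COUNT is easy to check on a throwaway row set. The inline data below is invented purely for illustration:

```sql
-- Three rows, one of which has a NULL in column x.
with sample_rows as (
  select 1 as x from dual union all
  select 2 from dual union all
  select null from dual
)
select count(*) as all_rows,       -- counts every row
       count(x) as non_null_rows   -- skips the row where x is NULL
from sample_rows;
```

Here all_rows is 3 and non_null_rows is 2, so the two expressions only agree when the column never contains NULL.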
Both queries with indexes use INDEX FAST FULL SCAN. That's the holy grail of full scans. Your second index includes your l_tax column, so it's possible to guess it's declared NOT NULL or it might not have been eligible for fast scanning. In that case Oracle knows COUNT(*) is the same as COUNT(l_tax). That's why both indexes come up with the same plan, even with slightly different costs on the steps.

Oracle SQL execution plan changes due to SYS_OP_C2C internal conversion

I'm wondering why cost of this query
select * from address a
left join name n on n.address_id=a.id
where a.street='01';
is higher than
select * from address a
left join name n on n.address_id=a.id
where a.street=N'01';
where address table looks like this
ID NUMBER
STREET VARCHAR2(255 CHAR)
POSTAL_CODE VARCHAR2(255 CHAR)
and name table looks like this
ID NUMBER
ADDRESS_ID NUMBER
NAME VARCHAR2(255 CHAR)
SURNAME VARCHAR2(255 CHAR)
These are costs returned by explain plan
Explain plan for '01'
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3591 | 1595K| 87 (0)| 00:00:02 |
| 1 | NESTED LOOPS OUTER | | 3591 | 1595K| 87 (0)| 00:00:02 |
|* 2 | TABLE ACCESS FULL | ADDRESS | 3 | 207 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| NAME | 1157 | 436K| 47 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | NAME_HSI | 1157 | | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("A"."STREET"='01')
4 - access("N"."ADDRESS_ID"(+)="A"."ID")
Explain plan for N'01'
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 347 | 154K| 50 (0)| 00:00:01 |
| 1 | NESTED LOOPS OUTER | | 347 | 154K| 50 (0)| 00:00:01 |
|* 2 | TABLE ACCESS FULL | ADDRESS | 1 | 69 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| NAME | 1157 | 436K| 47 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | NAME_HSI | 1157 | | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(SYS_OP_C2C("A"."STREET")=U'01')
4 - access("N"."ADDRESS_ID"(+)="A"."ID")
As you can see, the cost for the N'01' query is lower than the cost for '01'. Any idea why? N'01' additionally needs to convert varchar to nvarchar (SYS_OP_C2C()), so its cost should be higher. The other question is why the number of rows estimated for the N'01' query is lower than for '01'.
[EDIT]
Table address has 30 rows.
Table name has 19669 rows.
SYS_OP_C2C is an internal function which does an implicit conversion of varchar2 to national character set using TO_NCHAR function. Thus, the filter completely changes as compared to the filter using normal comparison.
I am not sure why the number of rows is lower, but I can guarantee it could be higher too. Cost estimation won't be affected.
Let's try to see step-by-step in a test case.
SQL> CREATE TABLE t AS SELECT 'a'||LEVEL col FROM dual CONNECT BY LEVEL < 1000;
Table created.
SQL>
SQL> EXPLAIN PLAN FOR SELECT * FROM t WHERE col = 'a10';
Explained.
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 5 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("COL"='a10')
13 rows selected.
SQL>
So far so good. Since there is only one row with the value 'a10', the optimizer estimated one row.
Let's see what happens with the national character set conversion.
SQL> EXPLAIN PLAN FOR SELECT * FROM t WHERE col = N'a10';
Explained.
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 50 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 10 | 50 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(SYS_OP_C2C("COL")=U'a10')
13 rows selected.
SQL>
What happened here? We can see filter(SYS_OP_C2C("COL")=U'a10'), which means an internal function is applied that converts the varchar2 value to nvarchar2. The optimizer now estimates 10 rows instead of 1.
This also prevents a normal index on the column from being used, since a function is now applied to the column. We can tune it by creating a function-based index to avoid the full table scan.
SQL> create index nchar_indx on t(to_nchar(col));
Index created.
SQL>
SQL> EXPLAIN PLAN FOR SELECT * FROM t WHERE to_nchar(col) = N'a10';
Explained.
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 1400144832
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 50 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 10 | 50 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | NCHAR_INDX | 4 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access(SYS_OP_C2C("COL")=U'a10')
14 rows selected.
SQL>
However, will this make the execution plans similar? No. I think that with two different character sets the filter will not be applied in the same way, and that is where the difference comes from.
My research says,
Usually, such scenarios occur when the data coming via an application
is nvarchar2 type, but the table column is varchar2. Thus, Oracle
applies an internal function in the filter operation. My suggestion
is, to know your data well, so that you use similar data types during
design phase.
When worrying about explain plans, it matters whether there are current statistics on the tables. If the statistics do not represent the actual data reasonably well, then the optimizer will make mistakes and estimate cardinalities incorrectly.
You can check how long ago statistics were gathered by querying the data dictionary:
select table_name, last_analyzed
from user_tables
where table_name in ('ADDRESS','NAME');
You can gather statistics for the optimizer to use by calling DBMS_STATS:
begin
dbms_stats.gather_table_stats(user, 'ADDRESS');
dbms_stats.gather_table_stats(user, 'NAME');
end;
/
So perhaps after gathering statistics you will get different explain plans. Perhaps not.
The difference in your explain plans is primarily because the optimizer estimates how many rows it will find in address table differently in the two cases.
In the first case you have an equality predicate with same datatype - this is good and the optimizer can often estimate cardinality (row count) reasonably well for cases like this.
In the second case a function is applied to the column. This is often bad (unless you have function-based indexes) and will force the optimizer to take a wild guess. That wild guess will be different in different versions of Oracle, as the developers of the optimizer try to improve upon it. In some versions the wild guess will simply be something like "I guess 5% of the number of rows in the table."
When comparing different datatypes, it is best to avoid implicit conversions, particularly when, as in this case, the implicit conversion puts a function on the column rather than on the literal. If you have cases where you get a value as datatype NVARCHAR2 and need to use it in a predicate like the one above, it can be a good idea to explicitly convert the value to the datatype of the column.
select * from address a
left join name n on n.address_id=a.id
where a.street = CAST( N'01' AS VARCHAR2(255));
In this case with a literal it does not make sense, of course. Here you would just use your first query. But if it was a variable or function parameter, maybe you could have use cases for doing something like this.
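For the variable case, a sketch could look like the block below. The variable p_street is hypothetical and stands in for a value that arrives from an application as NVARCHAR2; the point is only that the conversion is applied to the value, not to the column.

```sql
declare
  -- Hypothetical input that arrives as NVARCHAR2.
  p_street nvarchar2(255) := N'01';
  v_count  number;
begin
  -- Convert the value once, so the STREET column is compared as-is and
  -- the predicate does not need SYS_OP_C2C on the column.
  select count(*)
  into v_count
  from address a
  where a.street = cast(p_street as varchar2(255));
end;
/
```

Explaining the plain SQL form of this predicate should show a filter on "A"."STREET" without SYS_OP_C2C wrapped around the column.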
As far as I can see, the first query is estimated to return 3591 rows and the second one 347 rows. Oracle therefore expects fewer I/O operations, which is why the cost is lower.
Don't be confused with
N'01' needs additionally convert varchar to nvarchar
Oracle hard-parses a statement once and then uses soft parses for later executions of the same statement, so the longer your Oracle instance has been running, the less parsing work it has to repeat.

Distinct values on indexed column

I have a table with 115 M rows. One of the columns is indexed (the index is called "my_index" in the explain plan below) and not nullable. Moreover, this column has just one distinct value so far.
When I do
select distinct my_col from my_table;
it takes 230 seconds, which is very long. Here is the explain plan.
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 1 | 3 | 22064 (90)| 00:03:23 |
| 1 | SORT UNIQUE NOSORT| | 1 | 3 | 22064 (90)| 00:03:23 |
| 2 | INDEX FULL SCAN | my_index | 115M| 331M| 2363 (2)| 00:00:22 |
Since the column has just one distinct value, why does it take so long? Why doesn't Oracle just check the index entries and quickly find that there is only one possible value for this column? In the explain plan above, the index scan seems to take 22 s, but what is this "SORT UNIQUE NOSORT" which takes ages?
Thank you in advance for your help
Re analyse the table.
EXEC dbms_stats.gather_table_stats('owner','table_name',cascade=>true,method_opt=>'FOR ALL INDEXED COLUMNS SIZE ');
Change Index Type
One distinct value out of 115M rows?! That is called low cardinality, which is not well suited to a 'normal' B-tree index. Consider a bitmap index (if you currently have a B-tree index).
Reconstructing Query
If you are sure that no new values will be added to this column then please remove the distinct clause and rather use as Abhijith said.
SORT UNIQUE NOSORT is not taking too long. You are looking at the estimates from a bad execution plan that is probably the result of unreasonable optimizer parameters. For example, setting the parameter OPTIMIZER_INDEX_COST_ADJ to 1 instead of the default 100 can produce a similar plan. Most likely your query runs slowly because your database is busy or just slow.
What's wrong with the posted execution plan?
The posted execution plan seems unreasonable. Retrieving data should take much longer than simply throwing out duplicates. And the consumer operation, SORT UNIQUE NOSORT, can start at almost the same time as the producer operation, INDEX FULL SCAN. Normally they should finish at almost the same time. The execution plan in the question shows the optimizer estimates. The screenshot below of an active report shows the actual timelines for a very similar query. All steps are starting and stopping at almost the same time.
Sample setup with reasonable plan
Below is a very similar setup, but with a very plain configuration. Same number of rows read (115 million) and returned (1), and almost the exact same segment size (329MB vs 331 MB). The plan shows almost all of the time being spent on the INDEX FULL SCAN.
drop table test1 purge;
create table test1(a number not null, b number, c number) nologging;
begin
for i in 1 .. 115 loop
insert /*+ append */ into test1 select 1, level, level
from dual connect by level <= 1000000;
commit;
end loop;
end;
/
create index test1_idx on test1(a);
begin
dbms_stats.gather_table_stats(user, 'TEST1');
end;
/
explain plan for select /*+ index(test1) */ distinct a from test1;
select * from table(dbms_xplan.display);
Plan hash value: 77032494
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 244K (4)| 00:48:50 |
| 1 | SORT UNIQUE NOSORT| | 1 | 3 | 244K (4)| 00:48:50 |
| 2 | INDEX FULL SCAN | TEST1_IDX | 115M| 329M| 237K (1)| 00:47:30 |
--------------------------------------------------------------------------------
Re-creating a bad plan
--Set optimizer_index_cost_adj to a ridiculously low value.
--This changes the INDEX FULL SCAN estimate from 47 minutes to 29 seconds.
alter session set optimizer_index_cost_adj = 1;
--Changing the CPUSPEEDNW to 800 will exactly re-create the time estimate
--for SORT UNIQUE NOSORT. This value is not ridiculous, and it is not
--something you should normally change. But it does imply your CPUs are
--slow. My 2+ year-old desktop had an original score of 1720.
begin
dbms_stats.set_system_stats( 'CPUSPEEDNW', 800);
end;
/
explain plan for select /*+ index(test1) */ distinct a from test1;
select * from table(dbms_xplan.display);
Plan hash value: 77032494
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 16842 (86)| 00:03:23 |
| 1 | SORT UNIQUE NOSORT| | 1 | 3 | 16842 (86)| 00:03:23 |
| 2 | INDEX FULL SCAN | TEST1_IDX | 115M| 329M| 2389 (2)| 00:00:29 |
--------------------------------------------------------------------------------
How to investigate
Check the parameters.
select name, value from v$parameter where name like 'optimizer_index%';
NAME VALUE
---- -----
optimizer_index_cost_adj 1
optimizer_index_caching 0
Also check the system statistics.
select * from sys.aux_stats$;
+---------------+------------+-------+------------------+
| SNAME | PNAME | PVAL1 | PVAL2 |
+---------------+------------+-------+------------------+
| SYSSTATS_INFO | STATUS | | COMPLETED |
| SYSSTATS_INFO | DSTART | | 09-23-2013 17:52 |
| SYSSTATS_INFO | DSTOP | | 09-23-2013 17:52 |
| SYSSTATS_INFO | FLAGS | 1 | |
| SYSSTATS_MAIN | CPUSPEEDNW | 800 | |
| SYSSTATS_MAIN | IOSEEKTIM | 10 | |
| SYSSTATS_MAIN | IOTFRSPEED | 4096 | |
| SYSSTATS_MAIN | SREADTIM | | |
| SYSSTATS_MAIN | MREADTIM | | |
| SYSSTATS_MAIN | CPUSPEED | | |
| SYSSTATS_MAIN | MBRC | | |
| SYSSTATS_MAIN | MAXTHR | | |
| SYSSTATS_MAIN | SLAVETHR | | |
+---------------+------------+-------+------------------+
To find out where the time is really spent, use a tool like the active report.
select dbms_sqltune.report_sql_monitor(sql_id => '5s63uf4au6hcm',
type => 'active') from dual;
If there are only a few distinct values of the column, try a compressed index:
create index my_index on my_table (my_col) compress;
This will store each distinct value of the column only once, hopefully reducing the execution time of your query.
As a bonus: use this to see the actual plan used for a query:
select /*+ gather_plan_statistics */ distinct my_col from my_table;
SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR);
The gather_plan_statistics hint will collect more data (it will take longer to execute), but it works without it too. See the documentation of DBMS_XPLAN.DISPLAY_CURSOR for more details.
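To compare the optimizer's estimates with what actually happened, DBMS_XPLAN.DISPLAY_CURSOR accepts a format argument. This is a minimal sketch, assuming it runs in the same session immediately after the query and, in SQL*Plus, with serveroutput switched off:

```sql
select /*+ gather_plan_statistics */ distinct my_col from my_table;

-- 'ALLSTATS LAST' adds the actual row counts (A-Rows) and timings of the
-- last execution next to the optimizer's estimates (E-Rows).
select * from table(dbms_xplan.display_cursor(format => 'ALLSTATS LAST'));
```

A large gap between E-Rows and A-Rows on a step is usually the first place to look when a plan seems unreasonable.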
Read the explain plan carefully.
It scans the whole index to find the values you are trying to fetch.
Then it applies the distinct operation (it tries to retrieve the unique values). Although you say there is only one unique value, it has to scan the whole index to get the values. Oracle does not know that there is only one distinct value in the index. You can restrict the query to rownum = 1 to get a quick answer.
Try this to get the quick answer
select my_col from my_table where rownum = 1;
Adding an index on a column with very few distinct values is rarely a good idea. It is bad for the table and for the application overall. It just does not make sense.

Total cost of a query through Oracle's explain plan

I am a bit new to Oracle and I have a question regarding Oracle's explain plan. I have used the 'auto-trace' feature for a particular query.
SQL> SELECT * from myTable;
11 rows selected.
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
Plan hash value: 1233351234
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 11 | 330 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS FULL| MYTABLE| 11 | 330 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
My question is: if I want to calculate the 'total' cost of this query, is it 6 (3+3) or is it only 3? Suppose I had a larger query with more steps in the plan. Do I have to add up all the values in the cost column to get the total cost, or is the first value (ID=0) the total cost of the query?
The cost is 3. The plan is shown as a hierarchy, and the cost of each sub-component is already included in its parent component.
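The hierarchy is easier to see when the plan rows are read directly from PLAN_TABLE with their parent ids. This is a small sketch, assuming the standard PLAN_TABLE; the statement_id 'cost_demo' is a made-up label.

```sql
explain plan set statement_id = 'cost_demo' for
select * from myTable;

-- Each row's COST already includes the cost of its child steps,
-- so the total for the statement is the COST on the row with ID = 0.
select id,
       parent_id,
       lpad(' ', 2 * depth) || operation || ' ' || options as step,
       cost
from plan_table
where statement_id = 'cost_demo'
order by id;
```

For the plan above this returns two rows, with the TABLE ACCESS step listed as a child of the SELECT STATEMENT step, so the two costs of 3 are the same work reported at two levels rather than two amounts to add.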
You might also want to take a look at some of the responses to:
How do you interpret a query's explain plan?