Total cost of a query through Oracle's explain plan - sql

I am a bit new to Oracle and have a question regarding Oracle's explain plan. I used the autotrace feature for a particular query.
SQL> SELECT * from myTable;
11 rows selected.
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
Plan hash value: 1233351234
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 11 | 330 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS FULL| MYTABLE| 11 | 330 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
My question is: if I want to calculate the 'total' cost of this query, is it 6 (3+3) or is it only 3? Suppose I had a larger query with more steps in the plan; do I have to add up all the values in the Cost column to get the total cost, or is the first value (Id = 0) the total cost of the query?

The cost is 3. The plan is shown as a hierarchy, and the cost of each sub-component is already included in its parent component, so the top line (Id = 0) is the total cost.
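To see the rollup in action, you can explain a slightly larger statement. This is an illustrative sketch: the second table and all plan numbers are made up, not taken from your question.

```sql
-- Hypothetical join against a second (made-up) table, to show how
-- child costs roll up into the parent lines of the plan.
explain plan for
select *
from myTable t
join otherTable o on o.id = t.id;

select * from table(dbms_xplan.display);

-- Illustrative output (costs invented for the example):
-- | Id | Operation          | Name       | Cost (%CPU)|
-- |  0 | SELECT STATEMENT   |            |     7  (15)|  <-- total; children included
-- |  1 |  HASH JOIN         |            |     7  (15)|
-- |  2 |   TABLE ACCESS FULL| MYTABLE    |     3   (0)|
-- |  3 |   TABLE ACCESS FULL| OTHERTABLE |     3   (0)|
```

Note that the two full scans cost 3 each, but the statement total is not 3+3+7: line 0 already contains everything below it.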

You might also want to take a look at some of the responses to:
How do you interpret a query's explain plan?

Related

How is this cardinality being calculated in Explain plan?

I am analyzing the explain plan for the following statement
SELECT * FROM friends WHERE SUBSTR(activity,1,2) = '49';
and Oracle SQL Developer tells me that it has a cardinality of 1513 and cost of 1302.
How are these calculations performed? Can they be reproduced with a query (i.e., run a SELECT and obtain the same values)?
The cardinality generated by an explain plan can be based on many factors, but in your code Oracle is probably just guessing that the SUBSTR expression will return 1% of all rows from the table.
For example, we can recreate your cardinality estimate by creating a simple table with 151,300 rows:
drop table friends;
create table friends(activity varchar2(100));
create index friends_idx on friends(activity);
insert into friends select level from dual connect by level <= 1513 * 100;
begin
dbms_stats.gather_table_stats(user, 'FRIENDS', no_invalidate => false);
end;
/
The resulting explain plan estimates the query will return 1% of the table, or 1513 rows:
explain plan for SELECT * FROM friends WHERE SUBSTR(activity,1,2) = '49';
select * from table(dbms_xplan.display);
Plan hash value: 3524934291
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1513 | 9078 | 72 (6)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| FRIENDS | 1513 | 9078 | 72 (6)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(SUBSTR("ACTIVITY",1,2)='49')
The above code is the simplest explanation, but there are potentially dozens of other weird things that are going on with your query. Running EXPLAIN PLAN FOR SELECT... and then SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY); is often enough to investigate the cardinality. Pay special attention to the "Note" section for any unexpected gotchas.
Not all of these cardinality rules and features are documented. But if you have a lot of free time, and want to understand the math behind it all, run some 10053 trace files and read Jonathan Lewis' blog and book. His book also explains how the "cost" is generated, but the calculations are so complicated that it's not worth worrying about.
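If you do want to dig into the optimizer's arithmetic yourself, a common way to produce a 10053 trace looks like the below. This is a sketch; the event syntax is standard, but where your trace file lands depends on your diagnostic settings.

```sql
-- Enable the optimizer trace for this session only.
alter session set events '10053 trace name context forever, level 1';

-- Any hard parse while the event is set writes the optimizer's
-- decisions (cardinality and cost math included) to a trace file.
explain plan for SELECT * FROM friends WHERE SUBSTR(activity,1,2) = '49';

-- Turn the event back off.
alter session set events '10053 trace name context off';

-- Locate the trace file for this session.
select value from v$diag_info where name = 'Default Trace File';
```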
Why doesn't Oracle calculate a perfect cardinality estimate?
It's too expensive to calculate actual cardinalities before running the queries. To create an always-perfect estimate for the SUBSTR operation, Oracle would have to run something like the below query:
SELECT SUBSTR(activity,1,2), COUNT(*)
FROM friends
GROUP BY SUBSTR(activity,1,2);
For my sample data, the above query returns 99 counts, and determines that the cardinality estimate should be 1111 for the original query.
But the above query has to first read all the data from FRIENDS.ACTIVITY, which requires either an index fast full scan or a full table scan. Then the data has to be sorted or hashed to get the counts per group (which is likely an O(N*LOG(N)) operation). If the table is large, the intermediate results won't fit in memory and must be written and then read from disk.
Pre-calculating the cardinality would be more work than the actual query itself. The results could perhaps be saved, but storing those results could take up a lot of space, and how does the database know that the predicate will ever be needed again? And even if the pre-calculated cardinalities were stored, as soon as someone modifies the table those values may become worthless.
And this whole effort assumes that the functions are deterministic. While SUBSTR works reliably, what if there was a custom function like DBMS_RANDOM.VALUE? These problems are both theoretically impossible (the halting problem), and very difficult in practice. Instead, the optimizer relies on guesses like DBA_TABLES.NUM_ROWS (from when the statistics were last gathered) * 0.01 for "complex" predicates.
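The 1% guess above can be checked directly against the dictionary. For the 151,300-row FRIENDS table created earlier, NUM_ROWS * 0.01 reproduces the plan's estimate:

```sql
-- The "complex predicate" guess: 1% of the rows known at the last stats gather.
select num_rows,
       round(num_rows * 0.01) as guessed_cardinality  -- 151300 * 0.01 = 1513
from user_tables
where table_name = 'FRIENDS';
```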
Dynamic Sampling
Dynamic sampling, also known as dynamic statistics, will pre-run parts of your SQL statement to create a better estimate. You can set the amount of data to be sampled; at level 10, Oracle will effectively run the whole thing ahead of time to determine the cardinality. This feature can obviously be pretty slow, and there are lots of weird edge cases and other features I'm not discussing here, but for your query it creates a perfect estimate of 1,111 rows:
EXPLAIN PLAN FOR SELECT /*+ dynamic_sampling(10) */ * FROM friends WHERE SUBSTR(activity,1,2) = '49';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
Plan hash value: 3524934291
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1111 | 6666 | 72 (6)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| FRIENDS | 1111 | 6666 | 72 (6)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(SUBSTR("ACTIVITY",1,2)='49')
Note
-----
- dynamic statistics used: dynamic sampling (level=10)
Dynamic Reoptimization
Oracle can keep track of the number of rows at run time and adjust the plan accordingly. This feature doesn't help with your simple sample query. But if the table is used as part of a join, where cardinality estimates become more important, Oracle will build multiple versions of the execution plan and pick one at run time depending on the actual cardinality.
In the below explain plan, you can see the estimate is still the same old 1513. But if the actual number is much lower at run time, Oracle will disable the HASH JOIN operation meant for a large number of rows, and will switch to the NESTED LOOPS operation that is better suited for a smaller number of rows.
EXPLAIN PLAN FOR
SELECT *
FROM friends friends1
JOIN friends friends2
ON friends1.activity = friends2.activity
WHERE SUBSTR(friends1.activity,1,2) = '49';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY(format => '+adaptive'));
Plan hash value: 215764417
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1530 | 18360 | 143 (5)| 00:00:01 |
| * 1 | HASH JOIN | | 1530 | 18360 | 143 (5)| 00:00:01 |
|- 2 | NESTED LOOPS | | 1530 | 18360 | 143 (5)| 00:00:01 |
|- 3 | STATISTICS COLLECTOR | | | | | |
| * 4 | TABLE ACCESS FULL | FRIENDS | 1513 | 9078 | 72 (6)| 00:00:01 |
|- * 5 | INDEX RANGE SCAN | FRIENDS_IDX | 1 | 6 | 168 (2)| 00:00:01 |
| 6 | TABLE ACCESS FULL | FRIENDS | 151K| 886K| 70 (3)| 00:00:01 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("FRIENDS1"."ACTIVITY"="FRIENDS2"."ACTIVITY")
4 - filter(SUBSTR("FRIENDS1"."ACTIVITY",1,2)='49')
5 - access("FRIENDS1"."ACTIVITY"="FRIENDS2"."ACTIVITY")
Note
-----
- this is an adaptive plan (rows marked '-' are inactive)
Expression Statistics
Expression statistics tell Oracle to gather additional types of statistics. We can force Oracle to gather statistics on the SUBSTR expression, and those statistics can then be used for more accurate estimates. In the below example the final estimate is only slightly different; expression statistics alone don't work well here, but that was just bad luck in this case.
SELECT dbms_stats.create_extended_stats(extension => '(SUBSTR(activity,1,2))', ownname => user, tabname => 'FRIENDS')
FROM DUAL;
begin
dbms_stats.gather_table_stats(user, 'FRIENDS');
end;
/
EXPLAIN PLAN FOR SELECT * FROM friends WHERE SUBSTR(activity,1,2) = '49';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
Plan hash value: 3524934291
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1528 | 13752 | 72 (6)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| FRIENDS | 1528 | 13752 | 72 (6)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(SUBSTR("ACTIVITY",1,2)='49')
Expression Statistics and Histograms
With the addition of a histogram, we're finally creating something pretty similar to what your teacher described. When the expression statistics are gathered, a histogram will save information about the number of unique values in up to 255 different ranges, or buckets. In our case, since there are only 99 unique values, the histogram will perfectly estimate the number of rows for '49' as 1111.
--(There are several ways to gather histograms. Instead of directly forcing it, I prefer to call the query
-- multiple times so that Oracle will register the need for a histogram, and automatically create one.)
SELECT * FROM friends WHERE SUBSTR(activity,1,2) = '49';
SELECT * FROM friends WHERE SUBSTR(activity,1,2) = '49';
begin
dbms_stats.gather_table_stats(user, 'FRIENDS');
end;
/
EXPLAIN PLAN FOR SELECT * FROM friends WHERE SUBSTR(activity,1,2) = '49';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
Plan hash value: 3524934291
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1111 | 9999 | 72 (6)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| FRIENDS | 1111 | 9999 | 72 (6)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(SUBSTR("ACTIVITY",1,2)='49')
Summary
Oracle will not automatically pre-run all predicates to perfectly estimate cardinalities. But there are several mechanisms we can use to get Oracle to do something very similar for a small number of queries that we care about.
The situation gets even more complicated when you consider bind variables - what if the value '49' changes frequently? (Adaptive Cursor Sharing can help with that.) Or what if a huge amount of rows are modified, how do we update statistics quickly? (Online Statistics Gathering and Incremental Statistics can help with that.)
The optimizer doesn't really optimize. There's only enough time to satisfice.

Are functions considered in Execution Plan?

How will the execution plan be generated when the query has PL/SQL functions (user-defined functions) in the SELECT or WHERE clauses?
Does Oracle calculate the cost of those functions and show it in the execution plan, or are the functions simply ignored?
Thanks in advance for your help.
User-defined functions contribute little or no cost when they are used in either the SELECT or the WHERE clause. If we want the optimizer to make decisions based on the cost of functions, we must manually set a cost with the ASSOCIATE STATISTICS command.
Sample Schema
For this example, create the following medium-sized table and two simple functions: one that is obviously fast, and one that is obviously slow.
create table test1 as
select mod(level, 10) a, mod(level, 10) b
from dual
connect by level <= 100000;
begin
dbms_stats.gather_table_stats(user, 'test1');
end;
/
create or replace function fast_function(p_number number) return number is
begin
return p_number;
end;
/
create or replace function slow_function(p_number number) return number is
v_count number;
begin
select count(*)
into v_count
from all_tables;
return v_count;
end;
/
Functions in SELECT clause - no cost
Calling a function in the SELECT clause does not change the cost at all. The three queries below select a plain column, the fast function, and the slow function:
explain plan for select a from test1;
select * from table(dbms_xplan.display);
explain plan for select fast_function(a) from test1;
select * from table(dbms_xplan.display);
explain plan for select slow_function(a) from test1;
select * from table(dbms_xplan.display);
But all queries generate the same execution plan:
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100K| 292K| 47 (3)| 00:00:01 |
| 1 | TABLE ACCESS FULL| TEST1 | 100K| 292K| 47 (3)| 00:00:01 |
---------------------------------------------------------------------------
Functions in WHERE clause - little cost
When the functions are called in the WHERE clause, the cost increases slightly, from 48 to 70. But there is no cost difference between the fast function and the slow function.
explain plan for select * from test1 where a = b;
select * from table(dbms_xplan.display);
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10000 | 60000 | 48 (5)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST1 | 10000 | 60000 | 48 (5)| 00:00:01 |
---------------------------------------------------------------------------
explain plan for select * from test1 where fast_function(a) = b;
select * from table(dbms_xplan.display);
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10000 | 60000 | 70 (35)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST1 | 10000 | 60000 | 70 (35)| 00:00:01 |
---------------------------------------------------------------------------
explain plan for select * from test1 where slow_function(a) = b;
select * from table(dbms_xplan.display);
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10000 | 60000 | 70 (35)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST1 | 10000 | 60000 | 70 (35)| 00:00:01 |
---------------------------------------------------------------------------
ASSOCIATE STATISTICS
We can set the cpu_cost, io_cost, and network_cost for each call to the function. There's probably a way to find those specific costs using tracing, but the cost is an internal magic number that's hard to understand, and the optimizer generally only needs numbers within an order of magnitude to make good decisions. I found the total cost of the query inside the slow function, 1000, and divided it equally into the cpu_cost and io_cost like this:
associate statistics with functions slow_function default cost(500,500,0);
Now the total cost for the plan increases dramatically from 70 to 100,000,000:
explain plan for select * from test1 where b = slow_function(b);
select * from table(dbms_xplan.display);
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10000 | 60000 | 100M (1)| 01:05:07 |
|* 1 | TABLE ACCESS FULL| TEST1 | 10000 | 60000 | 100M (1)| 01:05:07 |
---------------------------------------------------------------------------
More importantly, Oracle can use this cost information to run the functions in the right order. In the below query, Oracle runs the fast function first, which costs almost nothing, and then runs the slow function on the remaining rows.
(It's a bit difficult to tell the order of function execution. The lower overall cost implies the order in which the functions are run, and the order of the functions in the FILTER predicate is another sign. In regular SQL, the two sides of an AND predicate could run in either order; in an explain plan, the execution order appears to always be left-to-right.)
explain plan for select * from test1 where a = fast_function(a) and b = slow_function(b);
select * from table(dbms_xplan.display);
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000 | 6000 | 10M (1)| 00:06:31 |
|* 1 | TABLE ACCESS FULL| TEST1 | 1000 | 6000 | 10M (1)| 00:06:31 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("A"="FAST_FUNCTION"("A") AND "B"="SLOW_FUNCTION"("B"))
Selectivity
Despite the name "cost based optimizer", we should probably worry more about the cardinality than the cost. The number of rows returned by predicates drives most execution plan choices. Oracle makes a few default guesses about user defined functions. For example, in the below query, Oracle assumes that the function will only satisfy 1% of the rows - that's why the "Rows" in the execution plan says 1000 instead of 100000.
explain plan for select * from test1 where fast_function(a) = 1;
select * from table(dbms_xplan.display);
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000 | 6000 | 70 (35)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST1 | 1000 | 6000 | 70 (35)| 00:00:01 |
---------------------------------------------------------------------------
If we know that the function is much more selective, say matching only 0.1% of all rows, we can also use ASSOCIATE STATISTICS to set a default selectivity. The below command sets the selectivity, and the estimated number of rows drops from 1000 to 100.
associate statistics with functions fast_function default selectivity 0.1;
explain plan for select * from test1 where fast_function(a) = 1;
select * from table(dbms_xplan.display);
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100 | 600 | 70 (35)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST1 | 100 | 600 | 70 (35)| 00:00:01 |
---------------------------------------------------------------------------
In our simple plans, the cardinality doesn't matter. But in realistic queries, horrible cardinality estimates cause a chain reaction of bad decisions that leads to slow queries. Helping the optimizer make good cardinality estimates is often the most important part of performance tuning.
Other kinds of functions and statistics
This already-long answer still only scratches the surface of how functions can affect execution plans. Table functions (functions that return rows of data) are a whole other topic. And I'd bet there are dynamic reoptimization features in newer Oracle versions that will help improve the second or third execution, after the optimizer has learned from its mistakes.
I hope I didn't discourage you from using custom functions. The vast majority of the time, Oracle will make the right decisions without any effort. And when it doesn't, there are mechanisms to help correct those mistakes.

Why does %cpu and cost increase for this execution plan?

I have a table populated with many rows. Before creating any index, I explain and display the execution plan for this query:
explain plan for
SELECT l_partKey, count(*)
FROM LINEITEM
GROUP BY L_PARTKEY
HAVING COUNT(l_tax) > 2;
SELECT * FROM table(dbms_xplan.display);
And this is the output
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2487493660
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3023 | 15115 | 8821 (1)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | HASH GROUP BY | | 3023 | 15115 | 8821 (1)| 00:00:01 |
| 3 | TABLE ACCESS FULL| LINEITEM | 1800K| 8789K| 8775 (1)| 00:00:01 |
--------------------------------------------------------------------------------
Then I create this index:
CREATE INDEX lineItemIdx ON LINEITEM(l_partKey);
Explain and display the execution plan again and this is the output:
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------
Plan hash value: 573468153
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3023 | 15115 | 1130 (5)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | HASH GROUP BY | | 3023 | 15115 | 1130 (5)| 00:00:01 |
| 3 | INDEX FAST FULL SCAN| LINEITEMIDX | 1800K| 8789K| 1084 (1)| 00:00:01 |
--------------------------------------------------------------------------------------
Does anyone know why the %cpu goes from 1, 1, 1 to 5, 5, 1?
Afterwards, I removed the index I created, created a new index on (l_partKey, l_tax), and explained and displayed the plan again:
CREATE INDEX lineItemIdx ON LINEITEM(l_partKey, l_tax);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------
Plan hash value: 573468153
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3023 | 15115 | 1326 (4)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | HASH GROUP BY | | 3023 | 15115 | 1326 (4)| 00:00:01 |
| 3 | INDEX FAST FULL SCAN| LINEITEMIDX | 1800K| 8789K| 1281 (1)| 00:00:01 |
--------------------------------------------------------------------------------------
Now there is a slight increase in cost, from 1130, 1130, 1084 to 1326, 1326, 1281, when using the new index on (l_partKey, l_tax) compared to the previous index. Why is that so? Shouldn't this index speed up the query more than the previous one did?
Your query requires counting all the rows in a 1.8 megarow table. Therefore, Oracle has to do some kind of full scan to satisfy it.
Without a useful index, Oracle needed a full table scan: it had to read the entire table. DBMSs are slowed down by two things: IO (reading data from disk) and CPU (computing things). Without an index, the CPU spends most of the query's elapsed time waiting on the disk to deliver the whole table, so the CPU is active for a smaller percentage of the elapsed time. When you added your first index, you reduced the IO needed to satisfy the query, so the CPU became active for a larger percentage. %CPU is not a good measure of the overall cost of a query.
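It's worth noting that the %CPU column is derived from the optimizer's cost estimates, not measured at run time. A sketch of inspecting the raw components, assuming a populated PLAN_TABLE (the exact rounding Oracle applies may differ slightly):

```sql
-- After EXPLAIN PLAN FOR <query>, look at the cost components per plan line.
-- %CPU in the display is roughly the share of the total cost attributed to CPU.
select id, operation, cost, io_cost, cpu_cost,
       ceil((cost - io_cost) * 100 / nullif(cost, 0)) as approx_pct_cpu
from plan_table
order by id;
```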
Your second index caused your query to cost almost exactly the same as your first. The index items are a little bit larger so Oracle has to do slightly more work to handle them; that may explain the slight cost increase.
Don't forget: Oracle is 43 years old and on version 19. Generations of programmers have worked to optimize it. Trying to guess "why" for a small cost difference is probably not worth your trouble.
Finally, there's something strange in your query. You do SELECT ... COUNT(*) and then HAVING COUNT(column) > 2. COUNT(column) is different from COUNT(*): the former counts the non-null entries in column, where COUNT(*) counts them all. Is that your intention?
Both indexed queries use an INDEX FAST FULL SCAN, the holy grail of full scans. Your second index includes the l_tax column, and it's a fair guess that l_tax is declared NOT NULL; otherwise the indexes might not have been eligible for fast scanning at all. In that case Oracle knows COUNT(*) is the same as COUNT(l_tax). That's why both indexes come up with the same plan, even with slightly different costs on the steps.
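The COUNT(*) versus COUNT(column) difference is easy to demonstrate. The table below is made up purely for illustration:

```sql
-- Illustrative table (name is made up) showing COUNT(*) vs COUNT(column).
create table count_demo (val number);
insert into count_demo values (1);
insert into count_demo values (null);

select count(*)   as count_all,   -- counts both rows: 2
       count(val) as count_val    -- counts only the non-null row: 1
from count_demo;
```

If l_tax were nullable, the two counts could legitimately differ, and the HAVING clause would mean something different from what COUNT(*) suggests.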

Oracle SQL execution plan changes due to SYS_OP_C2C internal conversion

I'm wondering why cost of this query
select * from address a
left join name n on n.adress_id=a.id
where a.street='01';
is higher than
select * from address a
left join name n on n.adress_id=a.id
where a.street=N'01';
where address table looks like this
ID NUMBER
STREET VARCHAR2(255 CHAR)
POSTAL_CODE VARCHAR2(255 CHAR)
and name table looks like this
ID NUMBER
ADDRESS_ID NUMBER
NAME VARCHAR2(255 CHAR)
SURNAME VARCHAR2(255 CHAR)
These are costs returned by explain plan
Explain plan for '01'
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3591 | 1595K| 87 (0)| 00:00:02 |
| 1 | NESTED LOOPS OUTER | | 3591 | 1595K| 87 (0)| 00:00:02 |
|* 2 | TABLE ACCESS FULL | ADDRESS | 3 | 207 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| NAME | 1157 | 436K| 47 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | NAME_HSI | 1157 | | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("A"."STREET"='01')
4 - access("N"."ADDRESS_ID"(+)="A"."ID")
Explain plan for N'01'
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 347 | 154K| 50 (0)| 00:00:01 |
| 1 | NESTED LOOPS OUTER | | 347 | 154K| 50 (0)| 00:00:01 |
|* 2 | TABLE ACCESS FULL | ADDRESS | 1 | 69 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| NAME | 1157 | 436K| 47 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | NAME_HSI | 1157 | | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(SYS_OP_C2C("A"."STREET")=U'01')
4 - access("N"."ADDRESS_ID"(+)="A"."ID")
As you can see, the cost for the N'01' query is lower than the cost for '01'. Any idea why? The N'01' query additionally needs to convert varchar to nvarchar (SYS_OP_C2C()), so its cost should be higher. The other question is: why is the row estimate for the N'01' query lower than for '01'?
[EDIT]
Table address has 30 rows.
Table name has 19669 rows.
SYS_OP_C2C is an internal function which does an implicit conversion of varchar2 to national character set using TO_NCHAR function. Thus, the filter completely changes as compared to the filter using normal comparison.
I am not sure why the row estimate is lower, but I can guarantee it could just as well have been higher. The cost estimation itself is not affected by the conversion.
Let's try to see step-by-step in a test case.
SQL> CREATE TABLE t AS SELECT 'a'||LEVEL col FROM dual CONNECT BY LEVEL < 1000;
Table created.
SQL>
SQL> EXPLAIN PLAN FOR SELECT * FROM t WHERE col = 'a10';
Explained.
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 5 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
1 - filter("COL"='a10')
13 rows selected.
SQL>
So far so good. Since there is only one row with the value 'a10', the optimizer estimated one row.
Let's see with the national characterset conversion.
SQL> EXPLAIN PLAN FOR SELECT * FROM t WHERE col = N'a10';
Explained.
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 50 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 10 | 50 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
1 - filter(SYS_OP_C2C("COL")=U'a10')
13 rows selected.
SQL>
What happened here? We can see filter(SYS_OP_C2C("COL")=U'a10'), which means an internal function is applied to convert the varchar2 value to nvarchar2. The estimate is now 10 rows: with a function wrapped around the column, the optimizer can no longer use the column statistics and falls back to a default guess (about 1% of the table).
Applying a function to the column will also suppress any index usage. We can tune this by creating a function-based index to avoid the full table scan.
SQL> create index nchar_indx on t(to_nchar(col));
Index created.
SQL>
SQL> EXPLAIN PLAN FOR SELECT * FROM t WHERE to_nchar(col) = N'a10';
Explained.
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 1400144832
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 50 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 10 | 50 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | NCHAR_INDX | 4 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
---------------------------------------------------
2 - access(SYS_OP_C2C("COL")=U'a10')
14 rows selected.
SQL>
However, will this make the execution plans identical? No. With two different character sets involved, the filter is not applied the same way in both cases, and that is where the difference lies.
My research says:
Usually, such scenarios occur when the data coming via an application
is of nvarchar2 type but the table column is varchar2; Oracle then
applies an internal function in the filter operation. My suggestion
is to know your data well, so that you use matching data types during
the design phase.
When analyzing explain plans, it matters whether the statistics on the tables are current. If the statistics do not represent the actual data reasonably well, the optimizer will make mistakes and estimate cardinalities incorrectly.
You can check how long ago statistics were gathered by querying the data dictionary:
select table_name, last_analyzed
from user_tables
where table_name in ('ADDRESS','NAME');
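On recent Oracle versions, the data dictionary can also flag statistics as stale once enough of the table has changed. As a sketch (the STALE_STATS column is available in USER_TAB_STATISTICS on 11g and later):

```sql
-- Check whether Oracle itself considers the statistics stale
select table_name, stale_stats, last_analyzed
from user_tab_statistics
where table_name in ('ADDRESS','NAME');
```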
You can gather statistics for the optimizer to use by calling DBMS_STATS:
begin
dbms_stats.gather_table_stats(user, 'ADDRESS');
dbms_stats.gather_table_stats(user, 'NAME');
end;
So perhaps after gathering statistics you will get different explain plans. Perhaps not.
The difference in your explain plans is primarily because the optimizer estimates how many rows it will find in the address table differently in the two cases.
In the first case you have an equality predicate with same datatype - this is good and the optimizer can often estimate cardinality (row count) reasonably well for cases like this.
In the second case a function is applied to the column - this is often bad (unless you have function-based indexes) and will force the optimizer to take a wild guess. That wild guess will differ between versions of Oracle as the developers of the optimizer try to improve upon it. In some versions the wild guess will simply be something like "I guess 5% of the number of rows in the table."
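As a rough sketch, you can compare such a guess against the table's row count yourself. Here I assume the 1% guess mentioned in the question above and the hypothetical FRIENDS table:

```sql
-- If the optimizer guesses 1% selectivity for the function predicate,
-- the estimated cardinality is simply num_rows * 0.01
select table_name,
       num_rows,
       round(num_rows * 0.01) as estimated_cardinality
from user_tables
where table_name = 'FRIENDS';  -- hypothetical table from the question
```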
When comparing different datatypes, it is best to avoid implicit conversions, particularly when, as in this case, the implicit conversion applies a function to the column rather than to the literal. If you get a value as datatype NVARCHAR2 and need to use it in a predicate like the one above, it can be a good idea to explicitly convert the value to the datatype of the column.
select * from address a
left join name n on n.adress_id=a.id
where a.street = CAST( N'01' AS VARCHAR2(255));
In this case, with a literal, the conversion does not make sense, of course; here you would just use your first query. But if the value came from a variable or function parameter, this approach could be useful.
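For instance, with a bind variable the same explicit conversion might look like this (a sketch; :street_value is a hypothetical bind that arrives as NVARCHAR2):

```sql
select * from address a
left join name n on n.adress_id=a.id
where a.street = CAST( :street_value AS VARCHAR2(255) );
```

The CAST keeps the conversion on the literal/bind side, so the STREET column itself stays untouched and any index on it remains usable.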
As I can see, the first query returns 3591 rows and the second one returns 347 rows, so Oracle needs fewer I/O operations, which is why the cost is lower.
Don't be confused by N'01': it additionally requires converting the varchar value to nvarchar.
Oracle hard-parses a statement once and then soft-parses subsequent executions of the same statement, so the longer your Oracle instance runs, the faster repeated queries become.
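One way to benefit from soft parsing in SQL*Plus is to use bind variables, so repeated executions share the same cursor. A sketch:

```sql
-- The first execution hard-parses; later executions of the same text
-- with a bind variable reuse the cached cursor (soft parse)
VARIABLE street VARCHAR2(255)
EXEC :street := '01';
select * from address where street = :street;
EXEC :street := '02';
select * from address where street = :street;  -- soft parse, same plan
```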

Explain plan and Query execution time differences

I have two tables, TABLE_A and TABLE_B, in a one-to-many relationship (TABLE_B holds a foreign key to TABLE_A). I have written the following three queries; each performs at a different speed, but they all do the same thing.
Time: 3.916 seconds.
SELECT count(*)
FROM TABLE_A hconn
WHERE EXISTS
(SELECT *
FROM TABLE_B hipconn
WHERE HIPCONN.A_ID = HCONN.A_ID
);
Time: 3.52 seconds
SELECT COUNT(*)
FROM TABLE_A hconn,
TABLE_B HIPCONN
WHERE HCONN.A_ID = HIPCONN.A_ID;
Time: 2.72 seconds.
SELECT COUNT(*)
FROM TABLE_A HCONN
JOIN TABLE_B HIPCONN
ON HCONN.A_ID = HIPCONN.A_ID;
From the above timings, we can see that the last query performs better than the others. (I've tested them a number of times; they always finish in the order mentioned, and the last query was consistently the fastest.)
I've started looking at the explain plan for the above queries to find out why it is happening.
The explain plan prints out the same cost and time for all of the above queries, with no difference (plan below). I re-ran it a couple of times, but the result is the same for all the queries.
Question: Why does the speed of the results vary when the explain plan shows the same cost and time for all the queries? Where am I going wrong?
Plan hash value: 600428245
-------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11 | | 12913 (2)| 00:02:35 |
| 1 | SORT AGGREGATE | | 1 | 11 | | | |
|* 2 | HASH JOIN RIGHT SEMI | | 2273K| 23M| 39M| 12913 (2)| 00:02:35 |
| 3 | INDEX STORAGE FAST FULL SCAN| BIN$ACCkNNuTHKPgUJAKNJgj5Q==$0 | 2278K| 13M| | 1685 (2)| 00:00:21 |
| 4 | INDEX STORAGE FAST FULL SCAN| BIN$ACCkNNubHKPgUJAKNJgj5Q==$0 | 6448K| 30M| | 4009 (2)| 00:00:49 |
-------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("HIPCONN"."A_ID"="HCONN"."A_ID")
You may use DBMS_XPLAN.DISPLAY_CURSOR to display the actual execution plan of the last SQL statement executed, since the queries may have more than one execution plan in the library cache.
You may also enable a 10046 trace at level 12 to check why the queries respond with different execution times.
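For example (a sketch using the tables from the question), you can run one of the queries with the GATHER_PLAN_STATISTICS hint and then pull its actual plan and row counts from the cursor cache:

```sql
SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*)
FROM TABLE_A HCONN
JOIN TABLE_B HIPCONN
ON HCONN.A_ID = HIPCONN.A_ID;

-- NULL, NULL means "the last statement executed in this session";
-- 'ALLSTATS LAST' adds actual row counts and buffer gets to the plan
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));

-- And to enable the 10046 trace with binds and waits (level 12):
ALTER SESSION SET EVENTS '10046 trace name context forever, level 12';
```

Comparing the estimated rows (E-Rows) against the actual rows (A-Rows) in the DISPLAY_CURSOR output is usually the quickest way to see where the optimizer's estimates diverge from reality.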