Why does %cpu and cost increase for this execution plan? - sql

I have a table that is populated with many rows of records. I explain and display the execution plan before the creation of the index for a query
explain plan for
SELECT l_partKey, count(*)
FROM LINEITEM
GROUP BY L_PARTKEY
HAVING COUNT(l_tax) > 2;
SELECT * FROM table(dbms_xplan.display);
And this is the output
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2487493660
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3023 | 15115 | 8821 (1)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | HASH GROUP BY | | 3023 | 15115 | 8821 (1)| 00:00:01 |
| 3 | TABLE ACCESS FULL| LINEITEM | 1800K| 8789K| 8775 (1)| 00:00:01 |
--------------------------------------------------------------------------------
Then I create this index:
CREATE INDEX lineItemIdx ON LINEITEM(l_partKey);
Explain and display the execution plan again and this is the output:
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 573468153
--------------------------------------------------------------------------------
------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3023 | 15115 | 1130 (5)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | HASH GROUP BY | | 3023 | 15115 | 1130 (5)| 00:00:01 |
| 3 | INDEX FAST FULL SCAN| LINEITEMIDX | 1800K| 8789K| 1084 (1)| 00:00:01 |
Does anyone know why the %cpu goes from 1, 1, 1 to 5, 5, 1?
Afterwards, I removed the index I created and create a new index on l_partKey, l_tax and explain and display the execution again:
CREATE INDEX lineItemIdx ON LINEITEM(l_partKey, l_tax);
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------
Plan hash value: 573468153
--------------------------------------------------------------------------------
------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3023 | 15115 | 1326 (4)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | HASH GROUP BY | | 3023 | 15115 | 1326 (4)| 00:00:01 |
| 3 | INDEX FAST FULL SCAN| LINEITEMIDX | 1800K| 8789K| 1281 (1)| 00:00:01 |
Now there is a slight increase in cost from 1130, 1130, 1084 to 1326, 1326, 1281, when using the new index l_partKey, l_tax as compared to the previous index i created. Why is that so? Shouldn't this index be increasing the speed of the query processing more than the previous index?

Your query requires counting all the rows in a 1.8 megarow table. Therefore, Oracle has to do some kind of full scan to satisfy it.
Without a useful index, it needed a full table scan: it has to read the entire table. That probably slams the server's IO operations; so the cpu is active for a small percent of the elapsed time of the query. DBMSs have two things that slow them down. IO (reading an entire table from disk) and CPU (computing things). Without an index, the CPU spends most of the elapsed time of the query waiting on the disk to deliver the contents of the whole table. So the CPU is active for a smaller percentage of the elapsed time. With the index, the disk must deliver less data. So the CPU takes a larger percentage of the total time. CPU% is not a good measure of the overall cost of queries.
When you added your first index, you reduced the IO operations needed to satisfy the query, so the cpu became active for a larger percent of the elapsed time.
Your second index caused your query to cost almost exactly the same as your first. The index items are a little bit larger so Oracle has to do slightly more work to handle them; that may explain the slight cost increase.
Don't forget: Oracle is 43 years old and on version 19. Generations of programmers have worked to optimize it. Trying to guess "why" for a small cost difference is probably not worth your trouble.
Finally, there's something strange in your query. You do SELECT ... COUNT(*) and then HAVING COUNT(column) > 2. COUNT(column) is different from COUNT(*): the former counts the non-null entries in column, where COUNT(*) counts them all. Is that your intention?
Both queries with indexes use INDEX FAST FULL SCAN. That's the holy grail of full scans. Your second index includes your l_tax column, so it's possible to guess it's declared NOT NULL or it might not have been eligible for fast scanning. In that case Oracle knows COUNT(*) is the same as COUNT(l_tax). That's why both indexes come up with the same plan, even with slightly different costs on the steps.

Related

How to Optimize the Query in SQL Oracle Database

This query is taking 5 hrs to extract 1M rows out of 120M data.
select * from cc.ALC_TR_LS
where INV_TRN_ID > 0
order by INV_TRN_ID asc ;
fetch first 1000000 rows only;
Note : INV_TRN_ID is a identity column.
Need to optimize this query. Please suggest.
The first thing if you observe performance issues is to check the execution plan.
I'll simulate your case with this sample data
create table large_tab as
select rownum id,
rpad('x',2000,'y') pad
from dual connect by level <= 20000;
The execution plan for the query (your simplified)
select * from large_tab
where id > 0
order by id
fetch first 10000 rows only;
is as follows
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10000 | 9M| | 10223 (1)| 00:00:01 |
|* 1 | VIEW | | 10000 | 9M| | 10223 (1)| 00:00:01 |
|* 2 | WINDOW SORT PUSHED RANK| | 20000 | 38M| 39M| 10223 (1)| 00:00:01 |
|* 3 | TABLE ACCESS FULL | LARGE_TAB | 20000 | 38M| | 1842 (1)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=10000)
2 - filter(ROW_NUMBER() OVER ( ORDER BY "LARGE_TAB"."ID")<=10000)
3 - filter("ID">0)
The critical point is in line 2 where you need to sort 1M rows (10K in my case).
Note that you sort rows not IDs, so you need a lot of temp space as shown in the execution plan column TempSpc.
This is where you spend the elapsed time.
Alternative
If the table is rather static you may use two step approach.
First get the highest ID of your first 1M rows, than get all data up to this ID
Get the Upper Bound ID
select id from large_tab
where id > 0
order by id
offset 9999 rows fetch first 1 rows only;
The plan is similar, but as you order only ID's that are compact, the temp space is gone
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 20000 | 761K| 1844 (1)| 00:00:01 |
|* 1 | VIEW | | 20000 | 761K| 1844 (1)| 00:00:01 |
|* 2 | WINDOW SORT PUSHED RANK| | 20000 | 97K| 1844 (1)| 00:00:01 |
|* 3 | TABLE ACCESS FULL | LARGE_TAB | 20000 | 97K| 1842 (1)| 00:00:01 |
--------------------------------------------------------------------------------------
Get Data
Knowing the upper bound ID you may query as simple as
select * from large_tab
where id > 0 and id <= 10000 ---<<< your upper bound here
Note that you do not need the ORDER BY at all to get the requested count, so this will be a very afficient full table scan.
Two final remarks
to get 1M rows from a table do not try to somehow deploy an index in your access. If you want to optimize you need partitioning schema based in ID.
If you do not need all columns, do not use SELECT * because as you saw in the plan, you will need to sort this unneeded data.
fetching 1000000 record and that is with * (All columns) obviously will take time,
Steps suggestion :-
Retrieve only the required columns.
Make partitions for the table it will be more faster for more details about partitions visit https://docs.oracle.com/cd/B19306_01/server.102/b14220/partconc.htm#:~:text=Each%20row%20in%20a%20partitioned,use%20of%20the%20partition%20key
if you post your data to auto ascending grid in the front end execution time gets shorten in Backend

Full table scan on JOIN table using PK

I'm a bit puzzled on why a full table scan is performed on a simple sql query that uses primary key to join:
SELECT max(pd.cre_dt)
FROM D00ZVZ01.ZVZ_PRINT_DOCUMENT pd
JOIN D00ZVZ01.ZVZ_BRIEF_REGISTRATIE br
ON pd.PRINT_DOCUMENT_ID = br.PRINT_DOCUMENT_ID
AND br.BRIEF_REG_GROEP_ID IN (2217, 2237, 2257);
Explain shows:
----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 24 | | 283K (2)| 00:00:15 |
| 1 | SORT AGGREGATE | | 1 | 24 | | | |
|* 2 | HASH JOIN | | 677K| 15M| 14M| 283K (2)| 00:00:15 |
| 3 | INLIST ITERATOR | | | | | | |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| ZVZ_BRIEF_REGISTRATIE | 694K| 6779K| | 17430 (1)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | ZVZ_BRIEF_REGISTRATIE_IF4 | 694K| | | 1469 (2)| 00:00:01 |
| 6 | TABLE ACCESS FULL | ZVZ_PRINT_DOCUMENT | 9567K| 127M| | 260K (1)| 00:00:14 |
----------------------------------------------------------------------------------------------------------------------------
Where pd.PRINT_DOCUMENT_ID is a primary key.
Despite millions of records, I wouldn't expect this query to be slow.
What is the reason, and how to improve?
Does this give you a different plan?
SELECT max(pd.cre_dt)
FROM D00ZVZ01.ZVZ_PRINT_DOCUMENT pd
JOIN D00ZVZ01.ZVZ_BRIEF_REGISTRATIE br
ON pd.PRINT_DOCUMENT_ID = br.PRINT_DOCUMENT_ID
WHERE br.BRIEF_REG_GROEP_ID IN (2217, 2237, 2257);
If so then you want to add BRIEF_REG_GROEP_ID to your index.
Probably last time statistics for ZVZ_PRINT_DOCUMENT were calculated when there were very few rows, so Oracle thinks that hash will be very small. Either try recalculating statistics or use hints:
SELECT /*+ leading(br pd) use_nl(pd)*/ max(pd.cre_dt)
FROM D00ZVZ01.ZVZ_PRINT_DOCUMENT pd
JOIN D00ZVZ01.ZVZ_BRIEF_REGISTRATIE br
ON pd.PRINT_DOCUMENT_ID = br.PRINT_DOCUMENT_ID
AND br.BRIEF_REG_GROEP_ID IN (2217, 2237, 2257);
The optimiser estimates that it will access 694K rows from ZVZ_BRIEF_REGISTRATIE for the three BRIEF_REG_GROEP_ID values, using an index, and then it needs to get the corresponding details from ZVZ_PRINT_DOCUMENT. 694K individual index lookups is a lot (consider that it has to go the the index for each one and then use the rowid to access the table, in a loop, 694K times), and it has calculated that it will take less effort to just read ZVZ_PRINT_DOCUMENT once and crunch the two sets in a single hash join. Index lookups are usually better for small volumes of data.
Is it any faster if you hint it to use the index?
Are the row estimates in the execution plan correct? How many rows are there in each table and how many will you actually read?
What is your Oracle version and do you have adaptive features enabled?
It's slightly odd that your query has no WHERE clause but instead a filtering condition is included in the inner join. I expect the optimiser will rewrite it as a WHERE predicate anyway, but I would still want to experiment to see whether it affected the plan.

Can this query be made faster?

My Oracle query takes over 1.5 min and I do not know if it's because of inefficient query writing, bad choice of indexes or some other database issue that I cannot control.
Some tables and data were changed to protect IP.
SELECT /*+ PARALLEL (AUTO) */ COUNT(DISTINCT SUD_USERID)
FROM (
SELECT /*+ PARALLEL (AUTO) */
SUD_USERID ,
CASE WHEN SCH_PAGETYPE = 'Page' AND SUD_EVENTTYPE = 'S'
THEN 'EVENTTYPE1'
WHEN SCH_PAGETYPE = 'Page' AND SUD_EVENTTYPE = 'V'
THEN 'EVENTTYPE2'
WHEN SCH_PAGETYPE = 'Hub' AND SUD_EVENTTYPE = 'S'
THEN 'EVENTTYPE3'
END AS CALC_EVENT_SOURCE,
SUD_EVENT_SOURCE
FROM
(
SELECT /*+ PARALLEL (AUTO) */
UPPER(PAGETYPE)|| '-' || SCH.ID PAGETYPE_ID ,
SCH.PAGETYPE SCH_PAGETYPE
FROM TABLE1 SCH
WHERE SCH.PAGETYPE IN ('Page', 'Hub')
AND SCH.CATEGORY_NAME NOT IN ('archive', 'testcategory')
)
INNER JOIN (
SELECT /*+ PARALLEL (AUTO) */
DISTINCT SUD.TRACEID TRACEID ,
SUD.EVENTTYPE SUD_EVENTTYPE ,
SUD.USERID SUD_USERID,
SUD.EVENT_SOURCE SUD_EVENT_SOURCE
FROM
SOMESCHEMA.USAGE_DETAILS SUD
WHERE
SUD.EVENTTYPE IN ('S', 'V')
)
ON TRACEID = PAGETYPE_ID
INNER JOIN USER_JOB_FAMILY_MAPPING SFD
ON SUD_USERID = SFD.USERID
)
WHERE CALC_EVENT_SOURCE = SUD_EVENT_SOURCE
I could not copy the text of the explain plan (generated via DBeaver)
but here is a screenshot:
USAGE_DETAILS table has 3941810 rows
TABLE1 has 5908 rows
USER_JOB_FAMILY_MAPPING has 578233 rows
There are no keys on any of these tables.
USAGE_DETAILS.TRACEID is VARCHAR2(500) NOT NULL has function index=SUBSTR("TRACEID",1,4) and
another index declared as default but on that column.
USAGE_DETAILS.USERID is VARCHAR2(50) NOT NULL
USAGE_DETAILS.EVENTTYPE is VARCHAR2(10) NOT NULL and has default index
USAGE_DETAILS.EVENT_SOURCE is VARCHAR2(200) NOT NULL and has default index
I have tried doing inner joins on the full tables rather than the parenthetically generated (subselect?) tables, but that did not perform better and also limited my ability to use an alias in the WHERE clause.
I do not know what kind of machine this is running on, just that it's set up for development. I'd like this query to give me accurate answers in under 10s. Some times the query above though still does not return even after 10+ minutes.
Plan hash value: 2784166315
-----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 27 | | 258K (1)| 00:00:11 |
| 1 | SORT AGGREGATE | | 1 | 27 | | | |
| 2 | VIEW | VM_NWVW_1 | 1809K| 46M| | 258K (1)| 00:00:11 |
| 3 | HASH GROUP BY | | 1809K| 745M| | 258K (1)| 00:00:11 |
|* 4 | HASH JOIN | | 1809K| 745M| | 258K (1)| 00:00:11 |
|* 5 | TABLE ACCESS FULL | TABLE1S | 5875 | 172K| | 309 (0)| 00:00:01 |
| 6 | MERGE JOIN SEMI | | 3079K| 1180M| | 257K (1)| 00:00:11 |
| 7 | SORT JOIN | | 3079K| 1139M| | 254K (1)| 00:00:10 |
| 8 | VIEW | | 3079K| 1139M| | 254K (1)| 00:00:10 |
| 9 | HASH UNIQUE | | 3079K| 1139M| 1202M| 254K (1)| 00:00:10 |
| 10 | INLIST ITERATOR | | | | | | |
| 11 | TABLE ACCESS BY INDEX ROWID BATCHED| USAGE_DETAILS | 3079K| 1139M| | 46 (0)| 00:00:01 |
|* 12 | INDEX RANGE SCAN | IDX_UUD_EVENTTYPE | 13704 | | | 46 (0)| 00:00:01 |
|* 13 | SORT UNIQUE | | 578K| 7905K| 22M| 3558 (1)| 00:00:01 |
| 14 | INDEX FAST FULL SCAN | USERID_IDX | 578K| 7905K| | 704 (1)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("TRACEID"=UPPER("PAGETYPE")||'-'||TO_CHAR("SCH"."ID"))
filter("from$_subquery$_004"."SUD_EVENT_SOURCE"=CASE WHEN (("SCH"."PAGETYPE"='Page') AND
("from$_subquery$_004"."SUD_EVENTTYPE"='S')) THEN 'EVENTTYPE1' WHEN (("SCH"."PAGETYPE"='Page') AND
("from$_subquery$_004"."SUD_EVENTTYPE"='V')) THEN 'EVENTTYPE2' WHEN (("SCH"."PAGETYPE"='Hub') AND
("from$_subquery$_004"."SUD_EVENTTYPE"='S')) THEN 'EVENTTYPE3' END )
5 - filter("SCH"."CATEGORY_NAME"<>'archive' AND "SCH"."CATEGORY_NAME"<>'testcategory' AND ("SCH"."PAGETYPE"='Hub' OR
"SCH"."PAGETYPE"='Page'))
12 - access("SUD"."EVENTTYPE"='S' OR "SUD"."EVENTTYPE"='V')
13 - access("from$_subquery$_004"."SUD_USERID"="SFD"."USERID")
filter("from$_subquery$_004"."SUD_USERID"="SFD"."USERID")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation
After running
exec dbms_stats.gather_table_stats(ownname=>'SCHEMA1',tabname=>'USAGE_DETAILS');
exec dbms_stats.gather_table_stats(ownname=>'SCHEMA1',tabname=>'TABLE1');
I have this new plan:
SQL> select plan_table_output from table(dbms_xplan.display());
Plan hash value: 3419946982
----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 27 | | 70152 (1)| 00:00:03 |
| 1 | SORT AGGREGATE | | 1 | 27 | | | |
| 2 | VIEW | VM_NWVW_1 | 53144 | 1401K| | 70152 (1)| 00:00:03 |
| 3 | HASH GROUP BY | | 53144 | 21M| 21M| 70152 (1)| 00:00:03 |
|* 4 | HASH JOIN RIGHT SEMI | | 53144 | 21M| 14M| 65453 (1)| 00:00:03 |
| 5 | INDEX FAST FULL SCAN | USERID_IDX | 578K| 7905K| | 704 (1)| 00:00:01 |
|* 6 | HASH JOIN | | 53144 | 20M| | 62995 (1)| 00:00:03 |
| 7 | JOIN FILTER CREATE | :BF0000 | 5503 | 161K| | 309 (0)| 00:00:01 |
|* 8 | TABLE ACCESS FULL | TABLE1 | 5503 | 161K| | 309 (0)| 00:00:01 |
| 9 | VIEW | | 3549K| 1259M| | 62677 (1)| 00:00:03 |
| 10 | HASH UNIQUE | | 3549K| 159M| 203M| 62677 (1)| 00:00:03 |
| 11 | JOIN FILTER USE | :BF0000 | 3549K| 159M| | 21035 (1)| 00:00:01 |
|* 12 | TABLE ACCESS FULL| USAGE_DETAILS | 3549K| 159M| | 21035 (1)| 00:00:01 |
----------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("from$_subquery$_004"."SUD_USERID"="SFD"."USERID")
6 - access("TRACEID"=UPPER("PAGETYPE")||'-'||TO_CHAR("SCH"."ID"))
filter("from$_subquery$_004"."SUD_EVENT_SOURCE"=CASE WHEN (("SCH"."PAGETYPE"='Page') AND
("from$_subquery$_004"."SUD_EVENTTYPE"='S')) THEN 'EVENTTYPE1' WHEN (("SCH"."PAGETYPE"='Page') AND
("from$_subquery$_004"."SUD_EVENTTYPE"='V')) THEN 'EVENTTYPE2' WHEN (("SCH"."PAGETYPE"='Hub') AND
("from$_subquery$_004"."SUD_EVENTTYPE"='S')) THEN 'EVENTTYPE3' END )
8 - filter("SCH"."CATEGORY_NAME"<>'archive' AND "SCH"."CATEGORY_NAME"<>'testcategory' AND
("SCH"."PAGETYPE"='Hub' OR "SCH"."PAGETYPE"='Page'))
12 - filter(("SUD"."EVENTTYPE"='S' OR "SUD"."EVENTTYPE"='V') AND
SYS_OP_BLOOM_FILTER(:BF0000,"SUD"."TRACEID"))
Note
-----
- automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation
37 rows selected.
Is it more likely that the compute statistics massively helped this query or that someone did something else that I was not aware of? Yes, the query ran much better, but I'd feel better too if I knew why.
Hy,
after reviewing your SQL I noticed your statements are full of string comparisons and searches. For example
SELECT /*+ PARALLEL (AUTO) */
UPPER(PAGETYPE)|| '-' || SCH.ID PAGETYPE_ID ,
SCH.PAGETYPE SCH_PAGETYPE
FROM TABLE1 SCH
WHERE SCH.PAGETYPE IN ('Page', 'Hub')
AND SCH.CATEGORY_NAME NOT IN ('archive', 'testcategory')
This can be indexed int 2 ways.
First: Create table that has 'Page', 'Hub', and other types that you need, create for the a column Index and then "replace" basically adapt your query to resolve those indexes instead of string compare.
Tables can have multiple indices on columns those have to be treated with caution because they create problems in regards of database size.
Also I would check if what are the biggest tables and reorder their selections to the last. Meaning:
if one table has 12 rows and the other 100. First put the 12 row table then the 100 row. This will multiply in your case since the tables and nested and chained.
I made 1 more review and realized I made an oversight.
USAGE_DETAILS table has 3941810 rows
TABLE1 has 5908 rows
USER_JOB_FAMILY_MAPPING has 578233 rows
First filter the table 1, this is costly already, then Inner join raw USAGE_DETAILS and then select the join ID-s.
Then inner join USER_JOB_FAMILY_MAPPING, and select after that. The reason is that the joins are done on the ID which is probably int type.
Gather statistics on the relevant objects like this:
begin
dbms_stats.gather_table_stats(ownname => user, tabname => 'TABLE1');
dbms_stats.gather_table_stats(ownname => 'SOMESCHEMA', tabname => 'USAGE_DETAILS');
end;
/
This line in the execution plan implies that one of the tables is missing statistics:
- dynamic statistics used: dynamic sampling (level=2)
Not all uses of dynamic sampling imply missing statistics, but level 2 is highly suspicious. That sampling level is usually intended to "Apply dynamic sampling to all unanalyzed tables."
Optimizer statistics are necessary for Oracle to make good execution plans. The algorithms and access paths for joining small amounts of data are different than the algorithms and access paths for joining large amounts of data. The optimizer statistics help Oracle estimate the size of the results and build good plans.
If this solves your problem, you should also investigate the root cause. Optimizer statistics should always be gathered manually after a large change, and automatically by the system every night. If you have a large ETL process that significantly changes a table, it should include a call to DBMS_STATS at the end. The database by default gathers stats at 10PM every night, unless a DBA foolishly disabled the autotask.
If that doesn't solve the problem, then regenerate the execution plan with actual numbers using DBMS_SQLTUNE or the GATHER_PLAN_STATISTICS_HINT. SQL tuning is about optimizing the operations. Your SQL statement has 14 operations, each of which is like a miniature program. We need to know which one of the operations is causing the problem. Finding actual cardinalities and actual run times, and comparing them to estimates, helps tremendously with diagnosing SQL problems.
How do we know that gathering stats was what fixed the performance?
We can't be 100% sure. But it's a safe bet that gathering statistics was responsible for the improvement, for several reasons.
Bad or missing statistics are responsible for a large percentage of all Oracle performance problems. Ask any DBA and they'll have plenty of stories about missing statistics.
The Note section changes strongly imply there are no other weird things happening behind the scenes. There are lots of tricks to silently fix queries, like SQL profiles, baselines, adaptive reoptimization, dynamic sampling (shows up in the first plan, but not the second one, because stats are better), etc. But if those tricks were used they would show up in the Note section.

Oracle SQL execution plan changes due to SYS_OP_C2C internal conversion

I'm wondering why cost of this query
select * from address a
left join name n on n.adress_id=a.id
where a.street='01';
is higher than
select * from address a
left join name n on n.adress_id=a.id
where a.street=N'01';
where address table looks like this
ID NUMBER
STREET VARCHAR2(255 CHAR)
POSTAL_CODE VARCHAR2(255 CHAR)
and name table looks like this
ID NUMBER
ADDRESS_ID NUMBER
NAME VARCHAR2(255 CHAR)
SURNAME VARCHAR2(255 CHAR)
These are costs returned by explain plan
Explain plan for '01'
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3591 | 1595K| 87 (0)| 00:00:02 |
| 1 | NESTED LOOPS OUTER | | 3591 | 1595K| 87 (0)| 00:00:02 |
|* 2 | TABLE ACCESS FULL | ADDRESS | 3 | 207 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| NAME | 1157 | 436K| 47 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | NAME_HSI | 1157 | | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("A"."STREET"='01')
4 - access("N"."ADDRESS_ID"(+)="A"."ID")
Explain plan for N'01'
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 347 | 154K| 50 (0)| 00:00:01 |
| 1 | NESTED LOOPS OUTER | | 347 | 154K| 50 (0)| 00:00:01 |
|* 2 | TABLE ACCESS FULL | ADDRESS | 1 | 69 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| NAME | 1157 | 436K| 47 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | NAME_HSI | 1157 | | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(SYS_OP_C2C("A"."STREET")=U'01')
4 - access("N"."ADDRESS_ID"(+)="A"."ID")
As you can see cost for N'01' query is lower than cost for '01'. Any idea why? N'01' needs additionally convert varchar to nvarchar so cost should be higher (SYS_OP_C2C()). The other question is why rows processed by N'01' query is lower than '01'?
[EDIT]
Table address has 30 rows.
Table name has 19669 rows.
SYS_OP_C2C is an internal function which does an implicit conversion of varchar2 to national character set using TO_NCHAR function. Thus, the filter completely changes as compared to the filter using normal comparison.
I am not sure about the reason why the number of rows are less, but I can guarantee it could be more too. Cost estimation won't be affected.
Let's try to see step-by-step in a test case.
SQL> CREATE TABLE t AS SELECT 'a'||LEVEL col FROM dual CONNECT BY LEVEL < 1000;
Table created.
SQL>
SQL> EXPLAIN PLAN FOR SELECT * FROM t WHERE col = 'a10';
Explained.
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 5 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
1 - filter("COL"='a10')
13 rows selected.
SQL>
So far so good. Since there is only one row with value as 'a10', optimizer estimated one row.
Let's see with the national characterset conversion.
SQL> EXPLAIN PLAN FOR SELECT * FROM t WHERE col = N'a10';
Explained.
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 50 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 10 | 50 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
1 - filter(SYS_OP_C2C("COL")=U'a10')
13 rows selected.
SQL>
What happened here? We can see filter(SYS_OP_C2C("COL")=U'a10'), which means an internal function is applied and it converts the varchar2 value to nvarchar2. The filter now found 10 rows.
This will also suppress any index usage, since now a function is applied on the column. We can tune it by creating a function-based index to avoid full table scan.
SQL> create index nchar_indx on t(to_nchar(col));
Index created.
SQL>
SQL> EXPLAIN PLAN FOR SELECT * FROM t WHERE to_nchar(col) = N'a10';
Explained.
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 1400144832
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 50 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 10 | 50 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | NCHAR_INDX | 4 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
---------------------------------------------------
2 - access(SYS_OP_C2C("COL")=U'a10')
14 rows selected.
SQL>
However, will this make the execution plans similar? No. i think with two different charactersets , the filter will not be applied alike. Thus, the difference lies.
My research says,
Usually, such scenarios occur when the data coming via an application
is nvarchar2 type, but the table column is varchar2. Thus, Oracle
applies an internal function in the filter operation. My suggestion
is, to know your data well, so that you use similar data types during
design phase.
When worrying about explain plans, it matters whether there are current statistics on the tables. If the statistics do not represent the actual data reasonably well, then the optimizer will make mistakes and estimate cardinalities incorrectly.
You can check how long ago statistics were gathered by querying the data dictionary:
select table_name, last_analyzed
from user_tables
where table_name in ('ADDRESS','NAME');
You can gather statistics for the optimizer to use by calling DBMS_STATS:
begin
dbms_stats.gather_table_stats(user, 'ADDRESS');
dbms_stats.gather_table_stats(user, 'NAME');
end;
So perhaps after gathering statistics you will get different explain plans. Perhaps not.
The difference in your explain plans is primarily because the optimizer estimates how many rows it will find in address table differently in the two cases.
In the first case you have an equality predicate with same datatype - this is good and the optimizer can often estimate cardinality (row count) reasonably well for cases like this.
In the second case a function is applied to the column - this is often bad (unless you have function based indexes) and will force the optimizer to take a wild guess. That wild quess will be different in different versions of Oracle as the developers of the optimizer tries to improve upon it. Some versions the wild guess will simply be something like "I guess 5% of the number of rows in the table."
When comparing different datatypes, it is best to avoid implicit conversions, particularly when like this case the implicit conversion makes a function on the column rather than the literal. If you have cases where you get a value as datatype NVARCHAR2 and need to use it in a predicate like above, it can be a good idea to explicitly convert the value to the datatype of the column.
select * from address a
left join name n on n.adress_id=a.id
where a.street = CAST( N'01' AS VARCHAR2(255));
In this case with a literal it does not make sense, of course. Here you would just use your first query. But if it was a variable or function parameter, maybe you could have use cases for doing something like this.
As I can see the first query returns 3591 rows, the second one returns 347 rows. So Oracle needs less I/O operation that's why the cost is less.
Don't be confused with
N'01' needs additionally convert varchar to nvarchar
Oracle does one hard parse and then uses soft parse for the same queries. So the longer your oracle works the faster it becomes.

Understanding the Explain plan

I have a query that is taking 10 secs of time to run currently(about 300 lines). Now I add a where condition table_a.column_a ='XXX' like in the below query. The amount of time it takes to run it now has increased to 300 secs.
When I ran the explain plan. I see that this new where condition has some impact, looks like a sort operation(plan result below). I did not mention sort anywhere in the sql. Why is this piece so significant?
QUERY:
SELECT * from TABLE_A,TABLE_B WHERE TABLE_A.ID = TABLE_B.SOMEID AND TABLE_A.COLUMN_A='XXX';
EXPLAINPLAN RESULT:(REMOVED THE UNNECESSARY PART)
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 1 | 2878 | 154 (2)| 00:00:02 |
--removed lines here--
| 124 | BUFFER SORT | | 1 | 24 | 126 (1)| 00:00:02 |
| 125 | TABLE ACCESS BY INDEX ROWID | TABLE_A | 1 | 24 | 3 (0)| 00:00:01 |
|*126 | INDEX RANGE SCAN | COLUMN_A | 1 | | 2 (0)| 00:00:01 |
It looks like the sort operation is in place to allow for an index scan for your where condition rather than a generally more expensive sequential scan. It could be that, in this instance, the sort plus index scan is more expensive than the sequential scan would be. You could try changing this behavior by dropping the operative index, or by using hints to dictate the access method.