Performance impact of multiple aggregate functions in HAVING clause - sql

I am wondering if there is a difference between the two queries below.
I am looking for a general answer explaining how the optimizer treats each of these queries. There is an index on t.id.
The version of Oracle is 11g.
select t.id, sum(t.amount)
from transaction t
group by t.id
having sum(t.amount) between -0.009 and 0.009

select t.id, sum(t.amount)
from transaction t
group by t.id
having sum(t.amount) >= -0.009 and sum(t.amount) <= 0.009

In an aggregation query, most of the work involves moving the data around. There is some overhead for aggregations, but it is usually pretty simple.
And the SQL compiler can decide whether to re-use aggregated expressions. Just because you use sum(amount) twice in the query doesn't mean that it gets executed twice.
Some aggregation functions are more expensive -- especially on strings or with distinct. You can always test queries to see if there is much impact (a quick way to do that is sketched below), but in general you should worry about whether your logic is correct, not about how many times you use aggregation functions.
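For example, a minimal comparison (a sketch; the transaction table and amount column are taken from the question) is to time both forms and compare their autotrace output in SQL*Plus:
set timing on
set autotrace traceonly explain statistics

-- between version
select t.id, sum(t.amount)
from transaction t
group by t.id
having sum(t.amount) between -0.009 and 0.009;

-- explicit inequalities version
select t.id, sum(t.amount)
from transaction t
group by t.id
having sum(t.amount) >= -0.009 and sum(t.amount) <= 0.009;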

If you want to observe basic information about the steps the CBO decides on for the execution of a SQL statement, use EXPLAIN PLAN.
Example
EXPLAIN PLAN SET STATEMENT_ID = 'jara1' into plan_table FOR
select DEPARTMENT_ID, sum(salary)
from HR.employees
group by DEPARTMENT_ID
having sum(salary) between 5000 and 10000
;
--
SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', 'jara1','ALL'));
The query returns
Plan hash value: 244580604
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 | 4 (25)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | HASH GROUP BY | | 1 | 7 | 4 (25)| 00:00:01 |
| 3 | TABLE ACCESS FULL| EMPLOYEES | 107 | 749 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1
3 - SEL$1 / EMPLOYEES#SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(SUM("SALARY")>=5000 AND SUM("SALARY")<=10000)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (rowset=256) "DEPARTMENT_ID"[NUMBER,22], SUM("SALARY")[22]
2 - (#keys=1; rowset=256) "DEPARTMENT_ID"[NUMBER,22],
SUM("SALARY")[22]
3 - (rowset=256) "SALARY"[NUMBER,22], "DEPARTMENT_ID"[NUMBER,22]
First of all, you can see that a TABLE ACCESS FULL is performed (operation Id 3), so your assumption about the index is not correct.
As pointed out in the other answer, you can see the between is translated into two predicates connected with and (the filter at operation Id 1).
But most important for your question is the Column Projection Information: you can see that SUM(SALARY) is calculated at operation Id 2 (the HASH GROUP BY operation) and passed up to operation Id 1 (the FILTER), in both cases only once (one column with length 22).
So don't worry about multiple calculations.

There is absolutely no difference between the two queries. between is just syntactic sugar; the parser immediately transforms the between condition into the two inequalities combined with the and operator. This is done even before the optimizer sees the query. (Note that in this context the distinction between the parsing and optimization stages is meaningful, even though programmers often think of them as a single step.)
Trivial example:
SQL> set autotrace traceonly explain
SQL> select deptno, sum(sal) as sum_sal
2 from scott.emp
3 group by deptno
4 having sum(sal) between 10000 and 20000
5 ;
Execution Plan
----------------------------------------------------------
Plan hash value: 2138686577
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 | 4 (25)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | HASH GROUP BY | | 1 | 7 | 4 (25)| 00:00:01 |
| 3 | TABLE ACCESS FULL| EMP | 14 | 98 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(SUM("SAL")>=10000 AND SUM("SAL")<=20000)
The "index on..." thing that you mention has nothing to do with the question.

Another fun way to test this:
with function expand_sql_text(text_in varchar2)
  return varchar2
as
  text_out long;
begin
  dbms_utility.expand_sql_text(text_in, text_out);
  return text_out;
end expand_sql_text;
select expand_sql_text(
         'select * from dual where 2 between 1 and 3'
       ) as text_out
from dual
/
TEXT_OUT
------------------------------------------------------------------------------------------------------------------------------------------------------------
SELECT "A1"."DUMMY" "DUMMY" FROM "SYS"."DUAL" "A1" WHERE 2>=1 AND 2<=3
1 row selected.
In your original question, the second predicate was
having sum(t.amount) > -0.009 and sum(t.amount) < 0.009
which is not the same as the between version, because between is inclusive of both endpoints.
In SQL generally, filter predicates against simple literals do not normally lead to any significant performance overhead. In a group by clause, the fact that the predicate is applied after aggregation reduces any overhead even further.

Related

What is the best way to see if a row exists when you know that you only need to check the recent rows?

I have a table that stores the response from a certain API.
It has 1.7 million rows.
The pk is a kind of Unix time (not exactly, but similar).
I call the API very frequently to see if the data has changed.
To check if the data has changed, I have to run this command:
SELECT 1
FROM RATE
WHERE REGDATE = '$apiReponseDate' --yymmddhhmmss
If the answer is false, that means the response has changed, and then I insert.
I have an INDEX on REGDATE, and I know this makes the query do a binary search instead of a full scan.
But I know that in order to tell whether the data has been updated, I only need to check the recent rows.
To me, scanning the whole table with a WHERE clause seems inefficient.
Is there any good way to see if the data I got from the API response is already in the DB or not?
I'm using Oracle, but that is not the main point, because I'm asking about the query's efficiency in general.
You may use the index_desc hint and filter by rownum to access the table and read the most recent row. Then compare this value with the current API response.
An example is below for a (default) ascending index. If the index is created as id desc, then you need to reverse the order of reading (specify the index_asc hint).
create table t (
  id not null,
  val
) as
select level,
       dbms_random.string('x', 1000)
from dual
connect by level < 5000;

create unique index t_ix
  on t(id);

select
  /*+
    index_desc(t (t.id))
    gather_plan_statistics
  */
  id,
  substr(val, 1, 10)
from t
where rownum = 1;

  ID SUBSTR(VAL,1,10)
---- ----------------
4999 D0H3YOHB5E
select *
from dbms_xplan.display_cursor(
format => 'ALL ALLSTATS'
)
PLAN_TABLE_OUTPUT
-----------------
SQL_ID 2ym2rg02qfmk4, child number 0
-------------------------------------
select /*+ index_desc(t (t.id)) gather_plan_statistics */
id, substr(val, 1, 10) from t where rownum = 1
 
Plan hash value: 1335626365
 
------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers | Reads |
------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 3 (100)| | 1 |00:00:00.01 | 3 | 1 |
|* 1 | COUNT STOPKEY | | 1 | | | | | 1 |00:00:00.01 | 3 | 1 |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1 | 1005 | 3 (0)| 00:00:01 | 1 |00:00:00.01 | 3 | 1 |
| 3 | INDEX FULL SCAN DESCENDING | T_IX | 1 | 4999 | | 2 (0)| 00:00:01 | 1 |00:00:00.01 | 2 | 1 |
------------------------------------------------------------------------------------------------------------------------------------------------
 
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
 
   1 - SEL$1
   2 - SEL$1 / "T"#"SEL$1"
   3 - SEL$1 / "T"#"SEL$1"
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(ROWNUM=1)
 
Column Projection Information (identified by operation id):
-----------------------------------------------------------
 
   1 - "ID"[NUMBER,22], "VAL"[VARCHAR2,4000]
   2 - "ID"[NUMBER,22], "VAL"[VARCHAR2,4000]
   3 - "T".ROWID[ROWID,10], "ID"[NUMBER,22]
 
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1
---------------------------------------------------------------------------
 
   2 - SEL$1 / "T"#"SEL$1"
           - index_desc(t (t.id))
Instead of doing a SELECT and then an INSERT, which is two queries, you could combine the two into a single MERGE statement:
MERGE INTO rate r
USING dual
ON (r.regdate = '$apiReponseDate')
WHEN NOT MATCHED THEN
  INSERT (col1, col2, col3, regdate)
  VALUES ('value1', 'value2', 'value3', '$apiNewReponseDate');
This avoids two round trips from the middle tier to the database and does it all in a single statement.

Does Oracle View calculate fields you are not trying to query?

I am trying to optimize the performance of a query. The problem is:
Say we have view X. And it has attr A, B, C.
C is a calculated field.
Say I want to try to optimize by querying "Select A,B from X where A = 'some condition'"; then, if I need C, I can calculate it later on with the much smaller subset of data, to enhance performance.
My question is: will this help? Or does an Oracle view calculate C anyway when I make the initial query, regardless of whether I am querying for that attribute or not? Would I therefore have to remove such calculated fields from the view to optimize?
The short answer is yes, it helps to select only a limited column list from a view - both in terms of CPU (value calculation) and storage.
In cases where it is possible, the Oracle optimizer completely eliminates the view definition and merges it into the underlying tables.
So the view columns not referenced in the query are not accessed at all.
Here is a simple example:
create table t as
select rownum a, rownum b, rownum c from dual
connect by level <= 10;

create or replace view v1 as
select a, b, c/0 c from t;
I'm simulating the complex calculation with a division by zero, to see whether the value gets calculated at all.
The best way to check is to run the query and look at the execution plan:
select a, b from v1
where a = 1;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 26 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 26 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("A"=1)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "A"[NUMBER,22], "B"[NUMBER,22]
In the Column Projection Information you can see that only columns A and B are referenced - so no calculation of the C column is done.
Even the other way around works: first materialize the view, and only afterwards do the row filtering. I'm simulating it with the following query - note that the MATERIALIZE hint materializes all the view rows in a temporary table, which is then used in the query.
with vv as (
select /*+ MATERIALIZE */ a,b from v1)
select a, b from vv
where a = 1;
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 26 | 5 (0)| 00:00:01 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9D6605_5E8CE554 | | | | |
| 3 | TABLE ACCESS FULL | T | 1 | 26 | 3 (0)| 00:00:01 |
|* 4 | VIEW | | 1 | 26 | 2 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6605_5E8CE554 | 1 | 26 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter("A"=1)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "A"[NUMBER,22], "B"[NUMBER,22]
2 - SYSDEF[4], SYSDEF[0], SYSDEF[1], SYSDEF[96], SYSDEF[0]
3 - "A"[NUMBER,22], "B"[NUMBER,22]
4 - "A"[NUMBER,22], "B"[NUMBER,22]
5 - "C0"[NUMBER,22], "C1"[NUMBER,22]
Again you can see in the Column Projection Information that column C is not referenced.
What doesn't work, and what you should avoid, is materializing the view with a select * - this of course fails on the division by zero.
with vv as (
select /*+ MATERIALIZE */ * from v1)
select a, b from vv
where a = 1;
SQL is a declarative language, not a procedural language. The database optimizer actually determines the execution.
For your scenario, there are two reasonable options for the optimizer:
Filter the rows and then calculate the value after filtering.
Calculate the value first and then filter.
Oracle has a good optimizer, so I would expect it to consider both possibilities. It chooses the one that would have the best performance for your query.
If you need c only for some of the rows returned, then delaying the computation might be worthwhile if it is really expensive. However, that would be an unusual optimization; a sketch of the idea follows.
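A minimal sketch of that deferred calculation (view and column names follow the question; expensive_fn is a hypothetical stand-in for however C is computed):
select q.a,
       q.b,
       expensive_fn(q.a) as c   -- computed only for the filtered subset
from (
  select a, b
  from x
  where a = 'some condition'
) q;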

how to optimize pl sql if statement with multiple OR conditions

I'm new to PL/SQL; can you please let me know how I can optimize the IF statement below?
IF (inSeries='90') OR (inSeries='91') OR (inSeries='92') OR (inSeries='93') OR (inSeries='94') THEN
Like in SQL, where we can use
WHERE inSeries IN ('90','91','92','93','94')
In PL/SQL the IN condition works inside an IF condition as well:
declare
  inSeries varchar2(2) := '90';
begin
  if inSeries in ('90','91','92','93','94')
  then
    dbms_output.put_line(inSeries || ':this is within series');
  else
    dbms_output.put_line(inSeries || ':this is out of series');
  end if;
end;
-- output
90:this is within series
80:this is out of series
But there is another way, depending on the business logic: since, as I can see from your question, the values form a consecutive series, you can directly use a combination of greater-than and less-than comparisons, as sketched below.
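A minimal sketch of that range check (assuming the series really is contiguous; note that the values are strings, so for two-character values this relies on string ordering):
declare
  inSeries varchar2(2) := '92';
begin
  if inSeries >= '90' and inSeries <= '94'
  then
    dbms_output.put_line(inSeries || ':this is within series');
  end if;
end;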
The optimizer will most probably rewrite the query so that your IN becomes OR anyway. Compare line 3 of the query with the very last line of the output:
SQL> select job
2 from emp
3 where deptno in (10, 20, 30, 40);
JOB
---------
CLERK
SALESMAN
<snip>
14 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3956160932
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 154 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| EMP | 14 | 154 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("DEPTNO"=10 OR "DEPTNO"=20 OR "DEPTNO"=30 OR "DEPTNO"=40)
SQL>
You can also drive the check from a SQL query, using a condition such as WHERE inSeries IN ('90','91','92','93','94'). Note, however, that PL/SQL has no IF EXISTS construct, so the result of such a query has to be selected into a variable first, as sketched below.
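A minimal sketch of that pattern (your_table is a hypothetical stand-in for the <table_name> placeholder; the condition follows the original answer):
declare
  l_found number;
begin
  select count(*)
    into l_found
    from your_table
   where inSeries in ('90','91','92','93','94')
     and rownum = 1;  -- stop after the first match
  if l_found > 0 then
    dbms_output.put_line('a matching row exists');
  end if;
end;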

Sequence of query execution

When there is a correlated query, what is the sequence of execution?
Ex:
select
p.productNo,
(
select count(distinct concat(bom.detailpart,bom.groupname))
from dl_MBOM bom
where bom.DetailPart=p.ProductNo
) cnt1
from dm_product p
The execution plan will vary by database vendor. For Oracle, here is a similar query and the corresponding execution plan.
select dname,
( select count( distinct job )
from emp e
where e.deptno = d.deptno
) x
from dept d
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 6 (100)| |
| 1 | SORT GROUP BY | | 1 | 11 | | |
|* 2 | TABLE ACCESS FULL| EMP | 5 | 55 | 2 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | DEPT | 4 | 52 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("E"."DEPTNO"=:B1)
While it seems likely that the DBMS reads dm_product record by record and, for each such record, looks up the value in dl_MBOM, this doesn't necessarily happen.
With an SQL query you mainly tell the DBMS what to do, not how to do it. If the DBMS considers it better to build a join instead and work on that, it is free to do so, as sketched below.
Short answer: the sequence of execution is not determined. (In many DBMSs you can, however, look at the query's execution plan to see how it is executed.)
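For illustration, an equivalent join rewrite that the optimizer is free to choose (a sketch using the tables from the question; the outer join keeps products that have no matching BOM rows, for which the count is 0):
select p.productNo,
       count(distinct concat(bom.detailpart, bom.groupname)) cnt1
from dm_product p
left join dl_MBOM bom
  on bom.DetailPart = p.ProductNo
group by p.productNo;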

Oracle SQL execution plan changes due to SYS_OP_C2C internal conversion

I'm wondering why the cost of this query
select * from address a
left join name n on n.adress_id=a.id
where a.street='01';
is higher than
select * from address a
left join name n on n.adress_id=a.id
where a.street=N'01';
where the address table looks like this
ID NUMBER
STREET VARCHAR2(255 CHAR)
POSTAL_CODE VARCHAR2(255 CHAR)
and the name table looks like this
ID NUMBER
ADDRESS_ID NUMBER
NAME VARCHAR2(255 CHAR)
SURNAME VARCHAR2(255 CHAR)
These are the costs returned by explain plan.
Explain plan for '01'
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3591 | 1595K| 87 (0)| 00:00:02 |
| 1 | NESTED LOOPS OUTER | | 3591 | 1595K| 87 (0)| 00:00:02 |
|* 2 | TABLE ACCESS FULL | ADDRESS | 3 | 207 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| NAME | 1157 | 436K| 47 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | NAME_HSI | 1157 | | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("A"."STREET"='01')
4 - access("N"."ADDRESS_ID"(+)="A"."ID")
Explain plan for N'01'
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 347 | 154K| 50 (0)| 00:00:01 |
| 1 | NESTED LOOPS OUTER | | 347 | 154K| 50 (0)| 00:00:01 |
|* 2 | TABLE ACCESS FULL | ADDRESS | 1 | 69 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| NAME | 1157 | 436K| 47 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | NAME_HSI | 1157 | | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(SYS_OP_C2C("A"."STREET")=U'01')
4 - access("N"."ADDRESS_ID"(+)="A"."ID")
As you can see, the cost of the N'01' query is lower than the cost of the '01' query. Any idea why? N'01' additionally needs to convert varchar to nvarchar (SYS_OP_C2C()), so the cost should be higher. The other question is why the row estimate for the N'01' query is lower than for '01'.
[EDIT]
Table address has 30 rows.
Table name has 19669 rows.
SYS_OP_C2C is an internal function which does an implicit conversion of varchar2 to the national character set using the TO_NCHAR function. Thus, the filter completely changes compared to the filter using a normal comparison.
I am not sure why the estimated number of rows is lower, but I can guarantee it could be higher too; the cost estimation won't be affected.
Let's look at it step by step in a test case.
SQL> CREATE TABLE t AS SELECT 'a'||LEVEL col FROM dual CONNECT BY LEVEL < 1000;
Table created.
SQL>
SQL> EXPLAIN PLAN FOR SELECT * FROM t WHERE col = 'a10';
Explained.
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 5 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
1 - filter("COL"='a10')
13 rows selected.
SQL>
So far so good. Since there is only one row with the value 'a10', the optimizer estimated one row.
Let's see what happens with the national character set conversion.
SQL> EXPLAIN PLAN FOR SELECT * FROM t WHERE col = N'a10';
Explained.
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 50 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 10 | 50 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
1 - filter(SYS_OP_C2C("COL")=U'a10')
13 rows selected.
SQL>
What happened here? We can see filter(SYS_OP_C2C("COL")=U'a10'), which means an internal function is applied that converts the varchar2 value to nvarchar2. The row estimate is now 10.
This will also suppress any index usage, since a function is now applied to the column. We can tune it by creating a function-based index to avoid the full table scan.
SQL> create index nchar_indx on t(to_nchar(col));
Index created.
SQL>
SQL> EXPLAIN PLAN FOR SELECT * FROM t WHERE to_nchar(col) = N'a10';
Explained.
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 1400144832
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 50 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 10 | 50 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | NCHAR_INDX | 4 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
---------------------------------------------------
2 - access(SYS_OP_C2C("COL")=U'a10')
14 rows selected.
SQL>
However, will this make the execution plans similar? No. I think that with two different character sets the filter will not be applied alike, and that is where the difference lies.
My research says: usually such scenarios occur when the data coming via an application is of nvarchar2 type, but the table column is varchar2. Oracle then applies an internal function in the filter operation. My suggestion is to know your data well, so that you use matching data types during the design phase.
When worrying about explain plans, it matters whether there are current statistics on the tables. If the statistics do not represent the actual data reasonably well, then the optimizer will make mistakes and estimate cardinalities incorrectly.
You can check how long ago statistics were gathered by querying the data dictionary:
select table_name, last_analyzed
from user_tables
where table_name in ('ADDRESS','NAME');
You can gather statistics for the optimizer to use by calling DBMS_STATS:
begin
dbms_stats.gather_table_stats(user, 'ADDRESS');
dbms_stats.gather_table_stats(user, 'NAME');
end;
/
So perhaps after gathering statistics you will get different explain plans. Perhaps not.
The difference in your explain plans is primarily because the optimizer estimates how many rows it will find in the address table differently in the two cases.
In the first case you have an equality predicate with the same datatype - this is good, and the optimizer can often estimate cardinality (row count) reasonably well for cases like this.
In the second case a function is applied to the column - this is often bad (unless you have function-based indexes) and forces the optimizer to take a wild guess. That wild guess will differ between versions of Oracle as the optimizer developers try to improve upon it. In some versions the wild guess will simply be something like "I guess 5% of the number of rows in the table."
When comparing different datatypes, it is best to avoid implicit conversions, particularly when, as in this case, the implicit conversion puts a function on the column rather than on the literal. If you have cases where you get a value of datatype NVARCHAR2 and need to use it in a predicate like the above, it can be a good idea to explicitly convert the value to the datatype of the column.
select * from address a
left join name n on n.adress_id=a.id
where a.street = CAST( N'01' AS VARCHAR2(255));
In this case, with a literal, it does not make sense, of course; here you would just use your first query. But if it were a variable or a function parameter, there could be use cases for doing something like this, as sketched below.
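A minimal sketch of that idea with a parameter (count_names_on_street and p_street are hypothetical; the CAST keeps the conversion on the parameter side, so the street column is compared without any function applied to it):
create or replace function count_names_on_street(p_street nvarchar2)
  return number
as
  l_cnt number;
begin
  select count(*)
    into l_cnt
    from address a
    left join name n on n.adress_id = a.id
   where a.street = cast(p_street as varchar2(255));  -- convert the parameter, not the column
  return l_cnt;
end;
/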
As I can see, the first query is estimated to return 3591 rows and the second one 347 rows, so Oracle expects fewer I/O operations; that is why the cost is lower.
Don't be confused by "N'01' needs additionally to convert varchar to nvarchar": Oracle does one hard parse and then uses a soft parse for subsequent identical queries, so the longer your Oracle instance runs, the faster it becomes.