Join much slower using table() function - sql

I am attempting to use a table() function on an object in order to do a join within a PL/SQL function. When using this function, a query may take up to 20 minutes to complete; when I enter the data directly into a table instead, it takes less than 5 seconds. I have not been able to figure out why there is such a significant difference, but my best hunch is that the index on the column from the joining table is not being used. The column definition for the tables and for the objects is the same.
Here is some example code:
create or replace type VARCHAR20_TYPE is OBJECT
(
val varchar2(20 byte);
);
create or replace type VARCHAR20_TABLE is table of VARCHAR20_TYPE;
create or replace FUNCTION test_function(
in_project_ids VARCHAR20_TABLE
) RETURN INTEGER
IS
l_result INTEGER;
BEGIN
SELECT count(*) into l_result FROM project p JOIN TABLE(in_project_ids) t ON p.project_id = t.val;
RETURN l_result;
END;
If I were to replace in_project_ids in the above example with a join to a real table with the same column definition, it significantly improves the performance of the function.

this is to be expected. when dealing with in memory arrays like this Oracle will assume 8k rows will be in that table.
try this to help it:
SELECT /*+ cardinality(t, 20) */ count(*) into l_result FROM project p JOIN TABLE(in_project_ids) t ON p.project_id = t.val;
where 20 should be a rough guess on the actual number of entries. this is one of the edge cases where hinting is "ok" (and required to help the optimizer).
edit
eg:
SQL> explain plan for SELECT /*+ cardinality(t, 1) */ * FROM project p JOIN TABLE(VARCHAR20_TABLE()) t ON p.project_id = t.val;
Explained.
SQL> select * From table(dbms_xplan.display);
Plan hash value: 858605789
--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 27 | 30 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 1 | 27 | 30 (0)| 00:00:01 |
| 3 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 1 | 2 | 29 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | SYS_C0011177 | 1 | | 0 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID | PROJECT | 1 | 25 | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("P"."PROJECT_ID"=TO_NUMBER(SYS_OP_ATG(VALUE(KOKBF$),1,2,2)))
Note
-----
- dynamic sampling used for this statement (level=2)
21 rows selected.
SQL> explain plan for SELECT * FROM project p JOIN TABLE(VARCHAR20_TABLE()) t ON p.project_id = t.val;
Explained.
SQL> select * From table(dbms_xplan.display);
Plan hash value: 583089723
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 | 215K| 33 (4)| 00:00:01 |
|* 1 | HASH JOIN | | 8168 | 215K| 33 (4)| 00:00:01 |
| 2 | TABLE ACCESS FULL | PROJECT | 2000 | 50000 | 3 (0)| 00:00:01 |
| 3 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 8168 | 16336 | 29 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("P"."PROJECT_ID"=TO_NUMBER(SYS_OP_ATG(VALUE(KOKBF$),1,2,2)))
Note
-----
- dynamic sampling used for this statement (level=2)
19 rows selected.
a trivial example but note the "Rows" on the collection fetch = 8168 without the hint and the change in plan as a result. check the explain plan with the real table vs the collection vs the hinted collection and helpfully, with a reasonable cardinality hint number your plan and performance should improve.

Related

Performance problem with QUERY using BIND variables and OR condition in Oracle 12.2

I am having a hard time understanding why the Oracle CBO is behaving the way it does when a bind variable is part of a OR condition.
My environment
Oracle 12.2 over Red Hat Linux 7
HINT. I am just providing a simplification of the query where the problem is located
$ sqlplus / as sysdba
SQL*Plus: Release 12.2.0.1.0 Production on Thu Jun 10 15:40:07 2021
Copyright (c) 1982, 2016, Oracle. All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
SQL> #test.sql
SQL> var loanIds varchar2(4000);
SQL> exec :loanIds := '100000018330,100000031448,100000013477,100000023115,100000022550,100000183669,100000247514,100000048198,100000268289';
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.00
SQL> SELECT
2 whs.* ,
3 count(*) over () AS TOTAL
4 FROM ALFAMVS.WHS_LOANS whs
5 WHERE
6 ( nvl(:loanIds,'XX') = 'XX' or
7 loanid IN (select regexp_substr(NVL(:loanIds,''),'[^,]+', 1, level) from dual
8 connect by level <= regexp_count(:loanIds,'[^,]+'))
9 )
10 ;
7 rows selected.
Elapsed: 00:00:18.72
Execution Plan
----------------------------------------------------------
Plan hash value: 2980809427
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6729 | 6748K| 2621 (1)| 00:00:01 |
| 1 | WINDOW BUFFER | | 6729 | 6748K| 2621 (1)| 00:00:01 |
|* 2 | FILTER | | | | | |
| 3 | TABLE ACCESS FULL | WHS_LOANS | 113K| 110M| 2621 (1)| 00:00:01 |
|* 4 | FILTER | | | | | |
|* 5 | CONNECT BY WITHOUT FILTERING (UNIQUE)| | | | | |
| 6 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(NVL(:LOANIDS,'XX')='XX' OR EXISTS (SELECT 0 FROM "DUAL" "DUAL" WHERE
SYS_OP_C2C( REGEXP_SUBSTR (NVL(:LOANIDS,''),'[^,]+',1,LEVEL))=:B1 CONNECT BY LEVEL<=
REGEXP_COUNT (:LOANIDS,'[^,]+')))
4 - filter(SYS_OP_C2C( REGEXP_SUBSTR (NVL(:LOANIDS,''),'[^,]+',1,LEVEL))=:B1)
5 - filter(LEVEL<= REGEXP_COUNT (:LOANIDS,'[^,]+'))
Statistics
----------------------------------------------------------
288 recursive calls
630 db block gets
9913 consistent gets
1 physical reads
118724 redo size
13564 bytes sent via SQL*Net to client
608 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
113003 sorts (memory)
0 sorts (disk)
7 rows processed
SQL> set autotrace off
SQL> select count(*) from ALFAMVS.WHS_LOANS ;
COUNT(*)
----------
113095
1 row selected.
Elapsed: 00:00:00.14
KEY POINTS
I do know that if I change the OR expression by using two selects and UNION ALL works perfectly. The problem is that I have a lot of conditions done in the same way, so UNION ALL is not a solution in my case.
The table has statistics up to date calculated with FOR ALL COLUMNS SIZE AUTO and with ESTIMATE PERCENT 10%.
Dynamic SQL is not a solution in my case, because the query is called through a third party software that uses an API Web to convert the result to JSON.
I was able to rephrase the regular expression with connect by level in a way that now takes 19 seconds. Before it was taking 40 seconds.
The table has only 113K records and no indexes.
The query has 20 conditions of this kind, all written in the same way, as the screen in the web app that triggers the query by the API allows the user to use any combination of parameters or none at all.
If I remove the expression NVL(:loanIds,'XX') = 'XX' OR, the query takes 0.01 seconds. Why this OR expression with BINDs is giving such headache to the Optimizer ?
-- UPDATE --
I want to thank #Alex Poole for his suggestions and share with him that the third alternative ( removing the regular expressions ) has worked as a charm. It would be great to understand why, though. You have my most sincere gratitude. I used those for a while and I never had this problem. Also, the suggestion to use regexp_like was even better than the original one with regexp_substr and connect by level, but much slower by far than the one where no regular expressions are used at all
Original query
7 rows selected.
Elapsed: 00:00:36.29
New query
7 rows selected.
Elapsed: 00:00:00.58
Once the EXISTS disappeared of the internal predicate, the query works as fast as hell.
Thank you all for your comments !
From the execution plan the optimiser is, for some reason, re-evaluating the hierarchical query for every row in your table, and then using exists() to see if that row's ID is in the result. It isn't clear why the or is causing that. It might be something to raise with Oracle.
From experimenting I can see three ways to at least partially work around the problem - though I'm sure there are others. The first is to move the CSV expansion to a CTE and then force that to materialize with a hint:
WITH loanIds_cte (loanId) as (
select /*+ materialize */ regexp_substr(:loanIds,'[^,]+', 1, level)
from dual
connect by level <= regexp_count(:loanIds,'[^,]+')
)
SELECT
whs.* ,
count(*) over () AS TOTAL
FROM WHS_LOANS whs
WHERE
( :loanIds is null or
loanid IN (select loanId from loanIds_cte)
)
;
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
Plan hash value: 3226738189
--------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1102 | 9918 | 11 (0)| 00:00:01 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9FD2A6_198A2E1A | | | | |
|* 3 | CONNECT BY WITHOUT FILTERING| | | | | |
| 4 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 5 | WINDOW BUFFER | | 1102 | 9918 | 9 (0)| 00:00:01 |
|* 6 | FILTER | | | | | |
| 7 | TABLE ACCESS FULL | WHS_LOANS | 11300 | 99K| 9 (0)| 00:00:01 |
|* 8 | VIEW | | 1 | 2002 | 2 (0)| 00:00:01 |
| 9 | TABLE ACCESS FULL | SYS_TEMP_0FD9FD2A6_198A2E1A | 1 | 2002 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(LEVEL<= REGEXP_COUNT (:LOANIDS,'[^,]+'))
6 - filter(:LOANIDS IS NULL OR EXISTS (SELECT 0 FROM (SELECT /*+ CACHE_TEMP_TABLE ("T1") */ "C0"
"LOANID" FROM "SYS"."SYS_TEMP_0FD9FD2A6_198A2E1A" "T1") "LOANIDS_CTE" WHERE SYS_OP_C2C("LOANID")=:B1))
8 - filter(SYS_OP_C2C("LOANID")=:B1)
That still does the odd transformation to exists(), but at least now that is querying the materialized CTE, so that connect by query is only evaluated one.
Or you could compare each loadId value with the full string using a regular expression:
SELECT
whs.* ,
count(*) over () AS TOTAL
FROM WHS_LOANS whs
WHERE
( :loanIds is null or
regexp_like(:loanIds, '(^|,)' || loanId || '(,|$)')
)
;
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
Plan hash value: 1622376598
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1102 | 9918 | 9 (0)| 00:00:01 |
| 1 | WINDOW BUFFER | | 1102 | 9918 | 9 (0)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| WHS_LOANS | 1102 | 9918 | 9 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(:LOANIDS IS NULL OR REGEXP_LIKE
(:LOANIDS,SYS_OP_C2C(U'(^|,)'||"LOANID"||U'(,|$)')))
which is slower than the CTE in my testing because regular expression are still expensive and you're doing 113k of them (still, better than 2 x 113k x number-of-elements of them).
Or you can avoid regular expressions and use several separate comparisons:
SELECT
whs.* ,
count(*) over () AS TOTAL
FROM WHS_LOANS whs
WHERE
( :loanIds is null or
:loanIds like loanId || ',%' or
:loanIds like '%,' || loanId or
:loanIds like '%,' || loanId || ',%'
)
;
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
Plan hash value: 1622376598
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2096 | 18864 | 9 (0)| 00:00:01 |
| 1 | WINDOW BUFFER | | 2096 | 18864 | 9 (0)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| WHS_LOANS | 2096 | 18864 | 9 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(:LOANIDS IS NULL OR :LOANIDS LIKE
SYS_OP_C2C("LOANID"||U',%') OR :LOANIDS LIKE
SYS_OP_C2C(U'%,'||"LOANID") OR :LOANIDS LIKE
SYS_OP_C2C(U'%,'||"LOANID"||U',%'))
which is fastest of those three options in my limited testing. But there may well be better and faster approaches.
Not really relevant, but you seem to be running this as SYS which isn't a good idea, even if the data is in another schema; your loanId column appears to be nvarchar2 (from the SYS_OP_C2C calls), which seems odd for something that could possibly be a number but in any case only seems likely to have ASCII characters; NVL(:loanIds,'') doesn't do anything, since null and empty string are the same in Oracle; and nvl(:loanIds,'XX') = 'XX' can be done as :loanIds is not null which avoid magic values.

ORA-01652 Why does unused row limiter solve this?

If I run a query without a row limiter i get an ora-01652 telling me I am out of temp table space. (I'm not the DBA & I admittedly don't fully understand this error.) If I add a rownum < 1000000000 it runs in a few seconds (yes, it's limited to a billion rows). My inner query only returns about 1,000 rows. How is an absurdly large row limiter, that is never reached, making this query run? There should be no difference between the limited and unlimited queries, no?
select
col1,
col2,
...
from
(
select
col1, col2,...
from table1 a
join table2 b-- limiter for performance
on a.column= b.column
or a.col= b.col
where
filter = 'Y'
and rownum <1000000000 -- irrelevant but query doesn't run without it.
) c
join table3 d
on c.id = d.id
We need to see the execution plan for the queries with and without the rownum condition. But as an example, adding a "rownum" can change an execution plan
SQL> create table t as select * from dba_objects
2 where object_id is not null;
Table created.
SQL>
SQL> create index ix on t ( object_id );
Index created.
SQL>
SQL> set autotrace traceonly explain
SQL> select * from t where object_id > 0 ;
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 82262 | 10M| 445 (2)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 82262 | 10M| 445 (2)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("OBJECT_ID">0)
SQL> select * from t where object_id > 0 and rownum < 10;
Execution Plan
----------------------------------------------------------
Plan hash value: 658510075
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9 | 1188 | 3 (0)| 00:00:01 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 9 | 1188 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | IX | | | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
This is a simplistic example, but you can get similar things with joins and the like, in particular, the "rownum" clause might be prohibiting the innermost join being folded into the outermost one, and thus yielding a different plan.

Oracle Optimizer Awkwardly Doesn't Prefer to Use Index

I'm joining a table with itself but although I expect this operation to use index, it seems it doesn't. There are 1 million records on the table(MY_TABLE) and the query I run is executing on about 10 thousand records.(So it is lower than %1 of whole table.)
Test Case:
explain plan for
SELECT *
FROM SCHM.MY_TABLE A1, SCHM.MY_TABLE A2
WHERE (A1.K_ID = '123abc')
AND A1.HDT = A2.HDT
AND A2.C_DATE BETWEEN A1.SYSDATE - 0.0004
AND A1.SYSDATE + 0.0004
AND A1.GKID = A2.GKID;
Plan hash value: 1210306805
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 81 | 28 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
|* 2 | HASH JOIN | | 3 | 81 | 28 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| MY_TABLE | 3 | 45 | 7 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | IX_MY_TABLE_C_DATE | 3 | | 4 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| MY_TABLE | 17 | 204 | 21 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | IX_MY_TABLE_K_ID | 17 | | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(SYSDATE#!+0.00004>=SYSDATE#!-0.00004)
2 - access("A1"."HDT"="A2"."HDT" AND
"A1"."GKID"="A2"."GKID")
4 - access("A2"."C_DATE">=SYSDATE#!-0.00004 AND
"A2"."C_DATE"<=SYSDATE#!+0.00004)
6 - access("A1"."K_ID"=U'123abc')
In the above statement, it can be seen that the index on C_DATE is used.
However, in the below statement, the index on C_DATE is not used. So, the query runs really slow.
Real Case:
explain plan for
SELECT *
FROM SCHM.MY_TABLE A1, SCHM.MY_TABLE A2
WHERE (A1.K_ID = '123abc')
AND A1.HDT = A2.HDT
AND A2.C_DATE BETWEEN A1.C_DATE - 0.0004
AND A1.C_DATE + 0.0004
AND A1.GKID = A2.GKID;
Plan hash value: 1063167343
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4187K| 998M| 6549K (1)| 00:04:16 |
|* 1 | HASH JOIN | | 4187K| 998M| 6549K (1)| 00:04:16 |
| 2 | JOIN FILTER CREATE | :BF0000 | 17 | 2125 | 21 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| MY_TABLE | 17 | 2125 | 21 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | IX_MY_TABLE_K_ID | 17 | | 4 (0)| 00:00:01 |
| 5 | JOIN FILTER USE | :BF0000 | 1429M| 166G| 6546K (1)| 00:04:16 |
|* 6 | TABLE ACCESS STORAGE FULL | MY_TABLE | 1429M| 166G| 6546K (1)| 00:04:16 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access(A1.HDT=A2.HDT AND
A1.GKID=A2.GKID)
filter(A2.C_DATE>=INTERNAL_FUNCTION(A1.C_DATE)-0.00004 AND
A2.C_DATE<=INTERNAL_FUNCTION(A1.C_DATE)+0.00004)
4 - access(A1.K_ID=U'123abc')
6 - storage(SYS_OP_BLOOM_FILTER(:BF0000,A2.HDT,A2.HDT))
filter(SYS_OP_BLOOM_FILTER(:BF0000,A2.HDT,A2.HDT))
If I use the hint /*+index(A2,IX_MY_TABLE_C_DATE )*/, everthing is fine, the index is used and the query runs fast as I want.
The query in the real case cannot be changed because it is created by application.
Index Information:
K_ID, not unique, position 1
HDT, not unique, position 1
C_DATE, not unique, position 1
ID Unique and Primary Key, position 1
What do I have to change in order the query in the real case to use index?
Well, the second query is slower since it's quite different from the first one. It has an extra join between tables:
AND A2.C_DATE BETWEEN A1.C_DATE - 0.0004
AND A1.C_DATE + 0.0004
and on a million rows this takes a toll.
The first query doesn't have this join condition and both tables are:
Filtered first. This is fast using indexes: only 3 rows and 17 rows.
Joining them second. Joining 3 and 17 rows doesn't take any time.
The second query needs to perform:
A huge (hash) join first, that returns probably 100K+ rows.
A filtering later.
This is way slower.
I suggest adding the following indexes, and try again:
create index ix_1 (k_id);
create index ix_2 (hdt, gkid, c_date);
You have three join criteria (HDT, GKID, C_DATE) and 1 non-join criteria (K_ID) in your self-join. So for me it would seem natural, if the DBMS started with the records matching K_ID and then looked up all matching other records.
For this case I'd suggest the following indexes:
create index idx1 on my_table(k_id, hdt, gkid, c_date);
create index idx2 on my_table(hdt, gkid, c_date);
If there are just few records per k_id, I am sure that Oracle will use the indexes. If there are many, Oracle may still use the second one for a hash join.
Just to add, whenever you have a case where you
have a slow SQL,
have identified a hint that would make it
faster
cannot change the SQL because the source is not available
then this is a perfect case for SQL Plan Baselines. These let you lock a "good" plan against an existing SQL without touching the SQL statement itself.
The entire series describing SPM is at the links below, but thelinke to "part 4" walks through an exact example of what you want to achieve.
https://blogs.oracle.com/optimizer/sql-plan-management-part-1-of-4-creating-sql-plan-baselines
https://blogs.oracle.com/optimizer/sql-plan-management-part-2-of-4-spm-aware-optimizer
https://blogs.oracle.com/optimizer/sql-plan-management-part-3-of-4:-evolving-sql-plan-baselines
https://blogs.oracle.com/optimizer/sql-plan-management-part-4-of-4:-user-interfaces-and-other-features

Why Oracle it's running this (wrong) query

Why Oracle it's running this (wrong) query?
SELECT * FROM CUSTOMERS WHERE CUSTOMER_TYPE_ID = 1ORDER BY ID;
without a space between 1 and ORDER
In Oracle a variable name or identifier starts with underscore("_") or letters. So, for 1order, the interpreter knows there is no identifier, it must be a number, so it tries to get the number and separate the rest and succeeds.
Looking at the explain plan, you can see that Oracle could resolve the filter predicate, and the query is considered valid.
SQL> EXPLAIN PLAN FOR
2 SELECT * FROM OE.CUSTOMERS WHERE CUSTOMER_ID = 232ORDER BY CUSTOMER_ID;
Explained.
SQL>
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------
Plan hash value: 4238351645
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 177 | 1 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| CUSTOMERS | 1 | 177 | 1 (0)| 00:00:01 |
|* 2 | INDEX UNIQUE SCAN | CUSTOMERS_PK | 1 | | 0 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("CUSTOMER_ID"=232)
14 rows selected.
SQL>
So, optimizer could identify it as access("CUSTOMER_ID"=232)

Oracle SQL execution plan changes due to SYS_OP_C2C internal conversion

I'm wondering why cost of this query
select * from address a
left join name n on n.adress_id=a.id
where a.street='01';
is higher than
select * from address a
left join name n on n.adress_id=a.id
where a.street=N'01';
where address table looks like this
ID NUMBER
STREET VARCHAR2(255 CHAR)
POSTAL_CODE VARCHAR2(255 CHAR)
and name table looks like this
ID NUMBER
ADDRESS_ID NUMBER
NAME VARCHAR2(255 CHAR)
SURNAME VARCHAR2(255 CHAR)
These are costs returned by explain plan
Explain plan for '01'
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3591 | 1595K| 87 (0)| 00:00:02 |
| 1 | NESTED LOOPS OUTER | | 3591 | 1595K| 87 (0)| 00:00:02 |
|* 2 | TABLE ACCESS FULL | ADDRESS | 3 | 207 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| NAME | 1157 | 436K| 47 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | NAME_HSI | 1157 | | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("A"."STREET"='01')
4 - access("N"."ADDRESS_ID"(+)="A"."ID")
Explain plan for N'01'
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 347 | 154K| 50 (0)| 00:00:01 |
| 1 | NESTED LOOPS OUTER | | 347 | 154K| 50 (0)| 00:00:01 |
|* 2 | TABLE ACCESS FULL | ADDRESS | 1 | 69 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| NAME | 1157 | 436K| 47 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | NAME_HSI | 1157 | | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(SYS_OP_C2C("A"."STREET")=U'01')
4 - access("N"."ADDRESS_ID"(+)="A"."ID")
As you can see cost for N'01' query is lower than cost for '01'. Any idea why? N'01' needs additionally convert varchar to nvarchar so cost should be higher (SYS_OP_C2C()). The other question is why rows processed by N'01' query is lower than '01'?
[EDIT]
Table address has 30 rows.
Table name has 19669 rows.
SYS_OP_C2C is an internal function which does an implicit conversion of varchar2 to national character set using TO_NCHAR function. Thus, the filter completely changes as compared to the filter using normal comparison.
I am not sure about the reason why the number of rows are less, but I can guarantee it could be more too. Cost estimation won't be affected.
Let's try to see step-by-step in a test case.
SQL> CREATE TABLE t AS SELECT 'a'||LEVEL col FROM dual CONNECT BY LEVEL < 1000;
Table created.
SQL>
SQL> EXPLAIN PLAN FOR SELECT * FROM t WHERE col = 'a10';
Explained.
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 5 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
1 - filter("COL"='a10')
13 rows selected.
SQL>
So far so good. Since there is only one row with value as 'a10', optimizer estimated one row.
Let's see with the national characterset conversion.
SQL> EXPLAIN PLAN FOR SELECT * FROM t WHERE col = N'a10';
Explained.
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 50 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 10 | 50 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
1 - filter(SYS_OP_C2C("COL")=U'a10')
13 rows selected.
SQL>
What happened here? We can see filter(SYS_OP_C2C("COL")=U'a10'), which means an internal function is applied and it converts the varchar2 value to nvarchar2. The filter now found 10 rows.
This will also suppress any index usage, since now a function is applied on the column. We can tune it by creating a function-based index to avoid full table scan.
SQL> create index nchar_indx on t(to_nchar(col));
Index created.
SQL>
SQL> EXPLAIN PLAN FOR SELECT * FROM t WHERE to_nchar(col) = N'a10';
Explained.
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 1400144832
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 50 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 10 | 50 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | NCHAR_INDX | 4 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
---------------------------------------------------
2 - access(SYS_OP_C2C("COL")=U'a10')
14 rows selected.
SQL>
However, will this make the execution plans similar? No. i think with two different charactersets , the filter will not be applied alike. Thus, the difference lies.
My research says,
Usually, such scenarios occur when the data coming via an application
is nvarchar2 type, but the table column is varchar2. Thus, Oracle
applies an internal function in the filter operation. My suggestion
is, to know your data well, so that you use similar data types during
design phase.
When worrying about explain plans, it matters whether there are current statistics on the tables. If the statistics do not represent the actual data reasonably well, then the optimizer will make mistakes and estimate cardinalities incorrectly.
You can check how long ago statistics were gathered by querying the data dictionary:
select table_name, last_analyzed
from user_tables
where table_name in ('ADDRESS','NAME');
You can gather statistics for the optimizer to use by calling DBMS_STATS:
begin
dbms_stats.gather_table_stats(user, 'ADDRESS');
dbms_stats.gather_table_stats(user, 'NAME');
end;
So perhaps after gathering statistics you will get different explain plans. Perhaps not.
The difference in your explain plans is primarily because the optimizer estimates how many rows it will find in address table differently in the two cases.
In the first case you have an equality predicate with same datatype - this is good and the optimizer can often estimate cardinality (row count) reasonably well for cases like this.
In the second case a function is applied to the column - this is often bad (unless you have function based indexes) and will force the optimizer to take a wild guess. That wild quess will be different in different versions of Oracle as the developers of the optimizer tries to improve upon it. Some versions the wild guess will simply be something like "I guess 5% of the number of rows in the table."
When comparing different datatypes, it is best to avoid implicit conversions, particularly when like this case the implicit conversion makes a function on the column rather than the literal. If you have cases where you get a value as datatype NVARCHAR2 and need to use it in a predicate like above, it can be a good idea to explicitly convert the value to the datatype of the column.
select * from address a
left join name n on n.adress_id=a.id
where a.street = CAST( N'01' AS VARCHAR2(255));
In this case with a literal it does not make sense, of course. Here you would just use your first query. But if it was a variable or function parameter, maybe you could have use cases for doing something like this.
As I can see the first query returns 3591 rows, the second one returns 347 rows. So Oracle needs less I/O operation that's why the cost is less.
Don't be confused with
N'01' needs additionally convert varchar to nvarchar
Oracle does one hard parse and then uses soft parse for the same queries. So the longer your oracle works the faster it becomes.