Lookup table for Oracle DECODEs?

It might be a newbie question, but still..
We are all familiar with Oracle's DECODE and CASE expressions, e.g.
select
decode (state,
0, 'initial',
1, 'current',
2, 'final',
state)
from states_table
Or the same sort of thing using CASE expressions.
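For reference, the equivalent CASE expression would be something like the following (a sketch; note that CASE branches must share one type, hence the to_char on the default, which DECODE handles implicitly):

```sql
select
  case state
    when 0 then 'initial'
    when 1 then 'current'
    when 2 then 'final'
    else to_char(state)  -- CASE branches must agree on type, so convert the default
  end as state_desc
from states_table
```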
Now let's say I have a table with these same values:
state_num | state_desc
0 | 'initial'
1 | 'current'
2 | 'final'
is there a way I could do that same query using this table as a resource for the decode?
Please note that I do not want to join the table to access the data from the other table... I just want to know if there's something I could use to do a sort of decode(myField, usingThisLookupTable, thisValueForDefault).

Instead of a join, you could use a subquery, i.e.
select nvl(
(select state_desc
from lookup
where state_num=state),to_char(state))
from states_table;

No, there is no other way besides joining to your second table. Sure, you could write a scalar subquery in your select clause, or you could write your own function, but that would be inefficient practice.
If you need the data from the table, you need to select from it.
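For completeness, the "own function" approach mentioned above could look like the following (a sketch; the function name and the fallback behaviour are my own choices, mirroring DECODE's default):

```sql
create or replace function get_state_desc(p_state number) return varchar2
is
  l_desc lookup.state_desc%type;
begin
  select state_desc
    into l_desc
    from lookup
   where state_num = p_state;
  return l_desc;
exception
  when no_data_found then
    return to_char(p_state);  -- fall back to the raw value, like DECODE's default
end;
/

-- Usage: select get_state_desc(state) from states_table;
```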
EDIT:
I have to refine my earlier statement about the inefficient practice.
When using a scalar subquery in your select list, you'd expect to force a nested-loop-like plan, where the scalar subquery gets executed for each row of the states_table. At least I expected that :-).
However, Oracle has implemented scalar subquery caching, which leads to a really nice optimization: it executes the subquery only 3 times. There is an excellent article about scalar subqueries showing that more factors play a role in how this optimization behaves: http://www.oratechinfo.co.uk/scalar_subqueries.html#scalar3
Here is my own test to see this at work. For a simulation of your tables, I used this script:
create table states_table (id,state,filler)
as
select level
, floor(dbms_random.value(0,3))
, lpad('*',1000,'*')
from dual
connect by level <= 100000
/
alter table states_table add primary key (id)
/
create table lookup_table (state_num,state_desc)
as
select 0, 'initial' from dual union all
select 1, 'current' from dual union all
select 2, 'final' from dual
/
alter table lookup_table add primary key (state_num)
/
alter table states_table add foreign key (state) references lookup_table(state_num)
/
exec dbms_stats.gather_table_stats(user,'states_table',cascade=>true)
exec dbms_stats.gather_table_stats(user,'lookup_table',cascade=>true)
Then execute the query and have a look at the real execution plan:
SQL> select /*+ gather_plan_statistics */
2 s.id
3 , s.state
4 , l.state_desc
5 from states_table s
6 join lookup_table l on s.state = l.state_num
7 /
ID STATE STATE_D
---------- ---------- -------
1 2 final
...
100000 0 initial
100000 rows selected.
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
2 /
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------
SQL_ID f6p6ku8g8k95w, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ s.id , s.state , l.state_desc from states_table s join
lookup_table l on s.state = l.state_num
Plan hash value: 1348290364
---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------
|* 1 | HASH JOIN | | 1 | 99614 | 100K|00:00:00.50 | 20015 | 7478 | 1179K| 1179K| 578K (0)|
| 2 | TABLE ACCESS FULL| LOOKUP_TABLE | 1 | 3 | 3 |00:00:00.01 | 3 | 0 | | | |
| 3 | TABLE ACCESS FULL| STATES_TABLE | 1 | 99614 | 100K|00:00:00.30 | 20012 | 7478 | | | |
---------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("S"."STATE"="L"."STATE_NUM")
20 rows selected.
Now do the same for the scalar subquery variant:
SQL> select /*+ gather_plan_statistics */
2 s.id
3 , s.state
4 , ( select l.state_desc
5 from lookup_table l
6 where l.state_num = s.state
7 )
8 from states_table s
9 /
ID STATE (SELECT
---------- ---------- -------
1 2 final
...
100000 0 initial
100000 rows selected.
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
2 /
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 22y3dxukrqysh, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ s.id , s.state , ( select l.state_desc
from lookup_table l where l.state_num = s.state ) from states_table s
Plan hash value: 2600781440
---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
---------------------------------------------------------------------------------------------------------------
| 1 | TABLE ACCESS BY INDEX ROWID| LOOKUP_TABLE | 3 | 1 | 3 |00:00:00.01 | 5 | 0 |
|* 2 | INDEX UNIQUE SCAN | SYS_C0040786 | 3 | 1 | 3 |00:00:00.01 | 2 | 0 |
| 3 | TABLE ACCESS FULL | STATES_TABLE | 1 | 99614 | 100K|00:00:00.30 | 20012 | 9367 |
---------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("L"."STATE_NUM"=:B1)
20 rows selected.
And look at the Starts column of steps 1 and 2: only 3!
Whether this optimization is always a good thing in your situation depends on many factors. You can refer to the article mentioned earlier to see the effect of some of them.
In your situation with only three states, it looks like you can't go wrong with the scalar subquery variant.
Regards,
Rob.

Related

What is the best way to see if a row exists when you know that you only need to check the recent rows?

I have a table that stores the response from certain API.
It has 1.7 million rows.
The pk is a kind of Unix time (not exactly, but similar).
I call the API very frequently to see if the data has changed.
To check whether the data has changed, I have to run this query:
SELECT 1
FROM RATE
WHERE REGDATE = '$apiReponseDate' --yymmddhhmmss
If no row comes back, that means the response has changed, and then I insert.
I have an index on REGDATE, and I know this lets Oracle do an index (binary) search rather than a full scan.
But I do know that, in order to tell whether the data has been updated, I only need to check the recent rows.
To me, filtering the whole table with a WHERE clause seems inefficient.
Is there any good way to see if the data I got from the API response is already in the DB or not?
I'm using Oracle, but that is not the main point, because I'm asking about the query's efficiency.
You may use the index_desc hint and filter by rownum to access the table and read the most recent row. Then compare this value with the current API response.
An example is below for a (default) ascending index. If the index is created as id desc, you need to reverse the reading order (specify the index_asc hint).
create table t (
id not null,
val
) as
select level,
dbms_random.string('x', 1000)
from dual
connect by level < 5000
create unique index t_ix
on t(id)
select
/*+
index_desc(t (t.id))
gather_plan_statistics
*/
id,
substr(val, 1, 10)
from t
where rownum = 1
        ID SUBSTR(VAL,1,10)
---------- ----------------
      4999 D0H3YOHB5E
select *
from dbms_xplan.display_cursor(
format => 'ALL ALLSTATS'
)
PLAN_TABLE_OUTPUT
:-----------------
SQL_ID 2ym2rg02qfmk4, child number 0
-------------------------------------
select /*+ index_desc(t (t.id)) gather_plan_statistics */
id, substr(val, 1, 10) from t where rownum = 1
 
Plan hash value: 1335626365
 
------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers | Reads |
------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 3 (100)| | 1 |00:00:00.01 | 3 | 1 |
|* 1 | COUNT STOPKEY | | 1 | | | | | 1 |00:00:00.01 | 3 | 1 |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1 | 1005 | 3 (0)| 00:00:01 | 1 |00:00:00.01 | 3 | 1 |
| 3 | INDEX FULL SCAN DESCENDING | T_IX | 1 | 4999 | | 2 (0)| 00:00:01 | 1 |00:00:00.01 | 2 | 1 |
------------------------------------------------------------------------------------------------------------------------------------------------
 
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
 
   1 - SEL$1
   2 - SEL$1 / "T"#"SEL$1"
   3 - SEL$1 / "T"#"SEL$1"
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(ROWNUM=1)
 
Column Projection Information (identified by operation id):
-----------------------------------------------------------
 
   1 - "ID"[NUMBER,22], "VAL"[VARCHAR2,4000]
   2 - "ID"[NUMBER,22], "VAL"[VARCHAR2,4000]
   3 - "T".ROWID[ROWID,10], "ID"[NUMBER,22]
 
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1
---------------------------------------------------------------------------
 
   2 - SEL$1 / "T"#"SEL$1"
           - index_desc(t (t.id))
Instead of doing a SELECT and then an INSERT, which is two queries, you could combine the two into a single MERGE statement:
MERGE INTO rate r
USING dual
ON (r.regdate = '$apiReponseDate')
WHEN NOT MATCHED THEN
INSERT (col1, col2, col3, regdate)
VALUES ('value1', 'value2', 'value3', '$apiNewReponseDate');
This will prevent you having to use two round trips from the middle-tier to the database and do it all in a single query.
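As a further alternative for the "only check the recent rows" part (a hedged sketch, not from the original answers): since REGDATE is indexed, Oracle can answer a MAX() on it with an INDEX FULL SCAN (MIN/MAX), reading only a few index blocks, so you could fetch just the latest stored date and compare it in the application:

```sql
-- Reads only a handful of index blocks thanks to the MIN/MAX optimization,
-- regardless of table size.
select max(regdate) from rate;

-- Then compare the result with '$apiReponseDate' in the application,
-- and insert only when the API date is newer.
```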

Sequence of query execution

When there is a correlated query, what is the sequence of execution?
Ex:
select
p.productNo,
(
select count(distinct concat(bom.detailpart,bom.groupname))
from dl_MBOM bom
where bom.DetailPart=p.ProductNo
) cnt1
from dm_product p
The execution plan will vary by database vendor. For Oracle, here is a similar query and the corresponding execution plan.
select dname,
( select count( distinct job )
from emp e
where e.deptno = d.deptno
) x
from dept d
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 6 (100)| |
| 1 | SORT GROUP BY | | 1 | 11 | | |
|* 2 | TABLE ACCESS FULL| EMP | 5 | 55 | 2 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | DEPT | 4 | 52 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("E"."DEPTNO"=:B1)
While it seems likely that the DBMS reads record by record from dm_product and, for each such record, looks up the value in dl_MBOM, this doesn't necessarily happen.
With an SQL query you tell the DBMS mainly what to do, not how to do it. If the DBMS thinks it is better to build a join instead and work on that, it is free to do so.
Short answer: the sequence of execution is not determined. (In many DBMSs, however, you can look at the query's execution plan to see how it is executed.)
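In Oracle, for instance, you could inspect the plan of the original query yourself (a sketch reusing the question's table and column names):

```sql
explain plan for
select p.productNo,
       ( select count(distinct concat(bom.detailpart, bom.groupname))
         from dl_MBOM bom
         where bom.DetailPart = p.ProductNo ) cnt1
from dm_product p;

-- Show the plan the optimizer chose, e.g. whether the scalar
-- subquery survived or was unnested into a join:
select * from table(dbms_xplan.display);
```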

How to find a missing value from another table

Imagine if I had two tables like this:
Table Name: user
| id | user_id | password |
Table Name: permissions
| id | admin | write | delete | transfer |
And populated the tables with this:
Inserting into user table:
- 0, joshsh, asdf01
- 1, jakesh, asdf02
- 2, annsh, asdf03
- 3, lamsh, asdf04
Inserting into permissions table:
- 0, yes, yes, yes, yes
- 1, yes, yes, yes, yes
- 2, no, yes, yes, yes
And I didn't add the 4th row to the permissions table. How would I write a query to find which ids I forgot (in case it was a big database)?
There are many ways to get the same result, these are two possible ways:
With a NOT EXISTS:
SQL> select u.*
2 from user_ u
3 where not exists (
4 select 1
5 from permissions p
6 where u.id = p.id
7 );
ID USER_ID PASSWORD
---------- ---------- ----------
3 lamsh asdf04
Execution Plan
----------------------------------------------------------
Plan hash value: 3342498783
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 160 | 7 (15)| 00:00:01 |
|* 1 | HASH JOIN ANTI | | 4 | 160 | 7 (15)| 00:00:01 |
| 2 | TABLE ACCESS FULL| USER_ | 4 | 108 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| PERMISSIONS | 3 | 39 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("U"."ID"="P"."ID")
Note
-----
- dynamic sampling used for this statement (level=2)
And with an outer join:
SQL> select u.*
2 from user_ u
3 left outer join permissions p
4 on (u.id = p.id)
5 where p.id is null;
ID USER_ID PASSWORD
---------- ---------- ----------
3 lamsh asdf04
Execution Plan
----------------------------------------------------------
Plan hash value: 3342498783
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 160 | 7 (15)| 00:00:01 |
|* 1 | HASH JOIN ANTI | | 4 | 160 | 7 (15)| 00:00:01 |
| 2 | TABLE ACCESS FULL| USER_ | 4 | 108 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| PERMISSIONS | 3 | 39 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("U"."ID"="P"."ID")
Note
-----
- dynamic sampling used for this statement (level=2)
Notice that Oracle produced the same explain plan for both statements, considering that the test was made with two very small tables with no indexes and no statistics.
Also, consider that there are many ways to get the same result; performance strongly depends on your data, stats, indexes, ...
P.S. I used USER_ instead of USER, to avoid problems with the reserved word.
Assuming that the permissions and user tables join on the id column (and that id is the primary/unique key in both tables), here are a couple of solutions:
select id from user
minus
select id from permissions;
or
select * from user
where id not in (select id from permissions);
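One caveat worth adding to the NOT IN variant (not in the original answer): if the subquery can ever return a NULL, NOT IN yields no rows at all, whereas NOT EXISTS is unaffected:

```sql
-- If permissions.id were nullable and actually contained a NULL,
-- this would return zero rows, even when users are missing:
select * from user
where id not in (select id from permissions);

-- The NOT EXISTS form keeps working regardless of NULLs:
select * from user u
where not exists (select 1 from permissions p where p.id = u.id);
```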
You could also compare count(*) on both tables: if the counts differ, an alert can be raised (for instance by a scheduled job).
Or you can do this:
select *
from user u
full outer join permissions p
on u.id = p.id
where p.id is null;
The best way is to use a LEFT JOIN with an IS NULL check in the WHERE clause; it's elegant and efficient in every respect:
SELECT u.*
FROM user u
LEFT JOIN permissions p ON u.id = p.id
WHERE p.id IS NULL

Unwanted queries merge in Oracle 10g

I am working on Oracle Database 10g Release 10.2.0.5.0. I have view like:
CREATE OR REPLACE VIEW some_view
(
A,
B
)
AS
SELECT A, B
FROM table_a
WHERE condition_a
UNION ALL
SELECT A, B
FROM table_b
WHERE condition_b;
and some database function some_db_package.foo(). My problem is that when I execute query:
SELECT A, some_db_package.foo(B) val
FROM some_view
WHERE some_db_package.foo(B) = 0;
Oracle is merging conditions from query and some_view, so I am getting something like:
SELECT A, some_db_package.foo(B) val
FROM table_a
WHERE some_db_package.foo(B) = 0 AND condition_a
UNION ALL
SELECT A, some_db_package.foo(B) val
FROM table_b
WHERE some_db_package.foo(B) = 0 AND condition_b;
some_db_package.foo() executes on all rows from table_a and table_b and I would like to execute some_db_package.foo() only on filtered (by condition_a and condition_b) rows. Is there any way to do that (i.e. by changing sql query or some_view definition) assuming that I can not use optimizer hints in query?
Problem solved. Just to summarize:
some_db_package.foo() - for a given event and date range it counts the event's errors which occurred between the dates (foo() accesses tables), so it is deterministic only when sysdate > dateTo.
select * from ( SELECT A, some_db_package.foo(B) val FROM some_view ) does not make a difference.
Actually I do not need UNION ALL, and I did test with UNION, but still the same result.
with some_view_set as (select A, B from some_view) select * from ( select A, some_db_package.foo(B) val from some_view_set ) where val = 0 does not make a difference either.
I did test with optimizer hints, and unfortunately Oracle ignored them.
Using ROWNUM >= 1 in some_view was the solution for my problem.
Thank you for help, I really appreciate it.
ROWNUM is usually the best way to stop optimizer transformations. Hints are difficult to get right: the syntax is weird and buggy, and there are many potential transformations that need to be stopped. There are other ways to rewrite the query, but ROWNUM is generally the best, because it is documented to work this way. Since ROWNUM has to be evaluated last in order to support Top-N queries, you can always rely on it to prevent query blocks from being merged.
Sample schema
drop table table_a;
drop table table_b;
create table table_a(a number, b number);
create table table_b(a number, b number);
insert into table_a select level, level from dual connect by level <= 10;
insert into table_b select level, level from dual connect by level <= 10;
begin
dbms_stats.gather_table_stats(user, 'table_a');
dbms_stats.gather_table_stats(user, 'table_b');
end;
/
--FOO takes 1 second each time it is executed.
create or replace function foo(p_value number) return number is
begin
dbms_lock.sleep(1);
return 0;
end;
/
--BAR is fast, but the optimizer doesn't know it.
create or replace function bar(p_value number) return number is
begin
return p_value;
end;
/
--This view returns 2 rows.
CREATE OR REPLACE VIEW some_view AS
SELECT A, B
FROM table_a
WHERE a = bar(1)
UNION ALL
SELECT A, B
FROM table_b
WHERE a = bar(2);
Slow query
This query takes 20 seconds to run, implying the function is evaluated 20 times.
SELECT A, foo(B) val
FROM some_view
WHERE foo(B) = 0;
The explain plan shows the conditions are merged, and it appears that the conditions are evaluated from left to right (but don't rely on this always being true!).
explain plan for
SELECT A, foo(B) val
FROM some_view
WHERE foo(B) = 0;
select * from table(dbms_xplan.display);
Plan hash value: 4139878329
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 5 (0)| 00:00:01 |
| 1 | VIEW | SOME_VIEW | 1 | 6 | 5 (0)| 00:00:01 |
| 2 | UNION-ALL | | | | | |
|* 3 | TABLE ACCESS FULL| TABLE_A | 1 | 6 | 3 (0)| 00:00:01 |
|* 4 | TABLE ACCESS FULL| TABLE_B | 1 | 6 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("FOO"("B")=0 AND "A"="BAR"(1))
4 - filter("FOO"("B")=0 AND "A"="BAR"(2))
Note
-----
- automatic DOP: skipped because of IO calibrate statistics are missing
Fast query
Add a seemingly redundant ROWNUM predicate that does nothing except prevent transformations.
CREATE OR REPLACE VIEW some_view2 AS
SELECT A, B
FROM table_a
WHERE a = bar(1)
AND ROWNUM >= 1 --Prevent optimizer transformations, for performance.
UNION ALL
SELECT A, B
FROM table_b
WHERE a = bar(2)
AND ROWNUM >= 1 --Prevent optimizer transformations, for performance.
;
Now the query takes only 4 seconds; the function is run only 4 times.
SELECT A, foo(B) val
FROM some_view2
WHERE foo(B) = 0;
In the new explain plan it's clear that the FOO function is evaluated last, after most of the filtering is complete.
explain plan for
SELECT A, foo(B) val
FROM some_view2
WHERE foo(B) = 0;
select * from table(dbms_xplan.display);
Plan hash value: 4228269064
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 52 | 6 (0)| 00:00:01 |
|* 1 | VIEW | SOME_VIEW2 | 2 | 52 | 6 (0)| 00:00:01 |
| 2 | UNION-ALL | | | | | |
| 3 | COUNT | | | | | |
|* 4 | FILTER | | | | | |
|* 5 | TABLE ACCESS FULL| TABLE_A | 1 | 6 | 3 (0)| 00:00:01 |
| 6 | COUNT | | | | | |
|* 7 | FILTER | | | | | |
|* 8 | TABLE ACCESS FULL| TABLE_B | 1 | 6 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("FOO"("B")=0)
4 - filter(ROWNUM>=1)
5 - filter("A"="BAR"(1))
7 - filter(ROWNUM>=1)
8 - filter("A"="BAR"(2))
Note
-----
- automatic DOP: skipped because of IO calibrate statistics are missing
Ben's idea to make the function DETERMINISTIC may also help reduce the function calls.
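For illustration, here is that idea applied to the FOO function from the test schema above (a sketch; whether Oracle actually caches depends on version and call context):

```sql
-- Declaring the function DETERMINISTIC tells Oracle the result depends
-- only on the inputs, allowing it to cache results per distinct input
-- value within a fetch instead of calling the function for every row.
create or replace function foo(p_value number) return number
  deterministic
is
begin
  dbms_lock.sleep(1);
  return 0;
end;
/
```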

Optimizing sql query with subselect list in clause

I'm using Oracle 11g and trying to optimize a query.
The basic structure of the query is:
SELECT val1, val2, val3,
FROM
table_name
WHERE
val1 in (subselect statement is here, it selects a list of possible values for
val1 from another table)
and val5>=X and val5<=Y
group by val1
order by val2 desc;
My issue is that when I use a subselect, the cost is 3130.
If I fill in the results of the subselect by hand - so, for example
field1 in (1, 2, 3, 4, 5, 6)
where (1, 2, 3, 4, 5, 6) are the results of the subselect (in this case, all possible values of field1), the cost of the query is 14, and Oracle uses an "inlist iterator" for the group by part of the query. The results of the two queries are identical.
My question is how to mimic the behaviour of manually listing the possible values of field1 with a subselect statement. The reason I don't list those values in the query is that the possible values change based on one of the other fields, so the subselect is pulling the possible values of field1 from a 2nd table based on, say, field2.
I have an index on val1, val5, so it isn't doing any full table scans; it does a range scan in both cases, but in the subselect case the range scan is much more expensive. The range scan isn't the most expensive part of the subselect query, though; the most expensive part is the group by, which is a HASH.
Edit - Yes, the query isn't syntactically correct - I didn't want to put up anything too specific. The actual query is fine - the selects use valid group by functions.
The subselect returns 6 values, but it can be anywhere from 1-50 or so based on the other value.
Edit2 - What I ended up doing was 2 separate queries so I could generate the list used in the subselect. I actually tried a similar test in SQLite, and it does the same thing, so this isn't just Oracle.
What you are seeing is a result of the IN () list being subject to histogram-based estimates (and bind variable peeking). When you have histograms and write a query like "where a = 'a'", Oracle will use the histogram to estimate how many rows will be returned (same idea with an inlist operator, which iterates over each item and aggregates the rows). With no histograms, it will make a guess of the form rows/distinct values.
In a subquery Oracle doesn't do this (in most cases; there is one unique case where it does).
for example:
SQL> create table test
2 (val1 number, val2 varchar2(20), val3 number);
Table created.
Elapsed: 00:00:00.02
SQL>
SQL> insert into test select 1, 'aaaaaaaaaa', mod(rownum, 5) from dual connect by level <= 100;
100 rows created.
Elapsed: 00:00:00.01
SQL> insert into test select 2, 'aaaaaaaaaa', mod(rownum, 5) from dual connect by level <= 1000;
1000 rows created.
Elapsed: 00:00:00.02
SQL> insert into test select 3, 'aaaaaaaaaa', mod(rownum, 5) from dual connect by level <= 100;
100 rows created.
Elapsed: 00:00:00.00
SQL> insert into test select 4, 'aaaaaaaaaa', mod(rownum, 5) from dual connect by level <= 100000;
100000 rows created.
So I have a table with 101,200 rows. For VAL1, 100 are "1", 1,000 are "2", 100 are "3" and 100,000 are "4".
Now, if histograms are gathered (and we do want them in this case):
SQL> exec dbms_stats.gather_table_stats(user , 'test', degree=>4, method_opt=>'for all indexed columns size 4', estimate_percent=>100);
SQL> exec dbms_stats.gather_table_stats(user , 'lookup', degree=>4, method_opt =>'for all indexed columns size 3', estimate_percent=>100);
we see the following:
SQL> explain plan for select * from test where val1 in (1, 2, 3) ;
Explained.
SQL> #explain ""
Plan hash value: 3165434153
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1200 | 19200 | 23 (0)| 00:00:01 |
| 1 | INLIST ITERATOR | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| TEST | 1200 | 19200 | 23 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | TEST1 | 1200 | | 4 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
vs
SQL> explain plan for select * from test where val1 in (select id from lookup where str = 'A') ;
Explained.
SQL> #explain ""
Plan hash value: 441162525
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25300 | 518K| 106 (3)| 00:00:02 |
| 1 | NESTED LOOPS | | 25300 | 518K| 106 (3)| 00:00:02 |
| 2 | TABLE ACCESS BY INDEX ROWID| LOOKUP | 1 | 5 | 1 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | LOOKUP1 | 1 | | 0 (0)| 00:00:01 |
|* 4 | TABLE ACCESS FULL | TEST | 25300 | 395K| 105 (3)| 00:00:02 |
----------------------------------------------------------------------------------------
where lookup table is
SQL> select * From lookup;
ID STR
---------- ----------
1 A
2 B
3 C
4 D
(str is unique indexed and has histograms).
Notice a spot-on cardinality of 1200 for the inlist and a good plan, but a wildly inaccurate one for the subquery? Oracle hasn't applied histograms to the join condition; instead it has said "look, I don't know what id will be, so I'll guess total rows (100k+1000+100+100) / distinct values (4) = 25300" and used that. As such, it has picked a full table scan.
That's all great, but how do we fix it? If you know that this subquery will match a small number of rows (we do), then you have to hint the outer query to have it use an index, like:
SQL> explain plan for select /*+ index(t) */ * from test t where val1 in (select id from lookup where str = 'A') ;
Explained.
SQL> #explain
Plan hash value: 702117913
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25300 | 518K| 456 (1)| 00:00:06 |
| 1 | NESTED LOOPS | | 25300 | 518K| 456 (1)| 00:00:06 |
| 2 | TABLE ACCESS BY INDEX ROWID| LOOKUP | 1 | 5 | 1 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | LOOKUP1 | 1 | | 0 (0)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| TEST | 25300 | 395K| 455 (1)| 00:00:06 |
|* 5 | INDEX RANGE SCAN | TEST1 | 25300 | | 61 (2)| 00:00:01 |
----------------------------------------------------------------------------------------
Another trick applies in my particular case, as val1=4 is most of the table. Let's say I have my standard query:
select * from test t where val1 in (select id from lookup where str = :B1);
For the possible :B1 inputs, if I know that the valid values passed in are A, B and C (i.e. not D, which maps to id=4), I can add this trick:
SQL> explain plan for select * from test t where val1 in (select id from lookup where str = :b1 and id in (1, 2, 3)) ;
Explained.
SQL> #explain ""
Plan hash value: 771376936
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 250 | 5250 | 24 (5)| 00:00:01 |
|* 1 | HASH JOIN | | 250 | 5250 | 24 (5)| 00:00:01 |
|* 2 | VIEW | index$_join$_002 | 1 | 5 | 1 (100)| 00:00:01 |
|* 3 | HASH JOIN | | | | | |
|* 4 | INDEX RANGE SCAN | LOOKUP1 | 1 | 5 | 0 (0)| 00:00:01 |
| 5 | INLIST ITERATOR | | | | | |
|* 6 | INDEX UNIQUE SCAN | SYS_C002917051 | 1 | 5 | 0 (0)| 00:00:01 |
| 7 | INLIST ITERATOR | | | | | |
| 8 | TABLE ACCESS BY INDEX ROWID| TEST | 1200 | 19200 | 23 (0)| 00:00:01 |
|* 9 | INDEX RANGE SCAN | TEST1 | 1200 | | 4 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Now notice Oracle has got a reasonable cardinality: it has pushed the 1, 2, 3 onto the TEST table and got 1200. Not 100% accurate, as I was only filtering on one of them, but I've told Oracle: certainly not 4!
I have done some research and I think everything is explained in the Oracle docs.
Just look at "How the CBO Evaluates IN-List Iterators"
and compare it to "How the CBO Evaluates the IN Operator".
Your query with "field1 in (1, 2, 3, 4, 5, 6)" matches the first case, but the query with the subselect is rewritten by Oracle.
So every query with a subselect or join will have a cost similar to yours, unless you find a very tricky way to pass the subquery's return values in as parameters.
You can always try giving more memory to sorts.
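One such "tricky way" (a hedged sketch reusing the lookup/test tables from the answer above; the OP's Edit2 describes doing essentially this with two separate queries) is to resolve the subselect first and splice the resulting literal list into the main query, so the optimizer sees an inlist it can estimate with histograms:

```sql
declare
  l_ids varchar2(4000);
  l_cur sys_refcursor;
begin
  -- Step 1: run the subselect on its own
  -- (LISTAGG requires 11g Release 2 or later)
  select listagg(id, ',') within group (order by id)
    into l_ids
    from lookup
   where str = 'A';

  -- Step 2: build the main query with a literal IN-list,
  -- which the optimizer handles as an inlist iterator
  open l_cur for
    'select * from test where val1 in (' || l_ids || ')';
  -- fetch from l_cur as needed
end;
/
```

Note that concatenating values into SQL is only safe here because the ids come from your own table, not from user input.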
You might be able to fix the statement by adding indexes on the subselect. However, you would have to post the query and execution plan to understand that. By the way, how long does the subselect itself take?
You can try one of the following two versions:
select val1, val2, val3
from table_name join
(select distinct val from (subselect here)) t
on table_name.val1 = t.val
where val5>=X and val5<=Y
group by val1, val2, val3
order by val2 desc;
or:
select val1, val2, val3
from table_name
where val5>=X and val5<=Y and
exists (select 1 from (subselect here) t where t.val = table_name.val1)
group by val1, val2, val3
order by val2 desc;
These are semantically equivalent, and one of them might optimize better.
One other possibility that might work is to do the filtering after the group by. Something like:
select t.*
from (select val1, val2, val3
from table_name
where val5>=X and val5<=Y
group by val1, val2, val3
) t
where val1 in (subselect here)
order by val2 desc;