Why won't Oracle use my index unless I tell it to? - sql

I have an index:
CREATE INDEX BLAH ON EMPLOYEE(SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4));
and an SQL STATEMENT:
SELECT COUNT(*)
FROM (SELECT COUNT(*)
FROM EMPLOYEE
GROUP BY SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4)
HAVING COUNT(*) > 100);
but it keeps doing a full table scan instead of using the index unless I add a hint.
EMPSHIRTNO is not the primary key, EMPNO is (which isn't used here).
Complex query
EXPLAIN PLAN FOR SELECT COUNT(*) FROM (SELECT COUNT(*) FROM EMPLOYEE
GROUP BY SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4)
HAVING COUNT(*) > 100);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1712471557
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 24 (9)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | | |
| 2 | VIEW | | 497 | | 24 (9)| 00:00:01 |
|* 3 | FILTER | | | | | |
----------------------------------------------------------------------------------
| 4 | HASH GROUP BY | | 497 | 2485 | 24 (9)| 00:00:01 |
| 5 | TABLE ACCESS FULL| EMPLOYEE | 9998 | 49990 | 22 (0)| 00:00:01||
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(COUNT(*)>100)
17 rows selected.
ANALYZE INDEX BLAH VALIDATE STRUCTURE;
SELECT BTREE_SPACE, USED_SPACE FROM INDEX_STATS;
BTREE_SPACE USED_SPACE
----------- ----------
176032 150274
Simple query:
EXPLAIN PLAN FOR SELECT * FROM EMPLOYEE;
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2913724801
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9998 | 439K| 23 (5)| 00:00:01 |
| 1 | TABLE ACCESS FULL| EMPLOYEE | 9998 | 439K| 23 (5)| 00:00:01 |
------------------------------------------------------------------------------
8 rows selected.
Maybe it is because the NOT NULL constraint is enforced via a CHECK constraint rather than being defined originally in the table creation statement? It will use the index when I do:
SELECT * FROM EMPLOYEE WHERE SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4) = '1234';
For those suggesting that it needs to read all of the rows anyway (which I don't think it does as it is counting), the index is not used on this either:
SELECT SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4) FROM EMPLOYEE;
In fact, putting an index on EMPSHIRTNO and performing SELECT EMPSHIRTNO FROM EMPLOYEE; does not use the index either. I should point out that EMPSHIRTNO is not unique, there are duplicates in the table.

Because of the nature of your query it needs to scan every row of the table anyway. So oracle is probably deciding that a full table scan is the most efficient way to do this. Because its using a HASH GROUP BY there is no nasty sort at the end like in oracle 7 days.
First get the count per SUBSTR(...) of shirt no. Its thus first part of the query which has to scan the entire table
SELECT COUNT(*)
FROM EMPLOYEE
GROUP BY SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4)
Next you want to discard the SUBSTR(...) where the count is <= 100. Oracle needs to scan all rows to verify this. Technically you could argue that once it has 101 it doesn't need any more, but I don't think Oracle can work this out, especially as you are asking it what the total numer is in the SELECT COUNT(*) of the subquery.
HAVING COUNT(*) > 100);
So basically to give you the answer you want Oracle needs to scan every row in the table, so an index is no help on filtering. Because its using a hash group by, the index is no help on the grouping either. So to use the index would just slow your query down, which is why Oracle is not using it.

I think you may need to build a function-based index on SUBSTR(TO_CHAR(EMPSHIRTNO), 1,4); Functions in your SQL have a tendency to invalidate regular indexes on a column.

I believe #Codo is correct. Oracle cannot determine that the expression will always be non-null, and then must assume that some nulls may not
be stored in the index.
(It seems like Oracle should be able to figure out that the expression is not nullable. In general, the chance of any random SUBSTR expression always being
not null is probably very low, maybe Oracle just lumps all SUBSTR expressions together?)
You can make the index usable for your query with one of these work-arounds:
--bitmap index:
create bitmap index blah on employee(substr(to_char(empshirtno), 1, 4));
--multi-column index:
alter table employee add constraint blah primary key (id, empshirtno);
--indexed virtual column:
create table employee(empshirtno varchar2(10) not null
,empshirtno_for_index as (substr(empshirtno,1,4)) not null );
create index blah on employee(empshirtno_for_index);

Related

Group By not using index

There is a table which has trades and its row count is 220 million, one of column is counterparty. The column is indexed. If I run a normal query like:
select *
from <table>
where counterparty = 'X'
The plan shows it uses index. Where as if I use group by on same column, it doesn't use index and does table scan. i.e.: for below query:
select counterparty, count(*)
from <table>
group by counterparty
Could you please advise, why it's not using the index for group by? FYI - I have already run the db stats.
FYI - the plan for 1st and second query is shown below:
Note - we are migrating data from Sybase to oracle, when I use same group by in Sybase with same indexes. The query uses indexes, but not in oracle.
First
Plan hash value: 350128866
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 2209 | 1469K| 914 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| FXCASHTRADE | 2209 | 1469K| 914 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | SCB_FXCASHTRADE_002 | 2209 | | 11 (0)| 00:00:01 |
Predicate Information (identified by operation id):
2 - access("COUNTERPARTY"='test')
Second
> Plan hash value: 2920872612
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 100K| 2151K| | 6558K (1)| 00:00:38 |
| 1 | HASH GROUP BY | | 100K| 2151K| 6780M| 6558K (1)| 00:00:38 |
| 2 | TABLE ACCESS FULL| FXCASHTRADE | 221M| 4643M| | 6034K (1)| 00:00:35 |
I am going to make an educated guess and say that counterparty is defined as a nullable column. As such, Oracle can't solely rely on the index to generate the results of your group by query, since null values need to be included in the results, but (Oracle) indexes don't include null values. With that in mind, a full table scan makes sense.
If there is no good reason for counterparty to be nullable, go ahead and make it not null. The execution plan should then change to use the index as expected.
Alternatively, if you can't make that change, but you don't care about null values for this particular query, you can tweak the query to filter our null values explicitly. This should also result in a better execution plan.
select counterparty, count(*)
from tbl
where counterparty is not null -- add this filter
group by counterparty
Note: I'm no Sybase expert, but I assume that indexes include null values. Oracle indexes do not include null values. That would explain the difference in execution plan between both databases.

Oracle index for a static like clause

I want to index this query clause -- note that the text is static.
SELECT * FROM tbl where flags LIKE '%current_step: complete%'
To re-iterate, the current_step: complete never changes in the query. I want to build an index that will effectively pre-calculate this boolean value, thereby preventing full table scans...
I would prefer not to add a boolean column to store the pre-calculated value as this would necessitate code changes in the application....
If you don't want to change the query, and it isn't just an issue of nor changing the data maintenance (in which case a virtual column and/or index would do the job), you could use a materialised view that applies the filter, and let query rewrite take case of using that instead of the real table. Which may well be overkill but is an option.
The original plan for a mocked-up version:
explain plan for
SELECT * FROM tbl where flags LIKE '%current_step: complete%';
select * from table(dbms_xplan.display);
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 60 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TBL | 2 | 60 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("FLAGS" IS NOT NULL AND "FLAGS" LIKE '%current_step:
complete%')
A materialised view that will only hold the records your query is interested in (this is a simple example but you'd need to decide how to refresh and add a log if needed):
create materialized view mvw
enable query rewrite as
SELECT * FROM tbl where flags LIKE '%current_step: complete%';
Now your query hits the materialised view, thanks to query rewrite:
explain plan for
SELECT * FROM tbl where flags LIKE '%current_step: complete%';
select * from table(dbms_xplan.display);
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 60 | 3 (0)| 00:00:01 |
| 1 | MAT_VIEW REWRITE ACCESS FULL| MVW | 2 | 60 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
But any other query will still use the original table:
explain plan for
SELECT * FROM tbl where flags LIKE '%current_step: working%';
select * from table(dbms_xplan.display);
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 27 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TBL | 1 | 27 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("FLAGS" LIKE '%current_step: success%' AND "FLAGS" IS NOT
NULL)
Of course a virtual index would be simpler if you are allowed to modify the query...
A full text search index might be what you are looking for.
There are a few ways you can implement this:
Oracle has Oracle Text where you can define which type of full text index you want.
Lucene is a Java full text search framework.
Solr is a server product that provides full text search.
I would prefer not to add a boolean column to store the pre-calculated value as this would necessitate code changes in the application
There are two ways I can suggest :
1.
If you are on 11g and up, you could have a VIRTUAL COLUMN always generated as 1 when the value is complete else 0. All you need to do then :
select * from table where virtual_column = 1
To improve performance, you could have an index over it, which is equivalent to a function-based index.
2.
Update : Perhaps, I should be more clear with my second point : source
There are instances where Oracle will use an index to resolve a like with the pattern of '%text%'. If the query can be resolved without having to go back to the table (rowid lookup), the index may be chosen. Example:
select distinct first_nm from person where first_nm like '%EV%';
And in above case, Oracle will do an index fast full scan - a full scan of the smaller index.

Want to process 5000 records from the select query is taking long time in oracle database

Each time i want to process 5000 records like below.
First time i want to process records from 1 to 5000 rows.
second time i want to process records from 5001 to 10000 rows.
third time i want to process records from 10001 to 15001 rows like wise
I dont want to go for procedure or PL/SQL. I will change the rnum values in my code to fetch the 5000 records.
The given query is taking 3 minutes to fetch the records from 3 joined tables. How can i reduced the time to fetch the records.
select * from (
SELECT to_number(AA.MARK_ID) as MARK_ID, AA.SUPP_ID as supplier_id, CC.supp_nm as SUPPLIER_NAME, CC.supp_typ as supplier_type,
CC.supp_lock_typ as supplier_lock_type, ROW_NUMBER() OVER (ORDER BY AA.MARK_ID) as rnum
from TABLE_A AA, TABLE_B BB, TABLE_C CC
WHERE
AA.MARK_ID=BB.MARK_ID AND
AA.SUPP_ID=CC.location_id AND
AA.char_id='160' AND
BB.VALUE_KEY=AA.VALUE_KEY AND
BB.VALUE_KEY=CC.VALUE_KEY
AND AA.VPR_ID IS NOT NULL)
where rnum >=10001 and rnum<=15000;
I have tried below scenario but no luck.
I have tried the /*+ USE_NL(AA BB) */ hints.
I used exists in the where conditions. but its taking the same 3 minutes to fetch the records.
Below is the table details.
select count(*) from TABLE_B;
-----------------
2275
select count(*) from TABLE_A;
-----------------
2405276
select count(*) from TABLE_C;
-----------------
1269767
Result of my inner query total records is
SELECT count(*)
from TABLE_A AA, TABLE_B BB, TABLE_C CC
WHERE
AA.MARK_ID=BB.MARK_ID AND
AA.SUPP_ID=CC.location_id AND
AA.char_id='160' AND
BB.VALUE_KEY=AA.VALUE_KEY AND
BB.VALUE_KEY=CC.VALUE_KEY
AND AA.VPR_ID IS NOT NULL;
-----------------
2027055
All the used columns in where conditions are indexed properly.
Explain Table for the given query is...
Plan hash value: 3726328503
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2082K| 182M| | 85175 (1)| 00:17:03 |
|* 1 | VIEW | | 2082K| 182M| | 85175 (1)| 00:17:03 |
|* 2 | WINDOW SORT PUSHED RANK | | 2082K| 166M| 200M| 85175 (1)| 00:17:03 |
|* 3 | HASH JOIN | | 2082K| 166M| | 44550 (1)| 00:08:55 |
| 4 | TABLE ACCESS FULL | TABLE_C | 1640 | 49200 | | 22 (0)| 00:00:01 |
|* 5 | HASH JOIN | | 2082K| 107M| 27M| 44516 (1)| 00:08:55 |
|* 6 | VIEW | index$_join$_005 | 1274K| 13M| | 9790 (1)| 00:01:58 |
|* 7 | HASH JOIN | | | | | | |
| 8 | INLIST ITERATOR | | | | | | |
|* 9 | INDEX RANGE SCAN | TABLE_B_IN2 | 1274K| 13M| | 2371 (2)| 00:00:29 |
| 10 | INDEX FAST FULL SCAN| TABLE_B_IU1 | 1274K| 13M| | 4801 (1)| 00:00:58 |
|* 11 | TABLE ACCESS FULL | TABLE_A | 2356K| 96M| | 27174 (1)| 00:05:27 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RNUM">=10001 AND "RNUM"<=15000)
2 - filter(ROW_NUMBER() OVER ( ORDER BY "A"."MARK_ID")<=15000)
3 - access("A"."SUPP_ID"="C"."LOC_ID" AND "A"."VALUE_KEY"="C"."VALUE_KEY")
5 - access("A"."MARK_ID"="A"."MARK_ID" AND "A"."VALUE_KEY"="A"."VALUE_KEY")
6 - filter("A"."MARK_CHN_IND"='C' OR "A"."MARK_CHN_IND"='D')
7 - access(ROWID=ROWID)
9 - access("A"."MARK_CHN_IND"='C' OR "A"."MARK_CHN_IND"='D')
11 - filter("A"."CHNL_ID"=160 AND "A"."VPR_ID" IS NOT NULL)
Could you please anyone help me on this to tune this query as i am trying from last 2 days?
Each query will take a long time because each query will have to join then sort all rows. The row_number analytic function can only return a result if the whole set has been read. This is highly inefficient. If the data set is large, you only want to sort and hash-join once.
You should fetch the whole set once, using batches of 5k rows. Alternatively, if you want to keep your existing code logic, you could store the result in a temporary table, for instance:
CREATE TABLE TMP AS <your above query>
CREATE INDEX ON TMP (rnum)
And then replace your query in your code by
SELECT * FROM TMP WHERE rnum BETWEEN :x AND :y
Obviously if your temp table is being reused periodically, just create it once and delete when done (or use a true temporary table).
How many unique MARK_ID values have you got in TABLE_A? I think you may get better performance if you limit the fetched ranges of records by MARK_ID instead of the artificial row number, because the latter is obviously not sargeable. Granted, you may not get exactly 5000 rows in each range but I have a feeling it's not as important as the query performance.
Firstly, giving obfuscated table names makes it nearly impossible to deduce anything about the data distributions and relationships between tables, so potential answerers are crippled from the start.
However, if every row in table_a matches one row in the other tables then you can avoid some of the usage of 200Mb of temporary disk space that is probably crippling performance by pushing the ranking down into an inline view or common table expression.
Monitor V$SQL_WORKAREA to check the exact amount of space being used for the window function, and if it is still excessive consider modifying the memory management to increase available sort area size.
Something like:
with cte_table_a as (
SELECT
to_number(MARK_ID) as MARK_ID,
SUPP_ID as supplier_id,
ROW_NUMBER() OVER (ORDER BY MARK_ID) as rnum
from
TABLE_A
where
char_id='160' and
VPR_ID IS NOT NULL)
select ...
from
cte_table_a aa,
TABLE_B BB,
TABLE_C CC
WHERE
aa.rnum >= 10001 and
aa.rnum <= 15000 and
AA.MARK_ID = BB.MARK_ID AND
AA.SUPP_ID = CC.location_id AND
BB.VALUE_KEY = AA.VALUE_KEY AND
BB.VALUE_KEY = CC.VALUE_KEY

Distinct values on indexed column

I have a table with 115 M rows. One of the column is indexed (index called "my_index" on explain plan below) and not nullable. Moreover, this column has just one distinct value so far.
When I do
select distinct my_col from my_table;
, it takes 230 seconds which is very long. Here is the explain plan.
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 1 | 3 | 22064 (90)| 00:03:23 |
| 1 | SORT UNIQUE NOSORT| | 1 | 3 | 22064 (90)| 00:03:23 |
| 2 | INDEX FULL SCAN | my_index | 115M| 331M| 2363 (2)| 00:00:22 |
Since the column has just one distinct value, why does it take so long ? Why Oracle does not just check index entries and fastly find that there is just one possible value for this column ? On the explain plan above, the index scanning seems to take 22 s but what is this "SORT UNIQUE NOSORT" which takes ages ?
Thank you in advance for your help
Re analyse the table.
EXEC dbms_stats.gather_table_stats('owner','table_name',cascade=>true,method_opt=>'FOR ALL INDEXED COLUMNS SIZE ');
Change Index Type
One distinct value out of 115M rows??!! That's what called as low cardinality, not so good for the 'normal' B-Tree index Consider a bitmapped index. (If at all you have B-tree)
Reconstructing Query
If you are sure that no new values will be added to this column then please remove the distinct clause and rather use as Abhijith said.
SORT UNIQUE NOSORT is not taking too long. You are looking at the estimates from a bad execution plan that is probably the result of unreasonable optimizer parameters. For example, setting the parameter OPTIMIZER_INDEX_COST_ADJ to 1 instead of the default 100 can produce a similar plan. Most likely your query runs slowly because your database is busy or just slow.
What's wrong with the posted execution plan?
The posted execution plan seems unreasonable. Retrieving data should take much longer than simply throwing out duplicates. And the consumer operation, SORT UNIQUE NOSORT, can start at almost the same time as the producer operation, INDEX FULL SCAN. Normally they should finish at almost the same time. The execution plan in the question shows the optimizer estimates. The screenshot below of an active report shows the actual timelines for a very similar query. All steps are starting and stopping at almost the same time.
Sample setup with reasonable plan
Below is a very similar setup, but with a very plain configuration. Same number of rows read (115 million) and returned (1), and almost the exact same segment size (329MB vs 331 MB). The plan shows almost all of the time being spent on the INDEX FULL SCAN.
drop table test1 purge;
create table test1(a number not null, b number, c number) nologging;
begin
for i in 1 .. 115 loop
insert /*+ append */ into test1 select 1, level, level
from dual connect by level <= 1000000;
commit;
end loop;
end;
/
create index test1_idx on test1(a);
begin
dbms_stats.gather_table_stats(user, 'TEST1');
end;
/
explain plan for select /*+ index(test1) */ distinct a from test1;
select * from table(dbms_xplan.display);
Plan hash value: 77032494
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 244K (4)| 00:48:50 |
| 1 | SORT UNIQUE NOSORT| | 1 | 3 | 244K (4)| 00:48:50 |
| 2 | INDEX FULL SCAN | TEST1_IDX | 115M| 329M| 237K (1)| 00:47:30 |
--------------------------------------------------------------------------------
Re-creating a bad plan
--Set optimizer_index_cost_adj to a ridiculously low value.
--This changes the INDEX FULL SCAN estimate from 47 minutes to 29 seconds.
alter session set optimizer_index_cost_adj = 1;
--Changing the CPUSPEEDNW to 800 will exactly re-create the time estimate
--for SORT UNIQUE NOSORT. This value is not ridiculous, and it is not
--something you should normally change. But it does imply your CPUs are
--slow. My 2+ year-old desktop had an original score of 1720.
begin
dbms_stats.set_system_stats( 'CPUSPEEDNW', 800);
end;
/
explain plan for select /*+ index(test1) */ distinct a from test1;
select * from table(dbms_xplan.display);
Plan hash value: 77032494
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 16842 (86)| 00:03:23 |
| 1 | SORT UNIQUE NOSORT| | 1 | 3 | 16842 (86)| 00:03:23 |
| 2 | INDEX FULL SCAN | TEST1_IDX | 115M| 329M| 2389 (2)| 00:00:29 |
--------------------------------------------------------------------------------
How to investigate
Check the parameters.
select name, value from v$parameter where name like 'optimizer_index%'
NAME VALUE
---- -----
optimizer_index_cost_adj 1
optimizer_index_caching 0
Also check the system statistics.
select * from sys.aux_stats$;
+---------------+------------+-------+------------------+
| SNAME | PNAME | PVAL1 | PVAL2 |
+---------------+------------+-------+------------------+
| SYSSTATS_INFO | STATUS | | COMPLETED |
| SYSSTATS_INFO | DSTART | | 09-23-2013 17:52 |
| SYSSTATS_INFO | DSTOP | | 09-23-2013 17:52 |
| SYSSTATS_INFO | FLAGS | 1 | |
| SYSSTATS_MAIN | CPUSPEEDNW | 800 | |
| SYSSTATS_MAIN | IOSEEKTIM | 10 | |
| SYSSTATS_MAIN | IOTFRSPEED | 4096 | |
| SYSSTATS_MAIN | SREADTIM | | |
| SYSSTATS_MAIN | MREADTIM | | |
| SYSSTATS_MAIN | CPUSPEED | | |
| SYSSTATS_MAIN | MBRC | | |
| SYSSTATS_MAIN | MAXTHR | | |
| SYSSTATS_MAIN | SLAVETHR | | |
+---------------+------------+-------+------------------+
To find out where the time is really spent, use a tool like the active report.
select dbms_sqltune.report_sql_monitor(sql_id => '5s63uf4au6hcm',
type => 'active') from dual;
If there are only a few distinct values of the column, try a compressed index:
create index my_index on my_table (my_col) compress;
This will store each distinct value of the column only once, hopefully reducing the execution time of your query.
As a bonus: use this to see the actual plan used for a query:
select /*+ gather_plan_statistics */ distinct my_col from my_table;
SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR);
The gather_plan_statistics hint will collect more data (it will take longer to execute), but it works without it too. See the documentation of DBMS_XPLAN.DISPLAY_CURSOR for more details.
See the explain plan carefully.
It scans the whole index to know what you are trying to fetch
Then applies distinct function (try to retrieve the unique values). Though you say there is only one unique value, it has to scan the whole index to get the values. Oracle does not know that there is only one distinct value in the index. You can restrict the rownum = 1 to get the quick answer.
Try this to get the quick answer
select my_col from my_table where rownum = 1;
It is highly unfavourable to add an index on a column which has very less distribution. This is bad for the table and overall for the application as well. This just does not make any sense

SQL Query going for Full Table scan instead of Index Based Scan

I have two tables:
create table big( id number, name varchar2(100));
insert into big(id, name) select rownum, object_name from all_objects;
create table small as select id from big where rownum < 10;
create index big_index on big(id);
On these tables if I execute the following query:
select *
from big_table
where id like '45%'
or id in ( select id from small_table);
it always goes for a Full Table Scan.
Execution Plan
----------------------------------------------------------
Plan hash value: 2290496975
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3737 | 97162 | 85 (3)| 00:00:02 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL| BIG | 74718 | 1897K| 85 (3)| 00:00:02 |
|* 3 | TABLE ACCESS FULL| SMALL | 1 | 4 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("ID"=45 OR EXISTS (SELECT /*+ */ 0 FROM "SMALL" "SMALL"
WHERE "ID"=:B1))
3 - filter("ID"=:B1)
Are there any ways in which we can rewrite the Query So that it always goes for index Scan.
No, no and no.
You do NOT want it to use an index. Luckily Oracle is smarter than that.
ID is numeric. While it might have ID values of 45,450,451,452,4501,45004,4500003 etc, in the indexes these values will be scattered anywhere and everywhere. If you went with a condition such as ID BETWEEN 450 AND 459, then it may be worth using the index.
To use the index it would have to scan it all the way from top to bottom (converting each ID to a character to do the LIKE comparison). Then, for any match, it has to go off to get the NAME column.
It has decided that it is easier to and quicker to scan the table (which, with 75,000 rows isn't that big anyway) rather than mucking about going back and forth between the index and the table.
The others are right, you shouldn't use a numeric column like that.
However, it is actually, the OR <subquery> construct that is causing a (performance) problem in this case. I don't know if it is different in version 11, but up to version 10gr2, it causes a a filter operation with what is basically a nested loop with a correlated subquery. In your case, the use of a numeric column as a varchar also results in a full table scan.
You can rewrite your query like this:
select *
from big
where id like '45%'
union all
select *
from big
join small using(id)
where id not like '45%';
With your test case, I end up with a row count of 174000 rows in big and 9 small.
Running your query takes 7 seconds with 1211399 consistent gets.
Running my query 0,7 seconds and uses 542 consistent gets.
The explain plans for my query is:
--------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)|
---------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8604 | 154 (6)|
| 1 | UNION-ALL | | | |
|* 2 | TABLE ACCESS FULL | BIG | 8603 | 151 (4)|
| 3 | NESTED LOOPS | | 1 | 3 (0)|
|* 4 | TABLE ACCESS FULL | SMALL | 1 | 3 (0)|
| 5 | TABLE ACCESS BY INDEX ROWID| BIG | 1 | 0 (0)|
|* 6 | INDEX UNIQUE SCAN | BIG_PK | 1 | 0 (0)|
---------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(TO_CHAR("ID") LIKE '45%')
4 - filter(TO_CHAR("SMALL"."ID") NOT LIKE '45%')
6 - access("BIG"."ID"="SMALL"."ID")
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
542 consistent gets
0 physical reads
0 redo size
33476 bytes sent via SQL*Net to client
753 bytes received via SQL*Net from client
76 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1120 rows processed
Something like this might work:
select *
from big_table big
where id like '45%'
or exists ( select id from small_table where id = big.id);