Distinct values on indexed column - sql

I have a table with 115 M rows. One of the column is indexed (index called "my_index" on explain plan below) and not nullable. Moreover, this column has just one distinct value so far.
When I do
select distinct my_col from my_table;
, it takes 230 seconds which is very long. Here is the explain plan.
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 1 | 3 | 22064 (90)| 00:03:23 |
| 1 | SORT UNIQUE NOSORT| | 1 | 3 | 22064 (90)| 00:03:23 |
| 2 | INDEX FULL SCAN | my_index | 115M| 331M| 2363 (2)| 00:00:22 |
Since the column has just one distinct value, why does it take so long ? Why Oracle does not just check index entries and fastly find that there is just one possible value for this column ? On the explain plan above, the index scanning seems to take 22 s but what is this "SORT UNIQUE NOSORT" which takes ages ?
Thank you in advance for your help

Re analyse the table.
EXEC dbms_stats.gather_table_stats('owner','table_name',cascade=>true,method_opt=>'FOR ALL INDEXED COLUMNS SIZE ');
Change Index Type
One distinct value out of 115M rows??!! That's what called as low cardinality, not so good for the 'normal' B-Tree index Consider a bitmapped index. (If at all you have B-tree)
Reconstructing Query
If you are sure that no new values will be added to this column then please remove the distinct clause and rather use as Abhijith said.

SORT UNIQUE NOSORT is not taking too long. You are looking at the estimates from a bad execution plan that is probably the result of unreasonable optimizer parameters. For example, setting the parameter OPTIMIZER_INDEX_COST_ADJ to 1 instead of the default 100 can produce a similar plan. Most likely your query runs slowly because your database is busy or just slow.
What's wrong with the posted execution plan?
The posted execution plan seems unreasonable. Retrieving data should take much longer than simply throwing out duplicates. And the consumer operation, SORT UNIQUE NOSORT, can start at almost the same time as the producer operation, INDEX FULL SCAN. Normally they should finish at almost the same time. The execution plan in the question shows the optimizer estimates. The screenshot below of an active report shows the actual timelines for a very similar query. All steps are starting and stopping at almost the same time.
Sample setup with reasonable plan
Below is a very similar setup, but with a very plain configuration. Same number of rows read (115 million) and returned (1), and almost the exact same segment size (329MB vs 331 MB). The plan shows almost all of the time being spent on the INDEX FULL SCAN.
drop table test1 purge;
create table test1(a number not null, b number, c number) nologging;
begin
for i in 1 .. 115 loop
insert /*+ append */ into test1 select 1, level, level
from dual connect by level <= 1000000;
commit;
end loop;
end;
/
create index test1_idx on test1(a);
begin
dbms_stats.gather_table_stats(user, 'TEST1');
end;
/
explain plan for select /*+ index(test1) */ distinct a from test1;
select * from table(dbms_xplan.display);
Plan hash value: 77032494
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 244K (4)| 00:48:50 |
| 1 | SORT UNIQUE NOSORT| | 1 | 3 | 244K (4)| 00:48:50 |
| 2 | INDEX FULL SCAN | TEST1_IDX | 115M| 329M| 237K (1)| 00:47:30 |
--------------------------------------------------------------------------------
Re-creating a bad plan
--Set optimizer_index_cost_adj to a ridiculously low value.
--This changes the INDEX FULL SCAN estimate from 47 minutes to 29 seconds.
alter session set optimizer_index_cost_adj = 1;
--Changing the CPUSPEEDNW to 800 will exactly re-create the time estimate
--for SORT UNIQUE NOSORT. This value is not ridiculous, and it is not
--something you should normally change. But it does imply your CPUs are
--slow. My 2+ year-old desktop had an original score of 1720.
begin
dbms_stats.set_system_stats( 'CPUSPEEDNW', 800);
end;
/
explain plan for select /*+ index(test1) */ distinct a from test1;
select * from table(dbms_xplan.display);
Plan hash value: 77032494
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 16842 (86)| 00:03:23 |
| 1 | SORT UNIQUE NOSORT| | 1 | 3 | 16842 (86)| 00:03:23 |
| 2 | INDEX FULL SCAN | TEST1_IDX | 115M| 329M| 2389 (2)| 00:00:29 |
--------------------------------------------------------------------------------
How to investigate
Check the parameters.
select name, value from v$parameter where name like 'optimizer_index%'
NAME VALUE
---- -----
optimizer_index_cost_adj 1
optimizer_index_caching 0
Also check the system statistics.
select * from sys.aux_stats$;
+---------------+------------+-------+------------------+
| SNAME | PNAME | PVAL1 | PVAL2 |
+---------------+------------+-------+------------------+
| SYSSTATS_INFO | STATUS | | COMPLETED |
| SYSSTATS_INFO | DSTART | | 09-23-2013 17:52 |
| SYSSTATS_INFO | DSTOP | | 09-23-2013 17:52 |
| SYSSTATS_INFO | FLAGS | 1 | |
| SYSSTATS_MAIN | CPUSPEEDNW | 800 | |
| SYSSTATS_MAIN | IOSEEKTIM | 10 | |
| SYSSTATS_MAIN | IOTFRSPEED | 4096 | |
| SYSSTATS_MAIN | SREADTIM | | |
| SYSSTATS_MAIN | MREADTIM | | |
| SYSSTATS_MAIN | CPUSPEED | | |
| SYSSTATS_MAIN | MBRC | | |
| SYSSTATS_MAIN | MAXTHR | | |
| SYSSTATS_MAIN | SLAVETHR | | |
+---------------+------------+-------+------------------+
To find out where the time is really spent, use a tool like the active report.
select dbms_sqltune.report_sql_monitor(sql_id => '5s63uf4au6hcm',
type => 'active') from dual;

If there are only a few distinct values of the column, try a compressed index:
create index my_index on my_table (my_col) compress;
This will store each distinct value of the column only once, hopefully reducing the execution time of your query.
As a bonus: use this to see the actual plan used for a query:
select /*+ gather_plan_statistics */ distinct my_col from my_table;
SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR);
The gather_plan_statistics hint will collect more data (it will take longer to execute), but it works without it too. See the documentation of DBMS_XPLAN.DISPLAY_CURSOR for more details.

See the explain plan carefully.
It scans the whole index to know what you are trying to fetch
Then applies distinct function (try to retrieve the unique values). Though you say there is only one unique value, it has to scan the whole index to get the values. Oracle does not know that there is only one distinct value in the index. You can restrict the rownum = 1 to get the quick answer.
Try this to get the quick answer
select my_col from my_table where rownum = 1;
It is highly unfavourable to add an index on a column which has very less distribution. This is bad for the table and overall for the application as well. This just does not make any sense

Related

Want to process 5000 records from the select query is taking long time in oracle database

Each time i want to process 5000 records like below.
First time i want to process records from 1 to 5000 rows.
second time i want to process records from 5001 to 10000 rows.
third time i want to process records from 10001 to 15001 rows like wise
I dont want to go for procedure or PL/SQL. I will change the rnum values in my code to fetch the 5000 records.
The given query is taking 3 minutes to fetch the records from 3 joined tables. How can i reduced the time to fetch the records.
select * from (
SELECT to_number(AA.MARK_ID) as MARK_ID, AA.SUPP_ID as supplier_id, CC.supp_nm as SUPPLIER_NAME, CC.supp_typ as supplier_type,
CC.supp_lock_typ as supplier_lock_type, ROW_NUMBER() OVER (ORDER BY AA.MARK_ID) as rnum
from TABLE_A AA, TABLE_B BB, TABLE_C CC
WHERE
AA.MARK_ID=BB.MARK_ID AND
AA.SUPP_ID=CC.location_id AND
AA.char_id='160' AND
BB.VALUE_KEY=AA.VALUE_KEY AND
BB.VALUE_KEY=CC.VALUE_KEY
AND AA.VPR_ID IS NOT NULL)
where rnum >=10001 and rnum<=15000;
I have tried below scenario but no luck.
I have tried the /*+ USE_NL(AA BB) */ hints.
I used exists in the where conditions. but its taking the same 3 minutes to fetch the records.
Below is the table details.
select count(*) from TABLE_B;
-----------------
2275
select count(*) from TABLE_A;
-----------------
2405276
select count(*) from TABLE_C;
-----------------
1269767
Result of my inner query total records is
SELECT count(*)
from TABLE_A AA, TABLE_B BB, TABLE_C CC
WHERE
AA.MARK_ID=BB.MARK_ID AND
AA.SUPP_ID=CC.location_id AND
AA.char_id='160' AND
BB.VALUE_KEY=AA.VALUE_KEY AND
BB.VALUE_KEY=CC.VALUE_KEY
AND AA.VPR_ID IS NOT NULL;
-----------------
2027055
All the used columns in where conditions are indexed properly.
Explain Table for the given query is...
Plan hash value: 3726328503
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2082K| 182M| | 85175 (1)| 00:17:03 |
|* 1 | VIEW | | 2082K| 182M| | 85175 (1)| 00:17:03 |
|* 2 | WINDOW SORT PUSHED RANK | | 2082K| 166M| 200M| 85175 (1)| 00:17:03 |
|* 3 | HASH JOIN | | 2082K| 166M| | 44550 (1)| 00:08:55 |
| 4 | TABLE ACCESS FULL | TABLE_C | 1640 | 49200 | | 22 (0)| 00:00:01 |
|* 5 | HASH JOIN | | 2082K| 107M| 27M| 44516 (1)| 00:08:55 |
|* 6 | VIEW | index$_join$_005 | 1274K| 13M| | 9790 (1)| 00:01:58 |
|* 7 | HASH JOIN | | | | | | |
| 8 | INLIST ITERATOR | | | | | | |
|* 9 | INDEX RANGE SCAN | TABLE_B_IN2 | 1274K| 13M| | 2371 (2)| 00:00:29 |
| 10 | INDEX FAST FULL SCAN| TABLE_B_IU1 | 1274K| 13M| | 4801 (1)| 00:00:58 |
|* 11 | TABLE ACCESS FULL | TABLE_A | 2356K| 96M| | 27174 (1)| 00:05:27 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RNUM">=10001 AND "RNUM"<=15000)
2 - filter(ROW_NUMBER() OVER ( ORDER BY "A"."MARK_ID")<=15000)
3 - access("A"."SUPP_ID"="C"."LOC_ID" AND "A"."VALUE_KEY"="C"."VALUE_KEY")
5 - access("A"."MARK_ID"="A"."MARK_ID" AND "A"."VALUE_KEY"="A"."VALUE_KEY")
6 - filter("A"."MARK_CHN_IND"='C' OR "A"."MARK_CHN_IND"='D')
7 - access(ROWID=ROWID)
9 - access("A"."MARK_CHN_IND"='C' OR "A"."MARK_CHN_IND"='D')
11 - filter("A"."CHNL_ID"=160 AND "A"."VPR_ID" IS NOT NULL)
Could you please anyone help me on this to tune this query as i am trying from last 2 days?
Each query will take a long time because each query will have to join then sort all rows. The row_number analytic function can only return a result if the whole set has been read. This is highly inefficient. If the data set is large, you only want to sort and hash-join once.
You should fetch the whole set once, using batches of 5k rows. Alternatively, if you want to keep your existing code logic, you could store the result in a temporary table, for instance:
CREATE TABLE TMP AS <your above query>
CREATE INDEX ON TMP (rnum)
And then replace your query in your code by
SELECT * FROM TMP WHERE rnum BETWEEN :x AND :y
Obviously if your temp table is being reused periodically, just create it once and delete when done (or use a true temporary table).
How many unique MARK_ID values have you got in TABLE_A? I think you may get better performance if you limit the fetched ranges of records by MARK_ID instead of the artificial row number, because the latter is obviously not sargeable. Granted, you may not get exactly 5000 rows in each range but I have a feeling it's not as important as the query performance.
Firstly, giving obfuscated table names makes it nearly impossible to deduce anything about the data distributions and relationships between tables, so potential answerers are crippled from the start.
However, if every row in table_a matches one row in the other tables then you can avoid some of the usage of 200Mb of temporary disk space that is probably crippling performance by pushing the ranking down into an inline view or common table expression.
Monitor V$SQL_WORKAREA to check the exact amount of space being used for the window function, and if it is still excessive consider modifying the memory management to increase available sort area size.
Something like:
with cte_table_a as (
SELECT
to_number(MARK_ID) as MARK_ID,
SUPP_ID as supplier_id,
ROW_NUMBER() OVER (ORDER BY MARK_ID) as rnum
from
TABLE_A
where
char_id='160' and
VPR_ID IS NOT NULL)
select ...
from
cte_table_a aa,
TABLE_B BB,
TABLE_C CC
WHERE
aa.rnum >= 10001 and
aa.rnum <= 15000 and
AA.MARK_ID = BB.MARK_ID AND
AA.SUPP_ID = CC.location_id AND
BB.VALUE_KEY = AA.VALUE_KEY AND
BB.VALUE_KEY = CC.VALUE_KEY

Why won't Oracle use my index unless I tell it to?

I have an index:
CREATE INDEX BLAH ON EMPLOYEE(SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4));
and an SQL STATEMENT:
SELECT COUNT(*)
FROM (SELECT COUNT(*)
FROM EMPLOYEE
GROUP BY SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4)
HAVING COUNT(*) > 100);
but it keeps doing a full table scan instead of using the index unless I add a hint.
EMPSHIRTNO is not the primary key, EMPNO is (which isn't used here).
Complex query
EXPLAIN PLAN FOR SELECT COUNT(*) FROM (SELECT COUNT(*) FROM EMPLOYEE
GROUP BY SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4)
HAVING COUNT(*) > 100);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1712471557
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 24 (9)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | | |
| 2 | VIEW | | 497 | | 24 (9)| 00:00:01 |
|* 3 | FILTER | | | | | |
----------------------------------------------------------------------------------
| 4 | HASH GROUP BY | | 497 | 2485 | 24 (9)| 00:00:01 |
| 5 | TABLE ACCESS FULL| EMPLOYEE | 9998 | 49990 | 22 (0)| 00:00:01||
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(COUNT(*)>100)
17 rows selected.
ANALYZE INDEX BLAH VALIDATE STRUCTURE;
SELECT BTREE_SPACE, USED_SPACE FROM INDEX_STATS;
BTREE_SPACE USED_SPACE
----------- ----------
176032 150274
Simple query:
EXPLAIN PLAN FOR SELECT * FROM EMPLOYEE;
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2913724801
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9998 | 439K| 23 (5)| 00:00:01 |
| 1 | TABLE ACCESS FULL| EMPLOYEE | 9998 | 439K| 23 (5)| 00:00:01 |
------------------------------------------------------------------------------
8 rows selected.
Maybe it is because the NOT NULL constraint is enforced via a CHECK constraint rather than being defined originally in the table creation statement? It will use the index when I do:
SELECT * FROM EMPLOYEE WHERE SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4) = '1234';
For those suggesting that it needs to read all of the rows anyway (which I don't think it does as it is counting), the index is not used on this either:
SELECT SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4) FROM EMPLOYEE;
In fact, putting an index on EMPSHIRTNO and performing SELECT EMPSHIRTNO FROM EMPLOYEE; does not use the index either. I should point out that EMPSHIRTNO is not unique, there are duplicates in the table.
Because of the nature of your query it needs to scan every row of the table anyway. So oracle is probably deciding that a full table scan is the most efficient way to do this. Because its using a HASH GROUP BY there is no nasty sort at the end like in oracle 7 days.
First get the count per SUBSTR(...) of shirt no. Its thus first part of the query which has to scan the entire table
SELECT COUNT(*)
FROM EMPLOYEE
GROUP BY SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4)
Next you want to discard the SUBSTR(...) where the count is <= 100. Oracle needs to scan all rows to verify this. Technically you could argue that once it has 101 it doesn't need any more, but I don't think Oracle can work this out, especially as you are asking it what the total numer is in the SELECT COUNT(*) of the subquery.
HAVING COUNT(*) > 100);
So basically to give you the answer you want Oracle needs to scan every row in the table, so an index is no help on filtering. Because its using a hash group by, the index is no help on the grouping either. So to use the index would just slow your query down, which is why Oracle is not using it.
I think you may need to build a function-based index on SUBSTR(TO_CHAR(EMPSHIRTNO), 1,4); Functions in your SQL have a tendency to invalidate regular indexes on a column.
I believe #Codo is correct. Oracle cannot determine that the expression will always be non-null, and then must assume that some nulls may not
be stored in the index.
(It seems like Oracle should be able to figure out that the expression is not nullable. In general, the chance of any random SUBSTR expression always being
not null is probably very low, maybe Oracle just lumps all SUBSTR expressions together?)
You can make the index usable for your query with one of these work-arounds:
--bitmap index:
create bitmap index blah on employee(substr(to_char(empshirtno), 1, 4));
--multi-column index:
alter table employee add constraint blah primary key (id, empshirtno);
--indexed virtual column:
create table employee(empshirtno varchar2(10) not null
,empshirtno_for_index as (substr(empshirtno,1,4)) not null );
create index blah on employee(empshirtno_for_index);

SQL Query going for Full Table scan instead of Index Based Scan

I have two tables:
create table big( id number, name varchar2(100));
insert into big(id, name) select rownum, object_name from all_objects;
create table small as select id from big where rownum < 10;
create index big_index on big(id);
On these tables if I execute the following query:
select *
from big_table
where id like '45%'
or id in ( select id from small_table);
it always goes for a Full Table Scan.
Execution Plan
----------------------------------------------------------
Plan hash value: 2290496975
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3737 | 97162 | 85 (3)| 00:00:02 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL| BIG | 74718 | 1897K| 85 (3)| 00:00:02 |
|* 3 | TABLE ACCESS FULL| SMALL | 1 | 4 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("ID"=45 OR EXISTS (SELECT /*+ */ 0 FROM "SMALL" "SMALL"
WHERE "ID"=:B1))
3 - filter("ID"=:B1)
Are there any ways in which we can rewrite the Query So that it always goes for index Scan.
No, no and no.
You do NOT want it to use an index. Luckily Oracle is smarter than that.
ID is numeric. While it might have ID values of 45,450,451,452,4501,45004,4500003 etc, in the indexes these values will be scattered anywhere and everywhere. If you went with a condition such as ID BETWEEN 450 AND 459, then it may be worth using the index.
To use the index it would have to scan it all the way from top to bottom (converting each ID to a character to do the LIKE comparison). Then, for any match, it has to go off to get the NAME column.
It has decided that it is easier to and quicker to scan the table (which, with 75,000 rows isn't that big anyway) rather than mucking about going back and forth between the index and the table.
The others are right, you shouldn't use a numeric column like that.
However, it is actually, the OR <subquery> construct that is causing a (performance) problem in this case. I don't know if it is different in version 11, but up to version 10gr2, it causes a a filter operation with what is basically a nested loop with a correlated subquery. In your case, the use of a numeric column as a varchar also results in a full table scan.
You can rewrite your query like this:
select *
from big
where id like '45%'
union all
select *
from big
join small using(id)
where id not like '45%';
With your test case, I end up with a row count of 174000 rows in big and 9 small.
Running your query takes 7 seconds with 1211399 consistent gets.
Running my query 0,7 seconds and uses 542 consistent gets.
The explain plans for my query is:
--------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)|
---------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8604 | 154 (6)|
| 1 | UNION-ALL | | | |
|* 2 | TABLE ACCESS FULL | BIG | 8603 | 151 (4)|
| 3 | NESTED LOOPS | | 1 | 3 (0)|
|* 4 | TABLE ACCESS FULL | SMALL | 1 | 3 (0)|
| 5 | TABLE ACCESS BY INDEX ROWID| BIG | 1 | 0 (0)|
|* 6 | INDEX UNIQUE SCAN | BIG_PK | 1 | 0 (0)|
---------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(TO_CHAR("ID") LIKE '45%')
4 - filter(TO_CHAR("SMALL"."ID") NOT LIKE '45%')
6 - access("BIG"."ID"="SMALL"."ID")
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
542 consistent gets
0 physical reads
0 redo size
33476 bytes sent via SQL*Net to client
753 bytes received via SQL*Net from client
76 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1120 rows processed
Something like this might work:
select *
from big_table big
where id like '45%'
or exists ( select id from small_table where id = big.id);

Why is this query so slow?

I have tables FOO and BAR. FOO has a foreign key to BAR's PK.
When I execute the following query it takes several seconds.
select foo.name, foo.description, bar.quadrant from FOO, BAR
where FOO.BAR_ID = BAR.BAR_ID
Here is my explain plan:
OPERATION OBJECT_NAME OPTIONS COST
SELECT STATEMENT 39
HASH JOIN 39
TABLE ACCESS BAR FULL 2
TABLE ACCESS FOO FULL 36
FOO has 6000 records in it and BAR only has 5. The BAR_ID column is a NUMBER.
This is running on Oracle 10g and it is taking ~3 seconds to complete. That seems extreme given how quickly it performs other queries.
EDIT table defs:
CREATE TABLE BAR
(
"BAR_ID" NUMBER NOT NULL,
"QUADRANT" VARCHAR2(100 BYTE) NOT NULL,
CONSTRAINT "BAR_PK" PRIMARY KEY ("BAR_ID")
)
CREATE TABLE FOO
( "FOO_ID" NUMBER NOT NULL,
"BAR_ID" NUMBER NOT NULL,
"NAME" VARCHAR2(250 BYTE) NOT NULL,
"DESCRIPTION" VARCHAR2(250 BYTE),
CONSTRAINT "FOO_PK" PRIMARY KEY ("FOO_ID"),
CONSTRAINT "FOO__FK1" FOREIGN KEY ("BAR_ID") REFERENCES BAR ("BAR_ID") ENABLE
);
Are you sure you have good statistics? I created a test case from your DDL and saw this plan before statistics:
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4996 | 1619K| 10 (10)| 00:00:01 |
|* 1 | HASH JOIN | | 4996 | 1619K| 10 (10)| 00:00:01 |
| 2 | TABLE ACCESS FULL| BAR | 5 | 325 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| FOO | 4996 | 1302K| 6 (0)| 00:00:01 |
---------------------------------------------------------------------------
(If you get the dbms_xplan output you'll also see "dynamic sampling used for this statement").
After doing this:
SQL> begin dbms_stats.gather_table_stats(user,'FOO'); end;
2 /
PL/SQL procedure successfully completed.
SQL> c/FOO/BAR/
1* begin dbms_stats.gather_table_stats(user,'BAR'); end;
SQL> /
PL/SQL procedure successfully completed.
I see:
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4996 | 131K| 9 (12)| 00:00:01 |
| 1 | MERGE JOIN | | 4996 | 131K| 9 (12)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| BAR | 5 | 40 | 2 (0)| 00:00:01 |
| 3 | INDEX FULL SCAN | BAR_PK | 5 | | 1 (0)| 00:00:01 |
|* 4 | SORT JOIN | | 4996 | 94924 | 7 (15)| 00:00:01 |
| 5 | TABLE ACCESS FULL | FOO | 4996 | 94924 | 6 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
There's a bucket load of instrumentation built into Oracle for investigating this sort of issue.
Start with this paper:
http://method-r.com/downloads/doc_download/10-for-developers-making-friends-with-the-oracle-database-cary-millsap
Get a TKPROF trace for your query to see what really happens - explain plan is just an estimate.
Basically, execute ALTER SESSION SET SQL_TRACE = TRUE command before your query, execute the query, and then ALTER SESSION SET SQL_TRACE = FALSE. Then find the trace file produced from location determined by USER_DUMP_DEST parameter (look into v$parameter view). Use TKPROF utility to process the raw trace file into more readable format, and examine the results (and post them here, too).
(See Using SQL Trace and TKPROF from Oracle.com for more information.)
Does the table get frequent updates?
Is foo.description a huge CLOB?
Is network latency making it seem like the query is taking a long time?
Are these tables really complex views?
Were the tables once very large and have since had lots of data deleted?
From what I can remember, Oracle will see this as a simple join that will ignore the indexes. The basic idea is that because you are not limiting the data in either table and just joining them together, it thinks that a full table scan will work better. If the foo table has null in the bar_id column for several rows, then you may want to use the index hint.
As an example, if you run the query based on a single bar_id, the explain plan will likely use the indexes as expected. Without the index it will do a full scan on the bar table, because it is very small, and a full scan on the foo table because you are not filtering out any values for bar_id.
One last note is to make sure you update statistics on the tables and indexes. This would be important for a sparse index as Oracle may realize the index can significantly change the cost of the query.
It is very reasonable to make a full table scan to FOO table, the table has 4996 row and you right a query that you ask oracle to "Send all the Foo records along with their bar.quadrant"

Why isn't index used for this query?

I had a query where an index was not used when I thought it could be, so I reproduced it out of curiosity:
Create a test_table with 1.000.000 rows (10 distinct values in col, 500 bytes of data in some_data).
CREATE TABLE test_table AS (
SELECT MOD(ROWNUM,10) col, LPAD('x', 500, 'x') some_data
FROM dual
CONNECT BY ROWNUM <= 1000000
);
Create an index and gather table stats:
CREATE INDEX test_index ON test_table ( col );
EXEC dbms_stats.gather_table_stats( 'MY_SCHEMA', 'TEST_TABLE' );
Try to get distinct values of col and the COUNT:
EXPLAIN PLAN FOR
SELECT col, COUNT(*)
FROM test_table
GROUP BY col;
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 30 | 15816 (1)| 00:03:10
| 1 | HASH GROUP BY | | 10 | 30 | 15816 (1)| 00:03:10
| 2 | TABLE ACCESS FULL| TEST_TABLE | 994K| 2914K| 15755 (1)| 00:03:10
---------------------------------------------------------------------------------
The index is not used, providing the hint does not change this.
I guess, the index can't be used in this case, but why?
UPDATE:
Try making the col column NOT NULL. That is the reason it's not using the index. When it's not null, here's the plan.
SELECT STATEMENT, GOAL = ALL_ROWS 69 10 30
HASH GROUP BY 69 10 30
INDEX FAST FULL SCAN SANDBOX TEST_INDEX 56 98072 294216
If the optimizer determines that it's more efficient NOT to use the index (maybe due to rewriting the query), then it won't. Optimizer hints are just that, namely, hints to tell Oracle an index you'd like it to use. You can think of them as suggestions. But if the optimizer determines that it's better not to use the index (again, as result of query rewrite for example), then it's not going to.
Refer to this link: http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/hintsref.htm
"Specifying one of these hints causes the optimizer to choose the specified access path only if the access path is available based on the existence of an index or cluster and on the syntactic constructs of the SQL statement. If a hint specifies an unavailable access path, then the optimizer ignores it."
Since you are running a count(*) operation, the optimizer has determined that it's more efficient to just scan the whole table and hash instead of using your index.
Here's another handy link on hints:
http://www.dba-oracle.com/t_hint_ignored.htm
you forgot this really important information: COL is not null
If the column is NULLABLE, the index can not be used because there might be unindexed rows.
SQL> ALTER TABLE test_table MODIFY (col NOT NULL);
Table altered
SQL> EXPLAIN PLAN FOR
2 SELECT col, COUNT(*) FROM test_table GROUP BY col;
Explained
SQL> SELECT * FROM table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1077170955
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 30 | 1954 (1)| 00:00:2
| 1 | SORT GROUP BY NOSORT| | 10 | 30 | 1954 (1)| 00:00:2
| 2 | INDEX FULL SCAN | TEST_INDEX | 976K| 2861K| 1954 (1)| 00:00:2
--------------------------------------------------------------------------------
I ran Peter's original stuff and reproduced his results. I then applied dcp's suggestion...
SQL> alter table test_table modify col not null;
Table altered.
SQL> EXEC dbms_stats.gather_table_stats( user, 'TEST_TABLE' , cascade=>true)
PL/SQL procedure successfully completed.
SQL> EXPLAIN PLAN FOR
2 SELECT col, COUNT(*)
3 FROM test_table
4 GROUP BY col;
Explained.
SQL> select * from table(dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
Plan hash value: 2099921975
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 30 | 574 (9)| 00:00:07 |
| 1 | HASH GROUP BY | | 10 | 30 | 574 (9)| 00:00:07 |
| 2 | INDEX FAST FULL SCAN| TEST_INDEX | 1000K| 2929K| 532 (2)| 00:00:07 |
------------------------------------------------------------------------------------
9 rows selected.
SQL>
The reason this matters, is because NULL values are not included in a normal B-TREE index, but the GROUP BY has to include NULL as a grouping "value" in your query. By telling the optimizer that there are no NULLs in col it is free to use the much more efficient index (I was getting an elapsed time of almost 3.55 seconds with the FTS). This is a classic example of how metadata can influence the optimizer.
Incidentally, this is obviously a 10g or 11g database, because it uses the HASH GROUP BY algorithm, instead of the older SORT (GROUP BY) algorithm.
bitmap index will do as well
Execution Plan
----------------------------------------------------------
Plan hash value: 2200191467
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 30 | 15983 (2)| 00:03:12 |
| 1 | HASH GROUP BY | | 10 | 30 | 15983 (2)| 00:03:12 |
| 2 | TABLE ACCESS FULL| TEST_TABLE | 1013K| 2968K| 15825 (1)| 00:03:10 |
---------------------------------------------------------------------------------
SQL> create bitmap index test_index on test_table(col);
Index created.
SQL> EXEC dbms_stats.gather_table_stats( 'MY_SCHEMA', 'TEST_TABLE' );
PL/SQL procedure successfully completed.
SQL> SELECT col, COUNT(*)
2 FROM test_table
3 GROUP BY col
4 /
Execution Plan
----------------------------------------------------------
Plan hash value: 238193838
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 30 | 286 (0)| 00:00:04 |
| 1 | SORT GROUP BY NOSORT | | 10 | 30 | 286 (0)| 00:00:04 |
| 2 | BITMAP CONVERSION COUNT| | 1010K| 2961K| 286 (0)| 00:00:04 |
| 3 | BITMAP INDEX FULL SCAN| TEST_INDEX | | | | |
---------------------------------------------------------------------------------------