Trying to remove TABLE ACCESS FULL in Oracle SQL

I have a simple query:
select round(sum(a.wt)) as a_wt
from db.abc a
where a.date is null
and a.col_no is not null
and a.pod_cd = '367'
and a.fant != a.rce
I want to remove the TABLE ACCESS FULL from its plan. There are three indexes on the table, on the following column combinations:
col_no
col_no,date,fant,pyc
wagno,batno
What can be done to remove the TABLE ACCESS FULL?

One option would be to create a function-based index:
create index idx_date_col_pod on ABC (nvl("date",date'1900-01-01'), nvl(col_no,0), pod_cd);
and convert the query to:
select round(sum(wt)) as a_wt
from abc
where nvl("date",date'1900-01-01') = date'1900-01-01' -- a match means "date" is null, assuming no record really has this sentinel date
and nvl(col_no,0) != 0 -- a non-match means col_no is not null, assuming no record has col_no = 0
and pod_cd = '367' -- compared as a string, as in the original query, to avoid an implicit conversion
and fant != rce

Usually a B-tree index does not store rows whose indexed columns are all null, so conditions like
where a.date is null
and a.col_no is not null
tend to mean "don't use an index to fetch the rows for these conditions".
However, you can make an index include null values by adding a constant as a second key column (this works at least from version 11, as far as I know):
create index abc_date_nulls on abc("date", 1); -- the constant 1 does the trick
This creates an index that also covers the null values, which might be useful depending on the selectivity of the "date is null" condition.
Otherwise, or in addition, I'd suggest checking the selectivity of the condition pod_cd = '367' and building an index on the pod_cd column.
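A rough way to check those selectivities is to count the matching rows in a single pass. This is only a sketch: it assumes the table is reachable as abc and that the date column is literally named "date" (quoted, since DATE is a reserved word).

```sql
-- One pass over the table: how many rows does each condition keep?
-- Assumption: table name abc and quoted column "date" as described above.
select count(*)                                   as total_rows,
       count(case when "date" is null then 1 end) as date_null_rows,
       count(case when pod_cd = '367' then 1 end) as pod_367_rows
from abc;
```

If date_null_rows or pod_367_rows is a small fraction of total_rows, an index on that column is more likely to pay off.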
If you are sure the index will help and the database doesn't use it, you can force Oracle to use it with a hint:
select /*+ index(table_alias index_name) */ ... from ...
This is good for tests or for checking the impact an index can have, but please, please, please be careful using hints in production. Read the documentation and everything you can find about the disadvantages of that approach. Don't tell anyone I told you to use hints in production.

Indexes can be used to check for nulls, and to compare two columns against each other.
Setup:
create table abc
( dt date
, col_no number
, pod_cd varchar2(5)
, fant number
, rce number
, wt number )
nologging;
insert /*+ append */ into abc (dt, col_no, pod_cd, fant, rce, wt)
select case mod(rownum,3) when 0 then date '2018-12-31' + mod(rownum,1000) end
, case mod(rownum,7) when 0 then rownum end
, case mod(rownum,2) when 1 then mod(rownum,1000) end
, round(dbms_random.value) + 10
, round(dbms_random.value) + 10
, 1
from xmltable('1 to 10000000');
create index x1 on abc (pod_cd, dt);
create index x2 on abc (fant, rce);
Test for nulls:
select count(*) from abc a
where a.pod_cd = '367'
and a.dt is null;
COUNT(*)
----------
6667
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2253536563
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 | 20 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 7 | | |
|* 2 | INDEX RANGE SCAN| X1 | 6667 | 46669 | 20 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("A"."POD_CD"='367' AND "A"."DT" IS NULL)
The query was executed using index X1, without touching the table.
Test for fant != rce:
select count(*)
from abc a
where a.fant != a.rce;
COUNT(*)
----------
5000666
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 29151601
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 6468 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
|* 2 | INDEX FAST FULL SCAN| X2 | 5000K| 28M| 6468 (1)| 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("A"."FANT"<>"A"."RCE")
The query was executed using index X2, also without touching the table.
Test for the full query:
create index x3 on abc(pod_cd, dt, fant, rce, col_no, wt);
select round(sum(a.wt)) as a_wt
from abc a
where a.dt is null
and a.col_no is not null
and a.pod_cd = '367'
and a.fant != a.rce;
A_WT
----------
481
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3828004431
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 16 | 28 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 16 | | |
|* 2 | INDEX RANGE SCAN| X3 | 476 | 7616 | 28 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("A"."POD_CD"='367' AND "A"."DT" IS NULL)
filter("A"."COL_NO" IS NOT NULL AND "A"."FANT"<>"A"."RCE")
Full table scans aren't always terrible, though.
SQL> drop index x1;
Index dropped.
SQL> drop index x2;
Index dropped.
SQL> drop index x3;
Index dropped.
select round(sum(a.wt)) as a_wt
from abc a
where a.dt is null
and a.col_no is not null
and a.pod_cd = '367'
and a.fant != a.rce;
A_WT
----------
481
1 row selected.
Elapsed: 00:00:00.18
Execution Plan
----------------------------------------------------------
Plan hash value: 1045519631
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 16 | 8188 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 16 | | |
|* 2 | TABLE ACCESS FULL| ABC | 476 | 7616 | 8188 (1)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("A"."COL_NO" IS NOT NULL AND "A"."POD_CD"='367' AND
"A"."DT" IS NULL AND "A"."FANT"<>"A"."RCE")
Table ABC has 10 million rows, and the full scan took 0.18 seconds. This is in a VM on a 4 year old laptop.

Distinct clause on large join

Following this question, suppose I've now set up the indexes and want to return only a certain field, without duplicates:
Select distinct A.cod
from A join B
on A.id1=B.id1 and
A.id2=B.id2
where A.year=2016
and B.year=2016
The problem now is that I'm getting something like 150k cod values, of which only 1000 are distinct, so my query is very inefficient.
Question: how can I improve that? i.e, how can I tell the DB, for every row on A, to stop joining that row as soon as a match is found?
Thank you in advance!
I'm basing my answer on your question:
how can I tell the DB, for every row on A, to stop joining that row as soon as a match is found?
With an EXISTS clause: as soon as it finds a match for a row, it stops and moves on to the next row to be checked.
Adding DISTINCT filters out any duplicate CODs (in case there are any).
select DISTINCT cod
from A ax
where year = 2016
and exists ( select 1
from B bx
WHERE Ax.ID1 = Bx.ID1
AND Ax.ID2 = Bx.ID2
AND Ax.YEAR = Bx.YEAR);
EDIT: Was curious which solution (IN or EXISTS) will give me a better Explain plan
Create the 1st Table Definition
Create table A
(
ID1 number,
ID2 number,
cod varchar2(100),
year number
);
Insert 4,000,000 sequential numbers
BEGIN
FOR i IN 1..4000000 loop
insert into A (id1, id2, cod, year)
values (i, i , i, i);
end loop;
END;
/
commit;
Create Table B and insert the same data into it
Create table B
as
select *
from A;
Reinsert the data from Table A to make duplicates
insert into B
select *
from A;
Build the indexes mentioned in the previous post, "Index on join and where"
CREATE INDEX A_IDX ON A(year, id1, id2);
CREATE INDEX B_IDX ON B(year, id1, id2);
Update a bunch of rows to make it fetch multiple rows with the year 2016:
update B
set year = 2016
where rownum < 20000;
update A
set year = 2016
where rownum < 20000;
commit;
Check Explain plan using EXISTS
Plan hash value: 1052726981
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 44 | 7 (15)| 00:00:01 |
| 1 | HASH UNIQUE | | 1 | 44 | 7 (15)| 00:00:01 |
| 2 | NESTED LOOPS SEMI | | 1 | 44 | 6 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| A | 1 | 26 | 4 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | A_IDX | 1 | | 3 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | B_IDX | 2 | 36 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("YEAR"=2016)
5 - access("BX"."YEAR"=2016 AND "AX"."ID1"="BX"."ID1" AND "AX"."ID2"="BX"."ID2")
filter("AX"."YEAR"="BX"."YEAR")
Check Explain plan using IN
Plan hash value: 3002464630
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 44 | 7 (15)| 00:00:01 |
| 1 | HASH UNIQUE | | 1 | 44 | 7 (15)| 00:00:01 |
| 2 | NESTED LOOPS | | 1 | 44 | 6 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| A | 1 | 26 | 4 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | A_IDX | 1 | | 3 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | B_IDX | 1 | 18 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("YEAR"=2016)
5 - access("YEAR"=2016 AND "ID1"="ID1" AND "ID2"="ID2")
Although my test case is limited, I'm guessing that the IN and EXISTS clauses execute in nearly the same way.
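Both plans estimate a single row for year = 2016 even though the updates above set almost 20,000 rows to that year, which may mean the optimizer statistics were not refreshed after the updates. A sketch of refreshing them before comparing the plans again, assuming tables A and B live in the current schema:

```sql
-- Refresh table and index statistics so the row estimates reflect the updates
begin
  dbms_stats.gather_table_stats(user, 'A', cascade => true);
  dbms_stats.gather_table_stats(user, 'B', cascade => true);
end;
/
-- Re-explain the EXISTS version; repeat with the IN version to compare
explain plan for
select distinct cod
from A ax
where year = 2016
and exists ( select 1
             from B bx
             where ax.id1 = bx.id1
             and ax.id2 = bx.id2
             and ax.year = bx.year );
select * from table(dbms_xplan.display);
```

With realistic estimates the optimizer may pick a different join method, so the two plans are worth comparing again afterwards.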
On the face of it, what you are actually trying to do should be done like this:
select distinct cod
from A
where year = 2016
and (id1, id2) in (select id1, id2 from B where year = 2016)
The subquery in the WHERE condition is a non-correlated query, so it will be evaluated only once. And the IN condition is evaluated using short-circuiting; instead of a complete join, it will search through the results of the subquery only until a match is found.
EDIT: As Migs Isip points out, there may be duplicate codes in the original table, so a "distinct" may still be needed. I edited my code to add it back after Migs posted his answer.
I'm not sure about your existing indexes, but you can improve your query a bit by adding another JOIN condition, like this:
Select distinct A.cod
from A join B
on A.id1=B.id1 and
A.id2=B.id2 and
A.year = B.year -- this one
where A.year=2016;

Why is Oracle running this (wrong) query

Why is Oracle running this (wrong) query?
SELECT * FROM CUSTOMERS WHERE CUSTOMER_TYPE_ID = 1ORDER BY ID;
Note that there is no space between 1 and ORDER.
In Oracle, a nonquoted identifier must begin with a letter. So when the parser reaches 1ORDER it knows this cannot be an identifier and must start with a number; it reads the number 1, treats the rest (ORDER) as a separate token, and succeeds.
Looking at the explain plan, you can see that Oracle could resolve the filter predicate, and the query is considered valid.
SQL> EXPLAIN PLAN FOR
2 SELECT * FROM OE.CUSTOMERS WHERE CUSTOMER_ID = 232ORDER BY CUSTOMER_ID;
Explained.
SQL>
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------
Plan hash value: 4238351645
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 177 | 1 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| CUSTOMERS | 1 | 177 | 1 (0)| 00:00:01 |
|* 2 | INDEX UNIQUE SCAN | CUSTOMERS_PK | 1 | | 0 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("CUSTOMER_ID"=232)
14 rows selected.
SQL>
So the optimizer identified the predicate as access("CUSTOMER_ID"=232).

Join much slower using table() function

I am attempting to use a table() function on an object in order to do a join within a PL/SQL function. When using this function, a query may take up to 20 minutes to complete; when I enter the data directly into a table instead, it takes less than 5 seconds. I have not been able to figure out why there is such a significant difference, but my best hunch is that the index on the column from the joining table is not being used. The column definition for the tables and for the objects is the same.
Here is some example code:
create or replace type VARCHAR20_TYPE is OBJECT
(
val varchar2(20 byte)
);
/
create or replace type VARCHAR20_TABLE is table of VARCHAR20_TYPE;
/
create or replace FUNCTION test_function(
in_project_ids VARCHAR20_TABLE
) RETURN INTEGER
IS
l_result INTEGER;
BEGIN
SELECT count(*) into l_result FROM project p JOIN TABLE(in_project_ids) t ON p.project_id = t.val;
RETURN l_result;
END;
If I were to replace in_project_ids in the above example with a join to a real table with the same column definition, it significantly improves the performance of the function.
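For reference, a call to the function from SQL might look like the sketch below. The project IDs '1001' and '1002' are made-up values, not taken from the real data.

```sql
-- Hypothetical call: build the collection inline and pass it to the function
select test_function(
         VARCHAR20_TABLE(VARCHAR20_TYPE('1001'), VARCHAR20_TYPE('1002'))
       ) as matching_projects
from dual;
```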
This is to be expected. When dealing with in-memory collections like this, Oracle has no statistics on them and assumes a default of roughly 8k rows (8168 in the plan below).
Try this to help it:
SELECT /*+ cardinality(t, 20) */ count(*) into l_result FROM project p JOIN TABLE(in_project_ids) t ON p.project_id = t.val;
where 20 should be a rough guess at the actual number of entries. This is one of the edge cases where hinting is "ok" (and required to help the optimizer).
edit
eg:
SQL> explain plan for SELECT /*+ cardinality(t, 1) */ * FROM project p JOIN TABLE(VARCHAR20_TABLE()) t ON p.project_id = t.val;
Explained.
SQL> select * From table(dbms_xplan.display);
Plan hash value: 858605789
--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 27 | 30 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 1 | 27 | 30 (0)| 00:00:01 |
| 3 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 1 | 2 | 29 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | SYS_C0011177 | 1 | | 0 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID | PROJECT | 1 | 25 | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("P"."PROJECT_ID"=TO_NUMBER(SYS_OP_ATG(VALUE(KOKBF$),1,2,2)))
Note
-----
- dynamic sampling used for this statement (level=2)
21 rows selected.
SQL> explain plan for SELECT * FROM project p JOIN TABLE(VARCHAR20_TABLE()) t ON p.project_id = t.val;
Explained.
SQL> select * From table(dbms_xplan.display);
Plan hash value: 583089723
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 | 215K| 33 (4)| 00:00:01 |
|* 1 | HASH JOIN | | 8168 | 215K| 33 (4)| 00:00:01 |
| 2 | TABLE ACCESS FULL | PROJECT | 2000 | 50000 | 3 (0)| 00:00:01 |
| 3 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 8168 | 16336 | 29 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("P"."PROJECT_ID"=TO_NUMBER(SYS_OP_ATG(VALUE(KOKBF$),1,2,2)))
Note
-----
- dynamic sampling used for this statement (level=2)
19 rows selected.
This is a trivial example, but note that "Rows" on the collection fetch is 8168 without the hint, and how the plan changes as a result. Check the explain plan with the real table vs. the collection vs. the hinted collection; hopefully, with a reasonable cardinality hint value, your plan and performance will improve.

ORACLE - How do I create indexes that will be used when NLS_COMP=Linguistic and NLS_Sort=Binary_CI

By default, Oracle uses the indexes I created.
When I change to NLS_COMP=LINGUISTIC and NLS_SORT=BINARY_CI, I get full table scans.
I'd read somewhere that creating an index on (nlssort(name, 'NLS_SORT=BINARY_CI')) would work.
As my attempt below shows, not so much. Even if I force it, the performance does not seem to be what I would expect. This is a trivial example; I'd like to solve this for a table with many millions of rows, so full table scans would be bad.
So the question is: how do I build indexes so that they will be used?
Thanks
-- Setup X
create table x ( name varchar2(30)) ;
insert into x select table_name from all_tables;
create index x_ix on x (name);
create index x_ic on x (nlssort(name, 'NLS_SORT=BINARY_CI'));
/
-- Default Settings
ALTER SESSION SET NLS_COMP=BINARY;
ALTER SESSION SET NLS_SORT=BINARY;
/
set autotrace on
/
select * from X where NAME like 'x%';
--0 rows selected
--
---------------------------------------------------------------------------
--| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
--| 0 | SELECT STATEMENT | | 1 | 17 | 1 (0)| 00:00:01 |
--|* 1 | INDEX RANGE SCAN| X_IX | 1 | 17 | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------
/
set autotrace off
/
-- Linguistic
ALTER SESSION SET NLS_COMP=LINGUISTIC;
ALTER SESSION SET NLS_SORT=BINARY_CI;
/
set autotrace on
/
select * from X where NAME like 'x%';
--13 rows selected
--
----------------------------------------------------------------------------
--| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
--| 0 | SELECT STATEMENT | | 1 | 17 | 3 (0)| 00:00:01 |
--|* 1 | TABLE ACCESS FULL| X | 1 | 17 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
select /*+ INDEX( X X_IX ) */ * from X where NAME like 'x%';
--13 rows selected
--
---------------------------------------------------------------------------
--| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
--| 0 | SELECT STATEMENT | | 1 | 17 | 9 (0)| 00:00:01 |
--|* 1 | INDEX FULL SCAN | X_IX | 1 | 17 | 9 (0)| 00:00:01 |
---------------------------------------------------------------------------
select /*+ INDEX( X X_IC ) */ * from X where NAME like 'x%';
--13 rows selected
--
--------------------------------------------------------------------------------------
--| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
--| 0 | SELECT STATEMENT | | 1 | 17 | 448 (1)| 00:00:06 |
--|* 1 | TABLE ACCESS BY INDEX ROWID| X | 1 | 17 | 448 (1)| 00:00:06 |
--| 2 | INDEX FULL SCAN | X_IC | 1629 | | 8 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
/
set autotrace off
/
Since Oracle 11g, LIKE can use linguistic indexes.
The documentation was modified to:
The SQL functions MAX( ) and MIN( ) cannot use linguistic indexes when NLS_COMP is set to LINGUISTIC
Notice they removed the "and also the LIKE operator" part.
I have reproduced your finding on my test DB (10.2.0.3). Upon investigation, it appears the LIKE operator cannot use the linguistic index -- from the 10gR2 Documentation:
The SQL functions MAX( ) and MIN( ),
and also the LIKE operator, cannot use
linguistic indexes when NLS_COMP is
set to LINGUISTIC.
It seems the main purpose of linguistic indexes is to improve the SORT operation.
If your goal is to search on this column in a case-insensitive way, I suggest you create an index on UPPER(name) and build your query with UPPER(name) LIKE UPPER('x%') instead.
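A minimal sketch of that UPPER(name) approach, using the table x from the question. The index name x_upper_ix is made up, and the sketch assumes the session goes back to binary comparison so that the function-based index matches the predicate:

```sql
-- Assumption: binary comparison semantics for this session
ALTER SESSION SET NLS_COMP=BINARY;
ALTER SESSION SET NLS_SORT=BINARY;

-- Hypothetical function-based index on the upper-cased name
create index x_upper_ix on x (upper(name));

-- Case-insensitive prefix search written against the indexed expression
explain plan for
select * from x where upper(name) like upper('x%');
select * from table(dbms_xplan.display);
```

Whether the optimizer actually picks the index still depends on statistics and on how selective the prefix is.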
If you want to use another (more complex) linguistic setting, you might want to look at the Oracle Text indexes.
Edit: There is another workaround: you can replace the LIKE 'ABC%' with:
SQL> select * from x where name >= 'ABC' and name < 'ABD';
Execution Plan
----------------------------------------------------------
Plan hash value: 708878862
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 24 | 4 (0)| 00:00:
| 1 | TABLE ACCESS BY INDEX ROWID| X | 1 | 24 | 4 (0)| 00:00:
|* 2 | INDEX RANGE SCAN | X_IC | 1 | | 3 (0)| 00:00:
--------------------------------------------------------------------------------
As you can see, if you can translate the LIKE expression into an expression with comparison operators (>= and <), the linguistic index might be used.

Why isn't index used for this query?

I had a query where an index was not used when I thought it could be, so I reproduced it out of curiosity:
Create a test_table with 1,000,000 rows (10 distinct values in col, 500 bytes of data in some_data).
CREATE TABLE test_table AS (
SELECT MOD(ROWNUM,10) col, LPAD('x', 500, 'x') some_data
FROM dual
CONNECT BY ROWNUM <= 1000000
);
Create an index and gather table stats:
CREATE INDEX test_index ON test_table ( col );
EXEC dbms_stats.gather_table_stats( 'MY_SCHEMA', 'TEST_TABLE' );
Try to get distinct values of col and the COUNT:
EXPLAIN PLAN FOR
SELECT col, COUNT(*)
FROM test_table
GROUP BY col;
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 30 | 15816 (1)| 00:03:10
| 1 | HASH GROUP BY | | 10 | 30 | 15816 (1)| 00:03:10
| 2 | TABLE ACCESS FULL| TEST_TABLE | 994K| 2914K| 15755 (1)| 00:03:10
---------------------------------------------------------------------------------
The index is not used, and providing a hint does not change this.
I guess the index can't be used in this case, but why?
UPDATE:
Try making the col column NOT NULL. That is the reason it's not using the index. When it's not null, here's the plan.
SELECT STATEMENT, GOAL = ALL_ROWS 69 10 30
HASH GROUP BY 69 10 30
INDEX FAST FULL SCAN SANDBOX TEST_INDEX 56 98072 294216
If the optimizer determines that it's more efficient NOT to use the index (maybe due to rewriting the query), then it won't. Optimizer hints are just that, namely, hints to tell Oracle an index you'd like it to use. You can think of them as suggestions. But if the optimizer determines that it's better not to use the index (again, as result of query rewrite for example), then it's not going to.
Refer to this link: http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/hintsref.htm
"Specifying one of these hints causes the optimizer to choose the specified access path only if the access path is available based on the existence of an index or cluster and on the syntactic constructs of the SQL statement. If a hint specifies an unavailable access path, then the optimizer ignores it."
Since you are running a count(*) operation, the optimizer has determined that it's more efficient to just scan the whole table and hash instead of using your index.
Here's another handy link on hints:
http://www.dba-oracle.com/t_hint_ignored.htm
You left out one really important piece of information: whether COL is declared NOT NULL.
If the column is nullable, the index cannot be used, because there might be rows that are not in the index.
SQL> ALTER TABLE test_table MODIFY (col NOT NULL);
Table altered
SQL> EXPLAIN PLAN FOR
2 SELECT col, COUNT(*) FROM test_table GROUP BY col;
Explained
SQL> SELECT * FROM table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1077170955
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 30 | 1954 (1)| 00:00:2
| 1 | SORT GROUP BY NOSORT| | 10 | 30 | 1954 (1)| 00:00:2
| 2 | INDEX FULL SCAN | TEST_INDEX | 976K| 2861K| 1954 (1)| 00:00:2
--------------------------------------------------------------------------------
I ran Peter's original stuff and reproduced his results. I then applied dcp's suggestion...
SQL> alter table test_table modify col not null;
Table altered.
SQL> EXEC dbms_stats.gather_table_stats( user, 'TEST_TABLE' , cascade=>true)
PL/SQL procedure successfully completed.
SQL> EXPLAIN PLAN FOR
2 SELECT col, COUNT(*)
3 FROM test_table
4 GROUP BY col;
Explained.
SQL> select * from table(dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
Plan hash value: 2099921975
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 30 | 574 (9)| 00:00:07 |
| 1 | HASH GROUP BY | | 10 | 30 | 574 (9)| 00:00:07 |
| 2 | INDEX FAST FULL SCAN| TEST_INDEX | 1000K| 2929K| 532 (2)| 00:00:07 |
------------------------------------------------------------------------------------
9 rows selected.
SQL>
This matters because NULL values are not included in a normal B-tree index, but the GROUP BY has to include NULL as a grouping "value" in your query. Once the optimizer knows that there are no NULLs in col, it is free to use the much more efficient index (I was getting an elapsed time of almost 3.55 seconds with the FTS). This is a classic example of how metadata can influence the optimizer.
Incidentally, this is obviously a 10g or 11g database, because it uses the HASH GROUP BY algorithm, instead of the older SORT (GROUP BY) algorithm.
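If altering the column is not an option, a similar effect can usually be had by excluding NULLs in the query itself. This is a sketch against the same test_table; note that it changes the result if col does contain NULLs, because the NULL group is no longer returned.

```sql
-- The IS NOT NULL predicate tells the optimizer that every qualifying row
-- is present in the B-tree index on col, so the index alone can answer the query
explain plan for
select col, count(*)
from test_table
where col is not null
group by col;
select * from table(dbms_xplan.display);
```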
A bitmap index will do as well, because bitmap indexes also store NULL keys. First, the plan without a usable index:
Execution Plan
----------------------------------------------------------
Plan hash value: 2200191467
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 30 | 15983 (2)| 00:03:12 |
| 1 | HASH GROUP BY | | 10 | 30 | 15983 (2)| 00:03:12 |
| 2 | TABLE ACCESS FULL| TEST_TABLE | 1013K| 2968K| 15825 (1)| 00:03:10 |
---------------------------------------------------------------------------------
SQL> create bitmap index test_index on test_table(col);
Index created.
SQL> EXEC dbms_stats.gather_table_stats( 'MY_SCHEMA', 'TEST_TABLE' );
PL/SQL procedure successfully completed.
SQL> SELECT col, COUNT(*)
2 FROM test_table
3 GROUP BY col
4 /
Execution Plan
----------------------------------------------------------
Plan hash value: 238193838
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 30 | 286 (0)| 00:00:04 |
| 1 | SORT GROUP BY NOSORT | | 10 | 30 | 286 (0)| 00:00:04 |
| 2 | BITMAP CONVERSION COUNT| | 1010K| 2961K| 286 (0)| 00:00:04 |
| 3 | BITMAP INDEX FULL SCAN| TEST_INDEX | | | | |
---------------------------------------------------------------------------------------