Why is this query so slow?

I have tables FOO and BAR. FOO has a foreign key to BAR's PK.
When I execute the following query it takes several seconds.
select foo.name, foo.description, bar.quadrant from FOO, BAR
where FOO.BAR_ID = BAR.BAR_ID
Here is my explain plan:
OPERATION          OBJECT_NAME   OPTIONS   COST
SELECT STATEMENT                             39
  HASH JOIN                                  39
    TABLE ACCESS   BAR           FULL         2
    TABLE ACCESS   FOO           FULL        36
FOO has 6000 records in it and BAR only has 5. The BAR_ID column is a NUMBER.
This is running on Oracle 10g and it is taking ~3 seconds to complete. That seems extreme given how quickly it performs other queries.
EDIT: table definitions:
CREATE TABLE BAR
(
"BAR_ID" NUMBER NOT NULL,
"QUADRANT" VARCHAR2(100 BYTE) NOT NULL,
CONSTRAINT "BAR_PK" PRIMARY KEY ("BAR_ID")
)
CREATE TABLE FOO
(
"FOO_ID" NUMBER NOT NULL,
"BAR_ID" NUMBER NOT NULL,
"NAME" VARCHAR2(250 BYTE) NOT NULL,
"DESCRIPTION" VARCHAR2(250 BYTE),
CONSTRAINT "FOO_PK" PRIMARY KEY ("FOO_ID"),
CONSTRAINT "FOO__FK1" FOREIGN KEY ("BAR_ID") REFERENCES BAR ("BAR_ID") ENABLE
);

Are you sure you have good statistics? I created a test case from your DDL and saw this plan before statistics:
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4996 | 1619K| 10 (10)| 00:00:01 |
|* 1 | HASH JOIN | | 4996 | 1619K| 10 (10)| 00:00:01 |
| 2 | TABLE ACCESS FULL| BAR | 5 | 325 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| FOO | 4996 | 1302K| 6 (0)| 00:00:01 |
---------------------------------------------------------------------------
(If you get the dbms_xplan output you'll also see "dynamic sampling used for this statement").
After doing this:
SQL> begin dbms_stats.gather_table_stats(user,'FOO'); end;
2 /
PL/SQL procedure successfully completed.
SQL> c/FOO/BAR/
1* begin dbms_stats.gather_table_stats(user,'BAR'); end;
SQL> /
PL/SQL procedure successfully completed.
I see:
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4996 | 131K| 9 (12)| 00:00:01 |
| 1 | MERGE JOIN | | 4996 | 131K| 9 (12)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| BAR | 5 | 40 | 2 (0)| 00:00:01 |
| 3 | INDEX FULL SCAN | BAR_PK | 5 | | 1 (0)| 00:00:01 |
|* 4 | SORT JOIN | | 4996 | 94924 | 7 (15)| 00:00:01 |
| 5 | TABLE ACCESS FULL | FOO | 4996 | 94924 | 6 (0)| 00:00:01 |
---------------------------------------------------------------------------------------

There's a bucket load of instrumentation built into Oracle for investigating this sort of issue.
Start with this paper:
http://method-r.com/downloads/doc_download/10-for-developers-making-friends-with-the-oracle-database-cary-millsap

Get a TKPROF trace for your query to see what really happens - explain plan is just an estimate.
Basically, execute the ALTER SESSION SET SQL_TRACE = TRUE command before your query, execute the query, and then ALTER SESSION SET SQL_TRACE = FALSE. Then find the trace file produced in the location determined by the USER_DUMP_DEST parameter (look in the v$parameter view). Use the TKPROF utility to process the raw trace file into a more readable format, and examine the results (and post them here, too).
(See Using SQL Trace and TKPROF from Oracle.com for more information.)
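For example, here is a minimal sketch of the whole cycle (the trace file name below is hypothetical; yours will differ):
ALTER SESSION SET SQL_TRACE = TRUE;
select foo.name, foo.description, bar.quadrant from FOO, BAR
where FOO.BAR_ID = BAR.BAR_ID;
ALTER SESSION SET SQL_TRACE = FALSE;
-- locate the directory containing the raw trace file
select value from v$parameter where name = 'user_dump_dest';
Then, from the operating system shell, format the trace (the sort options order statements by parse/execute/fetch elapsed time):
tkprof orcl_ora_12345.trc query_trace.txt sort=prsela,exeela,fchela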

Does the table get frequent updates?
Is foo.description a huge CLOB?
Is network latency making it seem like the query is taking a long time?
Are these tables really complex views?
Were the tables once very large and have since had lots of data deleted?
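For the last point, a quick sanity check against the data dictionary (a sketch; user_segments reports the allocated size, which stays high after mass deletes):
select segment_name, round(bytes/1024/1024, 1) as mb, blocks
from user_segments
where segment_name in ('FOO', 'BAR');
A multi-megabyte segment for a 6000-row table would suggest a raised high-water mark that a full scan still has to read.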

From what I can remember, Oracle will see this as a simple join and ignore the indexes. The basic idea is that because you are not limiting the data in either table, just joining them together, it thinks a full table scan will work better. If the foo table has null in the bar_id column for several rows, then you may want to use an index hint.
As an example, if you run the query for a single bar_id, the explain plan will likely use the indexes as expected. Without that filter it will do a full scan on the bar table, because it is very small, and a full scan on the foo table, because you are not filtering out any values of bar_id.
One last note: make sure the statistics on the tables and indexes are up to date. This matters especially for a sparse index, since fresh statistics may let Oracle realize that using the index significantly changes the cost of the query.
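For reference, the index hint syntax would look like this - a sketch only, where FOO_BAR_IDX is a hypothetical index on FOO(BAR_ID):
select /*+ INDEX(foo foo_bar_idx) */ foo.name, foo.description, bar.quadrant
from foo, bar
where foo.bar_id = bar.bar_id;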

It is entirely reasonable to do a full table scan of the FOO table: it has 4996 rows, and you wrote a query that asks Oracle to "send all the FOO records along with their bar.quadrant".

Related

Extremely long time taken to execute the following query

I am running some queries to select data from my server. The query is:
SELECT
ROUND((SUM(clength)/1048576),2) AS logical_MB,
ROUND((SUM(plength) /1048576),2) AS physical_compr_MB,
ds_doc.archiveno,
ds_arch.archiveid
FROM ECR.ds_comp,
ECR.ds_doc,
ECR.ds_arch
WHERE ds_comp.docidno=ds_doc.docidno
AND ds_doc.archiveno =ds_arch.archiveno
GROUP BY ds_doc.archiveno,
ds_arch.archiveid;
The expected result is:
9708,24 9704,93 9 Vee3 0,009255342
13140,55 12682,93 10 Vf5 0,012095385
104533,94 89183,02 3 Mdf4 0,085051556
72346,34 48290,63 7 Sds2 0,046053534
But this query takes almost a day to run. Any ideas for optimizing it?
You provide close to none of the information required to help with a performance problem, so only a general checklist can be offered.
Check the Query
The query does not qualify the columns clength and plength, so please check whether they are defined in the table ds_comp - if not, maybe you do not need to join to this table at all...
Also, I assume that docidno is the primary key of ds_doc and archiveno is the PK of ds_arch. If not, your query will still work, but you will get a different result than you expect due to duplication caused by the join (this may also cause the excessive elapsed time)!
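To verify those assumed primary keys, a sketch against the data dictionary:
select table_name, constraint_name
from all_constraints
where owner = 'ECR'
and table_name in ('DS_DOC', 'DS_ARCH')
and constraint_type = 'P';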
Verify the Execution Plan
Produce the execution plan for your query in text form (to be able to post it) as follows
EXPLAIN PLAN SET STATEMENT_ID = '<sometag>' into plan_table FOR
... your query here ...
SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', '<sometag>','ALL'));
Remember that you are joining complete tables (not only a few rows for some ID), so if you see INDEX ACCESS or NESTED LOOP there is a problem that explains the long runtime.
You want to see only HASH JOIN and FULL TABLE SCAN in your plan.
Index Access
Contrary to some recommendations in other answers, if you want to profit from index definitions you do not need indexes on the join columns (as explained above). What you can do is cover all required attributes in indexes and answer the query from the indexes alone, omitting the table access entirely. This will help if the tables are wide, i.e. the row size is large.
This definition will be needed
create index ds_comp_idx1 on ds_comp (docidno,clength,plength);
create index ds_doc_idx1 on ds_doc (docidno,archiveno);
create index ds_arch_idx1 on ds_arch (archiveno,archiveid);
and you will receive this plan
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1119K| 97M| 908 (11)| 00:00:01 |
| 1 | HASH GROUP BY | | 1119K| 97M| 908 (11)| 00:00:01 |
|* 2 | HASH JOIN | | 1119K| 97M| 831 (3)| 00:00:01 |
|* 3 | HASH JOIN | | 1001 | 52052 | 5 (0)| 00:00:01 |
| 4 | INDEX FULL SCAN | DS_ARCH_IDX1 | 11 | 286 | 1 (0)| 00:00:01 |
| 5 | INDEX FAST FULL SCAN| DS_DOC_IDX1 | 1001 | 26026 | 4 (0)| 00:00:01 |
| 6 | INDEX FAST FULL SCAN | DS_COMP_IDX1 | 1119K| 41M| 818 (2)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("C"."DOCIDNO"="D"."DOCIDNO")
3 - access("D"."ARCHIVENO"="A"."ARCHIVENO")
Note the INDEX FULL SCAN and INDEX FAST FULL SCAN which means you are scanning the data from the index only and you do not need to perform the full table scan.
Use Parallel Option
With your rather simple query there are not many options left for improvement. What always works is to run a parallel query using the /*+ PARALLEL(N) */ hint.
The precondition is that your database is configured for this option and you have hardware that can support it.
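For example, applied to the posted query (a sketch; the degree of 4 is an arbitrary choice and should match your hardware):
SELECT /*+ PARALLEL(4) */
ROUND((SUM(clength)/1048576),2) AS logical_MB,
ROUND((SUM(plength) /1048576),2) AS physical_compr_MB,
ds_doc.archiveno,
ds_arch.archiveid
FROM ECR.ds_comp, ECR.ds_doc, ECR.ds_arch
WHERE ds_comp.docidno = ds_doc.docidno
AND ds_doc.archiveno = ds_arch.archiveno
GROUP BY ds_doc.archiveno, ds_arch.archiveid;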
Rewrite using explicit joins:
SELECT
ROUND((SUM(clength)/1048576),2) AS logical_MB,
ROUND((SUM(plength) /1048576),2) AS physical_compr_MB,
d.archiveno,
a.archiveid
FROM ECR.ds_comp c
INNER JOIN ECR.ds_doc d ON c.docidno=d.docidno
INNER JOIN ECR.ds_arch a ON d.archiveno=a.archiveno
GROUP BY d.archiveno,
a.archiveid;
Check that indexes exist on the join columns c.docidno, d.docidno, d.archiveno, a.archiveno.
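One way to check is against the data dictionary (a sketch; adjust the owner if needed):
select table_name, index_name, column_name
from all_ind_columns
where table_owner = 'ECR'
and table_name in ('DS_COMP', 'DS_DOC', 'DS_ARCH')
order by table_name, index_name, column_position;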

Group By not using index

There is a table which holds trades with a row count of 220 million; one of its columns is counterparty. The column is indexed. If I run a normal query like:
select *
from <table>
where counterparty = 'X'
The plan shows it uses the index. Whereas if I use group by on the same column, it doesn't use the index and does a table scan, i.e. for the query below:
select counterparty, count(*)
from <table>
group by counterparty
Could you please advise why it's not using the index for the group by? FYI - I have already gathered the db stats.
FYI - the plans for the first and second queries are shown below.
Note - we are migrating data from Sybase to Oracle. When I use the same group by in Sybase with the same indexes, the query uses the index, but not in Oracle.
First
Plan hash value: 350128866
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 2209 | 1469K| 914 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| FXCASHTRADE | 2209 | 1469K| 914 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | SCB_FXCASHTRADE_002 | 2209 | | 11 (0)| 00:00:01 |
Predicate Information (identified by operation id):
2 - access("COUNTERPARTY"='test')
Second
Plan hash value: 2920872612
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 100K| 2151K| | 6558K (1)| 00:00:38 |
| 1 | HASH GROUP BY | | 100K| 2151K| 6780M| 6558K (1)| 00:00:38 |
| 2 | TABLE ACCESS FULL| FXCASHTRADE | 221M| 4643M| | 6034K (1)| 00:00:35 |
I am going to make an educated guess and say that counterparty is defined as a nullable column. As such, Oracle can't solely rely on the index to generate the results of your group by query, since null values need to be included in the results, but (Oracle) indexes don't include null values. With that in mind, a full table scan makes sense.
If there is no good reason for counterparty to be nullable, go ahead and make it not null. The execution plan should then change to use the index as expected.
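A sketch of that change (the table name is taken from the posted plan; the DDL will fail if any existing row still holds a null):
alter table fxcashtrade modify (counterparty not null);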
Alternatively, if you can't make that change but you don't care about null values for this particular query, you can tweak the query to filter out null values explicitly. This should also result in a better execution plan.
select counterparty, count(*)
from tbl
where counterparty is not null -- add this filter
group by counterparty
Note: I'm no Sybase expert, but I assume that its indexes include null values, whereas Oracle indexes do not. That would explain the difference in execution plan between the two databases.

Oracle SQL execution plan changes due to SYS_OP_C2C internal conversion

I'm wondering why the cost of this query
select * from address a
left join name n on n.address_id=a.id
where a.street='01';
is higher than
select * from address a
left join name n on n.address_id=a.id
where a.street=N'01';
where address table looks like this
ID NUMBER
STREET VARCHAR2(255 CHAR)
POSTAL_CODE VARCHAR2(255 CHAR)
and name table looks like this
ID NUMBER
ADDRESS_ID NUMBER
NAME VARCHAR2(255 CHAR)
SURNAME VARCHAR2(255 CHAR)
These are costs returned by explain plan
Explain plan for '01'
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3591 | 1595K| 87 (0)| 00:00:02 |
| 1 | NESTED LOOPS OUTER | | 3591 | 1595K| 87 (0)| 00:00:02 |
|* 2 | TABLE ACCESS FULL | ADDRESS | 3 | 207 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| NAME | 1157 | 436K| 47 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | NAME_HSI | 1157 | | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("A"."STREET"='01')
4 - access("N"."ADDRESS_ID"(+)="A"."ID")
Explain plan for N'01'
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 347 | 154K| 50 (0)| 00:00:01 |
| 1 | NESTED LOOPS OUTER | | 347 | 154K| 50 (0)| 00:00:01 |
|* 2 | TABLE ACCESS FULL | ADDRESS | 1 | 69 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| NAME | 1157 | 436K| 47 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | NAME_HSI | 1157 | | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(SYS_OP_C2C("A"."STREET")=U'01')
4 - access("N"."ADDRESS_ID"(+)="A"."ID")
As you can see, the cost of the N'01' query is lower than the cost of '01'. Any idea why? N'01' additionally needs to convert varchar to nvarchar (SYS_OP_C2C()), so its cost should be higher. The other question is why the row estimate for the N'01' query is lower than for '01'.
[EDIT]
Table address has 30 rows.
Table name has 19669 rows.
SYS_OP_C2C is an internal function which does an implicit conversion of varchar2 to the national character set using the TO_NCHAR function. Thus the filter changes completely compared to the filter using a normal comparison.
I am not sure of the reason why the number of rows is lower, but I can guarantee it could be higher too. The cost estimation won't be affected.
Let's try to see step-by-step in a test case.
SQL> CREATE TABLE t AS SELECT 'a'||LEVEL col FROM dual CONNECT BY LEVEL < 1000;
Table created.
SQL>
SQL> EXPLAIN PLAN FOR SELECT * FROM t WHERE col = 'a10';
Explained.
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 5 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
1 - filter("COL"='a10')
13 rows selected.
SQL>
So far so good. Since there is only one row with the value 'a10', the optimizer estimated one row.
Let's see what happens with the national character set conversion.
SQL> EXPLAIN PLAN FOR SELECT * FROM t WHERE col = N'a10';
Explained.
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 50 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 10 | 50 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
1 - filter(SYS_OP_C2C("COL")=U'a10')
13 rows selected.
SQL>
What happened here? We can see filter(SYS_OP_C2C("COL")=U'a10'), which means an internal function is applied to convert the varchar2 value to nvarchar2. The filter estimate is now 10 rows.
This will also suppress any index usage, since a function is now applied to the column. We can tune it by creating a function-based index to avoid the full table scan.
SQL> create index nchar_indx on t(to_nchar(col));
Index created.
SQL>
SQL> EXPLAIN PLAN FOR SELECT * FROM t WHERE to_nchar(col) = N'a10';
Explained.
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 1400144832
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 50 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 10 | 50 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | NCHAR_INDX | 4 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
---------------------------------------------------
2 - access(SYS_OP_C2C("COL")=U'a10')
14 rows selected.
SQL>
However, will this make the execution plans similar? No. I think that with two different character sets the filter cannot be applied alike; that is where the difference lies.
My research says:
Usually, such scenarios occur when the data coming via an application is of nvarchar2 type, but the table column is varchar2. Thus, Oracle applies an internal function in the filter operation. My suggestion is to know your data well, so that you use matching data types during the design phase.
When worrying about explain plans, it matters whether there are current statistics on the tables. If the statistics do not represent the actual data reasonably well, then the optimizer will make mistakes and estimate cardinalities incorrectly.
You can check how long ago statistics were gathered by querying the data dictionary:
select table_name, last_analyzed
from user_tables
where table_name in ('ADDRESS','NAME');
You can gather statistics for the optimizer to use by calling DBMS_STATS:
begin
dbms_stats.gather_table_stats(user, 'ADDRESS');
dbms_stats.gather_table_stats(user, 'NAME');
end;
So perhaps after gathering statistics you will get different explain plans. Perhaps not.
The difference in your explain plans is primarily because the optimizer estimates how many rows it will find in address table differently in the two cases.
In the first case you have an equality predicate with same datatype - this is good and the optimizer can often estimate cardinality (row count) reasonably well for cases like this.
In the second case a function is applied to the column - this is often bad (unless you have function-based indexes) and will force the optimizer to take a wild guess. That wild guess will differ between versions of Oracle as the developers of the optimizer try to improve upon it. In some versions the wild guess will simply be something like "assume 5% of the number of rows in the table."
When comparing different datatypes, it is best to avoid implicit conversions, particularly when, as in this case, the implicit conversion puts a function on the column rather than on the literal. If you have cases where you get a value of datatype NVARCHAR2 and need to use it in a predicate like the above, it can be a good idea to explicitly convert the value to the datatype of the column.
select * from address a
left join name n on n.address_id=a.id
where a.street = CAST( N'01' AS VARCHAR2(255));
In this case, with a literal, it does not make sense, of course - here you would just use your first query. But if it were a variable or function parameter, there could be use cases for doing something like this.
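For instance, a hypothetical sketch where the value arrives as an NVARCHAR2 parameter (the function name is made up for illustration):
create or replace function addresses_on_street(p_street in nvarchar2)
return number
is
v_count number;
begin
-- cast the parameter once, so the conversion lands on the value
-- rather than wrapping a.street in SYS_OP_C2C
select count(*)
into v_count
from address a
where a.street = cast(p_street as varchar2(255 char));
return v_count;
end;
/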
As I can see, the first query is estimated to return 3591 rows and the second one 347 rows, so Oracle expects fewer I/O operations; that is why the cost is lower.
Don't be confused by the fact that N'01' additionally needs to convert varchar to nvarchar: Oracle does one hard parse and then soft parses the same queries, so the longer your Oracle instance runs, the faster it becomes.

Distinct values on indexed column

I have a table with 115 M rows. One of the columns is indexed (the index is called "my_index" in the explain plan below) and not nullable. Moreover, this column has just one distinct value so far.
When I do
select distinct my_col from my_table;
it takes 230 seconds, which is very long. Here is the explain plan.
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 1 | 3 | 22064 (90)| 00:03:23 |
| 1 | SORT UNIQUE NOSORT| | 1 | 3 | 22064 (90)| 00:03:23 |
| 2 | INDEX FULL SCAN | my_index | 115M| 331M| 2363 (2)| 00:00:22 |
Since the column has just one distinct value, why does it take so long? Why does Oracle not just check the index entries and quickly find that there is just one possible value for this column? In the explain plan above, the index scan seems to take 22 s, but what is this "SORT UNIQUE NOSORT" which takes ages?
Thank you in advance for your help
Re-analyse the table:
EXEC dbms_stats.gather_table_stats('owner','table_name',cascade=>true,method_opt=>'FOR ALL INDEXED COLUMNS SIZE AUTO');
Change Index Type
One distinct value out of 115M rows?! That is what is called low cardinality, which is not good for a "normal" B-tree index. Consider a bitmap index (if you currently have a B-tree one).
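A sketch of that change, using the names from the question (note that bitmap indexes can cause severe lock contention on tables with concurrent DML, so verify your write pattern first):
drop index my_index;
create bitmap index my_index on my_table (my_col);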
Reconstructing Query
If you are sure that no new values will be added to this column, then remove the distinct clause and instead use the rownum = 1 query that Abhijith suggested.
SORT UNIQUE NOSORT is not taking too long. You are looking at the estimates from a bad execution plan that is probably the result of unreasonable optimizer parameters. For example, setting the parameter OPTIMIZER_INDEX_COST_ADJ to 1 instead of the default 100 can produce a similar plan. Most likely your query runs slowly because your database is busy or just slow.
What's wrong with the posted execution plan?
The posted execution plan seems unreasonable. Retrieving data should take much longer than simply throwing out duplicates. And the consumer operation, SORT UNIQUE NOSORT, can start at almost the same time as the producer operation, INDEX FULL SCAN. Normally they should finish at almost the same time. The execution plan in the question shows the optimizer's estimates, not actual times. An active report for a very similar query showed the actual timelines: all steps started and stopped at almost the same time.
Sample setup with reasonable plan
Below is a very similar setup, but with a very plain configuration. Same number of rows read (115 million) and returned (1), and almost the exact same segment size (329MB vs 331 MB). The plan shows almost all of the time being spent on the INDEX FULL SCAN.
drop table test1 purge;
create table test1(a number not null, b number, c number) nologging;
begin
for i in 1 .. 115 loop
insert /*+ append */ into test1 select 1, level, level
from dual connect by level <= 1000000;
commit;
end loop;
end;
/
create index test1_idx on test1(a);
begin
dbms_stats.gather_table_stats(user, 'TEST1');
end;
/
explain plan for select /*+ index(test1) */ distinct a from test1;
select * from table(dbms_xplan.display);
Plan hash value: 77032494
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 244K (4)| 00:48:50 |
| 1 | SORT UNIQUE NOSORT| | 1 | 3 | 244K (4)| 00:48:50 |
| 2 | INDEX FULL SCAN | TEST1_IDX | 115M| 329M| 237K (1)| 00:47:30 |
--------------------------------------------------------------------------------
Re-creating a bad plan
--Set optimizer_index_cost_adj to a ridiculously low value.
--This changes the INDEX FULL SCAN estimate from 47 minutes to 29 seconds.
alter session set optimizer_index_cost_adj = 1;
--Changing the CPUSPEEDNW to 800 will exactly re-create the time estimate
--for SORT UNIQUE NOSORT. This value is not ridiculous, and it is not
--something you should normally change. But it does imply your CPUs are
--slow. My 2+ year-old desktop had an original score of 1720.
begin
dbms_stats.set_system_stats( 'CPUSPEEDNW', 800);
end;
/
explain plan for select /*+ index(test1) */ distinct a from test1;
select * from table(dbms_xplan.display);
Plan hash value: 77032494
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 16842 (86)| 00:03:23 |
| 1 | SORT UNIQUE NOSORT| | 1 | 3 | 16842 (86)| 00:03:23 |
| 2 | INDEX FULL SCAN | TEST1_IDX | 115M| 329M| 2389 (2)| 00:00:29 |
--------------------------------------------------------------------------------
How to investigate
Check the parameters.
select name, value from v$parameter where name like 'optimizer_index%';
NAME VALUE
---- -----
optimizer_index_cost_adj 1
optimizer_index_caching 0
Also check the system statistics.
select * from sys.aux_stats$;
+---------------+------------+-------+------------------+
| SNAME | PNAME | PVAL1 | PVAL2 |
+---------------+------------+-------+------------------+
| SYSSTATS_INFO | STATUS | | COMPLETED |
| SYSSTATS_INFO | DSTART | | 09-23-2013 17:52 |
| SYSSTATS_INFO | DSTOP | | 09-23-2013 17:52 |
| SYSSTATS_INFO | FLAGS | 1 | |
| SYSSTATS_MAIN | CPUSPEEDNW | 800 | |
| SYSSTATS_MAIN | IOSEEKTIM | 10 | |
| SYSSTATS_MAIN | IOTFRSPEED | 4096 | |
| SYSSTATS_MAIN | SREADTIM | | |
| SYSSTATS_MAIN | MREADTIM | | |
| SYSSTATS_MAIN | CPUSPEED | | |
| SYSSTATS_MAIN | MBRC | | |
| SYSSTATS_MAIN | MAXTHR | | |
| SYSSTATS_MAIN | SLAVETHR | | |
+---------------+------------+-------+------------------+
To find out where the time is really spent, use a tool like the active report.
select dbms_sqltune.report_sql_monitor(sql_id => '5s63uf4au6hcm',
type => 'active') from dual;
If there are only a few distinct values of the column, try a compressed index:
create index my_index on my_table (my_col) compress;
This will store each distinct value of the column only once, hopefully reducing the execution time of your query.
As a bonus: use this to see the actual plan used for a query:
select /*+ gather_plan_statistics */ distinct my_col from my_table;
SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR);
The gather_plan_statistics hint will collect more data (it will take longer to execute), but it works without it too. See the documentation of DBMS_XPLAN.DISPLAY_CURSOR for more details.
Look at the explain plan carefully:
It scans the whole index to find what you are trying to fetch.
Then it applies the distinct operation to retrieve the unique values. Even though you say there is only one unique value, Oracle has to scan the whole index to get the values; it does not know in advance that there is only one distinct value in the index. You can restrict with rownum = 1 to get a quick answer.
Try this to get the quick answer
select my_col from my_table where rownum = 1;
It is highly unfavourable to add an index on a column with so little value distribution. This is bad for the table and for the application overall; it just does not make sense.

SQL Query going for Full Table scan instead of Index Based Scan

I have two tables:
create table big( id number, name varchar2(100));
insert into big(id, name) select rownum, object_name from all_objects;
create table small as select id from big where rownum < 10;
create index big_index on big(id);
On these tables if I execute the following query:
select *
from big
where id like '45%'
or id in ( select id from small );
it always goes for a Full Table Scan.
Execution Plan
----------------------------------------------------------
Plan hash value: 2290496975
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3737 | 97162 | 85 (3)| 00:00:02 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL| BIG | 74718 | 1897K| 85 (3)| 00:00:02 |
|* 3 | TABLE ACCESS FULL| SMALL | 1 | 4 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("ID"=45 OR EXISTS (SELECT /*+ */ 0 FROM "SMALL" "SMALL"
WHERE "ID"=:B1))
3 - filter("ID"=:B1)
Are there any ways to rewrite the query so that it uses an index scan?
No, no and no.
You do NOT want it to use an index. Luckily Oracle is smarter than that.
ID is numeric. While it might have ID values of 45, 450, 451, 452, 4501, 45004, 4500003, etc., in the index these values will be scattered all over the place. If you went with a condition such as ID BETWEEN 450 AND 459, then it might be worth using the index.
To use the index it would have to scan it all the way from top to bottom (converting each ID to a character to do the LIKE comparison). Then, for any match, it has to go off to get the NAME column.
It has decided that it is easier and quicker to scan the table (which, at 75,000 rows, isn't that big anyway) rather than mucking about going back and forth between the index and the table.
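For contrast, a range predicate on the numeric column would let the index be used sensibly (a sketch against the test table from the question):
select * from big where id between 450 and 459;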
The others are right: you shouldn't use a numeric column like that.
However, it is actually the OR <subquery> construct that is causing the performance problem in this case. I don't know if it is different in version 11, but up to version 10gR2 it causes a FILTER operation with what is basically a nested loop running a correlated subquery. In your case, the use of a numeric column as a varchar also results in a full table scan.
You can rewrite your query like this:
select *
from big
where id like '45%'
union all
select *
from big
join small using(id)
where id not like '45%';
With your test case, I end up with a row count of 174,000 rows in big and 9 in small.
Running your query takes 7 seconds with 1,211,399 consistent gets.
Running my query takes 0.7 seconds and uses 542 consistent gets.
The explain plan for my query is:
--------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)|
---------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8604 | 154 (6)|
| 1 | UNION-ALL | | | |
|* 2 | TABLE ACCESS FULL | BIG | 8603 | 151 (4)|
| 3 | NESTED LOOPS | | 1 | 3 (0)|
|* 4 | TABLE ACCESS FULL | SMALL | 1 | 3 (0)|
| 5 | TABLE ACCESS BY INDEX ROWID| BIG | 1 | 0 (0)|
|* 6 | INDEX UNIQUE SCAN | BIG_PK | 1 | 0 (0)|
---------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(TO_CHAR("ID") LIKE '45%')
4 - filter(TO_CHAR("SMALL"."ID") NOT LIKE '45%')
6 - access("BIG"."ID"="SMALL"."ID")
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
542 consistent gets
0 physical reads
0 redo size
33476 bytes sent via SQL*Net to client
753 bytes received via SQL*Net from client
76 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1120 rows processed
Something like this might work:
select *
from big
where id like '45%'
or exists ( select id from small where id = big.id );