Need help in understanding query execution time - sql

I previously posted a question about joining two tables based on certain criteria over here
How to join two tables based on a timestamp (with variance of a few seconds)? (link doesn't have to be read)
I found that after creating indexes it works really fast. The snippet of my current code is
CREATE INDEX INDEXNAME1 ON TABLEA (CALL_DATE+5/86400);
CREATE INDEX INDEXNAME2 ON TABLEA (CALL_DATE+6/86400);
CREATE INDEX INDEXNAME3 ON TABLEB (NUMBER1,NUMBER2);
CREATE INDEX INDEXNAME4 ON TABLEA (NUMBER1,NUMBER2);
----
INSERT INTO AB_RECON (
SELECT A.*,B.* FROM TABLEB B FULL OUTER JOIN TABLEA A
ON B.NUMBER1=A.NUMBER1 AND B.NUMBER2=A.NUMBER2 AND
B.CALL_DATE-A.CALL_DATE IN (5/86400,6/86400);
----
DROP INDEX INDEXNAME1;
DROP INDEX INDEXNAME2;
DROP INDEX INDEXNAME3;
DROP INDEX INDEXNAME4;
Don't bother about the correctness of the code, it works. But the problem I'm facing is that the execution time is quite random. 90% of the time, the execution time is really quick (2-5 minutes) but sometimes (like right now its running for more than 20 minutes).
I know it might seem like "depends on the size of the tables" but on an average
TABLEA has 1.4 million records, and TABLEB has .9 million records. + or - a few ten thousands, not more.
I've run the following code (ran as SYS) to identify the queries currently running on the database along with elapsed time
select sess.sid, sess.serial#, sess.sql_id, sess.last_call_et as
EXECUTION_TIME,sq.sql_text from v$session sess,v$sql sq
where status = 'ACTIVE' and last_call_et > sysdate - (sysdate - (3/86400))
and username is not null and sess.sql_id=sq.sql_id;
And I get the following output
SID || SERIAL# || SQL_ID || EXECUTION_TIME || SQL_TEXT
246 || 51291 || dxa2sz103vt0g || 1256 || <my recon query pasted above>
I don't understand what's taking it so long cause from the looks of it, its the only active query. I'm not a DBA so I don't fully understand if there's something I'm missing.
Would appreciate it if some light could be shed into possible reasons/ solutions so that I can be point myself in the correct direction.
Additional information if required
Explain Plan
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 2386K| 530M| | 2395M (1)|999:59:59 |
| 1 | LOAD TABLE CONVENTIONAL | AB_RECON | | | | | |
| 2 | VIEW | | 2386K| 530M| | 2395M (1)|999:59:59 |
| 3 | UNION-ALL | | | | | | |
|* 4 | HASH JOIN RIGHT OUTER| | 1417K| 109M| 49M| 10143 (1)| 00:02:02 |
| 5 | TABLE ACCESS FULL | TABLEA | 968K| 38M| | 1753 (1)| 00:00:22 |
| 6 | TABLE ACCESS FULL | TABLEB | 1417K| 52M| | 2479 (1)| 00:00:30 |
|* 7 | FILTER | | | | | | |
| 8 | TABLE ACCESS FULL | TABLEA | 968K| 38M| | 1754 (1)| 00:00:22 |
|* 9 | TABLE ACCESS FULL | TABLEB | 1 | 29 | | 2479 (1)| 00:00:30 |
Oracle Edition
Oracle Database 11g Enterprise Edition 11.2.0.3.0 64-bit Production

Ok, I didn't really find an answer to my question but I did find a work around.
I've broken down what I wanted to do (basically a reconciliation between two sources of information) into three parts.
Matched in both source A and B
Missing in source B
Missing in source A
The queries I've used is given below. Overall it runs much faster. Will have to monitor its performance over several executions.
INSERT INTO AB_RECON (
SELECT M.*,I.* FROM TABLEA M, TABLEB I
WHERE M.ANUMBER=I.ANUMBER AND M.BNUMBER=I.BNUMBER
AND M.CALL_DATE-I.CALL_DATE B (5/86400,6/86400));
COMMIT;
INSERT INTO AB_RECON
(SELECT ANUMBER,BNUMBER,CALL_DATE,CALL_DURATION,REF_NO,NULL,NULL,NULL,NULL,NULL FROM
(SELECT * FROM TABLEA M WHERE NOT EXISTS
(SELECT ANUMBER,BNUMBER,CALL_DATE,CALL_DURATION,REF_NO FROM AB_RECON I
WHERE M.ANUMBER=I.ANUMBER AND M.BNUMBER=I.BNUMBER AND M.CALL_DATE=I.CALL_DATE
)
)
);
COMMIT;
INSERT INTO AB_RECON
(SELECT NULL,NULL,NULL,NULL,NULL,ANUMBER,BNUMBER,CALL_DATE,CALL_DURATION,REF_NO FROM
(SELECT * FROM TABLEB M WHERE NOT EXISTS
(SELECT ANUMBER,BNUMBER,CALL_DATE,CALL_DURATION,REF_NO FROM AB_RECON I
WHERE M.ANUMBER=I.ANUMBER AND M.BNUMBER=I.BNUMBER AND M.CALL_DATE=I.CALL_DATE
)
)
);
Honestly, I have no idea about the theory behind why this works faster. So my main question remains unresolved.

Related

Query Optimization - subselect in Left Join

I'm working on optimizing a sql query, and I found a particular line that appears to be killing my queries performance:
LEFT JOIN anothertable lastweek
AND lastweek.date>= (SELECT MAX(table.date)-7 max_date_lweek
FROM table table
WHERE table.id= lastweek.id)
AND lastweek.date< (SELECT MAX(table.date) max_date_lweek
FROM table table
WHERE table.id= lastweek.id)
I'm working on a way of optimizing these lines, but I'm stumped. If anyone has any ideas, please let me know!
-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1908654 | 145057704 | 720461 | 00:00:29 |
| * 1 | HASH JOIN RIGHT OUTER | | 1908654 | 145057704 | 720461 | 00:00:29 |
| 2 | VIEW | VW_DCL_880D8DA3 | 427487 | 7694766 | 716616 | 00:00:28 |
| * 3 | HASH JOIN | | 427487 | 39328804 | 716616 | 00:00:28 |
| 4 | VIEW | VW_SQ_2 | 7174144 | 193701888 | 278845 | 00:00:11 |
| 5 | HASH GROUP BY | | 7174144 | 294139904 | 278845 | 00:00:11 |
| 6 | TABLE ACCESS STORAGE FULL | TASK | 170994691 | 7010782331 | 65987 | 00:00:03 |
| * 7 | HASH JOIN | | 8549735 | 555732775 | 429294 | 00:00:17 |
| 8 | VIEW | VW_SQ_1 | 7174144 | 172179456 | 278845 | 00:00:11 |
| 9 | HASH GROUP BY | | 7174144 | 294139904 | 278845 | 00:00:11 |
| 10 | TABLE ACCESS STORAGE FULL | TASK | 170994691 | 7010782331 | 65987 | 00:00:03 |
| 11 | TABLE ACCESS STORAGE FULL | TASK | 170994691 | 7010782331 | 65987 | 00:00:03 |
| * 12 | TABLE ACCESS STORAGE FULL | TASK | 1908654 | 110701932 | 2520 | 00:00:01 |
-----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 1 - access("SYS_ID"(+)="TASK"."PARENT")
* 3 - access("ITEM_2"="TASK_LWEEK"."SYS_ID")
* 3 - filter("TASK_LWEEK"."SNAPSHOT_DATE"<"MAX_DATE_LWEEK")
* 7 - access("ITEM_1"="TASK_LWEEK"."SYS_ID")
* 7 - filter("TASK_LWEEK"."SNAPSHOT_DATE">=INTERNAL_FUNCTION("MAX_DATE_LWEEK"))
* 12 - storage("TASK"."CLOSED_AT" IS NULL OR "TASK"."CLOSED_AT">=TRUNC(SYSDATE#!)-15)
* 12 - filter("TASK"."CLOSED_AT" IS NULL OR "TASK"."CLOSED_AT">=TRUNC(SYSDATE#!)-15)
Well, you are not even showing the select. As I can see that the select is done over Exadata ( Table Access Storage Full ) , perhaps you need to ask yourself why do you need to make 4 access to the same table.
You access fourth times ( lines 6, 10, 11, 12 ) to the main table TASK with 170994691 rows ( based on estimation of the CBO ). I don't know whether the statistics are up-to-date or it is optimizing sampling kick in due to lack of good statistics.
A solution could be use WITH for generating intermediate results that you need several times in your outline query
with my_set as
(SELECT MAX(table.date)-7 max_date_lweek ,
max(table.date) as max_date,
id from FROM table )
select
.......................
from ...
left join anothertable lastweek on ( ........ )
left join myset on ( anothertable.id = myset.id )
where
lastweek.date >= myset.max_date_lweek
and
lastweek.date < myset.max_date
Please, take in account that you did not provide the query, so I am guessing a lot of things.
Since complete information is not available I will suggest:
You are using the same query twice then why not use CTE such as
with CTE_example as (SELECT MAX(table.date), max_date_lweek, ID
FROM table table)
Looking at your explain plan, the only table being accessed is TASK. From that, I infer that the tables in your example: ANOTHERTABLE and TABLE are actually the same table and that, therefore, you are trying to get the last week of data that exists in that table for each id value.
If all that is true, it should be much faster to use an analytic function to get the max date value for each id and then limit based on that.
Here is an example of what I mean. Note I use "dte" instead of "date", to remove confusion with the reserved word "date".
LEFT JOIN ( SELECT lastweek.*,
max(dte) OVER ( PARTITION BY id ) max_date
FROM anothertable lastweek ) lastweek
ON 1=1 -- whatever other join conditions you have, seemingly omitted from your post
AND lastweek.dte >= lastweek.max_date - 7;
Again, this only works if I am correct in thinking that table and anothertable are actually the same table.

Trying to optimize a *random* query in Oracle SQL

I need to optimize a procedure in Oracle SQL, mainly using indexes. This is the statement:
CREATE OR REPLACE PROCEDURE DEL_OBS(cuantos number) IS
begin
FOR I IN (SELECT * FROM (SELECT * FROM observations ORDER BY DBMS_RANDOM.VALUE)WHERE ROWNUM<=cuantos)
LOOP
DELETE FROM OBSERVATIONS WHERE nplate=i.nplate AND odatetime=i.odatetime;
END LOOP;
end del_obs;
My plan was to create an index related with rownum since it is what appears to be used to do the deletes. But I don't know if it is going to be worthy. The problem with this procedure is that its randomness causes a lot of consistent gets. Can anyone help me with this?? Thanks :)
Note: I cannot change the code, only make improvements afterwards
Use the ROWID pseudo-column to filter the columns:
CREATE OR REPLACE PROCEDURE DEL_OBS(
cuantos number
)
IS
BEGIN
DELETE FROM OBSERVATIONS
WHERE ROWID IN (
SELECT rid
FROM (
SELECT ROWID AS rid
FROM observations
ORDER BY DBMS_RANDOM.VALUE
)
WHERE ROWNUM < cuantos
);
END del_obs;
If you have an index on the table then it can use a index fast full scan:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE table_name ( id ) AS
SELECT LEVEL FROM DUAL CONNECT BY LEVEL <= 50000;
Query 1: No Index:
DELETE FROM table_name
WHERE ROWID IN (
SELECT rid
FROM (
SELECT ROWID AS rid
FROM table_name
ORDER BY DBMS_RANDOM.VALUE
)
WHERE ROWNUM <= 10000
)
Execution Plan:
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
----------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 1 | 24 | 123 | 00:00:02 |
| 1 | DELETE | TABLE_NAME | | | | |
| 2 | NESTED LOOPS | | 1 | 24 | 123 | 00:00:02 |
| 3 | VIEW | VW_NSO_1 | 10000 | 120000 | 121 | 00:00:02 |
| 4 | SORT UNIQUE | | 1 | 120000 | | |
| * 5 | COUNT STOPKEY | | | | | |
| 6 | VIEW | | 19974 | 239688 | 121 | 00:00:02 |
| * 7 | SORT ORDER BY STOPKEY | | 19974 | 239688 | 121 | 00:00:02 |
| 8 | TABLE ACCESS FULL | TABLE_NAME | 19974 | 239688 | 25 | 00:00:01 |
| 9 | TABLE ACCESS BY USER ROWID | TABLE_NAME | 1 | 12 | 1 | 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 5 - filter(ROWNUM<=10000)
* 7 - filter(ROWNUM<=10000)
Query 2 Add an index:
ALTER TABLE table_name ADD CONSTRAINT tn__id__pk PRIMARY KEY ( id )
Query 3 With the index:
DELETE FROM table_name
WHERE ROWID IN (
SELECT rid
FROM (
SELECT ROWID AS rid
FROM table_name
ORDER BY DBMS_RANDOM.VALUE
)
WHERE ROWNUM <= 10000
)
Execution Plan:
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
---------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 1 | 37 | 13 | 00:00:01 |
| 1 | DELETE | TABLE_NAME | | | | |
| 2 | NESTED LOOPS | | 1 | 37 | 13 | 00:00:01 |
| 3 | VIEW | VW_NSO_1 | 9968 | 119616 | 11 | 00:00:01 |
| 4 | SORT UNIQUE | | 1 | 119616 | | |
| * 5 | COUNT STOPKEY | | | | | |
| 6 | VIEW | | 9968 | 119616 | 11 | 00:00:01 |
| * 7 | SORT ORDER BY STOPKEY | | 9968 | 119616 | 11 | 00:00:01 |
| 8 | INDEX FAST FULL SCAN | TN__ID__PK | 9968 | 119616 | 9 | 00:00:01 |
| 9 | TABLE ACCESS BY USER ROWID | TABLE_NAME | 1 | 25 | 1 | 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 5 - filter(ROWNUM<=10000)
* 7 - filter(ROWNUM<=10000)
If you cannot do it in single SQL statement using ROWID then you can rewrite your existing procedure to use exactly the same queries but use the FORALL statement:
CREATE OR REPLACE PROCEDURE DEL_OBS(cuantos number)
IS
TYPE obs_tab IS TABLE OF observations%ROWTYPE;
begin
SELECT *
BULK COLLECT INTO obs_tab
FROM (
SELECT * FROM observations ORDER BY DBMS_RANDOM.VALUE
)
WHERE ROWNUM<=cuantos;
FORALL i IN 1 .. obs_tab.COUNT
DELETE FROM OBSERVATIONS
WHERE nplate = obs_tab(i).nplate
AND odatetime = obs_tab(i).odatetime;
END del_obs;
What you definitively need is an index on OBSERVATIONS to allow the DELETEwith an index access.
CREATE INDEX cuantos ON OBSERVATIONS(nplate, odatetime);
The execution of the procedure will lead to one FULL TABLE SCANot the OBSERVATIONS table and to one INDEX ACCESS for each deleted record.
For a limited number deleted recrods it will behave similar as the set DELETEproposed in other answer; for larger number of deleted records the elapsed time will linerary scale with the number of deletes.
For a non-trival number of deleted records you must assume that the index is not completely in the buffer pool and lots of disc access will be requried. So you'll end with approximately 100 deleted rows per second.
In other words to delete 100K rows it will take ca. 1/4 hour.
To delete 1M rows you need 2 3/4 of an hour.
You see while deleting in this scale the first part of the task - the FULL SCAN of your table is neglectable, it will take few minutes only. The only possibility to get acceptable response time in this case is to switch the logic to a single DELETEstatement as proposed in other answers.
This behavior is also called the rule: "Row by Row is Slow by Slow" (i.e. processing in a loop works fine, but only with a limited number of records).
You can do this using a single delete statement:
delete from observations o
where (o.nplate, o.odatetime) in (select nplace, odatetime
from (select o2.nplate, o2.odatetime
from observations o2
order by DBMS_RANDOM.VALUE
) o2
where rownum <= v_cuantos
);
This is often faster than executing multiple queries for each row being deleted.
Try this. test on MSSQL hopes so it will work also on Oracle. please remarks the status.
CREATE OR REPLACE PROCEDURE DEL_OBS(cuantos number) IS
begin
DELETE OBSERVATIONS FROM OBSERVATIONS
join (select * from OBSERVATIONS ORDER BY VALUE ) as i on
nplate=i.nplate AND
odatetime=i.odatetime AND
i.ROWNUM<=cuantos;
End DEL_OBS;
Since you say that nplate and odatetime are the primary key of observations, then I am guessing the problem is here:
SELECT * FROM (
SELECT *
FROM observations
ORDER BY DBMS_RANDOM.VALUE)
WHERE ROWNUM<=cuantos;
There is no way to prevent that from performing a full scan of observations, plus a lot of sorting if that's a big table.
You need to change the code that runs. By far, the easiest way to change the code is to change the source code and recompile it.
However, there are ways to change the code that executes without changing the source code. Here are two:
(1) Use DBMS_FGAC to add a policy that detects whether you are in this procedure and, if so, add a predicate to the observations table like this:
AND rowid IN
( SELECT obs_sample.rowid
FROM observations sample (0.05) obs_sample)
(2) Use DBMS_ADVANCED_REWRITE to rewrite your query changing:
FROM observations
.. to ..
FROM observations SAMPLE (0.05)
Using the text of your query in the re-write policy should prevent it from affecting other queries against the observations table.
Neither of these are easy (at all), but can be worth a try if you are really stuck.

Want to process 5000 records from the select query is taking long time in oracle database

Each time i want to process 5000 records like below.
First time i want to process records from 1 to 5000 rows.
second time i want to process records from 5001 to 10000 rows.
third time i want to process records from 10001 to 15001 rows like wise
I dont want to go for procedure or PL/SQL. I will change the rnum values in my code to fetch the 5000 records.
The given query is taking 3 minutes to fetch the records from 3 joined tables. How can i reduced the time to fetch the records.
select * from (
SELECT to_number(AA.MARK_ID) as MARK_ID, AA.SUPP_ID as supplier_id, CC.supp_nm as SUPPLIER_NAME, CC.supp_typ as supplier_type,
CC.supp_lock_typ as supplier_lock_type, ROW_NUMBER() OVER (ORDER BY AA.MARK_ID) as rnum
from TABLE_A AA, TABLE_B BB, TABLE_C CC
WHERE
AA.MARK_ID=BB.MARK_ID AND
AA.SUPP_ID=CC.location_id AND
AA.char_id='160' AND
BB.VALUE_KEY=AA.VALUE_KEY AND
BB.VALUE_KEY=CC.VALUE_KEY
AND AA.VPR_ID IS NOT NULL)
where rnum >=10001 and rnum<=15000;
I have tried below scenario but no luck.
I have tried the /*+ USE_NL(AA BB) */ hints.
I used exists in the where conditions. but its taking the same 3 minutes to fetch the records.
Below is the table details.
select count(*) from TABLE_B;
-----------------
2275
select count(*) from TABLE_A;
-----------------
2405276
select count(*) from TABLE_C;
-----------------
1269767
Result of my inner query total records is
SELECT count(*)
from TABLE_A AA, TABLE_B BB, TABLE_C CC
WHERE
AA.MARK_ID=BB.MARK_ID AND
AA.SUPP_ID=CC.location_id AND
AA.char_id='160' AND
BB.VALUE_KEY=AA.VALUE_KEY AND
BB.VALUE_KEY=CC.VALUE_KEY
AND AA.VPR_ID IS NOT NULL;
-----------------
2027055
All the used columns in where conditions are indexed properly.
Explain Table for the given query is...
Plan hash value: 3726328503
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2082K| 182M| | 85175 (1)| 00:17:03 |
|* 1 | VIEW | | 2082K| 182M| | 85175 (1)| 00:17:03 |
|* 2 | WINDOW SORT PUSHED RANK | | 2082K| 166M| 200M| 85175 (1)| 00:17:03 |
|* 3 | HASH JOIN | | 2082K| 166M| | 44550 (1)| 00:08:55 |
| 4 | TABLE ACCESS FULL | TABLE_C | 1640 | 49200 | | 22 (0)| 00:00:01 |
|* 5 | HASH JOIN | | 2082K| 107M| 27M| 44516 (1)| 00:08:55 |
|* 6 | VIEW | index$_join$_005 | 1274K| 13M| | 9790 (1)| 00:01:58 |
|* 7 | HASH JOIN | | | | | | |
| 8 | INLIST ITERATOR | | | | | | |
|* 9 | INDEX RANGE SCAN | TABLE_B_IN2 | 1274K| 13M| | 2371 (2)| 00:00:29 |
| 10 | INDEX FAST FULL SCAN| TABLE_B_IU1 | 1274K| 13M| | 4801 (1)| 00:00:58 |
|* 11 | TABLE ACCESS FULL | TABLE_A | 2356K| 96M| | 27174 (1)| 00:05:27 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RNUM">=10001 AND "RNUM"<=15000)
2 - filter(ROW_NUMBER() OVER ( ORDER BY "A"."MARK_ID")<=15000)
3 - access("A"."SUPP_ID"="C"."LOC_ID" AND "A"."VALUE_KEY"="C"."VALUE_KEY")
5 - access("A"."MARK_ID"="A"."MARK_ID" AND "A"."VALUE_KEY"="A"."VALUE_KEY")
6 - filter("A"."MARK_CHN_IND"='C' OR "A"."MARK_CHN_IND"='D')
7 - access(ROWID=ROWID)
9 - access("A"."MARK_CHN_IND"='C' OR "A"."MARK_CHN_IND"='D')
11 - filter("A"."CHNL_ID"=160 AND "A"."VPR_ID" IS NOT NULL)
Could you please anyone help me on this to tune this query as i am trying from last 2 days?
Each query will take a long time because each query will have to join then sort all rows. The row_number analytic function can only return a result if the whole set has been read. This is highly inefficient. If the data set is large, you only want to sort and hash-join once.
You should fetch the whole set once, using batches of 5k rows. Alternatively, if you want to keep your existing code logic, you could store the result in a temporary table, for instance:
CREATE TABLE TMP AS <your above query>
CREATE INDEX ON TMP (rnum)
And then replace your query in your code by
SELECT * FROM TMP WHERE rnum BETWEEN :x AND :y
Obviously if your temp table is being reused periodically, just create it once and delete when done (or use a true temporary table).
How many unique MARK_ID values have you got in TABLE_A? I think you may get better performance if you limit the fetched ranges of records by MARK_ID instead of the artificial row number, because the latter is obviously not sargeable. Granted, you may not get exactly 5000 rows in each range but I have a feeling it's not as important as the query performance.
Firstly, giving obfuscated table names makes it nearly impossible to deduce anything about the data distributions and relationships between tables, so potential answerers are crippled from the start.
However, if every row in table_a matches one row in the other tables then you can avoid some of the usage of 200Mb of temporary disk space that is probably crippling performance by pushing the ranking down into an inline view or common table expression.
Monitor V$SQL_WORKAREA to check the exact amount of space being used for the window function, and if it is still excessive consider modifying the memory management to increase available sort area size.
Something like:
with cte_table_a as (
SELECT
to_number(MARK_ID) as MARK_ID,
SUPP_ID as supplier_id,
ROW_NUMBER() OVER (ORDER BY MARK_ID) as rnum
from
TABLE_A
where
char_id='160' and
VPR_ID IS NOT NULL)
select ...
from
cte_table_a aa,
TABLE_B BB,
TABLE_C CC
WHERE
aa.rnum >= 10001 and
aa.rnum <= 15000 and
AA.MARK_ID = BB.MARK_ID AND
AA.SUPP_ID = CC.location_id AND
BB.VALUE_KEY = AA.VALUE_KEY AND
BB.VALUE_KEY = CC.VALUE_KEY

Distinct values on indexed column

I have a table with 115 M rows. One of the column is indexed (index called "my_index" on explain plan below) and not nullable. Moreover, this column has just one distinct value so far.
When I do
select distinct my_col from my_table;
, it takes 230 seconds which is very long. Here is the explain plan.
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 1 | 3 | 22064 (90)| 00:03:23 |
| 1 | SORT UNIQUE NOSORT| | 1 | 3 | 22064 (90)| 00:03:23 |
| 2 | INDEX FULL SCAN | my_index | 115M| 331M| 2363 (2)| 00:00:22 |
Since the column has just one distinct value, why does it take so long ? Why Oracle does not just check index entries and fastly find that there is just one possible value for this column ? On the explain plan above, the index scanning seems to take 22 s but what is this "SORT UNIQUE NOSORT" which takes ages ?
Thank you in advance for your help
Re analyse the table.
EXEC dbms_stats.gather_table_stats('owner','table_name',cascade=>true,method_opt=>'FOR ALL INDEXED COLUMNS SIZE ');
Change Index Type
One distinct value out of 115M rows??!! That's what called as low cardinality, not so good for the 'normal' B-Tree index Consider a bitmapped index. (If at all you have B-tree)
Reconstructing Query
If you are sure that no new values will be added to this column then please remove the distinct clause and rather use as Abhijith said.
SORT UNIQUE NOSORT is not taking too long. You are looking at the estimates from a bad execution plan that is probably the result of unreasonable optimizer parameters. For example, setting the parameter OPTIMIZER_INDEX_COST_ADJ to 1 instead of the default 100 can produce a similar plan. Most likely your query runs slowly because your database is busy or just slow.
What's wrong with the posted execution plan?
The posted execution plan seems unreasonable. Retrieving data should take much longer than simply throwing out duplicates. And the consumer operation, SORT UNIQUE NOSORT, can start at almost the same time as the producer operation, INDEX FULL SCAN. Normally they should finish at almost the same time. The execution plan in the question shows the optimizer estimates. The screenshot below of an active report shows the actual timelines for a very similar query. All steps are starting and stopping at almost the same time.
Sample setup with reasonable plan
Below is a very similar setup, but with a very plain configuration. Same number of rows read (115 million) and returned (1), and almost the exact same segment size (329MB vs 331 MB). The plan shows almost all of the time being spent on the INDEX FULL SCAN.
drop table test1 purge;
create table test1(a number not null, b number, c number) nologging;
begin
for i in 1 .. 115 loop
insert /*+ append */ into test1 select 1, level, level
from dual connect by level <= 1000000;
commit;
end loop;
end;
/
create index test1_idx on test1(a);
begin
dbms_stats.gather_table_stats(user, 'TEST1');
end;
/
explain plan for select /*+ index(test1) */ distinct a from test1;
select * from table(dbms_xplan.display);
Plan hash value: 77032494
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 244K (4)| 00:48:50 |
| 1 | SORT UNIQUE NOSORT| | 1 | 3 | 244K (4)| 00:48:50 |
| 2 | INDEX FULL SCAN | TEST1_IDX | 115M| 329M| 237K (1)| 00:47:30 |
--------------------------------------------------------------------------------
Re-creating a bad plan
--Set optimizer_index_cost_adj to a ridiculously low value.
--This changes the INDEX FULL SCAN estimate from 47 minutes to 29 seconds.
alter session set optimizer_index_cost_adj = 1;
--Changing the CPUSPEEDNW to 800 will exactly re-create the time estimate
--for SORT UNIQUE NOSORT. This value is not ridiculous, and it is not
--something you should normally change. But it does imply your CPUs are
--slow. My 2+ year-old desktop had an original score of 1720.
begin
dbms_stats.set_system_stats( 'CPUSPEEDNW', 800);
end;
/
explain plan for select /*+ index(test1) */ distinct a from test1;
select * from table(dbms_xplan.display);
Plan hash value: 77032494
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 16842 (86)| 00:03:23 |
| 1 | SORT UNIQUE NOSORT| | 1 | 3 | 16842 (86)| 00:03:23 |
| 2 | INDEX FULL SCAN | TEST1_IDX | 115M| 329M| 2389 (2)| 00:00:29 |
--------------------------------------------------------------------------------
How to investigate
Check the parameters.
select name, value from v$parameter where name like 'optimizer_index%'
NAME VALUE
---- -----
optimizer_index_cost_adj 1
optimizer_index_caching 0
Also check the system statistics.
select * from sys.aux_stats$;
+---------------+------------+-------+------------------+
| SNAME | PNAME | PVAL1 | PVAL2 |
+---------------+------------+-------+------------------+
| SYSSTATS_INFO | STATUS | | COMPLETED |
| SYSSTATS_INFO | DSTART | | 09-23-2013 17:52 |
| SYSSTATS_INFO | DSTOP | | 09-23-2013 17:52 |
| SYSSTATS_INFO | FLAGS | 1 | |
| SYSSTATS_MAIN | CPUSPEEDNW | 800 | |
| SYSSTATS_MAIN | IOSEEKTIM | 10 | |
| SYSSTATS_MAIN | IOTFRSPEED | 4096 | |
| SYSSTATS_MAIN | SREADTIM | | |
| SYSSTATS_MAIN | MREADTIM | | |
| SYSSTATS_MAIN | CPUSPEED | | |
| SYSSTATS_MAIN | MBRC | | |
| SYSSTATS_MAIN | MAXTHR | | |
| SYSSTATS_MAIN | SLAVETHR | | |
+---------------+------------+-------+------------------+
To find out where the time is really spent, use a tool like the active report.
select dbms_sqltune.report_sql_monitor(sql_id => '5s63uf4au6hcm',
type => 'active') from dual;
If there are only a few distinct values of the column, try a compressed index:
create index my_index on my_table (my_col) compress;
This will store each distinct value of the column only once, hopefully reducing the execution time of your query.
As a bonus: use this to see the actual plan used for a query:
select /*+ gather_plan_statistics */ distinct my_col from my_table;
SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR);
The gather_plan_statistics hint will collect more data (it will take longer to execute), but it works without it too. See the documentation of DBMS_XPLAN.DISPLAY_CURSOR for more details.
See the explain plan carefully.
It scans the whole index to know what you are trying to fetch
Then applies distinct function (try to retrieve the unique values). Though you say there is only one unique value, it has to scan the whole index to get the values. Oracle does not know that there is only one distinct value in the index. You can restrict the rownum = 1 to get the quick answer.
Try this to get the quick answer
select my_col from my_table where rownum = 1;
It is highly unfavourable to add an index on a column which has very less distribution. This is bad for the table and overall for the application as well. This just does not make any sense

Oracle intermediate join table size

When we join more than 2 tables, oracle or for that matter any database decides to join 2 tables and use the result to join with subsequent tables. Is there a way to identify the intermediate join size. I am particularly interested in oracle. One solution I know is to use Autotrace in sqldeveloper which has the column LAST_OUTPUT_ROWS. But for queries executed by pl/sql and other means does oracle record the intermediate join size in some table?
I am asking this because recently we had a problem as someone dropped the statistics and failed to regenerate it and when traced through we found that oracle formed an intermediate table of 180 million rows before arriving at the final result of 6 rows and the query was quite slow.
Oracle can materialize the intermediate results of a table join in the temporary segment set for your session.
Since it's a one-off table that is deleted after the query is complete, its statistics are not stored.
However, you can estimate its size by building a plan for the query and looking at ROWS parameters of the appropriate operation:
EXPLAIN PLAN FOR
WITH q AS
(
SELECT /*+ MATERIALIZE */
e1.value AS val1, e2.value AS val2
FROM t_even e1, t_even e2
)
SELECT COUNT(*)
FROM q
SELECT *
FROM TABLE(DBMS_XPLAN.display())
Plan hash value: 3705384459
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 43G (5)|999:59:59 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | | | | | |
| 3 | MERGE JOIN CARTESIAN | | 100T| 909T| 42G (3)|999:59:59 |
| 4 | TABLE ACCESS FULL | T_ODD | 10M| 47M| 4206 (3)| 00:00:51 |
| 5 | BUFFER SORT | | 10M| 47M| 42G (3)|999:59:59 |
| 6 | TABLE ACCESS FULL | T_ODD | 10M| 47M| 4204 (3)| 00:00:51 |
| 7 | SORT AGGREGATE | | 1 | | | |
| 8 | VIEW | | 100T| | 1729M (62)|999:59:59 |
| 9 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6604_2660595 | 100T| 909T| 1729M (62)|999:59:59 |
---------------------------------------------------------------------------------------------------------
Here, the materialized table is called SYS_TEMP_0FD9D6604_2660595 and the estimated record count is 100T (100,000,000,000,000 records)