I have two tables, TABLE_A and TABLE_B, in a one-to-many relationship (TABLE_B holds a foreign key to TABLE_A). I have written the following three queries. They run at different speeds, but as far as I can tell they all do the same thing.
Time: 3.916 seconds.
SELECT count(*)
FROM TABLE_A hconn
WHERE EXISTS
(SELECT *
FROM TABLE_B hipconn
WHERE HIPCONN.A_ID = HCONN.A_ID
);
Time: 3.52 seconds
SELECT COUNT(*)
FROM TABLE_A hconn,
TABLE_B HIPCONN
WHERE HCONN.A_ID = HIPCONN.A_ID;
Time: 2.72 seconds.
SELECT COUNT(*)
FROM TABLE_A HCONN
JOIN TABLE_B HIPCONN
ON HCONN.A_ID = HIPCONN.A_ID;
From the timings above, the last query performs best. (I have run each query many times; the ordering stays the same, and the last query is always the fastest.)
I then looked at the explain plan for each query to find out why.
The explain plan shows the same cost and time for all three queries (plan below). I re-ran it a couple of times, and the result was identical every time.
Question: Why do the actual run times differ when the explain plan estimates the same time for all three queries? Where am I going wrong?
Plan hash value: 600428245
-------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11 | | 12913 (2)| 00:02:35 |
| 1 | SORT AGGREGATE | | 1 | 11 | | | |
|* 2 | HASH JOIN RIGHT SEMI | | 2273K| 23M| 39M| 12913 (2)| 00:02:35 |
| 3 | INDEX STORAGE FAST FULL SCAN| BIN$ACCkNNuTHKPgUJAKNJgj5Q==$0 | 2278K| 13M| | 1685 (2)| 00:00:21 |
| 4 | INDEX STORAGE FAST FULL SCAN| BIN$ACCkNNubHKPgUJAKNJgj5Q==$0 | 6448K| 30M| | 4009 (2)| 00:00:49 |
-------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("HIPCONN"."A_ID"="HCONN"."A_ID")
You can use DBMS_XPLAN.DISPLAY_CURSOR to display the actual execution plan of the last SQL statement executed, since a query may have more than one execution plan in the library cache.
You can also enable a 10046 trace at level 12 to see where each query spends its time and why the execution times differ.
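As a rough sketch of both suggestions, the statements below show one way to collect run-time statistics and then switch a 10046 trace on and off. The join query is the third one from the question; the gather_plan_statistics hint and the 'ALLSTATS LAST' format are my additions, and the session is assumed to have the privileges needed to read the cursor views and to alter its own events.

```sql
-- Run the statement with run-time row statistics collected for this execution.
SELECT /*+ gather_plan_statistics */ COUNT(*)
FROM TABLE_A hconn
JOIN TABLE_B hipconn
  ON hconn.A_ID = hipconn.A_ID;

-- Show the plan actually used by the last statement in this session,
-- with estimated (E-Rows) and actual (A-Rows) row counts side by side.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'ALLSTATS LAST'));

-- Switch on a 10046 trace at level 12 (bind values and wait events),
-- run the query to be traced, then switch the trace off again.
ALTER SESSION SET EVENTS '10046 trace name context forever, level 12';
-- ... run the query to be traced here ...
ALTER SESSION SET EVENTS '10046 trace name context off';
```

In SQL*Plus it may be necessary to run SET SERVEROUTPUT OFF first, otherwise the "last statement" that DISPLAY_CURSOR sees can be an internal call rather than the query.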
I am analyzing the explain plan for the following statement:
SELECT * FROM friends WHERE SUBSTR(activity,1,2) = '49';
Oracle SQL Developer tells me that it has a cardinality of 1513 and a cost of 1302.
How are these values calculated? Can they be reproduced with a statement (for example, a SELECT that returns the same value)?
The cardinality generated by an explain plan can be based on many factors, but in your code Oracle is probably just guessing that the SUBSTR expression will return 1% of all rows from the table.
For example, we can recreate your cardinality estimate by creating a simple table with 151,300 rows:
drop table friends;
create table friends(activity varchar2(100));
create index friends_idx on friends(activity);
insert into friends select level from dual connect by level <= 1513 * 100;
begin
dbms_stats.gather_table_stats(user, 'FRIENDS', no_invalidate => false);
end;
/
The resulting explain plan estimates the query will return 1% of the table, or 1513 rows:
explain plan for SELECT * FROM friends WHERE SUBSTR(activity,1,2) = '49';
select * from table(dbms_xplan.display);
Plan hash value: 3524934291
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1513 | 9078 | 72 (6)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| FRIENDS | 1513 | 9078 | 72 (6)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(SUBSTR("ACTIVITY",1,2)='49')
The above code is the simplest explanation, but there are potentially dozens of other weird things that are going on with your query. Running EXPLAIN PLAN FOR SELECT... and then SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY); is often enough to investigate the cardinality. Pay special attention to the "Note" section for any unexpected gotchas.
Not all of these cardinality rules and features are documented. But if you have a lot of free time, and want to understand the math behind it all, run some 10053 trace files and read Jonathan Lewis' blog and book. His book also explains how the "cost" is generated, but the calculations are so complicated that it's not worth worrying about.
Why doesn't Oracle calculate a perfect cardinality estimate?
It's too expensive to calculate actual cardinalities before running the queries. To create an always-perfect estimate for the SUBSTR operation, Oracle would have to run something like the below query:
SELECT SUBSTR(activity,1,2), COUNT(*)
FROM friends
GROUP BY SUBSTR(activity,1,2);
For my sample data, the above query returns 99 counts, and determines that the cardinality estimate should be 1111 for the original query.
But the above query has to first read all the data from FRIENDS.ACTIVITY, which requires either an index fast full scan or a full table scan. Then the data has to be sorted or hashed to get the counts per group (which is likely an O(N*LOG(N)) operation). If the table is large, the intermediate results won't fit in memory and must be written and then read from disk.
Pre-calculating the cardinality would be more work than the actual query itself. The results could perhaps be saved, but storing those results could take up a lot of space, and how does the database know that the predicate will ever be needed again? And even if the pre-calculated cardinalities were stored, as soon as someone modifies the table those values may become worthless.
And this whole effort assumes that the functions are deterministic. While SUBSTR works reliably, what if the predicate used a non-deterministic function like DBMS_RANDOM.VALUE? Perfect estimates are both theoretically impossible in general (the halting problem) and very difficult in practice. Instead, the optimizer relies on guesses like DBA_TABLES.NUM_ROWS (from when the statistics were last gathered) * 0.01 for "complex" predicates.
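To answer the "can it be reproduced with a statement" part directly, here is a minimal sketch of that guess as a query. It assumes the FRIENDS sample table from above lives in the current schema (hence USER_TABLES rather than DBA_TABLES) and that the optimizer really is applying the 1% default here, which is the presumption stated earlier and not something the query can prove.

```sql
-- NUM_ROWS is the row count recorded when statistics were last gathered.
-- Multiplying by 0.01 mirrors the optimizer's presumed 1% guess
-- for a predicate it cannot estimate any better.
SELECT num_rows,
       ROUND(num_rows * 0.01) AS guessed_cardinality
FROM   user_tables
WHERE  table_name = 'FRIENDS';
```

For the 151,300-row sample table this should line up with the 1513 shown in the plan. If the number in your own plan does not match 1% of NUM_ROWS, some other statistic or feature is driving the estimate.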
Dynamic Sampling
Dynamic sampling, also known as dynamic statistics, will pre-run parts of your SQL statement to create a better estimate. You can set the amount of data to be sampled, and by setting the value to 10, Oracle will effectively run the whole thing ahead of time to determine the cardinality. This feature can obviously be pretty slow, and there are lots of weird edge cases and other features I'm not discussing here, but for your query it can create a perfect estimate of 1,111 rows:
EXPLAIN PLAN FOR SELECT /*+ dynamic_sampling(10) */ * FROM friends WHERE SUBSTR(activity,1,2) = '49';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
Plan hash value: 3524934291
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1111 | 6666 | 72 (6)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| FRIENDS | 1111 | 6666 | 72 (6)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(SUBSTR("ACTIVITY",1,2)='49')
Note
-----
- dynamic statistics used: dynamic sampling (level=10)
Dynamic Reoptimization
Oracle can keep track of the number of rows at run time and adjust the plan accordingly. This feature doesn't help you with your simple sample query. But if the table is used as part of a join, where cardinality estimates become more important, Oracle will build multiple versions of the plan and pick one based on the actual cardinality.
In the below explain plan, you can see the estimate is still the same old 1513. But if the actual number is much lower at run time, Oracle will disable the HASH JOIN operation meant for a large number of rows, and will switch to the NESTED LOOPS operation that is better suited for a smaller number of rows.
EXPLAIN PLAN FOR
SELECT *
FROM friends friends1
JOIN friends friends2
ON friends1.activity = friends2.activity
WHERE SUBSTR(friends1.activity,1,2) = '49';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY(format => '+adaptive'));
Plan hash value: 215764417
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1530 | 18360 | 143 (5)| 00:00:01 |
| * 1 | HASH JOIN | | 1530 | 18360 | 143 (5)| 00:00:01 |
|- 2 | NESTED LOOPS | | 1530 | 18360 | 143 (5)| 00:00:01 |
|- 3 | STATISTICS COLLECTOR | | | | | |
| * 4 | TABLE ACCESS FULL | FRIENDS | 1513 | 9078 | 72 (6)| 00:00:01 |
|- * 5 | INDEX RANGE SCAN | FRIENDS_IDX | 1 | 6 | 168 (2)| 00:00:01 |
| 6 | TABLE ACCESS FULL | FRIENDS | 151K| 886K| 70 (3)| 00:00:01 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("FRIENDS1"."ACTIVITY"="FRIENDS2"."ACTIVITY")
4 - filter(SUBSTR("FRIENDS1"."ACTIVITY",1,2)='49')
5 - access("FRIENDS1"."ACTIVITY"="FRIENDS2"."ACTIVITY")
Note
-----
- this is an adaptive plan (rows marked '-' are inactive)
Expression Statistics
Expression statistics tell Oracle to gather additional types of statistics. We can force Oracle to gather statistics on the SUBSTR expression, and then those statistics can be used for more accurate estimates. In the below example, the final estimate is actually only slightly different. Expression statistics alone don't work well here, but that was just bad luck in this case.
SELECT dbms_stats.create_extended_stats(extension => '(SUBSTR(activity,1,2))', ownname => user, tabname => 'FRIENDS')
FROM DUAL;
begin
dbms_stats.gather_table_stats(user, 'FRIENDS');
end;
/
EXPLAIN PLAN FOR SELECT * FROM friends WHERE SUBSTR(activity,1,2) = '49';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
Plan hash value: 3524934291
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1528 | 13752 | 72 (6)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| FRIENDS | 1528 | 13752 | 72 (6)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(SUBSTR("ACTIVITY",1,2)='49')
Expression Statistics and Histograms
With the addition of a histogram, we're finally creating something pretty similar to what your teacher described. When the expression statistics are gathered, a histogram will save information about the number of values in up to 254 different ranges or buckets (the default limit). In our case, since there are only 99 unique values, the histogram will perfectly estimate the number of rows for '49' as 1111.
--(There are several ways to gather histograms. Instead of directly forcing it, I prefer to call the query
-- multiple times so that Oracle will register the need for a histogram, and automatically create one.)
SELECT * FROM friends WHERE SUBSTR(activity,1,2) = '49';
SELECT * FROM friends WHERE SUBSTR(activity,1,2) = '49';
begin
dbms_stats.gather_table_stats(user, 'FRIENDS');
end;
/
EXPLAIN PLAN FOR SELECT * FROM friends WHERE SUBSTR(activity,1,2) = '49';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
Plan hash value: 3524934291
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1111 | 9999 | 72 (6)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| FRIENDS | 1111 | 9999 | 72 (6)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(SUBSTR("ACTIVITY",1,2)='49')
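If the estimate does not change after gathering statistics, it can help to check whether the extension and the histogram actually exist. The queries below are a sketch that assumes the FRIENDS table is in the current schema; the hidden column that backs the extension has a system-generated name, so it will look different on every system.

```sql
-- Extended statistics defined on the table
-- (the SUBSTR expression should be listed here).
SELECT extension_name, extension
FROM   user_stat_extensions
WHERE  table_name = 'FRIENDS';

-- Column-level statistics, including the hidden column behind the extension.
-- HISTOGRAM shows NONE when no histogram has been created for a column.
SELECT column_name, num_distinct, num_buckets, histogram
FROM   user_tab_col_statistics
WHERE  table_name = 'FRIENDS';
```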
Summary
Oracle will not automatically pre-run all predicates to perfectly estimate cardinalities. But there are several mechanisms we can use to get Oracle to do something very similar for a small number of queries that we care about.
The situation gets even more complicated when you consider bind variables: what if the value '49' changes frequently? (Adaptive Cursor Sharing can help with that.) Or what if a huge number of rows are modified; how do we update statistics quickly? (Online Statistics Gathering and Incremental Statistics can help with that.)
The optimizer doesn't really optimize. There's only enough time to satisfice.
I wrote a query to select data from my server. The query is:
SELECT
ROUND((SUM(clength)/1048576),2) AS logical_MB,
ROUND((SUM(plength) /1048576),2) AS physical_compr_MB,
ds_doc.archiveno,
ds_arch.archiveid
FROM ECR.ds_comp,
ECR.ds_doc,
ECR.ds_arch
WHERE ds_comp.docidno=ds_doc.docidno
AND ds_doc.archiveno =ds_arch.archiveno
GROUP BY ds_doc.archiveno,
ds_arch.archiveid;
The expected result is:
9708,24 9704,93 9 Vee3 0,009255342
13140,55 12682,93 10 Vf5 0,012095385
104533,94 89183,02 3 Mdf4 0,085051556
72346,34 48290,63 7 Sds2 0,046053534
But this query takes almost a day to run. Any ideas on how to optimize it?
You provide almost none of the information needed to diagnose a performance problem, so I can only offer a general checklist.
Check the Query
The query does not qualify the columns clength and plength, so please check that they are defined in the table ds_comp. If they are not, maybe you do not need to join to this table at all.
I also assume that docidno is the primary key of ds_doc and archiveno is the primary key of ds_arch. If not, your query will still run, but the join will duplicate rows, so the result will differ from what you expect (and this may also cause excessive elapsed time)!
Verify the Execution Plan
Produce the execution plan for your query in text form (to be able to post it) as follows
EXPLAIN PLAN SET STATEMENT_ID = '<sometag>' into plan_table FOR
... your query here ...
SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', '<sometag>','ALL'));
Remember that you are joining complete tables (not only a few rows for some ID), so if you see INDEX ACCESS or NESTED LOOP in the plan, there is a problem that explains the long runtime.
You want to see only HASH JOIN and FULL TABLE SCAN in your plan.
Index Access
Contrary to some recommendations in other answers, you do not need indexes on the join columns to profit from indexing (as explained above). What you can do is cover all required columns in indexes, so that the query is answered from the indexes alone and the table access is omitted entirely. This helps if the tables are wide, i.e. the row size is large.
These index definitions are needed:
create index ds_comp_idx1 on ds_comp (docidno,clength,plength);
create index ds_doc_idx1 on ds_doc (docidno,archiveno);
create index ds_arch_idx1 on ds_arch (archiveno,archiveid);
and you will receive this plan
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1119K| 97M| 908 (11)| 00:00:01 |
| 1 | HASH GROUP BY | | 1119K| 97M| 908 (11)| 00:00:01 |
|* 2 | HASH JOIN | | 1119K| 97M| 831 (3)| 00:00:01 |
|* 3 | HASH JOIN | | 1001 | 52052 | 5 (0)| 00:00:01 |
| 4 | INDEX FULL SCAN | DS_ARCH_IDX1 | 11 | 286 | 1 (0)| 00:00:01 |
| 5 | INDEX FAST FULL SCAN| DS_DOC_IDX1 | 1001 | 26026 | 4 (0)| 00:00:01 |
| 6 | INDEX FAST FULL SCAN | DS_COMP_IDX1 | 1119K| 41M| 818 (2)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("C"."DOCIDNO"="D"."DOCIDNO")
3 - access("D"."ARCHIVENO"="A"."ARCHIVENO")
Note the INDEX FULL SCAN and INDEX FAST FULL SCAN operations, which mean you are reading the data from the indexes only and do not need to perform a full table scan.
Use Parallel Option
With your rather simple query there is not much else to improve. What always works is to run the query in parallel using the /*+ PARALLEL(N) */ hint.
The precondition is that your database is configured for this option and you have hardware that can make use of it.
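As an illustration, this is the query from the question with a statement-level parallel hint added. The degree of 4 is an arbitrary value chosen for the example, not a recommendation; a sensible degree depends on the CPU and I/O capacity available and on how the instance is configured for parallel execution.

```sql
-- Degree 4 is an illustrative assumption; adjust it to the hardware.
SELECT /*+ PARALLEL(4) */
       ROUND(SUM(clength) / 1048576, 2) AS logical_MB,
       ROUND(SUM(plength) / 1048576, 2) AS physical_compr_MB,
       ds_doc.archiveno,
       ds_arch.archiveid
FROM   ECR.ds_comp,
       ECR.ds_doc,
       ECR.ds_arch
WHERE  ds_comp.docidno  = ds_doc.docidno
AND    ds_doc.archiveno = ds_arch.archiveno
GROUP  BY ds_doc.archiveno,
          ds_arch.archiveid;
```

If the hint takes effect, the execution plan should contain PX operations (for example PX COORDINATOR); if it does not, the plan stays serial and the hint was ignored or parallel execution is not available.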
Rewrite using explicit joins:
SELECT
ROUND((SUM(clength)/1048576),2) AS logical_MB,
ROUND((SUM(plength) /1048576),2) AS physical_compr_MB,
d.archiveno,
a.archiveid
FROM ECR.ds_comp c
INNER JOIN ECR.ds_doc d ON c.docidno=d.docidno
INNER JOIN ECR.ds_arch a ON d.archiveno=a.archiveno
GROUP BY d.archiveno,
a.archiveid;
Also check that indexes exist on the join columns c.docidno, d.docidno, d.archiveno and a.archiveno.
I have a query with a window function, and I am having a hard time getting rid of the sort in Oracle. I am by no means an Oracle expert, but my company's application needs to be compatible with both Oracle and SQL Server, and we don't really have an Oracle expert, so I need help.
Here is the query in question:
SELECT
A.TYP_0,A.ACCNUM_0,A.NUM_0,A.DUDLIG_0,A.NUMHDU_0
,A.DATEVT_0
,A.PAYDAT_0
,A.BPRTYP_0
,A.CPY_0
,A.FCY_0
,A.BPR_0
,A.LIG_0
,A.SAC_0
,A.SNS_0
,A.AMTCUR_0
,A.AMTLOC_0
,A.PAYCUR_0
,A.PAYLOC_0
,MIN(DATEVT_0) over (PARTITION BY A.TYP_0,A.ACCNUM_0,A.NUM_0,A.DUDLIG_0 ORDER BY NUMHDU_0 ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) AS MINFOLLOWING
FROM SEED.HISTODUD A
WHERE EXTRACT(YEAR FROM DATEVT_0) > 1800
I have created an index for this, just as I did in SQL Server, but I had to put the INCLUDE columns into the index key itself because that option doesn't exist in Oracle:
CREATE UNIQUE INDEX X3ARAP_IDX ON SEED.HISTODUD
(
TYP_0
,ACCNUM_0
,NUM_0
,DUDLIG_0
,NUMHDU_0
,DATEVT_0
,PAYDAT_0
,BPRTYP_0
,CPY_0
,FCY_0
,BPR_0
,LIG_0
,SAC_0
,SNS_0
,AMTCUR_0
,AMTLOC_0
,PAYCUR_0
,PAYLOC_0
);
Here is the execution plan:
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3728420768
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 690 | 71070 | 59 (2)| 00:00:01 |
| 1 | WINDOW SORT | | 690 | 71070 | 59 (2)| 00:00:01 |
|* 2 | INDEX FAST FULL SCAN| X3ARAP_IDX | 690 | 71070 | 58 (0)| 00:00:01 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(EXTRACT(YEAR FROM INTERNAL_FUNCTION("DATEVT_0"))>1800)
We have a customer with a really huge database. The sort used to create a temporary table, but it no longer seems to do so (I tried dropping the index and running the old query, but for some reason I don't see a temp table anymore). Either way, I just can't get rid of the sort.
I tried replacing the MIN with a ROW_NUMBER and removing the ROWS clause to see if that was the issue, but I still get the same execution plan.
There is a table of trades with 220 million rows. One of its columns is counterparty, and that column is indexed. If I run a normal query like:
select *
from <table>
where counterparty = 'X'
The plan shows that it uses the index. Whereas if I use GROUP BY on the same column, it doesn't use the index and does a table scan instead, i.e. for the query below:
select counterparty, count(*)
from <table>
group by counterparty
Could you please advise why it's not using the index for the GROUP BY? FYI, I have already gathered the database statistics.
The plans for the first and second queries are shown below.
Note: we are migrating data from Sybase to Oracle. When I run the same GROUP BY in Sybase with the same indexes, the query uses the index, but not in Oracle.
First
Plan hash value: 350128866
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 2209 | 1469K| 914 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| FXCASHTRADE | 2209 | 1469K| 914 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | SCB_FXCASHTRADE_002 | 2209 | | 11 (0)| 00:00:01 |
Predicate Information (identified by operation id):
2 - access("COUNTERPARTY"='test')
Second
Plan hash value: 2920872612
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 100K| 2151K| | 6558K (1)| 00:00:38 |
| 1 | HASH GROUP BY | | 100K| 2151K| 6780M| 6558K (1)| 00:00:38 |
| 2 | TABLE ACCESS FULL| FXCASHTRADE | 221M| 4643M| | 6034K (1)| 00:00:35 |
I am going to make an educated guess and say that counterparty is defined as a nullable column. As such, Oracle can't rely solely on the index to generate the results of your group by query, since null values need to be included in the results, but a single-column (Oracle) B-tree index doesn't contain entries for null values. With that in mind, a full table scan makes sense.
If there is no good reason for counterparty to be nullable, go ahead and make it not null. The execution plan should then change to use the index as expected.
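A minimal sketch of that change is shown here. The name tbl is a placeholder for the real table, and the check in the first statement is my addition: Oracle rejects the NOT NULL constraint if any existing row still holds a null in the column.

```sql
-- tbl is a placeholder table name.
-- First confirm that no NULLs exist; the constraint cannot be added otherwise.
SELECT COUNT(*) AS null_counterparties
FROM   tbl
WHERE  counterparty IS NULL;

-- If the count is 0, declare the column NOT NULL.
ALTER TABLE tbl MODIFY (counterparty NOT NULL);
```

On a 220-million-row table the check itself requires a full scan, so it is worth running it at a quiet time.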
Alternatively, if you can't make that change, but you don't care about null values for this particular query, you can tweak the query to filter out null values explicitly. This should also result in a better execution plan.
select counterparty, count(*)
from tbl
where counterparty is not null -- add this filter
group by counterparty
Note: I'm no Sybase expert, but I assume that its indexes include null values. Oracle B-tree indexes do not store entries whose key columns are all null. That would explain the difference in execution plans between the two databases.
I am a bit new to Oracle and I have a question regarding Oracle's explain plan. I used the autotrace feature for a particular query.
SQL> SELECT * from myTable;
11 rows selected.
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
Plan hash value: 1233351234
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 11 | 330 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS FULL| MYTABLE| 11 | 330 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
My question is: if I want to calculate the total cost of this query, is it 6 (3+3) or only 3? Suppose I had a larger query with more steps in the plan. Do I have to add up all the values in the Cost column to get the total cost, or is the first value (Id = 0) the total cost of the query?
The cost is 3. The plan is shown as a hierarchy, and the cost of each parent operation already includes the cost of its child operations.
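One way to see the hierarchy is to read the plan rows directly, as in the sketch below. The statement ID 'cost_demo' is a made-up tag for this example, and the query assumes the standard PLAN_TABLE is available to the session.

```sql
-- 'cost_demo' is a hypothetical tag used only to find this plan again.
EXPLAIN PLAN SET STATEMENT_ID = 'cost_demo' FOR
SELECT * FROM myTable;

-- Each row points at its parent via PARENT_ID.
-- The row with ID = 0 carries the cumulative cost of the whole statement.
SELECT id, parent_id, operation, options, cost
FROM   plan_table
WHERE  statement_id = 'cost_demo'
ORDER  BY id;
```

For a larger plan the same rule applies: read the cost on the row with ID = 0 instead of summing the column.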
You might also want to take a look at some of the responses to:
How do you interpret a query's explain plan?