I am running some queries to select data from my server. The query is:
SELECT
ROUND((SUM(clength)/1048576),2) AS logical_MB,
ROUND((SUM(plength) /1048576),2) AS physical_compr_MB,
ds_doc.archiveno,
ds_arch.archiveid
FROM ECR.ds_comp,
ECR.ds_doc,
ECR.ds_arch
WHERE ds_comp.docidno=ds_doc.docidno
AND ds_doc.archiveno =ds_arch.archiveno
GROUP BY ds_doc.archiveno,
ds_arch.archiveid;
The expected result is:
9708,24 9704,93 9 Vee3 0,009255342
13140,55 12682,93 10 Vf5 0,012095385
104533,94 89183,02 3 Mdf4 0,085051556
72346,34 48290,63 7 Sds2 0,046053534
But this query takes almost a day to run. Any ideas on how to optimize it?
You provide almost none of the information needed to help with a performance problem, so only a general checklist can be offered.
Check the Query
The query does not qualify the columns clength and plength, so please check whether they are defined in the table ds_comp; if not, maybe you do not need to join to this table at all.
Also, I assume that docidno is the primary key of ds_doc and archiveno is the primary key of ds_arch. If not, your query will still work, but you will get a different result than you expect because of the duplication caused by the join (this may also cause excessive elapsed time)!
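To illustrate the duplication point, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Oracle. SQLite has no ECR schema, so the table names are unqualified, and the sample rows are made up. The same grouping query runs twice: once with archiveno unique in ds_arch, and once after adding a hypothetical duplicate archive row, which silently doubles the sum.

```python
import sqlite3


def archive_totals(conn):
    # Same shape as the query in the question: total component length per archive.
    return conn.execute(
        """
        SELECT d.archiveno, a.archiveid, SUM(c.clength) AS logical_bytes
        FROM ds_comp c
        JOIN ds_doc d ON c.docidno = d.docidno
        JOIN ds_arch a ON d.archiveno = a.archiveno
        GROUP BY d.archiveno, a.archiveid
        """
    ).fetchall()


def build(duplicate_archive_row):
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ds_arch (archiveno INTEGER, archiveid TEXT);
        CREATE TABLE ds_doc  (docidno INTEGER, archiveno INTEGER);
        CREATE TABLE ds_comp (docidno INTEGER, clength INTEGER, plength INTEGER);
        INSERT INTO ds_arch VALUES (9, 'Vee3');
        INSERT INTO ds_doc  VALUES (1, 9), (2, 9);
        INSERT INTO ds_comp VALUES (1, 100, 80), (2, 50, 40);
        """
    )
    if duplicate_archive_row:
        # archiveno is no longer unique in ds_arch: every ds_comp row now
        # matches two archive rows, so each length is counted twice.
        conn.execute("INSERT INTO ds_arch VALUES (9, 'Vee3')")
    return conn


print(archive_totals(build(False)))  # [(9, 'Vee3', 150)]
print(archive_totals(build(True)))   # [(9, 'Vee3', 300)]
```

The query text is identical in both runs; only the uniqueness of the join key changed, which is why the key assumptions are worth verifying before tuning anything.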
Verify the Execution Plan
Produce the execution plan for your query in text form (so that you can post it) as follows:
EXPLAIN PLAN SET STATEMENT_ID = '<sometag>' into plan_table FOR
... your query here ...
SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', '<sometag>','ALL'));
Remember that you are joining complete tables (not just a few rows for some ID), so if you see INDEX ACCESS or NESTED LOOPS in the plan, that is a problem that explains the long runtime.
You want to see only HASH JOIN and FULL TABLE SCAN in your plan.
Index Access
Contrary to some recommendations in other answers, if you want to profit from index definitions you do not need indexes on the join columns (as explained above). What you can do is cover all required attributes in indexes, so that the query is answered from the indexes alone and the table access is omitted entirely. This helps if the tables are wide, i.e. the row size is large.
These definitions are needed:
create index ds_comp_idx1 on ds_comp (docidno,clength,plength);
create index ds_doc_idx1 on ds_doc (docidno,archiveno);
create index ds_arch_idx1 on ds_arch (archiveno,archiveid);
and you will get this plan:
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1119K| 97M| 908 (11)| 00:00:01 |
| 1 | HASH GROUP BY | | 1119K| 97M| 908 (11)| 00:00:01 |
|* 2 | HASH JOIN | | 1119K| 97M| 831 (3)| 00:00:01 |
|* 3 | HASH JOIN | | 1001 | 52052 | 5 (0)| 00:00:01 |
| 4 | INDEX FULL SCAN | DS_ARCH_IDX1 | 11 | 286 | 1 (0)| 00:00:01 |
| 5 | INDEX FAST FULL SCAN| DS_DOC_IDX1 | 1001 | 26026 | 4 (0)| 00:00:01 |
| 6 | INDEX FAST FULL SCAN | DS_COMP_IDX1 | 1119K| 41M| 818 (2)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("C"."DOCIDNO"="D"."DOCIDNO")
3 - access("D"."ARCHIVENO"="A"."ARCHIVENO")
Note the INDEX FULL SCAN and INDEX FAST FULL SCAN operations: the data is read from the indexes only, so no full table scans are needed.
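The covering-index idea can be seen on a small scale with SQLite's EXPLAIN QUERY PLAN, via Python's sqlite3 module. This is only an analogue: SQLite's planner and plan output differ from Oracle's, and the payload column is a hypothetical stand-in for the remaining wide columns of ds_comp. When every referenced column is in ds_comp_idx1, SQLite reports a covering index; once a column outside the index is needed, it cannot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- payload is a hypothetical stand-in for the other (wide) columns
    CREATE TABLE ds_comp (docidno INTEGER, clength INTEGER, plength INTEGER, payload BLOB);
    CREATE INDEX ds_comp_idx1 ON ds_comp (docidno, clength, plength);
    """
)


def plan(sql):
    # EXPLAIN QUERY PLAN returns 4-column rows; the last one is the readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# All referenced columns are in the index: it can be read instead of the table.
covered = plan("SELECT docidno, SUM(clength), SUM(plength) FROM ds_comp GROUP BY docidno")
# payload is not in the index, so the table rows must be visited as well.
not_covered = plan("SELECT docidno, SUM(clength), MAX(payload) FROM ds_comp GROUP BY docidno")

print(covered)
print(not_covered)
```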
Use Parallel Option
With such a simple query there is not much else to improve. What always works is running it as a parallel query using the /*+ PARALLEL(N) */ hint.
The precondition is that your database is configured for this option and that you have hardware that can make use of it.
Rewrite using explicit joins:
SELECT
ROUND((SUM(clength)/1048576),2) AS logical_MB,
ROUND((SUM(plength) /1048576),2) AS physical_compr_MB,
d.archiveno,
a.archiveid
FROM ECR.ds_comp c
INNER JOIN ECR.ds_doc d ON c.docidno=d.docidno
INNER JOIN ECR.ds_arch a ON d.archiveno=a.archiveno
GROUP BY d.archiveno,
a.archiveid;
Check that indexes exist on the join columns c.docidno, d.docidno, d.archiveno and a.archiveno.
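The explicit-join rewrite changes the syntax, not the meaning: both forms return the same rows. A minimal check with Python's sqlite3 module and made-up sample data. Note the 1048576.0 divisor, which is needed because SQLite, unlike Oracle, truncates integer division. Whether the rewrite changes the Oracle plan still has to be verified with EXPLAIN PLAN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ds_arch (archiveno INTEGER PRIMARY KEY, archiveid TEXT);
    CREATE TABLE ds_doc  (docidno INTEGER PRIMARY KEY, archiveno INTEGER);
    CREATE TABLE ds_comp (docidno INTEGER, clength INTEGER, plength INTEGER);
    INSERT INTO ds_arch VALUES (3, 'Mdf4'), (9, 'Vee3');
    INSERT INTO ds_doc  VALUES (1, 9), (2, 9), (3, 3);
    INSERT INTO ds_comp VALUES (1, 1048576, 524288), (2, 2097152, 1048576),
                               (3, 3145728, 3145728);
    """
)

# Original style: comma-separated FROM list, join conditions in WHERE.
implicit = """
SELECT ROUND(SUM(clength) / 1048576.0, 2), ROUND(SUM(plength) / 1048576.0, 2),
       ds_doc.archiveno, ds_arch.archiveid
FROM ds_comp, ds_doc, ds_arch
WHERE ds_comp.docidno = ds_doc.docidno AND ds_doc.archiveno = ds_arch.archiveno
GROUP BY ds_doc.archiveno, ds_arch.archiveid
ORDER BY ds_doc.archiveno
"""

# Rewritten style: explicit INNER JOIN ... ON.
explicit = """
SELECT ROUND(SUM(c.clength) / 1048576.0, 2), ROUND(SUM(c.plength) / 1048576.0, 2),
       d.archiveno, a.archiveid
FROM ds_comp c
INNER JOIN ds_doc d ON c.docidno = d.docidno
INNER JOIN ds_arch a ON d.archiveno = a.archiveno
GROUP BY d.archiveno, a.archiveid
ORDER BY d.archiveno
"""

rows_implicit = conn.execute(implicit).fetchall()
rows_explicit = conn.execute(explicit).fetchall()
print(rows_explicit)  # [(3.0, 3.0, 3, 'Mdf4'), (3.0, 1.5, 9, 'Vee3')]
```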
Related
I have executed the query below, but the indexes are not being used.
The indexes available on the tables are listed below.
I have also provided the explain plan generated for the query.
Can someone please tell me why the indexes are not being used?
I have also gathered the table statistics multiple times.
wms_area_master - Index name: WMS_AREA_MASTER_PK - Index columns: DC_CODE, DC_AREA
wms_bin_master - Index name: WMS_BIN_MASTER_IDX - Index columns: DC_CODE, DC_AREA
EXPLAIN PLAN FOR
SELECT *
from wms_area_master wam ,
wms_bin_master wbm
where WAM.DC_CODE = wBM.DC_CODE
and WAM.DC_AREA = wBM.DC_AREA;
select * from table(dbms_xplan.display);
Plan hash value: 2387754896
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 41079 | 12M| 252 (2)| 00:00:01 |
|* 1 | HASH JOIN | | 41079 | 12M| 252 (2)| 00:00:01 |
| 2 | TABLE ACCESS FULL| WMS_AREA_MASTER | 217 | 32984 | 4 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| WMS_BIN_MASTER | 41058 | 6214K| 248 (2)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("WAM"."DC_CODE"="WBM"."DC_CODE" AND
"WAM"."DC_AREA"="WBM"."DC_AREA")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- this is an adaptive plan
- 1 Sql Plan Directive used for this statement
Thanks
Your query doesn't appear to have any predicates, just join conditions, so there doesn't appear to be any reason to use an index here. Since you need to read all the data from both tables, the fastest way to do so will be to do table scans. Using an index isn't necessarily faster, and doing a table scan isn't necessarily slower; it depends on how much of the data you need to access.
If your query had predicates that restricted the rows returned, Oracle might find it advantageous to use an index on those columns. If your projection (the list of columns in the SELECT) contained only columns that are part of an index, rather than every column in the table, Oracle might choose a full scan of the index instead of the table, assuming the index segment is meaningfully smaller than the table segment.
I'm a bit puzzled about why a full table scan is performed for a simple SQL query that joins on a primary key:
SELECT max(pd.cre_dt)
FROM D00ZVZ01.ZVZ_PRINT_DOCUMENT pd
JOIN D00ZVZ01.ZVZ_BRIEF_REGISTRATIE br
ON pd.PRINT_DOCUMENT_ID = br.PRINT_DOCUMENT_ID
AND br.BRIEF_REG_GROEP_ID IN (2217, 2237, 2257);
Explain shows:
----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 24 | | 283K (2)| 00:00:15 |
| 1 | SORT AGGREGATE | | 1 | 24 | | | |
|* 2 | HASH JOIN | | 677K| 15M| 14M| 283K (2)| 00:00:15 |
| 3 | INLIST ITERATOR | | | | | | |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| ZVZ_BRIEF_REGISTRATIE | 694K| 6779K| | 17430 (1)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | ZVZ_BRIEF_REGISTRATIE_IF4 | 694K| | | 1469 (2)| 00:00:01 |
| 6 | TABLE ACCESS FULL | ZVZ_PRINT_DOCUMENT | 9567K| 127M| | 260K (1)| 00:00:14 |
----------------------------------------------------------------------------------------------------------------------------
pd.PRINT_DOCUMENT_ID is the primary key.
Despite the millions of records, I wouldn't expect this query to be slow.
What is the reason, and how can I improve it?
Does this give you a different plan?
SELECT max(pd.cre_dt)
FROM D00ZVZ01.ZVZ_PRINT_DOCUMENT pd
JOIN D00ZVZ01.ZVZ_BRIEF_REGISTRATIE br
ON pd.PRINT_DOCUMENT_ID = br.PRINT_DOCUMENT_ID
WHERE br.BRIEF_REG_GROEP_ID IN (2217, 2237, 2257);
If so, then you want to add BRIEF_REG_GROEP_ID to your index.
Probably the statistics for ZVZ_PRINT_DOCUMENT were last gathered when the table had very few rows, so Oracle thinks the hash table will be very small. Either try regathering the statistics or use hints:
SELECT /*+ leading(br pd) use_nl(pd)*/ max(pd.cre_dt)
FROM D00ZVZ01.ZVZ_PRINT_DOCUMENT pd
JOIN D00ZVZ01.ZVZ_BRIEF_REGISTRATIE br
ON pd.PRINT_DOCUMENT_ID = br.PRINT_DOCUMENT_ID
AND br.BRIEF_REG_GROEP_ID IN (2217, 2237, 2257);
The optimiser estimates that it will access 694K rows from ZVZ_BRIEF_REGISTRATIE for the three BRIEF_REG_GROEP_ID values, using an index, and then it needs to get the corresponding details from ZVZ_PRINT_DOCUMENT. 694K individual index lookups is a lot (consider that it has to go to the index for each one and then use the rowid to access the table, in a loop, 694K times), and it has calculated that it will take less effort to just read ZVZ_PRINT_DOCUMENT once and combine the two sets in a single hash join. Index lookups are usually better for small volumes of data.
Is it any faster if you hint it to use the index?
Are the row estimates in the execution plan correct? How many rows are there in each table and how many will you actually read?
What is your Oracle version and do you have adaptive features enabled?
It's slightly odd that your query has no WHERE clause but instead a filtering condition is included in the inner join. I expect the optimiser will rewrite it as a WHERE predicate anyway, but I would still want to experiment to see whether it affected the plan.
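For an inner join, putting the IN filter in the ON clause or in a WHERE clause gives the same result; the placement only changes the result for outer joins. A small sketch with Python's sqlite3 module, using simplified lowercase table names without the D00ZVZ01 schema and made-up rows (it shows result semantics only, not Oracle's plan choice):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE print_document (print_document_id INTEGER PRIMARY KEY, cre_dt TEXT);
    CREATE TABLE brief_registratie (print_document_id INTEGER, brief_reg_groep_id INTEGER);
    INSERT INTO print_document VALUES (1, '2020-01-01'), (2, '2021-06-30'), (3, '2022-03-15');
    -- document 3 belongs to a group outside the IN list
    INSERT INTO brief_registratie VALUES (1, 2217), (2, 2237), (3, 9999);
    """
)


def max_cre_dt(join_kind, filter_in_on):
    cond = "br.brief_reg_groep_id IN (2217, 2237, 2257)"
    on_extra = f" AND {cond}" if filter_in_on else ""  # filter as part of the join
    where = "" if filter_in_on else f" WHERE {cond}"    # filter after the join
    sql = (
        "SELECT MAX(pd.cre_dt) FROM print_document pd "
        f"{join_kind} JOIN brief_registratie br "
        f"ON pd.print_document_id = br.print_document_id{on_extra}{where}"
    )
    return conn.execute(sql).fetchone()[0]


print(max_cre_dt("INNER", True), max_cre_dt("INNER", False))  # same value
print(max_cre_dt("LEFT", True), max_cre_dt("LEFT", False))    # differ for an outer join
```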
There is a table of trades with 220 million rows; one of its columns is counterparty, and that column is indexed. If I run a normal query like:
select *
from <table>
where counterparty = 'X'
the plan shows that it uses the index, whereas if I use GROUP BY on the same column, it doesn't use the index and does a full table scan instead, i.e. for the query below:
select counterparty, count(*)
from <table>
group by counterparty
Could you please advise why it's not using the index for the GROUP BY? FYI, I have already gathered the database statistics.
FYI, the plans for the first and second queries are shown below.
Note: we are migrating data from Sybase to Oracle. When I run the same GROUP BY in Sybase with the same indexes, the query uses the indexes, but in Oracle it does not.
First
Plan hash value: 350128866
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 2209 | 1469K| 914 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| FXCASHTRADE | 2209 | 1469K| 914 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | SCB_FXCASHTRADE_002 | 2209 | | 11 (0)| 00:00:01 |
Predicate Information (identified by operation id):
2 - access("COUNTERPARTY"='test')
Second
Plan hash value: 2920872612
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 100K| 2151K| | 6558K (1)| 00:00:38 |
| 1 | HASH GROUP BY | | 100K| 2151K| 6780M| 6558K (1)| 00:00:38 |
| 2 | TABLE ACCESS FULL| FXCASHTRADE | 221M| 4643M| | 6034K (1)| 00:00:35 |
I am going to make an educated guess and say that counterparty is defined as a nullable column. In that case, Oracle can't rely solely on the index to produce the results of your GROUP BY query, because null values must be included in the results, but Oracle B-tree indexes don't store entries whose indexed columns are all null. With that in mind, a full table scan makes sense.
If there is no good reason for counterparty to be nullable, go ahead and make it not null. The execution plan should then change to use the index as expected.
Alternatively, if you can't make that change but don't care about null values for this particular query, you can tweak the query to filter out null values explicitly. This should also result in a better execution plan.
select counterparty, count(*)
from tbl
where counterparty is not null -- add this filter
group by counterparty
Note: I'm no Sybase expert, but I assume that indexes include null values. Oracle indexes do not include null values. That would explain the difference in execution plan between both databases.
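The null-value point is visible in the result itself: GROUP BY returns a separate group for NULL, so a query that must produce that group cannot be answered from an index that leaves out null keys. A sketch with Python's sqlite3 module and made-up rows; SQLite's indexing differs from Oracle's, so this shows only the query semantics, not the plan:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE fxcashtrade (id INTEGER PRIMARY KEY, counterparty TEXT);
    INSERT INTO fxcashtrade (counterparty) VALUES ('A'), ('A'), ('B'), (NULL), (NULL);
    """
)

# Without a filter, the NULL counterparties form their own group.
all_groups = conn.execute(
    "SELECT counterparty, COUNT(*) FROM fxcashtrade "
    "GROUP BY counterparty ORDER BY counterparty"
).fetchall()

# With IS NOT NULL, only non-null keys remain, which an index on the column can supply.
non_null = conn.execute(
    "SELECT counterparty, COUNT(*) FROM fxcashtrade WHERE counterparty IS NOT NULL "
    "GROUP BY counterparty ORDER BY counterparty"
).fetchall()

print(all_groups)  # [(None, 2), ('A', 2), ('B', 1)]
print(non_null)    # [('A', 2), ('B', 1)]
```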
I have two tables, TABLE_A and TABLE_B, in a one-to-many relationship (TABLE_B holds a foreign key to TABLE_A). I have written the following three queries; each runs at a different speed on these tables, but basically they all do the same thing.
Time: 3.916 seconds.
SELECT count(*)
FROM TABLE_A hconn
WHERE EXISTS
(SELECT *
FROM TABLE_B hipconn
WHERE HIPCONN.A_ID = HCONN.A_ID
);
Time: 3.52 seconds
SELECT COUNT(*)
FROM TABLE_A hconn,
TABLE_B HIPCONN
WHERE HCONN.A_ID = HIPCONN.A_ID;
Time: 2.72 seconds.
SELECT COUNT(*)
FROM TABLE_A HCONN
JOIN TABLE_B HIPCONN
ON HCONN.A_ID = HIPCONN.A_ID;
From the timings above, the last query performs better than the others. (I've tested them many times, and they always finish in the order shown, with the last query always the fastest.)
I've started looking at the explain plans for these queries to find out why.
The explain plan shows exactly the same cost and time for all of the queries above (plan below). I re-ran it a couple of times, but the result is the same for all of them.
Question: Why does the speed vary when the explain plan shows the same estimated time for all the queries? Where am I going wrong?
Plan hash value: 600428245
-------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11 | | 12913 (2)| 00:02:35 |
| 1 | SORT AGGREGATE | | 1 | 11 | | | |
|* 2 | HASH JOIN RIGHT SEMI | | 2273K| 23M| 39M| 12913 (2)| 00:02:35 |
| 3 | INDEX STORAGE FAST FULL SCAN| BIN$ACCkNNuTHKPgUJAKNJgj5Q==$0 | 2278K| 13M| | 1685 (2)| 00:00:21 |
| 4 | INDEX STORAGE FAST FULL SCAN| BIN$ACCkNNubHKPgUJAKNJgj5Q==$0 | 6448K| 30M| | 4009 (2)| 00:00:49 |
-------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("HIPCONN"."A_ID"="HCONN"."A_ID")
You may use DBMS_XPLAN.DISPLAY_CURSOR to display the actual execution plan for the last SQL statement executed, since the queries may have more than one execution plan in the library cache.
Also you may enable a 10046 trace at level 12 to check why the queries are responding with different execution times.
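It is also worth checking that the three queries really are equivalent. With a one-to-many relationship, the EXISTS query counts TABLE_A rows that have at least one child, while both join queries count joined rows, i.e. one per matching TABLE_B row. A small sketch with Python's sqlite3 module and made-up data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_a (a_id INTEGER PRIMARY KEY);
    CREATE TABLE table_b (b_id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES table_a (a_id));
    INSERT INTO table_a VALUES (1), (2), (3);
    -- a_id 1 has three children, a_id 2 has one, a_id 3 has none
    INSERT INTO table_b (a_id) VALUES (1), (1), (1), (2);
    """
)

queries = {
    "exists": """SELECT COUNT(*) FROM table_a hconn
                 WHERE EXISTS (SELECT * FROM table_b hipconn
                               WHERE hipconn.a_id = hconn.a_id)""",
    "comma_join": """SELECT COUNT(*) FROM table_a hconn, table_b hipconn
                     WHERE hconn.a_id = hipconn.a_id""",
    "ansi_join": """SELECT COUNT(*) FROM table_a hconn
                    JOIN table_b hipconn ON hconn.a_id = hipconn.a_id""",
}

counts = {name: conn.execute(sql).fetchone()[0] for name, sql in queries.items()}
print(counts)  # {'exists': 2, 'comma_join': 4, 'ansi_join': 4}
```

The counts only agree when every TABLE_A row has at most one TABLE_B row, so it is worth confirming with DBMS_XPLAN.DISPLAY_CURSOR which plan each query actually ran with.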
I have two tables, tableA (33M records) and tableB (270K records).
I want to delete all records in tableA that also exist in tableB, so I wrote the SQL statement below.
I think it should be rewritten, since it takes more than an hour to remove them all.
Is that usual for this kind of operation?
Note: the primary key of both tables is id.
delete from tableA where id in (select id from tableB);
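Here is the execution plan:
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 289K| 7341K| | 85624 (1)| 00:17:08 |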
| 1 | DELETE | tableA | | | | | |
| 2 | MERGE JOIN | | 289K| 7341K| | 85624 (1)| 00:17:08 |
| 3 | INDEX FULL SCAN | SYS_C0015397 | 36M| 455M| | 84050 (1)| 00:16:49 |
|* 4 | SORT JOIN | | 289K| 3670K| 11M| 1574 (1)| 00:00:19 |
| 5 | INDEX FAST FULL SCAN| SYS_C0015401 | 289K| 3670K| | 193 (2)| 00:00:03 |
---------------------------------------------------------------------------------------------------
That's an interesting execution plan. You don't see merge joins often because they usually require a sort of the data first, but in this case only one data set needs to be sorted because it's accessed via an index fast full scan (which returns unsorted data) instead of an index full scan.
Most of the cost is associated with reading the SYS_C0015397 index via an index full scan, and I'd guess that the optimiser has done the arithmetic for a pair of fast full scans and a hash join and rejected it. Still, I'd see if that can be hinted with:
delete /*+ no_use_merge(tablea) */ from ...
I'm not sure if that's enough to get a hash join, but see if the explain plan tries something other than a merge join there.
Is the join column on tablea unique, or a primary key?
Use EXISTS, or insert the rows you want to keep into a new table, drop the old table (tableA), and rename the new table to tableA.
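The "copy the rows to keep, drop, rename" approach can be sketched with Python's sqlite3 module to confirm that it leaves the same rows as the DELETE. The name tableA_keep and the sample data are hypothetical. Note that in Oracle, CREATE TABLE ... AS SELECT does not copy indexes, the primary key constraint, grants or triggers, so those must be recreated after the rename.

```python
import sqlite3


def setup():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tableA (id INTEGER PRIMARY KEY, val TEXT);
        CREATE TABLE tableB (id INTEGER PRIMARY KEY);
        INSERT INTO tableA VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
        INSERT INTO tableB VALUES (2), (4), (5);
        """
    )
    return conn


def delete_in_place(conn):
    # Delete every tableA row whose id also exists in tableB (EXISTS form).
    conn.execute(
        "DELETE FROM tableA WHERE EXISTS (SELECT 1 FROM tableB b WHERE b.id = tableA.id)"
    )
    return conn.execute("SELECT id, val FROM tableA ORDER BY id").fetchall()


def rebuild_and_swap(conn):
    # Copy only the rows to keep into a new table, drop the old one, rename the new one.
    conn.executescript(
        """
        CREATE TABLE tableA_keep AS
          SELECT a.* FROM tableA a
          WHERE NOT EXISTS (SELECT 1 FROM tableB b WHERE b.id = a.id);
        DROP TABLE tableA;
        ALTER TABLE tableA_keep RENAME TO tableA;
        """
    )
    return conn.execute("SELECT id, val FROM tableA ORDER BY id").fetchall()


print(delete_in_place(setup()))   # [(1, 'a'), (3, 'c')]
print(rebuild_and_swap(setup()))  # [(1, 'a'), (3, 'c')]
```

The swap writes only the surviving rows instead of deleting (and logging) each removed row individually. Whether it pays off depends on what fraction of tableA is removed.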