There is a table which has trades and its row count is 220 million, one of column is counterparty. The column is indexed. If I run a normal query like:
select *
from <table>
where counterparty = 'X'
The plan shows it uses index. Where as if I use group by on same column, it doesn't use index and does table scan. i.e.: for below query:
select counterparty, count(*)
from <table>
group by counterparty
Could you please advise, why it's not using the index for group by? FYI - I have already run the db stats.
FYI - the plan for 1st and second query is shown below:
Note - we are migrating data from Sybase to oracle, when I use same group by in Sybase with same indexes. The query uses indexes, but not in oracle.
First
Plan hash value: 350128866
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 2209 | 1469K| 914 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| FXCASHTRADE | 2209 | 1469K| 914 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | SCB_FXCASHTRADE_002 | 2209 | | 11 (0)| 00:00:01 |
Predicate Information (identified by operation id):
2 - access("COUNTERPARTY"='test')
Second
> Plan hash value: 2920872612
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 100K| 2151K| | 6558K (1)| 00:00:38 |
| 1 | HASH GROUP BY | | 100K| 2151K| 6780M| 6558K (1)| 00:00:38 |
| 2 | TABLE ACCESS FULL| FXCASHTRADE | 221M| 4643M| | 6034K (1)| 00:00:35 |
I am going to make an educated guess and say that counterparty is defined as a nullable column. As such, Oracle can't solely rely on the index to generate the results of your group by query, since null values need to be included in the results, but (Oracle) indexes don't include null values. With that in mind, a full table scan makes sense.
If there is no good reason for counterparty to be nullable, go ahead and make it not null. The execution plan should then change to use the index as expected.
Alternatively, if you can't make that change, but you don't care about null values for this particular query, you can tweak the query to filter our null values explicitly. This should also result in a better execution plan.
select counterparty, count(*)
from tbl
where counterparty is not null -- add this filter
group by counterparty
Note: I'm no Sybase expert, but I assume that indexes include null values. Oracle indexes do not include null values. That would explain the difference in execution plan between both databases.
Related
I have executed the below query but the indexes are not being used.
Following are the indexes available for the below tables.
I have provided the explain plan generated for the query.
Can some one please tell me why the indexes are not being used.
I have gathered the table statistics multiple times also.
wms_area_master - Index name: WMS_AREA_MASTER_PK - Index columns: DC_CODE, DC_AREA
wms_bin_master - WMS_BIN_MASTER_IDX - DC_CODE, DC_AREA
EXPLAIN PLAN FOR
SELECT *
from wms_area_master wam ,
wms_bin_master wbm
where WAM.DC_CODE = wBM.DC_CODE
and WAM.DC_AREA = wBM.DC_AREA;
select * from table(dbms_xplan.display);
Plan hash value: 2387754896
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 41079 | 12M| 252 (2)| 00:00:01 |
|* 1 | HASH JOIN | | 41079 | 12M| 252 (2)| 00:00:01 |
| 2 | TABLE ACCESS FULL| WMS_AREA_MASTER | 217 | 32984 | 4 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| WMS_BIN_MASTER | 41058 | 6214K| 248 (2)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("WAM"."DC_CODE"="WBM"."DC_CODE" AND
"WAM"."DC_AREA"="WBM"."DC_AREA")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- this is an adaptive plan
- 1 Sql Plan Directive used for this statement
Thanks
Your query doesn't appear to have any predicates, just join conditions, so there doesn't appear to be any reason to use an index here. Since you need to read all the data from both tables, the fastest way to do so will be to do table scans. Using an index isn't necessarily faster and doing a table scan isn't necessarily slower-- it depends on how much of the data you need to access.
If you had predicates in your query that restricted the rows that were returned, Oracle might find it advantageous to use an index on those columns. If your projection (the columns in the select) list were only columns that were part of an index rather than every column in the table, it is possible that Oracle would choose to do a full scan of the index rather than of the table assuming the index segment was meaningfully smaller than the table segment.
I'm a bit puzzled on why a full table scan is performed on a simple sql query that uses primary key to join:
SELECT max(pd.cre_dt)
FROM D00ZVZ01.ZVZ_PRINT_DOCUMENT pd
JOIN D00ZVZ01.ZVZ_BRIEF_REGISTRATIE br
ON pd.PRINT_DOCUMENT_ID = br.PRINT_DOCUMENT_ID
AND br.BRIEF_REG_GROEP_ID IN (2217, 2237, 2257);
Explain shows:
----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 24 | | 283K (2)| 00:00:15 |
| 1 | SORT AGGREGATE | | 1 | 24 | | | |
|* 2 | HASH JOIN | | 677K| 15M| 14M| 283K (2)| 00:00:15 |
| 3 | INLIST ITERATOR | | | | | | |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| ZVZ_BRIEF_REGISTRATIE | 694K| 6779K| | 17430 (1)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | ZVZ_BRIEF_REGISTRATIE_IF4 | 694K| | | 1469 (2)| 00:00:01 |
| 6 | TABLE ACCESS FULL | ZVZ_PRINT_DOCUMENT | 9567K| 127M| | 260K (1)| 00:00:14 |
----------------------------------------------------------------------------------------------------------------------------
Where pd.PRINT_DOCUMENT_ID is a primary key.
Despite millions of records, I wouldn't expect this query to be slow.
What is the reason, and how to improve?
Does this give you a different plan?
SELECT max(pd.cre_dt)
FROM D00ZVZ01.ZVZ_PRINT_DOCUMENT pd
JOIN D00ZVZ01.ZVZ_BRIEF_REGISTRATIE br
ON pd.PRINT_DOCUMENT_ID = br.PRINT_DOCUMENT_ID
WHERE br.BRIEF_REG_GROEP_ID IN (2217, 2237, 2257);
If so then you want to add BRIEF_REG_GROEP_ID to your index.
Probably last time statistics for ZVZ_PRINT_DOCUMENT were calculated when there were very few rows, so Oracle thinks that hash will be very small. Either try recalculating statistics or use hints:
SELECT /*+ leading(br pd) use_nl(pd)*/ max(pd.cre_dt)
FROM D00ZVZ01.ZVZ_PRINT_DOCUMENT pd
JOIN D00ZVZ01.ZVZ_BRIEF_REGISTRATIE br
ON pd.PRINT_DOCUMENT_ID = br.PRINT_DOCUMENT_ID
AND br.BRIEF_REG_GROEP_ID IN (2217, 2237, 2257);
The optimiser estimates that it will access 694K rows from ZVZ_BRIEF_REGISTRATIE for the three BRIEF_REG_GROEP_ID values, using an index, and then it needs to get the corresponding details from ZVZ_PRINT_DOCUMENT. 694K individual index lookups is a lot (consider that it has to go the the index for each one and then use the rowid to access the table, in a loop, 694K times), and it has calculated that it will take less effort to just read ZVZ_PRINT_DOCUMENT once and crunch the two sets in a single hash join. Index lookups are usually better for small volumes of data.
Is it any faster if you hint it to use the index?
Are the row estimates in the execution plan correct? How many rows are there in each table and how many will you actually read?
What is your Oracle version and do you have adaptive features enabled?
It's slightly odd that your query has no WHERE clause but instead a filtering condition is included in the inner join. I expect the optimiser will rewrite it as a WHERE predicate anyway, but I would still want to experiment to see whether it affected the plan.
I just make some queries for select data from my server. The query is:
SELECT
ROUND((SUM(clength)/1048576),2) AS logical_MB,
ROUND((SUM(plength) /1048576),2) AS physical_compr_MB,
ds_doc.archiveno,
ds_arch.archiveid
FROM ECR.ds_comp,
ECR.ds_doc,
ECR.ds_arch
WHERE ds_comp.docidno=ds_doc.docidno
AND ds_doc.archiveno =ds_arch.archiveno
GROUP BY ds_doc.archiveno,
ds_arch.archiveid;
result what is expecting is :
9708,24 9704,93 9 Vee3 0,009255342
13140,55 12682,93 10 Vf5 0,012095385
104533,94 89183,02 3 Mdf4 0,085051556
72346,34 48290,63 7 Sds2 0,046053534
But this query almost take one day. Any idea for optimize this query please?
You provide close to no information that is required to help with performance problem, so only a general checklist can be provided
Check the Query
The query does not qualify the columns clengthand plength so please check if they are defined in the table ds_comp - if not, maybe you do not need to join to this table at all...
Also I assume that docidno is a primary key of ds_doc and archiveno is PK of ds_arch. If not you query will work, but you will get a different result as you expect due to duplication caused by the join (this may also cause excesive elapsed time)!
Verify the Execution Plan
Produce the execution plan for your query in text form (to be able to post it) as follows
EXPLAIN PLAN SET STATEMENT_ID = '<sometag>' into plan_table FOR
... your query here ...
SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', '<sometag>','ALL'));
Remember that you are joining complete tables (not only few rows for some ID), so if you see INDEX ACCESS or NESTED LOOP there is a problem that explains the long runtime.
You want to see only HASH JOIN and FULL TABLE SCAN in your plan.
Index Access
Contrary to some recommendations in other answers if you want to profit from Index definition you do not need indexes on join columns (as explained above). What you can do is to cover all required attributes in indexes and perform the query using only indexes and ommit the table access at all. This will help if the tables are bright, i.e. the row size is large.
This definition will be needed
create index ds_comp_idx1 on ds_comp (docidno,clength,plength);
create index ds_doc_idx1 on ds_doc (docidno,archiveno);
create index ds_arch_idx1 on ds_arch (archiveno,archiveid);
and you will receive this plan
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1119K| 97M| 908 (11)| 00:00:01 |
| 1 | HASH GROUP BY | | 1119K| 97M| 908 (11)| 00:00:01 |
|* 2 | HASH JOIN | | 1119K| 97M| 831 (3)| 00:00:01 |
|* 3 | HASH JOIN | | 1001 | 52052 | 5 (0)| 00:00:01 |
| 4 | INDEX FULL SCAN | DS_ARCH_IDX1 | 11 | 286 | 1 (0)| 00:00:01 |
| 5 | INDEX FAST FULL SCAN| DS_DOC_IDX1 | 1001 | 26026 | 4 (0)| 00:00:01 |
| 6 | INDEX FAST FULL SCAN | DS_COMP_IDX1 | 1119K| 41M| 818 (2)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("C"."DOCIDNO"="D"."DOCIDNO")
3 - access("D"."ARCHIVENO"="A"."ARCHIVENO")
Note the INDEX FULL SCAN and INDEX FAST FULL SCAN which means you are scanning the data from the index only and you do not need to perform the full table scan.
Use Parallel Option
With your rather simple query there is not much option to improve something. What works always is to deploy a parallel query using the /*+ PARALLEL(N) */ hint.
The precontition is that your database is configured for this option and you have hardware that can deploy it.
Rewrite using explicit joins:
SELECT
ROUND((SUM(clength)/1048576),2) AS logical_MB,
ROUND((SUM(plength) /1048576),2) AS physical_compr_MB,
d.archiveno,
a.archiveid
FROM ECR.ds_comp c
INNER JOIN ECR.ds_doc d ON c.docidno=d.docidno
INNER JOIN ECR.ds_arch a ON d.archiveno=a.archiveno
GROUP BY d.archiveno,
a.archiveid;
Check indexes exist on join columns c.docidno, d.docidno, d.archiveno, a.archiveno
I have this query with a windows function that I have a hard time getting rid of the sort in Oracle. I am by no means an Oracle Expert but my company's application needs to be compatible in both Oracle and SQL Server and we don't really have an Oracle expert so I need help.
Here's is the query in question:
SELECT
A.TYP_0,A.ACCNUM_0,A.NUM_0,A.DUDLIG_0,A.NUMHDU_0
,A.DATEVT_0
,A.PAYDAT_0
,A.BPRTYP_0
,A.CPY_0
,A.FCY_0
,A.BPR_0
,A.LIG_0
,A.SAC_0
,A.SNS_0
,A.AMTCUR_0
,A.AMTLOC_0
,A.PAYCUR_0
,A.PAYLOC_0
,MIN(DATEVT_0) over (PARTITION BY A.TYP_0,A.ACCNUM_0,A.NUM_0,A.DUDLIG_0 ORDER BY NUMHDU_0 ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) AS MINFOLLOWING
FROM SEED.HISTODUD A
WHERE EXTRACT(YEAR FROM DATEVT_0) > 1800
I have created an index for this just like I did in SQL Server but had to put the INCLUDE fields into the index because that option didn't exist in Oracle
CREATE UNIQUE INDEX X3ARAP_IDX ON SEED.HISTODUD
(
TYP_0,
,ACCNUM_0
,NUM_0
,DUDLIG_0
,NUMHDU_0
,DATEVT_0
,PAYDAT_0
,BPRTYP_0
,CPY_0
,FCY_0
,BPR_0
,LIG_0
,SAC_0
,SNS_0
,AMTCUR_0
,AMTLOC_0
,PAYCUR_0
,PAYLOC_0
);
Here is the execution plan:
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3728420768
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 690 | 71070 | 59 (2)| 00:00:01 |
| 1 | WINDOW SORT | | 690 | 71070 | 59 (2)| 00:00:01 |
|* 2 | INDEX FAST FULL SCAN| X3ARAP_IDX | 690 | 71070 | 58 (0)| 00:00:01 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------
2 - filter(EXTRACT(YEAR FROM INTERNAL_FUNCTION("DATEVT_0"))>1800)
We have a customer that has a really huge database and the sort used to create a temporary table but it seems to be no longer doing so (I tried to dropped the index and try the old query but I don't see a temp table on it anymore for some weird reason) but I just can't get rid of the sort.
I tried to replace the MIN by a ROW_NUMBER and get rid of the condition on ROWS to see if that was the issue but I still get the same execution plan.
I want to index this query clause -- note that the text is static.
SELECT * FROM tbl where flags LIKE '%current_step: complete%'
To re-iterate, the current_step: complete never changes in the query. I want to build an index that will effectively pre-calculate this boolean value, thereby preventing full table scans...
I would prefer not to add a boolean column to store the pre-calculated value as this would necessitate code changes in the application....
If you don't want to change the query, and it isn't just an issue of nor changing the data maintenance (in which case a virtual column and/or index would do the job), you could use a materialised view that applies the filter, and let query rewrite take case of using that instead of the real table. Which may well be overkill but is an option.
The original plan for a mocked-up version:
explain plan for
SELECT * FROM tbl where flags LIKE '%current_step: complete%';
select * from table(dbms_xplan.display);
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 60 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TBL | 2 | 60 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("FLAGS" IS NOT NULL AND "FLAGS" LIKE '%current_step:
complete%')
A materialised view that will only hold the records your query is interested in (this is a simple example but you'd need to decide how to refresh and add a log if needed):
create materialized view mvw
enable query rewrite as
SELECT * FROM tbl where flags LIKE '%current_step: complete%';
Now your query hits the materialised view, thanks to query rewrite:
explain plan for
SELECT * FROM tbl where flags LIKE '%current_step: complete%';
select * from table(dbms_xplan.display);
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 60 | 3 (0)| 00:00:01 |
| 1 | MAT_VIEW REWRITE ACCESS FULL| MVW | 2 | 60 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
But any other query will still use the original table:
explain plan for
SELECT * FROM tbl where flags LIKE '%current_step: working%';
select * from table(dbms_xplan.display);
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 27 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TBL | 1 | 27 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("FLAGS" LIKE '%current_step: success%' AND "FLAGS" IS NOT
NULL)
Of course a virtual index would be simpler if you are allowed to modify the query...
A full text search index might be what you are looking for.
There are a few ways you can implement this:
Oracle has Oracle Text where you can define which type of full text index you want.
Lucene is a Java full text search framework.
Solr is a server product that provides full text search.
I would prefer not to add a boolean column to store the pre-calculated value as this would necessitate code changes in the application
There are two ways I can suggest :
1.
If you are on 11g and up, you could have a VIRTUAL COLUMN always generated as 1 when the value is complete else 0. All you need to do then :
select * from table where virtual_column = 1
To improve performance, you could have an index over it, which is equivalent to a function-based index.
2.
Update : Perhaps, I should be more clear with my second point : source
There are instances where Oracle will use an index to resolve a like with the pattern of '%text%'. If the query can be resolved without having to go back to the table (rowid lookup), the index may be chosen. Example:
select distinct first_nm from person where first_nm like '%EV%';
And in above case, Oracle will do an index fast full scan - a full scan of the smaller index.