I have two tables, tableA (33M records) and tableB (270K records).
I want to delete all records in tableA that also exist in tableB, so I wrote the SQL statement below.
I think it should be modified, since it takes more than 1 hour to remove them all.
Do you have any idea whether that is usual for this kind of operation?
Note: the primary key of both tables is id.
delete from tableA where id in (select id from tableB);
Here is the execution plan for the statement:
| 0 | DELETE STATEMENT | | 289K| 7341K| | 85624 (1)| 00:17:08 |
| 1 | DELETE | tableA | | | | | |
| 2 | MERGE JOIN | | 289K| 7341K| | 85624 (1)| 00:17:08 |
| 3 | INDEX FULL SCAN | SYS_C0015397 | 36M| 455M| | 84050 (1)| 00:16:49 |
|* 4 | SORT JOIN | | 289K| 3670K| 11M| 1574 (1)| 00:00:19 |
| 5 | INDEX FAST FULL SCAN| SYS_C0015401 | 289K| 3670K| | 193 (2)| 00:00:03 |
---------------------------------------------------------------------------------------------------
That's an interesting execution plan. You don't see merge joins often because they usually require a sort of the data first, but in this case only one data set needs to be sorted because it's accessed via an index fast full scan (which returns unsorted data) instead of an index full scan.
Most of the cost is associated with reading the SYS_C0015397 index via an index full scan, and I'd guess that the optimiser has done the arithmetic for a pair of fast full scans and a hash join and rejected it. Still, I'd see if that can be hinted with:
delete /*+ no_use_merge(tablea) */ from ...
I'm not sure if that's enough to get a hash join, but see if the explain plan tries something other than a merge join there.
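To see whether the hint changes anything without actually running the delete, one option is to explain the hinted statement and display the stored plan. This is only a sketch: it assumes the default PLAN_TABLE is available, and the hint may need adjusting (or have no effect) depending on how the optimizer transforms the subquery.

```sql
-- Explain the hinted delete without executing it.
EXPLAIN PLAN FOR
delete /*+ no_use_merge(tablea) */ from tableA
where id in (select id from tableB);

-- Show the plan that was just written to PLAN_TABLE.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

If operation 2 still reads MERGE JOIN, the hint was not applied as written.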
Is the join column on tablea covered by a unique or primary key constraint?
Use EXISTS instead of IN, or insert the rows you want to keep into a new table, drop the old table (tableA) and rename the new table to tableA.
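The two suggestions could be sketched roughly as follows. The name tableA_keep is hypothetical, and the second variant assumes you can afford to drop and recreate tableA. Since only about 270K of 33M rows are being removed, copying the remaining rows is not guaranteed to be faster than the delete, so treat it as something to measure.

```sql
-- Variant 1: the same delete written with EXISTS.
DELETE FROM tableA a
WHERE EXISTS (SELECT 1 FROM tableB b WHERE b.id = a.id);

-- Variant 2: copy the rows to keep, then swap the tables.
-- tableA_keep is a hypothetical name.
CREATE TABLE tableA_keep AS
SELECT a.*
FROM tableA a
WHERE NOT EXISTS (SELECT 1 FROM tableB b WHERE b.id = a.id);

DROP TABLE tableA;
RENAME tableA_keep TO tableA;

-- The primary key, indexes, grants and triggers of the original table
-- are not carried over by CREATE TABLE AS SELECT and must be recreated.
```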
I'm a bit puzzled as to why a full table scan is performed for a simple SQL query that joins on a primary key:
SELECT max(pd.cre_dt)
FROM D00ZVZ01.ZVZ_PRINT_DOCUMENT pd
JOIN D00ZVZ01.ZVZ_BRIEF_REGISTRATIE br
ON pd.PRINT_DOCUMENT_ID = br.PRINT_DOCUMENT_ID
AND br.BRIEF_REG_GROEP_ID IN (2217, 2237, 2257);
Explain shows:
----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 24 | | 283K (2)| 00:00:15 |
| 1 | SORT AGGREGATE | | 1 | 24 | | | |
|* 2 | HASH JOIN | | 677K| 15M| 14M| 283K (2)| 00:00:15 |
| 3 | INLIST ITERATOR | | | | | | |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| ZVZ_BRIEF_REGISTRATIE | 694K| 6779K| | 17430 (1)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | ZVZ_BRIEF_REGISTRATIE_IF4 | 694K| | | 1469 (2)| 00:00:01 |
| 6 | TABLE ACCESS FULL | ZVZ_PRINT_DOCUMENT | 9567K| 127M| | 260K (1)| 00:00:14 |
----------------------------------------------------------------------------------------------------------------------------
Where pd.PRINT_DOCUMENT_ID is a primary key.
Despite millions of records, I wouldn't expect this query to be slow.
What is the reason, and how can I improve it?
Does this give you a different plan?
SELECT max(pd.cre_dt)
FROM D00ZVZ01.ZVZ_PRINT_DOCUMENT pd
JOIN D00ZVZ01.ZVZ_BRIEF_REGISTRATIE br
ON pd.PRINT_DOCUMENT_ID = br.PRINT_DOCUMENT_ID
WHERE br.BRIEF_REG_GROEP_ID IN (2217, 2237, 2257);
If so then you want to add BRIEF_REG_GROEP_ID to your index.
The statistics for ZVZ_PRINT_DOCUMENT were probably last gathered when the table had very few rows, so Oracle thinks that the hash join will be very small. Either try regathering the statistics or use hints:
SELECT /*+ leading(br pd) use_nl(pd)*/ max(pd.cre_dt)
FROM D00ZVZ01.ZVZ_PRINT_DOCUMENT pd
JOIN D00ZVZ01.ZVZ_BRIEF_REGISTRATIE br
ON pd.PRINT_DOCUMENT_ID = br.PRINT_DOCUMENT_ID
AND br.BRIEF_REG_GROEP_ID IN (2217, 2237, 2257);
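For the statistics route, a minimal sketch using the DBMS_STATS package is shown here. It assumes you have the privileges to gather statistics on the D00ZVZ01 schema; the second statement only reports what the optimizer currently believes about the two tables.

```sql
-- Regather optimizer statistics for the table and its indexes.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'D00ZVZ01',
    tabname => 'ZVZ_PRINT_DOCUMENT',
    cascade => TRUE   -- also gather statistics for the table's indexes
  );
END;
/

-- Check the recorded row counts and when they were last gathered.
SELECT table_name, num_rows, last_analyzed
FROM all_tables
WHERE owner = 'D00ZVZ01'
AND table_name IN ('ZVZ_PRINT_DOCUMENT', 'ZVZ_BRIEF_REGISTRATIE');
```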
The optimiser estimates that it will access 694K rows from ZVZ_BRIEF_REGISTRATIE for the three BRIEF_REG_GROEP_ID values, using an index, and then it needs to get the corresponding details from ZVZ_PRINT_DOCUMENT. 694K individual index lookups is a lot (consider that it has to go to the index for each one and then use the rowid to access the table, in a loop, 694K times), and it has calculated that it will take less effort to just read ZVZ_PRINT_DOCUMENT once and crunch the two sets in a single hash join. Index lookups are usually better for small volumes of data.
Is it any faster if you hint it to use the index?
Are the row estimates in the execution plan correct? How many rows are there in each table and how many will you actually read?
What is your Oracle version and do you have adaptive features enabled?
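One way to answer the questions about row estimates, version and adaptive features is sketched below. It assumes you can read the V$ views and run the query once in your own session; the gather_plan_statistics hint makes Oracle record actual row counts for that execution.

```sql
-- Run the query once while collecting runtime row counts.
SELECT /*+ gather_plan_statistics */ max(pd.cre_dt)
FROM D00ZVZ01.ZVZ_PRINT_DOCUMENT pd
JOIN D00ZVZ01.ZVZ_BRIEF_REGISTRATIE br
ON pd.PRINT_DOCUMENT_ID = br.PRINT_DOCUMENT_ID
AND br.BRIEF_REG_GROEP_ID IN (2217, 2237, 2257);

-- In the same session: estimated rows (E-Rows) next to actual rows (A-Rows).
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'ALLSTATS LAST'));

-- Oracle version.
SELECT banner FROM v$version;

-- Adaptive optimizer settings (parameter names differ between releases).
SELECT name, value
FROM v$parameter
WHERE name LIKE 'optimizer_adaptive%';
```

In SQL*Plus, SERVEROUTPUT should be off when calling DISPLAY_CURSOR without a sql_id, otherwise the "last statement" it reports is not your query.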
It's slightly odd that your query has no WHERE clause but instead a filtering condition is included in the inner join. I expect the optimiser will rewrite it as a WHERE predicate anyway, but I would still want to experiment to see whether it affected the plan.
I am running some queries to select data from my server. The query is:
SELECT
ROUND((SUM(clength)/1048576),2) AS logical_MB,
ROUND((SUM(plength) /1048576),2) AS physical_compr_MB,
ds_doc.archiveno,
ds_arch.archiveid
FROM ECR.ds_comp,
ECR.ds_doc,
ECR.ds_arch
WHERE ds_comp.docidno=ds_doc.docidno
AND ds_doc.archiveno =ds_arch.archiveno
GROUP BY ds_doc.archiveno,
ds_arch.archiveid;
The expected result is:
9708,24 9704,93 9 Vee3 0,009255342
13140,55 12682,93 10 Vf5 0,012095385
104533,94 89183,02 3 Mdf4 0,085051556
72346,34 48290,63 7 Sds2 0,046053534
But this query takes almost a day. Any idea how to optimize it, please?
You provide almost none of the information required to help with a performance problem, so only a general checklist can be provided.
Check the Query
The query does not qualify the columns clength and plength, so please check whether they are defined in the table ds_comp. If not, maybe you do not need to join to this table at all.
I also assume that docidno is the primary key of ds_doc and archiveno is the primary key of ds_arch. If not, your query will work, but you will get a different result than you expect because of the duplication caused by the join (this may also cause excessive elapsed time)!
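Both assumptions can be checked with a few dictionary and data queries. This is a sketch that assumes the tables live in the ECR schema with unquoted (upper-case) names, as the question's FROM clause suggests.

```sql
-- Which of the three tables actually contain clength and plength?
SELECT table_name, column_name
FROM all_tab_columns
WHERE owner = 'ECR'
AND table_name IN ('DS_COMP', 'DS_DOC', 'DS_ARCH')
AND column_name IN ('CLENGTH', 'PLENGTH');

-- Any row returned here means docidno is not unique in ds_doc.
SELECT docidno, COUNT(*) AS cnt
FROM ECR.ds_doc
GROUP BY docidno
HAVING COUNT(*) > 1;

-- Any row returned here means archiveno is not unique in ds_arch.
SELECT archiveno, COUNT(*) AS cnt
FROM ECR.ds_arch
GROUP BY archiveno
HAVING COUNT(*) > 1;
```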
Verify the Execution Plan
Produce the execution plan for your query in text form (so that you can post it) as follows:
EXPLAIN PLAN SET STATEMENT_ID = '<sometag>' into plan_table FOR
... your query here ...
SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', '<sometag>','ALL'));
Remember that you are joining complete tables (not just a few rows for some ID), so if you see INDEX ACCESS or NESTED LOOP there is a problem that explains the long runtime.
You want to see only HASH JOIN and FULL TABLE SCAN in your plan.
Index Access
Contrary to some recommendations in other answers, you do not need indexes on the join columns (as explained above) to profit from indexing. What you can do is cover all required attributes in indexes, so that the query is answered from the indexes alone and the table access is omitted entirely. This will help if the tables are wide, i.e. the row size is large.
These index definitions will be needed:
create index ds_comp_idx1 on ds_comp (docidno,clength,plength);
create index ds_doc_idx1 on ds_doc (docidno,archiveno);
create index ds_arch_idx1 on ds_arch (archiveno,archiveid);
and you will receive this plan
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1119K| 97M| 908 (11)| 00:00:01 |
| 1 | HASH GROUP BY | | 1119K| 97M| 908 (11)| 00:00:01 |
|* 2 | HASH JOIN | | 1119K| 97M| 831 (3)| 00:00:01 |
|* 3 | HASH JOIN | | 1001 | 52052 | 5 (0)| 00:00:01 |
| 4 | INDEX FULL SCAN | DS_ARCH_IDX1 | 11 | 286 | 1 (0)| 00:00:01 |
| 5 | INDEX FAST FULL SCAN| DS_DOC_IDX1 | 1001 | 26026 | 4 (0)| 00:00:01 |
| 6 | INDEX FAST FULL SCAN | DS_COMP_IDX1 | 1119K| 41M| 818 (2)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("C"."DOCIDNO"="D"."DOCIDNO")
3 - access("D"."ARCHIVENO"="A"."ARCHIVENO")
Note the INDEX FULL SCAN and INDEX FAST FULL SCAN, which mean that you are reading the data from the indexes only and do not need to perform a full table scan.
Use Parallel Option
With your rather simple query there are not many options for improvement. What always works is to deploy a parallel query using the /*+ PARALLEL(N) */ hint.
The precondition is that your database is configured for this option and you have hardware that can support it.
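Applied to the query from the question, the hint could look like the sketch below. The degree of 4 is an arbitrary placeholder, and qualifying clength and plength with ds_comp is an assumption that matches the index definitions above.

```sql
-- Statement-level parallel hint; the degree (4) is only an example value.
SELECT /*+ PARALLEL(4) */
ROUND((SUM(ds_comp.clength)/1048576),2) AS logical_MB,
ROUND((SUM(ds_comp.plength)/1048576),2) AS physical_compr_MB,
ds_doc.archiveno,
ds_arch.archiveid
FROM ECR.ds_comp,
ECR.ds_doc,
ECR.ds_arch
WHERE ds_comp.docidno = ds_doc.docidno
AND ds_doc.archiveno = ds_arch.archiveno
GROUP BY ds_doc.archiveno,
ds_arch.archiveid;
```

If parallel execution is actually used, the plan should contain PX operations (for example PX COORDINATOR).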
Rewrite using explicit joins:
SELECT
ROUND((SUM(clength)/1048576),2) AS logical_MB,
ROUND((SUM(plength) /1048576),2) AS physical_compr_MB,
d.archiveno,
a.archiveid
FROM ECR.ds_comp c
INNER JOIN ECR.ds_doc d ON c.docidno=d.docidno
INNER JOIN ECR.ds_arch a ON d.archiveno=a.archiveno
GROUP BY d.archiveno,
a.archiveid;
Check indexes exist on join columns c.docidno, d.docidno, d.archiveno, a.archiveno
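The existing indexes and their column order can be listed from the data dictionary. A sketch, assuming the tables are owned by ECR and were created with unquoted names:

```sql
-- List every indexed column of the three tables, in index column order.
SELECT table_name, index_name, column_name, column_position
FROM all_ind_columns
WHERE table_owner = 'ECR'
AND table_name IN ('DS_COMP', 'DS_DOC', 'DS_ARCH')
ORDER BY table_name, index_name, column_position;
```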
I am trying to run the query below. I am joining the tables on the indexed field (hdr.M_KEYID),
but I still see TABLE ACCESS FULL in the explain plan.
Can you please let me know where I went wrong and how this can be changed to make it faster?
Below are the indexes on each table
Indexes on MY_H2S
M_KEY0
M_KEY1
Indexes of MY_HBS
M_DATE
M_KEYID
M_DATE
Query:
select
bdy.M_DATE as M_DATE,
M_KEY0 as M_KEY0,
M_KEY1 as M_KEY1 ,
(M_B_F+M_A_F)/2 as M_PRICE,
bdy.M_DATE as M_DATE
from
MY_H2S hdr left join MY_HBS bdy on hdr.M_KEYID = bdy.M_KEYID
Explain Plan :
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 182K| 12M| 458 (1)| 00:00:06 |
|* 1 | HASH JOIN OUTER | | 182K| 12M| 458 (1)| 00:00:06 |
| 2 | TABLE ACCESS FULL| MY_H2S | 124 | 3968 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| MY_HBS | 182K| 7288K| 455 (1)| 00:00:06 |
----------------------------------------------------------------------------------
This is too long for a comment.
Personally, I would expect Oracle to use the MY_HBS(M_KEYID) for the JOIN. However, there are mitigating circumstances:
The table is small, fitting (presumably) on one page.
The index does not cover the query (you are selecting other columns).
The optimizer is balancing multiple considerations. Linearly scanning a list of 124 records is not necessarily worse than loading an index, traversing the index, and then loading the (single) data page.
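If you want to see what the index-driven alternative would cost, you could force it with hints and compare the two plans. This is only an experiment, not a recommendation: the hints assume the aliases from the question, INDEX(bdy) leaves the choice of index to the optimizer, and the duplicated M_DATE column has been dropped.

```sql
-- Force a nested loop from MY_H2S into MY_HBS through an index on MY_HBS.
EXPLAIN PLAN FOR
select /*+ leading(hdr) use_nl(bdy) index(bdy) */
bdy.M_DATE as M_DATE,
M_KEY0 as M_KEY0,
M_KEY1 as M_KEY1,
(M_B_F+M_A_F)/2 as M_PRICE
from
MY_H2S hdr left join MY_HBS bdy on hdr.M_KEYID = bdy.M_KEYID;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

If the hinted plan shows a higher cost than the 458 of the hash join, that is the reason the optimizer preferred the full scans.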
There is a table of trades with a row count of 220 million; one of its columns is counterparty, and that column is indexed. If I run a normal query like:
select *
from <table>
where counterparty = 'X'
the plan shows that it uses the index. Whereas if I use GROUP BY on the same column, it doesn't use the index and does a table scan, i.e. for the query below:
select counterparty, count(*)
from <table>
group by counterparty
Could you please advise, why it's not using the index for group by? FYI - I have already run the db stats.
FYI - the plan for 1st and second query is shown below:
Note: we are migrating data from Sybase to Oracle. When I use the same GROUP BY in Sybase with the same indexes, the query uses the indexes, but not in Oracle.
First
Plan hash value: 350128866
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 2209 | 1469K| 914 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| FXCASHTRADE | 2209 | 1469K| 914 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | SCB_FXCASHTRADE_002 | 2209 | | 11 (0)| 00:00:01 |
Predicate Information (identified by operation id):
2 - access("COUNTERPARTY"='test')
Second
Plan hash value: 2920872612
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 100K| 2151K| | 6558K (1)| 00:00:38 |
| 1 | HASH GROUP BY | | 100K| 2151K| 6780M| 6558K (1)| 00:00:38 |
| 2 | TABLE ACCESS FULL| FXCASHTRADE | 221M| 4643M| | 6034K (1)| 00:00:35 |
I am going to make an educated guess and say that counterparty is defined as a nullable column. As such, Oracle can't rely solely on the index to generate the results of your group by query, since null values need to be included in the results, but Oracle B-tree indexes don't store entries for rows whose indexed columns are all null. With that in mind, a full table scan makes sense.
If there is no good reason for counterparty to be nullable, go ahead and make it not null. The execution plan should then change to use the index as expected.
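A minimal sketch of that change, using the table name FXCASHTRADE from the posted plans. It assumes the column currently contains no nulls (the ALTER fails otherwise) and that you are allowed to alter the table.

```sql
-- The constraint can only be added if this returns 0.
SELECT COUNT(*) AS null_counterparties
FROM fxcashtrade
WHERE counterparty IS NULL;

-- Declare the column NOT NULL so the index is known to cover every row.
ALTER TABLE fxcashtrade MODIFY (counterparty NOT NULL);

-- Re-check the plan for the group by query.
EXPLAIN PLAN FOR
select counterparty, count(*)
from fxcashtrade
group by counterparty;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```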
Alternatively, if you can't make that change, but you don't care about null values for this particular query, you can tweak the query to filter out null values explicitly. This should also result in a better execution plan.
select counterparty, count(*)
from tbl
where counterparty is not null -- add this filter
group by counterparty
Note: I'm no Sybase expert, but I assume that its indexes include null values. Oracle B-tree indexes do not index rows where every key column is null. That would explain the difference in execution plan between the two databases.
I have a query I would like to optimize. This is the query:
SELECT CONN.connNum, MIN(INTER.walkingDistanceMinutes) AS minimalWalkingDistance
FROM INTER
INNER JOIN CONN ON (INTER.IDConn_FK = CONN.IDConn)
GROUP BY INTER.IDConn_FK, CONN.connNum;
These are the explain plan results:
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 171 | 7 (15)| 00:00:01 |
| 1 | HASH GROUP BY | | 3 | 171 | 7 (15)| 00:00:01 |
| 2 | NESTED LOOPS | | 3 | 171 | 6 (0)| 00:00:01 |
| 3 | NESTED LOOPS | | 3 | 171 | 6 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL | INTER | 3 | 78 | 3 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | SYS_C002012172 | 1 | | 0 (0)| 00:00:01 |
| 6 | TABLE ACCESS BY INDEX ROWID| CONN | 1 | 31 | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------
I've tried using more specific SELECTs, but the results are the same (something like FROM (SELECT IDConn_FK, walkingDistanceMinutes FROM INTER) I etc). Can you please show me a way to get the cost down?
It would be very useful to know whether IDConn_FK and connNum are unique in their tables, because this changes a lot.
If they're both unique in their tables, you don't need to group the results, because there can't be multiple occurrences of the same value of connNum: there is only a single value of walkingDistanceMinutes corresponding to each connNum. Removing the unneeded GROUP BY would be the right optimization here.
If only connNum is unique in CONN, then one way to optimize this query may be to limit the resources needed to sort the elements during the MIN evaluation. This can be done with a subquery that also limits the number of rows involved in the join. Here you can use query #1.
If only IDConn_FK is unique, then the query is fine as it is. Query #2 may help you a little, but not much.
If neither of the two columns is unique, you can still try to limit the number of rows involved in the join through a subquery, as in case #2, but you will also need to evaluate the MIN once more, because you need it per connNum (which lives in table CONN). Don't assume that grouping twice will be more expensive than grouping once: this is a divide-and-conquer approach (split a complex problem into simpler problems, then recombine their results to get the solution to the complex problem). Here you could use query #2.
Query #1:
SELECT CONN.connNum, minimalWalkingDistance
FROM (
select INTER.IDConn_FK as IDConn, MIN(INTER.walkingDistanceMinutes) AS minimalWalkingDistance
from INTER
GROUP BY INTER.IDConn_FK
) inter
JOIN CONN using (IDConn)
Query #2:
SELECT CONN.connNum, MIN(INTER.minimalWalkingDistance) AS minimalWalkingDistance
FROM (
select INTER.IDConn_FK as IDConn, MIN(INTER.walkingDistanceMinutes) AS minimalWalkingDistance
from INTER
GROUP BY INTER.IDConn_FK
) inter
JOIN CONN using (IDConn)
group by CONN.connNum
And one last thing: don't treat the execution plan cost as gospel. Queries with a higher cost are often more efficient than others with a lower cost, especially when there are many joins and aggregations.
For your size of data, there is no real optimization possible. For larger data, Oracle should choose other execution paths. You might try this:
select c.connNum,
(select min(i.walkingDistanceMinutes)
from inter i
where i.IDConn_FK = c.idConn
) as minimalWalkingDistance
from conn c ;
I'm not 100% sure this is exactly the same query. I'm assuming that idConn is the primary key on the conn table.
Create a unique index on Conn (IDConn, connNum).
This should remove the last line of the query plan, as the index can satisfy all needed columns.
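As a sketch, with a hypothetical index name (conn_idconn_connnum_ux), followed by a re-check of the plan for the original query:

```sql
-- conn_idconn_connnum_ux is a hypothetical name; pick one that fits your conventions.
CREATE UNIQUE INDEX conn_idconn_connnum_ux ON CONN (IDConn, connNum);

-- Re-explain the original query to see whether the CONN table access is gone.
EXPLAIN PLAN FOR
SELECT CONN.connNum, MIN(INTER.walkingDistanceMinutes) AS minimalWalkingDistance
FROM INTER
INNER JOIN CONN ON (INTER.IDConn_FK = CONN.IDConn)
GROUP BY INTER.IDConn_FK, CONN.connNum;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Whether the optimizer picks the new index over the existing primary key index is not guaranteed, especially on a table this small.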