Oracle HASH_JOIN_RIGHT_SEMI performance - sql

Here is my query:
SELECT si.*
FROM SHIPMENT_ITEMS si
WHERE ID IN ( SELECT ID FROM id_map WHERE code = 'A' )
AND LAST_UPDATED BETWEEN TO_DATE('20150102','YYYYMMDD') - 1 AND TO_DATE('20150103','YYYYMMDD')
SHIPMENT_ITEMS is a very large table (10.1 TB) and is partitioned on the ID column; id_map is a very small table (12 rows, 3 columns). This query goes through HASH_JOIN_RIGHT_SEMI and takes a very long time.
If I replace the subquery with hard-coded values, it performs a lot better:
SELECT si.*
FROM SHIPMENT_ITEMS si
WHERE ID IN (1,2,3 )
AND LAST_UPDATED BETWEEN TO_DATE('20150102','YYYYMMDD') - 1 AND TO_DATE('20150103','YYYYMMDD')
I cannot drop the subquery, because that would mean hard-coding the values.
Given that id_map is a very small table, I expect both queries to perform similarly. Why does the first one take so much longer?
I'm trying to understand why it performs so badly.
I expected dynamic partition pruning to happen here, and I can't work out why it isn't happening:
https://docs.oracle.com/cd/E11882_01/server.112/e25523/part_avail.htm#BABHDCJG

Try the NO_UNNEST hint:
SELECT si.*
FROM SHIPMENT_ITEMS si
WHERE ID IN ( SELECT /*+ NO_UNNEST */ ID FROM id_map WHERE code = 'A' )
AND LAST_UPDATED BETWEEN TO_DATE('20150102','YYYYMMDD') - 1 AND TO_DATE('20150103','YYYYMMDD')
With this hint the CBO does not unnest the subquery into a join; it applies it as a filter instead.
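Whether the hint changes anything can be checked from the execution plan. The following is a minimal sketch, assuming the tables from the question and the default plan table. On a partitioned table, the Pstart and Pstop columns of the output show which partitions are accessed, and the Operation column shows whether the semi-join is still a hash join or has become a filter.

```sql
-- Generate the plan for the hinted query without executing it.
EXPLAIN PLAN FOR
SELECT si.*
FROM SHIPMENT_ITEMS si
WHERE ID IN ( SELECT /*+ NO_UNNEST */ ID FROM id_map WHERE code = 'A' )
AND LAST_UPDATED BETWEEN TO_DATE('20150102','YYYYMMDD') - 1 AND TO_DATE('20150103','YYYYMMDD');

-- Display the plan, including the Pstart/Pstop partition columns.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Running the same two statements for the unhinted query and for the hard-coded IN list gives three plans to compare side by side.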

Instead of the IN operator, use EXISTS and compare the query performance:
SELECT si.*
FROM SHIPMENT_ITEMS si
WHERE EXISTS ( SELECT 1 FROM id_map map WHERE map.code = 'A' AND map.ID = si.ID )
AND LAST_UPDATED BETWEEN TO_DATE('20150102','YYYYMMDD') - 1 AND TO_DATE('20150103','YYYYMMDD')

Related

Self join SQL is taking too much time to execute

The SQL below takes too long to execute. I don't know what I am doing wrong, although it does return the correct result. Can I simplify it further?
This is an Oracle database, and the jmc_job_step table contains a huge number of records.
select *
from
jmc_job_run_id jobrunid0_
inner join
jmc_job_step jobsteps1_
on jobrunid0_.id=jobsteps1_.job_run_id
where
(
jobsteps1_.creation_date in (
select
min(jobstep2_.creation_date)
from
jmc_job_step jobstep2_
where
jobrunid0_.id=jobstep2_.job_run_id
group by
jobstep2_.job_run_id ,
jobstep2_.job_step_no
)
or jobsteps1_.job_step_progress_value in (
select
max(jobstep3_.job_step_progress_value)
from
jmc_job_step jobstep3_
where
jobrunid0_.id=jobstep3_.job_run_id
group by
jobstep3_.job_run_id ,
jobstep3_.job_step_no
)
)
order by
jobrunid0_.job_start_time desc
These predicates are useless: they say "I don't care what those columns contain", yet you still make the database engine check the values:
(
upper(jobrunid0_.tenant_id) like '%'|| null
)
and (
upper(jobrunid0_.job_run_id) like '%'||null||'%'
)
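To see why these predicates filter out almost nothing: in Oracle, concatenating NULL to a string leaves the string unchanged, so the two patterns reduce to '%' and '%%', which match every non-NULL value. A quick check against DUAL, where the literal 'abc' is just a stand-in value and not from the question:

```sql
-- '%' || NULL evaluates to '%' in Oracle, so the LIKE matches any non-NULL string.
SELECT CASE
         WHEN UPPER('abc') LIKE '%' || NULL THEN 'match'
         ELSE 'no match'
       END AS result
FROM dual;
```

If the bind values really are NULL, the two conditions behave like IS NOT NULL checks and can be dropped or replaced accordingly.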

Modify my SQL Server query -- returns too many rows sometimes

I need to update the following query so that it only returns one child record (remittance) per parent (claim).
Table Remit_To_Activate contains exactly one date/timestamp per claim, which is what I wanted.
But when I join the full Remittance table to it, since some claims have multiple remittances with the same date/timestamps, the outermost query returns more than 1 row per claim for those claim IDs.
SELECT * FROM REMITTANCE
WHERE BILLED_AMOUNT>0 AND ACTIVE=0
AND REMITTANCE_UUID IN (
SELECT REMITTANCE_UUID FROM Claims_Group2 G2
INNER JOIN Remit_To_Activate t ON (
(t.ClaimID = G2.CLAIM_ID) AND
(t.DATE_OF_LATEST_REGULAR_REMIT = G2.CREATE_DATETIME)
)
where ACTIVE=0 and BILLED_AMOUNT>0
)
I believe the problem would be resolved if I included REMITTANCE_UUID as a column in Remit_To_Activate. That's the REAL issue. This is how I created the Remit_To_Activate table (trying to get the most recent remittance for a claim):
SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
MAX(claim_id) AS ClaimID
INTO Latest_Remit_To_Activate
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID
ORDER BY Claim_ID
Claims_Group2 contains these fields:
REMITTANCE_UUID,
CLAIM_ID,
BILLED_AMOUNT,
CREATE_DATETIME
The 2 rows that are currently giving me the problem (shown in a screenshot in the original post) are both remittances for the SAME CLAIM, with the SAME TIMESTAMP. I only want one of them in the Remit_To_Activate table, so that only ONE remittance is "activated" per claim.
You can change your query like this:
SELECT
p.*, latest_remit.DATE_OF_LATEST_REMIT
FROM
Remittance AS p inner join
(SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
claim_id
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID) as latest_remit
on latest_remit.claim_id = p.claim_id;
This is meant to give you only one row per claim. It is untested, so please run it and adjust as needed.
Without more information on the structure of your database, especially the structure of Claims_Group2 and REMITTANCE and the relationship between them, it's not really possible to advise you on how to introduce a remittance UUID into Remit_To_Activate.
Since you are using SQL Server, however, it is possible to use a window function to introduce a synthetic means to choose among remittances having the same timestamp. For example, it looks like you could approach the problem something like this:
select *
from (
select
r.*,
row_number() over (partition by cg2.claim_id order by cg2.create_datetime desc) as rn
from
remittance r
join claims_group2 cg2
on r.remittance_uuid = cg2.remittance_uuid
where
r.active = 0
and r.billed_amount > 0
and cg2.active = 0
and cg2.billed_amount > 0
) t
where t.rn = 1
Note that this does not depend on your Remit_To_Activate table at all, since that has been subsumed into the inline view. Note also that it introduces one extra column (rn) into your results, though you could avoid that by enumerating the columns of table remittance in the outer select clause.
It also seems odd to be filtering on two sets of active and billed_amount columns, but that appears to follow from what you were doing in your original queries. In that vein, I urge you to check the results carefully, as lifting the filter conditions on cg2 columns up to the level of the join to remittance yields a result that may return rows that the original query did not (but never more than one per claim_id).
A co-worker offered me this elegant demonstration of a solution. I'd never used "over" or "partition" before. Works great! Thank you John and Gaurasvsa for your input.
if OBJECT_ID('tempdb..#t') is not null
drop table #t
select *, ROW_NUMBER() over (partition by CLAIM_ID order by CLAIM_ID) as ROW_NUM
into #t
from
(
select '2018-08-15 13:07:50.933' as CREATE_DATE, 1 as CLAIM_ID, NEWID() as REMIT_UUID
union select '2018-08-15 13:07:50.933', 1, NEWID()
union select '2017-12-31 10:00:00.000', 2, NEWID()
) x
select *
from #t
order by CLAIM_ID, ROW_NUM
select CREATE_DATE, MAX(CLAIM_ID), MAX(REMIT_UUID)
from #t
where ROW_NUM = 1
group by CREATE_DATE
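The same ROW_NUMBER idea can also be applied to the step the question calls the real issue: building Remit_To_Activate with a REMITTANCE_UUID column. This is only a sketch. It assumes Claims_Group2 has the four columns listed in the question, that Remit_To_Activate does not exist yet (SELECT ... INTO creates it), and that REMITTANCE_UUID is acceptable as an arbitrary tie-breaker when two remittances share a timestamp.

```sql
-- Sketch: one row per claim, carrying the UUID of the chosen remittance.
SELECT x.CLAIM_ID        AS ClaimID,
       x.CREATE_DATETIME AS DATE_OF_LATEST_REMIT,
       x.REMITTANCE_UUID
INTO   Remit_To_Activate
FROM (
    SELECT CLAIM_ID,
           CREATE_DATETIME,
           REMITTANCE_UUID,
           ROW_NUMBER() OVER (
               PARTITION BY CLAIM_ID
               -- latest timestamp first; the UUID only breaks ties (assumption)
               ORDER BY CREATE_DATETIME DESC, REMITTANCE_UUID
           ) AS rn
    FROM   Claims_Group2
    WHERE  BILLED_AMOUNT > 0
) x
WHERE x.rn = 1;
```

With the UUID in the table, the outer query can match on REMITTANCE_UUID directly instead of on claim ID plus timestamp.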

Speed up SQL simple Query

We have a table called PROTOKOLL (its definition was attached to the original post as an image).
The table has 10 million records.
SELECT *
FROM (SELECT /*+ FIRST_ROWS */ a.*, ROWNUM rnum
FROM (SELECT t0.*, t1.*
FROM PROTOKOLL t0
, PROTOKOLL t1
WHERE (
(
(t0.BENUTZER_ID = 'A07BU0006')
AND (t0.TYP = 'E')
) AND
(
(t1.UUID = t0.ANDERES_PROTOKOLL_UUID)
AND
(t1.TYP = 'A')
)
)
ORDER BY t0.ZEITPUNKT DESC
) a
WHERE ROWNUM <= 4999) WHERE rnum > 0;
So in practice we join the table with itself through the ANDERES_PROTOKOLL_UUID field and apply simple filters. The results are sorted by creation time, and the result set is limited to about 5000 rows.
The query takes about 10 minutes, which is not acceptable.
I already have the execution plan and statistics and am trying to figure out how to speed up the query (they were attached to the original post).
My first observation is that the optimizer adds the condition "P"."ANDERES_PROTOKOLL_UUID" IS NOT NULL to the WHERE clause, but I do not know why. Is that a problem?
Or where is the bottleneck of the query?
How can I avoid it? Any suggestion is welcome.

ORACLE SQL - Compare dates without join

I have a very large table with 1+ billion rows. If I try to join that table to itself to do a comparison, the cost in the estimated plan makes the query unrunnable (cost: 226831405289150). Is there a way to get the same results as the query below without a join, perhaps with an OVER (PARTITION BY ...) clause?
What I need to do is make sure that no other event happened within 24 hours before or after the event matching WILDCARE was received.
Thanks so much for your help!
select e2.SYSTEM_NO,
min(e2.DT) as dt
from SYSTEM_EVENT e2
inner join table1.event el2
on el2.event_id = e2.event_id
left join ( Select se.DT
from SYSTEM_EVENT se
where
--fails
( se.event_id in ('101','102','103','104')
--restores
or se.event_id in ('106','107','108','109')
)
) e3
on e3.dt-e2.dt between .0001 and 1
or e3.dt-e2.dt between -1 and .0001
where el2.descr like '%WILDCARE%'
and e3.dt is null
and e2.REC_STS_CD = 'A'
group by e2.SYSTEM_NO
Without any test data it is difficult to determine what you are trying to achieve, but it appears you could try an analytic function with a range window:
SELECT system_no,
MIN( dt ) AS dt
FROM (
SELECT system_no,
dt,
event_id, -- needed by the outer EXISTS
rec_sts_cd, -- needed by the outer filter
COUNT(
CASE
WHEN ( event_id in ('101','102','103','104') --fails
OR event_id in ('106','107','108','109') ) --restores
THEN 1
END
) OVER (
ORDER BY dt
RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING
) AS num
FROM system_event
) se
WHERE num = 0
AND REC_STS_CD = 'A'
AND EXISTS(
SELECT 1
FROM table1.event te
WHERE te.descr like '%WILDCARE%'
AND te.event_id = se.event_id
)
GROUP BY system_no
This is not a direct answer to your question, but it is a bit too long for a comment.
How old can newly inserted data be? A 48-hour window means you only need to check a subset of the data, not the whole 1-billion-row table, if the data is inserted incrementally. If so, reduce the data in the comparison with a WITH clause or a temporary table.
If you still need to compare across the whole table, I would partition by event_id, or by another attribute if it partitions better, and compare each group separately.
The predicate where el2.descr like '%WILDCARE%' is a performance killer on such a huge table.
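The suggestion to shrink the data before comparing could look roughly like this. It is only a sketch: the 7-day cutoff is an invented assumption about how far back new events can arrive, and it needs to be at least one day wider than the period being checked so that the 24-hour look-back still sees neighbouring events.

```sql
-- Restrict SYSTEM_EVENT to a recent slice before doing any comparison.
WITH recent_events AS (
  SELECT system_no, dt, event_id, rec_sts_cd
  FROM   system_event
  WHERE  dt >= SYSDATE - 7   -- hypothetical cutoff, adjust to the real load pattern
)
SELECT system_no, dt, event_id, rec_sts_cd
FROM   recent_events;
```

The self-join or the analytic query can then run against recent_events instead of the full table.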

How to write this query [closed]

I am stuck on a query and need your help and suggestions. The situation is:
I have a table with the structure
JOB_ID , ITEM_ID , NEW_ITEM_ID , STATUS
where JOB_ID is the primary key and STATUS can be AC or SB.
I want to write a query that selects only those rows that have STATUS AC and for which neither ITEM_ID nor NEW_ITEM_ID appears in a row whose STATUS is SB. I have already written a query, but it takes a lot of time, so please help me write an optimized one. This is what I have written:
SELECT * FROM (
SELECT JOB_ID,NEW_ITEM_ID,ITEM_ID,STATUS
FROM X1
WHERE STATUS='AC'
AND NEW_ITEM_ID IS NOT NULL
MINUS
( SELECT T1.JOB_ID,T1.NEW_ITEM_ID ,T1.ITEM_ID ,T1.STATUS
FROM ( SELECT *
FROM X1
WHERE STATUS IN 'AC'
AND NEW_ITEM_ID IS NOT NULL ) T1
, ( SELECT *
FROM X1
WHERE STATUS IN ('PR','SB')
AND NEW_ITEM_ID IS NOT NULL ) T2
WHERE ( T2.ITEM_ID IN (T1.ITEM_ID,T1.NEW_ITEM_ID)
OR T2.NEW_ITEM_ID IN (T1.ITEM_ID,T1.NEW_ITEM_ID)
)
AND T1.STATUS!=T2.STATUS
)
) T
EDIT
This table is going to contain millions of records, around 30M.
The easiest way would be to have one query that selects all ITEM_IDs and NEW_ITEM_IDs whose status is SB, and then another query like this:
SELECT * FROM table WHERE STATUS = 'AC' AND ITEM_ID NOT IN (the results of the previous query) AND NEW_ITEM_ID NOT IN (the results of the query for NEW_ITEM_IDs mentioned above)
Just an idea, but with the proper syntax I think it should work.
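Written out with proper syntax, the idea might look like the sketch below. It assumes the table is X1 as in the question, and it compares ITEM_ID only against ITEM_IDs and NEW_ITEM_ID only against NEW_ITEM_IDs, exactly as described above; cross-matches between the two columns would need extra conditions. The IS NOT NULL filters are there because NOT IN returns no rows at all if the subquery produces a NULL.

```sql
SELECT job_id, item_id, new_item_id, status
FROM   x1
WHERE  status = 'AC'
AND    item_id NOT IN ( SELECT item_id
                        FROM   x1
                        WHERE  status = 'SB'
                        AND    item_id IS NOT NULL )
AND    new_item_id NOT IN ( SELECT new_item_id
                            FROM   x1
                            WHERE  status = 'SB'
                            AND    new_item_id IS NOT NULL );
```

Note that AC rows whose own new_item_id is NULL are also excluded, which matches the NEW_ITEM_ID IS NOT NULL filter in the original query.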
Try this:
select * from status where STATUS ='AC' or (STATUS ='SB' and ITEM_ID is null) or (STATUS ='SB' and NEW_ITEM_ID is null)
It sounds like you are looking for rows where (1) the status is AC and (2) there is no other row with status SB whose item_id or new_item_id matches?
How about:
SELECT job_id, item_id, new_item_id, status
FROM x1 a
WHERE a.status = 'AC'
AND NOT EXISTS (SELECT 1 FROM x1 b
WHERE b.status = 'SB'
AND ( b.new_item_id = a.item_id
OR b.item_id = a.new_item_id ) )
"This table is going to contain millions of records say around 30M"
This is one crucial piece of information but a couple of other key stats are missing. How many rows match the status of 'PR','SB' and 'AC' ? How many rows have new_item_id populated? Are those columns indexed?
Your sub-queries use 'select * from x1'. SELECT * is bad practice, a bug waiting to happen. Here it is disastrous: you use only a few of the columns, yet you force the database to read the entire row for each entry in the result sets. The longer the rows, the more expensive that is. In the sub-query you really should be driving off just indexes if you can possibly do so.
Ideally, you would have an index on X1 ( STATUS, NEW_ITEM_ID, ITEM_ID, JOB_ID ). Then you wouldn't hit the table at all. But at the very least you need an index on (STATUS, NEW_ITEM_ID). An index just on STATUS won't do you any good unless STATUS is highly selective: several hundred different values, evenly distributed. (Which seems unlikely: in my experience most status columns have a handful of different states.)
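As a sketch, the two indexes described above could be created as follows. The index names are made up for illustration; the column lists are the ones given in the text.

```sql
-- Covering index: the query can be answered from the index alone.
CREATE INDEX x1_status_items_ix
  ON x1 ( status, new_item_id, item_id, job_id );

-- Minimal alternative if the wider index is not an option.
CREATE INDEX x1_status_newitem_ix
  ON x1 ( status, new_item_id );
```

Only one of the two is needed; the wider one makes the narrower one redundant.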
Your posted query hits table X1 three times; that will take ages. So the main thing is to reduce the number of times you hit the table. This is where sub-query factoring can help:
with data as ( select job_id, new_item_id, item_id, status
from x1
where status in ('PR','SB', 'AC' )
and new_item_id is not null )
select t1.*
from data t1
, data t2
where t1.status = 'AC'
and t2.status in ( 'PR','SB' )
and (t2.new_item_id in ( t1.new_item_id, t1.item_id )
or t2.item_id in ( t1.new_item_id, t1.item_id ) )
/
So this query hits the table only once, and with a favourable index not even once.
If the query still takes too much time, or you can't wangle a helpful index, the other option for improving execution times against massive tables is parallel query. This option is open to you if you have an Enterprise Edition license and a server with sufficient CPUs (and both those conditions should be true if you want to run an application database with multi-million row tables).
with data as ( select /*+ parallel (x1, 4) */
job_id, new_item_id, item_id, status
from x1
...