Self join SQL is taking too much time to execute

The SQL below takes too long to execute. I don't know what I am doing wrong, although it does return the correct result. Can I simplify this SQL further?
This is an Oracle database, and the jmc_job_step table contains a huge number of records.
select *
from jmc_job_run_id jobrunid0_
inner join jmc_job_step jobsteps1_
    on jobrunid0_.id = jobsteps1_.job_run_id
where (
    jobsteps1_.creation_date in (
        select min(jobstep2_.creation_date)
        from jmc_job_step jobstep2_
        where jobrunid0_.id = jobstep2_.job_run_id
        group by jobstep2_.job_run_id, jobstep2_.job_step_no
    )
    or jobsteps1_.job_step_progress_value in (
        select max(jobstep3_.job_step_progress_value)
        from jmc_job_step jobstep3_
        where jobrunid0_.id = jobstep3_.job_run_id
        group by jobstep3_.job_run_id, jobstep3_.job_step_no
    )
)
order by jobrunid0_.job_start_time desc

These predicates are useless: they say "I don't care what those columns contain", yet you still make the database engine check those values:
(
upper(jobrunid0_.tenant_id) like '%'|| null
)
and (
upper(jobrunid0_.job_run_id) like '%'||null||'%'
)
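To see why those predicates accomplish so little, recall that Oracle treats NULL as an empty string when concatenating, so '%' || null is just '%'. A minimal sketch, assuming an Oracle session (the literal 'TENANT_A' is made up for illustration):

```sql
-- Oracle: NULL concatenates as an empty string, so the patterns reduce to '%' and '%%'.
select case when 'TENANT_A' like '%' || null        then 'match' else 'no match' end as tail_pattern,
       case when 'TENANT_A' like '%' || null || '%' then 'match' else 'no match' end as both_sides
from dual;
-- Both columns return 'match': the pattern accepts any non-null value.

-- A NULL column value is the one case that fails: NULL LIKE '%' is unknown, not true.
select count(*) as matching_rows
from dual
where cast(null as varchar2(10)) like '%' || null;
-- Returns 0.
```

So the predicates are not entirely neutral: they cost a comparison per row and silently drop rows where the column is NULL. If that filtering is not intended, leaving the predicates out when the search value is null is the simpler option.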

Related

BigQuery de-duplication query is not working properly

Can anyone tell me why the query below is not working properly? It is supposed to delete only the duplicate records and keep one of them (the latest record), but it deletes all of the records instead of keeping one of each set of duplicates. Why is that?
delete
from
dev_rahul.page_content_insights
where
(sha_id,
etl_start_utc_dttm) in (
select
(a.sha_id,
a.etl_start_utc_dttm)
from
(
select
sha_id,
etl_start_utc_dttm,
ROW_NUMBER() over (Partition by sha_id
order by
etl_start_utc_dttm desc) as rn
from
dev_rahul.page_content_insights
where
(snapshot_dt) >= '2021-03-25' ) a
where
a.rn <> 1)

The query looks OK, though I don't use that syntax for cleaning up duplicates.
Can I confirm the following:
Is (sha_id, etl_start_utc_dttm) your primary key?
Do you wish to keep, for each sha_id, the latest row based on etl_start_utc_dttm descending?
If so, try this two-query pattern:
create or replace table dev_rahul.rows_not_to_delete as
SELECT col.* FROM (
SELECT ARRAY_AGG(pci ORDER BY etl_start_utc_dttm desc LIMIT 1)[OFFSET(0)] col
FROM dev_rahul.page_content_insights pci
where snapshot_dt >= '2021-03-25'
GROUP BY sha_id
);
delete dev_rahul.page_content_insights p
where not exists (select 1 from dev_rahul.rows_not_to_delete d
where p.sha_id = d.sha_id and p.etl_start_utc_dttm = d.etl_start_utc_dttm
) and snapshot_dt >= '2021-03-25';
You could do this in a single query by putting the first statement into a CTE.
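The primary-key question matters because of how the original DELETE identifies rows. One possible explanation for "everything gets deleted", assuming the duplicates are exact copies with the same etl_start_utc_dttm, is that the row ranked 2 carries the same (sha_id, etl_start_utc_dttm) pair as the row ranked 1, so the IN predicate matches both. The sketch below uses hypothetical inline data (demo_rows) and an assumed TIMESTAMP type to show which rows such a predicate would select:

```sql
-- BigQuery standard SQL; demo_rows is hypothetical sample data, not the real table.
WITH demo_rows AS (
  SELECT 'a' AS sha_id, TIMESTAMP '2021-03-26 00:00:00' AS etl_start_utc_dttm
  UNION ALL SELECT 'a', TIMESTAMP '2021-03-26 00:00:00'  -- exact duplicate of the first row
  UNION ALL SELECT 'b', TIMESTAMP '2021-03-26 00:00:00'  -- no duplicate
),
ranked AS (
  SELECT sha_id, etl_start_utc_dttm,
         ROW_NUMBER() OVER (PARTITION BY sha_id ORDER BY etl_start_utc_dttm DESC) AS rn
  FROM demo_rows
)
-- Rows that a "key pair matches some rn <> 1 row" predicate would delete.
SELECT d.sha_id, d.etl_start_utc_dttm
FROM demo_rows AS d
WHERE EXISTS (
  SELECT 1
  FROM ranked AS r
  WHERE r.rn <> 1
    AND r.sha_id = d.sha_id
    AND r.etl_start_utc_dttm = d.etl_start_utc_dttm
);
-- Both 'a' rows come back, so neither copy would survive the delete.
```

If that is what is happening, no predicate on these two columns alone can single out one copy; some additional distinguishing column, or a rebuild of the table from de-duplicated rows, would be needed.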

Modify my SQL Server query -- returns too many rows sometimes

I need to update the following query so that it only returns one child record (remittance) per parent (claim).
Table Remit_To_Activate contains exactly one date/timestamp per claim, which is what I wanted.
But when I join the full Remittance table to it, since some claims have multiple remittances with the same date/timestamps, the outermost query returns more than 1 row per claim for those claim IDs.
SELECT * FROM REMITTANCE
WHERE BILLED_AMOUNT>0 AND ACTIVE=0
AND REMITTANCE_UUID IN (
SELECT REMITTANCE_UUID FROM Claims_Group2 G2
INNER JOIN Remit_To_Activate t ON (
(t.ClaimID = G2.CLAIM_ID) AND
(t.DATE_OF_LATEST_REGULAR_REMIT = G2.CREATE_DATETIME)
)
where ACTIVE=0 and BILLED_AMOUNT>0
)
I believe the problem would be resolved if I included REMITTANCE_UUID as a column in Remit_To_Activate. That's the REAL issue. This is how I created the Remit_To_Activate table (trying to get the most recent remittance for a claim):
SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
MAX(claim_id) AS ClaimID
INTO Latest_Remit_To_Activate
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID
ORDER BY Claim_ID
Claims_Group2 contains these fields:
REMITTANCE_UUID,
CLAIM_ID,
BILLED_AMOUNT,
CREATE_DATETIME
Two rows are currently giving me the problem: they are both remits for the SAME CLAIM, with the SAME TIMESTAMP. I only want one of them in the Remits_To_Activate table, so only ONE remittance will be "activated" per claim.

You can change your query like this:
SELECT
p.*, latest_remit.DATE_OF_LATEST_REMIT
FROM
Remittance AS p inner join
(SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
claim_id
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID) as latest_remit
on latest_remit.claim_id = p.claim_id;
This should give you only one row. Untested (so please run it and make changes as needed).

Without having more information on the structure of your database -- especially the structure of Claims_Group2 and REMITTANCE, and the relationship between them, it's not really possible to advise you on how to introduce a remittance UUID into DATE_OF_LATEST_REMIT.
Since you are using SQL Server, however, it is possible to use a window function to introduce a synthetic means to choose among remittances having the same timestamp. For example, it looks like you could approach the problem something like this:
select *
from (
select
r.*,
row_number() over (partition by cg2.claim_id order by cg2.create_datetime desc) as rn
from
remittance r
join claims_group2 cg2
on r.remittance_uuid = cg2.remittance_uuid
where
r.active = 0
and r.billed_amount > 0
and cg2.active = 0
and cg2.billed_amount > 0
) t
where t.rn = 1
Note that this does not depend on your DATE_OF_LATEST_REMIT table at all, it having been subsumed into the inline view. Note also that this will introduce one extra column into your results, though you could avoid that by enumerating the columns of table remittance in the outer select clause.
It also seems odd to be filtering on two sets of active and billed_amount columns, but that appears to follow from what you were doing in your original queries. In that vein, I urge you to check the results carefully, as lifting the filter conditions on cg2 columns up to the level of the join to remittance yields a result that may return rows that the original query did not (but never more than one per claim_id).
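When two remittances share the same timestamp, ROW_NUMBER() with only create_datetime in its ORDER BY picks one of them arbitrarily. A minimal sketch of making the choice repeatable by adding remittance_uuid as a tie-breaker; the inline rows and the 'uuid-N' values are hypothetical stand-ins for Claims_Group2:

```sql
-- SQL Server; hypothetical sample rows, column names follow Claims_Group2 in the question.
WITH claims_group2 (remittance_uuid, claim_id, create_datetime) AS (
    SELECT v.remittance_uuid, v.claim_id, CAST(v.create_datetime AS datetime)
    FROM (VALUES
        ('uuid-1', 1, '2018-08-15T13:07:50.933'),
        ('uuid-2', 1, '2018-08-15T13:07:50.933'),  -- same claim, same timestamp
        ('uuid-3', 2, '2017-12-31T10:00:00.000')
    ) AS v (remittance_uuid, claim_id, create_datetime)
)
SELECT t.remittance_uuid, t.claim_id, t.create_datetime
FROM (
    SELECT cg2.*,
           ROW_NUMBER() OVER (
               PARTITION BY cg2.claim_id
               ORDER BY cg2.create_datetime DESC,
                        cg2.remittance_uuid        -- tie-breaker for equal timestamps
           ) AS rn
    FROM claims_group2 AS cg2
) AS t
WHERE t.rn = 1;
-- Returns uuid-1 for claim 1 and uuid-3 for claim 2.
```

Which remittance wins the tie is still a business decision; the tie-breaker only guarantees that the same one is chosen on every run.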

A co-worker offered me this elegant demonstration of a solution. I'd never used "over" or "partition" before. Works great! Thank you John and Gaurasvsa for your input.
if OBJECT_ID('tempdb..#t') is not null
drop table #t
select *, ROW_NUMBER() over (partition by CLAIM_ID order by CLAIM_ID) as ROW_NUM
into #t
from
(
select '2018-08-15 13:07:50.933' as CREATE_DATE, 1 as CLAIM_ID, NEWID() as REMIT_UUID
union select '2018-08-15 13:07:50.933', 1, NEWID()
union select '2017-12-31 10:00:00.000', 2, NEWID()
) x
select *
from #t
order by CLAIM_ID, ROW_NUM
select CREATE_DATE, MAX(CLAIM_ID), MAX(REMIT_UUID)
from #t
where ROW_NUM = 1
group by CREATE_DATE

Calculate time difference in minutes in SQL Server 2008

I have a table in SQL Server 2008 with data.
The table records the times at which a request moved between organizations, from which I want to work out how long an organization worked on each request.
CREATE TABLE support
( ID varchar(50),
IN_ORGANIZATION varchar(MAX),
FROM_ORGANIZATION varchar(MAX),
TIMEDIF datetime );
INSERT INTO support
(ID, IN_ORGANIZATION,FROM_ORGANIZATION,TIMEDIF )
VALUES
('22907','ORGANIZATION_NAME_1','RODLAY LLP','2017-04-15 14:58:00.000'),
('22907','MARY LOAN','ORGANIZATION_NAME_1','2017-04-15 15:00:00.000'),
('23289','VENIXTON Ltd','ORGANIZATION_NAME_1','2017-04-21 11:00:00.000'),
('23289','ORGANIZATION_NAME_1','Ocean Loan','2017-04-21 12:00:00.000'),
('23289','Ocean Loan','ORGANIZATION_NAME_1','2017-04-21 13:00:00.000')
;
I want to find how long the organization ORGANIZATION_NAME_1 worked on each request.
Please help me write a CURSOR to calculate the time.
Result:
ID, TIMEDIF(minutes)
22907, 2
23289, 120

The DATEDIFF function will do the trick:
select id,datediff(minute,min(timedif),max(timedif) ) AS time from support
where in_organization = 'ORGANIZATION_NAME_1' or from_organization = 'ORGANIZATION_NAME_1'
group by id ;
My output:
   | id    | time
1  | 22907 | 2
2  | 23289 | 120
Let me know if you have any questions.

Maybe this query will help you:
select
id,
DATEDIFF(minute,MIN(TIMEDIF),MAX(TIMEDIF)) as [TIMEDIF(minutes)]
from support
where IN_ORGANIZATION ='ORGANIZATION_NAME_1'
or FROM_ORGANIZATION ='ORGANIZATION_NAME_1'
group by id
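A point worth checking in any DATEDIFF call is the datepart abbreviation: in SQL Server, m is short for month, while minute is abbreviated mi or n. A small sketch using the two timestamps of request 23289 from the question (the variable names are made up):

```sql
-- SQL Server 2008 or later (DECLARE with an initializer).
DECLARE @first_event datetime = '2017-04-21 11:00:00.000';
DECLARE @last_event  datetime = '2017-04-21 13:00:00.000';

SELECT DATEDIFF(minute, @first_event, @last_event) AS diff_minute,  -- 120
       DATEDIFF(mi,     @first_event, @last_event) AS diff_mi,      -- 120
       DATEDIFF(m,      @first_event, @last_event) AS diff_m;       -- 0, because m means month
```

Spelling out minute avoids the ambiguity altogether.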

If you are just arbitrarily trying to get the TimeDifferences between rows you could try something like this:
; WITH x AS
(
SELECT *, ROW_NUMBER() OVER(ORDER BY id) AS rwn
From dbo.support
)
SELECT
x.ID
, y.ID AS NextID
, x.IN_ORGANIZATION
, y.IN_ORGANIZATION NextInOrg
, x.FROM_ORGANIZATION
, y.FROM_ORGANIZATION NextFromOrg
, x.TIMEDIF
, y.TIMEDIF AS NextTimeDiff
, x.rwn
, DATEDIFF(MINUTE, x.TIMEDIF, y.TIMEDIF) AS DifferenceFromOneToTheNext
FROM x
INNER JOIN x y ON x.rwn = y.rwn - 1
If the table had a self-seeding identity column, you would already have a row pointer to join on. The row ordering used here is really arbitrary, though.

SQL Logic: Finding Non-Duplicates with Similar Rows

I'll do my best to summarize what I am having trouble with. I had not used much SQL until recently.
I am currently using SQL Server 2012 at work and have been tasked with finding oddities in SQL tables. Specifically, the tables contain similar information about servers (kind of meta, I know). They each share a column called "DB_NAME"; apart from that, they have no columns in common. I need to compare Table A and Table B and produce a list of records (servers) where a server is NOT listed in BOTH Table A and Table B. Additionally, the query is run against an exception list. I'm not 100% sure of the best logic to handle this. While I would love something "extremely efficient", for now I am mainly looking for something that just plain works.
SELECT *
FROM (SELECT
UPPER(ta.DB_NAME) AS [DB_Name]
FROM
[CMS].[dbo].[TABLE_A] AS ta
UNION
SELECT
UPPER(tb.DB_NAME) AS [DB_Name]
FROM
[CMS].[dbo].[TABLE_B] as tb
) AS SQLresults
WHERE NOT EXISTS (
SELECT *
FROM
[CMS].[dbo].[TABLE_C_EXCEPTIONS] as tc
WHERE
SQLresults.[DB_Name] = tc.DB_NAME)
ORDER BY SQLresults.[DB_Name]

One method uses union all and aggregation:
select ab.name
from ((select upper(DB_NAME) as name, 'A' as which
from CMS.dbo.TABLE_A
) union all
(select upper(DB_NAME), 'B' as which
from CMS.dbo.TABLE_B
)
) ab
where not exists (select 1
from CMS.dbo.TABLE_C_EXCEPTIONS e
where upper(e.DB_NAME) = ab.name
)
group by ab.name
having count(distinct which) <> 2;
SQL Server is case-insensitive by default. I left the upper()s in the query in case your installation is case sensitive.
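The union all plus aggregation idea can be tried on a few inline rows before running it against the real tables. This is a sketch under stated assumptions: the three CTEs are hypothetical stand-ins for TABLE_A, TABLE_B and TABLE_C_EXCEPTIONS, and the column is assumed to be DB_NAME as in the question.

```sql
-- SQL Server; all sample values are made up for illustration.
WITH table_a ([DB_NAME]) AS (
    SELECT v.n FROM (VALUES ('sales'), ('hr'), ('legacy')) AS v (n)
),
table_b ([DB_NAME]) AS (
    SELECT v.n FROM (VALUES ('SALES'), ('finance'), ('legacy')) AS v (n)
),
table_c_exceptions ([DB_NAME]) AS (
    SELECT v.n FROM (VALUES ('FINANCE')) AS v (n)
)
SELECT ab.name
FROM (
    SELECT UPPER([DB_NAME]) AS name, 'A' AS which FROM table_a
    UNION ALL
    SELECT UPPER([DB_NAME]), 'B' FROM table_b
) AS ab
WHERE NOT EXISTS (
    SELECT 1
    FROM table_c_exceptions AS e
    WHERE UPPER(e.[DB_NAME]) = ab.name
)
GROUP BY ab.name
HAVING COUNT(DISTINCT ab.which) <> 2   -- present in only one of the two tables
ORDER BY ab.name;
-- Returns a single row: HR.
```

SALES and LEGACY appear in both tables and are dropped by the HAVING clause; FINANCE appears only in table_b but is on the exception list.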

Here is another option using EXCEPT. I added a group by in each half of the union because it was not clear in your original post if DB_NAME is unique in your tables.
select DatabaseName
from
(
SELECT UPPER(ta.DB_NAME) AS DatabaseName
FROM [CMS].[dbo].[TABLE_A] AS ta
GROUP BY UPPER(ta.DB_NAME)
UNION ALL
SELECT UPPER(tb.DB_NAME) AS DatabaseName
FROM [CMS].[dbo].[TABLE_B] as tb
GROUP BY UPPER(tb.DB_NAME)
) x
group by DatabaseName
having count(*) < 2
EXCEPT
(
select DB_NAME
from CMS.dbo.TABLE_C_EXCEPTIONS
)

Oracle HASH_JOIN_RIGHT_SEMI performance

Here is my query:
SELECT si.*
FROM SHIPMENT_ITEMS si
WHERE ID IN ( SELECT ID FROM id_map WHERE code = 'A' )
AND LAST_UPDATED BETWEEN TO_DATE('20150102','YYYYMMDD') - 1 AND TO_DATE('20150103','YYYYMMDD')
SHIPMENT_ITEMS is a very large table (10.1 TB); id_map is a very small table (12 rows and 3 columns). This query goes through HASH_JOIN_RIGHT_SEMI and takes a very long time. SHIPMENT_ITEMS is partitioned on the ID column.
If I replace the subquery with hard-coded values, it performs a lot better:
SELECT si.*
FROM SHIPMENT_ITEMS si
WHERE ID IN (1,2,3 )
AND LAST_UPDATED BETWEEN TO_DATE('20150102','YYYYMMDD') - 1 AND TO_DATE('20150103','YYYYMMDD')
I cannot remove the subquery, as that would mean hard-coding the values.
Given that id_map is a very small table, I expected both queries to perform similarly. Why does the first one take so much longer?
I'm actually trying to understand why it performs so badly.
I expect dynamic partition pruning to happen here, and I cannot come up with a reason why it is not happening:
https://docs.oracle.com/cd/E11882_01/server.112/e25523/part_avail.htm#BABHDCJG

Try the NO_UNNEST hint:
SELECT si.*
FROM SHIPMENT_ITEMS si
WHERE ID IN ( SELECT /*+ NO_UNNEST */ ID FROM id_map WHERE code = 'A' )
AND LAST_UPDATED BETWEEN TO_DATE('20150102','YYYYMMDD') - 1 AND TO_DATE('20150103','YYYYMMDD')
With this hint, the CBO will not try to unnest the subquery into a join and will use it as a filter instead.
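One way to check what the optimizer actually does, with and without the hint, is to look at the Pstart and Pstop columns of the execution plan. A minimal sketch using EXPLAIN PLAN and DBMS_XPLAN.DISPLAY, with the query taken from the question:

```sql
-- Oracle: explain the statement, then display the plan that was just stored.
explain plan for
select si.*
from shipment_items si
where id in (select /*+ NO_UNNEST */ id from id_map where code = 'A')
  and last_updated between to_date('20150102','YYYYMMDD') - 1
                       and to_date('20150103','YYYYMMDD');

-- The default format includes the Pstart/Pstop columns for partitioned tables.
select * from table(dbms_xplan.display);
```

In the output, literal partition numbers in Pstart/Pstop indicate pruning decided at parse time, KEY indicates pruning decided at run time, and a range covering every partition indicates that no pruning took place. Comparing the plans of the hinted and unhinted statements should show whether the hint changes that.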

Instead of the IN operator, use EXISTS and check the query performance:
SELECT si.*
FROM SHIPMENT_ITEMS si
WHERE Exists ( SELECT 1 FROM id_map map WHERE map.code = 'A' and map.ID = si.ID)
AND LAST_UPDATED BETWEEN TO_DATE('20150102','YYYYMMDD') - 1 AND TO_DATE('20150103','YYYYMMDD')
