Removing duplicates rows in left Joining query

Removing duplicates rows in left Joining query - sql

SELECT Rec.[Reg_ID]
,Rec.[Reg_No]
,Rec.[Case_ID]
,Det.Deleted AS CaseDeleted
,[Status].[Status]
,Det.[Unit_Submission_Date] AS [Signature]
,TD.TargetDate AS [Target]
,TD.TargetID
FROM [dbo].[Regestrations] Rec
LEFT JOIN [dbo].[Reg_Details] Det ON Rec.Case_ID = Det.CaseID
LEFT JOIN [dbo].[lkpStatus] [Status] ON Rec.Status_ID = [Status].StatusID
LEFT JOIN TargetDate TD ON TD.RecommId = Rec.Reg_ID
WHERE (Det.MissionID = 50 AND [Status].[Status] = 1 AND Rec.Deleted = 0 AND Det.Deleted = 0)
GROUP BY Rec.[Reg_ID],Rec.[Reg_No],Rec.[Case_ID]
,[Status].[Status]
,Det.[Unit_Submission_Date]
,TD.TargetDate
,Det.Deleted
,TD.TargetID
ORDER BY TD.TargetID desc
I have the above query that is supposed to return rows with unique Rec.[Reg_No]. But joined table TargetDate can have duplicate Rec.[Reg_ID] and if thats the case i get duplicate Rec.[Reg_No] rows in my results.
Table TargetDate has a date time column so i want to eliminate the duplicate Rec.[Reg_No] by selecting 1 row with the latest date value from table TargetDate.
How do modify my Join condition or the query where clause to achive the above?

One way is to use a window function such as ROW_NUMBER() that will generate sequential number based on the specified partition. This generated number can then be used to get the latest row.
SELECT Reg_ID, Reg_No, Case_ID, CaseDeleted, [Status], Signature, [Target], TargetID
FROM
(
SELECT Rec.[Reg_ID]
,Rec.[Reg_No]
,Rec.[Case_ID]
,Det.Deleted AS CaseDeleted
,[Status].[Status]
,Det.[Unit_Submission_Date] AS [Signature]
,TD.TargetDate AS [Target]
,TD.TargetID
,RN = ROW_NUMBER() OVER (PARTITION BY Rec.[Reg_No] ORDER BY TD.TargetID DESC)
FROM [BOI].[dbo].[Regestrations] Rec
LEFT JOIN [dbo].[Reg_Details] Det ON Rec.Case_ID = Det.CaseID
LEFT JOIN [dbo].[lkpStatus] [Status] ON Rec.Status_ID = [Status].StatusID
LEFT JOIN TargetDate TD ON TD.RecommId = Rec.Reg_ID
WHERE (Det.MissionID = 50 AND [Status].[Status] = 1 AND Rec.Deleted = 0 AND Det.Deleted = 0)
) subQuery
WHERE RN = 1
ORDER BY TargetID desc

This query can work correctly if you remove TD.TargetDate from GROUP BY clause and compute what you really need in output - MAX(TD.TargetDate)
But preferable way it to avoid GROUP BY clause at all:
...
FROM [BOI].[dbo].[Regestrations] Rec
LEFT JOIN [dbo].[Reg_Details] Det ON Rec.Case_ID = Det.CaseID
LEFT JOIN [dbo].[lkpStatus] [Status] ON Rec.Status_ID = [Status].StatusID
OUTER APPLY(
SELECT TOP 1 td.TargetDate, td.TargetID
FROM TargetDate TD
WHERE TD.RecommId = Rec.Reg_ID
ORDER BY TD.TargetDate DESC
) td
...

You should first find latest TargetDate for each Reg_ID or RecommId. Then you can use your normal join with TargetDate table just this time with matching both the RecommId and TargetDate.
Try this Query:
SELECT Rec.[Reg_ID]
,Rec.[Reg_No]
,Rec.[Case_ID]
,Det.Deleted AS CaseDeleted
,[Status].[Status]
,Det.[Unit_Submission_Date] AS [Signature]
,TD.TargetDate AS [Target]
,TD.TargetID
FROM [BOI].[dbo].[Regestrations] Rec
LEFT JOIN [dbo].[Reg_Details] Det ON Rec.Case_ID = Det.CaseID
LEFT JOIN [dbo].[lkpStatus] [Status] ON Rec.Status_ID = [Status].StatusID
LEFT JOIN (SELECT RecommId, MAX(TargetDate) MaxTargetDate GROUP BY RecommId) TDWithLatestDate ON TDWithLatestDate.RecommId = Rec.Reg_ID
LEFT OUTER JOIN TargetDate ON TD.RecommId = TDWithLatestDate.RecommId AND TD.TargetDate = TDWithLatestDate.MaxTargetDate
WHERE (Det.MissionID = 50
AND [Status].[Status] = 1
AND Rec.Deleted = 0
AND Det.Deleted = 0
)
GROUP BY Rec.[Reg_ID]
,Rec.[Reg_No]
,Rec.[Case_ID]
,[Status].[Status]
,Det.[Unit_Submission_Date]
,TD.TargetDate
,Det.Deleted
,TD.TargetID
ORDER BY TD.TargetID desc
You can improve this query if you want to avoid tie when there more than one record fighting to be latest.

Related

SQL Server aggregate function without group by

I want to include tcon.Inductive_Injection_Hours, tcon.Capacitive_Injection_Hours without applying group by. How can I do that?
SELECT
bp.Serial_Number,
tcon.Serial_Number AS ConverterSerialNumber,
MAX(tcon.Time_Stamp) AS DateStamp,
tcon.Inductive_Injection_Hours,
tcon.Capacitive_Injection_Hours
FROM
dbo.Bypass AS bp
INNER JOIN
dbo.Converter AS c ON bp.Bypass_ID = c.Bypass_ID
INNER JOIN
dbo.Converter_Tel_Data AS tcon ON c.Converter_ID = tcon.Converter_ID
WHERE
(bp.Site_ID = 7)
GROUP BY
bp.Serial_Number, tcon.Serial_Number,
tcon.Inductive_Injection_Hours, tcon.Capacitive_Injection_Hours
ORDER BY
ConverterSerialNumber

I have figured it out.
select [data].Serial_Number,Time_Stamp,Inductive_Injection_Hours,Capacitive_Injection_Hours,b.Serial_Number from Converter_Tel_Data as [data]
inner join dbo.Converter AS c On [data].Converter_ID = c.Converter_ID
inner join dbo.Bypass as b on c.Bypass_ID = b.Bypass_ID
WHERE
(Time_Stamp = (SELECT MAX(Time_Stamp) FROM Converter_Tel_Data WHERE Converter_ID = [data].Converter_ID)) And ([data].Site_ID=7)
ORDER BY [data].Serial_Number

You can use row_number - either in a CTE/derived table or using a trick with TOP 1.
Select Top 1 With Ties
bp.Serial_Number
, tcon.Serial_Number AS ConverterSerialNumber
, tcon.Time_Stamp AS DateStamp
, tcon.Inductive_Injection_Hours
, tcon.Capacitive_Injection_Hours
From dbo.Bypass AS bp
Inner Join dbo.Converter AS c On bp.Bypass_ID = c.Bypass_ID
Inner Join dbo.Converter_Tel_Data AS tcon ON c.Converter_ID = tcon.Converter_ID
Where bp.Site_ID = 7
Order By
row_number() over(Partition By bp.Serial_Number Order By tcon.Time_Stamp desc)
This should return the latest row from the tconn table for each bp.Serial_Number.

SQL statement merge two rows into one

In the results of my sql-statement (SQL Server 2016) I would like to combine two rows with the same value in two columns ("study_id" and "study_start") into one row and keep the row with higest value in a third cell ("Id"). If any columns (i.e. "App_id" or "Date_arrival) in the row with higest Id is NULL, then it should take the value from the row with the lowest "Id".
I get the result below:
Id study_id study_start Code Expl Desc Startmonth App_id Date_arrival Efter_op Date_begin
167262 878899 954 4.1 udd.ord Afbrudt feb 86666 21-06-2012 N 17-08-2012
180537 878899 954 1 Afsluttet Afsluttet feb NULL NULL NULL NULL
And I would like to get this result:
Id study_id study_start Code Expl Desc Startmonth App_id Date_arrival Efter_op Date_begin
180537 878899 954 1 Afsluttet Afsluttet feb 86666 21-06-2012 N 17-08-2012
My statement looks like this:
SELECT dbo.PopulationStam_V.ELEV_ID AS id,
dbo.PopulationStam_V.PERS_ID AS study_id,
dbo.STUDIESTARTER.STUDST_ID AS study_start,
dbo.Optagelse_Studiestatus.AFGANGSARSAG AS Code,
dbo.Optagelse_Studiestatus.KORT_BETEGNELSE AS Expl,
ISNULL((CAST(dbo.Optagelse_Studiestatus.Studiestatus AS varchar(20))), 'Indskrevet') AS 'Desc',
dbo.STUDIESTARTER.OPTAG_START_MANED AS Startmonth,
dbo.ANSOGNINGER.ANSOG_ID as App_id,
dbo.ANSOGNINGER.ANKOMSTDATO AS Data_arrival',
dbo.ANSOGNINGER.EFTEROPTAG AS Efter_op,
dbo.ANSOGNINGER.STATUSDATO AS Date_begin
FROM dbo.INSTITUTIONER
INNER JOIN dbo.PopulationStam_V
ON dbo.INSTITUTIONER.INST_ID = dbo.PopulationStam_V.SEMI_ID
LEFT JOIN dbo.ANSOGNINGER
ON dbo.PopulationStam_V.ELEV_ID = dbo.ANSOGNINGER.ELEV_ID
INNER JOIN dbo.STUDIESTARTER
ON dbo.PopulationStam_V.STUDST_ID_OPRINDELIG = dbo.STUDIESTARTER.STUDST_ID
INNER JOIN dbo.UDD_NAVNE_T
ON dbo.PopulationStam_V.UDDA_ID = dbo.UDD_NAVNE_T.UDD_ID
INNER JOIN dbo.UDDANNELSER
ON dbo.UDD_NAVNE_T.UDD_ID = dbo.UDDANNELSER.UDDA_ID
LEFT OUTER JOIN dbo.PERSONER
ON dbo.PopulationStam_V.PERS_ID = dbo.PERSONER.PERS_ID
LEFT OUTER JOIN dbo.POSTNR
ON dbo.PERSONER.PONR_ID = dbo.POSTNR.PONR_ID
LEFT OUTER JOIN dbo.KønAlleElevID_V
ON dbo.PopulationStam_V.ELEV_ID = dbo.KønAlleElevID_V.ELEV_ID
LEFT OUTER JOIN dbo.Optagelse_Studiestatus
ON dbo.PopulationStam_V.AFAR_ID = dbo.Optagelse_Studiestatus.AFAR_ID
LEFT OUTER JOIN dbo.frafaldsmodel_adgangsgrundlag
ON dbo.frafaldsmodel_adgangsgrundlag.ELEV_ID = dbo.PopulationStam_V.ELEV_ID
LEFT OUTER JOIN dbo.Optagelse_prioriteterUFM
ON dbo.Optagelse_prioriteterUFM.cpr = dbo.PopulationStam_V.CPR_NR
AND dbo.Optagelse_prioriteterUFM.Aar = dbo.frafaldsmodel_adgangsgrundlag.optagelsesaar
LEFT OUTER JOIN dbo.frafaldsmodel_stoettetabel_uddannelser AS fsu
ON fsu.id_uddannelse = dbo.UDDANNELSER.UDDA_ID
AND fsu.id_inst = dbo.INSTITUTIONER.INST_ID
AND fsu.uddannelse_aar = dbo.frafaldsmodel_adgangsgrundlag.optagelsesaar
WHERE dbo.STUDIESTARTER.STUDIESTARTSDATO > '2012-03-01 00:00:00.000'
AND (dbo.Optagelse_Studiestatus.AFGANGSARSAG IS NULL
OR dbo.Optagelse_Studiestatus.AFGANGSARSAG NOT LIKE '2.7.4')
AND (dbo.PopulationStam_V.INDSKRIVNINGSFORM = '1100'
OR dbo.PopulationStam_V.INDSKRIVNINGSFORM = '1700')
GROUP BY dbo.PopulationStam_V.ELEV_ID,
dbo.PopulationStam_V.PERS_ID,
dbo.STUDIESTARTER.STUDST_ID,
dbo.Optagelse_Studiestatus.AFGANGSARSAG,
dbo.Optagelse_Studiestatus.KORT_BETEGNELSE,
dbo.STUDIESTARTER.OPTAG_START_MANED,
Studiestatus,
dbo.ANSOGNINGER.ANSOG_ID,
dbo.ANSOGNINGER.ANKOMSTDATO,
dbo.ANSOGNINGER.EFTEROPTAG,
dbo.ANSOGNINGER.STATUSDATO
I really hope somebody out there can help.

Many ways, this will work:
WITH subSource AS (
/* Your query here */
)
SELECT
s1.id,
/* all other columns work like this:
COALESCE(S1.column,s2.column)
for example: */
coalesce(s1.appid,s2.appid) as appid
FROM subSource s1
INNER JOIN subSource s2
ON s1.study_id =s2.study_id
and s1.study_start = s2.study_start
AND s1.id > s2.id
/* I imagine some other clauses might be needed but maybe not */
The rest is copy paste

Query specific field based on max sequence number

I have a record set where some of the rows are duplicated. In particular, they are duplicated for the last three rows of the record set. Of the entire four rows, the correct result set that I desire would include the first row and the last row. I desire this because for a particular SARAPPD_TERM_CODE_ENTRY, the record needed is the one where the SARAPPD_SEQ_NO value is at its max. So, the first row because for that particular term, the sequence number is maxed at one and the last row because the sequence number is maxed at six. Image and query are below.
select ppd.sarappd_seq_no, ppd.sarappd_term_code_entry, ppd.sarappd_apdc_code,
dap. dap.saradap_term_code_entry,
spri.spriden_id,
t.sgbstdn_astd_code, t.*
from sgbstdn t
left join spriden spri on t.sgbstdn_pidm = spri.spriden_pidm
left join saradap dap on spri.spriden_pidm = dap.saradap_pidm
join sarappd ppd on dap.saradap_pidm = ppd.sarappd_pidm
where t.sgbstdn_astd_code not in ('AS', 'DS', 'WD', 'SU', 'LA')
and t.sgbstdn_stst_code = 'AS'
and spri.spriden_change_ind is null
and spri.spriden_id = '123456789'
and (ppd.sarappd_apdc_code = 25 or ppd.sarappd_apdc_code = 30
or ppd.sarappd_apdc_code =35)
and ppd.sarappd_term_code_entry = dap.saradap_term_code_entry
--where b.sarappd_term_code_entry = ppd.sarappd_term_code_entry)
order by ppd.sarappd_term_code_entry
I believe this is a simple "where this = ( select max() ) type of query but I've been trying some different things and nothing is working. I'm not getting the results I want. So with that said, any help on this would be greatly appreciated. Thanks in advance.

You could use ROW_NUMBER()
;WITH cte
AS
(select
ROW_NUMBER() OVER (PARTITION BY SARAPPD_TERM_CODE_ENTRY ORDER BY SARAPPD_SEQ_NO DESC) AS RN
ppd.sarappd_seq_no,
ppd.sarappd_term_code_entry,
ppd.sarappd_apdc_code,
dap. dap.saradap_term_code_entry,
spri.spriden_id,
t.sgbstdn_astd_code, t.*
from sgbstdn t
left join spriden spri on t.sgbstdn_pidm = spri.spriden_pidm
left join saradap dap on spri.spriden_pidm = dap.saradap_pidm
join sarappd ppd on dap.saradap_pidm = ppd.sarappd_pidm
where t.sgbstdn_astd_code not in ('AS', 'DS', 'WD', 'SU', 'LA')
and t.sgbstdn_stst_code = 'AS'
and spri.spriden_change_ind is null
and spri.spriden_id = '123456789'
and (ppd.sarappd_apdc_code = 25 or ppd.sarappd_apdc_code = 30
or ppd.sarappd_apdc_code =35)
and ppd.sarappd_term_code_entry = dap.saradap_term_code_entry
order by ppd.sarappd_term_code_entry) a
SELECT *
FROM cte WHERE rn = 1

returning only the most recent record in dataset

I have a query returning data for several dates.
I want to return only record with the most recent date in the field SAPOD (the date is in fact CYYMMDD where C=0 before 2000 and 1 after 2000 and I can show this as YYYYMMDD using SAPOD=Case when LEFT(SAPOD,1)=1 then '20' else '19' end + SUBSTRING(cast(sapod as nvarchar(7)),2,7))
here is my query:
SELECT GFCUS, Ne.NEEAN, SCDLE, SAPOD, SATCD,CUS.GFCUN, BGCFN1,
BGCFN2, BGCFN3, SV.SVCSA, SV.SVNA1, SV.SVNA2, SV.SVNA3,
SV.SVNA4, SV.SVNA5, SV.SVPZIP, SV.NONUK
FROM SCPF ACC
INNER JOIN GFPF CUS ON GFCPNC = SCAN
LEFT OUTER JOIN SXPF SEQ ON SXCUS = GFCUS AND SXPRIM = ''
LEFT OUTER JOIN SVPFClean SV ON SVSEQ = SXSEQ
LEFT OUTER JOIN BGPF ON BGCUS = GFCUS AND BGCLC = GFCLC
LEFT OUTER JOIN NEPF NE ON SCAB=NE.NEAB and SCAN=ne.NEAN and SCAS=ne.NEAS
LEFT OUTER JOIN SAPF SA ON SCAB=SAAB and SCAN=SAAN and SCAS=SAAS
WHERE
(SATCD>500 and
scsac='IV' and
scbal = 0 and
scai30<>'Y' and
scai14<>'Y' and
not exists(select * from v5pf where v5and=scan and v5bal<>0))
GROUP BY GFCUS, Ne.NEEAN, SCDLE, SAPOD, SATCD,
CUS.GFCUN, BGCFN1, BGCFN2, BGCFN3, SV.SVCSA,
SV.SVNA1, SV.SVNA2, SV.SVNA3, SV.SVNA4, SV.SVNA5, SV.SVPZIP, SV.NONUK
ORDER BY MAX(SCAN) ASC, SAPOD DESC
I am getting results like the below where there are several transactions by a customer, and we only want to show the data of the most recent transaction:
So how can I show just the most recent transaction? Is this a case where I should use an OUTER APPLY or CROSS APPY?
EDIT:
Sorry I should clarify that I need the most recent date for each of the unique records in the field NEEAN which is the Account number

You can use ROW_NUMBER() as follows:
SELECT
ROW_NUMBER() OVER (PARTITION BY Ne.NEEAN ORDER BY SAPOD DESC) AS [Row],
GFCUS, Ne.NEEAN, SCDLE, SAPOD, SATCD,CUS.GFCUN, BGCFN1,
BGCFN2, BGCFN3, SV.SVCSA, SV.SVNA1, SV.SVNA2, SV.SVNA3,
SV.SVNA4, SV.SVNA5, SV.SVPZIP, SV.NONUK
FROM SCPF ACC
INNER JOIN GFPF CUS ON GFCPNC = SCAN
LEFT OUTER JOIN SXPF SEQ ON SXCUS = GFCUS AND SXPRIM = ''
LEFT OUTER JOIN SVPFClean SV ON SVSEQ = SXSEQ
LEFT OUTER JOIN BGPF ON BGCUS = GFCUS AND BGCLC = GFCLC
LEFT OUTER JOIN NEPF NE ON SCAB=NE.NEAB and SCAN=ne.NEAN and SCAS=ne.NEAS
LEFT OUTER JOIN SAPF SA ON SCAB=SAAB and SCAN=SAAN and SCAS=SAAS
WHERE
(SATCD>500 and
scsac='IV' and
scbal = 0 and
scai30<>'Y' and
scai14<>'Y' and
not exists(select * from v5pf where v5and=scan and v5bal<>0)) and
[Row] = 1
GROUP BY GFCUS, Ne.NEEAN, SCDLE, SAPOD, SATCD,
CUS.GFCUN, BGCFN1, BGCFN2, BGCFN3, SV.SVCSA,
SV.SVNA1, SV.SVNA2, SV.SVNA3, SV.SVNA4, SV.SVNA5, SV.SVPZIP, SV.NONUK
ORDER BY MAX(SCAN) ASC
You could encapsulate this within a subquery if you don't want to return the [Row] column.

you can user row_number to get top 1 row per customer
In the where clause need to return values with pos value as 1
sample query
row_number() over ( partition by GFCUS order by SAPOD desc) as pos

how to insert random records without violating primary constraint

Please pardon my title if misleading, but I have a table tb_party with PRIMARY KEY on party_key
I have following query -
insert TMS..tb_party
(
[party_key]
,[party_first_name]
,[tax_id]
,[party_type_cd]
,[citizenship_country_cd]
,[domicile_country_cd]
,[party_num]
,[batch_dt]
)
select distinct
[party_key]
,[party_first_name]
,[tax_id]
,[party_type_cd]
,[cit]
,[dom]
,[party_num]
,[batch_dt]
from
(select distinct
sca.[party_key]
,sca.[party_first_name]
,sca.[tax_id]
,pt.party_type_cd
,tc_cit.COUNTRY_ID as cit
,tc_dom.COUNTRY_ID as dom
,sca.[party_num]
,getdate() as batch_dt
,dense_rank() over(partition by sca.[party_key] order by sca.party_key) as rnk
from Iteration_3.dbo.staging_cust_acct sca (nolock)
join Iteration_3..STG_PARTY_UPLOAD (nolock) stg_party
on sca.party_key = stg_party.PARTY_KEY
left join Iteration_3..STG_COUNTRY_ISO tc_dom (nolock)
on sca.[domicile_country] = tc_dom.COUNTRY_NAME
left join Iteration_3..STG_COUNTRY_ISO tc_cit (nolock)
on sca.[citizenship_country] = tc_cit.COUNTRY_NAME
left join TMS..tb_party_type pt (nolock)
on sca.[party_type] = pt.party_type_desc
WHERE
SCA.party_type IS NOT NULL
) x
where rnk = 1
The insert fails because it is trying to insert duplicate party key and since the distinct is on ALL columns, it is picking up duplicate party_keys.
What I want - I want to pick up all distinct party_keys and insert 1 row in tb_party. the other rows can be ignored. Is this possible?

Here is where row_number came in handy.
select distinct
[party_key]
,[party_first_name]
,[tax_id]
,[party_type_cd]
,[cit]
,[dom]
,[party_num]
,[batch_dt]
from
(
select distinct
sca.[party_key]
,sca.[party_first_name]
,sca.[tax_id]
,pt.party_type_cd
,tc_cit.COUNTRY_ID as cit
,tc_dom.COUNTRY_ID as dom
,sca.[party_num]
,getdate() as batch_dt
--,dense_rank() over(partition by sca.[party_key] order by sca.party_key,sca.[tax_id],sca.[party_num],sca.[party_first_name]*/) as rnk
,row_number() over (partition by sca.party_key order by sca.party_key) rn
from Iteration_3.dbo.staging_cust_acct sca (nolock)
join Iteration_3..STG_PARTY_UPLOAD (nolock) stg_party
on sca.party_key = stg_party.PARTY_KEY
left join Iteration_3..STG_COUNTRY_ISO tc_dom (nolock)
on sca.[domicile_country] = tc_dom.COUNTRY_NAME
left join Iteration_3..STG_COUNTRY_ISO tc_cit (nolock)
on sca.[citizenship_country] = tc_cit.COUNTRY_NAME
left join TMS..tb_party_type pt (nolock)
on sca.[party_type] = pt.party_type_desc
WHERE --SCA.update_source = 'ECM' AND SCA.update_dt = #ECM_MAX_DATE
--and
SCA.party_type IS NOT NULL
) x
where rn = 1

Try merging into your target table on the unique columns. When not matched, insert. When matched update like this:
...
set target.[column] = coalesce(source.[column], target.[column])
...
This way you update all your target columns when you have source values, otherwise they remain unchanged.)

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas