PostgreSQL Aggregate function select 2 corresponding fields - sql

I have the following sub query:
LEFT OUTER JOIN
(
SELECT
MAX(testresult."testdate") AS beforeresult,
testschedule."test_id",
testschedule."asset_id",
testschedule."testdate" AS testdate,
testschedule."testresult_id" as scheduleresult_id
FROM
"public"."testresult" testresult
INNER JOIN "public"."testschedule" testschedule
ON testschedule."asset_id" = testresult."asset_id"
AND testschedule."test_id" = testresult."test_id"
WHERE
"testresult"."client_id" = 25368272
AND testresult."testdate" < testschedule."mintolerancedate"
AND testschedule."testdate" > '2016-10-01'
AND testschedule."testdate" < '2016-10-20'
GROUP BY
testschedule."asset_id",
testschedule."test_id",
testschedule."testdate",
testschedule."testresult_id"
ORDER BY MAX(testresult."testdate")
) lasttr
ON "testschedule"."asset_id" = lasttr."asset_id"
AND "testschedule"."test_id" = lasttr."test_id"
AND testschedule."testdate" = lasttr.testdate
This gives me the correct testdate. However, I also need the testschedule."testresult_id" that corresponds with the date. Is there a way to select this from the above query?

You can fiddle with window functions or in the most recent versions, do a left lateral join:
LEFT JOIN LATERAL
(SELECT ts."test_id", ts."asset_id", ts."testdate" AS testdate, ts."testresult_id" as scheduleresult_id,
tr.*
FROM "public"."testresult" tr INNER JOIN
"public"."testschedule" ts
ON ts."asset_id" = tr."asset_id" and ts."test_id" = tr."test_id"
WHERE tr."client_id" = 25368272 AND
tr."testdate" < ts."mintolerancedate" AND
ts."testdate" > '2016-10-01' and ts."testdate" < '2016-10-20' AND
"testschedule"."asset_id" = ts."asset_id" AND
"testschedule"."test_id" = ts."test_id"
ORDER BY tr."testdate" DESC
FETCH FIRST 1 ROW ONLY
) lasttr
A lateral join is like a correlated subquery in the FROM clause. You can return as many columns as you like. There is no ON clause because the join conditions are in the subquery's WHERE clause.

Related

SQL Server aggregate function without group by

I want to include tcon.Inductive_Injection_Hours, tcon.Capacitive_Injection_Hours without applying group by. How can I do that?
SELECT
bp.Serial_Number,
tcon.Serial_Number AS ConverterSerialNumber,
MAX(tcon.Time_Stamp) AS DateStamp,
tcon.Inductive_Injection_Hours,
tcon.Capacitive_Injection_Hours
FROM
dbo.Bypass AS bp
INNER JOIN
dbo.Converter AS c ON bp.Bypass_ID = c.Bypass_ID
INNER JOIN
dbo.Converter_Tel_Data AS tcon ON c.Converter_ID = tcon.Converter_ID
WHERE
(bp.Site_ID = 7)
GROUP BY
bp.Serial_Number, tcon.Serial_Number,
tcon.Inductive_Injection_Hours, tcon.Capacitive_Injection_Hours
ORDER BY
ConverterSerialNumber
I have figured it out.
select [data].Serial_Number,Time_Stamp,Inductive_Injection_Hours,Capacitive_Injection_Hours,b.Serial_Number from Converter_Tel_Data as [data]
inner join dbo.Converter AS c On [data].Converter_ID = c.Converter_ID
inner join dbo.Bypass as b on c.Bypass_ID = b.Bypass_ID
WHERE
(Time_Stamp = (SELECT MAX(Time_Stamp) FROM Converter_Tel_Data WHERE Converter_ID = [data].Converter_ID)) And ([data].Site_ID=7)
ORDER BY [data].Serial_Number
You can use row_number - either in a CTE/derived table or using a trick with TOP 1.
Select Top 1 With Ties
bp.Serial_Number
, tcon.Serial_Number AS ConverterSerialNumber
, tcon.Time_Stamp AS DateStamp
, tcon.Inductive_Injection_Hours
, tcon.Capacitive_Injection_Hours
From dbo.Bypass AS bp
Inner Join dbo.Converter AS c On bp.Bypass_ID = c.Bypass_ID
Inner Join dbo.Converter_Tel_Data AS tcon ON c.Converter_ID = tcon.Converter_ID
Where bp.Site_ID = 7
Order By
row_number() over(Partition By bp.Serial_Number Order By tcon.Time_Stamp desc)
This should return the latest row from the tconn table for each bp.Serial_Number.

Removing duplicates rows in left Joining query

SELECT Rec.[Reg_ID]
,Rec.[Reg_No]
,Rec.[Case_ID]
,Det.Deleted AS CaseDeleted
,[Status].[Status]
,Det.[Unit_Submission_Date] AS [Signature]
,TD.TargetDate AS [Target]
,TD.TargetID
FROM [dbo].[Regestrations] Rec
LEFT JOIN [dbo].[Reg_Details] Det ON Rec.Case_ID = Det.CaseID
LEFT JOIN [dbo].[lkpStatus] [Status] ON Rec.Status_ID = [Status].StatusID
LEFT JOIN TargetDate TD ON TD.RecommId = Rec.Reg_ID
WHERE (Det.MissionID = 50 AND [Status].[Status] = 1 AND Rec.Deleted = 0 AND Det.Deleted = 0)
GROUP BY Rec.[Reg_ID],Rec.[Reg_No],Rec.[Case_ID]
,[Status].[Status]
,Det.[Unit_Submission_Date]
,TD.TargetDate
,Det.Deleted
,TD.TargetID
ORDER BY TD.TargetID desc
I have the above query that is supposed to return rows with unique Rec.[Reg_No]. But joined table TargetDate can have duplicate Rec.[Reg_ID] and if thats the case i get duplicate Rec.[Reg_No] rows in my results.
Table TargetDate has a date time column so i want to eliminate the duplicate Rec.[Reg_No] by selecting 1 row with the latest date value from table TargetDate.
How do modify my Join condition or the query where clause to achive the above?
One way is to use a window function such as ROW_NUMBER() that will generate sequential number based on the specified partition. This generated number can then be used to get the latest row.
SELECT Reg_ID, Reg_No, Case_ID, CaseDeleted, [Status], Signature, [Target], TargetID
FROM
(
SELECT Rec.[Reg_ID]
,Rec.[Reg_No]
,Rec.[Case_ID]
,Det.Deleted AS CaseDeleted
,[Status].[Status]
,Det.[Unit_Submission_Date] AS [Signature]
,TD.TargetDate AS [Target]
,TD.TargetID
,RN = ROW_NUMBER() OVER (PARTITION BY Rec.[Reg_No] ORDER BY TD.TargetID DESC)
FROM [BOI].[dbo].[Regestrations] Rec
LEFT JOIN [dbo].[Reg_Details] Det ON Rec.Case_ID = Det.CaseID
LEFT JOIN [dbo].[lkpStatus] [Status] ON Rec.Status_ID = [Status].StatusID
LEFT JOIN TargetDate TD ON TD.RecommId = Rec.Reg_ID
WHERE (Det.MissionID = 50 AND [Status].[Status] = 1 AND Rec.Deleted = 0 AND Det.Deleted = 0)
) subQuery
WHERE RN = 1
ORDER BY TargetID desc
This query can work correctly if you remove TD.TargetDate from GROUP BY clause and compute what you really need in output - MAX(TD.TargetDate)
But preferable way it to avoid GROUP BY clause at all:
...
FROM [BOI].[dbo].[Regestrations] Rec
LEFT JOIN [dbo].[Reg_Details] Det ON Rec.Case_ID = Det.CaseID
LEFT JOIN [dbo].[lkpStatus] [Status] ON Rec.Status_ID = [Status].StatusID
OUTER APPLY(
SELECT TOP 1 td.TargetDate, td.TargetID
FROM TargetDate TD
WHERE TD.RecommId = Rec.Reg_ID
ORDER BY TD.TargetDate DESC
) td
...
You should first find latest TargetDate for each Reg_ID or RecommId. Then you can use your normal join with TargetDate table just this time with matching both the RecommId and TargetDate.
Try this Query:
SELECT Rec.[Reg_ID]
,Rec.[Reg_No]
,Rec.[Case_ID]
,Det.Deleted AS CaseDeleted
,[Status].[Status]
,Det.[Unit_Submission_Date] AS [Signature]
,TD.TargetDate AS [Target]
,TD.TargetID
FROM [BOI].[dbo].[Regestrations] Rec
LEFT JOIN [dbo].[Reg_Details] Det ON Rec.Case_ID = Det.CaseID
LEFT JOIN [dbo].[lkpStatus] [Status] ON Rec.Status_ID = [Status].StatusID
LEFT JOIN (SELECT RecommId, MAX(TargetDate) MaxTargetDate GROUP BY RecommId) TDWithLatestDate ON TDWithLatestDate.RecommId = Rec.Reg_ID
LEFT OUTER JOIN TargetDate ON TD.RecommId = TDWithLatestDate.RecommId AND TD.TargetDate = TDWithLatestDate.MaxTargetDate
WHERE (Det.MissionID = 50
AND [Status].[Status] = 1
AND Rec.Deleted = 0
AND Det.Deleted = 0
)
GROUP BY Rec.[Reg_ID]
,Rec.[Reg_No]
,Rec.[Case_ID]
,[Status].[Status]
,Det.[Unit_Submission_Date]
,TD.TargetDate
,Det.Deleted
,TD.TargetID
ORDER BY TD.TargetID desc
You can improve this query if you want to avoid tie when there more than one record fighting to be latest.

Should a subquery on a join use tables from an outer query in the where clause?

I need to add a subquery to a join, because one payment can have more than one allotment, so I only need to account for the first match (where rownum = 1).
However, I'm not sure if adding pmt from the outer query to the subquery on the allotment join is best.
Should I be doing this differently in the event of performance hits, etc.. ?
SELECT
pmt.payment_uid,
alt.allotment_uid,
FROM
payment pmt
/* HERE: is the reference to pmt.pay_key and pmt.client_id
incorrect in the below subquery? */
INNER JOIN allotment alc ON alt.allotment_uid = (
SELECT
allotment_uid
FROM
allotment
WHERE
pay_key = pmt.pay_key
AND
pay_code = 'xyz'
AND
deleted = 'N'
AND
client_id = pmt.client_id
AND
ROWNUM = 1
)
WHERE
AND
pmt.deleted = 'N'
AND
pmt.date_paid >= TO_DATE('2017-07-01')
AND
pmt.date_paid < TO_DATE('2017-10-01') + 1;
It's difficult to identify the performance issue in your query without seeing an explain plan output. You query does seem to do an additional SELECT on the allotment for every record from the main query.
Here is a version which doesn't use correlated sub query. Obviously I haven't been able to test it. It does a simple join in and then filters all records except one of the allotments. Hope this helps.
WITH v_payment
AS
(
SELECT
pmt.payment_uid,
alt.allotment_uid,
ROW_NUMBER () OVER(PARTITION BY allotment_id) r_num
FROM
payment pmt JOIN allotment alt
ON (pmt.pay_key = alt.pay_key AND
pmt.client_id = alt.client_id)
WHERE pmt.deleted = 'N' AND
pmt.date_paid >= TO_DATE('2017-07-01') AND
pmt.date_paid < TO_DATE('2017-10-01') + 1 AND
alt.pay_code = 'xyz' AND
alt.deleted = 'N'
)
SELECT payment_uid,
allotment_uid
FROM v_payment
WHERE r_num = 1;
Let's know how this performs!
You can phrase the query that way. I would be more likely to do:
SELECT . . .
FROM payment p INNER JOIN
(SELECT a.*,
ROW_NUMBER() OVER (PARTITION BY pay_key, client_id
ORDER BY allotment_uid
) as seqnum
FROM allotment a
WHERE pay_code = 'xyz' AND deleted = 'N'
) a
ON a.pay_key = p.pay_key AND a.client_id = p.client_id AND
seqnum = 1
WHERE p.deleted = 'N' AND
p.date_paid >= DATE '2017-07-01' AND
p.date_paid < (DATE '2017-10-01') + 1;

returning only the most recent record in dataset

I have a query returning data for several dates.
I want to return only record with the most recent date in the field SAPOD (the date is in fact CYYMMDD where C=0 before 2000 and 1 after 2000 and I can show this as YYYYMMDD using SAPOD=Case when LEFT(SAPOD,1)=1 then '20' else '19' end + SUBSTRING(cast(sapod as nvarchar(7)),2,7))
here is my query:
SELECT GFCUS, Ne.NEEAN, SCDLE, SAPOD, SATCD,CUS.GFCUN, BGCFN1,
BGCFN2, BGCFN3, SV.SVCSA, SV.SVNA1, SV.SVNA2, SV.SVNA3,
SV.SVNA4, SV.SVNA5, SV.SVPZIP, SV.NONUK
FROM SCPF ACC
INNER JOIN GFPF CUS ON GFCPNC = SCAN
LEFT OUTER JOIN SXPF SEQ ON SXCUS = GFCUS AND SXPRIM = ''
LEFT OUTER JOIN SVPFClean SV ON SVSEQ = SXSEQ
LEFT OUTER JOIN BGPF ON BGCUS = GFCUS AND BGCLC = GFCLC
LEFT OUTER JOIN NEPF NE ON SCAB=NE.NEAB and SCAN=ne.NEAN and SCAS=ne.NEAS
LEFT OUTER JOIN SAPF SA ON SCAB=SAAB and SCAN=SAAN and SCAS=SAAS
WHERE
(SATCD>500 and
scsac='IV' and
scbal = 0 and
scai30<>'Y' and
scai14<>'Y' and
not exists(select * from v5pf where v5and=scan and v5bal<>0))
GROUP BY GFCUS, Ne.NEEAN, SCDLE, SAPOD, SATCD,
CUS.GFCUN, BGCFN1, BGCFN2, BGCFN3, SV.SVCSA,
SV.SVNA1, SV.SVNA2, SV.SVNA3, SV.SVNA4, SV.SVNA5, SV.SVPZIP, SV.NONUK
ORDER BY MAX(SCAN) ASC, SAPOD DESC
I am getting results like the below where there are several transactions by a customer, and we only want to show the data of the most recent transaction:
So how can I show just the most recent transaction? Is this a case where I should use an OUTER APPLY or CROSS APPY?
EDIT:
Sorry I should clarify that I need the most recent date for each of the unique records in the field NEEAN which is the Account number
You can use ROW_NUMBER() as follows:
SELECT
ROW_NUMBER() OVER (PARTITION BY Ne.NEEAN ORDER BY SAPOD DESC) AS [Row],
GFCUS, Ne.NEEAN, SCDLE, SAPOD, SATCD,CUS.GFCUN, BGCFN1,
BGCFN2, BGCFN3, SV.SVCSA, SV.SVNA1, SV.SVNA2, SV.SVNA3,
SV.SVNA4, SV.SVNA5, SV.SVPZIP, SV.NONUK
FROM SCPF ACC
INNER JOIN GFPF CUS ON GFCPNC = SCAN
LEFT OUTER JOIN SXPF SEQ ON SXCUS = GFCUS AND SXPRIM = ''
LEFT OUTER JOIN SVPFClean SV ON SVSEQ = SXSEQ
LEFT OUTER JOIN BGPF ON BGCUS = GFCUS AND BGCLC = GFCLC
LEFT OUTER JOIN NEPF NE ON SCAB=NE.NEAB and SCAN=ne.NEAN and SCAS=ne.NEAS
LEFT OUTER JOIN SAPF SA ON SCAB=SAAB and SCAN=SAAN and SCAS=SAAS
WHERE
(SATCD>500 and
scsac='IV' and
scbal = 0 and
scai30<>'Y' and
scai14<>'Y' and
not exists(select * from v5pf where v5and=scan and v5bal<>0)) and
[Row] = 1
GROUP BY GFCUS, Ne.NEEAN, SCDLE, SAPOD, SATCD,
CUS.GFCUN, BGCFN1, BGCFN2, BGCFN3, SV.SVCSA,
SV.SVNA1, SV.SVNA2, SV.SVNA3, SV.SVNA4, SV.SVNA5, SV.SVPZIP, SV.NONUK
ORDER BY MAX(SCAN) ASC
You could encapsulate this within a subquery if you don't want to return the [Row] column.
you can user row_number to get top 1 row per customer
In the where clause need to return values with pos value as 1
sample query
row_number() over ( partition by GFCUS order by SAPOD desc) as pos

Need to speed up the results of this SQL statement. Any advice?

I've got the following SQL Statement that needs some major speed up. The problem is I need to search on two fields, where each of them is calling several sub-selects. Is there a way to join the two fields together so I call the sub-selects only once?
SELECT billyr, billno, propacct, vinid, taxpaid, duedate, datepif, propdesc
FROM trcdba.billspaid
WHERE date(datepif) > '01/06/2009'
AND date(datepif) <= '01/06/2010'
AND custno in
(select custno from cwdba.txpytaxid where taxpayerno in
(select taxpayerno from cwdba.txpyaccts where accountno in
(select accountno from rtadba.reasacct where controlno = 1234567)))
OR custno2 in
(select custno from cwdba.txpytaxid where taxpayerno in
(select taxpayerno from cwdba.txpyaccts where accountno in
(select accountno from rtadba.reasacct where controlno = 1234567)))
I would use joins instead of the embedded sub-queries.
when you use a function on the column:
date(datepif) > '01/06/2009'
AND date(datepif) <= '01/06/2010'
an index will NOT be used. Try something like this
datepif > someconversionhere('01/06/2009')
AND datepif <= someconversionhere('01/06/2010')
Use inner joins too. There isn't any info in the question to indicate table size or if there is an index or not, so this is a guess and should work best if there are many more rows in billspaid for the date range vs rows that match the joining tables for r.controlno = 1234567, which I suspect is the case:
SELECT
COALESCE(b1.billyr,b2.billyr) AS billyr
,COALESCE(b1.billno,b2.billno) AS billno
,COALESCE(b1.propacct,b2.propacct) AS propacct
,COALESCE(b1.vinid,b2.vinid) AS vinid
,COALESCE(b1.taxpaid,b2.taxpaid) AS taxpaid
,COALESCE(b1.duedate,b2.duedate) AS duedate
,COALESCE(b1.datepif,b2.datepif) AS datepif
,COALESCE(b1.propdesc,b2.propdesc) AS propdesc
FROM rtadba.reasacct r
INNER JOIN cwdba.txpyaccts a ON r.accountno=t.accountno
INNER JOIN cwdba.txpytaxid t ON a.taxpayerno=t.taxpayerno
LEFT OUTER JOIN trcdba.billspaid b1 ON t.custno=b1.custno AND b1.datepif > someconversionhere('01/06/2009') AND b1.datepif <= someconversionhere('01/06/2010')
LEFT OUTER JOIN trcdba.billspaid b2 ON t.custno2=b2.custno AND b2.datepif > someconversionhere('01/06/2009') AND b2.datepif <= someconversionhere('01/06/2010')
WHERE r.controlno = 1234567
AND COALESCE(b1.custno,b2.custno) IS NOT NULL
create an index for each of these:
rtadba.reasacct.controlno and cover on accountno
cwdba.txpyaccts.accountno and cover on taxpayerno
cwdba.txpytaxid.taxpayerno and cover on custno
trcdba.billspaid.custno +datepif
trcdba.billspaid.custno2 +datepif
Here's the same thing using JOIN instead of sub queries.
SELECT billyr, billno, propacct, vinid, taxpaid, duedate, datepif, propdesc
FROM billspaid
INNER JOIN txpytaxid
ON txpytaxid.custno = billspaid.custno OR txpytaxid.custno = billspaid.custno2
INNER JOIN txpyaccts
ON txpyaccts.taxpayerno = txpytaxid.taxpayerno
INNER JOIN reasacct
ON reasacct.accountno = txpyaccts.accountno AND reasacct.controlno = 1234567
WHERE date(datepif) > '01/06/2009'
AND date(datepif) <= '01/06/2010'
However, if the OR in the JOIN is giving you performance problems, you can always try using a union:
(SELECT billyr, billno, propacct, vinid, taxpaid, duedate, datepif, propdesc
FROM billspaid
INNER JOIN txpytaxid
ON txpytaxid.custno = billspaid.custno
INNER JOIN txpyaccts
ON txpyaccts.taxpayerno = txpytaxid.taxpayerno
INNER JOIN reasacct
ON reasacct.accountno = txpyaccts.accountno AND reasacct.controlno = 1234567
WHERE date(datepif) > '01/06/2009'
AND date(datepif) <= '01/06/2010')
UNION
(SELECT billyr, billno, propacct, vinid, taxpaid, duedate, datepif, propdesc
FROM billspaid
INNER JOIN txpytaxid
ON txpytaxid.custno = billspaid.custno2
INNER JOIN txpyaccts
ON txpyaccts.taxpayerno = txpytaxid.taxpayerno
INNER JOIN reasacct
ON reasacct.accountno = txpyaccts.accountno AND reasacct.controlno = 1234567
WHERE date(datepif) > '01/06/2009'
AND date(datepif) <= '01/06/2010')
Use EXISTS instead of IN ( unless the result set of the IN subquery is very small).
If you do UNION instead of OR ( which should be functionally equivalent ) use UNION ALL instead.