How to select a single row for each unique ID


I'm new to SQL and learning on the job, using IBM Netezza and Aginity Workbench. I have a problem I don't know how to overcome.
My current query tries to return one row per case number based on when a task was created, keeping only the row with the newest task. This gets me about 85% of the way there. The issue is that sometimes multiple tasks for the same case are created on the same day.
I would like to use Task Followup Date as a tie-breaker so that only the newest row is kept when multiple rows share the same Case Number. I posted an example of what my current code outputs and what I would like it to output.
Current code
SELECT
A.PS_CASE_ID AS Case_Number
,D.CASE_TASK_TYPE_NM AS Task
,C.TASK_CRTE_TMS
,C.TASK_FLWUP_DT AS Task_Followup_Date
FROM VW_CC_CASE A
INNER JOIN VW_CASE_TASK C ON (A.CASE_ID = C.CASE_ID)
INNER JOIN VW_CASE_TASK_TYPE D ON (C.CASE_TASK_TYPE_ID = D.CASE_TASK_TYPE_ID)
INNER JOIN ADMIN.VW_RSN_CTGY B ON (A.RSN_CTGY_ID = B.RSN_CTGY_ID)
WHERE
(A.PS_Z_SPSR_ID LIKE '%EFT' OR A.PS_Z_SPSR_ID LIKE '%CRDT')
AND CAST(A.CASE_CRTE_TMS AS DATE) >= '2020-01-01'
AND B.RSN_CTGY_NM = 'Chargeback Initiation'
AND CAST(C.TASK_CRTE_TMS AS DATE) = (SELECT MAX(CAST(C2.TASK_CRTE_TMS AS DATE)) from VW_CASE_TASK C2 WHERE C2.CASE_ID = C.CASE_ID)
GROUP BY
A.PS_CASE_ID
,D.CASE_TASK_TYPE_NM
,C.TASK_CRTE_TMS
,C.TASK_FLWUP_DT
Current output
Desired output

You could use ROW_NUMBER here:
WITH cte AS (
SELECT DISTINCT A.PS_CASE_ID AS Case_Number, D.CASE_TASK_TYPE_NM AS Task,
C.TASK_CRTE_TMS, C.TASK_FLWUP_DT AS Task_Followup_Date,
ROW_NUMBER() OVER (PARTITION BY A.PS_CASE_ID ORDER BY C.TASK_FLWUP_DT DESC) rn
FROM VW_CC_CASE A
INNER JOIN VW_CASE_TASK C ON A.CASE_ID = C.CASE_ID
INNER JOIN VW_CASE_TASK_TYPE D ON C.CASE_TASK_TYPE_ID = D.CASE_TASK_TYPE_ID
INNER JOIN ADMIN.VW_RSN_CTGY B ON A.RSN_CTGY_ID = B.RSN_CTGY_ID
WHERE (A.PS_Z_SPSR_ID LIKE '%EFT' OR A.PS_Z_SPSR_ID LIKE '%CRDT') AND
CAST(A.CASE_CRTE_TMS AS DATE) >= '2020-01-01' AND
B.RSN_CTGY_NM = 'Chargeback Initiation' AND
CAST(C.TASK_CRTE_TMS AS DATE) = (SELECT MAX(CAST(C2.TASK_CRTE_TMS AS DATE))
FROM VW_CASE_TASK C2
WHERE C2.CASE_ID = C.CASE_ID)
)
SELECT
Case_Number,
Task,
TASK_CRTE_TMS,
Task_Followup_Date
FROM cte
WHERE rn = 1;
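To see the tie-breaking idea in isolation, here is a minimal sketch in Python using the standard sqlite3 module rather than Netezza. The case_task table and its rows are made up for illustration and stand in for the joined views. It assumes the bundled SQLite is version 3.25 or newer, which is when window functions arrived. Ordering by creation day first and follow-up date second lets ROW_NUMBER keep one row per case even when two tasks share a creation day.

```python
import sqlite3

# Hypothetical, simplified stand-in for the joined case/task views.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE case_task (
        case_number   TEXT,
        task          TEXT,
        task_crte_tms TEXT,
        task_flwup_dt TEXT
    )
""")
conn.executemany(
    "INSERT INTO case_task VALUES (?, ?, ?, ?)",
    [
        # C1 has two tasks created on the same day, with different follow-up dates.
        ("C1", "Review",  "2020-03-01 09:00:00", "2020-03-05"),
        ("C1", "Dispute", "2020-03-01 15:30:00", "2020-03-10"),
        ("C1", "Intake",  "2020-02-20 08:00:00", "2020-02-25"),
        ("C2", "Intake",  "2020-04-02 11:00:00", "2020-04-09"),
    ],
)

# Rank rows within each case: newest creation day first,
# then newest follow-up date as the tie-breaker.
latest_rows = conn.execute("""
    WITH ranked AS (
        SELECT case_number, task, task_flwup_dt,
               ROW_NUMBER() OVER (
                   PARTITION BY case_number
                   ORDER BY DATE(task_crte_tms) DESC, task_flwup_dt DESC
               ) AS rn
        FROM case_task
    )
    SELECT case_number, task, task_flwup_dt
    FROM ranked
    WHERE rn = 1
    ORDER BY case_number
""").fetchall()

print(latest_rows)
```

Because the ranking happens inside the window function, no separate MAX subquery is needed in this simplified version.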

One method uses window functions:
with cte as (
< your query here >
)
select x.*
from (select cte.*,
row_number() over (partition by case_number
order by TASK_CRTE_TMS desc, Task_Followup_Date desc
) as seqnum
from cte
) x
where seqnum = 1;

Related

SQL - ROW_NUMBER that is used in a multi-condition LEFT JOIN

Two tables store different properties for each product: CTI_ROUTING_VIEW and ORD_MACH_OPS.
They are both organized by SPEC_NO > MACH_SEQ_NO, but the format of the sequence number differs between the tables, so it can't be used for a JOIN. ORD_MACH_OPS has MACHINE and PASS_NO: if a product goes through the same machine twice, the row with the higher SEQ_NO gets PASS_NO 2, 3, and so on. CTI_ROUTING_VIEW does not offer PASS_NO, but I can achieve the desired result with:
SELECT TOP (1000) [SPEC_NO]
,[SPEC_PART_NO]
,[MACH_NO]
,[MACH_SEQ_NO]
,[BLANK_WID]
,[BLANK_LEN]
,[NO_OUT_WID]
,[NO_OUT_LEN]
,[SU_MINUTES]
,[RUN_SPEED]
,[NO_COLORS]
,[PRINTDIEID]
,[CUTDIEID]
,ROW_NUMBER() OVER (PARTITION BY MACH_NO ORDER BY MACH_SEQ_NO) as PASS_NO
FROM [CREATIVE].[dbo].[CTI_ROUTING_VIEW]
I would think that I could use this artificial PASS_NO as a JOIN condition, but I can't seem to get it to come through. This is my first time using ROW_NUMBER() so I'm just wondering if I'm doing something wrong in the JOIN syntax.
SELECT rOrd.[SPEC_NO]
,rOrd.[MACH_SEQ_NO]
,rOrd.[WAS_REROUTED]
,rOrd.[NO_OUT]
,rOrd.[PART_COMP_FLG]
,rOrd.[SCHED_START]
,rOrd.[SCHED_STOP]
,rOrd.[MACH_REROUTE_FLG]
,rOrd.[MACH_DESCR]
,rOrd.REPLACED_MACH_NO
,rOrd.MACH_NO
,rOrd.PASS_NO
,rWip.MAX_TRX_DATETIME
,ISNULL(rWip.NET_FG_SUM*rOrd.NO_OUT,0) as NET_FG_SUM
,CASE
WHEN rCti.BLANK_WID IS NULL then 'N//A'
ELSE CONCAT(rCti.BLANK_WID, ' X ', rCti.BLANK_LEN)
END AS SIZE
,ISNULL(rCti.PRINTDIEID,'N//A') as PRINTDIEID
,ISNULL(rCti.CUTDIEID, 'N//A') as CUTDIEID
,rStyle.DESCR as STYLE
,ISNULL(rCti.NO_COLORS, 0) as NO_COLORS
,CAST(CONCAT(rOrd.ORDER_NO,'-',rOrd.ORDER_PART_NO) as varchar) as ORD_MACH_KEY
FROM [CREATIVE].[dbo].[ORD_MACH_OPS] as rOrd
LEFT JOIN (SELECT DISTINCT
[SPEC_NO]
,[SPEC_PART_NO]
,[MACH_NO]
,MACH_SEQ_NO
,[BLANK_WID]
,[BLANK_LEN]
,[NO_COLORS]
,[PRINTDIEID]
,[CUTDIEID]
,ROW_NUMBER() OVER (PARTITION BY MACH_NO ORDER BY MACH_SEQ_NO) as PASS_NO
FROM [CREATIVE].[dbo].[CTI_ROUTING_VIEW]) as rCti
ON rCti.SPEC_NO = rOrd.SPEC_NO
and rCti.MACH_NO =
CASE
WHEN rOrd.REPLACED_MACH_NO is null then rOrd.MACH_NO
ELSE rOrd.REPLACED_MACH_NO
END
and rCti.PASS_NO = rOrd.PASS_NO
LEFT JOIN INVENTORY_ITEM_TAB as rTab
ON rTab.SPEC_NO = rOrd.SPEC_NO
LEFT JOIN STYLE_DESCRIPTION as rStyle
ON rStyle.DESCR_CD = rTab.STYLE_CD
LEFT JOIN (
SELECT
JOB_NUMBER
,FORM_NO
,TRX_ORIG_MACH_NO
,PASS_NO
,SUM(GROSS_FG_QTY-WASTE_QTY) as NET_FG_SUM
,MAX(TRX_DATETIME) as MAX_TRX_DATETIME
FROM WIP_MACH_OPS
WHERE GROSS_FG_QTY <> 0
GROUP BY JOB_NUMBER, FORM_NO, TRX_ORIG_MACH_NO, PASS_NO) as rWip
ON rWip.JOB_NUMBER = rOrd.ORDER_NO
and rWip.FORM_NO = rOrd.ORDER_PART_NO
and rWip.TRX_ORIG_MACH_NO = rOrd.MACH_NO
and rWip.PASS_NO = rOrd.PASS_NO
WHERE rOrd.SCHED_START > DATEADD(DAY, -20, GETDATE())

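I fixed it by adding SPEC_NO as a second partition column, so the pass numbering restarts for each spec and machine instead of running across every spec that uses the same machine: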
ROW_NUMBER() OVER (PARTITION BY SPEC_NO, MACH_NO ORDER BY MACH_SEQ_NO) as PASS_NO
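The effect of the extra partition column can be checked on a toy table. The sketch below uses Python's sqlite3 module (assuming SQLite 3.25+ for window functions) with a hypothetical routing table and invented rows. It is not the real CTI_ROUTING_VIEW, only an illustration of how the numbering changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of the routing data: two specs share machine "A".
conn.execute(
    "CREATE TABLE routing (spec_no INTEGER, mach_no TEXT, mach_seq_no INTEGER)"
)
conn.executemany(
    "INSERT INTO routing VALUES (?, ?, ?)",
    [(100, "A", 10), (100, "B", 20), (100, "A", 30), (200, "A", 15)],
)


def pass_numbers(partition_cols):
    """Number rows per partition; partition_cols is trusted, hard-coded text."""
    sql = f"""
        SELECT spec_no, mach_no, mach_seq_no,
               ROW_NUMBER() OVER (
                   PARTITION BY {partition_cols}
                   ORDER BY mach_seq_no
               ) AS pass_no
        FROM routing
        ORDER BY spec_no, mach_seq_no
    """
    return conn.execute(sql).fetchall()


# Partitioning by machine alone mixes specs 100 and 200 together.
by_machine = pass_numbers("mach_no")
# Partitioning by spec and machine restarts the count for each spec.
by_spec_and_machine = pass_numbers("spec_no, mach_no")

print(by_machine)
print(by_spec_and_machine)
```

With the machine-only partition, spec 200's single pass through machine "A" is numbered 2 and spec 100's second pass becomes 3, which is why the join on PASS_NO found no match.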

Return calculated column values for SELECT DISTINCT query

I have the following table in SQLite:
_id|token|status|timestamp|mood|eta|name|calc_eta
__________________________________________________________________________
168|iqmC.3aHMBGbl|ok|1516625084498|50|-4154|Sample Name|1516625533082
169|iqmC.3aHMBGbl|ok|1516625084498|50|-4214|Sample Name|1516625533108
170|iqmC.3aHMBGbl|ok|1516625084498|50|-4274|Sample Name|1516625533414
171|iqmC.3aHMBGbl|ok|1516625084498|50|-4334|Sample Name|1516625533160
172|iqmC.3aHMBGbl|ok|1516625084498|50|-4394|Sample Name|1516625533680
173|iqmC.3aHMBGbl|ok|1516625084498|50|-4420|Sample Name|1516625533068
174|iqmC.3aHMBGbl|ok|1516625084498|50|-4428|Sample Name|1516625533482
175|iqmC.3aHMBGbl|ok|1516625084498|50|-4483|Sample Name|1516625533155
176|iqmC.3aHMBGbl|ok|1516625084498|50|-4543|Sample Name|1516625533148
177|TFbintkHMBw4H|ok|1516630122485|50|2526|Sample Name|1516632672019
178|TFbintkHMBw4H|ok|1516630122485|50|2520|Sample Name|1516632671903
179|TFbintkHMBw4H|ok|1516630122485|50|2460|Sample Name|1516632672321
180|TFbintkHMBw4H|ok|1516630122485|50|2344|Sample Name|1516632672859
181|TFbintkHMBw4H|ok|1516630122485|50|2336|Sample Name|1516632671939
182|TFbintkHMBw4H|ok|1516630122485|50|2281|Sample Name|1516632672802
183|TFbintkHMBw4H|ok|1516630122485|50|2220|Sample Name|1516632671828
184|TFbintkHMBw4H|ok|1516630122485|50|2161|Sample Name|1516632672625
I'm trying to come up with a query that gives me the difference between the calc_eta values of the two newest rows (newest by the auto-increment _id) for each distinct token value.
So in this case the result should be:
iqmC.3aHMBGbl|-7
TFbintkHMBw4H|797
I got this far with the SQL, but it does not currently provide the calculated value for each distinct token, and I'm not sure how to go further.
SELECT DISTINCT token,
(SELECT calc_eta
FROM DATA s
WHERE
(SELECT count(*)
FROM DATA f
WHERE f.token = s.token
AND f._id >= s._id) <= 1) -
(SELECT calc_eta
FROM
(SELECT calc_eta,
MIN(_id)
FROM DATA s
WHERE
(SELECT count(*)
FROM DATA f
WHERE f.token = s.token
AND f._id >= s._id) <= 2)) AS delay
FROM DATA;

In most SQL dialects, you would use window functions such as lag():
select d.*,
(calc_eta - prev_calc_eta) as diff
from (select d.*,
lag(calc_eta) over (partition by token order by _id) as prev_calc_eta,
row_number() over (partition by token order by _id desc) as seqnum
from data d
) d
where seqnum = 1;
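Here is a small runnable check of the lag() approach using Python's sqlite3 module. It loads only the last three rows per token from the question's table and keeps just the columns the query needs. It assumes SQLite 3.25 or newer, since older builds lack window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (_id INTEGER PRIMARY KEY, token TEXT, calc_eta INTEGER)"
)
# Last three rows per token, copied from the question's sample data.
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [
        (174, "iqmC.3aHMBGbl", 1516625533482),
        (175, "iqmC.3aHMBGbl", 1516625533155),
        (176, "iqmC.3aHMBGbl", 1516625533148),
        (182, "TFbintkHMBw4H", 1516632672802),
        (183, "TFbintkHMBw4H", 1516632671828),
        (184, "TFbintkHMBw4H", 1516632672625),
    ],
)

# lag() fetches the previous row's calc_eta within each token;
# row_number() ordered by _id DESC marks the newest row as 1.
rows = conn.execute("""
    SELECT token, calc_eta - prev_calc_eta AS diff
    FROM (
        SELECT token, calc_eta,
               LAG(calc_eta) OVER (PARTITION BY token ORDER BY _id) AS prev_calc_eta,
               ROW_NUMBER() OVER (PARTITION BY token ORDER BY _id DESC) AS seqnum
        FROM data
    )
    WHERE seqnum = 1
""").fetchall()

diffs = dict(rows)
print(diffs)
```

The two differences match the values the question asks for: -7 for iqmC.3aHMBGbl and 797 for TFbintkHMBw4H.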

Use of MAX function in SQL query to filter data

The code below joins two tables. I need to extract only the latest date per account, although the tables hold multiple accounts and history records. I wanted to use the MAX function, but I'm not sure how to incorporate it in this case. I am using My SQL server.
I appreciate any help!
select
PROP.FileName,PROP.InsName, PROP.Status,
PROP.FileTime, PROP.SubmissionNo, PROP.PolNo,
PROP.EffDate,PROP.ExpDate, PROP.Region,
PROP.Underwriter, PROP_DATA.Data , PROP_DATA.Label
from
Property.dbo.PROP
inner join
Property.dbo.PROP_DATA on Property.dbo.PROP.FileID = Actuarial.dbo.PROP_DATA.FileID
where
(PROP_DATA.Label in ('Occupancy' , 'OccupancyTIV'))
and (PROP.EffDate >= '42278' and PROP.EffDate <= '42643')
and (PROP.Status = 'Bound')
and (Prop.FileTime = Max(Prop.FileTime))
order by
PROP.EffDate DESC

Assuming your DBMS supports windowing functions and the with clause, a max windowing function would work:
with all_data as (
select
PROP.FileName,PROP.InsName, PROP.Status,
PROP.FileTime, PROP.SubmissionNo, PROP.PolNo,
PROP.EffDate,PROP.ExpDate, PROP.Region,
PROP.Underwriter, PROP_DATA.Data , PROP_DATA.Label,
max (PROP.EffDate) over (partition by PROP.PolNo) as max_date
from Actuarial.dbo.PROP
inner join Actuarial.dbo.PROP_DATA
on Actuarial.dbo.PROP.FileID = Actuarial.dbo.PROP_DATA.FileID
where (PROP_DATA.Label in ('Occupancy' , 'OccupancyTIV'))
and (PROP.EffDate >= '42278' and PROP.EffDate <= '42643')
and (PROP.Status = 'Bound')
)
select
FileName, InsName, Status, FileTime, SubmissionNo,
PolNo, EffDate, ExpDate, Region, UnderWriter, Data, Label
from all_data
where EffDate = max_date
ORDER BY EffDate DESC
This also presupposes that any given account would not have two records with the same EffDate. If it can, and there is no other objective means to determine the latest record, you could also use row_number to pick a somewhat arbitrary record in the case of a tie.
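The difference between the MAX window filter and a ROW_NUMBER tie-break shows up as soon as two records share the latest date. The following sketch uses Python's sqlite3 module (SQLite 3.25+ assumed) with a hypothetical three-column prop table and invented rows. Using file_time as the tie-breaker is an assumption made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical cut-down table; P1 has two records on its latest eff_date.
conn.execute("CREATE TABLE prop (pol_no TEXT, eff_date INTEGER, file_time TEXT)")
conn.executemany(
    "INSERT INTO prop VALUES (?, ?, ?)",
    [
        ("P1", 42300, "2016-01-01"),
        ("P1", 42400, "2016-02-01"),
        ("P1", 42400, "2016-03-01"),
        ("P2", 42350, "2016-01-15"),
    ],
)

# MAX() OVER keeps every row that ties on the latest date.
max_filtered = conn.execute("""
    SELECT pol_no, eff_date, file_time
    FROM (SELECT p.*, MAX(eff_date) OVER (PARTITION BY pol_no) AS max_date
          FROM prop p)
    WHERE eff_date = max_date
    ORDER BY pol_no, file_time
""").fetchall()

# ROW_NUMBER() keeps exactly one row per policy, using file_time to break ties.
one_per_policy = conn.execute("""
    SELECT pol_no, eff_date, file_time
    FROM (SELECT p.*,
                 ROW_NUMBER() OVER (PARTITION BY pol_no
                                    ORDER BY eff_date DESC, file_time DESC) AS rn
          FROM prop p)
    WHERE rn = 1
    ORDER BY pol_no
""").fetchall()

print(len(max_filtered), len(one_per_policy))
```

If no column gives a meaningful order among tied rows, the row ROW_NUMBER picks is arbitrary, as noted above.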

Using straight SQL, you can use a self-join in a subquery in your where clause to eliminate values smaller than the max, or smaller than the top n largest, and so on. Just set the number in <= 1 to the number of top values you want per group.
Something like the following might do the trick, for example:
select
p.FileName
, p.InsName
, p.Status
, p.FileTime
, p.SubmissionNo
, p.PolNo
, p.EffDate
, p.ExpDate
, p.Region
, p.Underwriter
, pd.Data
, pd.Label
from Actuarial.dbo.PROP p
inner join Actuarial.dbo.PROP_DATA pd
on p.FileID = pd.FileID
where (
select count(*)
from Actuarial.dbo.PROP p2
where p2.FileID = p.FileID
and p2.EffDate >= p.EffDate
) <= 1
and (
pd.Label in ('Occupancy' , 'OccupancyTIV')
and p.Status = 'Bound'
)
ORDER BY p.EffDate DESC
Have a look at this stackoverflow question for a full working example.
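As a rough illustration of the counting technique, the sketch below runs it in SQLite through Python's sqlite3 module. The prop table, its pol_no grouping column, and the rows are hypothetical. For each row, the subquery counts rows in the same group whose date is at or after that row's date, so a count of n or less means the row is among the n latest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: P1 has three effective dates, P2 has one.
conn.execute("CREATE TABLE prop (pol_no TEXT, eff_date INTEGER)")
conn.executemany(
    "INSERT INTO prop VALUES (?, ?)",
    [("P1", 42300), ("P1", 42400), ("P1", 42500), ("P2", 42350)],
)


def top_n_per_policy(n):
    """Return the n latest rows per policy using a correlated COUNT(*)."""
    return conn.execute(
        """
        SELECT p.pol_no, p.eff_date
        FROM prop p
        WHERE (
            SELECT COUNT(*)
            FROM prop p2
            WHERE p2.pol_no = p.pol_no
              AND p2.eff_date >= p.eff_date  -- rows at or after this one
        ) <= ?
        ORDER BY p.pol_no, p.eff_date DESC
        """,
        (n,),
    ).fetchall()


latest = top_n_per_policy(1)
latest_two = top_n_per_policy(2)
print(latest)
print(latest_two)
```

This form needs no window functions, but the correlated subquery is evaluated per row, so it can be slow on large tables without a suitable index.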

Not tested
with temp1 as
(
select foo
from bar
where xy = (select MAX(xy) from bar)
)
select PROP.FileName,PROP.InsName, PROP.Status,
PROP.FileTime, PROP.SubmissionNo, PROP.PolNo,
PROP.EffDate,PROP.ExpDate, PROP.Region,
PROP.Underwriter, PROP_DATA.Data , PROP_DATA.Label
from Actuarial.dbo.PROP
inner join temp1 t
on Actuarial.dbo.PROP.FileID = t.dbo.PROP_DATA.FileID
ORDER BY PROP.EffDate DESC

How do I combine subquery rows into one column in Oracle?

I'm working with FileNet. I'm trying to get the folders that a document may be filed in to appear in one column of the record set delimited with semicolons. This was the layout previously decided on and I am tasked with making Oracle do it. Here's what I have for a query so far:
SELECT d1.F_DOCNUMBER,
d1.F_DOCCLASSNUMBER,
d1.F_ENTRYDATE,
d1.F_ARCHIVEDATE,
d1.F_RETENTBASE,
d1.F_RETENTDISP,
d1.F_RETENTOFFSET,
d1.F_PAGES,
d1.F_DOCTYPE,
d1.F_DOCFORMAT,
d1.A32 AS CERT_NUM,
d1.A35 AS DOC_TYPE,
d1.A36 AS BATCH_KEY,
d1.A37 AS FIELD_REP_CODE,
d1.A38 AS EFFECTIVE_DATE,
d1.A39 AS VOUCH_NUM_HIGH,
d1.A40 AS VOUCH_NUM_LOW,
f1.Folders
FROM doctaba d1
LEFT JOIN (SELECT SUBSTR (SYS_CONNECT_BY_PATH (F_FOLDERNAME , ';'), 2) Folders
FROM (SELECT fc2.F_DOCNUMBER, f2.F_FOLDERNAME, ROW_NUMBER () OVER (ORDER BY f2.F_FOLDERNAME) rn, COUNT (*) OVER () cnt
FROM folder_contents fc2
INNER JOIN folder f2
ON f2.F_FOLDERNUMBER = fc2.F_FOLDERNUMBER
WHERE fc2.F_DOCNUMBER = d1.F_DOCNUMBER)
WHERE rn = cnt
START WITH rn = 1
CONNECT BY rn = PRIOR rn + 1) f1
ON d1.F_DOCNUMBER = f1.F_DOCNUMBER
WHERE d1.F_DOCTYPE IS NULL
AND d1.F_DOCNUMBER >= 107777
AND d1.F_DOCNUMBER <= 305791
ORDER BY d1.F_DOCNUMBER;
The problem is that d1.F_DOCNUMBER is being marked as an invalid identifier. I read on some forums that Oracle may not let that column identifier work multiple query levels down. Anyone have some suggestions on how to make this work? Thanks!
EDIT:
Here's my original query that just includes the folder values in rows.
SELECT doctaba.F_DOCNUMBER,
doctaba.F_DOCCLASSNUMBER,
doctaba.F_ENTRYDATE,
doctaba.F_ARCHIVEDATE,
doctaba.F_RETENTBASE,
doctaba.F_RETENTDISP,
doctaba.F_RETENTOFFSET,
doctaba.F_PAGES,
doctaba.F_DOCTYPE,
doctaba.F_DOCFORMAT,
doctaba.A32 AS CERT_NUM,
doctaba.A35 AS DOC_TYPE,
doctaba.A36 AS BATCH_KEY,
doctaba.A37 AS FIELD_REP_CODE,
doctaba.A38 AS EFFECTIVE_DATE,
doctaba.A39 AS VOUCH_NUM_HIGH,
doctaba.A40 AS VOUCH_NUM_LOW,
folder.F_FOLDERNAME
FROM doctaba
LEFT JOIN folder_contents
ON doctaba.F_DOCNUMBER = folder_contents.F_DOCNUMBER
INNER JOIN folder
ON folder.F_FOLDERNUMBER = folder_contents.F_FOLDERNUMBER
WHERE doctaba.F_DOCTYPE IS NULL
AND doctaba.F_DOCNUMBER >= 107777
AND doctaba.F_DOCNUMBER <= 17208174
ORDER BY doctaba.F_DOCNUMBER;

In this case, you are lucky. You are only getting one value from the subquery, so you can just make it a correlated subquery in the select clause:
SELECT . . .
(SELECT SUBSTR(SYS_CONNECT_BY_PATH (F_FOLDERNAME , ';'), 2) as Folders
FROM (SELECT fc2.F_DOCNUMBER, f2.F_FOLDERNAME,
ROW_NUMBER () OVER (ORDER BY f2.F_FOLDERNAME) rn,
COUNT (*) OVER () cnt
FROM folder_contents fc2 INNER JOIN
folder f2
ON f2.F_FOLDERNUMBER = fc2.F_FOLDERNUMBER
WHERE fc2.F_DOCNUMBER = d1.F_DOCNUMBER
)
WHERE rn = cnt
START WITH rn = 1
CONNECT BY rn = PRIOR rn + 1
) as Folders
FROM doctaba d1
WHERE d1.F_DOCTYPE IS NULL AND
d1.F_DOCNUMBER >= 107777 AND
d1.F_DOCNUMBER <= 305791
ORDER BY d1.F_DOCNUMBER;
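The shape of this approach, a correlated scalar subquery in the select list that collapses several folder rows into one delimited value, can be tried out in SQLite via Python's sqlite3 module. SQLite has no SYS_CONNECT_BY_PATH, so this sketch swaps in SQLite's group_concat aggregate instead. The table contents are invented, and the order of names inside the concatenated string is not guaranteed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature versions of the FileNet tables from the question.
conn.executescript("""
    CREATE TABLE doctaba (f_docnumber INTEGER);
    CREATE TABLE folder (f_foldernumber INTEGER, f_foldername TEXT);
    CREATE TABLE folder_contents (f_docnumber INTEGER, f_foldernumber INTEGER);
    INSERT INTO doctaba VALUES (1), (2), (3);
    INSERT INTO folder VALUES (10, 'Claims'), (11, 'Archive'), (12, 'Audit');
    INSERT INTO folder_contents VALUES (1, 10), (1, 11), (3, 12);
""")

# The scalar subquery is correlated on d.f_docnumber one level down,
# and returns a single semicolon-delimited string (or NULL) per document.
rows = conn.execute("""
    SELECT d.f_docnumber,
           (SELECT group_concat(f.f_foldername, ';')
            FROM folder_contents fc
            INNER JOIN folder f ON f.f_foldernumber = fc.f_foldernumber
            WHERE fc.f_docnumber = d.f_docnumber) AS folders
    FROM doctaba d
    ORDER BY d.f_docnumber
""").fetchall()

# Split and sort the names so the result does not depend on concatenation order.
folders_by_doc = {
    doc: sorted(folders.split(";")) if folders else []
    for doc, folders in rows
}
print(folders_by_doc)
```

Documents that are not filed in any folder still appear, with a NULL folder list, which mirrors what the LEFT JOIN in the question was aiming for.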

SQL ROW_NUMBER with INNER JOIN

I need to use ROW_NUMBER() in the following Query to return rows 5 to 10 of the result. Can anyone please show me what I need to do? I've been trying to no avail. If anyone can help I'd really appreciate it.
SELECT *
FROM villa_data
INNER JOIN villa_prices
ON villa_prices.starRating = villa_data.starRating
WHERE villa_data.capacity >= 3
AND villa_data.bedrooms >= 1
AND villa_prices.period = 'lowSeason'
ORDER BY villa_prices.price,
villa_data.bedrooms,
villa_data.capacity

You need to put it in a table expression to filter on ROW_NUMBER. You won't be able to use *, because it will complain about the column name starRating appearing more than once, so you will need to list the required columns explicitly. This is better practice anyway.
WITH CTE AS
(
SELECT /*TODO: List column names*/
ROW_NUMBER()
OVER (ORDER BY villa_prices.price,
villa_data.bedrooms,
villa_data.capacity) AS RN
FROM villa_data
INNER JOIN villa_prices
ON villa_prices.starRating = villa_data.starRating
WHERE villa_data.capacity >= 3
AND villa_data.bedrooms >= 1
AND villa_prices.period = 'lowSeason'
)
SELECT /*TODO: List column names*/
FROM CTE
WHERE RN BETWEEN 5 AND 10
ORDER BY RN

You can use a WITH clause. Please try the following:
WITH t AS
(
SELECT villa_data.starRating,
villa_data.capacity,
villa_data.bedrooms,
villa_prices.period,
villa_prices.price,
ROW_NUMBER() OVER (ORDER BY villa_prices.price,
villa_data.bedrooms,
villa_data.capacity ) AS 'RowNumber'
FROM villa_data
INNER JOIN villa_prices
ON villa_prices.starRating = villa_data.starRating
WHERE villa_data.capacity >= 3
AND villa_data.bedrooms >= 1
AND villa_prices.period = 'lowSeason'
)
SELECT *
FROM t
WHERE RowNumber BETWEEN 5 AND 10;
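To try the paging pattern end to end, here is a sketch using Python's sqlite3 module (SQLite 3.25+ assumed). The twelve villas, their names, and the prices are generated purely for illustration. The column names otherwise follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE villa_data (name TEXT, starRating INTEGER, capacity INTEGER, bedrooms INTEGER)"
)
conn.execute(
    "CREATE TABLE villa_prices (starRating INTEGER, period TEXT, price INTEGER)"
)
# Hypothetical data: 12 villas across 3 star ratings, with distinct
# (price, bedrooms) pairs so the ordering is fully deterministic.
conn.executemany(
    "INSERT INTO villa_data VALUES (?, ?, ?, ?)",
    [(f"V{i:02d}", i % 3 + 1, 4, i // 3 + 1) for i in range(12)],
)
conn.executemany(
    "INSERT INTO villa_prices VALUES (?, ?, ?)",
    [(star, "lowSeason", star * 100) for star in (1, 2, 3)]
    + [(star, "highSeason", star * 150) for star in (1, 2, 3)],
)

# Number the filtered, joined rows inside a CTE, then filter on that number outside.
page = conn.execute("""
    WITH cte AS (
        SELECT villa_data.name,
               ROW_NUMBER() OVER (ORDER BY villa_prices.price,
                                           villa_data.bedrooms,
                                           villa_data.capacity) AS rn
        FROM villa_data
        INNER JOIN villa_prices
                ON villa_prices.starRating = villa_data.starRating
        WHERE villa_data.capacity >= 3
          AND villa_data.bedrooms >= 1
          AND villa_prices.period = 'lowSeason'
    )
    SELECT name, rn
    FROM cte
    WHERE rn BETWEEN 5 AND 10
    ORDER BY rn
""").fetchall()

print(page)
```

The row number cannot be filtered in the same query level where it is computed, which is why the outer SELECT over the CTE is needed.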

We Keep Coding
