Use of MAX function in SQL query to filter data - sql

The code below joins two tables. I need to extract only the latest date per account, but the tables hold multiple accounts and history records. I wanted to use the MAX function, but I'm not sure how to incorporate it in this case. I am using My SQL server.
I'd appreciate any help!
select
PROP.FileName,PROP.InsName, PROP.Status,
PROP.FileTime, PROP.SubmissionNo, PROP.PolNo,
PROP.EffDate,PROP.ExpDate, PROP.Region,
PROP.Underwriter, PROP_DATA.Data , PROP_DATA.Label
from
Property.dbo.PROP
inner join
Property.dbo.PROP_DATA on Property.dbo.PROP.FileID = Property.dbo.PROP_DATA.FileID
where
(PROP_DATA.Label in ('Occupancy' , 'OccupancyTIV'))
and (PROP.EffDate >= '42278' and PROP.EffDate <= '42643')
and (PROP.Status = 'Bound')
and (Prop.FileTime = Max(Prop.FileTime))
order by
PROP.EffDate DESC

Assuming your DBMS supports windowing functions and the with clause, a max windowing function would work:
with all_data as (
select
PROP.FileName,PROP.InsName, PROP.Status,
PROP.FileTime, PROP.SubmissionNo, PROP.PolNo,
PROP.EffDate,PROP.ExpDate, PROP.Region,
PROP.Underwriter, PROP_DATA.Data , PROP_DATA.Label,
max (PROP.EffDate) over (partition by PROP.PolNo) as max_date
from Actuarial.dbo.PROP
inner join Actuarial.dbo.PROP_DATA
on Actuarial.dbo.PROP.FileID = Actuarial.dbo.PROP_DATA.FileID
where (PROP_DATA.Label in ('Occupancy' , 'OccupancyTIV'))
and (PROP.EffDate >= '42278' and PROP.EffDate <= '42643')
and (PROP.Status = 'Bound')
)
select
FileName, InsName, Status, FileTime, SubmissionNo,
PolNo, EffDate, ExpDate, Region, UnderWriter, Data, Label
from all_data
where EffDate = max_date
ORDER BY EffDate DESC
This also presupposes that any given account would not have two records on the same EffDate. If that can happen, and there is no other objective means to determine the latest record, you could use row_number to pick a somewhat arbitrary record in the case of a tie.
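As a rough illustration of the tie issue, here is a minimal sketch using Python's built-in sqlite3 module (assuming SQLite 3.25 or later for window functions). The `prop` table and its four rows are made up for the example and only mimic a few columns of PROP; using FileTime as the tie-breaker is likewise an assumption.

```python
import sqlite3

# Hypothetical miniature of the PROP table; only the columns needed here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prop (FileID INTEGER, PolNo TEXT, EffDate INTEGER, FileTime INTEGER);
    INSERT INTO prop VALUES
        (1, 'A', 42300, 10),
        (2, 'A', 42400, 20),
        (3, 'A', 42400, 30),  -- same EffDate as FileID 2: a tie
        (4, 'B', 42500, 40);
""")

# MAX() OVER keeps every row that ties for the latest EffDate.
max_rows = conn.execute("""
    WITH all_data AS (
        SELECT FileID, PolNo, EffDate,
               MAX(EffDate) OVER (PARTITION BY PolNo) AS max_date
        FROM prop
    )
    SELECT FileID FROM all_data WHERE EffDate = max_date ORDER BY FileID
""").fetchall()

# ROW_NUMBER() keeps exactly one row per PolNo; FileTime breaks the tie.
rn_rows = conn.execute("""
    WITH ranked AS (
        SELECT FileID, PolNo,
               ROW_NUMBER() OVER (PARTITION BY PolNo
                                  ORDER BY EffDate DESC, FileTime DESC) AS rn
        FROM prop
    )
    SELECT FileID FROM ranked WHERE rn = 1 ORDER BY FileID
""").fetchall()

print(max_rows)  # policy A contributes two rows because of the tie
print(rn_rows)   # one row per policy
```

The same two queries should carry over to any DBMS with window functions; only the Python wrapper is specific to this sketch.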

Using straight SQL, you can use a self-join in a subquery in your where clause to eliminate values smaller than the max, or smaller than the top n largest, and so on. Just set the number in <= 1 to the number of top values you want per group.
Something like the following might do the trick, for example:
select
p.FileName
, p.InsName
, p.Status
, p.FileTime
, p.SubmissionNo
, p.PolNo
, p.EffDate
, p.ExpDate
, p.Region
, p.Underwriter
, pd.Data
, pd.Label
from Actuarial.dbo.PROP p
inner join Actuarial.dbo.PROP_DATA pd
on p.FileID = pd.FileID
where (
select count(*)
from Actuarial.dbo.PROP p2
where p2.FileID = p.FileID
and p2.EffDate >= p.EffDate -- count rows at least as recent, so <= 1 keeps the latest
) <= 1
and (
pd.Label in ('Occupancy' , 'OccupancyTIV')
and p.Status = 'Bound'
)
ORDER BY p.EffDate DESC
Have a look at this stackoverflow question for a full working example.
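For what it's worth, here is a small self-contained sketch of the counting technique, run through Python's sqlite3 module. The `prop` table, its rows, and the choice of PolNo as the grouping column are assumptions made for the illustration. To keep the latest rows, the inner query counts the rows in the same group that are at least as recent as the current one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prop (FileID INTEGER, PolNo TEXT, EffDate INTEGER);
    INSERT INTO prop VALUES
        (1, 'A', 42300), (2, 'A', 42400), (3, 'A', 42500),
        (4, 'B', 42350), (5, 'B', 42600);
""")

def top_n_per_policy(n):
    # Count the rows in the same group whose EffDate is >= this row's;
    # the row is among the n latest when that count is <= n.
    return conn.execute("""
        SELECT p.FileID
        FROM prop p
        WHERE (
            SELECT COUNT(*)
            FROM prop p2
            WHERE p2.PolNo = p.PolNo
              AND p2.EffDate >= p.EffDate
        ) <= ?
        ORDER BY p.FileID
    """, (n,)).fetchall()

latest = top_n_per_policy(1)      # newest row per policy
latest_two = top_n_per_policy(2)  # two newest rows per policy
print(latest, latest_two)
```

If two rows in a group share the same EffDate, both are counted against each other, so a tie at the top can drop the whole group when n is 1; a unique tie-breaker column would be needed in that case.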

Not tested
with temp1 as
(
-- foo, bar and xy are placeholders for the key column, table and date column
select foo
from bar
where xy = (select max(xy) from bar)
)
select PROP.FileName,PROP.InsName, PROP.Status,
PROP.FileTime, PROP.SubmissionNo, PROP.PolNo,
PROP.EffDate,PROP.ExpDate, PROP.Region,
PROP.Underwriter, PROP_DATA.Data , PROP_DATA.Label
from Actuarial.dbo.PROP
inner join Actuarial.dbo.PROP_DATA
on Actuarial.dbo.PROP.FileID = Actuarial.dbo.PROP_DATA.FileID
inner join temp1 t
on Actuarial.dbo.PROP.FileID = t.foo
ORDER BY PROP.EffDate DESC

Related

Update failing while using dense_rank and row_number

Here is the sample data of the two tables, put together for easy reference.
In [Outbound].[dbo].[Encounter_Out_P] (the upper part of the table), the columns "277CA_FILENAME", "277CA_FILENAME2", "277CA_FILENAME3" and "277CA_FILENAME4" are NULL, and the rows are sorted by File_Submitted_DT in ascending order. I want to update those columns with the "277FileId" values from [Outbound].[dbo].[Encounter_Out_277_P] (the lower table), which are sorted by EDIFECSProcessDate in ascending order. Thanks in advance.
Here is my code
WITH
cte_2771 AS (
SELECT
"277CA_FILENAME",
File_Submitted_DT,
TRN02_PatientControlNumber
FROM (
SELECT
I."277CA_FILENAME",
I.File_Submitted_DT,
#cte_277.TRN02_PatientControlNumber
,dense_rank() OVER(PARTITION BY #cte_277.TRN02_PatientControlNumber ORDER BY ABS(DATEDIFF(MINUTE, i.File_Submitted_DT, #cte_277.EDIFECSProcessDate)) ASC) rw1
FROM
[Outbound].[dbo].[Encounter_Out_P] I
INNER JOIN #cte_277 ON I.EncounterID = #cte_277.TRN02_PatientControlNumber
--WHERE EncounterID = 'AP230120920712808806'
)t
WHERE
t.rw1 = 1
)
,
cte_2772 AS (
SELECT
"277FileId"
,1 + ((ROW_NUMBER() OVER(PARTITION BY TRN02_PatientControlNumber ORDER BY EDIFECSProcessDate,File_Submitted_DT ASC ) - 1) % 4)rw2
,TRN02_PatientControlNumber
FROM (
SELECT DISTINCT
p."277FileId",
p.EDIFECSProcessDate,
p.TRN02_PatientControlNumber
,p.ID
,cte_2771.File_Submitted_DT
FROM [Outbound].[dbo].[Encounter_Out_277_P] p
INNER JOIN cte_2771 ON cte_2771.File_Submitted_DT < p.EDIFECSProcessDate
WHERE
p.TRN02_PatientControlNumber = cte_2771.TRN02_PatientControlNumber
) t
)
UPDATE cte_2771
SET "277CA_FILENAME" =
COALESCE(cte_2771."277CA_FILENAME", cte_2772."277FileId" )
FROM cte_2771 INNER JOIN cte_2772
ON cte_2772.TRN02_PatientControlNumber = cte_2771.TRN02_PatientControlNumber
WHERE cte_2772.rw2 = 1
I want the output to look like the upper part below (put together for easy reference).
Notes
I have posted the code for "277CA_FILENAME" only; the code for the other three columns is the same except that the WHERE condition changes to cte_2772.rw2 = 2, 3 or 4 respectively.
If I uncomment the line --WHERE EncounterID = 'AP230120920712808806' in cte_2771, it works perfectly, but if I leave it commented out and run the entire load, one row is updated correctly and the other one gets NULL.

SQL - ROW_NUMBER that is used in a multi-condition LEFT JOIN

Two tables store different properties for each product: CTI_ROUTING_VIEW and ORD_MACH_OPS
They are both organized by SPEC_NO > MACH_SEQ_NO, but the format of the sequence number is different in each table, so it can't be used for a JOIN. ORD_MACH_OPS has MACHINE and PASS_NO, meaning that if a product goes through the same machine twice, the row with the higher SEQ_NO will be PASS_NO 2, 3, etc. CTI_ROUTING_VIEW does not offer PASS_NO, but I can achieve the desired result with:
SELECT TOP (1000) [SPEC_NO]
,[SPEC_PART_NO]
,[MACH_NO]
,[MACH_SEQ_NO]
,[BLANK_WID]
,[BLANK_LEN]
,[NO_OUT_WID]
,[NO_OUT_LEN]
,[SU_MINUTES]
,[RUN_SPEED]
,[NO_COLORS]
,[PRINTDIEID]
,[CUTDIEID]
,ROW_NUMBER() OVER (PARTITION BY MACH_NO ORDER BY MACH_SEQ_NO) as PASS_NO
FROM [CREATIVE].[dbo].[CTI_ROUTING_VIEW]
I would think that I could use this artificial PASS_NO as a JOIN condition, but I can't seem to get it to come through. This is my first time using ROW_NUMBER() so I'm just wondering if I'm doing something wrong in the JOIN syntax.
SELECT rOrd.[SPEC_NO]
,rOrd.[MACH_SEQ_NO]
,rOrd.[WAS_REROUTED]
,rOrd.[NO_OUT]
,rOrd.[PART_COMP_FLG]
,rOrd.[SCHED_START]
,rOrd.[SCHED_STOP]
,rOrd.[MACH_REROUTE_FLG]
,rOrd.[MACH_DESCR]
,rOrd.REPLACED_MACH_NO
,rOrd.MACH_NO
,rOrd.PASS_NO
,rWip.MAX_TRX_DATETIME
,ISNULL(rWip.NET_FG_SUM*rOrd.NO_OUT,0) as NET_FG_SUM
,CASE
WHEN rCti.BLANK_WID IS NULL then 'N//A'
ELSE CONCAT(rCti.BLANK_WID, ' X ', rCti.BLANK_LEN)
END AS SIZE
,ISNULL(rCti.PRINTDIEID,'N//A') as PRINTDIEID
,ISNULL(rCti.CUTDIEID, 'N//A') as CUTDIEID
,rStyle.DESCR as STYLE
,ISNULL(rCti.NO_COLORS, 0) as NO_COLORS
,CAST(CONCAT(rOrd.ORDER_NO,'-',rOrd.ORDER_PART_NO) as varchar) as ORD_MACH_KEY
FROM [CREATIVE].[dbo].[ORD_MACH_OPS] as rOrd
LEFT JOIN (SELECT DISTINCT
[SPEC_NO]
,[SPEC_PART_NO]
,[MACH_NO]
,MACH_SEQ_NO
,[BLANK_WID]
,[BLANK_LEN]
,[NO_COLORS]
,[PRINTDIEID]
,[CUTDIEID]
,ROW_NUMBER() OVER (PARTITION BY MACH_NO ORDER BY MACH_SEQ_NO) as PASS_NO
FROM [CREATIVE].[dbo].[CTI_ROUTING_VIEW]) as rCti
ON rCti.SPEC_NO = rOrd.SPEC_NO
and rCti.MACH_NO =
CASE
WHEN rOrd.REPLACED_MACH_NO is null then rOrd.MACH_NO
ELSE rOrd.REPLACED_MACH_NO
END
and rCti.PASS_NO = rOrd.PASS_NO
LEFT JOIN INVENTORY_ITEM_TAB as rTab
ON rTab.SPEC_NO = rOrd.SPEC_NO
LEFT JOIN STYLE_DESCRIPTION as rStyle
ON rStyle.DESCR_CD = rTab.STYLE_CD
LEFT JOIN (
SELECT
JOB_NUMBER
,FORM_NO
,TRX_ORIG_MACH_NO
,PASS_NO
,SUM(GROSS_FG_QTY-WASTE_QTY) as NET_FG_SUM
,MAX(TRX_DATETIME) as MAX_TRX_DATETIME
FROM WIP_MACH_OPS
WHERE GROSS_FG_QTY <> 0
GROUP BY JOB_NUMBER, FORM_NO, TRX_ORIG_MACH_NO, PASS_NO) as rWip
ON rWip.JOB_NUMBER = rOrd.ORDER_NO
and rWip.FORM_NO = rOrd.ORDER_PART_NO
and rWip.TRX_ORIG_MACH_NO = rOrd.MACH_NO
and rWip.PASS_NO = rOrd.PASS_NO
WHERE rOrd.SCHED_START > DATEADD(DAY, -20, GETDATE())
I fixed it by adding SPEC_NO as a second partition column, so that the pass number restarts for each spec:
ROW_NUMBER() OVER (PARTITION BY SPEC_NO, MACH_NO ORDER BY MACH_SEQ_NO) as PASS_NO
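To show why the extra partition column matters, here is a minimal sketch using Python's sqlite3 module (assuming SQLite 3.25 or later). The `routing` table and its rows are invented for the illustration; only the column names are taken from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE routing (SPEC_NO TEXT, MACH_NO INTEGER, MACH_SEQ_NO INTEGER);
    INSERT INTO routing VALUES
        ('S1', 100, 10),
        ('S1', 200, 20),
        ('S1', 100, 30),  -- second pass of S1 through machine 100
        ('S2', 100, 15);  -- first pass of S2 through machine 100
""")

query = """
    SELECT SPEC_NO, MACH_NO, MACH_SEQ_NO,
           ROW_NUMBER() OVER (PARTITION BY {cols} ORDER BY MACH_SEQ_NO) AS PASS_NO
    FROM routing
    ORDER BY SPEC_NO, MACH_SEQ_NO
"""

# Partitioning by machine alone numbers the passes across all specs.
by_machine = conn.execute(query.format(cols="MACH_NO")).fetchall()

# Partitioning by spec and machine restarts the numbering for each spec.
by_spec_and_machine = conn.execute(query.format(cols="SPEC_NO, MACH_NO")).fetchall()

for row in by_machine:
    print("MACH_NO only:     ", row)
for row in by_spec_and_machine:
    print("SPEC_NO + MACH_NO:", row)
```

With the single-column partition, S2's only visit to machine 100 gets PASS_NO 2 and S1's second visit gets PASS_NO 3, so neither matches the PASS_NO stored in ORD_MACH_OPS and the LEFT JOIN finds nothing for those rows.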

How to select a single row for each unique ID

SQL novice here learning on the job, still a greenhorn. I have a problem I don't know how to overcome. Using IBM Netezza and Aginity Workbench.
My current query tries to return one row per case number based on when a task was created, keeping only the row with the newest task. This gets me about 85% of the way there. The issue is that sometimes multiple tasks were created on the same day.
I would like to incorporate Task Followup Date to keep only the newest row when there are multiple rows with the same Case Number. I posted an example of what my current code outputs and what I would like it to output.
Current code
SELECT
A.PS_CASE_ID AS Case_Number
,D.CASE_TASK_TYPE_NM AS Task
,C.TASK_CRTE_TMS
,C.TASK_FLWUP_DT AS Task_Followup_Date
FROM VW_CC_CASE A
INNER JOIN VW_CASE_TASK C ON (A.CASE_ID = C.CASE_ID)
INNER JOIN VW_CASE_TASK_TYPE D ON (C.CASE_TASK_TYPE_ID = D.CASE_TASK_TYPE_ID)
INNER JOIN ADMIN.VW_RSN_CTGY B ON (A.RSN_CTGY_ID = B.RSN_CTGY_ID)
WHERE
(A.PS_Z_SPSR_ID LIKE '%EFT' OR A.PS_Z_SPSR_ID LIKE '%CRDT')
AND CAST(A.CASE_CRTE_TMS AS DATE) >= '2020-01-01'
AND B.RSN_CTGY_NM = 'Chargeback Initiation'
AND CAST(C.TASK_CRTE_TMS AS DATE) = (SELECT MAX(CAST(C2.TASK_CRTE_TMS AS DATE)) from VW_CASE_TASK C2 WHERE C2.CASE_ID = C.CASE_ID)
GROUP BY
A.PS_CASE_ID
,D.CASE_TASK_TYPE_NM
,C.TASK_CRTE_TMS
,C.TASK_FLWUP_DT
Current output
Desired output
You could use ROW_NUMBER here:
WITH cte AS (
SELECT DISTINCT A.PS_CASE_ID AS Case_Number, D.CASE_TASK_TYPE_NM AS Task,
C.TASK_CRTE_TMS, C.TASK_FLWUP_DT AS Task_Followup_Date,
ROW_NUMBER() OVER (PARTITION BY A.PS_CASE_ID ORDER BY C.TASK_FLWUP_DT DESC) rn
FROM VW_CC_CASE A
INNER JOIN VW_CASE_TASK C ON A.CASE_ID = C.CASE_ID
INNER JOIN VW_CASE_TASK_TYPE D ON C.CASE_TASK_TYPE_ID = D.CASE_TASK_TYPE_ID
INNER JOIN ADMIN.VW_RSN_CTGY B ON A.RSN_CTGY_ID = B.RSN_CTGY_ID
WHERE (A.PS_Z_SPSR_ID LIKE '%EFT' OR A.PS_Z_SPSR_ID LIKE '%CRDT') AND
CAST(A.CASE_CRTE_TMS AS DATE) >= '2020-01-01' AND
B.RSN_CTGY_NM = 'Chargeback Initiation' AND
CAST(C.TASK_CRTE_TMS AS DATE) = (SELECT MAX(CAST(C2.TASK_CRTE_TMS AS DATE))
FROM VW_CASE_TASK C2
WHERE C2.CASE_ID = C.CASE_ID)
)
SELECT
Case_Number,
Task,
TASK_CRTE_TMS,
Task_Followup_Date
FROM cte
WHERE rn = 1;
One method uses window functions:
with cte as (
< your query here >
)
select x.*
from (select cte.*,
row_number() over (partition by case_number
order by Task_Followup_Date desc, TASK_CRTE_TMS desc
) as seqnum
from cte
) x
where seqnum = 1;
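As a quick way to see how ROW_NUMBER resolves same-day ties, here is a minimal sketch using Python's sqlite3 module (assuming SQLite 3.25 or later). The `tasks` table, its column names and its rows are hypothetical stand-ins for the joined views in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tasks (case_number TEXT, task TEXT, created TEXT, followup TEXT);
    INSERT INTO tasks VALUES
        ('C1', 'Review',  '2020-03-01', '2020-03-10'),
        ('C1', 'Dispute', '2020-03-05', '2020-03-12'),
        ('C1', 'Refund',  '2020-03-05', '2020-03-20'),  -- same create day, later follow-up
        ('C2', 'Review',  '2020-04-01', '2020-04-08');
""")

# One row per case: newest create date first, follow-up date breaks ties.
rows = conn.execute("""
    WITH ranked AS (
        SELECT case_number, task,
               ROW_NUMBER() OVER (
                   PARTITION BY case_number
                   ORDER BY created DESC, followup DESC
               ) AS rn
        FROM tasks
    )
    SELECT case_number, task FROM ranked WHERE rn = 1 ORDER BY case_number
""").fetchall()

print(rows)
```

Because the ordering lives entirely inside the window, this approach needs neither the correlated MAX subquery nor the GROUP BY to get down to one row per case.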

MAX NOT WORKING IN SQL QUERY

I want the following query to retrieve only the latest record,
but MAX is not working in it: all the rows are retrieved instead of just the latest one.
SELECT SV.SEGMENT1 TARGETED_INCENTIVE,
SIT.ANALYSIS_CRITERIA_ID,
SIT.OBJECT_VERSION_NUMBER OBJECT_VERSION_NUMBER,
ST.ID_FLEX_NUM,
SIT.DATE_FROM,
SIT.DATE_TO,
MAX (SIT.PERSON_ANALYSIS_ID)
FROM FND_ID_FLEX_STRUCTURES_TL STTL,
FND_ID_FLEX_STRUCTURES ST,
PER_PERSON_ANALYSES SIT,
PER_ANALYSIS_CRITERIA SV
WHERE 1 = 1
AND (STTL.ID_FLEX_STRUCTURE_NAME) LIKE
('%%Tare%')
AND STTL.LANGUAGE = USERENV ('LANG')
AND ST.ID_FLEX_CODE = STTL.ID_FLEX_CODE
AND ST.ID_FLEX_NUM = STTL.ID_FLEX_NUM
AND ST.ID_FLEX_NUM = SIT.ID_FLEX_NUM
AND ST.ID_FLEX_NUM = SV.ID_FLEX_NUM
AND TO_DATE (SIT.DATE_TO) IS NULL
AND SIT.ANALYSIS_CRITERIA_ID = SV.ANALYSIS_CRITERIA_ID
AND SIT.PERSON_ID = (SELECT PERSON_ID
FROM abc
WHERE ID = :AIN)
GROUP BY SV.SEGMENT1,
SIT.ANALYSIS_CRITERIA_ID,
STTL.ID_FLEX_STRUCTURE_NAME,
SIT.OBJECT_VERSION_NUMBER,
ST.ID_FLEX_NUM,
SIT.DATE_FROM,
SIT.DATE_TO;
Can anyone guide me?
I'm afraid that's not what MAX() does. MAX() is an aggregate function (though it can be used as a window [analytic] function), so when you get the MAX() of a particular column grouped by other columns, you will get distinct combinations of values for all those other columns.
I think you might want something like this:
SELECT targeted_incentive, analysis_criteria_id
, object_version_number, id_flex_num, date_from
, date_to, person_analysis_id
FROM (
SELECT sv.segment1 AS targeted_incentive
, sit.analysis_criteria_id
, sit.object_version_number
, st.id_flex_num
, sit.date_from
, sit.date_to
, sit.person_analysis_id
, RANK() OVER ( ORDER BY sit.person_analysis_id DESC ) rn
FROM fnd_id_flex_structures_tl sttl
, fnd_id_flex_structures st
, per_person_analyses sit
, per_analysis_criteria sv
WHERE sttl.id_flex_structure_name LIKE '%Tare%'
AND sttl.language = USERENV('LANG')
AND st.id_flex_code = sttl.id_flex_code
AND st.id_flex_num = sttl.id_flex_num
AND st.id_flex_num = sit.id_flex_num
AND st.id_flex_num = sv.id_flex_num
AND sit.date_to IS NULL
AND sit.analysis_criteria_id = sv.analysis_criteria_id
AND sit.person_id = ( SELECT person_id FROM abc
WHERE id = :AIN )
) WHERE rn = 1;
The RANK() window function will return the rank of each row ordered by the value of person_analysis_id in descending order. To get the maximum value, simply filter for rank = 1. Note that this will return more than one row in case of ties. If you want only one row, use ROW_NUMBER() in place of RANK().
Also note that I cleaned up the query a bit. You certainly don't need to use two % wildcards in a row in a LIKE, for example. You also definitely don't need the WHERE 1=1 condition.
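To make the difference concrete, here is a small sketch using Python's sqlite3 module (assuming SQLite 3.25 or later) rather than Oracle. The `analyses` table and its three rows are made up; only the column name person_analysis_id comes from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE analyses (person_analysis_id INTEGER, segment TEXT);
    INSERT INTO analyses VALUES (1, 'old'), (7, 'tie-a'), (7, 'tie-b');
""")

# MAX() with GROUP BY: one row per distinct segment, not one row overall.
grouped = conn.execute("""
    SELECT segment, MAX(person_analysis_id)
    FROM analyses GROUP BY segment ORDER BY segment
""").fetchall()

def top_rows(func):
    # func is either RANK or ROW_NUMBER; both are standard window functions.
    return conn.execute(f"""
        SELECT segment FROM (
            SELECT segment,
                   {func}() OVER (ORDER BY person_analysis_id DESC) AS rn
            FROM analyses
        ) WHERE rn = 1
    """).fetchall()

with_rank = top_rows("RANK")              # both tied rows come back
with_row_number = top_rows("ROW_NUMBER")  # exactly one row, chosen arbitrarily

print(grouped)
print(with_rank, with_row_number)
```

Which of the tied rows ROW_NUMBER keeps is not defined unless the ORDER BY includes a further tie-breaking column.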

How Do I Group Rows Together In A Query?

I am trying to distinguish the physical servers' uptime from that of the virtual ones by looking at the OS. I am able to pull out the result for the VMWare OS; however, I'd like to group all the physical servers into one row.
Here is the code I have so far:
SELECT TOP (100) PERCENT Avg(dbo.tblserveruptime.uptime) AS Uptime,
Count(*) AS Total
FROM dbo.server
INNER JOIN dbo.tblserveruptime
ON dbo.server.name = dbo.tblserveruptime.name
WHERE ( dbo.server.status = N'production' )
AND ( dbo.server.server_env = N'prod' )
AND ( dbo.server.os_type <> N'vmware' )
GROUP BY dbo.tblserveruptime.month,
dbo.tblserveruptime.year
HAVING ( dbo.tblserveruptime.month = 4 )
AND ( dbo.tblserveruptime.year = 2013 )
UNION
SELECT TOP (100) PERCENT Avg(dbo.tblserveruptime.uptime) AS Uptime,
Count(*) AS Total
FROM dbo.server
INNER JOIN dbo.tblserveruptime
ON dbo.server.name = dbo.tblserveruptime.name
WHERE ( dbo.server.status = N'production' )
AND ( dbo.server.server_env = N'prod' )
AND ( dbo.server.os_type = N'vmware' )
GROUP BY dbo.tblserveruptime.month,
dbo.tblserveruptime.year
HAVING ( dbo.tblserveruptime.month = 4 )
AND ( dbo.tblserveruptime.year = 2013 )
Whatever field it is that gives you the physical server name, add that field to your GROUP BY statement and put it first. It will then group by Server, Month and Year (as you currently have it). My guess is you might want to swap Year and Month as it makes more sense to do it that way than by Month and then Year.
Okay, thanks to all who tried to help. I tried putting images of my before and after results, but I couldn't since I don't have 10 reputation points. Anyways, I think I got it. Here it is:
SELECT AVG(dbo.tblServerUptime.Uptime) AS Uptime,
CASE WHEN OS_TYPE = 'VMWare' THEN 'Virtual' ELSE 'Physical' END AS [Physical vs VM]
FROM dbo.Server INNER JOIN
dbo.tblServerUptime ON dbo.Server.NAME = dbo.tblServerUptime.NAME
GROUP BY CASE WHEN OS_TYPE = 'VMWare' THEN 'Virtual' ELSE 'Physical' END
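For anyone wanting to try the CASE-based grouping without a SQL Server instance, here is a minimal sketch using Python's sqlite3 module. The `server` and `uptime` tables and their rows are invented, and the COUNT(*) column is added only to show how many servers land in each bucket.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE server (name TEXT, os_type TEXT);
    CREATE TABLE uptime (name TEXT, uptime REAL);
    INSERT INTO server VALUES ('a', 'VMWare'), ('b', 'Windows'), ('c', 'Linux');
    INSERT INTO uptime VALUES ('a', 99.0), ('b', 90.0), ('c', 100.0);
""")

# Grouping by the CASE expression collapses every non-VMWare OS into one row.
rows = conn.execute("""
    SELECT CASE WHEN s.os_type = 'VMWare' THEN 'Virtual' ELSE 'Physical' END AS kind,
           AVG(u.uptime) AS avg_uptime,
           COUNT(*) AS total
    FROM server s
    INNER JOIN uptime u ON u.name = s.name
    GROUP BY CASE WHEN s.os_type = 'VMWare' THEN 'Virtual' ELSE 'Physical' END
    ORDER BY kind
""").fetchall()

print(rows)
```

This replaces the two-branch UNION with a single pass over the join, since the CASE expression decides the bucket for each row.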