I'm trying to apply a condition to LAG in a SQL query. Does anyone know how to do this?
This is the query:
SELECT CONCAT([FirstName],' ',[LastName]) AS employee,
CAST([ArrivalTime] AS DATE) AS date,
CAST(DATEADD(hour,2,FORMAT([ArrivalTime],'HH:mm')) AS TIME) as time,
CASE [EventType]
WHEN 20001 THEN 'ENTRY'
ELSE 'EXIT'
END AS Action,
OutTime =
CASE [EventType]
WHEN '20001'
THEN DATEDIFF(minute,Lag([ArrivalTime],1) OVER(ORDER BY [CardHolderID], [ArrivalTime]), [ArrivalTime])
ELSE
NULL
END
FROM [CCFTEvent].[dbo].[ReportEvent]
LEFT JOIN [CCFTCentral].[dbo].[Cardholder] ON [CCFTEvent].[dbo].[ReportEvent].[CardholderID] = [CCFTCentral].[dbo].[Cardholder].[FTItemID]
WHERE EventClass = 41
AND [FirstName] IS NOT NULL
AND [FirstName] LIKE 'Leeann%'
The problem is that the previous row sometimes falls on a different date. When the two times being subtracted are on different dates, OutTime must also be NULL.
The 910 in the output is incorrect.
I'd add another condition to your CASE expression, i.e.
...
CASE
-- previous row falls on an earlier calendar day: no OutTime
WHEN [EventType] = '20001' AND DATEDIFF(DAY, LAG([ArrivalTime]) OVER (ORDER BY [CardHolderID], [ArrivalTime]), [ArrivalTime]) > 0
THEN NULL
WHEN [EventType] = '20001'
THEN DATEDIFF(minute,Lag([ArrivalTime],1) OVER(ORDER BY [CardHolderID], [ArrivalTime]), [ArrivalTime])
ELSE NULL
END
It seems to me that the LAG just needs to be partitioned by the date (& some other fields for good measure).
If the previous date is in another partition,
then the LAG will return NULL,
then the datediff will return NULL.
SELECT
CONCAT(holder.FirstName+' ', holder.LastName) AS employee,
CAST(repev.ArrivalTime AS DATE) AS [date],
CAST(SWITCHOFFSET(repev.ArrivalTime,'+02:00') AS TIME) as [time],
IIF(repev.EventType = 20001, 'ENTRY', 'EXIT') AS Action,
(CASE WHEN repev.EventType = 20001
THEN DATEDIFF(minute, LAG(repev.ArrivalTime)
OVER (PARTITION BY repev.EventClass, repev.CardholderID, CAST(repev.ArrivalTime AS DATE)
ORDER BY repev.ArrivalTime), repev.ArrivalTime)
END) AS OutTime
FROM [CCFTEvent].[dbo].[ReportEvent] AS repev
LEFT JOIN [CCFTCentral].[dbo].[Cardholder] AS holder ON holder.FTItemID = repev.CardholderID
WHERE repev.EventClass = 41
AND holder.FirstName LIKE 'Leeann%'
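To see the effect of partitioning LAG by date in isolation, here is a minimal sketch. It runs on SQLite through Python's sqlite3 module rather than on SQL Server, so the date functions differ (julianday and date instead of DATEDIFF and CAST). The events table and its rows are made up for illustration and are not the asker's data.

```python
import sqlite3

# Hypothetical table and rows, for illustration only.
# Window functions need SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (card_id INTEGER, arrival TEXT, event_type INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1, "2022-05-02 08:00:00", 20001),  # entry
        (1, "2022-05-02 12:00:00", 20002),  # exit
        (1, "2022-05-02 12:30:00", 20001),  # entry, 30 minutes after the exit
        (1, "2022-05-02 17:00:00", 20002),  # exit
        (1, "2022-05-03 08:10:00", 20001),  # first entry of the next day
    ],
)


def out_minutes(partition_by):
    """Minutes since the previous event in the same partition, entry rows only."""
    query = f"""
        SELECT arrival,
               CASE WHEN event_type = 20001 THEN
                   CAST(ROUND((julianday(arrival) - julianday(
                       LAG(arrival) OVER (PARTITION BY {partition_by}
                                          ORDER BY arrival)
                   )) * 1440) AS INTEGER)
               END AS out_minutes
        FROM events
        ORDER BY arrival
    """
    return [minutes for _, minutes in conn.execute(query)]


per_card = out_minutes("card_id")
per_card_and_day = out_minutes("card_id, date(arrival)")
print(per_card)          # [None, None, 30, None, 910]
print(per_card_and_day)  # [None, None, 30, None, None]
```

With the partition on card only, the first entry of the second day is measured against the last exit of the day before. Adding the date to the partition makes LAG return NULL for that row, so the difference is NULL as well.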
I have a simple-sounding requirement that has had me stumped for a day or so now, so it's time to seek help from the experts.
My requirement is simply to roll up multiple rows into a single row based on a break condition: when any of the columns Employee ID, Allowance Plan, Allowance Amount or To Date changes, the row is to be kept, if that makes sense.
An example source data set is shown below:
and the target data after collapsing the rows should look like this:
As you can see, I don't need any running totals calculated; I just need to collapse the rows into a single record per from date/to date combination.
So far I have tried the following SQL using a GROUP BY and MIN function
select [Employee ID], [Allowance Plan],
min([From Date]), max([To Date]), [Allowance Amount]
from [dbo].[#AllowInfo]
group by [Employee ID], [Allowance Plan], [Allowance Amount]
but that just gives me a single row and does not take the break condition into account.
What do I need to do so that the records are rolled up (correct me if that is not the right terminology) correctly, taking the break condition into account?
Any help is appreciated.
Thank you.
Note that your test data does not really exercise the algo that well - e.g. you only have one employee, one plan. Also, as you described it, you would end up with 4 rows as there is a change of todate between 7->8, 8->9, 9->10 and 10->11.
But I can see what you are trying to do, so this should at least get you on the right track, and returns the expected 3 rows. I have taken the end of a group to be where either employee/plan/amount has changed, or where todate is not null (or where we reach the end of the data)
CREATE TABLE #data
(
RowID INT,
EmployeeID INT,
AllowancePlan VARCHAR(30),
FromDate DATE,
ToDate DATE,
AllowanceAmount DECIMAL(12,2)
);
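-- the literals below are day/month/year, so make the session parse them that way
SET DATEFORMAT dmy;
INSERT INTO #data(RowID, EmployeeID, AllowancePlan, FromDate, ToDate, AllowanceAmount)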
VALUES
(1,200690,'CarAllowance','30/03/2017', NULL, 1000.0),
(2,200690,'CarAllowance','01/08/2017', NULL, 1000.0),
(6,200690,'CarAllowance','23/04/2018', NULL, 1000.0),
(7,200690,'CarAllowance','30/03/2018', NULL, 1000.0),
(8,200690,'CarAllowance','21/06/2018', '01/04/2019', 1000.0),
(9,200690,'CarAllowance','04/11/2021', NULL, 1000.0),
(10,200690,'CarAllowance','30/03/2017', '13/05/2022', 1000.0),
(11,200690,'CarAllowance','14/05/2022', NULL, 850.0);
-- find where the break points are
WITH chg AS
(
SELECT *,
CASE WHEN LAG(EmployeeID, 1, -1) OVER(ORDER BY RowID) != EmployeeID
OR LAG(AllowancePlan, 1, 'X') OVER(ORDER BY RowID) != AllowancePlan
OR LAG(AllowanceAmount, 1, -1) OVER(ORDER BY RowID) != AllowanceAmount
OR LAG(ToDate, 1) OVER(ORDER BY RowID) IS NOT NULL
THEN 1 ELSE 0 END AS NewGroup
FROM #data
),
-- count the number of break points as we go to group the related rows
grp AS
(
SELECT chg.*,
ISNULL(
SUM(NewGroup)
OVER (ORDER BY RowID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
0) AS grpNum
FROM chg
)
SELECT MIN(grp.RowID) AS RowID,
MAX(grp.EmployeeID) AS EmployeeID,
MAX(grp.AllowancePlan) AS AllowancePlan,
MIN(grp.FromDate) AS FromDate,
MAX(grp.ToDate) AS ToDate,
MAX(grp.AllowanceAmount) AS AllowanceAmount
FROM grp
GROUP BY grpNum
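The same two steps (flag the break points, then take a running sum of the flags to number the groups) can be tried end to end with the sketch below. It is only an illustration: it runs on SQLite through Python's sqlite3 module instead of SQL Server, and the allowances table, its column names and its six rows are simplified, made-up stand-ins for the test data above.

```python
import sqlite3

# Hypothetical, simplified data. Window functions need SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE allowances (row_id INTEGER, employee_id INTEGER, "
    "allowance_plan TEXT, from_date TEXT, to_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO allowances VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 200690, "CarAllowance", "2017-03-30", None, 1000.0),
        (2, 200690, "CarAllowance", "2017-08-01", None, 1000.0),
        (3, 200690, "CarAllowance", "2018-06-21", "2019-04-01", 1000.0),
        (4, 200690, "CarAllowance", "2021-11-04", None, 1000.0),
        (5, 200690, "CarAllowance", "2022-01-01", "2022-05-13", 1000.0),
        (6, 200690, "CarAllowance", "2022-05-14", None, 850.0),
    ],
)

query = """
WITH chg AS (
    -- 1 where a key column changed or the previous row closed a period
    SELECT *,
           CASE WHEN LAG(employee_id, 1, -1) OVER (ORDER BY row_id) != employee_id
                  OR LAG(allowance_plan, 1, 'X') OVER (ORDER BY row_id) != allowance_plan
                  OR LAG(amount, 1, -1) OVER (ORDER BY row_id) != amount
                  OR LAG(to_date) OVER (ORDER BY row_id) IS NOT NULL
                THEN 1 ELSE 0 END AS new_group
    FROM allowances
),
grp AS (
    -- running count of break points = group number
    SELECT *,
           SUM(new_group) OVER (ORDER BY row_id
                                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp_num
    FROM chg
)
SELECT MIN(row_id), MIN(from_date), MAX(to_date), MAX(amount)
FROM grp
GROUP BY grp_num
ORDER BY grp_num
"""
rolled_up = conn.execute(query).fetchall()
for row in rolled_up:
    print(row)
```

The flags come out as 1, 0, 0, 1, 0, 1 for the six rows, so the running sum numbers the groups 1, 1, 1, 2, 2, 3 and the final GROUP BY returns three rows.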
One way is to give every row that has no ToDate the ToDate of the next row that has one, and then group on that:
select min(t.RowID) as RowID,
t.EmployeeID,
min(t.AllowancePlan) as AllowancePlan,
min(t.FromDate) as FromDate,
max(t.ToDate) as ToDate,
min(t.AllowanceAmount) as AllowanceAmount
from ( select t.RowID,
t.EmployeeID,
t.FromDate,
t.AllowancePlan,
t.AllowanceAmount,
case when t.ToDate is null then ( select top 1 t2.ToDate
from test t2
where t2.EmployeeID = t.EmployeeID
and t2.ToDate is not null
and t2.FromDate > t.FromDate -- t2.RowID > t.RowID
order by t2.RowID, t2.FromDate
)
else t.ToDate
end as todate
from test t
) t
group by t.EmployeeID, t.ToDate
order by t.EmployeeID, min(t.RowID)
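The result is:
| RowID | EmployeeID | AllowancePlan | FromDate   | ToDate     | AllowanceAmount |
|-------|------------|---------------|------------|------------|-----------------|
| 1     | 200690     | CarAllowance  | 2017-03-30 | 2019-04-01 | 1000            |
| 9     | 200690     | CarAllowance  | 2021-11-04 | 2022-05-13 | 1000            |
| 11    | 200690     | CarAllowance  | 2022-05-14 | (null)     | 850             |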
I have the following data in SQL Server:
What I need, for every day and employee (employeeId), is the following data:
In the AccessCode column, I = PunchIn and O = PunchOut, and we have to filter by lunchtype = 'N'.
So basically the result should return only one row per day, and all the punch-ins and punch-outs between the first entrance and the last exit shouldn't be considered.
Any clue?
You can do conditional aggregation:
-- assumes the punch time column is named times, as in the GROUP BY below
-- In is a reserved word, so the aliases are bracketed
select employeeid, [In], [Out],
dateadd(second, datediff(second, [In], [Out]), 0) as Hours
from (select employeeid,
min(case when AccessCode = 'I' then times end) as [In],
max(case when AccessCode = 'O' then times end) as [Out]
from myTable t
where lunchtype = 'N'
group by employeeid, convert(date, times)
) t;
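As a rough, runnable illustration of conditional aggregation for this case, the sketch below keeps only the first punch-in and the last punch-out per employee per day. It uses SQLite via Python's sqlite3 module, not SQL Server, and the punches table, its column names and its rows are assumptions made up for the example.

```python
import sqlite3

# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE punches (employee_id INTEGER, access_code TEXT, "
    "lunch_type TEXT, times TEXT)"
)
conn.executemany(
    "INSERT INTO punches VALUES (?, ?, ?, ?)",
    [
        (1, "I", "N", "2020-01-06 08:00:00"),
        (1, "O", "N", "2020-01-06 12:00:00"),  # ignored: not the last exit
        (1, "I", "N", "2020-01-06 13:00:00"),  # ignored: not the first entrance
        (1, "O", "N", "2020-01-06 17:30:00"),
        (1, "I", "N", "2020-01-07 09:00:00"),
        (1, "O", "N", "2020-01-07 16:00:00"),
        (2, "I", "Y", "2020-01-06 06:00:00"),  # filtered out by lunch_type
        (2, "I", "N", "2020-01-06 07:45:00"),
        (2, "O", "N", "2020-01-06 15:45:00"),
    ],
)

query = """
SELECT employee_id,
       date(times) AS punch_day,
       -- each CASE yields NULL for the other code, and MIN/MAX skip NULLs
       MIN(CASE WHEN access_code = 'I' THEN times END) AS punch_in,
       MAX(CASE WHEN access_code = 'O' THEN times END) AS punch_out
FROM punches
WHERE lunch_type = 'N'
GROUP BY employee_id, date(times)
ORDER BY employee_id, punch_day
"""
daily = conn.execute(query).fetchall()
for row in daily:
    print(row)
```

Grouping by both the employee and the calendar day is what gives one row per employee per day; grouping by the employee alone would collapse all days into one row.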
You can try this
with cte as
(select
*,
cast(times as date) as myda
from myTable
)
select
employeeid,
mn as punch_in,
mx as punch_out,
datediff(minute, mn, mx)/60.0 as hours
from
(select
employeeid,
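-- partition by employee as well, otherwise all employees share one min/max per day
min(times) over (partition by employeeid, myda) as mn,
max(times) over (partition by employeeid, myda) as mx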
from cte
) t
group by
employeeid, mn, mx
Try this:
select employeeId,
min(case when accessCode = 'I' then timestamp end) punchIn,
max(case when accessCode = 'O' then timestamp end) punchOut
from myTable
where lunchtype = 'N'
group by employeeId, cast(timestamp as date) -- one row per employee per day
I have some data I would like to pull from a database, I'm using RStudio for my query. What I intend to do is write:
The first CTE statement to pull all my necessary information.
The second CTE statement will add two new columns for two row numbers, which are partitioned by different groups. Two additional columns will be added for Lead and Lag values.
The third CTE will produce two more columns where the two columns use nested case_when statements to give me NewOpen and NewClosed dates.
What I have so far:
q5<- sqlQuery(ch,paste("
;with CTE AS
(
select
oz.id as AccountID
,ac.PROD_TYPE_CDE as ProductTypeCode
,CASE WHEN ac.OPEN_DTE='0001-01-01' then null else ac.OPEN_DTE END as OpenDate
,CASE WHEN ac.CLOS_DTE = '0001-01-01' then null else ac.CLOS_DTE END as ClosedDate
,df.proc_dte as FullDate
FROM
dbs.tb_dbs_acct_fact df
inner join
dbs.tb_acct_details ac on df.dw_serv_id = ac.dw_serv_id
left outer join
dbs.tb_oz_id oz on df.proc_dte = oz.proc_dte
),
cte1 as
(
select *
,row_nbr = row_number() over( partition by AccountID order by AccountID, FullDate asc )
,row_nbr2 = row_number() over( partition by AccountID,ProductTypeCode order by AccountID, FullDate asc )
,lag(ProductTypeCode) over(partition by AccountID order by FullDate asc ) as Lagging
,LEAD(ProductTypeCode) over(partition by AccountID order order by FullDate asc ) as Leading
FROM CTE
),
cte2 as (select *
,case when cte1.row_nbr = 1 & cte1.Lagging=cte1.ProductTypeCode then cte1.OpenDate else
case when cte1.Lagging<>cte1.ProductTypeCode then cte1.FullDate else NULL END END as NewOpen
,case when cte1.ClosedDate IS NOT NULL then cte1.ClosedDate else
case when cte1.Leading <> cte1.ProductTypeCode then cte1.FullDate else NULL END END as NewClosed
FROM cte1
);"))
This code, however, won't run.
As mentioned, WITH is a statement that defines CTEs to be used in a final query. Your query contains only CTE definitions and never actually uses any of them in a final statement. Additionally, you can combine the first two CTEs, since window functions can run at any level. The last CTE can then possibly serve as your final SELECT statement.
sql <- "WITH CTE AS
(SELECT
oz.id AS AccountID
, ac.PROD_TYPE_CDE as ProductTypeCode
, CASE
WHEN ac.OPEN_DTE='0001-01-01'
THEN NULL
ELSE ac.OPEN_DTE
END AS OpenDate
, CASE
WHEN ac.CLOS_DTE = '0001-01-01'
THEN NULL
ELSE ac.CLOS_DTE
END AS ClosedDate
, df.proc_dte AS FullDate
, ROW_NUMBER() OVER (PARTITION BY oz.id
ORDER BY oz.id, df.proc_dte) AS row_nbr
, ROW_NUMBER() OVER (PARTITION BY oz.id, ac.PROD_TYPE_CDE
ORDER BY oz.id, df.proc_dte) AS row_nbr2
, LAG(ac.PROD_TYPE_CDE) OVER (PARTITION BY oz.id
ORDER BY df.proc_dte) AS Lagging
, LEAD(ac.PROD_TYPE_CDE) OVER (PARTITION BY oz.id
ORDER BY df.proc_dte) AS Leading
FROM
dbs.tb_dbs_acct_fact df
INNER JOIN
dbs.tb_acct_details ac ON df.dw_serv_id = ac.dw_serv_id
LEFT OUTER JOIN
dbs.tb_oz_id oz ON df.proc_dte = oz.proc_dte
)
SELECT *
, CASE
WHEN row_nbr = 1 AND Lagging = ProductTypeCode
THEN OpenDate
ELSE
CASE
WHEN Lagging <> ProductTypeCode
THEN FullDate
ELSE NULL
END
END AS NewOpen
, CASE
WHEN ClosedDate IS NOT NULL
THEN ClosedDate
ELSE
CASE
WHEN Leading <> ProductTypeCode
THEN FullDate
ELSE NULL
END
END AS NewClosed
FROM CTE;"
q5 <- sqlQuery(ch, sql)
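The point about a WITH clause needing a final statement can be checked with a small sketch. It uses Python's sqlite3 module and SQLite instead of R and SQL Server, and the acct table with its three rows is invented for the example, so treat it as an illustration of the principle rather than a drop-in fix.

```python
import sqlite3

# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE acct (account_id INTEGER, product TEXT, full_date TEXT)")
conn.executemany(
    "INSERT INTO acct VALUES (?, ?, ?)",
    [(1, "A", "2020-01-01"), (1, "A", "2020-02-01"), (1, "B", "2020-03-01")],
)

# A CTE definition with no statement after it is not a complete query.
try:
    conn.execute("WITH cte AS (SELECT * FROM acct)")
    cte_only_ran = True
except sqlite3.Error:
    cte_only_ran = False

# One CTE computes the window functions; the final SELECT consumes it.
query = """
WITH cte AS (
    SELECT account_id, product, full_date,
           LAG(product)  OVER (PARTITION BY account_id ORDER BY full_date) AS prev_product,
           LEAD(product) OVER (PARTITION BY account_id ORDER BY full_date) AS next_product
    FROM acct
)
SELECT full_date,
       CASE WHEN prev_product <> product THEN full_date END AS new_open,
       CASE WHEN next_product <> product THEN full_date END AS new_closed
FROM cte
ORDER BY full_date
"""
rows = conn.execute(query).fetchall()
print(cte_only_ran)
for row in rows:
    print(row)
```

On the first row of each account the lagged value is NULL, so a comparison against it is neither true nor false and the CASE falls through to NULL; that is worth keeping in mind for any condition that combines row_nbr = 1 with a test on the lagged column.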
I'm trying to add a new column that counts the sessions that start between 08:00 and 18:00. You can see my attempt in the last CASE in the CTE. For each date there should be a new column, "TotalRestrictedSessions", which indicates how many such sessions there were on that particular date. If there are none, it should be 0. I suspect that my problem is in how I convert the date?
WITH ParkeonCTE
AS
(
SELECT
OccDate = CONVERT(DATE, OC.LocalStartTime),
TotalOccSessions = COUNT(OC.SessionId),
AuthorityId,
TotalOccDuration = ISNULL(SUM(OC.DurationMinutes),0),
TotalNumberOfOverstay = SUM(CAST(OC.IsOverstay AS INT)),
TotalMinOfOverstays = ISNULL(SUM(OC.OverStayDurationMinutes),0),
(CASE
WHEN OC.OspId IS NULL THEN 'OffStreet' ELSE 'OnStreet'
END
) AS ParkingContextType,
(CASE
WHEN CAST(OC.LocalStartTime AS TIME) >= '08:00:00' AND CAST(OC.LocalStartTime AS TIME) <=
'18:00:00'
THEN COUNT(OC.SessionId)
END
) AS TotalRestrictedSessions
FROM Analytics.OccupancySessions AS OC
WHERE OC.AuthorityId IS NOT NULL
GROUP BY CONVERT(DATE,OC.LocalStartTime), OC.AuthorityId,OC.OspId
)
SELECT OC.OccDate,
OC.ParkingContextType,
OC.AuthorityId,
OC.TotalRestrictedSessions,
SUM(OC.TotalOccSessions) AS TotalOccSessions,
AVG(OC.TotalOccDuration) AS AvgOccMinutesDuration, -- wrong
SUM(OC.TotalOccDuration) AS TotalOccDuration,
SUM(OC.TotalNumberOfOverstay) AS TotalNumberOfOverstay,
SUM(OC.TotalMinOfOverstays) AS TotalMinOfOverstays,
CAST(AVG(OC.TotalMinOfOverstays) AS decimal(10,2)) AS AvgMinOfOverstays -- wrong
FROM ParkeonCTE AS OC
GROUP BY OC.OccDate, OC.AuthorityId, OC.ParkingContextType
ORDER BY OC.OccDate DESC
You just need to move the aggregation outside of your CASE expression, a technique called conditional aggregation.
SUM(CASE
WHEN CAST(OC.LocalStartTime AS TIME) >= '08:00:00'
AND CAST(OC.LocalStartTime AS TIME) <= '18:00:00'
THEN 1
ELSE 0
END
) AS TotalRestrictedSessions
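A minimal sketch of this pattern, assuming a made-up sessions table and running on SQLite through Python's sqlite3 module rather than SQL Server (so time() and date() stand in for the CAST and CONVERT calls):

```python
import sqlite3

# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (session_id INTEGER, local_start TEXT)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [
        (1, "2021-03-01 07:30:00"),
        (2, "2021-03-01 09:00:00"),  # inside 08:00-18:00
        (3, "2021-03-01 18:00:00"),  # inside, the upper bound is inclusive
        (4, "2021-03-01 19:15:00"),
        (5, "2021-03-02 06:00:00"),
        (6, "2021-03-02 22:00:00"),
    ],
)

query = """
SELECT date(local_start) AS occ_date,
       COUNT(session_id) AS total_sessions,
       -- the CASE is evaluated per row; SUM adds up the 1s and 0s per group
       SUM(CASE WHEN time(local_start) >= '08:00:00'
                 AND time(local_start) <= '18:00:00'
                THEN 1 ELSE 0 END) AS total_restricted_sessions
FROM sessions
GROUP BY date(local_start)
ORDER BY occ_date
"""
per_day = conn.execute(query).fetchall()
for row in per_day:
    print(row)
```

Because of the ELSE 0 branch, a date with no sessions inside the window gets 0 instead of NULL, which is what the question asks for.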
Generally, you should include the current query results and your desired results in your question to make it easier to figure out where the issues are.
I have a SQL Server view that shows an overview of account statements. First we calculate the latest closing balances of the user accounts, to know what the latest balance on each account was. This is the LATEST_CB_DATES part.
Then we calculate the next business days, meaning the next 2 days on which we expect to receive a balance in the database. This happens in NEXT_B_DAYS.
Finally we calculate whether the account is expecting a closing balance, has received one, or received one too late. Note that we use the end of the reception window for this.
IF EXISTS (SELECT TABLE_NAME FROM INFORMATION_SCHEMA.VIEWS
WHERE TABLE_NAME = 'VIEW_AS_AS_ACCT_STAT')
DROP VIEW VIEW_AS_AS_ACCT_STAT
GO
CREATE VIEW VIEW_AS_AS_ACCT_STAT AS
WITH LATEST_CB_DATES AS (
SELECT * FROM (
SELECT row_number() over (partition by SD_ACCT.ID order by (AS_ACCT_STAT.CBAL_BAL_DATE) DESC) RN,SD_ACCT.ID, SD_ACCT.ACCT_NBR, AS_ACCT_STAT.CBAL_BAL_DATE AS BAL_DATE, SD_ACCT.CODE, SD_ACCT.CCY, SD_ACCT_GRP.ID AS GRP_ID, SD_ACCT_GRP.CODE AS ACCT_GRP_CODE, SD_ACCT.DATA_OWNER_ID, AS_ACCT_STAT.STATIC_DATA_BNK AS BANK_CODE, AS_ACCT_STAT.STATIC_DATA_HLD AS HOLDER_CODE
FROM SD_ACCT
LEFT JOIN AS_ACCT on SD_ACCT.ID = AS_ACCT.STATIC_DATA_ACCT_ID
LEFT JOIN AS_ACCT_STAT on AS_ACCT.ID = AS_ACCT_STAT.ACCT_ID
JOIN SD_ACCT_GRP_MEMBER ON SD_ACCT.ID = SD_ACCT_GRP_MEMBER.ACCT_ID
JOIN SD_ACCT_GRP on SD_ACCT_GRP_MEMBER.GRP_ID = SD_ACCT_GRP.ID
JOIN SD_ACCT_GRP_ROLE on SD_ACCT_GRP_ROLE.ID = SD_ACCT_GRP.ROLE_ID
WHERE SD_ACCT_GRP_ROLE.CODE = 'AccountStatementsToReceive' AND (AS_ACCT_STAT.VALID = 1 OR AS_ACCT_STAT.VALID IS NULL)
) LST_STMT
WHERE RN = 1
),
NEXT_B_DAYS AS (
SELECT VIEW_BUSINESS_DATES.CAL_ID, VIEW_BUSINESS_DATES.BUSINESS_DATE,
LEAD(VIEW_BUSINESS_DATES.BUSINESS_DATE, 1) OVER (PARTITION BY VIEW_BUSINESS_DATES.CAL_CODE ORDER BY VIEW_BUSINESS_DATES.BUSINESS_DATE) AS NEXT_BUSINESS_DATE,
LEAD(VIEW_BUSINESS_DATES.BUSINESS_DATE, 2) OVER (PARTITION BY VIEW_BUSINESS_DATES.CAL_CODE ORDER BY VIEW_BUSINESS_DATES.BUSINESS_DATE) AS SECOND_BUSINESS_DATE
FROM VIEW_BUSINESS_DATES
)
SELECT LATEST_CB_DATES.ID AS ACCT_ID,
LATEST_CB_DATES.CODE AS ACCT_CODE,
LATEST_CB_DATES.ACCT_NBR,
LATEST_CB_DATES.CCY AS ACCT_CCY,
LATEST_CB_DATES.BAL_DATE AS LATEST_CLOSING_BAL_DATE,
LATEST_CB_DATES.DATA_OWNER_ID,
LATEST_CB_DATES.BANK_CODE,
LATEST_CB_DATES.HOLDER_CODE,
LATEST_CB_DATES.ACCT_GRP_CODE,
CASE
WHEN LATEST_CB_DATES.BAL_DATE IS NULL THEN 'Expecting'
WHEN NEXT_B_DAYS.NEXT_BUSINESS_DATE IS NULL OR NEXT_B_DAYS.SECOND_BUSINESS_DATE IS NULL THEN 'Late'
WHEN AS_AS_RECEPTION_CONF.RECEPTION_WINDOW_END IS NOT NULL AND GETDATE() >= TODATETIMEOFFSET(CAST(NEXT_B_DAYS.SECOND_BUSINESS_DATE AS DATETIME) + CAST(CAST(AS_AS_RECEPTION_CONF.RECEPTION_WINDOW_END AS TIME) AS DATETIME), SEC_TIMEZONE.UTC_TIME_TOTAL_OFFSET) THEN 'Late'
WHEN AS_AS_RECEPTION_CONF.RECEPTION_WINDOW_END IS NULL AND GETDATE() >= TODATETIMEOFFSET(CAST(NEXT_B_DAYS.SECOND_BUSINESS_DATE AS DATETIME) + CAST(CAST(AS_AS_RECEPTION_CONF.RECEPTION_WINDOW_START AS TIME) AS DATETIME), SEC_TIMEZONE.UTC_TIME_TOTAL_OFFSET) AND CAST(AS_AS_RECEPTION_CONF.RECEPTION_WINDOW_END AS TIME) >= CAST(AS_AS_RECEPTION_CONF.RECEPTION_WINDOW_START AS TIME) THEN 'Expecting'
WHEN AS_AS_RECEPTION_CONF.RECEPTION_WINDOW_END IS NULL AND GETDATE() >= TODATETIMEOFFSET(CAST(NEXT_B_DAYS.NEXT_BUSINESS_DATE AS DATETIME) + CAST(CAST(AS_AS_RECEPTION_CONF.RECEPTION_WINDOW_START AS TIME) AS DATETIME), SEC_TIMEZONE.UTC_TIME_TOTAL_OFFSET) AND CAST(AS_AS_RECEPTION_CONF.RECEPTION_WINDOW_END AS TIME) < CAST(AS_AS_RECEPTION_CONF.RECEPTION_WINDOW_START AS TIME) THEN 'Expecting' -- overnight
WHEN AS_AS_RECEPTION_CONF.RECEPTION_WINDOW_END IS NULL AND CAST (GETDATE() AS DATE) > NEXT_B_DAYS.SECOND_BUSINESS_DATE THEN 'Expecting'
ELSE 'Received'
END AS STAT,
CASE
WHEN LATEST_CB_DATES.BAL_DATE IS NULL THEN NULL
WHEN NEXT_B_DAYS.NEXT_BUSINESS_DATE IS NULL OR NEXT_B_DAYS.SECOND_BUSINESS_DATE IS NULL THEN NULL
WHEN AS_AS_RECEPTION_CONF.RECEPTION_WINDOW_END IS NOT NULL THEN CAST(NEXT_B_DAYS.SECOND_BUSINESS_DATE AS DATETIME) + CAST(CAST(AS_AS_RECEPTION_CONF.RECEPTION_WINDOW_END AS TIME) AS DATETIME)
ELSE NULL
END AS DEADLINE,
SEC_TIMEZONE.UTC_TIME_TOTAL_OFFSET AS TIME_ZONE
FROM AS_AS_RECEPTION_CONF
JOIN LATEST_CB_DATES ON AS_AS_RECEPTION_CONF.ACCT_GRP_ID = LATEST_CB_DATES.GRP_ID
JOIN SEC_TIMEZONE ON SEC_TIMEZONE.ID = AS_AS_RECEPTION_CONF.TIME_ZONE_ID
LEFT JOIN NEXT_B_DAYS ON AS_AS_RECEPTION_CONF.CALENDAR_ID = NEXT_B_DAYS.CAL_ID AND LATEST_CB_DATES.BAL_DATE = NEXT_B_DAYS.BUSINESS_DATE
GO
SELECT * FROM VIEW_AS_AS_ACCT_STAT
What is the issue? Nothing is broken; this works fine, but it's slow. We created a graphical report to display the data for our customers, but this SQL takes 1 minute 30 seconds to load when you have 5000 accounts, which is too slow.
I guess the reason is the last line, but I didn't manage to refactor it well:
LEFT JOIN NEXT_B_DAYS ON AS_AS_RECEPTION_CONF.CALENDAR_ID =
NEXT_B_DAYS.CAL_ID AND LATEST_CB_DATES.BAL_DATE =
NEXT_B_DAYS.BUSINESS_DATE
The execution plan of my SQL can be found here
How can I refactor this to make my view still work but much more performant?