SQL Split Island On Criteria - sql

I have a SQL table with From and To dates like so:
Row  From                 To
---------------------------------------------------
1    2017-10-28 00:00:00  2017-10-30 00:00:00
2    2017-10-30 00:00:00  2017-10-31 00:00:00
3    2017-10-31 00:00:00  2017-10-31 07:30:00
4    2017-10-31 14:41:00  2017-10-31 15:14:00
5    2017-10-31 17:13:00  2017-11-01 00:00:00
6    2017-11-01 00:00:00  2017-11-01 23:45:00
7    2017-11-02 03:13:00  2017-11-02 07:56:00
I need to group consecutive data into islands. The data is non-overlapping. This is done easily enough using this query:
;with Islands as
(
    SELECT
        min([From]) as [From]
        ,max([To]) as [To]
    FROM
    (
        select
            [From],
            [To],
            sum(startGroup) over (order by [From]) StartGroup
        from
        (
            SELECT
                [From],
                [To],
                (case when [From] <= lag([To]) over (order by [From])
                      then 0
                      else 1
                 end) as StartGroup
            FROM dbo.DateTable
        ) IsNewIsland
    ) GroupedIsland
    group by StartGroup
)
select *
from Islands
And gives me these results:
From                 To                   Rows
-----------------------------------------------------
2017-10-28 00:00:00  2017-10-31 07:30:00  1-3
2017-10-31 14:41:00  2017-10-31 15:14:00  4
2017-10-31 17:13:00  2017-11-01 23:45:00  5-6
2017-11-02 03:13:00  2017-11-02 07:56:00  7
The problem I have is that I need to modify the query to cap/split the islands once the records in them reach a certain total duration (an input/hard-coded value). The split keeps whole records; a record's From-To range is never cut in the middle. As an example, splitting islands at 27 hours would give this result:
From                 To                   Rows
-----------------------------------------------------
2017-10-29 00:00:00  2017-10-30 00:00:00  1
2017-10-30 00:00:00  2017-10-31 07:30:00  2-3
2017-10-31 17:13:00  2017-11-01 23:45:00  5-6
The first island was split because rows 1 and 2 alone created a 27 hour period. Rows 4 and 7 are not enough to create an island, so they are ignored.
I tried pulling this information via a lag function in the inner select to compute the "rolling duration" across rows, but it would not work on islands that spanned more than 2 rows because it would only track the last row's duration and I could not "carry" the calculation forward.
SELECT
    [From],
    [To],
    (case when [From] <= lag([To]) over (order by [From])
          then (datediff(minute, [From], [To]) + lag(datediff(minute, [From], [To])) over (order by [From]))
          else datediff(minute, [From], [To])
     end) as RollingDuration,
    (case when [From] <= lag([To]) over (order by [From])
          then 0
          else 1
     end) as StartGroup
FROM dbo.DateTable

The "least worst" way I can think of doing it is a "quirky update". (Google it, I honestly didn't make it up.)
http://www.sqlservercentral.com/articles/T-SQL/68467/
Copy the data into a new table with one or more additional (blank) fields
Use a CLUSTERED PRIMARY KEY to ensure the rows are updated in correct sequence
Use UPDATE and user variables to iterate through rows and store results of calculations
Using that I can start a new group if there is a gap, or a running total reaches 27 hours. Then proceed as usual.
-- New table to work through
----------------------------------------------------------------------
-- Additional [group_start] field (identifies groups, and useful data)
-- PRIMARY KEY CLUSTERED to enforce the order rows will be processed
----------------------------------------------------------------------
CREATE TABLE sample (
    id          INT,
    start       DATETIME,
    cease       DATETIME,
    group_start DATETIME DEFAULT(0),
    PRIMARY KEY CLUSTERED (group_start, start) -- To force the order we will iterate the rows, and is useful in the last step
);
INSERT INTO
sample (
id,
start,
cease
)
VALUES
(1, '2017-10-28 00:00:00', '2017-10-30 00:00:00'),
(2, '2017-10-30 00:00:00', '2017-10-31 00:00:00'),
(3, '2017-10-31 00:00:00', '2017-10-31 07:30:00'),
(4, '2017-10-31 14:41:00', '2017-10-31 15:14:00'),
(5, '2017-10-31 17:13:00', '2017-11-01 00:00:00'),
(6, '2017-11-01 00:00:00', '2017-11-01 23:45:00'),
(7, '2017-11-02 03:13:00', '2017-11-02 07:56:00')
;
-- Quirky Update
----------------------------------------------------------------------
-- Update [group_start] to the start of the current group
--   -> new group if gap since previous row
--   -> new group if previous row took group to 27 hours
--   -> else same group as previous row
----------------------------------------------------------------------
DECLARE @grp_start DATETIME = 0;

WITH
    lagged AS
    (
        SELECT *, LAG(cease) OVER (ORDER BY group_start, start) AS lag_cease FROM sample
    )
UPDATE
    lagged
SET
    @grp_start
        = group_start
        = CASE WHEN start <> lag_cease                     THEN start
               WHEN start >= DATEADD(hour, 27, @grp_start) THEN start
               ELSE @grp_start END
OPTION
    (MAXDOP 1)
;
-- Standard SQL to apply other logic
----------------------------------------------------------------------
-- MAX() OVER () to find end time of each group
-- WHERE to filter out any groups under 12 hours long
----------------------------------------------------------------------
SELECT
    *
FROM
(
    SELECT
        *,
        MAX(cease) OVER (PARTITION BY group_start) AS group_cease
    FROM
        sample
) bounded_groups
WHERE
    group_cease >= DATEADD(hour, 12, group_start)
;
http://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=1bec5b3fe920c1affd58f23a11e280a0
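
For reference, the running group start that LAG cannot carry forward can also be threaded through a recursive CTE instead of a quirky update. This is only a sketch against the dbo.DateTable layout from the question, reusing the same rules as above (new group on a gap or once 27 hours are reached, 12-hour minimum at the end); it is not from the original answer.

;WITH Ordered AS
(
    -- number the rows so the recursion can walk them in [From] order
    SELECT [From], [To], ROW_NUMBER() OVER (ORDER BY [From]) AS rn
    FROM dbo.DateTable
),
Walk AS
(
    -- anchor: the earliest row starts the first group
    SELECT rn, [From], [To], [From] AS group_start
    FROM Ordered
    WHERE rn = 1

    UNION ALL

    -- recursive step: new group on a gap, or once the group already spans 27 hours
    SELECT o.rn, o.[From], o.[To],
           CASE WHEN o.[From] <> w.[To]                           THEN o.[From]
                WHEN o.[From] >= DATEADD(HOUR, 27, w.group_start) THEN o.[From]
                ELSE w.group_start
           END
    FROM Walk w
    JOIN Ordered o ON o.rn = w.rn + 1
)
SELECT group_start AS [From], MAX([To]) AS [To]
FROM Walk
GROUP BY group_start
HAVING MAX([To]) >= DATEADD(HOUR, 12, group_start)  -- same 12-hour minimum as above
OPTION (MAXRECURSION 0);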

Related

How to select rows based on a rolling 30 day window SQL

My question involves how to identify an index discharge.
The index discharge is the earliest discharge. On that date, the 30 day window starts. Any admissions during that time period are considered readmissions, and they should be ignored. Once the 30 day window is over, then any subsequent discharge is considered an index and the 30 day window begins again.
I can't seem to work out the logic for this. I've tried different windowing functions, I've tried cross joins and cross applies. The issue I keep encountering is that a readmission cannot be an index admission. It must be excluded.
I have successfully written a while loop to solve this problem, but I'd really like to get this in a set based format, if it's possible. I haven't been successful so far.
Ultimate goal is this -
id  AdmitDate                DischargeDate            MedicalRecordNumber  IndexYN
-----------------------------------------------------------------------------------
1   2021-03-03 00:00:00.000  2021-03-09 13:20:00.000  X0090362             1
4   2021-03-05 00:00:00.000  2021-03-10 16:00:00.000  X0012614             1
6   2021-05-18 00:00:00.000  2021-05-21 22:20:00.000  X0012614             1
7   2021-06-21 00:00:00.000  2021-07-08 13:30:00.000  X0012614             1
8   2021-02-03 00:00:00.000  2021-02-09 17:00:00.000  X0019655             1
10  2021-03-23 00:00:00.000  2021-03-26 16:40:00.000  X0019655             1
11  2021-03-15 00:00:00.000  2021-03-18 15:53:00.000  X4135958             1
13  2021-05-17 00:00:00.000  2021-05-23 14:55:00.000  X4135958             1
15  2021-06-24 00:00:00.000  2021-07-13 15:06:00.000  X4135958             1
Sample code is below.
CREATE TABLE #Admissions
(
[id] INT,
[AdmitDate] DATETIME,
[DischargeDateTime] DATETIME,
[UnitNumber] VARCHAR(20),
[IndexYN] INT
)
INSERT INTO #Admissions
VALUES( 1 ,'2021-03-03' ,'2021-03-09 13:20:00.000' ,'X0090362', NULL)
,(2 ,'2021-03-27' ,'2021-03-30 19:59:00.000' ,'X0090362', NULL)
,(3 ,'2021-03-31' ,'2021-04-04 05:57:00.000' ,'X0090362', NULL)
,(4 ,'2021-03-05' ,'2021-03-10 16:00:00.000' ,'X0012614', NULL)
,(5 ,'2021-03-28' ,'2021-04-16 13:55:00.000' ,'X0012614', NULL)
,(6 ,'2021-05-18' ,'2021-05-21 22:20:00.000' ,'X0012614', NULL)
,(7 ,'2021-06-21' ,'2021-07-08 13:30:00.000' ,'X0012614', NULL)
,(8 ,'2021-02-03' ,'2021-02-09 17:00:00.000' ,'X0019655', NULL)
,(9 ,'2021-02-17' ,'2021-02-22 17:25:00.000' ,'X0019655', NULL)
,(10 ,'2021-03-23' ,'2021-03-26 16:40:00.000' ,'X0019655', NULL)
,(11 ,'2021-03-15' ,'2021-03-18 15:53:00.000' ,'X4135958', NULL)
,(12 ,'2021-04-08' ,'2021-04-13 19:42:00.000' ,'X4135958', NULL)
,(13 ,'2021-05-17' ,'2021-05-23 14:55:00.000' ,'X4135958', NULL)
,(14 ,'2021-06-09' ,'2021-06-14 12:45:00.000' ,'X4135958', NULL)
,(15 ,'2021-06-24' ,'2021-07-13 15:06:00.000' ,'X4135958', NULL)
You can use a recursive CTE to identify all rows associated with each "index" discharge:
with a as (
select a.*, row_number() over (order by dischargedatetime) as seqnum
from admissions a
),
cte as (
select id, admitdate, dischargedatetime, unitnumber, seqnum, dischargedatetime as index_dischargedatetime
from a
where seqnum = 1
union all
select a.id, a.admitdate, a.dischargedatetime, a.unitnumber, a.seqnum,
(case when a.dischargedatetime > dateadd(day, 30, cte.index_dischargedatetime)
then a.dischargedatetime else cte.index_dischargedatetime
end) as index_dischargedatetime
from cte join
a
on a.seqnum = cte.seqnum + 1
)
select *
from cte;
You can then incorporate this into an update:
update admissions
set indexyn = (case when admissions.dischargedatetime = cte.index_dischargedatetime then 'Y' else 'N' end)
from cte
where cte.id = admissions.id;
Here is a db<>fiddle. Note that I changed the type of IndexYN to a character to assign 'Y'/'N', which makes sense given the column name.
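
A CTE is only visible to the single statement that immediately follows it, so when running the two pieces above together, the SELECT's CTEs and the UPDATE have to be combined into one statement. Stitched together (same logic as above, nothing new):

with a as (
      select a.*, row_number() over (order by dischargedatetime) as seqnum
      from admissions a
     ),
     cte as (
      select id, admitdate, dischargedatetime, unitnumber, seqnum, dischargedatetime as index_dischargedatetime
      from a
      where seqnum = 1
      union all
      select a.id, a.admitdate, a.dischargedatetime, a.unitnumber, a.seqnum,
             (case when a.dischargedatetime > dateadd(day, 30, cte.index_dischargedatetime)
                   then a.dischargedatetime else cte.index_dischargedatetime
              end) as index_dischargedatetime
      from cte join
           a
           on a.seqnum = cte.seqnum + 1
     )
update admissions
    set indexyn = (case when admissions.dischargedatetime = cte.index_dischargedatetime then 'Y' else 'N' end)
    from cte
    where cte.id = admissions.id;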

Compare date in multiple rows and calculate the downtime

I'm trying to calculate the downtime for a train from a service record; below is a sample scenario.
There can be multiple jobs running simultaneously for a train, and they can overlap at times.
For:
Job_Number 1, the difference between the work start and end dates is 360 minutes.
Job_Number 2, the difference is 60 minutes, but it overlaps entirely with Job_Number 1, so we shouldn't count it.
Job_Number 3, the difference is 45 minutes, but it only partially overlaps with Job_Number 1, so we should count only 10 minutes.
So the actual downtime should be 360 minutes (Job 1) + 0 minutes (Job 2) + 10 minutes (Job 3) = 370 minutes.
My desired output is the total downtime per train.
I currently have 20 trains for which I need to calculate the downtime as above.
How do I do this?
Sample Data script:
CREATE TABLE [dbo].[tb_ServiceMemo](
[Job_Number] [nvarchar](500) NULL,
[Train_Number] [nvarchar](500) NULL,
[Work_Start_Date] [datetime] NULL,
[Work_Completed_Date] [datetime] NULL
) ON [PRIMARY]
INSERT INTO [dbo].[tb_ServiceMemo]
VALUES (1,1,'01-08-2018 12:35','01-08-18 18:35'),
(2,1,'01-08-2018 14:20','01-08-18 15:20'),
(3,1,'01-08-2018 18:00','01-08-18 18:45')
This is a gaps-and-islands problem, but it is tricky because it has start and end times.
The idea for the solution is to determine when an outage starts. What is the characteristic? Well, the period starts at a time where there is no overlap with preceding work. The tricky part is that more than one "work" effort could start at the same time (although your data does not show this).
Once you know the time when an outage starts, you can use a cumulative sum to assign a group to each record and then simply aggregate by that group (and other information).
The following query should do what you want:
with starts as (
select sm.*,
(case when exists (select 1
from tb_ServiceMemo sm2
where sm2.Train_Number = sm.Train_Number and
sm2.Work_Start_Date < sm.Work_Start_Date and
sm2.Work_Completed_Date >= sm.Work_Start_Date
)
then 0 else 1
end) as isstart
from tb_ServiceMemo sm
)
select Train_Number, min(Work_Start_Date) as outage_start_date, max(Work_Completed_Date) as outage_end_date,
datediff(minute, min(Work_Start_Date), max(Work_Completed_Date))
from (select s.*, sum(isstart) over (partition by Train_Number order by Work_Start_Date) as grp
from starts s
) s
group by Train_Number, grp;
In this db<>fiddle, I added a few more rows to show how the code works in different scenarios.
This is a Gaps and Islands in Sequences problem.
You can try to use a recursive CTE to expand every row into one row per minute,
then use the MAX and MIN datetimes to calculate the result.
;WITH CTE AS (
SELECT [Train_Number], [Work_Start_Date] ,[Work_Completed_Date]
FROM [tb_ServiceMemo]
UNION ALL
SELECT [Train_Number], DATEADD(minute,1,[Work_Start_Date]) ,[Work_Completed_Date]
FROM CTE
WHERE DATEADD(minute,1,[Work_Start_Date]) <= [Work_Completed_Date]
),CTE2 AS (
SELECT DISTINCT Train_Number,
Work_Start_Date,
MAX(Work_Completed_Date) OVER(PARTITION BY Train_Number ORDER BY Work_Completed_Date DESC) MAX_Time
FROM CTE
),CTE_RESULT AS (
SELECT *,datediff(mi,MAX_Time,Work_Start_Date) - row_number() over(PARTITION BY Train_Number ORDER BY Work_Start_Date) grp
FROM CTE2
)
SELECT Train_Number,sum(time_diff)
FROM (
SELECT Train_Number,DATEDIFF(MI,MIN(Work_Start_Date),MAX(Work_Start_Date)) time_diff
FROM CTE_RESULT
GROUP BY Train_Number,grp
)t1
GROUP BY Train_Number
option ( MaxRecursion 0 );
sqlfiddle
This is the infamous gaps and islands problem with dates. The following is a solution that uses a recursive CTE. It might be a little tough to understand if you aren't used to working with them, so I commented all the parts that might need clarifying.
I also added a few more examples to cover different scenarios, such as periods spanning different days and overlaps that fall exactly on a start/end time.
Example setup:
IF OBJECT_ID('tempdb..#tb_ServiceMemo') IS NOT NULL
DROP TABLE #tb_ServiceMemo
CREATE TABLE #tb_ServiceMemo(
Job_Number INT, -- This is an INT not VARCHAR!! (even the name says so)
Train_Number INT, -- This one also!!
Work_Start_Date DATETIME,
Work_Completed_Date DATETIME)
INSERT INTO #tb_ServiceMemo (
Job_Number,
Train_Number,
Work_Start_Date,
Work_Completed_Date)
VALUES
-- Total time train 1: 6h 10m (370m)
(1,1,'2018-08-01 12:35','2018-08-01 18:35'), -- Make sure to write date literals in ISO format (yyyy-MM-dd) to avoid multiple interpretations
(2,1,'2018-08-01 14:20','2018-08-01 15:20'),
(3,1,'2018-08-01 18:00','2018-08-01 18:45'),
-- Total time train 2: 2h (120m)
(4,2,'2018-08-01 12:00','2018-08-01 12:10'),
(5,2,'2018-08-01 12:15','2018-08-01 12:20'),
(6,2,'2018-08-01 13:15','2018-08-01 13:45'),
(9,2,'2018-08-01 13:45','2018-08-01 15:00'),
-- Total time train 3: 3h 45m (225m)
(7,3,'2018-08-01 23:30','2018-08-02 00:30'),
(8,3,'2018-08-02 00:15','2018-08-02 03:15'),
-- Total time train 4: 2d 8h 15m (3375m)
(10,4,'2018-08-01 23:00','2018-08-03 23:00'),
(11,4,'2018-08-02 00:15','2018-08-04 07:15')
The solution:
;WITH TimeLapses AS
(
-- Recursive Anchor: Find the minimum Jobs for each train that doesn't overlap with previous Jobs
SELECT
InitialJobNumber = T.Job_Number,
JobNumber = T.Job_Number,
TrainNumber = T.Train_Number,
IntervalStart = T.Work_Start_Date,
IntervalEnd = T.Work_Completed_Date,
JobExtensionPath = CONVERT(VARCHAR(MAX), T.Job_Number), -- Will store the chained jobs together for clarity
RecursionLevel = 1
FROM
#tb_ServiceMemo AS T
WHERE
NOT EXISTS (
SELECT
'Job doesn''t overlap with previous Jobs (by train)'
FROM
#tb_ServiceMemo AS S
WHERE
S.Train_Number = T.Train_Number AND
S.Job_Number < T.Job_Number AND
S.Work_Completed_Date >= T.Work_Start_Date AND -- Conditions for the periods to overlap
S.Work_Start_Date <= T.Work_Completed_Date)
UNION ALL
-- Recursive Union: Chain overlapping Jobs by train and keep intervals boundaries (min & max)
SELECT
InitialJobNumber = L.InitialJobNumber,
JobNumber = T.Job_Number,
TrainNumber = L.TrainNumber,
IntervalStart = CASE -- Minimum of both starts
WHEN L.IntervalStart <= T.Work_Start_Date THEN L.IntervalStart
ELSE T.Work_Start_Date END,
IntervalEnd = CASE -- Maximum of both ends
WHEN L.IntervalEnd >= T.Work_Completed_Date THEN L.IntervalEnd
ELSE T.Work_Completed_Date END,
JobExtensionPath = L.JobExtensionPath + '->' + CONVERT(VARCHAR(MAX), T.Job_Number),
RecursionLevel = L.RecursionLevel + 1
FROM
TimeLapses AS L -- Recursive CTE!
INNER JOIN #tb_ServiceMemo AS T ON
L.TrainNumber = T.Train_Number AND
T.Work_Completed_Date >= L.IntervalStart AND -- Conditions for the periods to overlap
T.Work_Start_Date <= L.IntervalEnd
WHERE
L.JobNumber < T.Job_Number -- Prevent joining in both directions (that would be "<>") to avoid infinite loops
),
MaxRecursionLevelByTrain AS
(
/*
Max recursion level will hold the longest interval for each train, as there might be recursive paths that skip some jobs. For example: Train 1's Job 1 will
join with Job 2 and Job 3 on the first recursive level, then Job 2 will join with Job 3 on the next recursion. The higher the recursion level, the more Jobs we
are taking into account for the longest interval.
We also need to group by InitialJobNumber as there might be different, independent gaps for each train.
*/
SELECT
TrainNumber = T.TrainNumber,
InitialJobNumber = T.InitialJobNumber,
MaxRecursionLevel = MAX(T.RecursionLevel)
FROM
TimeLapses AS T
GROUP BY
T.TrainNumber,
T.InitialJobNumber
),
ExpandedLapses AS
(
SELECT
TrainNumber = T.TrainNumber,
InitialJobNumber = M.InitialJobNumber,
IntervalStart = T.IntervalStart,
IntervalEnd = T.IntervalEnd,
DownTime = DATEDIFF(MINUTE, T.IntervalStart, T.IntervalEnd),
JobExtensionPath = T.JobExtensionPath,
RecursionLevel = T.RecursionLevel
FROM
MaxRecursionLevelByTrain AS M
INNER JOIN TimeLapses AS T ON
M.TrainNumber = T.TrainNumber AND
M.MaxRecursionLevel = T.RecursionLevel AND
M.InitialJobNumber = T.InitialJobNumber
)
SELECT
TrainNumber = E.TrainNumber,
TotalDownTime = SUM(DownTime)
FROM
ExpandedLapses AS E
GROUP BY
E.TrainNumber
And these are the partial results from each CTE, so you can see each step:
TimeLapses:
InitialJobNumber  JobNumber  TrainNumber  IntervalStart            IntervalEnd              JobExtensionPath  RecursionLevel
1                 1          1            2018-08-01 12:35:00.000  2018-08-01 18:35:00.000  1                 1
1                 2          1            2018-08-01 12:35:00.000  2018-08-01 18:35:00.000  1->2              2
1                 3          1            2018-08-01 12:35:00.000  2018-08-01 18:45:00.000  1->3              2
1                 3          1            2018-08-01 12:35:00.000  2018-08-01 18:45:00.000  1->2->3           3
4                 4          2            2018-08-01 12:00:00.000  2018-08-01 12:10:00.000  4                 1
5                 5          2            2018-08-01 12:15:00.000  2018-08-01 12:20:00.000  5                 1
6                 6          2            2018-08-01 13:15:00.000  2018-08-01 13:45:00.000  6                 1
6                 9          2            2018-08-01 13:15:00.000  2018-08-01 15:00:00.000  6->9              2
7                 8          3            2018-08-01 23:30:00.000  2018-08-02 03:15:00.000  7->8              2
7                 7          3            2018-08-01 23:30:00.000  2018-08-02 00:30:00.000  7                 1
10                10         4            2018-08-01 23:00:00.000  2018-08-03 23:00:00.000  10                1
10                11         4            2018-08-01 23:00:00.000  2018-08-04 07:15:00.000  10->11            2
MaxRecursionLevelByTrain:
TrainNumber  InitialJobNumber  MaxRecursionLevel
1            1                 3
2            4                 1
2            5                 1
2            6                 2
3            7                 2
4            10                2
ExpandedLapses:
TrainNumber  InitialJobNumber  IntervalStart            IntervalEnd              DownTime  JobExtensionPath  RecursionLevel
1            1                 2018-08-01 12:35:00.000  2018-08-01 18:45:00.000  370       1->2->3           3
2            4                 2018-08-01 12:00:00.000  2018-08-01 12:10:00.000  10        4                 1
2            5                 2018-08-01 12:15:00.000  2018-08-01 12:20:00.000  5         5                 1
2            6                 2018-08-01 13:15:00.000  2018-08-01 15:00:00.000  105       6->9              2
3            7                 2018-08-01 23:30:00.000  2018-08-02 03:15:00.000  225       7->8              2
4            10                2018-08-01 23:00:00.000  2018-08-04 07:15:00.000  3375      10->11            2
Final Result:
TrainNumber  TotalDownTime
1            370
2            120
3            225
4            3375
A few things worth mentioning:
While this solution will definitely be faster than using a cursor, it might not be the best one available, especially if you have a huge dataset (more than 100k records). There is room for improving performance.
You might benefit from an index on #tb_ServiceMemo (Train_Number, Job_Number, Work_Start_Date) to speed up the query (see the sketch below).
You might need to add OPTION (MAXRECURSION N) at the end of the SELECT statement, where N is the maximum recursion level you want to allow. The default is 100, so if more than 100 periods chain together for a particular train, an error will be raised. Use 0 as N for unlimited recursion.
Make sure that every end time is later than its start time, and that job numbers don't repeat, at least within each train.
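To make the last two suggestions concrete, here is a hedged sketch (the index name is made up, and the hint simply gets appended to the final SELECT shown above):

-- Hypothetical index to support the Train_Number/Job_Number lookups in the recursive join
CREATE NONCLUSTERED INDEX IX_ServiceMemo_Train_Job
    ON #tb_ServiceMemo (Train_Number, Job_Number, Work_Start_Date)
    INCLUDE (Work_Completed_Date);

-- If more than 100 jobs can chain together for one train, raise the recursion limit
-- on the final SELECT (0 = unlimited):
--     ...
--     GROUP BY E.TrainNumber
--     OPTION (MAXRECURSION 0);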
Can you try this one? I added another test case to be sure, but I think it's OK. I also suspect there is a simpler way.
INSERT INTO [dbo].[tb_ServiceMemo]
SELECT 1, 1, CONVERT(DATETIME, '2018-08-01 09:35:00', 120), CONVERT(DATETIME, '2018-08-01 12:45:00', 120) union
SELECT 2, 1, CONVERT(DATETIME, '2018-08-01 12:35:00', 120), CONVERT(DATETIME, '2018-08-01 18:35:00', 120) union
SELECT 3, 1, CONVERT(DATETIME, '2018-08-01 14:20:00', 120), CONVERT(DATETIME, '2018-08-01 15:20:00', 120) union
SELECT 4, 1, CONVERT(DATETIME, '2018-08-01 18:00:00', 120), CONVERT(DATETIME, '2018-08-01 18:45:00', 120) union
SELECT 5, 1, CONVERT(DATETIME, '2018-08-01 19:00:00', 120), CONVERT(DATETIME, '2018-08-01 19:45:00', 120)
SELECT [Train_Number], SUM(DATEDIFF(MINUTE, T.[Work_Start_Date], T.Work_Completed_Date)) as Delay
FROM (
SELECT
[Job_Number],
[Train_Number],
CASE
WHEN EXISTS(SELECT * FROM [tb_ServiceMemo] T3 WHERE T1.[Work_Start_Date] BETWEEN T3.[Work_Start_Date] AND T3.[Work_Completed_Date] AND T1.[Job_Number] <> T3.[Job_Number] AND T1.Train_Number = T3.Train_Number)
THEN (SELECT MAX(T3.[Work_Completed_Date]) FROM [tb_ServiceMemo] T3 WHERE T1.[Work_Start_Date] BETWEEN T3.[Work_Start_Date] AND T3.[Work_Completed_Date] AND T1.[Job_Number] <> T3.[Job_Number] AND T1.Train_Number = T3.Train_Number)
ELSE [Work_Start_Date] END as [Work_Start_Date],
[Work_Completed_Date]
FROM [tb_ServiceMemo] T1
WHERE NOT EXISTS( -- To exclude the fully-contained (ignored) case
SELECT T2.*
FROM [tb_ServiceMemo] T2
WHERE T2.[Work_Start_Date] < T1.[Work_Start_Date] AND T2.[Work_Completed_Date] > T1.[Work_Completed_Date]
)
) as T
GROUP BY [Train_Number]
The idea is to:
ignore any row whose period is fully contained within another
rewrite the start date of a row when it falls inside another row's period

How to add values of weekend and holiday's to the previous working day

I have to add each weekend's and holiday's value to the previous working day's value, so that weekends and holidays do not appear in the report. If there is no previous working day, the row should simply be skipped, as 2018-01-01 is skipped in the output below.
DAYS        VALUE
2018-01-01  10    Holiday-1
2018-01-02  20
2018-01-03  30
2018-01-04  40
2018-01-05  50
2018-01-06  60    Saturday
2018-01-07  70    Sunday
2018-01-08  80
2018-01-09  90
2018-01-10  100   Holiday-2
OUTPUT
2018-01-02  20
2018-01-03  30
2018-01-04  40
2018-01-05  180
2018-01-08  80
2018-01-09  190
I have tried LEAD, LAG, DATEDIFF and other approaches but haven't found a solution, so any help with this problem is appreciated.
When there is a row in your Holidays calendar table (I will assume that weekends are in there too), you need to find the max date prior to the current one for which there is no row in the Holidays table. Then group by this "real date" and sum the value. Something like this:
declare @t table([DAYS] date, [VALUE] int)
declare @Holidays table([DAYS] date, Note varchar(100))

insert into @t values
('2018-01-01', 10),
('2018-01-02', 20),
('2018-01-03', 30),
('2018-01-04', 40),
('2018-01-05', 50),
('2018-01-06', 60),
('2018-01-07', 70),
('2018-01-08', 80),
('2018-01-09', 90),
('2018-01-10', 100)

insert into @Holidays values
('2018-01-01', 'Holiday-1'),
('2018-01-06', 'Saturday'),
('2018-01-07', 'Sunday'),
('2018-01-10', 'Holiday-2')

;with cte as (
    select
        IIF(h1.[DAYS] is not null /* i.e. it is a holiday */,
            (select max([DAYS])
             from @t t2
             where t2.[DAYS] < t1.[DAYS] and not exists(select * from @Holidays h2 where h2.[DAYS] = t2.[DAYS])),
            t1.[DAYS]) as RealDate
        , t1.[VALUE]
    from @t t1
    left join @Holidays h1 on t1.[DAYS] = h1.[DAYS]
)
select
    RealDate
    , sum([VALUE]) as RealValue
from cte
where RealDate is not null
group by RealDate
You can do this with cumulative sums (to define the groups) and aggregation. Define each group as the number of non-holidays on or before a given day; a non-holiday and the holidays that follow it share the same group value.
Then aggregate:
select max(days) as days, sum(value)
from (select t.*,
sum(case when holiday is null then 1 else 0 end) over (order by days asc) as grp
from t
) t
group by grp;
EDIT:
With a separate holidays table, you just need to add the join:
select max(days) as days, sum(value)
from (select t.*,
sum(case when h.holiday is null then 1 else 0 end) over (order by t.days asc) as grp
from t left join
holidays h
on t.days = h.date
) t
group by grp;

Counting rows between dates using row number?

I am trying to find the number of rows that 2 dates fall between. Basically I have an auth dated 1/1/2018 - 4/1/2018 and I need the count of pay periods those dates fall within.
Here is the data I am looking at:
create table #dates
(
pp_start_date date,
pp_end_date date
)
insert into #dates (pp_start_date,pp_end_date)
values ('2017-12-28', '2018-01-10'),
('2018-01-11', '2018-01-24'),
('2018-01-25', '2018-02-07'),
('2018-02-08', '2018-02-21'),
('2018-02-22', '2018-03-07'),
('2018-03-08', '2018-03-21'),
('2018-03-22', '2018-04-04'),
('2018-04-05', '2018-04-18');
When I run this query,
SELECT
ad.pp_start_date, ad.pp_end_date, orderby
FROM
(SELECT
ROW_NUMBER() OVER (ORDER BY pp_start_date) AS orderby, *
FROM
#dates) ad
WHERE
'2018-01-01' <= ad.pp_end_date
I somehow want to only get 7 rows. Is this even possible? Thanks in advance for any help!
EDIT - OK, so using a count(*) worked to get the number of rows, but now I am trying to get the number of rows for 2 dynamic dates from another temp table, and I don't see a way to relate the data.
Using the #dates temp table referenced above gives me the date data. Now using this data:
create table #stuff
([month] date,
[name] varchar(20),
units int,
fips_code int,
auth_datefrom date,
auth_dateto date)
insert into #stuff (month,name,units,fips_code,auth_datefrom,auth_dateto)
values ('2018-01-01','SMITH','50','760', '2018-01-01', '2018-04-01');
insert into #stuff (month,name,units,fips_code,auth_datefrom,auth_dateto)
values ('2018-01-01','JONES','46','193', '2018-01-01', '2018-04-01');
insert into #stuff (month,name,units,fips_code,auth_datefrom,auth_dateto)
values ('2018-01-01','DAVID','84','109', '2018-02-01', '2018-04-01');
I want to somehow create a statement that counts rows from the #dates table where the auth dates from the #stuff table apply; I just can't figure out how to relate or join them:
pp_start_date <= auth_dateto and pp_end_date >= auth_datefrom
Here is my output for #dates
pp_start_date pp_end_date
2017-12-28 2018-01-10
2018-01-11 2018-01-24
2018-01-25 2018-02-07
2018-02-08 2018-02-21
2018-02-22 2018-03-07
2018-03-08 2018-03-21
2018-03-22 2018-04-04
2018-04-05 2018-04-18
Here is my output for #stuff
month name units fips_code auth_datefrom auth_dateto
2018-01-01 SMITH 50 760 2018-01-01 2018-04-01
2018-01-01 JONES 46 193 2018-01-01 2018-04-01
2018-01-01 DAVID 84 109 2018-02-01 2018-04-01
I am trying to use the auth_datefrom and auth_dateto from #stuff to find out how many rows that is from #dates.
try this one.
SELECT ad.pp_start_date, ad.pp_end_date, orderby
from (select
row_number()over ( order by pp_start_date) as orderby, * from
#dates) ad
where ad.pp_end_date <= '2018-01-01'
or ad.pp_start_date >= '2018-01-01'
Are you looking for this?
select d.*
from #dates d
where d.pp_start_date <= '2018-04-01' and
      d.pp_end_date >= '2018-01-01';
This returns all rows whose date range overlaps the time period you specify.
I'm not sure what the row_number() does. If you want the count, then:
select count(*)
from #dates d
where d.pp_start_date <= '2018-04-01' and
      d.pp_end_date >= '2018-01-01';
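
For the edited part of the question (counting pay periods per #stuff row), one way to relate the two temp tables is to use the overlap predicate quoted in the question as the join condition. This is only a sketch, not part of either answer above:

-- Counts, for each row in #stuff, the #dates rows whose period overlaps
-- the auth_datefrom..auth_dateto range (7 for SMITH with the sample data).
SELECT s.[name],
       s.auth_datefrom,
       s.auth_dateto,
       COUNT(d.pp_start_date) AS pay_period_count
FROM #stuff s
LEFT JOIN #dates d
       ON d.pp_start_date <= s.auth_dateto
      AND d.pp_end_date >= s.auth_datefrom
GROUP BY s.[name], s.auth_datefrom, s.auth_dateto;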

select a chain of records using mssql

I have a table with records in TimeLines, and I need to get the rows that form a 45-minute chain.
1|2016-01-01 00:00
2|2016-01-01 00:30
3|2016-01-01 00:45
4|2016-01-01 01:00
How can I find the 2nd row based on its time, given that the 2nd, 3rd and 4th rows form an unbroken 15-minute chain that makes up the 45-minute set?
The 1st and 2nd rows don't qualify, because the interval between them is 30 minutes.
The 2nd, 3rd and 4th rows form a consistent chain:
2nd row plus 15 minutes is okay, because the 3rd row exists at that time.
3rd row plus 15 minutes is okay, because the 4th row exists at that time.
As a result, I have a consistent 45-minute chain.
1st row plus 15 minutes is not okay, because no row exists at 00:15 on that date.
Try this
DECLARE #Tbl TABLE (Id INT, StartDate DATETIME)
INSERT INTO #Tbl
VALUES
(1,'2016-01-01 00:00'),
(2,'2016-01-01 00:30'),
(3,'2016-01-01 00:45'),
(4,'2016-01-01 01:00')
;WITH CTE
AS
(
SELECT
Id ,
StartDate,
ROW_NUMBER() OVER (ORDER BY Id) AS RowId
FROM
#Tbl
)
SELECT
CurRow.*,
CASE
WHEN
DATEDIFF(MINUTE, CurRow.StartDate, NextRow.StartDate ) = 15 OR
DATEDIFF(MINUTE, PrevRow.StartDate, CurRow.StartDate ) = 15
THEN '15 MIN'
ELSE 'NO' END Flag
FROM
CTE CurRow LEFT JOIN
(SELECT *, C.RowId - 1 AS TmpRowId FROM CTE C) NextRow ON CurRow.RowId = NextRow.TmpRowId LEFT JOIN
(SELECT *, C.RowId + 1 AS TmpRowId FROM CTE C) PrevRow ON CurRow.RowId = PrevRow.TmpRowId
OUTPUT:
Id  StartDate                RowId  Flag
1   2016-01-01 00:00:00.000  1      NO
2   2016-01-01 00:30:00.000  2      15 MIN
3   2016-01-01 00:45:00.000  3      15 MIN
4   2016-01-01 01:00:00.000  4      15 MIN
If I understand you correctly, you can use LEAD/LAG:
WITH Src AS
(
SELECT * FROM (VALUES
(1,'2016-01-01 00:00'),
(2,'2016-01-01 00:30'),
(3,'2016-01-01 00:45'),
(4,'2016-01-01 01:00')) T(ID, [Date])
)
SELECT *, CASE WHEN LEAD([Date]) OVER (ORDER BY ID)=DATEADD(MINUTE, 15, [Date])
OR LAG([Date]) OVER (ORDER BY ID)=DATEADD(MINUTE, -15, [Date])
THEN 'Chained' END [Status]
FROM Src
It produces:
ID Date Status
-- ---- ------
1 2016-01-01 00:00 NULL
2 2016-01-01 00:30 Chained
3 2016-01-01 00:45 Chained
4 2016-01-01 01:00 Chained
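
If only the chained rows are wanted, the window expressions above have to be wrapped in a derived table before filtering, since window functions can't be referenced in a WHERE clause. A small sketch on top of the same query:

WITH Src AS
(
    SELECT * FROM (VALUES
    (1,'2016-01-01 00:00'),
    (2,'2016-01-01 00:30'),
    (3,'2016-01-01 00:45'),
    (4,'2016-01-01 01:00')) T(ID, [Date])
)
SELECT ID, [Date]
FROM (
    SELECT *, CASE WHEN LEAD([Date]) OVER (ORDER BY ID) = DATEADD(MINUTE, 15, [Date])
                     OR LAG([Date]) OVER (ORDER BY ID) = DATEADD(MINUTE, -15, [Date])
                   THEN 'Chained' END AS [Status]
    FROM Src
) chained
WHERE [Status] = 'Chained';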
You can do this with OUTER APPLY and tricky ROW_NUMBER():
;WITH TimeLines AS ( --This CTE is similar to your table
SELECT *
FROM (VALUES
(1, '2016-01-01 00:00'),(2, '2016-01-01 00:30'),
(3, '2016-01-01 00:45'),(4, '2016-01-01 01:00'),
(5, '2016-01-01 01:05'),(6, '2016-01-01 01:07'),
(7, '2016-01-01 01:15'),(8, '2016-01-01 01:30'),
(9, '2016-01-01 01:45'),(10, '2016-01-01 02:00')
) as t(id, datum)
)
, cte AS (
SELECT t.id,
t.datum,
CASE WHEN ISNULL(DATEDIFF(MINUTE,t1.datum,t.datum),0) != 15 THEN DATEDIFF(MINUTE,t.datum,t2.datum) ELSE 15 END as i
FROM TimeLines t --in this cte with the help of
OUTER APPLY ( --OUTER APPLY we are getting next and previous dates to compare them
SELECT TOP 1 *
FROM TimeLines
WHERE t.datum > datum
ORDER BY datum desc) t1
OUTER APPLY (
SELECT TOP 1 *
FROM TimeLines
WHERE t.datum < datum
ORDER BY datum asc) t2
)
SELECT *, --this is final select to get rows you need with chaines
(ROW_NUMBER() OVER (ORDER BY (SELECT 1))+2)/3 as seq
FROM cte
WHERE i = 15
Output:
id datum i seq
2 2016-01-01 00:30 15 1
3 2016-01-01 00:45 15 1
4 2016-01-01 01:00 15 1
7 2016-01-01 01:15 15 2
8 2016-01-01 01:30 15 2
9 2016-01-01 01:45 15 2
10 2016-01-01 02:00 15 3