How to select rows based on a rolling 30 day window SQL

My question involves how to identify an index discharge.
The index discharge is the earliest discharge. On that date, the 30 day window starts. Any admissions during that time period are considered readmissions, and they should be ignored. Once the 30 day window is over, then any subsequent discharge is considered an index and the 30 day window begins again.
I can't seem to work out the logic for this. I've tried different windowing functions, I've tried cross joins and cross applies. The issue I keep encountering is that a readmission cannot be an index admission. It must be excluded.
I have successfully written a while loop to solve this problem, but I'd really like to get this in a set based format, if it's possible. I haven't been successful so far.
The ultimate goal is this:
id  AdmitDate                DischargeDate            MedicalRecordNumber  IndexYN
----------------------------------------------------------------------------------
1   2021-03-03 00:00:00.000  2021-03-09 13:20:00.000  X0090362             1
4   2021-03-05 00:00:00.000  2021-03-10 16:00:00.000  X0012614             1
6   2021-05-18 00:00:00.000  2021-05-21 22:20:00.000  X0012614             1
7   2021-06-21 00:00:00.000  2021-07-08 13:30:00.000  X0012614             1
8   2021-02-03 00:00:00.000  2021-02-09 17:00:00.000  X0019655             1
10  2021-03-23 00:00:00.000  2021-03-26 16:40:00.000  X0019655             1
11  2021-03-15 00:00:00.000  2021-03-18 15:53:00.000  X4135958             1
13  2021-05-17 00:00:00.000  2021-05-23 14:55:00.000  X4135958             1
15  2021-06-24 00:00:00.000  2021-07-13 15:06:00.000  X4135958             1
Sample code is below.
CREATE TABLE #Admissions
(
[id] INT,
[AdmitDate] DATETIME,
[DischargeDateTime] DATETIME,
[UnitNumber] VARCHAR(20),
[IndexYN] INT
)
INSERT INTO #Admissions
VALUES( 1 ,'2021-03-03' ,'2021-03-09 13:20:00.000' ,'X0090362', NULL)
,(2 ,'2021-03-27' ,'2021-03-30 19:59:00.000' ,'X0090362', NULL)
,(3 ,'2021-03-31' ,'2021-04-04 05:57:00.000' ,'X0090362', NULL)
,(4 ,'2021-03-05' ,'2021-03-10 16:00:00.000' ,'X0012614', NULL)
,(5 ,'2021-03-28' ,'2021-04-16 13:55:00.000' ,'X0012614', NULL)
,(6 ,'2021-05-18' ,'2021-05-21 22:20:00.000' ,'X0012614', NULL)
,(7 ,'2021-06-21' ,'2021-07-08 13:30:00.000' ,'X0012614', NULL)
,(8 ,'2021-02-03' ,'2021-02-09 17:00:00.000' ,'X0019655', NULL)
,(9 ,'2021-02-17' ,'2021-02-22 17:25:00.000' ,'X0019655', NULL)
,(10 ,'2021-03-23' ,'2021-03-26 16:40:00.000' ,'X0019655', NULL)
,(11 ,'2021-03-15' ,'2021-03-18 15:53:00.000' ,'X4135958', NULL)
,(12 ,'2021-04-08' ,'2021-04-13 19:42:00.000' ,'X4135958', NULL)
,(13 ,'2021-05-17' ,'2021-05-23 14:55:00.000' ,'X4135958', NULL)
,(14 ,'2021-06-09' ,'2021-06-14 12:45:00.000' ,'X4135958', NULL)
,(15 ,'2021-06-24' ,'2021-07-13 15:06:00.000' ,'X4135958', NULL)

You can use a recursive CTE to identify all rows associated with each "index" discharge:
with a as (
      -- number each patient's stays in discharge order
      select a.*, row_number() over (partition by unitnumber order by dischargedatetime) as seqnum
      from admissions a
     ),
     cte as (
      -- anchor: each patient's earliest discharge is an index discharge
      select id, admitdate, dischargedatetime, unitnumber, seqnum, dischargedatetime as index_dischargedatetime
      from a
      where seqnum = 1
      union all
      -- step: a stay admitted more than 30 days after the current index discharge starts a new window
      select a.id, a.admitdate, a.dischargedatetime, a.unitnumber, a.seqnum,
             (case when a.admitdate > dateadd(day, 30, cte.index_dischargedatetime)
                   then a.dischargedatetime else cte.index_dischargedatetime
              end) as index_dischargedatetime
      from cte join
           a
           on a.unitnumber = cte.unitnumber and a.seqnum = cte.seqnum + 1
     )
select *
from cte;
You can then incorporate this into an update:
update admissions
set indexyn = (case when admissions.dischargedatetime = cte.index_dischargedatetime then 'Y' else 'N' end)
from cte
where cte.id = admissions.id;
Here is a db<>fiddle. Note that I changed the type of IndexYN to a character to assign 'Y'/'N', which makes sense given the column name.
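To cross-check a set-based query, the rule from the question can also be written as a plain loop. The following is a minimal Python sketch, not part of the original answer. The function name `flag_index_stays` and the tuple layout are made up for illustration. It assumes stays are evaluated per patient (UnitNumber) and that a readmission is any stay whose admit date falls within 30 days of the current index discharge. Only the sample rows for two patients are included.

```python
from datetime import datetime, timedelta


def flag_index_stays(stays, window_days=30):
    """Return the sorted ids of index stays.

    stays: iterable of (id, admit, discharge, patient) tuples.
    Assumption: a stay is a readmission when its admit date is on or
    before the end of the patient's current 30-day window.
    """
    index_ids = []
    window_end = {}  # patient -> end of that patient's current window
    # Walk each patient's stays in discharge order.
    for stay_id, admit, discharge, patient in sorted(stays, key=lambda s: (s[3], s[2])):
        end = window_end.get(patient)
        if end is None or admit > end:
            # No open window: this stay is an index and opens a new window.
            index_ids.append(stay_id)
            window_end[patient] = discharge + timedelta(days=window_days)
    return sorted(index_ids)


# Rows 1-7 of the sample data (patients X0090362 and X0012614).
stays = [
    (1, datetime(2021, 3, 3), datetime(2021, 3, 9, 13, 20), "X0090362"),
    (2, datetime(2021, 3, 27), datetime(2021, 3, 30, 19, 59), "X0090362"),
    (3, datetime(2021, 3, 31), datetime(2021, 4, 4, 5, 57), "X0090362"),
    (4, datetime(2021, 3, 5), datetime(2021, 3, 10, 16, 0), "X0012614"),
    (5, datetime(2021, 3, 28), datetime(2021, 4, 16, 13, 55), "X0012614"),
    (6, datetime(2021, 5, 18), datetime(2021, 5, 21, 22, 20), "X0012614"),
    (7, datetime(2021, 6, 21), datetime(2021, 7, 8, 13, 30), "X0012614"),
]

index_ids = flag_index_stays(stays)
print(index_ids)
```

For these rows the loop flags ids 1, 4, 6 and 7, which agrees with the desired output listed in the question.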

Related

SSMS 2018 - Find Gaps in Dates and Flag the Gaps

I have reviewed many posts about how to find gaps in dates and believe that I am close to figuring it out but need just a little extra help. Per my query I am pulling distinct days with a record count for each distinct day. I have added a "Gap_Days" column which should return a zero if no gap from previous date OR the number of days since the previous date. As you can see all of my Gap_Days are zero when in fact I am missing 10/24 and 10/25. Therefore on 10/26 there should be a gap of 2 since the previous date is 10/23.
Thanks in advance for pointing out what I am probably looking right at.
SELECT DISTINCT Run_Date, COUNT(Run_Date) AS Daily_Count,
Gap_Days = Coalesce(DateDiff(Day,Lag(Run_Date) Over (partition by Run_Date order by Run_Date DESC), Run_Date)-1,0)
FROM tblUnitsOfWork
WHERE (Run_Date >= '2022-10-01')
GROUP BY Run_Date
ORDER BY Run_Date DESC;
Run_Date Daily_Count Gap_Days
2022-10-29 00:00:00.000 8431 0
2022-10-28 00:00:00.000 8204 0
2022-10-27 00:00:00.000 8705 0
2022-10-26 00:00:00.000 7885 0
2022-10-23 00:00:00.000 7485 0
2022-10-22 00:00:00.000 8699 0
2022-10-21 00:00:00.000 9212 0
2022-10-20 00:00:00.000 9220 0

First let's set up some demo data:
DECLARE @table TABLE (ID INT IDENTITY, date DATE)
DECLARE @dt DATE
WHILE (SELECT COUNT(*) FROM @table) < 30
BEGIN
    SET @dt = DATEADD(DAY,(ROUND(((50 - 1 -1) * RAND() + 1), 0) - 1)-25,CURRENT_TIMESTAMP)
    IF NOT EXISTS (SELECT 1 FROM @table WHERE date = @dt) INSERT INTO @table (date) SELECT @dt
END
ID date
--------
1 2022-11-10
2 2022-11-15
3 2022-10-20
...
28 2022-10-14
29 2022-11-13
30 2022-11-21
This gives us a table variable with 30 random dates in a 50 day window. Now let's look for missing dates:
SELECT *,
       CASE WHEN ROW_NUMBER() OVER (ORDER BY date) > 1
                 AND LAG(date,1) OVER (ORDER BY date) <> DATEADD(DAY,-1,date)
            THEN 'GAP! ' + CAST(DATEDIFF(DAY,LAG(date,1) OVER (ORDER BY date),date)-1 AS NVARCHAR) + ' DAYS MISSING!'
       END AS MissingDatesFlag
FROM @table
ORDER BY date
All we're doing here is skipping the first date (it has no earlier date to compare against) and, from then on, comparing the previous date (found with LAG ordered by date) to the current date. If the previous date is not exactly one day earlier, the CASE expression produces a message saying how many days are missing.
ID date MissingDatesFlag
----------------------------
1 2022-10-08 NULL
4 2022-10-09 NULL
25 2022-10-10 NULL
28 2022-10-11 NULL
22 2022-10-15 GAP! 3 DAYS MISSING!
2 2022-10-18 GAP! 2 DAYS MISSING!
12 2022-10-19 NULL
24 2022-10-20 NULL
....
15 2022-11-18 GAP! 3 DAYS MISSING!
29 2022-11-21 GAP! 2 DAYS MISSING!
20 2022-11-22 NULL
Since the demo data is randomly selected your results may vary, but they should be similar.
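The same comparison can be sketched outside SQL to make the point about the question's query: the previous date has to come from the whole ordered set of dates, not from a partition on the date itself (a partition per Run_Date holds a single row, so LAG always returns NULL). This is a small Python illustration using the dates from the question. `gap_days` is a hypothetical helper name, not something from the original post.

```python
from datetime import date


def gap_days(run_dates):
    """Map each distinct date to the number of missing days before it."""
    ordered = sorted(set(run_dates))
    # The first date has nothing before it, so its gap is 0.
    gaps = {ordered[0]: 0} if ordered else {}
    # Pair every date with its predecessor, like LAG over ORDER BY date.
    for prev, cur in zip(ordered, ordered[1:]):
        gaps[cur] = (cur - prev).days - 1
    return gaps


# Dates from the question: 10/24 and 10/25 are missing.
run_dates = [date(2022, 10, d) for d in (20, 21, 22, 23, 26, 27, 28, 29)]
gaps = gap_days(run_dates)
for day in sorted(gaps, reverse=True):
    print(day, gaps[day])
```

With these dates, 2022-10-26 gets a gap of 2 and every other date gets 0, which is the result the question asks for.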

Summing Records within a Moving Date Range, Date Distances

I have a complex calculation requirement for a user logging system. I need to locate the most frequently active users based on their number of logins within a 180 day window. Once two login dates are 181 days apart, they do not count towards a total but could count towards a total when grouped with other dates.
For example here is Jim's login history:
Jim 2018-01-01
Jim 2018-04-01
Jim 2018-05-01
Jim 2018-06-01
Jim 2018-07-01
Jim 2018-08-01
Jim 2018-09-01
Jim 2018-12-01
Using 6 months, instead of 180 days, for simplicity, and only looking 6 months in one direction, Jim had the following totals:
Logins: 5 (2018-01-01 + 6 months)
Logins: 6 (2018-04-01 + 6 months)
Logins: 5 (2018-05-01 + 6 months)
Logins: 5 (2018-06-01 + 6 months)
Logins: 4 (2018-07-01 + 6 months)
Logins: 3 (2018-08-01 + 6 months)
Logins: 2 (2018-09-01 + 6 months)
Logins: 1 (2018-12-01 + 6 months)
So my system would report back 6 because it only wants the maximum total.
Other than brute force calculation, I'm lost on how to construct this system. Yes I can denormalize data to any degree, speed is most important.

Try this:
declare @tbl table(name char(3), dt date);
insert into @tbl values
('Jim', '2018-01-01'),
('Jim', '2018-04-01'),
('Jim', '2018-05-01'),
('Jim', '2018-06-01'),
('Jim', '2018-07-01'),
('Jim', '2018-08-01'),
('Jim', '2018-09-01'),
('Jim', '2018-12-01');
;with cte as (
    -- BETWEEN below is inclusive at both ends; use 180 here if logins exactly 181 days apart must not be counted together
    select name, dt, DATEADD(day, 181, dt) upperDt from @tbl
), cte2 as (
    select name,
           (select COUNT(*) from cte where dt between c.dt and c.upperDt and name = c.name) cnt
    from cte c
)
select name, MAX(cnt) [max]
from cte2
group by name

Try this, using a common table expression to calculate the end date of each window and CROSS APPLY to calculate the total number of logins:
DECLARE @t TABLE (UserName NVARCHAR(10), LoginDate DATETIME)
INSERT INTO @t
(UserName,LoginDate) VALUES
('Jim','2018-01-01'),
('Jim','2018-04-01'),
('Jim','2018-05-01'),
('Jim','2018-06-01'),
('Jim','2018-07-01'),
('Jim','2018-08-01'),
('Jim','2018-09-01'),
('Jim','2018-12-01')
; WITH CteDateRange
AS(
    SELECT
        T.UserName
        ,T.LoginDate
        --,EndDateRange = DATEADD(DAY, 181, LoginDate)
        ,EndDateRange = DATEADD(MONTH, 6, LoginDate)
    FROM @t T
)
SELECT
    DR.UserName
    ,DR.LoginDate
    ,DR.EndDateRange
    ,T.Total
FROM CteDateRange DR
CROSS APPLY ( SELECT Total = COUNT(D.LoginDate)
              FROM CteDateRange D
              WHERE D.LoginDate >= DR.LoginDate
                AND D.LoginDate <= DR.EndDateRange
                AND D.UserName = DR.UserName
            ) T
Output
UserName LoginDate EndDateRange Total
Jim 2018-01-01 00:00:00.000 2018-07-01 00:00:00.000 5
Jim 2018-04-01 00:00:00.000 2018-10-01 00:00:00.000 6
Jim 2018-05-01 00:00:00.000 2018-11-01 00:00:00.000 5
Jim 2018-06-01 00:00:00.000 2018-12-01 00:00:00.000 5
Jim 2018-07-01 00:00:00.000 2019-01-01 00:00:00.000 4
Jim 2018-08-01 00:00:00.000 2019-02-01 00:00:00.000 3
Jim 2018-09-01 00:00:00.000 2019-03-01 00:00:00.000 2
Jim 2018-12-01 00:00:00.000 2019-06-01 00:00:00.000 1

One basic solution uses a join:
select l.*
from (select l.name, l.date, count(*) as cnt,
             row_number() over (partition by l.name order by count(*) desc) as seqnum
      from logins l join
           logins l2
           on l.name = l2.name and
              l2.date >= l.date and l2.date < dateadd(day, 181, l.date)
      group by l.name, l.date  -- one count per window start date
     ) l
where seqnum = 1;
This might have acceptable performance with an index on logins(name, date).
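Since the question says speed matters most, it may help to note that the maximum can be found in a single pass over each user's sorted login dates with a sliding window, instead of counting a window per start date. The following is a minimal Python sketch of that idea, assuming a window is valid while the first and last login are at most 180 days apart. `max_logins_in_window` is a hypothetical name, not taken from any of the answers.

```python
from datetime import date


def max_logins_in_window(login_dates, window_days=180):
    """Largest number of logins whose dates span at most window_days."""
    dates = sorted(login_dates)
    best = 0
    left = 0
    for right, current in enumerate(dates):
        # Shrink from the left until the window fits again.
        while (current - dates[left]).days > window_days:
            left += 1
        best = max(best, right - left + 1)
    return best


# Jim's login history from the question.
jim = [date(2018, m, 1) for m in (1, 4, 5, 6, 7, 8, 9, 12)]
jim_max = max_logins_in_window(jim)
print(jim_max)
```

For Jim this reports 6 (the logins from 2018-04-01 through 2018-09-01), the same maximum the question expects.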

SQL Dates Selection

I have an OPL_Dates table with start dates and end dates as below:
dbo.OPL_Dates
ID Start_date End_date
--------------------------------------
12345 1975-01-01 2001-12-31
12345 1989-01-01 2004-12-31
12345 2005-01-01 NULL
12345 2007-01-01 NULL
12377 2009-06-01 2009-12-31
12377 2013-02-07 NULL
12377 2010-01-01 2012-01-01
12489 2011-12-31 NULL
12489 2012-03-01 2012-04-01
The Output I am looking for is:
ID Start_date End_date
-------------------------------------
12345 1975-01-01 2004-12-31
12345 2005-01-01 NULL
12377 2009-06-01 2009-12-31
12377 2010-01-01 2012-01-01
12377 2013-02-07 NULL
12489 2011-12-31 NULL
Basically, I want to show the gaps between the OPL periods (if any); otherwise I need the minimum Start Date and the maximum End Date for a particular ID. NULL means an open-ended date, which can be converted to "9999-12-31".

The following pretty much does what you want:
with p as (
select v.*, sum(inc) over (partition by v.id order by v.dte) as running_inc
from t cross apply
(values (id, start_date, 1),
(id, coalesce(end_date, '2999-12-31'), -1)
) v(id, dte, inc)
)
select id, min(dte), max(dte)
from (select p.*, sum(case when running_inc = 0 then 1 else 0 end) over (partition by id order by dte desc) as grp
from p
) p
group by id, grp;
Note that it changes the "infinite" end date from NULL to 2999-12-31. This is a convenience, because NULL orders first in SQL Server ascending sorts.
Here is a SQL Fiddle.
What is this doing? It is unpivoting the dates into a single column, with a 1/-1 flag (inc) indicating whether the record is a start or end. The running sum of this flag then indicates the groups that should be combined. When the running sum is 0, then a group has ended. To include the end date in the right group, a reverse running sum is needed -- but that's a detail.
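The +1/-1 idea described above can be sketched procedurally. This is a minimal Python illustration for a single ID, not the answer's code: `merge_periods` is a hypothetical name, and the sketch assumes that a start and an end on the same date keep a group open (starts are processed before ends on ties). It reuses the 2999-12-31 sentinel and turns it back into None at the end.

```python
from datetime import date

OPEN_END = date(2999, 12, 31)  # same stand-in for NULL as the SQL above


def merge_periods(periods):
    """Merge overlapping (start, end) periods; end=None means open-ended."""
    events = []
    for start, end in periods:
        events.append((start, 1))              # a period opens
        events.append((end or OPEN_END, -1))   # a period closes
    # Sort by date; on ties handle starts (+1) before ends (-1).
    events.sort(key=lambda e: (e[0], -e[1]))
    merged = []
    running = 0
    group_start = None
    for day, inc in events:
        if running == 0:
            group_start = day  # nothing open: a new group begins here
        running += inc
        if running == 0:
            # Everything closed: the group ends on this date.
            merged.append((group_start, None if day == OPEN_END else day))
    return merged


# Rows for ID 12345 from the question.
periods_12345 = [
    (date(1975, 1, 1), date(2001, 12, 31)),
    (date(1989, 1, 1), date(2004, 12, 31)),
    (date(2005, 1, 1), None),
    (date(2007, 1, 1), None),
]
merged_12345 = merge_periods(periods_12345)
print(merged_12345)
```

For ID 12345 this yields 1975-01-01 to 2004-12-31 and 2005-01-01 to open-ended, matching the desired output in the question.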

SQL Split Island On Criteria

I have a SQL table with From and To dates like so:
Row From To
--------------------------------------------------
1 2017-10-28 00:00:00 2017-10-30 00:00:00
2 2017-10-30 00:00:00 2017-10-31 00:00:00
3 2017-10-31 00:00:00 2017-10-31 07:30:00
4 2017-10-31 14:41:00 2017-10-31 15:14:00
5 2017-10-31 17:13:00 2017-11-01 00:00:00
6 2017-11-01 00:00:00 2017-11-01 23:45:00
7 2017-11-02 03:13:00 2017-11-02 07:56:00
I need to group consecutive data into islands. The data is non-overlapping. This is done easily enough using this query:
;with Islands as
(
SELECT
min([From]) as [From]
,max([To]) as [To]
FROM
(
select
[From],
[To],
sum(startGroup) over (order by [From]) StartGroup
from
(
SELECT
[From],
[To],
(case when [From] <= lag([To]) over (order by [From])
then 0
else 1
end) as StartGroup
FROM dbo.DateTable
) IsNewIsland
) GroupedIsland
group by StartGroup
)
select *
from Islands
And gives me these results:
From To Rows
-----------------------------------------------------
2017-10-28 00:00:00 2017-10-31 07:30:00 1-3
2017-10-31 14:41:00 2017-10-31 15:14:00 4
2017-10-31 17:13:00 2017-11-01 23:45:00 5-6
2017-11-02 03:13:00 2017-11-02 07:56:00 7
The problem I have is that I need to modify the query to cap/split the islands once they have gotten enough records to be a certain total duration. This is an input/hardcoded value. The split includes the entire record, not splitting in the middle of a record's From-To range. As an example, I need to split islands to be 27 hours. This would give this result:
From To Rows
-----------------------------------------------------
2017-10-29 00:00:00 2017-10-30 00:00:00 1
2017-10-30 00:00:00 2017-10-31 07:30:00 2-3
2017-10-31 17:13:00 2017-11-01 23:45:00 5-6
The first island was split because rows 1 and 2 alone created a 27 hour period. Rows 4 and 7 are not enough to create an island, so they are ignored.
I tried pulling this information via a lag function in the inner select to compute the "rolling duration" across rows, but it would not work on islands that spanned more than 2 rows because it would only track the last row's duration and I could not "carry" the calculation forward.
SELECT
[From],
[To],
(case when [From] <= lag([To]) over (order by [From])
then (datediff(minute, [From], [To]) + lag(datediff(minute, [From], [To])) over (order by [From]))
else datediff(minute, [From], [To])
end) as RollingDuration,
(case when [From] <= lag([To]) over (order by [From])
then 0
else 1
end) as StartGroup
FROM dbo.DateTable

The "least worst" way I can think of doing it is a "quirky update". (Google it, I honestly didn't make it up.)
http://www.sqlservercentral.com/articles/T-SQL/68467/
Copy the data into a new table with one or more additional (blank) fields.
Use a CLUSTERED PRIMARY KEY to ensure the rows are updated in the correct sequence.
Use UPDATE and user variables to iterate through the rows and store the results of calculations.
Using that, I can start a new group if there is a gap or if a running total reaches 27 hours. Then proceed as usual.
-- New table to work through
----------------------------------------------------------------------
-- Additional [group_start] field (identifies groups, and useful data)
-- PRIMARY KEY CLUSTERED to enforce the order rows will be processed
----------------------------------------------------------------------
CREATE TABLE sample (
id INT,
start DATETIME,
cease DATETIME,
group_start DATETIME DEFAULT(0),
PRIMARY KEY CLUSTERED (group_start, start) -- To force the order we will iterate the rows, and is useful in last step
);
INSERT INTO
sample (
id,
start,
cease
)
VALUES
(1, '2017-10-28 00:00:00', '2017-10-30 00:00:00'),
(2, '2017-10-30 00:00:00', '2017-10-31 00:00:00'),
(3, '2017-10-31 00:00:00', '2017-10-31 07:30:00'),
(4, '2017-10-31 14:41:00', '2017-10-31 15:14:00'),
(5, '2017-10-31 17:13:00', '2017-11-01 00:00:00'),
(6, '2017-11-01 00:00:00', '2017-11-01 23:45:00'),
(7, '2017-11-02 03:13:00', '2017-11-02 07:56:00')
;
-- Quirky Update
----------------------------------------------------------------------
-- Update [group_start] to the start of the current group
-- -> new group if gap since previous row
-- -> new group if previous row took group to 27 hours
-- -> else same group as previous row
----------------------------------------------------------------------
DECLARE @grp_start DATETIME = 0;
WITH
lagged AS
(
    SELECT *, LAG(cease) OVER (ORDER BY group_start, start) AS lag_cease FROM sample
)
UPDATE
    lagged
SET
    @grp_start
        = group_start
        = CASE WHEN start <> lag_cease THEN start
               WHEN start >= DATEADD(hour, 27, @grp_start) THEN start
               ELSE @grp_start END
OPTION
    (MAXDOP 1)
;
-- Standard SQL to apply other logic
----------------------------------------------------------------------
-- MAX() OVER () to find end time of each group
-- WHERE to filter out any groups under 12 hours long
----------------------------------------------------------------------
SELECT
*
FROM
(
SELECT
*,
MAX(cease) OVER (PARTITION BY group_start) AS group_cease
FROM
sample
)
bounded_groups
WHERE
group_cease >= DATEADD(hour, 12, group_start)
;
http://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=1bec5b3fe920c1affd58f23a11e280a0
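Because the quirky update relies on rows being processed in order, it can be useful to have a plain loop to compare its results against. The following is a minimal Python sketch of the same three rules (new group after a gap, new group once the current group started 27 or more hours ago, otherwise same group), followed by the 12-hour filter used in the last query. `split_islands` and the dictionary keys are hypothetical names chosen for this illustration.

```python
from datetime import datetime, timedelta


def split_islands(rows, cap_hours=27, min_hours=12):
    """Group (id, start, end) rows into capped islands, dropping short ones."""
    groups = []
    prev_end = None
    for row_id, start, end in sorted(rows, key=lambda r: r[1]):
        new_group = (
            prev_end is None                      # very first row
            or start != prev_end                  # gap since previous row
            or start >= groups[-1]["start"] + timedelta(hours=cap_hours)
        )
        if new_group:
            groups.append({"start": start, "end": end, "ids": [row_id]})
        else:
            groups[-1]["end"] = end
            groups[-1]["ids"].append(row_id)
        prev_end = end
    keep = timedelta(hours=min_hours)
    return [g for g in groups if g["end"] - g["start"] >= keep]


rows = [
    (1, datetime(2017, 10, 28, 0, 0), datetime(2017, 10, 30, 0, 0)),
    (2, datetime(2017, 10, 30, 0, 0), datetime(2017, 10, 31, 0, 0)),
    (3, datetime(2017, 10, 31, 0, 0), datetime(2017, 10, 31, 7, 30)),
    (4, datetime(2017, 10, 31, 14, 41), datetime(2017, 10, 31, 15, 14)),
    (5, datetime(2017, 10, 31, 17, 13), datetime(2017, 11, 1, 0, 0)),
    (6, datetime(2017, 11, 1, 0, 0), datetime(2017, 11, 1, 23, 45)),
    (7, datetime(2017, 11, 2, 3, 13), datetime(2017, 11, 2, 7, 56)),
]

islands = split_islands(rows)
island_ids = [g["ids"] for g in islands]
print(island_ids)
```

On the sample rows this groups row 1 alone, rows 2-3 together and rows 5-6 together, and drops rows 4 and 7 as too short.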

select a chain of records using mssql

I have a table of records in TimeLines, and I need to get the rows that form an unbroken chain making up a 45-minute set.
1|2016-01-01 00:00
2|2016-01-01 00:30
3|2016-01-01 00:45
4|2016-01-01 01:00
How can I find the 2nd row based on its time? The 2nd, 3rd and 4th rows form an unbroken chain of 15-minute steps that makes up a 45-minute set.
The 1st and 2nd rows do not qualify, because the interval between them is 30 minutes.
The 2nd, 3rd and 4th rows form a consistent chain.
2nd row plus 15 minutes: OK, because the 3rd row exists with that time.
3rd row plus 15 minutes: OK, because the 4th row exists with that time.
As a result, I have a consistent 45-minute chain.
1st row plus 15 minutes: not OK, because no row exists at 00:15 on that date.

Try this
DECLARE @Tbl TABLE (Id INT, StartDate DATETIME)
INSERT INTO @Tbl
VALUES
(1,'2016-01-01 00:00'),
(2,'2016-01-01 00:30'),
(3,'2016-01-01 00:45'),
(4,'2016-01-01 01:00')
;WITH CTE
AS
(
    SELECT
        Id ,
        StartDate,
        ROW_NUMBER() OVER (ORDER BY Id) AS RowId
    FROM
        @Tbl
)
SELECT
    CurRow.*,
    CASE
        WHEN
            DATEDIFF(MINUTE, CurRow.StartDate, NextRow.StartDate ) = 15 OR
            DATEDIFF(MINUTE, PrevRow.StartDate, CurRow.StartDate ) = 15
        THEN '15 MIN'
        ELSE 'NO' END Flag
FROM
    CTE CurRow LEFT JOIN
    (SELECT *, C.RowId - 1 AS TmpRowId FROM CTE C) NextRow ON CurRow.RowId = NextRow.TmpRowId LEFT JOIN
    (SELECT *, C.RowId + 1 AS TmpRowId FROM CTE C) PrevRow ON CurRow.RowId = PrevRow.TmpRowId
OUTPUT:
Id StartDate RowId Flag
1 2016-01-01 00:00:00.000 1 NO
2 2016-01-01 00:30:00.000 2 15 MIN
3 2016-01-01 00:45:00.000 3 15 MIN
4 2016-01-01 01:00:00.000 4 15 MIN

If I understand you correctly, you can use LEAD/LAG:
WITH Src AS
(
SELECT * FROM (VALUES
(1,'2016-01-01 00:00'),
(2,'2016-01-01 00:30'),
(3,'2016-01-01 00:45'),
(4,'2016-01-01 01:00')) T(ID, [Date])
)
SELECT *, CASE WHEN LEAD([Date]) OVER (ORDER BY ID)=DATEADD(MINUTE, 15, [Date])
OR LAG([Date]) OVER (ORDER BY ID)=DATEADD(MINUTE, -15, [Date])
THEN 'Chained' END [Status]
FROM Src
It produces:
ID Date Status
-- ---- ------
1 2016-01-01 00:00 NULL
2 2016-01-01 00:30 Chained
3 2016-01-01 00:45 Chained
4 2016-01-01 01:00 Chained
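The flag above marks any row with a neighbour exactly 15 minutes away. If whole chains are needed, one option is to walk the rows in time order and collect runs of 15-minute steps. This is a minimal Python sketch under two assumptions that are not stated in the question: a "45 min set" is read as at least three rows, and rows are chained only with their immediate neighbour in time order. `find_chains` is a hypothetical name.

```python
from datetime import datetime, timedelta


def find_chains(rows, step_minutes=15, min_rows=3):
    """Return lists of ids that form unbroken runs of step_minutes steps."""
    step = timedelta(minutes=step_minutes)
    chains, current = [], []
    for row_id, moment in sorted(rows, key=lambda r: r[1]):
        # A step of any other size breaks the current run.
        if current and moment - current[-1][1] != step:
            if len(current) >= min_rows:
                chains.append([i for i, _ in current])
            current = []
        current.append((row_id, moment))
    # Do not forget the run that is still open at the end.
    if len(current) >= min_rows:
        chains.append([i for i, _ in current])
    return chains


rows = [
    (1, datetime(2016, 1, 1, 0, 0)),
    (2, datetime(2016, 1, 1, 0, 30)),
    (3, datetime(2016, 1, 1, 0, 45)),
    (4, datetime(2016, 1, 1, 1, 0)),
]
chains = find_chains(rows)
print(chains)
```

For the four sample rows the only chain found is rows 2, 3 and 4.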

You can do this with OUTER APPLY and tricky ROW_NUMBER():
;WITH TimeLines AS ( --This CTE is similar to your table
SELECT *
FROM (VALUES
(1, '2016-01-01 00:00'),(2, '2016-01-01 00:30'),
(3, '2016-01-01 00:45'),(4, '2016-01-01 01:00'),
(5, '2016-01-01 01:05'),(6, '2016-01-01 01:07'),
(7, '2016-01-01 01:15'),(8, '2016-01-01 01:30'),
(9, '2016-01-01 01:45'),(10, '2016-01-01 02:00')
) as t(id, datum)
)
, cte AS (
SELECT t.id,
t.datum,
CASE WHEN ISNULL(DATEDIFF(MINUTE,t1.datum,t.datum),0) != 15 THEN DATEDIFF(MINUTE,t.datum,t2.datum) ELSE 15 END as i
FROM TimeLines t --in this cte with the help of
OUTER APPLY ( --OUTER APPLY we are getting next and previous dates to compare them
SELECT TOP 1 *
FROM TimeLines
WHERE t.datum > datum
ORDER BY datum desc) t1
OUTER APPLY (
SELECT TOP 1 *
FROM TimeLines
WHERE t.datum < datum
ORDER BY datum asc) t2
)
SELECT *, --this is the final select to get the rows you need, with chains
(ROW_NUMBER() OVER (ORDER BY (SELECT 1))+2)/3 as seq
FROM cte
WHERE i = 15
Output:
id datum i seq
2 2016-01-01 00:30 15 1
3 2016-01-01 00:45 15 1
4 2016-01-01 01:00 15 1
7 2016-01-01 01:15 15 2
8 2016-01-01 01:30 15 2
9 2016-01-01 01:45 15 2
10 2016-01-01 02:00 15 3
