How can I query for overlapping date ranges? - sql

I'm using SQL Server 2008 R2 and trying to create a query that will show whether dates overlap.
I'm trying to calculate the number of days someone is covered under a certain criteria. Here is an example of the table...
CREATE TABLE mytable
(
CARDNBR varchar(10) ,
GPI char(14) ,
GPI_DESCRIPTION_10 varchar(50) ,
RX_DATE datetime ,
DAYS_SUPPLY int ,
END_DT datetime ,
METRIC_QUANTITY float
)
INSERT INTO mytable VALUES ('1234567890','27200040000315','Glyburide','01/30/2013','30','03/01/2013','60')
INSERT INTO mytable VALUES ('1234567890','27200040000315','Glyburide','03/04/2013','30','04/03/2013','60')
INSERT INTO mytable VALUES ('1234567890','27250050007520','Metformin','01/03/2013','30','02/02/2013','120')
INSERT INTO mytable VALUES ('1234567890','27250050007520','Metformin','02/27/2013','30','03/29/2013','120')
I want to be able to count the number of days that a person was covered from the first RX_DATE to the last END_DT, which in this example is 90 days (4/3/13 - 1/3/13).
That part is done, but this is where I'm getting into trouble.
Between row 1 and row 2, there was a 3 day period where there were no drugs being taken. Between rows 3 and 4 there was a 25 day period. However, during that 25 day period, row 1 covered that gap. So the end number I need to show is 3 for the gap between rows 1 and 2.
Any help would be greatly appreciated.
Thanks.

There might be a better approach, but you could create a lookup of days, join to it and select the distinct days that join, that will get you the total count of days covered for all lines:
CREATE TABLE #lkp_Calendar (Dt DATE)
GO
SET NOCOUNT ON
DECLARE @intFlag INT
SET @intFlag = 1
WHILE (@intFlag <=500)
BEGIN
--Loop through this:
INSERT INTO #lkp_Calendar
SELECT DATEADD(day,@intFlag,'20120101')
SET @intFlag = @intFlag + 1
END
GO
--Days Covered
SELECT CARDNBR, COUNT(DISTINCT b.Dt)CT
FROM #mytable a
JOIN #lkp_Calendar b
ON b.Dt BETWEEN a.RX_DATE AND a.END_DT
GROUP BY CARDNBR
--Total Days
SELECT CARDNBR, DATEDIFF(DAY,MIN(RX_DATE),MAX(END_DT))+1 'Total_Days'
FROM #mytable
GROUP BY CARDNBR
--Combined
SELECT covered.CARDNBR, covered.CT 'Days Covered', total.Total_Days 'Total Days', total.Total_Days - covered.CT 'Days Gap'
FROM (SELECT CARDNBR, COUNT(DISTINCT b.Dt)CT
FROM #mytable a
JOIN #lkp_Calendar b
ON b.Dt BETWEEN a.RX_DATE AND a.END_DT
GROUP BY CARDNBR
)covered
JOIN (SELECT CARDNBR, DATEDIFF(DAY,MIN(RX_DATE),MAX(END_DT))+1 'Total_Days'
FROM #mytable
GROUP BY CARDNBR
)total
ON covered.CARDNBR = total.CARDNBR
You said 90 days, but I believe you should have 91. DATEDIFF from Monday to Wednesday is only 2, but that's 3 days covered. You can decide whether coverage begins on the RX date or the day after.
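If the individual uncovered days are wanted rather than just a count, the same calendar lookup can be turned around. This is only a sketch: it assumes the #mytable and #lkp_Calendar temp tables used in the answer above, and the aliases r, c and m and the column names FirstDt, LastDt and GapDate are made up for illustration.

```sql
-- Sketch: list every calendar day inside a card's overall span
-- that is not covered by any prescription row for that card.
SELECT r.CARDNBR, c.Dt AS GapDate
FROM (SELECT CARDNBR,
             MIN(RX_DATE) AS FirstDt,
             MAX(END_DT)  AS LastDt
      FROM #mytable
      GROUP BY CARDNBR) AS r
JOIN #lkp_Calendar AS c
  ON c.Dt BETWEEN r.FirstDt AND r.LastDt
WHERE NOT EXISTS (SELECT 1
                  FROM #mytable AS m
                  WHERE m.CARDNBR = r.CARDNBR
                    AND c.Dt BETWEEN m.RX_DATE AND m.END_DT)
ORDER BY r.CARDNBR, c.Dt;
```

Counting these rows per CARDNBR should agree with the 'Days Gap' column of the combined query, since both treat RX_DATE and END_DT as inclusive.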

Related

Is there a way to aggregate a variable range of dates in SQL using a SET operation

I have a table like this one....
CREATE TABLE AbsentStudents
(
Id int not null primary key identity(1,1),
StudentId int not null,
AbsentDate datetime not null
)
This is a very large table that has 1 row for each student for each day that they were absent.
I have been asked to write a stored procedure that gets student absences by date range. What makes this query tricky is that I have to filter/aggregate by "absence episodes". The number of days that constitutes an "absence episode" is a procedure parameter so it can vary.
So for example, I need to get a list of students who were absent between 1/1/2016 and 1/17/2016, but only if they were absent for more than @Days (2 or 3 or whatever the parameter dictates) days.
I think that alone I could figure out. However, within the date range a student can have more than one "absence episode". So a student might have been absent for 3 days at the beginning of the date range, 2 days in the middle of the date range, and 4 days at the end of the date range, and each of those constitutes a different "absence episode". Assuming that my @Days parameter is 2, that should return 3 rows for that student. And each returned row should calculate how many days the student was absent for that "absence episode".
So I would like my procedure to take 3 parameters (@StartDate datetime, @EndDate datetime, @Days int) and return something like this...
StudentId, InitialAbsentDate, ConsecutiveDaysMissed
And ideally it would do this using a SET operation and avoid cursors. (Although cursors are fine if that is the only option.)
UPDATE (by Shnugo)
A test scenario
DECLARE @AbsentStudents TABLE(
Id int not null primary key identity(1,1),
StudentId int not null,
AbsentDate datetime not null
);
INSERT INTO @AbsentStudents VALUES
--student 1
(1,{d'2016-10-01'}),(1,{d'2016-10-02'}),(1,{d'2016-10-03'}) --three days
,(1,{d'2016-10-05'}) --one day
,(1,{d'2016-10-07'}),(1,{d'2016-10-08'}) --two days
--student 2
,(2,{d'2016-10-01'}),(2,{d'2016-10-02'}),(2,{d'2016-10-03'}),(2,{d'2016-10-04'}) --four days
,(2,{d'2016-10-08'}),(2,{d'2016-10-09'}),(2,{d'2016-10-10'}) --three days
,(2,{d'2016-10-12'}); --one day
DECLARE @startDate DATETIME={d'2016-10-01'};
DECLARE @endDate DATETIME={d'2016-10-31'};
DECLARE @Days INT = 3;
If you just want periods of times when students are absent, you can do this with a difference of row numbers approach.
Now, the following assumes that days are sequential with no gaps and uses the difference of row numbers to get periods of absences:
select StudentId,
min(AbsentDate) as first_absent_date,
max(AbsentDate) as last_absent_date,
count(*) as number_of_days
from (select a.*,
row_number() over (partition by StudentId order by AbsentDate) as seqnum_sa
from AbsentStudents a
) a
group by StudentId,
dateadd(day, - seqnum_sa, AbsentDate);
Notes:
You have additional requirements on minimum days and date ranges. The date range is easily handled with a where clause in the subquery, and the minimum number of days with a having clause on the count.
I suspect you have a hidden requirement on avoiding weekends and holidays. Neither this answer nor the others cover this. Ask another question if this is an issue.
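As a rough sketch of how those two extra requirements could be bolted on, the date range goes into the inner query and the minimum episode length into a HAVING clause. The parameter names come from the question and the values are only examples. It assumes AbsentDate carries no time portion and that there is at most one row per student per day, and it reads "more than @Days" as "at least @Days", which is what the question's own example implies.

```sql
-- Parameters as named in the question; the values are only examples.
DECLARE @StartDate datetime = '20161001';
DECLARE @EndDate   datetime = '20161031';
DECLARE @Days      int      = 3;

SELECT StudentId,
       MIN(AbsentDate) AS InitialAbsentDate,
       COUNT(*)        AS ConsecutiveDaysMissed
FROM (
    -- Number each student's absences in date order within the range.
    SELECT StudentId, AbsentDate,
           ROW_NUMBER() OVER (PARTITION BY StudentId ORDER BY AbsentDate) AS seqnum
    FROM AbsentStudents
    WHERE AbsentDate >= @StartDate AND AbsentDate <= @EndDate
) AS a
-- Consecutive dates minus a consecutive counter give the same anchor date.
GROUP BY StudentId, DATEADD(day, -seqnum, AbsentDate)
HAVING COUNT(*) >= @Days
ORDER BY StudentId, InitialAbsentDate;
```

Run against the rows of the test scenario in the question with @Days = 3, this should return one episode for student 1 (starting 2016-10-01, 3 days) and two for student 2 (2016-10-01 with 4 days and 2016-10-08 with 3 days).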
You can try this query:
SELECT
StudentId
, MIN(AbsentDate) AS InitialDate
, COUNT(*) AS ConsecutiveDaysMissed
FROM (
SELECT
dateNumber - ROW_NUMBER() OVER(PARTITION BY StudentId ORDER BY dateNumber) AS PeriodId
, AbsentDate
, StudentId
FROM(
SELECT
StudentId
, AbsentDate
, DATEDIFF(day, 0, AbsentDate) AS dateNumber -- day count; a yyyymmdd integer would jump at month boundaries
FROM AbsentStudents
WHERE AbsentDate BETWEEN @StartDate AND @EndDate
) AS T
) AS StudentPeriod
GROUP BY StudentID, PeriodId
To skip holidays and weekends, you can make a table with dates and their order numbers that leaves out holidays and weekends. Then join it with AbsentStudents by date and use the order number instead of DATEDIFF(day, 0, AbsentDate) AS dateNumber.
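The working-day lookup could look roughly like the following. The WorkingDays table and its WorkDate and DayNumber columns are hypothetical names invented for this sketch, and how the table gets populated (which days count as holidays) is left open.

```sql
-- Hypothetical lookup: one row per school day, numbered consecutively,
-- with weekends and holidays simply left out.
CREATE TABLE WorkingDays (
    WorkDate  date NOT NULL PRIMARY KEY,
    DayNumber int  NOT NULL UNIQUE
);

SELECT a.StudentId,
       MIN(a.AbsentDate) AS InitialDate,
       COUNT(*)          AS ConsecutiveDaysMissed
FROM (
    SELECT s.StudentId, s.AbsentDate,
           -- DayNumber has no holes across weekends, so an absence on
           -- Friday and the following Monday lands in the same period.
           w.DayNumber
             - ROW_NUMBER() OVER (PARTITION BY s.StudentId ORDER BY w.DayNumber) AS PeriodId
    FROM AbsentStudents AS s
    JOIN WorkingDays AS w
      ON w.WorkDate = CAST(s.AbsentDate AS date)
) AS a
GROUP BY a.StudentId, a.PeriodId;
```

Because the join is an inner join, absences recorded on a day that is missing from WorkingDays are dropped rather than counted.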
You can use a trick. If you order by date within each student, the number of days since the smallest date minus a counter that goes up by one every row stays constant across a run of consecutive dates, so it identifies the date group.
SELECT StudentID
FROM (
SELECT StudentID, GROUP_NUM, COUNT(*) AS GROUP_DAY_CNT
FROM (
SELECT StudentId,
DATEDIFF(dd, M.MinDate, AbsentDate) - ROW_NUMBER() OVER (PARTITION BY StudentId ORDER BY AbsentDate) AS GROUP_NUM
FROM AbsentStudents
CROSS JOIN (SELECT MIN(AbsentDate) AS MinDate FROM AbsentStudents WHERE AbsentDate BETWEEN @StartDate AND @EndDate) M
WHERE AbsentDate BETWEEN @StartDate AND @EndDate
) X
GROUP BY StudentID, GROUP_NUM
) Z
WHERE GROUP_DAY_CNT >= @Days

Finding time overlaps for two different timelines

I am using SQL Server. I have two tables, table1 and table2. Both store time intervals; for simplicity, say each has two datetime2 columns, named S1 and S2 in table1 and T1 and T2 in table2. In every row S1 is greater than S2, and likewise T1 is greater than T2 in table2. I want to calculate the length of the interval between S2 and S1 (like a timeline) and subtract from it the overlap of the T2 to T1 intervals with the S2 to S1 interval. I tried this but can't get further than the first part of the calculation:
DECLARE @x float
SET @x=0
SELECT SUM(S1-S2)-@x from table1
(set @x =(SELECT (T1-T2) FROM table2
WHERE T1>=S1 and T2<=S2));
Example:
S2= 10/25/2012 ; S1= 11/30/2012;
assume that we have three rows in table 2
T2=10/20/2012 , T1=10/28/2012
T2=11/4/2012 , T1=11/8/2012
T2=11/22/2012 , T1=11/30/2012
What I want is to find the total minutes between S1 and S2, excluding the minutes that overlap with the T1 and T2 intervals of the second table. My query works for the second row of the second table, where the whole interval between T1 and T2 lies inside the interval between S1 and S2.
This is somewhat complicated; I hope the example helps.
The query works fine for that case, but I cannot calculate the overlap when only one of T1 or T2 falls inside the S1 and S2 interval. Should I run multiple queries? What are the parallels here?
I'm using SQL Server 2008 for this example.
This solution assumes that all intervals in table T do not overlap with each other.
The following articles explain interval algebra in detail and I think they are a very good read. They have nice diagrams as well.
Comparing date ranges
http://salman-w.blogspot.com.au/2012/06/sql-query-overlapping-date-ranges.html
http://www.ics.uci.edu/~alspaugh/cls/shr/allen.html
http://stewashton.wordpress.com/2014/03/11/sql-for-date-ranges-gaps-and-overlaps/
Create tables with sample data
I named the columns in a less confusing manner than in the original question. I've added a few extra intervals that do not overlap to illustrate that the proposed solution filters them out.
DECLARE @TableS TABLE (ID int IDENTITY(1,1), DateFromS date, DateToS date);
DECLARE @TableT TABLE (ID int IDENTITY(1,1), DateFromT date, DateToT date);
INSERT INTO @TableS (DateFromS, DateToS) VALUES ('2012-10-25', '2012-11-30');
INSERT INTO @TableS (DateFromS, DateToS) VALUES ('2015-10-25', '2015-11-30');
INSERT INTO @TableT (DateFromT, DateToT) VALUES ('2012-10-20', '2012-10-28');
INSERT INTO @TableT (DateFromT, DateToT) VALUES ('2012-11-04', '2012-11-08');
INSERT INTO @TableT (DateFromT, DateToT) VALUES ('2012-11-22', '2012-11-30');
INSERT INTO @TableT (DateFromT, DateToT) VALUES ('2010-11-22', '2010-11-30');
INSERT INTO @TableT (DateFromT, DateToT) VALUES ('2020-11-22', '2020-11-30');
Find overlapping intervals
I assume that we want to do these calculations for each row in the table S and for each row in table T. If this is not the case, you should join tables with some extra condition.
In this example I work only with days precision, not minutes, and I assume that start and end dates are inclusive, i.e. duration between 01/01/2000 and 01/01/2000 is one day. It should be fairly straightforward to extend this to minute precision.
SELECT *
,ISNULL(1+DATEDIFF(day, MaxDateFrom.DateFrom, MinDateTo.DateTo), 0) AS OverlappedDays
FROM
@TableS AS TS
LEFT JOIN @TableT AS TT ON TS.DateFromS <= TT.DateToT AND TS.DateToS >= TT.DateFromT
-- all periods in TS, which overlap with periods in TT
--(StartA <= EndB) and (EndA >= StartB)
CROSS APPLY
(
SELECT CASE WHEN TS.DateFromS > TT.DateFromT THEN TS.DateFromS ELSE TT.DateFromT END AS DateFrom
) AS MaxDateFrom
CROSS APPLY
(
SELECT CASE WHEN TS.DateToS < TT.DateToT THEN TS.DateToS ELSE TT.DateToT END AS DateTo
) AS MinDateTo
The condition in LEFT JOIN leaves only overlapping intervals. To calculate the duration of the overlapping interval I use two CROSS APPLYs. This is the result set of this intermediary query:
ID  DateFromS   DateToS     ID    DateFromT   DateToT     DateFrom    DateTo      OverlappedDays
1   2012-10-25  2012-11-30  1     2012-10-20  2012-10-28  2012-10-25  2012-10-28  4
1   2012-10-25  2012-11-30  2     2012-11-04  2012-11-08  2012-11-04  2012-11-08  5
1   2012-10-25  2012-11-30  3     2012-11-22  2012-11-30  2012-11-22  2012-11-30  9
2   2015-10-25  2015-11-30  NULL  NULL        NULL        NULL        NULL        0
Note, that the last row corresponds to the case when an interval in table S doesn't overlap with any intervals from table T.
Calculate durations
Now all we need is to sum up the duration of overlapping intervals T for each original row in table S and subtract it from the duration of the interval S.
SELECT
TS.ID
,TS.DateFromS
,TS.DateToS
,1+DATEDIFF(day, TS.DateFromS, TS.DateToS) AS DurationS
,ISNULL(SUM(1+DATEDIFF(day, MaxDateFrom.DateFrom, MinDateTo.DateTo)), 0) AS DurationOverlapped
,1+DATEDIFF(day, TS.DateFromS, TS.DateToS)
- ISNULL(SUM(1+DATEDIFF(day, MaxDateFrom.DateFrom, MinDateTo.DateTo)), 0) AS FinalDuration
FROM
@TableS AS TS
LEFT JOIN @TableT AS TT ON TS.DateFromS <= TT.DateToT AND TS.DateToS >= TT.DateFromT
CROSS APPLY
(
SELECT CASE WHEN TS.DateFromS > TT.DateFromT THEN TS.DateFromS ELSE TT.DateFromT END AS DateFrom
) AS MaxDateFrom
CROSS APPLY
(
SELECT CASE WHEN TS.DateToS < TT.DateToT THEN TS.DateToS ELSE TT.DateToT END AS DateTo
) AS MinDateTo
GROUP BY TS.ID, TS.DateFromS, TS.DateToS
This is the result set:
ID  DateFromS   DateToS     DurationS  DurationOverlapped  FinalDuration
1   2012-10-25  2012-11-30  37         18                  19
2   2015-10-25  2015-11-30  37         0                   37
You are interested in the FinalDuration value, which is 19 for your example and 37 for the second interval that I added for this example.
You can add more intervals to the sample data to play with the queries and see how they work.
This solution assumes that all intervals in table T do not overlap with each other.
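For the minute precision asked about in the question, one possible adaptation is sketched below. It is not part of the answer above: the table variables @S and @T and their column names are invented for the sketch, the intervals are treated as half-open (start inclusive, end exclusive) so no "+1" is needed, and it keeps the assumption that the intervals in T do not overlap each other.

```sql
DECLARE @S TABLE (ID int IDENTITY(1,1), FromS datetime2(0), ToS datetime2(0));
DECLARE @T TABLE (ID int IDENTITY(1,1), FromT datetime2(0), ToT datetime2(0));

INSERT INTO @S (FromS, ToS) VALUES ('2012-10-25T00:00:00', '2012-11-30T00:00:00');
INSERT INTO @T (FromT, ToT) VALUES
    ('2012-10-20T00:00:00', '2012-10-28T00:00:00'),
    ('2012-11-04T00:00:00', '2012-11-08T00:00:00'),
    ('2012-11-22T00:00:00', '2012-11-30T00:00:00');

SELECT S.ID,
       DATEDIFF(minute, S.FromS, S.ToS) AS MinutesS,
       ISNULL(SUM(DATEDIFF(minute, O.OverlapFrom, O.OverlapTo)), 0) AS MinutesOverlapped,
       DATEDIFF(minute, S.FromS, S.ToS)
         - ISNULL(SUM(DATEDIFF(minute, O.OverlapFrom, O.OverlapTo)), 0) AS MinutesRemaining
FROM @S AS S
-- Strict inequalities: intervals that merely touch do not overlap.
LEFT JOIN @T AS T
       ON S.FromS < T.ToT AND S.ToS > T.FromT
CROSS APPLY (
    -- Overlap runs from the later start to the earlier end.
    SELECT CASE WHEN S.FromS > T.FromT THEN S.FromS ELSE T.FromT END AS OverlapFrom,
           CASE WHEN S.ToS   < T.ToT   THEN S.ToS   ELSE T.ToT   END AS OverlapTo
) AS O
GROUP BY S.ID, S.FromS, S.ToS;
```

DATEDIFF(minute, ...) counts minute boundaries crossed, so values that carry seconds can be off by up to a minute per interval; truncating the inputs to whole minutes first avoids that.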

Growth Of Distinct Users Per Week

I need to get a report that shows distinct users per week to show user growth per week, but I need it to show cumulative distinct users.
So if I have 5 weeks of data, I want to show:
Distinct users from week 0 through week 1
Distinct users from week 0 through week 2
Distinct users from week 0 through week 3
Distinct users from week 0 through week 4
Distinct users from week 0 through week 5
I have a whole year's worth of data. The only way I know how to do this is to literally query the time ranges adjusting a week out at a time and this is very tedious. I just can't figure out how I could query everything from week 0 through week 1 all the way to week 0 through week 52.
EDIT - What I have so far:
select count(distinct user_id) as count
from tracking
where datepart(wk,login_dt_tm) >= 0 and datepart(wk,login_dt_tm) <= 1
Then I take that number, record it, and update it to -- datepart(wk,login_dt_tm) <= 2. And so on until I have all the weeks. That way I can chart a nice growth chart by week.
This is tedious and there has to be another way.
UPDATE-
I used the solution provided by @siyual but updated it to use a table variable so I could get all the results in one output.
Declare @Week Int = 0
Declare @Totals Table
(
WeekNum int,
UserCount int
)
While @Week < 52
Begin
insert into @Totals (WeekNum,UserCount)
select @Week,count(distinct user_id) as count
from tracking
where datepart(wk,login_dt_tm) >= @Week and datepart(wk,login_dt_tm) <= (@Week + 1)
Set @Week += 1
End
Select * from @Totals
Why not something like:
select count(distinct user_id) as count, datepart(wk, login_dt_tm) as week
from tracking
group by datepart(wk,login_dt_tm)
order by week
You could try something like this:
Declare @Week Int = 1
While @Week <= 52
Begin
select count(distinct user_id) as count
from tracking
where datepart(wk,login_dt_tm) >= 0 and datepart(wk,login_dt_tm) <= @Week
Set @Week += 1
End
Just for the record, I would do this in one statement, using a recursive CTE to generate the numbers from 1 to 52 (you could also use a numbers table):
with numbers as (
select 1 as n
union all
select n + 1
from numbers
where n < 52
)
select n.n as week, count(distinct user_id) as count
from tracking t join
numbers n
on datepart(wk, login_dt_tm) >= 0 and datepart(wk, login_dt_tm) <= n.n
group by n.n
order by n.n;
Seems easier to put it all in one query.
SELECT
week_num,
distinct_count
FROM (
select distinct
datepart(wk,login_dt_tm) week_num
from #tracking
) t_week
CROSS APPLY (
select
count(distinct user_id) distinct_count
from #tracking
where datepart(wk,login_dt_tm) between 0 and t_week.week_num
) t_count
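Another way to look at the cumulative count, sketched here as an alternative and not taken from any of the answers: a user belongs to "week 0 through week N" exactly when the first week in which they appear is at most N. The CTE names first_week and weeks and the column names first_wk, wk and cumulative_users are made up for the sketch; the tracking table and its columns are those from the question. It assumes all rows fall within one calendar year, because DATEPART(wk, ...) restarts every January.

```sql
WITH first_week AS (
    -- The first week in which each user shows up.
    SELECT user_id, MIN(DATEPART(wk, login_dt_tm)) AS first_wk
    FROM tracking
    GROUP BY user_id
),
weeks AS (
    -- Every week that has at least one login.
    SELECT DISTINCT DATEPART(wk, login_dt_tm) AS wk
    FROM tracking
)
SELECT w.wk,
       COUNT(f.user_id) AS cumulative_users
FROM weeks AS w
JOIN first_week AS f
  ON f.first_wk <= w.wk        -- user was first seen in or before this week
GROUP BY w.wk
ORDER BY w.wk;
```

Since first_week holds one row per user, a plain COUNT is enough and no COUNT(DISTINCT ...) over the full table is needed for each week.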

Repeating rows from right join

My application saves logs that need to be taken at least one time during each of the 3 different time periods during the day. So ideally 3 logs per day, each with a unique time period ID. I need to write an exception report (MSSQL 2008) that will show when a time period is missed for any given day. I have a LogTimePeriods table that contains 3 rows for each of the time periods. The Logs table contains the LogTimePeriodID so I do not need to do any logic to see what Time period the log belongs in (that is done via the application).
I know I need something along the lines of a right/left join to try to match all the LogTimePeriodIDs against every Log row for a given date. I can't seem to make any progress. Any help is appreciated! Thanks for reading.
SQL Fiddle
EDIT: Desired output below
Date | LogPeriodID
6/3 | 3
6/5 | 2
6/5 | 3
Your SQL Fiddle is set to use MySQL, not SQL Server 2008, so I can't test my answer against your data. However, based on my understanding of your requirements and assuming you are querying a SQL Server 2008 database, the following example should work for you (the references to my table variables would obviously be replaced with your actual tables).
DECLARE @StartDate DATE = '06/04/2014'
DECLARE @EndDate DATE = GETDATE();
DECLARE @LogTimePeriod TABLE (LogTimePeriodID INT IDENTITY(1,1), TimePeriod VARCHAR(20))
INSERT INTO @LogTimePeriod (TimePeriod) SELECT '00:00 - 07:59'
INSERT INTO @LogTimePeriod (TimePeriod) SELECT '08:00 - 15:59'
INSERT INTO @LogTimePeriod (TimePeriod) SELECT '16:00 - 23:59'
DECLARE @logs TABLE (LogDataID INT IDENTITY(1,1), LogDate DATE, SomeInformation VARCHAR(10), LogTimePeriodID INT)
INSERT INTO @logs (SomeInformation, LogDate, LogTimePeriodID) SELECT 'abc', '6/4/2014', 1
INSERT INTO @logs (SomeInformation, LogDate, LogTimePeriodID) SELECT 'def', '6/4/2014', 2
INSERT INTO @logs (SomeInformation, LogDate, LogTimePeriodID) SELECT 'ghi', '6/4/2014', 3
INSERT INTO @logs (SomeInformation, LogDate, LogTimePeriodID) SELECT 'abc', '6/5/2014', 1
INSERT INTO @logs (SomeInformation, LogDate, LogTimePeriodID) SELECT 'def', '6/5/2014', 2;
WITH dates AS (
SELECT CAST(@StartDate AS DATETIME) 'date'
UNION ALL
SELECT DATEADD(dd, 1, t.date)
FROM dates t
WHERE DATEADD(dd, 1, t.date) <= @EndDate)
SELECT ltp.LogTimePeriodID, ltp.TimePeriod, dates.date
FROM
@LogTimePeriod ltp
INNER JOIN
dates ON 1=1
LEFT JOIN
@logs ld ON
ltp.LogTimePeriodID = ld.LogTimePeriodID AND
dates.date = ld.LogDate
WHERE ld.LogDataID IS NULL
OPTION (MAXRECURSION 1000) -- 0 is unlimited, 1000 limits to 1000 rows
SQL Server 2008 has the EXCEPT keyword, which subtracts the second result set from the first.
In this case it's possible to generate all the possible logs and remove from them the ones present in the logs table, which leaves the logs that are missing from the table.
SELECT DISTINCT StartDateTime, StartTime, EndTime
FROM Logs
CROSS JOIN LogTimePeriods
EXCEPT
SELECT StartDateTime, StartTime, EndTime
FROM LogTimePeriods ltp
LEFT JOIN logs l ON l.LogTimePeriodID = ltp.LogTimePeriodID
ORDER BY StartDateTime, StartTime
SQLFiddle demo with your data converted to SQL Server 2008
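To make the "generate everything, then subtract what exists" idea concrete without the original schema, here is a small self-contained sketch. The table variables and the LogDate column are assumptions borrowed from the first answer, not the asker's real tables.

```sql
DECLARE @LogTimePeriod TABLE (LogTimePeriodID int PRIMARY KEY);
DECLARE @logs TABLE (LogDate date, LogTimePeriodID int);

INSERT INTO @LogTimePeriod VALUES (1), (2), (3);
INSERT INTO @logs VALUES
    ('20140604', 1), ('20140604', 2), ('20140604', 3),
    ('20140605', 1), ('20140605', 2);

-- Every (date, period) pair that could exist ...
SELECT d.LogDate, p.LogTimePeriodID
FROM (SELECT DISTINCT LogDate FROM @logs) AS d
CROSS JOIN @LogTimePeriod AS p
EXCEPT
-- ... minus the pairs that were actually logged.
SELECT LogDate, LogTimePeriodID
FROM @logs
ORDER BY LogDate, LogTimePeriodID;
```

With these rows the only pair left over is 2014-06-05 with period 3. A day with no logs at all never enters the first set, so it would not be reported; covering that case needs a generated list of dates, as in the recursive CTE of the first answer.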

group data by any range of 30 days (not by range of dates) in SQL Server

I have a table with a list of transactions.
For the example, let's say it has 4 fields:
ID, UserID, DateAdded, Amount
I would like to run a query that checks whether there was any 30-day period in which a user made transactions totalling 100 or more.
I saw lots of samples of grouping by month or by day, but the problem is that if, for example,
a user made a $50 transaction on 20/4 and another $50 transaction on 5/5, the query should show it (that's $100 or more within a period of 30 days).
I think that this should work (I'm assuming that transactions have a time component, and that a user can have multiple transactions on a single day):
;with DailyTransactions as (
select UserID,DATEADD(day,DATEDIFF(day,0,DateAdded),0) as DateOnly,SUM(Amount) as Amount
from Transactions group by UserID,DATEADD(day,DATEDIFF(day,0,DateAdded),0)
), Numbers as (
select ROW_NUMBER() OVER (ORDER BY object_id) as n from sys.objects
), DayRange as (
select n from Numbers where n between 1 and 29
)
select
dt.UserID,dt.DateOnly as StartDate,MAX(ot.DateOnly) as EndDate, dt.Amount + COALESCE(SUM(ot.Amount),0) as TotalSpend
from
DailyTransactions dt
cross join
DayRange dr
left join
DailyTransactions ot
on
dt.UserID = ot.UserID and
DATEADD(day,dr.n,dt.DateOnly) = ot.DateOnly
group by dt.UserID,dt.DateOnly,dt.Amount
having dt.Amount + COALESCE(SUM(ot.Amount),0) >= 100.00
Okay, I'm using 3 common table expressions. The first (DailyTransactions) is reducing the transactions table to a single transaction per user per day (this isn't necessary if the DateAdded is a date only, and each user has a single transaction per day). The second and third (Numbers and DayRange) are a bit of a cheat - I wanted to have the numbers 1-29 available to me (for use in a DATEADD). There are a variety of ways of creating either a permanent or (as in this case) temporary Numbers table. I just picked one, and then in DayRange, I filter it down to the numbers I need.
Now that we have those available to us, we write the main query. We're querying for rows from the DailyTransactions table, but we want to find later rows in the same table that are within 30 days. That's what the left join to DailyTransactions is doing. It's finding those later rows, of which there may be 0, 1 or more. If it's more than one, we want to add all of those values together, so that's why we need to do a further bit of grouping at this stage. Finally, we can write our having clause, to filter down only to those results where the Amount from a particular day (dt.Amount) + the sum of amounts from later days (SUM(ot.Amount)) meets the criteria you set out.
I based this on a table defined thus:
create table Transactions (
UserID int not null,
DateAdded datetime not null,
Amount decimal (38,2)
)
If I understand you correctly, you need a calendar table and then check the sum between date and date+30. So if you want to check a period of 1 year you need to check something like 365 periods.
Here is one way of doing that. The recursive CTE creates the calendar and the cross apply calculates the sum for each CalDate between CalDate and CalDate+30.
declare @T table(ID int, UserID int, DateAdded datetime, Amount money)
insert into @T values(1, 1, getdate(), 50)
insert into @T values(2, 1, getdate()-29, 60)
insert into @T values(4, 2, getdate(), 40)
insert into @T values(5, 2, getdate()-29, 50)
insert into @T values(7, 3, getdate(), 70)
insert into @T values(8, 3, getdate()-30, 80)
insert into @T values(9, 4, getdate()+50, 50)
insert into @T values(10,4, getdate()+51, 50)
declare @FromDate datetime
declare @ToDate datetime
select
@FromDate = min(dateadd(d, datediff(d, 0, DateAdded), 0)),
@ToDate = max(dateadd(d, datediff(d, 0, DateAdded), 0))
from @T
;with cal as
(
select @FromDate as CalDate
union all
select CalDate + 1
from cal
where CalDate < @ToDate
)
select S.UserID
from cal as C
cross apply
(select
T.UserID,
sum(Amount) as Amount
from @T as T
where T.DateAdded between CalDate and CalDate + 30
group by T.UserID) as S
where S.Amount >= 100
group by S.UserID
option (maxrecursion 0)
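A calendar is not strictly needed for this check. If some 30-day window reaches the threshold, sliding its start forward to the first transaction inside it loses nothing, so it is enough to test the windows that start at each transaction. The following is a sketch of that variant, not taken from either answer; it assumes the Transactions table defined in the first answer and a half-open window of exactly 30 days, and the aliases t, w and x and the column name WindowTotal are made up.

```sql
SELECT DISTINCT t.UserID
FROM Transactions AS t
CROSS APPLY (
    -- Total for the same user from this transaction up to,
    -- but not including, 30 days later.
    SELECT SUM(w.Amount) AS WindowTotal
    FROM Transactions AS w
    WHERE w.UserID = t.UserID
      AND w.DateAdded >= t.DateAdded
      AND w.DateAdded <  DATEADD(day, 30, t.DateAdded)
) AS x
WHERE x.WindowTotal >= 100;
```

Whether the boundary should be 30 days exclusive, as here, or inclusive depends on how "in 30 days" is meant; only the last comparison changes.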