Split one record into multiple based on another table - sql

I have a table tracking location stays. There's an ID, startdatetime, enddatetime, and other fields.
I have another table with events that occur within each of those stays, with similar start and end times, and linked on the ID field.
What I need to do is merge the two and split the location table up into its individual events. The trick here is that a location stay may start on 2017-08-02 but the first event might not start for a few days, so I'd need a record for that gap at the start.
sample data
CREATE TABLE #Stays (
EpID INT, StayId INT, StayStartDate DateTime, StayEndDate DateTime);
CREATE TABLE #Events (
EpID INT, EventId INT, EventStartDate DateTime, EventEndDate DateTime, EventNumber INT);
INSERT INTO #Events SELECT 1, 7897, '2016-11-24 00:00:00.000','2016-11-26 00:00:00.000', 1
INSERT INTO #Events SELECT 1, 7898, '2016-11-26 00:00:00.000','2016-11-28 00:00:00.000', 2
INSERT INTO #Stays SELECT 1, 10, '2016-11-22 08:15:00.000','2016-11-24 10:54:00.000'
INSERT INTO #Stays SELECT 1, 11, '2016-11-24 10:54:00.000','2016-11-24 11:17:00.000'
INSERT INTO #Stays SELECT 1, 12, '2016-11-24 11:17:00.000','2016-11-25 08:16:00.000'
INSERT INTO #Stays SELECT 1, 13, '2016-11-25 08:16:00.000','2016-11-28 23:15:00.000'
Expected output would be:
EpId  StartDate                EndDate                  EventNumber
1     2016-11-22 08:15:00.000  2016-11-23 23:59:59.000  NULL
1     2016-11-24 00:00:00.000  2016-11-25 23:59:59.000  7897
1     2016-11-26 00:00:00.000  2016-11-27 23:59:59.000  7898
1     2016-11-28 00:00:00.000  2016-11-28 23:15:00.000  NULL
Here is what I'm trying. It currently doesn't work properly: it doesn't meld the two datasets together, and I'm sure the method I'm working on is probably not the best.
My guess is there's a much easier way to do it with OUTER APPLY or CROSS APPLY, but my knowledge of how they work is rather limited.
Any help?
;with e as (
SELECT [EpID]
,EventId
,[EventNumber]
,case when [EventStartDate] > DayStart then [EventStartDate] else DayStart end as [EventStart]
,case when [EventEndDate] < DayEnd then [EventEndDate] else DayEnd end as [EventEnd]
FROM [Events] e
inner join DimStaySegmentDayReference d on d.DayEnd >= e.[EventStartDate] and d.DayStart <= e.[EventEndDate]
),
s as (
select
[EpID]
,StayId
,case when StayStartDate > DayStart then StayStartDate else DayStart end as [StayStart]
,case when StayEndDate < DayEnd then StayEndDate else DayEnd end as [StayEnd]
from Stays s
inner join DimStaySegmentDayReference d on d.DayEnd >= StayStartDate and d.DayStart <= StayEndDate
),
u as (select 'stay' as source, [EpID], StayStart, StayEnd, '' as event from s
union all
select 'event' as source, [EpID], [EventStart], [EventEnd], eventnumber as event from e)
select Source,
[EpID],
Staystart,
stayend,
case when lag(stayend) over (partition by EpId ORDER BY STAYSTART) < StayEnd-0.0001 AND source='event' then lag(stayend) over (partition by EpId ORDER BY STAYSTART) else staystart end as staystartnew,
case when lead(staystart) over (partition by EpID ORDER BY StayStart) < stayend then lead(staystart) over (partition by EpID ORDER BY StayStart) else stayend end as stayendnew,
event
from u
where StayStart <> stayend
order by StayStart
The DayReference table is simply every day with a start and end time, so I can split the record into day segments.
I'm using SQL Server 2012
Edit for some context
I've updated my sample data to make it a bit clearer.
The stay table tracks location stays. In the case provided I'm ignoring multiple locations to make finding a solution easier.
Locations and Events are agnostic to each other, other than occurring for the same EpID within the same time frame.
As an example consider tracking time at work, you start at 9am and finish at 5pm. For this work day you'll have say 5 location stays making up the full shift. 9-11 desk, 11-12 meeting, 12-1 lunch, 1-3 meeting, 3-5 desk.
You then have a series of events; let's call it drinking coffee. You drink coffee between 9:30 and 10, and between 2 and 4.
What I need to do is mesh together these two sets of data creating a single timeline.
9-930 desk, 930-10 coffee, 10-11 desk, 11-12 meeting, 12-1 lunch, 1-2 meeting, 2-4 coffee, 4-5 desk.
Hope this helps
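To make the target of the work-day example concrete, here is a minimal sketch of the merge logic in Python rather than T-SQL. The function name `merge_timeline` and the tuple layout are illustrative assumptions, not part of the question. It assumes the intervals within each list do not overlap one another and that an event takes precedence over the stay it falls inside.

```python
def merge_timeline(stays, events):
    """Merge two lists of (start, end, label) intervals into one timeline.

    Assumes intervals within each list do not overlap one another.
    Where an event overlaps a stay, the event label wins.
    """
    # Every start or end is a possible cut point in the merged timeline.
    cuts = sorted({t for start, end, _ in stays + events for t in (start, end)})

    def label_at(intervals, a, b):
        # Label of the interval that fully covers the slice [a, b), if any.
        for start, end, label in intervals:
            if start <= a and b <= end:
                return label
        return None

    timeline = []
    for a, b in zip(cuts, cuts[1:]):
        label = label_at(events, a, b) or label_at(stays, a, b)
        if label is None:
            continue  # slice covered by neither table
        if timeline and timeline[-1][2] == label and timeline[-1][1] == a:
            # Extend the previous row instead of emitting a fragment.
            timeline[-1] = (timeline[-1][0], b, label)
        else:
            timeline.append((a, b, label))
    return timeline


# The work-day example from the question, with times as decimal hours.
stays = [(9, 11, "desk"), (11, 12, "meeting"), (12, 13, "lunch"),
         (13, 15, "meeting"), (15, 17, "desk")]
events = [(9.5, 10, "coffee"), (14, 16, "coffee")]

for row in merge_timeline(stays, events):
    print(row)
```

One design consequence of the sketch: two back-to-back rows with the same label are merged into one, which is what turns the 2-3 and 3-4 coffee slices into a single 2-4 row. It would also merge two adjacent stays that share a label, which may or may not be wanted.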

Some of this can probably be simplified, but written this way it is easy to read what I am validating in each case. I also think one row is missing from your example output: I get a final row from 2018-09-14 16:00 to 2018-09-15 12:00, and I could not find a reason in the logic or the question to discard it.
Extra validations, and a left join for Stays with no registered events, would still be needed, but here is my approach:
;WITH CTE AS (
SELECT D.*, s.StayId,
EventNumber,
LAG(D.DStart) OVER (ORDER BY EventNumber) As LagStart,
LAG(StayID) OVER (ORDER BY EventNumber) As LagStay,
LAG(Event) OVER (ORDER BY EventNumber) As LagEvent,
LEAD(D.DEnd) OVER (ORDER BY EventNumber) As LeadEnd,
LEAD(StayID) OVER (ORDER BY EventNumber) As LeadStay,
LEAD(Event) OVER (ORDER BY EventNumber) As LeadEvent
FROM #Events E
CROSS APPLY
(
SELECT TOP 1 * FROM #Stays S WHERE E.EventStartDate BETWEEN S.StayStartDate AND S.StayEndDate
UNION
SELECT TOP 1 * FROM #Stays S WHERE E.EventEndDate BETWEEN S.StayStartDate AND S.StayEndDate
) S
CROSS APPLY (
SELECT StayStartDate AS DStart, EventStartDate DEnd, Null AS Event, 1 as c WHERE StayStartDate < EventStartDate
UNION
SELECT EventStartDate, EventEndDate, EventNumber, 2 WHERE EventStartDate >= StayStartDate AND EventEndDate <= StayEndDate
UNION
SELECT StayStartDate, EventEndDate, EventNumber, 3 WHERE StayStartDate > EventStartDate AND EventEndDate < StayEndDate
UNION
SELECT EventStartDate, StayEndDate, EventNumber, 4 WHERE StayStartDate < EventStartDate AND EventEndDate > StayEndDate
UNION
SELECT EventEndDate, StayEndDate, Null, 5 WHERE EventEndDate < StayEndDate
) D
)
SELECT DISTINCT
CASE WHEN LagStay = StayId AND Event IS NULL AND LagEvent IS NULL THEN LagStart
ELSE DStart END AS StartDate,
CASE WHEN LeadStay = StayId AND Event IS NULL AND LeadEvent IS NULL THEN LeadEnd
ELSE DEnd END AS EndDate,
Event, StayID
FROM CTE
ORDER BY StartDate

Related

Calculate date difference between dates based on a specific condition

I have a table History with the columns date, person and status, and I need to know the total amount of time spent from when an item starts until it reaches the Finished status (which can occur multiple times). I need the DATEDIFF from the first time it is created until the first time it has status Finished; after that I need the next date where it is not Finished, and again the DATEDIFF up to the date it was next Finished, and so on. Another condition is to do this calculation only if the Person who changed the status is not null. After that I need to sum all the times to get the total.
I tried the LEAD and LAG functions but was not getting the results I need.
First let's talk about providing demo data. Here's a good way to do it:
Create a table variable similar to your actual object(s) and then populate them:
DECLARE @statusTable TABLE (Date DATETIME, Person INT, Status NVARCHAR(10), KeyID NVARCHAR(7))
INSERT INTO @statusTable (Date, Person, Status, KeyID) VALUES
('2022-10-07 07:01:17.463', 1, 'Start', 'AAA-111'),
('2022-10-07 07:01:17.463', 1, 'Waiting', 'AAA-111'),
('2022-10-11 14:01:44.463', 1, 'Waiting', 'AAA-111'),
('2022-10-14 10:04:17.463', 1, 'Waiting', 'AAA-111'),
('2022-10-14 10:04:17.463', 1, 'Finished','AAA-111'),
('2022-10-14 10:04:17.463', 1, 'Waiting', 'AAA-111'),
('2022-10-17 17:01:17.463', 1, 'Waiting', 'AAA-111'),
('2022-10-21 11:03:17.463', 1, 'Waiting', 'AAA-111'),
('2022-10-21 11:03:17.463', 1, 'Finished','AAA-111'),
('2022-10-21 11:03:17.463', 1, 'Waiting', 'AAA-111'),
('2022-10-21 11:04:17.463', NULL, 'Waiting', 'AAA-111'),
('2022-10-21 11:05:17.463', 1, 'Finished','AAA-111')
Your problem is recursive, so we can use an rCTE (recursive CTE) to resolve it.
;WITH base AS (
SELECT *, CASE WHEN LAG(Status,1) OVER (PARTITION BY KeyID ORDER BY Date) <> 'Waiting' AND Status = 'Waiting' THEN 1 END AS isStart, ROW_NUMBER() OVER (PARTITION BY KeyID ORDER BY Date) AS rn
FROM @statusTable
), rCTE AS (
SELECT date AS startDate, date, Person, Status, KeyID, IsStart, rn
FROM base
WHERE isStart = 1
UNION ALL
SELECT a.startDate, r.date, r.Person, r.Status, a.KeyID, r.IsStart, r.rn
FROM rCTE a
INNER JOIN base r
ON a.rn+1 = r.rn
AND a.KeyID = r.KeyID
AND r.IsStart IS NULL
)
SELECT StartDate, MAX(date) AS FinishDate, KeyID, DATEDIFF(MINUTE,StartDate,MAX(Date)) AS Minutes
FROM rCTE
GROUP BY rCTE.startDate, KeyID
HAVING COUNT(Person) = COUNT(KeyID)
StartDate FinishDate KeyID Minutes
---------------------------------------------------------------
2022-10-07 07:01:17.463 2022-10-14 10:04:17.463 AAA-111 10263
2022-10-14 10:04:17.463 2022-10-21 11:03:17.463 AAA-111 10139
What we're doing here is finding and marking the starts. When there is a Start row its timestamp matches the first Waiting row, and there isn't always a Start row, so we use the first Waiting row as the start marker.
Then, we go through and find the next Finished row for that KeyID.
Using this we can now group on the StartDate, take the MAX of Date (as FinishDate) and then use DATEDIFF to calculate the difference.
Finally, we compare the count of KeyIDs to the count of Person. COUNT ignores NULLs, so if there is a NULL value for Person the counts will not match, and we just discard that group.
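The grouping idea can be sketched outside SQL as a single pass over the rows. The following is a Python illustration, not a translation of the rCTE: the name `finished_spans` is hypothetical, it assumes a single KeyID, and it assumes the rows arrive in the order shown in the demo data (several rows share a timestamp, so ordering by Date alone does not fix their order).

```python
from datetime import datetime


def finished_spans(rows):
    """rows: (date, person, status) tuples already in processing order.

    Returns (start, finish, minutes) for each run of rows that ends in
    'Finished', skipping any run that contains a row with person None.
    """
    spans, run = [], []
    for date, person, status in rows:
        run.append((date, person))
        if status == "Finished":
            if all(p is not None for _, p in run):
                start, finish = run[0][0], run[-1][0]
                # Whole elapsed minutes. T-SQL DATEDIFF(MINUTE, ...) counts
                # minute boundaries crossed, which can differ by one when
                # the seconds of the two values differ.
                minutes = int((finish - start).total_seconds() // 60)
                spans.append((start, finish, minutes))
            run = []  # the next row starts a new run
    return spans


ts = datetime.fromisoformat
rows = [
    (ts("2022-10-07 07:01:17.463"), 1, "Start"),
    (ts("2022-10-07 07:01:17.463"), 1, "Waiting"),
    (ts("2022-10-11 14:01:44.463"), 1, "Waiting"),
    (ts("2022-10-14 10:04:17.463"), 1, "Waiting"),
    (ts("2022-10-14 10:04:17.463"), 1, "Finished"),
    (ts("2022-10-14 10:04:17.463"), 1, "Waiting"),
    (ts("2022-10-17 17:01:17.463"), 1, "Waiting"),
    (ts("2022-10-21 11:03:17.463"), 1, "Waiting"),
    (ts("2022-10-21 11:03:17.463"), 1, "Finished"),
    (ts("2022-10-21 11:03:17.463"), 1, "Waiting"),
    (ts("2022-10-21 11:04:17.463"), None, "Waiting"),
    (ts("2022-10-21 11:05:17.463"), 1, "Finished"),
]

spans = finished_spans(rows)
total_minutes = sum(minutes for _, _, minutes in spans)
```

On the demo rows this keeps the first two runs and drops the third, because that run contains a row with no Person.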
select min(date) as start
,max(date) as finish
,datediff(millisecond, min(date), max(date)) as diff_in_millisecond
,sum(datediff(millisecond, min(date), max(date))) over() as total_diff_in_millisecond
from
(
select *
,count(case when Status = 'Finished' then 1 end) over(order by date desc, status desc) as grp
,case when person is null then 0 else 1 end as flg
from t
) t
group by grp
having min(flg) = 1
order by start
start                        finish                       diff_in_millisecond  total_diff_in_millisecond
---------------------------------------------------------------------------------------------------------
2022-10-07 07:01:17.4630000  2022-10-14 10:04:28.4730000  615791010            1242093518
2022-10-14 10:04:28.4730000  2022-10-21 11:03:06.7170000  608318244            1242093518
2022-10-26 12:46:14.7730000  2022-10-26 17:45:59.0370000  17984264             1242093518

Query without WHILE Loop

We have an appointment table as shown below. Each appointment needs to be categorized as "New" or "Followup". Any appointment (for a patient) within 30 days of that patient's first appointment is a Followup. After 30 days, the next appointment is "New" again, and any appointment within 30 days of it is a "Followup".
I am currently doing this with a WHILE loop.
How can I achieve this without a WHILE loop?
Table
CREATE TABLE #Appt1 (ApptID INT, PatientID INT, ApptDate DATE)
INSERT INTO #Appt1
SELECT 1,101,'2020-01-05' UNION
SELECT 2,505,'2020-01-06' UNION
SELECT 3,505,'2020-01-10' UNION
SELECT 4,505,'2020-01-20' UNION
SELECT 5,101,'2020-01-25' UNION
SELECT 6,101,'2020-02-12' UNION
SELECT 7,101,'2020-02-20' UNION
SELECT 8,101,'2020-03-30' UNION
SELECT 9,303,'2020-01-28' UNION
SELECT 10,303,'2020-02-02'
You need to use a recursive query.
The 30-day period is counted from the previous "New" appointment, so it is not possible to do this without recursion, a quirky update, or a loop. That is why all the existing answers using only ROW_NUMBER failed.
WITH f AS (
SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY PatientId ORDER BY ApptDate)
FROM Appt1
), rec AS (
SELECT Category = CAST('New' AS NVARCHAR(20)), ApptId, PatientId, ApptDate, rn, startDate = ApptDate
FROM f
WHERE rn = 1
UNION ALL
SELECT CAST(CASE WHEN DATEDIFF(DAY, rec.startDate,f.ApptDate) <= 30 THEN N'FollowUp' ELSE N'New' END AS NVARCHAR(20)),
f.ApptId,f.PatientId,f.ApptDate, f.rn,
CASE WHEN DATEDIFF(DAY, rec.startDate, f.ApptDate) <= 30 THEN rec.startDate ELSE f.ApptDate END
FROM rec
JOIN f
ON rec.rn = f.rn - 1
AND rec.PatientId = f.PatientId
)
SELECT ApptId, PatientId, ApptDate, Category
FROM rec
ORDER BY PatientId, ApptDate;
Output:
+---------+------------+-------------+----------+
| ApptId | PatientId | ApptDate | Category |
+---------+------------+-------------+----------+
| 1 | 101 | 2020-01-05 | New |
| 5 | 101 | 2020-01-25 | FollowUp |
| 6 | 101 | 2020-02-12 | New |
| 7 | 101 | 2020-02-20 | FollowUp |
| 8 | 101 | 2020-03-30 | New |
| 9 | 303 | 2020-01-28 | New |
| 10 | 303 | 2020-02-02 | FollowUp |
| 2 | 505 | 2020-01-06 | New |
| 3 | 505 | 2020-01-10 | FollowUp |
| 4 | 505 | 2020-01-20 | FollowUp |
+---------+------------+-------------+----------+
How it works:
f - numbers each patient's appointments by date; the first row per PatientId is the starting point (anchor)
rec - recursive part: if the difference between the current appointment and the current starting point is > 30 days, change the category to New and move the starting point, per PatientId
Main - display the sorted result set
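The same rule can be sketched as a plain loop, which may make the "moving starting point" easier to see. This is a Python illustration of the logic only, with the hypothetical name `categorize`; it is not a replacement for the query.

```python
from datetime import date


def categorize(appts):
    """appts: iterable of (appt_id, patient_id, appt_date).

    Returns (appt_id, patient_id, appt_date, category) ordered by
    patient and date. The 30-day window restarts at every 'New' row.
    """
    window_start = {}  # patient_id -> date of that patient's latest 'New'
    result = []
    for appt_id, patient_id, appt_date in sorted(appts, key=lambda a: (a[1], a[2])):
        start = window_start.get(patient_id)
        if start is None or (appt_date - start).days > 30:
            window_start[patient_id] = appt_date  # open a new window
            category = "New"
        else:
            category = "FollowUp"
        result.append((appt_id, patient_id, appt_date, category))
    return result


appts = [
    (1, 101, date(2020, 1, 5)), (2, 505, date(2020, 1, 6)),
    (3, 505, date(2020, 1, 10)), (4, 505, date(2020, 1, 20)),
    (5, 101, date(2020, 1, 25)), (6, 101, date(2020, 2, 12)),
    (7, 101, date(2020, 2, 20)), (8, 101, date(2020, 3, 30)),
    (9, 303, date(2020, 1, 28)), (10, 303, date(2020, 2, 2)),
]

for row in categorize(appts):
    print(row)
```

Each row depends on state carried over from the previous row of the same patient, which is why a single window function over the raw dates is not enough.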
Similar class:
Conditional SUM on Oracle - Capping a windowed function
Session window (Azure Stream Analytics)
Running Total until specific condition is true - Quirky update
Addendum
Do not ever use this code in production!
Another option worth mentioning besides a CTE is to use a temp table and update it in "rounds".
It can even be done in a "single" round (quirky update):
CREATE TABLE Appt_temp (ApptID INT , PatientID INT, ApptDate DATE, Category NVARCHAR(10))
INSERT INTO Appt_temp(ApptId, PatientId, ApptDate)
SELECT ApptId, PatientId, ApptDate
FROM Appt1;
CREATE CLUSTERED INDEX Idx_appt ON Appt_temp(PatientID, ApptDate);
Query:
DECLARE @PatientId INT = 0,
@PrevPatientId INT,
@FirstApptDate DATE = NULL;
UPDATE Appt_temp
SET @PrevPatientId = @PatientId
,@PatientId = PatientID
,@FirstApptDate = CASE WHEN @PrevPatientId <> @PatientId THEN ApptDate
WHEN DATEDIFF(DAY, @FirstApptDate, ApptDate)>30 THEN ApptDate
ELSE @FirstApptDate
END
,Category = CASE WHEN @PrevPatientId <> @PatientId THEN 'New'
WHEN @FirstApptDate = ApptDate THEN 'New'
ELSE 'FollowUp'
END
FROM Appt_temp WITH(INDEX(Idx_appt))
OPTION (MAXDOP 1);
SELECT * FROM Appt_temp ORDER BY PatientId, ApptDate;
You could do this with a recursive cte. You should first order by apptDate within each patient. That can be accomplished by a run-of-the-mill cte.
Then, in the anchor portion of your recursive cte, select the first ordering for each patient, mark the status as 'new', and also record the apptDate as the date of the most recent 'new' record.
In the recursive portion of your recursive cte, step to the next appointment and calculate the difference in days between the present appointment and the most recent 'new' appointment date. If it's greater than 30 days, mark it 'new' and reset the most recent 'new' appointment date. Otherwise mark it as 'follow up' and just pass along the existing 'new' appointment date.
Finally, in the base query, just select the columns you want.
with orderings as (
select *,
rn = row_number() over(
partition by patientId
order by apptDate
)
from #appt1 a
),
markings as (
select apptId,
patientId,
apptDate,
rn,
type = convert(varchar(10),'new'),
dateOfNew = apptDate
from orderings
where rn = 1
union all
select o.apptId, o.patientId, o.apptDate, o.rn,
type = convert(varchar(10),iif(ap.daysSinceNew > 30, 'new', 'follow up')),
dateOfNew = iif(ap.daysSinceNew > 30, o.apptDate, m.dateOfNew)
from markings m
join orderings o
on m.patientId = o.patientId
and m.rn + 1 = o.rn
cross apply (select daysSinceNew = datediff(day, m.dateOfNew, o.apptDate)) ap
)
select apptId, patientId, apptDate, type
from markings
order by patientId, rn;
I should mention that I initially deleted this answer because Abhijeet Khandagale's answer seemed to meet your needs with a simpler query (after reworking it a bit). But with your comment to him about your business requirement and your added sample data, I undeleted mine because I believe this one meets your needs.
I'm not sure this is exactly what you implemented, but another option worth mentioning besides a CTE is to use a temp table and update it in "rounds". We keep updating the temp table until every status is set, building the result iteratively. A simple local variable caps the number of iterations.
Each iteration has two stages.
First, set all Followup values that are close to New records. That's pretty easy to do with the right filter.
Second, among the remaining records that have no status yet, select the first in each group with the same PatientID and mark it New, since the first stage did not process it.
So:
CREATE TABLE #Appt2 (ApptID INT, PatientID INT, ApptDate DATE, AppStatus nvarchar(100))
select * from #Appt1
insert into #Appt2 (ApptID, PatientID, ApptDate, AppStatus)
select a1.ApptID, a1.PatientID, a1.ApptDate, null from #Appt1 a1
declare @limit int = 0;
while (exists(select * from #Appt2 where AppStatus IS NULL) and @limit < 1000)
begin
set @limit = @limit+1;
update a2
set
a2.AppStatus = IIF(exists(
select *
from #Appt2 a
where
0 > DATEDIFF(day, a2.ApptDate, a.ApptDate)
and DATEDIFF(day, a2.ApptDate, a.ApptDate) > -30
and a.ApptID != a2.ApptID
and a.PatientID = a2.PatientID
and a.AppStatus = 'New'
), 'Followup', a2.AppStatus)
from #Appt2 a2
--select * from #Appt2
update a2
set a2.AppStatus = 'New'
from #Appt2 a2 join (select a.*, ROW_NUMBER() over (Partition By PatientId order by ApptId) rn from (select * from #Appt2 where AppStatus IS NULL) a) ar
on a2.ApptID = ar.ApptID
and ar.rn = 1
--select * from #Appt2
end
select * from #Appt2 order by PatientID, ApptDate
drop table #Appt1
drop table #Appt2
Update: read the comment provided by Lukasz. It's by far the smarter way. I leave my answer just as an idea.
I believe a recursive common table expression is a great way to optimize queries by avoiding loops, but in some cases it can lead to bad performance and should be avoided if possible.
I used the code below to solve the issue and tested it with more values, but I encourage you to test it with your real data, too.
WITH DataSource AS
(
SELECT *
,CEILING(DATEDIFF(DAY, MIN([ApptDate]) OVER (PARTITION BY [PatientID]), [ApptDate]) * 1.0 / 30 + 0.000001) AS [GroupID]
FROM #Appt1
)
SELECT *
,IIF(ROW_NUMBER() OVER (PARTITION BY [PatientID], [GroupID] ORDER BY [ApptDate]) = 1, 'New', 'Followup')
FROM DataSource
ORDER BY [PatientID]
,[ApptDate];
The idea is pretty simple: I want to separate the records into groups (of 30 days); in each group the earliest record is New and the others are follow-ups. Check how the statement is built:
SELECT *
,DATEDIFF(DAY, MIN([ApptDate]) OVER (PARTITION BY [PatientID]), [ApptDate])
,DATEDIFF(DAY, MIN([ApptDate]) OVER (PARTITION BY [PatientID]), [ApptDate]) * 1.0 / 30
,CEILING(DATEDIFF(DAY, MIN([ApptDate]) OVER (PARTITION BY [PatientID]), [ApptDate]) * 1.0 / 30 + 0.000001)
FROM #Appt1
ORDER BY [PatientID]
,[ApptDate];
So:
first, we get the first date for each patient and calculate the difference in days to the current date
then, to get groups, we multiply by 1.0 and divide by 30
as for 30, 60, 90, etc. days we get a whole number but want to start a new period, I have added + 0.000001; we also use the CEILING function to get the smallest integer greater than, or equal to, the specified numeric expression
That's it. Having such groups, we simply use ROW_NUMBER to find the first date in each group and mark it New, leaving the rest as follow-ups.
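As a quick check of the arithmetic, here is the group formula in Python. The helper name `group_id` is an assumption for illustration. The sketch only reproduces the CEILING expression; it makes no claim about whether fixed buckets match the business rule.

```python
import math
from datetime import date


def group_id(first_date, appt_date):
    """Fixed 30-day bucket number, counted from the patient's first appointment."""
    days = (appt_date - first_date).days
    # Same arithmetic as the query: the small epsilon pushes exact
    # multiples of 30 (30, 60, 90, ...) into the next bucket.
    return math.ceil(days / 30 + 0.000001)


first = date(2020, 1, 5)
for d in [date(2020, 1, 5), date(2020, 1, 25), date(2020, 2, 4), date(2020, 2, 12)]:
    print(d, (d - first).days, group_id(first, d))
```

For whole-day differences the expression behaves like integer division by 30 plus one, so day 0 to 29 land in group 1, day 30 to 59 in group 2, and so on. Note that these buckets are always anchored to the first appointment and never move.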
With due respect to everybody, and IMHO:
There is not much difference between a WHILE loop and a recursive CTE in terms of RBAR (row-by-agonizing-row processing).
There is not much performance gain from combining a recursive CTE with a window partition function.
Appid should be int identity(1,1), or otherwise an ever-increasing clustered index key.
Apart from other benefits, this also ensures that each successive row's APPDate for a patient is greater.
This way you can work with APPID in your query, which will be more efficient than putting inequality operators like >,< on APPDate.
Putting inequality operators like >,< on APPID will help the SQL optimizer.
Also, there should be two date columns in the table, like
APPDateTime datetime2(0) not null,
Appdate date not null
As these are the most important columns in the most important table, this avoids a lot of CAST and CONVERT.
A nonclustered index can then be created on Appdate:
Create NonClustered index ix_PID_AppDate_App on APP (patientid,APPDate) include(other column which is not i predicate except APPID)
Test my script with other sample data and let me know for which sample data it does not work.
Even if it does not work, I am sure it can be fixed within my script's logic.
CREATE TABLE #Appt1 (ApptID INT, PatientID INT, ApptDate DATE)
INSERT INTO #Appt1
SELECT 1,101,'2020-01-05' UNION ALL
SELECT 2,505,'2020-01-06' UNION ALL
SELECT 3,505,'2020-01-10' UNION ALL
SELECT 4,505,'2020-01-20' UNION ALL
SELECT 5,101,'2020-01-25' UNION ALL
SELECT 6,101,'2020-02-12' UNION ALL
SELECT 7,101,'2020-02-20' UNION ALL
SELECT 8,101,'2020-03-30' UNION ALL
SELECT 9,303,'2020-01-28' UNION ALL
SELECT 10,303,'2020-02-02'
;With CTE as
(
select a1.* ,a2.ApptDate as NewApptDate
from #Appt1 a1
outer apply(select top 1 a2.ApptID ,a2.ApptDate
from #Appt1 A2
where a1.PatientID=a2.PatientID and a1.ApptID>a2.ApptID
and DATEDIFF(day,a2.ApptDate, a1.ApptDate)>30
order by a2.ApptID desc )A2
)
,CTE1 as
(
select a1.*, a2.ApptDate as FollowApptDate
from CTE A1
outer apply(select top 1 a2.ApptID ,a2.ApptDate
from #Appt1 A2
where a1.PatientID=a2.PatientID and a1.ApptID>a2.ApptID
and DATEDIFF(day,a2.ApptDate, a1.ApptDate)<=30
order by a2.ApptID desc )A2
)
select *
,case when FollowApptDate is null then 'New'
when NewApptDate is not null and FollowApptDate is not null
and DATEDIFF(day,NewApptDate, FollowApptDate)<=30 then 'New'
else 'Followup' end
as Category
from cte1 a1
order by a1.PatientID
drop table #Appt1
Although it's not clearly addressed in the question, it's easy to figure out that the appointment dates cannot be simply categorized by 30-day groups. It makes no business sense. And you cannot use the appt id either. One can make a new appointment today for 2020-09-06.
Here is how I address this issue. First, get the first appointment, then calculate the date difference between each appointment and the first appt. If it's 0, set to 'New'. If <= 30 'Followup'. If > 30, set as 'Undecided' and do the next round check until there is no more 'Undecided'. And for that, you really need a while loop, but it does not loop through each appointment date, rather only a few datasets. I checked the execution plan. Even though there are only 10 rows, the query cost is significantly lower than that using recursive CTE, but not as low as Lukasz Szozda's addendum method.
IF OBJECT_ID('tempdb..#TEMPTABLE') IS NOT NULL DROP TABLE #TEMPTABLE
SELECT ApptID, PatientID, ApptDate
,CASE WHEN (DATEDIFF(DAY, MIN(ApptDate) OVER (PARTITION BY PatientID), ApptDate) = 0) THEN 'New'
WHEN (DATEDIFF(DAY, MIN(ApptDate) OVER (PARTITION BY PatientID), ApptDate) <= 30) THEN 'Followup'
ELSE 'Undecided' END AS Category
INTO #TEMPTABLE
FROM #Appt1
WHILE EXISTS(SELECT TOP 1 * FROM #TEMPTABLE WHERE Category = 'Undecided') BEGIN
;WITH CTE AS (
SELECT ApptID, PatientID, ApptDate
,CASE WHEN (DATEDIFF(DAY, MIN(ApptDate) OVER (PARTITION BY PatientID), ApptDate) = 0) THEN 'New'
WHEN (DATEDIFF(DAY, MIN(ApptDate) OVER (PARTITION BY PatientID), ApptDate) <= 30) THEN 'Followup'
ELSE 'Undecided' END AS Category
FROM #TEMPTABLE
WHERE Category = 'Undecided'
)
UPDATE #TEMPTABLE
SET Category = CTE.Category
FROM #TEMPTABLE t
LEFT JOIN CTE ON CTE.ApptID = t.ApptID
WHERE t.Category = 'Undecided'
END
SELECT ApptID, PatientID, ApptDate, Category
FROM #TEMPTABLE
I hope this will help you.
WITH CTE AS
(
SELECT #Appt1.*, RowNum = ROW_NUMBER() OVER (PARTITION BY PatientID ORDER BY ApptDate, ApptID) FROM #Appt1
)
SELECT A.ApptID , A.PatientID , A.ApptDate ,
Expected_Category = CASE WHEN (DATEDIFF(MONTH, B.ApptDate, A.ApptDate) > 0) THEN 'New'
WHEN (DATEDIFF(DAY, B.ApptDate, A.ApptDate) <= 30) then 'Followup'
ELSE 'New' END
FROM CTE A
LEFT OUTER JOIN CTE B on A.PatientID = B.PatientID
AND A.rownum = B.rownum + 1
ORDER BY A.PatientID, A.ApptDate
You could use a Case statement.
select
*,
CASE
WHEN DATEDIFF(d,A1.ApptDate,A2.ApptDate)>30 THEN 'New'
ELSE 'FollowUp'
END 'Category'
from
(SELECT PatientId, MIN(ApptId) 'ApptId', MIN(ApptDate) 'ApptDate' FROM #Appt1 GROUP BY PatientID) A1,
#Appt1 A2
where
A1.PatientID=A2.PatientID AND A1.ApptID<A2.ApptID
The question is, should this category be assigned based on the initial appointment or the prior one? That is, if a patient has had three appointments, should we compare the third appointment to the first or to the second?
Your problem states the first, which is how I've answered. If that's not the case, you'll want to use LAG.
Also, keep in mind that DATEDIFF makes no exception for weekends. If this should count weekdays only, you'll need to create your own scalar-valued function.
using Lag function
select apptID, PatientID , Apptdate ,
case when date_diff IS NULL THEN 'NEW'
when date_diff < 30 and (date_diff_2 IS NULL or date_diff_2 < 30) THEN 'Follow Up'
ELSE 'NEW'
END AS STATUS FROM
(
select
apptID, PatientID , Apptdate ,
DATEDIFF (day,lag(Apptdate) over (PARTITION BY PatientID order by ApptID asc),Apptdate) date_diff ,
DATEDIFF(day,lag(Apptdate,2) over (PARTITION BY PatientID order by ApptID asc),Apptdate) date_diff_2
from #Appt1
) SRC
Demo --> https://rextester.com/TNW43808
with cte
as
(
select
tmp.*,
IsNull(Lag(ApptDate) Over (partition by PatientID Order by PatientID,ApptDate),ApptDate) PriorApptDate
from #Appt1 tmp
)
select
PatientID,
ApptDate,
PriorApptDate,
DateDiff(d,PriorApptDate,ApptDate) Elapsed,
Case when DateDiff(d,PriorApptDate,ApptDate)>30
or DateDiff(d,PriorApptDate,ApptDate)=0 then 'New' else 'Followup' end Category from cte
Mine is correct. The author's expected output was incorrect; see the Elapsed column.

Insert missing dates into existing table

I have a query that finds missing dates from a table.
The query is:
;WITH NullGaps AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY ChannelName, ReadingDate) AS ID,
SerialNumber, ReadingDate, ChannelName, uid
FROM
[UriData]
)
SELECT
(DATEDIFF(MINUTE, g1.ReadingDate , g2.ReadingDate) / 15) -1 AS 'MissingCount',
g1.ReadingDate AS 'FromDate', g2.ReadingDate AS 'ToDate'
FROM
NullGaps g1
INNER JOIN
NullGaps g2 ON g1.ID = (g2.ID - 1)
WHERE
DATEADD(MINUTE, 15, g1.ReadingDate) < g2.ReadingDate
The output is:
--------------------------------------------------------------
| MissingCount | FromDate | ToDate |
--------------------------------------------------------------
| 2 | 2018-09-20 14:30:00 | 2018-09-20 15:15:00 |
| 1 | 2018-09-20 15:30:00 | 2018-09-20 16:00:00 |
| 1 | 2018-09-20 20:30:00 | 2018-09-20 21:00:00 |
--------------------------------------------------------------
The output is the number of datetimes that are missing from the FromDate to the ToDate (which both exist). For example, in the first row of the output (above), the times I want to create and insert will be '2018-09-20 14:45:00' and '2018-09-20 15:00:00' (they are all 15-minute intervals)
I need to understand, how I now create the new dates and insert them into an existing table. I can create one date, but I can't create dates where there are multiple missing values between two times.
TIA
What if you also want to find the missing datetimes at the start and the end of a date?
Then comparing against generated datetimes should be a viable method.
Such datetimes can be generated via a recursive CTE.
Then you can join your data to the recursive CTE and select those that are missing.
Or use NOT EXISTS.
For example:
WITH RCTE AS
(
select [SerialNumber], [ChannelName], 0 as Lvl, cast(cast([ReadingDate] as date) as datetime) as ReadingDate
from [UriData]
group by SerialNumber, [ChannelName], cast([ReadingDate] as date)
union all
select [SerialNumber], [ChannelName], Lvl + 1, DATEADD(MINUTE,15,[ReadingDate])
from RCTE
where cast([ReadingDate] as date) = cast(DATEADD(MINUTE,15,[ReadingDate]) as date)
)
SELECT [SerialNumber], [ChannelName], [ReadingDate] AS FromDate
FROM RCTE r
WHERE NOT EXISTS
(
select 1
from [UriData] t
where t.[SerialNumber] = r.[SerialNumber]
and t.[ChannelName] = r.[ChannelName]
and t.[ReadingDate] = r.[ReadingDate]
);
And here's another query that takes a different approach:
WITH CTE AS
(
SELECT SerialNumber, ChannelName, ReadingDate,
LAG(ReadingDate) OVER (PARTITION BY SerialNumber, ChannelName ORDER BY ReadingDate) AS prevReadingDate
FROM [UriData]
)
, RCTE AS
(
select SerialNumber, ChannelName, 0 as Lvl,
prevReadingDate AS ReadingDate,
prevReadingDate AS MinReadingDate,
ReadingDate AS MaxReadingDate
from CTE
where DATEDIFF(MINUTE, prevReadingDate, ReadingDate) > 15
union all
select SerialNumber, ChannelName, Lvl + 1,
DATEADD(MINUTE,15,ReadingDate),
MinReadingDate,
MaxReadingDate
from RCTE
where ReadingDate < DATEADD(MINUTE,-15,MaxReadingDate)
)
select SerialNumber, ChannelName,
ReadingDate AS FromDate,
DATEADD(MINUTE,15,ReadingDate) AS ToDate,
dense_rank() over (partition by SerialNumber, ChannelName order by MinReadingDate) as GapRank,
(DATEDIFF(MINUTE, MinReadingDate, MaxReadingDate) / 15) AS TotalMissingQuarterGaps
from RCTE
where Lvl > 0 AND MinReadingDate < MaxReadingDate
ORDER BY SerialNumber, ChannelName, MinReadingDate;
I don't understand your query for calculating missing values. Your question doesn't have sample data or explain the logic. I'm pretty sure that lag() would be much simpler.
But given your query (or any other), one method to expand out the data is to use a recursive CTE:
with missing as (<your query here>),
cte as (
select dateadd(minute, 15, fromdate) as dte, missingcount - 1 as missingcount
from missing
union all
select dateadd(minute, 15, dte), missingcount - 1
from cte
where missingcount > 0
)
select *
from cte;
If you have more than 100 missing times in one row, then add option (maxrecursion 0) to the end of the query.
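For comparison, the expansion step itself is just "keep adding 15 minutes until you reach the next existing reading". A minimal Python sketch of that step follows; the function name `missing_readings` and the `step` parameter are illustrative assumptions.

```python
from datetime import datetime, timedelta


def missing_readings(from_date, to_date, step=timedelta(minutes=15)):
    """Timestamps strictly between two existing readings, one per step."""
    missing = []
    current = from_date + step
    while current < to_date:
        missing.append(current)
        current += step
    return missing


# First row of the question's output: 2 readings are missing.
gap = missing_readings(datetime(2018, 9, 20, 14, 30), datetime(2018, 9, 20, 15, 15))
for reading in gap:
    print(reading)
```

Making the interval a parameter keeps the sketch usable for devices that report at a different rate.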
Based on the information shared with me, I did the following which does what I need.
The first part is to find the date ranges that are missing by finding the from and to dates that have missing dates between them, then insert them into a table for auditing, but it will hold the missing dates I am looking for:
;WITH NullGaps AS(
SELECT ROW_NUMBER() OVER (ORDER BY ChannelName, ReadingDate) AS ID,SerialNumber, ReadingDate, ChannelName, uid
FROM [Staging].[UriData]
)
INSERT INTO [Staging].[MissingDates]
SELECT (DATEDIFF(MINUTE, g1.ReadingDate , g2.ReadingDate) / 15) -1 AS 'MissingCount',
g1.ChannelName,
g1.SerialNumber,
g1.ReadingDate AS FromDate,
g2.ReadingDate AS ToDate
FROM NullGaps g1
INNER JOIN NullGaps g2
ON g1.ID = (g2.ID - 1)
WHERE DATEADD(MINUTE, 15, g1.ReadingDate) < g2.ReadingDate
AND g1.ChannelName IN (SELECT ChannelName FROM staging.ActiveChannels)
AND NOT EXISTS(
SELECT 1 FROM [Staging].[MissingDates] m
WHERE m.Channel = g1.ChannelName
AND m.Serial = g1.SerialNumber
AND m.FromDate = g1.ReadingDate
AND m.ToDate = g2.ReadingDate
)
Now that I have the ranges to look for, I can now create the missing dates and insert them into the table that holds real data.
;WITH MissingDateTime AS(
SELECT DATEADD(MINUTE, 15, FromDate) AS dte, MissingCount -1 AS MissingCount, Serial, Channel
FROM [Staging].[MissingDates]
UNION ALL
SELECT DATEADD(MINUTE, 15, dte), MissingCount - 1, Serial, Channel
FROM MissingDateTime
WHERE MissingCount > 0
) -- END CTE
INSERT INTO [Staging].[UriData]
SELECT NEWID(), Serial, Channel, '999', '0', dte, CURRENT_TIMESTAMP, 0,1,0 FROM MissingDateTime m
WHERE NOT EXISTS(
SELECT 1 FROM [Staging].[UriData] u
WHERE u.ChannelName = m.Channel
AND u.SerialNumber = m.Serial
AND u.ReadingDate = m.dte
) -- END SELECT
I am sure you can offer improvements to this. This solution finds only the missing dates and allows me to backfill my data table with only the missing dates. I can also change the intervals later should other devices need different intervals. I have put the queries in two separate SPROCs so I can control both aspects: one for auditing and one for backfilling.

Filter LEFT JOINed table with dates to display current event, else future, else past?

I have a table that lists vacation information for different users (username, vacation start, and vacation end dates) -- 4 users are listed below:
Username VacationStart DeploymentEnd
rsuarez 2014-03-10 2014-03-26
studd 2014-01-18 2014-01-29
studd 2014-02-11 2014-02-26
studd 2014-03-02 2014-03-04
ssteele 2014-03-11 2014-03-26
ssteele 2014-03-18 2014-03-28
atidball 2014-03-05 2014-03-20
atidball 2014-03-06 2014-03-26
atidball 2014-03-13 2014-03-20
atidball 2014-03-18 2014-03-31
For a new query, I want to display only 4 rows, with each user having only one set of vacation dates displayed: the current/in-progress vacation, else the future/next vacation (if no current one exists), else the most recent one (if neither of the above exists).
The end result should be the following (assuming today is 3/9/2014):
Username VacationStart DeploymentEnd
rsuarez 2014-03-10 2014-03-26
studd 2014-03-02 2014-03-04
ssteele 2014-03-11 2014-03-26
atidball 2014-03-05 2014-03-20
Vacation dates are actually coming from another table (data_vacations), which I left join to data_users. I am trying to perform the case selection inside the LEFT JOIN.
Here is what I tried before, but my logic fails, since I end up pairing vacation start dates with end dates from different vacations:
SELECT Username, VacationStart, VacationEnd
FROM data_users
LEFT JOIN
(
SELECT userGUID,
CASE WHEN MIN(CASE WHEN (VacationEnd < getdate()) THEN NULL ELSE VacationStart END) IS NULL THEN MAX(VacationStart)
ELSE MIN(VacationStart) END AS VacationStart,
CASE WHEN MIN(CASE WHEN (VacationEnd < getdate()) THEN NULL ELSE VacationEnd END) IS NULL THEN MAX(VacationEnd)
ELSE MIN(VacationEnd) END AS VacationEnd
FROM data_vacations
GROUP BY userGUID
) b ON (data_users.userGUID = b.userGUID)
What am I doing wrong? How could I fix it?
Also, on a side note: am I performing this filtering in the LEFT JOIN correctly? data_users is much bigger and has distinct user ids, and I would like to join the available vacation information as in the example above while still displaying all unique user ids.
Using a common table expression, you can rank the rows by category (current = 1, future = 2, past = 3) and, within each category, by start date or difference from GETDATE(), then pick the top row per user with ROW_NUMBER():
DECLARE @DATE DATETIME = GETDATE()
;WITH cte AS (
SELECT *, 1 r, VacationStart s FROM data_users
WHERE @DATE BETWEEN VacationStart and DeploymentEnd
UNION ALL
SELECT *,2 r, VacationStart - @DATE s FROM data_users
WHERE VacationStart > @DATE
UNION ALL
SELECT *,3 r, @DATE - DeploymentEnd s FROM data_users
WHERE DeploymentEnd < @DATE
), cte2 AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY username ORDER BY r,s) rn FROM cte
)
SELECT Username, VacationStart, DeploymentEnd FROM cte2 WHERE rn=1;
An SQLfiddle to test with.
Getting the date as a variable is necessary to get a consistent GETDATE() value over the whole query, otherwise it may not be consistent if called multiple times.
select u.name, s.startdate, s.enddate
from users u
left join
(
select su.name,
max(su.[start]) as startdate,
max(su.[end]) as enddate from users su group by su.name
) s on u.name = s.name
group by u.name, s.startdate, s.enddate
Since you are asking two questions, I will answer the one about getting the vacation dates and let you figure out the join.
I don't think you can get the desired vacation dates in one simple query. First you need to establish whether a given date range is in the past, present or future. Then you need to order those ranges by start/end dates to get the most recent or the next upcoming one: sort the past vacations in descending order and the upcoming ones in ascending order. Funnily enough, user atidball has two vacations in progress; I sorted those in the same manner as future vacations. Finally, apply your rules; I did that by sorting by state.
declare @currentDate date = '20140309'
;
with cte1 as
(
-- state: the lower number the higher priority
select Username, VacationStart, DeploymentEnd,
case
when VacationStart <= @currentDate and DeploymentEnd >= @currentDate
then 0 -- in progress
when VacationStart > @currentDate
then 1 -- future
when DeploymentEnd < @currentDate
then 2 -- past
else NULL
end as state
from data_vacations
)
, cte2 as
(
select *,
row_number() over(partition by username, state order by VacationStart, DeploymentEnd) as rn
from cte1
where state < 2 -- current or upcoming
union all
select *,
row_number() over(partition by username, state order by DeploymentEnd desc, VacationStart desc) as rn
from cte1
where state = 2 -- past
)
, cte3 as
(
-- apply the rules: find the record with highest priority
select Username, min(state) as minstate
from cte1
group by Username
)
select cte2.Username, cte2.VacationStart, cte2.DeploymentEnd
from cte2
inner join cte3
on cte2.Username = cte3.Username
and cte2.state = cte3.minstate
and cte2.rn = 1 -- most recent or next upcoming
See the SQLFiddle.
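For the join that was left open, here is one possible sketch that folds the same priority rules into a single ROW_NUMBER() and then left joins the result to the user table. It assumes the column names from the question's own query (userGUID, VacationStart and VacationEnd in data_vacations, Username in data_users); adjust them if the real schema differs.

```sql
DECLARE @currentDate DATE = '20140309';

WITH ranked AS (
    SELECT v.userGUID, v.VacationStart, v.VacationEnd,
           ROW_NUMBER() OVER (
               PARTITION BY v.userGUID
               ORDER BY
                   -- Priority: 0 = in progress, 1 = future, 2 = past.
                   CASE WHEN v.VacationStart <= @currentDate
                             AND v.VacationEnd >= @currentDate THEN 0
                        WHEN v.VacationStart > @currentDate THEN 1
                        ELSE 2 END,
                   -- Current/future: earliest start first; past: latest end first.
                   CASE WHEN v.VacationEnd >= @currentDate
                        THEN DATEDIFF(DAY, @currentDate, v.VacationStart)
                        ELSE DATEDIFF(DAY, v.VacationEnd, @currentDate) END
           ) AS rn
    FROM data_vacations v
)
SELECT u.Username, r.VacationStart, r.VacationEnd
FROM data_users u
LEFT JOIN ranked r
       ON r.userGUID = u.userGUID
      AND r.rn = 1;
```

Keeping the rn = 1 filter in the ON clause rather than in a WHERE clause means users without any vacation rows still appear, with NULL dates.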

Finding overlapping dates

I have a set of meeting rooms and meetings held in them, each meeting having a start date and an end date. A set of meeting rooms belongs to a building.
The meeting details are kept in the MeetingDetail table, which has a startDate and an endDate.
Now I want to run a report between two points in time, say reportStartDate and reportEndDate, that finds the time slots in which all the meeting rooms of a given building are booked.
Table structure
MEETING_ROOM - ID, ROOMNAME, BUILDING_NO
MEETING_DETAIL - ID, MEETING_ROOM_ID, START_DATE, END_DATE
The query has to be run for reportStartDate and reportEndDate.
Just to clarify further, the aim is to find all the time slots in which all the meeting rooms were booked within the period from reportStartDate to reportEndDate.
For SQL Server 2005+ you could try the following (see note at the end for mysql)
WITH TIME_POINTS (POINT_P) AS
(SELECT DISTINCT START_DATE FROM MEETING_DETAIL
WHERE START_DATE > @reportStartDate AND START_DATE < @reportEndDate
UNION SELECT DISTINCT END_DATE FROM MEETING_DETAIL
WHERE END_DATE > @reportStartDate AND END_DATE < @reportEndDate
UNION SELECT @reportEndDate
UNION SELECT @reportStartDate),
TIME_SLICE (START_T, END_T) AS
(SELECT A.POINT_P, MIN(B.POINT_P) FROM -- pair each point with the next later one
TIME_POINTS A
INNER JOIN TIME_POINTS B ON B.POINT_P > A.POINT_P
GROUP BY A.POINT_P),
SLICE_MEETINGS (START_T, END_T, MEETING_ROOM_ID, BUILDING_NO) AS
(SELECT A.START_T, A.END_T, B.MEETING_ROOM_ID, C.BUILDING_NO FROM
TIME_SLICE A
INNER JOIN MEETING_DETAIL B ON B.START_DATE <= A.START_T AND B.END_DATE >= A.END_T
INNER JOIN MEETING_ROOM C ON B.MEETING_ROOM_ID = C.ID),
SLICE_COUNT (START_T, END_T, BUILDING_NO, ROOMS_C) AS
(SELECT START_T, END_T, BUILDING_NO, COUNT(MEETING_ROOM_ID) FROM
SLICE_MEETINGS
GROUP BY START_T, END_T, BUILDING_NO),
ROOMS_BUILDING (BUILDING_NO, ROOMS_C) AS
(SELECT BUILDING_NO, COUNT(ID) FROM
MEETING_ROOM
GROUP BY BUILDING_NO)
SELECT B.BUILDING_NO, A.START_T, A.END_T
FROM SLICE_COUNT A
INNER JOIN ROOMS_BUILDING B ON A.BUILDING_NO = B.BUILDING_NO AND B.ROOMS_C = A.ROOMS_C;
What it does is (each step corresponds to one CTE definition above):
1. Get all the time markers, i.e. end or start times.
2. Get all time slices, i.e. the smallest units of time between which there is no other time marker (no meeting starts or ends inside a time slice, only at its beginning or at its end).
3. Get the meetings for each time slice, so now you get something like
   10.30 11.00 Room1 BuildingA
   10.30 11.00 Room2 BuildingA
   11.00 12.00 Room1 BuildingA
4. Get the count of rooms booked per building per time slice.
5. Keep only the time-slice/building combinations whose booked-room count matches the number of rooms in the building.
Edit
Since MySQL before version 8.0 doesn't support the WITH clause, on those versions you'll have to construct a view for each of the 5 CTEs above. Everything else remains the same.
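To see how the time-slice step pairs each marker with the next later one, here is a small stand-alone sketch. The @points table variable and its five sample markers are made up for illustration; the self-join with MIN() is the technique the time-slice step relies on.

```sql
-- Hypothetical time markers: report start, meeting boundaries, report end.
DECLARE @points TABLE (POINT_P DATETIME);
INSERT INTO @points VALUES
    ('2010-08-01 08:00'),
    ('2010-08-01 09:00'),
    ('2010-08-01 10:00'),
    ('2010-08-01 11:00'),
    ('2010-08-01 12:00');

-- For every marker A, the slice ends at the earliest marker B after it.
SELECT A.POINT_P      AS START_T,
       MIN(B.POINT_P) AS END_T
FROM @points A
INNER JOIN @points B
        ON B.POINT_P > A.POINT_P
GROUP BY A.POINT_P
ORDER BY START_T;
-- Slices: 08:00-09:00, 09:00-10:00, 10:00-11:00, 11:00-12:00.
```

The last marker (12:00) has no later partner, so the inner join drops it and no open-ended slice is produced.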
After reading your comment, I think I understand the problem a bit better. As a first step I would generate a matrix of meeting rooms and time slots using cross join:
select *
from (
select distinct start_date
, end_date
from @meeting_detail
) ts
cross join
@meeting_room mr
Then, for each cell in the matrix, add meetings in that timeslot:
left join
@meeting_detail md
on mr.id = md.meeting_room_id
and ts.start_date < md.end_date
and md.start_date < ts.end_date
And then demand that there are no free rooms, for example by requiring that the left join succeeds for every room in every time slot. The left join succeeded for a row if md.meeting_room_id is not null:
group by
mr.building_no
, ts.start_date
, ts.end_date
having max(case when md.meeting_room_id is null
then 1 else 0 end) = 0
Here's a complete working example. It's written for SQL Server, and the table variables (@meeting_detail) won't work in MySQL. But the report-generating query should work in most databases:
set nocount on
declare @meeting_room table (id int, roomname varchar(50),
building_no int)
declare @meeting_detail table (meeting_room_id int,
start_date datetime, end_date datetime)
insert @meeting_room (id, roomname, building_no)
select 1, 'Kitchen', 6
union all select 2, 'Ballroom', 6
union all select 3, 'Conservatory', 7
union all select 4, 'Dining Room', 7
insert @meeting_detail (meeting_room_id, start_date, end_date)
select 1, '2010-08-01 9:00', '2010-08-01 10:00'
union all select 1, '2010-08-01 10:00', '2010-08-01 11:00'
union all select 2, '2010-08-01 10:00', '2010-08-01 11:00'
union all select 3, '2010-08-01 10:00', '2010-08-01 11:00'
select mr.building_no
, ts.start_date
, ts.end_date
from (
select distinct start_date
, end_date
from @meeting_detail
) ts
cross join
@meeting_room mr
left join
@meeting_detail md
on mr.id = md.meeting_room_id
and ts.start_date < md.end_date
and md.start_date < ts.end_date
group by
mr.building_no
, ts.start_date
, ts.end_date
having max(case when md.meeting_room_id is null
then 1 else 0 end) = 0
This prints:
building_no start end
6 2010-08-01 10:00:00.000 2010-08-01 11:00:00.000