Iterating through rows to capture the value in the next row - sql

I have been a long time reader of this forum. It has helped me a lot, however I have a question which I cannot find a solution specific to my requirements.
I am given the task to develop a metric to determine how many days the 'Staff Performance Evaulations' are past due. The data comes in the following format:
EmployeeID LastEvalCompleteDate NextEvalDueDate
1001 2010-01-01 2010-11-01
1001 2010-11-20 2011-11-01
1001 2011-10-29 2012-11-15
1002 NULL 2013-12-01
According to the sample data above, the employee 1001 has had 3 evals since 2010-01-01. Employee 1002 has started this year and his first eval is due on 2013-12-01.
What I need to do is to convert the data to this format:
EmployeeID EvalDueDate EvalCompleteDate DaysPastDue
1001 2010-11-01 2010-11-20 19
1001 2011-11-01 2011-10-29 -2
1001 2012-11-15 NULL 342 (based on today's date)
1002 2013-12-01 NULL -39 (based on today's date)
As you noticed, I derive a new row by taking the value of NextEvalDueDate column and mapping it to the EvalDueDate column in my new table. I also take the value in the LastEvalCompleteDate column in the NEXT row and map it to the NextEvalDueDate column.
I am having trouble with iterating through the rows for a given EmployeeID. I tried using ROW_NUMBER() OVER (PARTITION BY ...) but it did not take me anywhere.
I appreciate any kind of help. Thank you.

You went into right direction using ROW_NUMBER() OVER (PARTITION BY ...). Don't know where have you stuck, but it should be something like this:
WITH CTE AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY NextEvalDueDate) RN
FROM dbo.Table1
)
SELECT
c1.EmployeeID
, c1.NextEvalDueDate AS EvalDueDate
, c2.LastEvalCompleteDate AS EvalCompleteDate
, DATEDIFF(DAY, c1.NextEvalDueDate, COALESCE(c2.LastEvalCompleteDate, GETDATE())) AS DaysPastDue
FROM CTE c1
LEFT JOIN CTE c2 ON c1.EmployeeID = c2.EmployeeID AND c1.RN = c2.RN - 1
ORDER BY c1.EmployeeID, c1.RN

DECLARE #Results TABLE
(
EmployeeID INT NOT NULL,
RowNum INT NOT NULL,
PRIMARY KEY (RowNum, EmployeeID),
LastEvalCompleteDate DATE,
NextEvalDueDate DATE
);
INSERT #Results (RowNum, EmployeeID, LastEvalCompleteDate, NextEvalDueDate)
SELECT ROW_NUMBER() OVER(PARTITION BY e.EmployeeID ORDER BY e.LastEvalCompleteDate),
e.EmployeeID,
e.LastEvalCompleteDate,
e.NextEvalDueDate
FROM dbo.EmployeeEvaluation e;
WITH Base
AS
(
SELECT crt.RowNum,
crt.EmployeeID,
crt.NextEvalDueDate AS EvalDueDate,
nxt.LastEvalCompleteDate AS EvalCompleteDate
FROM #Results crt
LEFT JOIN #Results nxt ON crt.EmployeeID = nxt.EmployeeID AND crt.RowNum + 1 = nxt.RowNum
)
SELECT r.*,
DATEDIFF(DAY, r.EvalDueDate, ISNULL(r.EvalCompleteDate, GETDATE())) AS DaysPastDue
FROM Base r
ORDER BY r.EmployeeID, r.RowNum

Related

Query without WHILE Loop

We have appointment table as shown below. Each appointment need to be categorized as "New" or "Followup". Any appointment (for a patient) within 30 days of first appointment (of that patient) is Followup. After 30 days, appointment is again "New". Any appointment within 30 days become "Followup".
I am currently doing this by typing while loop.
How to achieve this without WHILE loop?
Table
CREATE TABLE #Appt1 (ApptID INT, PatientID INT, ApptDate DATE)
INSERT INTO #Appt1
SELECT 1,101,'2020-01-05' UNION
SELECT 2,505,'2020-01-06' UNION
SELECT 3,505,'2020-01-10' UNION
SELECT 4,505,'2020-01-20' UNION
SELECT 5,101,'2020-01-25' UNION
SELECT 6,101,'2020-02-12' UNION
SELECT 7,101,'2020-02-20' UNION
SELECT 8,101,'2020-03-30' UNION
SELECT 9,303,'2020-01-28' UNION
SELECT 10,303,'2020-02-02'
You need to use recursive query.
The 30days period is counted starting from prev(and no it is not possible to do it without recursion/quirky update/loop). That is why all the existing answer using only ROW_NUMBER failed.
WITH f AS (
SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY PatientId ORDER BY ApptDate)
FROM Appt1
), rec AS (
SELECT Category = CAST('New' AS NVARCHAR(20)), ApptId, PatientId, ApptDate, rn, startDate = ApptDate
FROM f
WHERE rn = 1
UNION ALL
SELECT CAST(CASE WHEN DATEDIFF(DAY, rec.startDate,f.ApptDate) <= 30 THEN N'FollowUp' ELSE N'New' END AS NVARCHAR(20)),
f.ApptId,f.PatientId,f.ApptDate, f.rn,
CASE WHEN DATEDIFF(DAY, rec.startDate, f.ApptDate) <= 30 THEN rec.startDate ELSE f.ApptDate END
FROM rec
JOIN f
ON rec.rn = f.rn - 1
AND rec.PatientId = f.PatientId
)
SELECT ApptId, PatientId, ApptDate, Category
FROM rec
ORDER BY PatientId, ApptDate;
db<>fiddle demo
Output:
+---------+------------+-------------+----------+
| ApptId | PatientId | ApptDate | Category |
+---------+------------+-------------+----------+
| 1 | 101 | 2020-01-05 | New |
| 5 | 101 | 2020-01-25 | FollowUp |
| 6 | 101 | 2020-02-12 | New |
| 7 | 101 | 2020-02-20 | FollowUp |
| 8 | 101 | 2020-03-30 | New |
| 9 | 303 | 2020-01-28 | New |
| 10 | 303 | 2020-02-02 | FollowUp |
| 2 | 505 | 2020-01-06 | New |
| 3 | 505 | 2020-01-10 | FollowUp |
| 4 | 505 | 2020-01-20 | FollowUp |
+---------+------------+-------------+----------+
How it works:
f - get starting point(anchor - per every PatientId)
rec - recursibe part, if the difference between current value and prev is > 30 change the category and starting point, in context of PatientId
Main - display sorted resultset
Similar class:
Conditional SUM on Oracle - Capping a windowed function
Session window (Azure Stream Analytics)
Running Total until specific condition is true - Quirky update
Addendum
Do not ever use this code on production!
But another option, that is worth mentioning besides using cte, is to use temp table and update in "rounds"
It could be done in "single" round(quirky update):
CREATE TABLE Appt_temp (ApptID INT , PatientID INT, ApptDate DATE, Category NVARCHAR(10))
INSERT INTO Appt_temp(ApptId, PatientId, ApptDate)
SELECT ApptId, PatientId, ApptDate
FROM Appt1;
CREATE CLUSTERED INDEX Idx_appt ON Appt_temp(PatientID, ApptDate);
Query:
DECLARE #PatientId INT = 0,
#PrevPatientId INT,
#FirstApptDate DATE = NULL;
UPDATE Appt_temp
SET #PrevPatientId = #PatientId
,#PatientId = PatientID
,#FirstApptDate = CASE WHEN #PrevPatientId <> #PatientId THEN ApptDate
WHEN DATEDIFF(DAY, #FirstApptDate, ApptDate)>30 THEN ApptDate
ELSE #FirstApptDate
END
,Category = CASE WHEN #PrevPatientId <> #PatientId THEN 'New'
WHEN #FirstApptDate = ApptDate THEN 'New'
ELSE 'FollowUp'
END
FROM Appt_temp WITH(INDEX(Idx_appt))
OPTION (MAXDOP 1);
SELECT * FROM Appt_temp ORDER BY PatientId, ApptDate;
db<>fiddle Quirky update
You could do this with a recursive cte. You should first order by apptDate within each patient. That can be accomplished by a run-of-the-mill cte.
Then, in the anchor portion of your recursive cte, select the first ordering for each patient, mark the status as 'new', and also mark the apptDate as the date of the most recent 'new' record.
In the recursive portion of your recursive cte, increment to the next appointment, calculate the difference in days between the present appointment and the most recent 'new' appointment date. If it's greater than 30 days, mark it 'new' and reset the most recent new appointment date. Otherwise mark it as 'follow up' and just pass along the existing days since new appointment date.
Finallly, in the base query, just select the columns you want.
with orderings as (
select *,
rn = row_number() over(
partition by patientId
order by apptDate
)
from #appt1 a
),
markings as (
select apptId,
patientId,
apptDate,
rn,
type = convert(varchar(10),'new'),
dateOfNew = apptDate
from orderings
where rn = 1
union all
select o.apptId, o.patientId, o.apptDate, o.rn,
type = convert(varchar(10),iif(ap.daysSinceNew > 30, 'new', 'follow up')),
dateOfNew = iif(ap.daysSinceNew > 30, o.apptDate, m.dateOfNew)
from markings m
join orderings o
on m.patientId = o.patientId
and m.rn + 1 = o.rn
cross apply (select daysSinceNew = datediff(day, m.dateOfNew, o.apptDate)) ap
)
select apptId, patientId, apptDate, type
from markings
order by patientId, rn;
I should mention that I initially deleted this answer because Abhijeet Khandagale's answer seemed to meet your needs with a simpler query (after reworking it a bit). But with your comment to him about your business requirement and your added sample data, I undeleted mine because believe this one meets your needs.
I'm not sure that it's exactly what you implemented. But another option, that is worth mentioning besides using cte, is to use temp table and update in "rounds". So we are going to update temp table while all statuses are not set correctly and build result in an iterative way. We can control number of iteration using simply local variable.
So we split each iteration into two stages.
Set all Followup values that are near to New records. That's pretty easy to do just using right filter.
For the rest of the records that dont have status set we can select first in group with same PatientID. And say that they are new since they not processed by the first stage.
So
CREATE TABLE #Appt2 (ApptID INT, PatientID INT, ApptDate DATE, AppStatus nvarchar(100))
select * from #Appt1
insert into #Appt2 (ApptID, PatientID, ApptDate, AppStatus)
select a1.ApptID, a1.PatientID, a1.ApptDate, null from #Appt1 a1
declare #limit int = 0;
while (exists(select * from #Appt2 where AppStatus IS NULL) and #limit < 1000)
begin
set #limit = #limit+1;
update a2
set
a2.AppStatus = IIF(exists(
select *
from #Appt2 a
where
0 > DATEDIFF(day, a2.ApptDate, a.ApptDate)
and DATEDIFF(day, a2.ApptDate, a.ApptDate) > -30
and a.ApptID != a2.ApptID
and a.PatientID = a2.PatientID
and a.AppStatus = 'New'
), 'Followup', a2.AppStatus)
from #Appt2 a2
--select * from #Appt2
update a2
set a2.AppStatus = 'New'
from #Appt2 a2 join (select a.*, ROW_NUMBER() over (Partition By PatientId order by ApptId) rn from (select * from #Appt2 where AppStatus IS NULL) a) ar
on a2.ApptID = ar.ApptID
and ar.rn = 1
--select * from #Appt2
end
select * from #Appt2 order by PatientID, ApptDate
drop table #Appt1
drop table #Appt2
Update. Read the comment provided by Lukasz. It's by far smarter way. I leave my answer just as an idea.
I believe the recursive common expression is great way to optimize queries avoiding loops, but in some cases it can lead to bad performance and should be avoided if possible.
I use the code below to solve the issue and test it will more values, but encourage you to test it with your real data, too.
WITH DataSource AS
(
SELECT *
,CEILING(DATEDIFF(DAY, MIN([ApptDate]) OVER (PARTITION BY [PatientID]), [ApptDate]) * 1.0 / 30 + 0.000001) AS [GroupID]
FROM #Appt1
)
SELECT *
,IIF(ROW_NUMBER() OVER (PARTITION BY [PatientID], [GroupID] ORDER BY [ApptDate]) = 1, 'New', 'Followup')
FROM DataSource
ORDER BY [PatientID]
,[ApptDate];
The idea is pretty simple - I want separate the records in group (30 days), in which group the smallest record is new, the others are follow ups. Check how the statement is built:
SELECT *
,DATEDIFF(DAY, MIN([ApptDate]) OVER (PARTITION BY [PatientID]), [ApptDate])
,DATEDIFF(DAY, MIN([ApptDate]) OVER (PARTITION BY [PatientID]), [ApptDate]) * 1.0 / 30
,CEILING(DATEDIFF(DAY, MIN([ApptDate]) OVER (PARTITION BY [PatientID]), [ApptDate]) * 1.0 / 30 + 0.000001)
FROM #Appt1
ORDER BY [PatientID]
,[ApptDate];
So:
first, we are getting the first date, for each group and calculating the differences in days with the current one
then, we are want to get groups - * 1.0 / 30 is added
as for 30, 60, 90, etc days we are getting whole number and we wanted to start a new period, I have added + 0.000001; also, we are using ceiling function to get the smallest integer greater than, or equal to, the specified numeric expression
That's it. Having such group we simply use ROW_NUMBER to find our start date and make it as new and leaving the rest as follow ups.
With due respect to everybody and in IMHO,
There is not much difference between While LOOP and Recursive CTE in terms of RBAR
There is not much performance gain when using Recursive CTE and Window Partition function all in one.
Appid should be int identity(1,1) , or it should be ever increasing clustered index.
Apart from other benefit it also ensure that all successive row APPDate of that patient must be greater.
This way you can easily play with APPID in your query which will be more efficient than putting inequality operator like >,< in APPDate.
Putting inequality operator like >,< in APPID will aid Sql Optimizer.
Also there should be two date column in table like
APPDateTime datetime2(0) not null,
Appdate date not null
As these are most important columns in most important table,so not much cast ,convert.
So Non clustered index can be created on Appdate
Create NonClustered index ix_PID_AppDate_App on APP (patientid,APPDate) include(other column which is not i predicate except APPID)
Test my script with other sample data and lemme know for which sample data it not working.
Even if it do not work then I am sure it can be fix in my script logic itself.
CREATE TABLE #Appt1 (ApptID INT, PatientID INT, ApptDate DATE)
INSERT INTO #Appt1
SELECT 1,101,'2020-01-05' UNION ALL
SELECT 2,505,'2020-01-06' UNION ALL
SELECT 3,505,'2020-01-10' UNION ALL
SELECT 4,505,'2020-01-20' UNION ALL
SELECT 5,101,'2020-01-25' UNION ALL
SELECT 6,101,'2020-02-12' UNION ALL
SELECT 7,101,'2020-02-20' UNION ALL
SELECT 8,101,'2020-03-30' UNION ALL
SELECT 9,303,'2020-01-28' UNION ALL
SELECT 10,303,'2020-02-02'
;With CTE as
(
select a1.* ,a2.ApptDate as NewApptDate
from #Appt1 a1
outer apply(select top 1 a2.ApptID ,a2.ApptDate
from #Appt1 A2
where a1.PatientID=a2.PatientID and a1.ApptID>a2.ApptID
and DATEDIFF(day,a2.ApptDate, a1.ApptDate)>30
order by a2.ApptID desc )A2
)
,CTE1 as
(
select a1.*, a2.ApptDate as FollowApptDate
from CTE A1
outer apply(select top 1 a2.ApptID ,a2.ApptDate
from #Appt1 A2
where a1.PatientID=a2.PatientID and a1.ApptID>a2.ApptID
and DATEDIFF(day,a2.ApptDate, a1.ApptDate)<=30
order by a2.ApptID desc )A2
)
select *
,case when FollowApptDate is null then 'New'
when NewApptDate is not null and FollowApptDate is not null
and DATEDIFF(day,NewApptDate, FollowApptDate)<=30 then 'New'
else 'Followup' end
as Category
from cte1 a1
order by a1.PatientID
drop table #Appt1
Although it's not clearly addressed in the question, it's easy to figure out that the appointment dates cannot be simply categorized by 30-day groups. It makes no business sense. And you cannot use the appt id either. One can make a new appointment today for 2020-09-06.
Here is how I address this issue. First, get the first appointment, then calculate the date difference between each appointment and the first appt. If it's 0, set to 'New'. If <= 30 'Followup'. If > 30, set as 'Undecided' and do the next round check until there is no more 'Undecided'. And for that, you really need a while loop, but it does not loop through each appointment date, rather only a few datasets. I checked the execution plan. Even though there are only 10 rows, the query cost is significantly lower than that using recursive CTE, but not as low as Lukasz Szozda's addendum method.
IF OBJECT_ID('tempdb..#TEMPTABLE') IS NOT NULL DROP TABLE #TEMPTABLE
SELECT ApptID, PatientID, ApptDate
,CASE WHEN (DATEDIFF(DAY, MIN(ApptDate) OVER (PARTITION BY PatientID), ApptDate) = 0) THEN 'New'
WHEN (DATEDIFF(DAY, MIN(ApptDate) OVER (PARTITION BY PatientID), ApptDate) <= 30) THEN 'Followup'
ELSE 'Undecided' END AS Category
INTO #TEMPTABLE
FROM #Appt1
WHILE EXISTS(SELECT TOP 1 * FROM #TEMPTABLE WHERE Category = 'Undecided') BEGIN
;WITH CTE AS (
SELECT ApptID, PatientID, ApptDate
,CASE WHEN (DATEDIFF(DAY, MIN(ApptDate) OVER (PARTITION BY PatientID), ApptDate) = 0) THEN 'New'
WHEN (DATEDIFF(DAY, MIN(ApptDate) OVER (PARTITION BY PatientID), ApptDate) <= 30) THEN 'Followup'
ELSE 'Undecided' END AS Category
FROM #TEMPTABLE
WHERE Category = 'Undecided'
)
UPDATE #TEMPTABLE
SET Category = CTE.Category
FROM #TEMPTABLE t
LEFT JOIN CTE ON CTE.ApptID = t.ApptID
WHERE t.Category = 'Undecided'
END
SELECT ApptID, PatientID, ApptDate, Category
FROM #TEMPTABLE
I hope this will help you.
WITH CTE AS
(
SELECT #Appt1.*, RowNum = ROW_NUMBER() OVER (PARTITION BY PatientID ORDER BY ApptDate, ApptID) FROM #Appt1
)
SELECT A.ApptID , A.PatientID , A.ApptDate ,
Expected_Category = CASE WHEN (DATEDIFF(MONTH, B.ApptDate, A.ApptDate) > 0) THEN 'New'
WHEN (DATEDIFF(DAY, B.ApptDate, A.ApptDate) <= 30) then 'Followup'
ELSE 'New' END
FROM CTE A
LEFT OUTER JOIN CTE B on A.PatientID = B.PatientID
AND A.rownum = B.rownum + 1
ORDER BY A.PatientID, A.ApptDate
You could use a Case statement.
select
*,
CASE
WHEN DATEDIFF(d,A1.ApptDate,A2.ApptDate)>30 THEN 'New'
ELSE 'FollowUp'
END 'Category'
from
(SELECT PatientId, MIN(ApptId) 'ApptId', MIN(ApptDate) 'ApptDate' FROM #Appt1 GROUP BY PatientID) A1,
#Appt1 A2
where
A1.PatientID=A2.PatientID AND A1.ApptID<A2.ApptID
The question is, should this category be assigned based off the initial appointment, or the one prior? That is, if a Patient has had three appointments, should we compare the third appointment to the first, or the second?
You problem states the first, which is how I've answered. If that's not the case, you'll want to use lag.
Also, keep in mind that DateDiff makes not exception for weekends. If this should be weekdays only, you'll need to create your own Scalar-Valued function.
using Lag function
select apptID, PatientID , Apptdate ,
case when date_diff IS NULL THEN 'NEW'
when date_diff < 30 and (date_diff_2 IS NULL or date_diff_2 < 30) THEN 'Follow Up'
ELSE 'NEW'
END AS STATUS FROM
(
select
apptID, PatientID , Apptdate ,
DATEDIFF (day,lag(Apptdate) over (PARTITION BY PatientID order by ApptID asc),Apptdate) date_diff ,
DATEDIFF(day,lag(Apptdate,2) over (PARTITION BY PatientID order by ApptID asc),Apptdate) date_diff_2
from #Appt1
) SRC
Demo --> https://rextester.com/TNW43808
with cte
as
(
select
tmp.*,
IsNull(Lag(ApptDate) Over (partition by PatientID Order by PatientID,ApptDate),ApptDate) PriorApptDate
from #Appt1 tmp
)
select
PatientID,
ApptDate,
PriorApptDate,
DateDiff(d,PriorApptDate,ApptDate) Elapsed,
Case when DateDiff(d,PriorApptDate,ApptDate)>30
or DateDiff(d,PriorApptDate,ApptDate)=0 then 'New' else 'Followup' end Category from cte
Mine is correct. The authors was incorrect, see elapsed

Find the true start end dates for customers that have multiple accounts in SQL Server 2014

I have a checking account table that contains columns Cust_id (customer id), Open_Date (start date), and Closed_Date (end date). There is one row for each account. A customer can open multiple accounts at any given point. I would like to know how long the person has been a customer.
eg 1:
CREATE TABLE [Cust]
(
[Cust_id] [varchar](10) NULL,
[Open_Date] [date] NULL,
[Closed_Date] [date] NULL
)
insert into [Cust] values ('a123', '10/01/2019', '10/15/2019')
insert into [Cust] values ('a123', '10/12/2019', '11/01/2019')
Ideally I would like to insert this into a table with just one row, that says this person has been a customer from 10/01/2019 to 11/01/2019. (as he opened his second account before he closed his previous one.
Similarly eg 2:
insert into [Cust] values ('b245', '07/01/2019', '09/15/2019')
insert into [Cust] values ('b245', '10/12/2019', '12/01/2019')
I would like to see 2 rows in this case- one that shows he was a customer from 07/01 to 09/15 and then again from 10/12 to 12/01.
Can you point me to the best way to get this?
I would approach this as a gaps and islands problem. You want to group together groups of adjacents rows whose periods overlap.
Here is one way to solve it using lag() and a cumulative sum(). Everytime the open date is greater than the closed date of the previous record, a new group starts.
select
cust_id,
min(open_date) open_date,
max(closed_date) closed_date
from (
select
t.*,
sum(case when not open_date <= lag_closed_date then 1 else 0 end)
over(partition by cust_id order by open_date) grp
from (
select
t.*,
lag(closed_date) over (partition by cust_id order by open_date) lag_closed_date
from cust t
) t
) t
group by cust_id, grp
In this db fiddle with your sample data, the query produces:
cust_id | open_date | closed_date
:------ | :--------- | :----------
a123 | 2019-10-01 | 2019-11-01
b245 | 2019-07-01 | 2019-09-15
b245 | 2019-10-12 | 2019-12-01
I would solve this with recursion. While this is certainly very heavy, it should accommodate even the most complex account timings (assuming your data has such). However, if the sample data provided is as complex as you need to solve for, I highly recommend sticking with the solution provided above. It is much more concise and clear.
WITH x (cust_id, open_date, closed_date, lvl, grp) AS (
SELECT cust_id, open_date, closed_date, 1, 1
FROM (
SELECT cust_id
, open_date
, closed_date
, row_number()
OVER (PARTITION BY cust_id ORDER BY closed_date DESC, open_date) AS rn
FROM cust
) AS t
WHERE rn = 1
UNION ALL
SELECT cust_id, open_date, closed_date, lvl, grp
FROM (
SELECT c.cust_id
, c.open_date
, c.closed_date
, x.lvl + 1 AS lvl
, x.grp + CASE WHEN c.closed_date < x.open_date THEN 1 ELSE 0 END AS grp
, row_number() OVER (PARTITION BY c.cust_id ORDER BY c.closed_date DESC) AS rn
FROM cust c
JOIN x
ON x.cust_id = c.cust_id
AND c.open_date < x.open_date
) AS t
WHERE t.rn = 1
)
SELECT cust_id, min(open_date) AS first_open_date, max(closed_date) AS last_closed_date
FROM x
GROUP BY cust_id, grp
ORDER BY cust_id, grp
I would also add the caveat that I don't run on SQL Server, so there could be syntax differences that I didn't account for. Hopefully they are minor, if present.
you can try something like that:
select distinct
cust_id,
(select min(Open_Date)
from Cust as b
where b.cust_id = a.cust_id and
a.Open_Date <= b.Closed_Date and
a.Closed_Date >= b.Open_Date
),
(select max(Closed_Date)
from Cust as b
where b.cust_id = a.cust_id and
a.Open_Date <= b.Closed_Date and
a.Closed_Date >= b.Open_Date
)
from Cust as a
so, for every row - you're selecting minimal and maximal dates from all overlapping ranges, later distinct filters out duplicates

How to find the time difference between data present in two rows

I have the following table :
Customer_ID PurchaseDatetime
309 2/3/2014 12:29:00
309 2/27/2014 17:11:00
309 4/15/2014 13:24:00
I want to write a query which would calculate the difference between the datetime field of two consecutive rows. Ideally the output should be like
Customer_ID PurchaseDatetime
309 0
309 2/27/2014 17:11:00 - 2/3/2014 12:29:00 // The exact time difference in hours
309 4/15/2014 13:24:00 - 2/27/2014 17:11:00 // The exact time difference in hours
How do I write such a query?
Try this...
CREATE TABLE #Purchases
(
CustomerID INT,
PurchaseDate DATETIME
)
INSERT INTO #Purchases
VALUES
(100004,'2016-05-16 08:00:00'),
(100005,'2016-05-16 09:05:00'),
(100006,'2016-05-16 10:08:40'),
(32141 ,'2016-05-16 11:18:00'),
(84230 ,'2016-05-16 12:25:10'),
(23444 ,'2016-05-16 13:40:00'),
(100001,'2016-05-16 14:50:00')
;WITH CTE AS
(
SELECT
CustomerID,
PurchaseDate,
ROW_NUMBER() OVER (ORDER BY PurchaseDate) AS Seq
FROM #Purchases
)
SELECT
p.CustomerID,
p.PurchaseDate,
pl.PurchaseDate,
DATEADD(SECOND,DATEDIFF(SECOND, pl.PurchaseDate,p.PurchaseDate),0) AS DiffDT,
DATEDIFF(HOUR, pl.PurchaseDate,p.PurchaseDate) HourDiff
FROM CTE AS p
LEFT OUTER JOIN CTE AS pl ON pl.Seq = p.Seq - 1 -- Last batch
ORDER BY p.PurchaseDate
Try This Query...
;with cte as
(select row_number() over(order by (select 100)) Id.Customer_ID,PurchaseDatetime from Table)
select a.Customer_ID,b.PurchaseDatetime-a.PurchaseDatetime from cte a inner join cte b on a.id=b.id-1
Thanks
SELECT *,
PURCHASEDATETIME =
CASE CUSTOMER_ID
WHEN CUSTOMER_ID THEN
DATEDIFF(HH, LAG(PURCHASEDATETIME, 1) OVER(ORDER BY CUSTOMER_ID, PURCHASEDATETIME), PURCHASEDATETIME)
ELSE
NULL
END
FROM table

Filter LEFT JOINed table with dates to display current event, else future, else past?

I have a table that lists vacation information for different users (username, vacation start, and vacation end dates) -- 4 users are listed below:
Username VacationStart DeploymentEnd
rsuarez 2014-03-10 2014-03-26
studd 2014-01-18 2014-01-29
studd 2014-02-11 2014-02-26
studd 2014-03-02 2014-03-04
ssteele 2014-03-11 2014-03-26
ssteele 2014-03-18 2014-03-28
atidball 2014-03-05 2014-03-20
atidball 2014-03-06 2014-03-26
atidball 2014-03-13 2014-03-20
atidball 2014-03-18 2014-03-31
For a new query, I want to display only 4 rows, with each user having only one set of vacation dates displayed, either current/in-progress vacation, future/next vacation (if no current exists) or most recent (if two above are false).
The end result should be following (assuming today is 3/9/2014):
Username VacationStart DeploymentEnd
rsuarez 2014-03-10 2014-03-26
studd 2014-03-02 2014-03-04
ssteele 2014-03-11 2014-03-26
atidball 2014-03-05 2014-03-20
Vacation dates are actually coming from another table (data_vacations), which I left join to data_users. I am trying to perform case selection inside left join statement.
Here is what I tried before, but my logic fails there, since I ended up to mix different vacation end dates to vacation start dates:
SELECT Username, VacationStart, VacationEnd
FROM data_users
LEFT JOIN
(
SELECT userGUID,
CASE WHEN MIN(CASE WHEN (VacationEnd < getdate()) THEN NULL ELSE VacationStart END) IS NULL THEN MAX(VacationStart)
ELSE MIN(VacationStart) END AS VacationStart,
CASE WHEN MIN(CASE WHEN (VacationEnd < getdate()) THEN NULL ELSE VacationEnd END) IS NULL THEN MAX(VacationEnd)
ELSE MIN(VacationEnd) END AS VacationEnd
FROM data_vacations
GROUP BY userGUID
) b ON(data_empl_master.userGUID= b.userGUID)
What am I doing wrong? How could I fix it?
Also.. on side note.. Do I perform this filtering in LEFT JOIN correctly? Since data_users is much bigger, having distinct user ids... and I would like to join the available vacation information based on example above, while still displaying all unique user ids.
Using a common table expression to rank by category (current = 1, future = 2, past = 3) and each category individually by start date/differene from GETDATE(), you can get the result you want by ranking the result using ROW_NUMBER();
DECLARE #DATE DATETIME = GETDATE()
;WITH cte AS (
SELECT *, 1 r, VacationStart s FROM data_users
WHERE #DATE BETWEEN VacationStart and DeploymentEnd
UNION ALL
SELECT *,2 r, VacationStart - #DATE s FROM data_users
WHERE VacationStart > #DATE
UNION ALL
SELECT *,3 r, #DATE - DeploymentEnd s FROM data_users
WHERE DeploymentEnd < #DATE
), cte2 AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY username ORDER BY r,s) rn FROM cte
)
SELECT Username, VacationStart, DeploymentEnd FROM cte2 WHERE rn=1;
An SQLfiddle to test with.
Getting the date as a variable is necessary to get a consistent GETDATE() value over the whole query, otherwise it may not be consistent if called multiple times.
select u.name,s.startdate,s.enddate
from users u
left join
(
select su.name,
max(su.start) as startdate,
max(su.end) as enddate from users su group by su.name
)s on u.name= s.name
group by u.name
Since you are asking two questions I will answer the one about getting the vacation dates and let you figure out the join.
I don't think you can get the desired vacations dates in one simple query. First you need to establish if the given date range is in past, present or future. Then you need to order those ranges by start/end dates to get the most recent or next upcoming. You need sort the past vacations in descending and upcoming in ascending order. Funny enough user atidball has two vacations in-progress, I sorted that in the same manner as future vacation. Finally apply your rules, I did that by sorting by state.
declare #currentDate date = '20140309'
;
with cte1 as
(
-- state: the lower number the higher priority
select Username, VacationStart, DeploymentEnd,
case
when VacationStart <= #currentDate and DeploymentEnd >= #currentDate
then 0 -- in progress
when VacationStart > #currentDate
then 1 -- future
when DeploymentEnd < #currentDate
then 2 -- past
else NULL
end as state
from data_vacations
)
, cte2 as
(
select *,
row_number() over(partition by username, state order by VacationStart, DeploymentEnd) as rn
from cte1
where state < 2 -- current or upcoming
union all
select *,
row_number() over(partition by username, state order by DeploymentEnd desc, VacationStart desc) as rn
from cte1
where state = 2 -- past
)
, cte3 as
(
-- apply the rules: find the record with highest priority
select Username, min(state) as minstate
from cte1
group by Username
)
select cte2.Username, cte2.VacationStart, cte2.DeploymentEnd
from cte2
inner join cte3
on cte2.Username = cte3.Username
and cte2.state = cte3.minstate
and cte2.rn = 1 -- most recent or next upcoming
See the SQLFiddle.

Sql Server Self JOIN (pushing column values down)

I am asked to do the following:
"CycleStartDate needs to be the BillDate from the previous BillDate record. If a previous record does not exist, you should use the most recent CycleEndDate from the DataTime table"
CycleStartDate and CycleEndDate are columns in a table called DataTime
BillDate is a column in a table called BillingData
This is the BillDate values:
2012-07-27 00:00:00.000
2012-07-27 00:00:00.000
2012-08-27 00:00:00.000
2012-08-27 00:00:00.000
2012-09-28 00:00:00.000
2012-09-28 00:00:00.000
2012-10-26 00:00:00.000
2012-10-26 00:00:00.000
2012-11-27 00:00:00.000
2012-11-27 00:00:00.000
2012-12-27 00:00:00.000
How would I set the CycleStartDate values based on the requirements?
The tables Datetime and BillingData are connected by a column called MeterID.
Try something similar to this...
SELECT B.BillDate,
ISNULL(
B2.BillDate,
(SELECT MAX(CycleEndDate) FROM DataTime DT WHERE DT.MeterID = B.MeterID)
) CycleStartDate
FROM BillingData B
OUTER APPLY (
SELECT TOP 1 B2.BillDate
FROM BillingData B2
WHERE B2.MeterID = B.MeterID AND
B2.BillingData < B.BillingData
ORDER BY B2.BillingData DESC
) B2
I still have one doubt... Do you need to take the SELECT MAX(CycleEndDate) FROM DataTime DT WHERE DT.MeterID = B.MeterID or the SELECT MAX(CycleEndDate) FROM DataTime DT WHERE DT.MeterID = B.MeterID AND DT.CycleEndDate < B.BillDate?
But it can be done without the OUTER APPLY...
SELECT B.BillDate,
ISNULL(
(SELECT MAX(B2.BillDate)
FROM BillingData B2
WHERE B2.MeterID = B.MeterID AND
B2.BillingData < B.BillingData),
(SELECT MAX(CycleEndDate) FROM DataTime DT WHERE DT.MeterID = B.MeterID)
) CycleStartDate
FROM BillingData B
I think the second version is quite readable... For each row of BillingData B, look for the biggest BillDate (MAX(B2.BillDate)) lesser than the current BillDate and of the same MeterID. If not present (the ISNULL, if the first one is not present then it's NULL, so it goes to the second part of the ISNULL), look for the biggest CycleEndDate from DataTime with the same MeterID and return it.
You can use the ROW_NUMBER() function for offsetting a JOIN:
SELECT a.BillDate, COALESCE(b.BillDate,c.CycleEndDate) 'CycleEndDate'
FROM (SELECT *,ROW_NUMBER() OVER (PARTITION BY MeterID ORDER BY BillDate DESC)'RowRank'
FROM YourTable
)a
LEFT JOIN (SELECT *,ROW_NUMBER() OVER (PARTITION BY MeterID ORDER BY BillDate DESC)'RowRank'
FROM YourTable
)b
ON a.RowRank = b.RowRank - 1
AND a.MeterID = b.MeterID
LEFT JOIN (SELECT MeterID,MAX(CycleEndDate)'CycleEndDate'
FROM DataTime
GROUP BY MeterID
) c
ON a.MeterID = c.MeterID
The PARTITION BY may not be necessary as well as the MeterID criteria in the JOIN, your wording is a little confusing as to whether the ORDER BY should be ascending or descending, as it is above the newest record will be the one that gets it's date from the DateTime table, remove DESC to make it the oldest record that gets it's value from that table.