SQL - Update Record Based on Entity's Previous Record

I have a temp table with an EntityID, a start date, an end date, and a number of days. I compute the number of days as the DATEDIFF between the start and end dates plus 1 day. The problem is that when an entity has a second record whose start date equals its previous record's end date, the number of days comes out 1 too many. For example:
EntityID StartDate EndDate NumOfDays
-------- --------- ------- ---------
3414 02/01/2018 02/02/2018 2
3414 02/02/2018 02/10/2018 9
I need the StartDate of the second record to become 02/03/2018 and NumOfDays to become 8, so that the whole range covers 10 days, which would be correct. The temp table is ordered on EntityID, StartDate. There would be thousands of records in the table and maybe a few hundred that have this case. I only need to change the start date if it is the same as that entity's previous end date.
Should I do a loop? Cursor? Or is there a better way?
We are on SQL Server 2014

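First, it seems that you should be calculating the number of days without counting the end date (that is, without the + 1). That would solve the problem, but it might not fit your other requirements.
Otherwise, you can use an updatable CTE:
with toupdate as (
select t.*, lag(end_date) over (partition by entityid order by start_date) as prev_end_date
from t
)
update toupdate
set start_date = dateadd(day, 1, start_date), -- move the start past the shared day
numofdays = numofdays - 1
where prev_end_date = start_date; -- the previous record ends on the day this one starts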

DECLARE @t TABLE (EntityID INT, StartDate DATE, EndDate DATE);
INSERT INTO @t VALUES
(3414, '02/01/2018', '02/02/2018'),
(3414, '02/02/2018', '02/10/2018');
-- If the previous record of the same entity ends on or after this StartDate,
-- start the day after that previous EndDate instead.
WITH x AS (
SELECT t.*
, CASE WHEN LAG(EndDate) OVER (PARTITION BY EntityID ORDER BY StartDate) >= StartDate
THEN DATEADD( DAY , 1 , LAG(EndDate) OVER (PARTITION BY EntityID ORDER BY StartDate))
ELSE StartDate END NewStartDate
FROM @t t
)
SELECT EntityID
, NewStartDate
, EndDate
, DATEDIFF(DAY, NewStartDate , EndDate) + 1 AS NumOfDays
FROM x;
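The adjustment in the query above (move a start date to the day after the same entity's previous end date when the two touch or overlap) can be cross-checked outside the database. The following is a minimal sketch in Python rather than T-SQL, using the two sample rows from the question; the function name `adjust_starts` is made up for illustration.

```python
from datetime import date, timedelta

# Sample rows from the question: (entity_id, start_date, end_date).
rows = [
    (3414, date(2018, 2, 1), date(2018, 2, 2)),
    (3414, date(2018, 2, 2), date(2018, 2, 10)),
]

def adjust_starts(rows):
    """Shift a start date past the same entity's previous end date when they touch or overlap."""
    adjusted = []
    prev_end = {}  # entity_id -> end date of that entity's previous row
    for entity_id, start, end in sorted(rows):  # ordered by entity, then start date
        last = prev_end.get(entity_id)
        if last is not None and last >= start:
            start = last + timedelta(days=1)
        num_days = (end - start).days + 1  # inclusive day count, like DATEDIFF + 1
        adjusted.append((entity_id, start, end, num_days))
        prev_end[entity_id] = end
    return adjusted

result = adjust_starts(rows)
for row in result:
    print(row)
```

With the sample data the second row should start on 2018-02-03 with 8 days, so the two rows together cover 10 days.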

Related

Get Start and End date from multiple rows of dates, excluding weekends

I'm trying to figure out how to return a Start Date and End Date based on data like in the table below:
Name Date From  Date To
---- ---------- ----------
A    2022-01-03 2022-01-03
A    2021-12-29 2021-12-31
A    2021-12-28 2021-12-28
A    2021-12-27 2021-12-27
A    2021-12-23 2021-12-24
A    2021-11-08 2021-11-09
The result I am after would show like this:
Name Date From  Date To
---- ---------- ----------
A    2021-12-23 2022-01-03
A    2021-11-08 2021-11-09
The dates in the first table will sometimes span weekends between Date From and Date To, but where a row ends on a Friday and the next row starts on the following Monday, the two need to be classified as the same "block", as presented in the second table. I was hoping to use the DATEFIRST setting to cater for the weekends and avoid a calendar table, as per How do I exclude Weekend days in a SQL Server query?, but if a calendar table ends up being the easiest way out I'm happy to look into creating one.
In above example I only have 1 Name, but the table will have multiple names and it will need to be grouped by that.
The only examples of this I am seeing are using only 1 date column for records and I struggled changing their code around to cater for my example. The closest example I found doesn't work for me as it is based on datetime fields and the time differences - find start and stop date for contiguous dates in multiple rows
This is a gaps-and-islands problem, with the twist that you need to consider weekend continuity.
You can do:
select max(name) as name, min(date_from) as date_from, max(date_to) as date_to
from (
select *, sum(inc) over(order by date_to) as grp
from (
select *,
case when lag(ext_to) over(order by date_to) = date_from
then 0 else 1 end as inc
from (
select *,
case when (datepart(weekday, date_to) = 6)
then dateadd(day, 3, date_to)
else dateadd(day, 1, date_to) end as ext_to
from t
) x
) y
) z
group by grp
Result:
name date_from date_to
---- ---------- ----------
A 2021-11-08 2021-11-09
A 2021-12-23 2022-01-03
See running example at db<>fiddle #1.
Note: Your question doesn't mention it, but you probably want to segment per person. I didn't do it.
EDIT: Adding partition by name
Partitioning by name is quite easy actually. The following query does it:
select name, min(date_from) as date_from, max(date_to) as date_to
from (
select *, sum(inc) over(partition by name order by date_to) as grp
from (
select *,
case when lag(ext_to) over(partition by name order by date_to) = date_from
then 0 else 1 end as inc
from (
select *,
case when (datepart(weekday, date_to) = 6)
then dateadd(day, 3, date_to)
else dateadd(day, 1, date_to) end as ext_to
from t
) x
) y
) z
group by name, grp
order by name, grp
See running query at db<>fiddle #2.
with extended as (
select name,
date_from,
case when datepart(weekday, date_to) = 6
then dateadd(day, 2, date_to) else date_to end as date_to
from t
), adjacent as (
select *,
case when dateadd(day, 1,
lag(date_to) over (partition by name order by date_from)) = date_from
then 0 else 1 end as brk
from extended
), blocked as (
select *, sum(brk) over (partition by name order by date_from) as grp
from adjacent
)
select name, min(date_from), max(date_to) from blocked
group by name, grp;
I'm assuming that ranges do not overlap and that all input dates fall on weekdays. While hammering this out on my cellphone I originally made two mistakes. For some reason I got the to and from dates reversed in my head, and then I was thinking that Friday is 5 (as with @@datefirst) rather than 6. (Of course this could otherwise vary with the regional setting anyway.) One advantage of using table expressions is to modularize and bury certain details in lower levels of the logic. In this case it would be very easy to adjust dates should some of these assumptions prove to be wrong.
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=42e0c452d57d474232bcf991d6d3c43c
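Because the weekday number returned by datepart depends on the DATEFIRST setting, it can help to check the expected blocks independently. Below is a minimal sketch in Python (not T-SQL) of the same idea: a row continues the current block when it starts on the next working day after the previous row ends. It assumes, like the answer above, that ranges do not overlap, and it ignores holidays; `next_working_day` and `merge_blocks` are illustrative names.

```python
from datetime import date, timedelta

# Sample rows from the question: (name, date_from, date_to).
rows = [
    ("A", date(2022, 1, 3), date(2022, 1, 3)),
    ("A", date(2021, 12, 29), date(2021, 12, 31)),
    ("A", date(2021, 12, 28), date(2021, 12, 28)),
    ("A", date(2021, 12, 27), date(2021, 12, 27)),
    ("A", date(2021, 12, 23), date(2021, 12, 24)),
    ("A", date(2021, 11, 8), date(2021, 11, 9)),
]

def next_working_day(d):
    """Return the first Monday-to-Friday date after d (holidays are ignored)."""
    d += timedelta(days=1)
    while d.weekday() >= 5:  # in Python, 5 = Saturday and 6 = Sunday
        d += timedelta(days=1)
    return d

def merge_blocks(rows):
    """Merge rows of the same name that are contiguous across working days."""
    blocks = []
    for name, start, end in sorted(rows):  # ordered by name, then date_from
        last = blocks[-1] if blocks else None
        if last and last[0] == name and next_working_day(last[2]) == start:
            blocks[-1] = (name, last[1], end)  # extend the current block
        else:
            blocks.append((name, start, end))  # start a new block
    return blocks

blocks = merge_blocks(rows)
for block in blocks:
    print(block)
```

For the sample data this should give the two blocks from the question: 2021-11-08 to 2021-11-09 and 2021-12-23 to 2022-01-03.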

Calculate days based on date range

I have data like below
create table #Temp(Id int, FromDate date, ToDate date)
Insert into #Temp
values(1,'9/1/2019','9/1/2019'),
(2,'9/2/2019','9/3/2019'),
(3,'9/2/2019','9/3/2019'),
(4,'9/4/2019','9/6/2019'),
(5,'9/7/2019','9/7/2019')
I am trying to calculate the difference from the first date and label each row with its day range, i.e. Day 1, Day 2-3, etc.
Expected result
Id FromDate ToDate Display
1 01/09/2019 01/09/2019 Day 1
2 02/09/2019 03/09/2019 Day 2-3
3 02/09/2019 03/09/2019 Day 2-3
4 04/09/2019 06/09/2019 Day 4-6
5 07/09/2019 07/09/2019 Day 7
I have tried the code below using DATEDIFF, but I'm not sure how to relate each row to the previous one and get the day range
select *, DATEDIFF(DAY,FromDate,ToDate)
from #Temp
Use first_value
select *
, datediff(day, first_value(FromDate) over(order by FromDate), FromDate) + 1
, datediff(day, first_value(FromDate) over(order by FromDate), ToDate) + 1
from #Temp
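You can try this if you want exactly the same output. Note that it uses the day of the month, so it only matches when the earliest FromDate falls on the 1st:
Select
* ,
case
when (FromDate != ToDate)
then
CONCAT('Day ', DATEPART(Day,FromDate), '-', DATEPART(Day,ToDate)) -- CONCAT converts the integers to strings
else
CONCAT('Day ', DATEPART(Day,FromDate))
END AS Display
From #Temp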
You don't want the previous row's value, you want the earliest fromdate, and then you can compare it to every row.
select id, min (fromdate) over (order by fromdate asc) as earliest_date,fromDate,todate,
datediff(day,min (fromdate) over (order by fromdate asc),fromdate)+1,
datediff(day,min (fromdate) over (order by fromdate asc),todate)+1
from
#temp
Fiddle
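The "offset from the earliest FromDate" idea used by the first_value and min() answers can be verified against the expected Display column. The following is a minimal sketch in Python (not T-SQL) over the sample rows, reading the dates as month/day/year; the function name `day_labels` is made up for illustration.

```python
from datetime import date

# Sample rows from the question: (id, from_date, to_date).
rows = [
    (1, date(2019, 9, 1), date(2019, 9, 1)),
    (2, date(2019, 9, 2), date(2019, 9, 3)),
    (3, date(2019, 9, 2), date(2019, 9, 3)),
    (4, date(2019, 9, 4), date(2019, 9, 6)),
    (5, date(2019, 9, 7), date(2019, 9, 7)),
]

def day_labels(rows):
    """Label each row with its day number(s) counted from the earliest from_date."""
    earliest = min(from_date for _, from_date, _ in rows)
    labels = {}
    for row_id, from_date, to_date in rows:
        first = (from_date - earliest).days + 1  # like DATEDIFF(day, earliest, FromDate) + 1
        last = (to_date - earliest).days + 1
        labels[row_id] = f"Day {first}" if first == last else f"Day {first}-{last}"
    return labels

labels = day_labels(rows)
for row_id, label in labels.items():
    print(row_id, label)
```

Unlike a day-of-month approach, this keeps working when the earliest date is not the 1st of a month.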

How to calculate number of days working on tasks if we have many tasks and the date range of each tasks could have overlap

I ran into a problem at work and I would really appreciate it if anyone could give me some ideas.
We have a table which keeps track of the tasks each employee has finished. The table structure is as below:
EmployeeNum | TaskID |Start Date of task | End Date of task
I want to calculate how many days each employee has invested in each task using this table. At first my code looked like this:
Select
EmployeeNum,TaskID,DateDiff(day,StartDate,EndDate)+1 as PureDay
from
TaskTable
Group by
EmployeeNum,TaskID
But then I found a problem that there are overlaps in the date range for each task.
For example, we have TaskA, TaskB, TaskC for one employee.
TaskA is from 2018-10-01 to 2018-10-05
TaskB from 2018-10-02 to 2018-10-07
TaskC from 2018-10-09 to 2018-10-10
In this way, the actual working days of this employee should be from 2018-10-01 to 2018-10-07, and then 2018-10-09 to 2018-10-10 which is 9 days. If I calculate date range of each task then add them together then actual working days become 5+6+2=13 days instead of 9.
I'm wondering if there is a good way to solve this overlapping problem. Thank you very much for any ideas!
The following query counts how many working days (weekends excluded) each employee spent on each task:
SELECT
EmployeeNum,
TaskID,
(DATEDIFF(dd, StartDate, EndDate) + 1)
-(DATEDIFF(wk, StartDate, EndDate) * 2)
-(CASE WHEN DATENAME(dw, StartDate) = 'Sunday' THEN 1 ELSE 0 END)
-(CASE WHEN DATENAME(dw, EndDate) = 'Saturday' THEN 1 ELSE 0 END) as PureDay
FROM
TaskTable
-- No GROUP BY: StartDate and EndDate are not aggregated, so the expression is evaluated per row.
See this link for an explanation of how this computation works.
Once you know the date when a task starts, you can use a cumulative sum to assign a group to each record and then simply aggregate by that group (and other information).
The following query should do what you want:
with starts as (
select sm.*,
(case when exists (select 1
from tb_TaskMaster sm2
where sm2.EmpID = sm.EmpID and
sm2.StartDate < sm.StartDate and
sm2.EndDate >= sm.StartDate
)
then 0 else 1
end) as isstart
from tb_TaskMaster sm
)
select EmpID, count(TaskId) as cnt_TaskID, min(StartDate) as StartDate, max(EndDate) as EndDate,
datediff(Day, min(StartDate), max(EndDate)) + 1 as PureDay
from (select s.*, sum(isstart) over (partition by EmpID order by StartDate) as grp
from starts s
) s
group by EmpID, grp
order by EmpID
In this db<>fiddle, you can find the DDL and DML for my example data and see the code working.
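To see what the isstart flag and the cumulative sum produce, here is a minimal sketch in Python (not T-SQL) run over the three tasks from the question. It assumes no two tasks of one employee share a start date, and the employee number 1 and the function name `islands` are chosen for illustration.

```python
from datetime import date

# The question's example for one employee: (emp, task, start, end).
tasks = [
    (1, "TaskA", date(2018, 10, 1), date(2018, 10, 5)),
    (1, "TaskB", date(2018, 10, 2), date(2018, 10, 7)),
    (1, "TaskC", date(2018, 10, 9), date(2018, 10, 10)),
]

def islands(tasks):
    """Group overlapping tasks per employee and return (emp, grp, start, end, days)."""
    ordered = sorted(tasks, key=lambda t: (t[0], t[2]))
    spans = {}    # (emp, grp) -> (min start, max end)
    running = {}  # emp -> cumulative sum of the is_start flags
    for emp, _task, start, end in ordered:
        # A row starts a new group when no earlier-starting task of the
        # same employee is still running on its start date.
        is_start = not any(
            e == emp and s < start and en >= start
            for e, _t, s, en in tasks
        )
        running[emp] = running.get(emp, 0) + (1 if is_start else 0)
        key = (emp, running[emp])
        if key in spans:
            low, high = spans[key]
            spans[key] = (min(low, start), max(high, end))
        else:
            spans[key] = (start, end)
    return [(emp, grp, s, e, (e - s).days + 1) for (emp, grp), (s, e) in spans.items()]

groups = islands(tasks)
for group in groups:
    print(group)
```

TaskA and TaskB should fall into one group of 7 days and TaskC into a second group of 2 days, which adds up to the 9 days the question expects.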
You can try this.
I'm not sure it will work in every case, but you can give it a try :)
declare @table table (empid int,taskid nvarchar(50),startdate date, enddate date)
insert into @table
values
(1,'TaskA','2018-10-01','2018-10-05'),
(1,'TaskB','2018-10-02','2018-10-07'),
(1,'TaskC','2018-10-09','2018-10-10')
select *,case when comparedate > startdate then datediff(dd,comparedate,enddate) else datediff(dd,startdate,enddate)+1 end as countofworkingdays from (
Select empid,taskid,startdate,enddate,lag(enddate,1,'1900-01-01') over(partition by empid order by startdate) as CompareDate from @table
)x
This eliminates overlapping ranges by adjusting the start date based on all previous end dates:
with maxEndDates as
( -- find the maximum previous end date
Select empid,taskid,startdate,enddate,
max(EndDate)
over (partition by EmpID
order by StartDate, EndDate desc
rows between unbounded preceding and 1 preceding) as maxEndDate
from TaskTable
),
daysPerTask as
( -- calculate the difference based on the adjusted start date to eliminate overlapping days
select *,
case when maxEndDate >= enddate then 0 -- range already fully covered
when maxEndDate >= startdate then datediff(dd, maxEndDate, enddate) -- range partially overlapping, or starting on an already counted end date
else datediff(dd, startdate, enddate)+1 -- new range
end as dayCount
from maxEndDates
)
-- get the final count
select EmpID, sum(dayCount)
from daysPerTask
group by EmpID;
See db<>fiddle
Thank you all very much for your responses and help. While searching Stack Overflow I found a solution; here is the link:
T-SQL date range in a table split and add the individual date to the table
The Tally table suggested by Felix in the above question is a great way to solve my problem since I have millions of records and the real situation is really complicated.
Thank you all again for your help!
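The tally-table idea boils down to splitting every range into individual dates and counting the distinct dates per employee. Here is a minimal sketch of that idea in Python (not T-SQL) over the three tasks from the question; the employee number 1 and the function name `distinct_days_per_employee` are chosen for illustration, and a real table with millions of rows would do this expansion in the database rather than in memory.

```python
from collections import defaultdict
from datetime import date, timedelta

# The question's example for one employee: (emp, task, start, end).
tasks = [
    (1, "TaskA", date(2018, 10, 1), date(2018, 10, 5)),
    (1, "TaskB", date(2018, 10, 2), date(2018, 10, 7)),
    (1, "TaskC", date(2018, 10, 9), date(2018, 10, 10)),
]

def distinct_days_per_employee(tasks):
    """Expand each range into single dates and count distinct dates per employee."""
    days = defaultdict(set)
    for emp, _task, start, end in tasks:
        for offset in range((end - start).days + 1):  # inclusive of the end date
            days[emp].add(start + timedelta(days=offset))
    return {emp: len(dates) for emp, dates in days.items()}

totals = distinct_days_per_employee(tasks)
print(totals)
```

Overlapping days land in the set only once, so the employee ends up with 9 days instead of 13.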

To club the rows for week days

I have data like below:
StartDate EndDate Duration
----------
41890 41892 3
41898 41900 3
41906 41907 2
41910 41910 1
StartDate and EndDate are respective ID values for any dates from calendar. I want to calculate the sum of duration for consecutive days. Here I want to include the days which are weekends. E.g. in the above data, let's say 41908 and 41909 are weekends, then my required result set should look like below.
I already have another proc that can return me the next working day, i.e. if I pass 41907 or 41908 or 41909 as DateID in that proc, it will return 41910 as the next working day. Basically I want to check if the DateID returned by my proc when I pass the above EndDateID is same as the next StartDateID from above data, then both the rows should be clubbed. Below is the data I want to get.
ID StartDate EndDate Duration
----------
278457 41890 41892 3
278457 41898 41900 3
278457 41906 41910 3
Please let me know in case the requirement is not clear, I can explain further.
My Date Table is like below:
DateId Date Day
----------
41906 09-04-2014 Thursday
41907 09-05-2014 Friday
41908 09-06-2014 Saturday
41909 09-07-2014 Sunday
41910 09-08-2014 Monday
Here is the SQL Code for setup:
CREATE TABLE Table1
(
StartDate INT,
EndDate INT,
LeaveDuration INT
)
INSERT INTO Table1
VALUES(41890, 41892, 3),
(41898, 41900, 3),
(41906, 41907, 2),
(41910, 41910, 1)
CREATE TABLE DateTable
(
DateID INT,
Date DATETIME,
Day VARCHAR(20)
)
INSERT INTO DateTable
VALUES(41907, '09-05-2014', 'Friday'),
(41908, '09-06-2014', 'Saturday'),
(41909, '09-07-2014', 'Sunday'),
(41910, '09-08-2014', 'Monday'),
(41911, '09-09-2014', 'Tuesday')
This is rather complicated. Here is an approach using window functions.
First, use the date table to enumerate the dates without weekends (you can also take out holidays if you want). Then, expand the periods into one day per row, by using a non-equijoin.
You can then use a trick to identify sequential days. This trick is to generate a sequential number for each id and subtract it from the sequential number for the dates. This is a constant for sequential days. The final step is simply an aggregation.
The resulting query is something like this:
with d as (
select d.*, row_number() over (order by date) as seqnum
from dates d
where day not in ('Saturday', 'Sunday')
)
select t.id, min(t.date) as startdate, max(t.date) as enddate, sum(duration)
from (select t.*, ds.seqnum, ds.date,
(ds.seqnum - row_number() over (partition by id order by ds.date) ) as grp
from table t join
d ds
on ds.date between t.startdate and t.enddate
) t
group by t.id, grp;
EDIT:
The following is the version on this SQL Fiddle:
with d as (
select d.*, row_number() over (order by date) as seqnum
from datetable d
where day not in ('Saturday', 'Sunday')
)
select t.id, min(t.date) as startdate, max(t.date) as enddate, sum(duration)
from (select t.*, ds.seqnum, ds.date,
(ds.seqnum - row_number() over (partition by id order by ds.date) ) as grp
from (select t.*, 'abc' as id from table1 t) t join
d ds
on ds.dateid between t.startdate and t.enddate
) t
group by t.id, grp;
I believe this is working, but the date table doesn't have all the dates in it.
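The sequence-number trick can be tried on the question's DateIDs without a full date table. The sketch below is in Python (not T-SQL) and makes one loud assumption for illustration: 41908 and 41909 are the only non-working ids between 41890 and 41910, since those are the only weekend ids the question names. The function name `club` is also made up.

```python
from itertools import groupby

# Leave periods from the question as (start_id, end_id).
periods = [(41890, 41892), (41898, 41900), (41906, 41907), (41910, 41910)]
weekend_ids = {41908, 41909}  # the only weekend ids named in the question

# Assumption for illustration: every other id in this span is a working day.
working_ids = [d for d in range(41890, 41911) if d not in weekend_ids]
seqnum = {d: n for n, d in enumerate(working_ids, start=1)}

# Expand each period into one entry per working day.
leave_days = sorted(
    d for start, end in periods for d in range(start, end + 1) if d in seqnum
)

def club(leave_days):
    """Group days whose working-day sequence number minus row number is constant."""
    keyed = [(seqnum[d] - rn, d) for rn, d in enumerate(leave_days, start=1)]
    result = []
    for _grp, items in groupby(keyed, key=lambda pair: pair[0]):
        ids = [d for _, d in items]
        result.append((ids[0], ids[-1], len(ids)))  # start id, end id, working days
    return result

clubbed = club(leave_days)
for row in clubbed:
    print(row)
```

Because the weekend ids get no sequence number, 41907 and 41910 are neighbours in the working-day sequence, so the last two periods merge into 41906 to 41910 with a duration of 3.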

SQL: Select Multiple Columns with Max() on calculated values

Real basic: I have table T with following data:
ID StartDate Term (months)
----------------------
1 10/1/2012 12
2 10/1/2012 24
3 12/1/2012 12
I need to know the ID of the row that has the max end date. I've successfully calculated the end date as
select max(DateAdd(month, term, StartDate)) from T [this would result in 10/1/2014]
How do I get the ID value and Start Date of the row that contains the max end date?
MS SQL:
SELECT TOP 1 ID, StartDate
FROM T
ORDER BY DateAdd(month, term, StartDate) DESC
MySQL:
SELECT ID, StartDate
FROM T
ORDER BY DATE_ADD(StartDate, INTERVAL term MONTH) DESC
LIMIT 1
In case more than one ID has the same extreme "end date" and you need them all, you can try this:
SELECT x.id
FROM (
SELECT id
, RANK ( ) OVER ( ORDER BY DateAdd(month, term, StartDate) DESC) as rn
FROM T
) x
WHERE x.rn = 1
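The expected answer for the sample data can be checked by hand or with a few lines of code. This is a minimal sketch in Python (not SQL) that computes each end date and returns every row tied for the maximum; `add_months` is a hypothetical helper written here that clamps the day to the end of the target month.

```python
import calendar
from datetime import date

# Sample rows from the question: (id, start_date, term_in_months).
rows = [
    (1, date(2012, 10, 1), 12),
    (2, date(2012, 10, 1), 24),
    (3, date(2012, 12, 1), 12),
]

def add_months(d, months):
    """Add whole months, clamping the day to the end of the target month."""
    index = d.year * 12 + (d.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def rows_with_max_end(rows):
    """Return (id, start_date) for every row whose end date is the latest."""
    ends = [(add_months(start, term), row_id, start) for row_id, start, term in rows]
    max_end = max(end for end, _, _ in ends)
    return [(row_id, start) for end, row_id, start in ends if end == max_end]

winners = rows_with_max_end(rows)
print(winners)
```

For the sample data only ID 2 qualifies, with an end date of 10/1/2014; if several rows shared that end date, all of them would be returned, as with the RANK query.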