How to find contiguous dates in numerous rows in SQL Server

We have a table with service provisions for people. For example:
id people_id dateStart dateEnd
1 1 28.07.14 19.07.16
2 2 14.04.15 16.02.16
3 2 16.02.16 18.04.16
4 2 18.04.16 27.06.16
5 2 27.06.16 19.07.16
6 2 19.07.16 NULL
7 3 24.02.12 17.06.12
8 3 23.07.12 19.09.12
9 3 18.08.14 NULL
10 4 28.06.15 NULL
11 5 19.01.16 NULL
I need to extract the distinct people_id values (clients) together with the real start date of their unfinished, uninterrupted service, keeping only services that have lasted more than a year, and then count the days elapsed. The end date of one row must equal the start date of the next row for the two to count as contiguous. One client can have only one unfinished service.
So the perfect result for the table above would be:
people_id dateStart lasts(days)
2 14.04.15 472
3 18.08.14 711
4 28.06.15 397
I had no problem with a single service:
SELECT
--some other columns from PEOPLE,
p.PEOPLE_ID,
s.DATESTART,
DATEDIFF(DAY, s.DATESTART, GETDATE()) as lasts
FROM
PEOPLE p
INNER JOIN service s on s.ID =
(
SELECT TOP 1 s2.ID
FROM service s2
WHERE s2.PEOPLE_ID = p.PEOPLE_ID
AND s2.DATESTART IS NOT NULL
AND s2.DATEEND IS NULL
ORDER BY s2.DATESTART DESC
)
WHERE
DATEDIFF(DAY, s.DATESTART , GETDATE()) >= 365
But I can't figure out how to determine contiguous services.

You can find where periods of "continuous" service begin by using lag(). Then a cumulative sum of this flag provides a group, which can be used for aggregation:
select people_id, min(datestart) as datestart,
(case when count(dateend) = count(*) then max(dateend) end) as dateend
from (select t.*,
sum(case when prev_dateend = datestart then 0 else 1 end) over
(partition by people_id order by datestart) as grp
from (select t.*,
lag(dateend) over (partition by people_id order by datestart) as prev_dateend
from t
) t
) t
group by people_id, grp
having count(*) > count(dateend);
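The same flag-and-group idea can be walked through outside the database. The following is a minimal Python sketch, not the answerer's code: it uses the sample rows from the question, treats None as a NULL dateEnd, and assumes "today" is 2016-07-29, a date inferred from the expected output rather than stated in the question. The name open_services is hypothetical.

```python
from datetime import date
from itertools import groupby

# Sample rows from the question: (id, people_id, date_start, date_end).
# None stands for a NULL dateEnd (service still running).
ROWS = [
    (1, 1, date(2014, 7, 28), date(2016, 7, 19)),
    (2, 2, date(2015, 4, 14), date(2016, 2, 16)),
    (3, 2, date(2016, 2, 16), date(2016, 4, 18)),
    (4, 2, date(2016, 4, 18), date(2016, 6, 27)),
    (5, 2, date(2016, 6, 27), date(2016, 7, 19)),
    (6, 2, date(2016, 7, 19), None),
    (7, 3, date(2012, 2, 24), date(2012, 6, 17)),
    (8, 3, date(2012, 7, 23), date(2012, 9, 19)),
    (9, 3, date(2014, 8, 18), None),
    (10, 4, date(2015, 6, 28), None),
    (11, 5, date(2016, 1, 19), None),
]


def open_services(rows, today, min_days=365):
    """Return (people_id, real_start, days) for open, uninterrupted services."""
    result = []
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))
    for person, group in groupby(ordered, key=lambda r: r[1]):
        island_start = None
        prev_end = None
        for _id, _person, start, end in group:
            # A new "island" begins when this start differs from the previous end
            # (the same test the lag() comparison performs in SQL).
            if island_start is None or prev_end != start:
                island_start = start
            prev_end = end
            if end is None:  # unfinished service
                days = (today - island_start).days
                if days >= min_days:
                    result.append((person, island_start, days))
    return result


# Assumed current date, chosen because it reproduces the question's figures.
for row in open_services(ROWS, today=date(2016, 7, 29)):
    print(row)
```

With that assumed date, the sketch yields people 2, 3 and 4 with 472, 711 and 397 days, matching the expected result in the question.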

Try this query:
select PeopleId, min(dateStart) as dateStart, sum(diff) as [lasts(days)] from
(
select P.*, datediff(day,datestart, DateEnd) as diff from
(select peopleId, dateStart,
isnull(dateend, cast(getdate() as date)) as DateEnd
from People
) P
where Dateend in
(select DateStart from People
where PeopleId = P.PeopleId)
or DateEnd = cast(getdate() as date ) -- check for continuous dates
) P1 group by PeopleId having sum(diff)> 365 --check for > one year
The comments in the query explain the logic.

Related

Find non consecutive date ranges

I want to find out whether there are gaps between consecutive date ranges. Some of the ranges do not follow on from the previous one; in that case the query should return the RowId of the isolated range.
Table Name: Subscriptions
RowId ClientId Status StartDate EndDate
1 1 1 01/01/2022 02/01/2022
2 1 1 03/01/2022 04/01/2022
3 1 1 12/01/2022 15/01/2022
4 2 1 03/01/2022 06/01/2022
I want a SQL statement that finds the RowId of the non-consecutive ranges for each client, restricted to status in (1,3). Example of the result:
RowId
3
I want to solve the problem using SQL only.
Thanks.
One way you could do this is to use Lag (or lead) to identify gaps in neighbouring rows' date ranges and take the top N rows where the gap exceeds 1 day.
select top (1) with ties rowId
from t
where status in (1,3)
order by
case when DateDiff(day, lag(enddate,1,enddate)
over(partition by clientid order by startdate), startdate) >1
then 0 else 1 end;
You can detect gaps with LAG() and mark them. Then, it's easy to filter out the rows. For example:
select *
from (
select *,
case when dateadd(day, -1, start_date) >
lag(end_date) over(partition by client_id order by start_date)
then 1 else 0 end as i
from t
) x
where i = 1
Or simpler...
select *
from (
select *,
lag(end_date) over(partition by client_id order by start_date) as prev_end
from t
) x
where dateadd(day, -1, start_date) > prev_end
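To make the "previous end versus current start" comparison concrete, here is a small Python sketch of the same check on the question's sample rows. It assumes the dates are in dd/mm/yyyy format, applies the status filter from the question before comparing neighbours, and partitions by client only, as the queries above do. The function name rows_after_gap is hypothetical.

```python
from datetime import date, timedelta
from itertools import groupby

# (row_id, client_id, status, start_date, end_date), dates read as dd/mm/yyyy.
SUBSCRIPTIONS = [
    (1, 1, 1, date(2022, 1, 1), date(2022, 1, 2)),
    (2, 1, 1, date(2022, 1, 3), date(2022, 1, 4)),
    (3, 1, 1, date(2022, 1, 12), date(2022, 1, 15)),
    (4, 2, 1, date(2022, 1, 3), date(2022, 1, 6)),
]


def rows_after_gap(rows, statuses=(1, 3)):
    """Return row ids whose start is more than one day after the previous end."""
    kept = sorted((r for r in rows if r[2] in statuses),
                  key=lambda r: (r[1], r[3]))
    flagged = []
    for _client, group in groupby(kept, key=lambda r: r[1]):
        prev_end = None  # plays the role of lag(end_date)
        for row_id, _c, _status, start, end in group:
            if prev_end is not None and start - timedelta(days=1) > prev_end:
                flagged.append(row_id)
            prev_end = end
    return flagged


print(rows_after_gap(SUBSCRIPTIONS))
```

The first row of each client has no predecessor, so it is never flagged, just as a NULL prev_end never satisfies the comparison in SQL.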

MS-SQL how to add missing month in a table values

I have a table with the following entries:
ID date Frequency
1 '2012-04-30' 5
1 '2012-06-30' 4
1 '2012-07-31' 25
2 '2012-04-30' 7
2 '2012-05-31' 4
2 '2012-06-30' 1
2 '2012-07-31' 6
I need to add each missing month. The date of an added row should be the last day of that month, and its frequency should be 0.
The expected output is:
ID date Frequency
1 '2012-04-30' 5
1 '2012-05-31' 0
1 '2012-06-30' 4
1 '2012-07-31' 25
2 '2012-04-30' 7
2 '2012-05-31' 4
2 '2012-06-30' 1
2 '2012-07-31' 6
I would suggest recursive CTEs:
with cte as (
select id, date, frequency,
lead(date) over (partition by id order by date) as next_date
from t
union all
select id, eomonth(date, 1), 0, next_date
from cte
where eomonth(date, 1) < dateadd(day, -1, next_date)
)
select id, date, frequency
from cte
order by id, date;
The anchor part of the CTE finds the next existing date for each row. The recursive part then keeps adding month ends until it reaches that date, filling in the missing rows (and adding none if nothing is missing). eomonth(date, 1) is a handy way of getting the last day of the next month.
If you have all dates in the table, you can also use cross join to generate the rows and then left join to bring in the existing data:
select i.id, d.date, coalesce(t.frequency, 0) as frequency
from (select distinct id from t) i cross join
(select distinct date from t) d left join
t
on i.id = t.id and d.date = t.date
order by i.id, d.date;
If you have a large amount of data, you can compare performance. This may be a case where a recursive CTE is faster than alternative methods.
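For readers who want to trace the month-filling logic step by step, the following Python sketch mirrors the recursive CTE: for each row it looks at the next row of the same ID and emits zero-frequency month ends in between. It is only an illustration under the assumption that every stored date is itself a month end; the names next_month_end and fill_missing_months are hypothetical.

```python
import calendar
from datetime import date

ROWS = [
    (1, date(2012, 4, 30), 5),
    (1, date(2012, 6, 30), 4),
    (1, date(2012, 7, 31), 25),
    (2, date(2012, 4, 30), 7),
    (2, date(2012, 5, 31), 4),
    (2, date(2012, 6, 30), 1),
    (2, date(2012, 7, 31), 6),
]


def next_month_end(d):
    """Last day of the month after d (the role eomonth(date, 1) plays in SQL)."""
    year, month = (d.year + 1, 1) if d.month == 12 else (d.year, d.month + 1)
    return date(year, month, calendar.monthrange(year, month)[1])


def fill_missing_months(rows):
    """Insert (id, month_end, 0) rows for months missing between two rows of an id."""
    out = []
    ordered = sorted(rows)
    for i, (id_, day, freq) in enumerate(ordered):
        out.append((id_, day, freq))
        has_next = i + 1 < len(ordered) and ordered[i + 1][0] == id_
        if not has_next:
            continue  # last row of this id: nothing to fill after it
        next_day = ordered[i + 1][1]
        filler = next_month_end(day)
        while filler < next_day:
            out.append((id_, filler, 0))
            filler = next_month_end(filler)
    return out


for row in fill_missing_months(ROWS):
    print(row)
```

On the sample data the only added row is (1, 2012-05-31, 0), which matches the expected output in the question.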

SQL Join two tables by unrelated date

I’m looking to join two tables that share no key but do share a common value (a date). I want a table that lists each date and the total number of employees hired and terminated on that day. An example is below:
Table 1
Hire Date Employee Number Employee Name
--------------------------------------------
5/5/2018 10078 Joe
5/5/2018 10077 Adam
5/5/2018 10078 Steve
5/8/2018 10079 Jane
5/8/2018 10080 Mary
Table 2
Termination Date Employee Number Employee Name
----------------------------------------------------
5/5/2018 10010 Tony
5/6/2018 10025 Jonathan
5/6/2018 10035 Mark
5/8/2018 10052 Chris
5/9/2018 10037 Sam
Desired result:
Date Total Hired Total Terminated
--------------------------------------
5/5/2018 3 1
5/6/2018 0 2
5/7/2018 0 0
5/8/2018 2 1
5/9/2018 0 1
Getting the total count is easy; I'm just unsure of the best approach for "adding" a date column.
If you need all dates within some window then you need to join the data to a calendar. You can then left join and sum flags for data points.
DECLARE @StartDate DATETIME = (SELECT MIN(ActionDate) FROM(SELECT ActionDate = MIN(HireDate) FROM Table1 UNION SELECT ActionDate = MIN(TerminationDate) FROM Table2)AS X)
DECLARE @EndDate DATETIME = (SELECT MAX(ActionDate) FROM(SELECT ActionDate = MAX(HireDate) FROM Table1 UNION SELECT ActionDate = MAX(TerminationDate) FROM Table2)AS X)
;WITH AllDates AS
(
SELECT CalendarDate=@StartDate
UNION ALL
SELECT DATEADD(DAY, 1, CalendarDate)
FROM AllDates
WHERE DATEADD(DAY, 1, CalendarDate) <= @EndDate
)
)
SELECT
D.CalendarDate,
TotalHired = ISNULL(H.TotalHired, 0),
TotalTerminated = ISNULL(T.TotalTerminated, 0)
FROM
AllDates D
-- Pre-aggregate each table per day so that joining both does not multiply rows
LEFT OUTER JOIN (SELECT HireDate, TotalHired = COUNT(*) FROM Table1 GROUP BY HireDate) H ON H.HireDate = D.CalendarDate
LEFT OUTER JOIN (SELECT TerminationDate, TotalTerminated = COUNT(*) FROM Table2 GROUP BY TerminationDate) T ON T.TerminationDate = D.CalendarDate
/* If you only want dates with data points then uncomment the where clause
WHERE
NOT (H.HireDate IS NULL AND T.TerminationDate IS NULL)
*/
I would do this with a union all and aggregations:
select dte, sum(is_hired) as num_hired, sum(is_termed) as num_termed
from (select hiredate as dte, 1 as is_hired, 0 as is_termed from table1
union all
select terminationdate, 0 as is_hired, 1 as is_termed from table2
) ht
group by dte
order by dte;
This does not include the "missing" dates. If you want those, a calendar or recursive CTE works. For instance:
with ht as (
select dte, sum(is_hired) as num_hired, sum(is_termed) as num_termed
from (select hiredate as dte, 1 as is_hired, 0 as is_termed from table1
union all
select terminationdate, 0 as is_hired, 1 as is_termed from table2
) ht
group by dte
),
d as (
select min(dte) as dte, max(dte) as max_dte
from ht
union all
select dateadd(day, 1, dte), max_dte
from d
where dte < max_dte
)
select d.dte, coalesce(ht.num_hired, 0) as num_hired, coalesce(ht.num_termed, 0) as num_termed
from d left join
ht
on d.dte = ht.dte
order by dte;
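The "count per day, then fill the calendar" approach can be sketched in a few lines of Python to check the expected numbers. This is an illustration only: the two lists stand in for the hire and termination date columns of the sample tables, and the function name daily_totals is hypothetical.

```python
from collections import Counter
from datetime import date, timedelta

# Date columns of the two sample tables (m/d/yyyy in the question).
HIRES = [date(2018, 5, 5)] * 3 + [date(2018, 5, 8)] * 2
TERMINATIONS = [
    date(2018, 5, 5),
    date(2018, 5, 6),
    date(2018, 5, 6),
    date(2018, 5, 8),
    date(2018, 5, 9),
]


def daily_totals(hires, terminations):
    """Return (day, hired, terminated) for every day between the first and last event."""
    hired = Counter(hires)          # per-day hire counts
    termed = Counter(terminations)  # per-day termination counts
    days = list(hired) + list(termed)
    if not days:
        return []
    day, last = min(days), max(days)
    out = []
    while day <= last:
        # A Counter returns 0 for days with no events, like coalesce(..., 0).
        out.append((day, hired[day], termed[day]))
        day += timedelta(days=1)
    return out


for row in daily_totals(HIRES, TERMINATIONS):
    print(row)
```

Counting each table separately before combining them is what keeps a day with three hires and one termination from being reported as three of each.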
Try this one
SELECT ISNULL(a.THE_DATE, b.THE_DATE) as Date,
ISNULL(a.Total_Hire,0) as Total_Hire,
ISNULL (b.Total_Terminate,0) as Total_terminate
FROM (SELECT Hire_date as the_date, COUNT(1) as Total_Hire
FROM TABLE_HIRE GROUP BY HIRE_DATE) a
FULL OUTER JOIN (SELECT Termination_Date as the_date, COUNT(1) as Total_Terminate
FROM TABLE_TERMINATE GROUP BY Termination_Date) b
ON a.the_date = b.the_date

SQL Server: Count days difference between previous date and current date

I've been trying to find a way to count the difference in days between dates in the previous and current rows, counting only business days.
Example data and criteria here.
ID StartDate EndDate NewDate DaysDifference
========================================================================
0 04/05/2017 null
1 12/06/2017 16/06/2017 12/06/2017 29
2 03/07/2017 04/07/2017 16/06/2017 13
3 07/07/2017 10/07/2017 04/07/2017 5
4 12/07/2017 26/07/2017 10/07/2017 13
My end goal is two new columns, NewDate and DaysDifference.
NewDate is the EndDate of the previous row. For example, the NewDate of ID 2 is 16/06/2017, which comes from the EndDate of ID 1. If the EndDate of the previous row is null, its StartDate is used instead (the ID 1 case).
DaysDifference counts only the business days between the NewDate and EndDate columns.
Here is the script I am using at the moment:
select distinct
c.ID
,c.EndDate
,isnull(p.EndDate,c.StartDate) as NewDate
,count(distinct cast(l.CalendarDate as date)) as DaysDifference
from
(select *
from table) c
full join
(select *
from table) p
on c.level = p.level
and c.id-1 = p.id
left join Calendar l
on (cast(l.CalendarDate as date) between cast(p.EndDate as date) and cast(c.EndDate as date)
or
cast(l.CalendarDate as date) between cast(p.EndDate as date) and cast(c.StartDate as date))
and l.Day not in ('Sat','Sun') and l.Holiday <> 'Y'
where c.ID <> 0
group by
c.ID
,c.EndDate
,isnull(p.EndDate,c.StartDate)
And this is the current result:
ID EndDate NewDate DaysDifference
=========================================================
1 16/06/2017 12/06/2017 0
2 04/07/2017 16/06/2017 13
3 10/07/2017 04/07/2017 5
4 26/07/2017 10/07/2017 13
In the real data I get the correct DaysDifference for IDs 2, 3 and 4, but not for ID 1: the previous row (ID 0) has a null EndDate, so its StartDate is output in place of the null and the count comes out wrong.
I hope I've provided enough info. :)
Could you please show me how to count DaysDifference correctly?
Thanks in advance!
I think you can use this logic to get the previous date:
select t.*,
lag(coalesce(enddate, startdate), 1) over (order by id) as newdate
from t;
Then for the difference:
select t.id, t.enddate, t.newdate,
sum(case when c.day not in ('Sat', 'Sun') and c.holiday <> 'Y' then 1 else 0 end) as diff
from (select t.*,
lag(coalesce(enddate, startdate), 1) over (order by id) as newdate
from t
) t join
calendar c
on c.calendardate >= t.newdate and c.calendardate <= t.enddate
group by t.id, t.enddate, t.newdate;
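As a cross-check on the counting rule, here is a short Python sketch that counts business days between two dates, both ends included. It stands in for the Calendar table by treating Saturday and Sunday as non-working days and taking holidays as an optional set; which holidays apply to the asker's data is unknown, so none are assumed. The function name business_days is hypothetical.

```python
from datetime import date, timedelta


def business_days(first, last, holidays=frozenset()):
    """Count weekdays from first to last inclusive, skipping listed holidays."""
    count = 0
    day = first
    while day <= last:
        # weekday() is 0 for Monday ... 6 for Sunday
        if day.weekday() < 5 and day not in holidays:
            count += 1
        day += timedelta(days=1)
    return count


# NewDate -> EndDate pairs for IDs 2, 3 and 4 from the question (dd/mm/yyyy).
print(business_days(date(2017, 6, 16), date(2017, 7, 4)))
print(business_days(date(2017, 7, 4), date(2017, 7, 10)))
print(business_days(date(2017, 7, 10), date(2017, 7, 26)))
```

Counting inclusively from NewDate to EndDate gives 13, 5 and 13 for IDs 2, 3 and 4, the figures shown in the question, which suggests EndDate is the intended upper bound of the range.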

Weighting a length of time to get a different Date each time

I have an arrival date of 01/01/2010 that occurs 50 times, and I want to randomise 50 departure dates using the length-of-stay weighting guide below. As you can see, most of these guests will leave 2 days later, but I cannot figure out how to write the code. Can you help?
LengthofStay LengthofStayWeighting
------------ ---------------------
1 1
2 5
3 4
4 3
5 3
6 3
7 3
8 1
9 1
10 1
I have started but have got stuck already
SELECT ArrivalDate,RAND(checksum(NEWID())) * LengthOfStay.LengthofStayWeighting AS Expr1,
ArrivalDate + Expr1 as DepartureDate
FROM Bookings, LengthOfStay
ORDER BY ArrivalDate
You may need to use DATEADD
SELECT ArrivalDate, DATEADD(day, RAND(checksum(NEWID())) * LengthOfStay.LengthofStayWeighting, ArrivalDate) AS DepartureDate
FROM Bookings, LengthOfStay
ORDER BY ArrivalDate
Update: Based on your comment, I think I misunderstood the question. Is this what you need?
SELECT ArrivalDate,
DATEADD(day, (select TOP 1 LengthofStayWeighting FROM LengthOfStay group by LengthofStayWeighting ORDER BY LengthofStayWeighting DESC), ArrivalDate) AS DepartureDate
FROM Bookings
ORDER BY ArrivalDate
Basically, you need to obtain the length that is repeated the most, in your case "1". If so, I think you need to include a foreign key:
SELECT ArrivalDate,
DATEADD(day, (select TOP 1 LengthofStayWeighting FROM LengthOfStay l WHERE b.Id = l.BookingId GROUP BY LengthofStayWeighting ORDER BY LengthofStayWeighting DESC), ArrivalDate) AS DepartureDate
FROM Bookings b
ORDER BY ArrivalDate
You are trying to pull numbers from a cumulative distribution. This requires generating a random number and then pulling from the distribution.
The following code gives an example:
with LengthOfStay as (select 1 as LengthOfStay, 1 as LengthOfStayWeighting union all
select 2 as LengthOfStay, 5 union all
select 3, 4 union all
select 4, 4
),
Bookings as (select cast('2013-01-01' as DATETIME) as ArrivalDate),
CumeLengthOfStay as
(select los.*,
(select SUM(LengthOfStayWeighting) from LengthOfStay los2 where los2.LengthOfStay <= los.LengthOfStay
) as cumeweighting
from LengthOfStay los
) -- select * from CumeLengthOfStay
SELECT ArrivalDate, clos.LengthOfStay, randnum % sumweighting, sumweighting,
ArrivalDate + clos.LengthOfStay as DepartureDate
FROM (select b.*, ABS(CAST(NEWID() AS binary(6))+0) as randnum
from Bookings b
) b cross join
(select SUM(LengthOfStayWeighting) as sumweighting from LengthOfStay) const left outer join
CumeLengthOfStay clos
on (b.randnum % const.sumweighting) between clos.cumeweighting - clos.LengthOfStayWeighting and clos.cumeweighting - 1
ORDER BY ArrivalDate
Basically, you add up the weights, generate a random number less than the total weight (using the % operator), and then look up this value in the cumulative sum of the weights.
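The lookup in the cumulative weights can be tried out with the full weighting table from the question. The Python sketch below is an illustration of the technique, not a translation of the query above: it uses a binary search (bisect) over the running totals, and the names stay_for and random_stay are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate

# LengthofStay -> LengthofStayWeighting, copied from the question.
WEIGHTS = {1: 1, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3, 7: 3, 8: 1, 9: 1, 10: 1}

STAYS = list(WEIGHTS)
CUMULATIVE = list(accumulate(WEIGHTS.values()))  # running totals: 1, 6, 10, ...
TOTAL = CUMULATIVE[-1]                           # sum of all weights


def stay_for(r):
    """Map an integer 0 <= r < TOTAL to a length of stay via the cumulative weights."""
    return STAYS[bisect_right(CUMULATIVE, r)]


def random_stay(rng=random):
    """Draw a length of stay with probability proportional to its weight."""
    return stay_for(rng.randrange(TOTAL))


# 50 sampled stays for the 50 bookings; add each to the arrival date as days.
sample = [random_stay() for _ in range(50)]
print(sample)
```

Each integer below the total weight maps to exactly one length of stay, and a length with weight 5 owns five of those integers, which is what makes it five times as likely as a length with weight 1.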