Grouping shift data by 7-day windows in SQL Server 2012 - sql

What I want to do is calculate the number of shifts and hours worked by each employee in any given 7-day period. To achieve this, I need to identify and group 'islands' of shifts. Note that this 7-day period is not tied to a calendar week, and its beginning and end vary from employee to employee. This sets it apart from other similar questions asked here in the past.
I have a table like this:
Person ID  Start Date  End Date  Start time  End time  Hours Worked
--------------------------------------------------------------------
12345      06-07-20    06-07-20  6:00 AM     7:45 AM   1.75
12345      06-07-20    06-07-20  8:15 AM     8:45 AM   0.50
12345      06-07-20    06-07-20  9:19 AM     9:43 AM   0.40
12345      08-07-20    08-07-20  12:00 AM    12:39 AM  0.65
12345      09-07-20    09-07-20  10:05 PM    11:59 PM  1.90
12345      11-07-20    11-07-20  4:39 PM     4:54 PM   0.25
12345      22-07-20    22-07-20  7:00 AM     7:30 AM   0.50
12345      23-07-20    23-07-20  1:00 PM     3:00 PM   2.00
12345      24-07-20    24-07-20  9:14 AM     9:35 AM   0.35
12345      27-07-20    27-07-20  4:00 PM     6:00 PM   2.00
12345      27-07-20    27-07-20  2:00 PM     4:00 PM   2.00
12345      28-07-20    28-07-20  9:00 AM     10:00 AM  1.00
12345      28-07-20    28-07-20  4:39 AM     4:59 AM   0.34
I want to group and summarise the data above like this:
Person ID  From      To        Number of shifts  Number of Hours
-----------------------------------------------------------------
12345      06-07-20  11-07-20  6                 5.45
12345      22-07-20  28-07-20  7                 8.19
Note that the first grouping for employee 12345 starts on 06-07-20 and ends on 11-07-20 because these shifts fall within the 06-07-20 - 13-07-20 7-day window.
The next 7-day window runs from 22-07-20 to 28-07-20, which means the start date of each window has to be dynamic and based on the data (i.e. not constant), and that is what makes this a complex task.
Also note that an employee may work multiple shifts in a day and that the shifts may not be consecutive.
I was playing around with using DATEDIFF() with LAG() and LEAD() but was unable to get to where I want. Any help would be appreciated.

I think you need a recursive CTE for this. The idea is to enumerate the shifts of each person, and then iteratively walk the dataset while keeping track of the first date of the period. When there are more than 7 days between the start of a period and the current date, the start date resets and a new group starts.
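with
    -- number each person's shifts in date order
    data as (
        select t.*, row_number() over(partition by personid order by start_date) rn
        from mytable t
    ),
    cte as (
        -- anchor: the first shift of each person opens the first period
        select personid, start_date, start_date end_date, hours_worked, rn
        from data
        where rn = 1
        union all
        -- step: move to the next shift; reset the period start when the shift
        -- falls more than 7 days after the current period start
        select
            c.personid,
            case when d.start_date > dateadd(day, 7, c.start_date) then d.start_date else c.start_date end,
            d.start_date,
            d.hours_worked,
            d.rn
        from cte c
        inner join data d on d.personid = c.personid and d.rn = c.rn + 1
    )
select personid, start_date, max(end_date) end_date, count(*) no_shifts, sum(hours_worked) hours_worked
from cte
group by personid, start_date
option (maxrecursion 0) -- lift SQL Server's default limit of 100 recursion levels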
This assumes that:
shifts do not span multiple days, as shown in your sample data
dates are stored using the date datatype, and times using time
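To make the walk easier to follow, here is a minimal Python sketch of the same logic, run over the sample data from the question. It is only an illustration of the idea, not a replacement for the query; the function name `group_by_window` and the tuple layout are assumptions made for this example.

```python
from datetime import date, timedelta

# Sample shifts from the question: (person_id, shift_date, hours_worked)
shifts = [
    (12345, date(2020, 7, 6), 1.75),
    (12345, date(2020, 7, 6), 0.50),
    (12345, date(2020, 7, 6), 0.40),
    (12345, date(2020, 7, 8), 0.65),
    (12345, date(2020, 7, 9), 1.90),
    (12345, date(2020, 7, 11), 0.25),
    (12345, date(2020, 7, 22), 0.50),
    (12345, date(2020, 7, 23), 2.00),
    (12345, date(2020, 7, 24), 0.35),
    (12345, date(2020, 7, 27), 2.00),
    (12345, date(2020, 7, 27), 2.00),
    (12345, date(2020, 7, 28), 1.00),
    (12345, date(2020, 7, 28), 0.34),
]


def group_by_window(rows, days=7):
    """Walk each person's shifts in date order and open a new window
    whenever a shift falls more than `days` days after the window start."""
    groups = []  # each entry: [person, window_start, last_date, n_shifts, hours]
    for person, day, hours in sorted(rows):
        current = groups[-1] if groups else None
        if (current is None or current[0] != person
                or day > current[1] + timedelta(days=days)):
            # first shift of a person, or shift outside the current window
            groups.append([person, day, day, 1, hours])
        else:
            current[2] = day
            current[3] += 1
            current[4] += hours
    return [(p, s, e, n, round(h, 2)) for p, s, e, n, h in groups]


windows = group_by_window(shifts)
for row in windows:
    print(row)
```

For the sample data this yields two windows, 06-07-20 to 11-07-20 and 22-07-20 to 28-07-20, matching the expected result in the question.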

Related

SQL query to find available slots with multiple providers and users

I want to be able to find the number of available slots for a particular time duration for all locations and all days.
For example: I have to know the number of available appointments before 10 AM in all locations from the sample tables below.
I have looked at other answers on Stack Overflow; mine is peculiar in the sense that it also involves data on multiple doctors/patients.
Doctor's time table
Location  RESOURCE  Day  StartTime  EndTime
--------------------------------------------
ABC       D1        Mon  8:00 AM    12:00 PM
ABC       D1        Tue  8:00 AM    12:00 PM
ABC       D2        Mon  9:00 AM    01:00 PM
ABC       D2        Tue  8:00 AM    12:00 PM
XYZ       D1        Mon  8:00 AM    12:00 PM
XYZ       D1        Tue  8:00 AM    12:00 PM
XYZ       D4        Mon  9:00 AM    01:00 PM
XYZ       D4        Tue  8:00 AM    12:00 PM
Patient's appointment time table
Location  Patient  Duration  StartTime  ApptDt
-----------------------------------------------
ABC       P1       15        8:00 AM    10/4/2021
ABC       P2       15        8:15 AM    10/4/2021
ABC       P3       15        9:00 AM    10/4/2021
ABC       P4       15        9:00 AM    10/5/2021
XYZ       P5       15        10:00 AM   10/5/2021
XYZ       P6       15        10:00 AM   10/5/2021
XYZ       P7       15        10:15 AM   10/5/2021
XYZ       P8       15        10:15 AM   10/5/2021
Doctor's time table does not have dates as it is the same throughout the year.
On Mondays at location ABC, two doctors overlap between 9:00 AM and 12:00 noon, so they can accept multiple appointments at the same time, i.e. 2 patients can be served from 9:00 AM to 9:15 AM at location ABC.
A typical duration(Duration) for an appointment is 15 minutes as indicated in the patient's table.
Expected result set
Location  Date       Available appts
-------------------------------------
ABC       10/4/2021  8
XYZ       10/4/2021  12
On 10/4/2021 there were 8 slots available for booking before 10 AM because there were no appointments between
8:30-8:45 for D1
8:45-9:00 for D1
9:00-9:15(2) for D1,D2
9:15-9:30(2) for D1,D2
9:30-9:45(2) for D1,D2
9:45-10:00(2) for D1,D2
I want to also know for a specific time slot how many appointments were booked vs available.
I'd re-imagine this data as transactional using CTEs, compute balances and then find the points where the balance is non-zero.
Conceptually, that means there's a +1 doctor transaction on each doctor's start time, and a -1 doctor transaction on each doctor's end time. Patients are just the reverse, there is a -1 doctor transaction at their start time and a +1 doctor transaction at their start time plus duration.
So something like:
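-- Assumes SQL Server, English day names, and datetime-typed date/time columns (so they can be added)
WITH DrStarts AS (
    -- +1 doctor whenever a doctor's shift starts
    SELECT
        1 [Drs],
        [Dates].[Date] + [DrSched].StartTime [Timestamp]
    FROM [DrSched]
    INNER JOIN [Dates]
        ON LEFT(DATENAME(WEEKDAY, [Dates].[Date]), 3) = [DrSched].[Day]
), DrEnds AS (
    -- -1 doctor whenever a doctor's shift ends
    SELECT
        -1 [Drs],
        [Dates].[Date] + [DrSched].EndTime [Timestamp]
    FROM [DrSched]
    INNER JOIN [Dates]
        ON LEFT(DATENAME(WEEKDAY, [Dates].[Date]), 3) = [DrSched].[Day]
), ApptStarts AS (
    -- an appointment occupies one doctor from its start time...
    SELECT -1 [Drs], [ApptDt] + [StartTime] [Timestamp] FROM [Appts]
), ApptEnds AS (
    -- ...and releases that doctor [Duration] minutes later
    SELECT 1 [Drs], DATEADD(MINUTE, [Duration], [ApptDt] + [StartTime]) [Timestamp] FROM [Appts]
), Txns AS (
    SELECT *, 1 Priority FROM DrStarts
    UNION ALL SELECT *, 1 Priority FROM DrEnds
    UNION ALL SELECT *, 0 Priority FROM ApptStarts
    UNION ALL SELECT *, 0 Priority FROM ApptEnds
)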
I added priorities at the end so we can make sure the patient leaves an instant before the doctor leaves. Then you can get the balance using a windowed function like so:
, AvailDrs AS (
    SELECT
        *,
        -- running balance of free doctors, accumulated in chronological order
        SUM([Drs]) OVER( ORDER BY [Timestamp], [Priority] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) [AvailDrs]
    FROM Txns
)
Then to get the available slots, you just do:
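SELECT [From], [To], [AvailDrs]
FROM (
    -- compute [To] before filtering, so it is the very next transaction
    -- rather than the next transaction with a positive balance
    SELECT
        [Timestamp] [From],
        LEAD([Timestamp]) OVER(ORDER BY [Timestamp], [Priority]) [To],
        [AvailDrs]
    FROM AvailDrs
) Windows
WHERE [AvailDrs] > 0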
Though you may want to filter that to get rid of zero-length windows because those will occur.
This is not very performant, but if you have a high volume scenario, you probably want to reconsider your database design to make this function require less transformation.
You also need to make a date table. I presume you actually have a work calendar somewhere, but if not, there are myriad ways to create a date table for a dynamic start/end date, so I just assume it exists here. This approach also lets you easily slot in holidays, and perhaps incorporate a doctor-specific leave calendar too.
In general, a wide range of difficult SQL problems become much easier if you reimagine the data as account/amount/timestamp transactions. Here you don't even subdivide into accounts, but you often need that concept for other puzzles.
Also, I haven't tested this exact code, so you may end up with duplicates. If that's the case, you may need a global key as an ORDER BY tie breaker to keep everything running smoothly in the windowed functions. You can add this as an identity column to both tables, or just define a CTE with a DENSE_RANK() key column and use that instead of selecting from the tables directly.
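The transaction idea can also be sketched outside SQL. The Python below is a rough illustration only, using the Monday data for location ABC from the question; the helper `at`, the event tuple layout, and the variable names are assumptions made for this example. It builds the +1/-1 events, sorts them by timestamp and priority, and keeps the non-empty stretches where the running balance of free doctors is positive.

```python
from datetime import datetime, timedelta

day = datetime(2021, 10, 4)  # a Monday


def at(hour, minute=0):
    """Timestamp on the sample day (hypothetical helper for this sketch)."""
    return day + timedelta(hours=hour, minutes=minute)


# Location ABC on Monday: doctor shifts and booked appointments from the question.
doctor_shifts = [(at(8), at(12)), (at(9), at(13))]          # D1, D2
appointments = [(at(8), 15), (at(8, 15), 15), (at(9), 15)]  # (start, minutes)

# Each event is (timestamp, priority, change in free doctors).
# Priority 0 sorts appointment events before doctor events at the same instant.
events = []
for start, end in doctor_shifts:
    events.append((start, 1, +1))
    events.append((end, 1, -1))
for start, minutes in appointments:
    events.append((start, 0, -1))
    events.append((start + timedelta(minutes=minutes), 0, +1))
events.sort()

# Running balance; keep stretches with at least one free doctor and non-zero length.
windows = []  # (from, to, free doctors)
balance = 0
for (ts, _, delta), nxt in zip(events, events[1:]):
    balance += delta
    if balance > 0 and nxt[0] > ts:
        windows.append((ts, nxt[0], balance))

for start, end, free in windows:
    print(start.strftime("%H:%M"), end.strftime("%H:%M"), free)
```

Each window can then be cut into 15-minute slots and multiplied by the number of free doctors to count bookable appointments.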

Grouping sets of data in Oracle SQL

I have been trying to separate groups in data being stored on my oracle database for more accurate analysis.
Current Output
Time Location
10:00 A111
11:00 A112
12:00 S111
13:00 S234
17:00 A234
18:00 S747
19:00 A878
Desired Output
Time Location Group Number
10:00 A111 1
11:00 A112 1
12:00 S111 1
13:00 S234 1
17:00 A234 2
18:00 S747 2
19:00 A878 3
I have been trying to use over and partition by to assign the values; however, I can only get it to increment all the time, not only on a change. I also tried using lag but struggled to make use of it.
I only need the value in the second column to start from 1 and increment when the first letter of field 1 changes (using substr).
This is my attempt using row_number but I am far off I think. There would be a time column in the output as well not shown above.
select event_time, st_location, Row_Number() over(partition by
SUBSTR(location,1,1) order
by event_time)
as groupnumber from pic
Any help would be really appreciated!
Edit:
Time Location Group Number
10:00 A-10112 1
11:00 A-10421 1
12:00 ST-10621 1
13:00 ST-23412 1
17:00 A-19112 2
18:00 ST-74712 2
19:00 A-87812 3
It is a gaps-and-islands problem. Use the following code:
select location,
dense_rank() over (partition by SUBSTR(location,1,1) order by grp)
from
(
select (row_number() over (order by time)) -
(row_number() over (partition by SUBSTR(location,1,1) order by time)) grp,
location,
time
from data
) t
order by time
dbfiddle demo
The main idea is in the subquery which isolates consecutive sequences of items (computation of grp column). The rest is simple once you have the grp column.
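To see why the difference of the two row numbers isolates the islands, here is a small Python sketch over the sample rows from the question. It is an illustration under the assumption that rows arrive ordered by time; the variable names are made up for this example.

```python
from collections import defaultdict

# Rows from the question, already ordered by time: (time, location)
rows = [
    ("10:00", "A111"), ("11:00", "A112"), ("12:00", "S111"), ("13:00", "S234"),
    ("17:00", "A234"), ("18:00", "S747"), ("19:00", "A878"),
]

seen = defaultdict(int)      # row_number() partitioned by first letter
islands = defaultdict(list)  # distinct grp values per first letter, in order of appearance
result = []
for overall_rn, (time, location) in enumerate(rows, start=1):
    key = location[0]
    seen[key] += 1
    # grp stays constant while rows with the same first letter are consecutive
    grp = overall_rn - seen[key]
    if grp not in islands[key]:
        islands[key].append(grp)
    # grp never decreases within a letter, so its position in the list
    # is the same as dense_rank() ordered by grp
    group_number = islands[key].index(grp) + 1
    result.append((time, location, group_number))

for row in result:
    print(row)
```

The group numbers come out as 1, 1, 1, 1, 2, 2, 3, which is the desired output from the question.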
select DENSE_RANK() over(partition by SUBSTR("location",1,1) ORDER BY SUBSTR("location",1,2))
as Rownumber,
"location" from Table1;
Demo
http://sqlfiddle.com/#!4/21120/16

SQL Server Query to Get Available Employee based on Schedule

I have two tables, parent table Employees and child table Employees_Availability, like this:
Employees table:
EmployeesID Name Group Availability_Order Available
--------------------------------------------------------------
1 Steve Sales 1 TRUE
2 Ann Sales 2 TRUE
3 Jack Sales 3 FALSE
4 Sandy Support 4 TRUE
5 Bill Support 5 TRUE
6 John Support 6 TRUE
Employees_Schedule table:
EmployeesID Day From To
----------------------------------------------
1 Monday 8:00 12:00
1 Monday 13:00 17:00
2 Monday 12:00 13:00
3 Tuesday 7:30 11:30
3 Wednesday 7:30 11:30
3 Friday 14:30 16:30
4 Tuesday 11:30 17:00
5 Wednesday 8:00 12:00
5 Wednesday 13:00 17:00
5 Thursday 12:00 13:00
5 Friday 7:30 11:30
6 Friday 12:00 13:00
How can I create a query that given date/time and Group return first available employee? I am using SQL Server 2012. Here is what I started doing but got stuck:
Select top 1
Name
from
Empolyees e join? Employees_Schedule s
on
e.employeesID = s.EmployeesID
where
e.group = 'Sales'
and DATENAME(Weekday,'5/24/2016 10:00') = s.Day
and CAST('5/24/2016 10:00' AS TIME) 'hh:mm' >= CAST(s.from AS TIME)
and CAST('5/24/2016 10:00' AS TIME) 'hh:mm' <= CAST(s.to AS TIME)
order by
e.availability_order
Thanks
Have you looked into Window Function and CTE? You could easily achieve this with, for example..
Row_Number() OVER(PARTITION BY day ORDER BY starttime ASC) as ColumnName
Combined with predicate
WHERE columnName = 1 AND groupName = 'groupname'
For details, read Books Online (BOL) on the OVER() clause and on CTEs.
It looks like you're close. If you wrap the main part of your SQL in a Common Table Expression and use the row_number() window function then you can find the first available:
;with cte as (
    select
        e.Name,
        -- rank the matching employees by their availability order
        row_number() over (order by e.Availability_Order) PrioritySequence
    from
        Employees e
        join Employees_Schedule s
            on e.EmployeesID = s.EmployeesID
    where
        e.[Group] = 'Sales'
        and DATENAME(Weekday, '5/24/2016 10:00') = s.[Day]
        and CAST('5/24/2016 10:00' AS TIME) >= CAST(s.[From] AS TIME)
        and CAST('5/24/2016 10:00' AS TIME) <= CAST(s.[To] AS TIME)
)
select *
from cte
where PrioritySequence = 1

SQL Time Packing of Islands

I have an sql table that has something similar to this:
EmpNo StartTime EndTime
------------------------------------------
1 7:00 7:30
1 7:15 7:45
1 13:40 15:00
2 8:00 14:00
2 8:30 9:00
3 10:30 14:30
I've seen a lot of examples where you can find the gaps between everything, and a lot of examples where you can pack overlaps for everything. But I want to be able to separate these out by user.
Sadly, I need a pure SQL solution.
Ultimately, I would like to return:
EmpNo StartTime EndTime
------------------------------------------
1 7:00 7:45
1 13:40 15:00
2 8:00 14:00
3 10:30 14:30
It seems simple enough, I have just spent the last day trying to figure it out, and come up with very little. Never will any column here be NULL, and you can assume there could be duplicates, or gaps of 0.
I know this is the classic island problem, but the solutions I have seen so far aren't very friendly when it comes to keeping separate IDs grouped.
"Pure SQL" would surely support the lag(), lead(), and cumulative sum functions because these are part of the standard. Here is a solution using standard SQL:
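select EmpNo, min(StartTime) as StartTime, max(EndTime) as EndTime
from (select t.*, sum(StartGroup) over (partition by EmpNo order by StartTime, EndTime) as grp
      from (select t.*,
                   -- start a new island unless this row begins at or before the latest
                   -- end time of all earlier rows for the same employee (a plain lag()
                   -- misses rows nested inside a longer, earlier interval)
                   (case when StartTime <= max(EndTime) over (partition by EmpNo
                                                               order by StartTime, EndTime
                                                               rows between unbounded preceding and 1 preceding)
                         then 0
                         else 1
                    end) as StartGroup
            from table t
           ) t
     ) t
group by EmpNo, grp;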
If your database doesn't support these, you can implement the same logic using correlated subqueries.
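For comparison, the same packing logic can be sketched in a few lines of Python. This is only meant to show what the query computes on the sample rows; the helpers `hm`, `fmt`, and `pack` are hypothetical names for this example.

```python
# Sample rows from the question: (emp_no, start, end)
rows = [(1, "7:00", "7:30"), (1, "7:15", "7:45"), (1, "13:40", "15:00"),
        (2, "8:00", "14:00"), (2, "8:30", "9:00"), (3, "10:30", "14:30")]


def hm(text):
    """Convert 'H:MM' to minutes since midnight."""
    hours, minutes = text.split(":")
    return int(hours) * 60 + int(minutes)


def fmt(minutes):
    """Convert minutes since midnight back to 'H:MM'."""
    return f"{minutes // 60}:{minutes % 60:02d}"


def pack(rows):
    """Merge overlapping or touching intervals separately for each employee."""
    packed = []  # each entry: [emp_no, start, end] in minutes
    for emp, start, end in sorted((e, hm(s), hm(t)) for e, s, t in rows):
        last = packed[-1] if packed else None
        if last and last[0] == emp and start <= last[2]:
            # overlaps (or touches) the current island: extend it
            last[2] = max(last[2], end)
        else:
            # different employee or a gap: start a new island
            packed.append([emp, start, end])
    return [(emp, fmt(s), fmt(e)) for emp, s, e in packed]


islands = pack(rows)
for row in islands:
    print(row)
```

Sorting by employee and start time plays the role of `partition by EmpNo order by StartTime`, and tracking the island's latest end time corresponds to the running maximum of `EndTime`.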

MySQL select using datetime, group by date only

Is it possible to select a datetime field from a MySQL table and group by the date only?
I'm trying to output a list of events that happen at multiple times, grouped by the date it happened on.
My table/data looks like this: (the timestamp is a datetime field)
1. 2010-03-21 18:00:00 Event1
2. 2010-03-21 18:30:00 Event2
3. 2010-03-30 13:00:00 Event3
4. 2010-03-30 14:00:00 Event4
I want to output something like this:
March 21st
1800 - Event 1
1830 - Event 2
March 30th
1300 - Event 3
1400 - Event 4
Thanks!
select date_format(created_at, "%Y-%m-%d") as date from tablename GROUP BY date
OR
SELECT DATE_FORMAT(date_column, '%H%i') as time, event FROM table ORDER BY DATE_FORMAT(date_column, '%Y-%m-%d'), time
SELECT DATE_FORMAT(date_column, '%H%i'), DATE_FORMAT(date_column, '%M %D'), event FROM table ORDER BY date_column
%H%i - 1830
%M %D - March 21st
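Since the desired output nests the events under a date heading, the grouping itself is usually done in application code after an ordered query. The following Python sketch shows one way that could look, assuming the rows come back as (timestamp, event) pairs sorted by timestamp; the `ordinal` helper and the sample rows are written for this example only.

```python
from datetime import datetime
from itertools import groupby

# Rows as they might come back from "... ORDER BY date_column"
rows = [
    (datetime(2010, 3, 21, 18, 0), "Event1"),
    (datetime(2010, 3, 21, 18, 30), "Event2"),
    (datetime(2010, 3, 30, 13, 0), "Event3"),
    (datetime(2010, 3, 30, 14, 0), "Event4"),
]

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 21 -> '21st'."""
    if 11 <= n % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


lines = []
# groupby needs its input sorted by the grouping key, hence sorted(rows)
for day, events in groupby(sorted(rows), key=lambda r: r[0].date()):
    lines.append(f"{MONTHS[day.month - 1]} {ordinal(day.day)}")
    for ts, name in events:
        lines.append(f"{ts.strftime('%H%M')} - {name}")

print("\n".join(lines))
```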