SQL Time Packing of Islands - sql

I have an sql table that has something similar to this:
EmpNo  StartTime  EndTime
-------------------------
1      7:00       7:30
1      7:15       7:45
1      13:40      15:00
2      8:00       14:00
2      8:30       9:00
3      10:30      14:30
I've seen a lot of examples where you can find the gaps between everything, and a lot of examples where you can pack overlaps for everything. But I want to be able to separate these out by user.
Sadly, I need a pure SQL solution.
Ultimately, I would like to return:
EmpNo  StartTime  EndTime
-------------------------
1      7:00       7:45
1      13:40      15:00
2      8:00       14:00
3      10:30      14:30
It seems simple enough, but I have spent the last day trying to figure it out and have come up with very little. No column here will ever be NULL, and you can assume there could be duplicates, or gaps of 0.
I know this is the classic islands problem, but the solutions I have seen so far don't make it easy to keep separate IDs grouped.

"Pure SQL" would surely support the lag(), lead(), and cumulative sum functions because these are part of the standard. Here is a solution using standard SQL:
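select EmpNo, min(StartTime) as StartTime, max(EndTime) as EndTime
from (select t.*,
             sum(StartGroup) over (partition by EmpNo order by StartTime) as grp
      from (select t.*,
                   -- 1 starts a new island, 0 extends the current one.
                   -- Comparing with the running max of earlier end times (rather than
                   -- only lag(EndTime)) also handles an interval nested inside a longer one.
                   (case when StartTime <= max(EndTime) over (partition by EmpNo
                                                              order by StartTime
                                                              rows between unbounded preceding and 1 preceding)
                         then 0
                         else 1
                    end) as StartGroup
            from table t
           ) t
     ) t
group by EmpNo, grp;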
If your database doesn't support these, you can implement the same logic using correlated subqueries.
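As a runnable check of the pack-by-employee idea, here is a minimal sketch in Python using the standard-library sqlite3 module (it assumes an SQLite build with window functions, 3.25 or later). The table name shifts and the zero-padded HH:MM text times are assumptions made for the demo. The query compares each start time with the running maximum of earlier end times, a variant of the lag() comparison that also copes with an interval nested inside a longer, earlier one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical demo table; times are zero-padded text so they sort correctly.
conn.execute("CREATE TABLE shifts (EmpNo INTEGER, StartTime TEXT, EndTime TEXT)")
conn.executemany("INSERT INTO shifts VALUES (?, ?, ?)", [
    (1, "07:00", "07:30"),
    (1, "07:15", "07:45"),
    (1, "13:40", "15:00"),
    (2, "08:00", "14:00"),
    (2, "08:30", "09:00"),
    (3, "10:30", "14:30"),
])

PACK_SQL = """
SELECT EmpNo, MIN(StartTime) AS StartTime, MAX(EndTime) AS EndTime
FROM (SELECT s.*,
             -- running count of island starts = island number per employee
             SUM(StartGroup) OVER (PARTITION BY EmpNo ORDER BY StartTime, EndTime
                                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
      FROM (SELECT t.*,
                   -- 1 when this row does not overlap anything seen so far
                   CASE WHEN StartTime <= MAX(EndTime) OVER (
                                 PARTITION BY EmpNo ORDER BY StartTime, EndTime
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
                        THEN 0 ELSE 1 END AS StartGroup
            FROM shifts t) s) g
GROUP BY EmpNo, grp
ORDER BY EmpNo, StartTime
"""

packed = conn.execute(PACK_SQL).fetchall()
for row in packed:
    print(row)
```

On the sample rows this returns the four packed intervals asked for in the question, one row per employee and island.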

Related

SQL query to find available slots with multiple providers and users

I want to be able to find the number of available slots for a particular time duration for all locations and all days
For example: I have to know the number of available appointments before 10 AM in all locations from the below sample tables
I have looked at other answers on Stack Overflow; mine is peculiar in the sense that it also involves data on multiple doctors/patients.
Doctor's time table
Location  RESOURCE  Day  StartTime  EndTime
ABC       D1        Mon  8:00 AM    12:00 PM
ABC       D1        Tue  8:00 AM    12:00 PM
ABC       D2        Mon  9:00 AM    01:00 PM
ABC       D2        Tue  8:00 AM    12:00 PM
XYZ       D1        Mon  8:00 AM    12:00 PM
XYZ       D1        Tue  8:00 AM    12:00 PM
XYZ       D4        Mon  9:00 AM    01:00 PM
XYZ       D4        Tue  8:00 AM    12:00 PM
Patient's appointment time table
Location  Patient  Duration  StartTime  ApptDt
ABC       P1       15        8:00 AM    10/4/2021
ABC       P2       15        8:15 AM    10/4/2021
ABC       P3       15        9:00 AM    10/4/2021
ABC       P4       15        9:00 AM    10/5/2021
XYZ       P5       15        10:00 AM   10/5/2021
XYZ       P6       15        10:00 AM   10/5/2021
XYZ       P7       15        10:15 AM   10/5/2021
XYZ       P8       15        10:15 AM   10/5/2021
Doctor's time table does not have dates as it is the same throughout the year.
On Mondays at location ABC, two doctors overlap between 9:00 AM and 12:00 noon, so they can accept multiple appointments at the same time; i.e., 2 patients can be served from 9:00 AM to 9:15 AM at location ABC.
A typical duration(Duration) for an appointment is 15 minutes as indicated in the patient's table.
Expected result set
Location  Date       Available appts
ABC       10/4/2021  8
XYZ       10/4/2021  12
On 10/4/2021 there were 8 slots available for booking before 10 AM because there were no appointments between
8:30-8:45 for D1
8:45-9:00 for D1
9:00-9:15(2) for D1,D2
9:15-9:30(2) for D1,D2
9:30-9:45(2) for D1,D2
9:45-10:00(2) for D1,D2
I want to also know for a specific time slot how many appointments were booked vs available.
I'd re-imagine this data as transactional using CTEs, compute balances and then find the points where the balance is non-zero.
Conceptually, that means there's a +1 doctor transaction at each doctor's start time, and a -1 doctor transaction at each doctor's end time. Patients are just the reverse: there is a -1 doctor transaction at their start time and a +1 doctor transaction at their start time plus duration.
So something like:
WITH DrStarts AS (
    SELECT
        1 [Drs],
        [Dates].[Date] + [DrSched].StartTime [Timestamp]
    FROM [DrSched]
    INNER JOIN [Dates]
        -- WEEKDAY() is a placeholder for mapping a date to the schedule's Day value
        ON WEEKDAY([Dates].[Date]) = [DrSched].[Day]
), DrEnds AS (
    SELECT
        -1 [Drs],
        [Dates].[Date] + [DrSched].EndTime [Timestamp]
    FROM [DrSched]
    INNER JOIN [Dates]
        ON WEEKDAY([Dates].[Date]) = [DrSched].[Day]
), ApptStarts AS (
    -- a patient arriving takes one doctor out of the pool
    SELECT -1 [Drs], [Date] + [Time] [Timestamp] FROM [Appts]
), ApptEnds AS (
    -- the doctor is returned [Duration] minutes later (MINUTE, not MM, which means month)
    SELECT 1 [Drs], DATEADD(MINUTE, [Duration], [Date] + [Time]) [Timestamp] FROM [Appts]
), Txns AS (
    SELECT *, 1 Priority FROM DrStarts
    UNION ALL SELECT *, 1 Priority FROM DrEnds
    UNION ALL SELECT *, 0 Priority FROM ApptStarts
    UNION ALL SELECT *, 0 Priority FROM ApptEnds
)
I added priorities at the end so we can make sure the patient leaves an instant before the doctor leaves. Then you can get the balance using a windowed function like so:
, AvailDrs AS (
    SELECT
        *,
        -- running balance in chronological order
        SUM([Drs]) OVER (ORDER BY [Timestamp], [Priority] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) [AvailDrs]
    FROM Txns
)
Then to get the available slots, you just do:
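SELECT [From], [To], [AvailDrs]
FROM (
    -- compute LEAD over every transaction first; a WHERE in the same query
    -- would be applied before the window function and skew the [To] values
    SELECT
        [AvailDrs].[Timestamp] [From],
        LEAD([AvailDrs].[Timestamp]) OVER (ORDER BY [AvailDrs].[Timestamp], [AvailDrs].[Priority]) [To],
        [AvailDrs].[AvailDrs]
    FROM AvailDrs
) Windows
WHERE [AvailDrs] > 0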
Though you may want to filter that to get rid of zero-length windows because those will occur.
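To make the running-balance idea concrete, here is a minimal sketch in Python with the standard-library sqlite3 module (window functions need SQLite 3.25 or later). It covers only location ABC on Monday 10/4/2021 from the question's sample data, with D1, D2 and patients P1 to P3. The txns table and the minutes-after-midnight encoding are assumptions for the demo. Instead of a priority column, it first sums the deltas per timestamp, which is a swapped-in way of avoiding ties and zero-length windows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical transaction table: ts = minutes after midnight, delta = +1/-1 doctors.
conn.execute("CREATE TABLE txns (ts INTEGER, delta INTEGER)")
conn.executemany("INSERT INTO txns VALUES (?, ?)", [
    (480, +1), (720, -1),   # D1 works 8:00-12:00
    (540, +1), (780, -1),   # D2 works 9:00-13:00
    (480, -1), (495, +1),   # P1 booked 8:00-8:15
    (495, -1), (510, +1),   # P2 booked 8:15-8:30
    (540, -1), (555, +1),   # P3 booked 9:00-9:15
])

BALANCE_SQL = """
WITH per_ts AS (
    -- one net change per timestamp, so the ordering below has no ties
    SELECT ts, SUM(delta) AS delta FROM txns GROUP BY ts
), bal AS (
    SELECT ts AS from_ts,
           LEAD(ts) OVER (ORDER BY ts) AS to_ts,
           SUM(delta) OVER (ORDER BY ts
                            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS avail
    FROM per_ts
)
-- filter only after both window functions have seen every row
SELECT from_ts, to_ts, avail
FROM bal
WHERE avail > 0 AND to_ts IS NOT NULL
ORDER BY from_ts
"""

free_windows = conn.execute(BALANCE_SQL).fetchall()
for from_ts, to_ts, avail in free_windows:
    print(f"{from_ts // 60}:{from_ts % 60:02d}-{to_ts // 60}:{to_ts % 60:02d}", avail)
```

Each output row is a stretch of time with a constant number of free doctors; splitting those stretches into 15-minute slots and restricting them to a cut-off time is a further step left out of this sketch.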
This is not very performant, but if you have a high volume scenario, you probably want to reconsider your database design to make this function require less transformation.
You also need to make a date table. I presume you actually have a work calendar somewhere, but if not there are myriad ways to create a date table for a dynamic start/end date, so I just assume it exists here. This approach also lets you easily slot in holidays, and perhaps incorporate a doctor-specific leave calendar too.
In general, a wide range of difficult SQL problems become much easier if you reimagine the data as account/amount/timestamp transactions. Here you don't even subdivide into accounts, but you often need that concept for other puzzles.
Also, I haven't tested this exact code, so you may end up with duplicates. If that's the case, you may need a global key as an ORDER BY tie-breaker to keep the windowed functions deterministic. You can add this as an identity column to both tables, or define a CTE with a DENSE_RANK() key column and select from that instead of from the tables directly.

Grouping shift data by 7-day windows in SQL Server 2012

What I want to do is to calculate the number of shifts and hours worked by each employee in any given 7-day period. In order to achieve this, I need to identify and group 'islands' of shifts. Note that this 7-day period is not tied to a calendar week, and the beginning and end of the period vary from employee to employee. This sets it apart from other similar questions asked here in the past.
I have a table like this:
Person ID Start Date End Date Start time End time Hours Worked
12345 06-07-20 06-07-20 6:00 AM 7:45 AM 1.75
12345 06-07-20 06-07-20 8:15 AM 8:45 AM 0.50
12345 06-07-20 06-07-20 9:19 AM 9:43 AM 0.40
12345 08-07-20 08-07-20 12:00 AM 12:39 AM 0.65
12345 09-07-20 09-07-20 10:05 PM 11:59 PM 1.90
12345 11-07-20 11-07-20 4:39 PM 4:54 PM 0.25
12345 22-07-20 22-07-20 7:00 AM 7:30 AM 0.50
12345 23-07-20 23-07-20 1:00 PM 3:00 PM 2.00
12345 24-07-20 24-07-20 9:14 AM 9:35 AM 0.35
12345 27-07-20 27-07-20 4:00 PM 6:00 PM 2.00
12345 27-07-20 27-07-20 2:00 PM 4:00 PM 2.00
12345 28-07-20 28-07-20 9:00 AM 10:00 AM 1.00
12345 28-07-20 28-07-20 4:39 AM 4:59 AM 0.34
I want to group and summarise the data above like this:
Person ID From To Number of shifts Number of Hours
12345 06-07-20 11-07-20 6 5.45
12345 22-07-20 28-07-20 7 8.19
Note that the first grouping for employee 12345 starts on 06-07-20 and ends on 11-07-20 because these shifts fall within the 06-07-20 - 13-07-20 7-day window.
The next 7-day window is from 22-07-20 to 28-07-20, which means that the start date for the 7-day window has to be dynamic and based on the data, i.e. it is not constant, which makes this a complex task.
Also note that an employee may work multiple shifts in a day and that the shifts may not be consecutive.
I was playing around with using DATEDIFF() with LAG() and LEAD() but was unable to get to where I want. Any help would be appreciated.
I think you need a recursive CTE for this. The idea is to enumerate the shifts of each person, and then iteratively walk the dataset while keeping track of the first date of the period: when there are more than 7 days between the start of a period and the current date, the start date resets and a new group starts.
with
    data as (
        -- start_time (assumed column name) breaks ties between shifts on the same day
        select t.*, row_number() over(partition by personid order by start_date, start_time) rn
        from mytable t
    ),
    cte as (
        select personid, start_date, start_date end_date, hours_worked, rn
        from data
        where rn = 1
        union all
        select
            c.personid,
            -- keep the window start until a shift falls more than 7 days after it
            case when d.start_date > dateadd(day, 7, c.start_date) then d.start_date else c.start_date end,
            d.start_date,
            d.hours_worked,
            d.rn
        from cte c
        inner join data d on d.personid = c.personid and d.rn = c.rn + 1
    )
select personid, start_date, max(end_date) end_date, count(*) no_shifts, sum(hours_worked) hours_worked
from cte
group by personid, start_date
option (maxrecursion 0) -- lift SQL Server's default limit of 100 recursion levels
This assumes that:
shifts do not span multiple days, as shown in your sample data
dates are stored as the date datatype, and times as time
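The walk described above can be tried end to end with a small sketch in Python's standard-library sqlite3 module (SQLite 3.25 or later for ROW_NUMBER). This is an SQLite adaptation, not SQL Server syntax: it uses WITH RECURSIVE and date(x, '+7 days') in place of dateadd, and the shift_id key, the column names window_start and shift_date, and the ISO date strings are assumptions for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; shift_id gives row_number() a deterministic tie-breaker.
conn.execute("""CREATE TABLE mytable (
    shift_id INTEGER PRIMARY KEY, personid INTEGER,
    start_date TEXT, hours_worked REAL)""")
sample = [
    ("2020-07-06", 1.75), ("2020-07-06", 0.50), ("2020-07-06", 0.40),
    ("2020-07-08", 0.65), ("2020-07-09", 1.90), ("2020-07-11", 0.25),
    ("2020-07-22", 0.50), ("2020-07-23", 2.00), ("2020-07-24", 0.35),
    ("2020-07-27", 2.00), ("2020-07-27", 2.00), ("2020-07-28", 1.00),
    ("2020-07-28", 0.34),
]
conn.executemany(
    "INSERT INTO mytable (personid, start_date, hours_worked) VALUES (12345, ?, ?)",
    sample)

WINDOW_SQL = """
WITH RECURSIVE
data AS (
    SELECT t.*, ROW_NUMBER() OVER (PARTITION BY personid
                                   ORDER BY start_date, shift_id) AS rn
    FROM mytable t
),
cte AS (
    SELECT personid, start_date AS window_start, start_date AS shift_date,
           hours_worked, rn
    FROM data WHERE rn = 1
    UNION ALL
    SELECT c.personid,
           -- reset the window start once a shift is more than 7 days after it
           CASE WHEN d.start_date > date(c.window_start, '+7 days')
                THEN d.start_date ELSE c.window_start END,
           d.start_date, d.hours_worked, d.rn
    FROM cte c
    JOIN data d ON d.personid = c.personid AND d.rn = c.rn + 1
)
SELECT personid, window_start, MAX(shift_date), COUNT(*), SUM(hours_worked)
FROM cte
GROUP BY personid, window_start
ORDER BY personid, window_start
"""

# Round the summed hours in Python to hide floating-point noise.
windows = [(p, start, end, n, round(hours, 2))
           for p, start, end, n, hours in conn.execute(WINDOW_SQL)]
for row in windows:
    print(row)
```

With the question's sample shifts this produces two windows for person 12345, matching the desired summary: six shifts from 06-07-20 to 11-07-20 and seven shifts from 22-07-20 to 28-07-20.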

How to have the rolling distinct count of each day for past three days in Oracle SQL?

I searched for this a lot, but I couldn't find a solution yet. Let me explain my question with sample data and my desired output.
sample data:
datetime customer
---------- --------
2018-10-21 09:00 Ryan
2018-10-21 10:00 Sarah
2018-10-21 20:00 Sarah
2018-10-22 09:00 Peter
2018-10-22 10:00 Andy
2018-10-23 09:00 Sarah
2018-10-23 10:00 Peter
2018-10-24 10:00 Andy
2018-10-24 20:00 Andy
my desired output is the distinct number of customers for the past three days relative to each day:
trunc(datetime) progressive count distinct customer
--------------- -----------------------------------
2018-10-21 2
2018-10-22 4
2018-10-23 4
2018-10-24 3
Explanation: for the 21st, we only have Ryan and Sarah (and no records before the 21st), so the count is 2. For the 22nd, Andy and Peter are added to the distinct list, so it's 4. For the 23rd, no new customer is added, so it stays at 4. For the 24th, however, we should only consider the past 3 days (as per business logic), i.e. the 24th, 23rd and 22nd; the distinct customers are Sarah, Andy and Peter, so the count is 3.
I believe it is called a progressive count, moving count or rolling count, but I couldn't implement it in Oracle 11g SQL. Obviously it's easy with PL/SQL programming (stored procedure/function), but I would prefer a single SQL query if that is possible.
What you seem to want is:
select date,
count(distinct customer) over (order by date rows between 2 preceding and current row)
from (select distinct trunc(datetime) as date, customer
from t
) t
group by date;
However, Oracle does not support window frames with count(distinct).
One rather brute force approach is a correlated subquery:
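select dte,
       (select count(distinct t2.customer)
        from t t2
        where t2.datetime >= t.dte - 2
          and t2.datetime < t.dte + 1  -- upper bound keeps later days out of the window
       ) as running_3
from (select distinct trunc(datetime) as dte  -- "date" is a reserved word in Oracle
      from t
     ) t;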
This should have reasonable performance for a small number of dates. As the number of dates increases, the performance will degrade linearly.
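As a quick check of the correlated-subquery approach against the sample data, here is a minimal sketch using Python's standard-library sqlite3 module rather than Oracle. The visits table and visited_at column are hypothetical names, and SQLite's date() function stands in for trunc(); the window is bounded on both sides (the current day and the two days before it).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical demo table holding the question's sample rows.
conn.execute("CREATE TABLE visits (visited_at TEXT, customer TEXT)")
conn.executemany("INSERT INTO visits VALUES (?, ?)", [
    ("2018-10-21 09:00", "Ryan"),
    ("2018-10-21 10:00", "Sarah"),
    ("2018-10-21 20:00", "Sarah"),
    ("2018-10-22 09:00", "Peter"),
    ("2018-10-22 10:00", "Andy"),
    ("2018-10-23 09:00", "Sarah"),
    ("2018-10-23 10:00", "Peter"),
    ("2018-10-24 10:00", "Andy"),
    ("2018-10-24 20:00", "Andy"),
])

ROLLING_SQL = """
SELECT d.day,
       -- distinct customers seen on this day or the two days before it
       (SELECT COUNT(DISTINCT v.customer)
        FROM visits v
        WHERE date(v.visited_at) BETWEEN date(d.day, '-2 days') AND d.day
       ) AS running_3
FROM (SELECT DISTINCT date(visited_at) AS day FROM visits) d
ORDER BY d.day
"""

rolling = conn.execute(ROLLING_SQL).fetchall()
for day, running_3 in rolling:
    print(day, running_3)
```

The counts come out as 2, 4, 4 and 3 for the four days, which is the desired output from the question.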

Grouping sets of data in Oracle SQL

I have been trying to separate groups in data being stored on my oracle database for more accurate analysis.
Current Output
Time   Location
10:00  A111
11:00  A112
12:00  S111
13:00  S234
17:00  A234
18:00  S747
19:00  A878
Desired Output
Time   Location  Group Number
10:00  A111      1
11:00  A112      1
12:00  S111      1
13:00  S234      1
17:00  A234      2
18:00  S747      2
19:00  A878      3
I have been trying to use over and partition by to assign the values; however, I can only get it to increment on every row, not only on a change. I also tried using lag, but I struggled to make use of it.
I only need the value in the second column to start from 1 and increment when the first letter of field 1 changes (using substr).
This is my attempt using row_number but I am far off I think. There would be a time column in the output as well not shown above.
select event_time, st_location,
       Row_Number() over(partition by SUBSTR(location,1,1) order by event_time) as groupnumber
from pic
Any help would be really appreciated!
Edit:
Time   Location  Group Number
10:00  A-10112   1
11:00  A-10421   1
12:00  ST-10621  1
13:00  ST-23412  1
17:00  A-19112   2
18:00  ST-74712  2
19:00  A-87812   3
This is a gaps-and-islands problem. Use the following code:
select location,
       dense_rank() over (partition by SUBSTR(location,1,1) order by grp)
from
(
    select (row_number() over (order by time)) -
           (row_number() over (partition by SUBSTR(location,1,1) order by time)) grp,
           location,
           time
    from data
) t
order by time
The main idea is in the subquery which isolates consecutive sequences of items (computation of grp column). The rest is simple once you have the grp column.
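To see the grp computation at work on the question's first sample, here is a minimal sketch using Python's standard-library sqlite3 module (SQLite 3.25 or later for window functions) in place of Oracle. The table name pic is taken from the question's attempt; the event_time and location column names are assumptions for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical demo table with the question's first sample.
conn.execute("CREATE TABLE pic (event_time TEXT, location TEXT)")
conn.executemany("INSERT INTO pic VALUES (?, ?)", [
    ("10:00", "A111"), ("11:00", "A112"), ("12:00", "S111"), ("13:00", "S234"),
    ("17:00", "A234"), ("18:00", "S747"), ("19:00", "A878"),
])

GROUP_SQL = """
SELECT event_time, location,
       -- number the consecutive runs separately for each first letter
       DENSE_RANK() OVER (PARTITION BY SUBSTR(location, 1, 1) ORDER BY grp) AS group_number
FROM (SELECT event_time, location,
             -- overall position minus position within the letter:
             -- constant inside a run, larger for each later run of that letter
             ROW_NUMBER() OVER (ORDER BY event_time)
             - ROW_NUMBER() OVER (PARTITION BY SUBSTR(location, 1, 1)
                                  ORDER BY event_time) AS grp
      FROM pic) t
ORDER BY event_time
"""

groups = conn.execute(GROUP_SQL).fetchall()
for row in groups:
    print(row)
```

The group numbers come out as 1, 1, 1, 1, 2, 2, 3 in time order, as in the desired output: each letter's runs are numbered independently, so the first run of A and the first run of S both get 1.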
select DENSE_RANK() over(partition by SUBSTR("location",1,1) ORDER BY SUBSTR("location",1,2))
as Rownumber,
"location" from Table1;
Demo
http://sqlfiddle.com/#!4/21120/16

SQL getting datediff from same field

I have a problem. I need to get the date difference in hours in my table, but the problem is that both times are saved in the same field. This is what my table looks like.
RecNo. Employeeno recorddate recordtime recordval
1 001 8/22/2014 8:15 AM 1
2 001 8/22/2014 5:00 PM 2
3 001 8/24/2014 8:01 AM 1
4 001 8/24/2014 5:01 PM 2
1 indicates time in and 2 indicates time out. Now, how will I get the number of hours worked for each day? What I want to get is something like this.
Date hoursworked
8/22/2014 8
8/24/2014 8
I am using VS 2010 and SQL server 2005
You could self-join each "in" record with its corresponding "out" record and use datediff to subtract them:
SELECT time_in.employeeno AS "Employee No",
       time_in.recorddate AS "Date",
       DATEDIFF (hour, time_in.recordtime, time_out.recordtime)
           AS "Hours Worked"
FROM (SELECT *
      FROM my_table
      WHERE recordval = 1) time_in
INNER JOIN (SELECT *
            FROM my_table
            WHERE recordval = 2) time_out
    ON time_in.employeeno = time_out.employeeno AND
       time_in.recorddate = time_out.recorddate
If you always record time in and time out for every employee, and just one per day, using a self-join should work:
SELECT
t1.Employeeno,
t1.recorddate,
t1.recordtime AS [TimeIn],
t2.recordtime AS [TimeOut],
DATEDIFF(HOUR,t1.recordtime, t2.recordtime) AS [HoursWorked]
FROM Table1 t1
INNER JOIN Table1 t2 ON
t1.Employeeno = t2.Employeeno
AND t1.recorddate = t2.recorddate
WHERE t1.recordval = 1 AND t2.recordval = 2
I included the recordtime fields as time in, time out, if you don't want them just remove them.
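Note that this DATEDIFF calculation gives 9 hours, and not 8 as you suggested, because DATEDIFF(HOUR, ...) counts the hour boundaries crossed between the two times rather than the elapsed whole hours.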
Using this sample data:
with table1 as (
select * from ( values
(1,'001', cast('20140822' as datetime),cast('08:15:00 am' as time),1)
,(2,'001', cast('20140822' as datetime),cast('05:00:00 pm' as time),2)
,(3,'001', cast('20140824' as datetime),cast('08:01:00 am' as time),1)
,(4,'001', cast('20140824' as datetime),cast('04:59:00 pm' as time),2)
,(5,'001', cast('20140825' as datetime),cast('10:01:00 pm' as time),1)
,(6,'001', cast('20140826' as datetime),cast('05:59:00 am' as time),2)
)data(RecNo,EmployeeNo,recordDate,recordTime,recordVal)
)
this query
SELECT
    Employeeno
    ,convert(char(10),recorddate,120) as DateStart
    ,convert(char(5),cast(TimeIn as time)) as TimeIn
    ,convert(char(5),cast(TimeOut as time)) as TimeOut
    ,DATEDIFF(minute,timeIn, timeOut) / 60 AS [HoursWorked]
    ,DATEDIFF(minute,timeIn, timeOut) % 60 AS [MinutesWorked]
FROM (
    SELECT
        tIn.Employeeno,
        tIn.recorddate,
        dateadd(minute, datediff(minute,0,tIn.recordTime), tIn.recordDate)
            as TimeIn,
        ( SELECT TOP 1
              dateadd(minute, datediff(minute,0,tOut.recordTime), tOut.recordDate)
                  as TimeOut
          FROM Table1 tOut
          WHERE tOut.RecordVal = 2
            AND tOut.EmployeeNo = tIn.EmployeeNo
            AND tOut.RecNo > tIn.RecNo
          ORDER BY tOut.EmployeeNo, tOut.RecNo
        ) as TimeOut
    FROM Table1 tIn
    WHERE tIn.recordval = 1
) T
yields (as desired)
Employeeno DateStart TimeIn TimeOut HoursWorked MinutesWorked
---------- ---------- ------ ------- ----------- -------------
001 2014-08-22 08:15 17:00 8 45
001 2014-08-24 08:01 16:59 8 58
001 2014-08-25 22:01 05:59 7 58
No assumptions are made about shifts not running across midnight (see case 3).
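The pairing logic (match each "in" record with the next "out" record for the same employee, then split the elapsed minutes into hours and minutes) can be sketched outside the database as well. The following Python sketch uses only the standard library; the pair_shifts helper and the combined date-time strings are illustrative assumptions, not part of the original schema.

```python
from datetime import datetime

# (RecNo, EmployeeNo, combined date and time, recordval): 1 = in, 2 = out
records = [
    (1, "001", "2014-08-22 08:15", 1),
    (2, "001", "2014-08-22 17:00", 2),
    (3, "001", "2014-08-24 08:01", 1),
    (4, "001", "2014-08-24 16:59", 2),
    (5, "001", "2014-08-25 22:01", 1),
    (6, "001", "2014-08-26 05:59", 2),
]

def pair_shifts(rows):
    """Pair each 'in' with the next 'out' of the same employee, in RecNo order."""
    result = []
    open_in = {}  # employee -> datetime of the not-yet-matched 'in' record
    for rec_no, emp, stamp, val in sorted(rows):
        ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        if val == 1:
            open_in[emp] = ts
        elif val == 2 and emp in open_in:
            start = open_in.pop(emp)
            # elapsed minutes, so a shift crossing midnight needs no special case
            minutes = int((ts - start).total_seconds()) // 60
            result.append((emp, start.date().isoformat(), minutes // 60, minutes % 60))
    return result

worked = pair_shifts(records)
for row in worked:
    print(row)
```

Working in elapsed minutes avoids the boundary-counting behaviour of an hour-level difference, and the third shift, which runs past midnight, is handled the same way as the others.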
This particular implementation may not be the most performant way to construct this correlated subquery, so if there is a performance problem come back and we can look at it again. However running those tests requires a large dataset which I don't feel like constructing just now.