SQL query to find available slots with multiple providers and users - sql

I want to be able to find the number of available slots for a particular time duration for all locations and all days
For example: I have to know the number of available appointments before 10 AM in all locations from the below sample tables
I have looked at other answers on Stack Overflow; mine is peculiar in that it also involves data on multiple doctors/patients.
Doctor's time table
Location  RESOURCE  Day  StartTime  EndTime
ABC       D1        Mon  8:00 AM    12:00 PM
ABC       D1        Tue  8:00 AM    12:00 PM
ABC       D2        Mon  9:00 AM    01:00 PM
ABC       D2        Tue  8:00 AM    12:00 PM
XYZ       D1        Mon  8:00 AM    12:00 PM
XYZ       D1        Tue  8:00 AM    12:00 PM
XYZ       D4        Mon  9:00 AM    01:00 PM
XYZ       D4        Tue  8:00 AM    12:00 PM
Patient's appointment time table
Location  Patient  Duration  StartTime  ApptDt
ABC       P1       15        8:00 AM    10/4/2021
ABC       P2       15        8:15 AM    10/4/2021
ABC       P3       15        9:00 AM    10/4/2021
ABC       P4       15        9:00 AM    10/5/2021
XYZ       P5       15        10:00 AM   10/5/2021
XYZ       P6       15        10:00 AM   10/5/2021
XYZ       P7       15        10:15 AM   10/5/2021
XYZ       P8       15        10:15 AM   10/5/2021
Doctor's time table does not have dates as it is the same throughout the year.
On Mondays at location ABC, two doctors overlap between 9:00 AM and 12:00 noon, so they can accept multiple appointments at the same time, i.e. 2 patients can be served from 9:00 AM to 9:15 AM at location ABC.
A typical duration(Duration) for an appointment is 15 minutes as indicated in the patient's table.
Expected result set
Location  Date       Available appts
ABC       10/4/2021  8
XYZ       10/4/2021  12
On 10/4/2021 there were 8 slots available for booking before 10 AM because there were no appointments between
8:30-8:45 for D1
8:45-9:00 for D1
9:00-9:15(2) for D1,D2
9:15-9:30(2) for D1,D2
9:30-9:45(2) for D1,D2
9:45-10:00(2) for D1,D2
I want to also know for a specific time slot how many appointments were booked vs available.

I'd re-imagine this data as transactions using CTEs, compute running balances, and then find the points where the balance is greater than zero.
Conceptually, that means there's a +1 doctor transaction at each doctor's start time and a -1 doctor transaction at each doctor's end time. Patients are just the reverse: there is a -1 doctor transaction at their start time and a +1 doctor transaction at their start time plus duration.
So something like:
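-- Assumes the date and time columns are DATETIME, so they can be added with +
WITH DrStarts AS (
SELECT
1 [Drs],
[Dates].[Date] + [DrSched].StartTime [Timestamp]
FROM [DrSched]
INNER JOIN [Dates]
-- [Day] holds abbreviations such as 'Mon'; assumes an English session language
ON LEFT(DATENAME(WEEKDAY, [Dates].[Date]), 3) = [DrSched].[Day]
), DrEnds AS (
SELECT
-1 [Drs],
[Dates].[Date] + [DrSched].EndTime [Timestamp]
FROM [DrSched]
INNER JOIN [Dates]
ON LEFT(DATENAME(WEEKDAY, [Dates].[Date]), 3) = [DrSched].[Day]
), ApptStarts AS (
-- an appointment takes one doctor out of the pool
SELECT -1 [Drs], [Date] + [Time] [Timestamp] FROM [Appts]
), ApptEnds AS (
-- and returns that doctor [Duration] minutes later
SELECT 1 [Drs], DATEADD(MINUTE,[Duration],[Date] + [Time]) [Timestamp] FROM [Appts]
), Txns AS (
SELECT *, 1 Priority FROM DrStarts
UNION ALL SELECT *, 1 Priority FROM DrEnds
UNION ALL SELECT *, 0 Priority FROM ApptStarts
UNION ALL SELECT *, 0 Priority FROM ApptEnds
)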
I added priorities at the end so we can make sure the patient leaves an instant before the doctor leaves. Then you can get the balance using a windowed function like so:
, AvailDrs AS (
SELECT
*,
SUM([Drs]) OVER( ORDER BY [Timestamp], [Priority] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) [AvailDrs]
FROM Txns
)
Then to get the available slots, you just do:
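SELECT [From], [To], [AvailDrs]
FROM (
-- compute LEAD before filtering, otherwise [To] would skip over the zero-balance rows
SELECT
[AvailDrs].[Timestamp] [From],
LEAD([AvailDrs].[Timestamp]) OVER(ORDER BY [AvailDrs].[Timestamp], [AvailDrs].[Priority]) [To],
[AvailDrs].[AvailDrs]
FROM AvailDrs
) Slots
WHERE [AvailDrs] > 0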
Though you may want to filter that to get rid of zero-length windows because those will occur.
This is not very performant, but if you have a high volume scenario, you probably want to reconsider your database design to make this function require less transformation.
You also need to make a date table. I presume you actually have a work calendar somewhere, but if not there are myriad ways to create a date table with a dynamic start/end date, so I just assume it exists here. This approach also lets you easily slot in holidays, and perhaps incorporate a doctor-specific leave calendar too.
In general, a wide range of difficult SQL problems become much easier if you reimagine the data as account/amount/timestamp transactions. Here you don't even subdivide into accounts, but you often need that concept for other puzzles.
Also, I haven't tested this exact code, so you may end up with duplicates. If that's the case, you may need a global key as an ORDER BY tie-breaker to keep everything running smoothly in the windowed functions. You can add this as an identity column to both tables, or just define a CTE with a DENSE_RANK() key column and use that instead of selecting from the tables directly.
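To make the transaction-and-balance idea easier to check by hand, here is a minimal Python sketch of the same logic for a single location and day (ABC on Monday 10/4/2021, using the sample rows from the question). The function name available_windows and the helper t are made up for this illustration, and the sketch sums the deltas per timestamp before taking the running balance, which is a small variation on the answer's priority column that avoids zero-length windows.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def t(hhmm):
    """Hypothetical helper: a time on Monday 2021-10-04."""
    return datetime.strptime("2021-10-04 " + hhmm, "%Y-%m-%d %H:%M")


# Sample rows for location ABC on a Monday, taken from the question.
doctor_shifts = [(t("08:00"), t("12:00")), (t("09:00"), t("13:00"))]
appointments = [(t("08:00"), 15), (t("08:15"), 15), (t("09:00"), 15)]


def available_windows(shifts, appts):
    """Return (from, to, free_doctors) for every window with a free doctor."""
    deltas = defaultdict(int)
    for start, end in shifts:
        deltas[start] += 1   # doctor arrives
        deltas[end] -= 1     # doctor leaves
    for start, minutes in appts:
        deltas[start] -= 1   # appointment occupies one doctor
        deltas[start + timedelta(minutes=minutes)] += 1  # doctor is free again

    times = sorted(deltas)
    windows = []
    balance = 0
    for current, nxt in zip(times, times[1:]):
        balance += deltas[current]  # running balance of free doctors
        if balance > 0:
            windows.append(
                (current.strftime("%H:%M"), nxt.strftime("%H:%M"), balance)
            )
    return windows


windows = available_windows(doctor_shifts, appointments)
for window in windows:
    print(window)
```

Each window can then be cut into 15-minute slots and clipped to a cutoff such as 10 AM; that step is left out here to keep the sketch short.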

Related

Subsetting on dates for a SQL query

Using Snowflake, I am attempting to subset on customers that have no current subscriptions, and eliminating all IDs for those which have current/active contracts.
Each ID will typically have multiple records associated with a contract/renewal history for a particular ID/customer.
A customer counts as active if at least one of its contracts ends after the current date, even if several earlier contracts have lapsed; it is inactive only if no contract extends beyond the current date.
Consider the following table:
Date_Start  Date_End    Name    ID
2015-07-03  2019-07-03  Piggly  001
2019-07-04  2025-07-04  Piggly  001
2013-10-01  2017-12-31  Doggy   031
2018-01-01  2018-06-30  Doggy   031
2020-01-01  2021-03-14  Catty   022
2021-03-15  2024-06-01  Catty   022
1999-06-01  2021-06-01  Horsey  052
2021-06-02  2022-01-01  Horsey  052
2022-01-02  2022-07-04  Horsey  052
The desired output is the non-active customers, i.e. those with no end date beyond Jan 5th 2023 (or the current/an arbitrary date):
Name    ID
Doggy   031
Horsey  052
My first attempt was:
SELECT Name, ID
FROM table
WHERE Date_End < GETDATE()
but the obvious problem is that I'll also be selecting past contracts of customers who haven't expired/churned and who have a contract that goes beyond the current date.
How do I resolve this?
As there are many rows per name and ID, you should aggregate the data and then use a HAVING clause to select only those you are interested in.
SELECT name, id
FROM table
GROUP BY name, id
HAVING MAX(date_end) < GETDATE();
You can work it out with an EXCEPT operator, if your DBMS supports it:
SELECT DISTINCT Name, ID FROM tab
EXCEPT
SELECT DISTINCT Name, ID FROM tab WHERE Date_end > <your_date>
This removes the active <Name, ID> pairs from the whole.
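Both suggestions can be tried quickly against the sample rows. The following is a rough sketch using Python's built-in sqlite3 module rather than Snowflake; the table name contracts and the fixed cutoff date '2023-01-05' (standing in for GETDATE()) are assumptions made for the illustration.

```python
import sqlite3

# Sample rows from the question: (date_start, date_end, name, id).
rows = [
    ("2015-07-03", "2019-07-03", "Piggly", "001"),
    ("2019-07-04", "2025-07-04", "Piggly", "001"),
    ("2013-10-01", "2017-12-31", "Doggy", "031"),
    ("2018-01-01", "2018-06-30", "Doggy", "031"),
    ("2020-01-01", "2021-03-14", "Catty", "022"),
    ("2021-03-15", "2024-06-01", "Catty", "022"),
    ("1999-06-01", "2021-06-01", "Horsey", "052"),
    ("2021-06-02", "2022-01-01", "Horsey", "052"),
    ("2022-01-02", "2022-07-04", "Horsey", "052"),
]

conn = sqlite3.connect(":memory:")
# "contracts" is a hypothetical table name; ISO dates are stored as text,
# which compares correctly as strings.
conn.execute(
    "CREATE TABLE contracts (date_start TEXT, date_end TEXT, name TEXT, id TEXT)"
)
conn.executemany("INSERT INTO contracts VALUES (?, ?, ?, ?)", rows)

cutoff = "2023-01-05"  # stands in for the current date

# Aggregate per customer and keep those whose latest end date has passed.
inactive_having = conn.execute(
    "SELECT name, id FROM contracts "
    "GROUP BY name, id HAVING MAX(date_end) < ? ORDER BY name",
    (cutoff,),
).fetchall()

# All customers minus those with a contract ending after the cutoff.
inactive_except = conn.execute(
    "SELECT DISTINCT name, id FROM contracts "
    "EXCEPT SELECT name, id FROM contracts WHERE date_end > ? "
    "ORDER BY name",
    (cutoff,),
).fetchall()

print(inactive_having)
print(inactive_except)
```

The two queries agree on this data. They differ only for a contract ending exactly on the cutoff date, which the HAVING version treats as still active and the EXCEPT version treats as lapsed.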

Grouping shift data by 7-day windows in SQL Server 2012

What I want to do is to calculate the number of shifts and hours worked by each employee in any given 7-day period. In order to achieve this, I need to identify and group 'islands' of shifts. Note that this 7-day period is not tied to a calendar week, and the beginning and end of the period vary from employee to employee. This sets it apart from other similar questions asked here in the past.
I have a table like this:
Person ID Start Date End Date Start time End time Hours Worked
12345 06-07-20 06-07-20 6:00 AM 7:45 AM 1.75
12345 06-07-20 06-07-20 8:15 AM 8:45 AM 0.50
12345 06-07-20 06-07-20 9:19 AM 9:43 AM 0.40
12345 08-07-20 08-07-20 12:00 AM 12:39 AM 0.65
12345 09-07-20 09-07-20 10:05 PM 11:59 PM 1.90
12345 11-07-20 11-07-20 4:39 PM 4:54 PM 0.25
12345 22-07-20 22-07-20 7:00 AM 7:30 AM 0.50
12345 23-07-20 23-07-20 1:00 PM 3:00 PM 2.00
12345 24-07-20 24-07-20 9:14 AM 9:35 AM 0.35
12345 27-07-20 27-07-20 4:00 PM 6:00 PM 2.00
12345 27-07-20 27-07-20 2:00 PM 4:00 PM 2.00
12345 28-07-20 28-07-20 9:00 AM 10:00 AM 1.00
12345 28-07-20 28-07-20 4:39 AM 4:59 AM 0.34
I want to group and summarise the data above like this:
Person ID From To Number of shifts Number of Hours
12345 06-07-20 11-07-20 6 5.45
12345 22-07-20 28-07-20 7 8.19
Note that the first grouping for employee 12345 starts on 06-07-20 and ends on 11-07-20 because these shifts fall within the 7-day window from 06-07-20 to 13-07-20.
The next 7-day window runs from 22-07-20 to 28-07-20, which means that the start date of each 7-day window has to be dynamic and based on the data, i.e. not constant, which makes this a complex task.
Also note that an employee may work multiple shifts in a day and that the shifts may not be consecutive.
I was playing around with using DATEDIFF() with LAG() and LEAD() but was unable to get to where I want. Any help would be appreciated.
I think you need a recursive CTE for this. The idea is to enumerate the shifts of each person, and then iteratively walk the dataset while keeping track of the first date of the period: when there are more than 7 days between the start of a period and the current date, the start date resets and a new group starts.
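-- SQL Server uses plain WITH for recursive CTEs (no RECURSIVE keyword)
with
data as (select t.*, row_number() over(partition by personid order by start_date) rn from mytable t),
cte as (
-- anchor: the first shift of each person opens the first period
select personid, start_date, start_date end_date, hours_worked, rn
from data
where rn = 1
union all
-- step: keep the period start, or reset it once the shift falls outside the window
select
c.personid,
case when d.start_date > dateadd(day, 7, c.start_date) then d.start_date else c.start_date end,
d.start_date,
d.hours_worked,
d.rn
from cte c
inner join data d on d.personid = c.personid and d.rn = c.rn + 1
)
select personid, start_date, max(end_date) end_date, count(*) no_shifts, sum(hours_worked) no_hours
from cte
group by personid, start_date
option (maxrecursion 0) -- lift SQL Server's default limit of 100 recursion levels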
This assumes that:
shifts do not span multiple days, as shown in your sample data
dates are stored as the date datatype, and times as time
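The walk that the recursive CTE performs can be sketched in a few lines of Python, which may help when checking the window rule against the sample data. This is only an illustration for a single employee; the function name seven_day_groups and the dictionary keys are made up here, and the reset rule mirrors the CTE's condition (a shift more than 7 days after the period start opens a new period).

```python
from datetime import date, timedelta

# (start_date, hours_worked) for employee 12345, from the question.
shifts = [
    (date(2020, 7, 6), 1.75), (date(2020, 7, 6), 0.50), (date(2020, 7, 6), 0.40),
    (date(2020, 7, 8), 0.65), (date(2020, 7, 9), 1.90), (date(2020, 7, 11), 0.25),
    (date(2020, 7, 22), 0.50), (date(2020, 7, 23), 2.00), (date(2020, 7, 24), 0.35),
    (date(2020, 7, 27), 2.00), (date(2020, 7, 27), 2.00),
    (date(2020, 7, 28), 1.00), (date(2020, 7, 28), 0.34),
]


def seven_day_groups(shifts, window_days=7):
    """Group date-ordered shifts into periods anchored at their first shift."""
    groups = []
    for day, hours in sorted(shifts):
        # Open a new period when the shift is past the current window.
        if not groups or day > groups[-1]["from"] + timedelta(days=window_days):
            groups.append({"from": day, "to": day, "shifts": 0, "hours": 0.0})
        group = groups[-1]
        group["to"] = day
        group["shifts"] += 1
        group["hours"] += hours
    return groups


groups = seven_day_groups(shifts)
for g in groups:
    print(g["from"], g["to"], g["shifts"], round(g["hours"], 2))
```

For several employees the same loop would run once per person, which corresponds to the partition by personid in the query.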

Rank values from table based on temporal sequences of values

I have a table similar to this one, representing which drivers were driving different cars at certain times.
CAR_ID DRIVER_ID DT
10 A 10:00
10 A 12:00
10 A 14:00
10 B 16:00
10 B 17:00
10 B 20:00
10 A 21:00
10 A 22:00
20 C 15:00
20 C 18:00
Where DT is a datetime. I'm trying to have something similar to what I would obtain using a DENSE_RANK() function but generating a new number when there is a change on the column DRIVER_ID between two drivers. This would be my expected output:
CAR_ID DRIVER_ID DT RES
10 A 10:00 1
10 A 12:00 1
10 A 14:00 1
10 B 16:00 2
10 B 17:00 2
10 B 20:00 2
10 A 21:00 3 #
10 A 22:00 3 #
20 C 15:00 4
20 C 18:00 4
Using DENSE_RANK() OVER (PARTITION BY CAR_ID, DRIVER_ID ORDER BY DT) AS RES I get the two elements marked with a # as members of the same group as the first three rows, but I want them to be different, because of the "discontinuity" (the car was driven by another driver from 16:00 to 20:00). I can't seem to find a solution that doesn't include a loop. Is this possible?
Any help would be greatly appreciated.
This can be done with lag and running sum.
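-- res restarts at 1 for each car_id; a new number starts whenever the driver changes
select t.*, sum(case when prev_driver = driver_id then 0 else 1 end) over(partition by car_id order by dt) as res
from (select tbl.*, lag(driver_id) over(partition by car_id order by dt) as prev_driver
from tbl
) t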
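The lag-plus-running-sum idea can be traced in plain Python as a sanity check. This sketch is an illustration only (the function name number_segments is made up), and it deliberately lets the counter continue across cars so that it reproduces the RES column from the question, where car 20 gets the value 4; a query partitioned by car would restart at 1 instead.

```python
# (car_id, driver_id, dt) rows from the question; dt kept as zero-padded HH:MM.
rows = [
    (10, "A", "10:00"), (10, "A", "12:00"), (10, "A", "14:00"),
    (10, "B", "16:00"), (10, "B", "17:00"), (10, "B", "20:00"),
    (10, "A", "21:00"), (10, "A", "22:00"),
    (20, "C", "15:00"), (20, "C", "18:00"),
]


def number_segments(rows):
    """Assign a new number each time the (car, driver) pair changes."""
    numbered = []
    res = 0
    previous = None  # plays the role of LAG(): the pair on the previous row
    for car, driver, dt in sorted(rows, key=lambda r: (r[0], r[2])):
        if (car, driver) != previous:
            res += 1  # running sum of "change" flags
            previous = (car, driver)
        numbered.append((car, driver, dt, res))
    return numbered


numbered = number_segments(rows)
for row in numbered:
    print(row)
```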
You need to do a row_number partitioned by car and ordered by dt. You also need to do a row_number partitioned by car and driver and ordered by dt. Subtracting the second of these from the first gives you a unique "segment" number, which in this case represents the continuous period of possession that each driver had of each car.
This segment number has no intrinsic meaning; it is just guaranteed to be different for each segment within the partition of cars and drivers. Then use this segment number as an additional partition for whatever function you are trying to apply.
As a note, however, I couldn't work out how you got the results you've displayed for RES from the code you quote, and thus I'm not exactly sure what you are trying to achieve overall.
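To see what the row-number difference looks like on the sample rows, here is a small Python sketch of the two counters. It is an illustration under the same sample data, with a made-up function name segment_keys; note how the number by itself can repeat (driver B and the second stint of driver A both get 3), so it only identifies a segment together with the car and driver.

```python
from collections import defaultdict

# (car_id, driver_id, dt) rows from the question; dt kept as zero-padded HH:MM.
rows = [
    (10, "A", "10:00"), (10, "A", "12:00"), (10, "A", "14:00"),
    (10, "B", "16:00"), (10, "B", "17:00"), (10, "B", "20:00"),
    (10, "A", "21:00"), (10, "A", "22:00"),
    (20, "C", "15:00"), (20, "C", "18:00"),
]


def segment_keys(rows):
    """Return rows with row_number(car) - row_number(car, driver) appended."""
    per_car = defaultdict(int)         # row_number partitioned by car
    per_car_driver = defaultdict(int)  # row_number partitioned by car, driver
    keyed = []
    for car, driver, dt in sorted(rows, key=lambda r: (r[0], r[2])):
        per_car[car] += 1
        per_car_driver[(car, driver)] += 1
        segment = per_car[car] - per_car_driver[(car, driver)]
        keyed.append((car, driver, dt, segment))
    return keyed


keyed = segment_keys(rows)
# A segment is identified by (car, driver, segment), not by the number alone.
distinct_segments = {(car, driver, seg) for car, driver, _, seg in keyed}
print(keyed)
print(len(distinct_segments))
```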

postgresql query to get counts between 12:00 and 12:00

I have the following query that works fine, but it is giving me counts for a single, whole day (00:00 to 23:59 UTC). For example, it's giving me counts for all of January 1 2017 (00:00 to 23:59 UTC).
My dataset lends itself to be queried from 12:00 UTC to 12:00 UTC. For example, I'm looking for all counts from Jan 1 2017 12:00 UTC to Jan 2 2017 12:00 UTC.
Here is my query:
SELECT count(DISTINCT ltg_data.lat), cwa, to_char(time, 'MM/DD/YYYY')
FROM counties
JOIN ltg_data on ST_contains(counties.the_geom, ltg_data.ltg_geom)
WHERE cwa = 'MFR'
AND time BETWEEN '1987-06-01'
AND '1992-08-1'
GROUP BY cwa, to_char(time, 'MM/DD/YYYY');
FYI...I'm changing the format of the time so I can use the results more readily in javascript.
And a description of the dataset: it consists of thousands of data points that occur within various polygons every second. I'm determining whether the points occur within the polygon "cwa = MFR" and then counting them.
Thanks for any help!
I see two approaches here.
First, join against generate_series(start_date::timestamp,end_date,'12 hours'::interval) and count the rows that fall into each generated interval. I believe this is the more correct approach, but it has a major drawback: you have to lateral join it against the existing data set to use min(time) and max(time)...
Second, a monkey hack, but with much less coding and querying: use a different time zone to make 12:00 the start of the day. For example (you did not give sample data, so I generate the content of counties with generate_series at a 2-hour interval):
t=# with counties as (select generate_series('2017-09-01'::timestamptz,'2017-09-04'::timestamptz,'2 hours'::interval)
g)
select count(1),to_char(g,'MM/DD/YYYY') from counties
group by to_char(g,'MM/DD/YYYY')
order by 2;
count | to_char
-------+------------
12 | 09/01/2017
12 | 09/02/2017
12 | 09/03/2017
1 | 09/04/2017
(4 rows)
So for the UTC time zone there are 12 two-hour rows for each of the first three days and, because generate_series includes its end point in my sample, 1 row for the last day: 37 rows in total.
Now a monkey hack:
t=# with counties as (select generate_series('2017-09-01'::timestamptz,'2017-09-04'::timestamptz,'2 hours'::interval)
g)
select count(1),to_char(g at time zone 'utc+12','MM/DD/YYYY') from counties
group by to_char(g at time zone 'utc+12','MM/DD/YYYY')
order by 2;
count | to_char
-------+------------
6 | 08/31/2017
12 | 09/01/2017
12 | 09/02/2017
7 | 09/03/2017
(4 rows)
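I select the same dates in a different time zone, shifted by exactly 12 hours, so the first day starts at midday on 31 Aug rather than at midnight on 1 Sep. The counts change, still totalling 37 rows, but they are now grouped the way you requested... Note that in a POSIX-style zone name such as 'utc+12' the sign is inverted, which is why the timestamps move back by 12 hours.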
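The effect of the shift can be reproduced outside the database to confirm the counts. This Python sketch rebuilds the same 37 two-hourly timestamps and labels each one with the date it has after moving back 12 hours; the function name noon_to_noon_counts is made up for the illustration, and the timestamps are treated as naive UTC values.

```python
from collections import Counter
from datetime import datetime, timedelta

# Same series as generate_series('2017-09-01', '2017-09-04', '2 hours'),
# end point included: 37 timestamps.
start = datetime(2017, 9, 1)
stamps = [start + timedelta(hours=2 * i) for i in range(37)]


def noon_to_noon_counts(stamps):
    """Count timestamps per noon-to-noon day, labelled by the day's start date."""
    labels = (
        (ts - timedelta(hours=12)).strftime("%m/%d/%Y")  # 12:00 becomes 00:00
        for ts in stamps
    )
    return sorted(Counter(labels).items())


counts = noon_to_noon_counts(stamps)
for label, count in counts:
    print(count, label)
```

With this labelling, the bucket 09/01/2017 covers 1 Sep 12:00 up to 2 Sep 12:00 UTC.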
Update
For your query I'd try something like:
SELECT count(DISTINCT ltg_data.lat), cwa, to_char(time at time zone 'utc+12', 'MM/DD/YYYY')
FROM counties
JOIN ltg_data on ST_contains(counties.the_geom, ltg_data.ltg_geom)
WHERE cwa = 'MFR'
AND time BETWEEN '1987-06-01'
AND '1992-08-1'
GROUP BY cwa, to_char(time at time zone 'utc+12', 'MM/DD/YYYY');
Also, if you want to apply the 12-hour shift to the WHERE clause, add at time zone 'utc+12' to the "time" comparison as well.

SQL getting datediff from same field

I have a problem. I need to get the date difference in hours from my table, but the time in and time out are saved in the same field. This is what my table looks like:
RecNo. Employeeno recorddate recordtime recordval
1 001 8/22/2014 8:15 AM 1
2 001 8/22/2014 5:00 PM 2
3 001 8/24/2014 8:01 AM 1
4 001 8/24/2014 5:01 PM 2
1 indicates time in and 2 indicates time out. Now, how do I get the number of hours worked for each day? What I want to get is something like this:
Date hoursworked
8/22/2014 8
8/24/2014 8
I am using VS 2010 and SQL Server 2005.
You could self-join each "in" record with its corresponding "out" record and use datediff to subtract them:
SELECT time_in.employeeno AS "Employee No",
time_in.recorddate AS "Date",
DATEDIFF (hour, time_in.recordtime, time_out.recordtime)
AS "Hours Worked"
FROM (SELECT *
FROM my_table
WHERE recordval = 1) time_in
INNER JOIN (SELECT *
FROM my_table
WHERE recordval = 2) time_out
ON time_in.employeeno = time_out.employeeno AND
time_in.recorddate = time_out.recorddate
If you always record time in and time out for every employee, and just one per day, using a self-join should work:
SELECT
t1.Employeeno,
t1.recorddate,
t1.recordtime AS [TimeIn],
t2.recordtime AS [TimeOut],
DATEDIFF(HOUR,t1.recordtime, t2.recordtime) AS [HoursWorked]
FROM Table1 t1
INNER JOIN Table1 t2 ON
t1.Employeeno = t2.Employeeno
AND t1.recorddate = t2.recorddate
WHERE t1.recordval = 1 AND t2.recordval = 2
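I included the recordtime fields as time in and time out; if you don't want them, just remove them.
Note that this DATEDIFF calculation gives 9 hours for both days, and not 8 as you suggested: DATEDIFF(HOUR, ...) counts the hour boundaries crossed, so 8:15 AM to 5:00 PM counts as 9 even though only 8 hours 45 minutes elapsed.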
Sample SQL Fiddle
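The difference between hour boundaries crossed and hours actually elapsed is easy to see with a small emulation. The Python function datediff_hour below is a hypothetical stand-in that mimics how SQL Server's DATEDIFF(HOUR, start, end) counts, by truncating both timestamps to the hour before subtracting; it is a sketch for illustration, not a database call.

```python
from datetime import datetime


def datediff_hour(start, end):
    """Emulate DATEDIFF(HOUR, start, end): count hour boundaries crossed."""
    def floor_to_hour(ts):
        return ts.replace(minute=0, second=0, microsecond=0)

    return int((floor_to_hour(end) - floor_to_hour(start)).total_seconds() // 3600)


time_in = datetime(2014, 8, 22, 8, 15)
time_out = datetime(2014, 8, 22, 17, 0)

boundaries = datediff_hour(time_in, time_out)
elapsed_hours = (time_out - time_in).total_seconds() / 3600

print(boundaries)     # hour boundaries crossed
print(elapsed_hours)  # hours actually elapsed
```

Computing the difference in minutes and dividing afterwards avoids this rounding-up effect.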
Using this sample data:
with table1 as (
select * from ( values
(1,'001', cast('20140822' as datetime),cast('08:15:00 am' as time),1)
,(2,'001', cast('20140822' as datetime),cast('05:00:00 pm' as time),2)
,(3,'001', cast('20140824' as datetime),cast('08:01:00 am' as time),1)
,(4,'001', cast('20140824' as datetime),cast('04:59:00 pm' as time),2)
,(5,'001', cast('20140825' as datetime),cast('10:01:00 pm' as time),1)
,(6,'001', cast('20140826' as datetime),cast('05:59:00 am' as time),2)
)data(RecNo,EmployeeNo,recordDate,recordTime,recordVal)
)
this query
SELECT
Employeeno
,convert(char(10),recorddate,120) as DateStart
,convert(char(5),cast(TimeIn as time)) as TimeIn
,convert(char(5),cast(TimeOut as time)) as TimeOut
,DATEDIFF(minute,timeIn, timeOut) / 60 AS [HoursWorked]
,DATEDIFF(minute,timeIn, timeOut) % 60 AS [MinutesWorked]
FROM (
SELECT
tIn.Employeeno,
tIn.recorddate,
dateadd(minute, datediff(minute,0,tIn.recordTime), tIn.recordDate)
as TimeIn,
( SELECT TOP 1
dateadd(minute, datediff(minute,0,tOut.recordTime), tOut.recordDate)
as TimeOut
FROM Table1 tOut
WHERE tOut.RecordVal = 2
AND tOut.EmployeeNo = tIn.EmployeeNo
AND tOut.RecNo > tIn.RecNo
ORDER BY tOut.EmployeeNo, tOut.RecNo
) as TimeOut
FROM Table1 tIn
WHERE tIn.recordval = 1
) T
yields (as desired)
Employeeno DateStart TimeIn TimeOut HoursWorked MinutesWorked
---------- ---------- ------ ------- ----------- -------------
001 2014-08-22 08:15 17:00 8 45
001 2014-08-24 08:01 16:59 8 58
001 2014-08-25 22:01 05:59 7 58
No assumptions are made about shifts not running across midnight (see case 3).
This particular implementation may not be the most performant way to construct this correlated subquery, so if there is a performance problem come back and we can look at it again. However running those tests requires a large dataset which I don't feel like constructing just now.
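The pairing rule used by the correlated subquery (each time-in row is matched with the next time-out row of the same employee by RecNo) can be sketched in Python on the same sample data. The function name worked_spans is made up for this illustration, and unlike the SQL, which would return NULL for a time-in without a later time-out, this sketch simply skips such rows.

```python
from datetime import datetime

# (RecNo, EmployeeNo, timestamp, recordVal) built from the sample data above;
# recordVal 1 = time in, 2 = time out.
records = [
    (1, "001", datetime(2014, 8, 22, 8, 15), 1),
    (2, "001", datetime(2014, 8, 22, 17, 0), 2),
    (3, "001", datetime(2014, 8, 24, 8, 1), 1),
    (4, "001", datetime(2014, 8, 24, 16, 59), 2),
    (5, "001", datetime(2014, 8, 25, 22, 1), 1),
    (6, "001", datetime(2014, 8, 26, 5, 59), 2),
]


def worked_spans(records):
    """Pair each time-in with the next time-out of the same employee."""
    spans = []
    for rec_no, employee, time_in, value in records:
        if value != 1:
            continue
        later_outs = [
            r for r in records
            if r[3] == 2 and r[1] == employee and r[0] > rec_no
        ]
        if not later_outs:
            continue  # no matching time-out recorded yet
        time_out = min(later_outs, key=lambda r: r[0])[2]  # like TOP 1 ... ORDER BY RecNo
        minutes = int((time_out - time_in).total_seconds() // 60)
        spans.append((employee, time_in.date().isoformat(), minutes // 60, minutes % 60))
    return spans


spans = worked_spans(records)
for span in spans:
    print(span)
```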