Rank values from table based on temporal sequences of values - sql

I have a table similar to this one, representing which drivers were driving different cars at certain times.
CAR_ID DRIVER_ID DT
10 A 10:00
10 A 12:00
10 A 14:00
10 B 16:00
10 B 17:00
10 B 20:00
10 A 21:00
10 A 22:00
20 C 15:00
20 C 18:00
Where DT is a datetime. I'm trying to have something similar to what I would obtain using a DENSE_RANK() function but generating a new number when there is a change on the column DRIVER_ID between two drivers. This would be my expected output:
CAR_ID DRIVER_ID DT RES
10 A 10:00 1
10 A 12:00 1
10 A 14:00 1
10 B 16:00 2
10 B 17:00 2
10 B 20:00 2
10 A 21:00 3 #
10 A 22:00 3 #
20 C 15:00 4
20 C 18:00 4
Using DENSE_RANK() OVER (PARTITION BY CAR_ID, DRIVER_ID ORDER BY DT) AS RES I get the two elements marked with a # as members of the same group as the first three rows, but I want them to be different, because of the "discontinuity" (the car was driven by another driver from 16:00 to 20:00). I can't seem to find a solution that doesn't include a loop. Is this possible?
Any help would be greatly appreciated.

This can be done with lag and running sum.
select t.*, sum(case when prev_driver = driver_id then 0 else 1 end) over (partition by car_id order by dt) as res
from (select tbl.*, lag(driver_id) over (partition by car_id order by dt) as prev_driver
from tbl
) t
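A minimal sketch of the lag-plus-running-sum idea, run through Python's sqlite3 module so it can be tried without a database server. It assumes SQLite 3.25 or newer (needed for window functions) and reuses the table name tbl from the query above. The query above numbers the groups per car; this variant numbers them across the whole table, which is what the expected output in the question shows (car 20 gets 4).

```python
import sqlite3

# Sample rows copied from the question: (car_id, driver_id, dt).
rows = [
    (10, "A", "10:00"), (10, "A", "12:00"), (10, "A", "14:00"),
    (10, "B", "16:00"), (10, "B", "17:00"), (10, "B", "20:00"),
    (10, "A", "21:00"), (10, "A", "22:00"),
    (20, "C", "15:00"), (20, "C", "18:00"),
]

QUERY = """
SELECT car_id, driver_id, dt,
       -- running total of "a new group starts here" flags
       SUM(is_new) OVER (ORDER BY car_id, dt) AS res
FROM (
    SELECT car_id, driver_id, dt,
           -- flag is 1 when the previous row has a different car or driver
           -- (or when there is no previous row, because LAG yields NULL)
           CASE WHEN LAG(car_id) OVER (ORDER BY car_id, dt) = car_id
                 AND LAG(driver_id) OVER (ORDER BY car_id, dt) = driver_id
                THEN 0 ELSE 1 END AS is_new
    FROM tbl
)
ORDER BY car_id, dt
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (car_id INTEGER, driver_id TEXT, dt TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
result = conn.execute(QUERY).fetchall()

for row in result:
    print(row)
```

The dt values are plain HH:MM strings here, which sort correctly as text; a real datetime column would not need that assumption.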

You need to do a row_number partitioned by car and ordered by dt. You also need to do a row_number partitioned by car and driver and ordered by dt. Subtracting the second of these from the first gives you a unique "segment" number, which in this case represents the continuous period of possession that each driver had of each car.
This segment number has no intrinsic meaning; it is just guaranteed to be different for each segment within the partition of cars and drivers. Then use this segment number as an additional partition for whatever function you are trying to apply.
As a note, however, I couldn't work out how you got the results you've displayed for RES from the code you quote, and thus I'm not exactly sure what you are trying to achieve overall.
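The row_number subtraction described above can be sketched as follows. This is only an illustration run through Python's sqlite3 module (assuming SQLite 3.25+ for window functions); the table name tbl and the alias seg are assumptions, not names from the answer.

```python
import sqlite3

# Sample rows copied from the question: (car_id, driver_id, dt).
rows = [
    (10, "A", "10:00"), (10, "A", "12:00"), (10, "A", "14:00"),
    (10, "B", "16:00"), (10, "B", "17:00"), (10, "B", "20:00"),
    (10, "A", "21:00"), (10, "A", "22:00"),
    (20, "C", "15:00"), (20, "C", "18:00"),
]

QUERY = """
SELECT car_id, driver_id, dt,
       ROW_NUMBER() OVER (PARTITION BY car_id ORDER BY dt)
     - ROW_NUMBER() OVER (PARTITION BY car_id, driver_id ORDER BY dt) AS seg
FROM tbl
ORDER BY car_id, dt
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (car_id INTEGER, driver_id TEXT, dt TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
result = conn.execute(QUERY).fetchall()

# seg alone is not unique (B and the second A run both get 3), but the
# combination (car_id, driver_id, seg) identifies each continuous run.
segments = sorted({(car, driver, seg) for car, driver, _, seg in result})
print(segments)
```

Partitioning a later window function by car_id, driver_id and seg then treats each continuous run as its own group.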

Related

SQL query to find available slots with multiple providers and users

I want to be able to find the number of available slots for a particular time duration for all locations and all days.
For example: I have to know the number of available appointments before 10 AM in all locations from the sample tables below.
I have looked at other answers on Stack Overflow, but my case is peculiar in that it also involves data on multiple doctors/patients.
Doctor's time table
Location  RESOURCE  Day  StartTime  EndTime
ABC       D1        Mon  8:00 AM    12:00 PM
ABC       D1        Tue  8:00 AM    12:00 PM
ABC       D2        Mon  9:00 AM    01:00 PM
ABC       D2        Tue  8:00 AM    12:00 PM
XYZ       D1        Mon  8:00 AM    12:00 PM
XYZ       D1        Tue  8:00 AM    12:00 PM
XYZ       D4        Mon  9:00 AM    01:00 PM
XYZ       D4        Tue  8:00 AM    12:00 PM
Patient's appointment time table
Location  Patient  Duration  StartTime  ApptDt
ABC       P1       15        8:00 AM    10/4/2021
ABC       P2       15        8:15 AM    10/4/2021
ABC       P3       15        9:00 AM    10/4/2021
ABC       P4       15        9:00 AM    10/5/2021
XYZ       P5       15        10:00 AM   10/5/2021
XYZ       P6       15        10:00 AM   10/5/2021
XYZ       P7       15        10:15 AM   10/5/2021
XYZ       P8       15        10:15 AM   10/5/2021
Doctor's time table does not have dates as it is the same throughout the year.
On Mondays in the ABC location, two doctors overlap between 9:00 AM and 12:00 noon, so they can accept multiple appointments at the same time, i.e., 2 patients can be served from 9:00 AM to 9:15 AM in location ABC.
A typical duration(Duration) for an appointment is 15 minutes as indicated in the patient's table.
Expected result set
Location  Date       Available appts
ABC       10/4/2021  8
XYZ       10/4/2021  12
On 10/4/2021 there were 8 slots available for booking before 10 AM because there were no appointments between
8:30-8:45 for D1
8:45-9:00 for D1
9:00-9:15(2) for D1,D2
9:15-9:30(2) for D1,D2
9:30-9:45(2) for D1,D2
9:45-10:00(2) for D1,D2
I want to also know for a specific time slot how many appointments were booked vs available.
I'd re-imagine this data as transactional using CTEs, compute balances and then find the points where the balance is non-zero.
Conceptually, that means there's a +1 doctor transaction on each doctor's start time, and a -1 doctor transaction on each doctor's end time. Patients are just the reverse, there is a -1 doctor transaction at their start time and a +1 doctor transaction at their start time plus duration.
So something like:
WITH DrStarts AS (
SELECT
1 [Drs],
[Dates].[Date] + [DrSched].StartTime [Timestamp]
FROM [DrSched]
INNER JOIN [Dates]
ON WEEKDAY([Dates].[Date]) = [DrSched].[Day] -- WEEKDAY() stands in for your dialect's day-name function
), DrEnds AS (
SELECT
-1 [Drs],
[Dates].[Date] + [DrSched].EndTime [Timestamp]
FROM [DrSched]
INNER JOIN [Dates]
ON WEEKDAY([Dates].[Date]) = [DrSched].[Day]
), ApptStarts AS (
SELECT -1 [Drs], [Date] + [Time] [Timestamp] FROM [Appts] -- appointment takes a doctor
), ApptEnds AS (
SELECT 1 [Drs], DATEADD(MINUTE,[Duration],[Date] + [Time]) [Timestamp] FROM [Appts] -- doctor is free again
), Txns AS (
SELECT *, 1 Priority FROM DrStarts
UNION ALL SELECT *, 1 Priority FROM DrEnds
UNION ALL SELECT *, 0 Priority FROM ApptStarts
UNION ALL SELECT *, 0 Priority FROM ApptEnds
)
I added priorities at the end so we can make sure the patient leaves an instant before the doctor leaves. Then you can get the balance using a windowed function like so:
, AvailDrs AS (
SELECT
*,
SUM([Drs]) OVER( ORDER BY [Timestamp], [Priority] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) [AvailDrs]
FROM Txns
)
Then to get the available slots, you just do:
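SELECT [From], [To], [AvailDrs]
FROM (
SELECT
[AvailDrs].[Timestamp] [From],
LEAD([AvailDrs].[Timestamp]) OVER(ORDER BY [AvailDrs].[Timestamp], [AvailDrs].[Priority]) [To],
[AvailDrs].[AvailDrs]
FROM AvailDrs
) Windows -- compute [To] before filtering, so it is the next change in the balance
WHERE [AvailDrs] > 0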
Though you may want to filter that to get rid of zero-length windows because those will occur.
This is not very performant, but if you have a high volume scenario, you probably want to reconsider your database design to make this function require less transformation.
You also need to make a date table. I presume you actually have a work calendar somewhere, but if not there are myriad ways to create a date table with a dynamic start/end date, so I just assume it exists here. This approach also lets you easily slot in holidays, and perhaps incorporate a dr-specific leave calendar too.
In general, a wide range of difficult SQL problems become much easier if you reimagine the data as account/amount/timestamp transactions. Here you don't even subdivide into accounts, but you often need that concept for other puzzles.
Also, I haven't tested this exact code, so you may end up with duplicates. If that's the case you may need a global key as an ORDER BY tie-breaker to keep everything running smoothly in the windowed functions. You can add this as an identity column to both tables, or just define a CTE with a DENSE_RANK() key column and use that instead of selecting from the tables directly.
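The transaction idea can be tried outside SQL. The following is a rough Python sketch, not the answer's code: it covers only the ABC location on Monday 10/4/2021 from the question, works in minutes since midnight, and assumes any doctor can take any appointment. The names hm, deltas, balance and free_slots are hypothetical helpers. Summing the deltas per timestamp before accumulating sidesteps the priority tie-breaker.

```python
from collections import defaultdict


def hm(text):
    """Convert 'HH:MM' (24-hour) to minutes since midnight."""
    hours, minutes = text.split(":")
    return int(hours) * 60 + int(minutes)


# ABC location, Monday 10/4/2021, taken from the question's tables.
doctor_shifts = [("D1", "08:00", "12:00"), ("D2", "09:00", "13:00")]
appointments = [("P1", "08:00", 15), ("P2", "08:15", 15), ("P3", "09:00", 15)]

# Each timestamp maps to the net change in free doctors at that instant.
deltas = defaultdict(int)
for _, start, end in doctor_shifts:
    deltas[hm(start)] += 1             # doctor arrives
    deltas[hm(end)] -= 1               # doctor leaves
for _, start, duration in appointments:
    deltas[hm(start)] -= 1             # appointment occupies a doctor
    deltas[hm(start) + duration] += 1  # doctor is free again

# Running balance: free doctors from each timestamp until the next one.
balance = []
running = 0
for ts in sorted(deltas):
    running += deltas[ts]
    balance.append((ts, running))


def free_slots(balance, cutoff, slot=15):
    """Count free doctor-slots of `slot` minutes before `cutoff`."""
    minutes = 0
    for (ts, free), (nxt, _) in zip(balance, balance[1:]):
        end = min(nxt, cutoff)
        if end > ts:
            minutes += free * (end - ts)
    return minutes // slot


print(balance)
print(free_slots(balance, hm("10:00")))  # prints 9
```

Under these assumptions the sketch counts 9 free slots before 10:00 (12 doctor-slots minus 3 bookings), which is not the figure of 8 quoted in the question.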

Grouping sets of data in Oracle SQL

I have been trying to separate groups in data stored in my Oracle database for more accurate analysis.
Current Output
Time Location
10:00 A111
11:00 A112
12:00 S111
13:00 S234
17:00 A234
18:00 S747
19:00 A878
Desired Output
Time Location Group Number
10:00 A111 1
11:00 A112 1
12:00 S111 1
13:00 S234 1
17:00 A234 2
18:00 S747 2
19:00 A878 3
I have been trying to use over and partition by to assign the values, but I can only get it to increment on every row, not only on a change. I also tried using lag, but I struggled to make use of it.
I only need the value in the second column to start from 1 and increment when the first letter of field 1 changes (using substr).
This is my attempt using row_number but I am far off I think. There would be a time column in the output as well not shown above.
select event_time, st_location, Row_Number() over(partition by
SUBSTR(location,1,1) order
by event_time)
as groupnumber from pic
Any help would be really appreciated!
Edit:
Time Location Group Number
10:00 A-10112 1
11:00 A-10421 1
12:00 ST-10621 1
13:00 ST-23412 1
17:00 A-19112 2
18:00 ST-74712 2
19:00 A-87812 3
This is a gaps-and-islands problem. Use the following code:
select location,
dense_rank() over (partition by SUBSTR(location,1,1) order by grp)
from
(
select (row_number() over (order by time)) -
(row_number() over (partition by SUBSTR(location,1,1) order by time)) grp,
location,
time
from data
) t
order by time
The main idea is in the subquery which isolates consecutive sequences of items (computation of grp column). The rest is simple once you have the grp column.
select DENSE_RANK() over(partition by SUBSTR("location",1,1) ORDER BY SUBSTR("location",1,2))
as Rownumber,
"location" from Table1;
Demo
http://sqlfiddle.com/#!4/21120/16

Get MAX count but keep the repeated calculated value if highest

I have the following table, I am using SQL Server 2008
BayNo FixDateTime FixType
1 04/05/2015 16:15:00 tyre change
1 12/05/2015 00:15:00 oil change
1 12/05/2015 08:15:00 engine tuning
1 04/05/2016 08:11:00 car tuning
2 13/05/2015 19:30:00 puncture
2 14/05/2015 08:00:00 light repair
2 15/05/2015 10:30:00 super op
2 20/05/2015 12:30:00 wiper change
2 12/05/2016 09:30:00 denting
2 12/05/2016 10:30:00 wiper repair
2 12/06/2016 10:30:00 exhaust repair
4 12/05/2016 05:30:00 stereo unlock
4 17/05/2016 15:05:00 door handle repair
For any given day, I need to find the highest number of fixes made on a given bay number, and if that calculated number is repeated then it should also appear in the result set.
So I would like to see the result set as follows:
BayNo FixDateTime noOfFixes
1 12/05/2015 00:15:00 2
2 12/05/2016 09:30:00 2
4 12/05/2016 05:30:00 1
4 17/05/2016 15:05:00 1
I managed to get the counts of each, but I am struggling to get the max and keep the repeated highest calculated value. Can someone help, please?
Use window functions.
Get the count for each day by bayno and also find the min fixdatetime for each day per bayno.
Then use dense_rank to compute the highest ranked row for each bayno based on the number of fixes.
Finally get the highest ranked rows.
select distinct bayno,minfixdatetime,no_of_fixes
from (
select bayno,minfixdatetime,no_of_fixes
,dense_rank() over(partition by bayno order by no_of_fixes desc) rnk
from (
select t.*,
count(*) over(partition by bayno,cast(fixdatetime as date)) no_of_fixes,
min(fixdatetime) over(partition by bayno,cast(fixdatetime as date)) minfixdatetime
from tablename t
) x
) y
where rnk = 1
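The steps above can be sketched against the question's sample data. This is a variant rather than the exact query: it runs in SQLite through Python's sqlite3 module (assuming SQLite 3.25+), uses GROUP BY instead of windowed count/min plus DISTINCT, and takes the day as the first 10 characters of the text timestamp. The table name fixes is an assumption.

```python
import sqlite3

# (BayNo, FixDateTime, FixType) rows copied from the question.
rows = [
    (1, "04/05/2015 16:15:00", "tyre change"),
    (1, "12/05/2015 00:15:00", "oil change"),
    (1, "12/05/2015 08:15:00", "engine tuning"),
    (1, "04/05/2016 08:11:00", "car tuning"),
    (2, "13/05/2015 19:30:00", "puncture"),
    (2, "14/05/2015 08:00:00", "light repair"),
    (2, "15/05/2015 10:30:00", "super op"),
    (2, "20/05/2015 12:30:00", "wiper change"),
    (2, "12/05/2016 09:30:00", "denting"),
    (2, "12/05/2016 10:30:00", "wiper repair"),
    (2, "12/06/2016 10:30:00", "exhaust repair"),
    (4, "12/05/2016 05:30:00", "stereo unlock"),
    (4, "17/05/2016 15:05:00", "door handle repair"),
]

QUERY = """
SELECT bayno, first_fix, no_of_fixes
FROM (
    SELECT bayno, first_fix, no_of_fixes,
           -- rank the days of each bay by their fix count, ties share a rank
           DENSE_RANK() OVER (PARTITION BY bayno
                              ORDER BY no_of_fixes DESC) AS rnk
    FROM (
        -- one row per bay and day: earliest fix time and number of fixes
        SELECT bayno,
               MIN(fixdatetime) AS first_fix,
               COUNT(*) AS no_of_fixes
        FROM fixes
        GROUP BY bayno, substr(fixdatetime, 1, 10)
    )
)
WHERE rnk = 1
ORDER BY bayno, first_fix
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fixes (bayno INTEGER, fixdatetime TEXT, fixtype TEXT)")
conn.executemany("INSERT INTO fixes VALUES (?, ?, ?)", rows)
result = conn.execute(QUERY).fetchall()

for row in result:
    print(row)
```

Because DENSE_RANK gives tied days the same rank, bay 4 keeps both of its one-fix days, as the question asks.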
You are looking for rank() or dense_rank(). I would write the query like this:
select bayno, thedate, numFixes
from (select bayno, cast(fixdatetime as date) as thedate,
count(*) as numFixes,
rank() over (partition by bayno order by count(*) desc) as seqnum
from t
group by bayno, cast(fixdatetime as date)
) b
where seqnum = 1;
Note that this returns the date in question. The date does not have a time component.

Assign a counter in SQL Server to records with sequential dates, and only increment when dates not sequential

I am trying to assign a Trip # to records for Customers with sequential days, and increment the Trip ID if they have a break in sequential days, and come later in the month for example. The data structure looks like this:
CustomerID Date
1 2014-01-01
1 2014-01-02
1 2014-01-04
2 2014-01-01
2 2014-01-05
2 2014-01-06
2 2014-01-08
The desired output based upon the above example dataset would be:
CustomerID Date Trip
1 2014-01-01 1
1 2014-01-02 1
1 2014-01-04 2
2 2014-01-01 1
2 2014-01-05 2
2 2014-01-06 2
2 2014-01-08 3
So if the Dates for that Customer are back-to-back, it is considered the same Trip, and has the same Trip #. Is there a way to do this in SQL Server? I am using MSSQL 2012.
My initial thoughts are to use the LAG, ROW_NUMBER, or OVER/PARTITION BY function, or even a Recursive Table Variable Function. I can paste some code, but in all honesty, my code isn't working so far. If this is a simple query, but I am just not thinking about it correctly, that would be great.
Thank you in advance.
Since Date is a DATE (i.e., it has no time component), you could subtract ROW_NUMBER() days from Date, which gives a constant value for each run of consecutive days, and then apply DENSE_RANK() over that value. Something like:
WITH cte AS (
SELECT CustomerID, Date,
DATEADD(DAY,
-ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY Date),
Date) dt
FROM trips
)
SELECT CustomerID, Date,
DENSE_RANK() OVER (PARTITION BY CustomerID ORDER BY dt) AS Trip
FROM cte;
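To see why the subtraction works, here is a small sketch of the same idea in SQLite, run through Python's sqlite3 module (assuming SQLite 3.25+). SQLite has no DATEADD, so julianday() minus the row number stands in for it; the alias anchor is hypothetical.

```python
import sqlite3

# (CustomerID, Date) rows copied from the question.
rows = [
    (1, "2014-01-01"), (1, "2014-01-02"), (1, "2014-01-04"),
    (2, "2014-01-01"), (2, "2014-01-05"), (2, "2014-01-06"),
    (2, "2014-01-08"),
]

QUERY = """
WITH cte AS (
    SELECT CustomerID, Date,
           -- consecutive days share the same (day number - row number)
           julianday(Date)
             - ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY Date) AS anchor
    FROM trips
)
SELECT CustomerID, Date,
       DENSE_RANK() OVER (PARTITION BY CustomerID ORDER BY anchor) AS Trip
FROM cte
ORDER BY CustomerID, Date
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (CustomerID INTEGER, Date TEXT)")
conn.executemany("INSERT INTO trips VALUES (?, ?)", rows)
result = conn.execute(QUERY).fetchall()

for row in result:
    print(row)
```

For customer 1 the anchors for 2014-01-01 and 2014-01-02 coincide, while 2014-01-04 lands one day later, so it starts trip 2. The approach relies on each customer having at most one row per date.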

How to build a daily history from a set of rows in SQL?

I have a simple SQLite table that records energy consumption all day long. It looks like this:
rowid amrid timestamp value
---------- ---------- ---------- ----------
1 1 1372434068 5720
2 2 1372434075 0
3 3 1372434075 90
4 1 1372434078 5800
5 2 1372434085 0
6 3 1372434085 95
I would like to build a simplified history of the consumption of the last day by getting the closest value for every 10 minutes to build a CSV which would look like:
date value
---------------- ----------
2013-07-01 00:00 90
2013-07-01 00:10 100
2013-07-01 00:20 145
So far I have a query that gets me the closest value for one timestamp:
SELECT *
FROM indexes
WHERE amrid=3
ORDER BY ABS(timestamp - strftime('%s','2013-07-01 00:20:00'))
LIMIT 1;
How can I build a request that would do the trick to get it for the whole day?
Thanks,
Let me define "closest value" as the first value after each 10-minute interval begins. You can generalize the idea to other definitions, but I think this is easiest to explain the idea.
Convert the timestamp to a string value of the form "yyyy-mm-dd hhMM". This string is 15 characters long, so its first 14 characters identify the 10-minute interval. Using that as an aggregation key, calculate min(rowid) for each interval and use it to join back to the original data. Because timestamp holds Unix epoch seconds, strftime needs the 'unixepoch' modifier.
Here is the idea:
select i.*
from indexes i join
(select substr(strftime('%Y-%m-%d %H%M', timestamp, 'unixepoch'), 1, 14) as yyyymmddhhm,
min(rowid) as whichid
from indexes
group by substr(strftime('%Y-%m-%d %H%M', timestamp, 'unixepoch'), 1, 14)
) isum
on i.rowid = isum.whichid
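A small sketch of the bucketing idea, run through Python's sqlite3 module. The first six rows come from the question; the last two are made-up readings ten minutes later so that a second bucket exists. Restricting to amrid = 3, relying on rowid following insertion order, and the aliases bucket and slot are assumptions of this sketch.

```python
import sqlite3

# (amrid, timestamp, value); the last two rows are invented for illustration.
rows = [
    (1, 1372434068, 5720), (2, 1372434075, 0), (3, 1372434075, 90),
    (1, 1372434078, 5800), (2, 1372434085, 0), (3, 1372434085, 95),
    (3, 1372434675, 120), (3, 1372434700, 125),
]

QUERY = """
SELECT isum.bucket || '0' AS slot, i.value
FROM indexes AS i
JOIN (
    -- the first 14 characters of 'YYYY-MM-DD HHMM' name the 10-minute bucket
    SELECT substr(strftime('%Y-%m-%d %H%M', timestamp, 'unixepoch'), 1, 14) AS bucket,
           min(rowid) AS whichid
    FROM indexes
    WHERE amrid = 3
    GROUP BY bucket
) AS isum ON i.rowid = isum.whichid
ORDER BY slot
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE indexes (amrid INTEGER, timestamp INTEGER, value INTEGER)")
conn.executemany("INSERT INTO indexes VALUES (?, ?, ?)", rows)
result = conn.execute(QUERY).fetchall()

for row in result:
    print(row)
```

The 'unixepoch' modifier makes strftime work in UTC, so the bucket labels here are UTC times; adding the 'localtime' modifier after it would shift them to the local zone.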