Counting intersecting time intervals in T-SQL - sql

CODE:
CREATE TABLE #Temp1 (CoachID INT, BusyST DATETIME, BusyET DATETIME)
CREATE TABLE #Temp2 (CoachID INT, AvailableST DATETIME, AvailableET DATETIME)
INSERT INTO #Temp1 (CoachID, BusyST, BusyET)
SELECT 1,'2016-08-17 09:12:00','2016-08-17 10:11:00'
UNION
SELECT 3,'2016-08-17 09:30:00','2016-08-17 10:00:00'
UNION
SELECT 4,'2016-08-17 12:07:00','2016-08-17 13:10:00'
INSERT INTO #Temp2 (CoachID, AvailableST, AvailableET)
SELECT 1,'2016-08-17 09:07:00','2016-08-17 11:09:00'
UNION
SELECT 2,'2016-08-17 09:11:00','2016-08-17 09:30:00'
UNION
SELECT 3,'2016-08-17 09:24:00','2016-08-17 13:08:00'
UNION
SELECT 1,'2016-08-17 11:34:00','2016-08-17 12:27:00'
UNION
SELECT 4,'2016-08-17 09:34:00','2016-08-17 13:00:00'
UNION
SELECT 5,'2016-08-17 09:10:00','2016-08-17 09:55:00'
--RESULT-SET QUERY GOES HERE
DROP TABLE #Temp1
DROP TABLE #Temp2
DESIRED OUTPUT:
CoachID CanCoachST CanCoachET NumOfCoaches
1 2016-08-17 09:12:00.000 2016-08-17 09:24:00.000 2 --(ID2 = 2,5)
1 2016-08-17 09:24:00.000 2016-08-17 09:30:00.000 3 --(ID2 = 2,3,5)
1 2016-08-17 09:30:00.000 2016-08-17 09:34:00.000 1 --(ID2 = 5)
1 2016-08-17 09:34:00.000 2016-08-17 09:55:00.000 2 --(ID2 = 4,5)
1 2016-08-17 09:55:00.000 2016-08-17 10:00:00.000 1 --(ID2 = 4)
1 2016-08-17 10:00:00.000 2016-08-17 10:11:00.000 2 --(ID2 = 3,4)
3 2016-08-17 09:30:00.000 2016-08-17 09:34:00.000 1 --(ID2 = 5)
3 2016-08-17 09:34:00.000 2016-08-17 09:55:00.000 2 --(ID2 = 4,5)
3 2016-08-17 09:55:00.000 2016-08-17 10:00:00.000 1 --(ID2 = 4)
4 2016-08-17 12:07:00.000 2016-08-17 12:27:00.000 2 --(ID2 = 1,3)
4 2016-08-17 12:27:00.000 2016-08-17 13:08:00.000 1 --(ID2 = 3)
4 2016-08-17 13:08:00.000 2016-08-17 13:10:00.000 0 --(No one is available)
GOAL:
Consider #Temp1 as the table of team coaches (ID1) and their meeting times (ST1 = Meeting Start Time and ET1 = Meeting End Time).
Consider #Temp2 as the table of team coaches (ID2) and their total available times (ST2 = Available Start Time and ET2 = Available End Time).
Now, the goal is to find all possible coaches from #Temp2 who are available to coach during the meeting time of the coaches from #Temp1.
So for example, For the coach ID1 = 1, who is busy between 9:12 and 10:11 (data can span across multiple days, if that info matters), we have
coach ID2 = 2 and 5 that can coach between 9:12 and 9:24
, coach ID2 = 2,3, and 5 that can coach between 9:24 and 9:30
, coach ID2 = 5 that can coach between 9:30 and 9:34
, coach ID2 = 4 and 5 that can coach between 9:34 and 9:55
, coach ID2 = 4 that can coach between 9:55 and 10:00
, and coach ID2 = 3 and 4 that can coach between 10:00 and 10:11 (note how ID 3, although available in #Temp2 table between 9:24 and 13:08, it can't coach for ID1 = 1 between 9:24 and 10:00 because its also busy between 9:30 and 10:00.
My effort so far: Only dealing with breaking #Temp1's time bracket so far. Still need to figure out A) how to remove that non-busy time window from the output B) add a field/map it to right T1's CoachIDs.
;WITH ED
AS (SELECT BusyET, CoachID FROM #Temp1
UNION ALL
SELECT BusyST, CoachID FROM #Temp1
)
,Brackets
AS (SELECT MIN(BusyST) AS BusyST
,( SELECT MIN(BusyET)
FROM ED e
WHERE e.BusyET > MIN(BusyST)
) AS BusyET
FROM #Temp1 T
UNION ALL
SELECT B.BusyET
,e.BusyET
FROM Brackets B
INNER JOIN ED E ON B.BusyET < E.BusyET
WHERE NOT EXISTS (
SELECT *
FROM ED E2
WHERE E2.BusyET > B.BusyET
AND E2.BusyET < E.BusyET
)
)
SELECT *
FROM Brackets
ORDER BY BusyST;
I think I need to join on comparing ST/ET dates between two tables where IDs don't match each other. But I'm having trouble figuring out how to actually get only the meeting time window and unique count.
Updated with better schema/data-set. Also note, even though CoachID 4 is not "scheduled" to be available, he's still listed as busy for that last few minutes. And there can be scenario where no one else is available to work during that time, in which case, we can return 0 cnt record (or not return it if it's really complicated).
Again, the goal is to find count and combination of all available CoachIDs and their Available time window that can coach for the CoachIDs listed in the busy table.
Updated with more sample description matching sample data.

The query in this answer was inspired by the Packing Intervals by Itzik Ben-Gan.
At first I didn't understand the full complexity of the requirements and assumed that intervals in Table1 and Table2 don't overlap. I assumed that the same coach can't be busy and available at the same time.
It turns out that my assumption was wrong, so the first variant of the query that I'm leaving below has to be extended with preliminary step that subtracts all intervals stored in Table1 from intervals stored in Table2.
It uses the similar idea. Each start of the "available" interval is marked with +1 EventType and end of the "available" interval is marked with -1 EventType. For "busy" intervals the marks are reversed. "Busy" interval starts with -1 and ends with +1. This is done in C1_Subtract.
Then running total tells us where the "truly" available intervals are (C2_Subtract). Finally, CTE_Available leaves only "truly" available intervals.
Sample data
I added few rows to illustrate what happens if no coaches are available. I also added CoachID=9, which is not in the initial results of the first variant of the query.
CREATE TABLE #Temp1 (CoachID INT, BusyST DATETIME, BusyET DATETIME);
CREATE TABLE #Temp2 (CoachID INT, AvailableST DATETIME, AvailableET DATETIME);
-- Start time is inclusive
-- End time is exclusive
INSERT INTO #Temp1 (CoachID, BusyST, BusyET) VALUES
(1, '2016-08-17 09:12:00','2016-08-17 10:11:00'),
(3, '2016-08-17 09:30:00','2016-08-17 10:00:00'),
(4, '2016-08-17 12:07:00','2016-08-17 13:10:00'),
(6, '2016-08-17 15:00:00','2016-08-17 16:00:00'),
(9, '2016-08-17 15:00:00','2016-08-17 16:00:00');
INSERT INTO #Temp2 (CoachID, AvailableST, AvailableET) VALUES
(1,'2016-08-17 09:07:00','2016-08-17 11:09:00'),
(2,'2016-08-17 09:11:00','2016-08-17 09:30:00'),
(3,'2016-08-17 09:24:00','2016-08-17 13:08:00'),
(1,'2016-08-17 11:34:00','2016-08-17 12:27:00'),
(4,'2016-08-17 09:34:00','2016-08-17 13:00:00'),
(5,'2016-08-17 09:10:00','2016-08-17 09:55:00'),
(7,'2016-08-17 15:10:00','2016-08-17 15:20:00'),
(8,'2016-08-17 15:15:00','2016-08-17 15:25:00'),
(7,'2016-08-17 15:40:00','2016-08-17 15:55:00'),
(9,'2016-08-17 15:05:00','2016-08-17 15:07:00'),
(9,'2016-08-17 15:40:00','2016-08-17 16:55:00');
Intermediate results of CTE_Available
+---------+-------------------------+-------------------------+
| CoachID | AvailableST | AvailableET |
+---------+-------------------------+-------------------------+
| 1 | 2016-08-17 09:07:00.000 | 2016-08-17 09:12:00.000 |
| 1 | 2016-08-17 10:11:00.000 | 2016-08-17 11:09:00.000 |
| 1 | 2016-08-17 11:34:00.000 | 2016-08-17 12:27:00.000 |
| 2 | 2016-08-17 09:11:00.000 | 2016-08-17 09:30:00.000 |
| 3 | 2016-08-17 09:24:00.000 | 2016-08-17 09:30:00.000 |
| 3 | 2016-08-17 10:00:00.000 | 2016-08-17 13:08:00.000 |
| 4 | 2016-08-17 09:34:00.000 | 2016-08-17 12:07:00.000 |
| 5 | 2016-08-17 09:10:00.000 | 2016-08-17 09:55:00.000 |
| 7 | 2016-08-17 15:10:00.000 | 2016-08-17 15:20:00.000 |
| 7 | 2016-08-17 15:40:00.000 | 2016-08-17 15:55:00.000 |
| 8 | 2016-08-17 15:15:00.000 | 2016-08-17 15:25:00.000 |
| 9 | 2016-08-17 16:00:00.000 | 2016-08-17 16:55:00.000 |
+---------+-------------------------+-------------------------+
Now we can use these intermediate results of CTE_Available instead of #Temp2 in the first variant of the query. See detailed explanations below the first variant of the query.
Full query
WITH
C1_Subtract
AS
(
SELECT
CoachID
,AvailableST AS ts
,+1 AS EventType
FROM #Temp2
UNION ALL
SELECT
CoachID
,AvailableET AS ts
,-1 AS EventType
FROM #Temp2
UNION ALL
SELECT
CoachID
,BusyST AS ts
,-1 AS EventType
FROM #Temp1
UNION ALL
SELECT
CoachID
,BusyET AS ts
,+1 AS EventType
FROM #Temp1
)
,C2_Subtract AS
(
SELECT
C1_Subtract.*
,SUM(EventType)
OVER (
PARTITION BY CoachID
ORDER BY ts, EventType DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS cnt
,LEAD(ts)
OVER (
PARTITION BY CoachID
ORDER BY ts, EventType DESC)
AS NextTS
FROM C1_Subtract
)
,CTE_Available
AS
(
SELECT
C2_Subtract.CoachID
,C2_Subtract.ts AS AvailableST
,C2_Subtract.NextTS AS AvailableET
FROM C2_Subtract
WHERE cnt > 0
)
,CTE_Intervals
AS
(
SELECT
TBusy.CoachID AS BusyCoachID
,TBusy.BusyST
,TBusy.BusyET
,CA.CoachID AS AvailableCoachID
,CA.AvailableST
,CA.AvailableET
-- max of start time
,CASE WHEN CA.AvailableST < TBusy.BusyST
THEN TBusy.BusyST
ELSE CA.AvailableST
END AS ST
-- min of end time
,CASE WHEN CA.AvailableET > TBusy.BusyET
THEN TBusy.BusyET
ELSE CA.AvailableET
END AS ET
FROM
#Temp1 AS TBusy
CROSS APPLY
(
SELECT
TAvailable.*
FROM
CTE_Available AS TAvailable
WHERE
-- the same coach can't be available and busy
TAvailable.CoachID <> TBusy.CoachID
-- intervals intersect
AND TAvailable.AvailableST < TBusy.BusyET
AND TAvailable.AvailableET > TBusy.BusyST
) AS CA
)
,C1 AS
(
SELECT
BusyCoachID
,AvailableCoachID
,ST AS ts
,+1 AS EventType
FROM CTE_Intervals
UNION ALL
SELECT
BusyCoachID
,AvailableCoachID
,ET AS ts
,-1 AS EventType
FROM CTE_Intervals
UNION ALL
SELECT
CoachID AS BusyCoachID
,CoachID AS AvailableCoachID
,BusyST AS ts
,+1 AS EventType
FROM #Temp1
UNION ALL
SELECT
CoachID AS BusyCoachID
,CoachID AS AvailableCoachID
,BusyET AS ts
,-1 AS EventType
FROM #Temp1
)
,C2 AS
(
SELECT
C1.*
,SUM(EventType)
OVER (
PARTITION BY BusyCoachID
ORDER BY ts, EventType DESC, AvailableCoachID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
- 1 AS cnt
,LEAD(ts)
OVER (
PARTITION BY BusyCoachID
ORDER BY ts, EventType DESC, AvailableCoachID)
AS NextTS
FROM C1
)
SELECT
BusyCoachID AS CoachID
,ts AS CanCoachST
,NextTS AS CanCoachET
,cnt AS NumOfCoaches
FROM C2
WHERE ts <> NextTS
ORDER BY BusyCoachID, CanCoachST
;
Final result
+---------+-------------------------+-------------------------+--------------+
| CoachID | CanCoachST | CanCoachET | NumOfCoaches |
+---------+-------------------------+-------------------------+--------------+
| 1 | 2016-08-17 09:12:00.000 | 2016-08-17 09:24:00.000 | 2 |
| 1 | 2016-08-17 09:24:00.000 | 2016-08-17 09:30:00.000 | 3 |
| 1 | 2016-08-17 09:30:00.000 | 2016-08-17 09:34:00.000 | 1 |
| 1 | 2016-08-17 09:34:00.000 | 2016-08-17 09:55:00.000 | 2 |
| 1 | 2016-08-17 09:55:00.000 | 2016-08-17 10:00:00.000 | 1 |
| 1 | 2016-08-17 10:00:00.000 | 2016-08-17 10:11:00.000 | 2 |
| 3 | 2016-08-17 09:30:00.000 | 2016-08-17 09:34:00.000 | 1 |
| 3 | 2016-08-17 09:34:00.000 | 2016-08-17 09:55:00.000 | 2 |
| 3 | 2016-08-17 09:55:00.000 | 2016-08-17 10:00:00.000 | 1 |
| 4 | 2016-08-17 12:07:00.000 | 2016-08-17 12:27:00.000 | 2 |
| 4 | 2016-08-17 12:27:00.000 | 2016-08-17 13:08:00.000 | 1 |
| 4 | 2016-08-17 13:08:00.000 | 2016-08-17 13:10:00.000 | 0 |
| 6 | 2016-08-17 15:00:00.000 | 2016-08-17 15:10:00.000 | 0 |
| 6 | 2016-08-17 15:10:00.000 | 2016-08-17 15:15:00.000 | 1 |
| 6 | 2016-08-17 15:15:00.000 | 2016-08-17 15:20:00.000 | 2 |
| 6 | 2016-08-17 15:20:00.000 | 2016-08-17 15:25:00.000 | 1 |
| 6 | 2016-08-17 15:25:00.000 | 2016-08-17 15:40:00.000 | 0 |
| 6 | 2016-08-17 15:40:00.000 | 2016-08-17 15:55:00.000 | 1 |
| 6 | 2016-08-17 15:55:00.000 | 2016-08-17 16:00:00.000 | 0 |
| 9 | 2016-08-17 15:00:00.000 | 2016-08-17 15:10:00.000 | 0 |
| 9 | 2016-08-17 15:10:00.000 | 2016-08-17 15:15:00.000 | 1 |
| 9 | 2016-08-17 15:15:00.000 | 2016-08-17 15:20:00.000 | 2 |
| 9 | 2016-08-17 15:20:00.000 | 2016-08-17 15:25:00.000 | 1 |
| 9 | 2016-08-17 15:25:00.000 | 2016-08-17 15:40:00.000 | 0 |
| 9 | 2016-08-17 15:40:00.000 | 2016-08-17 15:55:00.000 | 1 |
| 9 | 2016-08-17 15:55:00.000 | 2016-08-17 16:00:00.000 | 0 |
+---------+-------------------------+-------------------------+--------------+
I'd recommend to create the following indexes to avoid some Sorts in the execution plan.
CREATE UNIQUE NONCLUSTERED INDEX [IX_CoachID_BusyST] ON #Temp1
(
CoachID ASC,
BusyST ASC
);
CREATE UNIQUE NONCLUSTERED INDEX [IX_CoachID_BusyET] ON #Temp1
(
CoachID ASC,
BusyET ASC
);
CREATE UNIQUE NONCLUSTERED INDEX [IX_CoachID_AvailableST] ON #Temp2
(
CoachID ASC,
AvailableST ASC
);
CREATE UNIQUE NONCLUSTERED INDEX [IX_CoachID_AvailableET] ON #Temp2
(
CoachID ASC,
AvailableET ASC
);
On real data, though, the bottleneck may be somewhere else, which may depend on the data distribution. The query is rather complicated and tuning it without real data would be too much guesswork.
First variant of the query
Run the query step-by-step, CTE-by-CTE and examine intermediate results to undestand how it works.
CTE_Intervals gives us a list of available intervals that intersect with busy intervals.
C1 puts both start and end times in the same column with the corresponding EventType. This will help us track when an interval starts or ends.
A running total of EventType gives the count of available coaches. C1 unions busy coaches into the mix to properly count cases when no coach is available.
WITH
CTE_Intervals
AS
(
SELECT
TBusy.CoachID AS BusyCoachID
,TBusy.BusyST
,TBusy.BusyET
,CA.CoachID AS AvailableCoachID
,CA.AvailableST
,CA.AvailableET
-- max of start time
,CASE WHEN CA.AvailableST < TBusy.BusyST
THEN TBusy.BusyST
ELSE CA.AvailableST
END AS ST
-- min of end time
,CASE WHEN CA.AvailableET > TBusy.BusyET
THEN TBusy.BusyET
ELSE CA.AvailableET
END AS ET
FROM
#Temp1 AS TBusy
CROSS APPLY
(
SELECT
TAvailable.*
FROM
#Temp2 AS TAvailable
WHERE
-- the same coach can't be available and busy
TAvailable.CoachID <> TBusy.CoachID
-- intervals intersect
AND TAvailable.AvailableST < TBusy.BusyET
AND TAvailable.AvailableET > TBusy.BusyST
) AS CA
)
,C1 AS
(
SELECT
BusyCoachID
,AvailableCoachID
,ST AS ts
,+1 AS EventType
FROM CTE_Intervals
UNION ALL
SELECT
BusyCoachID
,AvailableCoachID
,ET AS ts
,-1 AS EventType
FROM CTE_Intervals
UNION ALL
SELECT
CoachID AS BusyCoachID
,CoachID AS AvailableCoachID
,BusyST AS ts
,+1 AS EventType
FROM #Temp1
UNION ALL
SELECT
CoachID AS BusyCoachID
,CoachID AS AvailableCoachID
,BusyET AS ts
,-1 AS EventType
FROM #Temp1
)
,C2 AS
(
SELECT
C1.*
,SUM(EventType)
OVER (
PARTITION BY BusyCoachID
ORDER BY ts, EventType DESC, AvailableCoachID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
- 1 AS cnt
,LEAD(ts)
OVER (
PARTITION BY BusyCoachID
ORDER BY ts, EventType DESC, AvailableCoachID)
AS NextTS
FROM C1
)
SELECT
BusyCoachID AS CoachID
,ts AS CanCoachST
,NextTS AS CanCoachET
,cnt AS NumOfCoaches
FROM C2
WHERE ts <> NextTS
ORDER BY BusyCoachID, CanCoachST
;
DROP TABLE #Temp1;
DROP TABLE #Temp2;
Result
I've added comments for each line with IDs of available coaches that were counted.
Now I understand why my initial result was not the same as your expected result.
+---------+---------------------+---------------------+--------------+
| CoachID | CanCoachST | CanCoachET | NumOfCoaches |
+---------+---------------------+---------------------+--------------+
| 1 | 2016-08-17 09:12:00 | 2016-08-17 09:24:00 | 2 | 2,5
| 1 | 2016-08-17 09:24:00 | 2016-08-17 09:30:00 | 3 | 2,3,5
| 1 | 2016-08-17 09:30:00 | 2016-08-17 09:34:00 | 2 | 3,5
| 1 | 2016-08-17 09:34:00 | 2016-08-17 09:55:00 | 3 | 3,4,5
| 1 | 2016-08-17 09:55:00 | 2016-08-17 10:11:00 | 2 | 3,4
| 3 | 2016-08-17 09:30:00 | 2016-08-17 09:34:00 | 2 | 1,5
| 3 | 2016-08-17 09:34:00 | 2016-08-17 09:55:00 | 3 | 1,4,5
| 3 | 2016-08-17 09:55:00 | 2016-08-17 10:00:00 | 2 | 1,4
| 4 | 2016-08-17 12:07:00 | 2016-08-17 12:27:00 | 2 | 3,1
| 4 | 2016-08-17 12:27:00 | 2016-08-17 13:08:00 | 1 | 3
| 4 | 2016-08-17 13:08:00 | 2016-08-17 13:10:00 | 0 | none
| 6 | 2016-08-17 15:00:00 | 2016-08-17 15:10:00 | 0 | none
| 6 | 2016-08-17 15:10:00 | 2016-08-17 15:15:00 | 1 | 7
| 6 | 2016-08-17 15:15:00 | 2016-08-17 15:20:00 | 2 | 7,8
| 6 | 2016-08-17 15:20:00 | 2016-08-17 15:25:00 | 1 | 8
| 6 | 2016-08-17 15:25:00 | 2016-08-17 15:40:00 | 0 | none
| 6 | 2016-08-17 15:40:00 | 2016-08-17 15:55:00 | 1 | 7
| 6 | 2016-08-17 15:55:00 | 2016-08-17 16:00:00 | 0 | none
+---------+---------------------+---------------------+--------------+

This query will do the calculations:
SELECT TT1.ID1
, case when TT2.ST2 < TT1.ST1 THEN TT1.ST1 ELSE TT2.ST2 END
, case when TT2.ET2 > TT1.ET1 THEN TT1.ET1 ELSE TT2.ET2 END
, COUNT(distinct TT2.id2)
FROM #Temp1 TT1 INNER JOIN #Temp2 TT2
ON TT1.ET1 > TT2.ST2 AND TT1.ST1 < TT2.ET2 AND TT1.ID1 <> TT2.ID2
GROUP BY TT1.ID1
, case when TT2.ST2 < TT1.ST1 THEN TT1.ST1 ELSE TT2.ST2 END
, case when TT2.ET2 > TT1.ET1 THEN TT1.ET1 ELSE TT2.ET2 END
However, the result will include the slots where coaches cal fill in for the full time slot, e.g for the Coach 1 there will be three slots: from 9:00 to 9:30 with substitute coach #2, from 9:30 to 10:00 substitute coach #4 and the timeslot from 9:00 to 10:00 with substitute coaches #3 and #4. Here is the whole result:
ID1
----------- ----------------------- ----------------------- -----------
1 2016-08-17 09:00:00.000 2016-08-17 09:30:00.000 1
1 2016-08-17 09:00:00.000 2016-08-17 10:00:00.000 2
1 2016-08-17 09:30:00.000 2016-08-17 10:00:00.000 1
3 2016-08-17 09:30:00.000 2016-08-17 10:00:00.000 3
4 2016-08-17 12:00:00.000 2016-08-17 12:30:00.000 1
4 2016-08-17 12:00:00.000 2016-08-17 13:00:00.000 1

As best I can tell, what you're looking for is something like this:
;WITH CTE AS (
SELECT ID1, ST1, DATEADD(MINUTE, 30, ST1) ET1
FROM #Temp1
UNION ALL
SELECT C.ID1, C.ET1, DATEADD(MINUTE, 30, C.ET1)
FROM CTE C
JOIN #Temp1 T ON T.ID1 = C.ID1
WHERE T.ET1 >= DATEADD(MINUTE, 30, C.ET1))
SELECT *
FROM CTE C
OUTER APPLY (
SELECT COUNT(*) ID2Cnt
FROM #Temp2 T
WHERE ST2 <= C.ST1
AND ET2 >= C.ET1
AND ID2 <> C.ID1
AND NOT EXISTS (
SELECT 1
FROM CTE
WHERE ID1 = T.ID2
AND ST1 <= C.ST1
AND ET1 >= C.ET1)) T
ORDER BY ID1, ST1;
The CTE will split your #Temp1 coaches up into half hour slots, and then I'm assuming you want to find all the people in #Temp2 who aren't the same ID and have a shift that starts earlier or at the same time and ends after or at the same time... NOTE: I'm assuming blocks can only be half an hour here.
EDIT: Never mind... I just realised you also want to discount the busy people in #Temp1 from the result set so I added a not exists clause in the apply...

I recommend using the concept of an interval / timeslot table.
Another way of explaining it is consider a "Time Dimension Table"
Define all your times, and then record your facts with references to the time intervals at the granularity you care about. Because you had times ending in 7 and 11 minutes I chose 1 minute intervals, though I recommend 15-30 minute intervals.
By doing this it makes it easy to join / compare the tables.
Consider design / implementation below:
-- dimension table
-- drop table #intervals
create table #intervals(intervalId int identity(1,1) not null primary key clustered,intervalStartTime datetime unique)
declare #s datetime, #e datetime, #i int
set #s = '2016-08-16'
set #e = '2016-08-18'
set #i = 1
while (#s <= #e )
begin
insert into #intervals(intervalStartTime) values(#s)
set #s = dateadd(minute, #i, #s)
end
-- fact table:
-- drop table #Fact
create table #Fact(intervalId int, coachid int, isBusy int default(0) , isAvailable int default(0))
-- record every coach's times
insert into #Fact(coachid,intervalId)
select distinct c.coachid, i.intervalId
from
(
select distinct coachid from #temp1
union
select distinct coachid from #temp2
) c cross join #intervals i
-- record free / busy info
update f set isbusy = 1
from #intervals i inner join #fact f on i.intervalId = f.intervalId
inner join #temp1 t on f.coachid = t.coachid and i.intervalStartTime between t.BusyST and t.BusyET
-- record free / busy info
update f set isAvailable = 1
from #intervals i inner join #fact f on i.intervalId = f.intervalId
inner join #temp2 t on f.coachid = t.coachid and i.intervalStartTime between t.AvailableST and t.AvailableET
-- construct your query to find common times,etc
select * from #intervals i inner join #Fact f on i.intervalId = f.intervalId
-- example result showing # of coaches available vs free
select i.intervalId, i.intervalStartTime, sum(isBusy) as coachesBusy, sum(isAvailable) as coachesAvailable
from #intervals i inner join #Fact f on i.intervalId = f.intervalId
group by i.intervalId, i.intervalStartTime
having sum(isBusy) < sum(isAvailable)
you can then look for common or unique interval ids however you need.
let me know if you require additional clarification.

I'm using a little numbers table ... you don't need something for dates, just numbers. What I am building here is smaller than what you would use in a real scenario.
CREATE TABLE dbo.Numbers (Num INT PRIMARY KEY CLUSTERED);
WITH E1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS t(N))
,E2 AS (SELECT N = 1 FROM E1 AS a, E1 AS b)
,E4 AS (SELECT N = 1 FROM E2 AS a, E2 AS b)
,cteTally AS (SELECT N = 0 UNION ALL
SELECT N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4)
INSERT INTO dbo.Numbers (Num)
SELECT N FROM cteTally;
Pleae note the #startDate below ... it is artificially close to the dates you're dealing with and in a real prod scenario you would have that date be earlier to go along with your larger Numbers table.
Here is the solution to your problem and it will work with older SQL Server versions (as well as the 2012 you have tagged):
DECLARE #startDate DATETIME = '20160817';
WITH cteBusy AS
(
SELECT num.Num
, busy.CoachID
FROM #Temp1 AS busy
JOIN dbo.Numbers AS num
ON num.Num >= DATEDIFF(MINUTE, #startDate, busy.BusyST)
AND num.Num < DATEDIFF(MINUTE, #startDate, busy.BusyET)
)
, cteAvailable AS
(
SELECT num.Num
, avail.CoachID
FROM #Temp2 AS avail
JOIN dbo.Numbers AS num
ON num.Num >= DATEDIFF(MINUTE, #startDate, avail.AvailableST)
AND num.Num < DATEDIFF(MINUTE, #startDate, avail.AvailableET)
LEFT JOIN cteBusy AS b
ON b.Num = num.Num
AND b.CoachID = avail.CoachID
WHERE b.Num IS NULL
)
, cteGrouping AS
(
SELECT b.Num
, b.CoachID
, NumOfCoaches = COUNT(a.CoachID)
FROM cteBusy AS b
LEFT JOIN cteAvailable AS a
ON a.Num = b.Num
GROUP BY b.Num, b.CoachID
)
, cteFinal AS
(
SELECT cte.Num
, cte.CoachID
, cte.NumOfCoaches
, block = cte.Num - ROW_NUMBER() OVER(PARTITION BY cte.CoachID, cte.NumOfCoaches ORDER BY cte.Num)
FROM cteGrouping AS cte
)
SELECT cte.CoachID
, CanCoachST = DATEADD(MINUTE, MIN(cte.Num), #startDate)
, CanCoachET = DATEADD(MINUTE, MAX(cte.Num) + 1, #startDate)
, cte.NumOfCoaches
FROM cteFinal AS cte
GROUP BY cte.CoachId, cte.NumOfCoaches, cte.block
ORDER BY cte.CoachID, CanCoachST;
Enjoy!

I believe the following query will work, however I can make no promises on performance.
CREATE TABLE #Temp1 (CoachID INT, BusyST DATETIME, BusyET DATETIME)
CREATE TABLE #Temp2 (CoachID INT, AvailableST DATETIME, AvailableET DATETIME)
INSERT INTO #Temp1 (CoachID, BusyST, BusyET)
SELECT 1,'2016-08-17 09:12:00','2016-08-17 10:11:00'
UNION
SELECT 3,'2016-08-17 09:30:00','2016-08-17 10:00:00'
UNION
SELECT 4,'2016-08-17 12:07:00','2016-08-17 13:10:00'
INSERT INTO #Temp2 (CoachID, AvailableST, AvailableET)
SELECT 1,'2016-08-17 09:07:00','2016-08-17 11:09:00'
UNION
SELECT 2,'2016-08-17 09:11:00','2016-08-17 09:30:00'
UNION
SELECT 3,'2016-08-17 09:24:00','2016-08-17 13:08:00'
UNION
SELECT 1,'2016-08-17 11:34:00','2016-08-17 12:27:00'
UNION
SELECT 4,'2016-08-17 09:34:00','2016-08-17 13:00:00'
UNION
SELECT 5,'2016-08-17 09:10:00','2016-08-17 09:55:00'
;WITH WorkScheduleWithID -- Select work schedule (#Temp2 – available times) and generate ID for each schedule entry.
AS
(
SELECT ROW_NUMBER() OVER (ORDER BY [CoachID]) AS [ID]
,[WS].[CoachID]
,[WS].[AvailableST] AS [Start]
,[WS].[AvailableET] As [End]
FROM #Temp2 [WS]
), SchedulesIntersect -- Determine where work schedule and meeting schedule (busy times) intersect.
AS
(
SELECT [ID]
,[CoachID]
,[Start]
,[End]
,[IntersectTime]
,SUM([Availability]) OVER (PARTITION BY [ID] ORDER BY [IntersectTime]) AS GroupID
FROM (
SELECT [WS].[ID]
,[WS].[CoachID]
,[WS].[Start]
,[WS].[End]
,[MS1].[BusyST] AS [IntersectTime]
,0 AS [Availability]
FROM WorkScheduleWithID [WS]
INNER JOIN #Temp1 [MS1] ON ([MS1].[CoachID] = [WS].[CoachID])
AND
( ([MS1].[BusyST] > [WS].[Start]) AND ([MS1].[BusyST] < [WS].[End]) ) -- Meeting start contained with in work schedule
UNION ALL
SELECT [WS].[ID]
,[WS].[CoachID]
,[WS].[Start]
,[WS].[End]
,[MS2].[BusyET] AS [IntersectTime]
,1 AS [Availability]
FROM WorkScheduleWithID [WS]
INNER JOIN #Temp1 [MS2] ON ([MS2].[CoachID] = [WS].[CoachID])
AND
( ([MS2].BusyET > [WS].[Start]) AND ([MS2].BusyET < [WS].[End]) ) -- Meeting end contained with in work schedule
) Intersects
),ActualAvailability -- Determine actual availability of each coach based on work schedule and busy time.
AS
(
SELECT [ID]
,[CoachID]
,(
CASE
WHEN [GroupID] = 0 THEN [Start]
ELSE MIN([IntersectTime])
END
) AS [Start]
,(
CASE
WHEN ( ([GroupID] > 0) AND (MIN([IntersectTime]) = MAX([IntersectTime])) ) THEN [End]
ELSE MAX([IntersectTime])
END
) AS [End]
FROM SchedulesIntersect
GROUP BY [ID], [CoachID], [Start], [End], [GroupID]
UNION ALL
SELECT [ID]
,[CoachID]
,[Start]
,[End]
FROM WorkScheduleWithID WS
WHERE WS.ID NOT IN (SELECT ID FROM SchedulesIntersect)
),TimeIntervals -- Determine time intervals for which each coach’s availability will be checked against.
AS
(
SELECT DISTINCT *
FROM (
SELECT MS.CoachID
,MS.BusyST
,MS.BusyET
,(
CASE
WHEN AC.Start < MS.BusyST THEN MS.BusyST
ELSE AC.Start
END
) AS [TS]
FROM #Temp1 MS
LEFT OUTER JOIN ActualAvailability AC ON (AC.CoachID <> MS.CoachID)
AND
(
( (MS.[BusyST] <= AC.[Start]) AND (MS.[BusyET] >= AC.[End]) ) OR -- Meeting covers entire work schedule
( (MS.[BusyST] > AC.[Start]) AND (MS.[BusyET] < AC.[End]) ) OR -- Meeting is contained with in work schedule
( (MS.[BusyST] < AC.[Start]) AND (MS.[BusyET] > AC.[Start]) AND ([MS].[BusyET] < AC.[End]) ) OR -- Meeting ends within work schedule (partial overlap)
( (MS.[BusyST] > AC.[Start]) AND (MS.[BusyST] < AC.[End]) AND ([MS].[BusyET] > AC.[End]) ) -- Meeting starts within work schedule (partial overlap)
)
UNION ALL
SELECT MS.CoachID
,MS.BusyST
,MS.BusyET
,(
CASE
WHEN AC.[End] > MS.BusyET THEN MS.BusyET
ELSE AC.[End]
END
) AS [TS]
FROM #Temp1 MS
LEFT OUTER JOIN ActualAvailability AC ON (AC.CoachID <> MS.CoachID)
AND
(
( (MS.[BusyST] <= AC.[Start]) AND (MS.[BusyET] >= AC.[End]) ) OR -- Meeting covers entire work schedule
( (MS.[BusyST] > AC.[Start]) AND (MS.[BusyET] < AC.[End]) ) OR -- Meeting is contained with in work schedule
( (MS.[BusyST] < AC.[Start]) AND (MS.[BusyET] > AC.[Start]) AND ([MS].[BusyET] < AC.[End]) ) OR -- Meeting ends within work schedule (partial overlap)
( (MS.[BusyST] > AC.[Start]) AND (MS.[BusyST] < AC.[End]) AND ([MS].[BusyET] > AC.[End]) ) -- Meeting starts within work schedule (partial overlap)
)
) Intervals
),AvailableCoachTimeSegments -- Determine each coach’s availability against each time interval being checked.
AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY TI.CoachID ORDER BY TI.Start, AT.CoachID) AS RankAsc
,ROW_NUMBER() OVER (PARTITION BY TI.CoachID ORDER BY TI.[End] DESC, AT.CoachID DESC) AS RankDesc
,TI.CoachID
,TI.BusyST
,TI.BusyET
,TI.Start
,TI.[End]
,AT.CoachID AS AvailableCoachID
,AT.Start AS AvailableStart
,AT.[End] AS AvailableEnd
,(
CASE
WHEN (MIN(TI.[Start]) OVER (PARTITION BY TI.CoachID)) <> TI.BusyST THEN 1
ELSE 0
END
) AS StartIncomplete
,(
CASE
WHEN (MAX(TI.[End]) OVER (PARTITION BY TI.CoachID)) <> TI.BusyET THEN 1
ELSE 0
END
) AS EndIncomplete
FROM (
SELECT CoachID
,BusyST
,BusyET
,TS AS [Start]
,LEAD(TS, 1, TS) OVER (PARTITION BY CoachID ORDER BY TS) AS [End]
FROM TimeIntervals
) TI
LEFT OUTER JOIN ActualAvailability AT ON
(
( (AT.[Start] <= TI.[Start]) AND (AT.[End] >= TI.[End]) ) OR -- Coach availability covers entire time segment
( (AT.[Start] > TI.[Start]) AND (AT.[End] < TI.[End]) ) OR -- Coach availability is contained within the time segment
( (AT.[Start] < TI.[Start]) AND (AT.[End] > TI.[Start]) AND (AT.[End] < TI.[End]) ) OR -- Coach availability ends within the time segment (partial overlap)
( (AT.[Start] > TI.[Start]) AND (AT.[Start] < TI.[End]) AND (AT.[End] > TI.[End]) ) -- Coach availability starts within the time segment (partial overlap)
)
)
-- Final result
SELECT CoachID
,BusyST
,BusyET
,Start AS CanCoachST
,[End] AS CanCoachET
,COUNT(AvailableCoachID) AS NumOfCoaches
,ISNULL(STUFF((
SELECT TOP 100 PERCENT ', ' + CAST(AvailableCoach.AvailableCoachID AS VARCHAR(MAX))
FROM AvailableCoachTimeSegments AvailableCoach
WHERE (AvailableCoach.CoachID = Results.CoachID AND AvailableCoach.Start = Results.Start AND AvailableCoach.[End] = Results.[End])
ORDER BY AvailableCoach.AvailableCoachID
FOR XML PATH(''),TYPE).value('(./text())[1]','VARCHAR(MAX)')
,1,2,''), '(No one is available)') AS AvailableCoaches
FROM AvailableCoachTimeSegments Results
WHERE [Start] <> [End]
GROUP BY CoachID, BusyST, BusyET, Start, [End], StartIncomplete, EndIncomplete
UNION ALL -- Add any missing time segments at the start of the busy time or end of the busy time.
SELECT CoachID
,BusyST
,BusyET
,(
CASE
WHEN StartIncomplete = 1 THEN BusyST
WHEN EndIncomplete = 1 THEN MAX([End])
ELSE Start
END
) AS CanCoachST
,(
CASE
WHEN StartIncomplete = 1 THEN Start
WHEN EndIncomplete = 1 THEN BusyET
ELSE [End]
END
) AS CanCoachET
,0 AS NumOfCoaches
,'(No one is available)' AS AvailableCoaches
FROM AvailableCoachTimeSegments Results
WHERE [Start] <> [End] AND ( (StartIncomplete = 1 AND RankAsc = 1) OR (EndIncomplete = 1 AND RankDesc = 1) )
GROUP BY CoachID, BusyST, BusyET, Start, [End], StartIncomplete, EndIncomplete
ORDER BY CoachID, CanCoachST
DROP TABLE #Temp1
DROP TABLE #Temp2

This is your expected result that is taking into consideration of the Busy Coaches that overlap the Available Coaches.
| CoachID | CanCoachST | CanCoachET | NumOfCoaches | CanCoach |
|---------|------------------|------------------|--------------|----------|
| 1 | 2016-08-17 09:12 | 2016-08-17 09:24 | 2 | 2, 5 |
| 1 | 2016-08-17 09:24 | 2016-08-17 09:30 | 3 | 2, 3, 5 |
| 1 | 2016-08-17 09:30 | 2016-08-17 09:34 | 1 | 5 |
| 1 | 2016-08-17 09:34 | 2016-08-17 09:55 | 2 | 4, 5 |
| 1 | 2016-08-17 09:55 | 2016-08-17 10:00 | 1 | 4 |
| 1 | 2016-08-17 10:00 | 2016-08-17 10:11 | 2 | 3, 4 |
| 3 | 2016-08-17 09:30 | 2016-08-17 09:34 | 1 | 5 |
| 3 | 2016-08-17 09:34 | 2016-08-17 09:55 | 2 | 4, 5 |
| 3 | 2016-08-17 09:55 | 2016-08-17 10:00 | 1 | 4 |
| 4 | 2016-08-17 12:07 | 2016-08-17 12:27 | 2 | 1, 3 |
| 4 | 2016-08-17 12:27 | 2016-08-17 13:08 | 1 | 3 |
| 4 | 2016-08-17 13:08 | 2016-08-17 13:10 | 0 | NULL |
#Temp1 as Busy Coaches:
| CoachID | BusyST | BusyET |
|---------|------------------|------------------|
| 1 | 2016-08-17 09:12 | 2016-08-17 10:11 |
| 3 | 2016-08-17 09:30 | 2016-08-17 10:00 |
| 4 | 2016-08-17 12:07 | 2016-08-17 13:10 |
#Temp2 as Available Coaches:
| CoachID | AvailableST | AvailableET |
|---------|------------------|------------------|
| 1 | 2016-08-17 09:07 | 2016-08-17 11:09 |
| 1 | 2016-08-17 11:34 | 2016-08-17 12:27 |
| 2 | 2016-08-17 09:11 | 2016-08-17 09:30 |
| 3 | 2016-08-17 09:24 | 2016-08-17 13:08 |
| 4 | 2016-08-17 09:34 | 2016-08-17 13:00 |
| 5 | 2016-08-17 09:10 | 2016-08-17 09:55 |
The script below is a bit long.
;
with
st
(
CoachID,
CanCoachST
)
as
(
select
bound.CoachID,
s.BusyST
from
#Temp1 as s
cross apply
(
select
b.CoachID,
b.BusyST,
b.BusyET
from
#Temp1 as b
where 1 = 1
and s.BusyST between b.BusyST and b.BusyET
)
as bound
union all
select
bound.CoachID,
s.BusyET
from
#Temp1 as s
cross apply
(
select
b.CoachID,
b.BusyST,
b.BusyET
from
#Temp1 as b
where 1 = 1
and s.BusyET between b.BusyST and b.BusyET
and s.CoachID != b.CoachID
)
as bound
union all
select
bound.CoachID,
s.AvailableST
from
#Temp2 as s
cross apply
(
select
b.CoachID,
b.BusyST,
b.BusyET
from
#Temp1 as b
where 1 = 1
and s.AvailableST between b.BusyST and b.BusyET
)
as bound
union all
select
bound.CoachID,
s.AvailableET
from
#Temp2 as s
cross apply
(
select
b.CoachID,
b.BusyST,
b.BusyET
from
#Temp1 as b
where 1 = 1
and s.AvailableET between b.BusyST and b.BusyET
and s.CoachID != b.CoachID
)
as bound
),
d as
(
select distinct
CoachID,
CanCoachST
from
st
),
r as
(
select
row_number() over (order by CoachID, CanCoachST) as RowID,
CoachID,
CanCoachST
from
d
),
rng as
(
select
r1.RowID,
r1.CoachID,
r1.CanCoachST,
case when r1.CoachID = r2.CoachID
then r2.CanCoachST else t.BusyET end as CanCoachET
from
r as r1
left join
r as r2
on
r1.RowID = r2.RowID - 1
left join
#Temp1 as t
on
t.CoachID = r1.CoachID
),
c as
(
select
rng.RowID,
rng.CoachID,
rng.CanCoachST,
rng.CanCoachET,
t2.CoachID as CanCoachID
from
rng
cross join
#Temp1 as t1
cross join
#Temp2 as t2
where 1 = 1
and t2.CoachID != rng.CoachID
and t2.AvailableST <= rng.CanCoachST
and t2.AvailableET >= rng.CanCoachET
),
b as
(
select
rng.RowID,
rng.CoachID,
rng.CanCoachST,
rng.CanCoachET,
t1.CoachID as BusyCoachID
from
rng
cross join
#Temp1 as t1
where 1 = 1
and t1.CoachID != rng.CoachID
and t1.BusyST <= rng.CanCoachST
and t1.BusyET >= rng.CanCoachET
),
e as
(
select
c.RowID,
c.CoachID,
c.CanCoachST,
c.CanCoachET,
c.CanCoachID
from
c
except
select
b.RowID,
b.CoachID,
b.CanCoachST,
b.CanCoachET,
b.BusyCoachID
from
b
),
f as
(
select
rng.RowID,
rng.CoachID,
rng.CanCoachST,
rng.CanCoachET,
e.CanCoachID
from
rng
left join
e
on
e.RowID = rng.RowID
)
select
f.CoachID,
f.CanCoachST,
f.CanCoachET,
count(f.CanCoachID) as NumOfCoaches,
stuff
(
(
select ', ' + cast(f1.CanCoachID as varchar(5))
from f as f1 where f1.RowID = f.RowID
for xml path('')
),
1, 2, ''
)
as CanCoach
from
f
group by
f.RowID,
f.CoachID,
f.CanCoachST,
f.CanCoachET
order by
1, 2

Related

Time difference between rows based on condition

Not really an expert with SQL and im having problems figuring out how to do this one.
Got a table like this one:
ID
Message
TimeStamp
User
1
Hello
2022-08-01 10:00:00
A
1
How are you?
2022-08-01 10:00:05
A
1
Hello there
2022-08-01 10:00:10
B
1
I am okay
2022-08-01 10:00:12
B
1
Good to know
2022-08-01 10:00:15
A
1
Bye
2022-08-01 10:00:25
B
2
Hello
2022-08-01 10:02:50
A
2
Hi
2022-08-01 10:03:50
B
I need to calculate the time difference each time there is a response from the B user after a message from A.
Expected result would be like this
ID
Difference
1
5
1
10
2
60
Trying to use Lead function to obtain the next desired timestamp but im not getting the expected result
Any tips or advice?
Thanks
Even if it is already answered - it's a nice use case for Vertica's MATCH() clause. Looking for a pattern consisting of sender = 'A' followed by sender = 'B'.
You get a pattern id, and then you can group by other stuff plus the pattern id to get max timestamp and min timestamp.
Also note that I renamed both "user" and "timestamp", as they are reserved words...
-- your input, don't use in final query
indata(ID,Message,ts,Usr) AS (
SELECT 1,'Hello' ,TIMESTAMP '2022-08-01 10:00:00','A'
UNION ALL SELECT 1,'How are you?',TIMESTAMP '2022-08-01 10:00:05','A'
UNION ALL SELECT 1,'Hello there' ,TIMESTAMP '2022-08-01 10:00:10','B'
UNION ALL SELECT 1,'I am okay' ,TIMESTAMP '2022-08-01 10:00:12','B'
UNION ALL SELECT 1,'Good to know',TIMESTAMP '2022-08-01 10:00:15','A'
UNION ALL SELECT 1,'Bye' ,TIMESTAMP '2022-08-01 10:00:25','B'
UNION ALL SELECT 2,'Hello' ,TIMESTAMP '2022-08-01 10:02:50','A'
UNION ALL SELECT 2,'Hi' ,TIMESTAMP '2022-08-01 10:03:50','B'
)
-- end of input, real query starts here , replace following comma with "WITH"
,
w_match_clause AS (
SELECT
*
, event_name()
, pattern_id()
, match_id()
FROM indata
MATCH (
PARTITION BY id ORDER BY ts
DEFINE
sentbya AS usr='A'
, sentbyb AS usr='B'
PATTERN
p AS (sentbya sentbyb)
)
-- ctl SELECT * FROM w_match_clause;
-- ctl ID | Message | ts | Usr | event_name | pattern_id | match_id
-- ctl ----+--------------+---------------------+-----+------------+------------+----------
-- ctl 1 | How are you? | 2022-08-01 10:00:05 | A | sentbya | 1 | 1
-- ctl 1 | Hello there | 2022-08-01 10:00:10 | B | sentbyb | 1 | 2
-- ctl 1 | Good to know | 2022-08-01 10:00:15 | A | sentbya | 2 | 1
-- ctl 1 | Bye | 2022-08-01 10:00:25 | B | sentbyb | 2 | 2
-- ctl 2 | Hello | 2022-08-01 10:02:50 | A | sentbya | 1 | 1
-- ctl 2 | Hi | 2022-08-01 10:03:50 | B | sentbyb | 1 | 2
)
SELECT
id
, MAX(ts) - MIN(ts) AS difference
FROM w_match_clause
GROUP BY
id
, pattern_id
ORDER BY
id;
-- out id | difference
-- out ----+------------
-- out 1 | 00:00:05
-- out 1 | 00:00:10
-- out 2 | 00:01
With mySQL 8.0:
WITH cte AS (
SELECT id, user, timestamp, ( LEAD(user) OVER (ORDER BY timestamp) ) AS to_user,
TIME_TO_SEC(TIMEDIFF(LEAD(timestamp) OVER (ORDER BY timestamp), timestamp)) AS time_diff
FROM msg_tab
)
SELECT id, time_diff
FROM cte
WHERE user='A' AND to_user IN ('B', NULL)

Postgres - calculate total working hours based on IN and OUT entry

I have the below tables:
1) My Company table
id | c_name | c_code | status
----+------------+----------+--------
1 | AAAAAAAAAA | AA1234 | Active
2) My User table
id | c_id | u_name | status | emp_id
----+------------+----------+--------+--------
1 | 1 | XXXXXXXX | Active | 1
2 | 1 | YYYYYYYY | Active | 2
3) My Attendance table
id | u_id | swipe_time | status
----+--------+------------------------+--------
1 | 1 | 2020-08-20 16:00:00 | IN
2 | 1 | 2020-08-20 20:00:00 | OUT
3 | 1 | 2020-08-20 21:00:00 | IN
4 | 1 | 2020-08-21 01:00:00 | OUT
5 | 1 | 2020-08-21 16:00:00 | IN
6 | 1 | 2020-08-21 19:00:00 | OUT
I need to calculate the attendance grouped by date, u_id like below:
Note: The query parameters would be the "From Date", "To Date" and "Company ID"
u_id | u_name | date | in_time | out_time | hrs
-----+-----------+-------------+----------------------+----------------------+-----
1 | XXXXXXXX | 2020-08-20 | 2020-08-20 16:00:00 | 2020-08-21 01:00:00 | 7
1 | XXXXXXXX | 2020-08-21 | 2020-08-21 16:00:00 | 2020-08-21 19:00:00 | 4
2 | YYYYYYYY | null | null | null | 0
Is this possible in PostgreSQL?
The tricky part is to expand one row that covers two (calendar) days to two rows and allocating the hours of the "next" day correctly.
The first part is to get a pivot table that combines IN/OUT pairs into a single row.
A simple (yet not very efficient) approach is:
select ain.u_id,
ain.swipe_time as time_in,
(select min(aout.swipe_time)
from attendance aout
where aout.u_id = ain.u_id
and aout.status = 'OUT'
and aout.swipe_time > ain.swipe_time) as time_out
from attendance ain
where ain.status = 'IN'
The next step is to break up the rows with more than one day into two rows.
This assumes that you never have an IN/OUT pair that covers more than two days!
with inout as (
select ain.u_id,
ain.swipe_time as time_in,
(select min(aout.swipe_time)
from attendance aout
where aout.u_id = ain.u_id
and aout.status = 'OUT'
and aout.swipe_time > ain.swipe_time) as time_out
from attendance ain
where ain.status = 'IN'
), expanded as (
select u_id,
time_in::date as "date",
time_in,
time_out
from inout
where time_in::date = time_out::date
union all
select i.u_id,
x.time_in::date as date,
x.time_in,
x.time_out
from inout i
cross join lateral (
select i.u_id,
i.time_in,
i.time_in::date + 1 as time_out
union all
select i.u_id,
i.time_out::date,
i.time_out
) x
where i.time_out::date > i.time_in::date
)
select *
from expanded;
The above returns the following for your sample data:
u_id | date | time_in | time_out
-----+------------+---------------------+--------------------
1 | 2020-08-20 | 2020-08-20 16:00:00 | 2020-08-20 20:00:00
1 | 2020-08-20 | 2020-08-20 21:00:00 | 2020-08-21 00:00:00
1 | 2020-08-21 | 2020-08-21 00:00:00 | 2020-08-21 01:00:00
1 | 2020-08-21 | 2020-08-21 16:00:00 | 2020-08-21 19:00:00
How does this work?
So we first select all those rows that start and end on the same day with this part:
select u_id,
time_in::date as "date",
time_in,
time_out
from inout
where time_in::date = time_out::date
The second part of the union splits up the rows that span two days by using a cross join that generates one row with the original start time and midnight, and another from midnight to the original end time:
select i.u_id,
x.time_in::date as date,
x.time_in,
x.time_out
from inout i
cross join lateral (
-- this generates a row for the first of the two days
select i.u_id,
i.time_in,
i.time_in::date + 1 as time_out
union all
-- this generates the row for the next day
select i.u_id,
i.time_out::date,
i.time_out
) x
where i.time_out::date > i.time_in::date
At the end the new "expanded" rows are then aggregated by grouping them by user and date and left joined to the users table to get the username as well.
with inout as (
select ain.u_id,
ain.swipe_time as time_in,
(select min(aout.swipe_time)
from attendance aout
where aout.u_id = ain.u_id
and aout.status = 'OUT'
and aout.swipe_time > ain.swipe_time) as time_out
from attendance ain
where ain.status = 'IN'
), expanded as (
select u_id,
time_in::date as "date",
time_in,
time_out
from inout
where time_in::date = time_out::date
union all
select i.u_id,
x.time_in::date as date,
x.time_in,
x.time_out
from inout i
cross join lateral (
select i.u_id,
i.time_in,
i.time_in::date + 1 as time_out
union all
select i.u_id,
i.time_out::date,
i.time_out
) x
where i.time_out::date > i.time_in::date
)
select u.id,
u.u_name,
e."date",
min(e.time_in) as time_in,
max(e.time_out) as time_out,
sum(e.time_out - e.time_in) as duration
from users u
left join expanded e on u.id = e.u_id
group by u.id, u.u_name, e."date"
order by u.id, e."date";
Which then results in:
u_id | date | time_in | time_out | duration
-----+------------+---------------------+---------------------+----------------------------------------------
1 | 2020-08-20 | 2020-08-20 16:00:00 | 2020-08-21 00:00:00 | 0 years 0 mons 0 days 7 hours 0 mins 0.0 secs
1 | 2020-08-21 | 2020-08-21 00:00:00 | 2020-08-21 19:00:00 | 0 years 0 mons 0 days 4 hours 0 mins 0.0 secs
The "duration" column is an interval which you can format to your liking.
Online example
Using lead window function makes it somewhat simpler and readable. For balanced IN and OUT attendance events this will work fine, otherwise there will be null values for the attendance hours. This makes sense, because either the person has not left yet or has not attended yet or attendance data is corrupt.
select
u.id u_id, u.u_name,
t.date_in date, t.t_in in_time, t.t_out out_time,
extract('hour' from t.t_out - t.t_in) hrs
from users u
left outer join
(
select u_id,
date_trunc('day', swipe_time) date_in,
swipe_time t_in,
lead(swipe_time, 1) over (partition by u_id order by u_id, swipe_time) t_out,
status
from attendance
) t
on u.id = t.u_id
where t.status = 'IN';

How to get the count of policies for rejection codes

I have a requirement where i need to map the rejection codes and give the count of the Policies of Each type for Each rejection codes. Example mentioned as below.
Policy Table:
PolicyNo | PolType |MyRejectionCode
----------|---------|--------------
Pol1 | S | M101
----------|---------|--------------
Pol2 | S | M101, M102
----------|---------|--------------
Pol3 | S | M104
----------|---------|--------------
Pol4 | F | M105, M107, M108
----------|---------|--------------
Pol5 | F | M106
----------|---------|--------------
Pol6 | F | M107
RejectionMapping Table:
MyRejCode | StandardRejCode
-----------------|----------------
M101 | S101
-----------------|----------------
M102 | S102
-----------------|----------------
M103 | S103
-----------------|----------------
M104 | S104
-----------------|----------------
M105 | S101
-----------------|----------------
M106 | S102
-----------------|----------------
M107 | S103
-----------------|----------------
M108 | S104
Now i want to create a select query to give the result in below format.
StandartRejCode | S_Pol_Type_Count | F_Pol_Type_Count
-----------------|--------------------|----------------------
S101 | 2 | 1
-----------------|--------------------|----------------------
S102 | 1 | 1
-----------------|--------------------|----------------------
S103 | 0 | 2
-----------------|--------------------|----------------------
S104 | 1 | 1
How can I create a query for mentioned output?
Sample Data
DECLARE #Policy AS TABLE (PolicyNo VARCHAR(100) , PolType VARCHAR(100) ,MyRejectionCode VARCHAR(100))
INSERT INTO #Policy
SELECT 'Pol1','S','M101' UNION ALL
SELECT 'Pol2','S','M101,M102' UNION ALL
SELECT 'Pol3','S','M104' UNION ALL
SELECT 'Pol4','F','M105,M107,M108' UNION ALL
SELECT 'Pol5','F','M106' UNION ALL
SELECT 'Pol6','F','M107'
DECLARE #RejectionMapping AS Table ( MyRejCode VARCHAR(100), StandardRejCode VARCHAR(100))
INSERT INTO #RejectionMapping
SELECT 'M101','S101' UNION ALL
SELECT 'M102','S102' UNION ALL
SELECT 'M103','S103' UNION ALL
SELECT 'M104','S104' UNION ALL
SELECT 'M105','S101' UNION ALL
SELECT 'M106','S102' UNION ALL
SELECT 'M107','S103' UNION ALL
SELECT 'M108','S104'
Now We need to split the comma separated data into single row using xml format.
Then need to join both the tables using inner join and finally find the count of occurrences of "PolType"
;WITH Cte
AS
(
SELECT PolicyNo,PolType,Split.a.value ('.','nvarchar(max)') AS MyRejectionCode
FROM
(
SELECT PolicyNo, PolType,
CAST('<S>'+REPLACE(MyRejectionCode,',','</S><S>')+'</S>' AS XML) AS MyRejectionCode
FROM #Policy
)AS A
CROSS APPLY MyRejectionCode.nodes('S') AS Split(a)
)
SELECT R.StandardRejCode,
ISNULL(SUM(CASE WHEN PolType = 'S' THEN 1 ELSE 0 END),0) AS S_Pol_Type_Count,
ISNULL(SUM(CASE WHEN PolType = 'F' THEN 1 ELSE 0 END),0) AS F_Pol_Type_Count
FROM Cte c
INNER JOIN #RejectionMapping R
ON R.MyRejCode = c.MyRejectionCode
GROUP BY R.StandardRejCode
Result
StandardRejCode S_Pol_Type_Count F_Pol_Type_Count
------------------------------------------------------
S101 2 1
S102 1 1
S103 0 2
S104 1 1

Update end date based on its succeeding start date in sql server

I am new to SQL Server, I tried few methods but couldn't able to get succeed to update below nulls with the value of their immediate successive to respective products (start_day-1 day), It is my production scenario, so I cant able to publish original query I tried. So kindly help me to achieve this scenario.
Table_Name - Product
Actual data:
------------------------------------------
Product_cd | Start_date | end_date
------------------------------------------
A | 2017-01-01 | 2017-01-10
A | 2017-01-11 | null
A | 2017-03-10 | 2099-12-31
B | 2015-01-01 | null
B | 2017-01-11 | 2099-12-31
C | 2015-01-01 | 2015-01-10
C | 2015-01-11 | null
C | 2015-03-10 | 2015-03-09
C | 2015-03-10 | 2099-12-31
D | 2000-01-01 | 2000-10-21
D | 2000-10-22 | 2000-11-12
D | 2000-11-13 | null
D | 2015-03-10 | 2099-12-31
Correct data expecting: (After Null in end_date, min(start_date) for same product- 1 day)
------------------------------------------
Product_cd | Start_date | end_date
------------------------------------------
A | 2017-01-01 | 2017-01-10
A | 2017-01-11 | 2017-03-09
A | 2017-03-10 | 2099-12-31
B | 2015-01-01 | 2017-01-10
B | 2017-01-11 | 2099-12-31
C | 2015-01-01 | 2015-01-10
C | 2015-01-11 | 2015-03-09
C | 2015-03-10 | 2015-03-09
C | 2015-03-10 | 2099-12-31
D | 2000-01-01 | 2000-10-21
D | 2000-10-22 | 2000-11-12
D | 2000-11-13 | 2015-03-09
D | 2015-03-10 | 2099-12-31
As etsa says the LEAD window function is what you need to use here (see here). You can only put this in a SELECT though so your update will need to be via something like a CTE. Try something like this...
DROP TABLE IF EXISTS StartEnd
CREATE TABLE StartEnd
( Product_cd char(1),
Startdate date,
end_date date
)
INSERT dbo.StartEnd (Product_cd,Startdate,end_date)
VALUES
('A','2017-01-01','2017-01-10' ),
('A','2017-01-11',null ),
('A','2017-03-10','2099-12-31' ),
('B','2015-01-01',null ),
('B','2017-01-11','2099-12-31' ),
('C','2015-01-01','2015-01-10' ),
('C','2015-01-11',null ),
('C','2015-03-10','2015-03-09' ),
('C','2015-03-10','2099-12-31' ),
('D','2000-01-01','2000-10-21' ),
('D','2000-10-22','2000-11-12' ),
('D','2000-11-13',null ),
('D','2015-03-10','2099-12-31' );
SELECT * FROM dbo.StartEnd AS se;
WITH UpdateRows AS
(
SELECT se.Product_cd,
se.Startdate,
se.end_date,
CASE WHEN se.end_date IS NULL
THEN dateadd(DAY,-1,lead(se.StartDate,1) OVER(PARTITION BY se.Product_cd ORDER BY se.Startdate))
ELSE se.end_date END AS newEndDate
FROM dbo.StartEnd AS se
)
UPDATE UpdateRows
SET end_date = newEndDate
WHERE end_date IS NULL;
SELECT * FROM dbo.StartEnd AS se;
In SQL Server 2012+, you can use lead(). In earlier versions, you need another method. Here is one:
update p
set end_date = dateadd(day, -1, p2.start_date)
from product p outer apply
(select top 1 p2.*
from product p2
where p2.product_cd = p.product_cd and
p2.start_date > p.start_date
order by p2.start_date desc
) p2
where p.end_date is null;
If you just want to retrieve the data, then you can use the same from clause in a select.
Try this.....
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 1)) rownum,* INTO #Temp_table
FROM dbo.StartEnd f1
SELECT t1.Product_cd,t1.Startdate,DATEADD(DAY,-1,t2.Startdate)end_date
FROM #Temp_table t1
LEFT JOIN #Temp_table t2 ON t1.rownum = t2.rownum - 1
To extract the values you want you can use following query. It use windows analytical function LEAD() to find next value for a PRODUCT_CD, using START_DATE ordering). (As Gordon pointed out, in MSSQL 2012+)
SELECT *
FROM (SELECT PRODUCT_CD, START_DATE, END_DATE
, LEAD(START_DATE) OVER (PARTITION BY PRODUCT_CD ORDER BY START_DATE)-1 AS DATE_SUCC
FROM PRODUCT) A
WHERE END_DATE IS NULL AND DATE_SUCC IS NOT NULL;
Try to make the UPDATE by yourself. If you find any problem let me know and we'll see together.
I thought it would be useful for you to try to do the UPDATE, but others don't think so.
Here is the UPDATE, starting from my SELECT (I don't think CTE is necessary). I used it inside a BEGIN TRAN / ROLLBACK TRAN, so you can check it.
BEGIN TRAN
UPDATE A SET END_DATE = A.DATE_SUCC
FROM (SELECT PRODUCT_CD, START_DATE, END_DATE
, LEAD(START_DATE) OVER (PARTITION BY PRODUCT_CD ORDER BY START_DATE)-1 AS DATE_SUCC
FROM PRODUCT) A
WHERE A.END_DATE IS NULL AND A.DATE_SUCC IS NOT NULL
SELECT * FROM PRODUCT
ROLLBACK TRAN
Output sample:
PRODUCT_CD START_DATE END_DATE
A 2017-01-01 00:00:00.000 2017-01-10 00:00:00.000
A 2017-01-11 00:00:00.000 2017-03-09 00:00:00.000
A 2017-03-10 00:00:00.000 2099-12-31 00:00:00.000
B 2015-01-01 00:00:00.000 2017-01-10 00:00:00.000
B 2017-01-11 00:00:00.000 2099-12-31 00:00:00.000
...

A very basic SQL issue I'm stuck with [duplicate]

I have a table of player performance:
CREATE TABLE TopTen (
id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
home INT UNSIGNED NOT NULL,
`datetime`DATETIME NOT NULL,
player VARCHAR(6) NOT NULL,
resource INT NOT NULL
);
What query will return the rows for each distinct home holding its maximum value of datetime? In other words, how can I filter by the maximum datetime (grouped by home) and still include other non-grouped, non-aggregate columns (such as player) in the result?
For this sample data:
INSERT INTO TopTen
(id, home, `datetime`, player, resource)
VALUES
(1, 10, '04/03/2009', 'john', 399),
(2, 11, '04/03/2009', 'juliet', 244),
(5, 12, '04/03/2009', 'borat', 555),
(3, 10, '03/03/2009', 'john', 300),
(4, 11, '03/03/2009', 'juliet', 200),
(6, 12, '03/03/2009', 'borat', 500),
(7, 13, '24/12/2008', 'borat', 600),
(8, 13, '01/01/2009', 'borat', 700)
;
the result should be:
id
home
datetime
player
resource
1
10
04/03/2009
john
399
2
11
04/03/2009
juliet
244
5
12
04/03/2009
borat
555
8
13
01/01/2009
borat
700
I tried a subquery getting the maximum datetime for each home:
-- 1 ..by the MySQL manual:
SELECT DISTINCT
home,
id,
datetime AS dt,
player,
resource
FROM TopTen t1
WHERE `datetime` = (SELECT
MAX(t2.datetime)
FROM TopTen t2
GROUP BY home)
GROUP BY `datetime`
ORDER BY `datetime` DESC
The result-set has 130 rows although database holds 187, indicating the result includes some duplicates of home.
Then I tried joining to a subquery that gets the maximum datetime for each row id:
-- 2 ..join
SELECT
s1.id,
s1.home,
s1.datetime,
s1.player,
s1.resource
FROM TopTen s1
JOIN (SELECT
id,
MAX(`datetime`) AS dt
FROM TopTen
GROUP BY id) AS s2
ON s1.id = s2.id
ORDER BY `datetime`
Nope. Gives all the records.
I tried various exotic queries, each with various results, but nothing that got me any closer to solving this problem.
You are so close! All you need to do is select BOTH the home and its max date time, then join back to the topten table on BOTH fields:
SELECT tt.*
FROM topten tt
INNER JOIN
(SELECT home, MAX(datetime) AS MaxDateTime
FROM topten
GROUP BY home) groupedtt
ON tt.home = groupedtt.home
AND tt.datetime = groupedtt.MaxDateTime
The fastest MySQL solution, without inner queries and without GROUP BY:
SELECT m.* -- get the row that contains the max value
FROM topten m -- "m" from "max"
LEFT JOIN topten b -- "b" from "bigger"
ON m.home = b.home -- match "max" row with "bigger" row by `home`
AND m.datetime < b.datetime -- want "bigger" than "max"
WHERE b.datetime IS NULL -- keep only if there is no bigger than max
Explanation:
Join the table with itself using the home column. The use of LEFT JOIN ensures all the rows from table m appear in the result set. Those that don't have a match in table b will have NULLs for the columns of b.
The other condition on the JOIN asks to match only the rows from b that have bigger value on the datetime column than the row from m.
Using the data posted in the question, the LEFT JOIN will produce this pairs:
+------------------------------------------+--------------------------------+
| the row from `m` | the matching row from `b` |
|------------------------------------------|--------------------------------|
| id home datetime player resource | id home datetime ... |
|----|-----|------------|--------|---------|------|------|------------|-----|
| 1 | 10 | 04/03/2009 | john | 399 | NULL | NULL | NULL | ... | *
| 2 | 11 | 04/03/2009 | juliet | 244 | NULL | NULL | NULL | ... | *
| 5 | 12 | 04/03/2009 | borat | 555 | NULL | NULL | NULL | ... | *
| 3 | 10 | 03/03/2009 | john | 300 | 1 | 10 | 04/03/2009 | ... |
| 4 | 11 | 03/03/2009 | juliet | 200 | 2 | 11 | 04/03/2009 | ... |
| 6 | 12 | 03/03/2009 | borat | 500 | 5 | 12 | 04/03/2009 | ... |
| 7 | 13 | 24/12/2008 | borat | 600 | 8 | 13 | 01/01/2009 | ... |
| 8 | 13 | 01/01/2009 | borat | 700 | NULL | NULL | NULL | ... | *
+------------------------------------------+--------------------------------+
Finally, the WHERE clause keeps only the pairs that have NULLs in the columns of b (they are marked with * in the table above); this means, due to the second condition from the JOIN clause, the row selected from m has the biggest value in column datetime.
Read the SQL Antipatterns: Avoiding the Pitfalls of Database Programming book for other SQL tips.
Here goes T-SQL version:
-- Test data
DECLARE #TestTable TABLE (id INT, home INT, date DATETIME,
player VARCHAR(20), resource INT)
INSERT INTO #TestTable
SELECT 1, 10, '2009-03-04', 'john', 399 UNION
SELECT 2, 11, '2009-03-04', 'juliet', 244 UNION
SELECT 5, 12, '2009-03-04', 'borat', 555 UNION
SELECT 3, 10, '2009-03-03', 'john', 300 UNION
SELECT 4, 11, '2009-03-03', 'juliet', 200 UNION
SELECT 6, 12, '2009-03-03', 'borat', 500 UNION
SELECT 7, 13, '2008-12-24', 'borat', 600 UNION
SELECT 8, 13, '2009-01-01', 'borat', 700
-- Answer
SELECT id, home, date, player, resource
FROM (SELECT id, home, date, player, resource,
RANK() OVER (PARTITION BY home ORDER BY date DESC) N
FROM #TestTable
)M WHERE N = 1
-- and if you really want only home with max date
SELECT T.id, T.home, T.date, T.player, T.resource
FROM #TestTable T
INNER JOIN
( SELECT TI.id, TI.home, TI.date,
RANK() OVER (PARTITION BY TI.home ORDER BY TI.date) N
FROM #TestTable TI
WHERE TI.date IN (SELECT MAX(TM.date) FROM #TestTable TM)
)TJ ON TJ.N = 1 AND T.id = TJ.id
EDIT
Unfortunately, there are no RANK() OVER function in MySQL.
But it can be emulated, see Emulating Analytic (AKA Ranking) Functions with MySQL.
So this is MySQL version:
SELECT id, home, date, player, resource
FROM TestTable AS t1
WHERE
(SELECT COUNT(*)
FROM TestTable AS t2
WHERE t2.home = t1.home AND t2.date > t1.date
) = 0
This will work even if you have two or more rows for each home with equal DATETIME's:
SELECT id, home, datetime, player, resource
FROM (
SELECT (
SELECT id
FROM topten ti
WHERE ti.home = t1.home
ORDER BY
ti.datetime DESC
LIMIT 1
) lid
FROM (
SELECT DISTINCT home
FROM topten
) t1
) ro, topten t2
WHERE t2.id = ro.lid
I think this will give you the desired result:
SELECT home, MAX(datetime)
FROM my_table
GROUP BY home
BUT if you need other columns as well, just make a join with the original table (check Michael La Voie answer)
Best regards.
Since people seem to keep running into this thread (comment date ranges from 1.5 year) isn't this much simpler:
SELECT * FROM (SELECT * FROM topten ORDER BY datetime DESC) tmp GROUP BY home
No aggregation functions needed...
Cheers.
You can also try this one and for large tables query performance will be better. It works when there no more than two records for each home and their dates are different. Better general MySQL query is one from Michael La Voie above.
SELECT t1.id, t1.home, t1.date, t1.player, t1.resource
FROM t_scores_1 t1
INNER JOIN t_scores_1 t2
ON t1.home = t2.home
WHERE t1.date > t2.date
Or in case of Postgres or those dbs that provide analytic functions try
SELECT t.* FROM
(SELECT t1.id, t1.home, t1.date, t1.player, t1.resource
, row_number() over (partition by t1.home order by t1.date desc) rw
FROM topten t1
INNER JOIN topten t2
ON t1.home = t2.home
WHERE t1.date > t2.date
) t
WHERE t.rw = 1
SELECT tt.*
FROM TestTable tt
INNER JOIN
(
SELECT coord, MAX(datetime) AS MaxDateTime
FROM rapsa
GROUP BY
krd
) groupedtt
ON tt.coord = groupedtt.coord
AND tt.datetime = groupedtt.MaxDateTime
This works on Oracle:
with table_max as(
select id
, home
, datetime
, player
, resource
, max(home) over (partition by home) maxhome
from table
)
select id
, home
, datetime
, player
, resource
from table_max
where home = maxhome
Try this for SQL Server:
WITH cte AS (
SELECT home, MAX(year) AS year FROM Table1 GROUP BY home
)
SELECT * FROM Table1 a INNER JOIN cte ON a.home = cte.home AND a.year = cte.year
Here is MySQL version which prints only one entry where there are duplicates MAX(datetime) in a group.
You could test here http://www.sqlfiddle.com/#!2/0a4ae/1
Sample Data
mysql> SELECT * from topten;
+------+------+---------------------+--------+----------+
| id | home | datetime | player | resource |
+------+------+---------------------+--------+----------+
| 1 | 10 | 2009-04-03 00:00:00 | john | 399 |
| 2 | 11 | 2009-04-03 00:00:00 | juliet | 244 |
| 3 | 10 | 2009-03-03 00:00:00 | john | 300 |
| 4 | 11 | 2009-03-03 00:00:00 | juliet | 200 |
| 5 | 12 | 2009-04-03 00:00:00 | borat | 555 |
| 6 | 12 | 2009-03-03 00:00:00 | borat | 500 |
| 7 | 13 | 2008-12-24 00:00:00 | borat | 600 |
| 8 | 13 | 2009-01-01 00:00:00 | borat | 700 |
| 9 | 10 | 2009-04-03 00:00:00 | borat | 700 |
| 10 | 11 | 2009-04-03 00:00:00 | borat | 700 |
| 12 | 12 | 2009-04-03 00:00:00 | borat | 700 |
+------+------+---------------------+--------+----------+
MySQL Version with User variable
SELECT *
FROM (
SELECT ord.*,
IF (#prev_home = ord.home, 0, 1) AS is_first_appear,
#prev_home := ord.home
FROM (
SELECT t1.id, t1.home, t1.player, t1.resource
FROM topten t1
INNER JOIN (
SELECT home, MAX(datetime) AS mx_dt
FROM topten
GROUP BY home
) x ON t1.home = x.home AND t1.datetime = x.mx_dt
ORDER BY home
) ord, (SELECT #prev_home := 0, #seq := 0) init
) y
WHERE is_first_appear = 1;
+------+------+--------+----------+-----------------+------------------------+
| id | home | player | resource | is_first_appear | #prev_home := ord.home |
+------+------+--------+----------+-----------------+------------------------+
| 9 | 10 | borat | 700 | 1 | 10 |
| 10 | 11 | borat | 700 | 1 | 11 |
| 12 | 12 | borat | 700 | 1 | 12 |
| 8 | 13 | borat | 700 | 1 | 13 |
+------+------+--------+----------+-----------------+------------------------+
4 rows in set (0.00 sec)
Accepted Answers' outout
SELECT tt.*
FROM topten tt
INNER JOIN
(
SELECT home, MAX(datetime) AS MaxDateTime
FROM topten
GROUP BY home
) groupedtt ON tt.home = groupedtt.home AND tt.datetime = groupedtt.MaxDateTime
+------+------+---------------------+--------+----------+
| id | home | datetime | player | resource |
+------+------+---------------------+--------+----------+
| 1 | 10 | 2009-04-03 00:00:00 | john | 399 |
| 2 | 11 | 2009-04-03 00:00:00 | juliet | 244 |
| 5 | 12 | 2009-04-03 00:00:00 | borat | 555 |
| 8 | 13 | 2009-01-01 00:00:00 | borat | 700 |
| 9 | 10 | 2009-04-03 00:00:00 | borat | 700 |
| 10 | 11 | 2009-04-03 00:00:00 | borat | 700 |
| 12 | 12 | 2009-04-03 00:00:00 | borat | 700 |
+------+------+---------------------+--------+----------+
7 rows in set (0.00 sec)
SELECT c1, c2, c3, c4, c5 FROM table1 WHERE c3 = (select max(c3) from table)
SELECT * FROM table1 WHERE c3 = (select max(c3) from table1)
Another way to gt the most recent row per group using a sub query which basically calculates a rank for each row per group and then filter out your most recent rows as with rank = 1
select a.*
from topten a
where (
select count(*)
from topten b
where a.home = b.home
and a.`datetime` < b.`datetime`
) +1 = 1
DEMO
Here is the visual demo for rank no for each row for better understanding
By reading some comments what about if there are two rows which have same 'home' and 'datetime' field values?
Above query will fail and will return more than 1 rows for above situation. To cover up this situation there will be a need of another criteria/parameter/column to decide which row should be taken which falls in above situation. By viewing sample data set i assume there is a primary key column id which should be set to auto increment. So we can use this column to pick the most recent row by tweaking same query with the help of CASE statement like
select a.*
from topten a
where (
select count(*)
from topten b
where a.home = b.home
and case
when a.`datetime` = b.`datetime`
then a.id < b.id
else a.`datetime` < b.`datetime`
end
) + 1 = 1
DEMO
Above query will pick the row with highest id among the same datetime values
visual demo for rank no for each row
Why not using:
SELECT home, MAX(datetime) AS MaxDateTime,player,resource FROM topten GROUP BY home
Did I miss something?
In MySQL 8.0 this can be achieved efficiently by using row_number() window function with common table expression.
(Here row_number() basically generating unique sequence for each row for every player starting with 1 in descending order of resource. So, for every player row with sequence number 1 will be with highest resource value. Now all we need to do is selecting row with sequence number 1 for each player. It can be done by writing an outer query around this query. But we used common table expression instead since it's more readable.)
Schema:
create TABLE TestTable(id INT, home INT, date DATETIME,
player VARCHAR(20), resource INT);
INSERT INTO TestTable
SELECT 1, 10, '2009-03-04', 'john', 399 UNION
SELECT 2, 11, '2009-03-04', 'juliet', 244 UNION
SELECT 5, 12, '2009-03-04', 'borat', 555 UNION
SELECT 3, 10, '2009-03-03', 'john', 300 UNION
SELECT 4, 11, '2009-03-03', 'juliet', 200 UNION
SELECT 6, 12, '2009-03-03', 'borat', 500 UNION
SELECT 7, 13, '2008-12-24', 'borat', 600 UNION
SELECT 8, 13, '2009-01-01', 'borat', 700
Query:
with cte as
(
select id, home, date , player, resource,
Row_Number()Over(Partition by home order by date desc) rownumber from TestTable
)
select id, home, date , player, resource from cte where rownumber=1
Output:
id
home
date
player
resource
1
10
2009-03-04 00:00:00
john
399
2
11
2009-03-04 00:00:00
juliet
244
5
12
2009-03-04 00:00:00
borat
555
8
13
2009-01-01 00:00:00
borat
700
db<>fiddle here
This works in SQLServer, and is the only solution I've seen that doesn't require subqueries or CTEs - I think this is the most elegant way to solve this kind of problem.
SELECT TOP 1 WITH TIES *
FROM TopTen
ORDER BY ROW_NUMBER() OVER (PARTITION BY home
ORDER BY [datetime] DESC)
In the ORDER BY clause, it uses a window function to generate & sort by a ROW_NUMBER - assigning a 1 value to the highest [datetime] for each [home].
SELECT TOP 1 WITH TIES will then select one record with the lowest ROW_NUMBER (which will be 1), as well as all records with a tying ROW_NUMBER (also 1)
As a consequence, you retrieve all data for each of the 1st ranked records - that is, all data for records with the highest [datetime] value with their given [home] value.
Try this
select * from mytable a join
(select home, max(datetime) datetime
from mytable
group by home) b
on a.home = b.home and a.datetime = b.datetime
Regards
K
#Michae The accepted answer will working fine in most of the cases but it fail for one for as below.
In case if there were 2 rows having HomeID and Datetime same the query will return both rows, not distinct HomeID as required, for that add Distinct in query as below.
SELECT DISTINCT tt.home , tt.MaxDateTime
FROM topten tt
INNER JOIN
(SELECT home, MAX(datetime) AS MaxDateTime
FROM topten
GROUP BY home) groupedtt
ON tt.home = groupedtt.home
AND tt.datetime = groupedtt.MaxDateTime
this is the query you need:
SELECT b.id, a.home,b.[datetime],b.player,a.resource FROM
(SELECT home,MAX(resource) AS resource FROM tbl_1 GROUP BY home) AS a
LEFT JOIN
(SELECT id,home,[datetime],player,resource FROM tbl_1) AS b
ON a.resource = b.resource WHERE a.home =b.home;
Hope below query will give the desired output:
Select id, home,datetime,player,resource, row_number() over (Partition by home ORDER by datetime desc) as rownum from tablename where rownum=1
(NOTE: The answer of Michael is perfect for a situation where the target column datetime cannot have duplicate values for each distinct home.)
If your table has duplicate rows for homexdatetime and you need to only select one row for each distinct home column, here is my solution to it:
Your table needs one unique column (like id). If it doesn't, create a view and add a random column to it.
Use this query to select a single row for each unique home value. Selects the lowest id in case of duplicate datetime.
SELECT tt.*
FROM topten tt
INNER JOIN
(
SELECT min(id) as min_id, home from topten tt2
INNER JOIN
(
SELECT home, MAX(datetime) AS MaxDateTime
FROM topten
GROUP BY home) groupedtt2
ON tt2.home = groupedtt2.home
) as groupedtt
ON tt.id = groupedtt.id
Accepted answer doesn't work for me if there are 2 records with same date and home. It will return 2 records after join. While I need to select any (random) of them. This query is used as joined subquery so just limit 1 is not possible there.
Here is how I reached desired result. Don't know about performance however.
select SUBSTRING_INDEX(GROUP_CONCAT(id order by datetime desc separator ','),',',1) as id, home, MAX(datetime) as 'datetime'
from topten
group by (home)