I want to find the date intervals at which the employee comes on a regular basis - sql

Imagine a employee who works in a company whos having a contract to work on a specific task, he comes in and goes on start and end date respectively. I want to get the interval at which the employee comes to office without any absence.
Example Data:
DECLARE #TimeClock TABLE (PunchID INT IDENTITY, EmployeeID INT, PunchinDate DATE)
INSERT INTO #TimeClock (EmployeeID, PunchInDate) VALUES
(1, '2020-01-01'), (1, '2020-01-02'), (1, '2020-01-03'), (1, '2020-01-04'),
(1, '2020-01-05'), (1, '2020-01-06'), (1, '2020-01-07'), (1, '2020-01-08'),
(1, '2020-01-09'), (1, '2020-01-10'), (1, '2020-01-11'), (1, '2020-01-12'),
(1, '2020-01-13'), (1, '2020-01-14'), (1, '2020-01-16'),
(1, '2020-01-17'), (1, '2020-01-18'), (1, '2020-01-19'), (1, '2020-01-20'),
(1, '2020-01-21'), (1, '2020-01-22'), (1, '2020-01-23'), (1, '2020-01-24'),
(1, '2020-01-25'), (1, '2020-01-26'), (1, '2020-01-27'), (1, '2020-01-28'),
(1, '2020-01-29'), (1, '2020-01-30'), (1, '2020-01-31'),
(1, '2020-02-01'), (1, '2020-02-02'), (1, '2020-02-03'), (1, '2020-02-04'),
(1, '2020-02-05'), (1, '2020-02-06'), (1, '2020-02-07'), (1, '2020-02-08'),
(1, '2020-02-09'), (1, '2020-02-10'), (1, '2020-02-12'),
(1, '2020-02-13'), (1, '2020-02-14'), (1, '2020-02-15'), (1, '2020-02-16');
--the output shall look like this '2020-01-01 to 2020-02-10' as this is the interval at which the employee comes without any leave
SELECT 1 AS ID, FORMAT( getdate(), '2020-01-01') as START_DATE, FORMAT( getdate(), '2020-01-10') as END_DATE union all
SELECT 1 AS ID, FORMAT( getdate(), '2020-01-11') as START_DATE, FORMAT( getdate(), '2020-01-15') as END_DATE union all
SELECT 1 AS ID, FORMAT( getdate(), '2020-01-21') as START_DATE, FORMAT( getdate(), '2020-01-31') as END_DATE union all
SELECT 1 AS ID, FORMAT( getdate(), '2020-02-01') as START_DATE, FORMAT( getdate(), '2020-02-10') as END_DATE
--the output shall look like this '2020-01-01 to 2020-01-15' and '2020 01-21 to 2020-02-10'as these are the intervals at which the employee comes without any leave

Using the example data provided we can query the table like this:
;WITH iterate AS (
SELECT *, DATEADD(DAY,1,PunchinDate) AS NextDate
FROM #TimeClock
), base AS (
SELECT *
FROM (
SELECT *, CASE WHEN DATEADD(DAY,-1,PunchInDate) = LAG(PunchinDate,1) OVER (PARTITION BY EmployeeID ORDER BY PunchinDate) THEN PunchInDate END AS s
FROM iterate
) a
WHERE s IS NULL
), rCTE AS (
SELECT EmployeeID, PunchInDate AS StartDate, PunchInDate AS EndDate, NextDate
FROM base
UNION ALL
SELECT a.EmployeeID, a.StartDate, r.PunchInDate, r.NextDate
FROM rCTE a
INNER JOIN iterate r
ON a.NextDate = r.PunchinDate
AND a.EmployeeID = r.EmployeeID
)
SELECT EmployeeID, StartDate, MAX(EndDate) AS EndDate, DATEDIFF(DAY,StartDate,MAX(EndDate)) AS Streak
FROM rCTE
GROUP BY rCTE.EmployeeID, rCTE.StartDate
This is known as a recursive common table expression, and allows us to compare values between related rows. In this case we're looking for rows where they follow a streak, and we want o re-start that streak anytime we encounter a break. We're using a windowed function called LAG to look back a row to the previous value, and compare it to the one we have now. If it's not yesterday, then we start a new streak.
EmployeeID StartDate EndDate Streak
------------------------------------------
1 2020-01-01 2020-01-15 14
1 2020-01-17 2020-02-10 24
1 2020-02-12 2020-02-16 4

Related

How to pivot records to multi-hot encoding in BigQuery?

I have a records table:
CREATE TABLE `records`
(
`ID1` INT64,
`ID2` DATE,
`EVENT` INT64,
);
This table has some records like:
INSERT `records` (ID1, ID2, EVENT)
VALUES (1, PARSE_DATE('%Y-%m-%d', '2022-01-10'), 111),
(1, PARSE_DATE('%Y-%m-%d', '2022-01-10'), 111),
(1, PARSE_DATE('%Y-%m-%d', '2022-01-10'), 222),
(1, PARSE_DATE('%Y-%m-%d', '2022-03-15'), 333),
(1, PARSE_DATE('%Y-%m-%d', '2022-03-15'), 111),
(1, PARSE_DATE('%Y-%m-%d', '2022-07-22'), 444),
(2, PARSE_DATE('%Y-%m-%d', '2021-02-17'), 999),
(2, PARSE_DATE('%Y-%m-%d', '2021-02-17'), 888),
(2, PARSE_DATE('%Y-%m-%d', '2021-02-17'), 888),
(2, PARSE_DATE('%Y-%m-%d', '2021-02-17'), 555),
(2, PARSE_DATE('%Y-%m-%d', '2021-02-17'), 666),
(2, PARSE_DATE('%Y-%m-%d', '2022-07-22'), 111)
And I have a separate dictionary (or vocabulary) table:
CREATE TABLE `t_vocabs`
(
`EVENT` INT64,
`DESCRIPTION` STRING,
);
Table t_vocabs contains all unique values for EVENT:
INSERT `t_vocabs` (EVENT, DESCRIPTION)
VALUES (111, 'desc1'),
(222, 'desc2'),
(333, 'desc3'),
(444, 'desc4'),
(555, 'desc5'),
(666, 'desc6'),
(777, 'desc7'),
(888, 'desc8'),
(999, 'desc9'),
(123, 'desc10'),
(456, 'desc111')
Therefore, records would look like this:
And the vocabulary table would look like this:
I need to convert records to multi-hot encoding in which for each ID1 and ID2 I have one record, with the number of columns equal to the unique number of events in t_vocabs and the values equal to 1 when that relevant event exists in records and 0 otherwise.
For this example, it'd be like:
Note, I have tried
SELECT
*
FROM
`records`
PIVOT
(COUNT(*) FOR EVENT IN (111, 222, 333, 444, 555, 666, 777, 888, 999, 123, 456))
And this does the job, but the issue is that the t_vocabs is a big table and I can't list all EVENT values manually in this query. When I try to select events from another table (e.g. following query) it doesn't like the select:
SELECT *
FROM `records`
PIVOT(COUNT(*) FOR EVENT IN (SELECT EVENT FROM `t_vocabs`))
Use below
execute immediate (select '''
select * from records
pivot (count(*) for event in (''' || string_agg('' || event, ',' order by event) || "))"
from t_vocabs
)
if applied to sample data in your question - output is

Creating a table of sequences (integers and dates)

Sometimes I need to create tables of regularly occuring sequences of dates or integers. I manually create them and that works but it's more difficult to maintain because there's a lot of code duplication. What is the more maintainable way to say a sequence of integers or dates that occur at regular intervals?
Here's my current approach:
DECLARE #IndexDate TABLE (
[Id] INT,
[Date] DATE
)
INSERT INTO #IndexDate (
Id, Date
)
VALUES
(1, CONCAT(YEAR(GETDATE()), '-01-01')),
(2, CONCAT(YEAR(GETDATE()), '-02-01')),
(3, CONCAT(YEAR(GETDATE()), '-03-01')),
(4, CONCAT(YEAR(GETDATE()), '-04-01')),
(5, CONCAT(YEAR(GETDATE()), '-05-01')),
(6, CONCAT(YEAR(GETDATE()), '-06-01')),
(7, CONCAT(YEAR(GETDATE()), '-07-01')),
(8, CONCAT(YEAR(GETDATE()), '-08-01')),
(9, CONCAT(YEAR(GETDATE()), '-09-01')),
(10, CONCAT(YEAR(GETDATE()), '-10-01')),
(11, CONCAT(YEAR(GETDATE()), '-11-01')),
(12, CONCAT(YEAR(GETDATE()), '-12-01'))
SELECT * FROM #IndexDate
You could use recursive cte:
WITH cte(n) AS (
SELECT 1
UNION ALL
SELECT n+ 1
FROM cte
WHERE n < 12
)
SELECT *
FROM cte
OPTION (maxrecursion 0);
WITH cte(d) AS (
SELECT CAST('20190101' AS DATE)
UNION ALL
SELECT DATEADD(m, 1, d)
FROM cte
WHERE d < '20200101'
)
SELECT *
FROM cte
OPTION (maxrecursion 0);
db<>fiddle demo
To match your logic with a recursive CTE, you can do:
with indextable as (
select 1 as id, datefromparts(year(getdate()), 1, 1) as date
union all
select 1 + id, dateadd(month, 1, date)
from indextable
where id < 12
)
select *
from indextable;
For one year, you don't have to worry about option (maxrecursion).
I'm not a big fan of using date as a column name, because it is a keyword, but SQL Server allows it.

How Result using cross apply, can achieve using any other joins like inner,left,right join in sql server

This is my SQL script with sample data
CREATE TABLE [dbo].[Employee]
(
[ID] [INT] IDENTITY(1,1) NOT NULL,
[Name] [VARCHAR](100) NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[LoginEntry]
(
[ID] [INT] IDENTITY(1,1) NOT NULL,
[LoginTime] [DATETIME] NULL,
[EmpID] [INT] NULL,
[GateNumber] [VARCHAR](50) NULL
) ON [PRIMARY]
GO
ALTER TABLE Employee
ADD CONSTRAINT Pk_Employee PRIMARY KEY (Id)
GO
ALTER TABLE LoginEntry
ADD CONSTRAINT Fk_LoginEntry_Employee
FOREIGN KEY (EmpId) REFERENCES Employee(Id)
GO
SET IDENTITY_INSERT [dbo].[Employee] ON
GO
INSERT [dbo].[Employee] ([ID], [Name])
VALUES (1, N'Employee 1'), (2, N'Employee 2'), (3, N'Employee 3'),
(4, N'Employee 4'), (5, N'Employee 5'), (6, N'Employee 6')
GO
SET IDENTITY_INSERT [dbo].[Employee] OFF
GO
SET IDENTITY_INSERT [dbo].[LoginEntry] ON
GO
INSERT [dbo].[LoginEntry] ([ID], [LoginTime], [EmpID], [GateNumber])
VALUES (1, CAST(N'2014-10-24 08:00:00.000' AS DateTime), 1, N'Gate 1'),
(2, CAST(N'2014-10-24 09:00:00.000' AS DateTime), 1, N'Gate 1'),
(3, CAST(N'2014-10-24 10:00:00.000' AS DateTime), 1, N'Gate 2'),
(4, CAST(N'2014-10-24 08:00:00.000' AS DateTime), 2, N'Gate 1'),
(5, CAST(N'2014-10-24 09:00:00.000' AS DateTime), 2, N'Gate 1'),
(6, CAST(N'2014-10-24 10:00:00.000' AS DateTime), 2, N'Gate 2'),
(7, CAST(N'2014-10-24 08:00:00.000' AS DateTime), 3, N'Gate 1'),
(8, CAST(N'2014-10-24 09:00:00.000' AS DateTime), 3, N'Gate 1'),
(9, CAST(N'2014-10-24 10:00:00.000' AS DateTime), 3, N'Gate 2'),
(10, CAST(N'2014-10-24 08:00:00.000' AS DateTime), 4, N'Gate 1'),
(11, CAST(N'2014-10-24 09:00:00.000' AS DateTime), 4, N'Gate 1'),
(19, CAST(N'2014-10-24 08:00:00.000' AS DateTime), 5, N'Gate 1'),
(20, CAST(N'2014-10-24 09:00:00.000' AS DateTime), 5, N'Gate 1'),
(21, CAST(N'2014-10-24 10:00:00.000' AS DateTime), 5, N'Gate 2'),
(22, CAST(N'2014-10-24 08:00:00.000' AS DateTime), 6, N'Gate 1'),
(23, CAST(N'2014-10-24 09:00:00.000' AS DateTime), 6, N'Gate 1'),
(24, CAST(N'2014-10-24 10:00:00.000' AS DateTime), 6, N'Gate 2')
SET IDENTITY_INSERT [dbo].[LoginEntry] OFF
GO
SELECT
e.ID, dt.EmpId, Name, LoginTime
FROM
Employee e
CROSS APPLY
(SELECT TOP 1
l.ID, l.LoginTime, l.EmpId
FROM
LoginEntry l
WHERE
l.EmpId = e.id) dt
GO
The result I get:
ID EmpId Name LoginTime
-----------------------------------------------
1 1 Employee 1 2014-10-24 08:00:00.000
2 2 Employee 2 2014-10-24 08:00:00.000
3 3 Employee 3 2014-10-24 08:00:00.000
4 4 Employee 4 2014-10-24 08:00:00.000
5 5 Employee 5 2014-10-24 08:00:00.000
6 6 Employee 6 2014-10-24 08:00:00.000
I am Expecting the same result using Joins(inner,right,left,full) in sql server i tried my luck but couldn't, pls can any one help me out thanks in advance
First, your query is incomplete. When you use TOP 1 without an ORDER BY you never have a guarantee which one it will pick. New data, concurrent processes, re-indexing, software patches, the time of day, they all can cause the result to change.
So, it should be something like...
SELECT
e.ID,dt.EmpId,Name,LoginTime
FROM
Employee e
CROSS APPLY
(
SELECT TOP 1
l.ID
,l.LoginTime
,l.EmpId
FROM
LoginEntry l
WHERE
l.EmpId=e.id
ORDER BY
l.LoginTime DESC -- Will cause TOP 1 to pick the most recent value (per employee)
)
dt
As for doing it with joins, doing the TOP 1 (or greatest-n-per-group, for which your n is 1), is longer, messier, and slower. So I won't go in to that.
But you can use ROW_NUMBER() to do the TOP 1 part, and then use a JOIN to associate that result with your main table...
WITH
ordered_logins AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY EmpID ORDER BY LoginTime DESC, ID DESC) AS row_ordinal
FROM
LoginEntry
)
SELECT
e.ID, l.EmpId, e.Name, l.LoginTime
FROM
Employee e
LEFT JOIN
ordered_logins l
ON l.EmpID = e.ID
AND l.row_oridnal = 1
The ROW_NUMBER() assigns every row a value from 1 upwards (per EmpID - the partition clause). It's order by loginTime descending, so the latest logins are first, and just in case two logins have the exact same time, it's secondarily ordered by ID desc.
Then the LEFT JOIN only picks the rows numbered 1 (the latest logins) and if there are no logins gives NULLs instead (so that the employee record is Not discarded due to a lack of join).
Note: the LEFT JOIN equivalent for APPLY is to use OUTER APPLY instead of CROSS APPLY.
Just for the sake of showing one more way to solve the problem, the "longer, messier, and slower" JOIN method that (I thought) MatBailie didn't show would look like this:
SELECT TOP (1) WITH TIES
e.ID
,l.EmpID
,e.Name
,l.LoginTime
FROM
dbo.Employee AS e
JOIN
dbo.LoginEntry AS l
ON
l.EmpID = e.ID
ORDER BY
ROW_NUMBER() OVER (PARTITION BY e.ID ORDER BY l.LoginTime DESC, ID DESC)
The ROW_NUMBER in the ORDER BY clause takes all of the logins for each Employee ID and numbers them by LoginTime with the most recent first (and using the LoginEntry ID as tie breaker, and thanks for that touch, Mat).
Then, the SELECT TOP (1) WITH TIES does its thing. The WITH TIES bit means that it selects the number one result from each PARTITION BY group in the ORDER BY clause.
you can use window function row_number()
with cte as(
SELECT e.ID,l.EmpId,Name,l.LoginTime, row_number() over(partition by e.ID order by l.LoginTime) as rn
FROM Employee e join LoginEntry l on l.EmpId=e.id
) select ID,EmpId,Name,LoginTime from cte where rn=1
demo link

Summing up the records as per given conditions

I have a table like below, What I need that for any particular fund and up to any particular date logic will sum the amount value. Let say I need the sum for 3 dates as 01/28/2015,03/30/2015 and 04/01/2015. Then logic will check for up to first date how many records are there in table . If it found more than one record then it'll sum the amount value. Then for next date it'll sum up to the next date but from the previous date it had summed up.
Id Fund Date Amount
1 A 01/20/2015 250
2 A 02/28/2015 300
3 A 03/20/2015 400
4 A 03/30/2015 200
5 B 04/01/2015 500
6 B 04/01/2015 600
I want result to be like below
Id Fund Date SumOfAmount
1 A 02/28/2015 550
2 A 03/30/2015 600
3 B 04/01/2015 1100
Based on your question, it seems that you want to select a set of dates, and then for each fund and selected date, get the sum of the fund amounts from the selected date to the previous selected date. Here is the result set I think you should be expecting:
Fund Date SumOfAmount
A 2015-02-28 550.00
A 2015-03-30 600.00
B 2015-04-01 1100.00
Here is the code to produce this output:
DECLARE #Dates TABLE
(
SelectedDate DATE PRIMARY KEY
)
INSERT INTO #Dates
VALUES
('02/28/2015')
,('03/30/2015')
,('04/01/2015')
DECLARE #FundAmounts TABLE
(
Id INT PRIMARY KEY
,Fund VARCHAR(5)
,Date DATE
,Amount MONEY
);
INSERT INTO #FundAmounts
VALUES
(1, 'A', '01/20/2015', 250)
,(2, 'A', '02/28/2015', 300)
,(3, 'A', '03/20/2015', 400)
,(4, 'A', '03/30/2015', 200)
,(5, 'B', '04/01/2015', 500)
,(6, 'B', '04/01/2015', 600);
SELECT
F.Fund
,D.SelectedDate AS Date
,SUM(F.Amount) AS SumOfAmount
FROM
(
SELECT
SelectedDate
,LAG(SelectedDate,1,'1/1/1900') OVER (ORDER BY SelectedDate ASC) AS PreviousDate
FROM #Dates
) D
JOIN
#FundAmounts F
ON
F.Date BETWEEN DATEADD(DAY,1,D.PreviousDate) AND D.SelectedDate
GROUP BY
D.SelectedDate
,F.Fund
EDIT: Here is alternative to the LAG function for this example:
FROM
(
SELECT
SelectedDate
,ISNULL((SELECT TOP 1 SelectedDate FROM #Dates WHERE SelectedDate < Dates.SelectedDate ORDER BY SelectedDate DESC),'1/1/1900') AS PreviousDate
FROM #Dates Dates
) D
If i change your incorrect sample data to ...
CREATE TABLE TableName
([Id] int, [Fund] varchar(1), [Date] datetime, [Amount] int)
;
INSERT INTO TableName
([Id], [Fund], [Date], [Amount])
VALUES
(1, 'A', '2015-01-28 00:00:00', 250),
(2, 'A', '2015-01-28 00:00:00', 300),
(3, 'A', '2015-03-30 00:00:00', 400),
(4, 'A', '2015-03-30 00:00:00', 200),
(5, 'B', '2015-04-01 00:00:00', 500),
(6, 'B', '2015-04-01 00:00:00', 600)
;
this query using GROUP BY works:
SELECT MIN(Id) AS Id,
MIN(Fund) AS Fund,
[Date],
SUM(Amount) AS SumOfAmount
FROM dbo.TableName t
WHERE [Date] IN ('01/28/2015','03/30/2015','04/01/2015')
GROUP BY [Date]
Demo
Initially i have used Row_number and month function to pick max date of every month and in 2nd cte i did sum of amounts and joined them..may be this result set matches your out put
declare #t table (Id int,Fund Varchar(1),Dated date,amount int)
insert into #t (id,Fund,dated,amount) values (1,'A','01/20/2015',250),
(2,'A','01/28/2015',300),
(3,'A','03/20/2015',400),
(4,'A','03/30/2015',200),
(5,'B','04/01/2015',600),
(6,'B','04/01/2015',500)
;with cte as (
select ID,Fund,Amount,Dated,ROW_NUMBER() OVER
(PARTITION BY DATEDIFF(MONTH, '20000101', dated)ORDER BY dated desc)AS RN from #t
group by ID,Fund,DATED,Amount
),
CTE2 AS
(select SUM(amount)Amt from #t
GROUP BY MONTH(dated))
,CTE3 AS
(Select Amt,ROW_NUMBER()OVER (ORDER BY amt)R from cte2)
,CTE4 AS
(
Select DISTINCT C.ID As ID,
C.Fund As Fund,
C.Dated As Dated
,ROW_NUMBER()OVER (PARTITION BY RN ORDER BY (SELECT NULL))R
from cte C INNER JOIN CTE3 CC ON c.RN = CC.R
Where C.RN = 1
GROUP BY C.ID,C.Fund,C.RN,C.Dated )
select C.R,C.Fund,C.Dated,cc.Amt from CTE4 C INNER JOIN CTE3 CC
ON c.R = cc.R
declare #TableName table([Id] int, [Fund] varchar(1), [Date] datetime, [Amount] int)
declare #Sample table([SampleDate] datetime)
INSERT INTO #TableName
([Id], [Fund], [Date], [Amount])
VALUES
(1, 'A', '20150120 00:00:00', 250),
(2, 'A', '20150128 00:00:00', 300),
(3, 'A', '20150320 00:00:00', 400),
(4, 'A', '20150330 00:00:00', 200),
(5, 'B', '20150401 00:00:00', 500),
(6, 'B', '20150401 00:00:00', 600)
INSERT INTO #Sample ([SampleDate])
values ('20150128 00:00:00'), ('20150330 00:00:00'), ('20150401 00:00:00')
-- select * from #TableName
-- select * from #Sample
;WITH groups AS (
SELECT [Fund], [Date], [AMOUNT], MIN([SampleDate]) [SampleDate] FROM #TableName
JOIN #Sample ON [Date] <= [SampleDate]
GROUP BY [Fund], [Date], [AMOUNT])
SELECT [Fund], [SampleDate], SUM([AMOUNT]) FROM groups
GROUP BY [Fund], [SampleDate]
Explanation:
The CTE groups finds the earliest SampleDate which is later than (or equals to) your
data's date and enriches your data accordingly, thus giving them the group to be summed up in.
After that, you can group on the derived date.

How to group rows by their DATEDIFF?

I hope you can help me.
I need to display the records in HH_Solution_Audit table -- if 2 or more staffs enter the room within 10 minutes. Here are the requirements:
Display only the events that have a timestamp (LAST_UPDATED) interval of less than or equal to 10 minutes. Therefore, I must compare the current row to the next row and previous row to check if their DATEDIFF is less than or equal to 10 minutes. I’m done with this part.
Show only the records if the number of distinct STAFF_GUID inside the room for less than or equal to 10 minutes is at least 2.
HH_Solution_Audit Table Details:
ID - PK
STAFF_GUID - staff id
LAST_UPDATED - datetime when a staff enters a room
Here's what I got so far. This satisfies requirement # 1 only.
CREATE TABLE HH_Solution_Audit (
ID INT PRIMARY KEY,
STAFF_GUID NVARCHAR(1),
LAST_UPDATED DATETIME
)
GO
INSERT INTO HH_Solution_Audit VALUES (1, 'b', '2013-04-25 9:01')
INSERT INTO HH_Solution_Audit VALUES (2, 'b', '2013-04-25 9:04')
INSERT INTO HH_Solution_Audit VALUES (3, 'b', '2013-04-25 9:13')
INSERT INTO HH_Solution_Audit VALUES (4, 'a', '2013-04-25 10:15')
INSERT INTO HH_Solution_Audit VALUES (5, 'a', '2013-04-25 10:30')
INSERT INTO HH_Solution_Audit VALUES (6, 'a', '2013-04-25 10:33')
INSERT INTO HH_Solution_Audit VALUES (7, 'a', '2013-04-25 10:41')
INSERT INTO HH_Solution_Audit VALUES (8, 'a', '2013-04-25 11:02')
INSERT INTO HH_Solution_Audit VALUES (9, 'a', '2013-04-25 11:30')
INSERT INTO HH_Solution_Audit VALUES (10, 'a', '2013-04-25 11:45')
INSERT INTO HH_Solution_Audit VALUES (11, 'a', '2013-04-25 11:46')
INSERT INTO HH_Solution_Audit VALUES (12, 'a', '2013-04-25 11:51')
INSERT INTO HH_Solution_Audit VALUES (13, 'a', '2013-04-25 12:24')
INSERT INTO HH_Solution_Audit VALUES (14, 'b', '2013-04-25 12:27')
INSERT INTO HH_Solution_Audit VALUES (15, 'b', '2013-04-25 13:35')
DECLARE #numOfPeople INT = 2,
--minimum number of people that must be inside
--the room for #lengthOfStay minutes
#lengthOfStay INT = 10,
--number of minutes of stay
#dateFrom DATETIME = '04/25/2013 00:00',
#dateTo DATETIME = '04/25/2013 23:59';
WITH cteSource AS
(
SELECT ID, STAFF_GUID, LAST_UPDATED,
ROW_NUMBER() OVER (ORDER BY LAST_UPDATED) AS row_num
FROM HH_SOLUTION_AUDIT
WHERE LAST_UPDATED >= #dateFrom AND LAST_UPDATED <= #dateTo
)
SELECT [current].ID, [current].STAFF_GUID, [current].LAST_UPDATED
FROM
cteSource AS [current]
LEFT OUTER JOIN
cteSource AS [previous] ON [current].row_num = [previous].row_num + 1
LEFT OUTER JOIN
cteSource AS [next] ON [current].row_num = [next].row_num - 1
WHERE
DATEDIFF(MINUTE, [previous].LAST_UPDATED, [current].LAST_UPDATED)
<= #lengthOfStay
OR
DATEDIFF(MINUTE, [current].LAST_UPDATED, [next].LAST_UPDATED)
<= #lengthOfStay
ORDER BY [current].ID, [current].LAST_UPDATED
Running the query returns IDs:
1, 2, 3, 5, 6, 7, 10, 11, 12, 13, 14
That satisfies requirement # 1 of having less than or equal to 10 minutes interval between the previous row, current row and next row.
Can you help me with the 2nd requirement? If it's applied, the returned IDs should only be:
13, 14
Here's an idea. You don't need ROW_NUMBER and previous and next records. You just need to queries unioned - one looking for everyone that have someone checked X minutes behind, and another looking for X minutes upfront. Each uses a correlated sub-query and COUNT(*) to find number of matching people. If number is greater then your #numOfPeople - that's it.
EDIT: new version: Instead of doing two queries with 10 minutes upfront and behind, we'll only check for 10 minutes behind - selecting those that match in cteLastOnes. After that will go in another part of query to search for those that actually exist within those 10 minutes. Ultimately again making union of them and the 'last ones'
WITH cteSource AS
(
SELECT ID, STAFF_GUID, LAST_UPDATED
FROM HH_SOLUTION_AUDIT
WHERE LAST_UPDATED >= #dateFrom AND LAST_UPDATED <= #dateTo
)
,cteLastOnes AS
(
SELECT * FROM cteSource c1
WHERE #numOfPeople -1 <= (SELECT COUNT(DISTINCT STAFF_GUID)
FROM cteSource c2
WHERE DATEADD(MI,#lengthOfStay,c2.LAST_UPDATED) > c1.LAST_UPDATED
AND C2.LAST_UPDATED <= C1.LAST_UPDATED
AND c1.STAFF_GUID <> c2.STAFF_GUID)
)
SELECT * FROM cteLastOnes
UNION
SELECT * FROM cteSource s
WHERE EXISTS (SELECT * FROM cteLastOnes l
WHERE DATEADD(MI,#lengthOfStay,s.LAST_UPDATED) > l.LAST_UPDATED
AND s.LAST_UPDATED <= l.LAST_UPDATED
AND s.STAFF_GUID <> l.STAFF_GUID)
SQLFiddle DEMO - new version
SQLFiddle DEMO - old version