Create a lapsed concept based on logic across every row per ID - sql
I am trying to compute a lapsed_date, defined as the point at which more than 12 weeks (i.e. 84 days) pass for a given ID between:
1) onboarded_at and current_date (if no applied_at exists) - this means lapsed_now if >84 days
2) onboarded_at and min(applied_at) (if one exists)
3) each consecutive applied_at
4) max(applied_at) and current_date - this means lapsed_now if >84 days
If an ID lapsed more than once, we only show the latest lapsed date.
My attempt works for most but not all cases. Can you help me make it work universally?
Sample set:
CREATE TABLE #t
(
id VARCHAR(10),
rank INTEGER,
onboarded_at DATE,
applied_at DATE
);
INSERT INTO #t VALUES
('A',1,'20180101','20180402'),
('A',2,'20180101','20180403'),
('A',3,'20180101','20180504'),
('B',1,'20180201','20180801'),
('C',1,'20180301','20180401'),
('C',2,'20180301','20180501'),
('C',3,'20180301','20180901'),
('D',1,'20180401',null)
Best attempt:
SELECT onb.id,
onb.rank,
onb.onboarded_at,
onb.applied_at,
onb.lapsed_now,
CASE WHEN lapsed_now = 1 OR lapsed_previous = 1
THEN 1
ELSE 0
END lapsed_ever,
CASE WHEN lapsed_now = 1
THEN DATEADD(DAY, 84, lapsed_now_date)
ELSE min_applied_at_add_84
END lapsed_date
FROM
(SELECT *,
CASE
WHEN DATEDIFF(DAY, onboarded_at, MIN(ISNULL(applied_at, onboarded_at)) over (PARTITION BY id)) >= 84
THEN 1
WHEN DATEDIFF(DAY, MAX(applied_at) OVER (PARTITION BY id), GETDATE()) >= 84
THEN 1
ELSE 0
END lapsed_now,
CASE
WHEN MAX(DATEDIFF(DAY, onboarded_at, ISNULL(applied_at, GETDATE()))) OVER (PARTITION BY id) >= 84
THEN 1
ELSE 0
END lapsed_previous,
MAX(applied_at) OVER (PARTITION BY id) lapsed_now_date,
DATEADD(DAY, 84, MIN(CASE WHEN applied_at IS NULL THEN onboarded_at ELSE applied_at END) OVER (PARTITION BY id)) min_applied_at_add_84
FROM #t
) onb
Current solution:
id rank onboarded_at applied_at lapsed_now lapsed_ever lapsed_date
A 1 2018-01-01 2018-04-02 1 1 2018-07-27
A 2 2018-01-01 2018-04-03 1 1 2018-07-27
A 3 2018-01-01 2018-05-04 1 1 2018-07-27
B 2 2018-02-01 2018-08-01 1 1 2018-10-24
C 1 2018-03-01 2018-04-01 0 1 2018-06-24
C 2 2018-03-01 2018-05-01 0 1 2018-06-24
C 3 2018-03-01 2018-09-01 0 1 2018-06-24
D 1 2018-04-01 null 1 1 2018-06-24
Expected solution:
id rank onboarded_at applied_at lapsed_now lapsed_ever lapsed_date
A 1 2018-01-01 2018-04-02 1 1 2018-07-27 (not max lapsed date)
A 2 2018-01-01 2018-04-03 1 1 2018-07-27
A 3 2018-01-01 2018-05-04 1 1 2018-07-27 (May 4 + 84)
B 1 2018-02-01 2018-08-01 0 1 2018-04-26 (Feb 1 + 84)
C 1 2018-03-01 2018-04-01 0 1 2018-07-24
C 2 2018-03-01 2018-05-01 0 1 2018-07-24 (May 1 + 84)
C 3 2018-03-01 2018-09-01 0 1 2018-07-24
D 1 2018-04-01 null 1 1 2018-06-24
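For anyone who wants to cross-check SQL output against the four rules, here is a minimal Python sketch of the same logic. It is not part of the original post: the function name lapsed_info is made up, "today" is pinned to 2018-09-15 (an assumption, chosen so that B and C are not lapsed now while A and D are), and it uses a strict "> 84 days" test as in the rule description.

```python
from datetime import date, timedelta

LAPSE_DAYS = 84  # 12 weeks


def lapsed_info(onboarded_at, applied_dates, today):
    """Return (lapsed_now, latest_lapsed_date) for one ID.

    Assumes onboarded_at <= every applied date <= today.
    """
    # Timeline: onboarding, each application in order, then today.
    points = [onboarded_at] + sorted(d for d in applied_dates if d is not None) + [today]
    lapsed_date = None
    for start, end in zip(points, points[1:]):
        if (end - start).days > LAPSE_DAYS:
            # Later gaps overwrite earlier ones, so the latest lapse wins.
            lapsed_date = start + timedelta(days=LAPSE_DAYS)
    # "Lapsed now" only looks at the final gap (last activity -> today).
    lapsed_now = (today - points[-2]).days > LAPSE_DAYS
    return lapsed_now, lapsed_date


TODAY = date(2018, 9, 15)  # assumed run date, not from the original post
results = {
    "A": lapsed_info(date(2018, 1, 1), [date(2018, 4, 2), date(2018, 4, 3), date(2018, 5, 4)], TODAY),
    "B": lapsed_info(date(2018, 2, 1), [date(2018, 8, 1)], TODAY),
    "C": lapsed_info(date(2018, 3, 1), [date(2018, 4, 1), date(2018, 5, 1), date(2018, 9, 1)], TODAY),
    "D": lapsed_info(date(2018, 4, 1), [None], TODAY),
}
```

With that assumed run date, the results agree with the lapsed_now and lapsed_date columns of the expected table above.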
Bit of guesswork here, but hopefully this does the trick:
SELECT res.id,
res.rank,
res.onboarded_at,
res.applied_at,
res.lapsed_now,
CASE WHEN lapsed_now = 1 OR lapsed_previous = 1
THEN 1
ELSE 0
END lapsed_ever,
CASE
WHEN lapsed_now = 1
THEN DATEADD(DAY, 84, lapsed_now_date)
WHEN applied_difference_gt84 IS NOT NULL
THEN DATEADD(DAY, 84, applied_difference_gt84)
WHEN DATEDIFF(DAY, min_applied_at_add_84, GETDATE()) < 84
THEN DATEADD(DAY, 84, onboarded_at)
ELSE min_applied_at_add_84
END lapsed_date
FROM (
SELECT *, MAX(applied_difference) OVER (PARTITION BY id ORDER BY rank ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) applied_difference_gt84
FROM
(
SELECT *,
CASE
WHEN DATEDIFF(DAY, onboarded_at, MIN(ISNULL(applied_at, onboarded_at)) over (PARTITION BY id)) >= 84
AND DATEDIFF(DAY, MAX(applied_at) OVER (PARTITION BY id), GETDATE()) >= 84
THEN 1
WHEN DATEDIFF(DAY, ISNULL(MAX(applied_at) OVER (PARTITION BY id), onboarded_at), GETDATE()) >= 84
THEN 1
ELSE 0
END lapsed_now,
CASE
WHEN MAX(DATEDIFF(DAY, onboarded_at, ISNULL(applied_at, GETDATE()))) OVER (PARTITION BY id) >= 84
THEN 1
ELSE 0
END lapsed_previous,
CASE
WHEN DATEDIFF(DAY, applied_at, LEAD(applied_at, 1) OVER (PARTITION BY id ORDER BY rank)) >= 84
THEN applied_at
ELSE NULL
END applied_difference,
ISNULL(MAX(applied_at) OVER (PARTITION BY id), onboarded_at) lapsed_now_date,
DATEADD(DAY, 84, MIN(CASE WHEN applied_at IS NULL THEN onboarded_at ELSE applied_at END) OVER (PARTITION BY id)) min_applied_at_add_84
FROM #t
) onb
) res
Results:
id rank onboarded_at applied_at lapsed_now lapsed_ever lapsed_date
A 1 2018-01-01 2018-04-02 1 1 2018-07-27
A 2 2018-01-01 2018-04-03 1 1 2018-07-27
A 3 2018-01-01 2018-05-04 1 1 2018-07-27
B 1 2018-02-01 2018-08-01 0 1 2018-04-26
C 1 2018-03-01 2018-04-01 0 1 2018-07-24
C 2 2018-03-01 2018-05-01 0 1 2018-07-24
C 3 2018-03-01 2018-09-01 0 1 2018-07-24
D 1 2018-04-01 (null) 1 1 2018-06-24
It's a bit messy because of the need to calculate the difference between the applied_at dates.
@Jim, inspired by your answer, I created the following solution.
I think it is easily understandable and intuitive, knowing the lapsed criteria:
SELECT id, onboarded_at, applied_at,
max(case when (zero_applicants is not null and current_date - onboarded_at > 84) or (last_applicant is not null and current_date - last_applicant > 84) then 1 else 0 end) over (partition by id) lapsed_now,
max(case when (zero_applicants is not null and current_date - onboarded_at > 84) or (one_applicant is not null and applied_at - onboarded_at > 84)
or (one_applicant is not null and current_date - applied_at > 84) or (next_applicant is not null and next_applicant- applied_at > 84)
or (last_applicant is not null and current_date - last_applicant > 84) then 1 else 0 end) over(partition by id) lapsed_ever,
max(case when zero_applicants is not null and current_date - onboarded_at > 84 then onboarded_at + 84
when one_applicant is not null and applied_at - onboarded_at > 84 then onboarded_at + 84
when one_applicant is not null and current_date - applied_at > 84 then applied_at + 84
when next_applicant is not null and next_applicant - applied_at > 84 then applied_at + 84
when last_applicant is not null and current_date - last_applicant > 84 then last_applicant + 84
end) over (partition by id) lapsed_date
from (
select *,
case when MAX(applied_at) OVER (PARTITION BY id) is null then onboarded_at end as zero_applicants,
case when count(applied_at) over(partition by id)=1 then onboarded_at end as one_applicant,
case when count(applied_at) over(partition by id)>1 then LEAD(applied_at, 1) OVER (PARTITION BY id ORDER BY applied_at) end as next_applicant,
case when LEAD(applied_at, 1) OVER (PARTITION BY id ORDER BY applied_at) is null then MAX(applied_at) over(partition by id) end as last_applicant
from #t
) res
order by id, applied_at
Related
SQL Consecutive Date Cumulative Count
I am doing some roster analysis and need to identify when an employee has worked for 5 or more consecutive days. In my table, I can extract data something like the below (note, there are a lot more columns, this is just a cut-down example):
Emp Start First_Entry
1234 23/06/2016 1
1234 24/06/2016 1
1234 24/06/2016 0
1234 25/06/2016 1
1234 26/06/2016 1
1234 27/06/2016 1
1234 28/06/2016 1
1234 29/06/2016 1
1234 29/06/2016 0
1234 30/06/2016 1
1234 2/07/2016 1
1234 3/07/2016 1
1234 3/07/2016 0
1234 4/07/2016 1
1234 4/07/2016 0
1234 5/07/2016 1
1234 6/07/2016 1
1234 9/07/2016 1
1234 10/07/2016 1
1234 11/07/2016 1
1234 12/07/2016 1
And what I am after is something like this:
Emp Start First_Entry Consecutive_Days Over_5 Status
1234 23/06/2016 1 1 0 Worked < 5
1234 24/06/2016 1 2 0 Worked < 5
1234 24/06/2016 0 2 0 Worked < 5
1234 25/06/2016 1 3 0 Worked < 5
1234 26/06/2016 1 4 0 Worked < 5
1234 27/06/2016 1 5 1 Worked >= 5
1234 28/06/2016 1 6 1 Worked >= 5
1234 29/06/2016 1 7 1 Worked >= 5
1234 29/06/2016 0 7 1 Worked >= 5
1234 30/06/2016 1 8 1 Worked >= 5
1234 02/07/2016 1 1 0 Worked < 5
1234 03/07/2016 1 2 0 Worked < 5
1234 03/07/2016 0 2 0 Worked < 5
1234 04/07/2016 1 3 0 Worked < 5
1234 04/07/2016 0 3 0 Worked < 5
1234 05/07/2016 1 4 0 Worked < 5
1234 06/07/2016 1 5 1 Worked >= 5
1234 09/07/2016 1 1 0 Worked < 5
1234 10/07/2016 1 2 0 Worked < 5
1234 11/07/2016 1 3 0 Worked < 5
1234 12/07/2016 1 4 0 Worked < 5
I'm really not sure how to go about getting the cumulative count for consecutive days, so any help you can give will be amazing.
Someone will probably come up with a more brilliant solution, but this would do. Your problem looks like a "Gaps and Islands" problem. Once we find the islands of date ranges, the rest follows easily. In the SQL below, @minDate is not a must, but it makes things easier.
CREATE TABLE #temptable
(
[Emp] CHAR(4),
[startDate] DATE,
[First_Entry] BIT
);
INSERT INTO #temptable ([Emp], [startDate], [First_Entry])
VALUES
('1234', N'2016-06-23', 1),
('1234', N'2016-06-24', 1),
('1234', N'2016-06-24', 0),
('1234', N'2016-06-25', 1),
('1234', N'2016-06-26', 1),
('1234', N'2016-06-27', 1),
('1234', N'2016-06-28', 1),
('1234', N'2016-06-29', 1),
('1234', N'2016-06-29', 0),
('1234', N'2016-06-30', 1),
('1234', N'2016-07-02', 1),
('1234', N'2016-07-03', 1),
('1234', N'2016-07-03', 0),
('1234', N'2016-07-04', 1),
('1234', N'2016-07-04', 0),
('1234', N'2016-07-05', 1),
('1234', N'2016-07-06', 1),
('1234', N'2016-07-09', 1),
('1234', N'2016-07-10', 1),
('1234', N'2016-07-11', 1),
('1234', N'2016-07-12', 1);
DECLARE @minDate DATE;
SELECT @minDate = DATEADD(d, -1, MIN(startDate)) FROM #temptable;
-- firstOnly: one row per worked day; grouper: rows in the same island share grp
WITH firstOnly AS
(SELECT * FROM #temptable WHERE First_Entry = 1),
grouper (emp, startDate, grp) AS
(SELECT Emp, startDate,
DATEDIFF(d, @minDate, startDate) - ROW_NUMBER() OVER (PARTITION BY Emp ORDER BY startDate)
FROM firstOnly),
islands (emp, START, [end]) AS
(SELECT emp, MIN(startDate), MAX(startDate) FROM grouper GROUP BY emp, grp),
consecutives (emp, startDate, consecutive_days) AS
(SELECT f.Emp, f.startDate,
-- i.START,
-- i.[end],
-- number the days inside each island in date order
ROW_NUMBER() OVER (PARTITION BY f.Emp, i.START ORDER BY f.startDate)
FROM firstOnly f
INNER JOIN islands i ON f.Emp = i.emp AND f.startDate BETWEEN i.START AND i.[end])
SELECT t.Emp, t.startDate, t.First_Entry, c.consecutive_days,
CAST(CASE WHEN c.consecutive_days < 5 THEN 0 ELSE 1 END AS BIT) Over_5,
CASE WHEN c.consecutive_days < 5 THEN 'Worked < 5' ELSE 'Worked >= 5' END [Status]
FROM consecutives c
INNER JOIN #temptable t ON t.Emp = c.emp AND t.startDate = c.startDate;
DROP TABLE #temptable;
This is an islands-and-gaps problem. You can use the LAG window function to get the previous startDate for each Emp, then use the SUM window function to work out which days are consecutive. Finally, a CASE WHEN expression decides whether the run has reached 5 days.
;WITH CTE AS (
SELECT [Emp], [startDate], [First_Entry],
SUM(CASE WHEN DATEDIFF(dd,f_Dt,startDate) <= 1 THEN 0 ELSE 1 END) OVER(PARTITION BY Emp ORDER BY startDate) grp
FROM (
SELECT *, LAG(startDate,1,startDate) OVER(PARTITION BY Emp ORDER BY startDate) f_Dt
FROM T
) t1
)
SELECT [Emp], [startDate], [First_Entry],
SUM(CASE WHEN First_Entry = 1 THEN 1 ELSE 0 END) OVER(PARTITION BY Emp,grp ORDER BY startDate) Consecutive_Days,
(CASE WHEN SUM(CASE WHEN First_Entry = 1 THEN 1 ELSE 0 END) OVER(PARTITION BY Emp,grp ORDER BY startDate) >= 5 THEN 1 ELSE 0 END) Over_5,
(CASE WHEN SUM(CASE WHEN First_Entry = 1 THEN 1 ELSE 0 END) OVER(PARTITION BY Emp,grp ORDER BY startDate) >= 5 THEN 'Worked >= 5' ELSE 'Worked < 5' END) Status
FROM CTE
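As a rough way to check either query's Consecutive_Days column, the same "reset the counter whenever the previous day is missing" idea can be sketched in plain Python. The function name consecutive_day_counts is hypothetical, and the sketch assumes the input is just the worked dates for one employee (duplicates are collapsed, which mirrors ignoring the First_Entry = 0 rows).

```python
from datetime import date, timedelta


def consecutive_day_counts(work_dates):
    """Map each worked date to the length of the run of consecutive days ending on it."""
    counts = {}
    previous = None
    run = 0
    for day in sorted(set(work_dates)):
        if previous is not None and day - previous == timedelta(days=1):
            run += 1  # still inside the same island
        else:
            run = 1   # a gap (or the first row) starts a new island
        counts[day] = run
        previous = day
    return counts


# The three islands from the question: 23-30 June, 2-6 July, 9-12 July 2016.
days = (
    [date(2016, 6, 23) + timedelta(days=i) for i in range(8)]
    + [date(2016, 7, 2) + timedelta(days=i) for i in range(5)]
    + [date(2016, 7, 9) + timedelta(days=i) for i in range(4)]
)
counts = consecutive_day_counts(days)
over_5 = {day: n >= 5 for day, n in counts.items()}
```

The Over_5 flag is then just a comparison of the running count against 5, as in the CASE expressions above.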
How to count records by store, day wise and in 2 hours range period with pivot table format?
I have punch records for users across multiple stores. I would like to create a report that shows, for each store and each day, how many employees were working in each 2-hour period.
Clock In ID Last Name First Name In time Out time
912 Bedolla Jorge 1/1/2021 7:29 1/1/2021 11:31
912 Romero Gabriel 1/1/2021 10:55 1/1/2021 14:07
912 Bedolla Jorge 1/1/2021 12:00 1/1/2021 16:07
912 Zaragoza Daniel 1/1/2021 13:06 1/1/2021 14:57
912 Thaxton Christopher 1/1/2021 14:01 1/1/2021 16:57
912 Jones Elena 1/1/2021 14:01 1/1/2021 16:35
912 Zaragoza Daniel 1/1/2021 15:12 1/1/2021 17:09
912 Jones Elena 1/1/2021 16:45 1/1/2021 18:05
912 Smith Kirsten 1/1/2021 17:30 1/1/2021 20:01
912 Zaragoza Daniel 1/1/2021 17:41 1/1/2021 21:49
I am looking for a result something like the one below (the figures shown here are not correct):
store ForDate 0-2 2-4 4-6 6-8 8-10 10-12 12-14 14-16 16-18 18-20 20-22 22-0
912 2021-01-01 0 0 0 1 0 1 2 3 3 2 3 0
912 2021-01-02 0 0 2 1 2 3 2 4 2 3 3 0
912 2021-01-03 0 0 1 1 2 2 2 2 3 0 2 0
912 2021-01-04 0 0 2 0 2 1 2 2 3 3 1 0
912 2021-01-05 0 0 2 1 1 3 4 4 2 2 1 0
912 2021-01-06 0 0 2 0 2 1 2 3 3 2 3 0
912 2021-01-07 0 0 2 1 2 1 3 4 2 2 0 0
912 2021-01-08 0 0 2 2 2 1 3 2 1 2 1 0
912 2021-01-09 0 0 1 1 0 3 1 3 2 2 3 0
912 2021-01-10 0 0 2 2 1 2 2 1 1 2 2 0
I tried to solve it with the query below, but it is wrong: it only looks at InTime and does not yet take OutTime into account.
SELECT TOP 10 store, ForDate,
ISNULL([0], 0) + ISNULL([1], 0) AS [0-1],
ISNULL([2], 0) + ISNULL([3], 0) AS [2-3],
ISNULL([4], 0) + ISNULL([5], 0) AS [4-5],
ISNULL([6], 0) + ISNULL([7], 0) AS [6-7],
ISNULL([8], 0) + ISNULL([9], 0) AS [8-9],
ISNULL([10], 0) + ISNULL([11], 0) AS [10-11],
ISNULL([12], 0) + ISNULL([13], 0) AS [12-13],
ISNULL([14], 0) + ISNULL([15], 0) AS [14-15],
ISNULL([16], 0) + ISNULL([17], 0) AS [16-17],
ISNULL([18], 0) + ISNULL([19], 0) AS [18-19],
ISNULL([20], 0) + ISNULL([21], 0) AS [20-21],
ISNULL([22], 0) + ISNULL([23], 0) AS [22-23]
FROM (
select * from (
select store, CAST(InTime as date) AS ForDate, DATEPART(hour,InTime) AS OnHour, COUNT(*) AS Totals
from Punches
GROUP BY store, CAST(InTime as date), DATEPART(hour,InTime)
) src
pivot (
sum(Totals) for OnHour in ([0],[1], [2], [3],[4], [5], [6],[7],[8], [9], [10],[11], [12], [13],[14], [15], [16],[17],[18], [19],[20],[21], [22], [23])
) piv
) t1
order by store, ForDate
Here is a SQL Fiddle with data: https://www.db-fiddle.com/f/jo4atDmmj8cshyK1CWWo7x/2
That is insane but worth trying:
SELECT storeid, ForDate,
ISNULL([0], 0) + ISNULL([1], 0) AS [0-1],
ISNULL([2], 0) + ISNULL([3], 0) AS [2-3],
ISNULL([4], 0) + ISNULL([5], 0) AS [4-5],
ISNULL([6], 0) + ISNULL([7], 0) AS [6-7],
ISNULL([8], 0) + ISNULL([9], 0) AS [8-9],
ISNULL([10], 0) + ISNULL([11], 0) AS [10-11],
ISNULL([12], 0) + ISNULL([13], 0) AS [12-13],
ISNULL([14], 0) + ISNULL([15], 0) AS [14-15],
ISNULL([16], 0) + ISNULL([17], 0) AS [16-17],
ISNULL([18], 0) + ISNULL([19], 0) AS [18-19],
ISNULL([20], 0) + ISNULL([21], 0) AS [20-21],
ISNULL([22], 0) + ISNULL([23], 0) AS [22-23]
FROM (
select * from (
SELECT [Dates].StoreId, [Dates].ForDate, Hours.hour OnHour, COUNT(*) Totals
FROM (
SELECT storeId, CAST(InTime as date) AS ForDate FROM Punches
UNION
SELECT storeId, CAST(OutTime AS date) AS ForDate FROM Punches
) [Dates]
JOIN (
SELECT * FROM (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23)) hours([hour])
) [Hours] ON 1=1
JOIN (
SELECT * FROM dbo.Punches
) p ON p.StoreId = [Dates].StoreId
AND (DATEADD(HOUR, [Hours].[hour], CAST([Dates].ForDate AS DATETIME)) BETWEEN CAST(p.InTime AS DATETIME) AND CAST(p.Outtime AS DATETIME))
GROUP BY [Dates].StoreId, Dates.ForDate, [hour]
) src
pivot (
sum(Totals) for OnHour in ([0],[1], [2], [3],[4], [5], [6],[7],[8], [9], [10],[11], [12], [13],[14], [15], [16],[17],[18], [19],[20],[21], [22], [23])
) piv
) t1
order by storeid, ForDate
Let's go a bit deeper. I generated all possible dates with this part:
SELECT storeId, CAST(InTime as date) AS ForDate FROM Punches
UNION
SELECT storeId, CAST(OutTime AS date) AS ForDate FROM Punches
And all possible hours by doing this:
SELECT * FROM (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23)) hours([hour])
Then I joined them to find all possible date-hours.
After that, I just joined them with punches and counted the punch if the generated date-hour is between InTime and OutTime, by adding this condition:
(DATEADD(HOUR, [Hours].[hour], CAST([Dates].ForDate AS DATETIME)) BETWEEN CAST(p.InTime AS DATETIME) AND CAST(p.Outtime AS DATETIME))
The rest is exactly the same as your code.
You can use a simple CASE expression to get what you have tried so far:
SELECT StoreId, CAST(InTime as Date) as ForDate,
SUM(CASE WHEN DATEPART(hour,InTime) in (0,1) THEN 1 ELSE 0 END) AS [0-1],
SUM(CASE WHEN DATEPART(hour,InTime) in (2,3) THEN 1 ELSE 0 END) AS [2-3],
SUM(CASE WHEN DATEPART(hour,InTime) in (4,5) THEN 1 ELSE 0 END) AS [4-5],
SUM(CASE WHEN DATEPART(hour,InTime) in (6,7) THEN 1 ELSE 0 END) AS [6-7],
SUM(CASE WHEN DATEPART(hour,InTime) in (8,9) THEN 1 ELSE 0 END) AS [8-9],
SUM(CASE WHEN DATEPART(hour,InTime) in (10,11) THEN 1 ELSE 0 END) AS [10-11],
SUM(CASE WHEN DATEPART(hour,InTime) in (12,13) THEN 1 ELSE 0 END) AS [12-13],
SUM(CASE WHEN DATEPART(hour,InTime) in (14,15) THEN 1 ELSE 0 END) AS [14-15],
SUM(CASE WHEN DATEPART(hour,InTime) in (16,17) THEN 1 ELSE 0 END) AS [16-17],
SUM(CASE WHEN DATEPART(hour,InTime) in (18,19) THEN 1 ELSE 0 END) AS [18-19],
SUM(CASE WHEN DATEPART(hour,InTime) in (20,21) THEN 1 ELSE 0 END) AS [20-21],
SUM(CASE WHEN DATEPART(hour,InTime) in (22,23) THEN 1 ELSE 0 END) AS [22-23]
FROM Punches
GROUP BY StoreId, CAST(InTime as Date)
And for your final result, use the query below. The inner query flags, per employee and day, every 2-hour period the punch starts in or spans; the outer query then counts the employees flagged in each period.
Select StoreId,ForDate,
SUM(CASE WHEN [0-2]>0 THEN 1 ELSE 0 END) AS [0-2],
SUM(CASE WHEN [2-4]>0 THEN 1 ELSE 0 END) AS [2-4],
SUM(CASE WHEN [4-6]>0 THEN 1 ELSE 0 END) AS [4-6],
SUM(CASE WHEN [6-8]>0 THEN 1 ELSE 0 END) AS [6-8],
SUM(CASE WHEN [8-10]>0 THEN 1 ELSE 0 END) AS [8-10],
SUM(CASE WHEN [10-12]>0 THEN 1 ELSE 0 END) AS [10-12],
SUM(CASE WHEN [12-14]>0 THEN 1 ELSE 0 END) AS [12-14],
SUM(CASE WHEN [14-16]>0 THEN 1 ELSE 0 END) AS [14-16],
SUM(CASE WHEN [16-18]>0 THEN 1 ELSE 0 END) AS [16-18],
SUM(CASE WHEN [18-20]>0 THEN 1 ELSE 0 END) AS [18-20],
SUM(CASE WHEN [20-22]>0 THEN 1 ELSE 0 END) AS [20-22],
SUM(CASE WHEN [22-24]>0 THEN 1 ELSE 0 END) AS [22-24]
from (SELECT StoreId,FirstName+LastName as Name, CAST(InTime as Date) as ForDate,
SUM(CASE WHEN DATEPART(hour,InTime) in (0,1) OR (DATEPART(hour,InTime)<0 AND DATEPART(hour,OutTime)>=1) THEN 1 ELSE 0 END) AS [0-2],
SUM(CASE WHEN DATEPART(hour,InTime) in (2,3) OR (DATEPART(hour,InTime)<2 AND DATEPART(hour,OutTime)>=2) THEN 1 ELSE 0 END) AS [2-4],
SUM(CASE WHEN DATEPART(hour,InTime) in (4,5) OR (DATEPART(hour,InTime)<4 AND DATEPART(hour,OutTime)>=4) THEN 1 ELSE 0 END) AS [4-6],
SUM(CASE WHEN DATEPART(hour,InTime) in (6,7) OR (DATEPART(hour,InTime)<6 AND DATEPART(hour,OutTime)>=6) THEN 1 ELSE 0 END) AS [6-8],
SUM(CASE WHEN DATEPART(hour,InTime) in (8,9) OR (DATEPART(hour,InTime)<8 AND DATEPART(hour,OutTime)>=8) THEN 1 ELSE 0 END) AS [8-10],
SUM(CASE WHEN DATEPART(hour,InTime) in (10,11) OR (DATEPART(hour,InTime)<10 AND DATEPART(hour,OutTime)>=10) THEN 1 ELSE 0 END) AS [10-12],
SUM(CASE WHEN DATEPART(hour,InTime) in (12,13) OR (DATEPART(hour,InTime)<12 AND DATEPART(hour,OutTime)>=12) THEN 1 ELSE 0 END) AS [12-14],
SUM(CASE WHEN DATEPART(hour,InTime) in (14,15) OR (DATEPART(hour,InTime)<14 AND DATEPART(hour,OutTime)>=14) THEN 1 ELSE 0 END) AS [14-16],
SUM(CASE WHEN DATEPART(hour,InTime) in (16,17) OR (DATEPART(hour,InTime)<16 AND DATEPART(hour,OutTime)>=16) THEN 1 ELSE 0 END) AS [16-18],
SUM(CASE WHEN DATEPART(hour,InTime) in (18,19) OR (DATEPART(hour,InTime)<18 AND DATEPART(hour,OutTime)>=18) THEN 1 ELSE 0 END) AS [18-20],
SUM(CASE WHEN DATEPART(hour,InTime) in (20,21) OR (DATEPART(hour,InTime)<20 AND DATEPART(hour,OutTime)>=20) THEN 1 ELSE 0 END) AS [20-22],
SUM(CASE WHEN DATEPART(hour,InTime) in (22,23) OR (DATEPART(hour,InTime)<22 AND DATEPART(hour,OutTime)>=22) THEN 1 ELSE 0 END) AS [22-24]
FROM Punches
GROUP BY StoreId,FirstName+LastName,CAST(InTime as Date)) detailsQuery
GROUP BY StoreId,ForDate
Sum column values over a window based on a week range (impala)
Given a table as follows:
client_id date connections
121438297 2018-01-03 0
121438297 2018-01-08 1
121438297 2018-01-10 3
121438297 2018-01-12 1
121438297 2018-01-19 7
363863811 2018-01-18 0
363863811 2018-01-30 5
363863811 2018-02-01 4
363863811 2018-02-10 0
I am looking for an efficient way to sum the number of connections that occur within 6 days following the current row (the current row being included in the sum), partitioned by client_id, which would result in:
client_id date connections connections_within_6_days
121438297 2018-01-03 0 1
121438297 2018-01-08 1 5
121438297 2018-01-10 3 4
121438297 2018-01-12 1 1
121438297 2018-01-19 7 7
363863811 2018-01-18 0 0
363863811 2018-01-30 5 9
363863811 2018-02-01 4 4
363863811 2018-02-10 0 0
Issues:
I do not want to add all missing dates and then perform a sliding window counting the 7 following rows, because my table is already extremely large.
I am using Impala, and range between interval '7' days following and current row is not supported.
Edit: I am looking for a generic answer, taking into account the fact that I will need to change the window size to larger numbers (30+ days, for example).
This answers the original version of the question. Impala doesn't fully support range between. Unfortunately, that doesn't leave many options. One is to use lag() with lots of explicit logic. Each term adds an earlier row only if its date still falls inside the 6-day window, which assumes at most one row per client per day:
select t.*,
( (case when lag(date, 6) over (partition by client_id order by date) >= date - interval 6 day then lag(connections, 6) over (partition by client_id order by date) else 0 end) +
(case when lag(date, 5) over (partition by client_id order by date) >= date - interval 6 day then lag(connections, 5) over (partition by client_id order by date) else 0 end) +
(case when lag(date, 4) over (partition by client_id order by date) >= date - interval 6 day then lag(connections, 4) over (partition by client_id order by date) else 0 end) +
(case when lag(date, 3) over (partition by client_id order by date) >= date - interval 6 day then lag(connections, 3) over (partition by client_id order by date) else 0 end) +
(case when lag(date, 2) over (partition by client_id order by date) >= date - interval 6 day then lag(connections, 2) over (partition by client_id order by date) else 0 end) +
(case when lag(date, 1) over (partition by client_id order by date) >= date - interval 6 day then lag(connections, 1) over (partition by client_id order by date) else 0 end) +
connections
) as connections_within_6_days
from t;
Unfortunately, this doesn't generalize very well. If you want a wide range of days, you might want to ask another question.
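For checking whichever query you end up with, the expected column from the question (current row plus everything up to N days after it) can be reproduced with a short Python sketch. This is not Impala SQL and not part of the answer above; the function name forward_window_sums is made up, and it assumes the rows for a single client are already sorted by date. A prefix sum plus a binary search keeps the window size a plain parameter, so 30+ days works the same way.

```python
from bisect import bisect_right
from datetime import date, timedelta
from itertools import accumulate


def forward_window_sums(rows, window_days):
    """rows: list of (date, connections) for one client, sorted by date.

    Returns, for each row, the sum of connections dated within
    [row date, row date + window_days].
    """
    dates = [d for d, _ in rows]
    # prefix[k] = total connections of the first k rows
    prefix = [0] + list(accumulate(c for _, c in rows))
    sums = []
    for i, day in enumerate(dates):
        # index just past the last row that is still inside the window
        end = bisect_right(dates, day + timedelta(days=window_days))
        sums.append(prefix[end] - prefix[i])
    return sums


client_a = [
    (date(2018, 1, 3), 0),
    (date(2018, 1, 8), 1),
    (date(2018, 1, 10), 3),
    (date(2018, 1, 12), 1),
    (date(2018, 1, 19), 7),
]
client_b = [
    (date(2018, 1, 18), 0),
    (date(2018, 1, 30), 5),
    (date(2018, 2, 1), 4),
    (date(2018, 2, 10), 0),
]
sums_a = forward_window_sums(client_a, 6)
sums_b = forward_window_sums(client_b, 6)
```

Both lists match the connections_within_6_days column in the question.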
SQL query which converts sets or range of records on the basis of record before and after rows of that range
Suppose this table:
Day Present Absent Holiday
1/1/2019 1 0 0
1/2/2019 0 1 0
1/3/2019 0 0 1
1/4/2019 0 0 1
1/5/2019 0 0 1
1/6/2019 0 1 0
1/7/2019 1 0 0
1/8/2019 0 1 0
1/9/2019 0 0 1
1/10/2019 0 1 0
I want to set Holiday to zero for all holidays that fall between absences: if an employee is absent both before and after the holidays, the holidays become absent days for him. I don't want to use a loop; I want a set-based query approach.
As a select, you can use lead() and lag():
select t.*,
(case when prev_absent = 1 and next_absent = 1 and holiday = 1 then 0 else holiday end) as new_holiday
from (select t.*,
lag(absent) over (order by day) as prev_absent,
lead(absent) over (order by day) as next_absent
from t
) t;
Note that lag() and lead() only look at the immediately adjacent rows, so this catches a single holiday between two absences; a run of several holidays in a row needs the nearest non-holiday row on each side instead.
If this does what you want, then you can incorporate this into an update:
with toupdate as (
select t.*,
(case when prev_absent = 1 and next_absent = 1 and holiday = 1 then 0 else holiday end) as new_holiday
from (select t.*,
lag(absent) over (order by day) as prev_absent,
lead(absent) over (order by day) as next_absent
from t
) t
)
update toupdate
set holiday = new_holiday
where holiday <> new_holiday;
EDIT: You can also do this with joins:
select t.*,
(case when tprev.absent = 1 and tnext.absent = 1 and t.holiday = 1 then 0 else t.holiday end) as new_holiday
from t
left join t tprev on tprev.day = dateadd(day, -1, t.day)
left join t tnext on tnext.day = dateadd(day, 1, t.day)
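To make the intended rule concrete for runs of several holidays, here is a small Python sketch (a reference for checking results, not a replacement for a set-based query). The one-letter encoding P/A/H for present/absent/holiday and the function name holidays_to_absences are assumptions made for the example.

```python
def holidays_to_absences(statuses):
    """Turn each run of 'H' into 'A' when the day before and the day after the run are both 'A'."""
    result = list(statuses)
    n = len(result)
    i = 0
    while i < n:
        if result[i] != "H":
            i += 1
            continue
        # find the end of this run of holidays: the run is result[i:j]
        j = i
        while j < n and result[j] == "H":
            j += 1
        absent_before = i > 0 and result[i - 1] == "A"
        absent_after = j < n and result[j] == "A"
        if absent_before and absent_after:
            result[i:j] = ["A"] * (j - i)
        i = j
    return "".join(result)


# The ten days from the question: P A H H H A P A H A
converted = holidays_to_absences("PAHHHAPAHA")
untouched = holidays_to_absences("PHHAP")
```

In the question's data both holiday runs are bounded by absences, so all four holidays are converted; a holiday run that starts after a present day is left alone.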
Sequence of Patterns within Date/time range
I have a problem I need help with. In the example below, I want to find scenarios within each id based on these data patterns: 010 as scenario1, 000 as scenario2, and 111 as scenario3. Records that don't follow a pattern should be ignored.
Example:
id date Status
1 2012-10-18 1
1 2012-10-19 1
1 2012-10-20 0
1 2012-10-21 0
1 2012-10-22 0
1 2012-10-23 0
1 2012-10-24 1
1 2012-10-25 0
1 2012-10-26 0
1 2012-10-27 0
1 2012-10-28 1
2 2012-10-19 0
2 2012-10-20 0
2 2012-10-21 0
2 2012-10-22 1
2 2012-10-23 1
scenario1:
1 2012-10-23 0
1 2012-10-24 1
1 2012-10-25 0
Scenario2:
1 2012-10-20 0
1 2012-10-21 0
1 2012-10-22 0
2 2012-10-19 0
2 2012-10-20 0
2 2012-10-21 0
Scenario3: none (no records)
You can construct the patterns as strings and then use string comparison. At least part of the trick is that you want all rows in the pattern, so you need to construct all potential patterns where each row might appear (as the last, middle, or first row of the three):
select t.*
from (select t.*,
concat(lag(status, 2) over (partition by id order by date),
lag(status, 1) over (partition by id order by date),
status
) as pat1,
concat(lag(status, 1) over (partition by id order by date),
status,
lead(status, 1) over (partition by id order by date)
) as pat2,
concat(status,
lead(status, 1) over (partition by id order by date),
lead(status, 2) over (partition by id order by date)
) as pat3
from t
) t
where '010' in (pat1, pat2, pat3);
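The same "build a string, then look for the pattern" idea can be sketched in Python to see which rows a pattern should return. The function name rows_in_pattern is hypothetical, and the sketch assumes each status is a single digit and that the list holds one id's rows in date order. It returns every row index that belongs to at least one occurrence, which is what the pat1/pat2/pat3 check above selects.

```python
def rows_in_pattern(statuses, pattern):
    """Return sorted row indexes that are part of any (possibly overlapping) occurrence of pattern."""
    text = "".join(str(s) for s in statuses)
    hits = set()
    start = text.find(pattern)
    while start != -1:
        hits.update(range(start, start + len(pattern)))
        # step by one so overlapping occurrences are found too
        start = text.find(pattern, start + 1)
    return sorted(hits)


id1 = [1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1]  # 2012-10-18 .. 2012-10-28
id2 = [0, 0, 0, 1, 1]                    # 2012-10-19 .. 2012-10-23

scenario1_id1 = rows_in_pattern(id1, "010")
scenario1_id2 = rows_in_pattern(id2, "010")
scenario2_id2 = rows_in_pattern(id2, "000")
```

For id 1, indexes 5 to 7 correspond to 2012-10-23 through 2012-10-25, the scenario1 rows listed in the question.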