Split Table into Windows with Recurring Attributes - sql

My title is awful, because I am not sure how to describe the challenge. I would love an edit if someone can think of a more descriptive title. Hopefully my input/desired output will help explain. Here is some sample input data:
create table #input (
num varchar(10),
code varchar(10),
event_date date
)
insert into #input (num, code, event_date)
values('123456', 'Active', '2007-09-10'),
('123456', 'Active', '2010-09-15'),
('123456', 'Active', '2010-09-24'),
('123456', 'Inactive', '2018-09-17'),
('123456', 'Inactive', '2019-01-01'),
('123456', 'Active', '2019-02-08')
select *
from #input
order by event_date
I want to tag each record for each group of num + code with the same number. However, I want the time periods to stay separate. Here is the desired result:
create table #result (
num varchar(10),
code varchar(10),
event_date date,
tag int
)
insert into #result (num, code, event_date, tag)
values('123456', 'Active', '2007-09-10', 1),
('123456', 'Active', '2010-09-15', 1),
('123456', 'Active', '2010-09-24', 1),
('123456', 'Inactive', '2018-09-17', 2),
('123456', 'Inactive', '2019-01-01', 2),
('123456', 'Active', '2019-02-08', 3)
select *
from #result
order by event_date
Obviously normal window partitions like this...
select *, row_number() over(partition by num, code order by event_date) rn
from #input
order by event_date
...don't work, because there is no field on which to partition that would split the two "Active" groups (two groups, because they happen during two time frames). How would I reach my desired result? I have a hunch that a series of lag() and lead() functions might work, but I couldn't get anywhere meaningful.
Alternatively, how would I achieve the results so the categories overlap by one?
create table #result_new (
num varchar(10),
code varchar(10),
event_date date,
tag int
)
insert into #result_new (num, code, event_date, tag)
values('123456', 'Active', '2007-09-10', 1),
('123456', 'Active', '2010-09-15', 1),
('123456', 'Active', '2010-09-24', 1),
('123456', 'Inactive', '2018-09-17', 1),
('123456', 'Inactive', '2019-01-01', 2),
('123456', 'Active', '2019-02-08', 2)
select *
from #result_new
order by event_date

LAG gets you halfway there, but not the whole way. You can use LAG to check the value of the previous row and create (what I have called) a switch. You can then use a SUM window function with a ROWS BETWEEN clause to get the value for tag:
WITH CTE AS(
SELECT num,
code,
event_date,
CASE WHEN code = LAG(code) OVER (PARTITION BY num ORDER BY event_date) THEN 0 ELSE 1 END AS Switch
FROM #input)
SELECT num,
code,
event_date,
SUM(Switch) OVER (PARTITION BY num ORDER BY event_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS tag
FROM CTE;
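For the overlapping variant asked about at the end of the question, one possible sketch (not part of the original answer) is to keep the same Switch column but sum only the rows before the current one, so that the first row of each new group still carries the previous group's tag. Defaulting the very first row to 1 with COALESCE is an assumption about how the numbering should start.

```sql
-- Sketch: same Switch flag as above, but the running total stops one row
-- before the current row, so a row that starts a new group keeps the old tag.
WITH CTE AS(
    SELECT num,
           code,
           event_date,
           CASE WHEN code = LAG(code) OVER (PARTITION BY num ORDER BY event_date) THEN 0 ELSE 1 END AS Switch
    FROM #input)
SELECT num,
       code,
       event_date,
       -- The first row per num has no preceding rows, so SUM returns NULL there;
       -- defaulting it to 1 is an assumption.
       COALESCE(SUM(Switch) OVER (PARTITION BY num ORDER BY event_date
                                  ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 1) AS tag
FROM CTE
ORDER BY num, event_date;
```

With the sample rows, Switch is 1, 0, 0, 1, 0, 1 in date order, so the tags come out as 1, 1, 1, 1, 2, 2, which is the second desired result in the question.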

Related

I want to find the date intervals at which the employee comes on a regular basis

Imagine an employee who has a contract with a company to work on a specific task; he comes in on the start date and leaves on the end date. I want to get the intervals during which the employee comes to the office without any absence.
Example Data:
CREATE TABLE #TimeClock (PunchID INT IDENTITY, EmployeeID INT, PunchinDate DATE)
INSERT INTO #TimeClock (EmployeeID, PunchInDate) VALUES
(1, '2020-01-01'), (1, '2020-01-02'), (1, '2020-01-03'), (1, '2020-01-04'),
(1, '2020-01-05'), (1, '2020-01-06'), (1, '2020-01-07'), (1, '2020-01-08'),
(1, '2020-01-09'), (1, '2020-01-10'), (1, '2020-01-11'), (1, '2020-01-12'),
(1, '2020-01-13'), (1, '2020-01-14'), (1, '2020-01-15'),
(1, '2020-01-17'), (1, '2020-01-18'), (1, '2020-01-19'), (1, '2020-01-20'),
(1, '2020-01-21'), (1, '2020-01-22'), (1, '2020-01-23'), (1, '2020-01-24'),
(1, '2020-01-25'), (1, '2020-01-26'), (1, '2020-01-27'), (1, '2020-01-28'),
(1, '2020-01-29'), (1, '2020-01-30'), (1, '2020-01-31'),
(1, '2020-02-01'), (1, '2020-02-02'), (1, '2020-02-03'), (1, '2020-02-04'),
(1, '2020-02-05'), (1, '2020-02-06'), (1, '2020-02-07'), (1, '2020-02-08'),
(1, '2020-02-09'), (1, '2020-02-10'), (1, '2020-02-12'),
(1, '2020-02-13'), (1, '2020-02-14'), (1, '2020-02-15'), (1, '2020-02-16');
--the output shall look like this '2020-01-01 to 2020-02-10' as this is the interval at which the employee comes without any leave
SELECT 1 AS ID, FORMAT( getdate(), '2020-01-01') as START_DATE, FORMAT( getdate(), '2020-01-10') as END_DATE union all
SELECT 1 AS ID, FORMAT( getdate(), '2020-01-11') as START_DATE, FORMAT( getdate(), '2020-01-15') as END_DATE union all
SELECT 1 AS ID, FORMAT( getdate(), '2020-01-21') as START_DATE, FORMAT( getdate(), '2020-01-31') as END_DATE union all
SELECT 1 AS ID, FORMAT( getdate(), '2020-02-01') as START_DATE, FORMAT( getdate(), '2020-02-10') as END_DATE
--the output shall look like this '2020-01-01 to 2020-01-15' and '2020-01-21 to 2020-02-10' as these are the intervals at which the employee comes without any leave
Using the example data provided we can query the table like this:
;WITH iterate AS (
SELECT *, DATEADD(DAY,1,PunchinDate) AS NextDate
FROM #TimeClock
), base AS (
SELECT *
FROM (
SELECT *, CASE WHEN DATEADD(DAY,-1,PunchInDate) = LAG(PunchinDate,1) OVER (PARTITION BY EmployeeID ORDER BY PunchinDate) THEN PunchInDate END AS s
FROM iterate
) a
WHERE s IS NULL
), rCTE AS (
SELECT EmployeeID, PunchInDate AS StartDate, PunchInDate AS EndDate, NextDate
FROM base
UNION ALL
SELECT a.EmployeeID, a.StartDate, r.PunchInDate, r.NextDate
FROM rCTE a
INNER JOIN iterate r
ON a.NextDate = r.PunchinDate
AND a.EmployeeID = r.EmployeeID
)
SELECT EmployeeID, StartDate, MAX(EndDate) AS EndDate, DATEDIFF(DAY,StartDate,MAX(EndDate)) AS Streak
FROM rCTE
GROUP BY rCTE.EmployeeID, rCTE.StartDate
This uses a recursive common table expression, which lets us walk from one row to the related next row. Here we're looking for rows that continue a streak, and we want to restart the streak any time we encounter a break. The windowed function LAG looks back one row to the previous value and compares it to the current one. If the previous value isn't yesterday, we start a new streak.
EmployeeID  StartDate   EndDate     Streak
------------------------------------------
1           2020-01-01  2020-01-15  14
1           2020-01-17  2020-02-10  24
1           2020-02-12  2020-02-16  4
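The recursion depth in the query above grows with the length of a streak, and SQL Server stops a recursive CTE after 100 levels unless OPTION (MAXRECURSION n) is set. As a hedged alternative, here is a minimal non-recursive sketch of the same idea using the row-number form of gaps-and-islands. It assumes the #TimeClock table from the example data and at most one punch per employee per day; the grp column and the alias d are illustrative names.

```sql
-- Sketch: within a run of consecutive days, the date and the row number both
-- advance by one per row, so (date - row number) stays constant for the run.
SELECT EmployeeID,
       MIN(PunchinDate) AS StartDate,
       MAX(PunchinDate) AS EndDate,
       DATEDIFF(DAY, MIN(PunchinDate), MAX(PunchinDate)) AS Streak
FROM (
    SELECT EmployeeID,
           PunchinDate,
           -- ROW_NUMBER returns bigint; DATEADD expects an int, so cast explicitly.
           DATEADD(DAY,
                   -CAST(ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY PunchinDate) AS int),
                   PunchinDate) AS grp
    FROM #TimeClock
) d
GROUP BY EmployeeID, grp
ORDER BY EmployeeID, StartDate;
```

This should return one row per streak with the same columns as the recursive query. If an employee can punch in more than once a day, the rows would need to be de-duplicated first, because a repeated date breaks the constant difference.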

SQL Pivot Half of table

I have a table that consists of time information. It's basically:
Employee, Date, Seq, Time In, Time Out.
They can clock out multiple times a day, so I'm trying to get all of the clock outs in a day on one row. My result would be something like:
Employee, Date, TimeIn1, TimeOut1, TimeIn2, TimeOut2, TimeIn3, TimeOut3....
Where the 1, 2, and 3 are the sequence numbers. I know I could just do a bunch of left joins to the table itself based on employee=employee, date=date, and seq=seq+1, but is there a way to do it in a pivot? I don't want to pivot the employee and date fields, just the time in and time out.
The short answer is: Yes, it's possible.
The exact code will be updated if/when you provide sample data to clarify some points, but you can absolutely pivot the times out while leaving the employee/work date alone.
Sorry for the wall of code; none of the fiddle sites are working from my current computer
create table #test (
pk int,
workdate date,
seq int,
tIN time,
tOUT time
)
insert into #test values
(1, '2020-11-25', 1, '08:00', null),
(1, '2020-11-25', 2, null, '11:00'),
(1, '2020-11-25', 3, '11:32', null),
(1, '2020-11-25', 4, null, '17:00'),
(2, '2020-11-25', 5, '08:00', null),
(2, '2020-11-25', 6, null, '09:00'),
(2, '2020-11-25', 7, '09:15', null),
-- new date
(1, '2020-11-27', 8, '08:00', null),
(1, '2020-11-27', 9, null, '08:22'),
(1, '2020-11-27', 10, '09:14', null),
(1, '2020-11-27', 11, null, '12:08'),
(1, '2020-11-27', 12, '13:08', null),
(1, '2020-11-27', 13, null, '14:40'),
(1, '2020-11-27', 14, '14:55', null),
(1, '2020-11-27', 15, null, '17:00')
select *
from (
/* this just sets the column header names and condenses their values */
select
pk,
workdate,
colName = case when tin is not null then 'TimeIn' + cast(empDaySEQ as varchar) else 'TimeOut' + cast(empDaySEQ as varchar) end,
colValue = coalesce(tin, tout)
from (
/* main query */
select
pk,
workdate,
/* grab what pair # this clock in or out is; reset by employee & date */
empDaySEQ = (row_number() over (partition by pk, workdate order by seq) / 2) + (row_number() over (partition by pk, workdate order by seq) % 2),
tin,
tout
from #test
) i
) a
PIVOT (
max(colValue)
for colName
IN ( /* replace w/ dynamic if you don't know upper boundary of max in/out pairs */
[TimeIn1],
[TimeOut1],
[TimeIn2],
[TimeOut2],
[TimeIn3],
[TimeOut3],
[TimeIn4],
[TimeOut4]
)
) mypivotTable
generates these results.
(I would provide a fiddle demo but they're not working for me today)
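If PIVOT feels heavy, the same layout can also be sketched with conditional aggregation. This is an alternative technique, not the query above. It assumes the #test table from the sample, rows that strictly alternate between clock-in and clock-out within each employee and day, and at most four pairs per day; pairNo and the alias s are illustrative names.

```sql
-- Sketch: number the in/out pairs per employee and day, then pick each
-- value into its own column with MAX(CASE ...).
SELECT pk,
       workdate,
       MAX(CASE WHEN pairNo = 1 THEN tIN  END) AS TimeIn1,
       MAX(CASE WHEN pairNo = 1 THEN tOUT END) AS TimeOut1,
       MAX(CASE WHEN pairNo = 2 THEN tIN  END) AS TimeIn2,
       MAX(CASE WHEN pairNo = 2 THEN tOUT END) AS TimeOut2,
       MAX(CASE WHEN pairNo = 3 THEN tIN  END) AS TimeIn3,
       MAX(CASE WHEN pairNo = 3 THEN tOUT END) AS TimeOut3,
       MAX(CASE WHEN pairNo = 4 THEN tIN  END) AS TimeIn4,
       MAX(CASE WHEN pairNo = 4 THEN tOUT END) AS TimeOut4
FROM (
    SELECT pk,
           workdate,
           tIN,
           tOUT,
           -- Integer division maps row numbers 1,2 to pair 1; 3,4 to pair 2; and so on.
           (ROW_NUMBER() OVER (PARTITION BY pk, workdate ORDER BY seq) + 1) / 2 AS pairNo
    FROM #test
) s
GROUP BY pk, workdate
ORDER BY pk, workdate;
```

An employee with an unmatched final clock-in, such as pk 2 on 2020-11-25 in the sample, simply gets NULL in the corresponding TimeOut column.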

Select last row where given value changed

I want to select the last row (by mod_date) where the column new_status is different from the previous entry for a given object_id.
At first I tried row_number but couldn't make it work; later I tried the lead/lag functions, and I think I'm closer to a solution, but the results still aren't right.
Here's the code and the fiddle:
https://www.db-fiddle.com/f/kS8SAi2WsAjfFLomd7t2it/0
CREATE TABLE changes
(object_id integer,
new_status smallint,
comment text,
mod_date timestamp);
INSERT INTO changes
VALUES
(1001, 0, null, '2020-06-01 12:01'),
(1001, 1, 'XYZ', '2020-06-01 12:05'),
(1001, 1, 'YZX', '2020-06-01 12:11'),
(1002, 1, 'XYZ', '2020-06-01 13:21'),
(1002, 1, 'AAA', '2020-06-01 13:25'),
(1002, 0, 'BCA', '2020-06-01 14:11'),
(1003, 1, 'AXX', '2020-06-01 14:12'),
(1003, 0, 'YZX', '2020-06-01 14:13'),
(1003, 0, 'YYY', '2020-06-01 14:17');
SELECT object_id, min(mod_date), new_status FROM (
SELECT
object_id
, mod_date
, new_status
--, row_number() over (partition BY object_id ORDER BY mod_date desc) rn
, lag(new_status) OVER (partition by object_id ORDER BY mod_date desc) as next_status
FROM changes
ORDER BY 1)x
WHERE new_status = next_status
OR next_status is null
GROUP BY 1,3
The output for 1001 and 1003 is fine; for 1002 it should be the row with status 0.
Appreciate any help and suggestions!
I think you want:
select distinct on (object_id) c.*
from (select c.*,
lag(new_status) over (partition by object_id order by mod_date) as prev_ns
from changes c
) c
where prev_ns is distinct from new_status
order by object_id, mod_date desc;
Here is a db<>fiddle.
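DISTINCT ON is specific to PostgreSQL. As a hedged sketch of the same logic without it, the rows where the status changed can be ranked with row_number() and the latest one kept per object. The aliases flagged and ranked and the column rn are illustrative; the query still relies on IS DISTINCT FROM so that the first row of each object (where the previous status is NULL) counts as a change.

```sql
-- Sketch: keep only the rows where the status differs from the previous row,
-- then take the most recent of those rows per object_id.
select object_id, new_status, comment, mod_date
from (select flagged.*,
             row_number() over (partition by object_id order by mod_date desc) as rn
      from (select c.*,
                   lag(new_status) over (partition by object_id order by mod_date) as prev_ns
            from changes c
           ) flagged
      where prev_ns is distinct from new_status
     ) ranked
where rn = 1
order by object_id;
```

On the sample data this keeps the 12:05 row for 1001, the 14:11 row (status 0) for 1002, and the 14:13 row for 1003.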

Calc aggregates across continuous groups of values in Redshift

This would probably be pretty easy to solve in code, but it is harder to accomplish in straight SQL. I may have to give up and code a routine that scans through the table.
I have a table of user status values with start and end dates that is like this:
create table #t (userid int4, status varchar(15), start_time date, end_time date);
insert into #t values
(1, 'Active', '2019-08-15', '2019-08-20'),
(1, 'Active', '2019-08-20', '2019-08-22'),
(1, 'Active', '2019-08-22', '2019-09-22'),
(1, 'Inactive', '2019-09-22', '2019-10-22'),
(1, 'At Risk', '2019-10-22', '2019-11-22'),
(1, 'Lapsed', '2019-11-22', '2019-12-08'),
(1, 'Active', '2019-12-08', '2019-12-18'),
(1, 'Active', '2019-12-18', '2020-01-11'),
(1, 'Active', '2020-01-11', '2020-01-15'),
(1, 'Active', '2020-01-15', '2020-02-15'),
(1, 'Inactive', '2020-02-15', '2020-03-15')
;
I'm trying to summarize to the min/max dates for each continuous group of status values (when sorted by start_time), as shown below:
I've been trying to get there by using window functions in Redshift, but I cannot partition based on status as that seems to group the statuses together and I end up with "Active" from 2019-08-15 to 2020-02-15.
This is a so-called gaps-and-islands problem. Written on my phone, so untested, but you should be able to search SO for that key phrase.
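WITH
sorted AS
(
SELECT
*,
-- position among all of the user's rows
ROW_NUMBER()
OVER (
PARTITION BY userid
ORDER BY start_time
)
AS row_userid_start,
-- position among the user's rows that share the same status
ROW_NUMBER()
OVER (
PARTITION BY userid, status
ORDER BY start_time
)
AS row_userid_status_start
FROM
#t
)
SELECT
userid,
status,
MIN(start_time) AS start_time,
MAX(end_time) AS end_time
FROM
sorted
GROUP BY
userid,
status,
-- constant within each continuous run of one status
row_userid_status_start - row_userid_start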
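The same islands can also be found by flagging each status change with LAG and taking a running total of the flags. The following is a minimal sketch under the assumption that the #t table is as defined in the question (columns userid, status, start_time, end_time); the names flagged, numbered, is_new_island and island_id are illustrative. The explicit ROWS frame is included because Redshift requires a frame clause for an aggregate window function that has an ORDER BY.

```sql
-- Sketch: mark the first row of each continuous run, number the runs with a
-- running total, then aggregate per run.
WITH flagged AS (
    SELECT userid, status, start_time, end_time,
           CASE WHEN status = LAG(status) OVER (PARTITION BY userid ORDER BY start_time)
                THEN 0 ELSE 1 END AS is_new_island
    FROM #t
),
numbered AS (
    SELECT userid, status, start_time, end_time,
           SUM(is_new_island) OVER (PARTITION BY userid ORDER BY start_time
                                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS island_id
    FROM flagged
)
SELECT userid,
       status,
       MIN(start_time) AS start_time,
       MAX(end_time)   AS end_time
FROM numbered
GROUP BY userid, status, island_id
ORDER BY userid, MIN(start_time);
```

For the sample rows this should give six islands for user 1: Active, Inactive, At Risk, Lapsed, a second Active run from 2019-12-08 to 2020-02-15, and a final Inactive run.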

How to sum up Loss amount per each claim ignoring date

I have a table with the Loss amount for each transaction date.
How can I create a column ClaimLoss that sums up the Loss amount for each claim?
create table #TempTable1 (ID int, ClaimNumber varchar(100), date date, Loss money)
insert into #TempTable1
values (1, 'Claim1','2017-01-01', 100),
(2, 'Claim1','2017-03-06',150),
(3, 'Claim1','2017-05-01', 50),
(4, 'Claim2','2018-01-01', 150),
(5, 'Claim2','2018-08-15', 250),
(6, 'Claim2','2018-05-03', 350),
(7, 'Claim3','2018-09-01', 330),
(8, 'Claim4','2019-01-01', 140),
(9, 'Claim4','2019-01-13', 225),
(10, 'Claim5','2019-02-01', 145)
select ID,
ClaimNumber,
Date,
Loss
from #TempTable1
I need something like this:
Is it possible to do in the same select statement?
This seems like a place to use row_number() and case:
select t.*,
(case when row_number() over (partition by ClaimNumber order by date) = 1
then sum(loss) over (partition by ClaimNumber)
else 0
end) as claimloss
from #TempTable1 t;
You can use a window function:
select ID, ClaimNumber, Date, Loss,
(case when min(id) over (partition by ClaimNumber) = id
then sum(loss) over (partition by ClaimNumber)
else 0
end) as claimloss
from #TempTable1;