Split Table into Windows with Recurring Attributes

My title is awful, because I am not sure how to describe the challenge. I would love an edit if someone can think of a more descriptive title. Hopefully my input/desired output will help explain. Here is some sample input data:
create table #input (
num varchar(10),
code varchar(10),
event_date date
)
insert into #input (num, code, event_date)
values('123456', 'Active', '2007-09-10'),
('123456', 'Active', '2010-09-15'),
('123456', 'Active', '2010-09-24'),
('123456', 'Inactive', '2018-09-17'),
('123456', 'Inactive', '2019-01-01'),
('123456', 'Active', '2019-02-08')
select *
from #input
order by event_date
I want to tag each record for each group of num + code with the same number. However, I want the time periods to stay separate. Here is the desired result:
create table #result (
num varchar(10),
code varchar(10),
event_date date,
tag int
)
insert into #result (num, code, event_date, tag)
values('123456', 'Active', '2007-09-10', 1),
('123456', 'Active', '2010-09-15', 1),
('123456', 'Active', '2010-09-24', 1),
('123456', 'Inactive', '2018-09-17', 2),
('123456', 'Inactive', '2019-01-01', 2),
('123456', 'Active', '2019-02-08', 3)
select *
from #result
order by event_date
Obviously normal window partitions like this...
select *, row_number() over(partition by num, code order by event_date) rn
from #input
order by event_date
...don't work, because there is no field on which to partition that would split the two "Active" groups (two groups, because they happen during two time frames). How would I reach my desired result? I have a hunch that a series of lag() and lead() functions might work, but I couldn't get anywhere meaningful.
Alternatively, how would I achieve the results so the categories overlap by one?
create table #result_new (
num varchar(10),
code varchar(10),
event_date date,
tag int
)
insert into #result_new (num, code, event_date, tag)
values('123456', 'Active', '2007-09-10', 1),
('123456', 'Active', '2010-09-15', 1),
('123456', 'Active', '2010-09-24', 1),
('123456', 'Inactive', '2018-09-17', 1),
('123456', 'Inactive', '2019-01-01', 2),
('123456', 'Active', '2019-02-08', 2)
select *
from #result_new
order by event_date

LAG gets you halfway there, but not the whole way. You can use LAG to check the value of the previous row and create (what I have called) a switch. You can then use a SUM window function with a ROWS BETWEEN clause to get the value for tag:
WITH CTE AS(
SELECT num,
code,
event_date,
CASE WHEN code = LAG(code) OVER (PARTITION BY num ORDER BY event_date) THEN 0 ELSE 1 END AS Switch
FROM #input)
SELECT num,
code,
event_date,
SUM(Switch) OVER (PARTITION BY num ORDER BY event_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS tag
FROM CTE;
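
For readers who want to trace the logic outside the database, here is a minimal Python sketch of the same switch-and-running-sum idea. It is only an illustration, not part of the SQL answer: `tag_rows` is a hypothetical helper name, and the sketch assumes ISO-formatted date strings so that string ordering matches date ordering.

```python
# Sample rows from the question: (num, code, event_date).
rows = [
    ("123456", "Active", "2007-09-10"),
    ("123456", "Active", "2010-09-15"),
    ("123456", "Active", "2010-09-24"),
    ("123456", "Inactive", "2018-09-17"),
    ("123456", "Inactive", "2019-01-01"),
    ("123456", "Active", "2019-02-08"),
]


def tag_rows(rows):
    """Hypothetical helper: number consecutive runs of the same code per num."""
    tagged = []
    prev_num, prev_code, tag = None, None, 0
    # ORDER BY event_date within each num (ISO date strings sort correctly).
    for num, code, event_date in sorted(rows, key=lambda r: (r[0], r[2])):
        if num != prev_num:
            # PARTITION BY num: the running sum restarts and LAG sees no previous row.
            tag, prev_code = 0, None
        switch = 0 if code == prev_code else 1  # the CASE ... LAG(code) expression
        tag += switch                           # the running SUM(Switch)
        tagged.append((num, code, event_date, tag))
        prev_num, prev_code = num, code
    return tagged


for row in tag_rows(rows):
    print(row)
# The tag column comes out as 1, 1, 1, 2, 2, 3, matching the desired result.
```

The first row of each `num` always gets a switch of 1 because there is no previous code to compare against, which is the same effect as LAG returning NULL in the SQL version.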

Related

I want to find the date intervals at which the employee comes on a regular basis

Imagine an employee who is contracted to work on a specific task at a company; he comes in on the start date and leaves on the end date. I want to get the intervals during which the employee comes to the office without any absence.
Example Data:
CREATE TABLE #TimeClock (PunchID INT IDENTITY, EmployeeID INT, PunchinDate DATE)
INSERT INTO #TimeClock (EmployeeID, PunchInDate) VALUES
(1, '2020-01-01'), (1, '2020-01-02'), (1, '2020-01-03'), (1, '2020-01-04'),
(1, '2020-01-05'), (1, '2020-01-06'), (1, '2020-01-07'), (1, '2020-01-08'),
(1, '2020-01-09'), (1, '2020-01-10'), (1, '2020-01-11'), (1, '2020-01-12'),
(1, '2020-01-13'), (1, '2020-01-14'), (1, '2020-01-16'),
(1, '2020-01-17'), (1, '2020-01-18'), (1, '2020-01-19'), (1, '2020-01-20'),
(1, '2020-01-21'), (1, '2020-01-22'), (1, '2020-01-23'), (1, '2020-01-24'),
(1, '2020-01-25'), (1, '2020-01-26'), (1, '2020-01-27'), (1, '2020-01-28'),
(1, '2020-01-29'), (1, '2020-01-30'), (1, '2020-01-31'),
(1, '2020-02-01'), (1, '2020-02-02'), (1, '2020-02-03'), (1, '2020-02-04'),
(1, '2020-02-05'), (1, '2020-02-06'), (1, '2020-02-07'), (1, '2020-02-08'),
(1, '2020-02-09'), (1, '2020-02-10'), (1, '2020-02-12'),
(1, '2020-02-13'), (1, '2020-02-14'), (1, '2020-02-15'), (1, '2020-02-16');
--the output shall look like this '2020-01-01 to 2020-02-10' as this is the interval at which the employee comes without any leave
SELECT 1 AS ID, FORMAT( getdate(), '2020-01-01') as START_DATE, FORMAT( getdate(), '2020-01-10') as END_DATE union all
SELECT 1 AS ID, FORMAT( getdate(), '2020-01-11') as START_DATE, FORMAT( getdate(), '2020-01-15') as END_DATE union all
SELECT 1 AS ID, FORMAT( getdate(), '2020-01-21') as START_DATE, FORMAT( getdate(), '2020-01-31') as END_DATE union all
SELECT 1 AS ID, FORMAT( getdate(), '2020-02-01') as START_DATE, FORMAT( getdate(), '2020-02-10') as END_DATE
--the output shall look like this '2020-01-01 to 2020-01-15' and '2020-01-21 to 2020-02-10', as these are the intervals at which the employee comes without any leave

Using the example data provided we can query the table like this:
;WITH iterate AS (
SELECT *, DATEADD(DAY,1,PunchinDate) AS NextDate
FROM #TimeClock
), base AS (
SELECT *
FROM (
SELECT *, CASE WHEN DATEADD(DAY,-1,PunchInDate) = LAG(PunchinDate,1) OVER (PARTITION BY EmployeeID ORDER BY PunchinDate) THEN PunchInDate END AS s
FROM iterate
) a
WHERE s IS NULL
), rCTE AS (
SELECT EmployeeID, PunchInDate AS StartDate, PunchInDate AS EndDate, NextDate
FROM base
UNION ALL
SELECT a.EmployeeID, a.StartDate, r.PunchInDate, r.NextDate
FROM rCTE a
INNER JOIN iterate r
ON a.NextDate = r.PunchinDate
AND a.EmployeeID = r.EmployeeID
)
SELECT EmployeeID, StartDate, MAX(EndDate) AS EndDate, DATEDIFF(DAY,StartDate,MAX(EndDate)) AS Streak
FROM rCTE
GROUP BY rCTE.EmployeeID, rCTE.StartDate
This is known as a recursive common table expression, and it allows us to compare values between related rows. In this case we're looking for rows that continue a streak, and we want to restart that streak any time we encounter a break. We're using a window function called LAG to look back at the previous row's value and compare it to the one we have now. If it's not yesterday, then we start a new streak.
EmployeeID  StartDate   EndDate     Streak
------------------------------------------
1           2020-01-01  2020-01-15  14
1           2020-01-17  2020-02-10  24
1           2020-02-12  2020-02-16  4
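
The streak logic can also be followed in a few lines of Python. This is a rough sketch rather than a translation of the recursive CTE: `find_streaks` is a hypothetical name, the input is a small made-up list of dates for a single employee, and duplicate punches on the same day are assumed to count once.

```python
from datetime import date, timedelta


def find_streaks(punch_dates):
    """Hypothetical helper: collapse dates into runs of consecutive days.

    Returns (start, end, day_difference) tuples, where day_difference mirrors
    DATEDIFF(DAY, StartDate, EndDate) in the SQL above.
    """
    streaks = []
    for day in sorted(set(punch_dates)):
        if streaks and day - streaks[-1][1] == timedelta(days=1):
            streaks[-1][1] = day          # yesterday was present: extend the streak
        else:
            streaks.append([day, day])    # gap (or first row): start a new streak
    return [(start, end, (end - start).days) for start, end in streaks]


# Made-up sample: 2020-01-04 is missing, so two streaks are expected.
sample = [date(2020, 1, 1), date(2020, 1, 2), date(2020, 1, 3),
          date(2020, 1, 5), date(2020, 1, 6)]

for start, end, diff in find_streaks(sample):
    print(start, end, diff)
# 2020-01-01 2020-01-03 2
# 2020-01-05 2020-01-06 1
```

For several employees, the same function could be applied to each employee's dates separately, which corresponds to PARTITION BY EmployeeID.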

SQL Pivot Half of table

I have a table that consists of time information. It's basically:
Employee, Date, Seq, Time In, Time Out.
They can clock out multiple times a day, so I'm trying to get all of the clock outs in a day on one row. My result would be something like:
Employee, Date, TimeIn1, TimeOut1, TimeIn2, TimeOut2, TimeIn3, TimeOut3....
Where the 1, 2, and 3 are the sequence numbers. I know I could just do a bunch of left joins to the table itself based on employee=employee, date=date, and seq=seq+1, but is there a way to do it in a pivot? I don't want to pivot the employee and date fields, just the time in and time out.

The short answer is: Yes, it's possible.
The exact code will be updated if/when you provide sample data to clarify some points, but you can absolutely pivot the times out while leaving the employee/work date alone.
Sorry for the wall of code; none of the fiddle sites are working from my current computer
create table #test (
pk int,
workdate date,
seq int,
tIN time,
tOUT time
)
insert into #test values
(1, '2020-11-25', 1, '08:00', null),
(1, '2020-11-25', 2, null, '11:00'),
(1, '2020-11-25', 3, '11:32', null),
(1, '2020-11-25', 4, null, '17:00'),
(2, '2020-11-25', 5, '08:00', null),
(2, '2020-11-25', 6, null, '09:00'),
(2, '2020-11-25', 7, '09:15', null),
-- new date
(1, '2020-11-27', 8, '08:00', null),
(1, '2020-11-27', 9, null, '08:22'),
(1, '2020-11-27', 10, '09:14', null),
(1, '2020-11-27', 11, null, '12:08'),
(1, '2020-11-27', 12, '01:08', null),
(1, '2020-11-27', 13, null, '14:40'),
(1, '2020-11-27', 14, '14:55', null),
(1, '2020-11-27', 15, null, '17:00')
select *
from (
/* this just sets the column header names and condenses their values */
select
pk,
workdate,
colName = case when tin is not null then 'TimeIn' + cast(empDaySEQ as varchar) else 'TimeOut' + cast(empDaySEQ as varchar) end,
colValue = coalesce(tin, tout)
from (
/* main query */
select
pk,
workdate,
/* grab what pair # this clock in or out is; reset by employee & date */
empDaySEQ = (row_number() over (partition by pk, workdate order by seq) / 2) + (row_number() over (partition by pk, workdate order by seq) % 2),
tin,
tout
from #test
) i
) a
PIVOT (
max(colValue)
for colName
IN ( /* replace w/ dynamic if you don't know upper boundary of max in/out pairs */
[TimeIn1],
[TimeOut1],
[TimeIn2],
[TimeOut2],
[TimeIn3],
[TimeOut3],
[TimeIn4],
[TimeOut4]
)
) mypivotTable
generates these results.
(I would provide a fiddle demo but they're not working for me today)
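
The pair-numbering expression is the least obvious part of the query, so here is a small Python sketch of it under the same assumptions as the sample data (each row holds either a time in or a time out, and rows alternate in, out, in, out within a day). `pivot_day` is a hypothetical helper that handles one employee and one date; it is meant only to show how the column names are derived.

```python
def pivot_day(punches):
    """Hypothetical helper: turn one employee-day of punches into a single row.

    punches is a list of (seq, t_in, t_out) tuples where exactly one of
    t_in / t_out is set, as in the sample data.
    """
    row = {}
    ordered = sorted(punches, key=lambda p: p[0])  # ORDER BY seq
    for rn, (_seq, t_in, t_out) in enumerate(ordered, start=1):  # row_number()
        # Integer division plus remainder maps rn 1,2 -> 1; 3,4 -> 2; 5,6 -> 3 ...
        pair = rn // 2 + rn % 2
        name = ("TimeIn" if t_in is not None else "TimeOut") + str(pair)
        row[name] = t_in if t_in is not None else t_out  # coalesce(tin, tout)
    return row


# Employee 1 on 2020-11-25, taken from the sample data.
day = [
    (1, "08:00", None),
    (2, None, "11:00"),
    (3, "11:32", None),
    (4, None, "17:00"),
]
print(pivot_day(day))
# {'TimeIn1': '08:00', 'TimeOut1': '11:00', 'TimeIn2': '11:32', 'TimeOut2': '17:00'}
```

Because the pair number comes from the row position rather than from seq itself, it restarts at 1 for every employee and date even though seq keeps counting up.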

Select last row where given value changed

I want to select the last row (by mod_date) where column new_status differs from the previous entry for a given object_id.
At first I tried row_number but couldn't make it work. Later I came up with the lead/lag functions, and I think I'm closer to a solution, but the results are still not right.
Here's the code and the fiddle:
https://www.db-fiddle.com/f/kS8SAi2WsAjfFLomd7t2it/0
CREATE TABLE changes
(object_id integer,
new_status smallint,
comment text,
mod_date timestamp);
INSERT INTO changes
VALUES
(1001, 0, null, '2020-06-01 12:01'),
(1001, 1, 'XYZ', '2020-06-01 12:05'),
(1001, 1, 'YZX', '2020-06-01 12:11'),
(1002, 1, 'XYZ', '2020-06-01 13:21'),
(1002, 1, 'AAA', '2020-06-01 13:25'),
(1002, 0, 'BCA', '2020-06-01 14:11'),
(1003, 1, 'AXX', '2020-06-01 14:12'),
(1003, 0, 'YZX', '2020-06-01 14:13'),
(1003, 0, 'YYY', '2020-06-01 14:17');
SELECT object_id, min(mod_date), new_status FROM (
SELECT
object_id
, mod_date
, new_status
--, row_number() over (partition BY object_id ORDER BY mod_date desc) rn
, lag(new_status) OVER (partition by object_id ORDER BY mod_date desc) as next_status
FROM changes
ORDER BY 1)x
WHERE new_status = next_status
OR next_status is null
GROUP BY 1,3
The output for 1001 and 1003 is fine; for 1002 it should be the row with status 0.
Appreciate any help and suggestions!

I think you want:
select distinct on (object_id) c.*
from (select c.*,
lag(new_status) over (partition by object_id order by mod_date) as prev_ns
from changes c
) c
where prev_ns is distinct from new_status
order by object_id, mod_date desc;
Here is a db<>fiddle.
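
To see why this picks the right rows, the same filter can be sketched in Python. This is an illustrative sketch only: `last_change` is a hypothetical name, the timestamps are kept as strings that sort chronologically, and new_status is assumed to be non-null as in the sample data.

```python
# Sample rows from the question: (object_id, new_status, comment, mod_date).
changes = [
    (1001, 0, None, "2020-06-01 12:01"),
    (1001, 1, "XYZ", "2020-06-01 12:05"),
    (1001, 1, "YZX", "2020-06-01 12:11"),
    (1002, 1, "XYZ", "2020-06-01 13:21"),
    (1002, 1, "AAA", "2020-06-01 13:25"),
    (1002, 0, "BCA", "2020-06-01 14:11"),
    (1003, 1, "AXX", "2020-06-01 14:12"),
    (1003, 0, "YZX", "2020-06-01 14:13"),
    (1003, 0, "YYY", "2020-06-01 14:17"),
]


def last_change(changes):
    """Hypothetical helper: latest row per object whose status differs from the previous row."""
    latest = {}     # object_id -> most recent row where the status changed
    previous = {}   # object_id -> status of the previous row (the LAG value)
    for row in sorted(changes, key=lambda r: (r[0], r[3])):
        object_id, status = row[0], row[1]
        # The first row of an object has no previous status, so it counts as a change.
        if object_id not in previous or previous[object_id] != status:
            latest[object_id] = row   # later changes overwrite earlier ones
        previous[object_id] = status
    return latest


for object_id, row in last_change(changes).items():
    print(object_id, row)
# 1001 -> the 12:05 row (status 1), 1002 -> the 14:11 row (status 0),
# 1003 -> the 14:13 row (status 0)
```

Overwriting the stored row on every change plays the role of DISTINCT ON with ORDER BY mod_date DESC: only the most recent change per object survives.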

Calc aggregates across continuous groups of values in Redshift

This is something that would probably be pretty easy to solve in code, but is harder to accomplish in straight SQL. I may have to give up and code a routine that scans through the table.
I have a table of user status values with start and end dates that is like this:
create table #t (userid int4, status varchar(15), start_time date, end_time date);
insert into #t values
(1, 'Active', '2019-08-15', '2019-08-20'),
(1, 'Active', '2019-08-20', '2019-08-22'),
(1, 'Active', '2019-08-22', '2019-09-22'),
(1, 'Inactive', '2019-09-22', '2019-10-22'),
(1, 'At Risk', '2019-10-22', '2019-11-22'),
(1, 'Lapsed', '2019-11-22', '2019-12-08'),
(1, 'Active', '2019-12-08', '2019-12-18'),
(1, 'Active', '2019-12-18', '2020-01-11'),
(1, 'Active', '2020-01-11', '2020-01-15'),
(1, 'Active', '2020-01-15', '2020-02-15'),
(1, 'Inactive', '2020-02-15', '2020-03-15')
;
I'm trying to summarize this to the min/max dates for each continuous group of status values (when sorted by start_time), as shown below:
I've been trying to get there by using window functions in Redshift, but I cannot partition based on status as that seems to group the statuses together and I end up with "Active" from 2019-08-15 to 2020-02-15.

This is a so-called gaps-and-islands approach. Written on my phone so untested. But you should be able to search SO for that key-phrase.
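WITH
sorted AS
(
    SELECT
        *,
        -- position of the row within the user's full history
        ROW_NUMBER()
            OVER (
                PARTITION BY userid
                ORDER BY start_time
            )
            AS row_userid_start,
        -- position of the row among the user's rows with the same status
        ROW_NUMBER()
            OVER (
                PARTITION BY userid, status
                ORDER BY start_time
            )
            AS row_userid_status_start
    FROM
        #t
)
SELECT
    userid,
    status,
    MIN(start_time) AS start_time,
    MAX(end_time) AS end_time
FROM
    sorted
GROUP BY
    userid,
    status,
    -- constant within each unbroken run of one status
    row_userid_status_start - row_userid_start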
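
The reason the difference of the two row numbers works as a grouping key is easier to see when it is computed by hand. The following Python sketch does that for the sample data; `islands` is a hypothetical helper name, and the dates are kept as ISO strings so that min/max and sorting behave like date comparisons.

```python
from collections import defaultdict

# Sample rows from the question: (userid, status, start_time, end_time).
rows = [
    (1, "Active", "2019-08-15", "2019-08-20"),
    (1, "Active", "2019-08-20", "2019-08-22"),
    (1, "Active", "2019-08-22", "2019-09-22"),
    (1, "Inactive", "2019-09-22", "2019-10-22"),
    (1, "At Risk", "2019-10-22", "2019-11-22"),
    (1, "Lapsed", "2019-11-22", "2019-12-08"),
    (1, "Active", "2019-12-08", "2019-12-18"),
    (1, "Active", "2019-12-18", "2020-01-11"),
    (1, "Active", "2020-01-11", "2020-01-15"),
    (1, "Active", "2020-01-15", "2020-02-15"),
    (1, "Inactive", "2020-02-15", "2020-03-15"),
]


def islands(rows):
    """Hypothetical helper: min/max dates per unbroken run of a status."""
    rn_user = defaultdict(int)         # ROW_NUMBER() OVER (PARTITION BY userid ...)
    rn_user_status = defaultdict(int)  # ROW_NUMBER() OVER (PARTITION BY userid, status ...)
    groups = {}
    for userid, status, start, end in sorted(rows, key=lambda r: (r[0], r[2])):
        rn_user[userid] += 1
        rn_user_status[(userid, status)] += 1
        # Both counters advance together inside a run, so the difference stays constant.
        key = (userid, status, rn_user_status[(userid, status)] - rn_user[userid])
        lo, hi = groups.get(key, (start, end))
        groups[key] = (min(lo, start), max(hi, end))
    result = [(u, s, lo, hi) for (u, s, _diff), (lo, hi) in groups.items()]
    return sorted(result, key=lambda g: (g[0], g[2]))


for island in islands(rows):
    print(island)
# The two Active runs stay separate:
# 2019-08-15 to 2019-09-22 and 2019-12-08 to 2020-02-15.
```

Note that the difference alone is not unique (the second Active run and the first Inactive run both get -3 here), which is why status has to be part of the grouping key as well.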

How to sum up Loss amount per each claim ignoring date

I have table with Loss amount per each transaction date.
How can I create column ClaimLoss that would sum up Loss amount per each claim?
create table #TempTable1 (ID int, ClaimNumber varchar(100), date date, Loss money)
insert into #TempTable1
values (1, 'Claim1','2017-01-01', 100),
(2, 'Claim1','2017-03-06',150),
(3, 'Claim1','2017-05-01', 50),
(4, 'Claim2','2018-01-01', 150),
(5, 'Claim2','2018-08-15', 250),
(6, 'Claim2','2018-05-03', 350),
(7, 'Claim3','2018-09-01', 330),
(8, 'Claim4','2019-01-01', 140),
(9, 'Claim4','2019-01-13', 225),
(10, 'Claim5','2019-02-01', 145)
select ID,
ClaimNumber,
Date,
Loss
from #TempTable1
I need something like this:
Is it possible to do in the same select statement?

This seems like a place to use row_number() and case:
select t.*,
(case when row_number() over (partition by ClaimNumber order by date) = 1
then sum(loss) over (partition by ClaimNumber)
else 0
end) as claimloss
from #TempTable1 t;

You can use a window function:
select ID, ClaimNumber, Date, Loss,
(case when min(id) over (partition by ClaimNumber) = id
then sum(loss) over (partition by ClaimNumber)
else 0
end) as claimloss
from #TempTable1;
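
As a cross-check of what the min(id) variant produces, here is a short Python sketch over the sample data. `add_claim_loss` is a hypothetical helper name; it puts the claim total on the row with the lowest ID for each claim and 0 on the others, which is my reading of the query above rather than a confirmed picture of the asker's desired output.

```python
from collections import defaultdict

# Sample rows from the question: (ID, ClaimNumber, date, Loss).
claims = [
    (1, "Claim1", "2017-01-01", 100),
    (2, "Claim1", "2017-03-06", 150),
    (3, "Claim1", "2017-05-01", 50),
    (4, "Claim2", "2018-01-01", 150),
    (5, "Claim2", "2018-08-15", 250),
    (6, "Claim2", "2018-05-03", 350),
    (7, "Claim3", "2018-09-01", 330),
    (8, "Claim4", "2019-01-01", 140),
    (9, "Claim4", "2019-01-13", 225),
    (10, "Claim5", "2019-02-01", 145),
]


def add_claim_loss(rows):
    """Hypothetical helper: append a ClaimLoss column to each row."""
    totals = defaultdict(int)   # sum(loss) over (partition by ClaimNumber)
    first_id = {}               # min(id) over (partition by ClaimNumber)
    for row_id, claim, _day, loss in rows:
        totals[claim] += loss
        first_id[claim] = min(row_id, first_id.get(claim, row_id))
    return [
        (row_id, claim, day, loss, totals[claim] if row_id == first_id[claim] else 0)
        for row_id, claim, day, loss in rows
    ]


for row in add_claim_loss(claims):
    print(row)
# ClaimLoss column: 300, 0, 0, 750, 0, 0, 330, 365, 0, 145
```

The row_number() variant differs only in which row carries the total: it picks the earliest date per claim instead of the lowest ID, which happens to be the same row for every claim in this sample.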
