case when.. then.. where - sql

I have the following piece of code:
case when status=status2 and rank=5 then datediff(day, rep_onboard_date, client_signup_date) end as time_to_status2
where:
rank= row_number() over(partition by rep_id order by client_signup_date)
and:
status2= case when count(client_signup_date) over (partition by rep_id) >=5.
This takes the time difference between rep_onboard_date and the client_signup_date of the rep's 5th client.
This works; however, only the 5th row per rep is populated, while the rest are (null).
What I would like is that if any row is populated for the rep with time_to_status, then all of that rep's rows carry the same value.
Simplified query:
with cte as (
select rep_id, rep_onboard_date, user_id, client_signup_date, /* a bunch of other fields,*/
count(client_signup_date) over (partition by rep_id) as total_applicants,
case when count(client_signup_date) over (partition by rep_id) >=10 then 'status1'
when count(client_signup_date) over (partition by rep_id) >=5 then 'status2'
when count(client_signup_date) over (partition by rep_id) >=1 then 'status3'
else 'none' end status,
row_number() over(partition by rep_id order by client_signup_date) as rank
from table1 r
left join table2 u on r.user_id=u.user_id
left join table3 pi on u.user_id=pi.user_id
)
select *,
case when status='status1' and rank=10 then datediff(day, advisor_onboard_date, client_signup_date) end as time_to_status1,
case when status='status2' and rank=5 then datediff(day, advisor_onboard_date, client_signup_date) end as time_to_status2,
case when status='status3' and rank=1 then datediff(day, advisor_onboard_date, client_signup_date) end as time_to_status3
from cte
Current output:
rep_id user_id rep_onboard_date client_signup_date status rank time_to_status
1 1 1/1/2018 1/5/2018 status2 1 (null)
1 2 1/1/2018 1/5/2018 status2 2 (null)
1 3 1/1/2018 1/6/2018 status2 3 (null)
1 4 1/1/2018 1/7/2018 status2 4 (null)
1 5 1/1/2018 1/10/2018 status2 5 9
1 6 1/1/2018 1/15/2018 status2 6 (null)
Expected output:
rep_id user_id rep_onboard_date client_signup_date status rank time_to_status
1 1 1/1/2018 1/5/2018 status2 1 9
1 2 1/1/2018 1/5/2018 status2 2 9
1 3 1/1/2018 1/6/2018 status2 3 9
1 4 1/1/2018 1/7/2018 status2 4 9
1 5 1/1/2018 1/10/2018 status2 5 9
1 6 1/1/2018 1/15/2018 status2 6 9

I believe what you want are window functions:
select cte.*,
max(case when status = 'status1' and rank = 10
then datediff(day, advisor_onboard_date, client_signup_date)
end) over (partition by rep_id) as time_to_status1
from cte;
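A minimal sketch of why the windowed max() fills every row, run through Python's sqlite3 module so it can be tried without a server. The table layout, the column name rnk (instead of rank), the use of rep_onboard_date, and the julianday() subtraction standing in for datediff(day, ...) are assumptions made for this illustration; it also assumes an SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the question's cte, using the rows shown for rep_id = 1.
conn.execute(
    "create table cte (rep_id int, user_id int, rep_onboard_date text, "
    "client_signup_date text, status text, rnk int)"
)
conn.executemany(
    "insert into cte values (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "2018-01-01", "2018-01-05", "status2", 1),
        (1, 2, "2018-01-01", "2018-01-05", "status2", 2),
        (1, 3, "2018-01-01", "2018-01-06", "status2", 3),
        (1, 4, "2018-01-01", "2018-01-07", "status2", 4),
        (1, 5, "2018-01-01", "2018-01-10", "status2", 5),
        (1, 6, "2018-01-01", "2018-01-15", "status2", 6),
    ],
)

# The case expression is non-null only on the 5th row; max() over the
# rep_id partition copies that single value onto every row of the rep.
query = """
select user_id,
       max(case when status = 'status2' and rnk = 5
                then cast(julianday(client_signup_date)
                          - julianday(rep_onboard_date) as integer)
           end) over (partition by rep_id) as time_to_status2
from cte
order by user_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Because the window has no order by, the frame is the whole partition, so max() sees the one populated value no matter which row it is evaluated on.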

It seems you don't want a CASE, you want a subquery. Something like this:
SELECT col1,
col2,
(SELECT datediff(day, rep_onboard_date, client_signup_date)
FROM yourTable t2
WHERE t2.rep_id = t.rep_id
AND ((t2.rank = 10 AND t2.status = 'status1')
OR (t2.rank = 5 AND t2.status = 'status2')
OR (t2.rank = 1 AND t2.status = 'status3'))) as time_to_status
FROM (yourTable or derivedTable with rank function) t

Related

Review sessions from timestamp column with start/stop event with duplicate start/stop records

I have following case change records:
id case_id state time_created
1 100 REVIEW_NEEDED 2021-03-30 15:11:58.015907000
2 100 REVIEW_NEEDED 2021-04-01 13:08:17.945926000
3 100 REVIEW 2021-04-07 06:20:48.873865000
4 100 WAITING 2021-04-07 06:32:47.159664000
5 100 REVIEW_NEEDED 2021-04-09 06:32:51.132127000
6 100 REVIEW 2021-04-12 04:39:36.426467000
7 100 REVIEW 2021-04-12 04:40:36.000000000
8 100 CLOSED 2021-04-12 04:40:43.133736000
9 101 REVIEW_NEEDED 2021-03-30 20:37:58.015907000
10 101 REVIEW 2021-04-04 13:08:17.945926000
11 101 CLOSED 2021-04-06 06:20:48.873865000
12 101 CLOSED 2021-04-06 06:20:50.000000000
I'd like to report sessions out of these like following:
open_id close_id case_id waiting_time_start handling_time_start handling_time_end
1 4 100 2021-03-30 15:11:58.015907000 2021-04-07 06:20:48.873865000 2021-04-07 06:32:47.159664000
5 8 100 2021-04-09 06:32:51.132127000 2021-04-12 04:39:36.426467000 2021-04-12 04:40:43.133736000
9 11 101 2021-03-30 20:37:58.015907000 2021-04-04 13:08:17.945926000 2021-04-06 06:20:48.873865000
Waiting_time_start: when state = REVIEW_NEEDED
Handling_time_start: when state = REVIEW
Handling_time_end: when state = WAITING or CLOSED
My current solution is to rank the Waiting_time_start, Handling_time_start and Handling_time_end events for each case and then join them on rank. This is not reliable, because there are duplicate records, so the number of start and stop events can differ within a case.
Thanks a lot for any ideas!
This is rather complicated. Start by adding a grouping based on the count of "waiting" and "closed" -- but only when they change values:
select t.*,
sum(case when (state <> next_state or next_state is null) and
state in ('WAITING', 'CLOSED')
then 1 else 0
end) over (partition by case_id order by time_created desc) as grp
from (select t.*,
lead(state) over (partition by case_id order by time_created) as next_state
from t
) t
Then, you can just aggregate:
with cte as (
select t.*,
sum(case when (state <> next_state or next_state is null) and
state in ('WAITING', 'CLOSED')
then 1 else 0
end) over (partition by case_id order by time_created desc) as grp
from (select t.*,
lead(state) over (partition by case_id order by time_created) as next_state
from t
) t
)
select case_id, min(id) as open_id, max(id) as close_id,
min(case when state = 'REVIEW_NEEDED' then time_created end) as waiting_time_start,
min(case when state = 'REVIEW' then time_created end) as handling_time_start,
max(time_created) as handling_time_end
from cte
group by grp, case_id;
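To see what this grouping produces on the sample rows, here is a rough sketch that runs the same lead() plus reverse cumulative sum through Python's sqlite3 module. The table name t, the CTE names flagged and grouped, and the column alias grp are assumptions for the illustration, and it needs an SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id int, case_id int, state text, time_created text)")
conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [
        (1, 100, "REVIEW_NEEDED", "2021-03-30 15:11:58.015907000"),
        (2, 100, "REVIEW_NEEDED", "2021-04-01 13:08:17.945926000"),
        (3, 100, "REVIEW", "2021-04-07 06:20:48.873865000"),
        (4, 100, "WAITING", "2021-04-07 06:32:47.159664000"),
        (5, 100, "REVIEW_NEEDED", "2021-04-09 06:32:51.132127000"),
        (6, 100, "REVIEW", "2021-04-12 04:39:36.426467000"),
        (7, 100, "REVIEW", "2021-04-12 04:40:36.000000000"),
        (8, 100, "CLOSED", "2021-04-12 04:40:43.133736000"),
        (9, 101, "REVIEW_NEEDED", "2021-03-30 20:37:58.015907000"),
        (10, 101, "REVIEW", "2021-04-04 13:08:17.945926000"),
        (11, 101, "CLOSED", "2021-04-06 06:20:48.873865000"),
        (12, 101, "CLOSED", "2021-04-06 06:20:50.000000000"),
    ],
)

query = """
with flagged as (
    -- look ahead one row so a run of identical end states is counted once
    select t.*,
           lead(state) over (partition by case_id order by time_created) as next_state
    from t
),
grouped as (
    -- counting end events from the newest row backwards gives every row
    -- of a session the same grp value
    select f.*,
           sum(case when (state <> next_state or next_state is null)
                         and state in ('WAITING', 'CLOSED')
                    then 1 else 0 end)
               over (partition by case_id order by time_created desc) as grp
    from flagged f
)
select case_id, min(id) as open_id, max(id) as close_id,
       min(case when state = 'REVIEW_NEEDED' then time_created end) as waiting_time_start,
       min(case when state = 'REVIEW' then time_created end) as handling_time_start,
       max(time_created) as handling_time_end
from grouped
group by case_id, grp
order by case_id, open_id
"""
sessions = conn.execute(query).fetchall()
for row in sessions:
    print(row)
```

One thing this run makes visible: the duplicate CLOSED row (id 12) is folded into the session for case 101, so close_id comes out as 12 rather than the 11 shown in the question. If the first of the duplicate end events is wanted, the end columns would need something like min() over the end-state rows instead of max().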

How to count consecutive dates using Netezza

I need to count consecutive days in order to define my cohorts. I have a table that looks like:
pat_id admin_date
----------------------------
1 3/10/2019
1 3/11/2019
1 3/23/2019
1 3/24/2019
1 3/25/2019
2 12/26/2017
2 2/27/2019
2 3/16/2019
2 3/17/2019
I want output such as:
pat_id admin_date consecutive
--------------------------------------------
1 3/10/2019 1
1 3/11/2019 2
1 3/23/2019 1
1 3/24/2019 2
1 3/25/2019 3
2 12/26/2017 1
2 2/27/2019 1
2 3/16/2019 1
2 3/17/2019 2
so that I can use these consecutive-day values (per pat_id) to filter for my cohort. I've seen a few posts that suggested using DateDiff/DateAdd with row_number, such as:
datediff(day, -row_number() over (partition by mrn order by admin_date), admin_date)
but datediff/dateadd functions wouldn't work on Netezza...
The closest I've got so far was:
select row_number() over (partition by mrn order by administration_date) as consecutive
which doesn't recognize gaps between dates and returns output like this:
pat_id admin_date consecutive
--------------------------------------------
1 3/10/2019 1
1 3/11/2019 2
1 3/23/2019 3
1 3/24/2019 4
1 3/25/2019 5
2 12/26/2017 1
2 2/27/2019 2
2 3/16/2019 3
2 3/17/2019 4
Does anyone know how to tackle this?
Use lag() to see where the groups start and a cumulative sum to define the group. The rest is just row_number():
select t.*,
row_number() over (partition by pat_id, grp order by admin_date) as consecutive
from (select t.*,
sum( case when prev_ad = admin_date - interval '1 day' then 0 else 1 end) over
(partition by pat_id order by admin_date) as grp
from (select t.*,
lag(admin_date) over (partition by pat_id order by admin_date) as prev_ad
from t
) t
) t;
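As a quick check of the lag() plus cumulative-sum idea, the sketch below runs it against the sample rows using Python's sqlite3 module rather than Netezza. The table name t and the ISO date strings are assumptions, and SQLite's date(prev_ad, '+1 day') stands in for the interval arithmetic; window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (pat_id int, admin_date text)")
conn.executemany(
    "insert into t values (?, ?)",
    [
        (1, "2019-03-10"), (1, "2019-03-11"), (1, "2019-03-23"),
        (1, "2019-03-24"), (1, "2019-03-25"),
        (2, "2017-12-26"), (2, "2019-02-27"), (2, "2019-03-16"), (2, "2019-03-17"),
    ],
)

query = """
select pat_id, admin_date,
       row_number() over (partition by pat_id, grp order by admin_date) as consecutive
from (select s.*,
             -- a row starts a new group unless it is exactly one day after the previous row
             sum(case when date(prev_ad, '+1 day') = admin_date then 0 else 1 end)
                 over (partition by pat_id order by admin_date) as grp
      from (select t.*,
                   lag(admin_date) over (partition by pat_id order by admin_date) as prev_ad
            from t) s
     ) g
order by pat_id, admin_date
"""
result = conn.execute(query).fetchall()
consecutive = [row[2] for row in result]
print(consecutive)
```

The first row of each patient has no previous date, so the comparison is null and the case falls through to 1, which opens the first group.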

Conditional time to status calculation

I am trying to calculate how long it takes a rep to have x clients apply for service. That is, I need the time between date_created (the date the rep was onboarded) and the point at which the rep reaches a certain "status". A status is reached when x of the rep's clients (= users) have a non-null date_applied (the date the user signed up).
x is minimum criteria to reach each "status", and ties back to a previous question: Aggregate case when inside non aggregate query where I am currently calculating "status" like so:
case when count(date_applied) over (partition by rep_id) >=10 then 'status1'
when count(date_applied) over (partition by rep_id) >=5 then 'status2'
when count(date_applied) over (partition by rep_id) >=1 then 'status3'
else 'no_status' end status
So it takes 10 clients to reach status1, 5 to reach status2 and 1 to reach status3. These are the criteria for each "status", so if you have 7 users for example, you still calculate status2 based on the date the 5th user applied.
I think calculating time_to_status1/2/3 (what I am trying to get at) should look something like this:
case when count(date_applied) over (partition by rep_id) >=10 then
datediff(day, date_created, date_applied for the 10th user that applied with that rep) end as time_to_status1,
case when count(date_applied) over (partition by rep_id) >=5 then
datediff(day, date_created, date_applied for the 5th user that applied with that rep) end as time_to_status2,
case when count(date_applied) over (partition by rep_id) >=1 then
datediff(day, date_created, date_applied for the 1st user that applied with that rep) end as time_to_status3
Any help is greatly appreciated!
--Edit--
Sample current data:
rep_id user_id date_created date_applied status
1 1 1/1/2018 6:43:22 AM 1/5/2018 2:45:15 PM status2
1 2 1/1/2018 6:43:22 AM 1/5/2018 3:35:15 PM status2
1 3 1/1/2018 6:43:22 AM 1/6/2018 4:25:15 PM status2
1 4 1/1/2018 6:43:22 AM 1/7/2018 5:05:15 PM status2
1 5 1/1/2018 6:43:22 AM 1/10/2018 3:35:15 PM status2
1 6 1/1/2018 6:43:22 AM 1/15/2018 12:55:23 PM status2
2 7 1/12/2018 1:13:42 PM 1/15/2018 4:25:15 PM status3
2 8 1/12/2018 1:13:42 PM 1/16/2018 1:05:15 PM status3
2 9 1/12/2018 1:13:42 PM 1/16/2018 3:35:15 PM status3
3 10 1/20/2018 10:13:15 AM 1/26/2018 7:25:15 PM status3
4 11 1/21/2018 3:33:23 PM (null) no_status
Desired output:
rep_id user_id date_created date_applied status time_to_status1 time_to_status2 time_to_status3
1 1 1/1/2018 6:43:22 AM 1/5/2018 2:45:15 PM status2 (null) 9 (null)
1 2 1/1/2018 6:43:22 AM 1/5/2018 3:35:15 PM status2 (null) 9 (null)
1 3 1/1/2018 6:43:22 AM 1/6/2018 4:25:15 PM status2 (null) 9 (null)
1 4 1/1/2018 6:43:22 AM 1/7/2018 5:05:15 PM status2 (null) 9 (null)
1 5 1/1/2018 6:43:22 AM 1/10/2018 3:35:15 PM status2 (null) 9 (null)
1 6 1/1/2018 6:43:22 AM 1/15/2018 12:55:23 PM status2 (null) 9 (null)
2 7 1/12/2018 1:13:42 PM 1/15/2018 4:25:15 PM status3 (null) (null) 3
2 8 1/12/2018 1:13:42 PM 1/16/2018 1:05:15 PM status3 (null) (null) 3
2 9 1/12/2018 1:13:42 PM 1/16/2018 3:35:15 PM status3 (null) (null) 3
3 10 1/20/2018 10:13:15 AM 1/26/2018 7:25:15 PM status3 (null) (null) 6
4 11 1/21/2018 3:33:23 PM (null) no_status (null) (null) (null)
rep_id=1 has status2 because he has 6 users with a non-null date_applied, so time_to_status2 in his case is based on the date_applied of the 5th client the rep signed up: datediff(day, '1/1/2018 6:43:22 AM', '1/10/2018 3:35:15 PM') = 9 days
rep_id=2 has status3 because he has 3 users with a non null date_applied, so time_to_status3 in his case is based on date_applied of 1st client rep signed up: datediff(day, '1/12/2018 1:13:42 PM', '1/15/2018 4:25:15 PM') = 3 days
rep_id=3 has status3 because he has 1 (>=1) user with a non null date_applied, so time_to_status3 in his case is datediff(day, '1/20/2018 10:13:15 AM', '1/26/2018 7:25:15 PM') = 6 days
Based on @Parfait's deleted hint, and @Gordon's answer on a different question, I was able to come up with an answer:
with cte as
(
-- initial query, including these two columns:
case when count(client_signup_date) over (partition by rep_id) >=10 then 'status1'
when count(client_signup_date) over (partition by rep_id) >=5 then 'status2'
when count(client_signup_date) over (partition by rep_id) >=1 then 'status3'
else 'none' end status,
row_number() over(partition by rep_id order by client_signup_date) as rank
)
select *,
max(case when status = 'status1' and rank = 10
then datediff(day, advisor_onboard_date, client_signup_date)
end) over (partition by rep_id) as time_to_status1,
max(case when status = 'status2' and rank = 5
then datediff(day, advisor_onboard_date, client_signup_date)
end) over (partition by rep_id) as time_to_status2,
max(case when status = 'status3' and rank = 1
then datediff(day, advisor_onboard_date, client_signup_date)
end) over (partition by rep_id) as time_to_status3
into #t
from cte
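For anyone who wants to try this against the sample data from the question, here is a rough sketch using Python's sqlite3 module. The table name apps, the columns rnk and days_since_onboard, and the julianday(date(...)) subtraction used in place of datediff(day, ...) are all assumptions made for the illustration. Null date_applied values are ordered last so they cannot take up a rank; window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table apps (rep_id int, user_id int, date_created text, date_applied text)")
conn.executemany(
    "insert into apps values (?, ?, ?, ?)",
    [
        (1, 1, "2018-01-01 06:43:22", "2018-01-05 14:45:15"),
        (1, 2, "2018-01-01 06:43:22", "2018-01-05 15:35:15"),
        (1, 3, "2018-01-01 06:43:22", "2018-01-06 16:25:15"),
        (1, 4, "2018-01-01 06:43:22", "2018-01-07 17:05:15"),
        (1, 5, "2018-01-01 06:43:22", "2018-01-10 15:35:15"),
        (1, 6, "2018-01-01 06:43:22", "2018-01-15 12:55:23"),
        (2, 7, "2018-01-12 13:13:42", "2018-01-15 16:25:15"),
        (2, 8, "2018-01-12 13:13:42", "2018-01-16 13:05:15"),
        (2, 9, "2018-01-12 13:13:42", "2018-01-16 15:35:15"),
        (3, 10, "2018-01-20 10:13:15", "2018-01-26 19:25:15"),
        (4, 11, "2018-01-21 15:33:23", None),
    ],
)

query = """
with cte as (
    select a.*,
           case when count(date_applied) over (partition by rep_id) >= 10 then 'status1'
                when count(date_applied) over (partition by rep_id) >= 5 then 'status2'
                when count(date_applied) over (partition by rep_id) >= 1 then 'status3'
                else 'no_status' end as status,
           -- nulls sort last so they never occupy the 1st/5th/10th position
           row_number() over (partition by rep_id
                              order by date_applied is null, date_applied) as rnk,
           -- whole calendar days between onboarding and application
           cast(julianday(date(date_applied)) - julianday(date(date_created)) as integer)
               as days_since_onboard
    from apps a
)
select rep_id, user_id,
       max(case when status = 'status1' and rnk = 10 then days_since_onboard end)
           over (partition by rep_id) as time_to_status1,
       max(case when status = 'status2' and rnk = 5 then days_since_onboard end)
           over (partition by rep_id) as time_to_status2,
       max(case when status = 'status3' and rnk = 1 then days_since_onboard end)
           over (partition by rep_id) as time_to_status3
from cte
order by rep_id, user_id
"""
rows = conn.execute(query).fetchall()
# Every row of a rep carries the same three values, so collapse to one entry per rep.
per_rep = {r[0]: r[2:] for r in rows}
print(per_rep)
```

On this data the result lines up with the desired output table: only the column for the rep's current status is populated, and it is repeated on each of the rep's rows.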

Query and Partition By clause group by window

I've the following code
declare @test table (id int, [Status] int, [Date] date)
insert into @test (Id,[Status],[Date]) VALUES
(1,1,'2018-01-01'),
(2,1,'2018-01-01'),
(1,1,'2017-11-01'),
(1,2,'2017-10-01'),
(1,1,'2017-09-01'),
(2,2,'2017-01-01'),
(1,1,'2017-08-01'),
(1,1,'2017-07-01'),
(1,1,'2017-06-01'),
(1,2,'2017-05-01'),
(1,1,'2017-04-01'),
(1,1,'2017-03-01'),
(1,1,'2017-01-01')
SELECT
id,
[Status],
MIN([Date]) OVER (PARTITION BY id,[Status] ORDER BY [Date],id,[Status] ) as WindowStart,
max([Date]) OVER (PARTITION BY id,[Status] ORDER BY [Date],id,[Status]) as WindowEnd,
COUNT(*) OVER (PARTITION BY id,[Status] ORDER BY [Date],id,[Status] ) as total
from @test
But the result is this:
id Status WindowStart WindowEnd total
1 1 2017-01-01 2017-01-01 1
1 1 2017-01-01 2017-03-01 2
1 1 2017-01-01 2017-04-01 3
1 1 2017-01-01 2017-06-01 4
1 1 2017-01-01 2017-07-01 5
1 1 2017-01-01 2017-08-01 6
1 1 2017-01-01 2017-09-01 7
1 1 2017-01-01 2017-11-01 8
1 1 2017-01-01 2018-01-01 9
1 2 2017-05-01 2017-05-01 1
1 2 2017-05-01 2017-10-01 2
2 1 2018-01-01 2018-01-01 1
2 2 2017-01-01 2017-01-01 1
I need the rows to be grouped by window, like this:
id Status WindowStart WindowEnd total
1 1 2017-01-01 2017-04-01 3
1 2 2017-05-01 2017-05-01 1
1 1 2017-06-01 2017-09-01 4
1 2 2017-10-01 2017-10-01 1
1 1 2017-11-01 2018-01-01 2
2 1 2018-01-01 2018-01-01 1
2 2 2017-01-01 2017-01-01 1
The first group for id = 1, Status = 1 should end at the first row with Status = 2 (2017-05-01), so its total is 3; the next group then runs from 2017-06-01 to 2017-09-01 with a total of 4 rows.
How can I get this done?
This is a "classic" Gaps and Islands problem. There are probably thousands of answers for these on the Internet.
This works for what you're after; however, try doing a bit more research beforehand. :)
WITH Groups AS(
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY [Date]) -
ROW_NUMBER() OVER (PARTITION BY id, [status] ORDER BY [Date]) AS Grp
FROM @test t)
SELECT G.id,
G.[Status],
MIN([Date]) AS WindowStart,
MAX([Date]) AS WindowEnd,
COUNT(*) AS Total
FROM Groups G
GROUP BY G.id,
G.[Status],
G.Grp
ORDER BY G.id, WindowStart;
Note that the ordering of your last 2 lines is the other way round in this solution; it seems you're ordering ASCENDING for id 1 but DESCENDING for id 2 in your expected results.
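To see the difference-of-row-numbers trick at work on the question's rows, here is a small sketch run through Python's sqlite3 module instead of SQL Server. The table name test, the column name dt (in place of [Date]), and the CTE name islands are assumptions for the illustration; window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table test (id int, status int, dt text)")
conn.executemany(
    "insert into test values (?, ?, ?)",
    [
        (1, 1, "2018-01-01"), (2, 1, "2018-01-01"), (1, 1, "2017-11-01"),
        (1, 2, "2017-10-01"), (1, 1, "2017-09-01"), (2, 2, "2017-01-01"),
        (1, 1, "2017-08-01"), (1, 1, "2017-07-01"), (1, 1, "2017-06-01"),
        (1, 2, "2017-05-01"), (1, 1, "2017-04-01"), (1, 1, "2017-03-01"),
        (1, 1, "2017-01-01"),
    ],
)

query = """
with islands as (
    select t.*,
           -- within an unbroken run of one status both counters advance together,
           -- so their difference is constant for the run
           row_number() over (partition by id order by dt) -
           row_number() over (partition by id, status order by dt) as grp
    from test t
)
select id, status, min(dt) as window_start, max(dt) as window_end, count(*) as total
from islands
group by id, status, grp
order by id, window_start
"""
windows = conn.execute(query).fetchall()
for row in windows:
    print(row)
```

The grp value on its own is not unique across statuses, which is why status has to stay in the group by alongside it.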
Here is one way using LAG function
;WITH cte
AS (SELECT *,
grp = Sum(CASE WHEN prev_val = Status THEN 0 ELSE 1 END)
OVER(partition BY id ORDER BY Date)
FROM (SELECT *,
prev_val = Lag(Status)OVER(partition BY id ORDER BY Date)
FROM @test) a)
SELECT id,
Status,
WindowStart = Min(date),
WindowEnd = Max(date),
Total = Count(*)
FROM cte
GROUP BY id, Status, grp
First use the lag function to find the previous status for each row, then use Sum() Over() to build a group number that increments only when the status changes.

window function in redshift

I have some data that looks like this:
CustID EventID TimeStamp
1 17 1/1/15 13:23
1 17 1/1/15 14:32
1 13 1/1/15 14:54
1 13 1/3/15 1:34
1 17 1/5/15 2:54
1 1 1/5/15 3:00
2 17 2/5/15 9:12
2 17 2/5/15 9:18
2 1 2/5/15 10:02
2 13 2/8/15 7:43
2 13 2/8/15 7:50
2 1 2/8/15 8:00
I'm trying to use the row_number function to get it to look like this:
CustID EventID TimeStamp SeqNum
1 17 1/1/15 13:23 1
1 17 1/1/15 14:32 1
1 13 1/1/15 14:54 2
1 13 1/3/15 1:34 2
1 17 1/5/15 2:54 3
1 1 1/5/15 3:00 4
2 17 2/5/15 9:12 1
2 17 2/5/15 9:18 1
2 1 2/5/15 10:02 2
2 13 2/8/15 7:43 3
2 13 2/8/15 7:50 3
2 1 2/8/15 8:00 4
I tried this:
row_number () over
(partition by custID, EventID
order by custID, TimeStamp asc) SeqNum
but got this back:
CustID EventID TimeStamp SeqNum
1 17 1/1/15 13:23 1
1 17 1/1/15 14:32 2
1 13 1/1/15 14:54 3
1 13 1/3/15 1:34 4
1 17 1/5/15 2:54 5
1 1 1/5/15 3:00 6
2 17 2/5/15 9:12 1
2 17 2/5/15 9:18 2
2 1 2/5/15 10:02 3
2 13 2/8/15 7:43 4
2 13 2/8/15 7:50 5
2 1 2/8/15 8:00 6
How can I get it to sequence based on the change in the EventID?
This is tricky. You need a multi-step process. You need to identify the groups (a difference of row_number() works for this). Then, assign an increasing constant to each group. And then use dense_rank():
select sd.*, dense_rank() over (partition by custid order by mints) as seqnum
from (select sd.*,
min(timestamp) over (partition by custid, eventid, grp) as mints
from (select sd.*,
(row_number() over (partition by custid order by timestamp) -
row_number() over (partition by custid, eventid order by timestamp)
) as grp
from somedata sd
) sd
) sd;
Another method is to use lag() and a cumulative sum:
select sd.*,
sum(case when prev_eventid is null or prev_eventid <> eventid
then 1 else 0 end) over (partition by custid order by timestamp
) as seqnum
from (select sd.*,
lag(eventid) over (partition by custid order by timestamp) as prev_eventid
from somedata sd
) sd;
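The lag() variant is easy to try locally. The sketch below runs it through Python's sqlite3 module rather than Redshift; the table name somedata comes from the answer, while the column name ts (in place of timestamp) and the ISO timestamp strings are assumptions. It also assumes the third row's date in the question was meant to be 1/1/15, and an SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table somedata (custid int, eventid int, ts text)")
conn.executemany(
    "insert into somedata values (?, ?, ?)",
    [
        (1, 17, "2015-01-01 13:23"), (1, 17, "2015-01-01 14:32"),
        (1, 13, "2015-01-01 14:54"),  # assumed: the question shows 1/1/25 here
        (1, 13, "2015-01-03 01:34"), (1, 17, "2015-01-05 02:54"),
        (1, 1, "2015-01-05 03:00"),
        (2, 17, "2015-02-05 09:12"), (2, 17, "2015-02-05 09:18"),
        (2, 1, "2015-02-05 10:02"), (2, 13, "2015-02-08 07:43"),
        (2, 13, "2015-02-08 07:50"), (2, 1, "2015-02-08 08:00"),
    ],
)

query = """
select sd.custid, sd.eventid, sd.ts,
       -- add 1 whenever the event differs from the previous row (or there is none)
       sum(case when prev_eventid is null or prev_eventid <> eventid
                then 1 else 0 end)
           over (partition by custid order by ts) as seqnum
from (select s.*,
             lag(eventid) over (partition by custid order by ts) as prev_eventid
      from somedata s) sd
order by sd.custid, sd.ts
"""
result = conn.execute(query).fetchall()
seqnums = [row[3] for row in result]
print(seqnums)
```

Unlike the difference of row numbers, this version yields the sequence number directly, so no dense_rank() step is needed afterwards.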
EDIT:
The last time I used Amazon Redshift it didn't have row_number(). You can do:
select sd.*, dense_rank() over (partition by custid order by mints) as seqnum
from (select sd.*,
min(timestamp) over (partition by custid, eventid, grp) as mints
from (select sd.*,
(row_number() over (partition by custid order by timestamp rows between unbounded preceding and current row) -
row_number() over (partition by custid, eventid order by timestamp rows between unbounded preceding and current row)
) as grp
from somedata sd
) sd
) sd;
Try this code block:
WITH by_day
AS (SELECT
*,
ts::date AS login_day
FROM table_name)
SELECT
*,
login_day,
FIRST_VALUE(login_day) OVER (PARTITION BY userid ORDER BY login_day , userid rows unbounded preceding) AS first_day
FROM by_day