How to count consecutive dates using Netezza - sql

I need to count consecutive days in order to define my cohorts. I have a table that looks like:
pat_id admin_date
----------------------------
1 3/10/2019
1 3/11/2019
1 3/23/2019
1 3/24/2019
1 3/25/2019
2 12/26/2017
2 2/27/2019
2 3/16/2019
2 3/17/2019
I want output like this:
pat_id admin_date consecutive
--------------------------------------------
1 3/10/2019 1
1 3/11/2019 2
1 3/23/2019 1
1 3/24/2019 2
1 3/25/2019 3
2 12/26/2017 1
2 2/27/2019 1
2 3/16/2019 1
2 3/17/2019 2
so that I can use the consecutive-day count (per pat_id) to filter for my cohort. I've seen a few posts that suggest using DateDiff/DateAdd with row_number, such as:
datediff(day, -row_number() over (partition by mrn order by admin_date), admin_date)
but the datediff/dateadd functions don't work on Netezza.
The closest I've got so far was:
select row_number() over (partition by mrn order by administration_date) as consecutive
which doesn't recognize gaps between dates and returns output like this:
pat_id admin_date consecutive
--------------------------------------------
1 3/10/2019 1
1 3/11/2019 2
1 3/23/2019 3
1 3/24/2019 4
1 3/25/2019 5
2 12/26/2017 1
2 2/27/2019 2
2 3/16/2019 3
2 3/17/2019 4
Does anyone know how to tackle this?

Use lag() to see where the groups start and a cumulative sum to define the group. The rest is just row_number():
select t.*,
       -- number the rows within each run of consecutive days
       row_number() over (partition by pat_id, grp order by admin_date) as consecutive
from (select t.*,
             -- running count of run starts: a new group begins whenever
             -- the previous date is not exactly one day earlier
             sum(case when prev_ad = admin_date - interval '1 day' then 0 else 1 end)
                 over (partition by pat_id order by admin_date) as grp
      from (select t.*,
                   lag(admin_date) over (partition by pat_id order by admin_date) as prev_ad
            from t
           ) t
     ) t;
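If you want to sanity-check the logic outside the database, here is a minimal Python sketch of the same idea: compare each date with the previous one for the same patient and restart the counter whenever the gap is not exactly one day. The function name `consecutive_counts` is made up for illustration, and this is not Netezza code, just the equivalent logic on the sample rows from the question.

```python
from datetime import date, timedelta


def consecutive_counts(rows):
    """rows: iterable of (pat_id, admin_date) tuples.

    Returns a list of (pat_id, admin_date, consecutive) in pat_id/date order.
    """
    out = []
    prev_id, prev_date, run = None, None, 0
    for pat_id, admin_date in sorted(rows):
        # Same patient and exactly one day after the previous row: extend the run.
        if pat_id == prev_id and admin_date - prev_date == timedelta(days=1):
            run += 1
        else:
            # New patient or a gap in the dates: start a new run.
            run = 1
        out.append((pat_id, admin_date, run))
        prev_id, prev_date = pat_id, admin_date
    return out


rows = [
    (1, date(2019, 3, 10)), (1, date(2019, 3, 11)), (1, date(2019, 3, 23)),
    (1, date(2019, 3, 24)), (1, date(2019, 3, 25)), (2, date(2017, 12, 26)),
    (2, date(2019, 2, 27)), (2, date(2019, 3, 16)), (2, date(2019, 3, 17)),
]
result = consecutive_counts(rows)
for row in result:
    print(row)
```

On the sample data this yields the counts 1, 2, 1, 2, 3 for patient 1 and 1, 1, 1, 2 for patient 2, which matches the desired output in the question.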

Related

Rank customer Transactions per segments in SQL Server

I have the table below, which holds customer transaction details.
Transaction date  CustomerID
1/27/2022         1
1/29/2022         1
2/27/2022         1
3/27/2022         1
3/29/2022         1
3/31/2022         1
4/2/2022          1
4/4/2022          1
4/6/2022          1
In this table, consecutive transactions that occur two days apart are considered a segment.
For example, transactions between Jan 27th and Jan 29th are considered segment 1, and transactions between Mar 29th and Apr 6th are considered segment 2. I need to rank the transactions within each segment in date order. If a transaction does not fall into any segment, its rank defaults to 1. The expected output is below.
Segment Rank  Transaction date  CustomerID
1             1/27/2022         1
2             1/29/2022         1
1             2/27/2022         1
1             3/27/2022         1
2             3/29/2022         1
3             3/31/2022         1
4             4/2/2022          1
5             4/4/2022          1
6             4/6/2022          1
Can somebody show me how to achieve this in T-SQL?
Use lag() to check whether each TransDate falls within 2 days of the previous one, and group such rows together as a segment. Then use row_number() to generate the required sequence:
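with
cte as
(
    -- g = 1 marks the start of a new segment (first row, or a gap of more than 2 days)
    -- with more than one customer, add partition by CustomerID to each window
    select *,
           g = case when datediff(day,
                                  lag(t.TransDate) over (order by t.TransDate),
                                  t.TransDate
                                 ) <= 2
                    then 0
                    else 1
               end
    from tbl t
),
cte2 as
(
    -- the running sum of the start flags gives each segment its own number
    select *, grp = sum(g) over (order by TransDate)
    from cte
)
select *, SegmentRank = row_number() over (partition by grp order by TransDate)
from cte2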

Estimation of Cumulative value every 3 months in SQL

I have a table like this:
ID Date Prod
1 1/1/2009 5
1 2/1/2009 5
1 3/1/2009 5
1 4/1/2009 5
1 5/1/2009 5
1 6/1/2009 5
1 7/1/2009 5
1 8/1/2009 5
1 9/1/2009 5
And I need to get the following result:
ID Date Prod CumProd
1 2009/03/01 5 15 ---Each 3 months
1 2009/06/01 5 30 ---Each 3 months
1 2009/09/01 5 45 ---Each 3 months
What could be the best approach to take in SQL?
You can try the following, which uses window functions:
select * from
(
select *,sum(prod) over(order by DATEPART(qq,dateval)) as cum_sum,
row_number() over(partition by DATEPART(qq,dateval) order by dateval) as rn
from t
)A where rn=1
How about just filtering on the month number?
select t.*
from (select id, date, prod, sum(prod) over (partition by id order by date) as running_prod
from t
) t
where month(date) in (3, 6, 9, 12);
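As a rough cross-check of the "running sum, then filter on the month number" idea, here is a small Python sketch on the sample data from the question. The function name `quarter_end_running_totals` is hypothetical, and the month filter (3, 6, 9, 12) is taken straight from the query above.

```python
from datetime import date


def quarter_end_running_totals(rows):
    """rows: iterable of (id, date, prod) tuples.

    Keeps a running total of prod per id (in date order) and returns only
    the rows whose month closes a quarter, with the running total attached.
    """
    totals = {}
    out = []
    for id_, d, prod in sorted(rows):
        totals[id_] = totals.get(id_, 0) + prod
        if d.month in (3, 6, 9, 12):
            out.append((id_, d, prod, totals[id_]))
    return out


# Sample data from the question: one row per month, Jan..Sep 2009, prod = 5.
rows = [(1, date(2009, month, 1), 5) for month in range(1, 10)]
result = quarter_end_running_totals(rows)
for row in result:
    print(row)
```

Note that the running total is computed before the filter is applied, which is why the filter sits in an outer query in the SQL: filtering first would sum only the quarter-end rows.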

Using the earliest date of a partition to determine what other dates belong to that partition

Assume this is my table:
ID DATE
--------------
1 2018-11-12
2 2018-11-13
3 2018-11-14
4 2018-11-15
5 2018-11-16
6 2019-03-05
7 2019-05-07
8 2019-05-08
9 2019-05-08
I need partitions to be determined by the first date in each partition: any date within 2 days of that first date belongs to the same partition.
The table would end up looking like this if each partition was ranked
PARTITION ID DATE
------------------------
1 1 2018-11-12
1 2 2018-11-13
1 3 2018-11-14
2 4 2018-11-15
2 5 2018-11-16
3 6 2019-03-05
4 7 2019-05-07
4 8 2019-05-08
4 9 2019-05-08
I've tried using datediff with lag to compare to the previous date but that would allow a partition to be inappropriately sized based on spacing, for example all of these dates would be included in the same partition:
ID DATE
--------------
1 2018-11-12
2 2018-11-14
3 2018-11-16
4 2018-11-18
3 2018-11-20
4 2018-11-22
Previous flawed attempt:
Mark when a date is more than 2 days past the previous date:
(case when datediff(day, lag(event_time, 1) over (partition by user_id, stage order by event_time), event_time) > 2 then 1 else 0 end)
You need to use a recursive CTE for this, so the operation is expensive.
with ordered as (
      -- add an incrementing column with no gaps
      select t.*, row_number() over (order by date) as seqnum
      from t
     ),
     cte as (
      -- anchor: the earliest row starts the first partition
      select id, date, date as mindate, seqnum
      from ordered
      where seqnum = 1
      union all
      -- step: keep the current start date while the next row is within 2 days of it;
      -- otherwise the next row starts a new partition
      select o.id, o.date,
             (case when o.date <= dateadd(day, 2, cte.mindate)
                   then cte.mindate else o.date
              end) as mindate,
             o.seqnum
      from cte join
           ordered o
           on o.seqnum = cte.seqnum + 1
     )
select cte.*, dense_rank() over (order by mindate) as partition_num
from cte;
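The reason lag() alone is not enough here is that each row's partition depends on the start date of the current partition, not on the previous row. A minimal Python sketch of that sequential rule, run on the sample dates from the question, might look like this (the function name `anchor_partitions` and the `window_days` parameter are made up for illustration):

```python
from datetime import date, timedelta


def anchor_partitions(dates, window_days=2):
    """Assign partition numbers based on the first date of each partition.

    A date joins the current partition if it is within `window_days` of the
    partition's first date; otherwise it starts a new partition.
    """
    out = []
    anchor = None
    partition = 0
    for d in sorted(dates):
        if anchor is None or d > anchor + timedelta(days=window_days):
            anchor = d          # this date becomes the start of a new partition
            partition += 1
        out.append((partition, d))
    return out


sample = [
    date(2018, 11, 12), date(2018, 11, 13), date(2018, 11, 14),
    date(2018, 11, 15), date(2018, 11, 16), date(2019, 3, 5),
    date(2019, 5, 7), date(2019, 5, 8), date(2019, 5, 8),
]
result = anchor_partitions(sample)

# The evenly spaced dates from the "flawed attempt" example in the question.
spaced = [date(2018, 11, day) for day in (12, 14, 16, 18, 20, 22)]
spaced_result = anchor_partitions(spaced)
```

With this rule the evenly spaced dates are split every two rows instead of being chained into one long partition, which is the behavior the question asks for.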

Query and Partition By clause group by window

I have the following code:
create table #test (id int, [Status] int, [Date] date)
insert into #test (Id,[Status],[Date]) VALUES
(1,1,'2018-01-01'),
(2,1,'2018-01-01'),
(1,1,'2017-11-01'),
(1,2,'2017-10-01'),
(1,1,'2017-09-01'),
(2,2,'2017-01-01'),
(1,1,'2017-08-01'),
(1,1,'2017-07-01'),
(1,1,'2017-06-01'),
(1,2,'2017-05-01'),
(1,1,'2017-04-01'),
(1,1,'2017-03-01'),
(1,1,'2017-01-01')
SELECT
id,
[Status],
MIN([Date]) OVER (PARTITION BY id,[Status] ORDER BY [Date],id,[Status] ) as WindowStart,
max([Date]) OVER (PARTITION BY id,[Status] ORDER BY [Date],id,[Status]) as WindowEnd,
COUNT(*) OVER (PARTITION BY id,[Status] ORDER BY [Date],id,[Status] ) as total
from #test
But the result is this:
id Status WindowStart WindowEnd total
1 1 2017-01-01 2017-01-01 1
1 1 2017-01-01 2017-03-01 2
1 1 2017-01-01 2017-04-01 3
1 1 2017-01-01 2017-06-01 4
1 1 2017-01-01 2017-07-01 5
1 1 2017-01-01 2017-08-01 6
1 1 2017-01-01 2017-09-01 7
1 1 2017-01-01 2017-11-01 8
1 1 2017-01-01 2018-01-01 9
1 2 2017-05-01 2017-05-01 1
1 2 2017-05-01 2017-10-01 2
2 1 2018-01-01 2018-01-01 1
2 2 2017-01-01 2017-01-01 1
I need the rows grouped by window, like this:
id Status WindowStart WindowEnd total
1 1 2017-01-01 2017-04-01 3
1 2 2017-05-01 2017-05-01 1
1 1 2017-06-01 2017-09-01 4
1 2 2017-10-01 2017-10-01 1
1 1 2017-11-01 2018-01-01 2
2 1 2018-01-01 2018-01-01 1
2 2 2017-01-01 2017-01-01 1
The first group for id = 1, Status = 1 should end at the first row with Status = 2 (2017-05-01), so its total is 3; the next group then runs from 2017-06-01 to 2017-09-01, with a total of 4 rows.
How can I get this done?
This is a "classic" Gaps and Islands problem. There are probably thousands of answers for these on the Internet.
This works for what you're after; however, try doing a bit more research beforehand. :)
WITH Groups AS(
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY [Date]) -
ROW_NUMBER() OVER (PARTITION BY id, [status] ORDER BY [Date]) AS Grp
FROM #test t)
SELECT G.id,
G.[Status],
MIN([Date]) AS WindowStart,
MAX([date]) AS WindowEnd,
COUNT(*) AS Total
FROM Groups G
GROUP BY G.id,
G.[Status],
G.Grp
ORDER BY G.id, WindowStart;
Note that the ordering of your last 2 lines is the other way round in this solution; it seems you're ordering ASCENDING for id 1 but DESCENDING for id 2 in your expected results.
Here is one way, using the LAG function:
;WITH cte
AS (SELECT *,
grp = Sum(CASE WHEN prev_val = Status THEN 0 ELSE 1 END)
OVER(partition BY id ORDER BY Date)
FROM (SELECT *,
prev_val = Lag(Status)OVER(partition BY id ORDER BY Date)
FROM #test) a)
SELECT id,
Status,
WindowStart = Min(date),
WindowEnd = Max(date),
Total = Count(*)
FROM cte
GROUP BY id, Status, grp
Using the lag function, first find the previous status for each date; then use Sum() over () to build a group number that increments only when the status changes.
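Both answers boil down to "collapse each run of consecutive rows with the same id and Status into one row". As a sketch of that idea outside SQL, Python's `itertools.groupby` does exactly this kind of run grouping on sorted input. The function name `status_windows` is hypothetical; the rows are the ones inserted into #test in the question.

```python
from datetime import date
from itertools import groupby


def status_windows(rows):
    """rows: iterable of (id, status, date) tuples.

    Returns one (id, status, window_start, window_end, total) tuple per run
    of consecutive rows (in id/date order) that share the same id and status.
    """
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    out = []
    # groupby starts a new group whenever the key changes, i.e. a new island.
    for (id_, status), run in groupby(ordered, key=lambda r: (r[0], r[1])):
        dates = [r[2] for r in run]
        out.append((id_, status, dates[0], dates[-1], len(dates)))
    return out


rows = [
    (1, 1, date(2018, 1, 1)), (2, 1, date(2018, 1, 1)), (1, 1, date(2017, 11, 1)),
    (1, 2, date(2017, 10, 1)), (1, 1, date(2017, 9, 1)), (2, 2, date(2017, 1, 1)),
    (1, 1, date(2017, 8, 1)), (1, 1, date(2017, 7, 1)), (1, 1, date(2017, 6, 1)),
    (1, 2, date(2017, 5, 1)), (1, 1, date(2017, 4, 1)), (1, 1, date(2017, 3, 1)),
    (1, 1, date(2017, 1, 1)),
]
result = status_windows(rows)
for row in result:
    print(row)
```

For id 1 this gives windows with totals 3, 1, 4, 1, 2, and for id 2 two single-row windows in ascending date order.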

window function in redshift

I have some data that looks like this:
CustID EventID TimeStamp
1 17 1/1/15 13:23
1 17 1/1/15 14:32
1 13 1/1/25 14:54
1 13 1/3/15 1:34
1 17 1/5/15 2:54
1 1 1/5/15 3:00
2 17 2/5/15 9:12
2 17 2/5/15 9:18
2 1 2/5/15 10:02
2 13 2/8/15 7:43
2 13 2/8/15 7:50
2 1 2/8/15 8:00
I'm trying to use the row_number function to get it to look like this:
CustID EventID TimeStamp SeqNum
1 17 1/1/15 13:23 1
1 17 1/1/15 14:32 1
1 13 1/1/25 14:54 2
1 13 1/3/15 1:34 2
1 17 1/5/15 2:54 3
1 1 1/5/15 3:00 4
2 17 2/5/15 9:12 1
2 17 2/5/15 9:18 1
2 1 2/5/15 10:02 2
2 13 2/8/15 7:43 3
2 13 2/8/15 7:50 3
2 1 2/8/15 8:00 4
I tried this:
row_number () over
(partition by custID, EventID
order by custID, TimeStamp asc) SeqNum
but got this back:
CustID EventID TimeStamp SeqNum
1 17 1/1/15 13:23 1
1 17 1/1/15 14:32 2
1 13 1/1/25 14:54 3
1 13 1/3/15 1:34 4
1 17 1/5/15 2:54 5
1 1 1/5/15 3:00 6
2 17 2/5/15 9:12 1
2 17 2/5/15 9:18 2
2 1 2/5/15 10:02 3
2 13 2/8/15 7:43 4
2 13 2/8/15 7:50 5
2 1 2/8/15 8:00 6
How can I get it to sequence based on the change in the EventID?
This is tricky. You need a multi-step process. You need to identify the groups (a difference of row_number() works for this). Then, assign an increasing constant to each group. And then use dense_rank():
select sd.*, dense_rank() over (partition by custid order by mints) as seqnum
from (select sd.*,
min(timestamp) over (partition by custid, eventid, grp) as mints
from (select sd.*,
(row_number() over (partition by custid order by timestamp) -
row_number() over (partition by custid, eventid order by timestamp)
) as grp
from somedata sd
) sd
) sd;
Another method is to use lag() and a cumulative sum:
select sd.*,
sum(case when prev_eventid is null or prev_eventid <> eventid
then 1 else 0 end) over (partition by custid order by timestamp
) as seqnum
from (select sd.*,
lag(eventid) over (partition by custid order by timestamp) as prev_eventid
from somedata sd
) sd;
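To make the lag-plus-cumulative-sum version concrete, here is a short Python sketch of the same rule: within each customer, in timestamp order, bump the sequence number whenever the event id differs from the previous row. The function name `sequence_numbers` is made up, and one assumption is labeled in the code: the sample row dated 1/1/25 is treated as 1/1/15, since that is the only reading under which the expected output in the question is in timestamp order.

```python
from datetime import datetime


def sequence_numbers(rows):
    """rows: iterable of (cust_id, event_id, timestamp) tuples.

    Returns (cust_id, event_id, timestamp, seqnum), where seqnum restarts at 1
    for each customer and increases each time event_id changes.
    """
    out = []
    prev_cust = prev_event = None
    seq = 0
    for cust, event, ts in sorted(rows, key=lambda r: (r[0], r[2])):
        if cust != prev_cust:
            seq = 1             # first row for this customer
        elif event != prev_event:
            seq += 1            # event changed: start a new sequence number
        out.append((cust, event, ts, seq))
        prev_cust, prev_event = cust, event
    return out


rows = [
    (1, 17, datetime(2015, 1, 1, 13, 23)), (1, 17, datetime(2015, 1, 1, 14, 32)),
    # ASSUMPTION: the question shows 1/1/25 here; treated as 1/1/15.
    (1, 13, datetime(2015, 1, 1, 14, 54)), (1, 13, datetime(2015, 1, 3, 1, 34)),
    (1, 17, datetime(2015, 1, 5, 2, 54)), (1, 1, datetime(2015, 1, 5, 3, 0)),
    (2, 17, datetime(2015, 2, 5, 9, 12)), (2, 17, datetime(2015, 2, 5, 9, 18)),
    (2, 1, datetime(2015, 2, 5, 10, 2)), (2, 13, datetime(2015, 2, 8, 7, 43)),
    (2, 13, datetime(2015, 2, 8, 7, 50)), (2, 1, datetime(2015, 2, 8, 8, 0)),
]
result = sequence_numbers(rows)
```

Because event 17 appears twice for customer 1 with other events in between, it gets two different sequence numbers (1 and 3), which is why partitioning by EventID alone cannot work.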
EDIT:
The last time I used Amazon Redshift it didn't have row_number(). You can do:
select sd.*, dense_rank() over (partition by custid order by mints) as seqnum
from (select sd.*,
min(timestamp) over (partition by custid, eventid, grp) as mints
from (select sd.*,
(row_number() over (partition by custid order by timestamp rows between unbounded preceding and current row) -
row_number() over (partition by custid, eventid order by timestamp rows between unbounded preceding and current row)
) as grp
from somedata sd
) sd
) sd;
Try this code block:
WITH by_day
AS (SELECT
*,
ts::date AS login_day
FROM table_name)
SELECT
*,
login_day,
FIRST_VALUE(login_day) OVER (PARTITION BY userid ORDER BY login_day , userid rows unbounded preceding) AS first_day
FROM by_day