Rank customer transactions per segment in SQL Server - sql

I have the table below, which holds a customer's transaction details.
TransactionDate CustomerID
1/27/2022 1
1/29/2022 1
2/27/2022 1
3/27/2022 1
3/29/2022 1
3/31/2022 1
4/2/2022 1
4/4/2022 1
4/6/2022 1
In this table, consecutive transactions that occur within two days of each other are considered a segment.
For example, the transactions between Jan 27th and Jan 29th form segment 1, and the transactions between Mar 27th and Apr 6th form segment 2. I need to rank the transactions within each segment in date order. If a transaction does not fall under any segment, its rank defaults to 1. The expected output is below.
SegmentRank TransactionDate CustomerID
1 1/27/2022 1
2 1/29/2022 1
1 2/27/2022 1
1 3/27/2022 1
2 3/29/2022 1
3 3/31/2022 1
4 4/2/2022 1
5 4/4/2022 1
6 4/6/2022 1
Can somebody show me how to achieve this in T-SQL?

Use lag() to check whether each TransDate is within 2 days of the previous one, and group those rows together (as a segment). Then use row_number() to generate the required sequence:
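with
cte as
(
    select *,
        -- g = 0 when the transaction is within 2 days of the same customer's previous one, else 1
        g = case when datediff(day,
                               lag(t.TransDate) over (partition by t.CustomerID order by t.TransDate),
                               t.TransDate
                              ) <= 2
                 then 0
                 else 1
            end
    from tbl t
),
cte2 as
(
    -- the running total of g gives every segment its own group number
    select *, grp = sum(g) over (partition by CustomerID order by TransDate)
    from cte
)
select *, SegmentRank = row_number() over (partition by CustomerID, grp order by TransDate)
from cte2
order by CustomerID, TransDate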
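To check the lag()/sum()/row_number() idea end to end, here is a minimal sketch that runs the same logic through Python's built-in sqlite3 module. It assumes SQLite 3.25 or later (needed for window functions); julianday() stands in for T-SQL's datediff(day, ...), the table name tbl comes from the answer, and the ISO-format dates are a rewrite of the question's data. It partitions by CustomerID so that rows of different customers never land in the same segment.

```python
import sqlite3

# The question's rows, with dates rewritten as ISO-8601 text so that
# julianday() can do the date arithmetic and text order equals date order.
ROWS = [
    ("2022-01-27", 1), ("2022-01-29", 1), ("2022-02-27", 1),
    ("2022-03-27", 1), ("2022-03-29", 1), ("2022-03-31", 1),
    ("2022-04-02", 1), ("2022-04-04", 1), ("2022-04-06", 1),
]

QUERY = """
WITH cte AS (
    SELECT TransDate, CustomerID,
           -- 0 = within 2 days of the previous transaction of the same customer,
           -- 1 = starts a new segment (LAG is NULL on the first row, so it gets 1)
           CASE WHEN julianday(TransDate)
                     - julianday(LAG(TransDate) OVER (PARTITION BY CustomerID
                                                      ORDER BY TransDate)) <= 2
                THEN 0 ELSE 1 END AS g
    FROM tbl
),
cte2 AS (
    -- the running total of g gives every segment its own number
    SELECT *, SUM(g) OVER (PARTITION BY CustomerID ORDER BY TransDate) AS grp
    FROM cte
)
SELECT TransDate, CustomerID,
       ROW_NUMBER() OVER (PARTITION BY CustomerID, grp
                          ORDER BY TransDate) AS SegmentRank
FROM cte2
ORDER BY CustomerID, TransDate
"""


def segment_ranks(rows):
    """Load rows into an in-memory table; return (TransDate, CustomerID, SegmentRank)."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE tbl (TransDate TEXT, CustomerID INTEGER)")
        con.executemany("INSERT INTO tbl VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for trans_date, customer_id, rank in segment_ranks(ROWS):
    print(rank, trans_date, customer_id)
```

The ranks come out as 1, 2, 1, 1, 2, 3, 4, 5, 6, matching the expected output. Without the PARTITION BY CustomerID, a second customer's transaction on 1/28 would be pulled into customer 1's first segment and shift its ranks.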

Related

Sum values based on a 7-day cycle in Oracle SQL

I have dates and some values. I would like to sum the values within a 7-day cycle starting from the first date.
date value
01-01-2021 1
02-01-2021 1
05-01-2021 1
07-01-2021 1
10-01-2021 1
12-01-2021 1
13-01-2021 1
16-01-2021 1
18-01-2021 1
22-01-2021 1
23-01-2021 1
30-01-2021 1
This is my input data; it should form 4 groups, which shows how the 7-day cycles are built.
Each group should start with its first date and sum all values within the 7 days from that date, the first date included.
Then a new group starts with the next date and again covers 7 days (10-01 till 17-01), then another new group from 18-01 till 25-01, and so on.
So the output will be:
group1 4
group2 4
group3 3
group4 1
With MATCH_RECOGNIZE this would be easy, using current_day < first_day + 7 as the condition for the pattern, but please don't use the MATCH_RECOGNIZE clause as the solution!
One approach is a recursive CTE:
with tt as (
select dte, value, row_number() over (order by dte) as seqnum
from t
),
cte (dte, value, seqnum, firstdte) as (
select tt.dte, tt.value, tt.seqnum, tt.dte
from tt
where seqnum = 1
union all
select tt.dte, tt.value, tt.seqnum,
(case when tt.dte < cte.firstdte + interval '7' day then cte.firstdte else tt.dte end)
from cte join
tt
on tt.seqnum = cte.seqnum + 1
)
select firstdte, sum(value)
from cte
group by firstdte
order by firstdte;
This identifies the groups by the first date. You can use row_number() over (order by firstdte) if you want a number.
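A fixed-width bucket (date minus the first date, divided by 7) would not work here, because each new window starts at whatever date first falls outside the previous one; that is why the answer walks the rows one by one. Below is a minimal sketch of the same recursive CTE run through Python's sqlite3 module, assuming SQLite 3.25+ for ROW_NUMBER(). SQLite's date(x, '+7 days') replaces Oracle's + interval '7' day, and the dates are the question's values rewritten in ISO format.

```python
import sqlite3

# The question's dates (DD-MM-2021) rewritten as ISO text; every value is 1.
DATA = [(f"2021-01-{day:02d}", 1) for day in (1, 2, 5, 7, 10, 12, 13, 16, 18, 22, 23, 30)]

QUERY = """
WITH RECURSIVE tt AS (
    SELECT dte, value, ROW_NUMBER() OVER (ORDER BY dte) AS seqnum
    FROM t
),
cte (dte, value, seqnum, firstdte) AS (
    -- anchor member: the first row opens the first 7-day window
    SELECT dte, value, seqnum, dte
    FROM tt
    WHERE seqnum = 1
    UNION ALL
    -- recursive member: step to the next row and either keep the current
    -- window start or, once 7 days have passed, open a new window there
    SELECT tt.dte, tt.value, tt.seqnum,
           CASE WHEN tt.dte < date(cte.firstdte, '+7 days')
                THEN cte.firstdte ELSE tt.dte END
    FROM cte
    JOIN tt ON tt.seqnum = cte.seqnum + 1
)
SELECT firstdte, SUM(value)
FROM cte
GROUP BY firstdte
ORDER BY firstdte
"""


def seven_day_groups(rows):
    """Return (first date of the window, sum of value) for each 7-day window."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE t (dte TEXT, value INTEGER)")
        con.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for group_no, (first_date, total) in enumerate(seven_day_groups(DATA), start=1):
    print(f"group{group_no}", first_date, total)
```

The window opened on 01-10 ends before 01-17, so 01-16 is its last member and 01-18 opens the third group, giving the sums 4, 4, 3 and 1 from the question.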

How to use SQL to get column count for a previous date?

I have the following table:
id status price date
2 complete 10 2020-01-01 10:10:10
2 complete 20 2020-02-02 10:10:10
2 complete 10 2020-03-03 10:10:10
3 complete 10 2020-04-04 10:10:10
4 complete 10 2020-05-05 10:10:10
Required output:
id status_count price ratio
2 0 0 0
2 1 10 0
2 2 30 0.33
I am looking to add up the prices of the previous rows. Row 1 is 0 because it has no previous row.
Then find the ratio, i.e. 10/30 = 0.33.
You can use the analytic functions ROW_NUMBER and SUM as follows:
SELECT
id,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) - 1 AS status_count,
COALESCE(SUM(price) OVER (PARTITION BY id ORDER BY date), 0) - price as price
FROM yourTable;
I think you want something like this:
SELECT
id,
COUNT(*) OVER (PARTITION BY id ORDER BY date) - 1 AS status_count,
COALESCE(SUM(price) OVER (PARTITION BY id
ORDER BY date ROWS BETWEEN
UNBOUNDED PRECEDING AND 1 PRECEDING), 0) price
FROM yourTable;
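The key detail in the query above is the frame ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING, which stops one row short of the current one. Here is a minimal sketch that runs it on the question's rows with Python's sqlite3 module (assuming SQLite 3.25+ for window functions). The table name yourTable comes from the answers; the output alias prev_total is chosen here for clarity, and the ratio is left out because the question does not define it precisely.

```python
import sqlite3

# The question's rows: (id, status, price, date).
ROWS = [
    (2, "complete", 10, "2020-01-01 10:10:10"),
    (2, "complete", 20, "2020-02-02 10:10:10"),
    (2, "complete", 10, "2020-03-03 10:10:10"),
    (3, "complete", 10, "2020-04-04 10:10:10"),
    (4, "complete", 10, "2020-05-05 10:10:10"),
]

QUERY = """
SELECT id,
       -- rows so far in this id, minus the current one
       COUNT(*) OVER (PARTITION BY id ORDER BY "date") - 1 AS status_count,
       -- the frame ends one row before the current row, so only earlier prices
       -- are summed; the first row has an empty frame (NULL), hence COALESCE
       COALESCE(SUM(price) OVER (PARTITION BY id ORDER BY "date"
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
                0) AS prev_total
FROM yourTable
ORDER BY id, "date"
"""


def previous_totals(rows):
    """Return (id, status_count, prev_total) per row."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute('CREATE TABLE yourTable (id INTEGER, status TEXT, price INTEGER, "date" TEXT)')
        con.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in previous_totals(ROWS):
    print(row)
```

For id 2 this yields (0, 0), (1, 10) and (2, 30), the status_count and price columns of the required output.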
Please also check another method:
with cte
as (select *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) - 1 AS status_count,
SUM(price) OVER (PARTITION BY id ORDER BY date) ss from yourTable)
select id, status_count, isnull(ss, 0) - price price
from cte

Unable to resolve Rank Over Partition with multiple variables

I am trying to analyse a bunch of transaction data and have set up a series of different ranks to help me. The one I can't get right is the beneficiary rank. I want it to start a new partition wherever the beneficiary changes, chronologically rather than alphabetically.
Where the same beneficiary is paid from January to March and then again in June, I would like June to be classed as a separate 'session'.
I am using Teradata SQL if that makes a difference.
I thought the solution was going to be a DENSE_RANK, but if I PARTITION BY (CustomerID, Beneficiary) ORDER BY SystemDate, it counts up the number of months. If I PARTITION BY (CustomerID) ORDER BY Beneficiary, then it is not chronological; I need the highest rank to be the latest Beneficiary.
SELECT CustomerID, Beneficiary, Amount, SystemDate, Month
,RANK() OVER(PARTITION BY CustomerID ORDER BY SystemDate ASC) AS PaymentRank
,RANK() OVER(PARTITION BY CustomerID ORDER BY PaymentMonth ASC) AS MonthRank
,RANK() OVER(PARTITION BY CustomerID , Beneficiary ORDER BY SystemDate ASC) AS BeneficiaryRank
,RANK() OVER(PARTITION BY CustomerID , Beneficiary, ROUND(Amount, 0) ORDER BY SystemDate ASC) AS TransRank
FROM table ORDER BY CustomerID, PaymentRank
CustomerID Beneficiary Amount DateStamp Month PaymentRank MonthRank BeneficiaryRank TransactionRank
a aa 10 Jan 1 1 1 1
a aa 20 Feb 2 2 2 1
a aa 20 Mar 3 3 3 2
a aa 20 Apr 4 4 4 3
a bb 20 May 5 5 1 1
a bb 30 Jun 6 6 2 1
a aa 30 Jul 7 7 5 2
a aa 30 Aug 8 8 6 1
a cc 5 Sep 9 9 1 1
a cc 5 Oct 10 10 2 2
a cc 5 Nov 11 11 3 3
b cc 5 Dec 1 1 1 1
This is what I have so far. I want a column alongside it that looks like the one below:
CustomerID Beneficiary Amount DateStamp Month NewRank
a aa 10 Jan 1
a aa 20 Feb 1
a aa 20 Mar 1
a aa 20 Apr 1
a bb 20 May 2
a bb 30 Jun 2
a aa 30 Jul 3
a aa 30 Aug 3
a cc 5 Sep 4
a cc 5 Oct 4
a cc 5 Nov 4
b cc 5 Dec 1
This is a type of gaps-and-islands problem. I would recommend lag() and a cumulative sum:
select t.*,
sum(case when prev_systemdate > systemdate - interval '1' month then 0 else 1 end) over (partition by customerid, beneficiary order by systemdate)
from (select t.*,
lag(systemdate) over (partition by customerid, beneficiary order by systemdate) as prev_systemdate
from t
) t
Credits to @Gordon and @dnoeth for providing the ideas and code that got me on the right track.
The query below is mostly taken from dnoeth, but I needed to add ROWS UNBOUNDED PRECEDING to get the aggregation right; without it, the query just showed the total for the partition. I also ordered by PaymentRank instead of SystemDate, because I had to work around duplicate entries on the same day.
SELECT dt.*,
-- now do a Cumulative Sum over those 0/1
SUM(flag) OVER(PARTITION BY CustomerID ORDER BY PaymentRank ASC ROWS UNBOUNDED PRECEDING) AS NewRank
FROM
(
SELECT r.*
-- assign a 0 if current & previous Beneficiary are the same, otherwise 1
,CASE WHEN Beneficiary = MIN(Beneficiary) OVER (PARTITION BY CustomerID ORDER BY PaymentRank ASC ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) THEN 0 ELSE 1 END AS Flag
FROM
(
-- compute PaymentRank in a derived table so the windows above can order by it
SELECT CustomerID, Beneficiary, Amount, SystemDate, Month
,RANK() OVER(PARTITION BY CustomerID ORDER BY SystemDate ASC) AS PaymentRank
FROM table
) AS r
) AS dt
ORDER BY CustomerID, PaymentRank
The inner query sets a flag whenever the beneficiary changes. The outer query then does a cumulative sum on those.
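Here is a minimal sketch of that flag-then-cumulative-sum pattern, run through Python's sqlite3 module (assuming SQLite 3.25+ for window functions). It uses the LAG-based flag from dnoeth's version; the SystemDate values are hypothetical (the first of each month in an assumed year), since the question only shows month names, and the table name payments is made up for the example.

```python
import sqlite3

# The question's rows; SystemDate values are hypothetical (the question only
# shows month names), chosen so that date order equals the listed order.
PAYMENTS = [
    ("a", "aa", 10, "2020-01-01"), ("a", "aa", 20, "2020-02-01"),
    ("a", "aa", 20, "2020-03-01"), ("a", "aa", 20, "2020-04-01"),
    ("a", "bb", 20, "2020-05-01"), ("a", "bb", 30, "2020-06-01"),
    ("a", "aa", 30, "2020-07-01"), ("a", "aa", 30, "2020-08-01"),
    ("a", "cc", 5, "2020-09-01"), ("a", "cc", 5, "2020-10-01"),
    ("a", "cc", 5, "2020-11-01"), ("b", "cc", 5, "2020-12-01"),
]

QUERY = """
SELECT CustomerID, Beneficiary, SystemDate,
       -- cumulative sum of the 0/1 flags = session number
       SUM(flag) OVER (PARTITION BY CustomerID ORDER BY SystemDate
                       ROWS UNBOUNDED PRECEDING) AS NewRank
FROM (
    SELECT CustomerID, Beneficiary, SystemDate,
           -- 0 if the previous payment went to the same beneficiary, otherwise 1
           CASE WHEN Beneficiary = LAG(Beneficiary) OVER (PARTITION BY CustomerID
                                                         ORDER BY SystemDate)
                THEN 0 ELSE 1 END AS flag
    FROM payments
) AS dt
ORDER BY CustomerID, SystemDate
"""


def new_ranks(rows):
    """Return (CustomerID, Beneficiary, SystemDate, NewRank) per payment."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE payments (CustomerID TEXT, Beneficiary TEXT, "
                    "Amount INTEGER, SystemDate TEXT)")
        con.executemany("INSERT INTO payments VALUES (?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for customer, beneficiary, system_date, rank in new_ranks(PAYMENTS):
    print(customer, beneficiary, system_date, rank)
```

The July payment to aa gets a flag of 1 because the previous row was bb, so it opens session 3 instead of rejoining session 1, which is exactly the chronological behaviour the question asks for.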
I was unsure what UNBOUNDED PRECEDING was doing, and @dnoeth has a great explanation of it. The list below is taken from that explanation.
•UNBOUNDED PRECEDING, all rows before the current row -> fixed
•UNBOUNDED FOLLOWING, all rows after the current row -> fixed
•x PRECEDING, x rows before the current row -> relative
•y FOLLOWING, y rows after the current row -> relative
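To see these boundaries side by side, here is a small sketch that computes three different SUM frames over the numbers 1 to 4 with Python's sqlite3 module (assuming SQLite 3.25+). The CTE name nums and the column aliases are made up for the illustration. Because the default frame differs between databases, spelling out ROWS ... explicitly is the safer habit.

```python
import sqlite3

QUERY = """
WITH nums (n) AS (VALUES (1), (2), (3), (4))
SELECT n,
       -- UNBOUNDED PRECEDING .. CURRENT ROW: a running total
       SUM(n) OVER (ORDER BY n ROWS UNBOUNDED PRECEDING) AS running,
       -- UNBOUNDED PRECEDING .. UNBOUNDED FOLLOWING: the whole partition
       SUM(n) OVER (ORDER BY n ROWS BETWEEN UNBOUNDED PRECEDING
                                        AND UNBOUNDED FOLLOWING) AS whole,
       -- 1 PRECEDING .. 1 FOLLOWING: a relative window that moves with the row
       SUM(n) OVER (ORDER BY n ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS neighbours
FROM nums
ORDER BY n
"""


def frame_demo():
    """Return (n, running, whole, neighbours) for n = 1..4."""
    con = sqlite3.connect(":memory:")
    try:
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in frame_demo():
    print(row)
```

The running column grows 1, 3, 6, 10, the whole column is 10 on every row (the "total for the partition" symptom), and neighbours only ever sees up to three rows.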
SELECT dt.*,
-- now do a Cumulative Sum over those 0/1
SUM(flag)
OVER(PARTITION BY CustomerID
ORDER BY SystemDate ASC
,flag DESC -- needed if the order by columns are not unique
ROWS UNBOUNDED PRECEDING) AS NewRank
FROM
(
SELECT CustomerID, Beneficiary, Amount, SystemDate, Month
,RANK() OVER(PARTITION BY CustomerID ORDER BY SystemDate ASC) AS PaymentRank
,RANK() OVER(PARTITION BY CustomerID ORDER BY PaymentMonth ASC) AS MonthRank
,RANK() OVER(PARTITION BY CustomerID , Beneficiary ORDER BY SystemDate ASC) AS BeneficiaryRank
,RANK() OVER(PARTITION BY CustomerID , Beneficiary, ROUND(Amount, 0) ORDER BY SystemDate ASC) AS TransRank
-- assign a 0 if current & previous Beneficiary are the same, otherwise 1
,CASE WHEN Beneficiary = LAG(Beneficiary) OVER(PARTITION BY CustomerID ORDER BY SystemDate) THEN 0 ELSE 1 END AS flag
FROM table
) AS dt
ORDER BY CustomerID, PaymentRank
Your problem with Gordon's query is probably caused by your Teradata release: LAG is only supported in 16.10+. But there's a simple workaround:
LAG(Beneficiary) OVER(PARTITION BY CustomerID ORDER BY SystemDate)
--is equivalent to
MIN(Beneficiary) OVER(PARTITION BY CustomerID ORDER BY SystemDate
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
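The equivalence works because a frame of ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING holds only the previous row: MIN of a single value is that value, and on the first row of a partition the frame is empty, so MIN returns NULL just as LAG does. A quick check with Python's sqlite3 module (SQLite 3.25+ assumed, since it supports both forms); the sample rows are made up.

```python
import sqlite3

QUERY = """
WITH p (CustomerID, SystemDate, Beneficiary) AS (
    VALUES ('a', '2020-01-01', 'aa'),
           ('a', '2020-02-01', 'bb'),
           ('a', '2020-03-01', 'aa'),
           ('b', '2020-01-15', 'cc')
)
SELECT CustomerID, SystemDate,
       LAG(Beneficiary) OVER (PARTITION BY CustomerID ORDER BY SystemDate) AS via_lag,
       -- one-row frame containing only the previous row (empty on the first row)
       MIN(Beneficiary) OVER (PARTITION BY CustomerID ORDER BY SystemDate
                              ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS via_min
FROM p
ORDER BY CustomerID, SystemDate
"""


def compare_lag_and_min():
    """Return (CustomerID, SystemDate, via_lag, via_min) per row."""
    con = sqlite3.connect(":memory:")
    try:
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in compare_lag_and_min():
    print(row)
```

Both columns read None, aa, bb, None, including the reset to NULL when the partition switches to customer b.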

Counters in Teradata while inserting records

I am trying to insert records into a table in the format below:
Name Amount Date Counter
A 100 Jan 1 1
A 100 Jan 2 1
A 200 Jan 10 2
A 300 Mar 30 3
B 50 Jan 7 1
C 20 Jan 7 1
Could someone tell me the SQL for generating the value of the Counter field?
The counter value should increment whenever the amount changes and reset when the name changes.
What you need is a DENSE_RANK function. Unfortunately, it's not natively implemented before TD14.10, but it can be written using nested OLAP functions:
SELECT
Name
,Amount
,date_col
,SUM(flag)
OVER (PARTITION BY Name
ORDER BY date_col
ROWS UNBOUNDED PRECEDING) AS "DENSE_RANK"
FROM
(
SELECT
Name
,Amount
,date_col
,CASE
WHEN Amount = MIN(Amount)
OVER (PARTITION BY Name
ORDER BY date_col
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
THEN 0
ELSE 1
END AS flag
FROM dropme
) AS dt;
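Here is a minimal sketch of the nested-OLAP counter, run through Python's sqlite3 module (SQLite 3.25+ assumed), next to a native DENSE_RANK over Amount for comparison. The table and column names dropme, Name, Amount and date_col follow the answer; the year 2021 is an arbitrary assumption since the question only gives day and month.

```python
import sqlite3

# The question's rows; the year 2021 is an arbitrary assumption.
ROWS = [
    ("A", 100, "2021-01-01"), ("A", 100, "2021-01-02"),
    ("A", 200, "2021-01-10"), ("A", 300, "2021-03-30"),
    ("B", 50, "2021-01-07"), ("C", 20, "2021-01-07"),
]

QUERY = """
SELECT Name, Amount, date_col,
       -- running count of amount changes = the Counter column
       SUM(flag) OVER (PARTITION BY Name ORDER BY date_col
                       ROWS UNBOUNDED PRECEDING) AS counter,
       -- native DENSE_RANK for comparison: ranks by amount value, not by change
       DENSE_RANK() OVER (PARTITION BY Name ORDER BY Amount) AS dense_rank_by_amount
FROM (
    SELECT Name, Amount, date_col,
           -- 0 if the amount equals the previous row of the same name, otherwise 1
           CASE WHEN Amount = MIN(Amount) OVER (PARTITION BY Name ORDER BY date_col
                                                ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
                THEN 0 ELSE 1 END AS flag
    FROM dropme
) AS dt
ORDER BY Name, date_col
"""


def counters(rows):
    """Return (Name, Amount, date_col, counter, dense_rank_by_amount) per row."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE dropme (Name TEXT, Amount INTEGER, date_col TEXT)")
        con.executemany("INSERT INTO dropme VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in counters(ROWS):
    print(row)
```

On the question's data both columns give 1, 1, 2, 3, 1, 1. They would diverge if an amount came back later: a further row A 100 on 2021-04-01 gets counter 4 (the amount changed from 300) but dense rank 1, so the flag version is the one that follows "increment whenever the amount changes" literally.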

How to calculate moving sum with reset based on condition in teradata SQL?

I have this data and I want to sum the field USAGE_FLAG, but reset the sum when it drops to 0 or moves to a new ID, keeping the dataset ordered by SU_ID and WEEK:
SU_ID WEEK USAGE_FLAG
100 1 0
100 2 7
100 3 7
100 4 0
101 1 0
101 2 7
101 3 0
101 4 7
102 1 7
102 2 7
102 3 7
102 4 0
So I want to create this table:
SU_ID WEEK USAGE_FLAG SUM
100 1 0 0
100 2 7 7
100 3 7 14
100 4 0 0
101 1 0 0
101 2 7 7
101 3 0 0
101 4 7 7
102 1 7 7
102 2 7 14
102 3 7 21
102 4 0 0
I have tried the MSUM() function using GROUP BY, but it won't keep the order I want above. It groups the 7s and the week numbers together, which I don't want.
Does anyone know if this is possible? I'm using Teradata.
In standard SQL a running sum can be done using a windowing function:
select su_id,
week,
usage_flag,
sum(usage_flag) over (partition by su_id order by week) as running_sum
from the_table;
I know Teradata supports windowing functions; I just don't know whether it also supports an ORDER BY in the window definition.
Resetting the sum is a bit more complicated. You first need to create "group IDs" that change each time usage_flag goes to 0. The following works in PostgreSQL; I don't know whether it works in Teradata as well:
select su_id,
week,
usage_flag,
sum(usage_flag) over (partition by su_id, group_nr order by week) as running_sum
from (
select t1.*,
sum(group_flag) over (partition by su_id order by week) as group_nr
from (
select *,
case
when usage_flag = 0 then 1
else 0
end as group_flag
from the_table
) t1
) t2
order by su_id, week;
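Here is the same group-ID query run on the question's data through Python's sqlite3 module, as a sketch assuming SQLite 3.25+ for window functions. Like PostgreSQL, SQLite uses RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW as the default frame when ORDER BY is present, which is what turns both SUMs into running sums; on a database with a different default, add ROWS UNBOUNDED PRECEDING.

```python
import sqlite3

# (SU_ID, WEEK, USAGE_FLAG) rows from the question.
ROWS = [
    (100, 1, 0), (100, 2, 7), (100, 3, 7), (100, 4, 0),
    (101, 1, 0), (101, 2, 7), (101, 3, 0), (101, 4, 7),
    (102, 1, 7), (102, 2, 7), (102, 3, 7), (102, 4, 0),
]

QUERY = """
SELECT su_id, week, usage_flag,
       -- running sum restarted inside every (su_id, group_nr) island
       SUM(usage_flag) OVER (PARTITION BY su_id, group_nr ORDER BY week) AS running_sum
FROM (
    SELECT t1.*,
           -- every zero bumps the group number, so a zero opens a new island
           SUM(group_flag) OVER (PARTITION BY su_id ORDER BY week) AS group_nr
    FROM (
        SELECT *,
               CASE WHEN usage_flag = 0 THEN 1 ELSE 0 END AS group_flag
        FROM the_table
    ) t1
) t2
ORDER BY su_id, week
"""


def running_sums(rows):
    """Return (su_id, week, usage_flag, running_sum) per row."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE the_table (su_id INTEGER, week INTEGER, usage_flag INTEGER)")
        con.executemany("INSERT INTO the_table VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in running_sums(ROWS):
    print(*row)
```

The SUM column matches the desired table: 0, 7, 14, 0 for 100, then 0, 7, 0, 7 for 101 and 7, 14, 21, 0 for 102.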
Try the code below; with Teradata's RESET WHEN clause it works fine.
select su_id,
week,
usage_flag,
SUM(usage_flag) OVER (
PARTITION BY su_id
ORDER BY week
RESET WHEN usage_flag < /* preceding row */ SUM(usage_flag) OVER (
PARTITION BY su_id ORDER BY week
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
ROWS UNBOUNDED PRECEDING
)
from emp_su;
Please try the SQL below:
select su_id,
week,
usage_flag,
SUM(usage_flag) OVER (PARTITION BY su_id ORDER BY week
RESET WHEN usage_flag = 0
ROWS UNBOUNDED PRECEDING
)
from emp_su;
Here, RESET WHEN usage_flag = 0 resets the sum whenever usage_flag drops to 0.