TSQL Recursive CTE That Contains a SUM

I have a table called T that has three rows, as follows:
Id allocation multiplier multipliedallocation multipliedallocationsum
1 20.000 1.0008 20.016 100.052
2 50.000 1.0006 50.030 100.052
3 30.000 1.0002 30.006 100.052
I would like to use this table to produce some projection results that alter the allocation over a number of months as follows...
Id allocation multiplier multipliedallocation multipliedallocationsum mnth
1 20.000 1.0008 20.016 100.052 1
2 50.000 1.0006 50.030 100.052 1
3 30.000 1.0002 30.006 100.052 1
1 20.005 1.0008 20.021 100.050 2
2 50.003 1.0006 50.033 100.050 2
3 29.990 1.0002 29.996 100.050 2
1 20.011 1.0008 20.027 100.052 3
2 50.008 1.0006 50.038 100.052 3
3 29.981 1.0002 29.987 100.052 3
1 20.017 1.0008 20.033 100.054 4
2 50.012 1.0006 50.042 100.054 4
3 29.971 1.0002 29.979 100.054 4
etc
The multipliedallocation on any row = the allocation x multiplier.
The multipliedallocationsum on any row = the sum of the multipliedallocation for that month
The allocation for month n is the multipliedallocation for month n-1 divided by the multipliedallocationsum for month n-1, multiplied by 100
This is the recursive cte SQL statement I have come up with so far, but it is not doing what I expect...
WITH Alloc (Id, allocation, multiplier, multipliedallocation, mnth)
AS
(
SELECT Id, allocation, multiplier, allocation * multiplier AS multipliedallocation, 1 AS mnth
FROM T
),
AllocProjected (Id, allocation, multiplier, multipliedallocation, multipliedallocationsum, mnth)
AS
(
SELECT a.Id, a.allocation, a.multiplier, a.multipliedallocation,
SUM(a.multipliedallocation) OVER (PARTITION BY a.Id, a.mnth), 1
FROM Alloc a
UNION ALL
SELECT a.Id,
CONVERT(decimal(6,3), (a.multipliedallocation / a.multipliedallocationsum) * 100.0) AS allocation,
a.multiplier AS multiplier,
a.multiplier * (a.multipliedallocation / a.multipliedallocationsum) * 100.0 AS multipliedallocation,
SUM(a.multipliedallocation) OVER (PARTITION BY a.mnth) AS multipliedallocationsum,
a.mnth + 1
FROM AllocProjected a
WHERE a.mnth < 24
)
SELECT * FROM AllocProjected
ORDER BY mnth
The multipliedallocationsum only seems to show the multipliedallocation for a single row, not the sum of the multipliedallocation for the whole month (i.e. across all 3 rows).
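One observation on the query itself: in the anchor member, PARTITION BY a.Id, a.mnth puts every row in its own partition, so the windowed SUM can only return that row's own value. SQL Server also documents that aggregate and analytic functions in the recursive member are applied to the rows of the current recursion level rather than to the whole CTE result, which would be consistent with the single-row sums described.

As a cross-check on the three rules stated in the question, here is a minimal Python sketch (not T-SQL) that applies them month by month. The function name `project` and the dictionary keys are hypothetical, and floats are used instead of decimal(6,3), so the values should come out close to the table in the question but not necessarily identical in the last digit.

```python
def project(rows, months):
    """Apply the three rules from the question month by month.

    rows   -- list of (Id, allocation, multiplier) for month 1
    months -- number of months to project
    """
    out = []
    current = list(rows)
    for mnth in range(1, months + 1):
        # Rule 1: multipliedallocation = allocation x multiplier
        multiplied = [(i, a, m, a * m) for i, a, m in current]
        # Rule 2: multipliedallocationsum = sum over the whole month
        total = sum(ma for _, _, _, ma in multiplied)
        for i, a, m, ma in multiplied:
            out.append({
                "Id": i,
                "allocation": a,
                "multiplier": m,
                "multipliedallocation": ma,
                "multipliedallocationsum": total,
                "mnth": mnth,
            })
        # Rule 3: next allocation = multipliedallocation / sum * 100
        current = [(i, ma / total * 100.0, m) for i, _, m, ma in multiplied]
    return out


rows = [(1, 20.0, 1.0008), (2, 50.0, 1.0006), (3, 30.0, 1.0002)]
result = project(rows, 4)
for r in result:
    print(r["mnth"], r["Id"], round(r["allocation"], 3),
          round(r["multipliedallocation"], 3),
          round(r["multipliedallocationsum"], 3))
```

The point of the sketch is that the monthly sum has to be computed over all rows of a month before the next month's rows can be derived, which is the step the windowed SUM in the recursive member does not perform.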

Related

Snowflake SQL - Count Distinct Users within descending time interval

I want to count the distinct users over the last 60 days, then the distinct users over the last 59 days, and so on.
Ideally, the output would look like this (TARGET OUTPUT):
Day Distinct Users
60 200
59 200
58 188
57 185
56 180
[...] [...]
Here 60 days gives the maximum possible number of distinct users, 59 days gives the same or slightly fewer, and so on.
My query looks like this:
select
count(distinct (case when datediff(day,DATE,current_date) <= 60 then USER_ID end)) as day_60,
count(distinct (case when datediff(day,DATE,current_date) <= 59 then USER_ID end)) as day_59,
count(distinct (case when datediff(day,DATE,current_date) <= 58 then USER_ID end)) as day_58
FROM Table
The issue with my query is that it outputs the data by column instead of by row (as shown below) and, most importantly, I have to write out this logic 60 times, once for each of the 60 days.
Current Output:
Day_60 Day_59 Day_58
209 207 207
Is it possible to write the SQL in a way that creates the target as shown initially above?
Using the sample data below, in CTE form:
with data_cte(dates,userid) as
(select * from values
('2022-05-01'::date,'UID1'),
('2022-05-01'::date,'UID2'),
('2022-05-02'::date,'UID1'),
('2022-05-02'::date,'UID2'),
('2022-05-03'::date,'UID1'),
('2022-05-03'::date,'UID2'),
('2022-05-03'::date,'UID3'),
('2022-05-04'::date,'UID1'),
('2022-05-04'::date,'UID1'),
('2022-05-04'::date,'UID2'),
('2022-05-04'::date,'UID3'),
('2022-05-04'::date,'UID4'),
('2022-05-05'::date,'UID1'),
('2022-05-06'::date,'UID1'),
('2022-05-07'::date,'UID1'),
('2022-05-07'::date,'UID2'),
('2022-05-08'::date,'UID1')
)
Query to get the count and the distinct count of users for each date:
select dates,count(userid) cnt, count(distinct userid) cnt_d
from data_cte
group by dates;
DATES       CNT  CNT_D
2022-05-01  2    2
2022-05-02  2    2
2022-05-03  3    3
2022-05-04  5    4
2022-05-05  1    1
2022-05-06  1    1
2022-05-08  1    1
2022-05-07  2    2
Query to get the difference in days between each date and the current date:
select dates,datediff(day,dates,current_date()) ddiff,
count(userid) cnt,
count(distinct userid) cnt_d
from data_cte
group by dates;
DATES       DDIFF  CNT  CNT_D
2022-05-01  45     2    2
2022-05-02  44     2    2
2022-05-03  43     3    3
2022-05-04  42     5    4
2022-05-05  41     1    1
2022-05-06  40     1    1
2022-05-08  38     1    1
2022-05-07  39     2    2
To get only the records whose date difference is within a certain range, add a HAVING clause:
select datediff(day,dates,current_date()) ddiff,
count(userid) cnt,
count(distinct userid) cnt_d
from data_cte
group by dates
having ddiff<=43;
DDIFF  CNT  CNT_D
43     3    3
42     5    4
41     1    1
39     2    2
38     1    1
40     1    1
If you need to prefix 'day_' to each date-diff value, you can add an outer query over the previous result set and prepend the prefix to the date-diff column, as follows.
I am using CTE syntax, but you can use a sub-query instead when selecting from a table:
,cte_1 as (
select datediff(day,dates,current_date()) ddiff,
count(userid) cnt,
count(distinct userid) cnt_d
from data_cte
group by dates
having ddiff<=43)
select 'day_'||to_char(ddiff) days,
cnt,
cnt_d
from cte_1;
DAYS    CNT  CNT_D
day_43  3    3
day_42  5    4
day_41  1    1
day_39  2    2
day_38  1    1
day_40  1    1
Updated the answer to get the distinct user count over a range of days.
A filter clause (for example WHERE) can be added to the final query to limit the result to the number of days needed.
with data_cte(dates,userid) as
(select * from values
('2022-05-01'::date,'UID1'),
('2022-05-01'::date,'UID2'),
('2022-05-02'::date,'UID1'),
('2022-05-02'::date,'UID2'),
('2022-05-03'::date,'UID5'),
('2022-05-03'::date,'UID2'),
('2022-05-03'::date,'UID3'),
('2022-05-04'::date,'UID1'),
('2022-05-04'::date,'UID6'),
('2022-05-04'::date,'UID2'),
('2022-05-04'::date,'UID3'),
('2022-05-04'::date,'UID4'),
('2022-05-05'::date,'UID7'),
('2022-05-06'::date,'UID1'),
('2022-05-07'::date,'UID8'),
('2022-05-07'::date,'UID2'),
('2022-05-08'::date,'UID9')
),cte_1 as
(select datediff(day,dates,current_date()) ddiff,userid
from data_cte), cte_2 as
(select distinct ddiff from cte_1 )
select cte_2.ddiff,
(select count(distinct userid)
from cte_1 where cte_1.ddiff <= cte_2.ddiff) cnt
from cte_2
order by cte_2.ddiff desc
DDIFF  CNT
47     9
46     9
45     9
44     8
43     5
42     4
41     3
40     1
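To make the logic of the correlated subquery easier to follow, here is a minimal Python sketch of the same computation: for each distinct day difference, count the distinct users whose day difference is less than or equal to it. The function name `distinct_users_by_window` is hypothetical, and the reference date 2022-06-17 is an assumption chosen so that the day differences match the output above (the SQL uses current_date()).

```python
from datetime import date


def distinct_users_by_window(events, today):
    """Return [(ddiff, distinct_user_count)], largest window first.

    events -- iterable of (event_date, user_id)
    today  -- reference date used in place of current_date()
    """
    diffs = [((today - d).days, user) for d, user in events]
    result = []
    for ddiff in sorted({n for n, _ in diffs}, reverse=True):
        # Users seen within the last `ddiff` days (inclusive)
        users = {user for n, user in diffs if n <= ddiff}
        result.append((ddiff, len(users)))
    return result


events = [
    (date(2022, 5, 1), "UID1"), (date(2022, 5, 1), "UID2"),
    (date(2022, 5, 2), "UID1"), (date(2022, 5, 2), "UID2"),
    (date(2022, 5, 3), "UID5"), (date(2022, 5, 3), "UID2"),
    (date(2022, 5, 3), "UID3"),
    (date(2022, 5, 4), "UID1"), (date(2022, 5, 4), "UID6"),
    (date(2022, 5, 4), "UID2"), (date(2022, 5, 4), "UID3"),
    (date(2022, 5, 4), "UID4"),
    (date(2022, 5, 5), "UID7"), (date(2022, 5, 6), "UID1"),
    (date(2022, 5, 7), "UID8"), (date(2022, 5, 7), "UID2"),
    (date(2022, 5, 8), "UID9"),
]

counts = distinct_users_by_window(events, date(2022, 6, 17))
for ddiff, cnt in counts:
    print(ddiff, cnt)
```

Like the SQL, this rescans the data once per window, which is fine for 60 windows but may be slow on very large inputs.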
You can unpivot your current output to turn the columns into rows.
For example:
select
*
from (
select
209 Day_60,
207 Day_59,
207 Day_58
)unpivot ( cnt for days in (Day_60,Day_59,Day_58));

Select max of nested id from amazon redshift

My database is Amazon Redshift.
I have a table that looks like this -
id  nested_id  date          value
1   10         '2021-01-01'  5
1   20         '2021-01-01'  10
1   10         '2021-01-02'  6
1   20         '2021-01-02'  11
1   10         '2021-01-03'  7
1   20         '2021-01-03'  12
2   30         '2021-01-01'  5
2   40         '2021-01-01'  10
2   30         '2021-01-02'  6
2   40         '2021-01-02'  11
2   30         '2021-01-03'  7
2   40         '2021-01-03'  12
So this is basically a table that tracks values by id over time, except for every id there can be a nested_id. And the dates and values are primarily connected to the nested_id.
However, let's say I'm starting with the id field, but for each id I want to only return the points over time for the nested_id that has the greater sum of points.
So right now I'm just grabbing it like this...
select *
from mytable
where id in (1, 2)
except I only want it to return nested_id rows where the maximum value of that nested_id is the greatest.
So here's how I would do this manually.
For id of 1, the maximum value is 12, and the nested_id of that value is 20
For id of 2, the maximum value is 12, and the nested_id of that value is 40
So my return table should be
id  nested_id  date          value
1   20         '2021-01-01'  10
1   20         '2021-01-02'  11
1   20         '2021-01-03'  12
2   40         '2021-01-01'  10
2   40         '2021-01-02'  11
2   40         '2021-01-03'  12
Is there an easy way of performing this query? I'm assuming you have to partition somehow?
You can solve this with the row_number window function:
with maxs as (
select id,
nested_id,
value,
row_number() over (partition by id order by value desc) rn
from mytable
)
select mt.*
from mytable mt
left join maxs on mt.id = maxs.id and mt.nested_id = maxs.nested_id
where maxs.rn = 1
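The idea behind the query can be sketched in Python as two passes: first find, for each id, the nested_id that holds the largest value (the row with rn = 1), then keep every row belonging to that nested_id. The function name `rows_for_top_nested_id` is hypothetical, and on ties this sketch keeps the first nested_id encountered, whereas row_number() picks one arbitrarily.

```python
def rows_for_top_nested_id(rows):
    """Keep only rows of the nested_id that holds the max value per id.

    rows -- list of (id, nested_id, date, value)
    """
    best = {}  # id -> (max value seen so far, nested_id holding it)
    for id_, nested_id, _, value in rows:
        if id_ not in best or value > best[id_][0]:
            best[id_] = (value, nested_id)
    # Second pass: keep the full time series of the winning nested_id
    return [r for r in rows if best[r[0]][1] == r[1]]


rows = [
    (1, 10, "2021-01-01", 5), (1, 20, "2021-01-01", 10),
    (1, 10, "2021-01-02", 6), (1, 20, "2021-01-02", 11),
    (1, 10, "2021-01-03", 7), (1, 20, "2021-01-03", 12),
    (2, 30, "2021-01-01", 5), (2, 40, "2021-01-01", 10),
    (2, 30, "2021-01-02", 6), (2, 40, "2021-01-02", 11),
    (2, 30, "2021-01-03", 7), (2, 40, "2021-01-03", 12),
]

kept = rows_for_top_nested_id(rows)
for r in kept:
    print(r)
```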

How to Count Distinct for SAS PROC SQL with Rolling Date Window?

I have a SAS dataset that looks like the one below:
MEMBER_ID VAR1 VAR2 DATE
1 12 5 01/04/2020
1 12 5 02/06/2020
1 16 5 04/14/2020
1 12 7 09/10/2020
2 10 5 02/20/2020
2 10 6 04/20/2020
2 10 5 04/25/2020
2 10 5 05/15/2020
3 15 3 01/15/2020
3 16 4 01/25/2020
4 10 5 05/15/2020
5 11 7 03/03/2020
5 12 8 04/03/2020
5 13 9 05/03/2020
My goal is to count the distinct values in VAR1 and VAR2 grouped by MEMBER_ID and a rolling date range of 180 days. So if the date in row 2 is within the 180 days of row 1 for member 1, then they will be counted (distinctly). My current code looks as follows:
PROC SQL;
CREATE TABLE WORK.WANT AS
SELECT t1.MEMBER_ID,
t1.VAR1,
t1.VAR2,
t1.DATE,
/* var1Count */
(COUNT(DISTINCT(t1.VAR1))) FORMAT=10. LABEL="var1Count " WHERE (t1 BETWEEN t1.DATE- 180 AND t1.DATE) AS var1Count ,
/* var2Count */
(COUNT(DISTINCT(t1.VAR2))) FORMAT=10. LABEL="var2Count " WHERE (t1 BETWEEN t1.DATE- 180 AND t1.DATE) AS var2Count ,
FROM WORK.HAVE t1
GROUP BY t1.MEMBER_ID
HAVING (CALCULATED var1Count ) >= 2 AND (CALCULATED var2Count ) >= 2
ORDER BY t1.MEMBER_ID,
t1.DATE;
QUIT;
But while I think this WHERE statement in the column calculation may work for regular SQL code, it's giving me errors here. Any other ideas? It may be that I need to do this COUNT(DISTINCT VAR) in a different SAS data step, but I'm unsure (and fairly new to SAS for that matter). Any help at all is greatly appreciated!
I think you need to use correlated subqueries for this in SAS:
SELECT h.* ,
(SELECT COUNT(DISTINCT h2.VAR1)
FROM WORK.HAVE h2
WHERE h2.MEMBER_ID = h.MEMBER_ID AND
h2.DATE BETWEEN h.DATE - 180 AND h.DATE
) as var1count,
(SELECT COUNT(DISTINCT h2.VAR2)
FROM WORK.HAVE h2
WHERE h2.MEMBER_ID = h.MEMBER_ID AND
h2.DATE BETWEEN h.DATE - 180 AND h.DATE
) as var2count
FROM WORK.HAVE h;
If you want to filter on the counts, you can use a subquery.
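For readers who want to check the expected counts outside SAS, here is a minimal Python sketch of the same rolling-window logic: for each row, look back 180 days within the same MEMBER_ID and count distinct VAR1 and VAR2 values, then filter on both counts being at least 2 as in the question's HAVING clause. The function name `rolling_distinct` is hypothetical, and the dates are read as month/day/year, which is an assumption about the sample data.

```python
from datetime import date, timedelta


def rolling_distinct(rows, days=180):
    """Append rolling distinct counts of VAR1 and VAR2 to each row.

    rows -- list of (member_id, var1, var2, date)
    """
    out = []
    for member, var1, var2, d in rows:
        start = d - timedelta(days=days)
        # Same member, date between (d - days) and d inclusive
        window = [r for r in rows if r[0] == member and start <= r[3] <= d]
        var1_count = len({r[1] for r in window})
        var2_count = len({r[2] for r in window})
        out.append((member, var1, var2, d, var1_count, var2_count))
    return out


have = [
    (1, 12, 5, date(2020, 1, 4)), (1, 12, 5, date(2020, 2, 6)),
    (1, 16, 5, date(2020, 4, 14)), (1, 12, 7, date(2020, 9, 10)),
    (2, 10, 5, date(2020, 2, 20)), (2, 10, 6, date(2020, 4, 20)),
    (2, 10, 5, date(2020, 4, 25)), (2, 10, 5, date(2020, 5, 15)),
    (3, 15, 3, date(2020, 1, 15)), (3, 16, 4, date(2020, 1, 25)),
    (4, 10, 5, date(2020, 5, 15)),
    (5, 11, 7, date(2020, 3, 3)), (5, 12, 8, date(2020, 4, 3)),
    (5, 13, 9, date(2020, 5, 3)),
]

counted = rolling_distinct(have)
# Equivalent of filtering on both counts in an outer query
want = [r for r in counted if r[4] >= 2 and r[5] >= 2]
for r in want:
    print(r)
```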

Running assignment of values with break T-SQL

With the below table of data
Customer  Amount Billed  Amount Paid  Date
1         100            60           01/01/2000
1         100            40           01/02/2000
2         200            150          01/01/2000
2         200            30           01/02/2000
2         200            10           01/03/2000
2         200            15           01/04/2000
I would like to create the next two columns
Customer  Amount Billed  Amount Paid  Assigned  Remainder  Date
1         100            60           60        40         01/01/2000
1         100            40           40        0          01/02/2000
2         200            150          150       50         01/01/2000
2         200            30           30        20         01/02/2000
2         200            10           10        10         01/03/2000
2         200            15           10        -5         01/04/2000
The amount paid on each line should be removed from the amount billed and pushed onto the next line for the same customer. The process should continue until there are no more records or the remainder is < 0.
Is there a way of doing this without a cursor? Maybe a recursive CTE?
Thanks
As I mentioned in the comments, this is just a cumulative SUM:
WITH YourTable AS(
SELECT *
FROM (VALUES(1,100,60 ,CONVERT(date,'01/01/2000')),
(1,100,40 ,CONVERT(date,'01/02/2000')),
(2,200,150,CONVERT(date,'01/01/2000')),
(2,200,30 ,CONVERT(date,'01/02/2000')),
(2,200,10 ,CONVERT(date,'01/03/2000')),
(2,200,15 ,CONVERT(date,'01/04/2000')))V(Customer,AmountBilled,AmountPaid,[Date]))
SELECT Customer,
AmountBilled,
AmountPaid,
AmountBilled - SUM(AmountPaid) OVER (PARTITION BY Customer ORDER BY [Date] ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Remainder,
[Date]
FROM YourTable
ORDER BY Customer,
[Date];
Note that this returns -5 for the last row, not 5, as 200 - 205 = -5. If you want 5, wrap the whole expression in ABS().
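The cumulative-sum idea, plus the Assigned column that the question also asks for, can be sketched in Python as follows. The rule used for Assigned (the payment capped at whatever was still outstanding before the row, and never below zero) is an assumption inferred from the expected table in the question, and the function name `allocate` is hypothetical. Dates are written in ISO form, reading the originals as month/day/year.

```python
def allocate(payments):
    """Compute Assigned and Remainder per payment row.

    payments -- list of (customer, amount_billed, amount_paid, date)
    Returns rows of (customer, billed, paid, assigned, remainder, date).
    """
    out = []
    outstanding = {}  # customer -> amount still unpaid before the current row
    for customer, billed, paid, day in sorted(payments, key=lambda p: (p[0], p[3])):
        before = outstanding.get(customer, billed)
        # Assumed rule: assign at most what is still outstanding
        assigned = max(0, min(paid, before))
        # Remainder is billed minus the cumulative sum of payments
        remainder = before - paid
        outstanding[customer] = remainder
        out.append((customer, billed, paid, assigned, remainder, day))
    return out


payments = [
    (1, 100, 60, "2000-01-01"),
    (1, 100, 40, "2000-01-02"),
    (2, 200, 150, "2000-01-01"),
    (2, 200, 30, "2000-01-02"),
    (2, 200, 10, "2000-01-03"),
    (2, 200, 15, "2000-01-04"),
]

allocated = allocate(payments)
for row in allocated:
    print(row)
```

Because each row only needs the running total of earlier payments, no cursor or recursion is required; a single ordered pass is enough.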
You can achieve this using recursive CTE as well.
DECLARE @customer table (Customer int, AmountBilled int, AmountPaid int, PaidDate date)
insert into @customer
values
(1 ,100, 60 ,'01/01/2000')
,(1 ,100, 40 ,'01/02/2000')
,(2 ,200, 150 ,'01/01/2000')
,(2 ,200, 30 ,'01/02/2000')
,(2 ,200, 10 ,'01/03/2000')
,(2 ,200, 15 ,'01/04/2000');
;WITH CTE_CustomerRNK as
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY customer order by paiddate) AS RNK
from @customer),
CTE_Customer as
(
SELECT customer, AmountBilled, AmountPaid, (amountbilled-amountpaid) as remainder, paiddate ,RNK FROM CTE_CustomerRNK where rnk = 1
union all
SELECT r.customer, r.AmountBilled, r.AmountPaid, (c.remainder - r.AmountPaid) as remainder, r.PaidDate, r.rnk
FROM CTE_CustomerRNK as r
inner join CTE_Customer as c
on c.Customer = r.Customer
and r.rnk = c.rnk + 1
)
SELECT customer, AmountBilled, AmountPaid, remainder, paiddate
FROM CTE_Customer order by Customer
customer  AmountBilled  AmountPaid  remainder  paiddate
1         100           60          40         2000-01-01
1         100           40          0          2000-01-02
2         200           150         50         2000-01-01
2         200           30          20         2000-01-02
2         200           10          10         2000-01-03
2         200           15          -5         2000-01-04

How to find regions where total of their sale exceeded 60%

I have a table interest_summary with two columns:
int_rate number,
total_balance number
Example data:
10.25 50
10.50 100
10.75 240
11.00 20
My query should return two columns, or a string such as "10.50 to 10.75", because the balances for those rates together exceed 60% of the overall total balance.
Could you suggest the logic for this in Oracle?
select
min(int_rate),
max(int_rate)
from
(
select
int_rate,
nvl(sum(total_balance) over(
order by total_balance desc
rows between unbounded preceding and 1 preceding
),0) as part_sum
from interest_summary
)
where
part_sum < (select 0.6*sum(total_balance) from interest_summary)
I'm assuming that you're selecting the rows based on the following algorithm:
1. Sort your rows by total_balance (descending).
2. Select the highest total_balance row remaining.
3. If its total_balance added to the running total of the total balance is under 60%, add it to the pool and get the next row (step 2).
4. If not, add the row to the pool and return.
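The steps above can be sketched procedurally in Python, as a way to check the expected result before reading the SQL. The function name `rows_reaching_share` and its `share` parameter are hypothetical; the data is the sample from the question.

```python
def rows_reaching_share(rows, share=0.6):
    """Pick rows by descending balance until the running total reaches `share`.

    rows -- list of (int_rate, total_balance)
    Returns the int_rate values of the picked rows, in pick order.
    """
    total = sum(balance for _, balance in rows)
    picked = []
    running = 0
    # Largest balance first
    for rate, balance in sorted(rows, key=lambda r: r[1], reverse=True):
        picked.append(rate)
        running += balance
        if running >= share * total:
            # This row pushed the running total to the threshold: stop here
            break
    return picked


rows = [(10.25, 50), (10.50, 100), (10.75, 240), (11.00, 20)]
picked = rows_reaching_share(rows)
print(min(picked), "to", max(picked))
```

With the sample data the total is 410, so the threshold is 246: the first row (240) stays under it and the second row (running total 340) crosses it.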
The sorted running total looks like this (I'll number the rows so that it's easier to understand what happens):
WITH data AS (
    SELECT 1 id, 10.25 interest_rate, 50 total_balance FROM DUAL
    UNION ALL SELECT 2 id, 10.50 interest_rate, 100 total_balance FROM DUAL
    UNION ALL SELECT 3 id, 10.75 interest_rate, 240 total_balance FROM DUAL
    UNION ALL SELECT 4 id, 11.00 interest_rate, 20 total_balance FROM DUAL
)
SELECT id, interest_rate,
       SUM(total_balance) OVER (ORDER BY total_balance DESC) running_total,
       SUM(total_balance) OVER (ORDER BY total_balance DESC)
       / SUM(total_balance) OVER () * 100 pct_running_total
FROM data
ORDER BY 3;
ID INTEREST_RATE RUNNING_TOTAL PCT_RUNNING_TOTAL
---------- ------------- ------------- -----------------
3 10,75 240 58,5365853658537
2 10,5 340 82,9268292682927
1 10,25 390 95,1219512195122
4 11 410 100
So in this example we must return rows 3 and 2 because row 2 is the first row where its percent running total is above 60%:
WITH data AS (
    SELECT 1 id, 10.25 interest_rate, 50 total_balance FROM DUAL
    UNION ALL SELECT 2 id, 10.50 interest_rate, 100 total_balance FROM DUAL
    UNION ALL SELECT 3 id, 10.75 interest_rate, 240 total_balance FROM DUAL
    UNION ALL SELECT 4 id, 11.00 interest_rate, 20 total_balance FROM DUAL
)
SELECT ID, interest_rate
FROM (SELECT ID, interest_rate,
             SUM(over_limit)
                 OVER(ORDER BY total_balance DESC) over_limit_no
      FROM (SELECT id,
                   interest_rate,
                   total_balance,
                   CASE
                       WHEN SUM(total_balance)
                                OVER(ORDER BY total_balance DESC)
                            / SUM(total_balance) OVER() * 100 < 60 THEN
                           0
                       ELSE
                           1
                   END over_limit
            FROM data
            ORDER BY 3))
WHERE over_limit_no <= 1;
ID INTEREST_RATE
---------- -------------
3 10,75
2 10,5