Slab-based calculation in SQL

I want to implement slab-based logic.
This is my table, where the min_bucket and max_bucket ranges are defined along with the rate for each slab:
slabs   min_bucket  max_bucket  rate_per_month
-----------------------------------------------
Slab 1           0      300000              20
Slab 2      300000      500000              17
Slab 3      500000     1000000              14
Slab 4     1000000        NULL              13
The calculation should work like this: if there are 450k subs, the payout is 300k * 20 + 150k * 17.
If the total count is 1000001, then the output should be:
min_bucket  max_bucket  rate_per_month   Count  rate_per_month  revenue
------------------------------------------------------------------------
         0      300000              20  300000              20  6000000
    300000      500000              17  500000              17  8500000
    500000     1000000              14  200001              14  2800014
Count is calculated so that 300000 + 500000 + 200001 = 1000001, and revenue is calculated as rate_per_month * Count for each slab (e.g. 300000 * 20 = 6000000 for the first slab).
Can anyone help me write a SQL query that handles all of these cases?

You can build running totals of the slabs table and work with them:
with given as (select 1000001 as value)
, slabs as
(
  select
    slab,
    min_bucket,
    max_bucket,
    rate_per_month,
    -- running totals of the slab boundaries; the NULL max_bucket of the
    -- open-ended last slab is replaced by a very large number
    sum(min_bucket) over (order by min_bucket) as sum_min_bucket,
    sum(coalesce(max_bucket, 2147483647)) over (order by min_bucket) as sum_max_bucket
  from mytable
)
select
  slabs.slab,
  slabs.min_bucket,
  slabs.max_bucket,
  slabs.rate_per_month,
  -- a slab is either used completely or filled with the remainder
  case when slabs.sum_max_bucket <= given.value
       then slabs.max_bucket
       else given.value - slabs.sum_min_bucket
  end as used,
  case when slabs.sum_max_bucket <= given.value
       then slabs.max_bucket
       else given.value - slabs.sum_min_bucket
  end * slabs.rate_per_month as revenue
from given
join slabs on slabs.sum_min_bucket < given.value
order by slabs.min_bucket;
I don't know your DBMS, but this is standard SQL and likely to work for you (either right away or with a little tweak).
Demo: https://dbfiddle.uk/?rdbms=postgres_13&fiddle=9c4f5f837b6167c7e4f2f7e571f4b26f
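For reference, a minimal setup to try the query (PostgreSQL syntax as in the fiddle; the slab-name column is assumed to be called slab):

create table mytable (slab varchar(10), min_bucket integer, max_bucket integer, rate_per_month integer);
insert into mytable values ('Slab 1', 0, 300000, 20);
insert into mytable values ('Slab 2', 300000, 500000, 17);
insert into mytable values ('Slab 3', 500000, 1000000, 14);
insert into mytable values ('Slab 4', 1000000, null, 13);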

Related

How to create dynamic time intervals in SQL Snowflake?

I am writing a query that should recognize the dates when different budgets associated with one account were totally consumed. My start date is 1/12, so my task is to create a cumulative sum of the expenses the account made against its first budget; when the cumulative sum reaches the budget, I should store the date when that happened, and then use that date as my new START DATE to compute the expenses of the same account against the second budget, and so on. It seems difficult to me to create this dynamic interval based on the result of the previous budget. Do you have any idea how I could manage this problem?
What I Tried
WITH TRM_INTERVAL AS
(
    SELECT
        Country,
        replace(Account_ID, '-', '') AS fixed_accountid,
        TO_DATE(Date) AS CONVERTED_DATE,
        TO_DATE(CAST(CONVERTED_DATE AS VARCHAR), 'YYYY-DD-MM') AS CONVERTED_DATE_FORMAT,
        TRM2,
        TRM1,
        TOTAL_ARS_IN_ACCOUNT,
        row_number()
            over (partition by Account_ID, Country order by CONVERTED_DATE_FORMAT) AS ROW_NUMBER,
        lag(TOTAL_ARS_IN_ACCOUNT)
            over (partition by Account_ID, Country order by CONVERTED_DATE_FORMAT) AS PREVIOUS_ARS_BUCKET,
        lead(CONVERTED_DATE_FORMAT)
            over (partition by Account_ID, Country order by CONVERTED_DATE_FORMAT) AS NEXT_DATE
    FROM CONECTOR
),
-- DETAIL
DETAIL AS (
    SELECT
        TI.COUNTRY,
        TI.fixed_accountid,
        TI.PREVIOUS_ARS_BUCKET,
        TI.TOTAL_ARS_IN_ACCOUNT,
        TI.ROW_NUMBER,
        GO.EVENT_DATE,
        SUM(CASE WHEN GO.CURRENCY_ORIGINAL = 'ARS' THEN GO.FINAL_CAC * 160 END) AS ARS_SPEND,
        SUM(ARS_SPEND) OVER (PARTITION BY TI.fixed_accountid, TI.COUNTRY, TI.TOTAL_ARS_IN_ACCOUNT
                             ORDER BY GO.EVENT_DATE
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CUMULATIVE_ARS_SPEND
    FROM TABLE1 AS GO
    JOIN CONECTOR AS TI
      ON GO.ACCOUNT_ID = TI.fixed_accountid AND GO.COUNTRY = TI.COUNTRY
    WHERE GO.EVENT_DATE >= '2022-12-01'
    GROUP BY 1, 2, 3, 4, 5, 6
),
-- ACUMULADO (cumulative)
LOGIC AS (
    SELECT *,
        CUMULATIVE_ARS_SPEND / TOTAL_ARS_IN_ACCOUNT AS CUMULATIVE_ARS_SPEND_PERCENTAGE,
        -- how to validate this against the corresponding TOTAL_ARS from TI?
        CASE WHEN CUMULATIVE_ARS_SPEND_PERCENTAGE >= 0.686 AND EVENT_DATE <> CURRENT_DATE() THEN 'FLAG' END AS FLAGS
    FROM DETAIL
)
SELECT * FROM LOGIC;
I have 3 subqueries. To sum up, I create the cumulative sum of the expenses related to each account, but I could not create the dynamic intervals. The dates start from 12/1/2022.
What actually resulted
You see 3 budgets as an example. All of them are associated with a single account. I start with the budget of 500,000; you can see that the cumulative sum runs until it reaches 100%. But then, for the second budget of 30,000,000, the dates and the expenses are repeated: the same dates 1/12, 1/13, 1/14 and 1/15 were applied again. The idea is that if the first budget was totally consumed on 1/15, the query should compute the expenses for the second budget starting one day later, on 1/16, until that second budget is totally consumed as well, and so on.
DATE BUDGET SPEND_DAILY CUMULATIVE % CUM
01/12/2022 500000 100000 100000 20.00%
01/13/2022 500000 150000 250000 50.00%
01/14/2022 500000 200000 450000 90.00%
01/15/2022 500000 50000 500000 100.00%
01/12/2022 30000000 100000 100000 0.33%
01/13/2022 30000000 150000 250000 0.83%
01/14/2022 30000000 200000 450000 1.50%
01/15/2022 30000000 50000 500000 1.67%
01/12/2022 4000000 100000 100000 2.50%
01/13/2022 4000000 150000 250000 6.25%
01/14/2022 4000000 200000 450000 11.25%
01/15/2022 4000000 50000 500000 12.50%
What I expected to happen
The desired output is the table below, where the first budget is totally consumed on one day and, from the next day on, the cumulative sum is applied to the second budget.
DATE BUDGET SPEND_DAILY CUMULATIVE % CUM
01/12/2022 500000 100000 100000 20.00%
01/13/2022 500000 150000 250000 50.00%
01/14/2022 500000 200000 450000 90.00%
01/15/2022 500000 50000 500000 100.00%
01/16/2022 30000000 100000 100000 0.33%
01/17/2022 30000000 200000 300000 1.00%
01/18/2022 30000000 500000 800000 2.67%
01/19/2022 30000000 300000 1100000 3.67%
01/20/2022 30000000 150000 1250000 4.17%
01/21/2022 30000000 20000000 21250000 70.83%
01/22/2022 30000000 10000000 31250000 104.17%
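One way to attack this kind of "consume budgets in sequence" problem is a recursive CTE that carries the current budget number and the running spend from each day to the next. The following is only a sketch under assumed, simplified table names (daily_spend with one row per account per day, and budgets with a budget_seq numbering the budgets 1, 2, ... per account), not the question's actual schema:

with recursive days as (
    select account_id, spend_date, spend,
           row_number() over (partition by account_id order by spend_date) as rn
    from daily_spend
),
walk (account_id, rn, spend_date, spend, budget_seq, cum_spend) as (
    -- anchor: the first day of each account starts on budget 1
    select account_id, rn, spend_date, spend, 1, spend
    from days
    where rn = 1
    union all
    -- step: if the previous day's cumulative spend reached the budget,
    -- switch to the next budget and restart the cumulative sum
    select d.account_id, d.rn, d.spend_date, d.spend,
           case when w.cum_spend >= b.budget_amount then w.budget_seq + 1 else w.budget_seq end,
           case when w.cum_spend >= b.budget_amount then d.spend else w.cum_spend + d.spend end
    from walk w
    join budgets b
      on b.account_id = w.account_id and b.budget_seq = w.budget_seq
    join days d
      on d.account_id = w.account_id and d.rn = w.rn + 1
)
select w.spend_date, b.budget_amount as budget, w.spend as spend_daily,
       w.cum_spend as cumulative,
       round(100 * w.cum_spend / b.budget_amount, 2) as pct_cum
from walk w
join budgets b
  on b.account_id = w.account_id and b.budget_seq = w.budget_seq
order by w.account_id, w.spend_date;

Once the last budget of an account is consumed, the join to budgets finds no next budget and the recursion stops for that account, so any trailing days are dropped; adjust the step if they should still be reported.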

Window functions: PARTITION BY one column after ORDER BY another

Disclaimer: The problem shown here is much more general than I first expected. The example below is taken from a solution to another question. But I have since used this sample to solve many more problems, mostly related to time series (have a look at the "Linked" section in the right bar).
So I am trying to explain the problem more generally first:
I am using PostgreSQL, but I am sure this problem exists in other DBMSs that support window functions (MS SQL Server, Oracle, ...) as well.
Window functions can be used to group certain values together by a common attribute or value. For example, you can group rows by a date. Then you are able to calculate the max value within every single date, an average value, a row count, or whatever.
This can be achieved by defining a PARTITION. Grouping by dates would work with PARTITION BY date_column. If you then want to do an operation which needs a special order within your groups (calculating row numbers or summing a column), you can use PARTITION BY date_column ORDER BY an_attribute_column.
Now think about a finer resolution of time series. What if you do not have dates but timestamps? Then you cannot group by the time column anymore. Nevertheless it might be important to analyse the data in the order it was added (maybe the timestamp is the creation time of your data set). Then you realize that some consecutive rows have the same value, and you want to group your data by this common value. But the catch is that the rows have different timestamps.
The problem is that a plain PARTITION BY value_column will not do: it puts all rows with equal values into one partition regardless of their position in the timestamp order, so the data is no longer processed in the order of the timestamps. This yields results you are not expecting.
More generally speaking: the problem is to ensure a special ordering even if the ordered column is not part of the created partition.
Example:
db<>fiddle
I have the following table:
ts val
100000 50
130100 30050
160100 60050
190200 100
220200 30100
250200 30100
300000 300
500000 100
550000 1000
600000 1000
650000 2000
700000 2000
720000 2000
750000 300
I had the problem that I had to group all tied values of the column val, but I wanted to keep the order by ts. To achieve this I wanted to add a column with a unique ID per val group.
Expected result:
ts val group
100000 50 1
130100 30050 2
160100 60050 3
190200 100 4
220200 30100 5 \ same group
250200 30100 5 /
300000 300 6
500000 100 7
550000 1000 8 \ same group
600000 1000 8 /
650000 2000 9 \
700000 2000 9 | same group
720000 2000 9 /
750000 300 10
My first try used the rank window function, which would normally do this job:
SELECT
*,
rank() OVER (PARTITION BY val ORDER BY ts)
FROM
test
But in this case this doesn't work, because PARTITION BY val puts all tied val values into one partition regardless of their timestamps, and the ORDER BY ts only applies within each partition. So the effective grouping follows val, not the expected order by ts, and the result was of course not the expected one:
ts val rank
100000 50 1
190200 100 1
500000 100 2
300000 300 1
750000 300 2
550000 1000 1
600000 1000 2
650000 2000 1
700000 2000 2
720000 2000 3
130100 30050 1
220200 30100 1
250200 30100 2
160100 60050 1
The question is: how do I get the group IDs with respect to the order by ts?
Edit: I added my own solution below, but I feel very uncomfortable with it. It seems way too complicated. I was wondering if there's a better way to achieve this result.
I came up with this solution by myself (hoping someone else will come up with a better one):
demo:db<>fiddle
order by ts
fetch the previous val value with the lag window function (https://www.postgresql.org/docs/current/static/tutorial-window.html)
check whether the previous and the current value are the same; print out 0 if they are, 1 otherwise
sum up these values with an ordered SUM. This generates the groups I am looking for: they group the val column but keep the ordering by the ts column.
The query:
SELECT
  *,
  SUM(is_diff) OVER (ORDER BY ts)
FROM (
  SELECT
    *,
    CASE WHEN val = lag(val) OVER (ORDER BY ts) THEN 0 ELSE 1 END as is_diff
  FROM test
) s
The result:
ts val is_diff sum
100000 50 1 1
130100 30050 1 2
160100 60050 1 3
190200 100 1 4
220200 30100 1 5 \ group
250200 30100 0 5 /
300000 300 1 6
500000 100 1 7
550000 1000 1 8 \ group
600000 1000 0 8 /
650000 2000 1 9 \
700000 2000 0 9 | group
720000 2000 0 9 /
750000 300 1 10
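To reproduce the result, a minimal setup (PostgreSQL, with the table name test used in the queries):

create table test (ts integer, val integer);
insert into test (ts, val) values
    (100000, 50), (130100, 30050), (160100, 60050), (190200, 100),
    (220200, 30100), (250200, 30100), (300000, 300), (500000, 100),
    (550000, 1000), (600000, 1000), (650000, 2000), (700000, 2000),
    (720000, 2000), (750000, 300);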

How to calculate rate - avg(rate) for each range?

This is a follow-up question to How to find max records for given range.
I have several ranges from human input, like 1-100 and 101-1000. For each range, I want to calculate rate - avg(rate) over that range.
Input:
Distance Rate
10 5
25 200
50 300
1000 5
2000 2000
Output:
Distance rate - avg(rate) for each range
10 x
25 xx
50 xx
1000 xx
2000 xxx
You need to define the ranges and then use window functions. This is pretty easy:
select t.distance, t.rate, v.grp,
(t.rate - avg(t.rate) over (partition by v.grp)) as deviation
from t outer apply
(values (case when t.distance <= 100 then '1-100'
when t.distance <= 1000 then '101-1000'
else 'other'
end)
) v(grp);
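The VALUES ... OUTER APPLY construct above is SQL Server syntax. If your DBMS does not support APPLY, the same grouping can be expressed with a plain derived table (a sketch, assuming the same table t):

select x.distance, x.rate, x.grp,
       x.rate - avg(x.rate) over (partition by x.grp) as deviation
from (select t.*,
             case when t.distance <= 100 then '1-100'
                  when t.distance <= 1000 then '101-1000'
                  else 'other'
             end as grp
      from t
     ) x;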

Cumulative Compound Interest Calculation (Oracle Database 11g Release 2)

I have a requirement to calculate rolling compound interest on several accounts in PL/SQL, and I was looking for help/advice on how to script these calculations. The calculations I need are in the last two columns of the output below (INTERESTAMOUNT and RUNNINGTOTAL). I found similar examples on here, but nothing specifically fitting these requirements in PL/SQL. I am also new to CTE/recursive techniques, and the MODEL technique I found required a fixed number of iterations, which would be variable in this case. Please see my problem below:
Calculations:
INTERESTAMOUNT = (previous year's RUNNINGTOTAL + current year's AMOUNT) * INTEREST_RATE
RUNNINGTOTAL = (previous year's RUNNINGTOTAL + current year's AMOUNT) * (1 + INTEREST_RATE) - current year's EXPENSES
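For example, for account 1 in 2002 (no prior running total): INTERESTAMOUNT = 1000 * 0.05315 = 53.15 and RUNNINGTOTAL = 1000 * (1 + 0.05315) - 70 = 983.15, matching the first row of the desired output below.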
Input Table:
YEAR ACCT_ID AMOUNT INTEREST_RATE EXPENSES
2002 1 1000 0.05315 70
2003 1 1500 0.04213 80
2004 1 800 0.03215 75
2005 1 950 0.02563 78
2000 2 750 0.07532 79
2001 2 600 0.06251 75
2002 2 300 0.05315 70
Desired Output:
YEAR ACCT_ID AMOUNT INTEREST_RATE EXPENSES INTERESTAMOUNT RUNNINGTOTAL
2002 1 1000 0.05315 70 53.15 983.15
2003 1 1500 0.04213 80 104.62 2507.77
2004 1 800 0.03215 75 106.34 3339.11
2005 1 950 0.02563 78 109.93 4321.04
2000 2 750 0.07532 79 56.49 727.49
2001 2 600 0.06251 75 82.98 1335.47
2002 2 300 0.05315 70 86.93 1652.4
One way to do it is with a recursive cte.
with rownums as (select t.*
,row_number() over(partition by acct_id order by yr) as rn
from t) -- t is your tablename
,cte(rn,yr,acct_id,amount,interest_rate,expenses,running_total,interest_amount) as
(select rn,yr,acct_id,amount,interest_rate,expenses
,(amount*(1+interest_rate))-expenses
,amount*interest_rate
from rownums
where rn=1
union all
select t.rn,t.yr,t.acct_id,t.amount,t.interest_rate,t.expenses
,((c.running_total+t.amount)*(1+t.interest_rate))-t.expenses
,(c.running_total+t.amount)*t.interest_rate
from cte c
join rownums t on t.acct_id=c.acct_id and t.rn=c.rn+1
)
select * from cte
Sample Demo
Generate row numbers using the row_number function.
Calculate the interest and running total of the first row for each acct_id (the anchor of the recursive CTE).
Join every row to the next one (in ascending order of the year column) for each acct_id and compute the running total and interest for the subsequent rows.
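For reference, a minimal setup to try this (Oracle syntax; note the answer assumes the year column is named yr rather than YEAR):

create table t (yr number, acct_id number, amount number, interest_rate number, expenses number);
insert into t values (2002, 1, 1000, 0.05315, 70);
insert into t values (2003, 1, 1500, 0.04213, 80);
insert into t values (2004, 1, 800, 0.03215, 75);
insert into t values (2005, 1, 950, 0.02563, 78);
insert into t values (2000, 2, 750, 0.07532, 79);
insert into t values (2001, 2, 600, 0.06251, 75);
insert into t values (2002, 2, 300, 0.05315, 70);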

Grouping of Similar data by amount in Oracle

I have a txn table with columns ac_id and txn_amt. It stores transaction amounts along with account IDs. Below is an example of the data:
AC_ID TXN_AMT
10 1000
10 1000
10 1010
10 1030
10 5000
10 5010
10 10000
20 32000
20 32200
20 5000
I want to write a query such that all amounts within a 10% range of the previous amount are grouped together. The output should be something like this:
AC_ID TOTAL_AMT TOTAL_CNT GROUP
10 4040 4 1
10 10010 2 2
20 64200 2 3
20 5000 1 4
I tried with the LAG function but am still clueless. This is the code snippet I tried:
select ac_id, txn_amt,
       round(((txn_amt - lag(txn_amt, 1) over (partition by ac_id order by txn_amt)) / txn_amt) * 100, 2) as amt_diff_pct
from txn;
Any clue or help will be highly appreciated.
If by previous you mean "the largest amount less than", then you can do this. You can find where the gaps are (i.e. larger than a 10% difference). Then you can assign a group by counting the number of gaps:
select ac_id, sum(txn_amt) as total_amt, count(*) as total_cnt, grp
from (select t.*,
sum(case when prev_txn_amt * 1.1 > txn_amt then 0 else 1 end) over
(partition by ac_id order by txn_amt) as grp
from (select t.*,
lag(txn_amt) over (partition by ac_id order by txn_amt) as prev_txn_amt
from txn t
) t
) t
group by ac_id, grp;
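For reference, a minimal setup to try the query (Oracle syntax, with the txn table from the question):

create table txn (ac_id number, txn_amt number);
insert into txn values (10, 1000);
insert into txn values (10, 1000);
insert into txn values (10, 1010);
insert into txn values (10, 1030);
insert into txn values (10, 5000);
insert into txn values (10, 5010);
insert into txn values (10, 10000);
insert into txn values (20, 32000);
insert into txn values (20, 32200);
insert into txn values (20, 5000);

Note that with this data the query also returns a group for account 10's single 10000 transaction, which the expected output above omits.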