Presto running cumulative sum with limit or cap - sql

I am trying to add a column to my table with a running sum that restarts once a limit is reached. It should look like this (in this example the limit is 500):
x    output
100  100
200  300
100  400
300  300
200  500
200  200
Another option is to create some kind of id for each batch of values whose sum is at most 500:
x    output
100  1
200  1
100  1
300  2
200  2
200  3
I am using SUM(x) OVER () but cannot find a way to restart the sum once it reaches 500.
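The reset rule implied by the two tables above can be sketched in plain Python. This is not Presto SQL, only an illustration of the logic. It assumes the sum restarts whenever adding the next value would push it above the cap, so a batch may reach exactly 500. The function name capped_running_sum is made up for this example.

```python
def capped_running_sum(values, cap):
    """Return (running_sums, batch_ids).

    Assumption: the running sum restarts at the current value whenever
    adding it would exceed `cap`; reaching `cap` exactly is allowed.
    """
    running, batch_ids = [], []
    total, batch = 0, 0
    for x in values:
        if batch == 0 or total + x > cap:
            # Start a new batch with the current value.
            total = x
            batch += 1
        else:
            total += x
        running.append(total)
        batch_ids.append(batch)
    return running, batch_ids


xs = [100, 200, 100, 300, 200, 200]
running, batch_ids = capped_running_sum(xs, 500)
print(running)    # [100, 300, 400, 300, 500, 200]
print(batch_ids)  # [1, 1, 1, 2, 2, 3]
```

Each row depends on the state left by the previous row, which is why a plain windowed SUM cannot express the restart on its own.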

Related

How to calculate leftovers of each balance top-up using first in first out technique?

Imagine we have user balances. There's a table with top-up and withdrawals. Let's call it balance_updates.
transaction_id  user_id  current_balance  amount  created_at
1               1        100              100     ...
2               1        0                -100
3               2        400              400
4               2        300              -100
5               2        200              -200
6               2        300              100
7               2        50               -50
What I want to get out of this is a list of top-ups and their leftovers, using the first in, first out technique for each user.
So the result could be this:
top_up  user_id  leftover
1       1        0
3       2        50
6       2        100
Honestly, I struggle to turn this into SQL, though I know how to do it on paper. Any ideas?
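The on-paper procedure can be written down as a small Python sketch before attempting it in SQL. It assumes the rows arrive in chronological order and that a withdrawal larger than the remaining balance simply stops at zero. The function name fifo_leftovers and the tuple layout are illustrative, not part of the original table.

```python
from collections import defaultdict, deque


def fifo_leftovers(updates):
    """updates: (transaction_id, user_id, amount) tuples in time order.

    Returns (top_up_id, user_id, leftover) for every top-up, where each
    withdrawal consumes the user's oldest remaining top-up first.
    """
    queues = defaultdict(deque)  # user_id -> top-ups not yet used up
    top_ups = []                 # every top-up, in original order
    for tx_id, user_id, amount in updates:
        if amount > 0:
            entry = [tx_id, user_id, amount]
            queues[user_id].append(entry)
            top_ups.append(entry)
            continue
        to_withdraw = -amount
        queue = queues[user_id]
        while to_withdraw > 0 and queue:
            oldest = queue[0]
            used = min(oldest[2], to_withdraw)
            oldest[2] -= used
            to_withdraw -= used
            if oldest[2] == 0:
                queue.popleft()  # fully consumed, move to the next top-up
    return [tuple(entry) for entry in top_ups]


balance_updates = [
    (1, 1, 100), (2, 1, -100),
    (3, 2, 400), (4, 2, -100), (5, 2, -200), (6, 2, 100), (7, 2, -50),
]
print(fifo_leftovers(balance_updates))
# [(1, 1, 0), (3, 2, 50), (6, 2, 100)]
```

The printed result matches the expected table above: top-up 3 absorbs all three of user 2's withdrawals, so top-up 6 is left untouched.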

Turn MultiIndex Series into pivot table design by unique value counts

Sample Data:
Date,code
06/01/2021,405
06/01/2021,405
06/01/2021,400
06/02/2021,200
06/02/2021,300
06/03/2021,500
06/02/2021,500
06/03/2021,300
06/05/2021,500
06/04/2021,500
06/03/2021,400
06/02/2021,400
06/04/2021,400
06/03/2021,400
06/01/2021,400
06/04/2021,200
06/05/2021,200
06/02/2021,200
06/06/2021,300
06/04/2021,300
06/06/2021,300
06/05/2021,400
06/03/2021,400
06/04/2021,400
06/04/2021,500
06/01/2021,200
06/02/2021,300
import pandas as pd
df = pd.read_csv("testfile.csv")
code_total = df.groupby(by="Date",)['code'].value_counts()
print(code_total)
Date        code
06/01/2021  400     2
            405     2
            200     1
06/02/2021  200     2
            300     2
            400     1
            500     1
06/03/2021  400     3
            300     1
            500     1
06/04/2021  400     2
            500     2
            200     1
            300     1
06/05/2021  200     1
            400     1
            500     1
06/06/2021  300     2
dates = set([x[0] for x in code_total.index])
codes = set([x[1] for x in code_total.index])
test = pd.DataFrame(code_total,columns=sorted(codes),index=sorted(dates))
print(test)
Is there a way to turn the second index level into columns and keep the counts as values? Ultimately I'm trying to plot the count of each error code per date on a line graph. I've tried many different approaches but am always missing something. Any help would be appreciated.
Use Series.unstack:
df = df.groupby(by="Date",)['code'].value_counts().unstack(fill_value=0)
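To show the shape that unstack(fill_value=0) produces, here is a plain-Python sketch of the same reshaping. It deliberately avoids pandas and is not how pandas implements it. It uses only the 06/01/2021 and 06/06/2021 rows from the sample data, and pivot_counts is a made-up helper name.

```python
from collections import Counter

rows = [
    ("06/01/2021", 405), ("06/01/2021", 405), ("06/01/2021", 400),
    ("06/01/2021", 400), ("06/01/2021", 200),
    ("06/06/2021", 300), ("06/06/2021", 300),
]


def pivot_counts(rows):
    """Count (date, code) pairs, then spread the codes out into columns."""
    counts = Counter(rows)  # (date, code) -> number of occurrences
    dates = sorted({date for date, _ in counts})
    codes = sorted({code for _, code in counts})
    # Missing (date, code) combinations become 0, like fill_value=0.
    table = {d: {c: counts.get((d, c), 0) for c in codes} for d in dates}
    return codes, table


codes, table = pivot_counts(rows)
print("Date      ", *codes)
for date, row in table.items():
    print(date, *(row[code] for code in codes))
```

Each date becomes one row and each code one column, which is the layout a line plot of counts per code needs.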

Pandas - Converting columns in percentage based on first columns value

There is a data frame with totals and counts:
pd.DataFrame({
'categorie':['a','b','c'],
'total':[100,1000,500],
'x':[10,100,5],
'y':[100,1000,500]
})
categorie  total  x    y
a          100    10   100
b          1000   100  1000
c          500    5    500
I would like to convert the count columns into percentages of the totals:
categorie  total  x%  y%
a          100    10  100
b          1000   10  100
c          500    1   100
The following works for a single series:
(100 * df['x'] / df['total']).round(1)
How can I apply this to all columns in the data frame?
Try the div(), mul() and astype() methods:
df[['x%','y%']]=df[['x','y']].div(df['total'],axis=0).mul(100).astype(int)
output of df:
categorie total x y x% y%
0 a 100 10 100 10 100
1 b 1000 100 1000 10 100
2 c 500 5 500 1 100
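One thing to keep in mind: casting to int truncates the fractional part, whereas the question used round(1). The plain-Python sketch below shows the difference on a single value. The 2-out-of-3 figures are invented for illustration and do not come from the data frame above, and both function names are hypothetical.

```python
def pct_truncated(count, total):
    """Percentage with the fractional part cut off, like an int cast."""
    return int(100 * count / total)


def pct_rounded(count, total, ndigits=1):
    """Percentage rounded to `ndigits` decimals, like round(1) in the question."""
    return round(100 * count / total, ndigits)


print(pct_truncated(2, 3))   # 66
print(pct_rounded(2, 3))     # 66.7
print(pct_truncated(5, 500), pct_rounded(5, 500))  # 1 1.0
```

For the sample data every percentage is a whole number, so both approaches agree there.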

How to Calculate Percentages for Groups in SQL

I have a table that looks something like this
Class ID Value
A 1 300
A 2 200
A 3 500
B 1 300
B 2 300
C 1 1000
Is there a way of using SQL to calculate the percentage share each ID has within its class?
For example, the percentages for class A would be 30% to id 1, 20% to ID 2, and 50% to id 3 and so on for the other classes:
Class ID Value Percentage
A 1 300 30%
A 2 200 20%
A 3 500 50%
B 1 300 50%
B 2 300 50%
C 1 1000 100%
You can use window functions (if your database, which you did not disclose, supports them):
select
t.*,
1.0 * value / sum(value) over(partition by class) ratio
from mytable t
This gives you a ratio, that is, a value between 0 and 1. I find this more useful than a percentage, but you can multiply it by 100 if you like.
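What the window function computes can be spelled out in a short Python sketch: first the total per class, then each row's value divided by its class total. This only illustrates the calculation and is not a replacement for the query. The function name share_per_class is made up.

```python
from collections import defaultdict

rows = [
    ("A", 1, 300), ("A", 2, 200), ("A", 3, 500),
    ("B", 1, 300), ("B", 2, 300),
    ("C", 1, 1000),
]


def share_per_class(rows):
    """Append value / (sum of value within the same class) to each row."""
    totals = defaultdict(int)
    for cls, _id, value in rows:
        totals[cls] += value  # the "partition by class" total
    return [(cls, id_, value, value / totals[cls]) for cls, id_, value in rows]


for cls, id_, value, ratio in share_per_class(rows):
    print(cls, id_, value, f"{ratio:.0%}")
```

Formatting the ratio with `:.0%` gives the percentage strings from the question, such as 30% for class A, ID 1.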

Fifo Method using SQL

I need to adapt the first table because it contains negative issued points. I need a net table that treats each negative amount as a debit against the account's earliest issue. For example:
Date of issue  Number of account  Issued points
30-abr         1                  300
31-may         1                  50
30-jun         1                  100
30-jun         1                  -50
30-abr         2                  200
31-may         2                  60
I want this table:
Date of issue  Number of account  Issued points
30-abr         1                  250
31-may         1                  50
30-jun         1                  100
30-abr         2                  200
31-may         2                  60
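The netting rule described above can be sketched in Python to pin down the expected behaviour before writing the SQL. Two assumptions go beyond the question: an issue reduced to zero stays in the output, and any debit larger than the account's total is silently dropped. The function name net_issued_points is hypothetical.

```python
def net_issued_points(issues):
    """issues: (date, account, points) tuples in issue order.

    Negative rows are removed and deducted from the same account's
    earliest positive issues first (first in, first out).
    """
    # Keep only the positive issues; these rows make up the net table.
    net = [[date, account, points] for date, account, points in issues if points > 0]
    for _date, account, points in issues:
        debit = -points if points < 0 else 0
        for row in net:
            if debit == 0:
                break
            if row[1] == account:
                used = min(row[2], debit)
                row[2] -= used
                debit -= used
    return [tuple(row) for row in net]


issues = [
    ("30-abr", 1, 300), ("31-may", 1, 50), ("30-jun", 1, 100), ("30-jun", 1, -50),
    ("30-abr", 2, 200), ("31-may", 2, 60),
]
for row in net_issued_points(issues):
    print(*row)
```

With the sample rows, the -50 for account 1 is taken from the 30-abr issue, leaving 250, and account 2 is unchanged.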