I am trying to add a column to my table with a running sum that restarts once a limit is reached. It should look like this (for example, the limit here is 500):
x    output
100  100
200  300
100  400
300  300
200  500
200  200
Another option is to create some kind of ID for each batch of values whose sum stays within the limit of 500:
x    output
100  1
200  1
100  1
300  2
200  2
200  3
I am using SUM(x) OVER (), but I cannot find a way to restart the sum once it reaches 500.
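A plain window SUM cannot restart on its own, because each row's result depends on where the previous restart happened. One way to express that dependency is a recursive CTE that carries the running total from row to row. The following is only a minimal sketch, not an answer for any particular DBMS: it runs the query in SQLite through Python's sqlite3 module, and the table name t and the gap-free ordering column id are assumptions, since the question does not say how the rows are ordered.

```python
import sqlite3

LIMIT = 500  # restart threshold taken from the example

conn = sqlite3.connect(":memory:")
# Hypothetical table: `id` is an assumed gap-free ordering column (1, 2, 3, ...).
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, x INTEGER)")
conn.executemany(
    "INSERT INTO t (id, x) VALUES (?, ?)",
    enumerate([100, 200, 100, 300, 200, 200], start=1),
)

query = """
WITH RECURSIVE run(id, x, running, batch) AS (
    -- anchor: the first row opens batch 1
    SELECT id, x, x, 1 FROM t WHERE id = 1
    UNION ALL
    -- step: add the next row, or restart when the limit would be exceeded
    SELECT t.id, t.x,
           CASE WHEN run.running + t.x > :limit THEN t.x
                ELSE run.running + t.x END,
           CASE WHEN run.running + t.x > :limit THEN run.batch + 1
                ELSE run.batch END
    FROM run JOIN t ON t.id = run.id + 1
)
SELECT x, running, batch FROM run ORDER BY id
"""
rows = conn.execute(query, {"limit": LIMIT}).fetchall()
for row in rows:
    print(row)
```

The running column corresponds to the first desired output and batch to the second. A recursive CTE walks the table one row at a time, so it may be slow on large tables.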
I have a pandas df as follows:
User Amount Type
100 10 Check
100 20 Cash
100 30 Paypal
200 50 Venmo
200 50 Cash
200 50 Check
300 20 Zelle
300 15 Zelle
300 15 Zelle
I want to organize it such that my end result is as follows:
User  Cash  Check  Paypal  Venmo  Zelle
100   1     1      1
200   1     1              1
300                               3
I am looking to count the number of times a user has transacted through each unique method.
If a user didn't transact through a method, I want to either leave it blank or set it to 0.
How can I do this? I tried df.groupby() but am not sure of the next step...
Thanks!
You are looking for crosstab:
pd.crosstab(df['User'], df['Type']).reset_index().rename_axis('',axis=1)
Output:
   User  Cash  Check  Paypal  Venmo  Zelle
0   100     1      1       1      0      0
1   200     1      1       0      1      0
2   300     0      0       0      0      3
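Since the question started from groupby, here is a small sketch that rebuilds the sample frame and shows the same counts obtained both ways. The variable names counts and via_groupby are only illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "User": [100, 100, 100, 200, 200, 200, 300, 300, 300],
    "Amount": [10, 20, 30, 50, 50, 50, 20, 15, 15],
    "Type": ["Check", "Cash", "Paypal", "Venmo", "Cash", "Check",
             "Zelle", "Zelle", "Zelle"],
})

# crosstab counts each (User, Type) pair; pairs that never occur become 0
counts = pd.crosstab(df["User"], df["Type"])

# The groupby route: count rows per pair, then pivot Type into columns
via_groupby = df.groupby(["User", "Type"]).size().unstack(fill_value=0)

print(counts)
```

Both frames keep User as the index; reset_index(), as in the answer, turns it back into a regular column.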
Imagine we have user balances. There's a table with top-ups and withdrawals. Let's call it balance_updates.
transaction_id  user_id  current_balance  amount  created_at
1               1        100              100     ...
2               1        0                -100
3               2        400              400
4               2        300              -100
5               2        200              -200
6               2        300              100
7               2        50               -50
What I want to get out of this is a list of top-ups and their leftovers, using the first-in, first-out (FIFO) technique for each user.
So the result could be this:
top_up  user_id  leftover
1       1        0
3       2        50
6       2        100
Honestly, I struggle to turn this into SQL, though I know how to do it on paper. Got any ideas?
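One possible reading of the paper method: under FIFO, a user's withdrawals always eat into the earliest top-ups first, so a top-up's leftover is the part of the running top-up total that exceeds the user's total withdrawals, capped at the top-up's own amount. The sketch below tries that idea in SQLite through Python's sqlite3 module. It rests on several assumptions: withdrawals never exceed the balance available at the time, transaction_id stands in for chronological order because created_at is not shown, and the SQLite build supports window functions (3.25 or later). The two-argument MIN and MAX are SQLite scalars; other databases typically spell them LEAST and GREATEST.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE balance_updates ("
    "transaction_id INTEGER PRIMARY KEY, user_id INTEGER, "
    "current_balance INTEGER, amount INTEGER)"
)
# Rows from the question; created_at is omitted because it is elided there.
conn.executemany(
    "INSERT INTO balance_updates VALUES (?, ?, ?, ?)",
    [
        (1, 1, 100, 100),
        (2, 1, 0, -100),
        (3, 2, 400, 400),
        (4, 2, 300, -100),
        (5, 2, 200, -200),
        (6, 2, 300, 100),
        (7, 2, 50, -50),
    ],
)

query = """
WITH top_ups AS (
    -- running total of top-ups per user, in transaction order
    SELECT transaction_id, user_id, amount,
           SUM(amount) OVER (PARTITION BY user_id
                             ORDER BY transaction_id) AS cum_top_up
    FROM balance_updates
    WHERE amount > 0
),
withdrawn AS (
    -- everything a user has ever withdrawn, as a positive number
    SELECT user_id, -SUM(amount) AS total_withdrawn
    FROM balance_updates
    WHERE amount < 0
    GROUP BY user_id
)
SELECT t.transaction_id AS top_up,
       t.user_id,
       -- clamp the unconsumed part to the range [0, amount]
       MAX(0, MIN(t.amount,
                  t.cum_top_up - COALESCE(w.total_withdrawn, 0))) AS leftover
FROM top_ups t
LEFT JOIN withdrawn w ON w.user_id = t.user_id
ORDER BY t.transaction_id
"""
leftovers = conn.execute(query).fetchall()
for row in leftovers:
    print(row)
```

The query never reads current_balance; it works from the amount column alone.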
Sample Data:
Date,code
06/01/2021,405
06/01/2021,405
06/01/2021,400
06/02/2021,200
06/02/2021,300
06/03/2021,500
06/02/2021,500
06/03/2021,300
06/05/2021,500
06/04/2021,500
06/03/2021,400
06/02/2021,400
06/04/2021,400
06/03/2021,400
06/01/2021,400
06/04/2021,200
06/05/2021,200
06/02/2021,200
06/06/2021,300
06/04/2021,300
06/06/2021,300
06/05/2021,400
06/03/2021,400
06/04/2021,400
06/04/2021,500
06/01/2021,200
06/02/2021,300
import pandas as pd

# the file name must be passed as a quoted string
df = pd.read_csv("testfile.csv")
# count how often each code occurs on each date
code_total = df.groupby("Date")["code"].value_counts()
print(code_total)
Date        code
06/01/2021  400     2
            405     2
            200     1
06/02/2021  200     2
            300     2
            400     1
            500     1
06/03/2021  400     3
            300     1
            500     1
06/04/2021  400     2
            500     2
            200     1
            300     1
06/05/2021  200     1
            400     1
            500     1
06/06/2021  300     2
dates = set([x[0] for x in code_total.index])
codes = set([x[1] for x in code_total.index])
test = pd.DataFrame(code_total,columns=sorted(codes),index=sorted(dates))
print(test)
Is there a way to transpose the second index level into columns and retain the counts as values? Ultimately I'm trying to plot the count of each error code per date on a line graph. I've tried many different approaches but am always missing something. Any help would be appreciated.
Use Series.unstack:
df = df.groupby(by="Date",)['code'].value_counts().unstack(fill_value=0)
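As a quick check of what unstack does here, the sketch below runs it on the first eight rows of the sample data, read from an in-memory string instead of testfile.csv. The name wide is only illustrative.

```python
import io

import pandas as pd

# First eight rows of the sample data, inlined so the example is self-contained
csv_text = """Date,code
06/01/2021,405
06/01/2021,405
06/01/2021,400
06/02/2021,200
06/02/2021,300
06/03/2021,500
06/02/2021,500
06/03/2021,300
"""
df = pd.read_csv(io.StringIO(csv_text))

# value_counts gives a Series indexed by (Date, code);
# unstack moves the code level into columns and fills missing pairs with 0
wide = df.groupby("Date")["code"].value_counts().unstack(fill_value=0)
print(wide)
```

The result has one row per date and one column per code, which is the shape DataFrame.plot() expects for drawing one line per code (plotting needs matplotlib installed).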
I have a table called Product Variant.
sequence No item
400 1 4.5
500 1 0
501 1 0
502 1 0
503 1 B-DP
504 2 0
400 1 2.5
500 2 0
501 2 0
502 2 0
503 2 B-PP
504 2 0
My required output is:
sequence No item item1
503 1 B-DP 4.5
503 2 B-PP 2.5
I have tried, but the result is not what I expected. Can anyone suggest an approach, please?
Thanks in advance.
Something like this?
select max(case when item like 'B%' then sequence end) as sequence,
       no,
       max(case when item like 'B%' then item end) as item,
       sum(try_convert(numeric(38, 6), item)) as item1
from t
group by no;