Pandas group by and row count by category

I have a pandas df as follows:
User Amount Type
100 10 Check
100 20 Cash
100 30 Paypal
200 50 Venmo
200 50 Cash
200 50 Check
300 20 Zelle
300 15 Zelle
300 15 Zelle
I want to organize it so that my end result is as follows:
User  Cash  Check  Paypal  Venmo  Zelle
100   1     1      1
200   1     1              1
300                               3
I want to count the number of times each user has transacted through each unique method.
If a user didn't transact through a method, I want to either leave it blank or set it to 0.
How can I do this? I tried df.groupby() but am not sure of the next step...
Thanks!

You are looking for crosstab:
pd.crosstab(df['User'], df['Type']).reset_index().rename_axis('',axis=1)
Output:
   User  Cash  Check  Paypal  Venmo  Zelle
0   100     1      1       1      0      0
1   200     1      1       0      1      0
2   300     0      0       0      0      3
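Since the question mentions getting stuck after groupby, the same table can probably also be reached from there. The sketch below rebuilds the sample data from the question and counts rows per (User, Type) pair with groupby, size and unstack. It is shown only as an alternative route; the names counts and via_crosstab are made up for this example.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "User": [100, 100, 100, 200, 200, 200, 300, 300, 300],
    "Amount": [10, 20, 30, 50, 50, 50, 20, 15, 15],
    "Type": ["Check", "Cash", "Paypal", "Venmo", "Cash", "Check",
             "Zelle", "Zelle", "Zelle"],
})

# Count rows per (User, Type) pair, then move the Type level into columns.
# fill_value=0 puts a 0 where a user never used a method.
counts = (
    df.groupby(["User", "Type"])
      .size()
      .unstack(fill_value=0)
)

# The crosstab call from the answer, kept here for comparison.
via_crosstab = pd.crosstab(df["User"], df["Type"])

print(counts)
```

Both objects are indexed by User; calling reset_index() on either one turns User back into a regular column, as in the answer.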

Related

Pandas DataFrame subtract values

I'm new to Python.
I have a data frame (df) which has the following structure:
ID  rate  Sequential number
a   150   1
a   150   1
a   50    2
b   250   1
c   25    1
d   25    1
d   40    2
d   30    3
The IDs are customers, the rate values are monthly rates, and Sequential number is a number that increases by 1 whenever the customer changes the monthly rate.
I want to do the following:
For every ID, find the maximum value in the column Sequential number and take the associated value in the column rate, then find the minimum value in the column Sequential number and take its associated rate, and subtract the second rate from the first.
At the end I want an additional column in my data frame with the difference of the rates. Maybe the loop could do the following:
for each ID in df:
    rate at max(Sequential number) - rate at min(Sequential number)
    return the difference
The new df_new should look like this:
ID  rate  Sequential number  rate_diff
a   150   1                  0
a   150   1                  0
a   50    2                  -100
b   250   1                  0
c   25    1                  0
d   25    1                  0
d   40    2                  0
d   30    3                  5
If an ID has only one entry, the rate_diff should be 0.
I already tried a lambda function:
df['rate_diff'] = df.groupby('ID')['rate'].transform(lambda x : x-x.min())
but this returns:
ID  rate  Sequential number  rate_diff
a   150   1                  100
a   150   1                  100
a   50    2                  0
b   250   1                  0
c   25    1                  0
d   25    1                  0
d   40    2                  15
d   30    3                  10
Maybe one of you has a small workaround for this! :-)

One approach with indexing:
g = df.groupby('ID')['Sequential number']
IMAX = g.idxmax()
IMIN = g.idxmin()
df['rate_diff'] = 0
df.loc[IMAX, 'rate_diff'] = (df.loc[IMAX, 'rate'].to_numpy()
                             - df.loc[IMIN, 'rate'].to_numpy()
                             )
Another with groupby.transform + where:
g = df.sort_values(by=['ID', 'Sequential number']).groupby('ID')
m = g['Sequential number'].idxmax()
df['rate_diff'] = (g['rate'].transform(lambda x: x.iloc[-1] - x.iloc[0])
                   .where(df.index.isin(m), 0)
                   )
Output:
  ID  rate  Sequential number  rate_diff
0  a   150                  1          0
1  a   150                  1          0
2  a    50                  2       -100
3  b   250                  1          0
4  c    25                  1          0
5  d    25                  1          0
6  d    40                  2          0
7  d    30                  3          5
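For anyone who wants to reproduce this, here is a self-contained sketch of a third variant that computes one difference per ID and writes it only onto the last row of each ID. It assumes the last row for d has rate 30, as in the expected output and in the output above; the names ordered, diff and last_idx are made up for this example.

```python
import pandas as pd

# Data from the question (rate 30 assumed for the last "d" row).
df = pd.DataFrame({
    "ID": ["a", "a", "a", "b", "c", "d", "d", "d"],
    "rate": [150, 150, 50, 250, 25, 25, 40, 30],
    "Sequential number": [1, 1, 2, 1, 1, 1, 2, 3],
})

# Order rows by ID and sequence so first/last mean lowest/highest sequence.
ordered = df.sort_values(["ID", "Sequential number"])
rates = ordered.groupby("ID")["rate"]

# One value per ID: rate at the highest sequence minus rate at the lowest.
diff = rates.last() - rates.first()

# Index labels of the last row of each ID in the ordered frame.
last_idx = ordered.groupby("ID").tail(1).index

# Default to 0, then fill in the difference on those last rows only.
df["rate_diff"] = 0
df.loc[last_idx, "rate_diff"] = df.loc[last_idx, "ID"].map(diff)

print(df)
```

An ID with a single row gets 0 automatically, because its first and last rate are the same value.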

How to calculate leftovers of each balance top-up using first in first out technique?

Imagine we have user balances. There's a table with top-ups and withdrawals. Let's call it balance_updates.
transaction_id  user_id  current_balance  amount  created_at
1               1        100              100     ...
2               1        0                -100
3               2        400              400
4               2        300              -100
5               2        200              -200
6               2        300              100
7               2        50               -50
What I want to get from this is a list of top-ups and their leftovers, using the first in, first out technique for each user.
So the result could be this:
top_up  user_id  leftover
1       1        0
3       2        50
6       2        100
Honestly, I'm struggling to turn this into SQL, though I know how to do it on paper. Any ideas?
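This is not the SQL the question asks for, but the on-paper procedure can be sketched in Python to pin down the expected behaviour. The sketch assumes rows arrive in creation order, uses only the amount column, and silently ignores any withdrawal that exceeds the available top-ups. The function name fifo_leftovers is hypothetical.

```python
from collections import defaultdict, deque

# (transaction_id, user_id, amount) rows from the question, in creation order.
balance_updates = [
    (1, 1, 100), (2, 1, -100),
    (3, 2, 400), (4, 2, -100), (5, 2, -200), (6, 2, 100), (7, 2, -50),
]

def fifo_leftovers(updates):
    """Return [(top_up_id, user_id, leftover)] using first in, first out."""
    # user_id -> queue of [top_up_id, remaining amount], oldest first
    queues = defaultdict(deque)
    for tx_id, user_id, amount in updates:
        if amount > 0:
            # A top-up opens a new lot at the back of the user's queue.
            queues[user_id].append([tx_id, amount])
            continue
        # A withdrawal drains the oldest lots first.
        to_withdraw = -amount
        for lot in queues[user_id]:
            if to_withdraw == 0:
                break
            taken = min(lot[1], to_withdraw)
            lot[1] -= taken
            to_withdraw -= taken
    return [(tx_id, user_id, left)
            for user_id, lots in queues.items()
            for tx_id, left in lots]

leftovers = fifo_leftovers(balance_updates)
print(leftovers)  # [(1, 1, 0), (3, 2, 50), (6, 2, 100)]
```

The printed list matches the result table in the question, which at least suggests the FIFO rule is being read the way the asker intends.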

SQL Query to turn my columns into rows

I'm looking for a way to make this happen:
From this:
id  date     budget  forecast  actual
0   1/1/111  100     200       5
1   2/1/111  10      20        3
2   3/1/111  300     5000      1
3   4/1/111  400     800       0
To this:
column1   column2    column3    column4    column5
id        0          1          2          3
date      1/11/1111  2/11/1111  3/11/1111  4/11/1111
budget    100        10         300        400
forecast  200        20         5000       800
actual    5          3          1          0
Is there any way of doing this in SQL?
Thanks in advance.
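One possible approach, sketched here with Python's built-in sqlite3 module because the question does not name a database: emit one SELECT per source column, use conditional aggregation to spread the values across output columns, and glue the SELECTs together with UNION ALL. The table name t is hypothetical, and the sketch assumes the set of id values is known before the statement is built.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name "t"; rows are the "From this" data in the question.
conn.execute(
    "CREATE TABLE t (id INTEGER, date TEXT, budget INTEGER, "
    "forecast INTEGER, actual INTEGER)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", [
    (0, "1/1/111", 100, 200, 5),
    (1, "2/1/111", 10, 20, 3),
    (2, "3/1/111", 300, 5000, 1),
    (3, "4/1/111", 400, 800, 0),
])

columns = ["id", "date", "budget", "forecast", "actual"]
ids = [0, 1, 2, 3]  # assumed to be known in advance

def transposed_row_sql(col):
    """Build one SELECT that returns a source column as a single row."""
    cells = ", ".join(
        f"MAX(CASE WHEN id = {i} THEN {col} END)" for i in ids
    )
    return f"SELECT '{col}', {cells} FROM t"

# One SELECT per source column, stacked with UNION ALL.
sql = " UNION ALL ".join(transposed_row_sql(c) for c in columns)
transposed = {row[0]: row[1:] for row in conn.execute(sql)}

print(transposed["budget"])  # (100, 10, 300, 400)
```

The statement is built with string formatting only because the column names and ids come from fixed lists in the script. With a changing number of rows, the SQL would have to be regenerated each time, since the number of output columns is fixed when the statement is written.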

Fifo Method using SQL

I need to adapt the first table because it contains negative issued points. I need a net table that treats the negative points as a debit against the earliest date of issue. E.g.:
Date of issue  Number of account  Issued points
30-abr         1                  300
31-may         1                  50
30-jun         1                  100
30-jun         1                  -50
30-abr         2                  200
31-may         2                  60
I want this table:
Date of issue  Number of account  Issued points
30-abr         1                  250
31-may         1                  50
30-jun         1                  100
30-abr         2                  200
31-may         2                  60
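The netting rule can be sketched in Python to make the expected result concrete; it is not SQL. The sketch assumes rows are already in issue order within each account and that every negative amount is debited against that account's earliest issues, whatever its own date, as the example implies. The function name net_issues is hypothetical.

```python
# (date of issue, account number, issued points) rows from the question.
issues = [
    ("30-abr", 1, 300), ("31-may", 1, 50), ("30-jun", 1, 100),
    ("30-jun", 1, -50),
    ("30-abr", 2, 200), ("31-may", 2, 60),
]

def net_issues(rows):
    """Debit each account's negative points against its earliest issues."""
    positives = {}  # account -> list of [date, points] in issue order
    debits = {}     # account -> total negative points, stored as a positive
    for date, account, points in rows:
        if points >= 0:
            positives.setdefault(account, []).append([date, points])
        else:
            debits[account] = debits.get(account, 0) - points

    result = []
    for account, lots in positives.items():
        remaining = debits.get(account, 0)
        for lot in lots:
            # Consume as much of the debit as this issue can absorb.
            used = min(lot[1], remaining)
            lot[1] -= used
            remaining -= used
            result.append((lot[0], account, lot[1]))
    return result

net = net_issues(issues)
for row in net:
    print(row)
```

An issue that is fully consumed stays in the result with 0 points; whether such rows should be dropped is not stated in the question.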

Sqlite query Smarter Rows with mathematics of cloned tables?

Okay this is pretty complicated to even explain but I will try.
Buying table
----------------------------------------------------------------------------------------------------------
id  itemId  amount  price   bought  collected  slot  aborted  playerHash
1   2607    4111    200600  0       0          0     0        1020628
2   11335   1       0       0       0          3     1        1020628
3   2495    6546    5306    0       0          1     0        1020628
4   1127    101     58300   101     0          5     0        37763265
5   14479   1       107500  0       0          2     0        37763265
6   1       100     1       0       0          0     0        3           *simulate a problem Buy
Selling table
----------------------------------------------------------------------------------------------------------
id  itemId  amount  price   sold  collected  slot  aborted  playerHash
1   8       8234    132950  7244  0          4     0        1020628
2   9       1980    132950  0     0          5     0        1020628
3   9       100     126300  0     0          2     0        1020628
4   3024    8888    10900   8888  0          0     0        37763265
5   1       100     1       1     0          0     0        1           *simulate a problem Sell
6   1       100     1       1     0          0     0        2           *simulate a problem Sell
Result of match ups
----------------------------------------------------------------------------------------------------------
S.itemId  S.amount  S.price  S.sold  S.collected  S.slot  S.playerHash  B.itemId  B.amount  B.price  B.bought  B.collected  B.slot  B.playerHash
123       2         444      1       0            0       15431         123       34535     448      3         0            1       3455
123       2         444      1       0            0       15431         123       7567      444      333       0            3       7651
*simulated result rows of wrong data
1         100       1        1       0            0       1             1         100       1        0         0            0       3
1         100       1        1       0            0       2             1         100       1        0         0            0       3
The query I'm trying to write has to match up Buyers and Sellers of the same itemId.
It has to make sure neither the Buyer nor the Seller has the aborted flag set on their item.
It should first process Buyers who are paying more than what the Seller wants.
It should check that the amount the Seller is selling hasn't been fully sold yet.
It should check that the amount the Buyer is buying hasn't been fully bought yet.
It should also process 100 sales per batch.
The query below works well for the most part.
The problem I'm trying to fix is making the query smarter about the arithmetic.
Let's say the Selling table has two listings:
itemId 8 with amount 100 and sold 1, listed by playerHash 1.
itemId 8 with amount 100 and sold 1, listed by playerHash 2.
Now there is one Buyer, playerHash 3, who is buying the same itemId 8 with amount 100.
The problem now is:
playerHash 1 can't sell a full 100 of itemId 8 because he only has 99 left (sold 1).
playerHash 2 can't sell a full 100 of itemId 8 because he also only has 99 left (sold 1).
The query should then return:
A row for playerHash 1 selling to Buyer playerHash 3, with a new column at the end of the row, called something like willBuyAmount, set to 99.
The next row should know that playerHash 1 now has sold = 100 (without updating the database and without changing the sold column in the previous row).
It should display playerHash 2 selling to Buyer playerHash 3, with willBuyAmount set to 1.
By now playerHash 2 should know that it has 98 left (sold 1, plus 1 temporary, tracked somewhere; I don't know, maybe by cloning the tables?),
and future Buyers of the same itemId at >= price should be able to buy only the 98 remaining from playerHash 2.
Here is the math I want to put into the query:
VAR amountBuyerNeeds = (B.amount - B.bought)
VAR amountSellerStock = (S.amount - S.sold)
//update these values in the simulated cloned table data.
B.bought = IF(amountBuyerNeeds > amountSellerStock, (B.bought + amountSellerStock), B.amount)
S.sold = IF(amountSellerStock > amountBuyerNeeds, (S.sold + amountBuyerNeeds), S.amount)
//to real row print out
willBuyAmount = IF(amountBuyerNeeds > amountSellerStock, amountSellerStock, amountBuyerNeeds)
The query currently looks like this:
SELECT S.itemId     AS sell_itemId,
       S.amount     AS sell_amount,
       S.price      AS sell_price,
       S.sold       AS sell_sold,
       S.collected  AS sell_collected,
       S.slot       AS sell_slot,
       S.playerHash AS sell_playerHash,
       B.itemId     AS buy_itemId,
       B.amount     AS buy_amount,
       B.price      AS buy_price,
       B.bought     AS buy_bought,
       B.collected  AS buy_collected,
       B.slot       AS buy_slot,
       B.playerHash AS buy_playerHash
FROM Buying AS B
JOIN Selling AS S
  ON B.itemId = S.itemId
 AND B.aborted = 0
 AND S.aborted = 0
 AND B.price >= S.price
 AND S.sold < S.amount
 AND B.bought < B.amount
ORDER BY B.price DESC
LIMIT 100;
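A join on its own evaluates each Buyer/Seller pair independently, so the running bought and sold totals described above have to be carried somewhere between rows. One option is to do that step in application code over the candidate rows. The Python sketch below illustrates the willBuyAmount math on the two-Seller example; the function name match_orders, the dict layout, and the price of 1 are assumptions made for illustration only.

```python
def match_orders(buyers, sellers, limit=100):
    """Pair buyers with sellers while tracking simulated bought/sold counts."""
    # Working copies of the counters; the input rows are never modified.
    bought = {b["id"]: b["bought"] for b in buyers}
    sold = {s["id"]: s["sold"] for s in sellers}
    matches = []
    # Highest-paying buyers are processed first.
    for b in sorted(buyers, key=lambda row: row["price"], reverse=True):
        if b["aborted"]:
            continue
        for s in sellers:
            if s["aborted"] or s["itemId"] != b["itemId"]:
                continue
            if b["price"] < s["price"]:
                continue
            needs = b["amount"] - bought[b["id"]]   # amountBuyerNeeds
            stock = s["amount"] - sold[s["id"]]     # amountSellerStock
            if needs <= 0:
                break       # this buyer is fully served
            if stock <= 0:
                continue    # this seller has nothing left
            will_buy = min(needs, stock)            # willBuyAmount
            bought[b["id"]] += will_buy
            sold[s["id"]] += will_buy
            matches.append((s["playerHash"], b["playerHash"], will_buy))
            if len(matches) == limit:
                return matches, sold
    return matches, sold

# The two-seller example from the question (price 1 is an assumption).
sellers = [
    {"id": 1, "itemId": 8, "amount": 100, "price": 1, "sold": 1,
     "aborted": 0, "playerHash": 1},
    {"id": 2, "itemId": 8, "amount": 100, "price": 1, "sold": 1,
     "aborted": 0, "playerHash": 2},
]
buyers = [
    {"id": 1, "itemId": 8, "amount": 100, "price": 1, "bought": 0,
     "aborted": 0, "playerHash": 3},
]

matches, simulated_sold = match_orders(buyers, sellers)
print(matches)         # [(1, 3, 99), (2, 3, 1)]
print(simulated_sold)  # {1: 100, 2: 2}
```

Seller 2 ends with a simulated sold count of 2, which leaves 98 for later Buyers, as the question describes. Because the counters live in separate dicts, neither the database nor the input rows are touched.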

We Keep Coding
