group by aggregate function for multiplication - pandas

I have 3 dataframes that I want to aggregate, but instead of adding them together I want to multiply them. Is there a way to do it?
i.e.
df=result.groupby(['name']).agg({'A':'sum','B':'sum'})
df1
A B
tim 1 5
emma 3 7
df2
A B
tim 1 8
emma 1 2
result
A B
tim 2 13
emma 4 9
Instead of summing the two, I want to multiply them:
A B
tim 1 40
emma 3 14

Use GroupBy.prod:
df=result.groupby(['name']).agg({'A':'prod','B':'prod'})
If you also need to join them first:
df = pd.concat([df1, df2]).groupby('name', as_index=False).prod()
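As a minimal sketch of the concat-then-prod approach: the question does not show how the frames are built, so the construction below (with name as an ordinary column) is an assumption, and `out` is just an illustrative variable name.

```python
import pandas as pd

# Assumed layout: 'name' is an ordinary column in both frames.
df1 = pd.DataFrame({"name": ["tim", "emma"], "A": [1, 3], "B": [5, 7]})
df2 = pd.DataFrame({"name": ["tim", "emma"], "A": [1, 1], "B": [8, 2]})

# Stack the frames row-wise, then multiply the rows that share a name.
# sort=False keeps the groups in order of first appearance.
out = pd.concat([df1, df2]).groupby("name", as_index=False, sort=False).prod()
print(out)
```

pd.concat accepts any number of frames in the list, so a third dataframe can be added to the same call.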

Related

the 'combine' of a split-apply-combine in pd.groupby() works brilliantly, but I'm not sure why

I have a fragment of code similar to below. It works perfectly, but I'm not sure why I am so lucky.
groupby() is a split-apply-combine operation, so I understand why qf.groupby(qf.g).mean() returns two rows: one mean() for each of the groups a and b.
What's brilliant is that the combine step of qf.groupby(qf.g).cumsum() reassembles all the rows into their original order, as found in the starting df.
My question is, "Why am I able to count on this behavior?" I'm glad I can, but I cannot articulate why it's possible.
# split-apply-combine
import pandas as pd
# DataFrame with a value column and an arbitrary category column
qf = pd.DataFrame(data=list("aaabbaaaab"), columns=['g'])
qf['val'] = [1, 2, 3, 1, 2, 3, 4, 5, 6, 9]
print("applying mean() to members in each group of a,b ")
print(qf.groupby(qf.g).mean())
print("\n\napplying cumsum() to members in each group of a,b ")
print(qf.groupby(qf.g).cumsum())  # this combines them in the original index order, thankfully
qf['running_totals'] = qf.groupby(qf.g).cumsum()
print(f"\n{qf}")
yields:
applying mean() to members in each group of a,b
val
g
a 3.428571
b 4.000000
applying cumsum() to members in each group of a,b
val
0 1
1 3
2 6
3 1
4 3
5 9
6 13
7 18
8 24
9 12
g val running_totals
0 a 1 1
1 a 2 3
2 a 3 6
3 b 1 1
4 b 2 3
5 a 3 9
6 a 4 13
7 a 5 18
8 a 6 24
9 b 9 12
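One way to look at the behavior asked about: aggregations such as mean() return one row per group, while transformations such as cumsum() return a result that carries the same index as the input, so assigning it back aligns row by row. A small sketch using the same data (the names `agg` and `running` are illustrative only):

```python
import pandas as pd

qf = pd.DataFrame({"g": list("aaabbaaaab"),
                   "val": [1, 2, 3, 1, 2, 3, 4, 5, 6, 9]})

# Aggregation: one row per group, indexed by the group labels.
agg = qf.groupby("g")["val"].mean()
print(agg.index.tolist())

# Transformation: one row per input row, indexed like the original frame.
running = qf.groupby("g")["val"].cumsum()
print(running.index.equals(qf.index))

# Column assignment aligns on the index, so each running total
# lands on the row it was computed from.
qf["running_totals"] = running
```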

most efficient way to set dataframe column indexing to other columns

I have a large DataFrame. One of my columns contains the names of other columns. I want to evaluate this column and set, in each row, the value of the referenced column:
|A|B|C|Column|
|:|:|:|:-----|
|1|3|4| B |
|2|5|3| A |
|3|5|9| C |
Desired output:
|A|B|C|Column|
|:|:|:|:-----|
|1|3|4| 3 |
|2|5|3| 2 |
|3|5|9| 9 |
I am achieving this result using:
df.apply(lambda d: eval("d." + d['Column']), axis=1)
But it is very slow, even using swifter. Is there a more efficient way of performing this?
For better performance, use df.to_numpy() (indexing the rows with df.index assumes the default RangeIndex):
In [365]: df['Column'] = df.to_numpy()[df.index, df.columns.get_indexer(df.Column)]
In [366]: df
Out[366]:
A B C Column
0 1 3 4 3
1 2 5 3 2
2 3 5 9 9
For Pandas < 1.2.0, use lookup:
df['Column'] = df.lookup(df.index, df['Column'])
From 1.2.0 onward, lookup is deprecated; you can use a list comprehension instead:
df['Column'] = [df.at[idx, r['Column']] for idx, r in df.iterrows()]
Output:
A B C Column
0 1 3 4 3
1 2 5 3 2
2 3 5 9 9
Since lookup is being deprecated, try the NumPy method with get_indexer:
df['new'] = df.values[df.index,df.columns.get_indexer(df.Column)]
df
Out[75]:
A B C Column new
0 1 3 4 B 3
1 2 5 3 A 2
2 3 5 9 C 9
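The NumPy approach above can be sketched so that it does not depend on the row labels. This is a hedged variant, not the answerers' exact code: it uses np.arange(len(df)) for row positions, and `value_cols`, `row_pos` and `col_pos` are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [3, 5, 5], "C": [4, 3, 9], "Column": ["B", "A", "C"]},
    index=["x", "y", "z"],  # non-default index on purpose
)

value_cols = ["A", "B", "C"]          # columns that may be referenced
values = df[value_cols].to_numpy()    # 2-D integer array

# Positional row and column indices for each row's lookup.
row_pos = np.arange(len(df))
col_pos = pd.Index(value_cols).get_indexer(df["Column"])

# Fancy indexing picks one element per row.
df["new"] = values[row_pos, col_pos]
print(df)
```

get_indexer returns -1 for a name it cannot find, which would silently pick the last column, so it may be worth checking (col_pos >= 0).all() on real data.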

Get value from another df based on condition

I have 2 dataframes,
df1:
ID X Y Cond
Johnson 2 3 fine
Sand NAN NAN sick
Cooper 1 2 fine
Nelson 1 2 fine
Peterson 4 5 fine
and df2 :
id2 X Y
Magic 2 3
Sand 2 3
Cooper 1 2
Dean 1 2
I want to update the X value in df1 if Cond == "sick" and df1["ID"] == df2["id2"],
to get the new df1:
ID X Y Cond
Johnson 2 3 fine
Sand 2 3 sick
Cooper 1 2 fine
Nelson 1 2 fine
Peterson 4 5 fine
I tried :
df1["x"] = np.where((df["cond"]=="sick")& (df1["id"]==df2["id2"]),df2["x"],"")
But it's not working. I get this ValueError:
ValueError: Can only compare identically-labeled Series objects
Thank you
First convert both ID columns to the index so that rows can be matched by label, then assign to the selected rows with DataFrame.loc:
df11 = df1.set_index('ID')
df22 = df2.set_index('id2')
df11.loc[df11["Cond"]=="sick", ['X','Y']] = df22[['X','Y']]
df = df11.reset_index()
print (df)
ID X Y Cond
0 Johnson 2 3 fine
1 Sand 2 3 sick
2 Cooper 1 2 fine
3 Nelson 1 2 fine
4 Peterson 4 5 fine
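You can use the where() method of pandas DataFrames instead of NumPy's where function. Note that where() aligns df2 on the row index, so this works here only because Sand sits at the same position in both frames. The code looks like this: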
df1.loc[:,["X", "Y"]] = df1.loc[:,["X", "Y"]].where(df1["Cond"]!="sick",df2.loc[:,["X", "Y"]])
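The index-based approach can be sketched end to end as follows. The frame construction is an assumption based on the tables in the question (with NAN read as missing values), and `sick` and `result` are illustrative names.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "ID": ["Johnson", "Sand", "Cooper", "Nelson", "Peterson"],
    "X": [2, np.nan, 1, 1, 4],
    "Y": [3, np.nan, 2, 2, 5],
    "Cond": ["fine", "sick", "fine", "fine", "fine"],
})
df2 = pd.DataFrame({
    "id2": ["Magic", "Sand", "Cooper", "Dean"],
    "X": [2, 2, 1, 1],
    "Y": [3, 3, 2, 2],
})

# Use the IDs as the index so the assignment aligns on labels.
df11 = df1.set_index("ID")
df22 = df2.set_index("id2")

# Only rows flagged as sick are overwritten; values come from the
# df2 row carrying the same ID.
sick = df11["Cond"] == "sick"
df11.loc[sick, ["X", "Y"]] = df22[["X", "Y"]]

result = df11.reset_index()
print(result)
```

A sick row whose ID does not appear in df2 would receive NaN, since there is no label to align with.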

Groupby sum of two column and create new dataframe in pandas

I have a dataframe as shown below
Player Goal Freekick
Messi 2 5
Ronaldo 1 4
Messi 1 4
Messi 0 5
Ronaldo 0 9
Ronaldo 1 8
Xavi 1 1
Xavi 0 7
From the above, I would like to group by Player and sum Goal and Freekick, as shown below.
Expected Output:
Player toatal_goals total_freekicks
Messi 3 14
Ronaldo 2 21
Xavi 1 8
I tried the code below:
df1 = df.groupby(['Player'])['Goal'].sum().reset_index().rename({'Goal':'toatal_goals'})
df1['total_freekicks'] = df.groupby(['Player'])['Freekick'].sum()
But the above does not work. Please help.
First aggregate with sum by Player, then add a prefix with DataFrame.add_prefix and convert the column names to lowercase:
df = df.groupby('Player').sum().add_prefix('total_').rename(columns=str.lower)
print (df)
total_goal total_freekick
Player
Messi 3 14
Ronaldo 2 21
Xavi 1 8
You can use named aggregation (pd.NamedAgg) to create the aggregations with customized column names.
(
    df.groupby(by='Player')
      .agg(toatal_goals=('Goal', 'sum'),
           total_freekicks=('Freekick', 'sum'))
      .reset_index()
)
Player toatal_goals total_freekicks
Messi 3 14
Ronaldo 2 21
Xavi 1 8
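A self-contained sketch of the named-aggregation approach, with the frame rebuilt from the table in the question. The output column is spelled total_goals here, which is an assumption about the intended name, and `totals` is an illustrative variable.

```python
import pandas as pd

df = pd.DataFrame({
    "Player": ["Messi", "Ronaldo", "Messi", "Messi",
               "Ronaldo", "Ronaldo", "Xavi", "Xavi"],
    "Goal": [2, 1, 1, 0, 0, 1, 1, 0],
    "Freekick": [5, 4, 4, 5, 9, 8, 1, 7],
})

# Each keyword is an output column name; each tuple is
# (input column, aggregation function).
totals = df.groupby("Player", as_index=False).agg(
    total_goals=("Goal", "sum"),
    total_freekicks=("Freekick", "sum"),
)
print(totals)
```

Passing as_index=False keeps Player as a regular column, so no separate reset_index() call is needed.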

python pandas - set column value of column based on index and or ID of concatenated dataframes

I have a dataframe built by concatenating at least two dataframes,
e.g.
df1
Name | Type | ID
0 Joe A 1
1 Fred B 2
2 Mike Both 3
3 Frank Both 4
df2
Name | Type | ID
0 Bill Both 1
1 Jill Both 2
2 Mill B 3
3 Hill A 4
ConcatDf:
Name | Type | ID
0 Joe A 1
1 Fred B 2
2 Mike Both 3
3 Frank Both 4
0 Bill Both 1
1 Jill Both 2
2 Mill B 3
3 Hill A 4
Suppose after they are concatenated, I'd like to set Type for all records from df1 to C and all records from df2 to B. Is this possible?
The indices of the dataframes can be vastly different sizes.
Thanks in advance.
df3 = pd.concat([df1,df2], keys = (1,2))
df3.loc[1, 'Type'] = 'C'
When you concat, you can assign keys to the dataframes. This creates a MultiIndex in which the keys separate the concatenated dataframes. You can then pass a key to .loc to select that group. The code above sets Type to C for all rows that came from df1 (which has a key of 1).
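Put together, the keys approach might look like the sketch below. The frames are rebuilt from the tables in the question, and the final droplevel step (to get back the original row labels) is an addition not present in the answer; `flat` is an illustrative name.

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["Joe", "Fred", "Mike", "Frank"],
                    "Type": ["A", "B", "Both", "Both"],
                    "ID": [1, 2, 3, 4]})
df2 = pd.DataFrame({"Name": ["Bill", "Jill", "Mill", "Hill"],
                    "Type": ["Both", "Both", "B", "A"],
                    "ID": [1, 2, 3, 4]})

# keys=(1, 2) adds an outer index level recording each row's source frame.
df3 = pd.concat([df1, df2], keys=(1, 2))

# Select each block by its key and overwrite Type.
df3.loc[1, "Type"] = "C"   # rows that came from df1
df3.loc[2, "Type"] = "B"   # rows that came from df2

# Optionally drop the helper level to restore the original row labels.
flat = df3.droplevel(0)
print(flat)
```

Because the rows are selected by key rather than by position, the two source frames can have different lengths.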
Use merge with indicator=True to find which rows belong to df1 and which to df2. Next, use np.where to assign B or C.
t = concatdf.merge(df1, how='left', on=concatdf.columns.tolist(), indicator=True)
concatdf['Type'] = np.where(t._merge.eq('left_only'), 'B', 'C')
Out[2185]:
Name Type ID
0 Joe C 1
1 Fred C 2
2 Mike C 3
3 Frank C 4
0 Bill B 1
1 Jill B 2
2 Mill B 3
3 Hill B 4