Groupby sum of two columns and create a new dataframe in pandas


I have a dataframe as shown below
Player Goal Freekick
Messi 2 5
Ronaldo 1 4
Messi 1 4
Messi 0 5
Ronaldo 0 9
Ronaldo 1 8
Xavi 1 1
Xavi 0 7
From this, I would like to do a groupby sum of Goal and Freekick, as shown below.
Expected Output:
Player toatal_goals total_freekicks
Messi 3 14
Ronaldo 2 21
Xavi 1 8
I tried the code below:
df1 = df.groupby(['Player'])['Goal'].sum().reset_index().rename({'Goal':'toatal_goals'})
df1['total_freekicks'] = df.groupby(['Player'])['Freekick'].sum()
But the code above does not work. Please help.
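For reference, the attempt above fails for two reasons: rename({'Goal': 'toatal_goals'}) without columns= renames index labels rather than column labels, and the second groupby result is indexed by Player while df1 has a default RangeIndex after reset_index(), so the assignment aligns on non-matching labels and fills the new column with NaN. A minimal sketch of the same two-step idea with both problems fixed, using sample data rebuilt from the table above:

```python
import pandas as pd

# Sample data rebuilt from the question's table
df = pd.DataFrame({
    "Player": ["Messi", "Ronaldo", "Messi", "Messi", "Ronaldo", "Ronaldo", "Xavi", "Xavi"],
    "Goal": [2, 1, 1, 0, 0, 1, 1, 0],
    "Freekick": [5, 4, 4, 5, 9, 8, 1, 7],
})

# Step 1: sum goals per player; columns= makes rename act on column labels
df1 = (df.groupby("Player")["Goal"].sum()
         .reset_index()
         .rename(columns={"Goal": "toatal_goals"}))

# Step 2: look up each player's free-kick total by name, so the values
# line up by Player instead of by row position
freekicks = df.groupby("Player")["Freekick"].sum()
df1["total_freekicks"] = df1["Player"].map(freekicks)
print(df1)
```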

First aggregate with sum by Player, then use DataFrame.add_prefix and convert the column names to lowercase:
df = df.groupby('Player').sum().add_prefix('total_').rename(columns=str.lower)
print (df)
total_goal total_freekick
Player
Messi 3 14
Ronaldo 2 21
Xavi 1 8
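A self-contained sketch of this answer, with the sample data rebuilt from the question. Selecting Goal and Freekick explicitly keeps any other columns out of the sum, and the final reset_index() turns Player back into a regular column, as in the expected output:

```python
import pandas as pd

df = pd.DataFrame({
    "Player": ["Messi", "Ronaldo", "Messi", "Messi", "Ronaldo", "Ronaldo", "Xavi", "Xavi"],
    "Goal": [2, 1, 1, 0, 0, 1, 1, 0],
    "Freekick": [5, 4, 4, 5, 9, 8, 1, 7],
})

totals = (df.groupby("Player")[["Goal", "Freekick"]].sum()
          .add_prefix("total_")        # Goal -> total_Goal, Freekick -> total_Freekick
          .rename(columns=str.lower)   # total_Goal -> total_goal, ...
          .reset_index())              # move Player from the index back to a column
print(totals)
```

The rename is applied before reset_index() so that the Player column name is not lowercased as well.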

You can use named aggregation to create the aggregations with custom column names.
(
df.groupby(by='Player')
.agg(toatal_goals=('Goal', 'sum'),
total_freekicks=('Freekick', 'sum'))
.reset_index()
)
Player toatal_goals total_freekicks
Messi 3 14
Ronaldo 2 21
Xavi 1 8
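The (column, function) tuples above are shorthand for pd.NamedAgg. A sketch spelling it out explicitly, with the sample data rebuilt from the question (the output names here use the spelling total_goals):

```python
import pandas as pd

df = pd.DataFrame({
    "Player": ["Messi", "Ronaldo", "Messi", "Messi", "Ronaldo", "Ronaldo", "Xavi", "Xavi"],
    "Goal": [2, 1, 1, 0, 0, 1, 1, 0],
    "Freekick": [5, 4, 4, 5, 9, 8, 1, 7],
})

# Each keyword becomes an output column; NamedAgg names the input column and the function
totals = (df.groupby("Player")
            .agg(total_goals=pd.NamedAgg(column="Goal", aggfunc="sum"),
                 total_freekicks=pd.NamedAgg(column="Freekick", aggfunc="sum"))
            .reset_index())
print(totals)
```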

Related

Python Pandas: Rearrange data from vertical to horizontal

I would like to transform a data frame using pandas.
Old-Dataframe:
Person-ID Reference-ID Name
1 1 Max
2 1 Kevin
3 1 Sara
4 4 Chessi
5 9 Fernando
into a new-dataframe in the following format.
New-Dataframe:
Person-ID Reference-ID Member1 Member2 Member3
1 1 Max Kevin Sara
4 4 Chessi
5 9 Fernando
My solution would be:
Write all the Reference-IDs from the old dataframe into the new dataframe.
Write into the new dataframe the Person-IDs of the rows whose Reference-ID does not match any Person-ID in the old dataframe (see Fernando, whose Reference-ID is 9).
Loop through the old dataframe and add each name to the corresponding row in the new dataframe.
Do you have any suggestions on how to make this faster or simpler?
PS: The old dataframe can be created like this:
import pandas as pd
person_id = [1,2,3,4,5]
reference_id = [1,1,1,4,9]
name = ['Max','Kevin','Sara',"Chessi","Fernando"]
list_tuples=list(zip(person_id,reference_id,name))
old_dataframe = pd.DataFrame(list_tuples,columns=['Person-ID','Reference-ID','Name'])

You can use pivot_table() like this:
df1 = pd.pivot_table(df, index=['Reference-ID'], values=['Person-ID', 'Name'], aggfunc={'Person-ID': 'min', 'Name': lambda x: list(x)})
df1.reset_index()[['Person-ID','Reference-ID']].join(pd.DataFrame(df1.Name.tolist()))
Output:
Person-ID Reference-ID 0 1 2
1 1 Max Kevin Sara
4 4 Chessi None None
5 9 Fernando None None
You can reassign column names like this:
df2=df1.reset_index()[['Person-ID','Reference-ID']].join(pd.DataFrame(df1.Name.tolist()))
df2.columns=list(df2.columns[0:2])+[f"Member{x+1}" for x in df2.columns[2:]]
Output:
Person-ID Reference-ID Member1 Member2 Member3
1 1 Max Kevin Sara
4 4 Chessi None None
5 9 Fernando None None
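A pivot-free alternative, sketched under two assumptions read off the expected output: members are numbered in their original row order, and the smallest Person-ID in each Reference-ID group becomes the row's Person-ID. Rows are numbered within each group with GroupBy.cumcount and then reshaped with unstack. Column names follow the tables above (Person-ID, Reference-ID); missing members show up as NaN rather than None.

```python
import pandas as pd

old_dataframe = pd.DataFrame({
    "Person-ID": [1, 2, 3, 4, 5],
    "Reference-ID": [1, 1, 1, 4, 9],
    "Name": ["Max", "Kevin", "Sara", "Chessi", "Fernando"],
})

# Position of each person within their Reference-ID group: 1, 2, 3, ...
member_no = old_dataframe.groupby("Reference-ID").cumcount().add(1)

# One row per Reference-ID, one column per member position
members = (old_dataframe.set_index(["Reference-ID", member_no])["Name"]
           .unstack()
           .add_prefix("Member"))

# Assumption: the smallest Person-ID of each group is the row's Person-ID
person_id = old_dataframe.groupby("Reference-ID")["Person-ID"].min()

wide = members.copy()
wide.insert(0, "Person-ID", person_id)  # aligned on the Reference-ID index
new_dataframe = wide.reset_index()[["Person-ID", "Reference-ID"] + list(members.columns)]
print(new_dataframe)
```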

Pandas dataframe long to wide grouping by column with duplicated element

Hello, I imported a dataframe that has no headers.
I created headers using:
df=pd.read_csv(path, names=['Prim Index', 'Alt Index', 'Aka', 'Name', 'Unnamed9'])
Then I keep only these columns:
df=df[['Prim Index', 'Name']]
My question is how to reshape df from long to wide. Since 'Prim Index' is duplicated, I would like each unique Prim Index in one row, with its names in separate columns.
Thanks in advance! I appreciate any help on this!
Current df
Prim Index Alt Index Aka Name Unnamed9
1 2345 aka Marcus 0
1 7634 aka Tiffany 0
1 3242 aka Royce 0
2 8765 aka Charlotte 0
2 4343 aka Sara 0
3 9825 aka Keith 0
4 6714 aka Jennifer 0
5 7875 aka Justin 0
5 1345 aka Diana 0
6 6591 aka Liz 0
Desired df
Prim Index Name1 Name2 Name3 Name4
1 Marcus Tiffany Royce
2 Charlotte Sara
3 Keith
4 Jennifer
5 Justin Diana
6 Liz

Use GroupBy.cumcount to build a counter, pass it to DataFrame.set_index to create a MultiIndex, then reshape with Series.unstack and rename the columns with DataFrame.add_prefix:
df1 = (df.set_index(['Prim Index', df.groupby('Prim Index').cumcount().add(1)])['Name']
.unstack(fill_value='')
.add_prefix('Name'))
print (df1)
Name1 Name2 Name3
Prim Index
1 Marcus Tiffany Royce
2 Charlotte Sara
3 Keith
4 Jennifer
5 Justin Diana
6 Liz
If there always have to be 4 name columns, add DataFrame.reindex with a range:
df1 = (df.set_index(['Prim Index', df.groupby('Prim Index').cumcount().add(1)])['Name']
.unstack(fill_value='')
.reindex(range(1, 5), fill_value='', axis=1)
.add_prefix('Name'))
print (df1)
Name1 Name2 Name3 Name4
Prim Index
1 Marcus Tiffany Royce
2 Charlotte Sara
3 Keith
4 Jennifer
5 Justin Diana
6 Liz
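Another option, sketched here with the sample data from the question, is to collect each group's names into a list with GroupBy.agg(list) and expand the lists into columns. This assumes the rows are already in the desired order within each Prim Index.

```python
import pandas as pd

df = pd.DataFrame({
    "Prim Index": [1, 1, 1, 2, 2, 3, 4, 5, 5, 6],
    "Name": ["Marcus", "Tiffany", "Royce", "Charlotte", "Sara",
             "Keith", "Jennifer", "Justin", "Diana", "Liz"],
})

# Collect the names of each Prim Index into a list (row order is kept)
names = df.groupby("Prim Index")["Name"].agg(list)

# Expand the lists into columns; shorter lists are padded with None
wide = pd.DataFrame(names.tolist(), index=names.index)
wide.columns = [f"Name{i + 1}" for i in wide.columns]
wide = wide.fillna("")
print(wide)
```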

Using pivot_table, you can get a solution similar to the one jezrael posted:
import pandas as pd
c = ['Prim Index','Name']
d = [[1,'Marcus'],[1,'Tiffany'],[1,'Royce'],
[2,'Charlotte'],[2,'Sara'],
[3,'Keith'],
[4,'Jennifer'],
[5,'Justin'],
[5,'Diana'],
[6,'Liz']]
df = pd.DataFrame(data = d,columns=c)
print (df)
df=(pd.pivot_table(df,index='Prim Index',
columns=df.groupby('Prim Index').cumcount().add(1),values='Name',aggfunc='sum',fill_value='')
.add_prefix('Name'))
df = df.reset_index()
print (df)
The output will be:
Prim Index Name1 Name2 Name3
0 1 Marcus Tiffany Royce
1 2 Charlotte Sara
2 3 Keith
3 4 Jennifer
4 5 Justin Diana
5 6 Liz

Complex Pivoting in Pandas involving multiple columns

My df:
t name team Value
1-Jan-10 Roger Ajou 10
1-Jan-10 Kim KSR 20
1-Jan-10 Tim KKR 0
2-Jan-10 Tim KKR 10
2-Jan-10 Roger Ajou 20
3-Jan-10 Kim KSR 20
3-Jan-10 Tim KKR 10
3-Jan-10 Roger Ajou 0
I tried pandas pivoting, but here I need to pivot two columns together. The expected output is:
KSR Ajou KKR
Kim Roger Tim
1-Jan-10 20 10 0
2-Jan-10 20 10
3-Jan-10 20 0 10
Note: the columns are sorted by the 'name' column. Is this doable in pandas?

Use DataFrame.set_index with Series.unstack to reshape, then sort by the second level of the column MultiIndex, and finally remove the index and column names with DataFrame.rename_axis:
df1 = (df.set_index(['t','team','name'])['Value']
.unstack([1,2], fill_value=0)
.sort_index(level=1, axis=1)
.rename_axis(index=None, columns=[None, None]))
print (df1)
KSR Ajou KKR
Kim Roger Tim
1-Jan-10 20 10 0
2-Jan-10 0 20 10
3-Jan-10 20 0 10
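For comparison, a sketch of the same reshape with DataFrame.pivot_table, passing both team and name as columns. The sample data is rebuilt from the question; aggfunc='sum' is an assumption that only matters if a (t, team, name) combination occurs more than once. t is kept as strings here, so the rows sort lexicographically; convert it with pd.to_datetime if real dates need chronological order.

```python
import pandas as pd

df = pd.DataFrame({
    "t": ["1-Jan-10", "1-Jan-10", "1-Jan-10", "2-Jan-10", "2-Jan-10",
          "3-Jan-10", "3-Jan-10", "3-Jan-10"],
    "name": ["Roger", "Kim", "Tim", "Tim", "Roger", "Kim", "Tim", "Roger"],
    "team": ["Ajou", "KSR", "KKR", "KKR", "Ajou", "KSR", "KKR", "Ajou"],
    "Value": [10, 20, 0, 10, 20, 20, 10, 0],
})

# Two-level columns (team, name); missing combinations become 0
out = (df.pivot_table(index="t", columns=["team", "name"], values="Value",
                      aggfunc="sum", fill_value=0)
       # Order the columns by player name (the second level)
       .sort_index(level=1, axis=1)
       # Drop the leftover index and column level names
       .rename_axis(index=None, columns=[None, None]))
print(out)
```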

How to apply one-hot encoding or get_dummies on two columns together in pandas?

I have the following dataframe with sample values:
df = pd.DataFrame([["London", "Cambridge", 20], ["Cambridge", "London", 10], ["Liverpool", "London", 30]], columns= ["city_1", "city_2", "id"])
city_1 city_2 id
London Cambridge 20
Cambridge London 10
Liverpool London 30
I need the output dataframe below, which is built by combining the two city columns and then applying one-hot encoding:
id London Cambridge Liverpool
20 1 1 0
10 1 1 0
30 1 0 1
Currently I am using the code below, which encodes each column separately. Could you advise whether there is a Pythonic way to get the output above?
output_df = pd.get_dummies(df, columns=['city_1', 'city_2'])
which results in columns id, city_1_Cambridge, city_1_London, and so on.

You can pass empty strings for the prefix_sep and prefix parameters of get_dummies, so that both city columns produce identically named columns, then use max if you want only 1 or 0 values (indicator columns), or sum if you need counts:
output_df = (pd.get_dummies(df, columns=['city_1', 'city_2'], prefix_sep='', prefix='')
.max(axis=1, level=0))
print (output_df)
id Cambridge Liverpool London
0 20 1 0 1
1 10 1 0 1
2 30 0 1 1
Or, to encode all columns except id, first move the column(s) that should not be encoded into the index with DataFrame.set_index, then use get_dummies with max, and finally call DataFrame.reset_index:
output_df = (pd.get_dummies(df.set_index('id'), prefix_sep='', prefix='')
.max(axis=1, level=0)
.reset_index())
print (output_df)
id Cambridge Liverpool London
0 20 1 0 1
1 10 1 0 1
2 30 0 1 1
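The max(axis=1, level=0) form above relies on the level keyword of DataFrame reductions, which was deprecated in pandas 1.3 and removed in pandas 2.0. A sketch of an equivalent for newer versions, assuming pandas 2.x: transpose, group the duplicated column names, and transpose back. dtype=int is passed because get_dummies returns booleans by default in pandas 2.x.

```python
import pandas as pd

df = pd.DataFrame([["London", "Cambridge", 20], ["Cambridge", "London", 10],
                   ["Liverpool", "London", 30]], columns=["city_1", "city_2", "id"])

# Empty prefixes make city_1=London and city_2=London both produce a column "London"
dummies = pd.get_dummies(df, columns=["city_1", "city_2"],
                         prefix="", prefix_sep="", dtype=int)

# Collapse the duplicated column names by taking the max per name;
# sort=False keeps the first-seen column order, so id stays first
output_df = dummies.T.groupby(level=0, sort=False).max().T
print(output_df)
```

Replacing max() with sum() gives counts instead of 0/1 indicators.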

group by aggregate function for multiplication

I want to aggregate three dataframes, but instead of adding them together, I want to multiply them. Is there a way to do this?
For example, I currently sum them with:
df=result.groupby(['name']).agg({'A':'sum','B':'sum'})
df1
A B
tim 1 5
emma 3 7
df2
A B
tim 1 8
emma 1 2
result
A B
tim 2 13
emma 4 9
Instead of summing the two, I want to multiply them:
A B
tim 1 40
emma 3 14

Use GroupBy.prod:
df=result.groupby(['name']).agg({'A':'prod','B':'prod'})
If you also need to combine the dataframes first:
df = pd.concat([df1, df2]).groupby('name', as_index=False).prod()
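A small sketch of the concat-and-prod approach, with df1 and df2 rebuilt from the question and the names assumed to be the index (named name). A third dataframe is handled the same way by adding it to the list passed to pd.concat.

```python
import pandas as pd

# Assumed layout: player names as an index called "name"
idx = pd.Index(["tim", "emma"], name="name")
df1 = pd.DataFrame({"A": [1, 3], "B": [5, 7]}, index=idx)
df2 = pd.DataFrame({"A": [1, 1], "B": [8, 2]}, index=idx)

# Stack the frames vertically, then multiply the values that share a name
product = pd.concat([df1, df2]).groupby(level="name", sort=False).prod()
print(product)

# Because df1 and df2 share the same index and columns, pandas aligns them
# on labels, so plain elementwise multiplication gives the same numbers
elementwise = df1 * df2
```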
