Add "all" row to pandas group by

This is my code (using pandas 0.19.2):
from io import StringIO
import pandas as pd
data = StringIO("""category,region,sales
fruits,east,12
vegatables,east,3
fruits,west,5
vegatables,west,7
""")
df = pd.read_csv(data)
print(df.groupby('category', as_index=False).agg({'sales': sum}))
This is the output:
category sales
0 fruits 17
1 vegatables 10
My question is: how do I add an 'all' row so the output looks like this:
category sales
0 fruits 17
1 vegatables 10
all 27

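You can try pivot_table and add a column of row totals:
# aggfunc='sum' keeps integer totals (the default aggfunc is the mean)
new_df = df.pivot_table(columns='category', index='region', values='sales', aggfunc='sum')
new_df['all'] = new_df.sum(axis=1)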
Output:
category fruits vegatables all
region
east 12 3 15
west 5 7 12
And if you want the data back in its original long format:
new_df.stack().to_frame(name='Sales').reset_index()
Output:
region category Sales
0 east fruits 12
1 east vegatables 3
2 east all 15
3 west fruits 5
4 west vegatables 7
5 west all 12
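If the goal is only a grand-total row on the grouped result, pivot_table can also produce one through its margins option. This is a minimal sketch, assuming the question's data with the region typo corrected; the variable name out is illustrative only.

```python
from io import StringIO
import pandas as pd

data = StringIO("""category,region,sales
fruits,east,12
vegatables,east,3
fruits,west,5
vegatables,west,7
""")
df = pd.read_csv(data)

# margins=True appends a total row computed with aggfunc;
# margins_name sets the label of that row (the default label is 'All').
out = df.pivot_table(index='category', values='sales',
                     aggfunc='sum', margins=True, margins_name='all')

# Turn the 'category' index back into an ordinary column.
out = out.reset_index()
print(out)
```

The total row is labelled through the index, so it sorts last and does not need a separate dataframe.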

Here is what I ended up doing:
from io import StringIO
import pandas as pd
data = StringIO("""category,region,sales
fruits,east,12
vegatables,east,3
fruits,west,5
vegatables,west,7
""")
df = pd.read_csv(data)
body = df.groupby('category', as_index=False).agg({'sales': sum})
# Group every row under the single key True to get one grand-total row
head = df.groupby(lambda x: True, as_index=False)
head = head.agg({'sales': sum})
head.insert(0, 'category', '*all*')
# DataFrame.append was removed in pandas 2.0, so use pd.concat instead
print(pd.concat([body, head]))
Basically, create another dataframe with the 'all' row and concatenate it onto the grouped result.
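The same idea can be written without the groupby(lambda ...) step by computing the grand total directly from the grouped frame. This is a minimal sketch, assuming the question's data with the region typo corrected; total and result are illustrative names.

```python
from io import StringIO
import pandas as pd

data = StringIO("""category,region,sales
fruits,east,12
vegatables,east,3
fruits,west,5
vegatables,west,7
""")
df = pd.read_csv(data)

# Per-category totals, with 'category' kept as a regular column.
body = df.groupby('category', as_index=False)['sales'].sum()

# One-row frame holding the grand total.
total = pd.DataFrame({'category': ['all'], 'sales': [body['sales'].sum()]})

# ignore_index=True renumbers the rows 0..n-1 after stacking.
result = pd.concat([body, total], ignore_index=True)
print(result)
```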

Related

Copy first of group down and sum total - pre defined groups

I have previously asked how to iterate through a prescribed grouping of items and received the solution.
import pandas as pd
data = [['apple', 1], ['orange', 2], ['pear', 3], ['peach', 4], ['plum', 5], ['grape', 6]]
index_groups = [0], [1, 2], [3, 4, 5]
df = pd.DataFrame(data, columns=['Name', 'Number'])
for i in range(len(df)):
    print(df['Number'][i])
The dataframe looks like this:
Name Number
0 apple 1
1 orange 2
2 pear 3
3 peach 4
4 plum 5
5 grape 6
where:
for group in index_groups:
    print(df.loc[group])
gave me just what I needed. Following up on this, I would now like to sum the numbers per group, copy the first 'Name' in each group down to the other rows of that group, and then collapse the result to one line per 'Name'.
In the above example, the output I'm seeking would be:
Name Age
0 apple 1
1 orange 5
2 peach 15
I can append the sums to a list easily enough:
group_sum = []
for group in index_groups:
    group_sum.append(sum(df['Number'].loc[group]))
But I can't get the 'Names' in order to merge them with the sums.
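You could try:
df_final = pd.DataFrame()
for group in index_groups:
    # Copy the slice so that assigning to it does not raise SettingWithCopyWarning
    _df = df.loc[group].copy()
    # Overwrite every name in the group with the first name of the group
    _df["Name"] = _df.Name.iloc[0]
    df_final = pd.concat([df_final, _df])
df_final.groupby("Name").agg(Age=("Number", "sum")).reset_index()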
Output:
Name Age
0 apple 1
1 orange 5
2 peach 15
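The loop above can also be avoided by turning the predefined groups into a per-row group label and aggregating once. This is a minimal sketch, assuming index_groups is the tuple of lists from the question and a pandas version that supports named aggregation (0.25 or later); group_of, group_ids and result are illustrative names.

```python
import pandas as pd

data = [['apple', 1], ['orange', 2], ['pear', 3],
        ['peach', 4], ['plum', 5], ['grape', 6]]
index_groups = [0], [1, 2], [3, 4, 5]
df = pd.DataFrame(data, columns=['Name', 'Number'])

# Map each row label to the position of the group that contains it.
group_of = {idx: g for g, group in enumerate(index_groups) for idx in group}

# Series aligned with df's index, holding the group id of every row.
group_ids = df.index.to_series().map(group_of)

# Within each group keep the first name and sum the numbers.
result = (
    df.groupby(group_ids)
      .agg(Name=('Name', 'first'), Age=('Number', 'sum'))
      .reset_index(drop=True)
)
print(result)
```

Grouping by an explicit label also keeps the groups in the order given by index_groups, whereas grouping by "Name" sorts the result alphabetically.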

Comparing strings in two different dataframes and adding a column

I have two dataframes as follows:
df1 =
Index Name Age
0 Bob1 20
1 Bob2 21
2 Bob3 22
The second dataframe is as follows -
df2 =
Index Country Name
0 US Bob1
1 UK Bob123
2 US Bob234
3 Canada Bob2
4 Canada Bob987
5 US Bob3
6 UK Mary1
7 UK Mary2
8 UK Mary3
9 Canada Mary65
I would like to match the names in df1 against the names in df2, look up the country for each match, and create a new dataframe as follows:
Index Country Name Age
0 US Bob1 20
1 Canada Bob2 21
2 US Bob3 22
Thank you.
Using merge() should solve the problem:
df3 = pd.merge(df1, df2, on='Name')
Full example:
import pandas as pd
df1 = pd.DataFrame({"Name": ["Bob1", "Bob2", "Bob3"], "Age": [20, 21, 22]})
df2 = pd.DataFrame({"Country": ["US", "UK", "US", "Canada", "Canada", "US", "UK", "UK", "UK", "Canada"],
                    "Name": ["Bob1", "Bob123", "Bob234", "Bob2", "Bob987", "Bob3", "Mary1", "Mary2", "Mary3", "Mary65"]})
# The default inner join keeps only names present in both dataframes
df3 = pd.merge(df1, df2, on='Name')
df3
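The merge returns the columns in the order Name, Age, Country, so matching the layout requested in the question takes one extra column selection. This is a minimal sketch using the same sample data, with how='inner' written out only to make the default explicit.

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["Bob1", "Bob2", "Bob3"], "Age": [20, 21, 22]})
df2 = pd.DataFrame({
    "Country": ["US", "UK", "US", "Canada", "Canada", "US", "UK", "UK", "UK", "Canada"],
    "Name": ["Bob1", "Bob123", "Bob234", "Bob2", "Bob987", "Bob3",
             "Mary1", "Mary2", "Mary3", "Mary65"],
})

# Inner join on the exact 'Name' string, then reorder the columns.
df3 = pd.merge(df1, df2, on="Name", how="inner")[["Country", "Name", "Age"]]
print(df3)
```

The join compares whole strings, so "Bob1" does not match "Bob123".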

How can I create a dataframe column which counts the occurrences of each value in another column?

I am trying to add a column to my dataframe that holds the number of times each value has appeared so far in another column.
For example, I have the following dataframe:
Date|Team|Goals|
22.08.20|Team1|4|
22.08.20|Team2|3|
22.08.20|Team3|1|
22.09.20|Team1|4|
22.09.20|Team3|5|
I would like to add a counter column, which counts how often each team appears:
Date|Team|Goals|Count|
22.08.20|Team1|4|1|
22.08.20|Team2|3|1|
22.08.20|Team3|1|1|
22.09.20|Team1|4|2|
22.09.20|Team3|5|2|
My Dataframe is ordered by date, so the teams should appear in the correct order.
Apologies, very new to pandas and stack overflow, so please let me know if I can format this question differently. Thanks
TRY:
df['Count'] = df.groupby('Team').cumcount().add(1)
OUTPUT:
Date Team Goals Count
0 22.08.20 Team1 4 1
1 22.08.20 Team2 3 1
2 22.08.20 Team3 1 1
3 22.09.20 Team1 4 2
4 22.09.20 Team3 5 2
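For reference, here is a self-contained version built from the question's table. It is a minimal sketch; the extra Total column is not part of the answer above and is included only to contrast a running count (cumcount) with a per-team total (transform('size')).

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['22.08.20', '22.08.20', '22.08.20', '22.09.20', '22.09.20'],
    'Team': ['Team1', 'Team2', 'Team3', 'Team1', 'Team3'],
    'Goals': [4, 3, 1, 4, 5],
})

# cumcount numbers the rows of each group 0, 1, 2, ... in their current order,
# so adding 1 gives "this is the n-th appearance of the team".
df['Count'] = df.groupby('Team').cumcount() + 1

# For contrast: the total number of rows per team, repeated on every row.
df['Total'] = df.groupby('Team')['Team'].transform('size')
print(df)
```

Because cumcount follows the existing row order, the dataframe should be sorted by date before it is applied.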
Another answer, building upon @Nk03's, with a reproducible example:
import pandas as pd
import numpy as np
# Set numpy random seed
np.random.seed(42)
# Create dates array
dates = pd.date_range(start='2021-06-01', periods=10, freq='D')
# Create teams array
teams_names = ['Team 1', 'Team 2', 'Team 3']
teams = [teams_names[i] for i in np.random.randint(0, 3, 10)]
# Create goals array
goals = np.random.randint(1, 6, 10)
# Create DataFrame
data = pd.DataFrame({'Date': dates,
                     'Team': teams,
                     'Goals': goals})
# Cumulative count of teams
data['Count'] = data.groupby('Team').cumcount().add(1)
The output will be:
Date Team Goals Count
0 2021-06-01 Team 2 3 1
1 2021-06-02 Team 2 1 2
2 2021-06-03 Team 2 4 3
3 2021-06-04 Team 1 2 1
4 2021-06-05 Team 2 4 4
5 2021-06-06 Team 1 2 2
6 2021-06-07 Team 2 2 5
7 2021-06-08 Team 3 4 1
8 2021-06-09 Team 3 5 2
9 2021-06-10 Team 1 2 3

How to order dataframe using a list in pandas

I have a pandas dataframe as follows.
import pandas as pd
data = [['Alex',10, 175],['Bob',12, 178],['Clarke',13, 179]]
df = pd.DataFrame(data,columns=['Name','Age', 'Height'])
print(df)
I also have a list as follows.
mynames = ['Emj', 'Bob', 'Jenne', 'Alex', 'Clarke']
I want to order the rows of my dataframe in the order of mynames list. In other words, my output should be as follows.
Name Age Height
0 Bob 12 178
1 Alex 10 175
2 Clarke 13 179
I was trying to do this as follows. I am wondering if there is an easier way to do this in pandas than converting the dataframe to a list.
I am happy to provide more details if needed.
You can use pd.Categorical with argsort:
# argsort returns positions, so index with iloc rather than loc
df = df.iloc[pd.Categorical(df.Name, mynames).argsort()]
Name Age Height
1 Bob 12 178
0 Alex 10 175
2 Clarke 13 179
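An alternative that also renumbers the index to match the desired output is to sort with a key that maps each name to its position in the list. This is a minimal sketch, assuming pandas 1.1 or later (where sort_values accepts key); rank and ordered are illustrative names.

```python
import pandas as pd

data = [['Alex', 10, 175], ['Bob', 12, 178], ['Clarke', 13, 179]]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Height'])
mynames = ['Emj', 'Bob', 'Jenne', 'Alex', 'Clarke']

# Position of every name in the reference list.
rank = {name: pos for pos, name in enumerate(mynames)}

# Sort by that position, then renumber the rows 0..n-1.
ordered = (
    df.sort_values('Name', key=lambda s: s.map(rank))
      .reset_index(drop=True)
)
print(ordered)
```

Names in mynames that are missing from the dataframe (Emj and Jenne here) are simply ignored.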

How to create a new column after groupby count?

I am trying to create a new column with groupby and count.
However, it throws the error "incompatible index of inserted column with frame index".
import pandas as pd
# Read the tab-separated text file
df1 = pd.read_csv('hi.txt', sep='\t')
a = df1.groupby(['c', 't']).count()
df1['difference'] = a
print(df1)
Input :
coun id cat
A 12 90
U 13 91
Use:
new_df=df.groupby(['category','country'],sort=False).country.count().to_frame('count').reset_index()
print(new_df)
category country count
0 9910 AUS 2
1 7310 NZL 1
2 9910 NZL 1
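The error in the question arises because the result of groupby(...).count() is indexed by the group keys rather than by the original rows, so it cannot be assigned back as a column. If the count is wanted on every original row, transform returns a result aligned with the original index. This is a minimal sketch on hypothetical data chosen to reproduce the counts shown above; the column names follow the answer, not the question's file.

```python
import pandas as pd

# Hypothetical sample data (not the asker's file).
df = pd.DataFrame({
    'category': [9910, 9910, 7310, 9910],
    'country': ['AUS', 'AUS', 'NZL', 'NZL'],
})

# transform broadcasts each group's size back onto the rows of that group,
# so the result has the same index as df and can be assigned directly.
df['count'] = df.groupby(['category', 'country'])['country'].transform('size')
print(df)
```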