With 2 group by columns, how can I do a subtotal by each of the group by columns? - pandas

New to pandas. I'm trying to a subtotal within 2 group by columns. I have managed to figure out how to sum using 2 group by attributes but within that, I'm also trying to do a subtotal. Please see below for example -
df.groupby(['Fruit','Name'])['Number'].sum()
Output
Fruit Name Number
Apples Bob 16
Mike 9
Steve 10
------
35
------
Grapes Bob 35
Tom 87
Tony 15
------
137
------
Oranges Bob 67
Mike 57
Tom 15
Tony 1
What I'm looking for is to show a subtotal by each fruit within the dataframe. Thank you!

You can use a mix of unstack, assign, and stack to do this:
sums = df.groupby(['Fruit', 'Name'])['Number'].sum().unstack().assign(Total=df.groupby('Fruit').sum()).stack()
Output:
>>> sums
Fruit Name
Apples Bob 16.0
Mike 9.0
Steve 10.0
Total 35.0
Grapes Bob 35.0
Tom 87.0
Tony 15.0
Total 137.0
Oranges Bob 67.0
Mike 57.0
Tom 15.0
Tony 1.0
Total 140.0
dtype: float64

IIUC, you can use df.sum or df.groupby.sum with level=0:
df.sum(level=0) # Will be deprecated
# or
df.groupby(level=0).sum()
Output:
Fruit
Apples 35
Grapes 137
Oranges 140
Name: Number, dtype: int64

Related

Counting the number of value occurence based on repeated Dates [duplicate]

I am using this dataframe:
Fruit Date Name Number
Apples 10/6/2016 Bob 7
Apples 10/6/2016 Bob 8
Apples 10/6/2016 Mike 9
Apples 10/7/2016 Steve 10
Apples 10/7/2016 Bob 1
Oranges 10/7/2016 Bob 2
Oranges 10/6/2016 Tom 15
Oranges 10/6/2016 Mike 57
Oranges 10/6/2016 Bob 65
Oranges 10/7/2016 Tony 1
Grapes 10/7/2016 Bob 1
Grapes 10/7/2016 Tom 87
Grapes 10/7/2016 Bob 22
Grapes 10/7/2016 Bob 12
Grapes 10/7/2016 Tony 15
I would like to aggregate this by Name and then by Fruit to get a total number of Fruit per Name. For example:
Bob,Apples,16
I tried grouping by Name and Fruit but how do I get the total number of Fruit?
Use GroupBy.sum:
df.groupby(['Fruit','Name']).sum()
Out[31]:
Number
Fruit Name
Apples Bob 16
Mike 9
Steve 10
Grapes Bob 35
Tom 87
Tony 15
Oranges Bob 67
Mike 57
Tom 15
Tony 1
To specify the column to sum, use this: df.groupby(['Name', 'Fruit'])['Number'].sum()
Also you can use agg function,
df.groupby(['Name', 'Fruit'])['Number'].agg('sum')
If you want to keep the original columns Fruit and Name, use reset_index(). Otherwise Fruit and Name will become part of the index.
df.groupby(['Fruit','Name'])['Number'].sum().reset_index()
Fruit Name Number
Apples Bob 16
Apples Mike 9
Apples Steve 10
Grapes Bob 35
Grapes Tom 87
Grapes Tony 15
Oranges Bob 67
Oranges Mike 57
Oranges Tom 15
Oranges Tony 1
As seen in the other answers:
df.groupby(['Fruit','Name'])['Number'].sum()
Number
Fruit Name
Apples Bob 16
Mike 9
Steve 10
Grapes Bob 35
Tom 87
Tony 15
Oranges Bob 67
Mike 57
Tom 15
Tony 1
Both the other answers accomplish what you want.
You can use the pivot functionality to arrange the data in a nice table
df.groupby(['Fruit','Name'],as_index = False).sum().pivot('Fruit','Name').fillna(0)
Name Bob Mike Steve Tom Tony
Fruit
Apples 16.0 9.0 10.0 0.0 0.0
Grapes 35.0 0.0 0.0 87.0 15.0
Oranges 67.0 57.0 0.0 15.0 1.0
df.groupby(['Fruit','Name'])['Number'].sum()
You can select different columns to sum numbers.
A variation on the .agg() function; provides the ability to (1) persist type DataFrame, (2) apply averages, counts, summations, etc. and (3) enables groupby on multiple columns while maintaining legibility.
df.groupby(['att1', 'att2']).agg({'att1': "count", 'att3': "sum",'att4': 'mean'})
using your values...
df.groupby(['Name', 'Fruit']).agg({'Number': "sum"})
You can set the groupby column to index then using sum with level
df.set_index(['Fruit','Name']).sum(level=[0,1])
Out[175]:
Number
Fruit Name
Apples Bob 16
Mike 9
Steve 10
Oranges Bob 67
Tom 15
Mike 57
Tony 1
Grapes Bob 35
Tom 87
Tony 15
You could also use transform() on column Number after group by. This operation will calculate the total number in one group with function sum, the result is a series with the same index as original dataframe.
df['Number'] = df.groupby(['Fruit', 'Name'])['Number'].transform('sum')
df = df.drop_duplicates(subset=['Fruit', 'Name']).drop('Date', 1)
Then, you can drop the duplicate rows on column Fruit and Name. Moreover, you can drop the column Date by specifying axis 1 (0 for rows and 1 for columns).
# print(df)
Fruit Name Number
0 Apples Bob 16
2 Apples Mike 9
3 Apples Steve 10
5 Oranges Bob 67
6 Oranges Tom 15
7 Oranges Mike 57
9 Oranges Tony 1
10 Grapes Bob 35
11 Grapes Tom 87
14 Grapes Tony 15
# You could achieve the same result with functions discussed by others:
# print(df.groupby(['Fruit', 'Name'], as_index=False)['Number'].sum())
# print(df.groupby(['Fruit', 'Name'], as_index=False)['Number'].agg('sum'))
There is an official tutorial Group by: split-apply-combine talking about what you can do after group by.
If you want the aggregated column to have a custom name such as Total Number, Total etc. (all the solutions on here results in a dataframe where the aggregate column is named Number), use named aggregation:
df.groupby(['Fruit', 'Name'], as_index=False).agg(**{'Total Number': ('Number', 'sum')})
or (if the custom name doesn't need to have a white space in it):
df.groupby(['Fruit', 'Name'], as_index=False).agg(Total=('Number', 'sum'))
this is equivalent to SQL query:
SELECT Fruit, Name, sum(Number) AS Total
FROM df
GROUP BY Fruit, Name
Speaking of SQL, there's pandasql module that allows you to query pandas dataFrames in the local environment using SQL syntax. It's not part of Pandas, so will have to be installed separately.
#! pip install pandasql
from pandasql import sqldf
sqldf("""
SELECT Fruit, Name, sum(Number) AS Total
FROM df
GROUP BY Fruit, Name
""")
You can use dfsql
for your problem, it will look something like:
df.sql('SELECT fruit, sum(number) GROUP BY fruit')
https://github.com/mindsdb/dfsql
here is an article about it:
https://medium.com/riselab/why-every-data-scientist-using-pandas-needs-modin-bringing-sql-to-dataframes-3b216b29a7c0
You can use reset_index() to reset the index after the sum
df.groupby(['Fruit','Name'])['Number'].sum().reset_index()
or
df.groupby(['Fruit','Name'], as_index=False)['Number'].sum()

Conditionally concatenate rows of a dataframe and process additional columns based on the condition

I have an Input Dataframe that the following :
NAME TEXT START END
Tim Tim Wagner is a teacher. 10 20.5
Tim He is from Cleveland, Ohio. 20.5 40
Frank Frank is a musician. 40 50
Tim He like to travel with his family 50 62
Frank He is a performing artist who plays the cello. 62 70
Frank He performed at the Carnegie Hall last year. 70 85
Frank It was fantastic listening to him. 85 90
Want output dataframe as follows:
NAME TEXT START END
Tim Tim Wagner is a teacher. He is from Cleveland, Ohio. 10 40
Frank Frank is a musician 40 50
Tim He like to travel with his family 50 62
Frank He is a performing artist who plays the cello. He performed at the Carnegie Hall last year. It was fantastic listening to him. 62 90
Appreciate your help on this.
Thanks
Try:
grp = (df['NAME'] != df['NAME'].shift()).cumsum().rename('group')
df.groupby(['NAME', grp], sort=False)['TEXT','START','END']\
.agg({'TEXT':lambda x: ' '.join(x), 'START': 'min', 'END':'max'})\
.reset_index().drop('group', axis=1)
Output:
NAME TEXT START END
0 Tim Tim Wagner is a teacher. He is from Cleveland,... 10.0 40.0
1 Frank Frank is a musician. 40.0 50.0
2 Tim He like to travel with his family 50.0 62.0
3 Frank He is a performing artist who plays the cello.... 62.0 90.0

Data frame: group by datatime index and a column [duplicate]

I am using this dataframe:
Fruit Date Name Number
Apples 10/6/2016 Bob 7
Apples 10/6/2016 Bob 8
Apples 10/6/2016 Mike 9
Apples 10/7/2016 Steve 10
Apples 10/7/2016 Bob 1
Oranges 10/7/2016 Bob 2
Oranges 10/6/2016 Tom 15
Oranges 10/6/2016 Mike 57
Oranges 10/6/2016 Bob 65
Oranges 10/7/2016 Tony 1
Grapes 10/7/2016 Bob 1
Grapes 10/7/2016 Tom 87
Grapes 10/7/2016 Bob 22
Grapes 10/7/2016 Bob 12
Grapes 10/7/2016 Tony 15
I would like to aggregate this by Name and then by Fruit to get a total number of Fruit per Name. For example:
Bob,Apples,16
I tried grouping by Name and Fruit but how do I get the total number of Fruit?
Use GroupBy.sum:
df.groupby(['Fruit','Name']).sum()
Out[31]:
Number
Fruit Name
Apples Bob 16
Mike 9
Steve 10
Grapes Bob 35
Tom 87
Tony 15
Oranges Bob 67
Mike 57
Tom 15
Tony 1
To specify the column to sum, use this: df.groupby(['Name', 'Fruit'])['Number'].sum()
Also you can use agg function,
df.groupby(['Name', 'Fruit'])['Number'].agg('sum')
If you want to keep the original columns Fruit and Name, use reset_index(). Otherwise Fruit and Name will become part of the index.
df.groupby(['Fruit','Name'])['Number'].sum().reset_index()
Fruit Name Number
Apples Bob 16
Apples Mike 9
Apples Steve 10
Grapes Bob 35
Grapes Tom 87
Grapes Tony 15
Oranges Bob 67
Oranges Mike 57
Oranges Tom 15
Oranges Tony 1
As seen in the other answers:
df.groupby(['Fruit','Name'])['Number'].sum()
Number
Fruit Name
Apples Bob 16
Mike 9
Steve 10
Grapes Bob 35
Tom 87
Tony 15
Oranges Bob 67
Mike 57
Tom 15
Tony 1
Both the other answers accomplish what you want.
You can use the pivot functionality to arrange the data in a nice table
df.groupby(['Fruit','Name'],as_index = False).sum().pivot('Fruit','Name').fillna(0)
Name Bob Mike Steve Tom Tony
Fruit
Apples 16.0 9.0 10.0 0.0 0.0
Grapes 35.0 0.0 0.0 87.0 15.0
Oranges 67.0 57.0 0.0 15.0 1.0
df.groupby(['Fruit','Name'])['Number'].sum()
You can select different columns to sum numbers.
A variation on the .agg() function; provides the ability to (1) persist type DataFrame, (2) apply averages, counts, summations, etc. and (3) enables groupby on multiple columns while maintaining legibility.
df.groupby(['att1', 'att2']).agg({'att1': "count", 'att3': "sum",'att4': 'mean'})
using your values...
df.groupby(['Name', 'Fruit']).agg({'Number': "sum"})
You can set the groupby column to index then using sum with level
df.set_index(['Fruit','Name']).sum(level=[0,1])
Out[175]:
Number
Fruit Name
Apples Bob 16
Mike 9
Steve 10
Oranges Bob 67
Tom 15
Mike 57
Tony 1
Grapes Bob 35
Tom 87
Tony 15
You could also use transform() on column Number after group by. This operation will calculate the total number in one group with function sum, the result is a series with the same index as original dataframe.
df['Number'] = df.groupby(['Fruit', 'Name'])['Number'].transform('sum')
df = df.drop_duplicates(subset=['Fruit', 'Name']).drop('Date', 1)
Then, you can drop the duplicate rows on column Fruit and Name. Moreover, you can drop the column Date by specifying axis 1 (0 for rows and 1 for columns).
# print(df)
Fruit Name Number
0 Apples Bob 16
2 Apples Mike 9
3 Apples Steve 10
5 Oranges Bob 67
6 Oranges Tom 15
7 Oranges Mike 57
9 Oranges Tony 1
10 Grapes Bob 35
11 Grapes Tom 87
14 Grapes Tony 15
# You could achieve the same result with functions discussed by others:
# print(df.groupby(['Fruit', 'Name'], as_index=False)['Number'].sum())
# print(df.groupby(['Fruit', 'Name'], as_index=False)['Number'].agg('sum'))
There is an official tutorial Group by: split-apply-combine talking about what you can do after group by.
If you want the aggregated column to have a custom name such as Total Number, Total etc. (all the solutions on here results in a dataframe where the aggregate column is named Number), use named aggregation:
df.groupby(['Fruit', 'Name'], as_index=False).agg(**{'Total Number': ('Number', 'sum')})
or (if the custom name doesn't need to have a white space in it):
df.groupby(['Fruit', 'Name'], as_index=False).agg(Total=('Number', 'sum'))
this is equivalent to SQL query:
SELECT Fruit, Name, sum(Number) AS Total
FROM df
GROUP BY Fruit, Name
Speaking of SQL, there's pandasql module that allows you to query pandas dataFrames in the local environment using SQL syntax. It's not part of Pandas, so will have to be installed separately.
#! pip install pandasql
from pandasql import sqldf
sqldf("""
SELECT Fruit, Name, sum(Number) AS Total
FROM df
GROUP BY Fruit, Name
""")
You can use dfsql
for your problem, it will look something like:
df.sql('SELECT fruit, sum(number) GROUP BY fruit')
https://github.com/mindsdb/dfsql
here is an article about it:
https://medium.com/riselab/why-every-data-scientist-using-pandas-needs-modin-bringing-sql-to-dataframes-3b216b29a7c0
You can use reset_index() to reset the index after the sum
df.groupby(['Fruit','Name'])['Number'].sum().reset_index()
or
df.groupby(['Fruit','Name'], as_index=False)['Number'].sum()

Group by multiple columns pandas [duplicate]

I am using this dataframe:
Fruit Date Name Number
Apples 10/6/2016 Bob 7
Apples 10/6/2016 Bob 8
Apples 10/6/2016 Mike 9
Apples 10/7/2016 Steve 10
Apples 10/7/2016 Bob 1
Oranges 10/7/2016 Bob 2
Oranges 10/6/2016 Tom 15
Oranges 10/6/2016 Mike 57
Oranges 10/6/2016 Bob 65
Oranges 10/7/2016 Tony 1
Grapes 10/7/2016 Bob 1
Grapes 10/7/2016 Tom 87
Grapes 10/7/2016 Bob 22
Grapes 10/7/2016 Bob 12
Grapes 10/7/2016 Tony 15
I would like to aggregate this by Name and then by Fruit to get a total number of Fruit per Name. For example:
Bob,Apples,16
I tried grouping by Name and Fruit but how do I get the total number of Fruit?
Use GroupBy.sum:
df.groupby(['Fruit','Name']).sum()
Out[31]:
Number
Fruit Name
Apples Bob 16
Mike 9
Steve 10
Grapes Bob 35
Tom 87
Tony 15
Oranges Bob 67
Mike 57
Tom 15
Tony 1
To specify the column to sum, use this: df.groupby(['Name', 'Fruit'])['Number'].sum()
Also you can use agg function,
df.groupby(['Name', 'Fruit'])['Number'].agg('sum')
If you want to keep the original columns Fruit and Name, use reset_index(). Otherwise Fruit and Name will become part of the index.
df.groupby(['Fruit','Name'])['Number'].sum().reset_index()
Fruit Name Number
Apples Bob 16
Apples Mike 9
Apples Steve 10
Grapes Bob 35
Grapes Tom 87
Grapes Tony 15
Oranges Bob 67
Oranges Mike 57
Oranges Tom 15
Oranges Tony 1
As seen in the other answers:
df.groupby(['Fruit','Name'])['Number'].sum()
Number
Fruit Name
Apples Bob 16
Mike 9
Steve 10
Grapes Bob 35
Tom 87
Tony 15
Oranges Bob 67
Mike 57
Tom 15
Tony 1
Both the other answers accomplish what you want.
You can use the pivot functionality to arrange the data in a nice table
df.groupby(['Fruit','Name'],as_index = False).sum().pivot('Fruit','Name').fillna(0)
Name Bob Mike Steve Tom Tony
Fruit
Apples 16.0 9.0 10.0 0.0 0.0
Grapes 35.0 0.0 0.0 87.0 15.0
Oranges 67.0 57.0 0.0 15.0 1.0
df.groupby(['Fruit','Name'])['Number'].sum()
You can select different columns to sum numbers.
A variation on the .agg() function; provides the ability to (1) persist type DataFrame, (2) apply averages, counts, summations, etc. and (3) enables groupby on multiple columns while maintaining legibility.
df.groupby(['att1', 'att2']).agg({'att1': "count", 'att3': "sum",'att4': 'mean'})
using your values...
df.groupby(['Name', 'Fruit']).agg({'Number': "sum"})
You can set the groupby column to index then using sum with level
df.set_index(['Fruit','Name']).sum(level=[0,1])
Out[175]:
Number
Fruit Name
Apples Bob 16
Mike 9
Steve 10
Oranges Bob 67
Tom 15
Mike 57
Tony 1
Grapes Bob 35
Tom 87
Tony 15
You could also use transform() on column Number after group by. This operation will calculate the total number in one group with function sum, the result is a series with the same index as original dataframe.
df['Number'] = df.groupby(['Fruit', 'Name'])['Number'].transform('sum')
df = df.drop_duplicates(subset=['Fruit', 'Name']).drop('Date', 1)
Then, you can drop the duplicate rows on column Fruit and Name. Moreover, you can drop the column Date by specifying axis 1 (0 for rows and 1 for columns).
# print(df)
Fruit Name Number
0 Apples Bob 16
2 Apples Mike 9
3 Apples Steve 10
5 Oranges Bob 67
6 Oranges Tom 15
7 Oranges Mike 57
9 Oranges Tony 1
10 Grapes Bob 35
11 Grapes Tom 87
14 Grapes Tony 15
# You could achieve the same result with functions discussed by others:
# print(df.groupby(['Fruit', 'Name'], as_index=False)['Number'].sum())
# print(df.groupby(['Fruit', 'Name'], as_index=False)['Number'].agg('sum'))
There is an official tutorial Group by: split-apply-combine talking about what you can do after group by.
If you want the aggregated column to have a custom name such as Total Number, Total etc. (all the solutions on here results in a dataframe where the aggregate column is named Number), use named aggregation:
df.groupby(['Fruit', 'Name'], as_index=False).agg(**{'Total Number': ('Number', 'sum')})
or (if the custom name doesn't need to have a white space in it):
df.groupby(['Fruit', 'Name'], as_index=False).agg(Total=('Number', 'sum'))
this is equivalent to SQL query:
SELECT Fruit, Name, sum(Number) AS Total
FROM df
GROUP BY Fruit, Name
Speaking of SQL, there's pandasql module that allows you to query pandas dataFrames in the local environment using SQL syntax. It's not part of Pandas, so will have to be installed separately.
#! pip install pandasql
from pandasql import sqldf
sqldf("""
SELECT Fruit, Name, sum(Number) AS Total
FROM df
GROUP BY Fruit, Name
""")
You can use dfsql
for your problem, it will look something like:
df.sql('SELECT fruit, sum(number) GROUP BY fruit')
https://github.com/mindsdb/dfsql
here is an article about it:
https://medium.com/riselab/why-every-data-scientist-using-pandas-needs-modin-bringing-sql-to-dataframes-3b216b29a7c0
You can use reset_index() to reset the index after the sum
df.groupby(['Fruit','Name'])['Number'].sum().reset_index()
or
df.groupby(['Fruit','Name'], as_index=False)['Number'].sum()

Pandas: Combine rows with different values [duplicate]

I am using this dataframe:
Fruit Date Name Number
Apples 10/6/2016 Bob 7
Apples 10/6/2016 Bob 8
Apples 10/6/2016 Mike 9
Apples 10/7/2016 Steve 10
Apples 10/7/2016 Bob 1
Oranges 10/7/2016 Bob 2
Oranges 10/6/2016 Tom 15
Oranges 10/6/2016 Mike 57
Oranges 10/6/2016 Bob 65
Oranges 10/7/2016 Tony 1
Grapes 10/7/2016 Bob 1
Grapes 10/7/2016 Tom 87
Grapes 10/7/2016 Bob 22
Grapes 10/7/2016 Bob 12
Grapes 10/7/2016 Tony 15
I would like to aggregate this by Name and then by Fruit to get a total number of Fruit per Name. For example:
Bob,Apples,16
I tried grouping by Name and Fruit but how do I get the total number of Fruit?
Use GroupBy.sum:
df.groupby(['Fruit','Name']).sum()
Out[31]:
Number
Fruit Name
Apples Bob 16
Mike 9
Steve 10
Grapes Bob 35
Tom 87
Tony 15
Oranges Bob 67
Mike 57
Tom 15
Tony 1
To specify the column to sum, use this: df.groupby(['Name', 'Fruit'])['Number'].sum()
Also you can use agg function,
df.groupby(['Name', 'Fruit'])['Number'].agg('sum')
If you want to keep the original columns Fruit and Name, use reset_index(). Otherwise Fruit and Name will become part of the index.
df.groupby(['Fruit','Name'])['Number'].sum().reset_index()
Fruit Name Number
Apples Bob 16
Apples Mike 9
Apples Steve 10
Grapes Bob 35
Grapes Tom 87
Grapes Tony 15
Oranges Bob 67
Oranges Mike 57
Oranges Tom 15
Oranges Tony 1
As seen in the other answers:
df.groupby(['Fruit','Name'])['Number'].sum()
Number
Fruit Name
Apples Bob 16
Mike 9
Steve 10
Grapes Bob 35
Tom 87
Tony 15
Oranges Bob 67
Mike 57
Tom 15
Tony 1
Both the other answers accomplish what you want.
You can use the pivot functionality to arrange the data in a nice table
df.groupby(['Fruit','Name'],as_index = False).sum().pivot('Fruit','Name').fillna(0)
Name Bob Mike Steve Tom Tony
Fruit
Apples 16.0 9.0 10.0 0.0 0.0
Grapes 35.0 0.0 0.0 87.0 15.0
Oranges 67.0 57.0 0.0 15.0 1.0
df.groupby(['Fruit','Name'])['Number'].sum()
You can select different columns to sum numbers.
A variation on the .agg() function; provides the ability to (1) persist type DataFrame, (2) apply averages, counts, summations, etc. and (3) enables groupby on multiple columns while maintaining legibility.
df.groupby(['att1', 'att2']).agg({'att1': "count", 'att3': "sum",'att4': 'mean'})
using your values...
df.groupby(['Name', 'Fruit']).agg({'Number': "sum"})
You can set the groupby column to index then using sum with level
df.set_index(['Fruit','Name']).sum(level=[0,1])
Out[175]:
Number
Fruit Name
Apples Bob 16
Mike 9
Steve 10
Oranges Bob 67
Tom 15
Mike 57
Tony 1
Grapes Bob 35
Tom 87
Tony 15
You could also use transform() on column Number after group by. This operation will calculate the total number in one group with function sum, the result is a series with the same index as original dataframe.
df['Number'] = df.groupby(['Fruit', 'Name'])['Number'].transform('sum')
df = df.drop_duplicates(subset=['Fruit', 'Name']).drop('Date', 1)
Then, you can drop the duplicate rows on column Fruit and Name. Moreover, you can drop the column Date by specifying axis 1 (0 for rows and 1 for columns).
# print(df)
Fruit Name Number
0 Apples Bob 16
2 Apples Mike 9
3 Apples Steve 10
5 Oranges Bob 67
6 Oranges Tom 15
7 Oranges Mike 57
9 Oranges Tony 1
10 Grapes Bob 35
11 Grapes Tom 87
14 Grapes Tony 15
# You could achieve the same result with functions discussed by others:
# print(df.groupby(['Fruit', 'Name'], as_index=False)['Number'].sum())
# print(df.groupby(['Fruit', 'Name'], as_index=False)['Number'].agg('sum'))
There is an official tutorial Group by: split-apply-combine talking about what you can do after group by.
If you want the aggregated column to have a custom name such as Total Number, Total etc. (all the solutions on here results in a dataframe where the aggregate column is named Number), use named aggregation:
df.groupby(['Fruit', 'Name'], as_index=False).agg(**{'Total Number': ('Number', 'sum')})
or (if the custom name doesn't need to have a white space in it):
df.groupby(['Fruit', 'Name'], as_index=False).agg(Total=('Number', 'sum'))
this is equivalent to SQL query:
SELECT Fruit, Name, sum(Number) AS Total
FROM df
GROUP BY Fruit, Name
Speaking of SQL, there's pandasql module that allows you to query pandas dataFrames in the local environment using SQL syntax. It's not part of Pandas, so will have to be installed separately.
#! pip install pandasql
from pandasql import sqldf
sqldf("""
SELECT Fruit, Name, sum(Number) AS Total
FROM df
GROUP BY Fruit, Name
""")
You can use dfsql
for your problem, it will look something like:
df.sql('SELECT fruit, sum(number) GROUP BY fruit')
https://github.com/mindsdb/dfsql
here is an article about it:
https://medium.com/riselab/why-every-data-scientist-using-pandas-needs-modin-bringing-sql-to-dataframes-3b216b29a7c0
You can use reset_index() to reset the index after the sum
df.groupby(['Fruit','Name'])['Number'].sum().reset_index()
or
df.groupby(['Fruit','Name'], as_index=False)['Number'].sum()