I have been playing with aggregation in pandas dataframe. Considering the following dataframe:
df=pd.DataFrame({'a':[1,2,3,4,5,6,7,8],
'batch':['q','q','q','w','w','w','w','e'],
'c':[4,1,3,4,5,1,3,2]})
I have to do aggregation on the batch column with mean for column a and min for column c.
I used the following method to do the aggregation:
agg_dict = {'a':{'a':'mean'},'c':{'c':'min'}}
aggregated_df = df.groupby("batch").agg(agg_dict)
The problem is that I want the final data frame to have the same columns as the original data frame with the slight difference of having the aggregated values present in each of the columns.
The result of the above aggregation is a multi-index data frame, and am not sure how to convert it to an individual data frame?
I followed the link: Reverting from multiindex to single index dataframe in pandas . But, this didn't work, and the final output was still a multi-index data frame.
Great, if someone could help
you can try the following code df.groupby('batch').aggregate({'c':'min','a':mean})
Related
I'm working on a jupyter notebook, and I would like to get the average 'pcnt_change' based on 'day_of_week'. How do I do this?
A simple groupby call would do the trick here.
If df is the pandas dataframe:
df.groupby('day_of_week').mean()
would return a dataframe with average of all numeric columns in the dataframe with day_of_week as index. If you want only certain column(s) to be returned, select only the needed columns on the groupby call (for e.g.,
df[['open_price', 'high_price', 'day_of_week']].groupby('day_of_week').mean()
I need to reshape my dataframe so that it is wide instead of long, showing each date as column headings and two indices for state and variable name. I've tried using transpose(), melt(), stack(), unstack(), pivot() and set_index() unsuccessfully. Please advise!
The closest that I've come to the solutions is forecasts.set_index(['State', 'Revenue', 'YoY_Change]) or forecasts.set_index(['Date']).T to transpose the date column, but neither are the correct solution.
My data looks like this:
And I need it to look like this:
This is melt followed by pivot:
(df.melt(['State','Date'])
.pivot_table(index=['State', 'variable'], columns='Date', values='value', aggfunc='first')
)
I am having a problem mapping groupby mean statistics to a dataframe column in order to produce a new column.
The raw data is as follows:
I set about creating a new data frame which would display the average sales for 2018 by 'Brand Origin'.
I then proceeded to convert the new data frame to a dictionary in order to complete the mapping process.
I attempted to map the data to the original data frame but I get NaN values.
What have I done wrong?
I think you need transform:
df['new'] = df.groupby('Brand Origin')['2018'].transform('mean')
I have a dataframe with a column 'date' (YYYY-MM-DD HH:MM:SS) and datetime64 type.
I want to drop/eliminate rows by selecting ranges of dates. How can I do this on python/pandas?
Thank you so much in advance
(I cannot post comments, thus I dare to put an answer) The following questions also refer to deleting or filtering a data frame based on the value of a given column:
Delete rows from a pandas DataFrame based on a conditional expression involving len(string) giving KeyError
Deleting DataFrame row in Pandas based on column value
Basically, you can pass a boolean array to the index operator [ ] of the data frame, this returns the filtered data frame. Here the pandas v1.0.1 (!) documentation of how to index data frames. Also this question is helpful.
I have a grouped dataframe according to two columns.
Now i want to plot the data of Date vs Confirmed in seaborn.
Is there a good way to do it.
grouped_series = cases.groupby(['Country/Region','ObservationDate'])['Confirmed','Deaths','Recovered'].sum()
print(grouped_series)
You can change aggregatetion for grouping by datetimes only:
cases.groupby(['ObservationDate'])['Confirmed'].sum().plot()
Or if need summed values per ObservationDate and Country/Region:
cases.groupby(['Country/Region','ObservationDate'])['Confirmed'].sum().unstack(0).plot()