renaming columns after group by and sum in pandas dataframe - pandas

This is my group by command:
pdf_chart_data1 = pdf_chart_data.groupby('sell').value.agg(['sum']).rename(
columns={'sum':'valuesum','sell' : 'selltime'}
)
I am able to change the column name for value but not for 'sell'.
Please help to resolve this issue.

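You cannot rename 'sell' through columns= because after the groupby it is the index, not a column. Either pass as_index=False so that 'sell' stays a regular column, or call reset_index() afterwards. Note that ['value'].sum() leaves the summed column named 'value' (not 'sum'), so that is the key to rename:
pdf_chart_data1 = (
    pdf_chart_data.groupby('sell', as_index=False)['value'].sum()
    .rename(columns={'value': 'valuesum', 'sell': 'selltime'})
)
Or:
pdf_chart_data1 = (
    pdf_chart_data.groupby('sell')['value'].sum()
    .reset_index()
    .rename(columns={'value': 'valuesum', 'sell': 'selltime'})
)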
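As a minimal sketch of the fix above, assuming a small made-up frame (the sample values are hypothetical; only the column names 'sell' and 'value' come from the question), the following shows the rename working once 'sell' is a column. The second variant uses named aggregation, which is a different technique from the one in the answer but gives the summed column its final name directly.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
pdf_chart_data = pd.DataFrame({
    "sell": ["09:00", "09:00", "10:00"],
    "value": [1, 2, 3],
})

# Variant 1: keep 'sell' as a column, then rename both columns.
renamed = (
    pdf_chart_data.groupby("sell", as_index=False)["value"].sum()
    .rename(columns={"value": "valuesum", "sell": "selltime"})
)

# Variant 2 (named aggregation): name the sum up front,
# rename the index, then turn it into a column.
named = (
    pdf_chart_data.groupby("sell")
    .agg(valuesum=("value", "sum"))
    .rename_axis("selltime")
    .reset_index()
)

print(renamed)
print(named)
```

Both variants should end up with the columns selltime and valuesum; which one reads better is largely a matter of taste.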

df = df.groupby('col1')['col1'].count()
df1= df.to_frame().rename(columns={'col1':'new_name'}).reset_index()

If you concatenate two groupby results that share the same index, one from nunique (the number of unique items) and one from unique (the list of unique items), you end up with two columns both called Sport. Using as_index=False, I was able to rename the Sport column of the nunique result to SportCount with rename, then concat the two results together, sort descending on SportCount, and display the top five sport counts.
grouped=df.groupby('NOC', as_index=False)
Nsport=grouped['Sport'].nunique()\
.rename(columns={'Sport':'SportCount'})
Nsport=Nsport.set_index('NOC')
country_grouped=df.groupby('NOC')
Nsport2=country_grouped['Sport'].unique()
df2=pd.concat([Nsport,Nsport2], join='inner',axis=1).reindex(Nsport.index)
df2=df2.sort_values(by=["SportCount"],ascending=False)
print(df2.columns)
for key, item in df2.head(5).iterrows():
    print(key, item)
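An alternative sketch, not the method used in the answer above: passing a list of function names to agg computes both aggregations in one call and names the resulting columns after the functions, so the duplicate Sport column never appears. The sample data and the column name Sports are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data; 'NOC' and 'Sport' are the column names used above.
df = pd.DataFrame({
    "NOC": ["USA", "USA", "USA", "FRA", "FRA", "GER"],
    "Sport": ["Swimming", "Judo", "Swimming", "Judo", "Judo", "Rowing"],
})

# Compute both aggregations at once; the result columns are named
# 'nunique' and 'unique', which are then renamed.
df2 = (
    df.groupby("NOC")["Sport"]
    .agg(["nunique", "unique"])
    .rename(columns={"nunique": "SportCount", "unique": "Sports"})
    .sort_values(by="SportCount", ascending=False)
)

print(df2.head(5))
```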

Related

Pandas how to group by day and other column

I am getting the daily counts of rows from a dataframe using
df = df.groupby(by=df['startDate'].dt.date).count()
How can I modify this so I can also group by another column 'unitName'?
Thank you
Pass a list of keys to groupby and use GroupBy.size:
df = df.groupby([df['startDate'].dt.date, 'unitName']).size()
If you need to count only the non-missing values of a column, e.g. col, select that column and use count instead:
df = df.groupby([df['startDate'].dt.date, 'unitName'])['col'].count()
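To make the difference between size and count concrete, here is a small sketch on made-up data (the values are hypothetical; the column names follow the question and answer). It also shows one way to flatten the result into a regular DataFrame, where the column name rows is an assumption.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question and answer.
df = pd.DataFrame({
    "startDate": pd.to_datetime([
        "2021-01-01 08:00", "2021-01-01 09:30",
        "2021-01-01 10:00", "2021-01-02 11:00",
    ]),
    "unitName": ["A", "A", "B", "A"],
    "col": [1.0, None, 2.0, 3.0],
})

keys = [df["startDate"].dt.date, "unitName"]

# size() counts every row in each (date, unitName) group.
sizes = df.groupby(keys).size()

# count() on a column skips missing values in that column.
counts = df.groupby(keys)["col"].count()

# Optional: turn the MultiIndex result into a flat DataFrame.
flat = sizes.reset_index(name="rows")

print(flat)
```

The results are assigned to new names here rather than back to df, so the original frame stays available for further grouping.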

reordering columns in vaex?

My question is how to reorder columns in vaex. For example, I want the 5th column in position 1 and the first column in position 5, etc. I know we can use the reindex method in pandas; is there a way to mimic that in vaex? Thanks for your help.
I think you can do this by selecting the columns in the order that you want.
import vaex
df = vaex.from_arrays(a=[1,2,3], b=[4,5,6])
display(df) # first a, then b
df = df[["b","a"]]
display(df) # first b, then a

Groupby does return previous df without changing it

df=pd.read_csv('../input/tipping/tips.csv')
df_1 = df.groupby(['day','time'])
df_1.head()
Guys, what am I missing here? It returns the previous dataframe, as if no groupby had been applied.
We can print each group using the following:
df_1 = df.groupby(['day','time']).apply(print)
groupby doesn't work the way you are assuming, by the sounds of it. It returns a GroupBy object rather than a new DataFrame, and nothing is aggregated until you call a method such as sum, mean or size on it. Calling head on the GroupBy object returns the first 5 rows of each group, in their original order and with their original index, which is why the result looks like the original dataframe. You can use @tlentali's approach to print out each group, but df_1 will not hold the grouped data that way, because print returns None for every group.
The way below gives a lot of control over how to show/display the groups and their keys
This might also help you understand more about how the grouped data frame structure in pandas works.
df_1 = df.groupby(['day','time'])
# for each (day,time) and grouped data
for key, group in df_1:
    # show the (day, time) key
    print(key)
    # print the head of this group's rows
    print(group.head())
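The behaviour described above can be checked on a tiny frame. This is a sketch under stated assumptions: the data is a hypothetical stand-in for tips.csv, and the tip column is invented for illustration; only day and time come from the question.

```python
import pandas as pd

# Hypothetical stand-in for tips.csv; only 'day' and 'time' come from the question.
df = pd.DataFrame({
    "day": ["Sun", "Sun", "Sat", "Sat", "Sat"],
    "time": ["Dinner", "Dinner", "Dinner", "Lunch", "Dinner"],
    "tip": [1.0, 2.0, 3.0, 4.0, 5.0],
})

grouped = df.groupby(["day", "time"])

# head(n) keeps the first n rows of each group with their original index,
# so the result still looks like a slice of the original frame.
first_rows = grouped.head(1)

# An aggregation is what actually collapses each group to one row.
mean_tip = grouped["tip"].mean()

# A single group can be pulled out by its key.
sun_dinner = grouped.get_group(("Sun", "Dinner"))

print(first_rows)
print(mean_tip)
```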

how to name columns?

I have a pandas Data Frame where some of the id's are repeated a few times. I've written this code:
df = df["id"].value_counts()
and got this output
What should I do to get something like in the following image?
Thanks
As Quang Hoang answered, value_counts sets the column you count as the index. Therefore, in order to get the id and the count as columns, you need to do two things:
Make the counts a column: to_frame(name='B')
Reset the index to make the ids another column, which we'll rename to the desired name: .reset_index().rename(columns={'index': 'A'})
So in one line it'll be:
df = df["id"].value_counts().to_frame(name='B').reset_index().rename(columns={'index': 'A'})
Note that in pandas 2.0 and later, value_counts names the index after the original column, so reset_index produces a column called 'id' rather than 'index'; in that case rename 'id' instead.
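Another possible way, once the result is a two-column DataFrame (that is, after reset_index), is to assign the column names directly:
col = ["A", "B"]
df.columns = col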
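Because the name of the column produced by reset_index differs between pandas versions, a sketch that avoids depending on it may be useful. It names the index with rename_axis before resetting it and passes the counts column name to reset_index. This is a swapped-in variant of the one-liner above, and the sample ids are hypothetical.

```python
import pandas as pd

# Hypothetical sample data with repeated ids.
df = pd.DataFrame({"id": [10, 10, 20, 10, 30, 20]})

counts = (
    df["id"].value_counts()   # ids become the index, counts the values
    .rename_axis("A")         # name the index explicitly
    .reset_index(name="B")    # index -> column 'A', counts -> column 'B'
)

print(counts)
```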

add unique column to a pandas dataframe

I have a pandas dataframe with 10 columns. I would like to add a column which will uniquely identify every row. I do have to come up with the unique value(could be as simple as a running sequence). How can I do this? I tried adding index as a column itself but for some reason I get a KeyError when I do this.
Add a column built from a range as long as your index:
df['new'] = range(1, len(df.index)+1)
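A short sketch of the same idea on a made-up frame (the data and the column name row_id are assumptions). It also shows reset_index as a way to copy the existing index into a column. Selecting df['index'] directly fails with a KeyError when no column has that name, which is one possible cause of the error mentioned in the question.

```python
import pandas as pd

# Hypothetical frame with a non-default index.
df = pd.DataFrame({"x": ["a", "b", "c"]}, index=[10, 20, 30])

# Running sequence 1..n, independent of the existing index.
df["row_id"] = range(1, len(df) + 1)

# Alternative: copy the existing index into a regular column.
# (df["index"] would raise KeyError here, because the index is not a column.)
with_index = df.reset_index()

print(with_index)
```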