Add a unique column to a pandas DataFrame

I have a pandas DataFrame with 10 columns. I would like to add a column that uniquely identifies every row. I do have to come up with the unique value (it could be as simple as a running sequence). How can I do this? I tried adding the index as a column, but for some reason I get a KeyError when I do this.

Add a column built from a range over the length of your index:
df['new'] = range(1, len(df.index)+1)
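Here is a minimal sketch of that answer on a small hypothetical frame. The column names a, b and row_id are made up for illustration. It also shows copying the existing index into a column, which gives a per-row identifier too, as long as the index itself is unique.

```python
import pandas as pd

# Hypothetical stand-in for the 10-column DataFrame in the question
df = pd.DataFrame({"a": [10, 20, 30], "b": ["x", "y", "z"]})

# Running sequence 1..n; it does not depend on the current index labels
df["new"] = range(1, len(df.index) + 1)

# Alternative: copy the index labels into an ordinary column
df["row_id"] = df.index

print(df)
```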

Related

Find the average of a column based on another column in pandas?

I'm working in a Jupyter notebook, and I would like to get the average 'pcnt_change' for each 'day_of_week'. How do I do this?
A simple groupby call would do the trick here.
If df is the pandas dataframe:
df.groupby('day_of_week').mean()
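would return a DataFrame with the average of all numeric columns, with day_of_week as the index. In pandas 2.x, pass numeric_only=True if the DataFrame also has non-numeric (e.g. string) columns, otherwise mean() raises a TypeError. If you want only certain column(s) returned, select just the needed columns before the groupby call, e.g.: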
df[['open_price', 'high_price', 'day_of_week']].groupby('day_of_week').mean()
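A short sketch with hypothetical data. Only day_of_week and pcnt_change come from the question; the ticker column is invented to show a non-numeric column. Selecting the single column after grouping returns a Series of averages indexed by day_of_week.

```python
import pandas as pd

# Hypothetical data; only day_of_week and pcnt_change come from the question
df = pd.DataFrame({
    "day_of_week": ["Mon", "Tue", "Mon", "Tue"],
    "pcnt_change": [1.0, -2.0, 3.0, 4.0],
    "ticker": ["A", "A", "B", "B"],  # a non-numeric column
})

# Select the one column after grouping: a Series of means indexed by day_of_week
avg = df.groupby("day_of_week")["pcnt_change"].mean()

# Mean of every numeric column; numeric_only=True skips "ticker"
all_means = df.groupby("day_of_week").mean(numeric_only=True)

print(avg)
```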

Pandas DataFrame: How to get the cell instead of its value

I have a task to compare two DataFrames with the same column names but different sizes; let's call them previous and current. I am trying to get the difference between previous and current in the Quantity and Booked columns and highlight it in yellow. The common key between the two DataFrames is the 'SN' column.
I have coded the following:
for idx, rows in df_n.iterrows():
    if rows["Quantity"] == rows['Available'] + rows['Booked']:
        continue
    else:
        rows["Quantity"] = rows["Quantity"] - rows['Available'] - rows['Booked']
        df_n.loc[idx, 'Quantity'].style.applymap('background-color: yellow')
    # pdb.set_trace()
    if (df_o['Booked'][df_o['SN'] == rows["SN"]] != rows['Booked']).bool():
        df_n.loc[idx, 'Booked'].style.apply('background-color: yellow')
I realise I have a few problems here and need some help:
df_n.loc[idx, 'Quantity'] returns a value instead of a DataFrame. How can I get a DataFrame from one cell? Do I have to use pd.DataFrame(data=df_n.loc[idx, 'Quantity'], index=idx, columns='Quantity')? Will this create a copy or update the reference?
How do I compare the SN of both DataFrames? I'm looking for a better way to compare. One thing I could think of is to set the index of both DataFrames and reset them back when I'm finished using them.
My dataframes (shown as images in the original post): Previous dataframe, Current dataframe
df_n.loc[idx, 'Quantity'] returns value instead of a dataframe type. How can I get a dataframe from one cell. Do I have to pd.DataFrame(data=df_n.loc[idx, 'Quantity'], index=idx, columns='Quantity'). Will this create a copy or will update the reference?
To create a DataFrame from one cell you can try: df_n.loc[idx, ['Quantity']].to_frame().T
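A small sketch with made-up data (the index labels 10 and 11 and the SN values are hypothetical) that compares the answer's approach with passing lists for both the row and the column label, which returns a 1x1 DataFrame directly. On the copy question: both results are new objects, so to change df_n itself you assign through df_n.loc.

```python
import pandas as pd

# Hypothetical current frame
df_n = pd.DataFrame({"SN": ["a1", "b2"], "Quantity": [5, 7]}, index=[10, 11])
idx = 11

# Answer's approach: a one-element Series, turned into a frame, then transposed to one row
one_cell_a = df_n.loc[idx, ["Quantity"]].to_frame().T

# Lists for both the row and the column label return a 1x1 DataFrame directly
one_cell_b = df_n.loc[[idx], ["Quantity"]]

# Neither result is linked back to df_n; to update df_n, assign through df_n.loc
df_n.loc[idx, "Quantity"] = 0
```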
How do I compare the SN of both dataframe, looking for a better way to compare. One thing I could think of is to use set index for both dataframe and when finished using them, reset them back?
You can use df_n.merge(df_o, on='SN') to merge the DataFrames on their common key and then compare the columns.
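A minimal sketch of the merge-and-compare idea. The rows are hypothetical, and the suffixes _cur/_prev and the *_changed column names are illustrative choices, not part of the answer. An inner merge keeps only SNs present in both frames, so differing frame sizes are not a problem.

```python
import pandas as pd

# Hypothetical previous (df_o) and current (df_n) frames sharing the SN key
df_o = pd.DataFrame({"SN": ["a1", "b2", "c3"], "Quantity": [5, 7, 2], "Booked": [1, 2, 0]})
df_n = pd.DataFrame({"SN": ["b2", "a1", "d4"], "Quantity": [9, 5, 4], "Booked": [2, 3, 1]})

# Inner merge keeps only SNs present in both; suffixes label where each column came from
merged = df_n.merge(df_o, on="SN", how="inner", suffixes=("_cur", "_prev"))

# True where the value changed between previous and current
merged["Quantity_changed"] = merged["Quantity_cur"] != merged["Quantity_prev"]
merged["Booked_changed"] = merged["Booked_cur"] != merged["Booked_prev"]

print(merged[["SN", "Quantity_changed", "Booked_changed"]])
```

The boolean columns could then drive a styling function through DataFrame.style (which needs jinja2 installed) to colour the changed cells yellow.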

How to name columns?

I have a pandas DataFrame where some of the ids are repeated a few times. I've written this code:
df = df["id"].value_counts()
and got this output
What should I do to get something like the result in the following image?
Thanks
As Quang Hoang answered, value_counts sets the column you count as the index. Therefore, to get the id and the count as columns, you need to do two things:
Make the counts a column: to_frame(name='B')
Reset the index to turn the ids into another column, which we then rename to the desired name: .reset_index().rename(columns={'index': 'A'})
So in one line it'll be:
df = df["id"].value_counts().to_frame(name='B').reset_index().rename(columns={'index': 'A'})
Another possible way is to reset the index and then assign both column names at once:
df = df["id"].value_counts().reset_index()
df.columns = ["A", "B"]
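A version-independent sketch with hypothetical ids. In pandas 2.x, value_counts returns a Series named "count" whose index takes the counted column's name ("id"), so reset_index produces an "id" column rather than "index", and rename(columns={'index': 'A'}) leaves it unchanged. Naming the index explicitly with rename_axis avoids depending on those defaults.

```python
import pandas as pd

# Hypothetical data with repeated ids
df = pd.DataFrame({"id": [101, 102, 101, 103, 101, 102]})

# rename_axis names the index (the ids); reset_index(name=...) names the counts column
counts = df["id"].value_counts().rename_axis("A").reset_index(name="B")

print(counts)
```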

Position of index column in CSV output from pandas data frame

I am trying to reposition the index column in the CSV output of pandas DataFrame.to_csv().
I can order the non-index columns using the columns argument, but it is unclear how to move the index column.
If I have two columns, Name and Age, plus the index, I want the columns to come out in the resulting CSV in the order Name, Age, index.
Does anyone know how to do this?
The index cannot be moved; it is always the first column of a DataFrame or Series, and to_csv writes it first. But you can copy the data from the index to another column.
If you need the last column to be created from the index:
df['new_last'] = df.index
If you need a custom position for the new column:
df.insert(2, 'new', df.index)
Finally, to prevent writing the index to the CSV (thanks @Vivek Kalyanarangan):
df.to_csv(file, index=False)
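A self-contained sketch with a hypothetical frame. Calling to_csv() without a path returns the CSV text as a string, so nothing is written to disk. Option 1 is an alternative to the answer that uses the columns argument mentioned in the question; option 2 follows the answer.

```python
import pandas as pd

# Hypothetical frame with Name, Age and a non-default index
df = pd.DataFrame({"Name": ["Ann", "Bob"], "Age": [31, 42]}, index=[7, 8])

# Option 1: reset_index() turns the unnamed index into a column called "index",
# and the columns argument of to_csv fixes the output order
csv_1 = df.reset_index().to_csv(columns=["Name", "Age", "index"], index=False)

# Option 2 (the answer's approach): copy the index into a new last column
df2 = df.copy()
df2["index"] = df2.index
csv_2 = df2.to_csv(index=False)

print(csv_1)
```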

Renaming columns after groupby and sum in a pandas DataFrame

This is my groupby command:
pdf_chart_data1 = pdf_chart_data.groupby('sell').value.agg(['sum']).rename(
columns={'sum':'valuesum','sell' : 'selltime'}
)
I am able to change the column name for value but not for 'sell'.
Please help to resolve this issue.
You cannot rename it because it is the index. You can add as_index=False to return a DataFrame, or add reset_index:
# after .sum() the summed column is still named 'value', so that is the key to rename
pdf_chart_data1 = (pdf_chart_data.groupby('sell', as_index=False)['value'].sum()
                   .rename(columns={'value': 'valuesum', 'sell': 'selltime'}))
Or:
# reset_index() moves 'sell' from the index into a column so it can be renamed
pdf_chart_data1 = (pdf_chart_data.groupby('sell')['value'].sum()
                   .reset_index()
                   .rename(columns={'value': 'valuesum', 'sell': 'selltime'}))
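Another option, sketched with hypothetical data: named aggregation (available since pandas 0.25) names the summed column directly, and rename_axis renames the group index before it becomes a column. This is a different technique from the answer above, not a correction of it.

```python
import pandas as pd

# Hypothetical data using the question's column names
pdf_chart_data = pd.DataFrame({"sell": ["09:00", "10:00", "09:00"], "value": [5, 1, 2]})

# Named aggregation names the summed column; rename_axis renames the group index
pdf_chart_data1 = (
    pdf_chart_data.groupby("sell")
    .agg(valuesum=("value", "sum"))
    .rename_axis("selltime")
    .reset_index()
)
print(pdf_chart_data1)
```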
df = df.groupby('col1')['col1'].count()
df1= df.to_frame().rename(columns={'col1':'new_name'}).reset_index()
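A shorter sketch of the same count-and-rename step, with hypothetical data: groupby(...).size() counts rows per group, and Series.reset_index(name=...) names the count column in one call. When grouping on col1 itself this matches count(), because NaN group keys are dropped by default.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"col1": ["a", "b", "a", "c", "a"]})

# size() counts rows per group; reset_index(name=...) names the count column
counts = df.groupby("col1").size().reset_index(name="new_name")
print(counts)
```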
If you combine two groupbys on the same index, where one uses nunique (the number of unique items) and the other uses unique (the list of unique items), you get two columns called Sport. Using as_index=False, I was able to rename the nunique Sport column to SportCount with rename, then concatenate the two results, sort descending on SportCount, and display the top five sport counts.
grouped=df.groupby('NOC', as_index=False)
Nsport=grouped['Sport'].nunique()\
.rename(columns={'Sport':'SportCount'})
Nsport=Nsport.set_index('NOC')
country_grouped=df.groupby('NOC')
Nsport2=country_grouped['Sport'].unique()
df2=pd.concat([Nsport,Nsport2], join='inner',axis=1).reindex(Nsport.index)
df2=df2.sort_values(by=["SportCount"],ascending=False)
print(df2.columns)
for key,item in df2.head(5).iterrows():
    print(key, item)
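A condensed sketch of the same idea on hypothetical rows (only the NOC and Sport column names come from the answer; the "Sports" name is made up). Naming each Series before concatenating avoids the duplicate Sport columns, so neither as_index=False nor set_index is needed.

```python
import pandas as pd

# Hypothetical rows; only the NOC and Sport column names come from the answer
df = pd.DataFrame({
    "NOC": ["USA", "USA", "USA", "GBR", "GBR", "GBR", "FRA"],
    "Sport": ["Swimming", "Athletics", "Rowing", "Rowing", "Rowing", "Cycling", "Fencing"],
})

by_noc = df.groupby("NOC")["Sport"]
# Naming each Series up front avoids two columns both called "Sport"
counts = by_noc.nunique().rename("SportCount")
sports = by_noc.unique().rename("Sports")

# Both Series share the NOC index, so they line up side by side
df2 = pd.concat([counts, sports], axis=1).sort_values("SportCount", ascending=False)
print(df2.head(5))
```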