how to name colums? - pandas

I have a pandas Data Frame where some of the id's are repeated a few times. I've written this code:
df = df["id"].value_counts()
and got this output
What should I do to get something like in the following image?
Thanks

As Quang Hoang answered, value_counts set the column you count as the index. Therefore in order to get the id and the count as columns, you need to do 2 things:
Make the counts as column - to_frame(name='B')
Reset the index to make the ids another column which we'll rename to the desired name: .reset_index().rename(columns={'index': 'A'})
So in one line it'll be:
df = df["id"].value_counts().to_frame(name='B').reset_index().rename(columns={'index': 'A'})

Another possible way is:
col = list(["A", "B")]
df.columns = col

Related

pandas: split pandas columns of unequal length list into multiple columns

I have a dataframe with one column of unequal list which I want to spilt into multiple columns (the item value will be the column names). An example is given below
I have done through iterrows, iterating thruough the rows and examine the list from each rows. It seem workable as my dataframe has few rows. However, I wonder if there is any clean methods
I have done through additional_df = pd.DataFrame(venue_df.location.values.tolist())
However the list break down into as below
thanks fro your help
Can you try this code: built assuming venue_df.location contains the list you have shown in the cells.
venue_df['school'] = venue_df.location.apply(lambda x: ('school' in x)+0)
venue_df['office'] = venue_df.location.apply(lambda x: ('office' in x)+0)
venue_df['home'] = venue_df.location.apply(lambda x: ('home' in x)+0)
venue_df['public_area'] = venue_df.location.apply(lambda x: ('public_area' in x)+0)
Hope this helps!
First lets explode your location column, so we can get your wanted end result.
s=df['Location'].explode()
Then lets use crosstab in that series so we can get your end result
import pandas as pd
pd.crosstab(s).unstack()
I didnt test it out cause i dont know you base_df

How to index a column with two values pandas

I have two dataframes:
Dataframe #1
Reads the values--Will only be interested in NodeID AND GSE
sta = pd.read_csv(filename)
Dataframe #2
Reads the file, use pivot and get the following result
sim = pd.read_csv(headout,index_col=0)
sim['Layer'] = sim.groupby('date').cumcount() + 1
sim['Layer'] = 'L' + sim['Layer'].astype(str)
sim = sim.pivot(index = None , columns = 'Layer').T
This gives me the index column to be with two values. (The header is blank for the first one, and Layers for the second) i.e 1,L1.
What I need help on is:
I can not find a way to rename that first blank in the index to 'NodeID'.
I want to name it that so that I can do the lookup function and use NodeID in both dataframes so that I can bring in the 'GSE' values from the first dataframe to the second.
I have been googling way to rename that first column in the second dataframe and I can not seem to find an solution. Any ideas help at this point. I think my pivot function might be wrong...
This is a picture of dataframe #2 before pivot. The number 1-4 are the Node ID.
when I export it to csv to see what the dataframe looks like I get this..
Try
df.rename(columns={"Index": "your preferred name"})
if it is your index then do -
df = df.reset_index()
df.rename(columns={"index": "your preferred name"})

Is there any way to run a process on selected columns but keep the columns in the dataframe?

I am using pandas.DataFrame with 85 columns. I want to run a process on 84 of the columns and keep the 85th column unchanged. I don't want to drop or delete the 85th column. Here is the process I am trying:
normalized_df = (df - df.min()) / (df.max() - df.min())
You can try the following code, using list comprehension for selecting only columns you want to run process on:
UNWANTED_COL_NAME = 'name_of_unwanted_column'
touchable_columns = [col for col in df.columns if col != UNWANTED_COL_NAME]
normalized_df = (df[touchable_columns] - df[touchable_columns].min()) / (df[touchable_columns].max() - df[touchable_columns].min())
normalized_df[UNWANTED_COL_NAME] = df[UNWANTED_COL_NAME]
IIUC you want to skip last column (85th) only for that you can skip last columns using slicing like df.iloc[:,:-1] for operation and then join last column with result.
Ex.:
normalized_df = (df.iloc[:,:-1]-df.iloc[:,:-1].min())/(df.iloc[:,:-1].max()-df.iloc[:,:-1].min()).join(df.iloc[:,-1])

Pandas groupby year filtering the dataframe by n largest values

I have a dataframe at hourly level with several columns. I want to extract the entire rows (containing all columns) of the 10 top values of a specific column for every year in my dataframe.
so far I ran the following code:
df = df.groupby([df.index.year])['totaldemand'].apply(lambda grp: grp.nlargest(10)))
The problem here is that I only get the top 10 values for each year of that specific column and I lose the other columns. How can I do this operation and having the corresponding values of the other columns that correspond to the top 10 values per year of my 'totaldemand' column?
We usually do head after sort_values
df = df.sort_values('totaldemand',ascending = False).groupby([df.index.year])['totaldemand'].head(10)
nlargest can be applied to each group, passing the column to look for
largest values.
So run:
df.groupby([df.index.year]).apply(lambda grp: grp.nlargest(3, 'totaldemand'))
Of course, in the final version replace 3 with your actual value.
Get the index of your query and use it as a mask on your original df:
idx = df.groupby([df.index.year])['totaldemand'].apply(lambda grp: grp.nlargest(10))).index.to_list()
df.iloc[idx,]
(or something to that extend, I can't test now without any test data)

renaming columns after group by and sum in pandas dataframe

This is my group by command:
pdf_chart_data1 = pdf_chart_data.groupby('sell').value.agg(['sum']).rename(
columns={'sum':'valuesum','sell' : 'selltime'}
)
I am able to change the column name for value but not for 'sell'.
Please help to resolve this issue.
You cannot rename it, because it is index. You can add as_index=False for return DataFrame or add reset_index:
pdf_chart_data1=pdf_chart_data.groupby('sell', as_index=False)['value'].sum()
.rename(columns={'sum':'valuesum','sell' : 'selltime'})
Or:
pdf_chart_data1=pdf_chart_data.groupby('sell')['value'].sum()
.reset_index()
.rename(columns={'sum':'valuesum','sell' : 'selltime'})
df = df.groupby('col1')['col1'].count()
df1= df.to_frame().rename(columns={'col1':'new_name'}).reset_index()
If you join to groupby with the same index where one is nunique ->number of unique items and one is unique->list of unique items then you get two columns called Sport. Using as_index=False I was able to rename the second Sport name using rename then concat the two lists together and sort descending on sport and display the 10 five sportcounts.
grouped=df.groupby('NOC', as_index=False)
Nsport=grouped['Sport'].nunique()\
.rename(columns={'Sport':'SportCount'})
Nsport=Nsport.set_index('NOC')
country_grouped=df.groupby('NOC')
Nsport2=country_grouped['Sport'].unique()
df2=pd.concat([Nsport,Nsport2], join='inner',axis=1).reindex(Nsport.index)
df2=df2.sort_values(by=["SportCount"],ascending=False)
print(df2.columns)
for key,item in df2.head(5).iterrows():
print(key,item)