How should I skip rows in a pandas DataFrame that are above the index column names? [duplicate] - pandas

This question already has answers here:
Pandas dataframe with multiindex column - merge levels
(4 answers)
Closed 2 months ago.
I want to put all the column names on one line. How should I do it?
I tried many things, including renaming the columns, but couldn't get it to work.

You need to flatten your multiindex column header.
df.columns = df.columns.map('_'.join)
Or using f-string with list comprehension:
df.columns = [f'{i}_{j}' if j else f'{i}' for i, j in df.columns]
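To make the difference between the two one-liners concrete, here is a minimal sketch. The two-level header (`id`, `price`/`min`, `price`/`max`) is a made-up example, not the asker's data, and it assumes every level value is a string.

```python
import pandas as pd

# Hypothetical frame with a two-level column header; the second level is
# empty for the "id" column, as often happens after groupby/agg or a pivot.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    columns=pd.MultiIndex.from_tuples(
        [("id", ""), ("price", "min"), ("price", "max")]
    ),
)

# Plain join keeps a trailing underscore when the second level is empty.
joined = list(df.columns.map("_".join))

# The conditional f-string skips the separator for empty second levels.
flat = df.copy()
flat.columns = [f"{i}_{j}" if j else f"{i}" for i, j in flat.columns]

print(joined)
print(list(flat.columns))
```

The list-comprehension form is usually the safer choice when some columns have an empty second level.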

Related

pandas filtering column names [duplicate]

This question already has answers here:
Find column whose name contains a specific string
(8 answers)
Closed 4 months ago.
I have this type of dataframe;
A = ["axa","axb","axc","axd","bxa","bxb","bxc","bxd","cxa".......]
My data looks like this, but with more than 350 columns. I need a new dataframe that contains only the columns whose names include 'c'. How can I do that?
The new dataframe's columns should look like this:
B = A[["axc","bxc","cxa","cxb","cxc","cxd","dxc","exc","fxc".......]]
Use DataFrame.filter to select the columns whose names contain c:
df2 = df.filter(like='c')
Or use a list comprehension to filter the column names:
df2 = df[[x for x in df.columns if 'c' in x]]
You can do it easily using list comprehension:
new_df = df[[col for col in df.columns if 'c' in col]]
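As a quick check that both approaches select the same columns, here is a small sketch. The column list is a shortened, made-up version of the one in the question, and the single row of values is a placeholder.

```python
import pandas as pd

# Shortened, hypothetical version of the asker's column names.
columns = ["axa", "axb", "axc", "bxa", "bxc", "cxa", "cxb"]
df = pd.DataFrame([list(range(len(columns)))], columns=columns)

# filter(like=...) keeps columns whose label contains the substring.
by_filter = df.filter(like="c")

# The list comprehension does the same substring test explicitly.
by_comprehension = df[[col for col in df.columns if "c" in col]]

print(list(by_filter.columns))
print(list(by_comprehension.columns))
```

Both keep the original column order, so the results should be interchangeable here.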

How to replace character into multiIndex pandas [duplicate]

This question already has an answer here:
Pandas dataframe replace string in multiple columns by finding substring
(1 answer)
Closed 11 months ago.
I have a dataset with several columns containing numbers, and I need to remove the ',' thousands separator.
Here is an example: 123,456.15 -> 123456.15.
I tried to do it on several columns at once, as follows:
toProcess = ['col1','col2','col3']
df[toProcess] = df[toProcess].str.replace(',','')
Unfortunately, this raises the error: 'DataFrame' object has no attribute 'str'. A DataFrame does not have the str accessor, but a Series does.
How can I achieve this efficiently?
Here is a working approach that iterates over the columns:
toProcess = ['col1','col2','col3']
for col in toProcess:
    df[col] = df[col].str.replace(',','')
Use:
df[toProcess] = df[toProcess].replace(',','', regex=True)
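A minimal sketch of the DataFrame-level replace, with made-up sample values. The final `.astype(float)` is an extra step not in the answer above, added on the assumption that the goal is numeric columns rather than cleaned strings.

```python
import pandas as pd

# Hypothetical sample data; "other" is deliberately left out of toProcess.
df = pd.DataFrame({
    "col1": ["123,456.15", "1,000"],
    "col2": ["7", "8,900.5"],
    "other": ["a,b", "c"],
})
toProcess = ["col1", "col2"]

# regex=True makes replace() act on substrings instead of whole cell values.
# astype(float) is an assumed follow-up step to get numeric columns.
df[toProcess] = df[toProcess].replace(",", "", regex=True).astype(float)

print(df)
```

Columns not listed in `toProcess` keep their commas, which is why the selection happens before the replace.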

All column names not listed by df.columns [duplicate]

This question already has answers here:
pandas groupby without turning grouped by column into index
(3 answers)
Closed 2 years ago.
I wanted to apply groupby and an agg function to my dataframe, so I ran the code below:
basic_df = df.groupby(['S2PName','S2PName-Category'], sort=False)['S2PGTotal'].agg([('totSale','sum'), ('count','size')])
basic_df.head(2)
My O/P:
                            totSale  count
S2PName S2PName-Category
IDLY    Food              598771.47  19749
DOSA    Food              567431.03  14611
Now I try to print the columns using basic_df.columns
My O/P:
Index(['totSale', 'count'], dtype='object')
Why are the other two columns, "S2PName" and "S2PName-Category", not displayed? What do I need to do to display them as well?
Thanks !
The grouping keys became the index of the result rather than columns. Pass as_index=False to groupby, or call reset_index() at the end:
basic_df = df.groupby(['S2PName','S2PName-Category'], sort=False,as_index=False)['S2PGTotal'].agg([('totSale','sum'), ('count','size')])
#basic_df = df.groupby(['S2PName','S2PName-Category'], sort=False)['S2PGTotal'].agg([('totSale','sum'), ('count','size')]).reset_index()
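Here is a small sketch of the reset_index() route. It swaps in keyword-style named aggregation instead of the list of tuples used above, and the three rows of data are made up for illustration.

```python
import pandas as pd

# Hypothetical data using the asker's column names.
df = pd.DataFrame({
    "S2PName": ["IDLY", "IDLY", "DOSA"],
    "S2PName-Category": ["Food", "Food", "Food"],
    "S2PGTotal": [10.0, 20.0, 5.0],
})

# Named aggregation: each keyword becomes an output column.
# reset_index() then moves the grouping keys from the index into columns.
basic_df = (
    df.groupby(["S2PName", "S2PName-Category"], sort=False)["S2PGTotal"]
      .agg(totSale="sum", count="size")
      .reset_index()
)

print(basic_df.columns.tolist())
```

After reset_index(), the grouping keys show up in basic_df.columns alongside the aggregated columns.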

Remove the duplicated entry from this list [duplicate]

This question already has answers here:
Drop all duplicate rows across multiple columns in Python Pandas
(8 answers)
Closed 3 years ago.
Based on the example below, how can I remove just the last "A" from the list? Using drop_duplicates as I did deletes both entries. The end result should be A, B, C, but now I get B, C.
import pandas as pd
df = pd.DataFrame({'ID': ["A", "B", "C", "A"]})
df.drop_duplicates(keep=False,inplace=True)
print(df)
You just need to set the parameter keep="first". With keep=False, every duplicated row is dropped, including its first occurrence.
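A short sketch comparing the two settings on the frame from the question. It returns new frames instead of using inplace=True so both results can be inspected side by side.

```python
import pandas as pd

df = pd.DataFrame({"ID": ["A", "B", "C", "A"]})

# keep=False drops every row that has a duplicate anywhere in the frame.
drop_all = df.drop_duplicates(keep=False)

# keep="first" keeps the first occurrence and drops only later repeats.
keep_first = df.drop_duplicates(keep="first")

print(drop_all["ID"].tolist())
print(keep_first["ID"].tolist())
```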

how to write list comprehension for selecting cells base on a substring [duplicate]

This question already has answers here:
Filter pandas DataFrame by substring criteria
(17 answers)
Closed 3 years ago.
I am trying to rewrite the following in one line using a list comprehension. I want to select only the cells that contain the substring '[edit]'. ut is my dataframe, and the column that I want to select from is 'col1'. Thanks!
for u in ut['col1']:
    if '[edit]' in u:
        print(u)
I expect the following output:
Alabama[edit]
Alaska[edit]
Arizona[edit]
...
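If a pandas Series is acceptable as output, you can use .str.contains without a loop. Pass regex=False so that '[edit]' is matched literally rather than as a regex character class, and select 'col1' so that the result is a Series instead of a DataFrame:
s = ut.loc[ut["col1"].str.contains("[edit]", regex=False), "col1"]
If you need to print each element of the Series separately, loop over it:
for i in s:
    print(i)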
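For the one-line list comprehension the question asks about, a minimal sketch follows. The contents of `ut` are made up (state names with the '[edit]' suffix mixed with other rows), and the `.str.contains` mask is shown next to it for comparison.

```python
import pandas as pd

# Hypothetical contents for ut['col1'].
ut = pd.DataFrame({
    "col1": ["Alabama[edit]", "Auburn", "Alaska[edit]", "Fairbanks", "Arizona[edit]"]
})

# One-line list comprehension equivalent of the original loop.
matches = [u for u in ut["col1"] if "[edit]" in u]

# Vectorized equivalent; regex=False treats "[edit]" as a literal substring.
mask = ut["col1"].str.contains("[edit]", regex=False)
as_series = ut.loc[mask, "col1"]

print(matches)
```

The comprehension returns a plain Python list, while the mask-based version returns a Series that keeps the original index.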