Extracting information from Pandas dataframe column headers [duplicate] - pandas

This question already has answers here:
Find column whose name contains a specific string
(8 answers)
Closed 2 years ago.
I have a pandas DataFrame whose column headers encode information. I want to loop over the headers and apply logical tests to each one so that I can extract only the columns relevant to me.
My df.columns output looks something like this:
['(param1:x)-(param2:y)-(param3:z1)',
'(param1:x)-(param2:y)-(param3:z2)',
'(param1:x)-(param2:y)-(param3:z3)']
I want to select only the columns that contain (param3:z1) or (param3:z3).
Is this possible?

You can use DataFrame.filter with a regular expression, which keeps the columns whose names match:
df = df.filter(regex='z1|z3')
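The pattern 'z1|z3' matches any header that merely contains z1 or z3, so it would also pick up a header ending in z10. A minimal sketch of a stricter variant follows, assuming the full tokens (param3:z1) and (param3:z3) are what should match; the one-row frame and the names wanted, pattern, selected, and same are made up for illustration.

```python
import re

import pandas as pd

# Hypothetical one-row frame using the headers from the question.
cols = [
    "(param1:x)-(param2:y)-(param3:z1)",
    "(param1:x)-(param2:y)-(param3:z2)",
    "(param1:x)-(param2:y)-(param3:z3)",
]
df = pd.DataFrame([[1, 2, 3]], columns=cols)

# Escape the parentheses so they are matched literally, then join with "|".
wanted = ["(param3:z1)", "(param3:z3)"]
pattern = "|".join(re.escape(token) for token in wanted)

# filter(regex=...) keeps the columns whose label matches the pattern.
selected = df.filter(regex=pattern)

# Equivalent selection with a boolean mask over the column labels.
same = df.loc[:, df.columns.str.contains(pattern)]

print(list(selected.columns))
```

Escaping matters here because unescaped parentheses would be read as a regex group rather than as literal characters.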

Related

Pandas Stack Column Number Mismatch [duplicate]

This question already has answers here:
Pandas: Adding new column to dataframe which is a copy of the index column
(3 answers)
Closed 1 year ago.
I am trying to stack and get 3 columns, not 1.
Hello, I am trying to use the stack function in pandas, but the result has only 1 column according to shape, even though 3 are displayed. I can see that they are on different levels, and I have tried working with levels without success. What can I do? I need 3 columns!
Thanks
Use new_cl_traff.reset_index().
As you can see in your screenshot, your dataframe has a multi-index made up of year and month. See the line where you name the two index levels:
new_cl_traf.index.set_names(["Year","Month"], inplace=True)
See the pandas documentation for DataFrame.stack.
Calling new_cl_traff.reset_index() resets the index, or a subset of its levels, and turns those levels back into ordinary columns. See the documentation for DataFrame.reset_index.
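The effect can be reproduced on a small frame. This is only a sketch: the original data was in a screenshot, so the frame wide, its values, and the column name "Traffic" are invented for illustration.

```python
import pandas as pd

# Hypothetical wide table: one row per year, one column per month.
wide = pd.DataFrame(
    {"Jan": [10, 30], "Feb": [20, 40]},
    index=[2019, 2020],
)

# stack() moves the column labels into an inner index level,
# so the result is a Series with a two-level index.
stacked = wide.stack()
stacked.index.names = ["Year", "Month"]
print(stacked.shape)  # (4,)

# reset_index() turns both index levels into ordinary columns.
flat = stacked.reset_index(name="Traffic")
print(flat.shape)  # (4, 3)
print(list(flat.columns))  # ['Year', 'Month', 'Traffic']
```

The stacked result displays three things per row (year, month, value), but two of them are index levels, which is why shape does not count them as columns.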

How do I create lists of items for every unique ID in a Pandas DataFrame? [duplicate]

This question already has answers here:
How to get unique values from multiple columns in a pandas groupby
(3 answers)
Python pandas unique value ignoring NaN
(4 answers)
Closed 1 year ago.
Imagine I have a table that looks like this.
[image: original table]
How do I convert it into this?
[image: converted table]
Sample data is attached. Thanks.

Pandas - list of unique strings in a column [duplicate]

This question already has answers here:
Find the unique values in a column and then sort them
(8 answers)
Closed 1 year ago.
I have a dataframe column that contains these values:
A
A
A
F
R
R
B
B
A
...
I would like to make a list of the distinct strings, such as [A, B, F, ...].
I've used groupby with nunique(), but I don't need the counts.
How can I make the list?
Thanks
unique() is enough:
df['col'].unique().tolist()
pandas.Series.nunique(), by contrast, returns the number of unique items.
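As a quick check of the difference, here is a small sketch using the values listed in the question; the column name 'col' follows the answer, and the sorted variant is an optional extra in case alphabetical order is wanted.

```python
import pandas as pd

# The values from the question, in the order given.
df = pd.DataFrame({"col": ["A", "A", "A", "F", "R", "R", "B", "B", "A"]})

# unique() keeps the order of first appearance.
distinct = df["col"].unique().tolist()
print(distinct)  # ['A', 'F', 'R', 'B']

# Sort the list if an alphabetical result is wanted.
distinct_sorted = sorted(distinct)
print(distinct_sorted)  # ['A', 'B', 'F', 'R']

# nunique() only counts the distinct values.
count = df["col"].nunique()
print(count)  # 4
```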

filter multiple separate rows in a DataFrame that meet the condition in another DataFrame with pandas? [duplicate]

This question already has answers here:
How to filter Pandas dataframe using 'in' and 'not in' like in SQL
(11 answers)
Closed 2 years ago.
This is my DataFrame
import pandas as pd
df = pd.DataFrame({'uid': [109200005, 108200056, 109200060, 108200085, 108200022],
                   'grades': [69.233627, 70.130900, 83.357011, 88.206387, 74.342212]})
This is my condition list which comes from another DataFrame
condition_list = [109200005, 108200085]
I use this code to filter records that meet the condition
idx_list = []
for i in condition_list:
    # position of the first row whose uid equals i
    idx_list.append(df[df['uid']==i].index.values[0])
and get what I need
>>> df.iloc[idx_list]
         uid     grades
0  109200005  69.233627
3  108200085  88.206387
The job is done, but I'd like to know whether there is a simpler way to do it.
Yes, use isin:
df[df['uid'].isin(condition_list)]
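A runnable sketch of the isin approach on the data from the question is shown below; the names mask, matched, and unmatched are introduced here for illustration, and the negated mask is an addition covering the "not in" case.

```python
import pandas as pd

df = pd.DataFrame({
    "uid": [109200005, 108200056, 109200060, 108200085, 108200022],
    "grades": [69.233627, 70.130900, 83.357011, 88.206387, 74.342212],
})
condition_list = [109200005, 108200085]

# Boolean mask: True where uid appears in condition_list.
mask = df["uid"].isin(condition_list)

# Rows whose uid is in the list (SQL "IN").
matched = df[mask]

# Rows whose uid is not in the list (SQL "NOT IN").
unmatched = df[~mask]

print(matched)
```

One difference from the loop: isin returns rows in the dataframe's own order and silently ignores values that are absent, whereas the loop follows the order of condition_list and raises an IndexError when a uid is missing.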

How to write a list comprehension for selecting cells based on a substring [duplicate]

This question already has answers here:
Filter pandas DataFrame by substring criteria
(17 answers)
Closed 3 years ago.
I am trying to rewrite the following loop in one line using a list comprehension. I want to select only the cells that contain the substring '[edit]'. ut is my dataframe, and the column I want to select from is 'col1'. Thanks!
for u in ut['col1']:
    if '[edit]' in u:
        print(u)
I expect the following output:
Alabama[edit]
Alaska[edit]
Arizona[edit]
...
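If a pandas Series is acceptable as output, you can use .str.contains without a loop. Select the column after filtering so that the result is a Series rather than a DataFrame, and pass regex=False so that '[edit]' is matched literally instead of being read as a regex character class:
s = ut.loc[ut["col1"].str.contains("[edit]", regex=False), "col1"]
If you need to print each element of the Series separately, loop over it:
for i in s:
    print(i)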
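For the one-line list comprehension the question asks about, a minimal sketch follows. The sample rows in ut are invented for illustration, and the names matches and series_matches are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; only some cells carry the "[edit]" marker.
ut = pd.DataFrame({
    "col1": ["Alabama[edit]", "Auburn", "Alaska[edit]", "Fairbanks"],
})

# One-line list comprehension equivalent to the original loop.
matches = [u for u in ut["col1"] if "[edit]" in u]
print(matches)  # ['Alabama[edit]', 'Alaska[edit]']

# Vectorized alternative: regex=False makes "[edit]" a literal substring.
mask = ut["col1"].str.contains("[edit]", regex=False)
series_matches = ut.loc[mask, "col1"]
print(series_matches.tolist())  # ['Alabama[edit]', 'Alaska[edit]']
```

The comprehension assumes every cell is a string; a missing value (NaN) in the column would make the in test raise a TypeError, while str.contains accepts an na argument to decide how such cells are treated.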