Pandas Stack Column Number Mismatch [duplicate] - pandas

This question already has answers here:
Pandas: Adding new column to dataframe which is a copy of the index column
(3 answers)
Closed 1 year ago.
Trying to stack and get 3 columns instead of 1
Hello, I am trying to use the stack function in pandas, but the result has only 1 column according to shape, even though 3 are displayed. I can see that they are on different levels, and I have tried working with the levels without success. What can I do? I need 3 columns.
Thanks

Use new_cl_traff.reset_index()

As you can see in your screenshot, your dataframe has a multi-index with year and month; see the line where you name the two index levels:
new_cl_traf.index.set_names(["Year","Month"], inplace=True)
See the pandas documentation for DataFrame.stack.
If you use new_cl_traff.reset_index(), the index (or a subset of its levels) is reset and the index levels become regular columns; see the pandas documentation for reset_index.
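
A minimal sketch of what the answer describes, using made-up data (the frame `wide`, its values, and the column name "Traffic" are assumptions, since the original screenshot is not available). Stacking produces a Series whose shape has a single dimension, and reset_index turns the two index levels into ordinary columns:

```python
import pandas as pd

# Hypothetical wide table: one row per year, one column per month.
wide = pd.DataFrame(
    {"Jan": [100, 120], "Feb": [110, 130]},
    index=[2019, 2020],
)

# stack() moves the column labels into a second index level.
stacked = wide.stack()
stacked.index.names = ["Year", "Month"]
print(stacked.shape)  # one-dimensional: the year and month live in the index

# reset_index() turns both index levels into regular columns.
flat = stacked.reset_index(name="Traffic")
print(flat.shape)
print(list(flat.columns))
```

The year and month are displayed next to the values after stacking, but they belong to the index, which is why shape does not count them as columns.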

Related

When to use pandas ‘loc’ for dataframe slicing [duplicate]

This question already has answers here:
Python: Pandas Series - Why use loc?
(3 answers)
Closed 1 year ago.
In pandas, if I have a dataframe, I can subset it like:
df[df.col == some_condition]
Also, I can do:
df.loc[df.col == some_condition]
What is the difference between the two? The ‘loc’ approach seems more verbose.
In simple words:
There are three primary indexers for pandas. We have the indexing operator itself (the brackets []), .loc, and .iloc. Let's summarize them:
[] - Primarily selects subsets of columns, but can select rows as well. Can't simultaneously select rows and columns.
.loc - selects subsets of rows and columns by label (a boolean mask such as df.col == some_condition is also accepted)
.iloc - selects subsets of rows and columns by integer location only
For a more detailed explanation, see the linked question.
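
The three indexers summarized above can be compared side by side in a small sketch. The dataframe and its column names are made up for illustration:

```python
import pandas as pd

# Hypothetical data with string row labels.
df = pd.DataFrame(
    {"col": [1, 5, 10], "other": ["a", "b", "c"]},
    index=["x", "y", "z"],
)
mask = df["col"] > 3

# For pure row filtering, [] and .loc give the same result.
rows_brackets = df[mask]
rows_loc = df.loc[mask]

# .loc can also pick columns in the same call; [] cannot.
subset = df.loc[mask, "other"]

# .iloc does the same kind of selection by integer position.
by_position = df.iloc[1:, 1]

# .loc also supports assignment to the selected cells in one step.
df.loc[mask, "col"] = 0
```

For a plain boolean row filter the two forms are interchangeable; .loc becomes useful once a column selection or an assignment is involved.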

filter multiple separate rows in a DataFrame that meet the condition in another DataFrame with pandas? [duplicate]

This question already has answers here:
How to filter Pandas dataframe using 'in' and 'not in' like in SQL
(11 answers)
Closed 2 years ago.
This is my DataFrame
import pandas as pd
df = pd.DataFrame({'uid': [109200005, 108200056, 109200060, 108200085, 108200022],
                   'grades': [69.233627, 70.130900, 83.357011, 88.206387, 74.342212]})
This is my condition list which comes from another DataFrame
condition_list = [109200005, 108200085]
I use this code to filter records that meet the condition
idx_list = []
for i in condition_list:
    idx_list.append(df[df['uid']==i].index.values[0])
and get what I need
>>> df.iloc[idx_list]
         uid     grades
0  109200005  69.233627
3  108200085  88.206387
The job is done. I'd just like to know: is there a simpler way to do it?
Yes, use isin:
df[df['uid'].isin(condition_list)]
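
As a runnable sketch of the isin approach on the question's data, including the negated form with ~ for the "not in" case (the variable names `matched` and `unmatched` are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "uid": [109200005, 108200056, 109200060, 108200085, 108200022],
    "grades": [69.233627, 70.130900, 83.357011, 88.206387, 74.342212],
})
condition_list = [109200005, 108200085]

# Rows whose uid appears in the list (SQL "IN").
matched = df[df["uid"].isin(condition_list)]

# Rows whose uid does not appear in the list (SQL "NOT IN").
unmatched = df[~df["uid"].isin(condition_list)]

print(matched)
```

Unlike the loop in the question, isin does not raise an error when a value in the list is missing from the dataframe; such values are simply not matched.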

How to use nlargest in pandas? [duplicate]

This question already has answers here:
Pandas max value index
(3 answers)
Closed 2 years ago.
I'm looking for the highest row of a dataframe; the idea is to pick the highest value and its index. I'm trying to use this code:
data_q11.nlargest(144,['1980','2010'])
where data_q11 is the dataframe, 144 is the number of rows in this df, and ['1980','2010'] is the range of columns.
However, the result is empty: 0 rows x 31 columns.
There is a function in pandas that returns the index label of the maximum value. Called on a single column (a Series), it takes no axis=1 argument:
data_q11['col'].idxmax()
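
A short sketch of idxmax next to nlargest, on made-up data. The frame `data`, its values, and its row labels are assumptions; only the column names '1980' and '2010' come from the question:

```python
import pandas as pd

# Hypothetical data: one row per label, one column per year.
data = pd.DataFrame(
    {"1980": [3.0, 7.5, 1.2], "2010": [4.1, 6.0, 9.9]},
    index=["A", "B", "C"],
)

# Index label of the largest value in one column, then the value itself.
top_label = data["1980"].idxmax()
top_value = data.loc[top_label, "1980"]

# Called on the whole frame, idxmax gives one label per column.
per_column = data.idxmax()

# nlargest returns the top n rows ordered by the given column.
top2 = data.nlargest(2, "2010")
```

idxmax answers "where is the maximum", while nlargest returns whole rows, so asking nlargest for as many rows as the frame has only sorts it.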

Extracting information from Pandas dataframe column headers [duplicate]

This question already has answers here:
Find column whose name contains a specific string
(8 answers)
Closed 2 years ago.
I have a pandas dataframe with column headers, which contain information. I want to loop through the column headers and use logical operations on each header to extract the columns with the relevant information that I have.
My df.columns command gives something like this:
['(param1:x)-(param2:y)-(param3:z1)',
'(param1:x)-(param2:y)-(param3:z2)',
'(param1:x)-(param2:y)-(param3:z3)']
I want to select only the columns that contain (param3:z1) and (param3:z3).
Is this possible?
You can use filter:
df = df.filter(regex='z1|z3')
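
A runnable sketch of the filter approach on the headers from the question, with made-up cell values. The pattern 'z1|z3' matches those characters anywhere in a header, so this version spells out the full (param3:...) token to avoid accidental matches such as z10; that stricter pattern is a suggestion, not part of the original answer:

```python
import pandas as pd

columns = [
    "(param1:x)-(param2:y)-(param3:z1)",
    "(param1:x)-(param2:y)-(param3:z2)",
    "(param1:x)-(param2:y)-(param3:z3)",
]
# Hypothetical values; only the headers come from the question.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=columns)

# Keep columns whose header contains (param3:z1) or (param3:z3).
selected = df.filter(regex=r"\(param3:z[13]\)")

# Equivalent selection with plain substring checks instead of a regex.
wanted = ("(param3:z1)", "(param3:z3)")
keep = [c for c in df.columns if any(token in c for token in wanted)]
selected_plain = df[keep]
```

The list comprehension is more verbose but makes the "logical operations on each header" from the question explicit and is easy to extend with further conditions.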

All column names not listed by df.columns [duplicate]

This question already has answers here:
pandas groupby without turning grouped by column into index
(3 answers)
Closed 2 years ago.
I wanted to perform a groupby and agg function on my dataframe, so I ran the code below:
basic_df = df.groupby(['S2PName','S2PName-Category'], sort=False)['S2PGTotal'].agg([('totSale','sum'), ('count','size')])
basic_df.head(2)
My output:
                            totSale  count
S2PName S2PName-Category
IDLY    Food              598771.47  19749
DOSA    Food              567431.03  14611
Now I try to print the columns using basic_df.columns
My output:
Index(['totSale', 'count'], dtype='object')
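Why are the other two columns, "S2PName" and "S2PName-Category", not displayed? What do I need to do to display them as well?
Thanks!
After a groupby, the grouping keys become the index of the result rather than columns, which is why df.columns does not list them. Add as_index=False to the groupby call, or call reset_index() at the end: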
basic_df = df.groupby(['S2PName','S2PName-Category'], sort=False,as_index=False)['S2PGTotal'].agg([('totSale','sum'), ('count','size')])
#basic_df = df.groupby(['S2PName','S2PName-Category'], sort=False)['S2PGTotal'].agg([('totSale','sum'), ('count','size')]).reset_index()
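
A small sketch of the reset_index() variant on made-up rows (the values are assumptions; only the column names come from the question). It uses keyword-style named aggregation instead of the list of tuples in the answer, which is a substitution made here for readability:

```python
import pandas as pd

# Hypothetical sales rows.
df = pd.DataFrame({
    "S2PName": ["IDLY", "IDLY", "DOSA"],
    "S2PName-Category": ["Food", "Food", "Food"],
    "S2PGTotal": [10.0, 20.0, 5.0],
})

# The grouping keys end up in the index of the aggregated result.
grouped = (
    df.groupby(["S2PName", "S2PName-Category"], sort=False)["S2PGTotal"]
    .agg(totSale="sum", count="size")
)
print(list(grouped.columns))

# reset_index() moves the keys back into ordinary columns.
basic_df = grouped.reset_index()
print(list(basic_df.columns))
```

After reset_index(), basic_df.columns lists the two grouping columns alongside totSale and count.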