I need to reshape my dataframe so that it is wide instead of long, showing each date as column headings and two indices for state and variable name. I've tried using transpose(), melt(), stack(), unstack(), pivot() and set_index() unsuccessfully. Please advise!
The closest I've come is forecasts.set_index(['State', 'Revenue', 'YoY_Change']) or forecasts.set_index(['Date']).T to transpose the date column, but neither is the correct solution.
My data looks like this:
And I need it to look like this:
This is melt followed by pivot_table:
(df.melt(['State','Date'])
.pivot_table(index=['State', 'variable'], columns='Date', values='value', aggfunc='first')
)
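Here is a minimal, self-contained sketch of that approach; the sample data and the column names State, Date, Revenue, and YoY_Change are assumptions based on the question:

```python
import pandas as pd

# Hypothetical long-format data matching the question's columns
forecasts = pd.DataFrame({
    'State': ['CA', 'CA', 'NY', 'NY'],
    'Date': ['2021-01', '2021-02', '2021-01', '2021-02'],
    'Revenue': [100, 110, 200, 210],
    'YoY_Change': [0.05, 0.06, 0.02, 0.03],
})

# melt folds Revenue/YoY_Change into a 'variable'/'value' pair,
# then pivot_table spreads the dates across the columns
wide = (forecasts.melt(['State', 'Date'])
        .pivot_table(index=['State', 'variable'], columns='Date',
                     values='value', aggfunc='first'))
print(wide)
```

The result has a two-level index (State, variable) and one column per date.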
I am unable to sort the 'count' column from aggregated categorical variable
(df_perCity
 .groupby(['City','Complaint Type'])['Complaint Type']
 .agg(['count'])
 .apply(lambda x: x.sort_values(ascending=False))
 .sort_values(by='City'))
I have searched everywhere; the closest workaround seems to be adding a new column, but I would prefer not to, to keep the data intact.
If I unstack it, the 'count' column effectively moves to the column axis, and sorting by City or by Complaint Type no longer makes sense.
Thank you!
(df_perCity
 .groupby(['City','Complaint Type'])['Complaint Type']
 .agg(['count'])
 .sort_values(by=['City','count'], ascending=False))
Or you could do:
.sort_values(['City','count'], ascending=[True,False])
For those seeking the answer, here it is. Thanks to @Josh Friedlander for pointing out the mistake.
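A runnable sketch of that fix with made-up complaint data (the contents of df_perCity are invented; only the column names come from the question):

```python
import pandas as pd

# Hypothetical complaint log
df_perCity = pd.DataFrame({
    'City': ['Boston', 'Boston', 'Boston', 'Albany', 'Albany'],
    'Complaint Type': ['Noise', 'Noise', 'Parking', 'Noise', 'Heat'],
})

# Sort cities ascending, and counts descending within each city
counts = (df_perCity
          .groupby(['City', 'Complaint Type'])['Complaint Type']
          .agg(['count'])
          .sort_values(['City', 'count'], ascending=[True, False]))
print(counts)
```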
I am new to Pandas. Sorry for using images instead of tables here; I tried to follow the instructions for inserting a table, but I couldn't.
Pandas version: '1.3.2'
Given this dataframe with Close and Volume for stocks, I've managed to calculate OBV, using pandas, like this:
df.groupby('Ticker').apply(lambda x: (np.sign(x['Close'].diff().fillna(0)) * x['Volume']).cumsum())
The above gave me the correct values for OBV, as shown here.
However, I'm not able to assign the calculated values to a new column.
I would like to do something like this:
df['OBV'] = df.groupby('Ticker').apply(lambda x: (np.sign(x['Close'].diff().fillna(0)) * x['Volume']).cumsum())
But simply running the expression above throws this error:
ValueError: Columns must be same length as key
What am I missing?
How can I insert the calculated values into the original dataframe as a single column, df['OBV'] ?
I've checked this thread so I'm sure I should use apply.
This discussion looked promising, but it does not cover my case.
Use Series.droplevel to remove the first level of the MultiIndex:
df['OBV'] = df.groupby('Ticker').apply(lambda x: (np.sign(x['Close'].diff().fillna(0)) * x['Volume']).cumsum()).droplevel(0)
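A self-contained sketch of the whole flow with hypothetical two-ticker data, showing why droplevel(0) makes the assignment align:

```python
import numpy as np
import pandas as pd

# Hypothetical price data mirroring the question's layout
df = pd.DataFrame({
    'Ticker': ['AAA', 'AAA', 'AAA', 'BBB', 'BBB'],
    'Close':  [10.0, 11.0, 10.5, 20.0, 21.0],
    'Volume': [100, 150, 120, 200, 300],
})

obv = df.groupby('Ticker').apply(
    lambda x: (np.sign(x['Close'].diff().fillna(0)) * x['Volume']).cumsum())

# apply prepends the Ticker group key as an extra index level;
# dropping it leaves the original row index, so assignment aligns
df['OBV'] = obv.droplevel(0)
print(df)
```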
I'm working on a jupyter notebook, and I would like to get the average 'pcnt_change' based on 'day_of_week'. How do I do this?
A simple groupby call would do the trick here.
If df is the pandas dataframe:
df.groupby('day_of_week').mean()
would return a dataframe with the average of every numeric column, indexed by day_of_week. If you want only certain column(s) returned, select just those columns before the groupby call, e.g.:
df[['open_price', 'high_price', 'day_of_week']].groupby('day_of_week').mean()
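A minimal runnable sketch of the groupby-mean idea; the column names follow the question, but the data itself is invented:

```python
import pandas as pd

# Hypothetical price data
df = pd.DataFrame({
    'day_of_week': ['Mon', 'Mon', 'Tue', 'Tue'],
    'open_price':  [10.0, 12.0, 20.0, 22.0],
    'high_price':  [11.0, 13.0, 21.0, 23.0],
    'pcnt_change': [0.01, 0.03, 0.02, 0.04],
})

# Average pcnt_change per day_of_week
avg = df.groupby('day_of_week')['pcnt_change'].mean()
print(avg)
```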
I have been playing with aggregation in pandas dataframe. Considering the following dataframe:
df=pd.DataFrame({'a':[1,2,3,4,5,6,7,8],
'batch':['q','q','q','w','w','w','w','e'],
'c':[4,1,3,4,5,1,3,2]})
I have to do aggregation on the batch column with mean for column a and min for column c.
I used the following method to do the aggregation:
agg_dict = {'a':{'a':'mean'},'c':{'c':'min'}}
aggregated_df = df.groupby("batch").agg(agg_dict)
The problem is that I want the final data frame to have the same columns as the original data frame with the slight difference of having the aggregated values present in each of the columns.
The result of the above aggregation is a dataframe with multi-index columns, and I am not sure how to convert it to a single-index dataframe.
I followed the link Reverting from multiindex to single index dataframe in pandas, but it didn't work; the final output was still multi-indexed.
It would be great if someone could help.
You can try the following code: df.groupby('batch').aggregate({'c': 'min', 'a': 'mean'})
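Using the question's own dataframe, a flat column-to-function dict (rather than the nested dict in the question) yields single-level columns directly; here is a sketch:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7, 8],
                   'batch': ['q', 'q', 'q', 'w', 'w', 'w', 'w', 'e'],
                   'c': [4, 1, 3, 4, 5, 1, 3, 2]})

# A flat {column: function} dict keeps the columns single-level
aggregated = df.groupby('batch').agg({'a': 'mean', 'c': 'min'})
print(aggregated)
```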
I can't figure out what is wrong with the way I use the df.corr() function.
For a DF with 2 columns it returns only a 1×1 resulting DF.
In:
merged_df[['Citable per Capita','Citations']].corr()
Out:
one by one resulting DF
What could be the problem here? I expected the result to have as many rows and columns as there were columns in the original DF.
I found the problem: the values of the first column had the wrong dtype, so corr() silently skipped that column.
To change type of all the columns, use:
df=df.apply(lambda x: pd.to_numeric(x, errors='ignore'))
Note that apply returns a new DataFrame rather than modifying df in place, which is why the reassignment is necessary here.
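A runnable sketch of the idea; because errors='ignore' is deprecated in newer pandas, the same keep-the-column-on-failure behavior is spelled out with try/except here (the sample data is invented):

```python
import pandas as pd

# Hypothetical frame where one column was read in as strings,
# which is why corr() returned a 1x1 matrix
merged_df = pd.DataFrame({
    'Citable per Capita': ['1.0', '2.0', '3.0', '4.0'],
    'Citations': [10, 20, 30, 40],
})

def to_numeric_or_keep(col):
    # Equivalent to pd.to_numeric(col, errors='ignore'):
    # convert if possible, otherwise return the column unchanged
    try:
        return pd.to_numeric(col)
    except (ValueError, TypeError):
        return col

merged_df = merged_df.apply(to_numeric_or_keep)
corr = merged_df[['Citable per Capita', 'Citations']].corr()
print(corr)
```

With both columns numeric, corr() returns the full 2×2 matrix.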