I have this original DataFrame with a two-level index.
I would like to reorder the levels so that 'delta' comes first, then 'time', so I used:
df2.reorder_levels(['delta', 'time'])
But this results in a DataFrame where the delta values are repeated for every time instead of being grouped together. Any ideas?
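One likely explanation is that reorder_levels only changes the order of the levels and leaves the rows where they were, so equal 'delta' values are not adjacent and cannot be displayed as one group. Sorting the index afterwards should bring them together. Here is a minimal sketch with a made-up df2 (the 'value' column and the sample labels are my assumptions, not the asker's data):

```python
import pandas as pd

# Hypothetical stand-in for the asker's df2: index levels 'time' and 'delta'.
idx = pd.MultiIndex.from_product([[0, 1], ["x", "y"]], names=["time", "delta"])
df2 = pd.DataFrame({"value": [10, 20, 30, 40]}, index=idx)

# reorder_levels swaps the level order but keeps the original row order,
# so the delta labels alternate: x, y, x, y.
reordered = df2.reorder_levels(["delta", "time"])

# sort_index moves rows with the same delta next to each other.
grouped = reordered.sort_index()
print(grouped)
```

If only the outer level should be sorted, sort_index(level='delta', sort_remaining=False) is an alternative worth trying.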
What is the difference between the following two ways of getting rows from a column of df, a pandas DataFrame:
df['col_name'][x:y]
vs.
df.loc[x:y, 'col_name']
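A small experiment may help show the difference. In this sketch I assume a default integer index and a made-up column; with integer bounds, the first form slices the Series by position and excludes the end, while .loc slices by label and includes the end:

```python
import pandas as pd

# Hypothetical frame with the default integer index 0..4.
df = pd.DataFrame({"col_name": [10, 20, 30, 40, 50]})

# Two steps: select the column, then slice the Series by position (end excluded).
chained = df["col_name"][1:3]

# One step: slice by index label (end included) and pick the column.
via_loc = df.loc[1:3, "col_name"]

print(chained.tolist())
print(via_loc.tolist())
```

The first form is also chained indexing, so assigning through it may not modify df reliably; a single .loc call is the safer choice when writing values.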
I'm trying to pivot row data in a DataFrame so that each unique value becomes its own column. I have a vertical, top-down report and need to turn it into a horizontal one. I'm using the Alpaca API, which returns a DataFrame, but a new version produces a different DataFrame structure. Columns have been removed and I need them back, because other code that relies on the old-style DataFrames would be hard to rework.
The original DataFrame was horizontal, with one top-level column per stock ticker and four sub-columns under each ticker in the header.
Now it produces a vertical DataFrame: the tickers that used to be separate columns are row values in a single column, symbol.
This is how I build the DataFrame with the API:
df_ticker = api.get_bars(
    tickers,
    timeframe,
    start=start_date,
    end=end_date,
    limit=1000
).df  # convert the result to a DataFrame
I tried to pivot the values of the symbol column with pivot_table() so that each stock gets its own columns again, as in the original DataFrame, but the result is not what I need.
df_ticker_fixed = df_ticker.reset_index()
df_ticker_fixed = df_ticker_fixed.pivot_table(
    index='timestamp',
    columns='symbol',
    values=['open', 'high', 'low', 'close', 'volume']
)
df_ticker_fixed.head()
The result has the wrong column headers.
What I basically need is to recreate that double-header column format, where each top-level column has sub-columns underneath it. I don't know what it's called when a report has two layers of columns.
This is called a MultiIndex. After your pivot, use swaplevel to move the symbol to the top level, optionally followed by sort_index to group each symbol's columns together:
df_ticker_fixed = (df_ticker
    .reset_index()
    .pivot_table(
        index='timestamp',
        columns='symbol',
        values=['open', 'high', 'low', 'close', 'volume']
    )
    .swaplevel(axis=1)
    .sort_index(level='symbol', axis=1, sort_remaining=False)
)
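To try this without an API connection, here is a sketch on made-up data. The tickers, dates and prices are invented, and I am assuming the new-style frame is indexed by timestamp with a symbol column, as described in the question:

```python
import pandas as pd

# Hypothetical "vertical" frame: one row per (timestamp, symbol).
df_ticker = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2024-01-02", "2024-01-02", "2024-01-03", "2024-01-03"]),
    "symbol": ["AAPL", "MSFT", "AAPL", "MSFT"],
    "open": [185.0, 370.0, 184.0, 369.0],
    "high": [186.0, 372.0, 185.5, 371.0],
    "low": [183.0, 368.0, 183.5, 367.0],
    "close": [185.5, 371.0, 184.5, 370.0],
    "volume": [1000, 2000, 1100, 2100],
}).set_index("timestamp")

wide = (df_ticker
    .reset_index()
    .pivot_table(
        index="timestamp",
        columns="symbol",
        values=["open", "high", "low", "close", "volume"],
    )
    .swaplevel(axis=1)  # put the symbol above the price fields
    .sort_index(level="symbol", axis=1, sort_remaining=False)
)

# Selecting a ticker on the top level returns its sub-columns.
print(wide["AAPL"])
```

Note that pivot_table aggregates with the mean by default, which is harmless here only because each timestamp and symbol pair occurs once.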
I have a very large df which I am trying to unstack (rows to columns), but I get the following error: 'Unstacked DataFrame is too big, causing int32 overflow'
Instead, I am trying to split my df into n roughly equal parts so that I can unstack each part and then concatenate the results.
My df has a two-level MultiIndex, and I would like the split to follow the first index level (level=0). I have tried np.array_split(), but I am unsure whether it splits by number of rows or by index values.
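As far as I know, np.array_split on a DataFrame cuts by row position, so a level-0 group can end up divided between two parts. One way around that is to split the unique level-0 values instead and select the rows for each part. This is a sketch on a small made-up frame (the level names 'key' and 'sub' and the 'value' column are assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical two-level frame: 6 level-0 keys with 3 sub-rows each.
idx = pd.MultiIndex.from_product(
    [list("abcdef"), [1, 2, 3]], names=["key", "sub"])
df = pd.DataFrame({"value": range(len(idx))}, index=idx)

n_parts = 3
level0 = df.index.get_level_values(0)

# Split the unique level-0 labels, not the rows, so no group is cut in half.
key_parts = np.array_split(level0.unique().to_numpy(), n_parts)
chunks = [df[level0.isin(part)] for part in key_parts]

# Unstack each chunk separately, then stack the pieces back together.
wide = pd.concat([chunk["value"].unstack("sub") for chunk in chunks])
print(wide)
```

Each chunk holds complete level-0 groups, so the concatenated result should match what a single unstack would have produced, provided the sub-level values are the same across chunks.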
Below I have a pandas DataFrame with a MultiIndex: one level is "Address" and the second level is (score, num, axis). In total there is only one column.
The output I am looking for has these as separate columns under the 'Address' column index.
print(df.columns.tolist())
The question is not entirely clear, but from what I understood, try this:
df.reset_index().set_index('Address')
I initially had 100k rows in my dataset. I read the CSV with pandas into a DataFrame called data. I tried to select a subset of 51 rows using .loc. My index labels are the numeric values 0, 1, 2, 3, etc. I used this command:
data = data.loc['0':'50']
But the results were weird: it took all the rows from 0 to 49999. It looks like it keeps taking rows until it reaches an index value that starts with 50.
Similarly, I tried this command: new_data = data.loc['0':'19']
and the result was all the rows from 0 to 18999.
Could this be a bug in pandas?
You want to use .iloc in place of .loc, since you are selecting rows from the DataFrame by integer position.
For example:
data.iloc[:50,:]
Keep in mind that your index labels are numeric, not strings, so slicing with strings (as you did in your question) makes pandas attempt string-wise comparisons against them.
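One detail to watch when switching between the two: .loc slices by label and includes the end label, while .iloc slices by position and excludes the end. A quick sketch on a made-up 100-row frame with the default integer index (the 'value' column is an assumption):

```python
import pandas as pd

# Hypothetical frame with integer index labels 0..99.
data = pd.DataFrame({"value": range(100)})

# Integer labels with .loc: both ends are included, so labels 0..50.
by_label = data.loc[0:50]

# Positions with .iloc: the end is excluded, so :51 is needed for 51 rows.
by_position = data.iloc[:51]
first_fifty = data.iloc[:50]

print(len(by_label), len(by_position), len(first_fifty))
```

So for the 51 rows in the question, either data.loc[0:50] with integer (not string) bounds or data.iloc[:51] should do.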