Add a title to a dataframe - pandas

I originally had a dataframe df1,
Close
ticker AAPL AMD BIDU GOOGL IXIC
Date
2011-06-01 12.339643 8.370000 132.470001 263.063049 2769.189941
2011-06-02 12.360714 8.240000 138.490005 264.294281 2773.310059
2011-06-03 12.265714 7.970000 133.210007 261.801788 2732.780029
2011-06-06 12.072857 7.800000 126.970001 260.790802 2702.560059
2011-06-07 11.858571 7.710000 124.820000 259.774780 2701.560059
... ... ... ... ... ...
2021-05-24 127.099998 77.440002 188.960007 2361.040039 13661.169922
2021-05-25 126.900002 77.860001 192.770004 2362.870117 13657.169922
2021-05-26 126.849998 78.339996 194.880005 2380.310059 13738.000000
2021-05-27 125.279999 78.419998 194.809998 2362.679932 13736.280273
2021-05-28 124.610001 80.080002 196.270004 2356.850098 13748.740234
For some calculations, I transformed the columns and created df2, which no longer has the top-level Close label:
ticker AAPL AMD BIDU GOOGL IXIC
Date
2011-08-25 0.760119 0.028203 0.621415 0.036067 0.993046
2011-09-23 0.648490 0.216017 0.267167 0.699657 0.562897
2011-10-21 0.442864 0.326310 0.197121 0.399332 0.048258
2011-11-18 0.333015 0.062089 0.164588 0.373293 0.015258
2011-12-19 0.101208 0.389120 0.218844 0.094759 0.116979
... ... ... ... ... ...
2021-01-12 0.437177 0.012871 0.997870 0.075802 0.137392
2021-02-10 0.064343 0.178901 0.522356 0.625447 0.320007
2021-03-11 0.135033 0.300345 0.630085 0.253857 0.466884
2021-04-09 0.358583 0.484004 0.295894 0.215424 0.454395
2021-05-07 0.124987 0.311816 0.999940 0.232552 0.281189
Now I am struggling to add a name to the dataframe again, say ret, because I would like to plot a histogram of each column with titles like ('ret', 'AAPL').
This may be a bit confusing, but hopefully I have explained the question clearly. Thanks for any help.

You can use the pd.MultiIndex.from_product() method:
# Only needed if 'Date' is still a regular column; skip this line if it is already the index
df2 = df2.set_index('Date')
df2.columns = pd.MultiIndex.from_product([['ret'], df2.columns])
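As a minimal sketch of what the answer above does, the example below builds a tiny stand-in for df2 (the two rows and two tickers are copied from the question; the variable names labelled and alt are made up for illustration) and adds 'ret' as a top column level. It also shows pd.concat with a dict key, which should give the same column labels.

```python
import pandas as pd

# Small stand-in for df2: 'Date' is already the index, columns are tickers.
df2 = pd.DataFrame(
    {"AAPL": [0.760119, 0.648490], "AMD": [0.028203, 0.216017]},
    index=pd.to_datetime(["2011-08-25", "2011-09-23"]),
)
df2.index.name = "Date"
df2.columns.name = "ticker"

# Approach from the answer: build two-level columns ('ret', <ticker>).
labelled = df2.copy()
labelled.columns = pd.MultiIndex.from_product([["ret"], df2.columns])

# Alternative: concat with a dict key puts the key on the top column level.
alt = pd.concat({"ret": df2}, axis=1)

print(list(labelled.columns))
```

Each column label is now a tuple such as ('ret', 'AAPL'), which can be used directly as a plot title when looping over the columns.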

Related

Slicing pandas dataframe using index values

I'm trying to select the rows whose index values are congruent to 1 mod 24. How can I best do this?
This is my dataframe:
ticker date open high low close volume momo nextDayLogReturn
335582 ETH/USD 2021-11-05 00:00:00+00:00 4535.3 4539.3 4495.8 4507.1 9.938260e+06 9.094134 -9.160928
186854 BTC/USD 2021-11-05 00:00:00+00:00 61437.0 61528.0 61111.0 61170.0 1.191233e+07 10.640513 -10.825763
186853 BTC/USD 2021-11-04 23:00:00+00:00 61190.0 61541.0 61130.0 61437.0 1.395133e+07 10.645757 -10.842114
335581 ETH/USD 2021-11-04 23:00:00+00:00 4518.8 4539.4 4513.6 4535.3 1.296507e+07 9.087243 -9.139240
186852 BTC/USD 2021-11-04 22:00:00+00:00 61393.0 61426.0 61044.0 61190.0 1.360557e+07 10.639201 -10.812127
This was my attempt:
newindex = []
for i in range(0,df2.shape[0]+1):
    if(i%24 ==1):
        newindex.append(i)
df2.iloc[[newindex]]
Essentially, I need to select the rows using a boolean, but I'm not sure how to do it.
Many thanks
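A minimal sketch of the boolean selection, on made-up data (the frame, its descending integer index and the close column are assumptions chosen to resemble the question). It separates the two possible readings: rows whose position is congruent to 1 mod 24, which is what the loop attempts, and rows whose index label is congruent to 1 mod 24, which is what the text asks for.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 100 rows with descending integer labels 1099..1000.
df = pd.DataFrame({"close": np.arange(100.0)}, index=range(1099, 999, -1))

# By position: rows 1, 25, 49, ... (a step slice, no loop needed).
by_position = df.iloc[1::24]

# The same selection written as a boolean mask over positions.
position_mask = np.arange(len(df)) % 24 == 1
by_position_mask = df[position_mask]

# By label: rows whose index value is congruent to 1 mod 24.
label_mask = df.index.to_numpy() % 24 == 1
by_label = df[label_mask]
```

The two readings only coincide when the index is the default 0, 1, 2, ... range, so it is worth deciding which one is meant before picking iloc or a label mask.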

Pandas - Take value n month before

I am working with a datetime index. Is there any way to get the value from n months earlier?
For example, the data looks like this:
import numpy as np
import pandas as pd

dft = pd.DataFrame(
    np.random.randn(100, 1),
    columns=["A"],
    index=pd.date_range("20130101", periods=100, freq="M"),
)
dft
Then:
For every July, we take the value from December of the previous year and apply it through June of the next year.
For the remaining months (from August of this year to June of the next year), we take the value of the previous month.
For example, the value from Jul-2000 to Jun-2001 will stay the same and equal the value of Dec-1999.
What I've been trying to do is:
dft['B'] = np.where(dft.index.month == 7,
dft['A'].shift(7, freq='M') ,
dft['A'].shift(1, freq='M'))
However, the result is simply a copy of column A, and I don't know why. When I tried a single line of code:
dft['C'] = dft['A'].shift(7, freq='M')
everything was shifted as expected. I don't know what the issue is here.
The issue is index alignment. A shift with freq moves the index labels rather than the values, but numpy.where converts its inputs to arrays, so the index is lost.
Use pandas' where or mask instead; everything remains a Series and the index is preserved:
dft['B'] = (dft['A'].shift(1, freq='M')
            .mask(dft.index.month == 7, dft['A'].shift(7, freq='M'))
            )
output:
A B
2013-01-31 -2.202668 NaN
2013-02-28 0.878792 -2.202668
2013-03-31 -0.982540 0.878792
2013-04-30 0.119029 -0.982540
2013-05-31 -0.119644 0.119029
2013-06-30 -1.038124 -0.119644
2013-07-31 0.177794 -1.038124
2013-08-31 0.206593 -2.202668 <- correct
2013-09-30 0.188426 0.206593
2013-10-31 0.764086 0.188426
... ... ...
2020-12-31 1.382249 -1.413214
2021-01-31 -0.303696 1.382249
2021-02-28 -1.622287 -0.303696
2021-03-31 -0.763898 -1.622287
2021-04-30 0.420844 -0.763898
[100 rows x 2 columns]
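The alignment point can be checked on a small series. The sketch below is an illustration under assumptions: it uses 12 months of sequential values instead of random data, and it passes a pd.offsets.MonthEnd() object rather than the "M" alias so that it does not depend on how a given pandas version spells that alias.

```python
import numpy as np
import pandas as pd

month_end = pd.offsets.MonthEnd()  # offset object instead of the "M" alias
s = pd.Series(
    np.arange(12.0),
    index=pd.date_range("2013-01-31", periods=12, freq=month_end),
    name="A",
)

shifted = s.shift(7, freq=month_end)

# Shifting with freq relabels the rows; the values keep their order.
same_values = np.array_equal(shifted.to_numpy(), s.to_numpy())
first_label = shifted.index[0]

# np.where only sees raw values, so both branches are just A again.
via_numpy = np.where(s.index.month == 7, shifted, s.shift(1, freq=month_end))

# Assigning the shifted Series to a frame aligns on the labels instead.
frame = s.to_frame()
frame["C"] = shifted
```

Here via_numpy has the same values as A, which matches the "copy of column A" symptom, while frame["C"] starts with missing values because the January value only reappears under the August label.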

Pandas dataframe.resample multiple columns: max on one column, select corresponding values on another, and mean on others

I have a dataframe with several variables:
tagdata.head()
Out[128]:
Depth Temperature ... Ay Az
Time ...
2017-09-25 21:46:05 23.0 7.70 ... 0.054688 -0.691406
2017-09-25 21:46:10 24.5 6.15 ... 0.148438 -0.742188
2017-09-25 21:46:15 27.5 4.10 ... -0.078125 -0.875000
2017-09-25 21:46:20 29.0 2.55 ... 0.144531 -0.664062
2017-09-25 21:46:25 30.0 2.45 ... 0.343750 -0.886719
[5 rows x 6 columns]
I want to resample every 24H and select 1) the maximum Depth within each 24H window, 2) the value of Temperature that corresponds to that maximum depth, and 3) the 24H mean for the last two columns, Ay and Az.
So far I have used the code below and it works, but I would like to combine the last two lines into one if possible.
Thanks!
tagdata_dailydepthmax = tagdata.resample('24H').apply(lambda tagdata: tagdata.loc[tagdata.Depth.idxmax()])
tagdata_dailydepthmax.Ay = tagdata['Ay'].resample('24H').mean()
tagdata_dailydepthmax.Az = tagdata['Az'].resample('24H').mean()
You can try this. It calculates the mean for multiple columns at once:
tagdata_dailydepthmax[['Ay','Az']] = tagdata[['Ay','Az']].resample('24H').mean()

Best way to filter out data from specific month in pandas [duplicate]

This question already has answers here:
How to filter a dataframe of dates by a particular month/day?
(3 answers)
Closed 1 year ago.
I have financial data:
Open High ... Adj Close Volume
Date ...
2016-11-17 60.410000 60.950001 ... 56.484898 32132700
2016-11-18 60.779999 61.139999 ... 56.214767 27686300
2016-11-21 60.500000 60.970001 ... 56.689823 19652600
2016-11-22 60.980000 61.259998 ... 56.932003 23206700
2016-11-23 61.009998 61.099998 ... 56.261349 21848900
... ... ... ... ...
2021-11-10 334.570007 334.630005 ... 330.799988 25500900
2021-11-11 331.250000 333.769989 ... 332.429993 16849800
2021-11-12 333.920013 337.230011 ... 336.720001 23822000
2021-11-15 337.540009 337.880005 ... 336.070007 16723000
2021-11-16 335.679993 340.670013 ... 339.510010 20746300
I want to filter out all the examples in a specific month, e.g., November. To clarify, I want data from each November, regardless of the year.
I guess I could reset the index and then extract the month somehow.
Is there an easier way, similar to how between_time lets you filter intra-day time intervals?
Assuming you have a DatetimeIndex, use its month attribute (the dt accessor is only needed when the dates are in a column):
df_nov = df[df.index.month == 11]
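A small sketch of both cases, using made-up dates and a single Close column (all values here are assumptions for illustration): filtering on a DatetimeIndex directly, and filtering on a datetime column through the dt accessor.

```python
import pandas as pd

# Hypothetical prices indexed by date, spanning several years.
df = pd.DataFrame(
    {"Close": [1.0, 2.0, 3.0, 4.0]},
    index=pd.to_datetime(["2016-11-17", "2016-12-01", "2017-11-03", "2018-01-15"]),
)
df.index.name = "Date"

# Dates in the index: DatetimeIndex exposes .month directly.
df_nov = df[df.index.month == 11]

# Dates in a column: go through the .dt accessor of the Series.
flat = df.reset_index()
flat_nov = flat[flat["Date"].dt.month == 11]
```

Both selections keep every November row regardless of the year.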

What are the intervals(?) in the end of yfinance timeindex?

Downloading data from Yahoo Finance via the yfinance library returns a dataframe with a time index:
df = yf.download("MSFT ORCL", interval="30m", period="20D")
The index seems a bit unusual.
Adj Close ... Volume
MSFT ORCL ... MSFT ORCL
Datetime ...
2021-10-25 09:30:00-04:00 307.415009 98.349998 ... 3007643 583876
2021-10-25 10:00:00-04:00 308.109985 97.879997 ... 1295458 593084
2021-10-25 10:30:00-04:00 307.980011 98.120003 ... 962431 268932
2021-10-25 11:00:00-04:00 308.209991 98.184998 ... 816024 204434
2021-10-25 11:30:00-04:00 308.065002 98.320000 ... 804070 145182
... ... ... ... ...
2021-11-19 13:30:00-05:00 343.250000 94.360001 ... 758614 261011
2021-11-19 14:00:00-05:00 342.894989 94.004997 ... 587500 270425
2021-11-19 14:30:00-05:00 342.932007 94.065002 ... 590121 296746
2021-11-19 15:00:00-05:00 343.296814 94.044998 ... 832597 311972
2021-11-19 15:30:00-05:00 343.029999 93.970001 ... 2250862 1012153
What is the -05:00 or -04:00 in the index? I know I can't compare it to a normal timestamp with the same value. For example,
test = pd.Timestamp(2021, 10, 25, 9, 30)
is not equal to the index entry in the data frame with the value "2021-10-25 09:30:00-04:00".
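The -04:00 and -05:00 suffixes look like UTC offsets on a timezone-aware index, which would explain why a naive timestamp does not match. The sketch below assumes the index is in the America/New_York zone (the question does not state the zone) and rebuilds two of the index values by hand instead of calling yfinance.

```python
import pandas as pd

# Two of the index values from the question, rebuilt as a tz-aware index.
# America/New_York is an assumption about the data's timezone.
idx = pd.DatetimeIndex(["2021-10-25 09:30", "2021-11-19 15:30"]).tz_localize(
    "America/New_York"
)

print(idx[0])  # offset -04:00: daylight saving time still in effect
print(idx[1])  # offset -05:00: after the switch back to standard time

naive = pd.Timestamp(2021, 10, 25, 9, 30)

# Option 1: give the naive timestamp the same timezone before comparing.
aware = naive.tz_localize("America/New_York")
matches_aware = aware == idx[0]

# Option 2: drop the timezone from the index and compare wall-clock times.
matches_naive = idx.tz_localize(None)[0] == naive
```

Which option fits depends on whether the offsets matter downstream; dropping the timezone keeps the local wall-clock times but loses the information about the daylight saving change.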