Downloading data from Yahoo Finance with the yfinance library returns a DataFrame with a datetime index:
df = yf.download("MSFT ORCL", interval="30m", period="20D")
The index seems a bit unusual.
Adj Close ... Volume
MSFT ORCL ... MSFT ORCL
Datetime ...
2021-10-25 09:30:00-04:00 307.415009 98.349998 ... 3007643 583876
2021-10-25 10:00:00-04:00 308.109985 97.879997 ... 1295458 593084
2021-10-25 10:30:00-04:00 307.980011 98.120003 ... 962431 268932
2021-10-25 11:00:00-04:00 308.209991 98.184998 ... 816024 204434
2021-10-25 11:30:00-04:00 308.065002 98.320000 ... 804070 145182
... ... ... ... ...
2021-11-19 13:30:00-05:00 343.250000 94.360001 ... 758614 261011
2021-11-19 14:00:00-05:00 342.894989 94.004997 ... 587500 270425
2021-11-19 14:30:00-05:00 342.932007 94.065002 ... 590121 296746
2021-11-19 15:00:00-05:00 343.296814 94.044998 ... 832597 311972
2021-11-19 15:30:00-05:00 343.029999 93.970001 ... 2250862 1012153
What is the -05:00 or -04:00 in the index? I know I can't compare an index value to a normal timestamp with the same date and time:
test = pd.Timestamp(2021, 10, 25, 9,30)
This is not equal to the index entry in the data frame with the value "2021-10-25 09:30:00-04:00".
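For what it's worth, the -04:00 and -05:00 suffixes are UTC offsets: the index is timezone-aware, and US Eastern time moves from UTC-4 (daylight time) to UTC-5 (standard time) in early November. A minimal sketch of the comparison, assuming the index is in the America/New_York zone (the two-row index below is a hand-built stand-in, not actual yfinance output):

```python
import pandas as pd

# Stand-in for the downloaded index: two wall-clock times localized to
# US Eastern time (assumed zone; check df.index.tz on the real data).
idx = pd.DatetimeIndex(
    ["2021-10-25 09:30:00", "2021-11-19 09:30:00"]
).tz_localize("America/New_York")

# UTC offset of each entry in hours: daylight time first, then standard time.
offset_hours = [ts.utcoffset().total_seconds() / 3600 for ts in idx]

# Option 1: build a timezone-aware timestamp to compare against the index.
aware = pd.Timestamp("2021-10-25 09:30", tz="America/New_York")
found_aware = aware in idx

# Option 2: drop the timezone from the index (keeps the local wall-clock
# time) and compare against a naive timestamp.
naive = pd.Timestamp(2021, 10, 25, 9, 30)
naive_idx = idx.tz_localize(None)
found_naive = naive in naive_idx

print(offset_hours, found_aware, found_naive)
```

Which option fits depends on whether the rest of the code needs the timezone information; dropping it is simpler but loses the daylight-saving distinction.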
I'm trying to select the rows whose index values are congruent to 1 mod 24. How can I best do this?
This is my dataframe:
ticker date open high low close volume momo nextDayLogReturn
335582 ETH/USD 2021-11-05 00:00:00+00:00 4535.3 4539.3 4495.8 4507.1 9.938260e+06 9.094134 -9.160928
186854 BTC/USD 2021-11-05 00:00:00+00:00 61437.0 61528.0 61111.0 61170.0 1.191233e+07 10.640513 -10.825763
186853 BTC/USD 2021-11-04 23:00:00+00:00 61190.0 61541.0 61130.0 61437.0 1.395133e+07 10.645757 -10.842114
335581 ETH/USD 2021-11-04 23:00:00+00:00 4518.8 4539.4 4513.6 4535.3 1.296507e+07 9.087243 -9.139240
186852 BTC/USD 2021-11-04 22:00:00+00:00 61393.0 61426.0 61044.0 61190.0 1.360557e+07 10.639201 -10.812127
This was my attempt:
newindex = []
for i in range(0,df2.shape[0]+1):
    if(i%24 ==1):
        newindex.append(i)
df2.iloc[[newindex]]
Essentially, I need to select the rows using a boolean mask, but I'm not sure how to do it.
Many thanks
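One possible way to do this, as a sketch: the attempt above loops over row positions, so the first two variants select by position, and the third selects by index label in case that is what is meant. The frame below is made-up data (a single `close` column with integer labels 1000..1099), not the frame from the question.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for df2: 100 rows with non-contiguous-looking integer labels.
df2 = pd.DataFrame({"close": np.arange(100.0)}, index=np.arange(1000, 1100))

# Boolean mask on row POSITION: positions 1, 25, 49, ...
position_mask = np.arange(len(df2)) % 24 == 1
by_mask = df2[position_mask]

# Same selection with a positional slice: start at 1, step 24.
by_slice = df2.iloc[1::24]

# Boolean mask on the index LABELS instead (needs an integer index).
by_label = df2[df2.index % 24 == 1]

print(by_mask)
print(by_label)
```

Note that `iloc` takes a flat list of positions, so the original attempt would need `df2.iloc[newindex]` rather than `df2.iloc[[newindex]]`, and the loop should stop at `df2.shape[0]` rather than one past it.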
I am working with datetime data. Is there any way to get the value from n months earlier?
For example, the data look like:
dft = pd.DataFrame(
np.random.randn(100, 1),
columns=["A"],
index=pd.date_range("20130101", periods=100, freq="M"),
)
dft
Then:
For July of each year, we take the value from December of the previous year and apply it through June of the next year.
For the remaining months (from August of this year to June of the next year), we take the value of the previous month.
For example, the value from Jul-2000 to Jun-2001 will be the same and equal to the value of Dec-1999.
What I've been trying to do is:
dft['B'] = np.where(dft.index.month == 7,
dft['A'].shift(7, freq='M') ,
dft['A'].shift(1, freq='M'))
However, the result is simply a copy of column A, and I don't know why. But when I tried a single line of code:
dft['C'] = dft['A'].shift(7, freq='M')
then everything is shifted as expected. I don't know what the issue is here.
The issue is index alignment. The shift you performed acts on the index, but numpy.where converts its inputs to plain arrays, so the index is lost.
Use pandas' where or mask instead; everything remains a Series and the index is preserved:
dft['B'] = (dft['A'].shift(1, freq='M')
.mask(dft.index.month == 7, dft['A'].shift(7, freq='M'))
)
output:
A B
2013-01-31 -2.202668 NaN
2013-02-28 0.878792 -2.202668
2013-03-31 -0.982540 0.878792
2013-04-30 0.119029 -0.982540
2013-05-31 -0.119644 0.119029
2013-06-30 -1.038124 -0.119644
2013-07-31 0.177794 -1.038124
2013-08-31 0.206593 -2.202668 <- correct
2013-09-30 0.188426 0.206593
2013-10-31 0.764086 0.188426
... ... ...
2020-12-31 1.382249 -1.413214
2021-01-31 -0.303696 1.382249
2021-02-28 -1.622287 -0.303696
2021-03-31 -0.763898 -1.622287
2021-04-30 0.420844 -0.763898
[100 rows x 2 columns]
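To make the alignment point concrete, here is a small sketch on a 12-row toy series (not the 100-row frame above). It passes `pd.offsets.MonthEnd()` as the frequency instead of the "M" alias, on the assumption that this behaves the same while avoiding alias differences between pandas versions. A shift with `freq` moves only the labels and leaves the underlying values in the same order, which is why numpy.where, working on raw arrays, just reproduces column A.

```python
import numpy as np
import pandas as pd

month_end = pd.offsets.MonthEnd()
idx = pd.date_range("2013-01-31", periods=12, freq=month_end)
s = pd.Series(np.arange(12.0), index=idx)

# Shifting with freq relabels the rows; the values array is untouched.
shifted = s.shift(7, freq=month_end)
same_values = np.array_equal(shifted.to_numpy(), s.to_numpy())
first_label = shifted.index[0]

# Aligning back on the original index is what actually moves the data:
# the first seven months have no source value and become NaN.
aligned = shifted.reindex(s.index)

print(same_values, first_label)
print(aligned)
```

Assigning `shifted` to a DataFrame column performs this alignment implicitly, which explains why the single-line version in the question looked correct.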
I have financial data:
Open High ... Adj Close Volume
Date ...
2016-11-17 60.410000 60.950001 ... 56.484898 32132700
2016-11-18 60.779999 61.139999 ... 56.214767 27686300
2016-11-21 60.500000 60.970001 ... 56.689823 19652600
2016-11-22 60.980000 61.259998 ... 56.932003 23206700
2016-11-23 61.009998 61.099998 ... 56.261349 21848900
... ... ... ... ...
2021-11-10 334.570007 334.630005 ... 330.799988 25500900
2021-11-11 331.250000 333.769989 ... 332.429993 16849800
2021-11-12 333.920013 337.230011 ... 336.720001 23822000
2021-11-15 337.540009 337.880005 ... 336.070007 16723000
2021-11-16 335.679993 340.670013 ... 339.510010 20746300
I want to select all the rows from a specific month, e.g., November. To clarify, I want the data from every November, regardless of the year.
I guess I could reset the index and then extract the month somehow.
Is there an easier way, similar to how between_time lets you select intra-day time intervals?
Assuming you have a DatetimeIndex, use its month attribute directly (the dt accessor is only needed for a datetime column, i.e. a Series):
df_nov = df[df.index.month == 11]
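A small sketch of both cases, using a made-up five-row frame rather than the data from the question: the index variant reads `.month` straight off the DatetimeIndex, while the column variant (after `reset_index`) goes through `.dt`.

```python
import pandas as pd

# Made-up sample with dates from several years and months.
idx = pd.to_datetime(
    ["2016-11-17", "2016-12-01", "2020-11-10", "2021-10-29", "2021-11-16"]
)
df = pd.DataFrame({"Close": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=idx)
df.index.name = "Date"

# DatetimeIndex: the month attribute is available directly.
nov_from_index = df[df.index.month == 11]

# Datetime column: the same attribute lives behind the .dt accessor.
flat = df.reset_index()
nov_from_column = flat[flat["Date"].dt.month == 11]

print(nov_from_index)
```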
I originally had a dataframe df1,
Close
ticker AAPL AMD BIDU GOOGL IXIC
Date
2011-06-01 12.339643 8.370000 132.470001 263.063049 2769.189941
2011-06-02 12.360714 8.240000 138.490005 264.294281 2773.310059
2011-06-03 12.265714 7.970000 133.210007 261.801788 2732.780029
2011-06-06 12.072857 7.800000 126.970001 260.790802 2702.560059
2011-06-07 11.858571 7.710000 124.820000 259.774780 2701.560059
... ... ... ... ... ...
2021-05-24 127.099998 77.440002 188.960007 2361.040039 13661.169922
2021-05-25 126.900002 77.860001 192.770004 2362.870117 13657.169922
2021-05-26 126.849998 78.339996 194.880005 2380.310059 13738.000000
2021-05-27 125.279999 78.419998 194.809998 2362.679932 13736.280273
2021-05-28 124.610001 80.080002 196.270004 2356.850098 13748.740234
For some calculations I changed the columns and created df2, which no longer has the Close level:
ticker AAPL AMD BIDU GOOGL IXIC
Date
2011-08-25 0.760119 0.028203 0.621415 0.036067 0.993046
2011-09-23 0.648490 0.216017 0.267167 0.699657 0.562897
2011-10-21 0.442864 0.326310 0.197121 0.399332 0.048258
2011-11-18 0.333015 0.062089 0.164588 0.373293 0.015258
2011-12-19 0.101208 0.389120 0.218844 0.094759 0.116979
... ... ... ... ... ...
2021-01-12 0.437177 0.012871 0.997870 0.075802 0.137392
2021-02-10 0.064343 0.178901 0.522356 0.625447 0.320007
2021-03-11 0.135033 0.300345 0.630085 0.253857 0.466884
2021-04-09 0.358583 0.484004 0.295894 0.215424 0.454395
2021-05-07 0.124987 0.311816 0.999940 0.232552 0.281189
Now I am struggling to add a top-level name to the columns again, say ret, because I would like to plot a histogram of each column with titles like ('ret', 'AAPL')...
This may be a bit confusing, but hopefully I have explained the question clearly. Thanks for any help.
You can use the pd.MultiIndex.from_product() method:
# If 'Date' is still a regular column rather than the index, make it the index first
if 'Date' in df2.columns:
    df2 = df2.set_index('Date')
df2.columns = pd.MultiIndex.from_product([['ret'], df2.columns])
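As a quick check of what this produces, here is a sketch on a tiny made-up frame with two tickers (the values are random, not the ones from the question). It also shows `pd.concat` with a dict key as an alternative way to add the top level, which is my own suggestion rather than part of the answer above.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for df2: Date index, single-level ticker columns.
rng = np.random.default_rng(0)
plain = pd.DataFrame(
    rng.random((3, 2)),
    index=pd.to_datetime(["2011-08-25", "2011-09-23", "2011-10-21"]),
    columns=pd.Index(["AAPL", "AMD"], name="ticker"),
)
plain.index.name = "Date"

# Approach from the answer: rebuild the columns as a two-level MultiIndex.
df2 = plain.copy()
df2.columns = pd.MultiIndex.from_product([["ret"], df2.columns])
labels = list(df2.columns)

# Alternative: concatenating a one-entry dict adds the key as the top level.
alt = pd.concat({"ret": plain}, axis=1)
alt_labels = list(alt.columns)

print(labels)
```

Each column label is now a tuple such as ('ret', 'AAPL'), which can be used as the title when plotting the columns one by one.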
Right now I pull data for the last 30 days, store it in a dataframe, and then pick the data for the last 20 days to use. However, if one of the last 20 days is a holiday, Yahoo shows the Volume for that day as 0 and fills the OHLC values (Open, High, Low, Close, Adj Close) with the Adj Close of the previous day. In the example shown below, the data for 2016-01-26 is invalid and I don't want to retrieve it.
So how do I pull data from Yahoo for exactly the last 20 working days?
My present code is below:
from datetime import date, datetime, timedelta
import pandas_datareader.data as web
todays_date = date.today()
n = 30
date_n_days_ago = date.today() - timedelta(days=n)
yahoo_data = web.DataReader('ACC.NS', 'yahoo', date_n_days_ago, todays_date)
yahoo_data_20_day = yahoo_data.tail(20)
IIUC you can add a filter that keeps only the rows where the Volume column is not 0:
from datetime import date, datetime, timedelta
import pandas_datareader.data as web
todays_date = date.today()
n = 30
date_n_days_ago = date.today() - timedelta(days=n)
yahoo_data = web.DataReader('ACC.NS', 'yahoo', date_n_days_ago, todays_date)
#add filter - get data, where column Volume is not 0
yahoo_data = yahoo_data[yahoo_data.Volume != 0]
yahoo_data_20_day = yahoo_data.tail(20)
print(yahoo_data_20_day)
Open High Low Close Volume Adj Close
Date
2016-01-20 1218.90 1229.00 1205.00 1212.25 156300 1206.32
2016-01-21 1225.00 1236.95 1211.25 1228.45 209200 1222.44
2016-01-22 1239.95 1256.65 1230.05 1241.00 123200 1234.93
2016-01-25 1250.00 1263.50 1241.05 1245.00 124500 1238.91
2016-01-27 1249.00 1250.00 1228.00 1230.35 112800 1224.33
2016-01-28 1232.40 1234.90 1208.00 1214.95 134500 1209.00
2016-01-29 1220.10 1253.50 1216.05 1240.05 254400 1233.98
2016-02-01 1245.00 1278.90 1240.30 1271.85 210900 1265.63
2016-02-02 1266.80 1283.00 1253.05 1261.35 204600 1255.18
2016-02-03 1244.00 1279.00 1241.45 1248.95 191000 1242.84
2016-02-04 1255.25 1277.40 1253.20 1270.40 205900 1264.18
2016-02-05 1267.05 1286.00 1259.05 1271.40 231300 1265.18
2016-02-08 1271.00 1309.75 1270.15 1280.60 218500 1274.33
2016-02-09 1271.00 1292.85 1270.00 1279.10 148600 1272.84
2016-02-10 1270.00 1278.25 1250.05 1265.85 256800 1259.66
2016-02-11 1250.00 1264.70 1225.50 1234.00 231500 1227.96
2016-02-12 1234.20 1242.65 1199.10 1221.05 212000 1215.07
2016-02-15 1230.00 1268.70 1228.35 1256.55 130800 1250.40
2016-02-16 1265.00 1273.10 1225.00 1227.80 144700 1221.79
2016-02-17 1222.80 1233.50 1204.00 1226.05 165000 1220.05
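The same filter-then-tail idea can be tried offline. The sketch below builds a synthetic 30-business-day frame (invented prices and volumes, no network call) with one placeholder row on 2016-01-26 whose Volume is 0, then keeps the last 20 rows that actually traded.

```python
import pandas as pd

# Synthetic stand-in for the downloaded data: 30 business days of fake values.
dates = pd.bdate_range("2016-01-04", periods=30)
volume = [100_000 + 1_000 * i for i in range(30)]
volume[16] = 0  # dates[16] is 2016-01-26, the placeholder holiday row
yahoo_data = pd.DataFrame(
    {"Close": [1200.0 + i for i in range(30)], "Volume": volume},
    index=dates,
)

# Drop the zero-volume placeholder rows first, then take the last 20 rows.
traded = yahoo_data[yahoo_data["Volume"] != 0]
yahoo_data_20_day = traded.tail(20)

print(yahoo_data_20_day)
```

One caveat: with a 30-calendar-day window, weekends and holidays can leave fewer than 20 traded rows, so the download window may need to be widened for `tail(20)` to return a full 20 days.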