I have a dataframe that looks like the one below:
Date 3tier1 3tier2
2013-01-01 08:00:00+08:00 20.97946282 20.97946282
2013-01-02 08:00:00+08:00 20.74539378 20.74539378
2013-01-03 08:00:00+08:00 20.51126054 20.51126054
2013-01-04 08:00:00+08:00 20.27707322 20.27707322
2013-01-05 08:00:00+08:00 20.04284112 20.04284112
2013-01-06 08:00:00+08:00 19.80857234 19.80857234
2013-01-07 08:00:00+08:00 19.57427331 19.57427331
2013-01-08 08:00:00+08:00 19.33994822 19.33994822
2013-01-09 08:00:00+08:00 19.10559849 19.10559849
2013-01-10 08:00:00+08:00 18.87122241 18.87122241
2013-01-11 08:00:00+08:00 18.63681507 18.63681507
2013-01-12 08:00:00+08:00 18.40236877 18.40236877
2013-01-13 08:00:00+08:00 18.16787383 18.16787383
2013-01-14 08:00:00+08:00 17.93331972 17.93331972
2013-01-15 08:00:00+08:00 17.69869612 17.69869612
2013-01-16 08:00:00+08:00 17.46399372 17.46399372
2013-01-17 08:00:00+08:00 17.22920466 17.22920466
2013-01-18 08:00:00+08:00 16.9943227 16.9943227
2013-01-19 08:00:00+08:00 17.27850867 16.7593431
2013-01-20 08:00:00+08:00 17.69762778 16.52426248
2013-01-21 08:00:00+08:00 18.12537837 16.28907864
2013-01-22 08:00:00+08:00 18.56180775 16.05379043
2013-01-23 08:00:00+08:00 19.00689471 15.81839767
2013-01-24 08:00:00+08:00 19.46053468 15.58290109
2013-01-25 08:00:00+08:00 19.92252218 15.3473024
2013-01-26 08:00:00+08:00 20.3925305 15.11160423
2013-01-27 08:00:00+08:00 20.87008788 14.87581016
2013-01-28 08:00:00+08:00 21.35454987 14.63992467
2013-01-29 08:00:00+08:00 21.84506726 14.40395298
2013-01-30 08:00:00+08:00 22.34054913 14.16790086
2013-01-31 08:00:00+08:00 22.83962058 13.93177434
2013-02-01 08:00:00+08:00 23.34057473 13.69557937
2013-02-02 08:00:00+08:00 23.84131896 13.45932144
2013-02-03 08:00:00+08:00 24.33931544 13.22300514
2013-02-04 08:00:00+08:00 24.8315166 12.98663374
2013-02-05 08:00:00+08:00 25.31429677 12.7502088
2013-02-06 08:00:00+08:00 25.78338191 12.51372976
2013-02-07 08:00:00+08:00 26.23378052 12.27719367
2013-02-08 08:00:00+08:00 26.65971992 12.04059517
2013-02-09 08:00:00+08:00 27.05459343 11.80392662
2013-02-10 08:00:00+08:00 27.41092527 11.56717871
2013-02-11 08:00:00+08:00 27.72036088 11.3303412
2013-02-12 08:00:00+08:00 27.97369094 11.09340384
2013-02-13 08:00:00+08:00 28.16091685 10.85635718
2013-02-14 08:00:00+08:00 28.27136466 10.61919323
2013-02-15 08:00:00+08:00 28.29385218 10.38190579
2013-02-16 08:00:00+08:00 28.21691143 10.14449064
2013-02-17 08:00:00+08:00 28.02906576 9.906945571
2013-02-18 08:00:00+08:00 27.71915819 9.669270289
2013-02-19 08:00:00+08:00 27.27672516 9.431466436
2013-02-20 08:00:00+08:00 26.69240919 9.193537583
2013-02-21 08:00:00+08:00 25.9584032 8.955489323
2013-02-22 08:00:00+08:00 25.06891975 8.717329426
2013-02-23 08:00:00+08:00 24.02067835 8.479068052
2013-02-24 08:00:00+08:00 22.81340411 8.240718006
2013-02-25 08:00:00+08:00 21.45033241 8.002294987
2013-02-26 08:00:00+08:00 19.93872048 7.763817801
2013-02-27 08:00:00+08:00 18.29038758 7.525308512
2013-02-28 08:00:00+08:00 16.5223583 7.286792516
2013-03-01 08:00:00+08:00 14.65781009 7.048298548
2013-03-02 08:00:00+08:00 12.72782154 6.809858708
2013-03-03 08:00:00+08:00 10.77512952 6.57150857
2013-03-04 08:00:00+08:00 8.862866684 6.333287469
2013-03-05 08:00:00+08:00 7.095368405 6.095239078
2013-03-06 08:00:00+08:00 5.857412338 5.857412338
2013-03-07 08:00:00+08:00 6.062085995 5.619862847
2013-03-08 08:00:00+08:00 7.707047277 5.382654808
2013-03-09 08:00:00+08:00 9.419192265 5.145863673
2013-03-10 08:00:00+08:00 11.12489254 4.909579657
2013-03-11 08:00:00+08:00 12.78439056 4.673912321
2013-03-12 08:00:00+08:00 14.37406958 4.438996486
2013-03-13 08:00:00+08:00 15.87932086 4.204999838
2013-03-14 08:00:00+08:00 17.29126015 3.97213278
2013-03-15 08:00:00+08:00 18.60496304 3.740661371
2013-03-16 08:00:00+08:00 19.81836754 3.510924673
2013-03-17 08:00:00+08:00 20.9315104 3.283358444
2013-03-18 08:00:00+08:00 21.94595693 3.058528064
2013-03-19 08:00:00+08:00 22.86436015 2.837174881
2013-03-20 08:00:00+08:00 23.69011593 2.620282024
2013-03-21 08:00:00+08:00 24.42709384 2.409168144
2013-03-22 08:00:00+08:00 25.07942941 2.205620134
2013-03-23 08:00:00+08:00 25.65136634 2.012076744
2013-03-24 08:00:00+08:00 26.14713926 1.831868652
2013-03-25 08:00:00+08:00 26.57088882 1.669492776
2013-03-26 08:00:00+08:00 26.92660259 1.53082259
2013-03-27 08:00:00+08:00 27.21807571 1.423006398
2013-03-28 08:00:00+08:00 27.44888683 1.353644799
2013-03-29 08:00:00+08:00 27.66626757 1.328979238
2013-03-30 08:00:00+08:00 28.03215155 1.351655979
2013-03-31 08:00:00+08:00 28.34758652 1.419589908
I would like to find the range for each month for a column of my choice, and group the dates wherever the range changes direction. For example, 3tier1 in month 1 starts at 20, falls to 16, and then rises again to 22: from Jan 1 to Jan 18 it moves downward from 20 to 16, then from Jan 19 to Feb 15 upward from 17 to 28, and so on.
Expected output:
2013-01-01 to 2013-01-18 - 20 to 16
2013-01-19 to 2013-02-15 - 17 to 28
Is there a built-in pandas function that can do this with ease? Thanks in advance for your help.
I don't know of a built-in function that does what you are looking for, but it can be put together with a few lines of code. I would use .diff() and .shift().
This is what I came up with.
import pandas as pd
import numpy as np
file = 'C:/path_to_file/data.csv'
df = pd.read_csv(file, parse_dates=['Date'])
# Now I have your dataframe loaded. Your procedure is below.
df['trend'] = np.where(df['3tier1'].diff() > 0, 1, -1)  # +1 if increasing, -1 if decreasing
df['again'] = df['trend'].diff()  # get the difference in trend
df['again'] = df['again'].shift(periods=-1) + df['again']
df['change'] = np.where(df['again'].isin([2, -2, np.nan]), 2, 0)
# get to the desired data; .copy() avoids a SettingWithCopyWarning on the assignments below
dfc = df[df['change'] == 2].copy()
dfc['to_date'] = dfc['Date'].shift(periods=-1)
dfc['to_End'] = dfc['3tier1'].shift(periods=-1)
dfc.drop(columns=['trend', 'again', 'change'], inplace=True)
# keep every other row: each remaining row is one start/end pair of a trend
dfc = dfc.iloc[::2, :]
print(dfc)
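The same grouping can also be written more compactly with np.sign and cumsum. This is only a sketch: the start_/end_ column names are mine, and flat stretches where the diff is exactly 0 would need extra handling.
import numpy as np

# sign of the day-over-day change: +1 rising, -1 falling
sign = np.sign(df['3tier1'].diff()).bfill()
# a new segment starts whenever the sign flips
segment = sign.ne(sign.shift()).cumsum()
summary = df.groupby(segment).agg(
    start_date=('Date', 'first'),
    end_date=('Date', 'last'),
    start_val=('3tier1', 'first'),
    end_val=('3tier1', 'last'),
)
print(summary)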
I have a dataframe of minute data for multiple stocks; each stock has multiple sessions. See the sample below:
Symbol Time Open High Low Close Volume LOD
2724312 AEHR 2019-09-23 09:31:00 1.42 1.42 1.42 1.42 200 NaN
2724313 AEHR 2019-09-23 09:43:00 1.35 1.35 1.34 1.34 6062 NaN
2724314 AEHR 2019-09-23 09:58:00 1.35 1.35 1.29 1.30 8665 NaN
2724315 AEHR 2019-09-23 09:59:00 1.32 1.32 1.32 1.32 100 NaN
2724316 AEHR 2019-09-23 10:00:00 1.35 1.35 1.35 1.35 400 NaN
... ... ... ... ... ... ... ... ...
4266341 ZI 2021-09-10 15:56:00 63.08 63.16 63.08 63.15 18205 NaN
4266342 ZI 2021-09-10 15:57:00 63.14 63.14 63.07 63.07 19355 NaN
4266343 ZI 2021-09-10 15:58:00 63.07 63.12 63.07 63.10 16650 NaN
4266344 ZI 2021-09-10 15:59:00 63.09 63.12 63.06 63.11 25775 NaN
4266345 ZI 2021-09-10 16:00:00 63.11 63.17 63.11 63.17 28578 NaN
I need the low of day (LOD) for the session (9:30 am to 4:00 pm) up to the time in each row.
The completed df should look like this
Symbol Time Open High Low Close Volume LOD
2724312 AEHR 2019-09-23 09:31:00 1.42 1.42 1.42 1.42 200 1.42
2724313 AEHR 2019-09-23 09:43:00 1.35 1.35 1.34 1.34 6062 1.34
2724314 AEHR 2019-09-23 09:58:00 1.35 1.35 1.29 1.30 8665 1.29
2724315 AEHR 2019-09-23 09:59:00 1.32 1.32 1.32 1.32 100 1.29
2724316 AEHR 2019-09-23 10:00:00 1.35 1.35 1.35 1.35 400 1.29
... ... ... ... ... ... ... ... ...
4266341 ZI 2021-09-10 15:56:00 63.08 63.16 63.08 63.15 18205 63.08
4266342 ZI 2021-09-10 15:57:00 63.14 63.14 63.07 63.07 19355 63.07
4266343 ZI 2021-09-10 15:58:00 63.07 63.12 63.07 63.10 16650 63.07
4266344 ZI 2021-09-10 15:59:00 63.09 63.12 63.06 63.11 25775 63.06
4266345 ZI 2021-09-10 16:00:00 63.11 63.17 63.11 63.17 28578 63.06
My current solution
prev_symbol = "WXYZ"
prev_low = 10000000
prev_session = datetime.date(1920, 1, 1)
session_start = 1
for i, row in df.iterrows():
current_session = (df['Time'].iloc[i]).time()
current_symbol = df['Symbol'].iloc[i]
if current_symbol == prev_symbol:
if current_session == prev_session:
sesh_low = df.iloc[session_start:i, 'Low'].min()
df.at[i, 'LOD'] = sesh_low
else:
df.at[i, 'LOD'] = df.at[i, 'Low']
prev_session = current_session
session_start = i
else:
df.at[i, 'LOD'] = df.at[i, 'Low']
prev_symbol = current_symbol
prev_session = current_session
session_start = i
This raises a SettingWithCopyWarning. Please help.
You can try .groupby() + .expanding():
# if you have values already converted/sorted, skip:
# df["Time"] = pd.to_datetime(df["Time"])
# df = df.sort_values(by=["Symbol", "Time"])
df["LOD"] = df.groupby("Symbol")["Low"].expanding().min().values
print(df)
Prints:
Symbol Time Open High Low Close Volume LOD
2724312 AEHR 2019-09-23 09:31:00 1.42 1.42 1.42 1.42 200 1.42
2724313 AEHR 2019-09-23 09:43:00 1.35 1.35 1.34 1.34 6062 1.34
2724314 AEHR 2019-09-23 09:58:00 1.35 1.35 1.29 1.30 8665 1.29
2724315 AEHR 2019-09-23 09:59:00 1.32 1.32 1.32 1.32 100 1.29
2724316 AEHR 2019-09-23 10:00:00 1.35 1.35 1.35 1.35 400 1.29
4266341 ZI 2021-09-10 15:56:00 63.08 63.16 63.08 63.15 18205 63.08
4266342 ZI 2021-09-10 15:57:00 63.14 63.14 63.07 63.07 19355 63.07
4266343 ZI 2021-09-10 15:58:00 63.07 63.12 63.07 63.10 16650 63.07
4266344 ZI 2021-09-10 15:59:00 63.09 63.12 63.06 63.11 25775 63.06
4266345 ZI 2021-09-10 16:00:00 63.11 63.17 63.11 63.17 28578 63.06
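Note that the grouping above is by symbol only; if the running minimum should reset at each session boundary, as the expected output implies, a sketch is to group by the calendar date as well:
# reset the running minimum per symbol and per session date;
# cummin() returns a Series aligned to df's index, so no .values needed
df["LOD"] = df.groupby(["Symbol", df["Time"].dt.date])["Low"].cummin()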
Here's code to read in a DataFrame like the one I'm looking at:
import pandas as pd

df = pd.DataFrame({
'period' : [1, 2, 3, 4, 5, 8, 9, 10, 11, 13, 14, 15, 16, 19, 20, 21, 22,
23, 25, 26],
'id' : [1285, 1285, 1285, 1285, 1285, 1285, 1285, 1285, 1285, 1285, 1285,
1285, 1285, 1285, 1285, 1285, 1285, 1285, 1285, 1285],
'pred': [-1.6534775, -1.6534775, -1.6534775, -1.6534775, -1.6534775,
-1.6534775, -1.6534775, -1.6534775, -1.6534775, -1.6534775,
-1.6534775, -1.6534775, -1.6534775, -1.6534775, -1.6534775,
-1.6534775, -1.6534775, -1.6534775, -1.6534775, -1.6534775],
'ret' : [ None, -0.02222222, -0.01363636, 0. , -0.02764977,
None, -0.00909091, -0.01376147, 0.00465116, None,
0.01869159, 0. , 0. , None , -0.00460829,
0.00462963, 0.02304147, 0. , None, -0.00050756]})
It will look like this when read in:
period id pred ret
0 1 1285 -1.653477 NaN
1 2 1285 -1.653477 -0.022222
2 3 1285 -1.653477 -0.013636
3 4 1285 -1.653477 0.000000
4 5 1285 -1.653477 -0.027650
5 8 1285 -1.653477 NaN
6 9 1285 -1.653477 -0.009091
7 10 1285 -1.653477 -0.013761
8 11 1285 -1.653477 0.004651
9 13 1285 -1.653477 NaN
10 14 1285 -1.653477 0.018692
11 15 1285 -1.653477 0.000000
12 16 1285 -1.653477 0.000000
13 19 1285 -1.653477 NaN
14 20 1285 -1.653477 -0.004608
15 21 1285 -1.653477 0.004630
16 22 1285 -1.653477 0.023041
17 23 1285 -1.653477 0.000000
18 25 1285 -1.653477 NaN
19 26 1285 -1.653477 -0.000508
pred is a 20-period prediction, so what I want to do is bring the returns back 20 periods (but do it in a flexible way).
Here's the lag function I currently have:
import numpy as np

def lag(df, col, lag_dist=1, ref='period', group='id'):
    df = df.copy()
    new_col = 'lag' + str(lag_dist) + '_' + col
    df[new_col] = df.groupby(group)[col].shift(lag_dist)
    # set NaN values that differ from specified
    df[new_col] = (df.groupby(group)[ref]
                     .shift(lag_dist)
                     .sub(df[ref])
                     .eq(-lag_dist)
                     .mul(1)
                     .replace(0, np.nan) * df[new_col])
    return df[new_col]
but when I run
df['fut20_ret'] = lag(df, 'ret', -20, 'period')
df.head(20)
I get
period id pred gain fee prc ret fut20_ret
0 1 1285 -1.653478 0.000000 0.87 1.000000 NaN NaN
1 2 1285 -1.653478 -0.022222 0.87 0.977778 -0.022222 NaN
2 3 1285 -1.653478 -0.035556 0.87 0.964444 -0.013636 NaN
3 4 1285 -1.653478 -0.035556 0.87 0.964444 0.000000 NaN
4 5 1285 -1.653478 -0.062222 0.87 0.937778 -0.027650 NaN
6 8 1285 -1.653478 -0.022222 0.87 0.977778 NaN NaN
7 9 1285 -1.653478 -0.031111 0.87 0.968889 -0.009091 NaN
8 10 1285 -1.653478 -0.044444 0.87 0.955556 -0.013761 NaN
9 11 1285 -1.653478 -0.040000 0.87 0.960000 0.004651 NaN
10 13 1285 -1.653478 -0.048889 0.87 0.951111 NaN NaN
11 14 1285 -1.653478 -0.031111 0.87 0.968889 0.018692 NaN
12 15 1285 -1.653478 -0.031111 0.87 0.968889 0.000000 NaN
13 16 1285 -1.653478 -0.031111 0.87 0.968889 0.000000 NaN
15 19 1285 -1.653478 -0.035556 0.87 0.964444 NaN NaN
16 20 1285 -1.653478 -0.040000 0.87 0.960000 -0.004608 NaN
17 21 1285 -1.653478 -0.035556 0.87 0.964444 0.004630 NaN
18 22 1285 -1.653478 -0.013333 0.87 0.986667 0.023041 NaN
19 23 1285 -1.653478 -0.013333 0.87 0.986667 0.000000 NaN
How can I modify my lag function so that it works properly? It's close but I'm struggling on the last little bit.
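One way to make the alignment explicit is to look the value up by the ref column instead of shifting by row position, so gaps in period produce NaN automatically. This is only a sketch, not necessarily the final fix: lag_by_ref is a name I made up, and it assumes ref is unique within each group.
import numpy as np
import pandas as pd

def lag_by_ref(df, col, lag_dist=1, ref='period', group='id'):
    # returns the value of `col` at ref - lag_dist within each group;
    # lag_dist=-20 therefore fetches the value 20 periods ahead
    out = pd.Series(np.nan, index=df.index)
    for _, g in df.groupby(group):
        lookup = g.set_index(ref)[col]  # assumes ref is unique per group
        out.loc[g.index] = lookup.reindex(g[ref] - lag_dist).to_numpy()
    return out

df['fut20_ret'] = lag_by_ref(df, 'ret', -20)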
I have the dataframe below, with 9 points every day.
Date and Time form a MultiIndex.
I want to replace the Time index values with other times (00:00:00 to 02:00:00) for every day.
Date Time a b c
2018-01-09 6:00:00 20.31 0 -2.95
2018-01-09 6:15:00 20.76 26738 -2.88
2018-01-09 6:30:00 21.4 22462 -2.77
2018-01-09 6:45:00 21.84 20033 -3
2018-01-09 7:00:00 22.17 20010 -3.28
2018-01-09 7:15:00 22.38 18133 -2.82
2018-01-09 7:30:00 22.75 18254 -3.14
2018-01-09 7:45:00 22.93 17039 -3.22
2018-01-09 8:00:00 23.13 15934 -3.27
2018-01-10 6:00:00 20.31 0 -2.95
2018-01-10 6:15:00 20.76 26738 -2.88
2018-01-10 6:30:00 21.4 22462 -2.77
2018-01-10 6:45:00 21.84 20033 -3
2018-01-10 7:00:00 22.17 20010 -3.28
2018-01-10 7:15:00 22.38 18133 -2.82
2018-01-10 7:30:00 22.75 18254 -3.14
2018-01-10 7:45:00 22.93 17039 -3.22
2018-01-10 8:00:00 23.13 15934 -3.27
So the result should be as below:
Date Time a b c
2018-01-09 0:00:00 20.31 0 -2.95
2018-01-09 0:15:00 20.76 26738 -2.88
2018-01-09 0:30:00 21.4 22462 -2.77
2018-01-09 0:45:00 21.84 20033 -3
2018-01-09 1:00:00 22.17 20010 -3.28
2018-01-09 1:15:00 22.38 18133 -2.82
2018-01-09 1:30:00 22.75 18254 -3.14
2018-01-09 1:45:00 22.93 17039 -3.22
2018-01-09 2:00:00 23.13 15934 -3.27
2018-01-10 0:00:00 20.31 0 -2.95
2018-01-10 0:15:00 20.76 26738 -2.88
2018-01-10 0:30:00 21.4 22462 -2.77
2018-01-10 0:45:00 21.84 20033 -3
2018-01-10 1:00:00 22.17 20010 -3.28
2018-01-10 1:15:00 22.38 18133 -2.82
2018-01-10 1:30:00 22.75 18254 -3.14
2018-01-10 1:45:00 22.93 17039 -3.22
2018-01-10 2:00:00 23.13 15934 -3.27
How can I do it?
If you want to replace all values with times at 15-minute intervals by day, you can first create a dictionary for mapping:
d = dict(enumerate(pd.date_range(start='2018-01-01', end='2018-01-02', freq='15T').strftime('%H:%M:%S')))
print (d)
{0: '00:00:00', 1: '00:15:00', 2: '00:30:00', 3: '00:45:00', 4: '01:00:00', 5: '01:15:00', 6: '01:30:00', 7: '01:45:00', 8: '02:00:00', 9: '02:15:00', 10: '02:30:00', 11: '02:45:00', 12: '03:00:00', 13: '03:15:00', 14: '03:30:00', 15: '03:45:00', 16: '04:00:00', 17: '04:15:00', 18: '04:30:00', 19: '04:45:00', 20: '05:00:00', 21: '05:15:00', 22: '05:30:00', 23: '05:45:00', 24: '06:00:00', 25: '06:15:00', 26: '06:30:00', 27: '06:45:00', 28: '07:00:00', 29: '07:15:00', 30: '07:30:00', 31: '07:45:00', 32: '08:00:00', 33: '08:15:00', 34: '08:30:00', 35: '08:45:00', 36: '09:00:00', 37: '09:15:00', 38: '09:30:00', 39: '09:45:00', 40: '10:00:00', 41: '10:15:00', 42: '10:30:00', 43: '10:45:00', 44: '11:00:00', 45: '11:15:00', 46: '11:30:00', 47: '11:45:00', 48: '12:00:00', 49: '12:15:00', 50: '12:30:00', 51: '12:45:00', 52: '13:00:00', 53: '13:15:00', 54: '13:30:00', 55: '13:45:00', 56: '14:00:00', 57: '14:15:00', 58: '14:30:00', 59: '14:45:00', 60: '15:00:00', 61: '15:15:00', 62: '15:30:00', 63: '15:45:00', 64: '16:00:00', 65: '16:15:00', 66: '16:30:00', 67: '16:45:00', 68: '17:00:00', 69: '17:15:00', 70: '17:30:00', 71: '17:45:00', 72: '18:00:00', 73: '18:15:00', 74: '18:30:00', 75: '18:45:00', 76: '19:00:00', 77: '19:15:00', 78: '19:30:00', 79: '19:45:00', 80: '20:00:00', 81: '20:15:00', 82: '20:30:00', 83: '20:45:00', 84: '21:00:00', 85: '21:15:00', 86: '21:30:00', 87: '21:45:00', 88: '22:00:00', 89: '22:15:00', 90: '22:30:00', 91: '22:45:00', 92: '23:00:00', 93: '23:15:00', 94: '23:30:00', 95: '23:45:00', 96: '00:00:00'}
Then use cumcount for a counter, and map:
s = df.groupby(level=0).cumcount().map(d)
print (s)
Date Time
2018-01-09 6:00:00 00:00:00
6:15:00 00:15:00
6:30:00 00:30:00
6:45:00 00:45:00
7:00:00 01:00:00
7:15:00 01:15:00
7:30:00 01:30:00
7:45:00 01:45:00
8:00:00 02:00:00
2018-01-10 6:00:00 00:00:00
6:15:00 00:15:00
6:30:00 00:30:00
6:45:00 00:45:00
7:00:00 01:00:00
7:15:00 01:15:00
7:30:00 01:30:00
7:45:00 01:45:00
8:00:00 02:00:00
Last, reassign the new index with set_index, using get_level_values for the first-level indices:
df = df.set_index([df.index.get_level_values(0), s])
print (df)
a b c
Date
2018-01-09 00:00:00 20.31 0 -2.95
00:15:00 20.76 26738 -2.88
00:30:00 21.40 22462 -2.77
00:45:00 21.84 20033 -3.00
01:00:00 22.17 20010 -3.28
01:15:00 22.38 18133 -2.82
01:30:00 22.75 18254 -3.14
01:45:00 22.93 17039 -3.22
02:00:00 23.13 15934 -3.27
2018-01-10 00:00:00 20.31 0 -2.95
00:15:00 20.76 26738 -2.88
00:30:00 21.40 22462 -2.77
00:45:00 21.84 20033 -3.00
01:00:00 22.17 20010 -3.28
01:15:00 22.38 18133 -2.82
01:30:00 22.75 18254 -3.14
01:45:00 22.93 17039 -3.22
02:00:00 23.13 15934 -3.27
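An alternative sketch that avoids building the large dictionary (assuming the same df as above): compute the replacement times arithmetically from each row's position within its day.
pos = df.groupby(level=0).cumcount()                     # 0, 1, ..., 8 within each Date
# the date part of the anchor Timestamp is arbitrary; only the time is kept
t = pd.Timestamp('2018-01-01') + pos * pd.Timedelta(minutes=15)
s = t.dt.strftime('%H:%M:%S')                            # '00:00:00', '00:15:00', ...
df = df.set_index([df.index.get_level_values(0), s])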
My pivot table looks like this:
In [285]: piv
Out[285]:
K 118.5 119.0 119.5 120.0 120.5
Expiry
2018-01-12 0.050842 0.050842 0.050842 0.050842 0.050842
2018-01-19 0.039526 0.039526 0.039526 0.039526 0.039526
2018-01-26 0.039196 0.039196 0.039196 0.039196 0.039196
2018-02-02 0.039991 0.039991 0.039991 0.039991 0.039991
2018-02-23 0.040005 0.040005 0.040005 0.040005 0.040005
2018-03-23 0.041025 0.041000 0.040872 0.040623 0.040398
and df2 looks like this:
In [290]: df2
Out[290]:
F Symbol
Expiry
2018-03-20 12:00:00 123.000000 ZN MAR 18
2018-06-20 12:00:00 122.609375 ZN JUN 18
I am looking to add piv['F'] based on the following:
piv.index.month < df2.index.month
so the result should look like this:
K F 118.5 119.0 119.5 120.0 120.5
Expiry
2018-01-19 123.000000 0.039526 0.039526 0.039526 0.039526 0.039526
2018-01-26 123.000000 0.039196 0.039196 0.039196 0.039196 0.039196
2018-02-02 123.000000 0.039991 0.039991 0.039991 0.039991 0.039991
2018-02-23 123.000000 0.040005 0.040005 0.040005 0.040005 0.040005
2018-03-23 123.609375 0.041025 0.041000 0.040872 0.040623 0.040398
Any help will be much appreciated.
reindex + backfill
# here df is the question's piv, and df1 is its df2
df.index = pd.to_datetime(df.index)
df1.index = pd.to_datetime(df1.index)
df['F'] = df1.reindex(df.index, method='backfill').F.values
df
Out[164]:
118.5 119.0 119.5 120.0 120.5 F
2018-01-12 0.050842 0.050842 0.050842 0.050842 0.050842 123.000000
2018-01-19 0.039526 0.039526 0.039526 0.039526 0.039526 123.000000
2018-01-26 0.039196 0.039196 0.039196 0.039196 0.039196 123.000000
2018-02-02 0.039991 0.039991 0.039991 0.039991 0.039991 123.000000
2018-02-23 0.040005 0.040005 0.040005 0.040005 0.040005 123.000000
2018-03-23 0.041025 0.041000 0.040872 0.040623 0.040398 122.609375
You want to use pd.merge_asof with direction='forward' and make sure to merge on the indices.
pd.merge_asof(
piv, df2[['F']],
left_index=True,
right_index=True,
direction='forward'
)
118.5 119.0 119.5 120.0 120.5 F
Expiry
2018-01-12 0.050842 0.050842 0.050842 0.050842 0.050842 123.000000
2018-01-19 0.039526 0.039526 0.039526 0.039526 0.039526 123.000000
2018-01-26 0.039196 0.039196 0.039196 0.039196 0.039196 123.000000
2018-02-02 0.039991 0.039991 0.039991 0.039991 0.039991 123.000000
2018-02-23 0.040005 0.040005 0.040005 0.040005 0.040005 123.000000
2018-03-23 0.041025 0.041000 0.040872 0.040623 0.040398 122.609375
And if you want 'F' in front:
pd.merge_asof(
piv, df2[['F']],
left_index=True,
right_index=True,
direction='forward'
).pipe(lambda d: d[['F']].join(d.drop(columns='F')))
F 118.5 119.0 119.5 120.0 120.5
Expiry
2018-01-12 123.000000 0.050842 0.050842 0.050842 0.050842 0.050842
2018-01-19 123.000000 0.039526 0.039526 0.039526 0.039526 0.039526
2018-01-26 123.000000 0.039196 0.039196 0.039196 0.039196 0.039196
2018-02-02 123.000000 0.039991 0.039991 0.039991 0.039991 0.039991
2018-02-23 123.000000 0.040005 0.040005 0.040005 0.040005 0.040005
2018-03-23 122.609375 0.041025 0.041000 0.040872 0.040623 0.040398
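One caveat worth noting: merge_asof requires both sides to be sorted on the merge key, so if the Expiry indices might be out of order, sort first. A minimal sketch:
# merge_asof raises if the keys are unsorted
piv = piv.sort_index()
df2 = df2.sort_index()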
I'd like to read the quotations of several tickers at the same time. I am using:
import numpy as np
import pandas as pd
import pandas_datareader.data as web
import datetime
import matplotlib.pyplot as plt
%matplotlib inline
start = datetime.datetime(2017, 9, 20)
end = datetime.datetime(2017,9,22)
h = web.DataReader(["EWI", "EWG"], "yahoo", start, end)
... and it seems to work.
However, the data are read into a panel data structure. If I print variable "h" I get:
<class 'pandas.core.panel.Panel'>
Dimensions: 6 (items) x 4 (major_axis) x 2 (minor_axis)
Items axis: Adj Close to Volume
Major_axis axis: 2017-09-22 00:00:00 to 2017-09-19 00:00:00
Minor_axis axis: EWG to EWI
I'd like:
- to "see" the resulting panel values (I'm relatively new to pandas);
- to flatten the panel into a DataFrame, if that is possible (IMO DataFrames are better documented).
If I could read just the "Adjusted close", for me it would be more than enough. Perhaps reading into a DataFrame directly would be easier?
Thank you
I think you need Panel.to_frame for a MultiIndex DataFrame:
# with random data
df = h.to_frame()
print (df)
Adj Close Close High Low Open Volume
major minor
2013-01-01 EWI 0.471435 0.471435 0.471435 0.471435 0.471435 0.471435
EWG -1.190976 -1.190976 -1.190976 -1.190976 -1.190976 -1.190976
2013-01-02 EWI 1.432707 1.432707 1.432707 1.432707 1.432707 1.432707
EWG -0.312652 -0.312652 -0.312652 -0.312652 -0.312652 -0.312652
2013-01-03 EWI -0.720589 -0.720589 -0.720589 -0.720589 -0.720589 -0.720589
EWG 0.887163 0.887163 0.887163 0.887163 0.887163 0.887163
2013-01-04 EWI 0.859588 0.859588 0.859588 0.859588 0.859588 0.859588
EWG -0.636524 -0.636524 -0.636524 -0.636524 -0.636524 -0.636524
And then select a column:
s = df['Adj Close']
print (s)
major minor
2013-01-01 EWI 0.471435
EWG -1.190976
2013-01-02 EWI 1.432707
EWG -0.312652
2013-01-03 EWI -0.720589
EWG 0.887163
2013-01-04 EWI 0.859588
EWG -0.636524
Name: Adj Close, dtype: float64
Or, as a one-column DataFrame:
df1 = df[['Adj Close']]
print (df1)
Adj Close
major minor
2013-01-01 EWI 0.471435
EWG -1.190976
2013-01-02 EWI 1.432707
EWG -0.312652
2013-01-03 EWI -0.720589
EWG 0.887163
2013-01-04 EWI 0.859588
EWG -0.636524
Notice:
Panel is deprecated and will be removed in a future version of pandas.
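Since Panel is going away, a Panel-free sketch (assuming the same web.DataReader call from the question still returns one DataFrame per ticker) is to fetch each ticker separately and concatenate:
import pandas as pd
import pandas_datareader.data as web
import datetime

start = datetime.datetime(2017, 9, 20)
end = datetime.datetime(2017, 9, 22)

# one DataFrame per ticker, stacked into a single MultiIndex frame
frames = {t: web.DataReader(t, "yahoo", start, end) for t in ["EWI", "EWG"]}
df = pd.concat(frames)             # index levels: (ticker, date)
adj = df["Adj Close"].unstack(0)   # wide table: one column per ticker
print(adj)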