I have a dataframe that looks like the one below:
Date 3tier1 3tier2
2013-01-01 08:00:00+08:00 20.97946282 20.97946282
2013-01-02 08:00:00+08:00 20.74539378 20.74539378
2013-01-03 08:00:00+08:00 20.51126054 20.51126054
2013-01-04 08:00:00+08:00 20.27707322 20.27707322
2013-01-05 08:00:00+08:00 20.04284112 20.04284112
2013-01-06 08:00:00+08:00 19.80857234 19.80857234
2013-01-07 08:00:00+08:00 19.57427331 19.57427331
2013-01-08 08:00:00+08:00 19.33994822 19.33994822
2013-01-09 08:00:00+08:00 19.10559849 19.10559849
2013-01-10 08:00:00+08:00 18.87122241 18.87122241
2013-01-11 08:00:00+08:00 18.63681507 18.63681507
2013-01-12 08:00:00+08:00 18.40236877 18.40236877
2013-01-13 08:00:00+08:00 18.16787383 18.16787383
2013-01-14 08:00:00+08:00 17.93331972 17.93331972
2013-01-15 08:00:00+08:00 17.69869612 17.69869612
2013-01-16 08:00:00+08:00 17.46399372 17.46399372
2013-01-17 08:00:00+08:00 17.22920466 17.22920466
2013-01-18 08:00:00+08:00 16.9943227 16.9943227
2013-01-19 08:00:00+08:00 17.27850867 16.7593431
2013-01-20 08:00:00+08:00 17.69762778 16.52426248
2013-01-21 08:00:00+08:00 18.12537837 16.28907864
2013-01-22 08:00:00+08:00 18.56180775 16.05379043
2013-01-23 08:00:00+08:00 19.00689471 15.81839767
2013-01-24 08:00:00+08:00 19.46053468 15.58290109
2013-01-25 08:00:00+08:00 19.92252218 15.3473024
2013-01-26 08:00:00+08:00 20.3925305 15.11160423
2013-01-27 08:00:00+08:00 20.87008788 14.87581016
2013-01-28 08:00:00+08:00 21.35454987 14.63992467
2013-01-29 08:00:00+08:00 21.84506726 14.40395298
2013-01-30 08:00:00+08:00 22.34054913 14.16790086
2013-01-31 08:00:00+08:00 22.83962058 13.93177434
2013-02-01 08:00:00+08:00 23.34057473 13.69557937
2013-02-02 08:00:00+08:00 23.84131896 13.45932144
2013-02-03 08:00:00+08:00 24.33931544 13.22300514
2013-02-04 08:00:00+08:00 24.8315166 12.98663374
2013-02-05 08:00:00+08:00 25.31429677 12.7502088
2013-02-06 08:00:00+08:00 25.78338191 12.51372976
2013-02-07 08:00:00+08:00 26.23378052 12.27719367
2013-02-08 08:00:00+08:00 26.65971992 12.04059517
2013-02-09 08:00:00+08:00 27.05459343 11.80392662
2013-02-10 08:00:00+08:00 27.41092527 11.56717871
2013-02-11 08:00:00+08:00 27.72036088 11.3303412
2013-02-12 08:00:00+08:00 27.97369094 11.09340384
2013-02-13 08:00:00+08:00 28.16091685 10.85635718
2013-02-14 08:00:00+08:00 28.27136466 10.61919323
2013-02-15 08:00:00+08:00 28.29385218 10.38190579
2013-02-16 08:00:00+08:00 28.21691143 10.14449064
2013-02-17 08:00:00+08:00 28.02906576 9.906945571
2013-02-18 08:00:00+08:00 27.71915819 9.669270289
2013-02-19 08:00:00+08:00 27.27672516 9.431466436
2013-02-20 08:00:00+08:00 26.69240919 9.193537583
2013-02-21 08:00:00+08:00 25.9584032 8.955489323
2013-02-22 08:00:00+08:00 25.06891975 8.717329426
2013-02-23 08:00:00+08:00 24.02067835 8.479068052
2013-02-24 08:00:00+08:00 22.81340411 8.240718006
2013-02-25 08:00:00+08:00 21.45033241 8.002294987
2013-02-26 08:00:00+08:00 19.93872048 7.763817801
2013-02-27 08:00:00+08:00 18.29038758 7.525308512
2013-02-28 08:00:00+08:00 16.5223583 7.286792516
2013-03-01 08:00:00+08:00 14.65781009 7.048298548
2013-03-02 08:00:00+08:00 12.72782154 6.809858708
2013-03-03 08:00:00+08:00 10.77512952 6.57150857
2013-03-04 08:00:00+08:00 8.862866684 6.333287469
2013-03-05 08:00:00+08:00 7.095368405 6.095239078
2013-03-06 08:00:00+08:00 5.857412338 5.857412338
2013-03-07 08:00:00+08:00 6.062085995 5.619862847
2013-03-08 08:00:00+08:00 7.707047277 5.382654808
2013-03-09 08:00:00+08:00 9.419192265 5.145863673
2013-03-10 08:00:00+08:00 11.12489254 4.909579657
2013-03-11 08:00:00+08:00 12.78439056 4.673912321
2013-03-12 08:00:00+08:00 14.37406958 4.438996486
2013-03-13 08:00:00+08:00 15.87932086 4.204999838
2013-03-14 08:00:00+08:00 17.29126015 3.97213278
2013-03-15 08:00:00+08:00 18.60496304 3.740661371
2013-03-16 08:00:00+08:00 19.81836754 3.510924673
2013-03-17 08:00:00+08:00 20.9315104 3.283358444
2013-03-18 08:00:00+08:00 21.94595693 3.058528064
2013-03-19 08:00:00+08:00 22.86436015 2.837174881
2013-03-20 08:00:00+08:00 23.69011593 2.620282024
2013-03-21 08:00:00+08:00 24.42709384 2.409168144
2013-03-22 08:00:00+08:00 25.07942941 2.205620134
2013-03-23 08:00:00+08:00 25.65136634 2.012076744
2013-03-24 08:00:00+08:00 26.14713926 1.831868652
2013-03-25 08:00:00+08:00 26.57088882 1.669492776
2013-03-26 08:00:00+08:00 26.92660259 1.53082259
2013-03-27 08:00:00+08:00 27.21807571 1.423006398
2013-03-28 08:00:00+08:00 27.44888683 1.353644799
2013-03-29 08:00:00+08:00 27.66626757 1.328979238
2013-03-30 08:00:00+08:00 28.03215155 1.351655979
2013-03-31 08:00:00+08:00 28.34758652 1.419589908
I would like to find the range for each month for a column of my choice, and group the dates wherever the range changes direction. For example, 3tier1 in month 1 starts at 20, falls to 16, and then rises again to 22: from Jan 1 to Jan 18 it moves downward from 20 to 16, then from Jan 19 to Feb 15 upward from 17 to 28, and so on.
Expected output:
2013-01-01 to 2013-01-18 - 20 to 16
2013-01-19 to 2013-02-15 - 17 to 28
Is there a built-in pandas function that can do this with ease? Thanks in advance for your help.
I don't know of a built-in function that does what you are looking for, but it can be put together with a few lines of code. I would use .diff() and .shift().
This is what I came up with.
import pandas as pd
import numpy as np
file = 'C:/path_to_file/data.csv'
df = pd.read_csv(file, parse_dates=['Date'])
# Now I have your dataframe loaded. Your procedure is below.
df['trend'] = np.where(df['3tier1'].diff() > 0, 1, -1)  # +1 if increasing, -1 if decreasing
df['again'] = df['trend'].diff()  # get the difference in trend
df['again'] = df['again'].shift(periods=-1) + df['again']
df['change'] = np.where(df['again'].isin([2, -2, np.nan]), 2, 0)
# get to the desired data; .copy() avoids a SettingWithCopyWarning on the assignments below
dfc = df[df['change'] == 2].copy()
dfc['to_date'] = dfc['Date'].shift(periods=-1)
dfc['to_End'] = dfc['3tier1'].shift(periods=-1)
dfc.drop(columns=['trend', 'again', 'change'], inplace=True)
# keep every other row: each remaining row is one start/end pair of a trend
dfc = dfc.iloc[::2, :]
print(dfc)
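The same grouping can also be written more compactly with np.sign and cumsum. This is only a sketch: the start_/end_ column names are mine, and flat stretches where the diff is exactly 0 would need extra handling.
import numpy as np

# sign of the day-over-day change: +1 rising, -1 falling
sign = np.sign(df['3tier1'].diff()).bfill()
# a new segment starts whenever the sign flips
segment = sign.ne(sign.shift()).cumsum()
summary = df.groupby(segment).agg(
    start_date=('Date', 'first'),
    end_date=('Date', 'last'),
    start_val=('3tier1', 'first'),
    end_val=('3tier1', 'last'),
)
print(summary)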
I have a dataframe of minute data for multiple stocks; each stock has multiple sessions. See the sample below:
Symbol Time Open High Low Close Volume LOD
2724312 AEHR 2019-09-23 09:31:00 1.42 1.42 1.42 1.42 200 NaN
2724313 AEHR 2019-09-23 09:43:00 1.35 1.35 1.34 1.34 6062 NaN
2724314 AEHR 2019-09-23 09:58:00 1.35 1.35 1.29 1.30 8665 NaN
2724315 AEHR 2019-09-23 09:59:00 1.32 1.32 1.32 1.32 100 NaN
2724316 AEHR 2019-09-23 10:00:00 1.35 1.35 1.35 1.35 400 NaN
... ... ... ... ... ... ... ... ...
4266341 ZI 2021-09-10 15:56:00 63.08 63.16 63.08 63.15 18205 NaN
4266342 ZI 2021-09-10 15:57:00 63.14 63.14 63.07 63.07 19355 NaN
4266343 ZI 2021-09-10 15:58:00 63.07 63.12 63.07 63.10 16650 NaN
4266344 ZI 2021-09-10 15:59:00 63.09 63.12 63.06 63.11 25775 NaN
4266345 ZI 2021-09-10 16:00:00 63.11 63.17 63.11 63.17 28578 NaN
I need the low of day (LOD) for the session (9:30 am to 4:00 pm) up to the time in each row.
The completed df should look like this
Symbol Time Open High Low Close Volume LOD
2724312 AEHR 2019-09-23 09:31:00 1.42 1.42 1.42 1.42 200 1.42
2724313 AEHR 2019-09-23 09:43:00 1.35 1.35 1.34 1.34 6062 1.34
2724314 AEHR 2019-09-23 09:58:00 1.35 1.35 1.29 1.30 8665 1.29
2724315 AEHR 2019-09-23 09:59:00 1.32 1.32 1.32 1.32 100 1.29
2724316 AEHR 2019-09-23 10:00:00 1.35 1.35 1.35 1.35 400 1.29
... ... ... ... ... ... ... ... ...
4266341 ZI 2021-09-10 15:56:00 63.08 63.16 63.08 63.15 18205 63.08
4266342 ZI 2021-09-10 15:57:00 63.14 63.14 63.07 63.07 19355 63.07
4266343 ZI 2021-09-10 15:58:00 63.07 63.12 63.07 63.10 16650 63.07
4266344 ZI 2021-09-10 15:59:00 63.09 63.12 63.06 63.11 25775 63.06
4266345 ZI 2021-09-10 16:00:00 63.11 63.17 63.11 63.17 28578 63.06
My current solution
prev_symbol = "WXYZ"
prev_low = 10000000
prev_session = datetime.date(1920, 1, 1)
session_start = 1
for i, row in df.iterrows():
current_session = (df['Time'].iloc[i]).time()
current_symbol = df['Symbol'].iloc[i]
if current_symbol == prev_symbol:
if current_session == prev_session:
sesh_low = df.iloc[session_start:i, 'Low'].min()
df.at[i, 'LOD'] = sesh_low
else:
df.at[i, 'LOD'] = df.at[i, 'Low']
prev_session = current_session
session_start = i
else:
df.at[i, 'LOD'] = df.at[i, 'Low']
prev_symbol = current_symbol
prev_session = current_session
session_start = i
This raises a SettingWithCopyWarning. Please help.
You can try .groupby() + .expanding():
# if you have values already converted/sorted, skip:
# df["Time"] = pd.to_datetime(df["Time"])
# df = df.sort_values(by=["Symbol", "Time"])
df["LOD"] = df.groupby("Symbol")["Low"].expanding().min().values
print(df)
Prints:
Symbol Time Open High Low Close Volume LOD
2724312 AEHR 2019-09-23 09:31:00 1.42 1.42 1.42 1.42 200 1.42
2724313 AEHR 2019-09-23 09:43:00 1.35 1.35 1.34 1.34 6062 1.34
2724314 AEHR 2019-09-23 09:58:00 1.35 1.35 1.29 1.30 8665 1.29
2724315 AEHR 2019-09-23 09:59:00 1.32 1.32 1.32 1.32 100 1.29
2724316 AEHR 2019-09-23 10:00:00 1.35 1.35 1.35 1.35 400 1.29
4266341 ZI 2021-09-10 15:56:00 63.08 63.16 63.08 63.15 18205 63.08
4266342 ZI 2021-09-10 15:57:00 63.14 63.14 63.07 63.07 19355 63.07
4266343 ZI 2021-09-10 15:58:00 63.07 63.12 63.07 63.10 16650 63.07
4266344 ZI 2021-09-10 15:59:00 63.09 63.12 63.06 63.11 25775 63.06
4266345 ZI 2021-09-10 16:00:00 63.11 63.17 63.11 63.17 28578 63.06
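Note that the grouping above is by symbol only; if the running minimum should reset at each session boundary, as the expected output implies, a sketch is to group by the calendar date as well:
# reset the running minimum per symbol and per session date;
# cummin() returns a Series aligned to df's index, so no .values needed
df["LOD"] = df.groupby(["Symbol", df["Time"].dt.date])["Low"].cummin()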
Here's code to read in a DataFrame like the one I'm looking at:
import pandas as pd

df = pd.DataFrame({
'period' : [1, 2, 3, 4, 5, 8, 9, 10, 11, 13, 14, 15, 16, 19, 20, 21, 22,
23, 25, 26],
'id' : [1285, 1285, 1285, 1285, 1285, 1285, 1285, 1285, 1285, 1285, 1285,
1285, 1285, 1285, 1285, 1285, 1285, 1285, 1285, 1285],
'pred': [-1.6534775, -1.6534775, -1.6534775, -1.6534775, -1.6534775,
-1.6534775, -1.6534775, -1.6534775, -1.6534775, -1.6534775,
-1.6534775, -1.6534775, -1.6534775, -1.6534775, -1.6534775,
-1.6534775, -1.6534775, -1.6534775, -1.6534775, -1.6534775],
'ret' : [ None, -0.02222222, -0.01363636, 0. , -0.02764977,
None, -0.00909091, -0.01376147, 0.00465116, None,
0.01869159, 0. , 0. , None , -0.00460829,
0.00462963, 0.02304147, 0. , None, -0.00050756]})
It will look like this when read in:
period id pred ret
0 1 1285 -1.653477 NaN
1 2 1285 -1.653477 -0.022222
2 3 1285 -1.653477 -0.013636
3 4 1285 -1.653477 0.000000
4 5 1285 -1.653477 -0.027650
5 8 1285 -1.653477 NaN
6 9 1285 -1.653477 -0.009091
7 10 1285 -1.653477 -0.013761
8 11 1285 -1.653477 0.004651
9 13 1285 -1.653477 NaN
10 14 1285 -1.653477 0.018692
11 15 1285 -1.653477 0.000000
12 16 1285 -1.653477 0.000000
13 19 1285 -1.653477 NaN
14 20 1285 -1.653477 -0.004608
15 21 1285 -1.653477 0.004630
16 22 1285 -1.653477 0.023041
17 23 1285 -1.653477 0.000000
18 25 1285 -1.653477 NaN
19 26 1285 -1.653477 -0.000508
pred is a 20-period prediction, so what I want to do is bring the returns back 20 periods (but do it in a flexible way).
Here's the lag function I currently have:
import numpy as np

def lag(df, col, lag_dist=1, ref='period', group='id'):
    df = df.copy()
    new_col = 'lag' + str(lag_dist) + '_' + col
    df[new_col] = df.groupby(group)[col].shift(lag_dist)
    # set NaN values that differ from specified
    df[new_col] = (df.groupby(group)[ref]
                     .shift(lag_dist)
                     .sub(df[ref])
                     .eq(-lag_dist)
                     .mul(1)
                     .replace(0, np.nan) * df[new_col])
    return df[new_col]
but when I run
df['fut20_ret'] = lag(df, 'ret', -20, 'period')
df.head(20)
I get
period id pred gain fee prc ret fut20_ret
0 1 1285 -1.653478 0.000000 0.87 1.000000 NaN NaN
1 2 1285 -1.653478 -0.022222 0.87 0.977778 -0.022222 NaN
2 3 1285 -1.653478 -0.035556 0.87 0.964444 -0.013636 NaN
3 4 1285 -1.653478 -0.035556 0.87 0.964444 0.000000 NaN
4 5 1285 -1.653478 -0.062222 0.87 0.937778 -0.027650 NaN
6 8 1285 -1.653478 -0.022222 0.87 0.977778 NaN NaN
7 9 1285 -1.653478 -0.031111 0.87 0.968889 -0.009091 NaN
8 10 1285 -1.653478 -0.044444 0.87 0.955556 -0.013761 NaN
9 11 1285 -1.653478 -0.040000 0.87 0.960000 0.004651 NaN
10 13 1285 -1.653478 -0.048889 0.87 0.951111 NaN NaN
11 14 1285 -1.653478 -0.031111 0.87 0.968889 0.018692 NaN
12 15 1285 -1.653478 -0.031111 0.87 0.968889 0.000000 NaN
13 16 1285 -1.653478 -0.031111 0.87 0.968889 0.000000 NaN
15 19 1285 -1.653478 -0.035556 0.87 0.964444 NaN NaN
16 20 1285 -1.653478 -0.040000 0.87 0.960000 -0.004608 NaN
17 21 1285 -1.653478 -0.035556 0.87 0.964444 0.004630 NaN
18 22 1285 -1.653478 -0.013333 0.87 0.986667 0.023041 NaN
19 23 1285 -1.653478 -0.013333 0.87 0.986667 0.000000 NaN
How can I modify my lag function so that it works properly? It's close but I'm struggling on the last little bit.
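One way to make the alignment explicit is to look the value up by the ref column instead of shifting by row position, so gaps in period produce NaN automatically. This is only a sketch, not necessarily the final fix: lag_by_ref is a name I made up, and it assumes ref is unique within each group.
import numpy as np
import pandas as pd

def lag_by_ref(df, col, lag_dist=1, ref='period', group='id'):
    # returns the value of `col` at ref - lag_dist within each group;
    # lag_dist=-20 therefore fetches the value 20 periods ahead
    out = pd.Series(np.nan, index=df.index)
    for _, g in df.groupby(group):
        lookup = g.set_index(ref)[col]  # assumes ref is unique per group
        out.loc[g.index] = lookup.reindex(g[ref] - lag_dist).to_numpy()
    return out

df['fut20_ret'] = lag_by_ref(df, 'ret', -20)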
I have the dataframe below, with 9 points every day.
Date and Time form a MultiIndex.
I want to replace the Time index values with other times (00:00:00 to 02:00:00) for every day.
Date Time a b c
2018-01-09 6:00:00 20.31 0 -2.95
2018-01-09 6:15:00 20.76 26738 -2.88
2018-01-09 6:30:00 21.4 22462 -2.77
2018-01-09 6:45:00 21.84 20033 -3
2018-01-09 7:00:00 22.17 20010 -3.28
2018-01-09 7:15:00 22.38 18133 -2.82
2018-01-09 7:30:00 22.75 18254 -3.14
2018-01-09 7:45:00 22.93 17039 -3.22
2018-01-09 8:00:00 23.13 15934 -3.27
2018-01-10 6:00:00 20.31 0 -2.95
2018-01-10 6:15:00 20.76 26738 -2.88
2018-01-10 6:30:00 21.4 22462 -2.77
2018-01-10 6:45:00 21.84 20033 -3
2018-01-10 7:00:00 22.17 20010 -3.28
2018-01-10 7:15:00 22.38 18133 -2.82
2018-01-10 7:30:00 22.75 18254 -3.14
2018-01-10 7:45:00 22.93 17039 -3.22
2018-01-10 8:00:00 23.13 15934 -3.27
So the result should be as below:
Date Time a b c
2018-01-09 0:00:00 20.31 0 -2.95
2018-01-09 0:15:00 20.76 26738 -2.88
2018-01-09 0:30:00 21.4 22462 -2.77
2018-01-09 0:45:00 21.84 20033 -3
2018-01-09 1:00:00 22.17 20010 -3.28
2018-01-09 1:15:00 22.38 18133 -2.82
2018-01-09 1:30:00 22.75 18254 -3.14
2018-01-09 1:45:00 22.93 17039 -3.22
2018-01-09 2:00:00 23.13 15934 -3.27
2018-01-10 0:00:00 20.31 0 -2.95
2018-01-10 0:15:00 20.76 26738 -2.88
2018-01-10 0:30:00 21.4 22462 -2.77
2018-01-10 0:45:00 21.84 20033 -3
2018-01-10 1:00:00 22.17 20010 -3.28
2018-01-10 1:15:00 22.38 18133 -2.82
2018-01-10 1:30:00 22.75 18254 -3.14
2018-01-10 1:45:00 22.93 17039 -3.22
2018-01-10 2:00:00 23.13 15934 -3.27
How can I do it?
If you want to replace all values with times at 15-minute intervals by day, you can first create a dictionary for mapping:
d = dict(enumerate(pd.date_range(start='2018-01-01', end='2018-01-02', freq='15T').strftime('%H:%M:%S')))
print (d)
{0: '00:00:00', 1: '00:15:00', 2: '00:30:00', 3: '00:45:00', 4: '01:00:00', 5: '01:15:00', 6: '01:30:00', 7: '01:45:00', 8: '02:00:00', 9: '02:15:00', 10: '02:30:00', 11: '02:45:00', 12: '03:00:00', 13: '03:15:00', 14: '03:30:00', 15: '03:45:00', 16: '04:00:00', 17: '04:15:00', 18: '04:30:00', 19: '04:45:00', 20: '05:00:00', 21: '05:15:00', 22: '05:30:00', 23: '05:45:00', 24: '06:00:00', 25: '06:15:00', 26: '06:30:00', 27: '06:45:00', 28: '07:00:00', 29: '07:15:00', 30: '07:30:00', 31: '07:45:00', 32: '08:00:00', 33: '08:15:00', 34: '08:30:00', 35: '08:45:00', 36: '09:00:00', 37: '09:15:00', 38: '09:30:00', 39: '09:45:00', 40: '10:00:00', 41: '10:15:00', 42: '10:30:00', 43: '10:45:00', 44: '11:00:00', 45: '11:15:00', 46: '11:30:00', 47: '11:45:00', 48: '12:00:00', 49: '12:15:00', 50: '12:30:00', 51: '12:45:00', 52: '13:00:00', 53: '13:15:00', 54: '13:30:00', 55: '13:45:00', 56: '14:00:00', 57: '14:15:00', 58: '14:30:00', 59: '14:45:00', 60: '15:00:00', 61: '15:15:00', 62: '15:30:00', 63: '15:45:00', 64: '16:00:00', 65: '16:15:00', 66: '16:30:00', 67: '16:45:00', 68: '17:00:00', 69: '17:15:00', 70: '17:30:00', 71: '17:45:00', 72: '18:00:00', 73: '18:15:00', 74: '18:30:00', 75: '18:45:00', 76: '19:00:00', 77: '19:15:00', 78: '19:30:00', 79: '19:45:00', 80: '20:00:00', 81: '20:15:00', 82: '20:30:00', 83: '20:45:00', 84: '21:00:00', 85: '21:15:00', 86: '21:30:00', 87: '21:45:00', 88: '22:00:00', 89: '22:15:00', 90: '22:30:00', 91: '22:45:00', 92: '23:00:00', 93: '23:15:00', 94: '23:30:00', 95: '23:45:00', 96: '00:00:00'}
Then use cumcount for a counter, and map:
s = df.groupby(level=0).cumcount().map(d)
print (s)
Date Time
2018-01-09 6:00:00 00:00:00
6:15:00 00:15:00
6:30:00 00:30:00
6:45:00 00:45:00
7:00:00 01:00:00
7:15:00 01:15:00
7:30:00 01:30:00
7:45:00 01:45:00
8:00:00 02:00:00
2018-01-10 6:00:00 00:00:00
6:15:00 00:15:00
6:30:00 00:30:00
6:45:00 00:45:00
7:00:00 01:00:00
7:15:00 01:15:00
7:30:00 01:30:00
7:45:00 01:45:00
8:00:00 02:00:00
Last, reassign the new index with set_index, using get_level_values for the first-level indices:
df = df.set_index([df.index.get_level_values(0), s])
print (df)
a b c
Date
2018-01-09 00:00:00 20.31 0 -2.95
00:15:00 20.76 26738 -2.88
00:30:00 21.40 22462 -2.77
00:45:00 21.84 20033 -3.00
01:00:00 22.17 20010 -3.28
01:15:00 22.38 18133 -2.82
01:30:00 22.75 18254 -3.14
01:45:00 22.93 17039 -3.22
02:00:00 23.13 15934 -3.27
2018-01-10 00:00:00 20.31 0 -2.95
00:15:00 20.76 26738 -2.88
00:30:00 21.40 22462 -2.77
00:45:00 21.84 20033 -3.00
01:00:00 22.17 20010 -3.28
01:15:00 22.38 18133 -2.82
01:30:00 22.75 18254 -3.14
01:45:00 22.93 17039 -3.22
02:00:00 23.13 15934 -3.27
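An alternative sketch that avoids building the large dictionary (assuming the same df as above): compute the replacement times arithmetically from each row's position within its day.
pos = df.groupby(level=0).cumcount()                     # 0, 1, ..., 8 within each Date
# the date part of the anchor Timestamp is arbitrary; only the time is kept
t = pd.Timestamp('2018-01-01') + pos * pd.Timedelta(minutes=15)
s = t.dt.strftime('%H:%M:%S')                            # '00:00:00', '00:15:00', ...
df = df.set_index([df.index.get_level_values(0), s])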
My pivot table looks like this:
In [285]: piv
Out[285]:
K 118.5 119.0 119.5 120.0 120.5
Expiry
2018-01-12 0.050842 0.050842 0.050842 0.050842 0.050842
2018-01-19 0.039526 0.039526 0.039526 0.039526 0.039526
2018-01-26 0.039196 0.039196 0.039196 0.039196 0.039196
2018-02-02 0.039991 0.039991 0.039991 0.039991 0.039991
2018-02-23 0.040005 0.040005 0.040005 0.040005 0.040005
2018-03-23 0.041025 0.041000 0.040872 0.040623 0.040398
and df2 looks like this:
In [290]: df2
Out[290]:
F Symbol
Expiry
2018-03-20 12:00:00 123.000000 ZN MAR 18
2018-06-20 12:00:00 122.609375 ZN JUN 18
I am looking to add piv['F'] based on the following:
piv.index.month < df2.index.month
so the result should look like this:
K F 118.5 119.0 119.5 120.0 120.5
Expiry
2018-01-19 123.000000 0.039526 0.039526 0.039526 0.039526 0.039526
2018-01-26 123.000000 0.039196 0.039196 0.039196 0.039196 0.039196
2018-02-02 123.000000 0.039991 0.039991 0.039991 0.039991 0.039991
2018-02-23 123.000000 0.040005 0.040005 0.040005 0.040005 0.040005
2018-03-23 123.609375 0.041025 0.041000 0.040872 0.040623 0.040398
Any help will be much appreciated.
reindex + backfill
# here df is the question's piv, and df1 is its df2
df.index = pd.to_datetime(df.index)
df1.index = pd.to_datetime(df1.index)
df['F'] = df1.reindex(df.index, method='backfill').F.values
df
Out[164]:
118.5 119.0 119.5 120.0 120.5 F
2018-01-12 0.050842 0.050842 0.050842 0.050842 0.050842 123.000000
2018-01-19 0.039526 0.039526 0.039526 0.039526 0.039526 123.000000
2018-01-26 0.039196 0.039196 0.039196 0.039196 0.039196 123.000000
2018-02-02 0.039991 0.039991 0.039991 0.039991 0.039991 123.000000
2018-02-23 0.040005 0.040005 0.040005 0.040005 0.040005 123.000000
2018-03-23 0.041025 0.041000 0.040872 0.040623 0.040398 122.609375
You want to use pd.merge_asof with direction='forward' and make sure to merge on the indices.
pd.merge_asof(
piv, df2[['F']],
left_index=True,
right_index=True,
direction='forward'
)
118.5 119.0 119.5 120.0 120.5 F
Expiry
2018-01-12 0.050842 0.050842 0.050842 0.050842 0.050842 123.000000
2018-01-19 0.039526 0.039526 0.039526 0.039526 0.039526 123.000000
2018-01-26 0.039196 0.039196 0.039196 0.039196 0.039196 123.000000
2018-02-02 0.039991 0.039991 0.039991 0.039991 0.039991 123.000000
2018-02-23 0.040005 0.040005 0.040005 0.040005 0.040005 123.000000
2018-03-23 0.041025 0.041000 0.040872 0.040623 0.040398 122.609375
And if you want 'F' in front:
pd.merge_asof(
piv, df2[['F']],
left_index=True,
right_index=True,
direction='forward'
).pipe(lambda d: d[['F']].join(d.drop(columns='F')))
F 118.5 119.0 119.5 120.0 120.5
Expiry
2018-01-12 123.000000 0.050842 0.050842 0.050842 0.050842 0.050842
2018-01-19 123.000000 0.039526 0.039526 0.039526 0.039526 0.039526
2018-01-26 123.000000 0.039196 0.039196 0.039196 0.039196 0.039196
2018-02-02 123.000000 0.039991 0.039991 0.039991 0.039991 0.039991
2018-02-23 123.000000 0.040005 0.040005 0.040005 0.040005 0.040005
2018-03-23 122.609375 0.041025 0.041000 0.040872 0.040623 0.040398
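One caveat worth noting: merge_asof requires both sides to be sorted on the merge key, so if the Expiry indices might be out of order, sort first. A minimal sketch:
# merge_asof raises if the keys are unsorted
piv = piv.sort_index()
df2 = df2.sort_index()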
I'd like to read the quotations of several tickers at the same time. I am using:
import numpy as np
import pandas as pd
import pandas_datareader.data as web
import datetime
import matplotlib.pyplot as plt
%matplotlib inline
start = datetime.datetime(2017, 9, 20)
end = datetime.datetime(2017,9,22)
h = web.DataReader(["EWI", "EWG"], "yahoo", start, end)
... and it seems to work.
However, the data are read into a panel data structure. If I print variable "h" I get:
<class 'pandas.core.panel.Panel'>
Dimensions: 6 (items) x 4 (major_axis) x 2 (minor_axis)
Items axis: Adj Close to Volume
Major_axis axis: 2017-09-22 00:00:00 to 2017-09-19 00:00:00
Minor_axis axis: EWG to EWI
I'd like:
- to "see" the resulting panel values (I'm relatively new to pandas);
- to flatten the panel into a DataFrame, if that is possible (IMO DataFrames are better documented).
If I could read just the "Adjusted close", for me it would be more than enough. Perhaps reading into a DataFrame directly would be easier?
Thank you
I think you need Panel.to_frame for a MultiIndex DataFrame:
# with random data
df = h.to_frame()
print (df)
Adj Close Close High Low Open Volume
major minor
2013-01-01 EWI 0.471435 0.471435 0.471435 0.471435 0.471435 0.471435
EWG -1.190976 -1.190976 -1.190976 -1.190976 -1.190976 -1.190976
2013-01-02 EWI 1.432707 1.432707 1.432707 1.432707 1.432707 1.432707
EWG -0.312652 -0.312652 -0.312652 -0.312652 -0.312652 -0.312652
2013-01-03 EWI -0.720589 -0.720589 -0.720589 -0.720589 -0.720589 -0.720589
EWG 0.887163 0.887163 0.887163 0.887163 0.887163 0.887163
2013-01-04 EWI 0.859588 0.859588 0.859588 0.859588 0.859588 0.859588
EWG -0.636524 -0.636524 -0.636524 -0.636524 -0.636524 -0.636524
And then select a column:
s = df['Adj Close']
print (s)
major minor
2013-01-01 EWI 0.471435
EWG -1.190976
2013-01-02 EWI 1.432707
EWG -0.312652
2013-01-03 EWI -0.720589
EWG 0.887163
2013-01-04 EWI 0.859588
EWG -0.636524
Name: Adj Close, dtype: float64
Or, as a one-column DataFrame:
df1 = df[['Adj Close']]
print (df1)
Adj Close
major minor
2013-01-01 EWI 0.471435
EWG -1.190976
2013-01-02 EWI 1.432707
EWG -0.312652
2013-01-03 EWI -0.720589
EWG 0.887163
2013-01-04 EWI 0.859588
EWG -0.636524
Notice:
Panel is deprecated and will be removed in a future version of pandas.
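Since Panel is going away, a Panel-free sketch (assuming the same web.DataReader call from the question still returns one DataFrame per ticker) is to fetch each ticker separately and concatenate:
import pandas as pd
import pandas_datareader.data as web
import datetime

start = datetime.datetime(2017, 9, 20)
end = datetime.datetime(2017, 9, 22)

# one DataFrame per ticker, stacked into a single MultiIndex frame
frames = {t: web.DataReader(t, "yahoo", start, end) for t in ["EWI", "EWG"]}
df = pd.concat(frames)             # index levels: (ticker, date)
adj = df["Adj Close"].unstack(0)   # wide table: one column per ticker
print(adj)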