My pivot table looks like this:
In [285]: piv
Out[285]:
K 118.5 119.0 119.5 120.0 120.5
Expiry
2018-01-12 0.050842 0.050842 0.050842 0.050842 0.050842
2018-01-19 0.039526 0.039526 0.039526 0.039526 0.039526
2018-01-26 0.039196 0.039196 0.039196 0.039196 0.039196
2018-02-02 0.039991 0.039991 0.039991 0.039991 0.039991
2018-02-23 0.040005 0.040005 0.040005 0.040005 0.040005
2018-03-23 0.041025 0.041000 0.040872 0.040623 0.040398
and df2 looks like this:
In [290]: df2
Out[290]:
F Symbol
Expiry
2018-03-20 12:00:00 123.000000 ZN MAR 18
2018-06-20 12:00:00 122.609375 ZN JUN 18
I am looking to add a column piv['F'], taken from df2, based on the following condition:
piv.index.month < df2.index.month
so the result should look like this:
K F 118.5 119.0 119.5 120.0 120.5
Expiry
2018-01-19 123.000000 0.039526 0.039526 0.039526 0.039526 0.039526
2018-01-26 123.000000 0.039196 0.039196 0.039196 0.039196 0.039196
2018-02-02 123.000000 0.039991 0.039991 0.039991 0.039991 0.039991
2018-02-23 123.000000 0.040005 0.040005 0.040005 0.040005 0.040005
2018-03-23 122.609375 0.041025 0.041000 0.040872 0.040623 0.040398
Any help will be much appreciated.
reindex + backfill (here df stands for your piv and df1 for your df2):
df.index = pd.to_datetime(df.index)
df1.index = pd.to_datetime(df1.index)
df['F'] = df1.reindex(df.index, method='backfill').F.values
df
Out[164]:
118.5 119.0 119.5 120.0 120.5 F
2018-01-12 0.050842 0.050842 0.050842 0.050842 0.050842 123.000000
2018-01-19 0.039526 0.039526 0.039526 0.039526 0.039526 123.000000
2018-01-26 0.039196 0.039196 0.039196 0.039196 0.039196 123.000000
2018-02-02 0.039991 0.039991 0.039991 0.039991 0.039991 123.000000
2018-02-23 0.040005 0.040005 0.040005 0.040005 0.040005 123.000000
2018-03-23 0.041025 0.041000 0.040872 0.040623 0.040398 122.609375
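For reference, here is a minimal, self-contained sketch of the same reindex + backfill idea that can be pasted into a fresh session; the toy frames are rebuilt from the data shown above, trimmed to the 118.5 column only.
import pandas as pd

# Rebuild small versions of piv and df2 from the data shown above.
piv = pd.DataFrame(
    {118.5: [0.050842, 0.039526, 0.039196, 0.039991, 0.040005, 0.041025]},
    index=pd.to_datetime(['2018-01-12', '2018-01-19', '2018-01-26',
                          '2018-02-02', '2018-02-23', '2018-03-23']),
)
piv.index.name = 'Expiry'

df2 = pd.DataFrame(
    {'F': [123.000000, 122.609375]},
    index=pd.to_datetime(['2018-03-20 12:00:00', '2018-06-20 12:00:00']),
)
df2.index.name = 'Expiry'

# For each piv expiry, take F from the first df2 expiry on or after it.
piv['F'] = df2.reindex(piv.index, method='backfill').F.values
print(piv)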
You want to use pd.merge_asof with direction='forward' and make sure to merge on the indices.
pd.merge_asof(
    piv, df2[['F']],
    left_index=True,
    right_index=True,
    direction='forward'
)
118.5 119.0 119.5 120.0 120.5 F
Expiry
2018-01-12 0.050842 0.050842 0.050842 0.050842 0.050842 123.000000
2018-01-19 0.039526 0.039526 0.039526 0.039526 0.039526 123.000000
2018-01-26 0.039196 0.039196 0.039196 0.039196 0.039196 123.000000
2018-02-02 0.039991 0.039991 0.039991 0.039991 0.039991 123.000000
2018-02-23 0.040005 0.040005 0.040005 0.040005 0.040005 123.000000
2018-03-23 0.041025 0.041000 0.040872 0.040623 0.040398 122.609375
And if you want 'F' in front:
pd.merge_asof(
    piv, df2[['F']],
    left_index=True,
    right_index=True,
    direction='forward'
).pipe(lambda d: d[['F']].join(d.drop(columns='F')))
F 118.5 119.0 119.5 120.0 120.5
Expiry
2018-01-12 123.000000 0.050842 0.050842 0.050842 0.050842 0.050842
2018-01-19 123.000000 0.039526 0.039526 0.039526 0.039526 0.039526
2018-01-26 123.000000 0.039196 0.039196 0.039196 0.039196 0.039196
2018-02-02 123.000000 0.039991 0.039991 0.039991 0.039991 0.039991
2018-02-23 123.000000 0.040005 0.040005 0.040005 0.040005 0.040005
2018-03-23 122.609375 0.041025 0.041000 0.040872 0.040623 0.040398
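If you prefer to skip the pipe/join step, an equivalent small sketch simply reorders the columns after the merge:
# Reorder columns so 'F' comes first; the strike columns keep their order.
merged = pd.merge_asof(
    piv, df2[['F']],
    left_index=True,
    right_index=True,
    direction='forward'
)
merged = merged[['F'] + [c for c in merged.columns if c != 'F']]
print(merged)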
Related
I have a dataframe that looks like below,
Date 3tier1 3tier2
2013-01-01 08:00:00+08:00 20.97946282 20.97946282
2013-01-02 08:00:00+08:00 20.74539378 20.74539378
2013-01-03 08:00:00+08:00 20.51126054 20.51126054
2013-01-04 08:00:00+08:00 20.27707322 20.27707322
2013-01-05 08:00:00+08:00 20.04284112 20.04284112
2013-01-06 08:00:00+08:00 19.80857234 19.80857234
2013-01-07 08:00:00+08:00 19.57427331 19.57427331
2013-01-08 08:00:00+08:00 19.33994822 19.33994822
2013-01-09 08:00:00+08:00 19.10559849 19.10559849
2013-01-10 08:00:00+08:00 18.87122241 18.87122241
2013-01-11 08:00:00+08:00 18.63681507 18.63681507
2013-01-12 08:00:00+08:00 18.40236877 18.40236877
2013-01-13 08:00:00+08:00 18.16787383 18.16787383
2013-01-14 08:00:00+08:00 17.93331972 17.93331972
2013-01-15 08:00:00+08:00 17.69869612 17.69869612
2013-01-16 08:00:00+08:00 17.46399372 17.46399372
2013-01-17 08:00:00+08:00 17.22920466 17.22920466
2013-01-18 08:00:00+08:00 16.9943227 16.9943227
2013-01-19 08:00:00+08:00 17.27850867 16.7593431
2013-01-20 08:00:00+08:00 17.69762778 16.52426248
2013-01-21 08:00:00+08:00 18.12537837 16.28907864
2013-01-22 08:00:00+08:00 18.56180775 16.05379043
2013-01-23 08:00:00+08:00 19.00689471 15.81839767
2013-01-24 08:00:00+08:00 19.46053468 15.58290109
2013-01-25 08:00:00+08:00 19.92252218 15.3473024
2013-01-26 08:00:00+08:00 20.3925305 15.11160423
2013-01-27 08:00:00+08:00 20.87008788 14.87581016
2013-01-28 08:00:00+08:00 21.35454987 14.63992467
2013-01-29 08:00:00+08:00 21.84506726 14.40395298
2013-01-30 08:00:00+08:00 22.34054913 14.16790086
2013-01-31 08:00:00+08:00 22.83962058 13.93177434
2013-02-01 08:00:00+08:00 23.34057473 13.69557937
2013-02-02 08:00:00+08:00 23.84131896 13.45932144
2013-02-03 08:00:00+08:00 24.33931544 13.22300514
2013-02-04 08:00:00+08:00 24.8315166 12.98663374
2013-02-05 08:00:00+08:00 25.31429677 12.7502088
2013-02-06 08:00:00+08:00 25.78338191 12.51372976
2013-02-07 08:00:00+08:00 26.23378052 12.27719367
2013-02-08 08:00:00+08:00 26.65971992 12.04059517
2013-02-09 08:00:00+08:00 27.05459343 11.80392662
2013-02-10 08:00:00+08:00 27.41092527 11.56717871
2013-02-11 08:00:00+08:00 27.72036088 11.3303412
2013-02-12 08:00:00+08:00 27.97369094 11.09340384
2013-02-13 08:00:00+08:00 28.16091685 10.85635718
2013-02-14 08:00:00+08:00 28.27136466 10.61919323
2013-02-15 08:00:00+08:00 28.29385218 10.38190579
2013-02-16 08:00:00+08:00 28.21691143 10.14449064
2013-02-17 08:00:00+08:00 28.02906576 9.906945571
2013-02-18 08:00:00+08:00 27.71915819 9.669270289
2013-02-19 08:00:00+08:00 27.27672516 9.431466436
2013-02-20 08:00:00+08:00 26.69240919 9.193537583
2013-02-21 08:00:00+08:00 25.9584032 8.955489323
2013-02-22 08:00:00+08:00 25.06891975 8.717329426
2013-02-23 08:00:00+08:00 24.02067835 8.479068052
2013-02-24 08:00:00+08:00 22.81340411 8.240718006
2013-02-25 08:00:00+08:00 21.45033241 8.002294987
2013-02-26 08:00:00+08:00 19.93872048 7.763817801
2013-02-27 08:00:00+08:00 18.29038758 7.525308512
2013-02-28 08:00:00+08:00 16.5223583 7.286792516
2013-03-01 08:00:00+08:00 14.65781009 7.048298548
2013-03-02 08:00:00+08:00 12.72782154 6.809858708
2013-03-03 08:00:00+08:00 10.77512952 6.57150857
2013-03-04 08:00:00+08:00 8.862866684 6.333287469
2013-03-05 08:00:00+08:00 7.095368405 6.095239078
2013-03-06 08:00:00+08:00 5.857412338 5.857412338
2013-03-07 08:00:00+08:00 6.062085995 5.619862847
2013-03-08 08:00:00+08:00 7.707047277 5.382654808
2013-03-09 08:00:00+08:00 9.419192265 5.145863673
2013-03-10 08:00:00+08:00 11.12489254 4.909579657
2013-03-11 08:00:00+08:00 12.78439056 4.673912321
2013-03-12 08:00:00+08:00 14.37406958 4.438996486
2013-03-13 08:00:00+08:00 15.87932086 4.204999838
2013-03-14 08:00:00+08:00 17.29126015 3.97213278
2013-03-15 08:00:00+08:00 18.60496304 3.740661371
2013-03-16 08:00:00+08:00 19.81836754 3.510924673
2013-03-17 08:00:00+08:00 20.9315104 3.283358444
2013-03-18 08:00:00+08:00 21.94595693 3.058528064
2013-03-19 08:00:00+08:00 22.86436015 2.837174881
2013-03-20 08:00:00+08:00 23.69011593 2.620282024
2013-03-21 08:00:00+08:00 24.42709384 2.409168144
2013-03-22 08:00:00+08:00 25.07942941 2.205620134
2013-03-23 08:00:00+08:00 25.65136634 2.012076744
2013-03-24 08:00:00+08:00 26.14713926 1.831868652
2013-03-25 08:00:00+08:00 26.57088882 1.669492776
2013-03-26 08:00:00+08:00 26.92660259 1.53082259
2013-03-27 08:00:00+08:00 27.21807571 1.423006398
2013-03-28 08:00:00+08:00 27.44888683 1.353644799
2013-03-29 08:00:00+08:00 27.66626757 1.328979238
2013-03-30 08:00:00+08:00 28.03215155 1.351655979
2013-03-31 08:00:00+08:00 28.34758652 1.419589908
I would like to find the range for a column of my choice and group the dates wherever the direction of that range changes. For example, 3tier1 in month 1 starts at about 20, falls to 16, and then rises again to 22: from Jan 1 to Jan 18 it moves downward from 20 to 16, and then from Jan 19 to Feb 15 it moves upward from 17 to 28, and so on.
Expected output:
2013-01-01 to 2013-01-18 - 20 to 16
2013-01-19 to 2013-02-15 - 17 to 28
Is there a built-in pandas function that can do this with ease? Thanks for your help in advance.
I don't know of a built-in function that does what you are looking for, but it can be put together with a few lines of code. I would use .diff() and .shift().
This is what I came up with:
import pandas as pd
import numpy as np
file = 'C:/path_to_file/data.csv'
df = pd.read_csv(file, parse_dates=['Date'])
# Now I have your dataframe loaded. Your procedures are below.
df['trend'] = np.where(df['3tier1'].diff() > 0, 1, -1)  # trend is increasing or decreasing
df['again'] = df['trend'].diff()                        # get the difference in trend
df['again'] = df['again'].shift(periods=-1) + df['again']
df['change'] = np.where(df['again'].isin([2, -2, np.nan]), 2, 0)
# get to the desired data (.copy() avoids SettingWithCopyWarning on the assignments below)
dfc = df[df['change'] == 2].copy()
dfc['to_date'] = dfc['Date'].shift(periods=-1)
dfc['to_End'] = dfc['3tier1'].shift(periods=-1)
dfc.drop(columns=['trend', 'again', 'change'], inplace=True)
# keep every other row: each remaining row marks the start of a run, with its end shifted in
dfc = dfc.iloc[::2, :]
print(dfc)
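For completeness, the same segmentation can also be written with a run label built from sign changes plus a groupby. This is only a sketch of an alternative, reusing the df loaded above (same assumptions about the CSV layout):
import pandas as pd
import numpy as np

# Sign of the day-to-day change; a new run starts whenever the sign flips.
# (The very first row forms its own one-row group because its diff is NaN.)
sign = np.sign(df['3tier1'].diff()).fillna(0)
run_id = (sign != sign.shift()).cumsum()

# One row per monotone run: where it starts/ends and the values at each end.
segments = df.groupby(run_id).agg(
    start_date=('Date', 'first'),
    end_date=('Date', 'last'),
    start_value=('3tier1', 'first'),
    end_value=('3tier1', 'last'),
)
print(segments)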
I have a dataframe:
('U', 'OLHC', '+') counts: 127
Date Open High Low Close Sign Struct Trend OH HL LC OL LH HC
1997-06-17 00:00:00+00:00 812.97 897.60 811.80 894.42 + OLHC U 84.63 85.80 82.62 1.17 85.80 3.18
1998-03-08 00:00:00+00:00 957.59 1055.69 954.24 1055.69 + OLHC U 98.10 101.45 101.45 3.35 101.45 0.00
1998-10-14 00:00:00+00:00 957.28 1066.11 923.32 1005.53 + OLHC U 108.83 142.79 82.21 33.96 142.79 60.58
1998-11-27 00:00:00+00:00 1005.53 1192.97 1000.12 1192.33 + OLHC U 187.44 192.85 192.21 5.41 192.85 0.64
1999-01-10 00:00:00+00:00 1192.33 1278.24 1136.89 1275.09 + OLHC U 85.91 141.35 138.20 55.44 141.35 3.15
1999-04-08 00:00:00+00:00 1271.18 1344.08 1216.03 1343.98 + OLHC U 72.90 128.05 127.95 55.15 128.05 0.10
1999-11-14 00:00:00+00:00 1282.81 1396.12 1233.70 1396.06 + OLHC U 113.31 162.42 162.36 49.11 162.42 0.06
2001-04-25 00:00:00+00:00 1182.91 1253.76 1081.19 1228.75 + OLHC U 70.85 172.57 147.56 101.72 172.57 25.01
2001-12-01 00:00:00+00:00 1066.98 1163.38 1052.83 1137.88 + OLHC U 96.40 110.55 85.05 14.15 110.55 25.50
2003-03-30 00:00:00+00:00 836.25 895.78 788.90 863.50 + OLHC U 59.53 106.88 74.60 47.35 106.88 32.28
2003-05-13 00:00:00+00:00 863.50 947.51 843.68 942.30 + OLHC U 84.01 103.83 98.62 19.82 103.83 5.21
2003-09-22 00:00:00+00:00 977.59 1040.18 974.21 1022.82 + OLHC U 62.59 65.97 48.61 3.38 65.97 17.36
2003-11-05 00:00:00+00:00 1022.82 1061.44 990.34 1051.81 + OLHC U 38.62 71.10 61.47 32.48 71.10 9.63
2003-12-19 00:00:00+00:00 1053.14 1091.03 1031.24 1088.67 + OLHC U 37.89 59.79 57.43 21.90 59.79 2.36
2004-12-05 00:00:00+00:00 1095.74 1197.11 1090.23 1191.17 + OLHC U 101.37 106.88 100.94 5.51 106.88 5.94
2005-01-18 00:00:00+00:00 1190.84 1217.90 1173.76 1195.98 + OLHC U 27.06 44.14 22.22 17.08 44.14 21.92
2005-05-30 00:00:00+00:00 1142.40 1199.56 1136.22 1198.78 + OLHC U 57.16 63.34 62.56 6.18 63.34 0.78
2006-02-18 00:00:00+00:00 1274.61 1294.90 1253.61 1287.24 + OLHC U 20.29 41.29 33.63 21.00 41.29 7.66
2006-04-03 00:00:00+00:00 1287.14 1310.88 1268.42 1297.81 + OLHC U 23.74 42.46 29.39 18.72 42.46 13.07
2006-09-26 00:00:00+00:00 1267.60 1336.60 1266.67 1336.34 + OLHC U 69.00 69.93 69.67 0.93 69.93 0.26
2006-11-09 00:00:00+00:00 1335.37 1389.45 1327.10 1378.33 + OLHC U 54.08 62.35 51.23 8.27 62.35 11.12
2006-12-23 00:00:00+00:00 1378.35 1431.81 1375.60 1410.76 + OLHC U 53.46 56.21 35.16 2.75 56.21 21.05
2007-09-13 00:00:00+00:00 1455.27 1503.41 1370.60 1483.95 + OLHC U 48.14 132.81 113.35 84.67 132.81 19.46
2008-04-20 00:00:00+00:00 1293.37 1395.90 1256.98 1390.33 + OLHC U 102.53 138.92 133.35 36.39 138.92 5.57
2009-04-07 00:00:00+00:00 770.05 845.61 666.79 815.55 + OLHC U 75.56 178.82 148.76 103.26 178.82 30.06
2009-05-21 00:00:00+00:00 815.55 930.17 814.84 888.33 + OLHC U 114.62 115.33 73.49 0.71 115.33 41.84
2009-07-04 00:00:00+00:00 888.33 956.23 881.46 896.42 + OLHC U 67.90 74.77 14.96 6.87 74.77 59.81
2009-08-17 00:00:00+00:00 896.42 1018.00 869.32 979.73 + OLHC U 121.58 148.68 110.41 27.10 148.68 38.27
2009-09-30 00:00:00+00:00 979.73 1080.15 979.73 1057.08 + OLHC U 100.42 100.42 77.35 0.00 100.42 23.07
2009-11-13 00:00:00+00:00 1057.08 1105.37 1020.18 1093.48 + OLHC U 48.29 85.19 73.30 36.90 85.19 11.89
2010-10-31 00:00:00+00:00 1126.57 1196.14 1122.79 1183.26 + OLHC U 69.57 73.35 60.47 3.78 73.35 12.88
2010-12-14 00:00:00+00:00 1185.71 1246.73 1173.00 1241.59 + OLHC U 61.02 73.73 68.59 12.71 73.73 5.14
2011-01-27 00:00:00+00:00 1241.58 1301.29 1232.85 1299.54 + OLHC U 59.71 68.44 66.69 8.73 68.44 1.75
2011-03-12 00:00:00+00:00 1299.63 1344.07 1275.10 1304.28 + OLHC U 44.44 68.97 29.18 24.53 68.97 39.79
2011-12-01 00:00:00+00:00 1223.46 1292.66 1158.66 1244.58 + OLHC U 69.20 134.00 85.92 64.80 134.00 48.08
2012-01-14 00:00:00+00:00 1246.03 1296.82 1202.37 1289.09 + OLHC U 50.79 94.45 86.72 43.66 94.45 7.73
2012-02-27 00:00:00+00:00 1290.22 1371.94 1290.22 1367.59 + OLHC U 81.72 81.72 77.37 0.00 81.72 4.35
2012-07-08 00:00:00+00:00 1318.90 1374.81 1266.74 1354.68 + OLHC U 55.91 108.07 87.94 52.16 108.07 20.13
2012-08-21 00:00:00+00:00 1354.66 1426.68 1325.41 1413.17 + OLHC U 72.02 101.27 87.76 29.25 101.27 13.51
2012-12-31 00:00:00+00:00 1366.42 1448.00 1366.42 1426.19 + OLHC U 81.58 81.58 59.77 0.00 81.58 21.81
2013-02-13 00:00:00+00:00 1426.19 1524.69 1426.19 1520.33 + OLHC U 98.50 98.50 94.14 0.00 98.50 4.36
Then, I select only events by percentile value:
col_name = 'HC'
group_name = col_name + '_lev'
level_value = .95
hc_group = df[df[col_name] > df[col_name].quantile(level_value)]
hc_group.loc[:, group_name] = col_name
I get the result:
Date Open High Low Close Sign Struct Trend OH HL LC OL LH HC HC_lev
1998-10-14 00:00:00+00:00 957.28 1066.11 923.32 1005.53 + OLHC U 108.83 142.79 82.21 33.96 142.79 60.58 HC
2009-05-21 00:00:00+00:00 815.55 930.17 814.84 888.33 + OLHC U 114.62 115.33 73.49 0.71 115.33 41.84 HC
2009-07-04 00:00:00+00:00 888.33 956.23 881.46 896.42 + OLHC U 67.90 74.77 14.96 6.87 74.77 59.81 HC
2009-08-17 00:00:00+00:00 896.42 1018.00 869.32 979.73 + OLHC U 121.58 148.68 110.41 27.10 148.68 38.27 HC
2011-03-12 00:00:00+00:00 1299.63 1344.07 1275.10 1304.28 + OLHC U 44.44 68.97 29.18 24.53 68.97 39.79 HC
2011-12-01 00:00:00+00:00 1223.46 1292.66 1158.66 1244.58 + OLHC U 69.20 134.00 85.92 64.80 134.00 48.08 HC
2016-06-29 00:00:00+00:00 2065.04 2120.55 1991.68 2070.77 + OLHC U 55.51 128.87 79.09 73.36 128.87 49.78 HC
This code works fine.
I would like to do the same for each column in the list ['OH', 'HL', 'LC', 'OL', 'LH', 'HC'] and
return a groupby object grouped by these columns.
In other words, I need one big object consisting of, say, oh_group, hl_group, ..., hc_group.
Could you advise me how to deal with this?
Finally, I've found a solution. Maybe it will be useful for someone else.
names = ['OH', 'HL', 'LC', 'OL', 'LH', 'HC']
percentiles = [.75, .90, .95, .98]
total_df = pd.DataFrame()   # collects the filtered rows
for col_name in names:
    for perc in percentiles:
        k = df[df[col_name] > df[col_name].quantile(perc)].copy()
        k.loc[:, 'Level'] = str(perc)
        total_df = pd.concat([total_df, k], sort=False)
    print(col_name + ' events:')
    print('----------')
    print_groups(total_df.groupby('Level'))   # print_groups is my own helper that prints each group
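A slightly more structured variant collects the filtered subsets in a dict keyed by column and percentile and concatenates once at the end. This is a sketch of my own variation (the key and level names are arbitrary), assuming df is the frame from the question:
import pandas as pd

names = ['OH', 'HL', 'LC', 'OL', 'LH', 'HC']
percentiles = [.75, .90, .95, .98]

# One filtered frame per (column, percentile) pair.
groups = {
    (col, str(perc)): df[df[col] > df[col].quantile(perc)]
    for col in names
    for perc in percentiles
}

# One big frame; the dict keys become the first two index levels.
total_df = pd.concat(groups, names=['Col', 'Level'])
print(total_df.groupby(level=[0, 1]).size())   # rows per column/percentile group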
This question already has answers here:
Pandas timeseries plot setting x-axis major and minor ticks and labels
(2 answers)
Closed 5 years ago.
I have this little pandas code:
graph = auswahl[['Volumenstrom_Außen', 'Vpunkt_Gesamt','Zuluft_Druck_10','Abluft_Druck_10']]
a = graph.plot(figsize=[50,10])
a.set(ylabel="m³/h", xlabel="Zeit", title="Volumenströme")#,ylim=[0,100])
a.legend(loc="upper left")
plt.show()
How can I set the x-axis to show a tick for every hour?
The dataframe looks like this:
Volumenstrom_Außen Vpunkt_Gesamt Zuluft_Druck Abluft_Druck
Zeit
2018-02-15 16:49:00 1021.708443 752.699 49.328 46.811
2018-02-15 16:49:15 1021.708443 752.699 49.328 46.811
2018-02-15 16:49:30 1021.708443 752.699 49.328 46.811
2018-02-15 16:49:45 1021.708443 752.699 49.328 46.811
2018-02-15 16:50:00 1021.708443 752.699 49.328 46.811
2018-02-15 16:50:15 1021.708443 752.699 49.328 46.811
2018-02-15 16:50:30 1021.708443 752.699 49.328 46.811
2018-02-15 16:50:45 1021.708443 752.699 49.328 46.811
2018-02-15 16:51:00 1092.171094 752.699 49.328 46.811
2018-02-15 16:51:15 1092.171094 752.699 49.328 46.811
Let's take this example dataframe, whose index is at minute granularity
import pandas as pd
import random
ts_index = pd.date_range('1/1/2000', periods=1000, freq='T')
v1 = [random.random() for i in range(1000)]
v2 = [random.random() for i in range(1000)]
v3 = [random.random() for i in range(1000)]
ts_df = pd.DataFrame({'v1':v1,'v2':v2,'v3':v3},index=ts_index)
ts_df.head()
v1 v2 v3
2000-01-01 00:00:00 0.593039 0.017351 0.742111
2000-01-01 00:01:00 0.563233 0.837362 0.869767
2000-01-01 00:02:00 0.453925 0.962600 0.690868
2000-01-01 00:03:00 0.757895 0.123610 0.622777
2000-01-01 00:04:00 0.759841 0.906674 0.263902
We could use pandas.DataFrame.resample to downsample this data to hourly granularity, as shown below:
hourly_mean_df = ts_df.resample('H').mean() # you can use .sum() also
hourly_mean_df.head()
v1 v2 v3
2000-01-01 00:00:00 0.516001 0.461119 0.467895
2000-01-01 01:00:00 0.530603 0.458208 0.550892
2000-01-01 02:00:00 0.472090 0.522278 0.508345
2000-01-01 03:00:00 0.515713 0.486906 0.541538
2000-01-01 04:00:00 0.514543 0.478097 0.489217
Now you can plot this hourly summary
hourly_mean_df.plot()
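If you would rather keep the data at full resolution and only place a tick on every hour (which is what the linked duplicate covers), a minimal sketch with matplotlib's date locators looks like this; x_compat=True makes pandas hand real matplotlib date units to the locator:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Plot at full resolution, but put a major tick (and HH:MM label) on every hour.
ax = ts_df.plot(figsize=[50, 10], x_compat=True)
ax.xaxis.set_major_locator(mdates.HourLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
plt.show()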
I have the dataframe below, with 9 points every day.
Date and Time form a MultiIndex.
I want to replace the Time index with other times (00:00:00 to 02:00:00) for every day.
Date Time a b c
2018-01-09 6:00:00 20.31 0 -2.95
2018-01-09 6:15:00 20.76 26738 -2.88
2018-01-09 6:30:00 21.4 22462 -2.77
2018-01-09 6:45:00 21.84 20033 -3
2018-01-09 7:00:00 22.17 20010 -3.28
2018-01-09 7:15:00 22.38 18133 -2.82
2018-01-09 7:30:00 22.75 18254 -3.14
2018-01-09 7:45:00 22.93 17039 -3.22
2018-01-09 8:00:00 23.13 15934 -3.27
2018-01-10 6:00:00 20.31 0 -2.95
2018-01-10 6:15:00 20.76 26738 -2.88
2018-01-10 6:30:00 21.4 22462 -2.77
2018-01-10 6:45:00 21.84 20033 -3
2018-01-10 7:00:00 22.17 20010 -3.28
2018-01-10 7:15:00 22.38 18133 -2.82
2018-01-10 7:30:00 22.75 18254 -3.14
2018-01-10 7:45:00 22.93 17039 -3.22
2018-01-10 8:00:00 23.13 15934 -3.27
so the result should be as below:
Date Time a b c
2018-01-09 0:00:00 20.31 0 -2.95
2018-01-09 0:15:00 20.76 26738 -2.88
2018-01-09 0:30:00 21.4 22462 -2.77
2018-01-09 0:45:00 21.84 20033 -3
2018-01-09 1:00:00 22.17 20010 -3.28
2018-01-09 1:15:00 22.38 18133 -2.82
2018-01-09 1:30:00 22.75 18254 -3.14
2018-01-09 1:45:00 22.93 17039 -3.22
2018-01-09 2:00:00 23.13 15934 -3.27
2018-01-10 0:00:00 20.31 0 -2.95
2018-01-10 0:15:00 20.76 26738 -2.88
2018-01-10 0:30:00 21.4 22462 -2.77
2018-01-10 0:45:00 21.84 20033 -3
2018-01-10 1:00:00 22.17 20010 -3.28
2018-01-10 1:15:00 22.38 18133 -2.82
2018-01-10 1:30:00 22.75 18254 -3.14
2018-01-10 1:45:00 22.93 17039 -3.22
2018-01-10 2:00:00 23.13 15934 -3.27
How can I do it?
If you want to replace all values with times at 15-minute intervals per day, you can first create a dictionary for the mapping:
d = dict(enumerate(pd.date_range(start='2018-01-01', end='2018-01-02', freq='15T').strftime('%H:%M:%S')))
print (d)
{0: '00:00:00', 1: '00:15:00', 2: '00:30:00', 3: '00:45:00', 4: '01:00:00', 5: '01:15:00', 6: '01:30:00', 7: '01:45:00', 8: '02:00:00', 9: '02:15:00', 10: '02:30:00', 11: '02:45:00', 12: '03:00:00', 13: '03:15:00', 14: '03:30:00', 15: '03:45:00', 16: '04:00:00', 17: '04:15:00', 18: '04:30:00', 19: '04:45:00', 20: '05:00:00', 21: '05:15:00', 22: '05:30:00', 23: '05:45:00', 24: '06:00:00', 25: '06:15:00', 26: '06:30:00', 27: '06:45:00', 28: '07:00:00', 29: '07:15:00', 30: '07:30:00', 31: '07:45:00', 32: '08:00:00', 33: '08:15:00', 34: '08:30:00', 35: '08:45:00', 36: '09:00:00', 37: '09:15:00', 38: '09:30:00', 39: '09:45:00', 40: '10:00:00', 41: '10:15:00', 42: '10:30:00', 43: '10:45:00', 44: '11:00:00', 45: '11:15:00', 46: '11:30:00', 47: '11:45:00', 48: '12:00:00', 49: '12:15:00', 50: '12:30:00', 51: '12:45:00', 52: '13:00:00', 53: '13:15:00', 54: '13:30:00', 55: '13:45:00', 56: '14:00:00', 57: '14:15:00', 58: '14:30:00', 59: '14:45:00', 60: '15:00:00', 61: '15:15:00', 62: '15:30:00', 63: '15:45:00', 64: '16:00:00', 65: '16:15:00', 66: '16:30:00', 67: '16:45:00', 68: '17:00:00', 69: '17:15:00', 70: '17:30:00', 71: '17:45:00', 72: '18:00:00', 73: '18:15:00', 74: '18:30:00', 75: '18:45:00', 76: '19:00:00', 77: '19:15:00', 78: '19:30:00', 79: '19:45:00', 80: '20:00:00', 81: '20:15:00', 82: '20:30:00', 83: '20:45:00', 84: '21:00:00', 85: '21:15:00', 86: '21:30:00', 87: '21:45:00', 88: '22:00:00', 89: '22:15:00', 90: '22:30:00', 91: '22:45:00', 92: '23:00:00', 93: '23:15:00', 94: '23:30:00', 95: '23:45:00', 96: '00:00:00'}
Then use cumcount to number the rows within each day and map them:
s = df.groupby(level=0).cumcount().map(d)
print (s)
Date Time
2018-01-09 6:00:00 00:00:00
6:15:00 00:15:00
6:30:00 00:30:00
6:45:00 00:45:00
7:00:00 01:00:00
7:15:00 01:15:00
7:30:00 01:30:00
7:45:00 01:45:00
8:00:00 02:00:00
2018-01-10 6:00:00 00:00:00
6:15:00 00:15:00
6:30:00 00:30:00
6:45:00 00:45:00
7:00:00 01:00:00
7:15:00 01:15:00
7:30:00 01:30:00
7:45:00 01:45:00
8:00:00 02:00:00
Last, reassign the new index with set_index, using get_level_values for the first-level (Date) index:
df = df.set_index([df.index.get_level_values(0), s])
print (df)
a b c
Date
2018-01-09 00:00:00 20.31 0 -2.95
00:15:00 20.76 26738 -2.88
00:30:00 21.40 22462 -2.77
00:45:00 21.84 20033 -3.00
01:00:00 22.17 20010 -3.28
01:15:00 22.38 18133 -2.82
01:30:00 22.75 18254 -3.14
01:45:00 22.93 17039 -3.22
02:00:00 23.13 15934 -3.27
2018-01-10 00:00:00 20.31 0 -2.95
00:15:00 20.76 26738 -2.88
00:30:00 21.40 22462 -2.77
00:45:00 21.84 20033 -3.00
01:00:00 22.17 20010 -3.28
01:15:00 22.38 18133 -2.82
01:30:00 22.75 18254 -3.14
01:45:00 22.93 17039 -3.22
02:00:00 23.13 15934 -3.27
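An alternative sketch that skips the dictionary and builds the replacement times arithmetically from the position within each day (my own variation on the same idea, not part of the original answer):
# Position of each row within its day (0..8), converted to 15-minute offsets.
pos = df.groupby(level=0).cumcount()
new_time = (pd.Timestamp('2018-01-01')
            + pd.to_timedelta(pos * 15, unit='min')).dt.strftime('%H:%M:%S')

df = df.set_index([df.index.get_level_values(0),
                   pd.Index(new_time, name='Time')])
print(df)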
My Dataframe looks like this:
In [325]: TYVOL.tail()
Out[325]:
Close
Date
2017-11-24 0.027705
2017-11-27 0.029335
2017-11-28 0.029335
2017-11-29 0.029498
2017-11-30 0.031454
I tried this:
TYVOL['pb'] = [my_probability_gamma(TYVOL.Close[date], shape0, shape1, shape2, scale0,
                                    scale1, scale2, pbv0, pbv1, pbv2, date) for date in TYVOL.index]
which throws a KeyError: Timestamp
Anything obvious I am doing wrong? Thanks for your help in advance.
I think you need loc:
TYVOL['pb'] = [my_probability_gamma(TYVOL.loc[date, 'Close'], shape0, shape1, shape2, scale0,
                                    scale1, scale2, pbv0, pbv1, pbv2, date) for date in TYVOL.index]
Or use apply and, instead of date, use x.name:
f = lambda x: my_probability_gamma(x['Close'], shape0, shape1, shape2,
                                   scale0, scale1, scale2, pbv0, pbv1, pbv2, x.name)
TYVOL['pb'] = TYVOL.apply(f, axis=1)
Or use iteritems:
TYVOL['pb'] = [my_probability_gamma(c, shape0, shape1, shape2, scale0,
                                    scale1, scale2, pbv0, pbv1, pbv2, d) for d, c in TYVOL['Close'].iteritems()]
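Note that Series.iteritems was removed in pandas 2.0; on newer versions the same loop works with items:
# Same approach for pandas >= 2.0, where Series.iteritems no longer exists.
TYVOL['pb'] = [my_probability_gamma(c, shape0, shape1, shape2, scale0,
                                    scale1, scale2, pbv0, pbv1, pbv2, d)
               for d, c in TYVOL['Close'].items()]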
Test:
def my_probability_gamma(x, y, z):
    return (x, y, z)
shape0 = 1
TYVOL['pb'] = [my_probability_gamma(TYVOL.loc[date, 'Close'],shape0,date) for date in TYVOL.index]
print (TYVOL)
Close pb
Date
2017-11-24 0.027705 (0.027705, 1, 2017-11-24 00:00:00)
2017-11-27 0.029335 (0.029335, 1, 2017-11-27 00:00:00)
2017-11-28 0.029335 (0.029335, 1, 2017-11-28 00:00:00)
2017-11-29 0.029498 (0.029498, 1, 2017-11-29 00:00:00)
2017-11-30 0.031454 (0.031454, 1, 2017-11-30 00:00:00)
shape0 = 1
f = lambda x: my_probability_gamma(x['Close'],shape0,x.name)
TYVOL['pb'] = TYVOL.apply(f, axis=1)
print (TYVOL)
Close pb
Date
2017-11-24 0.027705 (0.027705, 1, 2017-11-24T00:00:00.000000000)
2017-11-27 0.029335 (0.029335, 1, 2017-11-27T00:00:00.000000000)
2017-11-28 0.029335 (0.029335, 1, 2017-11-28T00:00:00.000000000)
2017-11-29 0.029498 (0.029498, 1, 2017-11-29T00:00:00.000000000)
2017-11-30 0.031454 (0.031454, 1, 2017-11-30T00:00:00.000000000)
TYVOL['pb'] = [my_probability_gamma(c,shape0,d) for d, c in TYVOL['Close'].iteritems()]
print (TYVOL)
Close pb
Date
2017-11-24 0.027705 (0.027705, 1, 2017-11-24 00:00:00)
2017-11-27 0.029335 (0.029335000000000003, 1, 2017-11-27 00:00:00)
2017-11-28 0.029335 (0.029335000000000003, 1, 2017-11-28 00:00:00)
2017-11-29 0.029498 (0.029498000000000003, 1, 2017-11-29 00:00:00)
2017-11-30 0.031454 (0.031454, 1, 2017-11-30 00:00:00)