Calculate Moving Average on Previous Calculated Moving Average (Snowflake) - sql

I have a dataset that looks something like this. I wish to calculate a modified moving average (column Mod_MA) for the sales column (ST) based on the following logic:
If there is no event, then Mod_MA = ST; else Mod_MA = the average of the Mod_MA values for the previous 4 dates.
Date        Item  Event  ST   Mod_MA
2022-10-01  ABC          100  100
2022-10-02  ABC          110  110
2022-10-03  ABC          120  120
2022-10-04  ABC          130  130
2022-10-05  ABC   EV1    140  115
2022-10-06  ABC   EV1    150  119
2022-10-07  ABC          160  160
2022-10-08  ABC          170  170
2022-10-09  ABC          180  180
2022-10-10  ABC   EV2    190  157
2022-10-11  ABC   EV2    200  167
2022-10-12  ABC   EV2    210  168
2022-10-01  XYZ          100  100
2022-10-02  XYZ          110  110
2022-10-03  XYZ          120  120
2022-10-04  XYZ          130  130
2022-10-05  XYZ   EV3    140  115
2022-10-06  XYZ   EV3    150  119
2022-10-07  XYZ   EV3    160  121
2022-10-08  XYZ          170  170
2022-10-09  XYZ          180  180
2022-10-10  XYZ   EV4    190  147
2022-10-11  XYZ   EV4    200  155
2022-10-12  XYZ          210  210
Hopefully the image helps clarify what I am going for.
I have tried LAG and AVG OVER (ORDER BY ...), but since I don't have an exact number of iterations I need to run, these don't work.
[Image: Calculation Formulae]
Would appreciate any help.
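Because each Mod_MA value can depend on earlier Mod_MA values, the calculation is recursive rather than a plain window function. The logic can be sketched procedurally in Python (pandas) to check the expected numbers. This is not a Snowflake solution; the function name mod_ma, the window parameter, and the fallback to ST when no earlier values exist are assumptions made for illustration.

```python
import pandas as pd

def mod_ma(group, window=4):
    """ST when there is no event, else the mean of the previous `window` results."""
    results = []
    for st, event in zip(group["ST"], group["Event"]):
        history = results[-window:]          # previously computed Mod_MA values
        if pd.isna(event) or not history:    # assumption: fall back to ST if no history
            results.append(float(st))
        else:
            results.append(sum(history) / len(history))
    return pd.Series(results, index=group.index)

dates = pd.date_range("2022-10-01", periods=12).tolist()
df = pd.DataFrame({
    "Date": dates * 2,
    "Item": ["ABC"] * 12 + ["XYZ"] * 12,
    "Event": [None] * 4 + ["EV1"] * 2 + [None] * 3 + ["EV2"] * 3
           + [None] * 4 + ["EV3"] * 3 + [None] * 2 + ["EV4"] * 2 + [None],
    "ST": list(range(100, 220, 10)) * 2,
})

# Rows must be in date order within each item for the recursion to make sense.
df = df.sort_values(["Item", "Date"])
parts = [mod_ma(g) for _, g in df.groupby("Item")]
df["Mod_MA"] = pd.concat(parts)

print(df.assign(Mod_MA=df["Mod_MA"].round().astype(int)))
```

Rounded to whole numbers, this reproduces the Mod_MA column in the table above for both items.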

Related

Find Maximum Value in Column Pandas

I have a data frame like this (machine vibration data):
   datetime                 tagid  value  quality
0  2021-03-01 13:43:41.440  B42    345    192
1  2021-03-01 13:43:41.440  B43    958    192
2  2021-03-01 13:43:41.440  B44    993    192
3  2021-03-01 13:43:41.440  B45    1224   192
4  2021-03-01 13:43:43.527  B188   6665   192
5  2021-03-01 13:43:43.527  B189   7162   192
6  2021-03-01 13:43:43.527  B190   7193   192
7  2021-03-01 13:43:43.747  C29    2975   192
8  2021-03-01 13:43:43.747  C30    4445   192
9  2021-03-01 13:43:43.747  C31    4015   192
I want to convert this to hourly maximum value for each tag id.
Sample Output
datetime          tagid  value  quality
01-03-2021 13:00  C91    3982   192
01-03-2021 14:00  C91    3972   192
01-03-2021 13:00  C92    9000   192
01-03-2021 14:00  C92    9972   192
01-03-2021 13:00  B42    396    192
01-03-2021 14:00  B42    370    192
01-03-2021 15:00  B42    370    192
I tried with grouper, but couldn't get output.
Use Grouper with aggregate max:
df = df.groupby([pd.Grouper(freq='H', key='datetime'), 'tagid']).max().reset_index()
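A minimal self-contained sketch of the same Grouper idea follows. The first two rows are taken from the question; the last two rows are made-up values added only so that more than one hour appears. It passes pd.offsets.Hour() instead of a string alias, since the spelling of the hourly alias has changed across pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2021-03-01 13:43:41.440",
        "2021-03-01 13:43:41.440",
        "2021-03-01 13:59:00.000",   # hypothetical row
        "2021-03-01 14:10:00.000",   # hypothetical row
    ]),
    "tagid": ["B42", "B43", "B42", "B42"],
    "value": [345, 958, 396, 370],
    "quality": [192, 192, 192, 192],
})

# Bucket rows by hour and tag, then take the column-wise maximum per bucket.
hourly = (
    df.groupby([pd.Grouper(key="datetime", freq=pd.offsets.Hour()), "tagid"])
      .max()
      .reset_index()
)
print(hourly)
```

Note that max() is applied to every remaining column independently, so value and quality in one output row do not necessarily come from the same input row.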

Pandas multiply 2 series of different dimensions to give dataframe

I have a series;
Red 33
Blue 44
Green 22
And also this series;
0 100
1 100
2 100
3 200
4 200
5 200
I want to multiply these in a way to give the following dataframe
Red Blue Green
0 330 440 220
1 330 440 220
2 330 440 220
3 660 880 440
4 660 880 440
5 660 880 440
Can anyone see a simple / tidy way this could be done?
IIUC assuming s is the name of the first series and s1 is the name of the second series, try:
m=s.to_frame().T
pd.DataFrame(m.values*s1.values[:,None],columns=m.columns)
Red Blue Green
0 3300 4400 2200
1 3300 4400 2200
2 3300 4400 2200
3 6600 8800 4400
4 6600 8800 4400
5 6600 8800 4400
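The same broadcasting can also be written as an outer product, which keeps the index of the second series and the labels of the first. This is a sketch using the series from the question, with s and s1 named as in the answer above. Since 33 * 100 is 3300, the result matches the output shown in the answer rather than the 330 written in the question.

```python
import numpy as np
import pandas as pd

s = pd.Series({"Red": 33, "Blue": 44, "Green": 22})
s1 = pd.Series([100, 100, 100, 200, 200, 200])

# np.outer(a, b)[i, j] == a[i] * b[j]: one row per s1 entry, one column per s entry.
result = pd.DataFrame(np.outer(s1, s), index=s1.index, columns=s.index)
print(result)
```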

how to add incremental number to specific column in pandas

I have following dataframe in pandas
code tank length dia diff
123 3 625 210 -0.38
123 5 635 210 1.2
I want to add 1 to length only, 5 times, if diff is positive, and subtract 1 if diff is negative. My desired dataframe looks like
code tank length diameter
123 3 625 210
123 3 624 210
123 3 623 210
123 3 622 210
123 3 621 210
123 3 620 210
123 5 635 210
123 5 636 210
123 5 637 210
123 5 638 210
123 5 639 210
123 5 640 210
I am doing following in pandas.
df.add(1)
But it's adding 1 to all the columns.
Use Index.repeat 6 times, then add counter values by GroupBy.cumcount and last create a default RangeIndex by DataFrame.reset_index:
df1 = df.loc[df.index.repeat(6)].copy()
df1['length'] += df1.groupby(level=0).cumcount()
df1 = df1.reset_index(drop=True)
Or:
df1 = (df.loc[df.index.repeat(6)]
         .assign(length = lambda x: x.groupby(level=0).cumcount() + x['length'])
         .reset_index(drop=True))
print (df1)
code tank length dia
0 123 3 625 210
1 123 3 626 210
2 123 3 627 210
3 123 3 628 210
4 123 3 629 210
5 123 3 630 210
6 123 5 635 210
7 123 5 636 210
8 123 5 637 210
9 123 5 638 210
10 123 5 639 210
11 123 5 640 210
EDIT:
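import numpy as np

df1 = df.loc[df.index.repeat(6)].copy()
# counter 0..5 within each original row
add = df1.groupby(level=0).cumcount()
# subtract the counter where diff is negative, add it otherwise
mask = df1['diff'] < 0
df1['length'] = np.where(mask, df1['length'] - add, df1['length'] + add)
df1 = df1.reset_index(drop=True)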
print (df1)
code tank length dia diff
0 123 3 625 210 -0.38
1 123 3 624 210 -0.38
2 123 3 623 210 -0.38
3 123 3 622 210 -0.38
4 123 3 621 210 -0.38
5 123 3 620 210 -0.38
6 123 5 635 210 1.20
7 123 5 636 210 1.20
8 123 5 637 210 1.20
9 123 5 638 210 1.20
10 123 5 639 210 1.20
11 123 5 640 210 1.20
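For reference, a self-contained variant of the EDIT above, written with np.sign so the direction comes straight from the sign of diff. The frame is rebuilt from the question; the names steps, counter and signed_step are only illustrative, and a diff of exactly 0 would leave length unchanged under this sketch.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "code": [123, 123],
    "tank": [3, 5],
    "length": [625, 635],
    "dia": [210, 210],
    "diff": [-0.38, 1.2],
})

steps = 5  # number of increments; each row appears steps + 1 times in total
out = df.loc[df.index.repeat(steps + 1)].copy()

# 0, 1, ..., steps within each original row
counter = out.groupby(level=0).cumcount().to_numpy()

# +counter for positive diff, -counter for negative diff
signed_step = (np.sign(out["diff"].to_numpy()) * counter).astype(int)
out["length"] = out["length"].to_numpy() + signed_step
out = out.reset_index(drop=True)
print(out)
```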
We can use pd.concat, np.cumsum and groupby + .add.
If you want to subtract, simply multiply addition by -1, for example: (np.cumsum(np.ones(n))-1) * -1
n = 6
new = pd.concat([df]*n).sort_values(['code', 'length']).reset_index(drop=True)
addition = np.cumsum(np.ones(n))-1
new['length'] = new.groupby(['code', 'tank'])['length'].transform(lambda x: x + addition)
Output
code tank length dia
0 123 3 625.0 210
1 123 3 626.0 210
2 123 3 627.0 210
3 123 3 628.0 210
4 123 3 629.0 210
5 123 3 630.0 210
6 123 5 635.0 210
7 123 5 636.0 210
8 123 5 637.0 210
9 123 5 638.0 210
10 123 5 639.0 210
11 123 5 640.0 210

How to set a flag in groupby with a condition in pandas

I have a following dataframe
code date time product tank stock out_value
123 2019-06-20 07:00 MS 1 370 350
123 2019-06-20 07:30 HS 3 340 350
123 2019-06-20 07:00 MS 2 340 350
123 2019-06-20 07:30 HS 4 340 350
123 2019-06-20 08:00 MS 1 470 350
123 2019-06-20 08:30 HS 3 450 350
123 2019-06-20 08:00 MS 2 470 350
123 2019-06-20 08:30 HS 4 490 350
123 2019-06-20 09:30 HS 4 0 350
234 2019-06-20 09:30 HS 1 200 350
I want to find out which stock values are less than out_value in above dataframe excluding 0 value.
e.g. at 07:30 for ro code 123 on date 2019-06-20 for product HS there are two tanks 3 and 4, so if stocks for both the tanks are below out_value then flag is set to 1.
My desired dataframe would be
code date time product tank stock out_value flag
123 2019-06-20 07:00 MS 1 370 350 0
123 2019-06-20 07:30 HS 3 340 350 1
123 2019-06-20 07:00 MS 2 340 350 0
123 2019-06-20 07:30 HS 4 340 350 1
123 2019-06-20 08:00 MS 1 470 350 0
123 2019-06-20 08:30 HS 3 450 350 0
123 2019-06-20 08:00 MS 2 470 350 0
123 2019-06-20 08:30 HS 4 490 350 0
123 2019-06-20 09:30 HS 4 0 350 0
234 2019-06-20 09:30 HS 1 200 350 1
How can I do it in pandas?
If you need to test for stock below out_value while excluding 0 values, and then check that all values per group are True, use GroupBy.transform with GroupBy.all:
df['flag'] = ((df['stock']<df['out_value']) & (df['stock'] !=0))
df['flag'] = df.groupby(['code','date','time','product'])['flag'].transform('all').astype(int)
print (df)
code date time product tank stock out_value flag
0 123 2019-06-20 07:00 MS 1 370 350 0
1 123 2019-06-20 07:30 HS 3 340 350 1
2 123 2019-06-20 07:00 MS 2 340 350 0
3 123 2019-06-20 07:30 HS 4 340 350 1
4 123 2019-06-20 08:00 MS 1 470 350 0
5 123 2019-06-20 08:30 HS 3 450 350 0
6 123 2019-06-20 08:00 MS 2 470 350 0
7 123 2019-06-20 08:30 HS 4 490 350 0
8 123 2019-06-20 09:30 HS 4 0 350 0
9 234 2019-06-20 09:30 HS 1 200 350 1
Or, if you need to test only the difference per group, and then chain a mask for non-0 values at the end:
df['flag'] = df['stock']<df['out_value']
mask = df.groupby(['code','date','time','product'])['flag'].transform('all')
df['flag'] = (mask & (df['stock'] !=0)).astype(int)
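To make the first variant easy to try, here is a self-contained sketch that rebuilds the frame from the question and compares the result with the desired flag column. The intermediate name below is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "code": [123] * 9 + [234],
    "date": ["2019-06-20"] * 10,
    "time": ["07:00", "07:30", "07:00", "07:30", "08:00",
             "08:30", "08:00", "08:30", "09:30", "09:30"],
    "product": ["MS", "HS", "MS", "HS", "MS", "HS", "MS", "HS", "HS", "HS"],
    "tank": [1, 3, 2, 4, 1, 3, 2, 4, 4, 1],
    "stock": [370, 340, 340, 340, 470, 450, 470, 490, 0, 200],
    "out_value": [350] * 10,
})

# Row-level test: stock is below out_value and is not 0.
below = (df["stock"] < df["out_value"]) & (df["stock"] != 0)

# Group-level test: every tank in the same code/date/time/product must pass.
keys = [df["code"], df["date"], df["time"], df["product"]]
df["flag"] = below.groupby(keys).transform("all").astype(int)
print(df)
```

The 07:00 MS group shows why the group-level step matters: tank 2 is below out_value, but tank 1 is not, so both rows get 0.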
This should do it:
df['flag'] = (df.assign(flag=(df.stock<df.out_value)&(df.stock>0))
                .groupby(['code', 'date', 'time', 'product'], as_index=False)['flag']
                .transform('all')
                .astype(int))
df
code date time product tank stock out_value flag
0 123 2019-06-20 07:00 MS 1 370 350 0
1 123 2019-06-20 07:30 HS 3 340 350 1
2 123 2019-06-20 07:00 MS 2 340 350 0
3 123 2019-06-20 07:30 HS 4 340 350 1
4 123 2019-06-20 08:00 MS 1 470 350 0
5 123 2019-06-20 08:30 HS 3 450 350 0
6 123 2019-06-20 08:00 MS 2 470 350 0
7 123 2019-06-20 08:30 HS 4 490 350 0
8 123 2019-06-20 09:30 HS 4 0 350 0
9 234 2019-06-20 09:30 HS 1 200 350 1
You could do the following. It gives (I guess) the right result for the dataframe you provided, but I'm not sure if that's what you want.
df['flag'] = ((df['stock']<df['out_value']) & (df['stock'] !=0)).astype(int)
To me, it is quite unclear what you're asking. If you want to flag as 1 all the rows that have stock below out_value, except where stock is 0, you can do...
df['flag'] = 0
df.loc[(df['stock'] < df['out_value']) & (df['stock'] != 0), 'flag'] = 1

Capping values after a trigger level in a different variable _after GroupBy

There was an elegant answer to a question almost like this provided by EdChum. The difference between that question and this one is that the capping now needs to be applied to data that has been grouped with GroupBy.
Original Data:
Symbol DTE Spot Strike Vol
AAPL 30.00 100.00 80.00 14.58
AAPL 30.00 100.00 85.00 16.20
AAPL 30.00 100.00 90.00 18.00
AAPL 30.00 100.00 95.00 20.00
AAPL 30.00 100.00 100.00 22.00
AAPL 30.00 100.00 105.00 25.30
AAPL 30.00 100.00 110.00 29.10
AAPL 30.00 100.00 115.00 33.46
AAPL 30.00 100.00 120.00 38.48
AAPL 50.00 102.00 80.00 13.08
AAPL 50.00 102.00 85.00 14.70
AAPL 50.00 102.00 90.00 16.50
AAPL 50.00 102.00 95.00 18.50
AAPL 50.00 102.00 100.00 20.50
AAPL 50.00 102.00 105.00 23.80
AAPL 50.00 102.00 110.00 27.60
AAPL 50.00 102.00 115.00 31.96
AAPL 50.00 102.00 120.00 36.98
IBM 30.00 170.00 150.00 7.29
IBM 30.00 170.00 155.00 8.10
IBM 30.00 170.00 160.00 9.00
IBM 30.00 170.00 165.00 10.00
IBM 30.00 170.00 170.00 11.00
IBM 30.00 170.00 175.00 12.65
IBM 30.00 170.00 180.00 14.55
IBM 30.00 170.00 185.00 16.73
IBM 30.00 170.00 190.00 19.24
IBM 60.00 171.00 150.00 5.79
IBM 60.00 171.00 155.00 6.60
IBM 60.00 171.00 160.00 7.50
IBM 60.00 171.00 165.00 8.50
IBM 60.00 171.00 170.00 9.50
IBM 60.00 171.00 175.00 11.15
IBM 60.00 171.00 180.00 13.05
IBM 60.00 171.00 185.00 15.23
IBM 60.00 171.00 190.00 17.74
I then create a few new variables:
df['ATM_dist'] = abs(df['Spot']-df['Strike'])
imin = df.groupby(['DTE','Symbol'])['ATM_dist'].transform('idxmin')
df['ATMvol'] = df.loc[imin,'Vol'].values
df['NormStrike'] = np.log(df['Strike']/df['Spot'])/(((df['DTE']/365)**.5)*df['ATMvol']/100)
The results are below:
Symbol DTE Spot Strike Vol ATM_dist ATMvol NormStrike
0 AAPL 30 100 80 14.58 20 22.0 -3.537916
1 AAPL 30 100 85 16.20 15 22.0 -2.576719
2 AAPL 30 100 90 18.00 10 22.0 -1.670479
3 AAPL 30 100 95 20.00 5 22.0 -0.813249
4 AAPL 30 100 100 22.00 0 22.0 0.000000
5 AAPL 30 100 105 25.30 5 22.0 0.773562
6 AAPL 30 100 110 29.10 10 22.0 1.511132
7 AAPL 30 100 115 33.46 15 22.0 2.215910
8 AAPL 30 100 120 38.48 20 22.0 2.890688
9 AAPL 50 102 80 13.08 22 20.5 -3.201973
10 AAPL 50 102 85 14.70 17 20.5 -2.402955
11 AAPL 50 102 90 16.50 12 20.5 -1.649620
12 AAPL 50 102 95 18.50 7 20.5 -0.937027
13 AAPL 50 102 100 20.50 2 20.5 -0.260994
14 AAPL 50 102 105 23.80 3 20.5 0.382049
15 AAPL 50 102 110 27.60 8 20.5 0.995172
16 AAPL 50 102 115 31.96 13 20.5 1.581035
17 AAPL 50 102 120 36.98 18 20.5 2.141961
18 IBM 30 170 150 7.29 20 11.0 -3.968895
19 IBM 30 170 155 8.10 15 11.0 -2.929137
20 IBM 30 170 160 9.00 10 11.0 -1.922393
21 IBM 30 170 165 10.00 5 11.0 -0.946631
22 IBM 30 170 170 11.00 0 11.0 0.000000
23 IBM 30 170 175 12.65 5 11.0 0.919188
24 IBM 30 170 180 14.55 10 11.0 1.812480
25 IBM 30 170 185 16.73 15 11.0 2.681295
26 IBM 30 170 190 19.24 20 11.0 3.526940
27 IBM 60 171 150 5.79 21 9.5 -3.401827
28 IBM 60 171 155 6.60 16 9.5 -2.550520
29 IBM 60 171 160 7.50 11 9.5 -1.726243
30 IBM 60 171 165 8.50 6 9.5 -0.927332
31 IBM 60 171 170 9.50 1 9.5 -0.152273
32 IBM 60 171 175 11.15 4 9.5 0.600317
33 IBM 60 171 180 13.05 9 9.5 1.331704
34 IBM 60 171 185 15.23 14 9.5 2.043051
35 IBM 60 171 190 17.74 19 9.5 2.735427
I wish to cap the values of 'Vol' at the level where another column, 'NormStrike', hits a trigger (in this case abs(NormStrike) >= 2). The result should go into a new column, 'Desired_Level', leaving the 'Vol' column unchanged. The first cap should cause the Vol value at index location 0 to be 16.2, because the cap was triggered at index location 1 when NormStrike hit -2.576719.
Added clarification:
I am looking for a generic solution, that works away from the lowest abs(NormStrike) level in both directions to hit both the -2 and the +2 trigger. If it is not hit (which it might not be) then desired level is just original_level
An additional note, it will always be true that the abs(NormStrike) continues to grow in size from the min(abs(NormStrike)) level as it is a function of abs(distance from spot to strike)
the code that EdChum provided (prior to me bringing GroupBy into the mix) is below:
clip = 4
lower = df.loc[df['NS'] <= -clip, 'Vol'].idxmax()
upper = df.loc[df['NS'] >= clip, 'Vol'].idxmin()
df['Original_level'] = df['Original_level'].clip(df.loc[lower,'Original_level'], df.loc[upper, 'Original_level'])
There are 2 issues: first, it did not work after groupby, and second, if a particular group of data does not have an NS value that exceeds the "clip" value, it generates an error. The ideal outcome in that case would be that nothing is done to the Vol level for the Symbol/DTE group in question.
Ed suggested implementing a reset_index() but I am not sure how to use that to solve the issue.
I hope this was not too convoluted a question.
Thank you for any assistance.
You can try this to see whether it works. I assume that if the clip has been triggered, NaN is put in first and then filled from the neighbouring rows. You can replace that with your own choice.
import pandas as pd
import numpy as np
# use np.where(criterion, x, y) to do a vectorized statement like if criterion is True, then set it to x, else set it to y
def func(group):
    group['Triggered'] = np.where((group['NormStrike'] >= 2) | (group['NormStrike'] <= -4), 'Yes', 'No')
    group['Desired_Level'] = np.where((group['NormStrike'] >= 2) | (group['NormStrike'] <= -4), np.nan, group['Vol'])
    # fill the NaN placeholders from the nearest rows within the group
    group = group.ffill().bfill()
    return group
df = df.groupby(['Symbol', 'DTE']).apply(func)
Out[410]:
Symbol DTE Spot Strike Vol ATM_dist ATMvol NormStrike Triggered Desired_Level
0 AAPL 30 100 80 14.58 20 22 -3.5379 No 14.58
1 AAPL 30 100 85 16.20 15 22 -2.5767 No 16.20
2 AAPL 30 100 90 18.00 10 22 -1.6705 No 18.00
3 AAPL 30 100 95 20.00 5 22 -0.8132 No 20.00
4 AAPL 30 100 100 22.00 0 22 0.0000 No 22.00
5 AAPL 30 100 105 25.30 5 22 0.7736 No 25.30
6 AAPL 30 100 110 29.10 10 22 1.5111 No 29.10
7 AAPL 30 100 115 33.46 15 22 2.2159 Yes 29.10
8 AAPL 30 100 120 38.48 20 22 2.8907 Yes 29.10
9 AAPL 50 102 80 14.58 22 22 -3.5379 No 14.58
10 AAPL 50 102 85 16.20 17 22 -2.5767 No 16.20
11 AAPL 50 102 90 18.00 12 22 -1.6705 No 18.00
12 AAPL 50 102 95 20.00 7 22 -0.8132 No 20.00
13 AAPL 50 102 100 22.00 2 22 0.0000 No 22.00
14 AAPL 50 102 105 25.30 3 22 0.7736 No 25.30
15 AAPL 50 102 110 29.10 8 22 1.5111 No 29.10
16 AAPL 50 102 115 33.46 13 22 2.2159 Yes 29.10
17 AAPL 50 102 120 38.48 18 22 2.8907 Yes 29.10
18 AAPL 30 170 150 14.58 20 22 -3.5379 No 14.58
19 AAPL 30 170 155 16.20 15 22 -2.5767 No 16.20
20 AAPL 30 170 160 18.00 10 22 -1.6705 No 18.00
21 AAPL 30 170 165 20.00 5 22 -0.8132 No 20.00
22 AAPL 30 170 170 22.00 0 22 0.0000 No 22.00
23 AAPL 30 170 175 25.30 5 22 0.7736 No 25.30
24 AAPL 30 170 180 29.10 10 22 1.5111 No 29.10
25 AAPL 30 170 185 33.46 15 22 2.2159 Yes 29.10
26 AAPL 30 170 190 38.48 20 22 2.8907 Yes 29.10
27 AAPL 60 171 150 14.58 21 22 -3.5379 No 14.58
28 AAPL 60 171 155 16.20 16 22 -2.5767 No 16.20
29 AAPL 60 171 160 18.00 11 22 -1.6705 No 18.00
30 AAPL 60 171 165 20.00 6 22 -0.8132 No 20.00
31 AAPL 60 171 170 22.00 1 22 0.0000 No 22.00
32 AAPL 60 171 175 25.30 4 22 0.7736 No 25.30
33 AAPL 60 171 180 29.10 9 22 1.5111 No 29.10
34 AAPL 60 171 185 33.46 14 22 2.2159 Yes 29.10
35 AAPL 60 171 190 38.48 19 22 2.8907 Yes 29.10
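As an alternative that follows the rule stated in the question more literally (cap at the Vol of the first row that hits abs(NormStrike) >= 2 on each side, and leave a group untouched if the trigger is never hit), here is a hedged sketch. The helper name cap_group and the trigger parameter are hypothetical, the frame is assumed to have a unique index, and the data is the AAPL DTE 30 group plus three rows of the AAPL DTE 50 group from the question, chosen so that the second group never triggers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Symbol": ["AAPL"] * 12,
    "DTE": [30.0] * 9 + [50.0] * 3,
    "Spot": [100.0] * 9 + [102.0] * 3,
    "Strike": [80.0, 85.0, 90.0, 95.0, 100.0, 105.0, 110.0, 115.0, 120.0,
               95.0, 100.0, 105.0],
    "Vol": [14.58, 16.20, 18.00, 20.00, 22.00, 25.30, 29.10, 33.46, 38.48,
            18.50, 20.50, 23.80],
})

# Same derived columns as in the question, with ATMvol computed before it is used.
df["ATM_dist"] = (df["Spot"] - df["Strike"]).abs()
imin = df.groupby(["DTE", "Symbol"])["ATM_dist"].transform("idxmin")
df["ATMvol"] = df.loc[imin, "Vol"].values
df["NormStrike"] = np.log(df["Strike"] / df["Spot"]) / (
    np.sqrt(df["DTE"] / 365) * df["ATMvol"] / 100
)

def cap_group(group, trigger=2.0):
    """Cap Vol on each side at the Vol of the first row past the trigger."""
    level = group["Vol"].copy()
    low = group["NormStrike"] <= -trigger
    if low.any():
        # triggered row nearest the money on the downside
        level[low] = group.loc[group.loc[low, "NormStrike"].idxmax(), "Vol"]
    high = group["NormStrike"] >= trigger
    if high.any():
        # triggered row nearest the money on the upside
        level[high] = group.loc[group.loc[high, "NormStrike"].idxmin(), "Vol"]
    return level

parts = [cap_group(g) for _, g in df.groupby(["Symbol", "DTE"])]
df["Desired_Level"] = pd.concat(parts)
print(df[["Symbol", "DTE", "Strike", "Vol", "NormStrike", "Desired_Level"]])
```

In the DTE 30 group, index 0 is capped at 16.2 (the Vol at index 1, as described in the question) and index 8 at 33.46; the DTE 50 rows keep their original Vol because no NormStrike reaches the trigger.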