Find Maximum Value in Column Pandas - pandas

I have a data frame like this- Machine Vibration data.
datetime
tagid
value
quality
0
2021-03-01 13:43:41.440
B42
345
192
1
2021-03-01 13:43:41.440
B43
958
192
2
2021-03-01 13:43:41.440
B44
993
192
3
2021-03-01 13:43:41.440
B45
1224
192
4
2021-03-01 13:43:43.527
B188
6665
192
5
2021-03-01 13:43:43.527
B189
7162
192
6
2021-03-01 13:43:43.527
B190
7193
192
7
2021-03-01 13:43:43.747
C29
2975
192
8
2021-03-01 13:43:43.747
C30
4445
192
9
2021-03-01 13:43:43.747
C31
4015
192
I want to convert this to hourly maximum value for each tag id.
Sample Output
datetime
tagid
value
quality
01-03-2021 13:00
C91
3982
192
01-03-2021 14:00
C91
3972
192
01-03-2021 13:00
C92
9000
192
01-03-2021 14:00
C92
9972
192
01-03-2021 13:00
B42
396
192
01-03-2021 14:00
B42
370
192
01-03-2021 15:00
B42
370
192
I tried with grouper, but couldn't get output.

Use Grouper with aggregate max:
df = df.groupby([pd.Grouper(freq='H', key='datetime'), 'tagid']).max().reset_index()

Related

Calculate Moving Average on Previous Calculated Moving Average (Snowflake)

I have a dataset that looks something like this. I wish to calculate a modified moving average (column Mod_MA) for sales column based on the following logic :
If there is no event, then ST Else Average last 4 dates.
Date
Item
Event
ST
Mod_MA
2022-10-01
ABC
100
100
2022-10-02
ABC
110
110
2022-10-03
ABC
120
120
2022-10-04
ABC
130
130
2022-10-05
ABC
EV1
140
115
2022-10-06
ABC
EV1
150
119
2022-10-07
ABC
160
160
2022-10-08
ABC
170
170
2022-10-09
ABC
180
180
2022-10-10
ABC
EV2
190
157
2022-10-11
ABC
EV2
200
167
2022-10-12
ABC
EV2
210
168
2022-10-01
XYZ
100
100
2022-10-02
XYZ
110
110
2022-10-03
XYZ
120
120
2022-10-04
XYZ
130
130
2022-10-05
XYZ
EV3
140
115
2022-10-06
XYZ
EV3
150
119
2022-10-07
XYZ
EV3
160
121
2022-10-08
XYZ
170
170
2022-10-09
XYZ
180
180
2022-10-10
XYZ
EV4
190
147
2022-10-11
XYZ
EV4
200
155
2022-10-12
XYZ
210
210
Hopefully the image helps clarify what I am going for.
I have tried LAG & AVG OVER ORDER BY but since I dont have an exact number of iterations I need to run, these dont work.
Calculation Formulae
Would appreciate any help.

add Rank Column to pandas Dataframe based on column condition

What I have is below.
DOG
Date
Steps
Tiger
2021-11-01
164
Oakley
2021-11-01
76
Piper
2021-11-01
65
Millie
2021-11-01
188
Oscar
2021-11-02
152
Foster
2021-11-02
191
Zeus
2021-11-02
101
Benji
2021-11-02
94
Lucy
2021-11-02
186
Rufus
2021-11-02
65
Hank
2021-11-03
98
Olive
2021-11-03
122
Ellie
2021-11-03
153
Thor
2021-11-03
152
Nala
2021-11-03
181
Mia
2021-11-03
48
Bella
2021-11-03
23
Izzy
2021-11-03
135
Pepper
2021-11-03
22
Diesel
2021-11-04
111
Dixie
2021-11-04
34
Emma
2021-11-04
56
Abbie
2021-11-04
32
Guinness
2021-11-04
166
Kobe
2021-11-04
71
What I want is below. Rank by value of ['Steps'] column for each Date
DOG
Date
Steps
Rank
Tiger
2021-11-01
164
2
Oakley
2021-11-01
76
3
Piper
2021-11-01
65
4
Millie
2021-11-01
188
1
Oscar
2021-11-02
152
3
Foster
2021-11-02
191
1
Zeus
2021-11-02
101
4
Benji
2021-11-02
94
5
Lucy
2021-11-02
186
2
Rufus
2021-11-02
65
6
Hank
2021-11-03
98
6
Olive
2021-11-03
122
5
Ellie
2021-11-03
153
2
Thor
2021-11-03
152
3
Nala
2021-11-03
181
1
Mia
2021-11-03
48
7
Bella
2021-11-03
23
8
Izzy
2021-11-03
135
4
Pepper
2021-11-03
22
9
Diesel
2021-11-04
111
2
Dixie
2021-11-04
34
5
Emma
2021-11-04
56
4
Abbie
2021-11-04
32
6
Guinness
2021-11-04
166
1
Kobe
2021-11-04
71
3
I tried below, but it failed.
df['Rank'] = df.groupby('Date')['Steps'].rank(ascending=False)
First your solution for me working.
Maybe need method='dense' and casting to integers.
df['Rank'] = df.groupby('Date')['Steps'].rank(ascending=False, method='dense').astype(int)

Pandas Group/Merge Dataframe by Non-Periodic Series

How do I group one DataFrame by another possibly-non-periodic Series? Mock-up below:
This is the DataFrame to be split:
i = pd.date_range(end="today", periods=20, freq="d").normalize()
v = np.random.randint(0,100,size=len(i))
d = pd.DataFrame({"value": v}, index=i)
>>> d
value
2021-02-06 48
2021-02-07 1
2021-02-08 86
2021-02-09 82
2021-02-10 40
2021-02-11 22
2021-02-12 63
2021-02-13 37
2021-02-14 41
2021-02-15 57
2021-02-16 30
2021-02-17 69
2021-02-18 63
2021-02-19 27
2021-02-20 23
2021-02-21 46
2021-02-22 66
2021-02-23 10
2021-02-24 91
2021-02-25 43
This is the splitting criteria, grouping by the Series dates. A group consists of any ordered dataframe value v such that {v} intersects [s,s+1) - but as with resampling it would be nice to control the inclusion parameters.
s = pd.date_range(start="2019-10-14", freq="2W", periods=52).to_series()
s = s.drop(np.random.choice(s.index, 10, replace=False))
s = s.reset_index(drop=True)
>>> s[25:29]
25 2021-01-24
26 2021-02-07
27 2021-02-21
28 2021-03-07
dtype: datetime64[ns]
And this is the example output... or something like it. Index is taken from the series rather than the dataframe.
>>> ???.sum()
value
...
2021-01-24 47
2021-02-07 768
2021-02-21 334
...
Internally the groups would have this structure:
...
2021-01-10
sum: 0
2021-01-24
2021-02-06 47
sum: 47
2021-02-07
2021-02-07 52
2021-02-08 56
2021-02-09 21
2021-02-10 39
2021-02-11 86
2021-02-12 30
2021-02-13 20
2021-02-14 76
2021-02-15 91
2021-02-16 70
2021-02-17 34
2021-02-18 73
2021-02-19 41
2021-02-20 79
sum: 768
2021-02-21
2021-02-21 90
2021-02-22 75
2021-02-23 12
2021-02-24 70
2021-02-25 87
sum: 334
2021-03-07
sum: 0
...
Looks like you can do:
bucket = pd.cut(d.index, bins=s, label=s[:-1], right=False)
d.groupby(bucket).sum()

Update fields in table based on previous field records

I have a table of records for pay periods with fields WeekStart and WeekEnd populated for ever fiscal year. In my application a user should be able to update the first WeekEnd date for a given fiscal year and that should in turn update subsequent records WeekStart date by adding 1 day to the Previous WeekEnd date and for the same records's WeekEnd date, add 13 days to the new WeekStart date.
This is part of a Stored Procedure written in SQL Server 2016.
UPDATE [dbo].[staffing_BiweeklyPPCopy]
SET
[WeekStart] = DATEADD(DD, 1, LAG([WeekEnd], 1) OVER (ORDER BY [ID])),
[WeekEnd] = DATEADD(DD, 14, LAG([WeekEnd], 1) OVER (ORDER BY [ID]))
WHERE
[FiscalYear] = #fiscalyear
Original Table contents shown below...
ID WeekStart WeekEnd
163 2018-10-01 2018-10-13
164 2018-10-14 2018-10-27
165 2018-10-28 2018-11-10
166 2018-11-11 2018-11-24
167 2018-11-25 2018-12-08
168 2018-12-09 2018-12-22
169 2018-12-23 2019-01-05
170 2019-01-06 2019-01-19
171 2019-01-20 2019-02-02
172 2019-02-03 2019-02-16
173 2019-02-17 2019-03-02
174 2019-03-03 2019-03-16
175 2019-03-17 2019-03-30
176 2019-03-31 2019-04-13
177 2019-04-14 2019-04-27
178 2019-04-28 2019-05-11
179 2019-05-12 2019-05-25
180 2019-05-26 2019-06-08
181 2019-06-09 2019-06-22
182 2019-06-23 2019-07-06
183 2019-07-07 2019-07-20
184 2019-07-21 2019-08-03
185 2019-08-04 2019-08-17
186 2019-08-18 2019-08-31
187 2019-09-01 2019-09-14
188 2019-09-15 2019-09-28
189 2019-09-29 2019-09-30
For example if a user updates the weekend date for record ID 163 to '2018-10-14', the table will update as follows..
ID WeekStart WeekEnd
163 2018-10-01 2018-10-14
164 2018-10-15 2018-10-28
165 2018-10-29 2018-11-11
166 2018-11-12 2018-11-25
167 2018-11-26 2018-12-09
.
.
.
189 2019-09-30 2019-09-30
Thank you in advance.

Capping values after a trigger level in a different variable _after GroupBy

There was an elegant answer to a question almost like this provided by EdChum. The difference between that question and this is that now the capping needs to be applied to data that had had "GroupBy" performed.
Original Data:
Symbol DTE Spot Strike Vol
AAPL 30.00 100.00 80.00 14.58
AAPL 30.00 100.00 85.00 16.20
AAPL 30.00 100.00 90.00 18.00
AAPL 30.00 100.00 95.00 20.00
AAPL 30.00 100.00 100.00 22.00
AAPL 30.00 100.00 105.00 25.30
AAPL 30.00 100.00 110.00 29.10
AAPL 30.00 100.00 115.00 33.46
AAPL 30.00 100.00 120.00 38.48
AAPL 50.00 102.00 80.00 13.08
AAPL 50.00 102.00 85.00 14.70
AAPL 50.00 102.00 90.00 16.50
AAPL 50.00 102.00 95.00 18.50
AAPL 50.00 102.00 100.00 20.50
AAPL 50.00 102.00 105.00 23.80
AAPL 50.00 102.00 110.00 27.60
AAPL 50.00 102.00 115.00 31.96
AAPL 50.00 102.00 120.00 36.98
IBM 30.00 170.00 150.00 7.29
IBM 30.00 170.00 155.00 8.10
IBM 30.00 170.00 160.00 9.00
IBM 30.00 170.00 165.00 10.00
IBM 30.00 170.00 170.00 11.00
IBM 30.00 170.00 175.00 12.65
IBM 30.00 170.00 180.00 14.55
IBM 30.00 170.00 185.00 16.73
IBM 30.00 170.00 190.00 19.24
IBM 60.00 171.00 150.00 5.79
IBM 60.00 171.00 155.00 6.60
IBM 60.00 171.00 160.00 7.50
IBM 60.00 171.00 165.00 8.50
IBM 60.00 171.00 170.00 9.50
IBM 60.00 171.00 175.00 11.15
IBM 60.00 171.00 180.00 13.05
IBM 60.00 171.00 185.00 15.23
IBM 60.00 171.00 190.00 17.74
I then create a few new variables:
df['ATM_dist'] =abs(df['Spot']-df['Strike'])
imin = df.groupby(['DTE','Symbol'])['ATM_dist'].transform('idxmin')
df['NormStrike']=np.log(df['Strike']/df['Spot'])/(((df['DTE']/365)**.5)*df['ATMvol']/100)
df['ATMvol'] = df.loc[imin,'Vol'].values
The results are below:
Symbol DTE Spot Strike Vol ATM_dist ATMvol NormStrike
0 AAPL 30 100 80 14.58 20 22.0 -3.537916
1 AAPL 30 100 85 16.20 15 22.0 -2.576719
2 AAPL 30 100 90 18.00 10 22.0 -1.670479
3 AAPL 30 100 95 20.00 5 22.0 -0.813249
4 AAPL 30 100 100 22.00 0 22.0 0.000000
5 AAPL 30 100 105 25.30 5 22.0 0.773562
6 AAPL 30 100 110 29.10 10 22.0 1.511132
7 AAPL 30 100 115 33.46 15 22.0 2.215910
8 AAPL 30 100 120 38.48 20 22.0 2.890688
9 AAPL 50 102 80 13.08 22 20.5 -3.201973
10 AAPL 50 102 85 14.70 17 20.5 -2.402955
11 AAPL 50 102 90 16.50 12 20.5 -1.649620
12 AAPL 50 102 95 18.50 7 20.5 -0.937027
13 AAPL 50 102 100 20.50 2 20.5 -0.260994
14 AAPL 50 102 105 23.80 3 20.5 0.382049
15 AAPL 50 102 110 27.60 8 20.5 0.995172
16 AAPL 50 102 115 31.96 13 20.5 1.581035
17 AAPL 50 102 120 36.98 18 20.5 2.141961
18 IBM 30 170 150 7.29 20 11.0 -3.968895
19 IBM 30 170 155 8.10 15 11.0 -2.929137
20 IBM 30 170 160 9.00 10 11.0 -1.922393
21 IBM 30 170 165 10.00 5 11.0 -0.946631
22 IBM 30 170 170 11.00 0 11.0 0.000000
23 IBM 30 170 175 12.65 5 11.0 0.919188
24 IBM 30 170 180 14.55 10 11.0 1.812480
25 IBM 30 170 185 16.73 15 11.0 2.681295
26 IBM 30 170 190 19.24 20 11.0 3.526940
27 IBM 60 171 150 5.79 21 9.5 -3.401827
28 IBM 60 171 155 6.60 16 9.5 -2.550520
29 IBM 60 171 160 7.50 11 9.5 -1.726243
30 IBM 60 171 165 8.50 6 9.5 -0.927332
31 IBM 60 171 170 9.50 1 9.5 -0.152273
32 IBM 60 171 175 11.15 4 9.5 0.600317
33 IBM 60 171 180 13.05 9 9.5 1.331704
34 IBM 60 171 185 15.23 14 9.5 2.043051
35 IBM 60 171 190 17.74 19 9.5 2.735427
I wish to have the values of 'Vol' cap to the level where another column 'NormStrike' hits a trigger (in this case abs(NormStrike) >= 2 ). This new column, 'Desired_Level', created while leaving the 'Vol' column unchanged. The first cap should cause the Vol value at index location 0 to be 16.2 because the cap was triggered at index location 1 when NormStrike hit -2.576719.
Added clarification:
I am looking for a generic solution, that works away from the lowest abs(NormStrike) level in both directions to hit both the -2 and the +2 trigger. If it is not hit (which it might not be) then desired level is just original_level
An additional note, it will always be true that the abs(NormStrike) continues to grow in size from the min(abs(NormStrike)) level as it is a function of abs(distance from spot to strike)
the code that EdChum provided (prior to me bringing GroupBy into the mix) is below:
clip = 4
lower = df.loc[df['NS'] <= -clip, 'Vol'].idxmax()
upper = df.loc[df['NS'] >= clip, 'Vol'].idxmin()
df['Original_level'] = df['Original_level'].clip(df.loc[lower,'Original_level'], df.loc[upper, 'Original_level'])
There are 2 issues, first, it did not work after groupby and second, if a particular group of data does not have a NS value that exceeds the "clip" value then it generates an error. The ideal outcome would be, in this case, nothing is done to the Vol level for the particular Symbol/DTE group in question.
Ed suggested implementing a reset_index() but I am not sure how to use that to solve the issue.
I hope this was not to convoluted of a question
thank you for any assistance
You can try this to see whether it works out. I assume if the clip has been triggered, then NaN will be put. You can replace it by your customized choice.
import pandas as pd
import numpy as np
# use np.where(criterion, x, y) to do a vectorized statement like if criterion is True, then set it to x, else set it to y
def func(group):
group['Triggered'] = np.where((group['NormStrike'] >= 2) | (group['NormStrike'] <= -4), 'Yes', 'No')
group['Desired_Level'] = np.where((group['NormStrike'] >= 2) | (group['NormStrike'] <= -4), np.nan, group['Vol'])
group = group.fillna(method='ffill').fillna(method='bfill')
return group
df = df.groupby(['Symbol', 'DTE']).apply(func)
Out[410]:
Symbol DTE Spot Strike Vol ATM_dist ATMvol NormStrike Triggered Desired_Level
0 AAPL 30 100 80 14.58 20 22 -3.5379 No 14.58
1 AAPL 30 100 85 16.20 15 22 -2.5767 No 16.20
2 AAPL 30 100 90 18.00 10 22 -1.6705 No 18.00
3 AAPL 30 100 95 20.00 5 22 -0.8132 No 20.00
4 AAPL 30 100 100 22.00 0 22 0.0000 No 22.00
5 AAPL 30 100 105 25.30 5 22 0.7736 No 25.30
6 AAPL 30 100 110 29.10 10 22 1.5111 No 29.10
7 AAPL 30 100 115 33.46 15 22 2.2159 Yes 29.10
8 AAPL 30 100 120 38.48 20 22 2.8907 Yes 29.10
9 AAPL 50 102 80 14.58 22 22 -3.5379 No 14.58
10 AAPL 50 102 85 16.20 17 22 -2.5767 No 16.20
11 AAPL 50 102 90 18.00 12 22 -1.6705 No 18.00
12 AAPL 50 102 95 20.00 7 22 -0.8132 No 20.00
13 AAPL 50 102 100 22.00 2 22 0.0000 No 22.00
14 AAPL 50 102 105 25.30 3 22 0.7736 No 25.30
15 AAPL 50 102 110 29.10 8 22 1.5111 No 29.10
16 AAPL 50 102 115 33.46 13 22 2.2159 Yes 29.10
17 AAPL 50 102 120 38.48 18 22 2.8907 Yes 29.10
18 AAPL 30 170 150 14.58 20 22 -3.5379 No 14.58
19 AAPL 30 170 155 16.20 15 22 -2.5767 No 16.20
20 AAPL 30 170 160 18.00 10 22 -1.6705 No 18.00
21 AAPL 30 170 165 20.00 5 22 -0.8132 No 20.00
22 AAPL 30 170 170 22.00 0 22 0.0000 No 22.00
23 AAPL 30 170 175 25.30 5 22 0.7736 No 25.30
24 AAPL 30 170 180 29.10 10 22 1.5111 No 29.10
25 AAPL 30 170 185 33.46 15 22 2.2159 Yes 29.10
26 AAPL 30 170 190 38.48 20 22 2.8907 Yes 29.10
27 AAPL 60 171 150 14.58 21 22 -3.5379 No 14.58
28 AAPL 60 171 155 16.20 16 22 -2.5767 No 16.20
29 AAPL 60 171 160 18.00 11 22 -1.6705 No 18.00
30 AAPL 60 171 165 20.00 6 22 -0.8132 No 20.00
31 AAPL 60 171 170 22.00 1 22 0.0000 No 22.00
32 AAPL 60 171 175 25.30 4 22 0.7736 No 25.30
33 AAPL 60 171 180 29.10 9 22 1.5111 No 29.10
34 AAPL 60 171 185 33.46 14 22 2.2159 Yes 29.10
35 AAPL 60 171 190 38.48 19 22 2.8907 Yes 29.10