How to set a flag in groupby with a condition in pandas - pandas

I have a following dataframe
code date time product tank stock out_value
123 2019-06-20 07:00 MS 1 370 350
123 2019-06-20 07:30 HS 3 340 350
123 2019-06-20 07:00 MS 2 340 350
123 2019-06-20 07:30 HS 4 340 350
123 2019-06-20 08:00 MS 1 470 350
123 2019-06-20 08:30 HS 3 450 350
123 2019-06-20 08:00 MS 2 470 350
123 2019-06-20 08:30 HS 4 490 350
123 2019-06-20 09:30 HS 4 0 350
234 2019-06-20 09:30 HS 1 200 350
I want to find out which stock values are less than out_value in above dataframe excluding 0 value.
e.g. at 07:30 for ro code 123 on date 2019-06-20 for product HS there are two tanks 3 and 4, so if stocks for both the tanks are below out_value then flag is set to 1.
My desired dataframe would be
code date time product tank stock out_value flag
123 2019-06-20 07:00 MS 1 370 350 0
123 2019-06-20 07:30 HS 3 340 350 1
123 2019-06-20 07:00 MS 2 340 350 0
123 2019-06-20 07:30 HS 4 340 350 1
123 2019-06-20 08:00 MS 1 470 350 0
123 2019-06-20 08:30 HS 3 450 350 0
123 2019-06-20 08:00 MS 2 470 350 0
123 2019-06-20 08:30 HS 4 490 350 0
123 2019-06-20 09:30 HS 4 0 350 0
234 2019-06-20 09:30 HS 1 200 350 1
How can I do it in pandas?

If need check difference with non 0 values and then check all True values per groups with GroupBy.transform and GroupBy.all:
df['flag'] = ((df['stock']<df['out_value']) & (df['stock'] !=0))
df['flag'] = df.groupby(['code','date','time','product'])['flag'].transform('all').astype(int)
print (df)
code date time product tank stock out_value flag
0 123 2019-06-20 07:00 MS 1 370 350 0
1 123 2019-06-20 07:30 HS 3 340 350 1
2 123 2019-06-20 07:00 MS 2 340 350 0
3 123 2019-06-20 07:30 HS 4 340 350 1
4 123 2019-06-20 08:00 MS 1 470 350 0
5 123 2019-06-20 08:30 HS 3 450 350 0
6 123 2019-06-20 08:00 MS 2 470 350 0
7 123 2019-06-20 08:30 HS 4 490 350 0
8 123 2019-06-20 09:30 HS 4 0 350 0
9 234 2019-06-20 09:30 HS 1 200 350 1
Or if need test only difference, test per groups and last chain with mask for test non 0 values:
df['flag'] = df['stock']<df['out_value']
mask = df.groupby(['code','date','time','product'])['flag'].transform('all')
df['flag'] = (mask & (df['stock'] !=0)).astype(int)

This should do it:
df['flag'] = (df.assign(flag=(df.stock<df.out_value)&(df.stock>0))
.groupby(['code', 'date', 'time', 'product'], as_index=False)['flag']
.transform(all)
.astype(int))
df
code date time product tank stock out_value flag
0 123 2019-06-20 07:00 MS 1 370 350 0
1 123 2019-06-20 07:30 HS 3 340 350 1
2 123 2019-06-20 07:00 MS 2 340 350 0
3 123 2019-06-20 07:30 HS 4 340 350 1
4 123 2019-06-20 08:00 MS 1 470 350 0
5 123 2019-06-20 08:30 HS 3 450 350 0
6 123 2019-06-20 08:00 MS 2 470 350 0
7 123 2019-06-20 08:30 HS 4 490 350 0
8 123 2019-06-20 09:30 HS 4 0 350 0
9 234 2019-06-20 09:30 HS 1 200 350 1

you could do, it gives(I guess) the right result for the dataframe you provided, but I'm not sure if that's what you want.
df['flag'] = ((df['stock']<df['out_value']) & (df['stock'] !=0)).astype(int)

To me, it is quite unclear what you're asking. If you want to flag as 1, all the rows which has stock below to out_value, except if they are 0, you can do...
df['flag'] = 0
df.loc[(df['stock'] < df['out_value']) & (df['stock'] != 0), 'flag'] = 1

Related

Calculate Moving Average on Previous Calculated Moving Average (Snowflake)

I have a dataset that looks something like this. I wish to calculate a modified moving average (column Mod_MA) for sales column based on the following logic :
If there is no event, then ST Else Average last 4 dates.
Date
Item
Event
ST
Mod_MA
2022-10-01
ABC
100
100
2022-10-02
ABC
110
110
2022-10-03
ABC
120
120
2022-10-04
ABC
130
130
2022-10-05
ABC
EV1
140
115
2022-10-06
ABC
EV1
150
119
2022-10-07
ABC
160
160
2022-10-08
ABC
170
170
2022-10-09
ABC
180
180
2022-10-10
ABC
EV2
190
157
2022-10-11
ABC
EV2
200
167
2022-10-12
ABC
EV2
210
168
2022-10-01
XYZ
100
100
2022-10-02
XYZ
110
110
2022-10-03
XYZ
120
120
2022-10-04
XYZ
130
130
2022-10-05
XYZ
EV3
140
115
2022-10-06
XYZ
EV3
150
119
2022-10-07
XYZ
EV3
160
121
2022-10-08
XYZ
170
170
2022-10-09
XYZ
180
180
2022-10-10
XYZ
EV4
190
147
2022-10-11
XYZ
EV4
200
155
2022-10-12
XYZ
210
210
Hopefully the image helps clarify what I am going for.
I have tried LAG & AVG OVER ORDER BY but since I dont have an exact number of iterations I need to run, these dont work.
Calculation Formulae
Would appreciate any help.

Find Maximum Value in Column Pandas

I have a data frame like this- Machine Vibration data.
datetime
tagid
value
quality
0
2021-03-01 13:43:41.440
B42
345
192
1
2021-03-01 13:43:41.440
B43
958
192
2
2021-03-01 13:43:41.440
B44
993
192
3
2021-03-01 13:43:41.440
B45
1224
192
4
2021-03-01 13:43:43.527
B188
6665
192
5
2021-03-01 13:43:43.527
B189
7162
192
6
2021-03-01 13:43:43.527
B190
7193
192
7
2021-03-01 13:43:43.747
C29
2975
192
8
2021-03-01 13:43:43.747
C30
4445
192
9
2021-03-01 13:43:43.747
C31
4015
192
I want to convert this to hourly maximum value for each tag id.
Sample Output
datetime
tagid
value
quality
01-03-2021 13:00
C91
3982
192
01-03-2021 14:00
C91
3972
192
01-03-2021 13:00
C92
9000
192
01-03-2021 14:00
C92
9972
192
01-03-2021 13:00
B42
396
192
01-03-2021 14:00
B42
370
192
01-03-2021 15:00
B42
370
192
I tried with grouper, but couldn't get output.
Use Grouper with aggregate max:
df = df.groupby([pd.Grouper(freq='H', key='datetime'), 'tagid']).max().reset_index()

Pandas multiply 2 series of different dimensions to give dataframe

I have a series;
Red 33
Blue 44
Green 22
And also this series;
0 100
1 100
2 100
3 200
4 200
5 200
I want to multiply these in a way to give the following dataframe
Red Blue Green
0 330 440 220
1 330 440 220
2 330 440 220
3 660 880 440
4 660 880 440
5 660 880 440
Can anyone see a simply / tidy way this could be done?
IIUC assuming s is the name of the first series and s1 is the name of the second series, try:
m=s.to_frame().T
pd.DataFrame(m.values*s1.values[:,None],columns=m.columns)
Red Blue Green
0 3300 4400 2200
1 3300 4400 2200
2 3300 4400 2200
3 6600 8800 4400
4 6600 8800 4400
5 6600 8800 4400

how to add incremental number to specific column in pandas

I have following dataframe in pandas
code tank length dia diff
123 3 625 210 -0.38
123 5 635 210 1.2
I want to add 1 only in length for 5 times if the diff is positive and subtract 1 if the dip is negative. My desired dataframe looks like
code tank length diameter
123 3 625 210
123 3 624 210
123 3 623 210
123 3 622 210
123 3 621 210
123 3 620 210
123 5 635 210
123 5 636 210
123 5 637 210
123 5 638 210
123 5 639 210
123 5 640 210
I am doing following in pandas.
df.add(1)
But, its adding 1 to all the columns.
Use Index.repeat 6 times, then add counter values by GroupBy.cumcount and last create default RangeIndex by DataFrame.set_index:
df1 = df.loc[df.index.repeat(6)].copy()
df1['length'] += df1.groupby(level=0).cumcount()
df1 = df1.reset_index(drop=True)
Or:
df1 = (df.loc[df.index.repeat(6)]
.assign(length = lambda x: x.groupby(level=0).cumcount() + x['length'])
.reset_index(drop=True))
print (df1)
code tank length dia
0 123 3 625 210
1 123 3 626 210
2 123 3 627 210
3 123 3 628 210
4 123 3 629 210
5 123 3 630 210
6 123 5 635 210
7 123 5 636 210
8 123 5 637 210
9 123 5 638 210
10 123 5 639 210
11 123 5 640 210
EDIT:
df1 = df.loc[df.index.repeat(6)].copy()
add = df1.groupby(level=0).cumcount()
mask = df1['diff'] < 0
df1['length'] = np.where(mask, df1['length'] - add, df1['length'] + add)
df1 = df1.reset_index(drop=True)
print (df1)
code tank length dia diff
0 123 3 625 210 -0.38
1 123 3 624 210 -0.38
2 123 3 623 210 -0.38
3 123 3 622 210 -0.38
4 123 3 621 210 -0.38
5 123 3 620 210 -0.38
6 123 5 635 210 1.20
7 123 5 636 210 1.20
8 123 5 637 210 1.20
9 123 5 638 210 1.20
10 123 5 639 210 1.20
11 123 5 640 210 1.20
We can use pd.concat, np.cumsum and groupby + .add.
If you want to substract, simply multiply addition * -1 so for example: (np.cumsum(np.ones(n))-1) * -1
n = 6
new = pd.concat([df]*n).sort_values(['code', 'length']).reset_index(drop=True)
addition = np.cumsum(np.ones(n))-1
new['length'] = new.groupby(['code', 'tank'])['length'].apply(lambda x: x.add(addition))
Output
code tank length dia
0 123 3 625.0 210
1 123 3 626.0 210
2 123 3 627.0 210
3 123 3 628.0 210
4 123 3 629.0 210
5 123 3 630.0 210
6 123 5 635.0 210
7 123 5 636.0 210
8 123 5 637.0 210
9 123 5 638.0 210
10 123 5 639.0 210
11 123 5 640.0 210

How to add status to the table

I have the following table where is clipping from my db. I have 2 types of contracts.
I: client pays for first 6mth 60$, next 6mth 120$ (111 client)
II: client pays for first 6mth 60$ but if want still pays 60$ the contract will be extended at 6mth, whole contract is 18mth. (321 client who still pays)
ID_Client | Amount | Amount_charge | Lenght | Date_from | Date_to | Reverse
--------------------------------------------------------------------------------
111 60 60 12 2015-01-01 2015-01-31 12
111 60 60 12 2015-02-01 2015-02-28 11
111 60 60 12 2015-03-01 2015-03-31 10
111 60 60 12 2015-04-01 2015-04-30 9
111 60 60 12 2015-05-01 2015-05-31 8
111 60 60 12 2015-06-01 2015-06-30 7
111 120 60 12 2015-07-01 2015-07-31 6
111 120 60 12 2015-08-01 2015-08-31 5
111 120 60 12 2015-09-01 2015-09-30 4
111 120 60 12 2015-10-01 2015-10-31 3
111 120 60 12 2015-11-01 2015-11-30 2
111 120 60 12 2015-12-01 2015-12-31 1
111 120 60 12 2016-01-01 2015-01-31 0
111 120 60 12 2016-02-01 2015-02-29 0
321 60 60 12 2015-01-01 2015-01-31 12
321 60 60 12 2015-02-01 2015-02-28 11
321 60 60 12 2015-03-01 2015-03-31 10
321 60 60 12 2015-04-01 2015-04-30 9
321 60 60 12 2015-05-01 2015-05-31 8
321 60 60 12 2015-06-01 2015-06-30 7
321 60 60 12 2015-07-01 2015-07-31 6
321 60 60 12 2015-08-01 2015-08-31 5
321 60 60 12 2015-09-01 2015-09-30 4
321 60 60 12 2015-10-01 2015-10-31 3
321 60 60 12 2015-11-01 2015-11-30 2
321 60 60 12 2015-12-01 2015-12-31 1
321 60 60 12 2016-01-01 2016-01-30 0
321 60 60 12 2016-02-01 2016-02-31 0
321 60 60 12 2016-03-01 2016-03-30 0
321 60 60 12 2016-04-01 2016-04-31 0
I need to add status column.
A - normal period of agreement
D - where the agreement is doubled after 6mth but after 12mth is E(nd of agreemnt)
E - where contract is finished
L - where contract after 6mth was extended, after 18mth the status will be type E
For 321 Client after 12mth the lenght of contract was updated from 12 to 18
I have a lot of clients so i think better will be using loop to go by all clients?
ID_Client | Amount | Amount_charge | Lenght | Date_from | Date_to | Reverse | Status
-----------------------------------------------------------------------------------------
111 60 60 12 2015-01-01 2015-01-31 12 A
111 60 60 12 2015-02-01 2015-02-28 11 A
111 60 60 12 2015-03-01 2015-03-31 10 A
111 60 60 12 2015-04-01 2015-04-30 9 A
111 60 60 12 2015-05-01 2015-05-31 8 A
111 60 60 12 2015-06-01 2015-06-30 7 A
111 120 60 12 2015-07-01 2015-07-31 6 D
111 120 60 12 2015-08-01 2015-08-31 5 D
111 120 60 12 2015-09-01 2015-09-30 4 D
111 120 60 12 2015-10-01 2015-10-31 3 D
111 120 60 12 2015-11-01 2015-11-30 2 D
111 120 60 12 2015-12-01 2015-12-31 1 D
111 120 60 12 2016-01-01 2015-01-31 0 E
111 120 60 12 2016-02-01 2015-02-29 0 E
321 60 60 12 2015-01-01 2015-01-31 12 A
321 60 60 12 2015-02-01 2015-02-28 11 A
321 60 60 12 2015-03-01 2015-03-31 10 A
321 60 60 12 2015-04-01 2015-04-30 9 A
321 60 60 12 2015-05-01 2015-05-31 8 A
321 60 60 12 2015-06-01 2015-06-30 7 A
321 60 60 12 2015-07-01 2015-07-31 6 L
321 60 60 12 2015-08-01 2015-08-31 5 L
321 60 60 12 2015-09-01 2015-09-30 4 L
321 60 60 12 2015-10-01 2015-10-31 3 L
321 60 60 12 2015-11-01 2015-11-30 2 L
321 60 60 12 2015-12-01 2015-12-31 1 L
321 60 60 18 2016-01-01 2016-01-30 0 L
321 60 60 18 2016-02-01 2016-02-31 0 L
321 60 60 18 2016-03-01 2016-03-30 0 L
321 60 60 18 2016-04-01 2016-04-31 0 L
If the Reverse column is what I think:
update table1 a
set "Status"=
CASE
WHEN A."Reverse" > 6 THEN
'A'
WHEN A."Reverse" > 0 THEN
DECODE (A."Amount", A."Amount_charge", 'L', 'D')
ELSE
CASE
WHEN A."Amount" <> A."Amount_charge" THEN
'E'
ELSE
CASE WHEN ADD_MONTHS ( (SELECT b."Date_from" FROM table1 b WHERE a."ID_Client" = b."ID_Client" AND b."Reverse" = 1),6) > a."Date_from" THEN 'L'
ELSE
'E'
END
END
END
Better is to calculate the sums. The amount per month come from first payment. Something like this:
DECLARE
CURSOR c2
IS
SELECT ID_CLIENT, --AMOUNT, AMOUNT_CHARGE, LENGTH, DATE_FROM, DATE_TO, REVERSE, STATUS,
FIRST_VALUE (amount_charge) OVER (PARTITION BY id_client ORDER BY date_from) first_amount_charge,
SUM (amount) OVER (PARTITION BY id_client ORDER BY date_from) sum_amount,
SUM (amount_charge) OVER (PARTITION BY id_client ORDER BY date_from) sum_amount_charge
FROM TABLE2
FOR UPDATE NOWAIT;
BEGIN
FOR c1 IN c2
LOOP
UPDATE table2
SET status = CASE WHEN c1.sum_amount <= 6 * c1.first_amount_charge THEN 'A'
WHEN c1.sum_amount > 18 * c1.first_amount_charge THEN 'E'
WHEN c1.sum_amount > c1.sum_amount_charge THEN 'D'
ELSE 'L'
END
WHERE CURRENT OF c2;
END LOOP;
END;