Pandas DatetimeIndex - create a new column which has the max value of the previous month

I have something like the following dataframe (notice dt is the index)
fx fy
dt
2019-05-29 0.000000 0.000000
2019-05-30 65.410004 156.449997
2019-05-31 70.279999 125.040001
2019-06-01 49.220001 147.979996
2019-06-02 100.580002 232.539993
2019-06-03 262.230011 468.809998
2019-06-04 383.779999 525.390015
2019-06-05 761.609985 1147.380005
2019-06-06 1060.750000 1727.380005
2019-06-07 1640.300049 2827.120117
What I want to achieve is to have a new column named fz where each day's value is the previous month's max value of fy - so the result would be
fx fy fz
dt
2019-05-29 0.000000 0.000000 NaN
2019-05-30 65.410004 156.449997 NaN
2019-05-31 70.279999 125.040001 NaN
2019-06-01 49.220001 147.979996 156.449997
2019-06-02 100.580002 232.539993 156.449997
2019-06-03 262.230011 468.809998 156.449997
2019-06-04 383.779999 525.390015 156.449997
2019-06-05 761.609985 1147.380005 156.449997
2019-06-06 1060.750000 1727.380005 156.449997
2019-06-07 1640.300049 2827.120117 156.449997
The first month's fz is empty because there is no previous month. I tried combining pd.Grouper(freq='M') with .transform() and .shift(-1, freq='M'), but failed miserably, as it changed the index entirely, and I would like to keep the index as is.
How can I solve this for arbitrary N months back?

Use DatetimeIndex.to_period to get month periods, then shift the per-month maxima and map them back to the original index with Index.map:
#note: the sample DatetimeIndex was changed so that rows span several months
print (df)
fx fy
dt
2019-05-29 0.000000 0.000000
2019-05-30 65.410004 156.449997
2019-05-31 70.279999 125.040001
2019-06-01 49.220001 147.979996
2019-06-02 100.580002 232.539993
2019-07-03 262.230011 468.809998
2019-07-04 383.779999 525.390015
2019-08-05 761.609985 1147.380005
2019-08-06 1060.750000 1727.380005
2019-09-07 1640.300049 2827.120117
N = 2
s = df.index.to_period('m')
df['fz'] = s.map(df.groupby(s)['fy'].max().shift(N))
print (df)
fx fy fz
dt
2019-05-29 0.000000 0.000000 NaN
2019-05-30 65.410004 156.449997 NaN
2019-05-31 70.279999 125.040001 NaN
2019-06-01 49.220001 147.979996 NaN
2019-06-02 100.580002 232.539993 NaN
2019-07-03 262.230011 468.809998 156.449997
2019-07-04 383.779999 525.390015 156.449997
2019-08-05 761.609985 1147.380005 232.539993
2019-08-06 1060.750000 1727.380005 232.539993
2019-09-07 1640.300049 2827.120117 525.390015
If the datetimes are not consecutive (some months are missing), shifting by position is wrong; instead add N to the PeriodIndex via rename:
print (df)
fx fy
dt
2019-05-29 0.000000 0.000000
2019-05-30 65.410004 156.449997
2019-05-31 70.279999 125.040001
2019-06-01 49.220001 147.979996
2019-06-02 100.580002 232.539993
2019-08-03 262.230011 468.809998
2019-08-04 383.779999 525.390015
2019-09-05 761.609985 1147.380005
2019-09-06 1060.750000 1727.380005
2019-09-07 1640.300049 2827.120117
N = 1
s = df.index.to_period('m')
df['fz'] = s.map(df.groupby(s)['fy'].max().rename(lambda x: x + N))
print (df)
fx fy fz
dt
2019-05-29 0.000000 0.000000 NaN
2019-05-30 65.410004 156.449997 NaN
2019-05-31 70.279999 125.040001 NaN
2019-06-01 49.220001 147.979996 156.449997
2019-06-02 100.580002 232.539993 156.449997
2019-08-03 262.230011 468.809998 NaN
2019-08-04 383.779999 525.390015 NaN
2019-09-05 761.609985 1147.380005 525.390015
2019-09-06 1060.750000 1727.380005 525.390015
2019-09-07 1640.300049 2827.120117 525.390015
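Both variants can be wrapped in a small helper. The following is a sketch (the function name and the hard-coded fz column are my own choices, not from the answer), using the rename-based mapping since it also handles missing months:

```python
import pandas as pd

def add_prev_month_max(df, col, n=1):
    """Add column 'fz': the max of `col` from n calendar months earlier."""
    s = df.index.to_period('M')
    # Per-month maxima, re-labelled n months forward so that month m
    # looks up the maximum of month m - n.
    monthly_max = df.groupby(s)[col].max().rename(lambda p: p + n)
    return df.assign(fz=s.map(monthly_max))
```

Months with no data n months back (e.g. August when July is absent) come out as NaN rather than silently picking up the wrong month.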

You can do it in two steps:
Create a table of per-month maxima, shifted by one monthly period:
maximum_shift = df.resample('M')['fy'].max().shift().to_period('M')
Concatenate/merge it onto the original data frame:
pd.DataFrame(pd.concat([df.to_period('M'), maximum_shift], axis=1).values,
             index=df.index, columns=df.columns.tolist() + ['fz'])

Related

pandas (multi) index wrong need to change it

I have a DataFrame multiData that looks like this:
print(multiData)
Date Open High Low Close Adj Close Volume
Ticker Date
AAPL 0 2010-01-04 7.62 7.66 7.59 7.64 6.51 493729600
1 2010-01-05 7.66 7.70 7.62 7.66 6.52 601904800
2 2010-01-06 7.66 7.69 7.53 7.53 6.41 552160000
3 2010-01-07 7.56 7.57 7.47 7.52 6.40 477131200
4 2010-01-08 7.51 7.57 7.47 7.57 6.44 447610800
... ... ... ... ... ... ... ...
META 2668 2022-12-23 116.03 118.18 115.54 118.04 118.04 17796600
2669 2022-12-27 117.93 118.60 116.05 116.88 116.88 21392300
2670 2022-12-28 116.25 118.15 115.51 115.62 115.62 19612500
2671 2022-12-29 116.40 121.03 115.77 120.26 120.26 22366200
2672 2022-12-30 118.16 120.42 117.74 120.34 120.34 19492100
I need to get rid of the "0, 1, 2, ..." index level and make the actual "Date" column part of the MultiIndex.
How do I do this?
Use df.droplevel to delete level 1, and chain df.set_index to add column Date to the index by setting the append parameter to True.
df = df.droplevel(1).set_index('Date', append=True)
df
Open High Low Close Adj Close Volume
Ticker Date
AAPL 2010-01-04 7.62 7.66 7.59 7.64 6.51 493729600
2010-01-05 7.66 7.70 7.62 7.66 6.52 601904800
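A minimal reproduction of that fix, with a toy two-row frame shaped like the question:

```python
import pandas as pd

# (Ticker, 0..n) MultiIndex with Date as an ordinary column,
# as in the question.
df = pd.DataFrame(
    {'Date': ['2010-01-04', '2010-01-05'], 'Close': [7.64, 7.66]},
    index=pd.MultiIndex.from_tuples([('AAPL', 0), ('AAPL', 1)],
                                    names=['Ticker', None]),
)
# Drop the 0..n level, then append Date to the remaining index.
out = df.droplevel(1).set_index('Date', append=True)
```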

resample dataset with one irregular datetime

I have a dataframe like the following. I want to check the values every 15 minutes, but I see that there is a record at 09:05:51. How can I resample the dataframe to 15 minutes?
hour_min value
06:30:00 0.0
06:45:00 0.0
07:00:00 0.0
07:15:00 0.0
07:30:00 102.754717
07:45:00 130.599057
08:00:00 154.117925
08:15:00 189.061321
08:30:00 214.924528
08:45:00 221.382075
09:00:00 190.839623
09:05:51 428.0
09:15:00 170.973995
09:30:00 0.0
09:45:00 0.0
10:00:00 174.448113
10:15:00 174.900943
10:30:00 182.976415
10:45:00 195.783019
11:00:00 200.337292
11:14:00 80.0
11:15:00 206.280952
11:30:00 218.87886
11:45:00 238.251781
12:00:00 115.5
12:15:00 85.5
12:30:00 130.0
12:45:00 141.0
13:00:00 267.353774
13:15:00 257.061321
13:21:00 8.0
13:27:19 80.0
13:30:00 258.761905
13:45:00 254.703088
13:53:52 278.0
14:00:00 254.790476
14:15:00 247.165094
14:30:00 250.061321
14:45:00 264.014151
15:00:00 132.0
15:15:00 108.0
15:30:00 158.5
15:45:00 457.0
16:00:00 273.745283
16:15:00 273.962264
16:30:00 279.089623
16:45:00 280.264151
17:00:00 296.061321
17:15:00 296.481132
17:30:00 282.957547
17:45:00 279.816038
I have tried this line, but I get a TypeError:
res = s.resample('15T').sum()
I also tried converting the index to datetime, but that does not work either.
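No answer is recorded here, but a likely cause of the TypeError is that hour_min holds plain time-of-day strings, which resample() cannot bin on. A sketch under that assumption: parse the index as timedeltas first, since resample() accepts a TimedeltaIndex (the values below are a subset of the table above):

```python
import pandas as pd

s = pd.Series(
    [0.0, 0.0, 102.754717, 190.839623, 428.0],
    index=['06:30:00', '06:45:00', '07:30:00', '09:00:00', '09:05:51'],
    name='value',
)
# Time-of-day strings parse cleanly as timedeltas, giving
# resample() something datetime-like to bin on; 09:05:51 then
# falls into the 09:00:00 bucket.
s.index = pd.to_timedelta(s.index)
res = s.resample('15min').sum()
```

Whether sum() is the right aggregation (rather than mean() or last()) depends on what the values represent.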

Merge old and new table and fill values by date

I have df1:
Date Symbol Time Quantity Price
2020-09-04 AAPL 09:54:48 11.0 115.97
2020-09-16 AAPL 09:30:02 -11.0 115.33
2020-02-24 AMBA 09:30:02 22.0 64.24
2020-02-25 AMBA 14:01:28 -22.0 62.64
2020-07-14 AMGN 09:30:01 5.0 243.90
... ... ... ... ...
2020-12-08 YUMC 09:30:00 -22.0 56.89
2020-11-18 Z 14:20:01 12.0 100.68
2020-11-20 Z 09:30:01 -12.0 109.25
2020-09-04 ZS 09:45:24 9.0 135.94
2020-09-14 ZS 09:38:23 -9.0 126.41
and df2:
Date USD
2 2020-02-01 22.702
3 2020-03-01 22.753
4 2020-06-01 22.601
5 2020-07-01 22.626
6 2020-08-01 22.739
.. ... ...
248 2020-12-23 21.681
249 2020-12-28 21.482
250 2020-12-29 21.462
251 2020-12-30 21.372
252 2020-12-31 21.387
I want to add a new column "USD" from df2 by date in df1.
Trying
new_df = (dane5.reset_index()
.merge(kurz2,how='outer')
.fillna(0)
.set_index('Date'))
new_df.sort_index(inplace=True)
new_df= new_df[new_df['Symbol'] != 0]
print(new_df.head(50))
But some rows come back with a zero USD value:
Symbol Time Quantity Price USD
Date
2020-01-02 GL 10:31:14 13.0 104.550000 0.000
2020-01-02 ATEC 13:35:04 211.0 6.860000 0.000
2020-01-03 IOVA 14:02:32 56.0 25.790000 0.000
2020-01-03 TGNA 09:30:00 90.0 16.080000 0.000
2020-01-03 SCS 09:30:01 -70.0 20.100000 0.000
2020-01-03 SKX 09:30:09 34.0 41.940000 0.000
2020-01-06 IOVA 09:45:19 -56.0 24.490000 24.163
2020-01-06 GL 09:30:02 -13.0 103.430000 24.163
2020-01-06 SKX 15:55:15 -34.0 43.900000 24.163
2020-01-07 TGNA 15:55:16 -90.0 16.945000 23.810
2020-01-07 MRTX 09:46:18 -13.0 101.290000 23.810
2020-01-07 MRTX 09:34:10 13.0 109.430000 23.810
2020-01-08 ITCI 09:30:01 49.0 27.640000 0.000
Could someone help me, please? Sorry for my bad English.
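No answer is attached here, but the zero rows follow from the merge itself: trade dates that are missing from df2 (weekends, holidays) get NaN from the outer merge, and fillna(0) turns that into 0. A sketch of one way around it (toy values; the 2019-12-31 rate is invented for illustration): use pd.merge_asof to take the most recent rate on or before each trade date instead.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Date': pd.to_datetime(['2020-01-02', '2020-01-06']),
    'Symbol': ['GL', 'IOVA'],
    'Price': [104.55, 24.49],
})
df2 = pd.DataFrame({
    'Date': pd.to_datetime(['2019-12-31', '2020-01-06']),
    'USD': [24.920, 24.163],
})
# merge_asof needs both frames sorted on the key; the default
# direction 'backward' picks the last rate at or before each date,
# so 2020-01-02 falls back to the 2019-12-31 rate instead of 0.
out = pd.merge_asof(df1.sort_values('Date'), df2.sort_values('Date'), on='Date')
```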

Pandas groupby and subtract each element of a column by element in nth row

I have a dataframe df as given below:
country_code count_date confirmed_cases
0 AFG 2020-09-13 38641.0
1 AFG 2020-09-12 38606.0
2 AFG 2020-09-11 38572.0
3 AFG 2020-09-10 38544.0
4 AFG 2020-09-09 38520.0
... ... ... ...
19521 ZWE 2020-06-03 206.0
19522 ZWE 2020-06-02 203.0
19523 ZWE 2020-06-01 178.0
19524 ZWE 2020-05-31 174.0
19525 ZWE 2020-05-30 149.0
After grouping by country_code, how do I create a new column that has each date's confirmed_cases minus the confirmed_cases from n days earlier?
I tried
n = 7
df.groupby('country_code').confirmed_cases.transform(lambda x:x-x.iloc[::n])
which doesn't work.
You can use shift:
n = 7
out = df['confirmed_cases'] - df.groupby('country_code').confirmed_cases.shift(n)
Update, the same idea via apply:
df.groupby('country_code').confirmed_cases.apply(lambda x:x-x.shift(n))
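A quick check of the shift-based answer with made-up numbers (this assumes rows are already sorted by date within each country; n=2 here for brevity):

```python
import pandas as pd

df = pd.DataFrame({
    'country_code': ['AFG'] * 4 + ['ZWE'] * 3,
    'confirmed_cases': [10.0, 20.0, 35.0, 50.0, 5.0, 8.0, 12.0],
})
n = 2
# Difference against the value n rows earlier within each country;
# the first n rows of each group have no earlier value, so NaN.
df['diff_n'] = df['confirmed_cases'] - df.groupby('country_code')['confirmed_cases'].shift(n)
```

Note that shift() restarts at each group boundary, so ZWE never subtracts an AFG value.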

how to fill missing datetime rows with pandas

index value
2017-01-25 01:00:00:00 1
2017-01-25 02:00:00:00 5
2017-01-25 03:00:00:00 7
2017-01-25 07:00:00:00 34
2017-01-25 20:00:00:00 45
2017-01-25 24:00:00:00 45
2017-01-26 1:00:00:00 31
This dataframe is a 24-hour record of each day, but some records are missing. How can I insert the missing rows in the right place and fill the corresponding values with NaN?
The complication here is the 24H hour in the datetimes, which is not valid, so it is necessary to replace it with 23H and then add one hour. Last, use DataFrame.asfreq to add the missing rows of the hourly DatetimeIndex:
mask = df.index.str.contains(' 24:')
idx = df.index.where(~mask, df.index.str.replace(' 24:', ' 23:'))
idx = pd.to_datetime(idx, format='%Y-%m-%d %H:%M:%S:%f')
df.index = idx.where(~mask, idx + pd.Timedelta(1, unit='H'))
df = df.asfreq('H')
print (df)
value
index
2017-01-25 01:00:00 1.0
2017-01-25 02:00:00 5.0
2017-01-25 03:00:00 7.0
2017-01-25 04:00:00 NaN
2017-01-25 05:00:00 NaN
2017-01-25 06:00:00 NaN
2017-01-25 07:00:00 34.0
2017-01-25 08:00:00 NaN
2017-01-25 09:00:00 NaN
2017-01-25 10:00:00 NaN
2017-01-25 11:00:00 NaN
2017-01-25 12:00:00 NaN
2017-01-25 13:00:00 NaN
2017-01-25 14:00:00 NaN
2017-01-25 15:00:00 NaN
2017-01-25 16:00:00 NaN
2017-01-25 17:00:00 NaN
2017-01-25 18:00:00 NaN
2017-01-25 19:00:00 NaN
2017-01-25 20:00:00 45.0
2017-01-25 21:00:00 NaN
2017-01-25 22:00:00 NaN
2017-01-25 23:00:00 NaN
2017-01-26 00:00:00 45.0
2017-01-26 01:00:00 31.0