Pandas: Find weekly max from timeseries (calendar week, not 7 days)

I want my dataframe to be grouped by calendar week, Monday to Sunday.
timestamp value
# before time
...
# this is a Friday
2021-10-01 13:00:00 2204.0
2021-10-01 13:30:00 3262.0
...
# this is next Monday
2021-10-04 16:00:00 254.0
2021-10-04 16:30:00 990.0
2021-10-04 17:00:00 1044.0
2021-10-04 17:30:00 26.0
...
# time continues
This is the result I'm expecting; I hope it is clear enough.
timestamp value weekly_max
# this is a Friday
2021-10-01 13:00:00 2204.0 3262.0 # assume 3262.0 is the maximum value during 2021-09-27 to 2021-10-03
2021-10-01 13:30:00 3262.0 3262.0
...
# this is next Monday
2021-10-04 16:00:00 254.0 1044.0
2021-10-04 16:30:00 990.0 1044.0
2021-10-04 17:00:00 1044.0 1044.0
2021-10-04 17:30:00 26.0 1044.0
...

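Get the ISO week number (the timestamp column is named datetime in this example):
df['week'] = df.datetime.dt.isocalendar().week
Get the max for each week:
df_weeklymax = df.groupby('week').agg(max=('value', 'max')).reset_index()
Merge the two tables:
df.merge(df_weeklymax, on='week', how='left')
Note that the ISO week number repeats every year, so if the data spans more than one year you should group by the ISO year as well as the week.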
Example output:
   datetime             value  week  max
0  2021-01-01 00:00:00     20    53   69
1  2021-01-01 13:36:00     69    53   69
2  2021-01-02 03:12:00     69    53   69
3  2021-01-02 16:48:00     57    53   69
4  2021-01-03 06:24:00     39    53   69
5  2021-01-03 20:00:00     56    53   69
6  2021-01-04 09:36:00     73     1   92
7  2021-01-04 23:12:00     76     1   92
8  2021-01-05 12:48:00     92     1   92
9  2021-01-06 02:24:00      4     1   92
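As a possible alternative to the week-number merge, here is a minimal sketch that groups on a weekly Period and broadcasts the maximum back with transform. It assumes the column names timestamp and value from the question; the sample rows are just the ones shown there. Because a Period carries the year, same-numbered weeks from different years should not be mixed together.

```python
import pandas as pd

# Sample rows taken from the question (a Friday and the following Monday).
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2021-10-01 13:00", "2021-10-01 13:30",
        "2021-10-04 16:00", "2021-10-04 16:30",
        "2021-10-04 17:00", "2021-10-04 17:30",
    ]),
    "value": [2204.0, 3262.0, 254.0, 990.0, 1044.0, 26.0],
})

# "W-SUN" periods end on Sunday, so each period covers Monday through Sunday.
week = df["timestamp"].dt.to_period("W-SUN")

# transform("max") returns one value per original row: the max of that row's week.
df["weekly_max"] = df.groupby(week)["value"].transform("max")
print(df)
```

This avoids the separate merge step, since transform already returns a result aligned with the original rows.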

Related

Get the same-hour value from 1 and 2 days before, 1 week before, and 1 month before

I have time series data with other fields.
Now I want to create more columns, such as
valueonsamehour1daybefore,valueonsamehour2daybefore,
valueonsamehour3daybefore,valueonsamehour1weekbefore,
valueonsamehour1monthbefore
If no value is present at that hour, the value should be set to zero.
The dataframe can be loaded from here:
import pandas as pd
url = 'https://drive.google.com/file/d/1BXvJqKGLwG4hqWJvh9gPAHqCbCcCKkUT/view?usp=sharing'
path = 'https://drive.google.com/uc?export=download&id='+url.split('/')[-2]
df = pd.read_csv(path,index_col=0,delimiter=",")
The DataFrame looks like the following:
time StartCity District Id stype EndCity Count
2021-09-15 09:00:00 1 104 2713 21 9 2
2021-05-16 11:00:00 1 107 1044 11 6 1
2021-05-16 12:00:00 1 107 1044 11 6 0
2021-05-16 13:00:00 1 107 1044 11 6 0
2021-05-16 14:00:00 1 107 1044 11 6 0
2021-05-16 15:00:00 1 107 1044 11 6 0
2021-05-16 16:00:00 1 107 1044 11 6 0
2021-05-16 17:00:00 1 107 1044 11 6 0
2021-05-16 18:00:00 1 107 1044 11 6 0
2021-05-16 19:00:00 1 107 1044 11 6 0
2021-05-16 20:00:00 1 107 1044 11 6 0
2021-05-16 21:00:00 1 107 1044 11 6 0
2021-05-16 22:00:00 1 107 1044 11 6 0
2021-05-16 23:00:00 1 107 1044 11 6 0
2021-05-17 00:00:00 1 107 1044 11 6 0
2021-05-17 01:00:00 1 107 1044 11 6 0
2021-05-17 02:00:00 1 107 1044 11 6 0
2021-05-17 03:00:00 1 107 1044 11 6 0
2021-05-17 04:00:00 1 107 1044 11 6 0
2021-05-17 05:00:00 1 107 1044 11 6 0
2021-05-17 06:00:00 1 107 1044 11 6 0
2021-05-17 07:00:00 1 107 1044 11 6 0
2021-05-17 08:00:00 1 107 1044 11 6 0
2021-05-17 09:00:00 1 107 1044 11 6 0
2021-05-17 10:00:00 1 107 1044 11 6 0
2021-05-17 11:00:00 1 107 1044 11 6 0
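One way this could be approached, sketched under assumptions: for each row, compute the earlier timestamp to look up, then left-merge against the original rows and fill the misses with zero. The tiny frame below is hypothetical, the keys list is an assumption about which columns identify one series, and the sketch assumes each (keys, time) pair appears only once.

```python
import pandas as pd

# Hypothetical miniature of the frame above: a single Id with a few hourly rows.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2021-05-16 11:00", "2021-05-17 11:00",
        "2021-05-18 11:00", "2021-05-23 11:00",
    ]),
    "Id": [1044, 1044, 1044, 1044],
    "Count": [1, 4, 7, 9],
})

# Assumption: these columns identify one series; extend with StartCity, District, etc.
keys = ["Id"]

offsets = {
    "valueonsamehour1daybefore": pd.DateOffset(days=1),
    "valueonsamehour2daybefore": pd.DateOffset(days=2),
    "valueonsamehour3daybefore": pd.DateOffset(days=3),
    "valueonsamehour1weekbefore": pd.DateOffset(weeks=1),
    "valueonsamehour1monthbefore": pd.DateOffset(months=1),
}

base = df[keys + ["time", "Count"]]
for col, offset in offsets.items():
    # Build the timestamp each row should look up, then left-merge on it.
    lookup = df[keys].copy()
    lookup["time"] = df["time"] - offset
    matched = lookup.merge(base, on=keys + ["time"], how="left")
    # Rows with no observation at that earlier hour get 0.
    df[col] = matched["Count"].fillna(0).astype(int).to_numpy()

print(df)
```

Looking backwards from each row (rather than shifting the history forwards) keeps the row count unchanged even for the month offset, where several days can map to the same shifted date.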

Pandas Group/Merge Dataframe by Non-Periodic Series

How do I group one DataFrame by another possibly-non-periodic Series? Mock-up below:
This is the DataFrame to be split:
i = pd.date_range(end="today", periods=20, freq="d").normalize()
v = np.random.randint(0,100,size=len(i))
d = pd.DataFrame({"value": v}, index=i)
>>> d
value
2021-02-06 48
2021-02-07 1
2021-02-08 86
2021-02-09 82
2021-02-10 40
2021-02-11 22
2021-02-12 63
2021-02-13 37
2021-02-14 41
2021-02-15 57
2021-02-16 30
2021-02-17 69
2021-02-18 63
2021-02-19 27
2021-02-20 23
2021-02-21 46
2021-02-22 66
2021-02-23 10
2021-02-24 91
2021-02-25 43
This is the splitting criterion: grouping by the Series dates. A group consists of every dataframe row whose date falls in [s[i], s[i+1]), but as with resampling it would be nice to control which endpoint is inclusive.
s = pd.date_range(start="2019-10-14", freq="2W", periods=52).to_series()
s = s.drop(np.random.choice(s.index, 10, replace=False))
s = s.reset_index(drop=True)
>>> s[25:29]
25 2021-01-24
26 2021-02-07
27 2021-02-21
28 2021-03-07
dtype: datetime64[ns]
And this is the example output... or something like it. Index is taken from the series rather than the dataframe.
>>> ???.sum()
value
...
2021-01-24 47
2021-02-07 768
2021-02-21 334
...
Internally the groups would have this structure:
...
2021-01-10
sum: 0
2021-01-24
2021-02-06 47
sum: 47
2021-02-07
2021-02-07 52
2021-02-08 56
2021-02-09 21
2021-02-10 39
2021-02-11 86
2021-02-12 30
2021-02-13 20
2021-02-14 76
2021-02-15 91
2021-02-16 70
2021-02-17 34
2021-02-18 73
2021-02-19 41
2021-02-20 79
sum: 768
2021-02-21
2021-02-21 90
2021-02-22 75
2021-02-23 12
2021-02-24 70
2021-02-25 87
sum: 334
2021-03-07
sum: 0
...
Looks like you can do:
bucket = pd.cut(d.index, bins=s, labels=s[:-1], right=False)
d.groupby(bucket).sum()
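To make the pd.cut approach easier to try, here is a small self-contained sketch. The data are deterministic stand-ins for the question's random values (every value is 1, so each sum is just a row count), and the edges variable is a hypothetical replacement for the Series s. Passing observed=False explicitly is an assumption about the desired result: it keeps bins that received no rows, matching the "sum: 0" groups in the question.

```python
import pandas as pd

# Deterministic stand-in for the question's frame: 20 daily rows, all with value 1.
dates = pd.date_range("2021-02-06", periods=20, freq="D")
d = pd.DataFrame({"value": 1}, index=dates)

# Irregularly spaced bin edges, playing the role of the Series `s`.
edges = pd.to_datetime(
    ["2021-01-10", "2021-01-24", "2021-02-07", "2021-02-21", "2021-03-21"]
)

# right=False makes each bin [edge, next_edge); labels name each bin by its left edge.
bucket = pd.cut(d.index, bins=edges, labels=edges[:-1], right=False)

# observed=False keeps bins that received no rows (their sum is 0).
result = d.groupby(bucket, observed=False).sum()
print(result)
```

Switching to right=True would instead make each bin (edge, next_edge], which is the kind of inclusion control the question asks about.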

Skip Week Number 53 to week number 1 in pandas

The output of (analysis_data['Date'] + pd.DateOffset(1)).dt.week is:
Date Week
2020-12-26 52
2020-12-27 53
2020-12-28 53
2020-12-29 53
2020-12-30 53
2020-12-31 53
2021-01-01 53
2021-01-02 53
2021-01-03 1
But I want my dataframe to treat week 53 as week 1 as well:
Date Week
2020-12-26 52
2020-12-27 1
2020-12-28 1
2020-12-29 1
2020-12-30 1
2020-12-31 1
2021-01-01 1
2021-01-02 1
2021-01-03 2
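The desired table looks consistent with a convention where weeks run Sunday to Saturday and week 1 is the week containing January 1. That reading is an assumption, not something stated in the question, but under it a sketch could compute the week number directly instead of patching week 53:

```python
import pandas as pd

dates = pd.Series(pd.date_range("2020-12-26", "2021-01-03"))

# dayofweek is Monday=0 ... Sunday=6, so this is the number of days since Sunday.
days_since_sunday = (dates.dt.dayofweek + 1) % 7
week_start = dates - pd.to_timedelta(days_since_sunday, unit="D")
week_end = week_start + pd.Timedelta(days=6)  # the Saturday closing that week

# Assumption: a week belongs to the year in which it ends, and week 1 is the
# Sunday-to-Saturday week that contains January 1 of that year.
jan1 = pd.to_datetime(week_end.dt.year.astype(str) + "-01-01")
first_week_start = jan1 - pd.to_timedelta((jan1.dt.dayofweek + 1) % 7, unit="D")

week = (week_start - first_week_start).dt.days // 7 + 1
print(pd.DataFrame({"Date": dates, "Week": week}))
```

For the dates in the question this gives 52 for 2020-12-26, 1 for 2020-12-27 through 2021-01-02, and 2 for 2021-01-03.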

Add a column value with the other date time column at minutes level in pandas

I have a data frame as shown below
ID ideal_appt_time service_time
1 2020-01-06 09:00:00 22
2 2020-01-06 09:30:00 15
1 2020-01-08 14:00:00 42
2 2020-01-12 01:30:00 5
I would like to add service_time, in minutes, to ideal_appt_time and create a new column called finish.
Expected Output:
ID ideal_appt_time service_time finish
1 2020-01-06 09:00:00 22 2020-01-06 09:22:00
2 2020-01-06 09:30:00 15 2020-01-06 09:45:00
1 2020-01-08 14:00:00 42 2020-01-08 14:42:00
2 2020-01-12 01:30:00 35 2020-01-12 02:05:00
Use to_timedelta to convert the column to timedeltas in minutes, then add them to the datetimes:
df['ideal_appt_time'] = pd.to_datetime(df['ideal_appt_time'])
df['finish'] = df['ideal_appt_time'] + pd.to_timedelta(df['service_time'], unit='min')
print(df)
ID ideal_appt_time service_time finish
0 1 2020-01-06 09:00:00 22 2020-01-06 09:22:00
1 2 2020-01-06 09:30:00 15 2020-01-06 09:45:00
2 1 2020-01-08 14:00:00 42 2020-01-08 14:42:00
3 2 2020-01-12 01:30:00 5 2020-01-12 01:35:00
Data
df=pd.DataFrame({'ideal_appt_time':['2020-01-06 09:00:00','2020-01-06 09:30:00','2020-01-08 14:00:00','2020-01-12 01:30:00'],'service_time':[22,15,42,35]})
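Another way is to multiply by a one-minute Timedelta (astype('timedelta64[m]') on an integer column may be rejected by recent pandas versions, which only support s, ms, us and ns resolutions):
df['finish'] = pd.to_datetime(df['ideal_appt_time']) + df['service_time'] * pd.Timedelta(minutes=1)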
df
ideal_appt_time service_time finish
0 2020-01-06 09:00:00 22 2020-01-06 09:22:00
1 2020-01-06 09:30:00 15 2020-01-06 09:45:00
2 2020-01-08 14:00:00 42 2020-01-08 14:42:00
3 2020-01-12 01:30:00 35 2020-01-12 02:05:00

Sorting csv file with Python 3

I'm having trouble sorting a csv file which has in its second column the UTC time as: 2010-01-01 00:00:00
I have a file that is like this:
name utc_time longitude latitude
A 2010-01-01 00:00:34 23 41
B 2011-01-01 10:00:00 26 44
C 2009-01-01 03:00:00 34 46
D 2012-01-01 00:00:00 31 47
E 2010-01-01 04:00:00 44 48
F 2013-01-01 14:00:00 24 41
I want to write it to a csv file with the same structure but sorted by date:
Output:
name utc_time longitude latitude
C 2009-01-01 03:00:00 34 46
A 2010-01-01 00:00:34 23 41
E 2010-01-01 04:00:00 44 48
B 2011-01-01 10:00:00 26 44
D 2012-01-01 00:00:00 31 47
F 2013-01-01 14:00:00 24 41
I'm actually trying this:
fileEru = pd.read_csv("input.csv")
fileEru = sorted(fileEru, key = lambda row: datetime.strptime(row[1],'%Y-%m-%d %H:%M:%S'), reverse=True)
fileEru.to_csv("output.csv")
But it doesn't work.
Try this, letting pandas parse the dates and sort the rows:
(pd.read_csv("input.csv", parse_dates=['utc_time'])
.sort_values('utc_time')
.to_csv("output.csv", index=False))
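For anyone who wants to try this without a file on disk, here is a minimal sketch using an in-memory buffer. The comma-separated layout of input.csv is an assumption (the question only shows the columns), and raw and out are hypothetical names. It also shows why the original attempt fails: iterating over a DataFrame yields its column labels, not its rows, and sorted() returns a plain list that has no to_csv method.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for input.csv (assumed to be comma-separated).
raw = io.StringIO(
    "name,utc_time,longitude,latitude\n"
    "A,2010-01-01 00:00:34,23,41\n"
    "B,2011-01-01 10:00:00,26,44\n"
    "C,2009-01-01 03:00:00,34,46\n"
    "D,2012-01-01 00:00:00,31,47\n"
    "E,2010-01-01 04:00:00,44,48\n"
    "F,2013-01-01 14:00:00,24,41\n"
)
df = pd.read_csv(raw, parse_dates=["utc_time"])

# Iterating a DataFrame yields column labels, so sorted(df, ...) never sees the rows.
column_labels = list(df)

# sort_values sorts the rows and returns a DataFrame, so to_csv is still available.
sorted_df = df.sort_values("utc_time")
out = io.StringIO()
sorted_df.to_csv(out, index=False)
print(out.getvalue())
```

Pass ascending=False to sort_values to get the newest rows first, which is what reverse=True in the original attempt was aiming for.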