Add a column value with the other date time column at minutes level in pandas

I have a data frame as shown below
ID ideal_appt_time service_time
1 2020-01-06 09:00:00 22
2 2020-01-06 09:30:00 15
1 2020-01-08 14:00:00 42
2 2020-01-12 01:30:00 5
I would like to add service_time (in minutes) to ideal_appt_time and store the result in a new column called finish.
Expected Output:
ID ideal_appt_time service_time finish
1 2020-01-06 09:00:00 22 2020-01-06 09:22:00
2 2020-01-06 09:30:00 15 2020-01-06 09:45:00
1 2020-01-08 14:00:00 42 2020-01-08 14:42:00
2 2020-01-12 01:30:00 35 2020-01-12 02:05:00

Use to_timedelta to convert the service_time column to timedeltas in minutes, then add the result to the datetimes:
df['ideal_appt_time'] = pd.to_datetime(df['ideal_appt_time'])
df['finish'] = df['ideal_appt_time'] + pd.to_timedelta(df['service_time'], unit='min')
print(df)
ID ideal_appt_time service_time finish
0 1 2020-01-06 09:00:00 22 2020-01-06 09:22:00
1 2 2020-01-06 09:30:00 15 2020-01-06 09:45:00
2 1 2020-01-08 14:00:00 42 2020-01-08 14:42:00
3 2 2020-01-12 01:30:00 5 2020-01-12 01:35:00
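For readers who want to run the approach above end to end, here is a minimal self-contained sketch. The sample frame mirrors the question's input, and the helper name add_minutes is hypothetical, not part of pandas.

```python
import pandas as pd

def add_minutes(frame, start_col, minutes_col):
    """Return start_col shifted forward by minutes_col minutes (hypothetical helper)."""
    start = pd.to_datetime(frame[start_col])
    # unit="min" tells pandas to read the integers as whole minutes
    return start + pd.to_timedelta(frame[minutes_col], unit="min")

df = pd.DataFrame({
    "ID": [1, 2, 1, 2],
    "ideal_appt_time": ["2020-01-06 09:00:00", "2020-01-06 09:30:00",
                        "2020-01-08 14:00:00", "2020-01-12 01:30:00"],
    "service_time": [22, 15, 42, 5],
})
df["finish"] = add_minutes(df, "ideal_appt_time", "service_time")
print(df)
```

With a service time of 35 in the last row, the same call rolls over the hour and gives 2020-01-12 02:05:00.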

Data
df=pd.DataFrame({'ideal_appt_time':['2020-01-06 09:00:00','2020-01-06 09:30:00','2020-01-08 14:00:00','2020-01-12 01:30:00'],'service_time':[22,15,42,35]})
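Another way:
# astype('timedelta64[m]') may be rejected by newer pandas versions, so scale a one-minute Timedelta instead
df['finish'] = pd.to_datetime(df['ideal_appt_time']).add(df['service_time'] * pd.Timedelta(minutes=1))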
df
ideal_appt_time service_time finish
0 2020-01-06 09:00:00 22 2020-01-06 09:22:00
1 2020-01-06 09:30:00 15 2020-01-06 09:45:00
2 2020-01-08 14:00:00 42 2020-01-08 14:42:00
3 2020-01-12 01:30:00 35 2020-01-12 02:05:00

Related

Pandas: create a period based on date column

I have a dataframe
ID datetime
11 01-09-2021 10:00:00
11 01-09-2021 10:15:15
11 01-09-2021 15:00:00
12 01-09-2021 15:10:00
11 01-09-2021 18:00:00
I need to add a period column based only on datetime: the period should increase whenever the gap to the previous row is more than 2 hours
ID datetime period
11 01-09-2021 10:00:00 1
11 01-09-2021 10:15:15 1
11 01-09-2021 15:00:00 2
12 01-09-2021 15:10:00 2
11 01-09-2021 18:00:00 3
And the same thing but based on ID and datetime
ID datetime period
11 01-09-2021 10:00:00 1
11 01-09-2021 10:15:15 1
11 01-09-2021 15:00:00 2
12 01-09-2021 15:10:00 1
11 01-09-2021 18:00:00 3
How can I do that?
You can get the difference between consecutive rows with Series.diff, convert it to hours via Series.dt.total_seconds, compare it with 2, and take the cumulative sum:
df['period'] = df['datetime'].diff().dt.total_seconds().div(3600).gt(2).cumsum().add(1)
print (df)
ID datetime period
0 11 2021-01-09 10:00:00 1
1 11 2021-01-09 10:15:15 1
2 11 2021-01-09 15:00:00 2
3 12 2021-01-09 15:10:00 2
4 11 2021-01-09 18:00:00 3
The same idea applied per group:
f = lambda x: x.diff().dt.total_seconds().div(3600).gt(2).cumsum().add(1)
df['period'] = df.groupby('ID')['datetime'].transform(f)
print (df)
ID datetime period
0 11 2021-01-09 10:00:00 1
1 11 2021-01-09 10:15:15 1
2 11 2021-01-09 15:00:00 2
3 12 2021-01-09 15:10:00 1
4 11 2021-01-09 18:00:00 3
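As a runnable illustration of both variants, the sketch below rebuilds the sample frame and compares the gaps directly against a pd.Timedelta instead of converting to hours first. This is a variation on the answer above rather than the same code, and the column name period_by_id is only chosen here so that both results can sit side by side.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [11, 11, 11, 12, 11],
    "datetime": pd.to_datetime([
        "2021-01-09 10:00:00", "2021-01-09 10:15:15", "2021-01-09 15:00:00",
        "2021-01-09 15:10:00", "2021-01-09 18:00:00",
    ]),
})
limit = pd.Timedelta(hours=2)

# Whole frame: a new period starts whenever the gap to the previous row exceeds the limit.
# The first diff is NaT, which compares as False, so the first row lands in period 1.
df["period"] = df["datetime"].diff().gt(limit).cumsum().add(1)

# Per ID: measure the gaps inside each group, then accumulate them inside each group.
new_period = df.groupby("ID")["datetime"].diff().gt(limit).astype(int)
df["period_by_id"] = new_period.groupby(df["ID"]).cumsum().add(1)
print(df)
```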

generate a random number between 2 and 40 with mean 20 as a column in pandas

I have a data frame as shown below
session slot_num appt_time
s1 1 2020-01-06 09:00:00
s1 2 2020-01-06 09:20:00
s1 3 2020-01-06 09:40:00
s1 3 2020-01-06 09:40:00
s1 4 2020-01-06 10:00:00
s1 4 2020-01-06 10:00:00
s2 1 2020-01-06 08:20:00
s2 2 2020-01-06 08:40:00
s2 2 2020-01-06 08:40:00
s2 3 2020-01-06 09:00:00
s2 4 2020-01-06 09:20:00
s2 5 2020-01-06 09:40:00
s2 5 2020-01-06 09:40:00
s2 6 2020-01-06 10:00:00
s3 1 2020-01-09 13:00:00
s3 1 2020-01-09 13:00:00
s3 2 2020-01-09 13:20:00
s3 3 2020-01-09 13:40:00
To the above I would like to add a column called service_time.
service_time should contain random integers between 2 and 40 with a mean of 20 for each session.
Preferably the random numbers should follow a normal distribution with mean 20 and standard deviation 10, limited to a minimum of 2 and a maximum of 40.
Expected output:
session slot_num appt_time service_time
s1 1 2020-01-06 09:00:00 30
s1 2 2020-01-06 09:20:00 10
s1 3 2020-01-06 09:40:00 15
s1 3 2020-01-06 09:40:00 35
s1 4 2020-01-06 10:00:00 20
s1 4 2020-01-06 10:00:00 10
s2 1 2020-01-06 08:20:00 15
s2 2 2020-01-06 08:40:00 20
s2 2 2020-01-06 08:40:00 25
s2 3 2020-01-06 09:00:00 30
s2 4 2020-01-06 09:20:00 20
s2 5 2020-01-06 09:40:00 8
s2 5 2020-01-06 09:40:00 40
s2 6 2020-01-06 10:00:00 2
s3 1 2020-01-09 13:00:00 4
s3 1 2020-01-09 13:00:00 32
s3 2 2020-01-09 13:20:00 26
s3 3 2020-01-09 13:40:00 18
Note: this is just one of the random combinations that satisfy the minimum, maximum and mean criteria mentioned above.
One possible solution with a custom function:
#https://stackoverflow.com/a/39435600/2901002
import numpy as np

def gen_avg(n, expected_avg=20, a=2, b=40):
    # keep redrawing until the sample mean equals expected_avg exactly
    while True:
        # randint excludes the upper bound, so pass b + 1 to allow b itself
        l = np.random.randint(a, b + 1, size=n)
        avg = np.mean(l)
        if avg == expected_avg:
            return l

df['service_time'] = df.groupby('session')['session'].transform(lambda x: gen_avg(len(x)))
print(df)
session slot_num appt_time service_time
0 s1 1 2020-01-06 09:00:00 31
1 s1 2 2020-01-06 09:20:00 9
2 s1 3 2020-01-06 09:40:00 23
3 s1 3 2020-01-06 09:40:00 37
4 s1 4 2020-01-06 10:00:00 6
5 s1 4 2020-01-06 10:00:00 14
6 s2 1 2020-01-06 08:20:00 33
7 s2 2 2020-01-06 08:40:00 29
8 s2 2 2020-01-06 08:40:00 18
9 s2 3 2020-01-06 09:00:00 32
10 s2 4 2020-01-06 09:20:00 9
11 s2 5 2020-01-06 09:40:00 26
12 s2 5 2020-01-06 09:40:00 10
13 s2 6 2020-01-06 10:00:00 3
14 s3 1 2020-01-09 13:00:00 19
15 s3 1 2020-01-09 13:00:00 22
16 s3 2 2020-01-09 13:20:00 5
17 s3 3 2020-01-09 13:40:00 34
Here's a solution with NumPy's new Generator infrastructure. See the documentation for a discussion of the differences between this and the older RandomState infrastructure.
import numpy as np
from numpy.random import default_rng
# assuming df is the name of your dataframe
n = len(df)
# set up random number generator
rng = default_rng()
# sample more than enough values
vals = rng.normal(loc=20., scale=10., size=2*n)
# filter values according to cut-off conditions
vals = vals[2 <= vals]
vals = vals[vals <= 40]
# add n random values to dataframe
df['service_time'] = vals[:n]
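Oversampling by a fixed factor of two will almost always leave enough values after filtering, but it is not guaranteed. The sketch below wraps the same idea in a loop that keeps drawing until n accepted values exist. The function name truncated_normal and its parameters are hypothetical, and because the cut-offs are not symmetric around 20 the sample mean is only approximately 20.

```python
import numpy as np

def truncated_normal(n, loc=20.0, scale=10.0, low=2.0, high=40.0, rng=None):
    """Draw n normal values and reject any outside [low, high] (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    kept = np.empty(0)
    while kept.size < n:
        draw = rng.normal(loc=loc, scale=scale, size=2 * n)
        # keep only the draws that fall inside the allowed range
        kept = np.concatenate([kept, draw[(draw >= low) & (draw <= high)]])
    return kept[:n]

# seeded generator so the run is reproducible
vals = truncated_normal(18, rng=np.random.default_rng(0))
print(vals.round(1))
```

Assigning the result works as before, for example df['service_time'] = truncated_normal(len(df)).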
The normal distribution has an unbounded range, so if you're bounding between 2 and 40 the distribution isn't normal. An alternative which is bounded, and avoids acceptance/rejection schemes, is to use the triangular distribution (see Wikipedia for details). Since the mean of a triangular distribution is (left + mode + right) / 3, with left = 2 and right = 40 you would set mode = 18 to get the desired mean of 20.
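A short sketch of the triangular suggestion, using NumPy's Generator.triangular. The mode is derived from the mean formula quoted above; the seed and sample size are arbitrary choices made only so the result can be checked.

```python
import numpy as np

left, right, target_mean = 2, 40, 20
# mean = (left + mode + right) / 3, so mode = 3 * mean - left - right
mode = 3 * target_mean - left - right

rng = np.random.default_rng(0)
samples = rng.triangular(left, mode, right, size=100_000)

# every sample lies between left and right, and the sample mean is close to target_mean
print(mode, samples.min(), samples.max(), samples.mean())
```

For the data frame in the question, the same call with size=len(df) yields one value per row. The values are floats, so they would need rounding if whole minutes are required.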

Groupby - Generate date time as sequence

I have a dataframe as shown below
session slot_num
s1 1
s1 2
s1 3
s1 3
s1 4
s1 4
s2 1
s2 2
s2 2
s2 3
s2 4
s2 5
s2 5
s2 6
s3 1
s3 1
s3 2
s3 3
From the above I would like to create a column appt_time as shown below.
Expected output
session slot_num appt_time
s1 1 2020-01-06 09:00:00
s1 2 2020-01-06 09:20:00
s1 3 2020-01-06 09:40:00
s1 3 2020-01-06 09:40:00
s1 4 2020-01-06 10:00:00
s1 4 2020-01-06 10:00:00
s2 1 2020-01-06 08:20:00
s2 2 2020-01-06 08:40:00
s2 2 2020-01-06 08:40:00
s2 3 2020-01-06 09:00:00
s2 4 2020-01-06 09:20:00
s2 5 2020-01-06 09:40:00
s2 5 2020-01-06 09:40:00
s2 6 2020-01-06 10:00:00
s3 1 2020-01-09 13:00:00
s3 1 2020-01-09 13:00:00
s3 2 2020-01-09 13:20:00
s3 3 2020-01-09 13:40:00
Explanation:
for session = s1, appt_start time = 2020-01-06 09:00:00, then for each increase in slot_num for that session increment appt_time by 20 minutes.
for session = s2, appt_start time = 2020-01-06 08:20:00, then for each increase in slot_num for that session increment appt_time by 20 minutes.
for session = s3, appt_start time = 2020-01-09 13:00:00, then for each increase in slot_num for that session increment appt_time by 20 minutes.
First specify the starting datetime for each session; here a dictionary is used with Series.map and the result is converted to datetimes. Then add the offsets: subtract 1 from slot_num so that the first slot gets a zero Timedelta, convert to timedeltas in minutes with to_timedelta, and multiply by 20:
d = {'s1': '2020-01-06 09:00:00',
     's2': '2020-01-06 08:20:00',
     's3': '2020-01-09 13:00:00'}
df['appt_time'] = (pd.to_datetime(df['session'].map(d)) +
                   pd.to_timedelta(df['slot_num'].sub(1), unit='min').mul(20))
print(df)
session slot_num appt_time
0 s1 1 2020-01-06 09:00:00
1 s1 2 2020-01-06 09:20:00
2 s1 3 2020-01-06 09:40:00
3 s1 3 2020-01-06 09:40:00
4 s1 4 2020-01-06 10:00:00
5 s1 4 2020-01-06 10:00:00
6 s2 1 2020-01-06 08:20:00
7 s2 2 2020-01-06 08:40:00
8 s2 2 2020-01-06 08:40:00
9 s2 3 2020-01-06 09:00:00
10 s2 4 2020-01-06 09:20:00
11 s2 5 2020-01-06 09:40:00
12 s2 5 2020-01-06 09:40:00
13 s2 6 2020-01-06 10:00:00
14 s3 1 2020-01-09 13:00:00
15 s3 1 2020-01-09 13:00:00
16 s3 2 2020-01-09 13:20:00
17 s3 3 2020-01-09 13:40:00
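The same calculation can be tried on a smaller frame. This sketch is a variation that multiplies the zero-based slot index by a pd.Timedelta rather than calling to_timedelta; the names starts and slot_length are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "session": ["s1", "s1", "s1", "s2", "s2", "s3"],
    "slot_num": [1, 2, 2, 1, 3, 1],
})
starts = {
    "s1": "2020-01-06 09:00:00",
    "s2": "2020-01-06 08:20:00",
    "s3": "2020-01-09 13:00:00",
}
slot_length = pd.Timedelta(minutes=20)

# slot 1 gets a zero offset, slot 2 gets 20 minutes, slot 3 gets 40 minutes, and so on
offset = (df["slot_num"] - 1) * slot_length
df["appt_time"] = pd.to_datetime(df["session"].map(starts)) + offset
print(df)
```

A session that is missing from the dictionary maps to NaN and therefore ends up as NaT in appt_time.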

create a new columns by adding minutes to date time column and another column by groupby row number - in Pandas

I have a data frame as shown below
session appt_time
s1 2020-01-06 09:00:00
s1 2020-01-06 09:20:00
s1 2020-01-06 09:40:00
s1 2020-01-06 09:40:00
s1 2020-01-06 10:00:00
s1 2020-01-06 10:00:00
s2 2020-01-06 08:20:00
s2 2020-01-06 08:40:00
s2 2020-01-06 08:40:00
s2 2020-01-06 09:00:00
s2 2020-01-06 09:20:00
s2 2020-01-06 09:40:00
s2 2020-01-06 09:40:00
s2 2020-01-06 10:00:00
s3 2020-01-09 13:00:00
s3 2020-01-09 13:00:00
s3 2020-01-09 13:20:00
s3 2020-01-09 13:40:00
From the above I would like to create new columns called ideal_appt_time and slot_num as shown below.
session appt_time ideal_appt_time slot_num
s1 2020-01-06 09:00:00 2020-01-06 09:00:00 1
s1 2020-01-06 09:20:00 2020-01-06 09:20:00 2
s1 2020-01-06 09:40:00 2020-01-06 09:40:00 3
s1 2020-01-06 09:40:00 2020-01-06 10:00:00 4
s1 2020-01-06 10:00:00 2020-01-06 10:20:00 5
s1 2020-01-06 10:00:00 2020-01-06 10:40:00 6
s2 2020-01-06 08:20:00 2020-01-06 08:20:00 1
s2 2020-01-06 08:40:00 2020-01-06 08:40:00 2
s2 2020-01-06 08:40:00 2020-01-06 09:00:00 3
s2 2020-01-06 09:00:00 2020-01-06 09:20:00 4
s2 2020-01-06 09:20:00 2020-01-06 09:40:00 5
s2 2020-01-06 09:40:00 2020-01-06 10:00:00 6
s2 2020-01-06 09:40:00 2020-01-06 10:20:00 7
s2 2020-01-06 10:00:00 2020-01-06 10:40:00 8
s3 2020-01-09 13:00:00 2020-01-09 13:00:00 1
s3 2020-01-09 13:00:00 2020-01-09 13:20:00 2
s3 2020-01-09 13:20:00 2020-01-09 13:40:00 3
s3 2020-01-09 13:40:00 2020-01-09 14:00:00 4
Explanation:
ideal_appt_time is calculated from appt_time: it starts at the first appt_time of the session and then increases by 20 minutes per row, whereas appt_time itself contains repeated values.
slot_num simply counts the slots of that session in order of appointment time.
Use GroupBy.cumcount to build a counter Series, convert it to timedeltas with to_timedelta and multiply by 20 to get 20-minute steps.
Then get the first timestamp per group with GroupBy.transform and GroupBy.first and add the timedeltas; finally, add 1 to the counter for the slot_num column:
df['appt_time'] = pd.to_datetime(df['appt_time'])
counts = df.groupby('session').cumcount()
td = pd.to_timedelta(counts, unit='min') * 20
df['ideal_appt_time'] = df.groupby('session')['appt_time'].transform('first') + td
df['slot_num'] = counts + 1
print (df)
session appt_time ideal_appt_time slot_num
0 s1 2020-01-06 09:00:00 2020-01-06 09:00:00 1
1 s1 2020-01-06 09:20:00 2020-01-06 09:20:00 2
2 s1 2020-01-06 09:40:00 2020-01-06 09:40:00 3
3 s1 2020-01-06 09:40:00 2020-01-06 10:00:00 4
4 s1 2020-01-06 10:00:00 2020-01-06 10:20:00 5
5 s1 2020-01-06 10:00:00 2020-01-06 10:40:00 6
6 s2 2020-01-06 08:20:00 2020-01-06 08:20:00 1
7 s2 2020-01-06 08:40:00 2020-01-06 08:40:00 2
8 s2 2020-01-06 08:40:00 2020-01-06 09:00:00 3
9 s2 2020-01-06 09:00:00 2020-01-06 09:20:00 4
10 s2 2020-01-06 09:20:00 2020-01-06 09:40:00 5
11 s2 2020-01-06 09:40:00 2020-01-06 10:00:00 6
12 s2 2020-01-06 09:40:00 2020-01-06 10:20:00 7
13 s2 2020-01-06 10:00:00 2020-01-06 10:40:00 8
14 s3 2020-01-09 13:00:00 2020-01-09 13:00:00 1
15 s3 2020-01-09 13:00:00 2020-01-09 13:20:00 2
16 s3 2020-01-09 13:20:00 2020-01-09 13:40:00 3
17 s3 2020-01-09 13:40:00 2020-01-09 14:00:00 4
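Both cumcount and 'first' follow the row order inside each group, so the approach above assumes the rows are already sorted by time within a session. The sketch below starts from a deliberately shuffled sample (invented for this illustration) and sorts it first; the sort step is an addition, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "session": ["s2", "s1", "s1", "s2", "s1"],
    "appt_time": pd.to_datetime([
        "2020-01-06 08:20:00", "2020-01-06 09:20:00", "2020-01-06 09:00:00",
        "2020-01-06 08:20:00", "2020-01-06 09:20:00",
    ]),
})

# put each session's rows in chronological order before counting
df = df.sort_values(["session", "appt_time"], kind="stable", ignore_index=True)

counts = df.groupby("session").cumcount()                      # 0, 1, 2, ... per session
first = df.groupby("session")["appt_time"].transform("first")  # session start on every row
df["ideal_appt_time"] = first + pd.to_timedelta(counts * 20, unit="min")
df["slot_num"] = counts + 1
print(df)
```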

groupby first as a dictionary in pandas

I have data frame as shown below.
session slot_num appt_time
s1 1 2020-01-06 09:00:00
s1 2 2020-01-06 09:20:00
s1 3 2020-01-06 09:40:00
s1 3 2020-01-06 09:40:00
s1 4 2020-01-06 10:00:00
s1 4 2020-01-06 10:00:00
s2 1 2020-01-06 08:20:00
s2 2 2020-01-06 08:40:00
s2 2 2020-01-06 08:40:00
s2 3 2020-01-06 09:00:00
s2 4 2020-01-06 09:20:00
s2 5 2020-01-06 09:40:00
s2 5 2020-01-06 09:40:00
s2 6 2020-01-06 10:00:00
s3 1 2020-01-09 13:00:00
s3 1 2020-01-09 13:00:00
s3 2 2020-01-09 13:20:00
s3 3 2020-01-09 13:40:00
From the above I want to create a dictionary with the session as key and the starting appt_time of that session as value.
Expected Output:
d = {'S1':'2020-01-06 09:00:00',
'S2':'2020-01-06 08:20:00',
'S3':'2020-01-09 13:00:00'}
Use DataFrame.drop_duplicates to keep the first row per session, convert session to the index, select the appt_time column to get a Series, and finally call Series.to_dict:
d = df.drop_duplicates('session').set_index('session')['appt_time'].to_dict()
print (d)
{'s1': '2020-01-06 09:00:00', 's2': '2020-01-06 08:20:00', 's3': '2020-01-09 13:00:00'}
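Since the question title asks about groupby and first, an equivalent sketch with GroupBy.first is shown here on a shortened sample. The second dictionary takes the earliest time per session instead of the first row, which is an extra variation for unsorted data; the names d_first and d_min are chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "session": ["s1", "s1", "s2", "s2", "s3"],
    "appt_time": ["2020-01-06 09:00:00", "2020-01-06 09:20:00",
                  "2020-01-06 08:20:00", "2020-01-06 08:40:00",
                  "2020-01-09 13:00:00"],
})

# first row per session, in the order the rows appear
d_first = df.groupby("session")["appt_time"].first().to_dict()

# earliest time per session, independent of row order
times = pd.to_datetime(df["appt_time"])
earliest = times.groupby(df["session"]).min()
d_min = earliest.dt.strftime("%Y-%m-%d %H:%M:%S").to_dict()

print(d_first)
print(d_min)
```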