Create a new column based on a groupby of a datetime column at the date level in pandas

I have a data frame as shown below.
Doctor Appointment Booking_ID
A 2020-01-18 12:00:00 1
A 2020-01-18 12:30:00 2
A 2020-01-18 13:00:00 3
A 2020-01-18 13:00:00 4
A 2020-01-19 13:00:00 13
A 2020-01-19 13:30:00 14
B 2020-01-18 12:00:00 5
B 2020-01-18 12:30:00 6
B 2020-01-18 13:00:00 7
B 2020-01-25 12:30:00 6
B 2020-01-25 13:00:00 7
C 2020-01-19 12:00:00 19
C 2020-01-19 12:30:00 20
C 2020-01-19 13:00:00 21
C 2020-01-22 12:30:00 20
C 2020-01-22 13:00:00 21
From the above I would like to create a column called Session as shown below.
Expected Output:
Doctor Appointment Booking_ID Session
A 2020-01-18 12:00:00 1 S1
A 2020-01-18 12:30:00 2 S1
A 2020-01-18 13:00:00 3 S1
A 2020-01-18 13:00:00 4 S1
A 2020-01-19 13:00:00 13 S2
A 2020-01-19 13:30:00 14 S2
B 2020-01-18 12:00:00 5 S3
B 2020-01-18 12:30:00 6 S3
B 2020-01-18 13:00:00 7 S3
B 2020-01-25 12:30:00 6 S4
B 2020-01-25 13:00:00 7 S4
C 2020-01-19 12:00:00 19 S5
C 2020-01-19 12:30:00 20 S5
C 2020-01-19 13:00:00 21 S5
C 2020-01-22 12:30:00 20 S6
C 2020-01-22 13:00:00 21 S6
Session should be different for each doctor and each Appointment date (at the day level).
I tried the below:
df = df.sort_values(['Doctor', 'Appointment'], ascending=True)
df['Appointment'] = pd.to_datetime(df['Appointment'])
dates = df['Appointment'].dt.date
df['Session'] = 'S' + pd.Series(dates.factorize()[0] + 1, index=df.index).astype(str)
But it assigns sessions based only on dates. I would like it to consider the doctor as well.

IIUC, use GroupBy.ngroup with Series.dt.date:
df['Session'] = 'S' + (df.groupby(['Doctor', pd.to_datetime(df['Appointment']).dt.date])
                         .ngroup()
                         .add(1)
                         .astype(str))
Doctor Appointment Booking_ID Session
0 A 2020-01-18 12:00:00 1 S1
1 A 2020-01-18 12:30:00 2 S1
2 A 2020-01-18 13:00:00 3 S1
3 A 2020-01-18 13:00:00 4 S1
4 A 2020-01-19 13:00:00 13 S2
5 A 2020-01-19 13:30:00 14 S2
6 B 2020-01-18 12:00:00 5 S3
7 B 2020-01-18 12:30:00 6 S3
8 B 2020-01-18 13:00:00 7 S3
9 B 2020-01-25 12:30:00 6 S4
10 B 2020-01-25 13:00:00 7 S4
11 C 2020-01-19 12:00:00 19 S5
12 C 2020-01-19 12:30:00 20 S5
13 C 2020-01-19 13:00:00 21 S5
14 C 2020-01-22 12:30:00 20 S6
15 C 2020-01-22 13:00:00 21 S6
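As a self-contained sketch (a hypothetical mini-frame standing in for the question's data), the ngroup() call numbers each (Doctor, day) pair consecutively:

```python
import pandas as pd

# Hypothetical mini-frame standing in for the question's data.
df = pd.DataFrame({
    'Doctor': ['A', 'A', 'A', 'B', 'B'],
    'Appointment': pd.to_datetime(['2020-01-18 12:00:00', '2020-01-18 12:30:00',
                                   '2020-01-19 13:00:00', '2020-01-18 12:00:00',
                                   '2020-01-25 12:30:00']),
    'Booking_ID': [1, 2, 13, 5, 6],
})

# ngroup() numbers each (Doctor, day) group 0, 1, 2, ... in sorted key order
df['Session'] = 'S' + (df.groupby(['Doctor', df['Appointment'].dt.date])
                         .ngroup()
                         .add(1)
                         .astype(str))
print(df['Session'].tolist())  # → ['S1', 'S1', 'S2', 'S3', 'S4']
```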

You can go with sort_values, then start a new session wherever the day changes or the doctor differs from the previous row, using diff and shift:
df['Appointment'] = pd.to_datetime(df['Appointment'])  # ensure datetime dtype
df = df.sort_values(['Doctor', 'Appointment'], ascending=True)
df['Session'] = 'S' + (df['Appointment'].dt.date.diff().ne(pd.Timedelta(days=0))
                       | df['Doctor'].ne(df['Doctor'].shift())).cumsum().astype(str)
print (df)
Doctor Appointment Booking_ID Session
0 A 2020-01-18 12:00:00 1 S1
1 A 2020-01-18 12:30:00 2 S1
2 A 2020-01-18 13:00:00 3 S1
3 A 2020-01-18 13:00:00 4 S1
4 A 2020-01-19 13:00:00 13 S2
5 A 2020-01-19 13:30:00 14 S2
6 B 2020-01-18 12:00:00 5 S3
7 B 2020-01-18 12:30:00 6 S3
8 B 2020-01-18 13:00:00 7 S3
9 B 2020-01-25 12:30:00 6 S4
10 B 2020-01-25 13:00:00 7 S4
11 C 2020-01-19 12:00:00 19 S5
12 C 2020-01-19 12:30:00 20 S5
13 C 2020-01-19 13:00:00 21 S5
14 C 2020-01-22 12:30:00 20 S6
15 C 2020-01-22 13:00:00 21 S6
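A runnable sketch of the same idea on a hypothetical mini-frame, using dt.normalize() instead of dt.date so the day difference stays a Timedelta throughout:

```python
import pandas as pd

# Hypothetical mini-frame; dt.normalize() truncates each timestamp to midnight,
# so a nonzero diff marks a change of day.
df = pd.DataFrame({
    'Doctor': ['A', 'A', 'A', 'B'],
    'Appointment': pd.to_datetime(['2020-01-18 12:00:00', '2020-01-18 12:30:00',
                                   '2020-01-19 13:00:00', '2020-01-18 12:00:00']),
})
df = df.sort_values(['Doctor', 'Appointment'])

day = df['Appointment'].dt.normalize()
# new session whenever the day or the doctor changes (first row is always True)
new_session = day.diff().ne(pd.Timedelta(0)) | df['Doctor'].ne(df['Doctor'].shift())
df['Session'] = 'S' + new_session.cumsum().astype(str)
print(df['Session'].tolist())  # → ['S1', 'S1', 'S2', 'S3']
```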

Another approach using idxmin, with a slightly different result (the session labels come from row indices rather than a consecutive count):
df['Session'] = 'S' + (df.groupby(['Doctor', df.Appointment.dt.date])
                         .transform('idxmin').iloc[:, 0] + 1).astype('str')

This is groupby().ngroup():
# convert to datetime
df.Appointment = pd.to_datetime(df.Appointment)
df['Session'] = 'S' + (df.groupby(['Doctor', df.Appointment.dt.date]).ngroup()+1).astype(str)
Output:
Doctor Appointment Booking_ID Session
0 A 2020-01-18 12:00:00 1 S1
1 A 2020-01-18 12:30:00 2 S1
2 A 2020-01-18 13:00:00 3 S1
3 A 2020-01-18 13:00:00 4 S1
4 A 2020-01-19 13:00:00 13 S2
5 A 2020-01-19 13:30:00 14 S2
6 B 2020-01-18 12:00:00 5 S3
7 B 2020-01-18 12:30:00 6 S3
8 B 2020-01-18 13:00:00 7 S3
9 B 2020-01-25 12:30:00 6 S4
10 B 2020-01-25 13:00:00 7 S4
11 C 2020-01-19 12:00:00 19 S5
12 C 2020-01-19 12:30:00 20 S5
13 C 2020-01-19 13:00:00 21 S5
14 C 2020-01-22 12:30:00 20 S6
15 C 2020-01-22 13:00:00 21 S6

Related

Generate random numbers between 2 and 40 with mean 20 as a column in pandas

I have a data frame as shown below
session slot_num appt_time
s1 1 2020-01-06 09:00:00
s1 2 2020-01-06 09:20:00
s1 3 2020-01-06 09:40:00
s1 3 2020-01-06 09:40:00
s1 4 2020-01-06 10:00:00
s1 4 2020-01-06 10:00:00
s2 1 2020-01-06 08:20:00
s2 2 2020-01-06 08:40:00
s2 2 2020-01-06 08:40:00
s2 3 2020-01-06 09:00:00
s2 4 2020-01-06 09:20:00
s2 5 2020-01-06 09:40:00
s2 5 2020-01-06 09:40:00
s2 6 2020-01-06 10:00:00
s3 1 2020-01-09 13:00:00
s3 1 2020-01-09 13:00:00
s3 2 2020-01-09 13:20:00
s3 3 2020-01-09 13:40:00
In the above I would like to add a column called service_time.
service_time should contain random integers between 2 and 40 with mean 20 for each session.
I would prefer the random numbers to follow a normal distribution with mean 20, standard deviation 10, minimum 2 and maximum 40.
Expected output:
session slot_num appt_time service_time
s1 1 2020-01-06 09:00:00 30
s1 2 2020-01-06 09:20:00 10
s1 3 2020-01-06 09:40:00 15
s1 3 2020-01-06 09:40:00 35
s1 4 2020-01-06 10:00:00 20
s1 4 2020-01-06 10:00:00 10
s2 1 2020-01-06 08:20:00 15
s2 2 2020-01-06 08:40:00 20
s2 2 2020-01-06 08:40:00 25
s2 3 2020-01-06 09:00:00 30
s2 4 2020-01-06 09:20:00 20
s2 5 2020-01-06 09:40:00 8
s2 5 2020-01-06 09:40:00 40
s2 6 2020-01-06 10:00:00 2
s3 1 2020-01-09 13:00:00 4
s3 1 2020-01-09 13:00:00 32
s3 2 2020-01-09 13:20:00 26
s3 3 2020-01-09 13:40:00 18
Note: this is just one random combination that satisfies the minimum, maximum and mean criteria mentioned above.
One possible solution with a custom function:
# https://stackoverflow.com/a/39435600/2901002
import numpy as np

def gen_avg(n, expected_avg=20, a=2, b=40):
    # redraw until the integer sample hits the target mean exactly
    while True:
        l = np.random.randint(a, b + 1, size=n)  # randint's upper bound is exclusive
        if np.mean(l) == expected_avg:
            return l

df['service_time'] = df.groupby('session')['session'].transform(lambda x: gen_avg(len(x)))
print (df)
session slot_num appt_time service_time
0 s1 1 2020-01-06 09:00:00 31
1 s1 2 2020-01-06 09:20:00 9
2 s1 3 2020-01-06 09:40:00 23
3 s1 3 2020-01-06 09:40:00 37
4 s1 4 2020-01-06 10:00:00 6
5 s1 4 2020-01-06 10:00:00 14
6 s2 1 2020-01-06 08:20:00 33
7 s2 2 2020-01-06 08:40:00 29
8 s2 2 2020-01-06 08:40:00 18
9 s2 3 2020-01-06 09:00:00 32
10 s2 4 2020-01-06 09:20:00 9
11 s2 5 2020-01-06 09:40:00 26
12 s2 5 2020-01-06 09:40:00 10
13 s2 6 2020-01-06 10:00:00 3
14 s3 1 2020-01-09 13:00:00 19
15 s3 1 2020-01-09 13:00:00 22
16 s3 2 2020-01-09 13:20:00 5
17 s3 3 2020-01-09 13:40:00 34
Here's a solution with NumPy's new Generator infrastructure. See the documentation for a discussion of the differences between this and the older RandomState infrastructure.
import numpy as np
from numpy.random import default_rng

# assuming df is the name of your dataframe
n = len(df)
# set up the random number generator
rng = default_rng()
# sample more than enough values, since some will be rejected below
vals = rng.normal(loc=20., scale=10., size=2 * n)
# filter values according to the cut-off conditions
vals = vals[(2 <= vals) & (vals <= 40)]
# add n random values to the dataframe
df['service_time'] = vals[:n]
The normal distribution has an unbounded range, so if you're bounding between 2 and 40 the distribution isn't normal. An alternative which is bounded, and avoids acceptance/rejection schemes, is to use the triangular distribution (see Wikipedia for details). Since the mean of a triangular distribution is (left + mode + right) / 3, with left = 2 and right = 40 you would set mode = 18 to get the desired mean of 20.
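The triangular suggestion can be sketched directly with NumPy's Generator (the bounds and mode follow the paragraph above; the seed and sample size are arbitrary choices for the sketch):

```python
import numpy as np
from numpy.random import default_rng

rng = default_rng(42)  # seeded only so the sketch is reproducible

# Bounded on [2, 40]; the mean of a triangular distribution is
# (left + mode + right) / 3, so mode = 18 gives (2 + 18 + 40) / 3 = 20.
vals = rng.triangular(left=2.0, mode=18.0, right=40.0, size=100_000)
print(vals.min() >= 2, vals.max() <= 40, vals.mean())
```

No acceptance/rejection loop is needed, and every draw lands inside the bounds by construction.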

Groupby count on unique date time in pandas

I have a data frame as shown below
Doctor Start B_ID Session Finish NoShow
A 2020-01-18 12:00:00 1 S1 2020-01-18 12:33:00 no
A 2020-01-18 12:20:00 2 S1 2020-01-18 12:52:00 no
A 2020-01-18 13:00:00 3 S1 2020-01-18 13:23:00 no
A 2020-01-18 13:00:00 4 S1 2020-01-18 13:37:00 yes
A 2020-01-18 13:35:00 5 S1 2020-01-18 13:56:00 no
A 2020-01-18 14:10:00 6 S1 2020-01-18 14:15:00 no
A 2020-01-18 14:10:00 7 S1 2020-01-18 14:28:00 yes
A 2020-01-18 14:10:00 8 S1 2020-01-18 14:40:00 yes
A 2020-01-18 14:10:00 9 S1 2020-01-18 15:01:00 no
A 2020-01-19 12:00:00 12 S2 2020-01-19 12:20:00 no
A 2020-01-19 12:30:00 13 S2 2020-01-19 12:40:00 no
A 2020-01-19 13:00:00 14 S2 2020-01-19 13:20:00 yes
A 2020-01-19 13:40:00 15 S2 2020-01-19 13:46:00 no
A 2020-01-19 14:00:00 16 S2 2020-01-19 14:10:00 yes
A 2020-01-19 14:00:00 17 S2 2020-01-19 14:20:00 no
A 2020-01-19 14:00:00 19 S2 2020-01-19 14:40:00 yes
B 2020-01-18 12:00:00 21 S3 2020-01-18 12:33:00 no
B 2020-01-18 12:30:00 22 S3 2020-01-18 12:52:00 no
B 2020-01-18 13:10:00 23 S3 2020-01-18 13:25:00 no
B 2020-01-18 13:10:00 24 S3 2020-01-18 13:39:00 no
B 2020-01-18 13:30:00 25 S3 2020-01-18 13:56:00 yes
B 2020-01-18 14:05:00 26 S3 2020-01-18 14:15:00 no
B 2020-01-18 14:30:00 27 S3 2020-01-18 14:48:00 yes
From the above I would like to prepare below data frame
Expected Output:
Doctor Day No_of_slots No_of_bookings No_of_NoShow
A 2020-01-18 5 9 3
A 2020-01-19 5 7 3
B 2020-01-18 6 7 2
Where
No_of_slots = Total number of slots based on unique Start time
No_of_bookings = Total number of bookings
No_of_NoShow = Number of NoShow == 'yes'
Use GroupBy.agg with named aggregation; to count the 'yes' values, sum a helper column created with DataFrame.assign by comparing with Series.eq and converting to integers with Series.view:
df['Start'] = pd.to_datetime(df['Start'])
df['Finish'] = pd.to_datetime(df['Finish'])
d = df['Start'].dt.date.rename('Day')
df1 = (df.assign(new=df['NoShow'].eq('yes').view('i1'))
         .groupby(['Doctor', d])
         .agg(No_of_slots=('Start', 'nunique'),
              No_of_bookings=('Start', 'size'),
              No_of_NoShow=('new', 'sum'))
         .reset_index())
print (df1)
Doctor Day No_of_slots No_of_bookings No_of_NoShow
0 A 2020-01-18 5 9 3
1 A 2020-01-19 5 7 3
2 B 2020-01-18 6 7 2
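The same named-aggregation pattern on a hypothetical mini-frame, using astype(int) rather than view('i1') (astype is the more portable spelling on recent pandas versions):

```python
import pandas as pd

# Hypothetical toy frame standing in for the question's data.
df = pd.DataFrame({
    'Doctor': ['A', 'A', 'A', 'B'],
    'Start': pd.to_datetime(['2020-01-18 12:00:00', '2020-01-18 12:00:00',
                             '2020-01-19 13:00:00', '2020-01-18 12:00:00']),
    'NoShow': ['no', 'yes', 'no', 'yes'],
})

day = df['Start'].dt.date.rename('Day')
out = (df.assign(new=df['NoShow'].eq('yes').astype(int))
         .groupby(['Doctor', day])
         .agg(No_of_slots=('Start', 'nunique'),
              No_of_bookings=('Start', 'size'),
              No_of_NoShow=('new', 'sum'))
         .reset_index())
print(out)
```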

Create a new column by comparing the previous value of another column in pandas

I have a data frame as shown below
Doctor Start B_ID Session Finish
A 2020-01-18 12:00:00 1 S1 2020-01-18 12:33:00
A 2020-01-18 12:30:00 2 S1 2020-01-18 12:52:00
A 2020-01-18 13:00:00 3 S1 2020-01-18 13:23:00
A 2020-01-18 13:00:00 4 S1 2020-01-18 13:37:00
A 2020-01-18 13:30:00 5 S1 2020-01-18 13:56:00
A 2020-01-18 14:00:00 6 S1 2020-01-18 14:15:00
A 2020-01-18 14:00:00 7 S1 2020-01-18 14:28:00
A 2020-01-18 14:00:00 8 S1 2020-01-18 14:40:00
A 2020-01-18 14:00:00 9 S1 2020-01-18 15:01:00
A 2020-01-19 12:00:00 12 S2 2020-01-19 12:20:00
A 2020-01-19 12:30:00 13 S2 2020-01-19 12:40:00
A 2020-01-19 13:00:00 14 S2 2020-01-19 13:20:00
A 2020-01-19 13:30:00 15 S2 2020-01-19 13:40:00
A 2020-01-19 14:00:00 16 S2 2020-01-19 14:10:00
A 2020-01-19 14:00:00 17 S2 2020-01-19 14:20:00
A 2020-01-19 14:00:00 19 S2 2020-01-19 14:40:00
From the above data frame I would like to create a column called "Actual_start" based on the previous Finish and the current Start time within the same session.
Steps
if previous finish > current start:
df['Actual_start'] = df['previous finish']
else:
df['Actual_start'] = df['Start']
Expected output:
Doctor Start B_ID Session Finish Actual_start
A 2020-01-18 12:00:00 1 S1 2020-01-18 12:33:00 2020-01-18 12:00:00
A 2020-01-18 12:30:00 2 S1 2020-01-18 12:52:00 2020-01-18 12:33:00
A 2020-01-18 13:00:00 3 S1 2020-01-18 13:23:00 2020-01-18 13:00:00
A 2020-01-18 13:00:00 4 S1 2020-01-18 13:37:00 2020-01-18 13:23:00
A 2020-01-18 13:30:00 5 S1 2020-01-18 13:56:00 2020-01-18 13:37:00
A 2020-01-18 14:00:00 6 S1 2020-01-18 14:15:00 2020-01-18 14:00:00
A 2020-01-18 14:00:00 7 S1 2020-01-18 14:28:00 2020-01-18 14:15:00
A 2020-01-18 14:00:00 8 S1 2020-01-18 14:40:00 2020-01-18 14:28:00
A 2020-01-18 14:00:00 9 S1 2020-01-18 15:01:00 2020-01-18 14:40:00
A 2020-01-19 12:00:00 12 S2 2020-01-19 12:20:00 2020-01-19 12:00:00
A 2020-01-19 12:30:00 13 S2 2020-01-19 12:40:00 2020-01-19 12:30:00
A 2020-01-19 13:00:00 14 S2 2020-01-19 13:20:00 2020-01-19 13:00:00
A 2020-01-19 13:30:00 15 S2 2020-01-19 13:40:00 2020-01-19 13:30:00
A 2020-01-19 14:00:00 16 S2 2020-01-19 14:10:00 2020-01-19 14:00:00
A 2020-01-19 14:00:00 17 S2 2020-01-19 14:20:00 2020-01-19 14:10:00
A 2020-01-19 14:00:00 19 S2 2020-01-19 14:40:00 2020-01-19 14:20:00
Use DataFrameGroupBy.shift to get the previous Finish within each Session, then take the later of that and Start with np.where (the first row of each group has no previous Finish, so its own Start is kept):
import numpy as np

df['Start'] = pd.to_datetime(df['Start'])
df['Finish'] = pd.to_datetime(df['Finish'])
s = df.groupby('Session')['Finish'].shift()
df['Actual_start'] = np.where(s.gt(df['Start']), s, df['Start'])
print (df)
Doctor Start B_ID Session Finish \
0 A 2020-01-18 12:00:00 1 S1 2020-01-18 12:33:00
1 A 2020-01-18 12:30:00 2 S1 2020-01-18 12:52:00
2 A 2020-01-18 13:00:00 3 S1 2020-01-18 13:23:00
3 A 2020-01-18 13:00:00 4 S1 2020-01-18 13:37:00
4 A 2020-01-18 13:30:00 5 S1 2020-01-18 13:56:00
5 A 2020-01-18 14:00:00 6 S1 2020-01-18 14:15:00
6 A 2020-01-18 14:00:00 7 S1 2020-01-18 14:28:00
7 A 2020-01-18 14:00:00 8 S1 2020-01-18 14:40:00
8 A 2020-01-18 14:00:00 9 S1 2020-01-18 15:01:00
9 A 2020-01-19 12:00:00 12 S2 2020-01-19 12:20:00
10 A 2020-01-19 12:30:00 13 S2 2020-01-19 12:40:00
11 A 2020-01-19 13:00:00 14 S2 2020-01-19 13:20:00
12 A 2020-01-19 13:30:00 15 S2 2020-01-19 13:40:00
13 A 2020-01-19 14:00:00 16 S2 2020-01-19 14:10:00
14 A 2020-01-19 14:00:00 17 S2 2020-01-19 14:20:00
15 A 2020-01-19 14:00:00 19 S2 2020-01-19 14:40:00
Actual_start
0 2020-01-18 12:00:00
1 2020-01-18 12:33:00
2 2020-01-18 13:00:00
3 2020-01-18 13:23:00
4 2020-01-18 13:37:00
5 2020-01-18 14:00:00
6 2020-01-18 14:15:00
7 2020-01-18 14:28:00
8 2020-01-18 14:40:00
9 2020-01-19 12:00:00
10 2020-01-19 12:30:00
11 2020-01-19 13:00:00
12 2020-01-19 13:30:00
13 2020-01-19 14:00:00
14 2020-01-19 14:10:00
15 2020-01-19 14:20:00
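The shift/np.where pattern can be sketched on a hypothetical mini-frame:

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame: each appointment starts at its scheduled Start
# unless the previous appointment in the same Session finished later.
df = pd.DataFrame({
    'Session': ['S1', 'S1', 'S1', 'S2'],
    'Start': pd.to_datetime(['2020-01-18 12:00', '2020-01-18 12:30',
                             '2020-01-18 13:00', '2020-01-19 12:00']),
    'Finish': pd.to_datetime(['2020-01-18 12:33', '2020-01-18 12:52',
                              '2020-01-18 13:23', '2020-01-19 12:20']),
})

# previous Finish within each session; NaT on the first row of each group
prev_finish = df.groupby('Session')['Finish'].shift()
df['Actual_start'] = np.where(prev_finish.gt(df['Start']), prev_finish, df['Start'])
print(df['Actual_start'].tolist())
```

Only the second row is pushed back (its predecessor finished at 12:33, after its 12:30 Start); NaT comparisons are False, so first rows keep their own Start.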

Groupby count in pandas with multiple specific conditions

I have a data frame as shown below.
Doctor Appointment B_ID No_Show
A 2020-01-18 12:00:00 1 0.2
A 2020-01-18 12:30:00 2 0.3
A 2020-01-18 13:00:00 3 0.8
A 2020-01-18 13:00:00 4 0.3
A 2020-01-18 13:30:00 5 0.6
A 2020-01-18 14:00:00 6 0.8
A 2020-01-18 14:00:00 7 0.9
A 2020-01-18 14:00:00 8 0.4
A 2020-01-18 14:00:00 9 0.6
A 2020-01-19 12:00:00 12 0.9
A 2020-01-19 12:00:00 13 0.5
A 2020-01-19 13:00:00 14 0.3
A 2020-01-19 13:00:00 15 0.7
A 2020-01-19 14:00:00 16 0.6
A 2020-01-19 14:00:00 17 0.8
A 2020-01-19 14:00:00 19 0.3
From the above I would like to prepare the below data frame, where
No_Show = probability of a no-show.
Expected output:
Doctor Appointment B_ID No_Show Session slot_num Patient_count
A 2020-01-18 12:00:00 1 0.2 S1 1 1
A 2020-01-18 12:30:00 2 0.3 S1 2 1
A 2020-01-18 13:00:00 3 0.8 S1 3 1
A 2020-01-18 13:00:00 4 0.3 S1 3 2
A 2020-01-18 13:30:00 5 0.6 S1 4 1
A 2020-01-18 14:00:00 6 0.8 S1 5 1
A 2020-01-18 14:00:00 7 0.9 S1 5 2
A 2020-01-18 14:00:00 8 0.4 S1 5 3
A 2020-01-18 14:00:00 9 0.6 S1 5 4
A 2020-01-19 12:00:00 12 0.9 S2 1 1
A 2020-01-19 12:00:00 13 0.5 S2 1 2
A 2020-01-19 13:00:00 14 0.3 S2 2 1
A 2020-01-19 13:00:00 15 0.7 S2 2 2
A 2020-01-19 14:00:00 16 0.6 S2 3 1
A 2020-01-19 14:00:00 17 0.8 S2 3 2
A 2020-01-19 14:00:00 19 0.3 S2 3 3
Explanation:
Session = consider one session per day.
slot_num = slot of that day (each slot is assumed to be 30 minutes long).
Patient_count = number of patients in the same session and same slot.
Series.factorize is used to build Session (prepending 'S' after converting the codes back to a Series of strings); the same idea, wrapped in a small function passed to GroupBy.transform, creates slot_num, and GroupBy.cumcount adds Patient_count:
df['Appointment'] = pd.to_datetime(df['Appointment'])
dates = df['Appointment'].dt.date
df['Session'] = 'S' + pd.Series(dates.factorize()[0] + 1, index=df.index).astype(str)
f = lambda x: pd.factorize(x)[0]
df['slot_num'] = df.groupby(['Doctor', 'Session'])['Appointment'].transform(f) + 1
df['Patient_count'] = df.groupby(['Doctor', 'Session', 'slot_num']).cumcount() + 1
print (df)
Doctor Appointment B_ID No_Show Session slot_num Patient_count
0 A 2020-01-18 12:00:00 1 0.2 S1 1 1
1 A 2020-01-18 12:30:00 2 0.3 S1 2 1
2 A 2020-01-18 13:00:00 3 0.8 S1 3 1
3 A 2020-01-18 13:00:00 4 0.3 S1 3 2
4 A 2020-01-18 13:30:00 5 0.6 S1 4 1
5 A 2020-01-18 14:00:00 6 0.8 S1 5 1
6 A 2020-01-18 14:00:00 7 0.9 S1 5 2
7 A 2020-01-18 14:00:00 8 0.4 S1 5 3
8 A 2020-01-18 14:00:00 9 0.6 S1 5 4
9 A 2020-01-19 12:00:00 12 0.9 S2 1 1
10 A 2020-01-19 12:00:00 13 0.5 S2 1 2
11 A 2020-01-19 13:00:00 14 0.3 S2 2 1
12 A 2020-01-19 13:00:00 15 0.7 S2 2 2
13 A 2020-01-19 14:00:00 16 0.6 S2 3 1
14 A 2020-01-19 14:00:00 17 0.8 S2 3 2
15 A 2020-01-19 14:00:00 19 0.3 S2 3 3
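The three steps can be sketched end-to-end on a hypothetical mini-frame with one duplicated slot time:

```python
import pandas as pd

# Hypothetical mini-frame: one doctor, two days, one duplicated slot time.
df = pd.DataFrame({
    'Doctor': ['A', 'A', 'A', 'A'],
    'Appointment': pd.to_datetime(['2020-01-18 12:00', '2020-01-18 12:30',
                                   '2020-01-18 12:30', '2020-01-19 12:00']),
})

# Session: consecutive number per distinct day
dates = df['Appointment'].dt.date
df['Session'] = 'S' + pd.Series(dates.factorize()[0] + 1, index=df.index).astype(str)
# slot_num: consecutive number per distinct time inside each session
df['slot_num'] = df.groupby(['Doctor', 'Session'])['Appointment'] \
                   .transform(lambda x: pd.factorize(x)[0]) + 1
# Patient_count: running count inside each slot
df['Patient_count'] = df.groupby(['Doctor', 'Session', 'slot_num']).cumcount() + 1
print(df[['Session', 'slot_num', 'Patient_count']].values.tolist())
```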

Groupby based on a datetime column at the minute level in pandas

I have a data frame as shown below.
Doctor Appointment Booking_ID
A 2020-01-18 12:00:00 1
A 2020-01-18 12:30:00 2
A 2020-01-18 13:00:00 3
A 2020-01-18 13:00:00 4
B 2020-01-18 12:00:00 5
B 2020-01-18 12:30:00 6
B 2020-01-18 13:00:00 7
B 2020-01-18 13:00:00 8
B 2020-01-18 13:00:00 9
B 2020-01-18 16:30:00 10
A 2020-01-19 12:00:00 11
A 2020-01-19 12:30:00 12
A 2020-01-19 13:00:00 13
A 2020-01-19 13:30:00 14
A 2020-01-19 14:00:00 15
A 2020-01-19 14:00:00 16
A 2020-01-19 14:00:00 17
A 2020-01-19 14:00:00 18
B 2020-01-19 12:00:00 19
B 2020-01-19 12:30:00 20
B 2020-01-19 13:00:00 21
B 2020-01-19 13:30:00 22
B 2020-01-19 14:00:00 23
B 2020-01-19 13:30:00 24
B 2020-01-19 15:00:00 25
B 2020-01-18 15:30:00 26
From the above I would like to find out the number of bookings at the same time for the same doctor.
Expected Output:
Doctor Appointment Booking_ID Number_of_Booking
A 2020-01-18 12:00:00 1 1
A 2020-01-18 12:30:00 2 1
A 2020-01-18 13:00:00 3 2
A 2020-01-18 13:00:00 4 2
B 2020-01-18 12:00:00 5 1
B 2020-01-18 12:30:00 6 1
B 2020-01-18 13:00:00 7 3
B 2020-01-18 13:00:00 8 3
B 2020-01-18 13:00:00 9 3
B 2020-01-18 16:30:00 10 1
A 2020-01-19 12:00:00 11 1
A 2020-01-19 12:30:00 12 1
A 2020-01-19 13:00:00 13 1
A 2020-01-19 13:30:00 14 1
A 2020-01-19 14:00:00 15 4
A 2020-01-19 14:00:00 16 4
A 2020-01-19 14:00:00 17 4
A 2020-01-19 14:00:00 18 4
B 2020-01-19 12:00:00 19 1
B 2020-01-19 12:30:00 20 1
B 2020-01-19 13:00:00 21 1
B 2020-01-19 13:30:00 22 2
B 2020-01-19 14:00:00 23 1
B 2020-01-19 13:30:00 24 2
B 2020-01-19 15:00:00 25 1
B 2020-01-18 15:30:00 26 1
For example, at 2020-01-19 13:30:00 doctor B has two bookings, as shown below.
Doctor Appointment Booking_ID
B 2020-01-19 13:30:00 22
B 2020-01-19 13:30:00 24
So the output will be as shown below
Doctor Appointment Booking_ID Number_of_Booking
B 2020-01-19 13:30:00 22 2
B 2020-01-19 13:30:00 24 2
For the first, use GroupBy.transform with 'size':
df['Number_of_Booking'] = df.groupby(['Doctor', 'Appointment'])['Booking_ID'].transform('size')
print (df.head())
Doctor Appointment Booking_ID Number_of_Booking
0 A 2020-01-18 12:00:00 1 1
1 A 2020-01-18 12:30:00 2 1
2 A 2020-01-18 13:00:00 3 2
3 A 2020-01-18 13:00:00 4 2
4 B 2020-01-18 12:00:00 5 1
For the second, if the whole DataFrame holds a single Doctor/Appointment combination, as in the sample, you can simply assign the length of the DataFrame:
df['Number_of_Booking'] = len(df)
print (df)
Doctor Appointment Booking_ID Number_of_Booking
0 B 2020-01-19 13:30:00 22 2
1 B 2020-01-19 13:30:00 24 2
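Note that the transform('size') snippet from the first part already handles this two-row case, so the len(df) shortcut is only a convenience. A quick self-contained check on a hypothetical frame:

```python
import pandas as pd

# The two-row sample from the example above, as a hypothetical standalone frame.
df = pd.DataFrame({
    'Doctor': ['B', 'B'],
    'Appointment': pd.to_datetime(['2020-01-19 13:30:00', '2020-01-19 13:30:00']),
    'Booking_ID': [22, 24],
})

df['Number_of_Booking'] = (df.groupby(['Doctor', 'Appointment'])['Booking_ID']
                             .transform('size'))
print(df['Number_of_Booking'].tolist())  # → [2, 2]
```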