Create a new column based on a groupby of a datetime column at the date level in pandas

I have a data frame as shown below.
Doctor Appointment Booking_ID
A 2020-01-18 12:00:00 1
A 2020-01-18 12:30:00 2
A 2020-01-18 13:00:00 3
A 2020-01-18 13:00:00 4
A 2020-01-19 13:00:00 13
A 2020-01-19 13:30:00 14
B 2020-01-18 12:00:00 5
B 2020-01-18 12:30:00 6
B 2020-01-18 13:00:00 7
B 2020-01-25 12:30:00 6
B 2020-01-25 13:00:00 7
C 2020-01-19 12:00:00 19
C 2020-01-19 12:30:00 20
C 2020-01-19 13:00:00 21
C 2020-01-22 12:30:00 20
C 2020-01-22 13:00:00 21
From the above I would like to create a column called Session as shown below.
Expected Output:
Doctor Appointment Booking_ID Session
A 2020-01-18 12:00:00 1 S1
A 2020-01-18 12:30:00 2 S1
A 2020-01-18 13:00:00 3 S1
A 2020-01-18 13:00:00 4 S1
A 2020-01-19 13:00:00 13 S2
A 2020-01-19 13:30:00 14 S2
B 2020-01-18 12:00:00 5 S3
B 2020-01-18 12:30:00 6 S3
B 2020-01-18 13:00:00 7 S3
B 2020-01-25 12:30:00 6 S4
B 2020-01-25 13:00:00 7 S4
C 2020-01-19 12:00:00 19 S5
C 2020-01-19 12:30:00 20 S5
C 2020-01-19 13:00:00 21 S5
C 2020-01-22 12:30:00 20 S6
C 2020-01-22 13:00:00 21 S6
Session should be different for each doctor and each Appointment date (at the day level).
I tried the below:
df = df.sort_values(['Doctor', 'Appointment'], ascending=True)
df['Appointment'] = pd.to_datetime(df['Appointment'])
dates = df['Appointment'].dt.date
df['Session'] = 'S' + pd.Series(dates.factorize()[0] + 1, index=df.index).astype(str)
But it assigns sessions based only on dates. I would like it to consider the doctor as well.

IIUC, use GroupBy.ngroup with Series.dt.date:
df['Session'] = 'S' + (df.groupby(['Doctor', pd.to_datetime(df['Appointment']).dt.date])
                         .ngroup()
                         .add(1)
                         .astype(str))
Doctor Appointment Booking_ID Session
0 A 2020-01-18 12:00:00 1 S1
1 A 2020-01-18 12:30:00 2 S1
2 A 2020-01-18 13:00:00 3 S1
3 A 2020-01-18 13:00:00 4 S1
4 A 2020-01-19 13:00:00 13 S2
5 A 2020-01-19 13:30:00 14 S2
6 B 2020-01-18 12:00:00 5 S3
7 B 2020-01-18 12:30:00 6 S3
8 B 2020-01-18 13:00:00 7 S3
9 B 2020-01-25 12:30:00 6 S4
10 B 2020-01-25 13:00:00 7 S4
11 C 2020-01-19 12:00:00 19 S5
12 C 2020-01-19 12:30:00 20 S5
13 C 2020-01-19 13:00:00 21 S5
14 C 2020-01-22 12:30:00 20 S6
15 C 2020-01-22 13:00:00 21 S6
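As a self-contained sketch (a hypothetical mini-frame standing in for the question's data), the ngroup() call numbers each (Doctor, day) pair consecutively:

```python
import pandas as pd

# Hypothetical mini-frame standing in for the question's data.
df = pd.DataFrame({
    'Doctor': ['A', 'A', 'A', 'B', 'B'],
    'Appointment': pd.to_datetime(['2020-01-18 12:00:00', '2020-01-18 12:30:00',
                                   '2020-01-19 13:00:00', '2020-01-18 12:00:00',
                                   '2020-01-25 12:30:00']),
    'Booking_ID': [1, 2, 13, 5, 6],
})

# ngroup() numbers each (Doctor, day) group 0, 1, 2, ... in sorted key order
df['Session'] = 'S' + (df.groupby(['Doctor', df['Appointment'].dt.date])
                         .ngroup()
                         .add(1)
                         .astype(str))
print(df['Session'].tolist())  # → ['S1', 'S1', 'S2', 'S3', 'S4']
```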

You can go with sort_values, then start a new session wherever the day changes or the doctor differs from the previous row, using diff and shift:
df['Appointment'] = pd.to_datetime(df['Appointment'])  # ensure datetime dtype
df = df.sort_values(['Doctor', 'Appointment'], ascending=True)
df['Session'] = 'S' + (df['Appointment'].dt.date.diff().ne(pd.Timedelta(days=0))
                       | df['Doctor'].ne(df['Doctor'].shift())).cumsum().astype(str)
print (df)
Doctor Appointment Booking_ID Session
0 A 2020-01-18 12:00:00 1 S1
1 A 2020-01-18 12:30:00 2 S1
2 A 2020-01-18 13:00:00 3 S1
3 A 2020-01-18 13:00:00 4 S1
4 A 2020-01-19 13:00:00 13 S2
5 A 2020-01-19 13:30:00 14 S2
6 B 2020-01-18 12:00:00 5 S3
7 B 2020-01-18 12:30:00 6 S3
8 B 2020-01-18 13:00:00 7 S3
9 B 2020-01-25 12:30:00 6 S4
10 B 2020-01-25 13:00:00 7 S4
11 C 2020-01-19 12:00:00 19 S5
12 C 2020-01-19 12:30:00 20 S5
13 C 2020-01-19 13:00:00 21 S5
14 C 2020-01-22 12:30:00 20 S6
15 C 2020-01-22 13:00:00 21 S6
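A runnable sketch of the same idea on a hypothetical mini-frame, using dt.normalize() instead of dt.date so the day difference stays a Timedelta throughout:

```python
import pandas as pd

# Hypothetical mini-frame; dt.normalize() truncates each timestamp to midnight,
# so a nonzero diff marks a change of day.
df = pd.DataFrame({
    'Doctor': ['A', 'A', 'A', 'B'],
    'Appointment': pd.to_datetime(['2020-01-18 12:00:00', '2020-01-18 12:30:00',
                                   '2020-01-19 13:00:00', '2020-01-18 12:00:00']),
})
df = df.sort_values(['Doctor', 'Appointment'])

day = df['Appointment'].dt.normalize()
# new session whenever the day or the doctor changes (first row is always True)
new_session = day.diff().ne(pd.Timedelta(0)) | df['Doctor'].ne(df['Doctor'].shift())
df['Session'] = 'S' + new_session.cumsum().astype(str)
print(df['Session'].tolist())  # → ['S1', 'S1', 'S2', 'S3']
```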

Another approach using idxmin, with a slightly different result (the session labels come from row indices rather than a consecutive count):
df['Session'] = 'S' + (df.groupby(['Doctor', df.Appointment.dt.date])
                         .transform('idxmin').iloc[:, 0] + 1).astype('str')

This is groupby().ngroup():
# convert to datetime
df.Appointment = pd.to_datetime(df.Appointment)
df['Session'] = 'S' + (df.groupby(['Doctor', df.Appointment.dt.date]).ngroup()+1).astype(str)
Output:
Doctor Appointment Booking_ID Session
0 A 2020-01-18 12:00:00 1 S1
1 A 2020-01-18 12:30:00 2 S1
2 A 2020-01-18 13:00:00 3 S1
3 A 2020-01-18 13:00:00 4 S1
4 A 2020-01-19 13:00:00 13 S2
5 A 2020-01-19 13:30:00 14 S2
6 B 2020-01-18 12:00:00 5 S3
7 B 2020-01-18 12:30:00 6 S3
8 B 2020-01-18 13:00:00 7 S3
9 B 2020-01-25 12:30:00 6 S4
10 B 2020-01-25 13:00:00 7 S4
11 C 2020-01-19 12:00:00 19 S5
12 C 2020-01-19 12:30:00 20 S5
13 C 2020-01-19 13:00:00 21 S5
14 C 2020-01-22 12:30:00 20 S6
15 C 2020-01-22 13:00:00 21 S6

Related

Generate random numbers between 2 and 40 with mean 20 as a column in pandas

I have a data frame as shown below
session slot_num appt_time
s1 1 2020-01-06 09:00:00
s1 2 2020-01-06 09:20:00
s1 3 2020-01-06 09:40:00
s1 3 2020-01-06 09:40:00
s1 4 2020-01-06 10:00:00
s1 4 2020-01-06 10:00:00
s2 1 2020-01-06 08:20:00
s2 2 2020-01-06 08:40:00
s2 2 2020-01-06 08:40:00
s2 3 2020-01-06 09:00:00
s2 4 2020-01-06 09:20:00
s2 5 2020-01-06 09:40:00
s2 5 2020-01-06 09:40:00
s2 6 2020-01-06 10:00:00
s3 1 2020-01-09 13:00:00
s3 1 2020-01-09 13:00:00
s3 2 2020-01-09 13:20:00
s3 3 2020-01-09 13:40:00
In the above I would like to add a column called service_time.
service_time should contain random integers between 2 and 40 with mean 20 for each session.
I would prefer the random numbers to follow a normal distribution with mean 20, standard deviation 10, minimum 2 and maximum 40.
Expected output:
session slot_num appt_time service_time
s1 1 2020-01-06 09:00:00 30
s1 2 2020-01-06 09:20:00 10
s1 3 2020-01-06 09:40:00 15
s1 3 2020-01-06 09:40:00 35
s1 4 2020-01-06 10:00:00 20
s1 4 2020-01-06 10:00:00 10
s2 1 2020-01-06 08:20:00 15
s2 2 2020-01-06 08:40:00 20
s2 2 2020-01-06 08:40:00 25
s2 3 2020-01-06 09:00:00 30
s2 4 2020-01-06 09:20:00 20
s2 5 2020-01-06 09:40:00 8
s2 5 2020-01-06 09:40:00 40
s2 6 2020-01-06 10:00:00 2
s3 1 2020-01-09 13:00:00 4
s3 1 2020-01-09 13:00:00 32
s3 2 2020-01-09 13:20:00 26
s3 3 2020-01-09 13:40:00 18
Note: this is just one random combination that satisfies the minimum, maximum and mean criteria mentioned above.
One possible solution with a custom function:
# https://stackoverflow.com/a/39435600/2901002
import numpy as np

def gen_avg(n, expected_avg=20, a=2, b=40):
    # redraw until the integer sample hits the target mean exactly
    while True:
        l = np.random.randint(a, b + 1, size=n)  # randint's upper bound is exclusive
        if np.mean(l) == expected_avg:
            return l

df['service_time'] = df.groupby('session')['session'].transform(lambda x: gen_avg(len(x)))
print (df)
session slot_num appt_time service_time
0 s1 1 2020-01-06 09:00:00 31
1 s1 2 2020-01-06 09:20:00 9
2 s1 3 2020-01-06 09:40:00 23
3 s1 3 2020-01-06 09:40:00 37
4 s1 4 2020-01-06 10:00:00 6
5 s1 4 2020-01-06 10:00:00 14
6 s2 1 2020-01-06 08:20:00 33
7 s2 2 2020-01-06 08:40:00 29
8 s2 2 2020-01-06 08:40:00 18
9 s2 3 2020-01-06 09:00:00 32
10 s2 4 2020-01-06 09:20:00 9
11 s2 5 2020-01-06 09:40:00 26
12 s2 5 2020-01-06 09:40:00 10
13 s2 6 2020-01-06 10:00:00 3
14 s3 1 2020-01-09 13:00:00 19
15 s3 1 2020-01-09 13:00:00 22
16 s3 2 2020-01-09 13:20:00 5
17 s3 3 2020-01-09 13:40:00 34
Here's a solution with NumPy's new Generator infrastructure. See the documentation for a discussion of the differences between this and the older RandomState infrastructure.
import numpy as np
from numpy.random import default_rng

# assuming df is the name of your dataframe
n = len(df)
# set up the random number generator
rng = default_rng()
# sample more than enough values, since some will be rejected below
vals = rng.normal(loc=20., scale=10., size=2 * n)
# filter values according to the cut-off conditions
vals = vals[(2 <= vals) & (vals <= 40)]
# add n random values to the dataframe
df['service_time'] = vals[:n]
The normal distribution has an unbounded range, so if you're bounding between 2 and 40 the distribution isn't normal. An alternative which is bounded, and avoids acceptance/rejection schemes, is to use the triangular distribution (see Wikipedia for details). Since the mean of a triangular distribution is (left + mode + right) / 3, with left = 2 and right = 40 you would set mode = 18 to get the desired mean of 20.
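The triangular suggestion can be sketched directly with NumPy's Generator (the bounds and mode follow the paragraph above; the seed and sample size are arbitrary choices for the sketch):

```python
import numpy as np
from numpy.random import default_rng

rng = default_rng(42)  # seeded only so the sketch is reproducible

# Bounded on [2, 40]; the mean of a triangular distribution is
# (left + mode + right) / 3, so mode = 18 gives (2 + 18 + 40) / 3 = 20.
vals = rng.triangular(left=2.0, mode=18.0, right=40.0, size=100_000)
print(vals.min() >= 2, vals.max() <= 40, vals.mean())
```

No acceptance/rejection loop is needed, and every draw lands inside the bounds by construction.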

Groupby count on unique date time in pandas

I have a data frame as shown below
Doctor Start B_ID Session Finish NoShow
A 2020-01-18 12:00:00 1 S1 2020-01-18 12:33:00 no
A 2020-01-18 12:20:00 2 S1 2020-01-18 12:52:00 no
A 2020-01-18 13:00:00 3 S1 2020-01-18 13:23:00 no
A 2020-01-18 13:00:00 4 S1 2020-01-18 13:37:00 yes
A 2020-01-18 13:35:00 5 S1 2020-01-18 13:56:00 no
A 2020-01-18 14:10:00 6 S1 2020-01-18 14:15:00 no
A 2020-01-18 14:10:00 7 S1 2020-01-18 14:28:00 yes
A 2020-01-18 14:10:00 8 S1 2020-01-18 14:40:00 yes
A 2020-01-18 14:10:00 9 S1 2020-01-18 15:01:00 no
A 2020-01-19 12:00:00 12 S2 2020-01-19 12:20:00 no
A 2020-01-19 12:30:00 13 S2 2020-01-19 12:40:00 no
A 2020-01-19 13:00:00 14 S2 2020-01-19 13:20:00 yes
A 2020-01-19 13:40:00 15 S2 2020-01-19 13:46:00 no
A 2020-01-19 14:00:00 16 S2 2020-01-19 14:10:00 yes
A 2020-01-19 14:00:00 17 S2 2020-01-19 14:20:00 no
A 2020-01-19 14:00:00 19 S2 2020-01-19 14:40:00 yes
B 2020-01-18 12:00:00 21 S3 2020-01-18 12:33:00 no
B 2020-01-18 12:30:00 22 S3 2020-01-18 12:52:00 no
B 2020-01-18 13:10:00 23 S3 2020-01-18 13:25:00 no
B 2020-01-18 13:10:00 24 S3 2020-01-18 13:39:00 no
B 2020-01-18 13:30:00 25 S3 2020-01-18 13:56:00 yes
B 2020-01-18 14:05:00 26 S3 2020-01-18 14:15:00 no
B 2020-01-18 14:30:00 27 S3 2020-01-18 14:48:00 yes
From the above I would like to prepare below data frame
Expected Output:
Doctor Day No_of_slots No_of_bookings No_of_NoShow
A 2020-01-18 5 9 3
A 2020-01-19 5 7 3
B 2020-01-18 6 7 2
Where
No_of_slots = Total number of slots based on unique Start time
No_of_bookings = Total number of bookings
No_of_NoShow = Number of NoShow == 'yes'
Use GroupBy.agg with named aggregation; to count the 'yes' values, sum a helper column created with DataFrame.assign by comparing with Series.eq and converting to integers with Series.view:
df['Start'] = pd.to_datetime(df['Start'])
df['Finish'] = pd.to_datetime(df['Finish'])
d = df['Start'].dt.date.rename('Day')
df1 = (df.assign(new=df['NoShow'].eq('yes').view('i1'))
         .groupby(['Doctor', d])
         .agg(No_of_slots=('Start', 'nunique'),
              No_of_bookings=('Start', 'size'),
              No_of_NoShow=('new', 'sum'))
         .reset_index())
print (df1)
Doctor Day No_of_slots No_of_bookings No_of_NoShow
0 A 2020-01-18 5 9 3
1 A 2020-01-19 5 7 3
2 B 2020-01-18 6 7 2
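The same named-aggregation pattern on a hypothetical mini-frame, using astype(int) rather than view('i1') (astype is the more portable spelling on recent pandas versions):

```python
import pandas as pd

# Hypothetical toy frame standing in for the question's data.
df = pd.DataFrame({
    'Doctor': ['A', 'A', 'A', 'B'],
    'Start': pd.to_datetime(['2020-01-18 12:00:00', '2020-01-18 12:00:00',
                             '2020-01-19 13:00:00', '2020-01-18 12:00:00']),
    'NoShow': ['no', 'yes', 'no', 'yes'],
})

day = df['Start'].dt.date.rename('Day')
out = (df.assign(new=df['NoShow'].eq('yes').astype(int))
         .groupby(['Doctor', day])
         .agg(No_of_slots=('Start', 'nunique'),
              No_of_bookings=('Start', 'size'),
              No_of_NoShow=('new', 'sum'))
         .reset_index())
print(out)
```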

Create a new column by comparing the previous value of another column in pandas

I have a data frame as shown below
Doctor Start B_ID Session Finish
A 2020-01-18 12:00:00 1 S1 2020-01-18 12:33:00
A 2020-01-18 12:30:00 2 S1 2020-01-18 12:52:00
A 2020-01-18 13:00:00 3 S1 2020-01-18 13:23:00
A 2020-01-18 13:00:00 4 S1 2020-01-18 13:37:00
A 2020-01-18 13:30:00 5 S1 2020-01-18 13:56:00
A 2020-01-18 14:00:00 6 S1 2020-01-18 14:15:00
A 2020-01-18 14:00:00 7 S1 2020-01-18 14:28:00
A 2020-01-18 14:00:00 8 S1 2020-01-18 14:40:00
A 2020-01-18 14:00:00 9 S1 2020-01-18 15:01:00
A 2020-01-19 12:00:00 12 S2 2020-01-19 12:20:00
A 2020-01-19 12:30:00 13 S2 2020-01-19 12:40:00
A 2020-01-19 13:00:00 14 S2 2020-01-19 13:20:00
A 2020-01-19 13:30:00 15 S2 2020-01-19 13:40:00
A 2020-01-19 14:00:00 16 S2 2020-01-19 14:10:00
A 2020-01-19 14:00:00 17 S2 2020-01-19 14:20:00
A 2020-01-19 14:00:00 19 S2 2020-01-19 14:40:00
From the above data frame I would like to create a column called "Actual_start" based on the previous Finish and the current Start time within the same session.
Steps
if previous finish > current start:
df['Actual_start'] = df['previous finish']
else:
df['Actual_start'] = df['Start']
Expected output:
Doctor Start B_ID Session Finish Actual_start
A 2020-01-18 12:00:00 1 S1 2020-01-18 12:33:00 2020-01-18 12:00:00
A 2020-01-18 12:30:00 2 S1 2020-01-18 12:52:00 2020-01-18 12:33:00
A 2020-01-18 13:00:00 3 S1 2020-01-18 13:23:00 2020-01-18 13:00:00
A 2020-01-18 13:00:00 4 S1 2020-01-18 13:37:00 2020-01-18 13:23:00
A 2020-01-18 13:30:00 5 S1 2020-01-18 13:56:00 2020-01-18 13:37:00
A 2020-01-18 14:00:00 6 S1 2020-01-18 14:15:00 2020-01-18 14:00:00
A 2020-01-18 14:00:00 7 S1 2020-01-18 14:28:00 2020-01-18 14:15:00
A 2020-01-18 14:00:00 8 S1 2020-01-18 14:40:00 2020-01-18 14:28:00
A 2020-01-18 14:00:00 9 S1 2020-01-18 15:01:00 2020-01-18 14:40:00
A 2020-01-19 12:00:00 12 S2 2020-01-19 12:20:00 2020-01-19 12:00:00
A 2020-01-19 12:30:00 13 S2 2020-01-19 12:40:00 2020-01-19 12:30:00
A 2020-01-19 13:00:00 14 S2 2020-01-19 13:20:00 2020-01-19 13:00:00
A 2020-01-19 13:30:00 15 S2 2020-01-19 13:40:00 2020-01-19 13:30:00
A 2020-01-19 14:00:00 16 S2 2020-01-19 14:10:00 2020-01-19 14:00:00
A 2020-01-19 14:00:00 17 S2 2020-01-19 14:20:00 2020-01-19 14:10:00
A 2020-01-19 14:00:00 19 S2 2020-01-19 14:40:00 2020-01-19 14:20:00
Use DataFrameGroupBy.shift to get the previous Finish within each Session, then take the later of that and Start with np.where (the first row of each group has no previous Finish, so its own Start is kept):
import numpy as np

df['Start'] = pd.to_datetime(df['Start'])
df['Finish'] = pd.to_datetime(df['Finish'])
s = df.groupby('Session')['Finish'].shift()
df['Actual_start'] = np.where(s.gt(df['Start']), s, df['Start'])
print (df)
Doctor Start B_ID Session Finish \
0 A 2020-01-18 12:00:00 1 S1 2020-01-18 12:33:00
1 A 2020-01-18 12:30:00 2 S1 2020-01-18 12:52:00
2 A 2020-01-18 13:00:00 3 S1 2020-01-18 13:23:00
3 A 2020-01-18 13:00:00 4 S1 2020-01-18 13:37:00
4 A 2020-01-18 13:30:00 5 S1 2020-01-18 13:56:00
5 A 2020-01-18 14:00:00 6 S1 2020-01-18 14:15:00
6 A 2020-01-18 14:00:00 7 S1 2020-01-18 14:28:00
7 A 2020-01-18 14:00:00 8 S1 2020-01-18 14:40:00
8 A 2020-01-18 14:00:00 9 S1 2020-01-18 15:01:00
9 A 2020-01-19 12:00:00 12 S2 2020-01-19 12:20:00
10 A 2020-01-19 12:30:00 13 S2 2020-01-19 12:40:00
11 A 2020-01-19 13:00:00 14 S2 2020-01-19 13:20:00
12 A 2020-01-19 13:30:00 15 S2 2020-01-19 13:40:00
13 A 2020-01-19 14:00:00 16 S2 2020-01-19 14:10:00
14 A 2020-01-19 14:00:00 17 S2 2020-01-19 14:20:00
15 A 2020-01-19 14:00:00 19 S2 2020-01-19 14:40:00
Actual_start
0 2020-01-18 12:00:00
1 2020-01-18 12:33:00
2 2020-01-18 13:00:00
3 2020-01-18 13:23:00
4 2020-01-18 13:37:00
5 2020-01-18 14:00:00
6 2020-01-18 14:15:00
7 2020-01-18 14:28:00
8 2020-01-18 14:40:00
9 2020-01-19 12:00:00
10 2020-01-19 12:30:00
11 2020-01-19 13:00:00
12 2020-01-19 13:30:00
13 2020-01-19 14:00:00
14 2020-01-19 14:10:00
15 2020-01-19 14:20:00
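The shift/np.where pattern can be sketched on a hypothetical mini-frame:

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame: each appointment starts at its scheduled Start
# unless the previous appointment in the same Session finished later.
df = pd.DataFrame({
    'Session': ['S1', 'S1', 'S1', 'S2'],
    'Start': pd.to_datetime(['2020-01-18 12:00', '2020-01-18 12:30',
                             '2020-01-18 13:00', '2020-01-19 12:00']),
    'Finish': pd.to_datetime(['2020-01-18 12:33', '2020-01-18 12:52',
                              '2020-01-18 13:23', '2020-01-19 12:20']),
})

# previous Finish within each session; NaT on the first row of each group
prev_finish = df.groupby('Session')['Finish'].shift()
df['Actual_start'] = np.where(prev_finish.gt(df['Start']), prev_finish, df['Start'])
print(df['Actual_start'].tolist())
```

Only the second row is pushed back (its predecessor finished at 12:33, after its 12:30 Start); NaT comparisons are False, so first rows keep their own Start.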

Groupby count in pandas with multiple specific conditions

I have a data frame as shown below.
Doctor Appointment B_ID No_Show
A 2020-01-18 12:00:00 1 0.2
A 2020-01-18 12:30:00 2 0.3
A 2020-01-18 13:00:00 3 0.8
A 2020-01-18 13:00:00 4 0.3
A 2020-01-18 13:30:00 5 0.6
A 2020-01-18 14:00:00 6 0.8
A 2020-01-18 14:00:00 7 0.9
A 2020-01-18 14:00:00 8 0.4
A 2020-01-18 14:00:00 9 0.6
A 2020-01-19 12:00:00 12 0.9
A 2020-01-19 12:00:00 13 0.5
A 2020-01-19 13:00:00 14 0.3
A 2020-01-19 13:00:00 15 0.7
A 2020-01-19 14:00:00 16 0.6
A 2020-01-19 14:00:00 17 0.8
A 2020-01-19 14:00:00 19 0.3
From the above I would like to prepare the below data frame, where
No_Show = probability of a no-show.
Expected output:
Doctor Appointment B_ID No_Show Session slot_num Patient_count
A 2020-01-18 12:00:00 1 0.2 S1 1 1
A 2020-01-18 12:30:00 2 0.3 S1 2 1
A 2020-01-18 13:00:00 3 0.8 S1 3 1
A 2020-01-18 13:00:00 4 0.3 S1 3 2
A 2020-01-18 13:30:00 5 0.6 S1 4 1
A 2020-01-18 14:00:00 6 0.8 S1 5 1
A 2020-01-18 14:00:00 7 0.9 S1 5 2
A 2020-01-18 14:00:00 8 0.4 S1 5 3
A 2020-01-18 14:00:00 9 0.6 S1 5 4
A 2020-01-19 12:00:00 12 0.9 S2 1 1
A 2020-01-19 12:00:00 13 0.5 S2 1 2
A 2020-01-19 13:00:00 14 0.3 S2 2 1
A 2020-01-19 13:00:00 15 0.7 S2 2 2
A 2020-01-19 14:00:00 16 0.6 S2 3 1
A 2020-01-19 14:00:00 17 0.8 S2 3 2
A 2020-01-19 14:00:00 19 0.3 S2 3 3
Explanation:
Session = consider one session per day.
slot_num = slot of that day (each slot is assumed to be 30 minutes long).
Patient_count = number of patients in the same session and same slot.
Series.factorize is used to build Session (prepending 'S' after converting the codes back to a Series of strings); the same idea, wrapped in a small function passed to GroupBy.transform, creates slot_num, and GroupBy.cumcount adds Patient_count:
df['Appointment'] = pd.to_datetime(df['Appointment'])
dates = df['Appointment'].dt.date
df['Session'] = 'S' + pd.Series(dates.factorize()[0] + 1, index=df.index).astype(str)
f = lambda x: pd.factorize(x)[0]
df['slot_num'] = df.groupby(['Doctor', 'Session'])['Appointment'].transform(f) + 1
df['Patient_count'] = df.groupby(['Doctor', 'Session', 'slot_num']).cumcount() + 1
print (df)
Doctor Appointment B_ID No_Show Session slot_num Patient_count
0 A 2020-01-18 12:00:00 1 0.2 S1 1 1
1 A 2020-01-18 12:30:00 2 0.3 S1 2 1
2 A 2020-01-18 13:00:00 3 0.8 S1 3 1
3 A 2020-01-18 13:00:00 4 0.3 S1 3 2
4 A 2020-01-18 13:30:00 5 0.6 S1 4 1
5 A 2020-01-18 14:00:00 6 0.8 S1 5 1
6 A 2020-01-18 14:00:00 7 0.9 S1 5 2
7 A 2020-01-18 14:00:00 8 0.4 S1 5 3
8 A 2020-01-18 14:00:00 9 0.6 S1 5 4
9 A 2020-01-19 12:00:00 12 0.9 S2 1 1
10 A 2020-01-19 12:00:00 13 0.5 S2 1 2
11 A 2020-01-19 13:00:00 14 0.3 S2 2 1
12 A 2020-01-19 13:00:00 15 0.7 S2 2 2
13 A 2020-01-19 14:00:00 16 0.6 S2 3 1
14 A 2020-01-19 14:00:00 17 0.8 S2 3 2
15 A 2020-01-19 14:00:00 19 0.3 S2 3 3
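The three steps can be sketched end-to-end on a hypothetical mini-frame with one duplicated slot time:

```python
import pandas as pd

# Hypothetical mini-frame: one doctor, two days, one duplicated slot time.
df = pd.DataFrame({
    'Doctor': ['A', 'A', 'A', 'A'],
    'Appointment': pd.to_datetime(['2020-01-18 12:00', '2020-01-18 12:30',
                                   '2020-01-18 12:30', '2020-01-19 12:00']),
})

# Session: consecutive number per distinct day
dates = df['Appointment'].dt.date
df['Session'] = 'S' + pd.Series(dates.factorize()[0] + 1, index=df.index).astype(str)
# slot_num: consecutive number per distinct time inside each session
df['slot_num'] = df.groupby(['Doctor', 'Session'])['Appointment'] \
                   .transform(lambda x: pd.factorize(x)[0]) + 1
# Patient_count: running count inside each slot
df['Patient_count'] = df.groupby(['Doctor', 'Session', 'slot_num']).cumcount() + 1
print(df[['Session', 'slot_num', 'Patient_count']].values.tolist())
```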

Groupby based on a datetime column at the minute level in pandas

I have a data frame as shown below.
Doctor Appointment Booking_ID
A 2020-01-18 12:00:00 1
A 2020-01-18 12:30:00 2
A 2020-01-18 13:00:00 3
A 2020-01-18 13:00:00 4
B 2020-01-18 12:00:00 5
B 2020-01-18 12:30:00 6
B 2020-01-18 13:00:00 7
B 2020-01-18 13:00:00 8
B 2020-01-18 13:00:00 9
B 2020-01-18 16:30:00 10
A 2020-01-19 12:00:00 11
A 2020-01-19 12:30:00 12
A 2020-01-19 13:00:00 13
A 2020-01-19 13:30:00 14
A 2020-01-19 14:00:00 15
A 2020-01-19 14:00:00 16
A 2020-01-19 14:00:00 17
A 2020-01-19 14:00:00 18
B 2020-01-19 12:00:00 19
B 2020-01-19 12:30:00 20
B 2020-01-19 13:00:00 21
B 2020-01-19 13:30:00 22
B 2020-01-19 14:00:00 23
B 2020-01-19 13:30:00 24
B 2020-01-19 15:00:00 25
B 2020-01-18 15:30:00 26
From the above I would like to find out the number of bookings at the same time for the same doctor.
Expected Output:
Doctor Appointment Booking_ID Number_of_Booking
A 2020-01-18 12:00:00 1 1
A 2020-01-18 12:30:00 2 1
A 2020-01-18 13:00:00 3 2
A 2020-01-18 13:00:00 4 2
B 2020-01-18 12:00:00 5 1
B 2020-01-18 12:30:00 6 1
B 2020-01-18 13:00:00 7 3
B 2020-01-18 13:00:00 8 3
B 2020-01-18 13:00:00 9 3
B 2020-01-18 16:30:00 10 1
A 2020-01-19 12:00:00 11 1
A 2020-01-19 12:30:00 12 1
A 2020-01-19 13:00:00 13 1
A 2020-01-19 13:30:00 14 1
A 2020-01-19 14:00:00 15 4
A 2020-01-19 14:00:00 16 4
A 2020-01-19 14:00:00 17 4
A 2020-01-19 14:00:00 18 4
B 2020-01-19 12:00:00 19 1
B 2020-01-19 12:30:00 20 1
B 2020-01-19 13:00:00 21 1
B 2020-01-19 13:30:00 22 2
B 2020-01-19 14:00:00 23 1
B 2020-01-19 13:30:00 24 2
B 2020-01-19 15:00:00 25 1
B 2020-01-18 15:30:00 26 1
For example, at 2020-01-19 13:30:00 doctor B has two bookings, as shown below.
Doctor Appointment Booking_ID
B 2020-01-19 13:30:00 22
B 2020-01-19 13:30:00 24
So the output will be as shown below
Doctor Appointment Booking_ID Number_of_Booking
B 2020-01-19 13:30:00 22 2
B 2020-01-19 13:30:00 24 2
For the first, use GroupBy.transform with 'size':
df['Number_of_Booking'] = df.groupby(['Doctor', 'Appointment'])['Booking_ID'].transform('size')
print (df.head())
Doctor Appointment Booking_ID Number_of_Booking
0 A 2020-01-18 12:00:00 1 1
1 A 2020-01-18 12:30:00 2 1
2 A 2020-01-18 13:00:00 3 2
3 A 2020-01-18 13:00:00 4 2
4 B 2020-01-18 12:00:00 5 1
For the second, if the whole DataFrame holds a single Doctor/Appointment combination, as in the sample, you can simply assign the length of the DataFrame:
df['Number_of_Booking'] = len(df)
print (df)
Doctor Appointment Booking_ID Number_of_Booking
0 B 2020-01-19 13:30:00 22 2
1 B 2020-01-19 13:30:00 24 2
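Note that the transform('size') snippet from the first part already handles this two-row case, so the len(df) shortcut is only a convenience. A quick self-contained check on a hypothetical frame:

```python
import pandas as pd

# The two-row sample from the example above, as a hypothetical standalone frame.
df = pd.DataFrame({
    'Doctor': ['B', 'B'],
    'Appointment': pd.to_datetime(['2020-01-19 13:30:00', '2020-01-19 13:30:00']),
    'Booking_ID': [22, 24],
})

df['Number_of_Booking'] = (df.groupby(['Doctor', 'Appointment'])['Booking_ID']
                             .transform('size'))
print(df['Number_of_Booking'].tolist())  # → [2, 2]
```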