Pandas Count Number of On/Off Events and Duration

I have a DataFrame with two columns, one containing the time of an event and the other containing whether the event is an On or an Off. I would like to count the number of times an On occurs followed by an Off as well as the total duration On occurs.
For example see this DataFrame:
Time Event
01:00 On
01:15 Off
01:16 Off
02:00 On
02:15 Off
23:30 On
Would have 2 On/Off events with a total duration of 0:30.
I'm not sure how to approach this problem.

Create a mask that marks each 'On' immediately followed by an 'Off'; its sum gives the number of events. Then subtract the paired times to get the total duration.
df['Time'] = pd.to_timedelta(df.Time+':00')
m = df.Event.eq('On') & df.Event.shift(-1).eq('Off')
m.sum()
#2
(df.shift(-1).loc[m, 'Time'] - df.loc[m, 'Time']).sum()
#Timedelta('0 days 00:30:00')
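For completeness, here is the same idea as a self-contained sketch built on the sample data from the question (the column names and HH:MM strings are taken from the example above):
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Time':  ['01:00', '01:15', '01:16', '02:00', '02:15', '23:30'],
    'Event': ['On', 'Off', 'Off', 'On', 'Off', 'On'],
})

# Parse HH:MM strings into timedeltas so they can be subtracted
df['Time'] = pd.to_timedelta(df['Time'] + ':00')

# An On/Off event is an 'On' row whose immediate successor is an 'Off' row
m = df['Event'].eq('On') & df['Event'].shift(-1).eq('Off')

print(m.sum())                                                  # 2
print((df.shift(-1).loc[m, 'Time'] - df.loc[m, 'Time']).sum())  # 0 days 00:30:00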

Related

Plotly: highlight regular trading market hours

Goal: highlight regular trading market hours in a plotly chart.
- Using a df with standard datetime and 1 minute intervals.
- Regular trading hours = 9:30am EST to 4pm EST
- In case it's of interest:
  - pre market = 4am to 9:30am
  - post market = 4pm to 8pm
Stack Overflow has great links for highlighting weekend data like this:
Never mind, that link was just removed by the author as I tried to post it, but it's too difficult for me to translate that approach to specific times of day anyway.
This is relatively easy to do using fig.add_vrect()
I built a similar highlighting system for night and day:
from datetime import timedelta
import pandas as pd

# df and fig are assumed to already exist (the intraday DataFrame and the plotly figure)
time = df.index.get_level_values("time")

# Getting info for plotting the day/night indicator
# time[0].date() picks out 00:00 (midnight) and then we add 6 hours to get 6 am.
start_morning = pd.to_datetime(time[0].date()) + pd.Timedelta(hours=6)
# 30 hours past midnight of the last day is 6 am on the day after the data ends,
# so the range below covers every morning in the data.
end_morning = pd.to_datetime(time[-1].date()) + pd.Timedelta(hours=30)
num_mornings = (end_morning - start_morning).days

# Now we build up the morning times, every day at 6 am
mornings = [start_morning + timedelta(days=x) for x in range(num_mornings)]

for morning in mornings:
    fig.add_vrect(
        # Highlighted region starts at 6 am and ends at 6 pm, every day.
        x0=morning,
        x1=morning + timedelta(hours=12),
        fillcolor="white",
        opacity=0.1,
        line_width=0,
    )
For you, it would just be a simple matter of adjusting the times. So for instance, for 9:30 am you can use
morning = pd.to_datetime(time[0].date()) + pd.Timedelta(hours=9.5)
to get the first day of your data, at 9:30 am. Now in fig.add_vrect() use
x0=morning
x1=morning + timedelta(hours=6.5)
to highlight between 9:30 am and 4 pm.
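Putting those pieces together, a rough end-to-end sketch (the toy one-minute DataFrame, its price column, and the plain Scatter trace are only stand-ins for your intraday data; the highlighting loop is the part that matters):
from datetime import timedelta
import pandas as pd
import plotly.graph_objects as go

# Toy one-minute series spanning three days, standing in for the real intraday df
idx = pd.date_range("2023-01-02 04:00", "2023-01-04 20:00", freq="1min")
df = pd.DataFrame({"price": range(len(idx))}, index=idx)
fig = go.Figure(go.Scatter(x=df.index, y=df["price"]))

# 9:30 am on the first day of the data
first_session = pd.to_datetime(df.index[0].date()) + pd.Timedelta(hours=9.5)
num_days = (df.index[-1].date() - df.index[0].date()).days + 1

for x in range(num_days):
    session = first_session + timedelta(days=x)
    fig.add_vrect(
        x0=session,                          # 9:30 am
        x1=session + timedelta(hours=6.5),   # 4:00 pm
        fillcolor="white",
        opacity=0.1,
        line_width=0,
    )

fig.show()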

pandas: how to group by time intervals of varying length?

I know that it is possible to group your data by time intervals of the same length by using the function resample. But how can I group by time intervals of custom length (i.e. irregular time intervals)?
Here is an example:
Say we have a dataframe with time values, like this:
rng = pd.date_range(start='2015-02-11', periods=7, freq='M')
df = pd.DataFrame({ 'Date': rng, 'Val': np.random.randn(len(rng)) })
And we have the following time intervals:
2015-02-12 ----- 2015-05-10
2015-05-10 ----- 2015-08-20
2015-08-20 ----- 2016-01-01
It is clear that rows with index 0, 1, 2 belong to the first time interval, rows with index 3, 4, 5 belong to the second time interval, and row 6 belongs to the last time interval.
My question is: how do I group these rows according to those specific time intervals, in order to perform aggregate functions (e.g. mean) on them?
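One common approach (a sketch, not an accepted answer) is to treat the custom interval edges as bin edges for pd.cut and then group on the resulting interval labels:
import numpy as np
import pandas as pd

rng = pd.date_range(start='2015-02-11', periods=7, freq='M')
df = pd.DataFrame({'Date': rng, 'Val': np.random.randn(len(rng))})

# The custom (irregular) interval edges from the question
edges = pd.to_datetime(['2015-02-12', '2015-05-10', '2015-08-20', '2016-01-01'])

# pd.cut labels each Date with its (left, right] interval; group on that label
df['interval'] = pd.cut(df['Date'], bins=edges)
print(df.groupby('interval')['Val'].mean())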

Generate dataframe with timeseries index starting today and fixed interval

I'm trying to generate a pandas DataFrame with a timeseries index at a fixed interval. As input parameters I need to provide a start and an end date. The challenge is that the generated index snaps either to the month start with freq='3MS' or to the month end with freq='3M'. The interval cannot be defined as a number of days, because a whole year needs to contain exactly 4 periods and the first index value needs to be exactly the defined start date.
The expected output should be in this case:
2020-10-05
2021-01-05
2021-04-05
2021-07-05
Any ideas appreciated.
interpolated = pd.DataFrame(index=pd.date_range('2020-10-05', '2045-10-05', freq='3M'), columns=['dummy'])
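One way that seems to fit (a sketch; it relies on date_range accepting a DateOffset object instead of a frequency string) is to step by pd.DateOffset(months=3), which keeps the day-of-month of the start date instead of snapping to month boundaries:
import pandas as pd

# Step by 3 calendar months while keeping the anchor day (the 5th)
idx = pd.date_range('2020-10-05', '2045-10-05', freq=pd.DateOffset(months=3))
interpolated = pd.DataFrame(index=idx, columns=['dummy'])

print(idx[:4])
# 2020-10-05, 2021-01-05, 2021-04-05, 2021-07-05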

Is there a way to fix or bypass weird time formats in a specific column in a dataframe?

I am working with a SLURM dataset in Pandas that has time formats like so in the 'Elapsed' column:
00:00:00
00:26:51
However, sometimes there are sections that are greater than 24 hours, and it displays it like so:
1-00:02:00
3-01:25:02
I want to find the mean of the entire column, but to_timedelta mishandles the conversion for the entries above 24 hours like those shown above. One example is this:
Before to_timedelta: 3-01:25:02
after to_timedelta: -13 days +10:34:58
I cannot simply convert the column into a new format, because when an entry is not greater than 24 hours the leading day count is absent (i.e. it is never written as 0-20:00:00).
That approach would be easiest, I believe, if there were a way to do it.
Is there a way to fix this conversion or any other ideas on approaching this?
One way to work around this is to replace the - with days:
pd.to_timedelta(df['time'].str.replace('-','days'))
Output (for 4 lines above):
0 0 days 00:00:00
1 0 days 00:26:51
2 1 days 00:02:00
3 3 days 01:25:02
Name: time, dtype: timedelta64[ns]
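Building on that, a small sketch of the full path to the mean the asker wants (the four sample values are the lines shown above; using ' days ' with spaces keeps the string unambiguous for to_timedelta):
import pandas as pd

elapsed = pd.Series(['00:00:00', '00:26:51', '1-00:02:00', '3-01:25:02'], name='Elapsed')

# Normalise SLURM's D-HH:MM:SS form into "D days HH:MM:SS", then take the mean
td = pd.to_timedelta(elapsed.str.replace('-', ' days '))
print(td.mean())
# 1 days 00:28:28.250000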

Query to find a block of time in a schedule

Imagine I had a work schedule from 9am to 6pm. It is divided into 15 minute blocks, and appointments (in increments of 15 minutes) can be fitted into the times available.
Now, if I need to insert a new appointment that is 45 minutes long, is there an easy query to find a block of time that is available to fit the appointment in for a given date?
The basic table design is
AppointmentId
Date
StartTime
Length - 15 minute increments
I would like to get a list of available times to choose from, so if the only appointment for the given day is a 30 minute one at 9:30, then the list of times would be
(No times before 9:30, as the 45 minute appointment won't fit)
10:15
10:30
10:45
...
5:15pm (last time of the day the appointment will fit)
Using a ranking function (i.e. ROW_NUMBER()), assign a number to each row within each day (call it rn). Then join this query with itself on the condition q2.rn = q1.rn - 1, so that the end of each appointment sits beside the start of the next one. Calculate DATEDIFF(mi, ...) between that end and the next start; this value is the gap. Then write another query wrapping this one to filter the records whose gap >= yourNeededTime. Also, for the start and end of the day you can create 2 dummy records, one for 9am and one for 6pm, so that you can handle the gap from the start of the day to the first appointment and from the last appointment to the end of the day.
I hope this helps