How to find the number of hours between two dates excluding weekends and certain holidays in Python? BusinessHours package - pandas

I'm trying to find a very clean method to calculate the number of hours between two dates excluding weekends and certain holidays.
I found that the BusinessHours package (https://pypi.python.org/pypi/BusinessHours/1.01) can do this. However, I could not find any instructions on how to use the package (its syntax, specifically), especially how to input the holidays.
I found the package's source code (https://github.com/dnel/BusinessHours/blob/master/BusinessHours.py), but I am still not sure.
I guess it could be something like this:
date1 = pd.to_datetime('2017-01-01 00:00:00')
date2 = pd.to_datetime('2017-01-22 12:00:00')
import BusinessHour
gethours(date1, date2, worktiming=[8, 17], weekends=[6, 7])
Still, where can I input the holidays? And if I do not want to exclude non-office hours, do I just set worktiming=[0, 23]?
If anyone knows how to use this package, please tell me. I would appreciate it.
P.S.: I know of a numpy function that returns the number of business days between two dates (busday_count), but there is no function that returns the result in hours. Any other pandas or numpy functions that can accomplish the task are welcome too.
Thank you

Try the business-duration package, available on PyPI.
Example Code
from business_duration import businessDuration
import pandas as pd
from datetime import time,datetime
import holidays as pyholidays
startdate = pd.to_datetime('2017-01-01 00:00:00')
enddate = pd.to_datetime('2017-01-22 12:00:00')
holidaylist = pyholidays.Australia()
unit='hour'
#By default Saturday and Sunday are excluded
print(businessDuration(startdate,enddate,holidaylist=holidaylist,unit=unit))
Output: 335.99611
holidaylist:
{datetime.date(2017, 1, 1): "New Year's Day",
datetime.date(2017, 1, 2): "New Year's Day (Observed)",
datetime.date(2017, 1, 26): 'Australia Day',
datetime.date(2017, 3, 6): 'Canberra Day',
datetime.date(2017, 4, 14): 'Good Friday',
datetime.date(2017, 4, 15): 'Easter Saturday',
datetime.date(2017, 4, 17): 'Easter Monday',
datetime.date(2017, 4, 25): 'Anzac Day',
datetime.date(2017, 6, 12): "Queen's Birthday",
datetime.date(2017, 9, 26): 'Family & Community Day',
datetime.date(2017, 10, 2): 'Labour Day',
datetime.date(2017, 12, 25): 'Christmas Day',
datetime.date(2017, 12, 26): 'Boxing Day'}
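If installing an extra package is not an option, roughly the same figure can be reached with pandas alone. The following is only a sketch and is not part of business-duration: `business_hours_between` and its parameters are hypothetical names, holidays are passed as plain date strings, and the daily window defaults to the full 24 hours.

```python
import pandas as pd

def business_hours_between(start, end, holidays=(), day_start=0, day_end=24):
    """Hours between start and end that fall on business days (hypothetical helper).

    Weekends (Saturday/Sunday) and the given holidays are skipped; on every
    remaining day only the window [day_start, day_end) hours is counted.
    """
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    if end <= start:
        return 0.0
    # Custom business days between the two dates, holidays removed
    days = pd.bdate_range(start.normalize(), end.normalize(), freq="C",
                          holidays=[pd.Timestamp(h) for h in holidays])
    total = pd.Timedelta(0)
    for day in days:
        # Clip each day's working window to the requested interval
        window_open = max(day + pd.Timedelta(hours=day_start), start)
        window_close = min(day + pd.Timedelta(hours=day_end), end)
        if window_close > window_open:
            total += window_close - window_open
    return total / pd.Timedelta(hours=1)

# Same dates as above; 2017-01-02 is the only listed holiday in range that is a weekday
print(business_hours_between("2017-01-01 00:00:00", "2017-01-22 12:00:00",
                             holidays=["2017-01-02"]))  # 336.0
```

Passing `day_start=8, day_end=17` restricts the count to office hours instead of whole days.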

Reusing code from various sources, I assembled this code, which seems to work (for UK holidays), but I'd be keen on comments on how to improve it.
I know it is not particularly elegant, but it may help someone.
By the way, I would like to find a way to plug calendars from the Holiday library into this one.
In any case, it currently does not need many libraries, just pandas and datetime, which is possibly a plus.
import pandas as pd
import datetime
from pandas.tseries.offsets import CDay
from pandas.tseries.holiday import (
AbstractHolidayCalendar, DateOffset, EasterMonday,
GoodFriday, Holiday, MO,
next_monday, next_monday_or_tuesday)
# This function will calculate the number of working minutes by first
# generating a time series of business days. Then it will calculate the
# precise working minutes for the start and end date, and use the total
# working hours for each day in-between.
def count_mins(starttime, endtime, bus_day_series, bus_start_time, bus_end_time):
    mins_in_working_day = (bus_end_time - bus_start_time) * 60
    # now we are going to take the series of business days (pre-calculated)
    # and sub select the period provided as argument of the function
    # we could do the calculation of that "calendar" in the function itself
    # but to improve performance, we calculate it separately and then we
    # call the function with that series as argument, provided the dates
    # fall within the calculated range, of course
    days = bus_day_series[starttime.date():endtime.date()]
    daycount = len(days)
    if daycount == 0:
        return 0
    else:
        # .iloc selects by position, which is unambiguous on a date-indexed series
        first_day_start = days.iloc[0].replace(hour=bus_start_time, minute=0)
        first_day_end = days.iloc[0].replace(hour=bus_end_time, minute=0)
        first_period_start = max(first_day_start, starttime)
        first_period_end = min(first_day_end, endtime)
        if first_period_end <= first_period_start:
            first_day_mins = 0
        else:
            first_day_sec = first_period_end - first_period_start
            first_day_mins = first_day_sec.seconds / 60
        if daycount == 1:
            return first_day_mins
        else:
            last_period_start = days.iloc[-1].replace(hour=bus_start_time, minute=0)
            # we know the last day will always start in the bus_start_time
            last_day_end = days.iloc[-1].replace(hour=bus_end_time, minute=0)
            last_period_end = min(last_day_end, endtime)
            if last_period_end <= last_period_start:
                last_day_mins = 0
            else:
                last_day_sec = last_period_end - last_period_start
                last_day_mins = last_day_sec.seconds / 60
            middle_days_mins = 0
            if daycount > 2:
                middle_days_mins = (daycount - 2) * mins_in_working_day
            return first_day_mins + last_day_mins + middle_days_mins
# Calculates the date series with all the business days
# of the period we are interested on
class EnglandAndWalesHolidayCalendar(AbstractHolidayCalendar):
    rules = [
        Holiday('New Years Day', month=1, day=1, observance=next_monday),
        GoodFriday,
        EasterMonday,
        Holiday('Early May bank holiday',
                month=5, day=1, offset=DateOffset(weekday=MO(1))),
        Holiday('Spring bank holiday',
                month=5, day=31, offset=DateOffset(weekday=MO(-1))),
        Holiday('Summer bank holiday',
                month=8, day=31, offset=DateOffset(weekday=MO(-1))),
        Holiday('Christmas Day', month=12, day=25, observance=next_monday),
        Holiday('Boxing Day',
                month=12, day=26, observance=next_monday_or_tuesday)
    ]
# From this point its how we use the function
# Here we hardcode a start/end date to create the list of business days
cal = EnglandAndWalesHolidayCalendar()
dayindex = pd.bdate_range(datetime.date(2019,1,1),datetime.date.today(),freq=CDay(calendar=cal))
day_series = dayindex.to_series()
# Convenience function to simplify how we call the main function
# It will take a pre calculated day_series.
def bus_hr(ts_start, ts_end, day_series):
    BUS_START = 8
    BUS_END = 20
    minutes = count_mins(ts_start, ts_end, day_series, BUS_START, BUS_END)
    return int(round(minutes / 60, 0))
#A set of checks that the function is working properly
assert bus_hr( pd.Timestamp(2019,9,30,6,1,0) , pd.Timestamp(2019,10,1,9,0,0),day_series) == 13
assert bus_hr( pd.Timestamp(2019,10,3,10,30,0) , pd.Timestamp(2019,10,3,23,30,0),day_series)==10
assert bus_hr( pd.Timestamp(2019,8,25,10,30,0) , pd.Timestamp(2019,8,27,10,0,0),day_series) ==2
assert bus_hr( pd.Timestamp(2019,12,25,8,0,0) , pd.Timestamp(2019,12,25,17,0,0),day_series) ==0
assert bus_hr( pd.Timestamp(2019,12,26,8,0,0) , pd.Timestamp(2019,12,26,17,0,0),day_series) ==0
assert bus_hr( pd.Timestamp(2019,12,27,8,0,0) , pd.Timestamp(2019,12,27,17,0,0),day_series) ==9
assert bus_hr( pd.Timestamp(2019,6,24,5,10,44) , pd.Timestamp(2019,6,24,7,39,17),day_series)==0
assert bus_hr( pd.Timestamp(2019,6,24,5,10,44) , pd.Timestamp(2019,6,24,8,29,17),day_series)==0
assert bus_hr( pd.Timestamp(2019,6,24,5,10,44) , pd.Timestamp(2019,6,24,10,0,0),day_series)==2
assert bus_hr(pd.Timestamp(2019,4,30,21,19,0) , pd.Timestamp(2019,5,1,16,17,56),day_series)==8
assert bus_hr(pd.Timestamp(2019,4,30,21,19,0) , pd.Timestamp(2019,5,1,20,17,56),day_series)==12
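On plugging in holidays from another source: `CDay` also accepts a plain list of dates through its `holidays` parameter, so a calendar class is not strictly required. The sketch below assumes the holiday dates have already been collected into a list (`holiday_dates` is a hypothetical name, and the two dates are only illustrative). Whether a given third-party calendar can be converted to such a list directly depends on that library.

```python
import datetime

import pandas as pd
from pandas.tseries.offsets import CDay

# Hypothetical list of holidays, e.g. gathered from another library or a file
holiday_dates = [datetime.date(2019, 12, 25), datetime.date(2019, 12, 26)]

# Build the business-day offset from the list instead of a calendar class
cday = CDay(holidays=holiday_dates)
dayindex = pd.bdate_range("2019-12-23", "2019-12-31", freq=cday)
day_series = dayindex.to_series()

print(dayindex.strftime("%Y-%m-%d").tolist())
```

The resulting `day_series` has the same shape as the one built from `EnglandAndWalesHolidayCalendar`, so it could be passed to `bus_hr` unchanged.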

The most recent pip release of this package (1.2) has an error on line 51: "extraday" needs to be changed to "extradays".
I too have been scouring the internet for workable code to calculate business hours and business days. This package needed a little tweaking but works just fine once you get it up and running.
This is what I have in my notebook:
#import BusinessHours
from BusinessHours import BusinessHours as bh
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
date1 = pd.to_datetime('2017-01-01 00:00:00')
date2 = pd.to_datetime('2017-01-22 12:00:00')
bh(date1, date2, worktiming=[8, 17], weekends=[6, 7]).gethours()
This was also in the source code:
'''
holidayfile - A file consisting of the predetermined office holidays.
Each date starts in a new line and currently must only be in the format
dd-mm-yyyy
'''
Hope this helps
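Going by that docstring, a holiday file is a text file with one dd-mm-yyyy date per line. A minimal sketch for producing such a file is shown here; the file name, the `write_holiday_file` helper, and the two dates are hypothetical, and how the package reads the file has not been verified beyond the docstring above.

```python
import datetime
import os
import tempfile

# Illustrative holidays only; replace with the real office holidays
holidays = [datetime.date(2017, 1, 2), datetime.date(2017, 1, 26)]

def write_holiday_file(path, dates):
    """Write one date per line in dd-mm-yyyy format (hypothetical helper)."""
    with open(path, "w") as handle:
        for day in dates:
            handle.write(day.strftime("%d-%m-%Y") + "\n")

holiday_path = os.path.join(tempfile.mkdtemp(), "holidays.txt")
write_holiday_file(holiday_path, holidays)

# Read the file back to check its contents
with open(holiday_path) as handle:
    written = handle.read().splitlines()
print(written)
```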

Related

Business Day Counts btw 2 date series (for countries which have a "friday - saturday" weekend )

I am trying to calculate the number of working days between two date columns for each row. My data covers countries all over the world. For European countries, I computed the working-day counts with:
df['count'] = np.busday_count (df['Start_Date_column'].tolist(), df['Final_Date_column'].tolist())
However, some Muslim-majority countries such as Oman, Bahrain, Kuwait and Qatar have a Friday-Saturday weekend. Do you have a suggestion for handling these countries?
In the end, I solved the problem by combining a few methods, and I want to share the solution with anyone who may need it.
I'm a beginner in Python, so my code could surely be improved, but here is the solution that works for me:
P.S.: I used some European holidays for testing; you can customize the list to suit your needs.
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay
start_date = df['start_date']
end_date = df['final_date']
data = pd.DataFrame(list(zip(start_date, end_date)), columns=['Start Date', 'End Date'])
# Sunday-Thursday working week, i.e. a Friday-Saturday weekend
dubai_workdays = "Sun Mon Tue Wed Thu"
# pd.datetime was removed in pandas 2.0, so pd.Timestamp is used here
dubai_hol = CustomBusinessDay(holidays=[pd.Timestamp(2022, 10, 3),
                                        pd.Timestamp(2023, 1, 6),
                                        pd.Timestamp(2022, 12, 26),
                                        pd.Timestamp(2022, 12, 31),
                                        pd.Timestamp(2022, 12, 25)],
                              weekmask=dubai_workdays)
# Count the business days in each row's range (both endpoints included)
data['Bus_Days'] = data.apply(lambda x: len(pd.bdate_range(x['Start Date'],
                                                           x['End Date'],
                                                           freq=dubai_hol)), axis=1)
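Since the question started from np.busday_count, it is worth noting that this function takes `weekmask` and `holidays` arguments itself, which avoids the row-wise apply. The sketch below uses made-up sample rows and only two of the holidays; note that busday_count excludes the end date, whereas `len(pd.bdate_range(...))` includes it, so the two counts can differ by one.

```python
import numpy as np
import pandas as pd

# Made-up sample data using the column names from the question
df = pd.DataFrame({
    'start_date': pd.to_datetime(['2022-12-25', '2023-01-01']),
    'final_date': pd.to_datetime(['2023-01-01', '2023-01-08']),
})

# Holidays as day-resolution datetime64 values
holidays = np.array(['2022-12-26', '2022-12-31'], dtype='datetime64[D]')

# Vectorized count with a Sunday-Thursday working week; end date is exclusive
df['Bus_Days'] = np.busday_count(
    df['start_date'].values.astype('datetime64[D]'),
    df['final_date'].values.astype('datetime64[D]'),
    weekmask='Sun Mon Tue Wed Thu',
    holidays=holidays,
)
print(df)
```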

Add business days to pandas dataframe with dates and skip over holidays python

I have a dataframe with dates, as shown in the tables below. The first block is what the result should look like, and the second block is what I get when I just add the BDays. I want to take the first column and add 5 business days to each date, but if the 5 business days span a holiday (like 15 Feb 2021), I need to add one additional day. It is fairly simple to add the 5 business days using BDay from pandas.tseries.offsets, but I cannot skip the holidays when working with the dataframe.
I have tried USFederalHolidayCalendar from pandas.tseries.holiday, as well as the workdays and workalendar modules, but cannot figure it out. Does anyone have an idea of what I can do?
Correct Example
DATE        EXIT DATE +5
2021/02/09  2021/02/17
2021/02/10  2021/02/18
Wrong Example
DATE        EXIT DATE +5
2021/02/09  2021/02/16
2021/02/10  2021/02/17
Here are some examples of code I tried:
import pandas as pd
from workdays import workday
...
df['DATE'] = workday(df['EXIT DATE +5'], days=5, holidays=holidays)
Next Example:
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
bday_us = pd.offsets.CustomBusinessDay(calendar=USFederalHolidayCalendar())
dt = df['DATE']
df['EXIT DATE +5'] = dt + bday_us
=========================================
Final code:
Below is the code I finally settled on. I had to define the holidays manually because of the days on which the NYSE actually trades. Take, for instance, the day President Bush was laid to rest.
import datetime as dt
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import BDay
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday, nearest_workday, \
    USMartinLutherKingJr, USPresidentsDay, GoodFriday, USMemorialDay, \
    USLaborDay, USThanksgivingDay
class USTradingCalendar(AbstractHolidayCalendar):
    rules = [
        Holiday('NewYearsDay', month=1, day=1, observance=nearest_workday),
        USMartinLutherKingJr,
        USPresidentsDay,
        GoodFriday,
        USMemorialDay,
        Holiday('USIndependenceDay', month=7, day=4, observance=nearest_workday),
        Holiday('BushDay', year=2018, month=12, day=5),
        USLaborDay,
        USThanksgivingDay,
        Holiday('Christmas', month=12, day=25, observance=nearest_workday)
    ]
offset = 5
df = pd.DataFrame(['2019-10-11', '2019-10-14', '2017-04-13', '2018-11-28', '2021-07-02'], columns=['DATE'])
df['DATE'] = pd.to_datetime(df['DATE'])
def offset_date(start, offset):
    return start + pd.offsets.CustomBusinessDay(n=offset, calendar=USTradingCalendar())
df['END'] = df.apply(lambda x: offset_date(x['DATE'], offset), axis=1)
print(df)
Input data
df = pd.DataFrame(['2021-02-09', '2021-02-10', '2021-06-28', '2021-06-29', '2021-07-02'], columns=['DATE'])
df['DATE'] = pd.to_datetime(df['DATE'])
Suggested solution using apply
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import BDay
def offset_date(start, offset):
    return start + pd.offsets.CustomBusinessDay(n=offset, calendar=USFederalHolidayCalendar())
offset = 5
df['END'] = df.apply(lambda x: offset_date(x['DATE'], offset), axis=1)
DATE END
2021-02-09 2021-02-17
2021-02-10 2021-02-18
2021-06-28 2021-07-06
2021-06-29 2021-07-07
2021-07-02 2021-07-12
PS: If you want to use a particular calendar such as the NYSE, instead of the default USFederalHolidayCalendar, I recommend following the instructions on this answer, about creating a custom calendar.
Alternative solution which I do not recommend
Currently, to the best of my knowledge, pandas does not support a vectorized approach to your problem. But if you want to follow an approach similar to the one you mentioned, here is what you should do.
First, you will have to define an arbitrary far away end date that includes all the periods you might need and use it to create a list of holidays.
holidays = USFederalHolidayCalendar().holidays(start='2021-02-09', end='2030-02-09')
Then, you pass the holidays list to CustomBusinessDay through the holidays parameter instead of the calendar to generate the desired offset.
offset = 5
bday_us = pd.offsets.CustomBusinessDay(n=offset, holidays=holidays)
df['END'] = df['DATE'] + bday_us
However, this type of approach is not a true vectorized solution, even though it might seem like it. See the following SO answer for further clarification. Under the hood, this approach is probably doing a conversion that is not efficient. This is why it yields the following warning.
PerformanceWarning: Non-vectorized DateOffset being applied to Series or DatetimeIndex
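Outside pandas, numpy's busday_offset does operate on whole arrays, so it may be worth considering when the warning matters. This is only a sketch under a few assumptions: the holiday range is limited to 2021, and all start dates are business days (for a weekend or holiday start, `roll='forward'` first moves to the next business day and then counts, which differs from what CustomBusinessDay does).

```python
import numpy as np
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

df = pd.DataFrame({'DATE': pd.to_datetime(['2021-02-09', '2021-02-10', '2021-07-02'])})

# Holidays as a day-resolution numpy array
holidays = USFederalHolidayCalendar().holidays(start='2021-01-01', end='2021-12-31')
holidays = holidays.values.astype('datetime64[D]')

# Shift every date by 5 business days in a single array operation
end = np.busday_offset(
    df['DATE'].values.astype('datetime64[D]'),
    5,
    roll='forward',
    holidays=holidays,
)
df['END'] = end.astype('datetime64[ns]')
print(df)
```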
Here's one way to do it
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from datetime import timedelta as td
def get_exit_date(date):
    holiday_list = cals.holidays(start=date, end=date + td(weeks=2)).tolist()
    # 6 periods since start date is included in set
    n_bdays = pd.bdate_range(start=date, periods=6, freq='C', holidays=holiday_list)
    return n_bdays[-1]
df = pd.read_clipboard()
cals = USFederalHolidayCalendar()
# I would convert this to datetime
df['DATE'] = pd.to_datetime(df['DATE'])
df['EXIT DATE +5'] = df['DATE'].apply(get_exit_date)
This uses bdate_range, which returns a DatetimeIndex.
Results:
DATE EXIT DATE +5
0 2021-02-09 2021-02-17
1 2021-02-10 2021-02-18
Another option, instead of dynamically creating the holiday list, is to choose a start date once and build the list outside the function, like so:
def get_exit_date(date):
    # 6 periods since start date is included in set
    n_bdays = pd.bdate_range(start=date, periods=6, freq='C', holidays=holiday_list)
    return n_bdays[-1]
df = pd.read_clipboard()
cals = USFederalHolidayCalendar()
holiday_list = cals.holidays(start='2021-01-01').tolist()
# I would convert this to datetime
df['DATE'] = pd.to_datetime(df['DATE'])
df['EXIT DATE +5'] = df['DATE'].apply(get_exit_date)

Compare date difference with pandas timestamp values

I'm trying to perform specific operations based on the age of data in days within a dataframe. What I am looking for is something like as follows:
import pandas as pd
if 10days < (pd.Timestamp.now() - pd.Timestamp(2019, 3, 20)):
    print('The data is older than 10 days')
Is there something I can replace "10days" with or some other way I can perform operations based on the difference between two Timestamp values?
What you're looking for is pd.Timedelta('10D'), pd.Timedelta(10, unit='D') (or unit='days' or unit='day'), or pd.Timedelta(days=10). For example,
In [37]: pd.Timedelta(days=10) < pd.Timestamp.now() - pd.Timestamp(2019, 3, 20)
Out[37]: False
In [38]: pd.Timedelta(days=5) < pd.Timestamp.now() - pd.Timestamp(2019, 3, 20)
Out[38]: True
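The same comparison works element-wise on a dataframe column, which is closer to the original use case. In this sketch the column names and the fixed `now` value are made up so that the result is reproducible; in practice `pd.Timestamp.now()` would take its place.

```python
import pandas as pd

# Fixed reference time so the example gives the same result on every run
now = pd.Timestamp('2019-04-05')

df = pd.DataFrame({'collected': pd.to_datetime(['2019-03-20', '2019-04-01'])})

# Subtracting a datetime column from a Timestamp yields a Timedelta column,
# which can be compared against a single pd.Timedelta
df['age'] = now - df['collected']
df['is_stale'] = df['age'] > pd.Timedelta(days=10)
print(df)
```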

proper date format for pandas datareader?

can someone please explain how to input the proper date format for pandas datareader? it seems like i have tried both date formats in the past and they have worked. however, in the last few days these lines only output the last year's worth of data...
import pandas_datareader.data as wb
import datetime
start = datetime.datetime(2012,1,1)
end = datetime.datetime(2012,12,31)
df = wb.DataReader ('GE', 'google', '2012, 1, 1', '2012, 12, 31') # doesn't work
print (df)
df2 = wb.DataReader ('GE', 'google', start, end) # doesn't work
print (df2)
abbreviated output for both:
Open High Low Close Volume
Date
2016-09-15 29.55 29.85 29.42 29.75 35262527
...
2017-09-13 23.93 24.18 23.92 24.11 38629676
thanks,
david
during the process of fixing this, i upgraded to the most current version of pandas (0.20.3) and pandas-datareader (0.5.0). that did not fix the code in the question. the problem appears to be trying to use google as the source. the code below runs correctly but uses yahoo as the source. however, it fails when trying to use google as the source.
from pandas_datareader import data, wb
from datetime import date
start = date (2012, 1, 1)
end = date (2012, 12, 31)
df = data.DataReader ('GE', 'yahoo', start, end)
print (df)

adding labels to candlestick chart in matplotlib

I'm trying to add a series (composed of a list of [1,2,0,....]) to a candlestick chart I produced with matplotlib, but cannot work out how to include those labels for each specific candle in the graph. Basically I'd like to produce a chart like this one:
(source: linnsoft.com)
with the numeric labels (my signal series) placed just above or below each candle.
Is there any way I can achieve that?
Don't know if it helps, but my series are of the pandas DataFrame kind...
Here's an example derived from - http://matplotlib.org/examples/pylab_examples/finance_demo.html
Take special note of ax.annotate method call in the code below.
from pylab import *
from matplotlib.dates import DateFormatter, WeekdayLocator, HourLocator, \
DayLocator, MONDAY
from matplotlib.finance import quotes_historical_yahoo, candlestick,\
plot_day_summary, candlestick2
# (Year, month, day) tuples suffice as args for quotes_historical_yahoo
date1 = ( 2004, 2, 1)
date2 = ( 2004, 4, 12 )
mondays = WeekdayLocator(MONDAY) # major ticks on the mondays
alldays = DayLocator() # minor ticks on the days
weekFormatter = DateFormatter('%b %d') # Eg, Jan 12
dayFormatter = DateFormatter('%d') # Eg, 12
quotes = quotes_historical_yahoo('INTC', date1, date2)
if len(quotes) == 0:
    raise SystemExit
fig = figure()
fig.subplots_adjust(bottom=0.2)
ax = fig.add_subplot(111)
ax.xaxis.set_major_locator(mondays)
ax.xaxis.set_minor_locator(alldays)
ax.xaxis.set_major_formatter(weekFormatter)
#ax.xaxis.set_minor_formatter(dayFormatter)
#plot_day_summary(ax, quotes, ticksize=3)
candlestick(ax, quotes, width=0.6)
ax.xaxis_date()
ax.autoscale_view()
setp( gca().get_xticklabels(), rotation=45, horizontalalignment='right')
import datetime
dt = datetime.datetime(2004, 3, 8)
# Annotating a specific candle
ax.annotate('This is my special candle', xy=(dt, 24), xytext=(dt, 25),
arrowprops=dict(facecolor='black', shrink=0.05),
)
show()
If you run this file, the resulting plot should show the candlestick chart with the annotated candle.
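To label every candle from a signal series rather than a single one, the same ax.annotate call can be placed in a loop. The sketch below does not use matplotlib.finance: the candles are drawn with plain bars and lines, and the prices and signal values are made-up sample data in a pandas DataFrame.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up OHLC data with one signal value per candle
quotes = pd.DataFrame({
    "open": [24.0, 24.5, 25.0],
    "high": [25.0, 25.5, 25.2],
    "low": [23.5, 24.0, 24.1],
    "close": [24.5, 25.0, 24.3],
    "signal": [1, 2, 0],
}, index=pd.to_datetime(["2004-03-08", "2004-03-09", "2004-03-10"]))

fig, ax = plt.subplots()
x = list(range(len(quotes)))

# Wicks (low to high) and bodies (open to close) as simple stand-ins for candles
ax.vlines(x, quotes["low"], quotes["high"], colors="black")
ax.bar(x, quotes["close"] - quotes["open"], bottom=quotes["open"], width=0.6)

# One label per candle, placed 5 points above its high
labels = []
for xi, high, sig in zip(x, quotes["high"], quotes["signal"]):
    labels.append(ax.annotate(str(sig), xy=(xi, high), xytext=(0, 5),
                              textcoords="offset points", ha="center"))

ax.set_xticks(x)
ax.set_xticklabels(quotes.index.strftime("%b %d"))
fig.savefig("candles_with_labels.png")
```

Anchoring the text at `low` with a negative offset would place the label below the candle instead.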