Finding a minimum date before another date - SQL

Let's say I have two tables. One is a table with information about customer service inquiries, which contains information about the customer and the time the inquiry was placed. The customer's information (in this case, the ID) is saved for all future inquiries.
CUST_ID INQUIRY_ID INQUIRY_DATE
001 34 2015-05-03 08:15
001 36 2015-05-05 13:12
002 39 2015-05-10 18:43
003 42 2015-05-12 14:58
003 46 2015-05-14 07:27
001 50 2015-05-18 19:06
003 55 2015-05-20 11:40
The other table contains information about the resolution dates for all customer inquiries.
CUST_ID RESOLVED_DATE
001 2015-05-06 12:54
002 2015-05-11 08:09
003 2015-05-14 19:37
001 2015-05-19 16:12
003 2015-05-22 08:40
The resolution table doesn't have a key to link to the inquiry table other than the CUST_ID, so in order to calculate the time to resolution, I want to determine the minimum inquiry date before the resolution for EACH resolution date. The resulting table would look like this:
CUST_ID FIRST_INQUIRY RESOLVED_DT
001 2015-05-03 08:15 2015-05-06 12:54
001 2015-05-18 19:06 2015-05-19 16:12
002 2015-05-10 18:43 2015-05-11 08:09
003 2015-05-12 14:58 2015-05-14 19:37
003 2015-05-20 11:40 2015-05-22 08:40
At first I just went with min(case when INQUIRY_DATE < RESOLVED_DT), but for customers like 001 and 003, who have multiple inquiries across different dates, the query would just return the first-ever inquiry date, not the first one since the previous resolution. Does anyone know how to do this? I'm using Netezza.

One option is to create a subquery for each table (inquiries and resolutions) that numbers the transactions for each CUST_ID in date order. The two subqueries can then be joined together on this ordered index column along with the CUST_ID.
I also used the INQUIRY_ID in the inquiries table to break ties, should they occur. There is no way to break a tie in the resolutions table for a given customer and date based on the data you showed us.
SELECT t1.CUST_ID, t1.INQUIRY_DATE AS FIRST_INQUIRY, t2.RESOLVED_DATE AS RESOLVED_DT
FROM
(
    SELECT CUST_ID, INQUIRY_ID, INQUIRY_DATE,
           (SELECT COUNT(*) + 1
            FROM inquiries
            WHERE CUST_ID = t.CUST_ID
              AND (INQUIRY_DATE < t.INQUIRY_DATE
                   OR (INQUIRY_DATE = t.INQUIRY_DATE AND INQUIRY_ID < t.INQUIRY_ID))) AS idx
    FROM inquiries AS t
) AS t1
INNER JOIN
(
    SELECT CUST_ID, RESOLVED_DATE,
           (SELECT COUNT(*) + 1
            FROM resolutions
            WHERE CUST_ID = t.CUST_ID AND RESOLVED_DATE < t.RESOLVED_DATE) AS idx
    FROM resolutions t
) AS t2
ON t1.CUST_ID = t2.CUST_ID AND t1.idx = t2.idx
Here is what the subquery tables look like:
inquiries:
CUST_ID INQUIRY_ID INQUIRY_DATE idx
001 34 2015-05-03 08:15 1
001 36 2015-05-05 13:12 2
002 39 2015-05-10 18:43 1
003 42 2015-05-12 14:58 1
003 46 2015-05-14 07:27 2
001 50 2015-05-18 19:06 3
003 55 2015-05-20 11:40 3
resolutions:
CUST_ID RESOLVED_DATE idx
001 2015-05-06 12:54 1
002 2015-05-11 08:09 1
003 2015-05-14 19:37 1
001 2015-05-19 16:12 2
003 2015-05-22 08:40 2
Note that this solution is not robust to missing data, e.g. an inquiry that was never closed, or a resolution that was never recorded. It also pairs the nth inquiry with the nth resolution, so a customer with several inquiries between two resolutions (like 001's inquiries on 2015-05-03 and 2015-05-05) is still matched one-to-one rather than grouped under the next resolution.
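A sketch of an alternative that expresses the requirement directly — for each resolution, the earliest inquiry placed since the previous resolution — using LAG(), which Netezza also supports (untested, but it reproduces the desired output on the sample data):
-- For each resolution, find the earliest inquiry placed after the
-- previous resolution (if any) and before this one.
SELECT r.CUST_ID,
       MIN(i.INQUIRY_DATE) AS FIRST_INQUIRY,
       r.RESOLVED_DATE     AS RESOLVED_DT
FROM (
    -- attach each resolution's predecessor for the same customer
    SELECT CUST_ID, RESOLVED_DATE,
           LAG(RESOLVED_DATE) OVER (PARTITION BY CUST_ID
                                    ORDER BY RESOLVED_DATE) AS PREV_RESOLVED
    FROM resolutions
) r
JOIN inquiries i
  ON  i.CUST_ID = r.CUST_ID
  AND i.INQUIRY_DATE < r.RESOLVED_DATE
  AND (r.PREV_RESOLVED IS NULL OR i.INQUIRY_DATE > r.PREV_RESOLVED)
GROUP BY r.CUST_ID, r.RESOLVED_DATE;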

Related

Rolling On-Hand Remainder column?

The CONumber, LineNumber, PartNumber, OrderQty, ScheduleDate, and OnHandQty columns come from a plain SELECT query with no transformations. I am trying to recreate the RollingOnHand column in SQL.
The rules are:
- If a part only has one row, report the real [OnHandQty].
- If a part has multiple rows, the oldest order consumes its [OrderQty] from [OnHandQty].
- The next oldest order pulls its [OrderQty] from the remaining [OnHandQty]; repeat until the final row of the matching part.
- The last row of a given part will display the remaining [OnHandQty].
Is this possible to accomplish in an SQL query?
CONumber  LineNumber  PartNumber  OrderQty  ScheduleDate  OnHandQty  RollingOnHand
C02959    00002       Part 01     102       2022-04-01    0          0
C04017    00001       Part 02     2007      2022-04-01    5099       5099
C04107    00001       Part 03     1         2022-03-09    0          0
C04106    00001       Part 04     1         2022-03-09    0          0
C04108    00001       Part 05     1         2022-03-09    0          0
C03514    00002       Part 06     250       2022-03-11    310        250
C03514    00003       Part 06     250       2022-03-18    310        60
C03757    00001       Part 06     250       2022-04-06    310        0
C04225    00002       Part 07     40        2022-03-31    53         53
C03965    00002       Part 08     24        2022-04-04    0          0
C04034    00001       Part 09     88        2022-03-18    128        128
C04144    00002       Part 10     22        2022-04-04    0          0
C04141    00001       Part 10     100       2022-04-04    0          0
C03734    00003       Part 11     116       2022-03-29    103        103
C03379    00001       Part 12     128       2022-03-07    19         19
C03344    00003       Part 13     40        2022-03-11    5          5
C04058    00001       Part 14     407       2022-03-25    0          0
C03697    00002       Part 15     436       2022-04-04    235        235
C03689    00002       Part 16     111       2022-03-16    87         87
C03690    00001       Part 16     250       2022-03-23    87         0
C03690    00002       Part 16     250       2022-04-06    87         0
C03240    00004       Part 17     3         2022-03-16    30         3
C03725    00001       Part 17     250       2022-03-16    30         27
C03725    00002       Part 17     250       2022-03-23    30         0
C03726    00001       Part 17     250       2022-04-01    30         0
C03726    00002       Part 17     250       2022-04-06    30         0
C03596    00017       Part 18     56        2022-04-06    344        344
C03927    00001       Part 19     600       2022-04-04    1800       600
C03927    00002       Part 19     1000      2022-04-06    1800       1200
I think this basically does what you need (Fiddle)
WITH T AS
(
    SELECT *,
           AlreadyConsumed = SUM(OrderQty) OVER (PARTITION BY [PartNumber] ORDER BY ScheduleDate ASC
                                                 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
           PrevLineNumber = LAG([LineNumber]) OVER (PARTITION BY [PartNumber] ORDER BY ScheduleDate ASC),
           NextLineNumber = LEAD([LineNumber]) OVER (PARTITION BY [PartNumber] ORDER BY ScheduleDate ASC)
    FROM Demo
)
SELECT CONumber,
       LineNumber,
       PartNumber,
       OrderQty,
       ScheduleDate,
       OnHandQty,
       RollingOnHand = CASE
           --If a part only has one row, report the real [OnHandQty]
           WHEN PrevLineNumber IS NULL AND NextLineNumber IS NULL THEN OnHandQty
           --Not the last row and won't use all the remainder up
           WHEN NextLineNumber IS NOT NULL AND Remainder > OrderQty THEN OrderQty
           --otherwise use what's left
           ELSE Remainder
       END
FROM T
CROSS APPLY (SELECT CASE WHEN AlreadyConsumed > OnHandQty THEN 0
                         ELSE OnHandQty - ISNULL(AlreadyConsumed, 0) END) C(Remainder)
The SUM ... PARTITION BY [PartNumber] ... ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING computes the cumulative OrderQty of all rows before the current row, not including it (for the three Part 06 rows, for example, AlreadyConsumed comes out as NULL, 250, and 500).
The LAG/LEAD results are used as indicators of whether we are in the first/last row of a partition, where special logic is needed.
I didn't quite follow the rationale behind the business logic, so I may have made some invalid simplifications, but it returns the desired results with the sample data, and in any case the query should be easy to tweak if needed.
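For reference, a minimal sketch of the Demo table the query assumes (the column types are guesses from the sample data), with a few of the Part 06 rows to experiment on:
-- Hypothetical schema inferred from the sample data; adjust types as needed.
CREATE TABLE Demo (
    CONumber     varchar(10),
    LineNumber   varchar(10),
    PartNumber   varchar(20),
    OrderQty     int,
    ScheduleDate date,
    OnHandQty    int
);

-- The three Part 06 rows from the question: 310 on hand, consumed oldest-first.
INSERT INTO Demo VALUES
    ('C03514', '00002', 'Part 06', 250, '2022-03-11', 310),
    ('C03514', '00003', 'Part 06', 250, '2022-03-18', 310),
    ('C03757', '00001', 'Part 06', 250, '2022-04-06', 310);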

Match group of variables and values with the nearest datetime

I have a transaction table that looks like this:
transaction_start store_no item_no amount post_voided
2021-03-01 10:00:00 001 101 45 N
2021-03-01 10:00:00 001 105 25 N
2021-03-01 10:00:00 001 109 40 N
2021-03-01 10:05:00 002 103 35 N
2021-03-01 10:05:00 002 135 20 N
2021-03-01 10:08:00 001 140 2 N
2021-03-01 10:11:00 001 101 -45 Y
2021-03-01 10:11:00 001 105 -25 Y
2021-03-01 10:11:00 001 109 -40 Y
The table does not have an id column; the transaction_start for a given store_no will never be the same.
Whenever a transaction is post voided, the transaction is then repeated with the same store_no, item_no but with a negative/minus amount and an equal or higher transaction_start. Also, the column post_voided is then equal to 'Y'.
In the example above, rows 1-3 have the same transaction_start and store_no, thus belonging to the same receipt, which contains three different items (101, 105, 109). The same logic applies to the other rows: rows 4-5 belong to the same receipt, and so on. In the example, 4 different receipts can be seen. The last receipt, given by the last three rows, is a post-void of the first receipt (rows 1-3).
What I want to do is change the transaction_start of the post_voided = 'Y' transactions (in my example, only the receipt represented by the last three rows) to the closest earlier datetime of a similar receipt whose store_no and item_no match and whose amount is the positive counterpart (with post_voided = 'N'). In my example, the similar receipt is given by the first three rows: the store_no, all item_no, and the (positive) amounts match. The transaction_start of the post-voided receipt is always equal to or later than that of the "original" receipt.
Desired output:
transaction_start store_no item_no amount post_voided
2021-03-01 10:00:00 001 101 45 N
2021-03-01 10:00:00 001 105 25 N
2021-03-01 10:00:00 001 109 40 N
2021-03-01 10:05:00 002 103 35 N
2021-03-01 10:05:00 002 135 20 N
2021-03-01 10:08:00 001 140 2 N
2021-03-01 10:00:00 001 101 -45 Y
2021-03-01 10:00:00 001 105 -25 Y
2021-03-01 10:00:00 001 109 -40 Y
Here is a link to the table: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=26142fa24e46acb4213b96c86f4eb94b
Thanks in advance!
Consider below
select a.* replace(ifnull(b.transaction_start, a.transaction_start) as transaction_start)
from `project.dataset.table` a
left join (
  select * replace(-amount as amount)
  from `project.dataset.table`
  where post_voided = 'N'
) b
using (store_no, item_no, amount)
If applied to the sample data in your question, the output matches the desired result.
Consider below for a new / extended example (https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=91f9f180fd672e7c357aa48d18ced5fd):
select x.* replace(ifnull(y.original_transaction_start, x.transaction_start) as transaction_start)
from `project.dataset.table` x
left join (
  select b.transaction_start, b.store_no, b.item_no, b.amount,
         max(a.transaction_start) as original_transaction_start
  from `project.dataset.table` a
  join `project.dataset.table` b
    on a.store_no = b.store_no
    and a.item_no = b.item_no
    and a.amount = -b.amount
    and a.post_voided = 'N'
    and b.post_voided = 'Y'
    and a.transaction_start < b.transaction_start
  group by b.transaction_start, b.store_no, b.item_no, b.amount
) y
using (store_no, item_no, amount, transaction_start)
with output matching the desired result above.
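To experiment without a real table, the sample rows can be inlined in a CTE and the CTE name substituted for `project.dataset.table` — a sketch with a subset of the rows:
-- Inline a few of the question's rows; the first query then runs against `sample`.
with sample as (
  select datetime '2021-03-01 10:00:00' as transaction_start, '001' as store_no, 101 as item_no,  45 as amount, 'N' as post_voided union all
  select datetime '2021-03-01 10:05:00', '002', 103,  35, 'N' union all
  select datetime '2021-03-01 10:11:00', '001', 101, -45, 'Y'
)
select a.* replace(ifnull(b.transaction_start, a.transaction_start) as transaction_start)
from sample a
left join (
  select * replace(-amount as amount)
  from sample
  where post_voided = 'N'
) b
using (store_no, item_no, amount)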

Query to find active days per year to find revenue per user per year

I have 2 dimension tables and 1 fact table as follows:
user_dim
user_id  user_name  user_joining_date
1        Steve      2013-01-04
2        Adam       2012-11-01
3        John       2013-05-05
4        Tony       2012-01-01
5        Dan        2010-01-01
6        Alex       2019-01-01
7        Kim        2019-01-01
bundle_dim
bundle_id  bundle_name            bundle_type  bundle_cost_per_day
101        movies and TV          prime        5.5
102        TV and sports          prime        6.5
103        Cooking                prime        7
104        Sports and news        prime        5
105        kids movie             extra        2
106        kids educative         extra        3.5
107        spanish news           extra        2.5
108        Spanish TV and sports  extra        3.5
109        Travel                 extra        2
plans_fact
user_id  bundle_id  bundle_start_date  bundle_end_date
1        101        2019-10-10         2020-10-10
2        107        2020-01-15         (null)
2        106        2020-01-15         2020-12-31
2        101        2020-01-15         (null)
2        103        2020-01-15         2020-02-15
1        101        2020-10-11         (null)
1        107        2019-10-10         2020-10-10
1        105        2019-10-10         2020-10-10
4        101        2021-01-01         2021-02-01
3        104        2020-02-17         2020-03-17
2        108        2020-01-15         (null)
4        102        2021-01-01         (null)
4        103        2021-01-01         (null)
4        108        2021-01-01         (null)
5        103        2020-01-15         (null)
5        101        2020-01-15         2020-02-15
6        101        2021-01-01         2021-01-17
6        101        2021-01-20         (null)
6        108        2021-01-01         (null)
7        104        2020-02-17         (null)
7        103        2020-01-17         2020-01-18
1        102        2020-12-11         (null)
2        106        2021-01-01         (null)
7        107        2020-01-15         (null)
Note: a NULL bundle_end_date refers to an active subscription.
User active days can be calculated as: bundle_end_date - bundle_start_date (for the given bundle).
Total revenue per user could be calculated as: total no. of active days * bundle rate per day.
I am looking to write a query to find revenue generated per user per year.
Here is what I have for the overall revenue per user:
select pf.user_id
     , sum(datediff(day, pf.bundle_start_date, coalesce(pf.bundle_end_date, getdate())) * bd.bundle_cost_per_day) total_cost_per_bundle
from plans_fact pf
inner join bundle_dim bd on bd.bundle_id = pf.bundle_id
group by pf.user_id
order by pf.user_id;
You need a 'year' table to help parse each multi-year-spanning row into its separate years. For each year, you also need to recalculate the start and end dates. That's what I do in the yearParsed CTE in the code below. I hard-code the years into the join statement that creates y. You will probably do it differently, but however you get those values will work.
After that, pretty much sum as you did before, just adding the year column to your grouping.
Aside from that, all I did was move the null-coalesce logic into the CTE to make the overall logic simpler.
with yearParsed as (
    select pf.*,
           y.year,
           startDt = iif(pf.bundle_start_date > y.startDt, pf.bundle_start_date, y.startDt),
           endDt = iif(ap.bundle_end_date < y.endDt, ap.bundle_end_date, y.endDt)
    from plans_fact pf
    cross apply (select bundle_end_date = isnull(pf.bundle_end_date, getdate())) ap
    join (values
        (2019, '2019-01-01', '2019-12-31'),
        (2020, '2020-01-01', '2020-12-31'),
        (2021, '2021-01-01', '2021-12-31')
    ) y (year, startDt, endDt)
        on pf.bundle_start_date <= y.endDt
        and ap.bundle_end_date >= y.startDt
)
select yp.user_id,
yp.year,
total_cost_per_bundle = sum(datediff(day, yp.startDt, yp.endDt) * bd.bundle_cost_per_day)
from yearParsed yp
join bundle_dim bd on bd.bundle_id = yp.bundle_id
group by yp.user_id,
yp.year
order by yp.user_id,
yp.year;
Now, if this is common, you should probably create a base table for your 'year' data. But if it's not common, and for this report you don't want to keep coming back to hard-code the year information into the y table, you can do this:
declare @yearTable table (
    year int,
    startDt char(10),
    endDt char(10)
);

with y as (
    select year = year(min(pf.bundle_start_date))
    from plans_fact pf
    union all
    select year + 1
    from y
    where year < year(getdate())
)
insert @yearTable
select year,
       startDt = convert(char(4), year) + '-01-01',
       endDt = convert(char(4), year) + '-12-31'
from y;
and it will create the appropriate years for you. But you can see why creating a base table may be preferred if you have this or a similar need often.
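With the table variable populated, the main query above stays the same except that the hard-coded VALUES list is swapped for @yearTable — shown in full for reference (run in the same batch as the declare):
with yearParsed as (
    select pf.*,
           y.year,
           startDt = iif(pf.bundle_start_date > y.startDt, pf.bundle_start_date, y.startDt),
           endDt = iif(ap.bundle_end_date < y.endDt, ap.bundle_end_date, y.endDt)
    from plans_fact pf
    cross apply (select bundle_end_date = isnull(pf.bundle_end_date, getdate())) ap
    join @yearTable y  -- the table variable replaces the hard-coded VALUES list
        on pf.bundle_start_date <= y.endDt
        and ap.bundle_end_date >= y.startDt
)
select yp.user_id,
       yp.year,
       total_cost_per_bundle = sum(datediff(day, yp.startDt, yp.endDt) * bd.bundle_cost_per_day)
from yearParsed yp
join bundle_dim bd on bd.bundle_id = yp.bundle_id
group by yp.user_id, yp.year
order by yp.user_id, yp.year;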

SQL - Creating a timeline for each ID (Vertica)

I am dealing with the following problem in SQL (using Vertica):
In short -- Create a timeline for each ID (in a table where I have multiple lines, orders in my example, per ID)
What I would like to achieve -- At my disposal I have a table of historical order dates, and I would like to compute the rates of new customers (first order ever in the past month), active customers (>1 order in the last 1-3 months), passive customers (no order for the last 3-6 months) and inactive customers (no order for >6 months).
The steps I have taken so far -- I was able to construct a table similar to the example presented below:
CustomerID Current order date Time between current/previous order First order date (all-time)
001 2015-04-30 12:06:58 (null) 2015-04-30 12:06:58
001 2015-09-24 17:30:59 147 05:24:01 2015-04-30 12:06:58
001 2016-02-11 13:21:10 139 19:50:11 2015-04-30 12:06:58
002 2015-10-21 10:38:29 (null) 2015-10-21 10:38:29
003 2015-05-22 12:13:01 (null) 2015-05-22 12:13:01
003 2015-07-09 01:04:51 47 12:51:50 2015-05-22 12:13:01
003 2015-10-23 00:23:48 105 23:18:57 2015-05-22 12:13:01
A little bit of intuition: customer 001 placed three orders, of which the second was placed 147 days after the first. Customer 002 has placed only one order in total.
What I think the next steps should be -- I would like to know, for each date (including dates on which a given user did not place an order) and for each CustomerID, how long it has been since his/her last order. This implies building some sort of timeline for each CustomerID: in the example presented above I would get 287 lines per CustomerID (the days between the 1st of May 2015 and the 11th of February 2016, the timespan of this table). I am having difficulty with this step. Once it is done, I want to create fields that show, at each date, the last order date, the period between that last order date and the current date, and the state the customer is in at the current date. For the example presented earlier, this would look something like this:
CustomerID Last order date Current date Time between current date /last order State
001 2015-04-30 12:06:58 2015-05-01 00:00:00 0 00:00:00 New
...
001 2015-04-30 12:06:58 2015-06-30 00:00:00 60 11:53:02 Active
...
001 2015-09-24 17:30:59 2016-02-01 00:00:00 129 11:53:02 Passive
...
...
002 2015-10-21 17:30:59 2015-10-22 00:00:00 0 06:29:01 New
...
002 2015-10-21 17:30:59 2015-11-30 00:00:00 39 06:29:01 Active
...
...
003 2015-05-22 12:13:01 2015-06-23 00:00:00 31 11:46:59 Active
...
003 2015-07-09 01:04:51 2015-10-22 00:00:00 105 11:46:59 Inactive
...
At the dots there should be all the in-between dates, but for the sake of space I have left these out of the table.
When I know for each date what the state of each customer is (new/active/passive/inactive), my plan is to count the states grouped by date, which should give me the number of new, active, passive and inactive customers per date. From there I can easily compute the rates at each date.
Anybody that knows how I can possibly achieve this task?
Note -- If anyone has other ideas how to achieve the goal presented above (using some other approach compared to the approach I had in mind) please let me know!
EDIT
Suppose you start from a table like this:
SQL> select * from ord order by custid, ord_date ;
custid | ord_date
--------+---------------------
1 | 2015-04-30 12:06:58
1 | 2015-09-24 17:30:59
1 | 2016-02-11 13:21:10
2 | 2015-10-21 10:38:29
3 | 2015-05-22 12:13:01
3 | 2015-07-09 01:04:51
3 | 2015-10-23 00:23:48
(7 rows)
You can use Vertica's Time Series Analytic Functions TS_FIRST_VALUE() and TS_LAST_VALUE() to fill the gaps and carry the last order date forward to the current date. You just have to join this with a Vertica time series generated from the same table, with an interval of one day, starting from the day each customer placed his/her first order and running up to now (current_date):
select
custid,
status_dt,
last_order_dt,
case
when status_dt::date - last_order_dt::date < 30 then case
when nord = 1 then 'New' else 'Active' end
when status_dt::date - last_order_dt::date < 90 then 'Active'
when status_dt::date - last_order_dt::date < 180 then 'Passive'
else 'Inactive'
end as status
from (
select
custid,
last_order_dt,
status_dt,
conditional_true_event (first_order_dt is null or
last_order_dt > lag(last_order_dt))
over(partition by custid order by status_dt) as nord
from (
select
custid,
ts_first_value(ord_date) as first_order_dt ,
ts_last_value(ord_date) as last_order_dt ,
dt::date as status_dt
from
( select custid, ord_date from ord
union all
select distinct(custid) as custid, current_date + 1 as ord_date from ord
) z timeseries dt as '1 day' over (partition by custid order by ord_date)
) x
) y
where status_dt <= current_date
order by 1, 2
;
And you will get something like this:
custid | status_dt | last_order_dt | status
--------+------------+---------------------+---------
1 | 2015-04-30 | 2015-04-30 12:06:58 | New
1 | 2015-05-01 | 2015-04-30 12:06:58 | New
1 | 2015-05-02 | 2015-04-30 12:06:58 | New
...
1 | 2015-05-29 | 2015-04-30 12:06:58 | New
1 | 2015-05-30 | 2015-04-30 12:06:58 | Active
1 | 2015-05-31 | 2015-04-30 12:06:58 | Active
...
etc.
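The key building block is the innermost TIMESERIES clause. A stripped-down sketch of just the gap-filling step, producing one row per customer per day with the last order date carried forward (assuming the ord table above):
-- one row per custid per day; ts_last_value carries ord_date forward over gaps
select custid,
       ts_last_value(ord_date) as last_order_dt,
       dt::date as status_dt
from ord
timeseries dt as '1 day' over (partition by custid order by ord_date);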

Creating a field(s) that counts days within a month from date range?

Similar to the following:
Count days within a month from date range
I want to find a way, within the MS-Access Query Design environment, to create fields that count the number of month/year days within a date range.
Here is what I want the data to look like:
Row  StartDate   EndDate     #DaysJan2010  #DaysFeb2010  #DaysMarch2010
001  01/02/2010  02/04/2012  29            28            31
002  01/02/2010  01/05/2010  4             0             0
003  04/02/2010  05/05/2010  0             0             0
004  01/02/2010  02/04/2012  29            28            31
005  02/02/2012  02/03/2012  0             2             0
Please keep in mind that both month and year are important because I need to be able to distinguish between the number of days that fall within a given date range for January 2010 and January 2011, as opposed to just the number of days within a given date range that are in January.
If there is a systematic way of creating these fields using SQL in Access, that would be my preferred method.
However, in the event that it is impossible (or very difficult) to do so, I would like to know how to build each field in the Expression Builder, so that I may at least be able to generate the count fields one at a time.
As always, thank you very much for your time.
There are cases where date manipulations can be aided by a "dates table". Similar to a "numbers table", a "dates table" is a table containing one row for every date in a given range, usually covering the entire range of dates that one could expect to encounter in the actual data.
For sample data in a table named [SampleData]
Row StartDate EndDate
--- ---------- ----------
001 2010-01-02 2012-02-04
002 2010-01-02 2010-01-05
003 2010-04-02 2010-05-05
004 2010-01-02 2012-02-04
005 2012-02-02 2012-02-03
and a [DatesTable] that is simply
theDate
----------
2010-01-01
2010-01-02
2010-01-03
...
2012-12-30
2012-12-31
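(If no such table exists yet, it can be filled once with a make-table query — a sketch assuming a hypothetical [Digits] table holding the values 0 through 9 in a column [D]:)
SELECT DateAdd("d", th.D * 1000 + h.D * 100 + t.D * 10 + o.D, #1/1/2010#) AS theDate
INTO DatesTable
FROM Digits AS th, Digits AS h, Digits AS t, Digits AS o
WHERE DateAdd("d", th.D * 1000 + h.D * 100 + t.D * 10 + o.D, #1/1/2010#) <= #12/31/2012#;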
the query
SELECT
sd.Row,
dt.theDate,
Year(dt.theDate) AS theYear,
Month(dt.theDate) AS theMonth
FROM
SampleData AS sd
INNER JOIN
DatesTable AS dt
ON dt.theDate >= sd.StartDate
AND dt.theDate <= sd.EndDate
returns a row for each date in the interval for each [SampleData].[Row] value. (For this particular sample data, that's 1568 rows in total.)
Performing an aggregation on that
SELECT
Row,
theYear,
theMonth,
COUNT(*) AS NumberOfDays
FROM
(
SELECT
sd.Row,
dt.theDate,
Year(dt.theDate) AS theYear,
Month(dt.theDate) AS theMonth
FROM
SampleData AS sd
INNER JOIN
DatesTable AS dt
ON dt.theDate >= sd.StartDate
AND dt.theDate <= sd.EndDate
) AS allDates
GROUP BY
Row,
theYear,
theMonth
gives us all of the counts
Row theYear theMonth NumberOfDays
--- ------- -------- ------------
001 2010 1 30
001 2010 2 28
001 2010 3 31
001 2010 4 30
001 2010 5 31
001 2010 6 30
001 2010 7 31
001 2010 8 31
001 2010 9 30
001 2010 10 31
001 2010 11 30
001 2010 12 31
001 2011 1 31
001 2011 2 28
001 2011 3 31
001 2011 4 30
001 2011 5 31
001 2011 6 30
001 2011 7 31
001 2011 8 31
001 2011 9 30
001 2011 10 31
001 2011 11 30
001 2011 12 31
001 2012 1 31
001 2012 2 4
002 2010 1 4
003 2010 4 29
003 2010 5 5
004 2010 1 30
004 2010 2 28
004 2010 3 31
004 2010 4 30
004 2010 5 31
004 2010 6 30
004 2010 7 31
004 2010 8 31
004 2010 9 30
004 2010 10 31
004 2010 11 30
004 2010 12 31
004 2011 1 31
004 2011 2 28
004 2011 3 31
004 2011 4 30
004 2011 5 31
004 2011 6 30
004 2011 7 31
004 2011 8 31
004 2011 9 30
004 2011 10 31
004 2011 11 30
004 2011 12 31
004 2012 1 31
004 2012 2 4
005 2012 2 2
We can then report on that, or crosstab it, or do any number of other fun things.
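For instance, to get back toward the original column layout (one column per year/month), a crosstab over the same join is close — a sketch; the Format() expression becomes the column heading, and empty cells come back as Null rather than 0:
TRANSFORM Count(*) AS NumberOfDays
SELECT sd.Row
FROM SampleData AS sd
INNER JOIN DatesTable AS dt
    ON dt.theDate >= sd.StartDate
    AND dt.theDate <= sd.EndDate
GROUP BY sd.Row
PIVOT Format(dt.theDate, "yyyy-mm");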
Side note:
One circumstance where a "dates table" can be very useful is when we have to deal with Statutory Holidays. That is because:
- Sometimes the "day off" for a Statutory Holiday is not the actual day. If "International Bacon Day" falls on a Sunday then we might get the Monday off.
- Some Statutory Holidays can be tricky to calculate. For example, Good Friday for us Canadians is (if I remember correctly) "the Friday before the first Sunday after the first full moon after the Spring Equinox".
If we have a "dates table" then we can add a [StatutoryHoliday] Yes/No field to flag all of the (observed) holidays and then use ... WHERE NOT StatutoryHoliday to exclude them.
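A sketch of that flag in use, e.g. counting the non-holiday days in January 2010 (assuming the [StatutoryHoliday] field has been added and populated):
SELECT Count(*) AS NonHolidayDays
FROM DatesTable
WHERE theDate >= #1/1/2010#
  AND theDate <= #1/31/2010#
  AND NOT StatutoryHoliday;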