Add N business days to a given date skipping holidays, exceptions and weekends in SQL DB2 - sql

I'm facing a challenging task here. I spent a day on it and was only able to solve it with a procedure, but it takes too long to run for all projects.
I would like to solve it in a single query if possible (no functions or procedures).
There are already some questions here that do this in programming languages or with SQL functions/procedures (which is how I also solved mine), so I'm asking whether it is possible to solve it with just SQL.
The background info is:
A project table
A phase table
A holiday table
A dayexception table, which cancels a holiday or a weekend day (makes that date a working day) and is associated with a project
A project may have 0-N phases
A phase has a start date, a duration and a draworder (needed by the system)
Working days are all days that are neither weekend days nor holidays (unless the date is in the dayexception table)
Consider the following scenario:
project | phase(s) | Dayexception | Holiday
id | id pid start duration draworder | pid date | date
1 | 1 1 2014-01-20 10 0 | 1 2014-01-25 | 2014-01-25
| 2 1 2014-02-17 14 2 | |
The ENDDATE for the project id 1 and phase id 1 is actually 2014-01-31 see the generated data below:
The date on the below data (and now on) is formatted as dd/mm/yyyy (Brazil format) and the value N is null
proj pha start day weekday dayexcp holiday workday
1 1 20/01/2014 20/01/2014 2 N N 1
1 1 20/01/2014 21/01/2014 3 N N 1
1 1 20/01/2014 22/01/2014 4 N N 1
1 1 20/01/2014 23/01/2014 5 N N 1
1 1 20/01/2014 24/01/2014 6 N N 1
1 1 20/01/2014 25/01/2014 7 25/01/2014 25/01/2014 1
1 1 20/01/2014 26/01/2014 1 N N 0
1 1 20/01/2014 27/01/2014 2 N 27/01/2014 0
1 1 20/01/2014 28/01/2014 3 N N 1
1 1 20/01/2014 29/01/2014 4 N N 1
To generate the above data I created a view, daysOfYear, with all days from 2014 and 2015 (it can be bigger or smaller; I created it with two years to cover the year-turn cases) using a CTE query. If you want to see it, let me know and I will add it here. I then used the following select statement:
select ph.project_id proj,
ph.id pha,
ph.start,
dy.curday day,
dy.weekday, /*weekday here is a calling to the weekday function of db2*/
doe.exceptiondate dayexcp,
h.date holiday,
case when exceptiondate is not null or (weekday not in (1,7) and h.date is null)
then 1 else 0 end as workday
from phase ph
inner join daysofyear dy
on (year(ph.start) = dy.year)
left join dayexception doe
on (ph.project_id = doe.project_id
and dy.curday = truncate(doe.exceptiondate))
left join holiday h
on (dy.curday = truncate(h.date))
where ph.project_id = 1
and ph.id = 1
and dy.year in (year(ph.start),year(ph.start)+1)
and dy.curday>=ph.start
and dy.curday<=ph.start + ((duration - 1) days)
order by ph.project_id, start, dy.curday, draworder
To solve this scenario I created the following query:
select project_id,
min(start),
max(day) + sum(case when workday=0 then 1 else 0 end) days as enddate
from project_phase_days /*(view to the above select)*/
group by project_id
This will return correctly:
proj start enddate
1 20/01/2014 31/01/2014
The problem I couldn't solve is what happens when the days I'm adding to the last date (max(day)), that is, the count of non-working days from sum(case when workday=0 then 1 else 0 end) days, are themselves weekend days, holidays or exceptions.
See the following scenario (The duration for the below phase is 7):
proj pha start day weekday dayexcp holiday workday
81 578 14/04/2014 14/04/2014 2 N N 1
81 578 14/04/2014 15/04/2014 3 N N 1
81 578 14/04/2014 16/04/2014 4 N N 1
81 578 14/04/2014 17/04/2014 5 N N 1
81 578 14/04/2014 18/04/2014 6 N 18/04/2014 0
81 578 14/04/2014 19/04/2014 7 N N 0
81 578 14/04/2014 20/04/2014 1 N 20/04/2014 0
/*the below data I added to show the problem*/
81 578 14/04/2014 21/04/2014 2 N 21/04/2014 0
81 578 14/04/2014 22/04/2014 3 N N 1
81 578 14/04/2014 23/04/2014 4 N N 1
81 578 14/04/2014 24/04/2014 5 N N 1
With the above data my query will return
proj start enddate
81 14/04/2014 23/04/2014
But the correct enddate would be 24/04/2014. My query doesn't take into account whether the days after the last day are weekend days or holidays (or exceptions, for that matter): as you can see in the dataset above, 21/04/2014, which is outside my duration, is also a holiday.
I also tried to create a CTE on the phase table that adds a day on each iteration until the duration is over, but I couldn't include the exceptions or the holidays because DB2 won't let me use a left join in the CTE recursion. Like this:
with CTE (projectid, start, enddate, duration, level) as (
select projectid, start, start as enddate, duration, 1
from phase
where project_id=1
and phase_id=1
UNION ALL
select projectid, start, enddate + (level days), duration,
case when isWorkDay(enddate + (level days)) then level+1 else level end as level
from CTE left join dayexception on ...
left join holiday on ...
where level < duration
) select * from CTE
PS: the above query doesn't work because of the DB2 limitations, and isWorkDay is just an example (it would be a case expression on the dayexception and holiday table values).
If you have any doubts, please just ask in the comments.
Any help would be greatly appreciated. Thanks.

How to count business days forward and backwards.
Background: last century I worked at a company that used this technique, so this is a pseudocode answer. It worked great for their purposes.
What you need is a table that contains a date column and an id column that increments by one. Fill the table with only business dates. That's the tricky part, because a holiday can be observed on another date. For example, 2017-01-02 was a holiday where I work, but it's not really a recognized holiday AFAIK.
How to get 200 business days in the future.
Select the min(id) where the date is >= the current date.
Select the date where id = that id + 200.
How to get 200 business days in the past.
Select the min(id) from the table where the date is >= the current date.
Select the date where id = that id - 200.
Business days between.
select count(*) from myBusinessDays where "date" between startdate and enddate
Good Luck as this is pseudo code.
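The pseudocode above can be sketched roughly as follows. This is only an illustration in Python with an in-memory SQLite database, not the original company's implementation: the table name business_days, the helper add_business_days and the two-entry holiday set (taken from the question's April 2014 scenario) are all assumptions.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical holiday set; a real calendar would be loaded from the holiday table.
HOLIDAYS = {date(2014, 4, 18), date(2014, 4, 21)}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE business_days (id INTEGER PRIMARY KEY, day TEXT UNIQUE)")

# Fill the table with business dates only; the id grows by one per business day.
day, next_id = date(2014, 1, 1), 1
while day <= date(2014, 12, 31):
    if day.weekday() < 5 and day not in HOLIDAYS:  # Monday=0 .. Friday=4
        conn.execute("INSERT INTO business_days VALUES (?, ?)",
                     (next_id, day.isoformat()))
        next_id += 1
    day += timedelta(days=1)


def add_business_days(start, n):
    """Business date n rows after (or, for n < 0, before) the first business day >= start."""
    row = conn.execute(
        """SELECT day FROM business_days
           WHERE id = (SELECT MIN(id) FROM business_days WHERE day >= ?) + ?""",
        (start.isoformat(), n),
    ).fetchone()
    return date.fromisoformat(row[0]) if row else None
```

With this calendar, add_business_days(date(2014, 4, 14), 6) gives 2014-04-24, the end date the question expects for a 7-day phase starting on 14/04/2014. Per-project exceptions are not modelled here; they would need a calendar per project or an extra join.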

So, using the idea from @danny117's answer I was able to create a query that solves my problem. It is not exactly his idea, but it pointed me in the right direction, so I will mark it as the correct answer; this answer shares the actual code.
First let me share the view I created for the periods. As I said, I created a view daysofyear with the data of 2014 and 2015 (in my final solution I used a considerably bigger interval, without any impact on the end result). PS: the date format here is the Brazilian format, dd/mm/yyyy.
create or replace view daysofyear as
with CTE (curday, year, weekday) as (
select a1.firstday, year(a1.firstday), dayofweek(a1.firstday)
from (select to_date('01/01/1990', 'dd/mm/yyyy') firstday
from sysibm.sysdummy1) as a1
union all
select a.curday + 1 day as sumday,
year(a.curday + 1 day),
dayofweek(a.curday + 1 day)
from CTE a
where a.curday < to_date('31/12/2050', 'dd/mm/yyyy')
)
select * from cte;
With that view I then created another view from the query in my question, adding an amount of days based on my historical data (the longest phase plus a considerable margin). Here it is:
create or replace view project_phase_days as
select ph.project_id proj,
ph.id pha,
ph.start,
dy.curday day,
dy.weekday, /*weekday here is a calling to the weekday function of db2*/
doe.exceptiondate dayexcp,
h.date holiday,
ph.duration,
case when exceptiondate is not null or (weekday not in (1,7) and h.date is null)
then 1 else 0 end as workday
from phase ph
inner join daysofyear dy
on (year(ph.start) = dy.year)
left join dayexception doe
on (ph.project_id = doe.project_id
and dy.curday = truncate(doe.exceptiondate))
left join holiday h
on (dy.curday = truncate(h.date))
where dy.year in (year(ph.start),year(ph.start)+1)
and dy.curday>=ph.start
and dy.curday<=ph.start + ((duration - 1) days) + 200 days
/*max duration in database is 110*/
After that I then created this query:
select p.id,
a.start,
a.curday as enddate
from project p left join
(
select p1.proj as project_id, /*column names as defined in the project_phase_days view*/
p1.duration,
p1.start,
p1.day as curday,
row_number() over (partition by p1.proj
order by p1.proj, p1.start, p1.day) rorder
from project_phase_days p1
where p1.workday=1
) as a
on (p.id = a.project_id
and a.rorder = a.duration)
order by p.id, a.start
What it does is select all workdays from my view (which is joined with my days view) and number the rows per project_id, ordered by project_id, start date and current day (curday). I then join with the project table, and the trick that solved the problem is the condition a.rorder = a.duration: the workday whose row number equals the duration is the end date.
If you guys need more explanation I will be glad to provide.
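For anyone who wants to try the idea without a DB2 instance, here is a minimal sketch of the same "number the workdays, pick the row whose number equals the duration" trick, run through Python's sqlite3 module (window functions need SQLite 3.25 or later). The simplified table layouts and the days CTE standing in for daysofyear are my assumptions, and I partition by phase rather than by project so that every phase gets its own numbering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE phase (id INTEGER, project_id INTEGER, start TEXT, duration INTEGER);
CREATE TABLE holiday (day TEXT);
CREATE TABLE dayexception (project_id INTEGER, day TEXT);
-- sample rows taken from the two scenarios in the question
INSERT INTO phase VALUES (1, 1, '2014-01-20', 10), (578, 81, '2014-04-14', 7);
INSERT INTO holiday VALUES ('2014-01-25'), ('2014-01-27'),
                           ('2014-04-18'), ('2014-04-20'), ('2014-04-21');
INSERT INTO dayexception VALUES (1, '2014-01-25');
""")

END_DATES_SQL = """
WITH RECURSIVE days(day) AS (            -- stand-in for the daysofyear view
    SELECT '2014-01-01'
    UNION ALL
    SELECT date(day, '+1 day') FROM days WHERE day < '2014-12-31'
),
workdays AS (
    SELECT ph.project_id, ph.id AS phase_id, ph.duration, d.day,
           ROW_NUMBER() OVER (PARTITION BY ph.id ORDER BY d.day) AS rorder
    FROM phase ph
    JOIN days d ON d.day >= ph.start
    LEFT JOIN dayexception x ON x.project_id = ph.project_id AND x.day = d.day
    LEFT JOIN holiday h ON h.day = d.day
    WHERE x.day IS NOT NULL               -- an exception is always a workday
       OR (strftime('%w', d.day) NOT IN ('0', '6') AND h.day IS NULL)
)
SELECT project_id, phase_id, day AS enddate
FROM workdays
WHERE rorder = duration                   -- the Nth workday is the end date
ORDER BY project_id, phase_id
"""

end_dates = conn.execute(END_DATES_SQL).fetchall()
```

Because the WHERE clause removes non-working days before the rows are numbered, the numbering counts only workdays, which is what makes rorder = duration land on the right date.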

Related

how to calculate occupancy on the basis of admission and discharge dates

Suppose I have patient admission/claim wise data like the sample below. Data type of patient_id and hosp_id columns is VARCHAR
Table name claims
rec_no patient_id hosp_id admn_date discharge_date
1 1 1 01-01-2020 10-01-2020
2 2 1 31-12-2019 11-01-2020
3 1 1 11-01-2020 15-01-2020
4 3 1 04-01-2020 10-01-2020
5 1 2 16-01-2020 17-01-2020
6 4 2 01-01-2020 10-01-2020
7 5 2 02-01-2020 11-01-2020
8 6 2 03-01-2020 12-01-2020
9 7 2 04-01-2020 13-01-2020
10 2 1 31-12-2019 10-01-2020
I have another table wherein bed strength/max occupancy strength of hospitals are stored.
table name beds
hosp_id bed_strength
1 3
2 4
Expected results: I want to find, hospital-wise, the dates on which the declared bed strength was exceeded.
Code I have tried: nothing, as I am new to SQL. However, I can solve this in R with the following strategy:
pivot_longer the dates
tidyr::complete() the missing dates in between
summarise or aggregate the results for each date.
I also want to know whether it can be done without pivoting (if at all) in SQL, because the claims table has 15 million+ rows and pivoting really slows down the process. Please help.
You can use generate_series() to do something very similar in Postgres. For the occupancy by date:
select c.hosp_id, gs.date, count(*) as occupancy
from claims c cross join lateral
generate_series(admn_date, discharge_date, interval '1 day') gs(date)
group by c.hosp_id, gs.date;
Then use this as a subquery to get the dates that exceed the threshold:
select hd.*, b.bed_strength
from (select c.hosp_id, gs.date, count(*) as occupancy
from claims c cross join lateral
generate_series(c.admn_date, c.discharge_date, interval '1 day') gs(date)
group by c.hosp_id, gs.date
) hd join
beds b
using (hosp_id)
where hd.occupancy > b.bed_strength
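To check what such a query should return on the sample data, the same logic can be sketched in plain Python: expand every stay into its individual days, count rows per hospital and day, and keep the days above the bed strength. This is only a small illustration, not a replacement for the SQL on 15 million rows, and like count(*) above it counts claims rather than distinct patients.

```python
from collections import Counter
from datetime import date, timedelta

# (hosp_id, admn_date, discharge_date) from the sample claims rows
claims = [
    ("1", date(2020, 1, 1), date(2020, 1, 10)),
    ("1", date(2019, 12, 31), date(2020, 1, 11)),
    ("1", date(2020, 1, 11), date(2020, 1, 15)),
    ("1", date(2020, 1, 4), date(2020, 1, 10)),
    ("2", date(2020, 1, 16), date(2020, 1, 17)),
    ("2", date(2020, 1, 1), date(2020, 1, 10)),
    ("2", date(2020, 1, 2), date(2020, 1, 11)),
    ("2", date(2020, 1, 3), date(2020, 1, 12)),
    ("2", date(2020, 1, 4), date(2020, 1, 13)),
    ("1", date(2019, 12, 31), date(2020, 1, 10)),
]
beds = {"1": 3, "2": 4}  # hosp_id -> bed_strength

# One count per (hospital, day); both endpoints of a stay are included.
occupancy = Counter()
for hosp_id, admitted, discharged in claims:
    day = admitted
    while day <= discharged:
        occupancy[(hosp_id, day)] += 1
        day += timedelta(days=1)

# Days on which a hospital holds more claims than it has beds.
over_capacity = sorted(
    (hosp_id, day, count)
    for (hosp_id, day), count in occupancy.items()
    if count > beds[hosp_id]
)
```

On this sample only hospital 1 goes over its limit, from 04-01-2020 to 10-01-2020 with 4 occupied beds against a strength of 3; hospital 2 peaks at exactly 4, which does not exceed its strength.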

How to LEFT JOIN on ROW_NUM using WITH

Right now I'm in the testing phase of this query so I'm only testing it on two Queries. I've gotten stuck on the final part where I want to left join everything (this will have to be extended to 12 separate queries). The problem is basically as the title suggests--I want to join 12 queries on the created Row_Num column using the WITH() statement, instead of creating 12 separate tables and saving them as table in a database.
WITH Jan_Table AS
(SELECT ROW_NUMBER() OVER (ORDER BY a.SALE_DATE) as Row_ID, a.SALE_DATE, sum(a.revenue) as Jan_Rev
FROM ba.SALE_TABLE a
WHERE a.SALE_DATE BETWEEN '2015-01-01' and '2015-01-31'
GROUP BY a.SALE_DATE)
SELECT ROW_NUMBER() OVER (ORDER BY a.SALE_DATE) as Row_ID, a.SALE_DATE, sum(a.revenue) as Jun_Rev, j.Jan_Rev
FROM ba.SALE_TABLE a
LEFT JOIN Jan_Table j
on "j.Row_ID" = a.Row_ID
WHERE a.SALE_DATE BETWEEN '2015-06-01' and '2015-06-30'
GROUP BY a.SALE_DATE
And then I get this error message:
ERROR: column "j.Row_ID" does not exist
I put in the "j.Row_ID" because the previous message was:
ERROR: column a.row_id does not exist Hint: Perhaps you meant to
reference the column "j.row_id".
Each query works individually without the JOIN and WITH functions. I have one for every month of the year and want to join 12 of these together eventually.
The output should be a single column with ROW_NUM and 12 Monthly Revenues columns. Each row should be a day of the month. I know not every month has 31 days. So, for example, Feb only has 28 days, meaning I'd want days 29, 30, and 31 as NULLs. The query above still has the dates--but I will remove the "SALE_DATE" column after I can just get these two queries to join.
My initial thought was just to create 12 tables, but I think that would be a really bad use of space and not the most logical solution if I were to extend it.
edit
Below are the separate outputs of the two queries above; the third table is what I'm trying to make. I can't give you the raw data. Everything above has been altered from the actual column names and purposes of the data that I'm using. And I don't know how to create a dataset; that's too far above my head in SQL.
Jan_Table (first five lines)
Row_Num Date Jan_Rev
1 2015-01-01 20
2 2015-01-02 20
3 2015-01-03 20
4 2015-01-04 20
5 2015-01-05 20
Jun_Table (first five lines)
Row_Num Date Jun_Rev
1 2015-06-01 30
2 2015-06-02 30
3 2015-06-03 30
4 2015-06-04 30
5 2015-06-05 30
JOINED_TABLE (first five lines)
Row_Num Date Jun_Rev Date Jan_Rev
1 2015-06-01 30 2015-01-01 20
2 2015-06-02 30 2015-01-02 20
3 2015-06-03 30 2015-01-03 20
4 2015-06-04 30 2015-01-04 20
5 2015-06-05 30 2015-01-05 20
It seems like you can just use group by and conditional aggregation for your full query:
select day(sale_date),
max(case when month(sale_date) = 1 then sale_date end) as jan_date,
max(case when month(sale_date) = 1 then revenue end) as jan_revenue,
max(case when month(sale_date) = 2 then sale_date end) as feb_date,
max(case when month(sale_date) = 2 then revenue end) as feb_revenue,
. . .
from sale_table s
group by day(sale_date)
order by day(sale_date);
You haven't specified the database you are using. DAY() is a common function to get the day of the month; MONTH() is a common function to get the months of the year. However, those particular functions might be different in your database.
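Here is a small runnable sketch of the conditional aggregation idea, using Python's sqlite3 module so it can be tried anywhere. SQLite has no DAY() or MONTH(), so strftime('%d', ...) and strftime('%m', ...) stand in for them, and the five sample rows are made up. It uses SUM rather than MAX so that several sales on the same date add up, as the sum(a.revenue) in the question does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale_table (sale_date TEXT, revenue INTEGER)")
conn.executemany(
    "INSERT INTO sale_table VALUES (?, ?)",
    [("2015-01-01", 20), ("2015-01-02", 20), ("2015-01-31", 20),
     ("2015-06-01", 30), ("2015-06-02", 30)],  # hypothetical sample rows
)

# One output row per day of month, one revenue column per month.
rows = conn.execute("""
    SELECT CAST(strftime('%d', sale_date) AS INTEGER) AS day_of_month,
           SUM(CASE WHEN strftime('%m', sale_date) = '01' THEN revenue END) AS jan_revenue,
           SUM(CASE WHEN strftime('%m', sale_date) = '06' THEN revenue END) AS jun_revenue
    FROM sale_table
    GROUP BY day_of_month
    ORDER BY day_of_month
""").fetchall()
```

Day 31 comes back with a NULL (None in Python) for June, which is the behaviour asked for with the shorter months.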

T-SQL filtering records based on dates and time difference with other records

I have a table for which I have to perform a rather complex filter: first a filter by date is applied, but then records from the previous and next days should be included if their time difference does not exceed 8 hours compared to its prev or next record (depending if the date is less or greater than filter date).
For those adjacent days the selection should stop at the first record that does not satisfy this condition.
This is how my raw data looks like:
Id | Desc | EntryDate
1 | Event type 1 | 2021-03-12 21:55:00.000
2 | Event type 1 | 2021-03-12 01:10:00.000
3 | Event type 1 | 2021-03-11 20:17:00.000
4 | Event type 1 | 2021-03-11 05:04:00.000
5 | Event type 1 | 2021-03-10 23:58:00.000
6 | Event type 1 | 2021-03-10 11:01:00.000
7 | Event type 1 | 2021-03-10 10:00:00.000
In this example set, if my filter date is '2021-03-11', my expected result set should be all records from that day plus adjacent records from 03-12 and 03-10 that satisfy the 8-hour condition. Note how the record with Id 7 is not included, because the record with Id 6 does not comply:
Id | EntryDate
2 | 2021-03-12 01:10:00.000
3 | 2021-03-11 20:17:00.000
4 | 2021-03-11 05:04:00.000
5 | 2021-03-10 23:58:00.000
Need advice how to write this complex query
This is a variant of gaps-and-islands. Define the difference . . . and then groups based on the differences:
with e as (
select t.*
from (select t.*,
sum(case when prev_entrydate > dateadd(hour, -8, entrydate) then 0 else 1 end) over (order by entrydate) as grp
from (select t.*,
lag(entrydate) over (order by entrydate) as prev_entrydate
from t
) t
) t
)
select e.*
from e
where e.grp in (select e2.grp
from e e2
where convert(date, e2.entrydate) = @filterdate
);
Note: I'm not sure exactly how filter date is applied. This assumes that it is any events on the entire day, which means that there might be multiple groups. If there is only one group (say the first group on the day), the query can be simplified a bit from a performance perspective.
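To see how the grouping behaves on the sample rows, here is a rough Python sketch of the same gaps-and-islands logic. The function name filter_with_neighbours is made up for the illustration: rows are sorted by time, a new group starts whenever the gap to the previous row is 8 hours or more, and every group that touches the filter date is kept.

```python
from datetime import date, datetime, timedelta

events = [  # (Id, EntryDate) from the sample data
    (1, datetime(2021, 3, 12, 21, 55)),
    (2, datetime(2021, 3, 12, 1, 10)),
    (3, datetime(2021, 3, 11, 20, 17)),
    (4, datetime(2021, 3, 11, 5, 4)),
    (5, datetime(2021, 3, 10, 23, 58)),
    (6, datetime(2021, 3, 10, 11, 1)),
    (7, datetime(2021, 3, 10, 10, 0)),
]


def filter_with_neighbours(events, filter_date, max_gap=timedelta(hours=8)):
    """Ids of all events in groups (islands) that contain an event on filter_date."""
    grouped, grp, prev = [], 0, None
    for event_id, entry in sorted(events, key=lambda e: e[1]):
        # Start a new island when the gap to the previous row is 8 hours or more.
        if prev is None or entry - prev >= max_gap:
            grp += 1
        grouped.append((grp, event_id, entry))
        prev = entry
    wanted = {g for g, _, entry in grouped if entry.date() == filter_date}
    return [event_id for g, event_id, _ in grouped if g in wanted]
```

For the filter date 2021-03-11 this returns Ids 5, 4, 3 and 2 (in time order), which is the expected result set. Note that 03-11 touches two separate islands here, {5, 4} and {3, 2}, because records 4 and 3 are more than 8 hours apart.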
declare @DateTime datetime = '2021-03-11'
select *
from t
where t.EntryDate between DATEADD(hour , -8 , @DateTime) and DATEADD(hour , 32 , @DateTime)

Determine a specific fortnight based on anchor dates

I have 2 x bi-weekly periods that were defined by 2 starting dates 1 week apart. For example, Group 1 started on 2016-01-15 and Group 2 started on 2016-01-22.
By bi-weekly, I mean a rolling period lasting 2 weeks.
How can I determine if the current date is in week 1 of Group 1 or is in week 1 of Group 2?
By way of example, today's date is 2016-04-04, so this would be day 1 of Group 2 and day 8 of Group 1; therefore I would like a query to return 'Group 2'.
DATEDIFF calculates the difference between two dates. Divide it by 14 days and take the remainder (%).
If remainder is less than 7, then it is closer to that starting date.
Since you know that your starting dates are 1 week apart you really need to check only one starting date.
DECLARE @VarStartGroup1 date = '2016-01-15';
DECLARE @VarStartGroup2 date = '2016-01-22';
DECLARE @VarCurrentDate date = '2016-04-04';
SELECT
DATEDIFF(day, @VarStartGroup1, @VarCurrentDate) AS TotalDays1,
DATEDIFF(day, @VarStartGroup2, @VarCurrentDate) AS TotalDays2,
DATEDIFF(day, @VarStartGroup1, @VarCurrentDate) % 14 AS DayNumberInGroup1,
DATEDIFF(day, @VarStartGroup2, @VarCurrentDate) % 14 AS DayNumberInGroup2,
CASE WHEN DATEDIFF(day, @VarStartGroup1, @VarCurrentDate) % 14 < 7
THEN 'Group1' ELSE 'Group2' END AS Result
;
Result
+------------+------------+-------------------+-------------------+--------+
| TotalDays1 | TotalDays2 | DayNumberInGroup1 | DayNumberInGroup2 | Result |
+------------+------------+-------------------+-------------------+--------+
| 80 | 73 | 10 | 3 | Group2 |
+------------+------------+-------------------+-------------------+--------+
I included intermediate calculations in the result to help understand what is going on.
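The same arithmetic can be checked quickly outside the database. A minimal Python sketch, where the function name fortnight_group is made up for the illustration:

```python
from datetime import date


def fortnight_group(current, group1_start):
    """Return 'Group1' during the first week of Group 1's 14-day cycle, else 'Group2'."""
    day_in_cycle = (current - group1_start).days % 14  # 0..13
    return "Group1" if day_in_cycle < 7 else "Group2"


total_days = (date(2016, 4, 4) - date(2016, 1, 15)).days
result = fortnight_group(date(2016, 4, 4), date(2016, 1, 15))
```

Here total_days is 80 and 80 % 14 is 10, so the result is 'Group2', as in the table above. Python's % never returns a negative remainder for a positive divisor, so this sketch also works for dates before the anchor date.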

How to aggregate 7 days in SQL

I was trying to aggregate 7 days of data at a time for FY13 (starts on 10/1/2012 and ends on 9/30/2013) in SQL Server, but no luck so far. Could someone please take a look? Below is my example data.
DATE BREAD MILK
10/1/12 1 3
10/2/12 2 4
10/3/12 2 3
10/4/12 0 4
10/5/12 4 0
10/6/12 2 1
10/7/12 1 3
10/8/12 2 4
10/9/12 2 3
10/10/12 0 4
10/11/12 4 0
10/12/12 2 1
10/13/12 2 1
So, my desired output would be like:
DATE BREAD MILK
10/1/12 1 3
10/2/12 2 4
10/3/12 2 3
10/4/12 0 4
10/5/12 4 0
10/6/12 2 1
Total 11 15
10/7/12 1 3
10/8/12 2 4
10/9/12 2 3
10/10/12 0 4
10/11/12 4 0
10/12/12 2 1
10/13/12 2 1
Total 13 16
--------through 9/30/2013
Please note, since FY13 starts on 10/1/2012 and ends on 9/30/2013, the first week of FY13 is 6 days instead of 7 days.
I am using SQL server 2008.
You could add a new computed column for the date values to group them by week and sum the other columns, something like this:
SELECT DATEPART(ww, DATEADD(d,-2,[DATE])) AS WEEK_NO,
SUM(Bread) AS Bread_Total, SUM(Milk) as Milk_Total
FROM YOUR_TABLE
GROUP BY DATEPART(ww, DATEADD(d,-2,[DATE]))
Note: I used DATEADD and subtracted 2 days to set the first day of the week to Monday based on your dates. You can modify this if required.
Another option is to use the GROUP BY ROLLUP operator:
SELECT CASE WHEN DATE IS NULL THEN 'Total' ELSE CONVERT(nvarchar(10), DATE, 101) END AS DATE,
SUM(BREAD) AS BREAD, SUM(MILK) AS MILK
FROM dbo.test54
GROUP BY ROLLUP(DATE),(DATENAME(week, DATE))
Demo on SQLFiddle
Result:
DATE BREAD MILK
10/01/2012 1 3
10/02/2012 2 4
10/03/2012 2 3
10/04/2012 0 4
10/05/2012 4 0
10/06/2012 2 1
Total 11 15
10/07/2012 1 3
10/08/2012 4 7
10/10/2012 0 4
10/11/2012 4 0
10/12/2012 2 1
10/13/2012 2 1
Total 13 16
You are looking for a rollup. In this case, you will need at least one more column to group by to do your rollup on, the easiest way to do that is to add a computed column that groups them into weeks by date.
Take a look at: Summarizing Data Using ROLLUP
Here is the general idea of how it could be done:
You need a derived column for each row to determine which fiscal week that record belongs to. In general you could subtract that record's date from 10/1, get the number of days that have elapsed, divide by 7, and floor the result.
Then you can GROUP BY that derived column and use the SUM aggregate function.
The biggest wrinkle is that 6 day week you start with. You may have to add some logic to make sure that the weeks start on Sunday or whatever day you use but this should get you started.
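As a rough illustration of that derived column, here is a Python sketch that buckets the sample rows into fiscal weeks and sums them. It assumes weeks start on Sunday (which is what the desired output suggests, since 10/7/12 opens the second group) and shifts the day count by the position of the fiscal year start within its week, which is how the short first week falls out.

```python
from collections import defaultdict
from datetime import date

FY_START = date(2012, 10, 1)


def fiscal_week(day, fy_start=FY_START):
    """Zero-based fiscal week of `day`, with weeks assumed to start on Sunday."""
    # Days between the Sunday on or before fy_start and fy_start itself
    # (Python's weekday(): Monday=0 ... Sunday=6).
    offset = (fy_start.weekday() + 1) % 7
    return ((day - fy_start).days + offset) // 7


rows = [  # (date, bread, milk) from the sample data
    (date(2012, 10, 1), 1, 3), (date(2012, 10, 2), 2, 4), (date(2012, 10, 3), 2, 3),
    (date(2012, 10, 4), 0, 4), (date(2012, 10, 5), 4, 0), (date(2012, 10, 6), 2, 1),
    (date(2012, 10, 7), 1, 3), (date(2012, 10, 8), 2, 4), (date(2012, 10, 9), 2, 3),
    (date(2012, 10, 10), 0, 4), (date(2012, 10, 11), 4, 0), (date(2012, 10, 12), 2, 1),
    (date(2012, 10, 13), 2, 1),
]

# week number -> [bread total, milk total]
totals = defaultdict(lambda: [0, 0])
for day, bread, milk in rows:
    week = fiscal_week(day)
    totals[week][0] += bread
    totals[week][1] += milk
```

Week 0 covers 10/1 through 10/6 and sums to 11 and 15; week 1 covers 10/7 through 10/13 and sums to 13 and 16, matching the totals in the desired output.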
The WITH ROLLUP suggestions above can help; you'll need to save the data and transform it as you need.
The biggest thing you'll need to be able to do is identify your weeks properly. If you don't have those loaded into tables already so you can identify them, you can build them on the fly. Here's one way to do that:
CREATE TABLE #fy (fyear int, fstart datetime, fend datetime);
CREATE TABLE #fylist(fyyear int, fydate DATETIME, fyweek int);
INSERT INTO #fy
SELECT 2012, '2011-10-01', '2012-09-30'
UNION ALL
SELECT 2013, '2012-10-01', '2013-09-30';
INSERT INTO #fylist
( fyyear, fydate )
SELECT fyear, DATEADD(DAY, Number, DATEADD(DAY, -1, fy.fstart)) AS fydate
FROM Common.NUMBERS
CROSS APPLY (SELECT * FROM #fy WHERE fyear = 2013) fy
WHERE fy.fend >= DATEADD(DAY, Number, DATEADD(DAY, -1, fy.fstart));
WITH weekcalc AS
(
SELECT DISTINCT DATEPART(YEAR, fydate) yr, DATEPART(week, fydate) dt
FROM #fylist
),
ridcalc AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY yr, dt) AS rid, yr, dt
FROM weekcalc
)
UPDATE #fylist
SET fyweek = rid
FROM #fylist
JOIN ridcalc
ON DATEPART(YEAR, fydate) = yr
AND DATEPART(week, fydate) = dt;
SELECT list.fyyear, list.fyweek, p.[date], SUM(bread) AS Bread, SUM(Milk) AS Milk
FROM products p
JOIN #fylist list
ON p.[date] = list.fydate
GROUP BY list.fyyear, list.fyweek, p.[date] WITH ROLLUP;
The Common.Numbers reference above is a simple numbers table that I use for this sort of thing (goes from 1 to 1M). You could also build that on the fly as needed.