I'm building a scheduling system where I store an initial appointment and how often it repeats. My table looks something like this:
CREATE TABLE tbl (
id serial primary key,
initial_timestamp timestamp not null,
recurring interval
);
id | initial_timestamp | recurring
27 | 2020-06-02        | 3 weeks
24 | 2020-06-03        | 10 days
Assuming I can handle the time component, and that the only intervals we'll run across are days and weeks, how can I find when those two appointments will overlap? For example, the two rows above will overlap on June 23rd: it's 3 weeks from June 2nd and 20 days from June 3rd, so the first appointment will repeat once on that day and the second appointment will repeat on the 13th and then the 23rd.
In my program, I have another date, say June 7th with a recurring interval of 12 days. What query can I use to find the time it will take for a recurring appointment starting on June 7th to overlap with every existing recurring appointment? So for example, this appointment will repeat on June 19, July 1, and July 13. Appointment #24 from the table above will repeat on June 13, June 23, July 3, and July 13, if my math is right. I'd like my query comparing this appointment to appointment #24 to return, first of all, July 13th, then also how long it would take to repeat again, which I assume would be like finding the least common multiple of the two intervals, in this case, 60 days (LCM of 12 and 10). So I could expect it to repeat again on July 13 + 60 days = Sept 11.
I tried using generate_series, but since I don't know the size of the intervals, the series would have to continue infinitely, right? It's probably not the best choice here. I assume the answer would have more to do with the math of multiplying intervals somehow.
Note that recurring can be null, so I'd assume there has to be something like WHERE recurring IS NOT NULL in there somewhere. Another thing to note: no initial appointments overlap; I've already guarded against that. The search term doesn't overlap with any of the appointments' initial times either.
If it helps at all, I'm using PHP 5.3 to send queries to Postgres 9.4 (I know, it's an ancient setup). I'd prefer to do most of this in SQL just because most of the other logic is in SQL right now, so I can just run the query and start manipulating the results with PHP.
So in summary, if my math is right, what Postgres query should I use with the table above to compare a given date and interval with every date and interval pair from the table to find the next date those two overlap and how far apart each overlap instance would be?
This was hard.
WITH RECURSIVE moving_target(initial_timestamp, recurring) AS (
VALUES (timestamp '2020-06-07', interval '12 days') -- search term
)
, x AS ( -- advance to the closest day before or at moving target
SELECT t.id
, t_date + ((m_date - t_date) / t_step) * t_step AS t_date
, t_step
, m.*
FROM ( -- normalize table data
SELECT id
, initial_timestamp::date AS t_date
, EXTRACT(day FROM recurring)::int AS t_step
FROM tbl
WHERE recurring IS NOT NULL -- exclude!
) t
CROSS JOIN ( -- normalize input
SELECT initial_timestamp::date AS m_date
, EXTRACT(day FROM recurring)::int AS m_step
FROM moving_target
) m
)
, rcte AS ( -- recursive CTE
SELECT id, t_date, t_step, m_date, m_step
, ARRAY[m_date - t_date] AS gaps -- keep track of gaps
, CASE
WHEN t_date = m_date THEN true -- found match
WHEN m_step % t_step = 0 THEN false -- can never match (gap is not divisible by t_step)
WHEN (m_date - t_date) % 2 = 1 -- odd gap ...
AND t_step % 2 = 0 -- ... but even steps
AND m_step % 2 = 0 THEN false -- can never match
-- WHEN <stop conditions?> THEN false -- hard to determine!
-- ELSE null -- keep searching
END AS match
FROM x
UNION ALL
SELECT id, t_date, t_step, m_date, m_step
, gaps || m_date - t_date
, CASE
WHEN t_date = m_date THEN true
WHEN (m_date - t_date) = ANY (gaps) THEN false -- gap repeated!
-- ELSE null -- keep searching
END AS match
FROM (
SELECT id
, t_date + (((m_date + m_step) - t_date) / t_step) * t_step AS t_date
, t_step
, m_date + m_step AS m_date -- + 1 step
, m_step
, gaps
FROM rcte
WHERE match IS NULL
) sub
)
SELECT id, t.initial_timestamp, t.recurring
, CASE WHEN r.match THEN r.t_date END AS match_date
FROM rcte r
JOIN tbl t USING (id)
WHERE r.match IS NOT NULL;
db<>fiddle here - with more test rows
There may be potential to improve further. The core problem is in the realm of
prime factorization. As it seems reasonable to expect fairly small intervals, I solved it by testing for cycles: If, while incrementally stepping forward, a gap between dates is detected that we have seen before, and dates didn't overlap yet, they will never overlap and we can stop. This loops at most GREATEST(m_step, t_step) times (the number of days in the bigger interval), so it shouldn't scale terribly.
I identified some basic mathematical stop conditions to avoid looping in hopeless cases a priori. There may be more ...
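For a pair of schedules there is also an exact a-priori test from elementary number theory: the two progressions can meet at all only if the gap between the start dates is divisible by the greatest common divisor of the two steps, and once they meet they repeat every lcm(t_step, m_step) days - the 60-day period from the question. A minimal sketch, assuming you add a helper function (Postgres 9.4 has no built-in gcd(), so the function below is something you would have to create; the search term 2020-06-07 / 12 days is hard-coded):

CREATE OR REPLACE FUNCTION gcd(a int, b int) RETURNS int
LANGUAGE sql IMMUTABLE AS
$$
-- Euclid's algorithm as a recursive CTE
WITH RECURSIVE e(x, y) AS (
   SELECT abs(a), abs(b)
   UNION ALL
   SELECT y, x % y FROM e WHERE y > 0
)
SELECT x FROM e WHERE y = 0;
$$;

SELECT id
     , (date '2020-06-07' - initial_timestamp::date)
       % gcd(EXTRACT(day FROM recurring)::int, 12) = 0 AS can_ever_match
     , EXTRACT(day FROM recurring)::int * 12
       / gcd(EXTRACT(day FROM recurring)::int, 12)     AS repeat_every_days
FROM   tbl
WHERE  recurring IS NOT NULL;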
Explaining everything that's going on here is more work than devising the query. I added comments that should explain basics ...
Then again, while intervals are small, a "brute force" approach based on generate_series() may still be faster.
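Such a brute-force version might look like this (a sketch only - the search term 2020-06-07 / 12 days is hard-coded, and the series is capped at t_step * 12 days past the later start date, which always covers at least one full lcm cycle):

SELECT t.id, min(g.d)::date AS match_date
FROM   tbl t
CROSS  JOIN LATERAL generate_series(
          t.initial_timestamp
        , greatest(t.initial_timestamp::date, date '2020-06-07')
            + EXTRACT(day FROM t.recurring)::int * 12
        , t.recurring) AS g(d)
WHERE  t.recurring IS NOT NULL
AND    g.d::date >= date '2020-06-07'               -- not before the search start
AND    (g.d::date - date '2020-06-07') % 12 = 0     -- lands on the search-term schedule too
GROUP  BY t.id;

Rows whose schedule never coincides with the search term simply don't show up in the result.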
I have two date fields, DATE_FIELD_ONE = 8/30/2018 and DATE_FIELD_TWO = DATE_FIELD_ONE + 20. I need to find what DATE_FIELD_TWO should be if I'm only adding 20 business days. How would I accomplish this? I thought maybe trying 'DY' would work, but I'm not sure how to get it going. Thanks.
CASE WHEN TO_CHAR(TO_DATE(DATE_FIELD_ONE),'DY')='SAT' THEN 1 ELSE 0 END
CASE WHEN TO_CHAR(TO_DATE(DATE_FIELD_ONE),'DY')='SUN' THEN 1 ELSE 0 END
You may try this:
select max(date_field_two) as date_field_two
from
(
select date'2018-08-30'+
cast(case when to_char(date'2018-08-30'+level,'D','NLS_DATE_LANGUAGE=ENGLISH')
in ('6','7') then
0
else
level
end as int) as date_field_two,
sum(cast(case when to_char(date'2018-08-30'+level,'D','NLS_DATE_LANGUAGE=ENGLISH')
in ('6','7') then
0
else
1
end as int)) over (order by level) as next_day
from dual
connect by level <= 20*1.5
-- 20 is the number of business days to add; the 1.5 factor pads the series with enough calendar days,
-- since every 5 business days span 7 calendar days and 5 * 1.5 > 7
-- (1.5 is just a coefficient and could be replaced by a larger one such as 2), so 20*1.5 = 4*5*1.5 > 4*7
)
where next_day = 20;
DATE_FIELD_TWO
-----------------
27.09.2018
by using the connect by level clause against dual.
P.S. This ignores public holidays, which differ from one country to another, since the question only concerns weekends.
Rextester Demo
Edit: Assume you have national holidays on '2018-09-25' and '2018-09-26' (within this range of days); then consider the following:
select max(date_field_two) as date_field_two
from
(
select date'2018-08-30'+
(case when to_char(date'2018-08-30'+level,'D','NLS_DATE_LANGUAGE=ENGLISH')
in ('6','7') then
0
when date'2018-08-30'+level in (date'2018-09-25',date'2018-09-26') then
0
else
level
end) as date_field_two,
sum(cast(case when to_char(date'2018-08-30'+level,'D','NLS_DATE_LANGUAGE=ENGLISH')
in ('6','7') then
0
when date'2018-08-30'+level in (date'2018-09-25',date'2018-09-26') then
0
else
1
end as int)) over (order by level) as next_day
from dual
connect by level <= 20*2
)
where next_day = 20;
DATE_FIELD_TWO
-----------------
01.10.2018
which pushes the result one business day further per holiday (here two days, from 27.09 to 01.10.2018), unless the holiday falls on a weekend.
You can define workdays to be whatever you like if you use a PL/SQL function.
Here is a simple prototype - without any holidays - but it could be adapted for that purpose using the same kind of logic.
create or replace function add_business_days (from_date IN date, bd IN integer) return date as
fd date := trunc(from_date,'iw'); -- Monday of from_date's ISO week
cnt int := (from_date - fd) + bd; -- weekday offset within that week plus business days to add
ww int := trunc(cnt/5); -- complete working weeks to skip
wd int := mod(cnt,5); -- leftover business days
begin
return fd + (ww*7) + wd; -- assumes from_date itself falls on a weekday
end;
/
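A quick check against the example from the question (using the prototype above):

select add_business_days(date '2018-08-30', 20) as date_field_two from dual;
-- 27.09.2018, matching the connect-by result above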
I realize you already have an answer, but for what it's worth this is something we deal with all the time and have what has turned out to be a very good solution.
In effect, we maintain a separate table called "work days" that has every conceivable date we would ever compare (and that definition will vary from application to application, of course -- but in any case it will never be "huge" by RDBMS standards). There is a boolean flag that dictates if the date is a work day or a weekend/holiday, but more importantly there is an index value that only increments on work days.
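The table looks something like this - the column names follow the queries below, while the exact types and the sample index values are only illustrative:

CREATE TABLE util.work_days (
    cal_date            date PRIMARY KEY
  , is_workday          boolean NOT NULL
  , workday_index       integer NOT NULL   -- increments only on work days
  , workday_index_back  integer NOT NULL   -- counterpart counted from the other direction
);

--  cal_date    | is_workday | workday_index
--  2018-08-30  | t          | 4630
--  2018-08-31  | t          | 4631
--  2018-09-01  | f          | 4631          -- Saturday: the index does not advance
--  2018-09-02  | f          | 4631          -- Sunday
--  2018-09-03  | t          | 4632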
The advantage to this is transparency and scalability. If you want the difference between two dates in work days:
select
h.entry_date, h.invoice_date, wd2.workday_index - wd1.workday_index as delta
from
sales_order_data h
join util.work_days wd1 on h.sales_order_entry_dte = wd1.cal_date
join util.work_days wd2 on h.invoice_dte = wd2.cal_date
If you need to take a date in a table and add 20 days (like your original problem statement):
select
h.date_field_1, wd2.cal_date as date_field_1_plus_20
from
my_table h
join util.work_days wd1 on h.date_field_1 = wd1.cal_date
join util.work_days wd2 on
wd1.workday_index + 20 = wd2.workday_index and
wd2.is_workday
(disclaimer, this is in PostgreSQL, which is why I have the boolean. In Oracle, I'm guessing you need to change that to an integer and say =1)
Also, for the bonus question, this gives two different options for defining "work day," one that rolls forward and another that rolls backwards (hence the workday_index and workday_index_back). For example, if you need something on a Saturday, and Saturday is not a work day, that means you need it on Friday. Conversely, if something is to be delivered on Saturday, and Saturday is not a work day, then that means it will be available on Monday. The context of how to handle non-workdays differs, and this method affords you the option of choosing the right one.
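For instance, assuming non-work days carry the index of the preceding work day (as in the sketch above), rolling a Saturday back to the nearest work day is just another self-join:

SELECT wd_req.cal_date AS requested_date
     , wd_eff.cal_date AS effective_work_day
FROM   util.work_days wd_req
JOIN   util.work_days wd_eff ON wd_eff.workday_index = wd_req.workday_index
                            AND wd_eff.is_workday
WHERE  wd_req.cal_date = date '2018-09-01';   -- a Saturday; returns Friday 2018-08-31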
As a final selling point, this option also lets you define holidays as non-work days... and you can do this or not; it's up to you. The solution permits either option. You could theoretically add two more columns for a weekend-only work-day index, giving you both options.
In the query below, I am calculating the minimum, the maximum, and the average for two-hour intervals using PostgreSQL.
The query works fine for even start hours (...04:00:00+05:30), but for odd start hours (...05:00:00+05:30) it gives the same result as the even start time.
Multiplying by 2 returns even hours, which is the problem.
SELECT tagid, CAST(sample_time_stamp as Date) AS stat_date,
(floor(extract(hour from sample_time_stamp)/2) * 2)::int AS hrs,
min(sensor_reading) AS theMin,
max(sensor_reading) AS theMax,
avg(sensor_reading) AS theAvg
FROM sensor_readings WHERE tagid =1 AND
sample_time_stamp BETWEEN '2012-10-23 01:00:00+05:30'
AND '2012-10-23 05:59:00+05:30'
GROUP BY tagid,CAST(sample_time_stamp as Date),
floor(extract(hour from sample_time_stamp)/2) * 2
ORDER BY tagid,stat_date, hrs
Output for odd start hour ('2012-10-23 01:00:00+05:30'):
tagid | date       | hrs | theMin | theMax | theAvg
1     | 2012-10-23 | 0   | 6      | 58     | 30.95
1     | 2012-10-23 | 2   | 2      | 59     | 29.6916666666667
1     | 2012-10-23 | 4   | 3      | 89     | 31.7666666666667
Output for even start hour ('2012-10-23 02:00:00+05:30'):
tagid | date       | hrs | theMin | theMax | theAvg
1     | 2012-10-23 | 2   | 2      | 59     | 29.6916666666667
1     | 2012-10-23 | 4   | 3      | 89     | 31.7666666666667
To get constant time-frames starting with your minimum timestamp:
WITH params AS (
SELECT '2012-10-23 01:00:00+05:30'::timestamptz AS _min -- input params
,'2012-10-23 05:59:00+05:30'::timestamptz AS _max
,'2 hours'::interval AS _interval
)
,ts AS (SELECT generate_series(_min, _max, _interval) AS t_min FROM params)
,timeframe AS (
SELECT t_min
,lead(t_min, 1, _max) OVER (ORDER BY t_min) AS t_max
FROM ts, params
)
SELECT s.tagid
,t.t_min
,t.t_max -- mildly redundant except for last row
,min(s.sensor_reading) AS the_min
,max(s.sensor_reading) AS the_max
,avg(s.sensor_reading) AS the_avg
FROM timeframe t
LEFT JOIN sensor_readings s ON s.tagid = 1
AND s.sample_time_stamp >= t.t_min
AND s.sample_time_stamp < t.t_max
GROUP BY 1,2,3
ORDER BY 1,2;
Can be used for any time frame and any interval length. Requires PostgreSQL 8.4 or later.
If the maximum timestamp _max does not fall on _min + n * _interval the last time-frame is truncated. The last row can therefore represent a shorter time-frame than your desired _interval.
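With the example parameters this produces three frames, the last one cut short (boundaries shown in the input's own offset):

t_min                       t_max
2012-10-23 01:00:00+05:30   2012-10-23 03:00:00+05:30
2012-10-23 03:00:00+05:30   2012-10-23 05:00:00+05:30
2012-10-23 05:00:00+05:30   2012-10-23 05:59:00+05:30   -- truncated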
Key elements
Common Table Expressions (CTE) for easier handling. Input parameter values once in the top CTE params.
generate_series() for intervals to create the time raster.
Window function lead(...) with 3 parameters (including default) - to cover the special case of last row.
LEFT JOIN between raster and actual data, so that time frames without matching data will still show in the result (with NULL values as data).
That's also the reason for a later edit: the WHERE condition had to move into the LEFT JOIN condition to achieve that.
Alternative time frame generation with recursive CTE:
WITH RECURSIVE params AS (
SELECT '2012-10-23 01:00:00+05:30'::timestamptz AS _min -- input params
,'2012-10-23 05:59:00+05:30'::timestamptz AS _max
,'2 hours'::interval AS _interval
)
, timeframe AS (
SELECT _min AS t_min, LEAST(_min + _interval, _max) AS t_max
FROM params
UNION ALL
SELECT t_max, LEAST(t_max + _interval, _max)
FROM timeframe t, params p
WHERE t_max < _max
)
SELECT ...
Slightly faster ... take your pick.
-> sqlfiddle displaying both.
Note that you can have non-recursive CTEs (additionally) even when declared WITH RECURSIVE.
Performance & Index
Should be faster than your original query. Half the code deals with generating the time raster, which concerns few rows and is very fast. Handling actual table rows (the expensive part) gets cheaper, because we don't calculate a new value from every sample_time_stamp any more.
You should definitely have a multi-column index of the form:
CREATE INDEX foo_idx ON sensor_readings (tagid, sample_time_stamp DESC);
I use DESC on the assumption that you more often query recent entries (later timestamps). Remove the modifier if that's not the case. Doesn't make a big difference either way.
Okay, so I've done quite a lot of reading on the possibility of emulating the NETWORKDAYS function of Excel in SQL, and have come to the conclusion that by far the easiest solution is to have a calendar table which will flag working days or non-working days. However, due to circumstances out of my control, we don't have access to such a luxury and it's unlikely that we will any time in the near future.
Currently I have managed to bodge together what is undoubtedly a horribly inefficient query in SQL that does work - the catch is, it will only work for a single client record at a time.
SELECT O_ASSESSMENTS.ASM_ID,
O_ASSESSMENTS.ASM_START_DATE,
O_ASSESSMENTS.ASM_END_DATE,
sum(CASE
When TO_CHAR(O_ASSESSMENTS.ASM_START_DATE + rownum -1,'Day')
= 'Sunday ' THEN 0
When TO_CHAR(O_ASSESSMENTS.ASM_START_DATE + rownum -1,'Day')
= 'Saturday ' THEN 0
WHEN O_ASSESSMENTS.ASM_START_DATE + rownum - 1
IN ('03-01-2000','21-04-2000','24-04-2000','01-05-2000','29-05-2000','28-08-2000','25-12-2000','26-12-2000','01-01-2001','13-04-2001','16-04-2001','07-05-2001','28-05-2001','27-08-2001','25-12-2001','26-12-2001','01-01-2002','29-03-2002','01-04-2002','06-04-2002','03-06-2002','04-06-2002','26-08-2002','25-12-2002','26-12-2002','01-01-2003','18-04-2003','21-04-2003','05-05-2003','26-05-2003','25-08-2003','25-12-2003','26-12-2003','01-01-2004','09-04-2004','12-04-2004','03-05-2004','31-05-2004','30-08-2004','25-12-2004','26-12-2004','27-12-2004','28-12-2004','01-01-2005','03-01-2005','25-03-2005','28-03-2005','02-05-2005','30-05-2005','29-08-2005','27-12-2005','28-12-2005','02-01-2006','14-04-2006','17-04-2006','01-05-2006','29-05-2006','28-08-2006','25-12-2006','26-12-2006','02-01-2007','06-04-2007','09-04-2007','07-05-2007','28-05-2007','27-08-2007','25-12-2007','26-12-2007','01-01-2008','21-03-2008','24-03-2008','05-05-2008','26-05-2008','25-08-2008','25-12-2008','26-12-2008','01-01-2009','10-04-2009','13-04-2009','04-05-2009','25-05-2009','31-08-2009','25-12-2009','28-12-2009','01-01-2010','02-04-2010','05-04-2010','03-05-2010','31-05-2010','30-08-2010','24-12-2010','27-12-2010','28-12-2010','31-12-2010','03-01-2011','22-04-2011','25-04-2011','29-04-2011','02-05-2011','30-05-2011','29-08-2011','26-12-2011','27-12-2011')
THEN 0
ELSE 1
END)-1 AS Week_Day
From O_ASSESSMENTS,
ALL_OBJECTS
WHERE O_ASSESSMENTS.ASM_QSA_ID IN ('TYPE1')
AND O_ASSESSMENTS.ASM_END_DATE >= '01/01/2012'
AND O_ASSESSMENTS.ASM_ID = 'A00000'
AND ROWNUM <= O_ASSESSMENTS.ASM_END_DATE-O_ASSESSMENTS.ASM_START_DATE+1
GROUP BY
O_ASSESSMENTS.ASM_ID,
O_ASSESSMENTS.ASM_START_DATE,
O_ASSESSMENTS.ASM_END_DATE
Basically, I'm wondering a) whether I should stop wasting my time on this, or b) whether it's possible to get this to work for multiple clients? Any pointers appreciated, thanks!
Edit: Further clarification - I already work out timescales using excel, but it would be ideal if we could do it in the report as the report in question is something that we would like end users to be able to run without any further manipulation.
Edit:
MarkBannister's answer works perfectly albeit slowly (though I had expected as much given it's not the preferred solution) - the challenge now lies in me integrating this into an existing report!
with
calendar_cte as (select
to_date('01-01-2000')+level-1 calendar_date,
case when to_char(to_date('01-01-2000')+level-1, 'day') in ('sunday ','saturday ') then 0 when to_date('01-01-2000')+level-1 in ('03-01-2000','21-04-2000','24-04-2000','01-05-2000','29-05-2000','28-08-2000','25-12-2000','26-12-2000','01-01-2001','13-04-2001','16-04-2001','07-05-2001','28-05-2001','27-08-2001','25-12-2001','26-12-2001','01-01-2002','29-03-2002','01-04-2002','06-04-2002','03-06-2002','04-06-2002','26-08-2002','25-12-2002','26-12-2002','01-01-2003','18-04-2003','21-04-2003','05-05-2003','26-05-2003','25-08-2003','25-12-2003','26-12-2003','01-01-2004','09-04-2004','12-04-2004','03-05-2004','31-05-2004','30-08-2004','25-12-2004','26-12-2004','27-12-2004','28-12-2004','01-01-2005','03-01-2005','25-03-2005','28-03-2005','02-05-2005','30-05-2005','29-08-2005','27-12-2005','28-12-2005','02-01-2006','14-04-2006','17-04-2006','01-05-2006','29-05-2006','28-08-2006','25-12-2006','26-12-2006','02-01-2007','06-04-2007','09-04-2007','07-05-2007','28-05-2007','27-08-2007','25-12-2007','26-12-2007','01-01-2008','21-03-2008','24-03-2008','05-05-2008','26-05-2008','25-08-2008','25-12-2008','26-12-2008','01-01-2009','10-04-2009','13-04-2009','04-05-2009','25-05-2009','31-08-2009','25-12-2009','28-12-2009','01-01-2010','02-04-2010','05-04-2010','03-05-2010','31-05-2010','30-08-2010','24-12-2010','27-12-2010','28-12-2010','31-12-2010','03-01-2011','22-04-2011','25-04-2011','29-04-2011','02-05-2011','30-05-2011','29-08-2011','26-12-2011','27-12-2011','01-01-2012','02-01-2012') then 0 else 1 end working_day
from dual
connect by level <= 1825 + sysdate - to_date('01-01-2000') )
SELECT
a.ASM_ID,
a.ASM_START_DATE,
a.ASM_END_DATE,
sum(c.working_day)-1 AS Week_Day
From
O_ASSESSMENTS a
join calendar_cte c
on c.calendar_date between a.ASM_START_DATE and a.ASM_END_DATE
WHERE a.ASM_QSA_ID IN ('TYPE1')
and a.ASM_END_DATE >= '01/01/2012'
GROUP BY
a.ASM_ID,
a.ASM_START_DATE,
a.ASM_END_DATE
There are a few ways to do this. Perhaps the simplest might be to create a CTE that produces a virtual calendar table, based on Oracle's connect by syntax, and then join it to the Assessments table, like so:
with calendar_cte as (
select to_date('01-01-2000')+level-1 calendar_date,
case when to_char(to_date('01-01-2000')+level-1, 'Day')
in ('Sunday ','Saturday ') then 0
when to_date('01-01-2000')+level-1
in ('03-01-2000','21-04-2000','24-04-2000','01-05-2000','29-05-2000','28-08-2000','25-12-2000','26-12-2000','01-01-2001','13-04-2001','16-04-2001','07-05-2001','28-05-2001','27-08-2001','25-12-2001','26-12-2001','01-01-2002','29-03-2002','01-04-2002','06-04-2002','03-06-2002','04-06-2002','26-08-2002','25-12-2002','26-12-2002','01-01-2003','18-04-2003','21-04-2003','05-05-2003','26-05-2003','25-08-2003','25-12-2003','26-12-2003','01-01-2004','09-04-2004','12-04-2004','03-05-2004','31-05-2004','30-08-2004','25-12-2004','26-12-2004','27-12-2004','28-12-2004','01-01-2005','03-01-2005','25-03-2005','28-03-2005','02-05-2005','30-05-2005','29-08-2005','27-12-2005','28-12-2005','02-01-2006','14-04-2006','17-04-2006','01-05-2006','29-05-2006','28-08-2006','25-12-2006','26-12-2006','02-01-2007','06-04-2007','09-04-2007','07-05-2007','28-05-2007','27-08-2007','25-12-2007','26-12-2007','01-01-2008','21-03-2008','24-03-2008','05-05-2008','26-05-2008','25-08-2008','25-12-2008','26-12-2008','01-01-2009','10-04-2009','13-04-2009','04-05-2009','25-05-2009','31-08-2009','25-12-2009','28-12-2009','01-01-2010','02-04-2010','05-04-2010','03-05-2010','31-05-2010','30-08-2010','24-12-2010','27-12-2010','28-12-2010','31-12-2010','03-01-2011','22-04-2011','25-04-2011','29-04-2011','02-05-2011','30-05-2011','29-08-2011','26-12-2011','27-12-2011')
then 0
else 1
end working_day
from dual
connect by level <= 36525 + sysdate - to_date('01-01-2000') )
SELECT a.ASM_ID,
a.ASM_START_DATE,
a.ASM_END_DATE,
sum(c.working_day) AS Week_Day
From O_ASSESSMENTS a
join calendar_cte c
on c.calendar_date between a.ASM_START_DATE and a.ASM_END_DATE
WHERE a.ASM_QSA_ID IN ('TYPE1') and
a.ASM_END_DATE >= '01/01/2012' -- and a.ASM_ID = 'A00000'
GROUP BY
a.ASM_ID,
a.ASM_START_DATE,
a.ASM_END_DATE
This will produce a virtual table populated with dates from 01 January 2000 to roughly 100 years after the current date (per the 36525-day limit), with all weekends marked as non-working days and all days specified in the second IN clause (i.e. up to 27 December 2011) also marked as non-working days.
The drawback of this method (or any method where the holiday dates are hardcoded into the query) is that each time new holiday dates are defined, every single query that uses this approach will have to have those dates added.
If you can't use a calendar table in Oracle, you might be better off exporting to Excel. Brute force always works.
Networkdays() "returns the number of whole working days between start_date and end_date. Working days exclude weekends and any dates identified in holidays."
Excluding weekends seems fairly straightforward. Every 7-day period will contain two weekend days. You'll just need to take some care with the leftover days.
Holidays are a different story. You have to either store them or pass them as an argument. If you could store them, you'd store them in a calendar table, and your problem would be over. But you can't do that.
So you're looking at passing them as an argument. Off the top of my head--and I haven't had any tea yet this morning--I'd consider a common table expression or a wrapper for a stored procedure.
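Putting both ideas together - whole ISO weeks contribute five days each, LEAST() trims the partial weeks at either end, and the holidays are passed inline as a CTE - a sketch might look like this (the holiday dates are placeholders, and the ASM dates are assumed to carry no time portion):

with holidays as (
  select date '2012-01-02' as holiday_date from dual union all
  select date '2012-04-06' as holiday_date from dual      -- extend as needed
)
select a.asm_id,
       a.asm_start_date,
       a.asm_end_date,
       ((trunc(a.asm_end_date,'IW') - trunc(a.asm_start_date,'IW')) / 7) * 5
         + least(a.asm_end_date - trunc(a.asm_end_date,'IW') + 1, 5)
         - least(a.asm_start_date - trunc(a.asm_start_date,'IW'), 5)
         -- minus any holiday that falls on a weekday inside the range
         - (select count(*)
            from holidays h
            where h.holiday_date between a.asm_start_date and a.asm_end_date
            and to_char(h.holiday_date,'DY','NLS_DATE_LANGUAGE=ENGLISH') not in ('SAT','SUN')
           ) as working_days
from o_assessments a
where a.asm_qsa_id in ('TYPE1');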