I'm building a scheduling system where I store an initial appointment and how often it repeats. My table looks something like this:
CREATE TABLE (
id serial primary key,
initial_timestamp timestamp not null,
recurring interval
);
id initial_timestamp recurring
27 2020-06-02 3 weeks
24 2020-06-03 10 days
Assuming I can handle the time component, and that the only intervals we'll run across are days and weeks, how can I find the when those two appointments will overlap? For example, the previous example will overlap on June 23rd. It's 3 weeks from June 2nd and 20 days from June 3rd, so the first appointment will repeat once on that day and the second appointment will repeat on the 13th and then the 23rd.
In my program, I have another date, say June 7th with a recurring interval of 12 days. What query can I use to find the time it will take for a recurring appointment starting on June 7th to overlap with every existing recurring appointment? So for example, this appointment will repeat on June 19, July 1, and July 13. Appointment #24 from the table above will repeat on June 13, June 23, July 3, and July 13, if my math is right. I'd like my query comparing this appointment to appointment #24 to return, first of all, July 13th, then also how long it would take to repeat again, which I assume would be like finding the least common multiple of the two intervals, in this case, 60 days (LCM of 12 and 10). So I could expect it to repeat again on July 13 + 60 days = Sept 11.
I tried using generate_series, but since I don't know the size of the intervals, the series would have to continue infinitely, right? It's probably not the best choice here. I assume the answer would have more to do with the math of multiplying intervals somehow.
Note that recurring can be null, so I'd assume there has to be something like WHERE recurring IS NOT NULL in there somewhere. Another thing to note: no initial appointments overlap. I've already guarded against that. The search term doesn't overlap with any of the appointment's initial times either.
If it helps at all, I'm using PHP 5.3 to send queries to Postgres 9.4 (I know, it's an ancient setup). I'd prefer to do most of this in SQL just because most of the other logic is in SQL right now, so I can just run the query and start manipulating the results with PHP.
So in summary, if my math is right, what Postgres query should I use with the table above to compare a given date and interval with every date and interval pair from the table to find the next date those two overlap and how far apart each overlap instance would be?
This was hard.
WITH RECURSIVE moving_target(initial_timestamp, recurring) AS (
VALUES (timestamp '2020-06-07', interval '12 days') -- search term
)
, x AS ( -- advance to the closest day before or at moving target
SELECT t.id
, t_date + ((m_date - t_date) / t_step) * t_step AS t_date
, t_step
, m.*
FROM ( -- normalize table data
SELECT id
, initial_timestamp::date AS t_date
, EXTRACT ('days' FROM recurring)::int AS t_step
FROM tbl
WHERE recurring IS NOT NULL -- exclude!
) t
CROSS JOIN ( -- normalize input
SELECT initial_timestamp::date AS m_date
, EXTRACT ('days' FROM recurring)::int AS m_step
FROM moving_target
) m
)
, rcte AS ( -- recursive CTE
SELECT id, t_date, t_step, m_date, m_step
, ARRAY[m_date - t_date] AS gaps -- keep track of gaps
, CASE
WHEN t_date = m_date THEN true -- found match
WHEN t_step % m_step = 0 THEN false -- can never match
WHEN (m_date - t_date) % 2 = 1 -- odd gap ...
AND t_step % 2 = 0 -- ... but even steps
AND m_step % 2 = 0 THEN false -- can never match
-- WHEN <stop conditions?> THEN false -- hard to determine!
-- ELSE null -- keep searching
END AS match
FROM x
UNION ALL
SELECT id, t_date, t_step, m_date, m_step
, gaps || m_date - t_date
, CASE
WHEN t_date = m_date THEN true
WHEN (m_date - t_date) = ANY (gaps) THEN false -- gap repeated!
-- ELSE null -- keep searching
END AS match
FROM (
SELECT id
, t_date + (((m_date + m_step) - t_date) / t_step) * t_step AS t_date
, t_step
, m_date + m_step AS m_date -- + 1 step
, m_step
, gaps
FROM rcte
WHERE match IS NULL
) sub
)
SELECT id, t.initial_timestamp, t.recurring
, CASE WHEN r.match THEN r.t_date END AS match_date
FROM rcte r
JOIN tbl t USING (id)
WHERE r.match IS NOT NULL;
db<>fiddle here - with more test rows
There may be potential to improve further. The core problem is in the realm of
prime factorization. As it seems reasonable to expect fairly small intervals, I solved it by testing for cycles: If, while incrementally stepping forward, a gap between dates is detected that we have seen before, and dates didn't overlap yet, they will never overlap and we can stop. This loops at most GREATEST(m_step, t_step) times (the number of days in the bigger interval), so it shouldn't scale terribly.
I identified some basic mathematical stop conditions to avoid looping in hopeless cases a priori. There may be more ...
Explaining everything that's going on here is more work than devising the query. I added comments that should explain basics ...
Then again, while intervals are small, a "brute force" approach based on generate_series() may still be faster.
Date jm Text
-------- ---- ----
6/3/2015 ne Good
6/4/2015 ne Good
6/5/2015 ne Same
6/8/2015 ne Same
I want to count how often the "same" value occurs in a set of consecutive days.
I dont want to count the value for the whole database. Now on the current date it is 2 (above example).
It is very important for me that "Same" never occurs...
The query has to ignore the weekend (6 and 7 june).
Date jm Text
-------- ---- ----
6/3/2015 ne Same
6/4/2015 ne Same
6/5/2015 ne Good
6/8/2015 ne Good
In this example the count is zero
Okay, I'm starting to get the picture, although at first I thought you wanted to count by jm, and now it seems you want to count by Text = 'Same'. Anyway, that's what this query should do. It gets the row for the current date. Is connects all previous rows and counts them. Also, it shows whether the current text (and that of the connected rows).
So the query will return one row (if there is one for today), which will show the date, jm and Text of the current date, the number of consecutive days for which the Text has been the same (just in case you want to know how many days it is 'Good'), and the number of days (either 0 or the same as the other count) for which the Text has been 'Same'.
I hope this query is right, or at least it gives you an idea of how to solve the problem using CONNECT BY. I should mention I based the 'Friday-detection' on this question.
Also, I don't have Oracle at hand, so please forgive me for any minor syntax errors.
WITH
VW_SAMESTATUSES AS
( SELECT t.*
FROM YourTable t
START WITH -- Start with the row for today
t.Date = trunc(sysdate)
CONNECT BY -- Connect to previous row that have a lower date.
-- Note that PRIOR refers to the prior record, which is
-- actually the NEXT day. :)
t.Date = PRIOR t.Date +
CASE MOD(TO_CHAR(t.Date, 'J'), 7) + 1
WHEN 5 THEN 3 -- Friday, so add 3
ELSE 1 -- Other days, so add one
END
-- And the Text also has to match to the one of the next day.
AND t.Text = PRIOR t.Text)
SELECT s.Date,
s.jm,
MAX(Text) AS CurrentText, -- Not really MAX, they are actually all the same
COUNT(*) AS ConsecutiveDays,
COUNT(CASE WHEN Text = 'Same' THEN 1 END) as SameCount
FROM VW_SAMESTATUSES s
GROUP BY s.Date,
s.jm
This recursive query (available from Oracle version 11g) might be useful:
with s(tcode, tdate) as (
select tcode, tdate from test where tdate = date '2015-06-08'
union all
select t.tcode, t.tdate from test t, s
where s.tcode = t.tcode
and t.tdate = s.tdate - decode(s.tdate-trunc(s.tdate, 'iw'), 0, 3, 1) )
select count(1) cnt from s
SQLFiddle
I prepared sample data according to your original question, without further edits, you can see them in attached SQLFiddle. Additional conditions for column 'Text'
are very simple, just add something like ... and Text ='Same' in where clauses.
In current version query counts number of previous days starting from given date (change it in line 2) where dates are consecutive (excluding weekend days) and values in column tcode is the same for all days.
Part: decode(s.tdate-trunc(s.tdate, 'iw'), 0, 3, 1) is for substracting days depending if it's Monday or other day, and should work independently from NLS settings.
I have a recursive sql that I am running which works but gives me the following warning.
SQL0347W The recursive common table expression "DT_LAST_YEAR" may
contain an infinite loop. SQLSTATE=01605
How can I get rid of the warning?
INSERT INTO REP_MAN_TRAN_COUNTS (SITEDIRECTORYID, BUSINESSDATE, TRANCOUNT)
WITH dt_this_year (level, seqdate) AS
(
SELECT 1, date(current timestamp) -7 DAYS FROM sysibm.sysdummy1
UNION ALL
SELECT level, seqdate + level days FROM dt_this_year WHERE level < 1000 AND seqdate + 1 days < date(current timestamp)
)
,dt_last_year (level, seqdate) AS
(
SELECT 1, date(current timestamp) -7 DAYS - 1 year FROM sysibm.sysdummy1
UNION ALL
SELECT level, seqdate + level days FROM dt_last_year WHERE level < 1000 AND seqdate + 1 days < date(current timestamp) -1 year
)
select 10049, date(dts.calendarday), count(*) trancount
from (
SELECT seqdate AS calendarday FROM dt_this_year
UNION
SELECT seqdate AS calendarday FROM dt_last_year
) dts LEFT JOIN ccftrxheader ccf
ON date(dts.calendarday) = date(ccf.businessdate)
WHERE ccf.sitedirectoryid=10049
GROUP BY ccf.sitedirectoryid,dts.calendarday
How do you get rid of warnings?
By changing the code so that it no longer generates the warning in the first place. Hiding warnings is problematic, because it often disguises a potentially larger problem. I'm fairly certain it's complaining here because the termination clause you provide for level can't ever be reached (because you never manipulate it).
Personally, I'd probably re-write your query into something like this:
INSERT INTO Rep_Man_Tran_Counts (siteDirectoryId, businessDate, tranCount)
WITH dt_Calendar_Data (level, calendarDay) AS
(SELECT l, c
FROM (VALUES (1, CURRENT_DATE - 7 DAYS),
(1, CURRENT_DATE - 7 DAYS - 1 YEAR)) t(l, c)
UNION ALL
SELECT level + 1, calendarDay + 1 DAYS
FROM dt_Calendar_Data
WHERE level < 7)
SELECT 10049, dtCal.calendarDay, COALESCE(COUNT(*), 0) as tranCount
FROM dt_Calendar_Data dtCal
LEFT JOIN ccftrxHeader ccf
ON ccf.businessDate = dtCal.calendarDay
AND ccf.siteDirectoryId = 10049
GROUP BY dtCal.seqDate
(untested, as you've provided no sample data, and I don't have a DB2 instance)
I've assumed you actually wanted a LEFT JOIN, as opposed to the regular INNER JOIN you were actually getting (due to the condition in the WHERE clause, and probably the GROUP BY as well). To avoid adding nulls to your data, I've wrapped the count in COALESCE(...), which will give you 0 instead.
I've also assumed that businessDate is a DATE type, and not a timestamp. If it is a timestamp this query needs to be adjusted (note that the function you were using would for the optimizer to ignore indices).
Note that order of operations with dates matter! Thankfully when dealing with year ranges, you only have one day to worry about in the Gregorian calendar (February 29th). Your current ordering will compare identical calendar days at the start of the range (which one has the "gap" depends on whether this year or last year is a leap year).
EDIT:
Sure, lets look at that CTE:
FROM(VALUES (1, CURRENT_DATE - 7 DAYS),
(1, CURRENT_DATE - 7 DAYS - 1 YEAR)) t(l, c)
This is just a standard VALUES clause used as a table reference. This is the SQL Standard way to construct a small temp table (Rather than referencing the dummy tables, which tend to be vendor-specific). If the statement is run on 2014-02-26 then the resulting table will be:
t
l c
===============
1 "2014-02-19"
1 "2013-02-19"
These columns get renamed by the column listing of the CTE, which are then referenced in the join (and in the case of a recursive CTE, by the recursive portion).
This then forms the starting data for the rest of the recursive query:
UNION ALL
SELECT level + 1, calendarDay + 1 DAYS
FROM dt_Calendar_Data
WHERE level < 7
In DB2 (and some other RDBMSs), recursive CTEs essentially execute iteratively, acting off the results of the "previous" invocation. Every time around, we increment level, and add another day to calendarDay. The "next" rows are then:
level calendarDay
======================
2 "2014-02-20"
2 "2013-02-20"
This continues until the "previous" row has level = 7, which means a new row is not generated (check the WHERE clause). In general, it's best to only have one termination condition (and make progress every iteration), to make it easier for the optimizer to spot. The resulting data is then in the ranges:
level calendarDay
=====================
1 "2014-02-19"
. .....
7 "2014-02-26"
1 "2013-02-19"
. .....
7 "2013-02-26"
... as a side note, I generated the this year/last year data together to make the number of references shorter. If you only needed the one year, level is unnecessary.
Okay, so I've done quite a lot of reading on the possibility of emulating the networkdays function of excel in sql, and have come to the conclusion that by far the easiest solution is to have a calendar table which will flag working days or non working days. However, due to circumstances out of my control, we don't have access to such a luxury and it's unlikely that we will any time in the near future.
Currently I have managed to bodge together what is undoubtedly a horrible ineffecient query in SQL that does work - the catch is, it will only work for a single client record at a time.
SELECT O_ASSESSMENTS.ASM_ID,
O_ASSESSMENTS.ASM_START_DATE,
O_ASSESSMENTS.ASM_END_DATE,
sum(CASE
When TO_CHAR(O_ASSESSMENTS.ASM_START_DATE + rownum -1,'Day')
= 'Sunday ' THEN 0
When TO_CHAR(O_ASSESSMENTS.ASM_START_DATE + rownum -1,'Day')
= 'Saturday ' THEN 0
WHEN O_ASSESSMENTS.ASM_START_DATE + rownum - 1
IN ('03-01-2000','21-04-2000','24-04-2000','01-05-2000','29-05-2000','28-08-2000','25-12-2000','26-12-2000','01-01-2001','13-04-2001','16-04-2001','07-05-2001','28-05-2001','27-08-2001','25-12-2001','26-12-2001','01-01-2002','29-03-2002','01-04-2002','06-04-2002','03-06-2002','04-06-2002','26-08-2002','25-12-2002','26-12-2002','01-01-2003','18-04-2003','21-04-2003','05-05-2003','26-05-2003','25-08-2003','25-12-2003','26-12-2003','01-01-2004','09-04-2004','12-04-2004','03-05-2004','31-05-2004','30-08-2004','25-12-2004','26-12-2004','27-12-2004','28-12-2004','01-01-2005','03-01-2005','25-03-2005','28-03-2005','02-05-2005','30-05-2005','29-08-2005','27-12-2005','28-12-2005','02-01-2006','14-04-2006','17-04-2006','01-05-2006','29-05-2006','28-08-2006','25-12-2006','26-12-2006','02-01-2007','06-04-2007','09-04-2007','07-05-2007','28-05-2007','27-08-2007','25-12-2007','26-12-2007','01-01-2008','21-03-2008','24-03-2008','05-05-2008','26-05-2008','25-08-2008','25-12-2008','26-12-2008','01-01-2009','10-04-2009','13-04-2009','04-05-2009','25-05-2009','31-08-2009','25-12-2009','28-12-2009','01-01-2010','02-04-2010','05-04-2010','03-05-2010','31-05-2010','30-08-2010','24-12-2010','27-12-2010','28-12-2010','31-12-2010','03-01-2011','22-04-2011','25-04-2011','29-04-2011','02-05-2011','30-05-2011','29-08-2011','26-12-2011','27-12-2011')
THEN 0
ELSE 1
END)-1 AS Week_Day
From O_ASSESSMENTS,
ALL_OBJECTS
WHERE O_ASSESSMENTS.ASM_QSA_ID IN ('TYPE1')
AND O_ASSESSMENTS.ASM_END_DATE >= '01/01/2012'
AND O_ASSESSMENTS.ASM_ID = 'A00000'
AND ROWNUM <= O_ASSESSMENTS.ASM_END_DATE-O_ASSESSMENTS.ASM_START_DATE+1
GROUP BY
O_ASSESSMENTS.ASM_ID,
O_ASSESSMENTS.ASM_START_DATE,
O_ASSESSMENTS.ASM_END_DATE
Basically, I'm wondering if a) I should stop wasting my time on this or b) is it possible to get this to work for multiple clients? Any pointers appreciated thanks!
Edit: Further clarification - I already work out timescales using excel, but it would be ideal if we could do it in the report as the report in question is something that we would like end users to be able to run without any further manipulation.
Edit:
MarkBannister's answer works perfectly albeit slowly (though I had expected as much given it's not the preferred solution) - the challenge now lies in me integrating this into an existing report!
with
calendar_cte as (select
to_date('01-01-2000')+level-1 calendar_date,
case when to_char(to_date('01-01-2000')+level-1, 'day') in ('sunday ','saturday ') then 0 when to_date('01-01-2000')+level-1 in ('03-01-2000','21-04-2000','24-04-2000','01-05-2000','29-05-2000','28-08-2000','25-12-2000','26-12-2000','01-01-2001','13-04-2001','16-04-2001','07-05-2001','28-05-2001','27-08-2001','25-12-2001','26-12-2001','01-01-2002','29-03-2002','01-04-2002','06-04-2002','03-06-2002','04-06-2002','26-08-2002','25-12-2002','26-12-2002','01-01-2003','18-04-2003','21-04-2003','05-05-2003','26-05-2003','25-08-2003','25-12-2003','26-12-2003','01-01-2004','09-04-2004','12-04-2004','03-05-2004','31-05-2004','30-08-2004','25-12-2004','26-12-2004','27-12-2004','28-12-2004','01-01-2005','03-01-2005','25-03-2005','28-03-2005','02-05-2005','30-05-2005','29-08-2005','27-12-2005','28-12-2005','02-01-2006','14-04-2006','17-04-2006','01-05-2006','29-05-2006','28-08-2006','25-12-2006','26-12-2006','02-01-2007','06-04-2007','09-04-2007','07-05-2007','28-05-2007','27-08-2007','25-12-2007','26-12-2007','01-01-2008','21-03-2008','24-03-2008','05-05-2008','26-05-2008','25-08-2008','25-12-2008','26-12-2008','01-01-2009','10-04-2009','13-04-2009','04-05-2009','25-05-2009','31-08-2009','25-12-2009','28-12-2009','01-01-2010','02-04-2010','05-04-2010','03-05-2010','31-05-2010','30-08-2010','24-12-2010','27-12-2010','28-12-2010','31-12-2010','03-01-2011','22-04-2011','25-04-2011','29-04-2011','02-05-2011','30-05-2011','29-08-2011','26-12-2011','27-12-2011','01-01-2012','02-01-2012') then 0 else 1 end working_day
from dual
connect by level <= 1825 + sysdate - to_date('01-01-2000') )
SELECT
a.ASM_ID,
a.ASM_START_DATE,
a.ASM_END_DATE,
sum(c.working_day)-1 AS Week_Day
From
O_ASSESSMENTS a
join calendar_cte c
on c.calendar_date between a.ASM_START_DATE and a.ASM_END_DATE
WHERE a.ASM_QSA_ID IN ('TYPE1')
and a.ASM_END_DATE >= '01/01/2012'
GROUP BY
a.ASM_ID,
a.ASM_START_DATE,
a.ASM_END_DATE
There are a few ways to do this. Perhaps the simplest might be to create a CTE that produces a virtual calendar table, based on Oracle's connect by syntax, and then join it to the Assesments table, like so:
with calendar_cte as (
select to_date('01-01-2000')+level-1 calendar_date,
case when to_char(to_date('01-01-2000')+level-1, 'Day')
in ('Sunday ','Saturday ') then 0
when to_date('01-01-2000')+level-1
in ('03-01-2000','21-04-2000','24-04-2000','01-05-2000','29-05-2000','28-08-2000','25-12-2000','26-12-2000','01-01-2001','13-04-2001','16-04-2001','07-05-2001','28-05-2001','27-08-2001','25-12-2001','26-12-2001','01-01-2002','29-03-2002','01-04-2002','06-04-2002','03-06-2002','04-06-2002','26-08-2002','25-12-2002','26-12-2002','01-01-2003','18-04-2003','21-04-2003','05-05-2003','26-05-2003','25-08-2003','25-12-2003','26-12-2003','01-01-2004','09-04-2004','12-04-2004','03-05-2004','31-05-2004','30-08-2004','25-12-2004','26-12-2004','27-12-2004','28-12-2004','01-01-2005','03-01-2005','25-03-2005','28-03-2005','02-05-2005','30-05-2005','29-08-2005','27-12-2005','28-12-2005','02-01-2006','14-04-2006','17-04-2006','01-05-2006','29-05-2006','28-08-2006','25-12-2006','26-12-2006','02-01-2007','06-04-2007','09-04-2007','07-05-2007','28-05-2007','27-08-2007','25-12-2007','26-12-2007','01-01-2008','21-03-2008','24-03-2008','05-05-2008','26-05-2008','25-08-2008','25-12-2008','26-12-2008','01-01-2009','10-04-2009','13-04-2009','04-05-2009','25-05-2009','31-08-2009','25-12-2009','28-12-2009','01-01-2010','02-04-2010','05-04-2010','03-05-2010','31-05-2010','30-08-2010','24-12-2010','27-12-2010','28-12-2010','31-12-2010','03-01-2011','22-04-2011','25-04-2011','29-04-2011','02-05-2011','30-05-2011','29-08-2011','26-12-2011','27-12-2011')
then 0
else 1
end working_day
from dual
connect by level <= 36525 + sysdate - to_date('01-01-2000') )
SELECT a.ASM_ID,
a.ASM_START_DATE,
a.ASM_END_DATE,
sum(c.working_day) AS Week_Day
From O_ASSESSMENTS a
join calendar_cte c
on c.calendar_date between a.ASM_START_DATE and a.ASM_END_DATE
WHERE a.ASM_QSA_ID IN ('TYPE1') and
a.ASM_END_DATE >= '01/01/2012' -- and a.ASM_ID = 'A00000'
GROUP BY
a.ASM_ID,
a.ASM_START_DATE,
a.ASM_END_DATE
This will produce a virtual table populated with dates from 01 January 2000 to 10 years after the current date, with all weekends marked as non-working days and all days specified in the second in clause (ie. up to 27 December 2011) also marked as non-working days.
The drawback of this method (or any method where the holiday dates are hardcoded into the query) is that each time new holiday dates are defined, every single query that uses this approach will have to have those dates added.
If you can't use a calendar table in Oracle, you might be better off exporting to Excel. Brute force always works.
Networkdays() "returns the number of whole working days between start_date and end_date. Working days exclude weekends and any dates identified in holidays."
Excluding weekends seems fairly straightforward. Every 7-day period will contain two weekend days. You'll just need to take some care with the leftover days.
Holidays are a different story. You have to either store them or pass them as an argument. If you could store them, you'd store them in a calendar table, and your problem would be over. But you can't do that.
So you're looking at passing them as an argument. Off the top of my head--and I haven't had any tea yet this morning--I'd consider a common table expression or a wrapper for a stored procedure.