Using SQL function generate_series() in Redshift

I'd like to use the generate series function in redshift, but have not been successful.
The redshift documentation says it's not supported. The following code does work:
select *
from generate_series(1,10,1)
outputs:
1
2
3
...
10
I'd like to do the same with dates. I've tried a number of variations, including:
select *
from generate_series(date('2008-10-01'),date('2008-10-10 00:00:00'),1)
kicks out:
ERROR: function generate_series(date, date, integer) does not exist
Hint: No function matches the given name and argument types.
You may need to add explicit type casts. [SQL State=42883]
Also tried:
select *
from generate_series('2008-10-01 00:00:00'::timestamp,
'2008-10-10 00:00:00'::timestamp,'1 day')
And tried:
select *
from generate_series(cast('2008-10-01 00:00:00' as datetime),
cast('2008-10-10 00:00:00' as datetime),'1 day')
both kick out:
ERROR: function generate_series(timestamp without time zone, timestamp without time zone, "unknown") does not exist
Hint: No function matches the given name and argument types.
You may need to add explicit type casts. [SQL State=42883]
If that isn't possible, it looks like I'll use this code from another post:
SELECT to_char(DATE '2008-01-01'
+ (interval '1 month' * generate_series(0,57)), 'YYYY-MM-DD') AS ym
PostgreSQL generate_series() with SQL function as arguments

Amazon Redshift seems to be based on PostgreSQL 8.0.2. The timestamp arguments to generate_series() were added in 8.4.
Something like this, which sidesteps that problem, might work in Redshift.
SELECT current_date + (n || ' days')::interval
from generate_series (1, 30) n
It works in PostgreSQL 8.3, which is the earliest version I can test. It's documented in 8.0.26.
Later . . .
It seems that generate_series() is unsupported in Redshift. But given that you've verified that select * from generate_series(1,10,1) does work, the syntax above at least gives you a fighting chance. (Although the interval data type is also documented as being unsupported on Redshift.)
Still later . . .
You could also create a table of integers.
create table integers (
n integer primary key
);
Populate it however you like. You might be able to use generate_series() locally, dump the table, and load it on Redshift. (I don't know; I don't use Redshift.)
Anyway, you can do simple date arithmetic with that table without referring directly to generate_series() or to interval data types.
select (current_date + n)
from integers
where n < 31;
That works in 8.3, at least.
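The integers-table idea can be tried out without a Redshift cluster. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in database; SQLite is an assumption here, not something the answer tested, and its date(start, '+n days') spelling replaces the current_date + n arithmetic shown above. The function name dates_from_integers and the table size of 1000 rows are hypothetical.

```python
import sqlite3

def dates_from_integers(start, days):
    """Return `days` consecutive ISO dates starting at `start`, built from an integers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE integers (n INTEGER PRIMARY KEY)")
    # Populate the helper table once; 0..999 is an arbitrary size for this sketch.
    conn.executemany("INSERT INTO integers (n) VALUES (?)", [(i,) for i in range(1000)])
    # SQLite writes "date + n" as date(start, '+n days'); other engines differ.
    rows = conn.execute(
        "SELECT date(?, '+' || n || ' days') FROM integers WHERE n < ? ORDER BY n",
        (start, days),
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]

series = dates_from_integers("2008-10-01", 10)
print(series[0], series[-1])
```

The point of the design is that the series comes from an ordinary table, so neither generate_series() nor the interval type is involved.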

Using Redshift today, you can generate a range of dates by using datetime functions and feeding in a number table.
select (getdate()::date - generate_series)::date from generate_series(1,30,1)
Generates this for me
date
2015-11-06
2015-11-05
2015-11-04
2015-11-03
2015-11-02
2015-11-01
2015-10-31
2015-10-30
2015-10-29
2015-10-28
2015-10-27
2015-10-26
2015-10-25
2015-10-24
2015-10-23
2015-10-22
2015-10-21
2015-10-20
2015-10-19
2015-10-18
2015-10-17
2015-10-16
2015-10-15
2015-10-14
2015-10-13
2015-10-12
2015-10-11
2015-10-10
2015-10-09
2015-10-08

The generate_series() function is not fully supported by Redshift. See the Unsupported PostgreSQL functions section of the developer guide.
UPDATE
generate_series is working with Redshift now.
SELECT CURRENT_DATE::TIMESTAMP - (i * interval '1 day') as date_datetime
FROM generate_series(1,31) i
ORDER BY 1
This generates one row per day for the previous 31 days; use generate_series(1,30) if you want exactly 30.
Ref: generate_series function in Amazon Redshift

As of writing this, generate_series() on our instance of Redshift (1.0.33426) could not be used to, for example, create a table:
# select generate_series(1,100,1);
1
2
...
# create table normal_series as select generate_series(1,100,1);
INFO: Function "generate_series(integer, integer, integer) not supported.
ERROR: Specified types or functions (one per INFO message) not supported on Redshift tables.
However, with recursive works:
# create table recursive_series as with recursive t(n) as (select 1::integer union all select n+1 from t where n < 100) select n from t;
SELECT
-- modify as desired, here is a date series:
# select getdate()::date + n from recursive_series;
2021-12-18
2021-12-19
...

I needed to do something similar, but with 5-minute intervals over 7 days. So here's a CTE-based hack (ugly but not too verbose):
INSERT INTO five_min_periods
WITH
periods AS (select 0 as num UNION select 1 as num UNION select 2 UNION select 3 UNION select 4 UNION select 5 UNION select 6 UNION select 7 UNION select 8 UNION select 9 UNION select 10 UNION select 11),
hours AS (select num from periods UNION ALL select num + 12 from periods),
days AS (select num from periods where num <= 6),
rightnow AS (select CAST( TO_CHAR(GETDATE(), 'yyyy-mm-dd hh24') || ':' || trim(TO_CHAR((ROUND((DATEPART (MINUTE, GETDATE()) / 5), 1) * 5 ),'09')) AS TIMESTAMP) as start)
select
ROW_NUMBER() OVER(ORDER BY d.num DESC, h.num DESC, p.num DESC) as idx
, DATEADD(minutes, -p.num * 5, DATEADD( hours, -h.num, DATEADD( days, -d.num, n.start ) ) ) AS period_date
from days d, hours h, periods p, rightnow n
You should be able to extend this to other generation schemes. The trick here is using the Cartesian product join (i.e. no JOIN/WHERE clause) to multiply the hand-crafted CTEs, producing the necessary increments, which are then applied to an anchor date.
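The same Cartesian-product idea can be sketched outside SQL. This is an illustrative Python version, not the query above: itertools.product stands in for the join-less FROM list, the function name five_minute_periods is hypothetical, and the anchor is assumed to be rounded down to a 5-minute boundary.

```python
from datetime import datetime, timedelta
from itertools import product

def five_minute_periods(anchor, days=7):
    """All 5-minute timestamps in the `days` days ending at the (rounded-down) anchor."""
    # Round the anchor down to the nearest 5-minute boundary.
    start = anchor.replace(minute=anchor.minute - anchor.minute % 5, second=0, microsecond=0)
    periods = range(12)      # 12 five-minute slots per hour
    hours = range(24)
    day_offsets = range(days)
    # product() multiplies the three small sets, like the cross join of the CTEs.
    stamps = [
        start - timedelta(days=d, hours=h, minutes=5 * p)
        for d, h, p in product(day_offsets, hours, periods)
    ]
    return sorted(stamps)

periods = five_minute_periods(datetime(2021, 12, 18, 10, 33, 12))
print(len(periods), periods[0], periods[-1])
```

Three small sets of 7, 24 and 12 values multiply out to 2016 rows, which is why the hand-written lists can stay so short.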

Redshift's generate_series() function is a leader-node-only function, so you cannot use it for downstream processing on the compute nodes. It can be replaced by a recursive CTE (or by keeping a "dates" table in your database). I have an example of this in a recent answer:
Cross join Redshift with sequence of dates
One caution I like to give in answers like this is to be careful with inequality joins (or cross joins or any under-qualified joins) when working with VERY LARGE tables which can happen often in Redshift. If you are joining with a moderate Redshift table of say 1M rows then things will be fine. But if you are doing this on a table of 1B rows then the data explosion will likely cause massive performance issues as the query spills to disk.
I've written a couple of white papers on how to write this type of query in a data space sensitive way. This issue of massive intermediate results is not unique to Redshift and I first developed my approach solving a client's HIVE query issue. "First rule of writing SQL for Big Data - don't make more"

Per the comments of @Ryan Tuck and @Slobodan Pejic, generate_series() does not work on Redshift when joining to another table.
The workaround I used was to write out every value in the series in the query:
SELECT
'2019-01-01'::date AS date_month
UNION ALL
SELECT
'2019-02-01'::date AS date_month
Using a Python function like this:
import arrow

def generate_date_series(start, end):
    # Parse both bounds, then emit one SELECT per month in the range.
    start = arrow.get(start)
    end = arrow.get(end)
    months = list(
        f"SELECT '{month.format('YYYY-MM-DD')}'::date AS date_month"
        for month in arrow.Arrow.range('month', start, end)
    )
    # Glue the per-month SELECTs into a single UNION ALL query.
    return "\nUNION ALL\n".join(months)

perhaps not as elegant as other solutions, but here's how I did it:
drop table if exists #dates;
create temporary table #dates as
with recursive cte(val_date) as
(select
cast('2020-07-01' as date) as val_date
union all
select
cast(dateadd(day, 1, val_date) as date) as val_date
from
cte
where
val_date <= getdate()
)
select
val_date as yyyymmdd
from
cte
order by
val_date
;
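To experiment with the recursive pattern locally, here is a small sketch that runs an equivalent recursive CTE in SQLite through Python's sqlite3 module. SQLite and the function name recursive_date_series are assumptions for illustration; date(val_date, '+1 day') replaces dateadd(), and the stop condition uses < so the series ends exactly on the end date.

```python
import sqlite3

def recursive_date_series(start, end):
    """Return ISO dates from `start` to `end` inclusive using a recursive CTE."""
    conn = sqlite3.connect(":memory:")
    rows = conn.execute(
        """
        WITH RECURSIVE cte(val_date) AS (
            SELECT date(:start)                      -- anchor row
            UNION ALL
            SELECT date(val_date, '+1 day')          -- one more day per step
            FROM cte
            WHERE val_date < date(:end)              -- stop once the end is reached
        )
        SELECT val_date FROM cte ORDER BY val_date
        """,
        {"start": start, "end": end},
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]

print(recursive_date_series("2020-07-01", "2020-07-05"))
```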

For five-minute buckets I would do the following:
select date_trunc('minute', getdate()) - (i || ' minutes')::interval
from generate_series(0, 60*5-1, 5) as i
You could replace 5 by any given interval, and 60 by the number of rows you want.

SELECT CURRENT_DATE::TIMESTAMP - (i * interval '1 day') as date_datetime
FROM generate_series(1,(select datediff(day,'01-Jan-2021',now()::date))) i
ORDER BY 1

Related

Most efficient way of getting all records within an 2 hour window

I would like to write a query that based on two date fields finds where there is a 2 hour or greater difference.
SELECT TO_DATE(Date_Fielda, 'DD-MON-YY HR24:MI:SS'),
TO_DATE(Date_Fieldb, 'DD_MON-YY, HR24:MI:SS')
FROM DUAL;
how would I do this?
Well, you would select from a table and use a where clause. Here is a simple method in Oracle:
select t.*
from t
where abs(date_columna - date_columnb) > 2 / 24
If you know that one column comes earlier than the other, then:
select t.*
from t
where date_columna > date_columnb + interval '2' hour
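Oracle date subtraction yields a number of days, which is why the first query compares against 2 / 24. The same check is sketched below in Python with timedelta; the sample rows and the helper name more_than_two_hours_apart are made up for illustration.

```python
from datetime import datetime, timedelta

TWO_HOURS = timedelta(hours=2)

def more_than_two_hours_apart(a, b):
    """True when the two timestamps differ by more than two hours, in either order."""
    return abs(a - b) > TWO_HOURS

# Hypothetical sample rows: (id, date_columna, date_columnb)
rows = [
    (1, datetime(2016, 1, 1, 8, 0), datetime(2016, 1, 1, 9, 30)),   # 1.5 hours apart
    (2, datetime(2016, 1, 1, 8, 0), datetime(2016, 1, 1, 11, 0)),   # 3 hours apart
    (3, datetime(2016, 1, 2, 0, 30), datetime(2016, 1, 1, 22, 0)),  # 2.5 hours, reversed order
]
matching_ids = [rid for rid, a, b in rows if more_than_two_hours_apart(a, b)]
print(matching_ids)
```

Use >= instead of > if a difference of exactly two hours should also match.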

IBM DB2: Generate list of dates between two dates

I need a query which will output a list of dates between two given dates.
For example, if my start date is 23/02/2016 and end date is 02/03/2016, I am expecting the following output:
Date
----
23/02/2016
24/02/2016
25/02/2016
26/02/2016
27/02/2016
28/02/2016
29/02/2016
01/03/2016
02/03/2016
Also, I need the above using SQL only (without the use of 'WITH' statement or tables). Please help.
I mostly use DB2 for iSeries, so I will give you an SQL-only solution that works on it. I originally had no access to the server, so the query was untested. EDIT: The query has since been tested and works.
SELECT
d.min + num.n DAYS
FROM
-- create inline table with min max date
(VALUES(DATE('2015-02-28'), DATE('2016-03-01'))) AS d(min, max)
INNER JOIN
-- create inline table with numbers from 0 to 999
(
SELECT
n1.n + n10.n + n100.n AS n
FROM
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS n1(n)
CROSS JOIN
(VALUES(0),(10),(20),(30),(40),(50),(60),(70),(80),(90)) AS n10(n)
CROSS JOIN
(VALUES(0),(100),(200),(300),(400),(500),(600),(700),(800),(900)) AS n100(n)
) AS num
ON
d.min + num.n DAYS<= d.max
ORDER BY
num.n;
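The inline number table works because the three digit tables (ones, tens, hundreds) add up to every integer from 0 to 999 exactly once. The following Python sketch mirrors that construction with itertools.product; the function name dates_between is hypothetical, and like the query it cannot span more than 1000 days.

```python
from datetime import date, timedelta
from itertools import product

def dates_between(start, end):
    """Dates from `start` to `end` inclusive, built from a 0..999 digit cross product."""
    ones = range(0, 10)
    tens = range(0, 100, 10)
    hundreds = range(0, 1000, 100)
    # Cross join of the three digit tables: each sum 0..999 appears exactly once.
    numbers = sorted(a + b + c for a, b, c in product(ones, tens, hundreds))
    # Same filter as the ON clause: keep offsets that do not pass the end date.
    return [start + timedelta(days=n) for n in numbers if start + timedelta(days=n) <= end]

result = dates_between(date(2016, 2, 23), date(2016, 3, 2))
print(len(result), result[0], result[-1])
```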
If you want to run the query more than once, you should consider creating a real table with the values for the loop:
CREATE TABLE dummy_loop AS (
SELECT
n1.n + n10.n + n100.n AS n
FROM
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS n1(n)
CROSS JOIN
(VALUES(0),(10),(20),(30),(40),(50),(60),(70),(80),(90)) AS n10(n)
CROSS JOIN
(VALUES(0),(100),(200),(300),(400),(500),(600),(700),(800),(900)) AS n100(n)
) WITH DATA;
ALTER TABLE dummy_loop ADD PRIMARY KEY (dummy_loop.n);
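Depending on why you need it, you could even create a table of dates covering, say, 100 years. That is only about 100*365 = 36500 rows with just a date field, so the table will be quite small and fast for joins. (The example below reuses the 0 to 999 generator, so as written it covers only 1000 days; add further digit tables for a longer range.)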
CREATE TABLE dummy_dates AS (
SELECT
DATE('1970-01-01') + (n1.n + n10.n + n100.n) DAYS AS date
FROM
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS n1(n)
CROSS JOIN
(VALUES(0),(10),(20),(30),(40),(50),(60),(70),(80),(90)) AS n10(n)
CROSS JOIN
(VALUES(0),(100),(200),(300),(400),(500),(600),(700),(800),(900)) AS n100(n)
) WITH DATA;
ALTER TABLE dummy_dates ADD PRIMARY KEY (dummy_dates.date);
And the select query could look like:
SELECT
*
FROM
dummy_dates
WHERE
date BETWEEN :startDate AND :endDate;
EDIT 2: Thanks to @Lennart's suggestion I have changed TABLE(VALUES(..,..,..)) to VALUES(..,..,..) because, as he said, TABLE is a synonym for LATERAL, which was a real surprise for me.
EDIT 3: Thanks to @godric7gt I have removed TIMESTAMPDIFF, and I will remove it from all my scripts, because as the documentation says:
These assumptions are used when converting the information in the second argument, which is a timestamp duration, to the interval type specified in the first argument. The returned estimate may vary by a number of days. For example, if the number of days (interval 16) is requested for the difference between '1997-03-01-00.00.00' and '1997-02-01-00.00.00', the result is 30. This is because the difference between the timestamps is 1 month, and the assumption of 30 days in a month applies.
It was a real surprise, because I had always trusted this function for day differences.
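The documentation's example can be checked with plain date arithmetic. This small Python sketch only reproduces the numbers quoted above; the 30-days-per-month figure is the assumption stated in that quote.

```python
from datetime import date

# Exact number of days between the two dates from the documentation example.
actual_days = (date(1997, 3, 1) - date(1997, 2, 1)).days

# The estimate: the difference is 1 month, and a month is assumed to have 30 days.
months_difference = 1
estimated_days = months_difference * 30

print(actual_days, estimated_days)
```

February 1997 has 28 days, so the estimate is off by two days for this pair of dates.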
For generating rows, recursive SQL will be needed.
Usually this looks like this in DB2:
with temp (date) as (
select date('23.02.2016') as date from sysibm.sysdummy1
union all
select date + 1 day from temp
where date < date('02.03.2016')
)
select * from temp
If, for whatever reason, a CTE (using WITH) has to be avoided, a possible workaround would be setting
db2set DB2_COMPATIBILITY_VECTOR=8
which enables the use of Oracle-style recursion with CONNECT BY
SELECT date('22.02.2016') + level days as dt
FROM sysibm.sysdummy1 CONNECT BY date('22.02.2016') + level days <= date('02.03.2016')
Please note: after setting DB2_COMPATIBILITY_VECTOR, an instance restart is necessary.
This solution doesn't use WITH, but it does use WHILE and a temp table...hopefully that meets your needs still?
EDIT -- I built this in SSMS 2014
DECLARE @Start DATE
DECLARE @End DATE
SET @Start = '2016-02-23'
SET @End = '2016-03-02'
CREATE TABLE #Dates ([Date] DATE)
WHILE @Start <= @End
BEGIN
INSERT INTO #Dates
SELECT @Start
SET @Start = DATEADD(Day,1,@Start)
END
SELECT * FROM #Dates
DROP TABLE #Dates
I assume AS400 does not support recursive CTE's, and that's why you want a solution without them. I have no clue whether it supports any of the following constructions, but it might be worth a shot. First we will need a generator, any table with a sufficient number of rows will do. If you don't have a table large enough for the number of days you want you can create a cartesian product. Example:
select row_number() over ()
from a_table
cross join a_table
Another way of extending the domain is to create the powerset of a table using group by cube, see below.
Assume that, one way or another, we can create a large enough set of rows. You can then generate the dates like this:
select date('23/02/2016') + n days
from (
select row_number() over () as n
from a_table
) as t
where n < 100
order by n
If for some reason you don't want to use an existing table, group by cube will produce a relation with a cardinality equal to the power set of the attributes. Here I use 4 columns which will generate 16 rows.
select date('2016-01-01') + row_number() over () days
from sysibm.dual x
group by cube(x.dummy, x.dummy, x.dummy, x.dummy)
If you want to generate say 100 rows you need 7 (since 2^7=128) attributes in the group by cube clause and a fetch first 100 rows:
select date('2016-01-01') + row_number() over () days
from sysibm.dual x
group by cube(x.dummy, x.dummy, x.dummy, x.dummy, x.dummy, x.dummy, x.dummy)
order by 1
fetch first 100 rows only
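The number of columns to list in group by cube follows from the power-set size 2^k. A minimal Python sketch of that arithmetic, with a hypothetical helper name, could look like this:

```python
from itertools import combinations

def cube_columns_needed(rows):
    """Smallest k such that 2**k >= rows."""
    k = 0
    while 2 ** k < rows:
        k += 1
    return k

def power_set_size(k):
    """Count every subset of k attributes, i.e. the grouping sets a cube enumerates."""
    return sum(1 for r in range(k + 1) for _ in combinations(range(k), r))

print(cube_columns_needed(100), power_set_size(4))
```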

How to generate Month list in PostgreSQL?

I have a table A with startdate column which is TIMESTAMP WITHOUT TIME ZONE I need to write a query/function that generate a list of months from the MIN value of the column till MAX value of the column.
For example:
startdate
2014-12-08
2015-06-16
2015-02-17
will generate a list of: (Dec-14,Jan-15,Feb-15,Mar-15,Apr-15,May-15,Jun-15)
How do I do that? I never used PostgreSQL to generate data that wasn't there... it always has been finding the correct data in the DB... any ideas how to do that? Is it doable in a query?
For people looking for an unformatted list of months:
select * from generate_series('2017-01-01', now(), '1 month')
You can generate sequences of data with the generate_series() function:
SELECT to_char(generate_series(min, max, '1 month'), 'Mon-YY') AS "Mon-YY"
FROM (
SELECT date_trunc('month', min(startdate)) AS min,
date_trunc('month', max(startdate)) AS max
FROM a) sub;
This generates a row for every month, in a pretty format. If you want to have it like a list, you can aggregate them all in an outer query:
SELECT string_agg("Mon-YY", ', ') AS "Mon-YY list"
FROM (
-- Query above
) subsub;
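For comparison, the same month list can be produced client-side. This is a rough Python sketch, not part of the answer; the function name month_list is hypothetical, and the month abbreviations are hard-coded so the output does not depend on the locale.

```python
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def month_list(dates):
    """Labels like 'Dec-14' for every month from min(dates) to max(dates) inclusive."""
    first, last = min(dates), max(dates)
    year, month = first.year, first.month
    labels = []
    # Step one month at a time until the last month has been emitted.
    while (year, month) <= (last.year, last.month):
        labels.append(f"{MONTHS[month - 1]}-{year % 100:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return labels

start_dates = [date(2014, 12, 8), date(2015, 6, 16), date(2015, 2, 17)]
print(", ".join(month_list(start_dates)))
```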
SQLFiddle here

How do you do date math that ignores the year?

I am trying to select dates that have an anniversary in the next 14 days. How can I select based on dates excluding the year? I have tried something like the following.
SELECT * FROM events
WHERE EXTRACT(month FROM "date") = 3
AND EXTRACT(day FROM "date") < EXTRACT(day FROM "date") + 14
The problem with this is that months wrap.
I would prefer to do something like this, but I don't know how to ignore the year.
SELECT * FROM events
WHERE (date > '2013-03-01' AND date < '2013-04-01')
How can I accomplish this kind of date math in Postgres?
TL/DR: use the "Black magic version" below.
All queries presented in other answers so far operate with conditions that are not sargable: they cannot use an index and have to compute an expression for every single row in the base table to find matching rows. Doesn't matter much with small tables. Matters a lot with big tables.
Given the following simple table:
CREATE TABLE event (
event_id serial PRIMARY KEY
, event_date date
);
Query
Version 1. and 2. below can use a simple index of the form:
CREATE INDEX event_event_date_idx ON event(event_date);
But all of the following solutions are even faster without index.
1. Simple version
SELECT *
FROM (
SELECT ((current_date + d) - interval '1 year' * y)::date AS event_date
FROM generate_series( 0, 14) d
CROSS JOIN generate_series(13, 113) y
) x
JOIN event USING (event_date);
Subquery x computes all possible dates over a given range of years from a CROSS JOIN of two generate_series() calls. The selection is done with the final simple join.
2. Advanced version
WITH val AS (
SELECT extract(year FROM age(current_date + 14, min(event_date)))::int AS max_y
, extract(year FROM age(current_date, max(event_date)))::int AS min_y
FROM event
)
SELECT e.*
FROM (
SELECT ((current_date + d.d) - interval '1 year' * y.y)::date AS event_date
FROM generate_series(0, 14) d
,(SELECT generate_series(min_y, max_y) AS y FROM val) y
) x
JOIN event e USING (event_date);
Range of years is deduced from the table automatically - thereby minimizing generated years.
You could go one step further and distill a list of existing years if there are gaps.
Effectiveness co-depends on the distribution of dates. It's better for few years with many rows each.
Simple db<>fiddle to play with here
Old sqlfiddle
3. Black magic version
Create a simple SQL function to calculate an integer from the pattern 'MMDD':
CREATE FUNCTION f_mmdd(date) RETURNS int LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
'SELECT (EXTRACT(month FROM $1) * 100 + EXTRACT(day FROM $1))::int';
I had to_char(time, 'MMDD') at first, but switched to the above expression which proved fastest in new tests on Postgres 9.6 and 10:
db<>fiddle here
It allows function inlining because EXTRACT(xyz FROM date) is implemented with the IMMUTABLE function date_part(text, date) internally. And it has to be IMMUTABLE to allow its use in the following essential multicolumn expression index:
CREATE INDEX event_mmdd_event_date_idx ON event(f_mmdd(event_date), event_date);
Multicolumn for a number of reasons:
Can help with ORDER BY or with selecting from given years. Read here. At almost no additional cost for the index. A date fits into the 4 bytes that would otherwise be lost to padding due to data alignment. Read here.
Also, since both index columns reference the same table column, no drawback with regard to H.O.T. updates. Read here.
Basic query:
SELECT *
FROM event e
WHERE f_mmdd(e.event_date) BETWEEN f_mmdd(current_date)
AND f_mmdd(current_date + 14);
One PL/pgSQL table function to rule them all
Fork to one of two queries to cover the turn of the year:
CREATE OR REPLACE FUNCTION f_anniversary(_the_date date = current_date, _days int = 14)
RETURNS SETOF event
LANGUAGE plpgsql AS
$func$
DECLARE
d int := f_mmdd($1);
d1 int := f_mmdd($1 + $2 - 1); -- fix off-by-1 from upper bound
BEGIN
IF d1 >= d THEN -- d1 = d (a 1-day window) must not take the wrap-around branch
RETURN QUERY
SELECT *
FROM event e
WHERE f_mmdd(e.event_date) BETWEEN d AND d1
ORDER BY f_mmdd(e.event_date), e.event_date;
ELSE -- wrap around end of year
RETURN QUERY
SELECT *
FROM event e
WHERE f_mmdd(e.event_date) >= d OR
f_mmdd(e.event_date) <= d1
ORDER BY (f_mmdd(e.event_date) >= d) DESC, f_mmdd(e.event_date), event_date;
-- chronological across turn of the year
END IF;
END
$func$;
Call using defaults: 14 days beginning "today":
SELECT * FROM f_anniversary();
Call for 7 days beginning '2014-08-23':
SELECT * FROM f_anniversary(date '2014-08-23', 7);
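The MMDD trick and the year-end fork can be illustrated in a few lines of Python. This is a sketch of the idea only, with hypothetical names mmdd and is_upcoming_anniversary; it does no indexing, and it uses >= in the first test so that a one-day window stays in the simple branch.

```python
from datetime import date, timedelta

def mmdd(d):
    """Integer of the form MMDD, e.g. 229 for Feb 29 or 1231 for Dec 31."""
    return d.month * 100 + d.day

def is_upcoming_anniversary(event_date, today, days=14):
    """True if the event's month/day falls within `days` days starting at `today`."""
    lo = mmdd(today)
    hi = mmdd(today + timedelta(days=days - 1))
    key = mmdd(event_date)
    if hi >= lo:
        # Window lies within one calendar year.
        return lo <= key <= hi
    # Window wraps around the end of the year.
    return key >= lo or key <= hi

print(is_upcoming_anniversary(date(2009, 1, 4), date(2012, 12, 29)))
```

Because only month and day are compared, an event on February 29 matches in every year whose window covers late February, leap year or not.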
db<>fiddle here - comparing EXPLAIN ANALYZE
"February 29"
When dealing with anniversaries or "birthdays", you need to define how to deal with the special case "February 29" in leap years.
When testing for ranges of dates, Feb 29 is usually included automatically, even if the current year is not a leap year. The range of days is extended by 1 retroactively when it covers this day.
On the other hand, if the current year is a leap year, and you want to look for 15 days, you may end up getting results for 14 days in leap years if your data is from non-leap years.
Say, Bob is born on the 29th of February:
My query 1. and 2. include February 29 only in leap years. Bob has birthday only every ~ 4 years.
My query 3. includes February 29 in the range. Bob has birthday every year.
There is no magical solution. You have to define what you want for every case.
Test
To substantiate my point I ran an extensive test with all the presented solutions. I adapted each of the queries to the given table and to yield identical results without ORDER BY.
The good news: all of them are correct and yield the same result - except for Gordon's query that had syntax errors, and @wildplasser's query that fails when the year wraps around (easy to fix).
Insert 108000 rows with random dates from the 20th century, which is similar to a table of living people (13 or older).
INSERT INTO event (event_date)
SELECT '2000-1-1'::date - (random() * 36525)::int
FROM generate_series (1, 108000);
Delete ~ 8 % to create some dead tuples and make the table more "real life".
DELETE FROM event WHERE random() < 0.08;
ANALYZE event;
My test case had 99289 rows, 4012 hits.
C - Catcall
WITH anniversaries as (
SELECT event_id, event_date
,(event_date + (n || ' years')::interval)::date anniversary
FROM event, generate_series(13, 113) n
)
SELECT event_id, event_date -- count(*) --
FROM anniversaries
WHERE anniversary BETWEEN current_date AND current_date + interval '14' day;
C1 - Catcall's idea rewritten
Aside from minor optimizations, the major difference is to add only the exact amount of years date_trunc('year', age(current_date + 14, event_date)) to get this year's anniversary, which avoids the need for a CTE altogether:
SELECT event_id, event_date
FROM event
WHERE (event_date + date_trunc('year', age(current_date + 14, event_date)))::date
BETWEEN current_date AND current_date + 14;
D - Daniel
SELECT * -- count(*) --
FROM event
WHERE extract(month FROM age(current_date + 14, event_date)) = 0
AND extract(day FROM age(current_date + 14, event_date)) <= 14;
E1 - Erwin 1
See "1. Simple version" above.
E2 - Erwin 2
See "2. Advanced version" above.
E3 - Erwin 3
See "3. Black magic version" above.
G - Gordon
SELECT * -- count(*)
FROM (SELECT *, to_char(event_date, 'MM-DD') AS mmdd FROM event) e
WHERE to_date(to_char(now(), 'YYYY') || '-'
|| (CASE WHEN mmdd = '02-29' THEN '02-28' ELSE mmdd END)
,'YYYY-MM-DD') BETWEEN date(now()) and date(now()) + 14;
H - a_horse_with_no_name
WITH upcoming as (
SELECT event_id, event_date
,CASE
WHEN date_trunc('year', age(event_date)) = age(event_date)
THEN current_date
ELSE cast(event_date + ((extract(year FROM age(event_date)) + 1)
* interval '1' year) AS date)
END AS next_event
FROM event
)
SELECT event_id, event_date
FROM upcoming
WHERE next_event - current_date <= 14;
W - wildplasser
CREATE OR REPLACE FUNCTION this_years_birthday(_dut date)
RETURNS date
LANGUAGE plpgsql AS
$func$
DECLARE
ret date;
BEGIN
ret := date_trunc('year' , current_timestamp)
+ (date_trunc('day' , _dut)
- date_trunc('year' , _dut));
RETURN ret;
END
$func$;
Simplified to return the same as all the others:
SELECT *
FROM event e
WHERE this_years_birthday( e.event_date::date )
BETWEEN current_date
AND current_date + '2weeks'::interval;
W1 - wildplasser's query rewritten
The above suffers from a number of inefficient details (beyond the scope of this already sizable post). The rewritten version is much faster:
CREATE OR REPLACE FUNCTION this_years_birthday(_dut INOUT date)
LANGUAGE sql AS
$func$
SELECT (date_trunc('year', now()) + ($1 - date_trunc('year', $1)))::date
$func$;
SELECT *
FROM event e
WHERE this_years_birthday(e.event_date) BETWEEN current_date
AND (current_date + 14);
Test results
I ran this test with a temporary table on PostgreSQL 9.1.7.
Results were gathered with EXPLAIN ANALYZE, best of 5.
Results
Without index
C: Total runtime: 76714.723 ms
C1: Total runtime: 307.987 ms -- !
D: Total runtime: 325.549 ms
E1: Total runtime: 253.671 ms -- !
E2: Total runtime: 484.698 ms -- min() & max() expensive without index
E3: Total runtime: 213.805 ms -- !
G: Total runtime: 984.788 ms
H: Total runtime: 977.297 ms
W: Total runtime: 2668.092 ms
W1: Total runtime: 596.849 ms -- !
With index
E1: Total runtime: 37.939 ms --!!
E2: Total runtime: 38.097 ms --!!
With index on expression
E3: Total runtime: 11.837 ms --!!
All other queries perform the same with or without index because they use non-sargable expressions.
Conclusion
So far, @Daniel's query was the fastest.
@wildplasser's (rewritten) approach performs acceptably, too.
@Catcall's version is something like the reverse approach of mine. Performance gets out of hand quickly with bigger tables.
The rewritten version performs pretty well, though. The expression I use is something like a simpler version of @wildplasser's this_years_birthday() function.
My "simple version" is faster even without index, because it needs fewer computations.
With index, the "advanced version" is about as fast as the "simple version", because min() and max() become very cheap with an index. Both are substantially faster than the rest which cannot use the index.
My "black magic version" is fastest with or without index. And it is very simple to call.
The updated version (after the benchmark) is a bit faster, yet.
With a real life table an index will make even greater difference. More columns make the table bigger, and sequential scan more expensive, while the index size stays the same.
I believe the following test works in all cases, assuming a column named anniv_date:
select * from events
where extract(month from age(current_date+interval '14 days', anniv_date))=0
and extract(day from age(current_date+interval '14 days', anniv_date)) <= 14
As an example of how it works when crossing a year (and also a month), let's say an anniversary date is 2009-01-04 and the date at which the test is run is 2012-12-29.
We want to consider any date between 2012-12-29 and 2013-01-12 (14 days)
age('2013-01-12'::date, '2009-01-04'::date) is 4 years 8 days.
extract(month...) from this is 0 and extract(days...) is 8, which is lower than 14 so it matches.
How about this?
select *
from events e
where to_char(e."date", 'MM-DD') between to_char(now(), 'MM-DD') and
to_char(date(now())+14, 'MM-DD')
You can do the comparison as strings.
To take year ends into account, we'll convert back to dates:
select *
from events e
where to_date(to_char(now(), 'YYYY')||'-'||to_char(e."date", 'MM-DD'), 'YYYY-MM-DD')
between date(now()) and date(now())+14
You do need to make a slight adjustment for Feb 29. I might suggest:
select *
from (select e.*,
to_char(e."date", 'MM-DD') as MMDD
from events e
) e
where to_date(to_char(now(), 'YYYY')||'-'||(case when MMDD = '02-29' then '02-28' else MMDD end), 'YYYY-MM-DD')
between date(now()) and date(now())+14
For convenience, I created two functions that yield the (expected or past) birthday in the current year, and the upcoming birthday.
CREATE OR REPLACE FUNCTION this_years_birthday( _dut DATE) RETURNS DATE AS
$func$
DECLARE
ret DATE;
BEGIN
ret =
date_trunc( 'year' , current_timestamp)
+ (date_trunc( 'day' , _dut)
- date_trunc( 'year' , _dut)
)
;
RETURN ret;
END;
$func$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION next_birthday( _dut DATE) RETURNS DATE AS
$func$
DECLARE
ret DATE;
BEGIN
ret =
date_trunc( 'year' , current_timestamp)
+ (date_trunc( 'day' , _dut)
- date_trunc( 'year' , _dut)
)
;
IF (ret < date_trunc( 'day' , current_timestamp))
THEN ret = ret + '1year'::interval; END IF;
RETURN ret;
END;
$func$ LANGUAGE plpgsql;
--
-- call the function
--
SELECT date_trunc( 'day' , t.topic_date) AS the_date
, this_years_birthday( t.topic_date::date ) AS the_day
, next_birthday( t.topic_date::date ) AS next_day
FROM topic t
WHERE this_years_birthday( t.topic_date::date )
BETWEEN current_date
AND current_date + '2weeks':: interval
;
NOTE: the casts are needed because I only had timestamps available.
This should handle wrap-arounds at the end of the year as well:
with upcoming as (
select name,
event_date,
case
when date_trunc('year', age(event_date)) = age(event_date) then current_date
else cast(event_date + ((extract(year from age(event_date)) + 1) * interval '1' year) as date)
end as next_event
from events
)
select name,
next_event,
next_event - current_date as days_until_next
from upcoming
order by next_event - current_date
You can then filter on the expression next_event - current_date to apply the "next 14 days" condition.
The case ... is only necessary if you consider events that would be "today" as "upcoming" as well. Otherwise, that can be reduced to the else part of the case statement.
Note that I "renamed" the column "date" to event_date. Mainly because reserved words shouldn't be used as an identifier but also because date is a terrible column name. It doesn't tell you anything about what it stores.
You can generate a virtual table of anniversaries, and select from it.
with anniversaries as (
select event_date,
(event_date + (n || ' years')::interval)::date anniversary
from events, generate_series(1,10) n
)
select event_date, anniversary
from anniversaries
where anniversary between current_date and current_date + interval '14' day
order by event_date, anniversary
The call to generate_series(1,10) has the effect of generating 10 years of anniversaries for each event_date. I wouldn't use the literal value 10 in production. Instead, I'd either calculate the right number of years to use in a subquery, or I'd use a large literal like 100.
You'll want to adjust the WHERE clause to fit your application.
If you have a performance problem with the virtual table (when you have a lot of rows in "events"), replace the common table expression with a base table having the identical structure. Storing anniversaries in a base table makes their values obvious (especially for, say, Feb 29 anniversaries), and queries on such a table can use an index. Querying an anniversary table of half a million rows using just the SELECT statement above takes 25ms on my desktop.
I found a way to do it.
SELECT EXTRACT(DAYS FROM age('1999-04-10', '2003-05-12')),
EXTRACT(MONTHS FROM age('1999-04-10', '2003-05-12'));
date_part | date_part
-----------+-----------
-2 | -1
I can then just check that the month is 0 and the days are less than 14.
If you have a more elegant solution, please do post it. I'll leave the question open for a bit.
I don't work with PostgreSQL, so I googled its date functions and found this: http://www.postgresql.org/docs/current/static/functions-datetime.html
If I read it correctly, looking for events in the next 14 days is as simple as:
where mydatefield >= current_date
and mydatefield < current_date + integer '14'
Of course I might not be reading it correctly.

Is it possible to write a query which returns a date for every day between two specified days?

Basically, the question says it all. I need a PL/SQL query that returns a list of dates between two dates such that for 01-JAN-2010 to 20-JAN-2010 I would get 20 rows returned:
the_date
--------
01-JAN-2010
02-JAN-2010
03-JAN-2010
04-JAN-2010
...
20-JAN-2010
The following query will return each day between 1/1 and 1/20 (inclusive).
select to_date('1/1/2010','mm/dd/yyyy') + level - 1
from dual
connect by level <= to_date('1/20/2010','mm/dd/yyyy')
- to_date('1/1/2010','mm/dd/yyyy') + 1;
Here's an example from postgres, I hope the dialects are comparable in regards to recursive
WITH RECURSIVE t(n) AS (
VALUES (1)
UNION ALL
SELECT n+1 FROM t WHERE n < 20
)
SELECT n FROM t;
...will return 20 records, numbers from 1 to 20
Cast/convert these to dates and there you are
UPDATE:
Sorry, don't have ORA here, but according to this article
SELECT
SYS_CONNECT_BY_PATH(DUMMY, '/')
FROM
DUAL
CONNECT BY
LEVEL<4;
gives
SYS_CONNECT_BY_PATH(DUMMY,'/')
--------------------------------
/X
/X/X
/X/X/X
It is also stated that this is supposed to be very efficient way to generate rows.
If ROWNUM can be used in the above select, and if a variable can be used in the LEVEL condition, then a solution can be worked out.
UPDATE2:
And indeed there are several options.
SELECT (CAST('01-JAN-2010' AS DATE) + (ROWNUM - 1)) n
FROM ( SELECT 1 just_a_column
FROM dual
CONNECT BY LEVEL <= 20
)
orafaq states that: 'It should be noted that in later versions of oracle, at least as far back as 10gR1, operations against dual are optimized such that they require no logical or physical I/O operations. This makes them quite fast.', so I would say this is not completely esoteric.
OK, so it might seem a little hacky, but here's what I've come up with:
SELECT (CAST('01-JAN-2010' AS DATE) + (ROWNUM - 1)) AS the_date
FROM all_objects
WHERE ROWNUM <= CAST('20-JAN-2010' AS DATE) - CAST('01-JAN-2010' AS DATE) + 1
The magic sauce is using ROWNUM as a seed for date arithmetic, I'm using all_objects but you could use any table that has enough rows in it to supply the required range. You can shuffle it around to make it work off SYSDATE instead of hard coding the value, but in principle I think that the idea is sound.
Here's an example that returns a list of dates from 10 days ago to 10 days time:
SELECT (SYSDATE -10 + (ROWNUM-1)) AS the_date
FROM all_objects
WHERE ROWNUM <= (SYSDATE +10) - (SYSDATE -10) + 1
No. Queries can only return existing data - and if you have no table of all days, you are out.
That said (I am no oracle specialist), a function or stored procedure should be able to do that. In SQL Server I would have a function returning a table (that I could then use in joins).
But a pure query - no. Not unless oracle has such a function already.