How do you do date math that ignores the year? - sql

I am trying to select dates that have an anniversary in the next 14 days. How can I select based on dates excluding the year? I have tried something like the following.
SELECT * FROM events
WHERE EXTRACT(month FROM "date") = 3
AND EXTRACT(day FROM "date") < EXTRACT(day FROM "date") + 14
The problem with this is that months wrap.
I would prefer to do something like this, but I don't know how to ignore the year.
SELECT * FROM events
WHERE (date > '2013-03-01' AND date < '2013-04-01')
How can I accomplish this kind of date math in Postgres?

TL;DR: use the "Black magic version" below.
All queries presented in other answers so far operate with conditions that are not sargable: they cannot use an index and have to compute an expression for every single row in the base table to find matching rows. Doesn't matter much with small tables. Matters a lot with big tables.
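For example (an illustration of sargability, not from the original answer), using the table and column from the question: a predicate that wraps the column in an expression cannot use a plain index on that column, while a bare range comparison can:
-- not sargable: the expression hides the column, so a plain index on "date" is useless
SELECT * FROM events WHERE EXTRACT(month FROM "date") = 3;
-- sargable: bare column compared to constants, so an index on "date" can be used
SELECT * FROM events WHERE "date" >= date '2013-03-01' AND "date" < date '2013-04-01';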
Given the following simple table:
CREATE TABLE event (
event_id serial PRIMARY KEY
, event_date date
);
Query
Versions 1. and 2. below can use a simple index of the form:
CREATE INDEX event_event_date_idx ON event(event_date);
But all of the following solutions are fast even without an index.
1. Simple version
SELECT *
FROM (
SELECT ((current_date + d) - interval '1 year' * y)::date AS event_date
FROM generate_series( 0, 14) d
CROSS JOIN generate_series(13, 113) y
) x
JOIN event USING (event_date);
Subquery x computes all possible dates over a given range of years from a CROSS JOIN of two generate_series() calls. The selection is done with the final simple join.
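To see what x produces, here is a cut-down illustration with an assumed current date of 2013-03-01 and only 2 days and 2 years:
SELECT ((date '2013-03-01' + d) - interval '1 year' * y)::date AS event_date
FROM generate_series(0, 1) d
CROSS JOIN generate_series(1, 2) y
ORDER BY 1;
-- 2011-03-01, 2011-03-02, 2012-03-01, 2012-03-02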
2. Advanced version
WITH val AS (
SELECT extract(year FROM age(current_date + 14, min(event_date)))::int AS max_y
, extract(year FROM age(current_date, max(event_date)))::int AS min_y
FROM event
)
SELECT e.*
FROM (
SELECT ((current_date + d.d) - interval '1 year' * y.y)::date AS event_date
FROM generate_series(0, 14) d
,(SELECT generate_series(min_y, max_y) AS y FROM val) y
) x
JOIN event e USING (event_date);
The range of years is deduced from the table automatically, thereby minimizing the number of generated years.
You could go one step further and distill a list of only those years that actually exist, if there are gaps; see the sketch below.
Effectiveness depends on the distribution of dates: it works best for few distinct years with many rows each.
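One way to read that suggestion (my sketch, not the query from the original answer): replace the min/max range with the distinct year offsets that actually occur in the table:
WITH yrs AS (
SELECT DISTINCT extract(year FROM age(current_date + 14, event_date))::int AS y
FROM event
)
SELECT e.*
FROM (
SELECT ((current_date + d) - interval '1 year' * y)::date AS event_date
FROM generate_series(0, 14) d CROSS JOIN yrs
) x
JOIN event e USING (event_date);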
Simple db<>fiddle to play with here
Old sqlfiddle
3. Black magic version
Create a simple SQL function to calculate an integer from the pattern 'MMDD':
CREATE FUNCTION f_mmdd(date) RETURNS int LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
'SELECT (EXTRACT(month FROM $1) * 100 + EXTRACT(day FROM $1))::int';
I used to_char(time, 'MMDD') at first, but switched to the above expression, which proved fastest in new tests on Postgres 9.6 and 10:
db<>fiddle here
It allows function inlining because EXTRACT(xyz FROM date) is implemented with the IMMUTABLE function date_part(text, date) internally. And it has to be IMMUTABLE to allow its use in the following essential multicolumn expression index:
CREATE INDEX event_mmdd_event_date_idx ON event(f_mmdd(event_date), event_date);
Multicolumn for a number of reasons:
It can help with ORDER BY or with selecting from given years, at almost no additional cost for the index: a date fits into the 4 bytes that would otherwise be lost to padding due to data alignment.
Also, since both index columns reference the same table column, there is no drawback with regard to H.O.T. updates.
Basic query:
SELECT *
FROM event e
WHERE f_mmdd(e.event_date) BETWEEN f_mmdd(current_date)
AND f_mmdd(current_date + 14);
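For illustration, with an assumed current date of March 1, the two bounds resolve to 301 and 315 (the year of the literal is irrelevant):
SELECT f_mmdd(date '2024-03-01') AS lower_bound -- 301
, f_mmdd(date '2024-03-15') AS upper_bound; -- 315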
One PL/pgSQL table function to rule them all
Fork to one of two queries to cover the turn of the year:
CREATE OR REPLACE FUNCTION f_anniversary(_the_date date = current_date, _days int = 14)
RETURNS SETOF event
LANGUAGE plpgsql AS
$func$
DECLARE
d int := f_mmdd($1);
d1 int := f_mmdd($1 + $2 - 1); -- fix off-by-1 from upper bound
BEGIN
IF d1 > d THEN
RETURN QUERY
SELECT *
FROM event e
WHERE f_mmdd(e.event_date) BETWEEN d AND d1
ORDER BY f_mmdd(e.event_date), e.event_date;
ELSE -- wrap around end of year
RETURN QUERY
SELECT *
FROM event e
WHERE f_mmdd(e.event_date) >= d OR
f_mmdd(e.event_date) <= d1
ORDER BY (f_mmdd(e.event_date) >= d) DESC, f_mmdd(e.event_date), event_date;
-- chronological across turn of the year
END IF;
END
$func$;
Call using defaults: 14 days beginning "today":
SELECT * FROM f_anniversary();
Call for 7 days beginning '2014-08-23':
SELECT * FROM f_anniversary(date '2014-08-23', 7);
db<>fiddle here - comparing EXPLAIN ANALYZE
"February 29"
When dealing with anniversaries or "birthdays", you need to define how to deal with the special case "February 29" in leap years.
When testing for ranges of dates, Feb 29 is usually included automatically, even if the current year is not a leap year, since the range of days is effectively extended by 1 when it covers this day.
On the other hand, if the current year is a leap year and your 15-day window covers February 29, you may effectively get results for only 14 days if your data is from non-leap years (which have no February 29).
Say, Bob is born on the 29th of February:
Queries 1. and 2. include February 29 only in leap years; Bob has a birthday only every ~ 4 years.
Query 3. includes February 29 in the range; Bob has a birthday every year.
There is no magical solution. You have to define what you want for every case.
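For instance, if you decide that people born on February 29 should be matched on February 28 in non-leap years, one possible variant of the basic query above (my sketch, not part of the original answer) clamps the mmdd value; like the basic query, it ignores the turn of the year, and it would need its own expression index to stay fast:
SELECT *
FROM event e
WHERE CASE WHEN f_mmdd(e.event_date) = 229 THEN 228 ELSE f_mmdd(e.event_date) END
BETWEEN f_mmdd(current_date) AND f_mmdd(current_date + 14);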
Test
To substantiate my point I ran an extensive test with all the presented solutions. I adapted each of the queries to the given table and to yield identical results without ORDER BY.
The good news: all of them are correct and yield the same result - except for Gordon's query, which had syntax errors, and @wildplasser's query, which fails when the year wraps around (easy to fix).
Insert 108000 rows with random dates from the 20th century, which is similar to a table of living people (13 or older).
INSERT INTO event (event_date)
SELECT '2000-1-1'::date - (random() * 36525)::int
FROM generate_series (1, 108000);
Delete ~ 8 % to create some dead tuples and make the table more "real life".
DELETE FROM event WHERE random() < 0.08;
ANALYZE event;
My test case had 99289 rows, 4012 hits.
C - Catcall
WITH anniversaries as (
SELECT event_id, event_date
,(event_date + (n || ' years')::interval)::date anniversary
FROM event, generate_series(13, 113) n
)
SELECT event_id, event_date -- count(*) --
FROM anniversaries
WHERE anniversary BETWEEN current_date AND current_date + interval '14' day;
C1 - Catcall's idea rewritten
Aside from minor optimizations, the major difference is to add exactly the right number of years, date_trunc('year', age(current_date + 14, event_date)), to get this year's anniversary, which avoids the need for a CTE altogether:
SELECT event_id, event_date
FROM event
WHERE (event_date + date_trunc('year', age(current_date + 14, event_date)))::date
BETWEEN current_date AND current_date + 14;
D - Daniel
SELECT * -- count(*) --
FROM event
WHERE extract(month FROM age(current_date + 14, event_date)) = 0
AND extract(day FROM age(current_date + 14, event_date)) <= 14;
E1 - Erwin 1
See "1. Simple version" above.
E2 - Erwin 2
See "2. Advanced version" above.
E3 - Erwin 3
See "3. Black magic version" above.
G - Gordon
SELECT * -- count(*)
FROM (SELECT *, to_char(event_date, 'MM-DD') AS mmdd FROM event) e
WHERE to_date(to_char(now(), 'YYYY') || '-'
|| (CASE WHEN mmdd = '02-29' THEN '02-28' ELSE mmdd END)
,'YYYY-MM-DD') BETWEEN date(now()) and date(now()) + 14;
H - a_horse_with_no_name
WITH upcoming as (
SELECT event_id, event_date
,CASE
WHEN date_trunc('year', age(event_date)) = age(event_date)
THEN current_date
ELSE cast(event_date + ((extract(year FROM age(event_date)) + 1)
* interval '1' year) AS date)
END AS next_event
FROM event
)
SELECT event_id, event_date
FROM upcoming
WHERE next_event - current_date <= 14;
W - wildplasser
CREATE OR REPLACE FUNCTION this_years_birthday(_dut date)
RETURNS date
LANGUAGE plpgsql AS
$func$
DECLARE
ret date;
BEGIN
ret := date_trunc('year' , current_timestamp)
+ (date_trunc('day' , _dut)
- date_trunc('year' , _dut));
RETURN ret;
END
$func$;
Simplified to return the same as all the others:
SELECT *
FROM event e
WHERE this_years_birthday( e.event_date::date )
BETWEEN current_date
AND current_date + '2weeks'::interval;
W1 - wildplasser's query rewritten
The above suffers from a number of inefficient details (beyond the scope of this already sizable post). The rewritten version is much faster:
CREATE OR REPLACE FUNCTION this_years_birthday(_dut INOUT date)
LANGUAGE sql AS
$func$
SELECT (date_trunc('year', now()) + ($1 - date_trunc('year', $1)))::date
$func$;
SELECT *
FROM event e
WHERE this_years_birthday(e.event_date) BETWEEN current_date
AND (current_date + 14);
Test results
I ran this test with a temporary table on PostgreSQL 9.1.7.
Results were gathered with EXPLAIN ANALYZE, best of 5.
Results
Without index
C: Total runtime: 76714.723 ms
C1: Total runtime: 307.987 ms -- !
D: Total runtime: 325.549 ms
E1: Total runtime: 253.671 ms -- !
E2: Total runtime: 484.698 ms -- min() & max() expensive without index
E3: Total runtime: 213.805 ms -- !
G: Total runtime: 984.788 ms
H: Total runtime: 977.297 ms
W: Total runtime: 2668.092 ms
W1: Total runtime: 596.849 ms -- !
With index
E1: Total runtime: 37.939 ms --!!
E2: Total runtime: 38.097 ms --!!
With index on expression
E3: Total runtime: 11.837 ms --!!
All other queries perform the same with or without index because they use non-sargable expressions.
Conclusion
So far, @Daniel's query was the fastest.
@wildplasser's (rewritten) approach performs acceptably, too.
@Catcall's version is something like the reverse of my approach. Performance gets out of hand quickly with bigger tables.
The rewritten version performs pretty well, though. The expression I use is something like a simpler version of @wildplasser's this_years_birthday() function.
My "simple version" is faster even without index, because it needs fewer computations.
With index, the "advanced version" is about as fast as the "simple version", because min() and max() become very cheap with an index. Both are substantially faster than the rest which cannot use the index.
My "black magic version" is fastest with or without index. And it is very simple to call.
The updated version (after the benchmark) is faster yet.
With a real life table an index will make even greater difference. More columns make the table bigger, and sequential scan more expensive, while the index size stays the same.

I believe the following test works in all cases, assuming a column named anniv_date:
select * from events
where extract(month from age(current_date+interval '14 days', anniv_date))=0
and extract(day from age(current_date+interval '14 days', anniv_date)) <= 14
As an example of how it works when crossing a year (and also a month), let's say an anniversary date is 2009-01-04 and the date at which the test is run is 2012-12-29.
We want to consider any date between 2012-12-29 and 2013-01-12 (14 days)
age('2013-01-12'::date, '2009-01-04'::date) is 4 years 8 days.
extract(month ...) from this is 0 and extract(day ...) is 8, which is less than 14, so it matches.
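You can verify the example directly:
SELECT age(date '2013-01-12', date '2009-01-04'); -- 4 years 8 days
SELECT extract(month FROM age(date '2013-01-12', date '2009-01-04')) -- 0
, extract(day FROM age(date '2013-01-12', date '2009-01-04')); -- 8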

How about this?
select *
from events e
where to_char(e."date", 'MM-DD') between to_char(now(), 'MM-DD') and
to_char(date(now())+14, 'MM-DD')
You can do the comparison as strings.
To take year ends into account, we'll convert back to dates:
select *
from events e
where to_date(to_char(now(), 'YYYY')||'-'||to_char(e."date", 'MM-DD'), 'YYYY-MM-DD')
between date(now()) and date(now())+14
You do need to make a slight adjustment for Feb 29. I might suggest:
select *
from (select e.*,
to_char(e."date", 'MM-DD') as MMDD
from events
) e
where to_date(to_char(now(), 'YYYY')||'-'||(case when MMDD = '02-29' then '02-28' else MMDD end), 'YYYY-MM-DD')
between date(now()) and date(now())+14

For convenience, I created two functions that yield the (expected or past) birthday in the current year, and the upcoming birthday.
CREATE OR REPLACE FUNCTION this_years_birthday( _dut DATE) RETURNS DATE AS
$func$
DECLARE
ret DATE;
BEGIN
ret =
date_trunc( 'year' , current_timestamp)
+ (date_trunc( 'day' , _dut)
- date_trunc( 'year' , _dut)
)
;
RETURN ret;
END;
$func$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION next_birthday( _dut DATE) RETURNS DATE AS
$func$
DECLARE
ret DATE;
BEGIN
ret =
date_trunc( 'year' , current_timestamp)
+ (date_trunc( 'day' , _dut)
- date_trunc( 'year' , _dut)
)
;
IF (ret < date_trunc( 'day' , current_timestamp))
THEN ret = ret + '1year'::interval; END IF;
RETURN ret;
END;
$func$ LANGUAGE plpgsql;
--
-- call the function
--
SELECT date_trunc( 'day' , t.topic_date) AS the_date
, this_years_birthday( t.topic_date::date ) AS the_day
, next_birthday( t.topic_date::date ) AS next_day
FROM topic t
WHERE this_years_birthday( t.topic_date::date )
BETWEEN current_date
AND current_date + '2 weeks'::interval
;
NOTE: the casts are needed because I only had timestamps available.

This should handle wrap-arounds at the end of the year as well:
with upcoming as (
select name,
event_date,
case
when date_trunc('year', age(event_date)) = age(event_date) then current_date
else cast(event_date + ((extract(year from age(event_date)) + 1) * interval '1' year) as date)
end as next_event
from events
)
select name,
next_event,
next_event - current_date as days_until_next
from upcoming
order by next_event - current_date
You can then filter on the expression next_event - current_date to apply the "next 14 days" condition.
The case ... is only necessary if you consider events that would be "today" as "upcoming" as well. Otherwise, that can be reduced to the else part of the case statement.
Note that I "renamed" the column "date" to event_date. Mainly because reserved words shouldn't be used as an identifier but also because date is a terrible column name. It doesn't tell you anything about what it stores.

You can generate a virtual table of anniversaries, and select from it.
with anniversaries as (
select event_date,
(event_date + (n || ' years')::interval)::date anniversary
from events, generate_series(1,10) n
)
select event_date, anniversary
from anniversaries
where anniversary between current_date and current_date + interval '14' day
order by event_date, anniversary
The call to generate_series(1,10) has the effect of generating 10 years of anniversaries for each event_date. I wouldn't use the literal value 10 in production. Instead, I'd either calculate the right number of years to use in a subquery, or I'd use a large literal like 100.
You'll want to adjust the WHERE clause to fit your application.
If you have a performance problem with the virtual table (when you have a lot of rows in "events"), replace the common table expression with a base table having the identical structure. Storing anniversaries in a base table makes their values obvious (especially for, say, Feb 29 anniversaries), and queries on such a table can use an index. Querying an anniversary table of half a million rows using just the SELECT statement above takes 25ms on my desktop.
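A sketch of that materialization (the table name and index are my assumptions, not from the answer above):
CREATE TABLE event_anniversaries AS
SELECT event_date
, (event_date + (n || ' years')::interval)::date AS anniversary
FROM events, generate_series(1, 100) n;
CREATE INDEX ON event_anniversaries (anniversary);
SELECT event_date, anniversary
FROM event_anniversaries
WHERE anniversary BETWEEN current_date AND current_date + interval '14' day;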

I found a way to do it.
SELECT EXTRACT(DAYS FROM age('1999-04-10', '2003-05-12')),
EXTRACT(MONTHS FROM age('1999-04-10', '2003-05-12'));
date_part | date_part
-----------+-----------
-2 | -1
I can then just check that the month is 0 and the days are less than 14.
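Spelled out (a sketch; this is essentially the same predicate as in the first answer above, using the column "date" from the question):
SELECT *
FROM events
WHERE extract(month FROM age(current_date + 14, "date")) = 0
AND extract(day FROM age(current_date + 14, "date")) <= 14;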
If you have a more elegant solution, please do post it. I'll leave the question open for a bit.

I don't work with PostgreSQL, so I googled its date functions and found this: http://www.postgresql.org/docs/current/static/functions-datetime.html
If I read it correctly, looking for events in the next 14 days is as simple as:
where mydatefield >= current_date
and mydatefield < current_date + integer '14'
Of course I might not be reading it correctly.

Related

Dynamic date in BigQuery

Without having to manually change the date to the current date, I'd like code that changes the date automatically, i.e. auto-increments the date by one day after 0000 hrs, in BigQuery.
AND ((call_date >= "2022-10-01") AND (call_date <= "2022-10-12"))
Below is the complete code.
WITH _0 AS ( SELECT *, FROM employee_calldata),
_1 AS (
SELECT
call_date AS __call_date__1,
sub_queue AS __sub_queue__1,
sum(call_count) as callstaken,
mode AS __mode__1,
FROM _0 AS _t
WHERE
(NOT ((call_type) IS NULL)))
AND ((call_date >= "2022-10-01") AND (call_date <= "2022-10-12"))
AND (sub_queue = "Customer_Complaints")
GROUP BY __call_date__1, __sub_queue__1, __mode__1)
SELECT * FROM _1
DATE((DATETIME_ADD(('2022-10-03 00:00:00'), INTERVAL 100 HOUR)))
=> 2022-10-07
I think what you are asking is how you can create rolling windows that increment as the days go forward.
The equivalent to:
AND ((DATE(call_date) >= "2022-10-01") AND (call_date <= "2022-10-12"))
Is:
AND ((DATE(call_date) >= DATETIME_SUB(CURRENT_DATE(), INTERVAL 2 DAY)) AND (DATE(call_date) <= DATETIME_ADD(CURRENT_DATE(), INTERVAL 9 DAY)))
These values will change based on the current date; change the intervals in DATETIME_SUB and DATETIME_ADD to adjust the offsets from the current date.
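For illustration only, with an assumed current date of 2022-10-03 (and using the plain DATE_SUB/DATE_ADD variants for date values), the rolling window resolves back to the original fixed range:
SELECT DATE_SUB(DATE '2022-10-03', INTERVAL 2 DAY) AS window_start -- 2022-10-01
, DATE_ADD(DATE '2022-10-03', INTERVAL 9 DAY) AS window_end; -- 2022-10-12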
Also some other general comments on your code.
You do not need brackets on the WHERE conditions.
(NOT ((call_type) IS NULL))) can be written as call_type IS NOT NULL.
You do not need your first SELECT *, FROM employee_calldata or your SELECT * FROM _1 as they do nothing extra.
This means your final query can be written as:
SELECT
call_date AS __call_date__1,
sub_queue AS __sub_queue__1,
sum(call_count) as callstaken,
mode AS __mode__1
FROM
employee_calldata AS _t
WHERE
call_type IS NOT NULL
AND DATE(call_date) >= DATETIME_SUB(CURRENT_DATE(), INTERVAL 2 DAY)
AND DATE(call_date) <= DATETIME_ADD(CURRENT_DATE(), INTERVAL 9 DAY)
AND sub_queue = "Customer_Complaints"
GROUP BY
__call_date__1,
__sub_queue__1,
__mode__1

Using sql function generate_series() in redshift

I'd like to use the generate series function in redshift, but have not been successful.
The redshift documentation says it's not supported. The following code does work:
select *
from generate_series(1,10,1)
outputs:
1
2
3
...
10
I'd like to do the same with dates. I've tried a number of variations, including:
select *
from generate_series(date('2008-10-01'),date('2008-10-10 00:00:00'),1)
kicks out:
ERROR: function generate_series(date, date, integer) does not exist
Hint: No function matches the given name and argument types.
You may need to add explicit type casts. [SQL State=42883]
Also tried:
select *
from generate_series('2008-10-01 00:00:00'::timestamp,
'2008-10-10 00:00:00'::timestamp,'1 day')
And tried:
select *
from generate_series(cast('2008-10-01 00:00:00' as datetime),
cast('2008-10-10 00:00:00' as datetime),'1 day')
both kick out:
ERROR: function generate_series(timestamp without time zone, timestamp without time zone, "unknown") does not exist
Hint: No function matches the given name and argument types.
You may need to add explicit type casts. [SQL State=42883]
If not, it looks like I'll use this code from another post:
SELECT to_char(DATE '2008-01-01'
+ (interval '1 month' * generate_series(0,57)), 'YYYY-MM-DD') AS ym
PostgreSQL generate_series() with SQL function as arguments
Amazon Redshift seems to be based on PostgreSQL 8.0.2. The timestamp arguments to generate_series() were added in 8.4.
Something like this, which sidesteps that problem, might work in Redshift.
SELECT current_date + (n || ' days')::interval
from generate_series (1, 30) n
It works in PostgreSQL 8.3, which is the earliest version I can test. It's documented in 8.0.26.
Later . . .
It seems that generate_series() is unsupported in Redshift. But given that you've verified that select * from generate_series(1,10,1) does work, the syntax above at least gives you a fighting chance. (Although the interval data type is also documented as being unsupported on Redshift.)
Still later . . .
You could also create a table of integers.
create table integers (
n integer primary key
);
Populate it however you like. You might be able to use generate_series() locally, dump the table, and load it on Redshift. (I don't know; I don't use Redshift.)
Anyway, you can do simple date arithmetic with that table without referring directly to generate_series() or to interval data types.
select (current_date + n)
from integers
where n < 31;
That works in 8.3, at least.
Using Redshift today, you can generate a range of dates by using datetime functions and feeding in a number table.
select (getdate()::date - generate_series)::date from generate_series(1,30,1)
Generates this for me
date
2015-11-06
2015-11-05
2015-11-04
2015-11-03
2015-11-02
2015-11-01
2015-10-31
2015-10-30
2015-10-29
2015-10-28
2015-10-27
2015-10-26
2015-10-25
2015-10-24
2015-10-23
2015-10-22
2015-10-21
2015-10-20
2015-10-19
2015-10-18
2015-10-17
2015-10-16
2015-10-15
2015-10-14
2015-10-13
2015-10-12
2015-10-11
2015-10-10
2015-10-09
2015-10-08
The generate_series() function is not fully supported by Redshift. See the Unsupported PostgreSQL functions section of the developer guide.
UPDATE
generate_series is working with Redshift now.
SELECT CURRENT_DATE::TIMESTAMP - (i * interval '1 day') as date_datetime
FROM generate_series(1,31) i
ORDER BY 1
This generates one row per day for the last 31 days.
Ref: generate_series function in Amazon Redshift
As of writing this, generate_series() on our instance of Redshift (1.0.33426) could not be used to, for example, create a table:
# select generate_series(1,100,1);
1
2
...
# create table normal_series as select generate_series(1,100,1);
INFO: Function "generate_series(integer, integer, integer) not supported.
ERROR: Specified types or functions (one per INFO message) not supported on Redshift tables.
However, with recursive works:
# create table recursive_series as with recursive t(n) as (select 1::integer union all select n+1 from t where n < 100) select n from t;
SELECT
-- modify as desired, here is a date series:
# select getdate()::date + n from recursive_series;
2021-12-18
2021-12-19
...
I needed to do something similar, but with 5-minute intervals over 7 days. So here's a CTE-based hack (ugly, but not too verbose):
INSERT INTO five_min_periods
WITH
periods AS (select 0 as num UNION select 1 as num UNION select 2 UNION select 3 UNION select 4 UNION select 5 UNION select 6 UNION select 7 UNION select 8 UNION select 9 UNION select 10 UNION select 11),
hours AS (select num from periods UNION ALL select num + 12 from periods),
days AS (select num from periods where num <= 6),
rightnow AS (select CAST( TO_CHAR(GETDATE(), 'yyyy-mm-dd hh24') || ':' || trim(TO_CHAR((ROUND((DATEPART (MINUTE, GETDATE()) / 5), 1) * 5 ),'09')) AS TIMESTAMP) as start)
select
ROW_NUMBER() OVER(ORDER BY d.num DESC, h.num DESC, p.num DESC) as idx
, DATEADD(minutes, -p.num * 5, DATEADD( hours, -h.num, DATEADD( days, -d.num, n.start ) ) ) AS period_date
from days d, hours h, periods p, rightnow n
Should be able to extend this to other generation schemes. The trick here is using the Cartesian product join (i.e. no JOIN/WHERE clause) to multiply the hand-crafted CTE's to produce the necessary increments and apply to an anchor date.
Redshift's generate_series() function is a leader node only function and as such you cannot use it for downstream processing on the compute nodes. This can be replaced by a recursive CTE (or by keeping a "dates" table on your database). I have an example of such in a recent answer:
Cross join Redshift with sequence of dates
One caution I like to give in answers like this is to be careful with inequality joins (or cross joins or any under-qualified joins) when working with VERY LARGE tables which can happen often in Redshift. If you are joining with a moderate Redshift table of say 1M rows then things will be fine. But if you are doing this on a table of 1B rows then the data explosion will likely cause massive performance issues as the query spills to disk.
I've written a couple of white papers on how to write this type of query in a data space sensitive way. This issue of massive intermediate results is not unique to Redshift and I first developed my approach solving a client's HIVE query issue. "First rule of writing SQL for Big Data - don't make more"
Per the comments of @Ryan Tuck and @Slobodan Pejic, generate_series() does not work on Redshift when joining to another table.
The workaround I used was to write out every value in the series in the query:
SELECT
'2019-01-01'::date AS date_month
UNION ALL
SELECT
'2019-02-01'::date AS date_month
Using a Python function like this:
import arrow

def generate_date_series(start, end):
    start = arrow.get(start)
    end = arrow.get(end)
    months = list(
        f"SELECT '{month.format('YYYY-MM-DD')}'::date AS date_month"
        for month in arrow.Arrow.range('month', start, end)
    )
    return "\nUNION ALL\n".join(months)
perhaps not as elegant as other solutions, but here's how I did it:
drop table if exists #dates;
create temporary table #dates as
with recursive cte(val_date) as
(select
cast('2020-07-01' as date) as val_date
union all
select
cast(dateadd(day, 1, val_date) as date) as val_date
from
cte
where
val_date <= getdate()
)
select
val_date as yyyymmdd
from
cte
order by
val_date
;
For five-minute buckets I would do the following:
select date_trunc('minute', getdate()) - (i || ' minutes')::interval
from generate_series(0, 60*5-1, 5) as i
You could replace 5 by any given interval, and 60 by the number of rows you want.
SELECT CURRENT_DATE::TIMESTAMP - (i * interval '1 day') as date_datetime
FROM generate_series(1,(select datediff(day,'01-Jan-2021',now()::date))) i
ORDER BY 1

Generate Dates starting from a date returned by a condition - SQL

A series of dates with a specified interval can be generated using a variable and a static date, as in the linked question that I asked earlier. However, when there is a WHERE clause that produces the start date, the date generation seems to stop and only the first interval date is returned. I also checked other posts, but those I found use a static date or a CTE. I am looking for a solution without stored procedures/functions...
This works:
SELECT DATE(DATE_ADD('2012-01-12',
INTERVAL @i:=@i+30 DAY) ) AS dateO
FROM members, (SELECT @i:=0) r
where @i < DATEDIFF(now(), date '2012-01-12')
;
These don't:
SELECT DATE_ADD(date '2012-01-12',
INTERVAL @j:=@j+30 DAY) AS dateO, @j
FROM `members`, (SELECT @j:=0) s
where @j <= DATEDIFF(now(), date '2012-01-12')
and mmid = 100
;
SELECT DATE_ADD(stdate,
INTERVAL @k:=@k+30 DAY) AS dateO, @k
FROM `members`, (SELECT @k:=0) t
where @k <= DATEDIFF(now(), stdate)
and mmid = 100
;
SQLFIDDLE REFERENCE
Expected Results:
The results should be the same as those of the first query, given that it starts generating dates from the stdate of mmid = 100.
Preferably in ANSI SQL, so it is supported in MySQL and SQL Server/MS Access SQL, since Oracle has trunc and rownum (per the linked query with 14 votes) and Postgres has the generate_series function. I would like to know if this is a bug or a limitation in MySQL?
PS: I have asked a similar question before. It was based on static date values, whereas this one is based on a date value from a table column selected by a condition.
The simplest way to ensure cross-platform compatibility is to use a calendar table. In its simplest form:
create table calendar (
cal_date date primary key
);
insert into calendar values
('2013-01-01'),
('2013-01-02'); -- etc.
There are many ways to generate dates for insertion.
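For example, in MySQL you could fill a stretch of consecutive dates from a small helper table of digits (a sketch; the digits table is my own assumption, not part of the answer):
CREATE TABLE digits (n INT PRIMARY KEY);
INSERT INTO digits VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
-- 1000 consecutive days starting 2013-01-01
INSERT INTO calendar (cal_date)
SELECT DATE_ADD('2013-01-01', INTERVAL (d1.n + 10*d2.n + 100*d3.n) DAY)
FROM digits d1, digits d2, digits d3;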
Instead of using a WHERE clause to generate rows, you use a WHERE clause to select rows. To select October of this year, just
select cal_date
from calendar
where cal_date between '2013-10-01' and '2013-10-31';
It's reasonably compact--365,000 rows to cover a period of 1000 years. That ought to cover most business scenarios.
If you need cross-platform date arithmetic, you can add a tally column.
drop table calendar;
create table calendar (
cal_date date primary key,
tally integer not null unique check (tally > 0)
);
insert into calendar values ('2012-01-01', 1); -- etc.
To select all the dates of 30-day intervals, starting on 2012-01-12 and ending at the end of the calendar year, use
select cal_date
from calendar
where ((tally - (select tally
from calendar
where cal_date = '2012-01-12')) % 30 ) = 0;
cal_date
--
2012-01-12
2012-02-11
2012-03-12
2012-04-11
2012-05-11
2012-06-10
2012-07-10
2012-08-09
2012-09-08
2012-10-08
2012-11-07
2012-12-07
If your "mmid" column is guaranteed to have no gaps--an unspoken requirement for a calendar table--you can use the "mmid" column in place of my "tally" column.

Group by data intervals

I have a single table which stores bandwidth usage on the network over a period of time. One column will contain the date time (primary key) and another column will record the bandwidth. Data is recorded every minute. We will have other columns recording other data at that moment in time.
If the user requests the data on 15 minute intervals (within a 24 hour period given start and end date), is it possible with a single query to get the data I require or would I have to write a stored procedure/cursor to do this? Users may then request 5 minute intervals data etc.
I will most likely be using Postgres, but are there NoSQL options which would be better?
Any ideas?
WITH t AS (
SELECT ts, (random()*100)::int AS bandwidth
FROM generate_series('2012-09-01', '2012-09-04', '1 minute'::interval) ts
)
SELECT date_trunc('hour', ts) AS hour_stump
,(extract(minute FROM ts)::int / 15) AS min15_slot
,count(*) AS rows_in_timeslice -- optional
,sum(bandwidth) AS sum_bandwidth
FROM t
WHERE ts >= '2012-09-02 00:00:00+02'::timestamptz -- user's time range
AND ts < '2012-09-03 00:00:00+02'::timestamptz -- careful with borders
GROUP BY 1, 2
ORDER BY 1, 2;
The CTE t provides data like your table might hold: one timestamp ts per minute with a bandwidth number. (You don't need that part, you work with your table instead.)
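Adapted to an actual table, the same aggregation would look like this (a sketch; the table and column names traffic, ts and bandwidth are assumptions):
SELECT date_trunc('hour', ts) AS hour_stump
, (extract(minute FROM ts)::int / 15) AS min15_slot
, sum(bandwidth) AS sum_bandwidth
FROM traffic
WHERE ts >= '2012-09-02 00:00:00+02'::timestamptz
AND ts < '2012-09-03 00:00:00+02'::timestamptz
GROUP BY 1, 2
ORDER BY 1, 2;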
Here is a very similar solution for a very similar question - with detailed explanation how this particular aggregation works:
date_trunc 5 minute interval in PostgreSQL
Here is a similar solution for a similar question concerning running sums - with detailed explanation and links for the various functions used:
PostgreSQL: running count of rows for a query 'by minute'
Additional question in comment
WITH -- same as above ...
SELECT DISTINCT ON (1,2)
date_trunc('hour', ts) AS hour_stump
,(extract(minute FROM ts)::int / 15) AS min15_slot
,bandwidth AS bandwith_sample_at_min15
FROM t
WHERE ts >= '2012-09-02 00:00:00+02'::timestamptz
AND ts < '2012-09-03 00:00:00+02'::timestamptz
ORDER BY 1, 2, ts DESC;
Retrieves one un-aggregated sample per 15 minute interval - from the last available row in the window. This will be the 15th minute if the row is not missing. Crucial parts are DISTINCT ON and ORDER BY.
More information about the used technique here:
Select first row in each GROUP BY group?
select
date_trunc('hour', d) +
(((extract(minute from d)::integer / 5 * 5)::text) || ' minute')::interval
as "from",
date_trunc('hour', d) +
((((extract(minute from d)::integer / 5 + 1) * 5)::text) || ' minute')::interval
- '1 second'::interval
as "to",
sum(random() * 1000) as bandwidth
from
generate_series('2012-01-01', '2012-01-31', '1 minute'::interval) s(d)
group by 1, 2
order by 1, 2
;
That is for 5-minute ranges. For 15-minute ranges, divide (and multiply) by 15 instead.

Filter by date range (same month and day) across years

I have a PostgreSQL database with a table holding dates.
Now I need to find all rows within the date range 15/02 until 21/06 (day/month) across all years.
Example result:
1840-02-28
1990-06-21
1991-02-15
1991-04-25
1992-05-30
1995-03-04
1995-04-10
2001-02-03
2010-04-06
Assuming (with a leap of faith) that you want dates between certain days of the year regardless of the year (like if you're sending out a batch of birthday cards or something), you can set up a test with this:
CREATE TABLE d (dt date);
COPY d FROM STDIN;
1840-02-28
1990-06-21
1991-02-15
1991-04-25
1992-05-30
1995-03-04
1995-04-10
2001-02-03
2010-04-06
\.
And you can use "row value constructors" to easily select the desired range:
SELECT * FROM d
WHERE (EXTRACT(MONTH FROM dt), EXTRACT(DAY FROM dt))
BETWEEN (2, 15) AND (6, 21);
Which yields:
dt
------------
1840-02-28
1990-06-21
1991-02-15
1991-04-25
1992-05-30
1995-03-04
1995-04-10
2010-04-06
(8 rows)
Use a WHERE clause with the BETWEEN operator. See:
http://www.postgresql.org/docs/current/static/functions-comparison.html#FUNCTIONS-COMPARISON
and:
http://www.postgresql.org/docs/current/static/sql-select.html
http://www.postgresql.org/docs/current/static/tutorial.html
If that doesn't help, please expand your question with:
The structure of the table(s) you're working with, either from psql's \d tablename command or the original CREATE TABLE statements;
Some sample contents
The query you're having problems with
Expected results
You can use following syntax.
SELECT * FROM tableName WHERE dateColumnName BETWEEN '2012.01.01' AND '2012.08.14';
Just replace the following:
tableName - name of the table you are going to access
dateColumnName - name of the column which contains the dates
2012.01.01 - start date
2012.08.14 - end date
When entering the two dates, carefully examine the example above: enter them in the same format and enclose them in single quotes.
If you replace the * with a column name, you can return values of that column only.
Hope that helps.
I am pretty sure @kgrittn's interpretation of the question is accurate, and I love his elegant use of row constructors. Even more so after I tested a couple of alternatives, none of which could match the performance:
Tested with a real life table of 65426 rows; 32107 qualified. PostgreSQL 9.1.4, best of five with EXPLAIN ANALYZE:
SELECT * FROM tbl
WHERE to_char(data, 'MMDD') BETWEEN '0215' AND '0621';
Total runtime: 251.188 ms
SELECT * FROM tbl
WHERE to_char(data, 'MMDD')::int BETWEEN 215 AND 621;
Total runtime: 250.965 ms
SELECT * FROM tbl
WHERE to_char(data, 'MMDD') COLLATE "C" BETWEEN '0215' AND '0621';
Total runtime: 221.732 ms
String comparison is faster with the "non-locale" C - more in the manual about collation support.
SELECT * FROM tbl
WHERE EXTRACT(MONTH FROM data)*100 + EXTRACT(DAY FROM data)
BETWEEN 215 AND 621;
Total runtime: 209.965 ms
SELECT * FROM tbl
WHERE EXTRACT(MONTH FROM data) BETWEEN 3 AND 5
OR EXTRACT(MONTH FROM data) = 2 AND EXTRACT(DAY FROM data) >= 15
OR EXTRACT(MONTH FROM data) = 6 AND EXTRACT(DAY FROM data) <= 21;
Total runtime: 160.169 ms
SELECT * FROM tbl
WHERE EXTRACT(MONTH FROM data) BETWEEN 2 AND 6
AND CASE EXTRACT(MONTH FROM data)
WHEN 2 THEN EXTRACT(DAY FROM data) >= 15
WHEN 6 THEN EXTRACT(DAY FROM data) <=21
ELSE TRUE END;
Total runtime: 147.390 ms
SELECT * FROM tbl
WHERE CASE EXTRACT(MONTH FROM data)
WHEN 3 THEN TRUE
WHEN 4 THEN TRUE
WHEN 5 THEN TRUE
WHEN 2 THEN EXTRACT(DAY FROM data) >= 15
WHEN 6 THEN EXTRACT(DAY FROM data) <= 21
ELSE FALSE END;
Total runtime: 131.907 ms
@Kevin's solution with row constructors:
SELECT * FROM tbl
WHERE (EXTRACT(MONTH FROM data), EXTRACT(DAY FROM data))
BETWEEN (2, 15) AND (6, 21);
Total runtime: 125.460 ms
Chapeau.
Faster with functional index
The only way to beat that is with indexes. None of the queries above can use a plain index on data. However, if read performance is crucial (and for a small cost on write performance) you can resort to a functional index:
CREATE INDEX ON tbl(EXTRACT(MONTH FROM data), EXTRACT(DAY FROM data));
SELECT * FROM tbl
WHERE (EXTRACT(MONTH FROM data), EXTRACT(DAY FROM data))
BETWEEN (2, 15) AND (6, 21);
Total runtime: 85.895 ms
And that's where I can finally beat Kevin's query by a hair: with a single column index instead of the multi-column index needed in his case.
CREATE INDEX ON tbl(
CAST(EXTRACT(MONTH FROM data) * 100 + EXTRACT(DAY FROM data) AS int));
SELECT * FROM tbl
WHERE (EXTRACT(MONTH FROM data) * 100 + EXTRACT(DAY FROM data))::int
BETWEEN 215 AND 621;
Total runtime: 84.215 ms
You can use simple conditions with >= and <=, or BETWEEN ... AND, but the trick is to know your exact data type.
Sometimes date fields contain a time component, and that is where the query can go wrong, so it is recommended to use date-related functions to remove the time issue. In SQL Server, a common function used for that is DATEDIFF.
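For example, in SQL Server a common DATEDIFF/DATEADD idiom strips the time portion before comparing (a sketch, not taken from the answer above; the table and column names are assumed):
-- "today at midnight": count whole days since day 0, then add them back to day 0
SELECT DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0);
-- rows whose datetime falls on today's date, regardless of time of day
SELECT *
FROM events
WHERE event_datetime >= DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0)
AND event_datetime < DATEADD(day, DATEDIFF(day, 0, GETDATE()) + 1, 0);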