PostgreSQL WHERE with age() function [duplicate]

This question already has answers here:
Using an Alias column in the where clause in Postgresql
I'm pretty sure this has been asked before, but I am struggling to get the correct syntax for a table containing data like:
id  date        type                                     report  item_id
1   2018-11-07  Veröffentlichung des 9-Monats-Berichtes  TRUE    16
2   2018-11-06  Veröffentlichung des 9-Monats-Berichtes  TRUE    17
3   2019-03-07  Veröffentlichung des Jahresberichtes     TRUE    17
4   2019-05-10  Bericht zum 1. Quartal                   TRUE    17
The query I am trying to formulate is
SELECT date, AGE(now(), date) as t1
FROM dates
WHERE t1 > 0
Meaning I am only looking for values in the past.
However, I get an error
ERROR: column "t1" does not exist
(of course, it is an alias). Does Postgresql not support aliases here?

You cannot refer to an alias in the WHERE clause, because logically WHERE is evaluated before SELECT.
You could use a subquery:
SELECT *
FROM (SELECT date, AGE(now(), date) as t1
FROM dates) sub
WHERE sub.t1 > interval '0 seconds';
Or LATERAL (my favourite way):
SELECT date, s.t1
FROM dates
,LATERAL (SELECT AGE(now(), date) as t1) AS s
WHERE s.t1 > interval '0 seconds';
Or repeat the expression (violates the DRY principle):
SELECT date, AGE(now(), date) as t1
FROM dates
WHERE AGE(now(), date) > interval '0 seconds';
As for calculating AGE: you don't really need it for the filter, because the condition can be rewritten as date < now().
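A minimal sketch of that rewrite, using the same dates table:
SELECT date, AGE(now(), date) AS t1
FROM dates
WHERE date < now();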
Related articles:
PostgreSQL: using a calculated column in the same query
MySQL - Search into a custom Column
Why do “linq to sql” queries starts with the FROM keyword unlike regular SQL queries?

If you're in a hurry and have an (ordered) index on date, don't do any of that.
This query can use the index, giving a massive gain in performance at only a slight investment in coding effort:
SELECT date, AGE(now(), date) AS t1
FROM dates
WHERE date < now();
I say now() because you did, but perhaps you want CURRENT_DATE instead.
To create a suitable index, do:
create index dates_date on dates(date);

Related

Get the count of distinct userids for last couple of days

Let's say the last 7 days for this table:
Userid Download time
Rab01 2020-04-29 03:28
Klm01 2020-04-29 04:01
Klm01 2020-04-30 05:10
Rab01 2020-04-29 12:14
Osa_3 2020-04-25 09:01
Following is the required output:
Count Download_time
1 2020-04-25
2 2020-04-29
1 2020-04-30
Tested with PostgreSQL. You also tagged Redshift, which forked at Postgres 8.2, a long time ago. There may be discrepancies.
Since you seem to be happy with standard ISO format, a simple cast to date would be most efficient:
SELECT count(DISTINCT userid) AS "Count"
, download_time::date AS "Download_Day"
FROM tbl
WHERE download_time >= CURRENT_DATE - 7
AND download_time < CURRENT_DATE
GROUP BY 2;
db<>fiddle here
CURRENT_DATE is standard SQL and works for both Postgres and Redshift. Related:
How do I determine the last day of the previous month using PostgreSQL?
About the "last 7 days": I took the last 7 whole days (excluding today - necessarily incomplete), with syntax that can use a plain index on (download_time). Related:
Get dates of a day of week in a date range
Slow LEFT JOIN on CTE with time intervals
Interval (days) in PostgreSQL with two parameters
Ideally, you have a composite index on (download_time, userid) (and fulfill some preconditions) to get very fast index-only scans. See:
Is a composite index also good for queries on the first field?
count(DISTINCT ...) is typically slow. For big tables with many duplicates, there are faster techniques. Disclose your exact setup and cardinalities if you need to optimize performance.
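One such technique, as a minimal sketch (using the same hypothetical tbl and columns as above, not from the original answer): pre-deduplicate the (day, userid) pairs in a derived table and then count plain rows, which can be cheaper than count(DISTINCT ...) when there are many duplicates. Measure on your own data.
SELECT download_day AS "Download_Day", count(*) AS "Count"
FROM (
   SELECT DISTINCT download_time::date AS download_day, userid
   FROM   tbl
   WHERE  download_time >= CURRENT_DATE - 7
   AND    download_time <  CURRENT_DATE
   ) sub
GROUP  BY 1;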
If the actual data type is timestamptz, not just timestamp, you also need to define the time zone defining day boundaries. See:
Ignoring time zones altogether in Rails and PostgreSQL
About the optional short syntax GROUP BY 2:
Select first row in each GROUP BY group?
About capitalization of identifiers:
Are PostgreSQL column names case-sensitive?
You can use the date_trunc function to get the day part from the datetime and use it for grouping.
The query could look like this:
SELECT
count(distinct Userid) as Count, -- get unique users count
to_char(date_trunc('day', Download_time), 'YYYY-MM-DD') AS Download_Day -- convert timestamp to day
FROM table
WHERE DATE_PART('day', NOW() - Download_time) < 7 -- last 7 days
GROUP BY Download_Day; -- group by day
Fiddle

Efficient way of counting large content from a column or two in a database using a selected time period

I need to list the number of column1 entries that have been added to the database over a selected time period (counting back from the day the list is requested): daily, weekly (last 7 days), monthly (last 30 days) and quarterly (last 3 months). For example, below is the table I created to perform this task.
Column   | Type                        | Modifiers
---------+-----------------------------+------------------------------
column1  | character varying(256)      | not null default nextval
date     | timestamp without time zone | not null default now()
column2  | character varying(256)      | ...
Now, I need the total count of entries in column1 with respect to the selected time period.
Like,
Column1          | Date                       | Column2
-----------------+----------------------------+----------------
abcdef           | 2013-05-12 23:03:22.995562 | 122345rehr566
njhkepr          | 2013-04-10 21:03:22.337654 | 45hgjtron
ffb3a36dce315a7  | 2013-06-14 07:34:59.477735 | jkkionmlopp
abcdefgggg       | 2013-05-12 23:03:22.788888 | 22345rehr566
From the above data, for the daily time period the count should be 2.
I have tried doing this query
select count(column1) from table1 where date='2012-05-12 23:03:22';
and have got exactly one record matching the timestamp. But I really need to do it the proper way; I believe this is not an efficient way of retrieving the count. Anyone who could help me learn the right and efficient way of writing such a query would be great. I am new to the database world, and I am trying to be efficient in writing any query.
Thanks!
[EDIT]
Each query currently takes 175854 ms to process. What would be an efficient way to reduce the processing time? Any help would be really great. I am using PostgreSQL.
To be efficient, conditions should compare values of the same type as the columns being compared. In this case, the column being compared - Date - has type timestamp, so we need to use a range of timestamp values.
In keeping with this, you should use current_timestamp for the "now" value, and as confirmed by the documentation, subtracting an interval from a timestamp yields a timestamp, so...
For the last 1 day:
select count(*) from table1
where "Date" > current_timestamp - interval '1 day'
For the last 7 days:
select count(*) from table1
where "Date" > current_timestamp - interval '7 days'
For the last 30 days:
select count(*) from table1
where "Date" > current_timestamp - interval '30 days'
For the last 3 months:
select count(*) from table1
where "Date" > current_timestamp - interval '3 months'
Make sure you have an index on the Date column.
If you find that the index is not being used, try converting the condition to a between, eg:
where "Date" between current_timestamp - interval '3 months' and current_timestamp
Logically the same, but may help the optimizer to choose the index.
Note that column1 is irrelevant to the question; being unique there is no possibility of the row count being different from the number of different values of column1 found by any given criteria.
Also, the choice of "Date" for the column name is poor, because a) it is a reserved word, and b) it is not in fact a date.
If you want to count number of records between two dates:
select count(*)
from Table1
where "Date" >= '2013-05-12' and "Date" < '2013-05-13'
-- count for one day, upper bound not included
select count(*)
from Table1
where "Date" >= '2013-05-12' and "Date" < '2013-06-13'
-- count for one month, upper bound not included
select count(*)
from Table1
where
"Date" >= current_date and
"Date" < current_date + interval '1 day'
-- current date
What I understand from your wording is
select date_trunc('day', "date"), count(*)
from t
where "date" >= '2013-01-01'
group by 1
order by 1
Replace 'day' for 'week', 'month', 'quarter' as needed.
http://www.postgresql.org/docs/current/static/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC
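For instance, a weekly breakdown of the same (hypothetical) table t is a one-word change; a sketch:
select date_trunc('week', "date"), count(*)
from t
where "date" >= '2013-01-01'
group by 1
order by 1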
Create an index on the "date" column.
select count(distinct column1) from table1 where date > '2012-05-12 23:03:22';
I assume "number of column1" means "number of distinct values in column1.
Edit:
Regarding your second question (speed of the query): I would assume that an index on the date column should speed up the runtime. Depending on the data content, this could even be declared unique.
To throw another option into the mix...
Add a column of type "date" and index that -- named "datecol" for this example:
create index tbl_datecol_idx on tbl (datecol);
analyze tbl;
Then your query can use an equality operator:
select count(*) from tbl where datecol = current_date - 1; --yesterday
Or if you can't add the date datatype column, you could create a functional index on the existing column:
create index tbl_date_fbi on tbl ( ("date"::DATE) );
analyze tbl;
select count(*) from tbl where "date"::DATE = current_date - 1;
Note1: you do not need to query "column1" directly as every row has that attribute filled due to the NOT NULL.
Note2: Creating a column named "date" is poor form, and it is even worse that it is of type TIMESTAMP.

Using sql function generate_series() in redshift

I'd like to use the generate series function in redshift, but have not been successful.
The redshift documentation says it's not supported. The following code does work:
select *
from generate_series(1,10,1)
outputs:
1
2
3
...
10
I'd like to do the same with dates. I've tried a number of variations, including:
select *
from generate_series(date('2008-10-01'),date('2008-10-10 00:00:00'),1)
kicks out:
ERROR: function generate_series(date, date, integer) does not exist
Hint: No function matches the given name and argument types.
You may need to add explicit type casts. [SQL State=42883]
Also tried:
select *
from generate_series('2008-10-01 00:00:00'::timestamp,
'2008-10-10 00:00:00'::timestamp,'1 day')
And tried:
select *
from generate_series(cast('2008-10-01 00:00:00' as datetime),
cast('2008-10-10 00:00:00' as datetime),'1 day')
both kick out:
ERROR: function generate_series(timestamp without time zone, timestamp without time zone, "unknown") does not exist
Hint: No function matches the given name and argument types.
You may need to add explicit type casts. [SQL State=42883]
If not looks like I'll use this code from another post:
SELECT to_char(DATE '2008-01-01'
+ (interval '1 month' * generate_series(0,57)), 'YYYY-MM-DD') AS ym
PostgreSQL generate_series() with SQL function as arguments
Amazon Redshift seems to be based on PostgreSQL 8.0.2. The timestamp arguments to generate_series() were added in 8.4.
Something like this, which sidesteps that problem, might work in Redshift.
SELECT current_date + (n || ' days')::interval
from generate_series (1, 30) n
It works in PostgreSQL 8.3, which is the earliest version I can test. It's documented in 8.0.26.
Later . . .
It seems that generate_series() is unsupported in Redshift. But given that you've verified that select * from generate_series(1,10,1) does work, the syntax above at least gives you a fighting chance. (Although the interval data type is also documented as being unsupported on Redshift.)
Still later . . .
You could also create a table of integers.
create table integers (
n integer primary key
);
Populate it however you like. You might be able to use generate_series() locally, dump the table, and load it on Redshift. (I don't know; I don't use Redshift.)
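As a minimal sketch of populating it locally on plain Postgres (the recursive-CTE answers further down show a way to build such a table directly on Redshift):
insert into integers (n)
select g
from generate_series(1, 1000) g;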
Anyway, you can do simple date arithmetic with that table without referring directly to generate_series() or to interval data types.
select (current_date + n)
from integers
where n < 31;
That works in 8.3, at least.
Using Redshift today, you can generate a range of dates by using datetime functions and feeding in a number table.
select (getdate()::date - generate_series)::date from generate_series(1,30,1)
Generates this for me
date
2015-11-06
2015-11-05
2015-11-04
2015-11-03
2015-11-02
2015-11-01
2015-10-31
2015-10-30
2015-10-29
2015-10-28
2015-10-27
2015-10-26
2015-10-25
2015-10-24
2015-10-23
2015-10-22
2015-10-21
2015-10-20
2015-10-19
2015-10-18
2015-10-17
2015-10-16
2015-10-15
2015-10-14
2015-10-13
2015-10-12
2015-10-11
2015-10-10
2015-10-09
2015-10-08
The generate_series() function is not fully supported by Redshift. See the Unsupported PostgreSQL functions section of the developer guide.
UPDATE
generate_series is working with Redshift now.
SELECT CURRENT_DATE::TIMESTAMP - (i * interval '1 day') as date_datetime
FROM generate_series(1,31) i
ORDER BY 1
This will generate the dates for the last 31 days.
Ref: generate_series function in Amazon Redshift
As of writing this, generate_series() on our instance of Redshift (1.0.33426) could not be used to, for example, create a table:
# select generate_series(1,100,1);
1
2
...
# create table normal_series as select generate_series(1,100,1);
INFO: Function "generate_series(integer, integer, integer) not supported.
ERROR: Specified types or functions (one per INFO message) not supported on Redshift tables.
However, with recursive works:
# create table recursive_series as with recursive t(n) as (select 1::integer union all select n+1 from t where n < 100) select n from t;
SELECT
-- modify as desired, here is a date series:
# select getdate()::date + n from recursive_series;
2021-12-18
2021-12-19
...
I needed to do something similar, but with 5 minutes intervals over 7 days. So here's a CTE based hack (ugly but not too verbose)
INSERT INTO five_min_periods
WITH
periods AS (select 0 as num UNION select 1 as num UNION select 2 UNION select 3 UNION select 4 UNION select 5 UNION select 6 UNION select 7 UNION select 8 UNION select 9 UNION select 10 UNION select 11),
hours AS (select num from periods UNION ALL select num + 12 from periods),
days AS (select num from periods where num <= 6),
rightnow AS (select CAST( TO_CHAR(GETDATE(), 'yyyy-mm-dd hh24') || ':' || trim(TO_CHAR((ROUND((DATEPART (MINUTE, GETDATE()) / 5), 1) * 5 ),'09')) AS TIMESTAMP) as start)
select
ROW_NUMBER() OVER(ORDER BY d.num DESC, h.num DESC, p.num DESC) as idx
, DATEADD(minutes, -p.num * 5, DATEADD( hours, -h.num, DATEADD( days, -d.num, n.start ) ) ) AS period_date
from days d, hours h, periods p, rightnow n
Should be able to extend this to other generation schemes. The trick here is using the Cartesian product join (i.e. no JOIN/WHERE clause) to multiply the hand-crafted CTE's to produce the necessary increments and apply to an anchor date.
Redshift's generate_series() function is a leader-node-only function and as such you cannot use it for downstream processing on the compute nodes. It can be replaced by a recursive CTE (or by keeping a "dates" table in your database). I have an example of this in a recent answer:
Cross join Redshift with sequence of dates
One caution I like to give in answers like this is to be careful with inequality joins (or cross joins or any under-qualified joins) when working with VERY LARGE tables which can happen often in Redshift. If you are joining with a moderate Redshift table of say 1M rows then things will be fine. But if you are doing this on a table of 1B rows then the data explosion will likely cause massive performance issues as the query spills to disk.
I've written a couple of white papers on how to write this type of query in a data space sensitive way. This issue of massive intermediate results is not unique to Redshift and I first developed my approach solving a client's HIVE query issue. "First rule of writing SQL for Big Data - don't make more"
Per the comments of @Ryan Tuck and @Slobodan Pejic, generate_series() does not work on Redshift when joining to another table.
The workaround I used was to write out every value in the series in the query:
SELECT
'2019-01-01'::date AS date_month
UNION ALL
SELECT
'2019-02-01'::date AS date_month
Using a Python function like this:
import arrow

def generate_date_series(start, end):
    start = arrow.get(start)
    end = arrow.get(end)
    months = list(
        f"SELECT '{month.format('YYYY-MM-DD')}'::date AS date_month"
        for month in arrow.Arrow.range('month', start, end)
    )
    return "\nUNION ALL\n".join(months)
perhaps not as elegant as other solutions, but here's how I did it:
drop table if exists #dates;
create temporary table #dates as
with recursive cte(val_date) as
(select
cast('2020-07-01' as date) as val_date
union all
select
cast(dateadd(day, 1, val_date) as date) as val_date
from
cte
where
val_date <= getdate()
)
select
val_date as yyyymmdd
from
cte
order by
val_date
;
For five-minute buckets I would do the following:
select date_trunc('minute', getdate()) - (i || ' minutes')::interval
from generate_series(0, 60*5-1, 5) as i
You could replace 5 by any given interval, and 60 by the number of rows you want.
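For example, 15-minute buckets covering the last 24 hours would be (a sketch along the same lines):
select date_trunc('minute', getdate()) - (i || ' minutes')::interval
from generate_series(0, 24*60-1, 15) as i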
SELECT CURRENT_DATE::TIMESTAMP - (i * interval '1 day') as date_datetime
FROM generate_series(1,(select datediff(day,'01-Jan-2021',now()::date))) i
ORDER BY 1

How do you do date math that ignores the year?

I am trying to select dates that have an anniversary in the next 14 days. How can I select based on dates excluding the year? I have tried something like the following.
SELECT * FROM events
WHERE EXTRACT(month FROM "date") = 3
AND EXTRACT(day FROM "date") < EXTRACT(day FROM "date") + 14
The problem with this is that months wrap.
I would prefer to do something like this, but I don't know how to ignore the year.
SELECT * FROM events
WHERE (date > '2013-03-01' AND date < '2013-04-01')
How can I accomplish this kind of date math in Postgres?
TL/DR: use the "Black magic version" below.
All queries presented in other answers so far operate with conditions that are not sargable: they cannot use an index and have to compute an expression for every single row in the base table to find matching rows. Doesn't matter much with small tables. Matters a lot with big tables.
Given the following simple table:
CREATE TABLE event (
event_id serial PRIMARY KEY
, event_date date
);
Query
Version 1. and 2. below can use a simple index of the form:
CREATE INDEX event_event_date_idx ON event(event_date);
But all of the following solutions are even faster without index.
1. Simple version
SELECT *
FROM (
SELECT ((current_date + d) - interval '1 year' * y)::date AS event_date
FROM generate_series( 0, 14) d
CROSS JOIN generate_series(13, 113) y
) x
JOIN event USING (event_date);
Subquery x computes all possible dates over a given range of years from a CROSS JOIN of two generate_series() calls. The selection is done with the final simple join.
2. Advanced version
WITH val AS (
SELECT extract(year FROM age(current_date + 14, min(event_date)))::int AS max_y
, extract(year FROM age(current_date, max(event_date)))::int AS min_y
FROM event
)
SELECT e.*
FROM (
SELECT ((current_date + d.d) - interval '1 year' * y.y)::date AS event_date
FROM generate_series(0, 14) d
,(SELECT generate_series(min_y, max_y) AS y FROM val) y
) x
JOIN event e USING (event_date);
Range of years is deduced from the table automatically - thereby minimizing generated years.
You could go one step further and distill a list of existing years if there are gaps.
Effectiveness co-depends on the distribution of dates. It's better for few years with many rows each.
Simple db<>fiddle to play with here
Old sqlfiddle
3. Black magic version
Create a simple SQL function to calculate an integer from the pattern 'MMDD':
CREATE FUNCTION f_mmdd(date) RETURNS int LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
'SELECT (EXTRACT(month FROM $1) * 100 + EXTRACT(day FROM $1))::int';
I had to_char(time, 'MMDD') at first, but switched to the above expression which proved fastest in new tests on Postgres 9.6 and 10:
db<>fiddle here
It allows function inlining because EXTRACT(xyz FROM date) is implemented with the IMMUTABLE function date_part(text, date) internally. And it has to be IMMUTABLE to allow its use in the following essential multicolumn expression index:
CREATE INDEX event_mmdd_event_date_idx ON event(f_mmdd(event_date), event_date);
Multicolumn for a number of reasons:
Can help with ORDER BY or with selecting from given years. Read here. At almost no additional cost for the index. A date fits into the 4 bytes that would otherwise be lost to padding due to data alignment. Read here.
Also, since both index columns reference the same table column, no drawback with regard to H.O.T. updates. Read here.
Basic query:
SELECT *
FROM event e
WHERE f_mmdd(e.event_date) BETWEEN f_mmdd(current_date)
AND f_mmdd(current_date + 14);
One PL/pgSQL table function to rule them all
Fork to one of two queries to cover the turn of the year:
CREATE OR REPLACE FUNCTION f_anniversary(_the_date date = current_date, _days int = 14)
RETURNS SETOF event
LANGUAGE plpgsql AS
$func$
DECLARE
d int := f_mmdd($1);
d1 int := f_mmdd($1 + $2 - 1); -- fix off-by-1 from upper bound
BEGIN
IF d1 > d THEN
RETURN QUERY
SELECT *
FROM event e
WHERE f_mmdd(e.event_date) BETWEEN d AND d1
ORDER BY f_mmdd(e.event_date), e.event_date;
ELSE -- wrap around end of year
RETURN QUERY
SELECT *
FROM event e
WHERE f_mmdd(e.event_date) >= d OR
f_mmdd(e.event_date) <= d1
ORDER BY (f_mmdd(e.event_date) >= d) DESC, f_mmdd(e.event_date), event_date;
-- chronological across turn of the year
END IF;
END
$func$;
Call using defaults: 14 days beginning "today":
SELECT * FROM f_anniversary();
Call for 7 days beginning '2014-08-23':
SELECT * FROM f_anniversary(date '2014-08-23', 7);
db<>fiddle here - comparing EXPLAIN ANALYZE
"February 29"
When dealing with anniversaries or "birthdays", you need to define how to deal with the special case "February 29" in leap years.
When testing for ranges of dates, Feb 29 is usually included automatically, even if the current year is not a leap year. The range of days is extended by 1 retroactively when it covers this day.
On the other hand, if the current year is a leap year, and you want to look for 15 days, you may end up getting results for 14 days in leap years if your data is from non-leap years.
Say, Bob is born on the 29th of February:
My query 1. and 2. include February 29 only in leap years. Bob has birthday only every ~ 4 years.
My query 3. includes February 29 in the range. Bob has birthday every year.
There is no magical solution. You have to define what you want for every case.
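For instance, one possible convention, shown here only as a sketch (f_mmdd_feb28 is a hypothetical helper, not part of the benchmarked solutions): fold February 29 into February 28, so Bob gets a match every year:
CREATE FUNCTION f_mmdd_feb28(date) RETURNS int LANGUAGE sql IMMUTABLE AS
'SELECT CASE
          WHEN EXTRACT(month FROM $1) = 2 AND EXTRACT(day FROM $1) = 29 THEN 228
          ELSE (EXTRACT(month FROM $1) * 100 + EXTRACT(day FROM $1))::int
        END';
It would then replace f_mmdd() in the expression index and in the queries above.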
Test
To substantiate my point I ran an extensive test with all the presented solutions. I adapted each of the queries to the given table and to yield identical results without ORDER BY.
The good news: all of them are correct and yield the same result - except for Gordon's query, which had syntax errors, and @wildplasser's query, which fails when the year wraps around (easy to fix).
Insert 108000 rows with random dates from the 20th century, which is similar to a table of living people (13 or older).
INSERT INTO event (event_date)
SELECT '2000-1-1'::date - (random() * 36525)::int
FROM generate_series (1, 108000);
Delete ~ 8 % to create some dead tuples and make the table more "real life".
DELETE FROM event WHERE random() < 0.08;
ANALYZE event;
My test case had 99289 rows, 4012 hits.
C - Catcall
WITH anniversaries as (
SELECT event_id, event_date
,(event_date + (n || ' years')::interval)::date anniversary
FROM event, generate_series(13, 113) n
)
SELECT event_id, event_date -- count(*) --
FROM anniversaries
WHERE anniversary BETWEEN current_date AND current_date + interval '14' day;
C1 - Catcall's idea rewritten
Aside from minor optimizations, the major difference is to add only the exact amount of years date_trunc('year', age(current_date + 14, event_date)) to get this year's anniversary, which avoids the need for a CTE altogether:
SELECT event_id, event_date
FROM event
WHERE (event_date + date_trunc('year', age(current_date + 14, event_date)))::date
BETWEEN current_date AND current_date + 14;
D - Daniel
SELECT * -- count(*) --
FROM event
WHERE extract(month FROM age(current_date + 14, event_date)) = 0
AND extract(day FROM age(current_date + 14, event_date)) <= 14;
E1 - Erwin 1
See "1. Simple version" above.
E2 - Erwin 2
See "2. Advanced version" above.
E3 - Erwin 3
See "3. Black magic version" above.
G - Gordon
SELECT * -- count(*)
FROM (SELECT *, to_char(event_date, 'MM-DD') AS mmdd FROM event) e
WHERE to_date(to_char(now(), 'YYYY') || '-'
|| (CASE WHEN mmdd = '02-29' THEN '02-28' ELSE mmdd END)
,'YYYY-MM-DD') BETWEEN date(now()) and date(now()) + 14;
H - a_horse_with_no_name
WITH upcoming as (
SELECT event_id, event_date
,CASE
WHEN date_trunc('year', age(event_date)) = age(event_date)
THEN current_date
ELSE cast(event_date + ((extract(year FROM age(event_date)) + 1)
* interval '1' year) AS date)
END AS next_event
FROM event
)
SELECT event_id, event_date
FROM upcoming
WHERE next_event - current_date <= 14;
W - wildplasser
CREATE OR REPLACE FUNCTION this_years_birthday(_dut date)
RETURNS date
LANGUAGE plpgsql AS
$func$
DECLARE
ret date;
BEGIN
ret := date_trunc('year' , current_timestamp)
+ (date_trunc('day' , _dut)
- date_trunc('year' , _dut));
RETURN ret;
END
$func$;
Simplified to return the same as all the others:
SELECT *
FROM event e
WHERE this_years_birthday( e.event_date::date )
BETWEEN current_date
AND current_date + '2weeks'::interval;
W1 - wildplasser's query rewritten
The above suffers from a number of inefficient details (beyond the scope of this already sizable post). The rewritten version is much faster:
CREATE OR REPLACE FUNCTION this_years_birthday(_dut INOUT date)
LANGUAGE sql AS
$func$
SELECT (date_trunc('year', now()) + ($1 - date_trunc('year', $1)))::date
$func$;
SELECT *
FROM event e
WHERE this_years_birthday(e.event_date) BETWEEN current_date
AND (current_date + 14);
Test results
I ran this test with a temporary table on PostgreSQL 9.1.7.
Results were gathered with EXPLAIN ANALYZE, best of 5.
Results
Without index
C: Total runtime: 76714.723 ms
C1: Total runtime: 307.987 ms -- !
D: Total runtime: 325.549 ms
E1: Total runtime: 253.671 ms -- !
E2: Total runtime: 484.698 ms -- min() & max() expensive without index
E3: Total runtime: 213.805 ms -- !
G: Total runtime: 984.788 ms
H: Total runtime: 977.297 ms
W: Total runtime: 2668.092 ms
W1: Total runtime: 596.849 ms -- !
With index
E1: Total runtime: 37.939 ms --!!
E2: Total runtime: 38.097 ms --!!
With index on expression
E3: Total runtime: 11.837 ms --!!
All other queries perform the same with or without index because they use non-sargable expressions.
Conclusion
So far, @Daniel's query was the fastest.
@wildplasser's (rewritten) approach performs acceptably, too.
@Catcall's version is something like the reverse approach of mine. Performance gets out of hand quickly with bigger tables.
The rewritten version performs pretty well, though. The expression I use is something like a simpler version of @wildplasser's this_years_birthday() function.
My "simple version" is faster even without index, because it needs fewer computations.
With index, the "advanced version" is about as fast as the "simple version", because min() and max() become very cheap with an index. Both are substantially faster than the rest which cannot use the index.
My "black magic version" is fastest with or without index. And it is very simple to call.
The updated version (after the benchmark) is a bit faster, yet.
With a real life table an index will make even greater difference. More columns make the table bigger, and sequential scan more expensive, while the index size stays the same.
I believe the following test works in all cases, assuming a column named anniv_date:
select * from events
where extract(month from age(current_date+interval '14 days', anniv_date))=0
and extract(day from age(current_date+interval '14 days', anniv_date)) <= 14
As an example of how it works when crossing a year (and also a month), let's say an anniversary date is 2009-01-04 and the date at which the test is run is 2012-12-29.
We want to consider any date between 2012-12-29 and 2013-01-12 (14 days)
age('2013-01-12'::date, '2009-01-04'::date) is 4 years 8 days.
extract(month...) from this is 0 and extract(days...) is 8, which is lower than 14 so it matches.
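To verify that step (a quick sketch):
select extract(month from age(date '2013-01-12', date '2009-01-04')) as months_part
     , extract(day   from age(date '2013-01-12', date '2009-01-04')) as days_part;
-- months_part | days_part
--           0 |         8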
How about this?
select *
from events e
where to_char(e."date", 'MM-DD') between to_char(now(), 'MM-DD') and
to_char(date(now())+14, 'MM-DD')
You can do the comparison as strings.
To take year ends into account, we'll convert back to dates:
select *
from events e
where to_date(to_char(now(), 'YYYY')||'-'||to_char(e."date", 'MM-DD'), 'YYYY-MM-DD')
between date(now()) and date(now())+14
You do need to make a slight adjustment for Feb 29. I might suggest:
select *
from (select e.*,
to_char(e."date", 'MM-DD') as MMDD
from events
) e
where to_date(to_char(now(), 'YYYY')||'-'||(case when MMDD = '02-29' then '02-28' else MMDD), 'YYYY-MM-DD')
between date(now()) and date(now())+14
For convenience, I created two functions that yield the (expected or past) birthday in the current year, and the upcoming birthday.
CREATE OR REPLACE FUNCTION this_years_birthday( _dut DATE) RETURNS DATE AS
$func$
DECLARE
ret DATE;
BEGIN
ret =
date_trunc( 'year' , current_timestamp)
+ (date_trunc( 'day' , _dut)
- date_trunc( 'year' , _dut)
)
;
RETURN ret;
END;
$func$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION next_birthday( _dut DATE) RETURNS DATE AS
$func$
DECLARE
ret DATE;
BEGIN
ret =
date_trunc( 'year' , current_timestamp)
+ (date_trunc( 'day' , _dut)
- date_trunc( 'year' , _dut)
)
;
IF (ret < date_trunc( 'day' , current_timestamp))
THEN ret = ret + '1year'::interval; END IF;
RETURN ret;
END;
$func$ LANGUAGE plpgsql;
--
-- call the function
--
SELECT date_trunc( 'day' , t.topic_date) AS the_date
, this_years_birthday( t.topic_date::date ) AS the_day
, next_birthday( t.topic_date::date ) AS next_day
FROM topic t
WHERE this_years_birthday( t.topic_date::date )
BETWEEN current_date
AND current_date + '2weeks':: interval
;
NOTE: the casts are needed because I only had timestamps available.
This should handle wrap-arounds at the end of the year as well:
with upcoming as (
select name,
event_date,
case
when date_trunc('year', age(event_date)) = age(event_date) then current_date
else cast(event_date + ((extract(year from age(event_date)) + 1) * interval '1' year) as date)
end as next_event
from events
)
select name,
next_event,
next_event - current_date as days_until_next
from upcoming
order by next_event - current_date
You can then filter on the expression next_event - current_date to apply the "next 14 days".
The case ... is only necessary if you consider events that would be "today" as "upcoming" as well. Otherwise, that can be reduced to the else part of the case statement.
Note that I "renamed" the column "date" to event_date. Mainly because reserved words shouldn't be used as an identifier but also because date is a terrible column name. It doesn't tell you anything about what it stores.
You can generate a virtual table of anniversaries, and select from it.
with anniversaries as (
select event_date,
(event_date + (n || ' years')::interval)::date anniversary
from events, generate_series(1,10) n
)
select event_date, anniversary
from anniversaries
where anniversary between current_date and current_date + interval '14' day
order by event_date, anniversary
The call to generate_series(1,10) has the effect of generating 10 years of anniversaries for each event_date. I wouldn't use the literal value 10 in production. Instead, I'd either calculate the right number of years to use in a subquery, or I'd use a large literal like 100.
You'll want to adjust the WHERE clause to fit your application.
If you have a performance problem with the virtual table (when you have a lot of rows in "events"), replace the common table expression with a base table having the identical structure. Storing anniversaries in a base table makes their values obvious (especially for, say, Feb 29 anniversaries), and queries on such a table can use an index. Querying an anniversary table of half a million rows using just the SELECT statement above takes 25ms on my desktop.
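A minimal sketch of such a base table (all names here are assumptions, not from the answer):
create table anniversary (
  event_date  date not null,
  anniversary date not null,
  primary key (event_date, anniversary)
);
create index anniversary_idx on anniversary (anniversary);

insert into anniversary (event_date, anniversary)
select event_date, (event_date + (n || ' years')::interval)::date
from events, generate_series(1, 100) n;
The SELECT above then runs against this table instead of the CTE, with the index on anniversary supporting the BETWEEN range.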
I found a way to do it.
SELECT EXTRACT(DAYS FROM age('1999-04-10', '2003-05-12')),
EXTRACT(MONTHS FROM age('1999-04-10', '2003-05-12'));
date_part | date_part
-----------+-----------
-2 | -1
I can then just check that the month is 0 and the days are less than 14.
If you have a more elegant solution, please do post it. I'll leave the question open for a bit.
I don't work with PostgreSQL so I googled its date functions and found this: http://www.postgresql.org/docs/current/static/functions-datetime.html
If I read it correctly, looking for events in the next 14 days is as simple as:
where mydatefield >= current_date
and mydatefield < current_date + integer '14'
Of course I might not be reading it correctly.

Generate Dates starting from a date returned by a condition - SQL

A series of dates with a specified interval can be generated using a variable and a static date, as per the linked question that I asked earlier. However, when there's a WHERE clause to produce a start date, the date generation seems to stop and only shows the first interval date. I also checked other posts; those that I found (e.g. 1, e.g. 2, e.g. 3) use a static date or a CTE. I am looking for a solution without stored procedures/functions.
This works:
SELECT DATE(DATE_ADD('2012-01-12',
INTERVAL @i:=@i+30 DAY) ) AS dateO
FROM members, (SELECT @i:=0) r
where @i < DATEDIFF(now(), date '2012-01-12')
;
These don't:
SELECT DATE_ADD(date '2012-01-12',
INTERVAL @j:=@j+30 DAY) AS dateO, @j
FROM `members`, (SELECT @j:=0) s
where @j <= DATEDIFF(now(), date '2012-01-12')
and mmid = 100
;
SELECT DATE_ADD(stdate,
INTERVAL @k:=@k+30 DAY) AS dateO, @k
FROM `members`, (SELECT @k:=0) t
where @k <= DATEDIFF(now(), stdate)
and mmid = 100
;
SQLFIDDLE REFERENCE
Expected Results:
Be the same as the first query results given it starts generating dates with stDate of mmid=100.
Preferably in ANSI SQL so it can be supported in MySQL and SQL Server/MS Access SQL, since Oracle has trunc and rownum (per this query with 14 votes) and Postgres has the generate_series function. I would like to know if this is a bug or a limitation in MySQL?
PS: I have asked a similar question before. It was based on static date values, whereas this one is based on a date value taken from a table column by a condition.
The simplest way to ensure cross-platform compatibility is to use a calendar table. In its simplest form:
create table calendar (
cal_date date primary key
);
insert into calendar values
('2013-01-01'),
('2013-01-02'); -- etc.
There are many ways to generate dates for insertion.
Instead of using a WHERE clause to generate rows, you use a WHERE clause to select rows. To select October of this year, just
select cal_date
from calendar
where cal_date between '2013-10-01' and '2013-10-31';
It's reasonably compact--365,000 rows to cover a period of 1000 years. That ought to cover most business scenarios.
If you need cross-platform date arithmetic, you can add a tally column.
drop table calendar;
create table calendar (
cal_date date primary key,
tally integer not null unique check (tally > 0)
);
insert into calendar values ('2012-01-01', 1); -- etc.
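One way to fill it on Postgres, as a sketch (other platforms have their own row-generation idioms):
insert into calendar (cal_date, tally)
select d::date, row_number() over (order by d)
from generate_series(date '2012-01-01', date '2031-12-31', interval '1 day') d;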
To select all the dates of 30-day intervals, starting on 2012-01-12 and ending at the end of the calendar year, use
select cal_date
from calendar
where ((tally - (select tally
from calendar
where cal_date = '2012-01-12')) % 30 ) = 0;
cal_date
--
2012-01-12
2012-02-11
2012-03-12
2012-04-11
2012-05-11
2012-06-10
2012-07-10
2012-08-09
2012-09-08
2012-10-08
2012-11-07
2012-12-07
If your "mmid" column is guaranteed to have no gaps--an unspoken requirement for a calendar table--you can use the "mmid" column in place of my "tally" column.