Filter by date range (same month and day) across years - sql

I have a PostgreSQL database with a table holding dates.
Now I need to find all rows within the date range 15/02 until 21/06 (day/month) across all years.
Example result:
1840-02-28
1990-06-21
1991-02-15
1991-04-25
1992-05-30
1995-03-04
1995-04-10
2001-02-03
2010-04-06

Assuming (with a leap of faith) that you want dates between certain days of the year regardless of the year (like if you're sending out a batch of birthday cards or something), you can set up a test with this:
CREATE TABLE d (dt date);
COPY d FROM STDIN;
1840-02-28
1990-06-21
1991-02-15
1991-04-25
1992-05-30
1995-03-04
1995-04-10
2001-02-03
2010-04-06
\.
And you can use "row value constructors" to easily select the desired range:
SELECT * FROM d
WHERE (EXTRACT(MONTH FROM dt), EXTRACT(DAY FROM dt))
BETWEEN (2, 15) AND (6, 21);
Which yields:
dt
------------
1840-02-28
1990-06-21
1991-02-15
1991-04-25
1992-05-30
1995-03-04
1995-04-10
2010-04-06
(8 rows)
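The row-value comparison orders the pairs lexicographically: month first, then day. As a rough illustration outside the database, Python tuples compare the same way. This is only a sketch; the helper name `in_month_day_range` is made up here and is not part of the answer above.

```python
from datetime import date

# Sample rows from the test table above.
ROWS = [
    date(1840, 2, 28), date(1990, 6, 21), date(1991, 2, 15),
    date(1991, 4, 25), date(1992, 5, 30), date(1995, 3, 4),
    date(1995, 4, 10), date(2001, 2, 3), date(2010, 4, 6),
]

def in_month_day_range(d, start=(2, 15), end=(6, 21)):
    """True if (month, day) of d lies in [start, end], ignoring the year.

    Hypothetical helper; it assumes start <= end, i.e. the range does not
    wrap around the end of the year.
    """
    return start <= (d.month, d.day) <= end

# 2001-02-03 falls before 15 February and is filtered out.
matches = [d for d in ROWS if in_month_day_range(d)]
```

Both bounds are inclusive, as with BETWEEN, so 1991-02-15 and 1990-06-21 are kept.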

Use a WHERE clause with the BETWEEN operator. See:
http://www.postgresql.org/docs/current/static/functions-comparison.html#FUNCTIONS-COMPARISON
and:
http://www.postgresql.org/docs/current/static/sql-select.html
http://www.postgresql.org/docs/current/static/tutorial.html
If that doesn't help, please expand your question with:
The structure of the table(s) you're working with, either from psql's \d tablename command or the original CREATE TABLE statements;
Some sample contents
The query you're having problems with
Expected results

You can use following syntax.
SELECT * FROM tableName WHERE dateColumnName BETWEEN '2012.01.01' AND '2012.08.14';
Just replace the following:
tableName - name of the table you are going to access
dateColumnName - name of the column which contains dates
2012.01.01 - start date
2012.08.14 - end date
Enter both dates in the same format as in the example above, and enclose them in single quotes.
If you replace * with a column name, only the values of that column are returned.
Hope that helps..

I am pretty sure #kgrittn's interpretation of the question is accurate, and I love his elegant use of row constructors. Even more so after testing a couple of alternatives, none of which could match the performance:
Tested with a real life table of 65426 rows; 32107 qualified. PostgreSQL 9.1.4, best of five with EXPLAIN ANALYZE:
SELECT * FROM tbl
WHERE to_char(data, 'MMDD') BETWEEN '0215' AND '0621';
Total runtime: 251.188 ms
SELECT * FROM tbl
WHERE to_char(data, 'MMDD')::int BETWEEN 215 AND 621;
Total runtime: 250.965 ms
SELECT * FROM tbl
WHERE to_char(data, 'MMDD') COLLATE "C" BETWEEN '0215' AND '0621';
Total runtime: 221.732 ms
String comparison is faster with the "non-locale" C - more in the manual about collation support.
SELECT * FROM tbl
WHERE EXTRACT(MONTH FROM data)*100 + EXTRACT(DAY FROM data)
BETWEEN 215 AND 621;
Total runtime: 209.965 ms
SELECT * FROM tbl
WHERE EXTRACT(MONTH FROM data) BETWEEN 3 AND 5
OR EXTRACT(MONTH FROM data) = 2 AND EXTRACT(DAY FROM data) >= 15
OR EXTRACT(MONTH FROM data) = 6 AND EXTRACT(DAY FROM data) <= 21;
Total runtime: 160.169 ms
SELECT * FROM tbl
WHERE EXTRACT(MONTH FROM data) BETWEEN 2 AND 6
AND CASE EXTRACT(MONTH FROM data)
WHEN 2 THEN EXTRACT(DAY FROM data) >= 15
WHEN 6 THEN EXTRACT(DAY FROM data) <=21
ELSE TRUE END;
Total runtime: 147.390 ms
SELECT * FROM tbl
WHERE CASE EXTRACT(MONTH FROM data)
WHEN 3 THEN TRUE
WHEN 4 THEN TRUE
WHEN 5 THEN TRUE
WHEN 2 THEN EXTRACT(DAY FROM data) >= 15
WHEN 6 THEN EXTRACT(DAY FROM data) <= 21
ELSE FALSE END;
Total runtime: 131.907 ms
#Kevin's solution with row constructors:
SELECT * FROM tbl
WHERE (EXTRACT(MONTH FROM data), EXTRACT(DAY FROM data))
BETWEEN (2, 15) AND (6, 21);
Total runtime: 125.460 ms
Chapeau.
Faster with functional index
The only way to beat that is with indexes. None of the queries above can use a plain index on data. However, if read performance is crucial (and for a small cost on write performance) you can resort to a functional index:
CREATE INDEX ON tbl(EXTRACT(MONTH FROM data), EXTRACT(DAY FROM data));
SELECT * FROM tbl
WHERE (EXTRACT(MONTH FROM data), EXTRACT(DAY FROM data))
BETWEEN (2, 15) AND (6, 21);
Total runtime: 85.895 ms
And that's where I can finally beat Kevin's query by a hair: with a single column index instead of the multi-column index needed in his case.
CREATE INDEX ON tbl(
CAST(EXTRACT(MONTH FROM data) * 100 + EXTRACT(DAY FROM data) AS int));
SELECT * FROM tbl
WHERE (EXTRACT(MONTH FROM data) * 100 + EXTRACT(DAY FROM data))::int
BETWEEN 215 AND 621;
Total runtime: 84.215 ms
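The single-column variant works because month * 100 + day preserves the order of (month, day) pairs. A small sketch in Python (not from the answer; the helper name `mmdd` is an assumption) checks this for every day of a leap year:

```python
from datetime import date, timedelta

def mmdd(d):
    """Integer key month * 100 + day, e.g. 15 February -> 215 (hypothetical helper)."""
    return d.month * 100 + d.day

# Every calendar day of a leap year, so 29 February is included.
days = [date(2000, 1, 1) + timedelta(days=n) for n in range(366)]

# Sorting by the integer key gives the same order as sorting by (month, day).
same_order = sorted(days, key=mmdd) == sorted(days, key=lambda d: (d.month, d.day))

# The integer range 215..621 selects the same days as the row-value range.
by_int = [d for d in days if 215 <= mmdd(d) <= 621]
by_pair = [d for d in days if (2, 15) <= (d.month, d.day) <= (6, 21)]
```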

You can use a simple condition with >= and <=, or BETWEEN ... AND, but the trick is to know your exact data type.
Sometimes date fields contain a time part, and that is where the query can go wrong, so it is recommended to use date functions to strip the time. In SQL Server, a common function for that is DATEDIFF.

Related

How to create a view in PostgreSQL with where clause

I am trying to create a view in Postgres
I have 3-4 timestamps for each closing_date, and I need to select only the latest timestamp of each day.
I also have to restrict closing_date to the last 30 days (shown in the SQL query below).
Below is the query from SQL data which I had created
CREATE VIEW dbo.CashBreaks_30Days_View as
SELECT Closing_date,Bo,Desk,Breaks_Staus,Owner,status,Team,
SLA,Age_Bucket_EntryDate,Age_Bucket_ValueDate,Age_EntryDate,Age_ValueDate,
[Type_(2)]
FROM Master_Data_CashBreaks
WHERE Closing_date >= cast(getdate()-37 as date);
If I understood you correctly, something like this might return result you want:
create or replace view cash_breaks_30days_view
as
select a.list_of_columns
from master_data_cashbreaks a
where a.closing_date >= trunc(sysdate) - 30 --> the last 30 days
and a.closing_date = (select max(b.closing_date) --> subquery is used to return
from master_data_cashbreaks b -- the last timestamp per date
where b.id = a.id
and trunc(b.closing_date) = trunc(a.closing_date)
)
There are a few mistakes in your code:
[Type_(2)] - not valid SQL.
getdate() - there is no such function in Oracle. You should use TRUNC(SYSDATE) instead.
getdate()-37 - why 37, when you want the last 30 days of data? It should be 30.
Your query should look like this in oracle:
CREATE VIEW dbo.CashBreaks_30Days_View as
SELECT * FROM
(SELECT Closing_date,Bo,Desk,Breaks_Staus,Owner,status,Team,
SLA,Age_Bucket_EntryDate,Age_Bucket_ValueDate,Age_EntryDate,Age_ValueDate,
ROW_NUMBER() OVER (PARTITION BY TRUNC(Closing_date) ORDER BY Closing_date DESC) AS RN -- one partition per calendar day, latest timestamp first
FROM Master_Data_CashBreaks
WHERE Closing_date >= TRUNC(SYSDATE) - 30)
WHERE RN = 1;
Your SQL contains a lot of errors:
Square brackets are invalid in SQL identifiers; if you have such a column you need to use double quotes. It's unclear to me whether your column is named "[Type_(2)]" or maybe just "Type_(2)".
There is no getdate() in standard SQL or in Postgres. Use current_date instead.
So, fixing all those errors, your statement should look like this:
CREATE VIEW dbo.CashBreaks_30Days_View
as
SELECT Closing_date, Bo, Desk, Breaks_Staus, Owner, status,
Team, SLA, Age_Bucket_EntryDate,
Age_Bucket_ValueDate, Age_EntryDate, Age_ValueDate,
"[Type_(2)]" -- or maybe only "Type_(2)"
FROM Master_Data_CashBreaks
WHERE Closing_date >= current_date - 30;
I was able to create it with the following:
create or replace view cashbreaks_30days_view_latesttime
as
select a. Column details
from master_data a
where a. Closing_date >= NOW() - interval '40 days'
and a.closing_date = (select max(b.closing_date)
from master_data b
where date(b.closing_date) = date(a.closing_date));
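The logic of this view has two parts: keep only rows inside the time window, then keep the latest timestamp of each calendar day. A minimal sketch of that logic in Python, on made-up sample data and with a hypothetical helper name `latest_per_day`:

```python
from datetime import datetime, timedelta

def latest_per_day(timestamps, now, days=30):
    """Keep the latest timestamp of each calendar day within the last `days` days.

    Hypothetical helper mirroring the view's WHERE clause and MAX() subquery.
    """
    cutoff = now - timedelta(days=days)
    latest = {}
    for ts in timestamps:
        if ts < cutoff:
            continue  # older than the window
        day = ts.date()
        if day not in latest or ts > latest[day]:
            latest[day] = ts
    return sorted(latest.values())

# Made-up sample data: two timestamps on 1 May, one on 2 May, one too old.
sample = [
    datetime(2020, 5, 1, 9, 0), datetime(2020, 5, 1, 17, 30),
    datetime(2020, 5, 2, 8, 15), datetime(2020, 3, 1, 12, 0),
]
result = latest_per_day(sample, now=datetime(2020, 5, 3, 0, 0))
```

The `days` parameter plays the role of the interval in the WHERE clause, so the window length is easy to change.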

Teradata Current year and year-1

How can I get the years dynamically in the WHERE condition of a query? I need to fetch data for 2017, 2018 and 2019, and currently I am hard-coding them (where FSC_YR in (2017,2018,2019)). Instead I need a dynamic way. How can I do this in Teradata?
I tried extract(year from current_date)-2,extract(year from current_date)-1,extract(year from current_date)-3). I am getting the error too many expression.
Since you're looking for a range of year numbers, why not just use a BETWEEN?
SELECT *
FROM data
WHERE fsc_yr BETWEEN EXTRACT(year FROM current_date - interval '2' year) AND EXTRACT(year FROM current_date)
But as #dnoeth pointed out in the comments, using INTERVAL might not be the safest method, because it can raise an error when the query runs on Feb. 29.
Simply subtracting from the year number avoids that, and isn't so bad really:
SELECT *
FROM data
WHERE fsc_yr BETWEEN EXTRACT(year FROM current_date)-2 AND EXTRACT(year FROM current_date)
Also note that such an error can come from selecting more than one column in the subquery of an IN.
For example, this would fail:
SELECT * FROM Table1
WHERE Col1 IN (SELECT Col1, Col2 FROM Table2)
So if you used the query for data with a * there, it would still result in that error.
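The Feb. 29 concern can be illustrated outside Teradata. The sketch below uses Python, where moving a date back by whole years without clamping also fails for 29 February; it is an analogy under that assumption, not a reproduction of Teradata's behaviour, and both helper names are invented for this example.

```python
from datetime import date

def fiscal_years(today, back=2):
    """Year numbers from today.year - back through today.year (hypothetical helper)."""
    return list(range(today.year - back, today.year + 1))

def shift_years_strict(d, years):
    """Move a date back by whole years without any clamping.

    Raises ValueError when the target date does not exist, which is what
    happens for 29 February and a non-leap target year.
    """
    return d.replace(year=d.year - years)

years = fiscal_years(date(2019, 6, 30))

try:
    shift_years_strict(date(2020, 2, 29), 2)
    feb29_ok = True
except ValueError:
    feb29_ok = False   # 2018-02-29 does not exist
```

Working on the year number alone, as `fiscal_years` does, never touches the day of the month, so the leap-day case cannot come up.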

Postgresql WHERE with age() function [duplicate]

I'm pretty sure this has been asked before but I am struggling to get the correct syntax for a table containing data like
id date type report item_id
1 2018-11-07 Veröffentlichung des 9-Monats-Berichtes TRUE 16
2 2018-11-06 Veröffentlichung des 9-Monats-Berichtes TRUE 17
3 2019-03-07 Veröffentlichung des Jahresberichtes TRUE 17
4 2019-05-10 Bericht zum 1. Quartal TRUE 17
The query I am trying to formulate is
SELECT date, AGE(now(), date) as t1
FROM dates
WHERE t1 > 0
Meaning I am only looking for values in the past.
However, I get an error
ERROR: column "t1" does not exist
(of course, it is an alias). Does Postgresql not support aliases here?
You cannot refer to an alias in the WHERE condition, because logically WHERE is evaluated before SELECT.
You could use a subquery:
SELECT *
FROM (SELECT date, AGE(now(), date) as t1
FROM dates) sub
WHERE sub.t1 > interval '0 seconds';
Or LATERAL (my favourite way):
SELECT date, s.t1
FROM dates
,LATERAL (SELECT AGE(now(), date) as t1) AS s
WHERE s.t1 > interval '0 seconds';
Or repeat the expression (violates the DRY principle):
SELECT date, AGE(now(), date) as t1
FROM dates
WHERE AGE(now(), date) > interval '0 seconds';
As for calculating AGE, you don't really need it for the filter, because the condition can be rewritten as date < now().
Related articles:
PostgreSQL: using a calculated column in the same query
MySQL - Search into a custom Column
Why do “linq to sql” queries starts with the FROM keyword unlike regular SQL queries?
If you're in a hurry and have an (ordered) index on date, don't filter on the AGE() expression.
The following query can use the index, giving a massive gain in performance for only a slight investment in coding effort:
SELECT date, AGE(now(), date) AS t1
FROM dates
WHERE date < now();
I say now() because you did, but perhaps you want CURRENT_DATE instead.
To create a suitable index:
create index dates_date on dates(date);

Using sql function generate_series() in redshift

I'd like to use the generate series function in redshift, but have not been successful.
The redshift documentation says it's not supported. The following code does work:
select *
from generate_series(1,10,1)
outputs:
1
2
3
...
10
I'd like to do the same with dates. I've tried a number of variations, including:
select *
from generate_series(date('2008-10-01'),date('2008-10-10 00:00:00'),1)
kicks out:
ERROR: function generate_series(date, date, integer) does not exist
Hint: No function matches the given name and argument types.
You may need to add explicit type casts. [SQL State=42883]
Also tried:
select *
from generate_series('2008-10-01 00:00:00'::timestamp,
'2008-10-10 00:00:00'::timestamp,'1 day')
And tried:
select *
from generate_series(cast('2008-10-01 00:00:00' as datetime),
cast('2008-10-10 00:00:00' as datetime),'1 day')
both kick out:
ERROR: function generate_series(timestamp without time zone, timestamp without time zone, "unknown") does not exist
Hint: No function matches the given name and argument types.
You may need to add explicit type casts. [SQL State=42883]
If not looks like I'll use this code from another post:
SELECT to_char(DATE '2008-01-01'
+ (interval '1 month' * generate_series(0,57)), 'YYYY-MM-DD') AS ym
PostgreSQL generate_series() with SQL function as arguments
Amazon Redshift seems to be based on PostgreSQL 8.0.2. The timestamp arguments to generate_series() were added in 8.4.
Something like this, which sidesteps that problem, might work in Redshift.
SELECT current_date + (n || ' days')::interval
from generate_series (1, 30) n
It works in PostgreSQL 8.3, which is the earliest version I can test. It's documented in 8.0.26.
Later . . .
It seems that generate_series() is unsupported in Redshift. But given that you've verified that select * from generate_series(1,10,1) does work, the syntax above at least gives you a fighting chance. (Although the interval data type is also documented as being unsupported on Redshift.)
Still later . . .
You could also create a table of integers.
create table integers (
n integer primary key
);
Populate it however you like. You might be able to use generate_series() locally, dump the table, and load it on Redshift. (I don't know; I don't use Redshift.)
Anyway, you can do simple date arithmetic with that table without referring directly to generate_series() or to interval data types.
select (current_date + n)
from integers
where n < 31;
That works in 8.3, at least.
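The idea is simply "a list of integers plus date arithmetic". A short Python sketch of the same idea, with a fixed date standing in for current_date and a range standing in for the integers table (both are assumptions made for the example):

```python
from datetime import date, timedelta

# Stand-in for the `integers` table; here it simply holds 0..99.
INTEGERS = range(100)

def dates_from_integers(today, limit=31):
    """Equivalent of: select (current_date + n) from integers where n < limit.

    Hypothetical helper; `today` replaces current_date so the result is fixed.
    """
    return [today + timedelta(days=n) for n in INTEGERS if n < limit]

series = dates_from_integers(date(2015, 11, 7))
```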
Using Redshift today, you can generate a range of dates by using datetime functions and feeding in a number table.
select (getdate()::date - generate_series)::date from generate_series(1,30,1)
Generates this for me
date
2015-11-06
2015-11-05
2015-11-04
2015-11-03
2015-11-02
2015-11-01
2015-10-31
2015-10-30
2015-10-29
2015-10-28
2015-10-27
2015-10-26
2015-10-25
2015-10-24
2015-10-23
2015-10-22
2015-10-21
2015-10-20
2015-10-19
2015-10-18
2015-10-17
2015-10-16
2015-10-15
2015-10-14
2015-10-13
2015-10-12
2015-10-11
2015-10-10
2015-10-09
2015-10-08
The generate_series() function is not fully supported by Redshift. See the Unsupported PostgreSQL functions section of the developer guide.
UPDATE
generate_series is working with Redshift now.
SELECT CURRENT_DATE::TIMESTAMP - (i * interval '1 day') as date_datetime
FROM generate_series(1,31) i
ORDER BY 1
This generates the dates of the last 31 days (generate_series(1,31) yields 31 values).
Ref: generate_series function in Amazon Redshift
As of writing this, generate_series() on our instance of Redshift (1.0.33426) could not be used to, for example, create a table:
# select generate_series(1,100,1);
1
2
...
# create table normal_series as select generate_series(1,100,1);
INFO: Function "generate_series(integer, integer, integer) not supported.
ERROR: Specified types or functions (one per INFO message) not supported on Redshift tables.
However, with recursive works:
# create table recursive_series as with recursive t(n) as (select 1::integer union all select n+1 from t where n < 100) select n from t;
SELECT
-- modify as desired, here is a date series:
# select getdate()::date + n from recursive_series;
2021-12-18
2021-12-19
...
I needed to do something similar, but with 5 minutes intervals over 7 days. So here's a CTE based hack (ugly but not too verbose)
INSERT INTO five_min_periods
WITH
periods AS (select 0 as num UNION select 1 as num UNION select 2 UNION select 3 UNION select 4 UNION select 5 UNION select 6 UNION select 7 UNION select 8 UNION select 9 UNION select 10 UNION select 11),
hours AS (select num from periods UNION ALL select num + 12 from periods),
days AS (select num from periods where num <= 6),
rightnow AS (select CAST( TO_CHAR(GETDATE(), 'yyyy-mm-dd hh24') || ':' || trim(TO_CHAR((ROUND((DATEPART (MINUTE, GETDATE()) / 5), 1) * 5 ),'09')) AS TIMESTAMP) as start)
select
ROW_NUMBER() OVER(ORDER BY d.num DESC, h.num DESC, p.num DESC) as idx
, DATEADD(minutes, -p.num * 5, DATEADD( hours, -h.num, DATEADD( days, -d.num, n.start ) ) ) AS period_date
from days d, hours h, periods p, rightnow n
Should be able to extend this to other generation schemes. The trick here is using the Cartesian product join (i.e. no JOIN/WHERE clause) to multiply the hand-crafted CTE's to produce the necessary increments and apply to an anchor date.
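The Cartesian-product trick can be sketched in Python with itertools.product: every combination of day, hour and five-minute slot becomes one offset from the anchor. The function name and the fixed anchor below are assumptions for the example, and the rounding of the anchor to five minutes is taken as already done.

```python
from datetime import datetime, timedelta
from itertools import product

def five_minute_periods(anchor, days=7):
    """Timestamps every 5 minutes going back `days` days from `anchor`.

    Hypothetical helper mirroring the cross join of days x hours x periods:
    each combination is turned into an offset subtracted from the anchor.
    """
    periods = range(12)   # 12 five-minute slots per hour
    hours = range(24)
    day_nums = range(days)
    return sorted(
        anchor - timedelta(days=d, hours=h, minutes=5 * p)
        for d, h, p in product(day_nums, hours, periods)
    )

anchor = datetime(2021, 12, 18, 10, 35)   # assumed to be rounded to 5 minutes already
slots = five_minute_periods(anchor)
```

Seven days of 24 hours with 12 slots each gives 2016 rows, the same count the cross join of the three CTEs produces.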
Redshift's generate_series() function is a leader-node-only function, and as such you cannot use it for downstream processing on the compute nodes. It can be replaced by a recursive CTE (or you can keep a "dates" table in your database). I have an example of this in a recent answer:
Cross join Redshift with sequence of dates
One caution I like to give in answers like this is to be careful with inequality joins (or cross joins or any under-qualified joins) when working with VERY LARGE tables which can happen often in Redshift. If you are joining with a moderate Redshift table of say 1M rows then things will be fine. But if you are doing this on a table of 1B rows then the data explosion will likely cause massive performance issues as the query spills to disk.
I've written a couple of white papers on how to write this type of query in a data space sensitive way. This issue of massive intermediate results is not unique to Redshift and I first developed my approach solving a client's HIVE query issue. "First rule of writing SQL for Big Data - don't make more"
Per the comments of #Ryan Tuck and #Slobodan Pejic generate_series() does not work on Redshift when joining to another table.
The workaround I used was to write out every value in the series in the query:
SELECT
'2019-01-01'::date AS date_month
UNION ALL
SELECT
'2019-02-01'::date AS date_month
Using a Python function like this:
import arrow
def generate_date_series(start, end):
    start = arrow.get(start)
    end = arrow.get(end)
    months = list(
        f"SELECT '{month.format('YYYY-MM-DD')}'::date AS date_month"
        for month in arrow.Arrow.range('month', start, end)
    )
    return "\nUNION ALL\n".join(months)
Perhaps not as elegant as other solutions, but here's how I did it:
drop table if exists #dates;
create temporary table #dates as
with recursive cte(val_date) as
(select
cast('2020-07-01' as date) as val_date
union all
select
cast(dateadd(day, 1, val_date) as date) as val_date
from
cte
where
val_date <= getdate()
)
select
val_date as yyyymmdd
from
cte
order by
val_date
;
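One detail of this recursion is worth checking: the WHERE condition is applied to the previous row, so the series continues one step past the bound. The Python sketch below imitates that loop; it assumes getdate() compares like today's date, and the function name is made up.

```python
from datetime import date, timedelta

def recursive_dates(start, today):
    """Imitate the recursive CTE: emit start, then keep adding one day
    for as long as the previous value satisfies val_date <= today.

    Hypothetical helper; `today` stands in for getdate().
    """
    out = [start]
    while out[-1] <= today:
        out.append(out[-1] + timedelta(days=1))
    return out

series = recursive_dates(date(2020, 7, 1), today=date(2020, 7, 5))
# The last element is 2020-07-06, one day after `today`.
```

If the extra day is not wanted, comparing with < instead of <= in the loop condition stops the series at today.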
For five-minute buckets I would do the following:
select date_trunc('minute', getdate()) - (i || ' minutes')::interval
from generate_series(0, 60*5-1, 5) as i
You could replace 5 by any given interval, and 60 by the number of rows you want.
SELECT CURRENT_DATE::TIMESTAMP - (i * interval '1 day') as date_datetime
FROM generate_series(1,(select datediff(day,'01-Jan-2021',now()::date))) i
ORDER BY 1

How do you do date math that ignores the year?

I am trying to select dates that have an anniversary in the next 14 days. How can I select based on dates excluding the year? I have tried something like the following.
SELECT * FROM events
WHERE EXTRACT(month FROM "date") = 3
AND EXTRACT(day FROM "date") < EXTRACT(day FROM "date") + 14
The problem with this is that months wrap.
I would prefer to do something like this, but I don't know how to ignore the year.
SELECT * FROM events
WHERE (date > '2013-03-01' AND date < '2013-04-01')
How can I accomplish this kind of date math in Postgres?
TL/DR: use the "Black magic version" below.
All queries presented in other answers so far operate with conditions that are not sargable: they cannot use an index and have to compute an expression for every single row in the base table to find matching rows. Doesn't matter much with small tables. Matters a lot with big tables.
Given the following simple table:
CREATE TABLE event (
event_id serial PRIMARY KEY
, event_date date
);
Query
Version 1. and 2. below can use a simple index of the form:
CREATE INDEX event_event_date_idx ON event(event_date);
But all of the following solutions are faster than the alternatives even without an index.
1. Simple version
SELECT *
FROM (
SELECT ((current_date + d) - interval '1 year' * y)::date AS event_date
FROM generate_series( 0, 14) d
CROSS JOIN generate_series(13, 113) y
) x
JOIN event USING (event_date);
Subquery x computes all possible dates over a given range of years from a CROSS JOIN of two generate_series() calls. The selection is done with the final simple join.
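The approach can be sketched in Python: build the set of all candidate dates (each day of the window, shifted back by each number of years), then look events up in that set. The helper names, the fixed "today" and the sample events are assumptions for the example; the year range 13..113 is taken from the query.

```python
from datetime import date, timedelta

def minus_years(d, years):
    """Subtract whole years, clamping 29 February to 28 February when the
    target year is not a leap year (hypothetical helper)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)

def candidate_dates(today, days=14, min_years=13, max_years=113):
    """All dates whose anniversary falls in [today, today + days]."""
    return {
        minus_years(today + timedelta(days=d), y)
        for d in range(days + 1)
        for y in range(min_years, max_years + 1)
    }

# Made-up events; the last one is outside the window.
events = [date(1950, 3, 10), date(1980, 3, 1), date(1999, 12, 25)]
wanted = candidate_dates(date(2013, 3, 1))
hits = [e for e in events if e in wanted]
```

The set stays small (15 days times 101 years), which is why a plain join or lookup against it is cheap compared with evaluating an expression for every row.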
2. Advanced version
WITH val AS (
SELECT extract(year FROM age(current_date + 14, min(event_date)))::int AS max_y
, extract(year FROM age(current_date, max(event_date)))::int AS min_y
FROM event
)
SELECT e.*
FROM (
SELECT ((current_date + d.d) - interval '1 year' * y.y)::date AS event_date
FROM generate_series(0, 14) d
,(SELECT generate_series(min_y, max_y) AS y FROM val) y
) x
JOIN event e USING (event_date);
Range of years is deduced from the table automatically - thereby minimizing generated years.
You could go one step further and distill a list of existing years if there are gaps.
Effectiveness co-depends on the distribution of dates. It's better for few years with many rows each.
3. Black magic version
Create a simple SQL function to calculate an integer from the pattern 'MMDD':
CREATE FUNCTION f_mmdd(date) RETURNS int LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
'SELECT (EXTRACT(month FROM $1) * 100 + EXTRACT(day FROM $1))::int';
I had to_char(time, 'MMDD') at first, but switched to the above expression, which proved fastest in new tests on Postgres 9.6 and 10.
It allows function inlining because EXTRACT(xyz FROM date) is implemented with the IMMUTABLE function date_part(text, date) internally. And it has to be IMMUTABLE to allow its use in the following essential multicolumn expression index:
CREATE INDEX event_mmdd_event_date_idx ON event(f_mmdd(event_date), event_date);
Multicolumn for a number of reasons:
Can help with ORDER BY or with selecting from given years, at almost no additional cost for the index: a date fits into the 4 bytes that would otherwise be lost to padding due to data alignment.
Also, since both index columns reference the same table column, there is no drawback with regard to H.O.T. updates.
Basic query:
SELECT *
FROM event e
WHERE f_mmdd(e.event_date) BETWEEN f_mmdd(current_date)
AND f_mmdd(current_date + 14);
One PL/pgSQL table function to rule them all
Fork to one of two queries to cover the turn of the year:
CREATE OR REPLACE FUNCTION f_anniversary(_the_date date = current_date, _days int = 14)
RETURNS SETOF event
LANGUAGE plpgsql AS
$func$
DECLARE
d int := f_mmdd($1);
d1 int := f_mmdd($1 + $2 - 1); -- fix off-by-1 from upper bound
BEGIN
IF d1 >= d THEN -- no wrap; >= also covers a 1-day window where d1 = d
RETURN QUERY
SELECT *
FROM event e
WHERE f_mmdd(e.event_date) BETWEEN d AND d1
ORDER BY f_mmdd(e.event_date), e.event_date;
ELSE -- wrap around end of year
RETURN QUERY
SELECT *
FROM event e
WHERE f_mmdd(e.event_date) >= d OR
f_mmdd(e.event_date) <= d1
ORDER BY (f_mmdd(e.event_date) >= d) DESC, f_mmdd(e.event_date), event_date;
-- chronological across turn of the year
END IF;
END
$func$;
Call using defaults: 14 days beginning "today":
SELECT * FROM f_anniversary();
Call for 7 days beginning '2014-08-23':
SELECT * FROM f_anniversary(date '2014-08-23', 7);
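The branching on the turn of the year can be sketched in Python. This is only an illustration of the filter logic with made-up names and sample data; it leaves out the ORDER BY, and it uses >= for the no-wrap test so that a one-day window is handled by the first branch.

```python
from datetime import date, timedelta

def mmdd(d):
    """Integer key month * 100 + day (hypothetical helper)."""
    return d.month * 100 + d.day

def anniversary_filter(events, the_date, days=14):
    """Events whose month/day falls in the `days`-day window starting at the_date."""
    lo = mmdd(the_date)
    hi = mmdd(the_date + timedelta(days=days - 1))
    if hi >= lo:   # window stays inside one calendar year
        return [e for e in events if lo <= mmdd(e) <= hi]
    # window wraps around the end of the year
    return [e for e in events if mmdd(e) >= lo or mmdd(e) <= hi]

# Made-up sample events.
events = [date(2009, 1, 4), date(1990, 12, 30), date(2001, 6, 15), date(1985, 12, 28)]
wrapped = anniversary_filter(events, date(2012, 12, 29), 14)
plain = anniversary_filter(events, date(2014, 6, 10), 7)
```

For the window starting 2012-12-29 the bounds are 1229 and 111, so the second branch applies and both the late-December and the early-January event match.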
"February 29"
When dealing with anniversaries or "birthdays", you need to define how to deal with the special case "February 29" in leap years.
When testing for ranges of dates, Feb 29 is usually included automatically, even if the current year is not a leap year. The range of days is extended by 1 retroactively when it covers this day.
On the other hand, if the current year is a leap year, and you want to look for 15 days, you may end up getting results for 14 days in leap years if your data is from non-leap years.
Say, Bob is born on the 29th of February:
My query 1. and 2. include February 29 only in leap years. Bob has birthday only every ~ 4 years.
My query 3. includes February 29 in the range. Bob has birthday every year.
There is no magical solution. You have to define what you want for every case.
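Bob's case can be checked with a small Python sketch. It simplifies queries 1 and 2 to "match concrete calendar days of the current year" and query 3 to "compare the MMDD key against the range bounds"; the dates and names are made up for the example.

```python
from datetime import date, timedelta

def mmdd(d):
    """Integer key month * 100 + day (hypothetical helper)."""
    return d.month * 100 + d.day

bob = date(1988, 2, 29)
today = date(2013, 2, 25)          # 2013 is not a leap year
window = [today + timedelta(days=n) for n in range(15)]   # today .. today + 14

# Approach of queries 1 and 2: match concrete calendar days of this year.
by_calendar_date = any((d.month, d.day) == (bob.month, bob.day) for d in window)

# Approach of query 3: compare the integer MMDD key against the range bounds.
by_mmdd_range = mmdd(window[0]) <= mmdd(bob) <= mmdd(window[-1])
```

The window runs from 25 February to 11 March 2013 and contains no 29 February, so only the key-range comparison finds Bob.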
Test
To substantiate my point I ran an extensive test with all the presented solutions. I adapted each of the queries to the given table and to yield identical results without ORDER BY.
The good news: all of them are correct and yield the same result - except for Gordon's query that had syntax errors, and #wildplasser's query that fails when the year wraps around (easy to fix).
Insert 108000 rows with random dates from the 20th century, which is similar to a table of living people (13 or older).
INSERT INTO event (event_date)
SELECT '2000-1-1'::date - (random() * 36525)::int
FROM generate_series (1, 108000);
Delete ~ 8 % to create some dead tuples and make the table more "real life".
DELETE FROM event WHERE random() < 0.08;
ANALYZE event;
My test case had 99289 rows, 4012 hits.
C - Catcall
WITH anniversaries as (
SELECT event_id, event_date
,(event_date + (n || ' years')::interval)::date anniversary
FROM event, generate_series(13, 113) n
)
SELECT event_id, event_date -- count(*) --
FROM anniversaries
WHERE anniversary BETWEEN current_date AND current_date + interval '14' day;
C1 - Catcall's idea rewritten
Aside from minor optimizations, the major difference is to add only the exact amount of years date_trunc('year', age(current_date + 14, event_date)) to get this year's anniversary, which avoids the need for a CTE altogether:
SELECT event_id, event_date
FROM event
WHERE (event_date + date_trunc('year', age(current_date + 14, event_date)))::date
BETWEEN current_date AND current_date + 14;
D - Daniel
SELECT * -- count(*) --
FROM event
WHERE extract(month FROM age(current_date + 14, event_date)) = 0
AND extract(day FROM age(current_date + 14, event_date)) <= 14;
E1 - Erwin 1
See "1. Simple version" above.
E2 - Erwin 2
See "2. Advanced version" above.
E3 - Erwin 3
See "3. Black magic version" above.
G - Gordon
SELECT * -- count(*)
FROM (SELECT *, to_char(event_date, 'MM-DD') AS mmdd FROM event) e
WHERE to_date(to_char(now(), 'YYYY') || '-'
|| (CASE WHEN mmdd = '02-29' THEN '02-28' ELSE mmdd END)
,'YYYY-MM-DD') BETWEEN date(now()) and date(now()) + 14;
H - a_horse_with_no_name
WITH upcoming as (
SELECT event_id, event_date
,CASE
WHEN date_trunc('year', age(event_date)) = age(event_date)
THEN current_date
ELSE cast(event_date + ((extract(year FROM age(event_date)) + 1)
* interval '1' year) AS date)
END AS next_event
FROM event
)
SELECT event_id, event_date
FROM upcoming
WHERE next_event - current_date <= 14;
W - wildplasser
CREATE OR REPLACE FUNCTION this_years_birthday(_dut date)
RETURNS date
LANGUAGE plpgsql AS
$func$
DECLARE
ret date;
BEGIN
ret := date_trunc('year' , current_timestamp)
+ (date_trunc('day' , _dut)
- date_trunc('year' , _dut));
RETURN ret;
END
$func$;
Simplified to return the same as all the others:
SELECT *
FROM event e
WHERE this_years_birthday( e.event_date::date )
BETWEEN current_date
AND current_date + '2weeks'::interval;
W1 - wildplasser's query rewritten
The above suffers from a number of inefficient details (beyond the scope of this already sizable post). The rewritten version is much faster:
CREATE OR REPLACE FUNCTION this_years_birthday(_dut INOUT date)
LANGUAGE sql AS
$func$
SELECT (date_trunc('year', now()) + ($1 - date_trunc('year', $1)))::date
$func$;
SELECT *
FROM event e
WHERE this_years_birthday(e.event_date) BETWEEN current_date
AND (current_date + 14);
Test results
I ran this test with a temporary table on PostgreSQL 9.1.7.
Results were gathered with EXPLAIN ANALYZE, best of 5.
Results
Without index
C: Total runtime: 76714.723 ms
C1: Total runtime: 307.987 ms -- !
D: Total runtime: 325.549 ms
E1: Total runtime: 253.671 ms -- !
E2: Total runtime: 484.698 ms -- min() & max() expensive without index
E3: Total runtime: 213.805 ms -- !
G: Total runtime: 984.788 ms
H: Total runtime: 977.297 ms
W: Total runtime: 2668.092 ms
W1: Total runtime: 596.849 ms -- !
With index
E1: Total runtime: 37.939 ms --!!
E2: Total runtime: 38.097 ms --!!
With index on expression
E3: Total runtime: 11.837 ms --!!
All other queries perform the same with or without index because they use non-sargable expressions.
Conclusion
So far, #Daniel's query was the fastest.
#wildplasser's (rewritten) approach performs acceptably, too.
#Catcall's version is something like the reverse approach of mine. Performance gets out of hand quickly with bigger tables.
The rewritten version performs pretty well, though. The expression I use is something like a simpler version of #wildplasser's this_years_birthday() function.
My "simple version" is faster even without index, because it needs fewer computations.
With index, the "advanced version" is about as fast as the "simple version", because min() and max() become very cheap with an index. Both are substantially faster than the rest which cannot use the index.
My "black magic version" is fastest with or without index. And it is very simple to call.
The updated version (after the benchmark) is a bit faster, yet.
With a real life table an index will make even greater difference. More columns make the table bigger, and sequential scan more expensive, while the index size stays the same.
I believe the following test works in all cases, assuming a column named anniv_date:
select * from events
where extract(month from age(current_date+interval '14 days', anniv_date))=0
and extract(day from age(current_date+interval '14 days', anniv_date)) <= 14
As an example of how it works when crossing a year (and also a month), let's say an anniversary date is 2009-01-04 and the date at which the test is run is 2012-12-29.
We want to consider any date between 2012-12-29 and 2013-01-12 (14 days)
age('2013-01-12'::date, '2009-01-04'::date) is 4 years 8 days.
extract(month...) from this is 0 and extract(days...) is 8, which is lower than 14 so it matches.
How about this?
select *
from events e
where to_char(e."date", 'MM-DD') between to_char(now(), 'MM-DD') and
to_char(date(now())+14, 'MM-DD')
You can do the comparison as strings.
To take year ends into account, we'll convert back to dates:
select *
from events e
where to_date(to_char(now(), 'YYYY')||'-'||to_char(e."date", 'MM-DD'), 'YYYY-MM-DD')
between date(now()) and date(now())+14
You do need to make a slight adjustment for Feb 29. I might suggest:
select *
from (select e.*,
to_char(e."date", 'MM-DD') as MMDD
from events e
) e
where to_date(to_char(now(), 'YYYY')||'-'||(case when MMDD = '02-29' then '02-28' else MMDD end), 'YYYY-MM-DD')
between date(now()) and date(now())+14
For convenience, I created two functions that yield the (expected or past) birthday in the current year, and the upcoming birthday.
CREATE OR REPLACE FUNCTION this_years_birthday( _dut DATE) RETURNS DATE AS
$func$
DECLARE
ret DATE;
BEGIN
ret =
date_trunc( 'year' , current_timestamp)
+ (date_trunc( 'day' , _dut)
- date_trunc( 'year' , _dut)
)
;
RETURN ret;
END;
$func$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION next_birthday( _dut DATE) RETURNS DATE AS
$func$
DECLARE
ret DATE;
BEGIN
ret =
date_trunc( 'year' , current_timestamp)
+ (date_trunc( 'day' , _dut)
- date_trunc( 'year' , _dut)
)
;
IF (ret < date_trunc( 'day' , current_timestamp))
THEN ret = ret + '1year'::interval; END IF;
RETURN ret;
END;
$func$ LANGUAGE plpgsql;
--
-- call the function
--
SELECT date_trunc( 'day' , t.topic_date) AS the_date
, this_years_birthday( t.topic_date::date ) AS the_day
, next_birthday( t.topic_date::date ) AS next_day
FROM topic t
WHERE this_years_birthday( t.topic_date::date )
BETWEEN current_date
AND current_date + '2weeks':: interval
;
NOTE: the casts are needed because I only had timestamps available.
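The arithmetic of these two functions (start of the current year plus the birthday's offset into its own year) can be traced in Python. This is a sketch under the assumption that the subtraction of the two truncated dates yields a plain number of days; the `today` parameter replaces current_timestamp so the results are fixed.

```python
from datetime import date

def this_years_birthday(born, today):
    """Start of the current year plus the birthday's day offset into its own year."""
    offset = born - date(born.year, 1, 1)
    return date(today.year, 1, 1) + offset

def plus_one_year(d):
    """Add one calendar year, clamping 29 February to 28 February (hypothetical helper)."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)

def next_birthday(born, today):
    """Like this_years_birthday, but moved on by a year if it has already passed."""
    ret = this_years_birthday(born, today)
    if ret < today:
        ret = plus_one_year(ret)
    return ret

today = date(2013, 6, 1)
non_leap = this_years_birthday(date(1999, 3, 1), today)   # born in a non-leap year
leap = this_years_birthday(date(2000, 3, 1), today)       # born in a leap year
```

Because the offset is counted in days, a date after February in a leap year carries one extra day: 1 March 2000 maps to 2 March in 2013, which may or may not be what an application wants.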
This should handle wrap-arounds at the end of the year as well:
with upcoming as (
select name,
event_date,
case
when date_trunc('year', age(event_date)) = age(event_date) then current_date
else cast(event_date + ((extract(year from age(event_date)) + 1) * interval '1' year) as date)
end as next_event
from events
)
select name,
next_event,
next_event - current_date as days_until_next
from upcoming
order by next_event - current_date
You can then filter on the expression next_event - current_date to apply the "next 14 days" condition.
The case ... is only necessary if you consider events that would be "today" as "upcoming" as well. Otherwise, that can be reduced to the else part of the case statement.
Note that I "renamed" the column "date" to event_date. Mainly because reserved words shouldn't be used as an identifier but also because date is a terrible column name. It doesn't tell you anything about what it stores.
You can generate a virtual table of anniversaries, and select from it.
with anniversaries as (
select event_date,
(event_date + (n || ' years')::interval)::date anniversary
from events, generate_series(1,10) n
)
select event_date, anniversary
from anniversaries
where anniversary between current_date and current_date + interval '14' day
order by event_date, anniversary
The call to generate_series(1,10) has the effect of generating 10 years of anniversaries for each event_date. I wouldn't use the literal value 10 in production. Instead, I'd either calculate the right number of years to use in a subquery, or I'd use a large literal like 100.
You'll want to adjust the WHERE clause to fit your application.
If you have a performance problem with the virtual table (when you have a lot of rows in "events"), replace the common table expression with a base table having the identical structure. Storing anniversaries in a base table makes their values obvious (especially for, say, Feb 29 anniversaries), and queries on such a table can use an index. Querying an anniversary table of half a million rows using just the SELECT statement above takes 25ms on my desktop.
I found a way to do it.
SELECT EXTRACT(DAYS FROM age('1999-04-10', '2003-05-12')),
EXTRACT(MONTHS FROM age('1999-04-10', '2003-05-12'));
date_part | date_part
-----------+-----------
-2 | -1
I can then just check that the month is 0 and the days are less than 14.
If you have a more elegant solution, please do post it. I'll leave the question open for a bit.
I don't work with PostgreSQL, so I googled its date functions and found this: http://www.postgresql.org/docs/current/static/functions-datetime.html
If I read it correctly, looking for events in the next 14 days is as simple as:
where mydatefield >= current_date
and mydatefield < current_date + integer '14'
Of course I might not be reading it correctly.