Oracle Query SQL Tuning? - sql

select round(avg(et_gsm_sınyal)) as sinyal,mahalle_kodu,ilce_kodu,sebeke
from
(select et_gsm_sınyal,sozlesme_no,SUBSTR(et_operator,1,5) as sebeke
from thkol316old
where tarih >= ADD_MONTHS (TRUNC (SYSDATE, 'MM'), -1)
AND tarih < TRUNC(SYSDATE, 'MM')) okuma,
(select sozlesme_no,ilce_kodu,mahalle_kodu from commt020) bilgiler
where okuma.sozlesme_no=bilgiler.sozlesme_no
group by mahalle_kodu,ilce_kodu,sebeke;
commt020 -> customer table
thkol316old -> old bill table
This query is works but it's works very slow.
It's about 550 seconds response time.
What am I supposed to do this query work faster ?
It's the execution plan
SELECT STATEMENT 7547
HASH
GROUP BY 7547
FILTER
Filter Predicates
ADD_MONTHS(TRUNC(SYSDATE#!,'fmmm'),-1)
NESTED LOOPS
NESTED LOOPS
7546
TABLE ACCESS
COMMT020 BY GLOBAL INDEX ROWID 3 ROW LOCATION ROW LOCATION`

First of all I would try with ordinary table joins instead of the more awkward inline view - joining, I hope the following will work (I haven't created tables and tried it):
select round(avg(okuma.et_gsm_sınyal)) as sinyal,
bilgiler.mahalle_kodu,bilgiler.ilce_kodu, SUBSTR(okuma.et_operator,1,5) as sebeke
from thkol316old okuma
inner join commt020 bilgiler on okuma.sozlesme_no=bilgiler.sozlesme_no
where okuma.tarih >= ADD_MONTHS (TRUNC (SYSDATE, 'MM'), -1)
AND okuma.tarih < TRUNC(SYSDATE, 'MM')
group by bilgiler.mahalle_kodu,bilgiler.ilce_kodu, SUBSTR(okuma.et_operator,1,5);
Then, if things still are slow:
Verify that there is an index where bilgiler.sozlesme_no is the
first column in the index
Verify that there is an index where okuma.sozlesme_no is the first column in the index
Verify that there is an index where okuma.tarih is the first column in the index

Clearly
column tarih is not indexed for table thkol316old
create index tarih_idx on thkol316old (tarih);
Then
analyze table commt020 compute statistics;
analyze table thkol316old compute statistics;
In Oracle analyzing tables produce information that is used in creating query execution plans. Every time you alter a table you may want to also analyze it. Many big systems do it on a schedule.

First of all, you don’t need BILIGILER inline view just to select a few columns from COMMT020 table, you’ll be perfectly fine selecting from that table directly:
SELECT ROUND(AVG(et_gsm_s1nyal)) AS sinyal,
mahalle_kodu,ilce_kodu,sebeke
FROM (
SELECT et_gsm_s1nyal,
sozlesme_no,
SUBSTR(et_operator,1,5) AS sebeke
FROM thkol316old
WHERE tarih >= ADD_MONTHS (TRUNC (SYSDATE, 'MM'), -1)
AND tarih < TRUNC(SYSDATE, 'MM')
) okuma, commt020
WHERE okuma.sozlesme_no = commt020.sozlesme_no
GROUP BY mahalle_kodu,ilce_kodu,sebeke
/
Then, let’s rewrite the join using ANSI join expression. I prefer ANSI joins over old Oracle-style joins because they allow separating join conditions from filtering conditions, therefore providing more clarity into what’s actually going on. Also, it is a good style to assign aliases to the tables and clearly indicate which columns we select from which table.
SELECT ROUND(AVG(o.et_gsm_s1nyal)) AS sinyal,
c.mahalle_kodu, c.ilce_kodu, o.sebeke
FROM (
SELECT th.et_gsm_s1nyal,
th.sozlesme_no,
SUBSTR(th.et_operator,1,5) AS sebeke
FROM thkol316old th
WHERE th.tarih >= ADD_MONTHS (TRUNC (SYSDATE, 'MM'), -1)
AND th.tarih < TRUNC(SYSDATE, 'MM')
) okuma o
JOIN commt020 c ON o.sozlesme_no = c.sozlesme_no
GROUP BY c.mahalle_kodu, c.ilce_kodu, o.sebeke
/
Now it is clearer than the remaining inline view is redundant too. Although it's hard to tell without knowing the details of those tables, but you will probably be better off “unwrapping” that inline view and replacing it with a straight join:
SELECT ROUND(AVG(th.et_gsm_s1nyal)) AS sinyal,
c.mahalle_kodu,
c.ilce_kodu,
SUBSTR(th.et_operator,1,5) AS sebeke
FROM commt020 c
JOIN thkol316old th ON c.sozlesme_no = th.sozlesme_no
WHERE th.tarih >= ADD_MONTHS (TRUNC (SYSDATE, 'MM'), -1)
AND th.tarih < TRUNC(SYSDATE, 'MM')
GROUP BY c.mahalle_kodu, c.ilce_kodu, SUBSTR(th.et_operator,1,5)
/
Unfortunately optimizing this query further requires additional information, such as:
The new execution plan.
The version of Oracle you’re using.
Numbers of rows in THKOL316OLD and COMMT020 tables.
What indexes exist on those tables.

Related

Optimization on large tables

I have the following query that joins two large tables. I am trying to join on patient_id and records that are not older than 30 days.
select * from
chairs c
join data id
on c.patient_id = id.patient_id
and to_date(c.from_date, 'YYYYMMDD') - to_date(id.from_date, 'YYYYMMDD') >= 0
and to_date (c.from_date, 'YYYYMMDD') - to_date(id.from_date, 'YYYYMMDD') < 30
Currently, this query takes 2 hours to run. What indexes can I create on these tables for this query to run faster.
I will take a shot in the dark, because as others said it depends on what the table structure, indices, and the output of the planner is.
The most obvious thing here is that as long as it is possible, you want to represent dates as some date datatype instead of strings. That is the first and most important change you should make here. No index can save you if you transform strings. Because very likely, the problem is not the patient_id, it's your date calculation.
Other than that, forcing hash joins on the patient_id and then doing the filtering could help if for some reason the planner decided to do nested loops for that condition. But that is for after you fixed your date representation AND you still have a problem AND you see that the planner does nested loops on that attribute.
Some observations if you are stuck with string fields for the dates:
YYYYMMDD date strings are ordered and can be used for <,> and =.
Building strings from the data in chairs to use to JOIN on data will make good use of an index like one on data for patient_id, from_date.
So my suggestion would be to write expressions that build the date strings you want to use in the JOIN. Or to put it another way: do not transform the child table data from a string to something else.
Example expression that takes 30 days off a string date and returns a string date:
select to_char(to_date('20200112', 'YYYYMMDD') - INTERVAL '30 DAYS','YYYYMMDD')
Untested:
select * from
chairs c
join data id
on c.patient_id = id.patient_id
and id.from_date between to_char(to_date(c.from_date, 'YYYYMMDD') - INTERVAL '30 DAYS','YYYYMMDD')
and c.from_date
For this query:
select *
from chairs c join data
id
on c.patient_id = id.patient_id and
to_date(c.from_date, 'YYYYMMDD') - to_date(id.from_date, 'YYYYMMDD') >= 0 and
to_date (c.from_date, 'YYYYMMDD') - to_date(id.from_date, 'YYYYMMDD') < 30;
You should start with indexes on (patient_id, from_date) -- you can put them in both tables.
The date comparisons are problematic. Storing the values as actual dates can help. But it is not a 100% solution because comparison operations are still needed.
Depending on what you are actually trying to accomplish there might be other ways of writing the query. I might encourage you to ask a new question, providing sample data, desired results, and a clear explanation of what you really want. For instance, this query is likely to return a lot of rows. And that just takes time as well.
Your query have a non SERGABLE predicate because it uses functions that are iteratively executed. You need to discard such functions and replace them by a direct access to the columns. As an exemple :
SELECT *
FROM chairs AS c
JOIN data AS id
ON c.patient_id = id.patient_id
AND c.from_date BETWEEN id.from_date AND id.from_date + INTERVAL '1 day'
Will run faster with those two indexes :
CREATE X_SQLpro_001 ON chairs (patient_id, from_date);
CREATE X_SQLpro_002 ON data (patient_id, from_date) ;
Also try to avoid
SELECT *
And list only the necessary columns

Oracle Optimize Query

i'm working with oracle pl/sql and i have a stored procedure with this query, and it is a bit convoluted, but it gets the job done, the thing is it takes like 35 minutes, and the sql developer Autotrace says that is doing a full scan even though the tables have their indexes.
So is there any way to improve this query?
select tipotrx, sum(saldo) as saldo,
count(*) as totaltrx from (
select max(ids) as IDTRX, max(monto) as monto, min(saldo) as saldo, max(aq_data) as aq_data, thekey, tipotrx
from (
select t.SID as ids, (TO_NUMBER(SUBSTR(P.P1, 18, 12))) as monto,
((TO_NUMBER(SUBSTR(P.P1, 18, 12)) * (TO_NUMBER(SUBSTR(t.acquirer_data, 13,2)) -
TO_NUMBER(SUBSTR(P.P4, 3,2))))) as saldo,
(TO_CHAR(t.trx_date, 'YYMMDD') || t.auth_code || t.trx_amount || (SELECT
functions.decrypt(t.card_number) FROM DUAL)) as thekey,
t.acquirer_data AS aq_data,
TO_NUMBER(SUBSTR(t.acquirer_data, 12, 1)) as tipotrx
from TBL_TRX t INNER JOIN TBL_POS P ON (t.SID = P.transaction)
WHERE (TO_NUMBER(SUBSTR(t.acquirer_data, 13,2)) >= TO_NUMBER(SUBSTR(P.P4, 3,2)))
AND trunc(t.INC_DATE) between (TO_DATE('20/06/2020', 'DD/MM/YYYY') - 35) AND TO_DATE('20/06/2020', 'DD/MM/YYYY')
) t
group by thekey, tipotrx order by max(ids) desc) j
group by tipotrx;
Thanks.
Most of the time the index has to match exactly what's in the WHERE clause to be eligible for use. An index on the acquirer_data column cannot be used when your WHERE clause says
TO_NUMBER(SUBSTR(t.acquirer_data, 13,2))
An index on the INC_DATE cannot be used when your WHERE clause says
trunc(t.INC_DATE)
You manipulate every column in the WHERE clause and that alone can potentially prevent the use of any normal index.
If however you create function-based indexes, you can make some new indexes that match what's in your WHERE clause. That way at least there's a chance that the DB will use an index instead of doing full table scans.
--example function based index.
CREATE INDEX TRUNC_INC_DATE ON TBL_TRX (trunc(t.INC_DATE));
Of course, new indexes take up more space and add overhead of their own. Keep using that autotrace to see if it's worth it.
Also, updating table statistics probably wont hurt either.
Change this:
trunc(t.INC_DATE) between (TO_DATE('20/06/2020', 'DD/MM/YYYY') - 35)
AND TO_DATE('20/06/2020', 'DD/MM/YYYY')
To this:
t.INC_DATE between (TO_DATE('20/06/2020', 'DD/MM/YYYY') - 35)
AND TO_DATE('21/06/2020', 'DD/MM/YYYY') - INTERVAL '1' SECOND
Instead of building a function-based index you can modify the predicate to be sargable (able to use an index). Instead of using TRUNC to subtract from the the column, add a day minus one second to upper bound literal.
The code is more confusing but should be able to take advantage of the index. However, 35 days of data may be a large amount; the date index may not be very useful and you may need to look at other predicates.

How to create a view in PostgreSQL with where clause

I am trying to create a view in Postgres
I have 3-4 time stamps in each closing_date where I need to select only the latest time stamp of each day
And also I have to restrict the closing_date to only 30 days (shown in SQL query below)
Below is the query from SQL data which I had created
CREATE VIEW dbo.CashBreaks_30Days_View as
SELECT Closing_date,Bo,Desk,Breaks_Staus,Owner,status,Team,
SLA,Age_Bucket_EntryDate,Age_Bucket_ValueDate,Age_EntryDate,Age_ValueDate,
[Type_(2)]
FROM Master_Data_CashBreaks
WHERE Closing_date >= cast(getdate()-37 as date);
If I understood you correctly, something like this might return result you want:
create or replace view cash_breaks_30days_view
as
select a.list_of_columns
from master_data_cashbreaks a
where a.closing_date >= trunc(sysdate) - 30 --> the last 30 days
and a.closing_date = (select max(b.closing_date) --> subquery is used to return
from master_data_cashbreaks b -- the last timestamp per date
where b.id = a.id
and trunc(b.closing_date) = trunc(a.closing_date)
)
There are few mistakes in your code:
[Type_(2)] - Not a valid SQL
getdate() - There is no such function available in Oracle. you should use the trunc(SYSDATE) instead.
getdate()-37 - Why -37, when you want the last 30 days data. It should be 30.
Your query should look like this in oracle:
CREATE VIEW dbo.CashBreaks_30Days_View as
SELECT * FROM
(SELECT Closing_date,Bo,Desk,Breaks_Staus,Owner,status,Team,
SLA,Age_Bucket_EntryDate,Age_Bucket_ValueDate,Age_EntryDate,Age_ValueDate,
ROW_NUMBER() OVER (PARTITION BY Closing_date ORDER BY Closing_date DESC) AS RN
FROM Master_Data_CashBreaks
WHERE Closing_date >= TRUNC(SYSDATE) - 30)
WHERE RN = 1;
Your SQL contains a lot of errors
square brackets are invalid in SQL identifiers, if you have such a column you need to use double quotes. It's unclear to me if your column is named "[Type_(2)]" or maybe just `"Type_(2)"
There is no getdate() in SQL or in Postgres. Use current_date instead
So fixing all those error, your statement should look like this:
CREATE VIEW dbo.CashBreaks_30Days_View
as
SELECT Closing_date, Bo, Desk, Breaks_Staus, Owner, status,
Team, SLA, Age_Bucket_EntryDate,
Age_Bucket_ValueDate, Age_EntryDate, Age_ValueDate,
"[Type_(2)]" -- or maybe only "Type_(2)"
FROM Master_Data_CashBreaks
WHERE Closing_date >= current_cate - 30;
i am able to create with the below
create or replace view cashbreaks_30days_view_latesttime
as
select a. Column details
from master_data a
where a. Closing_date >= NOW() - interval '40 days'
and a.closing_date = (select max(b.closing_date)
from master_data b
where date(b.closing_date) = date(a.closing_date));

sql query to get today new records compared with yesterday

i have this table:
COD (Integer) (PK)
ID (Varchar)
DATE (Date)
I just want to get the new ID's from today, compared with yesterday (the ID's from today that are not present yesterday)
This needs to be done with just one query, maximum efficiency because the table will have 4-5 millions records
As a java developer i am able to do this with 2 queries, but with just one is beyond my knowledge so any help would be so much appreciated
EDIT: date format is dd/mm/yyyy and every day each ID may come 0 or 1 times
Here is a solution that will go over the base data one time only. It selects the id and the date where the date is either yesterday or today (or both). Then it GROUPS BY id - each group will have either one or two rows. Then it filters by the condition that the MIN date in the group is "today". Those are the id's that exist today but did not exist yesterday.
DATE is an Oracle keyword, best not used as a column name. I changed that to DT. I also assume that your "dt" field is a pure date (as pure as it can be in Oracle, meaning: time of day, which is always present, is 00:00:00).
select id
from your_table
where dt in (trunc(sysdate), trunc(sysdate) - 1)
group by id
having min(dt) = trunc(sysdate)
;
Edit: Gordon makes a good point: perhaps you may have more than one such row per ID, in the same day? In that case the time-of-day may also be different from 00:00:00.
If so, the solution can be adapted:
select id
from your_table
where dt >= trunc(sysdate) - 1 and dt < trunc(sysdate) + 1
group by id
having min(dt) >= trunc(sysdate)
;
Either way: (1) the base table is read just once; (2) the column DT is not wrapped within any function, so if there is an index on that column, it can be used to access just the needed rows.
The typical method would use not exists:
select t.*
from t
where t.date >= trunc(sysdate) and t.date < trunc(sysdate + 1) and
not exists (select 1
from t t2
where t2.id = t.id and
t2.date >= trunc(sysdate - 1) and t2.date < trunc(sysdate)
);
This is a general solution. If you know that there is at most one record per day, there are better solutions, such as using lag().
Use MINUS. I suppose your date column has a time part, so you need to truncate it.
select id from mytable where trunc(date) = trunc(sysdate)
minus
select id from mytable where trunc(date) = trunc(sysdate) - 1;
I suggest the following function index. Without it, the query would have to full scan the table, which would probably be quite slow.
create idx on mytable( trunc(sysdate) , id );

How do you do date math that ignores the year?

I am trying to select dates that have an anniversary in the next 14 days. How can I select based on dates excluding the year? I have tried something like the following.
SELECT * FROM events
WHERE EXTRACT(month FROM "date") = 3
AND EXTRACT(day FROM "date") < EXTRACT(day FROM "date") + 14
The problem with this is that months wrap.
I would prefer to do something like this, but I don't know how to ignore the year.
SELECT * FROM events
WHERE (date > '2013-03-01' AND date < '2013-04-01')
How can I accomplish this kind of date math in Postgres?
TL/DR: use the "Black magic version" below.
All queries presented in other answers so far operate with conditions that are not sargable: they cannot use an index and have to compute an expression for every single row in the base table to find matching rows. Doesn't matter much with small tables. Matters a lot with big tables.
Given the following simple table:
CREATE TABLE event (
event_id serial PRIMARY KEY
, event_date date
);
Query
Version 1. and 2. below can use a simple index of the form:
CREATE INDEX event_event_date_idx ON event(event_date);
But all of the following solutions are even faster without index.
1. Simple version
SELECT *
FROM (
SELECT ((current_date + d) - interval '1 year' * y)::date AS event_date
FROM generate_series( 0, 14) d
CROSS JOIN generate_series(13, 113) y
) x
JOIN event USING (event_date);
Subquery x computes all possible dates over a given range of years from a CROSS JOIN of two generate_series() calls. The selection is done with the final simple join.
2. Advanced version
WITH val AS (
SELECT extract(year FROM age(current_date + 14, min(event_date)))::int AS max_y
, extract(year FROM age(current_date, max(event_date)))::int AS min_y
FROM event
)
SELECT e.*
FROM (
SELECT ((current_date + d.d) - interval '1 year' * y.y)::date AS event_date
FROM generate_series(0, 14) d
,(SELECT generate_series(min_y, max_y) AS y FROM val) y
) x
JOIN event e USING (event_date);
Range of years is deduced from the table automatically - thereby minimizing generated years.
You could go one step further and distill a list of existing years if there are gaps.
Effectiveness co-depends on the distribution of dates. It's better for few years with many rows each.
Simple db<>fiddle to play with here
Old sqlfiddle
3. Black magic version
Create a simple SQL function to calculate an integer from the pattern 'MMDD':
CREATE FUNCTION f_mmdd(date) RETURNS int LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
'SELECT (EXTRACT(month FROM $1) * 100 + EXTRACT(day FROM $1))::int';
I had to_char(time, 'MMDD') at first, but switched to the above expression which proved fastest in new tests on Postgres 9.6 and 10:
db<>fiddle here
It allows function inlining because EXTRACT(xyz FROM date) is implemented with the IMMUTABLE function date_part(text, date) internally. And it has to be IMMUTABLE to allow its use in the following essential multicolumn expression index:
CREATE INDEX event_mmdd_event_date_idx ON event(f_mmdd(event_date), event_date);
Multicolumn for a number of reasons:
Can help with ORDER BY or with selecting from given years. Read here. At almost no additional cost for the index. A date fits into the 4 bytes that would otherwise be lost to padding due to data alignment. Read here.
Also, since both index columns reference the same table column, no drawback with regard to H.O.T. updates. Read here.
Basic query:
SELECT *
FROM event e
WHERE f_mmdd(e.event_date) BETWEEN f_mmdd(current_date)
AND f_mmdd(current_date + 14);
One PL/pgSQL table function to rule them all
Fork to one of two queries to cover the turn of the year:
CREATE OR REPLACE FUNCTION f_anniversary(_the_date date = current_date, _days int = 14)
RETURNS SETOF event
LANGUAGE plpgsql AS
$func$
DECLARE
d int := f_mmdd($1);
d1 int := f_mmdd($1 + $2 - 1); -- fix off-by-1 from upper bound
BEGIN
IF d1 > d THEN
RETURN QUERY
SELECT *
FROM event e
WHERE f_mmdd(e.event_date) BETWEEN d AND d1
ORDER BY f_mmdd(e.event_date), e.event_date;
ELSE -- wrap around end of year
RETURN QUERY
SELECT *
FROM event e
WHERE f_mmdd(e.event_date) >= d OR
f_mmdd(e.event_date) <= d1
ORDER BY (f_mmdd(e.event_date) >= d) DESC, f_mmdd(e.event_date), event_date;
-- chronological across turn of the year
END IF;
END
$func$;
Call using defaults: 14 days beginning "today":
SELECT * FROM f_anniversary();
Call for 7 days beginning '2014-08-23':
SELECT * FROM f_anniversary(date '2014-08-23', 7);
db<>fiddle here - comparing EXPLAIN ANALYZE
"February 29"
When dealing with anniversaries or "birthdays", you need to define how to deal with the special case "February 29" in leap years.
When testing for ranges of dates, Feb 29 is usually included automatically, even if the current year is not a leap year. The range of days is extended by 1 retroactively when it covers this day.
On the other hand, if the current year is a leap year, and you want to look for 15 days, you may end up getting results for 14 days in leap years if your data is from non-leap years.
Say, Bob is born on the 29th of February:
My query 1. and 2. include February 29 only in leap years. Bob has birthday only every ~ 4 years.
My query 3. includes February 29 in the range. Bob has birthday every year.
There is no magical solution. You have to define what you want for every case.
Test
To substantiate my point I ran an extensive test with all the presented solutions. I adapted each of the queries to the given table and to yield identical results without ORDER BY.
The good news: all of them are correct and yield the same result - except for Gordon's query that had syntax errors, and #wildplasser's query that fails when the year wraps around (easy to fix).
Insert 108000 rows with random dates from the 20th century, which is similar to a table of living people (13 or older).
INSERT INTO event (event_date)
SELECT '2000-1-1'::date - (random() * 36525)::int
FROM generate_series (1, 108000);
Delete ~ 8 % to create some dead tuples and make the table more "real life".
DELETE FROM event WHERE random() < 0.08;
ANALYZE event;
My test case had 99289 rows, 4012 hits.
C - Catcall
WITH anniversaries as (
SELECT event_id, event_date
,(event_date + (n || ' years')::interval)::date anniversary
FROM event, generate_series(13, 113) n
)
SELECT event_id, event_date -- count(*) --
FROM anniversaries
WHERE anniversary BETWEEN current_date AND current_date + interval '14' day;
C1 - Catcall's idea rewritten
Aside from minor optimizations, the major difference is to add only the exact amount of years date_trunc('year', age(current_date + 14, event_date)) to get this year's anniversary, which avoids the need for a CTE altogether:
SELECT event_id, event_date
FROM event
WHERE (event_date + date_trunc('year', age(current_date + 14, event_date)))::date
BETWEEN current_date AND current_date + 14;
D - Daniel
SELECT * -- count(*) --
FROM event
WHERE extract(month FROM age(current_date + 14, event_date)) = 0
AND extract(day FROM age(current_date + 14, event_date)) <= 14;
E1 - Erwin 1
See "1. Simple version" above.
E2 - Erwin 2
See "2. Advanced version" above.
E3 - Erwin 3
See "3. Black magic version" above.
G - Gordon
SELECT * -- count(*)
FROM (SELECT *, to_char(event_date, 'MM-DD') AS mmdd FROM event) e
WHERE to_date(to_char(now(), 'YYYY') || '-'
|| (CASE WHEN mmdd = '02-29' THEN '02-28' ELSE mmdd END)
,'YYYY-MM-DD') BETWEEN date(now()) and date(now()) + 14;
H - a_horse_with_no_name
WITH upcoming as (
SELECT event_id, event_date
,CASE
WHEN date_trunc('year', age(event_date)) = age(event_date)
THEN current_date
ELSE cast(event_date + ((extract(year FROM age(event_date)) + 1)
* interval '1' year) AS date)
END AS next_event
FROM event
)
SELECT event_id, event_date
FROM upcoming
WHERE next_event - current_date <= 14;
W - wildplasser
CREATE OR REPLACE FUNCTION this_years_birthday(_dut date)
RETURNS date
LANGUAGE plpgsql AS
$func$
DECLARE
ret date;
BEGIN
ret := date_trunc('year' , current_timestamp)
+ (date_trunc('day' , _dut)
- date_trunc('year' , _dut));
RETURN ret;
END
$func$;
Simplified to return the same as all the others:
SELECT *
FROM event e
WHERE this_years_birthday( e.event_date::date )
BETWEEN current_date
AND current_date + '2weeks'::interval;
W1 - wildplasser's query rewritten
The above suffers from a number of inefficient details (beyond the scope of this already sizable post). The rewritten version is much faster:
CREATE OR REPLACE FUNCTION this_years_birthday(_dut INOUT date)
LANGUAGE sql AS
$func$
SELECT (date_trunc('year', now()) + ($1 - date_trunc('year', $1)))::date
$func$;
SELECT *
FROM event e
WHERE this_years_birthday(e.event_date) BETWEEN current_date
AND (current_date + 14);
Test results
I ran this test with a temporary table on PostgreSQL 9.1.7.
Results were gathered with EXPLAIN ANALYZE, best of 5.
Results
Without index
C: Total runtime: 76714.723 ms
C1: Total runtime: 307.987 ms -- !
D: Total runtime: 325.549 ms
E1: Total runtime: 253.671 ms -- !
E2: Total runtime: 484.698 ms -- min() & max() expensive without index
E3: Total runtime: 213.805 ms -- !
G: Total runtime: 984.788 ms
H: Total runtime: 977.297 ms
W: Total runtime: 2668.092 ms
W1: Total runtime: 596.849 ms -- !
With index
E1: Total runtime: 37.939 ms --!!
E2: Total runtime: 38.097 ms --!!
With index on expression
E3: Total runtime: 11.837 ms --!!
All other queries perform the same with or without index because they use non-sargable expressions.
Conclusion
So far, #Daniel's query was the fastest.
#wildplassers (rewritten) approach performs acceptably, too.
#Catcall's version is something like the reverse approach of mine. Performance gets out of hand quickly with bigger tables.
The rewritten version performs pretty well, though. The expression I use is something like a simpler version of #wildplassser's this_years_birthday() function.
My "simple version" is faster even without index, because it needs fewer computations.
With index, the "advanced version" is about as fast as the "simple version", because min() and max() become very cheap with an index. Both are substantially faster than the rest which cannot use the index.
My "black magic version" is fastest with or without index. And it is very simple to call.
The updated version (after the benchmark) is a bit faster, yet.
With a real life table an index will make even greater difference. More columns make the table bigger, and sequential scan more expensive, while the index size stays the same.
I believe the following test works in all cases, assuming a column named anniv_date:
select * from events
where extract(month from age(current_date+interval '14 days', anniv_date))=0
and extract(day from age(current_date+interval '14 days', anniv_date)) <= 14
As an example of how it works when crossing a year (and also a month), let's say an anniversary date is 2009-01-04 and the date at which the test is run is 2012-12-29.
We want to consider any date between 2012-12-29 and 2013-01-12 (14 days)
age('2013-01-12'::date, '2009-01-04'::date) is 4 years 8 days.
extract(month...) from this is 0 and extract(days...) is 8, which is lower than 14 so it matches.
How about this?
select *
from events e
where to_char(e."date", 'MM-DD') between to_char(now(), 'MM-DD') and
to_char(date(now())+14, 'MM-DD')
You can do the comparison as strings.
To take year ends into account, we'll convert back to dates:
select *
from events e
where to_date(to_char(now(), 'YYYY')||'-'||to_char(e."date", 'MM-DD'), 'YYYY-MM-DD')
between date(now()) and date(now())+14
You do need to make a slight adjustment for Feb 29. I might suggest:
select *
from (select e.*,
to_char(e."date", 'MM-DD') as MMDD
from events
) e
where to_date(to_char(now(), 'YYYY')||'-'||(case when MMDD = '02-29' then '02-28' else MMDD), 'YYYY-MM-DD')
between date(now()) and date(now())+14
For convenience, I created two functions that yield the (expected or past) birsthday in the current year, and the upcoming birthday.
CREATE OR REPLACE FUNCTION this_years_birthday( _dut DATE) RETURNS DATE AS
$func$
DECLARE
ret DATE;
BEGIN
ret =
date_trunc( 'year' , current_timestamp)
+ (date_trunc( 'day' , _dut)
- date_trunc( 'year' , _dut)
)
;
RETURN ret;
END;
$func$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION next_birthday( _dut DATE) RETURNS DATE AS
$func$
DECLARE
ret DATE;
BEGIN
ret =
date_trunc( 'year' , current_timestamp)
+ (date_trunc( 'day' , _dut)
- date_trunc( 'year' , _dut)
)
;
IF (ret < date_trunc( 'day' , current_timestamp))
THEN ret = ret + '1year'::interval; END IF;
RETURN ret;
END;
$func$ LANGUAGE plpgsql;
--
-- call the function
--
SELECT date_trunc( 'day' , t.topic_date) AS the_date
, this_years_birthday( t.topic_date::date ) AS the_day
, next_birthday( t.topic_date::date ) AS next_day
FROM topic t
WHERE this_years_birthday( t.topic_date::date )
BETWEEN current_date
AND current_date + '2weeks':: interval
;
NOTE: the casts are needed because I only had timestamps available.
This should handle wrap-arounds at the end of the year as well:
with upcoming as (
select name,
event_date,
case
when date_trunc('year', age(event_date)) = age(event_date) then current_date
else cast(event_date + ((extract(year from age(event_date)) + 1) * interval '1' year) as date)
end as next_event
from events
)
select name,
next_event,
next_event - current_date as days_until_next
from upcoming
order by next_event - current_date
You can filter than on the expression next_event - current_date to apply the "next 14 days"
The case ... is only necessary if you consider events that would be "today" as "upcoming" as well. Otherwise, that can be reduced to the else part of the case statement.
Note that I "renamed" the column "date" to event_date. Mainly because reserved words shouldn't be used as an identifier but also because date is a terrible column name. It doesn't tell you anything about what it stores.
You can generate a virtual table of anniversaries, and select from it.
with anniversaries as (
select event_date,
(event_date + (n || ' years')::interval)::date anniversary
from events, generate_series(1,10) n
)
select event_date, anniversary
from anniversaries
where anniversary between current_date and current_date + interval '14' day
order by event_date, anniversary
The call to generate_series(1,10) has the effect of generating 10 years of anniversaries for each event_date. I wouldn't use the literal value 10 in production. Instead, I'd either calculate the right number of years to use in a subquery, or I'd use a large literal like 100.
You'll want to adjust the WHERE clause to fit your application.
If you have a performance problem with the virtual table (when you have a lot of rows in "events"), replace the common table expression with a base table having the identical structure. Storing anniversaries in a base table makes their values obvious (especially for, say, Feb 29 anniversaries), and queries on such a table can use an index. Querying an anniversary table of half a million rows using just the SELECT statement above takes 25ms on my desktop.
I found a way to do it.
SELECT EXTRACT(DAYS FROM age('1999-04-10', '2003-05-12')),
EXTRACT(MONTHS FROM age('1999-04-10', '2003-05-12'));
date_part | date_part
-----------+-----------
-2 | -1
I can then just check that the month is 0 and the days are less than 14.
If you have a more elegant solution, please do post it. I'll leave the question open for a bit.
I don't work with postgresql so I googled it's date functions and found this: http://www.postgresql.org/docs/current/static/functions-datetime.html
If I read it correctly, looking for events in the next 14 days is as simple as:
where mydatefield >= current_date
and mydatefield < current_date + integer '14'
Of course I might not be reading it correctly.