Getting results between two dates in PostgreSQL - sql

I have the following table:
+-----------+-----------+------------+----------+
| id        | user_id   | start_date | end_date |
| (integer) | (integer) | (date)     | (date)   |
+-----------+-----------+------------+----------+
Fields start_date and end_date hold date values like YYYY-MM-DD.
An entry from this table can look like this: (1, 120, 2012-04-09, 2012-04-13).
I have to write a query that can fetch all the results matching a certain period.
The problem is that if I want to fetch results from 2012-01-01 to 2012-04-12, I get 0 results even though there is an entry with start_date = "2012-04-09" and end_date = "2012-04-13".

SELECT *
FROM mytable
WHERE (start_date, end_date) OVERLAPS ('2012-01-01'::DATE, '2012-04-12'::DATE);
Datetime functions is the relevant section in the docs.

Assuming you want all "overlapping" time periods, i.e. all that have at least one day in common.
Try to envision time periods on a straight time line and move them around before your eyes and you will see the necessary conditions.
SELECT *
FROM tbl
WHERE start_date <= '2012-04-12'::date
AND end_date >= '2012-01-01'::date;
This is sometimes faster for me than OVERLAPS, which is the other good way to do it (as @Marco already provided).
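A self-contained way to try this condition without creating a table. It is a sketch in PostgreSQL: the rows are inlined with VALUES, the first row is the one from the question, and the second row is hypothetical (it starts after the search window, so it should not match).

```sql
-- Inline sample rows standing in for tbl (row 2 is hypothetical).
WITH tbl(id, user_id, start_date, end_date) AS (
  VALUES (1, 120, DATE '2012-04-09', DATE '2012-04-13'),  -- from the question
         (2, 121, DATE '2012-04-13', DATE '2012-04-20')   -- starts after the window ends
)
SELECT id
FROM   tbl
WHERE  start_date <= DATE '2012-04-12'   -- starts no later than the window ends
AND    end_date   >= DATE '2012-01-01'   -- ends no earlier than the window starts
ORDER  BY id;
```

Only id 1 comes back: a period overlaps the window exactly when it starts before the window ends and ends after the window starts.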
Note the subtle difference. The manual:
OVERLAPS automatically takes the earlier value of the pair as the
start. Each time period is considered to represent the half-open
interval start <= time < end, unless start and end are equal in which
case it represents that single time instant. This means for instance
that two time periods with only an endpoint in common do not overlap.
Bold emphasis mine.
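The endpoint rule quoted above is easy to check directly. A minimal PostgreSQL sketch with literal dates (the second periods are hypothetical), comparing OVERLAPS with the inclusive <= / >= test from this answer:

```sql
SELECT
  -- Periods that only touch at 2012-04-12: half-open, so no overlap.
  (DATE '2012-04-09', DATE '2012-04-12') OVERLAPS
  (DATE '2012-04-12', DATE '2012-04-20')          AS overlaps_touching,
  -- Periods that share 2012-04-11: overlap.
  (DATE '2012-04-09', DATE '2012-04-12') OVERLAPS
  (DATE '2012-04-11', DATE '2012-04-20')          AS overlaps_shared_day,
  -- The <= / >= test treats both ends as included, so touching counts.
  (DATE '2012-04-09' <= DATE '2012-04-20'
   AND DATE '2012-04-12' >= DATE '2012-04-12')    AS inclusive_touching;
```

This should return false, true, true: the two methods only disagree when periods share nothing but an endpoint.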
Performance
For big tables the right index can help performance (a lot).
CREATE INDEX tbl_date_inverse_idx ON tbl(start_date, end_date DESC);
Possibly with another (leading) index column if you have additional selective conditions.
Note the inverse order of the two columns. See:
Optimizing queries on a range of timestamps (two columns)

I just had the same question and answered it this way, in case it helps:
select *
from mytable
where start_date between '2012-01-01' and '2012-04-13'
or end_date between '2012-01-01' and '2012-04-13'
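One case worth checking with the OR-of-BETWEEN version is a period that starts before the search window and ends after it. A quick self-contained comparison in PostgreSQL, with a hypothetical row 3 that encloses the whole window:

```sql
WITH tbl(id, start_date, end_date) AS (
  VALUES (1, DATE '2012-04-09', DATE '2012-04-13'),  -- from the question
         (3, DATE '2011-12-01', DATE '2012-05-01')   -- hypothetical: encloses the window
)
SELECT id,
       -- neither endpoint of row 3 lies inside the window
       (start_date BETWEEN '2012-01-01' AND '2012-04-13'
        OR end_date BETWEEN '2012-01-01' AND '2012-04-13') AS between_or,
       -- the start <= window end AND end >= window start test
       (start_date <= '2012-04-13' AND end_date >= '2012-01-01') AS overlap_test
FROM   tbl
ORDER  BY id;
```

Row 3 gets false for between_or but true for overlap_test, so the OR-of-BETWEEN form misses periods that fully contain the window.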

To have a query working in any locale settings, consider formatting the date yourself:
SELECT *
FROM testbed
WHERE start_date >= to_date('2012-01-01','YYYY-MM-DD')
AND end_date <= to_date('2012-04-13','YYYY-MM-DD');

Looking at the dates for which it doesn't work -- those where the day is less than or equal to 12 -- I'm wondering whether it's parsing the dates as being in YYYY-DD-MM format?

You have to compare only the date part (cast with ::date):
SELECT * FROM testbed WHERE start_date::date >= to_date('2012-09-08', 'YYYY-MM-DD') AND end_date::date <= to_date('2012-10-09', 'YYYY-MM-DD')

No offense, but to check the performance I executed some of the solutions mentioned above in PostgreSQL.
Here are the statistics for the top 3 approaches I came across:
1) Took: 1.58 ms avg
2) Took: 2.87 ms avg
3) Took: 3.95 ms avg
Now try this (ts_col is a placeholder for your date/timestamp column):
SELECT * FROM mytable WHERE date_trunc('day', ts_col) >= '2012-01-01' AND date_trunc('day', ts_col) <= '2012-04-12';
This solution took 1.61 ms on average.
The best solution is the 1st one, suggested by marco-mariani.

SELECT *
FROM ecs_table
WHERE (start_date, end_date) OVERLAPS ('2012-01-01'::DATE, '2012-04-12'::DATE + interval '1 day');

Let's try the range data types.
--sample data.
begin;
create temp table tbl(id serial, user_id integer, start_date date, end_date date);
insert into tbl(user_id, start_date, end_date) values(1, '2012-04-09', '2012-04-13');
insert into tbl(user_id, start_date, end_date) values(1, '2012-01-09', '2012-04-12');
insert into tbl(user_id, start_date, end_date) values(1, '2012-02-09', '2012-04-10');
insert into tbl(user_id, start_date, end_date) values(1, '2012-04-09', '2012-04-10');
commit;
Add a new daterange column:
begin;
alter table tbl add column tbl_period daterange ;
update tbl set tbl_period = daterange(start_date,end_date);
commit;
--now test time.
select * from tbl
where tbl_period && daterange('2012-04-10' ::date, '2012-04-12'::date);
returns:
 id | user_id | start_date |  end_date  |       tbl_period
----+---------+------------+------------+-------------------------
  1 |       1 | 2012-04-09 | 2012-04-13 | [2012-04-09,2012-04-13)
  2 |       1 | 2012-01-09 | 2012-04-12 | [2012-01-09,2012-04-12)
further reference: https://www.postgresql.org/docs/current/functions-range.html#RANGE-OPERATORS-TABLE
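Note that daterange(start_date, end_date) defaults to '[)' bounds, so end_date itself is excluded; that is why rows 3 and 4 are not returned above. If end_date is meant to be inclusive, the third argument '[]' changes that. A sketch with the same four sample rows inlined via VALUES:

```sql
WITH tbl(id, start_date, end_date) AS (
  VALUES (1, DATE '2012-04-09', DATE '2012-04-13'),
         (2, DATE '2012-01-09', DATE '2012-04-12'),
         (3, DATE '2012-02-09', DATE '2012-04-10'),
         (4, DATE '2012-04-09', DATE '2012-04-10')
)
SELECT id
FROM   tbl
-- '[]' makes both bounds inclusive, so a period ending on 2012-04-10 matches
WHERE  daterange(start_date, end_date, '[]')
    && daterange(DATE '2012-04-10', DATE '2012-04-12', '[]')
ORDER  BY id;
```

With inclusive bounds all four rows share 2012-04-10 with the search range and are returned. On a real table, a GiST index on the range column (for example CREATE INDEX ON tbl USING gist (tbl_period)) is what lets && use an index.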

Related

What is the best way to find all dates that match the date you entered?

I am trying to get all results from Oracle DB using SQL Developer by corresponding date.
My data:
ID | date_time_of_identification
--------------------------------------------
1240088696 | 22-SEP-19 06.24.23.432000000 AM
1239485087 | 21-SEP-19 09.25.45.912000000 AM
1239228398 | 21-SEP-19 07.18.40.555000000 AM
1239223300 | 21-SEP-19 07.16.39.812000000 AM
1233224199 | 18-SEP-19 10.54.04.023000000 AM
1232432331 | 18-SEP-19 05.06.40.383000000 AM
1231492850 | 17-SEP-19 01.06.05.316000000 PM
So to get all rows from 21.09.2019, I wrote:
select * from mytable where date_time_of_identification = TO_DATE('2019/09/21', 'yyyy/mm/dd'); -- no result
Then I tried a better query:
select * from mytable
where to_char(date_time_of_identification, 'yyyy/mm/dd') = to_char(TO_DATE('2019/09/21', 'yyyy/mm/dd'), 'yyyy/mm/dd');
It returns the correct result, but is there a better solution?
You'll have to truncate your date from column to lose the timestamp part:
select *
from mytable
where trunc(date_time_of_identification) = TO_DATE('2019/09/21', 'yyyy/mm/dd');
Assuming that your predicate is reasonably selective (i.e. the number of rows on a particular day is a small fraction of the number of rows in the table), you'd generally want your query to be able to use an index on date_time_of_identification. If you apply a function to that column, you won't be able to use an index. So you'd generally want to write this as
select *
from myTable
where date_time_of_identification >= date '2019-09-21'
and date_time_of_identification < date '2019-09-22'
The alternative would be to create a function-based index on date_time_of_identification and then use that function in the query.
create index fbi_myTable
on myTable( trunc( date_time_of_identification ) );
select *
from myTable
where trunc( date_time_of_identification ) = date '2019-09-21';

oracle query to extract all the dates within given periods and range

Consider a table with two columns:
TABLE TIME_FRAME
---------------------------
| FROM       | TO         |
| 2013-12-13 | 2014-01-06 |
| 2011-12-05 | 2011-12-31 |
| 2014-01-23 | 2014-02-22 |
| 2011-11-21 | 2011-12-17 |
........
FROM and TO in each row define a period of time. The periods can also overlap (here rows 2 and 4), and the parameter range can span multiple periods.
Given a start_date and an end_date as parameters, the requirement is to return all the dates that fall within the parameters and also within any of the periods in the table.
For example,
if start_date is 2013-12-25 and end_date is 2014-02-10,
then from the data above it should return all dates between
`2013-12-25` and `2014-01-06`
plus
`2014-01-23` and `2014-02-10`.
Is it possible to write a query for the above requirement (without PL/SQL)?
It's possible by generating the set of days with a CONNECT BY LEVEL row generator and then filtering this set against your table data. Here is a working Oracle SQL query for you:
select day
from (select to_date('25-DEC-2013', 'dd-mon-yyyy') - 1 + level day
from dual
connect by level <= to_date('10-FEB-2014', 'dd-mon-yyyy') -
to_date('25-DEC-2013', 'dd-mon-yyyy') + 1)
where exists
(select 1 from TIME_FRAME p where day between p."FROM" and p."TO"); -- FROM and TO are reserved words, so they must be quoted
Hope this helps!

Get average interval between pairs of rows in a table

I have a table with the following data (paypal transactions):
    txn_type    |            date            |   subscription_id
----------------+----------------------------+---------------------
 subscr_signup  | 2014-01-01 07:53:20        | S-XXX01
 subscr_signup  | 2014-01-05 10:37:26        | S-XXX02
 subscr_signup  | 2014-01-08 08:54:00        | S-XXX03
 subscr_eot     | 2014-03-01 08:53:57        | S-XXX01
 subscr_eot     | 2014-03-05 08:58:02        | S-XXX02
I want to get the average subscription length overall for a given time period (subscr_eot is the end of a subscription). A subscription that is still ongoing ('S-XXX03') should be included in the average from its start date until now.
How would I go about doing this with an SQL statement in Postgres?
SQL Fiddle. Subscription length for each subscription:
select
subscription_id,
coalesce(t2.date, current_timestamp) - t1.date as subscription_length
from
(
select *
from t
where txn_type = 'subscr_signup'
) t1
left join
(
select *
from t
where txn_type = 'subscr_eot'
) t2 using (subscription_id)
order by t1.subscription_id
The average:
select
avg(coalesce(t2.date, current_timestamp) - t1.date) as subscription_length_avg
from
(
select *
from t
where txn_type = 'subscr_signup'
) t1
left join
(
select *
from t
where txn_type = 'subscr_eot'
) t2 using (subscription_id)
I used a couple of common table expressions; you can take the pieces apart pretty easily to see what they do.
One of the reasons this SQL is complicated is because you're storing column names as data. (subscr_signup and subscr_eot are actually column names, not data.) This is a SQL anti-pattern; expect it to cause you much pain.
with subscription_dates as (
select
p1.subscription_id,
p1.date as subscr_start,
coalesce((select min(p2.date)
from paypal_transactions p2
where p2.subscription_id = p1.subscription_id
and p2.txn_type = 'subscr_eot'
and p2.date > p1.date), current_date) as subscr_end
from paypal_transactions p1
where txn_type = 'subscr_signup'
), subscription_days as (
select subscription_id, subscr_start, subscr_end, (subscr_end - subscr_start) + 1 as subscr_days
from subscription_dates
)
select avg(subscr_days) as avg_days
from subscription_days
-- add your date range here.
avg_days
--
75.6666666666666667
I didn't add your date range as a WHERE clause, because it's not clear to me what you mean by "a given time period".
Using the window function lag(), this becomes considerably shorter:
SELECT avg(ts_end - ts) AS avg_subscr
FROM (
SELECT txn_type, ts, lag(ts, 1, localtimestamp)
OVER (PARTITION BY subscription_id ORDER BY txn_type) AS ts_end
FROM t
) sub
WHERE txn_type = 'subscr_signup';
SQL Fiddle.
lag() conveniently takes a default value for missing rows. Exactly what we need here, so we don't need COALESCE in addition.
The query builds on the fact that subscr_eot sorts before subscr_signup.
Probably faster than the alternatives presented so far, because it only needs a single sequential scan, even though the window functions add some cost.
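To see what lag() returns row by row, here is a sketch with the sample rows from the question inlined via VALUES. The fixed default timestamp '2014-03-10 00:00:00' is a hypothetical stand-in for localtimestamp, used only so that the output is reproducible:

```sql
WITH t(txn_type, ts, subscription_id) AS (
  VALUES ('subscr_signup', timestamp '2014-01-01 07:53:20', 'S-XXX01'),
         ('subscr_signup', timestamp '2014-01-05 10:37:26', 'S-XXX02'),
         ('subscr_signup', timestamp '2014-01-08 08:54:00', 'S-XXX03'),
         ('subscr_eot',    timestamp '2014-03-01 08:53:57', 'S-XXX01'),
         ('subscr_eot',    timestamp '2014-03-05 08:58:02', 'S-XXX02')
)
SELECT subscription_id, txn_type, ts,
       -- 'subscr_eot' sorts first, so the signup row's "previous" row is its eot row;
       -- with no eot row, lag() falls back to the default value
       lag(ts, 1, timestamp '2014-03-10 00:00:00')
         OVER (PARTITION BY subscription_id ORDER BY txn_type) AS ts_end
FROM   t
ORDER  BY subscription_id, txn_type;
```

For the signup rows, ts_end is the matching eot timestamp for S-XXX01 and S-XXX02 and the default for the still-open S-XXX03. The answer's query then averages ts_end - ts over exactly those signup rows.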
Using the column ts instead of date for three reasons:
Your "date" is actually a timestamp.
"date" is a reserved word in standard SQL (even if it's allowed in Postgres).
Never use basic type names as identifiers.
Using localtimestamp instead of now() or current_timestamp since you are obviously operating with timestamp [without time zone].
Also, your columns txn_type and subscription_id should not be text.
Maybe an enum for txn_type and integer for subscription_id. That would make table and indexes considerably smaller and faster.
For the query at hand, the whole table has to be read and indexes won't help, except for a covering index in Postgres 9.2+, if you need the read performance:
CREATE INDEX t_foo_idx ON t (subscription_id, txn_type, ts);

Generate Dates starting from a date returned by a condition - SQL

A series of dates with a specified interval can be generated using a variable and a static date, as per the linked question I asked earlier. However, when there's a WHERE clause to produce the start date, the date generation seems to stop and only shows the first interval date. I also checked other posts; those I found (e.g. 1, e.g. 2, e.g. 3) use a static date or a CTE. I am looking for a solution without stored procedures/functions...
This works:
SELECT DATE(DATE_ADD('2012-01-12',
INTERVAL @i:=@i+30 DAY) ) AS dateO
FROM members, (SELECT @i:=0) r
where @i < DATEDIFF(now(), date '2012-01-12')
;
These don't:
SELECT DATE_ADD(date '2012-01-12',
INTERVAL @j:=@j+30 DAY) AS dateO, @j
FROM `members`, (SELECT @j:=0) s
where @j <= DATEDIFF(now(), date '2012-01-12')
and mmid = 100
;
SELECT DATE_ADD(stdate,
INTERVAL @k:=@k+30 DAY) AS dateO, @k
FROM `members`, (SELECT @k:=0) t
where @k <= DATEDIFF(now(), stdate)
and mmid = 100
;
SQLFIDDLE REFERENCE
Expected Results:
The same as the first query's results, except that date generation starts from the stdate of mmid = 100.
Preferably in ANSI SQL so it can be supported in MySQL and SQL Server/MS Access SQL, since Oracle has trunc and rownum (as in this query with 14 votes) and PostgreSQL has the generate_series function. I would also like to know: is this a bug or a limitation in MySQL?
PS: I have asked a similar question before. It was based on static date values, whereas this one is based on a date value from a table column, selected by a condition.
The simplest way to ensure cross-platform compatibility is to use a calendar table. In its simplest form
create table calendar (
cal_date date primary key
);
insert into calendar values
('2013-01-01'),
('2013-01-02'); -- etc.
There are many ways to generate dates for insertion.
Instead of using a WHERE clause to generate rows, you use a WHERE clause to select rows. To select October of this year, just
select cal_date
from calendar
where cal_date between '2013-10-01' and '2013-10-31';
It's reasonably compact--365,000 rows to cover a period of 1000 years. That ought to cover most business scenarios.
If you need cross-platform date arithmetic, you can add a tally column.
drop table calendar;
create table calendar (
cal_date date primary key,
tally integer not null unique check (tally > 0)
);
insert into calendar values ('2012-01-01', 1); -- etc.
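The inserts are left as "-- etc." above. One way to fill the table in PostgreSQL (not cross-platform, but it only has to run once) is generate_series plus row_number() for a gap-free tally. A sketch for the year 2012, created as a temp table so it can be tried without touching a real schema, followed by the 30-day query from this answer:

```sql
CREATE TEMP TABLE calendar (
  cal_date date PRIMARY KEY,
  tally    integer NOT NULL UNIQUE CHECK (tally > 0)
);

-- One row per day of 2012; row_number() numbers the days 1, 2, 3, ... without gaps.
INSERT INTO calendar (cal_date, tally)
SELECT d::date,
       row_number() OVER (ORDER BY d)
FROM   generate_series(timestamp '2012-01-01', timestamp '2012-12-31',
                       interval '1 day') AS g(d);

-- Every 30th day starting on 2012-01-12.
SELECT cal_date
FROM   calendar
WHERE  ((tally - (SELECT tally FROM calendar WHERE cal_date = '2012-01-12')) % 30) = 0
ORDER  BY cal_date;
```

2012 is a leap year, so the table gets 366 rows, and the last query returns the twelve dates listed below, from 2012-01-12 to 2012-12-07.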
To select all the dates of 30-day intervals, starting on 2012-01-12 and ending at the end of the calendar year, use
select cal_date
from calendar
where ((tally - (select tally
from calendar
where cal_date = '2012-01-12')) % 30 ) = 0;
cal_date
--
2012-01-12
2012-02-11
2012-03-12
2012-04-11
2012-05-11
2012-06-10
2012-07-10
2012-08-09
2012-09-08
2012-10-08
2012-11-07
2012-12-07
If your "mmid" column is guaranteed to have no gaps--an unspoken requirement for a calendar table--you can use the "mmid" column in place of my "tally" column.

Object with starts_on and ends_on best query suggestion

I need to select from a table that has a starts_on field and an ends_on field.
I need to pass start date and end date for filtering and retrieving those objects.
At the moment it's working and I use the following:
SELECT * FROM ***
WHERE ((starts_on >= START_DATE AND starts_on <= END_DATE) OR
(ends_on >= START_DATE AND ends_on <= END_DATE) OR
(starts_on <= END_DATE AND ends_on >= END_DATE))
ORDER BY starts_on, id
It looks a bit messy, but I can't see an easy way to simplify it. Any ideas?
I'm using postgres 9.1 as dbms.
Edit:
starts_on | timestamp without time zone |
ends_on | timestamp without time zone |
Ex: if one entry has starts_on = '2012/02/02' and ends_on = '2012/02/05', I want the following behavior:
if I filter by start date 2012/01/01 and end date 2012/03/01 I want the item to be returned
if I filter by start date 2012/02/04 and end date 2012/03/01 I want the item to be returned
if I filter by start date 2012/02/05 and end date 2012/03/01 I want the item to be returned
if I filter by start date 2012/02/04 and end date 2012/02/04 I want the item to be returned
if I filter by start date 2012/02/06 and end date 2012/03/01 I want the item to NOT be returned
if I filter by start date 2012/01/01 and end date 2012/02/01 I want the item to NOT be returned
Query
If you want all rows where the time period between starts_on and ends_on overlaps with the passed time period of START_DATE and END_DATE, and "end" is always later than "start", and all involved columns are of type timestamp (as opposed to time or date), this simpler query does the job:
SELECT *
FROM tbl
WHERE starts_on <= END_DATE
AND ends_on >= START_DATE
ORDER BY starts_on, id;
Fits the question as later clarified.
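A quick way to check that claim is to evaluate the predicate for the six clarified cases in one query. This is a sketch in PostgreSQL, with the entry and the six filter pairs from the question inlined via VALUES (the names item, filters, case_no and expected are only for illustration):

```sql
WITH item(starts_on, ends_on) AS (
  VALUES (timestamp '2012-02-02', timestamp '2012-02-05')
), filters(case_no, start_date, end_date, expected) AS (
  VALUES (1, timestamp '2012-01-01', timestamp '2012-03-01', true),
         (2, timestamp '2012-02-04', timestamp '2012-03-01', true),
         (3, timestamp '2012-02-05', timestamp '2012-03-01', true),
         (4, timestamp '2012-02-04', timestamp '2012-02-04', true),
         (5, timestamp '2012-02-06', timestamp '2012-03-01', false),
         (6, timestamp '2012-01-01', timestamp '2012-02-01', false)
)
SELECT f.case_no,
       f.expected,
       -- the simplified condition: starts_on <= END_DATE AND ends_on >= START_DATE
       (i.starts_on <= f.end_date AND i.ends_on >= f.start_date) AS returned
FROM   filters f
CROSS  JOIN item i
ORDER  BY f.case_no;
```

The returned column matches expected in every row, including case 3, where the filter starts exactly on ends_on.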
Index
The best index for this query would be a multi-column index like:
CREATE INDEX tbl_range_idx ON tbl (starts_on, ends_on DESC);
Would work with DESC / ASC almost as well, because an index can be searched in both directions almost equally well.
How do I figure?
The index is searched on the first condition, starts_on <= END_DATE; qualifying rows are at the beginning.
From there, Postgres can take all rows that end late enough according to ends_on >= START_DATE. Qualifying rows come first. Optimal index.
But don't just take my word - test performance with EXPLAIN ANALYZE. Run a couple of times to exclude caching effects.
There is also the OVERLAPS operator for the same purpose. Simplifies the logic, but isn't superior otherwise.
And there are the new range types in PostgreSQL 9.2, with their own operators. Not for 9.1 though.
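One detail matters for the clarified requirements: OVERLAPS uses half-open periods, so it disagrees with the <= / >= query on case 3 (the filter starting exactly on ends_on). A range type with inclusive '[]' bounds keeps that case. A sketch, PostgreSQL 9.2+ only, with the example entry inlined:

```sql
WITH item(starts_on, ends_on) AS (
  VALUES (timestamp '2012-02-02', timestamp '2012-02-05')
)
SELECT
  -- Case 3: filter 2012-02-05 .. 2012-03-01 touches the item only at ends_on.
  (starts_on, ends_on) OVERLAPS
  (timestamp '2012-02-05', timestamp '2012-03-01')                   AS overlaps_op,
  tsrange(starts_on, ends_on, '[]')
  && tsrange(timestamp '2012-02-05', timestamp '2012-03-01', '[]')   AS inclusive_range
FROM item;
```

This should return false for overlaps_op and true for inclusive_range. To index the range version, a GiST expression index such as CREATE INDEX ON tbl USING gist (tsrange(starts_on, ends_on, '[]')) should be usable for the && operator, as long as the query repeats the same expression.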