Count full months between two dates - sql

I've been working on this for a few hours with no luck and have hit a wall. My data looks like this:
Date1 Date2
2012-05-06 2012-05-05
2012-03-20 2012-01-05
What I'm trying to do is add 1 to the count for every month between two dates. So my output would ideally look like this:
Year Month Sum
2012 2 1
In other words, it should check for "empty" months between two dates and add 1 to them.
This is the code I've worked out so far. It will basically count the number of months between the two dates and group them into months and years.
SELECT
EXTRACT(YEAR FROM Date2::date) as "Year",
EXTRACT(MONTH FROM Date2::date) as "Month",
SUM(DATE_PART('year', Date1::date) - DATE_PART('year', Date2::date)) * 12 +
(DATE_PART('month', Date1::date) - DATE_PART('month', Date2::date))
FROM
test
GROUP BY
"Year",
"Month"
ORDER BY
"Year" DESC,
"Month" DESC;
This is where I'm stuck - I don't know how to actually add 1 for each of the "empty" months.

Test setup
With some sample rows (should be provided in the question):
CREATE TABLE test (
test_id serial PRIMARY KEY
, date1 date NOT NULL
, date2 date NOT NULL
);
INSERT INTO test(date1, date2)
VALUES
('2012-03-20', '2012-01-05') -- 2012-02 lies in between
, ('2012-01-20', '2012-03-05') -- 2012-02 (reversed)
, ('2012-05-06', '2012-05-05') -- nothing
, ('2012-05-01', '2012-06-30') -- still nothing
, ('2012-08-20', '2012-11-05') -- 2012-09 - 2012-10
, ('2012-11-20', '2013-03-05') -- 2012-12 - 2013-02
;
Postgres 9.3 or newer
Use a LATERAL join:
SELECT to_char(mon, 'YYYY') AS year
, to_char(mon, 'MM') AS month
, count(*) AS ct
FROM (
SELECT date_trunc('mon', least(date1, date2)::timestamp) + interval '1 mon' AS d1
, date_trunc('mon', greatest(date1, date2)::timestamp) - interval '1 mon' AS d2
FROM test
) sub1
, generate_series(d1, d2, interval '1 month') mon -- implicit CROSS JOIN LATERAL
WHERE d2 >= d1 -- exclude ranges without gap right away
GROUP BY mon
ORDER BY mon;
What is the difference between LATERAL and a subquery in PostgreSQL?
Postgres 9.2 or older
No LATERAL, yet. Use a subquery instead:
SELECT to_char(mon, 'YYYY') AS year
, to_char(mon, 'MM') AS month
, count(*) AS ct
FROM (
SELECT generate_series(d1, d2, interval '1 month') AS mon
FROM (
SELECT date_trunc('mon', least(date1, date2)::timestamp) + interval '1 mon' AS d1
, date_trunc('mon', greatest(date1, date2)::timestamp) - interval '1 mon' AS d2
FROM test
) sub1
WHERE d2 >= d1 -- exclude ranges without gap right away
) sub2
GROUP BY mon
ORDER BY mon;
Result
 year | month | ct
------+-------+----
 2012 | 02    |  2
 2012 | 09    |  1
 2012 | 10    |  1
 2012 | 12    |  1
 2013 | 01    |  1
 2013 | 02    |  1
Explanation
You are looking for complete calendar months between the two dates.
These queries work with any dates or timestamps in ascending or descending order and should perform well.
The WHERE clause is optional, since generate_series() returns no row if start > end. But it should be a bit faster to exclude empty ranges a priori.
The cast to timestamp makes it a bit cleaner and faster. Rationale:
Generating time series between two dates in PostgreSQL
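To cross-check the result outside the database, here is a minimal Python sketch of the same idea: for each pair of dates, list the complete calendar months strictly between them, then count per month. The function name full_months_between and the month-index trick are my own illustration, not part of the queries above; the rows are the sample rows from the test setup.

```python
from collections import Counter
from datetime import date


def full_months_between(a: date, b: date) -> list[tuple[int, int]]:
    """Return (year, month) for every complete calendar month strictly between a and b."""
    lo, hi = min(a, b), max(a, b)          # works for either order, like least()/greatest()
    # Running month index: year * 12 + (month - 1).
    first = lo.year * 12 + lo.month        # the month after the earlier date's month
    last = hi.year * 12 + hi.month - 2     # the month before the later date's month
    return [(i // 12, i % 12 + 1) for i in range(first, last + 1)]


rows = [
    (date(2012, 3, 20), date(2012, 1, 5)),
    (date(2012, 1, 20), date(2012, 3, 5)),
    (date(2012, 5, 6), date(2012, 5, 5)),
    (date(2012, 5, 1), date(2012, 6, 30)),
    (date(2012, 8, 20), date(2012, 11, 5)),
    (date(2012, 11, 20), date(2013, 3, 5)),
]

counts = Counter(m for d1, d2 in rows for m in full_months_between(d1, d2))
for (year, month), ct in sorted(counts.items()):
    print(year, f"{month:02d}", ct)
```

An empty range simply produces an empty list, which plays the same role as generate_series() returning no row when start > end.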

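AFAIK you can simply subtract or add timestamps in PostgreSQL:
'2001-06-27 14:43:21'::timestamp - '2001-06-27 14:33:21'::timestamp = '00:10:00'::interval
Note that plain subtraction yields an interval made of days and time only, so its month field is always 0. To read months, go through age(), which normalizes the difference into years, months and days. So in your case that part of the query should look like
DATE_PART('month', age(Date1::timestamp, Date2::timestamp)) as "MonthInterval"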

age(timestamp1, timestamp2) => returns interval
Then we extract the year and month from the interval and combine them:
select extract(year from age(timestamp1, timestamp2))*12 + extract(month from
age(timestamp1, timestamp2))
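For comparison, a rough Python sketch of what that expression computes for plain dates in ascending order. This is an approximation written for illustration (the name whole_months is made up), not a reimplementation of age(). Note that it counts elapsed months rather than complete calendar months, so it answers a slightly different question than the one asked.

```python
from datetime import date


def whole_months(later: date, earlier: date) -> int:
    """Whole months elapsed from earlier to later (assumes later >= earlier)."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    # A month only counts once the same day of month has been reached again.
    if later.day < earlier.day:
        months -= 1
    return months


print(whole_months(date(2012, 3, 20), date(2012, 1, 5)))   # 2 elapsed months, but only February is a full calendar month
print(whole_months(date(2012, 5, 6), date(2012, 5, 5)))    # 0
```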

Related

Rewrite PostgreSQL query using CTE:

I have the following code to pull records from a daterange in PostgreSQL, it works as intended. The "end date" is determined by the "date" column from the last record, and the "start date" is calculated by subtracting a 7-day interval from the "end date".
SELECT date
FROM files
WHERE daterange((
(SELECT date FROM files ORDER BY date DESC LIMIT 1) - interval '7 day')::date, -- "start date"
(SELECT date FROM files ORDER BY date DESC LIMIT 1)::date, -- "end date"
'(]') @> date::date
ORDER BY date ASC
I'm trying to rewrite this query using CTEs, so I can replace those subqueries with values such as end_date and start_date. Is this possible using this method or should I look for other alternatives like variables? I'm still learning SQL.
WITH end_date AS
(
SELECT date FROM files ORDER BY date DESC LIMIT 1
),
start_date AS
(
SELECT date FROM end_date - INTERVAL '7 day'
)
SELECT date
FROM files
WHERE daterange(
start_date::date,
end_date::date,
'(]') @> date::date
ORDER BY date ASC
Right now I'm getting the following error:
ERROR: syntax error at or near "-"
LINE 7: SELECT date FROM end_date - INTERVAL '7 day'
You do not need two CTEs; one is enough, and you can join it to filter the data.
WITH RECURSIVE files AS (
SELECT CURRENT_DATE date, 1 some_value
UNION ALL
SELECT (date + interval '1 day')::date, some_value + 1 FROM files
WHERE date < (CURRENT_DATE + interval '1 month')::date
),
dates AS (
SELECT
(MAX(date) - interval '7 day')::date from_date,
MAX(date) to_date
FROM files
)
SELECT f.* FROM files f
JOIN dates d ON daterange(d.from_date, d.to_date, '(]') @> f.date
You can even build the daterange directly in the CTE and use it later, like this:
WITH dates AS (
SELECT
daterange((MAX(date) - interval '7 day')::date, MAX(date), '(]') AS range
FROM files
)
SELECT f.* FROM files f
JOIN dates d ON d.range @> f.date
In the first query, the recursive CTE files is only there to generate some sample data.
Both queries return all rows of files dated within the last week, excluding from_date and including to_date:
date       | some_value
-----------+-----------
2022-09-26 | 25
2022-09-27 | 26
2022-09-28 | 27
2022-09-29 | 28
2022-09-30 | 29
2022-10-01 | 30
2022-10-02 | 31
I think this is what you want:
WITH end_date AS
(
SELECT date FROM files ORDER BY date DESC LIMIT 1
),
start_date AS
(
SELECT date - INTERVAL '7 day' as date
FROM end_date
)
SELECT F.date, S.date startDate, E.date endDate
FROM files F
JOIN start_date S on F.date >= S.date
JOIN end_date E on F.date <= E.date
ORDER BY date ASC;
I hope I'm not repeating anything, but if I understand your problem correctly I think this will work:
with cte as (
select max (date)::date as max_date from files
)
select date
from files
cross join cte
where date >= max_date - 7
Or perhaps even:
select date
from files
where date >= (select max (date)::date - 7 from files)
Since you have already determined that the CTE has the max date, there is really no need to further bound it with a between, <= or range. You can simply say anything after that date minus 7 days.
The error in your code above is because you want this:
SELECT date - INTERVAL '7 day' as date FROM end_date
And not this:
SELECT date FROM end_date - INTERVAL '7 day'
You are subtracting from the table, which doesn't make sense.
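If it helps to see the filter logic outside SQL, here is a minimal Python sketch of the same thing: take the latest date, go back 7 days, and keep everything in the half-open range (start, end], matching the '(]' bounds of the daterange. The function name last_week and the sample data are assumptions for illustration.

```python
from datetime import date, timedelta


def last_week(dates: list[date]) -> list[date]:
    """Dates within 7 days of the latest one: start excluded, end included."""
    end = max(dates)
    start = end - timedelta(days=7)
    return sorted(d for d in dates if start < d <= end)


# Hypothetical sample: one row per day from 2022-09-01 to 2022-10-02.
files = [date(2022, 9, 1) + timedelta(days=i) for i in range(32)]
for d in last_week(files):
    print(d)
```

Because the end date is the maximum, the upper bound never filters anything out, which is why the simpler "date >= max_date - 7" variants above also work (they include the start date, though).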

Get the number of remaining days after excluding date ranges in a table

create table test (start date ,"end" date);
insert into test values
('2019-05-05','2019-05-10')
,('2019-05-25','2019-06-10')
,('2019-07-05','2019-07-10')
;
I am looking for the following output. The person is present only on the days from start to end of each row. For May, for example, he is present for 11 days (05/05 to 05/10 and 05/25 to 05/31), and May has 31 days in total, so the output column should show 31 - 11, that is, the days in the month minus the days he worked.
MonthDate  | Days
-----------+-----------
2019-05-01 | 20 (31-11)
2019-06-01 | 20 (30-10)
2019-07-01 | 26 (31-5)
I get slightly different results.
But the idea is to generate every date, then filter out the ones that are used and aggregate (with the columns named startd and endd):
select date_trunc('month', dte) as yyyymm,
count(*) filter (where t.startd is null) as available_days
from (select generate_series(date_trunc('month', min(startd)), date_trunc('month', max(endd)) + interval '1 month - 1 day', interval '1 day') dte
from test
) d left join
test t
on d.dte between t.startd and t.endd
group by date_trunc('month', dte)
order by date_trunc('month', dte);
The free days in May are: 1, 2, 3, 4, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24.
I am counting 18 of these. So, I believe the results from this query.
If you do not want to include the end date (which is contrary to your description using "between"), then the on logic would be:
on d.dte >= t.startd and
d.dte < t.endd
But that would only get you up to 19 in May.
Your results are inconsistent. I decided to go with inclusive bounds for the simplest solution:
SELECT date_trunc('month', d)::date, count(*)
FROM (
SELECT generate_series(timestamp '2019-05-01', timestamp '2019-07-31', interval '1 day') d
EXCEPT ALL
SELECT generate_series(start_date::timestamp, end_date::timestamp, interval '1 day') x
FROM test
) sub
GROUP BY date_trunc('month', d);
date_trunc | count
-----------+------
2019-05-01 | 18
2019-06-01 | 20
2019-07-01 | 25
This generates all days of a given time frame (May to July of the year in your case) and excludes the days generated from all your date ranges.
Assuming at least Postgres 10.
What is the expected behaviour for multiple set-returning functions in SELECT clause?
Assuming data type date in your table. I cast to timestamp for best results. See:
Generating time series between two dates in PostgreSQL
Aside: don't use the reserved words start and end as identifiers.
Related:
Select rows which are not present in other table
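The counts can be double-checked with a small Python sketch of the same approach: build the set of busy days from all ranges (inclusive bounds), walk every day of the time frame, and count the days that are not busy per month. The helper name days is made up for this example, and the set-based check assumes the ranges do not overlap, as in the sample data.

```python
from collections import Counter
from datetime import date, timedelta


def days(start: date, end: date):
    """Yield every date from start to end, both inclusive."""
    for offset in range((end - start).days + 1):
        yield start + timedelta(days=offset)


busy_ranges = [
    (date(2019, 5, 5), date(2019, 5, 10)),
    (date(2019, 5, 25), date(2019, 6, 10)),
    (date(2019, 7, 5), date(2019, 7, 10)),
]
busy = {d for start, end in busy_ranges for d in days(start, end)}

free_per_month = Counter(
    (d.year, d.month)
    for d in days(date(2019, 5, 1), date(2019, 7, 31))
    if d not in busy
)
for (year, month), ct in sorted(free_per_month.items()):
    print(f"{year}-{month:02d}-01", ct)
```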

Customizing the range of a week with date_trunc

I've been trying for hours now to write a date_trunc statement to be used in a group by where my week starts on a Friday and ends the following Thursday.
So something like
SELECT
DATE_TRUNC(...) sales_week,
SUM(sales) sales
FROM table
GROUP BY 1
ORDER BY 1 DESC
Which would return the results for the last complete week (by those standards) as 09-13-2019.
You can subtract 4 days and then add 4 days:
SELECT DATE_TRUNC('week', <whatever> - INTERVAL '4 DAY') + INTERVAL '4 DAY' as sales_week,
SUM(sales) as sales
FROM table
GROUP BY 1
ORDER BY 1 DESC
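The shift trick is easier to trust once you see the arithmetic. Here is a minimal Python sketch that maps any date to the Friday that starts its Friday-to-Thursday week. It uses modular arithmetic on the weekday instead of date_trunc, so it is an equivalent illustration rather than a translation, and sales_week_start is a made-up name.

```python
from datetime import date, timedelta

FRIDAY = 4  # date.weekday(): Monday is 0, Sunday is 6


def sales_week_start(d: date) -> date:
    """Return the most recent Friday on or before d."""
    return d - timedelta(days=(d.weekday() - FRIDAY) % 7)


print(sales_week_start(date(2019, 9, 19)))  # a Thursday, still in the week that began 2019-09-13
print(sales_week_start(date(2019, 9, 20)))  # a Friday, starts a new week
```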
The expression
select current_date - cast(cast(7 - (5 - extract(dow from current_date)) as text) || ' days' as interval);
should always give you the date of a previous Friday (between 2 and 8 days back, depending on the current day of the week).
If by any chance you have gaps in the data (maybe with more granular breakdowns than just per week), you can generate a set of custom weeks and left join to that:
drop table if exists sales_weeks;
create table sales_weeks as
with
dates as (
select generate_series('2019-01-01'::date,current_date,interval '1 day')::date as date
)
,week_ids as (
select
date
,sum(case when extract('dow' from date)=5 then 1 else 0 end) over (order by date) as week_id
from dates
)
select
week_id
,min(date) as week_start_date
,max(date) as week_end_date
from week_ids
group by 1
order by 1
;

Weekly total sums

I have a table in a PostgreSQL database containing dates and a total count per day.
mydate total
2012-05-12 12
2012-05-14 8
2012-05-13 4
2012-05-12 12
2012-05-15 2
2012-05-17 1
2012-05-18 1
2012-05-21 1
2012-05-25 1
Now I need to get the weekly totals for a given date range.
Ex. I want to get the weekly totals from 2012-05-01 up to 2012-05-31.
I'm looking at this output:
2012-05-01 2012-05-07 0
2012-05-08 2012-05-14 36
2012-05-15 2012-05-22 5
2012-05-23 2012-05-29 1
2012-05-30 2012-05-31 0
This works for any given date range:
CREATE FUNCTION f_tbl_weekly_sumtotals(_range_start date, _range_end date)
RETURNS TABLE (week_start date, week_end date, sum_total bigint)
LANGUAGE sql AS
$func$
SELECT w.week_start, w.week_end, COALESCE(sum(t.total), 0)
FROM (
SELECT week_start::date, LEAST(week_start::date + 6, _range_end) AS week_end
FROM generate_series(_range_start::timestamp
, _range_end::timestamp
, interval '1 week') week_start
) w
LEFT JOIN tbl t ON t.mydate BETWEEN w.week_start and w.week_end
GROUP BY w.week_start, w.week_end
ORDER BY w.week_start
$func$;
Call:
SELECT * FROM f_tbl_weekly_sumtotals('2012-05-01', '2012-05-31');
Major points
I wrapped it in a function for convenience, so the date range has to be provided once only.
The subquery w produces the series of weeks starting from the first day of the given date range. The upper bound is capped with LEAST to stay within the upper bound of the given date range.
Then LEFT JOIN to the data table (tbl in my example) to keep all weeks in the result, even where no data rows are found.
The rest should be obvious. COALESCE to output 0 instead of NULL for empty weeks.
Data types have to match, I assumed mydate date and total int for lack of information. (The sum() of an int is bigint.)
Explanation for my particular use of generate_series():
Generating time series between two dates in PostgreSQL
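For anyone who wants to verify the expected numbers, here is a minimal Python sketch of the same bucketing logic: step through the range one week at a time, cap the last week at the end of the range, and sum the totals that fall inside each week. The function name weekly_totals is an assumption for illustration; the rows are the sample data from the question.

```python
from datetime import date, timedelta


def weekly_totals(rows, range_start: date, range_end: date):
    """Return (week_start, week_end, total) for each 7-day slice of the range."""
    result = []
    week_start = range_start
    while week_start <= range_end:
        # Cap the final week at the end of the range, like LEAST() in the query.
        week_end = min(week_start + timedelta(days=6), range_end)
        total = sum(t for d, t in rows if week_start <= d <= week_end)
        result.append((week_start, week_end, total))  # weeks without rows keep a total of 0
        week_start += timedelta(days=7)
    return result


rows = [
    (date(2012, 5, 12), 12), (date(2012, 5, 14), 8), (date(2012, 5, 13), 4),
    (date(2012, 5, 12), 12), (date(2012, 5, 15), 2), (date(2012, 5, 17), 1),
    (date(2012, 5, 18), 1), (date(2012, 5, 21), 1), (date(2012, 5, 25), 1),
]
for week in weekly_totals(rows, date(2012, 5, 1), date(2012, 5, 31)):
    print(*week)
```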
Using this function
CREATE OR REPLACE FUNCTION last_day(date)
RETURNS date AS
$$
SELECT (date_trunc('MONTH', $1) + INTERVAL '1 MONTH - 1 day')::date;
$$ LANGUAGE 'sql' IMMUTABLE STRICT;
and generate_series() (available from 8.4 onwards), we can create the date partitions:
SELECT wk.wk_start,
CAST(
CASE (extract(month from wk.wk_start) = extract(month from wk.wk_start + interval '6 days'))
WHEN true THEN wk.wk_start + interval '6 days'
ELSE last_day(wk.wk_start)
END
AS date) AS wk_end
FROM
(SELECT CAST(generate_series('2012-05-01'::date,'2012-05-31'::date,interval '1 week') AS date) AS wk_start) AS wk;
Then putting it together with the data
CREATE TABLE my_tab(mydate date,total integer);
INSERT INTO my_tab
values
('2012-05-12'::date,12),
('2012-05-14'::date,8),
('2012-05-13'::date,4),
('2012-05-12'::date,12),
('2012-05-15'::date,2),
('2012-05-17'::date,1),
('2012-05-18'::date,1),
('2012-05-21'::date,1),
('2012-05-25'::date,1);
WITH month_by_week AS
(SELECT wk.wk_start,
CAST(
CASE (extract(month from wk.wk_start) = extract(month from wk.wk_start + interval '6 days'))
WHEN true THEN wk.wk_start + interval '6 days'
ELSE last_day(wk.wk_start)
END
AS date) AS wk_end
FROM
(SELECT CAST(generate_series('2012-05-01'::date,'2012-05-31'::date,interval '1 week') AS date) AS wk_start) AS wk
)
SELECT month_by_week.wk_start,
month_by_week.wk_end,
SUM(COALESCE(mt.total,0))
FROM month_by_week
LEFT JOIN my_tab mt ON mt.mydate BETWEEN month_by_week.wk_start AND month_by_week.wk_end
GROUP BY month_by_week.wk_start,
month_by_week.wk_end
ORDER BY month_by_week.wk_start;

Postgresql generate_series of months

I'm trying to generate a series in PostgreSQL with the generate_series function. I need a series of months starting from Jan 2008 until current month + 12 (a year out). I'm using and restricted to PostgreSQL 8.3.14 (so I don't have the timestamp series options in 8.4).
I know how to get a series of days like:
select generate_series(0,365) + date '2008-01-01'
But I am not sure how to do months.
select DATE '2008-01-01' + (interval '1' month * generate_series(0,11))
Edit
If you need to calculate the number dynamically, the following could help:
select DATE '2008-01-01' + (interval '1' month * generate_series(0,month_count::int))
from (
select extract(year from diff) * 12 + extract(month from diff) + 12 as month_count
from (
select age(current_timestamp, TIMESTAMP '2008-01-01 00:00:00') as diff
) td
) t
This calculates the number of months since 2008-01-01 and then adds 12 on top of it.
But I agree with Scott: you should put this into a set returning function, so that you can do something like select * from calc_months(DATE '2008-01-01')
You can multiply an interval by the values of an integer generate_series like this:
SELECT date '2014-02-01' + interval '1' month * s.a AS date
FROM generate_series(0,3,1) AS s(a);
Which would result in:
date
---------------------
2014-02-01 00:00:00
2014-03-01 00:00:00
2014-04-01 00:00:00
2014-05-01 00:00:00
(4 rows)
You can also join in other tables this way:
SELECT date '2014-02-01' + interval '1' month * s.a AS date, t.date, t.id
FROM generate_series(0,3,1) AS s(a)
LEFT JOIN <other table> t ON t.date=date '2014-02-01' + interval '1' month * s.a;
You can also pass an interval step directly to generate_series like this:
SELECT TO_CHAR(months, 'YYYY-MM') AS "dateMonth"
FROM generate_series(
'2008-01-01' :: DATE,
'2008-06-01' :: DATE ,
'1 month'
) AS months
Which would result in:
dateMonth
-----------
2008-01
2008-02
2008-03
2008-04
2008-05
2008-06
(6 rows)
Well, if you only need months, you could do:
select extract(month from days)
from(
select generate_series(0,365) + date'2008-01-01' as days
)dates
group by 1
order by 1;
and just parse that into a date string...
But since you know you'll end up with months 1,2,..,12, why not just go with select generate_series(1,12);?
In generate_series() you can define the step, which is one month in your case. So you can set the starting date (e.g. 2008-01-01), the ending date (e.g. 2008-01-01 + 12 months) and the step (e.g. 1 month) dynamically.
SELECT generate_series('2008-01-01', '2008-01-01'::date + interval '12 month', '1 month')::date AS generated_dates
and you get
1/1/2008
2/1/2008
3/1/2008
4/1/2008
5/1/2008
6/1/2008
7/1/2008
8/1/2008
9/1/2008
10/1/2008
11/1/2008
12/1/2008
1/1/2009
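As a sanity check on the month arithmetic, here is a minimal Python sketch that builds the same kind of series: the first day of consecutive months, from a start month through the current month plus 12. Both function names are made up for this example, and today is passed in explicitly so the result is reproducible.

```python
from datetime import date


def month_series(start: date, count: int) -> list[date]:
    """First day of `count` consecutive months, beginning with start's month."""
    base = start.year * 12 + start.month - 1   # running month index
    return [date((base + i) // 12, (base + i) % 12 + 1, 1) for i in range(count)]


def months_through_next_year(start: date, today: date) -> list[date]:
    """Months from start's month up to and including today's month + 12."""
    elapsed = (today.year - start.year) * 12 + (today.month - start.month)
    return month_series(start, elapsed + 12 + 1)   # + 1 because both ends are included


for d in month_series(date(2008, 1, 1), 13):
    print(d)
```

The first call reproduces the 13 rows above, from 2008-01-01 to 2009-01-01.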