Weekly total sums - sql

I have a table in a PostgreSQL database containing dates and a total count per day.
mydate total
2012-05-12 12
2012-05-14 8
2012-05-13 4
2012-05-12 12
2012-05-15 2
2012-05-17 1
2012-05-18 1
2012-05-21 1
2012-05-25 1
Now I need to get the weekly totals for a given date range.
For example, I want to get the weekly totals from 2012-05-01 up to 2012-05-31.
I'm looking for this output:
2012-05-01 2012-05-07 0
2012-05-08 2012-05-14 36
2012-05-15 2012-05-21 5
2012-05-22 2012-05-28 1
2012-05-29 2012-05-31 0

This works for any given date range:
CREATE FUNCTION f_tbl_weekly_sumtotals(_range_start date, _range_end date)
RETURNS TABLE (week_start date, week_end date, sum_total bigint)
LANGUAGE sql AS
$func$
SELECT w.week_start, w.week_end, COALESCE(sum(t.total), 0)
FROM (
SELECT week_start::date, LEAST(week_start::date + 6, _range_end) AS week_end
FROM generate_series(_range_start::timestamp
, _range_end::timestamp
, interval '1 week') week_start
) w
LEFT JOIN tbl t ON t.mydate BETWEEN w.week_start and w.week_end
GROUP BY w.week_start, w.week_end
ORDER BY w.week_start
$func$;
Call:
SELECT * FROM f_tbl_weekly_sumtotals('2012-05-01', '2012-05-31');
Major points
I wrapped it in a function for convenience, so the date range only has to be provided once.
The subquery w produces the series of weeks, starting from the first day of the given date range. The end of each week is capped with LEAST so it never goes past the end of the given date range.
Then LEFT JOIN to the data table (tbl in my example) to keep all weeks in the result, even where no data rows are found.
The rest should be obvious: COALESCE outputs 0 instead of NULL for empty weeks.
Data types have to match. For lack of information, I assumed mydate is date and total is int. (The sum() of an int is bigint.)
Explanation for my particular use of generate_series():
Generating time series between two dates in PostgreSQL
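To sanity-check the bucketing outside the database, here is a minimal Python sketch that mirrors the query: 7-day weeks starting at the range start, the last week capped at the range end, and 0 for weeks without rows. The function name `weekly_totals` and the list-of-tuples input are assumptions for illustration, not part of the answer.

```python
from datetime import date, timedelta

def weekly_totals(rows, range_start, range_end):
    """Mirror f_tbl_weekly_sumtotals: (week_start, week_end, sum_total) per week."""
    result = []
    week_start = range_start
    while week_start <= range_end:
        # Like LEAST(week_start + 6, _range_end): never run past the range end.
        week_end = min(week_start + timedelta(days=6), range_end)
        # Like the LEFT JOIN ... BETWEEN plus COALESCE(sum(...), 0):
        # an empty week sums to 0.
        total = sum(t for d, t in rows if week_start <= d <= week_end)
        result.append((week_start, week_end, total))
        week_start += timedelta(weeks=1)
    return result

rows = [
    (date(2012, 5, 12), 12), (date(2012, 5, 14), 8), (date(2012, 5, 13), 4),
    (date(2012, 5, 12), 12), (date(2012, 5, 15), 2), (date(2012, 5, 17), 1),
    (date(2012, 5, 18), 1), (date(2012, 5, 21), 1), (date(2012, 5, 25), 1),
]
for week in weekly_totals(rows, date(2012, 5, 1), date(2012, 5, 31)):
    print(*week)
```

The loop walks the weeks in order, just like ORDER BY w.week_start, so the output lines up with the SQL result.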

Using this function
CREATE OR REPLACE FUNCTION last_day(date)
RETURNS date AS
$$
SELECT (date_trunc('MONTH', $1::timestamp) + INTERVAL '1 MONTH - 1 day')::date;  -- timestamp (not timestamptz) keeps date_trunc() immutable
$$ LANGUAGE sql IMMUTABLE STRICT;
and generate_series() (PostgreSQL 8.4 or later), we can create the date partitions:
SELECT wk.wk_start,
CAST(
CASE (extract(month from wk.wk_start) = extract(month from wk.wk_start + interval '6 days'))
WHEN true THEN wk.wk_start + interval '6 days'
ELSE last_day(wk.wk_start)
END
AS date) AS wk_end
FROM
(SELECT CAST(generate_series('2012-05-01'::date,'2012-05-31'::date,interval '1 week') AS date) AS wk_start) AS wk;
Then put it together with the data:
CREATE TABLE my_tab(mydate date,total integer);
INSERT INTO my_tab
values
('2012-05-12'::date,12),
('2012-05-14'::date,8),
('2012-05-13'::date,4),
('2012-05-12'::date,12),
('2012-05-15'::date,2),
('2012-05-17'::date,1),
('2012-05-18'::date,1),
('2012-05-21'::date,1),
('2012-05-25'::date,1);
WITH month_by_week AS
(SELECT wk.wk_start,
CAST(
CASE (extract(month from wk.wk_start) = extract(month from wk.wk_start + interval '6 days'))
WHEN true THEN wk.wk_start + interval '6 days'
ELSE last_day(wk.wk_start)
END
AS date) AS wk_end
FROM
(SELECT CAST(generate_series('2012-05-01'::date,'2012-05-31'::date,interval '1 week') AS date) AS wk_start) AS wk
)
SELECT month_by_week.wk_start,
month_by_week.wk_end,
SUM(COALESCE(mt.total,0))
FROM month_by_week
LEFT JOIN my_tab mt ON mt.mydate BETWEEN month_by_week.wk_start AND month_by_week.wk_end
GROUP BY month_by_week.wk_start,
month_by_week.wk_end
ORDER BY month_by_week.wk_start;
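The CASE expression only differs from a plain wk_start + 6 days when a week would spill into the next month. Here is a small Python sketch of the same rule. It reimplements last_day by hand (first day of the following month, minus one day), on the assumption that this matches the SQL helper. Like the SQL, it compares only the month numbers.

```python
from datetime import date, timedelta

def last_day(d):
    # First day of the following month, minus one day.
    first_next = date(d.year + d.month // 12, d.month % 12 + 1, 1)
    return first_next - timedelta(days=1)

def week_end(wk_start):
    # Keep the 7-day week unless it crosses into the next month.
    candidate = wk_start + timedelta(days=6)
    return candidate if candidate.month == wk_start.month else last_day(wk_start)

# Week starts as produced by generate_series('2012-05-01', '2012-05-31', '1 week').
starts = [date(2012, 5, 1) + timedelta(weeks=i) for i in range(5)]
for s in starts:
    print(s, week_end(s))
```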

Related

Customizing the range of a week with date_trunc

I've been trying for hours now to write a date_trunc statement to be used in a group by where my week starts on a Friday and ends the following Thursday.
So something like
SELECT
DATE_TRUNC(...) sales_week,
SUM(sales) sales
FROM table
GROUP BY 1
ORDER BY 1 DESC
Which would return the results for the last complete week (by those standards) as 09-13-2019.
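date_trunc('week', ...) truncates to Monday, and Friday is 4 days after Monday. So subtract 4 days, truncate, and then add the 4 days back (where <whatever> is your date column):
SELECT DATE_TRUNC('week', <whatever> - INTERVAL '4 DAY') + INTERVAL '4 DAY' as sales_week,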
SUM(sales) as sales
FROM table
GROUP BY 1
ORDER BY 1 DESC
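Here is a Python sketch of the same shift, truncate, shift arithmetic. It assumes that snapping back to Monday with Python's weekday() (Monday = 0) stands in for date_trunc('week', ...), and the function name is hypothetical.

```python
from datetime import date, timedelta

def friday_week_start(d):
    # Shift back 4 days so every Friday lands on a Monday.
    shifted = d - timedelta(days=4)
    # Equivalent of date_trunc('week', ...): snap back to Monday.
    monday = shifted - timedelta(days=shifted.weekday())
    # Shift forward again: Monday + 4 days is the Friday that starts the week.
    return monday + timedelta(days=4)

for d in [date(2019, 9, 13), date(2019, 9, 16), date(2019, 9, 19), date(2019, 9, 20)]:
    print(d, d.strftime("%a"), "->", friday_week_start(d))
```

Every date from Friday 2019-09-13 through Thursday 2019-09-19 maps to 2019-09-13, which is the Friday-to-Thursday grouping the question asks for.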
The expression
select current_date - ((extract(dow from current_date)::int + 1) % 7 + 1);
should always give you the previous Friday's date (strictly before today, so on a Friday it returns the Friday one week earlier).
If you might have gaps in the data (or need more granular breakdowns than just per week), you can generate a set of custom weeks and LEFT JOIN to that:
drop table if exists sales_weeks;
create table sales_weeks as
with
dates as (
select generate_series('2019-01-01'::date,current_date,interval '1 day')::date as date
)
,week_ids as (
select
date
,sum(case when extract('dow' from date)=5 then 1 else 0 end) over (order by date) as week_id
from dates
)
select
week_id
,min(date) as week_start_date
,max(date) as week_end_date
from week_ids
group by 1
order by 1
;
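The window function gives each day a running count of the Fridays seen so far. Every Friday therefore opens a new week_id, and days before the first Friday land in week 0. Here is a Python sketch of the same idea. The function name mirrors the table, and the short date range is an assumption for illustration.

```python
from datetime import date, timedelta
from itertools import groupby

def sales_weeks(start, end):
    """(week_id, week_start_date, week_end_date) rows, like the sales_weeks table."""
    labelled = []
    week_id = 0
    for i in range((end - start).days + 1):
        d = start + timedelta(days=i)
        # Running count of Fridays so far (PostgreSQL dow 5, Python weekday() 4).
        if d.weekday() == 4:
            week_id += 1
        labelled.append((week_id, d))
    result = []
    # week_id never decreases, so groupby sees each week as one contiguous run.
    for wid, group in groupby(labelled, key=lambda pair: pair[0]):
        days = [d for _, d in group]
        result.append((wid, min(days), max(days)))
    return result

for row in sales_weeks(date(2019, 1, 1), date(2019, 1, 17)):
    print(*row)
```

2019-01-01 is a Tuesday, so week 0 is the partial week from Jan 1 to Jan 3, just as in the SQL version.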

Search for holiday doy IN (string) (Postgresql)

This works:
WITH month AS (
SELECT date_part('doy',d.dt) as doy,
dt::date as date
FROM generate_series('2017-01-01','2017-01-15', interval '1 day') as d(dt)
)
SELECT date,
CASE
WHEN doy IN (1,2,3) THEN 0 ELSE 8 END
FROM month
http://sqlfiddle.com/#!15/aed15/10
But if I store 1,2,3 as a string
CREATE TABLE holidays
(id int4,days character(60));
INSERT INTO holidays
(id,days)
VALUES
('2017','1,2,3');
...and replace 1,2,3 with this string:
WITH month AS (
SELECT date_part('doy',d.dt) as doy,
dt::date as date
FROM generate_series('2017-01-01','2017-01-15', interval '1 day') as d(dt)
)
SELECT date, days,
CASE
WHEN doy::text IN (days) THEN 0 ELSE 8 END
FROM month
LEFT JOIN holidays ON id=2017
http://sqlfiddle.com/#!15/aed15/13
It seems that days is not cast correctly, but I cannot figure out how to fix it.
The shortest solution here is to turn the string list into an array and use the ANY construct:
WITH month AS (
SELECT date_part('doy',d.dt) as doy,
dt::date as date
FROM generate_series('2017-01-01','2017-01-15', interval '1 day') as d(dt)
)
SELECT date, days,
CASE
WHEN doy::text = ANY(concat('{',days,'}')::text[]) THEN 0 ELSE 8 END
FROM month
LEFT JOIN holidays ON id=2017
But I would rethink the whole solution, as storing the days as a comma-separated string feels wrong.
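Here is a Python sketch of what the ANY version does: it turns the stored list into a set of strings and compares each day-of-year as text. The function name and the space-padded input (simulating the character(60) column) are assumptions for illustration.

```python
from datetime import date, timedelta

def hours_per_day(start, end, days_csv):
    # Like concat('{', days, '}')::text[]: split the list; whitespace around
    # unquoted array elements is ignored, so strip() the padding here.
    holidays = {part.strip() for part in days_csv.split(",")}
    result = []
    d = start
    while d <= end:
        doy = d.timetuple().tm_yday  # date_part('doy', ...)
        # Like doy::text = ANY(...): compare as text, 0 hours on holidays.
        result.append((d, 0 if str(doy) in holidays else 8))
        d += timedelta(days=1)
    return result

# character(60) pads the stored value with spaces; ljust() simulates that.
for d, hours in hours_per_day(date(2017, 1, 1), date(2017, 1, 15), "1,2,3".ljust(60)):
    print(d, hours)
```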

Get dates of a day of week in a date range

I need a function in PostgreSQL that accepts a date range and returns the dates inside the date range that are Mondays. Does anybody have an idea how this could be done?
create function f(dr daterange)
returns setof date as $$
select d::date
from generate_series(
lower(dr), upper(dr), interval '1 day'
) s (d)
where
extract(dow from d) = 1 and
d::date <@ dr;
$$ language sql;
select f(daterange('2014-01-01', '2014-01-20'));
f
------------
2014-01-06
2014-01-13
The most efficient way should be to find the first Monday and generate a series in steps of 7 days:
CREATE OR REPLACE FUNCTION f_mondays(dr daterange)
RETURNS TABLE (day date) AS
$func$
SELECT generate_series(a + (8 - EXTRACT(ISODOW FROM a)::int) % 7
, z
, interval '7 days')::date
FROM (
SELECT CASE WHEN lower_inc(dr) THEN lower(dr) ELSE lower(dr) + 1 END AS a
, CASE WHEN upper_inc(dr) THEN upper(dr) ELSE upper(dr) - 1 END AS z
) sub
$func$ LANGUAGE sql;
The subquery extracts start (a) and end (z) of the range, adjusted for inclusive and exclusive bounds with range functions.
The expression (8 - EXTRACT(ISODOW FROM a)::int) % 7 returns the number of days until the next Monday, or 0 if a is already a Monday. See the manual about EXTRACT().
generate_series() can step by any given interval (7 days in this case). The result is a timestamp, so we cast it to date.
This generates only Mondays in the range, so no WHERE clause is needed.
Call:
SELECT day FROM f_mondays('[2014-04-14,2014-05-02)'::daterange);
Returns:
day
----------
2014-04-14
2014-04-21
2014-04-28
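Here is a Python sketch of the same arithmetic, which you can check without a database. The bounds are passed as plain dates plus inclusive flags, which is an assumption standing in for daterange. The defaults correspond to daterange's canonical '[)' form.

```python
from datetime import date, timedelta

def mondays(lower, upper, lower_inc=True, upper_inc=False):
    # Adjust for exclusive bounds, like lower_inc()/upper_inc() in the SQL.
    a = lower if lower_inc else lower + timedelta(days=1)
    z = upper if upper_inc else upper - timedelta(days=1)
    # (8 - ISODOW) % 7 = days until the next Monday; 0 if a is a Monday.
    # Python's isoweekday() uses the same numbering (Monday = 1 .. Sunday = 7).
    day = a + timedelta(days=(8 - a.isoweekday()) % 7)
    result = []
    while day <= z:  # step in 7-day increments, like generate_series(..., '7 days')
        result.append(day)
        day += timedelta(days=7)
    return result

print(mondays(date(2014, 4, 14), date(2014, 5, 2)))
```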

Count full months between two dates

I've been working on this for a few hours with no luck and have hit a wall. My data looks like this:
Date1 Date2
2012-05-06 2012-05-05
2012-03-20 2012-01-05
What I'm trying to do is add 1 to the count for every month between two dates. So my output would ideally look like this:
Year Month Sum
2012 2 1
In other words, it should check for "empty" months between two dates and add 1 to them.
This is the code I've worked out so far. It will basically count the number of months between the two dates and group them into months and years.
SELECT
EXTRACT(YEAR FROM Date2::date) as "Year",
EXTRACT(MONTH FROM Date2::date) as "Month",
SUM((DATE_PART('year', Date1::date) - DATE_PART('year', Date2::date)) * 12 +
(DATE_PART('month', Date1::date) - DATE_PART('month', Date2::date)))
FROM
test
GROUP BY
"Year",
"Month"
ORDER BY
"Year" DESC,
"Month" DESC;
This is where I'm stuck - I don't know how to actually add 1 for each of the "empty" months.
Test setup
With some sample rows (should be provided in the question):
CREATE TABLE test (
test_id serial PRIMARY KEY
, date1 date NOT NULL
, date2 date NOT NULL
);
INSERT INTO test(date1, date2)
VALUES
('2012-03-20', '2012-01-05') -- 2012-02 lies in between
, ('2012-01-20', '2012-03-05') -- 2012-02 (reversed)
, ('2012-05-06', '2012-05-05') -- nothing
, ('2012-05-01', '2012-06-30') -- still nothing
, ('2012-08-20', '2012-11-05') -- 2012-09 - 2012-10
, ('2012-11-20', '2013-03-05') -- 2012-12 - 2013-02
;
Postgres 9.3 or newer
Use a LATERAL join:
SELECT to_char(mon, 'YYYY') AS year
, to_char(mon, 'MM') AS month
, count(*) AS ct
FROM (
SELECT date_trunc('mon', least(date1, date2)::timestamp) + interval '1 mon' AS d1
, date_trunc('mon', greatest(date1, date2)::timestamp) - interval '1 mon' AS d2
FROM test
) sub1
, generate_series(d1, d2, interval '1 month') mon -- implicit CROSS JOIN LATERAL
WHERE d2 >= d1 -- exclude ranges without gap right away
GROUP BY mon
ORDER BY mon;
Related: What is the difference between LATERAL and a subquery in PostgreSQL?
Postgres 9.2 or older
No LATERAL, yet. Use a subquery instead:
SELECT to_char(mon, 'YYYY') AS year
, to_char(mon, 'MM') AS month
, count(*) AS ct
FROM (
SELECT generate_series(d1, d2, interval '1 month') AS mon
FROM (
SELECT date_trunc('mon', least(date1, date2)::timestamp) + interval '1 mon' AS d1
, date_trunc('mon', greatest(date1, date2)::timestamp) - interval '1 mon' AS d2
FROM test
) sub1
WHERE d2 >= d1 -- exclude ranges without gap right away
) sub2
GROUP BY mon
ORDER BY mon;
Result
year | month | ct
------+-------+----
2012 | 02 | 2
2012 | 09 | 1
2012 | 10 | 1
2012 | 12 | 1
2013 | 01 | 1
2013 | 02 | 1
Explanation
You are looking for complete calendar months between the two dates.
These queries work with any dates or timestamps in ascending or descending order and should perform well.
The WHERE clause is optional, since generate_series() returns no rows if start > end. But it should be a bit faster to exclude empty ranges a priori.
The cast to timestamp makes it a bit cleaner and faster. Rationale:
Generating time series between two dates in PostgreSQL
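Here is a Python sketch of the same idea, assuming each row is a pair of plain dates. It numbers months as year * 12 + month - 1, starts one month after the earlier date's month, and stops one month before the later date's month. The function name full_months is hypothetical.

```python
from collections import Counter
from datetime import date

def full_months(pairs):
    """Count, per (year, month), how many date pairs fully contain that month."""
    counts = Counter()
    for d1, d2 in pairs:
        lo, hi = min(d1, d2), max(d1, d2)  # least() / greatest()
        # First full month after lo, last full month before hi (as month indexes).
        first = lo.year * 12 + (lo.month - 1) + 1
        last = hi.year * 12 + (hi.month - 1) - 1
        for m in range(first, last + 1):  # empty when last < first
            counts[(m // 12, m % 12 + 1)] += 1
    return counts

pairs = [
    (date(2012, 3, 20), date(2012, 1, 5)),
    (date(2012, 1, 20), date(2012, 3, 5)),
    (date(2012, 5, 6), date(2012, 5, 5)),
    (date(2012, 5, 1), date(2012, 6, 30)),
    (date(2012, 8, 20), date(2012, 11, 5)),
    (date(2012, 11, 20), date(2013, 3, 5)),
]
for (year, month), ct in sorted(full_months(pairs).items()):
    print(year, f"{month:02d}", ct)
```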
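AFAIK you can simply subtract/add dates in PostgreSQL:
'2001-06-27 14:43:21'::timestamp - '2001-06-27 14:33:21'::timestamp = '00:10:00'::interval
So in your case that part of the query would look like
DATE_PART('month', Date1::timestamp - Date2::timestamp) as "MonthInterval"
Beware, though: subtracting two timestamps yields an interval in days and hours (e.g. '75 days'), never months, so this DATE_PART('month', ...) returns 0. age() keeps the month component.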
age(timestamp1, timestamp2) returns an interval.
Then extract the years and months from that interval and combine them:
select extract(year from age(timestamp1, timestamp2)) * 12
     + extract(month from age(timestamp1, timestamp2))
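As a rough cross-check of what that expression yields, here is a Python sketch of the whole-month part of age(). It assumes both arguments are plain dates (midnight) and that the first is not earlier than the second, and the function name is hypothetical. It counts elapsed months, which is not the same as the full calendar months the question asks about. For example, '2012-03-20' minus '2012-01-05' gives 2 here, but only February lies fully in between.

```python
from datetime import date

def months_from_age(later, earlier):
    """Whole months in age(later, earlier), for plain dates with later >= earlier."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    # age() borrows a month when the day of month has not been reached yet.
    if later.day < earlier.day:
        months -= 1
    return months

print(months_from_age(date(2012, 3, 20), date(2012, 1, 5)))
```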

Postgresql generate_series of months

I'm trying to generate a series in PostgreSQL with the generate_series function. I need a series of months starting from Jan 2008 until current month + 12 (a year out). I'm restricted to PostgreSQL 8.3.14, so I don't have the timestamp variants of generate_series() that were added in 8.4.
I know how to get a series of days like:
select generate_series(0,365) + date '2008-01-01'
But I am not sure how to do months.
select DATE '2008-01-01' + (interval '1' month * generate_series(0,11))
Edit
If you need to calculate the number dynamically, the following could help:
select DATE '2008-01-01' + (interval '1' month * generate_series(0,month_count::int))
from (
select extract(year from diff) * 12 + extract(month from diff) + 12 as month_count
from (
select age(current_timestamp, TIMESTAMP '2008-01-01 00:00:00') as diff
) td
) t
This calculates the number of months since 2008-01-01 and then adds 12 on top of it.
But I agree with Scott: you should put this into a set-returning function, so that you can do something like select * from calc_months(DATE '2008-01-01').
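Here is a Python sketch of the same computation, useful for checking the arithmetic. It counts the elapsed months, adds 12, and steps one month at a time. The function name month_series is hypothetical. It assumes the start date is the 1st of a month, like DATE '2008-01-01', so the elapsed whole months are simply the year and month differences.

```python
from datetime import date

def month_series(start, today):
    """First-of-month dates from start through today's month + 12."""
    # Whole months elapsed since start (what age() yields here), plus 12.
    month_count = (today.year - start.year) * 12 + (today.month - start.month) + 12
    base = start.year * 12 + start.month - 1
    # generate_series(0, month_count) is inclusive, hence month_count + 1 items.
    return [
        date((base + i) // 12, (base + i) % 12 + 1, 1)
        for i in range(month_count + 1)
    ]

series = month_series(date(2008, 1, 1), date.today())
print(series[0], series[-1], len(series))
```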
You can combine generate_series() with an interval like this:
SELECT date '2014-02-01' + interval '1' month * s.a AS date
FROM generate_series(0,3,1) AS s(a);
Which would result in:
date
---------------------
2014-02-01 00:00:00
2014-03-01 00:00:00
2014-04-01 00:00:00
2014-05-01 00:00:00
(4 rows)
You can also join in other tables this way:
SELECT date '2014-02-01' + interval '1' month * s.a AS date, t.date, t.id
FROM generate_series(0,3,1) AS s(a)
LEFT JOIN <other table> t ON t.date=date '2014-02-01' + interval '1' month * s.a;
In PostgreSQL 8.4 or later, you can also pass dates and an interval step directly to generate_series():
SELECT TO_CHAR(months, 'YYYY-MM') AS "dateMonth"
FROM generate_series(
'2008-01-01' :: DATE,
'2008-06-01' :: DATE ,
'1 month'
) AS months
Which would result in:
dateMonth
-----------
2008-01
2008-02
2008-03
2008-04
2008-05
2008-06
(6 rows)
Well, if you only need months, you could do:
select extract(month from days)
from(
select generate_series(0,365) + date'2008-01-01' as days
)dates
group by 1
order by 1;
and just parse that into a date string...
But since you know you'll end up with months 1,2,..,12, why not just go with select generate_series(1,12);?
In generate_series() you can define the step, which is one month in your case. So you can dynamically define the starting date (e.g. 2008-01-01), the ending date (e.g. 2008-01-01 + 12 months) and the step (e.g. 1 month). This needs PostgreSQL 8.4 or later.
SELECT generate_series('2008-01-01', '2008-01-01'::date + interval '12 month', '1 month')::date AS generated_dates
and you get
1/1/2008
2/1/2008
3/1/2008
4/1/2008
5/1/2008
6/1/2008
7/1/2008
8/1/2008
9/1/2008
10/1/2008
11/1/2008
12/1/2008
1/1/2009