Weeks between two dates - sql

I'm attempting to turn two dates into a series of records. One record for each week between the dates.
Additionally the original start and end dates should be used to clip the week in case the range starts or ends mid-week. I'm also assuming that a week starts on Monday.
With a start date of: 05/09/2018 and an end date of 27/09/2018 I would like to retrieve the following results:
| # | Start Date | End date |
|---------------------------------|
| 0 | '05/09/2018' | '09/09/2018' |
| 1 | '10/09/2018' | '16/09/2018' |
| 2 | '17/09/2018' | '23/09/2018' |
| 3 | '24/09/2018' | '27/09/2018' |
I have made some progress - at the moment I can get the total number of weeks between the date range with:
SELECT (
EXTRACT(
days FROM (
date_trunc('week', to_date('27/09/2018', 'DD/MM/YYYY')) -
date_trunc('week', to_date('05/09/2018', 'DD/MM/YYYY'))
) / 7
) + 1
) as total_weeks;
Total weeks will return 4 for the above SQL. This is where I'm stuck, going from an integer to actual set of results.

Window functions are your friend:
SELECT week_num,
min(d) AS start_date,
max(d) AS end_date
FROM (SELECT d,
count(*) FILTER (WHERE new_week) OVER (ORDER BY d) AS week_num
FROM (SELECT DATE '2018-09-05' + i AS d,
extract(dow FROM DATE '2018-09-05'
+ lag(i) OVER (ORDER BY i)
) = 1 AS new_week
FROM generate_series(0, DATE '2018-09-27' - DATE '2018-09-05') AS i
) AS week_days
) AS weeks
GROUP BY week_num
ORDER BY week_num;
week_num | start_date | end_date
----------+------------+------------
0 | 2018-09-05 | 2018-09-09
1 | 2018-09-10 | 2018-09-16
2 | 2018-09-17 | 2018-09-23
3 | 2018-09-24 | 2018-09-27
(4 rows)

Use generate_series():
select gs.*
from generate_series(date_trunc('week', '2018-09-05'::date),
'2018-09-27'::date,
interval '1 week'
) gs(dte)

Ultimately I expanded on Gordon's solution to get to the following; however, Laurenz's answer is slightly more concise.
select
(
case when (week_start - interval '6 days' <= date_trunc('week', '2018-09-05'::date)) then '2018-09-05'::date else week_start end
) as start_date,
(
case when (week_start + interval '6 days' >= '2018-09-27'::date) then '2018-09-27'::date else week_start + interval '6 days' end
) as end_date
from generate_series(
date_trunc('week', '2018-09-05'::date),
'2018-09-27'::date,
interval '1 week'
) gs(week_start);
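The clipping logic can also be cross-checked outside the database. The following is a minimal Python sketch, not part of any of the answers above; the function name `clipped_weeks` is made up for illustration, and it assumes Monday-based weeks as in the question.

```python
from datetime import date, timedelta

def clipped_weeks(start, end):
    """Yield (week_num, week_start, week_end) for Monday-based weeks,
    clipped to the inclusive range [start, end]."""
    # Monday of the week containing `start` (date.weekday(): Monday == 0).
    monday = start - timedelta(days=start.weekday())
    week_num = 0
    while monday <= end:
        sunday = monday + timedelta(days=6)
        # Clip the first and last week to the requested range.
        yield week_num, max(monday, start), min(sunday, end)
        monday += timedelta(days=7)
        week_num += 1

rows = list(clipped_weeks(date(2018, 9, 5), date(2018, 9, 27)))
for week_num, week_start, week_end in rows:
    print(week_num, week_start, week_end)
```

For the sample range this yields the same four rows as the expected output in the question.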

Related

SQL window function over 31 days but including only one first day of the month

I want to calculate the avg or sum of a metric over the past 31 days to use in a visualization.
"metric" varies every day, but it always jumps on the first day of each calendar month.
The problem is that some months (like November) are 30 days long, so a 31-day window can include two first-of-month days, as happens on the first of December (run the query below and check the row at 2021-12-01T00:00:00Z).
I need avg(metric) over the past 30 days if the window contains two first-of-month days, and over the past 31 days otherwise.
with days as(
select
'2021-10-01' :: timestamptz + (d || ' day') ::interval as "day"
from generate_series(0, 100) d
)
, daily_metrics as (
select
"day"
-- in reality "metric" fluctuates every day. But it jumps on the first day of the months
, case when extract(day from "day") = 1 then 300 else 100 end :: float as metric
from days
)
, result as (
select
"day"
, avg(metric) over (order by "day" rows between 30 preceding and current row) as metric_roll_avg
from daily_metrics
)
select * from result
where "day" > '2021-10-01' :: timestamptz + '30 day' :: interval
This is what I ended up doing:
First create a calendar view with one row per calendar day:
with
calendar as (
select c."day"
, extract(day from c."day") = 1 and extract(day from date_trunc('month', c."day") - interval '1 day') = 30 as last_month_30
, extract(day from c."day") in (1, 2) and extract(day from date_trunc('month', c."day") - interval '1 day') = 29 as last_month_29
, extract(day from c."day") in (1, 2, 3) and extract(day from date_trunc('month', c."day") - interval '1 day') = 28 as last_month_28
from ... as c ...
)
Which returns this view:
+------------+---------------+---------------+---------------+
| day | last_month_30 | last_month_29 | last_month_28 |
|------------+---------------+---------------+---------------|
| 2022-02-26 | False | False | False |
| 2022-02-27 | False | False | False |
| 2022-02-28 | False | False | False |
| 2022-03-01 | False | False | True |
| 2022-03-02 | False | False | True |
+------------+---------------+---------------+---------------+
And using a case switch:
select
"day"
, case
when last_month_30
then avg(revenue) over (order by "day" rows between 29 preceding and current row )
when last_month_29
then avg(revenue) over (order by "day" rows between 28 preceding and current row)
when last_month_28
then avg(revenue) over (order by "day" rows between 27 preceding and current row)
else avg(revenue) over (order by "day" rows between 30 preceding and current row )
end as monthly_revenue
from ...
Probably not the cleanest way, but it works for all cases.
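To sanity-check the three flags, the same rule can be written as a small Python sketch. The names `window_rows` and `firsts_in_window` are hypothetical, and the sketch assumes one row per calendar day. It folds the three flags into one condition: a 31-day window reaches back to the previous month's first day only when the day of the month is at most 31 minus the previous month's length.

```python
from datetime import date, timedelta

def window_rows(day):
    """Number of rows (current row included) that the CASE expression averages over."""
    # Length of the previous month: step back one day from the 1st of this month.
    prev_len = (day.replace(day=1) - timedelta(days=1)).day
    # Equivalent to last_month_30 / last_month_29 / last_month_28:
    # shrink the window to the previous month's length in those cases.
    if day.day <= 31 - prev_len:
        return prev_len
    return 31

def firsts_in_window(day):
    """Count first-of-month days inside the window ending at `day`."""
    n = window_rows(day)
    return sum(1 for k in range(n) if (day - timedelta(days=k)).day == 1)

print(window_rows(date(2021, 12, 1)))       # 30 rows: 29 preceding + current
print(firsts_in_window(date(2021, 12, 1)))  # only 1 December falls in the window
```

Running `firsts_in_window` over a few years of dates is a quick way to confirm that no window contains two month starts.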

Count days per month from days off table

I have a table which stores a person, the start of their holiday, and the end of their holiday.
From it I need to count how many working days per month each person was on holiday, so I want to partition this table by month.
To get public holidays I'm using: https://github.com/christopherthompson81/pgsql_holidays
Let's assume the table holds one person only, with just start/stop dates.
create table data (id int, start date, stop date);
This is the network_days function I wrote:
CREATE OR REPLACE FUNCTION network_days(start_date date , stop_date date) RETURNS bigint AS $$
SELECT count(*) FROM
generate_series(start_date , stop_date - interval '1 minute' , interval '1 day') the_day
WHERE
extract('ISODOW' FROM the_day) < 6 AND the_day NOT IN (
SELECT datestamp::timestamptz FROM holidays_poland (extract(year FROM start_date)::int, extract(year FROM stop_date)::int))
$$
LANGUAGE sql
STABLE;
and I created a function with a query like:
--$2 = 2020
SELECT
month, year, sum(value_per_day)
FROM (
SELECT to_char(dt , 'mm') AS month, to_char(dt, 'yyyy') AS year, (network_days ((
CASE WHEN EXTRACT(year FROM df.start_date) < $2 THEN (SELECT date_trunc('year' , df.start_date) + interval '1 year')::date
ELSE df.start_date END) , ( CASE WHEN EXTRACT(year FROM df.stop_date) > $2 THEN (date_trunc('year' , df.stop_date))::date
ELSE
df.stop_date END))::int ::numeric / count(*) OVER (PARTITION BY id))::int AS value_per_day
FROM intranet.dayoff df
LEFT JOIN generate_series((
CASE WHEN EXTRACT(year FROM df.start_date) < $2 THEN (SELECT date_trunc('year' , df.start_date) + interval '1 year')::date ELSE df.start_date
END) , (CASE WHEN EXTRACT(year FROM df.stop_date) > $2 THEN (date_trunc('year' , df.stop_date))::date
ELSE df.stop_date END) - interval '1 day' , interval '1 day') AS t (dt) ON extract('ISODOW' FROM dt) < 6
WHERE
extract(isodow FROM dt) < 6 AND (EXTRACT(year FROM start_date) = $2 OR EXTRACT(year FROM stop_date) = $2)) t
GROUP BY month, year
ORDER BY month;
based on: https://dba.stackexchange.com/questions/237745/postgresql-split-date-range-by-business-days-then-aggregate-by-month?rq=1
and I almost have it:
10 rows returned
| month | year | sum |
| ----- | ---- | ---- |
| 03 | 2020 | 2 |
| 04 | 2020 | 13 |
| 06 | 2020 | 1 |
| 11 | 2020 | 1 |
| 12 | 2020 | 2 |
| 05 | 2020 | 1 |
| 10 | 2020 | 2 |
| 08 | 2020 | 10 |
| 01 | 2020 | 1 |
| 02 | 2020 | 1 |
so in the function I created I'd need to add something like this:
dt NOT IN (SELECT datestamp::timestamptz FROM holidays_poland ($2, $2))
but I end up with many conditions and I feel like this is the wrong approach.
I feel like I should just somehow split the table from:
id start stop
1 31.12.2019 00:00:00 01.01.2020 00:00:00
2 30.03.2020 00:00:00 14.04.2020 00:00:00
3 01.05.2020 00:00:00 03.05.2020 00:00:00
to
start stop
30.03.2020 00:00:00 01.01.2020 00:00:00
01.01.2020 00:00:00 14.04.2020 00:00:00
01.05.2020 00:00:00 03.05.2020 00:00:00
and just run the network_days function for each date range, but I couldn't partition my query of the table to get such a result.
What do you think is the best way to calculate this?
demo:db<>fiddle
SELECT
gs::date
FROM person_holidays p,
generate_series(p.start, p.stop, interval '1 day') gs -- 1
WHERE gs::date NOT IN (SELECT holiday FROM holidays) -- 2
AND EXTRACT(isodow from gs::date) < 6 -- 3
1. Generate a date series from each person's start and stop dates.
2. Exclude all dates found in the holidays table.
3. If necessary, exclude all weekend days (Saturday and Sunday).
Afterwards you are able to GROUP BY months and count the records:
SELECT
date_trunc('month', gs),
COUNT(*)
FROM person_holidays p,
generate_series(p.start, p.stop, interval '1 day') gs
WHERE gs::date NOT IN (SELECT holiday FROM holidays)
and extract(isodow from gs::date) < 6
GROUP BY 1
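The same filter-then-group idea can be sketched in Python as a cross-check. The holiday set below is an illustrative assumption standing in for the `holidays` table, the function name `working_days_per_month` is made up, and ranges are treated as inclusive on both ends, as `generate_series(p.start, p.stop, interval '1 day')` does.

```python
from collections import Counter
from datetime import date, timedelta

# Illustrative stand-in for the holidays table (assumed values, not real data).
HOLIDAYS = {date(2020, 1, 1), date(2020, 4, 13), date(2020, 5, 1)}

def working_days_per_month(ranges, holidays=HOLIDAYS):
    """Count Monday-to-Friday, non-holiday days per (year, month)."""
    counts = Counter()
    for start, stop in ranges:
        day = start
        while day <= stop:
            # isoweekday(): Monday == 1 ... Sunday == 7, like ISODOW.
            if day.isoweekday() < 6 and day not in holidays:
                counts[(day.year, day.month)] += 1
            day += timedelta(days=1)
    return dict(counts)

# The three sample rows from the question.
person_holidays = [
    (date(2019, 12, 31), date(2020, 1, 1)),
    (date(2020, 3, 30), date(2020, 4, 14)),
    (date(2020, 5, 1), date(2020, 5, 3)),
]
result = working_days_per_month(person_holidays)
print(result)
```

Months in which every day off falls on a weekend or holiday (May 2020 here) simply do not appear in the result, just as they produce no group in the SQL.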

Sum results on constant timeframe range on each date in table

I'm using a Postgres DB.
I have a table that contains test names, their results and reported time:
|test_name|result |report_time|
| A |error |29/11/2020 |
| A |failure|28/12/2020 |
| A |error |29/12/2020 |
| B |passed |30/12/2020 |
| C |failure|31/12/2020 |
| A |error |31/12/2020 |
I'd like to sum how many tests have failed or errored in the last 30 days, per date (and limit it to be 5 days back from the current date), so the final result will be:
| date | sum | (notes)
| 29/11/2020 | 1 | 1 failed/errored test in range (29/11 -> 29/10)
| 28/12/2020 | 2 | 2 failed/errored tests in range (28/12 -> 28/11)
| 29/12/2020 | 3 | 3 failed/errored tests in range (29/12 -> 29/11)
| 30/12/2020 | 2 | 2 failed/errored tests in range (30/12 -> 30/11)
| 31/12/2020 | 4 | 4 failed/errored tests in range (31/12 -> 01/12)
I know how to sum the results per date (i.e, how many failures/errors were on a specific date):
SELECT report_time::date AS "Report Time",
count(case when result in ('failure', 'error') then 1 else null end)
from table
where report_time::date = now()::date
GROUP BY report_time::date
But I'm struggling to sum each date 30 days back.
You can generate the dates and then use window functions:
select gs.dte, num_failed_error, num_failed_error_30
from generate_series(current_date - interval '5 day', current_date, interval '1 day') gs(dte) left join
(select t.report_time, count(*) as num_failed_error,
sum(count(*)) over (order by report_time range between interval '30 day' preceding and current row) as num_failed_error_30
from t
where t.result in ('failure', 'error') and
t.report_time >= current_date - interval '35 day'
group by t.report_time
) t
on t.report_time = gs.dte ;
Note: This assumes that report_time is only the date with no time component. If it has a time component, use report_time::date.
If you have data on each day, then this can be simplified to:
select t.report_time, count(*) as num_failed_error,
sum(count(*)) over (order by report_time range between interval '30 day' preceding and current row) as num_failed_error_30
from t
where t.result in ('failure', 'error') and
t.report_time >= current_date - interval '35 day'
group by t.report_time
order by report_time desc
limit 5;
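As a cross-check of what the 30-day window should return for the sample data, here is a minimal Python sketch. The function name `failures_last_30_days` is made up, and the window is assumed to be inclusive on both ends, matching `range between interval '30 day' preceding and current row`.

```python
from datetime import date, timedelta

# Sample rows from the question: (test_name, result, report_time).
reports = [
    ("A", "error", date(2020, 11, 29)),
    ("A", "failure", date(2020, 12, 28)),
    ("A", "error", date(2020, 12, 29)),
    ("B", "passed", date(2020, 12, 30)),
    ("C", "failure", date(2020, 12, 31)),
    ("A", "error", date(2020, 12, 31)),
]

def failures_last_30_days(rows, day):
    """Count failed/errored reports in the inclusive window [day - 30 days, day]."""
    lo = day - timedelta(days=30)
    return sum(
        1
        for _, result, when in rows
        if result in ("failure", "error") and lo <= when <= day
    )

# One trailing count per distinct report date in the sample.
trailing = {
    d: failures_last_30_days(reports, d)
    for d in sorted({when for _, _, when in reports})
}
for d, n in trailing.items():
    print(d, n)
```

The counts agree with the expected result table in the question (1, 2, 3, 2, 4).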
Since I'm using PostgreSQL 10.12 (RANGE window frames with an interval offset need PostgreSQL 11 or later) and an upgrade is currently not an option, I took a different approach: I generate the dates of the last 30 days and, for each date, calculate the cumulative distinct count for the preceding 30 days:
SELECT days_range::date, SUM(number_of_tests)
FROM generate_series (now() - interval '30 day', now()::timestamp , '1 day'::interval) days_range
CROSS JOIN LATERAL (
SELECT COUNT(DISTINCT(test_name)) as number_of_tests from tests
WHERE report_time > days_range - interval '30 day'
AND report_time <= days_range -- upper bound, so reports after days_range are not counted
GROUP BY report_time::date
HAVING COUNT(case when result in ('failure', 'error') then 1 else null end) > 0
ORDER BY report_time::date asc
) as lateral_query
GROUP BY days_range
ORDER BY days_range desc
It is definitely not the best-optimized query; it takes about 1 minute to compute.

Migrate to Standard SQL: choose the closest sunday from current date

I need to migrate from legacy to standard SQL this query:
SELECT MAX(FECHA)
FROM(
SELECT FECHA, DAYOFWEEK(FECHA) AS DIA
FROM(
SELECT DATE(DATE_ADD(TIMESTAMP("2017-05-29"), pos - 1, "DAY")) AS FECHA
FROM (
SELECT ROW_NUMBER() OVER() AS pos, *
FROM (
FLATTEN((
SELECT SPLIT(RPAD('', 1 + DATEDIFF(TIMESTAMP(CURRENT_DATE()),
TIMESTAMP("2017-05-29")), '.'),'') AS h
FROM (SELECT NULL)),h
)))
))
WHERE DIA=1
The query must return the closest previous Sunday relative to the current date.
When I run this in standard SQL I get
Syntax error: Expected keyword JOIN but got ")" at [12:2] (after FROM (SELECT NULL)),h
#standardSQL
SELECT
DATE_SUB(CURRENT_DATE(), INTERVAL EXTRACT(DAYOFWEEK FROM CURRENT_DATE()) - 1 DAY)
You can replace CURRENT_DATE() with any date and it will return the closest previous Sunday (or the date itself if it already is a Sunday).
You can use DATE_TRUNC with the WEEK part to truncate to the most recent Sunday. For example:
#standardSQL
WITH Input AS (
SELECT date
FROM UNNEST([
DATE '2017-06-26',
DATE '2017-06-24',
DATE '2017-05-04']) AS date
)
SELECT
date,
FORMAT_DATE('%A', date) AS dayofweek,
DATE_TRUNC(date, WEEK) AS previous_sunday
FROM Input;
This returns:
+------------+-----------+-----------------+
| date | dayofweek | previous_sunday |
+------------+-----------+-----------------+
| 2017-06-24 | Saturday | 2017-06-18 |
| 2017-05-04 | Thursday | 2017-04-30 |
| 2017-06-26 | Monday | 2017-06-25 |
+------------+-----------+-----------------+
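The same truncation can be cross-checked with a short Python sketch (the helper name `previous_sunday` is made up for illustration). Like both queries above, it returns the date itself when that date is already a Sunday.

```python
from datetime import date, timedelta

def previous_sunday(d):
    # isoweekday(): Monday == 1 ... Sunday == 7, so "% 7" is the number of
    # days since the most recent Sunday (0 when d is itself a Sunday).
    return d - timedelta(days=d.isoweekday() % 7)

for d in (date(2017, 6, 26), date(2017, 6, 24), date(2017, 5, 4)):
    print(d, d.strftime("%A"), previous_sunday(d))
```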

GROUP BY next months over N years

I need to aggregate amounts grouped by "horizon": successive 12-month periods over the next 5 years.
Assuming today is 2015-08-15:
SUM amount from 0 to 12 next months (from 2015-08-16 to 2016-08-15)
SUM amount from 12 to 24 next months (from 2016-08-16 to 2017-08-15)
SUM amount from 24 to 36 next months ...
SUM amount from 36 to 48 next months
SUM amount from 48 to 60 next months
Here is a fiddled dataset example:
+----+------------+--------+
| id | date | amount |
+----+------------+--------+
| 1 | 2015-09-01 | 10 |
| 2 | 2015-10-01 | 10 |
| 3 | 2016-10-01 | 10 |
| 4 | 2017-06-01 | 10 |
| 5 | 2018-06-01 | 10 |
| 6 | 2019-05-01 | 10 |
| 7 | 2019-04-01 | 10 |
| 8 | 2020-04-01 | 10 |
+----+------------+--------+
Here is the expected result:
+---------+--------+
| horizon | amount |
+---------+--------+
| 1 | 20 |
| 2 | 20 |
| 3 | 10 |
| 4 | 20 |
| 5 | 10 |
+---------+--------+
How can I group rows into these 12-month "horizons"?
I tagged PostgreSQL but I'm actually using an ORM so it's just to find the idea. (by the way I don't have access to the date formatting functions)
I would split by 12 months time frame and group by this:
SELECT
FLOOR(
(EXTRACT(EPOCH FROM date) - EXTRACT(EPOCH FROM now()))
/ EXTRACT(EPOCH FROM INTERVAL '12 month')
) + 1 AS "horizon",
SUM(amount) AS "amount"
FROM dataset
GROUP BY horizon
ORDER BY horizon;
SQL Fiddle
Inspired by: Postgresql SQL GROUP BY time interval with arbitrary accuracy (down to milli seconds)
Assuming you need intervals from current date to this day next year and so on, I would query this like this:
SELECT 1 AS horizon, SUM(amount) FROM dataset
WHERE date > now()
AND date < (now() + '12 months'::INTERVAL)
UNION
SELECT 2 AS horizon, SUM(amount) FROM dataset
WHERE date > (now() + '12 months'::INTERVAL)
AND date < (now() + '24 months'::INTERVAL)
UNION
SELECT 3 AS horizon, SUM(amount) FROM dataset
WHERE date > (now() + '24 months'::INTERVAL)
AND date < (now() + '36 months'::INTERVAL)
UNION
SELECT 4 AS horizon, SUM(amount) FROM dataset
WHERE date > (now() + '36 months'::INTERVAL)
AND date < (now() + '48 months'::INTERVAL)
UNION
SELECT 5 AS horizon, SUM(amount) FROM dataset
WHERE date > (now() + '48 months'::INTERVAL)
AND date < (now() + '60 months'::INTERVAL)
ORDER BY horizon;
You can generalize it and make something like this using additional variable:
SELECT number AS horizon, SUM(amount) FROM dataset
WHERE date > (now() + ((number - 1) * '12 months'::INTERVAL))
AND date < (now() + (number * '12 months'::INTERVAL));
Where number is an integer from range [1,5]
Here is what I get from the Fiddle:
| horizon | sum |
|---------|-----|
| 1 | 20 |
| 2 | 20 |
| 3 | 10 |
| 4 | 20 |
| 5 | 10 |
Perhaps CTE?
WITH RECURSIVE grps AS
(
SELECT 1 AS Horizon, (date '2015-08-15') + interval '1' day AS FromDate, (date '2015-08-15') + interval '1' year AS ToDate
UNION ALL
SELECT Horizon + 1, ToDate + interval '1' day AS FromDate, ToDate + interval '1' year
FROM grps WHERE Horizon < 5
)
SELECT
Horizon,
(SELECT SUM(amount) FROM dataset WHERE date BETWEEN g.FromDate AND g.ToDate) AS SumOfAmount
FROM
grps g
SQL fiddle
Rather simply:
SELECT horizon, sum(amount) AS amount
FROM generate_series(1, 5) AS s(horizon)
JOIN dataset ON "date" >= current_date + (horizon - 1) * interval '1 year'
AND "date" < current_date + horizon * interval '1 year'
GROUP BY horizon
ORDER BY horizon;
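To check the bucketing against the expected result, here is a minimal Python sketch using the sample rows and the reference date 2015-08-15 from the question. The names `add_years` and `horizon_sums` are made up, and the bounds follow the question (the day after the reference date up to and including the anniversary), which differs from the `>=` / `<` bounds in the query above only for rows that fall exactly on a boundary date.

```python
from datetime import date

# Sample rows from the question: (id, date, amount).
dataset = [
    (1, date(2015, 9, 1), 10),
    (2, date(2015, 10, 1), 10),
    (3, date(2016, 10, 1), 10),
    (4, date(2017, 6, 1), 10),
    (5, date(2018, 6, 1), 10),
    (6, date(2019, 5, 1), 10),
    (7, date(2019, 4, 1), 10),
    (8, date(2020, 4, 1), 10),
]

def add_years(d, years):
    # Assumes d is not 29 February, which holds for the 2015-08-15 example.
    return d.replace(year=d.year + years)

def horizon_sums(rows, today, horizons=5):
    sums = {}
    for k in range(1, horizons + 1):
        lo, hi = add_years(today, k - 1), add_years(today, k)
        # Horizon k covers dates after `lo` up to and including `hi`,
        # e.g. 2015-08-16 .. 2016-08-15 for k == 1.
        sums[k] = sum(amount for _, when, amount in rows if lo < when <= hi)
    return sums

result = horizon_sums(dataset, date(2015, 8, 15))
print(result)
```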
You need a union and an aggregate function:
select 1 as horizon,
sum(amount) amount
from the_table
where date >= current_date
and date < current_date + interval '12' month
union all
select 2 as horizon,
sum(amount) amount
from the_table
where date >= current_date + interval '12' month
and date < current_date + interval '24' month
union all
select 3 as horizon,
sum(amount) amount
from the_table
where date >= current_date + interval '24' month
and date < current_date + interval '36' month
... and so on ...
I don't know how to do that with an obfuscation layer (aka ORM), but I'm sure it supports (or should support) aggregation and unions.
This could easily be wrapped up into a PL/PgSQL function where you pass the "horizon" and the SQL is built dynamically so that all you need to call is something like: select * from sum_horizon(5) where 5 indicates the number of years.
Btw: date is a horrible name for a column. For one because it's a reserved word, but more importantly because it doesn't document the meaning of the column. Is it a "release date"? A "due date"? An "order date"?
Try this
select
id,
sum(case when date>=current_date and date<current_date+interval '1 year' then amount else 0 end) as year1,
sum(case when date>=current_date+interval '1 year' and date<current_date+interval '2 year' then amount else 0 end) as year2,
sum(case when date>=current_date+interval '2 year' and date<current_date+interval '3 year' then amount else 0 end) as year3,
sum(case when date>=current_date+interval '3 year' and date<current_date+interval '4 year' then amount else 0 end) as year4,
sum(case when date>=current_date+interval '4 year' and date<current_date+interval '5 year' then amount else 0 end) as year5
from dataset
group by id