Postgres - Fast way to sum over rows from last day of month

I want to query a table and sum a column for all of the rows from the last day of the month.
Let's use the following table as an example:
CREATE TABLE example(dt date, value int)
(The real table has many more columns and is relatively large, and the real query is more complicated)
I have the following query:
SELECT dt, SUM(value)
FROM example
WHERE dt IN (SELECT DISTINCT
date_trunc('MONTH', generate_series('2012-01-01'::date,
'2016-12-01'::date,
interval '1 day') + INTERVAL '1 MONTH - 1 day')::date)
GROUP BY dt
It runs in about 2 seconds on my real table.
However, if I generate the full list of end-of-month days in my range and parameterise the query like so:
SELECT dt, SUM(value)
FROM example
WHERE dt IN ('2012-01-31', ...)
GROUP BY dt
It's much quicker, ~750ms.
I would prefer not to generate the dates and pass them through to the query like that. Is there a way I can do this entirely in SQL and make it as fast as the latter version?

The sub-select is needlessly complicated. It can be simplified to:
SELECT dt, SUM(value)
FROM example
WHERE dt IN (SELECT (d + interval '1 month - 1 day')::date -- last day of each month
from generate_series('2012-01-01'::date, '2016-12-01'::date, interval '1 month') dates (d))
GROUP BY dt; --<< the group by is necessary
Maybe that speeds up the query.
You can also try to put the date generation into a CTE:
with dates (d) as (
SELECT (t + interval '1 month - 1 day')::date -- last day of each month
from generate_series('2012-01-01'::date, '2016-12-01'::date, interval '1 month') t
)
SELECT dt, SUM(value)
FROM example
WHERE dt IN ( select d from dates)
GROUP BY dt;
Sometimes doing a JOIN is also more efficient:
with dates (d) as (
SELECT (t + interval '1 month - 1 day')::date -- last day of each month
from generate_series('2012-01-01'::date, '2016-12-01'::date, interval '1 month') t
)
SELECT dt, SUM(value)
FROM example
JOIN dates on example.dt = dates.d
GROUP BY dt;

The performance problem in your query comes from the fact that you are generating a daily series. Change it to a monthly series, remove the DISTINCT, and add a GROUP BY:
select dt, sum(value)
from
example
inner join (
select (date_trunc('month', dt) + interval '1 month - 1 day')::date as dt -- cast so the join compares date to date
from generate_series('2012-01-01'::date, '2016-12-01', '1 month') gs (dt)
) d using (dt)
group by dt
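The monthly-series idea can be tried on a tiny data set. This is a minimal sketch, assuming PostgreSQL; the temporary table `example_demo` and its rows are made up for illustration and stand in for the real table.

```sql
-- Hypothetical demo table standing in for the real one.
CREATE TEMP TABLE example_demo (dt date, value int);

INSERT INTO example_demo VALUES
    ('2012-01-30', 100),  -- not a month end, ignored
    ('2012-01-31', 1),
    ('2012-01-31', 2),
    ('2012-02-29', 5),    -- 2012 is a leap year
    ('2012-03-01', 100);  -- not a month end, ignored

-- One row per month: first of month + 1 month - 1 day = last day of that month.
WITH month_ends (d) AS (
    SELECT (m + interval '1 month' - interval '1 day')::date
    FROM generate_series(timestamp '2012-01-01',
                         timestamp '2012-03-01',
                         interval '1 month') AS g (m)
)
SELECT e.dt, sum(e.value) AS total
FROM example_demo AS e
JOIN month_ends AS me ON me.d = e.dt
GROUP BY e.dt
ORDER BY e.dt;
```

With these rows the query returns 2012-01-31 with a total of 3 and 2012-02-29 with a total of 5. Only three dates are generated for the three months, instead of one per day.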

Related

SQL Query to group dates and includes different dates in the aggregation

I have a table with two columns, dates and number of searches in each date. What I want to do is group by the dates, and find the sum of number of searches for each date.
The trick is that for each group, I also want to include the number of searches for the date exactly the following week, and the number of searches for the date exactly the previous week.
So if I have

Date        Searches
2/3/2023    2
2/10/2023   4
2/17/2023   1
2/24/2023   5

I want the output for the 2/10/2023 and 2/17/2023 groups to be

Date        Sum
2/10/2023   7
2/17/2023   10

How can I write a query for this?
You can use a correlated subquery for this:
select date, (
select sum(searches)
from t as x
where x.date between t.date - interval '7 day' and t.date + interval '7 day'
) as sum_win
from t
Replace interval 'x day' with the appropriate date add function for your RDBMS.
If your RDBMS supports interval offsets in window frames, then a much better solution would be:
select date, sum(searches) over (
order by date
range between interval '7 day' preceding and interval '7 day' following
) as sum_win
from t
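To see what the window frame returns, here is a self-contained sketch, assuming PostgreSQL 11 or later (which accepts `RANGE` frames with an interval offset). The question's rows are inlined in a CTE, and the column name `search_date` is chosen here for illustration.

```sql
-- Sample rows from the question, inlined so the query runs on its own.
WITH t (search_date, searches) AS (
    VALUES (date '2023-02-03', 2),
           (date '2023-02-10', 4),
           (date '2023-02-17', 1),
           (date '2023-02-24', 5)
)
SELECT search_date,
       sum(searches) OVER (
           ORDER BY search_date
           -- every row whose date lies within 7 days on either side
           RANGE BETWEEN interval '7 days' PRECEDING
                     AND interval '7 days' FOLLOWING
       ) AS sum_win
FROM t
ORDER BY search_date;
```

This yields 6, 7, 10 and 6 for the four dates. The first and last rows get partial sums because one neighbour is missing, so they would need to be filtered out if only complete three-week groups are wanted.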
Assuming weekly rows
CREATE TABLE Table1
([Dates] date, [Searches] int)
;
INSERT INTO Table1
([Dates], [Searches])
VALUES
('2023-02-03 00:00:00', 2),
('2023-02-10 00:00:00', 4),
('2023-02-17 00:00:00', 1),
('2023-02-24 00:00:00', 5)
;
;with cte as (
select dates
, searches
+ lead(searches) over(order by dates)
+ lag(searches) over(order by dates) as sum_searches
from table1)
select * from cte
where sum_searches is not null;

dates        sum_searches
2023-02-10   7
2023-02-17   10

fiddle

How to use generate_series to get the sum of values in a weekly interval

I'm having trouble using generate_series in a weekly interval. I have two examples here, one is in a monthly interval and it is working as expected. It is returning each month and the sum of the facts.sends values. I'm trying to do the same exact thing in a weekly interval, but the values are not being added correctly.
Monthly interval (Working):
https://www.db-fiddle.com/f/a9SbDBpa9SMGxM3bk8fMAD/0
Weekly interval (Not working): https://www.db-fiddle.com/f/tNxRbCxvgwswoaN7esDk5w/2
date_trunc('week', ...) truncates to the Monday of the week, so the generated series must also start on a Monday; otherwise the join keys never match:
WITH range_values AS (
SELECT date_trunc('week', min(fact_date)) as minval,
date_trunc('week', max(fact_date)) as maxval
FROM facts),
week_range AS (
SELECT generate_series(date_trunc('week', '2022-05-01'::date), now(), '1 week') as week
FROM range_values
),
grouped_facts AS (
SELECT date_trunc('week', fact_date) as week,
sends
FROM facts
WHERE
fact_date >= '2022-05-20'
)
SELECT week_range.week,
COALESCE(sum(sends)::integer, 0) AS total_sends
FROM week_range
LEFT OUTER JOIN grouped_facts on week_range.week = grouped_facts.week
GROUP BY 1
ORDER BY 1;
DB Fiddle.
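The Monday alignment can be checked in isolation. A small sketch, assuming PostgreSQL and using plain `timestamp` literals (chosen here to keep the result independent of the session time zone):

```sql
-- date_trunc('week', ...) always returns the Monday that starts the ISO week.
SELECT d::date                     AS some_day,
       to_char(d, 'Dy')            AS weekday,
       date_trunc('week', d)::date AS week_start
FROM (VALUES (timestamp '2022-05-01'),   -- a Sunday
             (timestamp '2022-05-20'))   -- a Friday
     AS v (d);

-- A series anchored on a truncated (Monday) start produces the same keys
-- that date_trunc('week', fact_date) produces on the other side of the join.
SELECT w::date AS week
FROM generate_series(date_trunc('week', timestamp '2022-05-01'),
                     timestamp '2022-05-22',
                     interval '1 week') AS g (w);
```

The first query maps 2022-05-01 to week start 2022-04-25 and 2022-05-20 to 2022-05-16. The second returns 2022-04-25, 2022-05-02, 2022-05-09 and 2022-05-16, all Mondays.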

how can i use unnest generate date array to provide end of month index?

I want to be able to create a fake index for my data, so that, e.g., if I have a single order, it is repeated for every date in the array created below.
select
*
from
database.data,
UNNEST(GENERATE_DATE_ARRAY(
'2014-01-01',
(SELECT
MAX(Order_Date)
FROM
database.data), INTERVAL 1 MONTH)) AS month
However, this creates an index of the 1st of each month. How can I change this so it's the end of every month, e.g. 2014-01-31, and onwards at a 1-month interval?
You can use date arithmetic:
select d.*, date_sub(date_add(dt, interval 1 month), interval 1 day) as month_end
from database.data d
cross join unnest(
generate_date_array('2014-01-01', (select max(order_date) from database.data), interval 1 month)
) as dt
As of 10/14/2020, a new function, LAST_DAY, has been released that does this in one step:
SELECT LAST_DAY(DATE '2008-11-25', MONTH) AS last_day

how to get date different in postgres using date_part option

How to get date time difference in PostgreSQL
I am using below syntax
select id, A_column,B_column,
(SELECT count(*) AS count_days_no_weekend
FROM generate_series(B_column ::timestamp , A_column ::timestamp, interval '1 day') the_day
WHERE extract('ISODOW' FROM the_day) < 5) * 24 + DATE_PART('hour', B_column::timestamp-A_column ::timestamp ) as hrs
FROM table req where id='123';
If A_column=2020-05-20 00:00:00 and B_column=2020-05-15 00:00:00 I want to get 72(in hours).
Is it possible to skip weekends (Saturday and Sunday) in the first case, so that the result is 72 hours (weekend hours excluded)?
I am getting 0, but I need to get 72 hours.
And if A_column=2020-08-15 12:00:00 and B_column=2020-08-15 00:00:00, I want to get 12 (in hours).
One option uses a lateral join and generate_series() to enumerate each and every hour between the two timestamps, while filtering out week-ends:
select t.a_column, t.b_column, h.count_hours_no_weekend
from mytable t
cross join lateral (
select count(*) count_hours_no_weekend
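-- stop one hour before a_column so the end timestamp itself is not counted as an extra hour
from generate_series(t.b_column::timestamp, t.a_column::timestamp - interval '1 hour', interval '1 hour') s(col)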
where extract('isodow' from s.col) < 6 -- isodow: 1 = Monday ... 7 = Sunday
) h
where id = 123
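The hour-counting idea can be tried without any table. A sketch assuming PostgreSQL, with the question's first pair of timestamps inlined in a hypothetical `mytable_demo` CTE; it assumes the range is half-open (the end timestamp itself is not counted) and that weekdays are ISODOW 1 to 5.

```sql
WITH mytable_demo (id, a_column, b_column) AS (
    VALUES (123, timestamp '2020-05-20 00:00:00', timestamp '2020-05-15 00:00:00')
)
SELECT t.id, h.count_hours_no_weekend
FROM mytable_demo AS t
CROSS JOIN LATERAL (
    -- one row per hour that starts inside [b_column, a_column)
    SELECT count(*) AS count_hours_no_weekend
    FROM generate_series(t.b_column,
                         t.a_column - interval '1 hour',
                         interval '1 hour') AS s (hour_start)
    WHERE extract(isodow FROM s.hour_start) < 6   -- keep Monday to Friday
) AS h;
```

2020-05-15 is a Friday, so the counted hours are Friday, Monday and Tuesday: 3 x 24 = 72.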
I would attack this by calculating the weekend hours to let the database deal with daylight saving time. I would then subtract the intervening weekend hours from the difference between the two date values.
with weekend_days as (
select *, date_part('isodow', ddate) as dow
from table1
cross join lateral
generate_series(
date_trunc('day', b_column),
date_trunc('day', a_column),
interval '1 day') as gs(ddate)
where date_part('isodow', ddate) in (6, 7)
), weekend_time as (
select id,
sum(
least(ddate + interval '1 day', a_column) -
greatest(ddate, b_column)
) as we_ival
from weekend_days
group by id
)
select t.id,
a_column - b_column as raw_difference,
coalesce(we_ival, interval '0') as adjustment,
a_column - b_column -
coalesce(we_ival, interval '0') as adj_difference
from table1 t
left join weekend_time w on w.id = t.id; -- keep rows whose range contains no weekend days
Working fiddle.
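Since the question asks for hours, the adjusted interval can be converted with `extract(epoch ...)`. The following is a sketch of the subtraction approach, assuming PostgreSQL; `table1_demo` is a hypothetical inline stand-in for the real table, and plain `timestamp` values are used, so no daylight saving adjustment takes place here.

```sql
WITH table1_demo (id, a_column, b_column) AS (
    VALUES (123, timestamp '2020-05-20 00:00', timestamp '2020-05-15 00:00')
),
weekend_time AS (
    -- total time inside the range that falls on a Saturday or Sunday
    SELECT t.id,
           sum(least(gs.ddate + interval '1 day', t.a_column)
               - greatest(gs.ddate, t.b_column)) AS we_ival
    FROM table1_demo AS t
    CROSS JOIN LATERAL generate_series(date_trunc('day', t.b_column),
                                       date_trunc('day', t.a_column),
                                       interval '1 day') AS gs (ddate)
    WHERE extract(isodow FROM gs.ddate) IN (6, 7)
    GROUP BY t.id
)
SELECT t.id,
       extract(epoch FROM t.a_column - t.b_column
                          - coalesce(w.we_ival, interval '0')) / 3600 AS adj_hours
FROM table1_demo AS t
LEFT JOIN weekend_time AS w ON w.id = t.id;
```

For this pair the raw difference is 5 days, the weekend part is 2 days, and the result is 72 hours.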

Daily average for the month (needs number of days in month)

I have a table as follows:
CREATE TABLE counts
(
T TIMESTAMP NOT NULL,
C INTEGER NOT NULL
);
I create the following views from it:
CREATE VIEW micounts AS
SELECT DATE_TRUNC('minute',t) AS t,SUM(c) AS c FROM counts GROUP BY 1;
CREATE VIEW hrcounts AS
SELECT DATE_TRUNC('hour',t) AS t,SUM(c) AS c,SUM(c)/60 AS a
FROM micounts GROUP BY 1;
CREATE VIEW dycounts AS
SELECT DATE_TRUNC('day',t) AS t,SUM(c) AS c,SUM(c)/24 AS a
FROM hrcounts GROUP BY 1;
The problem comes when I want to create the monthly counts: I need to know what to divide the daily sums by to get the average column a, i.e. the number of days in the specific month.
I know to get the days in PostgreSQL you can do:
SELECT DATE_PART('days',DATE_TRUNC('month',now())+'1 MONTH'::INTERVAL-DATE_TRUNC('month',now()))
But I can't use now(), I have to somehow let it know what the month is when the grouping gets done. Any suggestions i.e. what should replace ??? in this view:
CREATE VIEW mocounts AS
SELECT DATE_TRUNC('month',t) AS t,SUM(c) AS c,SUM(c)/(???) AS a
FROM dycounts
GROUP BY 1;
A bit shorter and faster and you get the number of days instead of an interval:
SELECT EXTRACT(day FROM date_trunc('month', now()) + interval '1 month'
- interval '1 day')
It's possible to combine multiple units in a single interval value, so we can use '1 mon - 1 day':
SELECT EXTRACT(day FROM date_trunc('month', now()) + interval '1 mon - 1 day')
(mon, month or months work all the same for month units.)
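A quick way to check the expression against months of different lengths is to run it over a few literal dates. A sketch assuming PostgreSQL; the sample dates are arbitrary, and the interval is written as two separate terms, which gives the same result as the combined form.

```sql
SELECT d::date AS any_day,
       -- first of month + 1 month - 1 day = last day, whose day number is the month length
       extract(day FROM date_trunc('month', d)
                        + interval '1 month' - interval '1 day') AS days_in_month
FROM (VALUES (timestamp '2020-02-10'),   -- leap-year February
             (timestamp '2021-02-10'),
             (timestamp '2021-04-15'),
             (timestamp '2021-12-31')) AS v (d);
```

The four rows come back with 29, 28, 30 and 31 days.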
To divide the daily sum by the number of days in the current month (orig. question):
SELECT t::date AS the_date
, SUM(c) AS c
, SUM(c) / EXTRACT(day FROM date_trunc('month', t::date)
+ interval '1 mon - 1 day') AS a
FROM dycounts
GROUP BY 1;
To divide monthly sum by the number of days in the current month (updated question):
SELECT DATE_TRUNC('month', t)::date AS t
,SUM(c) AS c
,SUM(c) / EXTRACT(day FROM date_trunc('month', t)::date
+ interval '1 mon - 1 day') AS a
FROM dycounts
GROUP BY 1;
You have to repeat the GROUP BY expression if you want to use a single query level.
Or use a subquery:
SELECT *, c / EXTRACT(day FROM t + interval '1 mon - 1 day') AS a
FROM (
SELECT date_trunc('month', t)::date AS t, SUM(c) AS c
FROM dycounts
GROUP BY 1
) sub;
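The subquery variant can be tried on made-up data. A sketch assuming PostgreSQL; `dycounts_demo` is a hypothetical inline stand-in for the `dycounts` view, with values picked so the averages are easy to check by hand.

```sql
WITH dycounts_demo (t, c) AS (
    VALUES (timestamp '2021-02-01', 280),
           (timestamp '2021-02-15', 280),
           (timestamp '2021-04-10', 300)
)
SELECT t, c,
       -- divide the monthly sum by the number of days in that month
       c / extract(day FROM t + interval '1 month' - interval '1 day') AS a
FROM (
    SELECT date_trunc('month', t)::date AS t, sum(c) AS c
    FROM dycounts_demo
    GROUP BY 1
) AS sub
ORDER BY t;
```

February 2021 sums to 560 over 28 days, giving an average of 20; April 2021 sums to 300 over 30 days, giving 10.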