PostgreSQL generate_series with WHERE clause

PostgreSQL generate_series with WHERE clause - sql

I'm having an issue generating a series of dates and then returning the COUNT of rows matching that each date in the series.
SELECT generate_series(current_date - interval '30 days', current_date, '1 day':: interval) AS i, COUNT(*)
FROM download
WHERE product_uuid = 'someUUID'
AND created_at = i
GROUP BY created_at::date
ORDER BY created_at::date ASC
I want the output to be the number of rows that match the current date in the series.
05-05-2018, 35
05-06-2018, 23
05-07-2018, 0
05-08-2018, 10
...
The schema has the following columns: id, product_uuid, created_at. Any help would be greatly appreciated. I can add more detail if needed.

Put the table generating function in the from and use a join:
SELECT g.dte, COUNT(d.product_uuid)
FROM generate_series(current_date - interval '30 days', current_date, '1 day':: interval
) gs(dte) left join
download d
on d.product_uuid = 'someUUID' AND
d.created_at::date = g.dte
GROUP BY g.dte
ORDER BY g.dte;

Related

how to get date different in postgres using date_part option

How to get date time difference in PostgreSQL
I am using below syntax
select id, A_column,B_column,
(SELECT count(*) AS count_days_no_weekend
FROM generate_series(B_column ::timestamp , A_column ::timestamp, interval '1 day') the_day
WHERE extract('ISODOW' FROM the_day) < 5) * 24 + DATE_PART('hour', B_column::timestamp-A_column ::timestamp ) as hrs
FROM table req where id='123';
If A_column=2020-05-20 00:00:00 and B_column=2020-05-15 00:00:00 I want to get 72(in hours).
Is there any possibility to skip weekends(Saturday and Sunday) in first one, it means to get the result as 72 hours(exclude weekend hours)
i am getting 0
But i need to get 72 hours
And if If A_column=2020-08-15 12:00:00 and B_column=2020-08-15 00:00:00 I want to get 12(in hours).

One option uses a lateral join and generate_series() to enumerate each and every hour between the two timestamps, while filtering out week-ends:
select t.a_column, t.b_column, h.count_hours_no_weekend
from mytable t
cross join lateral (
select count(*) count_hours_no_weekend
from generate_series(t.b_column::timestamp, t.a_column::timestamp, interval '1 hour') s(col)
where extract('isodow' from s.col) < 5
) h
where id = 123

I would attack this by calculating the weekend hours to let the database deal with daylight savings time. I would then subtract the intervening weekend hours from the difference between the two date values.
with weekend_days as (
select *, date_part('isodow', ddate) as dow
from table1
cross join lateral
generate_series(
date_trunc('day', b_column),
date_trunc('day', a_column),
interval '1 day') as gs(ddate)
where date_part('isodow', ddate) in (6, 7)
), weekend_time as (
select id,
sum(
least(ddate + interval '1 day', a_column) -
greatest(ddate, b_column)
) as we_ival
from weekend_days
group by id
)
select t.id,
a_column - b_column as raw_difference,
coalesce(we_ival, interval '0') as adjustment,
a_column - b_column -
coalesce(we_ival, interval '0') as adj_difference
from weekend_time w
left join table1 t on t.id = w.id;
Working fiddle.

Speed up query where results with count(*) = 0 are included

I have a table squitters with, amongst others, a column parsed_time. I want to know the number of records per hour for the last two days and used this query:
SELECT date_trunc('hour', parsed_time) AS hour , count(*)
FROM squitters
WHERE parsed_time > date_trunc('hour', now()) - interval '2 day'
GROUP BY hour
ORDER BY hour DESC;
This works, but hours with zero records do not appear in the result. I want to have hours
with zero records also in the result with a count equal to zero, so I wrote this query using the generate_series function:
SELECT bins.hour, count(squitters.parsed_time)
FROM generate_series(date_trunc('hour', now() - interval '2 day'), now(), '1 hour') bins(hour)
LEFT OUTER JOIN squitters ON bins.hour = date_trunc('hours', squitters.parsed_time)
GROUP BY bins.hour
ORDER BY bins.hour DESC;
This works, in the results are hour-bins with counts equal to zero, but is considerably slower.
How can I have the speed of the first query with the count=zero results of the second query?
(btw. there is an index on parsed_time)

You could try and change the join condition so no date function is applied on column parsed_time:
SELECT b.hour, COUNT(s.parsed_time) cnt
FROM generate_series(date_trunc('hour', now() - interval '2 day'), now(), '1 hour') b(hour)
LEFT OUTER JOIN squitters s
ON s.parsed_time >= b.hour
AND s.parsed_time < b.hours + interval '1 hour'
GROUP BY b.hour
ORDER BY b.hour DESC;
Alternatively, you could also try using a correlated subquery (or a lateral join) instead of a left join - this avoids the need for outer aggregation:
SELECT
b.hour,
(
SELECT COUNT(*)
FROM squitters s
WHERE s.parsed_time >= b.hour AND s.parsed_time < b.hours + interval '1 hour'
) cnt
FROM generate_series(date_trunc('hour', now() - interval '2 day'), now(), '1 hour') b(hour)
ORDER BY b.hour desc

You could take advantage of Common Table Expressions to divide your problem into small chunks:
WITH cte AS (
--First query your table
SELECT date_trunc('hour', parsed_time) AS sq_hour , count(*)
FROM squitters
WHERE parsed_time > date_trunc('hour', now()) - interval '2 day'
GROUP BY hour
ORDER BY hour DESC
), series AS (
--Create the series without the data returned from 1st query
SELECT
bins.series_hour,
0
FROM
generate_series(date_trunc('hour', now() - interval '2 day'), now(), '1 hour') bins(series_hour)
WHERE
series_hour not in (SELECT sq_hour FROM cte)
)
--Union the result
SELECT * FROM cte
UNION
SELECT * FROM series
ORDER BY 1

Postgres - Return 0 count for intervals with no data in date_trunc

I am trying to create a table that lists how many counts i have in 5 minute intervals over 10 days. I think my join is wrong since i am not getting the empty rows in my query.
select date_trunc('minute', activities.activitytime) -
(CAST(EXTRACT(MINUTE FROM activities.activitytime)
AS integer) % 5) * interval '1 minute' as day_column, count(activities.activityid)
from generate_series(current_date - interval '10 day', current_date, '1 minute') d
left join activities on date(activities.activitytime) = d
group by day_column
order by day_column;

You are close. But the key idea is that you need to use the columns from the generate_series() for the group by key:
select d.dte, count(a.activitytime)
from generate_series(current_date - interval '10 day', current_date, '5 minute') d(dte) left join
activities a
on a.activitytime >= d.dte and a.activitytime < d.dte + interval '5 minute'
group by d.dte
order by d.dte;

Postgresql get max row group by column

I am trying to get the max row from the sum of daily counts in a table. I have looked at several posts that look similar, however it doesn't seem to work. I have tried to follow
Get MAX row for GROUP in MySQL
but it doesn't work in Postgres. Here's what I have
select source, SUM(steps) as daily_steps, to_char("endTime"::date, 'MM/DD/YYYY') as step_date
from activities
where user_id = 1
and "endTime" <= CURRENT_TIMESTAMP + INTERVAL '1 day'
and "endTime" >= CURRENT_TIMESTAMP - INTERVAL '7 days'
group by source, to_char("endTime"::date, 'MM/DD/YYYY')
This returns the following
source, daily_steps, step_date
"walking";750;"11/17/2015"
"walking";821;"11/22/2015"
"walking";106;"11/20/2015"
"running";234;"11/21/2015"
"running";600;"11/24/2015"
I would like the result to return only the rows that have the max value for daily_steps by source. The result should look like
source, daily_steps, step_date
"walking";821;"11/22/2015"
"running";600;"11/24/2015"

Postgres offers the convenient distinct on syntax:
select distinct on (a.source) a.*
from (select source, SUM(steps) as daily_steps, to_char("endTime"::date, 'MM/DD/YYYY') as step_date
from activities a
where user_id = 1 and
"endTime" <= CURRENT_TIMESTAMP + INTERVAL '1 day' and
"endTime" >= CURRENT_TIMESTAMP - INTERVAL '7 days'
group by source, to_char("endTime"::date, 'MM/DD/YYYY')
) a
order by a.source, daily_steps desc;

Daily average for the month (needs number of days in month)

I have a table as follow:
CREATE TABLE counts
(
T TIMESTAMP NOT NULL,
C INTEGER NOT NULL
);
I create the following views from it:
CREATE VIEW micounts AS
SELECT DATE_TRUNC('minute',t) AS t,SUM(c) AS c FROM counts GROUP BY 1;
CREATE VIEW hrcounts AS
SELECT DATE_TRUNC('hour',t) AS t,SUM(c) AS c,SUM(c)/60 AS a
FROM micounts GROUP BY 1;
CREATE VIEW dycounts AS
SELECT DATE_TRUNC('day',t) AS t,SUM(c) AS c,SUM(c)/24 AS a
FROM hrcounts GROUP BY 1;
The problem now comes in when I want to create the monthly counts to know what to divide the daily sums by to get the average column a i.e. the number of days in the specific month.
I know to get the days in PostgreSQL you can do:
SELECT DATE_PART('days',DATE_TRUNC('month',now())+'1 MONTH'::INTERVAL-DATE_TRUNC('month',now()))
But I can't use now(), I have to somehow let it know what the month is when the grouping gets done. Any suggestions i.e. what should replace ??? in this view:
CREATE VIEW mocounts AS
SELECT DATE_TRUNC('month',t) AS t,SUM(c) AS c,SUM(c)/(???) AS a
FROM dycounts
GROUP BY 1;

A bit shorter and faster and you get the number of days instead of an interval:
SELECT EXTRACT(day FROM date_trunc('month', now()) + interval '1 month'
- interval '1 day')
It's possible to combine multiple units in a single interval value . So we can use '1 mon - 1 day':
SELECT EXTRACT(day FROM date_trunc('month', now()) + interval '1 mon - 1 day')
(mon, month or months work all the same for month units.)
To divide the daily sum by the number of days in the current month (orig. question):
SELECT t::date AS the_date
, SUM(c) AS c
, SUM(c) / EXTRACT(day FROM date_trunc('month', t::date)
+ interval '1 mon - 1 day') AS a
FROM dycounts
GROUP BY 1;
To divide monthly sum by the number of days in the current month (updated question):
SELECT DATE_TRUNC('month', t)::date AS t
,SUM(c) AS c
,SUM(c) / EXTRACT(day FROM date_trunc('month', t)::date
+ interval '1 mon - 1 day') AS a
FROM dycounts
GROUP BY 1;
You have to repeat the GROUP BY expression if you want to use a single query level.
Or use a subquery:
SELECT *, c / EXTRACT(day FROM t + interval '1 mon - 1 day') AS a
FROM (
SELECT date_trunc('month', t)::date AS t, SUM(c) AS c
FROM dycounts
GROUP BY 1
) sub;

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

PostgreSQL generate_series with WHERE clause - sql

Related

how to get date different in postgres using date_part option

Speed up query where results with count(*) = 0 are included

Postgres - Return 0 count for intervals with no data in date_trunc

Postgresql get max row group by column

Daily average for the month (needs number of days in month)

Categories

Resources