Speed up query where results with count(*) = 0 are included - sql

I have a table squitters with, amongst others, a column parsed_time. I want to know the number of records per hour for the last two days and used this query:
SELECT date_trunc('hour', parsed_time) AS hour , count(*)
FROM squitters
WHERE parsed_time > date_trunc('hour', now()) - interval '2 day'
GROUP BY hour
ORDER BY hour DESC;
This works, but hours with zero records do not appear in the result. I want to have hours
with zero records also in the result with a count equal to zero, so I wrote this query using the generate_series function:
SELECT bins.hour, count(squitters.parsed_time)
FROM generate_series(date_trunc('hour', now() - interval '2 day'), now(), '1 hour') bins(hour)
LEFT OUTER JOIN squitters ON bins.hour = date_trunc('hours', squitters.parsed_time)
GROUP BY bins.hour
ORDER BY bins.hour DESC;
This works: the result now contains hour bins with counts equal to zero, but it is considerably slower.
How can I get the speed of the first query together with the count-of-zero rows of the second query?
(By the way, there is an index on parsed_time.)

You could try changing the join condition so that no date function is applied to the column parsed_time, which keeps the predicate index-friendly:
SELECT b.hour, COUNT(s.parsed_time) cnt
FROM generate_series(date_trunc('hour', now() - interval '2 day'), now(), '1 hour') b(hour)
LEFT OUTER JOIN squitters s
ON s.parsed_time >= b.hour
AND s.parsed_time < b.hour + interval '1 hour'
GROUP BY b.hour
ORDER BY b.hour DESC;
Alternatively, you could also try using a correlated subquery (or a lateral join) instead of a left join - this avoids the need for outer aggregation:
SELECT
b.hour,
(
SELECT COUNT(*)
FROM squitters s
WHERE s.parsed_time >= b.hour AND s.parsed_time < b.hour + interval '1 hour'
) cnt
FROM generate_series(date_trunc('hour', now() - interval '2 day'), now(), '1 hour') b(hour)
ORDER BY b.hour DESC;
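The zero-fill logic both queries aim for can be sketched outside the database. This is a minimal Python illustration (with invented timestamps, not the real table) of what the generate_series left join computes: one bin per hour, with absent hours kept at zero.

```python
from datetime import datetime, timedelta

def hourly_counts(events, start, end):
    """Count events per hour bin; bins with no events stay at 0."""
    floor = lambda ts: ts.replace(minute=0, second=0, microsecond=0)
    counts = {}
    h = floor(start)
    while h <= end:                      # one entry per hour, like generate_series
        counts[h] = 0
        h += timedelta(hours=1)
    for ts in events:                    # bump the bin each event falls into
        b = floor(ts)
        if b in counts:
            counts[b] += 1
    return counts

events = [datetime(2024, 1, 1, 10, 15), datetime(2024, 1, 1, 10, 45)]
c = hourly_counts(events, datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 11))
# bins 09:00, 10:00, 11:00 -> counts 0, 2, 0
```

The empty 09:00 and 11:00 bins are exactly the rows the plain GROUP BY query loses.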

You could take advantage of Common Table Expressions to divide your problem into small chunks:
WITH cte AS (
--First, query your table (the GROUP BY must reference the sq_hour alias)
SELECT date_trunc('hour', parsed_time) AS sq_hour, count(*) AS sq_count
FROM squitters
WHERE parsed_time > date_trunc('hour', now()) - interval '2 day'
GROUP BY sq_hour
), series AS (
--Create the series without the hours returned by the 1st query
SELECT
bins.series_hour,
0 AS series_count
FROM
generate_series(date_trunc('hour', now() - interval '2 day'), now(), '1 hour') bins(series_hour)
WHERE
series_hour NOT IN (SELECT sq_hour FROM cte)
)
--Union the results (UNION ALL is enough, since the two sets cannot overlap)
SELECT * FROM cte
UNION ALL
SELECT * FROM series
ORDER BY 1;

Related

Take last hour and group it by 1 minute

I was wondering if you could help me write a query that does a simple SELECT count(*), but only includes data from the last hour, grouped by minute.
So I have a table with a createdts column, so I have the date there. I just want to see how many entries I have in the last hour, but with COUNT(*) grouped per minute.
SELECT COUNT(*) FROM mytable
WHERE createdts >= now()::date - interval '1 hour'
GROUP BY 'every minute'
DATE_TRUNC() does this (note: don't cast now() to date, or you would start counting from midnight instead of one hour ago):
SELECT DATE_TRUNC('minute', createdts), COUNT(*)
FROM mytable
WHERE createdts >= now() - interval '1 hour'
GROUP BY DATE_TRUNC('minute', createdts)
ORDER BY DATE_TRUNC('minute', createdts);
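What DATE_TRUNC('minute', ...) does can be mimicked in Python by zeroing the sub-minute fields; this rough sketch (with invented timestamps) shows why truncation makes rows group per minute:

```python
from datetime import datetime
from collections import Counter

def trunc_minute(ts):
    """Equivalent of DATE_TRUNC('minute', ts): drop seconds and below."""
    return ts.replace(second=0, microsecond=0)

rows = [
    datetime(2024, 1, 1, 12, 0, 5),
    datetime(2024, 1, 1, 12, 0, 59),
    datetime(2024, 1, 1, 12, 1, 30),
]
per_minute = Counter(trunc_minute(ts) for ts in rows)
# 12:00 -> 2 rows, 12:01 -> 1 row
```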

How to get a date difference in Postgres using the date_part option

How to get date time difference in PostgreSQL
I am using the syntax below:
select id, A_column,B_column,
(SELECT count(*) AS count_days_no_weekend
FROM generate_series(B_column ::timestamp , A_column ::timestamp, interval '1 day') the_day
WHERE extract('ISODOW' FROM the_day) < 5) * 24 + DATE_PART('hour', B_column::timestamp-A_column ::timestamp ) as hrs
FROM table req where id='123';
If A_column = 2020-05-20 00:00:00 and B_column = 2020-05-15 00:00:00, I want to get 72 (in hours). Is there any possibility to skip weekends (Saturday and Sunday) in the first part, i.e. to get the result as 72 hours (excluding the 48 weekend hours)? Currently I am getting 0, but I need to get 72 hours.
And if A_column = 2020-08-15 12:00:00 and B_column = 2020-08-15 00:00:00, I want to get 12 (in hours).
One option uses a lateral join and generate_series() to enumerate each and every hour between the two timestamps, while filtering out weekends. Note that ISODOW is 6 for Saturday and 7 for Sunday, so weekday hours satisfy < 6 (not < 5, which would also drop Friday), and the series must stop one hour before a_column or the duration is overcounted by one:
select t.a_column, t.b_column, h.count_hours_no_weekend
from mytable t
cross join lateral (
select count(*) count_hours_no_weekend
from generate_series(t.b_column::timestamp, t.a_column::timestamp - interval '1 hour', interval '1 hour') s(col)
where extract('isodow' from s.col) < 6
) h
where t.id = 123
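The hour-enumeration idea can be checked outside SQL. This Python sketch counts weekday hours between the question's two example timestamps (isoweekday() matches ISODOW: 6 is Saturday, 7 is Sunday):

```python
from datetime import datetime, timedelta

def weekday_hours(b, a):
    """Hours from b (inclusive) to a (exclusive) that fall on Mon-Fri."""
    count, h = 0, b
    while h < a:
        if h.isoweekday() < 6:   # 6 = Saturday, 7 = Sunday
            count += 1
        h += timedelta(hours=1)
    return count

# B_column = 2020-05-15 00:00 (a Friday), A_column = 2020-05-20 00:00 (a Wednesday):
# 120 total hours minus the 48 weekend hours = 72
print(weekday_hours(datetime(2020, 5, 15), datetime(2020, 5, 20)))  # 72
```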
I would attack this by calculating the weekend hours, letting the database deal with daylight saving time, and then subtracting the intervening weekend hours from the difference between the two values.
with weekend_days as (
select *, date_part('isodow', ddate) as dow
from table1
cross join lateral
generate_series(
date_trunc('day', b_column),
date_trunc('day', a_column),
interval '1 day') as gs(ddate)
where date_part('isodow', ddate) in (6, 7)
), weekend_time as (
select id,
sum(
least(ddate + interval '1 day', a_column) -
greatest(ddate, b_column)
) as we_ival
from weekend_days
group by id
)
select t.id,
a_column - b_column as raw_difference,
coalesce(we_ival, interval '0') as adjustment,
a_column - b_column -
coalesce(we_ival, interval '0') as adj_difference
from weekend_time w
left join table1 t on t.id = w.id;

Postgres - Return 0 count for intervals with no data in date_trunc

I am trying to create a table that lists how many counts I have in 5-minute intervals over 10 days. I think my join is wrong, since I am not getting the empty rows in my query.
select date_trunc('minute', activities.activitytime) -
(CAST(EXTRACT(MINUTE FROM activities.activitytime)
AS integer) % 5) * interval '1 minute' as day_column, count(activities.activityid)
from generate_series(current_date - interval '10 day', current_date, '1 minute') d
left join activities on date(activities.activitytime) = d
group by day_column
order by day_column;
You are close. But the key idea is that you need to use the column from generate_series() as the group by key:
select d.dte, count(a.activitytime)
from generate_series(current_date - interval '10 day', current_date, '5 minute') d(dte) left join
activities a
on a.activitytime >= d.dte and a.activitytime < d.dte + interval '5 minute'
group by d.dte
order by d.dte;

PostgreSQL generate_series with WHERE clause

I'm having an issue generating a series of dates and then returning the COUNT of rows matching each date in the series.
SELECT generate_series(current_date - interval '30 days', current_date, '1 day':: interval) AS i, COUNT(*)
FROM download
WHERE product_uuid = 'someUUID'
AND created_at = i
GROUP BY created_at::date
ORDER BY created_at::date ASC
I want the output to be the number of rows that match the current date in the series.
05-05-2018, 35
05-06-2018, 23
05-07-2018, 0
05-08-2018, 10
...
The schema has the following columns: id, product_uuid, created_at. Any help would be greatly appreciated. I can add more detail if needed.
Put the table generating function in the FROM clause and use a join. Keep the product_uuid filter in the ON clause, not in a WHERE clause, so the zero-count days survive the left join:
SELECT g.dte, COUNT(d.product_uuid)
FROM generate_series(current_date - interval '30 days', current_date, '1 day'::interval) g(dte)
LEFT JOIN download d
ON d.product_uuid = 'someUUID'
AND d.created_at::date = g.dte
GROUP BY g.dte
ORDER BY g.dte;

how to insert non-grouped data

Inspired by this great answer, I wrote the following query, which returns the AVG calculated over 5-minute intervals for the last year.
What I would like to have is all of the 5-minute intervals, with the average set to null where no rows fall into a particular timespan.
with intervals as (select
(select min("timestamp") from public.hst_energy_d) + n AS start_timestamp,
(select min("timestamp") from public.hst_energy_d) + n + 299 AS end_timestamp
from generate_series(extract(epoch from now())::BIGINT - 10596096000, extract(epoch from now())::BIGINT, 300) n)
(SELECT AVG(meas."Al1") as "avg", islots.start_timestamp AS "timestamp"
FROM public.hst_energy_d meas
RIGHT OUTER JOIN intervals islots
on meas.timestamp >= islots.start_timestamp and meas.timestamp <= islots.end_timestamp
WHERE
meas.idinstrum = 4
AND
meas.id_device = 122
AND
meas.timestamp > extract(epoch from now()) - 10596096000
GROUP BY islots.start_timestamp, islots.end_timestamp
ORDER BY timestamp);
I think I see what you're trying to do, and I wonder if using interval '5 minutes' liberally wouldn't be a better and easier-to-follow approach:
with times as ( -- find the first date in the dataset, up to today
select
date_trunc ('minutes', min("timestamp")) -
mod (extract ('minutes' from min("timestamp"))::int, 5) * interval '1 minute' as bt,
date_trunc ('minutes', current_timestamp) -
mod (extract ('minutes' from current_timestamp)::int, 5) * interval '1 minute' as et
from hst_energy_d
where
idinstrum = 4 and
id_device = 122
), -- generate every possible range between these dates
ranges as (
select
generate_series(bt, et, interval '5 minutes') as range_start
from times
), -- normalize your data to the 5-minute interval it belongs to
rounded_hst as (
select
date_trunc ('minutes', "timestamp") -
mod (extract ('minutes' from "timestamp")::int, 5) * interval '1 minute' as round_time,
*
from hst_energy_d
where
idinstrum = 4 and
id_device = 122
)
select
r.range_start, r.range_start + interval '5 minutes' as range_end,
avg (hd."Al1")
from
ranges r
left join rounded_hst hd on
r.range_start = hd.round_time
group by
r.range_start
order by
r.range_start
By the way, the discerning eye may wonder why bother with the CTE rounded_hst instead of just using a "between" condition in the join. From everything I've tested and observed, the database will explode out all possibilities and then test the between condition in what amounts to a where clause -- a filtered Cartesian product. For this many intervals, that's guaranteed to be a killer.
Truncating each row to its nearest five-minute boundary allows a standard equality join instead. I encourage you to test both, and I think you'll see what I mean.
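The rounding expression in the rounded_hst CTE is just "floor the minutes to a multiple of 5"; here is the same arithmetic as a small Python sketch:

```python
from datetime import datetime

def floor_5min(ts):
    """date_trunc('minutes', ts) - (minutes % 5) * interval '1 minute', in Python terms."""
    return ts.replace(minute=ts.minute - ts.minute % 5, second=0, microsecond=0)

print(floor_5min(datetime(2016, 11, 17, 14, 43, 21)))  # 2016-11-17 14:40:00
```

Because every row lands exactly on a bucket boundary, the join against the generated ranges becomes a plain equality comparison.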
-- EDIT 11/17/2016 --
Solution from the OP that takes into account that the times are epoch numbers, not dates:
with times as ( -- find the first date in the dataset, up to today
select
date_trunc('minutes', to_timestamp(min("timestamp"))::timestamp) -
mod(extract ('minutes' from to_timestamp(min("timestamp"))::timestamp)::int, 5) * interval '1 minute' as bt,
date_trunc('minutes', current_timestamp::timestamp) -
mod(extract ('minutes' from (current_timestamp)::timestamp)::int, 5) * interval '1 minute' as et
from hst_energy_d
where
idinstrum = 4 and
id_device = 122
), -- generate every possible range between these dates
ranges as (
select
generate_series(bt, et, interval '5 minutes') as range_start
from times
), -- normalize your data to which 5-minute interval it belongs to
rounded_hst as (
select
date_trunc ('minutes', to_timestamp("timestamp")::timestamp)::timestamp -
mod (extract ('minutes' from (to_timestamp("timestamp")::timestamp))::int, 5) * interval '1 minute' as round_time,
*
from hst_energy_d
where
idinstrum = 4 and
id_device = 122
)
select
extract('epoch' from r.range_start)::bigint, extract('epoch' from r.range_start + interval '5 minutes')::bigint as range_end,
avg (hd."Al1")
from
ranges r
left join rounded_hst hd on
r.range_start = hd.round_time
group by
r.range_start
order by
r.range_start;
I think this post will suit you:
Group DateTime into 5,15,30 and 60 minute intervals
It shows a way of grouping dates; I'd recommend building a scalar function from it.