Combine multiple selects for statistics generation into one result set - sql

Hey Pros,
I am far from having good knowledge of SQL, and would like to ask you for some hints.
Currently we aggregate our data with Python, and I would like to move this to SQL (PostgreSQL server) where possible.
My goal is to have one statement that generates an average for two separate columns over specific time intervals (1 hour, 1 day, 1 week, overall). All events in each period should also be counted.
I can create 4 single statements, one for each interval, but I struggle to combine these four selects into one result set.
select
count(id) as hour_count,
camera_name,
round(avg("pconf")) as hour_p_conf,
round(avg("dconf")) as hour_d_conf
from camera_events where timestamp between NOW() - interval '1 HOUR' and NOW() group by camera_name;
select
count(id) as day_count,
camera_name,
round(avg("pconf")) as day_p_conf,
round(avg("dconf")) as day_d_conf
from camera_events where timestamp between NOW() - interval '1 DAY' and NOW() group by camera_name;
select
count(id) as week_count,
camera_name,
round(avg("pconf")) as week_p_conf,
round(avg("dconf")) as week_d_conf
from camera_events where timestamp between NOW() - interval '1 WEEK' and NOW() group by camera_name;
select
count(id) as overall_count,
camera_name,
round(avg("pconf")) as overall_p_conf,
round(avg("dconf")) as overall_d_conf
from camera_events group by camera_name;
If possible, the result should look like the data in the image.
Some hints would be great, thank you.

Consider conditional aggregation by moving WHERE logic to CASE statements in SELECT. Alternatively, in PostgreSQL use FILTER clauses.
select
camera_name,
count(id) filter (where timestamp between NOW() - interval '1 HOUR' and NOW()) as hour_count,
round(avg("pconf") filter (where timestamp between NOW() - interval '1 HOUR' and NOW())) as hour_p_conf,
round(avg("dconf") filter (where timestamp between NOW() - interval '1 HOUR' and NOW())) as hour_d_conf,
count(id) filter (where timestamp between NOW() - interval '1 DAY' and NOW()) as day_count,
round(avg("pconf") filter (where timestamp between NOW() - interval '1 DAY' and NOW())) as day_p_conf,
round(avg("dconf") filter (where timestamp between NOW() - interval '1 DAY' and NOW())) as day_d_conf,
count(id) filter (where timestamp between NOW() - interval '1 WEEK' and NOW()) as week_count,
round(avg("pconf") filter (where timestamp between NOW() - interval '1 WEEK' and NOW())) as week_p_conf,
round(avg("dconf") filter (where timestamp between NOW() - interval '1 WEEK' and NOW())) as week_d_conf,
count(id) as overall_count,
round(avg("pconf")) as overall_p_conf,
round(avg("dconf")) as overall_d_conf
from camera_events
group by camera_name;
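To make the conditional aggregation idea concrete, here is a minimal, runnable sketch. It is not PostgreSQL: it uses Python's built-in sqlite3 module so it can run anywhere, stores timestamps as Unix seconds in a hypothetical ts column, fixes a made-up NOW value for repeatability, and uses the portable CASE form instead of FILTER. The sample rows are invented for illustration.

```python
import sqlite3

# Assumptions for this sketch: timestamps are Unix seconds in a column named ts,
# and NOW is a fixed, made-up "current time" so the output is repeatable.
NOW = 1_000_000
HOUR, DAY = 3600, 86400

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE camera_events ("
    "id INTEGER PRIMARY KEY, camera_name TEXT, ts INTEGER, pconf REAL, dconf REAL)"
)
sample_rows = [
    ("cam_a", NOW - 600, 80, 60),      # within the last hour
    ("cam_a", NOW - 7200, 60, 40),     # within the last day, but not the last hour
    ("cam_a", NOW - 3 * DAY, 40, 20),  # older than a day
    ("cam_b", NOW - 2 * DAY, 50, 50),  # camera with no recent events
]
conn.executemany(
    "INSERT INTO camera_events (camera_name, ts, pconf, dconf) VALUES (?, ?, ?, ?)",
    sample_rows,
)

# One pass over the table: each aggregate only sees the rows its CASE lets through.
query = """
SELECT camera_name,
       count(CASE WHEN ts >= :now - :hour THEN 1 END)   AS hour_count,
       avg(CASE WHEN ts >= :now - :hour THEN pconf END) AS hour_p_conf,
       count(CASE WHEN ts >= :now - :day THEN 1 END)    AS day_count,
       avg(CASE WHEN ts >= :now - :day THEN pconf END)  AS day_p_conf,
       count(*)                                         AS overall_count,
       avg(pconf)                                       AS overall_p_conf
FROM camera_events
GROUP BY camera_name
ORDER BY camera_name
"""
result = conn.execute(query, {"now": NOW, "hour": HOUR, "day": DAY}).fetchall()
for row in result:
    print(row)
```

Note that a camera with no events in a given window still gets a row: its count is 0 and its average is NULL (None in Python), because the aggregate received no non-NULL inputs.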

The simplest way is to join them. For example:
select
coalesce(h.camera_name, d.camera_name, w.camera_name) as camera_name,
h.hour_count, h.hour_p_conf, h.hour_d_conf,
d.day_count, d.day_p_conf, d.day_d_conf,
w.week_count, w.week_p_conf, w.week_d_conf
from (
-- hourly query here
) h
full join (
-- daily query here
) d on d.camera_name = h.camera_name
full join (
-- weekly query here
) w on w.camera_name = coalesce(h.camera_name, d.camera_name)

Related

replace multiple queries with a single query which will give the same result

The queries are :
select date_trunc('month', now()) - interval '1 month' as prev_month_first_date
select (date_trunc('month', now())::date-1 - interval '0 days') as prev_month_last_date;
select date_trunc('month', now()) as current_month_first_date;
SELECT date_trunc('year', now()) as current_year_first_date;
SELECT date_trunc('quarter', now()) as current_quarter_first_date;
Can I form a single query from all these queries? I am working with PostgreSQL.
If you are looking for separate columns, just use a single select with multiple expressions:
select date_trunc('month', now()) - interval '1 month' as prev_month_first_date,
(date_trunc('month', now())::date-1 - interval '0 days') as prev_month_last_date,
date_trunc('month', now()) as current_month_first_date,
date_trunc('year', now()) as current_year_first_date,
date_trunc('quarter', now()) as current_quarter_first_date;

Create a report with 2 rows of values, each of which is from a separate SELECT statement

I have a report (using Blazer, if you care) that displays data like this, of recently updated or created rows in the jobs table:
5 Minutes | 1 Hour | 1 Day | Total
----------------------------------
0 0 367 30,989
The SQL looks something like this:
SELECT
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
WHERE "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '5 minutes' AND NOW()
) as "5 Minutes",
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
WHERE "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '1 Hours' AND NOW()
) as "1 Hour",
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
WHERE "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '1 Day' AND NOW()
) as "1 Day",
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
) as "Total"
;
I want to add a second row, for jobs WHERE "Jobs"."active" IS TRUE. How do I make this display another row?
I want the final result to be something like this:
Status | 5 Minutes | 1 Hour | 1 Day | Total
-------------------------------------------
* 0 0 367 30,989
Active 0 0 123 24,972
The labels are not the issue. The only thing that's not obvious is how to create a new row.
The simplest way is to UNION ALL a second set of queries that carry the more restrictive WHERE clause:
SELECT
'*' as Kind,
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
WHERE "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '5 minutes' AND NOW()
) as "5 Minutes",
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
WHERE "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '1 Hours' AND NOW()
) as "1 Hour",
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
WHERE "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '1 Day' AND NOW()
) as "1 Day",
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
) as "Total"
UNION ALL
SELECT
'Active',
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
WHERE "Jobs"."active" IS TRUE AND "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '5 minutes' AND NOW()
) as "5 Minutes",
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
WHERE "Jobs"."active" IS TRUE AND "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '1 Hours' AND NOW()
) as "1 Hour",
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
WHERE "Jobs"."active" IS TRUE AND "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '1 Day' AND NOW()
) as "1 Day",
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
WHERE "Jobs"."active" IS TRUE
) as "Total"
If I were you, I would prefer this way to resolve your query:
select
"Jobs"."active" as Status,
sum(case when "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '5 minutes' AND NOW() then 1 else 0 end) as "5 Minutes",
sum(case when "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '1 Hours' AND NOW() then 1 else 0 end) as "1 Hour",
sum(case when "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '1 Day' AND NOW() then 1 else 0 end) as "1 Day",
count(*) as "Total"
from public.jobs AS "Jobs"
group by "Jobs"."active"
This way you read your table public.jobs once, rather than once per count. With this choice, grouping by the status is a simple GROUP BY operation.
Basically, you want conditional aggregation. In Postgres, that would normally use filter:
SELECT COUNT(*) FILTER (WHERE j."updated_at" BETWEEN NOW() - INTERVAL '5 minute' AND NOW()) as cnt_5_minutes,
COUNT(*) FILTER (WHERE j."updated_at" BETWEEN NOW() - INTERVAL '1 hour' AND NOW()) as cnt_1_hour,
COUNT(*) FILTER (WHERE j."updated_at" BETWEEN NOW() - INTERVAL '1 day' AND NOW()) as cnt_1_day,
COUNT(*) as Total
FROM public.jobs j;
You probably don't have future update dates, so this would more simply be written as:
SELECT COUNT(*) FILTER (WHERE j."updated_at" >= NOW() - INTERVAL '5 minute') as cnt_5_minutes,
COUNT(*) FILTER (WHERE j."updated_at" >= NOW() - INTERVAL '1 hour') as cnt_1_hour,
COUNT(*) FILTER (WHERE j."updated_at" >= NOW() - INTERVAL '1 day') as cnt_1_day,
COUNT(*) as Total
FROM public.jobs j;
In addition, I would advise you to drop the double quotes from updated_at. Using double quotes around identifiers is just a bad habit.
The only thing that's not obvious is how to create a new row.
Basically, add a second row with UNION ALL.
First get rid of all the separate SELECT queries for each metric, though. That's needlessly expensive (important if the table is not trivially small). A single SELECT with conditional aggregates can replace all of your original subqueries (like Gordon suggested). In Postgres 9.4 or later, the aggregate FILTER clause is the way to go. See:
Aggregate columns with additional (distinct) filters
To get another row you could just run another query, adding the filter "active" IS TRUE to each expression (which boils down to just active, as a boolean column needs no further evaluation).
But that would double the cost again, and we can avoid that. Run a single SELECT in a CTE, and then split the results with UNION ALL in the outer query:
WITH cte AS (
SELECT count(*) FILTER (WHERE updated_at > now() - interval '5 min') AS ct_5min
, count(*) FILTER (WHERE updated_at > now() - interval '5 min' AND active) AS ct_5min_a
, count(*) FILTER (WHERE updated_at > now() - interval '1 hour') AS ct_1h
, count(*) FILTER (WHERE updated_at > now() - interval '1 hour' AND active) AS ct_1h_a
, count(*) FILTER (WHERE updated_at > now() - interval '1 day') AS ct_1d
, count(*) FILTER (WHERE updated_at > now() - interval '1 day' AND active) AS ct_1d_a
, count(*) AS ct_all
, count(*) FILTER (WHERE active) AS ct_all_a
FROM public.jobs
)
SELECT '*' AS status, ct_5min, ct_1h, ct_1d, ct_all
FROM cte
UNION ALL
SELECT 'Active', ct_5min_a, ct_1h_a, ct_1d_a, ct_all_a
FROM cte
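The CTE-plus-UNION ALL shape above can be tried out with a small, self-contained sketch. This is an illustration under assumptions rather than the PostgreSQL query itself: it runs on Python's built-in sqlite3 module, stores updated_at as Unix seconds, uses a made-up fixed NOW, represents active as 0/1, and swaps FILTER for the portable count(CASE ...) form. The sample jobs are invented.

```python
import sqlite3

NOW = 1_000_000  # made-up fixed "current time" in Unix seconds (assumption)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, updated_at INTEGER, active INTEGER)"
)
conn.executemany(
    "INSERT INTO jobs (updated_at, active) VALUES (?, ?)",
    [
        (NOW - 60, 1),       # 1 minute ago, active
        (NOW - 120, 0),      # 2 minutes ago, inactive
        (NOW - 1800, 1),     # 30 minutes ago, active
        (NOW - 50_000, 0),   # about 14 hours ago, inactive
        (NOW - 500_000, 1),  # several days ago, active
    ],
)

# All eight counts are computed in one SELECT inside the CTE;
# the outer query only rearranges them into two labelled rows.
query = """
WITH cte AS (
    SELECT count(CASE WHEN updated_at > :now - 300 THEN 1 END)              AS ct_5min,
           count(CASE WHEN updated_at > :now - 300 AND active THEN 1 END)   AS ct_5min_a,
           count(CASE WHEN updated_at > :now - 3600 THEN 1 END)             AS ct_1h,
           count(CASE WHEN updated_at > :now - 3600 AND active THEN 1 END)  AS ct_1h_a,
           count(CASE WHEN updated_at > :now - 86400 THEN 1 END)            AS ct_1d,
           count(CASE WHEN updated_at > :now - 86400 AND active THEN 1 END) AS ct_1d_a,
           count(*)                                                         AS ct_all,
           count(CASE WHEN active THEN 1 END)                               AS ct_all_a
    FROM jobs
)
SELECT '*' AS status, ct_5min, ct_1h, ct_1d, ct_all FROM cte
UNION ALL
SELECT 'Active', ct_5min_a, ct_1h_a, ct_1d_a, ct_all_a FROM cte
"""
rows = conn.execute(query, {"now": NOW}).fetchall()

# Key the rows by their status label so the result does not depend on row order.
report = {row[0]: row[1:] for row in rows}
for status, counts in report.items():
    print(status, counts)
```

The report dictionary ends up with one entry per output row, keyed by the status label, which mirrors the two-row layout asked for in the question.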

SQL: Select average value of column for last hour and last day

I have a table like the one in the image below. What I need is the average value of the Volume column, grouped by User, for both the last hour and the last 24 hours. How can I use avg with two different date ranges in a single query?
You can do it like:
SELECT user, AVG(Volume)
FROM mytable
WHERE created >= NOW() - interval '1 hour'
AND created <= NOW()
GROUP BY user
A few things to remember: this assumes you are executing the query on the same server with the same time zone. You need to group by user so that all the values in the Volume column are collected per user, and then apply an aggregate function such as AVG to find the average. Similarly, if you need both together, you could do the following:
SELECT u1.user, u1.average, u2.average
FROM
(SELECT user, AVG(Volume) as average
FROM mytable
WHERE created >= NOW() - interval '1 hour'
AND created <= NOW()
GROUP BY user) AS u1
INNER JOIN
(SELECT user, AVG(Volume) as average
FROM mytable
WHERE created >= NOW() - interval '1 day'
AND created <= NOW()
GROUP BY user) AS u2
ON u1.user = u2.user
Use conditional aggregation. Postgres offers very convenient syntax using the FILTER clause:
SELECT user,
AVG(Volume) FILTER (WHERE created >= NOW() - interval '1 hour' AND created <= NOW()) as avg_1hour,
AVG(Volume) FILTER (WHERE created >= NOW() - interval '1 day' AND created <= NOW()) as avg_1day
FROM mytable
WHERE created >= NOW() - interval '1 DAY' AND
created <= NOW()
GROUP BY user;
This will filter out users who have had no activity in the past day. If you want all users -- even those with no recent activity -- remove the WHERE clause.
The more traditional method uses CASE:
SELECT user,
AVG(CASE WHEN created >= NOW() - interval '1 hour' AND created <= NOW() THEN Volume END) as avg_1hour,
AVG(CASE WHEN created >= NOW() - interval '1 day' AND created <= NOW() THEN Volume END) as avg_1day
. . .
SELECT User, AVG(Volume),
-- IntervalType: 1 = within the last hour, 0 = older, but within the last 24 hours
CASE WHEN created >= DATE_SUB(NOW(), INTERVAL 1 HOUR) THEN 1 ELSE 0 END AS IntervalType
FROM mytable
WHERE created >= DATE_SUB(NOW(), INTERVAL 24 HOUR)
GROUP BY User, CASE WHEN created >= DATE_SUB(NOW(), INTERVAL 1 HOUR) THEN 1 ELSE 0 END
Please tell me about its result :)

Postgres - Return 0 count for intervals with no data in date_trunc

I am trying to create a table that lists how many activities I have in 5-minute intervals over 10 days. I think my join is wrong, since I am not getting the empty rows in my query.
select date_trunc('minute', activities.activitytime) -
(CAST(EXTRACT(MINUTE FROM activities.activitytime)
AS integer) % 5) * interval '1 minute' as day_column, count(activities.activityid)
from generate_series(current_date - interval '10 day', current_date, '1 minute') d
left join activities on date(activities.activitytime) = d
group by day_column
order by day_column;
You are close. But the key idea is that you need to use the columns from the generate_series() for the group by key:
select d.dte, count(a.activitytime)
from generate_series(current_date - interval '10 day', current_date, '5 minute') d(dte) left join
activities a
on a.activitytime >= d.dte and a.activitytime < d.dte + interval '5 minute'
group by d.dte
order by d.dte;
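The key point, grouping by the generated bucket rather than by the activity's own timestamp, can be checked with a small runnable sketch. It is an illustration under assumptions: it uses Python's built-in sqlite3 module, which has no generate_series() by default, so a recursive CTE stands in for it. Timestamps are plain integers (seconds), and START, STEP, and BUCKETS are made-up values describing four 5-minute buckets.

```python
import sqlite3

# Hypothetical window for illustration: four 5-minute buckets starting at t = 0 seconds.
START, STEP, BUCKETS = 0, 300, 4

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activities (activityid INTEGER PRIMARY KEY, activitytime INTEGER)"
)
# Two activities fall into the first bucket, one into the third, none elsewhere.
conn.executemany(
    "INSERT INTO activities (activitytime) VALUES (?)", [(10,), (20,), (650,)]
)

# The recursive CTE plays the role of generate_series(): one row per bucket start.
# Grouping by d.dte (the generated value) keeps buckets that have no activities.
query = """
WITH RECURSIVE d(dte) AS (
    SELECT :start
    UNION ALL
    SELECT dte + :step FROM d WHERE dte + :step < :start + :step * :buckets
)
SELECT d.dte, count(a.activitytime)
FROM d
LEFT JOIN activities a
       ON a.activitytime >= d.dte AND a.activitytime < d.dte + :step
GROUP BY d.dte
ORDER BY d.dte
"""
counts = conn.execute(
    query, {"start": START, "step": STEP, "buckets": BUCKETS}
).fetchall()
for bucket_start, n in counts:
    print(bucket_start, n)
```

Because count(a.activitytime) counts only non-NULL values, buckets whose LEFT JOIN found no match report 0 instead of disappearing.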

Gain Performance of PostgreSQL Query

I have a PostgreSQL database with a denormalized schema (1 table) with around 4 million entries. Now I have this query:
SELECT
count(*) AS Total,
(SELECT count(*) FROM table
WHERE "Timestamp" > current_timestamp - INTERVAL '1 hour' AND "tableName" LIKE '%ping%') AS hour,
(SELECT count(*) FROM table
WHERE "Timestamp" > now() :: DATE AND "tableName" LIKE '%ping%') AS day,
(SELECT count(*)
FROM table
WHERE "Timestamp" > now() :: DATE - INTERVAL '1 day' AND
"Timestamp" <= now() :: DATE AND "tableName" LIKE '%ping%') AS yesterday,
(SELECT count(*) FROM table
WHERE "Timestamp" > now() :: DATE - INTERVAL '2 day' AND
"Timestamp" <= now() :: DATE - INTERVAL '1 day' AND "tableName" LIKE '%ping%') AS "dayBeforeYesterday",
(SELECT count(*)
FROM table WHERE "Timestamp" > current_timestamp - INTERVAL '1 week' AND "tableName" LIKE '%ping%') AS week,
(SELECT count(*)
FROM table
WHERE "Timestamp" > current_timestamp - INTERVAL '2 week' AND
"Timestamp" < current_timestamp - INTERVAL '1 week' AND "tableName" LIKE '%ping%') AS "lastWeek",
(SELECT count(*)
FROM table
WHERE "Timestamp" > current_timestamp - INTERVAL '3 week' AND
"Timestamp" < current_timestamp - INTERVAL '2 week' AND "tableName" LIKE '%ping%') AS "weekBeforeLastWeek",
(SELECT count(*)
FROM table
WHERE "Timestamp" > current_timestamp - INTERVAL '1 month' AND "tableName" LIKE '%ping%') AS month
FROM table WHERE "tableName" LIKE '%ping%';
This takes anywhere from around 14 seconds to 2 minutes (depending on how much other stuff is going on), and my server, an Ubuntu VM hosted on Azure, always has both CPUs at 100%.
If I check the PostgreSQL statistics, it is mostly this query that saturates the CPU.
It's a D2 VM with 2 cores, 7 GB SSD.
Is there a way to speed this up without upgrading my Azure Package?
Instead of all those sub-queries, use case expressions to do conditional aggregation:
SELECT
count(*) AS Total,
count(case when "Timestamp" > current_timestamp - INTERVAL '1 hour' AND "tableName" LIKE '%ping%' then 1 end) AS hour,
...