SQL averages per row from multiple columns and nulls

I have an app that logs data from sensors, and I want to be able to produce averages across multiple sensors: there could be one, two, three or many of them...
EDIT: These are temperature sensors so 0 is a value that the sensors might store as a value in the database.
My initial starting point was this SQL query:
SELECT grid.t5 || '.000000' AS ts,
       avg(t.sensorvalue) AS sensorvalue1,
       avg(w.sensorvalue) AS sensorvalue2
FROM (SELECT generate_series(min(date_trunc('hour', ts)), max(ts), interval '5 min') AS t5
      FROM device_history_20865735
      WHERE ts BETWEEN '2015/05/13 09:00' AND '2015/05/14 09:00') grid
LEFT JOIN device_history_20865735 t ON t.ts >= grid.t5 AND t.ts < grid.t5 + interval '5 min'
LEFT JOIN device_history_493417852 w ON w.ts >= grid.t5 AND w.ts < grid.t5 + interval '5 min'
--WHERE t.sensorvalue NOTNULL
GROUP BY grid.t5 ORDER BY grid.t5
I use 5-minute averages, as that works better for my app.
The results, as expected, have NULL values for either sensorvalue1 or sensorvalue2:
ts;sensorvalue1;sensorvalue2
"2015-05-13 09:00:00.000000";19.9300003051758;
"2015-05-13 09:05:00.000000";20;
"2015-05-13 09:10:00.000000";;
"2015-05-13 09:15:00.000000";20.0599994659424;
"2015-05-13 09:20:00.000000";;
"2015-05-13 09:25:00.000000";20.1200008392334;
My aim is to calculate an average for each 5-minute interval from all the available sensors. Since the NULLs are a problem, I thought of using a CASE expression so that when one sensor is NULL I take the value of the other sensor...
SELECT grid.t5 || '.000000' AS ts,
       CASE
         WHEN avg(t.sensorvalue) ISNULL THEN avg(w.sensorvalue)
         ELSE avg(t.sensorvalue)
       END AS sensorvalue,
       CASE
         WHEN avg(w.sensorvalue) ISNULL THEN avg(t.sensorvalue)
         ELSE avg(w.sensorvalue)
       END AS sensorvalue2
FROM (SELECT generate_series(min(date_trunc('hour', ts)), max(ts), interval '5 min') AS t5
      FROM device_history_20865735
      WHERE ts BETWEEN '2015/05/13 09:00' AND '2015/05/14 09:00') grid
LEFT JOIN device_history_20865735 t ON t.ts >= grid.t5 AND t.ts < grid.t5 + interval '5 min'
LEFT JOIN device_history_493417852 w ON w.ts >= grid.t5 AND w.ts < grid.t5 + interval '5 min'
GROUP BY grid.t5 ORDER BY grid.t5
but then, to calculate the overall average, I have to do another SELECT on top of this and divide by the number of columns (i.e. sensors). With just two it is OK, but with 3 or 4 sensors this can get very messy, as there could be multiple sensors with NULL values per row...
The SQL is generated programmatically by an app (written in Python) against Postgres 9.4, so is there a simple way to achieve what is needed? I feel I'm heading down a rather complex route...
EDIT #2: With your input I've produced this SQL; again it seems rather complex, but I'm open to your ideas and scrutiny as to whether it is reliable and maintainable:
SELECT ts, sensortotal, sensorcount,
CASE
WHEN sensorcount = 0 THEN -1000
ELSE sensortotal/sensorcount
END AS sensorAvg
FROM (
WITH grid as (
SELECT t5
FROM (SELECT generate_series(min(date_trunc('hour', ts)), max(ts), interval '5 min') as t5
FROM device_history_20865735
) d
WHERE t5 between '2015-05-13 09:00' and '2015-05-14 09:00'
)
SELECT d1.t5 || '.000000' as ts
, Coalesce(avg(d1.sensorvalue), 0) + Coalesce(avg(d2.sensorvalue),0) as sensorTotal
, (CASE
WHEN avg(d1.sensorvalue) ISNULL THEN 0
ELSE 1
END + CASE
WHEN avg(d2.sensorvalue) ISNULL THEN 0
ELSE 1
END) as sensorCount
FROM (SELECT grid.t5, avg(t.sensorvalue) as sensorvalue
FROM grid LEFT JOIN
device_history_20865735 t
ON t.ts >= grid.t5 AND t.ts <grid.t5 + interval '5 min'
GROUP BY grid.t5
) d1 LEFT JOIN
(SELECT grid.t5, avg(t.sensorvalue) as sensorvalue
FROM grid LEFT JOIN
device_history_493417852 t
ON t.ts >= grid.t5 AND t.ts <grid.t5 + interval '5 min'
GROUP BY grid.t5
) d2 on d1.t5 = d2.t5
GROUP BY d1.t5
ORDER BY d1.t5
) tmp;
Thanks!

It sounds like you want to do something like this:
(coalesce(value1, 0) + coalesce(value2, 0) + coalesce(value3, 0)) /
((value1 IS NOT NULL)::int + (value2 IS NOT NULL)::int + (value3 IS NOT NULL)::int)
AS average
Basically, just do the math you want to do for each row. The only "tricky" part is how to "count" the non-NULL values. I used a cast (note the parentheses: :: binds more tightly than IS NOT NULL, so (value1 IS NOT NULL)::int needs them), but there are other options, such as:
CASE WHEN value1 IS NULL THEN 0 ELSE 1 END
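Putting it together, one refinement worth considering (a sketch; readings is a hypothetical table standing in for your per-interval rows): wrap the divisor in NULLIF so an interval where every sensor is NULL yields NULL instead of raising a division-by-zero error:
SELECT ts,
       (coalesce(value1, 0) + coalesce(value2, 0) + coalesce(value3, 0))
       / NULLIF((value1 IS NOT NULL)::int
              + (value2 IS NOT NULL)::int
              + (value3 IS NOT NULL)::int, 0) AS average
FROM readings; -- hypothetical table, one row per 5-minute slot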

To get accurate averages, you need to calculate each one separately before the join:
WITH grid as (
SELECT t5
FROM (SELECT generate_series(min(date_trunc('hour', ts)), max(ts), interval '5 min') as t5
FROM device_history_20865735
) d
WHERE t5 between '2015-05-13 09:00' and '2015-05-14 09:00'
)
SELECT d1.t5 || '.000000' as ts,
avg(d1.sensorvalue) as sensorvalue1
, avg(d2.sensorvalue) as sensorvalue2
FROM (SELECT grid.t5, avg(t.sensorvalue) as sensorvalue
FROM grid LEFT JOIN
device_history_20865735 t
ON t.ts >= grid.t5 AND t.ts <grid.t5 + interval '5 min'
GROUP BY grid.t5
) d1 LEFT JOIN
(SELECT grid.t5, avg(t.sensorvalue) as sensorvalue
FROM grid LEFT JOIN
device_history_493417852 t
ON t.ts >= grid.t5 AND t.ts <grid.t5 + interval '5 min'
GROUP BY grid.t5
) d2 on d1.t5 = d2.t5
GROUP BY d1.t5
ORDER BY d1.t5;
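Combining this with the COALESCE/NULLIF idea from the first answer, the cross-sensor average could be computed directly in the outer SELECT (a sketch, not tested against your schema; d1 and d2 stand for the same per-sensor subqueries as above, elided here). Since d1 and d2 already carry one row per t5, no outer GROUP BY is needed, and NULLIF makes an interval with no readings come out NULL rather than needing a -1000 sentinel:
SELECT d1.t5 || '.000000' AS ts,
       (coalesce(d1.sensorvalue, 0) + coalesce(d2.sensorvalue, 0))
       / NULLIF((d1.sensorvalue IS NOT NULL)::int
              + (d2.sensorvalue IS NOT NULL)::int, 0) AS sensoravg
FROM ( ... ) d1 LEFT JOIN ( ... ) d2 ON d1.t5 = d2.t5
ORDER BY d1.t5;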

Related

PostgreSQL showing different time periods in a single query

I have a query that returns the ratio of issuances from a specific network within a specific time period to total issuances, i.e. the issuances from a specific network in a given time period divided by the total issuances from all networks. Right now it returns the ratios only for the current year (year-to-date, I mean); I want to include several time periods, such as one month ago, two months ago, etc. LEFT JOIN usually works, but I couldn't figure it out for this one. How do I do it?
Here is the query:
SELECT IR1.network,
count(*) / ((select count(*) FROM issuances_extended
where status = 'completed' and
issued_at >= date_trunc('year',current_date)) * 1.) as issuance_ratio_ytd
FROM issuances_extended as IR1 WHERE status = 'completed' and
(issued_at >= date_trunc('year',current_date))
GROUP BY
IR1.network
order by IR1.network
I would break your query into CTEs something like this:
with periods (period_name, period_range) as (
values
('YTD', daterange(date_trunc('year', current_date)::date, null)),
('LY', daterange(date_trunc('year', current_date - interval '1 year')::date,
date_trunc('year', current_date)::date)),
('MTD', daterange(date_trunc('month', current_date - interval '1 month')::date,
date_trunc('month', current_date)::date))
-- Add whatever other intervals you want to see
), period_totals as ( -- Get period totals
select p.period_name, p.period_range, count(*) as total_issuances
from periods p
join issuances_extended i
on i.status = 'completed'
and i.issued_at::date <@ p.period_range -- cast in case issued_at is a timestamp
group by p.period_name, p.period_range
)
select p.period_name, p.period_range,
i.network, count(*) as network_issuances,
1.0 * count(*) / p.total_issuances as issuance_ratio
from period_totals p
join issuances_extended i
on i.status = 'completed'
and i.issued_at::date <@ p.period_range
group by p.period_name, p.period_range, i.network, p.total_issuances;
The problem with this is that you get rows instead of columns, but you can use a spreadsheet program or reporting tool to pivot if you need to. This method simplifies the calculations and lets you add whatever period ranges you want by adding more values to the periods CTE.
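If you do want columns rather than rows without leaving SQL, a small pivot on top of the query above is one option (a sketch; the FILTER clause needs PostgreSQL 9.4 or newer, and you add one line per period):
select network,
       max(issuance_ratio) filter (where period_name = 'YTD') as ratio_ytd,
       max(issuance_ratio) filter (where period_name = 'LY')  as ratio_ly,
       max(issuance_ratio) filter (where period_name = 'MTD') as ratio_mtd
from ( /* the query above, minus its trailing semicolon */ ) ratios
group by network
order by network;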
Something like this? Obviously not tested
SELECT
IR1.network,
mon.t AS period_start,
count(*) / ((select count(*) FROM issuances_extended
where status = 'completed' and
issued_at between mon.t and current_date) * 1.) as issuance_ratio
FROM
issuances_extended as IR1,
(SELECT
generate_series('2022-01-01'::date,
'2022-07-01'::date, interval '1 month') AS t)
AS mon
WHERE
status = 'completed' and
(issued_at between mon.t and current_date)
GROUP BY
IR1.network, mon.t
ORDER BY
IR1.network, mon.t
I've managed to join these tables, so I am answering my own question for anyone who needs it. To add more periods, all you have to do is put new subqueries in LEFT JOINs and reference them in the base query (IR3, IR4, etc.):
SELECT
IR1.network,
count(*) / (
(
select
count(*)
FROM
issuances_extended
where
status = 'completed'
and issued_at >= date_trunc('year', current_date)
) * 1./ 100
) as issuances_ratio_ytd,
max(coalesce(IR2.issuances_ratio_m0, 0)) as issuances_ratio_m0
FROM
issuances_extended as IR1
LEFT JOIN (
SELECT
network,
count(*) / (
(
select
count(*)
FROM
issuances_extended
where
status = 'completed'
and issued_at >= date_trunc('month', current_date)
) * 1./ 100
) as issuances_ratio_m0
FROM
issuances_extended
WHERE
status = 'completed'
and (issued_at >= date_trunc('month', current_date))
GROUP BY
network
) AS IR2 ON IR1.network = IR2.network
WHERE
status = 'completed'
and (issued_at >= date_trunc('year', current_date))
GROUP BY
IR1.network,
IR2.issuances_ratio_m0
order by
IR1.network
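For comparison, conditional aggregation can produce the same ratios in a single pass, without one LEFT JOIN per period (a sketch using the FILTER clause, available since PostgreSQL 9.4; untested against your data):
select network,
       100.0 * count(*) filter (where issued_at >= date_trunc('year', current_date))
             / nullif((select count(*) from issuances_extended
                       where status = 'completed'
                         and issued_at >= date_trunc('year', current_date)), 0) as issuances_ratio_ytd,
       100.0 * count(*) filter (where issued_at >= date_trunc('month', current_date))
             / nullif((select count(*) from issuances_extended
                       where status = 'completed'
                         and issued_at >= date_trunc('month', current_date)), 0) as issuances_ratio_m0
from issuances_extended
where status = 'completed'
group by network
order by network;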

Divide results from two query by another query in SQL

I have this query in Metabase:
with l1 as (SELECT date_trunc ('day', Ticket_Escalated_At) as time_scale, count (Ticket_ID) as chat_per_day
FROM CHAT_TICKETS where SUPPORT_QUEUE = 'transfer_investigations'
and date_trunc('month', TICKET_ESCALATED_AT) > now() - interval '6' Month
GROUP by 1)
, l2 as (SELECT date_trunc('day', created_date) as week, count(*) as TI_watchman_ticket
FROM jira_issues
WHERE issue_type NOT IN ('Transfer - General', 'TI - Advanced')
and date_trunc('month', created_date) > now() - interval '6' Month
and project_key = 'TI2'
GROUP BY 1)
SELECT l1.* from l1
UNION SELECT l2.* from l2
ORDER by 1
and this one:
with hours as (SELECT date_trunc('day', ws.start_time) as date_
,(ifnull(sum((case when ws.shift_position = 'TI - Non-watchman' then (minutes_between(ws.end_time, ws.start_time)/60) end)),0) + ifnull(sum((case when ws.shift_position = 'TI - Watchman' then (minutes_between(ws.end_time, ws.start_time)/60) end)),0) ) as total
from chat_agents a
join wiw_shifts ws on a.email = ws.user_email
left join people_ops.employees h on substr(h.email,1, instr(h.email,'#revolut') - 1) = a.login
where (seniority != 'Lead' or seniority is null)
and date_trunc('month', ws.start_time) > now() - interval '6' Month
GROUP BY 1)
I would like to divide the output of the UNION in the first one by the result of the second one. Any ideas?
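One way to get there (a sketch; it assumes your SQL dialect accepts several CTEs in one WITH list, and that both queries bucket by the same day) is to put l1, l2 and hours into a single statement, union the first two, and join the result to the hours on the date:
with l1 as (...),    -- as in the first query
     l2 as (...),    -- as in the first query
     hours as (...)  -- as in the second query
select u.time_scale, u.chat_per_day / nullif(h.total, 0) as chats_per_hour
from (select * from l1 union select * from l2) u
join hours h on h.date_ = u.time_scale
order by 1;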

Sum all values in a column and group by 2 minutes

I want to sum all values in a column "value" and group them by an interval of 2 minutes.
I have values like this:
value  TIME
0.3    2019-05-22 01:11:45  ---> first value 0.3
0.3    2019-05-22 01:12:16  --|
0.3    2019-05-22 01:13:26  --| second value 0.6
0.2    2019-05-22 01:13:56  --|
0.4    2019-05-22 01:14:06  --|
0.6    2019-05-22 01:15:43  --| third value 1.2
But what I really want is like this:
value TIME
0.3 2019-05-22 01:11:45
0.6 2019-05-22 01:13:45
1.2 2019-05-22 01:15:45
My code in postgresql:
SELECT medi_sensor.value, time
FROM medi_sensor
JOIN sensor ON medi_sensor.sensor_name = sensor.name
JOIN mote ON num_mot=mot_id
JOIN room ON room_id=id_div
WHERE medi_sensor.sensor_name LIKE 'current%' AND room.name='DIV' AND time>'2019-05-22' AND time<'2019-05-24'
ORDER BY time ASC
The problem is how to group my time column into two-minute intervals.
In Postgres, you can use generate_series() to generate the values:
select gs.t, sum(value)
from (select ms.value, time, min(time) over () as min_time, max(time) over () as max_time
from medi_sensor ms join
sensor s
on ms.sensor_name = s.name join
mote
on num_mot = mot_id join
room r
on room_id = id_div
where ms.sensor_name LIKE 'current%' and
r.name = 'DIV' and
time > '2019-05-22' and
time < '2019-05-24'
) x right join lateral
generate_series(min_time, max_time, interval '2 minute') gs(t)
on time >= gs.t and time < gs.t + interval '2 minute'
group by gs.t
order by gs.t;
I would recommend that you use table aliases for all column references in your query.
EDIT:
with x as (
select ms.value, time
from medi_sensor ms join
sensor s
on ms.sensor_name = s.name join
mote
on num_mot = mot_id join
room r
on room_id = id_div
where ms.sensor_name LIKE 'current%' and
r.name = 'DIV' and
time > '2019-05-22' and
time < '2019-05-24'
)
select gs.ts, sum(x.value)
from (select generate_series(min_time, max_time, interval '2 minute') as ts
from (select min(time) as min_time, max(time) as max_time
from x
) mm
) gs left join
x
on x.time >= gs.ts and x.time < gs.ts + interval '2 minute'
group by gs.ts
order by gs.ts;
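An alternative sketch that avoids generate_series altogether is to truncate the epoch to 2-minute boundaries and group on that; note that the buckets are aligned to the Unix epoch rather than to your first reading, and intervals with no rows simply don't appear:
select to_timestamp(floor(extract(epoch from time) / 120) * 120) as bucket_start,
       sum(ms.value) as total
from medi_sensor ms
join sensor s on ms.sensor_name = s.name
join mote on num_mot = mot_id
join room r on room_id = id_div
where ms.sensor_name like 'current%'
  and r.name = 'DIV'
  and time > '2019-05-22' and time < '2019-05-24'
group by 1
order by 1;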

SQL: combine case and number to date conversion

I want to write a SQL statement that calculates, in one table, the subtraction of two Date values. If the result is negative I just want to show it as 0.
The number value is the number of seconds a payment has been in its current state.
I convert it with a trick to a time value (Date type).
My current code is
SELECT
max(CASE WHEN t1.time_event < SYSDATE and t2.time_event > SYSDATE THEN to_char(to_date(max(round(SYSDATE - t1.time_event) * 24 * 60 * 60)),'ssssss'),'hh24:mi:ss') else to_char(to_date(0)) END) as "current_age"
from tbl_dummyfeed t1 join tbl_dummyfeed t2 on t1.payment_Id = t2.payment_id
where t1.event = 'accepted' and t2.event = 'enriched');
You could use a similar date 'trick', which is to add the fractional-days difference to a nominal date whose time portion is midnight; you could use a fixed date or trunc(sysdate), as long as the time ends up as midnight, and you avoid having to multiply by 24*60*60. (Your to_date() solution does the same thing implicitly, effectively adding the number of seconds to midnight on the first day of the current month, but this might be a little clearer.) You can also move the CASE conditions into the WHERE filter:
select to_char(date '1970-01-01'
+ nvl(max(sysdate - t1.time_event), 0), 'HH24:MI:SS') as "current_age"
from tbl_dummyfeed t1
join tbl_dummyfeed t2
on t1.trax_id = t2.trax_id
where t1.event = 'accepted'
and t1.time_event < sysdate
and t2.event = 'enriched'
and t2.time_event > sysdate;
You could also use an analytic approach so you only have to hit the table once, with a subquery that pairs up each 'enriched' time with the previous 'accepted' time for that ID, and which you then filter against the current time:
select to_char(date '1970-01-01'
+ nvl(max(sysdate - time_accepted), 0), 'HH24:MI:SS') as "current_age"
from (
select last_value(case when event = 'accepted' then time_event end ignore nulls)
over (partition by trax_id order by time_event) as time_accepted,
case when event = 'enriched' then time_event end as time_enriched
from tbl_dummyfeed
)
where time_accepted < sysdate
and time_enriched > sysdate;
This works; only the conversion to the time format hh24:mi:ss still needs to happen.
SELECT max(CASE WHEN t1.time_event < SYSDATE and t2.time_event > SYSDATE THEN round((SYSDATE - t1.time_event) * 24 * 60 * 60) else 0 END) as "current_age"
from tbl_dummyfeed t1 join tbl_dummyfeed t2 on t1.payment_id= t2.payment_id
where t1.event = 'accepted' and t2.event = 'enriched';
When I add the conversion to hh24:mi:ss, the solution looks like this:
SELECT to_char(to_date(max(CASE WHEN t1.time_event < SYSDATE and t2.time_event > SYSDATE
THEN round((SYSDATE - t1.time_event) * 24 * 60 * 60) else 0 END), 'sssss'), 'hh24:mi:ss') as "current_age"
from tbl_dummyfeed t1 join tbl_dummyfeed t2 on t1.trax_Id = t2.trax_id
where t1.event = 'accepted' and t2.event = 'enriched';
This is the only good solution for my question. Hope this helps people.

Querying for a 'run' of consecutive columns in Postgres

I have a table:
create table table1 (event_id integer, event_time timestamp without time zone);
insert into table1 (event_id, event_time) values
(1, '2011-01-01 00:00:00'),
(2, '2011-01-01 00:00:15'),
(3, '2011-01-01 00:00:29'),
(4, '2011-01-01 00:00:58'),
(5, '2011-01-02 06:03:00'),
(6, '2011-01-02 06:03:09'),
(7, '2011-01-05 11:01:31'),
(8, '2011-01-05 11:02:15'),
(9, '2011-01-06 09:34:19'),
(10, '2011-01-06 09:34:41'),
(11, '2011-01-06 09:35:06');
I would like to construct a statement that given an event could return the length of the 'run' of events starting with that event. A run is defined by:
Two events are in a run together if they are within 30 seconds of one another.
If A and B are in a run together, and B and C are in a run together then A is in a run
with C.
However my query does not need to go backwards in time, so if I select on event 2, then only events 2, 3, and 4 should be counted as part of the run of events starting with 2, and 3 should be returned as the length of the run.
Any ideas? I'm stumped.
Here is a recursive CTE solution (gaps-and-islands problems naturally lend themselves to recursive CTEs):
WITH RECURSIVE runrun AS (
SELECT event_id, event_time
, event_time - ('30 sec'::interval) AS low_time
, event_time + ('30 sec'::interval) AS high_time
FROM table1
UNION
SELECT t1.event_id, t1.event_time
, LEAST ( rr.low_time, t1.event_time - ('30 sec'::interval) ) AS low_time
, GREATEST ( rr.high_time, t1.event_time + ('30 sec'::interval) ) AS high_time
FROM table1 t1
JOIN runrun rr ON t1.event_time >= rr.low_time
AND t1.event_time < rr.high_time
)
SELECT DISTINCT ON (event_id) *
FROM runrun rr
WHERE rr.event_time >= '2011-01-01 00:00:15'
AND rr.low_time <= '2011-01-01 00:00:15'
AND rr.high_time > '2011-01-01 00:00:15'
;
Result:
event_id | event_time | low_time | high_time
----------+---------------------+---------------------+---------------------
2 | 2011-01-01 00:00:15 | 2010-12-31 23:59:45 | 2011-01-01 00:00:45
3 | 2011-01-01 00:00:29 | 2010-12-31 23:59:45 | 2011-01-01 00:01:28
4 | 2011-01-01 00:00:58 | 2010-12-31 23:59:30 | 2011-01-01 00:01:28
(3 rows)
Could look like this:
WITH x AS (
SELECT event_time
,row_number() OVER w AS rn
,lead(event_time) OVER w AS next_time
FROM table1
WHERE event_id >= <start_id>
WINDOW w AS (ORDER BY event_time, event_id)
)
SELECT COALESCE(
(SELECT x.rn
FROM x
WHERE (x.event_time + interval '30s') < x.next_time
ORDER BY x.rn
LIMIT 1)
,(SELECT count(*) FROM x)
) AS run_length
This version does not rely on a gap-less sequence of IDs, but on event_time only.
Identical event_time's are additionally sorted by event_id to be unambiguous.
Read about the window functions row_number() and lead() and CTE (With clause) in the manual.
Edit
If we cannot assume that a bigger event_id has a later (or equal) event_time, substitute this for the first WHERE clause:
WHERE event_time >= (SELECT event_time FROM table1 WHERE event_id = <start_id>)
Rows with the same event_time as the starting row but a smaller event_id will still be ignored.
In the special case of a single run lasting to the end of the table, no end is found and no row is returned; COALESCE returns the count of all rows instead.
You can join a table onto itself with a date-difference condition; since this is Postgres, a simple minus works.
This subquery will find all records that are 'start events', that is, all event records that do not have another event record occurring within 30 seconds before them:
(Select a.event_id, a.event_time from
(Select event_id, event_time from table1) a
left join
(select event_id, event_time from table1) b
on a.event_time - b.event_time < '00:00:30' and a.event_time - b.event_time > '00:00:00'
where b.event_time is null) startevent
With a few changes...same logic, except picking up an 'end' event:
(Select a.event_id, a.event_time from
(Select event_id, event_time from table1) a
left join
(select event_id, event_time from table1) b
on b.event_time - a.event_time < '00:00:30' and b.event_time - a.event_time > '00:00:00'
where b.event_time is null) end_event
Now we can join these together to associate each start event with its end event.
(Still writing... there are a couple of ways of going at this. I'm assuming only the example has linear ID numbers, so you'll want to join each start event to the end event with the smallest positive difference between the event times.)
Here's my end result... it ends up nesting quite a lot of subselects:
select a.start_id, case when a.event_id is null then t1.event_id::varchar else 'single event' end as end_id
from
(select start_event.event_id as start_id, start_event.event_time as start_time, last_event.event_id, min(end_event.event_time - start_event.event_time) as min_interval
from
(Select a.event_id, a.event_time from
(Select event_id, event_time from table1) a
left join
(select event_id, event_time from table1) b
on a.event_time - b.event_time < '00:00:30' and a.event_time - b.event_time > '00:00:00'
where b.event_time is null) start_event
inner join
(Select a.event_id, a.event_time from
(Select event_id, event_time from table1) a
left join
(select event_id, event_time from table1) b
on b.event_time - a.event_time < '00:00:30' and b.event_time - a.event_time > '00:00:00'
where b.event_time is null) end_event
on end_event.event_time > start_event.event_time
--check for only event
left join
(Select a.event_id, a.event_time from
(Select event_id, event_time from table1) a
left join
(select event_id, event_time from table1) b
on b.event_time - a.event_time < '00:00:30' and b.event_time - a.event_time > '00:00:00'
where b.event_time is null) last_event
on start_event.event_id = last_event.event_id
group by 1,2,3) a
left join table1 t1 on t1.event_time = a.start_time + a.min_interval
Results as start_id, end_Id:
1;"4"
5;"6"
7;"single event"
8;"single event"
9;"11"
I had to use a third LEFT JOIN to pick out single events, as a way of detecting events that were both start events and end events. The end result is in IDs and can be linked back to your original table if you want more information than just the ID. I'm unsure how this solution will scale; if you've got millions of events... it could be an issue.