Combine Output of Queries in Pivot Table with PostgreSQL

Suppose I have a table called orders that looks like this:
 id |       order_date        | orders_wanted | orders_given
----+-------------------------+---------------+--------------
  1 | 2020-11-29 19:12:44.417 |             2 |            6
  1 | 2020-11-29 20:12:44.417 |             2 |            6
  1 | 2020-11-30 23:37:28.692 |             8 |            2
  1 | 2020-11-30 23:37:28.692 |             2 |            6
How do I write a query that counts, per hour, the rows where orders_wanted - orders_given is positive and the rows where it is negative, in two separate columns (a difference of 0 counts as positive)? Note that orders_wanted and orders_given are times, which is why I compute orders_wanted - orders_given. I would also like a final column with the percentage of orders per hour that are positive: count_orders_positive / (count_orders_negative + count_orders_positive).
The output of the query would look something like this:
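 week | day | hour | count_orders_positive | count_orders_negative | percentage_orders_positive
------+-----+------+-----------------------+-----------------------+----------------------------
   48 |   7 |   19 |                     0 |                     1 |                         0%
   48 |   7 |   20 |                     0 |                     1 |                         0%
   49 |   1 |   23 |                     1 |                     1 |                        50%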
So far I can get the two count columns separately with the queries below, but I don't know how to combine them.
SELECT
extract (week from (order_date at time zone 'MST' at time zone 'UTC') ) as "week",
extract (isodow from (order_date at time zone 'MST' at time zone 'UTC') ) as "day",
extract (hour from (order_date at time zone 'MST' at time zone 'UTC') ) as "hour",
Count (extract (hour from (order_date at time zone 'MST' at time zone 'UTC') )) as
"count_orders_positive"
from orders
WHERE orders_wanted - orders_given >= 0
group by week, day, hour
order by week, day, hour;
 week | day | hour | count_orders_positive
------+-----+------+-----------------------
   49 |   1 |   23 |                     1
SELECT
extract (week from (order_date at time zone 'MST' at time zone 'UTC') ) as "week",
extract (isodow from (order_date at time zone 'MST' at time zone 'UTC') ) as "day",
extract (hour from (order_date at time zone 'MST' at time zone 'UTC') ) as "hour",
Count (extract (hour from (order_date at time zone 'MST' at time zone 'UTC') )) as
"count_orders_negative"
from orders
WHERE orders_wanted - orders_given < 0
group by week, day, hour
order by week, day, hour;
 week | day | hour | count_orders_negative
------+-----+------+-----------------------
   48 |   7 |   19 |                     1
   48 |   7 |   20 |                     1
   49 |   1 |   23 |                     1

You can use conditional aggregation; avg() comes in handy for computing the percentage:
select
extract (week from (order_date at time zone 'MST' at time zone 'UTC') ) as "week",
extract (isodow from (order_date at time zone 'MST' at time zone 'UTC') ) as "day",
extract (hour from (order_date at time zone 'MST' at time zone 'UTC') ) as "hour",
count(*) filter (where orders_wanted - orders_given >= 0) as count_orders_positive,
count(*) filter (where orders_wanted - orders_given < 0) as count_orders_negative,
100 * avg((orders_wanted - orders_given >= 0)::int) as percent_orders_positive
from orders
group by week, day, hour
order by week, day, hour;
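To try the query without the real table, here is a self-contained sketch. It assumes the four sample rows from the question, typed in as a VALUES list with integer orders_wanted/orders_given (the real column types may differ), and it drops the MST/UTC conversion so the hours match the raw timestamps. round() is added only to tidy the percentage.

```sql
-- Sample rows from the question; the column types here are assumptions.
with orders(id, order_date, orders_wanted, orders_given) as (
  values
    (1, timestamp '2020-11-29 19:12:44.417', 2, 6),
    (1, timestamp '2020-11-29 20:12:44.417', 2, 6),
    (1, timestamp '2020-11-30 23:37:28.692', 8, 2),
    (1, timestamp '2020-11-30 23:37:28.692', 2, 6)
)
select
  extract(week   from order_date) as week,
  extract(isodow from order_date) as day,
  extract(hour   from order_date) as hour,
  -- FILTER restricts each count to the matching rows of the group
  count(*) filter (where orders_wanted - orders_given >= 0) as count_orders_positive,
  count(*) filter (where orders_wanted - orders_given <  0) as count_orders_negative,
  -- the boolean cast to int is 1 or 0, so its average is the positive share
  round(100 * avg((orders_wanted - orders_given >= 0)::int), 1) as percent_orders_positive
from orders
group by week, day, hour
order by week, day, hour;
```

With these rows the counts match the expected table, and the percentage column comes out as 0.0, 0.0 and 50.0.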

Related

How can I aggregate time series data in Postgres from a specific timestamp at fixed intervals (e.g. 1 hour, 1 day, 7 days) without using date_trunc()?

I have a Postgres table "Generation" with half-hourly timestamps of energy data spanning 2009 to the present.
I need to aggregate (average) the data over different intervals from specific points in time, for example data from 2021-01-07T00:00:00.000Z for one year at 7-day intervals, 3 months at 1-day intervals, or 7 days at 1-hour intervals. date_trunc() partly solves this, but it truncates each week back to the preceding Monday, e.g.:
SELECT date_trunc('week', "DATETIME") AS week,
count(*),
AVG("GAS") AS gas,
AVG("COAL") AS coal
FROM "Generation"
WHERE "DATETIME" >= '2021-01-07T00:00:00.000Z' AND "DATETIME" <= '2022-01-06T23:59:59.999Z'
GROUP BY week
ORDER BY week ASC
;
returns the first time series interval as 2021-01-04 with an incorrect count:
week count gas coal
"2021-01-04 00:00:00" 192 18291.34375 2321.4427083333335
"2021-01-11 00:00:00" 336 14477.407738095239 2027.547619047619
"2021-01-18 00:00:00" 336 13947.044642857143 1152.047619047619
EDIT: the following returns the correct weekly intervals by computing how far the start date is from the start of its week (Monday) and shifting the buckets by that offset:
WITH vars1 AS (
SELECT '2021-01-07T00:00:00.000Z'::timestamp as start_time,
'2021-01-28T00:00:00.000Z'::timestamp as end_time
),
vars2 AS (
SELECT
((select start_time from vars1)::date - (date_trunc('week', (select start_time from vars1)::timestamp))::date) as diff
)
SELECT date_trunc('week', "DATETIME" - ((select diff from vars2) || ' day')::interval)::date + ((select diff from vars2) || ' day')::interval AS week,
count(*),
AVG("GAS") AS gas,
AVG("COAL") AS coal
FROM "Generation"
WHERE "DATETIME" >= (select start_time from vars1) AND "DATETIME" < (select end_time from vars1)
GROUP BY week
ORDER BY week ASC
returns:
week count gas coal
"2021-01-07 00:00:00" 336 17242.752976190477 2293.8541666666665
"2021-01-14 00:00:00" 336 13481.497023809523 1483.0565476190477
"2021-01-21 00:00:00" 336 15278.854166666666 1592.7916666666667
For daily or hourly intervals (swap day for hour), you can use the following:
SELECT date_trunc('day', "DATETIME") AS day,
count(*),
AVG("GAS") AS gas,
AVG("COAL") AS coal
FROM "Generation"
WHERE "DATETIME" >= '2022-01-07T00:00:00.000Z' AND "DATETIME" < '2022-01-10T23:59:59.999Z'
GROUP BY day
ORDER BY day ASC
;
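On PostgreSQL 14 or later, date_bin() can do this bucketing directly: it places each timestamp into fixed-width bins aligned to an arbitrary origin, so weeks can start on Thursday 2021-01-07 without the offset arithmetic. A sketch, assuming "DATETIME" is a timestamp without time zone; the stride cannot contain months or years, so month-based intervals would still need another approach.

```sql
-- PostgreSQL 14+: date_bin(stride, source, origin) returns the start of the
-- stride-wide bin (counted from origin) that contains source.
SELECT date_bin(interval '7 days', "DATETIME", timestamp '2021-01-07 00:00:00') AS week,
       count(*),
       AVG("GAS")  AS gas,
       AVG("COAL") AS coal
FROM "Generation"
WHERE "DATETIME" >= timestamp '2021-01-07 00:00:00'
  AND "DATETIME" <  timestamp '2021-01-28 00:00:00'
GROUP BY week
ORDER BY week;
```

For daily or hourly buckets, change the stride to interval '1 day' or interval '1 hour' and keep the same origin.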
To select complete weeks, change the WHERE clause to something like:
WHERE "DATETIME" >= date_trunc('week','2021-01-07T00:00:00.000Z'::timestamp)
AND "DATETIME" < (date_trunc('week','2022-01-06T23:59:59.999Z'::timestamp) + interval '7' day)::date
This effectively gets the records from January 4, 2021 up to and including January 9, 2022.
Note: I changed <= to < so that the end date itself is not included.
EDIT:
When you want your weeks to start on January 7, you can group by:
(date_part('day',(d-'2021-01-07'))::int-(date_part('day',(d-'2021-01-07'))::int % 7))/7
(where d is the column containing the datetime-value.)
EDIT:
This groups the rows into buckets starting from a given date, with a specified interval:
WITH vars AS (
SELECT
'2021-01-07T00:00:00.000Z'::timestamp AS qstart,
'2022-01-06T23:59:59.999Z'::timestamp AS qend,
7 as qint,
INTERVAL '1 DAY' as qinterval
)
SELECT
(select date(qstart) FROM vars) + (SELECT qinterval from vars) * ((date_part('day',("DATETIME"-(select date(qstart) FROM vars)))::int-(date_part('day',("DATETIME"-(select date(qstart) FROM vars)))::int % (SELECT qint FROM vars)))::int) AS week,
count(*),
AVG("GAS") AS gas,
AVG("COAL") AS coal
FROM "Generation"
WHERE "DATETIME" >= (SELECT qstart FROM vars) AND "DATETIME" <= (SELECT qend FROM vars)
GROUP BY week
ORDER BY week
;
I added the WITH vars CTE to keep the parameters at the top, so there is no need to touch the rest of the query. (Idea borrowed here.)
I only tested with qint=7, qinterval='1 DAY' and qint=14, qinterval='1 DAY'. Other values of qint should work too, but qinterval should stay '1 DAY', because the bucket offset is computed from date_part('day', ...).
Using the function EXTRACT, you can calculate the difference in days, weeks and hours between your timestamp ts and start_date as follows:
Difference in Days
extract (day from ts - start_date)
Difference in Weeks
This is the difference in days divided by 7 and truncated:
trunc(extract (day from ts - start_date)/7)
Difference in Hours
This is the difference in days times 24, plus the hours part of the difference:
extract (day from ts - start_date)*24 + extract (hour from ts - start_date)
The difference can be used directly in GROUP BY. For weekly grouping, the first group has difference 0 (the first 7 days from the start date), the next has difference 1 (the following 7 days), and so on.
Sample Example
I'm using a CTE for the start date to avoid multiple copies of the parameter.
with start_time as
(select DATE'2021-01-07' as start_ts),
prep as (
select
ts,
extract (day from ts - (select start_ts from start_time)) day_diff,
trunc(extract (day from ts - (select start_ts from start_time))/7) week_diff,
extract (day from ts - (select start_ts from start_time)) *24 + extract (hour from ts - (select start_ts from start_time)) hour_diff,
value
from test_table
where ts >= (select start_ts from start_time)
)
select week_diff, avg(value)
from prep
group by week_diff order by 1
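To see the grouping on concrete data, the sketch below replaces test_table with hypothetical half-hourly rows (each with value 1.0) generated for 21 days from 2021-01-07, and groups them by week_diff as in the answer.

```sql
-- Hypothetical test data: one row every 30 minutes for 21 days.
with test_table(ts, value) as (
  select ts, 1.0
  from generate_series(timestamp '2021-01-07 00:00',
                       timestamp '2021-01-27 23:30',
                       interval '30 minutes') as g(ts)
),
start_time as (select date '2021-01-07' as start_ts)
select trunc(extract(day from ts - (select start_ts from start_time)) / 7) as week_diff,
       count(*)   as n,          -- rows per 7-day bucket
       avg(value) as avg_value
from test_table
group by week_diff
order by week_diff;
```

This returns three groups, week_diff 0, 1 and 2, each with 336 rows (7 days × 48 half-hours).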

get time series in 10 minutes of interval

I am generating a time series using the query below:
SELECT date_trunc('min', dd):: TIMESTAMP WITHOUT TIME zone as time_ent
FROM generate_series ( timestamp '2021-12-09 06:34:37' + ((DATE_PART('min', timestamp '2021-12-09 06:34:37')::integer % 2) || ' minutes') :: INTERVAL
, '2021-12-10 06:34:37'::timestamp
, '20 min'::interval) dd
It gives me output like this:
2021-12-09 06:34:00.000
2021-12-09 06:54:00.000
2021-12-09 07:14:00.000
2021-12-09 07:34:00.000
but I need output like this:
2021-12-09 06:40:00.000
2021-12-09 07:00:00.000
2021-12-09 07:20:00.000
2021-12-09 07:40:00.000
Currently the minutes in the series depend on the timestamp I pass: above, it gives minutes like 34, 54, 14, ..., but I want minutes like 40, 00, 20, ..., regardless of the time passed to the query. I tried timestamp '2021-12-09 06:34:37' + ((DATE_PART('min', timestamp '2021-12-09 06:34:37')::integer % 2) || ' minutes') :: INTERVAL but had no success.
Based on your description, I assume you want a series of timestamps at minutes 00, 20 and 40 of each hour, up to the same time the next day.
select *
from (
select date_trunc('hour', current_timestamp) + i * interval '20 minutes' as ts
from generate_series(1, 24*3) as t(i)) t
where ts between current_timestamp and current_timestamp + interval '1 day'
The key idea is to truncate current_timestamp to the start of the hour (minute 00) first; this becomes the start of the series. Then filter out the generated timestamps outside the range you want. You may need to adjust the second argument of generate_series depending on your requirements.
Or you can generate the series of timestamps directly:
select *
from (
select ts
from generate_series(
date_trunc('hour', current_timestamp),
current_timestamp + interval '1 day',
interval '20 minutes') as t(ts)) as t
where ts between current_timestamp and current_timestamp + interval '1 day'
Here you still need to truncate the timestamp to the hour first, so that the series starts at minute 00.
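If the series has to start from a fixed timestamp rather than current_timestamp, one option is to round the start up to the next 20-minute boundary before calling generate_series, which makes the filtering step unnecessary. A sketch, assuming the question's start value '2021-12-09 06:34:37' and a one-day range:

```sql
with params as (
  select timestamp '2021-12-09 06:34:37' as start_ts
)
select ts as time_ent
from params,
     generate_series(
       -- seconds since the top of the hour, divided by 1200 s (20 minutes)
       -- and rounded up, gives how many 20-minute steps to add to the hour
       date_trunc('hour', start_ts)
         + interval '20 minutes'
           * ceil(extract(epoch from start_ts - date_trunc('hour', start_ts)) / 1200)::int,
       start_ts + interval '1 day',
       interval '20 minutes') as g(ts);
```

For 06:34:37 the offset is 2077 seconds, which rounds up to 2 steps, so the series starts at 2021-12-09 06:40:00 and continues 07:00, 07:20, 07:40 and so on.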

Select Data From Multiple Days Between Certain Times (Spanning 2 days)

I need to know how many entries appear in my DB for the past 7 days with a timestamp between 23:00 & 01:00...
The issue is that the time window spans two days, and I'm not sure this is even possible in one query.
So far I have come up with the below:
select trunc(timestamp) as DTE, extract(hour from timestamp) as HR, count(COLUMN) as Total
from TABLE
where trunc(timestamp) >= '12-NOV-19' and
extract(hour from timestamp) in ('23','00','01')
group by trunc(timestamp), extract(hour from timestamp)
order by 1,2 desc;
The result I am hoping for is something like this:
DTE | Total
20-NOV-19 5
19-NOV-19 4
18-NOV-19 4
17-NOV-19 6
Many thanks
First filter on the day by comparing the timestamp to TRUNC( SYSDATE ) - INTERVAL '7' DAY, then filter on the hours by comparing the timestamp to itself truncated back to midnight plus an offset of a number of hours.
select trunc(timestamp) as DTE,
extract(hour from timestamp) as HR,
count(COLUMN) as Total
from TABLE
WHERE timestamp >= TRUNC( SYSDATE ) - INTERVAL '7' DAY
AND ( timestamp <= TRUNC( timestamp ) + INTERVAL '01:00' HOUR TO MINUTE
OR timestamp >= TRUNC( timestamp ) + INTERVAL '23:00' HOUR TO MINUTE
)
group by trunc(timestamp), extract(hour from timestamp)
order by DTE, HR desc;
Subtract or add an hour to derive the date. I'm not sure what date you want to assign to each period, but the idea is:
select trunc(timestamp - interval '1' hour) as DTE,
count(*) as Total
from t
where trunc(timestamp - interval '1' hour) >= DATE '2019-11-12' and
extract(hour from timestamp) in (23, 0)
group by trunc(timestamp - interval '1' hour)
order by 1 desc;
Note: If you want times between 11:00 p.m. and 1:00 a.m., then you want the hour to be 23 or 0.
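A quick way to see the effect of subtracting an hour, sketched for Oracle with two hypothetical timestamps from the same night: both are assigned the date on which the night started.

```sql
-- Oracle: 23:15 on the 19th minus 1 hour is 22:15 on the 19th;
-- 00:45 on the 20th minus 1 hour is 23:45 on the 19th.
-- TRUNC on a TIMESTAMP returns a DATE at midnight.
SELECT ts,
       TRUNC(ts - INTERVAL '1' HOUR) AS night_of
FROM (
  SELECT TIMESTAMP '2019-11-19 23:15:00' AS ts FROM dual
  UNION ALL
  SELECT TIMESTAMP '2019-11-20 00:45:00' FROM dual
);
```

Both rows get night_of = 2019-11-19, so grouping by that expression yields one total per night.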

Postgres: Select a timeinterval that spans past midnight

I have the following table:
id | time
----+-------------
1 | 21:00:00+01
2 | 22:00:00+01
3 | 23:00:00+01
Column id is of type integer and time is time with time zone. I want to select all rows that fall within a specified interval, e.g.,
select *
from times
where time >= time '22:30' - interval '60 minutes' and time <= time '22:30' + interval '60 minutes';
However, if the interval extends past midnight, e.g. when I use 23:30 as the time argument, I get an empty result set.
Is there a way to tell Postgres to ignore the minutes that spill past midnight?
You can use this logic:
select *
from times t cross join
(values ('22:30'::time - interval '60 minutes', '22:30'::time + interval '60 minutes')
) v(fromt, tot)
where (fromt <= tot and time >= fromt and time <= tot) or
(fromt > tot and (time >= fromt or time <= tot))
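To check the wrap-around branch, here is a self-contained sketch using 23:30 as the argument. It assumes plain time values instead of time with time zone (to keep the session time zone out of the picture) and adds a hypothetical fourth row at 00:15.

```sql
with times(id, "time") as (
  values (1, time '21:00'), (2, time '22:00'), (3, time '23:00'), (4, time '00:15')
)
select t.*
from times t
cross join (values (time '23:30' - interval '60 minutes',   -- 22:30
                    time '23:30' + interval '60 minutes')   -- wraps to 00:30
           ) v(fromt, tot)
where (fromt <= tot and t."time" >= fromt and t."time" <= tot)
   or (fromt > tot and (t."time" >= fromt or t."time" <= tot));
```

Because fromt (22:30) is later than tot (00:30), the second branch applies and the query returns rows 3 (23:00) and 4 (00:15).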

SQL statement dynamically using current time to choose a time frame in a field (Oracle)

All, I have something that is stumping me and I have seen a lot of examples, but nothing is helping solve this.
I have time frames like 03:30:00 to 11:29:59 that I work with (say shift times). I want to dynamically query data for the last shift based on the current shift.
For example, if it is currently between 11:30:00 AM and 7:29:59 PM, I want to get the last shift, which was between 03:30:00 AM and 11:30:00 AM.
This would look like an if statement in my mind:
If time between .... then
select time between....
elseif time between.... then
select time between...
I have tried many combinations and can't figure this out. I think I would need a CASE and maybe a subquery? Or maybe DECODE would work?
SELECT CAST(ccd.DATEc AS TIME) as time_occured,
FROM db.datatb ccd
WHERE ccd.DATE > SYSDATE - interval '1440' minute
AND (
((TO_CHAR(SYSDATE, 'hh24:mi:ss')BETWEEN '03:30:00' AND '11:29:59' IN (SELECT
ccd.DATEc FROM db.datatb WHERE (CAST(ccd.DATEc AS TIME)NOT BETWEEN '03:30:00
AM' AND '07:29:59 PM')))
OR (TO_CHAR(SYSDATE, 'hh24:mi:ss')BETWEEN '11:30:00' AND '19:29:59' IN
(SELECT ccd.DATEc FROM db.datatb WHERE (CAST(ccd.DATEc AS TIME) BETWEEN
'03:30:00 AM' AND '11:29:59 AM')))
OR (TO_CHAR(SYSDATE, 'hh24:mi:ss')NOT BETWEEN '03:30:00' AND '19:29:59' IN
(SELECT ccd.DATEc FROM db.datatb WHERE (CAST(ccd.DATEc AS TIME) BETWEEN
'11:30:00 AM' AND '07:29:59 PM')))
)
SELECT *
FROM db.datatb
CROSS JOIN
( SELECT TRUNC( SYSDATE - INTERVAL '210' MINUTE )
+ NUMTODSINTERVAL(
TRUNC(
( SYSDATE - INTERVAL '210' MINUTE
- TRUNC( SYSDATE - INTERVAL '210' MINUTE )
) * 3
) * 480
+ 210,
'MINUTE'
) AS current_shift_start
FROM DUAL
) css
WHERE DATEc >= css.current_shift_start - INTERVAL '8' HOUR
AND DATEc < css.current_shift_start;
Explanation:
The shifts are 8 hours each, starting at 03:30 (210 minutes past midnight), so SYSDATE - INTERVAL '210' MINUTE offsets the times so that the shifts start at 00:00, 08:00 and 16:00, i.e. at each third of a day.
date_value - TRUNC( date_value ) calculates the fraction of a day (between 0 and 1) that the time component represents, so TRUNC( ( date_value - TRUNC( date_value ) ) * 3 ) maps that fraction to 0, 1 or 2, depending on whether it falls in the 1st, 2nd or 3rd 8-hour period of the day. Multiply that value by 480 minutes, add back the 210 minutes the date was offset by, and you have the number of minutes past the start of the day at which the current shift starts. The WHERE clause then selects the 8 hours before that, i.e. the previous shift.
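To check the arithmetic by hand, the sketch below evaluates the answer's shift-start expression with SYSDATE replaced by two fixed, hypothetical DATE values. A DATE is used rather than a TIMESTAMP so that subtracting two dates yields a number of days, as in the original expression.

```sql
-- Oracle: same expression as in the answer, with ref_time in place of SYSDATE.
SELECT TO_CHAR(t.ref_time, 'YYYY-MM-DD HH24:MI') AS ref_time,
       TO_CHAR(
         TRUNC( t.ref_time - INTERVAL '210' MINUTE )
         + NUMTODSINTERVAL(
             TRUNC( ( t.ref_time - INTERVAL '210' MINUTE
                      - TRUNC( t.ref_time - INTERVAL '210' MINUTE ) ) * 3 ) * 480
             + 210,
             'MINUTE' ),
         'YYYY-MM-DD HH24:MI' ) AS current_shift_start
FROM (
  SELECT TO_DATE('2021-01-01 14:00', 'YYYY-MM-DD HH24:MI') AS ref_time FROM dual
  UNION ALL
  SELECT TO_DATE('2021-01-01 02:00', 'YYYY-MM-DD HH24:MI') FROM dual
) t;
```

For 14:00: 14:00 - 03:30 = 10:30, which is 0.4375 of a day; times 3 is 1.3125, truncated to 1; 1 × 480 + 210 = 690 minutes, so the current shift started at 2021-01-01 11:30 and the query in the answer returns the 03:30 to 11:30 shift. For 02:00 the offset time is 22:30 on the previous day (0.9375 of a day, period 2), so the current shift started at 2020-12-31 19:30.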