Hourly and daily groups with overlapping days (11:00 pm-1:00am) - sql

I'm attempting to count the number of ids active during a given hourly window and day. I have three columns: start_time, end_time, and id.
Most of the groups are standard same-day groupings, i.e., 9am-11am, 11am-1pm, etc.
One of the groupings spans two days (11pm-1am). That is, May 16 11:01 PM and May 17 12:01 AM should be in the same group. But a simple query ends up grouping May 17 12:01 AM with May 17 11:00 PM instead.
I've tried several queries with several variations of case statements and subqueries, but can't get the gist.
The query attempt below gives me 0 for the overnight_11pm_1am column. I figured I could just move a 12:30 AM time into the previous date and count it there, but no luck.
select date_trunc('day', start_time_1) as date_active
, count(distinct(case when CAST(start_time_1 AS TIME) <= '06:00:00' and CAST(end_time_1 AS TIME) >= '01:00:00' then id end)) as early_morning
, count(distinct(case when CAST(start_time_1 AS TIME) <= '00:00:00' and CAST(end_time_1 AS TIME) >= '23:00:00' then id end)) as overnight_11pm_1am
from
(select id
, case
when cast(start_time as time) between '00:00:00' and '01:00:00'
then start_time - INTERVAL '01:00' HOUR TO MINUTE
else start_time
end as start_time_1
,case
when cast(end_time as time) between '00:00:00' and '01:00:00'
then end_time - INTERVAL '01:00' HOUR TO MINUTE
else end_time
end as end_time_1
from table
where start_time >= '2021-01-01'
and start_time < '2021-01-04'
)
group by date_active
Out of ideas.
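A minimal sketch of the "move it into the previous date" idea, assuming PostgreSQL and made-up sample rows (the sessions CTE and its values are hypothetical, not the asker's data). Every timestamp is shifted back one hour before truncating to the day, so the 23:00-01:00 window becomes 22:00-24:00 on a single date. This simplified version only looks at when a session starts, not at the full start/end overlap.

```sql
-- PostgreSQL sketch; the sample rows are invented for illustration.
WITH sessions (id, start_time, end_time) AS (
    VALUES
        (1, TIMESTAMP '2021-05-16 23:01', TIMESTAMP '2021-05-16 23:30'),
        (2, TIMESTAMP '2021-05-17 00:01', TIMESTAMP '2021-05-17 00:45'),
        (3, TIMESTAMP '2021-05-17 09:15', TIMESTAMP '2021-05-17 10:00')
)
SELECT
    -- Shifting back one hour moves 00:00-00:59 onto the previous date.
    date_trunc('day', start_time - INTERVAL '1 hour') AS date_active,
    -- After the shift, the 23:00-01:00 window is 22:00-24:00 on one date.
    count(DISTINCT id) FILTER (
        WHERE CAST(start_time - INTERVAL '1 hour' AS time) >= TIME '22:00'
    ) AS overnight_11pm_1am
FROM sessions
GROUP BY date_active
ORDER BY date_active;
```

With these rows, May 16 reports 2 for overnight_11pm_1am (ids 1 and 2) and May 17 reports 0. The other same-day windows would need their boundaries shifted by the same hour.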

Related

How can I aggregate time series data in postgres from a specific timestamp & fixed intervals (e.g. 1 hour , 1 day, 7 day ) without using date_trunc()?

I have a postgres table "Generation" with half-hourly timestamps spanning 2009 - present with energy data:
I need to aggregate (average) the data across different intervals from specific timepoints, for example data from 2021-01-07T00:00:00.000Z for one year at 7 day intervals, or 3 months at 1 day intervals, or 7 days at 1 hour intervals, etc. date_trunc() partly solves this, but it truncates weeks to the preceding Monday, e.g.
SELECT date_trunc('week', "DATETIME") AS week,
count(*),
AVG("GAS") AS gas,
AVG("COAL") AS coal
FROM "Generation"
WHERE "DATETIME" >= '2021-01-07T00:00:00.000Z' AND "DATETIME" <= '2022-01-06T23:59:59.999Z'
GROUP BY week
ORDER BY week ASC
;
returns the first time series interval as 2021-01-04 with an incorrect count:
week                  | count | gas                | coal
"2021-01-04 00:00:00" | 192   | 18291.34375        | 2321.4427083333335
"2021-01-11 00:00:00" | 336   | 14477.407738095239 | 2027.547619047619
"2021-01-18 00:00:00" | 336   | 13947.044642857143 | 1152.047619047619
EDIT: The following returns the correct weekly intervals by measuring the offset of the start date from the start of its week (Monday) and shifting the results by that offset:
WITH vars1 AS (
SELECT '2021-01-07T00:00:00.000Z'::timestamp as start_time,
'2021-01-28T00:00:00.000Z'::timestamp as end_time
),
vars2 AS (
SELECT
((select start_time from vars1)::date - (date_trunc('week', (select start_time from vars1)::timestamp))::date) as diff
)
SELECT date_trunc('week', "DATETIME" - ((select diff from vars2) || ' day')::interval)::date + ((select diff from vars2) || ' day')::interval AS week,
count(*),
AVG("GAS") AS gas,
AVG("COAL") AS coal
FROM "Generation"
WHERE "DATETIME" >= (select start_time from vars1) AND "DATETIME" < (select end_time from vars1)
GROUP BY week
ORDER BY week ASC
returns:
week                  | count | gas                | coal
"2021-01-07 00:00:00" | 336   | 17242.752976190477 | 2293.8541666666665
"2021-01-14 00:00:00" | 336   | 13481.497023809523 | 1483.0565476190477
"2021-01-21 00:00:00" | 336   | 15278.854166666666 | 1592.7916666666667
And then for any daily or hourly (swap out day with hour) intervals you can use the following:
SELECT date_trunc('day', "DATETIME") AS day,
count(*),
AVG("GAS") AS gas,
AVG("COAL") AS coal
FROM "Generation"
WHERE "DATETIME" >= '2022-01-07T00:00:00.000Z' AND "DATETIME" < '2022-01-10T23:59:59.999Z'
GROUP BY day
ORDER BY day ASC
;
To select complete weeks, change the WHERE clause to something like:
WHERE "DATETIME" >= date_trunc('week','2021-01-07T00:00:00.000Z'::timestamp)
AND "DATETIME" < (date_trunc('week','2022-01-06T23:59:59.999Z'::timestamp) + interval '7' day)::date
This effectively gets the records from January 4, 2021 up to and including January 9, 2022.
Note: I changed <= to < so that the end date itself is not included!
EDIT:
when you want your weeks to start on January 7, you can always group by:
(date_part('day',(d-'2021-01-07'))::int-(date_part('day',(d-'2021-01-07'))::int % 7))/7
(where d is the column containing the datetime-value.)
see: dbfiddle
EDIT:
This will get the list from a given date, and a specified interval.
see DBFIDDLE
WITH vars AS (
SELECT
'2021-01-07T00:00:00.000Z'::timestamp AS qstart,
'2022-01-06T23:59:59.999Z'::timestamp AS qend,
7 as qint,
INTERVAL '1 DAY' as qinterval
)
SELECT
(select date(qstart) FROM vars) + (SELECT qinterval from vars) * ((date_part('day',("DATETIME"-(select date(qstart) FROM vars)))::int-(date_part('day',("DATETIME"-(select date(qstart) FROM vars)))::int % (SELECT qint FROM vars)))::int) AS week,
count(*),
AVG("GAS") AS gas,
AVG("COAL") AS coal
FROM "Generation"
WHERE "DATETIME" >= (SELECT qstart FROM vars) AND "DATETIME" <= (SELECT qend FROM vars)
GROUP BY week
ORDER BY week
;
I added the WITH vars to do the variable stuff on top and no need to mess with the rest of the query. (Idea borrowed here)
I only tested with qint=7,qinterval='1 DAY' and qint=14,qinterval='1 DAY' (but other values should work too...)
Using the function EXTRACT, you can calculate the difference in days, weeks and hours between your timestamp ts and the start_date as follows:
Difference in Days
extract (day from ts - start_date)
Difference in Weeks
This is the difference in days divided by 7 and truncated:
trunc(extract (day from ts - start_date)/7)
Difference in Hours
This is the difference in days times 24, plus the hours component of the difference:
extract (day from ts - start_date)*24 + extract (hour from ts - start_date)
The difference can be used in GROUP BY directly. E.g. for week grouping, the first group has difference 0 (the same week as the start date), the next group has difference 1 (the following week), etc.
Example
I'm using a CTE for the start date to avoid multiple copies of the parameter
with start_time as
(select DATE'2021-01-07' as start_ts),
prep as (
select
ts,
extract (day from ts - (select start_ts from start_time)) day_diff,
trunc(extract (day from ts - (select start_ts from start_time))/7) week_diff,
extract (day from ts - (select start_ts from start_time)) *24 + extract (hour from ts - (select start_ts from start_time)) hour_diff,
value
from test_table
where ts >= (select start_ts from start_time)
)
select week_diff, avg(value)
from prep
group by week_diff order by 1
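On PostgreSQL 14 or later, the built-in date_bin function can also bucket timestamps into fixed-width intervals counted from an arbitrary origin. This is an alternative to the approaches above, not what those answers use. A minimal sketch, assuming a hypothetical readings table with made-up rows:

```sql
-- PostgreSQL 14+ sketch; table name and sample rows are invented.
WITH readings (ts, value) AS (
    VALUES
        (TIMESTAMP '2021-01-07 00:00', 10.0),
        (TIMESTAMP '2021-01-13 23:30', 20.0),
        (TIMESTAMP '2021-01-14 00:00', 30.0),
        (TIMESTAMP '2021-01-20 12:00', 50.0)
)
SELECT
    -- date_bin(stride, source, origin): start of the 7-day bucket,
    -- with buckets counted from the origin rather than from Monday.
    date_bin(INTERVAL '7 days', ts, TIMESTAMP '2021-01-07 00:00') AS bucket_start,
    count(*) AS n,
    avg(value) AS avg_value
FROM readings
GROUP BY bucket_start
ORDER BY bucket_start;
```

Here the first two rows fall in the bucket starting 2021-01-07 (average 15) and the last two in the bucket starting 2021-01-14 (average 40). Changing the stride to '1 day' or '1 hour' gives the other interval sizes; date_bin does not accept strides containing months or years.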

Select Data From Multiple Days Between Certain Times (Spanning 2 days)

I need to know how many entries appear in my DB for the past 7 days with a timestamp between 23:00 & 01:00...
The issue I have is that the time window spans 2 days, and I'm unsure whether this is even possible in one query.
So far I have come up with the below:
select trunc(timestamp) as DTE, extract(hour from timestamp) as HR, count(COLUMN) as Total
from TABLE
where trunc(timestamp) >= '12-NOV-19' and
extract(hour from timestamp) in ('23','00','01')
group by trunc(timestamp), extract(hour from timestamp)
order by 1,2 desc;
The result I am hoping for is something like this:
DTE | Total
20-NOV-19 5
19-NOV-19 4
18-NOV-19 4
17-NOV-19 6
Many thanks
Filter on the day first comparing it to TRUNC( SYSDATE ) - INTERVAL '7' DAY and then consider the hours by comparing the timestamp to itself truncated back to midnight with an offset of a number of hours.
select trunc(timestamp) as DTE,
extract(hour from timestamp) as HR,
count(COLUMN) as Total
from TABLE
WHERE timestamp >= TRUNC( SYSDATE ) - INTERVAL '7' DAY
AND ( timestamp <= TRUNC( timestamp ) + INTERVAL '01:00' HOUR TO MINUTE
OR timestamp >= TRUNC( timestamp ) + INTERVAL '23:00' HOUR TO MINUTE
)
group by trunc(timestamp), extract(hour from timestamp)
order by DTE, HR desc;
Subtract or add an hour to derive the date. I'm not sure what date you want to assign to each period, but the idea is:
select trunc(timestamp - interval '1' hour) as DTE,
count(*) as Total
from t
where trunc(timestamp - interval '1' hour) >= DATE '2019-11-12' and
extract(hour from timestamp) in (23, 0)
group by trunc(timestamp - interval '1' hour)
order by 1 desc;
Note: If you want times between 11:00 p.m. and 1:00 a.m., then you want the hour to be 23 or 0.

Calculate Business Hours Between Two Dates without Creating Function or View

I realize that this might be a somewhat redundant question BUT I have struggled to follow some of the examples that I did find and I thought I would ask again providing details on my specific scenario.
Here is what I am working with:
Oracle Database
The dates are in timestamp format
I cannot create any additional tables/views (due to permission issue)
I cannot create any custom functions (due to permission issue)
I have a 40 hour work week and business hours of 8 to 4:30 Monday through Friday. (I guess technically that leaves us with more than 40 hours to account for, because I don't want to get so fancy as to worry about excluding lunch breaks.)
I'm able to figure out how to calculate hours, but I don't know how to bring in the business day component.
Starting with your example of 8AM Friday through 9AM Monday:
with dates as (
Select timestamp '2019-05-31 08:00:00' start_date
, timestamp '2019-06-03 09:00:00' end_date
from dual
)
We need to generate the days in between. We can do that with a recursive query:
, recur(start_date, calc_date, end_date) as (
-- Anchor Part
select start_date
, trunc(start_date)
, end_date
from dates
-- Recursive Part
union all
select start_date
, calc_date+1
, end_date
from recur
where calc_date+1 < end_Date
)
From that we need to work out a few things: whether calc_date is a weekday or a weekend day, and what the starting and ending times are for calc_date. We can then take those values and use a little date arithmetic to find the number of hours worked on that day (returned as a day to second interval, since we started with timestamps):
, days as (
select calc_date
, case when mod(to_number(to_char(calc_date,'d'))-1,6) != 0 then 1 end isWeekDay
, greatest(start_date, calc_date + interval '8' hour) start_time
, least(end_date, calc_date + interval '16:30' hour to minute) end_time
, least( ( least(end_date, calc_date + interval '16:30' hour to minute)
- greatest(start_date, calc_date + interval '8' hour)
) * case when mod(to_number(to_char(calc_date,'d'))-1,6) != 0 then 1 end
, interval '8' hour
) daily_hrs
from recur
where start_date < (calc_date + interval '16:30' hour to minute)
and (calc_date + interval '8' hour) < end_date
)
Note that in the above step, we've limited the daily hours to 8 hours a day, and the where clause guards against start or end days that are outside business hours. The final step is to sum the hours. Unfortunately Oracle doesn't have any native interval aggregate or analytic functions, but we can still manage by converting the intervals to seconds, summing them and then converting them back to an interval for output:
select calc_date
, daily_hrs
, numtodsinterval(sum( extract(hour from daily_hrs)*60*60
+ extract(minute from daily_hrs)*60
+ extract(second from daily_hrs)
) over (order by calc_date)
,'second') run_sum
from days;
I've done the sum above as an analytic function so we can see some of the intervening data, but if you just want the final output you can change the last part of the query to this:
select numtodsinterval(sum( extract(hour from daily_hrs)*60*60
+ extract(minute from daily_hrs)*60
+ extract(second from daily_hrs)
)
,'second') run_sum
from days;
Here's a db<>fiddle with the whole query in action. Note that in the fiddle, I've altered the DB session's NLS_TERRITORY setting to AMERICA to make the query work since the first day of the week is country specific. The second query in the fiddle replaces the territory specific function:
case when mod(to_number(to_char(calc_date,'d'))-1,6) != 0 then 1 end
with a location and language agnostic calculation:
case when (mod(mod(calc_date - next_day(date '2019-1-1',to_char(date '2019-01-06','day')),7),6)) != 0 then 1 end

Count of rides in each week over the last 12 weeks

Assume that we have the following tables, with columns as indicated:
Rides
ride_id
start_time
end_time
passenger_id
driver_id
ride_region
is_completed (Y/N)
Drivers
driver_id
onboarding_time
home_region
Write a query that we could use to create a plot of the total count of rides completed in our San Francisco region, for each week over the last 12 weeks.
I have used datepart to get the count for every week. But I am not sure how to include the clause which outputs last 12 weeks from TODAY. My code will give a count for week 1 to 12 from the earliest start time.
Please check my code and correct me.
SELECT datepart(week, START_TIME), COUNT(RIDE_ID)
FROM RIDES
WHERE is_completed = 'Y' AND ride_region ='San Francisco' AND
datepart(week, START_TIME) <= 12
group by datepart(week, START_TIME);
I expect count output for last 12 weeks based on week.
Instead of:
AND datepart(week, START_TIME) <= 12
use this
AND START_TIME > current_date - interval '84 day'
because you want all the rows from the last 12 weeks = 84 days
and group by datepart(week, START_TIME)
If you want the last 12 weeks from current_date
SELECT datepart(week, START_TIME), COUNT(RIDE_ID)
FROM RIDES
WHERE is_completed = 'Y'
AND ride_region ='San Francisco'
AND START_TIME between (date_trunc('week', current_date) - interval '12 week')
AND date_trunc('week', current_date)
group by datepart(week, START_TIME);
This is a bit complicated, because you probably don't want partial weeks. So, subtracting 12 weeks (or 84 days) may not be sufficient.
I would recommend logic more like this:
where start_time >= date_trunc('week', curdate()) - interval '12 week' and
start_time < date_trunc('week', curdate())
This gives the last 12 full weeks of data, based on calendar weeks.
You have already decided to use the canonical definition of week, so this makes sense. Alternatively, you could have the definition of week starting on any day of the week.
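A minimal sketch of the "last 12 full calendar weeks" filter, assuming PostgreSQL and a handful of made-up rides defined relative to today (the sample rows are hypothetical). It groups by the week's start date rather than by week number, since week numbers restart at the turn of the year and a 12-week window can straddle that boundary.

```sql
-- PostgreSQL sketch; sample rides are invented and relative to today.
WITH rides (ride_id, start_time, ride_region, is_completed) AS (
    VALUES
        -- two completed rides in the previous calendar week
        (1, date_trunc('week', current_date) - INTERVAL '3 days',  'San Francisco', 'Y'),
        (2, date_trunc('week', current_date) - INTERVAL '5 days',  'San Francisco', 'Y'),
        -- current (partial) week: excluded by the upper bound
        (3, date_trunc('week', current_date),                      'San Francisco', 'Y'),
        -- 13 weeks back: excluded by the lower bound
        (4, date_trunc('week', current_date) - INTERVAL '13 weeks', 'San Francisco', 'Y'),
        -- wrong region: excluded
        (5, date_trunc('week', current_date) - INTERVAL '3 days',  'Oakland',       'Y')
)
SELECT date_trunc('week', start_time) AS week_start,
       count(ride_id)                 AS rides_completed
FROM rides
WHERE is_completed = 'Y'
  AND ride_region = 'San Francisco'
  AND start_time >= date_trunc('week', current_date) - INTERVAL '12 weeks'
  AND start_time <  date_trunc('week', current_date)
GROUP BY week_start
ORDER BY week_start;
```

Only rides 1 and 2 pass the filter, so the result is a single row for the previous week with a count of 2. Weeks with no rides do not appear at all; a plot that needs zero-count weeks would have to join against a generated list of weeks.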
Select datediff(week, start_time, getdate()) as [WEEKS_AGO],
Count(ride_id) AS [TOTALRIDES]
From rides
Where ride_region = 'San Francisco'
And is_completed = 'Y'
-- weeks elapsed between the ride start and today
And datediff(week, start_time, getdate()) BETWEEN 1 and 12
GROUP BY datediff(week, start_time, getdate())

Time difference in SQL between record and "a specific time on the day"

I have a table in a relational database which contains a lot of dates.
In my application logic I have divided one day into 4 parts of 6 hours each, starting at: 00:00, 06:00, 12:00 and 18:00.
Now I would like to find the time difference between the earliest record in the database for each quarter of a day and the beginning of that period. How can I do that?
In pseudo-SQL I guess it looks like
select min(created_at - ROUND_DOWN_TO_6_HOURS(created_at)) from mytabel group by day_quater;
The problem is how to calculate "ROUND_DOWN_TO_6_HOURS". So if "created_at" is 19:15, it will be rounded down to 18:00, and "created_at - ROUND_DOWN_TO_6_HOURS(created_at)" will return 1:15 hours.
I'm working with psql
If you're just trying to locate the records that match these ranges, you could just use that in the WHERE clause like
select * from myTable
where datepart(hh, created_at) between 0 and 5
If you're trying to create a computed field that will have the 00 or 06 ... then you could use the "DatePart()" function in sql to pull the hour... DATEPART ( hh, date )... This would return a numeric value of 0, 1, 2, 3, ... 23 and you can compute a field based on this value being between 2 of your hours listed...
Here's a sample...
select
case
when datepart(hh, add_dt) between 0 and 5 then 1
when datepart(hh, add_dt) between 6 and 11 then 2
when datepart(hh, add_dt) between 12 and 17 then 3
when datepart(hh, add_dt) between 18 and 23 then 4
end
from myTable
where add_dt is not null
You could use CASE in conjunction with your date column and datetime functions to establish the quarter-of-day (1,2,3,4) and extract the day part from the datetime value, group by day, quarter, and then use the MIN(yourdatecolumn) to grab the earliest time within each quarter grouping.
I'm not sure what you mean by "beginning of the period", but you can measure the difference between any arbitrary datetime and the set of earliest times per day-quarter obtained in the manner above.
http://www.postgresql.org/docs/8.2/static/functions-datetime.html
select
record::time - (case
when record::time >= '18:00' then '18:00'
when record::time >= '12:00' then '12:00'
when record::time >= '6:00' then '6:00'
else '0:00' end
)::time as difference
from my_table
My PostgreSQL is a little rusty, but something like this:
select
date_trunc('day',CreatedOn) as "Day",
min(case when date_part('hour',CreatedOn) < 6 then '00:00'
when date_part('hour',CreatedOn) < 12 then '06:00'
when date_part('hour',CreatedOn) < 18 then '12:00'
else '18:00'
end) as "Quarter"
from MyTable
group by date_trunc('day',CreatedOn)
order by date_trunc('day',CreatedOn)
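One possible way to express the ROUND_DOWN_TO_6_HOURS step from the question, sketched for PostgreSQL with made-up rows (the my_table CTE and its values are hypothetical). Integer-dividing the hour by 6 gives the quarter index 0 to 3, and multiplying that by a 6-hour interval gives the start of the period within the day.

```sql
-- PostgreSQL sketch; sample rows are invented for illustration.
WITH my_table (created_at) AS (
    VALUES
        (TIMESTAMP '2021-03-01 19:15'),
        (TIMESTAMP '2021-03-01 18:40'),
        (TIMESTAMP '2021-03-01 07:05')
),
bucketed AS (
    SELECT
        created_at,
        -- midnight of the day + (quarter index 0..3) * 6 hours
        date_trunc('day', created_at)
            + (extract(hour FROM created_at)::int / 6) * INTERVAL '6 hours'
            AS period_start
    FROM my_table
)
SELECT period_start,
       -- offset of the earliest record from the start of its period
       min(created_at - period_start) AS first_offset
FROM bucketed
GROUP BY period_start
ORDER BY period_start;
```

For these rows the 06:00 period has a first offset of 01:05:00 and the 18:00 period has 00:40:00 (the 18:40 record is earlier than the 19:15 one). Grouping by period_start covers both the day and the quarter, since the period start includes the date.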