Block average in SQL

I have to take the average of the delta time between consecutive rows in SQL Server, where each delta represents the time elapsed between two consecutive operations. However, there are no operations during nights, holidays, or weekends (e.g. between the last operation on Friday and the first one on Monday the delta is more than 48 hours, and I don't want to count it), so the average time is completely wrong.
How can I deal with this problem? Is there a way to drop these entries and compute the real average delta time, doing a sort of block (per-day?) average?
Thanks!
An example:
Time
00:00:37
00:00:32
00:00:25
...
00:01:22
00:00:54 ---- e.g. Night ---
09:34:12 <--- Exclude this from the average calculation ---
00:00:22
00:00:41
00:00:36
...
Desired output
Avg time: 41.13s

For the time difference, you can apply a where clause. For the rest, just date functions and arithmetic:
select convert(date, time),
       avg(datediff(second, prev_time, time) * 1.0) as avg_seconds
from (select t.*,
             lag(time) over (order by time) as prev_time
      from t
     ) t
where time < dateadd(hour, 4, prev_time)  -- or whatever the threshold is
group by convert(date, time);
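If you want a single overall figure like the 41.13 s in the question rather than a per-day breakdown, a minimal variation of the same idea (assuming the same table t, its time column, and a 4-hour threshold) is to drop the GROUP BY:

select avg(datediff(second, prev_time, time) * 1.0) as avg_seconds
from (select t.*,
             lag(time) over (order by time) as prev_time
      from t
     ) t
where time < dateadd(hour, 4, prev_time);  -- the threshold is a guess; tune it to your gaps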

Related

Get count of matching time ranges for every minute of the day in Postgres

Problem
I have a table of records each containing id, in_datetime, and out_datetime. A record is considered "open" during the time between the in_datetime and out_datetime. I want to know how many time records were "open" for each minute of the day (regardless of date). For example, for the last 90 days I want to know how many records were "open" at 3:14 am, then 3:15 am, then 3:16 am, then... If no records were "open" at 2:00 am the query should return 0 or null instead of excluding the row, thus 1440 rows should always be returned (the number of minutes in a day). Datetimes are stored in UTC and need to be cast to a time zone.
Simplified example graphic
record_id | time_range
| 0123456789 (these are minutes past midnight)
1 | =========
2 | ===
3 | =======
4 | ===
5 | ==
______________________
result 3323343210
Desired output
time | count of open records at this time
00:00 120
00:01 135
00:02 132
...
23:57 57
23:58 62
23:59 60
No more than 1440 records would ever be returned as there are only 1440 minutes in the day.
What I've tried
1.) In a subquery, I currently generate a minutely series of times for the entire range of each time record. I then group those by time and get a count of the records per minute.
Here is a db-fiddle using my current query:
select
    trs.minutes,
    count(trs.minutes)
from (
    select
        generate_series(
            DATE_TRUNC('minute', (time_records.in_datetime::timestamptz AT TIME ZONE 'America/Denver')),
            DATE_TRUNC('minute', (time_records.out_datetime::timestamptz AT TIME ZONE 'America/Denver')),
            interval '1 min'
        )::time as minutes
    from
        time_records
) trs
group by
    trs.minutes
This works but is quite inefficient and takes several seconds to run due to the size of my table. Additionally, it excludes times when no records were open. I think somehow I could use window functions to count the number of overlapping time records for each minute of the day, but I don't quite understand how to do that.
2.) Modifying Gordon Linoff's query in his answer below, I came to this (db-fiddle link):
with tr as (
    select
        date_trunc('minute', (tr.in_datetime::timestamptz AT TIME ZONE 'America/Denver'))::time as m,
        1 as inc
    from
        time_records tr
    union all
    select
        (date_trunc('minute', (tr.out_datetime::timestamptz AT TIME ZONE 'America/Denver')) + interval '1 minute')::time as m,
        -1 as inc
    from
        time_records tr
    union all
    select
        minutes::time,
        0
    from
        generate_series(timestamp '2000-01-01 00:00', timestamp '2000-01-01 23:59', interval '1 min') as minutes
)
select
    m,
    sum(inc) as changes_at_inc,
    sum(sum(inc)) over (order by m) as running_count
from
    tr
where
    m is not null
group by
    m
order by
    m;
This runs reasonably quickly, but towards the end of the day (about 22:00 onwards in the linked example) the values turn negative for some reason. Additionally, this query doesn't seem to work correctly with records with time ranges that cross over midnight. It's a step in the right direction, but I unfortunately don't understand it enough to improve on it further.
Here is a faster method. Generate "in" and "out" records for when something gets counted. Then aggregate and use a running sum.
To get all minutes, throw in a generate_series() for the time period in question:
with tr as (
    select date_trunc('minute', (tr.in_datetime::timestamptz AT TIME ZONE 'America/Denver')) as m,
           1 as inc
    from time_records tr
    union all
    select date_trunc('minute', (tr.out_datetime::timestamptz AT TIME ZONE 'America/Denver')) + interval '1 minute' as m,
           -1 as inc
    from time_records tr
    union all
    select generate_series(date_trunc('minute',
                                      min(tr.in_datetime::timestamptz AT TIME ZONE 'America/Denver')),
                           date_trunc('minute',
                                      max(tr.out_datetime::timestamptz AT TIME ZONE 'America/Denver')),
                           interval '1 minute'
           ), 0
    from time_records tr
)
select m,
       sum(inc) as changes_at_inc,
       sum(sum(inc)) over (order by m) as running_count
from tr
group by m
order by m;
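If guaranteeing exactly 1440 rows matters more than speed, a rough sketch in the spirit of the question's first attempt (assuming the same time_records table and the Denver time zone) is to left-join a fixed clock of minutes against the expanded per-minute rows, so minutes with nothing open come back as 0:

with clock as (
    -- one row per minute of the day, as a time value
    select (timestamp '2000-01-01' + n * interval '1 minute')::time as minute_of_day
    from generate_series(0, 1439) as n
),
open_minutes as (
    -- one row per minute each record was open, reduced to minute of day
    select generate_series(
               date_trunc('minute', in_datetime::timestamptz at time zone 'America/Denver'),
               date_trunc('minute', out_datetime::timestamptz at time zone 'America/Denver'),
               interval '1 minute'
           )::time as minute_of_day
    from time_records
)
select c.minute_of_day,
       count(o.minute_of_day) as open_records
from clock c
left join open_minutes o on o.minute_of_day = c.minute_of_day
group by c.minute_of_day
order by c.minute_of_day;

This shares the cost profile of the first attempt, so it only addresses the missing-minutes issue, not the performance one.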

Averaging event start time from DateTime column

I'm calculating average start times from events that run late at night and may not start until the next morning.
2018-01-09 00:01:38.000
2018-01-09 23:43:22.000
Currently all I can produce is an average of 11:52:30.0000000.
I would like the result to be ~23:52.
The times averaged will not remain static, as this event runs daily and I will have new data daily. I will likely take the most recent 10 records and average them.
It would be nice to see the SQL you're running, but you probably just need to format your output properly; it should be something like this:
FORMAT(cast(<your column> as time), N'hh\:mm(24h)')
The following will both compute the average across the datetime field and also return the result as a 24hr time notation only.
SELECT CAST(CAST(AVG(CAST(<YourDateTimeField_Here> AS FLOAT)) AS DATETIME) AS TIME) [AvgTime] FROM <YourTableContaining_DateTime>
The following will calculate the average time of day, regardless of what day that is.
--SAMPLE DATA
create table #tmp_sec_dif
(
    sample_date_time datetime
)
insert into #tmp_sec_dif
values ('2018-01-09 00:01:38.000')
     , ('2018-01-09 23:43:22.000')

--ANSWER
declare @avg_sec_dif int
set @avg_sec_dif =
    (select avg(a.sec_dif) as avg_sec_dif
     from (
         --put the value in terms of seconds away from 00:00:00
         --where 23:59:00 would be -60 and 00:01:00 would be 60
         select iif(
                    datepart(hh, sample_date_time) < 12  --is it morning?
                    , datediff(s, '00:00:00', cast(sample_date_time as time))          --if morning
                    , datediff(s, '00:00:00', cast(sample_date_time as time)) - 86400  --if evening
                ) as sec_dif
         from #tmp_sec_dif
     ) as a
    )
select cast(dateadd(s, @avg_sec_dif, '00:00:00') as time) as avg_time_of_day
The output would be an answer of 23:52:30.0000000
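For the "most recent 10 records" part mentioned in the question, a sketch along the same lines (reusing the sample #tmp_sec_dif table and the seconds-from-midnight trick above; the TOP/ORDER BY choice is an assumption about what "most recent" means) could look like this:

declare @avg_sec_dif int
set @avg_sec_dif =
    (select avg(a.sec_dif)
     from (
         select top (10)
                iif(datepart(hh, sample_date_time) < 12
                    , datediff(s, '00:00:00', cast(sample_date_time as time))
                    , datediff(s, '00:00:00', cast(sample_date_time as time)) - 86400
                ) as sec_dif
         from #tmp_sec_dif
         order by sample_date_time desc  -- newest 10 rows only
     ) as a
    )
select cast(dateadd(s, @avg_sec_dif, '00:00:00') as time) as avg_time_of_day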
This code allows you to define a day division point, e.g. 18 identifies 6 pm. The time calculation is then based on seconds after 6 pm.
-- Defines the hour of the day when a new day starts
DECLARE @DayDivision INT = 18

IF OBJECT_ID(N'tempdb..#StartTimes') IS NOT NULL DROP TABLE #StartTimes
CREATE TABLE #StartTimes(
    start DATETIME NOT NULL
)
INSERT INTO #StartTimes
VALUES
 ('2018-01-09 00:01:38.000')
,('2018-01-09 23:43:22.000')

SELECT
    -- 3. Add the number of seconds to a day starting at the
    --    day division hour, then extract the time portion
    CAST(DATEADD(SECOND,
        -- 2. Average number of seconds
        AVG(
            -- 1. Get the number of seconds from the day division point (@DayDivision)
            DATEDIFF(SECOND,
                CASE WHEN DATEPART(HOUR, start) < @DayDivision THEN
                    SMALLDATETIMEFROMPARTS(YEAR(DATEADD(DAY, -1, start)), MONTH(DATEADD(DAY, -1, start)), DAY(DATEADD(DAY, -1, start)), @DayDivision, 0)
                ELSE
                    SMALLDATETIMEFROMPARTS(YEAR(start), MONTH(start), DAY(start), @DayDivision, 0)
                END
            , start)
        )
    , '01 jan 1900 ' + CAST(@DayDivision AS VARCHAR(2)) + ':00') AS TIME) AS AverageStartTime
FROM #StartTimes
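A quick hand check with the two sample rows and @DayDivision = 18: 2018-01-09 00:01:38 is 21,698 seconds after 18:00 on 2018-01-08, and 2018-01-09 23:43:22 is 20,602 seconds after 18:00 on 2018-01-09; the average is 21,150 seconds, and 18:00 plus 21,150 seconds gives 23:52:30, matching the result of the previous answer.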

how to show float numbers in Hours, Day, Minute, Second in SQL Server

I have a simple record in the table below:
Depart_dt Arrived_dt
10/1/2013 6:15:00 AM 10/1/2013 7:25:00 AM
Based on my calculation, it is 1 hour and 10 min.
Thanks to VKP, I used the datediff function as below:
select
    datediff(DD, depart_dt, arrived_dt) as day,
    datediff(HH, depart_dt, arrived_dt) as hour,
    datediff(Minute, depart_dt, arrived_dt) as min,
    datediff(second, depart_dt, arrived_dt) as second
from temp
However, my result looks funny with the minute and second columns
Day Hour Min Second
0 1 70 4200
The hour appears correct, but I am not sure how it comes to 70 in the min column and 4200 in the second column.
Sorry guys, I was wrong. Yes, 70 min is correct because that is 1 hour and 10 min. Please disregard this.
You can just use DATEDIFF to get the difference as an integer.
select item,
       datediff(dd, start_dt, end_dt) as total_days,
       datediff(hh, start_dt, end_dt) as total_hours,
       datediff(minute, start_dt, end_dt) as total_minutes,
       datediff(second, start_dt, end_dt) as total_seconds
from yourtable
When using datediff you'll have to understand what it does. The name is quite confusing, because it doesn't actually calculate date differences; according to the documentation, it "Returns the count (signed integer) of the specified datepart boundaries crossed between the specified startdate and enddate."
That means, for example, that datediff in hours between 06:15 and 07:00 is 1.
You'll probably want something like this:
select DATEDIFF(SECOND, [Depart_dt], [Arrived_dt]) / 86400 as Days,
       (DATEDIFF(SECOND, [Depart_dt], [Arrived_dt]) % 86400) / 3600 as Hours,
       ((DATEDIFF(SECOND, [Depart_dt], [Arrived_dt]) % 86400) % 3600) / 60 as Minutes,
       ((DATEDIFF(SECOND, [Depart_dt], [Arrived_dt]) % 86400) % 3600) % 60 as Seconds
from temp
This calculates the amounts in full days / hours etc so number of hours will never be 24 or more.
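As a quick sanity check with the question's own numbers (4200 seconds between the two timestamps), the integer arithmetic works out like this; the literal 4200 just stands in for the DATEDIFF result:

select 4200 / 86400                  as Days,    -- 0
       (4200 % 86400) / 3600         as Hours,   -- 1
       ((4200 % 86400) % 3600) / 60  as Minutes, -- 10
       ((4200 % 86400) % 3600) % 60  as Seconds  -- 0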

Number of specific one-hour periods between two date/times

I have a table of game records, call it "game".
It has an id and a timestamp.
What I need to know is unrelated to the table specifically. In order to know the average number of games played per hour, I need to know:
Total games played for each hour over the date range
Number of hourly periods between the date range.
Finding the first is a matter of extracting the hour from the timestamp and grouping by it.
For the second, if the date range was rounded to the nearest day, finding this value would be easy (totalgames/numdays).
Unfortunately I can't assume this. What I need help with is finding the number of specific hour periods existing within a time range.
Example:
If the range is 5 PM today to 8 PM tomorrow, there is one "00" hour (midnight to 1 AM), but two 17, 18, 19 hours (5-6, 6-7, 7-8)
Thanks for the help
Edit: for clarity, consider the following query:
I have table game:
id, daytime
select EXTRACT(hour from daytime) as hour_period,
       count(*)
from game
where daytime > dateFrom and daytime < dayTo
group by hour_period
This will give me the number of games played broken down into hourly chunks for the time period.
In order to find the average games played per hour, I need to know exactly how many specific hour durations are between two timestamps. Simply dividing by the number of days is not accurate.
Edit: The ideal output will look something like this:
00 275
01 300
02 255
...
Consider the following: How many times does midnight occur between date 1 and date 2 ? If you have 1.5 days, that doesn't guarantee that midnight will occur twice. 6 AM today to 6 PM tomorrow night, for example, has 1 midnight, but 9PM tonight to 9 AM two days from now has 2 midnights.
What I'm trying to find is how many times the EXACT HOUR occurs between two timestamps, so I can use it to average the number of games played at THAT HOUR over a time period.
EDIT:
The following query gets the days, hours, and # of games, giving an output as below:
29 23 100
29 00 130
30 22 140
30 23 150
Then, the outer query adds up the number of games for each distinct hour and divides by the number of hours, as follows
22 140
23 125
00 130
The modified query is below:
SELECT
    hour_period,
    sum(hourly_no_of_games) / count(hour_period)
FROM
(
    SELECT
        EXTRACT(DAY from daytime) as day_period,
        EXTRACT(HOUR from daytime) as hour_period,
        count(*) hourly_no_of_games
    from game
    where daytime > dateFrom and daytime < dayTo
    group by EXTRACT(DAY from daytime), EXTRACT(HOUR from daytime)
) hourly_data
GROUP BY hour_period
ORDER BY hour_period;
SQL Fiddle demo
If you need something to GROUP BY, you can truncate the timestamp to the level of hour, as in the following:
DECLARE @Date DATETIME
SET @Date = GETDATE()
SELECT @Date, DATEADD(Hour, DATEDIFF(Hour, 0, @Date), 0) AS RoundedDate
If you just need to find the total hours, you can just select the DATEDIFF in hours, such as with
SELECT DATEDIFF(Hour, '5/29/2014 20:01:32.999', GETDATE())
Extract not only the hour of the day but the day of the year (1-366). Then group on those. If there is the possibility the interval could span a year, then add the year itself and group by all three.
year dy hr games
2013 365 23 115
2014 1 00 103
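To get the "how many times does each hour of day occur in the range" piece on its own, a rough Postgres sketch (the timestamp literals below are made-up stand-ins for dateFrom and dayTo) is to generate one row per hour in the range and count by hour of day; dividing the per-hour game totals by these occurrence counts then gives the average games played at that hour:

select extract(hour from h) as hour_period,
       count(*) as occurrences
from generate_series(timestamp '2014-05-29 17:00',  -- dateFrom
                     timestamp '2014-05-30 20:00',  -- dayTo
                     interval '1 hour') as h
group by hour_period
order by hour_period;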

PostgreSQL - Getting statistical data

I need to collect some statistical information in my application.
I have a table of users (tb_user).
Every time a new user accesses the application, a new record is added to this table, i.e., one row per user. The main fields are id and date_time (the timestamp of the first time the user accessed the application).
tb_user
id (bigint) | date_time (timestamp with time zone)
1 | 2012-01-29 11:29:50.359-03
2 | 2012-01-31 14:27:10.359-03
I need to get:
the average number of users per day, week, and month
Example:
by day: 55.45
by week : XX.XX
month: XX.XX
EDIT:
My best solution was:
WITH daily_count AS (SELECT COUNT(id) AS user_count FROM tb_user)
SELECT user_count, tbaux2.days, (user_count / tbaux2.days)
FROM daily_count,
     (SELECT EXTRACT(DAY FROM (t2.diff)) + 1 AS days
      FROM (WITH tbaux AS (SELECT min(date_time) AS min FROM tb_user)
            SELECT (now() - min) AS diff
            FROM tbaux) AS t2) AS tbaux2
GROUP BY user_count, tbaux2.days
But this solution only worked with EXTRACT(DAY ...); it did not work for weeks and months.
Any help is welcome.
Alternatively:
SELECT user_count, tbaux2.days,
       (user_count / tbaux2.days) AS userPerDay,
       ((user_count / tbaux2.days) * 7) AS userPerWeek,
       ((user_count / tbaux2.days) * 30) AS userPerMonth
EDIT 2:
Based on the responses from @Bruno, there are some considerations:
When I asked the question, what I really requested was a way to select data by day, month and year. I believe that the query I posted, and that @Bruno refined, should be interpreted as an average "per day, per 7 days and per 30 days" rather than per calendar day, week and month. I believe that if it is interpreted this way, the kind of problem cited in the example (the 10% drop) will not occur. This "per every N days" approach is the answer I need at the moment, so I will accept this answer.
I suggest the following improvements to the post:
Consider only closed days in the result (do not collect users from the current day, and do not count the current day in the division).
Show the result with two decimal places.
A new query that considers data truly per week and per month.
Thanks.
You should look into aggregate functions (min, max, count, avg), which go hand in hand with GROUP BY. For date-based aggregations, date_trunc is also useful.
For example, this will return the number of rows per day:
SELECT date_trunc('day', date_time) AS day_start,
       COUNT(id) AS user_count
FROM tb_user
GROUP BY date_trunc('day', date_time);
You can then do the daily average using something like this (with a CTE):
WITH daily_count AS (SELECT date_trunc('day', date_time) AS day_start,
                            COUNT(id) AS user_count
                     FROM tb_user
                     GROUP BY date_trunc('day', date_time))
SELECT AVG(user_count) FROM daily_count;
Use 'week' instead of day for the weekly counts, and so on (see date_trunc documentation).
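For instance, the weekly version is just a sketch of the same pattern with 'week' swapped in (date_trunc('week', ...) truncates to the start of the ISO week):

WITH weekly_count AS (SELECT date_trunc('week', date_time) AS week_start,
                             COUNT(id) AS user_count
                      FROM tb_user
                      GROUP BY date_trunc('week', date_time))
SELECT AVG(user_count) FROM weekly_count;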
EDIT: (Following comment: average up to and including 5/1/2012, i.e. before the 6th.)
WITH daily_count AS (SELECT date_trunc('day', date_time) AS day_start,
                            COUNT(id) AS user_count
                     FROM tb_user
                     WHERE date_time >= DATE('2012-01-01') AND date_time < DATE('2012-01-06')
                     GROUP BY date_trunc('day', date_time))
SELECT SUM(user_count)/(DATE('2012-01-06') - DATE('2012-01-01')) FROM daily_count;
What's above is over-complicated, in this case. This should give you the same result:
SELECT COUNT(id)/(DATE('2012-01-06') - DATE('2012-01-01'))
FROM tb_user
WHERE date_time >= DATE('2012-01-01') AND date_time < DATE('2012-01-06');
EDIT 2: After your edit, I guess what you're after is just a single global average for the entire period of existence of your database, rather than groups by month/week/day.
This should give you the average number of rows per day:
WITH total_min_max AS (SELECT COUNT(id) AS total_visits,
                              MIN(date_time) AS first_date_time,
                              MAX(date_time) AS last_date_time
                       FROM tb_user)
SELECT total_visits/((last_date_time::date - first_date_time::date) + 1) AS users_per_day
FROM total_min_max
(I would replace last_date_time with NOW() to make the average over the time until now, rather than until the last visit, if there's no recent visit.)
Then, for daily, weekly, and "monthly":
WITH daily_avg AS (
    WITH total_min_max AS (SELECT COUNT(id) AS total_visits,
                                  MIN(date_time) AS first_date_time,
                                  MAX(date_time) AS last_date_time
                           FROM tb_user)
    SELECT total_visits/((last_date_time::date - first_date_time::date) + 1) AS users_per_day
    FROM total_min_max)
SELECT
    users_per_day,
    (users_per_day * 7) AS users_per_week,
    (users_per_day * 30) AS users_per_month
FROM daily_avg
This being said, conclusions you draw from such statistics might not be great, especially if you want to see how it changes.
I would also normalise the data per day rather than assuming 30 days in a month (if not per hour, because not all days have 24 hours). Say you have 10 visits per day in Jan 2011 and 10 visits per day in Feb 2011. That gives you 310 visits in Jan and 280 visits in Feb. If you don't pay attention, you could think you've had almost a 10% drop in the number of visitors, and that something went wrong in Feb, when really, this isn't the case.
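A hedged sketch of that per-day normalisation (not part of the original answer): group by calendar month and divide by the actual number of days in each month instead of assuming 30.

SELECT month_start,
       user_count / EXTRACT(day FROM month_start + interval '1 month' - interval '1 day') AS users_per_day
FROM (SELECT date_trunc('month', date_time) AS month_start,
             COUNT(id)::numeric AS user_count
      FROM tb_user
      GROUP BY date_trunc('month', date_time)) AS monthly
ORDER BY month_start;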