Calculate Average Time Over 24 hour period - sql

I'm working in Teradata and am trying to calculate the average time at which a job completes.
Data values:
Job Name           Start Date  End Date    End Time
D_BDW_CCIP_SRM_LD  10/10/2012  10/11/2012  01:41:49
D_BDW_CCIP_SRM_LD  10/9/2012   10/10/2012  00:19:56
D_BDW_CCIP_SRM_LD  10/8/2012   10/8/2012   23:37:18
D_BDW_CCIP_SRM_LD  10/5/2012   10/5/2012   23:39:47
D_BDW_CCIP_SRM_LD  10/4/2012   10/4/2012   23:42:47
D_BDW_CCIP_SRM_LD  10/3/2012   10/3/2012   23:41:54
The average comes back as 16:07 instead of 00:07. I need the calculation to recognize that when a job finishes on the next day, its end time extends past midnight.
In Excel I could do this by adding one day to the end time and then averaging and displaying as a time.
How do I do this in Teradata?

This is such an interesting question! UPDATED with correct syntax: Assuming your START_DATE and END_DATE are DATE values and END_TIME is a TIME value, here is a solution:
select cast( ( avg( case
                      when start_date <> end_date
                      then extract(second from end_time)
                           + extract(minute from end_time) * 60
                           + extract(hour from end_time) * 3600
                           + 86400
                      else extract(second from end_time)
                           + extract(minute from end_time) * 60
                           + extract(hour from end_time) * 3600
                    end) mod 86400) as decimal(10,4))
       * INTERVAL '00:00:01.00' HOUR TO SECOND as avg_time
from your_table
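The CASE expression "adds" one day (86,400 seconds) when the job ends on a later date than it started, as you suggested doing in Excel. The average number of seconds since midnight is computed as an intermediate result, wrapped back into a single day with MOD 86400, and then converted into a time value.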
To be fair, I received help from the Teradata Forum formatting the result, but I like this so much I'll be using it myself.
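To check the arithmetic outside the database, here is a minimal Python sketch of the same idea (Python is used only for illustration, it is not part of the Teradata solution). The rows are the sample values from the question; the names `ROWS`, `seconds_since_midnight` and `average_end_time` are made up for this example, and the average is truncated to whole seconds.

```python
from datetime import date, time

SECONDS_PER_DAY = 86400

# Sample rows from the question: (start_date, end_date, end_time).
ROWS = [
    (date(2012, 10, 10), date(2012, 10, 11), time(1, 41, 49)),
    (date(2012, 10, 9), date(2012, 10, 10), time(0, 19, 56)),
    (date(2012, 10, 8), date(2012, 10, 8), time(23, 37, 18)),
    (date(2012, 10, 5), date(2012, 10, 5), time(23, 39, 47)),
    (date(2012, 10, 4), date(2012, 10, 4), time(23, 42, 47)),
    (date(2012, 10, 3), date(2012, 10, 3), time(23, 41, 54)),
]


def seconds_since_midnight(t):
    """Convert a time of day to whole seconds since midnight."""
    return t.hour * 3600 + t.minute * 60 + t.second


def average_end_time(rows, adjust_for_next_day=True):
    """Return the average end time as an HH:MM:SS string."""
    total = 0
    for start_date, end_date, end_time in rows:
        secs = seconds_since_midnight(end_time)
        # Mirror the CASE expression: add one day when the job
        # ended on a different date than it started.
        if adjust_for_next_day and start_date != end_date:
            secs += SECONDS_PER_DAY
        total += secs
    # Truncate to whole seconds, then wrap back into a single day.
    avg = (total // len(rows)) % SECONDS_PER_DAY
    hours, rem = divmod(avg, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


print(average_end_time(ROWS))                             # with the one-day adjustment
print(average_end_time(ROWS, adjust_for_next_day=False))  # naive average
```

With the adjustment the sample data averages to 00:07:15; without it the result is 16:07:15, which is the wrong value reported in the question.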

This seems to do the trick, but I'd be interested in seeing if there is another way.
SELECT job_name,
       case when avg_end_time_in_minutes > 60*24 then avg_end_time_in_minutes - 60*24
            else avg_end_time_in_minutes end as avg_adjusted,
       case when max_end_time_in_minutes > 60*24 then max_end_time_in_minutes - 60*24
            else max_end_time_in_minutes end as max_adjusted,
       CAST((CAST(avg_adjusted / 60 AS INTEGER) (FORMAT '9(2)')) AS CHAR(2))||':'||
       CAST((CAST((avg_adjusted / 60 MOD 1)*60 AS INTEGER) (FORMAT '9(2)')) AS CHAR(2))
           avg_adjusted_time,
       CAST((CAST(max_adjusted / 60 AS INTEGER) (FORMAT '9(2)')) AS CHAR(2))||':'||
       CAST((CAST((max_adjusted / 60 MOD 1)*60 AS INTEGER) (FORMAT '9(2)')) AS CHAR(2))
           max_adjusted_time
FROM (
    SELECT job_name,
           AVG(end_time_in_minutes) avg_end_time_in_minutes,
           MAX(CAST(end_time_in_minutes AS DECIMAL(8,2))) max_end_time_in_minutes
    FROM (
        SELECT job_name,
               CAST(substr(end_time, 1, 2) AS INTEGER)*60
               + CAST(substr(end_time, 4, 2) AS INTEGER)
               + cast(end_date - start_date as integer)*60*24 AS end_time_in_minutes
        FROM dabank_prod_ops_tb.bdw_tables_load_tracker_view a
        WHERE a.status = 'COMPLETED'
          AND a.start_date BETWEEN CURRENT_DATE - 31 AND CURRENT_DATE -1
          AND a.end_time IS NOT NULL
    ) a
    GROUP BY 1
) b

First, figure out the number of seconds that the end time is from midnight on the start date. We can then use that to calculate the average number of seconds taken, and then add that to midnight to find the average end time.
select
    avg(extract(second from end_time) + 60 *
        (extract(minute from end_time) + 60 *
         (extract(hour from end_time) + 24 *
          (end_date - start_date)))) as avg_duration_in_seconds,
    cast(avg_duration_in_seconds / 60 / 60 as integer) as avg_hours,
    mod(cast(avg_duration_in_seconds / 60 as integer), 60) as avg_minutes,
    mod(cast(avg_duration_in_seconds as integer), 60) as avg_seconds,
    cast('00:00:00' as time) +
        cast(avg_hours as interval hour) +
        cast(avg_minutes as interval minute) +
        cast(avg_seconds as interval second) as avg_end_time
from my_table
Be aware though that if the average ends up over 24 hours, avg_end_time will be something like 00:01:15 rather than 24:01:15.
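The wrap-around caveat is easier to see when the offset from midnight is kept as a duration instead of a time of day. Below is a rough Python sketch (for illustration only, not Teradata code); `average_end_offset` and `format_offset` are hypothetical helper names, and each run is assumed to be a pair of start date and end timestamp.

```python
from datetime import date, datetime, time, timedelta


def average_end_offset(runs):
    """Average offset of each end timestamp from midnight of its start date.

    runs: list of (start_date, end_datetime) pairs.
    """
    offsets = [
        end - datetime.combine(start, time.min)  # time.min is 00:00:00
        for start, end in runs
    ]
    return sum(offsets, timedelta()) / len(offsets)


def format_offset(offset):
    """Format a timedelta as H:MM:SS without wrapping at 24 hours."""
    total = int(offset.total_seconds())
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Two of the runs from the question, one ending after midnight.
runs = [
    (date(2012, 10, 10), datetime(2012, 10, 11, 1, 41, 49)),
    (date(2012, 10, 8), datetime(2012, 10, 8, 23, 37, 18)),
]
avg = average_end_offset(runs)
print(format_offset(avg))                         # offset kept past 24 hours
print((datetime.combine(date.min, time.min) + avg).time())  # wrapped time of day
```

For these two runs the average offset is a little over 24 hours, so the unwrapped form shows 24:39:33 while the wrapped time of day starts again at 00:39:33.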

Related

Trying to get HH:MM:SS from milliseconds in Presto

I'm trying to convert milliseconds to format HH:MM:SS or MM:SS, but I keep getting the same error.
Here's the error:
java.sql.SQLException: [Simba][AthenaJDBC](100071) An error has been thrown from the AWS Athena client. SYNTAX_ERROR: line 5:19: Unexpected parameters (time, varchar(5)) for function date_format. Expected: date_format(timestamp with time zone, varchar(x)) , date_format(timestamp, varchar(x)) [Execution ID: 89bfd858-9992-439f-ad84-b59bfd1cbde8]
Here's my code:
SELECT
column_a,
round(AVG((milliseconds) / 1000)) AS Seconds,
(case when milliseconds/1000 < 60 * 60
then time '00:00:00' + milliseconds * interval '1' second, '%i:%s'
else time '00:00:00' + milliseconds * interval '1' second, '%H:%i:%s'
end) as hhmmss,
round((AVG((column_b)) / 1099511627776),2) AS b,
COUNT(column_c) AS c
FROM
table
GROUP BY
column_a
Tried with this one as well
(case when milliseconds/1000 < 60 * 60
then date_format(time '00:00:00' + milliseconds * interval '1' second, '%i:%s')
else date_format(time '00:00:00' + milliseconds * interval '1' second, '%H:%i:%s')
end) as hhmmss
Any help, please?
You can just cast your time to timestamp:
select date_format(cast(time '00:00:00' + 23 * interval '1' second as timestamp), '%H:%i:%s')
Output:
_col0
00:00:23
Note that this works only if the interval represented by your milliseconds is less than 24 hours; otherwise you will need to do the math yourself and concatenate the results into the desired string.
P.S. Shouldn't milliseconds * interval '1' second be (milliseconds/1000) * interval '1' second?
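For the "do the math yourself" case, the arithmetic can be sketched as follows in Python (illustration only, not Presto syntax). The function name `format_millis` is made up, and the choice to drop the hours part when it is zero follows the MM:SS / HH:MM:SS split in the question.

```python
def format_millis(ms):
    """Format a millisecond count as MM:SS, or HH:MM:SS once it reaches an hour.

    Hours are not wrapped at 24, so long durations stay readable.
    """
    total_seconds = ms // 1000  # milliseconds -> whole seconds
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    if hours == 0:
        return f"{minutes:02d}:{seconds:02d}"
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


print(format_millis(23_000))       # under an hour
print(format_millis(3_723_000))    # 1 hour, 2 minutes, 3 seconds
print(format_millis(90_000_000))   # 25 hours
```

The same integer division and modulo steps can be written in SQL and the parts concatenated into a string.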

How to extract the hour of day from an epoch and count each instance that occurs during that hour?

I have a question that I feel is pretty straightforward but is giving me some issues.
I have a column in table X called event_time which is an epoch. I want to extract the hour of day from it and count the number of rides that occurred during each hour.
So the output will end up being a bar chart with x values 0-23 and the y values being the number of instances that occur (bike rides, for example).
Here is what I have now, that isn't giving me the correct output:
select extract(hour from to_timestamp(start_time)::date) as hr,
count(*) as ct
from x
group by hr
order by hr asc
Any hints or help are appreciated.
Thanks
You can use arithmetic:
select floor( (start_time % (24 * 60 * 60)) / (60 * 60) ) as hour,
count(*)
from x
group by hour;
Or convert to a date/time and extract the hour:
select extract(hour from '1970-01-01'::date + start_time * interval '1 second') as hour, count(*)
from x
group by hour;
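The arithmetic version can be checked with a few lines of Python (illustration only). This sketch assumes the epoch values are whole seconds in UTC; `rides_per_hour` is a hypothetical name.

```python
from collections import Counter

SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_HOUR = 60 * 60


def rides_per_hour(epochs):
    """Count events per hour of day (0-23) from epoch seconds, assuming UTC."""
    counts = Counter(
        (ts % SECONDS_PER_DAY) // SECONDS_PER_HOUR  # same formula as the SQL
        for ts in epochs
    )
    # Return a dense list so hours with no rides show up as 0 in a bar chart.
    return [counts.get(hour, 0) for hour in range(24)]


sample = [0, 3599, 3600, 86400 + 7200]
print(rides_per_hour(sample)[:3])
```

If the hours should be in a local time zone rather than UTC, the offset has to be applied before taking the modulo.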

Postgres: Select a timeinterval that spans past midnight

I have the following table:
id | time
----+-------------
1 | 21:00:00+01
2 | 22:00:00+01
3 | 23:00:00+01
Column id is of type integer and time is of type time with time zone. I want to select all rows that fall within a specified interval, e.g.,
select *
from times
where time >= time '22:30' - interval '60 minutes' and time <= time '22:30' + interval '60 minutes';
However, if the interval extends past midnight, i.e., when I select 23:30 as the time argument, then I get an empty result set.
Is there a way to tell Postgres to handle an interval that spans past midnight?
You can use this logic:
select *
from times t cross join
(values ('22:30'::time - interval '60 minutes', '22:30'::time + interval '60 minutes')
) v(fromt, tot)
where (fromt <= tot and time >= fromt and time <= tot) or
(fromt > tot and (time >= fromt or time <= tot))
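The two branches of that WHERE clause amount to a wrap-around range check. Here is a small Python sketch of the same logic (illustration only; `in_window` is a made-up name and time zones are ignored).

```python
from datetime import time


def in_window(t, start, end):
    """Return True if time-of-day t lies in [start, end], allowing wrap past midnight."""
    if start <= end:
        # Ordinary window, e.g. 21:30 to 23:30.
        return start <= t <= end
    # Window wraps past midnight, e.g. 22:30 to 00:30:
    # match anything late in the evening OR early in the morning.
    return t >= start or t <= end


print(in_window(time(23, 0), time(22, 30), time(0, 30)))
print(in_window(time(21, 0), time(22, 30), time(0, 30)))
```

When the start of the window is later than its end, the window is treated as two pieces joined by OR instead of one range joined by AND.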

Where date between a and b performance

I have come across a query that has me curious whether the programmer was showboating or whether there is merit, in terms of performance, to the way it has been done. I have no clue why the from time is 01:59 rather than 00:00; this would exclude some results that I would actually want to include.
This is the where clause of the query
WHERE REPORTDATE BETWEEN TRUNC(SYSDATE - 21) + 01 / 24 + 59 / (24 * 60) + 59 / (24 * 60 * 60)
AND TRUNC(SYSDATE) + 23 / 24 + 59 / (24 * 60) + 59 / (24 * 60 * 60)
and if my math is correct, is the same as
WHERE REPORTDATE BETWEEN to_date('13/04/2017 01:59','dd/mm/yyyy hh24:mi')
AND to_date('04/05/2017 23:59','dd/mm/yyyy hh24:mi')
Is there any benefit in the first calculated where clause over the second?
You can use interval literals to get rid of all the arithmetic and simplify the query:
WHERE REPORTDATE BETWEEN TRUNC( SYSDATE ) - INTERVAL '20 22:00:01' DAY TO SECOND
AND TRUNC( SYSDATE ) + INTERVAL '00 23:59:59' DAY TO SECOND
or
WHERE REPORTDATE BETWEEN TRUNC( SYSDATE ) - INTERVAL '21' DAY
+ INTERVAL '01:59:59' HOUR TO SECOND
AND TRUNC( SYSDATE ) + INTERVAL '00 23:59:59' DAY TO SECOND
or
WHERE REPORTDATE >= TRUNC( SYSDATE ) - INTERVAL '20 22:00:01' DAY TO SECOND
AND REPORTDATE < TRUNC( SYSDATE ) + INTERVAL '1' DAY
It is hard to imagine a performance difference, based on different ways of calculating constants in a query.
I would write this using something like this:
WHERE REPORTDATE >= CAST(TIMESTAMP '2017-04-13 02:00:00' as DATE) and
REPORTDATE < DATE '2017-05-05'
If you are going to include date/time constants, use the built-in mechanisms that support standard formats.
Or, for more flexibility based on the current date:
WHERE REPORTDATE >= TRUNC(sysdate) - 21 + 2 / 24 AND
REPORTDATE < TRUNC(sysdate) + 1
(or, if 1:59 is really intended . . . then TRUNC(sysdate) - 21 + (1 * 60 + 59) / (24 * 60).)
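The day-fraction arithmetic in the original WHERE clause can be verified with exact fractions. The Python sketch below is only a check of the numbers, not Oracle code; `day_fraction_to_timedelta` and `LOWER_OFFSET` are names invented for this example.

```python
from datetime import timedelta
from fractions import Fraction

SECONDS_PER_DAY = 24 * 60 * 60


def day_fraction_to_timedelta(frac):
    """Convert a fraction of a day (Oracle-style date arithmetic) to a timedelta."""
    return timedelta(seconds=int(frac * SECONDS_PER_DAY))


# 01/24 + 59/(24*60) + 59/(24*60*60) from the lower bound of the BETWEEN.
LOWER_OFFSET = Fraction(1, 24) + Fraction(59, 24 * 60) + Fraction(59, 24 * 60 * 60)

print(day_fraction_to_timedelta(LOWER_OFFSET))

# The interval-literal rewrite subtracts 20 days 22:00:01 from today's midnight,
# which is the same instant as midnight 21 days ago plus the offset above.
print(timedelta(days=21) - timedelta(days=20, hours=22, seconds=1))
```

Both expressions come out to 1:59:59, so the calculated lower bound is 01:59:59 on the day 21 days ago.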

SQL AVERAGE TIME

I have the following query in MSSQL:
select
TRANSACTION_TYPE_ID
,COUNT(TRANSACTION_TYPE_ID)AS NUMBER_OF_TRANSACTIONS
,CAST(SUM(AMOUNT)AS DECIMAL (30,2)) AS TOTAL
FROM
[ONLINE_TRANSACTION]
WHERE CONVERT(CHAR(8), CREATED_ON, 114) >='17:30' AND AMOUNT IS NOT NULL AND
TRANSACTION_TYPE_ID !='CHEQUE-STOP-TRANS-TYPE'
GROUP BY TRANSACTION_TYPE_ID
ORDER BY TRANSACTION_TYPE_ID
I want to show the transaction type (TRANSACTION_TYPE_ID) and the total amount for each type, as above, but also the average time of day at which these transactions occurred, based on CREATED_ON, which is a datetime. I have not yet found a good way of doing this.
Based on Randolph Potter's answer, you can find the average time like:
avg(DATEPART(hh,created_on)*60 + DATEPART(mi,created_on)) / 60 as AvgHour,
avg(DATEPART(hh,created_on)*60 + DATEPART(mi,created_on)) % 60 as AvgMinute
One way would be to convert the time to seconds, calculate the average, and then convert it back to hours, minutes and seconds for the result.
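As a rough illustration of that approach, here is a Python sketch (not T-SQL). The name `average_time_of_day` is hypothetical, the date part of each value is ignored, and the average is truncated to whole seconds.

```python
from datetime import datetime


def average_time_of_day(stamps):
    """Average the time-of-day component of a list of datetimes as HH:MM:SS."""
    # 1. Convert each time of day to seconds since midnight.
    secs = [s.hour * 3600 + s.minute * 60 + s.second for s in stamps]
    # 2. Average the seconds (truncated to a whole number).
    avg = sum(secs) // len(secs)
    # 3. Convert back to hours, minutes and seconds.
    hours, rem = divmod(avg, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


created_on = [
    datetime(2012, 1, 4, 17, 30, 0),
    datetime(2012, 1, 5, 18, 30, 0),
]
print(average_time_of_day(created_on))
```

Hours come from dividing by 3600 and minutes from the remainder divided by 60, which is the same split the SQL versions perform with `/` and `%`.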
If you're talking about the time of day and not looking to get a specific date, i.e. 5:32 instead of Jan 4, 2012 5:32, the below could help. Sorry about the caps, it's the way I'm used to writing SQL.
CONVERT(VARCHAR,AVG(DATEPART(HH,CREATED_ON)*60 + DATEPART(MI,CREATED_ON)) / 60) + ':' +
CASE WHEN CONVERT(VARCHAR,AVG(DATEPART(HH,CREATED_ON)*60 + DATEPART(MI,CREATED_ON)) % 60) < 10
THEN '0'+CONVERT(VARCHAR,AVG(DATEPART(HH,CREATED_ON)*60 + DATEPART(MI,CREATED_ON)) % 60)
ELSE CONVERT(VARCHAR,AVG(DATEPART(HH,CREATED_ON)*60 + DATEPART(MI,CREATED_ON)) % 60)
END AS AVG_CREATED_ON
SELECT FROM_UNIXTIME( ( ROUND((UNIX_TIMESTAMP( floor(timestamp_column)) / 60 ),0) * 60 ) ) rounded_time
FROM mysql_table
WHERE timestamp_column BETWEEN STR_TO_DATE('31/07/2012','%d/%m/%Y')
AND STR_TO_DATE('01/08/2012','%d/%m/%Y')