BigQuery how to calculate average DATETIME_DIFF

I can't figure out how to get the average travel time between started_at and ended_at using DATETIME_DIFF. The column member_casual has two possible values, and I want the average travel time per member group, i.e. two rows, each holding the average travel time for one group. I've tried searching but I've failed to translate the solutions to my issue.
SELECT
member_casual,
DATETIME_DIFF(started_at, ended_at, MINUTE) +
CASE WHEN ended_at < started_at THEN 24 ELSE 0 end
FROM dataset
GROUP BY
member_casual,
started_at,
ended_at
LIMIT 100
I've tried adding AVG(...) in several places but I guess I just don't know enough about SQL yet to figure this out.
CASE is used to fix the error that happens when the travel period passes midnight.

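Basically, you only need to wrap an AVG around the DATETIME_DIFF expression and group by member_casual alone; the query then returns the average for every member_casual value.
Also, a LIMIT without an ORDER BY makes no sense, because the order of the returned rows is not guaranteed.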
SELECT
member_casual,
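-- DATETIME_DIFF(a, b, part) returns a - b, so ended_at goes first;
-- the midnight correction has to be in minutes as well (24 * 60)
AVG(DATETIME_DIFF(ended_at, started_at, MINUTE) +
CASE WHEN ended_at < started_at THEN 24 * 60 ELSE 0 END) AS avg_date_diff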
FROM dataset
GROUP BY
member_casual
ORDER BY member_casual
LIMIT 100
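The aggregation can be sanity-checked outside BigQuery. Here is a minimal Python sketch, assuming a handful of made-up rides (the sample rows and the helper name average_minutes_by_group are hypothetical, not from the question). It subtracts the start from the end and averages per member_casual value:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows: (member_casual, started_at, ended_at).
rides = [
    ("member", datetime(2021, 5, 1, 8, 0), datetime(2021, 5, 1, 8, 10)),
    ("member", datetime(2021, 5, 1, 23, 50), datetime(2021, 5, 2, 0, 10)),
    ("casual", datetime(2021, 5, 1, 9, 0), datetime(2021, 5, 1, 9, 45)),
]


def average_minutes_by_group(rows):
    """Return {group: mean trip length in minutes}, like AVG(...) GROUP BY group."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum of minutes, row count]
    for group, started_at, ended_at in rows:
        minutes = (ended_at - started_at).total_seconds() / 60
        totals[group][0] += minutes
        totals[group][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


averages = average_minutes_by_group(rides)
```

Because the sample values carry a date as well as a time, the ride that crosses midnight needs no extra correction here.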

Related

calculating average with grouping based on time intervals

In a Postgres table I have stored the speed of an object at 10 second intervals. The values are not available for every 10 seconds during the day, so it could be that there is no row for today 16:39:40.
What would the query look like to get a relation containing the average speed for 1 minute (or 30 sec or n sec) intervals for a given day, assuming the non-existing rows mean a speed of 0?
speed_table
id (int, pk)
ts (timestamp)
speed (numeric)
I've built this query but am getting stuck on some important parts:
SELECT
date_trunc('minute', ts) AS truncated,
avg(speed)
FROM speed_table AS t
WHERE ts >= '2014-06-21 00:00:00'
AND ts <= '2014-06-21 23:59:59'
AND condition2 = 'something'
GROUP BY date_trunc('minute', ts)
ORDER BY truncated
How can I change the interval to something other than what date_trunc offers, e.g. 5 minutes or 30 seconds?
How can I add the not available rows for the remaining of the day?

Simple and fast solution for this particular example:
SELECT date_trunc('minute', ts) AS minute
, sum(speed)/6 AS avg_speed
FROM speed_table AS t
WHERE ts >= '2014-06-21 0:0'
AND ts < '2014-06-22 0:0' -- exclude dangling corner case
AND condition2 = 'something'
GROUP BY 1
ORDER BY 1;
You need to factor in missing rows as "0 speed". Since a minute has 6 samples, just sum and divide by 6. Missing rows evaluate to 0 implicitly.
This returns no row for minutes with no rows at all. avg_speed for missing result rows is 0.
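To see why dividing by 6 differs from a plain avg(), here is a small Python sketch with hypothetical readings for a single minute in which only three of the six 10 second slots have a row:

```python
# Hypothetical readings for one minute: only 3 of the 6 ten-second slots have a row.
readings = [30.0, 60.0, 30.0]
SLOTS_PER_MINUTE = 6  # 60 s / 10 s sampling interval

# What avg(speed) computes: the mean over the rows that exist.
avg_of_present_rows = sum(readings) / len(readings)

# What sum(speed)/6 computes: the missing slots count as speed 0.
avg_with_gaps_as_zero = sum(readings) / SLOTS_PER_MINUTE
```

The first value is 40.0 and the second is 20.0, so a plain average would overstate the speed for minutes with gaps.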
General query for arbitrary intervals
Works for any interval listed in the manual for date_trunc():
SELECT date_trunc('minute', g.ts) AS ts_start
, avg(COALESCE(speed, 0)) AS avg_speed
FROM (SELECT generate_series('2014-06-21 0:0'::timestamp
, '2014-06-22 0:0'::timestamp
, '10 sec'::interval) AS ts) g
LEFT JOIN speed_table t USING (ts)
WHERE (t.condition2 = 'something' OR
t.condition2 IS NULL) -- depends on actual condition!
AND g.ts <> '2014-06-22 0:0'::timestamp -- exclude dangling corner case
GROUP BY 1
ORDER BY 1;
The problematic part is the additional unknown condition. You would need to define that. And decide whether missing rows supplied by generate_series should pass the test or not (which can be tricky!).
I let them pass in my example (and all other rows with a NULL values).
Compare:
PostgreSQL: running count of rows for a query 'by minute'
Arbitrary intervals:
Truncate timestamp to arbitrary intervals
For completely arbitrary intervals consider @Clodoaldo's math based on epoch values or use the often overlooked function width_bucket(). Example:
Aggregating (x,y) coordinate point clouds in PostgreSQL

Since you haven't posted any sample data, I couldn't test this, so it may contain errors. If it does, point them out, including the error message, so I can fix them.
select
to_timestamp(
(extract(epoch from ts)::integer / (60 * 2)) * (60 * 2)
) as truncated,
avg(coalesce(speed, 0)) as avg_speed
from
generate_series (
'2014-06-21 00:00:00'::timestamp,
'2014-06-22'::timestamp - interval '1 second',
'10 seconds'
) ts (ts)
left join
speed_table t on ts.ts = t.ts and condition2 = 'something'
group by 1
order by 1
The example groups by 2 minutes: the number of seconds since 1970-01-01 00:00:00 (the epoch) is integer-divided by 120 and then multiplied back by 120. To group by 5 minutes, use 300 (60 * 5) instead.
The generate_series in the example generates timestamps at a 10 second interval. It is left outer joined to the speed table so it fills the gaps. When the speed is null, coalesce returns 0.
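The epoch arithmetic is easy to check in isolation. Below is a minimal Python sketch of the same idea (the helper name truncate_to_bucket is made up for illustration, and timestamps are assumed to be UTC):

```python
from datetime import datetime, timezone


def truncate_to_bucket(ts, bucket_seconds):
    """Round ts down to the start of its bucket, counted from the Unix epoch."""
    epoch = int(ts.timestamp())
    bucket_start = epoch // bucket_seconds * bucket_seconds  # integer division, then scale back
    return datetime.fromtimestamp(bucket_start, tz=timezone.utc)


ts = datetime(2014, 6, 21, 16, 39, 40, tzinfo=timezone.utc)
two_minute_start = truncate_to_bucket(ts, 60 * 2)   # 16:38:00
five_minute_start = truncate_to_bucket(ts, 60 * 5)  # 16:35:00
```

Any bucket width in seconds works the same way; only the divisor changes.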

How to get a SUM of a DATEDIFF but provide cut-off at 24 hours IF a single day is specified

This is actually my first question on stackoverflow, so I sincerely apologize if I am confusing or unclear.
That being said, here is my issue:
I work at a car manufacturing company and we have recently implemented the ability to track when our machines are idle. This is done by assessing the start and end time of the event called "idle_start."
Right now, I am trying to get the SUM of how long a machine is idle. Now, I figured this out BUT, some of the idle_times are LONGER than 24 hours.
So, when I specify that I only want to see the idle_time sums of ONE particular day, the sum is also counting the idle time past 24 hours.
I want to provide the option of CUTTING OFF at that 24 hours. Is this possible?
Here is the query:
SELECT r.`name` 'Producer'
, m.`name` 'Manufacturer'
-- , timediff(re.time_end, re.time_start) 'Idle Time Length'
, SEC_TO_TIME(SUM((TIME_TO_SEC(TIMEDIFF(re.time_end, re.time_start))))) 'Total Time'
, (SUM((TIME_TO_SEC(TIMEDIFF(re.time_end, re.time_start)))))/3600 'Total Time in Hours'
, (((SUM((TIME_TO_SEC(TIMEDIFF(re.time_end, re.time_start)))))/3600))/((IF(r.resource_status_id = 2, COUNT(r.resource_id), NULL))*24) 'Percent Machine is Idle divided by Machine Hours'
FROM resource_event re
JOIN resource_event_type ret
ON re.resource_event_type_id = ret.resource_event_type_id
JOIN resource_event_type reep
ON ret.parent_resource_event_type_id = reep.resource_event_type_id
JOIN resource r
ON r.`resource_id` = re.`resource_id`
JOIN manufacturer m
ON m.`manufacturer_id` = r.`manufacturer_id`
WHERE re.`resource_event_type_id` = 19
AND ret.`parent_resource_event_type_id` = 3
AND DATE_FORMAT(re.time_start, '%Y-%m-%d') >= '2013-08-12'
AND DATE_FORMAT(re.time_start, '%Y-%m-%d') <= '2013-08-18'
-- AND re.`resource_id` = 8
AND "Idle Time Length" IS NOT NULL
AND r.manufacturer_id = 13
AND r.resource_status_id = 2
GROUP BY 1, 2
Feel free to ignore the dash marks up top. And please tell me if I can be more specific as to figure this out easier and provide less headaches for those willing to help me out.
Thank you so much!

You'll want a conditional SUM, using CASE.
Not sure of syntax for your db exactly, but something like:
, SUM (CASE WHEN TIME_TO_SEC(TIMEDIFF(re.time_end, re.time_start))/3600 > 24 THEN 0
ELSE TIME_TO_SEC(TIMEDIFF(re.time_end, re.time_start))/3600
END)'Total Time in Hours'

This is not an attempt to answer your question. It's being presented as an answer rather than a comment for better formatting and readability.
You have this
AND DATE_FORMAT(re.time_start, '%Y-%m-%d') >= '2013-08-12'
AND DATE_FORMAT(re.time_start, '%Y-%m-%d') <= '2013-08-18'
in your where clause. Applying functions to a column like this makes your query take longer to execute, especially on indexed fields, because the index can no longer be used for the comparison. Something like this would run quicker.
AND re.time_start >= a date value goes here
AND re.time_start <= another date value goes here

Do you want to cut off when start/end are before/after your time range?
You can use a case to adjust it based on your timeframe, e.g. for time_start
case
when re.time_start < timestamp '2013-08-12 00:00:00'
then timestamp '2013-08-12 00:00:00'
else re.time_start
end
similar for time_end and then use those CASEs within your TIMEDIFF.
Btw, your where-condition for a given date range should be:
where time_start < timestamp '2013-08-19 00:00:00'
and time_end >= timestamp '2013-08-12 00:00:00'
This will return all idle times between 2013-08-12 and 2013-08-18
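The clamping logic can be sketched in Python to check the arithmetic. This is only an illustration under assumed values: the helper name clamped_seconds and the sample idle event are hypothetical, and the window uses an exclusive upper bound like the where-condition above.

```python
from datetime import datetime


def clamped_seconds(start, end, window_start, window_end):
    """Seconds of [start, end) that fall inside [window_start, window_end)."""
    clamped_start = max(start, window_start)
    clamped_end = min(end, window_end)
    # An event entirely outside the window would give a negative length.
    return max((clamped_end - clamped_start).total_seconds(), 0.0)


window_start = datetime(2013, 8, 12)
window_end = datetime(2013, 8, 13)  # exclusive upper bound: a single day

# Hypothetical idle event lasting 30 hours, starting the evening before.
idle_seconds = clamped_seconds(
    datetime(2013, 8, 11, 20, 0), datetime(2013, 8, 13, 2, 0),
    window_start, window_end,
)
```

Only the 24 hours inside the window are counted, even though the event itself lasts 30 hours.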

PostgreSQL "nested"? distincts and count

I need to get the count of the distinct names per hour in one query in PostgreSQL 9.1
The relevant columns(generalized for question) in my table are:
occurred timestamp with time zone and
name character varying(250)
And the table name for the sake of the question is just table
The occurred timestamps will all be within a midnight to midnight(exclusive) range for one day. So far my query looks like:
'SELECT COUNT(DISTINCT ON (name)) FROM table'
It would be nice if I could get the output formatted as a list of 24 integers(one for each hour of the day), the names aren't required to be returned.

If I understand correctly what you want, you can write:
SELECT EXTRACT(HOUR FROM occurred),
COUNT(DISTINCT name)
FROM ...
WHERE ...
GROUP
BY EXTRACT(HOUR FROM occurred)
ORDER
BY EXTRACT(HOUR FROM occurred)
;

SELECT date_trunc('hour', occurred) AS hour_slice
,count(DISTINCT name) AS name_ct
FROM mytable
GROUP BY 1
ORDER BY 1;
DISTINCT ON is a different feature.
date_trunc() gives you one count for every distinct hour, while EXTRACT counts per hour-of-day across longer periods of time. The two results do not add up, because the sum of multiple count(DISTINCT x) is equal to or greater than one count(DISTINCT x) over the whole set.
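A tiny Python sketch with made-up events shows why distinct counts per bucket cannot simply be added up: a name that appears in two hours is counted once in each hour, but only once overall.

```python
from datetime import datetime

# Hypothetical (occurred, name) rows; "alice" shows up in two different hours.
events = [
    (datetime(2012, 1, 1, 9, 5), "alice"),
    (datetime(2012, 1, 1, 9, 30), "bob"),
    (datetime(2012, 1, 1, 10, 0), "alice"),
]

names_by_hour = {}
for occurred, name in events:
    names_by_hour.setdefault(occurred.hour, set()).add(name)

# Equivalent of count(DISTINCT name) ... GROUP BY hour.
per_hour_counts = {hour: len(names) for hour, names in names_by_hour.items()}

# Equivalent of one count(DISTINCT name) over all rows.
overall_distinct = len({name for _, name in events})
```

Here the per-hour counts sum to 3 while the overall distinct count is 2.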

You want this by hour:
select extract(hour from occurred) as hr, count(distinct name)
from table t
group by extract(hour from occurred)
order by 1
This assumes there is data for only one day. Otherwise, hours from different days would be combined. To get around this, you would need to include date information as well.

How to get number of hits by time regardless of Date?

I am working on a sql view that should get the average number of hits by hour of the day, regardless of what day/date it is for traffic monitoring (12:00:00.000 - 12:59:59.999). Any ideas?
EDIT
Now I have the total, how do I get the average? SELECT AVG("FUNCTION BELOW") DOES NOT WORK
SELECT COUNT(*) AS total, DATEPART(hh, LogDate) AS HourOfDay
FROM dbo.Log
GROUP BY DATEPART(hh, LogDate)

Use DATEPART(hh, ...) to extract the hour.
Example: SELECT DATEPART(hh,GETDATE())
Since you are on SQL Server 2008, you can also use the time data type; just convert to time.
Example:
SELECT CONVERT(TIME,GETDATE())
You can then filter on that as well.
I'm not sure what your output is supposed to look like, so I'm showing you both, but if all you need is to group by hour, DATEPART(hh, ...) is enough.

The query below may be good enough for you. It divides the count for each hour by the number of days between the minimum date in the LogDate column and today's date.
SELECT DATEPART(hh,LogDate) as Hour
,CAST(COUNT(*)as decimal)/DATEDIFF(d,(SELECT MIN(LogDate) from log)
,CURRENT_TIMESTAMP) as AverageHits
, COUNT(*) as Count
FROM log
GROUP BY DATEPART(hh,LogDate)
ORDER by DATEPART(hh,LogDate) asc
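The "count divided by number of days" idea can be checked with a short Python sketch. Note the assumptions: the log timestamps are made up, and the divisor here is the number of days that actually have rows, which is a slightly different choice from the DATEDIFF between the oldest row and today used in the query above.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log timestamps spanning two days.
log_dates = [
    datetime(2010, 3, 1, 12, 5),
    datetime(2010, 3, 1, 12, 40),
    datetime(2010, 3, 2, 12, 15),
    datetime(2010, 3, 2, 13, 0),
]

hits_per_hour = Counter(d.hour for d in log_dates)  # total hits per hour of day
day_count = len({d.date() for d in log_dates})      # days with at least one row
average_hits = {hour: hits / day_count for hour, hits in hits_per_hour.items()}
```

With this sample, hour 12 averages 1.5 hits per day and hour 13 averages 0.5.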

How to filter table to date when it has a timestamp with time zone format?

I have a very large dataset, with records in the hundreds of millions or billions.
I would like to filter the data in this column (I am only showing 2 records out of millions):
arrival_time
2019-04-22 07:36:09.870+00
2019-06-07 09:46:09.870+00
How can I filter this column on the date part only? For example, filtering where arrival_time is 2019-04-22 should give me the first record and any other records with the matching date 2019-04-22.
I have tried casting the column with timestamp::date = "2019-04-22", but this is costly and does not work well given the vast number of records.
Sample code:
select
*
from
mytable
where
arrival_time::timestamp::date = '2019-09-30'
Again, this is very costly, because the cast to date is applied to every row before the filtering!
Any ideas? I am using PostgreSQL and pgAdmin 4.

This query:
where (arrival_time::timestamp)::date = '2019-09-30'
is converting arrival_time to another type. That generally precludes the use of an index and makes it harder for the optimizer to choose the best execution path.
Instead, compare to same data type:
where arrival_time >= '2019-09-30'::timestamp and
arrival_time < ('2019-09-30'::timestamp + interval '1 day')

You can try to filter for the upper and lower bounds of that day.
...
WHERE arrival_time >= '2019-04-22'::timestamp
AND arrival_time < '2019-04-23'::timestamp
...
That way, an index on arrival_time should be usable and help to improve performance.
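The half-open range (lower bound inclusive, upper bound exclusive) can be sketched in Python. This is an illustration only: the helper name day_bounds is made up, and the bounds are assumed to be in UTC, matching the +00 offset of the sample values.

```python
from datetime import date, datetime, time, timedelta, timezone


def day_bounds(day):
    """Return (lower, upper) so that lower <= ts < upper selects exactly that UTC day."""
    lower = datetime.combine(day, time.min, tzinfo=timezone.utc)
    return lower, lower + timedelta(days=1)


# The two sample values from the question.
arrivals = [
    datetime(2019, 4, 22, 7, 36, 9, 870000, tzinfo=timezone.utc),
    datetime(2019, 6, 7, 9, 46, 9, 870000, tzinfo=timezone.utc),
]

lower, upper = day_bounds(date(2019, 4, 22))
matches = [a for a in arrivals if lower <= a < upper]
```

The column value is compared as-is, with no per-row conversion, which is what lets the database use an index for the same comparison.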
