Difference between 2 datetimes in UTC in BigQuery

I am returning a set of values in a BigQuery SELECT statement like this
I need to add a computed field, Utilization, to each row using the formula (end_time - start_time) * cores.
The times are in UTC, so I am not sure how to do this. I want to do it in the SELECT statement itself. I am new to BigQuery. Kindly help. Thanks.

You need to use the TIMESTAMP_DIFF function in standard SQL.
In your particular case, the query would be something like:
SELECT TIMESTAMP_DIFF(end_time, start_time, SECOND) * cores AS Utilization
FROM <yourtable>
Keep in mind that you can change the time unit of the result to fit your needs. I used SECOND, but you can use MICROSECOND, MILLISECOND, SECOND, MINUTE, HOUR or DAY.
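For readers who want to sanity-check the arithmetic outside BigQuery, here is a minimal Python sketch of the same formula. The function name `utilization` and the sample timestamps are made up for illustration; only the formula (elapsed seconds times cores) comes from the question.

```python
from datetime import datetime, timezone


def utilization(start_time: datetime, end_time: datetime, cores: int) -> float:
    """Return the elapsed time in seconds multiplied by the number of cores."""
    elapsed_seconds = (end_time - start_time).total_seconds()
    return elapsed_seconds * cores


# Hypothetical sample row: 30 minutes on 4 cores, both timestamps in UTC.
start = datetime(2021, 1, 1, 10, 0, 0, tzinfo=timezone.utc)
end = datetime(2021, 1, 1, 10, 30, 0, tzinfo=timezone.utc)
core_seconds = utilization(start, end, cores=4)
```

Because both timestamps are in UTC, the subtraction needs no time zone conversion, which is also why TIMESTAMP_DIFF works directly on the columns.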

Related

How to set a max range condition with timescale time_bucket_gapfill() in order to not fill real missing values?

I'd like some advice on whether what I need to do is achievable with Timescale functions.
I've just found out I can use time_bucket_gapfill() to fill in missing data, which is amazing! I need data every 5 minutes, but I can receive data every 10 minutes, 30 minutes or 1 hour. So the function helps me fill in the missing points in order to have only 5-minute points. Also, I use locf() to set the gapfilled value to the last value found.
My question is: can I set a maximum range when carrying the last value forward with locf(), so that it never exceeds 1 hour?
Example: if the last value found is more than 1 hour old, I don't want to fill the gaps. I need to leave them empty to show that we have real missing values here.
I think I'm close, but apparently I'm not allowed to use locf() more than once in the same CASE expression.
ERROR: multiple interpolate/locf function calls per resultset column not supported
Does anybody have an idea how I can resolve that?
How to reproduce:
Create table powers
CREATE table powers (
delivery_point_id BIGINT NOT NULL,
at timestamp NOT NULL,
value BIGINT NOT NULL
);
Create hypertable
SELECT create_hypertable('powers', 'at');
Create indexes
CREATE UNIQUE INDEX idx_dpid_at ON powers(delivery_point_id, at);
CREATE INDEX index_at ON powers(at);
Insert data for one day, one delivery point, one point every 10 minutes
INSERT INTO powers SELECT 1, at, round(random()*10000) FROM generate_series(TIMESTAMP '2021-01-01 00:00:00', TIMESTAMP '2021-01-02 00:00:00', INTERVAL '10 minutes') AS at;
Remove three hours of data from 4am to 7am
DELETE FROM powers WHERE delivery_point_id = 1 AND at < '2021-01-01 07:00:00' AND at > '2021-01-01 04:00:00';
The query that needs to be fixed
SELECT
time_bucket_gapfill('5 minutes', at) AS point_five,
avg(value) AS avg,
CASE
WHEN (locf(at) - at) > interval '1 hour' THEN null
ELSE locf(avg(value))
END AS gapfilled
FROM powers
GROUP BY point_five, at
ORDER BY point_five;
Actual: ERROR: multiple interpolate/locf function calls per resultset column not supported
Expected: Gapfilled values every 5 minutes, except between 4am and 7am (real missing values).
This is a great question! I'm going to provide a workaround for how to do this with the current functionality, but I think it'd be great if you'd open a GitHub issue as well, because there might be a way to add an option for this that doesn't require a workaround.
I also think your attempt was a good approach and just requires a few tweaks to get it right!
The error you're seeing occurs because we can't have multiple locf calls in a single column. This limitation is pretty easy to work around, as we can just move both of them into a subquery, but that's not enough. The other thing we need to change is that locf only works on aggregates. Right now, you're trying to use it on a column (at) that isn't aggregated, which isn't going to work, because it wouldn't know which of the values of at in a time_bucket to "pull forward" for the gapfill.
Now, you said you want to fill data as long as the previous point wasn't more than one hour ago. We can take the last value of at in the bucket by using last(at, at); this is also the max(at), so either of those aggregates would work. We put that into a CTE (common table expression, or WITH query) and then do the CASE statement outside, like so:
WITH filled as (SELECT
time_bucket_gapfill('5 minutes', at) AS point_five,
avg(value) AS avg,
locf(last(at, at)) as filled_from,
locf(avg(value)) as filled_avg
FROM powers
WHERE at BETWEEN '2021-01-01 01:30:00' AND '2021-01-01 08:30:00'
AND delivery_point_id = 1
GROUP BY point_five
ORDER BY point_five)
SELECT point_five,
avg,
filled_from,
CASE WHEN point_five - filled_from > '1 hour'::interval THEN NULL
ELSE filled_avg
END as gapfilled
FROM filled;
Note that I’ve tried to name my CTE expressively so that it’s a little easier to read!
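To make the logic of that query easier to follow, here is a rough Python sketch of "carry the last value forward, but only for up to one hour". It is not how TimescaleDB implements gapfill; the function `gapfill_locf`, its parameters and the sample points are all hypothetical. As a simplification, it treats each observed point as sitting exactly on a bucket boundary, whereas the query tracks the real timestamp through filled_from.

```python
from datetime import datetime, timedelta


def gapfill_locf(points, start, end, step, max_age):
    """Fill every bucket in [start, end) from a dict of bucket -> value.

    Empty buckets reuse the last observed value, unless that observation
    is older than max_age, in which case the bucket stays None.
    """
    filled = []
    last_at = None      # bucket of the last real observation
    last_value = None   # value of the last real observation
    bucket = start
    while bucket < end:
        if bucket in points:
            last_at, last_value = bucket, points[bucket]
            filled.append((bucket, last_value))
        elif last_at is not None and bucket - last_at <= max_age:
            filled.append((bucket, last_value))
        else:
            filled.append((bucket, None))  # real missing data
        bucket += step
    return filled


# Hypothetical data: readings at 00:00 and 02:00, 30-minute buckets.
day = datetime(2021, 1, 1)
points = {day: 10, day + timedelta(hours=2): 20}
filled = gapfill_locf(
    points,
    start=day,
    end=day + timedelta(hours=2, minutes=30),
    step=timedelta(minutes=30),
    max_age=timedelta(hours=1),
)
values = [value for _, value in filled]
```

The comparison `bucket - last_at <= max_age` mirrors the CASE expression in the query, which returns NULL only when the gap is strictly greater than one hour.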
Also, I wanted to point out a couple other hyperfunctions that you might think about using:
heartbeat_agg is a new/experimental one that will help you determine periods when your system is up or down. If you're expecting points at least every hour, you can use it to find the periods where the delivery point was down.
When you have more irregular sampling, or want to deal with different data frequencies from different delivery points, I'd take a look at the time_weight family of functions. They can be more efficient than using something like gapfill to upsample, because they let you treat all the different sample rates similarly without having to create more points and do more work. Even if you want to, for instance, compare sums of values, you'd use something like integral to get the time-weighted sum over a period based on the locf interpolation.
Anyway, hope all that is helpful!

SQL: select date rows that contain specific hour and minute

I am querying a table that has the date column as follows:
date
2021-03-08 05:05:31+00
2021-03-08 05:10:31+00
How can I select all the rows that contain 05:05 as the hour and minute in SQL? i.e. rows where hour = 05, and minute = 05. In this case it will be the first row.
Q: How can I select all the rows that contain 05:05 as the hour and minute in SQL?
A: For MySQL, look at the MySQL date and time functions. There, you'll find EXTRACT().
You can use it as follows:
https://www.w3schools.com/sql/func_mysql_extract.asp
Extract the minute from a datetime:
SELECT EXTRACT(MINUTE FROM "2017-06-15 09:34:21");
This assumes that you're storing the column as a "Date" type.
Different RDBMS vendors have different Date/Time functions. You'll have to read the documentation and experiment to determine which syntax to use for your particular DB vendor and your particular table schema.
You can use the query below to get the result described in your question.
SQL Server has a DATENAME function that you can use in your query as shown below.
CreatedDate is the column name.
Example (this matches 07:07; substitute the hour and minute you need):
SELECT * FROM #tmp1 WHERE DATENAME(hour, CreatedDate) = 07 AND DATENAME(minute, CreatedDate) = 07
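The same filter can be sketched in Python to show what both answers are doing: extract the hour and the minute separately and compare each one. The two timestamps are the sample rows from the question; the variable names are made up.

```python
from datetime import datetime

# The two sample rows from the question, written with an explicit UTC offset.
rows = [
    datetime.fromisoformat("2021-03-08 05:05:31+00:00"),
    datetime.fromisoformat("2021-03-08 05:10:31+00:00"),
]

# Keep only rows whose hour is 5 and whose minute is 5, ignoring the seconds.
matches = [ts for ts in rows if ts.hour == 5 and ts.minute == 5]
```

Only the first row survives the filter, which is the result the question expects.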

PostgreSQL: Calculate Average Handling Time

I have a sample table that looks like this
I need to write a SQL script to get the Average Handling Time of a case. I have researched suggestions, but I have never worked with timestamps and I'm really lost on how to do it.
If you subtract one timestamp from another, you get an interval. And you can calculate the average over intervals.
select avg(close_timestamp - create_timestamp)
from the_table;
You can calculate the AVG of the difference of the timestamp.
SELECT agent, avg(close_timestamp - create_timestamp) average_timestamp
FROM your_table
GROUP BY agent
ORDER BY agent
You can format the result to display it in days/hours/minutes/seconds.
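As an illustration of why averaging intervals works, here is a small Python sketch of the same computation. The sample cases are invented, since the original table is not shown, and `average_handling_time` is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def average_handling_time(cases):
    """Average (close - create) over a list of (create, close) pairs."""
    durations = [closed - created for created, closed in cases]
    # Summing timedeltas needs an explicit zero timedelta as the start value.
    return sum(durations, timedelta()) / len(durations)


# Invented sample data: one 30-minute case and one 90-minute case.
cases = [
    (datetime(2021, 1, 1, 9, 0), datetime(2021, 1, 1, 9, 30)),
    (datetime(2021, 1, 1, 10, 0), datetime(2021, 1, 1, 11, 30)),
]
average = average_handling_time(cases)
```

Subtracting two timestamps gives a duration, and durations can be summed and divided like numbers, which is what avg() does with intervals in PostgreSQL.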

Mysql Datetime queries

I'm new to this forum. I've been having trouble constructing a MySQL query. Basically, I want to select data and use some sort of function to output the timestamp field in a certain way. I know that DATE_FORMAT can do this by minute, day, hour, etc. But consider the following:
Say it is 12:59pm. I want to be able to select data from the past day and have the data placed into two-hour-wide time 'bins' based on its timestamp.
So these bins would be: 10:00am, 8:00am, 6:00am, 4:00am, etc., and the query would convert each timestamp into one of these bins.
E.G.
data converted
4:45am becomes 4:00am,
6:30am becomes 6:00am,
9:55am becomes 8:00am,
10:03am becomes 10:00am,
11:00am becomes 10:00am
Make sense? The width of the bins needs to be dynamic as well. I hope I described the problem clearly, and any help is appreciated.
Examples:
Monthly buckets:
GROUP BY YEAR(datestampfield) desc, MONTH(datestampfield) desc
Hourly buckets, with number of hours configurable:
SET @rangehrs = 2;
SELECT *, FLOOR(HOUR(dateadded)/@rangehrs)*@rangehrs AS x FROM mytable GROUP BY FLOOR(HOUR(dateadded)/@rangehrs)*@rangehrs LIMIT 5;
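The FLOOR(HOUR(...)/n)*n expression can be checked against the examples in the question with a short Python sketch. The function name `floor_to_bin` and the chosen date are hypothetical; the times and expected bins come from the question.

```python
from datetime import datetime


def floor_to_bin(ts: datetime, bin_hours: int) -> datetime:
    """Round ts down to the start of its bin_hours-wide bin within the day."""
    binned_hour = (ts.hour // bin_hours) * bin_hours
    return ts.replace(hour=binned_hour, minute=0, second=0, microsecond=0)


# The example times from the question, placed on an arbitrary date.
samples = [(4, 45), (6, 30), (9, 55), (10, 3), (11, 0)]
binned_hours = [
    floor_to_bin(datetime(2021, 1, 1, hour, minute), bin_hours=2).hour
    for hour, minute in samples
]
```

Integer division followed by multiplication is what makes the bin width configurable: changing `bin_hours` to 3 or 4 yields three-hour or four-hour bins with no other changes.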
Sounds like you're looking for a histogram of time. I'm not sure that's a real thing, but the term "histogram" might point you in a good direction, like this related question:
Getting data for histogram plot

SQL: select one record for each day nearest to a specific time

I have one table that stores values with a point in time:
CREATE TABLE values
(
value DECIMAL,
datetime DATETIME
)
There may be many values on each day, or there may be only one value for a given day. Now I want to get, for each day in a given timespan (e.g. one month), the value which is nearest to a given time of day. I only want one value per day if there are records for that day, or no value if there are no records. My database is PostgreSQL. I'm quite stuck with that. I could just get all values in the timespan and select the nearest value for each day programmatically, but that would mean pulling a huge amount of data from the database, because there can be many values on one day.
(Update)
To put it more abstractly: I have data of arbitrary precision (could be one minute, could be two hours or two days) and I want to convert it to a fixed precision of one day, with a specific time of day.
(second update)
This is the query from the accepted answer with correct PostgreSQL type conversions, assuming the desired time is 16:00:
SELECT datetime, value FROM values, (
SELECT DATE(datetime) AS date, MIN(ABS(EXTRACT(EPOCH FROM TIME '16:00' - CAST(datetime AS TIME)))) AS timediff
FROM values
GROUP BY DATE(datetime)
) AS besttimes
WHERE
CAST(values.datetime AS TIME) BETWEEN TIME '16:00' - CAST(besttimes.timediff::text || ' seconds' AS INTERVAL)
AND TIME '16:00' + CAST(besttimes.timediff::text || ' seconds' AS INTERVAL)
AND DATE(values.datetime) = besttimes.date
How about going in this direction?
SELECT values.value, values.datetime
FROM values,
( SELECT DATE(datetime) AS date, MIN(ABS(_WANTED_TIME_ - TIME(datetime))) AS timediff
FROM values
GROUP BY DATE(datetime)
) AS besttimes
WHERE TIME(values.datetime) BETWEEN _WANTED_TIME_ - besttimes.timediff
AND _WANTED_TIME_ + besttimes.timediff
AND DATE(values.datetime) = besttimes.date
I am not sure about the date/time extraction and abs(time) functions, so you will probably have to replace them.
It appears you have two parts to solve:
Are there any results for a day at all?
If there are, then which is the nearest one?
By short-circuiting the process at part 1 when there are no results, you'll save a lot of execution time.
The next thing to note is that you don't have to pull the data from the database: you can use PL/pgSQL functions (or something else) to work out the answer on the server first.
Once you have a selection of times to check, you can use intervals to compare them. Check the Postgres docs on intervals and datetime functions for precise instructions, but basically you subtract each selected time from the time you've given, and the one with the smallest interval is the one you want.
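The idea shared by these answers (for each day, keep the record whose time of day is closest to the wanted time) can be sketched in Python as follows. This is only an illustration of the selection logic, not a replacement for doing the work in the database; the helper names and the sample records are made up.

```python
from datetime import datetime, time


def seconds_of_day(t: time) -> int:
    """Number of whole seconds since midnight."""
    return t.hour * 3600 + t.minute * 60 + t.second


def nearest_per_day(records, wanted: time):
    """Map each date to the (timestamp, value) closest to the wanted time."""
    target = seconds_of_day(wanted)
    best = {}  # date -> (distance in seconds, timestamp, value)
    for ts, value in records:
        distance = abs(seconds_of_day(ts.time()) - target)
        day = ts.date()
        if day not in best or distance < best[day][0]:
            best[day] = (distance, ts, value)
    return {day: (ts, value) for day, (_, ts, value) in best.items()}


# Invented sample data: three records on one day, one on the next.
records = [
    (datetime(2021, 1, 1, 9, 0), 1),
    (datetime(2021, 1, 1, 15, 30), 2),
    (datetime(2021, 1, 1, 17, 0), 3),
    (datetime(2021, 1, 2, 8, 0), 4),
]
nearest = nearest_per_day(records, wanted=time(16, 0))
picked_values = [value for _, value in nearest.values()]
```

Days without records simply never appear in the result. One difference from the SQL versions: when two records are equally close, this sketch keeps the first one it sees, while the BETWEEN-based queries would return both.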