How to select rows until the sum of a column reaches N, where the column is of type TIME - sql

I would like to select enough audio calls to have 10 minutes (00:10:00) of audio. I have tried to achieve this by writing the following SQL (PostgreSQL) statement:
SELECT file_name, audio_duration
FROM (
SELECT distinct file_name, audio_duration, SUM(audio_duration)
OVER (ORDER BY audio_duration) AS total_duration
FROM data
) AS t
WHERE
t.total_duration <='00:10:00'
GROUP BY file_name, audio_duration
My problem is that it doesn't seem to be calculating the total duration correctly.
I suspect this is due to the audio_duration column being of type TIME.
If anyone has any hints or suggestions on how to write this query, it would be greatly appreciated.

You should really define that column to be an interval. A time column stores a moment in time, e.g. "3 in the afternoon".
However you can cast a single time value to an interval. You also don't need the window function to first calculate the "running total" if you want the total duration per file:
SELECT file_name, sum(audio_duration::interval) as total_duration
FROM data
GROUP BY file_name
HAVING sum(audio_duration::interval) <= interval '10 minute';
To permanently change the column type to an interval you can use:
alter table data
alter column audio_duration type interval;

I fully agree with @a_horse_with_no_name that interval is the better data type, but I must admit that the time data type is not incorrect. While you cannot add (+) two time values, you can SUM them. Summing time values results in an interval and produces the same result as summing the corresponding intervals. Besides being a moment, a time is also the interval from the beginning of the day to that moment. Demo (fiddle):
with as_time (dur) as ( values ('10:34:45 AM'::time), ('03:14:50 PM'::time), ('11:15:25 PM'::time))
, as_intv (dur) as ( values ('10:34:45'::interval), ('15:14:50'::interval),('23:15:25'::interval))
select *
from (select sum(dur) sum_time from as_time) st
, (select sum(dur) sum_intv from as_intv) si;
BTW: the answer to the rhetorical question "what is the sum of '8 in the morning' and '3 in the afternoon'?" is 23:00:00.
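To make the "time as an interval since midnight" idea concrete, here is a minimal Python sketch, not PostgreSQL itself. It converts time-of-day values to durations and keeps rows while the running total stays within 10 minutes, which is what the question was after. The file names and durations are made up for illustration, and `as_interval` is a hypothetical helper.

```python
from datetime import time, timedelta
from itertools import accumulate


def as_interval(t: time) -> timedelta:
    """Treat a time of day as the interval elapsed since midnight."""
    return timedelta(hours=t.hour, minutes=t.minute,
                     seconds=t.second, microseconds=t.microsecond)


# Hypothetical (file_name, audio_duration) rows, already in the desired order.
calls = [
    ("a.wav", time(0, 4, 0)),
    ("b.wav", time(0, 3, 30)),
    ("c.wav", time(0, 2, 0)),
    ("d.wav", time(0, 5, 0)),
]

limit = timedelta(minutes=10)

# Running total of durations, analogous to SUM(...) OVER (ORDER BY ...).
running = list(accumulate(as_interval(d) for _, d in calls))

# Keep every row whose running total is still within the limit.
selected = [name for (name, _), total in zip(calls, running) if total <= limit]
print(selected)
```

The same helper reproduces the "8 in the morning plus 3 in the afternoon" sum: both values become durations, and adding them gives 23 hours.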

Related

Hive - calculating string type timestamp differences in minutes

I'm a novice at SQL (in Hive) and I am trying to calculate, for every anonymousid, the time spent between the first event and the last event in minutes. The source table's timestamp is stored as a string,
like: "2020-12-24T09:47:17.775Z". I've tried in two ways:
1- Cast column timestamp to bigint and calculated the difference from main table.
select anonymousid, max(from_unixtime(cast('timestamp' as bigint)) - min(from_unixtime(cast('timestamp' as bigint)) from db1.formevent group by anonymousid
I got NULLs after implementing this as a solution.
2- Create a new table from main resource, put conditions to call with 'where' and tried to convert 'timestamp' to date format without any min-max calculation.
create table db1.successtime as select anonymousid, pagepath,buttontype, itemname, 'location', cast(to_date(from_unixtime(unix_timestamp('timestamp', "yyyy-MM-dd'T'HH:mm:ss.SSS"),'HH:mm:ss') as date) from db1.formevent where pagepath = "/account/sign-up/" and itemname = "Success" and 'location' = "Standard"
Then I got NULLs again and gave up.
Is there any way I can reformat and calculate time difference in minutes between first and last event ('timestamp') and take the average grouped by 'location'?
select anonymousid,
(max(unix_timestamp(timestamp, "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")) -
min(unix_timestamp(timestamp, "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"))
) / 60
from db1.formevent
group by anonymousid;
From your description, this should work:
select anonymousid,
(max(unix_timestamp(timestamp, "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")) -
min(unix_timestamp(timestamp, "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"))
) / 60
from db1.formevent
group by anonymousid;
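Note that the column name is not in single quotes. In single quotes, 'timestamp' is a string literal rather than a column reference, which is why the earlier attempts returned NULL.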
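The same first-to-last calculation can be sketched outside Hive to check the expected numbers. This is a minimal Python illustration, assuming the string format shown in the question; the event rows and ids are invented, and unlike Hive's unix_timestamp this keeps the millisecond part.

```python
from datetime import datetime

# Format of strings like "2020-12-24T09:47:17.775Z" (trailing Z is matched literally).
FMT = "%Y-%m-%dT%H:%M:%S.%fZ"

# Hypothetical (anonymousid, timestamp string) rows.
events = [
    ("u1", "2020-12-24T09:47:17.775Z"),
    ("u1", "2020-12-24T09:52:47.775Z"),
    ("u2", "2020-12-24T10:00:00.000Z"),
    ("u2", "2020-12-24T10:01:30.000Z"),
]

# Track the earliest and latest event per id, like MIN/MAX with GROUP BY.
first_last = {}
for anon_id, raw in events:
    ts = datetime.strptime(raw, FMT)
    lo, hi = first_last.get(anon_id, (ts, ts))
    first_last[anon_id] = (min(lo, ts), max(hi, ts))

# Difference between last and first event, in minutes.
minutes_spent = {
    anon_id: (hi - lo).total_seconds() / 60
    for anon_id, (lo, hi) in first_last.items()
}
print(minutes_spent)
```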

REGR_SLOPE in Teradata SQL Query Returning 0 Slope

I am a relative newbie with Teradata SQL and have run into this strange (I think strange) situation. I am trying to run a regression (REGR_SLOPE) on sensor data. I am gathering sensor readings for a single day, each day is 80 observations which is confirmed by the COUNT in the outer SELECT. My query is:
SELECT
d.meter_id,
REGR_SLOPE(d.reading_measure, d.x_axis) AS slope,
COUNT(d.x_axis) AS xcount,
COUNT(d.reading_measure) AS read_count
FROM
(
SELECT
meter_id,
reading_measure,
row_number() OVER (ORDER BY Reading_Dttm) AS x_axis
FROM data_mart.v_meter_reading
WHERE Reading_Start_Dt = '2017-12-12'
AND Meter_Id IN (11932101, 11419827, 11385229, 11643466)
AND Channel_Num = 5
) d
GROUP BY 1
When I use the "IN" clause in the subquery to specify Meter_Id, I get slope values, but when I take it out (to run over all meters) all the slopes are 0 (zero). I would simply like to run a line through a day's worth of observations (80).
I'm using Teradata v15.0.
What am I missing / doing wrong?
I would bet a Pepperoni Pizza that it's the x_axis value.
Instead try ROW_NUMBER() OVER (PARTITION BY meter_id ORDER BY reading_dttm)
This will ensure that the x_axis starts again from 1 for each meter, and each reading will always be 1 away from the previous reading on the x_axis.
This makes me think you should probably just use reading_dttm as the x_axis value, rather than fabricating one with ROW_NUMBER(). That way readings with a 5 hour gap between them have a different slope to readings with a 10 day gap between them. You may need to convert the data type of reading_dttm with a function like TO_UNIXTIME(reading_dttm), or something similar.
I'll message you my address for the Pizza Delivery. (Joking.)
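To illustrate why the x_axis numbering matters, here is a small Python sketch with invented data. It is not Teradata's REGR_SLOPE, only a plain least-squares slope written out by hand, and the three interleaved meters are an assumption for the example. With a global row number, each meter's x values are spread out by the number of meters, so the fitted slope shrinks accordingly; with many meters it would shrink toward zero.

```python
from collections import defaultdict


def regr_slope(ys, xs):
    """Least-squares slope of y on x: covariance(x, y) / variance(x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical rows sorted by reading time: three meters interleaved,
# each meter's reading growing by 2.0 per step.
rows = [(meter, 2.0 * step) for step in range(1, 6) for meter in ("A", "B", "C")]

global_x = defaultdict(list)   # ROW_NUMBER() OVER (ORDER BY reading_dttm)
local_x = defaultdict(list)    # ROW_NUMBER() OVER (PARTITION BY meter_id ORDER BY reading_dttm)
readings = defaultdict(list)

for row_number, (meter, reading) in enumerate(rows, start=1):
    global_x[meter].append(row_number)
    local_x[meter].append(len(local_x[meter]) + 1)
    readings[meter].append(reading)

slope_global = regr_slope(readings["A"], global_x["A"])
slope_local = regr_slope(readings["A"], local_x["A"])
print(slope_global, slope_local)
```

In this toy data the per-meter numbering recovers the true step of 2.0 per reading, while the global numbering gives one third of that because there are three meters.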
In addition to @MatBailie's answer:
You probably know that you should use the timestamp instead of the ROW_NUMBER, but you couldn't do it because Teradata doesn't allow timestamps in this place (strange).
There's no built-in TO_UNIXTIME function in Teradata, but you can use this instead:
REPLACE FUNCTION TimeStamp_to_UnixTime (ts TIMESTAMP(6))
RETURNS decimal(18,6)
LANGUAGE SQL
CONTAINS SQL
DETERMINISTIC
SQL SECURITY DEFINER
COLLATION INVOKER
INLINE TYPE 1
RETURN
(Cast(ts AS DATE) - DATE '1970-01-01') * 86400
+ (Extract(HOUR From ts) * 3600)
+ (Extract(MINUTE From ts) * 60)
+ (Extract(SECOND From ts));
If you're not allowed to create UDFs, simply copy and paste the calculation into your query.
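The arithmetic in that function can be checked with a short Python sketch that mirrors it term by term. The function name is reused only for illustration, and the sample timestamp is arbitrary; the comparison against Python's own epoch conversion assumes the timestamp is interpreted as UTC.

```python
from datetime import date, datetime, timezone


def timestamp_to_unixtime(ts: datetime) -> float:
    """Days since 1970-01-01 times 86400, plus the time-of-day parts in seconds."""
    days = (ts.date() - date(1970, 1, 1)).days
    return (days * 86400
            + ts.hour * 3600
            + ts.minute * 60
            + ts.second
            + ts.microsecond / 1_000_000)


sample = datetime(2017, 12, 12, 6, 30, 15)
manual = timestamp_to_unixtime(sample)

# Reference value from the standard library, treating the timestamp as UTC.
reference = sample.replace(tzinfo=timezone.utc).timestamp()
print(manual, reference)
```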

How to compare time stamps from consecutive rows

I have a table that I would like to sort by a timestamp desc and then compare all consecutive rows to determine the difference between each row. From there, I would like to find all the rows whose difference is greater than ~2hours.
I'm stuck on how to actually compare consecutive rows in a table. Any help would be much appreciated.
I'm using Oracle SQL Developer 3.2
You didn't show us your table definition, but something like this:
select *
from (
select t.*,
t.timestamp_column - lag(timestamp_column) over (order by timestamp_column) as diff
from the_table t
) x
where diff > interval '2' hour;
This assumes that timestamp_column is defined as timestamp, not date (otherwise the result of the difference wouldn't be an interval).
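The lag() idea, pairing each row with the one before it, can be sketched in Python as follows. The timestamps are invented and the 2 hour threshold comes from the question; this is an illustration of the logic, not Oracle code.

```python
from datetime import datetime, timedelta

# Hypothetical timestamps, sorted ascending like ORDER BY timestamp_column.
stamps = sorted([
    datetime(2021, 3, 1, 8, 0),
    datetime(2021, 3, 1, 9, 30),
    datetime(2021, 3, 1, 12, 0),
    datetime(2021, 3, 1, 12, 45),
    datetime(2021, 3, 1, 15, 0),
])

threshold = timedelta(hours=2)

# zip(stamps, stamps[1:]) pairs every row with its predecessor, like lag().
gaps = [
    (current, current - previous)
    for previous, current in zip(stamps, stamps[1:])
    if current - previous > threshold
]

for current, diff in gaps:
    print(current, diff)
```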

Resample on time series data

I have a table with time series column in the millisecond, I want to resample the time series and apply mean on the group. How can I implement it in Postgres?
"Resample" means aggregate all time stamps within one second or one minute. All rows within one second or one minute form a group.
table structure
date x y z
Use date_trunc() to truncate timestamps to a given unit of time, and GROUP BY that expression:
SELECT date_trunc('minute', date) AS date_truncated_to_minute
, avg(x) AS avg_x
, avg(y) AS avg_y
, avg(z) AS avg_z
FROM tbl
GROUP BY 1;
Assuming your misleadingly named date column is actually of type timestamp or timestamptz.
Related answer with more details and links:
PostgreSQL: running count of rows for a query 'by minute'
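As a rough illustration of what truncating and grouping does, here is a Python sketch with made-up rows and a single value column x. Replacing seconds and microseconds with zero plays the role of date_trunc('minute', ...).

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical (timestamp, x) rows with millisecond resolution.
rows = [
    (datetime(2021, 1, 1, 12, 0, 0, 250000), 1.0),
    (datetime(2021, 1, 1, 12, 0, 30, 500000), 3.0),
    (datetime(2021, 1, 1, 12, 1, 5, 0), 10.0),
]

# Group rows by their timestamp truncated to the minute.
buckets = defaultdict(list)
for ts, x in rows:
    buckets[ts.replace(second=0, microsecond=0)].append(x)

# Average each group, like avg(x) with GROUP BY.
avg_x = {minute: mean(values) for minute, values in sorted(buckets.items())}
print(avg_x)
```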

examine if one time series column of table has two adjacent time points which have interval larger than certain length

I am preprocessing data in a table that contains a time series column.
toy example Table A
timestamp value
12:30:24 1
12:32:21 3
12:33:21 4
The timestamp column is ordered and always increases.
Is it possible to define a function or something else that returns true when the table has two adjacent time points whose interval is larger than a certain length, and false otherwise?
I am using PostgreSQL. Thank you.
SQL Fiddle
select bool_or(bigger_than) as bigger_than
from (
select
time - lag(time) over (order by time)
>
interval '1 minute' as bigger_than
from table_a
) s;
bigger_than
-------------
t
bool_or returns true if at least one input value is true.
http://www.postgresql.org/docs/current/static/functions-aggregate.html
Your sample data shows a time value, but it works the same for a timestamp.
Something like this:
select count(*) > 0
from (
select timestamp,
lag(timestamp) over (order by value) as prev_ts
from table_a
) t
where timestamp - prev_ts > interval '1' minute;
It calculates the difference between a timestamp and its "previous" timestamp. The order of the timestamps is defined by the value column. The outer query then checks whether there is at least one row where the difference is larger than 1 minute.
lag() is a window function. More details on those can be found in the manual:
http://www.postgresql.org/docs/current/static/tutorial-window.html
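The true/false check asked for in the question can also be sketched as a small Python function, using the sample times from the toy table. The function name and the date prefix are assumptions for the example.

```python
from datetime import datetime, timedelta


def has_gap_larger_than(stamps, limit):
    """Return True if any two adjacent timestamps are more than `limit` apart."""
    ordered = sorted(stamps)
    return any(
        current - previous > limit
        for previous, current in zip(ordered, ordered[1:])
    )


# Times from the toy table, placed on an arbitrary day so they can be subtracted.
day = "2021-01-01 "
stamps = [datetime.fromisoformat(day + t) for t in ("12:30:24", "12:32:21", "12:33:21")]

print(has_gap_larger_than(stamps, timedelta(minutes=1)))
print(has_gap_larger_than(stamps, timedelta(minutes=2)))
```

The first gap is 1 minute 57 seconds, so the check is true for a 1 minute limit and false for a 2 minute limit.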