I have a table that contains a timestamp and a temperature. I want to get the highest rate of change of the temperature for each day. Is there a formula or aggregate function that does that?
I can get the highest rate via a program that reads each row in sequential order and determines the change between the two rows, but I am wondering if there is a ready-made function for that.
SQL has the LAG window function, which can be used to get a value from a previous row.
Using something like the SQL statement below, I can get the difference from the previous temperature and the time in minutes since the last reading:
SELECT datetime, temp,
(LAG(temp, 1, 0) OVER (ORDER BY datetime) - temp) AS TempDiff,
(julianday(datetime) - LAG(julianday(datetime), 1, 0) OVER (ORDER BY datetime)) * 24 * 60 AS TimeDiff
FROM temperature
ORDER BY datetime
The results are something like this:
'2022-08-11 22:44:40', 33.000, -0.06300000000000239, 12.316666916012764
'2022-08-11 22:45:11', 32.937, 0.06300000000000239, 0.5166665464639664
'2022-08-11 22:47:15', 33.000, -0.06300000000000239, 2.066666856408119
'2022-08-11 22:48:16', 32.937, 0.06300000000000239, 1.016666516661644
'2022-08-11 22:49:18', 33.000, -0.06300000000000239, 1.0333330929279327
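Building on the LAG results above, the per-day peak rate can be obtained by dividing each temperature difference by the elapsed minutes and taking the maximum absolute value per calendar day. A minimal SQLite sketch run from Python (sample rows copied from the question; LAG here has no default, so the first row is simply skipped):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE temperature (datetime TEXT, temp REAL);
INSERT INTO temperature VALUES
  ('2022-08-11 22:44:40', 33.000),
  ('2022-08-11 22:45:11', 32.937),
  ('2022-08-11 22:47:15', 33.000),
  ('2022-08-11 22:48:16', 32.937),
  ('2022-08-11 22:49:18', 33.000);
""")

# rate = (current temp - previous temp) / minutes elapsed,
# then the largest absolute rate per day.
rows = conn.execute("""
SELECT day, MAX(ABS(rate)) AS max_rate
FROM (
  SELECT date(datetime) AS day,
         (temp - LAG(temp) OVER (ORDER BY datetime)) /
         ((julianday(datetime) -
           LAG(julianday(datetime)) OVER (ORDER BY datetime)) * 24 * 60)
           AS rate
  FROM temperature
)
WHERE rate IS NOT NULL
GROUP BY day
""").fetchall()
print(rows)
```

The steepest swing in the sample is the 0.063-degree drop over about 31 seconds, roughly 0.122 degrees per minute.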
Consider a time-series table that contains three fields: time of type timestamptz, balance of type numeric, and is_spent_column of type text.
The following query generates a valid result for the last day of the given interval.
SELECT
MAX(DATE_TRUNC('DAY', (time))) as last_day,
SUM(balance) FILTER ( WHERE is_spent_column is NULL ) AS value_at_last_day
FROM tbl
2010-07-12 18681.800775017498741407984000
However, I need an equivalent query based on window functions that reports the total value of the balance column for all the days up to and including the given date.
Here is what I've tried so far, but without any valid result:
SELECT
DATE_TRUNC('DAY', (time)) AS daily,
SUM(sum(balance) FILTER ( WHERE is_spent_column is NULL ) ) OVER ( ORDER BY DATE_TRUNC('DAY', (time)) ) AS total_value_per_day
FROM tbl
group by 1
order by 1 desc
2010-07-12 16050.496339044977568391974000
2010-07-11 13103.159119670350269890284000
2010-07-10 12594.525752964512456914454000
2010-07-09 12380.159588711091681327014000
2010-07-08 12178.119542536668113577014000
2010-07-07 11995.943973804127033140014000
EDIT:
Here is a sample dataset:
LINK REMOVED
The running total can be computed by applying the first query above to the entire dataset up to and including the desired day. For example, for day 2009-01-31 the result is 97.13522530000000000000, and for day 2009-01-15, when we filter time as time < '2009-01-16 00:00:00', it returns 24.446144000000000000.
What I need is an alternative query that computes the running total for each day in a single query.
EDIT 2:
Thank you all so very much for your participation and support.
The reason for the differences in the queries' result sets was in the preceding ETL pipelines. Sorry for my ignorance!
Below I've provided a sample schema to test the queries.
https://www.db-fiddle.com/f/veUiRauLs23s3WUfXQu3WE/2
Now both queries given above and the query given in the answer below return the same result.
Consider calculating the running total via a window function after aggregating the data to the day level. And since you aggregate with a single condition, the FILTER condition can be converted to a plain WHERE:
SELECT daily,
SUM(total_balance) OVER (ORDER BY daily) AS total_value_per_day
FROM (
SELECT
DATE_TRUNC('DAY', (time)) AS daily,
SUM(balance) AS total_balance
FROM tbl
WHERE is_spent_column IS NULL
GROUP BY 1
) AS daily_agg
ORDER BY daily
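As a quick sanity check, the same aggregate-then-window shape runs under SQLite from Python, substituting date() for DATE_TRUNC('DAY', ...) (the table contents here are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl (time TEXT, balance REAL, is_spent_column TEXT);
INSERT INTO tbl VALUES
  ('2010-07-10 09:00:00', 100.0, NULL),
  ('2010-07-10 15:00:00',  50.0, 'spent'),  -- excluded by the WHERE
  ('2010-07-11 10:00:00', 200.0, NULL),
  ('2010-07-12 11:00:00', 300.0, NULL);
""")

rows = conn.execute("""
SELECT daily,
       SUM(total_balance) OVER (ORDER BY daily) AS total_value_per_day
FROM (
  SELECT date(time) AS daily, SUM(balance) AS total_balance
  FROM tbl
  WHERE is_spent_column IS NULL
  GROUP BY 1
) AS daily_agg
ORDER BY daily
""").fetchall()
print(rows)
```

Each day carries forward the sum of all earlier unspent balances: 100, then 300, then 600.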
I'm a novice at SQL (in Hive) and am trying to calculate each anonymousid's time spent between its first event and last event, in minutes. The source table's timestamp is formatted as a string,
like "2020-12-24T09:47:17.775Z". I've tried two approaches:
1- Cast the timestamp column to bigint and calculate the difference from the main table.
select anonymousid, max(from_unixtime(cast('timestamp' as bigint)) - min(from_unixtime(cast('timestamp' as bigint)) from db1.formevent group by anonymousid
I got NULLs after implementing this as a solution.
2- Create a new table from the main source, restrict the rows with WHERE conditions, and try to convert timestamp to a date format without any min-max calculation.
create table db1.successtime as select anonymousid, pagepath,buttontype, itemname, 'location', cast(to_date(from_unixtime(unix_timestamp('timestamp', "yyyy-MM-dd'T'HH:mm:ss.SSS"),'HH:mm:ss') as date) from db1.formevent where pagepath = "/account/sign-up/" and itemname = "Success" and 'location' = "Standard"
Then I got NULLs again and gave up.
Is there any way I can reformat and calculate the time difference in minutes between the first and last event (timestamp), and take the average grouped by location?
From your description, this should work:
select anonymousid,
       (max(unix_timestamp(`timestamp`, "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")) -
        min(unix_timestamp(`timestamp`, "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"))
       ) / 60
from db1.formevent
group by anonymousid;
Note that the column name is not in single quotes: in Hive, single quotes make a string literal, while backticks quote an identifier.
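As a cross-check outside Hive, the same first-to-last interval arithmetic can be sketched in plain Python; the strptime pattern mirrors the Hive format string, and the ids and timestamps below are made up:

```python
from datetime import datetime

# Hypothetical events: (anonymousid, timestamp) in the question's format.
events = [
    ("a1", "2020-12-24T09:47:17.775Z"),
    ("a1", "2020-12-24T10:17:17.775Z"),
    ("a2", "2020-12-24T09:00:00.000Z"),
    ("a2", "2020-12-24T09:05:30.000Z"),
]

def parse(ts):
    # Equivalent of unix_timestamp(ts, "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")

# Track the earliest and latest event per id.
spans = {}
for uid, ts in events:
    t = parse(ts)
    lo, hi = spans.get(uid, (t, t))
    spans[uid] = (min(lo, t), max(hi, t))

minutes = {uid: (hi - lo).total_seconds() / 60 for uid, (lo, hi) in spans.items()}
print(minutes)
```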
I'm looking to summarize some information into a kind of report, and the crux of it is similar to the following problem. I'm looking for the approach in any sql-like language.
consider a schema containing the following:
id - int, on - bool, time - datetime
This table is basically a log that specifies when a thing of id changes state between 'on' and 'off'.
What I want is a table with the percentage of time 'on' for each id seen. So a result might look like this
id, percent 'on'
1, 50
2, 45
3, 67
I would expect the overall time to be
now - (time first seen in the log)
Programmatically, I understand how to do this: for each id, I just add up all of the segments of time for which the item was 'on' and express this as a percentage of the total time. I'm not quite seeing how to do this in SQL, however.
You can use lead() and some date/time arithmetic (which varies by database).
In pseudo-code this looks like:
select id,
       sum(case when on then coalesce(next_datetime, current_datetime) - datetime end) /
           (current_datetime - min(datetime))
from (select t.*,
             lead(datetime) over (partition by id order by datetime) as next_datetime
      from t
     ) t
group by id;
Date/time functions vary by database, so this is just to give an idea of what to do.
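To make the idea concrete, here is a SQLite version run from Python, with a fixed timestamp standing in for "now" so the output is reproducible; the table name, on_flag column, and sample rows are invented, and the ratio is multiplied by 100 to match the percentage asked for in the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE state_log (id INTEGER, on_flag INTEGER, time TEXT);
INSERT INTO state_log VALUES
  (1, 1, '2023-01-01 00:00:00'),
  (1, 0, '2023-01-01 06:00:00'),
  (1, 1, '2023-01-01 09:00:00'),
  (2, 1, '2023-01-01 00:00:00'),
  (2, 0, '2023-01-01 03:00:00');
""")

# Each row's state lasts until the next row for the same id,
# or until "now" for the last row.
rows = conn.execute("""
SELECT id,
       100.0 * SUM(CASE WHEN on_flag = 1
                        THEN julianday(COALESCE(next_time, :now)) - julianday(time)
                        ELSE 0 END)
             / (julianday(:now) - MIN(julianday(time))) AS pct_on
FROM (
  SELECT t.*,
         LEAD(time) OVER (PARTITION BY id ORDER BY time) AS next_time
  FROM state_log t
)
GROUP BY id
""", {"now": "2023-01-01 12:00:00"}).fetchall()
print(rows)
```

Id 1 is on for 9 of its 12 hours (75%), id 2 for 3 of 12 (25%).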
For this SQL,
SELECT CID, Time, Val
FROM MyTable
WHERE CID = 8
I get the following data,
CID, Time, Val
8,2016-10-19 13:49:06.217,7.036
8,2016-10-19 13:49:15.237,6.547
8,2016-10-19 13:49:46.063,6.292
8,2016-10-19 13:49:57.387,5.998
I want each Time value minus the starting time, which can be calculated by
SELECT MIN(Time) StartTime
FROM MyTable
WHERE CID = 8
I know that I can define a T-SQL variable to do that. However, is it possible to do the task, getting the relative time instead of the absolute time for each record, in one SQL statement?
You can use the min() window function to get the minimum time for each cid and use it in the subtraction.
select cid,time,val,
datediff(millisecond,min(time) over(partition by cid),time) as diff
from mytable
Change the difference unit (millisecond is shown) per your requirement. Note that there can be an overflow if the difference is too big.
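The datediff call above is SQL Server syntax; the same per-cid baseline works elsewhere with that engine's own date arithmetic. For instance, a SQLite sketch run from Python using julianday (sample rows taken from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (cid INTEGER, time TEXT, val REAL);
INSERT INTO mytable VALUES
  (8, '2016-10-19 13:49:06.217', 7.036),
  (8, '2016-10-19 13:49:15.237', 6.547),
  (8, '2016-10-19 13:49:46.063', 6.292),
  (8, '2016-10-19 13:49:57.387', 5.998);
""")

# julianday() returns fractional days; scale to milliseconds.
rows = conn.execute("""
SELECT cid, time, val,
       CAST(ROUND((julianday(time) -
                   MIN(julianday(time)) OVER (PARTITION BY cid)) * 86400000)
            AS INTEGER) AS diff_ms
FROM mytable
ORDER BY time
""").fetchall()
for r in rows:
    print(r)
```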
I have a table that is updated every 10 minutes with flow-rate data. At the end of the day I need a query that will generate the average flow, the minimum flow value and the time it occurred (and the same for the maximum), and then the total flow for the day. I have this so far:
SELECT *
FROM [WS6].[dbo].[MasterData]
Where [_Datetime] between '2015-07-06' and '2015-07-06 23:59:59'
This gets me the whole day's data, but how can I filter for the values I need? Thank you
In SQL Server 2012, you can use first_value():
SELECT avg(flow), min(flow), max(flow), sum(flow),
       min(minTime) as minTime, min(maxTime) as maxTime
FROM (SELECT flow,
             first_value(_DateTime) over (order by flow asc) as minTime,
             first_value(_DateTime) over (order by flow desc) as maxTime
      FROM [WS6].[dbo].[MasterData]
      WHERE [_Datetime] >= '2015-07-06' AND [_Datetime] < '2015-07-07'
     ) t;
In earlier versions you can use subqueries or cross apply.
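A quick way to see the first_value() idea in action is SQLite via Python, computing the window in a subquery and aggregating on top (the flow values and timestamps here are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MasterData (_Datetime TEXT, flow REAL);
INSERT INTO MasterData VALUES
  ('2015-07-06 00:10:00', 5.0),
  ('2015-07-06 03:20:00', 2.0),
  ('2015-07-06 11:40:00', 9.0),
  ('2015-07-06 18:00:00', 4.0);
""")

# minTime/maxTime are constant across rows, so MIN() just picks them out.
row = conn.execute("""
SELECT AVG(flow), MIN(flow), MAX(flow), SUM(flow),
       MIN(minTime), MIN(maxTime)
FROM (
  SELECT flow,
         first_value(_Datetime) OVER (ORDER BY flow ASC)  AS minTime,
         first_value(_Datetime) OVER (ORDER BY flow DESC) AS maxTime
  FROM MasterData
  WHERE _Datetime >= '2015-07-06' AND _Datetime < '2015-07-07'
)
""").fetchone()
print(row)
```

Note the half-open date filter, which avoids missing readings in the final second of the day.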
It depends on what you are trying to get the min and max value of. For example, if you are trying to get the min and max of a field between those dates, you would do something like this, where _field is the field you want the value for (this will return one or no rows):
SELECT MIN(_field) AS 'MIN', MAX(_field) AS 'MAX'
FROM [WS6].[dbo].[MasterData]
Where [_Datetime] between '2015-07-06' and '2015-07-06 23:59:59'
If you are trying to get the min and max of many fields grouped by another field, you can try something like this (it may return many, one, or no rows):
SELECT Group_by_field, MIN(_field) AS 'MIN', MAX(_field) AS 'MAX'
FROM [WS6].[dbo].[MasterData]
Where [_Datetime] between '2015-07-06' and '2015-07-06 23:59:59'
Group by Group_by_field
It would be helpful if you provided a sample table and sample output of what you want to do.
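For the grouped variant, here is a small SQLite check run from Python (the group values, field values, and dates are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MasterData (Group_by_field TEXT, _field REAL, _Datetime TEXT);
INSERT INTO MasterData VALUES
  ('pump1', 3.2, '2015-07-06 01:00:00'),
  ('pump1', 7.8, '2015-07-06 13:00:00'),
  ('pump2', 1.1, '2015-07-06 02:00:00'),
  ('pump2', 4.4, '2015-07-06 14:00:00');
""")

rows = conn.execute("""
SELECT Group_by_field, MIN(_field) AS min_val, MAX(_field) AS max_val
FROM MasterData
WHERE _Datetime BETWEEN '2015-07-06' AND '2015-07-06 23:59:59'
GROUP BY Group_by_field
ORDER BY Group_by_field
""").fetchall()
print(rows)
```

One row comes back per group with its smallest and largest reading for the day.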