How can I make an average of dates in MySQL? - sql

How can I make an average between dates in MySQL?
I am more interested in the time values, hours and minutes.
On a table with:
| date_one | datetime |
| date_two | datetime |
Doing a query like:
SELECT AVG(date_one-date_two) FROM some_table WHERE some-restriction-applies;
Edit:
The AVG(date1-date2) works but I have no clue what data it is returning.
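Note: with the bare - operator, MySQL converts each DATETIME operand to a number of the form YYYYMMDDHHMMSS and subtracts those numbers, so the result is not a duration. A small illustration:
SELECT CAST('2011-02-12 12:00:00' AS DATETIME)
     - CAST('2011-02-12 10:00:00' AS DATETIME);
-- returns 20000 (20110212120000 - 20110212100000), not a 2-hour interval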

This seems a bit hackish, but it will work for dates between roughly 1970 and 2038 (on a 32-bit arch). You are essentially converting the datetime values to integers with UNIX_TIMESTAMP(), averaging the difference, and converting the result back to a datetime value with FROM_UNIXTIME().
SELECT
from_unixtime(
avg(
unix_timestamp(date_one)-unix_timestamp(date_two)
)
)
FROM
some_table
WHERE
some-restriction-applies
There is likely a better solution out there, but this will get you by in a pinch.

select avg(datediff(date1,date2))
select avg(timediff(datetime,datetime))

SELECT date_one + (date_two - date_one) / 2 AS average_date
FROM thetable
WHERE whatever
You can't sum dates, but you can subtract them and get a time interval that you can halve and add back to the first date.

SELECT TIMESTAMPADD(MINUTE, TIMESTAMPDIFF(MINUTE, '2011-02-12 10:00:00', '2011-02-12 12:00:00')/2, '2011-02-12 10:00:00')
The result is
'2011-02-12 11:00:00'

CREATE TABLE `some_table`
(
`some_table_key` INT(11) NOT NULL AUTO_INCREMENT,
`group_name` VARCHAR(128) NOT NULL,
`start` TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP,
`finish` TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`some_table_key`)
);
SELECT
group_name,
COUNT(*) AS entries,
SEC_TO_TIME( AVG( TIME_TO_SEC( TIMEDIFF(finish, start) ) ) ) AS average_time
FROM some_table
GROUP BY
some_table.group_name
;
You should always specify the group you want when using group functions; you can end up in some nasty messes with group functions if you later extend the query with JOINs etc. and assume MySQL will choose the right group for you.

Thinking out loud: you could do a DATEDIFF in minutes from a set time, average that, and then add those minutes back to the set time...
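A minimal sketch of that idea against the table from the question (the reference time is arbitrary, and the WHERE clause is the same placeholder as above):
SELECT
    TIMESTAMPADD(
        MINUTE,
        ROUND(AVG(TIMESTAMPDIFF(MINUTE, '2000-01-01 00:00:00', date_one))),
        '2000-01-01 00:00:00'
    ) AS avg_date_one
FROM some_table
WHERE some-restriction-applies;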

AVG is a grouping function, which means it will sum all the rows in the table and divide by the row count. But it sounds like you want the average of two different columns, reported individually for each row. In that case, you should just compute it yourself: (date1+date2)/2. (MySQL may want some extra syntax to add those columns properly.)
The code you've written will give you the table's average elapsed time between date1 and date2.
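In MySQL, that per-row midpoint is easiest to express with TIMESTAMPDIFF/TIMESTAMPADD rather than a literal (date1+date2)/2; a rough sketch using the question's columns:
SELECT
    TIMESTAMPADD(
        SECOND,
        TIMESTAMPDIFF(SECOND, date_one, date_two) DIV 2,
        date_one
    ) AS midpoint
FROM some_table
WHERE some-restriction-applies;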

Related

Adding minutes of runtime from on/off records during a time period

I have a SQL database that collects temperature and sensor data from the barn.
The table definition is:
CREATE TABLE [dbo].[DataPoints]
(
[timestamp] [datetime] NOT NULL,
[pointname] [nvarchar](50) NOT NULL,
[pointvalue] [float] NOT NULL
)
The sensors report outside temperature (degrees), inside temperature (degrees), and heating (as on/off).
Sensors create a record when the previous reading has changed, so temperatures are generated every few minutes, one record for heat coming ON, one for heat going OFF, and so on.
I'm interested in how many minutes of heat has been used overnight, so a 24-hour period from 6 AM yesterday to 6 AM today would work fine.
This query:
SELECT *
FROM [home_network].[dbo].[DataPoints]
WHERE (pointname = 'Heaters')
AND (timestamp BETWEEN '2022-12-18 06:00:00' AND '2022-12-19 06:00:00')
ORDER BY timestamp
returns this data:
2022-12-19 02:00:20 | Heaters | 1
2022-12-19 02:22:22 | Heaters | 0
2022-12-19 03:43:28 | Heaters | 1
2022-12-19 04:25:31 | Heaters | 0
The end result should be 22 minutes + 42 minutes = 64 minutes of heat, but I can't see how to get this result from a single query. It also just happens that this result set has two complete heat on/off cycles, but that will not always be the case. So, if the first heat record was = 0, that means that at 6 AM, the heat was already on, but the start time won't show in the query. The same idea applies if the last heat record is =1 at, say 05:15, which means 45 minutes have to be added to the total.
Is it possible to get this minutes-of-heat-time result with a single query? Actually, I don't know the right approach, and it doesn't matter if I have to run several queries. If needed, I could use a small app that reads the raw data, and applies logic outside of SQL to arrive at the total. But I'd prefer to be able to do this within SQL.
This isn't a complete answer, but it should help you get started. From the SQL in the post, I'm assuming you're using SQL Server; I've formatted the code to match. Replace @input with your query above if you want to test on your own data. (SELECT * FROM [home_network].[dbo]...)
--generate dummy table with sample output from question
declare @input as table(
[timestamp] [datetime] NOT NULL,
[pointname] [nvarchar](50) NOT NULL,
[pointvalue] [float] NOT NULL
)
insert into @input values
('2022-12-19 02:00:20','Heaters',1),
('2022-12-19 02:22:22','Heaters',0),
('2022-12-19 03:43:28','Heaters',1),
('2022-12-19 04:25:31','Heaters',0);
--Append a row number to the result
WITH A as (
SELECT *,
ROW_NUMBER() OVER(ORDER BY [timestamp]) as row_count
from @input)
--Self join the table using the row number as a guide
SELECT sum(datediff(MINUTE,startTimes.timestamp,endTimes.timestamp))
from A as startTimes
LEFT JOIN A as endTimes on startTimes.row_count=endTimes.row_count-1
--Only show periods of time where the heater is turned on at the start
WHERE startTimes.row_count%2=1
Your problem can be divided into 2 steps:
Filter sensor type and date range, while also getting time span of each record by calculating date difference between timestamp of current record and the next one in chronological order.
Filter records with ON status and summarize the duration
(Optional) convert to HH:MM:SS format to display
Here's my take on the problem, with comments explaining what I do in each step, all combined into one single query.
-- Step 3: Convert output to HH:MM:SS, this is just for show and can be reduced
SELECT STUFF(CONVERT(VARCHAR(8), DATEADD(SECOND, total_duration, 0), 108),
1, 2, CAST(FLOOR(total_duration / 3600) AS VARCHAR(5)))
FROM (
-- Step 2: select records with status ON (1) and aggregate total duration in seconds
SELECT sum(duration) as total_duration
FROM (
-- Step 1: Use LEAD to get next adjacent timestamp and calculate date difference (time span) between the current record and the next one in time order
SELECT TOP 100 PERCENT
DATEDIFF(SECOND, timestamp, LEAD(timestamp, 1, '2022-12-19 06:00:00') OVER (ORDER BY timestamp)) as duration,
pointvalue
FROM [dbo].[DataPoints]
-- filtered by sensor name and time range
WHERE pointname = 'Heaters'
AND (timestamp BETWEEN '2022-12-18 06:00:00' AND '2022-12-19 06:00:00')
ORDER BY timestamp ASC
) AS tmp
WHERE tmp.pointvalue = 1
) as tmp2
Note: As the last record does not have next adjacent timestamp, it will be filled with the end time of inspection (In this case it's 6AM of the next day).
I do not really think it would be possible to achieve this within a single query.
Option 1:
Implement a stored procedure in which you can put the logic for calculating these periods.
Option 2:
Add a new column (duration) and, on inserting a new record, calculate the difference between NOW and the previous timestamp and update the duration of the previous record, as sketched below.
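For Option 2, a rough sketch as an AFTER INSERT trigger (the duration_min column and the trigger name are illustrative, not from the original post):
ALTER TABLE dbo.DataPoints ADD duration_min INT NULL;
GO
CREATE TRIGGER dbo.trg_DataPoints_Duration
ON dbo.DataPoints
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- stamp the previous reading of the same sensor with the minutes
    -- elapsed until the newly inserted reading
    UPDATE d
    SET duration_min = DATEDIFF(MINUTE, d.[timestamp], i.[timestamp])
    FROM dbo.DataPoints AS d
    INNER JOIN inserted AS i
        ON i.pointname = d.pointname
    WHERE d.[timestamp] = (SELECT MAX(d2.[timestamp])
                           FROM dbo.DataPoints AS d2
                           WHERE d2.pointname = i.pointname
                             AND d2.[timestamp] < i.[timestamp]);
END;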

Calculating average with biginteger time intervals using TimescaleDB

I have a schema with the following fields:
Name of row | Type
--------------------------+--------
name | string
value1 | numeric
timestamp | bigint
The rows contain entries with a name, a numeric value and a bigint value storing the unix timestamp in nanoseconds. I am using TimescaleDB and would like to use time_bucket_gapfill to retrieve the data. Given that the timestamps are stored as bigint, this is quite cumbersome.
I would like to get aggregated data for these intervals: 5 min, hour, day, week, month, quarter, year. I have managed to make it work using a normal time_bucket, but now I would like to fill the gaps as well. I am using the following query now:
SELECT COALESCE(COUNT(*), 0), COALESCE(SUM(value1), 0), time_bucket_gapfill('5 min', date_trunc('quarter', to_timestamp(timestamp/1000000000)), to_timestamp(1599100000), to_timestamp(1599300000)) AS bucket
FROM playground
WHERE name = 'test' AND timestamp >= 1599100000000000000 AND timestamp <= 1599300000000000000
GROUP BY bucket
ORDER BY bucket ASC
This returns the values correctly, but does not fill the empty spaces. If I modified my query to
time_bucket_gapfill('5 min',
    date_trunc('quarter', to_timestamp(timestamp/1000000000)),
    to_timestamp(1599100000),
    to_timestamp(1599200000))
I would get the first entry correctly and then empty rows every 5 minutes. How could I make it work? Thanks!
Here is a DB fiddle, but it doesn't work as it doesn't support TimescaleDB. The query above returns the following:
coalesce | coalesce | bucket
----------+----------+-------------------------
3 | 300 | 2020-07-01 00:00:00+00
0 | 0 | 2020-09-03 02:25:00+00
0 | 0 | 2020-09-03 02:30:00+00
You should use datatypes in your time_bucket_gapfill that match the datatypes in your table. The following query should get you what you are looking for:
SELECT
COALESCE(count(*), 0),
COALESCE(SUM(value1), 0),
time_bucket_gapfill(300E9::BIGINT, timestamp) AS bucket
FROM
t
WHERE
name = 'example'
AND timestamp >= 1599100000000000000
AND timestamp < 1599200000000000000
GROUP BY
bucket;
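For reference, the 300E9 literal is just the bucket width expressed in the table's own unit: 5 minutes = 300 seconds = 300 × 10⁹ nanoseconds, matching the bigint nanosecond timestamps.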
I have managed to solve it by building on Sven's answer. It first uses his function to fill out the gaps, and then date_trunc is called to eliminate the extra rows.
WITH gapfill AS (
SELECT
COALESCE(count(*), 0) as count,
COALESCE(SUM(value1), 0) as sum,
time_bucket_gapfill(300E9::BIGINT, timestamp) as bucket
FROM
playground
WHERE
name = 'test'
AND timestamp >= 1599100000000000000
AND timestamp < 1599300000000000000
GROUP BY
bucket
)
SELECT
SUM(count),
SUM(sum),
date_trunc('quarter', to_timestamp(bucket/1000000000)) as truncated
FROM
gapfill
GROUP BY truncated
ORDER BY truncated ASC

BigQuery query on (custom) timestamp partitioned table returns zero results

I have a table in a dataset with the following schema:
date TIMESTAMP
id INTEGER
...
The table is partitioned on the date column.
Displaying a preview of the table in the BQ UI reveals it has many rows in February:
date, id, ...
2019-02-19 16:18:00 UTC, 534480012, ...
2019-02-19 16:23:00 UTC, 534423879, ...
However, this query returns zero results:
SELECT id FROM `<project>.<dataset>.<table>`
WHERE
TIMESTAMP_SUB(date, INTERVAL 60*24 HOUR) <= `date` AND
TIMESTAMP_SUB(date, INTERVAL 24 HOUR) >= `date`
(And yes, as of this writing, February rows should show up.)
What's more is even the "default" query returns zero results:
SELECT id FROM `<project>.<dataset>.<table>` WHERE date = TIMESTAMP("2019-02-19") LIMIT 1000
No errors in either case. Just empty results. What am I doing wrong?
How can this ever be true?
TIMESTAMP_SUB(date, INTERVAL 24 HOUR) >= `date`
If you subtract 24 hours, then the result will be less than `date` rather than greater than `date`.
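If the intent of the first query was a window relative to the current time (roughly the last 60 days, excluding the most recent 24 hours), the comparisons would need to be against CURRENT_TIMESTAMP() rather than the row's own column; a sketch of that reading:
SELECT id FROM `<project>.<dataset>.<table>`
WHERE
`date` >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 60*24 HOUR) AND
`date` <= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)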
As for your second query, you simply have no timestamps that are exactly at midnight. Presumably, you intend something like:
WHERE DATE(date) = DATE('2019-02-19')
I strongly recommend that you change the name of the column. Naming a column after a SQL keyword is a bad idea. Calling something a "date" when it is really a timestamp is misleading and confusing.

SQL Server : average count of alerts per day, not including days with no alerts

I have a table that acts as a message log, with the two key columns being TIMESTAMP and TEXT. I'm working on a query that grabs all alerts (from TEXT) for the past 30 days (based on TIMESTAMP) and gives a daily average for those alerts.
Here is the query so far:
--goback 30 days start at midnight
declare @olderdate as datetime
set @olderdate = DATEADD(Day, -30, DATEDIFF(Day, 0, GetDate()))
--today at 11:59pm
declare @today as datetime
set @today = dateadd(ms, -3, (dateadd(day, +1, convert(varchar, GETDATE(), 101))))
print @today
--Grab average alerts per day over 30 days
select
avg(x.Alerts * 1.0 / 30)
from
(select count(*) as Alerts
from MESSAGE_LOG
where text like 'The process%'
and text like '%has alerted%'
and TIMESTAMP between @olderdate and @today) X
However, I want to add something that checks whether there were any alerts for a day and, if there are no alerts for that day, doesn't include it in the average. For example, if there are 90 alerts for a month but they're all in one day, I wouldn't want the average to be 3 alerts per day since that's clearly misleading.
Is there a way I can incorporate this into my query? I've searched for other solutions to this but haven't been able to get any to work.
This isn't written for your query, as I don't have any DDL or sample data, so I'm going to provide a very simple example of how you would do this instead.
USE Sandbox;
GO
CREATE TABLE dbo.AlertMessage (ID int IDENTITY(1,1),
AlertDate date);
INSERT INTO dbo.AlertMessage (AlertDate)
VALUES('20190101'),('20190101'),('20190105'),('20190110'),('20190115'),('20190115'),('20190115');
GO
--Use a CTE to count per day:
WITH Tots AS (
SELECT AlertDate,
COUNT(ID) AS Alerts
FROM dbo.AlertMessage
GROUP BY AlertDate)
--Now the average
SELECT AVG(Alerts*1.0) AS DayAverage
FROM Tots;
GO
--Clean up
DROP TABLE dbo.AlertMessage;
You're trying to compute a double-aggregate: The average of daily totals.
Without using a CTE, you can try this as well, which is generalized a bit more to work for multiple months.
--get a list of events per day
DECLARE @Event TABLE
(
ID INT NOT NULL IDENTITY(1, 1)
,DateLocalTz DATE NOT NULL--make sure to handle time zones
,YearLocalTz AS DATEPART(YEAR, DateLocalTz) PERSISTED
,MonthLocalTz AS DATEPART(MONTH, DateLocalTz) PERSISTED
)
/*
INSERT INTO @Event(DateLocalTz)
SELECT DISTINCT CONVERT(DATE, [TIMESTAMP])--presumed to be in your local time zone because you did not specify
FROM dbo.MESSAGE_LOG
WHERE UPPER([TEXT]) LIKE 'THE PROCESS%' AND UPPER([TEXT]) LIKE '%HAS ALERTED%'--case insensitive
*/
INSERT INTO @Event(DateLocalTz)
VALUES ('2018-12-31'), ('2019-01-01'), ('2019-01-01'), ('2019-01-01'), ('2019-01-12'), ('2019-01-13')
--get average number of alerts per alerting day each month
-- (this will not return months with no alerts,
-- use a LEFT OUTER JOIN against a month list table if you need to include uneventful months)
SELECT
YearLocalTz
,MonthLocalTz
,AvgAlertsOfAlertingDays = AVG(CONVERT(REAL, NumDailyAlerts))
FROM
(
SELECT
YearLocalTz
,MonthLocalTz
,DateLocalTz
,NumDailyAlerts = COUNT(*)
FROM @Event
GROUP BY YearLocalTz, MonthLocalTz, DateLocalTz
) AS X
GROUP BY YearLocalTz, MonthLocalTz
ORDER BY YearLocalTz ASC, MonthLocalTz ASC
Some things to note in my code:
I use PERSISTED columns to get the month and year date parts (because I'm lazy when populating tables)
Use an explicit CONVERT to avoid integer math that truncates decimals. Multiplying by 1.0 is a less readable hack.
Use CONVERT(DATE, ...) to round down to midnight instead of converting back and forth between strings
Do case-insensitive string searching by making everything uppercase (or lowercase, your preference)
Don't subtract 3 milliseconds to get the very last moment before midnight. Change your semantics to interpret the end of a time range as exclusive, instead of dealing with the precision of your datatypes. The only difference is using explicit comparators (i.e. use < instead of <=). Also, DATETIME resolution is 1/300th of a second, not 3 milliseconds.
Avoid using built-in keywords as column names (i.e. "TEXT"). If you do, wrap them in square brackets to avoid ambiguity.
Instead of dividing by 30 to get the average, divide by the count of distinct days in your results.
select
avg(x.Alerts * 1.0 / x.dd)
from
(select count(*) as Alerts, count(distinct CAST([TIMESTAMP] AS date)) AS dd
...
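Spliced into the original query, that suggestion looks roughly like this (same filters and variables as above):
select
avg(x.Alerts * 1.0 / x.dd)
from
(select count(*) as Alerts,
count(distinct CAST([TIMESTAMP] AS date)) AS dd
from MESSAGE_LOG
where text like 'The process%'
and text like '%has alerted%'
and TIMESTAMP between @olderdate and @today) x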

How do I add the values of a column together dependent on another column

It's quite a hard one to explain but probably (hopefully) an easy one to solve so I'll just explain what it is I'm trying to achieve.
I have a table where multiple logs can be entered for a day, each as a separate row. I then have a decimal as another column. I'm trying to create a summary for each day, which would be something like:
01/01/1900 | | 5.5
When there's one entry for 01/01/1900 with 2.5 and one with 3 in the main table, so the values are added together for the day.
My only issue is adding the values together when the dates are the same. I was thinking something like
select distinct date and joining it with a table that gets the sum of the decimal column where date is... and that's where I'm not too sure.
Any help would be great! thanks
If your table is named logs with data like
log_date | value
1900-01-01 | 2.5
1900-01-01 | 3
then your query is
SELECT sum(value) FROM logs GROUP BY log_date
What you're looking for is probably a GROUP BY clause.
SELECT [ yourdatecol, ] sum(yourdecimalcol) FROM yourtable
[ WHERE yourdatecol = .. ]
GROUP BY [ get_ymd_from_date(yourdatecol) | yourdatecol ] ;
With this syntax you'll get sums over sets of rows that share the same datecol value. You may also want to truncate the date (e.g. take only the Y/M/D part) if it contains hours/minutes/seconds and what you want is per-day sums. The optional parts are enclosed in square brackets.
SELECT log_date,sum(value) FROM logs GROUP BY log_date
CREATE VIEW Summary
AS
SELECT
DateValue,
SUM(DecimalValue) DayTotal
FROM
EventTable
GROUP BY
DateValue;
Then
SELECT
*
FROM
Summary
WHERE
DateValue = '1900-01-01'
Try this:
SELECT CONVERT(VARCHAR, DateColumn, 103) AS OutputDate, SUM(ValueColumn) AS TotalValue
FROM YourTable
GROUP BY CONVERT(VARCHAR, DateColumn, 103)
I'm presuming a DateTime is used, let's call it logdate. I'm also presuming the other one is a decimal, let's call it logdecimal.
Using SQL Server 2008 you can do this (there is a type called date which has no time part):
SELECT
CAST(logdate as date) as TheDay,
SUM(logdecimal) as TheSum
FROM logTable
GROUP BY CAST(logdate as date)
Using a SQL Server version without the date type, maybe something like:
SELECT
CONVERT(varchar(10), logdate, 101) as TheDay,
SUM(logdecimal) as TheSum
FROM logTable
GROUP BY CONVERT(varchar(10), logdate , 101)
Regards, Olle
Edit: This one will work if it is a DateTime (including time part) you want to group as a date (not including time part). Looks like this was not the case in this question.