Reduce/Summarize and Replace Timestamped Records - sql

I have a SQL table that has timestamped records for server performance data. This data is polled and stored every minute for multiple servers. I want to keep data for a long period of time but reduce the number of records for data older than six months.
For example, I have some old records like so:
    Timestamp   Server  CPU  App1  App2
1   ... 00:01   Host1   5    1     10
2   ... 00:01   Host2   10   5     20
3   ... 00:02   Host1   6    0     11
4   ... 00:02   Host2   11   5     20
5   ... 00:03   Host1   4    1     9
6   ... 00:04   Host2   9    6     19
I want to be able to reduce this data from every minute to every 10 minutes or possibly every hour for older data.
My initial assumption is that I'd average the values for times within a 10 minute time period and create a new timestamped record after deleting the old records. Could I create a sql query that generates the insert statements for the new summarized records? What would that query look like?
Or is there a better way to accomplish this summarization job?

You might also want to consider moving the summarized information into a different table so you don't end up in a situation where you're wondering if you're looking at "raw" or summarized data. Other benefits would be that you could include MAX, MIN, STDDEV and other values along with the AVG.
The tricky part is chunking out the times. The best way I could think of was to start with the output from the CONVERT(blah, Timestamp, 120) function:
-- Result: 2015-07-08 20:50:55
SELECT CONVERT(VARCHAR(19), CURRENT_TIMESTAMP, 120)
By cutting it off after the hour or after the 10-minute point you can truncate the times:
-- Hour; result is 2015-07-08 20
SELECT CONVERT(VARCHAR(13), CURRENT_TIMESTAMP, 120)
-- 10-minute point; result is 2015-07-08 20:5
SELECT CONVERT(VARCHAR(15), CURRENT_TIMESTAMP, 120)
With a little more massaging you can fill out the minutes for either one and CAST it back to a DATETIME or DATETIME2:
-- Hour increment
CAST(CONVERT(VARCHAR(13), CURRENT_TIMESTAMP, 120) + ':00' AS DATETIME)
-- 10-minute increment
CAST(CONVERT(VARCHAR(15), CURRENT_TIMESTAMP, 120) + '0' AS DATETIME)
Using the logic above, all times are truncated. In other words, the hour formula will convert Timestamp where 11:00 <= Timestamp < 12:00 to 11:00. The minute formula will convert Timestamp where 11:20 <= Timestamp < 11:30 to 11:20.
So the main part of the query looks like this (I've left out getting rid of the rows you've just summarized):
-- The hour-increment version, one summary row per server per hour
INSERT INTO myTableOrOtherTable
SELECT
CAST(CONVERT(VARCHAR(13), [Timestamp], 120) + ':00' AS DATETIME),
[Server],
AVG(CPU),
AVG(App1),
AVG(App2)
FROM myTable
GROUP BY
CAST(CONVERT(VARCHAR(13), [Timestamp], 120) + ':00' AS DATETIME),
[Server]
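To check the truncate-then-group logic outside the database, here is a minimal Python sketch. It is not the SQL above, just the same idea: floor each timestamp to the start of its bucket, then average per (bucket, server). The function names and the 2015-01-01 date are made up for illustration; the CPU values are the ones from the question.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def floor_to_bucket(ts: datetime, minutes: int) -> datetime:
    """Truncate ts to the start of its bucket; minutes should divide 60 (10, 15, 60, ...)."""
    ts = ts.replace(second=0, microsecond=0)
    return ts - timedelta(minutes=ts.minute % minutes)

def summarize(rows, minutes=10):
    """rows: iterable of (timestamp, server, cpu). Returns {(bucket, server): average cpu}."""
    groups = defaultdict(list)
    for ts, server, cpu in rows:
        groups[(floor_to_bucket(ts, minutes), server)].append(cpu)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

# Sample rows from the question; the date itself is an assumption.
rows = [
    (datetime(2015, 1, 1, 0, 1), "Host1", 5),
    (datetime(2015, 1, 1, 0, 1), "Host2", 10),
    (datetime(2015, 1, 1, 0, 2), "Host1", 6),
    (datetime(2015, 1, 1, 0, 2), "Host2", 11),
    (datetime(2015, 1, 1, 0, 3), "Host1", 4),
    (datetime(2015, 1, 1, 0, 4), "Host2", 9),
]
summary = summarize(rows, minutes=10)
```

Unlike this sketch, T-SQL's AVG over an integer column returns an integer, so you may want to cast the columns to a decimal type before averaging.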

Assuming you have a record for every minute, this is how you can group your records by 10 minutes:
SELECT
[Timestamp] = MIN([Timestamp]),
[Server],
CPU = AVG(CPU),
App1 = AVG(App1),
App2 = AVG(App2)
FROM (
SELECT *,
RN = (ROW_NUMBER() OVER(PARTITION BY [Server] ORDER BY [Timestamp]) - 1) / 10
FROM temp
)t
GROUP BY [Server], RN
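The row-number trick can be mimicked in a few lines of Python, which makes the "record for every minute" assumption easier to see. This is only a sketch of the idea with a hypothetical function name and generated data: rows are chunked by position, not by clock time, so a missing minute shifts every later chunk for that server.

```python
from datetime import datetime, timedelta
from itertools import groupby

def chunk_by_row_number(rows, size=10):
    """rows: list of (timestamp, server, cpu).
    Returns (first_timestamp, server, avg_cpu) for each run of `size` consecutive
    rows per server, mirroring (ROW_NUMBER() - 1) / size with MIN and AVG."""
    out = []
    # Equivalent of PARTITION BY server ORDER BY timestamp
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))
    for server, part in groupby(ordered, key=lambda r: r[1]):
        part = list(part)
        for i in range(0, len(part), size):
            chunk = part[i:i + size]
            out.append((chunk[0][0], server, sum(r[2] for r in chunk) / len(chunk)))
    return out

# 25 one-minute rows for a single server; the values are made up.
base = datetime(2015, 1, 1)
rows = [(base + timedelta(minutes=i), "Host1", i) for i in range(25)]
chunks = chunk_by_row_number(rows)
```

The last chunk holds only the 5 leftover rows, which is also what the SQL version produces for a trailing partial group.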

Related

Averaging event start time from DateTime column

I'm calculating average start times from events that run late at night and may not start until the next morning.
2018-01-09 00:01:38.000
2018-01-09 23:43:22.000
Currently all I can produce is an average of 11:52:30.0000000.
I would like the result to be ~ 23:52.
The times averaged will not remain static, as this event runs daily and I will have new data daily. I will likely take the most recent 10 records and average them.
It would be nice to see the SQL you're running, but you probably just need to format your output properly. It should be something like this:
FORMAT(cast(<your column> as time), N'hh\:mm(24h)')
The following will both compute the average across the datetime field and also return the result as a 24hr time notation only.
SELECT CAST(CAST(AVG(CAST(<YourDateTimeField_Here> AS FLOAT)) AS DATETIME) AS TIME) [AvgTime] FROM <YourTableContaining_DateTime>
The following will calculate the average time of day, regardless of what day that is.
--SAMPLE DATA
create table #tmp_sec_dif
(
sample_date_time datetime
)
insert into #tmp_sec_dif
values ('2018-01-09 00:01:38.000')
, ('2018-01-09 23:43:22.000')
--ANSWER
declare @avg_sec_dif int
set @avg_sec_dif =
(select avg(a.sec_dif) as avg_sec_dif
from (
--put the value in terms of seconds away from 00:00:00
--where 23:59:00 would be -60 and 00:01:00 would be 60
select iif(
datepart(hh, sample_date_time) < 12 --is it morning?
, datediff(s, '00:00:00', cast(sample_date_time as time)) --if morning
, datediff(s, '00:00:00', cast(sample_date_time as time)) - 86400 --if evening
) as sec_dif
from #tmp_sec_dif
) as a
)
select cast(dateadd(s, @avg_sec_dif, '00:00:00') as time) as avg_time_of_day
The output would be an answer of 23:52:30.0000000
This code allows you to define a date division point. e.g. 18 identifies 6pm. The time calculation would then be based on seconds after 6pm.
-- Defines the hour of the day when a new day starts
DECLARE @DayDivision INT = 18
IF OBJECT_ID(N'tempdb..#StartTimes') IS NOT NULL DROP TABLE #StartTimes
CREATE TABLE #StartTimes(
start DATETIME NOT NULL
)
INSERT INTO #StartTimes
VALUES
('2018-01-09 00:01:38.000')
,('2018-01-09 23:43:22.000')
SELECT
-- 3. Add the number of seconds to a day starting at the
-- day division hour, then extract the time portion
CAST(DATEADD(SECOND,
-- 2. Average number of seconds
AVG(
-- 1. Get the number of seconds from the day division point (@DayDivision)
DATEDIFF(SECOND,
CASE WHEN DATEPART(HOUR,start) < @DayDivision THEN
SMALLDATETIMEFROMPARTS(YEAR(DATEADD(DAY,-1,start)),MONTH(DATEADD(DAY,-1,start)),DAY(DATEADD(DAY,-1,start)),@DayDivision,0)
ELSE
SMALLDATETIMEFROMPARTS(YEAR(start),MONTH(start),DAY(start),@DayDivision,0)
END
,start)
)
,'01 jan 1900 ' + CAST(@DayDivision AS VARCHAR(2)) + ':00') AS TIME) AS AverageStartTime
FROM #StartTimes
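The three numbered steps are easy to sanity-check outside SQL. The following Python sketch is not a translation of the query, only the same arithmetic: measure each time as seconds after the division hour (wrapping past midnight), average, and add the result back to the division hour. The function name is hypothetical and fractional seconds are ignored.

```python
from datetime import datetime, time

def average_time_of_day(stamps, day_division_hour=18):
    """Average the time of day of `stamps`, treating day_division_hour as where a 'day' starts."""
    division = day_division_hour * 3600
    offsets = []
    for ts in stamps:
        seconds = ts.hour * 3600 + ts.minute * 60 + ts.second
        # 1. Seconds elapsed since the most recent division point (modulo wraps past midnight)
        offsets.append((seconds - division) % 86400)
    # 2. Average number of seconds (integer division, fractions dropped)
    avg = sum(offsets) // len(offsets)
    # 3. Add the average back to the division hour and keep only the time portion
    total = (division + avg) % 86400
    return time(total // 3600, (total % 3600) // 60, total % 60)

starts = [datetime(2018, 1, 9, 0, 1, 38), datetime(2018, 1, 9, 23, 43, 22)]
result = average_time_of_day(starts, day_division_hour=18)  # time(23, 52, 30)
```

Any division hour that keeps all the start times on the same side of the cut gives the same answer for this data; a noon cut (12) behaves like the morning/evening answer above.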

SQL splitting time when over midnight

I am after some guidance on the best way to get useful information out of our MIS database
Scenario:- I want to check staff utilisation by a variable period that I can drill down into. This needs to then be split into days so I can assess over a 24 hour period what was done
The table is huge and has loads of columns we need to calculate, so ideally I need to split the records that span 2 days into 2
The table has a datetimeformat field that has user [starttime], it then has a separate field that has [duration] which is in decimal hours.
So an example would be:
ID StartTime Duration Qty username
1 2016-11-24 23:00:00 2.00 1000 Joe Bloggs
In the example above Joe starts at 11pm and works till 1am, so what I need is to somehow split this record in my query to put anything before midnight as 1 record and anything after into another. This example is pretty simple as it is half/half, but some might start at 10pm and finish at 6am, so I would need 2 hours and 6 hours.
Not sure of the best way to do this; my initial thought was to create a CTE where a start time is in 1 day and, if the starttime + duration was in the next day, then split the record.
Not sure if there is an easier way or if anyone has had to do this before.
Any help appreciated
@Joe has the right idea; here is pseudo-SQL:
SELECT ID,StartTime,Duration,Qty,username
WHERE TRUNCATE(StartTime,DAY) = TRUNCATE(StartTime + Duration hours ,DAY)
UNION
SELECT ID,StartTime, TRUNCATE(StartTime,DAY) + 1 days - StartTime hours ,Qty,username
WHERE TRUNCATE(StartTime,DAY) < TRUNCATE(StartTime+Duration hours,DAY)
UNION
SELECT ID,TRUNCATE(StartTime+Duration hours,DAY),StartTime + Duration hours - TRUNCATE(StartTime+Duration hours,DAY),Qty,username
WHERE TRUNCATE(StartTime,DAY) < TRUNCATE(StartTime+Duration hours,DAY)
Where TRUNCATE(timestamp,DAY) truncates a timestamp to YYYY-MM-DD 00:00:00
You can multiply rows with a join. Make a tally table, a simple table with the numbers 1, 2, 3..., and join to it. I will use a table starting at zero here:
CREATE TABLE Tally0 (Number INT IDENTITY(0,1) PRIMARY KEY NOT NULL);
GO
INSERT INTO Tally0 DEFAULT VALUES;
GO 10000
Now the hardest part is the conversion between dates and numerics:
;WITH
tmp1 AS (SELECT *,
DATEDIFF(SECOND, CONVERT(DATE, StartTime), StartTime)/3600.0
+ DATEPART(NANOSECOND, StartTime)/(3600*1000000000.0) AS startingHours
FROM Record),
tmp2 AS (SELECT *,
startingHours + Duration AS endingHours,
(startingHours + Duration)/24.0 AS endingDays
FROM tmp1)
SELECT *,
CASE WHEN Number = 0 THEN StartTime
ELSE DATEADD(DAY, Number, CONVERT(DATE, StartTime))
END AS StartTime2,
CASE WHEN Number = 0 AND 1 < endingDays THEN 24 - startingHours
WHEN Number = 0 THEN Duration
WHEN Number + 1 < endingDays THEN 24
ELSE endingHours - Number * 24
END AS Duration2
FROM tmp2
JOIN Tally0 ON Number < endingDays
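If it helps to see the splitting rule on its own, here is a small Python sketch of the same idea (one output row per calendar day touched). It is an illustration under the question's assumptions: a start datetime plus a duration in decimal hours. The function name is made up.

```python
from datetime import datetime, timedelta

def split_by_day(start: datetime, duration_hours: float):
    """Return [(segment_start, segment_hours), ...] with one segment per calendar day."""
    end = start + timedelta(hours=duration_hours)
    segments = []
    cursor = start
    while cursor < end:
        # Midnight that closes the day `cursor` falls in
        next_midnight = datetime.combine(cursor.date(), datetime.min.time()) + timedelta(days=1)
        segment_end = min(end, next_midnight)
        segments.append((cursor, (segment_end - cursor).total_seconds() / 3600))
        cursor = segment_end
    return segments

# Joe's record from the question: 23:00 for 2 hours
joe = split_by_day(datetime(2016, 11, 24, 23, 0), 2.0)
# A 10pm to 6am shift: 2 hours before midnight, 6 after
night = split_by_day(datetime(2016, 11, 24, 22, 0), 8.0)
```

Quantities such as Qty would still need to be apportioned between the segments (for example in proportion to the hours); that decision is not covered here.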

How do you filter to get data that is in between a certain time of day in IBM DB2

I am trying to add a filter condition in the DB2 database. I am new to it and come from an Oracle background. I am trying to get records with dates in between today at 4 AM and today at 5 PM only. I currently have the below query that returns zero results:
db2 => select datetimeColumn from datetimeExample WHERE datetimeColumn BETWEEN timestamp(current date) - 1 day + 4 hour AND timestamp(current date) - 1 day + 13 hour
DATETIMECOLUMN
--------------------------
0 record(s) selected.
Here is the data in the table that I believe should show, but there is something wrong with the condition. Any help is appreciated.
db2 => select * from datetimeExample
DATETIMECOLUMN
--------------------------
2016-06-16-09.38.53.759000
1988-12-25-17.12.30.000000
2016-12-25-17.10.30.000000
2016-06-16-04.10.30.000000
2016-06-16-05.10.30.000000
1988-12-25-15.12.30.000000
1988-12-25-14.12.30.000000
2016-06-16-12.10.30.000000
2016-06-16-07.10.30.000000
2016-06-16-08.10.30.000000
10 record(s) selected.
The query should work when you leave out the - 1 day. The reason is that timestamp(current date) returns the timestamp for today at zero hours. Then you add 4 hours and are at the required start time. Similar maths for the end time (and 5 pm should be + 17 hours).
select datetimeColumn from datetimeExample
WHERE datetimeColumn
BETWEEN timestamp(current date) + 4 hours AND timestamp(current date) + 17 hours
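The arithmetic can be checked with a short Python sketch: start from today's midnight and add the hour offsets. This assumes the query was run on 2016-06-16, which the sample data suggests but the question does not state; the function name is made up.

```python
from datetime import date, datetime, time, timedelta

def todays_window(today: date, start_hour=4, end_hour=17):
    """Return (start, end) datetimes for today, measured from midnight."""
    midnight = datetime.combine(today, time.min)
    return midnight + timedelta(hours=start_hour), midnight + timedelta(hours=end_hour)

# Sample rows from the question
rows = [
    datetime(2016, 6, 16, 9, 38, 53, 759000),
    datetime(1988, 12, 25, 17, 12, 30),
    datetime(2016, 12, 25, 17, 10, 30),
    datetime(2016, 6, 16, 4, 10, 30),
    datetime(2016, 6, 16, 5, 10, 30),
    datetime(1988, 12, 25, 15, 12, 30),
    datetime(1988, 12, 25, 14, 12, 30),
    datetime(2016, 6, 16, 12, 10, 30),
    datetime(2016, 6, 16, 7, 10, 30),
    datetime(2016, 6, 16, 8, 10, 30),
]

today = date(2016, 6, 16)  # assumed run date
lo, hi = todays_window(today)
matches = [r for r in rows if lo <= r <= hi]  # BETWEEN is inclusive on both ends

# The original query's window: yesterday 04:00 to yesterday 13:00
old_lo, old_hi = todays_window(today - timedelta(days=1), 4, 13)
old_matches = [r for r in rows if old_lo <= r <= old_hi]
```

With that run date the corrected window picks up the six rows from 2016-06-16, while the original window lands entirely on the previous day and matches nothing.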

How to count number of records present in a date range between a fixed time in SQL Server?

I need to get the number of records present in my [RecordsTable] for the last 3 months.
However the catch is I need the records which are processed between 10PM and 2AM.
For example --
07/01/2015 10PM -- 07/02/2015 2AM
07/02/2015 10PM -- 07/03/2015 2AM
07/03/2015 10PM -- 07/04/2015 2AM
The below SQL gives me the records present on any particular day starting from May,2015.
But I am not able to get the timing(10PM-2AM of next day) embedded in the SQL and need some help.
SELECT CONVERT(VARCHAR(12), RecordDate, 101),count(RecordID)
FROM [RecordsTable](NOLOCK)
WHERE RecordDate > '2015-05-01'
GROUP BY CONVERT(VARCHAR(12), RecordDate, 101)
MSSQL Supports both Date and Time datatypes. You can break up your where statement to reflect both date and time conditions separately.
SELECT COUNT(Records)
FROM TABLE
WHERE CONVERT(Date,DateCol) BETWEEN 'MM/DD/YYYY' AND 'MM/DD/YYYY'
AND CONVERT(Time,DateCol) BETWEEN 'HH:MM:SS' AND 'HH:MM:SS'
Try the following:
SELECT count(1)
FROM RecordsTable
WHERE RecordDate > '2015-05-01'
AND NOT DATEPART(hour, RecordDate) BETWEEN 2 AND 21
I assume RecordDate is a datetime or datetime2 column. BETWEEN 2 AND 21 returns rows where the hour of RecordDate is between 2am and 9pm, inclusive. NOT BETWEEN 2 AND 21 returns the reverse, giving you data for the hours of 10pm, 11pm, 12am, and 1am. This does not include any time between 2:00am and 2:59am. If you need to include events that occurred precisely at but not after 2:00am, things get a bit trickier, but similar code based on NOT BETWEEN would apply.
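The wrap-around test is easy to get backwards, so here is a tiny Python sketch of the predicate, assuming a half-open window from 10pm up to but not including 2am. The function name and defaults are illustrative only.

```python
from datetime import datetime

def in_night_window(ts: datetime, start_hour=22, end_hour=2) -> bool:
    """True when ts falls in [start_hour, end_hour); the window may wrap past midnight."""
    if start_hour <= end_hour:
        # Ordinary window inside a single day
        return start_hour <= ts.hour < end_hour
    # Wrapping window: late evening OR early morning
    return ts.hour >= start_hour or ts.hour < end_hour

samples = [
    datetime(2015, 7, 1, 21, 59, 59),  # just before the window
    datetime(2015, 7, 1, 22, 0, 0),    # window opens
    datetime(2015, 7, 2, 0, 30, 0),    # after midnight
    datetime(2015, 7, 2, 1, 59, 59),   # last second inside
    datetime(2015, 7, 2, 2, 0, 0),     # window closed
]
flags = [in_night_window(ts) for ts in samples]
```

Note the OR for the wrapping case: an AND between the two hour comparisons can never be true.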
To get records in the last 3 months you can use two ways -- one by month looks like this
WHERE MONTH(colname) >= MONTH(GETDATE()) -3
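This gets you whole calendar months but not partial months. Note that MONTH() ignores the year, so this filter also matches the same months in earlier years and misbehaves in January through March, when the subtraction yields zero or a negative number. Getting partial months is a bit trickier because you could mean (for example, for today) the 9th day of 3 months ago, or you could mean 90 days ago. In the first case this works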
WHERE colname >= dateadd(month,-3, getdate())
and for 90 days ago
WHERE colname >= dateadd(day,-90, getdate())
To get between 10PM and 2AM use this
WHERE datepart(hour,colname) >= 22 OR datepart(hour,colname) < 2 -- hours 22, 23, 0 and 1; use <= 2 to also include 2:00-2:59
Use DATEPART
SELECT COUNT(1)
FROM Table1
WHERE RecordDate > '2015-05-01'
AND (DATEPART(HOUR, RecordDate) < 2 OR DATEPART(HOUR, RecordDate) >= 22) -- use <= 2 to also include 2:00-2:59
Try this
SELECT count(*) FROM tablename where created_at>='2015-03-17 07:15:10' and created_at<='2015-07-09 02:23:50';
You can even use between
SELECT count(*) FROM tablename where created_at between '2015-03-17 07:15:10' and '2015-07-09 02:26:50';
In MySQL you can use curdate() to get today's date; the SQL Server equivalent is CAST(GETDATE() AS DATE).

SQL select one row for every n minutes

I'm using Microsoft SQL Server 2008, and have a data set that has entries every few minutes over a long period of time. I am using a program to graph the data, so I need to return about 20 values per hour. Some days the data is every minute, sometimes every five minutes, and sometimes every 8 or 9 minutes, so selecting every nth row won't give an even spread over time.
eg for a sample in 2012, it looks like this :
DateTime
2012-01-01 08:00:10.000
2012-01-01 08:08:35.000
2012-01-01 08:17:01.000
2012-01-01 08:25:26.000
and for a sample the next year it looks like this:
DateTime
2013-07-20 08:00:00.000
2013-07-20 08:01:00.000
2013-07-20 08:02:00.000
2013-07-20 08:03:00.000
2013-07-20 08:04:00.000
at the moment I am using a statement like this:
SELECT * FROM [Master]
WHERE (((CAST(DATEPART(hour, DateTime)as varchar(2)))*60)
+CAST(DATEPART(minute, DateTime)as varchar(2))) % '5' = '0'
ORDER BY DateTime
This works fine for July 2013, but I miss most points in 2012, as it returns this:
DateTime
2012-01-01 08:00:10.000
2012-01-01 08:25:26.000
2012-01-01 08:50:43.000
2012-01-01 09:15:59.000
2012-01-01 10:40:14.000
2012-01-01 11:05:30.000
What better way is there to do this?
EDIT: The table has a DateTime column, and a pressure column, and I need to output both and graph pressure against date and time.
Since they can be random for the hours, this should work for what you need:
Declare @NumberPerHour Int = 20
;With Cte As
(
Select DateTime, Row_Number() Over (Partition By DateDiff(Hour, 0, DateTime) Order By NewId()) RN
From Master
)
Select DateTime
From Cte
Where RN <= @NumberPerHour
Order By DateTime Asc
This will group the rows by the hour, assign a random Row_Number to each row within its hour, and only pull those with a Row_Number less than or equal to the number you're looking for per hour.
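For anyone who wants to try the idea outside SQL Server, here is a rough Python sketch of the same approach: bucket the timestamps by hour, then keep a random handful from each bucket. The function name, the seed parameter, and the generated data are all assumptions for illustration.

```python
import random
from collections import defaultdict
from datetime import datetime, timedelta

def sample_per_hour(stamps, per_hour=20, seed=None):
    """Keep at most `per_hour` randomly chosen timestamps from each clock hour."""
    rng = random.Random(seed)
    by_hour = defaultdict(list)
    for ts in stamps:
        # Same role as PARTITION BY DATEDIFF(HOUR, 0, DateTime)
        by_hour[ts.replace(minute=0, second=0, microsecond=0)].append(ts)
    picked = []
    for bucket in by_hour.values():
        # Same role as ORDER BY NEWID() with RN <= per_hour
        picked.extend(rng.sample(bucket, min(per_hour, len(bucket))))
    return sorted(picked)

# Two dense hours (one row per minute) and one sparse hour (one row every 8 minutes)
dense = [datetime(2013, 7, 20, 8, 0) + timedelta(minutes=i) for i in range(120)]
sparse = [datetime(2012, 1, 1, 8, 0, 10) + timedelta(minutes=8 * i) for i in range(7)]
picked = sample_per_hour(dense + sparse, per_hour=20, seed=0)
```

Sparse hours keep every row and dense hours are thinned to 20, but because the pick is random the surviving points in a dense hour are not guaranteed to be evenly spaced.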