Count aggregated data on every nth hour - sql

I have some data generated over time. I used the query below to count the number of "interactions" that happened in each hour.
SELECT COUNT(*) as Quantity, FORMAT(cast(InteractionDate as datetime2), 'yyyy-MM-dd HH') as Datum
FROM Interaction as i
INNER JOIN Mission as mi
on i.MissionID=mi.MissionID
WHERE InteractionDate between '2015-01-13 12' AND '2015-01-22 12'
GROUP BY FORMAT(cast(InteractionDate as datetime2), 'yyyy-MM-dd HH')
ORDER BY Datum
The query above gives me this:
116 | 2015-01-15 00
37 | 2015-01-15 01
17 | 2015-01-15 02
Now I want to get the aggregated number of interactions for every nth hour. Let's say I want every 3rd hour; for the data provided I would get:
170 | 2015-01-15 02
How can I do that?

You could group by date and hour separately, which lets you use expressions on the hour. For example:
GROUP BY cast(InteractionDate as date), (DATEPART(hour, InteractionDate) / 4)
This would give you midnight to 4am in the first bucket, 4am to 8am in the next, etc. Divide by 3 instead to get the 3-hour buckets from the question.

You can aggregate data by any period of time by computing the interval with DATEDIFF and then applying integer division, like this:
group by datediff(hour, '1990-01-01T00:00:00', yourDatetime) / 3
The math: DATEDIFF returns the whole number of hours since the base date, and integer division by 3 yields the same result for each run of 3 consecutive hours. That result is then used to group the data.
You choose the base date-time. The only part that matters here is its time component, which sets the starting point of the 3-hour intervals. With the base above, the intervals are [00:00 to 03:00], [03:00 to 06:00] and so on. If you need different intervals, use a different base date-time: for example, '1990-01-01T01:00:00' would give you the intervals [01:00 to 04:00], [04:00 to 07:00], and so on.
For further details, see this full answer: Group DateTime into 5,15,30 and 60 minute intervals
That answer shows how to display the start and end date-time of each interval alongside the aggregated values, and gives a deeper insight into this solution.
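Putting the pieces together for the question's table, a minimal sketch for SQL Server might look like the following. It assumes InteractionDate is (or converts implicitly to) a date/time type. The aliases i and b and the column name BucketStart are made up for illustration, the join to Mission is left out for brevity, and the WHERE clause uses a half-open range instead of the question's BETWEEN.

```sql
-- Sketch only: counts interactions per 3-hour bucket and labels each bucket
-- with its start time. BucketStart, i and b are illustrative names.
SELECT COUNT(*) AS Quantity,
       b.BucketStart
FROM Interaction AS i
CROSS APPLY (
    -- Whole 3-hour blocks since the base date, converted back to a date-time.
    SELECT DATEADD(hour,
                   (DATEDIFF(hour, '1990-01-01T00:00:00', i.InteractionDate) / 3) * 3,
                   '1990-01-01T00:00:00') AS BucketStart
) AS b
WHERE i.InteractionDate >= '2015-01-13T12:00:00'
  AND i.InteractionDate <  '2015-01-22T12:00:00'
GROUP BY b.BucketStart
ORDER BY b.BucketStart;
```

With the three sample rows from the question (hours 00, 01 and 02 of 2015-01-15), all three land in the bucket starting at 2015-01-15 00:00, giving 116 + 37 + 17 = 170. Note that this labels each bucket by its first hour, whereas the question's desired output labels it by the last one.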

Related

How can I calculate the number of minutes per day between a daterange

First off, I apologize: I do not even know where to start and haven't been able to find anything specific to this particular question.
I have a table with datetimes (start and end) and I need to find a way to get the minutes/hours between those dates. It could be either a sum of the time on weekdays or some kind of pivot on each day, grouped by the ID number. I had thought of assigning a value to the number of days, but the times are random and do not start/end at midnight, so I am at a loss as to how to approach this.
Here are some examples of the date/time format if that helps.
startdate 2018-12-14 10:53:01
enddate 2018-12-27 11:50:00
Any help or hints would be greatly appreciated!
Edit
I forgot to mention that I am working in SQL Server (SSMS).
Editing For Additional Clarification
Here is a sample date range with an ID number, I wanted to keep it simple.
ID number | start time       | end time
1         | 12/14/2018 10:53 | 12/17/2018 12:00
Here is what I'm trying to achieve (each date range/ID # split into one row per day):
ID number | start time       | end time         | mins
1         | 12/14/2018 10:53 | 12/14/2018 23:59 | 786
1         | 12/15/2018 0:00  | 12/15/2018 23:59 | 1439
1         | 12/16/2018 0:00  | 12/16/2018 23:59 | 1439
1         | 12/17/2018 0:00  | 12/17/2018 12:00 | 720
The DATEDIFF function with the MINUTE datepart returns the difference in minutes between two datetime columns. As shown below, the second argument is the start date and the third is the end date; the result is the number of the specified units (days, minutes, etc.) from the start date to the end date. To get the number of hours between the two columns, use the HOUR datepart instead. The result can also be grouped and aggregated, as in the second example.
DATEDIFF:
SELECT DATEDIFF(MINUTE, StartDateColumn, EndDateColumn) as DifferenceInMinutes
FROM YourSchema.YourTable
DATEDIFF with Grouping and Aggregation:
SELECT ColumnA, SUM(DATEDIFF(MINUTE, StartDateColumn, EndDateColumn)) as DifferenceInMinutes
FROM YourSchema.YourTable
GROUP BY ColumnA
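The per-day split shown in the question is not covered by DATEDIFF alone. One possible way to get it in SQL Server is a recursive CTE that walks each range one midnight at a time; this is a sketch under assumptions, not the answer's own method. The table dbo.Ranges and its columns ID, StartTime and EndTime are hypothetical names, and the date columns are assumed to be datetime or datetime2.

```sql
-- Sketch only: dbo.Ranges(ID, StartTime, EndTime) is a hypothetical table.
WITH Days AS (
    -- Anchor: midnight at the start of the first calendar day of each range.
    SELECT ID, StartTime, EndTime,
           CAST(CAST(StartTime AS date) AS datetime2) AS DayStart
    FROM dbo.Ranges
    UNION ALL
    -- Step: advance one day while the next midnight is still before EndTime.
    SELECT ID, StartTime, EndTime, DATEADD(day, 1, DayStart)
    FROM Days
    WHERE DATEADD(day, 1, DayStart) < EndTime
)
SELECT d.ID, s.SliceStart, s.SliceEnd,
       DATEDIFF(minute, s.SliceStart, s.SliceEnd) AS Mins
FROM Days AS d
CROSS APPLY (
    -- Clip each calendar day to the part that lies inside the range.
    SELECT CASE WHEN d.StartTime > d.DayStart THEN d.StartTime
                ELSE d.DayStart END AS SliceStart,
           CASE WHEN d.EndTime < DATEADD(day, 1, d.DayStart) THEN d.EndTime
                ELSE DATEADD(day, 1, d.DayStart) END AS SliceEnd
) AS s
ORDER BY d.ID, s.SliceStart
OPTION (MAXRECURSION 0);  -- lift the default limit of 100 recursion levels
```

In this sketch each slice ends at the following midnight rather than at 23:59, so a full day counts as 1440 minutes. For the sample row, the first slice (12/14 10:53 to 12/15 0:00) comes out as 787 minutes and the last (12/17 0:00 to 12:00) as 720.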

filter time periods in redshift

How can I filter a time-of-day period from a datetime column in SQL?
I have a table with product, date time and quantity.
The date time values run from 00:00 to 24:00, but the requirement is to filter on a given time range, e.g. from 08:05 to 14:25. Please suggest an approach.
If this column is a sort key, filter on the date range first and then on the time range, so that you get the benefit of the sort key, e.g.
WHERE purchase_time > '2017-10-01' AND DATE_PART('hour', purchase_time) BETWEEN 8 and 9
If you need to be more granular you could do something like:
WHERE purchase_time > '2017-10-01' AND (DATE_PART('hour', purchase_time) * 100 + DATE_PART('minute', purchase_time)) BETWEEN 805 and 1425
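As a fuller illustration of the second filter, here is a minimal sketch of a complete query. The table name purchases, the column names product and quantity, and the upper date bound are assumptions made for the example; only purchase_time and the 08:05 to 14:25 window come from the thread.

```sql
-- Sketch only: "purchases" and its columns are hypothetical names.
SELECT product, purchase_time, quantity
FROM purchases
WHERE purchase_time >= '2017-10-01'   -- date filter first, to use the sort key
  AND purchase_time <  '2017-11-01'
  -- Encode the time of day as hour * 100 + minute: 08:05 -> 805, 14:25 -> 1425.
  AND DATE_PART('hour', purchase_time) * 100
      + DATE_PART('minute', purchase_time) BETWEEN 805 AND 1425;
```

Because seconds are ignored, the window is inclusive at both ends and runs from 08:05:00 through 14:25:59 on each day.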

Time looping an average

I have a table with 17,000 records, ordered by time and spaced at 15-minute intervals. The time values repeat every 24 hours, so, for example, I could have 100 records that are all at 1 AM, just on different days. I want to create an 'average day' by taking those 100 records at 1 AM and averaging them to get the averaged 1 AM value.
I don't know how to format the table to make it show up nicely here.
I'm assuming you want to calculate the average value per time interval regardless of the day in a query. You could use this SQL to group your table by Time interval only (assuming that it's separate from the date field), and average whichever fields you want to average. Do not select or group by the date field, just select and group by the time field.
SELECT TimeField
, AVG([Field1ToAverage])
, AVG([Field2ToAverage])
FROM MyTable
GROUP BY TimeField;
If the date and time fields are stored together in the same column, you will have to extract the time only:
SELECT TimeValue([DateTimeField])
, AVG([Field1ToAverage])
, AVG([Field2ToAverage])
FROM MyTable
GROUP BY TimeValue([DateTimeField]);

Vertica date series is starting one month before specified date

I work with a Vertica database and I needed to make a query that, given two dates, would give me a list of all months between said dates. For example, if I were to give the query 2015-01-01 and 2015-12-31, it would output me the following list:
2015-01-01
2015-02-01
2015-03-01
2015-04-01
2015-05-01
2015-06-01
2015-07-01
2015-08-01
2015-09-01
2015-10-01
2015-11-01
2015-12-01
After a bit of digging, I was able to discover the following query:
SELECT date_trunc('MONTH', ts)::date as Mois
FROM
(
SELECT '2015-01-01'::TIMESTAMP as tm
UNION
SELECT '2015-12-31'::TIMESTAMP as tm
) as t
TIMESERIES ts as '1 month' OVER (ORDER BY tm)
This query works and gives me the following output:
2014-12-01
2015-01-01
2015-02-01
2015-03-01
2015-04-01
2015-05-01
2015-06-01
2015-07-01
2015-08-01
2015-09-01
2015-10-01
2015-11-01
2015-12-01
As you can see, by giving the query a starting date of '2015-01-01', or anywhere in January for that matter, I end up with an extra entry, namely 2014-12-01. In itself, the bug (or whatever you want to call this unexpected behavior) is easy to circumvent (just start in February), but I have to admit my curiosity is piqued. Why exactly is the series starting one month BEFORE the date I specified?
EDIT: Alright, after reading Kimbo's warning and confirming that indeed, long periods will eventually cause problems, I was able to come up with the following query that readjusts the dates correctly.
SELECT ts as originalMonth,
ts +
(
mod
(
day(first_value(ts) over (order by ts)) - day(ts) + day(last_day(ts)),
day(last_day(ts))
)
) as adjustedMonth
FROM
(
SELECT ts
FROM
(
SELECT '2015-01-01'::TIMESTAMP as tm
UNION
SELECT '2018-12-31'::TIMESTAMP as tm
) as t
TIMESERIES ts as '1 month' OVER (ORDER BY tm)
) as temp
The only problem is that I have no control over the initial day of the first record of the series. Vertica sets it automatically to the current day, so I wonder how this query will behave if I run it on the 31st of the month. I guess I'll just have to wait for December to see, unless someone knows how to get TIMESERIES to behave in a way that would let me test it.
EDIT: Okay, so after trying out many different date combinations, I was able to determine that the day on which the series starts changes depending on the date you specify. This caused a whole lot of problems... until we decided to go the simple way. Instead of using a month interval, we used a day interval and selected only one specific day per month. WAY simpler, and it works all the time. Here's the final query:
SELECT ts as originalMonth
FROM
(
SELECT ts
FROM
(
SELECT '2000-02-01'::TIMESTAMP as tm
UNION
SELECT '2018-12-31'::TIMESTAMP as tm
) as t
TIMESERIES ts as '1 day' OVER (ORDER BY tm)
) as temp
where day(ts) = 1
I think it boils down to this statement from the doc: http://my.vertica.com/docs/7.1.x/HTML/index.htm#Authoring/SQLReferenceManual/Statements/SELECT/TIMESERIESClause.htm
TIME_SLICE can return the start or end time of a time slice, depending
on the value of its fourth input parameter (start_or_end). TIMESERIES,
on the other hand, always returns the start time of each time slice.
When you define a time interval with some start date (2015-01-01, for example), TIMESERIES ts AS '1 month' creates a first time slice that starts up to 1 month before that first data point, so in December 2014. Applying DATE_TRUNC('MON', ts) then of course sets the first date value to 2014-12-01, even if your start date is 2015-01-03 or whatever.
Edit: I want to add one more warning. Your use of DATE_TRUNC achieves what you need, I think. But, from the doc: "Unlike TIME_SLICE, the time slice length and time unit expressed in [TIMESERIES] length_and_time_unit_expr must be constants so gaps in the time slices are well-defined." This means that '1 month' is actually exactly 30 days, which obviously causes problems if the series spans more than a couple of years.

Number of specific one-hour periods between two date/times

I have a table of records, call it "game".
It has an id and timestamp.
What I need to know is unrelated to the table specifically. In order to know the average number of games played per hour, I need to know:
Total games played for each hour over the date range
Number of hourly periods between the date range.
Finding the first is a matter of extracting the hour from the timestamp and grouping by it.
For the second, if the date range was rounded to the nearest day, finding this value would be easy (totalgames/numdays).
Unfortunately I can't assume this. What I need help with is finding the number of specific hour periods existing within a time range.
Example:
If the range is 5 PM today to 8 PM tomorrow, there is one "00" hour (midnight to 1 AM), but two 17, 18, 19 hours (5-6, 6-7, 7-8)
Thanks for the help
Edit: for clarity, consider the following query:
I have table game:
id, daytime
select EXTRACT(hour from daytime) as hour_period, count (*)
from game
where daytime > dateFrom and daytime < dayTo
group by hour_period
This will give me the number of games played broken down into hourly chunks for the time period.
In order to find the average games played per hour, I need to know exactly how many specific hour durations are between two timestamps. Simply dividing by the number of days is not accurate.
Edit: The ideal output will look something like this:
00 275
01 300
02 255
...
Consider the following: How many times does midnight occur between date 1 and date 2 ? If you have 1.5 days, that doesn't guarantee that midnight will occur twice. 6 AM today to 6 PM tomorrow night, for example, has 1 midnight, but 9PM tonight to 9 AM two days from now has 2 midnights.
What I'm trying to find is how many of the EXACT HOUR occurs between two timestamps, so I can use it to average the number of games played at THAT HOUR over a time period.
EDIT:
The following query gets the days, hours, and # of games, giving an output as below:
29 23 100
29 00 130
30 22 140
30 23 150
Then, the outer query adds up the number of games for each distinct hour and divides by the number of hours, as follows
22 140
23 125
00 130
The modified query is below:
SELECT
hour_period,
sum(hourly_no_of_games) / count(hour_period)
FROM
(
SELECT
EXTRACT(DAY from daytime) as day_period,
EXTRACT(HOUR from daytime) as hour_period,
count (*) hourly_no_of_games
from game
where daytime > dateFrom and daytime < dayTo
group by EXTRACT(DAY from daytime), EXTRACT(HOUR from daytime)
) hourly_data
GROUP BY hour_period
ORDER BY hour_period;
SQL Fiddle demo
If you need something to GROUP BY, you can truncate the timestamp to the level of hour, as in the following:
DECLARE @Date DATETIME
SET @Date = GETDATE()
SELECT @Date, DATEADD(Hour, DATEDIFF(Hour, 0, @Date), 0) AS RoundedDate
If you just need to find the total hours, you can just select the DATEDIFF in hours, such as with
SELECT DATEDIFF(Hour, '5/29/2014 20:01:32.999', GETDATE())
Extract not only the hour of the day but the day of the year (1-366). Then group on those. If there is the possibility the interval could span a year, then add the year itself and group by all three.
year dy hr games
2013 365 23 115
2014 1 00 103
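The year / day-of-year / hour grouping described above can be sketched as follows. PostgreSQL syntax is assumed here (the thread does not name the database), the date range is an arbitrary example, and the aliases yr, dy, games and avg_games_per_hour are illustrative.

```sql
-- Sketch only, PostgreSQL syntax assumed; the date range is an example.
SELECT hour_period,
       avg(games) AS avg_games_per_hour
FROM (
    -- One row per distinct (year, day of year, hour) that has games.
    SELECT EXTRACT(year FROM daytime) AS yr,
           EXTRACT(doy  FROM daytime) AS dy,
           EXTRACT(hour FROM daytime) AS hour_period,
           count(*) AS games
    FROM game
    WHERE daytime >= TIMESTAMP '2013-12-31 17:00'
      AND daytime <  TIMESTAMP '2014-01-01 20:00'
    GROUP BY 1, 2, 3
) AS hourly_data
GROUP BY hour_period
ORDER BY hour_period;
```

Grouping on year and day of year keeps the same hour on different days apart even when the range crosses a month or year boundary, which grouping on day of month alone does not. One caveat: an hour in which no games were played produces no row in the inner query, so it is not counted as a zero in the average.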