Filter time periods in Redshift

How do I filter a time period from a datetime column in SQL?
I have a table with product, date time and quantity.
The date time values run from 00:00 to 24:00, but the requirement is to filter a given time range, e.g. from 08:05 to 14:25. Please suggest an approach.

If the datetime column is a sort key, filter on the date range first and then on the time range, so that you still get the benefit of the sort key, e.g.
WHERE purchase_time > '2017-10-01' AND DATE_PART('hour', purchase_time) BETWEEN 8 AND 9
Note that this matches every row whose hour is 8 or 9, i.e. 08:00 through 09:59. If you need to be more granular, you can combine hour and minute into a single number:
WHERE purchase_time > '2017-10-01' AND (DATE_PART('hour', purchase_time) * 100 + DATE_PART('minute', purchase_time)) BETWEEN 805 AND 1425
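The hour * 100 + minute expression turns a time of day into an integer such as 805 or 1425, so a plain numeric range check is enough. Below is a minimal Python sketch of the same arithmetic, only to illustrate the encoding; the helper names hhmm and in_window are made up for this example and are not Redshift functions.

```python
from datetime import datetime


def hhmm(ts: datetime) -> int:
    """Encode the time of day as an integer, e.g. 08:05 -> 805, 14:25 -> 1425."""
    return ts.hour * 100 + ts.minute


def in_window(ts: datetime, start: int = 805, end: int = 1425) -> bool:
    """Mirror of BETWEEN start AND end on the encoded value (both ends inclusive)."""
    return start <= hhmm(ts) <= end


print(in_window(datetime(2017, 10, 2, 8, 4)))        # False
print(in_window(datetime(2017, 10, 2, 8, 5)))        # True
print(in_window(datetime(2017, 10, 2, 14, 25, 59)))  # True (seconds are ignored)
print(in_window(datetime(2017, 10, 2, 14, 26)))      # False
```

Because seconds are not part of the encoded number, the upper bound 1425 covers the whole minute 14:25:00 to 14:25:59.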

Related

Group by arbitrary interval

I have a column of type timestamp. I would like to dynamically group the results by an arbitrary time period (it can be 10 seconds or even 5 hours).
Suppose I have this kind of data:
(image of sample data not available)
If the user provides 2 hours and wants the max value of air_pressure, the first row should be combined with the second one. The result should look like this:
date | max air_pressure
2022-11-22 00:00:00:000 | 978.81666667
2022-11-22 02:00:00:000 | 978.53
2022-11-22 04:00:00:000 | 987.23333333
and so on. As I mentioned, the period must be easy to change, because the user may want to group by days, seconds, etc.
The functionality should work like the function date_trunc(). But that can only truncate to fixed units such as seconds, minutes or hours, while I would like to group by arbitrary intervals.

Basically:
SELECT g.start_time, max(air_pressure) AS max_air_pressure
FROM   generate_series($start, $end, interval '15 min') g(start_time)
LEFT   JOIN tbl t ON t.date_id >= g.start_time
                 AND t.date_id <  g.start_time + interval '15 min'  -- same interval
GROUP  BY 1
ORDER  BY 1;
$start and $end are timestamps delimiting your time frame of interest.
Returns all time slots, and NULL for max_air_pressure if no matching entries are found for the time slot.
See:
Best way to count rows by arbitrary time intervals
Aside: "date_id" is an unfortunate column name for a timestamp.
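The slot logic of the query above can be sketched outside the database. The following Python example is only an illustration under stated assumptions: the functions bucket_start and max_per_bucket and the sample rows are made up, and unlike the generate_series / LEFT JOIN query it does not emit empty slots.

```python
from datetime import datetime, timedelta


def bucket_start(ts: datetime, origin: datetime, width: timedelta) -> datetime:
    """Start of the width-sized slot (counted from origin) that contains ts."""
    slots = (ts - origin) // width  # timedelta // timedelta floors to an int
    return origin + slots * width


def max_per_bucket(rows, origin: datetime, width: timedelta) -> dict:
    """Maximum value per slot, keyed by slot start, in chronological order."""
    result = {}
    for ts, value in rows:
        key = bucket_start(ts, origin, width)
        result[key] = max(value, result.get(key, value))
    return dict(sorted(result.items()))


# Made-up sample rows (timestamp, air_pressure), not taken from the question.
rows = [
    (datetime(2022, 11, 22, 0, 30), 978.5),
    (datetime(2022, 11, 22, 1, 30), 978.8),
    (datetime(2022, 11, 22, 2, 15), 978.53),
]
origin = datetime(2022, 11, 22)

for start, value in max_per_bucket(rows, origin, timedelta(hours=2)).items():
    print(start, value)
```

Changing the period is a matter of passing a different width, for example timedelta(seconds=10) or timedelta(days=1).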

How to calculate the time difference in SQL with DATEDIFF?

I am using the DATEDIFF function to calculate the difference between my two timestamps.
payment_time = 2021-10-29 07:06:32.097332
trigger_time = 2021-10-10 14:11:13
What I have written is: date_diff('minute', payment_time, trigger_time) <= 15
I basically want the count of users who paid within 15 minutes of the trigger time,
so I have also added count(s.user_id) as count.
However, it returns a count of 1 even in the case above. The minutes are within 15 of each other, but 10 October and 29 October are 19 days apart, so it should return 0, i.e. this row should not be counted.
How do I compare the full timestamps in both columns and then count the users who paid within 15 minutes?

This also works to calculate the minutes between two timestamps: it first finds the interval (by subtraction), then converts it to seconds (by extracting EPOCH), and finally divides by 60:
extract(epoch from (payment_time-trigger_time))/60

In PostgreSQL, I prefer to subtract the two timestamps from each other and extract the epoch (in seconds) from the resulting interval, like here:
WITH
indata(payment_time, trigger_time) AS (
            SELECT TIMESTAMP '2021-10-29 07:06:32.097332', TIMESTAMP '2021-10-10 14:11:13'
  UNION ALL SELECT TIMESTAMP '2021-10-29 00:00:14'       , TIMESTAMP '2021-10-29 00:00:00'
)
SELECT
  EXTRACT(EPOCH FROM payment_time - trigger_time) AS epdiff
, (EXTRACT(EPOCH FROM payment_time - trigger_time) <= 15 * 60) AS filter_matches  -- epoch is in seconds; 15 minutes = 900
FROM indata;
-- out      epdiff     | filter_matches
-- out ----------------+----------------
-- out 1616119.097332 | false
-- out      14.000000 | true
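The key point in both answers is to compare the full elapsed time rather than a single date part. A small Python sketch of the same check follows, using the two timestamp pairs from the example above. The function name paid_within is made up, and excluding payments made before the trigger is an assumption that the question does not state.

```python
from datetime import datetime, timedelta


def paid_within(payment_time: datetime, trigger_time: datetime,
                limit: timedelta = timedelta(minutes=15)) -> bool:
    """True if the payment came at or after the trigger and at most `limit` later.

    The lower bound (no payments before the trigger) is an assumption.
    """
    elapsed = payment_time - trigger_time
    return timedelta(0) <= elapsed <= limit


pairs = [
    (datetime(2021, 10, 29, 7, 6, 32, 97332), datetime(2021, 10, 10, 14, 11, 13)),
    (datetime(2021, 10, 29, 0, 0, 14), datetime(2021, 10, 29, 0, 0, 0)),
]

count = sum(paid_within(payment, trigger) for payment, trigger in pairs)
print(count)  # 1: only the second pair is within 15 minutes
```

The first pair is roughly 18.7 days apart, so it is rejected regardless of what its minute fields look like.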

Count aggregated data on every nth hour

I have some data generated over time. I used the query below to count the number of "interactions" that happened every hour.
SELECT COUNT(*) as Quantity, FORMAT(cast(InteractionDate as datetime2), 'yyyy-MM-dd HH') as Datum
FROM Interaction as ia -- "in" is a reserved word, so a different alias is used here
INNER JOIN Mission as mi
on ia.MissionID=mi.MissionID
WHERE InteractionDate between '2015-01-13 12' AND '2015-01-22 12'
GROUP BY FORMAT(cast(InteractionDate as datetime2), 'yyyy-MM-dd HH')
ORDER BY Datum
The query above gives me this:
116 | 2015-01-15 00
37 | 2015-01-15 01
17 | 2015-01-15 02
Now I want to get the aggregated number of interactions for every nth hour. Let's say I want every 3rd hour; for the data provided I would get:
170 | 2015-01-15 02
How can I do that?

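You could group by date and hour separately, which lets you use expressions on the hour. For example:
GROUP BY cast(InteractionDate as date), (DATEPART(hour, InteractionDate) / 4)
With integer division by 4 this gives midnight to 4am in the first bucket, 4am to 8am in the next, etc. Divide by 3 for the 3-hour buckets asked for, or by 6 for midnight to 6am, 6am to midday, and so on.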
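How the divisor maps hours of the day to buckets can be checked with a few lines of Python. This is just an illustration of the integer division; hour_bucket is a made-up helper name.

```python
def hour_bucket(hour: int, width: int) -> int:
    """Bucket number for an hour of day (0-23) when buckets are `width` hours wide."""
    return hour // width


for width in (3, 4, 6):
    # First hour of each bucket for this width.
    starts = sorted({hour_bucket(h, width) * width for h in range(24)})
    print(width, starts)
# 3 [0, 3, 6, 9, 12, 15, 18, 21]
# 4 [0, 4, 8, 12, 16, 20]
# 6 [0, 6, 12, 18]
```

A divisor of n therefore yields buckets that are n hours wide, starting at midnight.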

You can aggregate data by any period of time by computing the elapsed interval with datediff and then applying integer division, like this:
group by datediff(hour, '1990-01-01T00:00:00', yourDatetime) / 3
The maths: take the whole number of hours since the base date-time and integer-divide it by 3, which yields the same result for each group of 3 consecutive hours. That result is then used to group the data.
The only important part of the base date-time is its time part, which decides where the 3-hour intervals start. With the base above, the intervals are 00:00 to 03:00, 03:00 to 06:00 and so on. If you need different intervals, use a different base date-time: for example, '1990-01-01T01:00:00' gives the intervals 01:00 to 04:00, 04:00 to 07:00, and so on.
For further details, see this full answer: Group DateTime into 5,15,30 and 60 minute intervals
It shows how to return the start and end date-time of each interval alongside the aggregated values, and explains this solution in more depth.
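The effect of the base date-time can be sketched in Python. The example below assumes that datediff(hour, ...) counts hour boundaries crossed, which is modelled here by truncating both values to the hour before subtracting; interval_start is a made-up helper name.

```python
from datetime import datetime, timedelta

ONE_HOUR = timedelta(hours=1)


def interval_start(ts: datetime, base: datetime, hours: int = 3) -> datetime:
    """Start of the `hours`-wide interval containing ts, with intervals aligned to base."""
    ts_hour = ts.replace(minute=0, second=0, microsecond=0)
    base_hour = base.replace(minute=0, second=0, microsecond=0)
    elapsed_hours = (ts_hour - base_hour) // ONE_HOUR  # hour boundaries crossed
    group = elapsed_hours // hours                     # the value used in GROUP BY
    return base_hour + group * hours * ONE_HOUR


ts = datetime(2015, 1, 15, 2, 59)
print(interval_start(ts, datetime(1990, 1, 1, 0)))  # 2015-01-15 00:00:00
print(interval_start(ts, datetime(1990, 1, 1, 1)))  # 2015-01-15 01:00:00
```

With a base at 00:00 the timestamp 02:59 falls into the interval starting at 00:00; with a base at 01:00 it falls into the interval starting at 01:00.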

Number of specific one-hour periods between two date/times

I have a table of records, call it "game".
It has an id and a timestamp.
What I need to know is unrelated to the table specifically. In order to know the average number of games played per hour, I need to know:
Total games played for each hour over the date range
Number of hourly periods within the date range
Finding the first is a matter of extracting the hour from the timestamp and grouping by it.
For the second, if the date range was rounded to the nearest day, finding this value would be easy (totalgames/numdays).
Unfortunately I can't assume this. What I need help with is finding the number of specific hour periods existing within a time range.
Example:
If the range is 5 PM today to 8 PM tomorrow, there is one "00" hour (midnight to 1 AM), but two 17, 18, 19 hours (5-6, 6-7, 7-8)
Thanks for the help
Edit: for clarity, consider the following query:
I have table game:
id, daytime
select EXTRACT(hour from daytime) as hour_period, count (*)
from game
where daytime > dateFrom and daytime < dayTo
group by hour_period
This will give me the number of games played broken down into hourly chunks for the time period.
In order to find the average games played per hour, I need to know exactly how many specific hour durations are between two timestamps. Simply dividing by the number of days is not accurate.
Edit: The ideal output will look something like this:
00 275
01 300
02 255
...
Consider the following: How many times does midnight occur between date 1 and date 2 ? If you have 1.5 days, that doesn't guarantee that midnight will occur twice. 6 AM today to 6 PM tomorrow night, for example, has 1 midnight, but 9PM tonight to 9 AM two days from now has 2 midnights.
What I'm trying to find is how many of the EXACT HOUR occurs between two timestamps, so I can use it to average the number of games played at THAT HOUR over a time period.
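The count being asked for can be illustrated with a short Python sketch. It rests on assumptions that the question leaves open: only complete one-hour slots are counted, and the timestamps are naive (no time zones or daylight saving changes). The function name hour_slot_counts and the example dates are made up.

```python
from collections import Counter
from datetime import datetime, timedelta

ONE_HOUR = timedelta(hours=1)


def hour_slot_counts(start: datetime, end: datetime) -> Counter:
    """Count the complete one-hour slots in [start, end), keyed by hour of day."""
    # Move to the first hour boundary at or after start.
    slot = start.replace(minute=0, second=0, microsecond=0)
    if slot < start:
        slot += ONE_HOUR
    counts = Counter()
    while slot + ONE_HOUR <= end:
        counts[slot.hour] += 1
        slot += ONE_HOUR
    return counts


# 5 PM on one day to 8 PM on the next day, as in the example above.
counts = hour_slot_counts(datetime(2014, 5, 29, 17), datetime(2014, 5, 30, 20))
print(counts[0], counts[17], counts[18], counts[19])  # 1 2 2 2
```

Dividing the total games for an hour of day by counts[hour] then gives the average for that specific hour.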

EDIT:
The following query gets the days, hours, and # of games, giving an output as below:
29 23 100
29 00 130
30 22 140
30 23 150
Then, the outer query adds up the number of games for each distinct hour and divides by the number of hours, as follows
22 140
23 125
00 130
The modified query is below:
SELECT
    hour_period,
    sum(hourly_no_of_games) / count(hour_period)
FROM
(
    SELECT
        EXTRACT(DAY from daytime) as day_period,
        EXTRACT(HOUR from daytime) as hour_period,
        count (*) hourly_no_of_games
    from game
    where daytime > dateFrom and daytime < dayTo
    group by EXTRACT(DAY from daytime), EXTRACT(HOUR from daytime)
) hourly_data
GROUP BY hour_period
ORDER BY hour_period;
SQL Fiddle demo

If you need something to GROUP BY, you can truncate the timestamp to the level of hour, as in the following:
DECLARE @Date DATETIME
SET @Date = GETDATE()
SELECT @Date, DATEADD(Hour, DATEDIFF(Hour, 0, @Date), 0) AS RoundedDate
If you just need to find the total hours, you can just select the DATEDIFF in hours, such as with
SELECT DATEDIFF(Hour, '5/29/2014 20:01:32.999', GETDATE())

Extract not only the hour of the day but the day of the year (1-366). Then group on those. If there is the possibility the interval could span a year, then add the year itself and group by all three.
year dy hr games
2013 365 23 115
2014 1 00 103

MySQL Sum based on date range and time of day

I have a large set of data collected every 15 minutes. I am trying to select data within a certain time period, divide that period up into further date intervals, and within each interval sum over certain times of day.
For example, I would like to be able to select data between 01/01/2009 and 01/01/2010 and group by date ranges 01/01/2009 - 05/01/2009, 05/02/2009 - 11/01/2009, 11/02/2009 - 01/01/2010 and then within each group select the data from time 00:00:01 - 12:00:00 and 12:00:01 - 23:59:59
SELECT SUM(Data.usage) AS sum
FROM Data JOIN Meter ON Data.meter_id = Meter.id
WHERE Data.start_read >= '2009-01-01'
AND Data.end_read <= '2010-01-01 23:59:59'
GROUP BY date range? Not sure how to separate the data. Thanks

To group by date ranges, I often use case statements:
Group By Case
When start_read between '2009-01-01' and '2009-05-01' then 'Jan-Apr 09'
When start_read between '2009-05-02' and '2009-11-01' then 'May-Nov 09'
...etc
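The CASE approach amounts to mapping each reading to a label and grouping on that label. The Python sketch below combines it with the morning/afternoon split from the question. The date ranges follow the question, the third label is hypothetical, and the readings are made-up values used only to exercise the code.

```python
from collections import defaultdict
from datetime import date, datetime, time

# (first day, last day, label); both bounds inclusive. The third label is hypothetical.
RANGES = [
    (date(2009, 1, 1), date(2009, 5, 1), "Jan-Apr 09"),
    (date(2009, 5, 2), date(2009, 11, 1), "May-Nov 09"),
    (date(2009, 11, 2), date(2010, 1, 1), "Nov-Dec 09"),
]


def range_label(ts: datetime):
    """Equivalent of the CASE expression: label of the date range containing ts."""
    for first, last, label in RANGES:
        if first <= ts.date() <= last:
            return label
    return None  # outside every range, like a CASE without ELSE


def half_of_day(ts: datetime) -> str:
    """'AM' up to and including 12:00:00, 'PM' afterwards."""
    return "AM" if ts.time() <= time(12, 0, 0) else "PM"


# Made-up readings: (start_read, usage).
readings = [
    (datetime(2009, 3, 1, 9, 0), 1.5),
    (datetime(2009, 3, 2, 9, 0), 0.5),
    (datetime(2009, 3, 1, 15, 0), 2.0),
    (datetime(2009, 6, 10, 9, 15), 4.0),
]

totals = defaultdict(float)
for ts, usage in readings:
    totals[(range_label(ts), half_of_day(ts))] += usage

for key in sorted(totals):
    print(key, totals[key])
```

In SQL the same result would come from grouping by both the CASE label and a second expression on the time of day.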
