I'm writing a query to group rows by their timestamps into aggregation blocks. The query parameters include the start time start of the first aggregation block and a positive integer period which is the length of each and every aggregation block in minutes. Given a row's timestamp row_stamp, which is later than start, I want to calculate its block_start such that
block_start <= row_stamp < block_start + period minutes, and
block_start = start + (N * period minutes) where N is a nonnegative integer
It's easy enough to find row_stamp - start, which appears to be an interval. I figure I'll want block_start to be either start + (period minutes) * FLOOR((row_stamp - start) / (period minutes)) or row_stamp - MOD(row_stamp - start, period minutes). I know neither of these is exact syntax, but I think you see the algorithms I'm going for. Unfortunately, it looks like neither FLOOR nor MOD works well with an interval as its first parameter. What's the recommended way to turn an interval into a number of minutes and then, after the math, turn it back to an interval again?
My apologies for not mentioning earlier that I'm looking for the number of rows in each aggregation block. Edited to add an example.
CREATE TABLE "DEV_JKNIGHT" ( "ROW_STAMP" TIMESTAMP (6) NOT NULL );
insert into dev_jknight values (TIMESTAMP '2022-06-27 14:27:00');
insert into dev_jknight values (TIMESTAMP '2022-06-27 14:32:00');
insert into dev_jknight values (TIMESTAMP '2022-06-27 14:33:00');
insert into dev_jknight values (TIMESTAMP '2022-06-27 15:01:00');
insert into dev_jknight values (TIMESTAMP '2022-06-27 16:32:00');
Suppose the query parameters are a start time of '2022-06-27 14:15:00' and a period of 15 minutes. Then the aggregation blocks are as follows. The first begins at the specified start time and lasts for the period number of minutes. The next block starts immediately after the first one ends and lasts the same length.
14:15:00 - 14:30:00, June 27
14:30:00 - 14:45:00, June 27
14:45:00 - 15:00:00, June 27
15:00:00 - 15:15:00, June 27
and so on
If I run the query with those parameters, then I'm looking for these four rows of output.
Block start = '2022-06-27 14:15:00', count = "1"
Block start = '2022-06-27 14:30:00', count = "2"
Block start = '2022-06-27 15:00:00', count = "1"
Block start = '2022-06-27 16:30:00', count = "1"
The first row of output indicates that there is one, and only one, dev_jknight row in the aggregation block which starts at 14:15:00 -- namely, the row at 14:27:00.
The second row of output indicates that there are two dev_jknight rows in the aggregation block which starts at 14:30:00 -- the rows at 14:32 and 14:33.
Because there are zero dev_jknight rows in the third aggregation block (14:45 - 15:00), there is no output row for it.
The last two rows of output indicate that there is one dev_jknight row in the aggregation block which starts at 15:00 and one in the block which starts at 16:30.
Thank you for your patience while I clarified this.
Turn an interval into a number of minutes?
Use EXTRACT:
SELECT EXTRACT(DAY FROM value) * 24 * 60
+ EXTRACT(HOUR FROM value) * 60
+ EXTRACT(MINUTE FROM value)
+ EXTRACT(SECOND FROM value) / 60 AS minutes
FROM (
SELECT INTERVAL '1 12:34:56.789' DAY TO SECOND AS value
FROM DUAL
);
Which outputs:
MINUTES
2194.946483333333333333333333333333333333
and then turn it back to an interval again?
Multiply a 1 minute interval by the number of minutes:
SELECT INTERVAL '1' MINUTE * 2194.94648333333 AS interval_value
FROM DUAL;
or, use the NUMTODSINTERVAL function:
SELECT NUMTODSINTERVAL(2194.94648333333, 'MINUTE') AS interval_value FROM DUAL;
INTERVAL_VALUE
+000000001 12:34:56.789000000
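As a cross-check of the arithmetic, here is a minimal Python sketch (not Oracle syntax) of the FLOOR approach from the question: convert the interval to a number of minutes, floor-divide by the period, then convert back to an interval. The function name block_start and the use of Python's datetime are illustrative assumptions; the sample rows and parameters come from the question.

```python
from collections import Counter
from datetime import datetime, timedelta


def block_start(row_stamp: datetime, start: datetime, period_minutes: int) -> datetime:
    """Return start + N * period, where N is the number of whole periods elapsed."""
    # Interval -> number of minutes (the EXTRACT step in the answer above).
    elapsed_minutes = (row_stamp - start) / timedelta(minutes=1)
    # FLOOR((row_stamp - start) / period)
    n = int(elapsed_minutes // period_minutes)
    # Number of minutes -> interval (the NUMTODSINTERVAL step), added back to start.
    return start + timedelta(minutes=n * period_minutes)


start = datetime(2022, 6, 27, 14, 15)
rows = [
    datetime(2022, 6, 27, 14, 27),
    datetime(2022, 6, 27, 14, 32),
    datetime(2022, 6, 27, 14, 33),
    datetime(2022, 6, 27, 15, 1),
    datetime(2022, 6, 27, 16, 32),
]

# Count rows per aggregation block; empty blocks simply never appear.
counts = Counter(block_start(r, start, 15) for r in rows)
for block, count in sorted(counts.items()):
    print(block, count)
```

With the question's data this yields the four blocks starting at 14:15, 14:30, 15:00 and 16:30, with counts 1, 2, 1 and 1.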
I am using the DATEDIFF function to calculate the difference between my two timestamps.
payment_time = 2021-10-29 07:06:32.097332
trigger_time = 2021-10-10 14:11:13
What I have written is : date_diff('minute',payment_time,trigger_time) <= 15
I basically want the count of users who paid within 15 mins of the triggered time
thus I have also done count(s.user_id) as count
However it returns count as 1 even in the above case since the minutes are within 15 but the dates 10th October and 29th October are 19 days apart and hence it should return 0 or not count this row in my query.
How do I compare the dates in my both columns and then count users who have paid within 15 mins?
This also works to calculate the minutes between two timestamps. It first finds the interval (by subtraction), then converts that to seconds (by extracting EPOCH), and finally divides by 60:
extract(epoch from (payment_time-trigger_time))/60
In PostgreSQL, I prefer to subtract the two timestamps from each other and extract the epoch from the resulting interval. Note that the epoch is in seconds, so a 15-minute limit is 15 * 60 seconds:
WITH
indata(payment_time,trigger_time) AS (
SELECT TIMESTAMP '2021-10-29 07:06:32.097332',TIMESTAMP '2021-10-10 14:11:13'
UNION ALL SELECT TIMESTAMP '2021-10-29 00:00:14' ,TIMESTAMP '2021-10-29 00:00:00'
)
SELECT
EXTRACT(EPOCH FROM payment_time-trigger_time) AS epdiff
, (EXTRACT(EPOCH FROM payment_time-trigger_time) <= 15 * 60) AS filter_matches -- epoch is in seconds; 15 minutes = 900 seconds
FROM indata;
-- out epdiff | filter_matches
-- out ----------------+----------------
-- out 1616119.097332 | false
-- out 14.000000 | true
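The same idea can be sketched outside the database: subtract the two timestamps to get the full elapsed time, and only then convert to minutes, so that the 19-day gap is not lost. This is a minimal Python illustration, not PostgreSQL code; the helper names minutes_between and paid_within_limit are hypothetical.

```python
from datetime import datetime


def minutes_between(payment_time: datetime, trigger_time: datetime) -> float:
    """Total elapsed minutes, counting days and hours as well as the minute field."""
    return (payment_time - trigger_time).total_seconds() / 60


def paid_within_limit(payment_time: datetime, trigger_time: datetime,
                      limit_minutes: float = 15) -> bool:
    # Mirrors the SQL filter above: total elapsed time compared with the limit.
    return minutes_between(payment_time, trigger_time) <= limit_minutes


# The two sample rows from the query above.
far_apart = paid_within_limit(datetime(2021, 10, 29, 7, 6, 32, 97332),
                              datetime(2021, 10, 10, 14, 11, 13))
close = paid_within_limit(datetime(2021, 10, 29, 0, 0, 14),
                          datetime(2021, 10, 29, 0, 0, 0))
print(far_apart, close)
```

Like the SQL filter, this sketch does not exclude payments made before the trigger time; add a lower bound if that case can occur in your data.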
I am creating a query that shows the time elapsed between two dates, counting only the time that falls Monday through Friday between 08:00 and 17:00.
Example 1:
If a petition opens on day 1 at 6:30 p.m. and closes on day 2 at 8:45 a.m., the TMO is 45 minutes.
If it closes on day 3 at 8:45 a.m., the TMO is 9 hours and 45 minutes.
Example 2:
If a petition opens on Friday at 16:45 and closes on Tuesday at 8:30, the TMO would be: 15 minutes on Friday, nine hours on Monday and 30 minutes on Tuesday, for a TMO of 9 hours 45 minutes.
The query is performed on a single column of type date as I show below
I currently use a LAG function in the query, but I have not been able to build anything that works, let alone something efficient. I would greatly appreciate your help.
In the solution below I will ignore the "lag" part of your problem, which you said you know how to use. I am only showing how to count "working hours" between any two date_times (they may be during or before or after work hours, and/or they can be on weekend days; the computation is the same in all cases).
Explaining the answer in words: For two given date-times, "start" and "end", calculate how many "work" hours elapsed from the beginning of the week (from Monday 00:00:00) till each of them. This is in fact a calculation for ONE date, not for TWO dates. Then: given "start" and "end", calculate this number of hours for each of them; subtract the "start" number of hours from the "end" number of hours. To the result, add x times 5 times 9, where x is the difference in weeks between Monday 00:00:00 of the two dates. (If they are in the same week, the difference will be 0.)
To truncate a date to the beginning of the day, we use TRUNC(dt). To truncate to the beginning of Monday, TRUNC(dt, 'iw').
To compute how many "work" hours are from the beginning of the date dt until the actual time-of-day we can use the calculation
greatest(0, least(17/24, dt - trunc(dt)) - 8/24)
(the results will be in days; we calculate everything in days and then we can convert to hours). However, in the final formula we must check to see if the date is a Saturday or Sunday, in which case this should just be zero. Or, better, we can adjust the calculation a bit later, when we count from the beginning of Monday (we can use least( 5*9/24, ...)).
Putting everything together:
with
inputs ( dt1, dt2 ) as (
select to_date('2017-09-25 11:30:00', 'yyyy-mm-dd hh24:mi:ss'),
to_date('2017-10-01 22:45:00', 'yyyy-mm-dd hh24:mi:ss')
from dual
)
-- End of SIMULATED input dates (for testing only).
select 24 *
( least(5 * (17 - 8) / 24, greatest(0, least(17/24, dt2 - trunc(dt2)) - 8/24)
+ (17 - 8) / 24 * (trunc(dt2) - trunc(dt2, 'iw')))
-
least(5 * (17 - 8) / 24, greatest(0, least(17/24, dt1 - trunc(dt1)) - 8/24)
+ (17 - 8) / 24 * (trunc(dt1) - trunc(dt1, 'iw')))
+ 5 * (17 - 8) / 24 * (trunc(dt2, 'iw') - trunc(dt1, 'iw')) / 7
)
as duration_in_hours
from inputs
;
DURATION_IN_HOURS
-----------------
41.500
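To make the steps easier to follow, here is a minimal Python sketch of the same calculation: work hours elapsed since Monday 00:00 for each date, the difference between the two, plus 45 hours per whole week between their Mondays. It is an illustration of the arithmetic, not Oracle code; the function names are hypothetical, and the 08:00 to 17:00, Monday to Friday schedule is taken from the question.

```python
from datetime import datetime, timedelta

WORK_START, WORK_END, WORK_DAYS = 8, 17, 5
HOURS_PER_DAY = WORK_END - WORK_START  # 9 work hours per day


def work_hours_since_monday(dt: datetime) -> float:
    """Work hours elapsed from Monday 00:00 of dt's week up to dt."""
    hour_of_day = dt.hour + dt.minute / 60 + dt.second / 3600
    # greatest(0, least(17, time of day) - 8)
    today = max(0.0, min(WORK_END, hour_of_day) - WORK_START)
    # weekday() is 0 for Monday; the cap at 5 * 9 hours zeroes out the weekend.
    return min(WORK_DAYS * HOURS_PER_DAY, today + HOURS_PER_DAY * dt.weekday())


def monday_of(dt: datetime) -> datetime:
    """Midnight on the Monday of dt's week, like TRUNC(dt, 'iw')."""
    return datetime(dt.year, dt.month, dt.day) - timedelta(days=dt.weekday())


def work_hours_between(dt1: datetime, dt2: datetime) -> float:
    weeks = (monday_of(dt2) - monday_of(dt1)).days // 7
    return (work_hours_since_monday(dt2) - work_hours_since_monday(dt1)
            + WORK_DAYS * HOURS_PER_DAY * weeks)


# Same inputs as the query above.
print(work_hours_between(datetime(2017, 9, 25, 11, 30), datetime(2017, 10, 1, 22, 45)))
# Example 2 from the question: Friday 16:45 to the following Tuesday 08:30.
print(work_hours_between(datetime(2017, 9, 29, 16, 45), datetime(2017, 10, 3, 8, 30)))
```

The first call reproduces the 41.5 hours shown above, and the second gives 9.75 hours, which matches the 9 hours 45 minutes expected in the question.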
I am creating a table which will have 2 columns:
Day_time (time from 1978-01-01 00:00:00 Sunday, till 1978-01-07 23:59:00.0 Saturday, Granularity: Minute)
Time_id (a unique id for each minute), to be populated
I have column one populated. I want to populate column two.
How I am doing it right now:
EXTRACT(dayofweek FROM day_time) * 10000 + DATEDIFF('minutes', TRUNC(day_time), day_time)
I basically want a function where I pass any date and it tells me where I am in a week. So, I need a function, just like the function above. Just more optimized, where I give a date and get a unique ID. The unique ID should repeat weekly.
Example: ID for Jan 1, 2015 00:00:00 will be same as Jan 8, 2015 00:00:00.
Why 1978-01-01? cuz it starts from a Sunday.
Why 10,000? cuz the number of minutes in a day are in four digits.
You can do it all in one fell swoop, without needing to extract the date separately:
SELECT DATEDIFF('minutes', date_trunc('week',day_time), day_time)
I'd expect this to be marginally faster.
Another approach that I'd expect to be significantly faster would be converting the timestamp to epoch, dividing by 60 to get minutes from epoch and then taking the value modulus of 10,080 (for 60 * 24 * 7 minutes in a week).
SELECT (extract(epoch from day_time) / 60) % 10080
If you don't care about the size of the weekly index, you could also do:
SELECT (extract(epoch from day_time)) % 604800
This skips the division step altogether, which should make it faster still.
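The epoch-modulus idea can be checked with a short Python sketch. It assumes the timestamps are naive and should be read as UTC (an assumption of this example, not something stated in the question), and the function name weekly_minute_id is hypothetical.

```python
from datetime import datetime, timezone

MINUTES_PER_WEEK = 60 * 24 * 7  # 10080


def weekly_minute_id(dt: datetime) -> int:
    """Minutes since the Unix epoch, modulo one week, so the ID repeats weekly."""
    # Assumption: naive datetimes are treated as UTC.
    epoch_seconds = int(dt.replace(tzinfo=timezone.utc).timestamp())
    return (epoch_seconds // 60) % MINUTES_PER_WEEK


# The question's example: the same ID one week apart.
first = weekly_minute_id(datetime(2015, 1, 1, 0, 0, 0))
second = weekly_minute_id(datetime(2015, 1, 8, 0, 0, 0))
print(first == second)
```

Because the Unix epoch began on a Thursday, ID 0 falls on Thursday 00:00 rather than on Sunday or Monday. The IDs are still unique within a week and repeat weekly, which is what the question asks for.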
I want to aggregate data at 5 minute intervals in PostgreSQL. If I use the date_trunc() function, I can aggregate data at an hourly, monthly, daily, weekly, etc. interval but not a specific interval like 5 minute or 5 days.
select date_trunc('hour', date1), count(*) from table1 group by 1;
How can we achieve this in PostgreSQL?
SELECT date_trunc('hour', date1) AS hour_stump
, (extract(minute FROM date1)::int / 5) AS min5_slot
, count(*)
FROM table1
GROUP BY 1, 2
ORDER BY 1, 2;
You could GROUP BY two columns: a timestamp truncated to the hour and a 5-minute-slot.
The example produces slots 0 - 11. Add 1 if you prefer 1 - 12.
I cast the result of extract() to integer, so the division / 5 truncates fractional digits. The result:
minute 0 - 4 -> slot 0
minute 5 - 9 -> slot 1
etc.
This query only returns values for those 5-minute slots where values are found. If you want a value for every slot or if you want a running sum over 5-minute slots, consider this related answer:
PostgreSQL: running count of rows for a query 'by minute'
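For readers who want to see the two-column grouping in isolation, here is a minimal Python sketch of the same bucketing: truncate to the hour, then integer-divide the minute by 5. The sample timestamps are made up for illustration and the function name five_minute_slot is hypothetical.

```python
from collections import Counter
from datetime import datetime


def five_minute_slot(ts: datetime) -> tuple:
    """Return (timestamp truncated to the hour, slot number 0-11)."""
    hour_stump = ts.replace(minute=0, second=0, microsecond=0)
    # Integer division drops the remainder, like the ::int cast before / 5.
    return hour_stump, ts.minute // 5


# Hypothetical sample data.
sample = [
    datetime(2024, 1, 1, 10, 4, 59),
    datetime(2024, 1, 1, 10, 5, 0),
    datetime(2024, 1, 1, 10, 9, 30),
    datetime(2024, 1, 1, 11, 59, 0),
]

# Equivalent of GROUP BY 1, 2 with count(*).
counts = Counter(five_minute_slot(ts) for ts in sample)
for (hour_stump, slot), count in sorted(counts.items()):
    print(hour_stump, slot, count)
```

As with the SQL query, slots that contain no rows do not appear in the result.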
Here's a simple query you can either wrap in a function or cut and paste all over the place:
select now()::timestamp(0), (extract(epoch from now()::timestamptz(0)-date_trunc('d',now()))::int)/60;
It'll give you the current time and the number of whole buckets elapsed since the start of the day. The divisor 60 is the bucket length in seconds, so as written the buckets are one minute long. To make it every 5 minutes, make that number 300, and so on. It groups by the seconds since the start of the day. To group by seconds since the start of the year, the hour, or anything else, change the 'd' in the date_trunc.
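The seconds-since-midnight bucketing can be sketched in Python as follows. This is only an illustration of the arithmetic; the function name bucket_of_day and its bucket_seconds parameter are hypothetical.

```python
from datetime import datetime


def bucket_of_day(ts: datetime, bucket_seconds: int = 60) -> int:
    """Index of the bucket containing ts, counted from the start of its day."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    seconds_since_midnight = int((ts - midnight).total_seconds())
    # Integer division assigns every timestamp in the same bucket the same index.
    return seconds_since_midnight // bucket_seconds


ts = datetime(2024, 1, 1, 0, 7, 30)
print(bucket_of_day(ts))       # one-minute buckets
print(bucket_of_day(ts, 300))  # five-minute buckets
```

To group across several days, pair the bucket index with the date, since the index restarts at 0 every midnight.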
I have a table with measures and the time these measures were taken, in the following form: MM/DD/YYYY HH:MI:SS AM. I have measures over many days, starting at the same time every day. The data is minute by minute, so the seconds are always 0. I want to select only the measures for the first 5 minutes of each day. I would have used a WHERE clause, but the condition would only be on the minutes and not the date. Is there a way to do this?
Thanks
You could try something like this:
SELECT * FROM SomeTable
WHERE
DATEPART(hh, timestamp_col) = 0 AND -- filter for first hour of the day
DATEPART(mi, timestamp_col) < 5 -- filter for the first five minutes (0 through 4); mi is minute, whereas mm would mean month
Careful! 0 means midnight. If your "first hour" of the day is actually 8 or 9 AM then you should replace the 0 with an 8 or 9.