Create table with 15-minute intervals on datetime in Snowflake - sql

I am trying to create a table in Snowflake with rows at a 15-minute interval. I have tried with a generator, but that does not give me the 15-minute interval. Is there any function I can use to generate and build this table for a couple of years' worth of data?
Such as:

Date       | Hour
-----------+---------
2022-03-29 | 02:00 AM
2022-03-29 | 02:15 AM
2022-03-29 | 02:30 AM
2022-03-29 | 02:45 AM
2022-03-29 | 03:00 AM
2022-03-29 | 03:15 AM
...        | ...
Thanks

Use the following as a time generator with a 15-minute interval, then use other date/time functions as needed to extract the date part or time part into separate columns.
with CTE as (
    select timestampadd(minute, seq4() * 15, date_trunc(hour, current_timestamp())) as time_count
    from table(generator(rowcount => 4 * 24))  -- 4 slots per hour * 24 hours = one day
)
select time_count from CTE;
+-------------------------------+
| TIME_COUNT                    |
|-------------------------------|
| 2022-03-29 14:00:00.000 -0700 |
| 2022-03-29 14:15:00.000 -0700 |
| 2022-03-29 14:30:00.000 -0700 |
| 2022-03-29 14:45:00.000 -0700 |
| 2022-03-29 15:00:00.000 -0700 |
| 2022-03-29 15:15:00.000 -0700 |
... (output truncated) ...
| 2022-03-30 13:15:00.000 -0700 |
| 2022-03-30 13:30:00.000 -0700 |
| 2022-03-30 13:45:00.000 -0700 |
+-------------------------------+
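To get the Date / Hour columns shown in the question, a minimal sketch building on the CTE above (the TO_CHAR format string is an assumption; adjust as needed):
with CTE as (
    select timestampadd(minute, seq4() * 15, date_trunc(hour, current_timestamp())) as time_count
    from table(generator(rowcount => 4 * 24))
)
select
    time_count::date                  as date,  -- date part
    to_char(time_count, 'HH12:MI AM') as hour   -- 12-hour time part, e.g. 02:15 AM
from CTE;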

There are many answers to this question here already (those 4 are all from this month).
But the major point to note is that you MUST NOT use SEQx() as the number generator (you can use it in the ORDER BY, but that is not needed). As noted in the docs:
Important
This function uses sequences to produce a unique set of increasing integers, but does not necessarily produce a gap-free sequence. When operating on a large quantity of data, gaps can appear in a sequence. If a fully ordered, gap-free sequence is required, consider using the ROW_NUMBER window function.
CREATE TABLE table_of_2_years_date_times AS
SELECT
    date_time::date AS date,
    date_time::time AS time
FROM (
    SELECT
        row_number() OVER (ORDER BY null) - 1 AS rn,
        dateadd('minute', 15 * rn, '2022-03-01'::date) AS date_time
    FROM table(generator(rowcount => 4*24*365*2))  -- 4 per hour * 24 hours * 365 days * 2 years
)
ORDER BY rn;
Then selecting the top/bottom:
(SELECT * FROM table_of_2_years_date_times ORDER BY date, time LIMIT 5)
UNION ALL
(SELECT * FROM table_of_2_years_date_times ORDER BY date DESC, time DESC LIMIT 5)
ORDER BY 1, 2;
DATE       | TIME
-----------+---------
2022-03-01 | 00:00:00
2022-03-01 | 00:15:00
2022-03-01 | 00:30:00
2022-03-01 | 00:45:00
2022-03-01 | 01:00:00
2024-02-28 | 22:45:00
2024-02-28 | 23:00:00
2024-02-28 | 23:15:00
2024-02-28 | 23:30:00
2024-02-28 | 23:45:00

Related

postgres query to group the records by hourly interval with date field

I have a table that has some file input data with file_id and file_input_date. I want to filter/group these file_ids depending on file_input_date. The problem is that my date is in the format YYYY-MM-DD HH:mm:ss, and I want to go further and group them by the hour, not just the date.
Edit: some sample data:
file_id | file_input_date
--------+---------------------------
 597872 | 2023-01-12 16:06:22.92879
 497872 | 2023-01-11 16:06:22.92879
 397872 | 2023-01-11 16:06:22.92879
 297872 | 2023-01-11 17:06:22.92879
 297872 | 2023-01-11 17:06:22.92879
 297872 | 2023-01-11 17:06:22.92879
 297872 | 2023-01-11 18:06:22.92879
What I want to see is:
1 for 2023-01-12 16:06
2 for 2023-01-11 16:06
3 for 2023-01-11 17:06
1 for 2023-01-11 18:06
The output format can be different, but this gives the kind of result I want.
You could convert the dates to strings with the format you want and group by it:
SELECT TO_CHAR(file_input_date, 'YYYY-MM-DD HH24:MI'), COUNT(*)
FROM mytable
GROUP BY TO_CHAR(file_input_date, 'YYYY-MM-DD HH24:MI')
To group by the hour rather than the minute:
create table date_grp (file_id integer, file_input_date timestamp);
INSERT INTO date_grp VALUES
(597872, '2023-01-12 16:06:22.92879'),
(497872, '2023-01-11 16:06:22.92879'),
(397872, '2023-01-11 16:06:22.92879'),
(297872, '2023-01-11 17:06:22.92879'),
(297872, '2023-01-11 17:06:22.92879'),
(297872, '2023-01-11 17:06:22.92879'),
(297872, '2023-01-11 18:06:22.92879');
SELECT
date_trunc('hour', file_input_date),
count(date_trunc('hour', file_input_date))
FROM
date_grp
GROUP BY
date_trunc('hour', file_input_date);
date_trunc | count
---------------------+-------
01/11/2023 18:00:00 | 1
01/11/2023 17:00:00 | 3
01/12/2023 16:00:00 | 1
01/11/2023 16:00:00 | 2
(4 rows)
Though if you want to go by the minute:
SELECT
date_trunc('minute', file_input_date),
count(date_trunc('minute', file_input_date))
FROM
date_grp
GROUP BY
date_trunc('minute', file_input_date);
date_trunc | count
---------------------+-------
01/11/2023 18:06:00 | 1
01/11/2023 16:06:00 | 2
01/12/2023 16:06:00 | 1
01/11/2023 17:06:00 | 3

SQL time-series resampling

I have a ClickHouse table with some rows like this:
id                  | created_at
--------------------+---------------------
6962098097124188161 | 2022-07-01 00:00:00
6968111372399976448 | 2022-07-02 00:00:00
6968111483775524864 | 2022-07-03 00:00:00
6968465518567268352 | 2022-07-04 00:00:00
6968952917160271872 | 2022-07-07 00:00:00
6968952924479332352 | 2022-07-09 00:00:00
I need to resample the time series and get a cumulative count by date, like this:
created_at          | count
--------------------+------
2022-07-01 00:00:00 | 1
2022-07-02 00:00:00 | 2
2022-07-03 00:00:00 | 3
2022-07-04 00:00:00 | 4
2022-07-05 00:00:00 | 4
2022-07-06 00:00:00 | 4
2022-07-07 00:00:00 | 5
2022-07-08 00:00:00 | 5
2022-07-09 00:00:00 | 6
I've tried this:
SELECT
arrayJoin(
timeSlots(
MIN(created_at),
toUInt32(24 * 3600 * 10),
24 * 3600
)
) as ts,
SUM(
COUNT(*)
) OVER (
ORDER BY
ts
)
FROM
table
but it counts all rows.
How can I get the expected result?
Why not use GROUP BY on the date, like:
select toDate(created_at), count(*) from table_name group by toDate(created_at)
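Note that a plain GROUP BY gives only per-day counts, while the expected output is a running total with the missing dates (2022-07-05, 2022-07-06, 2022-07-08) filled in. A sketch combining ClickHouse's ORDER BY ... WITH FILL with a window sum (the table name events is illustrative, and a ClickHouse version with window-function support is assumed):
SELECT
    d AS created_at,
    sum(cnt) OVER (ORDER BY d) AS count
FROM
(
    SELECT toDate(created_at) AS d, count() AS cnt
    FROM events
    GROUP BY d
    ORDER BY d WITH FILL  -- inserts the missing dates with cnt = 0
)
ORDER BY d;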

Select data between 2 datetime fields based on current date/time

I have a table that has the following values (reduced for brevity)
Period | Periodfrom          | Periodto            | Glperiodoracle | Glperiodcalendar
-------+---------------------+---------------------+----------------+-----------------
88     | 2022-01-01 00:00:00 | 2022-01-28 00:00:00 | JAN-FY2022     | JAN-2022
89     | 2022-01-29 00:00:00 | 2022-02-25 00:00:00 | FEB-FY2022     | FEB-2022
90     | 2022-02-26 00:00:00 | 2022-04-01 00:00:00 | MAR-FY2022     | MAR-2022
91     | 2022-04-02 00:00:00 | 2022-04-29 00:00:00 | APR-FY2022     | APR-2022
92     | 2022-04-30 00:00:00 | 2022-05-27 00:00:00 | MAY-FY2022     | MAY-2022
93     | 2022-05-28 00:00:00 | 2022-07-01 00:00:00 | JUN-FY2022     | JUN-2022
94     | 2022-07-02 00:00:00 | 2022-07-29 00:00:00 | JUL-FY2022     | JUL-2022
95     | 2022-07-30 00:00:00 | 2022-08-26 00:00:00 | AUG-FY2022     | AUG-2022
96     | 2022-08-27 00:00:00 | 2022-09-30 00:00:00 | SEP-FY2022     | SEP-2022
97     | 2022-10-01 00:00:00 | 2022-10-28 00:00:00 | OCT-FY2023     | OCT-2022
I want to make a stored procedure that, when executed (without receiving parameters), returns the single row where the execution date falls between Periodfrom and Periodto.
I have something like this:
Select top 1 Period,
Periodfrom,
Periodto,
Glperiodoracle,
Glperiodcalendar
From Calendar_Period
Where Periodfrom <= getdate()
And Periodto >= getdate()
I understand that using BETWEEN could lead to errors, but would this work correctly in the edge cases, taking seconds into account?
Looks like (i) your end date is inclusive and (ii) the time portion is always 00:00. So the correct and most performant query would be:
where cast(getdate() as date) between Periodfrom and Periodto
It will, for example, return the first row when the current time is 2022-01-28 23:59:59.999.
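Wrapped in the parameterless stored procedure the question asks for, a minimal sketch (SQL Server syntax; the procedure name is illustrative):
CREATE PROCEDURE dbo.Get_Current_Period
AS
BEGIN
    SELECT Period, Periodfrom, Periodto, Glperiodoracle, Glperiodcalendar
    FROM Calendar_Period
    WHERE CAST(GETDATE() AS date) BETWEEN Periodfrom AND Periodto;
END;
Since the periods do not overlap, this predicate can match at most one row, so TOP 1 is not strictly needed.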

How to break datetime in 12 hour chunks and use it for aggregation in Presto SQL?

I have been trying to break the datetime into 12-hour chunks in Presto SQL but was unsuccessful.
Raw data table:
datetime                | Login
------------------------+------
2022-05-08 07:10:00.000 | 1234
2022-05-09 23:20:00.000 | 5678
2022-05-09 06:20:00.000 | 5674
2022-05-08 09:20:00.000 | 8971
The output table should look like the one below. I have to get the count of logins in 12-hour chunks. So the first chunk should run from 00:00:00.000 to 11:59:59.999 and the next chunk from 12:00:00.000 to 23:59:59.999.
Output:
datetime                | count
------------------------+------
2022-05-08 00:00:00.000 | 2
2022-05-08 12:00:00.000 | 0
2022-05-09 00:00:00.000 | 1
2022-05-09 12:00:00.000 | 1
This should work:
Extract the hour from the timestamp, then integer-divide it by 12. That makes it 0 until 11:59 and 1 until 23:59. Then multiply it back by 12.
Use DATE_ADD() to add that resulting integer, with unit 'HOUR', to the row's timestamp truncated to the day.
SELECT
DATE_ADD('HOUR',(HOUR(ts) / 12) * 12, TRUNC(ts,'DAY')) AS halfday
, SUM(login) AS count_login
FROM indata
GROUP BY
halfday
;
-- out halfday | count_login
-- out ---------------------+-------------
-- out 2022-05-08 00:00:00 | 15879
-- out 2022-05-08 12:00:00 | 5678
This query worked for me.
SELECT
DATE_ADD('HOUR',(HOUR(ts) / 12) * 12, date_trunc('DAY',ts)) AS halfday
, SUM(login) AS count_login
FROM indata
GROUP BY
halfday
;
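One caveat: a plain GROUP BY over indata cannot produce the zero-count bucket (2022-05-08 12:00:00.000 with count 0) from the expected output, because no rows fall in that chunk. A sketch that generates a 12-hour "spine" with sequence() and left-joins the data onto it (the spine bounds are hard-coded for the sample; in practice you would derive them from the min/max of the data):
WITH spine AS (
    SELECT t AS halfday
    FROM UNNEST(
        sequence(TIMESTAMP '2022-05-08 00:00:00',
                 TIMESTAMP '2022-05-09 12:00:00',
                 INTERVAL '12' HOUR)
    ) AS u(t)
)
SELECT
    s.halfday,
    COUNT(i.login) AS count_login  -- 0 for chunks with no matching rows
FROM spine s
LEFT JOIN indata i
    ON DATE_ADD('HOUR', (HOUR(i.ts) / 12) * 12, date_trunc('DAY', i.ts)) = s.halfday
GROUP BY s.halfday
ORDER BY s.halfday;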

How do I count the max rows with the time range in SQL?

This is the table in Postgresql:
mydb=# \d login_log
Table "public.login_log"
Column | Type | Modifiers
-------------+--------------------------+-----------
id | integer |
login_start | timestamp with time zone |
login_end | timestamp with time zone |
some rows:
1 | 2015-03-19 10:00:00 | 2015-03-19 13:30:00
2 | 2015-03-19 10:20:00 | 2015-03-19 13:20:00
3 | 2015-03-19 13:00:00 | 2015-03-19 16:00:00
4 | 2015-03-19 13:10:00 | 2015-03-19 16:00:00
5 | 2015-03-19 14:30:00 | 2015-03-19 15:30:00
6 | 2015-03-19 15:00:00 | 2015-03-19 15:30:00
7 | 2015-03-19 12:00:00 | 2015-03-19 18:00:00
I need SQL to find the time range during which the maximum number of users were logged in.
With the example above, the result is:
in time range 2015-03-19 13:10:00 ~ 2015-03-19 13:20:00,
5 users were logged in (1, 2, 3, 4, 7).
Use range types (constructed on the fly). They offer quite a few helpful functions and operators. You would only need to define a custom aggregate that provides the overall intersection.
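A minimal sketch of such an aggregate, using PostgreSQL's built-in range_intersect function (the function behind the * range operator); the INITCOND '(,)' is the unbounded range, which is the neutral element for intersection:
CREATE AGGREGATE intersection (tsrange) (
    SFUNC    = range_intersect,  -- intersect the running state with each row's range
    STYPE    = tsrange,
    INITCOND = '(,)'             -- start from the unbounded range
);
With that in place, you would end up with something like this: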
with common as (
    select intersection(tsrange(login_start, login_end)) as period
    from login_log
)
select
    -- common.period,
    -- array_agg(id)
    *
from common, login_log
where tsrange(login_start, login_end) && common.period
-- group by common.period
/*
   For some reason, when uncommenting the "--" lines and commenting out
   the "*" one, sqlfiddle shows an empty result. Nevertheless it works
   on my local Postgres...
*/
See the working example: http://sqlfiddle.com/#!15/0c9c6/10
Find the timestamps of interest (every login_start and login_end) using UNION ALL, then count the number of active users at each of those timestamps:
select ts,
(select count(*) from login_log t2
where timestamps.ts between t2.login_start and t2.login_end) as count
from (select login_start as ts
from login_log
union all
select login_end
from login_log) as timestamps
order by count desc
fetch first 1 row only
Finally, order descending by count and pick the highest value only!
(From a non-PostgreSQL user, so some details may be wrong... Please comment if that's the case and I'll edit!)