How to aggregate rows in a timestamp range in Vertica (vsql) - sql

Suppose I have a table with data like this:
ts | bandwidth_bytes
---------------------+-----------------
2021-08-27 22:00:00 | 3792
2021-08-27 21:45:00 | 1164
2021-08-27 21:30:00 | 7062
2021-08-27 21:15:00 | 3637
2021-08-27 21:00:00 | 2472
2021-08-27 20:45:00 | 1328
2021-08-27 20:30:00 | 1932
2021-08-27 20:15:00 | 1434
2021-08-27 20:00:00 | 1530
2021-08-27 19:45:00 | 1457
2021-08-27 19:30:00 | 1948
2021-08-27 19:15:00 | 1160
I need to output something like this:
ts | bandwidth_bytes
---------------------+-----------------
2021-08-27 22:00:00 | 15655
2021-08-27 21:00:00 | 7166
2021-08-27 20:00:00 | 6095
I want to sum bandwidth_bytes over each 1-hour window of data.
I want to do this in vsql specifically.
More columns are present, but for simplicity I have shown only these two.

You can use date_trunc():
select date_trunc('hour', ts) as ts_hh, sum(bandwidth_bytes)
from t
group by ts_hh;
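Note that date_trunc() labels each bucket by its start (21:15 lands in the 21:00 bucket), while your expected output labels each bucket by its end. A sketch of one way to get end-of-hour labels, shifting by one second so that exact hours such as 21:00:00 stay in the earlier bucket:
select date_trunc('hour', ts - interval '1 second') + interval '1 hour' as ts_hh,
sum(bandwidth_bytes)
from t
group by ts_hh;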

Use Vertica's lovely function TIME_SLICE().
You aren't limited to going hour by hour; you can also go by slices of 2 or 3 hours, which DATE_TRUNC() does not offer.
You seem to want everything between 20:00:01 and 21:00:00 to belong to the time slice labelled 21:00:00. In both DATE_TRUNC() and TIME_SLICE(), however, it's 20:00:00 to 20:59:59 that belongs to the same time slice. So I subtracted one second before applying TIME_SLICE().
WITH
-- your input data ...
indata(ts,bandwidth_bytes) AS (
SELECT TIMESTAMP '2021-08-27 22:00:00',3792
UNION ALL SELECT TIMESTAMP '2021-08-27 21:45:00',1164
UNION ALL SELECT TIMESTAMP '2021-08-27 21:30:00',7062
UNION ALL SELECT TIMESTAMP '2021-08-27 21:15:00',3637
UNION ALL SELECT TIMESTAMP '2021-08-27 21:00:00',2472
UNION ALL SELECT TIMESTAMP '2021-08-27 20:45:00',1328
UNION ALL SELECT TIMESTAMP '2021-08-27 20:30:00',1932
UNION ALL SELECT TIMESTAMP '2021-08-27 20:15:00',1434
UNION ALL SELECT TIMESTAMP '2021-08-27 20:00:00',1530
UNION ALL SELECT TIMESTAMP '2021-08-27 19:45:00',1457
UNION ALL SELECT TIMESTAMP '2021-08-27 19:30:00',1948
UNION ALL SELECT TIMESTAMP '2021-08-27 19:15:00',1160
)
SELECT
TIME_SLICE(ts - INTERVAL '1 SECOND', 1, 'HOUR', 'END') AS ts
, SUM(bandwidth_bytes) AS bandwidth_bytes
FROM indata
GROUP BY 1
ORDER BY 1 DESC;
ts | bandwidth_bytes
---------------------+-----------------
2021-08-27 22:00:00 | 15655
2021-08-27 21:00:00 | 7166
2021-08-27 20:00:00 | 6095
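And since TIME_SLICE() takes a slice length, wider slices need only a change to the second argument; a sketch of a two-hour version on the same sample data:
SELECT
TIME_SLICE(ts - INTERVAL '1 SECOND', 2, 'HOUR', 'END') AS ts
, SUM(bandwidth_bytes) AS bandwidth_bytes
FROM indata
GROUP BY 1
ORDER BY 1 DESC;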

Related

postgres query to group the records by hourly interval with date field

I have a table with some file input data: file_id and file_input_date. I want to filter/group these file_ids by file_input_date. The problem is my date is in the format YYYY-MM-DD HH:mm:ss, and I want to go further and group by the hour, not just the date.
Edit: some sample data
file_id | file_input_date
597872 | 2023-01-12 16:06:22.92879
497872 | 2023-01-11 16:06:22.92879
397872 | 2023-01-11 16:06:22.92879
297872 | 2023-01-11 17:06:22.92879
297872 | 2023-01-11 17:06:22.92879
297872 | 2023-01-11 17:06:22.92879
297872 | 2023-01-11 18:06:22.92879
what I want to see is
1 for 2023-01-12 16:06
2 for 2023-01-11 16:06
3 for 2023-01-11 17:06
1 for 2023-01-11 18:06
the output format will be different but this kind of gives what I want.
You could convert the timestamps to strings in the format you want and group by that:
SELECT TO_CHAR(file_input_date, 'YYYY-MM-DD HH24:MI'), COUNT(*)
FROM mytable
GROUP BY TO_CHAR(file_input_date, 'YYYY-MM-DD HH24:MI')
To group by hour rather than minute:
create table date_grp (file_id integer, file_input_date timestamp);
INSERT INTO date_grp VALUES
(597872, '2023-01-12 16:06:22.92879'),
(497872, '2023-01-11 16:06:22.92879'),
(397872, '2023-01-11 16:06:22.92879'),
(297872, '2023-01-11 17:06:22.92879'),
(297872, '2023-01-11 17:06:22.92879'),
(297872, '2023-01-11 17:06:22.92879'),
(297872, '2023-01-11 18:06:22.92879');
SELECT
date_trunc('hour', file_input_date),
count(date_trunc('hour', file_input_date))
FROM
date_grp
GROUP BY
date_trunc('hour', file_input_date);
date_trunc | count
---------------------+-------
01/11/2023 18:00:00 | 1
01/11/2023 17:00:00 | 3
01/12/2023 16:00:00 | 1
01/11/2023 16:00:00 | 2
(4 rows)
Or, if you want to group by minute:
SELECT
date_trunc('minute', file_input_date),
count(date_trunc('minute', file_input_date))
FROM
date_grp
GROUP BY
date_trunc('minute', file_input_date);
date_trunc | count
---------------------+-------
01/11/2023 18:06:00 | 1
01/11/2023 16:06:00 | 2
01/12/2023 16:06:00 | 1
01/11/2023 17:06:00 | 3
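If you prefer the TO_CHAR approach from the first suggestion, the hourly version just drops the minutes from the format string; a sketch:
SELECT TO_CHAR(file_input_date, 'YYYY-MM-DD HH24'), COUNT(*)
FROM date_grp
GROUP BY TO_CHAR(file_input_date, 'YYYY-MM-DD HH24');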

SQL time-series resampling

I have a ClickHouse table with some rows like this:
id                  | created_at
--------------------+---------------------
6962098097124188161 | 2022-07-01 00:00:00
6968111372399976448 | 2022-07-02 00:00:00
6968111483775524864 | 2022-07-03 00:00:00
6968465518567268352 | 2022-07-04 00:00:00
6968952917160271872 | 2022-07-07 00:00:00
6968952924479332352 | 2022-07-09 00:00:00
I need to resample the time series and get a running count by date, like this:
created_at          | count
--------------------+------
2022-07-01 00:00:00 | 1
2022-07-02 00:00:00 | 2
2022-07-03 00:00:00 | 3
2022-07-04 00:00:00 | 4
2022-07-05 00:00:00 | 4
2022-07-06 00:00:00 | 4
2022-07-07 00:00:00 | 5
2022-07-08 00:00:00 | 5
2022-07-09 00:00:00 | 6
I've tried this
SELECT
arrayJoin(
timeSlots(
MIN(created_at),
toUInt32(24 * 3600 * 10),
24 * 3600
)
) as ts,
SUM(
COUNT(*)
) OVER (
ORDER BY
ts
)
FROM
table
but it counts all rows.
How can I get the expected result?
Why not just group by the date, like:
select toDate(created_at) as created_at, count(*) from table_name group by toDate(created_at)
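That gives the per-day counts, but the expected output is a running total with the missing dates (2022-07-05, 2022-07-06) filled in. A sketch of one way to build on it, assuming a ClickHouse version that supports window functions and ORDER BY ... WITH FILL; the filled days get cnt = 0, so the running sum simply carries forward:
SELECT
d AS created_at,
sum(cnt) OVER (ORDER BY d) AS count -- running total over the gap-filled days
FROM
(
SELECT toDate(created_at) AS d, count(*) AS cnt
FROM table_name
GROUP BY d
ORDER BY d WITH FILL -- inserts the missing dates with cnt = 0
)
ORDER BY created_at;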

compare oracle row count between different dates hourly

I am using this SQL to query the count of rows per hour for the last three days:
select trunc(sendtime ,'hh24') , count(*)
FROM t_sendedmsglog
where msgcontext like '%sm_%_tone_succ%' and sendtime > sysdate -3
group by trunc(sendtime ,'hh24')
order by trunc(sendtime ,'hh24') desc;
The result looks like this, for example:
 # | TRUNC(SENDTIME,'HH24') | COUNT(*)
 1 | 10/15/2020 12:00:00 PM | 593
 2 | 10/15/2020 11:00:00 AM | 889
 3 | 10/15/2020 10:00:00 AM | 854
 4 | 10/15/2020 9:00:00 AM  | 1027
 5 | 10/15/2020 8:00:00 AM  | 8409
 .
 .
 .
12 | 10/15/2020 1:00:00 AM  | 101
13 | 10/15/2020             | 281
14 | 10/14/2020 11:00:00 PM | 722
15 | 10/14/2020 10:00:00 PM | 1381
16 | 10/14/2020 9:00:00 PM  | 2123
 .
 .
25 | 10/14/2020 12:00:00 PM | 1195
26 | 10/14/2020 11:00:00 AM | 1699
27 | 10/14/2020 10:00:00 AM | 747
28 | 10/14/2020 9:00:00 AM  | 827
 .
 .
40 | 10/13/2020 9:00:00 PM  | 2058
41 | 10/13/2020 8:00:00 PM  | 2800
But how can I make the result appear like below instead, so I can compare the counts between different days for the same hour?
hour|10/12/2020|10/13/2020|10/14/2020|count(*)
11:00:00 PM|618 |509 |722 |
10:00:00 PM|3181|1144|1381|
09:00:00 PM|3520|2058|2123|
08:00:00 PM|3688|2800|9347|
07:00:00 PM|3648|3166|3469|
06:00:00 PM|3628|2973|4518|
05:00:00 PM|3644|2429|3607|
04:00:00 PM|3652|3678|2291|
03:00:00 PM|1017|7711|819 |
02:00:00 PM|814 |7693|1310|
01:00:00 PM|856 |825 |848 |
12:00:00 PM|558 |1531|1195|
11:00:00 AM|0 |1132|1699|
10:00:00 AM|0 |732 |747 |
09:00:00 AM|0 |709 |827 |
08:00:00 AM|0 |1256|947 |
07:00:00 AM|0 |1465|1502|
06:00:00 AM|0 |749 |780 |
05:00:00 AM|0 |181 |169 |
04:00:00 AM|0 |46 |32 |
03:00:00 AM|0 |23 |34 |
02:00:00 AM|0 |46 |39 |
01:00:00 AM|0 |82 |81 |
00:00:00 AM|0 | |218 |
Use conditional aggregation, grouping by just the hour of day so the three days line up as columns:
select to_char(sendtime, 'hh24') as hour, count(*) as total,
sum(case when trunc(sendtime) = trunc(sysdate) - interval '2' day then 1 else 0 end) as yester2day,
sum(case when trunc(sendtime) = trunc(sysdate) - interval '1' day then 1 else 0 end) as yesterday,
sum(case when trunc(sendtime) = trunc(sysdate) - interval '0' day then 1 else 0 end) as today
from t_sendedmsglog
where msgcontext like '%sm_%_tone_succ%' and
sendtime >= trunc(sysdate) - interval '2' day
group by to_char(sendtime, 'hh24')
order by to_char(sendtime, 'hh24') desc;
Grouping by trunc(sendtime, 'hh24') would keep one row per day and hour; grouping by the hour alone is what pivots the days into columns. Note that I tweaked the date comparison in the where clause as well. In Oracle, sysdate has a time component, which you don't care about for the filtering purposes.

I want to output MAX (value) over following 30 minutes IF current row value is > 10 and previous row is <10

Using this data in mytable:
date value
2019-07-11 02:20:00 UTC 14.99
2019-07-11 02:30:00 UTC 12.53
2019-07-11 02:40:00 UTC 12.53
2019-07-11 02:50:00 UTC 14.99
2019-07-11 03:00:00 UTC 10.07
2019-07-11 03:10:00 UTC 7.61
2019-07-11 03:20:00 UTC 7.61
2019-07-11 03:30:00 UTC 10.07
2019-07-11 03:40:00 UTC 10.07
2019-07-11 03:50:00 UTC 7.61
2019-07-11 04:00:00 UTC 7.61
2019-07-11 04:10:00 UTC 7.61
I want to output MAX(value) over the following 30 minutes IF the current row's value is > 10 and the previous row's value is < 10.
That is: if value is > 10, check whether the previous row's value is < 10. If so, output MAX(value) over the 30 minutes following the current row. For the table above, the first value this would output should be 10.07.
Below is for BigQuery Standard SQL
#standardSQL
SELECT *,
CASE value > 10 AND prev_value < 10
WHEN TRUE THEN
MAX(value) OVER(ORDER BY UNIX_SECONDS(ts) RANGE BETWEEN CURRENT ROW AND 1800 FOLLOWING)
ELSE NULL
END max_value_next_30_min
FROM (
SELECT *, LAG(value) OVER(ORDER BY ts) prev_value
FROM `project.dataset.table`
)
-- ORDER BY ts
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT TIMESTAMP '2019-07-11 02:20:00 UTC' ts, 14.99 value UNION ALL
SELECT '2019-07-11 02:30:00 UTC', 12.53 UNION ALL
SELECT '2019-07-11 02:40:00 UTC', 12.53 UNION ALL
SELECT '2019-07-11 02:50:00 UTC', 14.99 UNION ALL
SELECT '2019-07-11 03:00:00 UTC', 10.07 UNION ALL
SELECT '2019-07-11 03:10:00 UTC', 7.61 UNION ALL
SELECT '2019-07-11 03:20:00 UTC', 7.61 UNION ALL
SELECT '2019-07-11 03:30:00 UTC', 10.07 UNION ALL
SELECT '2019-07-11 03:40:00 UTC', 10.07 UNION ALL
SELECT '2019-07-11 03:50:00 UTC', 17.61 UNION ALL
SELECT '2019-07-11 04:00:00 UTC', 7.61 UNION ALL
SELECT '2019-07-11 04:10:00 UTC', 7.61
)
SELECT *,
CASE value > 10 AND prev_value < 10
WHEN TRUE THEN
MAX(value) OVER(ORDER BY UNIX_SECONDS(ts) RANGE BETWEEN CURRENT ROW AND 1800 FOLLOWING)
ELSE NULL
END max_value_next_30_min
FROM (
SELECT *, LAG(value) OVER(ORDER BY ts) prev_value
FROM `project.dataset.table`
)
-- ORDER BY ts
with output
Row ts value prev_value max_value_next_30_min
1 2019-07-11 02:20:00 UTC 14.99 null null
2 2019-07-11 02:30:00 UTC 12.53 14.99 null
3 2019-07-11 02:40:00 UTC 12.53 12.53 null
4 2019-07-11 02:50:00 UTC 14.99 12.53 null
5 2019-07-11 03:00:00 UTC 10.07 14.99 null
6 2019-07-11 03:10:00 UTC 7.61 10.07 null
7 2019-07-11 03:20:00 UTC 7.61 7.61 null
8 2019-07-11 03:30:00 UTC 10.07 7.61 17.61
9 2019-07-11 03:40:00 UTC 10.07 10.07 null
10 2019-07-11 03:50:00 UTC 17.61 10.07 null
11 2019-07-11 04:00:00 UTC 7.61 17.61 null
12 2019-07-11 04:10:00 UTC 7.61 7.61 null
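A note on the window itself: RANGE BETWEEN CURRENT ROW AND 1800 FOLLOWING over UNIX_SECONDS(ts) covers the next 30 minutes of actual time rather than a fixed number of rows, so it stays correct even if some 10-minute samples are missing or unevenly spaced.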

SQL - move date to within 48 hr window

I have a bunch of historic timestamps. Basically, I need to simulate a new date such that the historic dates are moved to within a 48-hour window of the current date.
This is an extract of the date column:
2019-05-07 17:46:57.733 UTC
2019-05-15 13:03:25.247 UTC
2019-05-07 13:27:49.453 UTC
2019-05-11 04:24:02.293 UTC
2019-04-18 08:00:54.660 UTC
2019-04-25 05:34:36.777 UTC
2019-05-14 16:48:07.863 UTC
Assuming the current date is 2019-10-03 15:00:00, the expected range of dates is between 2019-10-01 15:00:00 and 2019-10-03 15:00:00.
The expected results should be the following.
2019-10-02 17:46:57.733 UTC
2019-10-03 13:03:25.247 UTC
2019-10-03 13:27:49.453 UTC
2019-10-03 04:24:02.293 UTC
2019-10-02 08:00:54.660 UTC
2019-10-02 05:34:36.777 UTC
2019-10-01 16:48:07.863 UTC
Why not just construct random timestamps within the last two days?
select timestamp_sub(current_timestamp, interval cast(rand() * (60 * 60 * 24 * 2) as int64) second)
from t
(timestamp_sub keeps each result within the trailing 48 hours; timestamp_add would push the dates into the future. This randomizes the time of day rather than keeping the original one, which the next answer preserves.)
It feels like you are looking for a random date function.
CREATE TEMP FUNCTION random_date()
RETURNS DATE
AS ( DATE_SUB(CURRENT_DATE(), INTERVAL CAST(FLOOR(RAND() * 29 / 10) AS INT64) DAY));
with data as (
select "2019-05-07 17:46:57.733 UTC" as date_time UNION ALL
select "2019-05-15 13:03:25.247 UTC" UNION ALL
select "2019-05-07 13:27:49.453 UTC" UNION ALL
select "2019-05-11 04:24:02.293 UTC" UNION ALL
select "2019-04-18 08:00:54.660 UTC" UNION ALL
select "2019-04-25 05:34:36.777 UTC" UNION ALL
select "2019-05-14 16:48:07.863 UTC" )
SELECT
CONCAT(FORMAT_DATE("%Y-%m-%d", random_date()), " ", SUBSTR(date_time, 12))
FROM data;
Output:
+-----------------------------+
| f0_ |
+-----------------------------+
| 2019-10-01 17:46:57.733 UTC |
| 2019-10-01 13:03:25.247 UTC |
| 2019-10-02 13:27:49.453 UTC |
| 2019-10-03 04:24:02.293 UTC |
| 2019-10-03 08:00:54.660 UTC |
| 2019-10-03 05:34:36.777 UTC |
| 2019-10-02 16:48:07.863 UTC |
+-----------------------------+
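For reference, random_date() subtracts 0, 1, or 2 whole days: RAND() is in [0, 1), so RAND() * 29 / 10 lies in [0, 2.9), and FLOOR plus the CAST yields 0, 1, or 2. The original times of day survive because SUBSTR(date_time, 12) splices the time portion back onto the new date.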