SQL - move date to within 48 hr window


I have a bunch of historic timestamps. Basically, I need to simulate new dates so that the historic dates are moved to within a 48-hour window of the current date.
This is an extract of the date column:
2019-05-07 17:46:57.733 UTC
2019-05-15 13:03:25.247 UTC
2019-05-07 13:27:49.453 UTC
2019-05-11 04:24:02.293 UTC
2019-04-18 08:00:54.660 UTC
2019-04-25 05:34:36.777 UTC
2019-05-14 16:48:07.863 UTC
Assume the current date is 2019-10-03 15:00:00. The new dates should then fall between 2019-10-01 15:00:00 and 2019-10-03 15:00:00.
The expected results are shown below; each row keeps its original time of day:
2019-10-02 17:46:57.733 UTC
2019-10-03 13:03:25.247 UTC
2019-10-03 13:27:49.453 UTC
2019-10-03 04:24:02.293 UTC
2019-10-02 08:00:54.660 UTC
2019-10-02 05:34:36.777 UTC
2019-10-01 16:48:07.863 UTC

Why not just construct two days of random timestamps?
-- subtract (not add) the random offset so the results fall in the 48 hours before now
select timestamp_sub(current_timestamp, interval cast(rand() * (60 * 60 * 24 * 2) as int64) second)
from t
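A minimal Python sketch of the same idea, assuming a fixed "now" (the question's 2019-10-03 15:00:00 UTC) so runs are reproducible. It subtracts a random number of whole seconds between 0 and 48 hours, so every result lands inside the window the question asks for. Like the query, it discards the original time of day. `random_in_window` and `WINDOW` are hypothetical names.

```python
import random
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(hours=48)


def random_in_window(now: datetime, rng: random.Random) -> datetime:
    """Return a random timestamp between now - 48h and now (whole-second offsets)."""
    offset_seconds = rng.randint(0, int(WINDOW.total_seconds()))  # inclusive on both ends
    return now - timedelta(seconds=offset_seconds)


if __name__ == "__main__":
    now = datetime(2019, 10, 3, 15, 0, 0, tzinfo=timezone.utc)
    rng = random.Random(42)  # fixed seed so repeated runs print the same values
    for _ in range(7):  # one value per row of the sample data
        print(random_in_window(now, rng).isoformat(sep=" "))
```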

It feels like you are looking for a random date function.
CREATE TEMP FUNCTION random_date()
RETURNS DATE
AS ( DATE_SUB(CURRENT_DATE(), INTERVAL CAST(FLOOR(RAND() * 29 / 10) AS INT64) DAY));
with data as (
select "2019-05-07 17:46:57.733 UTC" as date_time UNION ALL
select "2019-05-15 13:03:25.247 UTC" UNION ALL
select "2019-05-07 13:27:49.453 UTC" UNION ALL
select "2019-05-11 04:24:02.293 UTC" UNION ALL
select "2019-04-18 08:00:54.660 UTC" UNION ALL
select "2019-04-25 05:34:36.777 UTC" UNION ALL
select "2019-05-14 16:48:07.863 UTC" )
SELECT
CONCAT(FORMAT_DATE("%Y-%m-%d", random_date()), " ", SUBSTR(date_time, 12))
FROM data;
Output:
+-----------------------------+
| f0_ |
+-----------------------------+
| 2019-10-01 17:46:57.733 UTC |
| 2019-10-01 13:03:25.247 UTC |
| 2019-10-02 13:27:49.453 UTC |
| 2019-10-03 04:24:02.293 UTC |
| 2019-10-03 08:00:54.660 UTC |
| 2019-10-03 05:34:36.777 UTC |
| 2019-10-02 16:48:07.863 UTC |
+-----------------------------+
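Because random_date() picks today, yesterday, or two days ago without looking at the time of day, some results can fall outside the window: relative to the question's assumed current time of 2019-10-03 15:00:00, the output row 2019-10-01 13:03:25.247 is earlier than the 2019-10-01 15:00:00 lower bound, and a time later than 15:00 placed on the current day would be in the future. The Python sketch below keeps each row's time of day but only chooses among dates that land inside the window. It assumes naive datetimes that are all in UTC; `move_into_window`, `WINDOW` and `SAMPLE` are hypothetical names.

```python
import random
from datetime import datetime, timedelta

WINDOW = timedelta(hours=48)
SAMPLE = [
    "2019-05-07 17:46:57.733 UTC",
    "2019-05-15 13:03:25.247 UTC",
    "2019-05-07 13:27:49.453 UTC",
    "2019-05-11 04:24:02.293 UTC",
    "2019-04-18 08:00:54.660 UTC",
    "2019-04-25 05:34:36.777 UTC",
    "2019-05-14 16:48:07.863 UTC",
]


def move_into_window(ts: datetime, now: datetime, rng: random.Random) -> datetime:
    """Keep ts's time of day, but pick a date so that now - 48h <= result <= now."""
    candidates = []
    for days_back in range(3):  # today, yesterday, two days ago
        day = now.date() - timedelta(days=days_back)
        moved = datetime.combine(day, ts.time())
        if now - WINDOW <= moved <= now:
            candidates.append(moved)
    # For any time of day, at least two of the three dates qualify.
    return rng.choice(candidates)


if __name__ == "__main__":
    now = datetime(2019, 10, 3, 15, 0, 0)  # the question's assumed current time (UTC)
    rng = random.Random(7)
    for text in SAMPLE:
        ts = datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f UTC")
        moved = move_into_window(ts, now, rng)
        print(moved.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3] + " UTC")
```

The window check is done per candidate date rather than by weighting the random draw, so the result is always valid regardless of when the query runs.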

Related

SQL time-series resampling

I have a ClickHouse table with rows like these:
id                  | created_at
--------------------+--------------------
6962098097124188161 | 2022-07-01 00:00:00
6968111372399976448 | 2022-07-02 00:00:00
6968111483775524864 | 2022-07-03 00:00:00
6968465518567268352 | 2022-07-04 00:00:00
6968952917160271872 | 2022-07-07 00:00:00
6968952924479332352 | 2022-07-09 00:00:00
I need to resample the time series and get a running count for every date, like this:
created_at          | count
--------------------+------
2022-07-01 00:00:00 | 1
2022-07-02 00:00:00 | 2
2022-07-03 00:00:00 | 3
2022-07-04 00:00:00 | 4
2022-07-05 00:00:00 | 4
2022-07-06 00:00:00 | 4
2022-07-07 00:00:00 | 5
2022-07-08 00:00:00 | 5
2022-07-09 00:00:00 | 6
I've tried this
SELECT
arrayJoin(
timeSlots(
MIN(created_at),
toUInt32(24 * 3600 * 10),
24 * 3600
)
) as ts,
SUM(
COUNT(*)
) OVER (
ORDER BY
ts
)
FROM
table
but it counts all rows.
How can I get the expected result?

Why not use GROUP BY on the date? For example:
select toDate(created_at) as created_date, count(*) from table_name group by created_date order by created_date
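A plain GROUP BY returns one count per day that has rows, while the expected output also fills in the missing days (07-05, 07-06, 07-08) and keeps a running total. The Python sketch below shows only that extra logic, using the question's created_at values; it is not ClickHouse code, and `cumulative_daily_counts` is a hypothetical name.

```python
from collections import Counter
from datetime import datetime, timedelta


def cumulative_daily_counts(created_at):
    """Return (day, running_count) for every day from the first row's day to the last row's day."""
    per_day = Counter(ts.date() for ts in created_at)
    if not per_day:
        return []
    day, last_day = min(per_day), max(per_day)
    result, running = [], 0
    while day <= last_day:
        running += per_day[day]  # a Counter returns 0 for days without rows, so the total carries over
        result.append((day, running))
        day += timedelta(days=1)
    return result


if __name__ == "__main__":
    rows = [datetime(2022, 7, d) for d in (1, 2, 3, 4, 7, 9)]  # created_at values from the question
    for day, count in cumulative_daily_counts(rows):
        print(day, count)
```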

Create table with 15 minutes interval on date time in Snowflake

I am trying to create a table in Snowflake with a 15-minute interval. I have tried generator, but it doesn't give me 15-minute intervals. Is there any function I can use to generate and build this table for a couple of years' worth of data?
Such as:
Date       | Hour
-----------+---------
2022-03-29 | 02:00 AM
2022-03-29 | 02:15 AM
2022-03-29 | 02:30 AM
2022-03-29 | 02:45 AM
2022-03-29 | 03:00 AM
2022-03-29 | 03:15 AM
...        | ...
Thanks

Use the following as a time generator with a 15-minute interval, then use other date/time functions as needed to extract the date part and the time part into separate columns.
with CTE as
(select timestampadd(min,seq4()*15 ,date_trunc(hour, current_timestamp())) as time_count
from table(generator(rowcount=>4*24)))
select time_count from cte;
+-------------------------------+
| TIME_COUNT |
|-------------------------------|
| 2022-03-29 14:00:00.000 -0700 |
| 2022-03-29 14:15:00.000 -0700 |
| 2022-03-29 14:30:00.000 -0700 |
| 2022-03-29 14:45:00.000 -0700 |
| 2022-03-29 15:00:00.000 -0700 |
| 2022-03-29 15:15:00.000 -0700 |
.
.
.
....truncated output
| 2022-03-30 13:15:00.000 -0700 |
| 2022-03-30 13:30:00.000 -0700 |
| 2022-03-30 13:45:00.000 -0700 |
+-------------------------------+

There are already many answers to this question on this site, four of them from this month alone.
But the major point to note is that you MUST NOT use SEQx() as the number generator (you can use it in the ORDER BY, but that is not needed). As noted in the docs:
Important
This function uses sequences to produce a unique set of increasing integers, but does not necessarily produce a gap-free sequence. When operating on a large quantity of data, gaps can appear in a sequence. If a fully ordered, gap-free sequence is required, consider using the ROW_NUMBER window function.
CREATE TABLE table_of_2_years_date_times AS
SELECT
date_time::date as date,
date_time::time as time
FROM (
SELECT
row_number() over (order by null)-1 as rn
,dateadd('minute', 15 * rn, '2022-03-01'::date) as date_time
from table(generator(rowcount=>4*24*365*2))
)
ORDER BY rn;
Then select the first and last five rows:
(SELECT * FROM table_of_2_years_date_times ORDER BY date,time LIMIT 5)
UNION ALL
(SELECT * FROM table_of_2_years_date_times ORDER BY date desc,time desc LIMIT 5)
ORDER BY 1,2;
DATE       | TIME
-----------+---------
2022-03-01 | 00:00:00
2022-03-01 | 00:15:00
2022-03-01 | 00:30:00
2022-03-01 | 00:45:00
2022-03-01 | 01:00:00
2024-02-28 | 22:45:00
2024-02-28 | 23:00:00
2024-02-28 | 23:15:00
2024-02-28 | 23:30:00
2024-02-28 | 23:45:00
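The key point of this answer is that the slot offset comes from a gap-free index (ROW_NUMBER() - 1), so consecutive rows are always exactly 15 minutes apart. A Python sketch of the same arithmetic, with a hypothetical `quarter_hour_slots` helper and the answer's start date and row count, reproduces the first and last rows shown above:

```python
from datetime import datetime, timedelta


def quarter_hour_slots(start: datetime, count: int):
    """Yield (date, time) pairs for `count` consecutive 15-minute slots beginning at `start`."""
    for rn in range(count):  # rn mirrors ROW_NUMBER() - 1: 0, 1, 2, ... with no gaps
        slot = start + timedelta(minutes=15 * rn)
        yield slot.date(), slot.time()


if __name__ == "__main__":
    slots = list(quarter_hour_slots(datetime(2022, 3, 1), 4 * 24 * 365 * 2))
    for d, t in slots[:5] + slots[-5:]:
        print(d, t)
```

The last slot is 2024-02-28 23:45:00 rather than 2024-02-29 because 365 * 2 days from 2022-03-01 crosses the leap day in February 2024.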

How to aggregate rows in the range of timestamp in vertica db (vsql)

Suppose I have a table with data like this:
ts | bandwidth_bytes
---------------------+-----------------
2021-08-27 22:00:00 | 3792
2021-08-27 21:45:00 | 1164
2021-08-27 21:30:00 | 7062
2021-08-27 21:15:00 | 3637
2021-08-27 21:00:00 | 2472
2021-08-27 20:45:00 | 1328
2021-08-27 20:30:00 | 1932
2021-08-27 20:15:00 | 1434
2021-08-27 20:00:00 | 1530
2021-08-27 19:45:00 | 1457
2021-08-27 19:30:00 | 1948
2021-08-27 19:15:00 | 1160
I need to output something like this:
ts | bandwidth_bytes
---------------------+-----------------
2021-08-27 22:00:00 | 15655
2021-08-27 21:00:00 | 7166
2021-08-27 20:00:00 | 6095
I want to sum bandwidth_bytes over each 1-hour period of data.
I want to do this in vsql specifically.
More columns are present but for simplification I have shown only these two.

You can use date_trunc():
select date_trunc('hour', ts) as ts_hh, sum(bandwidth_bytes)
from t
group by ts_hh;
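A Python sketch of what this grouping does with the question's data (it imitates date_trunc('hour', ts) with `datetime.replace`; `ROWS` and `sum_by_hour_start` are hypothetical names). Because each row is labeled with the start of its hour, the 22:00:00 row ends up alone in its group (3792) instead of the expected 15655, and 21:00:00 is grouped with 21:15 to 21:45:

```python
from collections import defaultdict
from datetime import datetime

ROWS = [  # (ts, bandwidth_bytes) from the question
    (datetime(2021, 8, 27, 22, 0), 3792),
    (datetime(2021, 8, 27, 21, 45), 1164),
    (datetime(2021, 8, 27, 21, 30), 7062),
    (datetime(2021, 8, 27, 21, 15), 3637),
    (datetime(2021, 8, 27, 21, 0), 2472),
    (datetime(2021, 8, 27, 20, 45), 1328),
    (datetime(2021, 8, 27, 20, 30), 1932),
    (datetime(2021, 8, 27, 20, 15), 1434),
    (datetime(2021, 8, 27, 20, 0), 1530),
    (datetime(2021, 8, 27, 19, 45), 1457),
    (datetime(2021, 8, 27, 19, 30), 1948),
    (datetime(2021, 8, 27, 19, 15), 1160),
]


def sum_by_hour_start(rows):
    """Group rows by the start of their hour, as date_trunc('hour', ts) does."""
    totals = defaultdict(int)
    for ts, nbytes in rows:
        totals[ts.replace(minute=0, second=0, microsecond=0)] += nbytes
    return dict(totals)


if __name__ == "__main__":
    for hour, total in sorted(sum_by_hour_start(ROWS).items(), reverse=True):
        print(hour, total)
```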

Use Vertica's lovely function TIME_SLICE().
Not only can you go by hour; you can also use slices of 2 or 3 hours, which DATE_TRUNC() does not offer.
You seem to want everything between 20:00:01 and 21:00:00 to belong to the 21:00:00 time slice. In both DATE_TRUNC() and TIME_SLICE(), however, it is 20:00:00 to 20:59:59 that belongs to the same time slice. So I subtracted one second before applying TIME_SLICE(), and used its 'END' option so each slice is labeled with its end time.
WITH
-- your in data ...
indata(ts,bandwidth_bytes) AS (
SELECT TIMESTAMP '2021-08-27 22:00:00',3792
UNION ALL SELECT TIMESTAMP '2021-08-27 21:45:00',1164
UNION ALL SELECT TIMESTAMP '2021-08-27 21:30:00',7062
UNION ALL SELECT TIMESTAMP '2021-08-27 21:15:00',3637
UNION ALL SELECT TIMESTAMP '2021-08-27 21:00:00',2472
UNION ALL SELECT TIMESTAMP '2021-08-27 20:45:00',1328
UNION ALL SELECT TIMESTAMP '2021-08-27 20:30:00',1932
UNION ALL SELECT TIMESTAMP '2021-08-27 20:15:00',1434
UNION ALL SELECT TIMESTAMP '2021-08-27 20:00:00',1530
UNION ALL SELECT TIMESTAMP '2021-08-27 19:45:00',1457
UNION ALL SELECT TIMESTAMP '2021-08-27 19:30:00',1948
UNION ALL SELECT TIMESTAMP '2021-08-27 19:15:00',1160
)
SELECT
TIME_SLICE(ts - INTERVAL '1 SECOND' ,1,'HOUR','END') AS ts
, SUM(bandwidth_bytes) AS bandwidth_bytes
FROM indata
GROUP BY 1
ORDER BY 1 DESC;
ts | bandwidth_bytes
---------------------+-----------------
2021-08-27 22:00:00 | 15655
2021-08-27 21:00:00 | 7166
2021-08-27 20:00:00 | 6095
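The same end-labeled bucketing can be sketched in Python to check the totals against the expected output. This only imitates TIME_SLICE(ts - INTERVAL '1 SECOND', 1, 'HOUR', 'END') for this data under the assumption of naive timestamps; it is not Vertica's implementation, and `ROWS` and `sum_by_hour_end` are hypothetical names.

```python
from collections import defaultdict
from datetime import datetime, timedelta

ROWS = [  # (ts, bandwidth_bytes) from the question
    (datetime(2021, 8, 27, 22, 0), 3792),
    (datetime(2021, 8, 27, 21, 45), 1164),
    (datetime(2021, 8, 27, 21, 30), 7062),
    (datetime(2021, 8, 27, 21, 15), 3637),
    (datetime(2021, 8, 27, 21, 0), 2472),
    (datetime(2021, 8, 27, 20, 45), 1328),
    (datetime(2021, 8, 27, 20, 30), 1932),
    (datetime(2021, 8, 27, 20, 15), 1434),
    (datetime(2021, 8, 27, 20, 0), 1530),
    (datetime(2021, 8, 27, 19, 45), 1457),
    (datetime(2021, 8, 27, 19, 30), 1948),
    (datetime(2021, 8, 27, 19, 15), 1160),
]


def sum_by_hour_end(rows):
    """Label each row with the end of its hour, treating hh:00:00 as the end of the previous hour."""
    totals = defaultdict(int)
    for ts, nbytes in rows:
        shifted = ts - timedelta(seconds=1)  # 21:00:00 becomes 20:59:59, so it joins the 20:00-21:00 slice
        hour_end = shifted.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
        totals[hour_end] += nbytes
    return dict(totals)


if __name__ == "__main__":
    for hour, total in sorted(sum_by_hour_end(ROWS).items(), reverse=True):
        print(hour, total)
```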

compare oracle row count between different dates hourly

I am using this SQL to query the hourly row count for the last three days:
select trunc(sendtime ,'hh24') , count(*)
FROM t_sendedmsglog
where msgcontext like '%sm_%_tone_succ%' and sendtime > sysdate -3
group by trunc(sendtime ,'hh24')
order by trunc(sendtime ,'hh24') desc;
and the result looks like this, for example:
#|TRUNC(SENDTIME,'HH24')|COUNT(*)|
1|10/15/2020|12:00:00 PM|593|
2|10/15/2020|11:00:00 AM|889|
3|10/15/2020|10:00:00 AM|854|
4|10/15/2020|9:00:00 AM|1027|
5|10/15/2020|8:00:00 AM|8409|
.
.
.
12|10/15/2020|1:00:00 AM|101|
13|10/15/2020|281|
14|10/14/2020|11:00:00 PM|722|
15|10/14/2020|10:00:00 PM|1381|
16|10/14/2020|9:00:00 PM|2123|
.
.
25|10/14/2020|12:00:00 PM|1195|
26|10/14/2020|11:00:00 AM|1699|
27|10/14/2020|10:00:00 AM|747|
28|10/14/2020|9:00:00 AM|827|
.
.
40|10/13/2020|9:00:00 PM|2058|
41|10/13/2020|8:00:00 PM|2800|
But how can I make the result appear like the table below instead, so I can compare the count for the same hour across different days?
hour|10/12/2020|10/13/2020|10/14/2020|count(*)
11:00:00 PM|618 |509 |722 |
10:00:00 PM|3181|1144|1381|
09:00:00 PM|3520|2058|2123|
08:00:00 PM|3688|2800|9347|
07:00:00 PM|3648|3166|3469|
06:00:00 PM|3628|2973|4518|
05:00:00 PM|3644|2429|3607|
04:00:00 PM|3652|3678|2291|
03:00:00 PM|1017|7711|819 |
02:00:00 PM|814 |7693|1310|
01:00:00 PM|856 |825 |848 |
12:00:00 PM|558 |1531|1195|
11:00:00 AM|0 |1132|1699|
10:00:00 AM|0 |732 |747 |
09:00:00 AM|0 |709 |827 |
08:00:00 AM|0 |1256|947 |
07:00:00 AM|0 |1465|1502|
06:00:00 AM|0 |749 |780 |
05:00:00 AM|0 |181 |169 |
04:00:00 AM|0 |46 |32 |
03:00:00 AM|0 |23 |34 |
02:00:00 AM|0 |46 |39 |
01:00:00 AM|0 |82 |81 |
00:00:00 AM|0 | |218 |

Use conditional aggregation:
select to_char(sendtime, 'hh24') as hour_of_day, count(*) as total,
       sum(case when trunc(sendtime) = trunc(sysdate) - interval '2' day then 1 else 0 end) as yester2day,
       sum(case when trunc(sendtime) = trunc(sysdate) - interval '1' day then 1 else 0 end) as yesterday,
       sum(case when trunc(sendtime) = trunc(sysdate) - interval '0' day then 1 else 0 end) as today
from t_sendedmsglog
where msgcontext like '%sm_%_tone_succ%' and
      sendtime >= trunc(sysdate) - interval '2' day
-- group by hour of day only (not date + hour), so each row compares the same hour across the days
group by to_char(sendtime, 'hh24')
order by hour_of_day desc;
Note that I tweaked the date comparison in the WHERE clause as well. In Oracle, sysdate has a time component, which you don't care about for filtering purposes.
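The hour-by-day layout in the question comes down to counting rows per (hour of day, date) pair and then spreading the dates into columns, with 0 where an hour had no rows. A Python sketch of that pivot, using a few hypothetical send times rather than the question's data (`hourly_pivot` is also a hypothetical name):

```python
from collections import Counter
from datetime import date, datetime


def hourly_pivot(send_times, days):
    """Map each hour of day (23 down to 0) to its row counts on each of the given days."""
    counts = Counter((ts.hour, ts.date()) for ts in send_times)
    return {hour: [counts[(hour, day)] for day in days] for hour in range(23, -1, -1)}


if __name__ == "__main__":
    # Hypothetical send times, not taken from the question's table.
    send_times = [
        datetime(2020, 10, 13, 21, 5),
        datetime(2020, 10, 13, 21, 40),
        datetime(2020, 10, 14, 21, 10),
        datetime(2020, 10, 14, 9, 0),
    ]
    days = [date(2020, 10, 13), date(2020, 10, 14)]
    for hour, per_day in hourly_pivot(send_times, days).items():
        if any(per_day):
            print(f"{hour:02d}:00", *per_day, sep=" | ")
```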

Get the dates of two weeks from today from database

I have some dates in a PostgreSQL database. I want to find the dates from today through the next two weeks (14 days). How can I find the dates between the current date and 14 days from now? This query is not working.
The dates in the database are in the format 2019-12-26.
"SELECT work_date FROM USERS_SCHEDULE WHERE user_id = 11 AND data(now() +14)";

You can set the upper limit you want simply by adding the number of days to the date.
Sample Data
CREATE TABLE users_schedule (work_date DATE);
INSERT INTO users_schedule
SELECT generate_series(CURRENT_DATE, DATE '2020-01-31', '1 day');
Query (dates between the current date and 3 days later)
SELECT work_date FROM users_schedule
WHERE work_date BETWEEN CURRENT_DATE AND CURRENT_DATE + 3;
work_date
------------
2019-12-26
2019-12-27
2019-12-28
2019-12-29
(4 rows)
If you mean you want to get all possible dates inside an interval, take a look at generate_series:
SELECT generate_series(DATE '2016-08-01', DATE '2016-08-14', '1 day');
generate_series
------------------------
2016-08-01 00:00:00+02
2016-08-02 00:00:00+02
2016-08-03 00:00:00+02
2016-08-04 00:00:00+02
2016-08-05 00:00:00+02
2016-08-06 00:00:00+02
2016-08-07 00:00:00+02
2016-08-08 00:00:00+02
2016-08-09 00:00:00+02
2016-08-10 00:00:00+02
2016-08-11 00:00:00+02
2016-08-12 00:00:00+02
2016-08-13 00:00:00+02
2016-08-14 00:00:00+02
(14 rows)
Using CURRENT_DATE
SELECT generate_series(CURRENT_DATE, DATE '2019-12-31', '1 day');
generate_series
------------------------
2019-12-26 00:00:00+01
2019-12-27 00:00:00+01
2019-12-28 00:00:00+01
2019-12-29 00:00:00+01
2019-12-30 00:00:00+01
2019-12-31 00:00:00+01
(6 rows)

SELECT work_date
FROM users_schedule
WHERE user_id = 11
AND work_date BETWEEN CURRENT_DATE
AND CURRENT_DATE + INTERVAL '14 days'
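BETWEEN is inclusive at both ends, which is why the 3-day query earlier returns 4 rows; a 14-day window can therefore return up to 15 dates. A Python sketch of the same filter, ignoring the user_id condition and using a hypothetical one-entry-per-day schedule (`dates_within` is a hypothetical name):

```python
from datetime import date, timedelta


def dates_within(work_dates, today: date, days: int = 14):
    """Return sorted dates d with today <= d <= today + days (both ends inclusive, like BETWEEN)."""
    limit = today + timedelta(days=days)
    return sorted(d for d in work_dates if today <= d <= limit)


if __name__ == "__main__":
    today = date(2019, 12, 26)
    # Hypothetical schedule: one work_date per day from three days ago to a month ahead.
    schedule = [today + timedelta(days=i) for i in range(-3, 31)]
    print(dates_within(schedule, today, days=3))  # same window as the 3-day query above
    print(len(dates_within(schedule, today)))  # 14-day window
```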
