Using generate_series with partition by generates triple duplicate rows - sql

I have a table which looks like this:
dt type
-----------------------------
2019-07-01 10:00:00 A
2019-07-01 10:15:00 A
2019-07-01 11:00:00 A
2019-07-01 08:30:00 B
2019-07-01 08:45:00 B
2019-07-01 09:30:00 B
Each type has its own dt values, and within each type the dt values should form a consecutive series at 15-minute intervals. But some rows are missing. So I used generate_series() to add the missing datetimes, with partition by to do it separately for each type:
SELECT
generate_series(min(dt) over (partition by type),
max(dt) over (partition by type), interval '15 minute')
, type
FROM t
This generates the datetimes for the dt column from the min to the max dt of each type, in steps of 15 minutes.
This is what I expect to get:
dt type
-----------------------------
2019-07-01 10:00:00 A
2019-07-01 10:15:00 A
2019-07-01 10:30:00 A
2019-07-01 10:45:00 A
2019-07-01 11:00:00 A
2019-07-01 08:30:00 B
2019-07-01 08:45:00 B
2019-07-01 09:00:00 B
2019-07-01 09:15:00 B
2019-07-01 09:30:00 B
What I actually get looks like the expected result, except that every type and datetime is returned three times.
E.g.
dt type
-----------------------------
2019-07-01 10:00:00 A
2019-07-01 10:15:00 A
2019-07-01 10:30:00 A
2019-07-01 10:45:00 A
2019-07-01 11:00:00 A
2019-07-01 10:00:00 A
2019-07-01 10:15:00 A
2019-07-01 10:30:00 A
2019-07-01 10:45:00 A
2019-07-01 11:00:00 A
2019-07-01 10:00:00 A
2019-07-01 10:15:00 A
2019-07-01 10:30:00 A
2019-07-01 10:45:00 A
2019-07-01 11:00:00 A
2019-07-01 08:30:00 B
. . .
The same happens for type B.
What do I need to change in my query to get the expected result?

You just want to run generate_series() over the aggregation:
SELECT type, generate_series(min_dt, max_dt, interval '15 minute')
FROM (SELECT type, MIN(dt) as min_dt, MAX(dt) as max_dt
FROM t
GROUP BY type
) t;
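The window functions start by adding the min and max values to each row, without collapsing any rows. Each row then gets its own series, and since each type has three rows in the table, every series appears three times.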
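To see why the original query triples its output, here is a small Python model of the two approaches. It is only a sketch, not PostgreSQL itself: the rows are copied from the question, and the helper names `series`, `bounds_by_type`, `per_row` and `per_group` are made up for illustration.

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=15)

# Sample rows copied from the question: (dt, type)
ROWS = [
    (datetime(2019, 7, 1, 10, 0), "A"),
    (datetime(2019, 7, 1, 10, 15), "A"),
    (datetime(2019, 7, 1, 11, 0), "A"),
    (datetime(2019, 7, 1, 8, 30), "B"),
    (datetime(2019, 7, 1, 8, 45), "B"),
    (datetime(2019, 7, 1, 9, 30), "B"),
]


def series(start, stop, step=STEP):
    """Rough stand-in for generate_series(start, stop, step), inclusive."""
    out = []
    current = start
    while current <= stop:
        out.append(current)
        current += step
    return out


def bounds_by_type(rows):
    """MIN(dt) and MAX(dt) per type."""
    bounds = {}
    for dt, typ in rows:
        lo, hi = bounds.get(typ, (dt, dt))
        bounds[typ] = (min(lo, dt), max(hi, dt))
    return bounds


def per_row(rows):
    """Window-function version: every input row carries its type's min/max,
    so every input row expands into a full series."""
    bounds = bounds_by_type(rows)
    return [(dt, typ) for _, typ in rows for dt in series(*bounds[typ])]


def per_group(rows):
    """Aggregated version: one (min, max) pair per type, one series per type."""
    return [(dt, typ)
            for typ, (lo, hi) in bounds_by_type(rows).items()
            for dt in series(lo, hi)]


print(len(per_row(ROWS)))    # 30: 6 input rows x 5 slots each
print(len(per_group(ROWS)))  # 10: 2 types x 5 slots each
```

The only difference between the two functions is how many (min, max) pairs feed the series generator, which is the difference between the window functions and GROUP BY.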

Related

Analyze a Time Series

I am inserting data into a table with date/time column.
I want to find the rate of inserts per time period, as follows:
Duration # of Records
1:00pm - 2:00PM 1000
2:00pm - 3:00PM 1400
.......................
11:00PM- 12:00 1100
I can get the above by repeatedly executing a query like the following:
select count(*) from table_A where insert_date between 1:00pm and 2:00pm
Is there an Oracle-supplied package or function that can produce the above report without having to execute separate statements?
Here are a couple of examples. To get "sparse" results, i.e. just the data that exists within the table, you can simply use TRUNC:
SQL> create table data ( d date );
Table created.
SQL>
SQL> insert into data
2 select date '2022-02-10' + dbms_random.normal/10
3 from dual
4 connect by level <= 10000;
10000 rows created.
SQL>
SQL> select trunc(d,'HH24'), count(*)
2 from data
3 group by trunc(d,'HH24')
4 order by 1;
TRUNC(D,'HH24') COUNT(*)
------------------- ----------
09/02/2022 13:00:00 1
09/02/2022 15:00:00 4
09/02/2022 16:00:00 10
09/02/2022 17:00:00 40
09/02/2022 18:00:00 126
09/02/2022 19:00:00 282
09/02/2022 20:00:00 595
09/02/2022 21:00:00 948
09/02/2022 22:00:00 1389
09/02/2022 23:00:00 1577
10/02/2022 00:00:00 1609
10/02/2022 01:00:00 1362
10/02/2022 02:00:00 956
10/02/2022 03:00:00 624
10/02/2022 04:00:00 281
10/02/2022 05:00:00 134
10/02/2022 06:00:00 43
10/02/2022 07:00:00 16
10/02/2022 08:00:00 2
10/02/2022 10:00:00 1
20 rows selected.
If you need to get ALL hours, even if there was no data for a given hour, you can OUTER JOIN the raw data to a synthetic list of rows with all hours for the desired range, e.g.:
SQL> with full_range as
2 ( select date '2022-02-09' + rownum/24 hr
3 from dual
4 connect by level <= 48
5 ),
6 raw_data as
7 ( select trunc(d,'HH24') dhr, count(*) cnt
8 from data
9 group by trunc(d,'HH24')
10 )
11 select full_range.hr, raw_data.cnt
12 from raw_data, full_range
13 where full_range.hr = raw_data.dhr(+)
14 order by 1;
HR CNT
------------------- ----------
09/02/2022 01:00:00
09/02/2022 02:00:00
09/02/2022 03:00:00
09/02/2022 04:00:00
09/02/2022 05:00:00
09/02/2022 06:00:00
09/02/2022 07:00:00
09/02/2022 08:00:00
09/02/2022 09:00:00
09/02/2022 10:00:00
09/02/2022 11:00:00
09/02/2022 12:00:00
09/02/2022 13:00:00 1
09/02/2022 14:00:00
09/02/2022 15:00:00 4
09/02/2022 16:00:00 10
09/02/2022 17:00:00 40
09/02/2022 18:00:00 126
09/02/2022 19:00:00 282
09/02/2022 20:00:00 595
09/02/2022 21:00:00 948
09/02/2022 22:00:00 1389
09/02/2022 23:00:00 1577
10/02/2022 00:00:00 1609
10/02/2022 01:00:00 1362
10/02/2022 02:00:00 956
10/02/2022 03:00:00 624
10/02/2022 04:00:00 281
10/02/2022 05:00:00 134
10/02/2022 06:00:00 43
10/02/2022 07:00:00 16
10/02/2022 08:00:00 2
10/02/2022 09:00:00
10/02/2022 10:00:00 1
10/02/2022 11:00:00
10/02/2022 12:00:00
10/02/2022 13:00:00
10/02/2022 14:00:00
10/02/2022 15:00:00
10/02/2022 16:00:00
10/02/2022 17:00:00
10/02/2022 18:00:00
10/02/2022 19:00:00
10/02/2022 20:00:00
10/02/2022 21:00:00
10/02/2022 22:00:00
10/02/2022 23:00:00
11/02/2022 00:00:00
48 rows selected.
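For readers who want the idea outside of Oracle, the sparse-versus-dense distinction can be sketched in Python. This is only an illustration: `hourly_counts` and the sample timestamps are invented here, and unlike the query above (which starts at rownum/24, one hour after the base date) the sketch starts at the base hour itself.

```python
from collections import Counter
from datetime import datetime, timedelta


def hourly_counts(timestamps, start, hours):
    """Return (hour, count) for `hours` consecutive hours beginning at `start`.

    Hours with no data get None, like the NULLs produced by the outer join.
    """
    # Equivalent of TRUNC(d, 'HH24') followed by GROUP BY: the sparse result.
    sparse = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )
    # Equivalent of the synthetic full_range, looked up against the sparse data.
    dense = []
    for n in range(hours):
        hr = start + timedelta(hours=n)
        dense.append((hr, sparse.get(hr)))
    return dense


# Hypothetical insert times, just for demonstration.
sample = [
    datetime(2022, 2, 9, 13, 5),
    datetime(2022, 2, 9, 15, 10),
    datetime(2022, 2, 9, 15, 40),
]

for hr, cnt in hourly_counts(sample, datetime(2022, 2, 9, 13), 4):
    print(hr, cnt)
```

Replacing `sparse.get(hr)` with `sparse.get(hr, 0)` would report zero instead of an empty value, which is what NVL or COALESCE would do on the SQL side.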

Generate a table with interval of months in a year -Oracle

I have to create a table in the format below:
TS_RANGE_BEGIN TS_RANGE_END
2019-01-01 17:00:00 2019-01-31 17:00:00
2019-02-01 17:00:00 2019-02-28 17:00:00
2019-03-01 17:00:00 2019-03-31 17:00:00
Could you please help with this?
Thanks,
Looks like a simple row generator:
SQL> alter session set nls_date_format = 'yyyy-mm-dd hh24:mi:ss';
Session altered.
SQL> with std (datum) as
2 (select to_date('01.01.2019 17:00', 'dd.mm.yyyy hh24:mi') from dual)
3 select add_months(datum, level - 1) ts_range_begin,
4 add_months(datum, level) - 1 ts_range_end
5 from std
6 connect by level <= 12;
TS_RANGE_BEGIN TS_RANGE_END
------------------- -------------------
2019-01-01 17:00:00 2019-01-31 17:00:00
2019-02-01 17:00:00 2019-02-28 17:00:00
2019-03-01 17:00:00 2019-03-31 17:00:00
2019-04-01 17:00:00 2019-04-30 17:00:00
2019-05-01 17:00:00 2019-05-31 17:00:00
2019-06-01 17:00:00 2019-06-30 17:00:00
2019-07-01 17:00:00 2019-07-31 17:00:00
2019-08-01 17:00:00 2019-08-31 17:00:00
2019-09-01 17:00:00 2019-09-30 17:00:00
2019-10-01 17:00:00 2019-10-31 17:00:00
2019-11-01 17:00:00 2019-11-30 17:00:00
2019-12-01 17:00:00 2019-12-31 17:00:00
12 rows selected.
SQL>
The STD CTE is used to set the starting date.
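The arithmetic behind the row generator (begin = start plus n months, end = start plus n+1 months minus one day) can be checked with a short Python sketch. The `add_months` helper below is a simplified stand-in written for this example; it does not reproduce Oracle's ADD_MONTHS end-of-month handling and is only safe for days 1 to 28, which is enough for a start date on the 1st.

```python
from datetime import datetime, timedelta


def add_months(d, months):
    """Shift d by whole months (simplified; only safe for days 1..28)."""
    total = d.year * 12 + (d.month - 1) + months
    return d.replace(year=total // 12, month=total % 12 + 1)


def month_ranges(start, count=12):
    """One (range_begin, range_end) pair per month, as in the query above."""
    return [
        (add_months(start, n), add_months(start, n + 1) - timedelta(days=1))
        for n in range(count)
    ]


for begin, end in month_ranges(datetime(2019, 1, 1, 17, 0)):
    print(begin, end)
```

Subtracting one day from the first of the following month is what makes the range end land on the 28th, 30th or 31st without any per-month lookup.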

Oracle SQL List Intervals

I need to create new interval rows based on a start datetime column and an end datetime column.
My statement currently looks like this:
select id,
startdatetime,
enddatetime
from calls
The result looks like this:
id startdatetime enddatetime
1 01/01/2020 00:00:00 01/01/2020 04:00:00
I would like a result like this:
id startdatetime enddatetime Intervals
1 01/01/2020 00:00:00 01/01/2020 03:00:00 01/01/2020 00:00:00
1 01/01/2020 00:00:00 01/01/2020 03:00:00 01/01/2020 01:00:00
1 01/01/2020 00:00:00 01/01/2020 03:00:00 01/01/2020 02:00:00
1 01/01/2020 00:00:00 01/01/2020 03:00:00 01/01/2020 03:00:00
Thanking you in advance
p.s. I'm new to SQL
You can use a recursive sub-query factoring clause to loop and incrementally add an hour:
WITH times ( id, startdatetime, enddatetime, intervals ) AS (
SELECT id,
startdatetime,
enddatetime,
startdatetime
FROM calls c
UNION ALL
SELECT id,
startdatetime,
enddatetime,
intervals + INTERVAL '1' HOUR
FROM times
WHERE intervals + INTERVAL '1' HOUR <= enddatetime
)
SELECT *
FROM times;
outputs:
ID | STARTDATETIME | ENDDATETIME | INTERVALS
-: | :------------------ | :------------------ | :------------------
1 | 2020-01-01 00:00:00 | 2020-01-01 04:00:00 | 2020-01-01 00:00:00
1 | 2020-01-01 00:00:00 | 2020-01-01 04:00:00 | 2020-01-01 01:00:00
1 | 2020-01-01 00:00:00 | 2020-01-01 04:00:00 | 2020-01-01 02:00:00
1 | 2020-01-01 00:00:00 | 2020-01-01 04:00:00 | 2020-01-01 03:00:00
1 | 2020-01-01 00:00:00 | 2020-01-01 04:00:00 | 2020-01-01 04:00:00
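The recursion above can be modelled in a few lines of Python, which may help if recursive sub-query factoring is new to you. This is a sketch with an invented function name, `hourly_intervals`: the first row plays the role of the anchor member, and the loop condition mirrors the WHERE clause of the recursive member.

```python
from datetime import datetime, timedelta


def hourly_intervals(call_id, start, end, step=timedelta(hours=1)):
    """Return (id, start, end, interval) rows, one per step from start to end."""
    # Anchor member: the first row always uses the start time itself.
    rows = [(call_id, start, end, start)]
    # Recursive member: keep adding a step while the next value is <= end.
    while rows[-1][3] + step <= end:
        rows.append((call_id, start, end, rows[-1][3] + step))
    return rows


for row in hourly_intervals(1, datetime(2020, 1, 1, 0), datetime(2020, 1, 1, 4)):
    print(row)
```

Because the comparison is `<=`, the end time itself is included, giving five rows for a four-hour call. Changing it to `<` would stop one step earlier.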
You can use a hierarchical query as follows:
SQL> WITH CALLS (ID, STARTDATETIME, ENDDATETIME)
2 AS ( SELECT 1,
3 TO_DATE('01/01/2020 00:00:00', 'dd/mm/rrrr hh24:mi:ss'),
4 TO_DATE('01/01/2020 04:00:00', 'dd/mm/rrrr hh24:mi:ss')
5 FROM DUAL)
6 -- Your query starts from here
7 SELECT
8 ID,
9 STARTDATETIME,
10 ENDDATETIME,
11 STARTDATETIME + ( COLUMN_VALUE / 24 ) AS INTERVALS
12 FROM
13 CALLS C
14 CROSS JOIN TABLE ( CAST(MULTISET(
15 SELECT LEVEL - 1
16 FROM DUAL
17 CONNECT BY LEVEL <= TRUNC(24 *(ENDDATETIME - STARTDATETIME))
18 ) AS SYS.ODCINUMBERLIST) )
19 ORDER BY INTERVALS;
ID STARTDATETIME ENDDATETIME INTERVALS
---------- ------------------- ------------------- -------------------
1 01/01/2020 00:00:00 01/01/2020 04:00:00 01/01/2020 00:00:00
1 01/01/2020 00:00:00 01/01/2020 04:00:00 01/01/2020 01:00:00
1 01/01/2020 00:00:00 01/01/2020 04:00:00 01/01/2020 02:00:00
1 01/01/2020 00:00:00 01/01/2020 04:00:00 01/01/2020 03:00:00
SQL>
Cheers!!

Splitting interval overlapping more days in PostgreSQL

I have a PostgreSQL table containing start timestamp and duration time.
timestamp | interval
------------------------------
2018-01-01 15:00:00 | 06:00:00
2018-01-02 23:00:00 | 04:00:00
2018-01-04 09:00:00 | 2 days 16 hours
What I would like is to have the interval split by day, like this:
timestamp | interval
------------------------------
2018-01-01 15:00:00 | 06:00:00
2018-01-02 23:00:00 | 01:00:00
2018-01-03 00:00:00 | 03:00:00
2018-01-04 09:00:00 | 15:00:00
2018-01-05 00:00:00 | 24:00:00
2018-01-06 00:00:00 | 24:00:00
2018-01-07 00:00:00 | 01:00:00
I have been experimenting with generate_series(), width_bucket(), and range functions, but I still can't find a plausible solution. Is there an existing, working solution?
Not sure about all edge cases, but this seems to work:
t=# with c as (select *,min(t) over (), max(t+i) over (), tsrange(date_trunc('day',t),t+i) tr from t)
, mid as (
select distinct t,i,g,tr
, case when g < t then t else g end tt
from c
right outer join (select generate_series(date_trunc('day',min),date_trunc('day',max),'1 day') g from c) e on g <@ tr order by 3,1
)
select
tt
, i
, case when tt+'1 day' > upper(tr) and t < g then upper(tr)::time::interval when upper(tr) - lower(tr) < '1 day' then i else g+'1 day' - tt end
from mid
order by tt;
tt | i | case
---------------------+-----------------+----------
2018-01-01 15:00:00 | 06:00:00 | 06:00:00
2018-01-02 23:00:00 | 04:00:00 | 01:00:00
2018-01-03 00:00:00 | 04:00:00 | 03:00:00
2018-01-04 09:00:00 | 2 days 16:00:00 | 15:00:00
2018-01-05 00:00:00 | 2 days 16:00:00 | 1 day
2018-01-06 00:00:00 | 2 days 16:00:00 | 1 day
2018-01-07 00:00:00 | 2 days 16:00:00 | 01:00:00
(7 rows)
Also, please keep in mind that timestamp without time zone can fail you when comparing timestamps...
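The splitting rule itself is easy to state procedurally: cut each interval at every midnight it crosses. Here is a Python sketch of that rule, useful for checking expected output against a query. The function name `split_by_day` is invented for this example, and it assumes naive timestamps (no time zone, so every day is exactly 24 hours).

```python
from datetime import datetime, timedelta


def split_by_day(start, duration):
    """Split [start, start + duration) into pieces that never cross midnight."""
    end = start + duration
    pieces = []
    current = start
    while current < end:
        # Midnight at the start of the day after `current`.
        next_midnight = (
            datetime.combine(current.date(), datetime.min.time())
            + timedelta(days=1)
        )
        piece_end = min(next_midnight, end)
        pieces.append((current, piece_end - current))
        current = piece_end
    return pieces


# The three rows from the question.
rows = [
    (datetime(2018, 1, 1, 15), timedelta(hours=6)),
    (datetime(2018, 1, 2, 23), timedelta(hours=4)),
    (datetime(2018, 1, 4, 9), timedelta(days=2, hours=16)),
]

for start, duration in rows:
    for piece_start, piece_len in split_by_day(start, duration):
        print(piece_start, piece_len)
```

For the question's data this yields seven pieces, with the full days in the middle of the long interval coming out as one-day durations.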

How to correctly handle daylight saving times with timestamps

I’m trying to create a table in Postgres that stores events which occur once every full hour every day for the next couple of years. So I populated a column using the following expression:
INSERT INTO tablename(time)
SELECT CAST('2013-01-01' AS DATE) + (n || ' hour')::INTERVAL
FROM generate_series(0, 100000) n;
As a datatype for this column I chose timestamp with time zone and hoped in this way daylight saving time would be automatically taken into account. (Btw, my default time zone is CET, so it's UTC+1 or UTC+2 when DST applies). As a result of the above query I get this:
2013-03-31 00:00:00 +01
2013-03-31 01:00:00 +01
2013-03-31 03:00:00 +02
2013-03-31 03:00:00 +02
2013-03-31 04:00:00 +02
...
2013-10-27 00:00:00 +02
2013-10-27 01:00:00 +02
2013-10-27 02:00:00 +01
2013-10-27 03:00:00 +01
2013-10-27 04:00:00 +01
...
The offset to UTC changes, and I expected 02:00 to be left out on March 31st, as this day only has 23 hours. But I don’t know why 03:00 is there twice, whereas on October 27th 02:00 is only there once instead of twice, even though this day has 25 hours. What I would like to achieve is that, for all years, 2 o'clock is not skipped on the specific day in March (I would rather put in 'n. a.' or something for the corresponding value) and that there are two entries for 3 o'clock on the specific day in October (but not in March), so that I'll get a column of the following form (where 1 stands for the hour from 00:00-1:00, 2 for 1:00-2:00, etc.):
2013-03-31 1 +01
2013-03-31 2 +01
2013-03-31 3 +02
2013-03-31 4 +02
2013-03-31 5 +02
...
2013-10-27 1 +02
2013-10-27 2 +02
2013-10-27 3A +02
2013-10-27 3B +01
2013-10-27 4 +01
2013-10-27 5 +01
...
Does anybody have an idea how to go about it? Am I doing something basically wrong? Is it just a matter of formatting? Do I have to write a function? Any help would be appreciated. Thank you.
Date and time in Postgres are stored in UTC and are converted into local time according to the zone specified by the timezone configuration.
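This means that you only need to solve the representation problem. Try using AT TIME ZONE 'UTC+2' to convert UTC time to your time zone and see the result. (Be aware that PostgreSQL treats 'UTC+2' as a POSIX-style specification, where a positive offset means west of Greenwich, so a zone name such as 'CET' may be what you actually want.) Here's the query: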
SELECT (CAST('2013-03-30' AS DATE) + (n || ' hour')::INTERVAL) AT TIME ZONE 'UTC+2'
FROM generate_series(0, 1000) n;
The timestamp is always stored as UTC regardless of time zone settings. From the manual:
For timestamp with time zone, the internally stored value is always in UTC (Universal Coordinated Time, traditionally known as Greenwich Mean Time, GMT). An input value that has an explicit time zone specified is converted to UTC using the appropriate offset for that time zone. If no time zone is stated in the input string, then it is assumed to be in the time zone indicated by the system's TimeZone parameter, and is converted to UTC using the offset for the timezone zone.
set time zone 'CET';
drop table if exists events;
create table events (
tstz timestamp with time zone
);
insert into events (tstz)
select s.tstz
from generate_series('2013-01-01'::timestamptz, '2013-10-28'::timestamptz, interval '1 hour') as s(tstz)
;
Notice the use of the generate_series function.
select
tstz at time zone 'UTC' as "UTC",
tstz at time zone 'CET' as "CET",
tstz at time zone 'CEST' as "CEST",
tstz as "LOCAL"
from events
where date_trunc('day', tstz) in ('2013-03-31', '2013-10-27')
order by tstz
;
UTC | CET | CEST | LOCAL
---------------------+---------------------+---------------------+------------------------
2013-03-30 23:00:00 | 2013-03-31 00:00:00 | 2013-03-31 01:00:00 | 2013-03-31 00:00:00+01
2013-03-31 00:00:00 | 2013-03-31 01:00:00 | 2013-03-31 02:00:00 | 2013-03-31 01:00:00+01
2013-03-31 01:00:00 | 2013-03-31 02:00:00 | 2013-03-31 03:00:00 | 2013-03-31 03:00:00+02
2013-03-31 02:00:00 | 2013-03-31 03:00:00 | 2013-03-31 04:00:00 | 2013-03-31 04:00:00+02
2013-03-31 03:00:00 | 2013-03-31 04:00:00 | 2013-03-31 05:00:00 | 2013-03-31 05:00:00+02
...
2013-10-26 22:00:00 | 2013-10-26 23:00:00 | 2013-10-27 00:00:00 | 2013-10-27 00:00:00+02
2013-10-26 23:00:00 | 2013-10-27 00:00:00 | 2013-10-27 01:00:00 | 2013-10-27 01:00:00+02
2013-10-27 00:00:00 | 2013-10-27 01:00:00 | 2013-10-27 02:00:00 | 2013-10-27 02:00:00+02
2013-10-27 01:00:00 | 2013-10-27 02:00:00 | 2013-10-27 03:00:00 | 2013-10-27 02:00:00+01
2013-10-27 02:00:00 | 2013-10-27 03:00:00 | 2013-10-27 04:00:00 | 2013-10-27 03:00:00+01
2013-10-27 03:00:00 | 2013-10-27 04:00:00 | 2013-10-27 05:00:00 | 2013-10-27 04:00:00+01
2013-10-27 04:00:00 | 2013-10-27 05:00:00 | 2013-10-27 06:00:00 | 2013-10-27 05:00:00+01
If a timestamp with time zone column is selected without using at time zone, as in the LOCAL column above, it will be output in the server time zone in effect at that timestamp. That is why there are missing and duplicated hours.
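The missing and duplicated local hours fall out of simple arithmetic once you step through time in UTC. The Python sketch below hand-codes the EU switch-over rule (last Sunday of March and of October, at 01:00 UTC) instead of using a time zone database. That shortcut, along with the function names `last_sunday`, `cet_offset_hours` and `local_hours`, is an assumption made for this illustration and not how Postgres itself works internally.

```python
from datetime import date, datetime, timedelta


def last_sunday(year, month):
    """Last Sunday of a 31-day month (March and October both qualify)."""
    d = date(year, month, 31)
    return d - timedelta(days=(d.weekday() + 1) % 7)


def cet_offset_hours(utc):
    """UTC offset (in hours) of Central European Time at a naive UTC instant."""
    midnight = datetime.min.time()
    dst_start = datetime.combine(last_sunday(utc.year, 3), midnight) + timedelta(hours=1)
    dst_end = datetime.combine(last_sunday(utc.year, 10), midnight) + timedelta(hours=1)
    return 2 if dst_start <= utc < dst_end else 1


def local_hours(first_utc, count):
    """Step hourly in UTC and return (local wall-clock time, offset) pairs."""
    rows = []
    for n in range(count):
        utc = first_utc + timedelta(hours=n)
        offset = cet_offset_hours(utc)
        rows.append((utc + timedelta(hours=offset), offset))
    return rows


# Spring: local 02:00 never appears. Autumn: local 02:00 appears twice.
for local, offset in local_hours(datetime(2013, 3, 30, 23), 4):
    print(local, f"+0{offset}")
for local, offset in local_hours(datetime(2013, 10, 26, 23), 4):
    print(local, f"+0{offset}")
```

Every UTC hour appears exactly once in both runs. Only the wall-clock rendering skips or repeats an hour, which matches the LOCAL column in the query output above.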
I think your desired output is wrong. But it is achievable with some query fu.
I can't reproduce your actual output. What is the server time zone?
show time zone;
TimeZone
----------
CET