How to correctly handle daylight saving times with timestamps - sql

I’m trying to create a table in Postgres that stores events which occur once every full hour, every day, for the next couple of years. So I populated a column using the following expression:
INSERT INTO tablename(time)
SELECT CAST('2013-01-01' AS DATE) + (n || ' hour')::INTERVAL
FROM generate_series(0, 100000) n;
As a datatype for this column I chose timestamp with time zone and hoped in this way daylight saving time would be automatically taken into account. (Btw, my default time zone is CET, so it's UTC+1 or UTC+2 when DST applies). As a result of the above query I get this:
2013-03-31 00:00:00 +01
2013-03-31 01:00:00 +01
2013-03-31 03:00:00 +02
2013-03-31 03:00:00 +02
2013-03-31 04:00:00 +02
...
2013-10-27 00:00:00 +02
2013-10-27 01:00:00 +02
2013-10-27 02:00:00 +01
2013-10-27 03:00:00 +01
2013-10-27 04:00:00 +01
...
The offset to UTC changes, and I expected 02:00 to be left out on March 31st, since that day only has 23 hours. But I don’t know why 03:00 is there twice, whereas on October 27th 02:00 is only there once instead of twice, although that day has 25 hours. What I would like to achieve is that, for all years, 2 o'clock is not skipped on the specific day in March (I would rather put in 'n. a.' or something for the corresponding value) and that there are two entries for 3 o'clock on the specific day in October (but not in March), so that I get a column of the following form (where 1 stands for the hour from 00:00-1:00, 2 for 1:00-2:00, etc.):
2013-03-31 1 +01
2013-03-31 2 +01
2013-03-31 3 +02
2013-03-31 4 +02
2013-03-31 5 +02
...
2013-10-27 1 +02
2013-10-27 2 +02
2013-10-27 3A +02
2013-10-27 3B +01
2013-10-27 4 +01
2013-10-27 5 +01
...
Has anybody an idea how to go about it? Am I doing something basically wrong? Is it just a matter of formatting? Do I have to write a function? Any help would be appreciated. Thank you.

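In Postgres, timestamp with time zone values are stored in UTC and converted to local time for display according to the zone set in the timezone configuration parameter.
This means that you only need to solve the representation problem. Try using AT TIME ZONE to convert the value to a specific zone and compare the results. Be aware that Postgres reads POSIX-style specifications such as 'UTC+2' with the sign inverted relative to ISO 8601 (a positive offset means west of Greenwich), and that a fixed offset never follows DST changes. Here's the query: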
SELECT (CAST('2013-03-30' AS DATE) + (n || ' hour')::INTERVAL) AT TIME ZONE 'UTC+2'
FROM generate_series(0, 1000) n;

A timestamp with time zone is always stored as UTC, regardless of time zone settings. From the manual:
For timestamp with time zone, the internally stored value is always in UTC (Universal Coordinated Time, traditionally known as Greenwich Mean Time, GMT). An input value that has an explicit time zone specified is converted to UTC using the appropriate offset for that time zone. If no time zone is stated in the input string, then it is assumed to be in the time zone indicated by the system's TimeZone parameter, and is converted to UTC using the offset for the timezone zone.
set time zone 'CET';
drop table if exists events;
create table events (
tstz timestamp with time zone
);
insert into events (tstz)
select tstz
from generate_series('2013-01-01'::timestamptz, '2013-10-28', interval '1 hour') s(tstz)
;
Notice the use of the generate_series function.
select
tstz at time zone 'UTC' as "UTC",
tstz at time zone 'CET' as "CET",
tstz at time zone 'CEST' as "CEST",
tstz as "LOCAL"
from events
where date_trunc('day', tstz) in ('2013-03-31', '2013-10-27')
order by tstz
;
UTC | CET | CEST | LOCAL
---------------------+---------------------+---------------------+------------------------
2013-03-30 23:00:00 | 2013-03-31 00:00:00 | 2013-03-31 01:00:00 | 2013-03-31 00:00:00+01
2013-03-31 00:00:00 | 2013-03-31 01:00:00 | 2013-03-31 02:00:00 | 2013-03-31 01:00:00+01
2013-03-31 01:00:00 | 2013-03-31 02:00:00 | 2013-03-31 03:00:00 | 2013-03-31 03:00:00+02
2013-03-31 02:00:00 | 2013-03-31 03:00:00 | 2013-03-31 04:00:00 | 2013-03-31 04:00:00+02
2013-03-31 03:00:00 | 2013-03-31 04:00:00 | 2013-03-31 05:00:00 | 2013-03-31 05:00:00+02
...
2013-10-26 22:00:00 | 2013-10-26 23:00:00 | 2013-10-27 00:00:00 | 2013-10-27 00:00:00+02
2013-10-26 23:00:00 | 2013-10-27 00:00:00 | 2013-10-27 01:00:00 | 2013-10-27 01:00:00+02
2013-10-27 00:00:00 | 2013-10-27 01:00:00 | 2013-10-27 02:00:00 | 2013-10-27 02:00:00+02
2013-10-27 01:00:00 | 2013-10-27 02:00:00 | 2013-10-27 03:00:00 | 2013-10-27 02:00:00+01
2013-10-27 02:00:00 | 2013-10-27 03:00:00 | 2013-10-27 04:00:00 | 2013-10-27 03:00:00+01
2013-10-27 03:00:00 | 2013-10-27 04:00:00 | 2013-10-27 05:00:00 | 2013-10-27 04:00:00+01
2013-10-27 04:00:00 | 2013-10-27 05:00:00 | 2013-10-27 06:00:00 | 2013-10-27 05:00:00+01
If a timestamp with time zone column is selected without at time zone, as in the LOCAL column above, it is output in the session's time zone, using the offset in effect at that timestamp. That is why there are missing and duplicated hours.
I think your desired output is wrong, but it is achievable with some query-fu.
I can't reproduce your actual output. What is the server time zone?
show time zone;
TimeZone
----------
CET
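To see where the 23-hour and 25-hour days come from, here is a minimal sketch in Python (not Postgres) of the same idea: keep every instant in UTC and derive the local wall clock only for display. It hardcodes the EU rule (summer time between 01:00 UTC on the last Sunday of March and 01:00 UTC on the last Sunday of October) as an assumption instead of reading a time zone database. The helper names are hypothetical, and the hour numbering (1 for 00:00-01:00) follows the question.

```python
from datetime import datetime, timedelta, timezone


def last_sunday(year, month):
    """Midnight UTC of the last Sunday in the given month."""
    first_of_next = datetime(year + (month == 12), month % 12 + 1, 1, tzinfo=timezone.utc)
    last_day = first_of_next - timedelta(days=1)
    # weekday(): Monday is 0, Sunday is 6
    return last_day - timedelta(days=(last_day.weekday() + 1) % 7)


def cet_offset_hours(utc_instant):
    """Assumed EU rule: +2 between the two 01:00 UTC switch points, otherwise +1."""
    dst_start = last_sunday(utc_instant.year, 3).replace(hour=1)
    dst_end = last_sunday(utc_instant.year, 10).replace(hour=1)
    return 2 if dst_start <= utc_instant < dst_end else 1


def to_local(utc_instant):
    """Convert a UTC instant to CET/CEST wall-clock time."""
    offset = timezone(timedelta(hours=cet_offset_hours(utc_instant)))
    return utc_instant.astimezone(offset)


def local_hours(year, month, day):
    """Return (hour_number, utc_offset) for every hour of one local calendar day."""
    utc = datetime(year, month, day, tzinfo=timezone.utc) - timedelta(hours=2)
    rows = []
    for _ in range(27):  # wide enough to cover a 25-hour local day
        local = to_local(utc)
        if (local.year, local.month, local.day) == (year, month, day):
            rows.append((local.hour + 1, local.utcoffset() // timedelta(hours=1)))
        utc += timedelta(hours=1)
    return rows


march = local_hours(2013, 3, 31)
october = local_hours(2013, 10, 27)
print(len(march), len(october))
print(october[:5])
```

Because the series is generated in UTC, nothing is skipped or duplicated in storage. The March day simply has no hour number 3, and the October day has hour number 3 twice, once with each offset, which is where labels such as 3A and 3B could be attached.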

Related

SQL: Aggregate Timeseries Table (HourOfDay, Val) to Average Value of HourOfDay by Weekday (e.g. Avg of Mondays 10:00-11:00, 11:00-12:00,...,Tue...)

So far I have written an SQL query that gives me a table with the number of customers handled for each hour of the day, given an arbitrary start and end datetime value (from the Grafana interface). The result may span many weeks. My goal is to implement an hourly heatmap by weekday with averaged values.
How do I aggregate those customers per hour to show the average value of each hour per weekday?
So let's say I have 24 values per day over 19 days. How do I aggregate so that I get 24 values for each of Mon, Tue, Wed, Thu, Fri, Sat, Sun, with each hour representing the average value for those days?
Also, only data from full weeks should be used, so leading and trailing days that are not part of a fully represented week should be stripped (so that every weekday contributes the same number of days to its average).
Here is a segment on how the return of my SQL query looks so far. (hour of each day, number of customers):
...
2021-12-13 11:00:00 | 0
2021-12-13 12:00:00 | 3
2021-12-13 13:00:00 | 4
2021-12-13 14:00:00 | 4
2021-12-13 15:00:00 | 7
2021-12-13 16:00:00 | 17
2021-12-13 17:00:00 | 12
2021-12-13 18:00:00 | 18
2021-12-13 19:00:00 | 15
2021-12-13 20:00:00 | 8
2021-12-13 21:00:00 | 10
2021-12-13 22:00:00 | 1
2021-12-13 23:00:00 | 0
2021-12-14 00:00:00 | 0
2021-12-14 01:00:00 | 0
2021-12-14 02:00:00 | 0
2021-12-14 03:00:00 | 0
2021-12-14 04:00:00 | 0
2021-12-14 05:00:00 | 0
2021-12-14 06:00:00 | 0
2021-12-14 07:00:00 | 0
2021-12-14 08:00:00 | 0
2021-12-14 09:00:00 | 0
2021-12-14 10:00:00 | 12
2021-12-14 11:00:00 | 12
2021-12-14 12:00:00 | 19
2021-12-14 13:00:00 | 11
2021-12-14 14:00:00 | 11
2021-12-14 15:00:00 | 12
2021-12-14 16:00:00 | 9
2021-12-14 17:00:00 | 2
...
So (schematically, example data) startDate 2021-12-10 11:00 to endDate 2021-12-31 17:00
-------------------------------
...
Mon 2021-12-13 12:00 | 3
Mon 2021-12-13 13:00 | 4
Mon 2021-12-13 14:00 | 4
...
Mon 2021-12-20 12:00 | 1
Mon 2021-12-20 13:00 | 6
Mon 2021-12-20 14:00 | 2
...
Mon 2021-12-27 12:00 | 2
Mon 2021-12-27 13:00 | 2
Mon 2021-12-27 14:00 | 3
...
-------------------------------
into this:
strip leading fri 10., sat 11., sun 12.
strip trailing tue 28., wed 29., thu 30., fri 31.
average hours per weekday
-------------------------------
...
Mon 12:00 | 2
Mon 13:00 | 4
Mon 14:00 | 3
...
Tue 12:00 | x
Tue 13:00 | y
Tue 14:00 | z
...
-------------------------------
My approach so far:
WITH CustomersPerHour as (
SELECT dateadd(hour, datediff(hour, 0, Systemdatum),0) as DayHour, Count(*) as C
FROM CustomerList
WHERE CustomerID > 0
AND Datum BETWEEN '2021-12-10T11:00:00Z' AND '2021-12-31T17:00:00Z'
AND EntryID IN (62,65)
AND CustomerID IN (SELECT * FROM udf_getActiveUsers())
GROUP BY dateadd(hour, datediff(hour, 0, Systemdatum), 0)
)
-- add null values on missing data/insert missing hours
SELECT DATEDIFF(second, '1970-01-01', dt.Date) AS time, C as Customers
FROM dbo.udf_generateHoursTable('2021-12-03T18:14:56Z', '2022-03-13T18:14:56Z') as dt
LEFT JOIN CustomersPerHour cPh ON dt.Date = cPh.DayHour
ORDER BY
time ASC
The simplest solution is to do what you have written in your example: create a custom base for the aggregation.
The first step is to prepare your data in an aggregated table with date-and-hour precision and the customer count.
Then create the base.
This is an example of the basic idea:
-- EXAMPLE
SELECT
DATENAME(WEEKDAY, GETDATE()) + ' ' + CAST(DATEPART(HOUR, GETDATE()) AS varchar(8)) + ':00'
-- OUTPUT: Sunday 21:00
You can concatenate data and then use it in GROUP BY clause.
Adjust this query for your use case:
SELECT
DATENAME(WEEKDAY, <DATETIME_COL>) + ' ' + CAST(DATEPART(HOUR, <DATETIME_COL>) AS varchar(8)) + ':00' as base
,SUM(...) as sum_of_whatever
,AVG(...) as avg_of_whatever
FROM <YOUR_AGG_TABLE>
GROUP BY DATENAME(WEEKDAY, <DATETIME_COL>) + ' ' + CAST(DATEPART(HOUR, <DATETIME_COL>) AS varchar(8)) + ':00'
This creates the base exactly as you wanted.
You can use the same logic to create other aggregation bases.
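The grouping above does not yet strip the partial weeks the question asks about. As a rough sketch of that extra step, here is the whole aggregation in Python rather than T-SQL. It assumes a "full week" means Monday 00:00 up to the following Monday 00:00, and that every hour is present in the input (missing hours filled with 0, as the question's query does). The function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def monday_floor(ts):
    """Midnight of the Monday on or before ts."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=midnight.weekday())


def weekday_hour_averages(rows):
    """rows: (hour-precision datetime, customer count) pairs.

    Returns {(weekday, hour): average}, with weekday 0 = Monday,
    using only the weeks that are fully covered by the data.
    """
    rows = sorted(rows)
    if not rows:
        return {}
    first, last = rows[0][0], rows[-1][0]
    # First Monday 00:00 at or after the first row
    start = monday_floor(first)
    if start < first:
        start += timedelta(days=7)
    # Last Monday 00:00 up to which the data is complete (exclusive bound)
    end = monday_floor(last + timedelta(hours=1))
    totals = defaultdict(lambda: [0, 0])  # (weekday, hour) -> [sum, n]
    for ts, count in rows:
        if start <= ts < end:
            cell = totals[(ts.weekday(), ts.hour)]
            cell[0] += count
            cell[1] += 1
    return {key: total / n for key, (total, n) in totals.items()}


# Hypothetical input: one row per hour over the range from the question
sample = []
ts = datetime(2021, 12, 10, 11)
while ts <= datetime(2021, 12, 31, 17):
    sample.append((ts, ts.day if ts.hour == 12 else 0))
    ts += timedelta(hours=1)
averages = weekday_hour_averages(sample)
print(averages[(0, 12)])
```

With this definition the range from the question keeps Monday the 13th through Sunday the 26th, so every weekday contributes exactly two days to its average.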

Get correct offset using timezone offset table

I am trying to figure out the offset that should be applied to a meeting with start and end date time.
Timezone table below stores the utc offset in minutes and when the utc offset became active.
Timezone Table
TimezoneCode StartDate EndDate UtcOffSetInMinute
Antarctica/Casey 2020-04-05 02:00:00 2020-09-26 02:00:00 720
Antarctica/Casey 2020-09-27 05:00:00 2020-05-03 05:00:00 780
Meeting table which stores all the meetings
Meeting Table
|Id | StartDateTime | EndDateTime
+----+---------------------+----------------------
|1 | 2020-04-06 23:00:00 | 2020-09-26 05:00:00
|2 | 2020-10-21 10:00:00 | 2020-10-21 11:00:00
Using the above timezone table, I am struggling to figure out the local time of each meeting.
How can we join the timezone table with the meeting table and get the UTC offset for each meeting based on the date range?
Expected output
|Id | StartDateTime | EndDateTime | OffsetInMins
+----+---------------------+----------------------
|1 | 2020-04-06 23:00:00 | 2020-09-26 05:00:00 | 720
|2 | 2020-09-27 23:00:00 | 2020-09-29 05:00:00 | 780
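One possible way to think about the lookup, sketched in Python rather than SQL: for each meeting, take the most recent timezone row whose StartDate is not after the meeting's StartDateTime. In SQL this would correspond to joining on StartDate <= StartDateTime and keeping the row with the greatest StartDate. The sketch assumes every meeting belongs to Antarctica/Casey (the meeting table has no timezone column) and ignores EndDate; the names `OFFSETS` and `offset_at` are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime

# (StartDate, UtcOffSetInMinute) rows per TimezoneCode, copied from the question
OFFSETS = {
    "Antarctica/Casey": [
        (datetime(2020, 4, 5, 2, 0), 720),
        (datetime(2020, 9, 27, 5, 0), 780),
    ],
}


def offset_at(timezone_code, moment):
    """Offset of the latest period that started at or before `moment`.

    Returns None when `moment` precedes every known period.
    """
    periods = sorted(OFFSETS[timezone_code])
    starts = [start for start, _ in periods]
    index = bisect_right(starts, moment) - 1
    return periods[index][1] if index >= 0 else None


# StartDateTime values of the two meetings from the Meeting table
print(offset_at("Antarctica/Casey", datetime(2020, 4, 6, 23, 0)))
print(offset_at("Antarctica/Casey", datetime(2020, 10, 21, 10, 0)))
```

Matching on the period start alone sidesteps gaps or inconsistencies between one row's EndDate and the next row's StartDate.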

Oracle SQL List Intervals

I need to create new interval rows based on a start datetime column and an end datetime column.
My statement looks like this currently
select id,
startdatetime,
enddatetime
from calls
result looks like this
id startdatetime enddatetime
1 01/01/2020 00:00:00 01/01/2020 04:00:00
I would like a result like this
id startdatetime enddatetime Intervals
1 01/01/2020 00:00:00 01/01/2020 03:00:00 01/01/2020 00:00:00
1 01/01/2020 00:00:00 01/01/2020 03:00:00 01/01/2020 01:00:00
1 01/01/2020 00:00:00 01/01/2020 03:00:00 01/01/2020 02:00:00
1 01/01/2020 00:00:00 01/01/2020 03:00:00 01/01/2020 03:00:00
Thanking you in advance
p.s. I'm new to SQL
You can use a recursive sub-query factoring clause to loop and incrementally add an hour:
WITH times ( id, startdatetime, enddatetime, intervals ) AS (
SELECT id,
startdatetime,
enddatetime,
startdatetime
FROM calls c
UNION ALL
SELECT id,
startdatetime,
enddatetime,
intervals + INTERVAL '1' HOUR
FROM times
WHERE intervals + INTERVAL '1' HOUR <= enddatetime
)
SELECT *
FROM times;
outputs:
ID | STARTDATETIME | ENDDATETIME | INTERVALS
-: | :------------------ | :------------------ | :------------------
1 | 2020-01-01 00:00:00 | 2020-01-01 04:00:00 | 2020-01-01 00:00:00
1 | 2020-01-01 00:00:00 | 2020-01-01 04:00:00 | 2020-01-01 01:00:00
1 | 2020-01-01 00:00:00 | 2020-01-01 04:00:00 | 2020-01-01 02:00:00
1 | 2020-01-01 00:00:00 | 2020-01-01 04:00:00 | 2020-01-01 03:00:00
1 | 2020-01-01 00:00:00 | 2020-01-01 04:00:00 | 2020-01-01 04:00:00
You can use a hierarchical query as follows:
WITH CALLS (ID, STARTDATETIME, ENDDATETIME)
AS ( SELECT 1,
TO_DATE('01/01/2020 00:00:00', 'dd/mm/rrrr hh24:mi:ss'),
TO_DATE('01/01/2020 04:00:00', 'dd/mm/rrrr hh24:mi:ss')
FROM DUAL)
-- Your query starts from here
SELECT
ID,
STARTDATETIME,
ENDDATETIME,
STARTDATETIME + ( COLUMN_VALUE / 24 ) AS INTERVALS
FROM
CALLS C
CROSS JOIN TABLE ( CAST(MULTISET(
SELECT LEVEL - 1
FROM DUAL
CONNECT BY LEVEL <= TRUNC(24 *(ENDDATETIME - STARTDATETIME))
) AS SYS.ODCINUMBERLIST) )
ORDER BY INTERVALS;
ID STARTDATETIME ENDDATETIME INTERVALS
---------- ------------------- ------------------- -------------------
1 01/01/2020 00:00:00 01/01/2020 04:00:00 01/01/2020 00:00:00
1 01/01/2020 00:00:00 01/01/2020 04:00:00 01/01/2020 01:00:00
1 01/01/2020 00:00:00 01/01/2020 04:00:00 01/01/2020 02:00:00
1 01/01/2020 00:00:00 01/01/2020 04:00:00 01/01/2020 03:00:00
Cheers!!
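For readers new to SQL, the loop that the recursive query performs can be sketched in a few lines of Python. This is only an illustration of the logic, not Oracle code: it starts at the start time and keeps adding one hour while the result does not pass the end time, so the end time itself is included, as in the recursive answer. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def hourly_steps(start, end, step=timedelta(hours=1)):
    """Yield start, start + step, ... up to and including end."""
    current = start
    while current <= end:
        yield current
        current += step


steps = list(hourly_steps(datetime(2020, 1, 1, 0), datetime(2020, 1, 1, 4)))
for moment in steps:
    print(moment)
```

Changing the loop condition to `current < end` would exclude the end time, which is the difference between the five-row and four-row outputs of the two answers.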

How to change the default day of week and timestamp using date_trunc in snowflake sql

I have a timestamp variable as input, and I want to group the data by week, with a week defined as being between saturday 21:00:00 and saturday 20:59:59. I am querying from a snowflake database.
My data looks like this:
employee_id | shift_started_at | hours_worked
1 | '2018-09-12 08:00:00' | 2
2 | '2018-09-10 22:00:00' | 8
1 | '2018-09-18 08:00:00' | 3
I am trying something like this:
alter session set week_start = 6;
SELECT dateadd('hour',21,date_trunc('week',shift_started_at)) as week_starts_at,
min(shift_started_at) as first_shift_of_week,
max(shift_started_at) as last_shift_of_week,
sum(hours_worked)
FROM table
group by 1;
But even though this query gives me the right date for week_starts_at, the min and max select statements show that the group by statement is ignoring the dateadd function. In short, my weeks are being counted from midnight to midnight on saturday. Any advice on how to change the default timestamp used by date_trunc? Thank you!
The problem is that you're applying date_trunc(week..) before adjusting the time by hours. One solution would be:
first, move the shift times by 3 hours forward, so 9pm shift starts on Sunday midnight
then truncate to a week, with Sunday being the first day of the week
then move the result back 3 hours, to 21 on Saturday
Here's an example:
alter session set week_start = 7, timestamp_output_format='YYYY-MM-DD HH24:MI:SS DY';
create or replace table x(t timestamp);
insert into x values('2018-09-14 08:00:00'),('2018-09-15 20:59:59'),('2018-09-15 21:00:00'),('2018-09-16 23:00:00'), ('2018-09-22 20:59:59'),('2018-09-22 21:00:00');
select t, dateadd(hour, 3, t), date_trunc(week, dateadd(hour, 3, t)), dateadd(hour, -3, date_trunc(week, dateadd(hour, 3, t))) from x;
-------------------------+-------------------------+---------------------------------------+----------------------------------------------------------+
T | DATEADD(HOUR, 3, T) | DATE_TRUNC(WEEK, DATEADD(HOUR, 3, T)) | DATEADD(HOUR, -3, DATE_TRUNC(WEEK, DATEADD(HOUR, 3, T))) |
-------------------------+-------------------------+---------------------------------------+----------------------------------------------------------+
2018-09-14 08:00:00 Fri | 2018-09-14 11:00:00 Fri | 2018-09-09 00:00:00 Sun | 2018-09-08 21:00:00 Sat |
2018-09-15 20:59:59 Sat | 2018-09-15 23:59:59 Sat | 2018-09-09 00:00:00 Sun | 2018-09-08 21:00:00 Sat |
2018-09-15 21:00:00 Sat | 2018-09-16 00:00:00 Sun | 2018-09-16 00:00:00 Sun | 2018-09-15 21:00:00 Sat |
2018-09-16 23:00:00 Sun | 2018-09-17 02:00:00 Mon | 2018-09-16 00:00:00 Sun | 2018-09-15 21:00:00 Sat |
2018-09-22 20:59:59 Sat | 2018-09-22 23:59:59 Sat | 2018-09-16 00:00:00 Sun | 2018-09-15 21:00:00 Sat |
2018-09-22 21:00:00 Sat | 2018-09-23 00:00:00 Sun | 2018-09-23 00:00:00 Sun | 2018-09-22 21:00:00 Sat |
-------------------------+-------------------------+---------------------------------------+----------------------------------------------------------+
The last column you can use for grouping, and things should work.
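The three steps (shift forward, truncate to the week, shift back) can be checked outside Snowflake with a small Python sketch. It assumes weeks that run from Saturday 21:00 to the next Saturday 21:00, as in the question; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def week_starts_at(ts, shift=timedelta(hours=3)):
    """Start of the Saturday-21:00-based week that contains ts."""
    shifted = ts + shift  # Saturday 21:00 becomes Sunday 00:00
    # weekday(): Monday is 0 and Sunday is 6, so this counts days since Sunday
    days_since_sunday = (shifted.weekday() + 1) % 7
    sunday_midnight = shifted.replace(
        hour=0, minute=0, second=0, microsecond=0
    ) - timedelta(days=days_since_sunday)
    return sunday_midnight - shift  # back to Saturday 21:00


print(week_starts_at(datetime(2018, 9, 15, 20, 59, 59)))
print(week_starts_at(datetime(2018, 9, 15, 21, 0, 0)))
```

The two calls straddle the week boundary: one second before Saturday 21:00 still belongs to the previous week, which matches the second and third rows of the output table above.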

Splitting interval overlapping more days in PostgreSQL

I have a PostgreSQL table containing start timestamp and duration time.
timestamp | interval
------------------------------
2018-01-01 15:00:00 | 06:00:00
2018-01-02 23:00:00 | 04:00:00
2018-01-04 09:00:00 | 2 days 16 hours
What I would like is to have the interval split at every day boundary, like this:
timestamp | interval
------------------------------
2018-01-01 15:00:00 | 06:00:00
2018-01-02 23:00:00 | 01:00:00
2018-01-03 00:00:00 | 03:00:00
2018-01-04 09:00:00 | 15:00:00
2018-01-05 00:00:00 | 24:00:00
2018-01-06 00:00:00 | 24:00:00
2018-01-07 00:00:00 | 01:00:00
I have been playing with generate_series(), width_bucket() and the range functions, but I still can't find a plausible solution. Is there an existing or working solution?
I'm not sure about all edge cases, but this seems to work:
with c as (select *,min(t) over (), max(t+i) over (), tsrange(date_trunc('day',t),t+i) tr from t)
, mid as (
select distinct t,i,g,tr
, case when g < t then t else g end tt
from c
right outer join (select generate_series(date_trunc('day',min),date_trunc('day',max),'1 day') g from c) e on g <@ tr order by 3,1
)
select
tt
, i
, case when tt+'1 day' > upper(tr) and t < g then upper(tr)::time::interval when upper(tr) - lower(tr) < '1 day' then i else g+'1 day' - tt end
from mid
order by tt;
tt | i | case
---------------------+-----------------+----------
2018-01-01 15:00:00 | 06:00:00 | 06:00:00
2018-01-02 23:00:00 | 04:00:00 | 01:00:00
2018-01-03 00:00:00 | 04:00:00 | 03:00:00
2018-01-04 09:00:00 | 2 days 16:00:00 | 15:00:00
2018-01-05 00:00:00 | 2 days 16:00:00 | 1 day
2018-01-06 00:00:00 | 2 days 16:00:00 | 1 day
2018-01-07 00:00:00 | 2 days 16:00:00 | 01:00:00
(7 rows)
Also, please keep in mind that timestamp without time zone can fail you when comparing timestamps...
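As a cross-check of the expected output, the splitting rule itself is short when written procedurally. The following is a sketch in Python, not PostgreSQL: walk from the start towards start + interval and cut a piece at every midnight. It works on naive timestamps, so it assumes no time zone or DST handling is needed; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def split_by_day(start, duration):
    """Split [start, start + duration) into pieces that never cross midnight."""
    end = start + duration
    pieces = []
    cursor = start
    while cursor < end:
        next_midnight = datetime.combine(
            cursor.date() + timedelta(days=1), datetime.min.time()
        )
        piece_end = min(next_midnight, end)
        pieces.append((cursor, piece_end - cursor))
        cursor = piece_end
    return pieces


# Third row from the question: 2018-01-04 09:00:00 with 2 days 16 hours
for piece_start, piece_length in split_by_day(
    datetime(2018, 1, 4, 9), timedelta(days=2, hours=16)
):
    print(piece_start, piece_length)
```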