How to add the extra hour to a table showing only timeseries dates - sql

I have the following table.
Every year on the last Sunday of October the clocks change, so that day has an extra hour. For example, on 25.10.2020 the time shifts one hour back at 03:00 (GMT+3), so that day has 25 hours. How can I make this SQL also show the extra hour, as a duplicate of 25.10.2020 03:00?
The same applies in March; however, there the output shows a duplicated value at 04:00 (GMT+3) and skips 03:00. How can I replace one of the two 04:00 values with 03:00?
I also noticed that selecting TO_CHAR(ts,'YYYY-MM-DD HH24:00:00') (instead of select ts from ...) solves the problem for March; however, if I use TO_CHAR(ts,'YYYY-MM-DD HH24:00:00')::timestamp, the duplicate appears again.
SELECT ts FROM (
SELECT '2020-10-20'::TIMESTAMP AT TIME ZONE 'UTC' AS tm
UNION
SELECT '2020-10-30'::TIMESTAMP AT TIME ZONE 'UTC' AS tm
) AS t TIMESERIES ts AS '1 Hour' OVER (ORDER BY tm)
ORDER BY ts
My output and desired output
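To make the 25-hour day concrete, here is a minimal Python sketch (not Vertica SQL) that builds the same hourly series in UTC and converts each value to local wall-clock time. The zone rule is an assumption: UTC+3 in summer, UTC+2 in winter, switching at 2020-10-25 01:00 UTC as under the EU rule. The names `DST_END_UTC`, `to_local_wall_clock` and `hourly_utc_series` are made up for the illustration.

```python
from collections import Counter
from datetime import date, datetime, timedelta, timezone

# Assumption: the zone is UTC+3 before this instant and UTC+2 from it onwards.
DST_END_UTC = datetime(2020, 10, 25, 1, 0, tzinfo=timezone.utc)


def to_local_wall_clock(utc_dt):
    """Convert an aware UTC datetime to a naive local wall-clock datetime."""
    offset = timedelta(hours=3) if utc_dt < DST_END_UTC else timedelta(hours=2)
    return (utc_dt + offset).replace(tzinfo=None)


def hourly_utc_series(start, end):
    """Yield every full hour from start to end (inclusive), in UTC."""
    current = start
    while current <= end:
        yield current
        current += timedelta(hours=1)


start = datetime(2020, 10, 20, tzinfo=timezone.utc)
end = datetime(2020, 10, 30, tzinfo=timezone.utc)
local_hours = [to_local_wall_clock(ts) for ts in hourly_utc_series(start, end)]

# Local hours that fall on the switch day, and those that occur more than once.
hours_on_switch_day = [h for h in local_hours if h.date() == date(2020, 10, 25)]
repeated = [h for h, n in Counter(hours_on_switch_day).items() if n > 1]

print(len(hours_on_switch_day))
print(repeated)
```

Under these assumptions the switch day contains 25 hourly values and 03:00 is the only wall-clock hour that appears twice. The duplicate exists only after conversion to local time; the underlying UTC series has no repeated values.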

Related

SQL timestamp filtering based only on time

I want to create a query in Oracle SQL that will grab records from a given time interval, during certain hours of the day, e.g. records between 10am and noon, in the past 10 days. I tried this, but it does not work:
select * from my_table where timestamp between
to_timestamp('2020-12-30','YYYY-MM-DD')
and
to_timestamp('2021-01-08','YYYY-MM-DD') and
timestamp between
to_timestamp('10:00:00','HH24:MI:SS')
and
to_timestamp('12:00:00','HH24:MI:SS')
where timestamp is of type TIMESTAMP. I have also thought of using a join, but I am struggling to find a way to filter on time of day.
Is there a way to filter using only the time, not the date, or a way to filter on time for every day in the interval?
select *
from my_table
where timestamp between to_timestamp('2020-12-30','YYYY-MM-DD')
and to_timestamp('2021-01-08','YYYY-MM-DD')
and timestamp - trunc(timestamp) between interval '10' hour
and interval '12' hour
If you don't need to include exactly noon (including no fractional seconds), you could also do
select *
from my_table
where timestamp between to_timestamp('2020-12-30','YYYY-MM-DD')
and to_timestamp('2021-01-08','YYYY-MM-DD')
and extract( hour from timestamp ) between 10 and 11
As an aside, I'd hope that your actual column name isn't timestamp. It's legal as a column name, but it is also an SQL keyword (the name of a data type), so you're generally much better off using a different name.
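The logic of the two conditions (a date range combined with a time-of-day window) can be sketched outside the database as well. The following Python snippet is only an illustration of the idea, with a hypothetical helper `in_daily_window`; it uses a half-open window, so exactly 12:00:00 is excluded, as in the second query.

```python
from datetime import datetime, time


def in_daily_window(ts, range_start, range_end, window_start, window_end):
    """True if ts lies in [range_start, range_end] and its time of day
    lies in the half-open window [window_start, window_end)."""
    in_range = range_start <= ts <= range_end
    in_window = window_start <= ts.time() < window_end
    return in_range and in_window


rows = [
    datetime(2020, 12, 31, 10, 0, 0),   # inside both conditions
    datetime(2021, 1, 5, 11, 59, 59),   # inside both conditions
    datetime(2021, 1, 5, 12, 0, 0),     # exactly noon: excluded
    datetime(2021, 1, 5, 9, 30, 0),     # right dates, wrong hour
    datetime(2021, 2, 1, 10, 30, 0),    # right hour, outside the date range
]

matches = [
    ts for ts in rows
    if in_daily_window(
        ts,
        datetime(2020, 12, 30), datetime(2021, 1, 8),
        time(10, 0), time(12, 0),
    )
]
print(matches)
```

Only the first two rows pass both tests. The point is the same as in the SQL above: the date range and the time-of-day window are two independent conditions on the same column.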

Get count of matching time ranges for every minute of the day in Postgres

Problem
I have a table of records each containing id, in_datetime, and out_datetime. A record is considered "open" during the time between the in_datetime and out_datetime. I want to know how many time records were "open" for each minute of the day (regardless of date). For example, for the last 90 days I want to know how many records were "open" at 3:14 am, then 3:15 am, then 3:16 am, then... If no records were "open" at 2:00 am the query should return 0 or null instead of excluding the row, thus 1440 rows should always be returned (the number of minutes in a day). Datetimes are stored in UTC and need to be cast to a time zone.
Simplified example graphic
record_id | time_range
| 0123456789 (these are minutes past midnight)
1 | =========
2 | ===
3 | =======
4 | ===
5 | ==
______________________
result 3323343210
Desired output
time | count of open records at this time
00:00 120
00:01 135
00:02 132
...
23:57 57
23:58 62
23:59 60
No more than 1440 records would ever be returned as there are only 1440 minutes in the day.
What I've tried
1.) In a subquery, I currently generate a minutely series of times for the entire range of each time record. I then group those by time and get a count of the records per minute.
Here is a db-fiddle using my current query:
select
trs.minutes,
count(trs.minutes)
from (
select
generate_series(
DATE_TRUNC('minute', (time_records.in_datetime::timestamptz AT TIME ZONE 'America/Denver')),
DATE_TRUNC('minute', (time_records.out_datetime::timestamptz AT TIME ZONE 'America/Denver')),
interval '1 min'
)::time as minutes
from
time_records
) trs
group by
trs.minutes
This works but is quite inefficient and takes several seconds to run due to the size of my table. Additionally, it excludes times when no records were open. I think somehow I could use window functions to count the number of overlapping time records for each minute of the day, but I don't quite understand how to do that.
2.) Modifying Gordon Linoff's query in his answer below, I came to this (db-fiddle link):
with tr as (
select
date_trunc('minute', (tr.in_datetime::timestamptz AT TIME ZONE 'America/Denver'))::time as m,
1 as inc
from
time_records tr
union all
select
(date_trunc('minute', (tr.out_datetime::timestamptz AT TIME ZONE 'America/Denver')) + interval '1 minute')::time as m,
-1 as inc
from
time_records tr
union all
select
minutes::time,
0
from
generate_series(timestamp '2000-01-01 00:00', timestamp '2000-01-01 23:59', interval '1 min') as minutes
)
select
m,
sum(inc) as changes_at_inc,
sum(sum(inc)) over (order by m) as running_count
from
tr
where
m is not null
group by
m
order by
m;
This runs reasonably quickly, but towards the end of the day (about 22:00 onwards in the linked example) the values turn negative for some reason. Additionally, this query doesn't seem to work correctly with records with time ranges that cross over midnight. It's a step in the right direction, but I unfortunately don't understand it enough to improve on it further.
Here is a faster method. Generate "in" and "out" records for when something gets counted. Then aggregate and use a running sum.
To get all minutes, throw in a generate_series() for the time period in question:
with tr as (
select date_trunc('minute', (tr.in_datetime::timestamptz AT TIME ZONE 'America/Denver')) as m,
1 as inc
from time_records tr
union all
select date_trunc('minute', (tr.out_datetime::timestamptz AT TIME ZONE 'America/Denver')) + interval '1 minute' as m,
-1 as inc
from time_records tr
union all
select generate_series(date_trunc('minute',
min(tr.in_datetime::timestamptz AT TIME ZONE 'America/Denver')),
date_trunc('minute',
max(tr.out_datetime::timestamptz AT TIME ZONE 'America/Denver')),
interval '1 minute'
), 0
from time_records tr
)
select m,
sum(inc) as changes_at_inc,
sum(sum(inc)) over (order by m) as running_count
from tr
group by m
order by m;
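The in/out running-sum idea can also be folded onto a single 1440-minute day, which is what the question ultimately asks for. The Python sketch below is an illustration only, not a translation of the query above. It assumes each record is given as (in_minute, out_minute) in minutes past local midnight, both inclusive, that no record lasts 24 hours or more, and that out_minute smaller than in_minute means the record crosses midnight. The function name `open_counts_per_minute` is made up.

```python
from itertools import accumulate

MINUTES_PER_DAY = 1440


def open_counts_per_minute(ranges):
    """Return a list of 1440 counts: how many ranges are open at each minute.

    ranges: iterable of (in_minute, out_minute), both inclusive, given as
    minutes past midnight. A range with out_minute < in_minute is treated
    as crossing midnight and is split into two pieces.
    """
    # delta[m] holds the net change in the open count at minute m.
    delta = [0] * (MINUTES_PER_DAY + 1)
    for start, end in ranges:
        if start <= end:
            pieces = [(start, end)]
        else:
            pieces = [(start, MINUTES_PER_DAY - 1), (0, end)]
        for lo, hi in pieces:
            delta[lo] += 1        # the record opens at minute lo
            delta[hi + 1] -= 1    # and is closed from minute hi + 1 on
    # The running sum of the changes is the open count per minute.
    return list(accumulate(delta[:MINUTES_PER_DAY]))


counts = open_counts_per_minute([(0, 2), (1, 3), (1438, 1)])
print(counts[:5], counts[1438:])
```

Splitting a midnight-crossing record into two pieces keeps the running sum from going negative, which is the symptom described in the question when the date part is discarded before the "out" event is ordered. Minutes with no open record stay at 0 instead of being dropped.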

Vertica date series is starting one month before specified date

I work with a Vertica database and I needed to make a query that, given two dates, would give me a list of all months between said dates. For example, if I were to give the query 2015-01-01 and 2015-12-31, it would output the following list:
2015-01-01
2015-02-01
2015-03-01
2015-04-01
2015-05-01
2015-06-01
2015-07-01
2015-08-01
2015-09-01
2015-10-01
2015-11-01
2015-12-01
After a bit of digging, I was able to discover the following query:
SELECT date_trunc('MONTH', ts)::date as Mois
FROM
(
SELECT '2015-01-01'::TIMESTAMP as tm
UNION
SELECT '2015-12-31'::TIMESTAMP as tm
) as t
TIMESERIES ts as '1 month' OVER (ORDER BY tm)
This query works and gives me the following output:
2014-12-01
2015-01-01
2015-02-01
2015-03-01
2015-04-01
2015-05-01
2015-06-01
2015-07-01
2015-08-01
2015-09-01
2015-10-01
2015-11-01
2015-12-01
As you can see, by giving the query a starting date of '2015-01-01', or anywhere in January for that matter, I end up with an extra entry, namely 2014-12-01. In itself, the bug (or whatever you want to call this unexpected behavior) is easy to circumvent (just start in February), but I have to admit my curiosity's piqued. Why exactly is the series starting one month BEFORE the date I specified?
EDIT: Alright, after reading Kimbo's warning and confirming that indeed, long periods will eventually cause problems, I was able to come up with the following query that readjusts the dates correctly.
SELECT ts as originalMonth,
ts +
(
mod
(
day(first_value(ts) over (order by ts)) - day(ts) + day(last_day(ts)),
day(last_day(ts))
)
) as adjustedMonth
FROM
(
SELECT ts
FROM
(
SELECT '2015-01-01'::TIMESTAMP as tm
UNION
SELECT '2018-12-31'::TIMESTAMP as tm
) as t
TIMESERIES ts as '1 month' OVER (ORDER BY tm)
) as temp
The only problem I have is that I have no control over the initial day of the first record of the series. It's set automatically by Vertica to the current day. So if I run this query on the 31st of the month, I wonder how it'll behave. I guess I'll just have to wait for December to see, unless someone knows how to get timeseries to behave in a way that would allow me to test it.
EDIT: Okay, so after trying out many different date combinations, I was able to determine that the day which the series starts changes depending on the date you specify. This caused a whole lot of problems... until we decided to go the simple way. Instead of using a month interval, we used a day interval and only selected one specific day per month. WAY simpler and it works all the time. Here's the final query:
SELECT ts as originalMonth
FROM
(
SELECT ts
FROM
(
SELECT '2000-02-01'::TIMESTAMP as tm
UNION
SELECT '2018-12-31'::TIMESTAMP as tm
) as t
TIMESERIES ts as '1 day' OVER (ORDER BY tm)
) as temp
where day(ts) = 1
I think it boils down to this statement from the doc: http://my.vertica.com/docs/7.1.x/HTML/index.htm#Authoring/SQLReferenceManual/Statements/SELECT/TIMESERIESClause.htm
TIME_SLICE can return the start or end time of a time slice, depending
on the value of its fourth input parameter (start_or_end). TIMESERIES,
on the other hand, always returns the start time of each time slice.
When you define a time interval with some start date (2015-01-01, for example), then TIMESERIES ts AS '1 month' will create for its first time slice a slice that starts 1 month before that first data point, so 2014-12-01. When you do DATE_TRUNC('MON', ts), that of course sets the first date value to 2014-12-01 even if your start date is 2015-01-03, or whatever.
Edit: I want to throw out one more warning: your use of DATE_TRUNC achieves what you need, I think. But, from the doc: "Unlike TIME_SLICE, the time slice length and time unit expressed in [TIMESERIES] length_and_time_unit_expr must be constants so gaps in the time slices are well-defined." This means that '1 month' is actually 30 days exactly. This obviously causes problems if you're going for more than a couple of years.
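To illustrate why a fixed 30-day step is a problem for month lists, here is a small Python sketch comparing calendar-aware month starts with 30-day steps. It only models the claim made in this answer (that '1 month' behaves as exactly 30 days); it does not reproduce Vertica's slice alignment, and both function names are made up.

```python
from datetime import date, timedelta


def month_starts(start, end):
    """First day of every calendar month from start's month to end's month."""
    year, month = start.year, start.month
    result = []
    while (year, month) <= (end.year, end.month):
        result.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return result


def fixed_30_day_steps(start, end):
    """Dates from start to end in steps of exactly 30 days."""
    result = []
    current = start
    while current <= end:
        result.append(current)
        current += timedelta(days=30)
    return result


calendar_months = month_starts(date(2015, 1, 1), date(2015, 12, 31))
stepped = fixed_30_day_steps(date(2015, 1, 1), date(2015, 12, 31))

# Truncate the 30-day steps to the first of their month, like DATE_TRUNC.
truncated = [d.replace(day=1) for d in stepped]

print(calendar_months[:3])
print(stepped[:3])
print(date(2015, 2, 1) in truncated)
```

With 30-day steps starting on 2015-01-01, the second value is 2015-01-31 and the third is 2015-03-02, so truncating to the month yields January twice and skips February entirely. Stepping by day and keeping only day 1, as in the final query of the question, avoids this drift.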

Get timestamp of one month ago in PostgreSQL

I have a PostgreSQL database in which one table rapidly grows very large (several million rows every month or so) so I'd like to periodically archive the contents of that table into a separate table.
I'm intending to use a cron job to execute a .sql file nightly to archive all rows that are older than one month into the other table.
I have the query working fine, but I need to know how to dynamically create a timestamp of one month prior.
The time column is stored in the format 2013-10-27 06:53:12 and I need to know what to use in an SQL query to build a timestamp of exactly one month prior. For example, if today is October 27, 2013, I want the query to match all rows where time < 2013-09-27 00:00:00
Question was answered by a friend in IRC:
'now'::timestamp - '1 month'::interval
Having the timestamp return 00:00:00 wasn't terribly important, so this works for my intentions.
select date_trunc('day', NOW() - interval '1 month')
This query returns the date one month before now, with the time rounded down to 00:00:00.
When you need to query for the data of the previous month, filter the respective date column on the range that covers month (current_month - 1).
SELECT *
FROM {table_name}
WHERE {column_name} >= date_trunc('month', current_date-interval '1' month)
AND {column_name} < date_trunc('month', current_date)
The first condition of the WHERE clause matches dates on or after the first day of the previous month (00:00:00 on day 1 of the previous month), and the second condition matches dates before the first day of the current month (00:00:00 on day 1 of the current month).
Together they include all rows whose date lies in the previous month.
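The same half-open "previous month" range can be written down in a few lines of Python, which may help when checking the boundaries, for example around January. This is a sketch of the date arithmetic only; `previous_month_range` is a hypothetical helper, not part of PostgreSQL.

```python
from datetime import date


def previous_month_range(today):
    """Return (first day of previous month, first day of current month).

    Rows belong to the previous month when start <= value < end.
    """
    first_of_this_month = today.replace(day=1)
    if first_of_this_month.month == 1:
        # January: the previous month is December of the year before.
        first_of_prev = date(first_of_this_month.year - 1, 12, 1)
    else:
        first_of_prev = first_of_this_month.replace(
            month=first_of_this_month.month - 1
        )
    return first_of_prev, first_of_this_month


start, end = previous_month_range(date(2013, 10, 27))
print(start, end)
```

For 27 October 2013 the range runs from 2013-09-01 (inclusive) to 2013-10-01 (exclusive). Using an exclusive upper bound avoids having to know how many days the previous month has.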

datetime manipulation: replace all dates with 00:00 time with 24:00 the previous day

I have a table described here: http://sqlfiddle.com/#!3/f8852/3
The date_time field for when the time is 00:00 is wrong. For example:
5/24/2013 00:00
This should really be:
5/23/2013 24:00
So hour 00:00 corresponds to the last hour of the previous day (I didn't create this table but have to work with it). Is there a quick way, when I do a select, to replace all dates that have 00:00 as the time with 24:00 of the previous day? I can do it easily in Python in a for loop, but I'm not quite sure how to structure it in SQL. Appreciate the help.
All datetimes are instants in time, not spans of finite length, and each can exist in only one day. The instant that represents midnight is, by definition, in the next day, the day of which it is the start. In other words, a day is closed at its beginning and open at its end: the valid time values within a single calendar date run from 00:00:00.00000 to 23:59:59.9999.
This would be analogous to asking that the minute value within an hour be allowed to vary from 1 to 60, instead of from 0 to 59, and that the value of 60 was the last minute of the previous hour.
What you are talking about is only a display issue. Even if you could enter a date as 1 Jan 2013 24:00, (24:00:00 is not a legal time of day) it would be entered as a datetime at the start of the date 2 Jan, not at the end of 1 Jan.
One thing that illustrates this is that, because of rounding (SQL Server can only resolve datetimes to within about 3 milliseconds), a datetime that is only a millisecond or so before midnight will round up to midnight and move to the next day, as can be seen by running the following in Enterprise Manager...
Select cast ('1 Jan 2013 23:59:59.999' as datetime)
SQL Server stores all datetimes as two integers: one that represents the number of days since 1 Jan 1900, and the other the number of ticks (1 tick is 1/300th of a second, about 3.33 ms) since midnight. If zero time has elapsed since midnight, it is still the same day, not the previous day.
If you have been inserting data assuming that midnight 00:00:00 means the end of the day, you need to fix that.
If you need to correct your existing data, you need to add one day to every date in your database that has midnight as its time component (i.e., has a zero time component).
Update your_table set  -- your_table is a placeholder for the actual table name
date_time = dateAdd(day, 1, date_time)
Where date_time = dateadd(day, datediff(day, 0, date_time), 0)
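The rounding behavior described above can be modelled with a few lines of Python. This is a simplified sketch of the days-plus-ticks description given in this answer, not SQL Server's actual implementation; `store_as_days_and_ticks` and the constants are made-up names, and the input is taken as whole milliseconds since midnight.

```python
from datetime import date, timedelta

TICKS_PER_SECOND = 300
TICKS_PER_DAY = 24 * 60 * 60 * TICKS_PER_SECOND


def store_as_days_and_ticks(day, ms_since_midnight):
    """Round a time of day to the nearest 1/300 s tick.

    Returns (day, ticks); the day moves forward if rounding reaches midnight.
    """
    # ms * 300 / 1000 == ms * 3 / 10, rounded to the nearest whole tick.
    ticks = (ms_since_midnight * 3 + 5) // 10
    extra_days, ticks = divmod(ticks, TICKS_PER_DAY)
    return day + timedelta(days=extra_days), ticks


# 23:59:59.999 expressed as milliseconds since midnight.
almost_midnight_ms = 24 * 60 * 60 * 1000 - 1

print(store_as_days_and_ticks(date(2013, 1, 1), almost_midnight_ms))
print(store_as_days_and_ticks(date(2013, 1, 1), almost_midnight_ms - 1))
```

In this model 23:59:59.999 rounds up to tick 0 of 2 Jan 2013, while 23:59:59.998 rounds down and stays on 1 Jan 2013. In both cases the stored value belongs to exactly one day, and a zero time component always marks the start of that day.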