Vertica date series is starting one month before specified date - sql

I work with a Vertica database and needed a query that, given two dates, returns a list of all months between those dates. For example, if I gave the query 2015-01-01 and 2015-12-31, it would output the following list:
2015-01-01
2015-02-01
2015-03-01
2015-04-01
2015-05-01
2015-06-01
2015-07-01
2015-08-01
2015-09-01
2015-10-01
2015-11-01
2015-12-01
After a bit of digging, I was able to discover the following query:
SELECT date_trunc('MONTH', ts)::date as Mois
FROM
(
SELECT '2015-01-01'::TIMESTAMP as tm
UNION
SELECT '2015-12-31'::TIMESTAMP as tm
) as t
TIMESERIES ts as '1 month' OVER (ORDER BY tm)
This query works and gives me the following output:
2014-12-01
2015-01-01
2015-02-01
2015-03-01
2015-04-01
2015-05-01
2015-06-01
2015-07-01
2015-08-01
2015-09-01
2015-10-01
2015-11-01
2015-12-01
As you can see, by giving the query a starting date of '2015-01-01', or anywhere in January for that matter, I end up with an extra entry, namely 2014-12-01. In itself, the bug (or whatever you want to call this unexpected behavior) is easy to circumvent (just start in February), but I have to admit my curiosity is piqued. Why exactly is the series starting one month BEFORE the date I specified?
EDIT: Alright, after reading Kimbo's warning and confirming that indeed, long periods will eventually cause problems, I was able to come up with the following query that readjusts the dates correctly.
SELECT ts as originalMonth,
ts +
(
mod
(
day(first_value(ts) over (order by ts)) - day(ts) + day(last_day(ts)),
day(last_day(ts))
)
) as adjustedMonth
FROM
(
SELECT ts
FROM
(
SELECT '2015-01-01'::TIMESTAMP as tm
UNION
SELECT '2018-12-31'::TIMESTAMP as tm
) as t
TIMESERIES ts as '1 month' OVER (ORDER BY tm)
) as temp
The only problem is that I have no control over the initial day of the first record of the series. Vertica sets it automatically to the current day, so I wonder how the query will behave if I run it on the 31st of the month. I guess I'll just have to wait for December to see, unless someone knows how to get TIMESERIES to behave in a way that would let me test it.
EDIT: Okay, so after trying out many different date combinations, I was able to determine that the day which the series starts changes depending on the date you specify. This caused a whole lot of problems... until we decided to go the simple way. Instead of using a month interval, we used a day interval and only selected one specific day per month. WAY simpler and it works all the time. Here's the final query:
SELECT ts as originalMonth
FROM
(
SELECT ts
FROM
(
SELECT '2000-02-01'::TIMESTAMP as tm
UNION
SELECT '2018-12-31'::TIMESTAMP as tm
) as t
TIMESERIES ts as '1 day' OVER (ORDER BY tm)
) as temp
where day(ts) = 1
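
For comparison, the same month list can be built outside the database. The sketch below is plain Python rather than Vertica SQL, and `month_starts` is a hypothetical helper name, not part of any library. It assumes both endpoints are inclusive and steps by calendar month instead of by a fixed interval, so it cannot land on a stray day:

```python
from datetime import date


def month_starts(start, end):
    """Return the first day of every month from start's month through end's month."""
    result = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        result.append(date(year, month, 1))
        # Advance one calendar month, rolling over the year after December.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return result


months = month_starts(date(2015, 1, 1), date(2015, 12, 31))
print(months[0], months[-1], len(months))
```

Like the day-interval query, this always yields the first of each month, regardless of which day of the month the start date falls on.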

I think it boils down to this statement from the doc: http://my.vertica.com/docs/7.1.x/HTML/index.htm#Authoring/SQLReferenceManual/Statements/SELECT/TIMESERIESClause.htm
TIME_SLICE can return the start or end time of a time slice, depending
on the value of its fourth input parameter (start_or_end). TIMESERIES,
on the other hand, always returns the start time of each time slice.
When you define a time interval with some start date (2015-01-01, for example), TIMESERIES ts AS '1 month' creates a first time slice that starts before that first data point, in December 2014. When you then apply DATE_TRUNC('MON', ts), the first date value of course becomes 2014-12-01, even if your start date is 2015-01-03 or whatever.
Edit: I want to add one more warning. Your use of DATE_TRUNC achieves what you need, I think. But, from the doc: "Unlike TIME_SLICE, the time slice length and time unit expressed in [TIMESERIES] length_and_time_unit_expr must be constants so gaps in the time slices are well-defined." This means that '1 month' is actually exactly 30 days, which obviously causes problems if your range spans more than a couple of years.
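
To see why a fixed 30-day "month" causes trouble, here is a rough Python sketch (not Vertica code). It assumes the slices are plain 30-day steps starting exactly at the first date, which is a simplification: Vertica chooses its own slice boundaries. `thirty_day_series` is a hypothetical helper name.

```python
from datetime import date, timedelta


def thirty_day_series(start, end):
    """Step from start to end in fixed 30-day slices (the assumed meaning of '1 month')."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=30)


# Truncate each slice start to (year, month), similar to DATE_TRUNC('MONTH', ts).
months = [(d.year, d.month) for d in thirty_day_series(date(2015, 1, 1), date(2015, 12, 31))]
print(months)
```

Under that assumption January appears twice and February is skipped, because 2015-01-31 plus 30 days is 2015-03-02.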

Related

How add extra hour to the table showing only timeseries dates

I have the following table.
On the last Sunday of October the clocks change, so that day gains an extra hour. For example, on 25.10.2020 the time shifts back one hour at 03:00 (GMT+3), so that day has 25 hours. How can I make this SQL also show the extra hour as a duplicate of 25.10.2020 03:00?
The same applies in March; however, the output shows duplicated values at 04:00 (GMT+3) and skips 03:00. How can I replace one of the two 04:00 values with 03:00?
I also noticed that selecting TO_CHAR(ts,'YYYY-MM-DD HH24:00:00') instead of ts solves the problem for March; however, with TO_CHAR(ts,'YYYY-MM-DD HH24:00:00')::timestamp the duplicate appears again.
SELECT ts FROM (
SELECT '2020-10-20'::TIMESTAMP AT TIME ZONE 'UTC' AS tm
UNION
SELECT '2020-10-30'::TIMESTAMP AT TIME ZONE 'UTC' AS tm
) AS t TIMESERIES ts AS '1 Hour' OVER (ORDER BY tm)
ORDER BY ts
My output and desired output

Extract dates from week numbers on BigQuery

I have a large data file containing a string type column 'YearMonthWeek'
It contains values such as '20160101' for the first week of January 2016, or '20161040' for the 40th week of the year 2016 apparently falling in October.
Now, I want to convert these strings to actual dates so that every YearMonthWeek value is converted to, say the first day of that week. (Whether that ends up being Monday or Sunday I don't really care).
I tried the following query:
PARSE_TIMESTAMP('%Y%m%W', CAST(YearMonthWeek AS STRING)) AS datefield
(See this documentation for details)
This runs without errors, but returns me the first day of the month for every single entry...
So for example '20160101' and '20160102' both get parsed as 2016-01-01 00:00:00 UTC.
Is this an issue with the PARSE_TIMESTAMP function, or am I missing something?
Try doing something like
DATE_ADD(date_expression, INTERVAL %W WEEK)
Static example:
SELECT
DATE_ADD(
DATE(PARSE_TIMESTAMP('%Y', SUBSTR(CAST('20160102' AS STRING),0,4))),
INTERVAL (CAST(SUBSTR(CAST('20160102' AS STRING),7) AS INT64)) week)
AS datefield
Result:
Row  datefield
1    2016-01-15
You may need to add an offset: according to ISO 8601, the first week of the year is the one that contains January 4th. So you could use something like: 4 + 7*($week - 1)
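
As a rough illustration of the ISO 8601 rule (in Python rather than BigQuery SQL), the sketch below assumes the last two digits of the string are an ISO week number, which the question does not confirm. `first_day_of_iso_week` is a hypothetical helper name; `date.fromisocalendar` is in the standard library from Python 3.8.

```python
from datetime import date


def first_day_of_iso_week(year_month_week):
    """Parse a 'YYYYMMWW' string and return the Monday of ISO week WW of year YYYY."""
    year = int(year_month_week[:4])
    week = int(year_month_week[6:8])
    # The month digits (positions 4-5) are redundant once the week is known.
    return date.fromisocalendar(year, week, 1)


print(first_day_of_iso_week("20160101"))
print(first_day_of_iso_week("20161040"))
```

With ISO numbering, week 1 of 2016 starts on Monday 2016-01-04 and week 40 starts on 2016-10-03, which matches the question's note that week 40 falls in October.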

Find the minute difference between 2 date time

I need to get the difference in minutes between two consecutive date-times. For the last row of each date, the difference should be calculated against 6 PM of that date.
Sample data (the last column is the result I need):
User_Name   Date               Time difference in minutes
User 1      1/1/06 12:00 PM    30
user 2      1/1/06 12:30 PM    315
user 3      1/1/06 5:45 PM     15
Here all rows fall on the same date, and the difference for the last user is calculated against the default value of 6 PM. Assume that no user's date-time is later than 6 PM.
Please suggest how to write a query for this.
You could use the lead window function.
I assume your table is called mytable and the date column is mydate (it is a bad idea to call a column Date as it is a reserved word).
select user_name,
round((lead(mydate, 1, trunc(mydate)+18/24)
over (partition by trunc(mydate) order by mydate)
- mydate) *24*60) as difference
from mytable
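
The logic of that query (next row on the same date, falling back to 18:00) can be sketched in Python to check the expected numbers. This is an illustration only, not Oracle code, and `minutes_to_next` is a hypothetical helper name.

```python
from datetime import datetime, time


def minutes_to_next(timestamps):
    """For each timestamp, minutes until the next one on the same date, or until 18:00 for the last."""
    ordered = sorted(timestamps)
    result = []
    for i, current in enumerate(ordered):
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        # Fall back to 18:00 when there is no later row on the same date.
        if nxt is None or nxt.date() != current.date():
            nxt = datetime.combine(current.date(), time(18, 0))
        result.append(round((nxt - current).total_seconds() / 60))
    return result


rows = [
    datetime(2006, 1, 1, 12, 0),
    datetime(2006, 1, 1, 12, 30),
    datetime(2006, 1, 1, 17, 45),
]
print(minutes_to_next(rows))
```

For the sample data in the question this gives 30, 315 and 15 minutes.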
I found a solution. If it's not correct, let me know.
SELECT User_name,created_date,
trunc(to_number((cast(nvl(lead (created_date,1) OVER (ORDER BY created_date),TRUNC(SYSDATE) + (19/24)) as date) - cast(created_date as date)))*24*60) as difference
FROM users;

Change Character to time stamp in IBM informix DB

I am writing a query to convert a character to Date Time
The following query extracts my time stamps in Character format.
select
(to_char(TO_CHAR(MDY(month(current- 1 units month), 1,year(current- 1 units month)),'%Y-%m-%d')||' 13:00:00')),
(to_char(TO_CHAR((DATE(DATE(extend(TODAY, YEAR TO MONTH)) - 1 UNITS DAY)+1),'%d-%m-%Y')||' 13:00:00'))
from dual
Output:
2015-08-01 13:00:00 01-09-2015 13:00:00
2015-08-01 13:00:00 01-09-2015 13:00:00
Now I am trying to convert the character values to timestamps using the DATETIME(2001-12-31 15:32:55) YEAR TO SECOND function, but I am getting a syntax error.
select
DATETIME(to_char(TO_CHAR(MDY(month(current- 1 units month), 1,year(current- 1 units month)),'%Y-%m-%d')||' 13:00:00')) YEAR TO SECOND ,
DATETIME(to_char(TO_CHAR((DATE(DATE(extend(TODAY, YEAR TO MONTH)) - 1 UNITS DAY)+1),'%d-%m-%Y')||' 13:00:00') ) YEAR TO SECOND
from dual
However, the following works fine:
select DATETIME(2001-12-31 15:32:55) YEAR TO SECOND
from dual
Thanks in advance. Please do not suggest answers for Oracle; it's damn easy in Oracle.
Try using a CAST to convert your output as a DATETIME YEAR TO SECOND:
select
(to_char(TO_CHAR(MDY(month(current- 1 units month), 1,year(current- 1 units month)),'%Y-%m-%d')||' 13:00:00'))::DATETIME YEAR TO SECOND ,
(to_char(TO_CHAR((DATE(DATE(extend(TODAY, YEAR TO MONTH)) - 1 UNITS DAY)+1),'%Y-%m-%d')||' 13:00:00'))::DATETIME YEAR TO SECOND
from dual
That seems to work OK, but I'd suggest you don't do this. Also, note that DATETIME is an ISO standard: YYYY-MM-DD HH:MM:SS.FFF (or part thereof). Your second example with the date in English format is not going to parse to a DATETIME.
Your first algorithm is not leap-day safe, and the second example is horribly over-complicated. You can determine 13:00 on the first day of the current month using the far simpler construction:
(TODAY-DAY(TODAY)+1)::DATETIME YEAR TO SECOND + INTERVAL(13) HOUR TO HOUR
This also has the benefit of avoiding casting back and forth between DATE and CHAR. The calculation of the same time the previous month can be written as:
MDY(MONTH(TODAY-DAY(TODAY)),1,YEAR(TODAY-DAY(TODAY)))::DATETIME YEAR TO SECOND + INTERVAL(13) HOUR TO HOUR
... which will do the right thing on Feb 29/March 1 in a leap year, which your algorithm won't.
The construction TODAY-DAY(TODAY) will always produce the last day of the prior month.
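
The same date arithmetic can be sketched in Python to check the leap-year behaviour. This is only an illustration of the TODAY-DAY(TODAY) idea, not Informix code, and both function names are hypothetical.

```python
from datetime import date, datetime, time, timedelta


def first_of_month_at_13(today):
    # today - day(today) is the last day of the prior month; +1 day is the 1st of this month.
    first = today - timedelta(days=today.day) + timedelta(days=1)
    return datetime.combine(first, time(13, 0))


def first_of_previous_month_at_13(today):
    # Take the last day of the prior month, then move to day 1 of that month.
    last_of_prior = today - timedelta(days=today.day)
    return datetime.combine(last_of_prior.replace(day=1), time(13, 0))


print(first_of_month_at_13(date(2016, 2, 29)))
print(first_of_previous_month_at_13(date(2016, 3, 1)))
```

Both calls land on 2016-02-01 13:00:00, so the leap day causes no trouble.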

Given Start Time , End Time & Secs. Obtain secs in a specific Hour

I'm using Vertica Database. I am trying to get the total secs in a particular hour from the following example session data. Any sample SQL code would be very helpful - Thanks
start time            end time              session length (secs)
2010-02-21 20:30:00   2010-02-21 23:30:00   10800
2010-02-21 21:30:00   2010-02-21 22:30:00   3600
2010-02-21 21:45:00   2010-02-21 21:59:00   840
2010-02-21 22:00:00   2010-02-21 22:20:00   1200
2010-02-21 22:30:00   2010-02-21 23:30:00   3600
Desired Output
hour secs_in_that_hour
20 1800
21 6240
22 8400
23 3600
You would need a table containing every hour, so that you could join it in. That join would be based on the hour being within start and end time and then you can extract the time using (min(hour end,end time) - max(hour start,start time)). Then group on the hour and sum.
Since I don't know vertica, I have no complete answer to this.
Vertica is based on PostgreSQL, especially language-wise. The best thing you could do is look up Postgres's date/time functions and related tutorials. I haven't found an instance where a Postgres time function does not work in Vertica.
http://www.postgresql.org/docs/8.0/interactive/functions-datetime.html
There is probably a datediff-type function you can use. (Sorry, I don't have the time to look it up.)
See Vertica function
TIMESERIES Clause
Provides gap-filling and interpolation (GFI) computation, an important component of time series analytics computation. See Using Time Series Analytics in the Programmer's Guide for details and examples.
Syntax
TIMESERIES slice_time AS 'length_and_time_unit_expression' OVER (
... [ window_partition_clause [ , ... ] ]
... ORDER BY time_expression )
... [ ORDER BY table_column [ , ... ] ]
The simplest way is to just extract epoch (number of seconds) on the interval (the difference between timestamps).
As for the overlapping sums, you'll need to first break it out by hour. Some of these hours don't exist so you'll need to generate them using a TIMESERIES clause.
The idea will be to first create your hourly time slices, then theta join to find (and fan out) for all possible matches on this. This is basically looking for any and all overlaps of the time range. Luckily, this is pretty simple as it is just anywhere the start time is before the end of the slice and the end time is greater than the start of the slice.
Then you use greatest and least to find the actual time to start and stop within the slice, subtract them out, convert interval to seconds and done.
See below for the example.
with slices as (
select slice_time slice_time_start, slice_time + interval '1 hour' slice_time_end
from (
select min(start_time) time_range from mytest
union all
select max(end_time) from mytest
) range
timeseries slice_time as '1 HOUR' over (order by range.time_range)
)
select slice_time_start "hour", extract(epoch from sum( least(end_time, slice_time_end)-greatest(slice_time_start, start_time))) secs_in_that_hour
from slices join mytest on ( start_time < slice_time_end and end_time > slice_time_start)
group by 1
order by 1
There may be some edge cases, or some additional filtering may be needed, if your data isn't so clean.
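
To sanity-check the overlap arithmetic against the desired output in the question, here is a small Python sketch of the same idea: clip each session to every hourly slice it touches and sum the seconds. It is not Vertica code, and `seconds_per_hour` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def seconds_per_hour(sessions):
    """Sum, per hourly slice, the seconds of each (start, end) session that fall inside it."""
    totals = defaultdict(int)
    for start, end in sessions:
        slice_start = start.replace(minute=0, second=0, microsecond=0)
        while slice_start < end:
            slice_end = slice_start + timedelta(hours=1)
            # Clip the session to the slice: least(end) minus greatest(start).
            overlap = min(end, slice_end) - max(start, slice_start)
            totals[slice_start] += int(overlap.total_seconds())
            slice_start = slice_end
    return dict(totals)


sessions = [
    (datetime(2010, 2, 21, 20, 30), datetime(2010, 2, 21, 23, 30)),
    (datetime(2010, 2, 21, 21, 30), datetime(2010, 2, 21, 22, 30)),
    (datetime(2010, 2, 21, 21, 45), datetime(2010, 2, 21, 21, 59)),
    (datetime(2010, 2, 21, 22, 0), datetime(2010, 2, 21, 22, 20)),
    (datetime(2010, 2, 21, 22, 30), datetime(2010, 2, 21, 23, 30)),
]
for hour, secs in sorted(seconds_per_hour(sessions).items()):
    print(hour.hour, secs)
```

On the sample sessions this reproduces the desired output: 1800, 6240, 8400 and 3600 seconds for hours 20 through 23.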