I could not find any function in snowflake docs that can do this.
If I understand what you mean correctly, it appears to be:
TO_TIMESTAMP( epoch_sec )
This is the reference. There are variations for time zone support too.
Unfortunately, I don't think there is a perfect solution for this issue. The Snowflake docs do say that the to_timestamp() function supports epoch seconds, microseconds, and nanoseconds; however, their own example using the number 31536000000000000 does not even work.
select to_timestamp(31536000000000000); -- returns "Invalid Date" (incorrect)
The number of digits your epoch number has will vary by its source. What I found helpful was using a tool like Epoch Converter to input the full epoch number and determine what your date should be, then trying to get to that date in Snowflake using some manipulation. To get the real date for the example above:
select to_timestamp(left(31536000000000000, 11)); -- returns "1971-01-01" (correct)
You may notice that this is not set in stone. Changing the number of digits you keep in your to_timestamp function will completely change the output, so you may need to add or remove digits to get the date you are looking for. For example, the number 1418419324000000 should return the date "2014-12-12"...
select to_timestamp(1418419324000000); -- returns "Invalid Date" (incorrect)
select to_timestamp(left(1418419324000000, 11)); -- returns "2419-06-24" (incorrect)
select to_timestamp(left(1418419324000000, 10)); -- returns "2014-12-12" (correct)
I had to play around with how many characters I input to get to where I needed to be. It's definitely a hack, but it's a simple solution to get there.
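If you know from the source system what unit the raw number is in, a safer route than truncating digits is to scale the value down to whole seconds yourself before handing it to to_timestamp(). A minimal sketch (epoch_col and my_table are placeholder names, and the divisor depends on the unit you were actually given):

select to_timestamp(floor(epoch_col / 1000000)) as ts  -- microseconds; use 1000 for milliseconds, 1000000000 for nanoseconds
from my_table;

That way the result no longer depends on how many digits happen to be in the value or on Snowflake's magnitude-based guessing.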
Related
I'm working with JSON data that may contain formatted timestamps. I was converting them to proper timestamps using cast when I came across something odd: a conversion that makes no sense:
select cast('[112,180]'::json#>>'{}' as timestamp with time zone);
Produces a result:
timestamptz
------------------------------
0112-06-28 00:00:00+00:53:28
The first number is interpreted as the year, and the second number the day of the year, but...
I played around a bit and discovered that the first integer needs to be >= 100, and the second integer needs to be from 100 to 366. Any other values, and any other array lengths, will fail.
I'm curious as to why this pattern is parsed as a timestamp?
I'd also be happy to know if there was a way to disable this behaviour.
It is parsed as a timestamp because that is what you explicitly told it to do.
It is not an array of two integers, it is the text string consisting of the sequence of characters [112,180], because that is what #>> yields.
It is parsed following the rules documented here (although it doesn't define what a token is, so that is a bit vague), specifically rule 3d followed by 3b.
Redefining date parsing sounds like a giant mess. I would think it would be better to make a #>> variant that throws an error (if that is what you want; you didn't say what you wanted to happen, only what you wanted not to happen) when the json_typeof from #> is not string.
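As a rough sketch of what such a variant could look like (json_text_strict is a made-up name, and this assumes PostgreSQL with PL/pgSQL available):

create function json_text_strict(j json, path text[]) returns text
language plpgsql immutable as $$
begin
  -- refuse anything that is not a JSON string, instead of handing back
  -- text that a later cast will happily parse as a timestamp
  if json_typeof(j #> path) is distinct from 'string' then
    raise exception 'expected a JSON string at path %, got %',
      path, coalesce(json_typeof(j #> path), 'nothing');
  end if;
  return j #>> path;
end;
$$;

-- select cast(json_text_strict('[112,180]'::json, '{}') as timestamp with time zone);  -- now raises an error

With that in place the surprising cast becomes an explicit failure rather than a year-112 timestamp.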
As you can see below, most of the dates are the same but carry different formats, and one of the dates is invalid. My question is how to insert this data into the table in one go in the correct format, discarding invalid entries, without getting any errors on insert.
select to_date(dob,'dd/mm/yyyy') from table
This would not work in most cases, as the data might be the same as below but shuffled into different formats. I have a thousand-plus such entries, and I am wondering whether this could be done via a SQL query with the use of regex (I thought maybe).
One solution is regular expressions:
select (case when regexp_like(dob, '^[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}$')
             then to_date(dob, 'dd/mm/yyyy')
             when regexp_like(dob, '^[0-9]{4}/[0-9]{2}/[0-9]{2}$')
             then to_date(dob, 'yyyy/mm/dd')
             when regexp_like(dob, '^[0-9]{4}-[0-9]{2}-[0-9]{2}$')
             then to_date(dob, 'yyyy-mm-dd')
        end)
from t;
I don't really recommend this, however. You have a data modeling problem. You have stored a date as a string, and that is the fundamental issue. You have no control over the inputs, so you don't know if 3/7/64 refers to March 7th or July 3rd.
You should really fix the table when the data is input.
Oracle's TO_DATE() function is fairly forgiving: it doesn't care whether we use '/' or '-' as a date separator in the format mask. So I prefer to use TO_DATE() rather than regular expressions, as it expresses intent better and is easier to read: I'm trying to convert strings to dates and here are the formats I'm expecting. Also, Oracle's built-in SQL functions are more performant than regex, which does matter in bulk operations like data warehouse ETL runs.
A straightforward solution is to use a function which applies various date format masks to the date string until one is successful. That is our winner and we return it as a date. If none of the masks fit the string we return null.
create or replace function cast_to_date (p_str in varchar2) return date is
  d     date;
  masks sys.dbms_debug_vc2coll := sys.dbms_debug_vc2coll('dd-mm-yyyy', 'mm-dd-yyyy', 'yyyy-mm-dd');
begin
  for idx in 1..masks.count() loop
    begin
      d := to_date(p_str, masks(idx));
      exit;  -- first mask that parses wins
    exception
      when others then
        d := null;  -- this mask didn't fit; try the next one
    end;
  end loop;
  return d;
end;
/
This version applies three masks. Think about the order in which you assign masks to the array: if you think more strings represent dates as month-day-year than day-month-year, and it matters whether you load 10-APR-1990 or 04-OCT-1990, then you should change the order accordingly.
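With that function in place, the original load can keep only the rows that parse. A sketch along these lines (target_table, staging_table, and the column names are placeholders):

insert into target_table (first_name, last_name, phone, dob)
select first_name, last_name, phone, cast_to_date(dob)
from   staging_table
where  cast_to_date(dob) is not null;  -- rows whose dob fits none of the masks are skipped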
Here is a demo on db<>fiddle of this approach in action. Note that I've added a few more input rows to your sample.
most of the dates are the same but carry different formats, and one of the dates is invalid.
Actually three of the eight dates are invalid, and three more could be either month-day-year or day-month-year. Which means you can only be sure that 25% of the dates in the sample are correct. Given that hit rate you ought to be suspicious of those ones too: certainly they're dates, but are they correct? And by extension, can you trust any of the data this source system is passing you? How can you be sure they are only cavalier with this one column and rigorous with all the other columns? I'll bet the "first name lastname and phone number" are full of dodgy values too.
Most likely you have exaggerated the number of bad dates in the posted sample for the purposes of the question. But if this breakdown is representative of the data you get, then you ought to have a discussion about whether it's worth your time loading this data, and, if you do load it, how much you should trust it in downstream processing.
I have an already generated script that uses the following code: ORDER BY TO_CHAR((A.VERIFIED_DTTM - 4/24),'YYYY-MM-DD'). The output of A.VERIFIED_DTTM is a simple date eg: 09-SEP-19.
It was my understanding that 4/24 is the same as saying 4 hours. I have tried removing this expression from my query and received radically different results (not just sorting or ordering).
In another area of the script, I have code that makes excellent sense (A.VERIFIED_DTTM >= sysdate - 30). This I understand.
Can anyone explain what is actually happening with this query line?
Thank you!
It is subtracting 4 hours and converting the date value to a string in the format of YYYY-MM-DD. This is probably a timezone adjustment.
To be honest, I'm not sure why it doesn't just use:
ORDER BY A.VERIFIED_DTTM
Removing it should not radically change the results.
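You can see the 4/24 arithmetic for yourself with a quick query; Oracle date arithmetic works in days, so 4/24 of a day is 4 hours:

select to_char(sysdate,        'YYYY-MM-DD HH24:MI') as now,
       to_char(sysdate - 4/24, 'YYYY-MM-DD HH24:MI') as four_hours_earlier
from dual;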
Thanks to some wonderful application design, I've come to find myself face-to-face with a real WTF - it seems that the application I support outputs the date and time into two separate columns in one particular table; the date goes into a 'Date' column as the datetime data type, whilst the time goes into a 'Time' column as the money data type in hours and minutes (so, for example, 10:35:00 would be £10.35).
I need to amalgamate these two columns during a query I'm making to the database so it returns as one complete datetime column but obviously just doing...
...snip...
CAST(au.[Date] as datetime) + CAST(au.[Time] AS datetime) as 'LastUpdateDate'
...snip...
... doesn't work as I hoped (naïvely) that it would.
My predecessor encountered this issue and came up with a... "creative" solution to this:
MIN(DATEADD(HOUR,CAST(LEFT(CONVERT(VARCHAR(10),[time],0),CHARINDEX('.',CONVERT(VARCHAR(10),[time],0),0)-1) AS INT),DATEADD(MINUTE,CAST(RIGHT(CONVERT(VARCHAR(10),[time],0),LEN(CONVERT(VARCHAR(10),[time],0)) - CHARINDEX('.',CONVERT(VARCHAR(10),[time],0),0)) AS INT),[date]))) AS CreatedDateTime
Unlike my predecessor, I would rather try to keep this solution as simple as possible. Do you think it would be possible to cast the values in this column to time by:
Casting the money value to string
Replacing the decimal point for a colon
Parsing this as a datetime object (to replace the CAST(au.[Time] as datetime) part of the first code block)
Is this feasible? And if not, can anyone assist?
EDIT
Just to be 100% clear, I cannot change the underlying data type for the column as the application relies on the data type being money. This is purely so my sanely-written application that does housekeeping reports can actually read the data in as a complete datetime value.
I'd prefer an arithmetical conversion without any string casts:
MIN(
DATEADD(
MINUTE,
FLOOR(au.[Time]) * 60 + (au.[Time]-FLOOR(au.[Time])) * 100,
au.[Date])
) AS CreatedDateTime
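A quick sanity check of the arithmetic with throwaway values (the date and money value below are made up):

declare @d datetime = '2018-01-10',
        @t money    = 10.35;  -- stands for 10:35:00

select dateadd(minute,
               floor(@t) * 60 + (@t - floor(@t)) * 100,  -- 10 * 60 + 35 = 635 minutes
               @d) as combined;                           -- 2018-01-10 10:35:00.000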
You can add a layer of sanity, if changing the column to time outright is not an option:
ALTER TABLE ... ADD SaneDate AS
DATEADD(MINUTE, FLOOR([Time]) * 60 + 100 * ([Time] - FLOOR([Time])), [Date])
One computed column and then you can stick to using that instead of repeating the calculations everywhere. If altering the tables in any way is out of the question, you could at least make a view or table-valued function to capture this logic. (Preferably not a scalar function, although that's more obvious -- those have horrendous performance in queries.)
I tend to prefer DATEADD over string manipulation when possible, simply because the results tend to be more predictable. In this case there's no real issue, since converting money to char(5) is perfectly predictable regardless of language settings, but still.
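If altering the table is off-limits, a view along these lines keeps the calculation in one place (the object and column names here are made up):

create view dbo.AuditLogWithDateTime as
select au.*,
       dateadd(minute,
               floor(au.[Time]) * 60 + (au.[Time] - floor(au.[Time])) * 100,
               au.[Date]) as LastUpdateDate
from dbo.AuditLog as au;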
Just had a look at how to use the REPLACE command and it works as expected:
CAST(au.[Date] as datetime) + CAST(REPLACE(CAST(au.[Time] AS varchar(5)),'.',':') AS datetime) as 'LastUpdateDate'
now outputs 2018-01-10 10:32:00.000 whereas before it was providing some incorrect date and time value.
I suppose you could mathematically convert it as @JeroenMostert has suggested - to be fair, I'm not 100% sure of the performance impact this solution may have compared to calculating the minutes and converting it that way, so I'll give that a try as well just to be sure.
In the queries I stumble upon, each date is converted with the to_date function before any comparison. Sometimes this caused a "literal does not match format string" error, which had nothing to do with the format; the cause was explained here:
ORA-01861: literal does not match format string
My question is: is it really necessary to use date conversion? Why is it converted in the first place before applying any logical comparison?
Oracle does not store dates as, well, just dates: the DATE type always includes a time component. The problem is that there might be a time on the dates that would cause them to be unequal. (You can see the documentation here for information about the DATE data type.)
In general, we think that "2013-01-01" is equal to "2013-01-01". However, the first date might be "2013-01-01 01:00:00" and the second "2013-01-01 02:02:02". And they would not be equal. To make matters worse, they may look the same when they are printed out.
You don't actually have to convert the dates to strings in order to do such comparisons. You can also use the trunc() function. Such a transformation of the data is insurance against "invisible" time components of the data interfering with comparisons.
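For example, something like this (t and date_col are placeholder names) compares on the day alone, whatever time component the stored values carry:

select *
from   t
where  trunc(date_col) = date '2013-01-01';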
You should really be storing dates as actual dates (or timestamps). If you have strings representing dates, you will often need to convert them using to_date (with a specified format, not relying on default formats). It really depends on what comparisons/date functionality you want. You're getting errors because you hit a value that does not conform to your specified format. This is also a good reason to specify a column as DATE to store dates. For example,
select to_date('123', 'MM-DD-YYYY') from dual;
will throw an ORA-01861. So you may have 99.9% of the rows as MM-DD-YYYY, but the 0.1% will cause you headaches.
Anyway, if you cleanup those strings, you can do much more using to_date and date functions. For example:
select
(last_day(to_date('02-05-2009', 'MM-DD-YYYY')) - to_date('01-15-1998', 'MM-DD-YYYY')) as days_between_dates
from dual;
Not fun to do that with strings. Or maybe just find the most recent date:
select greatest( to_date('02-05-2009', 'MM-DD-YYYY'), to_date('12-01-1988', 'MM-DD-YYYY')) from dual;
using a string comparison would give the wrong answer:
select greatest('02-05-2009', '12-01-1988') from dual;
Just a few examples, but much better to treat dates as dates, not strings.
If you have a string that represents a date, use TO_DATE.
If you already have a date, use it directly.