What is the expected input date pattern for the date_format function in Databricks Spark SQL?

I am trying to better understand the date_format function offered by Spark SQL. According to the official Databricks documentation (I am using Databricks), this function expects a date or string in any valid datetime format.
I am finding it difficult to understand what exactly "valid" means here. I am trying to understand the behavior through two examples.
Input string in YYYY-MM-DD format (2021-07-09): I get the expected result.
Input string in DD-MM-YYYY format (20-07-2021): I get NULL.
Why is this happening? How does the function know that the string I am passing is in YYYY-MM-DD format? It could also have been YYYY-DD-MM.
I need to implement logic that can handle all kinds of valid date formats (MM-DD-YYYY, YYYY-MM-DD, DD-MM-YYYY) and format the dates accordingly.

The following are the valid input and output formats for ANSI date/time data types:
ANSIDATE: yyyy-mm-dd (example: 2007-02-28)
TIME WITH TIME ZONE: hh:mm:ss.ffff... [+|-]th:tm
The valid range of the time zone offset is -14:00 to +14:00. The date type complies with the ANSI SQL standard definition for the Gregorian calendar: "NOTE 85 - Datetime data types will allow dates in the Gregorian format to be stored in the date range 0001-01-01 CE through 9999-12-31 CE."
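See Databricks SQL datetime patterns for details on valid formats. The function checks that the resulting dates are valid dates in the Proleptic Gregorian calendar; otherwise it returns NULL.
The string "20-07-2021" does not conform to the year-first yyyy-mm-dd form, so the result is NULL. Without an explicit pattern, the string is implicitly cast to a date/timestamp, and that cast only accepts ISO-style, year-first strings such as 2021-07-09. It never guesses between day-first and month-first orders, which is why YYYY-DD-MM is not considered.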
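Alternatively, you can use the make_date function, which creates a date from year, month, and day fields. Better still, use the to_date function with an explicit pattern that describes the input. When parsing, the single-letter M and d accept both one- and two-digit values, so they also handle 9/15/2021:
select date_format(to_date('9/15/2021', 'M/d/yyyy'), 'yyyy/MM/dd')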
See Datetime Patterns for Formatting and Parsing in Spark.
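The NULL-on-mismatch behavior and the ordering problem from the question can be modelled outside Spark. Below is a minimal Python sketch (standard library only, not Spark) of trying several explicit patterns in order and keeping the first one that parses. This is roughly the idea behind chaining to_date calls inside coalesce, assuming each failed to_date yields NULL as in the question. The function name parse_first and the candidate list are hypothetical.

```python
from datetime import date, datetime
from typing import Optional, Sequence

# Hypothetical priority list of strptime patterns, tried in order.
# %Y-%m-%d ~ yyyy-MM-dd, %d-%m-%Y ~ dd-MM-yyyy, %m-%d-%Y ~ MM-dd-yyyy
CANDIDATES = ("%Y-%m-%d", "%d-%m-%Y", "%m-%d-%Y")


def parse_first(text: str, patterns: Sequence[str] = CANDIDATES) -> Optional[date]:
    """Return the date from the first pattern that parses `text`, else None.

    None plays the role of SQL NULL: a string that fits no pattern is
    not guessed at, it simply has no value.
    """
    for pattern in patterns:
        try:
            return datetime.strptime(text, pattern).date()
        except ValueError:
            continue  # this pattern does not fit; try the next one
    return None


print(parse_first("2021-07-09"))  # 2021-07-09
print(parse_first("20-07-2021"))  # 2021-07-20 (only the day-first pattern fits)
print(parse_first("07-08-2021"))  # 2021-08-07: day-first wins because it is tried first
print(parse_first("2021/07/09"))  # None: no pattern uses slashes
```

The third call shows the ambiguity raised in the question. 07-08-2021 fits both the day-first and month-first patterns, and the result depends only on the order of the list. No parser can recover the intended order from the string itself, so the source format has to be known for each column or source.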

Related

Oracle PL/SQL : to_date() format not considered

When I execute this in PL/SQL Developer:
SELECT to_date('29/03/17 14:05','DD/MM/RR HH24:MI') FROM dual;
Here's what I get:
3/29/2017 2:05:00 PM
How is this possible? I use HH24, but it seems like HH is being used instead. The day and month are also not in the order I entered.
What you are doing with the to_date function is parsing the string into a date value. If you then want to output that value as a string in a different format, use the to_char function.
Example:
SELECT to_char(
to_date('29/03/17 14:05','DD/MM/RR HH24:MI'),
'DD/MM/RR HH24:MI'
) FROM dual;
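The split between parsing and displaying can be sketched in Python, with strptime playing the role of to_date and strftime the role of to_char. This is an analogy only. The format codes differ from Oracle's: %H corresponds to HH24, while %I and %p correspond to HH and AM/PM. The US-style mask below is an assumed stand-in for whatever display format the client is configured with.

```python
from datetime import datetime

# Parse: the mask describes the *input* string (like to_date with 'DD/MM/RR HH24:MI').
# Note: %y pivots two-digit years differently from Oracle's RR; '17' -> 2017 either way.
value = datetime.strptime("29/03/17 14:05", "%d/%m/%y %H:%M")

# The parsed value has no format of its own; each rendering below is a choice.
us_style = value.strftime("%m/%d/%Y %I:%M:%S %p")  # assumed client default display
original = value.strftime("%d/%m/%y %H:%M")        # like to_char(..., 'DD/MM/RR HH24:MI')

print(us_style)  # 03/29/2017 02:05:00 PM
print(original)  # 29/03/17 14:05
```

Apart from zero-padding, the first rendering is the output from the question. The stored value is the same in both cases. Only the display mask differs.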
OK, conceptual exercise coming up.
Which of these dates represents 1 January 2017?
01/01/2017
2017-01-01
01-JAN-2017
That's right, all of them. The date datatype is not a format, it stores the value of the date, not how it appears.
If using Oracle, adjust your NLS_DATE_FORMAT to match your expectation, but again, this is just how the system will display the date, not how it stores it.
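The exercise can be checked mechanically: parse each string with the mask that describes it, then compare the resulting values. Here is a Python sketch, with strptime standing in for to_date (an analogy, not Oracle).

```python
from datetime import date, datetime

# Three different textual representations, each paired with the mask that describes it.
representations = [
    ("01/01/2017", "%d/%m/%Y"),
    ("2017-01-01", "%Y-%m-%d"),
    ("01-JAN-2017", "%d-%b-%Y"),  # %b matches month abbreviations case-insensitively
]

# Collect the parsed values in a set: identical values collapse into one entry.
values = {datetime.strptime(text, mask).date() for text, mask in representations}
print(values)  # {datetime.date(2017, 1, 1)}: one value, three spellings
```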
(N.B. This answer is more to give more clarity to the other answers but it's too long for a comment.)
Oracle stores DATEs (and TIMESTAMPs, etc.) in its own specific format. We humans represent dates in a variety of formats, and we deal in strings. Even we humans can get confused about what a date string represents without context: e.g. is 03/09/2017 the 3rd of September 2017 or the 9th of March 2017?
So, when you pass a date into Oracle, you need to convert it into Oracle's date format by passing a string in and telling Oracle what the date format of that string is. This can be done using to_date() or via the DATE literal (which is always in yyyy-mm-dd format).
Conversely, when you want to read something that's stored in Oracle's DATE datatype, you need to tell Oracle how you want it to be displayed, which you can do by using to_char() with the appropriate format mask.
If you fail to explicitly convert the string-to-a-date or date-to-a-string, then Oracle uses the format specified in the NLS_DATE_FORMAT to decide how to do the conversion.
In your case, you didn't specify how you wanted your date to be displayed. Oracle therefore has to use to_char() with the format in your NLS_DATE_FORMAT to display the date as a string, and that format is clearly different from the format of the string you passed in.
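The 03/09/2017 example shows why the format has to be supplied along with the string. In this Python sketch, strptime again stands in for to_date (an analogy, not Oracle). The same string yields two different dates depending on the mask.

```python
from datetime import date, datetime

text = "03/09/2017"

day_first = datetime.strptime(text, "%d/%m/%Y").date()    # like to_date(text, 'DD/MM/YYYY')
month_first = datetime.strptime(text, "%m/%d/%Y").date()  # like to_date(text, 'MM/DD/YYYY')

print(day_first)    # 2017-09-03 (3rd of September)
print(month_first)  # 2017-03-09 (9th of March)
```

A DATE literal such as DATE '2017-09-03' avoids this problem because its year-month-day order is fixed.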

SQL Date Format Conversion

I have a question regarding SQL dates.
The table I am working with has a date field in the following format: "22-SEP-08". The field is a date column.
I am trying to figure out how to output records from 1/1/2000 to the present day.
The code below is not filtering the date field:
Select distinct entity.lt_date
from feed.entitytable entity
where entity.lt_date >= '2000-01-01'
Any help regarding this issue is much appreciated. Thanks!
Edit: I am using Oracle SQL Developer to write my code.
DATEs do not have "a format". Any format you see is applied by the application displaying the date value.
You can either change the configuration of SQL Developer to display dates in a different format, or you can use to_char() to format the date the way you want.
The reason your statement does not work, is most probably because of the implicit data type conversion that you are relying on.
'2000-01-01' is a string value, not a date, and the string is converted using the NLS settings of your session. The fact that you see dates displayed as DD-MON-YY means that this is the format used by the evil implicit data type conversion. You should always supply date values as real date literals.
There are two ways of specifying a real date literal. The first is ANSI SQL and simply uses the keyword DATE in front of an ISO-formatted string:
where entity.lt_date >= DATE '2000-01-01'
Note the DATE keyword in front of the string, which makes it a real date literal rather than a string expression.
The other option is to use to_date() to convert a character value into a date:
where entity.lt_date >= to_date('2000-01-01', 'yyyy-mm-dd');
More details about specifying date literals can be found in the manual:
Date literals
to_date function
My guess is that the data type isn't a DATE. Just in case it's a character type, try converting it with the Oracle TO_DATE() function. The Oracle documentation below should help you with the parameters.
http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions183.htm
An implicit datatype conversion bites once again.
You're right. The predicate is not doing the comparison you are expecting.
Oracle is performing an implicit datatype conversion so that it can compare the DATE column with the string literal. When a character value is compared with a DATE value, Oracle converts the character value to DATE.
If the lt_date column is of DATE datatype, then for your where clause:
where entity.lt_date >= '2000-01-01'
Oracle is effectively evaluating this:
where entity.lt_date >= TO_DATE('2000-01-01')
And that's where the "format" problem comes in. The column itself does not have a "format". Because no format model is supplied as the second argument to TO_DATE, Oracle uses the value of the NLS_DATE_FORMAT parameter from your session. That is probably DD-MON-YY, and '2000-01-01' does not match it. DD-MON-YY is also the "format" you see when you run a SELECT statement in SQL*Plus. In that case the conversion runs in the other direction: the DATE value is run through TO_CHAR with the same setting to get a string that can be displayed.
To get the "filtering" you want, don't do a comparison to a string literal. Instead, do the comparison to an expression that has DATE datatype.
You can use the Oracle TO_DATE function. Rather than relying on the NLS_DATE_FORMAT setting, explicitly specify the format model as the second argument to the function. For example:
DO THIS
where entity.lt_date >= TO_DATE('2000-01-01','YYYY-MM-DD')
DON'T DO THIS
It's also possible to specify the format model as the second argument to the TO_CHAR function.
where TO_CHAR(entity.lt_date,'YYYY-MM-DD') >= '2000-01-01'
But you don't want to do that because that's going to force Oracle to evaluate that expression on the left side for every flipping row in the table, so it has a string value to do the comparison. (That's true unless someone created a function-based index for you.) If you do the comparison on the bare column, using the TO_DATE on the literal side, Oracle can make effective use of an appropriate index (with lt_date as the leading column) to satisfy the predicate.
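A toy model can show why comparing the bare column matters. Treat an index on lt_date as a sorted list of dates. A predicate like lt_date >= DATE '2000-01-01' then becomes a binary search, while the TO_CHAR version has to format and test every row. This Python sketch is only an analogy: real B-tree indexes and Oracle's optimizer are far more involved, and the sample rows are made up.

```python
from bisect import bisect_left
from datetime import date

# Made-up rows; a sorted copy stands in for an index on lt_date.
rows = [date(1998, 5, 22), date(2008, 9, 22), date(1999, 12, 31), date(2000, 1, 1)]
index = sorted(rows)

# Comparing the bare column with a DATE value: a binary search on the "index".
cutoff = date(2000, 1, 1)
start = bisect_left(index, cutoff)  # O(log n) probes to find the first qualifying entry
via_index = index[start:]           # every entry from the cutoff onwards

# Wrapping the column in a function: every row is formatted, then compared as a string.
via_scan = sorted(d for d in rows if d.strftime("%Y-%m-%d") >= "2000-01-01")

print(via_index)  # [datetime.date(2000, 1, 1), datetime.date(2008, 9, 22)]
```

Both approaches return the same rows, because YYYY-MM-DD strings sort chronologically. The difference lies in the work done: the scan touches every row, while the search touches only a handful of entries plus the matches.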

Does TO_DATE function in oracle require strict format matching

I am confused by the to_date(char [, 'format' [, 'nlsparam']]) function. Let's suppose I have to_date('31-DEC-1982','DD-MON-YYYY'); should the format specified in the function be the same as the date string? The above call works fine. When I use to_date('31-DEC-1982','DD-MM-YYYY'), it also works fine, even though the month field in the date string does not match the one in the format.
So my question is: must the date string and the specified format match exactly for the string to be converted to a DATE?
Thanks
Generally speaking, yes, the date string and the specified format should match. It works in your case because, in certain cases, Oracle allows alternative format matching.
Excerpt from the official Oracle documentation:
"If a match fails between a datetime format element and the corresponding characters in the date string, then Oracle attempts alternative format elements."
So, according to the table of alternative elements in that documentation, a month name ('MON' or 'MONTH') is accepted where the format says 'MM'.
Similarly, a four-digit year ('YYYY') is accepted where the format says 'YY', and so on.
Reference:
Oracle Format Matching
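The fallback behavior can be modelled in a few lines. This Python sketch is a hypothetical approximation, not Oracle's implementation. It covers only the month rule quoted above (a month name is accepted where the mask says MM), using strptime's %m, %b and %B as stand-ins for MM, MON and MONTH. The names FALLBACKS and to_date_lenient are made up for illustration.

```python
from datetime import date, datetime

# Hypothetical fallback table, modelled on the MM -> MON / MONTH rule.
FALLBACKS = {"%m": ("%b", "%B")}


def to_date_lenient(text: str, mask: str) -> date:
    """Try the mask as given, then with each fallback element substituted."""
    candidates = [mask]
    for element, alternatives in FALLBACKS.items():
        if element in mask:
            candidates += [mask.replace(element, alt) for alt in alternatives]
    for candidate in candidates:
        try:
            return datetime.strptime(text, candidate).date()
        except ValueError:
            pass  # this variant does not match; try the next one
    raise ValueError(f"{text!r} does not match {mask!r} or its fallbacks")


print(to_date_lenient("31-DEC-1982", "%d-%m-%Y"))       # 1982-12-31 (via %b)
print(to_date_lenient("31-12-1982", "%d-%m-%Y"))        # 1982-12-31 (exact match)
print(to_date_lenient("31-December-1982", "%d-%m-%Y"))  # 1982-12-31 (via %B)
```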
Whatever format you use, the value returned will be of DATE type.
You can test this by creating a dummy table and showing the table description, e.g.:
CREATE TABLE test AS (
select to_date('31-DEC-1982','DD-MON-YYYY') dd from dual);
Now run: desc test;
The result shows column DD with type DATE.
The result is the same with the other format mask.
However, if you are using SQL Developer, the date will be shown in the NLS format configured in its settings:
Tools > Preferences > Database > NLS

How do you parse a custom formatted date time string into a datetime?

This is for Microsoft SQL Server
I have an audit table with a timestamp stored as a string. The timestamps are in multiple locale-specific representations (e.g. some are mm/dd, others dd/mm).
I know some rows that I'm interested in have a timestamp string in the format of dd/MM/yy HH:mm:ss
I want to write a query that will return rows where the timestamp string is NOT in that format so I imagine something like this (with an imaginary PARSEDATE function)
WHERE PARSEDATE(timestamp) IS NOT NULL
Everything I've read about T-SQL datetime functions seems to involve well-defined format codes, e.g. 112, but I don't see a general way to provide a custom date/time format string for parsing.
Set the format before running your query.
SET LANGUAGE us_english;
SET DATEFORMAT dmy;
In your query:
WHERE ISDATE(timestamp) = 1
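The filtering idea can be prototyped outside SQL Server. This Python sketch (not T-SQL; the sample audit strings are made up) keeps the rows whose timestamp string does not parse as dd/MM/yy HH:mm:ss. It also shows a limitation that applies equally to ISDATE with SET DATEFORMAT dmy. An mm/dd string whose day is 12 or less also parses as dd/mm, so parsing alone cannot detect it.

```python
from datetime import datetime

TARGET = "%d/%m/%y %H:%M:%S"  # dd/MM/yy HH:mm:ss


def matches_target(ts: str) -> bool:
    """True if ts parses with the target day-first format."""
    try:
        datetime.strptime(ts, TARGET)
        return True
    except ValueError:
        return False


audit = [
    "25/12/21 08:30:00",  # day-first, unambiguous
    "12/25/21 08:30:00",  # month-first: "month" 25 is invalid, so it is caught
    "03/04/21 10:00:00",  # could be either order; parses as 3 April
]

not_target = [ts for ts in audit if not matches_target(ts)]
print(not_target)  # ['12/25/21 08:30:00']
```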

creating table in Oracle with Date

I want to create a table in Oracle 10g and I want to specify the date format for my date column. If I use the below syntax:
create table datetest(
........
startdate date);
Then the date column will accept the date format DD-MON-YY, which I don't want.
I want the syntax for my date column to be MM-DD-YYYY
Please let me know how to proceed with this.
Regards,
A DATE has no inherent format. It is not simply a string that happens to represent a date. Oracle has its own internal format for storing date values.
Formats come into play when actual date values need to be converted into strings or vice versa, which of course happens a lot since interactively we write dates out as strings.
The default date format for your database is determined by the NLS_DATE_FORMAT setting, which is probably DD-MON-RR (the default for the AMERICA territory). You can change this at the database level, or for a single session for convenience. In general, though, it is safer programming practice to be explicit, so that you don't get errors or, worse, wrong results if your code runs in a different environment.
The simplest way to specify a date value unambiguously is a date literal, which is the word 'date' followed by a string representing the date in YYYY-MM-DD format, e.g. date '2012-11-13'. The Oracle parser directly translates this into the corresponding internal date value.
If you want to use a different format, then I recommend explicitly using TO_CHAR/TO_DATE with your desired format model in your code. Examples:
INSERT INTO my_table (my_date) VALUES ( TO_DATE( '11-13-2012', 'MM-DD-YYYY' ) );
SELECT TO_CHAR( my_date, 'MM-DD-YYYY' ) FROM my_table;
Dates do not have a format like you're suggesting. They are stored internally as a 7-byte value. To format the date when selecting, use TO_CHAR(yourdatefield, 'format'),
where the available formats are all listed here: http://docs.oracle.com/cd/B19306_01/server.102/b14200/sql_elements004.htm#i34924
e.g. to_char(startdate, 'mm-dd-yyyy')
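The phrase "stored internally as a 7-byte value" can be made concrete. The commonly documented layout of an Oracle DATE column value for AD dates (as seen through DUMP on a table column) is: century+100, year-of-century+100, month, day, hour+1, minute+1, second+1. The Python sketch below encodes a datetime in that layout. It illustrates the published layout and is not an Oracle API. The function name oracle_date_bytes is hypothetical, and BC dates are ignored. Note that the bytes leave no room for a display format.

```python
from datetime import datetime


def oracle_date_bytes(value: datetime) -> bytes:
    """Encode an AD datetime in the commonly documented 7-byte DATE layout.

    Byte order: century+100, year-of-century+100, month, day,
    hour+1, minute+1, second+1. A DATE stores no fractional seconds
    or time zone, so those parts of `value` are ignored here.
    """
    century, year_of_century = divmod(value.year, 100)
    return bytes([
        century + 100,
        year_of_century + 100,
        value.month,
        value.day,
        value.hour + 1,
        value.minute + 1,
        value.second + 1,
    ])


print(list(oracle_date_bytes(datetime(2012, 11, 13))))  # [120, 112, 11, 13, 1, 1, 1]
```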