Load Date Column from raw data with '/' separator in sparkSQL - sql

I have columns with data type DATE in sparkSQL
e.g.
CREATE TABLE ABC(startDate DATE, EndDate DATE....
and I load data as LOAD DATA INPATH './input/user.txt' INTO TABLE ABC
In user.txt data is like
2016/06/12 2016/06/15
2016/06/12 2016/06/15
but it loads data as
null null
null null
if it's
2016-06-12 2016-06-15
2016-06-12 2016-06-15
then it takes the data correctly.
How can I load the data when the date separator is '/'?
I don't want to replace the separator in input file.
Please help me. Thanks.

I faced this issue before in Hive and found a workaround: first load the columns as string instead of data type DATE.
ex:
CREATE TABLE ABC(startDate string, EndDate string....)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ']'
STORED AS TEXTFILE
LOCATION './input/user.txt';
Then I used string functions to extract the year/month/day from the above fields. For example:
select substr(startDate,1,4) as year, substr(startDate,6,2) as month .... from ABC
Another way is to replace the '/' with '-', cast the result to DATE type, and then use date functions.
example
select regexp_replace(startDate,'/','-') from ABC
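All of the above is how to achieve it in Hive. In Spark, the approach is likewise to first load the values as strings into a DataFrame.
import org.apache.spark.sql.functions.regexp_replace
val s1 = Seq(("2016/06/12", "2016/06/15"), ("2016/06/12", "2016/06/15")).toDF("startDate", "EndDate")
// Replace the '/' separator with '-' and keep the original column names
val result = s1.select(regexp_replace($"startDate", "/", "-").as("startDate"), regexp_replace($"EndDate", "/", "-").as("EndDate"))
result.show()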
+----------+----------+
| startDate| EndDate|
+----------+----------+
|2016-06-12|2016-06-15|
|2016-06-12|2016-06-15|
+----------+----------+
Hope this helps.
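The same "load as string, then normalise" idea can be sketched outside Spark in plain Python. This is only an illustration: the `raw_rows` sample and the `parse_slash_date` helper are made up here, and it assumes the two fields are separated by whitespace as in the question.

```python
from datetime import date

# Sample lines in the same shape as user.txt from the question (assumed whitespace-separated)
raw_rows = ["2016/06/12 2016/06/15", "2016/06/12 2016/06/15"]

def parse_slash_date(text: str) -> date:
    """Swap '/' for '-' and parse the result as an ISO date."""
    return date.fromisoformat(text.replace("/", "-"))

# Keep the raw fields as strings until the separator has been normalised
parsed = [tuple(parse_slash_date(field) for field in row.split()) for row in raw_rows]

for start, end in parsed:
    print(start, end)
```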

I know it's rather late to answer this question, but in Spark you can also include dateFormat in the options while creating a table.
This tells Spark to parse dates written as 2016/06/12, which are then displayed as 2016-06-12.
CREATE TABLE IF NOT EXISTS ABC (
startDate DATE,
EndDate DATE,
...
)
using csv
options(
path "./input/user.txt",
dateFormat "yyyy/MM/dd"
)
select startDate, EndDate from ABC
result:
| startDate | EndDate |
|:----------|:---------|
|2016-06-12 |2016-06-15|
|2016-06-12 |2016-06-15|

I found one more way to do it using functions in Spark SQL on the Spark 2.0 preview version:
TO_DATE(from_unixtime(unix_timestamp(regexp_replace(startDate, '/', '-'), 'yyyy-MM-dd'))) AS startDate

Related

Translate Teradata DATE function (division/extract and sum) into BigQuery

I have this code in Teradata that reads "x_date/100+190000". So from my understanding it removes the 'day' portion from DATE and then adds an INT number of days. Now I have to translate the same into BigQuery but can't see how.
edit: so what I have is a SELECT statement that includes the "x_date" field, which has a DATE format. It contains a list of dates in the form of 'yyyy-mm-dd'. The query reads something like:
SELECT x_date/100+190000
FROM x_table
and the field has this sort of rows:
| '2022-06-06' |
| '2020-03-06' |
| '2019-09-01' |
| '2028-05-06' |
What I don't understand exactly is what these functions are doing in Teradata.
My expected output should be in DATE format and should reproduce (in BigQuery) whatever the Teradata expression does to the field.
Use below
SELECT FORMAT_DATE('%Y%m', x_date)
FROM x_table
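For anyone wondering why this matches: Teradata stores a DATE internally as the integer (year - 1900) * 10000 + month * 100 + day, so dividing by 100 drops the day and adding 190000 restores the century, leaving a YYYYMM number rather than a DATE. A rough Python sketch of that arithmetic (the helper names `teradata_int` and `teradata_expr` are made up for illustration):

```python
from datetime import date

def teradata_int(d: date) -> int:
    """Integer encoding Teradata uses internally for a DATE value."""
    return (d.year - 1900) * 10000 + d.month * 100 + d.day

def teradata_expr(d: date) -> int:
    """Mimics x_date/100+190000 using integer division."""
    return teradata_int(d) // 100 + 190000

for d in (date(2022, 6, 6), date(2020, 3, 6), date(2019, 9, 1)):
    # The last column is the Python analogue of FORMAT_DATE('%Y%m', x_date)
    print(d, teradata_expr(d), d.strftime("%Y%m"))
```

Note that the Teradata expression yields a number while FORMAT_DATE yields a string, so a cast may still be needed depending on how the value is used.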

Convert integer to date as MM/DD/YYYY like from 20200926 to 09/26/2020?

I need to convert a number to a date, from 20200926 to 09/26/2020, where date_key is the date column in the table. The code below does not return 09/26/2020; it returns the value in the same format.
SELECT CONVERT(VARCHAR(10),date_key,101)
FROM table
Convert to a string and then to a date and back to a string with the right format:
select convert(varchar(255), cast(cast(20200926 as varchar(255)) as date), 101)
Your number is already in YYYYMMDD format, so converting it to the date data type is easy.
First convert it to CHAR(8) to get the string representation, then convert that to a DATE value. Then choose the style for display.
More info on CONVERT
SELECT convert(NVARCHAR(10),convert(date,convert(NCHAR(8),tdate)),101) AS dateval
FROM
(
VALUES (20200926)
) as t(tdate)
+------------+
| dateval |
+------------+
| 09/26/2020 |
+------------+
Here is an alternative solution; it may be less readable for some and more readable for others.
print datefromparts(20200926/10000,20200926%10000/100,20200926%100)
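The integer arithmetic in that expression is easy to check in isolation. Here is a small Python sketch of the same split (the `int_to_date` helper is hypothetical, not part of any answer above): integer division by 10000 gives the year, the remainder divided by 100 gives the month, and the remainder modulo 100 gives the day.

```python
from datetime import date

def int_to_date(n: int) -> date:
    """Split a YYYYMMDD integer into year, month and day."""
    year = n // 10000          # 20200926 -> 2020
    month = n % 10000 // 100   # 20200926 -> 926 -> 9
    day = n % 100              # 20200926 -> 26
    return date(year, month, day)

print(int_to_date(20200926).strftime("%m/%d/%Y"))  # prints 09/26/2020
```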

How to convert a string column which contains date and time into just date using sql cast operator?

I have a column called 'created_date' whose data type is string. It contains records that are of the pattern date and time. I want to create another column called 'modified_date' that will take just the date from the 'created_date' column so as to be able to do some mathematical computations on dates later. I want to do this using the SQL CAST operator.
Below is the output I expect:
ID created_date modified_date
1 2017-11-01 16:30:40 2017-11-01
2 2017-11-23 15:30:40 2017-11-23
3 2017-11-16 14:30:40 2017-11-16
Any suggestions on how to do this?
I am going to assume that you are using BigQuery.
You can use:
select date(created_date)
You could also be more specific:
select date(substr(created_date, 1, 10))
Or convert to a datetime and then to a date:
select date(cast(created_date as datetime))
You could use a simple date_format() and str_to_date()
select date_format(str_to_date(created_date,'%Y-%m-%d %T'), '%Y-%m-%d') modified_date
Below is for BigQuery Standard SQL
#standardSQL
SELECT *, DATE(TIMESTAMP(created_date)) modified_date
FROM `project.dataset.table`
I want to do this using the SQL CAST operator.
Note: I do not recommend using a generic CAST for the DATE, DATETIME, and TIMESTAMP data types. Instead, use the respective functions as in this answer. If the string does not directly represent such a data type, use the respective PARSE_ function, where you can set the format in which the date/datetime/timestamp is represented in the string.

Changing the format of data in a column

I am trying to change the date column from YYYYMMDD to MMDDYYYY while keeping it as a varchar value. Currently my column is set as varchar(10). Is there a way to change the strings in bulk? I have thousands of rows that need the format converted.
For example:
| ID | Date |
------------------------
| 1 | 20140911 |
| 2 | 20140101 |
| 3 | 20140829 |
What I want my table to look like:
| ID | Date |
------------------------
| 1 | 09112014 |
| 2 | 01012014 |
| 3 | 08292014 |
Bonus question: Would it cause an issue while trying to convert this column if there is data such as 91212 for 09/12/2012 or something like 1381 which is supposed to be 08/01/2013?
Instead of storing the formatted date in a separate column, just correct the format while fetching, using the STR_TO_DATE function (as you said, your dates are stored as string/varchar), as below. Again, as others have suggested, don't store date data as a string; use the date or datetime data type instead.
SELECT STR_TO_DATE(`Date`, '%Y%m%d')
FROM yourtable
EDIT:
In that case, I would suggest not updating your original table. Rather, store the formatted data in a view or in a separate table altogether, as below:
create view formatted_date_view
as
SELECT ID,STR_TO_DATE(`Date`, '%Y%m%d') as 'Formatted_Date'
FROM yourtable
(OR)
create table formatted_date_table
as
SELECT ID,STR_TO_DATE(`Date`, '%Y%m%d') as 'Formatted_Date'
FROM yourtable
EDIT1:
For SQL Server, use the CONVERT function, e.g. CONVERT(datetime, [Date], 112), where 112 is the style for the yyyymmdd format in which the values are stored:
SELECT ID,convert(datetime,[Date],112) as 'Formatted_Date'
FROM yourtable
(OR)
Use the CAST function as below (the only drawback is that you can't specify a style for the date format):
SELECT ID, cast([Date] as datetime) as 'Formatted_Date'
FROM yourtable
MS SQL Server Solution:
Which SQL are you trying with?
MSSQL Server 2008 R2
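You can use the CONVERT function on your date field. You have to specify the date's format style.
For the mm/dd/yyyy format the style value is 101.
Because the column is varchar, first convert it to a date (style 112 reads yyyymmdd) and then back to a string with style 101; your update statement can be:
UPDATE table_name
SET date = CONVERT( VARCHAR(10), CONVERT( DATE, date, 112 ), 101 )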
Refer To:
How to format datetime & date in Sql Server
SQL Server 2008 Date Format
Demo # MS SQL Server 2008 Fiddle
MySQL Solution:
it needs to stay in varchar or int and the dates are yyyymmdd and I need to change thousands of rows of data to be in mmddyyyy format.
Convert to a date using str_to_date and then back to a string using date_format.
UPDATE table_name
SET date = DATE_FORMAT( STR_TO_DATE( date, '%Y%m%d' ), '%m%d%Y' )
The value 20140911 when converted from yyyymmdd to mmddyyyy format, will retain the leading 0 as 09112014.
Bonus question: Would it cause an issue while trying to convert this column if there is data such as 91212 for 09/12/2012 or something like 1381 which is supposed to be 08/01/2013
You can use str_to_date( '91212', '%c%e%y' ) to try to convert it to a valid date. However, although MySQL documents support for single-digit month and day numbers, it does not parse such unpadded values correctly and returns NULL for them.
mysql> select str_to_date( '91212', '%c%e%y' ) s1, str_to_date( '091212', '%c%e%y' ) s2;
+------+------------+
| s1 | s2 |
+------+------------+
| NULL | 2012-09-12 |
+------+------------+
1 row in set, 1 warning (0.00 sec)
mysql> show warnings;
+---------+------+------------------------------------------------------------+
| Level | Code | Message |
+---------+------+------------------------------------------------------------+
| Warning | 1411 | Incorrect datetime value: '91212' for function str_to_date |
+---------+------+------------------------------------------------------------+
1 row in set (0.00 sec)
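If the cleanup is done outside the database, the same two-step conversion can be sketched in Python. This is a minimal sketch under my own assumptions: the `yyyymmdd_to_mmddyyyy` helper is made up, and it simply refuses anything that is not exactly eight digits, so short values like 91212 or 1381 are flagged for manual review rather than guessed at.

```python
from datetime import datetime
from typing import Optional

def yyyymmdd_to_mmddyyyy(value: str) -> Optional[str]:
    """Reformat 'YYYYMMDD' as 'MMDDYYYY'; return None for anything else."""
    if len(value) != 8 or not value.isdigit():
        return None  # ambiguous or malformed, e.g. '91212' or '1381'
    try:
        parsed = datetime.strptime(value, "%Y%m%d")
    except ValueError:
        return None  # eight digits but not a real calendar date
    return parsed.strftime("%m%d%Y")

for raw in ("20140911", "20140101", "91212", "1381"):
    print(raw, "->", yyyymmdd_to_mmddyyyy(raw))
```

Returning None mirrors the NULL that MySQL gives for the unpadded value, which keeps the ambiguous rows easy to find afterwards.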

Select between varchar as a date returns null

I have a table attendance_sheet and it has column string_date which is a varchar.
This is inside my table data.
id | string_date | pname
1 | '06/03/2013' | 'sam'
2 | '08/23/2013' | 'sd'
3 | '11/26/2013' | 'rt'
I try to query it using this range.
SELECT * FROM attendance_sheet
where string_date between '06/01/2013' and '12/31/2013'
then it returns the data.. but when I try to query it using this
SELECT * FROM attendance_sheet
where string_date between '06/01/2013' and '03/31/2014'
it does not return any results.
Can this be fixed without changing the column type, i.e. without changing string_date from varchar to date?
Does anyone have an idea about my case?
Any help will be appreciated. Thanks in advance.
Use str_to_date:
SELECT * FROM attendance_sheet
where str_to_date(string_date,'%m/%d/%Y') between '2013-06-01' and '2014-03-31'
The reason this:
where string_date between '06/01/2013' and '03/31/2014'
does not return any results is that '06' is greater than '03'. It's essentially the same as using this filter.
where SomeField between 'b' and 'a'
The cause of this problem is a poorly designed database. Storing dates as strings is a bad idea. Juergen has shown you a function that might help, but since your field is varchar, values like 'fred', 'barney', and 'dino' are perfectly valid, and the Str_to_date() function won't work very well with those.
If you are able to change your database, do so.
To be able to do string comparisons correctly, you have to change your database so that the most significant field of the date comes first, i.e., use the format yyyy-mm-dd.
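The difference between comparing the values as text and comparing them as dates can be reproduced in a few lines of Python. This is only a sketch: the `rows` list copies the sample data from the question, the `parse` helper is made up, and Python's character-by-character string comparison stands in for the varchar comparison.

```python
from datetime import datetime

rows = ["06/03/2013", "08/23/2013", "11/26/2013"]

# Text comparison, as BETWEEN does on a varchar column:
# '06...' sorts after '03...', so the range is empty.
as_text = [r for r in rows if "06/01/2013" <= r <= "03/31/2014"]

def parse(text):
    """Parse an mm/dd/yyyy string into a date."""
    return datetime.strptime(text, "%m/%d/%Y").date()

# Date comparison: parse both the column values and the bounds first.
low, high = parse("06/01/2013"), parse("03/31/2014")
as_dates = [r for r in rows if low <= parse(r) <= high]

print(as_text)   # prints []
print(as_dates)  # prints all three rows
```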