Spark SQL int to date format - apache-spark-sql

My table has a date column of type integer.
When I run:
sc.sql('select date from table').show(1)
I get:
19860102
But if I run:
sc.sql('select cast(date as timestamp) from table').show(1)
I get:
1970-08-18 20:41:42
How can I convert the DataFrame's date field from integer to string so that I can use the dayofweek function?
Is there a way to do it in SQL instead of with the functions module (F)?

You can accomplish this with some string formatting and a bit of math.
select format_string('%d-%02d-%02d', floor(date / 10000), floor(date / 100) % 100, date % 100) from table
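The cast to timestamp goes wrong because the integer is read as seconds since the Unix epoch, which is why 19860102 lands in August 1970. As a minimal sketch of the same digit arithmetic outside Spark, the Python below splits the integer into year, month, and day. The helper names are hypothetical, and spark_style_dayofweek assumes Spark's dayofweek convention of 1 = Sunday through 7 = Saturday.

```python
from datetime import date, datetime, timezone


def yyyymmdd_to_date(n: int) -> date:
    """Split an integer such as 19860102 into year, month, and day."""
    year, rest = divmod(n, 10000)   # 1986, 102
    month, day = divmod(rest, 100)  # 1, 2
    return date(year, month, day)


def spark_style_dayofweek(d: date) -> int:
    """Map Python's isoweekday (Mon=1..Sun=7) to 1=Sunday..7=Saturday."""
    return d.isoweekday() % 7 + 1


def as_epoch_seconds(n: int) -> datetime:
    """Interpret the integer as seconds since 1970-01-01 UTC."""
    return datetime.fromtimestamp(n, tz=timezone.utc)


d = yyyymmdd_to_date(19860102)
print(d.isoformat(), spark_style_dayofweek(d))
print(as_epoch_seconds(19860102).strftime("%Y-%m-%d %H:%M:%S"))
```

The second print reproduces the unexpected value from the question, which suggests the cast was treating the number as an epoch offset rather than as digits of a date.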

Related

SQL solution to extract total number of hours from a string column

I have a varchar(255) column in table1:
time
--------------------------
Monday|10:00-18:00
Tuesday|10:00-16:00
Friday|10:00-20:00
How do I extract the number of hours from the above column using SQLite? I tried using varchar() and datetime(), neither of which worked. Should I be using a regex to get the times?
Thank you in advance.
You can extract the time values using:
select substr(time, -11, 5), substr(time, -5)
You can then use julianday() to get the difference in seconds (or any other unit):
select round((julianday(substr(time, -5)) - julianday(substr(time, -11, 5))) * 60 * 60 * 24) as seconds_diff
from t;
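Since the question asks for hours rather than seconds, the same julianday() difference can be multiplied by 24 instead. The sketch below runs that query against an in-memory SQLite database from Python; the table name t follows the query above, and the helper name hours_per_row is an assumption made for illustration.

```python
import sqlite3


def hours_per_row(values):
    """Return (time, hours) pairs for strings shaped like 'Monday|10:00-18:00'."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table t (time varchar(255))")
        conn.executemany("insert into t (time) values (?)", [(v,) for v in values])
        query = """
            select time,
                   round((julianday(substr(time, -5))
                          - julianday(substr(time, -11, 5))) * 24, 2) as hours
            from t
            order by rowid
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


for row in hours_per_row(["Monday|10:00-18:00", "Tuesday|10:00-16:00", "Friday|10:00-20:00"]):
    print(row)
```

This relies on every value ending in exactly hh:mm-hh:mm with two-digit hours; a value such as 9:00-17:00 would need different offsets.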

I have column x in hh:mm format of datatype varchar in SQL Server and I want to perform sum on that 'x' column

I have column x in hh:mm format of datatype varchar in SQL Server and I want to perform sum on that x column.
I created a user-defined function to convert total minutes into hh:mm format.
Then I tried to sum the column to calculate the total duration:
sum(cast(new_totalmin AS Int))
but it did not work. I got an error like:
Conversion failed when converting the varchar value '01:00' to data type int
I also want the total in hh:mm, exactly as in this example:
4:20
+1:10
5:30
that is, 5 hours 30 minutes.
Alternatively, I can keep the column as total minutes (int), sum it, and convert only the sum to hh:mm. A decimal-style hh.mm result is also fine for me; either ':' or '.' works as the separator, as shown here:
60 min --> 1:00 --> 1.00
90 min --> 1:30 --> 1.30
---------------------------------
sum --> 150 min --> 2:30 --> 2.30
DECLARE @SampleData AS TABLE (HourMinutes VARCHAR(10));
INSERT INTO @SampleData VALUES ('4:32');
INSERT INTO @SampleData VALUES ('5:28');
INSERT INTO @SampleData VALUES ('6:00');
INSERT INTO @SampleData VALUES ('7:10');
SELECT * FROM @SampleData
SELECT SUM(datediff(minute, 0, HourMinutes)) TotalMinute
FROM @SampleData
The second query returns a TotalMinute of 1390 (272 + 328 + 360 + 430).
The hh:mm column holds varchar data, so SUM cannot be applied to it directly.
Since you say you already have a function, I suggest summing the minutes first and then converting the total to hh:mm:
SELECT ... , YourUserDefinedFunction(sum(minuteData)) as minutesInHHMM_Format
FROM ...
WHERE ...
GROUP BY ...
I would recommend that you store numeric values -- such as the number of minutes -- as a number rather than a string.
The challenge is converting the value back to an HH:MM format. SQL Server does not support time values of 24 hours or greater, so you need to use string manipulations.
Assuming that each individual value is less than 24 hours, you can use:
select sum(datediff(minute, 0, hhmm)) as num_minutes,
       concat(sum(datediff(minute, 0, hhmm)) / 60, ':',
              format(sum(datediff(minute, 0, hhmm)) % 60, '00')
             )
from t;
The result here is a string, so the total can exceed 24 hours.
A more general solution eschews date/times altogether:
select sum(v.minutes) as num_minutes,
       concat(sum(v.minutes) / 60, ':',
              format(sum(v.minutes) % 60, '00')
             )
from t cross apply
     (values (left(t.hhmm, charindex(':', t.hhmm) - 1) * 60 + right(t.hhmm, 2))
     ) v(minutes);
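The string-manipulation approach is easy to check outside the database. As a rough sketch (the function name sum_hhmm is hypothetical), the Python below applies the same steps: split each value at the colon, total the minutes, then format the total as hours and zero-padded minutes.

```python
def sum_hhmm(values):
    """Sum 'h:mm' strings; return (total_minutes, 'h:mm' text)."""
    total = 0
    for value in values:
        hours, minutes = value.split(":")
        total += int(hours) * 60 + int(minutes)
    # Integer division gives whole hours; the remainder is the minutes part.
    return total, f"{total // 60}:{total % 60:02d}"


print(sum_hhmm(["4:20", "1:10"]))
print(sum_hhmm(["4:32", "5:28", "6:00", "7:10"]))
```

Because the hours part is plain integer arithmetic, the formatted total is not capped at 24 hours.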

Invalid datetime string when CAST As Date

I have a Time column in BigQuery whose values look like this: 2020-09-01-07:53:19. It is a STRING. I need to extract just the date. Desired output: 2020-09-01.
My query:
SELECT
CAST(a.Time AS date) as Date
from `table_a`
The error message is: Invalid datetime string "2020-09-02-02:17:49"
You could also use parse_datetime() and then convert the result to a date.
with temp as (select '2020-09-02-02:17:49' as Time)
select
date(parse_datetime('%Y-%m-%d-%T',Time)) as new_date
from temp
How about just taking the left-most 10 characters?
select substr(a.time, 1, 10)
If you want this as a date, then:
select parse_date('%Y-%m-%d', substr(a.time, 1, 10))
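Both ideas, parsing the full string with an explicit format and keeping only the first 10 characters, can be sketched in Python for comparison. The function names below are hypothetical; the format string spells out %H:%M:%S, which is what %T abbreviates in the parse_datetime() call above.

```python
from datetime import date, datetime


def date_from_full_parse(value: str) -> date:
    """Parse 'YYYY-MM-DD-HH:MM:SS' in full, then drop the time part."""
    return datetime.strptime(value, "%Y-%m-%d-%H:%M:%S").date()


def date_from_prefix(value: str) -> date:
    """Keep only the leading 'YYYY-MM-DD' and parse that."""
    return date.fromisoformat(value[:10])


sample = "2020-09-02-02:17:49"
print(date_from_full_parse(sample), date_from_prefix(sample))
```

The full parse rejects a malformed time portion, while the prefix version ignores everything after the tenth character; which behavior is preferable depends on how clean the column is.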
In MySQL, STR_TO_DATE does this:
select STR_TO_DATE('2020-09-08 00:58:09','%Y-%m-%d') from DUAL;
or, applied to your column:
select STR_TO_DATE(a.Time,'%Y-%m-%d') from `table_a` a;
Note: STR_TO_DATE is a MySQL function, so this applies only where MySQL syntax is supported; BigQuery does not provide it.

How to deal with a date stored as a string in an int field in Teradata?

I've got a table in Teradata that stores a date in an 8 character INT field in the following form "YYYYMMDD", so for today it would store "20180308". If I try to CAST it as a date like this:
CAST(date_field AS DATE FORMAT 'YYYY-MM-DD')
It transforms the date to some future date in the year 3450 or something.
I think it's a mistake that this data isn't stored as a proper date. Is there any way to work around this? Unfortunately, I don't have access to change it.
Thanks
It's not an 8 character integer, it's an 8 digit integer.
Teradata stores dates using
(year - 1900) * 10000
+ (month * 100)
+ day
This yields 1180308 for today (2018-03-08), whereas 20180308 is interpreted as 3918-03-08.
To cast it to a date you need to use
cast(intdate-19000000 as date)
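To see why subtracting 19000000 works, here is a small Python sketch of the encoding formula given above. The function names are hypothetical and the code only mirrors the arithmetic; it does not call Teradata.

```python
from datetime import date


def to_teradata_int(d: date) -> int:
    """Encode a date as (year - 1900) * 10000 + month * 100 + day."""
    return (d.year - 1900) * 10000 + d.month * 100 + d.day


def from_teradata_int(n: int) -> date:
    """Decode an integer produced by the formula above."""
    year_offset, rest = divmod(n, 10000)
    month, day = divmod(rest, 100)
    return date(year_offset + 1900, month, day)


print(to_teradata_int(date(2018, 3, 8)))
print(from_teradata_int(20180308))             # plain YYYYMMDD read as an internal value
print(from_teradata_int(20180308 - 19000000))  # after the correction
```

Subtracting 19000000 removes exactly the 1900 * 10000 offset that a plain YYYYMMDD integer carries relative to the internal form.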
If the value is a string (or is first cast to one), you can specify the format explicitly:
select cast('20180308' as date format 'yyyymmdd') ;

Comparing dates as strings and integers in Teradata

My task is to compare dates in two different tables in a Teradata database. In table group_1 dates are BIGINT, for instance 20,141,106 and in table group_2 dates are VARCHAR(30), for instance, 11/12/2015.
What would be the best way to do a conversion and compare them, namely,
select * from ....
where date in group_1 = date in group_2?
Many thanks in advance.
Can you safely convert those columns to dates (no invalid dates)?
BIGINT -> DATE:
cast(col - 19000000 as date)
VARCHAR -> DATE:
to_date(col, 'dd/mm/yyyy') (or 'mm/dd/yyyy'?)
Otherwise:
BIGINT -> VARCHAR:
TRIM((col MOD 100) * 1000000 + (col/100 MOD 100) * 10000 + (col / 10000) (FORMAT '99/99/9999')) -- dmy
or
TRIM((col/100 MOD 100) * 1000000 + (col MOD 100) * 10000 + (col / 10000) (FORMAT '99/99/9999')) -- mdy
And next time, try to store data using the correct datatype or at least the same wrong type :-)
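The two FORMAT expressions above work by rearranging the digits of the YYYYMMDD number so that they print in the target order. A minimal Python sketch of that rearrangement follows; the function names are hypothetical, and the slicing stands in for the '99/99/9999' format.

```python
def _split(n: int):
    """Return (year, month, day) from a YYYYMMDD integer."""
    return n // 10000, n // 100 % 100, n % 100


def _with_slashes(packed: int) -> str:
    """Render an 8-digit number as 99/99/9999, keeping leading zeros."""
    s = f"{packed:08d}"
    return f"{s[:2]}/{s[2:4]}/{s[4:]}"


def yyyymmdd_to_dmy(n: int) -> str:
    year, month, day = _split(n)
    return _with_slashes(day * 1000000 + month * 10000 + year)


def yyyymmdd_to_mdy(n: int) -> str:
    year, month, day = _split(n)
    return _with_slashes(month * 1000000 + day * 10000 + year)


print(yyyymmdd_to_dmy(20141106), yyyymmdd_to_mdy(20141106))
```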
You should convert both values to the DATE type for comparisons and other operations.
For the integer:
select to_date(cast(bigintcol as varchar(255)), 'YYYYMMDD')
For the string:
select to_date(varcharcol, 'MM/DD/YYYY') -- or perhaps DD/MM/YYYY
You can then compare the dates directly.
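As an illustration of why comparing real dates is safer than comparing mixed representations, the Python sketch below converts both forms and compares the results. The function names are hypothetical, and the default text format of month/day/year is an assumption, since the question does not say which order 11/12/2015 uses.

```python
from datetime import date, datetime


def bigint_to_date(n: int) -> date:
    """Convert a YYYYMMDD integer such as 20141106 to a date."""
    return date(n // 10000, n // 100 % 100, n % 100)


def text_to_date(s: str, fmt: str = "%m/%d/%Y") -> date:
    """Convert text such as '11/12/2015'; pass '%d/%m/%Y' for day-first data."""
    return datetime.strptime(s, fmt).date()


def same_day(n: int, s: str, fmt: str = "%m/%d/%Y") -> bool:
    """Compare the two representations as dates."""
    return bigint_to_date(n) == text_to_date(s, fmt)


print(same_day(20151112, "11/12/2015"))
print(same_day(20151211, "11/12/2015", "%d/%m/%Y"))
```

Invalid values raise an error in both conversions, which corresponds to the earlier question of whether the columns can be converted safely.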