Calculate time difference between two columns of string type in hive without changing the data type string - sql

I am trying to calculate the time difference between two columns of a row which are of string data type. If the time difference between them is less than 2 hours then select the first column of that row else if the time difference is greater than 2 hours then select the second column of that row. It can be done by converting the columns to datetime format, but I want the result to be in string only. How can I do that? The data looks like this:
col1 (string type)     col2 (string type)
2018-07-16 02:23:00    2018-07-16 02:36:00
2018-07-26 12:26:00    2018-07-26 14:29:00
2018-07-26 15:32:00    2018-07-27 15:38:00

I think you don't need to convert the columns to datetime format, since the data in your case is already in an ordered format (yyyy-MM-dd HH:mm:ss). You just need to strip everything except the digits so that each value becomes one number (yyyyMMddHHmmss), and then check whether the difference is smaller or bigger than 2 hours (here 20000, since the hour digits are followed by mmss). Judging by your example (assuming col2 > col1), this query would work, although it will pick col2 for a pair that is less than 2 hours apart but crosses midnight:
SELECT case when regexp_replace(col2,'[^0-9]', '')-regexp_replace(col1,'[^0-9]', '') < 20000 then col1 else col2 end as col3 from your_table;
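As a quick sanity check of the digit-subtraction idea, the sketch below evaluates it on two made-up pairs of literals (hypothetical values, not taken from the question's data): one pair within a single day and one pair that crosses midnight. Both pairs are really 2 minutes apart.

```sql
-- Hypothetical literals, chosen only to illustrate the arithmetic.
-- Same day, 2 minutes apart:
--   20180716030100 - 20180716025900 = 4200, which is below 20000.
SELECT regexp_replace('2018-07-16 03:01:00', '[^0-9]', '')
     - regexp_replace('2018-07-16 02:59:00', '[^0-9]', '') AS same_day_diff;

-- Across midnight, also 2 minutes apart:
--   20180717000100 - 20180716235900 = 764200, which is above 20000.
SELECT regexp_replace('2018-07-17 00:01:00', '[^0-9]', '')
     - regexp_replace('2018-07-16 23:59:00', '[^0-9]', '') AS cross_midnight_diff;
```

So the comparison against 20000 behaves as intended when both timestamps share a date, but not when a short interval spans two dates.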

Use unix_timestamp() to convert string timestamp to seconds.
The difference in hours will be:
hive> select (unix_timestamp('2018-07-16 02:23:00')- unix_timestamp('2018-07-16 02:36:00'))/60/60;
OK
-0.21666666666666667
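Applied to the original question, a minimal sketch might look like the following. It assumes the table is called your_table (a placeholder), that both columns always hold values in yyyy-MM-dd HH:mm:ss form, and that a difference of exactly 2 hours should select col2, which the question leaves open. The columns themselves stay strings; only the comparison uses the converted seconds.

```sql
SELECT col1,
       col2,
       CASE
         -- 2 hours expressed in seconds; abs() makes the column order irrelevant.
         WHEN abs(unix_timestamp(col2) - unix_timestamp(col1)) < 2 * 60 * 60
           THEN col1
         ELSE col2
       END AS col3
FROM your_table;
```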
Important update: this method works correctly only if the time zone is configured as UTC, because in time zones that observe DST, Hive shifts the time during timestamp operations in some edge cases. Consider this example for the PDT time zone:
hive> select hour('2018-03-11 02:00:00');
OK
3
Note the hour is 3, not 2. This is because 2018-03-11 02:00:00 does not exist in the PDT time zone: exactly at that moment the clock is moved forward to 2018-03-11 03:00:00.
The same happens when converting with unix_timestamp. In the PDT time zone, unix_timestamp('2018-03-11 03:00:00') and unix_timestamp('2018-03-11 02:00:00') return the same timestamp:
hive> select unix_timestamp('2018-03-11 03:00:00');
OK
1520762400
hive> select unix_timestamp('2018-03-11 02:00:00');
OK
1520762400
A few links for reference:
https://community.hortonworks.com/questions/82511/change-default-timezone-for-hive.html
http://boristyukin.com/watch-out-for-timezones-with-sqoop-hive-impala-and-spark-2/
Also have a look at this JIRA issue: Hive should carry out timestamp computations in UTC

Related

parse_timestamp vs format_timestamp bigquery

Could someone help me understand why these two queries are returning different results in bigquery?
select FORMAT_TIMESTAMP('%F %H:%M:%E*S', "2018-10-01 00:00:00" , 'Europe/London')
returns 2018-10-01 01:00:00
select PARSE_TIMESTAMP('%F %H:%M:%E*S', "2018-10-01 00:00:00", "Europe/London")
returns 2018-09-30 23:00:00 UTC
As 2018-10-01 falls within British Summer Time (UTC+1), I would have expected both queries to return 2018-09-30 23:00:00 UTC.
The first is given a timestamp which is in UTC. It then converts it to the corresponding time in Europe/London. The return value is a string representing the time in the local timezone.
The second takes a string representation and returns a UTC timestamp. The representation is assumed to be in Europe/London.
So, the two functions are going in different directions, one from UTC to the local time and the other from the local time to UTC.
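One way to see that the two functions run in opposite directions is to chain them. The sketch below reuses the literal and time zone from the question: it parses the London wall-clock string into a UTC timestamp and then formats that timestamp back in the same zone, which should give back the original string.

```sql
SELECT FORMAT_TIMESTAMP(
         '%F %H:%M:%E*S',
         -- Interpret the string as Europe/London wall-clock time, yielding a UTC timestamp.
         PARSE_TIMESTAMP('%F %H:%M:%E*S', '2018-10-01 00:00:00', 'Europe/London'),
         -- Render that UTC timestamp back as Europe/London wall-clock time.
         'Europe/London') AS round_trip;
```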

Apache Hive- Time Stamp query

I have two time stamp columns in a Hive DB storing timestamp in following format:
hive> select last_date from xyz limit 2;
OK
2019-08-21 15:11:23.553
2019-08-21 15:11:23.553
[Above has milliseconds stored in it by default]
hive> select last_modify_date from xyz limit 2;
OK
2018-04-18 23:32:58
2017-09-22 04:02:32
I need a common Hive select query which would convert both of the above timestamps to the 'yyyy-MM-dd HH:mm:ss.SSS' format, preserving the millisecond value if it exists, or appending '.000' if it doesn't.
What I have tried so far:
select
last_modify_date,
from_unixtime(unix_timestamp(last_modify_date), "yyyy-MM-dd HH:mm:ss.SSS") as ts
from xyz limit 3;
However, the above query displays '.000' for both the above said timestamp columns.
Please help
From the UDF that implements unix_timestamp, you can see that the returned value is in seconds, represented by a LongWritable, so anything smaller than one second is dropped.
You can write your own UDF, or just use pure SQL to achieve that.
One easy way is to use rpad (implemented by GenericUDFRpad):
select rpad(your_date, 23, '.000') from your_table;
Some examples:
hive> select rpad('2018-04-18 23:32:58', 23, '.000');
OK
2018-04-18 23:32:58.000
hive> select rpad('2018-04-18 23:32:58.553', 23, '.000');
OK
2018-04-18 23:32:58.553
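If the values might carry fewer than three fractional digits (for example '.5'), fixed-width padding with rpad would no longer line up. An alternative sketch, assuming Hive 1.2.0 or later where date_format is available, is to let the formatter produce the milliseconds itself. The table xyz and the column names are those from the question; the output aliases are made up for illustration.

```sql
-- date_format accepts a date, timestamp, or string and a SimpleDateFormat-style pattern.
SELECT date_format(last_date,        'yyyy-MM-dd HH:mm:ss.SSS') AS last_date_ms,
       date_format(last_modify_date, 'yyyy-MM-dd HH:mm:ss.SSS') AS last_modify_date_ms
FROM xyz
LIMIT 3;
```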

How to extract time in HH24:MM from varchar in Oracle

I have a column in the following varchar format. I would like to extract the time based on a condition e.g. < 7:00.
Table1
Column: timer(varchar)
23:45
05:00
07:00
22:00
Expected output
test
05:00
07:30
I tried the following:
Select *
FROM Table1
where timer < 7:00
However, the result is not as expected.
Oracle does not have a time data type, so presumably the column is a string.
Use string comparisons:
where timer < '07:00'
Note that the leading 0 is important!
If this is just a time and you want a proper comparison, you can convert the values to dates and compare them:
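To illustrate why the leading zero matters, here is a small sketch using literals only (no table needed beyond Oracle's dual). Strings are compared character by character, so without the zero the first characters '2' and '7' decide the outcome and 23:45 is wrongly treated as earlier than 7:00.

```sql
SELECT CASE WHEN '23:45' < '7:00'  THEN 'matches' ELSE 'no match' END AS without_leading_zero,
       CASE WHEN '23:45' < '07:00' THEN 'matches' ELSE 'no match' END AS with_leading_zero
FROM dual;
```

The first column comes out as 'matches' and the second as 'no match', so only the zero-padded literal filters the way the question intends.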
Select *
FROM Table1
where to_date(timer,'hh24:mi') < to_date('07:00','hh24:mi');
Please note that your expected output contains 07:30, which is not less than 07:00, so it will not be part of the output when you filter for values less than 07:00.
Cheers!!

The difference between two dates in Hiveql

I wish to find the difference between two dates in date format in HiveQL. I used the function below in SAS to return a date value by subtracting a number of days:
intnx('day', 20MAR2019 , -7)
It subtracts 7 days from the date and returns 13MAR2019
I wish to convert it to Hiveql language. Any tips would be appreciated!
You can use the date_sub function in Hive to subtract days from a given date.
hive> select current_date;
2019-07-25
hive> select date_sub(current_date,7);
2019-07-18
Passing the date in its original SAS-style format returns NULL:
hive> select date_sub('13MAR2019',7);
OK
NULL
Since your date is in 'ddMMMyyyy' format, you can convert it to 'yyyy-MM-dd' format first:
hive> select date_sub(from_unixtime(unix_timestamp('13MAR2019' ,'ddMMMyyyy'), 'yyyy-MM-dd'),7);
OK
2019-03-06
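If the result is also needed back in the SAS-style form, one possible sketch is to format the output of date_sub again. This assumes Hive 1.2.0 or later for date_format, and uses upper() because the formatter is expected to emit a mixed-case month abbreviation; the literal is the one from the question.

```sql
SELECT upper(
         date_format(
           -- Parse the SAS-style string, normalise it to yyyy-MM-dd, then subtract 7 days.
           date_sub(from_unixtime(unix_timestamp('20MAR2019', 'ddMMMyyyy'), 'yyyy-MM-dd'), 7),
           'ddMMMyyyy')) AS shifted_date;
```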

Time Difference in Redshift

How can I get the exact time difference between two columns?
eg:
col1 date is 2014-09-21 02:00:00
col2 date is 2014-09-22 01:00:00
output like
result: 23:00:00
I am getting result like
Hours Minutes Seconds
--------------------
3 3 20
1 2 30
using the following query
SELECT start_time,
end_time,
DATE_PART(H,end_time) - DATE_PART(H,start_time) AS Hours,
DATE_PART(M,end_time) - DATE_PART(M,start_time) AS Minutes,
DATE_PART(S,end_time) - DATE_PART(S,start_time) AS Seconds
FROM user_session
but I need it like
Difference
-----------
03:03:20
01:02:30
Use DATEDIFF to get the seconds between the two datetimes:
DATEDIFF(second,'2014-09-23 00:00:00.000','2014-09-23 01:23:45.000')
Then use DATEADD to add the seconds to '1900-01-01 00:00:00':
DATEADD(seconds,5025,'1900-01-01 00:00:00')
Then CAST the result to a TIME data type (note that this limits you to 24 hours max):
CAST('1900-01-01 01:23:45' as TIME)
Then LTRIM the date part of the value off the TIME data (as discovered by Benny). Redshift does not allow use of TIME on actual stored data:
LTRIM('1900-01-01 01:23:45','1900-01-01')
Now, do it in a single step:
SELECT LTRIM(DATEADD(seconds,DATEDIFF(second,'2014-09-23 00:00:00','2014-09-23 01:23:45.000'),'1900-01-01 00:00:00'),'1900-01-01');
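Applied to the table from the question, the same expression might be written as in the sketch below. It simply swaps the literals for the start_time and end_time columns of user_session and is untested against stored data; the alias difference is made up, and the 24-hour limit mentioned above still applies.

```sql
SELECT start_time,
       end_time,
       -- Seconds between the two columns, added to a fixed base date,
       -- with the leading date characters then trimmed off.
       LTRIM(DATEADD(seconds, DATEDIFF(second, start_time, end_time), '1900-01-01 00:00:00'),
             '1900-01-01') AS difference
FROM user_session;
```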