Hive date cast chopping off milliseconds - hive

The date cast below is not displaying milliseconds.
select from_unixtime(unix_timestamp("2017-07-31 23:48:25.957" , "yyyy-MM-dd HH:mm:ss.SSS"));
2017-07-31 23:48:25
What is the way to get milliseconds?
Thanks.

Since this string is in ISO format, the cast is straightforward:
hive> select cast("2017-07-31 23:48:25.957" as timestamp);
OK
2017-07-31 23:48:25.957
or
hive> select timestamp("2017-07-31 23:48:25.957");
OK
2017-07-31 23:48:25.957

Because unix_timestamp is based on seconds, it truncates milliseconds.
Instead, you can transform the string to a timestamp using date_format, which preserves milliseconds, and then apply from_utc_timestamp:
select from_utc_timestamp(date_format("2017-07-31 23:48:25.957",'yyyy-MM-dd HH:mm:ss.SSS'),'UTC') as datetime
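Result (expected output; the UTC-to-UTC conversion is a no-op, so the milliseconds pass through unchanged):
2017-07-31 23:48:25.957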

Related

epoch with milliseconds to timestamp with milliseconds conversion in Hive

How can I convert a Unix epoch with milliseconds to a timestamp with milliseconds in Hive?
Neither the cast() nor the from_unixtime() function gets me a timestamp with milliseconds.
I tried .SSS, but the function just inflates the year and doesn't treat the value as milliseconds.
scala> spark.sql("select from_unixtime(1598632101000, 'yyyy-MM-dd hh:mm:ss.SSS')").show(false)
+-----------------------------------------------------+
|from_unixtime(1598632101000, yyyy-MM-dd hh:mm:ss.SSS)|
+-----------------------------------------------------+
|52628-08-20 02:00:00.000 |
+-----------------------------------------------------+
I think you can just cast():
select cast(1598632101000 / 1000.0 as timestamp)
Note that this produces a timestamp datatype rather than a string, as from_unixtime() does.
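For example, with the milliseconds value used further below (expected output; the wall-clock rendering assumes a UTC session time zone):
select cast(1598632101123 / 1000.0 as timestamp);
2020-08-28 16:28:21.123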
from_unixtime works with seconds, not milliseconds. Convert to a timestamp in seconds with from_unixtime(ts div 1000), concatenate it with '.' plus the milliseconds (mod(ts, 1000)), and cast the result as timestamp. Tested in Hive:
with your_data as (
select stack(2,1598632101123, 1598632101000) as ts
)
select cast(concat(from_unixtime(ts div 1000),'.',mod(ts,1000)) as timestamp)
from your_data;
Result:
2020-08-28 16:28:21.123
2020-08-28 16:28:21.0
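One caveat: mod(ts, 1000) drops leading zeros, so a value such as 1598632101005 would come out as 21.5 instead of 21.005. A padded variant using lpad (a sketch in the same style; 1598632101005 is an illustrative value):
with your_data as (
select stack(2, 1598632101123, 1598632101005) as ts
)
select cast(concat(from_unixtime(ts div 1000), '.', lpad(mod(ts, 1000), 3, '0')) as timestamp)
from your_data;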
Here's another way in pure Spark Scala, using a UDF to wrap the Java constructor new Timestamp(ms):
import java.sql.Timestamp
val fromMilli = udf((ms:Long) => new Timestamp(ms))
// Test
val df = Seq(1598632101123L).toDF("ts")
df.select(fromMilli($"ts")).show(false)
Result
+-----------------------+
|UDF(ts) |
+-----------------------+
|2020-08-28 16:28:21.123|
+-----------------------+
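If you are on Spark 3.1 or later (an assumption about your version), the built-in timestamp_millis does the same without a UDF; for a UTC session it should show 2020-08-28 16:28:21.123:
scala> spark.sql("select timestamp_millis(1598632101123)").show(false)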

How to convert "2019-11-02T20:18:00Z" to timestamp in HQL?

I have the datetime string "2019-11-02T20:18:00Z". How can I convert it into a timestamp in Hive HQL?
try this:
select from_unixtime(unix_timestamp("2019-11-02T20:18:00Z", "yyyy-MM-dd'T'HH:mm:ss"))
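This should return 2019-11-02 20:18:00; the trailing Z is simply ignored, since the pattern does not consume it.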
If you want to preserve milliseconds, then remove the Z, replace the T with a space, and convert to timestamp:
select timestamp(regexp_replace("2019-11-02T20:18:00Z", '^(.+?)T(.+?)Z$','$1 $2'));
Result:
2019-11-02 20:18:00
It also works with milliseconds:
select timestamp(regexp_replace("2019-11-02T20:18:00.123Z", '^(.+?)T(.+?)Z$','$1 $2'));
Result:
2019-11-02 20:18:00.123
The from_unixtime(unix_timestamp()) solution does not work with milliseconds.
Demo:
select from_unixtime(unix_timestamp("2019-11-02T20:18:00.123Z", "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"));
Result:
2019-11-02 20:18:00
Milliseconds are lost. The reason is that unix_timestamp returns the number of seconds elapsed since the Unix epoch (1970-01-01 00:00:00 UTC).

Calculate time difference between two columns of string type in hive without changing the data type string

I am trying to calculate the time difference between two columns of a row, both of string data type. If the time difference between them is less than 2 hours, select the first column of that row; if it is greater than 2 hours, select the second column. It could be done by converting the columns to a datetime format, but I want the result to stay a string. How can I do that? The data looks like this:
col1(string type)
2018-07-16 02:23:00
2018-07-26 12:26:00
2018-07-26 15:32:00
col2(string type)
2018-07-16 02:36:00
2018-07-26 14:29:00
2018-07-27 15:38:00
I think you don't need to convert the columns to a datetime format, since the data in your case is already ordered (yyyy-MM-dd HH:mm:ss). You just need to keep only the digits, collapsing each value into a single numeric string (yyyyMMddHHmmss); then you can test whether the difference is bigger or smaller than 2 hours (here 20000, since the hour digits are followed by mmss). Looking at your example (and assuming col2 > col1), this query would work:
SELECT case when regexp_replace(col2,'[^0-9]', '')-regexp_replace(col1,'[^0-9]', '') < 20000 then col1 else col2 end as col3 from your_table;
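One caveat: the digit-difference trick can misreport across day boundaries. For example, 2018-07-26 23:00:00 and 2018-07-27 00:30:00 differ by far more than 20000 as digit strings, even though the real gap is only 1.5 hours. A boundary-safe sketch using unix_timestamp, which still returns the original strings (assuming your_table and the column names from the question):
SELECT case
when unix_timestamp(col2, 'yyyy-MM-dd HH:mm:ss') - unix_timestamp(col1, 'yyyy-MM-dd HH:mm:ss') < 2 * 60 * 60
then col1
else col2
end as col3
from your_table;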
Use unix_timestamp() to convert the string timestamps to seconds.
The difference in hours is then:
hive> select (unix_timestamp('2018-07-16 02:23:00')- unix_timestamp('2018-07-16 02:36:00'))/60/60;
OK
-0.21666666666666667
Important update: this method works correctly only if the time zone is configured as UTC, because in time zones with DST, Hive shifts the time during timestamp operations in some edge cases. Consider this example for the PDT time zone:
hive> select hour('2018-03-11 02:00:00');
OK
3
Note the hour is 3, not 2. This is because 2018-03-11 02:00:00 does not exist in the PDT time zone: at exactly 2018-03-11 02:00:00 the clock is adjusted forward and the time becomes 2018-03-11 03:00:00.
The same happens when converting with unix_timestamp. In the PDT time zone, unix_timestamp('2018-03-11 03:00:00') and unix_timestamp('2018-03-11 02:00:00') return the same value:
hive> select unix_timestamp('2018-03-11 03:00:00');
OK
1520762400
hive> select unix_timestamp('2018-03-11 02:00:00');
OK
1520762400
And a few links for reference:
https://community.hortonworks.com/questions/82511/change-default-timezone-for-hive.html
http://boristyukin.com/watch-out-for-timezones-with-sqoop-hive-impala-and-spark-2/
Also have a look at this JIRA: Hive should carry out timestamp computations in UTC.

Converting only time to unixtimestamp in Hive

I have a column eventtime that only stores the time of day as a string, e.g. 0445AM means 04:45 AM. I am using the query below to convert it to a Unix timestamp:
select unix_timestamp(eventtime,'hhmmaa'),eventtime from data_raw limit 10;
This seems to work fine for test data. I always thought a Unix timestamp is a combination of date and time, while here I only have the time. My question is: what date does it assume while executing the above function? The timestamps seem quite small.
A Unix timestamp is the bigint number of seconds since the Unix epoch (1970-01-01 00:00:00 UTC); it is a way to track time as a running total of seconds.
select unix_timestamp('0445AM','hhmmaa') as unixtimestamp
Returns
17100
And this is exactly 4 hours and 45 minutes converted to seconds:
select 4*60*60 + 45*60
returns 17100
And to convert it back, use the from_unixtime function:
select from_unixtime (17100,'hhmmaa')
returns:
0445AM
If you convert using a format that includes the date, you will see that it assumes the date 1970-01-01:
select from_unixtime (17100,'yyyy-MM-dd hhmmaa')
returns:
1970-01-01 0445AM
See the Hive functions docs here.
There is also a very useful site about Unix timestamps.

Extracting Time from Timestamp in SQL

I am using Redshift and am looking to extract the time from the timestamp.
Here is the timestamp: 2017-10-31 23:30:00
and I would just like to get the time as 23:30:00
Is that possible?
In Redshift you can simply cast the value to a time:
the_timestamp_column::time
Alternatively, you can use the standard cast() syntax:
cast(the_timestamp_column as time)
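A quick sanity check with the literal from the question (a sketch; assumes your cluster supports the TIME type):
select timestamp '2017-10-31 23:30:00'::time;
This should return 23:30:00.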
Please go through this link
http://docs.aws.amazon.com/redshift/latest/dg/r_Dateparts_for_datetime_functions.html
timezone, timezone_hour, timezone_minute: supported by the DATE_TRUNC function and by EXTRACT for timestamp with time zone (TIMESTAMPTZ).
Examples:
select extract(minute from timestamp '2009-09-09 12:08:43');
select extract(hours from timestamp '2009-09-09 12:08:43');
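These should return 8 and 12, respectively.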