How to convert "2019-11-02T20:18:00Z" to timestamp in HQL?

I have a datetime string "2019-11-02T20:18:00Z". How can I convert it into a timestamp in Hive HQL?

Try this (the pattern quotes both the literal T and the trailing Z so that it matches the whole string):
select from_unixtime(unix_timestamp("2019-11-02T20:18:00Z", "yyyy-MM-dd'T'HH:mm:ss'Z'"))

If you want to preserve milliseconds, remove the Z, replace the T with a space, and convert to timestamp:
select timestamp(regexp_replace("2019-11-02T20:18:00Z", '^(.+?)T(.+?)Z$','$1 $2'));
Result:
2019-11-02 20:18:00
It also works with milliseconds:
select timestamp(regexp_replace("2019-11-02T20:18:00.123Z", '^(.+?)T(.+?)Z$','$1 $2'));
Result:
2019-11-02 20:18:00.123
The from_unixtime(unix_timestamp()) solution does not preserve milliseconds.
Demo:
select from_unixtime(unix_timestamp("2019-11-02T20:18:00.123Z", "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"));
Result:
2019-11-02 20:18:00
Milliseconds are lost because unix_timestamp returns the number of whole seconds since the UNIX epoch (1970-01-01 00:00:00 UTC).
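The truncation is easy to reproduce outside Hive. The following is a minimal Python sketch, not Hive code, and its variable names are made up for this illustration. It parses the same string and shows that a whole-second epoch value cannot carry the fractional part:

```python
from datetime import datetime, timezone

# Parse the ISO-8601 string; the trailing 'Z' is matched as literal text
# and UTC is attached explicitly afterwards.
parsed = datetime.strptime(
    "2019-11-02T20:18:00.123Z", "%Y-%m-%dT%H:%M:%S.%fZ"
).replace(tzinfo=timezone.utc)

# Whole seconds since the epoch: the kind of value unix_timestamp returns.
epoch_seconds = int(parsed.timestamp())

# Converting back from whole seconds cannot recover the fraction.
round_trip = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

print(parsed.microsecond)      # 123000
print(round_trip.microsecond)  # 0
```

Any approach that passes through an integer number of seconds has the same limitation, which is why the string-to-timestamp conversion above is preferable when milliseconds matter.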

Related

epoch with milliseconds to timestamp with milliseconds conversion in Hive

How can I convert a Unix epoch value in milliseconds to a timestamp with milliseconds in Hive?
Neither cast() nor from_unixtime() gives me a timestamp with milliseconds.
I tried .SSS, but the function just inflates the year instead of treating the last three digits as milliseconds.
scala> spark.sql("select from_unixtime(1598632101000, 'yyyy-MM-dd hh:mm:ss.SSS')").show(false)
+-----------------------------------------------------+
|from_unixtime(1598632101000, yyyy-MM-dd hh:mm:ss.SSS)|
+-----------------------------------------------------+
|52628-08-20 02:00:00.000 |
+-----------------------------------------------------+
I think you can just cast():
select cast(1598632101000 / 1000.0 as timestamp)
Note that this produces a timestamp datatype rather than a string, which is what from_unixtime() returns.
from_unixtime works with seconds, not milliseconds. Convert the seconds part with from_unixtime(ts div 1000), concatenate it with '.' and the milliseconds part (mod(ts,1000), left-padded with zeros to three digits so that 5 ms becomes .005 rather than .5), and cast the result as timestamp. Tested in Hive:
with your_data as (
select stack(2,1598632101123, 1598632101000) as ts
)
select cast(concat(from_unixtime(ts div 1000),'.',lpad(cast(mod(ts,1000) as string),3,'0')) as timestamp)
from your_data;
Result:
2020-08-28 16:28:21.123
2020-08-28 16:28:21.0
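The split into seconds and a millisecond remainder can be checked outside Hive. This is a minimal Python sketch under two assumptions: the input is a non-negative epoch value in milliseconds, and the output is rendered in UTC. The helper name millis_to_text is hypothetical. It also shows why a remainder below 100 needs zero-padding before it is appended after the dot:

```python
from datetime import datetime, timezone

def millis_to_text(ts_ms: int) -> str:
    """Format epoch milliseconds as 'yyyy-MM-dd HH:mm:ss.SSS' in UTC."""
    # Same split as `ts div 1000` and `mod(ts, 1000)` in the query above.
    seconds, millis = divmod(ts_ms, 1000)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    # Zero-pad to three digits so that 5 ms is written as .005 rather than .5
    return f"{base}.{millis:03d}"

print(millis_to_text(1598632101123))  # 2020-08-28 16:28:21.123
print(millis_to_text(1598632101005))  # 2020-08-28 16:28:21.005
```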
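Here's another way in pure Spark Scala, using a UDF that wraps the Java constructor new Timestamp(ms):
import java.sql.Timestamp
import org.apache.spark.sql.functions.udf
// Wrap the java.sql.Timestamp(long) constructor, which takes epoch milliseconds
val fromMilli = udf((ms: Long) => new Timestamp(ms))
// Test (toDF and $"..." need spark.implicits._, which spark-shell imports automatically)
val df = Seq(1598632101123L).toDF("ts")
df.select(fromMilli($"ts")).show(false)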
Result
+-----------------------+
|UDF(ts) |
+-----------------------+
|2020-08-28 16:28:21.123|
+-----------------------+

How do I convert a timestamp in YYYY-MM-DD HH:mm:ss to YYYY-MM-DD HH:mm:ss.SSS in Hive while doing a select query?

I am comparing timestamp columns between 2 different database engines, and I need to retrieve a timestamp column stored in YYYY-MM-DD HH:mm:ss format as YYYY-MM-DD HH:mm:ss.SSS, with SSS being 000 when there is no millisecond part.
Can I do the above using Hive select query?
Split the timestamp to get the milliseconds part, then use rpad to add zeros if there is no milliseconds part at all or if it has fewer than 3 digits.
Demo:
with your_data as (
select stack(3, '2019-11-02 20:18:00.123',
'2019-11-02 20:18:00.12',
'2019-11-02 20:18:00'
) as ts
)
select concat(split(ts,'\\.')[0],'.',rpad(nvl(split(ts,'\\.')[1],''),3,'0'))
from your_data d
;
Result:
2019-11-02 20:18:00.123
2019-11-02 20:18:00.120
2019-11-02 20:18:00.000
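The split-and-pad logic can be sketched in Python to check the three cases. This is an illustration only, and the helper name pad_millis is hypothetical:

```python
def pad_millis(ts: str) -> str:
    """Return ts with exactly three fractional digits, adding zeros on the right."""
    # partition yields an empty fraction when there is no '.' in the input
    whole, _, fraction = ts.partition(".")
    # Pad on the right with zeros, then cap at three digits
    return f"{whole}.{fraction.ljust(3, '0')[:3]}"

for value in ["2019-11-02 20:18:00.123", "2019-11-02 20:18:00.12", "2019-11-02 20:18:00"]:
    print(pad_millis(value))
```

The padding has to go on the right: .12 means 120 milliseconds, so the missing digit is a trailing zero.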
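Given that both formats (and their lengths, 19 and 23 characters) are strictly defined, you can use this simple logic:
substr(concat(ts,'.000'),1,23)
I can't check the exact syntax, but basically you append the extra zeros and cut them off if you don't need them. Note that this only works when the input has either no fractional part or exactly three fractional digits.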

How do I convert GPS time to a date in BigQuery?

Currently I am using
SELECT TIMESTAMP_SECONDS(CAST(my_time_unix_ns/1000 AS int64)) AS my_date,...
But some of the columns store time in GPS nanoseconds. How do I convert them into a date?
There is no support for nanosecond precision within Cloud BigQuery. BigQuery CURRENT_TIMESTAMP() returns only up to microseconds (Example-1), and the CAST() function supports only up to microsecond precision (Example-2, 3 and 4). For more context on timestamp precision, please refer to the supported range of BigQuery timestamps [1], which is 0001-01-01 00:00:00.000000 to 9999-12-31 23:59:59.999999.
On the other hand, I assume the Unix time function you are using returns an integer value larger than the capacity of Int64. Please refer to the numeric data type documentation [2].
Example-1:
SELECT CURRENT_TIMESTAMP() AS Current_Time
Result: 2019-12-24 17:51:44.419542 UTC
Example-2:
SELECT CAST('2019-12-24 00:00:00.000000' AS TIMESTAMP)
Result: 2019-12-24 00:00:00 UTC
Example-3:
SELECT CAST('2019-12-24 11:12:47.145482+00' AS TIMESTAMP)
Result: 2019-12-24 11:12:47.145482 UTC
Example-4:
SELECT CAST('2019-12-24 11:12:47.14548200' AS TIMESTAMP)
Result: error: "Could not cast literal "2019-12-24 11:12:47.14548200" to type TIMESTAMP "
[1] https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#timestamp-type
[2] https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#integer-type

Converting only time to unixtimestamp in Hive

I have a column eventtime that only stores the time of day as a string. For example,
0445AM means 04:45 AM. I am using the query below to convert it to a UNIX timestamp.
select unix_timestamp(eventtime,'hhmmaa'),eventtime from data_raw limit 10;
This seems to work fine for test data. I always thought a Unix timestamp is a combination of date and time, while here I only have the time. My question is: what date does the function assume when it runs? The timestamps seem to be quite small.
A Unix timestamp is the number of seconds (a bigint) since the Unix epoch (1970-01-01 00:00:00 UTC). It is a way to track time as a running total of seconds.
select unix_timestamp('0445AM','hhmmaa') as unixtimestamp
Returns
17100
This is exactly 4 hours and 45 minutes converted to seconds:
select 4*60*60 + 45*60
returns 17100
To convert it back, use the from_unixtime function:
select from_unixtime (17100,'hhmmaa')
returns:
0445AM
If you convert using a format that includes the date, you will see that it assumes the date is 1970-01-01:
select from_unixtime (17100,'yyyy-MM-dd hhmmaa')
returns:
1970-01-01 0445AM
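The same arithmetic can be checked in Python. This is a minimal sketch that assumes UTC; the Hive results above likewise correspond to a session time zone with no offset from UTC, and the variable names are made up for this illustration:

```python
from datetime import datetime, timezone

# 04:45 expressed as seconds after midnight
seconds_since_midnight = 4 * 60 * 60 + 45 * 60

# Interpreting that number as seconds since the epoch lands on the epoch date itself
as_datetime = datetime.fromtimestamp(seconds_since_midnight, tz=timezone.utc)

print(seconds_since_midnight)                   # 17100
print(as_datetime.strftime("%Y-%m-%d %H:%M"))   # 1970-01-01 04:45
```

A time of day with no date is therefore just a small offset from the epoch, which explains why the values look so small.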
See the Hive functions docs here.
There is also a very useful site about Unix timestamps.

Hive date cast chopping off milliseconds

The date cast below does not display milliseconds.
select from_unixtime(unix_timestamp("2017-07-31 23:48:25.957" , "yyyy-MM-dd HH:mm:ss.SSS"));
2017-07-31 23:48:25
How can I get the milliseconds?
Thanks.
Since this string is already in the yyyy-MM-dd HH:mm:ss.SSS format, it can be cast directly:
hive> select cast("2017-07-31 23:48:25.957" as timestamp);
OK
2017-07-31 23:48:25.957
or
hive> select timestamp("2017-07-31 23:48:25.957");
OK
2017-07-31 23:48:25.957
Because unix_timestamp is based on seconds, it truncates milliseconds.
Instead, you can format the string with date_format, which preserves milliseconds, and then convert it to a timestamp with from_utc_timestamp:
select from_utc_timestamp(date_format("2017-07-31 23:48:25.957",'yyyy-MM-dd HH:mm:ss.SSS'),'UTC') as datetime