How do I compute the difference between timestamps (stored as strings) in Hive?
I tried using
date_format(column_name,'yyyy-MM-dd HH:mm:ss.sss')
to convert it into a timestamp, but the difference gives me a null value.
Try the unix_timestamp() function:
select unix_timestamp('2018-03-03 00:08:48.409') - unix_timestamp('2018-03-02 00:08:48.409');
+--------+--+
| _c0 |
+--------+--+
| 86400 |
+--------+--+
Your query would be something like
select (unix_timestamp(step_end_time) - unix_timestamp(step_start_time)) as diff;
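Note that unix_timestamp() returns whole seconds, so the .SSS fraction in your strings is dropped. If millisecond precision matters, a rough sketch (assuming a hypothetical steps table and the fixed 'yyyy-MM-dd HH:mm:ss.SSS' layout, which puts the milliseconds at characters 21-23):
select (unix_timestamp(step_end_time) - unix_timestamp(step_start_time)) * 1000
       + cast(substr(step_end_time, 21, 3) as int)   -- ms part of the end time
       - cast(substr(step_start_time, 21, 3) as int) -- ms part of the start time
       as diff_millis
from steps;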
I'm trying to parse a timestamp that is in ISO 8601 format.
Example: 2021-04-10T14:11:00Z
This information is stored inside a JSON object, so I'm extracting that data as a string. The format I'm looking for is yy-MM-dd hh:mm, and to get it I've tried the following.
SQL CODE
SELECT document_id,
json_extract(data, '$.Pair') as pair,
PARSE_TIMESTAMP('%y-%m-%d %H:%M', json_extract(data, '$.AlertTime')) as alerttime,
COUNT(document_id) as alert_count
FROM `tradingview-alerts-26eb8.alltables.TradingView_000_raw_latest` as alert_view
GROUP BY alerttime, document_id, pair
Errors
The code from above causes the following error:
Failed to parse input string '"2021-04-10T03:17:00Z"
I believe the reason for this is the T in the middle of the date. To discard it, I tried this change:
SUBSTR(json_extract(data, '$.AlertTime'), 1, 10))
But with that I'm getting an error on a different row:
Failed to parse input string '"2021-04-1'
I'm wondering if it is because of how the date is presented (year-month-day), with the day not having two digits, such as 2021-04-1 instead of 2021-04-01.
However if I try with
SUBSTR(json_extract(data, '$.AlertTime'), 1, 11))
The error I'm getting is
Failed to parse input string '"2021-04-10'
You need to include those ISO symbols in the format specifier as constants:
select parse_timestamp('%FT%TZ', '2021-04-12T17:38:10Z')
| f0_                      |
|--------------------------|
| 2021-04-12 17:38:10 UTC  |
UPD: If you have fractional seconds, you can include the optional fractional-seconds element %E*S instead of the time element %T. For non-UTC timestamps there should also be a timezone element, %Ez. So a possible solution could be:
with a as (
select '2021-04-12T20:44:06.95841Z' as ts_str union all
select '2021-04-12T23:44:07.83738+03:00' union all
select '2021-04-12T23:44:08+03:00'
)
select parse_timestamp('%FT%H:%M:%E*S%Ez', regexp_replace(ts_str, 'Z$', '+00:00')) as ts
from a
| ts |
|--------------------------------|
| 2021-04-12 20:44:06.958410 UTC |
| 2021-04-12 20:44:07.837380 UTC |
| 2021-04-12 20:44:08 UTC |
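To then render the parsed value in the yy-MM-dd hh:mm shape you asked for, a hedged sketch using format_timestamp(), the inverse of parse_timestamp():
select format_timestamp('%y-%m-%d %H:%M',
       parse_timestamp('%FT%TZ', '2021-04-10T14:11:00Z')) as alerttime
-- 21-04-10 14:11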
I think you can use the timestamp => datetime functions, like this (note the string literal needs quotes):
datetime(timestamp('2021-11-29T00:00:00.000Z'))
How can I convert a Unix epoch with milliseconds to a timestamp with milliseconds in Hive?
Neither cast() nor from_unixtime() works to get the timestamp with milliseconds.
I tried .SSS, but the function just increases the year and doesn't treat it as part of the milliseconds.
scala> spark.sql("select from_unixtime(1598632101000, 'yyyy-MM-dd hh:mm:ss.SSS')").show(false)
+-----------------------------------------------------+
|from_unixtime(1598632101000, yyyy-MM-dd hh:mm:ss.SSS)|
+-----------------------------------------------------+
|52628-08-20 02:00:00.000 |
+-----------------------------------------------------+
I think you can just cast():
select cast(1598632101000 / 1000.0 as timestamp)
Note that this produces a timestamp datatype rather than a string, as in from_unixtime().
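As a quick check that the fraction survives, the same cast with the millisecond-bearing epoch from the question (the rendered value assumes the same session time zone as the other answers in this thread):
select cast(1598632101123 / 1000.0 as timestamp);
-- 2020-08-28 16:28:21.123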
from_unixtime() works with seconds, not milliseconds. Convert to a timestamp string in seconds with from_unixtime(ts div 1000), concatenate '.' plus the milliseconds (mod(ts, 1000)), and cast the result as timestamp. Tested in Hive:
with your_data as (
select stack(2,1598632101123, 1598632101000) as ts
)
select cast(concat(from_unixtime(ts div 1000),'.',mod(ts,1000)) as timestamp)
from your_data;
Result:
2020-08-28 16:28:21.123
2020-08-28 16:28:21.0
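One caveat: mod(ts, 1000) drops leading zeros when the millisecond part is below 100 (e.g. 5 becomes '.5' rather than '.005'). A padded variant of the same approach, sketched with a made-up value to show the difference:
with your_data as (
    select stack(2, 1598632101005, 1598632101123) as ts
)
select cast(concat(from_unixtime(ts div 1000), '.',
                   lpad(cast(mod(ts, 1000) as string), 3, '0')) as timestamp)
from your_data;
-- 2020-08-28 16:28:21.005
-- 2020-08-28 16:28:21.123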
Here's another way in pure Spark Scala, using a UDF to wrap the Java constructor new Timestamp(ms):
import java.sql.Timestamp
import org.apache.spark.sql.functions.udf
import spark.implicits._ // for .toDF when not in the spark-shell

val fromMilli = udf((ms: Long) => new Timestamp(ms))

// Test
val df = Seq(1598632101123L).toDF("ts")
df.select(fromMilli($"ts")).show(false)
Result
+-----------------------+
|UDF(ts) |
+-----------------------+
|2020-08-28 16:28:21.123|
+-----------------------+
I have a date stored in a dimension table.
I am using this table to retrieve the latest reporting week.
SELECT MAX("Week") AS "Date" FROM "DWH"."DimWeek"
This returns the following date, in 'YYYY-MM-DD' format:
+--------------------+
| Date |
|--------------------+
| 2017-01-03 |
+--------------------+
I wish to convert this date so it is returned in 'DD-MM-YYYY' format.
I have attempted to use
SELECT TO_DATE(MAX("Week"), 'DD-MM-YYYY') AS "Date" FROM "DWH"."DimWeek"
SQL Error
too many arguments for function [TO_DATE(MAX("Week", 'DD-MM-YYYY')] expected 1, got 2
I have also attempted to convert it to CHAR
SELECT TO_DATE(TO_CHAR(MAX("Week")), 'DD-MM-YYYY') AS "Date" FROM "DWH"."DimWeek"
However, this also returns the result in the undesired format:
+--------------------+
| Date |
|--------------------+
| 2017-01-03 |
+--------------------+
Any tips or ideas? I'm currently querying Snowflake SQL.
Use TO_CHAR(). You want a string in the result, not a date:
SELECT TO_CHAR(MAX("Week"), 'DD-MM-YYYY') AS "Date"
FROM "DWH"."DimWeek"
I'll try to explain my problem as clearly as possible. I would like to filter a table by date (selecting only the records whose date falls in the current month), and in Oracle SQL I use the following query to achieve this:
select * from table t1
where t1.DATE_COLUMN between TRUNC(SYSDATE, 'mm') and SYSDATE
How can I replicate the same filter in Hive SQL? The column I should use to apply the filter is a TIMESTAMP column (e.g. 2017-05-15 00:00:00).
I'm using CDH 5.7.6-1.
Any advice?
Be aware that unix_timestamp() is not fixed: its value can change while the query runs.
For that reason it cannot be used for partition elimination.
For newer Hive versions, use current_date / current_timestamp instead.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
select *
from table t1
where t1.DATE_COLUMN
between cast(from_unixtime(unix_timestamp(),'yyyy-MM-01 00:00:00') as timestamp)
and cast(from_unixtime(unix_timestamp()) as timestamp)
;
select cast (from_unixtime(unix_timestamp(),'yyyy-MM-01 00:00:00') as timestamp)
,cast (from_unixtime(unix_timestamp()) as timestamp)
;
+---------------------+---------------------+
| _c0 | _c1 |
+---------------------+---------------------+
| 2017-05-01 00:00:00 | 2017-05-16 01:04:55 |
+---------------------+---------------------+
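Following the current_date advice above, a hedged sketch for Hive 1.2+ (where trunc() accepts the 'MM' unit) that keeps the bounds constant for the whole query:
select *
from t1
where t1.DATE_COLUMN between cast(trunc(current_date, 'MM') as timestamp)
                         and current_timestamp;
-- trunc(current_date, 'MM') yields the first day of the current month, e.g. '2017-05-01'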
You can compare formatted strings:
where date_format(t1.DATE_COLUMN, 'yyyy-MM') = date_format(current_timestamp, 'yyyy-MM')
Note that Hive's date_format() uses Java SimpleDateFormat patterns: month must be uppercase 'MM', since lowercase 'mm' means minutes (a 'y-m' pattern would compare year-minute strings).
I have a table with a VARCHAR(64) column called datetimestamp that contains datetime strings with the following format:
[02/Jun/2016:23:58:30 +0000].
I'm trying to convert this to a date using to_date(datetimestamp, 'DD/Mon/YYYY:HH24:MM:SS') in my select statement, but I'm getting an 'Invalid Format' error. I'm not sure if it's the UTC offset or something else that's messing it up. What's the proper syntax?
Thanks!
It is a bit complicated, since to_timestamp() does not accept time zone information in its format patterns.
I have come up with this query:
WITH d(part) AS
(SELECT regexp_matches(
'02/Jun/2016:23:58:30 +0000',
'^([^ ]*) ([-+]?\d\d)(\d\d)$'
)
)
SELECT
CAST (to_timestamp(d.part[1], 'DD/Mon/YYYY:HH24:MI:SS')
AT TIME ZONE (d.part[2] || ':' || d.part[3])
AS timestamp with time zone)
AS converted
FROM d;
converted
------------------------
2016-06-02 21:58:30+02
(1 row)
(I am at time zone UTC+02.)
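If your PostgreSQL is recent enough, the regexp step may be avoidable: newer versions accept time-zone offsets in to_timestamp() via the TZH/TZM template patterns (treat this as an assumption and verify on your version):
-- hedged sketch: TZH/TZM parse the '+0000' offset directly (newer PostgreSQL only)
SELECT to_timestamp('02/Jun/2016:23:58:30 +0000', 'DD/Mon/YYYY:HH24:MI:SS TZHTZM');
-- returns a timestamptz, displayed in your session time zone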
Alternatively, if you only need the date part, to_date() works; input characters beyond what the template consumes are ignored, which is why the trailing ' +0000' does not raise an error:
select to_date('02/Jun/2016:23:58:30 +0000', 'DD/Mon/YYYY:HH24:MI:SS');
| to_date |
|------------|
| 2016-06-02 |