Hive String to Timestamp conversion with Milliseconds - sql

I have a requirement to convert the mentioned input string format and produce the desired output in timestamp as shown below.
Input: 16AUG2001:23:46:32.876086
Desired Output: 2001-08-16 23:46:32.876086
Actual output from the query below: 2001-08-17 00:01:08
Query:
select '16AUG2001:23:46:32.876086' as row_ins_timestamp,
from_unixtime(unix_timestamp('16AUG2001:23:46:32.876086',
'ddMMMyyyy:HH:mm:ss.SSSSSS')) as row_ins_timestamp
from temp;
The fractional-seconds part is not converted as required. Please suggest a fix.

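unix_timestamp returns whole seconds, so it cannot preserve the fractional part. With the pattern 'ddMMMyyyy:HH:mm:ss.SSSSSS' the digits 876086 are read as a count of milliseconds (about 876 seconds, or 14 minutes 36 seconds), which is why the result moves forward to 2001-08-17 00:01:08.
Convert the part before the dot without the fraction, then concatenate the fractional part back on: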
with your_data as (
select stack(3,
'16AUG2001:23:46:32.876086',
'16AUG2001:23:46:32',
'16AUG2001:23:46:32.123'
) as ts
)
select concat_ws('.',
from_unixtime(unix_timestamp(split(ts,'\\.')[0],'ddMMMyyyy:HH:mm:ss')),
split(ts,'\\.')[1])
from your_data;
Result:
2001-08-16 23:46:32.876086
2001-08-16 23:46:32
2001-08-16 23:46:32.123
Time taken: 0.089 seconds, Fetched: 3 row(s)
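If the same conversion ever had to live in Java code (for example inside a custom UDF), the parsing could be sketched with java.time as below. This is only an illustration under assumptions: the class name FractionalTimestampParser and the method convert are hypothetical, and unlike the concat_ws query above, the formatter drops trailing zeros from the fraction instead of copying the digits verbatim.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.Locale;

// Hypothetical helper, not part of Hive.
final class FractionalTimestampParser {

    // Accepts input such as 16AUG2001:23:46:32.876086; the fraction is optional (1 to 9 digits).
    private static final DateTimeFormatter INPUT = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()                 // "AUG" as well as "Aug"
            .appendPattern("ddMMMuuuu:HH:mm:ss")
            .optionalStart()
            .appendFraction(ChronoField.NANO_OF_SECOND, 1, 9, true)
            .optionalEnd()
            .toFormatter(Locale.ENGLISH);

    // Prints yyyy-MM-dd HH:mm:ss followed by the fraction, if there is one.
    private static final DateTimeFormatter OUTPUT = new DateTimeFormatterBuilder()
            .appendPattern("uuuu-MM-dd HH:mm:ss")
            .appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true)
            .toFormatter(Locale.ENGLISH);

    static String convert(String raw) {
        return LocalDateTime.parse(raw, INPUT).format(OUTPUT);
    }

    public static void main(String[] args) {
        System.out.println(convert("16AUG2001:23:46:32.876086"));
        System.out.println(convert("16AUG2001:23:46:32"));
        System.out.println(convert("16AUG2001:23:46:32.123"));
    }
}
```

The three calls in main use the same sample values as the query above.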

Related

Casting DATE/TIMESTAMP types to NUMERIC is prohibited in Hive 3.1.3

I am trying to convert dates from string format to a numeric value in milliseconds, keeping the .SSS part, because I need to process data at the level of millisecond durations. In Hive 1.1.0 I can do that with the code below, but the newer version does not allow it:
select current_timestamp(), unix_timestamp(current_timestamp(), 'yyyy-MM-dd HH:mm:ss.SSS')*1000, cast((cast(date_format(cast(current_timestamp() as string),'yyyy-MM-dd HH:mm:ss.SSS') as timestamp)) as double) * 1000 as time_milliseconds
Can you tell me a workaround to this?
Thank you
Extract the millisecond part from the string and add it to (timestamp in seconds)*1000:
select current_timestamp(),
--unix_timestamp returns seconds only
unix_timestamp(current_timestamp())*1000, --without .SSS * 1000
unix_timestamp(current_timestamp())*1000 +
bigint(regexp_extract(string(current_timestamp()),'\\.(\\d+)$',1)) --with .SSS
Result:
2021-09-21 13:52:32.034 1632232352000 1632232352034
The explicit conversions to bigint and string may not be necessary.
Another way to get the millisecond part is to split the string on the dot and take element #1: split(current_timestamp(),'\\.')[1] instead of regexp_extract(string(current_timestamp()),'\\.(\\d+)$',1):
select ts, unix_timestamp(ts_splitted[0])*1000, unix_timestamp(ts_splitted[0]) * 1000 + ts_splitted[1]
from
(
select current_timestamp() ts, split(current_timestamp(),'\\.') ts_splitted
)s
Result:
2021-09-21 18:21:11.032 1632248471000 1632248471032
I prefer this method. Of course if you have timestamps with microseconds or nanoseconds, the logic should be adjusted accordingly based on the length of the fractional part.
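The adjustment for longer or shorter fractional parts can be sketched in Java as follows. Adding the fraction digits directly is only correct when there are exactly three of them, so this sketch first normalizes the digits to milliseconds by padding or truncating to three places. The class EpochMillis and its method names are hypothetical, and truncation (rather than rounding) of microseconds or nanoseconds is an assumption.

```java
// Hypothetical helper illustrating the arithmetic; not a Hive API.
final class EpochMillis {

    // Reduce a fractional-seconds digit string of any length to whole milliseconds.
    // "5" -> 500, "034" -> 34, "876086" -> 876 (extra digits are truncated).
    static long fractionToMillis(String fractionDigits) {
        if (fractionDigits == null || fractionDigits.isEmpty()) {
            return 0L;
        }
        String threeDigits = (fractionDigits + "00").substring(0, 3);
        return Long.parseLong(threeDigits);
    }

    // Combine whole epoch seconds (what unix_timestamp returns) with the fraction digits.
    static long toEpochMillis(long epochSeconds, String fractionDigits) {
        return epochSeconds * 1000L + fractionToMillis(fractionDigits);
    }

    public static void main(String[] args) {
        System.out.println(toEpochMillis(1632248471L, "032"));
        System.out.println(toEpochMillis(1632248471L, "032123"));
    }
}
```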

Hive query to find difference between two timestamp

I have two timestamps, request and response. I need to find the difference between them in milliseconds, as below.
Request: 2020-03-20 10:00:00:010
Response: 2020-03-20 10:00:00:020
Diff: 10 milliseconds
I tried the query below, but it gives me 0 instead of 10.
select (unix_timestamp('2020-03-20 10:00:00:010') - unix_timestamp('2020-03-20 10:00:00:020'))
That's because unix_timestamp drops the millisecond portion.
You need some regex to parse it - something like:
select cast(regexp_replace('2020-03-20 10:00:00:020',
'(\\d{4})-(\\d{2})-(\\d{2}) (\\d{2}):(\\d{2}):(\\d{2}):(\\d{3})',
'$1-$2-$3 $4:$5:$6.$7') as timestamp);
OR
SELECT ROUND((CAST(CAST('2020-03-20 10:00:00.020' AS TIMESTAMP) AS DOUBLE)
- CAST(CAST('2020-03-20 10:00:00.010' AS TIMESTAMP) AS DOUBLE)) * 1000)
as timediff
The timestamp string should be of the form yyyy-MM-dd HH:mm:ss.SSS,
so you may have to replace the ":" before the milliseconds with ".".
A Hive timestamp always looks something like:
2020-03-20 01:50:19.158
To get the difference between the millisecond fields of two timestamps, you can try the query below. It compares only the millisecond parts, so it gives the right answer only when both timestamps fall within the same second:
select date_format("2020-03-20 10:00:00.020",'S') -date_format("2020-03-20 10:00:00.010",'S');
If the millisecond part is separated by ":", you can get the same difference (with the same restriction) by running the query below:
select cast(substr("2020-03-20 10:00:00:020",-3) as int) - cast(substr("2020-03-20 10:00:00:010",-3) as int);
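For comparison, here is a small Java sketch of a difference that also holds when the two timestamps are in different seconds. It parses the ":"-separated millisecond form directly and subtracts whole timestamps rather than only the last three digits. The class MillisDiff and the method millisBetween are hypothetical names, and the input format yyyy-MM-dd HH:mm:ss:SSS is assumed from the question.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; it shows the arithmetic, not a Hive function.
final class MillisDiff {

    // Matches values such as 2020-03-20 10:00:00:010 (colon before the milliseconds).
    private static final DateTimeFormatter COLON_MILLIS =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss:SSS");

    // Returns response minus request in milliseconds (negative if response is earlier).
    static long millisBetween(String request, String response) {
        LocalDateTime start = LocalDateTime.parse(request, COLON_MILLIS);
        LocalDateTime end = LocalDateTime.parse(response, COLON_MILLIS);
        return Duration.between(start, end).toMillis();
    }

    public static void main(String[] args) {
        System.out.println(millisBetween("2020-03-20 10:00:00:010", "2020-03-20 10:00:00:020"));
        System.out.println(millisBetween("2020-03-20 10:00:00:990", "2020-03-20 10:00:01:010"));
    }
}
```

The second call in main crosses a second boundary, which is the case where subtracting only the millisecond fields would go wrong.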

Presto-Sql : Converting time in string format to date format

In Presto, I have a time stored as a varchar that looks like this:
10:46:00
I need to cast this to a timestamp. I have tried a few things, but Presto throws the errors "Value cannot be cast to date:10:46:00" and "Value cannot be cast to timestamp:10:46:00":
select cast('10:46:00' as DATE) from abc;
select cast('10:46:00' as TIMESTAMP) from abc;
Try the query below.
Input query in Presto:
select (hour(date_parse(CheckStartTime,'%T')) + 1) as hr from TableName;
Here date_parse(CheckStartTime,'%T') parses the string into a timestamp, and CheckStartTime is a varchar column of the table holding values in the format '12:32:20'.
Output:
13 (the hour of the input time plus one)

Initcap of word

I have a table x with a column resource_name that contains data like NASRI(SRI).
When I apply initcap to this column, the output is Nasri(sri), but my expected output is Nasri(Sri).
How can I achieve the desired result?
Thank you
One possible solution is to use split() with concat_ws(). It also works correctly if the value does not contain '('. Demo with ():
hive> select concat_ws('(',initcap(split('NASRI(SRI)','\\(')[0]),
initcap(split('NASRI(SRI)','\\(')[1])
);
OK
Nasri(Sri)
Time taken: 0.974 seconds, Fetched: 1 row(s)
And for a value without () it also works correctly:
hive> select concat_ws('(',initcap(split('NASRI','\\(')[0]),
initcap(split('NASRI','\\(')[1])
);
OK
Nasri
Time taken: 0.697 seconds, Fetched: 1 row(s)
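The split-based query handles one opening parenthesis. If the rule is really "capitalize the first letter after any non-alphanumeric character", the logic could be sketched in Java as below, for example as the body of a custom UDF. The class InitcapWords and the method initcap are hypothetical names, and treating every non-alphanumeric character as a word boundary is an assumption, not something the question specifies.

```java
// Hypothetical helper; not Hive's built-in initcap.
final class InitcapWords {

    // Upper-cases the first letter of every alphanumeric run and lower-cases the rest.
    static String initcap(String s) {
        StringBuilder out = new StringBuilder(s.length());
        boolean startOfWord = true;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                out.append(startOfWord ? Character.toUpperCase(c) : Character.toLowerCase(c));
                startOfWord = false;
            } else {
                out.append(c);          // separators such as '(' are copied unchanged
                startOfWord = true;     // and start a new word
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(initcap("NASRI(SRI)"));
        System.out.println(initcap("NASRI"));
    }
}
```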

Hive FROM_UNIXTIME() with milliseconds

I have seen plenty of posts where we divide by 1000 or cast to convert a milliseconds epoch time to a timestamp. I would like to know how to retain the milliseconds piece in the timestamp as well.
Take 1440478800123: the last 3 digits are milliseconds. How do I convert this to something like YYYYMMDDHHMMSS.sss?
I need to capture the millisecond portion in the converted timestamp as well.
Thanks
select cast(epoch_ms as timestamp)
actually works, because when casting to a timestamp (as opposed to using from_unixtime()), Hive seems to assume an int or bigint is milliseconds. A floating point type is treated as seconds. That is undocumented as far as I can see, and possibly a bug.
I wanted a string which includes the timezone (which can be important, particularly if the server changes to summer/daylight savings time), and wanted to be explicit about the conversion in case the cast functionality changes. So this gives an ISO 8601-style date (adjust the format string as needed for another format):
select from_unixtime(
floor( epoch_ms / 1000 )
, printf( 'yyyy-MM-dd HH:mm:ss.%03dZ', epoch_ms % 1000 )
)
Create a Hive UDF in Java:
package com.kishore.hiveudf;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;
@UDFType(stateful = true)
public class TimestampToDateUDF extends UDF {
    String dateFormatted;
    // timestamp is epoch time in milliseconds
    public String evaluate(long timestamp) {
        Date date = new Date(timestamp);
        // yyyy = calendar year, dd = day of month
        // ("YYYY" would be the week year and "DD" the day of the year)
        DateFormat formatter = new SimpleDateFormat("yyyyMMddHHmmss:SSS");
        dateFormatted = formatter.format(date);
        return dateFormatted;
    }
}
Export it as TimestampToDateUDF.jar:
hive> ADD JAR /home/kishore/TimestampToDateUDF.jar;
hive> create TEMPORARY FUNCTION toDate AS 'com.kishore.hiveudf.TimestampToDateUDF' ;
Output:
hive> select * from tableA;
OK
1440753288123
Time taken: 0.071 seconds, Fetched: 1 row(s)
hive> select toDate(timestamp) from tableA;
OK
20150828144448:123
Time taken: 0.08 seconds, Fetched: 1 row(s)
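As a variation on the UDF body, the formatting step could also be written with java.time, which makes the time zone explicit rather than relying on the server default. This is a sketch under assumptions: the class EpochMillisFormatter and the method format are hypothetical names, the pattern follows the YYYYMMDDHHMMSS.sss layout asked for in the question, and the Hive UDF wrapper is left out.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper showing only the formatting step.
final class EpochMillisFormatter {

    // uuuu = year, MM = month, dd = day of month, SSS = milliseconds.
    private static final DateTimeFormatter COMPACT =
            DateTimeFormatter.ofPattern("uuuuMMddHHmmss.SSS");

    // Formats an epoch value in milliseconds in the given time zone.
    static String format(long epochMillis, ZoneId zone) {
        return COMPACT.format(Instant.ofEpochMilli(epochMillis).atZone(zone));
    }

    public static void main(String[] args) {
        System.out.println(format(1440753288123L, ZoneOffset.UTC));
        System.out.println(format(1440753288123L, ZoneId.of("Asia/Kolkata")));
    }
}
```

Passing the zone as a parameter keeps the result stable if the server's default zone or daylight-saving rules change.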