Hive FROM_UNIXTIME() with milliseconds

I have seen enough posts where we divide by 1000 or cast to convert from milliseconds epoch time to a timestamp, but I would like to know how we can retain the milliseconds piece in the timestamp too.
For example, in 1440478800123 the last 3 digits are milliseconds. How do I convert this to something like YYYYMMDDHHMMSS.sss?
I need to capture the millisecond portion in the converted timestamp as well.
Thanks

select cast(epoch_ms as timestamp)
actually works, because when casting to a timestamp (as opposed to using from_unixtime()), Hive seems to assume an int or bigint is milliseconds, while a floating-point type is treated as seconds. That is undocumented as far as I can see, and possibly a bug. I wanted a string which includes the timezone (which can be important, particularly if the server changes to summer/daylight saving time), and I wanted to be explicit about the conversion in case the cast behavior changes. So this gives an ISO 8601 date (adjust the format string as needed for another format):
select from_unixtime(
    floor(epoch_ms / 1000),
    printf('yyyy-MM-dd HH:mm:ss.%03dZ', epoch_ms % 1000)
)

Alternatively, create a Hive UDF in Java:
package com.kishore.hiveudf;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;

@UDFType(stateful = true)
public class TimestampToDateUDF extends UDF {
    // Formats an epoch-milliseconds value as yyyyMMddHHmmss.SSS
    // (lowercase yyyy/dd: in SimpleDateFormat, YYYY is week-year and DD is day-of-year)
    public String evaluate(long timestamp) {
        Date date = new Date(timestamp);
        DateFormat formatter = new SimpleDateFormat("yyyyMMddHHmmss.SSS");
        return formatter.format(date);
    }
}
Export it as TimestampToDateUDF.jar, then register it in Hive:
hive> ADD JAR /home/kishore/TimestampToDateUDF.jar;
hive> CREATE TEMPORARY FUNCTION toDate AS 'com.kishore.hiveudf.TimestampToDateUDF';
Output:
select * from tableA;
OK
1440753288123
Time taken: 0.071 seconds, Fetched: 1 row(s)
hive> select toDate(timestamp) from tableA;
OK
20150828144448.123
Time taken: 0.08 seconds, Fetched: 1 row(s)
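For comparison, on Hive 1.2+ the same formatting can be had without a UDF via date_format(), leaning on the cast-of-bigint-as-milliseconds behavior described above (a sketch; the result depends on the server timezone):
hive> select date_format(cast(1440753288123 as timestamp), 'yyyyMMddHHmmss.SSS');
-- 20150828144448.123 in the same server timezone as the UDF output above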

Related

BigQuery Timestamp in Select giving different format

I am using the BigQuery query API to retrieve data from BigQuery. For a timestamp column, I am getting values in a different format.
query="select * from table"
QueryJobConfiguration queryConfig = QueryJobConfiguration
.newBuilder(query)
.setUseLegacySql(false)
.build();
Value in Table : "2022-02-25 08:47:48.801665"
Value in Output : 1.645778868801665E9
If I cast it to a string, I get the proper value. Why is this happening? Can someone explain?
The API returns a TIMESTAMP column as seconds since 1970-01-01T00:00:00 UTC, expressed as a floating-point number; that's why you are getting this result.
When passing such a value back as a query parameter, note that QueryParameterValue.timestamp() takes microseconds since the epoch, so you can convert with timestamp.toInstant().toEpochMilli() * 1000.
You can see this example:
QueryParameterValue.timestamp(
// Timestamp takes microseconds since 1970-01-01T00:00:00 UTC
timestamp.toInstant().toEpochMilli() * 1000))
If you want to do the conversion in BigQuery itself, you have some options.
Cast the value to TIMESTAMP:
SELECT CAST("2022-02-25 08:47:48.801665" AS TIMESTAMP)
Or cast the TIMESTAMP to a STRING:
SELECT STRING(TIMESTAMP "2022-02-25 08:47:48.801665", "UTC") AS string;
Or apply a format:
SELECT FORMAT_TIMESTAMP("%c", TIMESTAMP "2022-02-25 08:47:48.801665", "UTC") AS formatted;
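And if you are starting from the float value the API returned, a minimal sketch that recovers the readable timestamp (using the value from the question):
SELECT TIMESTAMP_MICROS(CAST(1.645778868801665E9 * 1000000 AS INT64));
-- 2022-02-25 08:47:48.801665 UTC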

Casting DATE/TIMESTAMP types to NUMERIC is prohibited in Hive 3.1.3

I am trying to cast dates from string format to a numeric format in milliseconds, keeping the .SSS part as well, since I need to process data at the level of millisecond durations. In Hive 1.1.0 I was able to do that with the code below, but the newer version does not let me:
select current_timestamp(),
       unix_timestamp(current_timestamp(), 'yyyy-MM-dd HH:mm:ss.SSS') * 1000,
       cast(cast(date_format(cast(current_timestamp() as string), 'yyyy-MM-dd HH:mm:ss.SSS') as timestamp) as double) * 1000 as time_milliseconds
Can you tell me a workaround to this?
Thank you
Extract the millisecond part from the string and add it to (timestamp in seconds) * 1000:
select current_timestamp(),
       --unix_timestamp returns seconds only
       unix_timestamp(current_timestamp()) * 1000,  --without .SSS
       unix_timestamp(current_timestamp()) * 1000 +
         bigint(regexp_extract(string(current_timestamp()), '\\.(\\d+)$', 1))  --with .SSS
Result:
2021-09-21 13:52:32.034 1632232352000 1632232352034
The explicit conversions to bigint and string may not be necessary.
One more way to get the milliseconds part is to split the string by the dot and take element #1, i.e. split(current_timestamp(),'\\.')[1] instead of regexp_extract(string(current_timestamp()),'\\.(\\d+)$',1):
select ts,
       unix_timestamp(ts_splitted[0]) * 1000,
       unix_timestamp(ts_splitted[0]) * 1000 + ts_splitted[1]
from
(
  select current_timestamp() ts, split(current_timestamp(), '\\.') ts_splitted
) s
Result:
2021-09-21 18:21:11.032 1632248471000 1632248471032
I prefer this method. Of course, if you have timestamps with microseconds or nanoseconds, the logic should be adjusted based on the length of the fractional part.
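For example, a minimal sketch (assuming the fractional part can be 1 to 9 digits) that normalizes it to exactly three millisecond digits; Hive's rpad() both pads short fractions and truncates long ones to the given length:
select ts,
       unix_timestamp(ts_splitted[0]) * 1000
         + cast(rpad(ts_splitted[1], 3, '0') as bigint) as epoch_ms  -- '1' -> 100, '876086' -> 876
from
(
  select current_timestamp() ts, split(current_timestamp(), '\\.') ts_splitted
) s;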

Hive related date conversion

I am facing an issue while trying to add one year to the current timestamp.
I was able to add the year to the current timestamp, but the time portion does not come along in the result.
Any help would be a great support.
I am trying this: select from_unixtime(unix_timestamp());
If you want a timestamp one year later than now, you can do date arithmetic as follows:
select current_timestamp() + interval '1' year
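A quick illustration (output values are only for shape; note the time portion, milliseconds included, is preserved):
select current_timestamp()                      as now_ts,
       current_timestamp() + interval '1' year  as next_year;
-- 2021-04-25 08:22:17.948    2022-04-25 08:22:17.948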
You can also solve your problem by using a Hive UDF.
package com.practice;
import java.sql.Timestamp;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import org.apache.hadoop.hive.ql.exec.UDF;

public class addYearWithTimestamp extends UDF {
    private SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.S");

    // Parses a "yyyy-MM-dd HH:mm:ss.S" string, adds the given number of years,
    // and returns the shifted timestamp as a string.
    public String evaluate(String t, int year) throws ParseException {
        long time = formatter.parse(t).getTime();
        Timestamp ts = new Timestamp(time);
        Calendar cal = Calendar.getInstance();
        cal.setTime(ts);
        cal.add(Calendar.YEAR, year);
        ts.setTime(cal.getTime().getTime());
        return ts.toString();
    }
}
After creating addYearWithTimestamp.jar, register it in Hive and create the UDF:
ADD JAR /home/cloudera/Desktop/addYearWithTimestamp.jar;
CREATE TEMPORARY FUNCTION addYear as 'com.practice.addYearWithTimestamp';
Use the UDF:
hive> SELECT addYear(current_timestamp,1);
OK
2021-04-25 08:22:17.948
Time taken: 0.083 seconds, Fetched: 1 row(s)

Hive String to Timestamp conversion with Milliseconds

I have a requirement to convert the input string format shown below and produce the desired output as a timestamp.
Input: 16AUG2001:23:46:32.876086
Desired Output: 2001-08-16 23:46:32.876086
Output produced by running the code below: 2001-08-17 00:01:08
Query:
select '16AUG2001:23:46:32.876086' as row_ins_timestamp,
       from_unixtime(unix_timestamp('16AUG2001:23:46:32.876086',
                                    'ddMMMyyyy:HH:mm:ss.SSSSSS')) as row_ins_timestamp
from temp;
The milliseconds part is not getting converted as required. Please suggest.
The unix_timestamp() function does not preserve milliseconds; worse, 'SSSSSS' here is parsed as 876086 milliseconds (about 14.6 minutes), which is exactly the skew in your output.
Convert without milliseconds, then concatenate the millisecond part back on:
with your_data as (
  select stack(3,
    '16AUG2001:23:46:32.876086',
    '16AUG2001:23:46:32',
    '16AUG2001:23:46:32.123'
  ) as ts
)
select concat_ws('.',
         from_unixtime(unix_timestamp(split(ts, '\\.')[0], 'ddMMMyyyy:HH:mm:ss')),
         split(ts, '\\.')[1])
from your_data;
Result:
2001-08-16 23:46:32.876086
2001-08-16 23:46:32
2001-08-16 23:46:32.123
Time taken: 0.089 seconds, Fetched: 3 row(s)

Date comparison in Hive

I'm working with Hive and I have a table structured as follows:
CREATE TABLE t1 (
id INT,
created TIMESTAMP,
some_value BIGINT
);
I need to find every row in t1 that is less than 180 days old. The following query yields no rows even though there is data present in the table that matches the search predicate.
select *
from t1
where created > date_sub(from_unixtime(unix_timestamp()), 180);
What is the appropriate way to perform a date comparison in Hive?
How about:
where unix_timestamp() - created < 180 * 24 * 60 * 60
Date math is usually simplest if you can just do it with the actual timestamp values.
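A sketch of the same idea with created explicitly converted to seconds (the bare comparison above leans on implicit conversion, which is exactly what bites elsewhere in this thread):
select *
from t1
where unix_timestamp() - unix_timestamp(created) < 180 * 24 * 60 * 60;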
Or do you want it to only cut off on whole days? Then I think the problem is with how you are converting back and forth between ints and strings. Try:
where created > unix_timestamp(date_sub(from_unixtime(unix_timestamp(),'yyyy-MM-dd'),180),'yyyy-MM-dd')
Walking through each UDF:
unix_timestamp() returns an int: current time in seconds since epoch
from_unixtime(,'yyyy-MM-dd') converts to a string of the given format, e.g. '2012-12-28'
date_sub(,180) subtracts 180 days from that string, and returns a new string in the same format.
unix_timestamp(,'yyyy-MM-dd') converts that string back to an int
If that's all getting too hairy, you can always write a UDF to do it yourself.
Alternatively you may also use datediff. Then the where clause would be:
In case of a string timestamp (JDBC format):
datediff(from_unixtime(unix_timestamp()), created) < 180;
In case of Unix epoch time:
datediff(from_unixtime(unix_timestamp()), from_unixtime(created)) < 180;
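Put together against the question's table (a sketch, assuming created holds a JDBC-format string):
select *
from t1
where datediff(from_unixtime(unix_timestamp()), created) < 180;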
I think maybe it's a Hive bug dealing with the timestamp type. I've been trying to use it recently and getting incorrect results.
If I change your schema to use a string instead of timestamp, and supply values in the
yyyy-MM-dd HH:mm:ss
format, then the select query worked for me.
According to the documentation, Hive should be able to convert a BIGINT representing epoch seconds to a timestamp, and all existing datetime UDFs should work with the timestamp data type.
With this simple query:
select from_unixtime(unix_timestamp()),
       cast(unix_timestamp() as timestamp)
from test_tt limit 1;
I would expect both fields to be the same, but I get:
2012-12-29 00:47:43 1970-01-16 16:52:22.063
I'm seeing other weirdness as well. (The second value is consistent with the cast treating the integer as milliseconds, as noted in the first answer above: roughly 1.36 billion seconds read as milliseconds lands in mid-January 1970.)
TIMESTAMP is in milliseconds, while unix_timestamp() is in seconds, so you need to multiply the RHS by 1000:
where created > 1000 * date_sub(from_unixtime(unix_timestamp()), 180);
After reviewing this and referring to Date Difference less than 15 minutes in Hive I came up with a solution. While I'm not sure why Hive doesn't perform the comparison effectively on dates as strings (they should sort and compare lexicographically), the following solution works:
FROM (
  SELECT id, value,
         unix_timestamp(created) c_ts,
         unix_timestamp(date_sub(from_unixtime(unix_timestamp()), 180), 'yyyy-MM-dd') c180_ts
  FROM t1
) x
JOIN t1 t ON x.id = t.id
SELECT to_date(t.Created),
       x.id, AVG(COALESCE(x.HighestPrice, 0)), AVG(COALESCE(x.LowestPrice, 0))
WHERE unix_timestamp(t.Created) > x.c180_ts
GROUP BY to_date(t.Created), x.id;