How can I print milliseconds in Hive without ignoring zeros - sql

I am trying to print timestamps in the format 'yyyy-MM-dd HH:mm:ss.SSS', i.e. with exactly three millisecond digits at the end.
This is what I get normally.
hive> select current_timestamp();
OK
2020-09-22 12:00:26.658
But in edge cases I also get
hive> select current_timestamp();
OK
2020-09-22 12:00:25.5
Time taken: 0.065 seconds, Fetched: 1 row(s)
hive> select cast(current_timestamp() as timestamp);
OK
2020-09-22 12:00:00.09
Time taken: 0.084 seconds, Fetched: 1 row(s)
hive> select current_timestamp() as string;
OK
2020-09-22 11:07:12.27
Time taken: 0.076 seconds, Fetched: 1 row(s)
What I expect is for the trailing zeros not to be dropped, like this:
hive> select current_timestamp();
OK
2020-09-22 12:00:25.500
Time taken: 0.065 seconds, Fetched: 1 row(s)
hive> select cast(current_timestamp() as timestamp);
OK
2020-09-22 12:00:00.090
Time taken: 0.084 seconds, Fetched: 1 row(s)
hive> select current_timestamp();
OK
2020-09-22 11:07:12.270
Time taken: 0.076 seconds, Fetched: 1 row(s)
What I tried:
hive> select from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:MM:ss.S');
unix_timestamp(void) is deprecated. Use current_timestamp instead.
OK
2020-09-22 11:09:30.0
Time taken: 0.064 seconds, Fetched: 1 row(s)
I also tried casting current_timestamp() to string so it wouldn't drop the trailing zeros, but that doesn't work either.

Try rpad(string str, int len, string pad).
Doc:
Returns str, right-padded with pad to a length of len. If str is longer than len, the return value is shortened to len characters. In case of empty pad string, the return value is null.
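For example, a sketch along these lines pads the string form of the timestamp out to 23 characters (this assumes the value always contains a dot and at least one fractional digit, as in the outputs above; if the fraction can be missing entirely, the date_format() approach below is more robust):
-- pad '2020-09-22 12:00:25.5' (21 chars) to '2020-09-22 12:00:25.500' (23 chars)
select rpad(cast(current_timestamp() as string), 23, '0');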

Does it work if you use date_format()?
select date_format(current_timestamp, 'yyyy-MM-dd HH:mm:ss.SSS')
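With the 'SSS' pattern the fractional part is zero-padded to three digits, so a value such as 2020-09-22 12:00:25.5 should come back as 2020-09-22 12:00:25.500.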

Related

Issue with Hive Data types

We have 3 columns from the source: colA has 3 digits, colB has 5 digits, and colC has 5 digits.
We need to create a 13-digit unique id based on the above 3 columns.
Query used - select colA*1000000000000 + colB*100000 + colC
Example -
hive> select 123*1000000000000 + 12345*100000 + 12345;
OK
123001234512345 -- Not Expected
Time taken: 0.091 seconds, Fetched: 1 row(s)
On checking further, the Hive queries below do not give me the correct results.
hive> !hive --version;
Hive 2.3.3-mapr-1904-r9
Git git://738a1fde0d37/root/opensource/mapr-hive-2.3/dl/mapr-hive-2.3 -r 265b539b942d0b9f4811b15880204dec5c0c7e1b
Compiled by root on Tue Aug 6 05:36:17 PDT 2019
From source with checksum 88f44b7532ffd7141c15cb5742e9cb51
hive> select cast(12345*1000000 as bigint);
OK
-539901888
Time taken: 0.126 seconds, Fetched: 1 row(s)
hive> select cast(12345*10000000 as bigint);
OK
-1104051584
Time taken: 0.02 seconds, Fetched: 1 row(s)
hive> select cast(12345*100000000 as bigint);
OK
1844386048
Time taken: 0.018 seconds, Fetched: 1 row(s)
hive> select cast(12345*1000000000 as bigint);
OK
1263991296
Time taken: 0.032 seconds, Fetched: 1 row(s)
Whereas the queries below work:
hive> select cast(12345*10000000000 as bigint);
OK
123450000000000
Time taken: 0.017 seconds, Fetched: 1 row(s)
hive> select cast(12345*1000 as bigint);
OK
12345000
Time taken: 0.025 seconds, Fetched: 1 row(s)
hive> select cast(12345*10000 as bigint);
OK
123450000
Time taken: 0.035 seconds, Fetched: 1 row(s)
hive> select cast(12345*100000 as bigint);
OK
1234500000
Time taken: 0.247 seconds, Fetched: 1 row(s)
As the documentation explains:
Integral literals are assumed to be INT by default, unless the number exceeds the range of INT in which case it is interpreted as a BIGINT, or if one of the following postfixes is present on the number.
In this expression:
cast(12345*1000000 as bigint)
The result of 12345*1000000 is cast to bigint only after the multiplication has already been performed (and has overflowed) using INT arithmetic. To do the multiplication itself in BIGINT, you need to cast before multiplying:
12345 * cast(1000000 as bigint)
Or, you can use the suffixes:
12345L * 1000000L
Note that no explicit cast() is required because the values are already bigint.
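Applied to the original question, a sketch along these lines (assuming a hypothetical table src with INT columns colA, colB, colC) keeps the whole expression in BIGINT; for a 13-digit id the 3-digit colA is shifted by ten places:
-- the L suffix makes the literals BIGINT, so no intermediate result overflows INT
select colA * 10000000000L + colB * 100000L + colC as unique_id
from src;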

conversion from string to timestamp is not working

The data in the table as below.
The column jobdate data type is string.
jobdate
1536945012211.kc
1536945014231.kc
1536945312809.kc
I want to convert it to a timestamp in the format 2018-12-205 06:15:10.505.
I have tried the following queries, but they return NULL.
select jobdate,from_unixtime(unix_timestamp(substr(jobdate,1,14),'YYYY-MM-DD HH:mm:ss.SSS')) from job_log;
select jobdate,from_unixtime(unix_timestamp(jobdate,'YYYY-MM-DD HH:mm:ss.SSS')) from job_log;
select jobdate,cast(date_format(jobdate,'YYYY-MM-DD HH:mm:ss.SSS') as timestamp) from job_log;
Please help me.
Thanks in advance
The original values are epoch timestamps in milliseconds, while unix_timestamp/from_unixtime work in seconds, so use only the first 10 digits. Also note that 'DD' is day-of-year; use 'dd' for day-of-month (compare the two queries below):
hive> select from_unixtime(cast(substr('1536945012211.kc',1,10) as int),'yyyy-MM-DD HH:mm:ss.SSS');
OK
2018-09-257 10:10:12.000
Time taken: 0.832 seconds, Fetched: 1 row(s)
hive> select from_unixtime(cast(substr('1536945012211.kc',1,10) as int),'yyyy-MM-dd HH:mm:ss.SSS');
OK
2018-09-14 10:10:12.000
Time taken: 0.061 seconds, Fetched: 1 row(s)
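Applied to the table from the question, a sketch like this (assuming job_log and its jobdate column) converts the first 10 digits as epoch seconds and re-attaches digits 11-13 as milliseconds; cast the result to timestamp if a real timestamp type is needed:
-- '1536945012211.kc' -> '2018-09-14 10:10:12.211'
select jobdate,
       concat(from_unixtime(cast(substr(jobdate, 1, 10) as bigint), 'yyyy-MM-dd HH:mm:ss'),
              '.', substr(jobdate, 11, 3)) as job_ts
from job_log;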

How to set decimal values in hive stack command

I am trying to execute the Hive stack command below:
select stack(2,'A',10.1, '2015-01-01','B',20.123, '2016-01-01');
But it gives me an error because of inconsistent decimal precisions; below is the error message:
Error: org.apache.spark.sql.AnalysisException: cannot resolve 'stack(2, 'A', 10.1BD, '2015-01-01', 'B', 20.123BD, '2016-01-01')' due to data type mismatch: Argument 2 (decimal(3,1)) != Argument 5 (decimal(5,3)); line 1 pos 7;
'Project [unresolvedalias(stack(2, A, 10.1, 2015-01-01, B, 20.123, 2016-01-01), None)]
+- OneRowRelation (state=,code=0)
Cast explicitly to double, or to decimal with the required precision and scale:
hive> select stack(2,'A',cast(10.1 as double), '2015-01-01','B',cast(20.123 as double), '2016-01-01');
OK
A 10.1 2015-01-01
B 20.123 2016-01-01
Time taken: 2.818 seconds, Fetched: 2 row(s)
hive> select stack(2,'A',cast(10.1 as decimal(5,3)), '2015-01-01','B',cast(20.123 as decimal(5,3)), '2016-01-01');
OK
A 10.1 2015-01-01
B 20.123 2016-01-01
Time taken: 0.066 seconds, Fetched: 2 row(s)

date time comparisons in hive sql

In Hive SQL I have the following fields as date-time values:
date_time
2017-01-01 12:00:00
min_date
2017-02-01 12:00:00
Can I compare both fields as date_time > min_date in my SQL query?
How do we compare date-times in Hive SQL? Both are of timestamp type.
You can compare timestamps directly, or strings as long as they are in a sortable format like yyyy-MM-dd HH:mm:ss[.f...].
Demo:
hive> select cast('2017-01-01 12:00:00' as timestamp)>cast('2017-02-01 12:00:00' as timestamp);
OK
false
Time taken: 0.13 seconds, Fetched: 1 row(s)
Example with strings:
hive> select '2017-01-01 12:00:00'>'2017-02-01 12:00:00';
OK
false
Time taken: 1.053 seconds, Fetched: 1 row(s)
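Applied to table columns, the same comparison works in a WHERE clause; a sketch assuming a hypothetical table my_table with timestamp columns date_time and min_date:
-- rows where date_time is strictly later than min_date
select *
from my_table
where date_time > min_date;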

How to trim leading zero in Hive

How do I trim leading zeros in Hive? I searched a lot on Google but didn't find anything that solves my problem.
If the digits are "00000012300234", I want a result like "12300234".
You can achieve it by using the regexp_replace string function:
regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT)
The following removes leading zeroes, but leaves one if necessary (i.e. it wouldn't just turn "0" to a blank string).
hive> SELECT regexp_replace( "00000012300234","^0+(?!$)","") ;
OK
12300234
Time taken: 0.156 seconds, Fetched: 1 row(s)
hive> SELECT regexp_replace( "000000","^0+(?!$)","") ;
OK
0
Time taken: 0.157 seconds, Fetched: 1 row(s)
hive> SELECT regexp_replace( "0","^0+(?!$)","") ;
OK
0
Time taken: 0.12 seconds, Fetched: 1 row(s)
Or using CAST: cast to INT, then back to string:
hive> SELECT CAST(CAST( "00000012300234" AS INT) as string);
OK
12300234
Time taken: 0.115 seconds, Fetched: 1 row(s)
hive> SELECT CAST( "00000012300234" AS INT);
OK
12300234
Time taken: 0.379 seconds, Fetched: 1 row(s)
Nothing special to do, just cast the string to INT:
SELECT CAST( "00000012300234" AS INT);
it will return 12300234
SELECT CAST( "00000012300234" AS INT) FROM <your_table> ;
-- The above SQL works. But if the number goes above the INT range, then you need to use "BIGINT" instead of "INT", else you will see NULLs :-)
SELECT CAST( "00000012300234" AS BIGINT) FROM <your_table>;