How can I return the month from a string column with values like "20180912" in Hive?
It's strange that it worked fine with the month() function on the string type in Hive; however, it returns NULL now.
And month(from_unixtime(unix_timestamp(date,'yyyymmdd'))) returns values that do not match the real month.
Use substr():
hive> select substr('20180912',5,2);
OK
09
Time taken: 1.675 seconds, Fetched: 1 row(s)
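As an aside, the month() call in the question likely misbehaved because Java date patterns are case-sensitive: 'mm' means minutes, while 'MM' means month. With the corrected pattern, month() works on the parsed value (a sketch, using the same sample string):

```sql
-- 'MM' (month) instead of 'mm' (minutes) in the parse pattern:
select month(from_unixtime(unix_timestamp('20180912','yyyyMMdd')));
-- 9
```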
Related
I have a requirement to convert the mentioned input string format and produce the desired output in timestamp as shown below.
Input: 16AUG2001:23:46:32.876086
Desired Output: 2001-08-16 23:46:32.876086
Output actually produced by the code below: 2001-08-17 00:01:08
Query:
select '16AUG2001:23:46:32.876086' as row_ins_timestamp,
from_unixtime(unix_timestamp('16AUG2001:23:46:32.876086',
'ddMMMyyyy:HH:mm:ss.SSSSSS')) as row_ins_timestamp
from temp;
The milliseconds part is not being converted as required. Please suggest.
unix_timestamp function does not preserve milliseconds.
Convert without milliseconds, then concatenate with millisecond part:
with your_data as (
select stack(3,
'16AUG2001:23:46:32.876086',
'16AUG2001:23:46:32',
'16AUG2001:23:46:32.123'
) as ts
)
select concat_ws('.',
       from_unixtime(unix_timestamp(split(ts,'\\.')[0],'ddMMMyyyy:HH:mm:ss')),
       split(ts,'\\.')[1]
      )
from your_data;
Result:
2001-08-16 23:46:32.876086
2001-08-16 23:46:32
2001-08-16 23:46:32.123
Time taken: 0.089 seconds, Fetched: 3 row(s)
I need to convert an integer value to the largest numeric data type in Hive, as my value is 25 digits long.
select cast(18446744073709551614 as bigint);
NULL is returned for the above select statement.
I am well aware that the supplied number is greater than the largest BIGINT value, but we are getting such values, and I have to calculate max, min, sum and avg over them.
So how can I cast this type of value so that I do not get NULLs?
Use decimal(38,0) for storing numbers bigger than BIGINT: it can store 38 digits, while BIGINT can store only 19. See also the manual on the decimal type.
For literals, the BD postfix is required. Example:
hive> select CAST(18446744073709551614BD AS DECIMAL(38,0))+CAST(18446744073709551614BD AS DECIMAL(38,0));
OK
36893488147419103228
Time taken: 0.334 seconds, Fetched: 1 row(s)
hive> select CAST(18446744073709551614BD AS DECIMAL(38,0))*2;
OK
36893488147419103228
Time taken: 0.129 seconds, Fetched: 1 row(s)
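The decimal type also carries through aggregations, so the asker's max/min/sum/avg requirement can be met directly (a sketch, assuming a hypothetical table big_vals with a column v declared as decimal(38,0)):

```sql
-- v is decimal(38,0), wide enough for 25-digit values,
-- so the aggregates do not overflow to NULL the way BIGINT does:
select max(v), min(v), sum(v), avg(v)
from big_vals;
```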
I have a table x that contains the column resource_name, with data like NASRI(SRI).
Applying initcap to this column gives the output Nasri(sri), but my expected output is Nasri(Sri).
How I can achieve the desired result?
Thank you
One possible solution is to use split() with concat_ws(). It also works correctly if the value does not contain parentheses. Demo with ():
hive> select concat_ws('(',initcap(split('NASRI(SRI)','\\(')[0]),
initcap(split('NASRI(SRI)','\\(')[1])
);
OK
Nasri(Sri)
Time taken: 0.974 seconds, Fetched: 1 row(s)
And for a value without () it also works correctly:
hive> select concat_ws('(',initcap(split('NASRI','\\(')[0]),
initcap(split('NASRI','\\(')[1])
);
OK
Nasri
Time taken: 0.697 seconds, Fetched: 1 row(s)
An example of this field is "/products/106017388" in the table.
What SQL query shall I write to get the number 106017388 from the field?
Many thanks.
You can try the Hive function regexp_extract.
Something like:
select regexp_extract(field_name, "([0-9]+)$", 1) from table_name;
See the Debuggex demo for a description of the regex ([0-9]+)$.
Documentation
You may use the split function in Hive to extract the required value, like below:
select * from test_stackoverflow;
1 /products/106017388
2 /products1/06017388
Time taken: 0.66 seconds, Fetched: 2 row(s)
select split(value,'[/]')[2] from test_stackoverflow;
OK
106017388
06017388
Time taken: 0.105 seconds, Fetched: 2 row(s)
Hope this helps!
SUBSTR('/products/106017388', 11)
to get only the integer part.
I'm working with Hive and I have a table structured as follows:
CREATE TABLE t1 (
id INT,
created TIMESTAMP,
some_value BIGINT
);
I need to find every row in t1 that is less than 180 days old. The following query yields no rows even though there is data present in the table that matches the search predicate.
select *
from t1
where created > date_sub(from_unixtime(unix_timestamp()), 180);
What is the appropriate way to perform a date comparison in Hive?
How about:
where unix_timestamp() - created < 180 * 24 * 60 * 60
Date math is usually simplest if you can just do it with the actual timestamp values.
Or do you want it to only cut off on whole days? Then I think the problem is with how you are converting back and forth between ints and strings. Try:
where created > unix_timestamp(date_sub(from_unixtime(unix_timestamp(),'yyyy-MM-dd'),180),'yyyy-MM-dd')
Walking through each UDF:
unix_timestamp() returns an int: current time in seconds since epoch
from_unixtime(,'yyyy-MM-dd') converts to a string of the given format, e.g. '2012-12-28'
date_sub(,180) subtracts 180 days from that string, and returns a new string in the same format.
unix_timestamp(,'yyyy-MM-dd') converts that string back to an int
If that's all getting too hairy, you can always write a UDF to do it yourself.
Alternatively you may also use datediff. Then the where clause would be
in case of String timestamp (jdbc format) :
datediff(from_unixtime(unix_timestamp()), created) < 180;
in case of Unix epoch time:
datediff(from_unixtime(unix_timestamp()), from_unixtime(created)) < 180;
I think maybe it's a Hive bug dealing with the timestamp type. I've been trying to use it recently and getting incorrect results.
If I change your schema to use a string instead of timestamp, and supply values in the
yyyy-MM-dd HH:mm:ss
format, then the select query worked for me.
According to the documentation, Hive should be able to convert a BIGINT representing epoch seconds to a timestamp, and that all existing datetime UDFs work with the timestamp data type.
with this simple query:
select from_unixtime(unix_timestamp()), cast(unix_timestamp() as timestamp) from test_tt limit 1;
I would expect both fields to be the same, but I get:
2012-12-29 00:47:43 1970-01-16 16:52:22.063
I'm seeing other weirdness as well.
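The discrepancy above follows from the integer cast being interpreted as milliseconds rather than seconds (in Hive versions with this behavior). Scaling by 1000 before the cast makes the two columns agree (a sketch against the same hypothetical test_tt table):

```sql
-- unix_timestamp() is seconds since epoch; the CAST to TIMESTAMP here
-- treats the integer as milliseconds, so scale by 1000 first:
select from_unixtime(unix_timestamp()),
       cast(unix_timestamp() * 1000 as timestamp)
from test_tt limit 1;
```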
TIMESTAMP is milliseconds
unix_timestamp is in seconds
You need to multiply the RHS by 1000 (and since a date string can't be multiplied directly, convert it back to epoch seconds first):
where created > 1000 * unix_timestamp(date_sub(from_unixtime(unix_timestamp()), 180), 'yyyy-MM-dd');
After reviewing this and referring to Date Difference less than 15 minutes in Hive I came up with a solution. While I'm not sure why Hive doesn't perform the comparison effectively on dates as strings (they should sort and compare lexicographically), the following solution works:
FROM (
SELECT id, HighestPrice, LowestPrice,
unix_timestamp(created) c_ts,
unix_timestamp(date_sub(from_unixtime(unix_timestamp()), 180), 'yyyy-MM-dd') c180_ts
FROM t1
) x
JOIN t1 t ON x.id = t.id
SELECT to_date(t.Created),
x.id, AVG(COALESCE(x.HighestPrice, 0)), AVG(COALESCE(x.LowestPrice, 0))
WHERE unix_timestamp(t.Created) > x.c180_ts
GROUP BY to_date(t.Created), x.id ;