Hive variable concatenation - variables

I am having trouble concatenating the value of a variable with a string.
My script contains the following:
set hivevar:tab_dt= substr(date_sub(current_date,1),1,10);
CREATE TABLE default.udr_lt_bc_${hivevar:tab_dt}
(
trans_id double
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
In the above, the variable tab_dt gets assigned correctly with yesterday's date in the format yyyymmdd.
But when I try to concatenate this variable with a static string in a table name, the script fails; the concatenation does not happen.
Kindly provide a solution.
Note: I also tried the following, which errors out as well:
set hivevar:tab_dt= substr(date_sub(current_date,1),1,10);
set hivevar:tab_nm1= default.udr_lt_bc_;
set hivevar:tab_name= concat(${hivevar:tab_dt},${hivevar:tab_nm1})
CREATE TABLE ${hivevar:tab_name}
(
trans_id double
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
This too is returning an error.

Hive does not evaluate expressions stored in variables; it substitutes their text as is.
Your create table expression results in this:
CREATE TABLE default.udr_lt_bc_substr(date_sub(current_date,1),1,10)...
Your second expression results in this:
CREATE TABLE concat(substr(date_sub(current_date,1),1,10),default.udr_lt_bc_)
Unfortunately, Hive does not support such expressions in DDL.
I recommend calculating this variable in the shell and passing it to the Hive script as a --hivevar.
For example, in the shell script:
table_name=udr_lt_bc_$(date +'%Y_%m_%d' --date "-1 day")
#table_name is udr_lt_bc_2017_10_31 now
#call your script
hive -hivevar table_name="$table_name" -f your_script.hql
Then you can use the variable in your_script.hql:
CREATE TABLE default.${hivevar:table_name}
Note that '-' is not allowed in table names, which is why I used '_' instead.
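The same calculation can also be done in Python instead of the shell. This is only a minimal sketch: the function names are made up for illustration, the script name your_script.hql is taken from the example above, and the command is built but not executed.

```python
from datetime import date, timedelta


def yesterday_table_name(today, prefix="udr_lt_bc_"):
    """Return prefix + yesterday's date as yyyy_MM_dd (underscores, not dashes)."""
    yesterday = today - timedelta(days=1)
    return prefix + yesterday.strftime("%Y_%m_%d")


def hive_command(table_name, script="your_script.hql"):
    """Build the hive CLI argument list; hand it to subprocess.run to execute it."""
    return ["hive", "--hivevar", "table_name=" + table_name, "-f", script]


name = yesterday_table_name(date(2017, 11, 1))
print(name)                # udr_lt_bc_2017_10_31
print(hive_command(name))
```

Passing the arguments as a list avoids a second layer of shell quoting around the variable value.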
To better understand how Hive substitutes variables, try this:
hive> set hivevar:tab_dt= substr(date_sub(current_date,1),1,10);
hive> select ${hivevar:tab_dt};
OK
2017-10-31
Time taken: 1.406 seconds, Fetched: 1 row(s)
hive> select '${hivevar:tab_dt}';
OK
substr(date_sub(current_date,1),1,10)
Time taken: 0.087 seconds, Fetched: 1 row(s)
Note that in the first select statement the variable was substituted as is before execution, and the resulting expression was then evaluated as part of the SQL. In the second select statement the quotes prevent evaluation: the substituted text becomes a string literal and is returned unchanged as substr(date_sub(current_date,1),1,10).
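To make the "substituted as is" behaviour concrete, here is a small Python model of it. It is an illustration of the idea only, not Hive's actual implementation; the substitute function and the regular expression are assumptions made for this sketch.

```python
import re

# Hypothetical model of hivevar substitution: plain text replacement, no evaluation.
HIVEVAR = re.compile(r"\$\{hivevar:(\w+)\}")


def substitute(sql, variables):
    """Replace each ${hivevar:name} with the raw text stored under that name."""
    return HIVEVAR.sub(lambda m: variables[m.group(1)], sql)


variables = {"tab_dt": "substr(date_sub(current_date,1),1,10)"}

print(substitute("select ${hivevar:tab_dt};", variables))
# select substr(date_sub(current_date,1),1,10);
print(substitute("select '${hivevar:tab_dt}';", variables))
# select 'substr(date_sub(current_date,1),1,10)';
print(substitute("CREATE TABLE default.udr_lt_bc_${hivevar:tab_dt}", variables))
# CREATE TABLE default.udr_lt_bc_substr(date_sub(current_date,1),1,10)
```

The last line shows why the CREATE TABLE fails: the expression text lands in the middle of an identifier, where nothing will ever evaluate it.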

Another way in Hive:
select concat("table_",date_sub(from_unixtime(unix_timestamp(current_date,'yyyy-MM-dd'),'yyyy-MM-dd'),0));
This expression can be stored in a variable and used as needed.

Related

Expression in Hive LIMIT clause

In Impala, you can do this:
SELECT x FROM t1 LIMIT cast(truncate(9.9) AS INT);
But in Hive, it seems to only take LIMIT [constant].
Is there a way to add expression in LIMIT?
https://www.cloudera.com/documentation/enterprise/5-6-x/topics/impala_limit.html
Unfortunately, this is not possible in Hive. As a workaround, you can calculate the value in the shell and pass it to Hive with --hivevar. The LIMIT clause accepts only pre-calculated variables or constants as arguments.
Demo with a variable, which you can also pass as a --hivevar argument on the hive command line:
hive> set hivevar:limit=10;
hive> select 10 limit ${hivevar:limit};
OK
10
Time taken: 0.098 seconds, Fetched: 1 row(s)
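If the limit comes from an expression such as the Impala example's cast(truncate(9.9) AS INT), it can be evaluated before Hive is called. A minimal Python sketch, assuming a helper name (limit_hivevar) invented for illustration:

```python
import math


def limit_hivevar(expression_value):
    """Truncate a number toward zero and format it as a --hivevar argument pair."""
    limit = math.trunc(expression_value)
    if limit < 0:
        raise ValueError("LIMIT must not be negative")
    return ["--hivevar", "limit={}".format(limit)]


print(limit_hivevar(9.9))  # ['--hivevar', 'limit=9']
```

The returned pair can be appended to the hive command line, and the query then uses ${hivevar:limit} as in the demo above.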

In Hive I need to Get numeric value after a particular word is it possible?

I want to get the numeric value immediately after a particular word in a string.
In Hive, for example:
APDSGDSCRAM051: here I need to get the numeric value after the word RAM.
Is this possible in Hive?
Note: it's not a fixed-length string.
Here you go, you need to use the built-in Hive functions substr and instr:
create table str_testing (c string);
insert into table str_testing values ('APDSGDSCRAM051');
select substr(c, instr(c, 'RAM') + 3) from str_testing;
OK
051
Time taken: 0.243 seconds, Fetched: 1 row(s)
Alternatively, you can implement this in Hive with regexp_extract:
select regexp_extract(name, '\\d+', 0) from <table_name>;
Note: I do not have a Hive environment configured, so please verify this on your end. This returns only the first run of digits found in the string, so if the string contains numbers in several places, the result may not be the one that follows the word.
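The two answers behave differently on other inputs: substr with instr returns everything after RAM, digits or not, while the '\\d+' pattern returns the first digits anywhere in the string. Anchoring the pattern to the word covers both cases. The Python sketch below shows the idea; number_after is a name invented here. In Hive the analogous call would presumably be regexp_extract(c, 'RAM(\\d+)', 1), which I have not run.

```python
import re


def number_after(text, word):
    """Return the digits that directly follow `word`, or None if there are none."""
    match = re.search(re.escape(word) + r"(\d+)", text)
    return match.group(1) if match else None


print(number_after("APDSGDSCRAM051", "RAM"))   # 051
print(number_after("AB12CDRAM051XY7", "RAM"))  # 051
print(number_after("APDSGDSCRAM", "RAM"))      # None
```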

Last Saturday date in HIVE

I am trying to find last Saturday's date in Hive in yyyy-MM-dd format using:
SET DATE_DM2=date_sub(from_unixtime(unix_timestamp(),'yyyy-MM-dd'),cast(((from_unixtime(unix_timestamp(), 'u') % 7)+1) as int));
But this gives an error.
Change this line:
c = "hive -e 'use Data; SELECT * FROM table1 WHERE partitiondate='${DATE_D}';'"
into:
c = "hive -e \"use Data; SELECT * FROM table1 WHERE partitiondate='{DATE_D}';\"".format(DATE_D=DATE_D)
The call to format is what @Mai was mentioning. As for printing c: print c shows the value of c at runtime, so you'll know whether it is what you expect.
P.S. Calling commands.getoutput fetches not only the rows but all of the standard output of the hive command line, and stores it in a single string, meaning you'll probably need to do some parsing if you want to work with those rows. Or better yet, check out HiveClient.
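Since the query is assembled in Python anyway, the date itself can be computed there rather than inside a Hive SET. The sketch below rests on assumptions: last_saturday and hive_query_command are names made up for illustration, "last Saturday" is taken to mean the most recent Saturday strictly before today, and the command is built but not run.

```python
from datetime import date, timedelta


def last_saturday(today):
    """Return the most recent Saturday strictly before `today`."""
    # date.weekday(): Monday is 0, Saturday is 5.
    days_back = (today.weekday() - 5) % 7 or 7
    return today - timedelta(days=days_back)


def hive_query_command(partition_date):
    """Build the hive -e argument list with the date embedded as a quoted literal."""
    query = "use Data; SELECT * FROM table1 WHERE partitiondate='{DATE_D}';".format(
        DATE_D=partition_date.strftime("%Y-%m-%d")
    )
    return ["hive", "-e", query]


DATE_D = last_saturday(date(2017, 11, 1))  # 2017-11-01 was a Wednesday
print(DATE_D)                              # 2017-10-28
print(hive_query_command(DATE_D)[2])
```

Building an argument list instead of one command string also sidesteps the nested-quote escaping shown above.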

How to CONCAT quotes while selecting any data from the table

The file (date.sql) contains the commands below:
date=2014-12-03
SET VAR vDate $date
insert into (Some_table) (date,location)
select $vDate , location from (let say from table "Location")
When executing this SQL script I get the error below, because it expects a value like '2014-12-03' but the value arrives without quotes:
ERROR:Insert data type mismatch for column date
Is there any way to concatenate ' before and after the date, or is there a simpler way to achieve this?
Thanks in advance.

Construct table name dynamically in pentaho

I am building a Pentaho transformation that has a Table Input step. My requirement was to pass the table name to the Table Input step dynamically as a variable, which I achieved with:
select * from ${table_name}
When I run the transformation, I pass in the value of table_name.
This works.
But my new requirement is to pass the date as a variable and then construct a table name from the month and year of that date.
So, for example, if I pass in 2012-01-31, I want SQL like this:
select * from xxx_201201_v
I cannot use a substr like this:
select * from xxx_substr(${input_date} ,0,4)
So I am not sure how to do this.
You could use a Get Variables step so that you have your date in a field. Then use steps to manipulate the string, and pass the field to the Table Input step like this:
select * from ?
Of course, you should enable the Table Input step's option to get fields from a previous step. If you have 2012-01-31, you can get 201201 with a find-and-replace step that replaces "-" with nothing, and then cut the string down to its first 6 characters.
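The string manipulation described above (remove the dashes, keep the first 6 characters) can be checked quickly outside Pentaho. A minimal Python sketch; monthly_table_name is a hypothetical name, and the xxx_ prefix and _v suffix come from the example in the question.

```python
def monthly_table_name(input_date, prefix="xxx_", suffix="_v"):
    """Turn a yyyy-MM-dd date string into prefix + yyyyMM + suffix."""
    digits = input_date.replace("-", "")  # "2012-01-31" -> "20120131"
    return prefix + digits[:6] + suffix   # keep year and month only


print(monthly_table_name("2012-01-31"))  # xxx_201201_v
```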