How to set variables in HIVE scripts - hive

I'm looking for the SQL equivalent of SET varname = value in Hive QL
I know I can do something like this:
SET CURRENT_DATE = '2012-09-16';
SELECT * FROM foo WHERE day >= @CURRENT_DATE
But then I get this error:
character '@' not supported here

You need to use the special hiveconf namespace for variable substitution.
For example:
hive> set CURRENT_DATE='2012-09-16';
hive> select * from foo where day >= ${hiveconf:CURRENT_DATE}
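Similarly, you can pass the value on the command line. Wrap the single quotes in double quotes so the shell does not strip them before Hive sees the value:
% hive --hiveconf CURRENT_DATE="'2012-09-16'" -f test.hql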
Note that there are env and system variables as well, so you can reference ${env:USER} for example.
To see all the available variables, from the command line, run
% hive -e 'set;'
or from the hive prompt, run
hive> set;
Update:
I've started to use hivevar variables as well, putting them into HQL snippets that I can include from the Hive CLI using the source command (or pass with the -i option on the command line).
The benefit here is that the variable can then be used with or without the hivevar prefix, which allows something akin to global vs. local use.
So, assume you have a setup.hql that sets a tablename variable:
set hivevar:tablename=mytable;
Then I can bring it into hive:
hive> source /path/to/setup.hql;
and use it in a query:
hive> select * from ${tablename}
or
hive> select * from ${hivevar:tablename}
I could also set a "local" tablename, which would affect the use of ${tablename}, but not ${hivevar:tablename}
hive> set tablename=newtable;
hive> select * from ${tablename} -- uses 'newtable'
vs
hive> select * from ${hivevar:tablename} -- still uses the original 'mytable'
This probably doesn't matter much at the CLI, but you can have HQL in a file that uses source and still set some of the variables "locally" for use in the rest of the script.

Most of the answers here suggest using either the hiveconf or the hivevar namespace to store the variable, and all of those answers are right. However, there is one more namespace.
There are three namespaces in total available for holding variables.
hiveconf: Hive started with this one; all the Hive configuration is stored in it. Variable substitution was not originally part of Hive, and when it was introduced, all user-defined variables were stored here as well, which is definitely not a good idea. So two more namespaces were created.
hivevar: To store user variables
system: To store system variables.
So if you are storing a variable for use in a query (e.g. a date or product_number), you should use the hivevar namespace, not the hiveconf namespace.
This is how it works.
hiveconf is still the default namespace when setting, so if you don't provide a namespace, your variable is stored in the hiveconf namespace.
However, that is not true when it comes to referring to a variable: by default, a reference is resolved in the hivevar namespace. Confusing, right? The following example should make it clearer.
If you do not provide a namespace, as below, the variable var is stored in the hiveconf namespace.
set var="default_namespace";
So, to access it you need to specify the hiveconf namespace:
select ${hiveconf:var};
If you do not provide a namespace, you get an error, because a reference without a namespace is looked up in the hivevar namespace only, and hivevar has no variable named var:
select ${var};
Now set the variable with the hivevar namespace given explicitly:
set hivevar:var="hivevar_namespace";
Because we provide the namespace, this works:
select ${hivevar:var};
And since hivevar is the default namespace when referring to a variable, the following works too:
select ${var};

Have you tried using the dollar sign and braces like this:
SELECT *
FROM foo
WHERE day >= '${CURRENT_DATE}';

Just in case someone needs to parameterize a Hive query via the CLI.
For example:
hive_query.sql
SELECT * FROM foo WHERE day >= '${hivevar:CURRENT_DATE}'
Now execute the above SQL file from the CLI:
hive --hivevar CURRENT_DATE="2012-09-16" -f hive_query.sql
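If the query file is generated by the calling script rather than kept on disk, the ${hivevar:...} reference has to survive the shell untouched. The sketch below assumes a POSIX shell; the file location and the run_date variable are made up for illustration, and hive is only invoked if it is actually installed.

```shell
#!/bin/sh
# Sketch: generate the query file from a script and run it with --hivevar.
# The quoted delimiter ('EOF') stops the shell from touching ${hivevar:...}.
query_file="${TMPDIR:-/tmp}/hive_query.sql"   # hypothetical location

cat > "$query_file" <<'EOF'
SELECT * FROM foo WHERE day >= '${hivevar:CURRENT_DATE}';
EOF

run_date="2012-09-16"                         # hypothetical value

# Only call hive when it is installed; otherwise print what would be run.
if command -v hive >/dev/null 2>&1; then
  hive --hivevar CURRENT_DATE="$run_date" -f "$query_file"
else
  echo "hive --hivevar CURRENT_DATE=$run_date -f $query_file"
fi
```

With an unquoted delimiter (<<EOF) the shell would try to expand ${hivevar:CURRENT_DATE} itself, so the reference would never reach Hive.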

Two easy ways:
Using hive conf
hive> set USER_NAME='FOO';
hive> select * from foobar where NAME = '${hiveconf:USER_NAME}';
Using hive vars
Set the variable at the Hive prompt and then use it in a query:
hive> set hivevar:USER_NAME='FOO';
hive> select * from foobar where NAME = '${USER_NAME}';
hive> select * from foobar where NAME = '${hivevar:USER_NAME}';
Documentation: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VariableSubstitution

One thing to be mindful of is setting strings and then referring back to them. You have to make sure the quotes don't collide.
set start_date = '2019-01-21';
select ${hiveconf:start_date};
When you set a date and then refer to it in code as a string, the quotes can conflict. This wouldn't work with the start_date set above:
'${hiveconf:start_date}'
Be careful not to apply single or double quotes twice: once when setting the string and again when referring back to it in the query.
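The collision is easier to see if the substitution is written out by hand. The sketch below is plain shell, not Hive: it only mimics the textual replacement, assuming the variable holds the value exactly as typed after the = sign, quotes included. The function names are made up for illustration.

```shell
#!/bin/sh
# Illustration only: mimics Hive's textual substitution with printf.
# start_date holds what `set start_date = '2019-01-21';` stores (quotes included).
start_date="'2019-01-21'"

# Reference without extra quotes: select ${hiveconf:start_date};
render_bare() {
  printf 'select %s;\n' "$start_date"
}

# Reference wrapped in quotes again: select '${hiveconf:start_date}';
render_quoted() {
  printf "select '%s';\n" "$start_date"
}

render_bare     # select '2019-01-21';
render_quoted   # select ''2019-01-21'';
```

The second line shows the doubled quotes that the query parser ends up seeing.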

There are multiple options to set variables in Hive.
If you want to set a Hive variable from inside the Hive shell, you can set it using hivevar. Both string and integer values work.
SET hivevar:which_date=20200808;
select ${which_date};
If you plan to set variables in a shell script and pass them into your Hive script (HQL) file, you can use the --hivevar option when calling the hive or beeline command.
# shell script will invoke script like this
beeline --hivevar tablename=testtable -f select.hql
-- select.hql file
select * from <dbname>.${tablename};
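When a script passes more than a couple of variables, building the --hivevar options in a loop keeps the call readable. This is a minimal sketch assuming a POSIX shell; run_hql is a hypothetical helper, and the first argument is the program to run, so echo can stand in for beeline as a dry run.

```shell
#!/bin/sh
# Sketch: turn KEY=VALUE words into repeated --hivevar options.
# run_hql is a hypothetical helper, not part of hive or beeline.
# Usage: run_hql <program> <script.hql> KEY=VALUE...
run_hql() {
  runner=$1
  hql_file=$2
  shift 2
  count=$#
  for pair in "$@"; do
    set -- "$@" --hivevar "$pair"   # append one option per variable
  done
  shift "$count"                    # drop the original KEY=VALUE words
  "$runner" "$@" -f "$hql_file"
}

# Dry run: echo prints the options that beeline would receive.
run_hql echo select.hql tablename=testtable which_date=20200808
```

Replacing echo with beeline (or hive) runs the script for real. Because the options stay in the argument list instead of a string, values containing spaces are passed through intact.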

Try this method:
set t=20;
select *
from myTable
where age > '${hiveconf:t}';
It works well on my platform.

You can export the variable in a shell script:
export CURRENT_DATE="2012-09-16"
Then reference it in HiveQL like this:
SELECT * FROM foo WHERE day >= '${env:CURRENT_DATE}'

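You can store the text of another query in a variable and later use it in your code. Note that the variable holds the query text, not its result; the text is substituted wherever the variable is referenced: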
set var=select count(*) from My_table;
${hiveconf:var};

Related

How to declare and use variable in Hive SQL?

I am using the syntax below to declare and use a variable in a Hive SQL query, but it gives me the error below.
SET aa='10';
SELECT
col1 as data,
${aa} as myVar from myTable;
ERROR:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: Cannot modify aa at runtime. It is not in list of params that are allowed to be modified at runtime
I have also tried using hiveconf
SELECT ${hiveconf:aa} from myTable;
You cannot pass a variable like that. You need to use --hivevar. You can create an HQL file, hiveqry.hql, with the script below. Please note that you can reference the variable either by its bare name or with the hivevar prefix.
select * from ${hivevar:aa};
select * from ${aa};
Then call that script like this:
beeline --hivevar table=myTable --hivevar aa=100 -f hiveqry.hql
Depending on the Hive version, setting a variable without explicitly specifying the namespace (hiveconf or hivevar) may work with hiveconf as the default namespace, or may not work at all.
BTW, this works in Hive 1.2:
SET aa='10';
SELECT ${hiveconf:aa};
If you specify the namespace explicitly when setting the variable, it works with both hivevar and hiveconf.
Works:
SET hivevar:aa='10';
SELECT ${hivevar:aa};
Also works:
SET hiveconf:aa='10';
SELECT ${hiveconf:aa};
Does not work:
SET aa='10';
SELECT ${hivevar:aa} as myvar;
Also, if the above commands do not work, check whether variable substitution is enabled (the default is true):
set hive.variable.substitute=true;
If set to false, substitution does not work.
Read the documentation: LanguageManual VariableSubstitution

Oracle SQL if statement based on parameter passed in via .sh script

I have a shell script which calls some SQL like so
sqlplus system/$password@$instance @./oracle/mysqlfile.sql $var1 $var2 $var3
Then in mysqlfile.sql, I define properties like this:
DEFINE var1=&1
DEFINE var2=&2
DEFINE var3=&3
Later in the file, I call another SQL script:
// I wish to wrap this in an if statement - pseudo-code
if(var3="true") do the following
@./oracle/myOthersqlfile.sql &&varA &&varB
I am not sure how to implement this, though. Any suggestions are appreciated.
You could (ab)use substitution variables:
set termout off
column var3_path new_value var3_path
select case
when '&var3' = 'true' then './oracle/myOthersqlfile.sql &&varA &&varB'
else '/dev/null'
end as var3_path
from dual;
set termout on
@&var3_path
The query between the set termout commands, which just hide the output of the query, uses a case expression to pick either your real file path or a dummy file; I've used /dev/null, but you could have a 'no-op' file of your own that does nothing if that's clearer. The query gives that result the alias var3_path. The new_value line before it turns the alias into a substitution variable. The @ then runs whatever that variable expands to.
So if var3 is 'true' then that runs:
@./oracle/myOthersqlfile.sql &&varA &&varB
(or, actually, with the varA and varB variables already replaced with their actual values) and if it is false it runs:
@/dev/null
which does nothing, silently.
You can set verify on around that code to see when and where substitution is happening.
You can't implement procedural logic in sqlplus itself. You have these options:
Implement the IF-THEN-ELSE logic inside the shell script that runs sqlplus.
Use PL/SQL, but then your SQL script has to run as code inside an anonymous block, not as an external script.
In your case the easiest way is to change your shell script.
#!/bin/bash
#
# load environment Oracle variables
sqlplus system/$password@$instance @./oracle/mysqlfile.sql $var1 $var2 $var3
# run the second script only when var3 is "true"
if [ "$var3" = "true" ]
then
sqlplus system/$password@$instance @./oracle/myOthersqlfile.sql
fi
You should realise that sqlplus is just a CLI (Command Line Interface), so you can't apply procedural logic to it.
I have no idea what you do in those SQL scripts (running DML, creating files, etc.), but the best approach would be to convert them to PL/SQL; then you can apply whatever logic you need.

How can we replace value of hive variables to check for any errors

We have a query in which we have defined more than 50 variables.
We call this HQL from a shell script, and most of the time I run into syntax issues because I have not defined the Hive variables properly in the query.
Example
set hiveconf:var0=value0;
set hiveconf:var1=value1;
set hiveconf:var2=value2;
select * from ${hiveconf:var0} where col1=${hiveconf:var1} and col2=${hiveconf:var2};
I want to check what the above query looks like after the Hive variables are replaced.
Is there a way to check whether the variables are substituted correctly, or whether there are any syntax errors?
Please let me know of any alternatives as well.
It is better to use the hivevar namespace for this.
You can print the variables using the ! echo command:
set hivevar:var0=value0;
hive> ! echo Variable hivevar:var0 is ${hivevar:var0};
Result:
Variable hivevar:var0 is value0
Also use explain extended <query>: it prints the detailed query plan with predicates and fails if there is a syntax error.
Update:
You can also use SELECT for the same purpose; Hive can execute simple queries without starting MR if hive.fetch.task.conversion is set to more or minimal. If you are using Qubole, also add limit 1 to the query:
set hive.fetch.task.conversion=more;
select 'Variable hivevar:var0 is', '${hivevar:var0}' limit 1;
Why might you want to do this with SELECT? For example, to check a parameter easily using a cast or some UDF. If you need to check that a parameter is of type DATE, use
set hive.fetch.task.conversion=more;
select 'Variable hivevar:var0 is', date '${hivevar:var0}' limit 1;
In this case, if ${hivevar:var0} is not a date, a type cast exception is thrown and script execution terminates.
Along with the hivevar namespace, we can use one more property: hive.root.logger=INFO,console.
This displays the query after the variable values are substituted, from which we can find the issue.
cat test.hql
set hivevar:var1=${hivevar:var11};
set hivevar:var2=2345;
select ${hivevar:var11};
select ${hivevar:var2};
Hive command:
hive --hiveconf hive.root.logger=INFO,console --hivevar var11=1234 -f test.hql
Output on console:
select 1234
2018-10-17T08:23:31,632 INFO [main] ql.Driver: Completed executing command(queryId=-4dd6-493f-88be-03810f847fe7); Time taken: 0.003 seconds
OK
2018-10-17T08:23:31,632 INFO [main] ql.Driver: OK
2018-10-17T08:23:31,670 INFO [main] io.NullRowsInputFormat$NullRowsRecordReader: Using null rows input format
1234
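For a script with 50 or more variables, a pre-flight check in the calling shell script can catch a missing variable before Hive is started at all. The following is only a sketch under some assumptions: references are written as ${hivevar:NAME}, grep supports -o (GNU and BSD grep do), and both helper names are invented here.

```shell
#!/bin/sh
# Sketch: list the ${hivevar:NAME} references in an HQL file and report
# which ones were not supplied. Both helpers are hypothetical.

# Print each referenced hivevar name once, sorted.
list_hivevars() {
  grep -o '\${hivevar:[A-Za-z_][A-Za-z0-9_.]*}' "$1" |
    sed -e 's/^\${hivevar://' -e 's/}$//' |
    LC_ALL=C sort -u
}

# Usage: missing_hivevars file.hql name1 name2 ...
# Prints the referenced names that are not among the given ones.
missing_hivevars() {
  file=$1
  shift
  for name in $(list_hivevars "$file"); do
    found=no
    for given in "$@"; do
      [ "$given" = "$name" ] && found=yes
    done
    [ "$found" = yes ] || printf '%s\n' "$name"
  done
}
```

For example, missing_hivevars test.hql var11 prints every referenced name other than var11. The check knows nothing about variables that the script sets for itself with set hivevar:..., so those names have to be listed as well.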

Hiveconf/hivevar: possible to have a dot ('.') in a variable name?

Is it possible to use a dot in a name of hiveconf variable?
All examples in the documentation show simple variable names like a.
If yes:
How do I reference it in an HQL script? select ${hiveconf:airflow.ctx.dag.dag_id} as dag_id; produces a syntax error (while ${hiveconf:abcd} is ok).
If no:
Why does airflow or azkaban pass variables to hive scripts like this? Wouldn't the authors know that it's not possible to reference these variables?
hive -hiveconf airflow.ctx.dag.dag_id=video-plays-adverts -f test-hiveconf.hql
Thanks!
I checked, and this works:
set hiveconf:airflow.ctx.dag.dag_id=abc;
hive> select '${hiveconf:airflow.ctx.dag.dag_id}';
OK
abc
Time taken: 0.212 seconds, Fetched: 1 row(s)
You probably forgot the quotes.
It turns out there were several compounding problems:
1) Hive variables work like a C macro system: when you assign set a = concat('-', ${hiveconf:var_name}), the content of ${hiveconf:a} isn't the resulting string but the expression concat('-', ${hiveconf:var_name}) itself, which gets evaluated every time you use it.
2) I was using it in static partitions, which only accept literals, so this:
INSERT OVERWRITE TABLE xyz
PARTITION (year=${hiveconf:y}, month=${hiveconf:m}, week=${hiveconf:w}, day=${hiveconf:d})
got translated to this:
INSERT OVERWRITE TABLE xyz
PARTITION (year=<complex expression>, month=<complex expression>, week=<complex expression>, day=<complex expression>)
Which is not supported: static partitions need literals.
End of story. It seems there is no way to use 'computed' hiveconf variables (set a = concat(${b}, ${c})) where a constant is required.
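One possible workaround, if the values can be computed before Hive starts, is to do the computation in the calling shell script and hand Hive plain literals. This is a sketch rather than a tested recipe for that particular table: partition_opts is a hypothetical helper, it expects a date as YYYY-MM-DD, and it leaves out the week value.

```shell
#!/bin/sh
# Sketch: compute partition values in the shell so that Hive only ever
# sees literals. partition_opts is a hypothetical helper.
partition_opts() {
  d=$1
  y=${d%%-*}          # text before the first dash
  rest=${d#*-}        # text after the first dash
  m=${rest%%-*}       # month part
  day=${rest#*-}      # day part
  printf '%s\n' "--hiveconf y=$y --hiveconf m=$m --hiveconf d=$day"
}

partition_opts 2018-10-17
# prints: --hiveconf y=2018 --hiveconf m=10 --hiveconf d=17
```

The values are passed on verbatim, so ${hiveconf:y} and the others are replaced by plain text such as 2018, which is what a static partition clause expects.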

passing values using hivevar in HIVE

I've got a parameter like "This is a param", and I want to pass it to the HiveQL below:
hive -hivevar sys_nm="This is a param" -e 'select * from rd_sys where rd_sys_nm=${hivevar:sys_nm}'
But Hive returned the error message below:
Logging initialized using configuration in jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1409.jar!/hive-log4j.properties
FAILED: ParseException line 1:49 missing EOF at 'is' near 'This'
g4t7491_[mgr#g4t7491 ~]$
Does anyone know how to pass it correctly?
Hive vars don't work like hiveconf, where you need to write "hiveconf:something" in the code.
When using a hivevar, just reference the variable name like this: ${var_name}
For example:
Through the command line (note the single quotes around the query, so that the shell does not expand ${MONTH_VAR} itself):
hive -hivevar MONTH_VAR='11' -e 'select * from table where month=${MONTH_VAR};'
You can also declare it in the script, so the query looks like this (no hiveconf):
set hivevar:MONTH_VAR=11;
SELECT * from table where month=${MONTH_VAR};
You need to put the string in single quotes for it to parse correctly as a string inside the sql after interpolation.
hive -hivevar sys_nm="'This is a param'" -e 'select * from rd_sys where rd_sys_nm=${hivevar:sys_nm}'
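To see why the nested quotes matter, it helps to look at what the shell actually hands to the program. In this sketch show_args is a hypothetical stand-in for hive that just prints each argument it receives in brackets.

```shell
#!/bin/sh
# Sketch: print each argument as received, after the shell has processed
# the quotes. show_args is a hypothetical stand-in for hive.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

show_args -hivevar sys_nm="'This is a param'" \
  -e 'select * from rd_sys where rd_sys_nm=${hivevar:sys_nm}'
# prints:
# [-hivevar]
# [sys_nm='This is a param']
# [-e]
# [select * from rd_sys where rd_sys_nm=${hivevar:sys_nm}]
```

The outer double quotes are consumed by the shell, while the inner single quotes stay part of the value, so they end up in the query text after substitution.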