Specify default parameter values within a hive script - hive

I am aware that it is possible to specify parameters for a hive query/script like so:
>hive -e "USE uk_pers_dev;set hive.cli.print.header=true;CREATE TABLE IF NOT EXISTS ${hiveconf:tablename} (mycol int);SELECT * FROM ${hiveconf:tablename};" -hiveconf tablename=mytable;
However, what I would like to do is specify a default parameter value within my Hive script, to be used when no value is passed from the command line. Is that possible?

JT
Could you use
SET myvar = "Hello World"
inside your hive script do
SET myvar = COALESCE(${hiveconf:myvar},"Default");
SELECT * FROM MyTable WHERE MyColumn = ${hiveconf:myvar};
OR
SELECT * FROM MyTable WHERE MyColumn = COALESCE(${hiveconf:myvar},"Default");

OK I have a table called airports
name: string
country: string
area_code: int
in HIVE I do
SET mycol = name;
SELECT ${hiveconf:mycol} from airports;
works for me.

OK, so take the logic out of Hive.
In PowerShell (POSH) or a cmd file:
Do the check for whether a value was passed
Substitute the default if needed
Call Hive
Maybe look at that as a wrapper.
Allan
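A minimal sketch of that wrapper idea, written in Python rather than PowerShell or cmd. The names build_hive_command and DEFAULT_TABLENAME are made up for illustration, and the default value is an assumption; only the -hiveconf and -f options are taken from the Hive CLI usage shown in this thread.

```python
import shlex

DEFAULT_TABLENAME = "mytable"  # assumed default, chosen for illustration


def build_hive_command(script_path, tablename=None):
    """Return the hive argv, substituting the default when no name is given."""
    name = tablename if tablename else DEFAULT_TABLENAME
    return ["hive", "-hiveconf", f"tablename={name}", "-f", script_path]


if __name__ == "__main__":
    # Print the command instead of running it. On a machine with the hive CLI
    # installed, the same list could be passed to subprocess.run.
    print(shlex.join(build_hive_command("myscript.hql")))
```

The script itself keeps using ${hiveconf:tablename}; the default lives only in the wrapper, so Hive always receives a value.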

Related

ballerina.io SQL LIKE statement

I am currently trying to execute the following SQL statement in ballerina.io against a MariaDB.
Plain SQL:
select * FROM testDB where test LIKE '%BA%';
I get a result set with all data.
ballerina.io:
var selectRet = testDB->select("select * FROM testDB where test LIKE '%?%'", testREC, "BA");
I get an empty result set.
versions:
ballerina --version
jBallerina 1.1.2
Language specification 2019R3
Ballerina tool 0.8.0
Is it possible to make a SQL statement with LIKE in ballerina.io?
Many greetings,
Martin
The parameter is passed to the query as a separate literal string, not as some kind of template variable. To surround it with wildcards, you need to use concat() in the query:
var selectRet = testDB->select(
"select * FROM testDB where test like concat('%', ?, '%')",
testREC,
"BA"
);
Or just concatenate the wildcards in your code (this looks a bit cleaner to me):
var selectRet = testDB->select(
"select * FROM testDB where test like ?",
testREC,
"%BA%"
);
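The same two options can be tried without a MariaDB server. This is only a sketch using Python's built-in sqlite3 module as a stand-in: SQLite's || operator takes the place of concat(), the sample rows are invented, and find_in_sql and find_in_code are hypothetical helper names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testDB (test TEXT)")
conn.executemany("INSERT INTO testDB (test) VALUES (?)", [("ABACUS",), ("XYZ",)])


def find_in_sql(term):
    # Wildcards are added inside the query; the parameter stays a plain value.
    rows = conn.execute(
        "SELECT test FROM testDB WHERE test LIKE '%' || ? || '%'", (term,)
    )
    return [row[0] for row in rows]


def find_in_code(term):
    # Wildcards are added in application code before binding the parameter.
    rows = conn.execute(
        "SELECT test FROM testDB WHERE test LIKE ?", (f"%{term}%",)
    )
    return [row[0] for row in rows]
```

In both cases the placeholder sits outside any quoted literal, which is the point of the answer above: a ? inside '%?%' is just a question mark character.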

get the current date and set it to variable in order to use it as table name in HIVE

I want to get the current date as YYMMDD and then assign it to a variable so that I can use it as a table name.
Here is my code:
set dates= date +%Y-%m-%d;
CREATE EXTERNAL TABLE IF NOT EXISTS dates(
id STRING,
region STRING,
city STRING)
But this method doesn't work, because it seems the assignments are wrong. Any idea?
Hive does not evaluate variables; it substitutes them as-is. In your case the value will be exactly the string 'date +%Y-%m-%d'. It is also not possible to use a UDF such as current_date() in place of a table name in DDL.
The solution is to calculate the variable in the shell and pass it to Hive:
In the shell
dates=$(date +%Y_%m_%d);
hive --hivevar date="$dates" -f myscript.hql
In the script:
use mydb; create table if not exists tab_${hivevar:date} (id int);
Or you can execute the Hive script from the command line using hive -e, in which case the shell itself substitutes the variable:
dates=$(date +%Y_%m_%d);
hive -e "use mydb; create table if not exists tab_${dates} (id int);"

How to convert string set to hiveconf variable to an object usable as part of a table name

I am looking for a way to remove quotes from a hiveconf variable string, so that I can use it also as part of a table name:
Basically, I have something like
set sub_name = "123";
select ${hiveconf:sub_name} from table_${hiveconf:sub_name};
And when executing I need it to work like:
select "123" from table_123;
For that, I could run with something like:
set variable = "123";
set table_subname = 123;
select ${hiveconf:variable} from table_${hiveconf:table_subname};
Which would then work as
select "123" from table_123;
But is there an elegant way to use just the one variable, once as a string and once as part of the table name?
hive> create table table_abc as select 'X' as x;
OK
x
hive> set sub_name=abc;
hive> select "${hiveconf:sub_name}" from table_${hiveconf:sub_name};
OK
_c0
abc

Filter in select where values start with NIR_

I am trying to filter my result set to only return values which start with NIR_.
My SQL statement to do so is as follows
select * from run where name like %NIR_%
The result set also includes names like
NIRMeta_Invalid
NIRMeta_Position
I am not sure what I am doing wrong. I only need to select names which start with NIR_.
You need to escape the underscore in your LIKE pattern if you want it to be treated as a literal.
In SQL Server:
select * from run where name like 'NIR[_]%'
In MySQL and Oracle:
select * from run where name like 'NIR\_%'
If you want names that only start with NIR, then remove the first wildcard in the like pattern:
where name like 'NIR_%'
Note that _ is also a wildcard, so you probably want:
where name like 'NIR\_%'
You can use ESCAPE option to achieve this.
SELECT * FROM run WHERE name LIKE 'NIR#_%' ESCAPE '#'
Sample execution with the given data:
DECLARE @Run TABLE (name VARCHAR (100));
INSERT INTO @Run (name) VALUES
('NIR_MA'), ('NIR_RUN'), ('NIRMeta_Invalid'), ('NIRMeta_Position');
SELECT * FROM @Run WHERE name LIKE 'NIR#_%' ESCAPE '#'
Result:
name
-----
NIR_MA
NIR_RUN
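To see the difference between the unescaped and escaped pattern side by side, here is a small sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for whichever database is actually in use, and names_matching is a hypothetical helper; the four rows are the ones from the sample above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE run (name TEXT)")
conn.executemany(
    "INSERT INTO run (name) VALUES (?)",
    [("NIR_MA",), ("NIR_RUN",), ("NIRMeta_Invalid",), ("NIRMeta_Position",)],
)


def names_matching(like_clause):
    # like_clause is a fixed SQL fragment written in this script, not user input.
    rows = conn.execute(
        f"SELECT name FROM run WHERE name LIKE {like_clause} ORDER BY name"
    )
    return [row[0] for row in rows]


# Unescaped: _ matches any single character, so the NIRMeta rows match as well.
unescaped = names_matching("'NIR_%'")

# Escaped: #_ is a literal underscore, so only names starting with NIR_ match.
escaped = names_matching("'NIR#_%' ESCAPE '#'")
```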

How do you use script variables in psql?

In MS SQL Server, I create my scripts to use customizable variables:
DECLARE @somevariable int
SELECT @somevariable = -1
INSERT INTO foo VALUES ( @somevariable )
I'll then change the value of @somevariable at runtime, depending on the value that I want in the particular situation. Since it's at the top of the script it's easy to see and remember.
How do I do the same with the PostgreSQL client psql?
Postgres variables are created through the \set command, for example ...
\set myvariable value
... and can then be substituted, for example, as ...
SELECT * FROM :myvariable.table1;
... or ...
SELECT * FROM table1 WHERE :myvariable IS NULL;
edit: As of psql 9.1, variables can be expanded in quotes as in:
\set myvariable value
SELECT * FROM table1 WHERE column1 = :'myvariable';
In older versions of the psql client:
... If you want to use the variable as the value in a conditional string query, such as ...
SELECT * FROM table1 WHERE column1 = ':myvariable';
... then you need to include the quotes in the variable itself as the above will not work. Instead define your variable as such ...
\set myvariable 'value'
However, if, like me, you ran into a situation in which you wanted to make a string from an existing variable, I found the trick to be this ...
\set quoted_myvariable '\'' :myvariable '\''
Now you have both a quoted and unquoted variable of the same string! And you can do something like this ....
INSERT INTO :myvariable.table1 SELECT * FROM table2 WHERE column1 = :quoted_myvariable;
One final word on PSQL variables:
They don't expand if you enclose them in single quotes in the SQL statement.
Thus this doesn't work:
SELECT * FROM foo WHERE bar = ':myvariable'
To expand to a string literal in a SQL statement, you have to include the quotes in the variable set. However, the variable value already has to be enclosed in quotes, which means that you need a second set of quotes, and the inner set has to be escaped. Thus you need:
\set myvariable '\'somestring\''
SELECT * FROM foo WHERE bar = :myvariable
EDIT: starting with PostgreSQL 9.1, you may write instead:
\set myvariable somestring
SELECT * FROM foo WHERE bar = :'myvariable'
You can try to use a WITH clause.
WITH vars AS (SELECT 42 AS answer, 3.14 AS appr_pi)
SELECT t.*, vars.answer, t.radius*vars.appr_pi
FROM table AS t, vars;
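A runnable sketch of the WITH approach, using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The circles table and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE circles (radius REAL)")  # hypothetical table
conn.executemany("INSERT INTO circles (radius) VALUES (?)", [(1.0,), (2.0,)])

# The vars CTE yields exactly one row, so the cross join with it simply makes
# the "variables" available as extra columns on every row of circles.
QUERY = """
WITH vars AS (SELECT 42 AS answer, 3.14 AS appr_pi)
SELECT c.radius, vars.answer, c.radius * vars.appr_pi
FROM circles AS c, vars
ORDER BY c.radius
"""
rows = conn.execute(QUERY).fetchall()
```

The values only exist for the duration of that one statement, so this suits a single query better than a multi-statement script.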
Specifically for psql, you can pass psql variables from the command line too; you can pass them with -v. Here's a usage example:
$ psql -v filepath=/path/to/my/directory/mydatafile.data regress
regress=> SELECT :'filepath';
?column?
---------------------------------------
/path/to/my/directory/mydatafile.data
(1 row)
Note that the colon is unquoted, then the variable name itself is quoted. Odd syntax, I know. This only works in psql; it won't work in (say) PgAdmin-III.
This substitution happens during input processing in psql, so you can't (say) define a function that uses :'filepath' and expect the value of :'filepath' to change from session to session. It'll be substituted once, when the function is defined, and then will be a constant after that. It's useful for scripting but not runtime use.
FWIW, the real problem was that I had included a semicolon at the end of my \set command:
\set owner_password 'thepassword';
The semicolon was interpreted as an actual character in the variable:
\echo :owner_password
thepassword;
So when I tried to use it:
CREATE ROLE myrole LOGIN UNENCRYPTED PASSWORD :owner_password NOINHERIT CREATEDB CREATEROLE VALID UNTIL 'infinity';
...I got this:
CREATE ROLE myrole LOGIN UNENCRYPTED PASSWORD thepassword; NOINHERIT CREATEDB CREATEROLE VALID UNTIL 'infinity';
That not only failed to set the quotes around the literal, but split the command into 2 parts (the second of which was invalid as it started with "NOINHERIT").
The moral of this story: PostgreSQL "variables" are really macros used in text expansion, not true values. I'm sure that comes in handy, but it's tricky at first.
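The "macro, not value" behaviour can be imitated with a toy model. This is not how psql is implemented and it ignores details such as leaving text inside quoted strings alone; expand is a hypothetical function that only mimics the :name and :'name' forms discussed in this thread.

```python
import re

# Matches :'name' or :name, but not the second colon of a ::cast.
_PATTERN = re.compile(r"(?<!:):(?:'(\w+)'|(\w+))")


def expand(sql, variables):
    """Toy text substitution: splice variable text into the SQL string."""
    def substitute(match):
        quoted_name, raw_name = match.groups()
        name = quoted_name if quoted_name is not None else raw_name
        if name not in variables:
            return match.group(0)  # unknown names are left untouched
        value = variables[name]
        if quoted_name is not None:
            # :'name' becomes a quoted literal, doubling embedded quotes.
            return "'" + value.replace("'", "''") + "'"
        return value  # :name is pasted in verbatim

    return _PATTERN.sub(substitute, sql)


# The stray semicolon is part of the variable text, so it lands mid-statement.
broken = expand(
    "CREATE ROLE myrole LOGIN PASSWORD :owner_password NOINHERIT",
    {"owner_password": "thepassword;"},
)
```

Because the value is pasted in as text, whatever ended up in the variable, semicolon included, becomes part of the statement.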
PostgreSQL (since version 9.0) allows anonymous blocks in any of the supported server-side scripting languages:
DO '
DECLARE somevariable int = -1;
BEGIN
INSERT INTO foo VALUES ( somevariable );
END
' ;
http://www.postgresql.org/docs/current/static/sql-do.html
As everything is inside a string, external string variables being substituted in will need to be escaped and quoted twice. Using dollar quoting instead will not give full protection against SQL injection.
You need to use one of the procedural languages, such as PL/pgSQL, not the SQL procedural language.
In PL/pgSQL you can use variables directly in SQL statements.
For single quotes you can use the quote_literal function.
I solved it with a temp table.
CREATE TEMP TABLE temp_session_variables (
"sessionSalt" TEXT
);
INSERT INTO temp_session_variables ("sessionSalt") VALUES (current_timestamp || RANDOM()::TEXT);
This way, I had a "variable" I could use over multiple queries, that is unique for the session. I needed it to generate unique "usernames" while still not having collisions if importing users with the same user name.
Another approach is to (ab)use the PostgreSQL GUC mechanism to create variables. See this prior answer for details and examples.
You declare the GUC in postgresql.conf, then change its value at runtime with SET commands and get its value with current_setting(...).
I don't recommend this for general use, but it could be useful in narrow cases like the one mentioned in the linked question, where the poster wanted a way to provide the application-level username to triggers and functions.
I've found this question and the answers extremely useful, but also confusing. I had lots of trouble getting quoted variables to work, so here is the way I got it working:
\set deployment_user username -- username
\set deployment_pass '\'string_password\''
ALTER USER :deployment_user WITH PASSWORD :deployment_pass;
This way you can define the variable in one statement. When you use it, single quotes will be embedded into the variable.
NOTE! When I put a comment after the quoted variable it got sucked in as part of the variable when I tried some of the methods in other answers. That was really screwing me up for a while. With this method comments appear to be treated as you'd expect.
I really miss that feature. Only way to achieve something similar is to use functions.
I have used it in two ways:
perl functions that use $_SHARED variable
store your variables in table
Perl version:
CREATE FUNCTION var(name text, val text) RETURNS void AS $$
$_SHARED{$_[0]} = $_[1];
$$ LANGUAGE plperl;
CREATE FUNCTION var(name text) RETURNS text AS $$
return $_SHARED{$_[0]};
$$ LANGUAGE plperl;
Table version:
CREATE TABLE var (
sess bigint NOT NULL,
key varchar NOT NULL,
val varchar,
CONSTRAINT var_pkey PRIMARY KEY (sess, key)
);
CREATE FUNCTION var(key varchar, val anyelement) RETURNS void AS $$
DELETE FROM var WHERE sess = pg_backend_pid() AND key = $1;
INSERT INTO var (sess, key, val) VALUES (pg_backend_pid(), $1, $2::varchar);
$$ LANGUAGE 'sql';
CREATE FUNCTION var(varname varchar) RETURNS varchar AS $$
SELECT val FROM var WHERE sess = pg_backend_pid() AND key = $1;
$$ LANGUAGE 'sql';
Notes:
plperlu is faster than plperl
pg_backend_pid is not the best session identifier; consider combining the pid with backend_start from pg_stat_activity
the table version is also bad because you have to clean it up occasionally (without deleting the variables of sessions that are still running)
Variables in psql suck. If you want to declare an integer, you have to enter the integer, then do a carriage return, then end the statement in a semicolon. Observe:
Let's say I want to declare an integer variable my_var and insert it into a table test:
Example table test:
thedatabase=# \d test;
Table "public.test"
Column | Type | Modifiers
--------+---------+---------------------------------------------------
id | integer | not null default nextval('test_id_seq'::regclass)
Indexes:
"test_pkey" PRIMARY KEY, btree (id)
Clearly, nothing in this table yet:
thedatabase=# select * from test;
id
----
(0 rows)
We declare a variable. Notice how the semicolon is on the next line!
thedatabase=# \set my_var 999
thedatabase=# ;
Now we can insert. We have to use this weird ":''" looking syntax:
thedatabase=# insert into test(id) values (:'my_var');
INSERT 0 1
It worked!
thedatabase=# select * from test;
id
-----
999
(1 row)
Explanation:
So what happens to the variable if we don't put the semicolon on the next line? Have a look:
We declare my_var without the new line.
thedatabase=# \set my_var 999;
Let's select my_var.
thedatabase=# select :'my_var';
?column?
----------
999;
(1 row)
WTF is that? It's not an integer, it's a string 999;!
thedatabase=# select 999;
?column?
----------
999
(1 row)
I've posted a new solution for this on another thread.
It uses a table to store variables, and can be updated at any time. A static immutable getter function is dynamically created (by another function), triggered by update to your table. You get nice table storage, plus the blazing fast speeds of an immutable getter.