How to store the output of a query in a variable in HIVE - hive

I want to store current_day - 1 in a variable in Hive. I know there are already previous threads on this topic, but the solutions provided there recommend first defining the variable outside Hive in a shell environment and then using that variable inside Hive.
Storing result of query in hive variable
I first got the current_Date - 1 using
select date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd'),1);
Then I tried two approaches:
1. set date1 = ( select date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd'),1);
and
2. set hivevar:date1 = ( select date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd'),1);
Both the approaches are throwing an error:
"ParseException line 1:82 cannot recognize input near 'select' 'date_sub' '(' in expression specification"
When I print the variable from approach (1), it holds the text of the select query instead of yesterday's date. Approach (2) throws "{hivevar:dt_chk} is undefined".
I am new to Hive, would appreciate any help. Thanks.

Hive doesn't support a straightforward way to store a query result in a variable. You have to use the shell along with hiveconf.
date1=$(hive -e "set hive.cli.print.header=false; select date_sub(from_unixtime(unix_timestamp(),'yyyy-MM-dd'),1);")
hive -hiveconf "date1"="$date1" -f hive_script.hql
Then in your script you can reference the newly created variable date1:
select '${hiveconf:date1}'
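If the value is just yesterday's date, the first hive call could be skipped by computing it in the shell. A minimal sketch, assuming GNU date (the -d option is not available in every date implementation) and reusing the hive_script.hql name from above; the hive command is only printed here, not executed:

```shell
# Assumption: GNU date. Produces yesterday's date as yyyy-MM-dd,
# the same format that date_sub(...) returns in the query above.
date1=$(date -d "yesterday" +%F)

# Build the command line that would pass the value to the script.
hive_cmd="hive --hiveconf date1=${date1} -f hive_script.hql"

# Print it for inspection; run it with: eval "$hive_cmd"
echo "$hive_cmd"
```

Inside hive_script.hql the value is still read as '${hiveconf:date1}'.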

After lots of research, this is probably the best way to achieve setting a variable as an output of an SQL:
INSERT OVERWRITE LOCAL DIRECTORY '<home path>/config/date1'
select CONCAT('set hivevar:date1=',date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd'),1)) from <some table> limit 1;
source <home path>/config/date1/000000_0;
You will then be able to use ${date1} in your subsequent SQLs.
Here we had to use <some table> limit 1 because Hive has a bug in INSERT OVERWRITE when no table name is specified.

Related

TrinoUserError (type=USER_ERROR, name=SYNTAX_ERROR, message="line 7:26: mismatched input 'COUNT'. Expecting: '*', <expression>")

I am using dbt-trino and for some reason, it doesn't understand the MySQL query that works fine by executing it directly on MySQL. In this query, I want to select and group records that have been created during the previous month.
The Query:
SELECT order_location, COUNT(*) as order_count
FROM {{ ref('x_stg_order_fields') }}
WHERE
created_at >= DATE_FORMAT( CURRENT_DATE - INTERVAL 1 MONTH, '%Y/%m/01' )
AND
created_at < DATE_FORMAT( CURRENT_DATE, '%Y/%m/01' )
GROUP BY order_location
While this query works fast and successfully directly on MySQL, it returns this error when executing with dbt run:
TrinoUserError(type=USER_ERROR, name=SYNTAX_ERROR, message="line 7:53: mismatched input 'COUNT'. Expecting: '*', <expression>")
Does this mean that dbt-trino doesn't support all MySQL functions?
That error is coming from your database, not from dbt itself. dbt does not parse your SQL commands, it just passes them through to your connected database.
My guess is that {{ ref('x_stg_order_fields') }} may be referring to an ephemeral model that contains a syntax error, or possibly a field named count that isn't quoted?
You can confirm or disprove that by looking at the SQL that dbt tried to run in your database, by inspecting the target directory in your project. Specifically, target/run/path/to/your_model.sql will show you the actual command being executed. You should be able to check line 7, col 53 in that file, and you will see the code that trino is erroring about.
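To jump to the reported position in the compiled file, something like the following could be used. This is only a sketch: show_pos is a hypothetical helper name, the temporary file stands in for your compiled model under target/run/, and the column may be off by one depending on how the engine counts.

```shell
# show_pos FILE LINE COL
# Prints the requested line, then the same line starting at column COL.
show_pos() {
  sed -n "${2}p" "$1"
  sed -n "${2}p" "$1" | cut -c"${3}-"
}

# Demo on a throwaway file (hypothetical content, not the real model).
tmp=$(mktemp)
printf '%s\n' 'SELECT a,' '  COUNT(*) AS n' 'FROM t' > "$tmp"
show_pos "$tmp" 2 3
rm -f "$tmp"
```

For the error in the question, the call would be show_pos with the compiled model's path, line 7 and column 53.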

Arguments mismatch using where IN clause in query

I have column in hive table like below
testing_time
2018-12-31 14:45:55
2018-12-31 15:50:58
Now I want to get the distinct values as a variable so I can use in another query.
I have done like below
abc=`hive -e "select collect_set(testing_time) from db.tbl;"`
echo $abc
["2018-12-31 14:45:55","2018-12-31 15:50:58"]
xyz=${abc:1:-1}
when I do
hive -e "select * from db.tbl where testing_time in ($xyz)"
I get below error
Arguments for IN should be the same type! Types are {timestamp IN (string, string)
What mistake am I making?
What is the correct way of achieving my result?
Note: I know I can use subquery for this scenario but I would like to use variable to achieve my result
Problem is that you're comparing timestamp (column testing_time) with string (i.e. "2018-12-31 14:45:55"), so you need to convert string to timestamp, which you can do via TIMESTAMP(string).
Here's a bash script that adds the conversion:
RES=""                         # the resulting SQL list is accumulated here
IFS=","
read -ra ITEMS <<< "$xyz"      # split the timestamps into an array
for ITEM in "${ITEMS[@]}"; do
  RES="${RES}TIMESTAMP($ITEM)," # wrap each timestamp in TIMESTAMP(x)
done
unset IFS
RES="${RES%?}"                 # delete the trailing comma
Then you can run the constructed SQL query:
hive -e "select * from db.tbl where testing_time in ($RES)"
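The same transformation could also be done with sed in a single step. A sketch, using the sample output from the question as a hard-coded value for abc (in practice it would come from the hive -e call):

```shell
# Sample value copied from the question; normally produced by hive -e.
abc='["2018-12-31 14:45:55","2018-12-31 15:50:58"]'

# 1) drop the leading "[", 2) drop the trailing "]",
# 3) wrap every double-quoted value in TIMESTAMP(...).
RES=$(printf '%s' "$abc" | sed -e 's/^\[//' -e 's/]$//' -e 's/"\([^"]*\)"/TIMESTAMP("\1")/g')

# Print the query that would be handed to hive -e.
echo "select * from db.tbl where testing_time in ($RES)"
```

This avoids changing IFS, at the cost of assuming that the values themselves never contain a double quote.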

Create table name in Hive using variable substitution

I'd like to create a table name in Hive using variable substitution.
E.g.
SET market = "AUS";
create table ${hiveconf:market_cd}_active as ... ;
But it fails. Any idea how it can be achieved?
You should wrap the name in backticks (``) for that, like:
SET market=AUS;
CREATE TABLE `${hiveconf:market}_active` AS SELECT 1;
DESCRIBE `${hiveconf:market}_active`;
Example run script.sql from beeline:
$ beeline -u jdbc:hive2://localhost:10000/ -n hadoop -f script.sql
Connecting to jdbc:hive2://localhost:10000/
...
0: jdbc:hive2://localhost:10000/> SET market=AUS;
No rows affected (0.057 seconds)
0: jdbc:hive2://localhost:10000/> CREATE TABLE `${hiveconf:market}_active` AS SELECT 1;
...
INFO : Dag name: CREATE TABLE `AUS_active` AS SELECT 1(Stage-1)
...
INFO : OK
No rows affected (12.402 seconds)
0: jdbc:hive2://localhost:10000/> DESCRIBE `${hiveconf:market}_active`;
...
INFO : Executing command(queryId=hive_20190801194250_1a57e6ec-25e7-474d-b31d-24026f171089): DESCRIBE `AUS_active`
...
INFO : OK
+-----------+------------+----------+
| col_name | data_type | comment |
+-----------+------------+----------+
| _c0 | int | |
+-----------+------------+----------+
1 row selected (0.132 seconds)
0: jdbc:hive2://localhost:10000/> Closing: 0: jdbc:hive2://localhost:10000/
Markovitz's criticisms are correct, but do not produce a correct solution. In summary, you can use variable substitution for things like string comparisons, but NOT for things like naming variables and tables. If you know much about language compilers and parsers, you get a sense of why this would be true. You could construct such behavior in a language like Java, but SQL is just too crude.
Running that code produces an error, "cannot recognize input near '$' '{' 'hiveconf' in table name". (I am running Hortonworks, Hive 1.2.1000.2.5.3.0-37.)
I spent a couple hours Googling and experimenting with different combinations of punctuation, different tools ranging from command line, Ambari, and DB Visualizer, etc., and I never found any way to construct a table name or a field name with a variable value. I think you're stuck with using variables in places where you need a string literal, like comparisons, but you cannot use them in place of reserved words or existing data structures, if that makes sense. By example:
--works
drop table if exists user_rgksp0.foo;
-- Does NOT work:
set MY_FILE_NAME=user_rgksp0.foo;
--drop table if exists ${hiveconf:MY_FILE_NAME};
-- Works
set REPORT_YEAR=2018;
select count(1) as stationary_event_count, day, zip_code, route_id from aaetl_dms_pub.dms_stationary_events_pub
where part_year = '${hiveconf:REPORT_YEAR}'
-- Does NOT Work:
set MY_VAR_NAME='zip_code'
select count(1) as stationary_event_count, day, '${hiveconf:MY_VAR_NAME}', route_id from aaetl_dms_pub.dms_stationary_events_pub
where part_year = 2018
The quotes should be removed.
You're using the wrong variable name
SET market=AUS; create table ${hiveconf:market}_active as select 1;
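If substitution inside Hive does not work on your version, another option is to let the shell build the statement before Hive sees it. A sketch, assuming the value comes from a shell variable named market (a name chosen here for illustration); the check restricts it to characters that are safe in an identifier, and the statement is only printed:

```shell
market="AUS"

# Reject empty values and anything other than letters, digits and underscores,
# since the value becomes part of a table name.
case "$market" in
  ""|*[!A-Za-z0-9_]*) echo "invalid market: $market" >&2; exit 1 ;;
esac

# The shell expands ${market} here, so Hive receives a plain table name.
sql="CREATE TABLE ${market}_active AS SELECT 1;"
echo "$sql"
# To execute it: hive -e "$sql"
```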

data arguments are not working as expected in Hive

I am passing data parameters to hive script, but it's not working.
SET yrmonth=concat(substr(to_date(${hiveconf:runningdate}),1,4),substr(to_date(${hiveconf:runningdate}),6,2));
SET fom=TRUNC(${hiveconf:runningdate},'MONTH');
SET lom=LAST_DAY(${hiveconf:runningdate});
USE cust_db;
SELECT saleid,podid,pname
FROM product
WHERE productln_yrmo=${hiveconf:yrmonth};
--productln_yrmo is int column
SELECT cid,cname,cloc
FROM customer
WHERE customer_createddt >='${hiveconf:fom}'
AND customer_createddt <='${hiveconf:lom}'
AND cloc = 'AUS';
--customer_createddt is date column
hive -hiveconf runningdate='2016-05-18' -f cust.hql
From your query, customer_createddt appears to be a DATE column, since you use the '<=' and '>=' operators to select the range between 'fom' and 'lom'. A lexicographic comparison should work for a date range in yyyy-MM-dd format, but I am not sure how it behaves if the table was created with a partition on the customer_createddt column.
So please try casting the string to a date (and please post your table DDL for more clarity):
SELECT cid,cname,cloc
FROM customer
WHERE customer_createddt >=CAST(${hiveconf:fom} AS DATE)
AND customer_createddt <=CAST(${hiveconf:lom} AS DATE)
AND cloc = 'AUS';
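Another thing that may be worth checking: as far as I know, SET stores its right-hand side as plain text and does not evaluate it, and the shell strips the single quotes around 2016-05-18 before Hive receives the value. One way to sidestep both is to compute the three values in the shell and pass them in as literals. A sketch, assuming GNU date and the cust.hql name from the question; the hive command is only printed:

```shell
runningdate="2016-05-18"

# Assumption: GNU date (-d with relative expressions).
yrmonth=$(date -d "$runningdate" +%Y%m)       # year and month, e.g. 201605
fom=$(date -d "$runningdate" +%Y-%m-01)        # first day of the month
lom=$(date -d "$fom +1 month -1 day" +%F)      # last day of the month

# Pass the literal values; the three SET lines in cust.hql would then be dropped.
echo "hive --hiveconf yrmonth=$yrmonth --hiveconf fom=$fom --hiveconf lom=$lom -f cust.hql"
```

The queries in the question already reference ${hiveconf:yrmonth}, '${hiveconf:fom}' and '${hiveconf:lom}', so they would not need to change.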

Parameters in Datalab SQL modules

The parameterization example in the "SQL Parameters" IPython notebook in the datalab github repo (under datalab/tutorials/BigQuery/) shows how to change the value being tested for in a WHERE clause. Is it possible to use a parameter to change the name of a field being SELECT'd on?
eg:
SELECT COUNT(DISTINCT $a) AS n
FROM [...]
After I received the answer below, here is what I have done (with a dummy table name and field name, obviously):
%%sql --module test01
DEFINE QUERY get_counts
SELECT $a AS a, COUNT(*) AS n
FROM [project_id.dataset_id.table_id]
GROUP BY a
ORDER BY n DESC
table = bq.Table('project_id.dataset_id.table_id')
field = table.schema['field_name']
bq.Query(test01.get_counts,a=field).sql
bq.Query(test01.get_counts,a=field).results()
You can use a field from a Schema object (eg. given a table, get a specific field via table.schema[fieldname]).
Or implement a custom object with a _repr_sql_ method. See: https://github.com/GoogleCloudPlatform/datalab/blob/master/sources/lib/api/gcp/bigquery/_schema.py#L49