I would like to run the date command in a Hive script. Since it is a shell command, I tried prefixing it with '!'. I tried the following:
hive (default)> !date --date="6 months ago";
date: extra operand `ago"'
Try `date --help' for more information.
Command failed with exit code = 1
As mentioned by #samson in a comment, Hive does not correctly parse the quoted argument in --date="6 months ago". You can use the workaround below:
select add_months(current_date(),-6);
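If you drive Hive from a shell script anyway, another option is to compute the date in the shell and pass it in as a substitution variable. A minimal sketch (the variable name six_months_ago is just an example, not from the original post):
# compute the date in the shell, then hand it to Hive as a hivevar
SIX_MONTHS_AGO=$(date --date="6 months ago" +%Y-%m-%d)
hive --hivevar six_months_ago="$SIX_MONTHS_AGO" -e 'select "${hivevar:six_months_ago}";'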
We have a query in which we have defined more than 50 variables.
We call this HQL via a shell script, and most of the time I run into syntax issues where I have not defined the Hive variables properly in the query.
Example
set hiveconf:var0=value0;
set hiveconf:var1=value1;
set hiveconf:var2=value2;
select * from ${hiveconf:var0} where col1=${hiveconf:var1} and col2=${hiveconf:var2};
I want to check the above query after the Hive variables have been replaced.
So, is there a way to check whether the variables are parsed correctly, or whether there are any syntax errors?
Please let me know of any alternatives as well.
Better to use the hivevar namespace for this.
You can print any variable using the ! echo command:
set hivevar:var0=value0;
hive> ! echo Variable hivevar:var0 is ${hivevar:var0};
Result:
Variable hivevar:var0 is value0
Also use explain extended <query>; it prints a detailed query plan with predicates and fails if there is a syntax error.
Update:
You can also use SELECT for the same purpose; Hive can execute simple queries without starting an MR job if hive.fetch.task.conversion is set to more or minimal. If you are using Qubole, also add limit 1 to the query:
set hive.fetch.task.conversion=more;
select 'Variable hivevar:var0 is', '${hivevar:var0}' limit 1;
Why might you need to do this with SELECT? For example, to easily check a parameter using a cast or some UDF. If you need to check that a parameter is of type DATE, use:
set hive.fetch.task.conversion=more;
select 'Variable hivevar:var0 is', date '${hivevar:var0}' limit 1;
In this case, if ${hivevar:var0} is not a date, a type cast exception will be thrown and script execution will terminate.
Along with the hivevar namespace, we can use one more property: hive.root.logger=INFO,console.
This will display the query after the variable values have been replaced, from which we can find the issue.
cat test.hql
set hivevar:var1=${hivevar:var11};
set hivevar:var2=2345;
select ${hivevar:var11};
select ${hivevar:var2};
hive command - hive --hiveconf hive.root.logger=INFO,console --hivevar var11=1234 -f test.hql
Output on console:
select 1234
2018-10-17T08:23:31,632 INFO [main] ql.Driver: Completed executing command(queryId=-4dd6-493f-88be-03810f847fe7); Time taken: 0.003 seconds
OK
2018-10-17T08:23:31,632 INFO [main] ql.Driver: OK
2018-10-17T08:23:31,670 INFO [main] io.NullRowsInputFormat$NullRowsRecordReader: Using null rows input format
1234
I want to create a Hive script that uses, as its database, one of two given parameters, whichever is not null.
My hive-test.sql is this:
set db_name = coalesce(${hiveconf:dbOne}, ${hiveconf:dbTwo});
use ${hiveconf:db_name};
show tables;
and I run it with:
hive -hiveconf dbOne=my_database -f hive-test.sql
and I am getting:
FAILED: ParseException line 2:12 missing EOF at '(' near 'coalesce'
I should note that if I change the first line in script to:
set db_name = my_database;
it works.
I can't figure out what I did wrong. Your assistance is appreciated.
This feature is not available in Hive.
Do the variable assignment in the shell, for example as in setting-a-shell-variable-in-a-null-coalescing-fashion, and pass the result to Hive.
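A minimal sketch of that approach, assuming the caller exports dbOne and/or dbTwo (the wrapper itself is hypothetical):
#!/bin/sh
# use dbOne if it is set and non-empty, otherwise fall back to dbTwo
DB_NAME="${dbOne:-${dbTwo}}"
# pass the chosen name to Hive; hive-test.sql then only needs "use ${hiveconf:db_name}; show tables;"
hive -hiveconf db_name="$DB_NAME" -f hive-test.sql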
I have a directory with a file named file1.txt
And I run the command:
bq query "SELECT * FROM [publicdata:samples.shakespeare] LIMIT 5"
In my local machine it works fine but in Compute Engine I receive this error:
Waiting on bqjob_r2aaecf624e10b8c5_0000014d0537316e_1 ... (0s) Current status: DONE
BigQuery error in query operation: Error processing job 'my-project-id:bqjob_r2aaecf624e10b8c5_0000014d0537316e_1': Field 'file1.txt' not found.
If the directory is empty it works fine. I'm guessing the asterisk is being expanded into the file name(s) inside the query, but I don't know why.
Apparently the bq command located at /usr/bin/bq is the following wrapper script:
#!/bin/sh
exec /usr/lib/google-cloud-sdk/bin/bq ${@}
Because ${@} is unquoted, the shell re-splits the arguments and glob-expands the asterisk inside the query.
As a current workaround I'm calling /usr/lib/google-cloud-sdk/bin/bq directly.
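For comparison, a wrapper that quoted its arguments would not have this problem; a sketch, not the actual Cloud SDK script:
#!/bin/sh
# "$@" keeps every original argument as a single word,
# so the shell does not glob-expand the * inside the quoted query
exec /usr/lib/google-cloud-sdk/bin/bq "$@"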
I have a Pig script in which, at the beginning, I would like to generate a string of the dates of the past 7 days from a certain date (later used to retrieve log files for those days).
I attempt to do this with this line:
%declare CMD7 input= ; for i in {1..6}; do d=$(date -d "$DATE -i days" "+%Y-%m-%d"); input="\$input\$d,"; done; echo \$input
I get an error:
" ERROR 2999: Unexpected internal error. Error executing shell command: input= ; for i in {1..6}; do d=$(date -d "2012-07-10 -i days" "+%Y-%m-%d"); input="$input$d,"; done;. Command exit with exit code of 127"
However, the shell command runs perfectly fine outside of Pig. I am really not sure what is going wrong here.
Thank you!
I have a working solution, though not as streamlined as you want; essentially, I could not get Pig to execute a complex shell statement in the %declare.
I first wrote a shell script (let's call it 6-days-back-from.sh):
#!/bin/bash
DATE=$1
for i in {1..6}; do d=$( date -d "$DATE -$i days" +%F ) ; echo -n "$d "; done
Then a pig script as follow (let's call it days.pig):
%declare my_date `./6-days-back-from.sh $DATE`
A = LOAD 'dual' USING PigStorage();
B = FOREACH A GENERATE '$my_date';
DUMP B;
Note that dual is a directory containing a text file with a single line of text, used only so there is a row against which to display our variable.
I called the script as follow:
pig -x local -param DATE="2012-08-03" days.pig
and got the following output:
({(2012-08-02),(2012-08-01),(2012-07-31),(2012-07-30),(2012-07-29),(2012-07-28)})
I assume that all I need to do is to:
Create an SQL file, e.g. nameofsqlfile.sql, with the contents:
perform proc_my_sql_funtion();
Execute this as a cron job.
However, I don't know what commands I'd need to write so that this cron job executes the Postgres function against a specified host, port, database, user and password.
You just need to think of a cron job as running a shell command at a specified time or day.
So your first job is to work out how to run your shell command.
psql --host host.example.com --port 12345 --dbname nameofdatabase --username postgres < my.sql
You can then just add this to your crontab (I recommend you use crontab -e to avoid breaking things):
# runs your command at 00:00 every day
#
# min hour wday month mday command-to-run
0 0 * * * psql --host host.example.com --port 12345 --dbname nameofdatabase < my.sql
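Since psql will not take the password on the command line, one common approach is a ~/.pgpass file (or a PGPASSWORD environment variable in the cron entry). A sketch, reusing the host/port/database/user from above with a placeholder password:
# ~/.pgpass format is host:port:database:username:password; it must be chmod 600
echo 'host.example.com:12345:nameofdatabase:postgres:yourpassword' >> ~/.pgpass
chmod 600 ~/.pgpass
# alternatively, set PGPASSWORD inline in the crontab line (less secure):
# 0 0 * * * PGPASSWORD=yourpassword psql --host host.example.com --port 12345 --dbname nameofdatabase --username postgres < my.sql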
In most cases you can put all of the SQL source in a shell 'here document'. The nice thing about here documents is that the shell's ${MY_VAR} variables are expanded even within single quotes, e.g.:
#!/bin/sh
THE_DATABASE=personnel
MY_TABLE=employee
THE_DATE_VARIABLE_NAME=hire_date
THE_MONTH=10
THE_DAY=01
psql ${THE_DATABASE} <<THE_END
SELECT COUNT(*) FROM ${MY_TABLE}
WHERE ${THE_DATE_VARIABLE_NAME} >= '2011-${THE_MONTH}-${THE_DAY}'::DATE;
THE_END
YMMV
Check this
http://archives.postgresql.org/pgsql-admin/2000-10/msg00026.php
and
http://www.dbforums.com/postgresql/340741-cron-jobs-postgresql.html
Or you can just create a bash script containing your code and call it from crontab.
For PostgreSQL 10 and above you can use pg_cron. As stated in its README.md:
pg_cron is a simple cron-based job scheduler for PostgreSQL (10 or higher) that runs inside the database as an extension. It uses the same syntax as regular cron, but it allows you to schedule PostgreSQL commands directly from the database:
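For example, scheduling the function from the question would look roughly like this (a sketch based on pg_cron's cron.schedule call; it assumes the extension has already been created in nameofdatabase):
# run the stored function every day at midnight, scheduled from inside the database
psql --dbname nameofdatabase -c "SELECT cron.schedule('0 0 * * *', 'SELECT proc_my_sql_funtion()');"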