How to pass dynamic parameters in where condition in bq command line - google-bigquery

FTIMESTAMP="2018-07-09 00:00:00"
LTIMESTAMP="2018-07-09 08:00:00"
echo $FTIMESTAMP
echo $LTIMESTAMP
bq query --nouse_legacy_sql 'insert `table1`(Time,UserId)
select Time,UserId from `table2`
WHERE _PARTITIONTIME >= "$FTIMESTAMP" AND _PARTITIONTIME < "$LTIMESTAMP"'
When I ran these commands in a .sh script, I got the following error:
*Error in query string: Error processing job '************': Could not cast literal "$FTIMESTAMP" to type TIMESTAMP at [3:25].*
I want to pass those parameters dynamically once this query is successful.
Or is there another way to extract the data for the last 8 hours based on partition time?

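In your script the query is wrapped in single quotes, so the shell never expands $FTIMESTAMP and $LTIMESTAMP, and BigQuery receives those names as literal text; that is what the cast error is telling you. Rather than splicing values into the query text, it is a better idea to use query parameters: you avoid quoting problems and syntax errors in the generated query. Here is an example using parameters with the names from your question: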
$ bq query --use_legacy_sql=false \
--parameter=FTIMESTAMP:TIMESTAMP:"2018-07-09 00:00:00" \
--parameter=LTIMESTAMP:TIMESTAMP:"2018-07-09 00:00:00" \
"SELECT @FTIMESTAMP, @LTIMESTAMP;"
+---------------------+---------------------+
| f0_ | f1_ |
+---------------------+---------------------+
| 2018-07-09 00:00:00 | 2018-07-09 00:00:00 |
+---------------------+---------------------+
In your case, you would want something like this:
$ bq query --nouse_legacy_sql \
--parameter=FTIMESTAMP:TIMESTAMP:"2018-07-09 00:00:00" \
--parameter=LTIMESTAMP:TIMESTAMP:"2018-07-09 08:00:00" \
'insert `table1`(Time,UserId)
select Time,UserId from `table2`
WHERE _PARTITIONTIME >= @FTIMESTAMP AND _PARTITIONTIME < @LTIMESTAMP'
If you still want to set the parameter values from shell variables, you can do so like this:
$ FTIMESTAMP="2018-07-09 00:00:00"
$ LTIMESTAMP="2018-07-09 08:00:00"
$ bq query --nouse_legacy_sql \
--parameter=FTIMESTAMP:TIMESTAMP:"$FTIMESTAMP" \
--parameter=LTIMESTAMP:TIMESTAMP:"$LTIMESTAMP" \
'insert `table1`(Time,UserId)
select Time,UserId from `table2`
WHERE _PARTITIONTIME >= @FTIMESTAMP AND _PARTITIONTIME < @LTIMESTAMP'
This sets the values of the query parameters from the shell variables, which are then passed to BigQuery.
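For the second part of the question (the last 8 hours), the two timestamps can be computed in the shell instead of being hard-coded. A minimal sketch, assuming GNU date (the -d option) and assuming hour-aligned UTC boundaries are what you want; it only prints the --parameter flags so you can check them before adding them to the bq query command shown above:

```shell
# Assumes GNU date; BSD/macOS date uses different options.
now=$(date -u +%s)            # current time as epoch seconds
start=$((now - 8 * 3600))     # 8 hours earlier

# Truncate both bounds to the start of the hour, in UTC.
LTIMESTAMP=$(date -u -d "@$now" +'%Y-%m-%d %H:00:00')
FTIMESTAMP=$(date -u -d "@$start" +'%Y-%m-%d %H:00:00')

# Flags to pass to bq query alongside the parameterized SQL.
echo "--parameter=FTIMESTAMP:TIMESTAMP:$FTIMESTAMP"
echo "--parameter=LTIMESTAMP:TIMESTAMP:$LTIMESTAMP"
```

Both values come from a single clock reading, so the window is always exactly 8 hours even if the script runs across an hour boundary.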

Related

How to pass variables as expressions in Bigquery timestamp_diff

Currently, I am trying to check the timestamp difference in hours with expressions passed as variables through the command line, but I am unable to get the desired output when passing them through variables.
a=2019-11-1812:49:43
b=2020-04-04 20:32:33
timediff=$(bq query --nouse_legacy_sql \ 'SELECT TIMESTAMP_DIFF(TIMESTAMP "'$a'", TIMESTAMP "$b", HOUR);')
Looks like the variables I am passing are not recognized. Can someone help me understand the correct way of doing it?
In addition to Hemant's answer, here is an alternative method.
As stated in the documentation, you can run parameterized queries in BigQuery from the command-line interface (CLI). Use the --parameter flag with your bq query command to specify the variables/parameters you will use, and reference each one in the query as @name.
The flag must be in the format name:type:value. If type is omitted (name::value), STRING is assumed. As an example:
timediff=$(bq query --use_legacy_sql=false \
--parameter='ts_value:TIMESTAMP:2016-12-07 08:00:00' \
--parameter='ts_value1:TIMESTAMP:2016-12-07 09:00:00' \
'SELECT
TIMESTAMP_DIFF(@ts_value, @ts_value1, HOUR)')
echo $timediff
And the output is:
+-----+
| f0_ |
+-----+
| -1 |
+-----+
You could use --format=csv to format the output as a line:
f0_ -1
In addition, I would like to add that you can use aliases to simplify your query. For instance:
alias bq_set="bq query --use_legacy_sql=false --format=pretty"
timediff=$(bq_set \
--parameter='ts_value:TIMESTAMP:2016-12-07 08:00:00' \
--parameter='ts_value1:TIMESTAMP:2016-12-07 09:00:00' \
'SELECT
TIMESTAMP_DIFF(@ts_value, @ts_value1, HOUR)')
echo $timediff
The output:
+-----+
| f0_ |
+-----+
| -1 |
+-----+
As you can see, the alias is just a way to simplify your command.
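One caveat with the alias approach: by default bash expands aliases only in interactive shells, so inside a .sh script the alias is not recognized unless shopt -s expand_aliases is set. A shell function behaves the same way in both cases. A minimal sketch, reusing the bq_set name from above as a function rather than an alias:

```shell
# Wrapper function: forwards any extra arguments to bq query.
bq_set() {
  bq query --use_legacy_sql=false --format=pretty "$@"
}

# Example call (requires the bq CLI to be installed and authenticated):
# timediff=$(bq_set \
#   --parameter='ts_value:TIMESTAMP:2016-12-07 08:00:00' \
#   --parameter='ts_value1:TIMESTAMP:2016-12-07 09:00:00' \
#   'SELECT TIMESTAMP_DIFF(@ts_value, @ts_value1, HOUR)')
```

Because the function passes "$@" through unchanged, parameters and query text containing spaces reach bq as single arguments.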
Try using single quotes around the variables, but double-quotes around the entire query. For example:
a='2019-11-18 12:49:43'
b='2020-04-04 20:32:33'
timediff=$(bq query --format=csv --nouse_legacy_sql "SELECT TIMESTAMP_DIFF(TIMESTAMP '$a', TIMESTAMP '$b', HOUR);" | awk 'NR>1')
echo $timediff
-3319
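To see why the quoting matters, you can print the query string instead of sending it to bq. This is a small sketch using the same variable name as the example above; no BigQuery call is made:

```shell
a='2019-11-18 12:49:43'

# Single quotes around the whole string: $a is NOT expanded.
single='SELECT TIMESTAMP "$a"'
# Double quotes around the whole string: $a IS expanded.
double="SELECT TIMESTAMP '$a'"

printf '%s\n' "$single"   # prints: SELECT TIMESTAMP "$a"
printf '%s\n' "$double"   # prints: SELECT TIMESTAMP '2019-11-18 12:49:43'
```

Printing the string this way is a quick check before running the real command.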

assign date value dynamically to hive query

I have a partitioned table in Hive and I want to assign the value of the date column dynamically (yesterday's date). Below is my current query, but it's not working.
ALTER TABLE db1.table1 ADD IF NOT EXISTS PARTITION (loaddate="date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd') , 1)") LOCATION "hdfs://location1/abc/rawdata/externalhivetables/downloading/data";
Instead of returning the date value it's returning me the complete expression.
select downloading.loaddate From downloading limit 3;
+------------------------------------------------------------+
| downloading.loaddate |
+------------------------------------------------------------+
| date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd') , 1) |
| date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd') , 1) |
| date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd') , 1) |
In the Hive shell we cannot yet assign a variable from the result of a query, so two steps are needed:
Use a shell script to execute the query and store the result in a variable.
Then initialize the Hive shell/script with that variable.
bash$ var=`hive -S -e "select date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd') , 1);"`
bash$ echo $var
Now initialize the hive/beeline shell with the value of var:
bash$ hive -hiveconf dd=$var
hive> ALTER TABLE db1.table1 ADD IF NOT EXISTS PARTITION (loaddate='${hiveconf:dd}') LOCATION "hdfs://location1/abc/rawdata/externalhivetables/downloading/data";
Alternatively, use the shell to calculate the date and substitute it using shell variable substitution (date -d is a GNU date option):
bash$ dt=$(date -d '-1 day' +%Y-%m-%d)
bash$ hive -e "ALTER TABLE db1.table1 ADD IF NOT EXISTS PARTITION (loaddate='$dt') LOCATION 'hdfs://location1/abc/rawdata/externalhivetables/downloading/data'"

Remove header from query result in bq command line

I have a query $(bq query --format=csv "select value from $BQConfig where parameter = 'Columnwidth'") .
The output of the query in csv format is :
value
3 4 6 8
Here I want to get only the result 3 4 6 8, not value, which is just the header.
I have gone through the Google documentation and found that --noprint_header works only for bq extract. I didn't find anything for bq query.
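If you are in a bash shell, you can use sed or awk to drop the header. Check which line the header lands on in your output and adjust the numbers: sed "2 d" deletes only line 2, while awk 'NR>2' skips lines 1 and 2.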
bq query --format=csv "SELECT 1 x" | sed "2 d"
Or:
bq query --format=csv "SELECT 1 x" | awk 'NR>2'
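The filters can be tried without calling BigQuery. In this sketch, sample_output is a hypothetical stand-in for the bq query --format=csv command, and it assumes the output starts directly with the header line; if your output has a blank or status line first, raise the line numbers by one:

```shell
# Hypothetical stand-in for: bq query --format=csv "select value from ..."
sample_output() { printf 'value\n3 4 6 8\n'; }

# Three equivalent ways to drop the first line (the header):
sample_output | tail -n +2    # print from line 2 onward
sample_output | sed '1d'      # delete line 1
sample_output | awk 'NR>1'    # print records after the first
```

Each pipeline prints only 3 4 6 8 for this sample input.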
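You can use the --skip_leading_rows argument (source: Create a table from a file). Note that this is a bq load flag that skips header rows in the file being loaded, so it applies when loading data rather than to bq query output.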

BigQuery select alias using regex_extract_all in standard mode

I'm unable to reference a SELECT alias in BigQuery (standard mode).
Trying to do this query:
SELECT
REGEXP_EXTRACT_ALL(text,
r"(<div \w+>)") AS matches
FROM
regex.test
WHERE
matches IS NOT NULL
Here are steps to reproduce.
bq mk regex
bq mk -t regex.test id:integer,text:string
echo '{"id":1, "text":"<div a>"}' | bq insert regex.test
echo '{"id":2, "text":"<div b>"}' | bq insert regex.test
echo '{"id":3, "text":"<div>"}' | bq insert regex.test
bq query --use_legacy_sql=false "select REGEXP_EXTRACT_ALL(text, r\"(<div \w+>)\") AS matches FROM regex.test WHERE id IS NOT NULL"
+--------------+
| matches |
+--------------+
| [u'<div b>'] |
| [] |
| [u'<div a>'] |
+--------------+
When I try to reference the matches alias, I see an error:
bq query --use_legacy_sql=false "select REGEXP_EXTRACT_ALL(text, r\"(<div \w+>)\") AS matches FROM regex.test WHERE matches IS NOT NULL"
Error in query string: Error processing job 'myname': Unrecognized name:
matches
I am unable to reference the alias matches, and am unable to filter those results WHERE matches IS NOT NULL.
Does anyone know what I'm doing incorrectly here?
Thanks!
Even in BQ, you can't use a column alias in the where clause. Just use a subquery:
SELECT t.*
FROM (SELECT REGEXP_EXTRACT_ALL(text, r"(<div \w+>)") AS matches
FROM regex.test
) t
WHERE ARRAY_LENGTH(matches) > 0
Check out SELECT list aliases visibility
Comparing with NULL doesn't work for REGEXP_EXTRACT_ALL because
it returns an array, so checking the array length is the way to go. Comparing with NULL will still work for REGEXP_EXTRACT.
In addition, to filter out records without matches you would normally use a regex match function. Note that REGEXP_MATCH is a legacy SQL function; in standard SQL the equivalent is REGEXP_CONTAINS.
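When running the subquery version through the bq command line, a quoted here-document avoids the backslash-escaped double quotes used in the question. A minimal sketch that only builds and prints the query text; the final bq call is left as a comment because it needs the regex.test table from the question:

```shell
# The quoted delimiter ('EOF') stops the shell from expanding or
# unescaping anything inside, so r"(<div \w+>)" arrives intact.
query=$(cat <<'EOF'
SELECT t.*
FROM (SELECT REGEXP_EXTRACT_ALL(text, r"(<div \w+>)") AS matches
      FROM regex.test) t
WHERE ARRAY_LENGTH(matches) > 0
EOF
)

printf '%s\n' "$query"
# To run it: bq query --use_legacy_sql=false "$query"
```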

executing HIVE query in background

How can I execute a Hive query in the background when the query looks like the one below?
Select count(1) from table1 where column1='value1';
I am trying to write it using a script like below
#!/usr/bin/ksh
exec 1> /home/koushik/Logs/`basename $0 | cut -d"." -f1 | sed 's/\.sh//g'`_$(date +"%Y%m%d_%H%M%S").log 2>&1
ST_TIME=`date +%s`
cd $HIVE_HOME/bin
./hive -e 'SELECT COUNT(1) FROM TABLE1 WHERE COLUMN1 = ''value1'';'
END_TIME=`date +%s`
TT_SECS=$(( END_TIME - ST_TIME))
TT_HRS=$(( TT_SECS / 3600 ))
TT_REM_MS=$(( TT_SECS % 3600 ))
TT_MINS=$(( TT_REM_MS / 60 ))
TT_REM_SECS=$(( TT_REM_MS % 60 ))
printf "\n"
printf "Total time taken to execute the script="$TT_HRS:$TT_MINS:$TT_REM_SECS HH:MM:SS
printf "\n"
but I am getting an error like
FAILED: SemanticException [Error 10004]: Line 1:77 Invalid table alias or column reference 'value1'
Let me know exactly where I am making a mistake.
Create a file named example:
vi example
Enter the query in the file and save it.
create table sample as
Select count(1) from table1 where column1='value1';
Now run the file using the following command:
hive -f example 1>example.error 2>example.output &
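The shell prints the job number, for example:
[1]
Now disown the process:
disown
The process will keep running in the background. Hive writes its progress messages to stderr, which the command above redirects to example.output, so to follow the job's status you can use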
tail -f example.output
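The same pattern can be scripted so that the caller records the process ID and later collects the exit status. A minimal sketch; run_query is a hypothetical stand-in for hive -f example so the snippet runs without Hive, and the log paths are temporary files:

```shell
# Hypothetical stand-in for: hive -f example
run_query() { sleep 1; echo "query finished"; }

out_log=$(mktemp)
err_log=$(mktemp)

# Start in the background, keeping stdout and stderr in separate files.
run_query >"$out_log" 2>"$err_log" &
pid=$!
echo "started background job with PID $pid"

# Later: block until the job ends and capture its exit status.
wait "$pid"
status=$?
echo "exit status: $status"
cat "$out_log"
```

Unlike disown, this keeps the job attached to the script, which is useful when a later step depends on the query having succeeded.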
True, @Koushik! Glad that you found the issue.
In the query, bash could not form the Hive query because of the ambiguous single quotes.
Though SELECT COUNT(1) FROM Table1 WHERE Column1 = 'Value1' is valid in Hive,
hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = 'Value1';' is not: the shell strips the inner single quotes, so Hive sees a bare Value1 and treats it as a column reference.
The best solution is to use double quotes around Value1:
hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = "Value1";'
or, as a quick and dirty solution, to embed each literal single quote by wrapping it in double quotes:
hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = '"'"'Value1'"'"';'
This makes sure that the Hive query is properly formed before it is executed. I would not suggest the second approach unless you desperately need a single quote ;)
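You can check what the shell actually hands to Hive by printing each variant instead of running it. A small sketch using the same statements as above; no Hive call is made:

```shell
# Inner single quotes end and reopen the shell string, so they are lost.
broken='SELECT COUNT(1) FROM Table1 WHERE Column1 = 'Value1';'
# Double quotes inside a single-quoted string are passed through unchanged.
dquoted='SELECT COUNT(1) FROM Table1 WHERE Column1 = "Value1";'
# The '"'"' sequence closes the string, adds a literal ', and reopens it.
squoted='SELECT COUNT(1) FROM Table1 WHERE Column1 = '"'"'Value1'"'"';'

printf '%s\n' "$broken"   # SELECT COUNT(1) FROM Table1 WHERE Column1 = Value1;
printf '%s\n' "$dquoted"  # SELECT COUNT(1) FROM Table1 WHERE Column1 = "Value1";
printf '%s\n' "$squoted"  # SELECT COUNT(1) FROM Table1 WHERE Column1 = 'Value1';
```

The first line of output shows the unquoted Value1 that leads to the "Invalid table alias or column reference" error.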
I was able to resolve it by replacing the single quotes with double quotes. The modified statement now looks like
./hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = "Value1";'