In BigQuery, how to query a table using a fully qualified name where the project name contains a hyphen

I am trying to run this command, and it seems BigQuery doesn't like the - in the project name. Also, com-fin-dev is not my default project.
bq --location=US query --use_legacy_sql=false "SELECT col FROM com-fin-dev:my_schema.my_table LIMIT 10"
Syntax error: Unexpected "-"
Are there any alternative ways I can use the fully qualified table name in BigQuery where the project name contains a hyphen?

For the shell that I use, I have to escape the backticks:
bq --location=US query --use_legacy_sql=false \
"SELECT col FROM \`com-fin-dev.my_schema.my_table\` LIMIT 10"
Note that only the project name needs to be wrapped in backticks:
bq --location=US query --use_legacy_sql=false \
"SELECT col FROM \`com-fin-dev\`.my_schema.my_table LIMIT 10"

You should use the following "spelling":
`com-fin-dev.my_schema.my_table`
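Alternatively, single-quoting the query string keeps bash from treating the backticks as command substitution, so no escaping is needed at all. A sketch using the question's table name, with echo standing in so it can be checked without bq:

```shell
# With single quotes, bash leaves the backticks alone, so no backslashes needed:
QUERY='SELECT col FROM `com-fin-dev.my_schema.my_table` LIMIT 10'
echo "$QUERY"
# bq --location=US query --use_legacy_sql=false "$QUERY"
```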

Athena Prepared Statement Parameter Order

I'm attempting to use the Athena parameterized queries feature in conjunction with CTEs.
If I run a simple Athena CTE select statement like this, I get back no results even though I should get a single row back with a single value of 1:
aws athena start-query-execution \
--query-string "WITH cte as (select 1 where 1 = ?) select * from cte where 2 = ?" \
--query-execution-context "Database"="default" \
--result-configuration "OutputLocation"="s3://athena-test-bucket/" \
--execution-parameters "1" "2"
# Output CSV:
# "_col0"
Strangely, if I reverse the parameters, the query works correctly:
aws athena start-query-execution \
--query-string "WITH cte as (select 1 where 1 = ?) select * from cte where 2 = ?" \
--query-execution-context "Database"="default" \
--result-configuration "OutputLocation"="s3://athena-test-bucket/" \
--execution-parameters "2" "1"
# Output CSV:
# "_col0"
# "1"
The AWS docs state that:
In parameterized queries, parameters are positional and are denoted by ?. Parameters are assigned values by their order in the query. Named parameters are not supported.
But this clearly doesn't seem to be the case, as the query only acts as expected when the parameters are in reversed order.
How can I get Athena to correctly insert parameters in the standard, first-to-last, positional way that's typical of other DBs? While the above example is simple, in reality, I have queries with arbitrarily nested CTEs with arbitrary amounts of parameters.
This is a known issue (incorrect parameter ordering for prepared statements using a WITH clause) in old versions of Presto (link 1, link 2) and Trino (link), which has since been fixed. For Presto, the fix landed in version 0.272 (fix PR):
Fix parameter ordering for prepared statement queries using a WITH clause.
Sadly, at the time of writing, Athena engine version 2 is based on Presto 0.217, so you need to work around the bug by changing the parameter order.
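For the simple two-parameter case above, one way to script the workaround is to reverse the parameter list before passing it. A sketch with a hypothetical helper; for arbitrarily nested CTEs the required order may be more involved than a plain reversal:

```shell
# Print the given arguments in reverse order (hypothetical helper).
reverse_args() {
  out=""
  for arg in "$@"; do
    out="$arg${out:+ $out}"
  done
  printf '%s\n' "$out"
}

# aws athena start-query-execution ... --execution-parameters $(reverse_args "1" "2")
```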

BigQuery table not found error on location

I'm trying to run the script below, but I keep getting an error that the table isn't found. The problem is caused by $date in the SELECT query; how do I fix this?
My goal is to copy the table from another dataset with data matching based on the date.
#!/bin/bash
date="20160805"
until [[ $date > 20160807 ]];
do
bq query --use_legacy_sql=false --destination_table="google_analytics.ga_sessions_${date}" 'SELECT g.* FROM `10241241.ga_sessions_$date` g, UNNEST (hits) as hits where hits.page.hostname="www.googlemerchandisestore.com" '
date=$(date +'%Y%m%d' -d "$date + 1 day")
done
The errors:
BigQuery error in query operation: Error processing job 'test-247020:bqjob_r6a2d68fbc6d04a34_000001722edd8043_1': Not found: Table test-247020:10241241.ga_sessions_ was not found in location EU
BigQuery error in query operation: Error processing job 'test-247020:bqjob_r5c42006229434f72_000001722edd85ae_1': Not found: Table test-247020:10241241.ga_sessions_ was not found in location EU
BigQuery error in query operation: Error processing job 'test-247020:bqjob_r6114e0d3e72b6646_000001722edd8960_1': Not found: Table test-247020:10241241.ga_sessions_ was not found in location EU
The problem is that you are using single quotes for the query, so bash doesn't replace $date with the value of the variable. You need to use double quotes for the query string:
date="20160805"
until [[ $date > 20160807 ]];
do
bq query --use_legacy_sql=false --destination_table="google_analytics.ga_sessions_${date}" "SELECT g.* FROM \`10241241.ga_sessions_$date\` g, UNNEST (hits) as hits where hits.page.hostname=\"www.googlemerchandisestore.com\" "
date=$(date +'%Y%m%d' -d "$date + 1 day")
done
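The difference can be seen without bq at all; a minimal sketch of how the two quoting styles treat $date:

```shell
date="20160805"
echo 'suffix: ga_sessions_$date'   # single quotes: prints the literal text $date
echo "suffix: ga_sessions_$date"   # double quotes: prints suffix: ga_sessions_20160805
```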

Arguments mismatch using where IN clause in query

I have a column in a Hive table like below:
testing_time
2018-12-31 14:45:55
2018-12-31 15:50:58
Now I want to get the distinct values into a variable so I can use them in another query.
I have done it like below:
abc=`hive -e "select collect_set(testing_time) from db.tbl"`
echo $abc
["2018-12-31 14:45:55","2018-12-31 15:50:58"]
xyz=${abc:1:-1}
when I do
hive -e "select * from db.tbl where testing_time in ($xyz)"
I get below error
Arguments for IN should be the same type! Types are {timestamp IN (string, string)
What is the mistake I am making?
What is the correct way of achieving my result?
Note: I know I can use subquery for this scenario but I would like to use variable to achieve my result
The problem is that you're comparing a timestamp (column testing_time) with strings (e.g. "2018-12-31 14:45:55"), so you need to convert each string to a timestamp, which you can do via TIMESTAMP(string).
Here's a bash script that adds the conversion:
RES="" # here we will save the resulting SQL
IFS=","
read -ra ITEMS <<< "$xyz" # split the timestamps into an array
for ITEM in "${ITEMS[@]}"; do
RES="${RES}TIMESTAMP($ITEM)," # append each timestamp, wrapped in TIMESTAMP(...)
done
unset IFS
RES="${RES%?}" # delete the trailing comma
Then you can run the constructed SQL query:
hive -e "select * from db.tbl where testing_time in ($RES)"
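Putting it together with the sample collect_set output from the question (a sketch, runnable locally without Hive):

```shell
abc='["2018-12-31 14:45:55","2018-12-31 15:50:58"]'   # sample collect_set output
xyz=${abc#\[}; xyz=${xyz%\]}                          # strip the surrounding [ ]
RES=""
IFS=","
read -ra ITEMS <<< "$xyz"   # split on commas; the quotes around each value survive
for ITEM in "${ITEMS[@]}"; do
  RES="${RES}TIMESTAMP($ITEM),"
done
unset IFS
RES="${RES%?}"
echo "$RES"
# TIMESTAMP("2018-12-31 14:45:55"),TIMESTAMP("2018-12-31 15:50:58")
```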

Remove header from query result in bq command line

I have a query: $(bq query --format=csv "select value from $BQConfig where parameter = 'Columnwidth'").
The output of the query in CSV format is:
value
3 4 6 8
Here I want to get only the result 3 4 6 8, not value, which is just a header.
I have gone through the Google documentation and found that --noprint_header works only for bq extract; I didn't find anything for bq query.
If you are on a bash shell, you could use sed or awk to skip the first lines:
bq query --format=csv "SELECT 1 x" | sed "2 d"
Or:
bq query --format=csv "SELECT 1 x" | awk 'NR>2'
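If the CSV output has no leading blank line, tail -n +2 (print from the second line onward) also works; a local sketch with the question's output stood in via printf:

```shell
# Simulate the CSV output and drop the header row:
printf 'value\n3 4 6 8\n' | tail -n +2
# 3 4 6 8
```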
When loading a file into a table, you can use the --skip_leading_rows argument (source: Create a table from a file).

executing HIVE query in background

How do I execute a Hive query in the background when the query looks like below?
Select count(1) from table1 where column1='value1';
I am trying to write it using a script like below:
#!/usr/bin/ksh
exec 1> /home/koushik/Logs/`basename $0 | cut -d"." -f1 | sed 's/\.sh//g'`_$(date +"%Y%m%d_%H%M%S").log 2>&1
ST_TIME=`date +%s`
cd $HIVE_HOME/bin
./hive -e 'SELECT COUNT(1) FROM TABLE1 WHERE COLUMN1 = ''value1'';'
END_TIME=`date +%s`
TT_SECS=$(( END_TIME - ST_TIME))
TT_HRS=$(( TT_SECS / 3600 ))
TT_REM_MS=$(( TT_SECS % 3600 ))
TT_MINS=$(( TT_REM_MS / 60 ))
TT_REM_SECS=$(( TT_REM_MS % 60 ))
printf "\n"
printf "Total time taken to execute the script=%s HH:MM:SS" "$TT_HRS:$TT_MINS:$TT_REM_SECS"
printf "\n"
but I am getting an error like:
FAILED: SemanticException [Error 10004]: Line 1:77 Invalid table alias or column reference 'value1'
Let me know exactly where I am making a mistake.
Create a file named example:
vi example
Enter the query in the file and save it:
create table sample as
Select count(1) from table1 where column1='value1';
Now run the file using the following command:
hive -f example 1>example.error 2>example.output &
You will get the result as
[1]
Now disown the process :
disown
Now the process will run in the background. If you want to know the status of the output, you may use
tail -f example.output
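The redirect-then-background pattern can be tried with a stand-in command in place of hive (the temp filenames here are just examples):

```shell
# Run a command in the background, capturing stdout and stderr separately:
out=$(mktemp); err=$(mktemp)
sh -c 'echo result; echo progress >&2' 1>"$out" 2>"$err" &
wait $!            # in an interactive shell you would disown instead of waiting
cat "$out"         # the command's stdout: result
cat "$err"         # the command's stderr: progress
```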
True, @Koushik! Glad that you found the issue.
bash was unable to form the Hive query correctly due to the ambiguous single quotes.
Though SELECT COUNT(1) FROM Table1 WHERE Column1 = 'Value1' is valid in hive,
$hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = 'Value1';' is not valid.
The best solution would be to use double quotes for Value1:
hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = "Value1";'
Or use a quick-and-dirty solution by including the single quotes within double quotes:
hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = "'"Value1"'";'
This would make sure that the Hive query is properly formed and then executed accordingly. I'd not suggest this approach unless you desperately need a single quote ;)
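Since echo renders a string exactly as the shell would hand it to hive -e, the quoting can be checked locally. The second line below is not from the answer above; it is the standard close-quote/reopen shell idiom, shown for the case where a literal single quote really is needed:

```shell
# Double quotes inside single quotes pass through untouched:
echo 'SELECT COUNT(1) FROM Table1 WHERE Column1 = "Value1";'
# SELECT COUNT(1) FROM Table1 WHERE Column1 = "Value1";

# To embed a literal single quote, close the single-quoted string,
# emit '"'"' and reopen it:
echo 'SELECT COUNT(1) FROM Table1 WHERE Column1 = '"'"'Value1'"'"';'
# SELECT COUNT(1) FROM Table1 WHERE Column1 = 'Value1';
```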
I was able to resolve it by replacing the single quotes with double quotes. The modified statement now looks like:
./hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = "Value1";'