BigQuery table not found error on location - google-bigquery

I'm trying to run the script below, but I keep getting an error that the table isn't found. The problem is caused by $date in the SELECT query; how do I fix it?
My goal is to copy tables from another dataset, selecting the data matching each date.
#!/bin/bash
date="20160805"
until [[ $date > 20160807 ]];
do
bq query --use_legacy_sql=false --destination_table="google_analytics.ga_sessions_${date}" 'SELECT g.* FROM `10241241.ga_sessions_$date` g, UNNEST (hits) as hits where hits.page.hostname="www.googlemerchandisestore.com" '
date=$(date +'%Y%m%d' -d "$date + 1 day")
done
Below is the error:
BigQuery error in query operation: Error processing job 'test-247020:bqjob_r6a2d68fbc6d04a34_000001722edd8043_1': Not found: Table test-247020:10241241.ga_sessions_ was not found in location EU
BigQuery error in query operation: Error processing job 'test-247020:bqjob_r5c42006229434f72_000001722edd85ae_1': Not found: Table test-247020:10241241.ga_sessions_ was not found in location EU
BigQuery error in query operation: Error processing job 'test-247020:bqjob_r6114e0d3e72b6646_000001722edd8960_1': Not found: Table test-247020:10241241.ga_sessions_ was not found in location EU

The problem is that you are using single quotes around the query, so bash does not substitute the value of $date. You need to use double quotes for the query string (escaping the backticks and the inner double quotes):
date="20160805"
until [[ $date > 20160807 ]];
do
bq query --use_legacy_sql=false --destination_table="google_analytics.ga_sessions_${date}" "SELECT g.* FROM \`10241241.ga_sessions_$date\` g, UNNEST (hits) as hits where hits.page.hostname=\"www.googlemerchandisestore.com\" "
date=$(date +'%Y%m%d' -d "$date + 1 day")
done
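As a quick sanity check of the quoting behavior, here is a minimal sketch (the table name is illustrative):
date="20160805"
echo 'ga_sessions_$date'   # single quotes: prints ga_sessions_$date, no expansion
echo "ga_sessions_$date"   # double quotes: prints ga_sessions_20160805
Echoing the full bq command before running it is an easy way to confirm the query string the shell will actually send.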

Cannot define a BigQuery column as ARRAY<STRUCT<INT64, INT64>>

I am trying to define a table that has a column that is an array of structs, using standard SQL. The docs suggest this should work:
CREATE OR REPLACE TABLE ta_producer_conformed.FundStaticData
(
id STRING,
something ARRAY<STRUCT<INT64,INT64>>
)
but I get an error:
$ bq query --use_legacy_sql=false --location=asia-east2 "$(cat xxxx.ddl.temp.sql | awk 'ORS=" "')"
Waiting on bqjob_r6735048b_00000173ed2d9645_1 ... (0s) Current status: DONE
Error in query string: Error processing job 'xxxxx-10843454-yyyyy-dev:bqjob_r6735048b_00000173ed2d9645_1': Illegal field name:
Changing the column name does not fix it. What am I doing wrong?
The fields within the struct need to be named, so this works:
CREATE OR REPLACE TABLE ta_producer_conformed.FundStaticData
(
id STRING,
something ARRAY<STRUCT<x INT64,y INT64>>
)
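To see the named fields in use, here is a hedged sketch of inserting a row (the id value and the numbers are illustrative):
bq query --use_legacy_sql=false --location=asia-east2 \
"INSERT ta_producer_conformed.FundStaticData (id, something)
VALUES ('fund-1', [STRUCT(1 AS x, 2 AS y), STRUCT(3 AS x, 4 AS y)])"
The array literal uses one STRUCT constructor per element, with the field names matching the x and y declared in the DDL.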

In BigQuery, how to query a table using a fully qualified name where the project name contains a hyphen

I am trying to run this command, and it seems BigQuery doesn't like the - in the project name. Also, com-fin-dev is not my default project.
bq --location=US query --use_legacy_sql=false "SELECT col FROM com-fin-dev:my_schema.my_table LIMIT 10"
Syntax error: Unexpected "-"
Are there any alternative ways I can use the fully qualified table name in BigQuery when the project name contains a hyphen?
For the shell that I use, I have to escape the backticks:
bq --location=US query --use_legacy_sql=false \
"SELECT col FROM \`com-fin-dev.my_schema.my_table\` LIMIT 10"
Note that you only need to quote the project name in backticks:
bq --location=US query --use_legacy_sql=false \
"SELECT col FROM \`com-fin-dev\`.my_schema.my_table LIMIT 10"
You should use the below "spelling", with a dot rather than a colon between the project name and the dataset:
`com-fin-dev.my_schema.my_table`
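If nothing in the query needs shell expansion, single-quoting the whole string avoids the backslash escapes entirely; a minimal sketch:
bq --location=US query --use_legacy_sql=false \
'SELECT col FROM `com-fin-dev.my_schema.my_table` LIMIT 10'
Inside single quotes bash passes the backticks through literally, so no command substitution takes place.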

Google BigQuery: bq query from command line fails, although it runs fine from the UI Query Editor

When I run the following query from the BigQuery UI, it runs fine and gives the desired output. But when I run the same query from the command line, it gives the following error:
bq query --destination_table Chicago_Traffic_Sensor_Data.Latest_Traffic_Data_with_Geocoding \
--replace --use_legacy_sql=false \
'SELECT segmentid, _lif_lat, start_lon, _lit_lat, _lit_lon, _traffic, _last_updt, CASE WHEN _traffic < 20 THEN '#FF0000' WHEN _traffic >= 20 and _traffic < 40 THEN '#FFFF00' WHEN _traffic >= 40 THEN '008000' ELSE '#666666' END as strokeColor FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY segmentid ORDER BY _last_updt DESC) col FROM Chicago_Traffic_Sensor_Data.traffic_sensor_data) x WHERE x.col = 1 ORDER BY segmentid'

Error in query string: Error processing job 'bedrock-gcp-testing:bqjob_r5d944587b08c4e54_000001626fa3f61d_1': Syntax error: Unexpected end of statement at [1:412]
You need to escape any characters in your SQL that the shell tries to interpret; here, the single quotes around the color codes terminate the outer single-quoted string and truncate the query. What I find much easier and quicker is to put my SQL in a file and read it in instead. For example:
bq query --destination_table grey-sort-challenge:partitioning_magic.foobarred --use_legacy_sql=false "$(cat data.sql)"
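A minimal sketch of that file-based approach, using a quoted here-document so the shell leaves the SQL untouched (the simplified query is illustrative):
cat > data.sql <<'EOF'
SELECT segmentid,
       CASE WHEN _traffic < 20 THEN '#FF0000' ELSE '#666666' END AS strokeColor
FROM Chicago_Traffic_Sensor_Data.traffic_sensor_data
EOF
bq query --destination_table grey-sort-challenge:partitioning_magic.foobarred --use_legacy_sql=false "$(cat data.sql)"
Because the here-document delimiter is quoted ('EOF'), bash performs no expansion on the SQL, so the single-quoted color codes survive intact.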

Simple query from Dataflow to BigQuerySource throws exception

I'm trying to write a simple Dataflow job that utilizes the query parameter within the BigQuerySource class.
In the simplest terms: I can read a BigQuery table using the BigQuerySource class and then filter the rows in my pipeline, but I cannot query/filter directly against the table through BigQuerySource.
Here's some code. Filtering in-line, within my Dataflow pipeline, works fine:
import argparse
import apache_beam as beam
parser = argparse.ArgumentParser()
parser.add_argument('--output', required=True)
known_args, pipeline_args = parser.parse_known_args(None)
p = beam.Pipeline(argv=pipeline_args)
source = 'bigquery-public-data:samples.shakespeare'
rows = p | 'read' >> beam.io.Read(beam.io.BigQuerySource(source))
f = rows | 'filter' >> beam.Map(lambda row: 1 if (row['word_count'] > 1) else 0)
f | 'write' >> beam.io.WriteToText(known_args.output)
p.run()
Replacing that middle read step with a single-line query gives an error:
f = p | 'read' >> beam.io.Read(beam.io.BigQuerySource('SELECT 1 FROM ' \
+ 'bigquery-public-data:samples.shakespeare where word_count > 1'))
The error returned looks like a syntax error.
(a29eabc394a38f62): Workflow failed. Causes:
(a29eabc394a38cfa): S04:read+write/Write/WriteImpl/WriteBundles+write/Write/WriteImpl/Pair+write/Write/WriteImpl/WindowInto(WindowIntoFn)+write/Write/WriteImpl/GroupByKey/Reify+write/Write/WriteImpl/GroupByKey/Write failed.,
(fb6d0643d7f13886): BigQuery execution failed.,
(fb6d0643d7f13b03): Error: Message: Encountered " "-" "- "" at line 1, column 59. Was expecting: <EOF>
Do I need to escape the - characters in the BigQuery project name?
In BigQuery legacy SQL, you should escape the whole table reference with [ and ]: [bigquery-public-data:samples.shakespeare]
For standard SQL, you should use backticks for the same reason: `bigquery-public-data.samples.shakespeare`
See also Escaping reserved keywords and invalid identifiers
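You can verify the escaped reference from the command line before wiring it into the pipeline; a minimal sketch (the selected column and LIMIT are illustrative):
bq query --use_legacy_sql=true \
'SELECT word FROM [bigquery-public-data:samples.shakespeare] WHERE word_count > 1 LIMIT 5'
The same bracketed reference goes into the BigQuerySource query string, since its query parameter defaults to legacy SQL.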

Executing a Hive query in the background

How do I execute a Hive query in the background when the query looks like the one below?
Select count(1) from table1 where column1='value1';
I am trying to do it with a script like the one below:
#!/usr/bin/ksh
exec 1> /home/koushik/Logs/`basename $0 | cut -d"." -f1 | sed 's/\.sh//g'`_$(date +"%Y%m%d_%H%M%S").log 2>&1
ST_TIME=`date +%s`
cd $HIVE_HOME/bin
./hive -e 'SELECT COUNT(1) FROM TABLE1 WHERE COLUMN1 = ''value1'';'
END_TIME=`date +%s`
TT_SECS=$(( END_TIME - ST_TIME))
TT_HRS=$(( TT_SECS / 3600 ))
TT_REM_MS=$(( TT_SECS % 3600 ))
TT_MINS=$(( TT_REM_MS / 60 ))
TT_REM_SECS=$(( TT_REM_MS % 60 ))
printf "\n"
printf "Total time taken to execute the script="$TT_HRS:$TT_MINS:$TT_REM_SECS HH:MM:SS
printf "\n"
but I am getting an error like:
FAILED: SemanticException [Error 10004]: Line 1:77 Invalid table alias or column reference 'value1'
Let me know exactly where I am making a mistake.
Create a file named example:
vi example
Enter the query in the file and save it:
create table sample as
Select count(1) from table1 where column1='value1';
Now run the file using the following command:
hive -f example 1>example.error 2>example.output &
The shell will report the background job:
[1]
Now disown the process :
disown
Now the process will run in the background. If you want to monitor the output, you may use:
tail -f example.output
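Alternatively, a minimal sketch using nohup, which keeps the job alive after you log out without needing the separate disown step:
nohup hive -f example 1>example.output 2>example.error &
nohup detaches the command from the terminal's hangup signal, so the query keeps running after the session closes.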
True, @Koushik! Glad that you found the issue.
Bash was unable to form the Hive query because of the ambiguous single quotes.
Though SELECT COUNT(1) FROM Table1 WHERE Column1 = 'Value1' is valid in Hive,
hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = 'Value1';' is not, because the inner quotes close and reopen the shell string, leaving Value1 outside any quotes.
The best solution would be to use double quotes for Value1:
hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = "Value1";'
or use a quick and dirty solution that breaks out of the single quotes and splices the value in within double quotes:
hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = "'"Value1"'";'
This makes sure that the Hive query is properly formed and then executed accordingly. I would not suggest this approach unless you desperately need a single quote ;)
I was able to resolve it by replacing the single quotes with double quotes. The modified statement now looks like:
./hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = "Value1";'
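To see exactly what string the shell will hand to hive -e, a quick check with echo (the WHERE clauses are illustrative):
echo 'WHERE Column1 = "Value1";'       # the double-quote fix used above
echo 'WHERE Column1 = '\''Value1'\'';' # '\'' is one way to embed a literal single quote
The first prints WHERE Column1 = "Value1"; and the second prints WHERE Column1 = 'Value1';, confirming what Hive will actually parse.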