Pasting multi-line queries into BigQuery SQL shell - google-bigquery

I'm running the BigQuery command-line shell and I can't successfully run multi-line queries (i.e. queries with line breaks): whenever I paste a query into the shell, each line gets run individually instead of the whole thing together.
For example,
select * from table
works fine because it is in one line but if I try to run
select
*
from
table
it does not work because each line gets run separately.
Is there any way to get this to work?

The query command creates a query job that runs the supplied SQL query. In the documentation Using the bq command-line tool you can find examples like:
bq query --nouse_legacy_sql \
'SELECT
COUNT(*)
FROM
`bigquery-public-data`.samples.shakespeare'
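If pasting into the interactive shell keeps splitting the statement, another option is to keep the query in a file so the line breaks survive, since bq can also read the query from stdin. A minimal sketch (the file name is arbitrary, and the bq invocation is shown commented out since it needs the tool installed and authenticated):

```shell
# Keep the multi-line query in a file so its line breaks stay intact.
cat > my_query.sql <<'EOF'
SELECT
  COUNT(*)
FROM
  `bigquery-public-data`.samples.shakespeare
EOF

# bq query --nouse_legacy_sql < my_query.sql   # feed the file to bq on stdin
cat my_query.sql
```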

Honestly, I couldn't find decent help on this either, so I created a bit of a workaround using the CONCAT function. Notice that I have a space before the FROM clause. Hope this helps.
DECLARE table_name STRING DEFAULT '20220214';
DECLARE query STRING;
SET query = CONCAT("""select user_pseudo_id, user_id""",
""" FROM `app.analytics.events_""" || table_name || """` where user_id IS NOT NULL group by user_pseudo_id, user_id""");
EXECUTE IMMEDIATE query;
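If the goal is simply to swap a date suffix into the table name, the substitution can also happen outside SQL entirely. A hedged sketch in the shell, reusing the dataset and column names from the answer above (the bq call is commented out since it needs the tool installed):

```shell
# Build the query text in the shell, substituting the table suffix,
# then hand the finished string to bq in one piece.
suffix="20220214"
query="SELECT user_pseudo_id, user_id
FROM \`app.analytics.events_${suffix}\`
WHERE user_id IS NOT NULL
GROUP BY user_pseudo_id, user_id"
# bq query --nouse_legacy_sql "$query"   # the actual run
echo "$query" > built_query.sql
```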

Related

SQL where clause parameterization

I am trying to parameterize certain WHERE clauses to standardize my Postgres SQL scripts for DB monitoring, but I have not found a solution that will allow the following script to run successfully:
variablename = "2021-04-08 00:00:00"
select * from table1
where log_date > variablename;
select * from table2
where log_date > variablename;
Ideally, I would be able to run each script separately, but being able to find/replace the variable line would go a long way for productivity.
Edit: I am currently using DBeaver to run my scripts
To do this in DBeaver, I figured out you can use the Dynamic Parameter Bindings feature.
https://github.com/dbeaver/dbeaver/wiki/SQL-Execution#dynamic-parameter-bindings
Here is a quick visual demo of how to use it:
https://twitter.com/dbeaver_news/status/1085222860841512960?lang=en
select * from table1
where log_date > :variablename;
When executing the query, DBeaver will prompt you for the desired value and remember it when running another query.

Impala SQL in Jupyter Notebooks

Can anybody help me with the correct syntax to run multi-line Impala SQL queries in Jupyter notebooks? I have been using the approach below to run queries on a single line but have not been able to work out how to run multi-line queries with indentation.
! impala-shell -q 'SELECT COUNT(*) FROM (SELECT DISTINCT eyesight_evaluation.patient_id FROM eyesight_evaluation WHERE severity_of_sight_loss IN ("Slight","Mild","Moderate","Moderately Severe","Severe","Profound")) AS TotalDiagnosed;'
Thanks!
You have to use three single quotes for a multi-line string:
'''SELECT COUNT(*) FROM (SELECT DISTINCT eyesight_evaluation.patient_id FROM
eyesight_evaluation WHERE severity_of_sight_loss IN
("Slight","Mild","Moderate","Moderately Severe","Severe","Profound")) AS TotalDiagnosed;'''

How to submit multiple queries in Google BigQuery Composer and Cloud Shell

Just a simple question, please don't tell me that submitting multiple queries is not supported in Query Composer and Google Cloud Shell.
When I submit two statements (for example, DROP TABLE statements delimited by ";"), it tells me that the DROP word on the next line is unexpected.
Turns out there is no way to execute multiple queries in either the BigQuery Composer or the Google Cloud Shell. However, one workaround I have found is to create a local text file in Cloud Shell that stores the queries, delimited by ";", and then set IFS (the Internal Field Separator) to ";" so that a for loop can iterate through the file and execute the queries one by one.
Example:
queries.txt
select 1+2;
select 2+3;
select 3+4;
Cloud Shell command
IFS=";"
alias bqn="bq query --nouse_legacy_sql"
for q in $(<"queries.txt"); do bqn $q; done;
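Here is a runnable sketch of that loop with the bq call stubbed out by a shell function, so the ";"-splitting behaviour can be checked locally (the function name bqn mirrors the alias above; cat replaces the bash-only $(<file) form for portability):

```shell
# Create the query file, then split it on ";" exactly as above.
printf 'select 1+2;\nselect 2+3;\nselect 3+4;\n' > queries.txt
: > results.txt
bqn() { printf 'RUN: %s\n' "$*" >> results.txt; }  # stand-in for: bq query --nouse_legacy_sql
IFS=";"
for q in $(cat queries.txt); do
  bqn $q
done
unset IFS
cat results.txt
```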
BigQuery now supports multi-statement execution. Check out the scripting documentation. Copying the example:
-- Declare a variable to hold names as an array.
DECLARE top_names ARRAY<STRING>;
-- Build an array of the top 100 names from the year 2017.
SET top_names = (
SELECT ARRAY_AGG(name ORDER BY number DESC LIMIT 100)
FROM `bigquery-public-data`.usa_names.usa_1910_current
WHERE year = 2017
);
-- Which names appear as words in Shakespeare's plays?
SELECT
name AS shakespeare_name
FROM UNNEST(top_names) AS name
WHERE name IN (
SELECT word
FROM `bigquery-public-data`.samples.shakespeare
);
Google BigQuery uses an SQL-like language, and not all constructs from mainstream SQL implementations are directly compatible with BigQuery.
That being said, there are many ways to work around this. If you are creating tables to materialize data in order to improve query performance and limit the cost of storing data in BigQuery, you can set an expiration date on the temporary table.
This is the command with the expiration date flag:
bq --location=[LOCATION] mk --dataset --default_table_expiration [INTEGER] --description [DESCRIPTION] [PROJECT_ID]:[DATASET]

Using Google Datalab, how to create table from a complex query with UDF in Google BigQuery?

I have seen many similar questions, but they are not specific to Google Cloud Datalab AND using UDF at the same time, e.g. query execution shows Unknown TVS error and creating table from query result via python API.
I managed to create the table when UDF was not used, but when it was, it returned error "Unknown TVF: myFunc".
Edit
Here is the code I'm using:
%%bigquery udf --module transform_field
...udf function...
Then I used udf function above in an sql query:
%%sql --module querymodule
...complex SELECT query FROM transform_field(table)...
Then I'd like to use that query to create another table as shown below:
%%bigquery execute --target project:dataset.tablename --query querymodule
But it kept showing this error instead:
Unknown TVF: TRANSFORM_FIELD
Alright, I found it. Turns out you need to pass the query through a Python cell before using it in a %%bigquery execute cell:
bq_query = bq.Query(querymodule, udf=transform_field)
Thus the entire process should go as follows:
%%bigquery udf --module transform_field
...udf function...
Then I used udf function above in an sql query:
%%sql --module querymodule
...complex SELECT query FROM transform_field(table)...
Then use the query and udf function above to create a bq.Query object.
bq_query = bq.Query(querymodule, udf=transform_field)
Then use bq_query in table creation.
%%bigquery execute --target project:dataset.tablename --query bq_query
I keep being amazed at what a good night's sleep does.

From unix to Sql

I am making a shell script where I am reading inputs from one file. File contains data
123
1234
121
I am reading the inputs from this file using a while read line loop and putting all the inputs into SQL statements. Then my shell script goes to the SQL prompt and runs some queries. In one condition, I am using an EXECUTE IMMEDIATE statement in SQL:
EXECUTE IMMEDIATE 'CREATE TABLE BKP_ACDAGENT4 as SELECT * FROM BKP_ACDAGENT WHERE DATASOURCEAGENTID IN ('123','1234','121')';
I want this to execute, but somehow it's not working.
Can anyone help me in executing it?
You need to escape the single quotes that you have used for the predicates in your IN list; that is, the single quotes in
WHERE DATASOURCEAGENTID IN ('123','1234','121')';
are causing the issue here. Escape them by doubling each single quote:
EXECUTE IMMEDIATE 'CREATE TABLE BKP_ACDAGENT4 as SELECT * FROM BKP_ACDAGENT WHERE DATASOURCEAGENTID IN (''123'',''1234'',''121'')';
The above will work on all Oracle versions.
If you're on Oracle 10g or above, you can use the q quoting syntax:
EXECUTE IMMEDIATE q'[CREATE TABLE BKP_ACDAGENT4 as SELECT * FROM BKP_ACDAGENT WHERE DATASOURCEAGENTID IN ('123','1234','121')]';
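Since the IDs come from a file in the shell script anyway, the doubled-quote IN list can also be assembled in the shell before the statement ever reaches SQL. A sketch, assuming the IDs and table names from the question (the input file name ids.txt is made up for illustration):

```shell
# Build ''123'',''1234'',''121'' from the input file, then splice it
# into the EXECUTE IMMEDIATE statement.
printf '123\n1234\n121\n' > ids.txt
in_list=""
while read -r id; do
  [ -n "$id" ] || continue
  in_list="${in_list:+$in_list,}''$id''"
done < ids.txt
stmt="EXECUTE IMMEDIATE 'CREATE TABLE BKP_ACDAGENT4 AS SELECT * FROM BKP_ACDAGENT WHERE DATASOURCEAGENTID IN ($in_list)';"
echo "$stmt" > stmt.sql
cat stmt.sql
```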