I want to do something along the lines of:
SELECT some_things
FROM `myproject.mydataset.mytable_#suffix`
But this doesn't work because the parameter isn't expanded inside the table name.
This does work, using wildcard tables:
SELECT some_things
FROM `myproject.mydataset.mytable_*`
WHERE _TABLE_SUFFIX = #suffix
However, it has some problems:
If I mistype the parameter, this query silently returns zero rows, rather than yelling at me loudly.
Query caching stops working when querying with a wildcard.
If other tables exist with the mytable_ prefix, they must have the same schema, even if they don't match the suffix. Otherwise, weird stuff happens. It seems like BigQuery either computes the union of all columns, or takes the schema of an arbitrary table; it's not documented and I didn't look at it in detail.
Is there a better way to query a single table whose name depends on a query parameter?
Yes, you can, here's a working example:
DECLARE tablename STRING;
DECLARE tableQuery STRING;
##get list of tables
CREATE TEMP TABLE tableNames as select table_name from nomo_nausea.INFORMATION_SCHEMA.TABLES where table_name not in ('_sdc_primary_keys', '_sdc_rejected', 'fba_all_order_report_data');
WHILE (select count(*) from tableNames) >= 1 DO
SET tablename = (select table_name from tableNames LIMIT 1);
##build dataset + table name
SET tableQuery = CONCAT('nomo_nausea.' , tablename);
##use concat to build string and execute
EXECUTE IMMEDIATE CONCAT('SELECT * from `', tableQuery, '` where _sdc_deleted_at is not null');
DELETE FROM tableNames where table_name = tablename;
END WHILE;
In order to answer your stated problems:
Table scanning happens in FROM clause, in WHERE clause happens filtering [1] thus if WHERE condition is not match an empty result would be returned.
"Currently, Cached results are not supported when querying with wildcard" [2].
"BigQuery uses the schema for the most recently created table that matches the wildcard as the schema" [3]. What kind of weird stuff you have faced in your use case? "A wildcard table represents a union of all the tables that match the wildcard expression" [4].
In BigQuery parameterized queries can be run, But table names can not be parameterized [5]. Your wildcard solution seems to be the only way.
You can actually use tables as parameters if you use the Python API, but it's not documented yet. If you pass the tables as parameters through a formatted text string vs. a docstring, your query should work.
SQL example:
sql = "SELECT max(_last_updt) FROM `{0}.{1}.{2}` WHERE _last_updt >= TIMESTAMP(" +
"CURRENT_DATE('-06:00'))".format(project_id, dataset_name, table_name)
SQL in context of Python API:
bigquery_client = bigquery.Client() #setup the client
query_job = bigquery_client.query(sql) #run the query
results = query_job.result() # waits for job to complete
for row in results:
print row
Related
My team and I are using a query on a daily basis to receive specific results from a large dataset. This query is constantly updated with different terms that I would like to receive from the dataset.
To make this job more scaleable, I built a table of arrays, each containing the terms and conditions for the query. That way the query can lean on the table, and changes that I make in the table will affect the query without the need to change it.
The thing is - I can't seem to find a way to reference the table in the actual query without selecting it. I want to use the content of the table as a WHERE condition. for example:
table1:
terms
[term1, term2, term3]
query:
select * from dataset
where dataset.collumn like '%term1'
or dataset.collumn like '%term2'
or dataset.collumn like '%term3'
etc.
If you have any ideas please let me know (if the solution involves Python or JS this is also great)
thanks!
You can "build" the syntax you want using Procedural Language in BigQuery and then execute it. Here is a way of doing it without "leaving" BQ (meaning, without using external code):
BEGIN
DECLARE statement STRING DEFAULT 'SELECT col FROM dataset.table WHERE';
FOR record IN (SELECT * FROM UNNEST(['term1','term2','term3']) as term)
DO
SET statement = CONCAT(statement, ' col LIKE "', '%', record.term, '" OR');
END FOR;
SET statement = CONCAT(statement, ' 1=2');
EXECUTE IMMEDIATE statement;
END;
I am trying to write a BigQuery script that I can store as a procedure, I would like one of the arguments I pass to be used in the table name that is written out by the script, for example:
DECLARE id STRING;
SET id = '123';
CREATE OR REPLACE TABLE test.id AS(
SELECT * FROM dataset.table
)
However, in this example the table is created with the name id rather than the value of the "id" variable, 123. Is there any way I can dynamically create a table using the value of a declared variable in the BigQuery UI?
Why not just use Execute Immediate with concat if you know the table schema?
EXECUTE IMMEDIATE CONCAT('CREATE TABLE `', id, '` (column_name STRING)');
So far we have officially announced BigQuery scripting document, which is still in Beta phase, leveraging usage of dynamic parameters (variables) as a placeholders for values in SQL queries . However, according to Parameterized queries in BigQuery documentation, query parameters can't be used for SQL object identifiers:
Parameters cannot be used as substitutes for identifiers, column
names, table names, or other parts of the query.
Maybe you can use a wildcard table. You would create a wildcard table with all subtables you want to query and use the WHERE clause to select any subtable you want. Just be careful, the tables must have the same schema.
I am trying to write a BigQuery script that I can store as a procedure, I would like one of the arguments I pass to be used in the table name that is written out by the script, for example:
DECLARE id STRING;
SET id = '123';
CREATE OR REPLACE TABLE test.id AS(
SELECT * FROM dataset.table
)
However, in this example the table is created with the name id rather than the value of the "id" variable, 123. Is there any way I can dynamically create a table using the value of a declared variable in the BigQuery UI?
Why not just use Execute Immediate with concat if you know the table schema?
EXECUTE IMMEDIATE CONCAT('CREATE TABLE `', id, '` (column_name STRING)');
So far we have officially announced BigQuery scripting document, which is still in Beta phase, leveraging usage of dynamic parameters (variables) as a placeholders for values in SQL queries . However, according to Parameterized queries in BigQuery documentation, query parameters can't be used for SQL object identifiers:
Parameters cannot be used as substitutes for identifiers, column
names, table names, or other parts of the query.
Maybe you can use a wildcard table. You would create a wildcard table with all subtables you want to query and use the WHERE clause to select any subtable you want. Just be careful, the tables must have the same schema.
I want to write a query that examines all the tables in an SQLite database for a piece of information in order to simplify my post-incident diagnostics (performance doesn't matter).
I was hoping to write a query that uses the sqlite_master table to get a list of tables and then query them, all in one query:
SELECT Name
FROM sqlite_master
WHERE Type = 'table' AND (
SELECT count(*)
FROM Name
WHERE conditions
) > 0;
However when attempting to execute this style of query, I receive an error no such table: Name. Is there an alternate syntax that allows this, or is it simply not supported?
SQLite is designed as an embedded database, i.e., to be used together with a 'real' programming language.
To be able to use such dynamic constructs, you must go outside of SQLite itself:
cursor.execute("SELECT name FROM sqlite_master")
rows = cursor.fetchall()
for row in rows:
sql = "SELECT ... FROM {} WHERE ...".format(row[0])
cursor.execute(sql)
I want to write a query that examines all the tables in an SQLite database for a piece of information in order to simplify my post-incident diagnostics (performance doesn't matter).
I was hoping to write a query that uses the sqlite_master table to get a list of tables and then query them, all in one query:
SELECT Name
FROM sqlite_master
WHERE Type = 'table' AND (
SELECT count(*)
FROM Name
WHERE conditions
) > 0;
However when attempting to execute this style of query, I receive an error no such table: Name. Is there an alternate syntax that allows this, or is it simply not supported?
SQLite is designed as an embedded database, i.e., to be used together with a 'real' programming language.
To be able to use such dynamic constructs, you must go outside of SQLite itself:
cursor.execute("SELECT name FROM sqlite_master")
rows = cursor.fetchall()
for row in rows:
sql = "SELECT ... FROM {} WHERE ...".format(row[0])
cursor.execute(sql)