BigQuery - scheduled query through CLI - google-bigquery

A simple question regarding the bq CLI tool: I am fairly confident the answer is, as of the writing of this question, no, but I may be wrong.
Is it possible to create a scheduled query (similar to the one seen in the screenshot below) using the bq CLI tool?

Yes, scheduled queries can now be created with bq mk --transfer_config. Please see the examples below:
To create a scheduled query with query SELECT 1:
bq mk --transfer_config --target_dataset=mydataset --display_name='My Scheduled Query' --schedule='every 24 hours' --params='{"query":"SELECT 1","destination_table_name_template":"mytable","write_disposition":"WRITE_TRUNCATE"}' --data_source=scheduled_query
Note:
--target_dataset is required.
--display_name is required.
In the --params field, query is required, and only Standard SQL queries are supported.
In the --params field, destination_table_name_template is optional for DML and DDL statements but required for regular SELECT queries.
In the --params field, write_disposition works the same way as destination_table_name_template: required for regular SELECT queries but optional for DML and DDL (see the DML sketch after these notes).
--data_source must always be set to scheduled_query to create a scheduled query.
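As a minimal sketch of a DML scheduled query (the dataset, table, and display name below are illustrative placeholders), no destination_table_name_template or write_disposition is needed because the statement itself names the target table:
bq mk --transfer_config --target_dataset=mydataset --display_name='My DML Scheduled Query' --schedule='every 24 hours' --params='{"query":"INSERT INTO mydataset.mytable (id) VALUES (1)"}' --data_source=scheduled_query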
After a scheduled query is created successfully, you will see its full resource name, for example:
Transfer configuration 'projects/<p>/locations/<l>/transferConfigs/5d1bec8c-0000-2e6a-a4eb-089e08248b78' successfully created.
To schedule a backfill for this scheduled query, run, for example:
bq mk --transfer_run --start_time 2017-05-25T00:00:00Z --end_time 2017-05-25T00:00:00Z projects/<p>/locations/<l>/transferConfigs/5d1bec8c-0000-2e6a-a4eb-089e08248b78
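To verify the configuration afterwards, transfer configs in a location can be listed, or a single one inspected by its resource name (the location and config ID below are placeholders):
bq ls --transfer_config --transfer_location=us
bq show --format=prettyjson --transfer_config projects/<p>/locations/<l>/transferConfigs/5d1bec8c-0000-2e6a-a4eb-089e08248b78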
Hope this helps! Thank you for using scheduled queries!

Related

BigQuery scheduled query failing

I have a BigQuery scheduled query that is failing with the following error:
Not found: Dataset bunny25256:dataset1 was not found in location US at [5:15]; JobID: 431285762868:scheduled_query_635d3a29-0000-22f2-888e-14223bc47b46
I scheduled the query via the SQL Workspace. When I run the query in the workspace, it works fine. The dataset and everything else that I have created are in the same region: us-central1.
Any ideas on what the problem could be, and how I could fix it or work around it?
There's nothing special about the query, it computes some statistics on a table in dataset1 and puts it in dataset2.
When you submit a query, you submit it to BigQuery at a given location. The dataset you created lives in us-central1, but your query was submitted to the US multi-region. The locations US and us-central1 are not the same. Change your scheduled query to run in us-central1. See the docs on locations for more info.
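If you want to double-check where the dataset actually lives, its metadata (including the location field) can be inspected from the CLI, for example:
bq show --format=prettyjson bunny25256:dataset1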
The dataset is not provided correctly; it should be in the format project.dataset.table.
Try running the below in BigQuery:
select * from bunny25256:dataset1
You should provide bunny25256:dataset1.table

How to update a scheduled script query including DML statements in BigQuery?

I have a couple of scripts scheduled as scheduled queries in BigQuery. I need to update some of the scripts, and sometimes I run into an error where BigQuery doesn't seem to recognize my queries as scripts. According to the documentation, a query that includes DML/DDL statements should not have a destination table configured. However, BigQuery is forcing me to input a destination table, which I don't think I am supposed to do, per the above-mentioned documentation and since my script already includes an INSERT statement.
Here is a screenshot of what I see:
BigQuery screenshot trying to update a DML scheduled script
On the left I have highlighted my script, which includes DML, and on the right, the options that BigQuery is forcing me to fill in.
What should I do in order to correctly update a script with DML statements, without having to input a destination table in the options?

How do you run a BigQuery query and store the results in a table on a regular basis

I have a BigQuery view which takes about 30 seconds to run. Once a day, at a designated time, I want to run the view and store the results in a materialized table (e.g. so that Data Studio dashboards can use the table without making the dashboard take 30 seconds to load).
Is there a built-in way to do this using a tool like Dataproc, or do you have to set up a cron job that just runs
CREATE TABLE dataset.materialized_view AS
SELECT *
FROM dataset.view;
on a regular basis?
You can achieve this using scheduled queries.
In the Classic BigQuery UI (Cloud Console UI support is under development at the time of this writing), write the query that you want to run in the "Compose Query" text area, then click the "Schedule Query" button. From the panel that appears, you can choose the frequency with which to run the query; the default is every 24 hours.
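The same schedule can also be created from the CLI with bq mk --transfer_config, as described in the first answer above; a rough sketch using your example (the display name and write disposition are illustrative choices):
bq mk --transfer_config --target_dataset=dataset --display_name='Materialize view' --schedule='every 24 hours' --params='{"query":"SELECT * FROM dataset.view","destination_table_name_template":"materialized_view","write_disposition":"WRITE_TRUNCATE"}' --data_source=scheduled_query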
You can set up a regular cron job that runs a query to read data from your view and write it to a destination table. Based on your example, something like:
bq --location=[LOCATION] query -n 0 --destination_table dataset.materialized_view --use_legacy_sql=false --replace=true 'select * from dataset.view'
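Assuming the machine running cron is authenticated and has bq on its PATH, a crontab entry that materializes the view once a day might look like this (the 06:00 UTC time and US location are just examples):
0 6 * * * bq --location=US query -n 0 --destination_table dataset.materialized_view --use_legacy_sql=false --replace=true 'select * from dataset.view'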

How do I have a Athena query run on a schedule and have the result set sent to an email

I have created a few Athena queries to generate reports.
The business wants these reports run nightly and the output of the query emailed to them.
My first step is to schedule the execution of the saved/named Athena queries so that I can collect the output of the query execution from the S3 buckets.
Is there a way to automate the execution of the queries on a periodic basis?
You can schedule events in AWS using Lambda (see this tutorial). From Lambda you can run just about anything, including triggering an Athena query.
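As a sketch of the query-execution step such a Lambda would perform (the table, database, and bucket names are placeholders), the equivalent call is also available from the AWS CLI:
aws athena start-query-execution --query-string "SELECT * FROM my_table LIMIT 10" --query-execution-context Database=my_database --result-configuration OutputLocation=s3://my-athena-results/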

google-bigquery Schedule a nightly table copy

Is it possible to create a scheduled job or process to copy a BigQuery table each night? I am trying to create automated nightly table backups, and I haven't seen any examples of how to accomplish this.
Any help would be greatly appreciated.
Eric
You can use the bq query tool to submit a batch job and schedule it with cron (or Task Scheduler if on Windows). The command will look similar to this:
bq --nosync query --batch --allow_large_results --nouse_legacy_sql --replace --destination_table dataset.backup_table "select * from dataset.table"
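Because --nosync returns immediately after submitting the batch job, you can check on it later; for example, listing your most recent jobs:
bq ls -j -n 5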