I am trying to schedule monthly data exports in Google BigQuery using the query scheduler. This is how my query looks at the moment:
export data options(
uri='gs://bucket_name/Test*.csv',
format='CSV',
header=true,
overwrite=true,
field_delimiter=';') as
select id from `project.database.table`;
This works perfectly when I run the query, but it fails when I save it as a scheduled query (Error: Cannot set destination table in jobs with EXPORT statement).
I cannot use the scheduler without specifying a result table. Is there a way to get around this limitation?
This sounds like a bug: BigQuery should not require a destination table for an EXPORT DATA query. Please try this workaround while waiting for a fix:
-- Add this line for your query to be treated as a script
declare unused STRING;
export data options(
...
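For completeness, combining the workaround with the query from the question, the full scheduled script would look like this (the DECLARE line makes BigQuery treat the job as a script, which appears to sidestep the destination-table requirement):

-- Add this line for your query to be treated as a script
declare unused STRING;
export data options(
uri='gs://bucket_name/Test*.csv',
format='CSV',
header=true,
overwrite=true,
field_delimiter=';') as
select id from `project.database.table`;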
I'd like to export data from BigQuery to Google Cloud Storage using a script, looping over multiple tables, saving each one in CSV format, and overwriting any existing files.
Also, how can I schedule this script?
If anybody has an answer, that would be a great help.
Thanks in advance
A common way to approach this problem is to use Airflow and write a DAG that meets your requirements.
But if you want to iterate over tables and dump them to GCS on a regular basis using only BigQuery, the following could be another option.
1. Export Data
You can export data to GCS with the EXPORT DATA statement in a BigQuery script.
EXPORT DATA OPTIONS(
uri='gs://bucket/folder/*.csv',
format='CSV',
overwrite=true,
header=true,
field_delimiter=';') AS
SELECT field1, field2 FROM mydataset.table1 ORDER BY field1 LIMIT 10
https://cloud.google.com/bigquery/docs/reference/standard-sql/other-statements
2. Loops and Dynamic SQL
If you have a list of tables you want to dump, you can loop over them with a BigQuery FOR loop.
https://cloud.google.com/bigquery/docs/reference/standard-sql/procedural-language#loops
You also need to generate the EXPORT DATA statement dynamically for each table. To do so, you can use EXECUTE IMMEDIATE dynamic SQL; a sketch combining both steps follows below.
https://cloud.google.com/bigquery/docs/reference/standard-sql/procedural-language#execute_immediate
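Putting steps 1 and 2 together, a minimal sketch could look like the script below. Note that this is only an illustration: the dataset mydataset and the bucket path are placeholders, and the list of tables is read from INFORMATION_SCHEMA.

-- Loop over every base table in the dataset and export each one to GCS.
FOR t IN (
  SELECT table_name
  FROM mydataset.INFORMATION_SCHEMA.TABLES
  WHERE table_type = 'BASE TABLE'
)
DO
  -- Build and run the EXPORT DATA statement for this table.
  EXECUTE IMMEDIATE FORMAT("""
    EXPORT DATA OPTIONS(
      uri='gs://bucket/folder/%s/*.csv',
      format='CSV',
      overwrite=true,
      header=true,
      field_delimiter=';') AS
    SELECT * FROM mydataset.%s
  """, t.table_name, t.table_name);
END FOR;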
3. Scheduling
BigQuery provides a feature to schedule a user query, and you can use it for this purpose.
https://cloud.google.com/bigquery/docs/scheduling-queries#set_up_scheduled_queries
Hello, I'm very new to BigQuery. My project needs to import data from a BigQuery result, and I'm facing some problems:
If I use bq extract, I cannot apply a filter or WHERE condition.
If I use bq query and then save the result into files, our server cannot process such large data. Also, some fields in the data may contain comma and vertical-bar characters.
Is there an efficient way to export or extract with a filter applied?
Thank you
The EXPORT DATA statement allows you to combine a query with extracting data to Cloud Storage in a single job. CSV output quotes fields that contain the delimiter, so embedded commas are handled; you can also set field_delimiter to a different character.
See more details here:
https://cloud.google.com/bigquery/docs/reference/standard-sql/other-statements
EXPORT DATA OPTIONS(
uri='gs://mybucket/my-path-prefix/*',
format='CSV',
header=true) AS
SELECT field1, field2, field3 FROM mydataset.mytable WHERE some_condition
I have a question similar to the one asked in this link: BigQuery - Export query results to local file/Google storage.
I need to extract data from two BigQuery tables using joins and WHERE conditions. The extracted data has to be placed in a file on Cloud Storage, ideally a CSV file. I want to go with a simple solution. Can I use the BigQuery EXPORT DATA statement in standard SQL and schedule it? Does it have the 1 GB export limitation? If so, what is the best possible way to implement this: creating a temp table to save the results of the query and using a Dataflow job to extract the data from that temp table? Please advise.
Basically, Google Cloud now supports this; please see the code snippet in the Cloud documentation:
https://cloud.google.com/bigquery/docs/reference/standard-sql/other-statements#exporting_data_to_csv_format
I'm thinking I can use the above statement to export data into a file, where the SELECT query joins the two tables and applies the other conditions. This query will be a scheduled query in BigQuery.
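A rough sketch of what I have in mind (the table and column names here are made up):

EXPORT DATA OPTIONS(
uri='gs://mybucket/exports/result-*.csv',
format='CSV',
overwrite=true,
header=true) AS
SELECT a.id, a.name, b.amount
FROM mydataset.table_a AS a
JOIN mydataset.table_b AS b ON a.id = b.id
WHERE b.amount > 0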
Any inputs, please?
I have used the Use the BigQuery connector with Spark guide to extract data from a table in BigQuery by running the code on Google Dataproc. As far as I'm aware, the code shared there:
conf = {
# Input Parameters.
'mapred.bq.project.id': project,
'mapred.bq.gcs.bucket': bucket,
'mapred.bq.temp.gcs.path': input_directory,
'mapred.bq.input.project.id': 'publicdata',
'mapred.bq.input.dataset.id': 'samples',
'mapred.bq.input.table.id': 'shakespeare',
}
# Output Parameters.
output_dataset = 'wordcount_dataset'
output_table = 'wordcount_output'
# Load data in from BigQuery.
table_data = sc.newAPIHadoopRDD(
'com.google.cloud.hadoop.io.bigquery.JsonTextBigQueryInputFormat',
'org.apache.hadoop.io.LongWritable',
'com.google.gson.JsonObject',
conf=conf)
copies the entirety of the named table into input_directory. The table I need to extract data from contains more than 500 million rows, and I don't need all of them. Is there a way to instead issue a query (as opposed to specifying a table), so that I can copy a subset of the data from the table?
It doesn't look like BigQuery supports any kind of filtering or querying for table exports at the moment:
https://cloud.google.com/bigquery/docs/exporting-data
https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.extract
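One possible workaround (just a sketch; the staging table name is made up) is to first materialize the subset you need into a staging table with a query, and then point the connector, or a plain extract job, at that much smaller table:

-- Materialize only the rows you need into a staging table,
-- then read/export the staging table instead of the full one.
CREATE OR REPLACE TABLE mydataset.shakespeare_subset AS
SELECT word, word_count
FROM `publicdata.samples.shakespeare`
WHERE word_count > 100;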
I got this from the user guide:
bq --location=US extract 'mydataset.mytable' gs://example-bucket/myfile.csv
But I want to export the data to a file located on my local path,
for example: /home/rahul/myfile.csv
When I try, I get the below error:
Extract URI must start with "gs://"
Is it possible to export to a local directory?
Also, can we export the result of a SELECT query to Excel?
Example:
bq --location=US extract 'select * from mydataset.mytable' /home/abc/myfile.csv
No, the BigQuery extract operation takes data out of BigQuery into a Google Cloud Storage (GCS) bucket.
Once the data is in GCS, you can copy it to your local system with gsutil, or use another tool that combines both operations.
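For example, a two-step sketch reusing the bucket and paths from the question:

bq --location=US extract 'mydataset.mytable' gs://example-bucket/myfile.csv
gsutil cp gs://example-bucket/myfile.csv /home/rahul/myfile.csv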