BigQuery to GCS and GCS to MySQL - google-bigquery

I am creating an Airflow pipeline where I use the BigQueryOperator to query my BigQuery tables and the BigQueryToCloudStorageOperator to export the result table to GCS as CSV.
I then need to move the CSV into a MySQL database, where it should be stored as a table.
Can I please get any advice or ideas on how to implement this? Thanks!

Since your use case is to query data in BigQuery and store the result in your MySQL database, you can use the BigQueryToMySqlOperator. From its documentation:
Fetches the data from a BigQuery table (alternatively fetch data for
selected columns) and insert that data into a MySQL table.
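For reference, a minimal DAG sketch using that operator might look like the following. Treat it only as a sketch: the import path and parameter names differ across Airflow provider versions, and the dataset, table, and connection IDs below are placeholders, so check the documentation for the version you have installed. Note that this operator moves rows directly from BigQuery to MySQL, so the intermediate CSV in GCS is not needed for this step.

from datetime import datetime

from airflow import DAG
# Import path varies by provider version; older installs expose the operator
# under airflow.contrib.operators.bigquery_to_mysql_operator instead.
from airflow.providers.google.cloud.transfers.bigquery_to_mysql import BigQueryToMySqlOperator

with DAG(
    dag_id="bq_result_to_mysql",        # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",         # newer Airflow versions name this "schedule"
    catchup=False,
) as dag:
    # Fetch rows from the BigQuery result table and insert them into an existing MySQL table.
    bq_to_mysql = BigQueryToMySqlOperator(
        task_id="bq_to_mysql",
        dataset_table="mydataset.my_result_table",  # BigQuery source as "dataset.table"
        mysql_table="my_result_table",              # target table in the MySQL database
        mysql_conn_id="mysql_default",
        gcp_conn_id="google_cloud_default",
        replace=False,                              # True would use REPLACE instead of INSERT
    )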

Related

From BigQuery to Google Cloud Storage

I would like to export data from BigQuery to Google Cloud Storage using a script. I also need to do this for multiple tables in a loop, saving each in CSV format and overwriting the existing files.
Also, how can we schedule this script?
If anybody has an answer, that will be a great help.
Thanks in advance.
A common way to approach this problem is to use Airflow and write a DAG to meet your requirements.
But if you want to iterate over tables and dump them to GCS on a regular basis using only BigQuery, the following could be another option.
1. Export Data
You can export data to GCS with the EXPORT DATA statement in a BigQuery script.
EXPORT DATA OPTIONS(
  uri='gs://bucket/folder/*.csv',
  format='CSV',
  overwrite=true,
  header=true,
  field_delimiter=';'
) AS
SELECT field1, field2 FROM mydataset.table1 ORDER BY field1 LIMIT 10
https://cloud.google.com/bigquery/docs/reference/standard-sql/other-statements
2. Loops and Dynamic SQL
If you have a list of tables you want to dump, you can loop over those tables with a BigQuery FOR loop.
https://cloud.google.com/bigquery/docs/reference/standard-sql/procedural-language#loops
You then need to generate the EXPORT DATA statement dynamically for each table. To do so, you can use EXECUTE IMMEDIATE dynamic SQL (see the sketch after the link below).
https://cloud.google.com/bigquery/docs/reference/standard-sql/procedural-language#execute_immediate
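As a rough sketch (project, dataset, and bucket names are placeholders; it is submitted here through the BigQuery Python client so it can be run from any script or scheduler), the loop plus dynamic EXPORT DATA could look like this:

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID

# Multi-statement script: loop over the dataset's tables and export each one to GCS as CSV.
script = """
FOR tbl IN (
  SELECT table_name
  FROM `my-project.mydataset.INFORMATION_SCHEMA.TABLES`
  WHERE table_type = 'BASE TABLE'
)
DO
  EXECUTE IMMEDIATE FORMAT('''
    EXPORT DATA OPTIONS(
      uri='gs://bucket/folder/%s/*.csv',
      format='CSV',
      overwrite=true,
      header=true,
      field_delimiter=';'
    ) AS
    SELECT * FROM `my-project.mydataset.%s`
  ''', tbl.table_name, tbl.table_name);
END FOR;
"""

client.query(script).result()  # runs the whole script as a single BigQuery job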
3. Scheduling
BigQuery provides a feature to schedule a user query, and you can use it for this purpose.
https://cloud.google.com/bigquery/docs/scheduling-queries#set_up_scheduled_queries
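If you prefer to create the schedule programmatically instead of in the console, a sketch with the BigQuery Data Transfer Service Python client could look like this (project ID, display name, and the script string are placeholders; the scheduled "query" can be the EXPORT DATA statement or the FOR loop script above, and no destination table settings are needed because EXPORT DATA writes to GCS itself):

from google.cloud import bigquery_datatransfer

transfer_client = bigquery_datatransfer.DataTransferServiceClient()
project_id = "my-project"  # placeholder project ID

export_script = """
EXPORT DATA OPTIONS(
  uri='gs://bucket/folder/*.csv',
  format='CSV',
  overwrite=true,
  header=true,
  field_delimiter=';'
) AS
SELECT field1, field2 FROM mydataset.table1 ORDER BY field1 LIMIT 10
"""

transfer_config = bigquery_datatransfer.TransferConfig(
    display_name="daily-export-to-gcs",     # placeholder name
    data_source_id="scheduled_query",       # the Data Transfer source for scheduled queries
    params={"query": export_script},
    schedule="every 24 hours",
)

transfer_config = transfer_client.create_transfer_config(
    parent=transfer_client.common_project_path(project_id),
    transfer_config=transfer_config,
)
print(f"Created scheduled query: {transfer_config.name}")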

Can I use the BigQuery EXPORT DATA statement and schedule the query?

I have a similar question to the one asked in this link: BigQuery - Export query results to local file/Google storage
I need to extract data from 2 BigQuery tables using joins and WHERE conditions. The extracted data has to be placed in a file on Cloud Storage, mostly as a CSV file. I want to go with a simple solution. Can I use the BigQuery EXPORT DATA statement in standard SQL and schedule it? Does it have a limitation of 1 GB per export? If yes, what is the best possible way to implement this? Creating another temp table to save the query results and using a Dataflow job to extract the data from the temp table? Please advise.
Basically, Google Cloud now supports this; please see the code snippet in the Cloud documentation:
https://cloud.google.com/bigquery/docs/reference/standard-sql/other-statements#exporting_data_to_csv_format
I'm thinking I can use the above statement to export data into a file, where the SELECT query will have a join of the 2 tables and the other conditions.
This query would be a scheduled query in BigQuery.
Any inputs, please?

Table replication in BigQuery

I need to replicate tables from Prod to Test within BigQuery. Apart from BQ export/import, please let me know if there are any replication utilities or tools within BigQuery.
Thanks.
To copy a table in BigQuery, you can use several methods:
bq tool
Transfer the entire dataset using the transfer tool
Copy using the BigQuery UI
You can also query the table and write its results to a new table
You can try these options:
BigQuery Data Transfer Service:
https://cloud.google.com/bigquery-transfer/docs/working-with-transfers
Copy Tables:
bq cp source-project:dataset.table target-project:dataset.table
CREATE TABLE AS SELECT (CTAS):
CREATE TABLE `target-project.dataset.table` AS SELECT * FROM `source-project.dataset.table`
BigQuery API Client Libraries:
https://cloud.google.com/bigquery/docs/reference/libraries
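If you go the client-library route, the same copy is a single call in Python, for example (project, dataset, and table names are placeholders):

from google.cloud import bigquery

client = bigquery.Client(project="target-project")  # placeholder project IDs throughout

# Copy the Prod table into the Test project; this is the same operation as "bq cp".
copy_job = client.copy_table(
    "source-project.dataset.table",
    "target-project.dataset.table",
)
copy_job.result()  # wait for the copy job to finish
print(f"Copy finished: {copy_job.state}")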

Upload Google Cloud SQL backup to BigQuery

I have had trouble trying to move a Google Cloud SQL database to BigQuery. I have exported the database backup from Cloud SQL to Cloud Storage, but when trying to import it into BigQuery, I get the error 'Not found: URI' for gs://bucket-name/file-name.
Is what I'm trying to do even possible? I'm hoping to somehow directly upload the Cloud SQL data to BigQuery. It's a large table (>27GB) and I have been having a lot of connection issues with Cloud SQL, so exporting as CSV or JSON isn't the best option.
BigQuery doesn't support the MySQL backup format, so the best route forward is to generate CSV or JSON from the Cloud SQL database and persist those files into Cloud Storage.
More information on importing data can be found in the BigQuery documentation.
You can use a BigQuery Cloud SQL federated query to copy a Cloud SQL table into BigQuery, and you can do it with one BigQuery SQL statement. For example, the following SQL copies the MySQL table sales_20191002 to the BigQuery table demo.sales_20191002.
INSERT demo.sales_20191002 (column1, column2, etc.)
SELECT *
FROM EXTERNAL_QUERY(
  "project.us.connection",
  "SELECT * FROM sales_20191002;");
EXTERNAL_QUERY("connection", "foreign SQL") executes the "foreign SQL" in the Cloud SQL database specified by "connection" and returns the result back to BigQuery. The "foreign SQL" is written in the source database's SQL dialect (MySQL or PostgreSQL).
Before running the above SQL query, you need to create a BigQuery connection that points to your Cloud SQL database.
To copy the whole Cloud SQL database, you may want to write a script that iterates over all tables and copies them in a loop.
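As a sketch of such a script with the BigQuery Python client, assuming the connection already exists and using CREATE TABLE AS SELECT instead of INSERT for brevity (the project, connection, database, and dataset names are placeholders):

from google.cloud import bigquery

client = bigquery.Client(project="my-project")        # placeholder names throughout
connection = "my-project.us.my-cloudsql-connection"   # existing BigQuery connection to Cloud SQL
target_dataset = "demo"

# List the tables of the Cloud SQL (MySQL) database through the federated connection.
tables = client.query(f"""
    SELECT * FROM EXTERNAL_QUERY(
      "{connection}",
      "SELECT table_name AS table_name FROM information_schema.tables WHERE table_schema = 'mydb';")
""").result()

# Copy each table into BigQuery with one CREATE OR REPLACE TABLE ... AS SELECT per table.
for row in tables:
    table_name = row["table_name"]
    client.query(f"""
        CREATE OR REPLACE TABLE `{target_dataset}.{table_name}` AS
        SELECT * FROM EXTERNAL_QUERY(
          "{connection}",
          "SELECT * FROM {table_name};")
    """).result()
    print(f"Copied {table_name}")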

Can I denormalize data in google cloud sql in prep for bigquery

Given that BigQuery is not meant as a platform to denormalize data, can I denormalize the data in Google Cloud SQL prior to importing it into BigQuery?
I have the following tables:
Table1: 500M rows, Table2: 2M rows, Table3: 800K rows.
I can't denormalize in our existing relational database for various reasons, so I'd like to do a SQL dump of the database, load it into Google Cloud SQL, then use SQL join scripts to create one large flat table to be imported into BigQuery.
Thanks.
That should work. You should be able to dump the generated flat table to CSV and import it into BigQuery. There is currently no direct Cloud SQL to BigQuery loading mechanism, however.
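Once the flat table has been dumped to CSV in Cloud Storage, the import itself is a short load job; here is a sketch with the BigQuery Python client (bucket, dataset, and table names are placeholders):

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder names throughout

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,   # skip the CSV header row
    autodetect=True,       # let BigQuery infer the schema from the file
)

load_job = client.load_table_from_uri(
    "gs://my-bucket/flat_table.csv",
    "my-project.mydataset.flat_table",
    job_config=job_config,
)
load_job.result()  # wait for the load to finish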