Upload Google Cloud SQL backup to BigQuery - google-bigquery

I have had trouble trying to move a Google Cloud SQL database to BigQuery. I exported the database backup from Cloud SQL to Cloud Storage, but when trying to import it into BigQuery, I get the error 'Not found: URI' for gs://bucket-name/file-name.
Is what I'm trying to do even possible? I'm hoping to somehow directly upload the Cloud SQL data to BigQuery. It's a large table (>27GB) and I have been having a lot of connection issues with Cloud SQL, so exporting as CSV or JSON isn't the best option.

BigQuery doesn't support the MySQL backup format, so the best route forward is to generate CSV or JSON from the Cloud SQL database, persist those files into Cloud Storage, and load them from there, as sketched below.
More information on importing data can be found in the BigQuery documentation.
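For the load step itself, BigQuery's LOAD DATA statement can pull the exported files straight from Cloud Storage. A minimal sketch, assuming a target table demo.sales and placeholder bucket and file names:

LOAD DATA INTO demo.sales
FROM FILES (
  format = 'CSV',
  skip_leading_rows = 1,  -- assumes the export wrote a header row
  uris = ['gs://bucket-name/sales-*.csv']
);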

You can use a BigQuery Cloud SQL federated query to copy a Cloud SQL table into BigQuery with a single BigQuery SQL statement. For example, the following SQL copies the MySQL table sales_20191002 to the BigQuery table demo.sales_20191002.
INSERT demo.sales_20191002 (column1, column2, ...)
SELECT *
FROM
  EXTERNAL_QUERY(
    "project.us.connection",
    "SELECT * FROM sales_20191002;");
EXTERNAL_QUERY("connection", "foreign SQL") executes the "foreign SQL" in the Cloud SQL database specified by "connection" and returns the result to BigQuery. The "foreign SQL" is written in the source database's SQL dialect (MySQL or PostgreSQL).
Before running the SQL query above, you need to create a BigQuery connection that points to your Cloud SQL database.
To copy the whole Cloud SQL database, you may want to write a script that iterates over all tables and copies them in a loop, as sketched below.
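That loop can be written entirely in BigQuery scripting. A sketch, assuming the same "project.us.connection" connection, a target dataset demo, and a source MySQL schema named mydb (all placeholders):

-- Iterate the source tables reported by MySQL's information_schema
-- and copy each one into BigQuery with a dynamically built statement.
FOR tbl IN (
  SELECT table_name
  FROM EXTERNAL_QUERY(
    "project.us.connection",
    "SELECT table_name FROM information_schema.tables WHERE table_schema = 'mydb';")
) DO
  EXECUTE IMMEDIATE FORMAT("""
    CREATE OR REPLACE TABLE demo.%s AS
    SELECT * FROM EXTERNAL_QUERY(
      "project.us.connection", "SELECT * FROM %s;");""",
    tbl.table_name, tbl.table_name);
END FOR;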

Related

BigQuery to GCS and GCS to MySQL

I am creating an Airflow pipeline where I use the BigQueryOperator to query my BigQuery tables and the BigQueryToCloudStorageOperator to export the result table to GCS as CSV.
I need to move the CSV to a MySQL database, where it should be stored as a table.
Can I get any advice or ideas on how to implement this? Thanks!
Since your use case is querying data in BigQuery and storing the result in your MySQL database, you can use the BigQueryToMySqlOperator, which per its documentation:
Fetches the data from a BigQuery table (alternatively fetch data for selected columns) and insert that data into a MySQL table.

Azure SQL External table of Azure Table storage data

Is it possible to create an external table in Azure SQL of the data residing in Azure Table storage?
The answer is no.
I am currently facing a similar issue, and this is my research so far:
Azure SQL Database doesn't allow Azure Table Storage as an external data source.
Sources:
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-file-format-transact-sql?view=sql-server-2017
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql?view=sql-server-2017
Reason:
The possible data source scenarios are copying from Hadoop (Data Lake/Hive, ...), Blob storage (text files, CSV), or an RDBMS (another SQL Server). Azure Table Storage is not listed.
The possible external data formats are only variations of text files/Hadoop formats: delimited text, Hive RCFile, Hive ORC, Parquet.
Note: even copying from Blob storage in JSON format requires implementing a custom data format.
Workaround:
Create a copy pipeline with Azure Data Factory.
Create a copy function/script with Azure Functions using C# and manually transfer the data.
Yes, there are a couple of options. Please see the following:
CREATE EXTERNAL TABLE (Transact-SQL)
APPLIES TO: SQL Server (starting with 2016), Azure SQL Database, Azure SQL Data Warehouse, Parallel Data Warehouse
Creates an external table for PolyBase, or Elastic Database queries. Depending on the scenario, the syntax differs significantly. An external table created for PolyBase cannot be used for Elastic Database queries. Similarly, an external table created for Elastic Database queries cannot be used for PolyBase, etc.
CREATE EXTERNAL DATA SOURCE (Transact-SQL)
APPLIES TO: SQL Server (starting with 2016), Azure SQL Database, Azure SQL Data Warehouse, Parallel Data Warehouse
Creates an external data source for PolyBase, or Elastic Database queries. Depending on the scenario, the syntax differs significantly. An external data source created for PolyBase cannot be used for Elastic Database queries. Similarly, an external data source created for Elastic Database queries cannot be used for PolyBase, etc.
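For the Elastic Database query path between two Azure SQL databases, the setup looks roughly like this. A sketch only; the server, database, credential, and table names are placeholders:

-- Credential used to reach the remote Azure SQL database.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong-password>';
CREATE DATABASE SCOPED CREDENTIAL RemoteCredential
WITH IDENTITY = 'remote_user', SECRET = '<remote-password>';

-- External data source of type RDBMS (Elastic Database query).
CREATE EXTERNAL DATA SOURCE RemoteDb
WITH (
  TYPE = RDBMS,
  LOCATION = 'remote-server.database.windows.net',
  DATABASE_NAME = 'RemoteDatabase',
  CREDENTIAL = RemoteCredential
);

-- External table mirroring the remote table's schema.
CREATE EXTERNAL TABLE dbo.Orders (
  OrderId INT,
  CustomerId INT
)
WITH (DATA_SOURCE = RemoteDb);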
What is your use case?

What is the recommended way to make a copy of a Cloud SQL Database in BigQuery?

We would like to import our Cloud SQL database into BigQuery to query along with other datasets we have there.
What is the best path to doing this?
You can export your data as CSV files and then load them into BigQuery.
Shameless plug for this answer to a similar question: set up a "connection" to allow Cloud SQL federated queries and do all the work directly within BigQuery:
INSERT demo.customers (column1)
SELECT *
FROM
  EXTERNAL_QUERY(
    "project.us.connection",
    "SELECT column1 FROM mysql_table;");

How to ensure faster response time using Transact-SQL in Azure SQL DW when I combine data from SQL and non-relational data in Azure Blob storage?

What should I do to ensure optimal query performance using Transact-SQL in Azure SQL Data Warehouse while combining data sets from SQL and non-relational data in Azure Blob storage? Any input would be greatly appreciated.
The best practice is to load data from Azure Blob Storage into SQL Data Warehouse instead of attempting interactive queries over that data.
The reason is that when you run a query against data residing in Azure Blob Storage (via an external table), SQL Data Warehouse imports, under the covers, all the data from Azure Blob Storage into SQL Data Warehouse temporary tables to process the query. So even if you run a SELECT TOP 1 query on your external table, the entire dataset for that table will be imported temporarily to process the query.
As a result, if you know that you will be querying the external data frequently, it is recommended that you explicitly load the data into SQL Data Warehouse permanently using a CREATE TABLE AS SELECT command, as shown in the document: https://azure.microsoft.com/en-us/documentation/articles/sql-data-warehouse-load-with-polybase/.
As a best practice, break your Azure Storage data into files of no more than 1 GB each when possible, for parallel processing with SQL Data Warehouse. More information about how to configure PolyBase in SQL Data Warehouse to load data from Azure Storage Blob is in that same document.
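Put together, the load pattern from that document looks roughly like the following. A sketch only; the storage location, file format, and table names are placeholders, and a DATABASE SCOPED CREDENTIAL would be added for a private container:

-- External objects pointing at the blob container.
CREATE EXTERNAL DATA SOURCE AzureBlob
WITH (
  TYPE = HADOOP,
  LOCATION = 'wasbs://container@account.blob.core.windows.net'
);

CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (
  FORMAT_TYPE = DELIMITEDTEXT,
  FORMAT_OPTIONS (FIELD_TERMINATOR = ',')
);

CREATE EXTERNAL TABLE dbo.SalesExternal (
  SaleId INT,
  Amount MONEY
)
WITH (DATA_SOURCE = AzureBlob, LOCATION = '/sales/', FILE_FORMAT = CsvFormat);

-- Materialize the data permanently inside SQL Data Warehouse.
CREATE TABLE dbo.Sales
WITH (DISTRIBUTION = ROUND_ROBIN)
AS SELECT * FROM dbo.SalesExternal;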
Let me know if that helps!

Can I denormalize data in Google Cloud SQL in prep for BigQuery

Given that BigQuery is not meant as a platform for denormalizing data, can I denormalize the data in Google Cloud SQL prior to importing it into BigQuery?
I have the following tables:
Table1 (500M rows), Table2 (2M rows), Table3 (800K rows).
I can't denormalize in our existing relational database for various reasons. So I'd like to do a SQL dump of the database, load it into Google Cloud SQL, then use SQL join scripts to create one large flat table to be imported into BigQuery.
Thanks.
That should work. You should be able to dump the generated flat table to CSV and import it into BigQuery. There is currently no direct Cloud SQL to BigQuery loading mechanism, however.
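The flattening step inside Cloud SQL could be as simple as a CREATE TABLE ... SELECT over the three tables. A MySQL sketch; the table, column, and join-key names are placeholders for your schema:

-- Build one wide, denormalized table from the three source tables.
CREATE TABLE flat_export AS
SELECT
  t1.*,
  t2.some_column AS t2_some_column,
  t3.other_column AS t3_other_column
FROM table1 AS t1
JOIN table2 AS t2 ON t1.table2_id = t2.id
JOIN table3 AS t3 ON t1.table3_id = t3.id;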