Connecting to Cloud SQL server instance from BigQuery - google-bigquery

There is an option to connect a Cloud mySQL instance from BigQuery. I just wanted to know how we can connect a Cloud SQL Server instance to BigQuery.

SQL Server:
There are a bunch of third-party extensions/tools that provide this service. One of them is SSIS Data Flow Source & Destination for Google BigQuery, which is Visual Studio extension that connects SQL Server with Google BigQuery data through SSIS Workflows.:
https://www.cdata.com/drivers/bigquery/ssis/
https://marketplace.visualstudio.com/items?itemName=CDATASOFTWARE.SSISDataFlowSourceDestinationforGoogleBigQuery
In regards to using SQL Server Integration Services to load the data from the on-premises SQL Server to BigQuery, you can take a look for this site. You can also perform ETL from a relational database into BigQuery using Cloud Dataflow, the official documentation details how it can be done, you might need to use Cloud Storage as an intermediate data sink.
Cloud SQL:
BigQuery allows to query data from Cloud SQL by using federated query. The connection must be created within the same project where your Cloud SQL instance is located. If you want to query your data stored in your Cloud SQL instance from BigQuery located in another project, please follow the steps listed below:
Enable the BigQuery API and the BigQuery connection API within your project.
Create a connection to your Cloud SQL instance within the project by following this documentation.
Once you have created the connection, please locate and select it within BigQuery.
Click on the SHARE CONNECTION button and grant permissions to the users that will be use that connection. Please note that the BigQuery Connection User role is the only needed to use a shared connection.
Additionally, please notice that the "Cloud SQL federated queries" feature is in a Beta stage and might change or have limited support (is no available for certain regions, in which case, it is required to use one the supported options mentioned here). Please remember, that to use Cloud SQL Federated queries in BigQuery, the intances need to have a public IP.
If you are limited e.g. by region, one good option might be exporting the data from CloudSQL to Storage as a CSV, and then load it into BigQuery. If you need, it is possible to automate this process using Cloud Composer, refer to this article.
Other approach is to extract information from Cloud SQL (with exports) and import it into BigQuery through load jobs, or streaming inserts.
I hope you find the above pieces of information useful.

It is possible, but be warned the feature is currently Beta
https://cloud.google.com/bigquery/docs/cloud-sql-federated-queries

Related

Oracle Cloud to Azure Cloud storage

We have a requirement to move data from oracle Cloud storage to Azure Cloud storage.
The requirement is basically to move data from an Oracle ADW database (hosted on Oracle cloud) to Snowflake database (hosted on Azure).
Since the data volume in tables is huge (some with 60mil+ records) we do not wish to use any ETL tool and instead want to setup a pipeline as below.
Oracle ADW database -> Store data in Oracle storage --> Move data to Azure Cloud storage -> Load into Snowflake using snowpipe or similar snowflake utilities.
How should I go about this implementation?
Also share your views on whether we can use Oracle fastconnect and Azure ExpressRoute to directly pull data from Oracle Cloud onto snowflake (or into Azure storage)
I am looking for the same thing with the simplest method from Oracle (on prem but could be cloud), into Snowflake. Looks like data must be exporeted or dropped to external tables, shifted to Azure Blob storage (like AWS S3), then pushed into Snowflake using COPY INTO - basically copying on disk external tables. This is what Snowpipe does:
"Snowpipe copies the files into a queue, from which they are loaded into the target table in a continuous, serverless fashion based on parameters defined in a specified pipe object. The following table indicates the cloud storage service support for automated Snowpipe from Snowflake accounts hosted on each cloud platform:"
It's been a while since I have worked with this. The other option is GoldenGate, which was not expensive the last time I looked into it:
https://www.snowflake.com/blog/continuous-data-replication-into-snowflake-with-oracle-goldengate/
Easy, simple, fast. Anyone have any better ideas would be appreciated.

Google Cloud Data Fusion, How can I load many tables to bigquery in one pipeline

I want to load many tables which is in aws rds mysql server by using cloud data fusion. each table storage is more than about 1gb. also I found the plugin which name is "multiple database table" to load multi table. but i got a fail. Also basically when I used database source I can check my tables' schema. However, in multiple database table, i can 't find how to check table's schema. how can i use this plugin? or is there any other way to load many tables in data fusion service?
My pipeline setting was as follows.
I'm posting this Community Wiki as OP didn't provide enough details to reproduce but the below information might help someone.
There are few ways to get your data using Cloud Data Fusion, you can use pipeline, plugin, driver and a few others depending on your needs.
On the internet you can find two very well described guides with examples.
If you would like to find some information about Cloud Data Fusion with GCP products you should read Bahadir Bulut guide - How I used Google Cloud Data Fusion to create a data warehouse - Part 1 and Part 2. Also Data Fusion allows to use 150+ preconfigured connectors and transformations like Amazons S3, SQS, etc. Azure services and many more.
Another well described (which I guess would help OP) is to configure both Amazon and GCP resources and using pipelines. This guide is Building a Simple Batch Data Pipeline from AWS RDS to Google BigQuery — Part 1: Setting UP AWS Data pipeline and second part Building a Simple Batch Data Pipeline from AWS RDS to Google BigQuery — Part 2: Setting up BigQuery Transfer Service and Scheduled Query.. In short this guide describes 2 main steps:
Extract data from MYSQL RDS and bring into S3 using AWS data pipeline service
From S3, bring the file inside Bigquery using BigqQuery transfer service.

load data from on prem sql server to bigquery using SSIS

i setup the VPN connection between my on-prem sql server and GCP. i need to load more than 10 million rows data from sql server to bigquery. is there any way to achieve it? can i use SSIS to load the data to bigquery?
my team lead request to use dataflow to load the data to bigquery and not using SSIS.
In regards to using SSIS to load the data from the on-premises SQL Server to BigQuery, this might be what you're looking for. As for using Cloud Dataflow, the official GCP documentation details how it can be done, although you might need to use Cloud Storage as an intermediate data sink.

How do I create a BigQuery dataset out of another BigQuery dataset?

I need to understand the below:
1.) How does one BigQuery connect to another BigQuery and apply some logic and create another BigQuery. For e.g if i have a ETL tool like Data Stage and we have some data been uploaded for us to consume in form of a BigQuery. So in DataStage or using any other technology how do i design the job so that the source is one BQ and the Target is another BQ.
2.) I want to achieve like my input will be a VIEW (BigQuery) and then need to run some logic on the BigQuery View and then load into another BigQuery view.
3.) What is the technology used to connected one BigQuery to another BigQuery is it https or any other technology.
Thanks
If you have a large amount of data to process (many GB), you should do the transformation of the data directly in the Big Query database. It would be very slow to extract all the data, run it through something locally, and send it back. You don't need any outside technology to make one view depend on another view, besides access to the relevant data.
The ideal job design will be an SQL query that Big Query can process. If you are trying to link tables/views across different projects then the source BQ table must be listed in fully-specified form projectName.datasetName.tableName in the FROM clauses of the SQL query. Project names are globally unique in Google Cloud.
Permissions to access the data must be set up correctly. BQ provides fine-grained control over who can access, and it is in the BQ documentation. You can also enable public access to all BQ users if that is appropriate.
Once you have that SQL query, you can create a new view by sending your SQL to Google BigQuery either through the command line (the bq tool), the web console, or an API.
1) You can use BigQuery Connector in DataStage to read and write to bigquery.
2) Bigquery use namespaces in the format project.dataset.table to access tables across projects. This allows you to manipulate your data in GCP as it were in the same database.
To manipulate your data you can use DML or standard SQL.
To execute your queries you can use the GCP Web console or client libraries such as python or java.
3) BigQuery is a RESTful web service and use HTTPS

How to connect GCP SQL instance by using Bigquery?

I need help for something, I'm new in Bigquery..
I want to see my SQL tables over Bigquery connection at the Data Studio. However I couldn't figure it out how to connect GCP mySQL instance from Bigquery.
I tried to change region/location of my SQL instance.(I think it helped a little) And using this query at the new BigQuery web UI.. but I receive below error.
SELECT *
FROM
myinstanceid.my_database_01.TABLE_YS ys
Error: ((Not found: Project myinstanceid))
You cannot directly connect MySQL databases the only external data sources supported are: GCS files, Cloud Bigtable, Google Drive (even directly google spreadsheets).
To run analytics on data inside your MySQL DB you will need to export data into a supported external data source or even better into a BigQuery native table.