How to bulk insert data from a Spark DataFrame into SQL Server Data Warehouse using a service principal in Databricks - sql-server-2012

I'm trying to bulk insert data from a Spark DataFrame into SQL Server Data Warehouse in Databricks. For this I'm using the pyodbc module with a service principal (not JDBC). Single-row inserts work, but I couldn't find a way to insert data in bulk into SQL Server Data Warehouse. Can someone suggest a way to insert data in bulk?

Examples here: https://docs.databricks.com/spark/latest/data-sources/azure/sql-data-warehouse.html
However, it recommends staging the data in a Blob storage account between the two.
You can also use the standard SQL interface: https://docs.databricks.com/spark/latest/data-sources/sql-databases.html
But you cannot use a service principal this way; you will need a SQL login. I would store a connection string (using the SQL login) as a secret in Key Vault, retrieve the secret with your service principal, and then connect to SQL with that connection string.
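A minimal sketch of the connection-string route, assuming the rows fit through the driver and that the ODBC driver in use supports pyodbc's `fast_executemany`. The helper names (`batched`, `bulk_insert`, `insert_dataframe`) and the batch size are hypothetical, chosen only for illustration. It batches rows so that each `executemany` call sends many rows instead of one.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Sequence


def batched(rows: Iterable[Sequence], size: int) -> Iterator[List[Sequence]]:
    """Yield lists of at most `size` rows from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_insert(cursor, table: str, columns: Sequence[str],
                rows: Iterable[Sequence], batch_size: int = 10_000) -> int:
    """Insert rows in batches with executemany; returns the number of rows sent."""
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    total = 0
    for chunk in batched(rows, batch_size):
        cursor.executemany(sql, chunk)
        total += len(chunk)
    return total


def insert_dataframe(df, conn_str: str, table: str) -> int:
    """Hypothetical glue code for Databricks; `df` is a Spark DataFrame."""
    import pyodbc  # imported here so the helpers above work without pyodbc

    conn = pyodbc.connect(conn_str)
    try:
        cursor = conn.cursor()
        cursor.fast_executemany = True  # send parameter arrays, not row by row
        # toLocalIterator streams rows to the driver one partition at a time
        rows = (tuple(r) for r in df.toLocalIterator())
        count = bulk_insert(cursor, table, df.columns, rows)
        conn.commit()
        return count
    finally:
        conn.close()
```

In a notebook, the connection string could be read with `dbutils.secrets.get(scope="my-scope", key="sqldw-connection-string")` (scope and key names are made up here) and passed to `insert_dataframe`. For very large DataFrames this is still likely to be slower than staging through Blob storage, since every row passes through the driver.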

You can do this nicely using PolyBase; it requires a location to store the temporary files:
https://docs.databricks.com/data/data-sources/azure/sql-data-warehouse.html#azure-sql-data-warehouse
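A rough sketch of what the PolyBase route can look like with the Databricks SQL Data Warehouse connector described at the link above. The function names are hypothetical, and the JDBC URL, temp directory, and table name are placeholders to be supplied by the caller. It also assumes the storage account key has already been set in the Spark session configuration so that it can be forwarded to the warehouse.

```python
from typing import Dict


def sqldw_write_options(jdbc_url: str, temp_dir: str, table: str) -> Dict[str, str]:
    """Build connector options; every value is supplied by the caller."""
    return {
        "url": jdbc_url,                                # JDBC URL of the warehouse
        "tempDir": temp_dir,                            # staging location for temp files
        "forwardSparkAzureStorageCredentials": "true",  # reuse the Spark storage key
        "dbTable": table,                               # target table in the warehouse
    }


def write_to_sqldw(df, jdbc_url: str, temp_dir: str, table: str,
                   mode: str = "append") -> None:
    """Write a Spark DataFrame through the connector (runs only on Databricks)."""
    (df.write
       .format("com.databricks.spark.sqldw")
       .options(**sqldw_write_options(jdbc_url, temp_dir, table))
       .mode(mode)
       .save())
```

The temp directory would typically be a `wasbs://` path in the Blob storage account used for staging; the connector writes the DataFrame there and the warehouse loads it in bulk.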

Related

How to insert table between 2 server in Azure data warehouse?

I want to insert a table from a different factory / server in Azure Data Warehouse. Is it possible to do this with a query?
Creating a dataset and pipeline for each table in Azure Data Factory takes a lot of time.
Judging by the Azure SQL icon in your screenshot, you're using Azure SQL Database, not Azure Data Warehouse.
You could use Elastic Query to do a cross database query in Azure.
See the tutorial: Get started with cross-database queries (vertical partitioning) (preview)
Elastic database query (preview) for Azure SQL Database allows you to run T-SQL queries that span multiple databases using a single connection point. This article applies to vertically partitioned databases.
When completed, you will: learn how to configure and use an Azure SQL Database to perform queries that span multiple related databases.
Then you can insert data from the external table into a table in the current database.
Note: Azure SQL Database already has the master key, so you don't need to create it again.
Hope this helps.
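As an illustration of the Elastic Query steps, the sketch below builds the T-SQL statements as Python strings. It assumes a database scoped credential for the remote database already exists. All names passed in (data source, credential, tables, column definitions) are hypothetical, and the helper itself is not part of any library.

```python
from typing import List


def elastic_query_statements(data_source: str, remote_server: str, remote_db: str,
                             credential: str, external_table: str,
                             column_defs: str, target_table: str) -> List[str]:
    """Return T-SQL to expose a remote table and copy it into a local table.

    The external table name must match the table name in the remote database,
    and `column_defs` must match the remote table's columns.
    """
    return [
        f"CREATE EXTERNAL DATA SOURCE {data_source} WITH ("
        f"TYPE = RDBMS, LOCATION = '{remote_server}', "
        f"DATABASE_NAME = '{remote_db}', CREDENTIAL = {credential});",
        f"CREATE EXTERNAL TABLE {external_table} ({column_defs}) "
        f"WITH (DATA_SOURCE = {data_source});",
        f"INSERT INTO {target_table} SELECT * FROM {external_table};",
    ]
```

The returned statements can be run one after another in SSMS or through any SQL client connected to the database that should receive the data.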

How do I transfer Sqlite data to Mysql using Pentaho Data Integration(PDI)?

Hi, I have a web app on a local server that writes to a SQLite database. I want to transfer this data from the SQLite database to a MySQL server.
How do I do that using Spoon (Pentaho)?
It's a simple task.
Create two database connections: the first for SQLite and the second for MySQL.
Then, in a transformation, add a Table input step for the SQLite connection and a Table output step for the MySQL connection.

Azure SQL External table of Azure Table storage data

Is it possible to create an external table in Azure SQL of the data residing in Azure Table storage?
The answer is no.
I am currently facing a similar issue, and this is my research so far:
Azure SQL Database doesn't allow Azure Table Storage as an external data source.
Sources:
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-file-format-transact-sql?view=sql-server-2017
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql?view=sql-server-2017
Reason:
The possible data source scenarios are copying from Hadoop (Data Lake, Hive, ...), Blob storage (text files, CSV) or an RDBMS (another SQL Server). Azure Table Storage is not listed.
The possible external data formats are only variations of text files/Hadoop: Delimited Text, Hive RCFile, Hive ORC, Parquet.
Note: even copying from Blob storage in JSON format requires implementing a custom data format.
Workaround:
Create a copy pipeline with Azure Data Factory.
Create a copy function/script with Azure Functions using C# and manually transfer the data.
Yes, there are a couple options. Please see the following:
CREATE EXTERNAL TABLE (Transact-SQL)
APPLIES TO: SQL Server (starting with 2016) Azure SQL Database Azure SQL Data Warehouse Parallel Data Warehouse
Creates an external table for PolyBase, or Elastic Database queries. Depending on the scenario, the syntax differs significantly. An external table created for PolyBase cannot be used for Elastic Database queries. Similarly, an external table created for Elastic Database queries cannot be used for PolyBase, etc.
CREATE EXTERNAL DATA SOURCE (Transact-SQL)
APPLIES TO: SQL Server (starting with 2016) Azure SQL Database Azure SQL Data Warehouse Parallel Data Warehouse
Creates an external data source for PolyBase, or Elastic Database queries. Depending on the scenario, the syntax differs significantly. An external data source created for PolyBase cannot be used for Elastic Database queries. Similarly, an external data source created for Elastic Database queries cannot be used for PolyBase, etc.
What is your use case?

Can't CREATE EXTERNAL DATA SOURCE in SQL

I'm trying to create an external data source to access Azure Blob Storage. However, I'm having issues with creating the actual data source.
I've followed the instructions located here: Examples of bulk access to data in Azure Blob storage and Create external data source - Transact-SQL.
I'm using SQL Server 2016 on a VM, accessed via SSMS on a client machine using Windows Authentication, with no issues. The instructions say creating this external data source works for SQL Server 2016 and Azure Blob Storage.
I have created the Master Key:
CREATE MASTER KEY ENCRYPTION BY PASSWORD = <password>
and, the database scoped credential
CREATE DATABASE SCOPED CREDENTIAL UploadCountries
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = <key>;
I have verified both of these exist in the database by querying sys.symmetric_keys and sys.database_scoped_credentials.
However, when I try executing the following code, it says: Incorrect syntax near 'EXTERNAL'.
CREATE EXTERNAL DATA SOURCE BlobCountries
WITH (
TYPE = BLOB_STORAGE,
LOCATION = 'https://<somewhere>.table.core.windows.net/<somewhere>',
CREDENTIAL = UploadCountries
);
Your thoughts and help are appreciated!
Steve.
In “Examples of Bulk Access to Data in Azure Blob Storage”, we can find:
Bulk access to Azure blob storage from SQL Server, requires at least SQL Server 2017 CTP 1.1.
And in Arguments section of “CREATE EXTERNAL DATA SOURCE (Transact-SQL)”, we can find similar information:
Use BLOB_STORAGE when performing bulk operations using BULK INSERT or OPENROWSET with SQL Server 2017
You are using SQL Server 2016, so you get the "Incorrect syntax near 'EXTERNAL'" error when you try to create an external data source for Azure Blob storage.

Upload Google Cloud SQL backup to Bigquery

I have had troubles trying to move a Google Cloud SQL database to BigQuery. I have exported the database backup from Cloud SQL to Cloud Storage, but when trying to import this into BigQuery, I get the error: 'Not found: URI' for gs://bucket-name/file-name
Is what I'm trying to do even possible? I'm hoping to somehow directly upload the Cloud SQL data to BigQuery. It's a large table (>27GB) and I have been having a lot of connection issues with Cloud SQL, so exporting as CSV or JSON isn't the best option.
BigQuery doesn't support the MySQL backup format, so the best route forward is to generate CSV or JSON from the Cloud SQL database and persist those files in Cloud Storage.
More information on importing data can be found in the BigQuery documentation.
You can use a BigQuery Cloud SQL federated query to copy a Cloud SQL table into BigQuery. You can do it with one BigQuery SQL statement. For example, the following SQL copies the MySQL table sales_20191002 to the BigQuery table demo.sales_20191002.
INSERT demo.sales_20191002 (column1, column2 etc..)
SELECT *
FROM EXTERNAL_QUERY(
  "project.us.connection",
  "SELECT * FROM sales_20191002;");
EXTERNAL_QUERY("connection", "foreign SQL") executes the "foreign SQL" in the Cloud SQL database specified in "connection" and returns the result to BigQuery. The "foreign SQL" is written in the source database's SQL dialect (MySQL or PostgreSQL).
Before running the SQL query above, you need to create a BigQuery connection that points to your Cloud SQL database.
To copy the whole Cloud SQL database, you may want to write a script that iterates over all tables and copies them in a loop.
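One way such a loop might look, as a sketch only: it generates one statement per table and runs them with the google-cloud-bigquery client. It uses CREATE OR REPLACE TABLE ... AS SELECT rather than the INSERT shown above, so the target tables do not need to exist beforehand. The function names are hypothetical, and the table list is assumed to be supplied by the caller.

```python
import re
from typing import Iterable, List

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def copy_statements(connection_id: str, dataset: str,
                    tables: Iterable[str]) -> List[str]:
    """Build one BigQuery statement per Cloud SQL table."""
    statements = []
    for table in tables:
        # Reject anything that is not a plain identifier, since the name
        # is interpolated into SQL text.
        if not _IDENT.fullmatch(table):
            raise ValueError(f"unsupported table name: {table!r}")
        statements.append(
            f"CREATE OR REPLACE TABLE `{dataset}.{table}` AS\n"
            f"SELECT * FROM EXTERNAL_QUERY(\n"
            f"  '{connection_id}',\n"
            f"  'SELECT * FROM {table};');"
        )
    return statements


def run_all(statements: Iterable[str]) -> None:
    """Run the statements one by one (needs google-cloud-bigquery installed)."""
    from google.cloud import bigquery

    client = bigquery.Client()
    for sql in statements:
        client.query(sql).result()  # block until each copy job finishes
```

For example, `run_all(copy_statements("project.us.connection", "demo", ["sales_20191002"]))` would copy the single table from the answer above. The list of table names could itself come from an EXTERNAL_QUERY against the source database's information schema.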