How to ensure faster response time using Transact-SQL in Azure SQL DW when I combine data from SQL and non-relational data in Azure Blob storage? - azure-sql-database

What should I do to ensure optimal query performance using Transact-SQL in Azure SQL Data Warehouse when combining data sets from SQL and non-relational data in Azure Blob storage? Any input would be greatly appreciated.

The best practice is to load data from Azure Blob Storage into SQL Data Warehouse instead of attempting interactive queries over that data.
The reason is that when you run a query against data residing in Azure Blob Storage (via an external table), SQL Data Warehouse, under the covers, imports all the data from Azure Blob Storage into SQL Data Warehouse temporary tables to process the query. So even if you run a SELECT TOP 1 query on your external table, the entire dataset for that table is imported temporarily to process the query.
As a result, if you know that you will be querying the external data frequently, it is recommended that you explicitly load the data into SQL Data Warehouse permanently using a CREATE TABLE AS SELECT command, as shown in the document: https://azure.microsoft.com/en-us/documentation/articles/sql-data-warehouse-load-with-polybase/.
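To make the CREATE TABLE AS SELECT recommendation concrete, here is a minimal sketch that only renders the statement text. The helper name build_ctas and the table names dbo.FactSales and ext.FactSales are hypothetical, and the distribution and index options are just one reasonable default, not something the answer prescribes.

```python
def build_ctas(target_table, external_table, distribution="ROUND_ROBIN"):
    """Render a CREATE TABLE AS SELECT statement that copies an external
    table into a permanent SQL Data Warehouse table.

    All identifiers are assumed to be trusted, already-quoted names;
    this sketch does no escaping and does not run the statement.
    """
    return (
        f"CREATE TABLE {target_table}\n"
        f"WITH (DISTRIBUTION = {distribution}, CLUSTERED COLUMNSTORE INDEX)\n"
        f"AS SELECT * FROM {external_table};"
    )


if __name__ == "__main__":
    # Hypothetical names: ext.FactSales is the external table over Blob storage.
    print(build_ctas("dbo.FactSales", "ext.FactSales"))
```

Once the permanent table exists, later queries read from it directly instead of re-importing the blob data on every run.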

As a best practice, break your Azure Storage data into files of no more than 1 GB each when possible, so that SQL Data Warehouse can process them in parallel. More information about how to configure PolyBase in SQL Data Warehouse to load data from Azure Blob Storage is here: https://azure.microsoft.com/en-us/documentation/articles/sql-data-warehouse-load-with-polybase/
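One way to follow the 1 GB guideline is to split large delimited files before uploading them. The sketch below is only an illustration: the function name, the part-file naming scheme, and the assumption that the file has one row per line and no header row are all mine, not part of the answer.

```python
import os


def split_file_by_size(src_path, out_dir, max_bytes=1024 ** 3):
    """Split a line-delimited file into parts of at most max_bytes each.

    Parts are cut only at line boundaries so no row is split in half.
    A single line longer than max_bytes still gets a part of its own,
    which will then exceed the limit. Header rows are not handled.
    Returns the list of part file paths in order.
    """
    os.makedirs(out_dir, exist_ok=True)
    parts = []
    out = None
    written = 0
    with open(src_path, "rb") as src:
        for line in src:
            # Open a new part when none is open yet, or when adding this
            # line would push the current part over the limit.
            if out is None or (written > 0 and written + len(line) > max_bytes):
                if out is not None:
                    out.close()
                part_path = os.path.join(out_dir, f"part_{len(parts):05d}.csv")
                out = open(part_path, "wb")
                parts.append(part_path)
                written = 0
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return parts
```

The resulting part files can be uploaded to the same container folder, so that a single external table location covers all of them.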
Let me know if that helps!

Related

In SQL External table take a time for select and insert data into temp table

I am using an external table that points from the data warehouse database to the main database. Selecting data from the external table into a # table takes almost 9 minutes, and sometimes longer. How can I improve the performance of the external table?
One option is to use T-SQL that runs the query in the external database and returns only the data required. The filter is applied first in the external database, and only the filtered rows are sent back to the calling database.
When you enable the query's Actual Execution Plan option, you can see that a query such as Select * from PerformanceVarcharNVarchar brings the data from the external database into the temporary database, and only then does the engine apply the filter.
Official Microsoft documentation: EXECUTE (Transact-SQL) | Docs
Alternatively, you can use Azure SQL Data Sync: SQL Data Sync is an Azure SQL Database-based service that allows you to synchronize selected data bidirectionally between multiple databases, both on-premises and in the cloud.
The original post has more detailed insights: Lesson Learned #56: External tables and performance issues | techcommunity

How to insert table between 2 server in Azure data warehouse?

I want to insert a table from a different factory / server in Azure Data Warehouse. Is it possible to do the insert with a query?
Creating a dataset and a pipeline for each table in Azure Data Factory takes a lot of time.
Judging from the Azure SQL icon in your screenshot, you're using Azure SQL Database, not Azure SQL Data Warehouse.
You could use Elastic Query to run a cross-database query in Azure.
See the tutorial: Get started with cross-database queries (vertical partitioning) (preview)
Elastic database query (preview) for Azure SQL Database allows you to run T-SQL queries that span multiple databases using a single connection point. This article applies to vertically partitioned databases.
When completed, you will: learn how to configure and use an Azure SQL Database to perform queries that span multiple related databases.
Then you can insert data from the external table into a table in the current database.
Note: Azure SQL Database already has the master key, so you don't need to create it again.
Hope this helps.

Comparing Azure table data against Cosmos Db

We need to do a weekly sync of data from an Azure SQL table into a Cosmos DB table. The Azure SQL table is the source and has millions of records. Has anyone done this before, and was there a tool used to do this?
I'd suggest using the Copy activity in Azure Data Factory, which is designed for data transfer. You could configure Azure SQL DB as the source and Cosmos DB as the sink. Please see the document: https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-overview#supported-data-stores-and-formats
Since you need a weekly sync, you could create a schedule trigger that runs the copy activity once a week.

Update changes in Azure SQL Data Warehouse using polybase

I need help with Azure SQL Data Warehouse. I'm using PolyBase to ELT data from Azure Data Lake Storage Gen2 into Azure SQL DW. When we load data into the DW for the first time, there are no issues. But when we load data again (incremental load), how do we upsert data?
The flow we are using:
ADLS Gen2 -> (PolyBase) -> External table -> (CTAS) -> Staging tables -> (transformation) -> dimension tables
Every time the data changes, we reload it into ADLS Gen2.
What is the best way to UPSERT data, or should we also reload the data into SQL DW?
Because MERGE is not supported in Azure SQL Data Warehouse, you need to use other means to load data from the external tables into your staging tables. PolyBase can be used to load both initial and incremental data into the external table schema; what changes between the two cases is how you load from there into the staging tables.
The following is a great tutorial on how to deploy this solution: Using PolyBase to Update Tables in Data Warehouse from ADLS
Once the data is loaded into the external tables via PolyBase in an ADFv2 pipeline, a trigger is called to execute a stored procedure in the data warehouse that performs the load into the staging tables.
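One common way to get upsert behaviour without MERGE is to rebuild the target table: take every row from staging, add the target rows whose key is not present in staging, and then swap the new table in. The sketch below demonstrates that logic with Python's built-in sqlite3 module purely so it can be run locally. It is not SQL Data Warehouse syntax, the function and table names are hypothetical, and in the warehouse the same idea would be expressed with CTAS followed by a table swap (for example with RENAME OBJECT).

```python
import sqlite3


def upsert_by_rebuild(conn, target, staging, key):
    """Rebuild `target` so that staging rows replace target rows with the
    same key, and rows that exist only in target are kept.

    Table and column names are assumed to be trusted identifiers.
    Both tables are assumed to have the same columns in the same order.
    """
    cur = conn.cursor()
    # 1. Build the new version of the table: all staging rows, plus the
    #    target rows that have no matching key in staging.
    cur.execute(
        f"CREATE TABLE {target}_new AS "
        f"SELECT * FROM {staging} "
        f"UNION ALL "
        f"SELECT t.* FROM {target} AS t "
        f"WHERE NOT EXISTS ("
        f"SELECT 1 FROM {staging} AS s WHERE s.{key} = t.{key})"
    )
    # 2. Swap the rebuilt table in place of the old one.
    cur.execute(f"DROP TABLE {target}")
    cur.execute(f"ALTER TABLE {target}_new RENAME TO {target}")
    conn.commit()
```

With a target holding keys 1 and 2 and a staging table holding keys 2 and 3, the rebuilt target contains key 1 unchanged, key 2 with the staging values, and key 3 as a new row.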

Azure SQL External table of Azure Table storage data

Is it possible to create an external table in Azure SQL of the data residing in Azure Table storage?
The answer is no.
I am currently facing a similar issue, and this is my research so far:
Azure SQL Database doesn't allow Azure Table Storage as an external data source.
Sources:
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-file-format-transact-sql?view=sql-server-2017
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql?view=sql-server-2017
Reason:
The possible data source scenarios are to copy from Hadoop (Data Lake, Hive, ...), Blob (text files, CSV) or an RDBMS (another SQL Server). Azure Table Storage is not listed.
The possible external data formats are only variations of text files/Hadoop: Delimited Text, Hive RCFile, Hive ORC, Parquet.
Note - even copying from blob in JSON format requires implementing a custom data format.
Workaround:
Create a copy pipeline with Azure Data Factory.
Create a copy function/script with Azure Functions using C# and manually transfer the data.
Yes, there are a couple of options. Please see the following:
CREATE EXTERNAL TABLE (Transact-SQL)
APPLIES TO: SQL Server (starting with 2016), Azure SQL Database, Azure SQL Data Warehouse, Parallel Data Warehouse
Creates an external table for PolyBase, or Elastic Database queries. Depending on the scenario, the syntax differs significantly. An external table created for PolyBase cannot be used for Elastic Database queries. Similarly, an external table created for Elastic Database queries cannot be used for PolyBase, etc.
CREATE EXTERNAL DATA SOURCE (Transact-SQL)
APPLIES TO: SQL Server (starting with 2016), Azure SQL Database, Azure SQL Data Warehouse, Parallel Data Warehouse
Creates an external data source for PolyBase, or Elastic Database queries. Depending on the scenario, the syntax differs significantly. An external data source created for PolyBase cannot be used for Elastic Database queries. Similarly, an external data source created for Elastic Database queries cannot be used for PolyBase, etc.
What is your use case?