Is it possible to create an external table in Azure SQL of the data residing in Azure Table storage?
The answer is no.
I am currently facing a similar issue, and this is my research so far:
Azure SQL Database doesn't allow Azure Table Storage as an external data source.
Sources:
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-file-format-transact-sql?view=sql-server-2017
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql?view=sql-server-2017
Reason:
The possible data source scenarios are copying from Hadoop (Data Lake/Hive, ...), Blob storage (text files, CSV), or an RDBMS (another SQL Server). Azure Table Storage is not listed.
The possible external data formats are only variations of text files/Hadoop: Delimited Text, Hive RCFile, Hive ORC, Parquet.
Note - even copying from blob in JSON format requires implementing a custom data format.
Workaround:
Create a copy pipeline with Azure Data Factory.
Create a copy function/script with Azure Functions using C# and manually transfer the data
Yes, there are a couple of options. Please see the following:
CREATE EXTERNAL TABLE (Transact-SQL)
APPLIES TO: SQL Server (starting with 2016) Azure SQL Database Azure SQL Data Warehouse Parallel Data Warehouse
Creates an external table for PolyBase, or Elastic Database queries. Depending on the scenario, the syntax differs significantly. An external table created for PolyBase cannot be used for Elastic Database queries. Similarly, an external table created for Elastic Database queries cannot be used for PolyBase, etc.
CREATE EXTERNAL DATA SOURCE (Transact-SQL)
APPLIES TO: SQL Server (starting with 2016) Azure SQL Database Azure SQL Data Warehouse Parallel Data Warehouse
Creates an external data source for PolyBase, or Elastic Database queries. Depending on the scenario, the syntax differs significantly. An external data source created for PolyBase cannot be used for Elastic Database queries. Similarly, an external data source created for Elastic Database queries cannot be used for PolyBase, etc.
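For illustration, the two flavours look roughly like this (all names and locations below are placeholders, not taken from the question):

-- PolyBase-style source (SQL DW/PDW): points at Hadoop or blob storage
CREATE EXTERNAL DATA SOURCE MyHadoopSource
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://mycontainer@mystorageaccount.blob.core.windows.net',
    CREDENTIAL = MyStorageCredential
);

-- Elastic-query-style source (Azure SQL Database): points at another database
CREATE EXTERNAL DATA SOURCE MyRemoteSource
WITH (
    TYPE = RDBMS,
    LOCATION = 'myserver.database.windows.net',
    DATABASE_NAME = 'RemoteDb',
    CREDENTIAL = MyDbCredential
);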
What is your use case?
Related
Currently, I need to transfer all tables (DDL and the data in the tables) and stored procedures to another schema in a Synapse data warehouse. I checked the documentation below, but it seems that I have to move them all one by one.
https://learn.microsoft.com/en-us/sql/t-sql/statements/alter-schema-transact-sql?view=sql-server-ver15
Is there a method, command, or query with which I can transfer all the contents of one schema to another in a Synapse data warehouse?
Kind regards,
There is no built-in method to do this, but depending on your skills there are a number of different options:
use SQL Server Management Studio (SSMS) built-in scripting options. Newer versions of SSMS (v18.x and onwards) are capable of producing DDL for Azure Synapse Analytics. Simply point to your object (table, stored proc, view, etc.) in Object Explorer, right-click it, and view the scripting options, e.g. for tables you will see 'Script Table as'
SQL Server Data Tools (SSDT) - SSDT now has support for Azure Synapse Analytics, dedicated SQL pools. So you can import your schema, do a find and replace in the .sql scripts in the project, and generate the script. You can also use the Data Compare and Schema Compare features.
command-line option mssql-cli. This offers powerful command-line scripting options but you'll need to download and install it: https://learn.microsoft.com/en-us/sql/tools/mssql-cli?view=sql-server-ver15
Use CTAS to transfer schema and data. Create a simple CTAS template and run it for each of your tables:
CREATE TABLE <new schema>.yourTable
WITH
(
DISTRIBUTION = ROUND_ROBIN,
CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT *
FROM <old schema>.yourTable
OPTION ( LABEL = 'CTAS: copy yourTable to new schema' );
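If you have many tables, a query along these lines can generate the CTAS statement for every table in the source schema (a sketch; oldschema and newschema are placeholders, and the distribution may need adjusting per table):

-- Generate one CTAS statement per table in the source schema.
-- Copy the output and run it in a separate batch.
SELECT
    'CREATE TABLE newschema.' + t.name +
    ' WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX)' +
    ' AS SELECT * FROM oldschema.' + t.name +
    ' OPTION (LABEL = ''CTAS: copy ' + t.name + ''');'
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE s.name = 'oldschema';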
So a few options for you.
I am using an external table from the main database in a data warehouse database. When selecting data from the external table into a # (temp) table, it takes almost 9 minutes, sometimes longer. How can I improve the performance of this external table?
One option is to use the following T-SQL to perform the query in the external database and retrieve only the data required. The filter will be applied first in the external database, and only the filtered data will then be received by your database.
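As a sketch of that approach using sp_execute_remote, the elastic-query procedure for executing a statement on the remote database (the data source name, columns, and filter below are placeholders):

-- Push the filter to the remote database instead of pulling the whole table.
-- 'MyRemoteSource' is a placeholder for your external data source name.
EXEC sp_execute_remote
    N'MyRemoteSource',
    N'SELECT Id, Col1 FROM dbo.PerformanceVarcharNVarchar WHERE Id = @id',
    N'@id INT',
    @id = 42;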
When you enable the query's Actual Execution Plan option, you can see that the query SELECT * FROM PerformanceVarcharNVarchar brings the data from the external database into a local temporary table, and only then does the engine apply the filter.
Here are the official Microsoft docs: EXECUTE (Transact-SQL) | Docs
Alternatively, you can use Azure SQL Data Sync: SQL Data Sync is a service built on Azure SQL Database that lets you synchronize selected data bidirectionally across multiple databases, both on-premises and in the cloud.
The original post has detailed insights: Lesson Learned #56: External tables and performance issues | techcommunity
I want to insert a table from a different factory/server into Azure Data Warehouse. Is it possible to insert it with a query?
It takes a lot of time if I create a dataset and pipeline for each table in Azure Data Factory.
Judging from your screenshot, according to the Azure SQL icon, you're using an Azure SQL database, not Azure Data Warehouse.
You could use Elastic Query to do a cross database query in Azure.
Ref the tutorial: Get started with cross-database queries (vertical partitioning) (preview)
Elastic database query (preview) for Azure SQL Database allows you to run T-SQL queries that span multiple databases using a single connection point. This article applies to vertically partitioned databases.
When completed, you will have learned how to configure and use an Azure SQL Database to perform queries that span multiple related databases.
Then you can insert data from the external table into a table in the current database.
Note: if the Azure SQL database already has a database master key, you don't need to create it again.
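A rough end-to-end sketch of that setup (credential, server, and table names are placeholders, not taken from the question):

-- One-time setup in the database that will run the cross-database query.
-- Skip CREATE MASTER KEY if the database already has one.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

CREATE DATABASE SCOPED CREDENTIAL RemoteCred
WITH IDENTITY = '<remote user>', SECRET = '<remote password>';

CREATE EXTERNAL DATA SOURCE RemoteDbSource
WITH (
    TYPE = RDBMS,
    LOCATION = 'myserver.database.windows.net',
    DATABASE_NAME = 'RemoteDb',
    CREDENTIAL = RemoteCred
);

-- Mirror the remote table's schema locally as an external table.
CREATE EXTERNAL TABLE dbo.RemoteOrders (
    OrderId INT,
    Amount DECIMAL(18, 2)
)
WITH (DATA_SOURCE = RemoteDbSource);

-- Then copy the data into a local table.
INSERT INTO dbo.LocalOrders (OrderId, Amount)
SELECT OrderId, Amount FROM dbo.RemoteOrders;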
Hope this helps.
I want help regarding Azure SQL Data Warehouse. I'm using PolyBase to ELT data from Azure Data Lake Storage Gen2 to Azure SQL DW. When we load data into the DW for the first time there are no issues, but when we load data again (an incremental load), how do we upsert the data?
The flow we are using:
ADLS2 -> (PolyBase) -> External table -> (CTAS) -> Staging tables -> (transformation) -> dimension tables
Every time the data changes, we reload it into ADLS2.
What is the best way to UPSERT data, or should we also reload the data into SQL DW?
Because MERGE is not supported in Azure Data Warehouse, you need to use other means to load data from the external tables to your staging tables. PolyBase can be used to load both initial and incremental data into the external table schema, but the key is how you perform the load into the staging tables.
The following is a great tutorial on how to deploy this solution: Using PolyBase to Update Tables in Data Warehouse from ADLS
Once the data is loaded into the external tables via PolyBase in an ADFv2 pipeline, a trigger is called to execute a stored procedure in the ADWH to perform the load into the staging tables.
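One common MERGE-free pattern is a CTAS-based upsert followed by a table swap; here is a sketch with placeholder table and column names:

-- Upsert via CTAS: new/changed rows come from staging,
-- untouched rows are carried over from the current target.
CREATE TABLE dbo.Customer_upsert
WITH (DISTRIBUTION = HASH(CustomerId), CLUSTERED COLUMNSTORE INDEX)
AS
SELECT s.CustomerId, s.Name, s.Email
FROM stg.Customer AS s
UNION ALL
SELECT t.CustomerId, t.Name, t.Email
FROM dbo.Customer AS t
WHERE NOT EXISTS (SELECT 1 FROM stg.Customer AS s WHERE s.CustomerId = t.CustomerId);

-- Swap the new table in and drop the old one.
RENAME OBJECT dbo.Customer TO Customer_old;
RENAME OBJECT dbo.Customer_upsert TO Customer;
DROP TABLE dbo.Customer_old;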
What should I do to ensure optimal query performance using Transact-SQL in Azure SQL Data Warehouse while combining data sets from SQL with non-relational data in Azure Blob storage? Any input would be greatly appreciated.
The best practice is to load data from Azure Blob Storage into SQL Data Warehouse instead of attempting interactive queries over that data.
The reason is that when you run a query against your data residing in Azure Blob Storage (via an external table), SQL Data Warehouse (under the covers) imports all the data from Azure Blob Storage into SQL Data Warehouse temporary tables to process the query. So even if you run a SELECT TOP 1 query on your external table, the entire dataset for that table will be imported temporarily to process the query.
As a result, if you know that you will be querying the external data frequently, it is recommended that you explicitly load the data into SQL Data Warehouse permanently using a CREATE TABLE AS SELECT command as shown in the document: https://azure.microsoft.com/en-us/documentation/articles/sql-data-warehouse-load-with-polybase/.
As a best practice, break your Azure Storage data into files of no more than 1 GB each when possible, for parallel processing with SQL Data Warehouse. More information about how to configure PolyBase in SQL Data Warehouse to load data from Azure Storage blobs is here: https://azure.microsoft.com/en-us/documentation/articles/sql-data-warehouse-load-with-polybase/
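A condensed sketch of that load path (storage location, credential, format, and table names are placeholders):

-- External data source over blob storage (PolyBase).
CREATE EXTERNAL DATA SOURCE AzureBlobSource
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://mycontainer@mystorageaccount.blob.core.windows.net',
    CREDENTIAL = BlobStorageCredential
);

-- File format for CSV files, skipping a header row.
CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',', FIRST_ROW = 2)
);

-- External table over the files in the container.
CREATE EXTERNAL TABLE ext.Sales (
    SaleId INT,
    Amount DECIMAL(18, 2)
)
WITH (LOCATION = '/sales/', DATA_SOURCE = AzureBlobSource, FILE_FORMAT = CsvFormat);

-- Load the data permanently rather than querying the external table repeatedly.
CREATE TABLE dbo.Sales
WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX)
AS SELECT * FROM ext.Sales;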
Let me know if that helps!