Azure SQL Serverless inbuilt Pool Column/Field Limitations - azure-synapse

We have created a SQL Database from our Azure SQL Serverless Pool. We have a table that has over 450 fields.
Whenever we try to extract the table with all the fields the query times out and produces the following error:
Msg 15884, Level 16, State 1, Line 2
Query timeout expired.
However, when I we try to extract just a few fields it successfully gives us all the rows.
Therefore, can someone let me know if there are any limitations on the number fields when extracting tables from Azure SQL Serverless Pool?

Msg 15884, Level 16, State 1, Line 2
Query timeout expired.
This error is because the SQL query takes long time to execute. Unfortunately, timeout settings cannot be modified in Synapse SQL serverless pool. The solution is to either optimize the query or to optimize the data stored in external storage.
Below are some points for better performance.
Try to store data in parquet format than csv or Json file. Parquet files are columnar format and size will be lesser for same data which is stored as csv or Json format.
Do not use the storage account with other workloads during query execution.
In order to query large amount of data, use Azure Data Studio or SQL Server Management Studio than azure synapse studio.
Make sure to have Synapse serverless SQL pool and Storage in the same region.
Refer Microsoft document on Best practices for serverless SQL pool - Azure Synapse Analytics .

Related

Query data lake from Azure SQL database

I'm just finding my way around Azure, trying to build a modern data warehouse. One thing I haven't been able to figure out is how to query my data lake from an Azure SQL database.
Something similar to the following works in Azure Synapse (note the long-term plan is to remove Synapse due to cost reasons):
SELECT top 100 *
FROM
OPENROWSET( BULK
'https://storageaccount.blob.core.windows.net/container/folder/2022/09/03/filename.parquet'
,SINGLE_BLOB
) AS [result]
But I get the following error running this from an Azure SQL database (in the Azure portal, using the query editor):
Failed to execute the query. Error: Cannot bulk load because the file "https://storageaccount.blob.core.windows.net/container/folder/2022/09/03/filename.parquet" could not be opened. Operating system error code 6(The handle is invalid.).
I also tried the code below after searching on the Internet:
CREATE EXTERNAL DATA SOURCE pocBlobStorage
WITH ( TYPE = BLOB_STORAGE,
LOCATION = 'https://storageaccount.blob.core.windows.net/container/folder/2022/09/03',
CREDENTIAL= sqlblob);
-- Query remote file
SELECT *
FROM OPENROWSET(BULK 'filename.parquet',
DATA_SOURCE = 'pocBlobStorage',
SINGLE_CLOB
--FORMATFILE='currency.fmt',
--FIRSTROW=2
--, FORMATFILE_DATA_SOURCE = 'pocBlobStorage'
) as D
I tried various combinations of the formatting options, but couldn't get anything to work.
The current error I'm getting is: Failed to execute query. Error: Referenced external data source "pocBlobStorage" not found.
I'm wondering if I need to do something to enable the Azure SQL database to access my data lake. For example, I haven't configured any credential called 'SQL blob' as per my last code segment, but I'm not sure where to do this (for example something similar to creating a linked service in azure data factory).
So how do I query my data lake, directly from my azure SQL database? Is the issue in my query, or do I need to configure access first, and if so how?

CETAS times out for large tables in Synapse Serverless SQL

I'm trying to create a new external table using CETAS (CREATE EXTERNAL TABLE AS SELECT * FROM <table>) statement from an already existing external table in Azure Synapse Serverless SQL Pool. The table I'm selecting from is a very large external table built on around 30 GB of data in parquet format stored in ADLS Gen 2 storage but the query always times out after about 30 minutes. I've tried using premium storage and also tried out most if not all the suggestions made here as well but it didn't help and the query still times out.
The error I get in Synapse Studio is :-
Statement ID: {550AF4B4-0F2F-474C-A502-6D29BAC1C558} | Query hash: 0x2FA8C2EFADC713D | Distributed request ID: {CC78C7FD-ED10-4CEF-ABB6-56A3D4212A5E}. Total size of data scanned is 0 megabytes, total size of data moved is 0 megabytes, total size of data written is 0 megabytes. Query timeout expired.
The core use case is that assuming I only have the external table name, I want to create a copy of the data over which that external table is created in Azure storage itself.
Is there a way to resolve this timeout issue or a better way to solve the problem?
This is a limitation of Serverless.
Query timeout expired
The error Query timeout expired is returned if the query executed more
than 30 minutes on serverless SQL pool. This is a limit of serverless
SQL pool that cannot be changed. Try to optimize your query by
applying best practices, or try to materialize parts of your queries
using CETAS. Check is there a concurrent workload running on the
serverless pool because the other queries might take the resources. In
that case you might split the workload on multiple workspaces.
Self-help for serverless SQL pool - Query Timeout Expired
The core use case is that assuming I only have the external table name, I want to create a copy of the data over which that external table is created in Azure storage itself.
It's simple to do in a Data Factory copy job, a Spark job, or AzCopy.

OutputDataConversionError.TypeConversionError writing to Azure SQL DB using Stream Analytics from IoT Hub

I have wired up a Stream Analytics job to take data from an IoT Hub and write it to Azure SQL Database.
I am running into an issue with one input field which is a date/time object '2019-07-29T01:29:27.6246594Z' which always seems to result in an OutputDataConversionError.TypeConversionError -
[11:59:20 AM] Source 'eventssqldb' had 1 occurrences of kind 'OutputDataConversionError.TypeConversionError' between processing times '2019-07-29T01:59:20.7382451Z' and '2019-07-29T01:59:20.7382451Z'.
Input data sample (sourceeventtime is the problem - other datetime fields also fail).
{
"eventtype":"gamedata",
"scoretier":4,
"aistate":"on",
"sourceeventtime":"2019-07-28T23:59:24.6826565Z",
"EventProcessedUtcTime":"2019-07-29T00:13:03.4006256Z",
"PartitionId":1,
"EventEnqueuedUtcTime":"2019-07-28T23:59:25.7940000Z",
"IoTHub":{"MessageId":null,"CorrelationId":null,"ConnectionDeviceId":"testdevice","ConnectionDeviceGenerationId":"636996260331615896","EnqueuedTime":"2019-07-28T23:59:25.7670000Z","StreamId":null}
}
The target field in Azure SQL DB is datetime2 and the incoming value can be converted successfully by Azure SQL DB using a query on the same server.
I've tried a bunch of different techniques including CAST on Stream Analytics, and changing the compatibility level of the Stream Analytics job all to no avail.
Testing the query using a dump of the data in Stream Analytics results in no errors either.
I have the same data writing to Table Storage fine, but need to change to Azure SQL DB to enable shorter automated Power BI refresh cycles.
I have tried multiple Stream Analytics jobs and can recreate each time with Azure SQL DB.
Turns out that this appears to have been a cached error message being displayed in the Azure Portal.
On further investigation through reviewing detailed logs it appears another value that was too long for the target SQL DB field (i.e. would have been truncated) was the actual source of the failure. Resolving this removed the error.

Azure SQL External table of Azure Table storage data

Is it possible to create an external table in Azure SQL of the data residing in Azure Table storage?
Answer is no.
I am currently facing similiar issue and this is my research so far:
Azure SQL Database doesn't allow Azure Table Storage as a external data source.
Sources:
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-file-format-transact-sql?view=sql-server-2017
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql?view=sql-server-2017
Reason:
The possible data source scenarios are to copy from Hadoop (DataLake/Hive,..), Blob (Text files,csv) or RDBMS (another sql server). The Azure Table Storage is not listed.
The possible external data formats are only variations of text files/hadoop: Delimited Text, Hive RCFile, Hive ORC,Parquet.
Note - even copying from blob in JSON format requires implementing custom data format.
Workaround:
Create a copy pipeline with Azure Data Factory.
Create a copy
function/script with Azure Functions using C# and manually transfer
the data
Yes, there are a couple options. Please see the following:
CREATE EXTERNAL TABLE (Transact-SQL)
APPLIES TO: SQL Server (starting with 2016) Azure SQL Database Azure SQL Data Warehouse Parallel Data Warehouse
Creates an external table for PolyBase, or Elastic Database queries. Depending on the scenario, the syntax differs significantly. An external table created for PolyBase cannot be used for Elastic Database queries. Similarly, an external table created for Elastic Database queries cannot be used for PolyBase, etc.
CREATE EXTERNAL DATA SOURCE (Transact-SQL)
APPLIES TO: SQL Server (starting with 2016) Azure SQL Database Azure SQL Data Warehouse Parallel Data Warehouse
Creates an external data source for PolyBase, or Elastic Database queries. Depending on the scenario, the syntax differs significantly. An external data source created for PolyBase cannot be used for Elastic Database queries. Similarly, an external data source created for Elastic Database queries cannot be used for PolyBase, etc.
What is your use case?

How to ensure faster response time using transact-SQL in Azure SQL DW when I combine data from SQL and non-relational data in Azure blob storage?

What should I do to ensure optimal query performance using transact-SQL in Azure SQL Data Warehouse while combining data sets from SQL and non-relational data in Azure Blob storage? Any inputs would be greatly appreciated.
The best practice is to load data from Azure Blob Storage into SQL Data Warehouse instead of attempting interactive queries over that data.
The reason is that when you run a query against your data residing in Azure Blob Storage (via an external table), SQL Data Warehouse (under-the-covers) imports all the data from Azure Blob Storage into SQL Data Warehouse temporary tables to process the query. So even if you a run SELECT TOP 1 query on your external table, the entire dataset for that table will be imported temporarily to process the query.
As a result, if you know that you will querying the external data frequently, it is recommended that you explicitly load the data into SQL Data Warehouse permanently using a CREATE TABLE AS SELECT command as shown in the document: https://azure.microsoft.com/en-us/documentation/articles/sql-data-warehouse-load-with-polybase/.
As a best practice, break your Azure Storage data into no more than 1GB files when possible for parallel processing with SQL Data Warehouse. More information about how to configure Polybase in SQL Data Warehouse to load data from Azure Storage Blob is here: https://azure.microsoft.com/en-us/documentation/articles/sql-data-warehouse-load-with-polybase/
Let me know if that helps!