Query data lake from Azure SQL database

I'm just finding my way around Azure, trying to build a modern data warehouse. One thing I haven't been able to figure out is how to query my data lake from an Azure SQL database.
Something similar to the following works in Azure Synapse (note that the long-term plan is to remove Synapse for cost reasons):
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://storageaccount.blob.core.windows.net/container/folder/2022/09/03/filename.parquet',
    SINGLE_BLOB
) AS [result]
But I get the following error running this from an Azure SQL database (in the Azure portal, using the query editor):
Failed to execute the query. Error: Cannot bulk load because the file "https://storageaccount.blob.core.windows.net/container/folder/2022/09/03/filename.parquet" could not be opened. Operating system error code 6(The handle is invalid.).
I also tried the code below after searching on the Internet:
CREATE EXTERNAL DATA SOURCE pocBlobStorage
WITH (
    TYPE = BLOB_STORAGE,
    LOCATION = 'https://storageaccount.blob.core.windows.net/container/folder/2022/09/03',
    CREDENTIAL = sqlblob
);

-- Query remote file
SELECT *
FROM OPENROWSET(
    BULK 'filename.parquet',
    DATA_SOURCE = 'pocBlobStorage',
    SINGLE_CLOB
    --FORMATFILE = 'currency.fmt',
    --FIRSTROW = 2
    --, FORMATFILE_DATA_SOURCE = 'pocBlobStorage'
) AS D
I tried various combinations of the formatting options, but couldn't get anything to work.
The current error I'm getting is: Failed to execute query. Error: Referenced external data source "pocBlobStorage" not found.
I'm wondering if I need to do something to enable the Azure SQL database to access my data lake. For example, I haven't created the credential called 'sqlblob' that my last code segment references, and I'm not sure where to do this (something similar to creating a linked service in Azure Data Factory, for example).
So how do I query my data lake directly from my Azure SQL database? Is the issue in my query, or do I need to configure access first, and if so, how?
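For reference, the "Referenced external data source not found" error suggests the data source was never created in the current database. A minimal sketch of the setup the second snippet assumes (the password and SAS token are placeholders; generate the SAS for the container in the Azure portal and drop the leading '?'):
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

-- The credential name must match the one referenced by the data source.
CREATE DATABASE SCOPED CREDENTIAL sqlblob
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = '<SAS token>';

-- LOCATION points at the container; the folder path goes in the BULK argument.
CREATE EXTERNAL DATA SOURCE pocBlobStorage
WITH (
    TYPE = BLOB_STORAGE,
    LOCATION = 'https://storageaccount.blob.core.windows.net/container',
    CREDENTIAL = sqlblob
);
Note, though, that OPENROWSET in Azure SQL Database does not support the Parquet format the way Synapse serverless does, so even with access configured this only works for reads such as delimited text or whole-file SINGLE_BLOB.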

Related

Azure SQL Serverless inbuilt Pool Column/Field Limitations

We have created a SQL Database from our Azure SQL Serverless Pool. We have a table that has over 450 fields.
Whenever we try to extract the table with all the fields the query times out and produces the following error:
Msg 15884, Level 16, State 1, Line 2
Query timeout expired.
However, when we try to extract just a few fields, it successfully gives us all the rows.
Therefore, can someone let me know if there are any limitations on the number of fields when extracting tables from Azure SQL Serverless Pool?
Msg 15884, Level 16, State 1, Line 2
Query timeout expired.
This error occurs because the SQL query takes a long time to execute. Unfortunately, timeout settings cannot be modified in Synapse SQL serverless pool. The solution is to either optimize the query or optimize the data stored in external storage.
Below are some points for better performance:
Store data in Parquet format rather than CSV or JSON (see the sketch below). Parquet is a columnar format, and the same data takes less space than it does as CSV or JSON.
Do not use the storage account for other workloads during query execution.
To query large amounts of data, use Azure Data Studio or SQL Server Management Studio rather than Azure Synapse Studio.
Make sure the Synapse serverless SQL pool and the storage account are in the same region.
Refer to the Microsoft documentation Best practices for serverless SQL pool - Azure Synapse Analytics.
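To illustrate the Parquet point concretely, serverless SQL can read just the fields a query needs, so the remaining columns of a wide table are never fetched from storage (a minimal sketch; the storage URL and column definitions are placeholders):
-- Project only two of the 450+ columns via an explicit WITH schema.
SELECT *
FROM OPENROWSET(
    BULK 'https://storageaccount.dfs.core.windows.net/container/folder/*.parquet',
    FORMAT = 'PARQUET'
) WITH (
    customer_id INT,
    order_total DECIMAL(18, 2)
) AS rows;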

OutputDataConversionError.TypeConversionError writing to Azure SQL DB using Stream Analytics from IoT Hub

I have wired up a Stream Analytics job to take data from an IoT Hub and write it to Azure SQL Database.
I am running into an issue with one input field, a date/time value like '2019-07-29T01:29:27.6246594Z', which always seems to result in an OutputDataConversionError.TypeConversionError:
[11:59:20 AM] Source 'eventssqldb' had 1 occurrences of kind 'OutputDataConversionError.TypeConversionError' between processing times '2019-07-29T01:59:20.7382451Z' and '2019-07-29T01:59:20.7382451Z'.
Input data sample (sourceeventtime is the problem; other datetime fields also fail):
{
    "eventtype": "gamedata",
    "scoretier": 4,
    "aistate": "on",
    "sourceeventtime": "2019-07-28T23:59:24.6826565Z",
    "EventProcessedUtcTime": "2019-07-29T00:13:03.4006256Z",
    "PartitionId": 1,
    "EventEnqueuedUtcTime": "2019-07-28T23:59:25.7940000Z",
    "IoTHub": {
        "MessageId": null,
        "CorrelationId": null,
        "ConnectionDeviceId": "testdevice",
        "ConnectionDeviceGenerationId": "636996260331615896",
        "EnqueuedTime": "2019-07-28T23:59:25.7670000Z",
        "StreamId": null
    }
}
The target field in Azure SQL DB is datetime2 and the incoming value can be converted successfully by Azure SQL DB using a query on the same server.
I've tried a bunch of different techniques, including CAST in Stream Analytics and changing the compatibility level of the Stream Analytics job, all to no avail.
Testing the query using a dump of the data in Stream Analytics results in no errors either.
I have the same data writing to Table Storage fine, but need to change to Azure SQL DB to enable shorter automated Power BI refresh cycles.
I have tried multiple Stream Analytics jobs and can recreate each time with Azure SQL DB.
It turns out that this appears to have been a cached error message displayed in the Azure portal.
On further investigation through the detailed logs, another value that was too long for the target SQL DB field (i.e. it would have been truncated) was the actual source of the failure. Resolving this removed the error.
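If truncation is the culprit, one workaround is to shorten the offending field in the Stream Analytics query itself. A sketch, assuming the input alias is [iothub-input] and the target column holds at most 50 characters (both assumptions):
SELECT
    CAST(sourceeventtime AS datetime) AS sourceeventtime, -- ISO 8601 strings cast cleanly to datetime
    SUBSTRING(eventtype, 1, 50) AS eventtype,             -- trim to the target column's length
    scoretier,
    aistate
INTO
    [eventssqldb]
FROM
    [iothub-input]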

Azure Machine Learning Write output to Azure SQL Database

I am using Azure Machine Learning to cluster data.
The input data is from an Azure SQL Database, and it works fine.
At the end of everything I want to write the output to a table in the same Azure SQL Database, but I get this error:
Error: Error 1000: AFx Library library exception:
Sql encountered an error: Login failed for user
Does anyone have any idea?
Thank you very much!
Please follow the instructions and examine the examples provided here to properly use the Export Data module to save the output of ML to an Azure SQL Database.
How to Export Data to an Azure SQL Database
Add the Export Data module to your experiment. You can find this module in the Data Input and Output group in the experiment items list in Azure Machine Learning Studio.
Connect it to the module that produces the data that you want to export to Azure SQL DB.
For Data destination, select Azure SQL Database. This option supports Azure SQL Data Warehouse as well.
Set the following options specific to Azure SQL Database or Azure SQL Data Warehouse.
Database server name
Type the server name that is generated by Azure. Typically it has the form <generated_identifier>.database.windows.net.
Database name
Type the name of a database on the server you just specified. The database must already exist; Export Data cannot create it.
Server user account name
Type the user name of an account that has access permissions for the database.
Server user account password
Provide the password for the specified user account.
Comma-separated list of columns to be saved
Type the names of the columns in the experiment that you want to write to the database.
Data table name
Type the name of the table where data will be stored.
For Azure SQL Database, if the table does not exist, it will be created. For Azure SQL Data Warehouse, the table must already exist and have the correct schema, so be sure to create it in advance (see the sketch after these steps).
Comma-separated list of datatable columns
Type the names of the columns as you wish them to appear in the destination table. The columns should correspond in order with the column names that you list in Comma-separated list of columns to be saved.
If you are writing to Azure SQL Data Warehouse, the column names must match those already in the destination table schema.
Number of rows written per SQL Azure operation
Indicate how many rows should be written to the destination table in each batch. By default, the value is set to 50, which is the default batch size for Azure SQL Database. However, you should increase this value if you have a large number of rows to write.
TIP:
For Azure SQL Data Warehouse, we recommend that you set this value to 1. If you use a larger batch size, the size of the command string that is sent to Azure SQL Data Warehouse can exceed the allowed string length, causing an error.
If you don't want to write new results each time you run the experiment, select the Use cached results option. If there are no other changes to module parameters, the experiment will write the data the first time the module is run, and thereafter not perform writes.
However, a write will always be performed if any parameters have been changed in Export Data that would change the results.
Run the experiment.
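For the Azure SQL Data Warehouse case in particular, the destination table must be created ahead of time with a schema matching the exported columns, along these lines (a sketch; the table and column names are hypothetical):
-- Hypothetical destination table; the columns must line up with the
-- "Comma-separated list of datatable columns" setting in Export Data.
CREATE TABLE dbo.ClusterResults (
    RecordId INT NOT NULL,
    ClusterAssignment INT NOT NULL
);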
Found the issue!
I needed to create a specific user with this SQL code:
CREATE USER AMLApplicationUser WITH PASSWORD = '************';
and then add the user to these roles on the database I want to write to:
ALTER ROLE db_datareader ADD MEMBER AMLApplicationUser;
ALTER ROLE db_datawriter ADD MEMBER AMLApplicationUser;
I guess the db_datawriter role alone would be enough, but I needed db_datareader too.
So, in conclusion, it seems that the database admin role can be used to read data, but not to write data from AML.
Thank you for your help!

Can't CREATE EXTERNAL DATA SOURCE in SQL

I'm trying to create an external data source to access Azure Blob Storage. However, I'm having issues with creating the actual data source.
I've followed the instructions located here:
Examples of bulk access to data in Azure Blob Storage and
Create external data source - Transact-SQL. I'm using SQL Server 2016 on a VM, accessed via SSMS from a client machine using Windows Authentication, with no issues. The instructions say creating this external data source works for SQL Server 2016 and Azure Blob Storage.
I have created the Master Key:
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<password>';
and the database scoped credential:
CREATE DATABASE SCOPED CREDENTIAL UploadCountries
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = '<key>';
I have verified both of these exist in the database by querying sys.symmetric_keys and sys.database_scoped_credentials.
However, when I try executing the following code, it says Incorrect syntax near 'EXTERNAL':
CREATE EXTERNAL DATA SOURCE BlobCountries
WITH (
    TYPE = BLOB_STORAGE,
    LOCATION = 'https://<somewhere>.table.core.windows.net/<somewhere>',
    CREDENTIAL = UploadCountries
);
Your thoughts and help are appreciated!
Steve.
In “Examples of Bulk Access to Data in Azure Blob Storage”, we can find:
Bulk access to Azure blob storage from SQL Server, requires at least SQL Server 2017 CTP 1.1.
And in Arguments section of “CREATE EXTERNAL DATA SOURCE (Transact-SQL)”, we can find similar information:
Use BLOB_STORAGE when performing bulk operations using BULK INSERT or OPENROWSET with SQL Server 2017
You are using SQL Server 2016, so you get the Incorrect syntax near 'EXTERNAL' error when you create an external data source for Azure Blob Storage.
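On SQL Server 2017 or later the same statement parses. Note also that BLOB_STORAGE expects the blob endpoint rather than the table endpoint used in the question; a sketch with the question's placeholder names:
CREATE EXTERNAL DATA SOURCE BlobCountries
WITH (
    TYPE = BLOB_STORAGE,
    LOCATION = 'https://<somewhere>.blob.core.windows.net/<somewhere>',
    CREDENTIAL = UploadCountries
);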

Unable to create EXTERNAL TABLE to Azure SQL Server on Azure SQL Data Warehouse

I am unable to create an EXTERNAL TABLE on Azure SQL Data Warehouse pointing to Azure SQL Server.
CREATE EXTERNAL DATA SOURCE EX_SOURCE
WITH (
    TYPE = RDBMS,
    LOCATION = 'SERVER.database.windows.net',
    DATABASE_NAME = 'DB_NAME',
    CREDENTIAL = "CREDENTIAL"
);
Responds with:
Incorrect syntax near 'RDBMS'
Does anyone know a workaround for this?
I have also tried to do it the other way around. It allows me to create an EXTERNAL DATA SOURCE pointing to Azure SQL Data Warehouse from Azure SQL Server and to create an EXTERNAL TABLE, but when I try to query it I get:
Error retrieving data from one or more shards. The underlying error message received was: 'Parse error at line: 1, column: 36: Incorrect syntax near '='.'.
Azure SQL Data Warehouse, today, only supports creating an external data source to Azure Blob Storage and Hadoop ('hdfs://') targets. The current external query syntax for Azure SQL Database does not support Azure SQL Data Warehouse.
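For completeness, the data source type Azure SQL Data Warehouse does accept looks like this (a sketch; the container, storage account, and credential names are placeholders):
-- PolyBase-style external data source over Azure Blob Storage;
-- TYPE = HADOOP is the supported option on Azure SQL Data Warehouse.
CREATE EXTERNAL DATA SOURCE AzureBlobStore
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://container@storageaccount.blob.core.windows.net',
    CREDENTIAL = AzureStorageCredential
);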