Migrating data from CSV file on Azure Blob Storage to Synapse Analytics (serverless pool) - azure-synapse

I have a problem running a pipeline (migrating a CSV file stored on Azure Blob Storage to Synapse Analytics) in Azure Data Factory.
It worked fine with a dedicated pool, but I can't get it to work with the built-in serverless pool.
I created a run-once pipeline on adf.azure.com with the UI wizard.
On the "Source data store" tab I chose Source type: Azure Blob Storage, selected the appropriate connection, browsed to the desired file, left the "Recursively" option on, and pressed Next.
The next tab is "File format settings"; there I opened Advanced options and changed the escape character from backslash to double quote.
I pressed Next to reach the "Destination data store" tab, chose target type Azure Synapse Analytics, selected the connection, and specified the target table name.
The next tab has the column mapping, where I unchecked type conversion; on the tab after that (Copy Data tool settings) I selected PolyBase as the copy method.
I enabled the staging blob, selected the linked Azure Storage service and blob container, then pressed Next and finished creating the pipeline.
The error message that I received:
Operation on target Copy_ky9 failed:
ErrorCode=SqlOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=A database operation failed with the following error: 'Incorrect syntax near 'HEAP'.',Source=,''Type=System.Data.SqlClient.SqlException,Message=Incorrect syntax near 'HEAP'.,Source=.Net SqlClient Data Provider,SqlErrorNumber=102,Class=15,ErrorCode=-2146232060,State=1,Errors=[{Class=15,Number=102,State=1,Message=Incorrect syntax near 'HEAP'.,},],'
Since I used the UI to create the pipeline, I don't know how to check the syntax. I assume it internally generates some command, but I couldn't find an option to preview it and fix its syntax.

A dedicated SQL pool in Azure Synapse Analytics has built-in storage, so you can load data into a table in a dedicated SQL pool. A serverless SQL pool has no storage of its own; it is just a metadata layer for views over files in storage, and it can read and write files in Azure storage.
I would stop having ADF load it and just build a view in Synapse Serverless SQL.
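As a rough sketch of that approach (the storage account, container, and file names below are placeholders, not taken from your setup), a view over the CSV in the built-in serverless pool could look like this:
-- Serverless SQL pool: a view over the CSV file sitting in Blob Storage (placeholder paths).
CREATE VIEW dbo.MyCsvData AS
SELECT *
FROM OPENROWSET(
    BULK 'https://<storageaccount>.blob.core.windows.net/<container>/<folder>/file.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    HEADER_ROW = TRUE,
    FIELDQUOTE = '"'
) AS rows;
You can then query the view directly from the serverless pool; if you need to materialize results back out to storage, CETAS (CREATE EXTERNAL TABLE AS SELECT) is the serverless-side option.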

Unable to read from Azure Synapse using Databricks due to a master key error

I am trying to read a DataFrame from an Azure Synapse DWH pool following the tutorial at https://docs.databricks.com/data/data-sources/azure/synapse-analytics.html
I have set the storage account access key "fs.azure.account.key..blob.core.windows.net" and also specified the temp directory for ADLS in abfss format.
The read operation is of the syntax:
x = (spark.read
     .format('com.databricks.spark.sqldw')
     .option('url', sqlDwUrl)
     .option('tempDir', tempdir)
     .option('forwardSparkAzureStorageCredentials', 'true')
     .option('query', "SELECT TOP(1) * FROM " + targetSchema + '.' + targetTable)
     .load())
The above executes fine.
When I then try to display the DataFrame using display(x), I run into the following error:
SqlDWSideException: Azure Synapse Analytics failed to execute the JDBC query produced by the connector.
Underlying SQLException(s):
- com.microsoft.sqlserver.jdbc.SQLServerException: Please create a master key in the database or open the master key in the session before performing this operation. [ErrorCode = 15581] [SQLState = S0006]
From the documentation, I understand that a database master key is required, and that has been duly created. Therefore, I am not sure why this error is thrown.
Surprisingly, write operations to Synapse using the same connector (df.write.format("com.databricks.spark.sqldw")...) work like a charm.
I did some research, and based on that I believe the database master key (which was created by a DBA) is valid for both read and write operations.
Is there any way by which a database master key would restrict read operations, but not write?
If not, then why could the above issue be occurring?
You have to create a MASTER KEY first after creating a SQL pool from the Azure portal. You can do this by connecting through SSMS and running a T-SQL command. If you then try to read from a table in this pool, you will see no error in Databricks.
Going through the docs, under Required Azure Synapse permissions for PolyBase:
As a prerequisite for the first command, the connector expects that a database master key already exists for the specified Azure Synapse instance. If not, you can create a key using the CREATE MASTER KEY command.
Next..
Is there any way by which a database master key would restrict read operations, but not write? If not, then why could the above issue be occurring?
If you notice, while writing to SQL you have configured a temp directory in the storage account. The Azure Synapse connector automatically discovers the account access key and forwards it to the connected Azure Synapse instance by creating a temporary Azure database scoped credential:
Creates a database credential. A database credential is not mapped to a server login or database user. The credential is used by the database to access the external location anytime the database is performing an operation that requires access.
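For context, the kind of credential involved looks roughly like the sketch below (the name and secret are placeholders; the connector generates its own temporary credential). The relevant point is that creating a database scoped credential requires an existing database master key to encrypt the secret, which is exactly what is missing here:
-- Hypothetical illustration only; this statement fails unless a database master key exists.
CREATE DATABASE SCOPED CREDENTIAL [TempStorageCredential]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<sas-token-placeholder>';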
And from the OPEN MASTER KEY docs (Open the database master key of the current database):
If the database master key was encrypted with the service master key, it will be automatically opened when it is needed for decryption or encryption. In this case, it is not necessary to use the OPEN MASTER KEY statement.
When a database is first attached or restored to a new instance of SQL Server, a copy of the database master key (encrypted by the service master key) is not yet stored in the server. You must use the OPEN MASTER KEY statement to decrypt the database master key (DMK). Once the DMK has been decrypted, you have the option of enabling automatic decryption in the future by using the ALTER MASTER KEY REGENERATE statement to provision the server with a copy of the DMK, encrypted with the service master key (SMK).
But... note this from the docs:
For SQL Database and Azure Synapse Analytics, the password protection is not considered to be a safety mechanism to prevent a data loss scenario in situations where the database may be moved from one server to another, as the Service Master Key protection on the Master Key is managed by the Microsoft Azure platform. Therefore, the Master Key password is optional in SQL Database and Azure Synapse Analytics.
As you can read above, I tried to repro this, and yes: after you first create a SQL pool from the Synapse portal, you can write to a table from Databricks directly, but when you try to read the same table you get the exception.
Spark writes the data to the common blob storage as Parquet files, and Synapse then uses a COPY statement to load them into the given table. When reading from a Synapse dedicated SQL pool table, Synapse writes the data from the dedicated SQL pool to the common blob storage as snappy-compressed Parquet files, which Spark then reads and displays to you.
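For illustration only, the COPY load mentioned above looks roughly like this on the dedicated pool side (a sketch with placeholder paths and credentials; the connector builds its own statement and temporary credential):
-- Sketch of loading staged Parquet files into a dedicated pool table (placeholders throughout).
COPY INTO dbo.TargetTable
FROM 'https://<storageaccount>.blob.core.windows.net/<tempdir>/<run-id>/*.parquet'
WITH (
    FILE_TYPE = 'PARQUET',
    CREDENTIAL = (IDENTITY = 'Storage Account Key', SECRET = '<storage-account-access-key>')
);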
We are just setting the blob storage account key and secret in the session config. With forwardSparkAzureStorageCredentials = true, the Synapse connector forwards the storage access key to the Azure Synapse dedicated pool by creating an Azure database scoped credential.
Note: you can .load() into a DataFrame without an exception, but when you use display(dataframe) the exception pops up, because Spark only executes the read at that point.
Now, assuming a MASTER KEY exists, connect to your SQL pool database and try the following:
Examples: Azure Synapse Analytics
OPEN MASTER KEY DECRYPTION BY PASSWORD = 'Your-DB-PASS';
GO
CLOSE MASTER KEY;
GO
If you get this error:
Please create a master key in the database or open the master key in the session before performing this operation.
Just CREATE MASTER KEY or use ALTER MASTER KEY:
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'ljlLKJjs$2#l23je'
OR
ALTER MASTER KEY REGENERATE WITH ENCRYPTION BY PASSWORD = 'ljlLKJjs$2#l23je';
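As a quick sanity check (assuming you can connect to the pool database, e.g. via SSMS), you can confirm whether a database master key already exists:
-- The database master key, if present, shows up under this fixed name.
SELECT name, create_date
FROM sys.symmetric_keys
WHERE name = '##MS_DatabaseMasterKey##';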

How to get the Azure SQL transaction log

How do I get the transaction logs for an Azure SQL DB? I'm trying to find the log in the Azure portal but not having any luck.
If there is no way to get the log, where do the Microsoft docs say so? Any help is appreciated.
You don't, as it is not exposed in the service. Please step back and describe the problem you'd like to solve. If you want a DR solution, for example, then active geo-replication can solve this for you as part of the service offering.
The log format in Azure SQL DB is constantly changing and is "ahead" of the most recent version of SQL Server. So, it is probably not useful to expose the log (the format is not documented). Your use case will likely determine the alternative question you can ask instead.
Azure SQL Database auditing tracks database events and writes them to an audit log in your Azure storage account, or sends them to Event Hub or Log Analytics for downstream processing and analysis.
Blob audit
Audit logs stored in Azure Blob storage are kept in a container named sqldbauditlogs in the Azure storage account. The directory hierarchy within the container is of the form <ServerName>/<DatabaseName>/<AuditName>/<Date>/. The blob file name format is <CreationTime>_<FileNumberInSession>.xel, where CreationTime is in UTC hh_mm_ss_ms format and FileNumberInSession is a running index in case the session's logs span multiple blob files.
For example, for database Database1 on Server1 the following is a possible valid path:
Server1/Database1/SqlDbAuditing_ServerAudit_NoRetention/2019-02-03/12_23_30_794_0.xel
Read-only replica audit logs are stored in the same container. The directory hierarchy within the container is of the form <ServerName>/<DatabaseName>/<AuditName>/<Date>/RO/, and the blob file name shares the same format.
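If you want to query those .xel files with T-SQL instead of downloading them, a minimal sketch using sys.fn_get_audit_file (with a placeholder storage account and the example path above) would be:
-- Reads audit records directly from the .xel blobs; the URL below is a placeholder.
SELECT event_time, action_id, succeeded, statement, database_name
FROM sys.fn_get_audit_file(
    'https://<storageaccount>.blob.core.windows.net/sqldbauditlogs/Server1/Database1/SqlDbAuditing_ServerAudit_NoRetention/2019-02-03/',
    DEFAULT, DEFAULT);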

Azure Synapse Analytics Error when using saveAsTable from a DataFrame which is loaded from a SQL source

I'm following the guide (https://learn.microsoft.com/en-us/azure/synapse-analytics/get-started) for loading data from a SQL pool and writing the DataFrame to a table in the metastore. However, I'm getting an error:
Error : org.apache.hadoop.fs.azurebfs.contracts.exceptions.AbfsRestOperationException: Operation failed: "This request is not authorized to perform this operation using this permission.", 403, PUT, https://xxx.dfs.core.windows.net/tempdata/synapse/workspaces/xxx/sparkpools/SparkPool/sparkpoolinstances/8f3ec14a-1e59-4597-8fd9-42da0db65331?action=setAccessControl&timeout=90, AuthorizationPermissionMismatch, "This request is not authorized to perform this operation using this permission. RequestId:fe61799c-e01f-0003-119e-37fdb1000000 Time:2020-05-31T22:57:55.8271281Z"
I've replaced my resource names with xxx.
Other DataFrame saveAsTable operations work fine. From what I can see, the data is being read from the SQL pool successfully and staged: when I browse the data lake location specified in the error, I can see the data.
/tempdata/synapse/workspaces/xxx/sparkpools/SparkPool/sparkpoolinstances/8f3ec14a-1e59-4597-8fd9-42da0db65331
The Synapse workspace managed identity has Storage Blob Data Contributor permissions, and my own domain account has Owner access.
Has anyone else had issues?
Thanks
Andy
Please assign yourself (the account with which you're trying to run the script) the Storage Blob Data Contributor role on the storage account.
This guidance now shows up during the creation of an Azure Synapse workspace.
It was a big struggle to figure this out during its private preview.
More information related to securing a Synapse workspace can be found in the official docs.
Let me know if this worked.
Thank you.

Data Factory New Linked Service connection failure ACL and firewall rule

I'm trying to move data from a data lake stored in Azure Data Lake Storage Gen1 to a table in an Azure SQL database. In Data Factory, when I test the connection for a new linked service, I get a "connection failed" error: "Access denied...make sure ACL and firewall rule is correctly configured in the Azure Data Lake Store account." I tried numerous times to fix this using related Stack Overflow comments and a plethora of fragmented Azure documentation, to no avail. Am I using the correct approach, and if so, how do I fix the issue?
Please follow these steps:
First: go to ADF, start creating the new linked service, and copy the Managed Identity Object ID.
Second: go to Azure Data Lake Storage Gen1, navigate to Data Explorer -> Access -> click Select in the 'Select user or group' field.
Finally: paste your Managed Identity Object ID, grant the required permissions (Read/Write/Execute as needed), and then test your connection in ADF.

SQL Azure Hyperscale

I wanted to try SQL Azure Hyperscale; after I selected it, it seems I no longer have the option to move out of it. If I try, I get this message: "databases cannot be moved out of hyperscale tier".
The Copy option is not available and restore does not give you the capability of choosing a different configuration.
Migration to Hyperscale is currently a one-way operation. Once a database is migrated to Hyperscale, it cannot be migrated directly to a non-Hyperscale service tier. At present, the only way to migrate a database from Hyperscale to non-Hyperscale is to export/import using a BACPAC file or other data movement technologies (Bulk Copy, Azure Data Factory, Azure Databricks, SSIS, etc.).
The Copy database option is currently not available on Hyperscale, but it will be in Public Preview in the near future.