Error connecting to DataLake(ADLS Gen2) store from databricks - azure-data-lake

I am trying to connect to dataLake Gen2 storage from databricks python, unfortunately I am running into error.
Code:
dbutils.fs.ls("abfss://<fsystem name>#<storage name>.dfs.core.windows.net/<folder name>")
Error Message:
Configuration property .dfs.core.windows.net not found.
I doubt if it is something to do with my mount code? Additionally I have added Tenant ID to container "manage access" using storage explorer.
here is my mount code:
configs = {"fs.azure.account.auth.type": "OAuth",
"fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
"fs.azure.account.oauth2.client.id": "<client ID>",
"fs.azure.account.oauth2.client.secret": "secret",
"fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/directory id/oauth2/token"}
dbutils.fs.mount( source = "abfss://filesystem name#<storage name>.dfs.core.windows.net/", mount_point = /mnt/soldel", extra_configs = configs)
Mount code ran fine, with no errors. Please suggest

Note: You cannot access the Azure Data Lake Gen2 account without configuring the storage account with Databricks.
This is expected error message because you haven't configured storage account with databricks to list filesystem.
Kindly check out the error message and see the correct process of list filesystem in Databricks.
For more details, refer "Databricks - Azure Data Lake Storage Gen2".
Hope this helps.

Related

Not able to get Azure SQL Server Extended Events to work when Blob Storage is set to Enabled from selected virtual networks and IP addresses

So I have an Azure Database and want to test extended events with the database.
I was able to set up my Blob Storage container and was able to get Extended Events via Azure Database to work as long as the Blob Storage network setting Public network access is set to Enabled from all networks. If I set Enabled from selected virtual networks and IP addresses and have Microsoft network routing checked as well as Resource type set with Microsoft.Sql/servers and its value as All In current subscription, it still doesn't work.
I'm not exactly sure what I'm doing wrong and I'm not able to find any documentation on how to make it work without opening up to all networks.
The error I'm getting is:
The target, "5B2DA06D-898A-43C8-9309-39BBBE93EBBD.package0.event_file", encountered a configuration error during initialization. Object cannot be added to the event session. (null) (Microsoft SQL Server, Error: 25602)
Edit - Steps to fix the issue
#Imran: Your answer led me to get everything working. The information you gave and the link provided was enough for me to figure it out.
However, for anyone in the future I want to give better instructions.
The first step I had to do was:
All I had to do was run Set-AzSqlServer -ResourceGroupName [ResourcegroupName] b -ServerName [AzureSQLServerName] -AssignIdentity.
This assigns the SQL Server an Azure Active Directory Identity. After running the above command, you can see your new identity in Azure Active Directory under Enterprise applicationsand then where you see theApplication type == Enterprise Applicationsheader, click the headerApplication type == Enterprise Applicationsand change it toManaged Identities`and click apply. You should see your new identity.
The next step is to give your new identity the role of Storage Blob Data Contributor to your container in Blob Storage. You will need to go to your new container and click Access Control (IAM) => Role assignments => click Add => Add Role assignment => Storage Blob Data Contributor => Managed identity => Select member => click your new identity and click select and then Review + assign
The last step is to get SQL Server to use an identity when connecting to `Blob Storage.
You do that by running the command below on your Azure SQL Server database.
CREATE DATABASE SCOPED CREDENTIAL [https://<mystorageaccountname>.blob.core.windows.net/<mystorageaccountcontainername>]
WITH IDENTITY = 'Managed Identity';
GO
You can see your new credentials when running
SELECT * FROM sys.database_scoped_credentials
The last thing I want to mention is when creating Extended Events with
an Azure SQL Server using SSMS, it gives you this link. This only works if you want your Blob Storage wide open. I think this is a disservice and wish they would have instructions when you want your Blob Storage not wide open by using RBAC instead of SAS.
I tried to reproduce the same in my environment I got the result successfully like below:
To resolve this issue, check whether your account type should be
StorageV2(general purpose v2). If you have a general-purpose v1 or blob storage account, try to upgrade like below.
In storage account -> under setting, configuration -> upgrade
Check whether you have choose Allow trusted Microsoft services to access this storage account under exception and I added firewall client Ip address range and vnet like below.
Make sure Microsoft.Authorization/roleAssignments/write permission in your storage account
After enabling firewall, we lose write access to the storage account and audit logs try to Resave the audit settings from the portal is required in order for auditing to function like below.
Note: Auditing to storage behind firewalls using user managed identity authentication type is not presently supported.
When I try to connect, I got result successfully like below:
Reference:
Configure extended events in SQL Azure to the blob storage with Private Endpoint - Microsoft Community Hub by Sakshi Gupta

Content of directory on path https://xxxxxxx.dfs.core.windows.net/dataverse-xxxx-org5a2/account/Snapshot/2018-08_1656570292/*.csv' cannot be listed

When I try to query our Serverless SQL pool in Azure Synapse Analytics I get the following error:
"Content of directory on path 'https://xxxxxx.dfs.core.windows.net/dataverse-xxxxxx-org5a2bcccf/account/Snapshot/2018-08_1656570292/*.csv' cannot be listed.".
I have checked out the following link for clues as to what could be cause:
https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/resources-self-help-sql-on-demand?tabs=x80070002
It is suggested that the error is due permissions:
However, I believe I have the correct permissons,
I get this error whether I try to execute the query in SSMS or Synapse Workspace.
The error in SSMS is as follows:
Warning: Unable to resolve path https://xxxxx.dfs.core.windows.net/dataverse-xxxxx-org5a2bcccf/account/Snapshot/2018-10_1657304551/*.csv. Error number 13807, Level 16, State 1, Message "Content of directory on path 'https://xxxxxx.dfs.core.windows.net/dataverse-xxxxx-org5a2bcccf/account/Snapshot/2018-10_1657304551/*.csv' cannot be listed.".
Can someone let me know how to resolve this?
The query that I'm attempting to execute can be located here:
https://github.com/slavatrofimov/Synapse-Link-for-Dataverse-data-enrichment-in-Serverless-SQL-Pools/blob/main/SQL/Enrich%20Synapse%20Link%20for%20Dataverse%20Entities%20with%20Human-Readable%20Labels.sql
Is there a definitive way to determine if the problem is due to lack of permissions?
Update Question:
I have just realised that the issue is access the Lake on https://xxxxxx.dfs.core.windows.net/dataverse-xxxxxx-org5a2bcccf/
Therefore please take a look at my permissons on the lake and let me know if it is sufficient?
This issue occurs when the user trying to query the external table does not have the relevant permissions or if there is a firewall enabled on your storage network.
When looked at the permissions you have provided, I see Storage Blob Data reader and Storage Blob Data contributor have been given.
Ref doc: Control storage account access for serverless SQL pool in Azure Synapse Analytics
In case if your storage account is firewall protect then you will have to follow the steps described in this document to overcome the issue: Access storage that is protected with the firewall
Here are couple of relevant articles which might help you configure your storage firewall to overcome this issue:
Storage configuration for external table is not accessible while query on Serverless
Synapse Studio error while trying to read data from Storage Account using SQL On Demand

Getting error:login failed for user <token- identification principle> data factory linked service to synapse analytics

I am trying to create azure data factory linked service to synapse analytics with system-assigned managed identity but i am getting this error
Error 22300:Cannot connect to SQL Database:
#xxxxsql.azuresynapse.net', Database: xxxx, User: Check the linked
service configuration is correct, and make sure the SQL Database
firewall allows the integration runtime to access.
Login failed for user token-identified principal
I am getting this error. how solve this error?
I tried to reproduce same thing in my environment, I got same error.
To resolve above error, Please follow below steps:
Go to Azure synapse workspace -> Azure Active directory ->
Set Admin -> search for Azure data factor, make as admin and save it.
You can refer this similar SO thread.

ADLS to Azure Storage Sync Using AzCopy

Looking for some help to resolve the errors I'm facing. Let me explain the scenario. I'm trying to sync one of the ADLS Gen2 container to Azure BLOB Storage. I have AzCopy 10.4.3, I'm using Azcopy Sync to do this. I'm using the command below
azcopy sync 'https://ADLSGen2.blob.core.windows.net/testsamplefiles/SAMPLE' 'https://AzureBlobStorage.blob.core.windows.net/testsamplefiles/SAMPLE' --recursive
When I run this command I'm getting below error
REQUEST/RESPONSE (Try=1/71.0063ms, OpTime=110.9373ms) -- RESPONSE SUCCESSFULLY RECEIVED
PUT https://AzureBlobStorage.blob.core.windows.net/testsamplefiles/SAMPLE/SampleFile.parquet?blockid=ZDQ0ODlkYzItN2N2QzOWJm&comp=block&timeout=901
X-Ms-Request-Id: [378ca837-d01e-0031-4f48-34cfc2000000]
ERR: [P#0-T#0] COPYFAILED: https://ADLSGen2.blob.core.windows.net/testsamplefiles/SAMPLE/SampleFile.parquet: 404 : 404 The specified resource does not exist.. When Staging block from URL. X-Ms-Request-Id: [378ca837-d01e-0031-4f48-34cfc2000000]
Dst: https://AzureBlobStorage.blob.core.windows.net/testsamplefiles/SAMPLE/SampleFile.parquet
REQUEST/RESPONSE (Try=1/22.9854ms, OpTime=22.9854ms) -- RESPONSE SUCCESSFULLY RECEIVED
GET https://AzureBlobStorage.blob.core.windows.net/testsamplefiles/SAMPLE/SampleFile.parquet?blocklisttype=all&comp=blocklist&timeout=31
X-Ms-Request-Id: [378ca84e-d01e-0031-6148-34cfc2000000]
So far I checked and ensured below things
I logged into correct tenant while logging into AzCopy
Storage Blob Data Contributor role was granted to my AD credentials
Not sure what else I'm missing as the file exists in the source and I'm getting the same error. I tried with SAS but I received different error though. I cannot proceed with SAS due to the vendor policy so I need to ensure this is working with oAuth. Any inputs is really appreciated.
For the 404 error, you may check if there is any typo in the command and the path /testsamplefiles/SAMPLE exists on both source and destination account. Also, please note that from the tips.
Use single quotes in all command shells except for the Windows Command
Shell (cmd.exe). If you're using a Windows Command Shell (cmd.exe),
enclose path arguments with double quotes ("") instead of single
quotes ('').
From azcopy sync supported scenario:
Azure Blob <-> Azure Blob (Source must include a SAS or is publicly
accessible; either SAS or OAuth authentication can be used for
destination)
We must provide include a SAS token in the source, but I tried the below code with AD authentication.
azcopy sync "https://[account].blob.core.windows.net/[container]/[path/to/blob]?[SAS]" "https://[account].blob.core.windows.net/[container]/[path/to/blob]"
but got the same 400 error as the Github issue.
Thus, in this case, after my validation, you could use this command to sync one of the ADLS Gen2 container to Azure BLOB Storage without executing azcopy login. If you have login in, you can run azcopy logout.
azcopy sync "https://nancydl.blob.core.windows.net/container1/sample?sv=xxx" "https://nancytestdiag244.blob.core.windows.net/container1/sample?sv=xxx" --recursive --s2s-preserve-access-tier=false

Virto Commerce Catalog Export to CSV issue with blob/local storage setting on azure deployment

I have an azure deployment, however, when I am exporting to CSV from catalog, it tries to export to the following link:
I have tried to find where the setting is to change this to blob storage but I am a new with the software. If you change the localhost to the URL of the deployment, it gives a 404 error. I believe that there is somewhere there is a setting to configure local or blob storage, could you please direct us to where this is?
By the way, this is from a default azure deployment currently.
Thanks for your help.
Here is a paste of what it returns:
Catalog to csv export
detail
Start export
Start — 4:00:39 AM
End — 4:01:33 AM
Total count
5
Processed count
5
Error count
0
Download Url:http://localhost/admin/Assets/temp/Catalog-newjersey-export.csv
How to switch from local to azure blob storage you can read here Working with blob storages
To solve you problem and switch to azure blob storage you need will do next steps:
Open you azure storage account (blob) access keys and copy primary connection string
Replace in virtocommerce manager web.config exist AssetsConnectionString line to
<add name="AssetsConnectionString" connectionString="provider=AzureBlobStorage; { you storage account connection string }" />