Azure Synapse Analytics - exception running a Data Flow

Using the preview of Synapse Analytics Workspace and the related Synapse Studio, I have created a Data Flow that simply loads a Parquet file from a Data Lake Gen2 store into a table inside a SQL pool. Running the pipeline that contains only such a Data Flow, I got the error:
Livy Id=[0] Job failed during run time with state=[dead].
In Synapse Studio, looking into Monitor -> Apache Spark applications, I found the Driver-stderr log for the failed Spark application. There was a row stating:
ERROR Dataflow AppManager: name=AppManager.main, opId=AppManager fail, unexpected:java.lang.NoSuchMethodError: com.microsoft.azure.kusto.ingest.IngestionProperties.setJsonMappingName(Ljava/lang/String;)V, message=adfadf
Has any of you ever seen such an error?

Related

Content of directory on path 'https://xxxxxxx.dfs.core.windows.net/dataverse-xxxx-org5a2/account/Snapshot/2018-08_1656570292/*.csv' cannot be listed

When I try to query our Serverless SQL pool in Azure Synapse Analytics I get the following error:
"Content of directory on path 'https://xxxxxx.dfs.core.windows.net/dataverse-xxxxxx-org5a2bcccf/account/Snapshot/2018-08_1656570292/*.csv' cannot be listed.".
I have checked out the following link for clues as to what could be the cause:
https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/resources-self-help-sql-on-demand?tabs=x80070002
It is suggested that the error is due to permissions.
However, I believe I have the correct permissions.
I get this error whether I try to execute the query in SSMS or Synapse Workspace.
The error in SSMS is as follows:
Warning: Unable to resolve path https://xxxxx.dfs.core.windows.net/dataverse-xxxxx-org5a2bcccf/account/Snapshot/2018-10_1657304551/*.csv. Error number 13807, Level 16, State 1, Message "Content of directory on path 'https://xxxxxx.dfs.core.windows.net/dataverse-xxxxx-org5a2bcccf/account/Snapshot/2018-10_1657304551/*.csv' cannot be listed.".
Can someone let me know how to resolve this?
The query that I'm attempting to execute can be located here:
https://github.com/slavatrofimov/Synapse-Link-for-Dataverse-data-enrichment-in-Serverless-SQL-Pools/blob/main/SQL/Enrich%20Synapse%20Link%20for%20Dataverse%20Entities%20with%20Human-Readable%20Labels.sql
Is there a definitive way to determine if the problem is due to lack of permissions?
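For example, would a minimal query of the same shape isolate the failure? A sketch (same placeholder path, standard serverless OPENROWSET syntax):

-- Minimal serverless SQL probe against the same placeholder path.
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://xxxxxx.dfs.core.windows.net/dataverse-xxxxxx-org5a2bcccf/account/Snapshot/2018-08_1656570292/*.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0'
) AS rows;

If this fails with error 13807 while the same query against a single, explicitly named file succeeds, would that point at directory-listing permissions rather than file-read permissions?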
Update to question:
I have just realised that the issue is accessing the lake at https://xxxxxx.dfs.core.windows.net/dataverse-xxxxxx-org5a2bcccf/
Therefore, please take a look at my permissions on the lake and let me know if they are sufficient.
This issue occurs when the user trying to query the external table does not have the relevant permissions or if there is a firewall enabled on your storage network.
Looking at the permissions you have provided, I see that Storage Blob Data Reader and Storage Blob Data Contributor have been granted.
Ref doc: Control storage account access for serverless SQL pool in Azure Synapse Analytics
If your storage account is firewall-protected, you will have to follow the steps described in this document to overcome the issue: Access storage that is protected with the firewall
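For instance, once the workspace is allowed through the storage firewall, serverless SQL can reach the account via the workspace's managed identity. A minimal sketch with placeholder names (run in a user database; this is illustrative, not taken from the linked doc):

-- A master key must exist before a database scoped credential can be created.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong-password>';
-- Authenticate to storage as the Synapse workspace's managed identity.
CREATE DATABASE SCOPED CREDENTIAL WorkspaceIdentity
WITH IDENTITY = 'Managed Identity';
-- Placeholder data source pointing at the container from the question.
CREATE EXTERNAL DATA SOURCE DataverseLake
WITH (
    LOCATION = 'https://xxxxxx.dfs.core.windows.net/dataverse-xxxxxx-org5a2bcccf',
    CREDENTIAL = WorkspaceIdentity
);

Queries can then use OPENROWSET with DATA_SOURCE = 'DataverseLake' and a path relative to that location, so they authenticate with the managed identity instead of the caller's pass-through token.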
Here are a couple of relevant articles which might help you configure your storage firewall to overcome this issue:
Storage configuration for external table is not accessible while query on Serverless
Synapse Studio error while trying to read data from Storage Account using SQL On Demand

Error running a Synapse pipeline

How can I resolve the error below? I don't understand, because it worked before.
The Azure Synapse Workspace DB connector is currently in public preview.
I also tried to reproduce this in my environment and got the same error with different source types whenever the sink type was Workspace DB.
When I tried other sink types, it worked successfully.
The issue may be with the Workspace DB connector sink type. You can use a different sink type, or you can raise a support ticket with Microsoft.

Getting error: login failed for user <token-identified principal> - Data Factory linked service to Synapse Analytics

I am trying to create an Azure Data Factory linked service to Synapse Analytics with a system-assigned managed identity, but I am getting this error:
Error 22300: Cannot connect to SQL Database: 'xxxxsql.azuresynapse.net', Database: xxxx, User: . Check the linked service configuration is correct, and make sure the SQL Database firewall allows the integration runtime to access.
Login failed for user <token-identified principal>
How do I solve this error?
I tried to reproduce the same thing in my environment and got the same error.
To resolve the above error, please follow these steps:
Go to the Azure Synapse workspace -> Azure Active Directory -> Set admin -> search for the Azure Data Factory, make it the admin, and save.
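Alternatively (a sketch, not part of the steps above; the bracketed name is a placeholder for your Data Factory's name): instead of making the Data Factory the Active Directory admin, you can create a contained database user for its managed identity on the target database and grant it just the roles it needs:

-- Run against the target Synapse SQL database as an Azure AD admin.
CREATE USER [xxxx-adf] FROM EXTERNAL PROVIDER;  -- placeholder Data Factory name
EXEC sp_addrolemember 'db_owner', 'xxxx-adf';   -- or a narrower role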
You can also refer to this similar SO thread.

Azure Synapse - Unable to create external table: Error 401 Unauthorized

I have a service principal with which I am trying to create an external table over Azure Data Lake Gen1. The external table creation fails with the error:
Error occurred while accessing HDFS: Java exception raised on call to HdfsBridge_IsDirExist.
Java exception message:
HdfsBridge::isDirExist - Unexpected error encountered checking whether directory exists or not:
IOException: Server returned HTTP response code: 401
I understand that this is an unauthorized error, but I checked that this service principal has the proper role assignment on the Azure Data Lake Gen1 storage. What else could be causing the unauthorized issue here? Does my Synapse SQL instance, where I am creating the external table, also need access to ADLS Gen1?
Please note that the Synapse SQL instance and the ADLS Gen1 instance are in different resource groups.
I just checked the service principal I was using to create the database scoped credential: its secret had expired on some periodic schedule. Renewing the secret and using the updated value fixed the issue.
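For reference, a sketch of pointing the existing database scoped credential at the renewed secret; the credential name and the tenant/client values are placeholders:

-- For ADLS Gen1, the identity takes the form <client-id>@<OAuth 2.0 token endpoint>.
ALTER DATABASE SCOPED CREDENTIAL ADLSCredential
WITH IDENTITY = '<client-id>@https://login.microsoftonline.com/<tenant-id>/oauth2/token',
     SECRET = '<renewed-client-secret>';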

How to Increase Azure-hosted Integration Runtime JVM Heap Memory

We are using Azure Data Factory to upload SQL data (stored in ORC files) from Azure Data Lake into Azure Data Warehouse. We keep getting the following error on one specific table:
"errorCode": "2200",
"message": "ErrorCode=UserErrorJavaInvocationException,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error happened when invoking java, message: java.lang.OutOfMemoryError:Java heap space.,Source=Microsoft.DataTransfer.Richfile.OrcTransferPlugin,''Type=Microsoft.DataTransfer.Richfile.JniExt.JavaBridgeException,Message=,Source=Microsoft.DataTransfer.Richfile.HiveOrcBridge,'",
"failureType": "UserError",
"target": "ADLToADWCopy"
This table gave the same error when we uploaded from the source SQL DB into Azure Data Lake, and we were able to resolve that by tweaking the JVM memory allocation on the self-hosted Integration Runtime machine:
setx _JAVA_OPTIONS "-Xms256m -Xmx16g" /M
Is there a way to pass something similar to the Azure-hosted Integration Runtime?