Azure Data Lake with DVC

We are thinking of using DVC to version the input data for a data science project.
My data resides in Azure Data Lake Gen1.
How do I configure DVC to push data to Azure Data Lake using a Service Principal?
I want DVC to store the cache and data in Azure Data Lake instead of on the local disk.
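
For reference, DVC's azure remote supports service-principal credentials (tenant_id, client_id, client_secret) in its remote config. Note that it targets Blob Storage / ADLS Gen2 via azure:// URLs, so Gen1 support should be verified separately, and DVC still keeps a local cache that dvc push copies to the remote. A minimal sketch, wrapping the dvc CLI from Python, with the remote name, container, account, and credential values all as placeholders:

```python
# Sketch: configure a DVC "azure" remote with a service principal.
# Remote name "adls", container "dvc-store", account name, and the
# credential values are all placeholders.
import subprocess

def dvc(*args):
    subprocess.run(["dvc", *args], check=True)

# Default remote: where `dvc push` uploads cached data.
dvc("remote", "add", "-d", "adls", "azure://dvc-store/datasets")
dvc("remote", "modify", "adls", "account_name", "mystorageaccount")

# Service-principal credentials; --local keeps them out of version control.
dvc("remote", "modify", "--local", "adls", "tenant_id", "<tenant-id>")
dvc("remote", "modify", "--local", "adls", "client_id", "<application-id>")
dvc("remote", "modify", "--local", "adls", "client_secret", "<client-secret>")
```

After that, dvc add on the data followed by dvc push uploads the cached files to the remote.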

Related

Azure Synapse Analytics connection to MongoDB Atlas

I'm new to Azure Synapse Analytics. I'm trying to copy data from my MongoDB Atlas cluster to a data lake.
I'm trying to use a private endpoint to authorize the connection from my Azure Synapse workspace, but I get a timeout every time I test the connection from the MongoDB linked service. Any ideas on how to get my MongoDB Atlas databases to communicate with Azure Synapse Analytics without allowing all IP addresses? Thanks

Access Azure Key Vault when using Pandas to read/write Azure Data Lake Storage Gen2 data in a serverless Apache Spark pool in Synapse Analytics

Recently, Microsoft released a way for Pandas to read/write Azure Data Lake Storage Gen2 data in a serverless Apache Spark pool in Synapse Analytics, as described at the link below:
https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/tutorial-use-pandas-spark-pool
If I want to use the same strategy for pyspark in Azure Databricks, how can I use the data lake secret (from Azure Key Vault) containing the account key so that pandas can access the data lake smoothly? That way, I don't have to expose the secret value in the Databricks notebook.
For Azure Databricks, you just need to create a secret scope backed by the Azure Key Vault; you can then use the dbutils.secrets.get function to retrieve a secret from the scope or inject the secrets into the Spark configuration.
Please note that you will need to set the correct Spark configuration to use that storage account key; refer to the documentation for details (Blob Storage, ADLS Gen2).
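
A minimal sketch of that pattern for a Databricks notebook, assuming a Key Vault-backed secret scope; the scope, secret, storage account, and container names are all placeholders:

```python
# Assumes a Key Vault-backed secret scope "kv-scope" holding the ADLS Gen2
# account key under the secret name "adls-account-key" (placeholders).
account_key = dbutils.secrets.get(scope="kv-scope", key="adls-account-key")

# Make the key available to Spark for abfss:// access to ADLS Gen2.
spark.conf.set(
    "fs.azure.account.key.mystorageaccount.dfs.core.windows.net",
    account_key,
)

# Pandas can use the same secret through fsspec/adlfs (must be installed on
# the cluster), so the key never appears in plain text in the notebook.
import pandas as pd

df = pd.read_csv(
    "abfs://mycontainer/folder/data.csv",
    storage_options={"account_name": "mystorageaccount", "account_key": account_key},
)
```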

What is the best method to sync medical images between my client PCs and my Azure Blob storage through a cloud-based web application?

What is the best method to sync medical images between my client PCs and my Azure Blob storage through a cloud-based web application? I tried the MS Azure Blob SDK v18, but it is not that fast. I'm looking for something like Dropbox: fast, resumable, and with efficient parallel uploading.
Solution 1:
AzCopy is a command-line tool for copying data to or from Azure Blob Storage, Azure Files, and Azure Table Storage using simple commands. The commands are designed for optimal performance. Using AzCopy, you can copy data between a file system and a storage account, or between storage accounts, including from local (on-premises) data to a storage account.
You can also create a scheduled task or cron job that runs an AzCopy command script; the script identifies and uploads new on-premises data to cloud storage at a set interval (a sketch of such a script follows below).
For more details, refer to this document.
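
A sketch of the kind of script a scheduled task or cron job could run, assuming the AzCopy v10 binary is on PATH and that the container URL carries a SAS token; the local path, storage account, container, and SAS token are placeholders:

```python
# Upload new/changed files from a local folder to a Blob container.
import subprocess

LOCAL_DIR = r"C:\medical-images"
CONTAINER_URL = "https://mystorageaccount.blob.core.windows.net/images?<SAS-token>"

# `azcopy sync` transfers only new or changed files and parallelizes uploads.
subprocess.run(
    ["azcopy", "sync", LOCAL_DIR, CONTAINER_URL, "--recursive"],
    check=True,
)
```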
Solution 2:
Azure Data Factory is a fully managed, cloud-based, data-integration ETL service that automates the movement and transformation of data.
By using Azure Data Factory, you can create data-driven workflows to move data between on-premises and cloud data stores, and you can process and transform data with Data Flows. ADF also supports external compute engines for hand-coded transformations through services such as Azure HDInsight, Azure Databricks, and the SQL Server Integration Services (SSIS) integration runtime.
Create an Azure Data Factory pipeline to transfer files between an on-premises machine and Azure Blob Storage.
For more details, refer to this thread.

How to create a linked service from Azure Analysis Services to an Azure Synapse SQL pool

How do I pull data from a cube hosted on Azure Analysis Services and load it into the SQL pools of Synapse?
One solution is to use Azure Data Factory for data movement.
There's no built-in connector for Azure Analysis Services in Data Factory, but since Azure Analysis Services uses Azure Blob Storage to persist its storage, you can use the Azure Blob Storage connector.
In Data Factory, use a Copy Activity with Blob Storage as the source and Azure Synapse Analytics as the sink.
More on Azure Data Factory here: https://learn.microsoft.com/en-us/azure/data-factory/
Available connectors in Data Factory: https://learn.microsoft.com/en-us/azure/data-factory/connector-overview

An easy-to-use tool to copy data from Amazon S3 to Azure Blob/ADLS Gen2 via Azure Data Factory

Is there any simple tool to help me copy data from Amazon S3 to Azure Blob or Azure Data Lake Gen2?
The Azure Data Factory team recently built a Storage Explorer extension that copies data from Amazon S3 to Azure Blob or Azure Data Lake Gen2 with simple drag and drop.
Check it here:
https://github.com/Azure/Azure-DataFactory/blob/main/StorageExplorerExtension/storage-explorer-plugin.md
Demo: https://www.youtube.com/watch?reload=9&v=GacGa5T0flk