An easy-to-use tool to copy data from Amazon S3 to Azure Blob/ADLS Gen2 via Azure Data Factory

Is there any simple tool to help me copy data from Amazon S3 to Azure Blob or Azure Data Lake Gen2?

The Azure Data Factory team recently built a Storage Explorer extension that copies data from Amazon S3 to Azure Blob Storage or Azure Data Lake Storage Gen2 with a simple drag and drop.
Check it out here:
https://github.com/Azure/Azure-DataFactory/blob/main/StorageExplorerExtension/storage-explorer-plugin.md
Demo: https://www.youtube.com/watch?reload=9&v=GacGa5T0flk
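The extension is the simplest route, but if you need to script a one-off copy yourself, a minimal sketch along these lines should also work (the bucket, container, and connection-string values below are placeholders, and error handling is omitted):

```python
# Minimal sketch: copy every object in an S3 bucket into an Azure Blob container.
# Requires boto3 and azure-storage-blob; AWS credentials come from the usual
# environment/profile, and the Azure connection string, container, and bucket
# names below are placeholders.
import boto3
from azure.storage.blob import ContainerClient

s3 = boto3.client("s3")
container = ContainerClient.from_connection_string(
    conn_str="<azure-storage-connection-string>",  # placeholder
    container_name="<target-container>",           # placeholder
)

bucket = "<source-bucket>"                         # placeholder
for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        # Read the whole object into memory (a sketch; fine for modest file sizes).
        data = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        container.upload_blob(name=key, data=data, overwrite=True)
        print(f"copied {key}")
```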

Related

How to Connect Tableau to Azure Data Lake Storage Gen2

I am using the following link to guide me in connecting Tableau to ADLS Gen2: https://help.tableau.com/current/pro/desktop/en-us/examples_azure_data_lake_gen2.htm
I have got stuck at the first hurdle, where the document states:
Start Tableau and under Connect, select Azure Data Lake Storage Gen2.
For a complete list of data connections, select More under To a
Server.
I don't have that option with the version of Tableau I just downloaded.
Should I be downloading a different version of Tableau to see the option to select Azure Data Lake Storage Gen2?
You're using Tableau Public, which has limited connection options, but if you download Tableau Desktop (even on a 14-day trial) it will work.

Access Azure Key Vault in Pandas to read/write Azure Data Lake Storage Gen2 data in a serverless Apache Spark pool in Synapse Analytics

Recently, Microsoft released a way for Pandas to read/write Azure Data Lake Storage Gen2 data in serverless Apache Spark pool in Synapse Analytics as per the below link:
https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/tutorial-use-pandas-spark-pool
If I have to use the same strategy for PySpark in Azure Databricks, how can I use the data lake secret (from Azure Key Vault) containing the account key so that pandas can access the data lake smoothly? That way, I don't have to expose the secret value in the Databricks notebook.
For Azure Databricks you just need to create a secret scope backed by the Azure Key Vault, and then you can use the dbutils.secrets.get function to retrieve a secret from the secret scope or ingest the secrets into the Spark conf.
Please note that you will need to set the correct Spark configuration to use that storage account key; refer to the documentation for details (Blob Storage, ADLS Gen2).
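For example, assuming a secret scope named kv-scope backed by your Key Vault and a secret named adls-account-key holding the storage account key (all names below are placeholders), a minimal sketch in a Databricks notebook could look like this:

```python
# Minimal sketch for a Databricks notebook (dbutils and spark are predefined there).
# The scope, secret, account, and container names below are placeholders.
account_name = "<storage-account>"
account_key = dbutils.secrets.get(scope="kv-scope", key="adls-account-key")

# Option 1: hand the key to Spark so spark.read can use abfss:// paths.
spark.conf.set(
    f"fs.azure.account.key.{account_name}.dfs.core.windows.net",
    account_key,
)
spark_df = spark.read.csv(
    f"abfss://<container>@{account_name}.dfs.core.windows.net/path/to/data.csv",
    header=True,
)

# Option 2: hand the key to pandas via fsspec/adlfs (requires the adlfs package),
# so pandas reads the file directly without the secret value appearing in the notebook.
import pandas as pd

pdf = pd.read_csv(
    "abfs://<container>/path/to/data.csv",
    storage_options={"account_name": account_name, "account_key": account_key},
)
```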

What is the best method to sync medical images between my client PCs and my Azure Blob storage through a cloud-based web application?

What is the best method to sync medical images between my client PCs and my Azure Blob storage through a cloud-based web application? I tried to use the MS Azure Blob SDK v18, but it is not that fast. I'm looking for something like Dropbox: fast, resumable, and with efficient parallel uploading.
Solution 1:
AzCopy is a command-line tool for copying data to or from Azure Blob storage, Azure Files, and Azure Table storage using simple commands. The commands are designed for optimal performance. With AzCopy, you can copy data between a file system and a storage account, or between storage accounts, so it can be used to upload local (on-premises) data to a storage account.
You can also create a scheduled task or cron job that runs an AzCopy command script. The script identifies and uploads new on-premises data to cloud storage at a specific time interval.
For more details, refer to this document.
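As a rough sketch of that scheduled-script idea, you could wrap the AzCopy call in a small script and run it from cron or Task Scheduler. The source folder and destination URL (including the SAS token) below are placeholders, and azcopy must already be installed and on the PATH:

```python
# Minimal sketch: run AzCopy from Python so it can be scheduled via cron/Task Scheduler.
# The local folder and the destination URL (with its SAS token) are placeholders.
import subprocess

SOURCE_DIR = r"C:\data\medical-images"  # local folder to upload
DEST_URL = "https://<account>.blob.core.windows.net/<container>?<sas-token>"

# "azcopy sync" uploads only new or changed files, which suits a recurring job.
result = subprocess.run(
    ["azcopy", "sync", SOURCE_DIR, DEST_URL, "--recursive=true"],
    capture_output=True,
    text=True,
)
print(result.stdout)
result.check_returncode()  # raise if AzCopy reported a failure
```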
Solution 2:
Azure Data Factory is a fully managed, cloud-based data-integration ETL service that automates the movement and transformation of data.
By using Azure Data Factory, you can create data-driven workflows to move data between on-premises and cloud data stores. And you can process and transform data with Data Flows. ADF also supports external compute engines for hand-coded transformations by using compute services such as Azure HDInsight, Azure Databricks, and the SQL Server Integration Services (SSIS) integration runtime.
Create an Azure Data Factory pipeline to transfer files between an on-premises machine and Azure Blob Storage.
For more details, refer to this thread.

How to create a linked service from Azure Analysis Services to an Azure Synapse SQL pool

How can I pull data from a cube that is hosted on Azure Analysis Services and load the data into the SQL pools of Synapse?
One solution is to use Azure Data Factory for data movement.
There's no built-in connector for Azure Analysis Services in Data Factory. But since Azure Analysis Services uses Azure Blob Storage to persist its data, you can use the connector for Azure Blob Storage.
In Data Factory, use a Copy Activity with Blob Storage as source and Azure Synapse Analytics as sink.
More on Azure Data Factory here: https://learn.microsoft.com/en-us/azure/data-factory/
Available connectors in Data Factory: https://learn.microsoft.com/en-us/azure/data-factory/connector-overview
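As an illustration of that Copy Activity, here is a rough sketch using the Data Factory Python SDK (azure-mgmt-datafactory). It assumes the factory already contains a Blob dataset named BlobInputDataset and a Synapse dataset named SynapseOutputDataset with their linked services; the subscription, resource group, and factory names are placeholders:

```python
# Rough sketch: define a pipeline whose Copy Activity reads from a Blob Storage
# dataset and writes to an Azure Synapse Analytics (SQL DW) dataset.
# Dataset names, subscription, resource group, and factory name are placeholders;
# the datasets and their linked services are assumed to already exist.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    BlobSource, CopyActivity, DatasetReference, PipelineResource, SqlDWSink
)

subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
factory_name = "<data-factory-name>"

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

copy_activity = CopyActivity(
    name="CopyBlobToSynapse",
    inputs=[DatasetReference(type="DatasetReference", reference_name="BlobInputDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="SynapseOutputDataset")],
    source=BlobSource(),  # Blob Storage as source
    sink=SqlDWSink(),     # Azure Synapse Analytics (SQL DW) as sink
)

pipeline = PipelineResource(activities=[copy_activity])
adf_client.pipelines.create_or_update(
    resource_group, factory_name, "CopyBlobToSynapsePipeline", pipeline
)

# Kick off a run of the new pipeline.
adf_client.pipelines.create_run(resource_group, factory_name, "CopyBlobToSynapsePipeline")
```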

Azure DataLake with DVC

We are thinking of using DVC for versioning input data for a data science project.
My data resides in Azure Data Lake Gen1.
How do I configure DVC to push data to Azure Data Lake using a service principal?
I want DVC to store the cache and data in Azure Data Lake instead of on the local disk.