Change the access tier of blob storage files from Hot to Archive in an ADF pipeline

I would like to copy my files from one container to another using an ADF pipeline and, while copying, change the access tier from Hot to Archive.
I have to achieve this using an ADF pipeline. A way to do it without a custom activity would be great.

I don't see a direct property in any activity to achieve this, but you can try one of the methods below.
Use the Web activity (available in Azure Data Factory and Azure Synapse Analytics) to call the Blob Storage REST API:
Copy Blob, where the x-ms-access-tier header specifies the tier to be set on the target blob.
Or, after the Copy activity, Set Blob Tier, where the x-ms-access-tier header indicates the tier to be set on the blob.
Either way, you would have to use parameters to make this run dynamically for multiple files.
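As a rough illustration of the second option, the sketch below builds a Web activity definition that calls Set Blob Tier. The `?comp=tier` query string and the `x-ms-access-tier` header come from the Blob Storage REST API; everything else is an assumption made for the example: the function name, the `@{item().name}` expression (which presumes the activity sits inside a ForEach over the copied files), managed-identity authentication, the API version, and the placeholder body (the Web activity expects a body for PUT, while Set Blob Tier itself takes none).

```python
import json

# Tier names accepted by the x-ms-access-tier header for block blobs.
VALID_TIERS = {"Hot", "Cool", "Cold", "Archive"}


def set_blob_tier_web_activity(account, container,
                               blob_expr="@{item().name}", tier="Archive"):
    """Return a Web activity definition (as a dict) that calls Set Blob Tier.

    account, container and blob_expr are hypothetical placeholders; blob_expr
    is an ADF string-interpolation expression resolved at run time.
    """
    if tier not in VALID_TIERS:
        raise ValueError(f"unsupported tier: {tier}")
    url = f"https://{account}.blob.core.windows.net/{container}/{blob_expr}?comp=tier"
    return {
        "name": "SetBlobTier",
        "type": "WebActivity",
        "typeProperties": {
            "url": url,
            "method": "PUT",
            "headers": {
                "x-ms-version": "2019-12-12",  # assumed API version
                "x-ms-access-tier": tier,
            },
            "body": "{}",  # assumption: placeholder, Set Blob Tier has no body
            # Assumption: the factory's managed identity has a blob data role.
            "authentication": {
                "type": "MSI",
                "resource": "https://storage.azure.com/",
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(set_blob_tier_web_activity("myaccount", "target"), indent=2))
```

The printed JSON could be pasted into the pipeline's code view and adjusted; treat it as a starting point rather than a tested pipeline.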

Related

Load multiple files using Azure Data factory or Synapse

I am moving from SSIS to Azure.
We have hundreds of files and MSSQL tables that we want to push into a Gen2 data lake
using 3 zones, then a SQL Data Lake.
The zones are Raw, Staging and Presentation (change the names as you wish).
What is the best process to automate this as much as possible?
For example, build a table listing the files / folders / tables to bring into the Raw zone,
then have Synapse bring these objects in as either a full or an incremental load,
then process them into the next 2 zones, I guess with more custom code as we progress.
Your requirement can be accomplished using multiple activities in Azure Data Factory.
To migrate SSIS packages, you need to use the Azure-SSIS Integration Runtime (IR). ADF supports SSIS integration, which is configured by creating a new Azure-SSIS integration runtime. To create one, click Configure SSIS Integration, provide the basic details and create the new runtime.
Refer to the image below to create a new SSIS IR.
Refer to this third-party tutorial by SQLShack: Move local SSIS packages to Azure Data Factory.
Next, copy the data to the different zones using the Copy activity. You can make as many copies of your data as you require this way. Refer to Copy data between Azure data stores using Azure Data Factory.
ADF also supports Incrementally load data using Change Data Capture (CDC).
Note: Both Azure SQL MI and SQL Server support the Change Data Capture technology.
Tumbling window trigger and CDC window parameters need to be configured to make the incremental load automated. Check this official tutorial.
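To make the windowing idea concrete, here is a small sketch of how a tumbling window schedule partitions time into fixed-size, contiguous, non-overlapping intervals. It is plain Python, not ADF code; the function name and the sample dates are illustrative only. In ADF, the trigger exposes each window's bounds as `trigger().outputs.windowStartTime` and `trigger().outputs.windowEndTime`, which a pipeline would typically pass on as the CDC window parameters.

```python
from datetime import datetime, timedelta


def tumbling_windows(start, end, size):
    """Split [start, end) into back-to-back windows of length `size`.

    Only complete windows are returned; a trailing partial window is skipped.
    """
    windows = []
    cursor = start
    while cursor + size <= end:
        windows.append((cursor, cursor + size))
        cursor += size
    return windows


if __name__ == "__main__":
    for w_start, w_end in tumbling_windows(
        datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 3, 0), timedelta(hours=1)
    ):
        # Each pair would feed one incremental load covering that interval.
        print(w_start.isoformat(), "->", w_end.isoformat())
```

Because each window starts exactly where the previous one ends, every change falls into exactly one load, provided the CDC query treats one bound as inclusive and the other as exclusive.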
As for the last part:
then process them into the next 2 zones
You need to manage this programmatically, as ADF has no feature that updates the other copies of the data based on CDC. You need to either set up separate CDC for those zones or handle it in your own logic.

How to use output of Azure Data Factory Web Activity in next copy activity?

I have an ADF Web activity from which I'm getting metadata as output. I want to copy this metadata into an Azure Postgres DB. How can I use the Web activity output as the source for the next Copy activity?
According to this answer, I think we can use two Web activities to store the output of your first Web activity.
Use the expression @activity('Web1').output.Response in the second Web activity to save the output as a blob in the container. Then we can use a Copy activity to copy this blob into the Azure Postgres DB.
Since I do not have permission to assign roles, I did not test this, but I think the solution is feasible.
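A minimal sketch of what the second Web activity might look like, written as a Python dict that mirrors the pipeline JSON. It assumes the blob is written with the Put Blob REST operation (a PUT with the `x-ms-blob-type: BlockBlob` header) and that the factory's managed identity is allowed to write to the container. The function name, the URL parts, the API version and the activity names are placeholders, and `output.Response` is taken from the answer above, which was itself untested.

```python
import json


def put_blob_web_activity(account, container, blob_name, source_activity="Web1"):
    """Return a Web activity definition that saves another activity's output as a blob."""
    return {
        "name": "SaveOutputToBlob",
        "type": "WebActivity",
        # Run only after the first Web activity has succeeded.
        "dependsOn": [
            {"activity": source_activity, "dependencyConditions": ["Succeeded"]}
        ],
        "typeProperties": {
            "url": f"https://{account}.blob.core.windows.net/{container}/{blob_name}",
            "method": "PUT",
            "headers": {
                "x-ms-version": "2019-12-12",  # assumed API version
                "x-ms-blob-type": "BlockBlob",
            },
            # ADF expression evaluated at run time (as suggested in the answer).
            "body": f"@activity('{source_activity}').output.Response",
            "authentication": {
                "type": "MSI",
                "resource": "https://storage.azure.com/",
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(put_blob_web_activity("myaccount", "staging", "metadata.json"), indent=2))
```

A Copy activity with a dataset pointing at the same blob path could then load the file into Postgres.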

How to schedule a query in Azure Synapse On-demand

How can I schedule a query in Azure Synapse On-demand and save the result to Azure storage every hour?
My idea is to materialize the results into separate storage and use Power BI to access them.
Besides the fact that PowerBI can directly access your Synapse instance, if you want to go this route you have several options:
This can be done using a pipeline in the new Synapse Workspace. You should be aware that this technology is still in preview.
Use Polybase and Stored Procedures on a Job Scheduler to INSERT to a Blob Storage location. There is a lot of configuration in this option.
At present, I would recommend Azure Data Factory (ADF) on a Schedule Trigger. This is the simplest and most reliable of the current options. Based on the scenario you described, a single Copy activity could easily perform this task.
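For illustration, an hourly schedule trigger definition might look roughly like the following, expressed as a Python dict that mirrors the trigger JSON. The trigger name, pipeline name and start time are placeholders, not values from the question.

```python
import json


def hourly_schedule_trigger(pipeline_name, start_time="2024-01-01T00:00:00Z"):
    """Return a schedule trigger definition that runs `pipeline_name` every hour."""
    return {
        "name": "HourlyTrigger",  # hypothetical trigger name
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": "Hour",
                    "interval": 1,
                    "startTime": start_time,
                    "timeZone": "UTC",
                }
            },
            "pipelines": [
                {
                    "pipelineReference": {
                        "type": "PipelineReference",
                        "referenceName": pipeline_name,
                    }
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(hourly_schedule_trigger("CopyQueryResultToBlob"), indent=2))
```

The referenced pipeline would contain the single Copy activity, with the query as its source and the storage account as its sink.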

Options for ingesting and processing data in Azure sql

I need an expert opinion on a project I am working on. We currently get data files that we load into our Azure SQL database using a local script that calls stored procedures. I am planning to replace the script with SSIS jobs that load the data into Azure SQL, but I wonder whether that's a good option given our needs. I am open to different suggestions too. Our process is to load each data file into staging tables and validate it before updating the live tables. The validation and updates are done by calling stored procedures, so the SSIS package will just load the data and call those stored procedures. I have looked at ADF IR and Databricks; they seem like overkill, but I am open to hearing from people with experience using them. I am currently running the SSIS package locally as well. Any suggestions for a better architecture or tools for this scenario? Thanks!
I would definitely have a look at Azure Data Factory Data Flows. With these you can easily build your ETL pipelines in the Azure Data Factory GUI.
In the following example, two text files from Blob Storage are read and joined, a surrogate key is added, and finally the data is loaded into Azure Synapse Analytics (it would be the same for Azure SQL):
Finally, you put this Mapping Data Flow into a pipeline and can trigger it, e.g. when new data arrives.
You can just BULK INSERT data from Azure Blob Store:
https://learn.microsoft.com/en-us/sql/relational-databases/import-export/examples-of-bulk-access-to-data-in-azure-blob-storage?view=sql-server-ver15#accessing-data-in-a-csv-file-referencing-an-azure-blob-storage-location
Then you can use ADF (no IR) or Databricks or Azure Batch or Azure Elastic Jobs to schedule the execution.
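As a rough sketch of the BULK INSERT approach, the helper below assembles the T-SQL statement that a scheduled job could run. The table, file path and data source names are hypothetical, and the external data source (created with TYPE = BLOB_STORAGE) is assumed to exist already. The helper does plain string formatting without any escaping, so it is only suitable for trusted, hard-coded names.

```python
def bulk_insert_sql(table, blob_path, data_source, first_row=2):
    """Build a BULK INSERT statement that loads a CSV file from Azure Blob Storage.

    first_row=2 skips a single header line in the file.
    """
    return (
        f"BULK INSERT {table}\n"
        f"FROM '{blob_path}'\n"
        f"WITH (DATA_SOURCE = '{data_source}', FORMAT = 'CSV', FIRSTROW = {first_row});"
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(bulk_insert_sql("staging.Orders", "incoming/orders.csv", "MyAzureBlobStorage"))
```

After the load, the same job could call the existing validation and update stored procedures, which keeps the current staging-then-live flow unchanged.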

Syncing Azure BLOB Storage to Amazon S3

We're storing about 4 million miscellaneous files (4 TB or so), mainly Word and PDF, in Azure Blob Storage. I'm looking to replicate this data in a different cloud for disaster recovery and peace of mind, and Amazon S3 seems as good a candidate as any.
Trouble is, I don't have a local server large enough to hold a local copy of these files. Ideally, I'd want to sync directly from Azure Blob to S3. We're adding new files continually, so the sync would need to be frequent as well (multiple times per day).
I see lots of options for downloading from Azure to local and then uploading from local to S3, but very little for a direct Azure => S3 sync. What are some good options here?
You can migrate the Azure Storage data to Amazon S3 using a Node.js package.
You can see the full description provided here.
You can also use Azure Data Factory to replicate the data, as it provides a Copy Data tool that can be configured according to your needs and transfer settings.
You can refer to this document on Azure Data Factory and the Copy Data tool.