I am working on a migration project where the source data resides in 4 different ADLS Gen2 accounts and needs to be migrated to another data lake.
For this, I have created 4 linked services to connect to these source Gen2 accounts.
Now I want to pass the linked service name to the Data Flow at run time, depending on which data source I want to run the data flow for, so that it connects to the respective ADLS Gen2 account.
I am able to pass other parameters from the pipeline to the data flow, but passing the linked service name is not working.
I'm afraid this isn't supported.
Data Factory doesn't support passing parameters (pipeline parameters or Data Flow parameters) to a linked service. As you said, we can't pass a parameter from a Data Flow to a linked service.
HTH.
I am moving from SSIS to Azure.
We have hundreds of files and MSSQL tables that we want to push into a Gen2 data lake
using 3 zones, then SQL Data Lake.
The zones being Raw, Staging & Presentation (change the names as you wish).
What is the best process to automate this as much as possible?
For example, build a table with the files / folders / tables to bring into the Raw zone,
then have Synapse bring in these objects as either a full or incremental load,
then process them into the next 2 zones; I guess more custom code as we progress.
Your requirement can be accomplished using multiple activities in Azure Data Factory.
To migrate SSIS packages, you need to use the SSIS Integration Runtime (IR). ADF supports SSIS integration, which can be configured by creating a new SSIS Integration Runtime. To create one, click Configure SSIS Integration, provide the basic details, and create the new runtime.
Refer to this third-party tutorial by SQLShack to move local SSIS packages to Azure Data Factory.
Now, to copy the data to the different zones, use the Copy activity. You can make as many copies of your data as you require using the Copy activity. Refer to Copy data between Azure data stores using Azure Data Factory.
ADF also supports incrementally loading data using Change Data Capture (CDC).
Note: Both Azure SQL MI and SQL Server support the Change Data Capture technology.
A tumbling window trigger and the CDC window parameters need to be configured to automate the incremental load. Check this official tutorial.
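For reference, CDC has to be enabled on the source database and table before the pipeline can pick up changes. Below is a minimal sketch of that step using pyodbc from Python; the server, database, credentials, and the dbo.Customers table are placeholders for your own environment.

```python
# Minimal sketch: enable CDC on the source SQL Server / Azure SQL MI table that
# the ADF pipeline will load incrementally. Server, database, credentials and
# the dbo.Customers table are placeholders -- substitute your own.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<your-server>;DATABASE=<source-db>;UID=<user>;PWD=<password>",
    autocommit=True,  # run the CDC procedures outside an open transaction
)
cur = conn.cursor()

# Enable CDC at the database level (run once per database).
cur.execute("EXEC sys.sp_cdc_enable_db")

# Enable CDC for the table to be loaded incrementally.
cur.execute(
    "EXEC sys.sp_cdc_enable_table "
    "@source_schema = N'dbo', @source_name = N'Customers', @role_name = NULL"
)
conn.close()
```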
The last part:
then process them into the next 2 zones
This you need to manage programmatically, as there is no feature in ADF that can update the other copies of the data based on CDC. You need to either create a separate CDC process for those zones or handle it in your own logic.
I am archiving rows that are older than a year into ADLS Gen2 as Delta tables. When there is a need to report on that data, I need to join the archived data with some tables in an on-premises database. Is there a way we can do the join without re-hydrating data from or hydrating data to the cloud?
Yes, you can achieve this task by using Azure Data Factory.
Azure Data Factory (ADF) is a fully managed, serverless data integration service. Visually integrate data sources with more than 90 built-in, maintenance-free connectors at no added cost. Easily construct ETL and ELT processes code-free in an intuitive environment or write your own code.
Firstly, you need to install the Self-hosted Integration Runtime on your local machine to access the on-premises SQL Server from ADF. To accomplish this, refer to Connect to On-premises Data in Azure Data Factory with the Self-hosted Integration Runtime.
As you have archived the data in ADLS, you need to change the access tier of that container's blobs from Cold to Hot in order to retrieve the data in ADF.
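If you would rather script the tier change than do it in the portal, here is a minimal sketch using the azure-storage-blob Python package; the account URL, container name, and credential are assumptions you would replace with your own.

```python
# Minimal sketch: set every blob in the archive container to the Hot tier so
# ADF can read it. Account URL, container name and SAS token are placeholders.
from azure.storage.blob import ContainerClient

container = ContainerClient(
    account_url="https://<storage-account>.blob.core.windows.net",
    container_name="<archive-container>",
    credential="<sas-token-or-account-key>",
)

for blob in container.list_blobs():
    # Access tiers are applied per blob, not per container.
    # Note: blobs currently in the Archive tier rehydrate asynchronously,
    # which can take several hours.
    container.get_blob_client(blob.name).set_standard_blob_tier("Hot")
```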
Next, create a Linked Service using the Self-hosted IR you created, and create a Dataset using this Linked Service to access the on-premises database.
Similarly, create a Linked Service using the default Azure IR, and create a Dataset using it to access the data from ADLS.
Now, you also require a destination database where you will store the data after the join. If you are storing it in the same on-premises database, you can reuse the existing Linked Service, but you need to create a new Dataset pointing to the destination table.
Once all this configuration is done, create a pipeline with a Data Flow activity in ADF.
Mapping data flows are visually designed data transformations in Azure Data Factory. Data flows allow data engineers to develop data transformation logic without writing code. The resulting data flows are executed as activities within Azure Data Factory pipelines that use scaled-out Apache Spark clusters.
Learn more about Mapping data flow here.
Finally, in the Data Flow activity, your sources will be the on-premises dataset and the ADLS dataset you created above. You will use the join transformation in the mapping data flow to combine data from the two sources; the output stream will include all columns from both sources, matched based on a join condition.
The sink transformation will take your destination dataset, where the joined data will be stored as the output.
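Since data flows execute on Spark (per the quote above), the join they perform is conceptually equivalent to the PySpark sketch below; the paths, JDBC details, and customer_id key are purely illustrative, and in practice the data flow's visual join transformation replaces this code.

```python
# Conceptual equivalent of the data flow's join transformation, expressed in
# PySpark. Paths, JDBC connection details and the customer_id join key are
# illustrative placeholders only.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Archived Delta tables in ADLS Gen2.
archived = spark.read.format("delta").load(
    "abfss://archive@<account>.dfs.core.windows.net/orders"
)

# On-premises SQL Server table reached over JDBC.
onprem = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://<onprem-server>;databaseName=<db>")
    .option("dbtable", "dbo.Customers")
    .option("user", "<user>")
    .option("password", "<password>")
    .load()
)

# Inner join on a shared key; the output includes all columns from both sides.
joined = archived.join(onprem, archived.customer_id == onprem.customer_id, "inner")
```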
I would like to copy my files from one container to another container using an ADF pipeline, and while copying I have to change the access tier from Hot to Archive.
I have to achieve this using an ADF pipeline. A way to do this without using a custom activity would be great.
I don't see a direct property in any activity to achieve this; you can try one of the methods below.
Using Web activity in Azure Data Factory and Azure Synapse Analytics
Copy Blob, where the x-ms-access-tier header specifies the tier to be set on the target blob.
Or, after the Copy activity, use Set Blob Tier, where the x-ms-access-tier header indicates the tier to be set on the blob.
Of course, you would have to use parameters to make this dynamically executable for the multiple files involved.
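To make the two options concrete, here is a minimal sketch of the underlying Storage REST calls in Python; the account, container, blob names, and SAS tokens are placeholders, and in the pipeline you would issue the same requests from Web activities with parameterized URLs.

```python
# Minimal sketch of the two Storage REST calls referenced above. Account,
# container, blob names and SAS tokens are placeholders; in ADF the same
# requests would be issued from Web activities with parameterized URLs.
import requests

account = "https://<storage-account>.blob.core.windows.net"
src = f"{account}/source-container/myfile.csv?<source-sas>"
dst = f"{account}/target-container/myfile.csv?<destination-sas>"

# Option 1: Copy Blob, setting the tier on the target blob in the same call.
requests.put(
    dst,
    headers={
        "x-ms-version": "2020-10-02",
        "x-ms-copy-source": src,
        "x-ms-access-tier": "Archive",
    },
).raise_for_status()

# Option 2: after the Copy activity, call Set Blob Tier on the copied blob.
requests.put(
    f"{dst}&comp=tier",
    headers={"x-ms-version": "2020-10-02", "x-ms-access-tier": "Archive"},
).raise_for_status()
```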
I have an ADF Web activity from which I'm getting metadata as output. I want to copy this metadata into an Azure PostgreSQL database. How do I use the Web activity output as the source of the next Copy activity?
According to this answer, I think we can use two Web activities to store the output of your first Web activity.
Use the @activity('Web1').output.Response expression in the second Web activity to save the output as a blob in a container. Then we can use a Copy activity to copy this blob into the Azure PostgreSQL database.
Since I do not have permission to set role permissions, I did not test this, but I think this solution is feasible.
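For reference, the second Web activity is effectively making a Put Blob call; a minimal sketch of that request in Python is below, with the storage account, container, and SAS token as placeholders and the JSON body standing in for @activity('Web1').output.Response.

```python
# Minimal sketch of the Put Blob request the second Web activity performs.
# Storage account, container, blob name and SAS token are placeholders; the
# body stands in for @activity('Web1').output.Response.
import json
import requests

blob_url = (
    "https://<storage-account>.blob.core.windows.net/"
    "staging/web1-output.json?<sas-token>"
)
payload = {"rows": 42, "status": "ok"}  # placeholder for the Web1 output

resp = requests.put(
    blob_url,
    headers={
        "x-ms-blob-type": "BlockBlob",   # required by Put Blob
        "Content-Type": "application/json",
    },
    data=json.dumps(payload),
)
resp.raise_for_status()
```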
I want to create an API or desktop application that allows users to execute my U-SQL scripts, which work on a specific Azure Data Lake Store in my Azure account. I have read something about the Azure Data Factory service, but is there any other way in .NET to execute U-SQL scripts against Azure Data Lake Store data (and pass parameters to these scripts)?
You could use the Azure Data Lake Analytics (ADLA) .NET SDK to automate actions related to your ADLA account, jobs, or catalog items. You could use this alongside the Azure Data Lake Store (ADLS) .NET SDK to automate actions related to your ADLS account or file system.
Currently, passing parameters to these scripts would involve modifying the script before submitting the job, replacing values or adjusting variables as needed.
Regarding Parameter-passing:
You can prepend your parameters by using the following statement:
DECLARE @parameter type = value;
If you want to give a script parameter a default value in your script, you can use
DECLARE EXTERNAL @parameter type = default_value;
This will give you a default value if you do not add the explicit DECLARE, and it will be overridden by the preceding DECLARE statement if one is present.
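Since the SDK does not expose a parameter model today, a common approach is to prepend the DECLARE statements to the script text before submitting the job; below is a minimal Python sketch of that string manipulation, with the script path, parameter names, and values as illustrative assumptions (the resulting text would then be submitted through the ADLA job SDK or REST API).

```python
# Minimal sketch: inject parameters by prepending DECLARE statements to the
# U-SQL script text before submitting it as a job. Script path, parameter
# names and values are illustrative placeholders.
def build_usql_script(script_path: str, parameters: dict) -> str:
    with open(script_path, encoding="utf-8") as f:
        body = f.read()

    # One DECLARE per parameter; these take precedence over any
    # DECLARE EXTERNAL defaults inside the script itself.
    declares = []
    for name, value in parameters.items():
        if isinstance(value, str):
            declares.append(f'DECLARE @{name} string = "{value}";')
        else:
            declares.append(f"DECLARE @{name} int = {value};")

    return "\n".join(declares) + "\n" + body


script_text = build_usql_script("extract.usql", {"inputPath": "/raw/2017/", "year": 2017})
# script_text is then submitted as a job via the ADLA SDK or REST API.
```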
Please visit http://aka.ms/adlfeedback to file/vote on a request to expose an SDK parameter model.