How can I execute multiple ADF pipelines based on some conditions but without nesting the If activities?

My requirement is to loop over the same set of files using multiple pipelines, e.g.:
Pipeline 1 consumes file 1 and generates a certain output.
Pipeline 2 then has to be triggered based on the output of pipeline 1; otherwise we should skip the run.
If pipeline 2 runs, pipeline 3 has to be triggered based on the output of pipeline 2; otherwise we should skip the run.
Is there any way to do this in ADF without nesting if-else?

You can chain multiple pipelines using a combination of the Execute Pipeline activity and the If Condition activity. The Execute Pipeline activity allows a Data Factory pipeline to invoke another pipeline. For the false branch of the condition you can execute a different pipeline, or leave the branch empty to skip the run. Optionally, the child pipeline's flow can end with another Execute Pipeline activity referring back to the previous caller.
Caution: this can turn into an endless loop if the conditions are not configured correctly.
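As a rough sketch of that pattern (all activity, pipeline, and output-field names below are hypothetical, not taken from the question): pipeline 1 ends with an If Condition activity that inspects the output of its last step, assumed here to be a Lookup named CheckOutput, and only on the true branch invokes pipeline 2 through an Execute Pipeline activity. Pipeline 2 repeats the same pattern to decide whether to call pipeline 3, so no If activities are ever nested inside one another:

{
    "name": "If_Run_Pipeline2",
    "type": "IfCondition",
    "dependsOn": [
        { "activity": "CheckOutput", "dependencyConditions": [ "Succeeded" ] }
    ],
    "typeProperties": {
        "expression": {
            "value": "@equals(activity('CheckOutput').output.firstRow.runNext, true)",
            "type": "Expression"
        },
        "ifTrueActivities": [
            {
                "name": "Run_Pipeline2",
                "type": "ExecutePipeline",
                "typeProperties": {
                    "pipeline": { "referenceName": "pipeline2", "type": "PipelineReference" },
                    "waitOnCompletion": true
                }
            }
        ]
    }
}

If nothing is placed in ifFalseActivities, the run simply stops there, which gives the "skip the run" behaviour without any nested conditions.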

Related

How to change default parameter values at pipeline level dynamically in Azure Data Factory while moving from dev to prod

I have a few parameters specified at pipeline level in ADF and I have used default values in the dev environment. Now I want to move this pipeline to the prod environment and change the parameter values according to production.
Earlier in SSIS we used to have configurations (SQL, XML, ...) to make such changes without changing anything in the SSIS package.
Can we do the same thing in ADF, i.e. without changing the default values manually in the package, can we use values stored in a SQL table to pass as pipeline parameters?
You don't need to worry about the default values defined in a pipeline parameter as long as you are going to have a trigger on it. Just make sure to publish different versions of the trigger in the dev and prod repositories and pass different values to the pipeline parameters.
If, however, you want to change the parameters, you can do so by invoking the pipeline from a parent pipeline through an Execute Pipeline activity. The values you pass as parameters to the Execute Pipeline activity can come from a Lookup (over some configuration file or table).
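As a rough sketch of that parent-pipeline approach (the dataset, query, column, and pipeline names are hypothetical), a Lookup activity reads one configuration row and the Execute Pipeline activity forwards its columns as parameter values:

{
    "name": "Lookup_Config",
    "type": "Lookup",
    "typeProperties": {
        "source": {
            "type": "AzureSqlSource",
            "sqlReaderQuery": "SELECT TOP 1 SourceFolder, TargetTable FROM dbo.PipelineConfig WHERE Environment = 'prod'"
        },
        "dataset": { "referenceName": "ConfigDataset", "type": "DatasetReference" },
        "firstRowOnly": true
    }
},
{
    "name": "Run_Child_Pipeline",
    "type": "ExecutePipeline",
    "dependsOn": [
        { "activity": "Lookup_Config", "dependencyConditions": [ "Succeeded" ] }
    ],
    "typeProperties": {
        "pipeline": { "referenceName": "child_pipeline", "type": "PipelineReference" },
        "waitOnCompletion": true,
        "parameters": {
            "sourceFolder": {
                "value": "@activity('Lookup_Config').output.firstRow.SourceFolder",
                "type": "Expression"
            },
            "targetTable": {
                "value": "@activity('Lookup_Config').output.firstRow.TargetTable",
                "type": "Expression"
            }
        }
    }
}

The child pipeline keeps whatever default values it has for dev; in prod only the configuration table (or a prod version of the trigger) needs to change.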

Failed Activity not running

I have an Azure Data Factory v2 pipeline with a Copy data activity. If the activity fails, a Lookup activity should run. Unfortunately the Lookup never runs. Why doesn't it run on failure of the Copy data activity? How do I get this to work?
I'm expecting the "Set load of file to failed" activity to run because the "Load Zipped File to Import Destination" activity failed. In fact, in the output you can see the status is "Failed", but no other activity is run.
Later I updated the Copy activity to skip incompatible rows, which caused the Copy data activity to succeed. The expected number of rows loaded now doesn't match the total number of rows loaded, so the If Condition activity goes down the failure route. Why does the Lookup run when the If Condition fails, but not when the Copy data activity fails?
Activity dependencies are a logical AND. The Lookup activity "Set load of file to failed" will only execute if both the Copy data activity and the If Condition fail. It's not one or the other - it's both. I blogged about this here.
It's common to redesign this as:
A. Use multiple failure activities. Instead of having the one "Set load of file to failed" at the end, copy that activity and have the Copy data activity link to the new copy on failure (see the sketch after this list).
B. Create a parent pipeline and use an Execute Pipeline activity. Then add a single failure dependency from the Execute Pipeline activity to the "Set load of file to failed" activity.
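As a minimal sketch of option A, the copied handler only needs a Failed dependency on the copy activity. The two activity names below come from the question, but the typeProperties are placeholders standing in for whatever the original "Set load of file to failed" Lookup actually does:

{
    "name": "Set load of file to failed (on copy failure)",
    "type": "Lookup",
    "dependsOn": [
        {
            "activity": "Load Zipped File to Import Destination",
            "dependencyConditions": [ "Failed" ]
        }
    ],
    "typeProperties": {
        "source": {
            "type": "AzureSqlSource",
            "sqlReaderQuery": "EXEC dbo.usp_SetLoadFailed"
        },
        "dataset": { "referenceName": "ImportLogDataset", "type": "DatasetReference" },
        "firstRowOnly": true
    }
}

The original "Set load of file to failed" activity keeps its Failed dependency on the If Condition, so each failure path now has its own handler and the logical-AND problem disappears.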

GitLab-CI: run job only when all conditions are met

In the GitLab CI documentation, I read the following:
In this example, job will run only for refs that are tagged, or if a build is explicitly requested via an API trigger or a Pipeline Schedule:
job:
  # use special keywords
  only:
    - tags
    - triggers
    - schedules
I noticed the documentation uses or instead of and, which means the job runs when any one of the conditions is met. But what if I want to configure a job to run only when all conditions are met, for instance in a Pipeline Schedule and on the master branch?
If your specific question is "how do I only run a pipeline on master when it was scheduled", this should work:
job:
  only:
    - master
  except:
    - triggers
    - pushes
    - external
    - api
    - web
In this example you exclude every pipeline source except schedules and only run for the master branch.

How to run 3 Kettle scripts one after the other

I am new to Kettle and I want to run the 3 Kettle scripts 1.ktr, 2.ktr and 3.ktr one after the other.
Can someone give me an idea of how to achieve this using Kettle steps?
Usually, you organize your Kettle transformations within Kettle jobs (.kjb). In those jobs you can have transformations processed one after the other. You can also include jobs within jobs to further organize your ETL process. If you execute your jobs and transformations from the command line, be aware that you execute jobs with the tool Kitchen and transformations with Pan. You can create jobs, like transformations, with Spoon.
The ideal way is to use jobs. Jobs guarantee sequential execution when you put entries in sequence, unlike a transformation calling multiple transformations through the Transformation Executor step (where they run in parallel).
Create a wrapper job with Start -> (transformation step) KTR1 -> (transformation step) KTR2 -> (transformation step) KTR3 -> Success and run this job.
Create a job and drag three "Transformation" entries into that job. You can add as many transformations as you want in a job. When you run this job, it will execute the transformations one by one.

Is it possible to execute Pentaho steps in sequence?

I have a Pentaho transformation which consists of, for example, 10 steps. I want to start this job for N input parameters, but not in parallel; each job evaluation should start after the previous transformation has fully completed (the process is done in a transaction and committed or rolled back). Is this possible with Pentaho?
You can add the 'Block this step until steps finish' step from the Flow category to your transformation. Or you can combine the 'Wait for SQL' component from the Utility category with a loop in your job.
Regards,
Mateusz
Maybe you should do it using jobs instead of transformations. Jobs only run in sequence, while transformations run in parallel. (Strictly speaking, a transformation has an initialization phase that runs in parallel and then the flow runs sequentially.)
If you can't use jobs, you can always do what Mateusz said.