How to change default parameter values at pipeline level dynamically in Azure Data Factory while moving from dev to prod

I have a few parameters specified at pipeline level in ADF, and I have used default values in the dev environment. Now I want to move this pipeline to the prod environment and change the parameter values to suit production.
Earlier, in SSIS, we used configurations (SQL, XML, ...) to make such changes without touching anything in the SSIS package.
Can we do the same thing in ADF? That is, without manually changing the default values in the pipeline, can we pass values stored in a SQL table as pipeline parameters?

You don't need to worry about the values defined in a pipeline's parameters as long as you are going to have a trigger on it. Just make sure to publish different versions of the trigger in the dev and prod repositories and pass different values to the pipeline parameters from each.
If, however, you want to change the parameters at run time, you can do so by invoking the pipeline from a parent pipeline through an Execute Pipeline activity. The values you pass as parameters to the Execute Pipeline activity can come from a Lookup activity (over some configuration file or table), as sketched below.
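For illustration, here is a minimal sketch of the run-time approach using the Azure Data Factory Python SDK (azure-mgmt-datafactory). The subscription, resource group, factory, pipeline, and parameter names below are placeholders, not anything from the question:

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Hypothetical names throughout; the point is that the same published
# pipeline runs in dev and prod, and only the values passed to
# create_run differ between environments.
subscription_id = "00000000-0000-0000-0000-000000000000"
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

run = adf_client.pipelines.create_run(
    resource_group_name="rg-prod",      # assumed resource group
    factory_name="adf-prod",            # assumed factory
    pipeline_name="MyPipeline",         # assumed pipeline
    parameters={"sourceFolder": "prod/in", "targetTable": "dbo.ProdTarget"},
)
print(run.run_id)

The same separation works inside the factory itself: the Lookup activity reads the values from a configuration table, and the Execute Pipeline activity forwards them as parameters, so nothing is hardcoded in the child pipeline.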

Related

Is it possible to explicitly declare which project's BQ slots to use in BigQueryOperator?

Say I have Composer (Airflow) running under a GCP project named project-central (the source project).
I have a lot of task instances using BigQueryOperator to generate tables in multiple projects (the target projects), for example:
project-1
project-2
project-3
From what I understand, BigQueryOperator by default runs SQL using the BQ slots of the project Composer runs in (project-central in the example above).
With the aim of:
splitting the resources,
preventing us from hitting the BQ slot limit in project-central, and
having cost attribution,
I need the BigQueryOperator task instances to run using the BQ slots of the target project.
So if the target table is in project-2, BigQueryOperator should use BQ slots from project-2 instead of project-central.
Is it possible to configure this through the BigQueryOperator params?
I see the param bigquery_conn_id (str) - reference to a specific BigQuery hook - at this link: https://airflow.apache.org/docs/apache-airflow/1.10.6/_api/airflow/contrib/operators/bigquery_operator/index.html
But is it correct to use that to set which project's BQ slots are used? If yes, what should the value of that param be?
I can't find many examples, just these two under the same link, and I still don't understand what they mean:
bigquery_conn_id='airflow-service-account'
bigquery_conn_id='bigquery_default'
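As a hedged sketch of one plausible approach (the connection IDs, dataset, and table names below are assumptions): define one Airflow connection per target project, each with its own project_id and service account, and point bigquery_conn_id at the connection for the project that should run the job. The hook submits the BigQuery job in the connection's project, and a job consumes slots in, and is billed to, the project it runs in.

from datetime import datetime
from airflow import DAG
from airflow.contrib.operators.bigquery_operator import BigQueryOperator

# Airflow 1.10.x style import, matching the docs link above.
dag = DAG("per_project_slots_sketch", start_date=datetime(2021, 1, 1),
          schedule_interval=None)

generate_table = BigQueryOperator(
    task_id="generate_table_project_2",
    sql="SELECT 1 AS x",                                        # placeholder query
    destination_dataset_table="project-2.my_dataset.my_table",  # assumed names
    bigquery_conn_id="bq_project_2",  # assumed connection whose project_id is project-2
    use_legacy_sql=False,
    dag=dag,
)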

SSIS Variables Not Updating When Execute Package on Server

I have created a package that does the following:
ExecuteSQLTask: queries db table and sets package variables from data returned
DataFlowTask starts
OleDBSource: uses package variables as parameters to call stored procedure
FlatFileDestination: uses package variables to save a tab delimited file in the correct location and filename
SendEmailTask: uses package variables to email the file as attachment to recipient
I have the following vars:
FileName
sp_Param1
sp_param2
emailRecipient
SMTPServer
At design time, each variable has dummy values. When I run the package in VS, it works perfectly. I can update the values in the db table, and each execution picks up the new values and works.
The problem begins when I deploy the package to the database and execute it. It no longer appears to set the variables from the db table; instead it uses the dummy values I entered at design time. What is going on?

Variable values stored outside of SSIS

This is an SSIS question for advanced programmers. I have a SQL table that holds clientid, clientname, Filename, Ftplocationfolderpath, filelocationfolderpath.
This table holds a unique record for each of my clients. As my client list grows I add a new row in my sql table for that client.
My question is this: Can I use the values in my sql table and somehow reference each of them in my SSIS package variables based on client id?
The reason for the SQL table is that we sometimes get requests to change the delivery location or file name of a file we send externally. We would like to be able to change those things dynamically, on the fly, in the SQL table instead of having to export the package, change it manually, and re-import it each time. Each client has its own SSIS package.
Let me know if this is feasible. I'd appreciate any insight.
Yes, it is possible. There are two ways to approach this, depending on how the job runs: either you run for a single client per job run, or for multiple clients per job run.
Either way, you will use the Execute SQL Task to retrieve data from the database and assign it to your variables.
You are running for a single client. This is fairly straightforward. In the Result Set, select the option for Single Row and map the single row's result to the package variables and go about your processing.
You are running for multiple clients. In the Result Set, select Full Result Set and assign the result to a single package variable of type Object - give it a meaningful name like ObjectRs. You will then add a Foreach Loop container:
Type: Foreach ADO Enumerator
ADO object source variable: Select the ObjectRs.
Enumerator Mode: Rows in all the tables (ADO.NET dataset only)
In Variable mappings, map all of the columns in their sequential order to the package variables. This effectively transforms the package into a series of single transactions that are looped.
Yes.
I assume that you run your package once per client or use some loop.
At the beginning of the "per client" code, read all required values from the database into SSIS variables, and then use these variables to define what you need. You should not hardcode client-specific information in the package.
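For illustration only, here is the lookup pattern both answers describe, sketched in plain Python with pyodbc rather than in SSIS itself; the table, column, and connection names are assumptions based on the question:

import pyodbc

# Assumed config table: dbo.ClientConfig(clientid, clientname, Filename,
# Ftplocationfolderpath, Filelocationfolderpath); connection details are
# placeholders.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=ConfigDb;Trusted_Connection=yes;"
)
cursor = conn.cursor()

# "Single Row" result set: fetch one client's settings into variables.
filename, ftp_path, file_path = cursor.execute(
    "SELECT Filename, Ftplocationfolderpath, Filelocationfolderpath "
    "FROM dbo.ClientConfig WHERE clientid = ?", 42
).fetchone()

# "Full Result Set" plus a loop: one iteration per client row, the same
# shape as the Foreach ADO Enumerator with its variable mappings.
for clientid, clientname, filename, ftp_path, file_path in cursor.execute(
    "SELECT clientid, clientname, Filename, Ftplocationfolderpath, "
    "Filelocationfolderpath FROM dbo.ClientConfig"
):
    print(clientid, filename)  # per-client processing would go here

conn.close()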

Execute SSIS Package Script Task From Stored Procedure Get Variable Value

I have an SSIS package that takes in (through a package parameter) a value, passes it into a script task via a script variable (readonly variable), converts it to another value inside the script task, and finally writes that value out to another script variable (readwrite variable). There are no other SSIS modules in the package aside from the one script task.
What I would like to do, from outside the package (via SQL) is:
Call the SSIS package, passing in a value for my parameter and variable
Get the value of the read/write variable that is determined at the end of the script task execution
I've got step #1 working, just can't figure out #2.
How do I get the value of a package variable in an SSIS package after it has executed? Is it accessible? Is it stored anywhere, or can I store it somewhere in the SSIS catalog? I've tried to see if it's stored in the SSISDB.[catalog].executions table somewhere, but it doesn't seem to be.
Do I need to write that script variable to a package parameter in order to see it from SQL after execution? Could I then perhaps see it by using EXEC [SSISDB].[catalog].get_parameter_values, or does that only show parameter values before package execution? Am I going about this completely the wrong way?
Thanks in advance!
What I would do is add one last step to the package to write the value of the variable to a table.
Then you can retrieve the value from the table via SQL.
You can either truncate the table every time, or keep a permanent history associated with each time the package runs.
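A hedged sketch of that approach (the output table, its columns, and the connection details are made-up names): the package's final Execute SQL Task inserts the computed value into a table keyed by the execution, and the caller reads it back once the run completes.

import pyodbc

# Assumes the package's last step runs something like:
#   INSERT INTO dbo.PackageOutput (ExecutionId, OutputValue) VALUES (?, ?)
# with System::ServerExecutionID and the read/write script variable mapped in.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=EtlSupport;Trusted_Connection=yes;"
)
cursor = conn.cursor()

execution_id = 12345  # the execution_id returned when the package was started
row = cursor.execute(
    "SELECT OutputValue FROM dbo.PackageOutput WHERE ExecutionId = ?",
    execution_id,
).fetchone()
print(row.OutputValue if row else "package has not written its result yet")
conn.close()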

Getting the JOB_ID variable in Pentaho Data Integration

When you log a job in Pentaho Data Integration, one of the fields is ID_JOB, described as "the batch id- a unique number increased by one for each run of a job."
Can I get this ID? I can see it in my logging tables, but I want to set up a transformation to get it. I think there might be a runtime variable that holds an ID for the running job.
I've tried using the Get Variables and Get System Info transformation steps to no avail. I am a new Kettle user.
You have batch_ids of the current transformation and of the parent job available on the Get System Info step. On PDI 5.0 they come before the "command line arguments", but order changes with each version, so you may have to look it up.
You need to create the variable yourself to house the parent job batch ID. The way to do this is to add another transformation as the first step in your job that sets the variable and makes it available to all the other subsequent transformations and job steps that you'll call from the job. Steps:
1) As you have probably already done, enable logging on the job
Job Settings -> Settings -> check: Pass batch ID
Job Settings -> Log -> enable logging, define the database log table, and enable the ID_JOB field
2) Add a new transformation, call it "Set Variables", as the first step after the start of your job
3) Create a variable, accessible to all your other transformations, that contains the value of the current job's batch ID
3a) Add a Get System Info step. Give your field a name - "parentJobBatchID" - with the type "parent job batch ID"
3b) Add a Set Variables step after the Get System Info step. Draw a hop from the Get System Info step to the Set Variables step as its main output
3c) In the Set Variables step, set Field name: "parentJobBatchID", set a variable name - "myJobBatchID" - with variable scope type "Valid in the Java Virtual Machine", and leave the default value empty
And that's it. After that, you can go back to your job and add subsequent transformations and steps and they will all be able to access the variable you defined by substituting ${myJobBatchID} or whatever you chose to name it.
It is important that the Set Variables step is the only thing that happens in the "Set Variables" transformation, and that anything else that needs to access the variable is added only to other transformations called by the job. This is because transformations in Pentaho are multi-threaded and you cannot guarantee that the Set Variables step will happen before other activities in that transformation. The parent job, however, executes sequentially, so you can be assured that once you establish the variable containing the parent job batch ID in the first transformation of the job, all other transformations and job steps will be able to use that variable.
You can test that it worked before you add other functionality by adding a "Write To Log" step after the Set Variables transformation that writes the variable ${myJobBatchID} to the log for you to view and confirm it is working.