Pentaho: how to pass a variable to the start job from the lowest transformation

I am new to Pentaho and have some problems building my job. I have job1, which consists of job2 and another transformation. Job2 contains 3 transformations: 1, 2 and 3. Transformation3 performs some steps and calls another transformation4 (through a Transformation Executor step). Transformation4 compares some values and then sets a new variable "result". The problem is that I need to use this variable in job1. I have tried the "Set Variables" step with the scope set to valid in the parent job, root job, and system, but the value is always empty. Is there any way to pass this variable up to the start job (job1)? Thank you for your help.

From the job/transformation flow described above it would not be possible to set a value in transformation4 and read it in job1, as job entries are executed sequentially and a Set Variables step in the 1st iteration of transformation4 cannot pass data to a Get Variables step of job2 that has already run. If job1 has been marked as "Run for each row" (the default) and data is being read from a source such as:
Table - make sure the DML is committed.
File - make sure the file is closed.
Hope this answers the question.


Read specific file names in an ADF pipeline

I have got a requirement saying that blob storage has multiple files with the names file_1.csv, file_2.csv, file_3.csv, file_4.csv, file_5.csv, file_6.csv, file_7.csv. From these I have to read only the file names 5 to 7.
How can we achieve this in an ADF/Synapse pipeline?
I have repro'd this in my lab; please see the repro steps below.
ADF:
Using the Get Metadata activity, get a list of all files.
(Parameterize the source file name in the source dataset to pass ‘*’ in the dataset parameters to get all files.)
Pass the Get Metadata output child items to a ForEach activity:
@activity('Get Metadata1').output.childItems
Add an If Condition activity inside the ForEach and add the true-case expression so that only the required files are copied to the sink. Note that in a name like file_5.csv the digit sits at zero-based index 5:
@and(greater(int(substring(item().name,5,1)),4),lessOrEquals(int(substring(item().name,5,1)),7))
When the If Condition is true, add a Copy Data activity to copy the current item (file) to the sink.
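The Copy source then just forwards the current file name into the parameterized source dataset; assuming the dataset exposes a fileName parameter (my naming, not from the original answer), its value would be:
@item().name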
I took a slightly different approach, using a Filter activity and the endsWith function:
The filter expression is:
@or(or(endsWith(item().name, '_5.csv'),endsWith(item().name, '_6.csv')),endsWith(item().name, '_7.csv'))
Slightly different approaches, similar results; it depends on what you need.
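For reference, the Filter activity's Items setting would point at the same Get Metadata child items used above (assuming the activity keeps the name Get Metadata1):
@activity('Get Metadata1').output.childItems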
You can always do what @NiharikaMoola-MT suggested. But since you already know the range of the files (5-7), I suggest the following (a sketch of the ForEach items expression follows this list):
Declare two parameters as the upper and lower limits of the range.
Create a ForEach loop and pass the parameters to build the range [lowerLimit, upperLimit].
Create a parameterized dataset for the source.
Use the file number from the ForEach loop to create a dynamic file name with an expression like:
@concat('file_',item(),'.csv')
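A minimal sketch of that ForEach items expression, assuming the two parameters are integers named lowerLimit and upperLimit (the names are my assumption); note that ADF's range() function takes a start index and a count, not an end value:
@range(pipeline().parameters.lowerLimit, add(sub(pipeline().parameters.upperLimit, pipeline().parameters.lowerLimit), 1))
For lowerLimit = 5 and upperLimit = 7 this yields [5, 6, 7], and the concat expression above then produces file_5.csv through file_7.csv.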

Resolve Azure YAML Pipeline overlapping variable names in multiple variable groups

We're working on converting our Classic Azure Pipelines to YAML pipelines. One thing that is not clear is how to ensure that two different variable groups containing variables with the same name but different meanings don't step on each other.
For example, if I have variable groups vg1 and vg2, each with a variable named secretDataDestination, how do I ensure that the correct secretDataDestination is used in a YAML pipeline?
A more concerning example is, if we initially have two variable groups without overlapping variable names, how do we ensure that adding a newly-overlapping variable name to a group doesn't replace use of the variable as originally intended?
A workaround is leveraging output variables in Azure DevOps with some small inline PowerShell task code.
First, create 2 jobs, each with its own variable group, in this case Staging and Prod. Both groups contain the variables apimServiceName and apimPrefix. Expose the variables as job outputs by echoing them with isOutput=true, like this:
- job: StagingVars
  dependsOn:
  variables:
    - group: "Staging"
  steps:
    - powershell: |
        echo "##vso[task.setvariable variable=apimServiceName;isOutput=true]$(apimServiceName)"
        echo "##vso[task.setvariable variable=apimPrefix;isOutput=true]$(apimPrefix)"
      name: setvarStep
- job: ProdVars
  dependsOn:
  variables:
    - group: "Prod"
  steps:
    - powershell: |
        echo "##vso[task.setvariable variable=apimServiceName;isOutput=true]$(apimServiceName)"
        echo "##vso[task.setvariable variable=apimPrefix;isOutput=true]$(apimPrefix)"
      name: setvarStep
Then use the variables in a new job, where you give each one a new, unambiguous name and navigate to the job outputs to get the values. This works because the variable groups are each placed in their own job, so they cannot overwrite each other's variables:
- job:
  dependsOn:
    - StagingVars
    - ProdVars
  variables:
    ServiceNameSource: "$[ dependencies.StagingVars.outputs['setvarStep.apimServiceName'] ]"
    UrlprefixSource: "$[ dependencies.StagingVars.outputs['setvarStep.apimPrefix'] ]"
    ServiceNameDestination: "$[ dependencies.ProdVars.outputs['setvarStep.apimServiceName'] ]"
    UrlprefixDestination: "$[ dependencies.ProdVars.outputs['setvarStep.apimPrefix'] ]"
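As an illustration (my sketch, not part of the original answer), a step in that last job can then consume the disambiguated names with ordinary macro syntax:
  steps:
    - powershell: |
        # Each value traces back to exactly one variable group via the job outputs
        Write-Host "Copying from $(ServiceNameSource) to $(ServiceNameDestination)"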
if I have variable groups vg1 and vg2, each with a variable named secretDataDestination, how do I ensure that the correct secretDataDestination is used in a YAML pipeline?
Whether we use classic mode or YAML, it is not recommended to define a variable with the same name in different variable groups: when you reference multiple variable groups containing the same variable name in the same pipeline, you cannot avoid them stepping on each other.
When you use the same variable name in different variable groups in the same pipeline, it is just like Matt said:
"You can reference multiple variable groups in the same pipeline. If multiple variable groups include the same variable, the variable group included last in your YAML file will set the variable's value."
variables:
  - group: variable-group1
  - group: variable-group2
That means the value from the variable group listed last will overwrite the value from the variable group listed first: with the snippet above, a secretDataDestination defined in both groups resolves to the value from variable-group2.
I guess you already know this, which is why you posted your second question, so let us turn to it.
if we initially have two variable groups without overlapping variable names, how do we ensure that adding a newly-overlapping variable name to a group doesn't replace use of the variable as originally intended?
Indeed, Azure DevOps currently does not have a function or mechanism to detect that different variable groups contain the same variable name and give a prompt.
I think this is a reasonable request, so I have added it as a feature suggestion on our UserVoice site, which is our main forum for product suggestions:
The ability to detect the same variable in a variable group
As a workaround, the simplest and most direct way is to open the variable groups linked to your pipeline on the Library tab and simply Ctrl+F to search for occurrences of the same variable.
Another way is to use the REST API (Variablegroups - Get Variable Groups By Id) to fetch all the variables, then loop over them and compare with the variable you are about to add to see whether the same name already exists.
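A minimal PowerShell sketch of that comparison, assuming a PAT in $pat and placeholder {organization}/{project} values; the variable-group list endpoint is part of the same REST API (adjust api-version to your server):
$headers = @{ Authorization = "Basic " + [Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes(":$pat")) }
$url = "https://dev.azure.com/{organization}/{project}/_apis/distributedtask/variablegroups?api-version=6.0"
$groups = (Invoke-RestMethod -Uri $url -Headers $headers).value
# List every (group, variable) pair, then report names that occur in more than one group
$all = foreach ($g in $groups) {
    $g.variables.PSObject.Properties.Name | ForEach-Object {
        [pscustomobject]@{ Group = $g.name; Variable = $_ }
    }
}
$all | Group-Object Variable | Where-Object Count -gt 1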

Check stored procedure has returned a value

I am a newbie to Data Factory. As part of my pipeline, I execute a stored procedure via a Lookup activity to fetch the next record to process and then use the returned value in a Set Variable activity.
If the SP returns nothing, the Set Variable fails with the following error:
Activity SetBatchId failed: The expression 'activity('usp_get_next_archive_batch').output.firstRow.id' cannot be evaluated because property 'firstRow' doesn't exist, available properties are 'effectiveIntegrationRuntime'.
Is there a way in ADF to check that the property exists before using it?
Thanks
Please add a question mark after 'output', meaning 'output?.firstRow'.
See also this post.
Azure Data Factory: For each item() value does not exist for a particular attribute
The expression should be 'activity('usp_get_next_archive_batch').output['firstRow']['id']'
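As a sketch, the null-safe operator can also be combined with coalesce to fall back to a sentinel when no row comes back (the -1 sentinel is my assumption, not from the question):
@coalesce(activity('usp_get_next_archive_batch').output?.firstRow?.id, -1)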

How to Skip a Transformation in a Pentaho Data Integration Job

I have a single job that runs multiple transformations. I want to parameterize this job with a parameter that decides whether all the transformations run or only a single transformation, based on the name passed.
E.g.
Start --> PARAMETER (Transformation_NAME, or an ANY_IDENTIFIER value that decides to run all the transformations)
if (Parameter = Transformation_Name)
run only that particular transformation
else if (Parameter = ANY_IDENTIFIER)
run all the transformations that are part of the main job
Step 1: Set an environment variable:
jobName (or you can use transformationName) - as we are going to pass the transformation name as the value of this environment variable.
Step 2: Transformation setup in the main job: set all the transformations in parallel mode, each preceded by a Simple Evaluation component, as shown below.
Step 3: Configure the Simple Evaluation component: double-click the Simple Evaluation component and set:
Evaluate: Variable
Variable name: the environment variable name
Type: String
Success condition: If value in list
Values: TransformationName, Unique_Identifier value (here I have passed zero)
NOTE: Repeat step 3 for every Simple Evaluation component, with the values set to the respective transformation name and unique identifier.
In our case:
if jobName = a transformation name, then only the transformation whose name is passed in the environment variable will run;
if 0 is passed, then all the transformations will execute in one go.

Adding a single query result into JMeter report

I have a JMeter plan that starts with a single JDBC sampler query that captures the session ID from the Teradata database (SELECT SESSION;). The same plan also has a large number of JDBC samplers with complicated queries producing large output that I don't want to include in the report.
If I configure a Summary Report and tick Save Response Data (XML), then the output from all sampler queries will be saved.
How do I add only the first query result (it's a single integer) to the test summary report and ignore the results from all other queries? For example, is there a way to set responseData = false after the first query output is captured?
Maybe the sample_variables property can help?
Define something in the "Variable Names" section of the JDBC Request, i.e. put a session reference name there, like:
session
Add the next line to the user.properties file (it lives in JMeter's "bin" folder):
sample_variables=session_1
or alternatively pass it via the -J command-line argument, like:
jmeter -Jsample_variables=session_1 -n -t /path/to/testplan.jmx -l /path/to/results.csv
You need to use session_1, not session. As per the JDBC Request sampler documentation:
If the Variable Names list is provided, then for each row returned by a Select statement, the variables are set up with the value of the corresponding column (if a variable name is provided), and the count of rows is also set up. For example, if the Select statement returns 2 rows of 3 columns, and the variable list is A,,C, then the following variables will be set up:
A_#=2 (number of rows)
A_1=column 1, row 1
A_2=column 1, row 2
C_#=2 (number of rows)
C_1=column 3, row 1
C_2=column 3, row 2
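Applied to this case, with session in the Variable Names field and a query returning a single row with a single column, the sampler would set (the id value here is hypothetical):
session_#=1 (number of rows)
session_1=1234567 (the id returned by SELECT SESSION;)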
So given that your query returns only 1 row containing 1 integer, it will live in the session_1 JMeter variable. See the Debugging JDBC Sampler Results in JMeter article for comprehensive information on working with database query results in JMeter.
When the test completes you'll see an extra column in the .jtl results file holding your "session" value.
Although it doesn't exactly solve your question as posted, I will suggest a workaround using the "scope" of a listener (a listener only records items at the same or a lower level than the listener itself). Specifically: have two Summary Reports, one at the level of the test, the other (together with the sampler whose response you want to record) under a controller. For example:
Here I have samplers 1, 2, 3, 4, and I only want to save response data from sampler 2. So:
Summary Report - Doesn't save responses is at the global level, and it's configured not to save any response data. It only saves what I want saved for all samplers.
Summary Report - Saves '2' only is configured to save response data in XML format. Because this instance of the Summary Report is under the same controller as sampler 2, while the other samplers (1, 3, 4) are at a higher level, it will only record the responses of sampler 2.
So it doesn't exactly allow you to save response data from one sampler into the same file as all the other Summary Report data, but at least you can filter which responses you save.
Maybe you can try an assertion on ${__threadNum},
i.e. set the assertion condition to "${__threadNum}=1" and set your listener's "Log/display only" option to "successes".
This way it should log only the first response from the samplers.