Same transformation with different parameters in the job - Pentaho

I have a transformation with a parameter that I need to change 5 times.
I want to create a job that executes this transformation 5 times, changing my parameter value for each run, like:
Transformation Q1 - Parameter = 1
Transformation Q2 - Parameter = 2
...
How can I do this?

You could also use a transformation to generate the parameter values and pass its rows to the second transformation to fill in the parameter value.
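If the generating transformation reads the values from a database, its Table Input step (feeding a "Copy rows to result" step) could be as simple as this hedged sketch, assuming the five parameter values really are just 1 to 5:

-- one row per parameter value to copy to the result of the first transformation
SELECT 1 AS param_value
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5;

The second transformation then picks up param_value as its parameter, with the job's transformation entry set to execute for every input row.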

You can do this by passing the parameter in the 'Transformation Executor' step in the job.
You can find a sample Here. See the log after running my job.

Related

Pass SQL query values to a Data Factory variable as an array for a ForEach loop

Similar to this question: "how to pass variables to an Azure Data Factory REST URL's query string".
However, I have a pipeline that queries the Graph API, where I need to pass a user id as part of the URL to get their manager in order to build an Active Directory staff hierarchy. This is fine on an individual basis, or even as a predefined array variable where I insert ["xx","xxx"] into the pipeline variable, etc. My challenge is that I need to pass the results of a SQL query as the array variable. So, instead of defining the list of users, I need to pass the results of a SQL query into the ForEach loop.
I can use a Lookup with a Set Variable, but the URL seems to be malformed and has extra characters added in for some reason,
returning graph.microsoft.com/v1.0/users/%7B%7B%22id%22:%22xx9e7878-bwbbb-bwbwbwr-7897-414a8e60c78c%22%7D%7D/?$expand=xxxxxx, where the "%7B%7B%22id%22:%" and "%22%7D%7D/" parts are all unnecessary and appear to come from the JSON rather than just using the value.
The Lookup runs the query against SQL.
The Set Variable activity uses the Lookup's value (below) to assign to a pipeline variable as an array.
Then the ForEach loop uses the variable value in the source:
@concat('users/{',item(),'}/?$expand=manager($levels=max;$select=id,displayName,userPrincipalName,createdDate)')
If anyone can suggest how to construct the array value dynamically that would be great.
I have used
SELECT '["'+STRING_AGG(CONVERT(NVARCHAR(MAX),t.[id]),'","')+'"]' AS id FROM
stage.extract_msgraphapi_users t LEFT JOIN stage.extract_msgraphapi_users s ON s.id = t.id
and this returns something that looks like an array, ["xx","xxx"], but Data Factory still interpreted it as a string and not an array. Any help would be appreciated.
10 minutes later:
@concat('users/{',item().id,'}/?$expand=manager($levels=max;$select=id,displayName,userPrincipalName,createdDate)')
Note the reference to item().id to use the id property of each item in the array. Works like a dream for anyone else facing the same issue.
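For what it's worth, once item().id is used in the ForEach, the Lookup no longer needs to build a JSON string at all; returning plain rows is enough. A hedged sketch, reusing the table name from the question:

-- one id per row; the ForEach iterates the Lookup output and reads item().id
SELECT t.[id]
FROM stage.extract_msgraphapi_users t;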

Pass value from job to transformation in Pentaho

I have the following transformation in Pentaho PDI (note the question mark in the SQL statement):
The transformation is called from a job. What I need is to get the value from the user when the job is run and pass it to the transformation so the question mark is replaced.
My problem is that there are parameters, arguments and variables, and I don't know which one to use. How can I make this work?
What karan means is that your SQL should look like delete from REFERENCE_DATA where rtepdate = ${you_name_it}, with the Variable substitution box checked. The you_name_it parameter must be declared in the transformation options (click anywhere in the Spoon panel, then Options/Parameters), with or without a default value.
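Laid out as it would appear in the step's SQL box (a minimal sketch, with the table and parameter names taken from this answer; remember to tick Variable substitution):

-- ${you_name_it} is replaced at runtime when variable substitution is enabled
DELETE FROM REFERENCE_DATA
WHERE rtepdate = ${you_name_it};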
When running the transformation, you are prompted with a panel where you can set the values of the parameters, including you_name_it.
Parameters pass from job to transformation transparently, so you can declare you_name_it as a parameter of the job. Then, when the user runs the job, they will be prompted to give values for a list of parameters, including you_name_it.
Another way to achieve the same result is to use arguments. The question marks will be replaced by the fields specified in the Parameters list box, in the same order. Of course, the fields you use must be defined in a previous step; in your case, a Get Variables step, which reads the variables defined in the calling job and puts them in a row.
Note that there is a ready-made Delete step to delete records from a database. Specify the table name (which can be a parameter: just Ctrl+Space in the box), the table column and the condition. The condition will come from a previous step, defined with a Get Variables step as in the argument method.
You can use variables or arguments. If you are using variables, then use the
${variable1}
syntax in your query; if you want to use arguments, then you have to use ? in your query and list the names of those arguments in the "Field names to be used as arguments" section. Both will work. Let me know if you need further clarification.
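A hedged side-by-side sketch of the two forms, reusing the table from the question above:

-- variable form: enable variable substitution in the step
DELETE FROM REFERENCE_DATA WHERE rtepdate = ${variable1};

-- argument form: the ? is filled from the field listed under
-- "Field names to be used as arguments"
DELETE FROM REFERENCE_DATA WHERE rtepdate = ?;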

Informatica PC: how do I make a decision in the flow upon a scalar query result?

I'm struggling with what seems like the simplest thing there is: assigning a value to a mapping variable that I later use in my flow to make a decision... With my MS SSIS background this is a 10-second task; in Informatica PowerCenter, however, it is taking me hours...
So I have a mapping variable $$V_FF and a workflow variable $$V_FF. At first the names were different but while trying things out, I changed that. But that shouldn't matter, right?
In a mapping, I have a view as a source that returns -1, 0 or 1. The mapping variable aggregate function is set to MIN.
In the session that I have created for this mapping, I have a post-session assignment between the wf variable and the mapping variable.
In this mapping I use the SETVARIABLE function in an Expression Transformation block.
Every time I run the workflow, I see in the log that it uses a persistent value instead of assigning a new value each time the flow runs...
What am I missing here?
Thanks in advance!
Well, the variables here do work in a bit different way indeed. It would be easier to come up with a good answer if you'd explain the whole scenario: what are you using the variable for?
Anyway, the variable values are persisted in the repository and reused, as you've noticed. For your scenario you could add an Assignment Task to the workflow before your Session. Set some low value (e.g. -1 if you expect your variable to have some positive value after the mapping run) and use a Pre-session Variable Assignment to pass the value to the mapping. This will override the use of the persisted repository value. Of course, in this case you will need to use Max aggregation.
In the end, I managed to accomplish what I wanted back then. There might be a better way, but this solution is easy to maintain and easy to understand.
Create a variable in your workflow, let's say $$FailureFlag, with type integer.
Create a view in your DB that returns 1 row with an integer value between 0 and x, where x is a positive integer value (see the sketch after these steps).
Create a mapping with the view we just created as the source and use a dummy table as the destination.
In this mapping, also create a variable, let's say $$MYVAR, with type integer and aggregation "Count". In an Expression Transformation, assign the result of the view, column FF, to this variable $$MYVAR by using SETVARIABLE($$MYVAR,FF).
From this mapping, create a session in your workflow. On the Components tab, in the "Postsession_success_variable_mapping" section, add a row and link workflow variable $$FailureFlag with session variable $$MYVAR.
Add a Decision component right after the session you just created, and test the content of your workflow variable, for example $$V_FAILURE_FLAG_IMX = 1.
Then connect your Decision with your destination and add a test clause, for example:
"$MyDecision.Condition = true AND $MyDecision.PrevTaskStatus = succeeded"
Voila, that's it.
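A hedged sketch of the kind of view described in the second step above; the table behind the CASE is made up for illustration, only the contract matters: one row with one integer column FF between 0 and x.

-- returns exactly one row: FF = 1 when any load errors exist, otherwise 0
CREATE VIEW dbo.v_failure_flag AS
SELECT CASE WHEN EXISTS (SELECT 1 FROM dbo.load_errors) THEN 1 ELSE 0 END AS FF;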

Issues with parametrization of JDBC request with JMeter

I would like to use predefined queries from a CSV file.
The problem is that some of the values in the queries must be chosen randomly, and each query has a different number of parameters.
So I have tried something like this:
select * from table where column = ${variable1};
Please note that variable1 is already defined and has a proper value.
The problem is that JMeter executes the query without replacing the parameter with its value.
It is not an option to use "?" (question mark) as explained in the basic tutorial.
Does anybody have an idea how to solve this issue without writing custom code in a pre-processor like Beanshell, etc.?
It is possible to use JMeter variables in SELECT statements.
The reasons for the variable not being resolved can be:
(Most likely) The variable is not set. Use a Debug Sampler and View Results Tree listener combination to double-check its value.
You have a syntax error in your SQL query.
If you have a "composite" variable, where variable is a prefix and 1 is a random number coming from e.g. the __Random() or __threadNum() function, you need to reference the variable a little differently, like:
${__evalVar(variable${__threadNum})}
or
${__evalVar(${variable}${__Random(1,9,)})}
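For illustration, such an expression would sit directly in the JDBC Request's query box, for example (the table and column here are hypothetical):

-- the JMeter function is evaluated before the query is sent to the database
SELECT * FROM orders WHERE customer_id = '${__evalVar(variable${__threadNum})}';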

How to cope with null results in SQL Tasks that return single rows in SSIS 2005?

In a Data Flow task, I can slip a Row Count into the processing flow and place the count into a variable. I can later use that variable to conditionally perform some other work if the row count was > 0. This works well for me, but I have no corresponding strategy for SQL tasks expected to return a single row. In that event, I'm returning those values into variables. If the lookup produces no rows, the SQL task fails when assigning values into those variables.
I can branch on that component failing, but there's a side effect: if I'm running the job as a SQL Server Agent job step, the step returns DTSER_FAILURE, causing the step to fail. I can tell the SQL Agent to disregard the step failure, but then I won't know if I have a legitimate error in that step. This seems harder than it should be.
The only strategy I can think of is to run the same query with a count(*) aggregate and test if that returns a number > 0 and if so running the query again without the count. That's ugly because I have the same query in two places that I need to keep in sync.
Is there a better way?
In that same condition you can have additional logic (&& or ||). I would take one of the variables for your single-row statement and say something to the effect of:
If @User::rowcount > 0 || @User::single_record_var != Default
That should help.
What kind of SQL statement? Can you change it to still return a single row with all NULLs instead of no rows?
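As a hedged sketch of that idea (the table, columns and parameter are made up; the point is that an aggregate query without GROUP BY always returns exactly one row, with NULLs when nothing matches):

SELECT MAX(c.Name) AS Name,
       MAX(c.City) AS City
FROM dbo.Customer AS c
WHERE c.CustomerId = ?;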
What stops it from returning more than one row? The package would fail if it ended up returning more than one row, right?
You could also change it to call a stored procedure, and then call the stored procedure in two places without code duplication. You could also change it to be a view or a user-defined function (if parameters are needed), then use SELECT COUNT(*) FROM udf() to check whether there is data and SELECT * FROM udf() to get the row.
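A hedged sketch of that single-definition approach, with hypothetical names; both Execute SQL Tasks then reuse the same function:

-- define the lookup once as an inline table-valued function
CREATE FUNCTION dbo.udf_Lookup (@CustomerId INT)
RETURNS TABLE
AS
RETURN
(
    SELECT Name, City
    FROM dbo.Customer
    WHERE CustomerId = @CustomerId
);

-- first Execute SQL Task:  SELECT COUNT(*) FROM dbo.udf_Lookup(?);
-- second Execute SQL Task: SELECT * FROM dbo.udf_Lookup(?);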