Pentaho Kettle Mapping with parameterized Table Input

I am passing a value to a sub-transformation, and the sub-transformation receives the value fine; I have used a JavaScript step to Alert it.
But I have a Table input step in the sub-transformation where I need to use the parent transformation's value as a parameter to run a query against it. It is not working: the Table input step does not understand the field. How can I achieve this behavior?
I am stuck at this point and can't go further.
The only option left seems to be Pentaho jobs, but is it possible using Mapping inside a transformation?
I tried the setVariable function from JavaScript in the sub-transformation, but nothing works.

I expect that your sub-transformation is similar to the one in the figure below. Are you sure you are passing the parameters correctly? The important things are to:
have the same number of parameters in the Mapping input specification as parameters used in the Table input step
have Replace variables in script checked
have Insert data from step filled in
use the ? parameter placeholder in the SQL query
If you need to pass more parameters to the Table input, the number of fields in the previous step (the Mapping input specification in my example) needs to match the number of parameters you use in the Table input. Then you use ? more times in your query. E.g. for 3 params you could have:
WHERE name = ? AND surname = ? AND age = ?
You also need to respect the order in which the parameters come from the previous step; a fuller example is sketched below.
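As an illustration, a complete Table input query with three positional parameters might look like this (table and column names are hypothetical):
-- Hypothetical query: the three ? placeholders are filled, in order,
-- by the fields arriving from the Mapping input specification step.
SELECT id, name, surname, age
FROM customers
WHERE name = ? AND surname = ? AND age = ?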

Related

Azure data factory - pass multiple values from lookup into dynamic query?

I have a lookup activity that returns a list of valid GUID IDs in ADF. I then have a ForEach activity which runs a stored procedure for each GUID ID and assigns an ID column to it.
What I want to do is then have another lookup run the query below to bring me the GUID and also the newly assigned ID. It is very simple to write in SQL:
SELECT GUID, Identifier from DBO.GuidLOAD
WHERE GUID in ('GUIDID','GUIDID','GUIDID')
However, I am struggling to translate this into ADF. I have got as far as the concat part, and most of the help I find online only covers dynamic queries with single values as input parameters, whereas mine is a list of GUIDs that may contain one, many, or no entries at all.
Can someone advise the best way of writing this dynamic query?
The first two steps run fine; I just need the third lookup to run the query based on the output of the first lookup.
You can use string interpolation (@{...}) instead of concat(). I have a sample table with 2 records in my demo table as shown below:
Now, I have a sample lookup which returns 3 GUID records. The following is the debug output of the lookup activity.
Now, I have used a ForEach loop to create an array of the GUIDs returned by the lookup activity, using an Append variable activity. The Items value for the ForEach activity is @activity('get guid').output.value. The following is the configuration of the Append variable activity inside the ForEach:
@item().guids
I have used the join function on the above array variable to create a string which can be used in the required query:
"@{join(variables('req'),'","')}"
This produces a value like "guid1","guid2","guid3": the GUIDs joined by "," and wrapped in outer double quotes (the result is stored in a variable called final, which is used below).
Now, the query needs the GUIDs wrapped in single quotes, i.e., WHERE GUID in ('GUIDID','GUIDID','GUIDID'). So I created 2 parameters with the following values and used them to replace the double quotes in the final variable with single quotes:
singlequote: '
doublequote: "
Now in the look up where you want to use your query, you can build it using the below dynamic content:
SELECT guid, identifier from dbo.demo WHERE GUID in (@{replace(variables('final'),pipeline().parameters.doublequote,pipeline().parameters.singlequote)})
Now, when I debug the pipeline, the following query is executed, as can be seen in the debug input of the final lookup.
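With the sample data it renders to something like the following (the GUID values are illustrative; the real ones come from the first lookup):
-- Hypothetical rendered query after the replace() has swapped the quotes.
SELECT guid, identifier from dbo.demo
WHERE GUID in ('guid-value-1','guid-value-2','guid-value-3')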
The output is as expected: only one row is returned from the sample I have taken.

Loop for Pentaho where I redefine a variable on each execution

I have an Excel file with 300 rows. I need to use each of these rows as a field name in a transformation.
I was thinking of creating a job that, for each row of a table, sets a variable that I use afterwards in my transformation.
I tried defining a variable as the value I have in one row, and the transformation works. Now I need a loop that gets value after value, redefines the variable I created, and then executes the transformation.
I tried to define a job with the following flow:
Start -> Transformation (ExcelFileCopyRowsToResult) -> Set Variables -> Transformation (the one that executes using whatever the variable's value is at the moment).
The problem is that the variable I defined never changes, and because of that the transformation result is always the same.
Executing a transformation for each row in a result set is a standard way of doing things in PDI. You have most of it right, but instead of setting a variable (which only happens once in the job flow), use the result rows directly.
First, configure the second transformation to Execute for each row in its edit window.
You can then use one of two ways to pass the fields into the transformation, depending on which is easier for you:
Start the transformation with a Get rows from result step. This gets you one row each time; the fields are directly in the stream and can be used as such.
Pass the fields as parameters, so they can be used like variables (see the sketch after these steps). I use this one more often, but it takes a bit more setup:
Inside the second transformation, go to the transformation properties and enter the parameter names you want in the Parameters tab.
Save the transformation.
In the job, open the transformation's edit window and go to Parameters.
Click Get Parameters.
Type the field name from the first transformation under Stream Column Name for each parameter.
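For example, if one of the parameters is named FIELD_NAME, the second transformation could use it like a variable in a Table input, provided Replace variables in script? is checked on that step (the query and names are hypothetical):
-- Hypothetical query: ${FIELD_NAME} is substituted as plain text before
-- execution, so each run selects whichever column the current row names.
SELECT ${FIELD_NAME}
FROM source_table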

Pentaho compare values from table to a number from REST api

I need to build a dimension for a data warehouse using Pentaho.
I need to compare a number in a table with the number I get from a REST call.
If the number is not in the table, I need to set it to a default (999). I was thinking of using a Table input step with a select statement, and a JavaScript step that sets the result to 999 if it is null. The problem is that if there is no result, nothing is passed through at all. How can this be done? Another idea was to get all values from that table and somehow convert them into something I can read as an array in JavaScript. I'm very new to Pentaho DI and I've done some research, but I couldn't find what I was looking for. Does anyone know how to solve this? If you need more information, or want to see my transformation, let me know!
The steps would be something like this:
Load number from API
Get numbers from table
A) If number not in table -> set number to 999
B) If number is in table -> do nothing
Continue the transformation with that number
I have this at the moment:
But the problem is that if the number is not in the table, it returns nothing. I was trying to check in JavaScript whether the number is null or 0, and then set it to 999.
Thanks in advance!
Replace the rain-type Table input with a Stream lookup.
You read the main input with a REST step and the dimension table with a Table input, then add a Stream lookup step in which you specify the Table input as the lookup step. In this step you can also specify a default value of 999.
The Stream lookup works like this: for each row coming in from the main stream, the step looks up whether it exists in the reference step and adds the reference fields to the row, so one and exactly one row always passes through. In SQL terms the effect is like the left join sketched below.
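As a sketch, what the Stream lookup computes is roughly this SQL (table and column names are hypothetical):
-- Left join with a default: COALESCE supplies 999 when no match exists,
-- so every incoming row still produces exactly one output row.
SELECT m.number,
       COALESCE(d.number, 999) AS dimension_number
FROM main_stream m
LEFT JOIN dimension_table d ON d.number = m.number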

set variable as today and yesterday in pentaho di

I am creating a transformation in Pentaho DI to extract data from Google Analytics. I need to set the start date and end date in the Query Definition to yesterday and today. I understand this can be done by creating two variables, e.g. ${today} and ${yesterday}. However, I don't know how to make them change values dynamically at every run. Any idea how to do this?
Thanks,
I can think of an easy way to do this. The first thing to know is that you can't declare and use variables in the same transformation. I would suggest you approach this problem in the following way:
Create a transformation before this one, say "set variables transformation". In this transformation you will set the variables.
You can use the Get System Info step to get today's and yesterday's dates as the variables. Use a Copy rows to result step to pass these rows to the next transformation.
In the next transformation, which will be the one you have attached the screenshot of, use the Get Variables step and use these variables in your input step. Or you can use the Get rows from result step as well.
You don't need to worry about the dates any more, because they will be generated and your variables will get their values dynamically at each run.
You can check this article if you want to learn more about how to pass the values from one transformation to another:
https://anotherreeshu.wordpress.com/2014/12/23/using-copy-rows-to-result-in-pentaho-data-integration/
Hope it helps!
For that, you have to use a job: add the first transformation, and inside it use a Get System Info step to get today's and yesterday's dates, linked to a Set Variables step with the variable scope set to Valid in the parent job.
In the second transformation, use Get Variables.
It took me a while to solve this myself, and the way I ended up doing it is as follows:
I created a transformation (called "set formatted_today variable") that contains two objects:
A Table input object with a query like:
select to_char(current_timestamp, 'YYYY-MM-DD-HH-MI') as formatted_today
The output of my Table input goes to a Set Variables object; you can use the Get Fields button to wire the fields you've named in your query to the variables you want to set. In this case, my field is called formatted_today and so is my variable.
In my main job, I have a Set session variables object that creates my formatted_today variable.
Immediately after it, I call my "set formatted_today variable" transformation.
Anywhere I need this variable, I insert ${formatted_today} in the text.
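If you also need yesterday's date, the same pattern works with a second field in the query, each wired to its own variable. A variant might look like this (the date arithmetic assumes a PostgreSQL-style database):
-- Hypothetical variant: both values come back in one row; wire the
-- formatted_today and formatted_yesterday fields to variables of the same names.
select to_char(current_timestamp, 'YYYY-MM-DD-HH-MI') as formatted_today,
       to_char(current_timestamp - interval '1 day', 'YYYY-MM-DD-HH-MI') as formatted_yesterday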

Using variable in transformations

I have a requirement where a particular Kettle transformation (KTR) needs to be run multiple times.
This is the scenario:
A transformation has a Table input which pulls user details belonging to a particular country.
I have about 5 countries, and these are saved in a separate table.
Can I make a variable, assign the country name to it, and run the same transformation in a loop of five iterations, where each time the variable gets updated to the next country name?
I need the variable to be used in the Table input query and in a column name as well.
This is how I mentioned the variable in the Table input:
When I give the variable as a value, in the output I get the literal text '${COUNTRY}' instead of the value of the variable.
PDI allows you to do multiple iterations using a variable. You need to use the Copy rows to result step in Kettle. I have a blog post written on this topic:
Blog link: https://anotherreeshu.wordpress.com/2014/12/23/using-copy-rows-to-result-in-pentaho-data-integration/
Please check if it helps you. :)
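As for the literal '${COUNTRY}' appearing in the output: variable substitution in a Table input only happens when Replace variables in script? is ticked on the step. With that in place, a query along these lines could use the variable both as a value and in a column alias (table and column names are hypothetical):
-- Hypothetical query: ${COUNTRY} is replaced as plain text before the
-- statement runs, so it works in an alias as well as in a string literal.
SELECT user_id,
       user_name,
       '${COUNTRY}' AS country_${COUNTRY}
FROM user_details
WHERE country = '${COUNTRY}'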