I have a requirement where a particular Kettle transformation (ktr) needs to be run multiple times.
This is the scenario:
A transformation has a table input which pulls user details belonging to a particular country.
I have around 5 countries, and the list of countries is saved in a separate table.
Can I create a variable, assign a country name to it, and run the same transformation in a loop five times, with the variable updated to the next country name on each iteration?
I need the variable to be used in the Table Input query and in a column name as well.
This is how I referenced the variable in the Table Input.
When I pass the variable as the value, the output contains the literal string '${COUNTRY}' instead of the variable's value.
PDI allows you to do multiple iterations using a variable. You need to use the "Copy rows to result" step in Kettle. I have a blog written on this topic.
Blog Link : https://anotherreeshu.wordpress.com/2014/12/23/using-copy-rows-to-result-in-pentaho-data-integration/
Please check if it helps you. :)
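For reference, a minimal sketch of what the Table Input query can look like. The table and column names are hypothetical, and the "Replace variables in script?" checkbox in the Table Input step must be ticked; otherwise ${COUNTRY} is sent to the database literally, which matches the symptom described above:

    -- Hypothetical table/column names. PDI substitutes ${COUNTRY}
    -- textually before the query is sent, so the variable works
    -- both as a quoted value and inside an identifier.
    SELECT user_id, user_name, '${COUNTRY}' AS source_country
    FROM user_details
    WHERE country = '${COUNTRY}'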
I have to check whether multiple tables exist or not. Please help me with this. ETL.PLC_ACCOUNT is one such table, and I have to check multiple tables (see screenshot).
According to the screenshot you attached, you're trying to check the table name in a job entry. That entry only accepts a single static value or variable at a time.
Instead, if you want to validate multiple tables, use the transformation step "Table exists". Make sure a previous step sets a field containing the list of table names you want to validate.
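If your database exposes the standard information_schema, a plain SQL check in a Table Input step is another option. A hedged sketch (PLC_CUSTOMER is a made-up second table name for illustration):

    -- Returns one row per table that actually exists;
    -- any name missing from the result set does not exist.
    SELECT table_name
    FROM information_schema.tables
    WHERE table_schema = 'ETL'
      AND table_name IN ('PLC_ACCOUNT', 'PLC_CUSTOMER')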
I have an excel with 300 rows. I need to use each of these rows as a field name in a transformation.
I was thinking of creating a job that, for each row of a table, sets a variable that I then use in my transformation.
I tried defining a variable as the value I have in one row and the transformation works. Now I need a loop that gets value after value and redefines the variable I created then executes the transformation.
I tried to define a Job that has the following:
Start -> Transformation(ExcelFileCopyRowsToResult) -> SetVariables -> Transformation(The transf that executes using whatever the variable name is at the moment).
The problem is that the variable I defined never changes and the transformation result is always the same because of that.
Executing a transformation for each row in a result set is a standard way of doing things in PDI. You have most of it correct, but instead of setting a variable (which only happens once in the job flow), use the result rows directly.
First, configure the second transformation's job entry to Execute for every input row (in its edit window).
You can then use one of two ways to pass the fields into the transformation, depending on which is easier for you:
Start the transformation with a Get rows from result step. This gets you one row per execution; the fields are directly in the stream and can be used as such.
Pass the fields as parameters, so they can be used like variables (see the sketch after these steps). I use this one more often, but it takes a bit more setup.
Inside the second transformation, go to the properties and enter variable names you want in the Parameters tab.
Save the transformation.
In the job, open the transformation edit window and go to Parameters.
Click Get Parameters.
Type the field name from the first transformation under Stream Column Name for each parameter.
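Once wired up this way, the parameter behaves like a variable inside the second transformation. A minimal sketch of a Table Input query using it (FIELD_NAME and my_table are hypothetical names, and "Replace variables in script?" must be checked):

    -- FIELD_NAME holds a column name coming from the Excel row;
    -- PDI substitutes it textually before the query executes.
    SELECT ${FIELD_NAME}
    FROM my_table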
I'm new to PDI and Kettle, and what I thought was a simple experiment to teach myself some basics has turned into a lot of frustration.
I want to check a database to see if a particular record exists (i.e. vendor). I would like to get the name of the vendor from reading a flat file (.CSV).
My first hurdle is selecting only the vendor name from the 8 fields in the CSV.
The second hurdle is how to use that vendor name as a variable in a database query.
My third issue is what type of step to use for the database lookup.
I tried a dynamic SQL query, but I couldn't determine how to build the query using a variable, then how to pass the desired value to the variable.
The database table (VendorRatings) has 30 fields, one of which is vendor. The CSV also has 8 fields, one of which is also vendor.
My best effort was to use a dynamic query using:
SELECT * FROM VENDORRATINGS WHERE VENDOR = ?
How do I programmatically assign the desired value to "?" in the query? Specifically, how do I link the output of a specific field from Text File Input to the "vendor = ?" SQL query?
The best practice is a Stream lookup. For each record in the main flow (VendorRatings), look up the vendor details (lookup fields) in the reference file (the CSV), based on its identifier (possibly its number, name, or firstname+lastname).
First "hurdle" : Once the path of the csv file defined, press the Get field button.
It will take the first line as header to know the field names and explore the first 100 (customizable) record to determine the field types.
If the name is not on the first line, uncheck the Header row present, press the Get field button, and then change the name on the panel.
If there is more than one header row or other complexities, use the Text file input.
The same is valid for the lookup step: use the Get lookup fields button and delete the fields you do not need.
Given that there is at most one VendorRating per vendor, and that you have to do something if there is no match, I suggest the following flow:
Read the CSV and, for each row, look up in the table (i.e. the lookup table is the SQL table rather than the CSV file), with a default value upon no match. I suggest something really visible like "--- NO MATCH ---".
Then, in case of no match, a Filter rows step redirects the flow to the alternative action (here: insert into the SQL table). The two flows are then merged back into the downstream flow.
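As for the "?" in the question: a Table Input step can bind it. Under "Insert data from step", select the step that supplies the vendor field (make sure it carries only that field, e.g. via a Select values step, since "?" placeholders are filled from the incoming fields in order), and enable "Execute for each row?". A sketch under those assumptions:

    -- '?' is replaced by the vendor value of each incoming row
    SELECT * FROM VendorRatings WHERE vendor = ?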
I’m new to Pentaho and I’m currently having an issue with mapping specific row values to an ID.
I have a data file with around 30 columns, one of which is for currencies (USD, GBP, AUD, etc).
The main objective is to have the user select up to 8 (minimum of 1) currencies and map them to a corresponding ID 1-8. All other currencies not in the specified 8 will be mapped with an ID of 9.
The final step is to then output the original data set, along with the IDs.
I’m pretty sure I’m making this way harder than it should be, but here is what I have at the moment.
I have created a job where the first step is to set the variables for my 8 currencies, selectionOne -> AUD, selectionTwo -> GBP, …, selectionEight -> JPY.
I then have a transformation to read the data from the file and use the copy rows to result step.
Following that I have a second job called for-each which is my loop for checking the current currency in the row.
Within this job I have two transformations, one called set-current, one called map-currencies.
set-current simply uses the Get rows from result step (to grab the data from the first transformation). I then use the Set variables step to set the current currency to the value in the currency field. This works fine, as each pass through the loop changes the current variable to the correct value.
Map-currencies is where I’m having the most issues.
The goal is to use the filter row step to compare the current currency against the original 8 selected currencies, and then using the value mapper step to map it to an ID, before outputting the csv file.
The main issue here, is that I can’t use my original variables in the filter or value mapper.
So, what I’ve done here is use the get variables step to retrieve the variables and named them: one, two, three, …, eight. This allows me to bypass the filtering issue, but they don’t seem to work for the value mapper, which is the all important step.
The second issue is that when the file is output it only outputs one value (because of the loop), selecting the append option works, but this could be a problem if the job is run more than once.
However, the priority here is the mapping issue.
I understand that this is rather long, and perhaps a tad confusing, but I will greatly appreciate any help on this, even if it’s an entirely new approach 😊.
Like I said, I’m probably making it harder than it should be.
Thanks for your time.
Edit for AlainD
Input example
Output example
This should be doable in a single transformation using the Stream Lookup step.
Text File Input reads your main file; Property Input reads your property file into Key and Value columns. You could use a normal text file with two columns instead of the Property Input.
Below are the settings of the Stream lookup. Note the default value "9" for records that are not found in the lookup stream.
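For illustration, the property file could be as simple as this (the currency codes and IDs are just an example; the Key column feeds the lookup key, the Value column the returned ID):

    AUD=1
    GBP=2
    USD=3
    JPY=4

In the Stream lookup, the key to look up is the currency field from the main file against Key, the value to retrieve is Value, and the default for non-matches is 9.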
I'm trying to process some data and store it in a data warehouse. To do this, I wanted to store the dimensions in one transformation and the fact (I only have one) in another transformation, so I can use a job to execute the first one, copy rows to result, and get them in the second transformation.
In the first transformation, I read an Excel file and separate the data into several streams. It is data from a baptism, so I have one stream for the person, another for the parents, another for the sponsors, and so on. At the end of each stream, I insert the data into the database and return the autogenerated PK (an autoincrement id).
In the second one, I only have Get rows from result and want to write the rows to a txt file (just to see that it is being done correctly). The problem is that the file is created but it is empty. I assumed that if I leave the fields in Get rows from result empty, it gets all fields.
What am I doing wrong?
At the end what I want is to have one Copy rows to result at the end of each stream in the first transformation and get all this data in the second one.
In "Insert Pare Padrina" I return id_pare_padrina which is autogenerated, and the same with "Insert Mare Padrina" (I have more streams which I also have to include them into result). This transformation is not executed per row because I need values of other rows.
Thank you!
In order to pass the data from the first transformation to the second transformation, you need to set certain options:
1. First of all, in the transformation settings of the second transformation (at the job level), check the items as in the image below:
Copy previous results to parameters will ensure that all the results/data from the "Copy rows to result" step are properly passed to the next level.
Execute for every input row: will execute the second transformation for every row of the first transformation's output. This is optional, based on your requirement.
2. In the same transformation settings, define the "Parameters" in the Parameters tabs. Check the image below:
Here, NAME is the parameter I have defined. So when you are using "Get rows from result", you can define these parameter names.
3. Instead of using "Get rows from result", you can alternatively use the "Get Variables" step to fetch all the variables coming from the previous step. All you need to do is define the parameter names inside the ktr file (Ctrl+T). (I have implemented it that way in practice and it worked for me.)
4. Since "copy rows to result" step uses heap memory, defining multiple instances of this step might exhaust the memory space quickly and your code might fall in trouble. Ideally use a single instance of this step.
But if your data interation is only one row, best option would be to use "set variables" step.
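With the NAME parameter defined as in step 2, the second transformation can reference it anywhere variables are accepted. A hedged sketch for a Table Input query (the table and column names are hypothetical, and "Replace variables in script?" must be checked):

    -- NAME is filled from the result row(s) copied
    -- over from the first transformation
    SELECT * FROM person WHERE person_name = '${NAME}'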
I assume you might have missed some of these sections in the job.
You can read more on Copy rows to result here.
Hope it helps :)