I have an Excel file with 300 rows. I need to use each of these rows as a field name in a transformation.
I was thinking of creating a job that, for each row of a table, sets a variable that I then use in my transformation.
I tried defining a variable with the value from one row, and the transformation works. Now I need a loop that fetches value after value, redefines the variable I created, and then executes the transformation.
I tried to define a Job that has the following:
Start -> Transformation(ExcelFileCopyRowsToResult) -> SetVariables -> Transformation(the transformation that executes using whatever the variable's value is at the moment).
The problem is that the variable I defined never changes, so the transformation result is always the same.
Executing a transformation for each row in a result set is a standard way of doing things in PDI. You have most of it correct, but instead of setting a variable (which only happens once in the job flow), use the result rows directly.
First, configure the second transformation to Execute for each row in the Edit window.
You can then use one of two ways to pass the fields into the transformation, depending on which is easier for you:
1) Start the transformation with a Get rows from result step. This gives you one row at a time, and the fields arrive directly in the stream, where they can be used as such.
2) Pass the fields as parameters, so they can be used like variables. I use this one more often, but it takes a bit more setup:
- Inside the second transformation, open the transformation properties and enter the parameter names you want on the Parameters tab.
- Save the transformation.
- In the job, open the transformation's edit window and go to the Parameters tab.
- Click Get Parameters.
- For each parameter, type the corresponding field name from the first transformation under Stream Column Name.
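If it helps, here is a minimal Python sketch of what option 2 amounts to; the field name, parameter name, and run_transformation function are invented for illustration, and this is only a mental model, not PDI's actual machinery:

# Rows produced by the first transformation's "Copy rows to result" step.
result_rows = [
    {"field_name": "customer_id"},
    {"field_name": "order_date"},
]

def run_transformation(params):
    # Stand-in for the second transformation; inside it the value
    # is available as ${FIELD_NAME}.
    print("processing field:", params["FIELD_NAME"])

# "Execute for each row" plus the Stream Column Name mapping: one run
# per result row, with the row's field feeding the parameter.
for row in result_rows:
    run_transformation({"FIELD_NAME": row["field_name"]})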
I have one table, named QueryTable, that stores 4 SQL queries, each with different metadata.
I want to store the results of these four queries in an Excel sheet.
1) First I added an Execute SQL Task, configured the connection, set the Result Set to Full Result Set, and entered the query statement.
Then, on the Result Set tab, I created Query_variable with the Object type.
2) I dragged in a Foreach Loop Container, chose the Foreach ADO Enumerator in the Collection pane, and assigned Query_variable.
In the Variable Mappings pane I created a new String-type variable to receive each of the four queries.
3) Finally I added one Data Flow Task with an OLE DB Source configured from that same variable (the one mapped in the Foreach Loop Container).
Right now it shows the default value I gave to User::Variable.
I can iterate queries with the same number of columns (the same metadata) and store them in an Excel destination.
But the problem is that when the variable moves on to a query with fewer or more columns, the package fails; it cannot handle tables with different metadata.
Please assist me: can we iterate over queries with different metadata in the same loop and get proper output?
I hope I have explained exactly the problem I am facing.
Set the default value of User::Variable to one of the queries, so that BIDS can validate it at design time.
You can also try setting "DelayValidation" to true, but that might not be enough in this case.
Set DelayValidation to true for both the Data Flow Task and the Foreach Loop Container.
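The underlying constraint is that an SSIS data flow's column layout is fixed at design time, so a variable that switches to a query with a different column set breaks the pipeline no matter how validation is delayed. A rough Python sketch of that constraint (the queries and columns are invented for illustration):

# The data flow's column layout is fixed when the package is designed.
design_time_columns = {"id", "name"}

# Two result sets with different metadata (invented examples).
query_results = [
    [{"id": 1, "name": "a"}],                 # matches the design
    [{"id": 2, "name": "b", "price": 9.99}],  # extra column
]

for result in query_results:
    for row in result:
        if set(row) != design_time_columns:
            # This is roughly where the SSIS package fails.
            print("metadata mismatch:", sorted(row))
        else:
            print(row["id"], row["name"])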
I’m new to Pentaho and I’m currently having an issue with mapping specific row values to an ID.
I have a data file with around 30 columns, one of which is for currencies (USD, GBP, AUD, etc).
The main objective is to have the user select up to 8 (minimum of 1) currencies and map them to a corresponding ID 1-8. All other currencies not in the specified 8 will be mapped with an ID of 9.
The final step is to then output the original data set, along with the IDs.
I’m pretty sure I’m making this way harder than it should be, but here is what I have at the moment.
I have created a job where the first step is to set the variables for my 8 currencies, selectionOne -> AUD, selectionTwo -> GBP, …, selectionEight -> JPY.
I then have a transformation to read the data from the file and use the copy rows to result step.
Following that I have a second job called for-each which is my loop for checking the current currency in the row.
Within this job I have two transformations, one called set-current, one called map-currencies.
set-current simply uses the Get rows from result step (to grab the data from the first transformation). I then use the Set Variables step to set the current currency to the value in the currency field. This works fine, as each pass through the loop changes the current variable to the correct value.
Map-currencies is where I’m having the most issues.
The goal is to use the Filter rows step to compare the current currency against the original 8 selected currencies, and then use the Value Mapper step to map it to an ID before outputting the CSV file.
The main issue here, is that I can’t use my original variables in the filter or value mapper.
So, what I’ve done here is use the Get Variables step to retrieve the variables, naming them: one, two, three, …, eight. This allows me to bypass the filtering issue, but they don’t seem to work for the Value Mapper, which is the all-important step.
The second issue is that when the file is output, it only contains one value (because of the loop). Selecting the append option works, but that could be a problem if the job is run more than once.
However, the priority here is the mapping issue.
I understand that this is rather long, and perhaps a tad confusing, but I will greatly appreciate any help on this, even if it’s an entirely new approach 😊.
Like I said, I’m probably making it harder than it should be.
Thanks for your time.
Edit for AlainD
Input example
Output example
This should be doable in a single transformation using the Stream Lookup step.
Text File Input reads your main file; Property Input reads your property file into Key and Value columns. You could use a normal text file with two columns instead of the property input.
Configure the Stream lookup to match on the currency column and retrieve the ID from the lookup stream; note the default value "9" for records that are not found in the lookup stream.
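The underlying logic is just a keyed lookup with a default; a minimal Python sketch (the currency values and column names are invented for illustration):

# Lookup stream: the user's selected currencies mapped to IDs 1..8.
selected = {"AUD": 1, "GBP": 2, "USD": 3, "JPY": 4}

rows = [{"currency": "USD"}, {"currency": "EUR"}, {"currency": "GBP"}]

# Stream lookup with default "9": anything not in the lookup stream gets 9.
for row in rows:
    row["currency_id"] = selected.get(row["currency"], 9)

print(rows)  # EUR falls through to the default ID 9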
I am creating a transformation in Pentaho DI to extract data from Google Analytics. I need to set the start date and end date in "Query Definition" to yesterday and today. I understand this can be done by creating two variables, e.g. ${today} and ${yesterday}. However, I don't know how to make these change values dynamically at every run. Any idea on how to do this?
Thanks,
I can think of an easy way to do this. The first thing to know is that you can't declare and use variables in the same transformation. I would suggest you approach the problem in the following way:
Create a transformation before this one, say a "set variables" transformation. In this transformation you will set the variables.
You can use the Get System Info step to get today's and yesterday's dates and set them as the variables. Use the Copy rows to result step to pass these rows to the next transformation.
In the next transformation, which is the one you attached the screenshot of, use the Get Variables step and use these variables in your input step. Or you can use the Get rows from result step as well.
You don't need to worry about the dates now: they will be generated at every run and your variables will get their values dynamically.
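For reference, the date values the Get System Info step hands you amount to the following; a minimal Python sketch (the date format is an assumption, so adjust it to whatever the Query Definition expects):

from datetime import date, timedelta

today = date.today()
yesterday = today - timedelta(days=1)

# These would become ${today} and ${yesterday} via the Set Variables step.
variables = {
    "today": today.strftime("%Y-%m-%d"),
    "yesterday": yesterday.strftime("%Y-%m-%d"),
}
print(variables)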
You can check this article if you want to learn more about how to pass the values from one transformation to another:
https://anotherreeshu.wordpress.com/2014/12/23/using-copy-rows-to-result-in-pentaho-data-integration/
Hope it helps!
For that, you have to use a job: add the first transformation, and inside it use the Get System Info step to get today's and yesterday's dates, then link it to a Set Variables step. Set the scope of the variables to the parent job.
In the second transformation, use the Get Variables step.
It took me a while to solve this myself, and the way I ended up doing it is as follows:
I created a transformation (called 'set formatted_today variable') that contains two steps:
A 'Table input' step with a query like:
select to_char(current_timestamp, 'YYYY-MM-DD-HH-MI') as formatted_today
The output of my 'Table input' goes to a 'Set variables' step; you can use the 'Get Fields' button to wire the fields you've named in your query to the variables you want to set. In this case, my field is called 'formatted_today' and so is my variable.
In my main job, I have a 'set session variables' object that creates my 'formatted_today' variable.
Immediately after it, I call my 'set formatted_today variable' transformation.
Anywhere I need this variable, I insert ${formatted_today} in the text.
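For example (a hypothetical filename, just to illustrate the substitution), an output file named report_${formatted_today}.csv would resolve at run time to something like report_2024-01-15-09-30.csv, following the YYYY-MM-DD-HH-MI format from the query.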
Fairly straightforward question I think, I just haven't been able to find a clear example. I have a very complex transformation that I'm breaking down into a job. Having never created a job before, I'm struggling to send the data from one transformation to another. I used Copy Rows to Result in the first one and Get Rows From Result in the second one, but I feel like I'm still missing something. When I used Get Rows, I had to specify the row names - there was no sort of Get Fields button. I also can't preview the data in the transformation without running the job and having it save to an Excel file. When I did that, ALL of the fields were in the output file -- instead of just the ones I'd specified in the second transformation.
I've searched through the documentation and tried Googling but I can't find a clear walkthrough just on how to smoothly move data from one transformation to another. Any responses would be appreciated even if it's just pointing me towards something I've overlooked.
Thanks!
The most common way is to use Copy rows to result at the end of one KTR and Get rows from result as the starting point of the next one. Though you really can't "see" the result while working in the next KTR, what you can do to ease the reading is set up a preview window and leave it open to see all the column names and data.
However, if you want to pass just a few values through to the next KTR, you can use Set variables as the ending step of the first KTR and capture those variables at any time in the second one using Get Variables steps. Don't forget that if you do so, you need to define the variables in the parent KJB (the job that called the first KTR) with no default value, and the variable scope type of the Set variables step has to be set to 'Valid in the parent job'.
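As a mental model, Copy rows to result buffers the rows in the job and Get rows from result replays them; a rough Python sketch (the field names are invented for illustration):

# First KTR: ends with "Copy rows to result".
result_buffer = []
for row in [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]:
    result_buffer.append(row)

# Second KTR: starts with "Get rows from result". There is no Get Fields
# button here, so you must type the field names yourself, and they must
# match what the first KTR produced.
for row in result_buffer:
    print(row["id"], row["name"])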
The best way is to create the KTRs and run/test each one on its own. That way you can examine the resulting data and then integrate all the individual transformations into the final job.
I'm trying to process some data and store it in a data warehouse. To do this, I wanted to store the dimensions in one transformation and the fact (I only have one) in another. That way I can use a job to execute the first one, copy rows to result, and get them in the second transformation.
In the first transformation, I read an Excel file and split this data into several streams. It is data from baptisms, so I have one stream for the person, another for the parents, another for the sponsors, and so on. At the end of each stream, I insert the data into the database and get back the auto-generated PK (an auto-increment id).
In the second one, I only have Get rows from result and want to write the rows to a txt file (just to see that it is being done correctly). The problem is that the file is created but it is empty. I assumed that if I leave the fields in Get rows from result empty, it gets all fields.
What am I doing wrong?
In the end, what I want is to have one Copy rows to result at the end of each stream in the first transformation and to get all this data in the second one.
In "Insert Pare Padrina" I return id_pare_padrina, which is auto-generated, and the same with "Insert Mare Padrina" (I have more streams which I also have to include in the result). This transformation is not executed per row because I need values from other rows.
Thank you!
In order to pass the data from the first transformation to the second one, you need to set certain things up:
1. First of all, in the transformation settings of the second transformation (at the job level), enable the following options:
Copy previous results to parameters will ensure that all the results/data from the "Copy rows to result" step are properly passed to the next level.
Execute for every input row will execute the second transformation once for every row coming from the first transformation. This is optional, depending on your requirement.
2. In the same transformation settings, define the parameters on the Parameters tab.
Here, NAME is the parameter I have defined. So when you use "Get rows from result", you can define these parameter names.
3. Instead of "Get rows from result", you can alternatively use the "Get Variables" step to fetch all the variables coming from the previous step. All you need to do is define the parameter names inside the KTR file (Ctrl+T). (I have actually implemented it this way and it worked for me.)
4. Since the "Copy rows to result" step uses heap memory, defining multiple instances of this step might exhaust the memory quickly and your job might run into trouble. Ideally, use a single instance of this step.
But if your data iteration is only one row, the best option would be to use the "Set variables" step.
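To make the row-versus-parameter distinction concrete, here is a rough Python sketch of the two hand-off styles; the NAME parameter comes from point 2 above, while the row values and the run_second_ktr function are invented for illustration:

# Style A: "Get rows from result" - the second KTR receives whole rows.
result_rows = [{"NAME": "pare"}, {"NAME": "mare"}]
for row in result_rows:
    print("row:", row["NAME"])

# Style B: "Execute for every input row" + "Copy previous results to
# parameters" - each run of the second KTR sees one value as ${NAME},
# readable with the Get Variables step.
def run_second_ktr(parameters):
    print("parameter NAME =", parameters["NAME"])

for row in result_rows:
    run_second_ktr({"NAME": row["NAME"]})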
I assume you might have missed some of these settings in the job.
You can read more on copy rows to result here.
Hope it helps :)