How to specify a constant as the target value for a column in KTR - Pentaho

I am trying to set up a simple KTR to copy over data from one table to another. In the target table, I have a column called JobId which is not mapped to anything in the source table (the idea is to capture the KTR job id in this column).
I notice that if I do not include this column in the mapping, then the SQL generated drops the column from the table altogether (this happens irrespective of whether I set the update flag to Y or N), which is not what I want.
I would like to know how to set the target column to a constant, or how to leave it unaltered altogether.
Thanks

Use the Add constants step, or the Get Variables step. In Get Variables you can fetch the Internal.Transformation.Name variable if you like.
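For illustration, once the constant field exists in the stream and is mapped in the Table Output step, the generated SQL keeps the column. Conceptually the insert becomes something like the sketch below (target_table, the other column names, and the value 42 are illustrative assumptions, not from the question):

-- Conceptual sketch: JobId travels as an ordinary stream field,
-- so the generated statement keeps the column instead of dropping it
INSERT INTO target_table (col1, col2, JobId)
VALUES (?, ?, 42);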

Related

Checking multiple table existence in Pentaho

I have to check whether multiple tables are available or not. Please help me with this. Here ETL.PLC_ACCOUNT is one such table, and I have to check multiple tables (see screenshot).
According to the screenshot you attached, you're trying to check the table name in a job entry. That only allows one static value or one variable at a time.
Instead, if you want to validate multiple tables, use the transformation step "Table exists". Make sure a previous step sets a field containing the list of table names you want to validate.
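If you prefer checking in SQL instead, a Table input step can query the database catalog directly. A minimal sketch, assuming the database exposes information_schema; PLC_CUSTOMER is an illustrative second table name alongside the one from the question:

-- List which of the expected tables actually exist in the ETL schema
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'ETL'
  AND table_name IN ('PLC_ACCOUNT', 'PLC_CUSTOMER');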

Set default value for column in Mosaic Decisions

I’m using a data flow in Mosaic Decisions with a MySQL writer node. The result set I’m going to write has a field inserted-time, but I want to skip writing this column and instead use the default value defined for that column in the DB table. How do I do that?
You can simply drag the column you want to skip into the "skip-insert-column" section of the writer node.
In the screenshot example, the column "Target" will not be inserted into the target table, and whatever default value is set for that column in the DB table will be applied automatically.
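The mechanism behind this is the database-side default. A minimal SQL sketch of the idea (table and column names are illustrative):

-- The column default fills in whatever the writer skips
CREATE TABLE demo_target (
  id            INT,
  payload       VARCHAR(50),
  inserted_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Equivalent of a skip-insert-column write: inserted_time is not listed,
-- so the database applies CURRENT_TIMESTAMP automatically
INSERT INTO demo_target (id, payload) VALUES (1, 'abc');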

Loop for Pentaho where I redefine a variable on each execution

I have an Excel file with 300 rows. I need to use each of these rows as a field name in a transformation.
I was thinking of creating a job that for each row of a table sets a variable that I use afterwards on my transformation.
I tried defining the variable as the value I have in one row, and the transformation works. Now I need a loop that gets value after value, redefines the variable I created, and then executes the transformation.
I tried to define a Job that has the following:
Start -> Transformation(ExcelFileCopyRowsToResult) -> SetVariables -> Transformation(The transf that executes using whatever the variable name is at the moment).
The problem is that the variable I defined never changes and the transformation result is always the same because of that.
Executing a transformation for each row in a result set is a standard way of doing things in PDI. You have most of it correct, but instead of setting a variable (which only happens once in the job flow), use the result rows directly.
First, in the job entry's edit window, configure the second transformation to execute for every input row.
You can then use one of two ways to pass the fields into the transformation, depending on which is easier for you:
Start the transformation with a Get rows from result step. This gets you one row at a time; the fields are directly in the stream and can be used as such.
Pass the fields as parameters, so they can be used like variables. I use this one more often, but it takes a bit more setup.
Inside the second transformation, open the transformation properties and enter the parameter names you want in the Parameters tab.
Save the transformation.
In the job, open the transformation edit window and go to Parameters.
Click Get Parameters.
Type the field name from the first transformation under Stream Column Name for each parameter.
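Once a parameter is defined this way, it can be referenced like any variable inside the second transformation. A hypothetical sketch for a Table input step (FIELD_NAME is an assumed parameter name; the step's "Replace variables in script" option must be enabled):

-- ${FIELD_NAME} is substituted with the value passed from the result row,
-- so each execution selects a different column
SELECT ${FIELD_NAME}
FROM source_table;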

PDI /Kettle - Passing data from previous hop to database query

I'm new to PDI and Kettle, and what I thought was a simple experiment to teach myself some basics has turned into a lot of frustration.
I want to check a database to see if a particular record exists (i.e. vendor). I would like to get the name of the vendor from reading a flat file (.CSV).
My first hurdle is selecting only the vendor name from the 8 fields in the CSV.
The second hurdle is how to use that vendor name as a variable in a database query.
My third issue is what type of step to use for the database lookup.
I tried a dynamic SQL query, but I couldn't determine how to build the query using a variable, or how to pass the desired value into that variable.
The database table (VendorRatings) has 30 fields, one of which is vendor. The CSV also has 8 fields, one of which is also vendor.
My best effort was to use a dynamic query using:
SELECT * FROM VENDORRATINGS WHERE VENDOR = ?
How do I programmatically assign the desired value to "?" in the query? Specifically, how do I link the output of a specific field from Text File Input to the "vendor = ?" SQL query?
The best practice is a Stream lookup step: for each record in the main flow (VendorRatings), look up the vendor details (the lookup fields) in the reference file (the CSV), based on its identifier (possibly a number or name, or firstname+lastname).
First "hurdle" : Once the path of the csv file defined, press the Get field button.
It will take the first line as header to know the field names and explore the first 100 (customizable) record to determine the field types.
If the name is not on the first line, uncheck the Header row present, press the Get field button, and then change the name on the panel.
If there is more than one header row or other complexities, use the Text file input.
The same is valid for the lookup step: use the Get lookup field button and delete the fields you do not need.
Given that there is at most one VendorRatings row per vendor, and that you have to do something when there is no match, I suggest the following flow:
Read the CSV and, for each row, look up in the table (i.e. the lookup table is the SQL table rather than the CSV file), setting a default value when there is no match. I suggest something clearly visible like "--- NO MATCH ---".
Then a Filter rows step redirects the no-match rows to the alternative action (here: insert into the SQL table), and the two flows are merged back into the downstream flow.
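If you would rather keep the parameterized query from the question, the Database join step supports exactly that pattern: each ? placeholder is filled, in order, from the stream fields you list as parameters. A minimal sketch, assuming the CSV field is named vendor:

-- In a Database join step, ? is bound to the stream field "vendor"
SELECT *
FROM VendorRatings
WHERE vendor = ?;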

Renaming column to previously existing column with different type won't let me update

Background: In a Java application I'm working on, I'm refactoring the storage of enum values. Previously, these were stored as integers and mapped to enum values with a helper method in the enum. I would like to use the EnumType.STRING capabilities of JPA to make the database more readable.
So, what I'm basically trying to do is change the type (as well as the values) of a column. For example, I had this table definition to begin with:
CREATE TABLE Something (
    id int,
    source int
    -- [more columns]
);
I wanted to change the source-column into a VARCHAR(100) column instead, and here is how I did that:
Introduce a new column, called source_new, with type VARCHAR(100).
Populate the new column with mapped values based on the values of the old column (so each row with value 1 in the source column gets the value 'SomeSource' in source_new, each row with value 2 in source gets 'OtherSource', and so on).
Drop the source-column
Rename the source_new column to source (using sp_rename); see the sketch after this list.
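A minimal T-SQL sketch of those four steps (the value mapping is illustrative):

-- 1) Add the new varchar column
ALTER TABLE Something ADD source_new VARCHAR(100);

-- 2) Map the old integer codes to strings (illustrative mapping)
UPDATE Something SET source_new = 'SomeSource' WHERE source = 1;
UPDATE Something SET source_new = 'OtherSource' WHERE source = 2;

-- 3) Drop the old column
ALTER TABLE Something DROP COLUMN source;

-- 4) Rename the new column to the old name
EXEC sp_rename 'Something.source_new', 'source', 'COLUMN';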
My problem is this: once this is done, I can't update the newly defined source column, because the database still insists that it's an int column, not a varchar column!
So a query like this:
update Something set source = 'SomeSource' where id = 1;
fails with this:
Error: Conversion failed when converting the varchar value 'SomeSource' to data type int.
SQLState: 22018
ErrorCode: 245
At the same time, sp_help on the table shows that the column is defined as varchar(100), not int! Also, the column holds numerous varchar values from the original data migration (from before the rename).
Is this a bug, or am I doing something wrong by renaming a column to a name that was previously used with another type? (And as I type that last question, it sounds absurd to me: when I drop a column, I expect it to disappear, not to leave traces that prevent me from ever reusing the column name.)
SQLFiddle to illustrate (sp_rename doesn't work with SQLFiddle it seems): http://sqlfiddle.com/#!3/0380f/3
I have found the culprit, and its name is trigger!
Some genius decided to put a check in a trigger that the updated value is a valid source (checking against another table); so much for trusting your own code...
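For reference, a trigger along these lines would reproduce the symptom. This is an illustrative reconstruction, not the actual trigger; ValidSources with its int column source_id is an assumed lookup table. Comparing the varchar column against an int forces an implicit conversion, which fails:

-- Illustrative: a validation trigger that still assumes source is an int
CREATE TRIGGER trg_Something_CheckSource ON Something
AFTER UPDATE
AS
BEGIN
    IF EXISTS (
        SELECT 1
        FROM inserted i
        WHERE NOT EXISTS (
            SELECT 1 FROM ValidSources v
            WHERE v.source_id = i.source -- int vs varchar: conversion error 245
        )
    )
        ROLLBACK TRANSACTION;
END;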
I spit on the shadow of people who hide functionality in database triggers, pfoy! Go back to the 80's where you belong! :p