Sorting on a parameter field in Kettle - pentaho

I want to write a sub-transformation that sorts a stream based on an arbitrary field (I'm going to need to sort several streams over the course of this project, and I'd like to reuse one transformation and just change the name of the field I'm sorting by). The problem is that no matter what I try, I get an error saying that "The field ${SORT_FIELD} specified in the "Sort Fields" step is not in the steps input stream." (${SORT_FIELD} is the variable holding the name of the field I want to sort by, and "Sort Fields" is the name of the actual "Sort Rows" step.)
${SORT_FIELD} is listed in the mapping input specification as a required field. I'm also listing ${SORT_FIELD} as a parameter to the sub-transformation (in addition to having it inherit all variables from the parent transformation).
Is there any way to pass a field name as a parameter to a sort rows step so I don't have to manually input the field I want to sort by?

Yes, you can do that via metadata injection. I'm pretty sure that the Sort Rows step supports metadata injection. Check out Matt Casters' blog on the subject.

Related

sm30: Set matching column heading

I created a table in SAP via se11, then I used the table maintenance generator.
Now I edit the table via sm30:
The second and the third column: Both have the heading "Feldname".
The first "Feldname" column is called COLUMN_NAME and its data element is "Fieldname".
The second "Feldname" column is called AUTH_FIELD and its data element is "XUFIELD"
I would like to see the column names which I gave the columns in se16 (COLUMN_NAME, AUTH_FIELD) in the heading.
How to prevent the table maintenance generator from giving other names in the headings?
Option 1 - use custom data elements:
Instead of using Fieldname and XUFIELD data elements, you can create your custom data elements and give them what header you would like.
(You will have to regenerate table maintenance)
Option 2 - editing screen
When generated the table maintenance, you supplied a function group and a screen number.
Go to SE80 -> Function Groups -> <function_group_supplied> -> screens -> <screen_supplied>.
Then edit it as you want.
Note: Modifying a generated object is considered risky. Your customized changes might be overwritten in a future regeneration.
Add custom data elements with suitable descriptions. Let the new data elements refer to the original ones (resp. the domains) to avoid having to reinvent everything.
Data element descriptions can be translated.
You can set different descriptions for different lengths, e.g. "Field" for the narrow column with length 10, and "Field name" for a wide label with length 30.
Regenerating the maintenance screen won't accidentally delete the changed descriptions.

PDI /Kettle - Passing data from previous hop to database query

I'm new to PDI and Kettle, and what I thought was a simple experiment to teach myself some basics has turned into a lot of frustration.
I want to check a database to see if a particular record exists (i.e. vendor). I would like to get the name of the vendor from reading a flat file (.CSV).
My first hurdle is selecting only the vendor name from the 8 fields in the CSV.
The second hurdle is how to use that vendor name as a variable in a database query.
My third issue is what type of step to use for the database lookup.
I tried a dynamic SQL query, but I couldn't determine how to build the query using a variable, then how to pass the desired value to the variable.
The database table (VendorRatings) has 30 fields, one of which is vendor. The CSV also has 8 fields, one of which is also vendor.
My best effort was to use a dynamic query using:
SELECT * FROM VENDORRATINGS WHERE VENDOR = ?
How do I programmatically assign the desired value to "?" in the query? Specifically, how do I link the output of a specific field from Text File Input to the "vendor = ?" SQL query?
The best practice is a Stream lookup. For each record in the main flow (VendorRating), look up the vendor details (lookup fields) in the reference file (the CSV), based on its identifier (possibly its number, its name, or firstname+lastname).
First "hurdle": Once the path of the CSV file is defined, press the Get field button.
It will take the first line as the header to learn the field names and explore the first 100 (customizable) records to determine the field types.
If the names are not on the first line, uncheck Header row present, press the Get field button, and then change the names in the panel.
If there is more than one header row or other complexities, use the Text file input.
The same is valid for the lookup step: use the Get lookup field button and delete the fields you do not need.
Given that:
There is at most one vendor rating per vendor.
You have to do something if there is no match.
I suggest the following flow:
Read the CSV and, for each row, look it up in the table (i.e. the lookup table is the SQL table rather than the CSV file), and put a default value when there is no match. I suggest something really visible like "--- NO MATCH ---".
Then, in case of no match, the filter redirects the flow to the alternative action (here: insert into the SQL table). Then the two flows are merged into the downstream flow.
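Outside of PDI, the same flow can be sketched in a few lines of Python, which may help to see how the "?" in the query gets its value from each CSV row. This is only an illustration under assumptions: it uses an in-memory SQLite database instead of the real one, the rating column and the sample vendors are made up, and process_vendors is a hypothetical helper name, not anything Kettle provides.

```python
import csv
import io
import sqlite3

NO_MATCH = "--- NO MATCH ---"  # highly visible default, as suggested above


def process_vendors(conn, csv_text):
    """Look up each CSV vendor in the table; insert the ones that are missing."""
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        vendor = row["vendor"]  # keep only the field we need from the CSV
        # The "?" placeholder is bound to the value from the current CSV row.
        found = conn.execute(
            "SELECT rating FROM VendorRatings WHERE vendor = ?", (vendor,)
        ).fetchone()
        rating = found[0] if found else NO_MATCH
        if rating == NO_MATCH:
            # Alternative branch: add the unknown vendor to the SQL table.
            conn.execute(
                "INSERT INTO VendorRatings (vendor, rating) VALUES (?, ?)",
                (vendor, None),
            )
        # Both branches merge back into one downstream list.
        results.append((vendor, rating))
    return results


def demo():
    """Build a tiny hypothetical table and CSV, then run the flow once."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE VendorRatings (vendor TEXT PRIMARY KEY, rating TEXT)")
    conn.execute("INSERT INTO VendorRatings VALUES ('Acme', 'A')")
    csv_text = "vendor,city\nAcme,Boston\nGlobex,Paris\n"
    rows = process_vendors(conn, csv_text)
    count = conn.execute("SELECT COUNT(*) FROM VendorRatings").fetchone()[0]
    return rows, count
```

In PDI terms, the parameter binding corresponds to mapping the vendor field of the incoming row to the lookup key, and the if branch corresponds to the filter step that routes unmatched rows to the insert.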

Access 2010 Database Cleanup

I have problems with the records in my database. I have a template with about 260,000 records, and each record has 3 identification columns that determine what time period and location the record is from: one for year, one for month, and one for region. The information for identifying the specific item is TagName and Description. The Problem I am having is when someone entered data into this database they entered different description for the same device, I know this because the tag name is the same. Can I write code that will go through the database, find the items with the same tag name, and use one of the descriptions to replace the ones that are different, to have a more uniform database? Also some devices do not have tag names so we would want to avoid the "" Case.
Also moving forward into the future I have added more columns to the database to allow for more information to be retrieved, is there a way that I can back fill the data to older records once I know that they have the same tag name and Description once the database is cleaned up? Thanks in advance for the information it is much appreciated.
I assume that this will have to be done with VBA of some sort to modify records by looking for the first record with that description and using a variable to assign that description to all the other items with the same tag name? I just am not sure of the correct VBA syntax to go about this. I assume a similar method would be used for the backfilling process?
Your question is rather broad and multifaceted, so I'll answer key parts in steps:
The Problem I am having is when someone entered data into this
database they entered different description for the same device, I
know this because the tag name is the same.
While you could fix up those inconsistencies easily enough with a bit of SQL code, it would be better to avoid those inconsistencies being possible in the first place:
Create a new table, let's call it 'Tags', with TagName and TagDescription fields, and with TagName set as the primary key. Ensure both fields have their Required setting set to True and Allow Zero Length set to False.
Populate this new table with all possible tags - you can do this with a one-off 'append query' in Access jargon (INSERT INTO statement in SQL).
Delete the tag description column from the main table.
Go into the Relationships view and add a one-to-many relation between the two tables, linking the TagName field in the main table to the TagName field in the Tags table.
As required, create a query that aggregates data from the two tables.
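The steps above can be tried out on a small scale before touching the real database. The following is a rough sketch only, using Python with SQLite as a stand-in for Access (the SQL dialects differ in places). The main table name Readings, its columns, and the sample rows are assumptions for illustration, and MIN(Description) is just one arbitrary way of picking "one of the descriptions" per tag. Dropping the old description column is left out.

```python
import sqlite3


def normalize_tags(conn):
    """Create the Tags table, fill it once, and return the joined view of the data."""
    conn.execute(
        "CREATE TABLE Tags ("
        "TagName TEXT PRIMARY KEY NOT NULL, "
        "TagDescription TEXT NOT NULL)"
    )
    # One-off "append query": one row per tag name, keeping a single description.
    # Rows with a missing or empty tag name are skipped.
    conn.execute(
        """
        INSERT INTO Tags (TagName, TagDescription)
        SELECT TagName, MIN(Description)
        FROM Readings
        WHERE TagName IS NOT NULL AND TagName <> ''
        GROUP BY TagName
        """
    )
    # Query combining both tables: every record now shows the shared description.
    return conn.execute(
        """
        SELECT r.Year, r.TagName, t.TagDescription
        FROM Readings AS r
        LEFT JOIN Tags AS t ON t.TagName = r.TagName
        ORDER BY r.Year, r.TagName
        """
    ).fetchall()


def demo():
    """Run the sketch against three hypothetical records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Readings (Year INTEGER, TagName TEXT, Description TEXT)")
    conn.executemany(
        "INSERT INTO Readings VALUES (?, ?, ?)",
        [
            (2010, "P-101", "Pump"),
            (2011, "P-101", "Main pump"),
            (2011, "", "Unknown device"),
        ],
    )
    rows = normalize_tags(conn)
    tag_count = conn.execute("SELECT COUNT(*) FROM Tags").fetchone()[0]
    return rows, tag_count
```

Because the description now lives in one place, correcting it later means editing a single row in Tags rather than every matching record.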
Also some devices do not have tag names so we would want to avoid the
"" Case.
In Access, the concept of an empty string ("") is different from the concept of a true blank or 'null'. As such, it would be a good idea to replace all empty strings (if there are any) with nulls -
UPDATE MyTable SET TagName = Null WHERE TagName = '';
You can then set the TagName field's Allow Zero Length property to False in the table designer.
Also moving forward into the future I have added more columns to the
database to allow for more information to be retrieved
Think less in terms of more columns than more tables.
I assume that this will have to be done with VBA of some sort to modify records
Either VBA, SQL, or the Access query designers (which create SQL code behind the scenes). In terms of being able to crunch through data the quickest, SQL is best, though pure VBA (and in particular, using the DAO object library) can be easier to understand and follow.

@DbColumn in Lotus Notes

I've been tasked with learning Lotus Domino Designer - not sure what I did in a previous life, but it must have been pretty bad... - and was wondering how to do a lookup on a database to get some values for selections. As this information could potentially be used in a lot of the applications, I'd prefer it only to be in the one place.
I gather I can use @DbColumn, but what happens if an entry in that lookup changes? If the unique value of the lookup is the text, then the relationship would be broken, wouldn't it? Is there any way of mimicking the idea of relational lookups?
I'm assuming I'm looking at Lotus development from the wrong angle, as this seems to be a real limitation of look ups.
I haven't found any decent learning material on the interwebs, so would appreciate any help.
Ta
You would want to store a unique ID along with the textual value in the source database (not unlike what you would do in an RDBMS). Then, only store that ID in any referencing documents, and use a computed-for-display field to lookup the display value. (There is a performance consideration here - and you could "de-normalize" the data and store the ID and text value in the referencing documents, and do some asynchronous work to keep the values in sync - eg: using a scheduled agent that runs every night or every week).
If DB1 has the key values and DB2 has the documents which will reference these values, then in the form in DB2, you would still do a @DbColumn to look up your value list. In the lookup view in DB1, concat the text value and ID with a pipe separator (textField + "|" + ID) in the first column. That will tell Notes to store only the ID value (what follows the pipe is the "alias" and is what will be stored).
Note: I would avoid using @DocumentUniqueID as the unique ID for these values, as the Document Unique ID will change if the documents are copied and pasted, or the entire database is copied, etc. You can use the @Unique formula function in a computed-when-composed field to generate something close to a unique ID (almost like an identity column in SQL).
If you need relational properties, look for non-Notes solutions. It is possible to get some relational behavior using document UNIDs and update agents, but it will be harder than with a proper relational backend.
Your specific problem with referencing to a piece of text that might change can to some extent be resolved by using aliases in the choice fields. If a dialog list contains values on the form...
Foo|id1
Bar|id2
...the form will display Foo but the back-end document will store the value id1 (and this is what you will be able to show in standard views, although XPages could solve that). Using the @DocumentUniqueID for alias can be a good idea under some circumstances.
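To make the alias idea concrete, here is a minimal sketch in Python (not Notes formula language) of how "Label|alias" entries behave: the label is shown, the alias is stored, and the label can be renamed later without breaking documents that hold the alias. The function names parse_choices and display_value are hypothetical helpers for illustration only.

```python
def parse_choices(lines):
    """Split 'Label|alias' entries into a label -> stored-value mapping."""
    choices = {}
    for line in lines:
        label, sep, alias = line.partition("|")
        # Without a pipe there is no alias, so the label itself is what gets stored.
        choices[label] = alias if sep else label
    return choices


def display_value(stored_alias, choices):
    """Find the label to display for a stored alias, or None if it is unknown."""
    for label, alias in choices.items():
        if alias == stored_alias:
            return label
    return None
```

For example, a document that stored id1 while the list read Foo|id1 would still resolve correctly after the list is changed to "Foo renamed|id1", because only the alias is used for the match.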
It depends on where you're using the data. The @DbLookup or @DbColumn will work in Lotus Notes fields if the fields are set to be computed for display. That way they always get the most up to date information when you open the form etc.
If you make it so the data is saved on to the document then you will have to write some update code when you need to refresh the values.
The Lotus Notes help files for designer are pretty good, have a look at that.
SM
You could use a key or alias to store the relationship to your lookup value so if the value itself changes, the connection remains because the alias is intact. For example, if your lookup values were being stored as a collection of documents, I'd have the @DbColumn retrieve Document UNID|lookup value pairs. When in display mode, you could then retrieve the value using @GetDocField. If the lookup values are in a different database, then you'd have to retrieve them for display using @DbLookup and construct a view that is keyed off of the UNID or whatever key you decide to use. The only drawback to this technique is that you wouldn't be able to display the field value in views as the actual value isn't stored in the document, just a reference to it. Using XPages, though, you COULD map the relationship into a dynamic datatable just like you would in a truly relational system.
It's tricky, but using LEI, you could also use Notes to front-end a relational backend system, also giving you the dynamic relationship you desire in your lookups.
Hope this helps!
The content of the lookup can change freely. A problem only arises (as it would on any other platform in the same circumstances) if the lookup key changes. You need to use a key that won't change. Human-readable text is an advantage, but if you want to be able to change your key description from, say, "Divisions" to "Business Units" and still have lookups work, you need to use an alias of some kind, which will presumably be mapped to your text description and only used internally. @Unique is pretty good for this, and gives a shortish key, if that is important to you. @DocumentUniqueID is most reliable, but as Ed pointed out, will change (must change, since it's a new document) if you copy/paste or make a non-replica copy. This is easy to get around, though. Create a Computed-when-composed field (called, say, "LookupRef") on the form you are using for your reference document with the formula "@DocumentUniqueID". That will capture the ID at the time of creation, and it will not change on copy/paste etc. Use that as your key.

SSIS Derived Column Missing Downstream

I've created a derived column that translates a 1 to an 'M' and a 2 to 'F'. i.e. a gender indicator. The derived column feeds into a Fuzzy Lookup transformation and then to a conditional split. The problem is the derived field does not show up in any of the downstream components. In the Fuzzy Lookup transform the "Pass Through" checkbox is checked for the derived column, but in the following Conditional Split transform the column does not show up at all. Funny thing is that the _Similarity_Gender_Derived does show up in the column list for the conditional split.
Hopefully someone else has seen this type of behavior.
Thanks - Mr. Do
Right click on the Fuzzy Lookup task and select Show Advanced Editor.
Go to the "Input and Output Properties" tab.
Expand the "Output" item, and then the "Output Columns" item.
Is your derived column listed there?
If it is, it should also show up on the available input columns of the Conditional Split task. If not ...
Right click on the Derived Column task and select Show Advanced Editor.
Go to the "Input and Output Properties" tab.
Expand the "Derived Column Output" item, and then the "Output Columns" item, and select your derived gender column.
Note its LineageID attribute.
Repeat the earlier steps to get the Fuzzy Lookup's Output Columns.
Hit the "Add Column" button. Name the column the same name as your derived column, and in the "SourceInputColumnLineageID" attribute, enter the LineageID you noted earlier.
Alternate answer: is your derived column creating an all new column, or simply replacing your existing "1/2" column? In the Derived Column Editor, check your "Derived Column" .. umm .. column. If you are just replacing your existing column with the new value (instead of adding a new column) you may just be looking in the wrong place.
Thanks for the response. Turns out that issue had to do with some corruption with the meta data. I ended up going back into the Derived Column Transform, renamed the column in error, then added a new derived column with the old name. I saved the transform, and then removed the original column. That fixed the problem.
Thanks for the responses.
Did you add the Derived Column into an already existing transform chain?
If you did, then there's a good chance that one of the transforms further down the chain is set to not pass on this newly derived column. Check all the transforms below and make sure that your derived column is set to be passed through.