Pentaho: How to dynamically add Field (= Column) to OutputRow?

I would like to dynamically add fields (new columns) to the resulting output row in Kettle.
After spending hours reading through forum posts and the not-so-well-done scripting documentation, I wondered if Stack Overflow would be of any help.

We can use the steps below to generate columns dynamically:
Calculator
Add constants
Select the required fields in a Table input step and store those values with Set variables; then, in a second transformation, read them back with a Get variables step.

How are your input values passed to the SQL query? If they are variables, then just connect the table input step to a "get variables" step and get your new columns that way.
Alternatively you can add columns using either calculator or add constants.
Or you could even use the "get system info" step to get commandline args and dates etc.

First, let me give you a code snippet of what I have in a User Defined Java Class step:
private int fieldToHashGeoIndex;
private int fieldToHashHeadIndex;
public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    Object[] r = getRow();
    if (r == null)
    {
        setOutputDone();
        return false;
    }
    if (first) {
        fieldToHashGeoIndex = getInputRowMeta().indexOfValue(getParameter("FIELD_TO_HASH_GEO"));
        if (fieldToHashGeoIndex < 0) {
            throw new KettleException("Field to hash not found in the input row, check parameter 'FIELD_TO_HASH_GEO'!");
        }
        fieldToHashHeadIndex = getInputRowMeta().indexOfValue(getParameter("FIELD_TO_HASH_HEAD"));
        if (fieldToHashHeadIndex < 0) {
            throw new KettleException("Field to hash not found in the input row, check parameter 'FIELD_TO_HASH_HEAD'!");
        }
        first = false;
    }
    Object[] outputRowData = RowDataUtil.resizeArray(r, data.outputRowMeta.size());
    int outputIndex = getInputRowMeta().size();
    String fieldToHashGeo = getInputRowMeta().getString(r, fieldToHashGeoIndex);
    String fieldToHashHead = getInputRowMeta().getString(r, fieldToHashHeadIndex);
    outputRowData[outputIndex++] = MurmurHash.hash64(fieldToHashGeo);
    outputRowData[outputIndex++] = MurmurHash.hash64(fieldToHashHead);
    putRow(data.outputRowMeta, outputRowData);
    return true;
}
Now, normally you configure outputRowMeta from the step's config, but maybe you can modify it in the code. This should allow you to specify additional fields in the code.
As an alternative, you could emulate variable fields by defining a fixed set of output fields on the step, like 'field1', 'field2', etc., and tracking the real names of the fields elsewhere. You'd probably have to make all the fields of type String and then do your own type adjustments later.
Now that I think of it, though, variable output fields may lead to trouble: you have to be very careful with what you do in later steps to avoid having errors due to type mismatches or missing fields.

Related

SSIS error printing a variable value in a C# script

I'm trying to create a sample that captures a row count in a variable within an SSIS package and then prints the value using a simple Script Task.
First of all, I create an Int32 variable to store the row count, and then I populate it with the SSIS Row Count transformation.
Then, in my Script Task, I pass the variable as ReadOnly.
Finally, I add code to the public void Main() method to show the current variable value.
public void Main()
{
    // TODO: Add your code here
    string result = null;
    result = (string)Dts.Variables["qty"].Value;
    MessageBox.Show("The current value of the SSIS global variable 'TestVariable' is '" + result);
    Dts.TaskResult = (int)ScriptResults.Success;
}
Issues
In the image below you can see the error raised when I try to display the value; I also noticed that my variable value is 0. It seems the row count is not getting the right value.
So, could you please give me some guidance on how to achieve this? Thanks so much.
SSIS usually doesn't like you performing GUI operations (such as showing a MessageBox) during processing. There are functions to display this information in the Progress window instead. Have a look at Dts.Events.FireInformation() for a Script Task, or ComponentMetaData.FireInformation() for a data flow Script Component.

Create a variable in swift with dynamic name

In Swift, inside a loop driven by an iterating index, I want to create a variable whose name is the concatenation of "person_" and the current loop index.
So my loop ends up creating variables like:
var person_0 = ...
var person_1 = ...
var person_2 = ...
etc...
I had no luck searching online so am posting here.
Thanks!
One solution is to store all your variables in an array. The indexes for the variables you store in that array will correspond to the index values you're trying to include in the variable name.
Create an instance variable at the top of your view controller:
var people = [WhateverTypePersonIs]()
Then create a loop that will store however many people you want in that instance variable:
for _ in 0..<someVariable {
    let person = // someValue of type WhateverTypePersonIs
    people.append(person)
}
If you ever need to get what would have been "person_2" with the way you were trying to solve your problem, for example, you could access that person using people[2].
In Swift it is not possible to create dynamic variable names. What you are trying to achieve is the typical use case for an Array.
Create an Array and fill it with your person data. Later, you can access the persons via its index:
var persons: [String] = []
// fill the array
for i in 0..<10 {
    persons.append("Person \(i)")
}
// access person with index 3 (indexes start with 0 so this is the 4th person)
print(persons[3]) // prints "Person 3"
let name = "person_\(index)"
then add name to a mutable array declared before the loop.
Something like that?
What you are trying to do is not possible in Swift. Variable names exist only for human readers (especially in a compiled language), which means they are stripped out during compilation.
But if you really, really want to do this, a code generation tool is the way to go. Find a suitable code generation tool and run it in a build phase.

Custom parameters in Pentaho dashboards

Custom parameters in a CDE/CTools dashboard are great for defaulting initial values of parameters, e.g. setting a date parameter to today. i.e. the parameter looks like:
function() {
    // some code
    return val
}
However there is an issue with them. The first time you access a "custom parameter" in code, it is a function not a string. So you have to use:
paramName()
To get its value.
Once the end user selects a value then you have to use
paramName
This is really awkward in complicated dashboards with lots of prompts. Is there a better way this can be done? (Perhaps there is something in javascript I'm missing to help here?)
OK, there is a solution, but I don't like it!
First, move all the init code into named functions, e.g.
function monthInit() {
    return "june";
}
Then in the custom parameter for month, just say:
monthInit();
That way the custom parameter is always a string, and never starts off as a function.
Not ideal though because then all your init code is in a separate bit of js.
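Another way to hide the difference, sketched here as an assumption rather than anything CDE provides, is a small helper that accepts either form of the parameter. The name resolveParam is hypothetical, and monthParam stands in for a real dashboard parameter:

```javascript
// Hypothetical helper, not part of CDE: returns the parameter's value
// whether it is still the initial function or an already selected value.
function resolveParam(param) {
  return typeof param === "function" ? param() : param;
}

// Before the user picks a value, the custom parameter is still a function.
var monthParam = function () {
  return "june";
};
console.log(resolveParam(monthParam)); // prints "june"

// After a selection, the same parameter holds a plain string.
monthParam = "july";
console.log(resolveParam(monthParam)); // prints "july"
```

With a helper like this the init code can stay inside the custom parameter, at the cost of every read going through the helper.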

Check field errors in word references with VSTO

How can I check whether some fields in my Word document have errors? I have a large document that contains many references to other chapters or images. When those chapters or images are missing from the document, the fields containing those references display Error! Reference Source Not Found instead of the reference.
I need to create an algorithm that checks for those reference errors regardless of the locale and language of the file. The difficulty is that this field error text is localized to the system language of the user running Word.
How can I do this? Is there any property on Field that can be used to check if the source is available?
Currently, I check for errors in the fields by using the result text of the field:
Int32 fieldErrors = 0;
foreach (Word.Field field in doc.Fields)
{
    field.Update();
    if (field.Result.Text.StartsWith("Error!"))
        ++fieldErrors;
}
Unfortunately, this only works in English Word instances.
In the documentation for Field types it is seen that a Field instance has an Update() method that returns a bool. The documentation does not state what the semantic meaning of the return value is, however, by doing a short empirical study I found that the method returns true if the Update() succeeded and false if the update did not succeed. This means that in order to find fields with errors you can do something like:
var fieldsWithErrors = new List<Field>();
foreach (Field field in document.Fields)
{
    if (!field.Update())
        fieldsWithErrors.Add(field);
}
... or shorter with LINQ:
var fieldsWithErrors = document.Fields.Cast<Field>().Where(field => !field.Update()).ToList();
Another (and faster) approach would be to use the Update() method exposed by the Fields collection.
var indexOfFirstError = document.Fields.Update();
... the method returns the index of the first field with an error. If no errors are found, the method returns 0.
For complete documentation please see the MSDN references:
Field.Update()
Fields.Update()
Field members
Fields members

loading serialized data into a table

For an answer to another question, I wanted to load some serialized Lua code into a table. The string to be loaded is of this form:
SavedVars = { }
SavedStats = { }
(where each of the {...} might be any Lua expression, including a table constructor with nested data). I'm assuming it does not call any (global) functions or use global variables.
What I finally want to have is a table of this form:
{ ["SavedVars"] = { }, ["SavedStats"] = { } }
I do not want to have a global variable SavedVars afterwards.
How to do this most elegantly?
(I already found a solution, but maybe someone has a better one.)
Here is my solution:
-- loads a string to a table.
-- this executes the string with the environment of a new table, and then
-- returns the table.
--
-- The code in the string should not need any variables it does not declare itself,
-- as these are not available on runtime. It runs in a really empty environment.
function loadTable(data)
    local table = {}
    local f = assert(loadstring(data))
    setfenv(f, table)
    f()
    return table
end
It loads the data string with loadstring and then uses setfenv to modify the global environment of the function to a new table. Then calling the loaded function once fills this table (instead of the global environment), which we then can return.
Setting the environment to a new table has the effect that the code can't use any global data at all. I think this is a good way to sandbox the code, but if it is not wanted, you could populate the table before or provide it with some metatable (but unset it before returning the table).
This loading function will also work with serialized data produced like in Saving Tables with Cycles.