Pentaho PDI: Final value of previous row's calculated field


I tried to use the Analytic Query step to access a calculated field of the previous row. It turns out that the rows are all processed in parallel, so accessing the previous row's fields gives you whatever value they happen to have at that moment of their processing, which is more or less random. It does not seem to be possible to obtain the final value of a field of a previous row. Or is there another way than the Analytic Query step? I imagine all I need is a checkbox "Wait for previous rows to complete"...
What I need this for: I am processing time-dependent data and doing state recognition. When I am in state A, I do different things with my data than when I am in state B. So I need to know the state of the previous data row (which is not determined until the end of my transformation).
It can be done in Excel really easily, so I guess there must be some way in PDI. :-)
Thanks for any help!

If I have understood your question correctly, you could try the "Block this step until steps finish" step. It waits until all the step copies specified in its dialog have finished. See the step's documentation for more.
Hope this helps :)
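The behaviour of "Block this step until steps finish" can be pictured with a plain-Java analogy (this is not PDI code): each step copy is a thread, and the blocked step waits on a `CountDownLatch` until every copy has signalled that it finished. The class name `BlockUntilFinished` and the log strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Analogy only, not PDI code: each "step copy" is a thread, and the blocked
// step waits on a latch until every copy has signalled that it finished.
class BlockUntilFinished {

    static List<String> run(int copies) {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch finished = new CountDownLatch(copies);

        for (int i = 0; i < copies; i++) {
            final int id = i;
            new Thread(() -> {
                log.add("copy " + id + " done"); // this copy's work
                finished.countDown();             // signal "finished"
            }).start();
        }

        try {
            finished.await(); // block until all copies have counted down
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        log.add("blocked step runs"); // only reached after every copy is done
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(3));
    }
}
```

The copies may finish in any order, but "blocked step runs" is always the last entry.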

I believe this can be solved with the User Defined Java Class (UDJC) step.
If you sort the rows before processing them, the Sort rows step waits for the last row set by default.
Here's the most basic example of writing one output row for each input row. One important thing to keep in mind with the User Defined Java Class step is that it rewrites your whole data set, so the code needs careful thought, especially if you look back at previous rows. I hope this helps a bit.
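// Class member holding the final value of "someFieldName" from the previous row
// (null while the first row is processed). Storing the calculated value instead of
// the input row r means the look-back sees the value after it was modified.
private String previousValue = null;
public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException {
    // Fetch the next row; null means there is no more input:
    Object[] r = getRow();
    if (r == null) {
        setOutputDone();
        return false;
    }
    // Read a field value from the current input row:
    String someFieldValue = get(Fields.In, "someFieldName").getString(r);
    logBasic("current value: " + someFieldValue + ", previous row's final value: " + previousValue);
    // Create the output row, resized to the output row layout:
    Object[] outputRow = RowDataUtil.createResizedCopy(r, data.outputRowMeta.size());
    // Modify the field as needed. A single copy of this step processes rows one
    // at a time, so previousValue is already final at this point:
    get(Fields.Out, "someFieldName").setValue(outputRow, "a modified value here");
    // Remember the calculated value before passing the row on:
    previousValue = get(Fields.Out, "someFieldName").getString(outputRow);
    // Write the row to the next step(s):
    putRow(data.outputRowMeta, outputRow);
    return true;
}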
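The look-back pattern behind the UDJC approach can be sketched in plain Java, outside PDI. Rows are handled one at a time, and the previous row's final state is carried into the next row. States A and B come from the question. The threshold rule and the class name `StateRecognition` are hypothetical: the rule enters B above 10 and returns to A below 5. It was chosen only so that the result depends on the previous state.

```java
import java.util.ArrayList;
import java.util.List;

class StateRecognition {
    enum State { A, B }

    // Hypothetical rule with hysteresis: enter B above 10, return to A below 5,
    // so the same value can yield different states depending on the previous row.
    static List<State> classify(double[] values) {
        List<State> states = new ArrayList<>();
        State previous = State.A; // assumed initial state
        for (double v : values) {
            State current;
            if (previous == State.A) {
                current = v > 10 ? State.B : State.A;
            } else {
                current = v < 5 ? State.A : State.B;
            }
            states.add(current);
            previous = current; // the final state, carried into the next row
        }
        return states;
    }

    public static void main(String[] args) {
        System.out.println(classify(new double[] {3, 12, 7, 4, 8})); // [A, B, B, A, A]
    }
}
```

The values 7 and 8 land in different states, B and A, because their predecessors ended in different states.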
EDIT:
One more important thing to note about PDI: blocking, whether by blocking steps or by the Sort rows step, works on row sets rather than on single rows.
Where can this be checked?
Right-click -> Transformation settings -> Miscellaneous -> Nr of rows in rowset.
The default value is 10000 rows. PDI developers often create a deadlock by using one of the blocking steps with a row set size that doesn't fit their data volume, so do keep that in mind.
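The deadlock warning can be illustrated by treating a row set as a bounded buffer between two steps. This is a plain-Java analogy using `ArrayBlockingQueue`, not PDI's own row set class, and `RowSetCapacity` is a hypothetical name. If the downstream step reads nothing yet, for example because it is waiting for another input to finish, the producer can only push as many rows as the buffer holds. After that it waits, and if the two are waiting on each other, neither proceeds.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Analogy only: a row set modelled as a bounded buffer with no active consumer.
class RowSetCapacity {

    // Offers 'rows' rows to a buffer of the given capacity and returns how many
    // fit before the producer would have to wait.
    static int rowsAccepted(int capacity, int rows) {
        BlockingQueue<Integer> rowSet = new ArrayBlockingQueue<>(capacity);
        int accepted = 0;
        for (int i = 0; i < rows; i++) {
            if (!rowSet.offer(i)) { // offer() returns false instead of blocking
                break;             // a put() here would wait forever
            }
            accepted++;
        }
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println(rowsAccepted(10000, 25000)); // 10000
    }
}
```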

Use the "Identify last row in a stream" and "Filter rows" steps. The first checks whether a row is the last one and returns a Boolean value; the latter can then filter the records based on that Boolean value.
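The two-step combination can be mimicked in plain Java to show the data flow. Each row is paired with a flag that is true only for the final row, and a filter then keeps the flagged rows. The class and method names are hypothetical, and the `Object[]` rows only loosely imitate PDI's row arrays.

```java
import java.util.ArrayList;
import java.util.List;

class LastRowFilter {

    // Like "Identify last row in a stream": appends a flag that is true only
    // for the final row.
    static List<Object[]> flagLastRow(List<String> rows) {
        List<Object[]> flagged = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            flagged.add(new Object[] {rows.get(i), i == rows.size() - 1});
        }
        return flagged;
    }

    // Like "Filter rows" on that flag: keeps only rows where it is true.
    static List<String> keepLastRow(List<Object[]> flagged) {
        List<String> kept = new ArrayList<>();
        for (Object[] row : flagged) {
            if ((Boolean) row[1]) {
                kept.add((String) row[0]);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("r1", "r2", "r3");
        System.out.println(keepLastRow(flagLastRow(rows))); // [r3]
    }
}
```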

Related

LabVIEW - How to clear an array after each iteration in a for loop

I'm trying to clear an array after each iteration of a for loop in LabVIEW, but with my current implementation the values don't go where I want; instead they get mixed up with values left over from previous iterations in other parts of the array.
It isn't shown, but this code is inside of a for-loop that iterates through another numeric array.
I know that if I get the array to clear properly after each loop iteration, this should work. How do I do that? I'm a beginner at LabVIEW but have been coding for a while, so any help is appreciated!
[Image: labview add to array]

It looks as if you're not quite used to how LabVIEW passes data around yet. There's no need to use lots of value property nodes for the same control or indicator within one structure; if you want to use the same data in more than one place, just branch the wire. Perhaps you're thinking that a LabVIEW control or indicator is equivalent to a variable in text languages, and you need to use a property node to get or set it. Instead, think of the wire as the variable. If you want to pass the output of one operation to the input of another, just wire the output to the input.
The indicators with terminals inside your loop will be updated with new values every loop iteration, and the code inside the loop should execute faster than a human can read those values, so once the loop has finished all the outputs except the final values will be lost. Is that what you intended, or do you want to accumulate or store them in some way?
I can see that in each loop iteration you're reading two values from a config file, and the section is specified by the string value of one element of the numeric array Array. You're displaying the two values in the indicators PICKERING and SUBUNIT. If you can describe in words (or pseudocode, or a text language you're used to) what manipulation of data you're actually trying to do in the rest of this code, we may be able to make more specific suggestions.

First of all, I'm assuming that the desired order of operations is the following:
1. Putting the value of Pickering into Array 2
2. Extracting from Array 2 the values to put in Pickering 1 and Pickering 2
3. Putting Array 2 back to its original value
If this is the case, with your current code you can't be sure that operation 1 will be executed before operation 2. In fact, the order of these operations can't be predetermined. You must force the dataflow, for example with a sequence structure: put the code for operation 1 in the first frame and the code for operation 2 in the second.
Then, to put Array 2 back to its original value, I would add a third frame in which you write an empty array into the Value property node of Array 2 (the same kind of property node you use for Pickering, but as an input rather than an output).
The sequence structure has to be inside the for loop.
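In a text language, the three frames become three statements that run in order inside the loop body. The sketch below is only a text-language analogy, since LabVIEW itself is graphical. The class name, the two-element readings standing in for the Pickering value, and the integer element type are hypothetical. Without the final `clear()`, every iteration after the first would read stale values from the start of Array 2, which matches the symptom in the question.

```java
import java.util.ArrayList;
import java.util.List;

class PerIterationArray {

    // Each hypothetical 'reading' stands in for the Pickering value of one
    // loop iteration; each result pair stands in for Pickering 1 and Pickering 2.
    static List<int[]> run(int[][] readings) {
        List<int[]> results = new ArrayList<>();
        List<Integer> array2 = new ArrayList<>();
        for (int[] pickering : readings) {
            // Frame 1: put the value of Pickering into Array 2.
            for (int v : pickering) {
                array2.add(v);
            }
            // Frame 2: extract Pickering 1 and Pickering 2 from Array 2.
            results.add(new int[] {array2.get(0), array2.get(1)});
            // Frame 3: reset Array 2 to empty so the next iteration starts clean.
            array2.clear();
        }
        return results;
    }

    public static void main(String[] args) {
        for (int[] pair : run(new int[][] {{1, 2}, {3, 4}})) {
            System.out.println(pair[0] + ", " + pair[1]);
        }
    }
}
```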
I have never used the property node Reinit to default, so I can't help you with that.
Unfortunately I can't run LabVIEW on this PC, but I hope my explanation was clear enough. If not, tell me and I will try to be more specific.

Pentaho/PDI: Increment a value automatically by one if a load-job (within a metajob) fails

In PDI I have the following structure:
0_Metajob
  1_Load_1
  1_Load_2
  1_SimpleEvaluation
  1_Mail
As of now
1_Load_1 and 1_Load_2 are independent of each other. The second one will run, irrespective of the success of the first one. That is okay, I want it that way!
Issue
I want to have a counter that is incremented by one every time one of the single loads fails, i.e. in my example the counter can take the values 0, 1 or 2.
What do I need it for? The customer receives a mail at the end of the metajob. The value described above determines the subject of the mail, i.e. 0=everything fine, 1=so-so, 2=load totally failed!
Why not send a mail from within each load job? I do, but without attaching the log file, because it is usually not finished at that point. The log file is therefore attached to the mail that is sent when the metajob finishes.
Tried
"Set a variable". I thought I could simply increment it by adding one in the value field, i.e. "${VariableName}+1". Of course, this entry is placed in the fail path of each load job.
However, it didn't work.
Would anyone mind helping me? I would appreciate that!

The Set variables entry doesn't do calculations; you'll need a JavaScript entry for that.
Fortunately, variables can also be set within the JavaScript entry. This bit of code should go into each of the JavaScript entries you put in place of the Set variables entries:
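// All PDI variables are strings: parse with radix 10 and treat an unset value as 0.
var i = parseInt(parent_job.getVariable("Counter"), 10);
if (isNaN(i)) i = 0;
i = i + 1;
// Store it back as a string so the next entry can parse it again.
parent_job.setVariable("Counter", String(i));
// The final expression is the entry's result: true reports success.
true;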
This code gets the variable "Counter" from the parent job and converts it to an integer, since all Pentaho variables are strings. It then increments it and sets the job variable again. The "true" at the end ensures that the JavaScript entry reports success to the main job. Set "Counter" to 0 at the start of the metajob so that the first increment starts from a number.
IMPORTANT: This works roughly as you would expect in a job. It will NOT work in a transformation!
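The overall logic can be sketched in plain Java. The counter starts at 0, goes up once per failed load, and its final value picks the mail subject using the mapping from the question. This is not PDI code, and the class and method names are hypothetical. In the metajob, the branching would presumably be done by the Simple evaluation entry on `${Counter}`.

```java
import java.util.Arrays;
import java.util.List;

class LoadMailSubject {

    // Mirrors the JavaScript entries: one increment per load whose fail path ran.
    static int countFailures(List<Boolean> loadSucceeded) {
        int counter = 0; // the job variable should likewise start at 0
        for (boolean ok : loadSucceeded) {
            if (!ok) {
                counter++;
            }
        }
        return counter;
    }

    // Mapping from the question: 0 = everything fine, 1 = so-so, 2 = totally failed.
    static String subject(int counter) {
        switch (counter) {
            case 0:
                return "everything fine";
            case 1:
                return "so-so";
            default:
                return "load totally failed";
        }
    }

    public static void main(String[] args) {
        // 1_Load_1 succeeded, 1_Load_2 failed.
        System.out.println(subject(countFailures(Arrays.asList(true, false)))); // so-so
    }
}
```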

QML ListModel.onDataChanged arguments

I was wondering what kind of data I can use to handle a dataChanged-signal in a QML ListModel.
I found out that it has three arguments, two of which are QModelIndices and one is a QVariant(...).
So from the first two (which seem to be the same?) I can get the row, the column (which is supposed to be 0), the model itself and a few other things.
But why do I get it twice?
And what is the content of the third? It is not null, but I haven't found a property I could use to retrieve some useful data from it.

A ListModel is based on QAbstractItemModel, and the dataChanged signal you are seeing is the one defined in that class:
void QAbstractItemModel::dataChanged(const QModelIndex &topLeft, const QModelIndex &bottomRight, const QVector<int> &roles = QVector<int> ())
The first two parameters tell you that all data between the first and the second index (inclusive) has changed. The third parameter is a list of the roles whose data has changed; if the list is empty, the data for all roles has potentially changed.
In your case the first and second indexes are the same because only one row is changed at a time.
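The way to read the three arguments can be modelled in plain Java for a single-column list model. Rows from top to bottom are included at both ends. An empty roles list is treated as "every role". This is not Qt code. The class name, the `row:role` string format, and the role numbers in the usage are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Not Qt code: a plain-Java model of interpreting dataChanged(topLeft,
// bottomRight, roles) on a single-column list model.
class DataChangedRange {

    static List<String> changedCells(int topRow, int bottomRow,
                                     List<Integer> roles, List<Integer> allRoles) {
        // An empty roles list means any role may have changed.
        List<Integer> effective = roles.isEmpty() ? allRoles : roles;
        List<String> cells = new ArrayList<>();
        for (int row = topRow; row <= bottomRow; row++) { // inclusive range
            for (int role : effective) {
                cells.add(row + ":" + role);
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        List<Integer> allRoles = List.of(256, 257, 258);
        // One row, one role: topLeft and bottomRight point at the same row.
        System.out.println(changedCells(2, 2, List.of(257), allRoles)); // [2:257]
    }
}
```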

Ampl syntax: return value at a specific position in set. AKA: use a set as an INDEX on another set

My question is this:
How can I use SUBSET (a discontinuous set) to refer to an index location in another set as opposed to an actual value? I see that ord() can be used to return the position of a value in a set, but I want the reverse of this...
My reason for needing this:
I have a model in which some of the set and data statements are roughly:
set ALL_TIME := {0..20000};
param DATA {ALL_TIME}; #read from file in later data statement;
set myTIME := {0..1000};
I am looping over the myTIME set and each time solving the model and then incrementing the start and end by 1: {1..1001}, {2..1002}, {3..1003}, etc.
I have another discontinuous set being read in from a file that looks something like this (yes, the syntax below is bad; the "...." just means that the pattern continues until it reaches 1000, so I don't have to type it all):
set SUBSET := {6,7,8,9,10, 16,17,18,19,20, 26,27.....}
Once myTIME increments so that it no longer contains 6, I get a "subscript undefined" error from a constraint. I understand this is because myTIME is then {7..1007}, so in the constraint below tSUB = 6 refers to ALPHA[6], which is undefined:
subject to CONSTRAINT {tSUB in SUBSET}:
ALPHA [tSUB] = ALPHA[last(tSUB)];
What I want is to be able to use SUBSET to always refer to the same index location of ALPHA, DATA, etc.
So:
SUBSET[0] (which equals 6) should always refer to the 6th value of, for example, DATA:
{tSUB in SUBSET}: DATA[tSUB]. When tSUB is 0, I want the 6th value of DATA.
(I am new to Ampl and have a hard time wrapping my head around how indexing and sets work - if anything didn't make sense, please ask and I'll try to clarify. If you think it would be more helpful to see my actual code I'll try to sanitize the company data out and post it). Also, some of the code bits above have abysmal syntax. They are not copied from my code, just approximated to try to explain my problem. :)

You can get the i-th member of set S with member(i, S), where i is a 1-based index and S is an ordered set. This is described in section 5.6, "Ordered sets", of the AMPL book.
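The relationship between member(i, S) and ord(x, S) can be shown with a plain-Java analogue. Both are 1-based, and each undoes the other on an ordered set. A sliding window plays the role of myTIME. This is not AMPL code, and the class and method names are hypothetical. The idea is that SUBSET stores positions, and each position p is looked up with member(p, myTIME). The lookup then stays valid as the window moves, as long as p does not exceed the window's size.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java analogue of AMPL's 1-based member(i, S) and ord(x, S); not AMPL code.
class OrderedSetPositions {

    // i-th member of the ordered set s (1-based, like member(i, S)).
    static int member(int i, List<Integer> s) {
        return s.get(i - 1);
    }

    // 1-based position of x in s (like ord(x, S)).
    static int ord(int x, List<Integer> s) {
        int pos = s.indexOf(x);
        if (pos < 0) {
            throw new IllegalArgumentException(x + " is not in the set");
        }
        return pos + 1;
    }

    // The sliding window {start..start+length-1}, like myTIME in the question.
    static List<Integer> window(int start, int length) {
        List<Integer> w = new ArrayList<>();
        for (int t = start; t < start + length; t++) {
            w.add(t);
        }
        return w;
    }

    public static void main(String[] args) {
        // Position 6 resolves to a member of whichever window is active.
        System.out.println(member(6, window(0, 1001))); // 5
        System.out.println(member(6, window(7, 1001))); // 12
    }
}
```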

Increment in while loop not happening after OR

Imagine you have a while loop whose condition ORs two operands, and the second operand contains an increment (++). If the first operand is true, the increment in the second operand never seems to happen.
Let's take a look at a piece of code:
bool iterationPassed = false;
int retries = 0;
while (!iterationPassed || ++retries <= 3)
{
    if (somethingHappened)
        iterationPassed = true;
}
Okay, I'm well aware that this is logical: with an OR, if the first operand is true, there is no need to evaluate the second. But in this case the second operand has a side effect (the ++), and that side effect never happens. So I guess incrementing, or doing any other operation, inside a complex condition is very bad practice? Also, is this language-specific?
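Short-circuit evaluation of `||` is not limited to one language. Java and C#, for example, both skip the right operand when the left one is true. Both also offer a non-short-circuit `|` for booleans that always evaluates both sides. The Java sketch below shows the missing increment. It also shows a retry loop with no side effects in its condition, where the counter is updated in the body. It assumes the intent is "retry until it passes, at most N times", which uses `&&` rather than `||`. The class name and the `attempt` predicate are hypothetical; the predicate stands in for `somethingHappened`.

```java
import java.util.function.IntPredicate;

class ShortCircuitDemo {

    // With ||, the right operand (and its ++) runs only if the left is false.
    static int retriesAfterShortCircuitOr(boolean left) {
        int retries = 0;
        boolean condition = left || ++retries <= 3; // evaluated for its side effect
        return retries;
    }

    // With the non-short-circuit |, both operands are always evaluated.
    static int retriesAfterPlainOr(boolean left) {
        int retries = 0;
        boolean condition = left | ++retries <= 3;
        return retries;
    }

    // Side-effect-free condition: the counter is updated in the loop body.
    static int attemptsUntilPassed(IntPredicate attempt, int maxRetries) {
        int attempts = 0;
        boolean iterationPassed = false;
        while (!iterationPassed && attempts < maxRetries) {
            attempts++;
            iterationPassed = attempt.test(attempts);
        }
        return attempts;
    }

    public static void main(String[] args) {
        System.out.println(retriesAfterShortCircuitOr(true));    // 0
        System.out.println(retriesAfterShortCircuitOr(false));   // 1
        System.out.println(retriesAfterPlainOr(true));           // 1
        System.out.println(attemptsUntilPassed(n -> n == 2, 3)); // 2
    }
}
```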

We Keep Coding
