Pentaho Data Integration - Pass dynamic value for 'Add sequence' as Start

Can we pass a dynamic value (the max value of another table's column) as the "Start at Value" in the Add sequence step?
Please guide me.

Yes, but as the step is written you'll have to be sneaky about it.
Create two transformations and wrap them up in a job. In the first transformation, query the database to get the value you want, then store it in a variable. In the second transformation, which the job executes after the first, use variable substitution on the Start at Value field of the Add sequence step to substitute in the value you extracted earlier.
Note that you can't do this all in one transformation, because there is no way to ensure that the variable is set before the Add sequence step runs. (Wait steps might seem to make this possible, but I tried that in the past without success, which is why I went with the approach described above.)
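As a minimal sketch (the table, column, and variable names are illustrative): the first transformation's Table Input could run something like

SELECT COALESCE(MAX(id), 0) AS START_VALUE
FROM other_table

and feed the result into a Set Variables step that creates a variable named START_VALUE. The second transformation's Add sequence step then uses ${START_VALUE} in its Start at Value field.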

Related

Can I get pipeline parameters in Synapse Studio Notebook dynamically without predefining parameter names in parameter cell?

In the Pipeline that triggers the Notebook, I'm passing in some base parameters.
Right now, all the materials I've read instruct me to declare variables with the same names inside the Notebook parameter cell.
Is there a way that I can get the Pipeline parameters dynamically without pre-defining the variables? (Similar mechanism with sys.argv, in which a list of args are returned without the need of predefined variables)
To pass parameters, they have to be declared in the Synapse notebook so that the values can be received and used as required.
The closest you can get to sys.argv is to declare just one parameter in the parameter cell.
Then concatenate all the values, separated by a delimiter such as a space or a comma, so that you can split the string and use the values just as you would with sys.argv. Instead of creating a new parameter each time, you simply concatenate the new value onto this one parameter.
@{pipeline().parameters.a} @{pipeline().parameters.b} @{pipeline().parameters.c}
When I run this, the values are passed through as one delimited string. You can use split on the args variable and then use the values accordingly, as sketched below.
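A minimal sketch of the notebook side, assuming the single declared parameter is named args and the values are space-separated (the names are illustrative):

# Parameter cell: a single parameter that the pipeline overrides at run time
args = ""

# Any later cell: split the delimited string, much like reading sys.argv
values = args.split(" ")
for i, value in enumerate(values):
    print(i, value)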

Data Factory expression substring? Is there a function similar to right?

Please help,
How could I extract 2019-04-02 out of the following string with an Azure Data Flow expression?
ABC_DATASET-2019-04-02T02:10:03.5249248Z.parquet
The first part of the string, received as a ChildItem from a Get Metadata activity, is dynamic. So in this case it is ABC_DATASET that is dynamic.
Kind regards,
D
There are several ways to approach this problem, and they really depend on the format of the string value. Each of these approaches uses a Derived Column to either create a new column or replace the existing column's value in the Data Flow.
Static format
If the format is always the same, meaning the length of each section is fixed, then substring is simplest; it parses the date straight out of the string.
Useful reminder: substring and array indexes in Data Flow are 1-based.
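For example, assuming the incoming column is named fileName (an illustrative name) and holds ABC_DATASET-2019-04-02T02:10:03.5249248Z.parquet, the date starts at position 13 and is 10 characters long, so the Derived Column expression would be roughly:

substring(fileName, 13, 10)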
Dynamic format
If the format of the base string is dynamic, things get a tad trickier. For this answer, I will assume that the basic format of {variabledata}-{timestamp}.parquet is consistent, so we can use the hyphen as the base delimiter.
Derived Column has support for local variables, which is really useful when solving problems like this one. Let's start by creating a local variable that converts the string into an array, split on the hyphen. This will cause some trouble further on, since the timestamp portion also contains hyphens, but we'll deal with that shortly. Inside the Derived Column Expression Builder, select "Locals":
On the right side, click "New" to create a local variable. We'll name it and define it using a split expression:
Press "OK" to save the local and go back to the Derived Column. Next, create another local variable for the yyyy portion of the date:
The cool part of this is I am now referencing the local variable array that I created in the previous step. I'll follow this pattern to create a local variable for MM too:
I'll do this one more time for the dd portion, but this time I have to do a bit more to get rid of all the extraneous data at the end of the string. Substring again turns out to be a good solution:
Now that I have the components I need isolated as variables, we just reconstruct them using string interpolation in the Derived Column:
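Putting it together (assuming the incoming column is named fileName; the local names are illustrative and the exact syntax for referencing locals may vary by Data Flow version), the locals and the final column might look roughly like this:

varParts: split(fileName, '-')
varYear: :varParts[2]
varMonth: :varParts[3]
varDay: substring(:varParts[4], 1, 2)

Derived Column value: "{:varYear}-{:varMonth}-{:varDay}"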
Back in the data preview, the new column shows the extracted dates.
Where else to go from here
If these solutions don't address your problem, then you have to get creative. Here are some other functions that may help:
regexSplit
left
right
dropLeft
dropRight

Pentaho Kettle (PDI) table input step with field substitution running slower than using literal

I'll go straight to the point. I have a table input step, which reads records with a query that includes a where clause, as follows:
SELECT id, name, surname, creation_date
FROM users
WHERE creation_date > ?
If I put a literal (e.g. '2017-04-02T00:00:00.000Z') in place of the question mark, this step reads all new values, which could be thousands, in milliseconds. If I use field substitution with the incoming value, it takes minutes.
Do you know why this could be happening? Do you know how to solve the issue?
Thank you very much for your time.
I found a workaround, not a solution for this particular issue, but it works: instead of getting the value from the previous step and using field substitution (? in the query), I read the value in a previous transformation in the job, store it in the variable space, and read it from there using variable substitution ('${variable_name}' in the query). It runs just as fast as if the value were hardcoded.
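With that workaround, the Table Input query might look roughly like this (the variable name is illustrative, and "Replace variables in script" has to be enabled on the step):

SELECT id, name, surname, creation_date
FROM users
WHERE creation_date > '${LAST_CREATION_DATE}'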

kettle etl passing variable

In my transformation, I created a field (the current time formatted as yyyy-mm-dd HH24mmss) in a Modified Java Script step. I then use a Set Variables step to turn the field into a variable, with the scope set to valid in the root job.
The question is how to use that variable in another transformation (in the same job). I tried the Get Variables step, but it only seems to list system variables. What I want to do is output the date to a file in the second transformation. There are more transformations in between, which is why I can't do the output in the first transformation.
Or is it possible to create a variable in the job, set its value (the current date in yyyy-mm-dd HH24mmss), and then use it in the transformations?
EDIT:
The answer works, but the date is not in my expected format (yyyy-mm-dd HH24mmss), and it's not clear what format the date is in. E.g. if I try to format it in a Modified Java Script step and call getFullYear on it, I get TypeError: Cannot find function getFullYear in object Wed May 25 17:44:04 BST 2016. But if I just output it to a file, the date comes out as yyyy/mm/dd hh:mm:ss.
So I found another way to do it: use a Table Input step to generate the date in the desired format, then set the variable from that; the rest is the same.
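For example, on PostgreSQL the Table Input query could format the timestamp directly (the function and format mask vary by database):

SELECT to_char(now(), 'YYYY-MM-DD HH24MISS') AS run_timestamp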
In your first transformation use the Get System Info step to inject the current date/time into your data flow and run it into a Set Variables step that sets the variable defined in your Job.
The variable you're using may not appear in the drop down list when you do CTRL-Space. This is because the variable is allocated by the Job at run time and isn't available at design time. Just type '${VariableName}' into the field at design time. When you run from a job that contains a variable of that name, it should work.

how to create a sequence without using mapping variables and sequence generator?

I had a scenario where I have to generate a sequence without using the Sequence Generator transformation.
I can do this using mapping variables, e.g. with the SETCOUNTVARIABLE() option, and it works, but is there any other solution?
Thanks
Here the unconnected lookup is only being used to get the count or max of the sequence from the target table.
We can also achieve this by using a Stored Procedure transformation.
Using an unconnected Lookup and an Expression transformation:
Create an unconnected Lookup to get the max sequence value from the target; for that, override the lookup query so that it returns the max value.
Return that max-value port to an Expression transformation.
Now create a variable port that increments by 1 and add it to the max value; this will create a sequence value for every record passed to the target table.
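A rough sketch with illustrative names: the lookup SQL override just returns the current maximum key from the target, e.g.

SELECT MAX(seq_id) AS max_seq_id
FROM target_table

and in the Expression transformation a variable port such as v_counter = v_counter + 1, combined with an output port like o_seq_id = max_seq_id + v_counter, then yields max+1, max+2, ... for each row.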