Pentaho: Executing a Job multiple times with different params - params are not sent over to the Job

I have a Pentaho "Job" that should be executed multiple times, each time with a different param value.
I indeed have a similar set up as we see in this link https://help.pentaho.com/Documentation/8.1/Products/Data_Integration/Transformation_Step_Reference/Job_Executor
The Job does get executed as many times as the number of rows, although the different parameter values are not passed onto the Job. If I add a write to Log in the job it does not print the different values.

This turned out to be a defect in 8.1.x. Upgrading to 8.2.x resolved it.

Related

How to add (concatenate) variables inside batch processing in Mule 4?

I am processing records from one DB to another DB. The batch job is called multiple times within a single request (the process API URL is triggered only once).
How can I add up the total records processed (given by the payload in the On Complete phase) for one complete request?
For example, in one run the batch job executed three times, and I want the sum of the records processed across all three batch jobs.
That's not possible because of how the Batch scope works:
In the On Complete phase, none of these variables (not even the original ones) are visible. Only the final result is available in this phase. Moreover, since the Batch Job Instance executes asynchronously from the rest of the flow, no variable set in either a Batch Step or the On Complete phase will be visible outside the Batch Scope.
source: https://docs.mulesoft.com/mule-runtime/4.3/batch-processing-concept#variable-propagation
What you could do is store the results in a persistent repository, for example in your database.
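As a rough sketch of that idea (the table, column, and ID values below are illustrative assumptions, not part of the question), each Batch Job Instance could insert its own processed-record count keyed by a request correlation ID, and the caller could sum them afterwards:

    -- Illustrative only: every name and value here is an assumption.
    CREATE TABLE batch_run_totals (
        request_id   VARCHAR(64) NOT NULL,  -- correlation ID of the triggering request
        job_instance VARCHAR(64) NOT NULL,  -- Batch Job Instance identifier
        record_count INT         NOT NULL,  -- count reported in the On Complete payload
        PRIMARY KEY (request_id, job_instance)
    );

    -- Each On Complete phase inserts its own count...
    INSERT INTO batch_run_totals (request_id, job_instance, record_count)
    VALUES ('req-123', 'instance-1', 500);

    -- ...and once all instances have finished, sum them per request.
    SELECT SUM(record_count) AS total_processed
    FROM batch_run_totals
    WHERE request_id = 'req-123';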

SSRS Report won't compile if 20+ values selected in multi-value parameter

I am working on a SQL report that uses a multi-value parameter containing a total of 41 users. The report works fine if I select 1 or up to 19 users, but breaks if 20 or more are selected from the list.
By "breaks" I mean it attempts to execute for 40+ minutes before I kill it. When running for 1 or for 19 users the report takes about 1:10 to run.
I am using two datasets.
One - my main query in which the parameter is used.
Two - a query that acquires the list of users for the SSRS parameter.
I use this method frequently with no issues for things like locations, insurances, etc.
The parameter is referenced in a WHERE clause like so: AND EventUserID IN (@user)
If I comment that line out and instead hard-code the full list of usernames, AND EventUserID IN ('KTR','GORCN',......), acquired with the same query used in the second dataset, the report works fine and returns the full result.
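For clarity, the pattern in the main dataset query looks roughly like this; only EventUserID and @user come from the question, the other names are made up for illustration:

    -- Sketch of the main dataset query; the table and the other column names are assumptions.
    SELECT e.EventID,
           e.EventUserID,
           e.EventDate
    FROM   dbo.Events AS e
    WHERE  e.EventDate >= @StartDate
      AND  e.EventUserID IN (@user);  -- @user is the multi-value SSRS parameter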
I have tested it with different groups of users to make sure that a particular user wasn't breaking it, but that made no difference. I should also mention that the query for the second dataset is one I reused from another report that uses it in exactly the same way. That report runs fine with all users selected (the parameter properties are set the same).
I am working with MS SQL Server and MS Visual Studio. More details can be provided if necessary.
Thanks in advance for your time and assistance.
This looks like an issue I came across a while back. There is essentially a limit on the number of parameter values you can pass back to SSRS.
Here are some similar issues:
https://blogs.msdn.com/b/johndesch/archive/2012/05/16/webpage-error-when-running-a-parameterized-report-with-parameters.aspx
You can increase this limit in the web.config file, though:
https://epmainc.com/blog/ssrs-reports-error-when-large-number-parameters-are-passed
Basically you want to add a key inside the <appSettings> section of your web.config file:
<add key="aspnet:MaxHttpCollectionKeys" value="9999" />
The 9999 value should be at least the number of parameter values you expect to be passed.
If using SharePoint integrated mode: C:\inetpub\wwwroot\wss\VirtualDirectories\\web.config
If using SSRS native mode: C:\Program Files\Microsoft SQL Server\MSSQL\Reporting Services\ReportServer.

SQLAgent job with different schedules

I am looking to see if it's possible to have one job that runs on different schedules, with the catch being that one of the schedules needs to pass in a parameter.
I have an executable that will run some functionality when there is no parameter, but if there is a parameter present it will run some additional logic.
Setting up my job, I created a schedule (every 15 minutes) with an Operating system (CmdExec) step that runs:
runApplication.exe
For the other schedule I would like it to run once per day, but the command would need to be: runApplication.exe "1"
I don't think I can create a different step with a separate schedule, or can I?
Anyone have any ideas on how to achieve this without having two separate jobs?
There's no need for two jobs. What you can do is update your script so that your parameter is stored in a table, and update your secondary logic to reference that table. If there is a value in that parameter table, run the secondary logic; if there isn't, have the secondary logic return 0 or not run at all. All in one script.
Just make sure you either truncate the parameter table on every run or store a date in it so you know which row to reference.
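A minimal T-SQL sketch of that approach, where every object name is an assumption rather than something from the question:

    -- Control table written by the once-a-day schedule instead of passing "1" on the command line.
    CREATE TABLE dbo.JobRunParameter (
        RunDate    DATE        NOT NULL PRIMARY KEY,
        ParamValue VARCHAR(10) NULL
    );

    -- The daily schedule records the parameter for today...
    INSERT INTO dbo.JobRunParameter (RunDate, ParamValue)
    VALUES (CAST(GETDATE() AS DATE), '1');

    -- ...and the application (or a wrapper step) checks it on every 15-minute run.
    SELECT ParamValue
    FROM   dbo.JobRunParameter
    WHERE  RunDate = CAST(GETDATE() AS DATE);
    -- If a row with a value comes back, run the additional logic; otherwise skip it or return 0.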
Good luck.

Why does BigtableIO write records one by one after GroupBy/Combine DoFn?

Is anyone aware of how bundles work within BigtableIO? Everything looks fine until GroupBy or a Combine DoFn is used. At that point, the pipeline changes the pane of our PCollection elements from PaneInfo.NO_FIRING to PaneInfo{isFirst=true, isLast=true, timing=ON_TIME, index=0, onTimeIndex=0}, and BigtableIO then outputs the following log: INFO o.a.b.sdk.io.gcp.bigtable.BigtableIO - Wrote 1 records. Is the logging causing a performance issue when one has millions of records to output, or is it the fact that BigtableIO opens and closes a writer for each record?
BigtableIO sends multiple records in a batch RPC. However, that assumes there are multiple records sent in the "bundle". Bundle sizes depend on a combination of the preceding step and the Dataflow framework. The problems you're seeing don't seem to be related to BigtableIO directly.
FWIW, the logging of the number of records written happens in the finishBundle() method.

Pentaho Job - Execute job based on condition

I have a Pentaho job that is scheduled to run every week; it gets data from one table and populates another.
Currently the job executes every week irrespective of whether the source table was updated or not.
I want to add a condition, checked before the job runs, that determines whether the source was updated in the last week, and run the job only if it was; otherwise the job should not run.
There are many ways you can do this. Assuming you have a table in your database that stores the last date your job was run, you could do something like the following.
Create a Job and configure a parameter in it (I called mine RunJob). Create a transformation which gets your max run date or row count, then looks up the run date or row count from the previous run and compares the two. It then sets the value of your job's variable based on the result of the comparison. Mine looks like this.
Note that the last step in the transform is a Set Variables step from the Job branch.
Then in your job use a Simple Evaluation step to test the variable. Mine looks like this:
Note here that my transform sets the value of the variable only if the job needs to be run, otherwise it will be NULL.
Also be sure to update your last run date or row count after doing the table load; that's what the SQL step at the end of the job does.
You could probably handle this with fewer steps if you used some JavaScript in there, but I prefer not to script if I can avoid it.
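For reference, here is a minimal sketch of the comparison that transformation performs, expressed as SQL; the job_control table, its columns, and the job name are assumptions, not part of the original answer:

    -- Set the job variable only when the source has changed since the last successful run.
    SELECT CASE
               WHEN (SELECT MAX(last_updated) FROM source_table)
                  > (SELECT last_run_date FROM job_control WHERE job_name = 'weekly_load')
               THEN 'Y'
           END AS run_job;

    -- After a successful load, the final SQL step in the job records the new run date.
    UPDATE job_control
    SET    last_run_date = CURRENT_TIMESTAMP
    WHERE  job_name = 'weekly_load';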