Setting a variable from a shell script in Pentaho Kettle which can be accessed by further jobs - pentaho

I wanted to know how I can set a variable from a shell job in Pentaho Kettle so that it can be accessed by further jobs (Simple evaluation) in the workflow.
I am trying to create a workflow with a Start element that triggers a shell job to check for the presence of a folder; if the folder is present, the shell job sets a variable. The next job is a Simple evaluation, which needs to check whether the variable (set by the shell job) is true and then either proceed with the workflow or terminate it.
Start-->ShellJob(check folder created and set variable)-->SimpleEvaluation Job.
--MIK

Good question. I'm not aware of such a capability, as the "Execute a shell script..." step isn't designed to be a data pipeline. Furthermore, what values should/can a script return to you? Is it the result of an echo? A shell script could essentially be anything. I would say there's a reason why there is no built-in functionality for that in PDI.
Having said that, what you could do is something like this:
Execute the script and, at the end of it, write the variables into a text file on the file system
Create a sub-transformation that reads the variables from the file you've written in the shell script step, and then stores them in a global-scope variable
Evaluate the variables in the job
It may seem a bit cumbersome, but it should do the job for you, since you're asking to use the Shell Script step in a way it's not really designed to be used.
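As a rough sketch of the first step, assuming a Windows box where the shell job runs PowerShell (the same idea works with bash or any other shell), the script checks for the folder and writes a small key=value file that the sub-transformation can read; the folder path and file names are placeholders:

# Check whether the expected folder exists and record the result
# in a key=value file that a Kettle transformation can read later.
$folder = "C:\data\incoming"                     # placeholder path
$resultFile = "C:\data\folder_check.properties"  # placeholder path

if (Test-Path -Path $folder -PathType Container) {
    Set-Content -Path $resultFile -Value "FOLDER_PRESENT=true"
} else {
    Set-Content -Path $resultFile -Value "FOLDER_PRESENT=false"
}

The sub-transformation then reads FOLDER_PRESENT from that file and promotes it to a global-scope variable that the Simple evaluation entry can test.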
Here's an example of a high-level implementation (implementation of the sub-transformation should be very simple):
I hope it helps.

Related

How do I access a variable from Bamboo in a Gradle script?

I am creating a deployment in Bamboo. I have some variables set up under the deployment plan. How can I access these from a Gradle script? There is an arguments input (where I guess I would use something like variable=${bamboo.variable}), but I can't work out how to get them to go through to the script (at the moment I'm just doing something like println variable to get them out). How can I do this?
As far as I know, Bamboo exports all its variables into the build environment. In that case, you can get any variable within the script as follows:
System.getenv('bamboo.variable')
Alternatively, you may pass it into the build as a build script parameter, like so:
-Pvariable=${bamboo.variable}
and then you can get it within a script as a project property:
println variable

How to call variables from a running script in a different script - powershell

I want one script to get a variable from a different script, where both are running on the system. Is that possible?
I have two scripts running on the system, and I want one script to pull a user-defined variable instead of asking the user to input the data twice.
Assuming both scripts are running concurrently, MSMQ would be one option:
What's the best way to pass values to a running/background script block?
Global variables are only "global" in the context of scopes in the current session. They aren't visible to a script running in a different session.
Write it to a text file and have the other script read it. You can load up the text into a variable, then execute it as code using iex (Invoke-Expression).
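As a minimal sketch of that text-file approach (paths and the variable name are just examples):

# Script A: persist the user-supplied value as a line of PowerShell code
$userName = Read-Host "Enter the user name"
Set-Content -Path "C:\temp\shared_vars.ps1.txt" -Value "`$userName = '$userName'"

# Script B: load the text and execute it to recreate the variable
$code = Get-Content -Path "C:\temp\shared_vars.ps1.txt" -Raw
Invoke-Expression $code    # iex is the alias for Invoke-Expression
Write-Host "Got user name: $userName"

Keep in mind that executing file contents with Invoke-Expression assumes you trust whatever ends up in that file; if the value may contain quotes or arbitrary text, writing the raw value and reading it back with Get-Content is the safer variant.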

Execute Multiple PowerShell Files using an SSIS Package

I have multiple PowerShell script files that I need to execute in a sequential flow (one after another). Can someone please help me with how to schedule multiple PowerShell files to be executed using an SSIS package? I also need to build a fault-tolerant model where I re-execute a PowerShell script in case of failure.
Running PowerShell
There isn't a built-in Execute PowerShell task (pity), so you'll need to use an Execute Process Task with the path to powershell.exe.
Something you will need to take into consideration is that the default execution policy for PowerShell is Restricted, which cannot run a script. Further complicating matters, the account that runs the SSIS package will also need to have its execution policy modified to be able to fire off those scripts. It's a simple matter of Set-ExecutionPolicy RemoteSigned, or whatever level you feel is appropriate, but you'll need to do this from within that account.
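For example, the Execute Process Task could be configured roughly like this (the script path is hypothetical):

Executable: C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe
Arguments:  -NoProfile -ExecutionPolicy Bypass -File "C:\Scripts\Step1.ps1"

Passing -ExecutionPolicy Bypass on the command line is an alternative to changing the policy for the executing account with Set-ExecutionPolicy; use whichever your security requirements allow.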
Fault Tolerance
The simple approach is to ignore the return code in the Execute Process Task. Alternatively, if the desire is to keep running the PS1 until it doesn't fail, then you'd wrap a For Loop Container around the Execute Process Task and only set the terminal condition once the task returns a success value. Things might still go sideways depending on what the failure is.
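If you would rather keep the retry out of SSIS, another option (sketched here with a hypothetical script path and retry count, and assuming the inner script reports failure through its exit code) is a small PowerShell wrapper that the Execute Process Task calls instead:

# Re-run the script up to $maxAttempts times; exit 0 on the first success.
$scriptPath  = "C:\Scripts\Step1.ps1"   # hypothetical path
$maxAttempts = 3

for ($attempt = 1; $attempt -le $maxAttempts; $attempt++) {
    & $scriptPath
    if ($LASTEXITCODE -eq 0) { exit 0 }
    Write-Host "Attempt $attempt failed with exit code $LASTEXITCODE"
}
exit 1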

Check for multiple files

Okay, I'll try to explain as well as I can... Quite a particular case.
Tools: SSIS 2008
We have a control flow that now needs to be triggered by an event: the presence of one or multiple files (1, 2, or 3).
The variables used:
BO_FileLocation_1
BO_FileLocation_2
BO_FileLocation_3
BO_FileName_1
BO_FileName_2
BO_FileName_3
There can be one, two, or three files, as defined in the variables above. When they are filled in,
they should be processed. When they are empty, it means there's just one file; the process should ignore them and jump to the next (file watcher?) task.
For example:
BO_FileLocation_1 = "C:\"
BO_FileLocation_2 = NULL
BO_FileLocation_3 = NULL
BO_FileName_1 = "test.csv"
BO_FileName_2 = NULL
BO_FileName_3 = NULL
The report only needs one file.
I'd need a generic concept that checks for the presence of these files; it may need to be more generic than my SSIS knowledge can handle right now. That would be handy if, for example, a 4th file is added in the future. I was also thinking of working with a single script to handle all the logic.
Thanks in advance
If all you want is to trigger the Copy Source File to handle if one or more of the files is present, just use the OR Constraint in your flow. The following image shows you how:
First connect all to the destination:
Then click one of the green arrows. This will make its properties window pop up. Select Logical OR instead of Logical AND:
If everything went well, you should now see the connections as dashed lines:
There are several possible solutions:
Create a sequence container and include all the file imports in the sequence container. Add int variables for RowCountFile1, RowCountFile2, and RowCountFile3 and set the value to 0 (this is the default value when you create an int variable). Add a Row Count transformation to each of the data flows. Create a precedence constraint from the sequence container to the "Do something" task. Set the precedence constraint to success and expression. Set the expression value to @RowCountFile1 > 0 || @RowCountFile2 > 0 || @RowCountFile3 > 0. The advantage of this approach is that you can take an action as soon as the files are detected, you import all available files, and you only take an action after all the files have been imported. You could then schedule running this SSIS package as a SQL Server Agent job step and run it as frequently as you want.
A variant on solution 1 is to use Foreach Loop containers with a file enumerator inside the sequence container. This would be useful if you don't know the exact name of the file and you expect to import more than one under some circumstances. For instance, if you get a file every few minutes with a timestamp in its file name and your process doesn't run for some reason, then you may have to process multiple files to get caught up and then take an action once that has been done.
You could use the file watcher task as you outlined in your question. The only problem I have with the file watcher task is that the package has to be in a constantly running state. This makes it hard to troubleshoot problems and performance. It can also introduce other problems, since I remember having some issues with the file watcher task years ago when it first came out. It may well be a totally stable task now, but I prefer other methods after having been burned previously. If you really want the package to run continuously instead of having it be called by a job, then you could always use a script task to check for the file, sleep the thread if it's not found, check again, etc. I'm sure that's what the file watcher task does, but I would trust my own C# over the task. Power to anyone who has had better experiences than me with File Watcher...
Use PowerShell. If you just want to take an action when a file appears and you aren't importing the data, then a PowerShell script could do this just as well as an SSIS package (a rough sketch follows at the end of this answer). The drawback is that you have to learn some basic PowerShell, it may be hard to maintain in the future since PowerShell is probably not your bread-and-butter language, and you may have to rewrite the code as an SSIS package if you later want to import the data. You would probably call the PowerShell script from a SQL Server Agent job step, so scheduling can be handled pretty easily.
There are more options than what I listed, so let me know if you still want more suggestions.
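As a rough sketch of the PowerShell option above (folder and file names are placeholders), a short script scheduled from a SQL Server Agent job step could simply test for the files and react when at least one is present:

# Check the configured locations and act when at least one file is present.
$files = @(
    "C:\inbox\test.csv",       # BO_FileLocation_1 + BO_FileName_1
    "C:\inbox\extra_1.csv",    # BO_FileLocation_2 + BO_FileName_2 (may be empty)
    "C:\inbox\extra_2.csv"     # BO_FileLocation_3 + BO_FileName_3 (may be empty)
)

$present = @($files | Where-Object { Test-Path -Path $_ })

if ($present.Count -gt 0) {
    Write-Host "Found $($present.Count) file(s): $($present -join ', ')"
    # ... kick off the follow-up action here (copy, start a load, etc.)
} else {
    Write-Host "No files found yet."
}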

MS Access: doing repetitive processes with VBA/SQL

I have an Access database back end that contains three tables. I have distributed the front end to several users. This is a very simple database with minimal functionality. I need to import certain rows from a file every hour into one of the tables in the database. I would like to know the best way to automate this process so that I can have it running hourly. I need it to run sort of as a service in the background. Can you tell me how you would do this?
You could have, for example:
an MS Access file with all the necessary code to run the import procedure
a BAT file containing the command line(s) that will run this MS Access file with all the requested parameters. Check the MS Access command-line parameters to see the available options (a sample command line is sketched below).
task scheduler software to launch the BAT file: depending on the task scheduler and the command line to be sent, you could even skip the BAT file step
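The BAT file itself can be a single command line; for example (the Office path, database path, and macro name are hypothetical; the /x switch runs the named macro):

"C:\Program Files\Microsoft Office\Office14\MSACCESS.EXE" "C:\Import\ImportTool.accdb" /x ImportHourly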
If all you want to do is run some queries, I would not do this by automating all of Access, but instead by writing a VBScript that uses DAO to execute the SQL directly. That's a much more efficient way to do it, and will run without a console logon (which may or may not be required for full Access to be run by the task scheduler).
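As a minimal sketch of that idea, shown here in PowerShell rather than VBScript (the DAO calls are the same; the ProgID assumes the Access Database Engine is installed, and the database path and SQL are hypothetical):

# Run an action query directly against the back end via DAO,
# without opening the full Access application.
$dbe = New-Object -ComObject DAO.DBEngine.120   # requires the Access Database Engine
$db  = $dbe.OpenDatabase("C:\Data\Backend.accdb")

# Example action query; a real hourly import would typically read the new file
# (e.g. via a linked text file or a saved append query) and insert its rows.
$db.Execute("INSERT INTO tblImport (ImportedAt) VALUES (Now())")

$db.Close()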