Pig: how to exit on failure?

-- do something
store result into '$RESULT.tmp';
rmf $RESULT
mv $RESULT.tmp $RESULT
If an exception is thrown before rmf $RESULT, the script should exit immediately.

This can be achieved with the -F or -stop_on_failure command line flag. If used, Pig will stop execution when the first failed job is detected and discontinue further processing. This also means that file commands that come after a failed store in the script will not be executed (this can be used to create "done" files).
This is how the flag is used:
$ pig -F myscript.pig
or
$ pig -stop_on_failure myscript.pig
Source: http://pig.apache.org/docs/r0.10.0/perf.html#error-handling
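If the script is launched from a wrapper, the wrapper can check the exit status before running any follow-up steps. The following is a minimal shell sketch and not part of Pig: run_checked is a hypothetical helper name, and the commented-out usage assumes that pig returns a non-zero exit status when it stops on a failed job.

```shell
# run_checked is a hypothetical helper (not part of Pig): it runs the given
# command and reports whether it succeeded.
run_checked() {
    if "$@"; then
        echo "OK: $*"
    else
        status=$?
        echo "FAILED (exit $status): $*" >&2
        return "$status"
    fi
}

# Stand-in commands show both outcomes without needing a Pig installation.
run_checked true
run_checked false || echo "stopping here"

# Assumed real usage (needs pig on the PATH):
# run_checked pig -F myscript.pig || exit 1
```

The helper passes the original exit status back to the caller, so the wrapper can decide whether to stop or to clean up first.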


/bin/bash: command not found Google Colab

I am trying to run a ready-made project on Google Colab. When I run a shell script, it gives the following error:
/bin/bash: example.sh: command not found
How can I solve this problem?
You have two options for running a shell script in Google Colab:
1) Execute a single command with !:
!sh example.sh
!echo "I am your code !!!"
2) Execute an entire code block as a shell script with %%shell:
%%shell
sh example.sh
echo "You should add %% "
Note: In the second approach, the entire block is interpreted as a shell script, so you do not need ! at the beginning of every command.
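The error message itself is what bash prints when a name without a slash cannot be found on PATH, which is probably what happened here if the cell ran example.sh directly. The sketch below is meant for a %%shell cell (or any POSIX shell); the temporary directory and the script contents are made up for illustration.

```shell
# Assumption: example.sh sits in the current directory, which is not on PATH.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '#!/bin/sh\necho "hello from example.sh"\n' > example.sh

# Passing the file to an interpreter works without execute permission.
sh example.sh

# Running it by path works once the file is executable.
chmod +x example.sh
./example.sh
```

Either form avoids the PATH lookup that produces "command not found".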

How to keep the snakemake shell file while running in cluster

While running my snakemake file in cluster I keep getting an error,
snakemake -j 20 --cluster "qsub -o out.txt -e err.txt -q debug" -s seadragon/scripts/viral_hisat.snake --config json="<input file>" output="<output file>"
This gives me the following error:
Error in job run_salmon while creating output file /gpfs/home/user/seadragon/output/quant_v2_4/test.
ClusterJobException in line 58 of seadragon/scripts/viral_hisat.snake:
Error executing rule run_salmon on cluster (jobid: 1, external: 156618.sn-mgmt.cm.cluster, jobscript: /gpfs/home/user/.snakemake/tmp.j9nb0hyo/snakejob.run_salmon.1.sh). For detailed error see the cluster log.
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
I cannot find any way to track down the error, since my cluster does not give me a way to store the log files. On the other hand, the /gpfs/home/user/.snakemake/tmp.j9nb0hyo/snakejob.run_salmon.1.sh file is deleted immediately after the job finishes.
Please let me know if there is a way to keep this shell file even if snakemake fails.
I am not a qsub user anymore, but if I remember correctly, stdout and stderr are stored in the working directory, under the job ID that Snakemake reports as "external" in the error message.
You need to redirect the standard output and standard error output to a file yourself instead of relying on the cluster or snakemake to do this for you.
Instead of the following
my_script.sh
Run the following
my_script.sh > output_file.txt 2> error_file.txt
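To see what the redirection captures, here is a small self-contained sketch. my_step is a hypothetical stand-in for the command inside the rule, and the temporary log directory is an assumption; in a real workflow the files would go somewhere persistent on the shared filesystem.

```shell
# my_step is a hypothetical stand-in for the command run by the rule:
# it writes to both streams and then fails.
my_step() {
    echo "progress message"
    echo "something went wrong" >&2
    return 3
}

logdir=$(mktemp -d)

# Send stdout and stderr to separate files and remember the exit status.
status=0
my_step > "$logdir/output_file.txt" 2> "$logdir/error_file.txt" || status=$?

echo "exit status: $status"
echo "stderr was:"
cat "$logdir/error_file.txt"
```

Because the files are written by the command itself, they survive even when the job script is deleted.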

Handling DCL ON ERROR actions after first error?

The OpenVMS DCL command HELP ON EXAMPLE displays:
ON
Examples
1.$ ON SEVERE_ERROR THEN CONTINUE
A command procedure that contains this statement continues
to execute normally when a warning or error occurs during
execution. When a severe error occurs, the ON statement signals
the procedure to execute the next statement anyway. Once
the statement has been executed as a result of the severe
error condition, the default action (ON ERROR THEN EXIT) is
reinstated.
According to the help, if neither [-]x.for nor [-]y.for exists, then the last two lines will not be executed:
$ on error then $ continue
$ rename [-]x.for []
$ rename [-]y.for []
$ type *.for
Is there a way to set the ON ERROR handling as in the first line without placing an ON ERROR statement between each line of the script?
If the ON ERROR fires, you have to re-establish it. It looks like you
don't know whether any of the files exists. So the ON ERROR needs to be
re-established after the first failing command.
You can do this in a subroutine, like in:
$ on error then $ gosub on_error
$ rename [-]x.for []
$ rename [-]y.for []
$ on error then $ exit
$ type *.for
$ exit
$
$ on_error:
$ on error then $ gosub on_error
$ return
Alternatively, you can handle this differently, by disabling error checking (SET
NOON):
$ set noon
$ rename [-]x.for []
$ rename [-]y.for []
$ set on
$ type *.for
or by establishing error handling only for severe errors (ON SEVERE_ERROR):
$ on severe_error then $ exit
$ rename [-]x.for []
$ rename [-]y.for []
$ on error then $ exit
$ type *.for

Redirect stderr through grep -v in LSF batch job

I'm using a library that generates a whole ton of output to stderr (and there is really no way to suppress the output directly in the code; it is ROOT's Minuit2 minimizer which is known for not having a way to suppress the output). I'm running batch jobs through the LSF system, and the error output files are so big that they exceed my disk quota. Erk.
When I run locally on a shell, I do:
python main.py 2> >( grep -v Minuit2 2>&1 )
to suppress the output, as is done here.
This works great, but unfortunately I can't seem to get that or any variation of it to work when running on LSF. I think this is due to LSF not spawning the necessary subshell, but it's not clear.
I run on batch by passing LSF a submit script. The relevant line is:
python main.py $INPUT_FILE
which works great, aside from the aforementioned problem of gigantic error files.
When I try changing that line to
python main.py $INPUT_FILE 2> >( grep -v Minuit2 2>&1 )
I end up with
./singleSubmit.sh: line 16: syntax error near unexpected token `>'
./singleSubmit.sh: line 16: `python $MAIN $1 2> >( grep -v Minuit2 2>&1 )'
in the error log file.
Any idea how I could accomplish what I want, or why this is not working?
Thanks a ton!
The syntax you're using works in bash, not in csh/tcsh. Try changing the first line of your submission script to
#!/bin/bash
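If switching the script to bash is not possible, stderr can also be filtered without process substitution by swapping file descriptors around an ordinary pipe. This is a different technique from the one in the question, sketched here with noisy as a hypothetical stand-in for python main.py $INPUT_FILE and filter_stderr as a made-up helper name.

```shell
# noisy is a hypothetical stand-in for the real program: one line on stdout,
# one unwanted and one wanted line on stderr.
noisy() {
    echo "result: 42"
    echo "Minuit2: iteration 1" >&2
    echo "real error" >&2
}

# Route only stderr through the pipe to grep; stdout bypasses it via fd 3.
# Caveat: the exit status is that of grep, not of the wrapped command.
filter_stderr() {
    { "$@" 2>&1 1>&3 | grep -v Minuit2 1>&2; } 3>&1
}

filter_stderr noisy
```

The Minuit2 line is dropped, while the other stderr line and the stdout line still reach their original destinations.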

Can I fail a build based on the outcome of a SSH Task?

I was wondering if I could use bamboo's SSH task to run a script (this kicks off a small java message injector).
Then grep the logs for ERRORS. If any ERROR is present I would like to fail the build.
Something like this:
Is this a Bash question, or is it really about Bamboo? Here is an answer to the Bash part:
If you run
[[ ! $(grep ERROR /a/directory/log/*) ]]
the command returns a non-zero exit status if grep finds the word "ERROR" anywhere in the files. If it is the last command in the script, the script exits with that status,
and Bamboo should detect the task execution as failed.
(Note that if Bash is not the default shell on your target system you may need a #!/bin/bash on top of the script file.)
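A variant of the same check can be written with grep -q, which also works in a plain POSIX shell. This is only a sketch: fail_on_error_lines is a hypothetical helper name, and the log files are created in a temporary directory purely for demonstration.

```shell
# fail_on_error_lines is a hypothetical helper: it returns 1 if any of the
# given files contains the word ERROR, and 0 otherwise.
fail_on_error_lines() {
    if grep -q ERROR "$@"; then
        echo "ERROR found in logs" >&2
        return 1
    fi
    return 0
}

# Demonstration logs (made up for this example).
logdir=$(mktemp -d)
printf 'INFO started\nINFO done\n' > "$logdir/clean.log"
printf 'INFO started\nERROR injector failed\n' > "$logdir/bad.log"

fail_on_error_lines "$logdir/clean.log" && echo "clean.log: pass"
fail_on_error_lines "$logdir/bad.log" || echo "bad.log: fail"
```

In a build task, the call would be the last command of the script, or be followed by || exit 1, so that the non-zero status reaches the build tool. Note that a missing log file is treated as "no ERROR found" in this sketch.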