How to keep the snakemake shell file while running in cluster

While running my Snakemake file on a cluster, I keep getting an error. The command is:
snakemake -j 20 --cluster "qsub -o out.txt -e err.txt -q debug" -s
seadragon/scripts/viral_hisat.snake --config json="<input file>"
output="<output file>"
This gives me the following error:
Error in job run_salmon while creating output file
/gpfs/home/user/seadragon/output/quant_v2_4/test.
ClusterJobException in line 58 of seadragon/scripts/viral_hisat.snake
:
Error executing rule run_salmon on cluster (jobid: 1, external: 156618.sn-mgmt.cm.cluster, jobscript: /gpfs/home/user/.snakemake/tmp.j9nb0hyo/snakejob.run_salmon.1.sh). For detailed error see the cluster log.
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
I can't find any way to track down the error, since my cluster does not give me a way to store the log files. On the other hand, the /gpfs/home/user/.snakemake/tmp.j9nb0hyo/snakejob.run_salmon.1.sh file is deleted immediately after the job finishes.
Please let me know if there is a way to keep this shell file even if Snakemake fails.

I am not a qsub user anymore, but if I remember correctly, stdout and stderr are stored in the working directory, under the jobid that Snakemake gives you under external in the error message.

You need to redirect the standard output and standard error output to a file yourself instead of relying on the cluster or snakemake to do this for you.
Instead of the following
my_script.sh
Run the following
my_script.sh > output_file.txt 2> error_file.txt
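As a rough sketch of that redirection, the snippet below runs a stand-in command, writes its two streams to separate files, and keeps the exit status so the failure can be inspected afterwards. The function my_script and the temporary log directory are hypothetical placeholders for the real command and log location.

```shell
#!/bin/sh
# Hypothetical stand-in for the real command (for example the salmon call).
my_script() {
    echo "normal output"
    echo "something went wrong" >&2
    return 3
}

# Assumed log location; on a cluster this should be a path on shared storage.
log_dir=$(mktemp -d)

# Send stdout and stderr to separate files and remember the exit status.
status=0
my_script > "$log_dir/output_file.txt" 2> "$log_dir/error_file.txt" || status=$?

echo "exit status: $status"
echo "captured stderr:"
cat "$log_dir/error_file.txt"
```

Because the files are written by the command itself, they survive even when the scheduler's own logs or the temporary job script are gone.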

Related

Snakemake cannot find output file, gives MissingOutputException while latency-wait is seemingly ignored

I have a simple rule to generate a file in Snakemake. Running snakemake results in an immediate error that it cannot find the generated file, even when --latency-wait is specified as a command line option.
However, this does seem to be a latency-related issue, as this Snakefile runs without problems on a local machine. The output below is on a system that has known latency problems.
Contents of Snakefile:
rule generate_file:
    output:
        "dummy.txt"
    shell:
        "head --bytes 1024 < /dev/zero | base64 > '{output}'; ls"
Commands:
$ snakemake --version
5.2.0
$ snakemake -p --latency-wait 10
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 generate_file
1
rule generate_file:
output: dummy.txt
jobid: 0
head --bytes 1024 < /dev/zero | base64 > 'dummy.txt'; ls
dummy.txt Snakefile
MissingOutputException in line 1 of /home/user/project/Snakefile:
[Errno 2] No such file or directory: ''
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Removing output files of failed job generate_file since they might be corrupted:
dummy.txt
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/user/project/.snakemake/log/2018-08-08T101648.774072.snakemake.log
Interestingly, the ls command shows the file is created and visible.
Your rule creates the output file dummy.txt with Snakemake version 5.2.2 on Linux, and Snakemake ends successfully. Perhaps it is a bug in version 5.2.0? I don't see anything about it in the changelog, though.
On a related note, using head in a shell command used to result in a non-zero exit status error. Apparently recent versions behave differently in this respect.
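One way head can lead to a non-zero exit status, assuming the command runs in bash with pipefail enabled (which, as far as I know, is how Snakemake runs shell commands): when head exits early, the process feeding it is stopped with a broken pipe, and pipefail reports that as a failure of the whole pipeline. The sketch below uses yes as an arbitrary stand-in producer. Note that in the rule above head is the producer rather than the consumer, so this particular mechanism may not be what happened there.

```shell
#!/bin/sh
# With pipefail: the upstream `yes` is stopped once head exits,
# so the pipeline as a whole reports a non-zero status.
with_pipefail=0
bash -c 'set -o pipefail; yes | head -n 1 > /dev/null' || with_pipefail=$?

# Without pipefail: only the status of the last command (head) counts.
without_pipefail=0
bash -c 'yes | head -n 1 > /dev/null' || without_pipefail=$?

echo "with pipefail:    $with_pipefail"
echo "without pipefail: $without_pipefail"
```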

Redirect stderr through grep -v in LSF batch job

I'm using a library that generates a whole ton of output to stderr (and there is really no way to suppress the output directly in the code; it is ROOT's Minuit2 minimizer which is known for not having a way to suppress the output). I'm running batch jobs through the LSF system, and the error output files are so big that they exceed my disk quota. Erk.
When I run locally on a shell, I do:
python main.py 2> >( grep -v Minuit2 2>&1 )
to suppress the output, as is done here.
This works great, but unfortunately I can't seem to get that or any variation of it to work when running on LSF. I think this is due to LSF not spawning the necessary subshell, but it's not clear.
I run on batch by passing LSF a submit script. The relevant line is:
python main.py $INPUT_FILE
which works great, aside from the aforementioned problem of gigantic error files.
When I try changing that line to
python main.py $INPUT_FILE 2> >( grep -v Minuit2 2>&1 )
I end up with
./singleSubmit.sh: line 16: syntax error near unexpected token `>'
./singleSubmit.sh: line 16: `python $MAIN $1 2> >( grep -v Minuit2 2>&1 )'
in the error log file.
Any idea how I could accomplish what I want, or why this is not working?
Thanks a ton!
The syntax you're using works in bash, not in csh/tcsh. Try changing the first line of your submission script to
#!/bin/bash
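Once the script runs under a Bourne-style shell, the same filtering can also be written without process substitution by swapping file descriptors, which works in plain sh as well as bash. This is only a sketch: noisy_job is a hypothetical stand-in for `python main.py $INPUT_FILE`, and the temporary files stand in for the LSF output and error files.

```shell
#!/bin/sh
# Hypothetical stand-in for the real job.
noisy_job() {
    echo "fit result: ok"
    echo "Minuit2: iteration 1" >&2
    echo "real warning" >&2
}

# fd 3 is a copy of the caller's stdout. Inside the pipeline, stderr is sent
# into the pipe and stdout is sent to fd 3, so grep only ever sees stderr.
# grep then writes the surviving lines back to stderr.
run_filtered() {
    { noisy_job 2>&1 1>&3 | grep -v Minuit2 >&2; } 3>&1
}

out_file=$(mktemp)
err_file=$(mktemp)
run_filtered > "$out_file" 2> "$err_file"

echo "stdout:"; cat "$out_file"
echo "stderr after filtering:"; cat "$err_file"
```

One caveat: grep -v exits non-zero when it lets no line through, so a script running with `set -e` may want `|| true` after the grep.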

How to fail gitlab CI build?

I am trying to fail a build in gitlab CI and get email notification about it.
My build script is this:
echo "Listing files!"
ls -la
echo "##########################Preparing build##########################"
mkdir build
cd build
echo "Generating make files"
cmake -G "Unix Makefiles" -D CMAKE_BUILD_TYPE=Release -D CMAKE_VERBOSE_MAKEFILE=on ..
echo "##########################Building##########################"
make
I have committed code that breaks the build. However, instead of finishing, the build seems to be stuck in the "running" state after make exits. The last line is:
make: *** [all] Error 2
I also get no notifications.
How can I diagnose what is happening?
Update: in the runner, the following is repeated in the log:
Submitting build <..> to coordinator...response error: 500
In production.log and sidekiq.log of gitlab_ci, the following is written:
ERROR: Error connecting to Redis on localhost:6379 (ECONNREFUSED)
Full message with stacktrace is here: pastebin.
I have the same problem. I can offer a workaround, but I'm still trying to fix it properly.
1- Most of the time the build appears to hang, but the job keeps going and actually finishes. You can see the processes on the machine: in my case it compiles and at the end uses docker to publish the build, so the docker process doesn't exist until the job reaches that phase.
2- To work around the issue, you have to make the data persistent and retry the download over and over until everything the job needs has been downloaded.
PS: stating which OS you are using always helps.
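The "retry until it works" part of the workaround could look roughly like the sketch below. The retry helper and flaky_download are hypothetical names made up for illustration; flaky_download fails twice and then succeeds, standing in for whatever download step is actually failing.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
    max_attempts=$1
    shift
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep 1
    done
}

# Hypothetical flaky step: fails on the first two calls, succeeds on the third.
counter_file=$(mktemp)
echo 0 > "$counter_file"
flaky_download() {
    n=$(cat "$counter_file")
    n=$((n + 1))
    echo "$n" > "$counter_file"
    [ "$n" -ge 3 ]
}

retry 5 flaky_download
echo "attempts used: $(cat "$counter_file")"
```

The attempt counter is kept in a file here only so the demo survives subshells; a real download command would simply be passed to retry directly.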

Can I fail a build based on the outcome of a SSH Task?

I was wondering if I could use bamboo's SSH task to run a script (this kicks off a small java message injector).
Then grep the logs for ERRORS. If any ERROR is present I would like to fail the build.
Something like this:
Is this a Bash question or is it really about Bamboo? Here is the Bash problem answer:
If you run
[[ ! $(grep ERROR /a/directory/log/*) ]]
the command returns a non-zero exit status if the word "ERROR" appears anywhere in the files. If it is the last command in the script, the script exits with that status.
Bamboo should detect the task execution as failed.
(Note that if Bash is not the default shell on your target system you may need a #!/bin/bash on top of the script file.)
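A slightly more explicit variant of the same check, as a sketch: it uses grep -q and an if statement, so it also works in plain sh and prints a message before failing. The check_logs function and the temporary log directory are hypothetical; the real log path would go in their place.

```shell
#!/bin/sh
# check_logs DIR: return 1 if any file in DIR contains the word ERROR.
check_logs() {
    # grep -q prints nothing and exits 0 as soon as a match is found.
    if grep -q ERROR "$1"/*; then
        echo "ERROR found in logs under $1" >&2
        return 1
    fi
    return 0
}

# Demo data: one directory with an error in its log, one without.
bad_logs=$(mktemp -d)
printf 'INFO started\nERROR injector failed\n' > "$bad_logs/app.log"
good_logs=$(mktemp -d)
printf 'INFO started\nINFO done\n' > "$good_logs/app.log"

check_logs "$good_logs" && echo "clean logs: task would pass"
check_logs "$bad_logs" || echo "errors in logs: task would fail"
```

In the real script the last line would be just `check_logs /a/directory/log`, so that its exit status becomes the exit status of the task.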

SGE Command Not Found, Undefined Variable

I'm attempting to set up a new compute cluster, and I'm currently getting errors when using the qsub command in SGE. Here's a simple experiment that shows the problem:
test.sh
#!/usr/bin/zsh
test="hello"
echo "${test}"
test.sh.eXX
test=hello: Command not found.
test: Undefined variable.
test.sh.oXX
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
If I run the script on the head node (sh test.sh), the output is correct. I submit the job to SGE by typing "qsub test.sh".
If I submit the exact same script job in the same way on an established compute cluster like HPC, it works perfectly as expected. What setting could be causing this problem?
Thanks for any help on this matter.
Most likely the queues on your cluster are set to posix_compliant mode with a default shell of /bin/csh. The posix_compliant setting means your #! line is ignored. You can either change the queues to unix_behavior or specify the required shell using qsub's -S option, either on the command line or as an embedded directive in the script:
#$ -S /bin/sh
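Put together, a version of test.sh with the directive embedded might look like the sketch below. Lines starting with #$ are read by qsub as options and are ordinary comments to the shell, so the script still runs unchanged on the head node. The -cwd line is an optional extra I added; it is not part of the original script.

```shell
#!/bin/sh
# Embedded SGE options: interpret this job with /bin/sh
# and run it in the directory it was submitted from.
#$ -S /bin/sh
#$ -cwd

test="hello"
echo "${test}"
```

The equivalent on the command line would be `qsub -S /bin/sh test.sh`.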