How to output the shell script even after all the rules have been finished in `snakemake`? - snakemake

If I have a Snakemake workflow and all the rules have already finished, is there a way to output the shell command lines?
I know -n -p can output the command lines before the rules have finished.
Thanks in advance.

You simply need to use the option -F, which tells snakemake to rerun all rules even if their targets are already present on the file system.
--forceall, -F  Force the execution of the selected (or the first) rule and all rules it is dependent on regardless of already created output.
Don't forget the -n (dry-run) option if you don't want to run your pipeline again, and -p to print the shell commands.
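Putting the options together, a dry run that prints every shell command even though all targets already exist looks like this:
snakemake -F -n -p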

Related

Is it possible to see qsub job scripts and command line options etc. that will be issued by Snakemake in the dry run mode?

I am new to Snakemake and am planning to use it with qsub in my cluster environment. In order to avoid critical mistakes that might disturb the cluster, I would like to check the qsub job scripts and the qsub command that Snakemake will generate before actually submitting the jobs to the queue.
Is it possible to see the qsub job script files etc. in dry-run mode, or in some other way? I searched for relevant questions but could not find an answer. Thank you for your kind help.
Best,
Schuma
Using --printshellcmds (or its short version -p) together with --dry-run will allow you to see the commands snakemake will feed to qsub, but you won't see the qsub options.
I don't know of any option that shows which parameters are given to qsub, but snakemake follows a simple set of rules, about which you can find detailed information here and here. As you'll see, you can feed arguments to qsub in multiple ways:
With default values, using --default-resources resource_name1=<value1> resource_name2=<value2> when invoking snakemake.
On a per-rule basis, using resources in rules (prioritized over default values).
With explicitly set values, either for the whole pipeline using --set-resources resource_name1=<value1> or for a specific rule using --set-resources rule_name:resource_name1=<value1> (prioritized over default and per-rule values); see the example invocation just below.
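For instance, a hypothetical invocation that sets a default and then overrides it for one rule (some_rule and the values are illustrative):
snakemake all --default-resources mem_mb=1000 --set-resources some_rule:mem_mb=4000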
Suppose you have the following pipeline:
rule all:
    input:
        "input.txt"
    output:
        "output.txt"
    resources:
        mem_mb=2000,
        runtime_min=240
    shell:
        """
        some_command {input} {output}
        """
If you call qsub using the --cluster directive, you can access all the keywords of your rules. Your command could then look like this:
snakemake all --cluster "qsub --runtime {resources.runtime_min} -l mem={resources.mem_mb}mb"
This means snakemake will resolve the placeholders and submit the job just as if you had typed the following on your command line (strictly speaking, snakemake hands qsub a generated job script that wraps the rule's shell command):
qsub --runtime 240 -l mem=2000mb some_command input.txt output.txt
It is up to you to decide which parameters you define where. You may want to check your cluster's documentation, or ask its administrator, to find out which parameters are required and which to avoid.
Also note that for cluster use, the Snakemake documentation recommends setting up a profile, which you can then use with snakemake --profile myprofile instead of having to specify arguments and default values each time.
Such a profile can be written to a ~/.config/snakemake/profile_name/config.yaml file. Here is an example of such a profile:
cluster: "qsub -l mem={resources.mem_mb}mb other_resource={resources.other_resource_name}"
jobs: 256
printshellcmds: true
rerun-incomplete: true
default-resources:
  - mem_mb=1000
  - other_resource_name="foo"
Invoking snakemake all --profile profile_name corresponds to invoking
snakemake all --cluster "qsub -l mem={resources.mem_mb}mb other_resource={resources.other_resource_name}" --jobs 256 --printshellcmds --rerun-incomplete --default-resources mem_mb=1000 other_resource_name="foo"
You may also want to define test rules, such as a minimal example of your pipeline, and try those first to verify that all goes well before running the full pipeline; a sketch follows.
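For instance, a minimal sketch of such a test rule (the rule name and file names are hypothetical; adapt them to your pipeline), which you could then preview with snakemake test --profile profile_name -n -p:
rule test:
    input: "test_input.txt"
    output: "test_output.txt"
    resources:
        mem_mb=100
    shell:
        "some_command {input} {output}"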

How to run only one rule in snakemake

I have created a workflow in Snakemake, and I have a problem when I want to run just one rule: Snakemake also runs the rules whose output is the input of my rule, even if those outputs were already created before.
Example:
rule a:
    input: "input_a"
    output: "output_a"

rule b:
    input: "output_a"
    output: "output_b"

rule c:
    input: "output_b"
    output: "output_c"
How can I run just rule c?
If there are dependencies, I have found that only --until works: if you want to run rule c, just run snakemake -R --until c. If there are assumed dependencies, like shared input or output paths, Snakemake will force you to run the upstream rules unless you use --until. Always run first with -n for a dry run, as shown below.
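For example, to preview what would run (using the rule names from the question):
snakemake -n -p -R --until c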
You can use the --allowed-rules option.
snakemake --allowed-rules c
Snakemake will try to rerun upstream rules linked by the input/output chain to your downstream rule if the output file(s) of the upstream rule(s) have changed (including if they have been re-created but their content hasn't changed). This behavior makes Snakemake reproducible, but it may not be desirable if you are trying to debug a specific part of your pipeline and don't want to rerun all the intermediate steps.
See this discussion:
https://bitbucket.org/snakemake/snakemake/issues/688/execute-specified-rule-only-and-not
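As a side note, an option not mentioned in the answers here (so treat this as a hedged suggestion): snakemake's --touch marks existing output files as up to date without rerunning their commands, which can help when you only want to debug one downstream rule:
snakemake --touch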
You just run:
snakemake -R b
To see what this will do in advance:
snakemake -R b -n
-R selects the one rule (and all rules that depend on it!), and -n does a "dry run": it just prints what it would do. Without -n, the commands are actually executed.
I think --force (-f) is what is being asked for here:
snakemake --force c
snakemake -f c
--force, -f  Force the execution of the selected target or the first rule regardless of already created output. (default: False)
--forceall, -F  Force the execution of the selected (or the first) rule and all rules it is dependent on regardless of already created output. (default: False)
--forcerun [TARGET ...], -R [TARGET ...]  Force the re-execution or creation of the given rules or files. Use this option if you changed a rule and want to have all its output in your workflow updated. (default: None)
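As with the other options, you can combine this with a dry run to preview what would be executed:
snakemake -n -p -f c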

Is it possible to print commands instead of rules in snakemake dry run?

Dry runs are a super important piece of functionality in workflow languages. What I am looking for is mostly what would be executed if I ran the command, which is exactly what one sees when running make -n.
However, the analogous snakemake -n prints something like:
Building DAG of jobs...

rule produce_output:
    output: my_output
    jobid: 0
    wildcards: var=something

Job counts:
    count   jobs
    1       produce_output
    1
The log contains pretty much everything except the commands that get executed. Is there a way to get the commands from snakemake?
snakemake -p --quiet -n
-p to print shell commands
-n for a dry run
--quiet to suppress the rest
EDIT 2019-Jan: this solution seems broken in recent versions of snakemake.
snakemake -p -n
Avoid the --quiet option reported in the answer by eric-c: at least in some situations, the combination -p -n -q does not print the commands that would be executed.

Can I fail a build based on the outcome of a SSH Task?

I was wondering if I could use Bamboo's SSH task to run a script (this kicks off a small Java message injector), then grep the logs for errors. If any ERROR is present, I would like to fail the build.
Is this a Bash question, or is it really about Bamboo? Here is the answer to the Bash part:
If you run
[[ ! $(grep ERROR /a/directory/log/*) ]]
the script will exit with an error if it finds the word "ERROR" anywhere in the files.
Bamboo should detect the task execution as failed.
(Note that if Bash is not the default shell on your target system you may need a #!/bin/bash on top of the script file.)
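A slightly fuller sketch of the same check, assuming the log path from the example (grep -q is an equivalent, arguably more idiomatic way to test for a match):
#!/bin/bash
# Fail the build if any line in the logs contains ERROR.
if grep -q ERROR /a/directory/log/*; then
    echo "ERROR found in logs; failing the build." >&2
    exit 1
fi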

SGE Command Not Found, Undefined Variable

I'm attempting to set up a new compute cluster, and I'm currently experiencing errors when using the qsub command in SGE. Here's a simple experiment that shows the problem:
test.sh
#!/usr/bin/zsh
test="hello"
echo "${test}"
test.sh.eXX
test=hello: Command not found.
test: Undefined variable.
test.sh.oXX
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
If I run the script on the head node (sh test.sh), the output is correct. I submit the job to SGE by typing qsub test.sh.
If I submit the exact same script in the same way on an established compute cluster like HPC, it works perfectly as expected. What setting could be causing this problem?
Thanks for any help on this matter.
Most likely the queues on your cluster are set to posix_compliant mode with a default shell of /bin/csh. The posix_compliant setting means your #! line is ignored. You can either change the queues to unix_behavior or specify the required shell with qsub's -S option, for example by adding this line to your script:
#$ -S /bin/sh
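Equivalently, the shell can be passed on the command line at submission time; using the zsh path from the script's shebang, that would be:
qsub -S /usr/bin/zsh test.sh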