Using Snakemake, how to pass SLURM flags with dashes in their names?

I'm using Snakemake to execute rules on a SLURM cluster.
One of the mandatory flags for this cluster is ntasks-per-node, which in a batch script would be specified as e.g. #SBATCH --ntasks-per-node=5. My understanding is that I need to specify this in a snakemake rule as
rule rule_name:
    ...
    resources:
        time='00:00:30', #30 sec
        ntasks-per-node=1
    ...
However, running this Snakefile I get
SyntaxError in line 14 of .../Snakefile:
keyword can't be an expression
because there are dashes in the name. But as far as I can tell, replacing the dashes with underscores doesn't work. What should I do here?
(I'm using the SLURM profile here if that matters)
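The SyntaxError comes from Python's grammar, which the Snakefile syntax builds on: on the left of = in a keyword argument only a plain identifier is allowed, and ntasks-per-node parses as the subtraction ntasks - per - node. A minimal check with the built-in compile function shows the difference; dict(...) merely stands in for the keyword-argument style of the resources: directive (an assumption for illustration), and the exact error message varies between Python versions, so only the pass/fail result is shown.

```python
def parses(source: str) -> bool:
    """Return True if `source` is a syntactically valid Python expression."""
    try:
        compile(source, "<snakefile-snippet>", "eval")
        return True
    except SyntaxError:
        return False


# A dashed name on the left of '=' is an expression, not an identifier.
dashed = "dict(time='00:00:30', ntasks-per-node=1)"
# With underscores it is an ordinary keyword argument again.
underscored = "dict(time='00:00:30', ntasks_per_node=1)"

print(parses(dashed))       # False
print(parses(underscored))  # True
```

Valid syntax is only half of the problem, though: as the answer below explains, the profile must also know how to translate the underscored name into the dashed sbatch flag.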

Try quoting. But more importantly, only the resources that are defined in the RESOURCE_MAPPING variable in the slurm_submit.py will be picked up, and the default cookiecutter does not include an ntasks-per-node argument. Hence, quoting alone won't solve the issue.
There are multiple options.
Edit the slurm_submit.py. Add the ntasks-per-node argument and provide whatever alias(es) you would like to use.
RESOURCE_MAPPING = {
    "time": ("time", "runtime", "walltime"),
    "mem": ("mem", "mem_mb", "ram", "memory"),
    "mem-per-cpu": ("mem-per-cpu", "mem_per_cpu", "mem_per_thread"),
    "nodes": ("nodes", "nnodes"),
    # some suggested aliases
    "ntasks-per-node": ("ntasks-per-node", "ntasks_per_node", "ntasks"),
}
I would only do this if there actually are situations where you might change this value.
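To make the mechanism concrete, here is a hypothetical sketch of how a RESOURCE_MAPPING table like the one above can turn a rule's resources into sbatch flags. The function resources_to_sbatch_args is invented for illustration and is not the profile's actual code; it shows why a resource whose name is not listed (gpu here) is silently ignored, and why an underscore alias works once it is mapped to the dashed flag.

```python
# Hypothetical sketch, not the real slurm_submit.py implementation.
RESOURCE_MAPPING = {
    "time": ("time", "runtime", "walltime"),
    "mem": ("mem", "mem_mb", "ram", "memory"),
    "mem-per-cpu": ("mem-per-cpu", "mem_per_cpu", "mem_per_thread"),
    "nodes": ("nodes", "nnodes"),
    "ntasks-per-node": ("ntasks-per-node", "ntasks_per_node", "ntasks"),
}


def resources_to_sbatch_args(resources: dict) -> list:
    """Map rule resources to sbatch flags; names not in the table are dropped."""
    args = []
    for flag, aliases in RESOURCE_MAPPING.items():
        for alias in aliases:
            if alias in resources:
                args.append(f"--{flag}={resources[alias]}")
                break  # the first matching alias wins
    return args


print(resources_to_sbatch_args({"time": "00:00:30", "ntasks_per_node": 1, "gpu": 2}))
# ['--time=00:00:30', '--ntasks-per-node=1']
```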
Define an invocation-level configuration. Snakemake's --cluster-config parameter can still be used to provide additional configuration settings. In this case, create a file like:
# myslurm.yaml
__default__:
  ntasks-per-node: 1
Then use it with
snakemake --profile slurm --cluster-config myslurm.yaml
This is likely the least work to get going.
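As we understand Snakemake's cluster-config convention, keys under __default__ apply to every rule, and a top-level section named after a rule overrides them for that rule only. The sketch below reproduces that merge with plain dicts. The settings_for helper, the time values, and the rule_name section are hypothetical additions, not part of myslurm.yaml above or of Snakemake's own code.

```python
# Hypothetical cluster config, equivalent to a YAML file with an extra rule section.
cluster_config = {
    "__default__": {"ntasks-per-node": 1, "time": "00:10:00"},
    "rule_name": {"time": "00:00:30"},  # per-rule override (hypothetical)
}


def settings_for(rule: str, config: dict) -> dict:
    """Start from __default__, then let the rule's own section override it."""
    merged = dict(config.get("__default__", {}))
    merged.update(config.get(rule, {}))
    return merged


print(settings_for("rule_name", cluster_config))
# {'ntasks-per-node': 1, 'time': '00:00:30'}
print(settings_for("other_rule", cluster_config))
# {'ntasks-per-node': 1, 'time': '00:10:00'}
```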
Define a global value in the profile. The cookiecutter profile generator provides multiple options to define global options that don't often need to change for the profile.

Related

How to pass a list or dictionary using Snakemake's command line config option

I want to pass a list of filenames to produce through my Snakemake workflow by using the --config CLI option. What's the syntax I need for that?
Just specify the list or dict in YAML (~Python) syntax on the command line:
snakemake -c1 --config 'foo={"a":"a.txt", "b":"b.txt"}' 'bar=["file1.txt","file2.csv"]'
You can then access this list in a sample Snakemake rule as follows:
rule all:
    input:
        baz=config["foo"]["a"],
        qux=config["foo"]["b"],
        quux=config["bar"],
As usual, the name before the = will be how you access the data after the = through the config variable. Multiple config items can be passed through space separation as shown above.
Here is the full parsing order of the --config CLI option (as of November '21):
--config key=value [key=value […]] sets top-level keys to the given
value, which is parsed, in order, as either: int(value), float(value),
literal True or False (a bool), a YAML-encoded value, or finally
str(value). Only the last occurrence of the option is used if it is
given more than once.
https://github.com/tsibley/blab-standup/blob/master/2021-11-04.md#specifying-config
The full post is a good reference of how Snakemake config handling works (the documentation is a bit lacking at times).
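The parsing order quoted above can be mimicked in a few lines. This is a rough sketch, not Snakemake's implementation: parse_config_value and parse_config_args are hypothetical names, and json.loads stands in for the YAML step because PyYAML is not in the standard library. As a consequence, unquoted YAML such as {a: a.txt} would fall through to a plain string here, whereas real YAML parsing would turn it into a dict.

```python
import json


def parse_config_value(value: str):
    """Try int, then float, then literal True/False, then 'YAML', then str."""
    for convert in (int, float):
        try:
            return convert(value)
        except ValueError:
            pass
    if value == "True":
        return True
    if value == "False":
        return False
    try:
        # Stand-in for YAML parsing: JSON is (nearly) a subset of YAML.
        return json.loads(value)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return str(value)


def parse_config_args(pairs):
    """Turn ['key=value', ...] into a dict of top-level config keys."""
    config = {}
    for pair in pairs:
        key, _, value = pair.partition("=")  # split on the first '=' only
        config[key] = parse_config_value(value)
    return config


cfg = parse_config_args(['foo={"a":"a.txt", "b":"b.txt"}',
                         'bar=["file1.txt","file2.csv"]', "n=3"])
print(cfg["foo"]["a"], cfg["bar"], cfg["n"])
# a.txt ['file1.txt', 'file2.csv'] 3
```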

Accessing the --default-remote-prefix within the Snakefile

When I run Snakemake on the Google Life Sciences executor, I run something like:
snakemake --google-lifesciences --default-remote-prefix my_bucket_name --preemption-default 10 --use-conda
Now, my_bucket_name is going to get added to all of the input and output paths.
But, for various reasons, I need to recreate the full path within the Snakefile code, and therefore I want to be able to access whatever is passed to --default-remote-prefix within the code.
Is there a way to do this?
I want to be able to access whatever is passed to --default-remote-prefix within the code
You can use the workflow object like:
print(workflow.default_remote_prefix) # Will print my_bucket_name in your example
rule all:
    input: ...
I'm not 100% sure whether the workflow object is meant to be used by the user or is private to Snakemake, in which case it could change in the future without warning. But I think it's OK; I use workflow.basedir all the time to get the directory where the Snakefile sits.
Alternatively you could parse the sys.argv list but I think that this is more hacky.
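For the sys.argv route, argparse.parse_known_args avoids hand-rolled string scanning: it picks out the one flag of interest and leaves every other argument alone. This is a hypothetical helper (remote_prefix_from_argv is not a Snakemake function). It only sees flags typed on the command line, so a prefix set through a profile would likely be missed, which is part of why the workflow attribute is the more robust choice.

```python
import argparse


def remote_prefix_from_argv(argv):
    """Extract --default-remote-prefix from a Snakemake-style argument list."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--default-remote-prefix", default="")
    known, _unknown = parser.parse_known_args(argv)  # other flags are ignored
    return known.default_remote_prefix


# Inside a Snakefile you would pass sys.argv[1:]; an explicit list is used here
# so the example runs on its own.
argv = ["--google-lifesciences", "--default-remote-prefix", "my_bucket_name",
        "--preemption-default", "10", "--use-conda"]
print(remote_prefix_from_argv(argv))  # my_bucket_name
```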
Another option:
bucket_name=foo
snakemake --default-remote-prefix $bucket_name --config bucket_name=$bucket_name ...
then use config["bucket_name"] within the code to get the value foo. But I still prefer the workflow solution.

How to use Bamboo plan variables in an inline script task?

When defining a Bamboo plan variable, the page has this.
For task configuration fields, use the syntax
${bamboo.myvariablename}. For inline scripts, variables are exposed as
shell environment variables which can be accessed using the syntax
$BAMBOO_MY_VARIABLE_NAME (Linux/Mac OS X) or %BAMBOO_MY_VARIABLE_NAME%
(Windows).
However, that doesn't work in my Linux inline script. For example, I have the following defined as a plan variable:
name: my_plan_var value: some_string
My inline script is simply...
PLAN_VAR=$BAMBOO_MY_PLAN_VAR
echo "Plan var: $PLAN_VAR"
and I just get a blank string.
I've tried this
PLAN_VAR=${bamboo.my_plan_var}
But I get
${bamboo.my_plan_var}: bad substitution
on the log viewer window.
Any pointers?
I tried the following and it works:
On the plan, I set my_plan_var to "it works" (without quotes).
In the inline script (don't forget the first line):
#!/bin/sh
PLAN_VAR=$bamboo_my_plan_var
echo "testing: $PLAN_VAR"
And I got the expected result:
testing: it works
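The working answer relies on Bamboo exposing plan variables as lowercase bamboo_<name> environment variables, whereas the documentation quoted in the question suggests uppercase names; since environment variable names are case-sensitive on Linux, that mismatch would explain the blank result. The Python sketch below mirrors the shell script. The helper names are hypothetical, and replacing dots with underscores (for names like inject.GIT_VERSION) is an assumption about Bamboo's naming, not something the answer states.

```python
import os


def bamboo_env_name(variable: str) -> str:
    """'bamboo_' prefix as in the answer; dot-to-underscore is an assumption."""
    return "bamboo_" + variable.replace(".", "_")


def read_plan_var(variable: str, environ=None) -> str:
    """Return the plan variable's value, or '' if it is not in the environment."""
    env = os.environ if environ is None else environ
    return env.get(bamboo_env_name(variable), "")


fake_env = {"bamboo_my_plan_var": "it works"}  # stands in for os.environ
print("testing: " + read_plan_var("my_plan_var", fake_env))  # testing: it works
print(repr(fake_env.get("BAMBOO_MY_PLAN_VAR", "")))          # '' (wrong case)
```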
I also wanted to create a Bamboo variable, and the only way I've found to share it between scripts is with inject-variables, like the following:
Add the following to your bamboo-spec.yaml, after the script that will create the variable:
Build:
  tasks:
    - script: create-bamboo-var.sh
    - inject-variables:
        file: bamboo-specs/vars.yaml
        scope: RESULT
        # namespace: plan
    - script: echo ${bamboo.inject.GIT_VERSION} # just for testing
Note: Namespace defaults to inject.
In create-bamboo-var.sh create the file bamboo-specs/vars.yaml:
#!/bin/bash
versionStr=$(git describe --tags --always --dirty --abbrev=4)
echo "GIT_VERSION: ${versionStr}" > ./bamboo-specs/vars.yaml
Or for multiple lines you can use:
SW_NUMBER_DIGITS=${1} # Passed as first parameter to build script
cat <<EOT > ./bamboo-specs/vars.yaml
GIT_VERSION: ${versionStr}
SW_NUMBER_APP: ${SW_NUMBER_DIGITS}
EOT
Scope can be local or result. Local means it's only available for the current job, and result means it can be used in subsequent stages of this plan and in releases that are created from the result.
Namespace is just used to avoid naming collisions with other variables.
With the above you can use that variable in later scripts with ${bamboo.inject.GIT_VERSION}. The last script task is just to see that it is working in other scripts. You can also see the variables in the web app as build meta data.
I'm using the above script before the build (in my case compiling C-Code) takes place so I can also create a version.h file that can be used by the source code.
This is still a bit cumbersome, but I'm happy with it and I hope it will help others configure Bamboo. The Bamboo documentation could be better (it still took a lot of trial and error).
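If the build script is written in Python rather than bash, the vars file and the matching ${bamboo.<namespace>.<key>} references could be produced as below. This is a sketch under assumptions: write_inject_vars and bamboo_reference are hypothetical helpers, the version string is a made-up placeholder for the git describe output, and the "KEY: value" line format simply copies the heredoc above.

```python
import tempfile
from pathlib import Path


def write_inject_vars(path, variables: dict) -> str:
    """Write one 'KEY: value' line per variable, like the heredoc above."""
    content = "".join(f"{key}: {value}\n" for key, value in variables.items())
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)
    return content


def bamboo_reference(key: str, namespace: str = "inject") -> str:
    """The ${bamboo.<namespace>.<key>} form used by later tasks."""
    return f"${{bamboo.{namespace}.{key}}}"


with tempfile.TemporaryDirectory() as tmp:
    # In the real build this would be ./bamboo-specs/vars.yaml.
    print(write_inject_vars(Path(tmp) / "bamboo-specs" / "vars.yaml",
                            {"GIT_VERSION": "v1.2-3-gabcd", "SW_NUMBER_APP": 42}), end="")
print(bamboo_reference("GIT_VERSION"))  # ${bamboo.inject.GIT_VERSION}
```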

How exactly do you use variables in Jenkins?

Can someone concisely explain what the differences between the three variables below are? Because in all honesty, when I create a Jenkins job, I randomly guess between the three types until something works, but I'd love to understand rather than blindly picking.
${ENV,var="BUILD_USER"}
${BUILD_USER}
$BUILD_USER
Also, are there other ways of writing variables in Jenkins that I missed other than the 3 ways above?
When used in a statement:
${ENV,var="BUILD_USER"}: evaluates the system environment variables and returns the value for the variable BUILD_USER.
example: curl ${ENV,var="BUILD_USER"}/api/xml
${BUILD_USER}: returns the value of the BUILD_USER variable in the current script memory space.
example: curl ${BUILD_USER}/api/xml
$BUILD_USER: used to assign values to the BUILD_USER variable.
example: $BUILD_USER = "BUILD_USER"
In general, variable expansion is up to the plugin that interprets a configuration value.
For example, if you set up a job parameter GIT_REPOSITORY and use it to configure an address where git clone should go by putting $GIT_REPOSITORY into the git repository field, it works, but only because the Jenkins git plugin has implemented variable expansion support.
Many plugins do implement it but you cannot know it unless you test it. However, these days the support is so common it is safe to assume it should work.
Both forms of reference, $VAR and ${VAR}, work and are equivalent. The latter form is useful if you need to use the variable in a place where it is surrounded by other characters that could be interpreted as part of variable, like $VARX (Jenkins would be looking for variable named VARX) and ${VAR}X (Jenkins understands the variable is named VAR).
These rules have been modeled after variable expansion rules in Unix shells. Indeed, the job variables are made available as environment variables to build steps and in the Unix shell build step the variables are used the same way as above.
In a Windows CMD build step the variables are again used like any Windows environment variable: %VAR%.
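Python's string.Template happens to follow the same $VAR / ${VAR} convention as Unix shells, so it offers a convenient way to experiment with the rule described above. This is only an analogy: Jenkins and its plugins do their own expansion, and the expand helper is hypothetical.

```python
from string import Template


def expand(text: str, variables: dict) -> str:
    """Expand $NAME and ${NAME}; names with no value are left as-is."""
    return Template(text).safe_substitute(variables)


# Both forms are equivalent when nothing follows the name.
print(expand("$BUILD_USER / ${BUILD_USER}", {"BUILD_USER": "alice"}))  # alice / alice

# $VARX reads the name as VARX (undefined, so left alone);
# ${VAR}X ends the name at VAR and keeps X as literal text.
print(expand("$VARX ${VAR}X", {"VAR": "a"}))  # $VARX aX
```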

Use environment variables as default for cmake options

I would like to set up a cmake script with some options (flags and strings). For some of the options I would like to use environment variables as a default. Basically, I'm trying to emulate something like MY_OPTION ?= default in a Makefile. I tried the following:
project (Optiontest)
option(MY_OPTION
"Documentation"
$ENV{MY_OPTION}
FORCE)
message("value: ${MY_OPTION}")
I called this in the following way:
$ cmake -DMY_OPTION=ON .
value: ON
$ cmake -DMY_OPTION=OFF .
value: OFF
$ MY_OPTION=OFF cmake .
value: OFF
$ MY_OPTION=ON cmake .
value: OFF
My problem is that the last line should be ON as well.
For bonus karma: I would actually prefer three levels of preference. The value of -DMY_OPTION should be used if given. If not, the value of a set environment variable MY_OPTION should be used. If this is also not set, a constant should be used. I guess I could use a bunch of nested if statements and somehow check whether the variables are set, but I don't know how, and I hope there is a better way.
FORCE is (as of CMake 3.0.2) not a valid parameter for option.
This is the primary source of problems. CMake will interpret the string FORCE as the desired initial value of the option in absence of an environment variable. The usual contrived rules for string-to-truth-value-conversion apply, resulting in the option being set to OFF by this call.
Second, you need to account for the fact that the environment variable may not be set. Your current code fails to handle that case properly: $ENV{MY_OPTION} will evaluate to the empty string in that case. If you evaluate the set values in both the cache and the environment, you can enforce any behavior that you want.
In general, you should think about what you actually want here. Usually, FORCE setting a cached variable is a bad idea and I would not be surprised if you found your initial argument for doing this flawed after some careful reevaluation.
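The three-level preference asked for in the question boils down to the decision logic below, written in Python as a language-neutral sketch (resolve_option and the OFF fallback are hypothetical). The key point from the answer is distinguishing an unset environment variable from an empty one. In CMake itself the corresponding checks would presumably be if(DEFINED MY_OPTION) and if(DEFINED ENV{MY_OPTION}), though that CMake version is not shown or tested here. Note that cache_value stands for a value given with -D or already stored in the cache from an earlier run, which is also why a stale cache can hide a changed environment variable.

```python
from typing import Mapping, Optional


def resolve_option(name: str, cache_value: Optional[str],
                   environ: Mapping[str, str], default: str = "OFF") -> str:
    """Pick the -D/cache value, else a set environment variable, else a constant."""
    if cache_value is not None:  # given with -D (or cached by an earlier run)
        return cache_value
    if name in environ:          # set, even if set to the empty string
        return environ[name]
    return default               # hypothetical constant fallback


print(resolve_option("MY_OPTION", "OFF", {"MY_OPTION": "ON"}))  # OFF
print(resolve_option("MY_OPTION", None, {"MY_OPTION": "ON"}))   # ON
print(resolve_option("MY_OPTION", None, {}))                    # OFF
```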
Maybe the value of MY_OPTION is cached in the CMake cache? Did you try clearing the CMake cache after the third call, MY_OPTION=OFF cmake .?