Error: no Snakefile found, tried Snakefile, snakefile, workflow/Snakefile, workflow/snakefile

I get this error whenever I run my snakefile.smk:
Error: no Snakefile found, tried Snakefile, snakefile, workflow/Snakefile, workflow/snakefile.
Running the ls command shows that the file exists in the directory:
Miniconda3-4.7.12.1-Linux-x86_64.sh config.yml miniconda3 nano.save.1 snakemake
Miniconda3-4.7.12.1-MacOSX-x86_64.sh download.r nano.save snakefile.smk work
I am using Ubuntu 20 on WSL2.
The snakefile contents:
sample = ["GSE6955", "GSE67311"]

rule download:
    output:
        "~/{sample}.tar"
    run:
        "~/download.R"

rule extract:
    input:
        "~/{sample}.tar"
    output:
        directory("~/{sample}")
    shell:
        "tar xvf {input}"
Does anyone have an idea how to fix this?

By default, Snakemake looks for a file called Snakefile or snakefile (or workflow/Snakefile, workflow/snakefile) in the working directory if a specific file is not provided. In your case, your snakefile is called snakefile.smk. To run snakemake with a specific snakefile, call it with the -s or --snakefile command line argument.
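For example, from the directory containing the file:

    snakemake -s snakefile.smk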
I recommend calling snakemake with the -h flag to see all the available options.

Related

running metabat2 with snakemake but not getting the bin files

I have been trying to run metabat2 with snakemake. I can run it, but the output files in metabat2/ are missing. The checkM rule that runs after it does use the data and works; I just can't find the files later. There should be files created with numbers in their names, but it is impossible to predict how many files will be created. Is there a way I can make sure that the files are created in that directory?
rule all:
    input:
        [f"metabat2/" for sample in samples],
        [f"checkm/" for sample in samples]
rule metabat2:
    input:
        "input/consensus.fasta"
    output:
        directory("metabat2/")
    conda:
        "envs/metabat2.yaml"
    shell:
        "metabat2 -i {input} -o {output} -v"

rule checkM:
    input:
        "metabat2/"
    output:
        c = "bacteria/CheckM.txt",
        d = directory("checkm/")
    conda:
        "envs/metabat2.yaml"
    shell:
        "checkm lineage_wf -f {output.c} -t 10 -x fa {input} {output.d}"
The normal way to run metabat2 would be:
metabat2 -i path/to/consensus.fasta -o /outputdir/bin -v
This creates files named bin.[number].fa in outputdir.
I can't tell what the problem is, but I have a couple of suggestions (see the sketch after this list):
[f"metabat2/" for sample in samples]: I doubt this will do what you expect, as it simply creates a list with the string metabat2/ repeated len(samples) times. Maybe you want [f"metabat2/{sample}" for sample in samples]? The same goes for [f"checkm/" for sample in samples].
The samples variable is not used anywhere in the rules following all. I suspect it should be used somewhere, e.g. with something like output: directory("metabat2/{sample}").
Execute snakemake with the -p option to see what commands are executed. It may be useful to post the stdout from it.
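Putting the first two suggestions together, a minimal sketch of what the relevant rules might look like, assuming each sample is meant to get its own bin directory (the per-sample layout is my assumption, not something in your original code):

    samples = ["sampleA", "sampleB"]  # hypothetical; use your real sample list

    rule all:
        input:
            [f"metabat2/{sample}" for sample in samples],
            [f"checkm/{sample}" for sample in samples]

    rule metabat2:
        input:
            "input/consensus.fasta"
        output:
            # one output directory per sample, so the {sample} wildcard is defined
            directory("metabat2/{sample}")
        conda:
            "envs/metabat2.yaml"
        shell:
            # -o takes a prefix: bins land in the directory as bin.[number].fa
            "metabat2 -i {input} -o {output}/bin -v"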

How can I run multiple runs of pipeline with different config files - issue with lock on .snakemake directory

I am running a snakemake pipeline from the same working directory but with different config files, and the input/output are in different directories too. The issue seems to be that, although both runs use data in different folders, snakemake creates the lock on the pipeline folder because of the .snakemake folder and the lock directory within it. Is there a way to force separate .snakemake folders? Code example below:
Both runs are started from within /home/pipelines/qc_pipeline:
run 1:
/home/apps/miniconda3/bin/snakemake -p -k -j 999 --latency-wait 10 --restart-times 3 --use-singularity --singularity-args "-B /pipelines_test/QC_pipeline/PE_trimming/,/clusterTMP/testingQC/,/home/www/codebase/references" --configfile /clusterTMP/testingQC/config.yaml --cluster-config QC_slurm_roadsheet.json --cluster "sbatch --job-name {cluster.name} --mem-per-cpu {cluster.mem-per-cpu} -t {cluster.time} --output {cluster.output}"
run 2:
/home/apps/miniconda3/bin/snakemake -p -k -j 999 --latency-wait 10 --restart-times 3 --use-singularity --singularity-args "-B /pipelines_test/QC_pipeline/SE_trimming/,/clusterTMP/testingQC2/,/home/www/codebase/references" --configfile /clusterTMP/testingQC2/config.yaml --cluster-config QC_slurm_roadsheet.json --cluster "sbatch --job-name {cluster.name} --mem-per-cpu {cluster.mem-per-cpu} -t {cluster.time} --output {cluster.output}"
error:
Directory cannot be locked. Please make sure that no other Snakemake process is trying to create the same files in the following directory:
/home/pipelines/qc_pipeline
If you are sure that no other instances of snakemake are running on this directory, the remaining lock was likely caused by a kill signal or a power loss. It can be removed with the --unlock argument.
Maarten-vd-Sande correctly points to the --nolock option (+1), but in my opinion it's a very bad idea to use --nolock routinely.
As the error says, two snakemake processes are trying to create the same file. Unless the error is a bug in snakemake, I wouldn't blindly proceed and overwrite files.
I think it would be safer to assign to each snakemake execution its own execution directory and working directory, like:
topdir=`pwd`
mkdir -p run1
cd run1
snakemake --configfile /path/to/config1.yaml ...
cd $topdir
mkdir -p run2
cd run2
snakemake --configfile /path/to/config2.yaml ...
cd $topdir
mkdir -p run3
etc...
EDIT
Actually, it should be less clunky and probably better to use the --directory/-d option:
snakemake -d run1 --configfile /path/to/config1.yaml ...
snakemake -d run2 --configfile /path/to/config2.yaml ...
...
As long as the different pipelines do not generate the same output files, you can do it with the --nolock option:
snakemake --nolock [rest of the command]
Take a look at the Snakemake documentation for a short description of --nolock.

Does Snakefile location matter?

I am an absolute beginner with snakemake and am building a pipeline as I learn. My question is: if the Snakefile is placed with the data file that I want to process, a NameError occurs, but if I move the Snakefile to a parent directory and edit the path information of input: and output:, the code works. What am I missing?
rule sra_convert:
    input:
        "rna/{id}.sra"
    output:
        "rna/fastq/{id}.fastq"
    shell:
        "fastq-dump {input} -O {output}"
The above code works fine when I run it with
snakemake -p rna/fastq/SRR873382.fastq
However, if I move the file to the rna directory, where the SRR873382.sra file is, and edit the code as below:
rule sra_convert:
    input:
        "{id}.sra"
    output:
        "fastq/{id}.fastq"
    message:
        "Converting from {id}.sra to {id}.fastq"
    shell:
        "fastq-dump {input} -O {output}"
and run
snakemake -p fastq/SRR873382.fastq
I get the following error
Building DAG of jobs...
Job counts:
count jobs
1 sra_convert
1
RuleException in line 7 of /home/sarc/Data/rna/Snakefile:
NameError: The name 'id' is unknown in this context. Please make sure that you defined that variable. Also note that braces not used for variable access have to be escaped by repeating them, i.e. {{print $1}}
Solution
rule sra_convert:
    input:
        "{id}.sra"
    output:
        "fastq/{id}.fastq"
    message:
        "Converting from {wildcards.id}.sra to {wildcards.id}.fastq"
    shell:
        "fastq-dump {input} -O {output}"
The above code runs fine without error.
I believe that the best source that answers your actual question is:
https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#wildcards
If the rule's output matches a requested file, the substrings matched by the wildcards are propagated to the input files and to the variable wildcards, that is here also used in the shell command. The wildcards object can be accessed in the same way as input and output, which is described above.
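For example, a minimal sketch (a hypothetical run: variant of the rule above) showing that wildcards is accessed with the same attribute syntax as input and output, also from Python code:

    rule sra_convert:
        input:
            "{id}.sra"
        output:
            "fastq/{id}.fastq"
        run:
            # wildcards, input and output are all available as objects here
            print(f"Converting {wildcards.id}.sra to {output[0]}")
            shell("fastq-dump {input} -O fastq")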

Snakemake: specifying a workdir in YAML config file

I specified a workdir in the YAML config file to use with snakemake as follows:
$> cat config.yaml
workdir: "/home/lina/test_output"
id: 1234
$> cat Snakefile
rule all:
    input:
        data_out = expand("cat_out/{id}_times_two.txt", id = config['id'])

rule double_print:
    input:
        data = expand("data/{id}.txt", id = config['id'])
    output:
        data_out = expand("cat_out/{id}_times_two.txt", id = config['id'])
    shell:
        'cat {input.data} {input.data} > {output.data_out}'
$> snakemake --configfile=config.yaml
However, once I ran my snakemake command, the output was generated in the directory where the Snakefile resides. My Snakefile did pick up the id parameter I specified in the config file, so it read the config file and at least interpreted the id parameter.
How should I modify the config file or my snakemake command to make sure the output ends up in the workdir I specified?
Thanks much!
You need to add workdir to the snakefile, not the configuration.
But you can set it dynamically, so in the snakefile, write:
workdir: config['workdir']
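So, with the config.yaml above, the top of the Snakefile would look like this minimal sketch:

    workdir: config['workdir']

    rule all:
        input:
            data_out = expand("cat_out/{id}_times_two.txt", id = config['id'])

Note that relative input paths like data/{id}.txt are then resolved against /home/lina/test_output too, not against the directory containing the Snakefile.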

MissingRuleException in Snakemake after code changes

I have these two rules:
all_participants = ['01','03','04','05','06','07','08']

rule all:
    input: expand("data/interim/tables/screen/p{participant_id}.csv", participant_id=all_participants)

rule extract_screen_table:
    output: "data/interim/tables/screen/p{participant_id}.csv"
    shell: "python src/data/sql_table_to_csv.py --table screen"
If I execute snakemake everything works, but if I change the code and execute: snakemake -n -R 'snakemake --list-code-changes' I get this error:
Building DAG of jobs...
MissingRuleException:
No rule to produce snakemake --list-code-changes (if you use input functions make sure that they don't raise unexpected exceptions).
The output of snakemake --list-code-changes is:
Building DAG of jobs...
data/interim/tables/screen/p03.csv
which I reckon shouldn't be the case; I should get the python script instead.
You have to use backticks around the list-code-changes call: `snakemake --list-code-changes`. This is bash syntax for executing the contained command and substituting its stdout as a string.
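So the full dry-run invocation would be something like:

    snakemake -n -R `snakemake --list-code-changes`

With single quotes, bash instead passes the literal string snakemake --list-code-changes as a target, which is exactly why Snakemake reports that it has no rule to produce it.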