CalledProcessError in Snakemake

I have a rule to run the megahit genome assembler in my Snakemake pipeline:
rule perform_assembly_using_megahit:
    input:
        fq = "target.fq"
    output:
        fa = "assembled_megahit/final.contigs.fa"
    threads:
        _NBR_CPUS
    run:
        command = MEGAHIT_DIR + "megahit -r " + input.fq + " -o " + _TEMP_DIR + "assembled_megahit/"
        shell(command)
and I got the following error:
RuleException:
CalledProcessError in line 374 of Snakefile:
Command 'set -euo pipefail; ext/MEGAHIT-1.2.9-Linux-x86_64-static/bin/megahit -r target.fq -o assembled_megahit/' returned non-zero exit status 1.
But if I run the same command on the CLI:
ext/MEGAHIT-1.2.9-Linux-x86_64-static/bin/megahit -r target.fq -o assembled_megahit/
the program runs without any problems. Any help would be appreciated.
Thanks!

"Output directory assembled_megahit already exists, please change the parameter -o to another value to avoid overwriting" even though I've deleted that directory before I run my Snakefile.
I believe Snakemake creates the necessary directory tree for output files before executing the run directive, so you get the error even if you delete the output directory before running Snakemake.
I think you can fix it with something like:
run:
    import shutil

    outdir = _TEMP_DIR + "assembled_megahit/"
    # remove the directory Snakemake pre-created so megahit can create it itself
    shutil.rmtree(outdir, ignore_errors=True)
    command = MEGAHIT_DIR + "megahit -r " + input.fq + " -o " + outdir
    shell(command)
(Obviously it assumes there isn't anything valuable inside outdir)
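An alternative, if your Snakemake version supports it (5.2+), is to declare the directory itself as the rule's output with directory(). Snakemake does not pre-create directory() outputs, only the parents of outputs, so megahit is free to create the folder itself. A minimal sketch, with the caveat that downstream rules would then have to depend on the directory rather than on final.contigs.fa:
rule perform_assembly_using_megahit:
    input:
        fq = "target.fq"
    output:
        # megahit creates this directory itself; Snakemake only makes its parents
        outdir = directory("assembled_megahit")
    threads:
        _NBR_CPUS
    run:
        shell(MEGAHIT_DIR + "megahit -r " + input.fq +
              " -o " + output.outdir +
              " -t " + str(threads))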


Nextflow unzip gz file error: not a regular file

I am new to Nextflow and I want to unzip a fastq.gz file, but it raised the error gunzip: SRR1659960_05pc_R1.fastq.gz is not a regular file.
When I tried to run the same command directly in my console, it worked well.
My Nextflow script is:
process gz_uncompress {
    input:
    path fastq_r1_gz_path

    output:
    path fastq_r1_path

    script:
    """
    gunzip -kd $fastq_r1_gz_path > "fastq_r1_path.fastq"
    """
}

workflow {
    gz_uncompress("/Users/test/PycharmProjects/nf-easyfuse/local_test/SRR1659960_05pc_R1.fastq.gz")
}
The error message is:
local_test ywan$ nextflow run t2.nf
N E X T F L O W ~ version 22.10.3
Launching `t2.nf` [peaceful_wilson] DSL2 - revision: bf9e3bc592
executor > local (1)
[36/f8301b] process > gz_uncompress [ 0%] 0 of 1
executor > local (1)
[36/f8301b] process > gz_uncompress [100%] 1 of 1, failed: 1 ✘
Error executing process > 'gz_uncompress'
Caused by:
Process `gz_uncompress` terminated with an error exit status (1)
Command executed:
gunzip -kd SRR1659960_05pc_R1.fastq.gz > "fastq_r1_path.fastq"
Command exit status:
1
Command output:
(empty)
Command error:
gunzip: SRR1659960_05pc_R1.fastq.gz is not a regular file
Work dir:
/Users/test/PycharmProjects/nf-easyfuse/local_test/work/36/f8301b816e9eb834597ff1e6616c51
Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
But when I ran gunzip -kd SRR1659960_05pc_R1.fastq.gz > "fastq_r1_path.fastq" in my console, there weren't any errors.
Could you please help me figure this out?
By default, Nextflow will try to stage process input files using symbolic links (this can be changed using the stageInMode directive). With gunzip, just make sure you write the output to stdout, using -c:
-c --stdout --to-stdout
    Write output on standard output; keep original files unchanged. If there are several input files, the output consists of a sequence of independently compressed members. To obtain better compression, concatenate all input files before compressing them.
For example:
params.fastq = './test.fastq.gz'

process gunzip_fastq {
    input:
    path fastq_gz

    output:
    path fastq_gz.baseName

    script:
    """
    gunzip -c "${fastq_gz}" > "${fastq_gz.baseName}"
    """
}

workflow {
    fastq = file( params.fastq )
    gunzip_fastq( fastq )
}
On most systems, you could also just use zcat for this. The zcat command is identical to gunzip -c.
I found that adding the -f option fixes this issue. Presumably -f forces gunzip to process the staged input even though it is a symbolic link, which it otherwise rejects as "not a regular file".
process gz_uncompress2 {
    input:
    path f1

    output:
    path "*.fastq"

    script:
    """
    gunzip -fkd $f1 > fastq_r1_path
    """
}

workflow {
    path = gz_uncompress2("/Users/test/PycharmProjects/nf-easyfuse/local_test/SRR1659960_05pc_R1.fastq.gz")
    path.view()
}

Snakemake Error with MissingOutputException

I am trying to run STAR with Snakemake on a server.
My smk file is this one:
import pandas as pd

configfile: 'config.yaml'

# Read sample-to-batch dataframe mapping batch to sample (here with zip)
sample_to_batch = pd.read_csv("/mnt/DataArray1/users/zisis/STAR_mapping/snakemake_STAR_index/all_samples_test.csv", sep = '\t')

# rule specifying output
rule all_STAR:
    input:
        #expand("{sample}/Aligned.sortedByCoord.out.bam", sample = sample_to_batch['sample'])
        expand(config['path_to_output']+"{sample}/Aligned.sortedByCoord.out.bam", sample = sample_to_batch['sample'])

rule STAR_align:
    # specify input fastq files
    input:
        fq1 = config['path_to_output']+"{sample}_1.fastq.gz",
        fq2 = config['path_to_output']+"{sample}_2.fastq.gz"
    params:
        # location of indexed genome and location to save the output
        genome = directory(config['path_to_reference']+config['ref_assembly']+".STAR_index"),
        prefix_outdir = directory(config['path_to_output']+"{sample}/")
    threads: 12
    output:
        config['path_to_output']+"{sample}/Aligned.sortedByCoord.out.bam"
    log:
        config['path_to_output']+"logs/{sample}.log"
    message:
        "--- Mapping STAR ---"
    shell:
        """
        STAR --runThreadN {threads} \
        --readFilesCommand zcat \
        --readFilesIn {input} \
        --genomeDir {params.genome} \
        --outSAMtype BAM SortedByCoordinate \
        --outSAMunmapped Within \
        --outSAMattributes Standard
        """
STAR starts normally, but at the end I have this error:
Waiting at most 5 seconds for missing files.
MissingOutputException in line 14 of /mnt/DataArray1/users/zisis/STAR_mapping/snakemake/STAR_snakefile_align.smk:
Job Missing files after 5 seconds:
/mnt/DataArray1/users/zisis/STAR_mapping/snakemake/001_T1/Aligned.sortedByCoord.out.bam
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Job id: 1 completed successfully, but some output files are missing. 1
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
I tried --latency-wait but it is not working. To execute Snakemake I run the command:
users/zisis/STAR_mapping/snakemake_STAR_index$ snakemake --snakefile STAR_new_snakefile.smk --cores all --printshellcmds
I am in my own directory with full access and permissions. Do you think this is happening due to strange rights when Snakemake executes, or when it tries to create directories?
It creates the directory and the files, but I can see that there is a file named Aligned.sortedByCoord.out.bamAligned.sortedByCoord.out.bam.
Is this the problem?
I think your STAR command is missing the option that says which file and directory to write to, so presumably it is writing the default filenames to the current directory. Try something like:
rule STAR_align:
    input: ...
    output: ...
    ...
    shell:
        r"""
        outprefix=`dirname {output}`
        STAR --outFileNamePrefix $outprefix/ \
             --runThreadN {threads} \
             etc...
        """
I am running the command from my directory, in which I am a sudo user.
I don't think that is the problem, but it is strongly recommended to work as a regular user and use sudo only in special circumstances (e.g. installing system-wide programs, although if you use conda you shouldn't need that).

Snakemake basic issue

I tried to run a Snakemake command on my local computer. It didn't work, even when I used the simplest code structure, like so:
rule fastqc_raw:
    input:
        "raw/A.fastq"
    output:
        "output/fastqc_raw/A.html"
    shell:
        "fastqc {input} -o {output} -t 4"
It displayed this error:
Error in rule fastqc_raw:
    jobid: 1
    output: output/fastqc_raw/A.html
RuleException:
CalledProcessError in line 13 of /Users/01/Desktop/Snakemake/Snakefile:
Command 'set -euo pipefail; fastqc raw/A.fastq -o output/fastqc_raw/A.html -t 4' returned non-zero exit status 2.
File "/Users/01/Desktop/Snakemake/Snakefile", line 13, in __rule_fastqc_raw
File "/Users/01/miniconda3/lib/python3.6/concurrent/futures/thread.py", line 56, in run
However, Snakemake did create a DAG that looks normal, and when I used the snakemake --np command it didn't display any errors.
I also ran fastqc locally without Snakemake using the same command, and it worked perfectly.
I hope anyone can help me with this
Thanks !!
It looks like Snakemake did its job. It ran the command:
fastqc raw/A.fastq -o output/fastqc_raw/A.html -t 4
But the command returned an error:
Command 'set -euo pipefail; fastqc raw/A.fastq -o output/fastqc_raw/A.html -t 4' returned non-zero exit status 2.
The next step in debugging is to run the fastqc command manually to see if it gives an error.
I hope you have gotten an answer by now but I had the exact same issue so I will offer my solution.
The error is in the shell directive:
shell:
    "fastqc {input} -o {output} -t 4"
FastQC's -o flag expects the output directory, but you have given it an output file. Your code should be:
shell:
    "fastqc {input} -o output/fastqc_raw/ -t 4"
Your error relates to the fact that the output files were written to a different location (most likely the input directory), and the rule all: failed as a result.
Additionally, FastQC will give an error if the directories are not already created, so you will need to do that first.
It is strange, as I have seen Snakemake scripts with no -o flag in the fastqc shell command that worked fine, but I haven't been so lucky.
An additional note: I can see you're using 4 threads there with the -t 4 argument. You should declare this so that Snakemake gives the rule 4 threads; otherwise I believe it will run with 1 thread and may fail due to lack of memory. This can be done like so:
rule fastqc_raw:
    input:
        "raw/A.fastq"
    output:
        "output/fastqc_raw/A.html"
    threads: 4
    shell:
        "fastqc {input} -o output/fastqc_raw/ -t {threads}"

syntaxnet bazel test failed

I ran bazel test syntaxnet/... util/utf8/... and it gave me this output:
FAIL: //syntaxnet:parser_trainer_test (see /home/me/.cache/bazel/_bazel_rushat/cc4d67663fbe887a603385d628fdf383/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/parser_trainer_test/test.log).
INFO: Elapsed time: 2179.396s, Critical Path: 1623.00s
//syntaxnet:arc_standard_transitions_test PASSED in 0.7s
//syntaxnet:beam_reader_ops_test PASSED in 24.1s
//syntaxnet:graph_builder_test PASSED in 14.6s
//syntaxnet:lexicon_builder_test PASSED in 6.1s
//syntaxnet:parser_features_test PASSED in 5.8s
//syntaxnet:reader_ops_test PASSED in 9.4s
//syntaxnet:sentence_features_test PASSED in 0.2s
//syntaxnet:shared_store_test PASSED in 41.7s
//syntaxnet:tagger_transitions_test PASSED in 5.2s
//syntaxnet:text_formats_test PASSED in 6.1s
//util/utf8:unicodetext_unittest PASSED in 0.4s
//syntaxnet:parser_trainer_test FAILED in 0.5s
/home/me/.cache/bazel/_bazel_me/cc4d67663fbe887a603385d628fdf383/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/parser_trainer_test/test.log
Executed 12 out of 12 tests: 11 tests pass and 1 fails locally.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are.
If you want the output of --test_verbose_timeout_warnings, please ask.
The test.log output is below because Stack Overflow tells me I have too much code in my post :/
Thanks!
test.log output:
exec ${PAGER:-/usr/bin/less} "$0" || exit 1
-----------------------------------------------------------------------------
+ BINDIR=/home/me/.cache/bazel/_bazel_me/cc4d67663fbe887a603385d628fdf383/syntaxnet/bazel-out/local-opt/bin/syntaxnet/parser_trainer_test.runfiles/syntaxnet
+ CONTEXT=/home/me/.cache/bazel/_bazel_me/cc4d67663fbe887a603385d628fdf383/syntaxnet/bazel-out/local-opt/bin/syntaxnet/parser_trainer_test.runfiles/syntaxnet/testdata/context.pbtxt
+ TMP_DIR=/tmp/syntaxnet-output
+ mkdir -p /tmp/syntaxnet-output
+ sed s=OUTPATH=/tmp/syntaxnet-output=
+ sed s=SRCDIR=/home/me/.cache/bazel/_bazel_me/cc4d67663fbe887a603385d628fdf383/syntaxnet/bazel-out/local-opt/bin/syntaxnet/parser_trainer_test.runfiles= /home/me/.cache/bazel/_bazel_me/cc4d67663fbe887a603385d628fdf383/syntaxnet/bazel-out/local-opt/bin/syntaxnet/parser_trainer_test.runfiles/syntaxnet/testdata/context.pbtxt
sed: can't read /home/me/.cache/bazel/_bazel_me/cc4d67663fbe887a603385d628fdf383/syntaxnet/bazel-out/local-opt/bin/syntaxnet/parser_trainer_test.runfiles/syntaxnet/testdata/context.pbtxt: No such file or directory
+ PARAMS=128-0.08-3600-0.9-0
+ /home/me/.cache/bazel/_bazel_me/cc4d67663fbe887a603385d628fdf383/syntaxnet/bazel-out/local-opt/bin/syntaxnet/parser_trainer_test.runfiles/syntaxnet/parser_trainer --arg_prefix=brain_parser --batch_size=32 --compute_lexicon --decay_steps=3600 --graph_builder=greedy --hidden_layer_sizes=128 --learning_rate=0.08 --momentum=0.9 --output_path=/tmp/syntaxnet-output --task_context=/tmp/syntaxnet-output/context --training_corpus=training-corpus --tuning_corpus=tuning-corpus --params=128-0.08-3600-0.9-0 --num_epochs=12 --report_every=100 --checkpoint_every=1000 --logtostderr
syntaxnet/parser_trainer_test: line 36: /home/me/.cache/bazel/_bazel_me/cc4d67663fbe887a603385d628fdf383/syntaxnet/bazel-out/local-opt/bin/syntaxnet/parser_trainer_test.runfiles/syntaxnet/parser_trainer: No such file or directory
This is a bug in the syntaxnet test; it's looking for the wrong path. It needs the following patch:
diff --git a/syntaxnet/syntaxnet/parser_trainer_test.sh b/syntaxnet/syntaxnet/parser_trainer_test.sh
index ba2a6e7..977c89c 100755
--- a/syntaxnet/syntaxnet/parser_trainer_test.sh
+++ b/syntaxnet/syntaxnet/parser_trainer_test.sh
@@ -22,7 +22,7 @@
 set -eux
-BINDIR=$TEST_SRCDIR/syntaxnet
+BINDIR=$TEST_SRCDIR/$TEST_WORKSPACE/syntaxnet
 CONTEXT=$BINDIR/testdata/context.pbtxt
 TMP_DIR=/tmp/syntaxnet-output
You need to have the correct Bazel version installed for syntaxnet to compile. I had the latest build and it didn't work, so I removed it by deleting the folder:
rm -fr .cache/bazel
and then reinstalled the correct version (bazel 0.2.2b) by downloading the correct installer from the download page and running it on my machine:
sudo chmod +x bazel-version.sh
./bazel-version.sh --user

Rebol call command doesn't behave exactly like DOS command (e.g. with Subversion command line)

This Subversion import command works on the DOS command line:
"C:\Program Files\Subversion\bin\svn.exe" import c:\myproj file:///c:/svnrepo/myproj -m "test"
If I try to send the same command with Rebol's call command using this script:
Print "This command will add your files to the repository without requiring a working copy"
repo-directory: to-local-file ask "repo: "
project-subdirectory: to-local-file ask "project: "
source-directory: to-local-file ask "source directory: "
comment: ask "comment: "
command: rejoin [{"} Subversion.Directory "bin\svn.exe" {"} " import " source-directory " file:///" (replace/all (to-local-file repo-directory) "\" "/") "/" project-subdirectory " -m " {"} comment {"}]
call/wait/console command
I get this:
repo: c:\svnrepo
project: myproj
source directory: c:\myproj
comment: test
svn: The given propagation message is a path (-F was this intended?); force with '--force-log'
== 1
The value of command is the same as the DOS command:
>> command
== {"C:\Program Files\Subversion\bin\svn.exe" import c:\myproj file:///c:/svnrepo/myproj -m "test"}
So I appended --force-log and it then worked, but I would still like to know why Rebol doesn't behave like the DOS command, if there is a reason I'm unaware of.
And if you write the command to say %script.cmd and call that from Rebol, do you get the desired effect?