Nextflow unzip gz file error: not a regular file - nextflow

I am new to Nextflow and I want to unzip a fastq.gz file, but it raised the error: gunzip: SRR1659960_05pc_R1.fastq.gz is not a regular file.
When I run the same command directly in my console, it works fine.
My Nextflow script is:
process gz_uncompress {
    input:
    path fastq_r1_gz_path

    output:
    path fastq_r1_path

    script:
    """
    gunzip -kd $fastq_r1_gz_path > "fastq_r1_path.fastq"
    """
}

workflow {
    gz_uncompress("/Users/test/PycharmProjects/nf-easyfuse/local_test/SRR1659960_05pc_R1.fastq.gz")
}
The error message is:
local_test ywan$ nextflow run t2.nf
N E X T F L O W ~ version 22.10.3
Launching `t2.nf` [peaceful_wilson] DSL2 - revision: bf9e3bc592
executor > local (1)
[36/f8301b] process > gz_uncompress [100%] 1 of 1, failed: 1 ✘
Error executing process > 'gz_uncompress'
Caused by:
Process `gz_uncompress` terminated with an error exit status (1)
Command executed:
gunzip -kd SRR1659960_05pc_R1.fastq.gz > "fastq_r1_path.fastq"
Command exit status:
1
Command output:
(empty)
Command error:
gunzip: SRR1659960_05pc_R1.fastq.gz is not a regular file
Work dir:
/Users/test/PycharmProjects/nf-easyfuse/local_test/work/36/f8301b816e9eb834597ff1e6616c51
Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
But when I ran gunzip -kd SRR1659960_05pc_R1.fastq.gz > "fastq_r1_path.fastq" in my console, there weren't any errors.
Could you please help me figure this out?

By default, Nextflow will try to stage process input files using symbolic links (this can be changed using the stageInMode directive). With gunzip, just make sure you write the output to stdout, using -c:
-c --stdout --to-stdout
    Write output on standard output; keep original files unchanged. If there are several input files, the output consists of a sequence of independently compressed members. To obtain better compression, concatenate all input files before compressing them.
For example:
params.fastq = './test.fastq.gz'

process gunzip_fastq {
    input:
    path fastq_gz

    output:
    path fastq_gz.baseName

    script:
    """
    gunzip -c "${fastq_gz}" > "${fastq_gz.baseName}"
    """
}

workflow {
    fastq = file( params.fastq )
    gunzip_fastq( fastq )
}
On most systems, you could also just use zcat for this. The zcat command is identical to gunzip -c.
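If you really do need the staged input to be a regular file rather than a symlink, the stageInMode directive mentioned above can be set per process. A minimal sketch (note that 'copy' duplicates the file into the work directory, which can be costly for large FASTQ files):

process gunzip_fastq {
    stageInMode 'copy'

    input:
    path fastq_gz

    output:
    path fastq_gz.baseName

    script:
    """
    gunzip -kd "${fastq_gz}"
    """
}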

I found that adding the option -f fixes this issue.
process gz_uncompress2 {
    input:
    path f1

    output:
    path "*.fastq"

    script:
    """
    gunzip -fkd $f1 > fastq_r1_path
    """
}

workflow {
    path = gz_uncompress2("/Users/test/PycharmProjects/nf-easyfuse/local_test/SRR1659960_05pc_R1.fastq.gz")
    path.view()
}
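For context (not stated in this answer): Nextflow stages the input into the work directory as a symbolic link, and gunzip declines to operate on anything that is not a regular file unless forced. A rough illustration outside Nextflow, with hypothetical paths:

# Mimic Nextflow's default staging: the work dir holds a symlink, not the real file
ln -s /path/to/SRR1659960_05pc_R1.fastq.gz .

gunzip -kd  SRR1659960_05pc_R1.fastq.gz   # fails: "not a regular file" (the input is a symlink)
gunzip -fkd SRR1659960_05pc_R1.fastq.gz   # -f forces gunzip to read through the symlink and write
                                          # SRR1659960_05pc_R1.fastq, which the "*.fastq" glob captures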

Related

Snakemake Error with MissingOutputException

I am trying to run STAR with Snakemake on a server.
My smk file is this one:
import pandas as pd

configfile: 'config.yaml'

# Read sample-to-batch dataframe mapping batch to sample (here with zip)
sample_to_batch = pd.read_csv("/mnt/DataArray1/users/zisis/STAR_mapping/snakemake_STAR_index/all_samples_test.csv", sep = '\t')

# rule specifying output
rule all_STAR:
    input:
        #expand("{sample}/Aligned.sortedByCoord.out.bam", sample = sample_to_batch['sample'])
        expand(config['path_to_output']+"{sample}/Aligned.sortedByCoord.out.bam", sample = sample_to_batch['sample'])

rule STAR_align:
    # specify input fastq files
    input:
        fq1 = config['path_to_output']+"{sample}_1.fastq.gz",
        fq2 = config['path_to_output']+"{sample}_2.fastq.gz"
    params:
        # location of indexed genome and location to save the output
        genome = directory(config['path_to_reference']+config['ref_assembly']+".STAR_index"),
        prefix_outdir = directory(config['path_to_output']+"{sample}/")
    threads: 12
    output:
        config['path_to_output']+"{sample}/Aligned.sortedByCoord.out.bam"
    log:
        config['path_to_output']+"logs/{sample}.log"
    message:
        "--- Mapping STAR ---"
    shell:
        """
        STAR --runThreadN {threads} \
        --readFilesCommand zcat \
        --readFilesIn {input} \
        --genomeDir {params.genome} \
        --outSAMtype BAM SortedByCoordinate \
        --outSAMunmapped Within \
        --outSAMattributes Standard
        """
STAR starts normally, but at the end I have this error:
Waiting at most 5 seconds for missing files.
MissingOutputException in line 14 of /mnt/DataArray1/users/zisis/STAR_mapping/snakemake/STAR_snakefile_align.smk:
Job Missing files after 5 seconds:
/mnt/DataArray1/users/zisis/STAR_mapping/snakemake/001_T1/Aligned.sortedByCoord.out.bam
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Job id: 1 completed successfully, but some output files are missing. 1
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
I tried --latency-wait but it is not working.
In order to execute Snakemake I run the command:
users/zisis/STAR_mapping/snakemake_STAR_index$ snakemake --snakefile STAR_new_snakefile.smk --cores all --printshellcmds
Technically I am in my directory with full access and permissions.
Do you think this is happening due to strange permissions when Snakemake executes or when it tries to create directories?
It creates the directory and the files, but I can see that there is a file named Aligned.sortedByCoord.out.bamAligned.sortedByCoord.out.bam.
Is this the problem?
I think your STAR command does not have the option that says which file and directory to write to, so presumably it is writing the default file names to the current directory. Try something like:
rule STAR_align:
    input: ...
    output: ...
    ...
    shell:
        r"""
        outprefix=`dirname {output}`
        STAR --outFileNamePrefix $outprefix \
             --runThreadN {threads} \
             etc...
        """
I am running the command from my directory, in which I am a sudo user.
I don't think that is the problem, but it is strongly recommended to work as a regular user and use sudo only in special circumstances (e.g. installing system-wide programs, though if you use conda you shouldn't need that).
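Filling the suggestion into the original rule, the shell block might look roughly like the sketch below. It is only an outline: it reuses the inputs and params from the question, and adds a trailing slash because STAR concatenates --outFileNamePrefix directly with its output file names.

    shell:
        """
        outprefix=$(dirname {output})/
        STAR --runThreadN {threads} \
             --readFilesCommand zcat \
             --readFilesIn {input.fq1} {input.fq2} \
             --genomeDir {params.genome} \
             --outFileNamePrefix $outprefix \
             --outSAMtype BAM SortedByCoordinate \
             --outSAMunmapped Within \
             --outSAMattributes Standard
        """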

CalledProcessError in Snakemake

I have a rule to run megahit genome assembler in my Snakemake pipeline:
rule perform_assembly_using_megahit:
    input:
        fq = "target.fq"
    output:
        fa = "assembled_megahit/final.contigs.fa"
    threads:
        _NBR_CPUS
    run:
        command = MEGAHIT_DIR + "megahit -r " + input.fq + " -o " + _TEMP_DIR + "assembled_megahit/"
        shell(command)
and I got the following error:
RuleException:
CalledProcessError in line 374 of Snakefile:
Command 'set -euo pipefail; ext/MEGAHIT-1.2.9-Linux-x86_64-static/bin/megahit -r target.fq -o assembled_megahit/' returned non-zero exit status 1.
But if I run the same command on the CLI:
ext/MEGAHIT-1.2.9-Linux-x86_64-static/bin/megahit -r target.fq -o assembled_megahit/
the program runs without any problems. Any help would be appreciated.
Thanks!
The error message is "Output directory assembled_megahit already exists, please change the parameter -o to another value to avoid overwriting", even though I've deleted that directory before running my Snakefile.
I believe snakemake creates the necessary directory tree for output files before executing the run directive. So you get the error even if you delete the output directory before running snakemake.
I think you can fix it with something like:
run:
    import shutil

    outdir = _TEMP_DIR + "assembled_megahit/"
    shutil.rmtree(outdir)  # remove the directory Snakemake pre-created so megahit can create it itself
    command = MEGAHIT_DIR + "megahit -r " + input.fq + " -o " + outdir
    shell(command)
(Obviously it assumes there isn't anything valuable inside outdir)
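The same fix can also be written with a shell directive instead of run, deleting the pre-created directory just before megahit runs. A rough sketch only, assuming _TEMP_DIR resolves to the working directory and with the megahit path from the question hard-coded for illustration:

rule perform_assembly_using_megahit:
    input:
        fq = "target.fq"
    output:
        fa = "assembled_megahit/final.contigs.fa"
    params:
        outdir = "assembled_megahit/"
    shell:
        """
        # megahit refuses to run if its output directory already exists,
        # and Snakemake pre-creates it for the declared output file
        rm -rf {params.outdir}
        ext/MEGAHIT-1.2.9-Linux-x86_64-static/bin/megahit -r {input.fq} -o {params.outdir}
        """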

Log executed shell command from snakemake

I'd like to save the shell command executed by each snakemake job to a log file.
With --printshellcmds I can print to stdout the shell commands as they are submitted but I would like to save them to individual files. For example, this script:
samples = ['a', 'b', 'c']

rule all:
    input:
        expand('{sample}.done', sample= samples),

rule do_stuff:
    output:
        '{sample}.done'
    shell:
        """
        echo {wildcards.sample} > {output}
        """
will execute the 3 jobs:
echo a > a.done
echo c > c.done
echo b > b.done
I'd like to save each of these to a separate file named after a wildcard (or whatever). Like:
rule do_stuff:
    output:
        '{sample}.done'
    log_shell:              < save shell command
        '{sample}.cmd.log'  <
    shell:
        """
        echo {wildcards.sample} > {output}
        """
Is this possible? Thank you!
While fiddling with other issues, I came up with this solution. Combine the run directive with the shell function to first write the command to file and then execute the command itself. Example:
rule do_stuff:
    output:
        '{sample}.done'
    log:
        cmd= 'my/logs/{sample}.log'
    run:
        cmd= """
        echo hello > {output}
        """
        shell("sort --version > {log.cmd} || true")  # Log version
        shell("cat <<'EOF' >> {log.cmd}" + cmd)      # Log command
        shell(cmd)                                   # Exec command
For each {sample} you get a file my/logs/{sample}.log that looks like this:
sort (GNU coreutils) 5.93
Copyright (C) 2005 Free Software Foundation, Inc.
echo hello > a.done
This does the trick. However, it makes the code more cluttered and you effectively lose the --printshellcmds option.
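A variation on the same idea, not part of the original answer: format the command string yourself, write it to the log with ordinary Python file I/O, and then hand the exact same string to shell(). This avoids the heredoc, at the cost of building the command by hand:

rule do_stuff:
    output:
        '{sample}.done'
    log:
        cmd = 'my/logs/{sample}.log'
    run:
        # Build the exact command string first, so it can be both logged and executed
        cmd = "echo {sample} > {out}".format(sample=wildcards.sample, out=output[0])
        with open(log.cmd, "w") as fh:
            fh.write(cmd + "\n")
        shell(cmd)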

Save Terminal Bazel Build Output

Is it possible to save the output of a bazel build command that is run in the terminal? The command is:
bazel build tensorflow/examples/image_retraining:label_image &&
bazel-bin/tensorflow/examples/image_retraining/label_image \
    --graph=/tmp/output_graph.pb \
    --labels=/tmp/output_labels.txt \
    --output_layer=final_result:0 \
    --image=$HOME/Desktop/Image-3/image1.png
I want to save the output to a .txt file; I cannot simply tack > out.txt onto the end of the line, or an error is thrown. Is there a bazel output command for this?
The stdout of the latest bazel command is logged in your WORKSPACE's output base:
$ echo $(bazel info output_base)
/home/username/.cache/bazel/_bazel_username/3e8af127f8b488324cdf41111355ff4c
and the exact file is command_log:
$ echo $(bazel info command_log)
/home/username/.cache/bazel/_bazel_username/3e8af127f8b488324cdf41111355ff4c/command.log
You can also pipe the output to both a text file and the terminal on Linux and other Unix-like systems: just append 2>&1 | tee file_name.log to the command.
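Putting the two suggestions together, a small sketch (the output file names are just placeholders):

# The path printed by `bazel info command_log` is stable for a workspace, so note it once...
LOG="$(bazel info command_log)"

# ...then, after the build you care about, copy it before the next bazel invocation overwrites it
bazel build tensorflow/examples/image_retraining:label_image
cp "$LOG" build_output.txt

# Alternatively, capture stdout and stderr while still seeing them in the terminal
bazel build tensorflow/examples/image_retraining:label_image 2>&1 | tee build_output.txt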

How to get error message from ditto command, when it fails to archive

We are archiving a folder using the ditto command. When the folder contains some files that do not have read permission, it fails to archive. At that point ditto logs an error message saying: ditto: "Path": Permission denied. How can we capture this error message?
As with any UNIX command, errors are written to stderr, which can be captured by adding 2> file to the end of the command:
$ ditto src dst 2> error
$ cat error
ditto: /Users/andy/tmp/src/./x: Permission denied
If you are running ditto from a shell script, then something like this should work:
#!/bin/sh

errfile=/tmp/errors.$$

(cd ~/tmp; ditto src dst 2> $errfile)

if [ $? -ne 0 ]; then
    echo There was a problem:
    cat $errfile
else
    echo Everything is cool
fi