Makefile variable manipulation - variables

I'm doing this:
apply: init
#terraform apply -auto-approve
BUCKET=$(shell terraform output -json | jq '.S3_Bucket.value')
DYNAMODB=$(shell terraform output -json | jq '.dynamo_db_lock.value')
#echo $${BUCKET}
Make shows both variables being set (I was using :=, but that doesn't work for me since I need them set when I execute apply), but it still echoes a blank line:
...
Outputs:
S3_Bucket = "bucket1"
dynamo_db_lock = "dyna1"
BUCKET="bucket"
DYNAMODB="dyna1"
<-- echo should print it out here
I want to use that variable for a sed afterwards...
Thanks all!

Each makefile recipe line is run in its own shell, so shell variables set in one recipe line will not affect other recipe lines. You can get around this by joining the lines into a single logical line, like so:
apply: init
	#terraform apply -auto-approve
	BUCKET=$(shell terraform output -json | jq '.S3_Bucket.value'); \
	DYNAMODB=$(shell terraform output -json | jq '.dynamo_db_lock.value'); \
	echo $${BUCKET}
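A minimal shell sketch of the behavior described above, assuming make's default of running each recipe line as a separate `sh -c` (GNU make's `.ONESHELL` special target changes this). The value bucket1 comes from the question's output; the function names are illustrative only.

```shell
#!/bin/sh
# Two separate recipe lines: each runs in a fresh shell, so the
# assignment made by the first is gone by the time the second runs.
separate_lines() {
    sh -c 'BUCKET=bucket1'
    sh -c 'echo "${BUCKET}"'
}

# One logical recipe line joined with "; \": a single shell sees
# both the assignment and the echo.
joined_line() {
    sh -c 'BUCKET=bucket1; echo "${BUCKET}"'
}

unset BUCKET   # make sure nothing leaks in from the environment
echo "separate: [$(separate_lines)]"   # separate: []
echo "joined:   [$(joined_line)]"      # joined:   [bucket1]
```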

Related

Nextflow: publishDir, output channels, and output subdirectories

I've been trying to learn Nextflow and have come across an issue with adding output to a channel, which I need because the processes have to run in a specific order. I want to pass output files from one of the output subdirectories created by the tool (ONT-Guppy) into a channel, but can't figure out how.
Here is the nextflow process in question:
process GupcallBases {
publishDir "$params.P1_outDir", mode: 'copy', pattern: "pass/*.bam"
executor = 'pbspro'
clusterOptions = "-lselect=1:ncpus=${params.P1_threads}:mem=${params.P1_memory}:ngpus=1:gpu_type=${params.P1_GPU} -lwalltime=${params.P1_walltime}:00:00"
output:
path "*.bam" into bams_ch
script:
"""
module load cuda/11.4.2
singularity exec --nv $params.Gup_container \
guppy_basecaller --config $params.P1_gupConf \
--device "cuda:0" \
--bam_out \
--recursive \
--compress \
--align_ref $params.refGen \
-i $params.P1_inDir \
-s $params.P1_outDir \
--gpu_runners_per_device $params.P1_GPU_runners \
--num_callers $params.P1_callers
"""
}
The output of the process is something like this:
$params.P1_outDir/pass/(lots of bams and fastqs)
$params.P1_outDir/fail/(lots of bams and fastqs)
$params.P1_outDir/(a few txt and log files)
I only want to keep the bam files in $params.P1_outDir/pass/, hence the pattern: "pass/*.bam" argument; I've also tried a few other patterns, to no avail.
I chose this output syntax because, once this process has finished, the following channel works:
// Channel
// .fromPath("${params.P1_outDir}/pass/*.bam")
// .ifEmpty { error "Cannot find any bam files in ${params.P1_outDir}" }
// .set { bams_ch }
The problem is that if I don't pass the files into the output channel of the first process, the processes run in parallel. I could simply be missing something in the extensive documentation about how to order processes, which would be an alternative solution.
Edit: I forgot to add the error message, which is: Missing output file(s) `*.bam` expected by process `GupcallBases`. Also, $params.P1_outDir/ contains the subdirectories and all the log files despite the pattern argument.
Thanks in advance.
Nextflow processes are designed to run in isolation from each other, but this can be circumvented somewhat when the command-line inputs and/or outputs are specified using params. Using params like this can be problematic: if, for example, a params variable specifies an absolute path but your output declaration expects files in the Nextflow working directory (e.g. ./work/fc/0249e72585c03d08e31ce154b6d873), you will get the 'Missing output file(s) expected by process' error you're seeing.
The solution is to ensure your inputs are localized in the working directory using an input declaration block and that the outputs are also written to the work dir. Note that only files specified in the output declaration block can be published using the publishDir directive.
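As a rough shell analogy, assuming output globs are matched relative to the task's work directory: the layout below is a made-up stand-in for Guppy's output, and count_matches is a hypothetical helper. It shows why `*.bam` finds nothing when the files actually live under a subdirectory, while a pattern that includes the subdirectory does.

```shell
#!/bin/sh
# count_matches: hypothetical helper that counts files matching a glob,
# evaluated relative to the current directory (like an output glob
# evaluated in a task work directory).
count_matches() {
    n=0
    for f in $1; do             # unquoted on purpose so the glob expands
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "$n"
}

# Made-up stand-in for a task work directory after the basecaller ran.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
mkdir -p outdir/pass outdir/fail
touch outdir/pass/a.bam outdir/pass/b.bam outdir/fail/c.bam outdir/guppy.log

echo "'*.bam' matches: $(count_matches '*.bam')"                         # 0
echo "'outdir/pass/*.bam' matches: $(count_matches 'outdir/pass/*.bam')" # 2
```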
It's also best to avoid calling Singularity manually in your script block. Instead, just add singularity.enabled = true to your nextflow.config. This should also work nicely with the beforeScript process directive for initializing your environment:
params.publishDir = './results'
input_dir = file( params.input_dir )
guppy_config = file( params.guppy_config )
ref_genome = file( params.ref_genome )
process GuppyBasecaller {
publishDir(
path: "${params.publishDir}/GuppyBasecaller",
mode: 'copy',
saveAs: { fn -> fn.substring(fn.lastIndexOf('/')+1) },
)
beforeScript 'module load cuda/11.4.2; export SINGULARITY_NV=1'
container '/path/to/guppy_basecaller.img'
input:
path input_dir
path guppy_config
path ref_genome
output:
path "outdir/pass/*.bam" into bams_ch
"""
mkdir outdir
guppy_basecaller \\
--config "${guppy_config}" \\
--device "cuda:0" \\
--bam_out \\
--recursive \\
--compress \\
--align_ref "${ref_genome}" \\
-i "${input_dir}" \\
-s outdir \\
--gpu_runners_per_device "${params.guppy_gpu_runners}" \\
--num_callers "${params.guppy_callers}"
"""
}

Problems getting two output files in Nextflow

Hello all!
I'm trying to write a small Nextflow pipeline that runs vcftools commands on 300 VCFs. The pipeline takes four inputs (vcf, pop1, pop2 and a .txt file) and should generate two outputs: a .log.weir.fst file and a .log.log file. When I run the pipeline, it only produces the .log.weir.fst files, not the .log files.
Here's my process definition:
process fst_calculation {
publishDir "${results_dir}/fst_results_pop1_pop2/", mode:"copy"
input:
file vcf
file pop_1
file pop_2
file mart
output:
path "*.log.*"
"""
while read linea
do
echo "[DEBUG] working in line: \$linea"
inicio=\$(echo "\$linea" | cut -f3)
final=\$(echo "\$linea" | cut -f4)
cromosoma=\$(echo "\$linea" | cut -f1)
segmento=\$(echo "\$linea" | cut -f5)
vcftools --vcf ${vcf} \
--weir-fst-pop ${pop_1} \
--weir-fst-pop ${pop_2} \
--out \$inicio.log --chr \$cromosoma \
--from-bp \$inicio --to-bp \$final
done < ${mart}
"""
}
And here's my workflow:
/* Load files into channel*/
pop_1 = Channel.fromPath("${params.fst_path}/pop_1")
pop_2 = Channel.fromPath("${params.fst_path}/pop_2")
vcf = Channel.fromPath("${params.fst_path}/*.vcf")
mart = Channel.fromPath("${params.fst_path}/*.txt")
/* Import modules
*/
include {
fst_calculation } from './nf_modules/modules.nf'
/*
* main pipeline logic
*/
workflow {
p1 = fst_calculation(vcf, pop_1, pop_2, mart)
p1.view()
}
When I check the work directory of the pipeline, I can see that it only generates the .log.weir.fst files. To check whether my code was wrong, I ran "bash .command.sh" in the working directory, and this actually generates both output files. So, is there a reason I don't get both output files when I run the pipeline?
I appreciate any help.
Note that bash .command.sh and bash .command.run do different things. The latter is basically a wrapper around the former that sets up the environment and stages the declared input files, among other things. If running the latter produces the unusual behavior, you'll need to dig deeper.
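A simplified sketch of one way the two can differ. fake_tool is a hypothetical stand-in for vcftools, and the wrapper below mimics only one job of .command.run: capturing the task's STDOUT and STDERR into .command.out and .command.err. Anything a tool writes to STDERR ends up in .command.err rather than in a file your output glob would match; the log text is made up.

```shell
#!/bin/sh
# fake_tool: hypothetical stand-in for a tool that writes results to
# STDOUT and its log messages to STDERR.
fake_tool() {
    echo "fst results"
    echo "log: 42 sites processed" >&2
}

# Simplified wrapper: send each stream to its own file, mimicking only
# the .command.out/.command.err capture done by the real wrapper.
run_wrapped() {
    fake_tool > .command.out 2> .command.err
}

cd "$(mktemp -d)" || exit 1
run_wrapped
echo "stdout file: $(cat .command.out)"   # stdout file: fst results
echo "stderr file: $(cat .command.err)"   # stderr file: log: 42 sites processed
```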
It's not completely clear to me what the problem is here. My guess is that vcftools might behave differently when run non-interactively, such that it sends its logging to STDERR. If that's the case, the logging will be captured in a file called .command.err. To send it to a file of your own instead, you can redirect STDERR in the usual way (untested):
while IFS=\$'\\t' read -r cromosoma null inicio final segmento ; do
>&2 echo "[DEBUG] Working with: \${cromosoma}, \${inicio}, \${final}, \${segmento}"
vcftools \\
--vcf "${vcf}" \\
--weir-fst-pop "${pop_1}" \\
--weir-fst-pop "${pop_2}" \\
--out "\${inicio}.log" \\
--chr "\${cromosoma}" \\
--from-bp "\${inicio}" \\
--to-bp "\${final}" \\
2> "\${cromosoma}.\${inicio}.\${final}.log.log"
done < "${mart}"
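To check that the single read in the loop picks out the same columns as the cut calls in the question (fields 1, 3, 4 and 5, skipping field 2), here is a small POSIX sh sketch. The tab-separated sample line is made up, and tab=$(printf '\t') is used in place of bash's $'\t' so it also runs under a plain sh.

```shell
#!/bin/sh
# A literal tab character, portable to shells without $'\t'.
tab=$(printf '\t')

# The question's approach: one cut call per field.
parse_with_cut() {
    linea=$1
    inicio=$(echo "$linea" | cut -f3)
    final=$(echo "$linea" | cut -f4)
    cromosoma=$(echo "$linea" | cut -f1)
    segmento=$(echo "$linea" | cut -f5)
    echo "$cromosoma $inicio $final $segmento"
}

# The answer's approach: a single tab-split read; field 2 goes into
# a throwaway variable named null.
parse_with_read() {
    printf '%s\n' "$1" | {
        IFS="$tab" read -r cromosoma null inicio final segmento
        echo "$cromosoma $inicio $final $segmento"
    }
}

# Made-up line in the same column layout as the mart file.
line="chr1${tab}geneA${tab}1000${tab}2000${tab}seg01"
parse_with_cut "$line"    # chr1 1000 2000 seg01
parse_with_read "$line"   # chr1 1000 2000 seg01
```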

Nextflow multiple inputs with different number of files

I'm trying to feed two channels into one process. seacr_res_ch2 has 4 files, while bigwig_ch3 has 5 files (a control and 4 samples). I'm trying to run the following process to compute the peak centers.
When I run this process, I get this error: unexpected EOF while looking for matching `"'
process compute_matrix_peak_center {
input:
set val(sample_id), file(seacr_bed) from seacr_res_ch2
set val(sample_id), file(bigwig) from bigwig_ch3
output:
set val(sample_id), file("${sample_id}.peak_centered.mat.gz") into peak_center_ch
script:
"""
"computeMatrix reference-point \
-S ${bigwig} \
-R ${seacr_bed} \
-a 1000 \
-b 1000 \
-o ${sample_id}.peak_centered.mat.gz \
--referencePoint center \
-p 10
"""
}
The input files are likely not file objects. Try replacing file in the declaration with path, e.g.:
input:
set val(sample_id), path(seacr_bed) from seacr_res_ch2
set val(sample_id), path(bigwig) from bigwig_ch3
Check the documentation for details https://www.nextflow.io/docs/latest/process.html#input-of-type-path
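Independently of the file/path change, the script block in the question opens with a stray double quote before computeMatrix that is never closed, and that alone produces this exact error. A minimal sketch, assuming bash is installed (Nextflow runs task scripts with bash by default, and the error wording is bash's). bash -n only parses the text, so nothing is executed; check_syntax is a hypothetical helper.

```shell
#!/bin/sh
# Same shape as the question's script block: a stray opening quote
# before the command and no closing quote anywhere.
broken_script='"computeMatrix reference-point \
  -a 1000 \
  -b 1000'

# The same text without the stray quote.
fixed_script='computeMatrix reference-point \
  -a 1000 \
  -b 1000'

# check_syntax: hypothetical helper; bash -n parses the script from
# stdin without executing it and reports syntax errors on stderr.
check_syntax() {
    printf '%s\n' "$1" | bash -n 2>&1
}

check_syntax "$broken_script"   # reports the unexpected EOF error
check_syntax "$fixed_script"    # prints nothing
```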
Your input block declares a value called sample_id twice. There's no guarantee that these values will be the same if they are derived from two (or more) channels; one value will simply clobber the other(s). You'll need to join() these channels first:
input:
set val(sample_id), file(seacr_bed), file(bigwig) from seacr_res_ch2.join(bigwig_ch3)
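Nextflow's join() pairs items by their first element (the key) and, by default, drops items with no match unless remainder: true is given, so the control in bigwig_ch3 would be left out of the joined channel. The standard Unix join command behaves similarly and makes a convenient analogy; this is a sketch with made-up sample IDs and file names, not Nextflow itself.

```shell
#!/bin/sh
# Made-up (sample_id, file) pairs; the bigwig list has an extra control.
seacr=$(mktemp)
bigwig=$(mktemp)
printf '%s\n' 's1 s1.bed' 's2 s2.bed' 's3 s3.bed' 's4 s4.bed' > "$seacr"
printf '%s\n' 'control control.bw' 's1 s1.bw' 's2 s2.bw' 's3 s3.bw' 's4 s4.bw' > "$bigwig"

# join(1) needs both inputs sorted on the key; LC_ALL=C keeps the
# sort order and join's comparisons consistent. Unmatched keys
# (here: control) are dropped, like join() without remainder: true.
joined=$(LC_ALL=C join "$seacr" "$bigwig")
rm -f "$seacr" "$bigwig"

printf '%s\n' "$joined"
# s1 s1.bed s1.bw
# s2 s2.bed s2.bw
# s3 s3.bed s3.bw
# s4 s4.bed s4.bw
```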

sketchtool CLI with fish shell

I've been trying for a while to find the fish shell equivalent of the Sketch CLI initialization commands. Can anyone help?
For fish, the first line seems to work if you remove the '$' character. For the second line (the argument passing), I've tried removing the $, removing the quotes, and several other formats. I couldn't find documentation on passing arguments through like this in fish.
#!/bin/sh
SKETCH=$(mdfind kMDItemCFBundleIdentifier == 'com.bohemiancoding.sketch3' | head -n 1)
# pass on all given arguments
"$SKETCH/Contents/Resources/sketchtool/bin/sketchtool" "$@"
reference: https://developer.sketch.com/cli/
Try:
set SKETCH (mdfind kMDItemCFBundleIdentifier == 'com.bohemiancoding.sketch3' | head -n 1)
$SKETCH/Contents/Resources/sketchtool/bin/sketchtool $argv
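In the POSIX script, the comment says it passes on all given arguments; "$@" does that, keeping each argument intact, whereas "$#" expands to the argument count. fish's $argv plays the role of "$@". A small sh sketch with hypothetical helper functions shows the difference:

```shell
#!/bin/sh
# show_args: hypothetical helper that prints each argument it receives
# on its own line, in brackets so embedded spaces are visible.
show_args() {
    for a in "$@"; do
        printf '[%s]\n' "$a"
    done
}

forward_all()   { show_args "$@"; }   # forwards every argument unchanged
forward_count() { show_args "$#"; }   # forwards only the argument count

forward_all "my file.sketch" --help     # [my file.sketch] then [--help]
forward_count "my file.sketch" --help   # [2]
```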

Dynamically fetching a dynamic variable's value from a properties file

The following Unix commands work:
export myTempVar=myTempVar1
export myTempVar1=myTempVar2
eval echo '$'$myTempVar
This correctly prints myTempVar2.
However, what if myTempVar1=myTempVar2 is defined in a properties file instead of directly in the script?
So my script has:
. $MYDIR/myProperties.properties
myTempVar=myTempVar1
myTempVar3=eval echo '$'$myTempVar
The lines above don't work: myTempVar3 does not end up as myTempVar2.
myProperties.properties contains this line:
myTempVar1=myTempVar2
Using indirection is far safer than eval:
#!/bin/bash
. "$MYDIR/myProperties.properties" # myTempVar1=myTempVar2
myTempVar=myTempVar1
myTempVar3=${!myTempVar}
echo "$myTempVar3"
Gives:
myTempVar2
And if you do use eval, you don't need the echo inside it:
eval myTempVar3='$'$myTempVar
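A self-contained sketch of the eval approach, assuming a temporary file as a stand-in for $MYDIR/myProperties.properties. It uses eval rather than ${!myTempVar} only so it also runs under a plain POSIX sh (${!name} is a bash feature). It also shows what the question's myTempVar3=eval echo ... line actually does: it runs echo with myTempVar3=eval set only in echo's environment, so it prints the literal text $myTempVar1 and assigns nothing.

```shell
#!/bin/sh
# Temporary stand-in for $MYDIR/myProperties.properties.
props=$(mktemp)
echo 'myTempVar1=myTempVar2' > "$props"
. "$props"

myTempVar=myTempVar1

# eval sees the text: myTempVar3=$myTempVar1
eval myTempVar3='$'$myTempVar
echo "$myTempVar3"   # myTempVar2

# The question's version: echo runs with myTempVar3=eval in its own
# environment only, so it prints literal text and assigns nothing.
broken=$(myTempVar3=eval echo '$'$myTempVar)
echo "$broken"       # $myTempVar1

rm -f "$props"
```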