Makefile: how to create a global variable available to a sub-makefile?

Given a master.makefile calling a sub-makefile like this:
downloads:
	make -f downloads.makefile
and a sub-makefile downloads.makefile like this:
download: clean
	curl -o ./data/<itemname>.png 'http://www.mysite.com/<itemname>.png'
	echo "downloaded <itemname>: Done!"
How can I set a global variable itemname in my master makefile so that it is available to my sub-makefile?
Note: My project is actually more complex, with 12 different sub-makefiles, which should all reuse the same parameters/variables from the master file. The master file should assign the value, while the sub-makefiles should retrieve the variable's value where I put <itemname>.

There are various ways to do it.
First, always, always use $(MAKE) and never make when running a sub-make. Then...
1. You can pass the value on the recursive command line:
itemname = whatever
downloads:
	$(MAKE) -f downloads.makefile itemname=$(itemname)
2. You can export the variable in the parent makefile:
export itemname = myvalue
downloads:
	$(MAKE) -f downloads.makefile
with a sub-makefile like this:
download:
	echo "downloaded $(itemname): Done!"

Related

Nextflow: publishDir, output channels, and output subdirectories

I've been trying to learn how to use Nextflow and have come across an issue with adding output to a channel, as I need the processes to run in order. I want to pass output files from one of the output subdirectories created by the tool (ONT-Guppy) into a channel, but can't seem to figure out how.
Here is the Nextflow process in question:
process GupcallBases {
    publishDir "$params.P1_outDir", mode: 'copy', pattern: "pass/*.bam"
    executor = 'pbspro'
    clusterOptions = "-lselect=1:ncpus=${params.P1_threads}:mem=${params.P1_memory}:ngpus=1:gpu_type=${params.P1_GPU} -lwalltime=${params.P1_walltime}:00:00"

    output:
    path "*.bam" into bams_ch

    script:
    """
    module load cuda/11.4.2
    singularity exec --nv $params.Gup_container \
        guppy_basecaller --config $params.P1_gupConf \
        --device "cuda:0" \
        --bam_out \
        --recursive \
        --compress \
        --align_ref $params.refGen \
        -i $params.P1_inDir \
        -s $params.P1_outDir \
        --gpu_runners_per_device $params.P1_GPU_runners \
        --num_callers $params.P1_callers
    """
}
The output of the process is something like this:
$params.P1_outDir/pass/(lots of bams and fastqs)
$params.P1_outDir/fail/(lots of bams and fastqs)
$params.P1_outDir/(a few txt and log files)
I only want to keep the bam files in $params.P1_outDir/pass/, hence trying to use pattern: "pass/*.bam", but I've tried a few other patterns to no avail.
The output syntax was chosen because, once this process is done, the following channel works:
// Channel
// .fromPath("${params.P1_outDir}/pass/*.bam")
// .ifEmpty { error "Cannot find any bam files in ${params.P1_outDir}" }
// .set { bams_ch }
But the problem is that if I don't pass the files into the output channel of the first process, the processes run in parallel. I could simply be missing something in the extensive documentation on how to order processes, which would be an alternative solution.
Edit: I forgot to add the error message, which is: Missing output file(s) `*.bam` expected by process `GupcallBases`. Also, $params.P1_outDir/ still contains the subdirectories and all the log files despite the pattern argument.
Thanks in advance.
Nextflow processes are designed to run isolated from each other, but this can be circumvented somewhat when the command-line input and/or outputs are specified using params. Using params like this can be problematic because if, for example, a params variable specifies an absolute path but your output declaration expects files in the Nextflow working directory (e.g. ./work/fc/0249e72585c03d08e31ce154b6d873), you will get the 'Missing output file(s) expected by process' error you're seeing.
The solution is to ensure your inputs are localized in the working directory using an input declaration block and that the outputs are also written to the work dir. Note that only files specified in the output declaration block can be published using the publishDir directive.
Also, it's best to avoid calling Singularity manually in your script block. Instead, just add singularity.enabled = true to your nextflow.config. This should also work nicely with the beforeScript process directive to initialize your environment:
params.publishDir = './results'

input_dir = file( params.input_dir )
guppy_config = file( params.guppy_config )
ref_genome = file( params.ref_genome )

process GuppyBasecaller {

    publishDir(
        path: "${params.publishDir}/GuppyBasecaller",
        mode: 'copy',
        saveAs: { fn -> fn.substring(fn.lastIndexOf('/')+1) },
    )

    beforeScript 'module load cuda/11.4.2; export SINGULARITY_NV=1'
    container '/path/to/guppy_basecaller.img'

    input:
    path input_dir
    path guppy_config
    path ref_genome

    output:
    path "outdir/pass/*.bam" into bams_ch

    """
    mkdir outdir
    guppy_basecaller \\
        --config "${guppy_config}" \\
        --device "cuda:0" \\
        --bam_out \\
        --recursive \\
        --compress \\
        --align_ref "${ref_genome}" \\
        -i "${input_dir}" \\
        -s outdir \\
        --gpu_runners_per_device "${params.guppy_gpu_runners}" \\
        --num_callers "${params.guppy_callers}"
    """
}
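For reference, the matching nextflow.config might look something like this (a sketch only; the scheduler, resource options, and container path are placeholders you would replace with your own):
// nextflow.config
singularity.enabled = true

process {
    withName: GuppyBasecaller {
        executor       = 'pbspro'
        clusterOptions = '-lselect=1:ncpus=8:mem=32gb:ngpus=1'
    }
}
Keeping the executor and cluster options in the config rather than in the process body keeps the pipeline script itself portable across machines.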

Makefile variable manipulation

I'm doing this:
apply: init
	@terraform apply -auto-approve
	BUCKET=$(shell terraform output -json | jq '.S3_Bucket.value')
	DYNAMODB=$(shell terraform output -json | jq '.dynamo_db_lock.value')
	@echo $${BUCKET}
make shows both variables being set (I was using := but that doesn't work for me, since I need them set when I execute apply), but it's still echoing blank:
...
Outputs:
S3_Bucket = "bucket1"
dynamo_db_lock = "dyna1"
BUCKET="bucket"
DYNAMODB="dyna1"
<-- echo should print it out here
I want to use that variable for a sed afterwards...
Thanks all!
Each makefile recipe line is run in its own shell. Thus, shell variables set in one recipe line will not affect other recipe lines. You can get around this by concatenating all the lines, like so:
apply: init
	@terraform apply -auto-approve
	@BUCKET=$(shell terraform output -json | jq '.S3_Bucket.value'); \
	DYNAMODB=$(shell terraform output -json | jq '.dynamo_db_lock.value'); \
	echo $${BUCKET}
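Note that $(shell ...) is expanded by make when it builds the recipe text, not by the recipe's shell. A variant that arguably reads more cleanly keeps everything in the shell with $$(...), and sketches the sed step the question mentions (untested; backend.tf.tpl and the __BUCKET__ token are hypothetical):
apply: init
	@terraform apply -auto-approve
	@BUCKET=$$(terraform output -json | jq -r '.S3_Bucket.value'); \
	DYNAMODB=$$(terraform output -json | jq -r '.dynamo_db_lock.value'); \
	echo "$${BUCKET}"; \
	sed "s/__BUCKET__/$${BUCKET}/" backend.tf.tpl > backend.tf
The -r flag makes jq print the raw string, so BUCKET holds bucket1 rather than "bucket1".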

Nextflow multiple inputs with different number of files

I'm trying to input two channels. However, seacr_res_ch2 has 4 files, while bigwig_ch3 has 5 files, which include a control and 4 samples. So I was trying to run the following process to compute the peak center.
When I ran this process, I got this error: unexpected EOF while looking for matching `"'
process compute_matrix_peak_center {

    input:
    set val(sample_id), file(seacr_bed) from seacr_res_ch2
    set val(sample_id), file(bigwig) from bigwig_ch3

    output:
    set val(sample_id), file("${sample_id}.peak_centered.mat.gz") into peak_center_ch

    script:
    """
    "computeMatrix reference-point \
        -S ${bigwig} \
        -R ${seacr_bed} \
        -a 1000 \
        -b 1000 \
        -o ${sample_id}.peak_centered.mat.gz \
        --referencePoint center \
        -p 10
    """
}
Likely the input files are not file objects. Try replacing file in the declaration with path, e.g.:
input:
set val(sample_id), path(seacr_bed) from seacr_res_ch2
set val(sample_id), path(bigwig) from bigwig_ch3
Check the documentation for details: https://www.nextflow.io/docs/latest/process.html#input-of-type-path
Your input block declares a value called sample_id twice. There's no guarantee that these values will be the same when they are derived from two (or more) channels; one value will simply clobber the other(s). You'll need to join() these channels first:
input:
set val(sample_id), file(seacr_bed), file(bigwig) from seacr_res_ch2.join(bigwig_ch3)
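If it helps to see what join() does, here is a toy fragment (made-up values) showing how tuples from two channels are matched on their first element:
Channel
    .from( ['sample1', 'a.bed'], ['sample2', 'b.bed'] )
    .join( Channel.from( ['sample1', 'a.bw'], ['sample2', 'b.bw'] ) )
    .view()

// prints (order may vary):
// [sample1, a.bed, a.bw]
// [sample2, b.bed, b.bw]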

issue with a modification of youtube-dl in .zshrc

The code I have in my .zshrc is:
ytdcd () { # youtube-dl that automatically puts stuff in a specific folder and returns to the former working directory after.
    cd ~/youtube/new/ && {
        youtube-dl "$@"
        cd - > /dev/null
    }
}
ytd() { # So far, this function can only take one page, so I can only send one YouTube video code per line. Will modify it to accept multiple lines.
    for i in $*;
    do
        params=" $params https://youtu.be/$i"
    done
    ytdcd -f 18 $params
}
So, on the command line (terminal), when I enter ytd DFreHo3UCD0, I would like the video at https://youtu.be/DFreHo3UCD0 to be downloaded. The problem is that when I enter the commands in succession, the system just tries to download the video from the previous command and rightly claims the download is complete.
For example, entering:
> ytd DFreHo3UCD0
> ytd L3my9luehfU
would not attempt to download the video for L3my9luehfU but only the video for DFreHo3UCD0 twice.
First -- there's no point in returning to the old directory for ytdcd: you can change to a new directory only inside a subshell, and then exec youtube-dl to replace that subshell with the application process:
This has fewer things to go wrong: Aborting the function's execution can't leave things in the wrong directory, because the parent shell (the one you're interactively using) never changed directories in the first place.
ytdcd () {
    (cd ~/youtube/new/ && exec youtube-dl "$@")
}
Second -- use an array when building argument lists, not a string.
If you use set -x to log its execution, you'll see that your original command runs something like:
ytdcd -f 18 'https://youtu.be/one https://youtu.be/two https://youtu.be/three'
See those quotes? That's because $params is a string, passed as a single argument, not an array. (In bash -- or another shell following POSIX rules -- an unquoted string expansion would be string-split and glob-expanded, but zsh doesn't follow POSIX rules).
The following builds up an array of separate arguments and passes them individually:
ytd() {
    local -a params=( )
    local i
    for i; do
        params+=( "https://youtu.be/$i" )
    done
    ytdcd -f 18 "${params[@]}"
}
Finally, it's come up that you don't actually intend to pass all the URLs to just one youtube-dl instance. To run a separate instance per URL, use:
ytd() {
    local i retval=0
    for i; do
        ytdcd -f 18 "$i" || retval=$?
    done
    return "$retval"
}
Note here that we're capturing non-success exit status, so as not to hide an error in any ytdcd instance other than the last (which would otherwise occur).
I would declare params as local, so that you are not appending URL after URL...
You can try to add this awesome function to your .zshrc:
funfun() {
    local _fun1="$_fun1 fun1!"
    _fun2="$_fun2 fun2!"
    echo "1 says: $_fun1"
    echo "2 says: $_fun2"
}
To observe the thing ;)
EDIT (Explanation):
When sourcing a shell script, you add it to your current environment; that is why you can run the functions you define. So, when those functions use variables, by default those variables are global and accessible from anywhere in your environment! Therefore, in this case params is defined globally for the whole length of your shell session. Since you want to allow downloading several videos at once, you keep appending values to this global variable, which grows all the time.
Enforcing local tells zsh to limit the scope of params to the function only.
Another solution is to reset the variable when you call the function.
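The reset variant would look something like this (a sketch; it still relies on a global array, so the local declaration shown above remains the cleaner fix):
ytd() {
    params=()  # clear the global array on every call
    local i
    for i; do
        params+=( "https://youtu.be/$i" )
    done
    ytdcd -f 18 "${params[@]}"
}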

dynamically fetching dynamic variable's value from properties file

The Unix commands below work:
export myTempVar=myTempVar1
export myTempVar1=myTempVar2
eval echo '$'$myTempVar
This correctly prints myTempVar2.
However, what if myTempVar1=myTempVar2 is present in a properties file instead of directly in the script?
So my script will have:
. $MYDIR/myProperties.properties
myTempVar=myTempVar1
myTempVar3=eval echo '$'$myTempVar
The lines above are not working; the value of myTempVar3 does not come out as myTempVar2.
myProperties.properties contains the line below:
myTempVar1=myTempVar2
Using indirection is far safer than eval:
#!/bin/bash
. $MYDIR/myProperties.properties # myTempVar1=myTempVar2
myTempVar=myTempVar1
myTempVar3=${!myTempVar}
echo $myTempVar3
Gives:
myTempVar2
and you don't need the echo in eval:
eval myTempVar3='$'$myTempVar
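Putting the two working forms together as a self-contained script (a sketch; the properties file is replaced by an inline assignment so it runs as-is):
#!/bin/bash
myTempVar1=myTempVar2            # stands in for the line sourced from myProperties.properties
myTempVar=myTempVar1

myTempVar3=${!myTempVar}         # bash indirection
echo "$myTempVar3"               # prints: myTempVar2

eval myTempVar3='$'$myTempVar    # eval form, no echo needed
echo "$myTempVar3"               # prints: myTempVar2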