"Mix" operator does not wait for upstream processes to finish

"Mix" operator does not wait for upstream processes to finish - nextflow

I have several upstream processes, say A, B and C, doing similar tasks.
Downstream of that, I have one process X that needs to treat all outputs of the A, B and C in the same way.
I tried to use the "mix" operator to create a single channel from the output files of A, B and C like so :
process A {
output:
file outA
}
process B {
output:
file outB
}
process C {
output:
file outC
}
inX = outA.mix(outB,outC)
process X {
input:
file inX
"myscript.sh"
}
Process A often finishes before B and C, and somehow, process X does not wait for process B and C to finish, and only take the outputs of A as input.

The following snippet works nicely:
process A {
output:
file outA
"""
touch outA
"""
}
process B {
output:
file outB
"""
touch outB
"""
}
process C {
output:
file outC
"""
touch outC
"""
}
inX = outA.mix(outB,outC)
process X {
input:
file inX
"echo myscript.sh"
}
If you continue to experience the same problem feel free to open an issue including a reproducible test case.

Related

How do I process files with matching pattern in nextflow?

Suppose I have nextflow channels:
Channel.fromFilePairs( "test/read*_R{1,2}.fa" )
.set{ reads }
reads.view()
Channel.fromPath(['test/lib_R1.fa','test/lib_R2.fa'] )
.set{ libs }
libs.view()
Which results in:
// reads channel
[read_b, [<path>/test/read_b_R1.fa, <path>/test/read_b_R2.fa]]
[read_a, [<path>/test/read_a_R1.fa, <path>/test/read_a_R2.fa]]
// libs channel
<path>/test/lib_R1.fa
<path>/test/lib_R2.fa
How do I run a process foo that executes matching read-lib pair, where the same lib is used for all read pairs? So basically I want to execute foo 4 times:
foo(test/read_b_R1.fa, test/lib_R1.fa)
foo(test/read_b_R2.fa, test/lib_R2.fa)
foo(test/read_a_R1.fa, test/lib_R1.fa)
foo(test/read_a_R2.fa, test/lib_R2.fa)

If you want to use the same library for all read pairs, what you really want is a value channel which can be read an unlimited number of times without being consumed. Note that a value channel is implicitly created by a process when it's invoked with a simple value. This could indeed be a list of files, but it looks like what you want is just one of these to correspond to each of the R1 or R2 reads. I think the simplest solution here is to just include your process using an alias so that you can pass in the required channels/files without too much effort:
params.reads = 'test/read*_R{1,2}.fa'
include { foo as foo_r1 } from './modules/foo.nf'
include { foo as foo_r2 } from './modules/foo.nf'
workflow {
Channel
.fromFilePairs( params.reads )
.multiMap { sample, reads ->
def (r1, r2) = reads
read1:
tuple(sample, r1)
read2:
tuple(sample, r2)
}
.set { reads }
lib_r1 = file('test/lib_R1.fa')
lib_r2 = file('test/lib_R2.fa')
foo_r1(reads.read1, lib_r1)
foo_r2(reads.read2, lib_r2)
}
Contents of ./modules/foo.nf:
process foo {
debug true
input:
tuple val(sample), path(fasta)
path(lib)
"""
echo $sample, $fasta, $lib
"""
}
Results:
$ nextflow run main.nf
N E X T F L O W ~ version 22.10.0
Launching `main.nf` [confident_boyd] DSL2 - revision: 8c81e2d743
executor > local (6)
[a8/e8a752] process > foo_r1 (2) [100%] 3 of 3 ✔
[75/2b32f5] process > foo_r2 (3) [100%] 3 of 3 ✔
readC, readC_R2.fa, lib_R2.fa
readA, readA_R1.fa, lib_R1.fa
readC, readC_R1.fa, lib_R1.fa
readB, readB_R2.fa, lib_R2.fa
readA, readA_R2.fa, lib_R2.fa
readB, readB_R1.fa, lib_R1.fa

process FOO {
debug true
input:
tuple val(files), path(lib)
output:
stdout
script:
file_a = files[0]
file_b = files[1]
"""
echo $file_a with $lib
echo $file_b with $lib
"""
}
workflow {
Channel
.of(['read_b', [file('/test/read_b_R1.fa'), file('/test/read_b_R2.fa')]],
['read_a', [file('/test/read_a_R1.fa'), file('/test/read_a_R2.fa')]]
)
.set { reads }
Channel
.of(file('/test/lib_R1.fa'),
file('/test/lib_R2.fa')
)
.set { libs }
reads
.map { sample, files -> files }
.flatten()
.map { file -> [file.name.split('_')[2].split('.fa')[0], file]}
.groupTuple()
.set { reads }
libs
.map { file -> [file.name.split('_')[1].split('.fa')[0], file]}
.set { libs }
reads
.join(libs)
.map { Rx, path, lib -> [path, lib] }
| FOO
}
The output of the script above is:
N E X T F L O W ~ version 22.10.4
Launching `ex.nf` [elegant_wiles] DSL2 - revision: 00862286fd
executor > local (2)
[58/9b3cf1] process > FOO (2) [100%] 2 of 2 ✔
/test/read_b_R1.fa with lib_R1.fa
/test/read_a_R1.fa with lib_R1.fa
/test/read_b_R2.fa with lib_R2.fa
/test/read_a_R2.fa with lib_R2.fa
EDIT as a reply to the comment below.
If you want the process to run once per element in the channel, check the modified version below:
process FOO {
debug true
input:
tuple val(file), path(lib)
output:
stdout
script:
"""
echo $file with $lib
"""
}
workflow {
Channel
.of(['read_b', [file('/test/read_b_R1.fa'), file('/test/read_b_R2.fa')]],
['read_a', [file('/test/read_a_R1.fa'), file('/test/read_a_R2.fa')]]
)
.set { reads }
Channel
.of(file('/test/lib_R1.fa'),
file('/test/lib_R2.fa')
)
.set { libs }
reads
.map { sample, files -> files }
.flatten()
.map { file -> [file.name.split('_')[2].split('.fa')[0], file]}
.groupTuple()
.set { reads }
libs
.map { file -> [file.name.split('_')[1].split('.fa')[0], file]}
.set { libs }
reads
.join(libs)
.map { Rx, path, lib -> [path, lib] }
.map { x, y -> [[x[0], y], [x[1], y]] }
.flatMap()
| FOO
}
Output:
N E X T F L O W ~ version 22.10.4
Launching `ex.nf` [sharp_ekeblad] DSL2 - revision: 1412af632e
executor > local (4)
[a0/416f59] process > FOO (1) [100%] 4 of 4 ✔
/test/read_b_R2.fa with lib_R2.fa
/test/read_a_R2.fa with lib_R2.fa
/test/read_a_R1.fa with lib_R1.fa
/test/read_b_R1.fa with lib_R1.fa

Lock between N Processes in Promela

I am trying to model one of my project in promela for model checking. In that, i have N no of nodes in network. So, for each node I am making a process. Something like this:
init {
byte proc;
atomic {
proc = 0;
do
:: proc < N ->
run node (q[proc],proc);
proc++
:: proc >= N ->
break
od
}
}
So, basically, here each 'node' is process that will simulate each node in my network. Now, Node Process has 3 threads which run parallelly in my original implementation and within these three threads i have lock at some part so that three threads don't access Critical Section at the same time. So, for this in promela, i have done something like this:
proctype node (chan inp;byte ppid)
{
run recv_A()
run send_B()
run do_C()
}
So here recv_A, send_B and do_C are the three threads running parallelly at each node in the network. Now, the problem is, if i put lock in recv_A, send_B, do_C using atomic then it will put lock lock over all 3*N processes whereas i want a lock such that the lock is applied over groups of three. That is, if process1's(main node process from which recv_A is made to run) recv_A is in its CS then only process1's send_B and do_C should be prohibited to enter into CS and not process2's recv_A, send_B, do_C. Is there a way to do this?

Your have several options, and all revolve around implementing some kind of mutual exclusion algorithm among N processes:
Peterson Algorithm
Eisenberg & McGuire Algorithm
Lamport's bakery Algorithm
Szymański's Algorithm
...
An implementation of the Black & White Bakery Algorithm is available here. Note, however, that these algorithms -maybe with the exception of Peterson's one- tend to be complicated and might make the verification of your system impractical.
A somewhat simple approach is to resort on the Test & Set Algorithm which, however, still uses atomic in the trying section. Here is an example implementation taken from here.
bool lock = false;
int counter = 0;
active [3] proctype mutex()
{
bool tmp = false;
trying:
do
:: atomic {
tmp = lock;
lock = true;
} ->
if
:: tmp;
:: else -> break;
fi;
od;
critical:
printf("Process %d entered critical section.\n", _pid);
counter++;
assert(counter == 1);
counter--;
exit:
lock = false;
printf("Process %d exited critical section.\n", _pid);
goto trying;
}
#define c0 (mutex[0]#critical)
#define c1 (mutex[1]#critical)
#define c2 (mutex[2]#critical)
#define t0 (mutex[0]#trying)
#define t1 (mutex[1]#trying)
#define t2 (mutex[2]#trying)
#define l0 (_last == 0)
#define l1 (_last == 1)
#define l2 (_last == 2)
#define f0 ([] <> l0)
#define f1 ([] <> l1)
#define f2 ([] <> l2)
ltl p1 { [] !(c0 && c1) && !(c0 && c2) && !(c1 && c2)}
ltl p2 { []((t0 || t1 || t2) -> <> (c0 || c1 || c2)) }
ltl p3 {
(f0 -> [](t0 -> <> c0))
&&
(f1 -> [](t1 -> <> c1))
&&
(f2 -> [](t2 -> <> c2))
};
In your code, you should use a different lock variable for every group of 3 related threads. The lock contention would still happen at a global level, but some process working inside the critical section would not cause other processes to wait other than those who belong to the same thread group.
Another idea is that to exploit channels to achieve mutual exclusion: have each group of threads share a common asynchronous channel which initially contains one token message. Whenever one of these threads wants to access the critical section, it reads from the channel. If the token is not inside the channel, it waits until it becomes available. Otherwise, it can go forward in the critical section and when it finishes it puts the token back inside the shared channel.
proctype node (chan inp; byte ppid)
{
chan lock = [1] of { bool };
lock!true;
run recv_A(lock);
run send_B(lock);
run do_C(lock);
};
proctype recv_A(chan lock)
{
bool token;
do
:: true ->
// non-critical section code
// ...
// acquire lock
lock?token ->
// critical section
// ...
// release lock
lock!token
// non-critical section code
// ...
od;
};
...
This approach might be the simplest to start with, so I would pick this one first. Note however that I have no idea on how that affects performance during verification time, and this might very well depend on how channels are internally handled by Spin. A complete code example of this solution can be found here in the file channel_mutex.pml.
To conclude, note that you might want to add mutual exclusion, progress and lockout-freedom LTL properties to your model to ensure that it behaves correctly. An example of the definition of these properties is available here and a code example is available here.

AWK: Parsing a directed graph / Extracting records from a file based on fields matching parts of an input file

Problem
So I have an input file with three fields. It is basically a list describing a type of directed graph. The first field is the starting node, second connection type (as in this case there is more than one), and last is the node projected onto.
The problem is, this is a very large and unruly direct graph, and I am only interested in some paths. So I want to provide an input file which has a name of nodes that I care about. If the nodes are mentioned in either the first or third field of the graph file, then I want that entire record (as the path type might vary).
Question
How to extract only certain records of a directed graph?
Bonus, how to extract only those paths which join nodes of interest by at most one neighbor (i.e. nodes of interest can be second nearest neighbors).
Request
I am trying to improve my AWK programming, which is why 1) I want to do this in AWK and 2) I would greatly appreciate a verbose explanation of the code :)
Example of the Problem
Input file:
A
C
D
File to parse:
A -> B
A -> C
A -> D
B -> A
B -> D
C -> E
D -> F
E -> B
E -> F
F -> C
...
Output:
A -> B
A -> C
A -> D
B -> A
B -> D
C -> E
D -> F
F -> C
Bonus Example:
A -> B -> D -> F -> C

If I understand your problem correctly, then this will do:
awk 'NR==FNR { data[$1] = 1; next } $1 in data || $3 in data { print }' graph[12]
How it works: while reading the first file, add all interesting nodes to data. While reading the second file, print only the lines where field one or field three is in data, i.e. is an interesting node.

Going for the bonus:
function left(str) { # returns the leftmost char of a given edge (A -> B)
return substr(str,1,1)
}
function right(str) { # returns the rightmost...
return substr(str,length(str),1)
}
function cmp_str_ind(i1, v1, i2, v2) # array travese order function
{ # this forces the start from the node in the beginning of input file
if(left(i1)==left(a)&&left(i2)!=left(a)) # or leftmost value in a
return -1
else if(left(i2)==left(a)&&left(i1)!=left(a))
return 1
else if(i1 < i2)
return -1
return (i1 != i2)
}
function trav(a,b,c,d) { # goes thru edges in AWK order
# print "A:"a," C:"c," D:"d
if(index(d,c)||index(d,right(c))) {
return ""
}
d=d", "c # c holds the current edge being examined
if(index(a,right(c))) { # these edges affect a
# print "3:"c
sub(right(c),"",a)
if(a=="") { # when a is empty, path is found
print d # d has the traversed path
exit
}
for (i in b) {
if(left(i)==right(c)) # only try the ones that can be added to the end
trav(a,b,i,d)
}
a=a""right(c)
} else {
# print "4:"c
for (i in b)
if(left(i)==right(c))
trav(a,b,i,d)
}
}
BEGIN { # playing with the traverse order
PROCINFO["sorted_in"]="cmp_str_ind"
}
NR==FNR {
a=a""$0 # a has the input (ADC)
next
}
{
b[$0]=$0 # b has the edges
}
END { # after reading in the data, recursively test every path
for(i in b) # candidate pruning the unfit ones first. CLR or Dijkstra
if(index(a,left(i))) { # were not consulted on that logic.
# print "3: "i
sub(left(i),"",a)
trav(a,b,i,left(i))
a=a""left(i)
}
else {
# print "2: "i
trav(a,b,i,left(i))
}
}
$ awk -f graph.awk input parse
A, A -> D, D -> F, F -> C
If you uncomment the BEGIN part, you get the A, A -> B, B -> D, D -> F, F -> C. I know, I should work on it more and comment it better but it's midnight over here. Maybe tomorrow.

How to receive message from 'any' channel in PROMELA/SPIN

I'm modeling an algorithm in Spin.
I have a process that has several channels and at some point, I know a message is going to come but don't know from which channel. So want to wait (block) the process until it a message comes from any of the channels. how can I do that?

I think you need Promela's if construct (see http://spinroot.com/spin/Man/if.html).
In the process you're referring to, you probably need the following:
byte var;
if
:: ch1?var -> skip
:: ch2?var -> skip
:: ch3?var -> skip
fi
If none of the channels have anything on them, then "the selection construct as a whole blocks" (quoting the manual), which is exactly the behaviour you want.
To quote the relevant part of the manual more fully:
"An option [each of the :: lines] can be selected for execution only when its guard statement is executable [the guard statement is the part before the ->]. If more than one guard statement is executable, one of them will be selected non-deterministically. If none of the guards are executable, the selection construct as a whole blocks."
By the way, I haven't syntax checked or simulated the above in Spin. Hopefully it's right. I'm quite new to Promela and Spin myself.

If you want to have your number of channels variable without having to change the implementation of the send and receive parts, you might use the approach of the following producer-consumer example:
#define NUMCHAN 4
chan channels[NUMCHAN];
init {
chan ch1 = [1] of { byte };
chan ch2 = [1] of { byte };
chan ch3 = [1] of { byte };
chan ch4 = [1] of { byte };
channels[0] = ch1;
channels[1] = ch2;
channels[2] = ch3;
channels[3] = ch4;
// Add further channels above, in
// accordance with NUMCHAN
// First let the producer write
// something, then start the consumer
run producer();
atomic { _nr_pr == 1 ->
run consumer();
}
}
proctype consumer() {
byte var, i;
chan theChan;
i = 0;
do
:: i == NUMCHAN -> break
:: else ->
theChan = channels[i];
if
:: skip // non-deterministic skip
:: nempty(theChan) ->
theChan ? var;
printf("Read value %d from channel %d\n", var, i+1)
fi;
i++
od
}
proctype producer() {
byte var, i;
chan theChan;
i = 0;
do
:: i == NUMCHAN -> break
:: else ->
theChan = channels[i];
if
:: skip;
:: theChan ! 1;
printf("Write value 1 to channel %d\n", i+1)
fi;
i++
od
}
The do loop in the consumer process non-deterministically chooses an index between 0 and NUMCHAN-1 and reads from the respective channel, if there is something to read, else this channel is always skipped. Naturally, during a simulation with Spin the probability to read from channel NUMCHAN is much smaller than that of channel 0, but this does not make any difference in model checking, where any possible path is explored.

Problem with predicate in Alloy

So I have the following bit of code in Alloy:
sig Node { }
sig Queue { root : Node }
pred SomePred {
no q, q' : Queue | q.root = q'.root
}
run SomePred for 3
but this won't yield any instance containing a Queue, I wonder why. It only shows up instances with Nodes. I've tried the equivalent predicate
pred SomePred' {
all q, q' : Queue | q.root != q'.root
}
but the output is the same.
Am I missing something?

There's a logic flaw in there:
fact SomeFact {
no q, q' : Queue | q.root = q'.root
}
Assume there is an instance with a single queue Q that has a given root R. When running SomeFact, it'd test the only queue available, Q, and it'd find it that Q.root = Q.root, thus, excluding the given instance from coming to life.
The same reasoning can be made for instances with an arbitrary number of queues.
Here is a working version:
sig Node {
}
sig Queue {
root : Node
}
fact sss {
all disj q, q' : Queue | q.root != q'.root
}
pred abc() {
}
run abc for 3

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

"Mix" operator does not wait for upstream processes to finish - nextflow

Related

How do I process files with matching pattern in nextflow?

Lock between N Processes in Promela

AWK: Parsing a directed graph / Extracting records from a file based on fields matching parts of an input file

How to receive message from 'any' channel in PROMELA/SPIN

Problem with predicate in Alloy

Categories

Resources