Nextflow: limit the number of concurrent workers in Google Life Sciences

I was wondering if there is a way of limiting the number of concurrently running workers when using the Google Life Sciences executor.
I was using the following in my nextflow.config file, but apparently it did not matter.
executor {
    queueSize = 100
}
There were more than 100 workers running concurrently (the number in braces next to the process name showed 108).

Setting executor.queueSize = 100 as you have done should indeed limit the number of parallel tasks the executor will handle. However, I think the problem you're hitting is actually due to this:

    The channel guarantees that items are delivered in the same order as they have been sent - but - since the process is executed in a parallel manner, there is no guarantee that they are processed in the same order as they are received.

This is true even when queueSize is set to 1. The results might be unexpected, but we can test this pretty easily:
Contents of nextflow.config:
executor {
    queueSize = 1
}
Contents of main.nf:
nextflow.enable.dsl=2

process test {

    tag { "myval: ${myval}" }

    input:
    val myval

    """
    sleep 1
    """
}

workflow {
    myvals = Channel.of( 'A'..'Z' )
    test( myvals )
}
Run with:
nextflow run -ansi-log false main.nf
Results:
N E X T F L O W ~ version 21.04.3
Launching `main.nf` [ecstatic_visvesvaraya] - revision: d34a24fe4a
[b9/21ef9b] Submitted process > test (myval: B)
[39/783aaf] Submitted process > test (myval: H)
[b5/aeae8b] Submitted process > test (myval: A)
[7a/e36e72] Submitted process > test (myval: D)
[de/25f001] Submitted process > test (myval: I)
[72/f913f2] Submitted process > test (myval: C)
[43/a2c78e] Submitted process > test (myval: L)
[5b/7c0434] Submitted process > test (myval: F)
[25/884e7c] Submitted process > test (myval: E)
[16/4c3b41] Submitted process > test (myval: G)
[62/c7bee1] Submitted process > test (myval: Q)
[71/cdbd37] Submitted process > test (myval: J)
[e6/634461] Submitted process > test (myval: N)
[37/03dd88] Submitted process > test (myval: S)
[fd/70867c] Submitted process > test (myval: K)
[cf/fb7e83] Submitted process > test (myval: T)
[56/3d6d41] Submitted process > test (myval: M)
[1e/81ad89] Submitted process > test (myval: O)
[db/66a292] Submitted process > test (myval: R)
[d5/212940] Submitted process > test (myval: Z)
[a8/1a33ab] Submitted process > test (myval: P)
[7e/60daa7] Submitted process > test (myval: U)
[d5/4d19c4] Submitted process > test (myval: V)
[13/8404ff] Submitted process > test (myval: W)
[22/adb044] Submitted process > test (myval: X)
[65/21a22b] Submitted process > test (myval: Y)
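As an aside (my addition, not part of the original answer): if the goal is to cap concurrency for one particular process rather than for the whole executor, the maxForks directive is the usual tool. A minimal sketch:

process test {

    maxForks 10    // no more than 10 concurrent task instances of this process

    input:
    val myval

    """
    sleep 1
    """
}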

Related

"Mix" operator does not wait for upstream processes to finish

I have several upstream processes, say A, B and C, doing similar tasks.
Downstream of those, I have one process X that needs to treat all outputs of A, B and C in the same way.
I tried to use the "mix" operator to create a single channel from the output files of A, B and C, like so:
process A {
    output:
    file outA
}

process B {
    output:
    file outB
}

process C {
    output:
    file outC
}

inX = outA.mix(outB,outC)

process X {
    input:
    file inX

    "myscript.sh"
}
Process A often finishes before B and C, and somehow process X does not wait for processes B and C to finish; it only takes the outputs of A as input.
The following snippet works nicely:
process A {
    output:
    file outA

    """
    touch outA
    """
}

process B {
    output:
    file outB

    """
    touch outB
    """
}

process C {
    output:
    file outC

    """
    touch outC
    """
}

inX = outA.mix(outB,outC)

process X {
    input:
    file inX

    "echo myscript.sh"
}
If you continue to experience the same problem feel free to open an issue including a reproducible test case.
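For what it's worth, here is a rough DSL2 equivalent of the same pattern (my sketch, not part of the original answer; it assumes current DSL2 syntax):

nextflow.enable.dsl=2

process A {
    output:
    path 'outA'

    """
    touch outA
    """
}

process B {
    output:
    path 'outB'

    """
    touch outB
    """
}

process C {
    output:
    path 'outC'

    """
    touch outC
    """
}

process X {
    input:
    path inX

    """
    echo myscript.sh
    """
}

workflow {
    A(); B(); C()
    X( A.out.mix( B.out, C.out ) )
}

Here X runs once per file emitted by the mixed channel; if X should instead run once over all the files together, apply collect() to the mixed channel first.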

Lock between N Processes in Promela

I am trying to model one of my projects in Promela for model checking. In it, I have N nodes in a network, so for each node I am creating a process. Something like this:
init {
    byte proc;
    atomic {
        proc = 0;
        do
        :: proc < N ->
            run node (q[proc],proc);
            proc++
        :: proc >= N ->
            break
        od
    }
}
So basically, each 'node' is a process that simulates one node in my network. Now, the node process has 3 threads which run in parallel in my original implementation, and within these three threads I have a lock around some sections so that the three threads don't access the critical section at the same time. For this, in Promela, I have done something like this:
proctype node (chan inp; byte ppid)
{
    run recv_A();
    run send_B();
    run do_C()
}
So here recv_A, send_B and do_C are the three threads running in parallel at each node in the network. Now, the problem is: if I implement the lock in recv_A, send_B and do_C using atomic, it will apply across all 3*N processes, whereas I want a lock that applies to each group of three. That is, if process1's recv_A (spawned from the main node process) is in its critical section, then only process1's send_B and do_C should be barred from entering their critical sections, not process2's recv_A, send_B and do_C. Is there a way to do this?
You have several options, all of which revolve around implementing some kind of mutual exclusion algorithm among N processes:
Peterson Algorithm
Eisenberg & McGuire Algorithm
Lamport's bakery Algorithm
Szymański's Algorithm
...
An implementation of the Black & White Bakery Algorithm is available here. Note, however, that these algorithms (maybe with the exception of Peterson's) tend to be complicated and might make the verification of your system impractical.
A somewhat simpler approach is to resort to the Test & Set Algorithm which, however, still uses atomic in the trying section. Here is an example implementation taken from here.
bool lock = false;
int counter = 0;

active [3] proctype mutex()
{
    bool tmp = false;
trying:
    do
    :: atomic {
           tmp = lock;
           lock = true;
       } ->
       if
       :: tmp;
       :: else -> break;
       fi;
    od;
critical:
    printf("Process %d entered critical section.\n", _pid);
    counter++;
    assert(counter == 1);
    counter--;
exit:
    lock = false;
    printf("Process %d exited critical section.\n", _pid);
    goto trying;
}
#define c0 (mutex[0]@critical)
#define c1 (mutex[1]@critical)
#define c2 (mutex[2]@critical)

#define t0 (mutex[0]@trying)
#define t1 (mutex[1]@trying)
#define t2 (mutex[2]@trying)

#define l0 (_last == 0)
#define l1 (_last == 1)
#define l2 (_last == 2)

#define f0 ([] <> l0)
#define f1 ([] <> l1)
#define f2 ([] <> l2)
ltl p1 { [] !(c0 && c1) && !(c0 && c2) && !(c1 && c2)}
ltl p2 { []((t0 || t1 || t2) -> <> (c0 || c1 || c2)) }
ltl p3 {
    (f0 -> [](t0 -> <> c0))
    &&
    (f1 -> [](t1 -> <> c1))
    &&
    (f2 -> [](t2 -> <> c2))
};
In your code, you should use a different lock variable for every group of 3 related threads. Lock contention would still happen at a global level, but a process working inside its critical section would only cause the other processes in its own thread group to wait.
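A sketch of that per-group variant (my adaptation of the test-and-set example above, not taken from the linked code): index the lock by a group id and pass that id to every thread in the group.

#define N 2

bool lock[N];
byte counter[N];

proctype worker(byte gid)
{
    bool tmp;
trying:
    do
    :: atomic {
           tmp = lock[gid];
           lock[gid] = true;
       } ->
       if
       :: tmp;
       :: else -> break;
       fi;
    od;
critical:
    counter[gid]++;
    assert(counter[gid] == 1);  /* at most one thread per group in here */
    counter[gid]--;
    lock[gid] = false;
    goto trying
}

init {
    byte g = 0;
    atomic {
        do
        :: g < N ->
            /* three 'threads' per node share the same gid */
            run worker(g); run worker(g); run worker(g);
            g++
        :: else ->
            break
        od
    }
}

Threads in different groups can now be in their critical sections simultaneously, while threads sharing a gid still exclude each other.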
Another idea is to exploit channels to achieve mutual exclusion: have each group of threads share a common asynchronous channel which initially contains one token message. Whenever one of these threads wants to access the critical section, it reads from the channel. If the token is not in the channel, it waits until it becomes available. Otherwise, it can go forward into the critical section, and when it finishes it puts the token back into the shared channel.
proctype node (chan inp; byte ppid)
{
    chan lock = [1] of { bool };
    lock!true;

    run recv_A(lock);
    run send_B(lock);
    run do_C(lock);
};

proctype recv_A(chan lock)
{
    bool token;
    do
    :: true ->
        // non-critical section code
        // ...

        // acquire lock
        lock?token ->

        // critical section
        // ...

        // release lock
        lock!token

        // non-critical section code
        // ...
    od;
};
...
This approach might be the simplest to start with, so I would pick it first. Note, however, that I have no idea how it affects performance at verification time; this might very well depend on how channels are handled internally by Spin. A complete code example of this solution can be found here in the file channel_mutex.pml.
To conclude, note that you might want to add mutual exclusion, progress and lockout-freedom LTL properties to your model to ensure that it behaves correctly. An example of the definition of these properties is available here and a code example is available here.

Waiting for executor to finish while tasks can submit more tasks

I'm submitting tasks to a ThreadPoolExecutor, and those tasks can in some cases submit more tasks. I want to wait until all tasks have finished. I have a solution using a condition over the internal queue size of the executor, and although it works, I feel it is not the best approach.
Here is an example for what I'm doing, I reduced it to a very simple case:
from concurrent.futures.thread import ThreadPoolExecutor
from threading import Condition

def func(e: ThreadPoolExecutor, c: Condition, val: int):
    print(val)
    if val < 5:
        # A case when I need to submit a new task
        e.submit(func, e, c, val + 1)
    with c:
        c.notify()

with ThreadPoolExecutor(max_workers=5) as e:
    c = Condition()
    for i in range(10):
        print(e._work_queue.qsize())
        e.submit(func, e, c, 1)
    with c:
        while not e._work_queue.qsize() == 0:
            c.wait_for(lambda: e._work_queue.qsize() == 0)
Is there a better and/or cleaner way to do that?
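One cleaner pattern (a sketch of my own, not from the thread) is to track outstanding tasks with an explicit counter instead of peeking at the executor's private _work_queue: increment before each submit, decrement in a done-callback, and wait on a condition until the count drains to zero. The TaskTracker class and its method names below are hypothetical:

from concurrent.futures import ThreadPoolExecutor
import threading

class TaskTracker:
    """Counts outstanding tasks so a caller can block until all are done."""
    def __init__(self, executor):
        self._executor = executor
        self._cond = threading.Condition()
        self._pending = 0

    def submit(self, fn, *args):
        # Increment before submitting so the count never falsely hits zero.
        with self._cond:
            self._pending += 1
        future = self._executor.submit(fn, *args)
        future.add_done_callback(self._task_done)
        return future

    def _task_done(self, _future):
        with self._cond:
            self._pending -= 1
            if self._pending == 0:
                self._cond.notify_all()

    def join(self):
        # Block until every submitted task (including children) has finished.
        with self._cond:
            self._cond.wait_for(lambda: self._pending == 0)

def func(tracker, val):
    print(val)
    if val < 5:
        tracker.submit(func, tracker, val + 1)  # tasks may submit more tasks

with ThreadPoolExecutor(max_workers=5) as e:
    tracker = TaskTracker(e)
    for _ in range(10):
        tracker.submit(func, tracker, 1)
    tracker.join()

Because a running task increments the counter for any child before it finishes itself, the count can only reach zero once the whole task graph has drained.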

Promela - non-determinism not non-deterministic?

Consider this snippet:
chan sel = [0] of {int};

active proctype Selector(){
    int not_me;
endselector:
    do
    :: sel ? not_me;
       if
       :: 0 != not_me -> sel ! 0;
       :: 1 != not_me -> sel ! 1;
       :: 2 != not_me -> sel ! 2;
       :: 3 != not_me -> sel ! 3;
       :: else -> -1;
       fi
    od
}

proctype H(){
    int i = -1;
    int count = 1000;
    do
    :: sel ! i; sel ? i; printf("currently selected: %d\n", i); count = count - 1;
    :: count < 0 -> break;
    od
    assert(false);
}

init{
    atomic{
        run H();
    }
}
You'd expect this to print the values 0..3 fairly arbitrarily until the counter falls below 0, at which point it can either print another number or terminate.
However, that doesn't seem to be the case.
The only values returned are 0, then 1, then 0, then 1, then 0, then 1, ...
Did I somehow misunderstand the "non-determinism" of the if/fi statements?
(Using iSpin on Ubuntu, if that matters.)
Relevant part of the language spec. Seems non-deterministic to me.
If you're looking at (a few) traces of the system only, then you're at the mercy of the (pseudo) random generator.
I thought the main purpose of SPIN is to prove properties. So, you could write a formula F that describes the trace(s) that you want, and then have SPIN check that "system and F" has a model.
If you are running Spin in 'simulation' mode, the options are visited deterministically, I believe. So in the Selector proctype, the simulation proceeds in the if by checking the options in order: 0 != not_me, then the 1, 2, 3 options. For your execution, you thus ping-pong between 0 and 1.
You can confirm this, by replacing your if statement with:
if
:: 0 != not_me -> sel ! 0;
:: 1 != not_me -> sel ! 1;
:: else -> assert(false)
fi
and your simulation will never reach the assert.
Spin can also be run in 'verification' mode - generate a pan executable and execute that. Then, all cases will be visited (modulo limits in memory and time). However, in 'verification' mode nothing is printed out - so you might be hard pressed to see the other cases!
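For reference, the usual commands look roughly like this (assuming the model is saved as model.pml; -nN sets the simulation seed, so different seeds expose different interleavings):

# random simulation; vary the seed to see other interleavings
spin -n123 model.pml

# exhaustive verification: generate, compile and run the pan verifier
spin -a model.pml
cc -o pan pan.c
./pan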

How do you print in a Go test using the "testing" package?

I'm running a test in Go with a statement to print something (i.e. for debugging of tests) but it's not printing anything.
func TestPrintSomething(t *testing.T) {
    fmt.Println("Say hi")
}
When I run go test on this file, this is the output:
ok command-line-arguments 0.004s
The only way to really get it to print, as far as I know, is to print it via t.Error(), like so:
func TestPrintSomethingAgain(t *testing.T) {
    t.Error("Say hi")
}
Which outputs this:
Say hi
--- FAIL: TestPrintSomethingAgain (0.00 seconds)
foo_test.go:35: Say hi
FAIL
FAIL command-line-arguments 0.003s
gom: exit status 1
I've Googled and looked through the manual but didn't find anything.
The structs testing.T and testing.B both have .Log and .Logf methods, which sound like what you are looking for. .Log and .Logf are similar to fmt.Print and fmt.Printf respectively.
See more details here: http://golang.org/pkg/testing/#pkg-index
fmt.X print statements do work inside tests, but you will find their output is probably not on screen where you expect to find it, which is why you should use the logging methods in testing.
If, as in your case, you want to see the logs for tests that are not failing, you have to provide go test the -v flag (v for verbosity). More details on testing flags can be found here: https://golang.org/cmd/go/#hdr-Testing_flags
For example,
package verbose

import (
    "fmt"
    "testing"
)

func TestPrintSomething(t *testing.T) {
    fmt.Println("Say hi")
    t.Log("Say bye")
}
go test -v
=== RUN TestPrintSomething
Say hi
--- PASS: TestPrintSomething (0.00 seconds)
v_test.go:10: Say bye
PASS
ok so/v 0.002s
Command go
Description of testing flags
-v
    Verbose output: log all tests as they are run. Also print all
    text from Log and Logf calls even if the test succeeds.
Package testing
func (*T) Log
func (c *T) Log(args ...interface{})
Log formats its arguments using default formatting, analogous to Println, and records the text in the error log. For tests, the text will be printed only if the test fails or the -test.v flag is set. For benchmarks, the text is always printed to avoid having performance depend on the value of the -test.v flag.
t.Log() will not show up until after the test is complete, so if you're trying to debug a test that is hanging or performing badly it seems you need to use fmt.
Yes: that was the case up to and including Go 1.13 (August 2019).
The problem was tracked in golang.org issue 24929:
Consider the following (silly) automated tests:
func TestFoo(t *testing.T) {
    t.Parallel()
    for i := 0; i < 15; i++ {
        t.Logf("%d", i)
        time.Sleep(3 * time.Second)
    }
}

func TestBar(t *testing.T) {
    t.Parallel()
    for i := 0; i < 15; i++ {
        t.Logf("%d", i)
        time.Sleep(2 * time.Second)
    }
}

func TestBaz(t *testing.T) {
    t.Parallel()
    for i := 0; i < 15; i++ {
        t.Logf("%d", i)
        time.Sleep(1 * time.Second)
    }
}
If I run go test -v, I get no log output until all of TestFoo is done, then no output until all of TestBar is done, and again no more output until all of TestBaz is done.
This is fine if the tests are working, but if there is some sort of bug, there are a few cases where buffering log output is problematic:
When iterating locally, I want to be able to make a change, run my tests, see what's happening in the logs immediately to understand what's going on, hit CTRL+C to shut the test down early if necessary, make another change, re-run the tests, and so on.
If TestFoo is slow (e.g., it's an integration test), I get no log output until the very end of the test. This significantly slows down iteration.
If TestFoo has a bug that causes it to hang and never complete, I'd get no log output whatsoever. In these cases, t.Log and t.Logf are of no use at all.
This makes debugging very difficult.
Moreover, not only do I get no log output, but if the test hangs too long, either the Go test timeout kills the test after 10 minutes, or if I increase that timeout, many CI servers will also kill off tests if there is no log output after a certain amount of time (e.g., 10 minutes in CircleCI).
So now my tests are killed and I have nothing in the logs to tell me what happened.
But this changes with (possibly) Go 1.14 (Q1 2020): CL 127120
testing: stream log output in verbose mode
The output now is:
=== RUN TestFoo
=== PAUSE TestFoo
=== RUN TestBar
=== PAUSE TestBar
=== RUN TestBaz
=== PAUSE TestBaz
=== CONT TestFoo
=== CONT TestBaz
main_test.go:30: 0
=== CONT TestFoo
main_test.go:12: 0
=== CONT TestBar
main_test.go:21: 0
=== CONT TestBaz
main_test.go:30: 1
main_test.go:30: 2
=== CONT TestBar
main_test.go:21: 1
=== CONT TestFoo
main_test.go:12: 1
=== CONT TestBaz
main_test.go:30: 3
main_test.go:30: 4
=== CONT TestBar
main_test.go:21: 2
=== CONT TestBaz
main_test.go:30: 5
=== CONT TestFoo
main_test.go:12: 2
=== CONT TestBar
main_test.go:21: 3
=== CONT TestBaz
main_test.go:30: 6
main_test.go:30: 7
=== CONT TestBar
main_test.go:21: 4
=== CONT TestBaz
main_test.go:30: 8
=== CONT TestFoo
main_test.go:12: 3
=== CONT TestBaz
main_test.go:30: 9
=== CONT TestBar
main_test.go:21: 5
=== CONT TestBaz
main_test.go:30: 10
main_test.go:30: 11
=== CONT TestFoo
main_test.go:12: 4
=== CONT TestBar
main_test.go:21: 6
=== CONT TestBaz
main_test.go:30: 12
main_test.go:30: 13
=== CONT TestBar
main_test.go:21: 7
=== CONT TestBaz
main_test.go:30: 14
=== CONT TestFoo
main_test.go:12: 5
--- PASS: TestBaz (15.01s)
=== CONT TestBar
main_test.go:21: 8
=== CONT TestFoo
main_test.go:12: 6
=== CONT TestBar
main_test.go:21: 9
main_test.go:21: 10
=== CONT TestFoo
main_test.go:12: 7
=== CONT TestBar
main_test.go:21: 11
=== CONT TestFoo
main_test.go:12: 8
=== CONT TestBar
main_test.go:21: 12
main_test.go:21: 13
=== CONT TestFoo
main_test.go:12: 9
=== CONT TestBar
main_test.go:21: 14
=== CONT TestFoo
main_test.go:12: 10
--- PASS: TestBar (30.01s)
=== CONT TestFoo
main_test.go:12: 11
main_test.go:12: 12
main_test.go:12: 13
main_test.go:12: 14
--- PASS: TestFoo (45.02s)
PASS
ok command-line-arguments 45.022s
It is indeed in Go 1.14, as Dave Cheney attests in "go test -v streaming output":
In Go 1.14, go test -v will stream t.Log output as it happens, rather than hoarding it til the end of the test run.
Under Go 1.14 the fmt.Println and t.Log lines are interleaved, rather than waiting for the test to complete, demonstrating that test output is streamed when go test -v is used.
Advantage, according to Dave:
This is a great quality of life improvement for integration style tests that often retry for long periods when the test is failing.
Streaming t.Log output will help Gophers debug those test failures without having to wait until the entire test times out to receive their output.
For testing sometimes I do
fmt.Fprintln(os.Stdout, "hello")
Also, you can print to:
fmt.Fprintln(os.Stderr, "hello")
t.Log and t.Logf do print out in your test, but the output can often be missed as it appears on the same line as your test. What I do is log them in a way that makes them stand out, i.e.
t.Run("FindIntercomUserAndReturnID should find an intercom user", func(t *testing.T) {
id, err := ic.FindIntercomUserAndReturnID("test3#test.com")
assert.Nil(t, err)
assert.NotNil(t, id)
t.Logf("\n\nid: %v\n\n", *id)
})
which prints it to the terminal as,
=== RUN TestIntercom
=== RUN TestIntercom/FindIntercomUserAndReturnID_should_find_an_intercom_user
TestIntercom/FindIntercomUserAndReturnID_should_find_an_intercom_user: intercom_test.go:34:
id: 5ea8caed05a4862c0d712008
--- PASS: TestIntercom (1.45s)
--- PASS: TestIntercom/FindIntercomUserAndReturnID_should_find_an_intercom_user (1.45s)
PASS
ok github.com/RuNpiXelruN/third-party-delete-service 1.470s
WARNING: The answers here do not apply when testing multiple packages at once.
The answers from @VonC and @voidlogic are fantastic, but I wanted to bring the following thread to attention in case someone is running a permutation of go test -v ./...: https://github.com/golang/go/issues/46959
The issue lies in implementation nuances/difficulties related to running tests from multiple packages.
For example, running go test -v -count=1 -run TestOnlyOneInstanceOfThisTestExists ./multiple/packages/exist/below/... will only print logs after the test completes.
However, running go test -v -count=1 -run TestOnlyOneInstanceOfThisTestExists ./this/path/points/to/one/package/only/... will stream the output as expected.
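One workaround (my suggestion, not from the issue) is to invoke go test once per package, so each run covers a single package and keeps streaming:

# run each package separately so -v output streams
for pkg in $(go list ./...); do
    go test -v -count=1 "$pkg"
done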
In case you're using testing.M and its associated setup/teardown, the -v flag is valid here as well.
package g

import (
    "fmt"
    "os"
    "testing"
)

func TestSomething(t *testing.T) {
    t.Skip("later")
}

func setup() {
    fmt.Println("setting up")
}

func teardown() {
    fmt.Println("tearing down")
}

func TestMain(m *testing.M) {
    setup()
    result := m.Run()
    teardown()
    os.Exit(result)
}
$ go test -v g_test.go
setting up
=== RUN TestSomething
g_test.go:10: later
--- SKIP: TestSomething (0.00s)
PASS
tearing down
ok command-line-arguments 0.002s
The *_test.go file is a Go source file like any other; you can initialize a new logger every time if you need to dump a complex data structure. Here is an example:
// initZapLog is delegated to initialize a new 'log manager'
func initZapLog() *zap.Logger {
    config := zap.NewDevelopmentConfig()
    config.EncoderConfig.EncodeLevel = zapcore.CapitalColorLevelEncoder
    config.EncoderConfig.TimeKey = "timestamp"
    config.EncoderConfig.EncodeTime = zapcore.ISO8601TimeEncoder
    logger, _ := config.Build()
    return logger
}
Then, every time, in every test:
func TestCreateDB(t *testing.T) {
    loggerMgr := initZapLog()
    // Make logger available everywhere
    zap.ReplaceGlobals(loggerMgr)
    defer loggerMgr.Sync() // flushes buffer, if any
    logger := loggerMgr.Sugar()
    logger.Debug("START")
    conf := initConf() // initConf() is the author's own configuration helper
    _ = conf           // silence the unused-variable error while the test body is commented out
    /* Your test here
    if false {
        t.Fail()
    }*/
}