Waiting for an executor to finish while tasks can submit more tasks - python-multithreading

I'm submitting tasks to a ThreadPoolExecutor, and those tasks can in some cases submit more tasks. I want to wait until all tasks are finished. I have a solution using a condition over the internal queue size of the executor, and though it works, I feel it is not the best approach.
Here is an example of what I'm doing, reduced to a very simple case:
from concurrent.futures.thread import ThreadPoolExecutor
from threading import Condition

def func(e: ThreadPoolExecutor, c: Condition, val: int):
    print(val)
    if val < 5:
        # A case when I need to submit a new task
        e.submit(func, e, c, val + 1)
    with c:
        c.notify()

with ThreadPoolExecutor(max_workers=5) as e:
    c = Condition()
    for i in range(10):
        print(e._work_queue.qsize())
        e.submit(func, e, c, 1)
    with c:
        while e._work_queue.qsize() != 0:
            c.wait_for(lambda: e._work_queue.qsize() == 0)
Is there a better and/or cleaner way to do that?
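One cleaner alternative, sketched here under the assumption that you control every submit call: route all submissions through a small tracking helper and wait in rounds until no task has spawned new work. The `submit_tracked` helper and the rounds loop are my own naming, not part of concurrent.futures:

```python
from concurrent.futures import ThreadPoolExecutor, wait
from threading import Lock

futures = []   # every future ever submitted
results = []   # values produced by tasks (stands in for print)
lock = Lock()

def submit_tracked(executor, fn, *args):
    # Record each submission so the main thread can also wait on tasks
    # that were spawned by other tasks.
    with lock:
        futures.append(executor.submit(fn, *args))

def func(executor, val):
    with lock:
        results.append(val)
    if val < 5:
        # A case when a task needs to submit a new task.
        submit_tracked(executor, func, executor, val + 1)

with ThreadPoolExecutor(max_workers=5) as e:
    for _ in range(10):
        submit_tracked(e, func, e, 1)
    # Wait in rounds: a task registers its children before it finishes,
    # so once a round completes, any new work is already in `futures`.
    while True:
        with lock:
            batch = futures[:]
            futures.clear()
        if not batch:
            break
        wait(batch)

print(len(results))  # 10 root tasks, each producing the values 1..5
```

Compared to polling `_work_queue`, this avoids relying on a private attribute of the executor and doesn't need a Condition at all.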


No support for zipping multiple Flows in Kotlin?

I was really surprised, when we were trying to move away from RxJava to Kotlin Flows, to find that there isn't any facility to zip multiple flows.
val flow1 = (1..3).asFlow().onEach { delay(1000) }
val flow2 = (1..3).asFlow().onEach { delay(2000) }
val flow3 = (1..3).asFlow().onEach { delay(3000) }
I was looking for an Rx-styled Flow.zip(flow1, flow2, flow3), but I failed to find any such facility.
What is strange to me is that I didn't find many questions asking this, either here on Stack Overflow or in any Kotlin Flows tutorial.
This makes me think that I must be doing something wrong, and that there might be an alternative facility to zip multiple flows.
Any hints?
You can build your own on top of zip for two flows:

inline fun <A, B, C, D> Flow<A>.zip(
    flowB: Flow<B>,
    flowC: Flow<C>,
    crossinline f: (A, B, C) -> D
): Flow<D> =
    zip(flowB, ::Pair).zip(flowC) { (a, b), c -> f(a, b, c) }
Usage:
suspend fun main() {
    val one = flowOf(0, 1, 2, 3)
    val two = flowOf("a", "b", "c", "d")
    val three = flowOf(5.0, 6.0, 7.0)

    one.zip(two, three, ::Triple).collect(::print)
    // prints (0, a, 5.0)(1, b, 6.0)(2, c, 7.0)
}
Probably not as efficient as a direct implementation, but depending on your use case it might be good enough.
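For comparison, the same composition trick (zip into pairs first, then flatten) can be written in Python against plain iterables; `zip3` is a hypothetical helper name, not a library function:

```python
# Build a three-way zip out of the two-way one by zipping pairs first,
# then flattening, mirroring the Kotlin extension above.
def zip3(xs, ys, zs, f):
    # zip(xs, ys) yields (a, b) pairs; zipping those with zs gives
    # ((a, b), c), which is flattened through f.
    return [f(a, b, c) for (a, b), c in zip(zip(xs, ys), zs)]

triples = zip3([0, 1, 2, 3], "abcd", [5.0, 6.0, 7.0],
               lambda a, b, c: (a, b, c))
print(triples)  # the shortest input (length 3) bounds the result
```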

Kotlin SharedFlow combine operation. Have zip behaviour in a specific situation

I'm combining two SharedFlows and then performing a long working operation.
At the start I know the state, so I emit a "starting value" to both flows. After that, the user can emit to either flow.
The flows are mostly independent, but in one specific situation the user can emit to both flows at the same time. When that happens, combine is triggered twice and the long-running job is performed twice, when in fact I'm only interested in receiving both values and performing the job once.
Here is what I have:
val _numbers = MutableSharedFlow<Int>(replay = 0, extraBufferCapacity = 1, onBufferOverflow = BufferOverflow.DROP_OLDEST)
val numbers: SharedFlow<Int> = _numbers
val _strings = MutableSharedFlow<String>(replay = 0, extraBufferCapacity = 1, onBufferOverflow = BufferOverflow.DROP_OLDEST)
val strings: SharedFlow<String> = _strings

combine(numbers, strings) { number, string ->
    println("values $number - $string. Starting to perform a long working job")
}
    .launchIn(CoroutineScope(Dispatchers.IO))
runBlocking {
    delay(500)
    // These are the initial values. I always know these at start.
    _numbers.emit(0)
    _strings.emit("a")

    // Depending on user action, a number or a string is emitted.
    delay(100)
    _numbers.emit(1)
    delay(100)
    _numbers.emit(2)
    delay(100)
    _numbers.emit(3)
    delay(100)
    _numbers.emit(4)
    delay(100)
    _strings.emit("b")
    delay(100)
    _strings.emit("c")
    delay(100)
    _strings.emit("d")
    delay(100)
    _strings.emit("e")
    delay(100)

    // In a specific situation both values need to change, but I only
    // want to trigger the long working job once.
    _numbers.emit(10)
    _strings.emit("Z")
}
This can produce:
values 0 - a. Starting to perform a long working job
values 1 - a. Starting to perform a long working job
values 2 - a. Starting to perform a long working job
values 3 - a. Starting to perform a long working job
values 4 - a. Starting to perform a long working job
values 4 - b. Starting to perform a long working job
values 4 - c. Starting to perform a long working job
values 4 - d. Starting to perform a long working job
values 4 - e. Starting to perform a long working job
values 10 - e. Starting to perform a long working job
values 10 - Z. Starting to perform a long working job
Or this:
values 0 - a. Starting to perform a long working job
values 1 - a. Starting to perform a long working job
values 2 - a. Starting to perform a long working job
values 3 - a. Starting to perform a long working job
values 4 - a. Starting to perform a long working job
values 4 - b. Starting to perform a long working job
values 4 - c. Starting to perform a long working job
values 4 - d. Starting to perform a long working job
values 4 - e. Starting to perform a long working job
values 10 - Z. Starting to perform a long working job
Due to the buffer overflow, sometimes I get what I want (the latter output), but other times I get the values 10 - e. Starting to perform a long working job line that I'm not interested in.
Is there any way to enforce that, when emitting to both flows, the long-running work starts only once?
https://pl.kotl.in/JA1Wdhra9
If you want to keep 2 flows, the distinction between single and double events will have to be time-based. You won't be able to distinguish between a quick update of string-then-number from a "double-update".
If time-based is ok for you, using debounce before the long processing should be the way to go:
combine(numbers, strings) { number, string -> number to string }
    .debounce(50)
    .onEach { (number, string) ->
        println("values $number - $string. Starting to perform a long working job")
    }
    .launchIn(CoroutineScope(Dispatchers.IO))
Here, combine only builds pairs from the 2 flows but still gets all events, and then debounce ignores quick successions of events and only sends the latest of a quick series. This also introduces a slight delay, but it all depends on what you want to achieve.
If time-based distinction is not ok for you, you need a way for the producer to send double events in a way that is distinct from 2 single events. For this, you can use a single flow of events, and you can for instance define events like this:
sealed class Event {
    data class SingleNumberUpdate(val value: Int) : Event()
    data class SingleStringUpdate(val value: String) : Event()
    data class DoubleUpdate(val num: Int, val str: String) : Event()
}
But then you'll have to write the "combine" logic yourself (keeping the state of the latest number and string):
flow {
    var num = 0
    var str = "a"
    emit(num to str)
    events.collect { e ->
        when (e) {
            is Event.SingleNumberUpdate -> num = e.value
            is Event.SingleStringUpdate -> str = e.value
            is Event.DoubleUpdate -> {
                num = e.num
                str = e.str
            }
        }
        emit(num to str)
    }
}
    .onEach { (number, strings) ->
        println("values $number - $strings. Starting to perform a long working job")
    }
    .launchIn(CoroutineScope(Dispatchers.IO))

"Mix" operator does not wait for upstream processes to finish

I have several upstream processes, say A, B and C, doing similar tasks.
Downstream of that, I have one process X that needs to treat all outputs of A, B and C in the same way.
I tried to use the "mix" operator to create a single channel from the output files of A, B and C, like so:
process A {
    output:
    file outA
}

process B {
    output:
    file outB
}

process C {
    output:
    file outC
}

inX = outA.mix(outB, outC)

process X {
    input:
    file inX

    "myscript.sh"
}
Process A often finishes before B and C, and somehow process X does not wait for processes B and C to finish, taking only the outputs of A as input.
The following snippet works nicely:
process A {
    output:
    file outA

    """
    touch outA
    """
}

process B {
    output:
    file outB

    """
    touch outB
    """
}

process C {
    output:
    file outC

    """
    touch outC
    """
}

inX = outA.mix(outB, outC)

process X {
    input:
    file inX

    "echo myscript.sh"
}
If you continue to experience the same problem feel free to open an issue including a reproducible test case.

Alternative to nested if for conditional logic

Is there a design pattern, methodology, or language that allows you to write complex conditional logic beyond just nested ifs?
At the very least, does this kind of question or problem have a name? I was unable to find anything, here or through Google, that described what I was trying to solve that wasn't just "replace your if with a switch statement".
I'm playing around with a script to generate a bunch of data. As part of this, I'd like to add in a lot of branching conditional logic that should provide variety as well as block off certain combinations.
Something like, If User is part of group A, then they can't be part of group B, and if they have Attribute C, then that limits them to characteristic 5 or 6, but nothing below or above that.
The answer is simple: refactoring.
Let's take an example (pseudo-code):
if (a) {
    if (b) {
        if (c) {
            // do something
        }
    }
}
can be replaced by:
if (a && b && c) {
    // do something
}
Now, say that a, b and c are complex predicates which make the code hard to read, for example:
if (visitorIsInActiveTestCell(visitor) &&
    !specialOptOutConditionsApply(request, visitor) &&
    whatEverWeWantToCheckHere(bla, blabla)) {
    // do something
}
we can refactor it as well and create a new method:
def shouldDoSomething(request, visitor, bla, blabla) {
    return visitorIsInActiveTestCell(visitor) &&
        !specialOptOutConditionsApply(request, visitor) &&
        whatEverWeWantToCheckHere(bla, blabla)
}
and now our if condition isn't nested and becomes easier to read and understand:
if (shouldDoSomething(request, visitor, bla, blabla)) {
    // do something
}
Sometimes it's not straightforward to extract such logic and refactor, and it may require taking some time to think about it, but I haven't yet run into an example in which it was impossible.
All of the foregoing answers seem to miss the question.
One of the patterns that frequently occurs in hardware-interface looks like this:
if (something) {
    step1;
    if (the result of step1) {
        step2;
        if (the result of step2) {
            step3;
            // ... and so on
        }
    }
}
This structure cannot be collapsed into a logical conjunction, as each step is dependent on the result of the previous one, and may itself have internal conditions.
In assembly code, it would be a simple matter of test and branch to a common target; i.e., the dreaded "go to". In C, you end up with a pile of indented code that after about 8 levels is very difficult to read.
About the best that I've been able to come up with is:
while (true) {
    if (!something)
        break;
    step1;
    if (!(result of step1))
        break;
    step2;
    if (!(result of step2))
        break;
    step3;
    ...
    break;
}
Does anyone have a better solution?
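One common answer: move the ladder into its own function and use early returns (guard clauses), which keeps the flat shape without the dummy loop. A minimal Python sketch; the device and its step methods are hypothetical stand-ins for the hardware calls:

```python
# Guard-clause version of the step1/step2/step3 ladder above. The
# FakeDevice and its methods are made-up stand-ins for hardware calls.
def run_sequence(device):
    if not device.ready():
        return False
    if not device.step1():
        return False
    if not device.step2():
        return False
    device.step3()
    return True

class FakeDevice:
    def ready(self): return True
    def step1(self): return True
    def step2(self): return False  # simulate a failure at step 2
    def step3(self): raise AssertionError("unreachable: step2 failed")

print(run_sequence(FakeDevice()))  # prints False: the chain stops at step 2
```

Each `return False` plays the role of the `break`, and the function boundary gives the "common target" that assembly would reach with a branch.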
It is possible you want to replace your conditional logic with polymorphism, assuming you are using an object-oriented language.
That is, instead of:
class Bird:
    # ...
    def getSpeed(self):
        if self.type == EUROPEAN:
            return self.getBaseSpeed()
        elif self.type == AFRICAN:
            return self.getBaseSpeed() - self.getLoadFactor() * self.numberOfCoconuts
        elif self.type == NORWEGIAN_BLUE:
            return 0 if self.isNailed else self.getBaseSpeed(self.voltage)
        else:
            raise Exception("Should be unreachable")
You can say:
class Bird:
    # ...
    def getSpeed(self):
        pass

class European(Bird):
    def getSpeed(self):
        return self.getBaseSpeed()

class African(Bird):
    def getSpeed(self):
        return self.getBaseSpeed() - self.getLoadFactor() * self.numberOfCoconuts

class NorwegianBlue(Bird):
    def getSpeed(self):
        return 0 if self.isNailed else self.getBaseSpeed(self.voltage)

# Somewhere in client code
speed = bird.getSpeed()
Taken from here.
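To make the sketch above concrete, here is a self-contained runnable version; the base speed, load factor, and constructor attributes are invented numbers purely for illustration:

```python
# Runnable version of the dispatch-by-subclass idea. All numbers here
# are invented for illustration.
class Bird:
    def get_base_speed(self):
        return 10.0
    def get_speed(self):
        raise NotImplementedError

class European(Bird):
    def get_speed(self):
        return self.get_base_speed()

class African(Bird):
    def __init__(self, number_of_coconuts):
        self.number_of_coconuts = number_of_coconuts
    def get_load_factor(self):
        return 2.0
    def get_speed(self):
        return self.get_base_speed() - self.get_load_factor() * self.number_of_coconuts

class NorwegianBlue(Bird):
    def __init__(self, is_nailed):
        self.is_nailed = is_nailed
    def get_speed(self):
        return 0.0 if self.is_nailed else self.get_base_speed()

# Client code never branches on a type tag:
for bird in (European(), African(2), NorwegianBlue(True)):
    print(bird.get_speed())  # prints 10.0, 6.0, 0.0
```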

Akka - load balancing and full use of the processor

I wrote a matrix multiplication algorithm which uses parallel collections to speed up the multiplication.
It goes like that:
(0 until M1_ROWS).grouped(PARTITION_ROWS).toList.par.map( i =>
    singleThreadedMultiplicationFAST(i.toArray.map(m1(_)), m2)
).reduce(_ ++ _)
Now I would like to do the same in Akka, so what I did is:
val multiplyer = actorOf[Pool]
multiplyer start

val futures = (0 until M1_ROWS).grouped(PARTITION_ROWS).map( i =>
    multiplyer ? MultiplyMatrix(i.toArray.map(m1(_)), m2)
)
futures.map(_.get match { case res: Array[Array[Double]] => res }).reduce(_ ++ _)

class Multiplyer extends akka.actor.Actor {
    protected def receive = {
        case MultiplyMatrix(m1, m2) =>
            self reply singleThreadedMultiplicationFAST(m1, m2)
    }
}

class Pool extends Actor with DefaultActorPool
        with FixedCapacityStrategy with RoundRobinSelector {
    def receive = _route
    def partialFill = false
    def selectionCount = 1
    def instance = actorOf[Multiplyer]
    def limit = 32 // I tried 256 with no effect either
}
It turned out that the actor-based version of this algorithm is using only 200% on my i7 Sandy Bridge, while the parallel collections version is using 600% of the processor and is 4-5x faster.
I thought it might be the dispatcher and tried this:
self.dispatcher = Dispatchers.newThreadBasedDispatcher(self, mailboxCapacity = 100)
and this(I shared this one between actors):
val messageDispatcher = Dispatchers.newExecutorBasedEventDrivenDispatcher("d1")
    .withNewBoundedThreadPoolWithLinkedBlockingQueueWithUnboundedCapacity(100)
    .setCorePoolSize(16)
    .setMaxPoolSize(128)
    .setKeepAliveTimeInMillis(60000)
    .build
But I didn't observe any changes: still only 200% processor usage, and the algorithm is 4-5 times slower than the parallel collections version.
I am sure I am doing something silly, so please help! :)
This expression:
val futures = (0 until M1_ROWS).grouped(PARTITION_ROWS).map( i =>
    multiplyer ? MultiplyMatrix(i.toArray.map(m1(_)), m2)
)
creates a lazy collection, so each _.get blocks before the next message is even sent, which makes your entire program serial.
So the solution is to make that expression strict by adding toList or similar.
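The same laziness trap exists outside Scala. A Python analogue with concurrent.futures (the sleep time and worker count are arbitrary) shows how forcing results one element at a time serializes the work, while submitting strictly up front runs it in parallel:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow(x):
    time.sleep(0.2)
    return x * x

with ThreadPoolExecutor(max_workers=4) as e:
    # Lazy: the generator submits each task only when .result() forces
    # the next element, so the four calls run one after another.
    t0 = time.perf_counter()
    lazy = (e.submit(slow, x) for x in range(4))
    serial = [f.result() for f in lazy]
    serial_time = time.perf_counter() - t0

    # Strict: all four tasks are submitted before any result is awaited,
    # so they run in parallel (like adding toList in the Scala code).
    t0 = time.perf_counter()
    strict = [e.submit(slow, x) for x in range(4)]
    parallel = [f.result() for f in strict]
    parallel_time = time.perf_counter() - t0

print(serial == parallel)           # same answers either way
print(serial_time > parallel_time)  # but the lazy run takes ~4x longer
```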