How to join a Kotlin SupervisorJob

I am trying to process a tree of data objects. Each tree leaf is supposed to be processed through a function using a coroutine. The whole process should run on a fixed-size thread pool.
So I came up with this:
val node = an instance of WorkspaceEntry (tree structure)
val localDispatcher = newFixedThreadPoolContext(16, "localDispatcher")
fun main() {
    val job = SupervisorJob()
    val scope = CoroutineScope(localDispatcher + job)
    handleEntry(node, scope)
    runBlocking {
        job.join()
    }
}
The handleEntry method recursively launches a child job in the supervisor for each tree leaf.
The child jobs of the supervisor all complete successfully, but the join never returns. Am I understanding this wrong?
Edit: the handleEntry function
private fun handleEntry(workspaceEntry: WorkspaceEntry, scope: CoroutineScope) {
    if (workspaceEntry is FileEntry) {
        scope.launch {
            FileTypeRegistry.processFile(workspaceEntry.fileBlob)
        }
    } else {
        workspaceEntry.children.forEach { child -> handleEntry(child, scope) }
    }
}

The Job you put into the scope's CoroutineContext (in your case, the SupervisorJob) never completes on its own, so job.join() never returns. join() waits for the job itself to reach a final state, not just for its current children to finish. The main purpose of that Job is to let you cancel all child coroutines at once. Changing the runBlocking block to the following will work:
runBlocking {
    job.children.forEach {
        it.join()
    }
}

You have mixed two roles:
the master job found in the coroutine scope that never completes on its own and is used to control the lifecycle of everything else
the job corresponding to a unit of work, possibly decomposed into more child jobs
You need both, like this:
val masterJob = SupervisorJob()
val scope = CoroutineScope(localDispatcher + masterJob)
val unitOfWork = scope.launch { handleEntry(node, this) } // `this` is the new coroutine, so the leaf jobs become its children
runBlocking { unitOfWork.join() }
The above code doesn't really motivate the existence of the master job because you start just one child job from it, but it may make sense in a wider picture, where you have some context from which you launch many jobs, and want to be able to write
masterJob.cancel()
to cancel everything before it's done.
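Here is a minimal, self-contained sketch of the master-job / unit-of-work split. It assumes kotlinx-coroutines-core is on the classpath. The tree types Entry, FileNode and DirNode and the functions handleTree and processTree are hypothetical stand-ins for WorkspaceEntry, FileEntry and handleEntry from the question, and Dispatchers.Default replaces the fixed thread pool. The key detail is that the recursive helper launches into the receiver of the unit-of-work launch block. That makes every leaf coroutine a child of unitOfWork, so join() waits for all of them.

```kotlin
import kotlinx.coroutines.*
import java.util.Collections

// Hypothetical tree types standing in for WorkspaceEntry / FileEntry.
sealed class Entry
data class FileNode(val name: String) : Entry()
data class DirNode(val children: List<Entry>) : Entry()

// Launches one coroutine per leaf as a child of the receiver scope's Job.
fun CoroutineScope.handleTree(entry: Entry, processed: MutableList<String>) {
    when (entry) {
        is FileNode -> launch {
            delay(10L)                    // stands in for processFile()
            processed.add(entry.name)
        }
        is DirNode -> entry.children.forEach { handleTree(it, processed) }
    }
}

fun processTree(root: Entry): List<String> {
    val processed = Collections.synchronizedList(mutableListOf<String>())
    val masterJob = SupervisorJob()
    val scope = CoroutineScope(Dispatchers.Default + masterJob)
    // Inside launch, `this` is the unit-of-work coroutine, so each leaf becomes its child.
    val unitOfWork = scope.launch { handleTree(root, processed) }
    runBlocking { unitOfWork.join() }     // returns once every leaf has been processed
    masterJob.cancel()                    // tear down the long-lived scope when done
    return processed.sorted()
}
```

If the helper were handed the outer scope instead, each leaf would become a sibling of unitOfWork, and join() would return almost immediately.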

Related

Kotlin: Difference between calling CoroutineScope.launch vs launch inside a coroutine

I am trying to understand structured concurrency in Kotlin and I am unable to wrap my head around this piece of code.
fun main(): Unit = runBlocking {
    other(this)
}
suspend fun other(scope: CoroutineScope) {
    val job = scope.launch {
        scope.launch {
            delay(200)
            println("e")
        }
        println("a")
    }
    job.invokeOnCompletion {
        println("Complete")
    }
}
The code prints
a
Complete
e
While if I replace the inner scope.launch call with launch, like this
suspend fun other(scope: CoroutineScope) {
    val job = scope.launch {
        launch {
            delay(200)
            println("e")
        }
        println("a")
    }
    job.invokeOnCompletion {
        println("Complete")
    }
}
It prints
a
e
Complete
This shows that the first example does not follow structured concurrency, since the parent job finished before the child job. My confusion is: why does this happen?
I assumed that scope.launch might be equivalent to calling launch (which should be equivalent to this.launch, with this referring to scope) in this case. But it seems this is not true. Can someone explain why the first one results in unstructured concurrency, and what the difference between the two launch calls is? Thanks!
In the first code, the inner launch looks like a child of the outer launch, but it is actually not. It is a sibling of the outer launch, since both were launched from the same scope. So waiting for the outer launch's job to complete doesn't wait for the inner one.
The second code uses structured concurrency since the inner launch uses the scope created by the outer launch (the receiver of the launch block). In this case it's a child of the outer launch so waiting for the outer job to complete waits for the child to complete as well.
The second one is what you're supposed to do: use the CoroutineScope receiver of the launch block to launch child jobs. Using some other scope instead does not provide structured concurrency.
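The sketch below checks the difference without relying on console output. It records the events from both variants into a list. It assumes kotlinx-coroutines-core, and runVariant and its useOuterScope flag are hypothetical names introduced for illustration.

```kotlin
import kotlinx.coroutines.*
import java.util.Collections

// Records the order of events for the two variants in the question.
// useOuterScope = true  -> inner coroutine launched in the outer scope (a sibling)
// useOuterScope = false -> inner coroutine launched in the launch block's own scope (a child)
fun runVariant(useOuterScope: Boolean): List<String> {
    val events = Collections.synchronizedList(mutableListOf<String>())
    runBlocking {
        val outerScope: CoroutineScope = this
        val job = outerScope.launch {
            val target: CoroutineScope = if (useOuterScope) outerScope else this
            target.launch {
                delay(200)
                events.add("e")
            }
            events.add("a")
        }
        job.invokeOnCompletion { events.add("Complete") }
    }
    return events.toList()
}
```

Under these assumptions, runVariant(true) yields a, Complete, e and runVariant(false) yields a, e, Complete. This matches the two printouts in the question.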

How to pass Observable emissions to MutableSharedFlow?

Well, I have an Observable. I've used asFlow() to convert it, but it doesn't emit.
I’m trying to migrate from Rx and Channels to Flow, so I have this function
override fun processIntents(intents: Observable<Intent>) {
    intents.asFlow().shareTo(intentsFlow).launchIn(this)
}
shareTo() is an extension function which does onEach { receiver.emit(it) }, processIntents exists in a base ViewModel, and intentsFlow is a MutableSharedFlow.
fun <T> Flow<T>.shareTo(receiver: MutableSharedFlow<T>): Flow<T> {
    return onEach { receiver.emit(it) }
}
I want to pass emissions coming from the intents Observable to intentsFlow, but it doesn’t work at all and the unit test keeps failing.
@Test(timeout = 4000)
fun `WHEN processIntent() with Rx subject or Observable emissions THEN intentsFlow should receive them`() {
    return runBlocking {
        val actual = mutableListOf<TestNumbersIntent>()
        val intentSubject = PublishSubject.create<TestNumbersIntent>()
        val viewModel = FlowViewModel<TestNumbersIntent, TestNumbersViewState>(
            dispatcher = Dispatchers.Unconfined,
            initialViewState = TestNumbersViewState()
        )
        viewModel.processIntents(intentSubject)
        intentSubject.onNext(OneIntent)
        intentSubject.onNext(TwoIntent)
        intentSubject.onNext(ThreeIntent)
        viewModel.intentsFlow.take(3).toList(actual)
        assertEquals(3, actual.size)
        assertEquals(OneIntent, actual[0])
        assertEquals(TwoIntent, actual[1])
        assertEquals(ThreeIntent, actual[2])
    }
}
test timed out after 4000 milliseconds
org.junit.runners.model.TestTimedOutException: test timed out after 4000 milliseconds
This works
val ps = PublishSubject.create<Int>()
val mf = MutableSharedFlow<Int>()
val pf = ps.asFlow()
    .onEach {
        mf.emit(it)
    }
launch {
    pf.take(3).collect()
}
launch {
    mf.take(3).collect {
        println("$it") // Prints 1 2 3
    }
}
launch {
    yield() // Without this we suspend indefinitely
    ps.onNext(1)
    ps.onNext(2)
    ps.onNext(3)
}
We need the take(3) calls to make sure our program terminates. Collecting either a MutableSharedFlow or a Flow converted from a PublishSubject never completes on its own.
We need the yield because we're working with a single thread and we need to give the other coroutines an opportunity to start working.
Take 2
This is much better. It doesn't use take, and it cleans up after itself.
After emitting the last item, calling onComplete on the PublishSubject terminates MutableSharedFlow collection. This is a convenience, so that when this code runs it terminates completely. It is not a requirement. You can arrange your Job termination however you like.
Your code never terminating is not related to the emissions never being collected by the MutableSharedFlow. These are separate concerns. The first is due to the fact that neither a flow created from a PublishSubject, nor a MutableSharedFlow, terminates on its own. The PublishSubject flow will terminate when onComplete is called. The MutableSharedFlow will terminate when the coroutine (specifically, its Job) collecting it terminates.
The Flow constructed by PublishSubject.asFlow() only subscribes to the subject once collection starts, and a PublishSubject does not replay earlier items. Any emission made before the collector has subscribed is therefore dropped. This introduces a race condition between being ready to collect and the code that calls PublishSubject.onNext().
This, I believe, is the reason why flow collection isn't picking up the onNext emissions in your code.
It's why a yield is required right after we launch the coroutine that collects from psf.
val ps = PublishSubject.create<Int>()
val msf = MutableSharedFlow<Int>()
val psf = ps.asFlow()
    .onEach {
        msf.emit(it)
    }
val j1 = launch {
    psf.collect()
}
yield() // Use this to allow psf.collect to catch up
val j2 = launch {
    msf.collect {
        println("$it") // Prints 1 2 3 4
    }
}
launch {
    ps.onNext(1)
    ps.onNext(2)
    ps.onNext(3)
    ps.onNext(4)
    ps.onComplete()
}
j1.invokeOnCompletion { j2.cancel() }
j2.join()
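The question's test also has a race of the same kind on the MutableSharedFlow side. A MutableSharedFlow created with the default replay = 0 hands each value only to collectors that are already subscribed when emit() is called. With no subscribers, the value is simply dropped. In the test, intentsFlow is only collected after the three onNext calls, so this likely also contributes to the timeout.

The sketch below isolates that effect with plain kotlinx.coroutines (no RxJava), assuming kotlinx-coroutines-core on the classpath. collectThree and subscribeFirst are hypothetical names. CoroutineStart.UNDISPATCHED is used so the collector is subscribed before the emitting code runs.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

// Emits 1..3 into a MutableSharedFlow (default replay = 0) and tries to collect three values.
// subscribeFirst = false mimics the test: emit first, collect afterwards.
fun collectThree(subscribeFirst: Boolean): List<Int> = runBlocking {
    val shared = MutableSharedFlow<Int>()
    if (!subscribeFirst) {
        (1..3).forEach { shared.emit(it) }   // no subscribers yet, so these values are dropped
    }
    // UNDISPATCHED runs the collector until its first suspension point,
    // so it is already subscribed when the next statement executes.
    val collector = async(start = CoroutineStart.UNDISPATCHED) {
        withTimeoutOrNull(500L) { shared.take(3).toList() } ?: emptyList()
    }
    if (subscribeFirst) {
        (1..3).forEach { shared.emit(it) }
    }
    collector.await()
}
```

In a test, the same idea means starting the intentsFlow collection (or configuring a replay cache) before pushing items into the subject.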

Coroutine job never completing

Given this piece of code
fun main() {
    val job = Job()
    val scope = GlobalScope + job
    scope.launch {
        println("working!")
        delay(1000L)
        println("done!")
        // how do I finish the job originally associated with this scope?
    }
    runBlocking {
        job.join()
        println("job done")
    }
}
I have a custom coroutine scope for my application, and I'm associating a job with this scope as shown. I want all new coroutines created from this scope to be children of this job, so that if I cancel it, everything in it is cancelled.
But the main job itself never completes. How do I complete the main job when the task is done, or has failed?
The main job works only as the parent job and will never complete on its own.
But you could wait for all children to complete:
runBlocking {
    job.children.forEach { it.join() }
    println("job done")
}
Alternatively, you can go with Eugene's solution and call join() on the Job you just started instead of on the main job.
Let's simplify your code to something like this:
val job = Job()
runBlocking {
    job.join()
}
If you run this code, you will see that it also never completes. That is because job.join() suspends until the given job reaches a terminal state, which is either completed or cancelled (see the Job documentation).
When you create a job using a coroutine builder (like launch {...}), you do not need to complete it yourself. But since you have created it with the factory function Job(), it is now your responsibility to complete it, by calling complete() or cancel() on it.
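Here is a minimal sketch of completing a factory-created job yourself, assuming kotlinx-coroutines-core. Job() returns a CompletableJob. Calling complete() on a job that still has children moves it into a completing state. It becomes completed, and join() returns, once all of its children are done. completeParentJob is a hypothetical name, and Dispatchers.Default replaces GlobalScope from the question.

```kotlin
import kotlinx.coroutines.*
import java.util.Collections

fun completeParentJob(): List<String> {
    val log = Collections.synchronizedList(mutableListOf<String>())
    val job = Job()                      // CompletableJob: active until completed or cancelled
    val scope = CoroutineScope(Dispatchers.Default + job)
    scope.launch {
        log.add("working!")
        delay(100L)
        log.add("done!")
    }
    job.complete()       // call after launching all children; the job finishes once they are done
    runBlocking {
        job.join()       // now returns after "done!" has been logged
        log.add("job done")
    }
    return log.toList()
}
```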
There are several functions to wait for a Job() object to complete and to cancel it. You may pick one from the list:
job.cancel()
job.join()
job.cancelAndJoin()
Only the first of these is not a suspend function, so you can call it from any function, not only from suspend functions.
There is a better way: the launch {...} function already returns a Job object. You can simplify the code to
val job = GlobalScope.launch { .. }
That Job object completes automatically when the launch block finishes or fails with an exception.
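The following small sketch keeps the Job returned by launch and calls join() and cancelAndJoin() on it. It assumes kotlinx-coroutines-core. launchedJobStates is a hypothetical name, and a CoroutineScope with its own Job is used instead of GlobalScope.

```kotlin
import kotlinx.coroutines.*

fun launchedJobStates(): List<Boolean> {
    val scope = CoroutineScope(Dispatchers.Default + Job())
    val quick = scope.launch { delay(50L) }        // finishes on its own
    val slow = scope.launch { delay(10_000L) }     // would run for 10 seconds
    runBlocking {
        quick.join()                               // suspends until `quick` has finished
        slow.cancelAndJoin()                       // cancels `slow`, then waits for it to terminate
    }
    scope.cancel()                                 // release the scope's own parent Job
    return listOf(quick.isCompleted, quick.isCancelled, slow.isCompleted, slow.isCancelled)
}
```

Note that a cancelled job also reports isCompleted == true once it has terminated. isCancelled is what distinguishes the two cases.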

What does a Coroutine Join do?

So for example I have the following code:
scope.launch {
    val job = launch {
        doSomethingHere()
    }
    job.join()
    callOnlyWhenJobAboveIsDone()
}
Job.join() is described in the documentation as follows:
Suspends coroutine until this job is complete. This invocation resumes normally (without exception) when the job is complete for any reason and the Job of the invoking coroutine is still active. This function also starts the corresponding coroutine if the Job was still in new state.
If I understand it correctly, since join() suspends the coroutine until the job is complete, my code above will do exactly what I want. That is, the method callOnlyWhenJobAboveIsDone() will only be called when doSomethingHere() is finished. Is that correct?
Can anyone explain further the use case for job.join()? Thanks in advance.
To explain my use case further:
val storeJobs = ArrayList<Job>()
fun callThisFunctionMultipleTimes() {
    scope.launch {
        val job = launch {
            doSomethingHere()
        }
        storeJobs.add(job)
        job.join()
        callOnlyWhenJobAboveIsDone()
    }
}
fun callOnlyWhenJobAboveIsDone() {
    // Check if there is still an active job
    // by iterating through the storedJobs
    // and checking if any is active
    // if no job is active do some other things
}
Is this a valid use case for job.join()?
That is, the method callOnlyWhenJobAboveIsDone() will only be called when doSomethingHere() is finished. Is that correct?
Yes.
Can anyone explain further the use case for job.join()?
In your case there is actually no need for another job; you could just write:
scope.launch {
    doSomethingHere()
    callOnlyWhenJobAboveIsDone()
}
That will do exactly the same thing, so it is not really a use case for a Job. There are other cases, however, where .join() is really useful.
You want to run (launch) multiple asynchronous actions in parallel, and wait for all of them to finish:
someData
    .map { Some.asyncAction(it) } // start in parallel
    .forEach { it.join() } // wait for all of them
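Here is a runnable version of the parallel case, assuming kotlinx-coroutines-core. The answer does not define Some.asyncAction, so asyncAction here is a hypothetical extension that launches one job per item. joinAll() from kotlinx.coroutines does the same as .forEach { it.join() }.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for Some.asyncAction: starts one job per item.
fun CoroutineScope.asyncAction(item: Int, total: AtomicInteger): Job =
    launch(Dispatchers.Default) {
        delay(50L)                        // simulated work
        total.addAndGet(item)
    }

fun sumInParallel(someData: List<Int>): Int = runBlocking {
    val total = AtomicInteger()
    someData
        .map { asyncAction(it, total) }   // all jobs are started here
        .joinAll()                        // equivalent to .forEach { it.join() }
    total.get()
}
```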
You have to keep track of an asynchronous state, for example an update:
var update = Job()
fun doUpdate() {
    update.cancel() // don't update twice at the same time
    update = launch {
        someAsyncCode()
    }
}
Now to make sure that the last update was done, for example if you want to use some updated data, you can just:
update.join()
anywhere, you can also
update.cancel()
if you want to.
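Here is a self-contained sketch of the update pattern, assuming kotlinx-coroutines-core. Updater, awaitLatest and latestUpdate are hypothetical names, and delay() stands in for someAsyncCode(). One deliberate difference from the snippet above: the initial placeholder job is completed immediately. A plain Job() never completes on its own, so update.join() would hang if doUpdate() had never been called.

```kotlin
import kotlinx.coroutines.*

// Hypothetical holder for the `update` pattern: each call cancels the
// previous update before starting a new one.
class Updater(private val scope: CoroutineScope) {
    // Start from an already-completed job so awaitLatest() never hangs.
    private var update: Job = Job().apply { complete() }
    var lastValue = 0
        private set

    fun doUpdate(value: Int) {
        update.cancel()                   // don't update twice at the same time
        update = scope.launch {
            delay(100L)                   // stands in for someAsyncCode()
            lastValue = value
        }
    }

    suspend fun awaitLatest() = update.join()
}

fun latestUpdate(): Int = runBlocking {
    val updater = Updater(this)
    updater.doUpdate(1)
    updater.doUpdate(2)                   // cancels the first update
    updater.awaitLatest()
    updater.lastValue
}
```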
What's really useful about launch {} is that it not only returns a Job, but also attaches the Job to the CoroutineScope. This lets you keep track of every async action happening inside your application. For example, in your UI you could make every Element implement CoroutineScope. Then you can just cancel the scope when the Element leaves the rendered area, and all updates and animations in it will be stopped.
Kotlin's Job.join() is the non-blocking equivalent of Java's Thread.join().
So your assumption is correct: the point of job.join() is to wait for the completion of the receiver job before executing the rest of the current coroutine.
However, instead of blocking the thread that calls join() (like Java's Thread.join() would do) it simply suspends the coroutine calling join(), leaving the current thread free to do whatever it pleases (like executing another coroutine) in the meantime.
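The sketch below shows that join() suspends rather than blocks. Everything runs on runBlocking's single thread. While the main coroutine is suspended in join(), another coroutine still gets to run on that same thread. It assumes kotlinx-coroutines-core, and joinDoesNotBlockThread is a hypothetical name.

```kotlin
import kotlinx.coroutines.*

fun joinDoesNotBlockThread(): List<String> = runBlocking {   // single-threaded event loop
    val log = mutableListOf<String>()
    val worker = launch {
        delay(100L)
        log.add("worker done")
    }
    launch { log.add("other coroutine ran") }  // runs while the main coroutine waits in join()
    log.add("before join")
    worker.join()                              // suspends; the thread keeps running other coroutines
    log.add("after join")
    log
}
```

With a blocking Thread.join() in the same position, the single thread could not have run the second coroutine in the meantime.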
val queryProduct = GlobalScope.async {
}
val verification = GlobalScope.async {
}
GlobalScope.launch {
    verification.join()
    queryProduct.join()
}
This is how I use join(): the launch block only continues past the two join() calls once both async coroutines have completed.

Kotlin: withContext() vs Async-await

I have been reading the Kotlin docs, and if I understood correctly, the two Kotlin functions work as follows:
withContext(context): switches the context of the current coroutine; when the given block completes, the coroutine switches back to the previous context.
async(context): starts a new coroutine in the given context. If we call .await() on the returned Deferred task, it suspends the calling coroutine and resumes it when the block executing inside the spawned coroutine returns.
Now for the following two versions of code :
Version1:
launch {
    block1()
    val returned = async(context) {
        block2()
    }.await()
    block3()
}
Version2:
launch {
    block1()
    val returned = withContext(context) {
        block2()
    }
    block3()
}
In both versions, block1() and block3() execute in the default context (CommonPool?), whereas block2() executes in the given context.
The overall execution is synchronous, in block1() -> block2() -> block3() order.
The only difference I see is that version 1 creates another coroutine, whereas version 2 executes only one coroutine while switching context.
My questions are :
Isn't it always better to use withContext rather than async-await, since it is functionally similar but doesn't create another coroutine? Large numbers of coroutines, although lightweight, could still be a problem in demanding applications.
Is there a case where async-await is preferable to withContext?
Update:
Kotlin 1.2.50 now has a code inspection where it can convert async(ctx) { }.await() to withContext(ctx) { }.
Large number of coroutines, though lightweight, could still be a problem in demanding applications
I'd like to dispel this myth of "too many coroutines" being a problem by quantifying their actual cost.
First, we should disentangle the coroutine itself from the coroutine context to which it is attached. This is how you create just a coroutine with minimum overhead:
GlobalScope.launch(Dispatchers.Unconfined) {
    suspendCoroutine<Unit> {
        continuations.add(it)
    }
}
The value of this expression is a Job holding a suspended coroutine. To retain the continuation, we added it to a list in the wider scope.
I benchmarked this code and concluded that it allocates 140 bytes and takes 100 nanoseconds to complete. So that's how lightweight a coroutine is.
For reproducibility, this is the code I used:
fun measureMemoryOfLaunch() {
    val continuations = ContinuationList()
    val jobs = (1..10_000).mapTo(JobList()) {
        GlobalScope.launch(Dispatchers.Unconfined) {
            suspendCoroutine<Unit> {
                continuations.add(it)
            }
        }
    }
    (1..500).forEach {
        Thread.sleep(1000)
        println(it)
    }
    println(jobs.onEach { it.cancel() }.filter { it.isActive })
}
class JobList : ArrayList<Job>()
class ContinuationList : ArrayList<Continuation<Unit>>()
This code starts a bunch of coroutines and then sleeps so you have time to analyze the heap with a monitoring tool like VisualVM. I created the specialized classes JobList and ContinuationList because this makes it easier to analyze the heap dump.
To get a more complete story, I used the code below to also measure the cost of withContext() and async-await:
import kotlinx.coroutines.*
import java.util.concurrent.Executors
import kotlin.coroutines.suspendCoroutine
import kotlin.system.measureTimeMillis
const val JOBS_PER_BATCH = 100_000
var blackHoleCount = 0
val threadPool = Executors.newSingleThreadExecutor()!!
val ThreadPool = threadPool.asCoroutineDispatcher()
fun main(args: Array<String>) {
    try {
        measure("just launch", justLaunch)
        measure("launch and withContext", launchAndWithContext)
        measure("launch and async", launchAndAsync)
        println("Black hole value: $blackHoleCount")
    } finally {
        threadPool.shutdown()
    }
}
fun measure(name: String, block: (Int) -> Job) {
    print("Measuring $name, warmup ")
    (1..1_000_000).forEach { block(it).cancel() }
    println("done.")
    System.gc()
    System.gc()
    val tookOnAverage = (1..20).map { _ ->
        System.gc()
        System.gc()
        var jobs: List<Job> = emptyList()
        measureTimeMillis {
            jobs = (1..JOBS_PER_BATCH).map(block)
        }.also { _ ->
            blackHoleCount += jobs.onEach { it.cancel() }.count()
        }
    }.average()
    println("$name took ${tookOnAverage * 1_000_000 / JOBS_PER_BATCH} nanoseconds")
}
fun measureMemory(name: String, block: (Int) -> Job) {
    println(name)
    val jobs = (1..JOBS_PER_BATCH).map(block)
    (1..500).forEach {
        Thread.sleep(1000)
        println(it)
    }
    println(jobs.onEach { it.cancel() }.filter { it.isActive })
}
val justLaunch: (i: Int) -> Job = {
    GlobalScope.launch(Dispatchers.Unconfined) {
        suspendCoroutine<Unit> {}
    }
}
val launchAndWithContext: (i: Int) -> Job = {
    GlobalScope.launch(Dispatchers.Unconfined) {
        withContext(ThreadPool) {
            suspendCoroutine<Unit> {}
        }
    }
}
val launchAndAsync: (i: Int) -> Job = {
    GlobalScope.launch(Dispatchers.Unconfined) {
        async(ThreadPool) {
            suspendCoroutine<Unit> {}
        }.await()
    }
}
This is the typical output I get from the above code:
Just launch: 140 nanoseconds
launch and withContext: 520 nanoseconds
launch and async-await: 1100 nanoseconds
Yes, async-await takes about twice as long as withContext, but it's still just a microsecond. You'd have to launch them in a tight loop, doing almost nothing besides, for that to become "a problem" in your app.
Using measureMemory() I found the following memory cost per call:
Just launch: 88 bytes
withContext(): 512 bytes
async-await: 652 bytes
The cost of async-await is exactly 140 bytes higher than withContext, the number we got as the memory weight of one coroutine. This is just a fraction of the complete cost of setting up the CommonPool context.
If performance/memory impact was the only criterion to decide between withContext and async-await, the conclusion would have to be that there's no relevant difference between them in 99% of real use cases.
The real reason is that withContext() is a simpler and more direct API, especially in terms of exception handling:
An exception that isn't handled within async { ... } causes its parent job to get cancelled. This happens regardless of how you handle exceptions from the matching await(). If you haven't prepared a coroutineScope for it, it may bring down your entire application.
An exception not handled within withContext { ... } simply gets thrown by the withContext call, you handle it just like any other.
withContext also happens to be optimized, leveraging the fact that you're suspending the parent coroutine and awaiting on the child, but that's just an added bonus.
async-await should be reserved for those cases where you actually want concurrency, so that you launch several coroutines in the background and only then await on them. In short:
async-await-async-await: don't do that, use withContext-withContext
async-async-await-await: that's the way to use it.
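Here is a minimal sketch of the exception-handling difference, assuming kotlinx-coroutines-core. withContextFailure and asyncFailure are hypothetical names, and error("boom") stands in for failing work. With withContext, the exception is thrown by the call and can be caught right there. With async inside coroutineScope, catching around await() is not enough: the failed child also cancels its parent, so coroutineScope still rethrows the failure.

```kotlin
import kotlinx.coroutines.*

// withContext: the failure is rethrown by the call itself and handled locally.
fun withContextFailure(): String = runBlocking {
    try {
        withContext(Dispatchers.Default) { error("boom") }
    } catch (e: IllegalStateException) {
        "caught locally: ${e.message}"
    }
}

// async: the failed child cancels its parent scope, so even though the
// exception from await() is caught, coroutineScope still fails with it.
fun asyncFailure(): String = runBlocking {
    try {
        coroutineScope {
            val deferred = async(Dispatchers.Default) { error("boom") }
            try {
                deferred.await()
            } catch (e: Exception) {
                // Handling the exception here does not save the enclosing scope.
            }
            "scope completed normally"
        }
    } catch (e: IllegalStateException) {
        "escaped coroutineScope: ${e.message}"
    }
}
```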
Isn't it always better to use withContext rather than async-await, since it is functionally similar but doesn't create another coroutine? Large numbers of coroutines, though lightweight, could still be a problem in demanding applications.
Is there a case where async-await is preferable to withContext?
You should use async/await when you want to execute multiple tasks concurrently, for example:
runBlocking {
    val deferredResults = arrayListOf<Deferred<String>>()
    deferredResults += async {
        delay(1000L) // 1 second
        "1"
    }
    deferredResults += async {
        delay(1000L)
        "2"
    }
    deferredResults += async {
        delay(1000L)
        "3"
    }
    // wait for all results (at this point all three tasks are already running)
    val results = deferredResults.map { it.await() }
    // or: val results = deferredResults.awaitAll()
    println(results)
}
If you don't need to run multiple tasks concurrently you can use withContext.
When in doubt, remember this rule of thumb:
If multiple tasks have to happen in parallel and the final result depends on completion of all of them, then use async.
For returning the result of a single task, use withContext.
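Here are both rules of thumb side by side, as a minimal sketch assuming kotlinx-coroutines-core. loadPart, loadOne and loadAll are hypothetical names, and delay() stands in for real work such as a network call. Running the async calls inside coroutineScope keeps them structured: if one part fails, the others are cancelled and the exception surfaces from loadAll.

```kotlin
import kotlinx.coroutines.*

// Hypothetical work function; delay() stands in for I/O such as a network call.
suspend fun loadPart(id: Int): String {
    delay(100L)
    return "part$id"
}

// A single task: withContext switches dispatcher and returns the result directly.
suspend fun loadOne(): String = withContext(Dispatchers.IO) { loadPart(0) }

// Several independent tasks: start them all with async, then await them together.
// coroutineScope ties the children to the caller, so one failure cancels the rest.
suspend fun loadAll(ids: List<Int>): List<String> = coroutineScope {
    ids.map { id -> async(Dispatchers.IO) { loadPart(id) } }.awaitAll()
}
```

Called from runBlocking, loadAll(listOf(1, 2, 3)) should take roughly 100 ms rather than 300 ms, since the three parts run concurrently.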