Coroutines Child Job Cancellation - kotlin

According to the Job documentation, the invocation of [cancel][Job.cancel] with exception (other than [CancellationException]) on this job also cancels parent. However, the Job.cancel function only accepts CancellationException. I am testing this behavior, but cancelling a child job does not cancel the parent job, even though I am not using SupervisorJob.
/**
* Creates a job object in an active state.
* A failure of any child of this job immediately causes this job to fail, too, and cancels the rest of its children.
*
* To handle children failure independently of each other use [SupervisorJob].
*
* If [parent] job is specified, then this job becomes a child job of its parent and
* is cancelled when its parent fails or is cancelled. All this job's children are cancelled in this case, too.
* --The invocation of [cancel][Job.cancel] with exception (other than [CancellationException]) on this job also cancels parent.--
*
* Conceptually, the resulting job works in the same way as the job created by the `launch { body }` invocation
* (see [launch]), but without any code in the body. It is active until cancelled or completed. Invocation of
* [CompletableJob.complete] or [CompletableJob.completeExceptionally] corresponds to the successful or
* failed completion of the body of the coroutine.
*
* @param parent an optional parent job.
*/
@Suppress("FunctionName")
public fun Job(parent: Job? = null): CompletableJob = JobImpl(parent)
My test code is:
fun main() {
val parentJob = Job()
val scope = CoroutineScope(parentJob)
suspend fun printText(text: String) {
println("Before Delay: $text")
delay(1000)
println("After Delay: $text")
}
val job1 = scope.launch {
printText("Job#1")
}
job1.invokeOnCompletion {
println("Job#1 completed. Cause = $it")
}
val job2 = scope.launch {
printText("Job#2")
}
job2.invokeOnCompletion {
println("Job#2 completed. Cause = $it")
}
println(parentJob.children.joinToString { it.toString() } )
job1.cancel(CancellationException())
Thread.sleep(10000)
}
And the output is:
Before Delay: Job#1
Before Delay: Job#2
StandaloneCoroutine{Active}@462d5aee, StandaloneCoroutine{Active}@69b0fd6f
Job#1 completed. Cause = java.util.concurrent.CancellationException
After Delay: Job#2
Job#2 completed. Cause = null
Process finished with exit code 0
My question is: why is Job#2 not being cancelled?
EDIT:
After the answers from @broot and @Steyrix, I extended the test case.
Now the printText(text) function throws an exception if the given argument is "Job#1". I am trying to simulate a ViewModel in Android. Let's say that we create CoroutineScope(Job()) and make two different requests. One of them throws an exception, but it is caught by a try-catch block, so the other job keeps running and is not cancelled.
So now the question is: what is the difference between SupervisorJob and Job? Why does viewModelScope (CloseableCoroutineScope(SupervisorJob() + Dispatchers.Main.immediate)) use SupervisorJob?
Extended example:
fun main() {
val parentJob = Job()
val scope = CoroutineScope(parentJob)
suspend fun printText(text: String) {
println("Before Delay: $text")
if (text == "Job#1") {
throw IllegalArgumentException("Test")
}
delay(1000)
println("After Delay: $text")
}
val job1 = scope.launch {
try {
printText("Job#1")
} catch (e: Exception) {
println(e)
}
}
job1.invokeOnCompletion {
println("Job#1 completed. Cause = $it")
}
val job2 = scope.launch {
printText("Job#2")
}
job2.invokeOnCompletion {
println("Job#2 completed. Cause = $it")
}
println(parentJob.children.joinToString { it.toString() })
Thread.sleep(10000)
}
Output;
Before Delay: Job#1
java.lang.IllegalArgumentException: Test
Job#1 completed. Cause = null
Before Delay: Job#2
StandaloneCoroutine{Active}@462d5aee
After Delay: Job#2
Job#2 completed. Cause = null
Process finished with exit code 0

Well, it behaves according to the documentation you cited. You used a CancellationException, so it didn't cancel the parent.
The only confusing part of the documentation is that it mentions cancelling with an exception type other than CancellationException. This is not possible: cancel() is for normal cancellations, not for failures, so it can never propagate to parents.
It looks like a small mistake in the documentation for the Job() function.
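To see the difference between a normal cancellation and a failure in practice, here is a minimal sketch. The function name parentActiveAfter and the empty CoroutineExceptionHandler are my own additions for illustration and are not part of the question. It cancels a child in one run and lets a child throw a regular exception in the other, then reports whether the parent Job is still active.

```kotlin
import kotlinx.coroutines.*

// Returns whether the parent Job is still active after a child was either
// cancelled via cancel() or failed with a regular (non-cancellation) exception.
fun parentActiveAfter(childFails: Boolean): Boolean = runBlocking {
    val parentJob = Job()
    // Assumption: the handler only keeps the uncaught exception out of stderr.
    // It does not stop the failure from cancelling the parent.
    val handler = CoroutineExceptionHandler { _, _ -> }
    val scope = CoroutineScope(parentJob + Dispatchers.Default + handler)

    val child = scope.launch {
        if (childFails) throw IllegalStateException("boom")
        delay(10_000)
    }
    if (!childFails) child.cancel() // normal cancellation with a CancellationException
    child.join()
    parentJob.isActive
}

fun main() {
    println(parentActiveAfter(childFails = false)) // true: cancel() does not touch the parent
    println(parentActiveAfter(childFails = true))  // false: the failure cancelled the parent
}
```

So the parent reacts to how the child ended, not to the fact that it ended early.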

According to official documentation:
Normal cancellation of a job is distinguished from its failure by the
type of this exception that caused its cancellation. A coroutine that
threw CancellationException is considered to be cancelled normally. If
a cancellation cause is a different exception type, then the job is
considered to have failed. When a job has failed, then its parent gets
cancelled with the exception of the same type, thus ensuring
transparency in delegating parts of the job to its children.
Note, that cancel function on a job only accepts CancellationException
as a cancellation cause, thus calling cancel always results in a
normal cancellation of a job, which does not lead to cancellation of
its parent. This way, a parent can cancel its own children (cancelling
all their children recursively, too) without cancelling itself.
job1.cancel(CancellationException())
Here you cancel the child job with a CancellationException, so it is cancelled "normally". This does not lead to cancellation of the parent, and therefore the other children are not cancelled either.
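Regarding the follow-up about Job versus SupervisorJob: the difference only shows up when a child fails with an uncaught exception (in the extended example the exception is caught inside the coroutine, so nothing fails). A rough sketch, where siblingSurvives and the silent exception handler are hypothetical names introduced just for this illustration:

```kotlin
import kotlinx.coroutines.*

// Returns true if the sibling coroutine was NOT cancelled even though
// the other child failed with an uncaught exception.
fun siblingSurvives(parent: CompletableJob): Boolean = runBlocking {
    // Assumption: an empty handler, used only to keep stderr quiet in this demo.
    val handler = CoroutineExceptionHandler { _, _ -> }
    val scope = CoroutineScope(parent + Dispatchers.Default + handler)

    val sibling = scope.launch { delay(300) }
    val failing = scope.launch {
        delay(50)
        throw IllegalArgumentException("Test") // not caught inside the coroutine
    }
    joinAll(failing, sibling)
    !sibling.isCancelled
}

fun main() {
    println(siblingSurvives(Job()))           // false: the failure cancels the parent and the sibling
    println(siblingSurvives(SupervisorJob())) // true: the supervisor ignores the child's failure
}
```

That is presumably why a scope that hosts unrelated requests, like viewModelScope, uses SupervisorJob: one failed request should not take the others down with it.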

Related

Why does delay make a coroutine cancellable?

fun main() = runBlocking {
var i = 1
var job = launch (Dispatchers.Default){
println("Thread name = ${Thread.currentThread().name}")
while (i < 10) { // replace i < 10 to isActive to make coroutine cancellable
delay(500L)
// runBlocking { delay(500L) }
println("$isActive ${i++}")
}
}
println("Thread name = ${Thread.currentThread().name}")
delay(2000L) // delay a bit
println("main: I'm tired of waiting!")
job.cancelAndJoin() // cancels the job and waits for its completion
println("main: Now I can quit.")
}
output
Thread name = main
Thread name = DefaultDispatcher-worker-1
true 1
true 2
true 3
main: I'm tired of waiting!
main: Now I can quit.
If I use runBlocking { delay(500L) }, the coroutine above is not cancellable, so it prints all values up to 9.
But when I use delay(500L), the coroutine can be cancelled automatically. Why?
delay doesn't actually do anything on its own, it just schedules the coroutine to be resumed at a later point in time. Continuations can, of course, be cancelled at any time.
runBlocking, on the other hand, actually blocks a thread (which is why the compiler will complain about a blocking operation within a coroutine, and why you should never use runBlocking outside of e.g. unit tests).
Note: Since main can now be a suspending function, there's generally no need to use runBlocking at all in your core application code.
"This function should not be used from a coroutine." (runBlocking documentation)
Coroutines, like threads, are not truly interruptible; they have to rely on cooperative cancellation (see also why stopping threads is bad). This means that cancellation doesn't actually do anything either; when you cancel a context, it simply notifies all of its child contexts to cancel themselves, but any code that is still running will continue to run until a cancellation check is reached.
It is important to realize that runBlocking is not a suspend function. It cannot be paused, resumed, or cancelled. The parent context is not passed to it by default (it receives an EmptyCoroutineContext), so the coroutine used for the execution of runBlocking won't react to anything that happens upstream.
When you write
while (i < 10) {
runBlocking { delay(500L) }
println("$isActive ${i++}")
}
there are no operations here that are cancellable. Therefore, the code never checks whether its context has been cancelled, so it will continue until it finishes.
delay, however, is cancellable; as soon as its parent context is cancelled, it resumes immediately and throws an exception (i.e., it stops.)
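The same point can be shown without delay at all. Below is a small sketch under my own assumptions (the function iterationsBeforeStop and the use of Thread.sleep as stand-in blocking work are not from the question): a loop that never suspends ignores cancellation unless it checks for it explicitly, for example with ensureActive().

```kotlin
import kotlinx.coroutines.*

// Counts the loop iterations that ran before the coroutine stopped.
// Without the explicit check the loop ignores cancellation and runs all 5 iterations.
fun iterationsBeforeStop(checkCancellation: Boolean): Int = runBlocking {
    var iterations = 0
    val job = launch(Dispatchers.Default) {
        while (iterations < 5) {
            if (checkCancellation) ensureActive() // throws CancellationException once cancelled
            Thread.sleep(100) // blocking work: it never suspends and never checks for cancellation
            iterations++
        }
    }
    delay(250)
    job.cancelAndJoin()
    iterations
}

fun main() {
    println(iterationsBeforeStop(checkCancellation = false)) // 5
    println(iterationsBeforeStop(checkCancellation = true))  // fewer than 5 (exact value depends on timing)
}
```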
Take a look at the generated code:
@Nullable
public final Object invokeSuspend(@NotNull Object $result) {
switch (this.label) {
case 0:
while (i.element < 10) {
BuildersKt.runBlocking$default( ... );
...
System.out.println(var3);
}
return Unit.INSTANCE;
default:
throw new IllegalStateException( ... );
}
}
Contrast this
while (i.element < 10) {
BuildersKt.runBlocking$default( ... );
...
System.out.println(var3);
}
with
do {
...
System.out.println(var3);
if (i.element >= 10) {
return Unit.INSTANCE;
}
...
} while (DelayKt.delay(500L, this) != var5);
Variable declarations and arguments omitted (...) for brevity.
runBlocking will terminate early if the current thread is interrupted, but again this is the exact same cooperative mechanism, except that it operates at the level of the thread rather than on a coroutine.
The official documentation states:
All the suspending functions in kotlinx.coroutines are cancellable.
and delay is one of them.
You can check that here.
I think the real question should be: why is a nested runBlocking not cancellable? At the very least, an attempt to create a new coroutine with runBlocking when isActive is false should fail, although making a coroutine cooperative is your responsibility. Besides, runBlocking shouldn't be used there in the first place.
Turns out if you pass this.coroutineContext as CoroutineContext to runBlocking, it gets cancelled:
fun main() = runBlocking {
var i = 1
var job = launch (Dispatchers.Default){
println("Thread name = ${Thread.currentThread().name}")
while (i < 10) { // replace i < 10 to isActive to make coroutine cancellable
runBlocking(this.coroutineContext) { delay(500L) }
println("$isActive ${i++}")
}
}
println("Thread name = ${Thread.currentThread().name}")
delay(2000L) // delay a bit
println("main: I'm tired of waiting!")
job.cancelAndJoin() // cancels the job and waits for its completion
println("main: Now I can quit.")
}
I modified your code a little
try {
while (i < 10) { // replace i < 10 to isActive to make coroutine cancellable
delay(500L)
println("$isActive ${i++}")
}
} catch (e : Exception){
println("Exception $e")
if (e is CancellationException) throw e
}
the output
Thread name = main
Thread name = DefaultDispatcher-worker-1
true 1
true 2
true 3
main: I'm tired of waiting!
Exception kotlinx.coroutines.JobCancellationException: StandaloneCoroutine was cancelled; job=StandaloneCoroutine{Cancelling}@40bcb892
main: Now I can quit.
Notice the exception "StandaloneCoroutine was cancelled". It appears because of the documented behavior of delay:
If the Job of the current coroutine is cancelled or completed while this suspending function is waiting (here, delay(500L)), this function immediately resumes with CancellationException.
So the point is: if you call a cancellable suspending function inside your launch block, the coroutine becomes cancellable at that call.
You can try this with a user-defined suspend function as well, as long as it calls a cancellable suspending function itself.
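For instance, here is a quick sketch with a user-defined suspend function. Both fetchGreeting and wasCancelled are made-up names for this illustration. The function is cancellable only because it calls delay internally; a suspend function that never reaches a cancellable suspension point would not react to cancellation.

```kotlin
import kotlinx.coroutines.*

// A hypothetical user-defined suspending function. It is cancellable only
// because it calls delay(), which reacts to cancellation while suspended.
suspend fun fetchGreeting(): String {
    delay(1_000)
    return "hello"
}

// Returns true if the coroutine observed a CancellationException inside fetchGreeting().
fun wasCancelled(): Boolean = runBlocking {
    var cancelled = false
    val job = launch {
        try {
            fetchGreeting()
        } catch (e: CancellationException) {
            cancelled = true
            throw e // rethrow so the coroutine finishes as cancelled
        }
    }
    delay(100)          // give the child time to start and suspend in delay()
    job.cancelAndJoin() // resumes the child's delay() with CancellationException
    cancelled
}

fun main() {
    println(wasCancelled()) // true
}
```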

Launch a number of coroutines and join them all with timeout (without cancelling)

I need to launch a number of jobs which will return a result.
In the main code (which is not a coroutine), after launching the jobs I need to wait for them all to complete their task OR for a given timeout to expire, whichever comes first.
If I exit from the wait because all the jobs completed before the timeout, that's great, I will collect their results.
But if some of the jobs are taking longer than the timeout, my main function needs to wake as soon as the timeout expires, inspect which jobs did finish in time (if any) and which ones are still running, and work from there, without cancelling the jobs that are still running.
How would you code this kind of wait?
The solution follows directly from the question. First, we'll design a suspending function for the task. Let's see our requirements:
if some of the jobs are taking longer than the timeout... without cancelling the jobs that are still running.
It means that the jobs we launch have to be standalone (not children), so we'll opt-out of structured concurrency and use GlobalScope to launch them, manually collecting all the jobs. We use async coroutine builder because we plan to collect their results of some type R later:
val jobs: List<Deferred<R>> = List(numberOfJobs) {
GlobalScope.async { /* our code that produces R */ }
}
after launching the jobs I need to wait for them all to complete their task OR for a given timeout to expire, whichever comes first.
Let's wait for all of them and do this waiting with timeout:
withTimeoutOrNull(timeoutMillis) { jobs.joinAll() }
We use joinAll (as opposed to awaitAll) to avoid an exception if one of the jobs fails, and withTimeoutOrNull to avoid an exception on timeout.
my main function needs to wake as soon as the timeout expires, inspect which jobs did finish in time (if any) and which ones are still running
jobs.map { deferred -> /* ... inspect results */ }
In the main code (which is not a coroutine) ...
Since our main code is not a coroutine it has to wait in a blocking way, so we bridge the code we wrote using runBlocking. Putting it all together:
fun awaitResultsWithTimeoutBlocking(
timeoutMillis: Long,
numberOfJobs: Int
) = runBlocking {
val jobs: List<Deferred<R>> = List(numberOfJobs) {
GlobalScope.async { /* our code that produces R */ }
}
withTimeoutOrNull(timeoutMillis) { jobs.joinAll() }
jobs.map { deferred -> /* ... inspect results */ }
}
P.S. I would not recommend deploying this kind of solution in any serious production environment, since letting your background jobs keep running (leak) after the timeout will invariably come back to bite you later. Do so only if you thoroughly understand all the deficiencies and risks of such an approach.
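The "inspect results" step is left open above. One possible way to fill it in, as a sketch only: split the list by isCompleted and read the finished values with getCompleted(). The helper splitByCompletion and the demo with two delays are assumptions of mine, and the sketch assumes that no job has failed, because getCompleted() rethrows the failure of a failed job.

```kotlin
import kotlinx.coroutines.*

// Splits the jobs into the values that are already available and the jobs still running.
// Assumption: none of the completed jobs failed (getCompleted() would rethrow the failure).
@OptIn(ExperimentalCoroutinesApi::class)
fun <R> splitByCompletion(jobs: List<Deferred<R>>): Pair<List<R>, List<Deferred<R>>> {
    val (done, pending) = jobs.partition { it.isCompleted }
    return done.map { it.getCompleted() } to pending
}

// Starts one fast and one slow job, waits at most 500 ms, and returns
// (number of finished jobs, number of jobs still running).
@OptIn(DelicateCoroutinesApi::class)
fun demo(): Pair<Int, Int> = runBlocking {
    val jobs = listOf(50L, 5_000L).map { ms ->
        GlobalScope.async { delay(ms); ms }
    }
    withTimeoutOrNull(500) { jobs.joinAll() }
    val (finished, pending) = splitByCompletion(jobs)
    finished.size to pending.size // the slow job keeps running in GlobalScope
}

fun main() {
    println(demo()) // (1, 1)
}
```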
You can try to work with whileSelect and the onTimeout clause. But you still have to overcome the problem that your main code is not a coroutine. The next lines are an example of whileSelect statement. The function returns a Deferred with a list of results evaluated in the timeout period and another list of Deferreds of the unfinished results.
fun CoroutineScope.runWithTimeout(timeoutMs: Int): Deferred<Pair<List<Int>, List<Deferred<Int>>>> = async {
val deferredList = (1..100).mapTo(mutableListOf()) {
async {
val random = Random.nextInt(0, 100)
delay(random.toLong())
random
}
}
val finished = mutableListOf<Int>()
val endTime = System.currentTimeMillis() + timeoutMs
whileSelect {
var waitTime = endTime - System.currentTimeMillis()
onTimeout(waitTime) {
false
}
deferredList.toList().forEach { deferred ->
deferred.onAwait { random ->
deferredList.remove(deferred)
finished.add(random)
true
}
}
}
finished.toList() to deferredList.toList()
}
In your main code you can use the discouraged method runBlocking to access the Deferred.
fun main() = runBlocking<Unit> {
val deferredResult = runWithTimeout(75)
val (finished, pending) = deferredResult.await()
println("Finished: ${finished.size} vs Pending: ${pending.size}")
}
Here is the solution I came up with. Pairing each job with a state (among other info):
private enum class State { WAIT, DONE, ... }
private data class MyJob(
val job: Deferred<...>,
var state: State = State.WAIT,
...
)
and writing an explicit loop:
// wait until either all jobs complete, or a timeout is reached
val waitJob = launch { delay(TIMEOUT_MS) }
while (waitJob.isActive && myJobs.any { it.state == State.WAIT }) {
select<Unit> {
waitJob.onJoin {}
myJobs.filter { it.state == State.WAIT }.forEach {
it.job.onJoin {}
}
}
// mark any finished jobs as DONE to exclude them from the next loop
myJobs.filter { !it.job.isActive }.forEach {
it.state = State.DONE
}
}
The initial state is called WAIT (instead of RUN) because it doesn't necessarily mean that the job is still running, only that my loop has not yet taken it into account.
I'm interested to know if this is idiomatic enough, or if there are better ways to code this kind of behaviour.

Coroutine job never completing

Given this piece of code
fun main() {
val job = Job()
val scope = GlobalScope + job
scope.launch {
println("working!")
delay(1000L) // delay is in ms
println("done!")
// how do i finish the job originally associated with this scope?
}
runBlocking {
job.join()
println("job done")
}
}
I have a custom coroutine scope for my application, and I'm associating a job with this scope like that. The reason is that I want all new coroutines created from this scope to be children of this job, so that if I cancel it, everything in it is cancelled.
But the main job itself never completes. How do I complete the main job when the task is done, or has failed?
The main job works only as the parent job and will never complete.
But you could wait for all children to complete:
runBlocking {
job.children.forEach { it.join() }
println("job done")
}
Alternatively you should go with Eugene's solution and invoke the join method of the Job you just started, instead of the main job.
Let's simplify your code to something like this:
val job = Job()
runBlocking {
job.join()
}
If you run this code you will see that it also never completes. That is because job.join() suspends until the given job reaches a terminal state which is either completed or canceled (see docs).
When you create a job using some coroutine builder (like .launch {...}) you do not need to complete it by yourself. But since you have created it using a factory method Job() it is now your responsibility to complete it.
You can also find more detailed explanation here.
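If the goal is to let the parent job finish once its children are done, one option (sketched here under that assumption, with runAndComplete as a made-up name) is CompletableJob.complete(). After complete() is called the job waits for its existing children and then completes, so join() can return.

```kotlin
import kotlinx.coroutines.*

// Returns true if the parent job completed normally (not cancelled) after its child finished.
fun runAndComplete(): Boolean = runBlocking {
    val job = Job() // created via the factory, so completing it is our responsibility
    val scope = CoroutineScope(Dispatchers.Default + job)
    scope.launch {
        delay(100) // stand-in for the real work
    }
    job.complete() // no further children are expected; finish once the existing ones are done
    job.join()     // returns after the child above has finished
    job.isCompleted && !job.isCancelled
}

fun main() {
    println(runAndComplete()) // true
}
```

Note that coroutines launched in the scope after complete() will not run, so this only fits when all the work has already been started.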
There are several functions to cancel a Job object and to wait for it to complete. You may pick one from the list:
job.cancel()
job.join()
job.cancelAndJoin()
Only the first one is not a suspend function, so you can call it from any function, not only from a suspend function.
There is a better way: the launch { .. } function already returns a Job object. You can simplify the code to
val job = GlobalScope.launch { .. }
That Job object completes automatically when the launch block finishes or fails with an exception.

How to join a Kotlin SupervisorJob

I am trying to process a tree of data objects. Each tree leaf is supposed to be processed through a function using a coroutine. The whole process should be done using a fixed size threadpool.
So I came up with this:
val node: WorkspaceEntry = TODO("an instance of WorkspaceEntry (tree structure)")
val localDispatcher = newFixedThreadPoolContext(16, "workers") // a name argument is required; "workers" is a placeholder
fun main() {
val job = SupervisorJob()
val scope = CoroutineScope(localDispatcher + job)
handleEntry(node, scope)
runBlocking {
job.join()
}
}
The handleEntry method recursively launches a child job in the supervisor for each tree leaf.
The child jobs of the supervisor all complete successfully, but the join never returns. Am I understanding this wrong?
Edit: HandleEntry function
private fun handleEntry(workspaceEntry: WorkspaceEntry, scope: CoroutineScope) {
if (workspaceEntry is FileEntry) {
scope.launch {
FileTypeRegistry.processFile(workspaceEntry.fileBlob)
}
} else {
workspaceEntry.children.forEach { child -> handleEntry(child, scope) }
}
}
It seems the Job that is used to create the CoroutineContext (in your case SupervisorJob) is not intended for waiting for child coroutines to finish, so you can't use job.join(). I guess the main intent of that Job is to cancel child coroutines. Changing the runBlocking block to the following will work:
runBlocking {
job.children.forEach {
it.join()
}
}
You have mixed two roles:
the master job found in the coroutine scope that never completes on its own and is used to control the lifecycle of everything else
the job corresponding to a unit of work, possibly decomposed into more child jobs
You need both, like this:
val masterJob = SupervisorJob()
val scope = CoroutineScope(localDispatcher + masterJob)
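val unitOfWork = scope.launch { handleEntry(node, this) } // pass the launched coroutine's own scope so the leaf jobs become children of unitOfWork and join() waits for them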
runBlocking { unitOfWork.join() }
The above code doesn't really motivate the existence of the master job because you start just one child job from it, but it may make sense in a wider picture, where you have some context from which you launch many jobs, and want to be able to write
masterJob.cancel()
to cancel everything before it's done.
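As a small illustration of that last point (a sketch only; cancelAll and the three dummy workers are invented for the example), cancelling the master job cancels every coroutine launched from the scope:

```kotlin
import kotlinx.coroutines.*

// Launches three long-running workers from one scope, cancels the master job,
// and returns how many workers ended up cancelled.
fun cancelAll(): Int = runBlocking {
    val masterJob = SupervisorJob()
    val scope = CoroutineScope(Dispatchers.Default + masterJob)
    val workers = List(3) { scope.launch { delay(60_000) } }

    masterJob.cancel() // cancels every child of the master job
    workers.joinAll()  // returns quickly because all workers were cancelled
    workers.count { it.isCancelled }
}

fun main() {
    println(cancelAll()) // 3
}
```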

When to use coroutineScope vs supervisorScope?

Can someone explain what exactly is the difference between these two?
When do you use one over the other?
Thanks in advance.
The best way to explain the difference is to explain the mechanism of coroutineScope. Consider this code:
suspend fun main() = println(compute())
suspend fun compute(): String = coroutineScope {
val color = async { delay(60_000); "purple" }
val height = async<Double> { delay(100); throw HttpException() }
"A %s box %.1f inches tall".format(color.await(), height.await())
}
compute() "fetches" two things from the network (imagine the delays are actually network operations) and combines them into a string description. In this case the first fetch is taking a long time, but succeeds in the end; the second one fails almost right away, after 100 milliseconds.
What behavior would you like for the above code?
1. Would you like to color.await() for a minute, only to realize that the other network call has long failed?
2. Or perhaps you'd like the compute() function to realize after 100 ms that one of its network calls has failed and immediately fail itself?
With supervisorScope you're getting 1., with coroutineScope you're getting 2.
The behavior of 2. means that, even though async doesn't itself throw the exception (it just completes the Deferred you got from it), the failure immediately cancels its coroutine, which cancels the parent, which then cancels all the other children.
This behavior can be weird when you're unaware of it. If you go and catch the exception from await(), you'll think you've recovered from it, but you haven't. The entire coroutine scope is still being cancelled. In some cases there's a legitimate reason you don't want it: that's when you'll use supervisorScope.
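A short sketch of that pitfall, with names and exception type chosen by me for illustration: the catch block around await() runs and returns a fallback value, yet coroutineScope still fails with the child's exception, because the scope was already cancelled by the failing child.

```kotlin
import kotlinx.coroutines.*

suspend fun compute(): String = coroutineScope {
    val height = async<Double> { delay(100); throw IllegalStateException("fetch failed") }
    try {
        "%.1f".format(height.await())
    } catch (e: Exception) {
        // Looks like a recovery, but the scope has already been cancelled by the child's failure.
        // Depending on timing, e may be the original exception or a CancellationException.
        "fallback"
    }
}

// Returns what the caller of compute() actually observes.
fun outcome(): String = runBlocking {
    try {
        compute()
    } catch (e: IllegalStateException) {
        "compute() failed: ${e.message}"
    }
}

fun main() {
    println(outcome()) // compute() failed: fetch failed
}
```

Replacing coroutineScope with supervisorScope in compute() is the case where the catch block really does recover.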
A Few More Points
Let's make two changes to our program: use supervisorScope as discussed, but also swap the order of awaiting on child coroutines:
suspend fun main() = println(compute())
suspend fun compute(): String = supervisorScope {
val color = async { delay(60_000); "purple" }
val height = async<Double> { delay(100); throw HttpException() }
"The box is %.1f inches tall and it's %s".format(height.await(), color.await())
}
Now we first await on the short-lived, failing height coroutine. When run, this program produces an exception after 100 ms and doesn't seem to await on color at all, even though we are using supervisorScope. This seems to contradict the contract of supervisorScope.
What is actually happening is that height.await() throws the exception as well, an event distinct from the underlying coroutine throwing it.
Since we aren't handling the exception, it escapes from the top-level block of supervisorScope and makes it complete abruptly. This condition — distinct from a child coroutine completing abruptly — makes supervisorScope cancel all its child coroutines, but it still awaits on all of them to complete.
So let's add exception handling around the awaits:
suspend fun compute(): String = supervisorScope {
val color = async { delay(60_000); "purple" }
val height = async<Double> { delay(100); throw Exception() }
try {
"The box is %.1f inches tall and it's %s".format(height.await(), color.await())
} catch (e: Exception) {
"there was an error"
}
}
Now the program does nothing for 60 seconds, awaiting the completion of color, just as described.
Or, in another variation, let's remove exception handling around awaits, but make the color coroutine handle the CancellationException, wait for 2 seconds, and then complete:
suspend fun compute(): String = coroutineScope {
val color = async {
try {
delay(60_000); "purple"
} catch (e: CancellationException) {
withContext(NonCancellable) { delay(2_000) }
println("color got cancelled")
"got error"
}
}
val height = async<Double> { delay(100); throw Exception() }
"The box is %.1f inches tall and it's %s".format(height.await(), color.await())
}
This does nothing for 2.1 seconds, then prints "color got cancelled", and then completes with a top-level exception — proving that the child coroutines are indeed awaited on even when the top-level block crashes.
I think Roman Elizarov explains it in quite some detail, but to make it short:
Coroutines create the following kind of hierarchy:
Parent Coroutine
Child coroutine 1
Child coroutine 2
...
Child coroutine N
Assume that "Coroutine i" fails. What do you want to happen with its parent?
If you want its parent to fail as well, use coroutineScope. That's what structured concurrency is all about.
But if you don't want it to fail, for example because the child was some kind of background task which can be started again, then use supervisorScope.
coroutineScope -> The scope and all its children fail whenever any of the children fails
supervisorScope -> The scope and all its children do NOT fail whenever any of the children fails
As @N1hk mentioned, if you use async, the order of calling await matters. And if you depend on the results of both async..await blocks, it makes sense to cancel as early as possible, so that is not an ideal example for supervisorScope.
The difference is more apparent when using launch and join:
fun main() = runBlocking {
supervisorScope {
val task1 = launch {
println("Task 1 started")
delay(100)
if (true) throw Exception("Oops!")
println("Task 1 completed!")
}
val task2 = launch {
println("Task 2 started")
delay(1000)
println("Task 2 completed!")
}
listOf(task1, task2).joinAll()
println("Finished waiting for both tasks")
}
print("Done!")
}
With supervisorScope, the output would be:
Task 1 started
Task 2 started
Exception in thread "main" java.lang.Exception: Oops!
...
Task 2 completed!
Finished waiting for both tasks
Done!
With coroutineScope the output would be just:
Task 1 started
Task 2 started
coroutineScope -> cancels all of its children whenever any one of them fails.
supervisorScope -> use it when the other tasks should continue even if one fails. A supervisorScope won't cancel other children when one of them fails.
Here is a useful link for understanding of coroutine in details:
https://blog.mindorks.com/mastering-kotlin-coroutines-in-android-step-by-step-guide