I've been reading up about concurrency in Kotlin and thought I started to understand it... Then I discovered that async() has been deprecated in 1.3 and I'm back to the start.
Here's what I'd like to do: create a thread (and it does have to be a thread rather than a managed pool, unfortunately), and then be able to execute async blocks on that thread, and return Deferred instances that will let me use .await().
What is the recommended way to do this in Kotlin?
1. Single-threaded coroutine dispatcher
Here's what I'd like to do: create a thread (and it does have to be a thread rather than a managed pool, unfortunately)
Starting a raw thread to handle your coroutines is an option only if you're prepared to dive deep and implement your own coroutine dispatcher for that case. Kotlin offers support for your requirement via a single-threaded executor service wrapped into a dispatcher. Note that this still leaves you with almost complete control over how you start the thread, if you use the overload that takes a thread factory:
// Named singleThread to match the snippets below
val singleThread = Executors.newSingleThreadExecutor {
    task -> Thread(task, "my-background-thread")
}.asCoroutineDispatcher()
2. async-await vs. withContext
and then be able to execute async blocks on that thread, and return Deferred instances that will let me use .await().
Make sure you actually need async-await, which means you need it for something other than
val result = async(singleThread) { blockingCall() }.await()
Use async-await only if you need to launch a background task, do some more stuff on the calling thread, and only then await() on it.
Most users new to coroutines latch onto this mechanism because it is familiar from other languages, and use it for plain sequential code like the above, just to avoid blocking the UI thread. Kotlin has a "sequential by default" philosophy, which means you should instead use
val result = withContext(singleThread) { blockingCall() }
This doesn't launch a new coroutine in the background thread, but transfers the execution of the current coroutine onto it and back when it's done.
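To make the point concrete, here is a minimal sketch that runs a blocking call on the dedicated thread with withContext. The names blockingCall and runOnSingleThread are made up for illustration, and blockingCall is only a stand-in for real blocking work:

```kotlin
import kotlinx.coroutines.asCoroutineDispatcher
import kotlinx.coroutines.withContext
import java.util.concurrent.Executors

// Hypothetical blocking call; it reports the thread it ran on.
fun blockingCall(): String {
    Thread.sleep(50)
    return Thread.currentThread().name
}

// Moves the current coroutine onto the dedicated thread, runs the blocking
// call there, then resumes the caller in its original context.
suspend fun runOnSingleThread(): String {
    val singleThread = Executors.newSingleThreadExecutor { task ->
        Thread(task, "my-background-thread")
    }.asCoroutineDispatcher()
    try {
        return withContext(singleThread) { blockingCall() }
    } finally {
        singleThread.close() // shuts down the underlying executor
    }
}
```

In a real application you would probably keep the dispatcher around for the lifetime of the component rather than create and close it per call; it is created inline here only to keep the sketch self-contained.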
3. Deprecated top-level async
Then I discovered that async() has been deprecated in 1.3
Spawning free-running background tasks is a generally unsound practice because it doesn't behave well in the case of errors or even just unusual patterns of execution. Your calling method may return or fail without awaiting on its result, but the background task will go on. If the application repeatedly re-enters the code that spawns the background task, your singleThread executor's queue will grow without bound. All these tasks will run without a purpose because their requestor is long gone.
This is why Kotlin has deprecated top-level coroutine builders and now you must explicitly qualify them with a coroutine scope whose lifetime you must define according to your use case. When the scope's lifetime runs out, it will automatically cancel all the coroutines spawned within it.
On Android, for example, this would amount to binding the coroutine scope to the lifetime of an Activity, as explained in the KDoc of CoroutineScope.
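Outside Android, one way to get what the question asks for (Deferred results from a dedicated thread, with an explicit lifetime) could look like the sketch below. BackgroundWorker and submit are hypothetical names, not library API:

```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Deferred
import kotlinx.coroutines.SupervisorJob
import kotlinx.coroutines.asCoroutineDispatcher
import kotlinx.coroutines.async
import kotlinx.coroutines.cancel
import java.util.concurrent.Executors

// Hypothetical owner of one background thread and of the scope bound to it.
class BackgroundWorker : AutoCloseable {
    private val dispatcher = Executors.newSingleThreadExecutor { task ->
        Thread(task, "my-background-thread")
    }.asCoroutineDispatcher()

    // SupervisorJob: one failed task does not cancel the other tasks.
    private val scope = CoroutineScope(SupervisorJob() + dispatcher)

    // Starts the block on the background thread and returns its Deferred.
    fun <T> submit(block: suspend () -> T): Deferred<T> = scope.async { block() }

    // Ends the scope's lifetime: cancels pending tasks and stops the thread.
    override fun close() {
        scope.cancel()
        dispatcher.close()
    }
}
```

A caller would do something like val d = worker.submit { compute() } and later d.await() from a coroutine. Once close() has been called, anything still queued is cancelled instead of running without a requestor.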
As the deprecation message states, it's deprecated in favor of calling async with an explicit scope, like GlobalScope.async {}.
This is the actual implementation of the deprecated method as well.
By removing the top level async function, you'll not run into issues with implicit scopes or wrong imports.
Let me recommend this solution: Kotlin coroutines with returned value
It parallelizes tasks into 3 background threads (so called "triplets pool") but it's easy to change it to be single threaded as per your requirement by replacing tripletsPool with backgroundThread as below:
private val backgroundThread = ThreadPoolExecutor(1, 1, 5L, TimeUnit.SECONDS, LinkedBlockingQueue())
I am trying to understand the coroutineScope() suspend function in Kotlin and I'm having a hard time understanding the exact purpose of this function.
As per the kotlinlang docs,
This function is designed for parallel decomposition of work. When any
child coroutine in this scope fails, this scope fails and all the rest
of the children are cancelled (for a different behavior see
supervisorScope). This function returns as soon as the given block and
all its children coroutines are completed.
But I feel this behavior can be achieved by launching a child coroutine and calling join on it.
So for example
suspend fun other() {
    coroutineScope {
        launch { /* some task */ }
        async { /* some task */ }
    }
}
This can be written as (scope is a reference to the scope created by the parent coroutine)
suspend fun other(scope: CoroutineScope) {
    scope.launch {
        launch { /* some task */ }
        async { /* some task */ }
    }.join()
}
Is there any difference between these two approaches since it looks
like they will produce same result and also seem to work in the same fashion?
If not, is coroutineScope merely a way to reduce this
boilerplate code of passing scope from parent coroutine and
calling join on child coroutine?
TLDR
Using CoroutineScope as in the example adds boilerplate code, is more confusing, error-prone and may handle cases like errors and cancellations differently. coroutineScope() is generally preferred in such cases.
Full answer
These two patterns are conceptually different and are used in different cases. Coroutines are all about sequential code and structured concurrency. Sequential means we can write traditional code that waits in place, without callbacks, and still not take a performance hit. Structured concurrency means concurrent tasks have owners, and tasks consist of smaller sub-tasks that are explicit to the framework.
By mixing both above together we get a very easy to use and error-proof concurrency model where in most cases we don't have to launch background jobs and then manage them manually, watch for errors, handle cancellations, etc. We simply fork into sub-tasks and then join them in-place - that's all.
In Kotlin this is represented by suspend functions. Suspend functions are always executed within some context. This context is passed everywhere implicitly, and the coroutines framework provides utils to use it easily. One of the most common patterns is to fork and then join, and this is exactly what coroutineScope() does. It creates a scope for launching sub-tasks, and we can't leave this scope until all children have completed. We don't have to pass the scope manually, we don't have to join, we don't have to pass errors from children to their siblings and to the parent, we don't have to pass cancellations from the parent to children - this is all automatic.
Therefore, suspend functions and coroutineScope() should be the default way of writing concurrent code with coroutines. This approach is easy to write, easy to read and it is error-proof. We can't easily leak a background task, because coroutineScope() won't let us go anywhere. We can't mistakenly ignore errors from background tasks. Etc.
Of course, in some cases we can't use this pattern. Sometimes, we actually would like to only launch a long-running task and return immediately. Sometimes, we don't consider the caller to be the owner of the task. For example, we could have some kind of a service that manages its tasks and we only schedule these tasks, but the service itself owns them. For these cases we can use CoroutineScope.
By using the scope explicitly we can launch tasks in a different context than the current one, or from outside the coroutine world. We generally have more control, but at the same time we partially opt out of the code correctness guarantees I mentioned above. For example, if we forget to invoke join() we can easily leak background tasks or perform operations in an unexpected order. Also, in your case, if the coroutine invoking other() is cancelled, all launched operations will still be running in the background. For these reasons, we should use CoroutineScope explicitly only if needed.
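The cancellation difference can be observed with a small sketch. All names here (structured, unstructured, runDemo) are made up for the demonstration, and the timings are chosen only to keep the ordering unambiguous:

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.atomic.AtomicBoolean

// Variant 1: the child belongs to the caller, so cancelling the caller cancels it.
suspend fun structured(done: AtomicBoolean) {
    coroutineScope {
        launch { delay(200); done.set(true) }
    }
}

// Variant 2: the child belongs to the passed scope; cancelling the caller
// only interrupts join(), the launched work keeps running.
suspend fun unstructured(scope: CoroutineScope, done: AtomicBoolean) {
    scope.launch { delay(200); done.set(true) }.join()
}

// Returns (did the structured task finish, did the unstructured task finish).
fun runDemo(): Pair<Boolean, Boolean> = runBlocking {
    val structuredDone = AtomicBoolean(false)
    val unstructuredDone = AtomicBoolean(false)
    val outer = CoroutineScope(Dispatchers.Default + Job()) // hypothetical external scope

    val a = launch { structured(structuredDone) }
    val b = launch { unstructured(outer, unstructuredDone) }
    delay(50)      // both callers are now suspended
    a.cancel()
    b.cancel()
    delay(400)     // give any surviving work time to finish
    outer.cancel()
    structuredDone.get() to unstructuredDone.get()
}
```

With these timings runDemo() should return (false, true): the task started inside coroutineScope is cancelled together with its caller, while the task launched in the passed scope outlives the cancelled caller.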
Common patterns
As a result of all that was said above, when working with coroutines we usually use one of these patterns:
Suspend function - it runs within the caller context and it waits for all its subtasks, it doesn't launch anything in the background.
Function receiving CoroutineScope either as a param or receiver - usually, that means the function wants to do something with the context even after returning (because otherwise it could be simply a suspend function). It either launches some background tasks or stores the context somewhere for a later use.
Regular function that uses its own CoroutineScope to launch tasks. Usually, this is some kind of a service that keeps its custom context.
At least to me, a function that is suspend and also receives a CoroutineScope is pretty confusing; it is not entirely clear what to expect from it. Will it execute the operation in the caller context or in the provided one? Will it wait to finish or only schedule the operation in the background and return immediately? Maybe it will do both: first do some initial processing synchronously (therefore suspend), but also schedule an additional task in the background (therefore scope: CoroutineScope)? We don't know this; we have to read the documentation or source code to understand its behavior. Your second example is an unnecessary complication over a simple suspend function.
To further make my point consider this example:
data class User(
    val firstName: String,
    val lastName: String,
) {
    fun getFullName(user: User) = ...
}
This example is far from perfect, but the main point is that it is confusing why we have to pass user to getFullName() if we already call this function on a user. We don't know whether it returns the full name of the passed user, of the user we invoked the function on, or maybe some kind of a mix. If it were a member function not receiving a User, or a static utility function receiving a User, everything would be clear. But a member function receiving a User is simply confusing. This is similar to your second example, where we pass the context both implicitly and explicitly and we don't know which one is used and how exactly.
I'm trying to understand Kotlin coroutines. I'm coming from C# and there's something I'm not understanding here in Kotlin. In this scenario I'm writing a web API using Kotlin in the Quarkus framework. From what I can tell, if I label a controller (or resource) function as a suspend function, Quarkus will automatically launch it in a coroutine.
The issue I have is I don't know what the preferred method for suspending that coroutine is. The vast majority of examples I see on Kotlin coroutines use the delay() function, which internally uses suspendCancellableCoroutine() to suspend the function. That makes sense, but I don't see a lot of examples calling suspendCancellableCoroutine() explicitly. I've done some reading about the underlying code that gets generated in a suspend function, and some resources lead me to believe that by virtue of calling another suspend function I'll hit a suspend point and that will suspend my coroutine. In C# I'd usually just call await() from inside my async function to execute the long-running code.
In my Kotlin setup I have set up an instance of JMeter and I simulate 5 threads calling my API at the same time, while limiting my program to run on a single thread in Quarkus. My API then makes a call to another API (I'll call that API the data API from now on), which could be a long-running operation. For the purpose of my test my data API has a 1 second sleep in it.
Essentially:
web api controller -> web api processing -> web api calls data api through client -> data API does slow operation
I've tried calling async/await on the call to the data API, which seems to work: JMeter reports that 5 requests are all completed in roughly 1 second, and the logging I have indicates that all 5 requests are handled on a single thread. This feels clunky though. I'm already in a coroutine and now my coroutine is creating a new coroutine (async is a coroutine builder) to execute the long-running function.
I've also removed the async/await and updated the call to the data API to be a suspend function as well (though this is a client generated from the RESTEasy client). This also seems to work, but RESTEasy Reactive could be generating something that's doing the suspend for me. I need to work with a simpler example, but in the meantime...
If I'm not using the delay() function in Kotlin, and I'm executing code in a coroutine, what is the preferred method to indicate that a section of code is potentially blocking and my coroutine should be suspended? Do I launch a new coroutine? Call suspendCancellableCoroutine()? Or something else? Probably overthinking this, but I want to make sure I understand this.
The coroutines library provides several suspend functions you can use to suspend in a coroutine or in another suspend function, among them:
withContext
delay
coroutineScope
supervisorScope
suspendCoroutine
suspendCancellableCoroutine
Job.join
Deferred.await
The typical way to convert blocking (long-running synchronous) code into something you can use in a coroutine is to wrap it in withContext(Dispatchers.Default) { } or withContext(Dispatchers.IO) { }. If it's something you use repeatedly, you can write a suspend function for it:
suspend fun foo() = withContext(Dispatchers.IO) {
    blockingFoo()
}
but if it's some one-off blocking chunk of code, you can use withContext directly in a coroutine.
Note, using async { }.await() is basically never done. The IDE flags it with a warning. You should be using withContext instead. Calling await on a Deferred is used either when one coroutine needs a result from some other coroutine that has been passed to it, or when you're working with multiple parallel children coroutines inside a coroutineScope block.
The typical way to convert asynchronous callback-based code into a suspend function so you can use it synchronously in a coroutine is to use suspendCoroutine or suspendCancellableCoroutine. You can look up how to use those. They are pretty low level. Many libraries like Retrofit and Firebase already provide suspend functions you can use instead of the callbacks.
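As a rough illustration of that conversion, here is a sketch that wraps a callback API with suspendCancellableCoroutine. The Callback interface and fetchAsync are invented for this example and stand in for whatever callback-based library you actually have:

```kotlin
import kotlinx.coroutines.suspendCancellableCoroutine
import kotlin.concurrent.thread
import kotlin.coroutines.resume
import kotlin.coroutines.resumeWithException

// Hypothetical callback-based API, not from any real library.
interface Callback {
    fun onSuccess(value: String)
    fun onError(e: Throwable)
}

// Hypothetical async call: does slow work on its own thread, then calls back.
fun fetchAsync(callback: Callback): Thread = thread {
    try {
        Thread.sleep(50)
        callback.onSuccess("data")
    } catch (e: InterruptedException) {
        callback.onError(e)
    }
}

// Suspends the coroutine until the callback fires; no thread is blocked by the caller.
suspend fun fetch(): String = suspendCancellableCoroutine { cont ->
    val worker = fetchAsync(object : Callback {
        override fun onSuccess(value: String) = cont.resume(value)
        override fun onError(e: Throwable) = cont.resumeWithException(e)
    })
    // If the coroutine is cancelled while waiting, stop the underlying work.
    cont.invokeOnCancellation { worker.interrupt() }
}
```

From a coroutine you then simply write val data = fetch(), and the code reads sequentially even though the work is callback-driven underneath.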
coroutineScope and supervisorScope are for creating a scope inside your coroutine to run multiple children coroutines in parallel and wait for them all.
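For the parallel-children case, a minimal sketch might look like this. loadBoth is a made-up name and the delay calls stand in for two independent suspending operations:

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay

// Runs two children concurrently and waits for both before returning.
// If either child fails, the other is cancelled and the failure propagates.
suspend fun loadBoth(): Int = coroutineScope {
    val first = async { delay(100); 1 }   // placeholder for a real suspending call
    val second = async { delay(100); 2 }  // placeholder for a real suspending call
    first.await() + second.await()
}
```

This is the situation where async and await are the right tool: two tasks overlap in time, and the function does not return until both are finished.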
I often create classes that have functions that contain a coroutine. It isn't always clear whether the function is being used by some component that is bound to the UI or whether it's doing background work that is more IO oriented. Here's an example:
fun myFunction() {
    GlobalScope.launch {
        // Do something
    }
}
In this example, no Dispatchers.Main or Dispatchers.IO is specified. Is this the correct way to do this? Does the coroutine use the scope of whatever the calling client happens to be using? Should I only specify a dispatcher when I know definitively that I need a specific scope?
GlobalScope binds the lifecycle of the Coroutine to the lifecycle of the application itself.
This means a coroutine started from this scope continues to live until one of two things occurs:
Coroutine completes its job.
The Application itself is killed.
Using async or launch on the instance of GlobalScope is highly discouraged.
No Dispatchers.Main or Dispatchers.IO is specified. Is this the correct way to do this?
Yea, why not? If the work inside coroutine is not related to either UI or IO go for it.
Should I only specify a dispatcher when I know definitively that I
need a specific scope?
To answer this, let's first see the definition of launch from docs,
fun CoroutineScope.launch(
    context: CoroutineContext = EmptyCoroutineContext,
    start: CoroutineStart = CoroutineStart.DEFAULT,
    block: suspend CoroutineScope.() -> Unit
): Job
The dispatcher we are talking about is a kind of CoroutineContext element. As the definition shows, if no CoroutineContext is passed (which means no dispatcher is passed either), the parameter defaults to EmptyCoroutineContext. If the scope's context does not contain a dispatcher either, the builder falls back to Dispatchers.Default, and this is what the docs say about it:
The default CoroutineDispatcher that is used by all standard builders
like launch, async, etc if neither a dispatcher nor any other
ContinuationInterceptor is specified in their context.
It is backed by a shared pool of threads on JVM. By default, the
maximum number of threads used by this dispatcher is equal to the
number of CPU cores, but is at least two.
So even if you forget to specify the dispatcher, the scheduler will pick an available thread from the pool and hand it the coroutine. But make sure not to start any UI-related work without specifying the dispatcher.
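You can check the fallback yourself with a small sketch like the one below. usesDefaultDispatcher is a hypothetical helper, and the scope is deliberately created without a dispatcher:

```kotlin
import kotlinx.coroutines.*
import kotlin.coroutines.ContinuationInterceptor

// Launches a coroutine without specifying any dispatcher and reports
// whether it ended up on Dispatchers.Default.
fun usesDefaultDispatcher(): Boolean = runBlocking {
    val scope = CoroutineScope(Job()) // no dispatcher in this scope's context
    val result = CompletableDeferred<Boolean>()
    scope.launch {
        // The dispatcher is stored in the context under the ContinuationInterceptor key.
        result.complete(coroutineContext[ContinuationInterceptor] === Dispatchers.Default)
    }
    result.await().also { scope.cancel() }
}
```

Note that this only holds when neither the scope nor the launch call supplies a dispatcher. A coroutine launched inside another coroutine inherits the parent's dispatcher instead.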
First of all, you must differentiate the scope from the context and dispatcher.
Coroutine scope is primarily about the lifecycle of the coroutine and deals with the concept of structured concurrency. It may have a default dispatcher, which would be the one logically associated with the object to which you tie the coroutine's lifecycle. For example, if you scope a coroutine to an Android activity, the default dispatcher will be UI.
Coroutine context holds the dispatcher, among other elements. The context can change during the coroutine's execution, as the logic inside requires it. Typically, you will use withContext to temporarily switch dispatchers in order to avoid blocking the UI thread. You will not typically launch the whole coroutine in the thread pool, unless all of it should run on a background thread (e.g., no UI interaction).
Second, the choice of dispatcher should be collocated with the code that requires a specific one. It should happen within the function that deals with a given concern, like making REST requests or DB operations. This once again reinforces the practice not to decide on dispatchers when launching the coroutine.
GlobalScope has an empty coroutine context (EmptyCoroutineContext), and all coroutines launched in it behave like daemon threads. The scope has no Job, so they cannot be cancelled through it and remain active until they complete, unless you keep and cancel each returned Job yourself. I suggest implementing a specific scope and not using GlobalScope, so that you can control all the coroutines that are launched. GlobalScope uses Dispatchers.Default as its default dispatcher, so in your case you always create coroutines on the default dispatcher.
After reading the introduction and the javadoc of CoroutineScope I'm still a little confused what the idea behind a CoroutineScope is.
The first sentence of the doc "Defines a scope for new coroutines." is not clear to me: Why do my coroutines need a scope?
Also, why are standalone coroutine builders deprecated? Why is it better to do this:
fun CoroutineScope.produceSquares(): ReceiveChannel<Int> = produce {
    for (x in 1..5) send(x * x)
}
instead of
fun produceSquares(): ReceiveChannel<Int> = produce { // no longer an extension function
    for (x in 1..5) send(x * x)
}
You can still use global "standalone" coroutines by spawning them in GlobalScope:
GlobalScope.launch {
    println("I'm running unstructured")
}
However, it's not recommended to do this, since creating coroutines on a global scope is basically the same as what we did with good old threads: you create them but somehow need to keep track of a reference to later join or cancel them.
Using structured concurrency, that is nesting coroutines in their scopes, you will have a more maintainable system overall. For example, if you spawn a coroutine inside another one, you inherit the outer scope. This has multiple advantages. If you cancel the outer coroutine, the cancellation will be delegated to its inner coroutines. Also, you can be sure that the outer coroutine will not complete before all its children coroutines have done their work.
There's also a very good example shown in the documentation for CoroutineScope.
CoroutineScope should be implemented on entities with well-defined lifecycle that are responsible for launching children coroutines. Example of such entity on Android is Activity.
After all, the first version of your shown produceSquares methods is better as it is only executable if invoked in a CoroutineScope. That means you can run it inside any other coroutine:
launch {
    produceSquares()
}
The coroutine created inside produceSquares inherits the scope of launch. You can be sure that launch does not complete before produceSquares. Also, if you cancelled launch, this would also affect produceSquares.
Furthermore, you can still create a globally running coroutine like this:
GlobalScope.produceSquares()
But, as mentioned, that's not the best option in most cases.
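For completeness, here is a sketch of the scoped version being consumed. collectSquares is a made-up helper, and the opt-in annotation is there because produce is marked experimental in the kotlinx.coroutines versions I know of:

```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.ExperimentalCoroutinesApi
import kotlinx.coroutines.channels.ReceiveChannel
import kotlinx.coroutines.channels.produce
import kotlinx.coroutines.runBlocking

// The producer coroutine becomes a child of whichever scope calls this.
@OptIn(ExperimentalCoroutinesApi::class)
fun CoroutineScope.produceSquares(): ReceiveChannel<Int> = produce {
    for (x in 1..5) send(x * x)
}

// Consumes the channel inside runBlocking's scope; the producer cannot
// outlive this block because it is a child of it.
fun collectSquares(): List<Int> = runBlocking {
    val squares = mutableListOf<Int>()
    for (x in produceSquares()) squares += x
    squares
}
```

Calling collectSquares() yields [1, 4, 9, 16, 25]. If the consuming block were cancelled halfway through, the producer would be cancelled with it, which is exactly what the scope buys you.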
I'd also like to promote an article I wrote. There are some examples demonstrating what scopes mean: https://kotlinexpertise.com/kotlin-coroutines-concurrency/
It is related to the concept of structured concurrency, which defines a structure between coroutines.
On a more philosophical level, you rarely launch coroutines “globally”, like you do with threads. Coroutines are always related to some local scope in your application, which is an entity with a limited life-time, like a UI element. So, with structured concurrency we now require that launch is invoked in a CoroutineScope, which is an interface implemented by your life-time limited objects (like UI elements or their corresponding view models).
As an evident consequence of this concept: by cancelling the context of a scope, all its subcoroutines will be cancelled, too.
Is there a specific language implementation in Kotlin which differs from another language's implementation of coroutines?
What does it mean that a coroutine is like a lightweight thread?
What is the difference?
Are Kotlin coroutines actually running in parallel (concurrently)?
Even in a multi-core system, is there only one coroutine running at any given time?
Here I'm starting 100,000 coroutines. What happens behind this code?
for (i in 0..100000) {
    async(CommonPool) {
        // Run long-running operations
    }
}
What does it mean that a coroutine is like a lightweight thread?
Coroutine, like a thread, represents a sequence of actions that are executed concurrently with other coroutines (threads).
What is the difference?
A thread is directly linked to the native thread in the corresponding OS (operating system) and consumes a considerable amount of resources. In particular, it consumes a lot of memory for its stack. That is why you cannot just create 100k threads. You are likely to run out of memory. Switching between threads involves OS kernel dispatcher and it is a pretty expensive operation in terms of CPU cycles consumed.
A coroutine, on the other hand, is purely a user-level language abstraction. It does not tie any native resources and, in the simplest case, uses just one relatively small object in the JVM heap. That is why it is easy to create 100k coroutines. Switching between coroutines does not involve OS kernel at all. It can be as cheap as invoking a regular function.
Are Kotlin coroutines actually running in parallel (concurrently)? Even in a multi-core system, is there only one coroutine running at any given time?
A coroutine can be either running or suspended. A suspended coroutine is not associated with any particular thread, but a running coroutine runs on some thread (using a thread is the only way to execute anything inside an OS process). Whether different coroutines all run on the same thread (and thus may use only a single CPU in a multicore system) or on different threads (and thus may use multiple CPUs) is purely in the hands of the programmer who is using coroutines.
In Kotlin, dispatching of coroutines is controlled via the coroutine context. You can read more about it in the Guide to kotlinx.coroutines.
Here I'm starting 100,000 coroutines. What happens behind this code?
Assuming that you are using launch function and CommonPool context from the kotlinx.coroutines project (which is open source) you can examine their source code here:
launch is defined here https://github.com/Kotlin/kotlinx.coroutines/blob/master/core/kotlinx-coroutines-core/src/main/kotlin/kotlinx/coroutines/experimental/Builders.kt
CommonPool is defined here https://github.com/Kotlin/kotlinx.coroutines/blob/master/core/kotlinx-coroutines-core/src/main/kotlin/kotlinx/coroutines/experimental/CommonPool.kt
The launch just creates new coroutine, while CommonPool dispatches coroutines to a ForkJoinPool.commonPool() which does use multiple threads and thus executes on multiple CPUs in this example.
The code that follows the launch invocation in {...} is called a suspending lambda. What it is, and how suspending lambdas and functions are implemented (compiled), along with standard library functions and classes like startCoroutine, suspendCoroutine and CoroutineContext, is explained in the corresponding Kotlin coroutines design document.
Since I used coroutines only on JVM, I will talk about the JVM backend. There are also Kotlin Native and Kotlin JavaScript, but these backends for Kotlin are out of my scope.
So let's start by comparing Kotlin coroutines to coroutines in other languages. Basically, you should know that there are two types of coroutines: stackless and stackful. Kotlin implements stackless coroutines, which means that a coroutine doesn't have its own stack, and that limits a little what a coroutine can do. You can read a good explanation here.
Examples:
Stackless: C#, Scala, Kotlin
Stackful: Quasar, Javaflow
What does it mean that a coroutine is like a lightweight thread?
It means that a coroutine in Kotlin doesn't have its own stack, doesn't map onto a native thread, and doesn't require context switching on a processor.
What is the difference?
Thread - preemptive multitasking (usually).
Coroutine - cooperative multitasking.
Thread - managed by OS (usually).
Coroutine - managed by a user.
Are Kotlin coroutines actually running in parallel (concurrently)?
It depends. You can run each coroutine in its own thread, or you can run all coroutines in one thread or some fixed thread pool.
More about how coroutines execute is here.
Even in a multi-core system, is there only one coroutine running at any given time?
No, see the previous answer.
Here I'm starting 100,000 coroutines. What happens behind this code?
Actually, it depends. But assume that you write the following code:
fun main(args: Array<String>) {
    for (i in 0..100000) {
        async(CommonPool) {
            delay(1000)
        }
    }
}
This code finishes instantly, because nothing waits for the results of the async calls.
So let's fix this:
fun main(args: Array<String>) = runBlocking {
    for (i in 0..100000) {
        val job = async(CommonPool) {
            delay(1)
            println(i)
        }
        job.join()
    }
}
When you run this program, Kotlin will create 2 * 100000 instances of Continuation, which will take a few dozen MB of RAM, and in the console you will see the numbers from 0 to 100000.
So let’s rewrite this code in this way:
fun main(args: Array<String>) = runBlocking {
    val job = async(CommonPool) {
        for (i in 0..100000) {
            delay(1)
            println(i)
        }
    }
    job.join()
}
What do we achieve now? Now we create only 100,001 instances of Continuation, and this is much better.
Each created Continuation will be dispatched and executed on CommonPool (which is a static instance of ForkJoinPool).
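The snippets above use the old experimental API. With current kotlinx.coroutines, where Dispatchers.Default takes the place of CommonPool, a comparable experiment could be sketched like this (countCompleted is a made-up name):

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import java.util.concurrent.atomic.AtomicInteger

// Starts n coroutines on the shared default pool and waits for all of them.
fun countCompleted(n: Int): Int = runBlocking {
    val counter = AtomicInteger()
    coroutineScope {
        repeat(n) {
            launch(Dispatchers.Default) {
                delay(10) // suspends without holding a thread
                counter.incrementAndGet()
            }
        }
    } // returns only after every child has finished
    counter.get()
}
```

Calling countCompleted(100_000) runs all the coroutines on a handful of pool threads, because a suspended coroutine does not occupy a thread while it waits. Doing the same with 100,000 real threads is what typically runs out of memory.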