I have written three simple programs to test the performance advantage of coroutines over threads. Each program performs a large number of simple computations, and each was run separately from the others. Besides execution time, I measured CPU usage with the VisualVM IDE plugin.
The first program does all computations on a pool of 1000 threads. It shows the worst result of the three (64326 ms), presumably because of frequent context switches:
val executor = Executors.newFixedThreadPool(1000)
time = generateSequence {
    measureTimeMillis {
        val comps = mutableListOf<Future<Int>>()
        for (i in 1..1_000_000) {
            comps += executor.submit<Int> { computation2(); 15 }
        }
        comps.map { it.get() }.sum()
    }
}.take(100).sum()
println("Completed in $time ms")
executor.shutdownNow()
The second program has the same logic, but instead of a 1000-thread pool it uses an n-thread pool (where n equals the number of cores on the machine). It shows much better results (43939 ms) and uses fewer threads, which is good too.
val executor2 = Executors.newFixedThreadPool(4)
time = generateSequence {
    measureTimeMillis {
        val comps = mutableListOf<Future<Int>>()
        for (i in 1..1_000_000) {
            comps += executor2.submit<Int> { computation2(); 15 }
        }
        comps.map { it.get() }.sum()
    }
}.take(100).sum()
println("Completed in $time ms")
executor2.shutdownNow()
The third program is written with coroutines and shows a large variance in the results (from 41784 ms to 81101 ms). I am confused and don't understand why the results differ so much and why coroutines are sometimes slower than threads (considering that small async calculations are supposed to be a forte of coroutines). Here is the code:
time = generateSequence {
    runBlocking {
        measureTimeMillis {
            val comps = mutableListOf<Deferred<Int>>()
            for (i in 1..1_000_000) {
                comps += async { computation2(); 15 }
            }
            comps.map { it.await() }.sum()
        }
    }
}.take(100).sum()
println("Completed in $time ms")
I have read a lot about coroutines and how they are implemented in Kotlin, but in practice I don't see them working as intended. Am I doing my benchmarking wrong? Or maybe I'm using coroutines wrong?
The way you've set up your problem, you shouldn't expect any benefit from coroutines. In all cases you submit a non-divisible block of computation to an executor. You are not leveraging the idea of coroutine suspension, where you can write sequential code that actually gets chopped up and executed piecewise, possibly on different threads.
Most use cases of coroutines revolve around blocking code: avoiding the scenario where you hog a thread to do nothing but wait for a response. They may also be used to interleave CPU-intensive tasks, but this is a more special-cased scenario.
I would suggest benchmarking 1,000,000 tasks that involve several sequential blocking steps, like in Roman Elizarov's KotlinConf 2017 talk:
suspend fun postItem(item: Item) {
    val token = requestToken()
    val post = createPost(token, item)
    processPost(post)
}
where all of requestToken(), createPost() and processPost() involve network calls.
If you have two implementations of this, one with suspend funs and another with regular blocking functions, for example:
fun requestToken(): String {
    Thread.sleep(1000)
    return "token"
}
vs.
suspend fun requestToken(): String {
    delay(1000)
    return "token"
}
you'll find that you can't even set up 1,000,000 concurrent invocations of the first version, and if you lower the number to what you can actually achieve without an OutOfMemoryError: unable to create new native thread, the performance advantage of coroutines should be evident.
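To make the comparison concrete, here is a minimal sketch of the suspending side, assuming kotlinx-coroutines-core is on the classpath. The helper runConcurrently() is a name made up for this illustration, and requestToken() only simulates a network call with delay():

```kotlin
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import java.util.concurrent.atomic.AtomicInteger
import kotlin.system.measureTimeMillis

// Simulated network call (assumption): suspends without holding a thread.
suspend fun requestToken(): String {
    delay(1000)
    return "token"
}

// Hypothetical helper: starts n concurrent invocations and returns how many completed.
fun runConcurrently(n: Int): Int = runBlocking {
    val completed = AtomicInteger()
    // coroutineScope returns only after every coroutine launched inside it has finished.
    coroutineScope {
        repeat(n) {
            launch {
                requestToken()
                completed.incrementAndGet()
            }
        }
    }
    completed.get()
}

fun main() {
    var done = 0
    val elapsed = measureTimeMillis { done = runConcurrently(100_000) }
    println("$done invocations completed in $elapsed ms")
}
```

All invocations share the single thread of runBlocking, so the total time should stay close to the one-second delay rather than grow with the number of invocations. The Thread.sleep() version would need one thread per concurrent invocation to achieve the same.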
If you want to explore possible advantages of coroutines for CPU-bound tasks, you need a use case where it's not irrelevant whether you execute them sequentially or in parallel. In your examples above, this is treated as an irrelevant internal detail: in one version you run 1,000 concurrent tasks and in the other one you use just four, so it's almost sequential execution.
Hazelcast Jet is an example of such a use case because the computation tasks are co-dependent: one's output is another one's input. In this case you can't just run a few of them until completion, on a small thread pool, you actually have to interleave them so the buffered output doesn't explode. If you try to set up such a scenario with and without coroutines, you'll once again find that you're either allocating as many threads as there are tasks, or you are using suspendable coroutines, and the latter approach wins. Hazelcast Jet implements the spirit of coroutines in plain Java API. Its approach would hugely benefit from the coroutine programming model, but currently it's pure Java.
Disclosure: the author of this post belongs to the Jet engineering team.
Coroutines are not designed to be faster than threads; they are designed for lower RAM consumption and a better syntax for async calls.
Coroutines are designed to be lightweight threads. They use less RAM because executing 1,000,000 concurrent coroutines doesn't require creating 1,000,000 threads. Coroutines help you make better use of threads and make execution more efficient, and you no longer need to manage the threads yourself. You can think of a coroutine as a Runnable or a task that you post to a handler and that is executed on a thread or thread pool.
I have code, something like this:
entities.forEach {
    launch() {
        doingSomethingWithDB(it)
    }
}
suspend fun doingSomethingWithDB(entity) {
    getDBConnectionFromPool()
    // doing something
    returnDBConnectionToPool()
}
And when the number of entities exceeds the size of the DB connection pool (I use HikariCP), I get the error - Connection is not available.... Even if I use only a single thread (e.g. -Dkotlinx.coroutines.io.parallelism=1), I get this error anyway.
Are there best practices for limiting the number of parallel coroutines when dealing with external resources (like fixed size DB connection pool)?
As your doingSomethingWithDB() acquires and releases resources manually at the beginning/end, limiting the parallelism is not sufficient in this case - we need to limit the concurrency. The easiest way to do this is by using a Semaphore:
val semaphore = Semaphore(8)
suspend fun doingSomethingWithDB(entity) {
    semaphore.withPermit {
        getDBConnectionFromPool()
        // doing something
        returnDBConnectionToPool()
    }
}
A few words of explanation: because coroutines can suspend and switch from thread to thread, even if we limit the parallelism of the coroutines that invoke doingSomethingWithDB(), this function can still be invoked an arbitrary number of times concurrently. Parallelism only determines how many coroutines can be actively executing at a specific moment in time; if any of them suspends, additional coroutines can proceed.
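The effect of the semaphore can be checked with a small self-contained sketch. It assumes kotlinx-coroutines-core; maxInFlight() is a helper invented for this illustration, and delay(10) merely stands in for the database work:

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.sync.Semaphore
import kotlinx.coroutines.sync.withPermit
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical helper: returns the highest number of coroutines that were
// inside the guarded section at the same time.
fun maxInFlight(tasks: Int, permits: Int): Int = runBlocking {
    val semaphore = Semaphore(permits)
    val inFlight = AtomicInteger()
    val peak = AtomicInteger()
    coroutineScope {
        repeat(tasks) {
            launch(Dispatchers.Default) {
                semaphore.withPermit {
                    val now = inFlight.incrementAndGet()
                    peak.updateAndGet { maxOf(it, now) }
                    delay(10) // stand-in for the work done while holding a connection
                    inFlight.decrementAndGet()
                }
            }
        }
    }
    peak.get()
}

fun main() {
    println("Peak concurrency: ${maxInFlight(tasks = 100, permits = 8)}")
}
```

Although 100 coroutines are launched and each one suspends inside the guarded section, the observed peak never exceeds the number of permits, which is the property a fixed-size connection pool needs.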
We are using Quarkus to process messages. The processing runs in a regular (non-suspending) function, and inside it we have to call a suspend function. Basically:
fun process(msg: Message): Message {
    val resFrom: Data = runBlocking { fetchDataFromDB(msg.key) }
    val processedMsg = processRoutingKey(msg, resFrom)
    return processedMsg
}
We would like to get the data as a Uni (https://smallrye.io/smallrye-mutiny/getting-started/creating-unis), so basically we would like to have:
fun process(msg: Message) {
    val resFrom: Uni<Data> = ConvertUni { fetchDataFromDB(msg.key) }
}
We need the Uni once further downstream to process some data, but we would like to return a Uni from the method, meaning:
fun process(msg: Message): Uni<Message> {
    val resFrom: Uni<Data> = ConvertUni { fetchDataFromDB(msg.key) }
    val processed: Uni<Message> = process(msg, resFrom)
    return processed
}
The signature fun process(msg:Message): Uni<Message> implies that some asynchronous mechanism needs to be started and will outlive the method call. It's like returning a Future or a Deferred. The function returns immediately but the underlying processing is not done yet.
In the coroutines world, this means you need to start a coroutine. However, like any async mechanism, it requires you to be conscious about where it will run, and for how long. This is defined by the CoroutineScope you use to start the coroutine, and this is why coroutine builders like async require such a scope.
So you need to pass a CoroutineScope to your function if you want it to start a coroutine that will last longer than the function call:
fun CoroutineScope.process(msg: Message): Uni<Message> {
    val uniResult = async { fetchDataFromDB(msg.key) }.asUni()
    return process(msg, uniResult)
}
Here Deferred<T>.asUni() is provided by the library mutiny-kotlin. In the examples given in their doc, they use GlobalScope instead of asking the caller to pass a coroutine scope. This is usually a bad practice because it means you don't control the lifetime of the started coroutine, and you may leak things if you're not careful.
Accepting a CoroutineScope as receiver means the caller of the method can choose the scope of this coroutine, which will automatically cancel the coroutine when appropriate, and will also define the thread pool / event loop on which the coroutine runs.
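How the scope controls the lifetime can be seen without Mutiny at all. The sketch below uses a plain Deferred in place of Uni (a deliberate substitution to keep it dependency-free beyond kotlinx-coroutines-core). The names fetchData(), fetchAsync(), fetchWithLiveScope() and isCancelledWithScope() are made up for this illustration:

```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Deferred
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.cancel
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking

// Hypothetical stand-in for fetchDataFromDB(): suspends for a while, then returns.
suspend fun fetchData(key: String): String {
    delay(200)
    return "data-for-$key"
}

// The caller's scope decides where the coroutine runs and when it is cancelled.
fun CoroutineScope.fetchAsync(key: String): Deferred<String> = async { fetchData(key) }

// Normal case: the scope stays alive until the result is ready.
fun fetchWithLiveScope(key: String): String = runBlocking { fetchAsync(key).await() }

// Cancelling the scope also cancels the coroutine that was started in it.
fun isCancelledWithScope(key: String): Boolean {
    val scope = CoroutineScope(Dispatchers.Default)
    val pending = scope.fetchAsync(key)
    scope.cancel()
    return pending.isCancelled
}

fun main() {
    println(fetchWithLiveScope("a"))
    println(isCancelledWithScope("b"))
}
```

With GlobalScope there would be no such handle: the coroutine would keep running until it finished on its own, regardless of whether anyone still needed the result.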
Now, with that in mind, you can see that you'll be using a mix of coroutines and Uni at the same level of API here, which is not great. I would advise sticking with suspend functions all the way, until you really have to convert to Uni.
I'm working on a new side project with a goal of more deeply learning Kotlin, and I'm having a little trouble figuring out how to mix Kotlin-style concurrency with code not written with coroutines in mind (JOOQ in this case). The function below is in one of my DAOs, and in that map block, I want to update a bunch of rows in the DB. In this specific example, the updates are actually dependent on the previous one completing, so it will need to be done sequentially, but I'm interested in how this code could be modified to run the updates in parallel, since there will undoubtedly be use cases I have that don't need to be run sequentially.
suspend fun updateProductChoices(choice: ProductChoice) = withContext(Dispatchers.IO) {
    ctx().transaction { config ->
        val tx = DSL.using(config)
        val previousRank = tx.select(PRODUCT_CHOICE.RANK)
            .from(PRODUCT_CHOICE)
            .where(PRODUCT_CHOICE.STORE_PRODUCT_ID.eq(choice.storeProductId))
            .and(PRODUCT_CHOICE.PRODUCT_ID.eq(choice.productId))
            .fetchOne(PRODUCT_CHOICE.RANK)
        (previousRank + 1..choice.rank).map { rank ->
            tx.update(PRODUCT_CHOICE)
                .set(PRODUCT_CHOICE.RANK, rank - 1)
                .where(PRODUCT_CHOICE.PRODUCT_ID.eq(choice.productId))
                .and(PRODUCT_CHOICE.RANK.eq(rank))
                .execute()
        }
    }
}
Would the best way be to wrap the transaction lambda in runBlocking and each update in an async, then awaitAll the result? Also possibly worth noting that the JOOQ queries support executeAsync() which returns a CompletionStage.
Yes, use JOOQ's executeAsync. With executeAsync, you can remove the withContext(Dispatchers.IO) because the call is no longer blocking.
The kotlinx-coroutines-jdk8 library includes coroutines integration with CompletionStage, so you can do a suspending await() on it.
To perform the updates in parallel, note that the same library can convert a CompletionStage to a Deferred. Therefore, if you change the call from execute() to executeAsync().asDeferred(), you will get a list of Deferreds, on which you can call awaitAll().
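The conversion can be sketched without a database. In the example below, fakeExecuteAsync() is a hypothetical stand-in for a jOOQ query's executeAsync(); it just returns a CompletionStage that completes with an affected-row count of 1. Only the asDeferred() and awaitAll() calls come from kotlinx.coroutines:

```kotlin
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.future.asDeferred
import kotlinx.coroutines.runBlocking
import java.util.concurrent.CompletableFuture
import java.util.concurrent.CompletionStage

// Hypothetical stand-in for query.executeAsync(): pretends one row was updated.
fun fakeExecuteAsync(rank: Int): CompletionStage<Int> =
    CompletableFuture.supplyAsync { 1 }

// Starts all updates first, then suspends until every one of them has completed.
suspend fun updateAll(ranks: IntRange): List<Int> =
    ranks.map { rank -> fakeExecuteAsync(rank).asDeferred() }.awaitAll()

fun main(): Unit = runBlocking {
    val affectedRows = updateAll(1..5).sum()
    println("Rows updated: $affectedRows")
}
```

Because map() starts every CompletionStage before awaitAll() is called, the updates are in flight at the same time. For updates that depend on each other, calling await() on each stage inside a regular loop keeps them sequential.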
I have two programs.
With coroutines
I have three loops, and I tried assigning each loop to a coroutine for quicker execution.
import kotlin.system.measureTimeMillis
import kotlinx.coroutines.*
fun main() {
    val time = measureTimeMillis() {
        var i = 0
        var j = 0
        var k = 0
        GlobalScope.launch(Dispatchers.Default) {
            while (i < 1000000)
                i++
        }
        GlobalScope.launch(Dispatchers.Default) {
            while (j < 1000000)
                j++
        }
        GlobalScope.launch(Dispatchers.Default) {
            while (k < 1000000)
                k++
        }
    }
    println(time)
}
Output
109
Without Coroutine
import kotlin.system.measureTimeMillis
import kotlinx.coroutines.*
fun main() {
    val time = measureTimeMillis() {
        var i = 0
        var j = 0
        var k = 0
        while (i < 1000000)
            i++
        while (j < 1000000)
            j++
        while (k < 1000000)
            k++
    }
    println(time)
}
Output
9
I have used a timer to measure the execution time, but the coroutine code takes longer.
Why does it behave like this, and how can I make the coroutine part faster?
Your code disregards a great deal of concerns that make your two examples very different.
First of all, you should never trust the timing of the very first run through the code. This is the time when all the heavyweight class initialization happens, including the initialization of the classes you touch indirectly by calling a library function.
Second, you also ignore all the optimizations the JIT compiler does to the bytecode. Most important in your case is that the code does nothing but increment local variables without using them afterwards. The JIT compiler will be happy to entirely delete your loops. Even if you use the results afterwards, the compiler may be able to do some simple reasoning on what the resulting value will be after 1,000,000 increments.
Your first example, with coroutines, is fundamentally different in that it submits tasks to the thread pool behind Dispatchers.Default. This means your incrementing code happens within a lambda that captures your local variable. In order to make it work, the compiler must move the variable off the stack into a heap object shared with the lambda. This muddies the waters in terms of the compiler proving the loop can be safely eliminated.
However, even if you accounted for all these things, your code is broken in an elementary way as well: you don't await the completion of the launched coroutines. So after you fix the above issues and make the loops do some non-trivial work whose results you actually check, you'll find that the coroutine example reports a constant time that doesn't depend on the number of loop iterations.
I think what best explains your current results is the initialization costs. Just put a big outer loop over the whole code and the coroutine example will appear to perform much better than now.
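A minimal sketch of a fairer measurement, assuming kotlinx-coroutines-core: the work produces a value that is actually used, the coroutines are awaited inside the timed block, and the whole thing is repeated so the first (warm-up) rounds can be discarded. The functions countTo() and runOnce() are invented for this illustration, and this is still no substitute for a proper benchmarking harness:

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.runBlocking
import kotlin.system.measureTimeMillis

// Work whose result depends on every iteration, so the loop is not trivially removable.
fun countTo(n: Int): Long {
    var sum = 0L
    for (i in 1..n) sum += i % 7
    return sum
}

// Runs three loops in parallel and waits for all of them before returning.
fun runOnce(n: Int): Long = runBlocking {
    val parts = (1..3).map { async(Dispatchers.Default) { countTo(n) } }
    parts.awaitAll().sum()
}

fun main() {
    repeat(10) { round ->
        var result = 0L
        val elapsed = measureTimeMillis { result = runOnce(1_000_000) }
        // Early rounds include class loading and JIT warm-up; judge by the later ones.
        println("round $round: $elapsed ms (result $result)")
    }
}
```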
I'm using a Kotlin channel in order to migrate a database: I have one producer and multiple processors which write to the database. The producer just sends batches of documents to the channel:
fun CoroutineScope.produceDocumentBatches(mongoCollection: MongoCollection<Document>) = produce<List<Document>> {
    var batch = arrayListOf<Document>()
    for ((counter, document) in mongoCollection.find().withIndex()) {
        if ((counter + 1) % 100 == 0) {
            sendBlocking(batch)
            batch = arrayListOf()
        }
        batch.add(document)
    }
    if (batch.isNotEmpty()) sendBlocking(batch)
}
This is what my processors look like:
private fun CoroutineScope.processDocumentsAsync(
    documentDbCollection: MongoCollection<Document>,
    channel: ReceiveChannel<List<Document>>,
    numberOfProcessedDocuments: AtomicInteger
) = launch(Dispatchers.IO) {
    // do processing
}
And this is how I use them in the script:
fun run() = runBlocking {
    val producer = produceDocumentBatches(mongoCollection)
    (1..64).map { processDocumentsAsync(documentDbCollection, producer, count) }
}
So is it fine to use sendBlocking with regard to performance? If I use plain send, I create many suspending functions inside one coroutine, because writes to the database are much slower than reads, and I get java.lang.OutOfMemoryError: Java heap space. Do I understand correctly that the producer blocks the main thread, but that this is fine for performance because all consumers are executed on IO threads?
Maybe my understanding is not correct, but I don't think you are creating many suspending functions inside a coroutine; I'm not sure it even makes sense to put it that way. Since you haven't specified a capacity when calling produce, the default capacity of zero applies and a rendezvous channel is used: send suspends until another coroutine invokes receive. I don't think you need sendBlocking.
My guess is that you are using too many consumers and have too many Document instances in memory, which could be the reason for the OutOfMemoryError. What is the size of each Document? What are your JVM heap settings?
My suggestions:
Use send instead of sendBlocking
Decrease the number of consumers to a much smaller number and see if the OutOfMemory persists.
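Use clear() to reuse the ArrayList instead of creating a new instance only if you send a copy of the batch: the consumer receives the same list instance, so clearing a list that has already been sent would empty it for the consumer as well.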
If it works fine after those suggestions, try to increase the number of consumers and check if everything still works fine.
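The suggestions above can be sketched as follows, assuming kotlinx-coroutines-core. Integers stand in for the Mongo documents, and produceBatches() and processAll() are names made up for this illustration. The producer uses the suspending send on the default rendezvous channel and allocates a fresh list per batch, because the consumer holds a reference to the list it received:

```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.ExperimentalCoroutinesApi
import kotlinx.coroutines.channels.ReceiveChannel
import kotlinx.coroutines.channels.produce
import kotlinx.coroutines.joinAll
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import java.util.concurrent.atomic.AtomicInteger

// Default capacity (rendezvous): send() suspends until some consumer receives the batch,
// so at most one finished batch waits in the producer at any time.
@OptIn(ExperimentalCoroutinesApi::class)
fun CoroutineScope.produceBatches(items: Iterable<Int>, batchSize: Int): ReceiveChannel<List<Int>> =
    produce {
        var batch = ArrayList<Int>(batchSize)
        for (item in items) {
            batch.add(item)
            if (batch.size == batchSize) {
                send(batch)
                batch = ArrayList(batchSize) // new list: the consumer still owns the old one
            }
        }
        if (batch.isNotEmpty()) send(batch)
    }

// Hypothetical driver: fans the batches out to a few workers and counts processed items.
fun processAll(total: Int, batchSize: Int, workers: Int): Int = runBlocking {
    val processed = AtomicInteger()
    val batches = produceBatches(1..total, batchSize)
    List(workers) {
        launch(Dispatchers.IO) {
            for (batch in batches) processed.addAndGet(batch.size) // stand-in for the DB write
        }
    }.joinAll()
    processed.get()
}

fun main() {
    println("Processed ${processAll(total = 1_050, batchSize = 100, workers = 4)} items")
}
```

With this setup the number of batches alive at once is roughly bounded by the number of workers plus one, so starting with a small worker count and increasing it gradually shows where memory becomes the limit.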