I'm working on a new side project with a goal of more deeply learning Kotlin, and I'm having a little trouble figuring out how to mix Kotlin-style concurrency with code not written with coroutines in mind (JOOQ in this case). The function below is in one of my DAOs, and in that map block, I want to update a bunch of rows in the DB. In this specific example, the updates are actually dependent on the previous one completing, so it will need to be done sequentially, but I'm interested in how this code could be modified to run the updates in parallel, since there will undoubtedly be use cases I have that don't need to be run sequentially.
suspend fun updateProductChoices(choice: ProductChoice) = withContext(Dispatchers.IO) {
    ctx().transaction { config ->
        val tx = DSL.using(config)
        val previousRank = tx.select(PRODUCT_CHOICE.RANK)
            .from(PRODUCT_CHOICE)
            .where(PRODUCT_CHOICE.STORE_PRODUCT_ID.eq(choice.storeProductId))
            .and(PRODUCT_CHOICE.PRODUCT_ID.eq(choice.productId))
            .fetchOne(PRODUCT_CHOICE.RANK)
        (previousRank + 1..choice.rank).map { rank ->
            tx.update(PRODUCT_CHOICE)
                .set(PRODUCT_CHOICE.RANK, rank - 1)
                .where(PRODUCT_CHOICE.PRODUCT_ID.eq(choice.productId))
                .and(PRODUCT_CHOICE.RANK.eq(rank))
                .execute()
        }
    }
}
Would the best way be to wrap the transaction lambda in runBlocking and each update in an async, then awaitAll the result? Also possibly worth noting that the JOOQ queries support executeAsync() which returns a CompletionStage.
Yes, use JOOQ's executeAsync. With executeAsync, you can remove the withContext(Dispatchers.IO) because the call is no longer blocking.
The kotlinx-coroutines-jdk8 library includes coroutines integration with CompletionStage, so you can do a suspending await on it (docs).
To perform the updates in parallel, note that the same library can convert a CompletionStage to a Deferred (docs). Therefore, if you change the call to execute to executeAsync().asDeferred() you will get a list of Deferreds, on which you can awaitAll().
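Both variants can be sketched roughly as follows. jOOQ itself is left out here: fakeExecuteAsync is a hypothetical stand-in for a call such as tx.update(...).executeAsync(), and the only thing assumed about it is that it returns a CompletionStage<Int>. The await() and asDeferred() extensions come from the kotlinx.coroutines.future package.

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.CompletionStage
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.future.asDeferred
import kotlinx.coroutines.future.await
import kotlinx.coroutines.runBlocking

// Hypothetical stand-in for tx.update(...).executeAsync():
// it pretends that exactly one row was updated for the given rank.
fun fakeExecuteAsync(rank: Int): CompletionStage<Int> =
    CompletableFuture.supplyAsync { 1 }

// Sequential: each update starts only after the previous one has completed.
suspend fun updateSequentially(ranks: IntRange): Int {
    var updated = 0
    for (rank in ranks) {
        updated += fakeExecuteAsync(rank).await()
    }
    return updated
}

// Parallel: start every update first, then suspend until all have completed.
suspend fun updateInParallel(ranks: IntRange): Int =
    ranks.map { rank -> fakeExecuteAsync(rank).asDeferred() }
        .awaitAll()
        .sum()

fun main(): Unit = runBlocking {
    println(updateSequentially(1..5))
    println(updateInParallel(1..5))
}
```

The sequential version is the one that fits the rank-shifting example from the question, since each update depends on the previous one; the parallel version only makes sense for independent updates.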
I have code, something like this:
entities.forEach {
    launch() {
        doingSomethingWithDB(it)
    }
}
suspend fun doingSomethingWithDB(entity) {
    getDBConnectionFromPool()
    // doing something
    returnDBConnectionToPool()
}
And when the number of entities exceeds the size of the DB connection pool (I use HikariCP), I get the error "Connection is not available...". I get this error even if I use only a single thread (e.g. -Dkotlinx.coroutines.io.parallelism=1).
Are there best practices for limiting the number of parallel coroutines when dealing with external resources (like fixed size DB connection pool)?
As your doingSomethingWithDB() acquires and releases resources manually at the beginning/end, limiting the parallelism is not sufficient in this case - we need to limit the concurrency. The easiest way to do this is by using a Semaphore:
val semaphore = Semaphore(8)
suspend fun doingSomethingWithDB(entity) {
    semaphore.withPermit {
        getDBConnectionFromPool()
        // doing something
        returnDBConnectionToPool()
    }
}
A few words of explanation: because coroutines can suspend and switch from thread to thread, even if we limit the parallelism of the coroutines that invoke doingSomethingWithDB(), this function can still be invoked an arbitrary number of times concurrently. Parallelism only determines how many coroutines can be actively executing at a specific moment in time; if any of them suspends, additional coroutines can proceed.
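To make the effect observable, here is a small self-contained sketch. The connection pool is simulated with a counter, and POOL_SIZE, inUse, maxInUse and peakConnections are names invented for this illustration; delay(10) stands in for the real database work.

```kotlin
import java.util.concurrent.atomic.AtomicInteger
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.sync.Semaphore
import kotlinx.coroutines.sync.withPermit

const val POOL_SIZE = 8 // assumed pool size, for illustration only

val semaphore = Semaphore(POOL_SIZE)
val inUse = AtomicInteger(0)    // simulated connections currently checked out
val maxInUse = AtomicInteger(0) // highest value of inUse observed so far

suspend fun doingSomethingWithDB(entity: Int) {
    semaphore.withPermit {
        val now = inUse.incrementAndGet()          // "get connection from pool"
        maxInUse.updateAndGet { max -> maxOf(max, now) }
        delay(10)                                  // suspension point: other coroutines run here
        inUse.decrementAndGet()                    // "return connection to pool"
    }
}

// Launches one coroutine per entity and returns the peak number of
// simulated connections that were held at the same time.
fun peakConnections(entityCount: Int): Int = runBlocking(Dispatchers.Default) {
    maxInUse.set(0)
    coroutineScope {
        repeat(entityCount) { entity -> launch { doingSomethingWithDB(entity) } }
    }
    maxInUse.get()
}

fun main() {
    println("peak simultaneous connections: ${peakConnections(100)}")
}
```

Without the withPermit wrapper, the peak would be expected to approach the number of launched coroutines, because every coroutine reaches delay() while still holding its simulated connection.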
We are using Quarkus to process messages. The processing runs in a regular (non-suspending) function, and within it we have to call a suspend function. Basically:
fun process(msg: Message): Message {
    val resFrom: Data = runBlocking { fetchDataFromDB(msg.key) }
    val processedMsg = processRoutingKey(msg, resFrom)
    return processedMsg
}
We would like to get the data as a Uni (https://smallrye.io/smallrye-mutiny/getting-started/creating-unis)
so basically we would like to get back
fun process(msg: Message) {
    val resFrom: Uni<Data> = ConvertUni { fetchDataFromDB(msg.key) }
}
We need the Uni once further downstream to process some data, but we would also like to return a Uni from the method, meaning:
fun process(msg: Message): Uni<Message> {
    val resFrom: Uni<Data> = ConvertUni { fetchDataFromDB(msg.key) }
    val processed: Uni<Message> = process(msg, resFrom)
    return processed
}
The signature fun process(msg:Message): Uni<Message> implies that some asynchronous mechanism needs to be started and will outlive the method call. It's like returning a Future or a Deferred. The function returns immediately but the underlying processing is not done yet.
In the coroutines world, this means you need to start a coroutine. However, like any async mechanism, it requires you to be conscious about where it will run, and for how long. This is defined by the CoroutineScope you use to start the coroutine, and this is why coroutine builders like async require such a scope.
So you need to pass a CoroutineScope to your function if you want it to start a coroutine that will last longer than the function call:
fun CoroutineScope.process(msg: Message): Uni<Message> {
    val uniResult = async { fetchDataFromDB(msg.key) }.asUni()
    return process(msg, uniResult)
}
Here Deferred<T>.asUni() is provided by the library mutiny-kotlin. In the examples given in their doc, they use GlobalScope instead of asking the caller to pass a coroutine scope. This is usually a bad practice because it means you don't control the lifetime of the started coroutine, and you may leak things if you're not careful.
Accepting a CoroutineScope as receiver means the caller of the method can choose the scope of this coroutine, which will automatically cancel the coroutine when appropriate, and will also define the thread pool / event loop on which the coroutine runs.
Now, with that in mind, you see that you'll be using a mix of coroutines and Uni at the same level of API here, which is not great. I would advise sticking with suspend functions all the way, and converting to Uni only at the point where you really have to.
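A rough sketch of that layering is shown below. To keep it runnable without Mutiny on the classpath, it returns a plain Deferred at the boundary instead of a Uni; with mutiny-kotlin available, the asUni() conversion mentioned above would be applied at that same spot. Message, Data, fetchDataFromDB and processAsync are hypothetical stand-ins for the code in the question.

```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Deferred
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.SupervisorJob
import kotlinx.coroutines.async
import kotlinx.coroutines.cancel
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking

// Hypothetical stand-ins for the types and functions in the question.
data class Message(val key: String, val body: String = "")
data class Data(val value: String)

suspend fun fetchDataFromDB(key: String): Data {
    delay(50) // pretend to wait for the database
    return Data("data-for-$key")
}

// Suspend functions all the way: no scope and no async wrapper needed here.
suspend fun process(msg: Message): Message {
    val data = fetchDataFromDB(msg.key)
    return msg.copy(body = data.value)
}

// The single boundary where an async handle is produced. The caller's scope
// decides how long the coroutine may live and on which dispatcher it runs.
// (Deferred is used here in place of Uni; this is a substitution for the sketch.)
fun CoroutineScope.processAsync(msg: Message): Deferred<Message> =
    async { process(msg) }

fun main(): Unit = runBlocking {
    val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default)
    val result = scope.processAsync(Message("42")).await()
    println(result)
    scope.cancel() // cancels anything still running in this scope
}
```

Everything below the boundary stays an ordinary suspend function, so it can be tested and composed without any knowledge of the reactive type used by the framework.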
I am fairly new to Kotlin, and am getting to grips with its implementation of co-routines. I understand that any function that we may want Kotlin to deal with in a non-blocking way needs to be annotated with suspend, and that such functions can only be executed within a co-routine (or within another suspend function). So far so good.
However I keep coming across a problem with utility functions that accept other functions as parameters. For instance with arrow's Try:
suspend fun somethingAsync() = 1 + 1
Try { 1 + 1 } // All is well
Try { somethingAsync() } // Uh oh....
As the parameter to Try's invoke function/operator is not annotated with suspend, the second call will be rejected by the compiler. How does someone deal with this when writing utility functions that can not know if the code inside the passed function or lambda requires suspend or not? Writing a suspend and a non-suspend version of every such function seems incredibly tedious. Have I missed an obvious way to deal with this situation?
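First, let's deal with suspend. What it means is that this particular function may suspend: the calling coroutine waits for it to finish, much like a blocking call, although the underlying thread is released rather than blocked. It does not mean that the function is asynchronous.
Usually, this kind of waiting means IO, but not always. In your example, the function doesn't wait for anything, nor does it do anything in an asynchronous manner (hence the Async suffix is incorrect there). But let's assume the actual utility code does need to wait for some reason.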
Now dealing with suspending functions is something that is done on the caller side. Meaning, what would you like to do while this is being executed:
fun doSomething() {
    Try { somethingAsync() }
}
If you're fine with doSomething blocking its thread, then you can use runBlocking:
fun doSomething() = runBlocking {
    Try { somethingAsync() }
}
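A runnable sketch of the runBlocking variant follows. Arrow is not assumed to be on the classpath, so the standard library's runCatching is used as a stand-in for Try; this is a substitution made only for this example. Note that runCatching is an inline function, which is why its lambda is allowed to contain a suspending call when the call site is itself inside a coroutine.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking

suspend fun somethingAsync(): Int {
    delay(10) // a real suspension point
    return 1 + 1
}

// Blocks the calling thread until the suspending call has finished.
fun doSomething(): Result<Int> = runBlocking {
    // runCatching is inline, so the suspending call is compiled directly
    // into the body of this runBlocking lambda.
    runCatching { somethingAsync() }
}

fun main() {
    println(doSomething().getOrNull())
}
```

The same inlining rule is the reason standard helpers such as map or forEach accept suspending calls in their lambdas without needing a separate suspend overload.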
I have written 3 simple programs to test coroutines performance advantage over threads. Each program does a lot of common simple computations. All programs were run separately from each other. Besides execution time I measured CPU usage via Visual VM IDE plugin.
The first program does all computations using a pool of 1000 threads. This piece of code shows the worst result (64326 ms) compared to the others because of frequent context switches:
val executor = Executors.newFixedThreadPool(1000)
time = generateSequence {
    measureTimeMillis {
        val comps = mutableListOf<Future<Int>>()
        for (i in 1..1_000_000) {
            comps += executor.submit<Int> { computation2(); 15 }
        }
        comps.map { it.get() }.sum()
    }
}.take(100).sum()
println("Completed in $time ms")
executor.shutdownNow()
The second program has the same logic, but instead of a pool of 1000 threads it uses a pool of only n threads (where n equals the number of the machine's cores). It shows a much better result (43939 ms) and uses fewer threads, which is good too.
val executor2 = Executors.newFixedThreadPool(4)
time = generateSequence {
    measureTimeMillis {
        val comps = mutableListOf<Future<Int>>()
        for (i in 1..1_000_000) {
            comps += executor2.submit<Int> { computation2(); 15 }
        }
        comps.map { it.get() }.sum()
    }
}.take(100).sum()
println("Completed in $time ms")
executor2.shutdownNow()
The third program is written with coroutines and shows a big variance in the results (from 41784 ms to 81101 ms). I am very confused and don't quite understand why they are so different and why coroutines are sometimes slower than threads (considering that small async calculations are a forte of coroutines). Here is the code:
time = generateSequence {
    runBlocking {
        measureTimeMillis {
            val comps = mutableListOf<Deferred<Int>>()
            for (i in 1..1_000_000) {
                comps += async { computation2(); 15 }
            }
            comps.map { it.await() }.sum()
        }
    }
}.take(100).sum()
println("Completed in $time ms")
I actually read a lot about these coroutines and how they are implemented in kotlin, but in practice I don't see them working as intended. Am I doing my benchmarking wrong? Or maybe I'm using coroutines wrong?
The way you've set up your problem, you shouldn't expect any benefit from coroutines. In all cases you submit a non-divisible block of computation to an executor. You are not leveraging the idea of coroutine suspension, where you can write sequential code that actually gets chopped up and executed piecewise, possibly on different threads.
Most use cases of coroutines revolve around blocking code: avoiding the scenario where you hog a thread to do nothing but wait for a response. They may also be used to interleave CPU-intensive tasks, but this is a more special-cased scenario.
I would suggest benchmarking 1,000,000 tasks that involve several sequential blocking steps, like in Roman Elizarov's KotlinConf 2017 talk:
suspend fun postItem(item: Item) {
    val token = requestToken()
    val post = createPost(token, item)
    processPost(post)
}
where all of requestToken(), createPost() and processPost() involve network calls.
If you have two implementations of this, one with suspend funs and another with regular blocking functions, for example:
fun requestToken(): String {
    Thread.sleep(1000) // blocks the calling thread
    return "token"
}
vs.
suspend fun requestToken(): String {
    delay(1000) // suspends the coroutine and releases the thread
    return "token"
}
you'll find that you can't even set up 1,000,000 concurrent invocations of the first version, and if you lower the number to what you can actually achieve without an OutOfMemoryError: unable to create new native thread, the performance advantage of coroutines should be evident.
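The suspending side of that comparison can be tried out with a sketch like the one below. The delay is shortened to 100 ms and the count to 100,000 so that it finishes quickly; runConcurrently is a helper name invented for this example, and no timing figures are claimed here.

```kotlin
import java.util.concurrent.atomic.AtomicInteger
import kotlin.system.measureTimeMillis
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

suspend fun requestToken(): String {
    delay(100) // suspends the coroutine; no thread is held while waiting
    return "token"
}

// Runs `count` concurrent invocations of requestToken() and returns
// how many of them completed.
suspend fun runConcurrently(count: Int): Int {
    val completed = AtomicInteger(0)
    coroutineScope {
        repeat(count) {
            launch {
                requestToken()
                completed.incrementAndGet()
            }
        }
    } // coroutineScope returns only after all launched children are done
    return completed.get()
}

fun main(): Unit = runBlocking {
    val elapsed = measureTimeMillis {
        println("completed: ${runConcurrently(100_000)}")
    }
    println("took $elapsed ms")
}
```

Replacing delay with Thread.sleep and launching each task on its own thread gives the blocking counterpart to measure against.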
If you want to explore possible advantages of coroutines for CPU-bound tasks, you need a use case where it's not irrelevant whether you execute them sequentially or in parallel. In your examples above, this is treated as an irrelevant internal detail: in one version you run 1,000 concurrent tasks and in the other one you use just four, so it's almost sequential execution.
Hazelcast Jet is an example of such a use case because the computation tasks are co-dependent: one's output is another one's input. In this case you can't just run a few of them until completion, on a small thread pool, you actually have to interleave them so the buffered output doesn't explode. If you try to set up such a scenario with and without coroutines, you'll once again find that you're either allocating as many threads as there are tasks, or you are using suspendable coroutines, and the latter approach wins. Hazelcast Jet implements the spirit of coroutines in plain Java API. Its approach would hugely benefit from the coroutine programming model, but currently it's pure Java.
Disclosure: the author of this post belongs to the Jet engineering team.
Coroutines are not designed to be faster than threads; they are designed for lower RAM consumption and a better syntax for async calls.
Coroutines are designed to be lightweight threads. They use less RAM because, when you execute 1,000,000 concurrent coroutines, the runtime doesn't have to create 1,000,000 threads. Coroutines can help you optimise thread usage and make execution more efficient, and you no longer need to care about the threads yourself. You can think of a coroutine as a runnable or task that you post to a handler and that gets executed on a thread or thread pool.
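One way to see this is to record which threads actually execute a large number of coroutines. The sketch below does that on Dispatchers.Default; distinctThreadCount is a name made up for this illustration, and the exact count it reports depends on the machine.

```kotlin
import java.util.concurrent.ConcurrentHashMap
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Launches `count` coroutines and returns how many distinct threads ran them.
suspend fun distinctThreadCount(count: Int): Int {
    val threads = ConcurrentHashMap.newKeySet<Thread>()
    withContext(Dispatchers.Default) {
        repeat(count) {
            launch {
                delay(10) // all coroutines are waiting concurrently here
                threads.add(Thread.currentThread())
            }
        }
    } // withContext returns only after all launched children are done
    return threads.size
}

fun main(): Unit = runBlocking {
    println("distinct threads: ${distinctThreadCount(10_000)}")
}
```

The reported number should stay in the range of the dispatcher's worker pool, which is far below the number of coroutines that were launched.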
I'm writing a primitive that takes in two agentsets and a command block. It needs to call a few functions, execute the command block in the current context, and then call another function. Here's what I have so far:
class WithContext(pushGraphContext: GraphContext => Unit, popGraphContext: api.World => GraphContext)
  extends api.DefaultCommand {
  override def getSyntax = commandSyntax(
    Array(AgentsetType, AgentsetType, CommandBlockType))
  def perform(args: Array[Argument], context: Context) {
    val turtleSet = args(0).getAgentSet.requireTurtleSet
    val linkSet = args(1).getAgentSet.requireLinkSet
    val world = linkSet.world
    val gc = new GraphContext(world, turtleSet, linkSet)
    val extContext = context.asInstanceOf[ExtensionContext]
    val nvmContext = extContext.nvmContext
    pushGraphContext(gc)
    // execute command block here
    popGraphContext(world)
  }
}
I looked at some examples that used nvmContext.runExclusively, but that looked like it's specifically for having a given agentset run the command block. I want the current agent (possibly the observer) to run it. Should I wrap nvm.agent in an agentset and pass that to nvmContext.runExclusively? If so, what's the easiest way to wrap an agent in an agentset? If not, what should I do?
Method #1
The quicker-but-arguably-dirtier method is to use runExclusiveJob, as demonstrated in (e.g.) the create-red-turtles command in https://github.com/NetLogo/Sample-Scala-Extension/blob/master/src/SampleScalaExtension.scala .
To wrap the current agent in an agentset, you can use agent.AgentSetBuilder. (You could also pass an Array[Agent] of length 1 to one of the ArrayAgentSet constructors, but I'd recommend AgentSetBuilder since it's less reliant on internal implementation details which are likely to change.)
Method #2
The disadvantage of method #1 is the slight constant overhead associated with creating and setting up the extra AgentSet, Job, and Context objects and directing execution through them.
Creating and running a separate job isn't actually how built-in commands like if and while work. Instead of making a new job, they remain in the current job and cause commands in a command block to run (or not run) by manipulating the instruction pointer (nvm.Context.ip) to jump into them or skip over them.
I believe an extension command could do the same. I'm not certain if it has been tried before, but I can't see any reason it wouldn't work.
Doing it this way would involve understanding more about NetLogo engine internals, as documented at https://github.com/NetLogo/NetLogo/wiki/Engine-architecture . You'd model your primitive after e.g. https://github.com/NetLogo/NetLogo/blob/5.0.x/src/main/org/nlogo/prim/etc/_if.java , including altering your implementation of nvm.CustomAssembled. (Note that prim._extern, which runs extension commands, delegates its assemble method to the wrapped command's own assemble method, so this should work.) In your assemble method, instead of calling done() at the end to terminate the job, you'd just allow execution to fall through.
I could try to construct an example that works this way, but it'd take me a couple hours; it's probably not worth me doing unless there's a real need.