I have two programs, one with coroutines and one without.
With coroutines
I have 3 loops, and I tried assigning each loop to a coroutine for quicker execution.
import kotlin.system.measureTimeMillis
import kotlinx.coroutines.*

fun main() {
    val time = measureTimeMillis {
        var i = 0
        var j = 0
        var k = 0
        GlobalScope.launch(Dispatchers.Default) {
            while (i < 1000000)
                i++
        }
        GlobalScope.launch(Dispatchers.Default) {
            while (j < 1000000)
                j++
        }
        GlobalScope.launch(Dispatchers.Default) {
            while (k < 1000000)
                k++
        }
    }
    println(time)
}
Output
109
Without coroutines
import kotlin.system.measureTimeMillis

fun main() {
    val time = measureTimeMillis {
        var i = 0
        var j = 0
        var k = 0
        while (i < 1000000)
            i++
        while (j < 1000000)
            j++
        while (k < 1000000)
            k++
    }
    println(time)
}
Output
9
I used a timer to calculate the execution time, but the coroutine code is taking longer.
Why is it working like this, and how can I make the coroutine part faster?
Your code disregards a number of concerns that make your two examples very different.
First of all, you should never trust the timing of the very first run through the code. This is the time when all the heavyweight class initialization happens, including the initialization of the classes you touch indirectly by calling a library function.
Second, you also ignore all the optimizations the JIT compiler does to the bytecode. Most important in your case is that the code does nothing but increment local variables without using them afterwards. The JIT compiler will be happy to entirely delete your loops. Even if you use the results afterwards, the compiler may be able to do some simple reasoning on what the resulting value will be after 1,000,000 increments.
Your first example, with coroutines, is fundamentally different in that it submits tasks to the commonPool executor service. This means your incrementing code happens within a lambda that captures your local variable. In order to make it work, the compiler must transform it into an instance variable attached to the lambda. This muddies the waters in terms of the compiler proving the loop can be safely eliminated.
However, even if you accounted for all these things, your code is broken in an elementary way as well: you don't await the completion of the launched coroutines. So after you fix the above issues and make the loops do some non-trivial work whose results you actually check, you'll find that the coroutine example reports a constant time that doesn't depend on the number of loop iterations.
I think what best explains your current results is the initialization costs. Just put a big outer loop over the whole code and the coroutine example will appear to perform much better than now.
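Putting those fixes together, here is a minimal sketch (the repeat count, the switch to async, and the summing of results are my additions, chosen to force JIT warmup and to keep the compiler from discarding the loops):

import kotlinx.coroutines.*
import kotlin.system.measureTimeMillis

fun main() = runBlocking {
    repeat(5) { runIndex ->
        val time = measureTimeMillis {
            val results = List(3) {
                async(Dispatchers.Default) {
                    var x = 0L
                    while (x < 1_000_000) x++
                    x // return the count so the loop's result is actually used
                }
            }
            // Awaiting completion is what the original benchmark skipped.
            println("sum = ${results.awaitAll().sum()}")
        }
        println("Run $runIndex: $time ms")
    }
}

On a warmed-up JVM the later runs should be dramatically faster than the first.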
Kotlin has nice wrappers and shortcuts, but sometimes I get caught not understanding them.
I have this simplified code:
class PipeSeparatedItemsReader(private val filePath: Path) : ItemsReader {
    override fun readItems(): Sequence<ItemEntry> {
        return filePath.useLines { lines ->
            lines.map { ItemEntry("A", "B", "C", "D") }
        }
    }
}
And then I have:
val itemsPath = Path(...).resolve()
val itemsReader = PipeSeparatedItemsReader(itemsPath)
for (itemEntry in itemsReader.readItems())
updateItem(itemEntry)
// I have also tried itemsReader.readItems().forEach { ... }
This is quite straightforward - I expect this code to give me a sequence that opens the file, reads and parses the lines, yields ItemEntrys, and closes the file when exhausted.
What I get, however, is IOException("Stream closed").
Somehow, even before the first item is read (I have debugged), somewhere within Kotlin's wrappers, the reader.in becomes null, so this exception is thrown in hasNext().
I have seen a similar question here: Kotlin to chain multiple sequences from different InputStream?
That one includes a lot of Java boilerplate which I would like to avoid.
How should I code this sequence using Path.useLines()?
Every Kotlin helper with "use" in the name closes the underlying resource at the end of the lambda you pass (at least, that's the convention in the stdlib as far as I know). The most common example is AutoCloseable.use.
The Path.useLines extension is no exception:
Calls the block callback giving it a sequence of all the lines in this file and closes the reader once the processing is complete. [emphasis mine]
This means useLines closes the underlying reader once the block is done, so you cannot return a lazy sequence out of it: the sequence is unusable after the useLines function returns.
So, if you want to return a sequence of lines for later use, you cannot directly return a transformed sequence from the one useLines gives you. Sequences cannot detect when someone is done using them, which is why useLines takes a lambda: the lambda gives a "scope" or "lifetime" to the sequence, so useLines knows when to close the underlying reader.
If you want to wrap this, you have two major options: either split the sequence operation from the close operation (make your PipeSeparatedItemsReader closeable), or use a lambda to process things in place in readItems(), the same way useLines does. A sketch of the second option follows.
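Here is a minimal sketch of that lambda-based option; the pipe-splitting and the ItemEntry fields are assumptions, since the original parsing code isn't shown:

import java.nio.file.Path
import kotlin.io.path.useLines

data class ItemEntry(val a: String, val b: String, val c: String, val d: String)

class PipeSeparatedItemsReader(private val filePath: Path) {
    // The caller's block runs while the file is still open; the underlying
    // reader is closed when useLines returns, just like the stdlib helper.
    fun <T> useItems(block: (Sequence<ItemEntry>) -> T): T =
        filePath.useLines { lines ->
            block(lines.map { line ->
                val fields = line.split('|')
                ItemEntry(fields[0], fields[1], fields[2], fields[3])
            })
        }
}

// Usage: the sequence is only valid inside the lambda.
// PipeSeparatedItemsReader(itemsPath).useItems { items ->
//     items.forEach { updateItem(it) }
// }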
I'm working on a new side project with the goal of learning Kotlin more deeply, and I'm having a little trouble figuring out how to mix Kotlin-style concurrency with code not written with coroutines in mind (JOOQ in this case). The function below is in one of my DAOs, and in that map block I want to update a bunch of rows in the DB. In this specific example the updates depend on the previous one completing, so they need to be done sequentially, but I'm interested in how this code could be modified to run the updates in parallel, since there will undoubtedly be use cases that don't need to run sequentially.
suspend fun updateProductChoices(choice: ProductChoice) = withContext(Dispatchers.IO) {
ctx().transaction { config ->
val tx = DSL.using(config)
val previousRank = tx.select(PRODUCT_CHOICE.RANK)
.from(PRODUCT_CHOICE)
.where(PRODUCT_CHOICE.STORE_PRODUCT_ID.eq(choice.storeProductId))
.and(PRODUCT_CHOICE.PRODUCT_ID.eq(choice.productId))
.fetchOne(PRODUCT_CHOICE.RANK)
(previousRank + 1..choice.rank).map { rank ->
tx.update(PRODUCT_CHOICE)
.set(PRODUCT_CHOICE.RANK, rank - 1)
.where(PRODUCT_CHOICE.PRODUCT_ID.eq(choice.productId))
.and(PRODUCT_CHOICE.RANK.eq(rank))
.execute()
}
}
}
Would the best way be to wrap the transaction lambda in runBlocking and each update in an async, then awaitAll the result? Also possibly worth noting that the JOOQ queries support executeAsync() which returns a CompletionStage.
Yes, use JOOQ's executeAsync. With executeAsync, you can remove the withContext(Dispatchers.IO) because the call is no longer blocking.
The kotlinx-coroutines-jdk8 library includes coroutines integration with CompletionStage, so you can do a suspending await on it (docs).
To perform the updates in parallel, note that the same library can convert a CompletionStage to a Deferred (docs). So if you change the call from execute() to executeAsync().asDeferred(), the map will produce a list of Deferreds, on which you can call awaitAll().
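Here is a minimal, self-contained sketch of that conversion; fakeUpdate is a hypothetical stand-in for a JOOQ query's executeAsync(), since any CompletionStage behaves the same way:

import java.util.concurrent.CompletableFuture
import java.util.concurrent.CompletionStage
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.future.asDeferred
import kotlinx.coroutines.runBlocking

// Stand-in for query.executeAsync(); in JOOQ it would produce the row count.
fun fakeUpdate(rank: Int): CompletionStage<Int> =
    CompletableFuture.supplyAsync { rank }

fun main() = runBlocking {
    val rowCounts = (1..10)
        .map { rank -> fakeUpdate(rank).asDeferred() } // CompletionStage -> Deferred
        .awaitAll()                                    // suspend until all complete
    println("completed ${rowCounts.size} updates")
}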
I have written 3 simple programs to test the performance advantage of coroutines over threads. Each program does a lot of common, simple computations. All programs were run separately from each other. Besides execution time, I measured CPU usage via the VisualVM IDE plugin.
The first program does all computations using a 1000-thread pool. This piece of code shows the worst results (64326 ms) compared to the others because of frequent context switches:
val executor = Executors.newFixedThreadPool(1000)
time = generateSequence {
measureTimeMillis {
val comps = mutableListOf<Future<Int>>()
for (i in 1..1_000_000) {
comps += executor.submit<Int> { computation2(); 15 }
}
comps.map { it.get() }.sum()
}
}.take(100).sum()
println("Completed in $time ms")
executor.shutdownNow()
The second program has the same logic, but instead of a 1000-thread pool it uses an n-thread pool (where n equals the number of the machine's cores). It shows much better results (43939 ms) and uses fewer threads, which is good too.
val executor2 = Executors.newFixedThreadPool(4)
time = generateSequence {
measureTimeMillis {
val comps = mutableListOf<Future<Int>>()
for (i in 1..1_000_000) {
comps += executor2.submit<Int> { computation2(); 15 }
}
comps.map { it.get() }.sum()
}
}.take(100).sum()
println("Completed in $time ms")
executor2.shutdownNow()
The third program is written with coroutines and shows a big variance in the results (from 41784 ms to 81101 ms). I am very confused and don't quite understand why the results vary so much and why coroutines are sometimes slower than threads (considering that small async computations are a forte of coroutines). Here is the code:
time = generateSequence {
runBlocking {
measureTimeMillis {
val comps = mutableListOf<Deferred<Int>>()
for (i in 1..1_000_000) {
comps += async { computation2(); 15 }
}
comps.map { it.await() }.sum()
}
}
}.take(100).sum()
println("Completed in $time ms")
I have actually read a lot about these coroutines and how they are implemented in Kotlin, but in practice I don't see them working as intended. Am I doing my benchmarking wrong? Or maybe I'm using coroutines wrong?
The way you've set up your problem, you shouldn't expect any benefit from coroutines. In all cases you submit a non-divisible block of computation to an executor. You are not leveraging the idea of coroutine suspension, where you can write sequential code that actually gets chopped up and executed piecewise, possibly on different threads.
Most use cases of coroutines revolve around blocking code: avoiding the scenario where you hog a thread to do nothing but wait for a response. They may also be used to interleave CPU-intensive tasks, but this is a more special-cased scenario.
I would suggest benchmarking 1,000,000 tasks that involve several sequential blocking steps, like in Roman Elizarov's KotlinConf 2017 talk:
suspend fun postItem(item: Item) {
val token = requestToken()
val post = createPost(token, item)
processPost(post)
}
where all of requestToken(), createPost() and processPost() involve network calls.
If you have two implementations of this, one with suspend funs and another with regular blocking functions, for example:
fun requestToken(): String {
    Thread.sleep(1000)
    return "token"
}
vs.
suspend fun requestToken(): String {
    delay(1000)
    return "token"
}
you'll find that you can't even set up the execution of 1,000,000 concurrent invocations of the first version, and if you lower the number to what you can actually achieve without OutOfMemoryError: unable to create new native thread, the performance advantage of coroutines should be evident.
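For a feel of the suspending side of this comparison, here is a minimal sketch (the 1,000,000 tasks and one-second delay mirror the numbers above); the equivalent blocking version with Thread.sleep would need one thread per task:

import kotlinx.coroutines.*
import kotlin.system.measureTimeMillis

fun main() = runBlocking {
    val time = measureTimeMillis {
        val jobs = List(1_000_000) {
            launch {
                delay(1000) // suspends; no thread is blocked while waiting
            }
        }
        jobs.joinAll()
    }
    // A million "one-second" tasks complete while sharing a single thread.
    println("Finished in $time ms")
}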
If you want to explore possible advantages of coroutines for CPU-bound tasks, you need a use case where it's not irrelevant whether you execute them sequentially or in parallel. In your examples above, this is treated as an irrelevant internal detail: in one version you run 1,000 concurrent tasks and in the other one you use just four, so it's almost sequential execution.
Hazelcast Jet is an example of such a use case because the computation tasks are co-dependent: one's output is another one's input. In this case you can't just run a few of them until completion, on a small thread pool, you actually have to interleave them so the buffered output doesn't explode. If you try to set up such a scenario with and without coroutines, you'll once again find that you're either allocating as many threads as there are tasks, or you are using suspendable coroutines, and the latter approach wins. Hazelcast Jet implements the spirit of coroutines in plain Java API. Its approach would hugely benefit from the coroutine programming model, but currently it's pure Java.
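As a rough sketch of what co-dependent tasks with bounded buffers look like in coroutine terms (this uses plain kotlinx.coroutines channels, not Hazelcast Jet's API), a bounded Channel forces the producer and consumer to interleave rather than run to completion one after the other:

import kotlinx.coroutines.*
import kotlinx.coroutines.channels.Channel

fun main() = runBlocking {
    val buffer = Channel<Int>(capacity = 16) // bounded: send suspends when full

    val producer = launch {
        for (i in 1..1_000) {
            buffer.send(i) // suspends rather than blocking once the buffer fills
        }
        buffer.close()
    }

    val consumer = async {
        var sum = 0L
        for (x in buffer) sum += x // draining frees space and resumes the producer
        sum
    }

    producer.join()
    println("sum = ${consumer.await()}")
}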
Disclosure: the author of this post belongs to the Jet engineering team.
Coroutines are not designed to be faster than threads; they are for lower RAM consumption and better syntax for async calls.
Coroutines are designed to be lightweight threads. They use less RAM because executing 1,000,000 concurrent routines doesn't require creating 1,000,000 threads. Coroutines can help you optimize thread usage and make execution more efficient, so you don't need to care about threads anymore. You can consider a coroutine as a runnable or task, which you can post to a handler and execute on a thread or thread pool.
In the process of learning Rust, I am getting acquainted with error propagation and the choice between unwrap and the ? operator. After writing some prototype code that only uses unwrap(), I would like to remove unwrap from reusable parts, where panicking on every error is inappropriate.
How would one avoid the use of unwrap in a closure, like in this example?
// todo is VecDeque<PathBuf>
let dir = fs::read_dir(&filename).unwrap();
todo.extend(dir.map(|dirent| dirent.unwrap().path()));
The first unwrap can be easily changed to ?, as long as the containing function returns Result<(), io::Error> or similar. However, the second unwrap, the one in dirent.unwrap().path(), cannot be changed to dirent?.path() because the closure must return a PathBuf, not a Result<PathBuf, io::Error>.
One option is to change extend to an explicit loop:
let dir = fs::read_dir(&filename)?;
for dirent in dir {
todo.push_back(dirent?.path());
}
But that feels wrong - the original extend was elegant and clearly reflected the intention of the code. (It might also have been more efficient than a sequence of push_backs.) How would an experienced Rust developer express error checking in such code?
How would one avoid the use of unwrap in a closure, like in this example?
Well, it really depends on what you wish to do upon failure.
should the failure be reported to the user, or be silent?
if reported, should one failure be reported, or all of them?
should a failure interrupt processing?
For example, you could perfectly well decide to silently ignore all failures and just skip the entries that fail. In this case, Iterator::filter_map combined with Result::ok is exactly what you are asking for.
let dir = fs::read_dir(&filename)?;
todo.extend(dir.filter_map(Result::ok).map(|entry| entry.path()));
The Iterator interface is full of goodies, it's definitely worth perusing when looking for tidier code.
Here is a solution based on the filter_map suggested by Matthieu. It calls Result::map_err to ensure the error is "caught" and logged, then hands it to Result::ok and filter_map to remove it from iteration:
fn log_error(e: io::Error) {
eprintln!("{}", e);
}
(|| -> io::Result<()> {
    let dir = fs::read_dir(&filename)?;
    todo.extend(dir
        .filter_map(|res| res.map_err(log_error).ok())
        .map(|dirent| dirent.path()));
    Ok(())
})().unwrap_or_else(log_error)
I'm writing a primitive that takes in two agentsets and a command block. It needs to call a few functions, execute the command block in the current context, and then call another function. Here's what I have so far:
class WithContext(pushGraphContext: GraphContext => Unit, popGraphContext: api.World => GraphContext)
extends api.DefaultCommand {
override def getSyntax = commandSyntax(
Array(AgentsetType, AgentsetType, CommandBlockType))
def perform(args: Array[Argument], context: Context) {
val turtleSet = args(0).getAgentSet.requireTurtleSet
val linkSet = args(1).getAgentSet.requireLinkSet
val world = linkSet.world
val gc = new GraphContext(world, turtleSet, linkSet)
val extContext = context.asInstanceOf[ExtensionContext]
val nvmContext = extContext.nvmContext
pushGraphContext(gc)
// execute command block here
popGraphContext(world)
}
}
I looked at some examples that used nvmContext.runExclusively, but that looked like it's specifically for having a given agentset run the command block. I want the current agent (possibly the observer) to run it. Should I wrap nvm.agent in an agentset and pass that to nvmContext.runExclusively? If so, what's the easiest way to wrap an agent in an agentset? If not, what should I do?
Method #1
The quicker-but-arguably-dirtier method is to use runExclusiveJob, as demonstrated in (e.g.) the create-red-turtles command in https://github.com/NetLogo/Sample-Scala-Extension/blob/master/src/SampleScalaExtension.scala .
To wrap the current agent in an agentset, you can use agent.AgentSetBuilder. (You could also pass an Array[Agent] of length 1 to one of the ArrayAgentSet constructors, but I'd recommend AgentSetBuilder since it's less reliant on internal implementation details which are likely to change.)
Method #2
The disadvantage of method #1 is the slight constant overhead associated with creating and setting up the extra AgentSet, Job, and Context objects and directing execution through them.
Creating and running a separate job isn't actually how built-in commands like if and while work. Instead of making a new job, they remain in the current job and cause commands in a command block to run (or not run) by manipulating the instruction pointer (nvm.Context.ip) to jump into them or skip over them.
I believe an extension command could do the same. I'm not certain if it has been tried before, but I can't see any reason it wouldn't work.
Doing it this way would involve understanding more about NetLogo engine internals, as documented at https://github.com/NetLogo/NetLogo/wiki/Engine-architecture . You'd model your primitive after e.g. https://github.com/NetLogo/NetLogo/blob/5.0.x/src/main/org/nlogo/prim/etc/_if.java , including altering your implementation of nvm.CustomAssembled. (Note that prim._extern, which runs extension commands, delegates its assemble method to the wrapped command's own assemble method, so this should work.) In your assemble method, instead of calling done() at the end to terminate the job, you'd just allow execution to fall through.
I could try to construct an example that works this way, but it'd take me a couple hours; it's probably not worth me doing unless there's a real need.