Testing coroutines in Kotlin - kotlin

I have this simple test about a crawler that is supposed to call the repo 40 times:
#Test
fun testX() {
// ...
runBlocking {
crawlYelp.concurrentCrawl()
// Thread.sleep(5000) // works if I un-comment
}
verify(restaurantsRepository, times(40)).saveAll(restaurants)
// ...
}
and this implementation:
suspend fun concurrentCrawl() {
cities.map { loc ->
1.rangeTo(10).map { start ->
GlobalScope.async {
val rests = scrapYelp.scrap(loc, start * 10)
restaurantsRepository.saveAll(rests)
}
}
}
}
But... I get this:
Wanted 40 times:
-> at ....testConcurrentCrawl(CrawlYelpTest.kt:46)
But was 30 times:
(the 30 is changing all the time; so it seems the test is not waiting...)
Why does it pass when I do the sleep? It should not be needed given I run blocking..
BTW, I have a controller that is supposed to be kept asynchronous:
#PostMapping("crawl")
suspend fun crawl(): String {
crawlYelp.concurrentCrawl()
return "crawling" // this is supposed to be returned right away
}
Thanks

runBlocking waits for all suspend functions to finish, but as concurrentCrawl basically just starts new jobs in new threads with GlobalScope.async currentCrawl, and therefore runBlocking, is done after all jobs were started and not after all of this jobs have finished.
You have to wait for all jobs started with GlobalScope.async to finish like this:
suspend fun concurrentCrawl() {
cities.map { loc ->
1.rangeTo(10).map { start ->
GlobalScope.async {
val rests = scrapYelp.scrap(loc, start * 10)
restaurantsRepository.saveAll(rests)
}
}.awaitAll()
}
}
If you want to wait for concurrentCrawl() to finish outside of concurrentCrawl() then you have to pass the Deferred results to the calling function like in the following example. In that case the suspend keyword can be removed from concurrentCrawl().
fun concurrentCrawl(): List<Deferred<Unit>> {
return cities.map { loc ->
1.rangeTo(10).map { start ->
GlobalScope.async {
println("hallo world $start")
}
}
}.flatten()
}
runBlocking {
concurrentCrawl().awaitAll()
}
As mentioned in the comments: In this case the async method does not return any value so it is better to use launch instead:
fun concurrentCrawl(): List<Job> {
return cities.map { loc ->
1.rangeTo(10).map { start ->
GlobalScope.launch {
println("hallo world $start")
}
}
}.flatten()
}
runBlocking {
concurrentCrawl().joinAll()
}

You could also just use MockK for this (and so much more).
MockK's verify has a timeout : Long parameter specifically for handling these races in tests.
You could leave your production code as it is, and change your test to this:
import io.mockk.verify
#Test
fun `test X`() = runBlocking {
// ...
crawlYelp.concurrentCrawl()
verify(exactly = 40, timeout = 5000L) {
restaurantsRepository.saveAll(restaurants)
}
// ...
}
If the verify is successful at any point before 5 seconds, it'll pass and move on. Else, the verify (and test) will fail.

Related

Async doesn't appear to run concurrently

The following code uses 2 async function calls:
import kotlinx.coroutines.*
import kotlin.system.*
fun main() = runBlocking<Unit> {
val time = measureTimeMillis {
val one = async { doSomethingUsefulOne() }
val two = async { doSomethingUsefulTwo() }
println("The answer is ${one.await() + two.await()}")
}
println("Completed in $time ms")
}
suspend fun doSomethingUsefulOne(): Int {
delay(3000L) // pretend we are doing something useful here
println("first")
return 13
}
suspend fun doSomethingUsefulTwo(): Int {
delay(1000L) // pretend we are doing something useful here, too
println("second")
return 29
}
The println results in "first" being printed first followed by "second". But according to the docs, these asyncs should be running concurrently. But since the first one has a delay of 3 seconds, why is its println running first? In fact, it doesn't even matter if I put the println before or after the delay. I get the same result.
The reason you have a gap between the 2 functions is the way you have called them in your print line:
println("The answer is ${one.await() + two.await()}")
Try changing .await() to .launch().
Await() calls the function and then stops until it is complete.

Kotlin coroutines and Java Completable future integration

Usually I'm using standard kotlin-jdk8 library to jump from Java *future API world into the Kotlin's suspend heaven.
And it worked great for me, until I encountered Neo4J cursor API, where I can't do .await() on the completion stage, because it immediately starts fetching millions of records into memory.
Kotlin way does not work for me, like this:
suspend fun query() {
driver.session().use { session ->
val cursor: StatementResultCursor = session.readTransactionAsync {
it.runAsync("query ...", params)
}.await() // HERE WE DIE WITH OOM
var record = cursor.nextAsync().await()
while (record != null) {
val node = record.get("node")
mySuspendProcessingFunction(node)
record = cursor.nextAsync().await()
}
}
}
At the same time, Java API works good, we fetch records one by one:
suspend fun query() {
session.readTransactionAsync { transaction ->
transaction.runAsync("query ...", params).thenCompose { cursor ->
cursor.forEachAsync { record ->
runBlocking { // BUT I NEED TO DO RUN BLOCKING HERE :(
val node = record.get("node")
mySuspendProcessingFunction(node)
}
}
}
}.thenCompose {
session.closeAsync()
}.await()
}
The second option works for me, but it is pretty ugly - definitely not Kotlin way, and what is more important, I need to use runBlocking (but these whole block is executed within suspend function)
What am I doing wrong? Is there a better way?
UPD
Tried to do this exercise using new Flow() feature, unfortunately results are the same:
suspend fun query() {
session.readTransactionAsync { transaction ->
transaction.runAsync(query, params).thenApply { cursor ->
cursor.asFlow().onEach { record ->
val node = record.get("node")
mySuspendProcessingFunction(node)
}
}
}.thenCompose {
session.closeAsync()
}.await()
}
fun StatementResultCursor.asFlow() = flow {
do {
val record = nextAsync().await()
if (record != null) emit(record)
} while (record != null)
}

produce<Type> vs Channel<Type>()

Trying to understand channels. I want to channelify the android BluetoothLeScanner. Why does this work:
fun startScan(filters: List<ScanFilter>, settings: ScanSettings = defaultSettings): ReceiveChannel<ScanResult?> {
val channel = Channel<ScanResult>()
scanCallback = object : ScanCallback() {
override fun onScanResult(callbackType: Int, result: ScanResult) {
channel.offer(result)
}
}
scanner.startScan(filters, settings, scanCallback)
return channel
}
But not this:
fun startScan(scope: CoroutineScope, filters: List<ScanFilter>, settings: ScanSettings = defaultSettings): ReceiveChannel<ScanResult?> = scope.produce {
scanCallback = object : ScanCallback() {
override fun onScanResult(callbackType: Int, result: ScanResult) {
offer(result)
}
}
scanner.startScan(filters, settings, scanCallback)
}
It tells me Channel was closed when it wants to call offer for the first time.
EDIT1: According to the docs: The channel is closed when the coroutine completes. which makes sense. I know we can use suspendCoroutine with resume for a one shot callback-replacement. This however is a listener/stream-situation. I don't want the coroutine to complete
Using produce, you introduce scope to your Channel. This means, the code that produces the items, that are streamed over the channel, can be cancelled.
This also means that the lifetime of your Channel starts at the start of the lambda of the produce and ends when this lambda ends.
In your example, the lambda of your produce call almost ends immediately, which means your Channel is closed almost immediately.
Change your code to something like this:
fun CoroutineScope.startScan(filters: List<ScanFilter>, settings: ScanSettings = defaultSettings): ReceiveChannel<ScanResult?> = produce {
scanCallback = object : ScanCallback() {
override fun onScanResult(callbackType: Int, result: ScanResult) {
offer(result)
}
}
scanner.startScan(filters, settings, scanCallback)
// now suspend this lambda forever (until its scope is canceled)
suspendCancellableCoroutine<Nothing> { cont ->
cont.invokeOnCancellation {
scanner.stopScan(...)
}
}
}
...
val channel = scope.startScan(filter)
...
...
scope.cancel() // cancels the channel and stops the scanner.
I added the line suspendCancellableCoroutine<Nothing> { ... } to make it suspend 'forever'.
Update: Using produce and handling errors in a structured way (allows for Structured Concurrency):
fun CoroutineScope.startScan(filters: List<ScanFilter>, settings: ScanSettings = defaultSettings): ReceiveChannel<ScanResult?> = produce {
// Suspend this lambda forever (until its scope is canceled)
suspendCancellableCoroutine<Nothing> { cont ->
val scanCallback = object : ScanCallback() {
override fun onScanResult(callbackType: Int, result: ScanResult) {
offer(result)
}
override fun onScanFailed(errorCode: Int) {
cont.resumeWithException(MyScanException(errorCode))
}
}
scanner.startScan(filters, settings, scanCallback)
cont.invokeOnCancellation {
scanner.stopScan(...)
}
}
}

Kotlin coroutines progress counter

I'm making thousands of HTTP requests using async/await and would like to have a progress indicator. I've added one in a naive way, but noticed that the counter value never reaches the total when all requests are done. So I've created a simple test and, sure enough, it doesn't work as expected:
fun main(args: Array<String>) {
var i = 0
val range = (1..100000)
range.map {
launch {
++i
}
}
println("$i ${range.count()}")
}
The output is something like this, where the first number always changes:
98800 100000
I'm probably missing some important detail about concurrency/synchronization in JVM/Kotlin, but don't know where to start. Any tips?
UPDATE: I ended up using channels as Marko suggested:
/**
* Asynchronously fetches stats for all symbols and sends a total number of requests
* to the `counter` channel each time a request completes. For example:
*
* val counterActor = actor<Int>(UI) {
* var counter = 0
* for (total in channel) {
* progressLabel.text = "${++counter} / $total"
* }
* }
*/
suspend fun getAssetStatsWithProgress(counter: SendChannel<Int>): Map<String, AssetStats> {
val symbolMap = getSymbols()?.let { it.map { it.symbol to it }.toMap() } ?: emptyMap()
val total = symbolMap.size
return symbolMap.map { async { getAssetStats(it.key) } }
.mapNotNull { it.await().also { counter.send(total) } }
.map { it.symbol to it }
.toMap()
}
The explanation what exactly makes your wrong approach fail is secondary: the primary thing is fixing the approach.
Instead of async-await or launch, for this communication pattern you should instead have an actor to which all the HTTP jobs send their status. This will automatically handle all your concurrency issues.
Here's some sample code, taken from the link you provided in the comment and adapted to your use case. Instead of some third party asking it for the counter value and updating the GUI with it, the actor runs in the UI context and updates the GUI itself:
import kotlinx.coroutines.experimental.*
import kotlinx.coroutines.experimental.channels.*
import kotlin.system.*
import kotlin.coroutines.experimental.*
object IncCounter
fun counterActor() = actor<IncCounter>(UI) {
var counter = 0
for (msg in channel) {
updateView(++counter)
}
}
fun main(args: Array<String>) = runBlocking {
val counter = counterActor()
massiveRun(CommonPool) {
counter.send(IncCounter)
}
counter.close()
println("View state: $viewState")
}
// Everything below is mock code that supports the example
// code above:
val UI = newSingleThreadContext("UI")
fun updateView(newVal: Int) {
viewState = newVal
}
var viewState = 0
suspend fun massiveRun(context: CoroutineContext, action: suspend () -> Unit) {
val numCoroutines = 1000
val repeatActionCount = 1000
val time = measureTimeMillis {
val jobs = List(numCoroutines) {
launch(context) {
repeat(repeatActionCount) { action() }
}
}
jobs.forEach { it.join() }
}
println("Completed ${numCoroutines * repeatActionCount} actions in $time ms")
}
Running it prints
Completed 1000000 actions in 2189 ms
View state: 1000000
You're losing writes because i++ is not an atomic operation - the value has to be read, incremented, and then written back - and you have multiple threads reading and writing i at the same time. (If you don't provide launch with a context, it uses a threadpool by default.)
You're losing 1 from your count every time two threads read the same value as they will then both write that value plus one.
Synchronizing in some way, for example by using an AtomicInteger solves this:
fun main(args: Array<String>) {
val i = AtomicInteger(0)
val range = (1..100000)
range.map {
launch {
i.incrementAndGet()
}
}
println("$i ${range.count()}") // 100000 100000
}
There's also no guarantee that these background threads will be done with their work by the time you print the result and your program ends - you can test it easily by adding just a very small delay inside launch, a couple milliseconds. With that, it's a good idea to wrap this all in a runBlocking call which will keep the main thread alive and then wait for the coroutines to all finish:
fun main(args: Array<String>) = runBlocking {
val i = AtomicInteger(0)
val range = (1..100000)
val jobs: List<Job> = range.map {
launch {
i.incrementAndGet()
}
}
jobs.forEach { it.join() }
println("$i ${range.count()}") // 100000 100000
}
Have you read Coroutines basics? There's exact same problem as yours:
val c = AtomicInteger()
for (i in 1..1_000_000)
launch {
c.addAndGet(i)
}
println(c.get())
This example completes in less than a second for me, but it prints some arbitrary number, because some coroutines don't finish before main() prints the result.
Because launch is not blocking, there's no guarantee all of coroutines will finish before println. You need to use async, store the Deferred objects and await for them to finish.

Kotlin Process Collection In Parallel?

I have a collection of objects, which I need to perform some transformation on. Currently I am using:
var myObjects: List<MyObject> = getMyObjects()
myObjects.forEach{ myObj ->
someMethod(myObj)
}
It works fine, but I was hoping to speed it up by running someMethod() in parallel, instead of waiting for each object to finish, before starting on the next one.
Is there any way to do this in Kotlin? Maybe with doAsyncTask or something?
I know when this was asked over a year ago it was not possible, but now that Kotlin has coroutines like doAsyncTask I am curious if any of the coroutines can help
Yes, this can be done using coroutines. The following function applies an operation in parallel on all elements of a collection:
fun <A>Collection<A>.forEachParallel(f: suspend (A) -> Unit): Unit = runBlocking {
map { async(CommonPool) { f(it) } }.forEach { it.await() }
}
While the definition itself is a little cryptic, you can then easily apply it as you would expect:
myObjects.forEachParallel { myObj ->
someMethod(myObj)
}
Parallel map can be implemented in a similar way, see https://stackoverflow.com/a/45794062/1104870.
Java Stream is simple to use in Kotlin:
tasks.stream().parallel().forEach { computeNotSuspend(it) }
If you are using Android however, you cannot use Java 8 if you want an app compatible with an API lower than 24.
You can also use coroutines as you suggested. But it's not really part of the language as of now (August 2017) and you need to install an external library. There is very good guide with examples.
runBlocking<Unit> {
val deferreds = tasks.map { async(CommonPool) { compute(it) } }
deferreds.forEach { it.await() }
}
Note that coroutines are implemented with non-blocking multi-threading, which mean they can be faster than traditional multi-threading. I have code below benchmarking the Stream parallel versus coroutine and in that case the coroutine approach is 7 times faster on my machine. However you have to do some work yourself to make sure your code is "suspending" (non-locking) which can be quite tricky. In my example I'm just calling delay which is a suspend function provided by the library. Non-blocking multi-threading is not always faster than traditional multi-threading. It can be faster if you have many threads doing nothing but waiting on IO, which is kind of what my benchmark is doing.
My benchmarking code:
import kotlinx.coroutines.experimental.CommonPool
import kotlinx.coroutines.experimental.async
import kotlinx.coroutines.experimental.delay
import kotlinx.coroutines.experimental.launch
import kotlinx.coroutines.experimental.runBlocking
import java.util.*
import kotlin.system.measureNanoTime
import kotlin.system.measureTimeMillis
class SomeTask() {
val durationMS = random.nextInt(1000).toLong()
companion object {
val random = Random()
}
}
suspend fun compute(task: SomeTask): Unit {
delay(task.durationMS)
//println("done ${task.durationMS}")
return
}
fun computeNotSuspend(task: SomeTask): Unit {
Thread.sleep(task.durationMS)
//println("done ${task.durationMS}")
return
}
fun main(args: Array<String>) {
val n = 100
val tasks = List(n) { SomeTask() }
val timeCoroutine = measureNanoTime {
runBlocking<Unit> {
val deferreds = tasks.map { async(CommonPool) { compute(it) } }
deferreds.forEach { it.await() }
}
}
println("Coroutine ${timeCoroutine / 1_000_000} ms")
val timePar = measureNanoTime {
tasks.stream().parallel().forEach { computeNotSuspend(it) }
}
println("Stream parallel ${timePar / 1_000_000} ms")
}
Output on my 4 cores computer:
Coroutine: 1037 ms
Stream parallel: 7150 ms
If you uncomment out the println in the two compute functions you will see that in the non-blocking coroutine code the tasks are processed in the right order, but not with Streams.
You can use RxJava to solve this.
List<MyObjects> items = getList()
Observable.from(items).flatMap(object : Func1<MyObjects, Observable<String>>() {
fun call(item: MyObjects): Observable<String> {
return someMethod(item)
}
}).subscribeOn(Schedulers.io()).observeOn(AndroidSchedulers.mainThread()).subscribe(object : Subscriber<String>() {
fun onCompleted() {
}
fun onError(e: Throwable) {
}
fun onNext(s: String) {
// do on output of each string
}
})
By subscribing on Schedulers.io(), some method is scheduled on background thread.
To process items of a collection in parallel you can use Kotlin Coroutines. For example the following extension function processes items in parallel and waits for them to be processed:
suspend fun <T, R> Iterable<T>.processInParallel(
dispatcher: CoroutineDispatcher = Dispatchers.IO,
processBlock: suspend (v: T) -> R,
): List<R> = coroutineScope { // or supervisorScope
map {
async(dispatcher) { processBlock(it) }
}.awaitAll()
}
This is suspend extension function on Iterable<T> type, which does a parallel processing of items and returns some result of processing each item. By default it uses Dispatchers.IO dispatcher to offload blocking tasks to a shared pool of threads. Must be called from a coroutine (including a coroutine with Dispatchers.Main dispatcher) or another suspend function.
Example of calling from a coroutine:
val myObjects: List<MyObject> = getMyObjects()
someCoroutineScope.launch {
val results = myObjects.processInParallel {
someMethod(it)
}
// use processing results
}
where someCoroutineScope is an instance of CoroutineScope.
Or if you want to just launch and forget you can use this function:
fun <T> CoroutineScope.processInParallelAndForget(
iterable: Iterable<T>,
dispatcher: CoroutineDispatcher = Dispatchers.IO,
processBlock: suspend (v: T) -> Unit
) = iterable.forEach {
launch(dispatcher) { processBlock(it) }
}
This is an extension function on CoroutineScope, which doesn't return any result. It also uses Dispatchers.IO dispatcher by default. Can be called using CoroutineScope or from another coroutine.
Calling example:
someoroutineScope.processInParallelAndForget(myObjects) {
someMethod(it)
}
// OR from another coroutine:
someCoroutineScope.launch {
processInParallelAndForget(myObjects) {
someMethod(it)
}
}
where someCoroutineScope is an instance of CoroutineScope.