Heap issue when using kotlin coroutine in a Batch process

I want to call an API for each element in a list, so I wrote the following extension function:
suspend fun <T, V> Iterable<T>.customAsyncAll(method: suspend (T) -> V): Iterable<V> {
    val deferredList = mutableListOf<Deferred<V>>()
    val scope = CoroutineScope(Dispatchers.IO)
    forEach {
        val deferred = scope.async {
            try {
                method(it)
            } catch (e: Exception) {
                log.error { "customAsyncAll Exception in $method method " + e.stackTraceToString() }
                throw e
            }
        }
        deferredList.add(deferred)
    }
    return deferredList.awaitAll()
}
I call the code like this:
val result = runBlocking { list.customAsyncAll { apiCall(it) }.toList() }
I see the error: posting Resource Exhausted event: Java heap space. What is wrong with this code?
When an exception is thrown in one of the API calls, are the remaining async coroutines released, or do they still occupy heap space?

I'm guessing you are passing a fairly large list (50+ items). I believe that firing off that many calls at once is the problem, and realistically I don't think you will gain any performance by opening more than 10 connections to the API at a time. My suggestion is to limit the number of concurrent calls to something below 20.
There are many ways to implement this; using a Semaphore is my recommendation.
suspend fun <T, V> Iterable<T>.customAsyncAll(method: suspend (T) -> V): Iterable<V> {
    val deferredList = mutableListOf<Deferred<V>>()
    val scope = CoroutineScope(Dispatchers.IO)
    val sema = Semaphore(10) // at most 10 calls run at the same time
    forEach {
        val deferred = scope.async {
            sema.withPermit {
                try {
                    method(it)
                } catch (e: Exception) {
                    log.error {
                        "customAsyncAll Exception in $method method " +
                            e.stackTraceToString()
                    }
                    throw e
                }
            }
        }
        deferredList.add(deferred)
    }
    return deferredList.awaitAll()
}

Side note
Be sure to cancel any custom CoroutineScope you create once you are done with it (see Custom usage).
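One way to avoid managing a custom scope at all is to run the calls inside coroutineScope, which waits for its children and cancels the remaining ones if any call fails. The following is only a minimal sketch under assumptions: mapBounded, fakeApiCall, and the two counters are hypothetical names introduced for illustration and are not part of the original code.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.sync.Semaphore
import kotlinx.coroutines.sync.withPermit
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical helper: maps each element concurrently, but lets at most
// `limit` calls run at the same time. coroutineScope ties the async
// coroutines to the caller, so there is no custom scope to cancel.
suspend fun <T, V> Iterable<T>.mapBounded(
    limit: Int,
    method: suspend (T) -> V,
): List<V> = coroutineScope {
    val permits = Semaphore(limit)
    map { item ->
        async(Dispatchers.IO) {
            permits.withPermit { method(item) }
        }
    }.awaitAll()
}

val inFlight = AtomicInteger(0)
val maxInFlight = AtomicInteger(0)

// Stand-in for the real API call (an assumption); it records how many
// calls overlap so the limit can be observed.
suspend fun fakeApiCall(id: Int): Int {
    val now = inFlight.incrementAndGet()
    maxInFlight.updateAndGet { maxOf(it, now) }
    delay(20)
    inFlight.decrementAndGet()
    return id * 2
}

fun main() = runBlocking {
    val result = (1..50).mapBounded(limit = 10) { fakeApiCall(it) }
    println("results=${result.size}, max in flight=${maxInFlight.get()}")
}
```

Note that all 50 coroutines are still created up front; the semaphore only bounds how many of them are inside the call at once.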

Related

Getting data from Datastore for injection

I am trying to retrieve the base URL from my Proto DataStore so I can use it to initialize my Ktor client instance. I know how to get the data from the DataStore, but I don't know how to block execution until the value is received so that the client can be initialized with the base URL.
My Ktor client service asks for a NetworkURLS class, which has a method that returns the base URL.
Here is the property I use to retrieve terminalDetails from my Proto DataStore:
val getTerminalDetails: Flow<TerminalDetails> = cxt.terminalDetails.data
    .catch { e ->
        if (e is IOException) {
            Log.d("Error", e.message.toString())
            emit(TerminalDetails.getDefaultInstance())
        } else {
            throw e
        }
    }
Normally, when I want to get the values, I do something like this:
private fun getTerminalDetailsFromStore() {
    try {
        viewModelScope.launch(Dispatchers.IO) {
            localRepository.getTerminalDetails.collect {
                _terminalDetails.value = it
            }
        }
    } catch(e: Exception) {
        Log.d("AdminSettingsViewModel Error", e.message.toString()) // TODO: Handle Error Properly
    }
}
In my current case, however, I want to return terminalDetails.backendHost from a function, and that is where the issue comes in. I know I need a coroutine scope to retrieve the value so that the function itself does not have to be suspending, but how do I prevent the function from returning until the coroutine has finished?
I have tried using async and runBlocking, but async doesn't work the way I expected, and runBlocking hangs the entire app:
fun backendURL(): String = runBlocking {
    var url: String = "localhost"
    val job = CoroutineScope(Dispatchers.IO).async {
        repo.getTerminalDetails.collect {
            it.backendHost
        }
    }
    url
}
Can anyone give me some assistance on getting this to work?
EDIT: Here is my temporary solution; I do not intend to keep it this way. The issue with runBlocking {} turned out to be that the Flow<T> never completes, so runBlocking {} keeps blocking the app.
fun backendURL(): String {
    val details = MutableStateFlow<TerminalDetails>(TerminalDetails.getDefaultInstance())
    val job = CoroutineScope(Dispatchers.IO).launch {
        repo.getTerminalDetails.collect {
            details.value = it
        }
    }
    runBlocking {
        delay(250L)
    }
    return details.value.backendHost
}
EDIT 2: I fully fixed my issue. I created a method with the same name as my val (a personal decision) which uses runBlocking {} and Flow<T>.first() to block until the value is retrieved. I did not replace my val with the function because there are other places where I need the same information and can use coroutines properly, since I am not initializing app components there.
val getTerminalDetails: Flow<TerminalDetails> = cxt.terminalDetails.data
    .catch { e ->
        if (e is IOException) {
            Log.d("Error", e.message.toString())
            emit(TerminalDetails.getDefaultInstance())
        } else {
            throw e
        }
    }
fun getTerminalDetails(): TerminalDetails = runBlocking {
    cxt.terminalDetails.data.first()
}
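The difference between collect and first() on a flow that never completes can be seen in isolation. This is a minimal sketch, assuming a MutableStateFlow as a stand-in for the DataStore-backed flow; backendHosts, backendUrl, and the host name are hypothetical and used for illustration only.

```kotlin
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.first
import kotlinx.coroutines.runBlocking

// Stand-in for the DataStore flow (an assumption): like a StateFlow,
// it keeps emitting updates and never completes, so collect {} on it
// would never return.
val backendHosts = MutableStateFlow("api.example.test")

// first() suspends only until one value arrives, then stops collecting,
// which is why runBlocking can return here.
fun backendUrl(): String = runBlocking { backendHosts.first() }

fun main() {
    println(backendUrl())
}
```

Blocking like this is only reasonable during initialization, as in the question; everywhere else the flow can be collected from a coroutine.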

How to get correct return value for suspend function when using GlobalScope.launch?

I have a suspend function
private suspend fun getResponse(record: String): HashMap<String, String> {}
When I call it in my main function as shown below, the type of response is Job, not HashMap. How can I get the correct return type?
override fun handleRequest(event: SQSEvent?, context: Context?): Void? {
    event?.records?.forEach {
        try {
            val response: Job = GlobalScope.launch {
                getResponse(it.body)
            }
        } catch (ex: Exception) {
            logger.error("error message")
        }
    }
    return null
}

Given your answers in the comments, it looks like you're not looking for concurrency here. The best course of action would then be to just make getResponse() a regular function instead of a suspend one.
Assuming you can't change this, you need to call a suspend function from a regular one. To do so, you have several options depending on your use case:
1. block the current thread while you do your async stuff
2. make handleRequest a suspend function
3. make handleRequest take a CoroutineScope to start coroutines with some lifecycle controlled externally, but that means handleRequest will return immediately and the caller has to deal with the running coroutines (please don't use GlobalScope for this, it's a delicate API)
Options 2 and 3 are provided for completeness, but most likely they won't work in your context. So you have to block the current thread while handleRequest is running, and you can do that using runBlocking:
override fun handleRequest(event: SQSEvent?, context: Context?): Void? {
    runBlocking {
        // do your stuff
    }
    return null
}
Now what to do inside runBlocking depends on what you want to achieve.
If you want to process elements sequentially, simply call getResponse directly inside the loop:
override fun handleRequest(event: SQSEvent?, context: Context?): Void? {
    runBlocking {
        event?.records?.forEach {
            try {
                val response = getResponse(it.body)
                // do something with the response
            } catch (ex: Exception) {
                logger.error("error message")
            }
        }
    }
    return null
}
If you want to process elements concurrently, but independently, you can use launch and put both getResponse() and the code using the response inside the launch:
override fun handleRequest(event: SQSEvent?, context: Context?): Void? {
    runBlocking {
        event?.records?.forEach {
            launch { // coroutine scope provided by runBlocking
                try {
                    val response = getResponse(it.body)
                    // do something with the response
                } catch (ex: Exception) {
                    logger.error("error message")
                }
            }
        }
    }
    return null
}
If you want to get the responses concurrently, but process all responses only when they're all done, you can use map + async:
override fun handleRequest(event: SQSEvent?, context: Context?): Void? {
    runBlocking {
        val responses = event?.records.orEmpty()
            .map {
                async { // coroutine scope provided by runBlocking
                    try {
                        getResponse(it.body)
                    } catch (ex: Exception) {
                        logger.error("error message")
                        null // if you want to still handle other responses
                        // you could also throw an exception otherwise
                    }
                }
            }
            .awaitAll()
            .filterNotNull() // drop the failed calls, which produced null above
        // do something with all responses
    }
    return null
}

You can use GlobalScope.async() instead of launch(). It returns a Deferred, which is a future/promise object. You can then call await() on it to get the result of getResponse().
Just make sure not to do something like async().await(): it wouldn't make any sense, because it would still run sequentially. If you need to run getResponse() on all event.records in parallel, first loop over them and collect all the Deferred objects, and then await all of them.
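The "collect all Deferred objects, then await them" pattern can be sketched on its own. This is a hedged illustration only: the getResponse body below is a hypothetical stand-in, and handleAll is a made-up name that uses runBlocking's scope rather than GlobalScope.

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking

// Hypothetical stand-in for the suspend function from the question.
suspend fun getResponse(record: String): HashMap<String, String> {
    delay(10) // pretend to do some I/O
    return hashMapOf("body" to record)
}

// Starts one coroutine per record first, and only then waits for all of
// them, so the calls overlap instead of running one after another.
fun handleAll(records: List<String>): List<HashMap<String, String>> = runBlocking {
    records
        .map { record -> async { getResponse(record) } } // List<Deferred<...>>
        .awaitAll()                                       // results, in input order
}

fun main() {
    println(handleAll(listOf("a", "b", "c")))
}
```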

How can I guarantee to get latest data when I use Coroutine in Kotlin?

Code A is from the architecture-samples project; you can see it here.
updateTasksFromRemoteDataSource() is a suspend function, so it may run asynchronously.
When I call getTasks(forceUpdate: Boolean) with the parameter set to true, I'm afraid that return tasksLocalDataSource.getTasks() will be executed before updateTasksFromRemoteDataSource() has finished.
I don't know whether Code B can guarantee that return tasksLocalDataSource.getTasks() is executed after updateTasksFromRemoteDataSource().
Code A
class DefaultTasksRepository(
    private val tasksRemoteDataSource: TasksDataSource,
    private val tasksLocalDataSource: TasksDataSource,
    private val ioDispatcher: CoroutineDispatcher = Dispatchers.IO
) : TasksRepository {
    override suspend fun getTasks(forceUpdate: Boolean): Result<List<Task>> {
        // Set app as busy while this function executes.
        wrapEspressoIdlingResource {
            if (forceUpdate) {
                try {
                    updateTasksFromRemoteDataSource()
                } catch (ex: Exception) {
                    return Result.Error(ex)
                }
            }
            return tasksLocalDataSource.getTasks()
        }
    }
    private suspend fun updateTasksFromRemoteDataSource() {
        val remoteTasks = tasksRemoteDataSource.getTasks()
        if (remoteTasks is Success) {
            // Real apps might want to do a proper sync, deleting, modifying or adding each task.
            tasksLocalDataSource.deleteAllTasks()
            remoteTasks.data.forEach { task ->
                tasksLocalDataSource.saveTask(task)
            }
        } else if (remoteTasks is Result.Error) {
            throw remoteTasks.exception
        }
    }
    ...
}
Code B
class DefaultTasksRepository(
    private val tasksRemoteDataSource: TasksDataSource,
    private val tasksLocalDataSource: TasksDataSource,
    private val ioDispatcher: CoroutineDispatcher = Dispatchers.IO
) : TasksRepository {
    override suspend fun getTasks(forceUpdate: Boolean): Result<List<Task>> {
        // Set app as busy while this function executes.
        wrapEspressoIdlingResource {
            coroutineScope {
                if (forceUpdate) {
                    try {
                        updateTasksFromRemoteDataSource()
                    } catch (ex: Exception) {
                        return Result.Error(ex)
                    }
                }
            }
            return tasksLocalDataSource.getTasks()
        }
    }
    ...
}
Added Content
To Tenfour04: Thanks!
If somebody implements updateTasksFromRemoteDataSource() with launch, as in Code C, are you sure that return tasksLocalDataSource.getTasks() will be executed after updateTasksFromRemoteDataSource() when I call getTasks(forceUpdate: Boolean) with the parameter set to true?
Code C
class DefaultTasksRepository(
    private val tasksRemoteDataSource: TasksDataSource,
    private val tasksLocalDataSource: TasksDataSource,
    private val ioDispatcher: CoroutineDispatcher = Dispatchers.IO
) : TasksRepository {
    override suspend fun getTasks(forceUpdate: Boolean): Result<List<Task>> {
        // Set app as busy while this function executes.
        wrapEspressoIdlingResource {
            if (forceUpdate) {
                try {
                    updateTasksFromRemoteDataSource()
                } catch (ex: Exception) {
                    return Result.Error(ex)
                }
            }
            return tasksLocalDataSource.getTasks()
        }
    }
    private suspend fun updateTasksFromRemoteDataSource() {
        val remoteTasks = tasksRemoteDataSource.getTasks()
        if (remoteTasks is Success) {
            // Real apps might want to do a proper sync, deleting, modifying or adding each task.
            tasksLocalDataSource.deleteAllTasks()
            launch { //I suppose that launch can be fired
                remoteTasks.data.forEach { task ->
                    tasksLocalDataSource.saveTask(task)
                }
            }
        } else if (remoteTasks is Result.Error) {
            throw remoteTasks.exception
        }
    }
}
New Added Content
To Joffrey: Thanks!
I think that Code D compiles.
In this case, when forceUpdate is true, tasksLocalDataSource.getTasks() may run before updateTasksFromRemoteDataSource() is done.
Code D
class DefaultTasksRepository(
    private val tasksRemoteDataSource: TasksDataSource,
    private val tasksLocalDataSource: TasksDataSource,
    private val ioDispatcher: CoroutineDispatcher = Dispatchers.IO,
    private val myCoroutineScope: CoroutineScope
) : TasksRepository {
    override suspend fun getTasks(forceUpdate: Boolean): Result<List<Task>> {
        // Set app as busy while this function executes.
        wrapEspressoIdlingResource {
            if (forceUpdate) {
                try {
                    updateTasksFromRemoteDataSource(myCoroutineScope)
                } catch (ex: Exception) {
                    return Result.Error(ex)
                }
            }
            return tasksLocalDataSource.getTasks()
        }
    }
    private suspend fun updateTasksFromRemoteDataSource(myCoroutineScope: CoroutineScope) {
        val remoteTasks = tasksRemoteDataSource.getTasks()
        if (remoteTasks is Success) {
            // Real apps might want to do a proper sync, deleting, modifying or adding each task.
            tasksLocalDataSource.deleteAllTasks()
            myCoroutineScope.launch {
                remoteTasks.data.forEach { task ->
                    tasksLocalDataSource.saveTask(task)
                }
            }
        } else if (remoteTasks is Result.Error) {
            throw remoteTasks.exception
        }
    }
    ...
}

suspend functions look like regular functions from the call site's point of view because they execute sequentially just like regular synchronous functions.
What I mean by this is that the instructions following a plain call to a suspend function do not execute until the called function completes its execution.
This means that code A is fine (when forceUpdate is true, tasksLocalDataSource.getTasks() will never run before updateTasksFromRemoteDataSource() is done), and the coroutineScope in code B is unnecessary.
Now regarding code C, structured concurrency is here to save you.
People simply cannot call launch without a CoroutineScope receiver.
Since DefaultTasksRepository doesn't extend CoroutineScope, code C as-is will not compile.
There are 2 ways to make this compile though:
1. Using GlobalScope.launch {}: this will cause the problem you expect, indeed. The body of such a launch will be run asynchronously and independently of the caller. updateTasksFromRemoteDataSource can in this case return before the launch's body is done. The only way to control this is to use .join() on the Job returned by the call to launch (which waits until it's done). This is why it is usually not recommended to use the GlobalScope, because it can "leak" coroutines.
2. Wrapping calls to launch in a coroutineScope {...} inside updateTasksFromRemoteDataSource. This will ensure that all coroutines launched within the coroutineScope block are actually finished before the coroutineScope call completes. Note that everything that's inside the coroutineScope block may very well run concurrently, though, depending on how launch/async are used, but this is the whole point of using launch in the first place, isn't it?
Now with Code D, my answer for code C sort of still holds. Whether you pass a scope or use the GlobalScope, you're effectively creating coroutines with a bigger lifecycle than the suspending function that starts them.
Therefore, it does create the problem you fear.
But why would you pass a CoroutineScope if you don't want implementers to launch long lived coroutines in the provided scope?
Assuming you don't do that, it's unlikely that a developer would use the GlobalScope (or any scope) to do this. It's generally bad style to create long-lived coroutines from a suspending function. If your function is suspending, callers usually expect that when it completes, it has actually done its work.
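The difference between the two approaches can be observed with a small experiment. This is only a sketch under assumptions: updateStructured, updateInScope, and the events list are hypothetical names, and the delays stand in for the remote fetch and the local saves.

```kotlin
import kotlinx.coroutines.*
import java.util.Collections

// Records the order in which things happen (thread-safe, because the
// external scope below runs on another thread).
val events: MutableList<String> = Collections.synchronizedList(mutableListOf<String>())

// Structured: coroutineScope does not return until its children finish.
suspend fun updateStructured() {
    coroutineScope {
        launch {
            delay(50) // stands in for saving the tasks
            events += "structured: saved"
        }
    }
}

// Unstructured: the launched coroutine belongs to the external scope and
// outlives this suspend function.
suspend fun updateInScope(scope: CoroutineScope): Job {
    delay(10) // stands in for the remote fetch
    return scope.launch {
        delay(50)
        events += "external: saved"
    }
}

fun main() = runBlocking {
    updateStructured()
    events += "structured: returned"

    val external = CoroutineScope(Dispatchers.Default)
    val job = updateInScope(external)
    events += "external: returned" // recorded before the save has happened
    job.join()
    external.cancel()

    events.forEach { println(it) }
}
```

In the structured case the save is recorded before the function returns; with the external scope the function returns first, which is the ordering problem the question is worried about.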

How to convert Java blocking function into cancellable suspend function?

Kotlin suspend functions should be non-blocking by convention (1). Often we have old Java code that relies on the Java thread interruption mechanism, which we cannot (or don't want to) modify (2):
public void doSomething(String arg) {
    for (int i = 0; i < 100_000; i++) {
        heavyCrunch(arg, i);
        if (Thread.interrupted()) {
            // We've been interrupted: no more crunching.
            return;
        }
    }
}
What is the best way to adapt this code for usage in coroutines?
Version A is unacceptable because it runs the code on the caller's thread, which violates the "suspending functions do not block the caller thread" convention:
suspend fun doSomething(param: String) = delegate.performBlockingCode(param)
Version B is better because it runs the blocking function on a background thread, so it doesn't block the caller's thread (unless the caller happens to be running on a thread from the same Dispatchers.Default pool). But cancelling the coroutine's job wouldn't interrupt performBlockingCode(), which relies on thread interruption.
suspend fun doSomething(param: String) = withContext(Dispatchers.Default) {
    delegate.performBlockingCode(param)
}
Version C is currently the only way I see to make it work. The idea is to convert the blocking function into a non-blocking one using Java mechanisms, and then use suspendCancellableCoroutine (3) to convert the asynchronous method into a suspend function:
private ExecutorService executor = Executors.newSingleThreadExecutor();
public Future doSomethingAsync(String arg) {
    return executor.submit(() -> {
        doSomething(arg);
    });
}
suspend fun doSomething(param: String) = suspendCancellableCoroutine<Any> { cont ->
    try {
        val future = delegate.doSomethingAsync(param)
    } catch (e: InterruptedException) {
        throw CancellationException()
    }
    cont.invokeOnCancellation { future.cancel(true) }
}
As commented below, the above code won't work properly, because continuation.resumeWith() is never called.
Version D uses CompletableFuture, which provides a way to register a callback for when the future completes (thenAccept):
private ExecutorService executor = Executors.newSingleThreadExecutor();
public CompletableFuture doSomethingAsync(String arg) {
    return CompletableFuture.runAsync(() -> doSomething(arg), executor);
}
suspend fun doSomething(param: String) = suspendCancellableCoroutine<Any> { cont ->
    try {
        val completableFuture = delegate.doSomethingAsync(param)
        completableFuture.thenAccept { cont.resumeWith(Result.success(it)) }
        cont.invokeOnCancellation { completableFuture.cancel(true) }
    } catch (e: InterruptedException) {
        throw CancellationException()
    }
}
Do you know any better way for that?
https://docs.oracle.com/javase/tutorial/essential/concurrency/interrupt.html
https://medium.com/@elizarov/blocking-threads-suspending-coroutines-d33e11bf4761
https://medium.com/@elizarov/callbacks-and-kotlin-flows-2b53aa2525cf

You can wrap blocking code with the suspend function kotlinx.coroutines.runInterruptible.
It suppresses the compile warning, and the blocking code will throw InterruptedException on cancellation:
val job = launch {
    runInterruptible {
        Thread.sleep(500)
    }
}
job.cancelAndJoin() // Cause will be 'java.lang.InterruptedException'
Tested on org.jetbrains.kotlinx:kotlinx-coroutines-core-jvm:1.4.2
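Applied to the question, runInterruptible can also take a dispatcher, which moves the blocking call off the caller's thread and makes it react to cancellation in one step. A minimal sketch, assuming a stand-in for the legacy Java method: legacyBlockingWork, started, and wasInterrupted are hypothetical names used only to make the effect observable.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.atomic.AtomicBoolean

val started = CompletableDeferred<Unit>()
val wasInterrupted = AtomicBoolean(false)

// Stand-in for the legacy Java method (an assumption): it blocks the
// thread and only stops early when the thread is interrupted.
fun legacyBlockingWork() {
    started.complete(Unit)
    try {
        Thread.sleep(60_000)
    } catch (e: InterruptedException) {
        wasInterrupted.set(true)
        throw e
    }
}

// Cancellable suspend wrapper: the block runs on Dispatchers.IO, and
// cancelling the coroutine interrupts the thread that is running it.
suspend fun doSomething() = runInterruptible(Dispatchers.IO) { legacyBlockingWork() }

fun main() = runBlocking {
    val job = launch { doSomething() }
    started.await()     // make sure the blocking call is actually running
    job.cancelAndJoin() // returns promptly instead of waiting for the sleep
    println("cancelled=${job.isCancelled}, interrupted=${wasInterrupted.get()}")
}
```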

Kotlin Process Collection In Parallel?

I have a collection of objects, which I need to perform some transformation on. Currently I am using:
var myObjects: List<MyObject> = getMyObjects()
myObjects.forEach { myObj ->
    someMethod(myObj)
}
It works fine, but I was hoping to speed it up by running someMethod() in parallel, instead of waiting for each object to finish, before starting on the next one.
Is there any way to do this in Kotlin? Maybe with doAsyncTask or something?
I know when this was asked over a year ago it was not possible, but now that Kotlin has coroutines like doAsyncTask I am curious if any of the coroutines can help

Yes, this can be done using coroutines. The following function applies an operation in parallel on all elements of a collection:
fun <A> Collection<A>.forEachParallel(f: suspend (A) -> Unit): Unit = runBlocking {
    // Dispatchers.Default replaces CommonPool from the old experimental API
    map { async(Dispatchers.Default) { f(it) } }.forEach { it.await() }
}
While the definition itself is a little cryptic, you can then easily apply it as you would expect:
myObjects.forEachParallel { myObj ->
    someMethod(myObj)
}
Parallel map can be implemented in a similar way, see https://stackoverflow.com/a/45794062/1104870.

Java Stream is simple to use in Kotlin:
tasks.stream().parallel().forEach { computeNotSuspend(it) }
If you are using Android however, you cannot use Java 8 if you want an app compatible with an API lower than 24.
You can also use coroutines as you suggested. But they are not really part of the language as of now (August 2017), and you need to install an external library. There is a very good guide with examples.
runBlocking<Unit> {
    val deferreds = tasks.map { async(CommonPool) { compute(it) } }
    deferreds.forEach { it.await() }
}
Note that coroutines are implemented with non-blocking multi-threading, which means they can be faster than traditional multi-threading. The code below benchmarks Stream parallel against coroutines, and in that case the coroutine approach is 7 times faster on my machine. However, you have to do some work yourself to make sure your code is "suspending" (non-blocking), which can be quite tricky. In my example I'm just calling delay, which is a suspend function provided by the library. Non-blocking multi-threading is not always faster than traditional multi-threading. It can be faster if you have many threads doing nothing but waiting on IO, which is more or less what my benchmark does.
My benchmarking code:
import kotlinx.coroutines.experimental.CommonPool
import kotlinx.coroutines.experimental.async
import kotlinx.coroutines.experimental.delay
import kotlinx.coroutines.experimental.launch
import kotlinx.coroutines.experimental.runBlocking
import java.util.*
import kotlin.system.measureNanoTime
import kotlin.system.measureTimeMillis

class SomeTask() {
    val durationMS = random.nextInt(1000).toLong()
    companion object {
        val random = Random()
    }
}

suspend fun compute(task: SomeTask): Unit {
    delay(task.durationMS)
    //println("done ${task.durationMS}")
    return
}

fun computeNotSuspend(task: SomeTask): Unit {
    Thread.sleep(task.durationMS)
    //println("done ${task.durationMS}")
    return
}

fun main(args: Array<String>) {
    val n = 100
    val tasks = List(n) { SomeTask() }
    val timeCoroutine = measureNanoTime {
        runBlocking<Unit> {
            val deferreds = tasks.map { async(CommonPool) { compute(it) } }
            deferreds.forEach { it.await() }
        }
    }
    println("Coroutine ${timeCoroutine / 1_000_000} ms")
    val timePar = measureNanoTime {
        tasks.stream().parallel().forEach { computeNotSuspend(it) }
    }
    println("Stream parallel ${timePar / 1_000_000} ms")
}
Output on my 4 cores computer:
Coroutine: 1037 ms
Stream parallel: 7150 ms
If you uncomment the println in the two compute functions, you will see that in the non-blocking coroutine code the tasks are processed in the right order, but not with Streams.

You can use RxJava to solve this.
val items: List<MyObjects> = getList()
Observable.from(items)
    .flatMap(object : Func1<MyObjects, Observable<String>> {
        override fun call(item: MyObjects): Observable<String> {
            return someMethod(item)
        }
    })
    .subscribeOn(Schedulers.io())
    .observeOn(AndroidSchedulers.mainThread())
    .subscribe(object : Subscriber<String>() {
        override fun onCompleted() {
        }
        override fun onError(e: Throwable) {
        }
        override fun onNext(s: String) {
            // do on output of each string
        }
    })
By subscribing on Schedulers.io(), someMethod is scheduled on a background thread.

To process items of a collection in parallel you can use Kotlin Coroutines. For example the following extension function processes items in parallel and waits for them to be processed:
suspend fun <T, R> Iterable<T>.processInParallel(
    dispatcher: CoroutineDispatcher = Dispatchers.IO,
    processBlock: suspend (v: T) -> R,
): List<R> = coroutineScope { // or supervisorScope
    map {
        async(dispatcher) { processBlock(it) }
    }.awaitAll()
}
This is a suspend extension function on the Iterable<T> type that processes items in parallel and returns the result of processing each item. By default it uses the Dispatchers.IO dispatcher to offload blocking tasks to a shared pool of threads. It must be called from a coroutine (including a coroutine running on Dispatchers.Main) or from another suspend function.
Example of calling from a coroutine:
val myObjects: List<MyObject> = getMyObjects()
someCoroutineScope.launch {
    val results = myObjects.processInParallel {
        someMethod(it)
    }
    // use processing results
}
where someCoroutineScope is an instance of CoroutineScope.
Or if you want to just launch and forget you can use this function:
fun <T> CoroutineScope.processInParallelAndForget(
    iterable: Iterable<T>,
    dispatcher: CoroutineDispatcher = Dispatchers.IO,
    processBlock: suspend (v: T) -> Unit
) = iterable.forEach {
    launch(dispatcher) { processBlock(it) }
}
This is an extension function on CoroutineScope that doesn't return any result. It also uses the Dispatchers.IO dispatcher by default. It can be called on a CoroutineScope or from another coroutine.
Calling example:
someCoroutineScope.processInParallelAndForget(myObjects) {
    someMethod(it)
}
// OR from another coroutine:
someCoroutineScope.launch {
    processInParallelAndForget(myObjects) {
        someMethod(it)
    }
}
where someCoroutineScope is an instance of CoroutineScope.
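For a quick end-to-end check, processInParallel can be run from a plain main function. This is a sketch under assumptions: square is a hypothetical stand-in for someMethod, with delays chosen so that later items finish first, which shows that awaitAll still returns results in input order.

```kotlin
import kotlinx.coroutines.*

suspend fun <T, R> Iterable<T>.processInParallel(
    dispatcher: CoroutineDispatcher = Dispatchers.IO,
    processBlock: suspend (v: T) -> R,
): List<R> = coroutineScope {
    map { async(dispatcher) { processBlock(it) } }.awaitAll()
}

// Hypothetical stand-in for someMethod: larger inputs finish sooner.
suspend fun square(n: Int): Int {
    delay((40 - 10L * n).coerceAtLeast(0L))
    return n * n
}

fun main() = runBlocking {
    val results = listOf(1, 2, 3).processInParallel { square(it) }
    println(results) // [1, 4, 9]
}
```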
