"Fire and forget" with structured concurrency and coroutines


I have a little endpoint that looks like this:
val numbers = it.bodyAsString.parseJsonList<Numbers>()
val processedNumbers = numberService.process(numbers)
GlobalScope.launch {
sqsService.sendToSqs(processedNumbers)
}
it.response.setStatusCode(204).end()
I use GlobalScope because the producer only needs the acknowledgement once the numbers have been processed. So I am trying to do a fire-and-forget on a parallel track, so that I can respond to the producer immediately.
What would be the “best practice” way of doing this with structured concurrency? Should I create my own scope (like fireAndForgetScope instead of GlobalScope)?

As you already guessed, creating your own scope would be a good solution in this case.
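You can define it as a member of your controller:
// SupervisorJob keeps one failed task from cancelling the scope (and with it every later launch)
private val bgScope = CoroutineScope(SupervisorJob() + newFixedThreadPoolContext(4, "background-tasks"))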
Then usage is very similar to what you're doing:
val numbers = it.bodyAsString.parseJsonList<Numbers>()
val processedNumbers = numberService.process(numbers)
bgScope.launch {
sqsService.sendToSqs(processedNumbers)
}
it.response.setStatusCode(204).end()
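Here is a slightly fuller sketch of the same idea. It assumes kotlinx.coroutines is on the classpath. The class name BackgroundTasks and its fire/shutdown functions are hypothetical. The sketch adds two things a long-lived fire-and-forget scope usually needs:
- a SupervisorJob, so one failed task does not cancel the scope and silently stop every later launch;
- a CoroutineExceptionHandler, so failures are at least logged.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical wrapper around a long-lived scope for fire-and-forget work.
class BackgroundTasks(dispatcher: CoroutineDispatcher = Dispatchers.IO) {
    // Counts failed tasks; exposed only so the behaviour can be observed.
    val failures = AtomicInteger(0)

    // Invoked for uncaught failures of tasks launched directly in the scope.
    private val handler = CoroutineExceptionHandler { _, e ->
        failures.incrementAndGet()
        System.err.println("Background task failed: $e")
    }

    // SupervisorJob: a failing child does not cancel the scope or its siblings.
    private val scope = CoroutineScope(SupervisorJob() + dispatcher + handler)

    // Returns immediately; the Job is returned only so callers can wait if they want to.
    fun fire(block: suspend CoroutineScope.() -> Unit): Job = scope.launch(block = block)

    // Call on application shutdown so pending tasks are cancelled instead of leaking.
    fun shutdown() = scope.cancel()
}
```

In the endpoint you would call something like tasks.fire { sqsService.sendToSqs(processedNumbers) }. In the fire-and-forget case you can ignore the returned Job. Call tasks.shutdown() when the server stops.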

Related

Thread-safe access to the same variable from different flows (Kotlin)

Is this code thread safe? Do I need a synchronized block or something like that? source1 and source2 are endless Kotlin Flows.
viewModelScope.launch {
var listAll = mutableListOf<String>()
var list1 = mutableListOf<String>()
var list2 = mutableListOf<String>()
launch {
source1.getNames().collect { list ->
list1 = list
listAll = mutableListOf()
listAll.addAll(list1)
listAll.addAll(list2)
//then consume listAll as StateFlow or return another flow with emit(listAll)
}
}
launch {
source2.getNames().collect { list ->
list2 = list
listAll = mutableListOf()
listAll.addAll(list2)
listAll.addAll(list1)
//then consume listAll as StateFlow or return another flow with emit(listAll)
}
}
}

This code is not thread safe.
However, it is called from viewModelScope.launch, which runs on Dispatchers.Main by default. So both inner launch blocks run on the single main thread. They interleave only at suspension points and never execute at the same time, so there is no data race, and listAll reflects whichever collector ran last.
To get truly parallel execution, you would use viewModelScope.launch(Dispatchers.Default).
In that case your code may throw a ConcurrentModificationException or produce inconsistent lists.
To synchronize it, you may want to use Java's Collections.synchronizedList. It locks the list while one thread is operating on it, so other threads cannot modify it at the same time. Note that it only makes individual calls atomic. A compound sequence such as "replace listAll, then addAll twice" still needs an external lock.
Or synchronize manually using a Mutex:
val mutex = Mutex()
viewModelScope.launch(Dispatchers.Default) {
launch {
mutex.withLock {
... // Your code
}
}
launch {
mutex.withLock {
... // Your code
}
}
}
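Here is a runnable sketch of the Mutex approach, assuming kotlinx.coroutines. NamesMerger and its onSource1/onSource2 functions are hypothetical names for what the two collect blocks do.
- It stores read-only lists and rebuilds listAll while holding the lock, so each update is atomic with respect to the other collector.
- Unlike the question, both updates use the same list1 + list2 order.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock

// Hypothetical holder for the state both collectors update.
class NamesMerger {
    private val mutex = Mutex()
    private var list1: List<String> = emptyList()
    private var list2: List<String> = emptyList()

    // Latest merged result; written only while holding the mutex, @Volatile for readers.
    @Volatile
    var listAll: List<String> = emptyList()
        private set

    suspend fun onSource1(names: List<String>) {
        mutex.withLock {
            list1 = names
            listAll = list1 + list2
        }
    }

    suspend fun onSource2(names: List<String>) {
        mutex.withLock {
            list2 = names
            listAll = list1 + list2
        }
    }
}
```

Assuming getNames() emits List<String>, each launch inside viewModelScope.launch(Dispatchers.Default) would then be launch { source1.getNames().collect { merger.onSource1(it) } }, and likewise for source2.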
Read the official Kotlin guide to shared mutable state.
That said, I am struggling to imagine a real-life example in which you would actually use that code. You probably don't need parallel behavior, and you will be fine without two launch blocks. Otherwise, you should rethink your design so that two coroutines don't need manual synchronization.
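One way to rethink the design is the combine operator from kotlinx.coroutines. This is a sketch, not the answer's own code. combine keeps the latest value of each flow and recomputes the merged list whenever either one emits, so there is no shared mutable state to synchronize. The allNames function and its parameters are hypothetical stand-ins for source1.getNames() and source2.getNames(), assumed to be Flow<List<String>>.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

// Re-runs the lambda with the latest list from each source whenever either emits.
fun allNames(names1: Flow<List<String>>, names2: Flow<List<String>>): Flow<List<String>> =
    combine(names1, names2) { list1, list2 -> list1 + list2 }
```

combine emits only after both flows have emitted at least once. If you need output as soon as either source has data, add .onStart { emit(emptyList()) } to each source.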

How to return an int value stuck in a for loop but a callback in Kotlin?

I am trying to get the number of documents in this Firebase collection, and for some reason I can't get this to work in Kotlin. In an Int function, I declared a variable initialized to zero and increment it inside a for loop over the documents. But when I return the value, it is zero. Here is my code; please help me understand why it returns zero.
This is just what is being passed to the function
var postSize = 0
That is the global variable. Now for the code below:
val db = FirebaseFirestore.getInstance()
val first = db.collection("Post").orderBy("timestamp")
getPostSize(first)
This is the function
private fun getPostSize(first: Query){
first.get().addOnSuccessListener { documents ->
for(document in documents) {
Log.d(TAG, "${document.id} => ${document.data}")
getActualPostSize(postSize++)
}
}
return postSize
}
private fun getActualPostSize(sizeOfPost: Int): Int {
// The number does push to what I am expecting right here if I called a print statement
return sizeOfPost // However here it just returns it to be zero again. Why #tenffour04? Why?
}
It is my understanding, according to the other question that this was linked to, that I was supposed to do something like this.

This question has answers that explain how to approach getting results from asynchronous APIs, like you're trying to do.
Here is a more detailed explanation using your specific example since you were having trouble adapting the answer from there.
Suppose this is your original code you were trying to make work:
// In your "calling code" (inside onCreate() or some click listener):
val db = FirebaseFirestore.getInstance()
val first = db.collection("Post").orderBy("timestamp")
val postSize = getPostSize(first)
// do something with postSize
// Elsewhere in your class:
private fun getPostSize(first: Query): Int {
var postSize = 0
first.get().addOnSuccessListener { documents ->
for(document in documents) {
Log.d(TAG, "${document.id} => ${document.data}")
postSize++
}
}
return postSize
}
The reason this doesn't work is that the code inside your addOnSuccessListener is called some time in the future, after getPostSize() has already returned.
The reason asynchronous code is called in the future is because it takes a long time to do its action, but it's bad to wait for it on the calling thread because it will freeze your UI and make the whole phone unresponsive. So the time-consuming action is done in the background on another thread, which allows the calling code to continue doing what it's doing and finish immediately so it doesn't freeze the UI. When the time-consuming action is finally finished, only then is its callback/lambda code executed.
A simple retrieval from Firebase like this likely takes less than half a second, but this is still too much time to freeze the UI, because it would make the phone seem janky. Half a second in the future is still in the future compared to the code that is called underneath and outside the lambda.
To keep the examples below short, let's simplify your original function to avoid the for loop, since it was unnecessary:
private fun getPostSize(first: Query): Int {
var postSize = 0
first.get().addOnSuccessListener { documents ->
postSize = documents.count()
}
return postSize
}
The following are multiple distinct approaches for working with asynchronous code. You only have to pick one. You don't have to do all of them.
1. Make your function take a callback instead of returning a value.
Change your function into a higher-order function. Since the function doesn't directly return the post size, it is a good convention to put "Async" in the function name. The function now calls the callback to pass it the value you wanted to retrieve. The callback runs in the future, when the listener has been called.
private fun getPostSizeAsync(first: Query, callback: (Int) -> Unit) {
first.get().addOnSuccessListener { documents ->
val postSize = documents.count()
callback(postSize)
}
}
Then to use your function in your "calling code", you must use the retrieved value inside the callback, which can be defined using a lambda:
// In your "calling code" (inside onCreate() or some click listener):
val db = FirebaseFirestore.getInstance()
val first = db.collection("Post").orderBy("timestamp")
getPostSizeAsync(first) { postSize ->
// do something with postSize inside the lambda here
}
// Don't try to do something with postSize after the lambda here. Code under
// here is called before the code inside the lambda because the lambda is called
// some time in the future.
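To try the callback pattern outside Android, here is a self-contained sketch. fakeGetDocuments is a hypothetical stand-in for first.get().addOnSuccessListener { ... }. It invokes its listener later from a background thread. Real Firebase listeners run on the main thread by default, but a background thread is enough to show that the value only exists inside the callback.

```kotlin
import kotlin.concurrent.thread

// Hypothetical stand-in for first.get().addOnSuccessListener { documents -> ... }.
fun fakeGetDocuments(onSuccess: (List<String>) -> Unit) {
    thread {
        Thread.sleep(100) // simulate network latency
        onSuccess(listOf("post1", "post2", "post3"))
    }
}

// Solution 1: the result is handed to the callback instead of being returned.
fun getPostSizeAsync(callback: (Int) -> Unit) {
    fakeGetDocuments { documents -> callback(documents.count()) }
}
```

Calling getPostSizeAsync { size -> println("Post count: $size") } prints the count about 100 ms later. By then, any code written below the call has already run.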
2. Handle the response directly in the calling code.
You might have noticed that solution 1 just creates an intermediate callback step, because you already have to deal with the callback lambda passed to addOnSuccessListener. You could eliminate the getPostSize function completely and handle callbacks in one place in your code. I wouldn't normally recommend this, because it violates the DRY principle and mixes multiple levels of abstraction in a single function. However, it may be better to start this way until you better grasp the concept of asynchronous code.
It would look like this:
// In your "calling code" (inside onCreate() or some click listener):
val db = FirebaseFirestore.getInstance()
val first = db.collection("Post").orderBy("timestamp")
first.get().addOnSuccessListener { documents ->
val postSize = documents.count()
// do something with postSize inside the lambda here
}
// Don't try to do something with postSize after the lambda here. Code under
// here is called before the code inside the lambda because the lambda is called
// some time in the future.
3. Put the result in a LiveData. Observe the LiveData separately.
You can create a LiveData that will update its observers about results when it gets them. This may not be a good fit for certain situations, because it would get really complicated if you had to turn observers on and off for your particular logic flow. I think it is probably a bad solution for your code because you might have different queries you want to pass to this function, so it wouldn't really make sense to have it keep publishing its results to the same LiveData, because the observers wouldn't know which query the latest postSize is related to.
But here is how it could be done.
private val postSizeLiveData = MutableLiveData<Int>()
// Function name changed "get" to "fetch" to reflect it doesn't return
// anything but simply initiates a fetch operation:
private fun fetchPostSize(query: Query) {
query.get().addOnSuccessListener { documents ->
postSizeLiveData.value = documents.count()
}
}
// In your "calling code" (inside onCreate() or some click listener):
val db = FirebaseFirestore.getInstance()
val first = db.collection("Post").orderBy("timestamp")
fetchPostSize(first)
postSizeLiveData.observe(this) { postSize ->
// Do something with postSize inside this observer that will
// be called some time in the future.
}
// Don't try to do something with postSize after the lambda here. Code under
// here is called before the code inside the lambda because the lambda is called
// some time in the future.
4. Use a suspend function and coroutine.
Coroutines let you write sequential code that reads like synchronous code, without blocking the calling thread. Once you learn them, they lead to simpler code because there's less nesting of asynchronous callback lambdas.

Consider option 1. It becomes very complicated if you need to call more than one asynchronous function in a row to get the results you want, for example if you needed postSize to decide what to retrieve from Firebase next. You would have to call another callback-based higher-order function inside the lambda of your first call, nesting future code inside other future code. (This is nicknamed "callback hell".)

To write this sequentially instead, launch a coroutine from lifecycleScope (or viewLifecycleOwner.lifecycleScope in a Fragment, or viewModelScope in a ViewModel). Convert your getter function into a suspend function so it can be called from a coroutine without a callback. The kotlinx-coroutines-play-services library provides an await() suspend extension on the Task that Firebase returns. It waits for the result without blocking the thread when you're in a coroutine.

(More properly, you should use try/catch when you call await(), because Firebase may fail to retrieve the documents. I skipped that for simplicity, since your original code didn't handle failure with an error listener either.)
private suspend fun getPostSize(first: Query): Int {
return first.get().await().count()
}
// In your "calling code" (inside onCreate() or some click listener):
lifecycleScope.launch {
val db = FirebaseFirestore.getInstance()
val first = db.collection("Post").orderBy("timestamp")
val postSize = getPostSize(first)
// do something with postSize
}
// Code under here will run before the coroutine finishes so
// typically, you launch coroutines and do all your work inside them.
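You may be curious what await() does, or need the same thing for a callback API that has no await() extension. The usual tool is suspendCancellableCoroutine from kotlinx.coroutines. This sketch wraps a hypothetical fakeGetDocuments callback API that stands in for the success and failure listeners. It is not the library's actual await() implementation.

```kotlin
import kotlinx.coroutines.*
import kotlin.concurrent.thread
import kotlin.coroutines.resume
import kotlin.coroutines.resumeWithException

// Hypothetical callback API standing in for addOnSuccessListener/addOnFailureListener.
fun fakeGetDocuments(
    shouldFail: Boolean,
    onSuccess: (List<String>) -> Unit,
    onFailure: (Exception) -> Unit,
) {
    thread {
        Thread.sleep(100) // simulate network latency
        if (shouldFail) onFailure(IllegalStateException("network error"))
        else onSuccess(listOf("post1", "post2", "post3"))
    }
}

// Suspends until exactly one of the callbacks fires, then resumes with its outcome.
suspend fun getPostSize(shouldFail: Boolean = false): Int =
    suspendCancellableCoroutine { cont ->
        fakeGetDocuments(
            shouldFail,
            onSuccess = { documents -> cont.resume(documents.count()) },
            onFailure = { e -> cont.resumeWithException(e) },
        )
    }
```

Inside lifecycleScope.launch you would call getPostSize() directly and wrap it in try/catch, as suggested above for await().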
Coroutines are the common way to do this in Kotlin, but they are a complex topic for a newcomer. I recommend you start with one of the first two solutions until you are much more comfortable with Kotlin and higher-order functions.

Can I build a Kotlin SharedFlow where the consumer dictates replay length?

Question
When instantiating a Kotlin MutableSharedFlow<T>, you can specify a replay length n >= 0. All consumers get the last n events replayed. Is there a good way to extend or wrap MutableSharedFlow so that the consumer dictates how many events (if any) they want replayed?
Example desired consumer code
flow.collectWithReplay(count = 1) { event -> ... }
Count would of course have to be equal to or less than the upper bound set by the flow instance.
Rationale
Sometimes you want to act differently on old and new events. An example is an event containing one-time information that is irrelevant after it has been consumed once (e.g. data for an error dialog). You may still want to know that the last state was an error, but since it is old, you don't show the dialog again. You'd then call flow.replayCache.lastOrNull() to get the old event and subscribe to new ones using .collectWithReplay(0).
Other times you don't want that distinction and then it would be a hassle to do the two calls separately. .collectWithReplay(1) then yields less and prettier code.
Solution attempted
I have made a solution using my own 1-element replay cache, which solves a special case for n=1. It would be trivial to extend to any n - that's not the point, but I dislike a couple of things about it:
a) It doesn't utilize the built in replay mechanism of SharedFlow
b) It's not thread-safe. collectWithReplay might lose an event emitted in between its line 1 and 2
c) Not sure if I lose any performance by losing inline on the collect method signature
open class FlowEventBus<T>() {
private val _flow = MutableSharedFlow<T>(replay = 0)
var latest: T? = null
private set
suspend fun emit(event: T) {
latest = event
_flow.emit(event) // suspends until all subscribers receive the event
}
/** Consumers who only want events occurring from now on subscribe here */
suspend fun collect(action: suspend (value: T) -> Unit) = _flow.collect(action)
/** Consumers who want the last event emitted as well as future events subscribe here */
suspend fun collectWithReplay(action: suspend (value: T) -> Unit) {
latest?.let { action(it) } // Replay any cached event
_flow.collect(action) // Listen for new events
}
}

Answer to main question
Here is the foundation of a solution based on the suggestion from #tenfour04:
val mainFlow = MutableSharedFlow<String>(10)
If consumers want a different replay value, they do this:
val flowForTwo = mainFlow.shareIn(threadPoolScope, SharingStarted.Eagerly, 2)
flowForTwo.collect { }
You'll be creating a new SharedFlow each time you do this though, so performance may suffer.
Variation: Event bus with zero or one replay
Here is a solution where the flow is wrapped in an event bus and the consumer may choose a replay length of 0 or 1. This solution has some race-condition quirks when you emit and collect very close together in time. Run and understand the failing unit test before using it in production. I don't know how to fix it, or whether it's worth fixing. You might be better off just using a variation of my original idea.
/**
* FlowEventBus where consumer can decide between single replay or no replay when collecting.
* Warning: It has some concurrency issues that are apparent when you run the tests
*/
class FlowEventBus<T> {
private val threadPoolScope = CoroutineScope(Dispatchers.Default + SupervisorJob())
private val eventsWithSingleReplay = MutableSharedFlow<T>(replay = 1) // private mutable shared flow
private val eventsWithoutReplay = eventsWithSingleReplay.shareIn(threadPoolScope, SharingStarted.Eagerly, replay = 0)
val latest: T?
get() = eventsWithSingleReplay.replayCache.lastOrNull()
/** Emit a new event */
suspend fun emit(event: T) = eventsWithSingleReplay.emit(event)
/** Consumers who only want events occurring from now on subscribe here */
suspend fun collect(action: suspend (value: T) -> Unit) = eventsWithoutReplay.collect(action)
/** Consumers who want the last event emitted as well as future events subscribe here */
suspend fun collectWithReplay(action: suspend (value: T) -> Unit) {
eventsWithSingleReplay.collect(action)
}
}
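A different sketch, not taken from the answers above, avoids creating a SharedFlow per consumer. The consumer reads the current replayCache size and drops the replayed events it does not want. withReplay is a hypothetical extension name, not a kotlinx.coroutines API. It shares a race with the approaches above in one narrow case: one extra old event can be replayed if an event is emitted after the cache size is read but before the collector subscribes, while the cache is not yet full.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

// Replays at most `count` of the events currently in this SharedFlow's replay cache,
// then continues with live events. A count above the flow's own replay size is capped by it.
fun <T> SharedFlow<T>.withReplay(count: Int): Flow<T> {
    require(count >= 0) { "count must be >= 0" }
    return flow {
        // Read the cache size as late as possible, right before subscribing.
        val toDrop = (replayCache.size - count).coerceAtLeast(0)
        emitAll(this@withReplay.drop(toDrop))
    }
}
```

bus.withReplay(1).collect { event -> ... } then reads much like the collectWithReplay(count = 1) call from the question.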

Kotlin wrap sequential IO calls as a Sequence

I need to process all of the results from a paged API endpoint. I'd like to present all of the results as a sequence.
I've come up with the following (slightly pseudo-coded):
suspend fun getAllRowsFromAPI(client: Client): Sequence<Row> {
var currentRequest: Request? = client.requestForNextPage()
return withContext(Dispatchers.IO) {
sequence {
while(currentRequest != null) {
var rowsInPage = runBlocking { client.makeRequest(currentRequest) }
currentRequest = client.requestForNextPage()
yieldAll(rowsInPage)
}
}
}
}
This works, but I'm not sure about a couple of things:
Is the API request happening inside runBlocking still happening with the IO dispatcher?
Is there a way to refactor the code to launch the next request before yielding the current results, then awaiting on it later?

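Question 1: No. withContext(Dispatchers.IO) only wraps the creation of the Sequence, which is lazy and returns immediately. The sequence body, including runBlocking, runs later on whichever thread iterates the sequence, unless makeRequest switches dispatchers internally. runBlocking also blocks that thread while waiting for the request, so no other tasks can be scheduled on it in the meantime. There's rarely a reason to use runBlocking like this in production code, because: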
If makeRequest is already a blocking call, then runBlocking will do practically nothing.
If makeRequest was a suspending call, then runBlocking would make the code less efficient. It wouldn't yield the thread back to the pool while waiting for the request to finish.
Whether makeRequest is a blocking or non-blocking call depends on the client you're using. Here's a non-blocking http-client I can recommend: https://ktor.io/clients/
Question 2: I would use a Flow for this purpose. You can think of it as a suspendable variant of Sequence. Flows are cold, which means a flow won't run until the consumer asks for its contents. (This is as opposed to hot, where the producer pushes new values whether or not the consumer wants them.) A Kotlin Flow has an operator called buffer, which you can use to make it request more pages before it has fully consumed the previous page.
The code could look quite similar to what you already have:
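import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.flow.*
// No suspend modifier needed: building a Flow is cheap and nothing runs until it is collected
fun getAllRowsFromAPI(client: Client): Flow<Row> = flow {
var currentRequest: Request? = client.requestForNextPage()
while (currentRequest != null) {
val rowsInPage = client.makeRequest(currentRequest)
emit(rowsInPage) // emit whole pages so that buffer() below counts pages, not rows
currentRequest = client.requestForNextPage()
}
}.flowOn(Dispatchers.IO)
.buffer(capacity = 1)
.transform { page -> emitAll(page.asFlow()) } // flatten each page back into rows
Emitting whole pages and flattening them afterwards makes the buffer count pages rather than rows. With a capacity of 1, at most one already-fetched page waits in the buffer while an earlier page is being processed. You could increase the buffer size to prefetch more pages; the requests themselves still run one after another.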
You should check out this talk from KotlinConf 2019 to learn more about flows: https://www.youtube.com/watch?v=tYcqn48SMT8
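Question 2 also asked about starting the next request before yielding the current results. That can be written literally with async inside the flow. This sketch uses a hypothetical FakePagedClient whose fetchPage(index) returns null after the last page. The question's Client/Request API is pseudo-code, so these types are assumptions.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

// Hypothetical paged API: fetchPage(i) suspends like a network call and returns null past the end.
class FakePagedClient(private val pages: List<List<String>>) {
    suspend fun fetchPage(index: Int): List<String>? {
        delay(50) // simulate network latency without blocking a thread
        return pages.getOrNull(index)
    }
}

fun allRows(client: FakePagedClient): Flow<String> = flow {
    // coroutineScope allows starting child coroutines; emit() stays in the flow's own coroutine.
    coroutineScope {
        var next: Deferred<List<String>?> = async { client.fetchPage(0) }
        var index = 1
        while (true) {
            val page = next.await() ?: break
            val nextIndex = index++
            next = async { client.fetchPage(nextIndex) } // start the next request first...
            emitAll(page.asFlow())                       // ...then hand out the current rows
        }
    }
}
```

This keeps exactly one request in flight ahead of the consumer. If the collector stops early (for example with take), the pending request is cancelled together with the scope.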

Sequences are definitely not what you want in this case, because they are not designed to work in an asynchronous environment. You could look at flows and channels, but for your case the best and simplest choice is just a collection of deferred values. You want to process all requests at once, whereas flows and channels process them one by one, perhaps with a limited buffer size.
The following approach lets you start all requests asynchronously, assuming that makeRequest is a suspend function and supports asynchronous requests. When you need your results, you only have to wait for the slowest request to finish.
fun getClientRequests(client: Client): List<Request> {
val requests = ArrayList<Request>()
var currentRequest: Request? = client.requestForNextPage()
while (currentRequest != null) {
requests += currentRequest
currentRequest = client.requestForNextPage()
}
return requests
}
// This function is not even a suspend function, so it returns almost immediately
fun getAllRowsFromAPI(client: Client): List<Deferred<Page>> =
getClientRequests(client).map {
/*
* The better practice would be making getAllRowsFromApi an extension function
* to CoroutineScope and calling receiver scope's async function.
* GlobalScope is used here just for simplicity.
*/
GlobalScope.async(Dispatchers.IO) { client.makeRequest(it) }
}
fun main() {
val client = Client()
val deferredPages = getAllRowsFromAPI(client) // This line executes fast
// Here you can do whatever you want, all requests are processed in background
Thread.sleep(999L)
// Then, when we need results....
val pages = runBlocking {
deferredPages.map { it.await() }
}
println(pages)
// In your case you also want to "unpack" pages and get rows, you can do it here:
val rows = pages.flatMap { it.getRows() }
println(rows)
}
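The comment in the code above suggests making the function an extension on CoroutineScope instead of using GlobalScope. Here is a sketch of that variant, with two hypothetical stand-ins:
- fakeMakeRequest stands in for client.makeRequest;
- page numbers stand in for Request objects.
The requests become children of the caller's scope, so they are cancelled together with it and their failures are not lost.

```kotlin
import kotlinx.coroutines.*

// Hypothetical suspending request: returns the rows of one page.
suspend fun fakeMakeRequest(page: Int): List<String> {
    delay(100L) // simulate network latency
    return listOf("row${page}a", "row${page}b")
}

// Starts every request immediately as a child of the receiver scope and returns at once.
fun CoroutineScope.startAllRequests(pages: List<Int>): List<Deferred<List<String>>> =
    pages.map { page -> async(Dispatchers.IO) { fakeMakeRequest(page) } }
```

The caller can do other work and later call awaitAll().flatten() on the result. Because the requests overlap, the total wait is roughly that of the slowest request rather than the sum.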

I happened across suspendingSequence in Kotlin's coroutines-examples:
https://github.com/Kotlin/coroutines-examples/blob/090469080a974b962f5debfab901954a58a6e46a/examples/suspendingSequence/suspendingSequence.kt
This is exactly what I was looking for.

How to avoid NetworkOnMainThreadException in this case?

I have a function:
fun getUpdatedStr(): DoubleArray {
var Strings : DoubleArray = doubleArrayOf()
for (i in 0..9) {
val page = Jsoup.connect("somesite.com").get()
val table = page.select("table").first().select("td").first()
Strings += table.text().toDouble()
}
return Strings
}
That throws an android.os.NetworkOnMainThreadException. My problem is that if I try to put this function into a Thread then I can't return the value to use it for other functions. What's the best way to work around this?

You cannot make a network call on Android from the main thread. You must use a worker thread.
This can be done with a plain old Java thread, or with higher-level constructs such as AsyncTask (deprecated since API level 30), HandlerThread, RxJava, coroutines, etc.
Normally you can't "return" from a thread the way you want, because execution continues on the main thread right after the new thread is started.
If you use coroutines, you can do this with suspend functions.
If you don't want to learn coroutines, with RxJava you'd have to return an Observable to the calling functions.
Otherwise, make your method callback-based, and invoke the callback when you're finished.
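Here is a minimal sketch of the suspend-function route, assuming kotlinx.coroutines. fetchCellTextBlocking is a hypothetical stand-in for the blocking Jsoup.connect(...).get() call plus the select(...) lookup. withContext(Dispatchers.IO) moves that blocking work off the calling thread, while still letting the function return a value to its coroutine caller.

```kotlin
import kotlinx.coroutines.*

// Hypothetical blocking call standing in for Jsoup.connect(url).get() and the select(...) lookup.
fun fetchCellTextBlocking(): String {
    Thread.sleep(50) // simulate blocking network I/O
    return "42.5"
}

// Runs the ten blocking fetches on the IO dispatcher and returns the parsed values.
suspend fun getUpdatedValues(): DoubleArray = withContext(Dispatchers.IO) {
    DoubleArray(10) { fetchCellTextBlocking().toDouble() }
}
```

On Android you would call it as lifecycleScope.launch { val values = getUpdatedValues(); /* update the UI here */ }. lifecycleScope runs on the main thread, so the UI update is safe there.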
