Can I build a Kotlin SharedFlow where the consumer dictates replay length? - kotlin

Question
When instantiating a Kotlin MutableSharedFlow<T>, you can specify a replay length of n >= 0. All consumers will then get the last n events replayed. Is there a good way to extend or wrap MutableSharedFlow so that the consumer dictates how many (if any) events he/she wants replayed?
Example desired consumer code
flow.collectWithReplay(count = 1) { event -> ... }
Count would of course have to be less than or equal to the upper bound set by the flow instance.
Rationale
Sometimes you want to act differently on old and new events. An example is when the event contains one-time information that is irrelevant once it has been consumed (e.g. data for an error dialog). You may still want to know that the last state was an error, but since it is old you don't show a dialog again. You'd then call flow.replayCache.lastOrNull() to get the old event and then subscribe to new ones using .collectWithReplay(0).
Other times you don't want that distinction and then it would be a hassle to do the two calls separately. .collectWithReplay(1) then yields less and prettier code.
Solution attempted
I have made a solution using my own 1-element replay cache, which solves a special case for n=1. It would be trivial to extend to any n - that's not the point, but I dislike a couple of things about it:
a) It doesn't utilize the built in replay mechanism of SharedFlow
b) It's not thread-safe. collectWithReplay might lose an event emitted in between its line 1 and 2
c) Not sure if I lose any performance by losing inline on the collect method signature
import kotlinx.coroutines.flow.MutableSharedFlow

open class FlowEventBus<T>() {
    private val _flow = MutableSharedFlow<T>(replay = 0)

    // Written by emit() and read by collectWithReplay() without synchronization
    var latest: T? = null
        private set

    suspend fun emit(event: T) {
        latest = event
        _flow.emit(event) // suspends until all subscribers receive the event
    }

    /** Consumers who only want events occurring from now on subscribe here */
    suspend fun collect(action: suspend (value: T) -> Unit) = _flow.collect(action)

    /** Consumers who want the last event emitted as well as future events subscribe here */
    suspend fun collectWithReplay(action: suspend (value: T) -> Unit) {
        latest?.let { action(it) } // Replay any cached event
        _flow.collect(action) // Listen for new events
    }
}

Answer to main question
Here is the foundation for a solution based on the suggestion from #tenfour04
val mainFlow = MutableSharedFlow<String>(10)
If consumers want a different replay value, they do this:
val flowForTwo = mainFlow.shareIn(threadPoolScope, SharingStarted.Eagerly, 2)
flowForTwo.collect { }
You'll be creating a new SharedFlow each time you do this though, so performance may suffer.
See working test
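As a rough, runnable illustration of the shareIn idea above: the sketch below fills a replay-10 flow, derives a replay-2 flow from it, and checks what a late subscriber receives. The function name lateSubscriberSeesLastTwo, the scope, and the fixed delay are assumptions made for the example, not part of the original answer.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

// Emits five values into a replay-10 flow, then derives a replay-2 view of it.
// Returns the first two values a late subscriber of the derived flow receives.
fun lateSubscriberSeesLastTwo(): List<String> = runBlocking {
    // Assumed scope for the sharing coroutine; cancelled at the end of the demo.
    val scope = CoroutineScope(Dispatchers.Default + SupervisorJob())
    val mainFlow = MutableSharedFlow<String>(replay = 10)
    listOf("a", "b", "c", "d", "e").forEach { mainFlow.emit(it) }

    val flowForTwo = mainFlow.shareIn(scope, SharingStarted.Eagerly, replay = 2)
    // Crude wait: gives the sharing coroutine time to drain the upstream replay cache.
    delay(200)

    val received = flowForTwo.take(2).toList()
    scope.cancel()
    received
}

fun main() {
    println(lateSubscriberSeesLastTwo())
}
```

The fixed delay is only there to keep the demo short. A subscriber that arrives while the derived flow is still catching up may see older values, which is the kind of timing quirk to test for before relying on this pattern.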
Variation: Event bus with zero or one replay
Here is a solution where the flow is wrapped in an event bus and the consumer may choose between a replay length of 0 or 1. This solution comes with some race condition quirks when you emit and collect very close in time. Run and understand this failing unit test before using it in production. I don't know how to fix it, or if it's worth fixing. You might be better off just using a variation of my original idea.
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

/**
 * FlowEventBus where the consumer can decide between single replay or no replay when collecting.
 * Warning: it has some concurrency issues that become apparent when you run the tests.
 */
class FlowEventBus<T> {
    private val threadPoolScope = CoroutineScope(Dispatchers.Default + SupervisorJob())
    private val eventsWithSingleReplay = MutableSharedFlow<T>(replay = 1) // private mutable shared flow
    private val eventsWithoutReplay = eventsWithSingleReplay.shareIn(threadPoolScope, SharingStarted.Eagerly, replay = 0)

    val latest: T?
        get() = eventsWithSingleReplay.replayCache.lastOrNull()

    /** Emit a new event */
    suspend fun emit(event: T) = eventsWithSingleReplay.emit(event)

    /** Consumers who only want events occurring from now on subscribe here */
    suspend fun collect(action: suspend (value: T) -> Unit) = eventsWithoutReplay.collect(action)

    /** Consumers who want the last event emitted as well as future events subscribe here */
    suspend fun collectWithReplay(action: suspend (value: T) -> Unit) {
        eventsWithSingleReplay.collect(action)
    }
}

Related

Thread-safe access to the same variable from different flows (Kotlin)

Is this code thread-safe? Do I need a synchronized block or something like that? source1 and source2 are endless Kotlin Flows.
viewModelScope.launch {
    var listAll = mutableListOf<String>()
    var list1 = mutableListOf<String>()
    var list2 = mutableListOf<String>()
    launch {
        source1.getNames().collect { list ->
            list1 = list
            listAll = mutableListOf()
            listAll.addAll(list1)
            listAll.addAll(list2)
            //then consume listAll as StateFlow or return another flow with emit(listAll)
        }
    }
    launch {
        source2.getNames().collect { list ->
            list2 = list
            listAll = mutableListOf()
            listAll.addAll(list2)
            listAll.addAll(list1)
            //then consume listAll as StateFlow or return another flow with emit(listAll)
        }
    }
}
This code is not thread-safe.
However, it is called from viewModelScope.launch, which runs on Dispatchers.Main by default. Both inner launch blocks therefore run on the main thread: their collect lambdas can interleave, but they never execute in parallel, so the lists are not modified concurrently.
To run the collectors in parallel, you would use viewModelScope.launch(Dispatchers.Default).
In that case your code can throw a ConcurrentModificationException.
To synchronize it, you could use Java's Collections.synchronizedList, which locks the list while one thread performs an operation on it, so that other threads cannot modify it at the same time.
Or you can synchronize manually using a Mutex.
val mutex = Mutex()
viewModelScope.launch(Dispatchers.Default) {
    launch {
        mutex.withLock {
            ... // Your code
        }
    }
    launch {
        mutex.withLock {
            ... // Your code
        }
    }
}
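One thing to watch with the Mutex approach: if the lock wraps an entire collect call on an endless flow, the other coroutine never acquires it. A minimal sketch that holds the lock only for each update instead, assuming a hypothetical MergedNames holder and finite flowOf stand-ins for getNames():

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock

// Hypothetical holder for the two partial lists; the lock is held per update only.
class MergedNames {
    private val mutex = Mutex()
    private var list1: List<String> = emptyList()
    private var list2: List<String> = emptyList()

    // Each update replaces one partial list and returns the merged result atomically.
    suspend fun updateFirst(names: List<String>): List<String> =
        mutex.withLock { list1 = names; list1 + list2 }

    suspend fun updateSecond(names: List<String>): List<String> =
        mutex.withLock { list2 = names; list1 + list2 }

    suspend fun snapshot(): List<String> = mutex.withLock { list1 + list2 }
}

fun finalSnapshot(): List<String> = runBlocking {
    val merged = MergedNames()
    // Finite stand-ins for source1.getNames() and source2.getNames()
    val source1 = flowOf(listOf("a"), listOf("a", "b"))
    val source2 = flowOf(listOf("x"), listOf("x", "y"))
    // withContext returns only after both child coroutines have finished.
    withContext(Dispatchers.Default) {
        launch { source1.collect { merged.updateFirst(it) } }
        launch { source2.collect { merged.updateSecond(it) } }
    }
    merged.snapshot()
}

fun main() {
    println(finalSnapshot())
}
```

Unlike the question's code, this sketch always merges in list1 + list2 order, and it uses immutable lists so a merged result can be handed out without further locking.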
Read the official Kotlin guide to shared mutable state and concurrency.
That said, I struggle to imagine a real-life case in which you would actually use this code. You probably don't need parallel execution here, and you would be fine without the two launch blocks. Otherwise, rethink your design so that you don't need to synchronize two coroutines manually.
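One possible redesign along those lines is to let the combine operator merge the latest value of each source, so that no mutable state is shared between coroutines. This is a sketch only: mergedNames is a hypothetical helper, and the flowOf sources stand in for source1.getNames() and source2.getNames().

```kotlin
import kotlinx.coroutines.flow.*
import kotlinx.coroutines.runBlocking

// Emits the concatenation of the latest list from each source.
// Nothing is emitted until both sources have produced at least one value.
fun mergedNames(
    source1: Flow<List<String>>,
    source2: Flow<List<String>>
): Flow<List<String>> = combine(source1, source2) { first, second -> first + second }

fun main() = runBlocking {
    val source1 = flowOf(listOf("a", "b"))
    val source2 = flowOf(listOf("x"))
    mergedNames(source1, source2).collect { println(it) }
}
```

If the merged list should be available before one of the sources has emitted, each source could be given an initial value with onStart { emit(emptyList()) } before combining.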

KafkaConsumer: `seekToEnd()` does not make consumer consume from latest offset

I have the following code
class Consumer(val consumer: KafkaConsumer<String, ConsumerRecord<String>>) {
    fun run() {
        consumer.seekToEnd(emptyList())
        val pollDuration = 30L // seconds
        while (true) {
            val records = consumer.poll(Duration.ofSeconds(pollDuration))
            // perform record analysis and commitSync()
        }
    }
}
The topic the consumer is subscribed to continuously receives records. Occasionally, the consumer will crash during the processing step. When the consumer is then restarted, I want it to consume from the latest offset on the topic (i.e. ignore records that were published to the topic while the consumer was down). I thought the seekToEnd() method would ensure that. However, it seems like the method has no effect at all. The consumer starts to consume from the offset at which it crashed.
What is the correct way to use seekToEnd()?
Edit: The consumer is created with the following configs
fun <T> buildConsumer(valueDeserializer: String): KafkaConsumer<String, T> {
    val props = setupConfig(valueDeserializer)
    Common.setupConsumerSecurityProtocol(props)
    return createConsumer(props)
}

fun setupConfig(valueDeserializer: String): Properties {
    // Configuration setup
    val props = Properties()
    props[ConsumerConfig.GROUP_ID_CONFIG] = config.applicationId
    props[ConsumerConfig.CLIENT_ID_CONFIG] = config.kafka.clientId
    props[ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG] = config.kafka.bootstrapServers
    props[AbstractKafkaSchemaSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG] = config.kafka.schemaRegistryUrl
    props[ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG] = config.kafka.stringDeserializer
    props[ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG] = valueDeserializer
    props[KafkaAvroDeserializerConfig.SPECIFIC_AVRO_READER_CONFIG] = "true"
    props[ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG] = config.kafka.maxPollIntervalMs
    props[ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG] = config.kafka.sessionTimeoutMs
    props[ConsumerConfig.ALLOW_AUTO_CREATE_TOPICS_CONFIG] = "false"
    props[ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG] = "false"
    props[ConsumerConfig.AUTO_OFFSET_RESET_CONFIG] = "latest"
    return props
}

fun <T> createConsumer(props: Properties): KafkaConsumer<String, T> {
    val consumer = KafkaConsumer<String, T>(props)
    consumer.subscribe(listOf(config.kafka.inputTopic))
    return consumer
}
I found a solution!
I needed to add a dummy poll as part of the consumer initialization. Several KafkaConsumer methods are evaluated lazily, and partitions are only assigned to the consumer during poll(). Without the dummy poll, the consumer has no assigned partitions yet, so seekToEnd() has nothing to seek and has no effect.
The dummy poll must be long enough for the partitions to get assigned. For instance, with consumer.poll(Duration.ofSeconds(1)), the partitions had not been assigned by the time the program moved on to the next method call (i.e. seekToEnd()).
Working code could look something like this:
class Consumer(val consumer: KafkaConsumer<String, ConsumerRecord<String>>) {
    fun run() {
        // Initialization
        val pollDuration = 30L // seconds
        consumer.poll(Duration.ofSeconds(pollDuration)) // Dummy poll to get assigned partitions
        // Seek to end and commit new offset
        consumer.seekToEnd(emptyList())
        consumer.commitSync()
        while (true) {
            val records = consumer.poll(Duration.ofSeconds(pollDuration))
            // perform record analysis and commitSync()
        }
    }
}
The seekToEnd method operates on actual partitions (in Kafka terms, TopicPartition): the ones you want your consumer to read from the end of.
I am not familiar with the Kotlin API, but the JavaDocs for KafkaConsumer's seekToEnd method show that it takes a collection of TopicPartitions.
According to the same JavaDocs, an empty collection means all partitions currently assigned to the consumer. Since no partitions are assigned before the first poll(), calling it with emptyList() at that point has no impact at all, just like you observed.

How to emit Flow value from different function? Kotlin Coroutines

I have a flow :
val myflow = kotlinx.coroutines.flow.flow<Message>{}
and want to emit values with function:
override suspend fun sendMessage(chat: Chat, message: Message) {
myflow.emit(message)
}
But the compiler does not allow me to do this. Are there any workarounds for this problem?
You can use StateFlow for such a use case.
Here's some sample code.
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

val chatFlow = MutableStateFlow<String>("")

fun main() = runBlocking {
    // Observe values
    val job = launch {
        chatFlow.collect {
            print("$it ")
        }
    }
    // Change values
    arrayOf("Hey", "Hi", "Hello").forEach {
        delay(100)
        sendMessage(it)
    }
    delay(1000)
    // Cancel running job
    job.cancel()
    job.join()
}

suspend fun sendMessage(message: String) {
    chatFlow.value = message
}
You can test this code by running the snippet at https://pl.kotl.in/DUBDfUnX3
The answer of Animesh Sahu is pretty much correct. You can also return a Channel as a flow (see consumeAsFlow or asFlow on a BroadcastChannel).
But there is also a thing called StateFlow currently in development by Kotlin team, which is, in part, meant to implement a similar behavior, although it is unknown when it is going to be ready.
EDIT: StateFlow and SharedFlow have been released as part of a stable API (https://blog.jetbrains.com/kotlin/2020/10/kotlinx-coroutines-1-4-0-introducing-stateflow-and-sharedflow/). These tools can and should be used when state management is required in an async execution context.
Use a SharedFlow; it has everything you need.
Initialization of your flow:
val myFlow = MutableSharedFlow<Message>()
and now it should just work as you were trying earlier with:
override suspend fun sendMessage(chat: Chat, message: Message) {
myFlow.emit(message)
}
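A small runnable sketch of this approach follows, assuming a hypothetical MessageBus wrapper and String messages in place of the question's Message type. With the default replay of 0, a collector has to be subscribed before emit is called, otherwise it misses the value.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

// Hypothetical wrapper; String stands in for the question's Message type.
class MessageBus {
    private val _messages = MutableSharedFlow<String>() // replay = 0 by default
    val messages: SharedFlow<String> = _messages.asSharedFlow()

    // Any suspending function can emit, not only code inside a flow { } block.
    suspend fun sendMessage(message: String) = _messages.emit(message)
}

fun collectThree(): List<String> = runBlocking {
    val bus = MessageBus()
    // UNDISPATCHED start: the collector subscribes before the first emit,
    // which matters because nothing is replayed to late subscribers.
    val received = async(start = CoroutineStart.UNDISPATCHED) {
        bus.messages.take(3).toList()
    }
    listOf("Hey", "Hi", "Hello").forEach { bus.sendMessage(it) }
    received.await()
}

fun main() {
    println(collectThree())
}
```

Exposing only the read-only SharedFlow through asSharedFlow() keeps emission inside the class, which mirrors the sendMessage function in the question.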
A Flow is self-contained: once the block (lambda) inside the flow builder has executed, the flow is over. You have to do your operations inside the block and emit from there.
Here is a similar GitHub issue, which says:
Afaik Flow is designed to be a self contained, replayable, cold stream, so emission from outside of it's own scope wouldn't be part of the contract. I think what you're looking for is a Channel.
And IMHO you're probably looking for Channels, or specifically a ConflatedBroadcastChannel for multiple receivers. The difference between a normal channel and a broadcast channel is that multiple receivers can listen to a broadcast channel using the openSubscription function, which returns a ReceiveChannel associated with the BroadcastChannel.

Kotlin wrap sequential IO calls as a Sequence

I need to process all of the results from a paged API endpoint. I'd like to present all of the results as a sequence.
I've come up with the following (slightly pseudo-coded):
suspend fun getAllRowsFromAPI(client: Client): Sequence<Row> {
    var currentRequest: Request? = client.requestForNextPage()
    return withContext(Dispatchers.IO) {
        sequence {
            while(currentRequest != null) {
                var rowsInPage = runBlocking { client.makeRequest(currentRequest) }
                currentRequest = client.requestForNextPage()
                yieldAll(rowsInPage)
            }
        }
    }
}
This works, but I'm not sure about a couple of things:
1. Is the API request happening inside runBlocking still happening with the IO dispatcher?
2. Is there a way to refactor the code to launch the next request before yielding the current results, then await it later?
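Question 1: Not necessarily. The sequence builder is lazy, so its body (including the runBlocking call) runs on whichever thread iterates the sequence, and that happens after withContext(Dispatchers.IO) has already returned. Either way, runBlocking blocks the thread it's running on, which means that no other tasks can be scheduled on that thread while waiting for the request to finish. There's rarely a reason to use runBlocking in production code like this, because: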
If makeRequest is already a blocking call, then runBlocking will do practically nothing.
If makeRequest was a suspending call, then runBlocking would make the code less efficient. It wouldn't yield the thread back to the pool while waiting for the request to finish.
Whether makeRequest is a blocking or non-blocking call depends on the client you're using. Here's a non-blocking http-client I can recommend: https://ktor.io/clients/
Question 2: I would use a Flow for this purpose. You can think of it as a suspendable variant of Sequence. Flows are cold, which means that they won't run before the consumer asks for their contents (in contrast to being hot, which means the producer pushes new values whether or not the consumer wants them). A Kotlin Flow has an operator called buffer, which you can use to make it request more pages before it has fully consumed the previous page.
The code could look quite similar to what you already have:
// Not a suspend function: building a cold Flow does no work until it is collected
fun getAllRowsFromAPI(client: Client): Flow<Row> = flow {
    var currentRequest: Request? = client.requestForNextPage()
    while (currentRequest != null) {
        val rowsInPage = client.makeRequest(currentRequest)
        emitAll(rowsInPage.asFlow())
        currentRequest = client.requestForNextPage()
    }
}.flowOn(Dispatchers.IO)
    .buffer(capacity = 1)
A capacity of 1 means that it will only make 1 more request while processing an earlier page. You could increase the buffer size to let it request further ahead.
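One caveat worth checking: buffer counts emitted elements, so when the flow emits individual rows, a capacity of 1 buffers a single row rather than a whole page. A sketch that buffers at page level and flattens afterwards is shown below. FakeClient and fetchPage are hypothetical stand-ins for the paged client, and Int stands in for Row.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

// Hypothetical stand-in for the paged client: fetchPage returns null when no pages remain.
class FakeClient(private val pages: List<List<Int>>) {
    suspend fun fetchPage(index: Int): List<Int>? {
        delay(10) // simulated network latency
        return pages.getOrNull(index)
    }
}

fun allRows(client: FakeClient): Flow<Int> = flow<List<Int>> {
    var index = 0
    while (true) {
        val page = client.fetchPage(index++) ?: break
        emit(page) // emit whole pages so the buffer counts pages, not rows
    }
}
    .buffer(capacity = 1)   // lets the producer fetch ahead while the consumer works
    .flowOn(Dispatchers.IO) // run the fetching loop on the IO dispatcher
    .transform { page -> page.forEach { emit(it) } } // flatten pages into rows

fun main() = runBlocking {
    val client = FakeClient(listOf(listOf(1, 2), listOf(3), listOf(4, 5, 6)))
    println(allRows(client).toList())
}
```

The producer suspends once the buffer is full, so only a small, bounded number of pages is ever fetched ahead of the consumer.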
You should check out this talk from KotlinConf 2019 to learn more about flows: https://www.youtube.com/watch?v=tYcqn48SMT8
Sequences are definitely not what you want to use in this case, because they are not designed to work in an asynchronous environment. Perhaps you should take a look at flows and channels, but for your case the best and simplest choice is just a collection of deferred values, because you want to process all requests at once (flows and channels process them one by one, possibly with a limited buffer size).
The following approach allows you to start all requests asynchronously (assuming that makeRequest is a suspending function and supports asynchronous requests). When you need your results, you only have to wait for the slowest request to finish.
fun getClientRequests(client: Client): List<Request> {
    val requests = ArrayList<Request>()
    var currentRequest: Request? = client.requestForNextPage()
    while (currentRequest != null) {
        requests += currentRequest
        currentRequest = client.requestForNextPage()
    }
    return requests
}

// This function is not even a suspending function, so it returns almost immediately
fun getAllRowsFromAPI(client: Client): List<Deferred<Page>> =
    getClientRequests(client).map {
        /*
         * The better practice would be making getAllRowsFromAPI an extension function
         * on CoroutineScope and calling the receiver scope's async function.
         * GlobalScope is used here just for simplicity.
         */
        GlobalScope.async(Dispatchers.IO) { client.makeRequest(it) }
    }

fun main() {
    val client = Client()
    val deferredPages = getAllRowsFromAPI(client) // This line executes fast
    // Here you can do whatever you want, all requests are processed in the background
    Thread.sleep(999L)
    // Then, when we need results....
    val pages = runBlocking {
        deferredPages.map { it.await() }
    }
    println(pages)
    // In your case you also want to "unpack" pages and get rows, you can do it here:
    val rows = pages.flatMap { it.getRows() }
    println(rows)
}
I happened across suspendingSequence in Kotlin's coroutines-examples:
https://github.com/Kotlin/coroutines-examples/blob/090469080a974b962f5debfab901954a58a6e46a/examples/suspendingSequence/suspendingSequence.kt
This is exactly what I was looking for.

How to inform a Flux that I have an item ready to publish?

I am trying to make a class that would take incoming user events, process them and then pass the result to whoever subscribed to it:
class EventProcessor
{
    val flux: Flux<Result>

    fun onUserEvent1(e : Event)
    {
        val result = process(e)
        // Notify flux that I have a new result
    }

    fun onUserEvent2(e : Event)
    {
        val result = process(e)
        // Notify flux that I have a new result
    }

    fun process(e : Event): Result
    {
        ...
    }
}
Then the client code can subscribe to EventProcessor::flux and get notified each time a user event has been successfully processed.
However, I do not know how to do this. I tried to construct the flux with the Flux::generate function like this:
class EventProcessor
{
    private var sink: SynchronousSink<Result>? = null
    val flux: Flux<Result> = Flux.generate{ sink = it }

    fun onUserEvent1(e : Event)
    {
        val result = process(e)
        sink?.next(result)
    }

    fun onUserEvent2(e : Event)
    {
        val result = process(e)
        sink?.next(result)
    }
    ....
}
But this does not work, since I am supposed to immediately call next on the SynchronousSink<Result> passed to me in Flux::generate. I cannot store the sink as in the example:
reactor.core.Exceptions$ErrorCallbackNotImplemented:
java.lang.IllegalStateException: The generator didn't call any of the
SynchronousSink method
I was also thinking about the Flux::merge and Flux::concat methods, but these are static and they create a new Flux. I just want to push things into the existing flux, such that whoever holds it, gets notified.
Based on my limited understanding of the reactive types, this is supposed to be a common use case. Yet I find it very difficult to actually implement it. This brings me to a suspicion that I am missing something crucial or that I am using the library in an odd way, in which it was not intended to be used. If this is the case, any advice is warmly welcome.