MutableSharedFlow custom implementation - kotlin

I am using a MutableSharedFlow instantiated like this:
val flow = MutableSharedFlow<String>(
    replay = 10,
    onBufferOverflow = BufferOverflow.DROP_OLDEST
)
How could I create a custom implementation that notifies when the buffer overflows and records have been dropped?
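One possible sketch, assuming emissions come from a single coroutine (the replayCache check and the emit below are not atomic together, so this is not a definitive implementation): with DROP_OLDEST and a full replay buffer, the next emit evicts the oldest cached value, so a wrapper can report it just before emitting. The wrapper name is hypothetical, not part of kotlinx.coroutines.
import kotlinx.coroutines.channels.BufferOverflow
import kotlinx.coroutines.flow.MutableSharedFlow
import kotlinx.coroutines.flow.SharedFlow
import kotlinx.coroutines.flow.asSharedFlow

// Hypothetical wrapper; assumes a single emitting coroutine.
class DropNotifyingSharedFlow<T>(
    private val capacity: Int,
    private val onDropped: (T) -> Unit,
) {
    private val inner = MutableSharedFlow<T>(
        replay = capacity,
        onBufferOverflow = BufferOverflow.DROP_OLDEST
    )

    suspend fun emit(value: T) {
        // With a full replay buffer, DROP_OLDEST evicts the oldest cached value on emit.
        if (inner.replayCache.size == capacity) {
            onDropped(inner.replayCache.first())
        }
        inner.emit(value)
    }

    fun asFlow(): SharedFlow<T> = inner.asSharedFlow()
}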

Related

KafkaConsumer: `seekToEnd()` does not make consumer consume from latest offset

I have the following code
class Consumer(val consumer: KafkaConsumer<String, ConsumerRecord<String>>) {
    fun run() {
        consumer.seekToEnd(emptyList())
        val pollDuration = 30L // seconds
        while (true) {
            val records = consumer.poll(Duration.ofSeconds(pollDuration))
            // perform record analysis and commitSync()
        }
    }
}
The topic the consumer is subscribed to continuously receives records. Occasionally, the consumer crashes due to the processing step. When the consumer is then restarted, I want it to consume from the latest offset on the topic (i.e. ignore records that were published to the topic while the consumer was down). I thought the seekToEnd() method would ensure that. However, it seems like the method has no effect at all: the consumer starts consuming from the offset at which it crashed.
What is the correct way to use seekToEnd()?
Edit: The consumer is created with the following configs
fun <T> buildConsumer(valueDeserializer: String): KafkaConsumer<String, T> {
    val props = setupConfig(valueDeserializer)
    Common.setupConsumerSecurityProtocol(props)
    return createConsumer(props)
}

fun setupConfig(valueDeserializer: String): Properties {
    // Configuration setup
    val props = Properties()
    props[ConsumerConfig.GROUP_ID_CONFIG] = config.applicationId
    props[ConsumerConfig.CLIENT_ID_CONFIG] = config.kafka.clientId
    props[ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG] = config.kafka.bootstrapServers
    props[AbstractKafkaSchemaSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG] = config.kafka.schemaRegistryUrl
    props[ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG] = config.kafka.stringDeserializer
    props[ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG] = valueDeserializer
    props[KafkaAvroDeserializerConfig.SPECIFIC_AVRO_READER_CONFIG] = "true"
    props[ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG] = config.kafka.maxPollIntervalMs
    props[ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG] = config.kafka.sessionTimeoutMs
    props[ConsumerConfig.ALLOW_AUTO_CREATE_TOPICS_CONFIG] = "false"
    props[ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG] = "false"
    props[ConsumerConfig.AUTO_OFFSET_RESET_CONFIG] = "latest"
    return props
}

fun <T> createConsumer(props: Properties): KafkaConsumer<String, T> {
    val consumer = KafkaConsumer<String, T>(props)
    consumer.subscribe(listOf(config.kafka.inputTopic))
    return consumer
}
I found a solution!
I needed to add a dummy poll as part of the consumer initialization process. Since several Kafka methods are evaluated lazily, a dummy poll is needed to get partitions assigned to the consumer. Without the dummy poll, the consumer tries to seek to the end of partitions that are null. As a result, seekToEnd() has no effect.
It is important that the dummy poll duration is long enough for the partitions to get assigned. For instance, with consumer.poll(Duration.ofSeconds(1)), the partitions did not get time to be assigned before the program moved on to the next method call (i.e. seekToEnd()).
Working code could look something like this
class Consumer(val consumer: KafkaConsumer<String, ConsumerRecord<String>>) {
    fun run() {
        // Initialization
        val pollDuration = 30L // seconds
        consumer.poll(Duration.ofSeconds(pollDuration)) // dummy poll to get partitions assigned
        // Seek to end and commit the new offset
        consumer.seekToEnd(emptyList())
        consumer.commitSync()
        while (true) {
            val records = consumer.poll(Duration.ofSeconds(pollDuration))
            // perform record analysis and commitSync()
        }
    }
}
The seekToEnd method requires information about the actual partitions (in Kafka terms, TopicPartition) from which you plan to make your consumer read from the end.
I am not familiar with the Kotlin API, but checking the JavaDocs on the KafkaConsumer method seekToEnd you will see that it asks for a collection of TopicPartitions.
As you are currently using emptyList(), it will have no impact at all, just like you observed.
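For illustration, a hedged sketch of seeking explicitly named partitions to the end ("my-topic" is a placeholder; note that manual assign() replaces subscribe(), so don't mix the two on one consumer):
import org.apache.kafka.common.TopicPartition

// Build the TopicPartition list explicitly instead of relying on a lazy subscription.
val partitions = consumer.partitionsFor("my-topic")
    .map { TopicPartition(it.topic(), it.partition()) }

consumer.assign(partitions)    // manual assignment takes effect immediately, no rebalance to wait for
consumer.seekToEnd(partitions) // seek each partition to its current log-end offset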

Moving Window With Kotlin Flow

I am trying to create a moving window of data using Kotlin Flows.
It can be achieved in RxKotlin using a buffer, but buffer is not the same in Flows.
RxKotlin has a buffer operator that periodically gathers items emitted by an Observable into bundles and emits these bundles rather than emitting the items one at a time - buffer(count, skip)
Kotlin Flow also has a buffer, but that just runs the collector in a separate coroutine - buffer
Is there an existing operator in Flows that can achieve this?
I think what you are looking for is not available in the kotlinx.coroutines library, but there is an open issue.
There is also a possible implementation in this comment, which I will include here:
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow
import kotlin.math.max
import kotlin.math.min

fun <T> Flow<T>.windowed(size: Int, step: Int): Flow<List<T>> = flow {
    // check that size and step are > 0
    val queue = ArrayDeque<T>(size)
    // if somebody would like to skip some elements before getting another window,
    // by passing a step greater than size, then why not?
    val toSkip = max(step - size, 0)
    val toRemove = min(step, size)
    var skipped = 0
    collect { element ->
        if (queue.size < size && skipped == toSkip) {
            queue.add(element)
        } else if (queue.size < size && skipped < toSkip) {
            skipped++
        }
        if (queue.size == size) {
            emit(queue.toList())
            repeat(toRemove) { queue.removeFirst() }
            skipped = 0
        }
    }
}
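For reference, a quick usage sketch of the operator above (values chosen for illustration):
import kotlinx.coroutines.flow.flowOf
import kotlinx.coroutines.runBlocking

fun main() = runBlocking {
    flowOf(1, 2, 3, 4, 5)
        .windowed(size = 3, step = 1)
        .collect { println(it) } // prints [1, 2, 3], [2, 3, 4], [3, 4, 5]
}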

Can I build a Kotlin SharedFlow where the consumer dictates replay length?

Question
When instantiating the Kotlin MutableSharedFlow<T> class, it allows you to specify a replay length of n >= 0. All consumers will get n events replayed. Is there a good way to extend or wrap MutableSharedFlow so that the consumer dictates how many (if any) events they want replayed?
Example desired consumer code
flow.collectWithReplay(count = 1) { event -> ... }
Count would of course have to be less than or equal to the upper boundary decided by the flow instance.
Rationale
Sometimes you want to act differently upon old and new events. An example is when the event contains one-time information that is irrelevant after it has been consumed once (e.g. data for an error dialog). You may still want to know that the last state was an error, but since it is old you don't show a dialog again. You'd then call flow.replayCache.lastOrNull() to get the old event and subscribe to new ones using .collectWithReplay(0).
Other times you don't want that distinction, and then it would be a hassle to make the two calls separately; .collectWithReplay(1) then yields less and prettier code.
Solution attempted
I have made a solution using my own 1-element replay cache, which solves the special case n = 1. It would be trivial to extend to any n - that's not the point - but I dislike a couple of things about it:
a) It doesn't utilize the built-in replay mechanism of SharedFlow.
b) It's not thread-safe: collectWithReplay might lose an event emitted between its line 1 and 2.
c) I'm not sure whether I lose any performance by losing inline on the collect method signature.
open class FlowEventBus<T> {
    private val _flow = MutableSharedFlow<T>(replay = 0)

    var latest: T? = null
        private set

    suspend fun emit(event: T) {
        latest = event
        _flow.emit(event) // suspends until all subscribers receive the event
    }

    /** Consumers who only want events occurring from now on subscribe here. */
    suspend fun collect(action: suspend (value: T) -> Unit) = _flow.collect(action)

    /** Consumers who want the last event emitted as well as future events subscribe here. */
    suspend fun collectWithReplay(action: suspend (value: T) -> Unit) {
        latest?.let { action(it) } // replay any cached event
        _flow.collect(action) // listen for new events
    }
}
Answer to main question
Here is the foundation of a solution, based on the suggestion from @tenfour04:
val mainFlow = MutableSharedFlow<String>(10)
If consumers want a different replay value, they do this:
val flowForTwo = mainFlow.shareIn(threadPoolScope, SharingStarted.Eagerly, 2)
flowForTwo.collect { }
You'll be creating a new SharedFlow each time you do this, though, so performance may suffer.
See working test
Variation: Event bus with zero or one replay
Here is a solution where the flow is wrapped in an event bus and the consumer may choose between a replay length of 0 or 1. This solution comes with some race-condition quirks when you emit and collect very close together in time. Run and understand this failing unit test before using it in production. I don't know how to fix it, or whether it's worth fixing. You might be better off just using a variation of my original idea.
/**
 * FlowEventBus where the consumer can decide between single replay or no replay when collecting.
 * Warning: it has some concurrency issues that are apparent when you run the tests.
 */
class FlowEventBus<T> {
    private val threadPoolScope = CoroutineScope(Dispatchers.Default + SupervisorJob())
    private val eventsWithSingleReplay = MutableSharedFlow<T>(replay = 1) // private mutable shared flow
    private val eventsWithoutReplay = eventsWithSingleReplay.shareIn(threadPoolScope, SharingStarted.Eagerly, replay = 0)

    val latest: T?
        get() = eventsWithSingleReplay.replayCache.lastOrNull()

    /** Emit a new event. */
    suspend fun emit(event: T) = eventsWithSingleReplay.emit(event)

    /** Consumers who only want events occurring from now on subscribe here. */
    suspend fun collect(action: suspend (value: T) -> Unit) = eventsWithoutReplay.collect(action)

    /** Consumers who want the last event emitted as well as future events subscribe here. */
    suspend fun collectWithReplay(action: suspend (value: T) -> Unit) = eventsWithSingleReplay.collect(action)
}
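A quick usage sketch, hedged by the same race caveats (the delays are only there to let the collector start before the next emit):
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

fun main() = runBlocking {
    val bus = FlowEventBus<String>()
    bus.emit("ready") // cached by the single-replay flow

    val job = launch {
        bus.collectWithReplay { println("got: $it") } // prints "got: ready" first
    }
    delay(100) // let the collector subscribe; see the race notes above
    bus.emit("go") // prints "got: go"
    delay(100)
    job.cancel()
}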

Does it matter how you get a Flow from somewhere (repository, etc.)?

I am getting my flow list like this:
val list = repository.someFlowList()
Sometimes I do it like this:
fun list() = repository.someFlowList()
In the Google codelab it's used like this:
val list: Flow<List<Something>>
    get() = repository.someFlowList()
I know what properties, getters, setters, and functions are. But I want to know only one thing: is there any difference in terms of efficiency, performance, etc.? If it matters, I use that flow as LiveData (via the asLiveData() method) in an activity.
TL;DR: For a cold flow the outcome is the same either way; for a hot flow you should use a field-backed property to avoid unexpected behavior.
The Flow API is sort of declarative, meaning that when you create a Flow you are just defining what it does; this is why it's said to be a cold flow. The computation you define runs only when the flow is collected. This means you can call collect on the same instance as many times as you need, and the computation it defines will run from scratch each time. The only underlying impact of creating a new Flow instance each time, as a computed property or as the result of a function, is that you allocate more instances of the same Flow definition.
Cold Flow
Check this trivial example:
val flow = flow {
    emit(0)
    emit(1)
}

runBlocking {
    val f1 = flow
    f1.collect {
        println("Flow ID: ${f1.hashCode()} - emits $it")
    }
    val f2 = flow
    f2.collect {
        println("Flow ID: ${f2.hashCode()} - emits $it")
    }
}
This will print:
Flow ID: 608188624 - emits 0
Flow ID: 608188624 - emits 1
Flow ID: 608188624 - emits 0
Flow ID: 608188624 - emits 1
You can see that the same Flow instance, when collected, runs the flow emission each time you collect it.
If you change the assignment to a getter (val flow get() = flow {...}), the output is:
Flow ID: 608188624 - emits 0
Flow ID: 608188624 - emits 1
Flow ID: 511833308 - emits 0
Flow ID: 511833308 - emits 1
You can see that the outcome is the same; the difference is that now you have two Flow instances.
Hot Flow
When the Flow is hot, that is, when it has values even before a collector starts collecting, the story is different. A StateFlow is a typical hot Flow. Check this out:
val scope = CoroutineScope(Dispatchers.Default + SupervisorJob())
val flow = MutableStateFlow(0)

val job1 = scope.launch {
    val f = flow
    f.value = 1 // <-- note this
    f.collect {
        println("A: $it")
    }
}
val job2 = scope.launch {
    val f = flow
    f.collect {
        println("B: $it")
    }
}
runBlocking {
    job1.cancelAndJoin()
    job2.cancelAndJoin()
}
The output is:
A: 1
B: 1
Both Flow collections receive the latest value of the single hot Flow instance.
If you change to val flow get() = MutableStateFlow(0) you get:
A: 1
B: 0
This time we create a different instance of the StateFlow, so the second collector misses the value change that we made on the first Flow instance. This is a problem if the property is exposed as public, since the implementation of the property, whether field-backed or computed, should not be relevant to the caller. Eventually this may end up creating unexpected behaviour - a bug.
In Kotlin there is a concept of backing fields; these fields are only used when needed, as part of a property, to hold its value in memory.
With a getter function, get() = repository.someFlowList(), the body is evaluated every time the property is accessed, since there is no backing field assigned to it. With val list = repository.someFlowList(), the value is evaluated during initialization and saved in a backing field. The Kotlin documentation on getters and setters explains this too.
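A small illustration of that difference, with a hypothetical repository stand-in:
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow

// Hypothetical stand-in for the real repository
class Repo {
    fun someFlowList(): Flow<List<String>> = flow { emit(listOf("a", "b")) }
}

val repository = Repo()

val list = repository.someFlowList()   // evaluated once; the same instance on every access
val listComputed: Flow<List<String>>
    get() = repository.someFlowList()  // body runs on each access; a new instance every time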

"Fire and forget" with structured concurrency and coroutines

I have a little endpoint that looks like this
val numbers = it.bodyAsString.parseJsonList<Numbers>()
val processedNumbers = numberService.process(numbers)
GlobalScope.launch {
    sqsService.sendToSqs(processedNumbers)
}
it.response.setStatusCode(204).end()
The reason I use GlobalScope is that the producer only needs the acknowledgement after the numbers have been processed, so I am trying to do a fire-and-forget on a parallel track to be able to respond to the producer immediately.
What would be the "best practice" way of doing this with structured concurrency? Should I create my own scope (like a fireAndForgetScope instead of GlobalScope)?
As you already guessed, creating your own scope would be a good solution in this case.
You can define it as member of your controller:
private val bgScope = CoroutineScope(newFixedThreadPoolContext(4, "background-tasks"))
Then usage is very similar to what you're doing:
val numbers = it.bodyAsString.parseJsonList<Numbers>()
val processedNumbers = numberService.process(numbers)
bgScope.launch {
    sqsService.sendToSqs(processedNumbers)
}
it.response.setStatusCode(204).end()
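One variation worth considering (an assumption on my part, not from the original answer): a SupervisorJob keeps one failed send from cancelling the scope's other tasks, and keeping a handle on the scope lets you cancel pending work at shutdown.
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.SupervisorJob
import kotlinx.coroutines.cancel

// One failed sendToSqs won't take down the scope's other background tasks.
private val bgScope = CoroutineScope(SupervisorJob() + Dispatchers.IO)

// Call on application shutdown to stop pending background work.
fun shutdown() = bgScope.cancel()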