Reactor Sinks.Many with variable replay - spring-webflux

Assume that I want a flight arrival Flux. The Sink would be based on the following class.
class Flight {
String code;
Date eta;
boolean arrived;
}
A Map of code -> Flight would be the actual Sink. New flights can be added (emit) and eta can change (causing emit). Arrivals are emitted as arrived=true then deleted from the Map on flight arrival (for simplicity's sake).
If I were to have one Sinks.many().multicast().replay() how can I remove previous eta changes and remove "arrived" flights from the internals of the Sink, such that a new Subscriber would not get "everything", but the latest of each inbound Flight, minus arrivals?
I suspect that I would need to create a new Sinks.many().unicast() for each new Subscriber and emit the latest content from the Map to the new Subscriber and continue to emit updates from Map change events.
However, how do I keep track of these Sinks so that I know which Sinks to emit to? And how do I know when a Sink becomes unsubscribed\disposed, so that I can stop emit to that disposed Sink?
I could do the tracking of subscribers via Flux.create(FluxSink) onRequest() onDispose() etc. which effectively gave me a unicast() Am I missing the difference between FluxSink and Sinks?

Related

How do you close a cold observable gracefully after certain time has elapsed from subscribe?

Assume below code which processes data from a paginatedAPI (external).
Flowable<Data> process = Flowable.generate(() -> new State(),
new BiConsumer<State, Emitter<Data>>() {
void accept() {
//get data from upstream service
//This calls dataEmitter.onNext() internally
pageId = paginatedAPI.get(pageId, dataEmitter);
//process
...
//update state
state.updatePageId(pageId);
}
}).subscribeOn(Schedulers.from(executor));
Now, since this is created from .generate, accept will be called only when subscriber is ready for next data.
I have full control on what I can add to state, but I can't change paginatedAPI
Requirement:
After a time T from subscription,
a) Iterate through all pages without sending them to subscriber and call paginatedAPI.close()
b) Provide subscriber with data from paginatedAPI.close()
If the subscriber disconnects before time T, then
a) Iterate through all pages without sending them to subscriber and call paginatedAPI.close()
I don't understand how to add the concept of time from subscription in controlling the flowable logic.
Also, accept can only call onNext atmost once. Now how can I finish through the paginatedAPI by calling onNext multiple times.
Edit: Added details on emitter and internal onNext call in paginatedAPi.get(pageId, dataEmitter);

Infinispan clustered lock performance does not improve with more nodes?

I have a piece of code that is essentially executing the following with Infinispan in embedded mode, using version 13.0.0 of the -core and -clustered-lock modules:
#Inject
lateinit var lockManager: ClusteredLockManager
private fun getLock(lockName: String): ClusteredLock {
lockManager.defineLock(lockName)
return lockManager.get(lockName)
}
fun createSession(sessionId: String) {
tryLockCounter.increment()
logger.debugf("Trying to start session %s. trying to acquire lock", sessionId)
Future.fromCompletionStage(getLock(sessionId).lock()).map {
acquiredLockCounter.increment()
logger.debugf("Starting session %s. Got lock", sessionId)
}.onFailure {
logger.errorf(it, "Failed to start session %s", sessionId)
}
}
I take this piece of code and deploy it to kubernetes. I then run it in six pods distributed over six nodes in the same region. The code exposes createSession with random Guids through an API. This API is called and creates sessions in chunks of 500, using a k8s service in front of the pods which means the load gets balanced over the pods. I notice that the execution time to acquire a lock grows linearly with the amount of sessions. In the beginning it's around 10ms, when there's about 20_000 sessions it takes about 100ms and the trend continues in a stable fashion.
I then take the same code and run it, but this time with twelve pods on twelve nodes. To my surprise I see that the performance characteristics are almost identical to when I had six pods. I've been digging in to the code but still haven't figured out why this is, I'm wondering if there's a good reason why infinispan here doesn't seem to perform better with more nodes?
For completeness the configuration of the locks are as follows:
val global = GlobalConfigurationBuilder.defaultClusteredBuilder()
global.addModule(ClusteredLockManagerConfigurationBuilder::class.java)
.reliability(Reliability.AVAILABLE)
.numOwner(1)
and looking at the code the clustered locks is using DIST_SYNC which should spread out the load of the cache onto the different nodes.
UPDATE:
The two counters in the code above are simply micrometer counters. It is through them and prometheus that I can see how the lock creation starts to slow down.
It's correctly observed that there's one lock created per session id, this is per design what we'd like. Our use case is that we want to ensure that a session is running in at least one place. Without going to deep into detail this can be achieved by ensuring that we at least have two pods that are trying to acquire the same lock. The Infinispan library is great in that it tells us directly when the lock holder dies without any additional extra chattiness between pods, which means that we have a "cheap" way of ensuring that execution of the session continues when one pod is removed.
After digging deeper into the code I found the following in CacheNotifierImpl in the core library:
private CompletionStage<Void> doNotifyModified(K key, V value, Metadata metadata, V previousValue,
Metadata previousMetadata, boolean pre, InvocationContext ctx, FlagAffectedCommand command) {
if (clusteringDependentLogic.running().commitType(command, ctx, extractSegment(command, key), false).isLocal()
&& (command == null || !command.hasAnyFlag(FlagBitSets.PUT_FOR_STATE_TRANSFER))) {
EventImpl<K, V> e = EventImpl.createEvent(cache.wired(), CACHE_ENTRY_MODIFIED);
boolean isLocalNodePrimaryOwner = isLocalNodePrimaryOwner(key);
Object batchIdentifier = ctx.isInTxScope() ? null : Thread.currentThread();
try {
AggregateCompletionStage<Void> aggregateCompletionStage = null;
for (CacheEntryListenerInvocation<K, V> listener : cacheEntryModifiedListeners) {
// Need a wrapper per invocation since converter could modify the entry in it
configureEvent(listener, e, key, value, metadata, pre, ctx, command, previousValue, previousMetadata);
aggregateCompletionStage = composeStageIfNeeded(aggregateCompletionStage,
listener.invoke(new EventWrapper<>(key, e), isLocalNodePrimaryOwner));
}
The lock library uses a clustered Listener on the entry modified event, and this one uses a filter to only notify when the key for the lock is modified. It seems to me the core library still has to check this condition on every registered listener, which of course becomes a very big list as the number of sessions grow. I suspect this to be the reason and if it is it would be really really awesome if the core library supported a kind of key filter so that it could use a hashmap for these listeners instead of going through a whole list with all listeners.
I believe you are creating a clustered lock per session id. Is this what you need ? what is the acquiredLockCounter? We are about to deprecate the "lock" method in favour of "tryLock" with timeout since the lock method will block forever if the clustered lock is never acquired. Do you ever unlock the clustered lock in another piece of code? If you shared a complete reproducer of the code will be very helpful for us. Thanks!

Kafka streams: groupByKey and reduce not triggering action exactly once when error occurs in stream

I have a simple Kafka streams scenario where I am doing a groupyByKey then reduce and then an action. There could be duplicate events in the source topic hence the groupyByKey and reduce
The action could error and in that case, I need the streams app to reprocess that event. In the example below I'm always throwing an error to demonstrate the point.
It is very important that the action only ever happens once and at least once.
The problem I'm finding is that when the streams app reprocesses the event, the reduce function is being called and as it returns null the action doesn't get recalled.
As only one event is produced to the source topic TOPIC_NAME I would expect the reduce to not have any values and skip down to the mapValues.
val topologyBuilder = StreamsBuilder()
topologyBuilder.stream(
TOPIC_NAME,
Consumed.with(Serdes.String(), EventSerde())
)
.groupByKey(Grouped.with(Serdes.String(), EventSerde()))
.reduce { current, _ ->
println("reduce hit")
null
}
.mapValues { _, v ->
println(Id: "${v.correlationId}")
throw Exception("simulate error")
}
To cause the issue I run the streams app twice. This is the output:
First run
Id: 90e6aefb-8763-4861-8d82-1304a6b5654e
11:10:52.320 [test-app-dcea4eb1-a58f-4a30-905f-46dad446b31e-StreamThread-1] ERROR org.apache.kafka.streams.KafkaStreams - stream-client [test-app-dcea4eb1-a58f-4a30-905f-46dad446b31e] All stream threads have died. The instance will be in error state and should be closed.
Second run
reduce hit
As you can see the .mapValues doesn't get called on the second run even though it errored on the first run causing the streams app to reprocess the same event again.
Is it possible to be able to have a streams app re-process an event with a reduced step where it's treating the event like it's never seen before? - Or is there a better approach to how I'm doing this?
I was missing a property setting for the streams app.
props["processing.guarantee"]= "exactly_once"
By setting this, it will guarantee that any state created from the point of picking up the event will rollback in case of a exception being thrown and the streams app crashing.
The problem was that the streams app would pick up the event again to re-process but the reducer step had state which has persisted. By enabling the exactly_once setting it ensures that the reducer state is also rolled back.
It now successfully re-processes the event as if it had never seen it before

Retrieve value from expired keys

I have a use case where I'm aggregating until a TTL hits 0 on the key/value in Redis. As far as I can tell in the documentation, retrieval or a background job triggers all expired keys to be deleted immediately.
Is there anyway I can 'halt' that deletion and retrieve the value at the time of expiry? Or something similar to that effect?
My last question contains some context to my use case: Redis - any way to trigger an event when a value is no longer being actively written to?
I believe I found an answer to my question. Every row would have a corresponding row with with the convention expiry:{key}.
uniqueEventHash: [value1, value2, value3] // no expiry
expiry:uniqueEventHash: {no value} // set TTL to 60
Now, whenever a new value for that uniqueEventHash arrives, I do two things. I append it onto the uniqueEventHash row with an append, and then I also subsequently reset the TTL on expiry:uniqueEventHash to 60.
When events for that uniqueEventHash stop arriving, the second expiry:expiry:uniqueEventHash is deleted and a notification is sent out to a subscriber to EXPIRY events. In the message is the key that expired, which in this case is expiry:uniqueEventHash.
I can then do the following:
// pseudo code
onExpiryEvent(message):
[type, key] = eventMessage.split(':');
aggregatedValue = await getByKey(key);
del(key);

When Elm Html.Program calls subscriptions

I've found a possible answer to this question in a Google Group but I'll like to know if it's correct and add a follow-up question if it is correct.
The answer there is
Every time the global update function in your app runs for any reason,
the global subscriptions object is reevaluated as well, and effect
managers receive the new list of current subscriptions
If any time the model is changed subscriptions get called what is the effect on subscriptions such as Time.every second taken from Time Effect Elm Guide - is that means the timer get reset when the model changes? What if that was Time.every minute - if the model changes 20 seconds after it starts will it fire in 60 - 20 = 40 seconds or will it fire in 1 minute?
You can check when update and subscriptions are called by adding a Debug.log statement to each. The subscriptions function is called first at the start (since the messages which will be sent to update may depend on it) and also after each call to update.
The timer interval seems to be unaffected by subsequent calls to subscriptions. For example, if you use the elm clock example, change the subscription to
Time.every (10*Time.second) Tick
and add a button to the view which resets the model value to 0, you will see that the tick still takes place at regular 10s intervals, no matter when you click the button.
TLDR; It will fire in 1 minute, unless you turn your subscription
off and on during the first minute
Every time your update runs, the subscriptions function will run too.
The subscriptions function essentially is a list of things you want your app to subscribe to.
In the example you have a subscription that generates a Tick message every 60 seconds.
The behavior you may expect is:
T= 0s: The first time subscriptions runs, you start your subscription to "receive Tick message every 60 seconds".
T= between 0 AND 60s: As long as that particular subscription remains ON, it doesn't matter how often your update function runs. subscriptions will be run, but as long as your particular subscription to the Tick remains ON, things are fine.
T= 60s: You receive a Tick message from your subscription, which in turn will fire update to be called.
T= 60s: subscriptions will run again (because of previous call to update)
What could be interesting is what happens if the subscription to Tick is canceled along the way and then reinstated:
T= 0: subscription to Tick
T= 20s: suppose something changes in the model, causing subscription to Tick to be canceled
T= 40s: some other change in model, causing subscription to Tick to be turned on again
T= 100s: Tick message is fired, and passed to update function
T= 100s: subscriptions will run again