This question is regarding rxpy.
I am trying to build a reactive system that handles messages from a source observable. In addition to that, I am trying to integrate it with a leader election system based on zookeeper.
This combination will allow only one leader in a farm of processes to handle the message stream. Below is the gist of the code I am trying to construct.
# event_source is an observable of messages
# manager.leaders is an observable of leader election events
# manager.followers is an observable of leader relinquish events
event_source\
.skip_until(manager.leaders)\
.take_until(manager.followers)\
.subscribe(observer)
It works fine and all, but I need to inject, between skip_until and take_until, a piece to handle backfill. This is designed to handle a potential gap between a leader process failing and another process assuming leadership. Every processed message leaves a record, so that a new leader can catch up on missing messages, if any, before proceeding with the stream.
I tried the start_with operator without success. Am I trying to use it in a way it is not meant to be used?
Ultimately, what I am looking for is a way to inject a specific number of items into the stream, triggered by an event from another stream.
What about this:
manager.leaders \
.flat_map(lambda e: event_source
.start_with(...)
.take_until(manager.followers))
Every time manager.leaders emits an event, event_source will be subscribed to, starting with the injected items, until manager.followers emits.
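A fuller sketch of how that could look, assuming a hypothetical fetch_backfill(e) helper that reads the records left behind by the previous leader and returns the missed messages as a list (RxPY 1.x chaining style, as in the snippets above):

# On each leadership event, fetch the missed messages and prepend them
# to the live stream until leadership is relinquished.
manager.leaders \
    .flat_map(lambda e: event_source
              .start_with(*fetch_backfill(e))
              .take_until(manager.followers)) \
    .subscribe(observer)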
I have a question about akka-persistence and event migration. I have read the "Schema Evolution for Event Sourced Actors" chapter; however, it does not answer my question.
Suppose I have a persistent actor ChildActor that produces a Created event. Later, we discover that ChildActor should be a child of ParentActor, and that ParentActor has to update its state based on the creation of ChildActor (to maintain a collection of children).
We can add a new command CreateChild to ParentActor that will create the ChildActor. However, the parent will never receive the Created event emitted by its child, so it will not be able to update its state. Of course, ParentActor could create a ChildCreated event for itself.
But what about the Created events already persisted by ChildActor?
How can we "send" (and, ideally, adapt) them to ParentActor?
So, my question is:
Can we "route" persisted events from one actor to another?
Thanks
It is possible to watch the events persisted by a given persistence ID with the events by persistence ID query. Since this query is very much like what Akka Persistence must do in replaying events to rebuild a persistent actor's state, it's available in all the commonly used plugins: you'll need to check the documentation for your plugin for how to summon a ReadJournal. Once summoned, assuming that the ReadJournal is further an instance of EventsByPersistenceIdQuery, you would use (Scala):
readJournal.eventsByPersistenceId(childActorPersistenceId, fromOffset, Long.MaxValue)
which would give you an Akka Streams Source of events, in order, starting at fromOffset. Your subscribing actor will probably want to save the last-seen sequence number as part of its state, so that if it resumes it doesn't reprocess events it has already handled (ideally the event updating the sequence number would be in the same batch, or otherwise atomically part of the same state update).
Note that there will be an observable delay from persisting the event to when ParentActor sees the event, though many of the recent iterations of plugins (e.g. Cassandra or R2DBC) can directly propagate the event or at least the notification that there's an event for the persistence ID to the query.
I'm new to MassTransit and I would like to understand if it can help with my scenario.
I'm building a sample application implemented with a CQRS event sourcing architecture and I need a service bus in order to dispatch the events created by the command stack to the query stack denormalizers.
Let's suppose we have a single aggregate in our domain, let's call it Photo, and two different domain events: PhotoUploaded and PhotoArchived.
Given this scenario, we have two different message types, and the default MassTransit behaviour is to create two different RabbitMQ exchanges: one for the PhotoUploaded message type and the other for the PhotoArchived message type.
Let's also suppose we have a single denormalizer called PhotoDenormalizer: this service will be a consumer of both message types, because the photo read model must be updated whenever a photo is uploaded or archived.
Given the default MassTransit topology, there will be two different exchanges, so processing order cannot be guaranteed between events of different types: the only guarantee we have is that all events of the same type will be processed in order, but we cannot guarantee the processing order between events of different types (notice that, given the semantics of the events in my example, processing order matters).
How can I handle such a scenario? Is MassTransit suitable for my needs? Am I completely missing the point with domain event dispatching?
Disclaimer: this is not an answer to your question, but rather a preventive message why you should not do what you are planning to do.
Whilst message brokers like RMQ and messaging middleware libraries like MassTransit are perfect for integration, I strongly advise against using message brokers for event sourcing. I can refer to my old answer, "Event-sourcing: when (and not) should I use Message Queue?", which explains the reasons behind it.
One of the reasons you have found yourself: event order will never be guaranteed.
Another obvious reason is that building read models from events published via a message broker effectively removes the possibility of replay. A new read model that needs to start processing events from the beginning of time cannot be built, because all it gets are the events being published now.
Aggregates form transactional boundaries, so every command needs to guarantee that it completes within one transaction. Whilst MT supports the transaction middleware, it only guarantees that you get a transaction for dependencies that support them, but not for context.Publish(@event) in the consumer body, since RMQ doesn't support transactions. You get a good chance of committing changes and not getting events on the read side. So the rule of thumb for event stores is that you should be able to subscribe to the stream of changes from the store, and not publish events from your code, unless those are integration events and not domain events.
For event sourcing, it is crucial that each read model keeps its own checkpoint in the stream of events it is projecting. Message brokers don't give you that kind of power, since the "checkpoint" is actually your queue, and as soon as a message is gone from the queue, it is gone forever; there's no coming back.
Concerning the actual question:
You can use the message topology configuration to set the same entity name for different messages, and then they'll be published to the same exchange, but that falls into the "abuse" category, as Chris wrote on that page. I haven't tried it, but you can definitely experiment. The message CLR type is part of the metadata, so there shouldn't be deserialization issues.
But again, putting messages in the same exchange won't give you any ordering guarantees, except the fact that all messages will land in one queue for the consuming service.
You will have to at least set a partitioning filter based on your aggregate id, to prevent multiple messages for the same aggregate from being processed in parallel. That, by the way, is also useful for integration. This is how we do it:
// Register a handler for message type T, partitioned by a key so that messages
// with the same key are never processed in parallel (8 partitions here).
void AddHandler<T>(Func<ConsumeContext<T>, string> partition) where T : class
    => ep.Handler<T>(
        c => appService.Handle(c, aggregateStore),
        hc => hc.UsePartitioner(8, partition));

AddHandler<InternalCommands.V1.Whatever>(c => c.Message.StreamGuid);
Suppose that one of the cluster nodes receives a message and one of the actors starts to process it. Somewhere in the middle, this node dies for some reason. What will happen to the message? Will it be processed by another available node, or will it be lost?
By default Akka (and every other actor model framework) offers at-most-once delivery. This means that messages are sent to actors with best-effort guarantees: if they don't reach the target, they won't be redelivered. It also means that if a message reached the target but the processing associated with it was interrupted before finishing, it won't be retried.
That being said, there are numerous ways to offer redelivery between actors with various guarantees.
The simplest and least reliable is to use the Ask pattern in combination with e.g. the Polly library. This however won't help if the node on which the sender lives dies, simply because the messages are still stored only in memory.
The more reliable pattern is to use some event log/queue in front of your cluster (e.g. Azure Service Bus, RabbitMQ or Kafka). In this approach clients send requests via the bus/queue, while the first actor in the processing pipeline is responsible for picking them up. If some actor or node in the pipeline dies, the whole pipeline for that message is retried.
Another idea is to use the at-least-once delivery found in the Akka.Persistence module. It allows you to use the event-sourcing capabilities of persistent actors to persist messages. However, IMO it requires a bit of experience with Akka.
All of these approaches provide at-least-once delivery guarantees, which means that the same message may be delivered to its destination more than once. This also means that your processing logic needs to account for that, either by being idempotent or by recognizing and discarding duplicates on the receiver side.
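As a language-agnostic illustration of the deduplication option, the receiver can remember the ids of messages it has already handled and skip repeats. A minimal Python sketch (the id field, the in-memory set and apply_effect are assumptions; a real implementation would persist the seen ids):

processed_ids = set()

def apply_effect(message):
    ...  # the actual, possibly non-idempotent work

def handle(message):
    if message["id"] in processed_ids:
        return                        # duplicate delivery, already applied
    apply_effect(message)
    processed_ids.add(message["id"])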
Question
I want to pass data between applications in a publish-subscribe manner. Data may be produced at a much higher rate than it is consumed, and messages may get lost, which is not a problem. Imagine a fast sensor and a slow sensor data processor. For that, I use Redis pub/sub and wrote a class which acts as a subscriber, receives every message and puts it into a buffer. The buffer is overwritten when a new message comes in, or nullified when the message is requested by the "real" function. So when I ask this class, I either get a response immediately (a hint that my function is slower than the data comes in) or I have to wait (a hint that my function is faster than the data).
This works pretty well when data comes in fast. But for data which comes in relatively seldom, let's say every five seconds, it does not: imagine my consumer gets launched slightly after the producer; the first message is lost and my consumer needs to wait nearly five seconds until it can start working.
I think I have to solve this with Redis tools. Instead of pub/sub, I could simply use the get/set methods, thus putting the cache functionality into Redis directly. But then my consumer would have to poll the database instead of the event magic I have at the moment. Keys could look like "key:timestamp", and my consumer would now have to get key:* and compare the timestamps permanently, which I think would cause a lot of load. There is no natural opportunity to sleep, since although I don't care about dropped messages (there is nothing I can do about them), I do care about delay.
Does someone use Redis for a similar thing and could give me a hint about clever use of Redis tools and data structures?
edit
Ideally, my program flow would look like this:
start the program
retrieve key from Redis
tell Redis, "hey, notify me on changes of key".
launch something asynchronously, with a callback for new messages.
By writing this, an idea came up: the publisher not only publishes the message on topic key, but also does set key message. This way, an application could initially get the value and then subscribe.
Good idea or not really?
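A hypothetical shape of that idea with redis-py (the key sensor:latest and the channel sensor are made-up names): the producer both caches and publishes, so a late subscriber can seed itself with a GET before listening.

import redis

r = redis.Redis()

def produce(message: bytes):
    r.set("sensor:latest", message)     # cache the latest value
    r.publish("sensor", message)        # and notify live subscribers

def consume(handle):
    handle(r.get("sensor:latest"))      # catch up on the value we may have missed
    pubsub = r.pubsub()
    pubsub.subscribe("sensor")
    for msg in pubsub.listen():         # then follow live updates
        if msg["type"] == "message":
            handle(msg["data"])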
What I did after I got the answer below (the accepted one)
Keyspace notifications are really what I need here. Redis acts as the primary source for information, and my client subscribes to keyspace notifications, which notify subscribers about events affecting specific keys. Now, in the asynchronous part of my client, I subscribe to notifications about my key of interest. Those notifications set a key_has_updates flag. When I need the value, I get it from Redis and unset the flag. With an unset flag, I know that there is no new value for that key on the server. Without keyspace notifications, this would have been the part where I needed to poll the server. The advantage is that I can use all sorts of data structures, not only the pub/sub mechanism, and a slow joiner which misses the first event is still able to get the initial value, which with pub/sub would have been lost.
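A rough sketch of that flag-based approach with redis-py (the key name is made up, and notify-keyspace-events must be enabled on the server, e.g. CONFIG SET notify-keyspace-events "K$"):

import redis

r = redis.Redis()
key = "sensor:latest"
key_has_updates = True                  # force an initial read

pubsub = r.pubsub()
# Keyspace notifications for db 0 arrive on the channel __keyspace@0__:<key>.
pubsub.subscribe(f"__keyspace@0__:{key}")

def poll_notifications():
    global key_has_updates
    while True:
        msg = pubsub.get_message(timeout=0)
        if msg is None:
            break
        if msg["type"] == "message":
            key_has_updates = True

def current_value():
    global key_has_updates
    poll_notifications()
    if key_has_updates:
        key_has_updates = False
        return r.get(key)               # fresh value from Redis
    return None                         # nothing new since the last read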
One idea is to push the data to a list (LPUSH) and trim it (LTRIM), so it doesn't grow forever if there are no consumers. On the other end, the consumer would grab items from that list and process them. You can also use keyspace notifications, and be alerted each time an item is added to that queue.
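One way the producer-side trimming could look with redis-py (queue name and cap are arbitrary, and the consumer is shown with a blocking pop for simplicity rather than keyspace notifications):

import redis

r = redis.Redis()

def produce(message: bytes, queue="sensor:queue", cap=1000):
    pipe = r.pipeline()
    pipe.lpush(queue, message)      # newest item goes to the head
    pipe.ltrim(queue, 0, cap - 1)   # keep at most `cap` items, even with no consumer
    pipe.execute()

def consume(queue="sensor:queue"):
    _key, item = r.brpop(queue)     # block until an item is available, oldest first
    return item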
I pass data between applications using two native Redis commands: rpush and blpop.
"blpop blocks the connection when there are no elements to pop from any of the given lists."
Data is passed in JSON format between applications, using a list as a queue.
The application that wants to send data (acting as publisher) does an rpush onto a list.
The application that wants to receive data (acting as subscriber) does a blpop on the same list.
The code would be something like this (in Perl).
Sender (we assume a hash reference is passed):
use Redis;
use JSON;

# Encode the hash in JSON format
my $json_text = encode_json($hash_ref);

# Connect to Redis and push onto the list
my $r = Redis->new(server => "127.0.0.1:6379");
$r->rpush("shared_queue", $json_text);
$r->quit;
Receiver (in an infinite loop):

use Redis;
use JSON;

# Connect once, outside the loop
my $r = Redis->new(server => "127.0.0.1:6379");
while (1) {
    # Block until an element is available on the queue
    my @elem = $r->blpop("shared_queue", 0);
    # blpop returns the list name and the value; decode the value
    my $hash_ref = decode_json($elem[1]);
    # do some stuff with $hash_ref
}
I find this way very useful for many reasons:
The elements are stored in a list, so temporarily disabling the receiver causes no information loss. When the receiver restarts, it can process all the items in the list.
A high send rate can be handled with multiple instances of the receiver.
Multiple senders can push data onto a single list. In this case, a data collector can be implemented easily.
A receiver process that acts as a daemon can be monitored with specific tools (e.g. pm2).
Since Redis 5, there is a new data type called "Streams", which is an append-only data structure. Redis Streams can be used as a reliable message queue, with both point-to-point and multicast communication using the consumer group concept: Redis_Streams_MQ
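For reference, a minimal sketch of that consumer-group pattern with redis-py (stream, group and consumer names are arbitrary): XADD appends entries, and a consumer group lets several workers share the stream with explicit acknowledgements.

import redis

r = redis.Redis()
stream, group = "sensor:stream", "processors"

# Create the consumer group once; ignore the error if it already exists.
try:
    r.xgroup_create(stream, group, id="0", mkstream=True)
except redis.ResponseError:
    pass

r.xadd(stream, {"value": "42"})                      # producer side

# Consumer side: read new entries for this group, process, then acknowledge.
entries = r.xreadgroup(group, "worker-1", {stream: ">"}, count=10, block=5000)
for _stream, messages in entries:
    for message_id, fields in messages:
        print(message_id, fields)
        r.xack(stream, group, message_id)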
We've got a CQRS project and are thinking about a way to implement a "catch-up", e.g. a new event handler is started and asks the event store to replay all events for it.
We're not sure if we should do the replay over NServiceBus, as it is really a 1:1 connection and not a publish/subscribe situation. We also think that our new consumer would not be able to keep up with the publishing speed and its input queue would back up.
What's the best practice here?
I've heard of people doing the following:
Have a system for replaying/rebroadcasting the events. Event Handlers whose projections have already seen these events ignore them.
Allow events to be queried directly by the Event Handler when resetting it or when starting a new projection from scratch. This can be done in some systems by reading directly from the event store; in other, actor-based systems, an actor abstraction around the source of events may be queried.
From my understanding, option 2 allows for better performance as events can be queried in batches as opposed to being replayed to all listeners individually. These are just my observations without any practical experience to draw on yet.
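As a rough, language-agnostic sketch of option 2 (the event_store.read_events interface and the projection object are assumptions, not a real API): the new handler pulls events in batches from its own checkpoint instead of having them rebroadcast to every listener.

def catch_up(event_store, projection, batch_size=500):
    position = projection.checkpoint()            # where this projection left off
    while True:
        batch = event_store.read_events(from_position=position, batch_size=batch_size)
        if not batch:
            break                                  # caught up with the store
        for event in batch:
            projection.apply(event)
        position = batch[-1].position              # assumed: each event carries its position
        projection.save_checkpoint(position)       # persist progress per read model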