Akka.NET Cluster Sharding - Control shard rebalancing to stop the entity only when it reaches a specific state - akka.net

In Akka.NET cluster sharding I have an entity that can switch between two behavior states (Idle, Processing). When the entity is in the Processing state I don't want the rebalancing mechanism to stop it. How can I control shard rebalancing so that the entity is stopped only when it reaches the "Idle" state?


Ensure new RabbitMQ quorum queue replicas are spread across the cluster's availability zones

I'm going to run a multi-node (3 zones, 2 nodes in each, expected to grow) RabbitMQ cluster with many dynamically created quorum queues. It will be unmanageable to tinker with the queue replicas manually.
I need to ensure that a new quorum queue (1 leader, 2 replicas) always spans all 3 AZs so that it can survive an AZ outage.
Is there a way to configure node affinity to achieve that goal?
There is one poor man's solution (actually, pretty expensive to do right) that comes to mind:
create queues with x-quorum-initial-group-size=1
have a reconciliation script that runs periodically and adds replica members on nodes in the right zones
Of course, a built-in feature for configurable node affinity, which I might have missed somehow, would be the best option.
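
The reconciliation idea above could be sketched roughly as follows. This is only the planning step, written in Python under stated assumptions: the node names, zone labels and the queue name `orders` are hypothetical, and the node-to-zone inventory has to be maintained outside RabbitMQ. The sketch only prints the `rabbitmq-queues add_member` command that a real script might run for each zone that is not yet covered.

```python
from collections import defaultdict


def plan_new_members(node_zones, current_members, target_zones):
    """Return the nodes to add so the queue has one member in every target zone.

    node_zones: dict mapping node name -> zone label (hypothetical inventory).
    current_members: node names that already host a replica of the queue.
    target_zones: zones that must each hold at least one replica.
    """
    # Zones that already hold a replica of this queue.
    covered = {node_zones[n] for n in current_members if n in node_zones}

    # Nodes per zone that do not host a replica yet, in a stable order.
    candidates = defaultdict(list)
    for node, zone in sorted(node_zones.items()):
        if node not in current_members:
            candidates[zone].append(node)

    additions = []
    for zone in sorted(target_zones):
        if zone in covered:
            continue
        if not candidates[zone]:
            raise RuntimeError(f"no free node available in zone {zone}")
        additions.append(candidates[zone][0])
    return additions


# Hypothetical inventory: 3 zones, 2 nodes in each.
NODE_ZONES = {
    "rabbit@a1": "az-a", "rabbit@a2": "az-a",
    "rabbit@b1": "az-b", "rabbit@b2": "az-b",
    "rabbit@c1": "az-c", "rabbit@c2": "az-c",
}

# A queue created with x-quorum-initial-group-size=1 that landed on rabbit@a2.
to_add = plan_new_members(NODE_ZONES, {"rabbit@a2"}, {"az-a", "az-b", "az-c"})
for node in to_add:
    # A real script would execute this command instead of printing it.
    print(f"rabbitmq-queues add_member -p / orders {node}")
```

Picking the first free node per zone keeps the sketch short; a real script would probably also balance replicas across the nodes inside a zone.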

Could you please explain the replication feature of Redis?

I am very new to Redis cache implementation.
Could you please let me know what the replication factor means?
How does it work, and what is its impact?
At the base of Redis replication (excluding the high availability features provided as an additional layer by Redis Cluster or Redis Sentinel) there is a very simple to use and configure leader follower (master-slave) replication: it allows replica Redis instances to be exact copies of master instances. The replica will automatically reconnect to the master every time the link breaks, and will attempt to be an exact copy of it regardless of what happens to the master.
This system works using three main mechanisms:
When a master and a replica instances are well-connected, the master keeps the replica updated by sending a stream of commands to the replica, in order to replicate the effects on the dataset happening in the master side due to: client writes, keys expired or evicted, any other action changing the master dataset.
When the link between the master and the replica breaks, for network issues or because a timeout is sensed in the master or the replica, the replica reconnects and attempts to proceed with a partial resynchronization: it means that it will try to just obtain the part of the stream of commands it missed during the disconnection.
When a partial resynchronization is not possible, the replica will ask for a full resynchronization. This will involve a more complex process in which the master needs to create a snapshot of all its data, send it to the replica, and then continue sending the stream of commands as the dataset changes.
Redis uses by default asynchronous replication, which being low latency and high performance, is the natural replication mode for the vast majority of Redis use cases.
Synchronous replication of certain data can be requested by the clients using the WAIT command. However WAIT is only able to ensure that there are the specified number of acknowledged copies in the other Redis instances, it does not turn a set of Redis instances into a CP system with strong consistency: acknowledged writes can still be lost during a failover, depending on the exact configuration of the Redis persistence. However with WAIT the probability of losing a write after a failure event is greatly reduced to certain hard to trigger failure modes.
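
The choice between partial and full resynchronization described above can be illustrated with a toy model. This is a simplified Python sketch, not Redis code: the `Master` class and its field names are made up for illustration. It assumes the master keeps a bounded backlog of its most recent command stream, and that a reconnecting replica reports the replication ID and offset it last saw.

```python
from dataclasses import dataclass


@dataclass
class Master:
    replid: str        # identifies the current replication history
    offset: int        # total bytes of the command stream produced so far
    backlog_size: int  # how many of the most recent bytes are still buffered

    def handle_psync(self, replica_replid: str, replica_offset: int) -> str:
        """Decide how a reconnecting replica catches up (toy model)."""
        oldest_buffered = max(0, self.offset - self.backlog_size)
        same_history = replica_replid == self.replid
        in_backlog = oldest_buffered <= replica_offset <= self.offset
        if same_history and in_backlog:
            missed = self.offset - replica_offset
            return f"partial resync: send {missed} missed bytes"
        # Unknown history, or the missed part has already left the backlog.
        return "full resync: send snapshot, then stream new commands"


master = Master(replid="abc", offset=10_000, backlog_size=1_000)
print(master.handle_psync("abc", 9_500))  # short disconnection
print(master.handle_psync("abc", 2_000))  # disconnected for too long
print(master.handle_psync("zzz", 9_500))  # replica followed a different master
```

In this model only the first call qualifies for a partial resynchronization; the other two fall back to a full one.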

What's the value of concurrency for sagas?

I do not get the purpose of concurrent messages for a saga. I'd expect it to behave more like an actor, so that all the messages with the same CorrelationId are processed sequentially. The whole purpose of a saga is orchestration of a long-running process, so why does parallel message processing matter?
Can you give a legit example where handling messages concurrently for the saga instance is beneficial compared to the sequential mode?
Or do I understand it wrong, and concurrency just means several different saga instances running in parallel?
The reason to ask is this fragment from NServiceBus docs:
The main reason for avoiding accessing data from external resources is possible contention and inconsistency of saga state. Depending on persister, the saga state is retrieved and persisted with either pessimistic or optimistic locking. If the transaction takes too long, it's possible another message will come in that correlates to the same saga instance. It might be processed on a different (concurrent) thread (or perhaps a scaled out endpoint) and it will either fail immediately (pessimistic locking) or while trying to persist the state (optimistic locking). In both cases the message will be retried.
There is none: messages for a single saga instance need to be processed sequentially. There's nothing special about saga configuration in MassTransit; you really want to use a separate endpoint for it and set the concurrency limit to one.
But that would kill the performance of processing messages for different saga instances. To solve this, keep the concurrency limit higher than one and use the partitioning filter by correlation id. Unfortunately, the partitioning filter requires per-message configuration, so you'd need to configure the partitioning for all messages that the saga consumes.
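
The idea behind partitioning by correlation id can be shown with a small language-neutral sketch. This is Python, not the MassTransit API; `partition_for` and `route` are hypothetical helper names. Every message is hashed by its correlation id into one of N lanes. Each lane is processed sequentially, while different lanes can run in parallel.

```python
import zlib
from collections import defaultdict


def partition_for(correlation_id: str, partitions: int) -> int:
    """Map a correlation id to a stable partition index."""
    return zlib.crc32(correlation_id.encode("utf-8")) % partitions


def route(messages, partitions):
    """Group (correlation_id, payload) pairs into per-partition FIFO lists."""
    lanes = defaultdict(list)
    for correlation_id, payload in messages:
        lane = partition_for(correlation_id, partitions)
        lanes[lane].append((correlation_id, payload))
    return lanes


incoming = [
    ("saga-1", "OrderSubmitted"),
    ("saga-2", "OrderSubmitted"),
    ("saga-1", "PaymentReceived"),
    ("saga-2", "PaymentReceived"),
    ("saga-1", "OrderShipped"),
]

lanes = route(incoming, partitions=4)
for lane, items in sorted(lanes.items()):
    # One worker per lane would drain its list in order.
    print(lane, items)
```

Because the hash is stable, all messages for one saga instance end up in the same lane and keep their arrival order, which is the property the saga needs.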
But it all depends on the use-case. All the concurrency issues are resolved by retries when using the persistence-based optimistic concurrency, which is documented per saga persistence provider. Certainly, it produces some noise by retrying database operations, but if the number of retries is under control, you can just keep it as it is.
If you hit tons of retries due to massive concurrent updates, you can revert to partitioning your saga.
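
The retry behaviour of optimistic concurrency can be sketched as follows. This is a minimal in-memory model in Python, not a real saga persistence provider: `InMemorySagaStore`, `ConcurrencyError` and `handle_with_retry` are hypothetical names. A save succeeds only if the stored version still matches the version that was loaded; otherwise the handler reloads the state and tries again.

```python
class ConcurrencyError(Exception):
    """Raised when the stored version changed since the state was loaded."""


class InMemorySagaStore:
    def __init__(self):
        self._rows = {}  # correlation_id -> (version, state)

    def load(self, correlation_id):
        return self._rows.get(correlation_id, (0, {}))

    def save(self, correlation_id, expected_version, state):
        current_version, _ = self.load(correlation_id)
        if current_version != expected_version:
            raise ConcurrencyError(correlation_id)
        self._rows[correlation_id] = (expected_version + 1, dict(state))


def handle_with_retry(store, correlation_id, apply, max_retries=3):
    """Apply a state change, retrying on version conflicts.

    Returns the number of retries that were needed.
    """
    for attempt in range(max_retries + 1):
        version, state = store.load(correlation_id)
        new_state = apply(dict(state))
        try:
            store.save(correlation_id, version, new_state)
            return attempt
        except ConcurrencyError:
            continue  # someone else saved first: reload and try again
    raise ConcurrencyError(
        f"gave up on {correlation_id} after {max_retries} retries"
    )


def add_item(state):
    state["items"] = state.get("items", 0) + 1
    return state


store = InMemorySagaStore()
print(handle_with_retry(store, "saga-1", add_item))  # no conflict here
print(store.load("saga-1"))
```

The return value is the "noise" mentioned above: as long as it stays small, retries are cheap; if it grows under heavy concurrent updates, partitioning becomes the better option.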

Behaviour of persistent messages on a mirrored queue in a RabbitMQ cluster

I am trying to figure out whether it is still possible to lose messages during the re-synchronisation process in a mirrored queue that only has persistent messages.
Suppose I have a queue mirrored across two nodes (to simplify the example).
The exchange and queue are durable and all the messages are marked as persistent.
The master queue is on Node 1
The mirrored queue is on Node 2
The scenario is:
1. Initially the queues are synchronised
2. Node 2 goes down
3. Node 2 recovers
4. Before Node 2 synchronises, Node 1 is lost
5. Node 2 becomes the master
At step 3, when Node 2 recovers, does it load the messages that it had persisted in its message store, or does it start with no messages and begin synchronising (by one of the two standard resynchronisation methods)?
In the case where a queue is mirrored, does each queue have its own message store?
If this scenario does lose messages, is there a way this can be avoided?
It seems that if this scenario occurs, the messages will be lost regardless of your configuration. To mitigate the problem:
Ensure messages are persisted.
Ensure queues and exchanges are durable.
Ensure acknowledgements are used and are only sent once the message has been committed to the master and all the mirrored replicas (publisher confirms behave this way on mirrored queues).
Ensure there are enough mirrored replicas to avoid ending up in a situation where you don't have a synchronized mirror.
Note that there will be a throughput performance hit.

What happens to a RabbitMQ cluster if the only disc node dies?

RabbitMQ clusters need to have at least one disc node (you can't turn your last disc node into a RAM node).
However (especially in a cloud context) nodes can die. What is supposed to happen to the cluster if the only disc node dies?
Does the cluster automatically appoint a new disc node, or does it continue working with no disc node?
Short answer: If all disc nodes die and you have at least one RAM node, you'll get a RAM-only cluster. If only one RAM node is left and it goes down and comes back up, only durable entities will reside on it.
Long answer:
If you use clustering as described in the Clustering Guide, each queue resides on only one node:
All data/state required for the operation of a RabbitMQ broker is
replicated across all nodes, for reliability and scaling, with full
ACID properties. An exception to this are message queues, which by
default reside on the node that created them, though they are visible
and reachable from all nodes. To replicate queues across nodes in a
cluster, see the documentation on high availability (note that you
will need a working cluster first).
So when a node dies (not only a disc node, this applies to RAM nodes too) you lose the queues (with their content) that reside on that node.
You can use High Availability to mirror a queue across more than one node (how exactly depends on your setup; see the detailed explanation of the ha-mode and ha-policy policy keys: all, exactly and nodes).
With HA, if a queue has an ha-policy set and the node it resides on dies, RabbitMQ will try to mirror that queue to other nodes, including RAM-only ones (this depends on how you set up ha-mode: for example, if it is set to nodes and all nodes from the list die, you lose the queue).
So, after that intro:
If you turn off all disc nodes and have only RAM nodes, and the queues fit in memory, everything will work normally. If the queues don't fit in memory, Flow Control memory limits apply. RAM-only clusters are discussed in the clustering doc, in the Restarting section:
At least one disk node should be running at all times to prevent data
loss. RabbitMQ will prevent the creation of a RAM-only cluster in many
situations, but it still won't stop you from stopping and forcefully
resetting all the disc nodes, which will lead to a RAM-only cluster.
Doing this is not advisable and makes losing data very easy.
And a bit more from the clustering doc:
A node can be a disk node or a RAM node. (Note: disk and disc are
used interchangeably. Configuration syntax or status messages normally
use disc.) RAM nodes keep their state only in memory (with the
exception of queue contents, which can reside on disc if the queue is
persistent or too big to fit in memory). Disk nodes keep state in
memory and on disk. As RAM nodes don't have to write to disk as much
as disk nodes, they can perform better. However, note that since the
queue data is always stored on disc, the performance improvements will
affect only resources management (e.g. adding/removing queues,
exchanges, or vhosts), but not publishing or consuming speed. Because
state is replicated across all nodes in the cluster, it is sufficient
(but not recommended) to have just one disk node within a cluster, to
store the state of the cluster safely.
So if you literally don't add any disc node, you'll get a RAM-only cluster. It may be fast in some cases, but if all nodes go down you will lose all your queues and their contents forever, except the durable ones, since any node dumps persistent queues and messages to disc.
But don't rely on RAM nodes dumping persistent entities to disc: in certain situations a node may not dump them at all, or may not dump all entities (especially messages).
There are old mailing list threads which may shed some extra light on the situation:
Cluster with all memory nodes
Cluster Disk Node vs Ram Node explanation