I'm wondering what the best practice is to achieve eventual (i.e. deferred) replication of reliable state partitions in a Service Fabric cluster?
I'd like to have something like this:
The picture above is supposed to illustrate a single service fabric cluster where some nodes are located on one continent and some other nodes are located on another continent.
The node marked with P is the primary replica for some reliable state. R1 and R2 are secondary replicas for that state. R3..R5 are also replicas for that state, but more like secondary secondaries.
Replicating state between the two sub-clusters will be slow (they are far away from each other).
In this scenario, there will mostly be read-only clients on continent Y and those clients need not see "realtime" updates. Lagging behind is not a problem, as long as all (or at least most) replicas in that sub-cluster have consistent state.
I think that what I'd like to achieve can be summarized in these points:
I want the replicas in sub-cluster Y to be some kind of second class replicas. They should not be allowed to become the primary, and they should be allowed to lag behind.
All (or most) replicas for a given state in a given sub-cluster should be consistent.
I want replication traffic to cross between these two sub-clusters on a single path. I don't want traffic for the same replicated state to travel across continents more than once (unless necessary). Perhaps by having something like a "primary deferred replica" in sub-cluster Y that spreads the replicated state to the other replicas in that sub-cluster?
Note: For some other reliable state partition, it might well be the other way around: sub-cluster Y holds the primary and sub-cluster X is "deferred".
Is there some support for this available in the Service Fabric framework? Or are there any best practices available? Or, maybe this scenario is completely off the road?
Right now there isn't a way to designate replicas as "second class" for eventual replication, but it's not completely off the road either. What you've described is completely valid for geo-span clusters. For now, it's possible to set up a geo-span cluster where a replica set spans multiple regions, but the replicas in each region are treated equally.
Folks,
I read through "Why we need distributed lock in the Redis", but it does not answer my question.
I went through the following:
https://redis.io/topics/distlock to be specific https://redis.io/topics/distlock#the-redlock-algorithm and
https://redis.io/topics/partitioning
I understand that partitioning is used to distribute the load across the N nodes. So does this not mean that the interested data will always be in one node? If yes then why should I go for a distributed lock that locks across all N nodes?
Example:
Say I persist a trade with the id 123; the partitioning, based on the hashing function, will work out which node it has to go to. For the sake of this example, say it goes to the 0th node.
Now my clients (multiple instances of my service) will want to access the trade 123.
Redis again based on the hashing is going to find in which Node the trade 123 lives.
This means the clients (in reality one instance among the multiple instances i.e only one client ) will lock the trade 123 on the 0th Node.
So why will it care to lock all the N nodes?
Unless partitioning is done on a particular record across all the Nodes i.e
Say a single trade is 1MB in size and it is partitioned across 4 Nodes with 0.25MB in each node.
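The premise of the example above (a given key always hashes to exactly one node, and every client agrees on which one) can be sketched as follows. This is a minimal illustration that assumes plain client-side hash partitioning with `zlib.crc32`; the function name `node_for` and the key format `trade:123` are hypothetical, and this is not the exact scheme Redis uses.

```python
import zlib


def node_for(key: str, node_count: int) -> int:
    """Map a key to one node index using a deterministic hash (illustrative only)."""
    return zlib.crc32(key.encode("utf-8")) % node_count


# Three independent client instances compute the target node for the same key.
# Because the hash is deterministic, they all arrive at the same node.
clients = [node_for("trade:123", 4) for _ in range(3)]
print(clients)
```

Under this assumption, whichever client takes a lock on trade 123 only ever talks to that one node, which is exactly the situation the question describes.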
Kindly share your thoughts. I am sure I am missing something but any leads are much appreciated.
When people describe Paxos, they always assume that there are already some proposers in the cluster. But where do the proposers come from, or what decides which processes become proposers?
How the cluster is initially configured and how it is changed is down to the administrator who is trying to optimise the system.
You can run the different roles on different hosts and have different numbers of them. We could run three proposers, five acceptors and seven learners, whatever you choose. Clients that need to write a value only need to connect to proposers. With multi-Paxos for state replication, connecting to proposers is sufficient, and the clients don't need to exchange messages with any other role type. Yet there is nothing to prevent clients from also being learners by seeing messages from acceptors.
As long as you follow the Paxos algorithm it all comes down to minimising network hops (latency and bandwidth), costs of hardware, and complexity of the software for your particular workload.
From a practical perspective your clients need to be able to find proposers in the face of failures. A cluster administrator will be configuring which nodes are to be proposers and making sure that they are discovered by clients.
It is hard to visualize from descriptions of the abstract algorithm how things might work, as many messaging topologies are possible. When applying the algorithm to a practical application it's far more obvious what setup minimises latency, bandwidth, hardware and complexity. An example might be a three-node MySQL cluster running Paxos. You want all three servers to have all the data, so they are all learners. All must be acceptors, as you need three at a minimum to have one node fail and still maintain progress. They may as well all be proposers to give the best availability and simplicity of software and configuration. Note that one will become the distinguished leader. The database administrator doesn't think about the Paxos roles as they just set up a three-node database cluster.
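To make the three-node example a little more concrete, here is a minimal in-memory sketch of single-decree Paxos in which any caller can act as a proposer against three acceptors. It is only an illustration under simplifying assumptions: there is no networking, no failure handling and no multi-Paxos leader, and the names `Acceptor` and `propose` are hypothetical rather than taken from any real library.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                  # highest ballot this acceptor has promised
    accepted_ballot: int = -1           # ballot of the value it last accepted
    accepted_value: Optional[str] = None

    def prepare(self, ballot: int) -> Tuple[bool, int, Optional[str]]:
        """Phase 1: promise to ignore lower ballots and report any accepted value."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted_ballot, self.accepted_value
        return False, -1, None

    def accept(self, ballot: int, value: str) -> bool:
        """Phase 2: accept the value unless a higher ballot was promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted_ballot = ballot
            self.accepted_value = value
            return True
        return False


def propose(acceptors: List[Acceptor], ballot: int, value: str) -> Optional[str]:
    """Run both phases; return the chosen value, or None if no quorum was reached."""
    quorum = len(acceptors) // 2 + 1
    replies = [a.prepare(ballot) for a in acceptors]
    granted = [(b, v) for ok, b, v in replies if ok]
    if len(granted) < quorum:
        return None
    # If any acceptor already accepted a value, the proposer must adopt
    # the one with the highest ballot instead of its own value.
    prev_ballot, prev_value = max(granted, key=lambda reply: reply[0])
    if prev_ballot >= 0 and prev_value is not None:
        value = prev_value
    acks = sum(1 for a in acceptors if a.accept(ballot, value))
    return value if acks >= quorum else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, ballot=1, value="A")   # a first proposer gets "A" chosen
second = propose(acceptors, ballot=2, value="B")  # a later proposer must adopt "A"
stale = propose(acceptors, ballot=1, value="C")   # a stale ballot is rejected
print(first, second, stale)
```

The point of the sketch is that nothing in the algorithm cares which process calls `propose`; who is allowed to do so is a deployment decision.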
The roles in the cluster may need to change. For example, you might want to expand the capacity of a database cluster. Or a server might die so you need to change the cluster membership to swap the dead one for a fresh one. For the Paxos algorithm to work every process must have a strongly consistent view of which processes are in which roles. How do you get consensus? You use Paxos to fix a new value of the cluster membership.
Problem statement: My application will be deployed in 3 separate regions, viz: North America, Europe, and Asia. I want to build a redis architecture with the following constraints:
Each region should have its own Redis cluster which can have multiple masters and slaves.
Each region's cluster should be able to handle writes and reads locally.
Let me elaborate a bit on the second point: I want that all regions should have their own copy of data. So any new data that an application in Europe writes should go to a redis cluster in Europe region not in any other region. And then this data can be (asynchronously) replicated to Asia and North America region.
What I've found so far is that I can't use Redis Sentinel, as I want multiple masters. I can't use (I think) Redis Cluster with masters in separate regions, as this would shard the data across all regions, so an application in Europe could try to write to a key that is sharded on a Redis master in Asia.
So my question is: Is this architecture possible with open-source Redis right now, or in the near future?
I've read this, this, and this on SO stating that this feature was not previously available. It seems the feature is available in Redis Enterprise (here), but I couldn't find anything on this topic for the open-source version of Redis.
One possible solution is to use Redis key hash tags with one Redis master in each region, such as redis_master_US, redis_master_europe, etc., and slaves in multiple regions (to improve read performance and availability), where keys look like {US}_California, {US}_Texas, {EU}_Germany, {ASIA}_Japan, etc. The catch is that all keys with the tag {US} will go to the same Redis master, but not necessarily to redis_master_US; that depends on the hash slot distribution between the Redis masters. There is a way around that if we use precomputed hash tags, as can be found here. We can then use keys like {fyimk7v1CgnBo}_California, {fyimk7v1CgnBo}_Texas, {91Lnyl}_Germany, {6MQu4Y}_Japan, whose tags we know map to slots 0, 16382, and 8325 respectively, and when creating the cluster make sure these slots are assigned to redis_master_US, redis_master_europe, and redis_master_asia respectively.
This approach seems to entail a lot of gotchas, though.
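The hash-tag behaviour this idea relies on can be checked offline with a short sketch. It assumes the rule from the Redis Cluster specification (slot = CRC16 of the key modulo 16384, hashing only the text between the first `{` and the next `}` when that text is non-empty) and assumes that Python's `binascii.crc_hqx` with an initial value of 0 is the same CRC16 variant. The function name `key_slot` is hypothetical.

```python
import binascii


def key_slot(key: str) -> int:
    """Compute the cluster hash slot for a key, honouring hash tags."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag such as {US} replaces the full key for hashing.
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % 16384


# Keys that share a tag always land in the same slot, whatever follows the tag.
same_region = key_slot("{US}_California") == key_slot("{US}_Texas")
print(same_region)
print(key_slot("{US}_California"), key_slot("{EU}_Germany"), key_slot("{ASIA}_Japan"))
```

Running the same function over candidate tags is one way to look for a tag that falls into a slot owned by a particular master, which is what the precomputed tags above are meant to provide.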
Mirroring replicates data between Kafka clusters, while Replication replicates data across nodes within a Kafka cluster.
Is there any specific use for Replication if Mirroring has already been set up?
They are used for different use cases. Let's try to clarify.
As described in the documentation,
The purpose of adding replication in Kafka is for stronger durability and higher availability. We want to guarantee that any successfully published message will not be lost and can be consumed, even when there are server failures. Such failures can be caused by machine error, program error, or more commonly, software upgrades. We have the following high-level goals:
Inside a cluster there might be failures (a single server goes down, the network partitions, and so forth), therefore we want to provide replication between the nodes. Given a setup of three nodes and one cluster, if server1 fails, there are two replicas Kafka can choose from. Same cluster implies similar response times (ok, it also depends on how these servers are configured, sure, but in a normal scenario they should not differ much).
Mirroring, on the other hand, seems to be very valuable, for example, when you are migrating a data center, or when you have multiple data centers (e.g., AWS in the US and AWS in Ireland). Of course, these are just a couple of use cases. So what you do here is to give applications belonging to the same data center a faster and better way to access data - data locality in some contexts is everything.
If you have one node in each cluster, in case of failure, you might have way higher response times to go, let's say, from AWS located in Ireland to AWS in the US.
You might claim that in order to achieve data locality (services in cluster one read from Kafka in cluster one) one still needs to copy the data from one cluster to the other. That's definitely true, but the advantages you get with mirroring could outweigh those of reading directly (via an SSH tunnel?) from Kafka located in another data center. Reading remotely exposes you to, for example, individual connections going down, longer client connection/session times (depending on the location of the data center), and legislation (some data can be collected in a country while some other data shouldn't).
Replication is the basis of higher availability. You shouldn't use Mirroring to handle high availability in a context where data locality matters. At the same time, you should not use just Replication where you need to duplicate data across data centers (I don't even know if you can without Mirroring/an ssh tunnel).
From the active/active documentation -
we have developed active/active high availability for queues
This solution still requires a RabbitMQ cluster, which means that it will not cope
seamlessly with network partitions within the cluster and, for that reason, is not
recommended for use across a WAN (though of course, clients can still connect from
as near and as far as needed)
What does "not recommended for use across a WAN" mean?
I can't understand this remark.
If I buy three machines on EC2, will I need to establish a domain controller/DNS server?
What does this restriction mean? and why?
Replication is a time-sensitive application, which means that timing assumptions have to be made in order to keep the distributed state synchronized across the replicas.
The Internet is by definition an asynchronous network: it evolves asynchronously and no assumptions can be made about delivery times, not even when MPLS (Multiprotocol Label Switching) paths are defined. BGP (Border Gateway Protocol) makes routing paths very unpredictable, and this translates into unpredictable latency.
Given the above, unpredictable latency is a killer factor for Active-Active replication (i.e. mirroring the state synchronously among the replicas to reach a consistent distributed state).
Another problem to take into consideration is network partitioning: in a set of replicas, one or more can be isolated, creating "islands of non-consistent replicas". Take the replica set R = { R1, R2, R3, ..., RN }: for network connectivity reasons (e.g. BGP problems) a subset of replicas like {R1, R2, R3} may be isolated from the remaining ones. Network partitioning implies distributed state inconsistencies: the replicas within the subset will be consistent with each other, but globally the islands evolve independently towards a corrupted distributed state.
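The "islands" effect can be reproduced with a few lines of code. This is a toy sketch in which each replica is just a dictionary and a partition is modelled by writing to one group of replicas at a time; the helper names `apply_write` and `consistent` are hypothetical.

```python
from typing import Dict, Iterable, List


def apply_write(island: Iterable[Dict[str, int]], key: str, value: int) -> None:
    """Apply a write to every replica that is reachable within one island."""
    for replica in island:
        replica[key] = value


def consistent(group: Iterable[Dict[str, int]]) -> bool:
    """Return True when every replica in the group holds identical state."""
    states: List[Dict[str, int]] = list(group)
    return all(state == states[0] for state in states)


replicas = {name: {} for name in ("R1", "R2", "R3", "R4", "R5")}
apply_write(replicas.values(), "x", 1)       # before the partition: one shared state

island_a = [replicas[n] for n in ("R1", "R2", "R3")]
island_b = [replicas[n] for n in ("R4", "R5")]
apply_write(island_a, "x", 2)                # each island keeps accepting writes
apply_write(island_b, "x", 3)

print(consistent(island_a), consistent(island_b), consistent(replicas.values()))
```

Each island stays internally consistent, but the replica set as a whole no longer agrees on the value of `x`.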
The CAP Theorem deals with the replication problem over the WAN (Wide Area Network, i.e. the Internet). It states:
Consistency, Availability and Partition tolerance cannot all be achieved at once over the WAN or any other asynchronous network; only 2 out of 3 can be chosen for large-scale distributed systems (e.g. Availability and Partition tolerance for the well-known NoSQL databases).
Coming back to the original question: that statement from the RabbitMQ documentation sums up, in a pragmatic way, the problems highlighted above (i.e. Active-Active replication cannot be achieved over the WAN). For this reason, if you need to replicate your broker instances over the WAN, techniques like Shovel and Federation are commonly used in RabbitMQ deployments.
It means if you have 3 EC2 instances in your cluster, they should be in the same data center. Not US East and US West for example*. RabbitMQ uses Erlang's node communication and is pretty chatty. Low latency communication is critical to having a performant cluster.
*Ideally even the same subnet, but that's not always possible.