some questions about couchbase's replicas detail - replication

Here I get several questions about the replica functions in couchbase, and hope can be answered. First of all, I wanna give some my own understanding ahout the couchbase; If there are 10 nodes in my cluster, and I set the number of replica to be 3 in each bucket (
actually I find that the maximal value is 3 , and I coundn't find any other way to make it larger than 3), then, does it mean that each data in bucket can only be copied to
other three nodes(I guess the three nodes should be random choosen, but could it select manually )in totally 10 nodes; Furthermore, if some of the 10 nodes have downtime,
will it cause loss of data?
I conclude my questions as follows;
1, The maximal value of the replica number in couchbase is 3, right or wrong? If wrong, how could it be largger than 3.
2, I guess the three nodes should be random choosen, but could it select manually
3, If my understanding is right, it will have loss of data when we find some nodes being in downtime. How could we avoid the loss under that condition

The maximal value of the replica number in couchbase is 3, right or wrong? If wrong, how could it be larger than 3.
The maximum number of replicas that you can have is 3, we run in production with 1 replica but it all comes down to how large your cluster is and performance impact. The more replicas you have the more inter node communication and transfer that will occur.
When you have 3 replicas this means that each node has its data replicated to 3 other nodes, this means you could handle 3 node failures in your cluster. It could happen but it is unlikely, what's more likely to happen is a machine dies and then Couchbase can automatically fail over and promote a replica held in an other node to serve requests/updates.
Couchbase's system is nice because it means you can scale up and down by failing over a node and automatic re-balancing.
I guess the three nodes should be randomly chosen, but could I select it manually?
You have no say on which nodes replicas are held, in fact I think it's a great thing that all of Couchbase's sharding and replica processes are taken out of the developers hands, it's all an automatic process.
If my understanding is right, it will have loss of data when we find some nodes being in downtime. How could we avoid data loss under that condition?
As I said before, if one node goes down then a replica is promoted, with 3 back ups you'd need 3 nodes to fail before you noticed something happening. In a production environment you should already have a warning system for each individual node, be it New Relic, Nagios etc that would report if a server dies. If there was a catastrophic problem and you lost more than 4 nodes then yes you would have data loss.
Bare in mind automatic fail over in Couchbase isn't instantaneous but still it's pretty quick. If you need downtime across the cluster say for server maintenance that needs a restart or something then it is possible to fail a node over, remove it from the cluster, perform operations and tasks on it, then add it back into the cluster and rebalance. Perform those stops again for as many nodes as you need, I've personally done that when I forgot to set specific Linux things that needed a system restart.
Overall to avoid data loss, monitor your servers, make (daily/hourly) backups of the data in your cluster and have machines that are well provisioned for your workrate.
Hope that helps!

Related

Why do we need distributed lock in Redis if we partition on Key?

Folks,
I read thru Why we need distributed lock in the Redis but it is not answering my question.
I went thru the following
https://redis.io/topics/distlock to be specific https://redis.io/topics/distlock#the-redlock-algorithm and
https://redis.io/topics/partitioning
I understand that partitioning is used to distribute the load across the N nodes. So does this not mean that the interested data will always be in one node? If yes then why should I go for a distributed lock that locks across all N nodes?
Example:
Say if I persist a trade with the id 123, then the partition based on the hashing function will work out which node it has to go. Say for the sake of this example, it goes to 0th node.
Now my clients (multiple instances of my service) will want to access the trade 123.
Redis again based on the hashing is going to find in which Node the trade 123 lives.
This means the clients (in reality one instance among the multiple instances i.e only one client ) will lock the trade 123 on the 0th Node.
So why will it care to lock all the N nodes?
Unless partitioning is done on a particular record across all the Nodes i.e
Say a single trade is 1MB in size and it is partitioned across 4 Nodes with 0.25MB in each node.
Kindly share your thoughts. I am sure I am missing something but any leads are much appreciated.

How to restart scylla db cluster without any data loss

I want to restart my Scylla db cluster. But I don't want to lose any data.
Do I lose any data if I restart one after other node?
No, you will not loose data if you are doing a rolling restart.
Scylla keeps the data replicated across multiple nodes (usually 3 or more)
Depending on your Replication Factor (RF) and Consistency Level (CL) you might see read or write operations failed during the restart. See interactive calc here https://docs.scylladb.com/getting-started/consistency/#consistency-level-calculator
If "restarting a node" just involves restarting Scylla or rebooting the kernel on which it runs, then you're safe: Scylla is a distributed database, and is designed to support durability and availability even when nodes temporarily disappear from the network. When a node is temporarily down, all its data is still available for reads (from two other replicas), and also writes continue to work normally and will be eventually replicated to the down node when it finally comes up (using the "hinted handoff" and/or "repair" mechanisms).
However, if by "restarting a node" you mean something more destructive - replacing it with a brand-new node with empty storage, as in some cloud setups where nodes have transient storage. In that case you have to be more careful: If the node's data is lost, we still have two more replicas and the database continues to be available, but you should tell the cluster to "stream" the data which the node lost back to the node - before continuing to do this destructive restart to additional nodes. If you have RF=3 and destroy three nodes at the same time, you will surely lose data.

Forcing Riak to store data on distinct physical servers

I'm concerned by this note in Riak's documentation:
N=3 simply means that three copies of each piece of data will be stored in the cluster. That is, three different partitions/vnodes will receive copies of the data. There are no guarantees that the three replicas will go to three separate physical nodes; however, the built-in functions for determining where replicas go attempts to distribute the data evenly.
https://docs.basho.com/riak/kv/2.1.3/learn/concepts/replication/#so-what-does-n-3-really-mean
I have a cluster of 6 physical servers with N=3. I want to be 100% sure that total loss of some number of nodes (1 or 2) will not lose any data. As I understand the caveat above, Riak cannot guarantee that. It appears that there is some (admittedly low) portion of my data that could have all 3 copies stored on the same physical server.
In practice, this means that for a sufficiently large data set I'm guaranteed to completely lose records if I have a catastrophic failure on a single node (gremlins eat/degauss the drive or something).
Is there a Riak configuration that avoids this concern?
Unfortunate confounding reality: I'm on an old version of Riak (1.4.12).
There is no configuration that avoids the minuscule possibility that a partition might have 2 or more copies on one physical node (although having 5+ nodes in your cluster makes it extremely unlikely that a single node with have more than 2 copies of a partition). With your 6 node cluster it is extremely unlikely that you would have 3 copies of a partition on one physical node.
The riak-admin command line tool can help you explore your partitions/vnodes. Running riak-admin vnode-status (http://docs.basho.com/riak/kv/2.1.4/using/admin/riak-admin/#vnode-status) on each node for example will output the status of all vnodes the are running on the local node the command is run on. If you run it on every node in your cluster you confirm whether or not your data is distributed in a satisfactory way.

Does Redis Replication help in load balancing?

We keep continuously writing and updating events into redis and so when we ever we want to read data(which is a lot of data , upwards of for 500000 key value pairs), redis has performance issues. So, we decided to get the data via multiple threads. But because of single instance redis , the performance issues persisted .Will replication help us? As in, by making master and slave redis's , will our reads of the events be distributed to the slaves . We are thinking of making the master write only.
Any other suggestion for performance improvements?
(one of) Replication's declared purposes is to help in scaling reads, so yes to the topic.
Note that after you've set up the slave, you'll need to specify its address for your reader threads and processes. Make sure that you start with read-slaves if you don't have a clear separation between writers and readers.
If a single slave isn't enough, you can actually add more slaves. If you add them directly to the master, you'll get fresher reads but there'll eventually be a performance impact on the master. Alternatively, replication chaining is a great solution for most use cases, i.e. 1 master -> 1 slave -> n slaves.
There are probably other ways to scale Redis for your use case (e.g. clustering), but that really depends on what you're trying/wanting to do :)

Couchbase node failure

My understanding could be amiss here. As I understand it, Couchbase uses a smart client to automatically select which node to write to or read from in a cluster. What I DON'T understand is, when this data is written/read, is it also immediately written to all other nodes? If so, in the event of a node failure, how does Couchbase know to use a different node from the one that was 'marked as the master' for the current operation/key? Do you lose data in the event that one of your nodes fails?
This sentence from the Couchbase Server Manual gives me the impression that you do lose data (which would make Couchbase unsuitable for high availability requirements):
With fewer larger nodes, in case of a node failure the impact to the
application will be greater
Thank you in advance for your time :)
By default when data is written into couchbase client returns success just after that data is written to one node's memory. After that couchbase save it to disk and does replication.
If you want to ensure that data is persisted to disk in most client libs there is functions that allow you to do that. With help of those functions you can also enshure that data is replicated to another node. This function is called observe.
When one node goes down, it should be failovered. Couchbase server could do that automatically when Auto failover timeout is set in server settings. I.e. if you have 3 nodes cluster and stored data has 2 replicas and one node goes down, you'll not lose data. If the second node fails you'll also not lose all data - it will be available on last node.
If one node that was Master goes down and failover - other alive node becames Master. In your client you point to all servers in cluster, so if it unable to retreive data from one node, it tries to get it from another.
Also if you have 2 nodes in your disposal you can install 2 separate couchbase servers and configure XDCR (cross datacenter replication) and manually check servers availability with HA proxies or something else. In that way you'll get only one ip to connect (proxy's ip) which will automatically get data from alive server.
Hopefully Couchbase is a good system for HA systems.
Let me explain in few sentence how it works, suppose you have a 5 nodes cluster. The applications, using the Client API/SDK, is always aware of the topology of the cluster (and any change in the topology).
When you set/get a document in the cluster the Client API uses the same algorithm than the server, to chose on which node it should be written. So the client select using a CRC32 hash the node, write on this node. Then asynchronously the cluster will copy 1 or more replicas to the other nodes (depending of your configuration).
Couchbase has only 1 active copy of a document at the time. So it is easy to be consistent. So the applications get and set from this active document.
In case of failure, the server has some work to do, once the failure is discovered (automatically or by a monitoring system), a "fail over" occurs. This means that the replicas are promoted as active and it is know possible to work like before. Usually you do a rebalance of the node to balance the cluster properly.
The sentence you are commenting is simply to say that the less number of node you have, the bigger will be the impact in case of failure/rebalance, since you will have to route the same number of request to a smaller number of nodes. Hopefully you do not lose data ;)
You can find some very detailed information about this way of working on Couchbase CTO blog:
http://damienkatz.net/2013/05/dynamo_sure_works_hard.html
Note: I am working as developer evangelist at Couchbase