Redis cluster total size

I have a quick question about redis cluster.
I'm setting up a Redis cluster on Google Kubernetes Engine. I'm using the n1-highmem-2 machine type with 13GB RAM, but I'm slightly confused about how to calculate the total available size of the cluster.
I have 3 nodes, each with 13GB of RAM. I'm running 6 pods (2 on each node), 1 master and 1 slave per node. This all works. I've assigned 6GB of RAM to each pod in my pod definition YAML file.
Is it correct to say that my total cluster size would be 18GB (3 masters * 6GB), or can I count the slaves' size toward the total size of the Redis cluster?

Redis Cluster master-slave model
In order to remain available when a subset of master nodes are failing or are not able to communicate with the majority of nodes, Redis Cluster uses a master-slave model where every hash slot has from 1 (the master itself) to N replicas (N-1 additional slave nodes).
So, slaves are replicas (read-only) of masters (read-write) for availability, hence your total workable size is the size of your master pods.
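If it helps, here's a minimal sketch (assuming redis-py and hypothetical hostnames for the three master pods) that sums the memory limit reported by each master, which is the number that matters for writable capacity:

# Rough capacity check: only the masters are counted, because the slaves
# merely hold copies of their master's data.
import redis

masters = ["redis-master-0", "redis-master-1", "redis-master-2"]  # placeholder hostnames

total_bytes = 0
for host in masters:
    info = redis.Redis(host=host, port=6379).info("memory")
    # maxmemory is the per-pod limit you configured (if any);
    # fall back to current usage when it is 0/unset
    total_bytes += info["maxmemory"] or info["used_memory"]

print(f"usable cluster capacity: {total_bytes / 1024**3:.1f} GiB")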
Keep in mind, though, that leaving a master and its slave on the same Kubernetes node only protects against pod failure, not node failure, so you should consider redistributing them.
You didn't mention how you are installing Redis, but I'd like to mention the Bitnami Redis Helm Chart, as it's built for production use, deploys 1 master and 3 slaves for good fault tolerance, and has tons of configuration options that are easily customized via the values.yaml file.

Related

Redis cluster with one master and N replicas/slaves

Is it possible to create a Redis cluster with only 1 master and N slaves/replicas?
I tried it and it failed:
redis-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 --cluster-replicas 2
*** ERROR: Invalid configuration for cluster creation.
*** Redis Cluster requires at least 3 master nodes.
*** This is not possible with 3 nodes and 2 replicas per node.
*** At least 9 nodes are required.
Is there a way to avoid this restriction of minimum 3 masters?
Redis Cluster doesn't support what you are asking for, but there is another H/A Redis mode, "Redis Sentinel":
https://redis.io/docs/manual/sentinel/
This article is worth reading as it illustrates some pros and cons of the two H/A modes:
Redis Sentinel Pros:
With three nodes, you can build up a fully functional Sentinel deployment.
Simplicity - it’s usually simple to maintain and configure.
Highly available, you can build a Redis Sentinel deployment that can survive certain failures without any need for human intervention.
Works as long as a single master instance is available; it can survive the failure of all slave instances.
Multiple slave nodes can replicate data from a master node.
Redis Sentinel Cons:
Not scalable; writes must go to the master, so it cannot solve the problem of read-write separation.
Slaves may serve reads, but because of asynchronous replication, outdated reads may result.
It doesn’t shard data, so master and slave utilization will be imbalanced.
The slave node is a waste of resources because it does not serve as a backup node.
Redis-Sentinel must be supported by the client. The client holds half of the magic.
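To illustrate that last point, here's a minimal sketch of what client-side Sentinel support looks like with redis-py (the sentinel addresses and the master name "mymaster" are placeholders):

# The client asks the sentinels who the current master is instead of
# hard-coding a master address; after a failover the same call returns
# the newly promoted master.
from redis.sentinel import Sentinel

sentinel = Sentinel([("10.0.0.1", 26379), ("10.0.0.2", 26379), ("10.0.0.3", 26379)],
                    socket_timeout=0.5)
print(sentinel.discover_master("mymaster"))   # (ip, port) of the current master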

Redis advantages of Sentinel and Cluster

I'm planning to create a highly available Redis cluster. After reading many articles about building a Redis cluster I'm confused. So what exactly are
the advantages of a Redis Sentinel Master1 Slave1 Slave2 cluster? Is it more reliable than a Redis multinode sharded cluster?
the advantages of a Redis multinode sharded cluster? Is it more reliable than a Redis Sentinel Master1 Slave1 Slave2 cluster?
Further questions to the Redis Sentinel Master1 Slave1 Slave2 Cluster:
When I have 1 master and the two slaves and traffic is getting higher and higher so that this cluster becomes too small, how can I make the cluster bigger?
Further questions to the Redis Multinode Sharded Cluster:
Why are there so many demos that run a cluster on a single instance but on different ports? That makes no sense to me.
When I have a cluster with 4 masters and 4 replicas, how can an application or a client be sure its writes reach the cluster? If Master1 and Slave1 die but my application always writes to the IP of Master1, it will not work anymore. Which solutions are out there to implement a sharded cluster well, so that applications can find it via a single IP and port? Keepalived? HAProxy?
When I use, e.g., Keepalived for a 4-master setup - doesn't that cancel out the different masters?
Furthermore, I need to understand why the multinode cluster is only for solutions where more data needs to be written than fits into the memory of a single node. Why? To me, a multi-master setup sounds like a good way to be scalable.
Is it right that the sharded cluster setup does not support multikey operations when the cluster is not in caching mode?
I'm unsure if these two solutions are the only ones. Hopefully you guys can help me to understand the architectures of Redis. Sorry for so many questions.
I will try to answer some of your questions but first let me describe the different deployment options of Redis.
Redis has three basic deployments: single node, sentinel and cluster.
Single node - The basic solution where you run single process running Redis.
It is not scalable and not highly available.
Redis Sentinel - Deployment that consists of multiple nodes where one is elected as master and the rest are slaves.
It adds high availability since in case of master failure one of the slaves will be automatically promoted to master.
It is not scalable since the master node is the only node that can accept writes.
You can configure the clients to direct read requests to the slaves, which will take some of the load from the master. However, in this case slaves might return stale data since they replicate the master asynchronously.
Redis Cluster - Deployment that consists of at least 6 nodes (3 masters and 3 slaves), where data is sharded across the masters. It is highly available since in case of a master failure, one of its slaves will automatically be promoted to master. It is scalable since you can add more nodes and reshard the data so that the new nodes take some of the load.
So to answer your questions:
The advantages of Sentinel over Redis Cluster are:
Hardware - You can set up a fully working Sentinel deployment with three nodes. Redis Cluster requires at least six nodes.
Simplicity - usually it is easier to maintain and configure.
The advantage of Redis Cluster over Sentinel is that it is scalable.
The decision between the two deployments should be based on your expected load.
If your write load can be managed with a single Redis master node, you can go with Sentinel deployment.
If one node cannot handle your expected load, you must go with Cluster deployment.
Redis Sentinel deployment is not scalable so making the cluster bigger will not improve your performance. The only exception is that adding slaves can improve your read performance (in case you direct read requests to the slaves).
Redis Cluster running on a single node with multiple ports is only for development and demo purposes. In production it is useless.
In a Redis Cluster deployment, clients should have network access to all nodes (and not only Master1). This is because data is sharded between the masters.
If a client tries to write data to Master1 but Master2 is the owner of the data, Master1 will return a MOVED redirect to the client, guiding it to send the request to Master2.
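A cluster-aware client library handles these redirects for you, so your application does not need to track which master owns which slot. A minimal sketch with redis-py's RedisCluster (host and port are placeholders, assuming a recent redis-py):

# The client only needs one reachable node to bootstrap; it then learns
# the slot-to-master mapping and follows MOVED redirects transparently.
from redis.cluster import RedisCluster

rc = RedisCluster(host="10.0.0.1", port=7000)
rc.set("foo", "bar")        # routed to whichever master owns the slot for "foo"
print(rc.get("foo"))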
You cannot have a single HAProxy in front of all Redis nodes.
Same answer as in 5: in a cluster deployment, clients should have a direct connection to all masters and slaves, not through an LB or Keepalived.
Not sure I totally understood your question but Redis Cluster is the only solution for Redis that is scalable.
A Redis Cluster deployment supports multikey operations only when all keys map to the same hash slot. You can use "hash tags" to force multiple keys into the same slot, and therefore onto the same master.
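For example, here's a minimal sketch (assuming redis-py and a running cluster at a placeholder address) showing that keys sharing a hash tag can be used together in a multikey command:

# Only the substring inside {...} is hashed, so both keys map to the
# same slot and MGET does not fail with a CROSSSLOT error.
from redis.cluster import RedisCluster

rc = RedisCluster(host="10.0.0.1", port=7000)
rc.set("{user:42}:name", "alice")
rc.set("{user:42}:email", "alice@example.com")
print(rc.mget("{user:42}:name", "{user:42}:email"))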
Some good links that can help you understand it better:
Description on the different Redis deployment options: https://blog.octo.com/en/what-redis-deployment-do-you-need
Detailed explanation on the architecture of Redis Cluster: https://blog.usejournal.com/first-step-to-redis-cluster-7712e1c31847

Redis cluster on load balancer

I have set up a Redis cluster with 1 master node and 2 slave nodes, with Sentinel running on all 3 nodes.
Prior to this setup, my application was pointing to a single node where redis instance was running.
After the clustering had been set up, where should my application point to?
Thanks.
You need more than one master node.
Slaves are designed not to be writable.
You can write to the master, and read from both slaves. Of course, you can also read from the master.
In most cases, you should NOT write to a slave, because even if you configure the slave as writable, writes to a slave do NOT sync to the master or the other slaves.
With slaves you achieve data replication. Also, reading from slaves scales out read performance, if you set up each slave and master on a distinct machine. However, you might have consistency problems, i.e. reading stale data from slaves.
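In practice, with Sentinel running, the application should point at the sentinels and ask them for connections instead of a fixed master IP. A minimal sketch with redis-py (sentinel addresses and the master name "mymaster" are placeholders):

# Writes go through a connection to the current master; reads can be
# served by a replica (which may lag slightly behind the master).
from redis.sentinel import Sentinel

sentinel = Sentinel([("10.0.0.1", 26379), ("10.0.0.2", 26379), ("10.0.0.3", 26379)],
                    socket_timeout=0.5)
master = sentinel.master_for("mymaster", socket_timeout=0.5)
replica = sentinel.slave_for("mymaster", socket_timeout=0.5)

master.set("greeting", "hello")
print(replica.get("greeting"))   # may briefly return None due to async replication

After a failover, the same master_for() call transparently returns a connection to the newly promoted master.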
Redis Cluster and Redis Sentinel are two different concepts. If you are only looking for HA, I would recommend Sentinel; Redis Cluster works on top of sharding, which is highly distributed in nature. Redis Cluster recommends a minimum of 3 masters and an equal number of slaves for a healthy cluster.

Does redis delete all the keys when one master and its slave fails in redis cluster

I have a question. Suppose I am using a Redis cluster with 3 shards (each with a master and a slave). I came to know that if a master and its slave fail at the same time, Redis Cluster is not able to continue to operate. What happens after that?
Would Redis cluster delete all the other keys from other 2 nodes as well? (When it comes back)
Do we need to manually restart this cluster and can we somehow retain the other keys values (on other nodes)?
How will it behave if I use Azure Redis Cache?
Thanks In Advance
1. Would Redis cluster delete all the other keys from other 2 nodes as well? (When it comes back)
First of all, only the operations are blocked, not the cluster activity, and nothing is done with the data, as the documentation says:
Redis Cluster failure detection is used to recognize when a master or slave node is no longer reachable by the majority of nodes and then respond by promoting a slave to the role of master. When slave promotion is not possible the cluster is put in an error state to stop receiving queries from clients.
Next, regarding whether the data gets deleted or not (under the Replication documentation):
In setups where Redis replication is used, it is strongly advised to have persistence turned on in the master
Which means that you will lose the data only if persistence was turned off and the master/slave pair went down. When the pair comes back up, you will not be able to recover the data. So keep Redis persistence turned on.
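If you want to double-check this, here's a small sketch (assuming redis-py and a placeholder master address; in production you would normally set this in redis.conf rather than at runtime) that inspects and enables AOF persistence:

# CONFIG GET/SET work per node; run this against every master.
import redis

r = redis.Redis(host="10.0.0.1", port=6379)   # placeholder master address
print(r.config_get("appendonly"))             # {'appendonly': 'no'} means AOF is off
r.config_set("appendonly", "yes")             # takes effect immediately, but is not
                                              # written to redis.conf unless you also
                                              # run CONFIG REWRITE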
2. Do we need to manually restart this cluster and can we somehow retain the other keys values (on other nodes)?
I think the above answer covers it up.
3. How will it behave if I use Azure Redis Cache?
From Azure Redis Cache FAQ
High Availability/SLA: Azure Redis Cache guarantees that a Standard/Premium cache will be available at least 99.9% of the time. To learn more about our SLA, see Azure Redis Cache Pricing. The SLA only covers connectivity to the Cache endpoints. The SLA does not cover protection from data loss. We recommend using the Redis data persistence feature in the Premium tier to increase resiliency against data loss.
So it's kinda their headache
OR
Redis Cluster: If you want to create caches larger than 53 GB or want to shard data across multiple Redis nodes, you can use Redis clustering which is available in the Premium tier. Each node consists of a primary/replica cache pair for high availability. For more information, see How to configure clustering for a Premium Azure Redis Cache.

apache hadoop, hbase and nutch components distribution for 4 servers cluster

I have 4 systems. I want to crawl some data. For that, I first need to configure a cluster. I am confused about the placement of components.
Should I place all components (Hadoop, Hive, HBase, Nutch) on one machine and add the other machines as nodes in Hadoop?
Should I place HBase on one machine, Nutch on another, Hadoop on a third, and add the fourth machine as a slave of Hadoop?
Should HBase be in pseudo-distributed mode or fully distributed?
How many slaves should I add in HBase if I run it in fully distributed mode?
What would be the best way? Please guide step by step (for HBase and Hadoop).
Say you have 4 nodes n1, n2, n3 and n4.
You can install hadoop and hbase in distributed mode.
If you are using Hadoop 1.x -
n1 - hadoop master[Namenode and Jobtracker]
n2, n3 and n4 - hadoop slaves [datanodes and tasktrackers]
For HBase, you can choose n1 or any other node as the Master node. Since Master nodes are usually not CPU/memory intensive, all Masters can be deployed on a single node in a test setup; however, in production it's good to have each Master deployed on a separate node.
Let's say n2 - HBase Master; the remaining 3 nodes can act as regionservers.
Hive and Nutch can reside on any node.
Hope this helps; For a test setup this should be good to go.
Update -
For Hadoop 2.x, since your cluster size is small, Namenode HA deployment can be skipped.
Namenode HA would require two nodes, one each for the active and standby NameNode.
A ZooKeeper quorum, which again requires an odd number of nodes, so a minimum of three nodes would be required.
A journal quorum again requires a minimum of 3 nodes.
But for a cluster this small HA might not be a major concern. So you can keep
n1 - namenode
n2 - ResourceManager (YARN)
and the remaining nodes can act as datanodes; try not to deploy anything else on the YARN node.
Rest of the deployment for HBase, Hive and Nutch would remain same.
In my opinion, you should install Hadoop in fully distributed mode, so the jobs can run in parallel and much faster, as the MapReduce tasks will be distributed across the 4 machines. Of course, Hadoop's master node should run on one single machine.
If you need to process a big amount of data, it's a good choice to install HBase on one single machine and Hadoop on the other 3.
You could make all of the above very easy using tools/platforms with a very friendly GUI like Cloudera Manager and Hortonworks. They will help you control and maintain your cluster better, and they also provide health monitoring, cluster analytics, as well as e-mail notifications for every error that occurs in your cluster.
Cloudera Manager
http://www.cloudera.com/content/cloudera/en/products-and-services/cloudera-enterprise/cloudera-manager.html
Hortonworks
http://hortonworks.com/
In these two links, you can find more guidance about how you could construct your cluster.