Cassandra rebalance - load-balancing

I'm using Apache Cassandra 2.1.1, and when I run nodetool status, the Load for one of my nodes is about half the size of the other two, while Owns is almost equal on all the nodes. I am somewhat new to Cassandra and don't know if I should be worried about this or not. I have tried running repair and cleanup after restarting all the nodes, but it still appears unbalanced. I am using GossipingPropertyFileSnitch with dc=DC1 and rack=RAC1 specified in each node's cassandra-rackdc.properties. I am also using Murmur3Partitioner with NetworkTopologyStrategy, and my keyspace is defined as
CREATE KEYSPACE awl WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '2'} AND durable_writes = true;
I believe the problem is with the awl keyspace, since the size of the data/awl folder matches the Load reported by nodetool status. My output for nodetool status is below. Any help would be much appreciated.
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.1.1.152 3.56 GB 256 68.4% d42945cc-59eb-41de-9872-1fa252762797 RAC1
UN 10.1.1.153 6.8 GB 256 67.2% 065c471d-5025-4bf1-854d-52d579f2a6d3 RAC1
UN 10.1.1.154 6.31 GB 256 64.4% 46f05522-29cc-491c-ab65-334b205fc415 RAC1

I would suspect this is due to the distribution of the key values being inserted. They are probably not well distributed across the possible key values, so many of them are hashing to one node. Since you are using replication factor 2, the second replica goes to the next node in the ring, resulting in two nodes with more data than the third.
You didn't show your table schema, so I don't know what you are using for the partition and clustering keys. You want to use key values that have a high cardinality and good distribution to avoid hot spots where a lot of inserts are hashing to one node. With a better distribution you will get better performance and more even space usage across the nodes.
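As a rough illustration of why cardinality matters (this is a toy model, not Cassandra's actual Murmur3 token allocation - MD5 stands in for the partitioner and a simple modulo stands in for token ranges), compare how rows spread across nodes under high- and low-cardinality partition keys:

```python
import hashlib

def node_for(partition_key: str, num_nodes: int = 3) -> int:
    # Stand-in for "hash the partition key, map the token to a node".
    h = int.from_bytes(hashlib.md5(partition_key.encode()).digest()[:8], "big")
    return h % num_nodes

def distribution(keys, num_nodes: int = 3):
    counts = [0] * num_nodes
    for k in keys:
        counts[node_for(k, num_nodes)] += 1
    return counts

# 10,000 rows under high-cardinality keys spread almost evenly...
high_card = distribution(f"user-{i}" for i in range(10_000))

# ...while 10,000 rows under only two distinct partition keys pile up
# on at most two nodes, leaving at least one node with nothing.
low_card = distribution("sensor-A" if i % 2 else "sensor-B"
                        for i in range(10_000))
```

With replication the picture shifts slightly (each row lands on RF nodes), but the same skew carries over to the replicas.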

Related

Splunk : How to figure out replication Factor

If this sounds silly to you, I apologise in advance; I am new to Splunk and did a Udemy course but can't figure this out.
If I check the indexes.conf file on my cluster master, I get repFactor=0:
#
# By default none of the indexes are replicated.
#
repFactor = 0
but if I check https://:8089/services/cluster/config
I see replication factor :
replication_factor 2
So I am confused about whether my data is getting replicated. I have two indexes in the cluster.
I believe replication_factor determines how many replicas to keep among the nodes in the cluster, and repFactor determines whether or not to replicate a particular index.
For repFactor, which is an index-specific setting:
The indexes.conf repFactor attribute
When you add a new index stanza, you must set the repFactor attribute to "auto". This causes the index's data to be replicated to other peers in the cluster.
Note: By default, repFactor is set to 0, which means that the index will not be replicated. For clustered indexes, you must set it to "auto".
The only valid values for repFactor are 0 and "auto".
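For example, a clustered index stanza in indexes.conf might look like this (the index name here is illustrative):

```ini
[my_clustered_index]
homePath   = $SPLUNK_DB/my_clustered_index/db
coldPath   = $SPLUNK_DB/my_clustered_index/colddb
thawedPath = $SPLUNK_DB/my_clustered_index/thaweddb
# "auto" is the only valid value besides 0; it makes this
# index's buckets subject to cluster replication
repFactor = auto
```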
For replication_factor, which is a cluster setting:
Replication factor and cluster resiliency
The cluster can tolerate a failure of (replication factor - 1) peer nodes. For example, to ensure that your system can tolerate a failure of two peers, you must configure a replication factor of 3, which means that the cluster stores three identical copies of each bucket on separate nodes. With a replication factor of 3, you can be certain that all your data will be available if no more than two peer nodes in the cluster fail. With two nodes down, you still have one complete copy of data available on the remaining peers.
By increasing the replication factor, you can tolerate more peer node failures. With a replication factor of 2, you can tolerate just one node failure; with a replication factor of 3, you can tolerate two concurrent failures; and so on.
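The arithmetic above reduces to a one-liner (a trivial sketch that just restates the rule from the docs):

```python
def max_peer_failures(replication_factor: int) -> int:
    # The cluster keeps a full copy of all data as long as
    # no more than (replication_factor - 1) peers are down.
    return replication_factor - 1
```

So replication_factor 2 tolerates one failed peer, 3 tolerates two, and so on.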
The repFactor setting lets you choose which indexes are replicated. By default, none are. The replication_factor setting says how many copies of a replicated bucket to make. Both must be non-zero to replicate data.
The Cluster Manager should confirm that. Select Settings->Indexer Clustering to see which indexes are replicated and their state.

Aerospike cluster behavior in different consistency mode?

I want to understand the behavior of Aerospike in different consistency modes.
Consider an Aerospike cluster running with 3 nodes and replication factor 3.
AP mode is simple: it says
Aerospike will allow reads and writes in every sub-cluster.
And the maximum number of nodes that can go down is < 3 (the replication factor).
For aerospike strong consistency it says
Note that the only successful writes are those made on replication-factor number of nodes. Every other write is unsuccessful
Does this really mean that no writes are allowed if available nodes < replication factor?
And then same document says
All writes are committed to every replica before the system returns success to the client. In case one of the replica writes fails, the master will ensure that the write is completed to the appropriate number of replicas within the cluster (or sub cluster in case the system has been compromised.)
What does "appropriate number of replicas" mean?
So if I lose one node from my 3-node cluster with strong consistency and replication factor 3, I will not be able to write data?
For aerospike strong consistency it says
Note that the only successful writes are those made on
replication-factor number of nodes. Every other write is unsuccessful
Does this really mean that no writes are allowed if available nodes <
replication factor?
Yes: if there are fewer than replication-factor nodes, then it is impossible to meet the user-specified replication factor.
All writes are committed to every replica before the system returns
success to the client. In case one of the replica writes fails, the
master will ensure that the write is completed to the appropriate
number of replicas within the cluster (or sub cluster in case the
system has been compromised.)
what does appropriate number of replica means ?
It means replication-factor nodes must receive the write. When a node fails, a new node can be promoted to replica status until either the node returns or an operator registers a new roster (cluster membership list).
So if I lose one node from my 3 node cluster with strong consistency
and replication factor 3, I will not be able to write data?
Yes, so having all nodes as replicas wouldn't be a very useful configuration. Replication-factor 3 allows up to 2 nodes to be down, but only if the remaining nodes can still satisfy the replication factor. So for replication-factor 3 you would probably want to run with a minimum of 5 nodes.
You are correct: with 3 nodes and RF 3, losing one node means the cluster will not be able to accept write transactions, since it wouldn't be able to write the required number of copies (3 in this case).
Appropriate number of replicas means a number of replicas that would match the replication factor configured.
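The availability rule for strong-consistency writes boils down to a simple check (a sketch of the rule as described above, not Aerospike's actual roster logic):

```python
def sc_write_possible(live_nodes: int, replication_factor: int) -> bool:
    # In strong-consistency mode, a write succeeds only if it can be
    # committed to replication-factor distinct nodes.
    return live_nodes >= replication_factor

# 3-node cluster with RF 3: losing one node blocks writes.
# 5-node cluster with RF 3: two nodes can fail and writes continue.
```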

what do we mean by hash slot in redis cluster?

I have read redis-cluster documents but couldn't get the gist of it. Can someone help me understand it from the basics?
Redis Cluster does not use consistent hashing, but a different form of
sharding where every key is conceptually part of what we call an hash
slot.
Hash slots are defined by Redis so that data can be mapped to different nodes in the cluster. The 16,384 slots are divided up and distributed across the nodes.
For example, in a 3-node cluster, one node can hold slots 0 to 5460, the next 5461 to 10922, and the third 10923 to 16383. The input key (or a part of it) is hashed (run through a hash function) to determine a slot number, and hence the node the key belongs to.
Think of it as logical shards. So redis has 16384 logical shards and these logical shards are mapped to the available physical machines in the cluster.
Mapping may look something like:
0-1000 : Machine 1
1001-2000 : Machine 2
2001-3000 : Machine 3
...
...
When redis gets a key, it does the following:
Calculate hash(key) % 16384 -> this finds the logical shard where the given key belongs; let's say it comes to 1500.
Look at the logical-shard-to-physical-machine mapping to identify the physical machine. From the mapping above, logical shard 1500 is served by Machine 2, so the request is routed to physical machine #2.
You can take "slot" in its literal sense, just like slots in the real world.
Every key belongs to a certain slot according to a fixed rule, and each slot is assigned to a certain Redis node by configuration.

Are redis hashes kept in ziplist after changing hash-max-ziplist-entries?

I'm running a redis instance where I have stored a lot of hashes with integer fields and values. Specifically, there are many hashes of the form
{1: <int>, 2: <int>, ..., ~10000: <int>}
I was initially running redis with the default values for hash-max-ziplist-entries:
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
and redis was using approximately 3.2 GB of memory.
I then changed these values to
hash-max-ziplist-entries 10240
hash-max-ziplist-value 10000
and restarted redis. My memory usage went down to approximately 480 MB, but redis was using 100% CPU. I reverted the values back to 512 and 64, and restarted redis, but it was still only using 480 MB of memory.
I assume that the memory usage went down because a lot of my hashes were stored as ziplists. I would have guessed that after changing the values and restarting redis they would automatically be converted back into hash tables, but this doesn't appear to be the case.
So, are these hashes still being stored as a ziplist?
They are still in optimized "ziplist" format.
Redis will store a hash (created via "hset" or similar) in the optimized way only if the hash ends up with no more than hash-max-ziplist-entries entries and all of its fields and values are smaller than hash-max-ziplist-value bytes.
If either limit is exceeded, Redis stores the item "normally", i.e. not optimized.
Relevant section in documentation (http://redis.io/topics/memory-optimization):
If a specially encoded value will overflow the configured max size, Redis will automatically convert it into normal encoding.
Once the values are written in an optimized way, they are not "unpacked", even if you lower the max size settings later. The settings will apply to new keys that Redis stores.
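The encoding decision for newly written hashes can be sketched as follows (a simplification of the rule above, not Redis's C implementation; real Redis checks field and value byte lengths against hash-max-ziplist-value):

```python
def uses_compact_encoding(hash_fields: dict,
                          max_entries: int = 512,
                          max_value: int = 64) -> bool:
    # A hash is kept in the compact ziplist encoding only while BOTH
    # limits hold; exceeding either converts it (one way) into a
    # regular hash table.
    if len(hash_fields) > max_entries:
        return False
    return all(len(str(f)) <= max_value and len(str(v)) <= max_value
               for f, v in hash_fields.items())
```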

Memory utilization in redis for each database

Redis allows storing data in 16 different 'databases' (0 to 15). Is there a way to get utilized memory & disk space per database. INFO command only lists number of keys per database.
No, you cannot measure each database individually. These "databases" are just for logical partitioning of your data.
What you can do (depending on your specific requirements and setup) is spin up multiple Redis instances, each doing a different task and each with its own redis.conf file and memory cap. Disk space can't be capped, though, at least not at the Redis level.
Side note: bear in mind that the number of databases (16) is not hardcoded - you can set it in redis.conf.
I did it by calling DUMP on all the keys in a Redis DB and measuring the total number of bytes used. This will slow down your server and take a while. The size DUMP returns seems to be about 4 times smaller than the actual memory use, but these numbers will give you an idea of which DB is using the most space.
Here's my code:
https://gist.github.com/mathieulongtin/fa2efceb7b546cbb6626ee899e2cfa0b