Optimizing Redis cluster nodes

I understand that in a Redis cluster there are 16384 slots in total, distributed across the nodes. So if I have keys like entity:user:userID (e.g. user:1234) whose values are serialized user objects, and my application has 500k+ users, the keys should be distributed evenly across the slots. We currently have 6 nodes in total (3 masters and 3 slaves), and we keep wondering when we should add 2 more nodes for a total of 8. We also persist the cached data to disk, and we sometimes get latency warnings during persistence. I'd assume that with more nodes there is less data for each node to persist, and thus better performance and resource usage. But aside from disk I/O, is there a clear-cut performance measurement that tells us when we should start adding nodes?
Thanks!

If your limiting factor is disk I/O for replication, using SSDs can drastically improve performance.
Two additional signs that it is time to scale out include server load and used memory for your nodes. There are others, but these two are simple to reason about.
If your limiting factor is processing power on the nodes (e.g. server load) because of a natural increase in requests, scaling out will help distribute the load across more nodes. If one node is consistently higher than the others, this could indicate a hot partition, which is a separate problem to solve.
If your limiting factor is total storage capacity (e.g. used memory) because of a natural increase in data stored in your cache, scaling out will help grow the total storage capacity of your cluster. If you have a very large dataset and the set of keys used on a regular basis is small, technologies such as Redis on Flash by Redis Labs may be applicable.
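To make the "used memory" signal concrete, here is a minimal monitoring sketch. It assumes the redis-py client, and the master addresses are placeholders for your own nodes; it simply polls INFO on each master and reports how close each one is to its maxmemory limit. Watching these numbers over time tells you whether it is data growth or load growth that will force the next scale-out.

    import redis  # assumes the redis-py client is installed

    # Placeholder master addresses; substitute your own cluster's masters.
    MASTERS = [("10.0.0.1", 6379), ("10.0.0.2", 6379), ("10.0.0.3", 6379)]

    for host, port in MASTERS:
        info = redis.Redis(host=host, port=port).info("memory")
        used = info["used_memory"]
        maxmem = info.get("maxmemory", 0)  # 0 means no explicit limit configured
        pct = used / maxmem * 100 if maxmem else float("nan")
        print(f"{host}:{port} used={used / 2**20:.0f} MiB "
              f"({pct:.1f}% of maxmemory)")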

Related

Redis vs Aerospike use cases?

After going through a couple of resources on Google and Stack Overflow (mentioned below), I have a high-level understanding of when to use which, but I still have a couple of questions.
My Understanding :
1. When used as pure in-memory databases, both have comparable performance. But for big data, where the complete dataset cannot fit in memory (or can only fit at increased cost), Aerospike (AS) can be a good fit, because it provides a mode where the indexes are kept in memory and the data on SSD. I believe performance will be somewhat degraded compared to a fully in-memory database (though the way AS handles reads/writes from SSD makes it faster than traditional disk I/O), but it saves cost and performs better than keeping the complete dataset on disk. So when the complete dataset fits in memory both can be equally good, but when memory is a constraint AS can be the better fit. Is that right?
2. It is also said that AS provides a rich and easy-to-set-up clustering feature, whereas some clustering features in Redis need to be handled at the application level. Does that still hold, or was it only true until a couple of years ago (I believe so, as I see Redis now also provides a clustering feature)?
How is aerospike different from other key-value nosql databases?
What are the use cases where Redis is preferred to Aerospike?
Your assumption in (1) is off, because it applies to (mostly) synthetic situations where all data fits in memory. What happens when you have a system that grows to many terabytes or even petabytes of data? Would you want to try and fit that data in a very expensive, hard to manage fully in-memory system containing many nodes? A modern machine can store a lot more SSD/NVMe drives than memory. If you look at the new i3en instance family type from Amazon EC2, the i3en.24xl has 768G of RAM and 60TB of NVMe storage (8 x 7.5TB). That kind of machine works very well with Aerospike, as it only stores the indexes in memory. A very large amount of data can be stored on a small cluster of such dense nodes, and perform exceptionally well.
Aerospike is used in the real world in clusters that have grown to hundreds of terabytes or even petabytes of data (tens to hundreds of billions of objects), serving millions of operations per-second, and still hitting sub-millisecond to single digit millisecond latencies. See https://www.aerospike.com/summit/ for several talks on that topic.
Another aspect affecting (1) is the fact that the performance of a single instance of Redis is misleading if, in reality, you'll be deploying multiple servers, each with multiple instances of Redis on them. Redis isn't a distributed database the way Aerospike is - it requires application-side sharding (which becomes a bit of a clustering and horizontal-scaling nightmare) or a separate proxy, which often ends up being the bottleneck. It's great that a single shard can do a million operations per second, but if the proxy can't handle the combined throughput, and competes with shards for CPU and memory, there's more to the performance-at-scale picture than just in-memory versus data on SSD.
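For readers who haven't run Redis this way, the sketch below illustrates what application-side sharding means in its simplest form. It assumes the redis-py client, the instance addresses are placeholders, and the modulo routing is deliberately naive - real deployments use consistent hashing or a proxy precisely because of the resharding pain described above.

    import zlib
    import redis  # assumes the redis-py client is installed

    # Placeholder standalone Redis instances acting as application-managed shards.
    SHARDS = [
        redis.Redis(host="10.0.1.1", port=6379),
        redis.Redis(host="10.0.1.2", port=6379),
        redis.Redis(host="10.0.1.3", port=6379),
    ]

    def shard_for(key: str) -> redis.Redis:
        # Naive modulo sharding: adding or removing a shard remaps most keys,
        # which is exactly why this approach gets painful as you scale.
        return SHARDS[zlib.crc32(key.encode()) % len(SHARDS)]

    shard_for("user:1234").set("user:1234", "...serialized user...")
    value = shard_for("user:1234").get("user:1234")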
Unless you're looking at a tiny amount of objects or a small amount of data that isn't likely to grow, you should probably compare the two for yourself with a proof-of-concept test.

InfluxDB max available expiration and performance concerns

My metrics are built on InfluxDB. I want to keep the data forever, therefore my retention policy is set to inf and my shard group duration is set to 100 years (the maximum I could set).
My main concern is degrading performance from keeping all this data. My series count will not exceed 100,000 (as advised for low server specs).
Will there be an impact on memory use from indexing? More specifically, on the memory used by InfluxDB regardless of whether any actions such as queries/continuous queries are issued?
Also, in case there is a performance problem, is it possible to back up only the data that is bound to be deleted?
Based on the InfluxDB hardware sizing guidelines, in a moderate-load situation a single-node InfluxDB deployed on a server with these specifications (CPU: 6 cores, RAM: 8-32 GB) can handle around 250k writes per second and about 25 queries per second. These numbers will definitely meet your requirements. Also, by increasing CPU and RAM you can achieve better performance.
Note: if the scale of your work grows in the future, you can also use a continuous query to down-sample old data, or export part of the data to a backup file.
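As a rough sketch of that down-sampling route, assuming InfluxDB 1.x with InfluxQL and the influxdb Python client (the database, retention policy and measurement names below are made-up placeholders):

    from influxdb import InfluxDBClient  # assumes the influxdb client package

    client = InfluxDBClient(host="localhost", port=8086, database="metrics")

    # A separate retention policy that keeps the down-sampled data forever.
    client.query(
        'CREATE RETENTION POLICY "forever_1h" ON "metrics" '
        'DURATION INF REPLICATION 1'
    )

    # Roll raw points up into hourly means so the long-lived series stay small.
    client.query(
        'CREATE CONTINUOUS QUERY "cq_hourly" ON "metrics" BEGIN '
        'SELECT mean("value") AS "value" '
        'INTO "metrics"."forever_1h"."cpu_usage" '
        'FROM "cpu_usage" GROUP BY time(1h), * '
        'END'
    )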

Why are hotspots on my redis cluster bad?

I have a Redis cluster and I am planning to add keys which I know will have a much heavier read/update frequency than other keys. I assume this might cause hotspots on my cluster. Why is this bad, and how can I avoid it?
A hotspot on keys is OK if those keys can be sharded to different Redis nodes. But a hotspot on particular Redis nodes/machines is bad, as the memory/CPU load of those machines will be quite heavy while other nodes are not used efficiently.
If you know exactly what these keys are, you can calculate their slots yourself ahead of time, using CRC16 of the key modulo 16384.
You can then distribute those slots across different Redis nodes.
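A sketch of that calculation (Redis Cluster uses the XMODEM/CCITT CRC16 variant and honours {hash tags}; the keys below are just examples):

    # Compute a key's hash slot the way Redis Cluster does:
    # CRC16 (XMODEM/CCITT variant) of the key, modulo 16384, honouring {hash tags}.
    def crc16(data: bytes) -> int:
        crc = 0
        for byte in data:
            crc ^= byte << 8
            for _ in range(8):
                crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
                crc &= 0xFFFF
        return crc

    def hash_slot(key: str) -> int:
        # If the key contains a non-empty {...} section, only that part is hashed,
        # which lets you force related keys into the same slot.
        start = key.find("{")
        if start != -1:
            end = key.find("}", start + 1)
            if end > start + 1:
                key = key[start + 1:end]
        return crc16(key.encode()) % 16384

    print(hash_slot("user:1234"))         # a plain key
    print(hash_slot("{user:1234}:cart"))  # shares the slot of "user:1234"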
Whether or not items will cause a hot spot on a particular node or nodes depends on a bunch of factors. As already mentioned, hotspotting on a single key isn't necessarily a problem if the overall cluster traffic is still relatively even and the node that key lives on isn't taxed. If each of your cluster nodes is handling 1000 commands/sec and on one of those nodes all of the commands relate to a single key, it's not really going to matter: all of the commands are processed serially on a single thread anyway, so it's all the same.
However, if you have 5 nodes, all of which are handling 1000 commands/sec, but you add a new key to one node which makes that single node incur another 3000 commands/sec, one of your 5 nodes is now handling 50% of the processing. This means it's going to take longer for that node to handle all of its normal 1000 commands/sec, and 1/5 of your traffic is now arbitrarily much slower.
Part of the general idea of distributing/sharding a database is not only to increase storage capacity but to balance work as well. Unbalancing that work will end up unbalancing your application's performance. It'll either cause 1/Nth of your request load to be arbitrarily slower due to accessing your bogged-down node, or it'll increase processing time across the board if your requests potentially access multiple nodes. Evenly spreading load gives an application better capacity to handle higher load without adversely affecting performance.
But there's also the practical consideration of whether the access to your new keys is proportionally large relative to your ongoing traffic. If your cluster is handling 1000+ commands/sec/node and a single key will add 10 req/sec to one particular node, you'll probably be just fine either way.
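If you want to see whether this is already happening, a quick check is to compare command rates across the masters. A minimal sketch, assuming the redis-py client and placeholder master addresses:

    import redis  # assumes the redis-py client is installed

    MASTERS = [("10.0.0.1", 6379), ("10.0.0.2", 6379), ("10.0.0.3", 6379)]

    # Sample each master's current command rate from INFO stats.
    rates = {
        f"{h}:{p}": redis.Redis(host=h, port=p).info("stats")["instantaneous_ops_per_sec"]
        for h, p in MASTERS
    }
    avg = sum(rates.values()) / len(rates)
    for node, ops in rates.items():
        flag = "  <-- possible hot node" if avg and ops > 2 * avg else ""
        print(f"{node}: {ops} ops/sec{flag}")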

Why does Redis Cluster only have 16384 slots?

In my opinion, as the number of keys grows, 'hash conflicts' will occur more and more frequently. If keys mapped to the same slot are stored in a singly linked list, won't read performance be affected, especially for older records?
The answer from antirez, the author of Redis, is below.
The reason is:
Normal heartbeat packets carry the full configuration of a node, which can be replaced in an idempotent way with the old one in order to update an old config. This means they contain the slots configuration for a node, in raw form, which uses 2k of space with 16k slots, but would use a prohibitive 8k of space with 65k slots.
At the same time it is unlikely that Redis Cluster would scale to more than 1000 master nodes because of other design tradeoffs.
So 16k was in the right range to ensure enough slots per master with a max of 1000 masters, but a small enough number to propagate the slot configuration as a raw bitmap easily. Note that in small clusters the bitmap would be hard to compress, because when N is small the bitmap would have slots/N bits set, which is a large percentage of bits set.
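The 2k/8k figures are just the size of that bitmap, at one bit per slot:

    # The arithmetic behind antirez's numbers: one bit per slot in each heartbeat.
    for slots in (16_384, 65_536):
        print(f"{slots} slots -> {slots // 8} bytes ({slots // 8 // 1024} KB) per heartbeat")
    # 16384 slots -> 2048 bytes (2 KB)
    # 65536 slots -> 8192 bytes (8 KB)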
These "slots" are merely a unit of distribution across shards. You're not going to have of 16K shards servers in a cluster; but the are granular enough to allow some degree of weighted load distribution. (For example if you start with four shard on one type of hardware and choose to introduce two more of a more power profile, you could make the new servers targets for twice as many slots as the existing servers and thus achieve a more even relatively utilization of your capacity.
I'm just summarizing the gist of how they're used. For details read the Redis Cluster Specification.
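A worked version of that weighting example (the server names are hypothetical; in practice you would assign the resulting slot ranges with the cluster resharding tooling):

    # Four existing servers plus two new, more powerful ones that each take
    # twice as many slots.
    TOTAL_SLOTS = 16_384
    weights = {"old-1": 1, "old-2": 1, "old-3": 1, "old-4": 1, "new-1": 2, "new-2": 2}

    total_weight = sum(weights.values())  # 8
    for server, w in weights.items():
        print(f"{server}: {TOTAL_SLOTS * w // total_weight} slots")
    # old-* get 2048 slots each, new-* get 4096 each (16384 in total).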

Solandra Sharding: Insider Thoughts

I just got started on Solandra and was trying to understand the second-level details of Solandra sharding.
AFAIK Solandra creates the configured number of shards (the "solandra.shards.at.once" property), where each shard holds up to "solandra.maximum.docs.per.shard" documents. At the next level it creates slots inside each shard, the number of which is "solandra.maximum.docs.per.shard" / "solandra.index.id.reserve.size".
From the data model of the SchemaInfo CF, I understood that inside a particular shard there are slots owned by different physical nodes, and there is a race between nodes to claim these slots.
My questions are:
1. If I request a write on a particular Solr node, e.g. .....solandra/abc/dataimport?command=full-import, does this request get distributed to all possible nodes? Is this a distributed write? Because until that happens, how would other nodes be competing for slots inside a particular shard? Ideally the code for writing a doc or a set of docs would be executed on a single physical JVM.
2. By sharding we tried to write some docs on a single physical node, but if writes are placed based on slots owned by different physical nodes, what did we actually achieve, since we again need to fetch results from different nodes? I understand that the write throughput is maximized.
3. Can we look into tuning these numbers: "solandra.maximum.docs.per.shard", "solandra.index.id.reserve.size", "solandra.shards.at.once"?
4. If I have just one shard and a replication factor of 5 in a single-DC 6-node setup, I saw that the endpoints of this shard contain 5 endpoints, as per the replication factor. But what happens to the 6th one? I saw through nodetool that the remaining 6th node doesn't really get any data. If I increase the replication factor to 6 while keeping the cluster on, will that solve the problem (along with running repair etc.), or is there a better way?
Overall, the shards.at.once param is used to control the parallelism of indexing. The higher that number, the more shards are written to at once. If you set it to one, you will always be writing to only one shard. Normally this should be set ~20% higher than the number of nodes in the cluster; so for a four-node cluster, set it to five.
The higher the reserve size, the less coordination between the nodes is needed. So if you know you have lots of documents to write, raise this.
The higher the docs.per.shard, the slower the queries for a given shard will become. In general this should be 1-5M max.
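Put together, and assuming these settings live in Solandra's properties file, a four-node cluster following that advice might look roughly like the sketch below (the values are illustrative, not recommendations tuned for your data):

    # Illustrative values only, using the property names discussed above.

    # ~20% above the node count: four nodes -> five
    solandra.shards.at.once=5

    # keep per-shard queries fast; the advice above suggests 1-5M max
    solandra.maximum.docs.per.shard=1000000

    # a larger reserve means less coordination between nodes during indexing
    solandra.index.id.reserve.size=100000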
To answer your points:
1. This will only import from one node, but it will index across many, depending on shards.at.once.
2. I think the question is: should you write across all nodes? Yes.
3. Yes, see above.
4. If you increase shards.at.once, this will be populated quickly.