I am looking at Redis to provide an intermediate cache store with a lot of computation around set operations such as intersection and union.
I have looked at the Redis website and found that Redis is not designed for multi-core CPUs. My question is: why is that so?
Also, if so, how can we achieve 100% utilization of CPU resources with Redis on a multi-core CPU?
I have looked at the Redis website and found that Redis is not designed for multi-core CPUs. My question is: why is that so?
It is a design decision.
Redis is single-threaded with epoll/kqueue and scales indefinitely in terms of I/O concurrency. -- antirez (creator of Redis)
A reason for choosing an event-driven approach is that synchronization between threads comes at a cost at both the software level (code complexity) and the hardware level (context switching). Add to this that the bottleneck of Redis is usually the network or memory, not the CPU. On the other hand, a single-threaded architecture has its own benefits (for example, the guarantee of atomicity).
Therefore event loops seem like a good design for an efficient & scalable system like Redis.
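To make the atomicity point concrete with the set operations from the question: a command such as SINTERSTORE runs as a single step inside the event loop, so no other client can ever observe a half-computed result, and no locks are needed. A minimal sketch using the redis-py client (host, port, and key names are illustrative):

    import redis  # pip install redis

    r = redis.Redis(host="localhost", port=6379)  # assumed local instance

    # Build two sets server-side.
    r.sadd("likes:rock", "alice", "bob", "carol")
    r.sadd("likes:jazz", "bob", "carol", "dave")

    # Intersection and union are computed by the server in one atomic command,
    # serialized with everything else by the single-threaded event loop.
    print(r.sinter("likes:rock", "likes:jazz"))   # {b'bob', b'carol'}
    print(r.sunion("likes:rock", "likes:jazz"))   # all five members

    # SINTERSTORE materializes the intersection under a new key, atomically.
    r.sinterstore("likes:both", "likes:rock", "likes:jazz")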
Also, if so, how can we achieve 100% utilization of CPU resources with Redis on a multi-core CPU?
The Redis approach to scaling over multiple cores is sharding, most often together with Twemproxy.
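If you do not want to run a proxy, the same idea can be sketched client-side. The snippet below (illustrative ports, naive modulo hashing rather than the consistent hashing a real proxy like Twemproxy uses) only shows how keys spread over several single-threaded instances, one per core; note that multi-key commands such as SINTER stop working transparently across shards:

    import zlib
    import redis  # pip install redis

    # One single-threaded Redis instance per core (ports are illustrative).
    shards = [redis.Redis(port=p) for p in (6379, 6380, 6381, 6382)]

    def shard_for(key: str) -> redis.Redis:
        # Naive stable hashing; real setups prefer consistent hashing
        # so that resharding moves fewer keys.
        return shards[zlib.crc32(key.encode()) % len(shards)]

    shard_for("user:42").set("user:42", "alice")
    print(shard_for("user:42").get("user:42"))  # b'alice'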
However, if for some reason you still want to use a multi-threaded approach, take a look at Thredis, but make sure you understand the implications of what its author did (you cannot use it as a replication master, for instance).
The Redis server is single-threaded, but you can still achieve 100% utilization of CPU resources by running multiple Redis nodes (masters and/or slaves).
Read operations can be scaled using a Redis master/slave configuration with a single master: one CPU core is used for the master node and all the others for slaves.
Write operations can be scaled using a Redis multi-master (cluster) configuration: multiple CPU cores are used for master nodes and the rest for slaves.
Redisson is a Redis Java client that provides full support for Redis Cluster. It works with AWS ElastiCache and Azure Redis Cache, and includes master/slave discovery and topology updates.
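As a rough illustration of the read-scaling idea (hostnames are made up; in practice a client such as Redisson, or redis-py's Sentinel support, discovers the topology for you):

    import random
    import redis  # pip install redis

    master = redis.Redis(host="redis-master")        # assumed hostname
    replicas = [redis.Redis(host=h)                  # assumed hostnames
                for h in ("redis-replica-1", "redis-replica-2")]

    def write(key, value):
        # All writes go to the single master (one core).
        master.set(key, value)

    def read(key):
        # Reads are spread across the replicas, each on its own core/host.
        # Replication is asynchronous, so a read may be slightly stale.
        return random.choice(replicas).get(key)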
Related
We have a lot of Redis instances, consuming terabytes of memory across hundreds of machines.
As our business activity goes up and down, some Redis instances are just not used that frequently any more -- they are "unpopular" or "cold". But Redis stores everything in memory, so a lot of infrequently accessed data that could live on cheap disk is occupying expensive memory.
We are exploring a way to reclaim the memory from these unpopular/cold Redis instances, so as to reduce our machine usage.
We cannot delete data, nor can we migrate to another database. Is there some way to achieve our goal?
PS: We are thinking of some Redis-compatible product that can "mix" memory and disk, i.e. store hot data in memory but cold data on disk, while using limited resources. We know about Redis Labs' "Redis on Flash" (ROF) solution, but it uses RocksDB, which is not very memory friendly. What we want is a very memory-restrained product. Besides, ROF is not open source :(
Thanks in advance!
ElastiCache Redis now supports data tiering. Data tiering provides a new cost optimal option for storing data in Redis by utilizing lower-cost local NVMe SSDs in each cluster node in addition to storing data in memory. It is ideal for workloads that access up to 20 percent of their overall dataset regularly, and for applications that can tolerate additional latency when accessing data on SSD. More details about data tiering can be found here.
Your problem might be solved by using an orchestrator approach: scale down when not in use, scale up when in demand.
Implementation depends a lot on your infrastructure, but a base requirement is proper monitoring of Redis instance usage (a minimal sketch of that monitoring step follows this answer).
Based on that, if you are running on Kubernetes, you can leverage pod autoscaling.
Otherwise you can implement Consul and use HAProxy to handle the shutdown/spin-up logic. A starting point for that strategy is this article.
Of course, Reiner's idea of using swap is a quick win if it works as intended!
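To make the monitoring requirement concrete, here is a minimal sketch that flags cold instances from INFO statistics (the host list and the ops/sec threshold are assumptions; the actual scale-down is left to your orchestrator):

    import redis  # pip install redis

    HOSTS = ["redis-a", "redis-b", "redis-c"]  # assumed instance hostnames
    OPS_THRESHOLD = 5                          # assumed "cold" cutoff, ops/sec

    for host in HOSTS:
        info = redis.Redis(host=host).info()
        ops = info["instantaneous_ops_per_sec"]  # standard INFO field
        mem = info["used_memory_human"]          # standard INFO field
        if ops < OPS_THRESHOLD:
            print(f"{host}: cold ({ops} ops/sec, {mem} used) -> scale-down candidate")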
Background:
In our team, we use ZooKeeper to react when configuration changes.
Question:
1. How many nodes can ZooKeeper support?
2. What factors affect the number of nodes ZooKeeper can support: CPU, disk, or network?
ZooKeeper can support as many nodes as you wish. It is a distributed coordination service whose capacity is not primarily determined by how many nodes are in the network; what matters is how many reads/writes the application performs. ZooKeeper is best suited to read-heavy workloads, although it handles writes well too.
ZooKeeper is not particularly sensitive to CPU or disk. The underlying consensus protocol in ZooKeeper is ZAB, which provides the C and P of the CAP theorem: network partitions are tolerated while consistency is maintained through the atomic broadcast protocol.
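For the configuration-change use case from the background, the load ZooKeeper sees is mostly reads and watches, which is exactly its read-optimized sweet spot; only the writes that update the config go through ZAB. A minimal sketch with the kazoo Python client (connection string and znode path are illustrative):

    from kazoo.client import KazooClient  # pip install kazoo

    zk = KazooClient(hosts="127.0.0.1:2181")  # assumed ensemble address
    zk.start()

    # Re-invoked every time the znode's data changes: a cheap watched read.
    @zk.DataWatch("/app/config")
    def on_config_change(data, stat):
        if data is not None:
            print("config version", stat.version, "->", data.decode())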
I am planning to set up Redis in a master/slave configuration.
I have three machines (8 GB RAM, 8 cores each) and plan to use one master and two slaves.
What would be the recommended hardware configuration for these machines?
Redis is not CPU intensive, so you should get at least 2 cores per server (one for Redis, one for backups, maybe one more to do basic stuff on the server?); more is not really relevant, since Redis is single-threaded.
Get as much RAM as you can, as it defines the size of your store. Also, making a dump consumes RAM (the background save works on a fork of the process, so written-to pages are duplicated), so your true usable space is less than you might think. Monitor your RAM usage to prevent surprises; a sketch follows.
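A minimal sketch of that monitoring via redis-py (the local instance and the 80% alert threshold are assumptions):

    import redis  # pip install redis

    r = redis.Redis()  # assumed local instance
    info = r.info("memory")

    used = info["used_memory"]        # bytes allocated by Redis itself
    peak = info["used_memory_peak"]   # high-water mark, e.g. during a dump
    rss = info["used_memory_rss"]     # what the OS sees, incl. fragmentation

    total = 8 * 1024**3  # the 8 GB machines from the question
    if peak > 0.8 * total:  # arbitrary 80% threshold
        print(f"warning: peak {peak} bytes of {total}; leave headroom for dumps")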
Regarding the RAM type: if the RAM fails, Redis fails, sometimes silently (consistency broken). If you need to be careful with your data, always use ECC RAM; it is expensive, but probably less expensive than broken data in RAM causing unknown effects through Redis. Redis has no built-in checks against hardware errors from RAM, and even though such errors are quite rare (less likely than a broken hard drive), they do happen.
I know this is a very generic question, but I wanted to understand the major architectural decisions that allow Redis (or caches like Memcached or Cassandra) to work at amazing performance limits.
How are connections maintained?
Are connections TCP or HTTP?
I know that it is completely written in C. How is the memory managed?
What are the synchronization techniques used to achieve high throughput in spite of competing reads/writes?
Basically, what is the difference between a plain vanilla implementation of a machine with an in-memory cache and a server that can respond to commands, and a Redis box? I also understand that a complete answer would be huge and would include very complex details. But what I'm looking for are some general techniques used, rather than all the nuances.
There is a wealth of information in the Redis documentation to understand how it works. Now, to answer your questions specifically:
1) How are connections maintained?
Connections are maintained and managed using the ae event loop (designed by the Redis author). All network I/O operations are non-blocking. You can see ae as a minimalistic implementation using the best network I/O demultiplexing mechanism of the platform (epoll for Linux, kqueue for BSD, etc.), just like libevent, libev, libuv, etc.
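To illustrate the pattern (this is not Redis's actual ae code, just the same idea in Python), the standard selectors module likewise picks the best demultiplexer the platform offers, epoll on Linux or kqueue on BSD:

    import selectors
    import socket

    sel = selectors.DefaultSelector()  # epoll/kqueue/... chosen per platform

    server = socket.socket()
    server.bind(("localhost", 7777))   # illustrative port
    server.listen()
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ)

    while True:
        # One thread, one loop: sleep until some socket is ready, then handle
        # every ready event without ever blocking on a single client.
        for key, _ in sel.select():
            if key.fileobj is server:
                conn, _ = server.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ)
            else:
                data = key.fileobj.recv(4096)
                if data:
                    key.fileobj.sendall(b"+PONG\r\n")  # toy fixed reply
                else:
                    sel.unregister(key.fileobj)
                    key.fileobj.close()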
2) Are connections TCP or HTTP?
Connections are TCP, using the Redis protocol: a simple, telnet-compatible, text-oriented protocol that also supports binary data. This protocol is typically more efficient than HTTP.
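You can see the telnet-compatible flavor for yourself over a raw socket (a local instance is assumed; the inline form used here is the simple variant, real clients send the length-prefixed form):

    import socket

    s = socket.create_connection(("localhost", 6379))  # assumed local Redis

    s.sendall(b"PING\r\n")             # inline command, exactly what telnet sends
    print(s.recv(64))                  # b'+PONG\r\n'

    s.sendall(b"SET greeting hello\r\n")
    print(s.recv(64))                  # b'+OK\r\n'

    s.sendall(b"GET greeting\r\n")
    print(s.recv(64))                  # b'$5\r\nhello\r\n' (length-prefixed reply)
    s.close()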
3) How is the memory managed?
Memory is managed by relying on a general purpose memory allocator. On some platforms, this is actually the system memory allocator. On some other platforms (including Linux), jemalloc has been selected since it offers a good balance between CPU consumption, concurrency support, fragmentation and memory footprint. jemalloc source code is part of the Redis distribution.
Contrary to other products (such as memcached), there is no implementation of a slab allocator in Redis.
A number of optimized data structures have been implemented on top of the general purpose allocator to reduce the memory footprint.
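These specialized structures are observable from a client. The exact encoding names vary across Redis versions (intset/ziplist historically, listpack in newer releases), so treat the outputs below as illustrative:

    import redis  # pip install redis

    r = redis.Redis()  # assumed local instance

    # A small set of integers is stored as a compact sorted array of ints,
    # not a full hash table.
    r.delete("small")
    r.sadd("small", 1, 2, 3)
    print(r.object("encoding", "small"))   # b'intset'

    # Past the configured thresholds, Redis silently converts the set
    # to a general-purpose hash table.
    r.sadd("small", *range(10000))
    print(r.object("encoding", "small"))   # b'hashtable'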
4) What are the synchronization techniques used to achieve high throughput in spite of competing reads/writes?
Redis is a single-threaded event loop, so there is no synchronization to be done since all commands are serialized. Now, some threads also run in the background for internal purposes. In the rare cases they access the data managed by the main thread, classical pthread synchronization primitives are used (mutexes for instance). But 100% of the data accesses made on behalf of multiple client connections do not require any synchronization.
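A small demonstration of that serialization (local instance assumed): several client threads hammer INCR concurrently, yet the final count is exact with no application-side locking, because each INCR runs to completion inside the single event loop:

    import threading
    import redis  # pip install redis

    r = redis.Redis()  # assumed local instance; redis-py pools connections
    r.set("counter", 0)

    def bump(n):
        for _ in range(n):
            r.incr("counter")  # atomic read-modify-write on the server

    threads = [threading.Thread(target=bump, args=(1000,)) for _ in range(8)]
    for t in threads: t.start()
    for t in threads: t.join()

    print(int(r.get("counter")))  # always exactly 8000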
You can find more information here:
Redis is single-threaded, then how does it do concurrent I/O?
What is the difference between a plain vanilla implementation of a machine with an in-memory cache and a server that can respond to commands, and a Redis box?
There is no difference. Redis is a plain vanilla implementation of a machine with an in-memory cache and a server that can respond to commands. But it is an implementation done right:
using the single threaded event loop model
using simple and minimalistic data structures optimized for their corresponding use cases
offering a set of commands carefully chosen to balance minimalism and usefulness
constantly targeting the best raw performance
well adapted to modern OS mechanisms
providing multiple persistence mechanisms, because the "one size fits all" approach is only a dream.
providing the building blocks for HA mechanisms (replication system for instance)
avoiding stacking up useless abstraction layers like pancakes
resulting in a clean and understandable code base that any good C developer can be comfortable with
I'm looking for a distributed key/value store that supports a balanced load of reads and writes.
Necessary Features:
Get, Set, Incr
Disk backed
Blazingly fast (i.e. eventual consistency is OK)
High availability (i.e. rebalancing load upon node failures)
Nice to have Features:
Overflow to disk (Assuming the load has nice locality properties)
Platform-agnostic (e.g. Java-based)
Because a lot of the distributed caching solutions support get/set but not incr, it looks like the only option that fits the requirements is Terracotta (though Redis has a cluster model in its unstable branch).
Any Suggestions?
I can speak specifically for Redis.
Necessary Features:
Yes, with support also for other advanced data structures such as hashes, (sorted) sets, and lists (see the short sketch after this list).
Yes, by default Redis saves snapshots of the data set on disk.
Yes.
Rebalancing load upon node failures is more a matter of partition tolerance than high availability in terms of the CAP theorem. Redis supports replication, and Redis Cluster is in development.
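To be concrete about the get/set/incr requirement (local instance assumed):

    import redis  # pip install redis

    r = redis.Redis()  # assumed local instance

    r.set("page:views", 0)
    r.incr("page:views")         # atomic server-side increment
    r.incrby("page:views", 10)   # atomic increment by an arbitrary amount
    print(r.get("page:views"))   # b'11'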
Nice to have Features:
Read the article about virtual memory.
Most POSIX systems.
Maybe you can also take a look at Membase or Couchbase Server.
Riak (http://www.basho.com/) will do this for you.