We have a lot of Redis instances, consuming terabytes of memory across hundreds of machines.
As our business activity goes up and down, some Redis instances are just not used that frequently anymore -- they are "unpopular" or "cold". But Redis stores everything in memory, so a lot of infrequently accessed data that could live on cheap disk is occupying expensive memory.
We are exploring ways to reclaim memory from these unpopular/cold Redis instances, so as to reduce our machine usage.
We cannot delete the data, nor can we migrate to another database. Is there some way to achieve our goals?
PS: We are thinking of some Redis-compatible product that can "mix" memory and disk, i.e. it stores hot data in memory but cold data on disk, while using limited resources. We know about Redis Labs' "Redis on Flash" (ROF) solution, but it uses RocksDB, which is very memory-unfriendly. What we want is a very memory-frugal product. Besides, ROF is not open source :(
Thanks in advance!
ElastiCache Redis now supports data tiering. Data tiering provides a new cost-effective option for storing data in Redis by using lower-cost local NVMe SSDs in each cluster node in addition to storing data in memory. It is ideal for workloads that regularly access up to 20 percent of their overall dataset, and for applications that can tolerate additional latency when accessing data on SSD. More details about data tiering can be found here.
Your problem might be solved with an orchestrator approach: scale down when not in use, scale up when in demand.
The implementation depends largely on your infrastructure, but a base requirement is proper monitoring of Redis instance usage.
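As a starting point, here is a minimal sketch of such a usage probe with redis-py (the host name and the "cold" threshold are made up for illustration):

```
import redis

r = redis.Redis(host="redis-cold-candidate")  # hypothetical host
info = r.info()
ops = info["instantaneous_ops_per_sec"]
used_mb = info["used_memory"] / 1024 / 1024

# Arbitrary threshold: a few ops/s on a big instance suggests a cold one
if ops < 5:
    print(f"likely cold: {ops} ops/s while holding {used_mb:.0f} MB")
```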
Based on that, if you are running on Kubernetes, you can leverage pod autoscaling.
Otherwise you can deploy Consul and use HAProxy to handle the shutdown/spin-up logic. A starting point for that strategy is this article.
Of course Reiner's idea of using swap is a quick win if it works as intended!
I have been learning etcd for a few hours, but a question suddenly came to me: Redis seems fully capable of covering the functions etcd provides, like key/value CRUD and watch, and Redis is very simple to use. So why do people choose etcd instead of Redis?
I googled a few posts, but none of them explained the reason.
Thanks!
Redis stores data in memory, which makes it very high performance but not very durable. If the Redis server dies, it's easy to lose data. Etcd stores data in files on disk, and performs an fsync across multiple nodes before acknowledging a write, to guarantee consistency, which makes it very durable but not very performant.
That's a good trade-off for Kubernetes, which uses etcd for cluster state and configuration, not user data. It would not be a good trade-off for something like user session data, which you might be using Redis for in your app, because there you need extremely fast response times and can tolerate a bit of data loss or inconsistency.
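If you do want to push Redis toward the durable end of that trade-off, here is a hedged sketch with redis-py; enabling the AOF with fsync-per-write narrows the gap on a single node, but is still not etcd's raft-replicated, multi-node fsync:

```
import redis

r = redis.Redis()
# Enable the append-only file and fsync on every write; this trades
# throughput for single-node durability.
r.config_set("appendonly", "yes")
r.config_set("appendfsync", "always")
```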
A major difference which is affecting my choice of one vs the other is:
etcd keeps the data index in RAM and the data store on disk
redis keeps both data index and data store in RAM
Theoretically, this means etcd ought to be a good fit for large data / small memory scenarios, where redis would require large RAM.
In practice, etcd's current behaviour is that it allocates some memory per transaction when data is accessed. Under heavy load, the memory footprint of the etcd server balloons without bound (it appears limited only by the rate of read requests), and the Go runtime eventually OOMs, killing the server.
In contrast, the redis design requires a virtual address space sized in relation to the dataset, or to the partition of the dataset stored locally.
Memory footprint examples
E.g., with redis, an 8GB dataset partition with an index size of 0.5GB requires 8.5GB of virtual address space (i.e., it could be handled with 1GB of RAM and 7.5GB of swap), but not less, and the requirement has an upper bound.
The same 8GB dataset, with etcd, would require only 0.5GB of virtual address space, but not less (i.e., it could be handled with 500MB of RAM and no swap), in theory. In practice, under high load, etcd's memory use is unbounded.
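A back-of-the-envelope check of those numbers (the figures are the illustrative ones above, not measurements):

```
dataset_gb = 8.0   # data store
index_gb = 0.5     # data index

redis_vas = dataset_gb + index_gb  # redis: index + data both in the address space
etcd_vas = index_gb                # etcd (theory): only the index is resident
print(f"redis: {redis_vas} GB of virtual address space")  # 8.5 GB
print(f"etcd:  {etcd_vas} GB, in theory")                 # 0.5 GB
```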
Other considerations
There are other considerations like data consistency, or supported languages, that have to be evaluated separately.
In my case, the language the server is written in is a factor, as I have in-house C expertise, but no Go expertise. This means I can maintain/diagnose/customize redis (written in C) in-house if needed, but cannot do the same with etcd (written in Go); I'd have to use it as released by the maintainers.
My conclusion
Unfortunately, the memory behaviour of etcd, whereby it needs to allocate memory to access the indexed data, negates the memory advantages it might have in theory, and the risk of an OOM crash under high load makes it unsuitable for applications that might experience unexpected usage spikes. (GitHub bug 14362, GitHub bug 14352, other OOM reports.)
Furthermore, the ability to customize the server in-house (ie, available C vs Go expertise) is a business consideration that weighs in redis's favour, in my case.
I am looking at Redis to provide me with intermediate cache storage involving a lot of computation around set operations like intersection and union.
I have looked at the Redis website and found that Redis is not designed for multi-core CPUs. My question is: why is that so?
Also, if so, how can we achieve 100% utilization of CPU resources with Redis on multi-core CPUs?
I have looked at the Redis website and found that Redis is not designed for multi-core CPUs. My question is: why is that so?
It is a design decision.
Redis is single-threaded with epoll/kqueue and scales indefinitely in terms of I/O concurrency. -- antirez (creator of Redis)
A reason for choosing an event-driven approach is that synchronization between threads comes at a cost at both the software level (code complexity) and the hardware level (context switching). Add to this that the bottleneck of Redis is usually the network or memory, not the CPU. On the other hand, a single-threaded architecture has its own benefits (for example, the guarantee of atomicity).
Therefore event loops seem like a good design for an efficient & scalable system like Redis.
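To make the pattern concrete, here is a toy single-threaded event loop built on Python's selectors module (which uses epoll/kqueue underneath, like Redis); it illustrates the design, it is not Redis code:

```
import selectors
import socket

sel = selectors.DefaultSelector()  # epoll/kqueue under the hood

def accept(server):
    conn, _ = server.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, handle)

def handle(conn):
    data = conn.recv(4096)
    if data:
        conn.sendall(data)  # echo back; no locks, no context switches
    else:
        sel.unregister(conn)
        conn.close()

server = socket.socket()
server.bind(("127.0.0.1", 7777))
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)

while True:
    for key, _ in sel.select():
        key.data(key.fileobj)  # dispatch to the registered callback
```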
Also, if so, how can we achieve 100% utilization of CPU resources with Redis on multi-core CPUs?
Redis's approach to scaling over multiple cores is sharding, mostly together with Twemproxy.
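Client-side sharding can be sketched in a few lines (Twemproxy does the equivalent at the proxy layer; the ports below are hypothetical):

```
import redis
from binascii import crc32

# Three independent single-threaded Redis processes, e.g. one per core
shards = [redis.Redis(port=p) for p in (6379, 6380, 6381)]

def shard_for(key: str) -> redis.Redis:
    # A stable hash maps each key to exactly one shard
    return shards[crc32(key.encode()) % len(shards)]

shard_for("user:42").set("user:42", "hello")
```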
However, if for some reason you still want to use a multi-threaded approach, take a look at Thredis, but make sure you understand the implications of what its author did (you cannot use it as a replication master, for instance).
The Redis server is single-threaded, but you can still achieve 100% utilization of CPU resources by running multiple Redis nodes (masters and/or slaves).
Read operations can be scaled using a Redis master/slave configuration with a single master: one CPU core serves the master node and all the others serve slaves.
Write operations can be scaled using a Redis multi-master cluster configuration: multiple CPU cores serve master nodes and the rest serve slaves.
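A hedged sketch of the read-scaling side with redis-py (host names are hypothetical; cluster-aware clients automate this routing):

```
import random
import redis

master = redis.Redis(host="redis-master")
replicas = [redis.Redis(host=h) for h in ("redis-replica-1", "redis-replica-2")]

master.set("counter", 0)                        # writes go to the single master
value = random.choice(replicas).get("counter")  # reads spread across replicas
```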
Redisson is a Redis Java client that provides full support for Redis cluster. It works with AWS ElastiCache and Azure Redis Cache, and includes master/slave discovery and topology updates.
Redis is "memory monster". Storing data as "compressed json string" minimizes memory usage.
Is there any built-in compression option in Redis?
Redis uses the lightweight LZF compressor at dump time, so this won't lessen memory consumption; Redis does not compress data in memory, it stores it as plain strings. You must deploy your own client-side compression code.
Lua scripting also provides a compression algorithm, but that branch is relatively new and therefore not advisable for production use.
No, there isn't any runtime compression option.
However, as dan-boa said, it might be a good idea to implement compression on the application side. Doing it that way saves CPU on the Redis server: your database server won't be burdened with the CPU time needed for compression.
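A minimal sketch of that application-side approach (the key name and payload are made up):

```
import gzip
import json
import redis

r = redis.Redis()

def set_json(key, obj):
    # Compress the serialized JSON before it ever reaches Redis
    r.set(key, gzip.compress(json.dumps(obj).encode()))

def get_json(key):
    raw = r.get(key)
    return None if raw is None else json.loads(gzip.decompress(raw))

set_json("user:1", {"name": "Ada", "visits": 42})
print(get_json("user:1"))
```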
In one of our Redis clusters we saved about 82% of memory (from circa 340GB to 60GB) by gzipping our JSON-based blobs. More thoughts about this and other ways of optimizing memory usage can be found in our article:
http://labs.octivi.com/how-we-cut-down-memory-usage-by-82/
Note: link moved to archive.org backup
I'm looking for a distributed key/value store that supports a balanced load of reads and writes.
Necessary Features:
Get, Set, Incr
Disk backed
Blazingly fast (i.e. eventual consistency is OK)
High availability (i.e. rebalancing load upon node failures)
Nice to have Features:
Overflow to disk (Assuming the load has nice locality properties)
Platform-agnostic (e.g. java based)
Because many of the distributed caching solutions support get/set but not incr, it looks like the only option that fits the requirements is Terracotta. (Though Redis has a cluster mode in its unstable branch.)
Any Suggestions?
I can speak mainly for Redis.
Necessary Features:
Yes, with support also for other advanced data structures like hashes, (sorted) sets, and lists; see the sketch after this list.
Yes, by default Redis saves snapshots of the data set on disk.
Yes.
Rebalancing load upon node failures is partition tolerance rather than high availability in terms of the CAP theorem. Redis supports replication, and cluster support is in development.
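To illustrate the first point, a quick sanity check of the Get/Set/Incr requirement with redis-py (the key name is made up; INCR is atomic server-side):

```
import redis

r = redis.Redis()
r.set("page:views", 0)
r.incr("page:views")             # atomic increment, safe under concurrency
print(int(r.get("page:views")))  # -> 1
```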
Nice to have Features:
Read the article about virtual memory.
It runs on most POSIX systems.
Maybe you can also take a look at Membase or Couchbase Server.
Riak (http://www.basho.com/) will do this for you.
I'm rather new to Redis, and before using it I'd like to learn some important (to me) details about it. So...
Redis uses RAM and HDD for storing data. RAM is used as fast read/write storage, and HDD is used to make this data persistent. When Redis starts, does it load all data from HDD into RAM, or only the often-queried data? What if I have 500MB of Redis storage on HDD, but only 100MB of RAM for Redis? Where can I read about this?
Redis loads everything into RAM. All the data is written to disk, but will only be read for things like restarting the server or making a backup.
There are a couple of ways you can use it with less RAM than data, though. You can set it up in combination with MySQL or another disk-based store to work much like memcached: you manage cache misses and persistence manually.
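A minimal cache-aside sketch of that pattern (load_from_mysql is a hypothetical stand-in for your disk-backed lookup):

```
import redis

r = redis.Redis()

def load_from_mysql(user_id):
    # Hypothetical stand-in for a query against the disk-based store
    return b"row-for-%d" % user_id

def get_user(user_id):
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return cached                  # hit: served from RAM
    value = load_from_mysql(user_id)   # miss: read from the disk store
    r.set(key, value, ex=3600)         # repopulate with a one-hour TTL
    return value
```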
Redis has a VM mode where all keys must fit in RAM but infrequently accessed data can be on disk. However, I'm not sure if this is in the stable builds yet.
Recent versions (>2.0) have improved significantly, and memory management is more efficient. See this blog post that explains how to use hashes to optimize the RAM footprint: http://antirez.com/post/redis-weekly-update-7.html
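Roughly, the trick from that post looks like this (the bucket size and key names here are illustrative, not taken from the post):

```
import redis

r = redis.Redis()
BUCKET = 1000  # keep each hash small enough for Redis's compact encoding

def set_bucketed(obj_id: int, value: str):
    # Thousands of tiny top-level keys collapse into a few hashes,
    # which Redis stores far more compactly
    r.hset(f"obj:{obj_id // BUCKET}", obj_id % BUCKET, value)

def get_bucketed(obj_id: int):
    return r.hget(f"obj:{obj_id // BUCKET}", obj_id % BUCKET)
```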
The feature is called Virtual Memory, and it is officially deprecated:
Redis VM is now deprecated. Redis 2.4 will be the latest Redis version featuring Virtual Memory (but it also warns you that Virtual Memory usage is discouraged). We found that using VM has several disadvantages and problems. In the future of Redis we want to simply provide the best in-memory database (but persistent on disk as usual) ever, without considering at least for now the support for databases bigger than RAM. Our future efforts are focused into providing scripting, cluster, and better persistence.
More information about VM: https://redis.io/topics/virtual-memory