Can't seem to find an answer and benchmarks are really tale-telling.
How does Redis handle itself during peak load/usage?
The question comes from knowing CPU usage may hit 100% of its logical core, or memory may be over used.
What happens in these cases?
In general, Redis isn't CPU heavy, and will act like any other application when CPU usage is high, but this largely depends on your Redis version.
Prior to Redis 4.0, it was entirely single-threaded, and long running operations would block (like background saves, DEL's with large objects, etc.) Since 4.0, most of these types of operations are pushed to the background. With saves to disk with the bgsave command, Redis now forks itself and does the save in the child, leaving the parent open to accept changes. 6.0 changed a few things like the del command now acts as unlink, and pushes the actual delete to a thread. There were some plans to add more multi-threading to Redis version 7.0, but it appears this was pushed to 7.2.
The biggest concern however is reaching max system memory, or the maxmemory directive in Redis' config. When this happens, Redis' eviction policy comes into play (set by the maxmemory-policy directive).
Here are the available eviction policies and what they do:
noeviction: New values aren’t saved when memory limit is reached. When a database uses replication, this applies to the primary database
allkeys-lru: Keeps most recently used keys; removes least recently used (LRU) keys
allkeys-lfu: Keeps frequently used keys; removes least frequently used (LFU) keys
volatile-lru: Removes least recently used keys with the expire field set to true.
volatile-lfu: Removes least frequently used keys with the expire field set to true.
allkeys-random: Randomly removes keys to make space for the new data added.
volatile-random: Randomly removes keys with expire field set to true.
volatile-ttl: Removes least frequently used keys with expire field set to true and the shortest remaining time-to-live (TTL) value.
The default maxmemory-policy is noeviction from Redis version 3.0 up through 7.0. In version 2.8 and earlier, the default is volatile-lru.
You can read the Key Eviction docs for more.
Backstory: The keyspace of the Redis database in question reports a large amount of expired keys and memory usage is maxed out. The application using this database is experiencing (rare) intermittent timeouts and I thought (in my limited knowledge) perhaps it is because Redis is having to eject expired keys each time a new key is created.
So to my question: how do I tell Redis to remove all the expired keys?
Secondarily -- is it possible to access/see expired keys with redis-cli?
Here's a slice of the INFO I'm looking at:
maxmemory_policy:allkeys-lru
expired_keys:24326586
evicted_keys:134022997
keyspace_hits:2684031719
keyspace_misses:186380210
slave_expires_tracked_keys:0
active_defrag_key_hits:0
active_defrag_key_misses:0
db2:keys=12994468,expires=3193,avg_ttl=1891176
Answer for myself, posterity, and any other Redis newbies out there. I was looking at the wrong "database". I was under the WRONG impression that Redis only had single table but looking at my question you see "db2". I searched into that and found that Redis can have up to 16 databases identified by a zero-based index. In this case:
SELECT 2
That selects "db2" and now doing a DBSIZE gives a more accurate output.
Oye -- so the problem is that the keys are still there! Otherwise when Redis expires a key it deletes it.
Whoops! I'm leaving my question because someone else might think to ask the same thing and be on the wrong route.
Recently, I met a problem when I use setbit in redis. As I use redis as a bloomFilter part for store, 0.2 billion data cost 380MB memory for 99.99% accuracy. Every day I need to delete the redis key for bloomfilter and create a new one, but found slow log, and this may affect other service in product environment. Counld anybody give a better suggest what to do to forbid this? thx a lot~
according command costs(ms):
DEL bloomFilterKey
use(microseconds):83886
Freeing the large mount of memory, i.e. 380MB, costs too much time, and blocks Redis.
In order to avoid this, you can upgrade your Redis to version 4.0, and use the new command UNLINK to delete the key. This command frees memory in a different thread, and won't block Redis.
I have a clustered cache store set up with Infinispan (8.2.4 Final) using the SoftIndexFileStore for persistence.
The documentation states that if entries expire it's not possible for the Compactor to cleanup purged entries and the disk usage will grow overtime. From the userguide:
When entries are stored with expiration, SIFS cannot detect that some
of those entries are expired. Therefore, such old file will not be
compacted (method AdvancedStore.purgeExpired() is not implemented).
This can lead to excessive file-system space usage.
Most of my entries expire but there are some which need to persist indefinitely meaning I can't simply run a cleanup job every once in while to delete all the data files.
How to deal with this wasted disk usage? After several weeks of running I see many files which haven't been modified in weeks. Is it safe to delete old files which haven't been modified e.g. less than a month ago?
No; old files won't ever be modified again (they are written once and then considered immutable until removed). Removing them manually could lead to failures since these files are referenced in the index.
Regrettably, when the store is iterated and the entries are found expired, the Compactor.free() is not called, because there could be multiple concurrent iterations and we could end up calling it many times for single entry.
A proper solution would be implementing a periodic (or JMX-triggered) process that goes through old files, computes space occupied by expired entries and schedules files that exceed some threshold for compaction. This should go into Compactor. Please see SIFS javadoc for general design description.
If you're interested in developing this feature and you want to discuss that more, please go to Infinispan forum.
I understand that Redis serves all data from memory, but does it persist as well across server reboot so that when the server reboots it reads into memory all the data from disk. Or is it always a blank store which is only to store data while apps are running with no persistence?
I suggest you read about this on http://redis.io/topics/persistence . Basically you lose the guaranteed persistence when you increase performance by using only in-memory storing. Imagine a scenario where you INSERT into memory, but before it gets persisted to disk lose power. There will be data loss.
Redis supports so-called "snapshots". This means that it will do a complete copy of whats in memory at some points in time (e.g. every full hour). When you lose power between two snapshots, you will lose the data from the time between the last snapshot and the crash (doesn't have to be a power outage..). Redis trades data safety versus performance, like most NoSQL-DBs do.
Most NoSQL-databases follow a concept of replication among multiple nodes to minimize this risk. Redis is considered more a speedy cache instead of a database that guarantees data consistency. Therefore its use cases typically differ from those of real databases:
You can, for example, store sessions, performance counters or whatever in it with unmatched performance and no real loss in case of a crash. But processing orders/purchase histories and so on is considered a job for traditional databases.
Redis server saves all its data to HDD from time to time, thus providing some level of persistence.
It saves data in one of the following cases:
automatically from time to time
when you manually call BGSAVE command
when redis is shutting down
But data in redis is not really persistent, because:
crash of redis process means losing all changes since last save
BGSAVE operation can only be performed if you have enough free RAM (the amount of extra RAM is equal to the size of redis DB)
N.B.: BGSAVE RAM requirement is a real problem, because redis continues to work up until there is no more RAM to run in, but it stops saving data to HDD much earlier (at approx. 50% of RAM).
For more information see Redis Persistence.
It is a matter of configuration. You can have none, partial or full persistence of your data on Redis. The best decision will be driven by the project's technical and business needs.
According to the Redis documentation about persistence you can set up your instance to save data into disk from time to time or on each query, in a nutshell. They provide two strategies/methods AOF and RDB (read the documentation to see details about then), you can use each one alone or together.
If you want a "SQL like persistence", they have said:
The general indication is that you should use both persistence methods if you want a degree of data safety comparable to what PostgreSQL can provide you.
The answer is generally yes, however a fuller answer really depends on what type of data you're trying to store. In general, the more complete short answer is:
Redis isn't the best fit for persistent storage as it's mainly performance focused
Redis is really more suitable for reliable in-memory storage/cacheing of current state data, particularly for allowing scalability by providing a central source for data used across multiple clients/servers
Having said this, by default Redis will persist data snapshots at a periodic interval (apparently this is every 1 minute, but I haven't verified this - this is described by the article below, which is a good basic intro):
http://qnimate.com/redis-permanent-storage/
TL;DR
From the official docs:
RDB persistence [the default] performs point-in-time snapshots of your dataset at specified intervals.
AOF persistence [needs to be explicitly configured] logs every write operation received by the server, that will be played again at server startup, reconstructing the
original dataset.
Redis must be explicitly configured for AOF persistence, if this is required, and this will result in a performance penalty as well as growing logs. It may suffice for relatively reliable persistence of a limited amount of data flow.
You can choose no persistence at all.Better performance but all the data lose when Redis shutting down.
Redis has two persistence mechanisms: RDB and AOF.RDB uses a scheduler global snapshooting and AOF writes update to an apappend-only log file similar to MySql.
You can use one of them or both.When Redis reboots,it constructes data from reading the RDB file or AOF file.
All the answers in this thread are talking about the possibility of redis to persist the data: https://redis.io/topics/persistence (Using AOF + after every write (change)).
It's a great link to get you started, but it is defenently not showing you the full picture.
Can/Should You Really Persist Unrecoverable Data/State On Redis?
Redis docs does not talk about:
Which redis providers support this (AOF + after every write) option:
Almost none of them - redis labs on the cloud does NOT provide this option. You may need to buy the on-premise version of redis-labs to support it. As not all companies are willing to go on-premise, then they will have a problem.
Other Redis Providers does not specify if they support this option at all. AWS Cache, Aiven,...
AOF + after every write - This option is slow. you will have to test it your self on your production hardware to see if it fits your requirements.
Redis enterpice provide this option and from this link: https://redislabs.com/blog/your-cloud-cant-do-that-0-5m-ops-acid-1msec-latency/ let's see some banchmarks:
1x x1.16xlarge instance on AWS - They could not achieve less than 2ms latency:
where latency was measured from the time the first byte of the request arrived at the cluster until the first byte of the ‘write’ response was sent back to the client
They had additional banchmarking on a much better harddisk (Dell-EMC VMAX) which results < 1ms operation latency (!!) and from 70K ops/sec (write intensive test) to 660K ops/sec (read intensive test). Pretty impresive!!!
But it defenetly required a (very) skilled devops to help you create this infrastructure and maintain it over time.
One could (falsy) argue that if you have a cluster of redis nodes (with replicas), now you have full persistency. this is false claim:
All DBs (sql,non-sql,redis,...) have the same problem - For example, running set x 1 on node1, how much time it takes for this (or any) change to be made in all the other nodes. So additional reads will receive the same output. well, it depends on alot of fuctors and configurations.
It is a nightmare to deal with inconsistency of a value of a key in multiple nodes (any DB type). You can read more about it from Redis Author (antirez): http://antirez.com/news/66. Here is a short example of the actual ngihtmare of storing a state in redis (+ a solution - WAIT command to know how much other redis nodes received the latest change change):
def save_payment(payment_id)
redis.rpush(payment_id,”in progress”) # Return false on exception
if redis.wait(3,1000) >= 3 then
redis.rpush(payment_id,”confirmed”) # Return false on exception
if redis.wait(3,1000) >= 3 then
return true
else
redis.rpush(payment_id,”cancelled”)
return false
end
else
return false
end
The above example is not suffeint and has a real problem of knowing in advance how much nodes there actually are (and alive) at every moment.
Other DBs will have the same problem as well. Maybe they have better APIs but the problem still exists.
As far as I know, alot of applications are not even aware of this problem.
All in all, picking more dbs nodes is not a one click configuration. It involves alot more.
To conclude this research, what to do depends on:
How much devs your team has (so this task won't slow you down)?
Do you have a skilled devops?
What is the distributed-system skills in your team?
Money to buy hardware?
Time to invest in the solution?
And probably more...
Many Not well-informed and relatively new users think that Redis is a cache only and NOT an ideal choice for Reliable Persistence.
The reality is that the lines between DB, Cache (and many more types) are blurred nowadays.
It's all configurable and as users/engineers we have choices to configure it as a cache, as a DB (and even as a hybrid).
Each choice comes with benefits and costs. And this is NOT an exception for Redis but all well-known Distributed systems provide options to configure different aspects (Persistence, Availability, Consistency, etc). So, if you configure Redis in default mode hoping that it will magically give you highly reliable persistence then it's team/engineer fault (and NOT that of Redis).
I have discussed these aspects in more detail on my blog here.
Also, here is a link from Redis itself.