Redis DEL operation causes slow log

Recently, I ran into a problem using SETBIT in Redis. I use Redis as the store behind a Bloom filter: 0.2 billion entries cost 380MB of memory at 99.99% accuracy. Every day I need to delete the Bloom filter's Redis key and create a new one, but I found a slow log entry, and this may affect other services in the production environment. Could anybody suggest a better way to avoid this? Thanks a lot.
The corresponding command cost:
DEL bloomFilterKey
duration (microseconds): 83886

Freeing that large an amount of memory, i.e. 380MB, costs too much time and blocks Redis.
To avoid this, you can upgrade your Redis to version 4.0 and use the new command UNLINK to delete the key. This command frees the memory in a different thread, so it won't block Redis.
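A minimal sketch of that swap with the Python redis-py client (the connection details are placeholders; the key name comes from the question):

import redis

r = redis.Redis(host='localhost', port=6379)  # adjust to your deployment

# Blocking: frees the whole 380MB on the main thread (the slow log entry above).
# r.delete('bloomFilterKey')

# Non-blocking (Redis >= 4.0): removes the key from the keyspace immediately
# and reclaims the memory in a background thread.
r.unlink('bloomFilterKey')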

Related

Is changing the max-memory policy in Redis an expensive operation?

I'm fairly new to Redis and working on an Azure Cache for Redis implementation.
The Azure documentation on Redis troubleshooting states that there can be long-running commands, and that the Redis command documentation shows the time complexity of all commands.
I couldn't find anything around the config set maxmemory-policy command.
Is my assumption correct that setting/changing the maxmemory-policy itself is not an expensive command (unlike e.g. resharding/rebalancing a cluster)?
(I know, "expensive" does not really have a proper definition here :) )
Thanks for any hints or answers!
Yes, the config set command itself is not an expensive command. It iterates the config array, which contains about 200 items, to find the config and set it. That's all.
However, after the setting takes effect, Redis might need to free memory for evicted items, either periodically or on each command. That's a cost.
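For illustration, a hedged redis-py sketch of changing the policy at runtime (the policy name is just an example):

import redis

r = redis.Redis()

# The CONFIG SET itself is cheap: a scan of a ~200-entry config table.
r.config_set('maxmemory-policy', 'allkeys-lru')
print(r.config_get('maxmemory-policy'))

# The ongoing cost comes afterwards: once memory exceeds maxmemory,
# Redis has to evict keys periodically and while serving commands.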

Is the UNLINK command always better than DEL command?

In Redis 4.0, there is a new command UNLINK to delete the keys in Redis memory.
This command is very similar to DEL: it removes the specified keys.
Just like DEL, a key is ignored if it does not exist. However, the command performs the actual memory reclaiming in a different thread, so it is not blocking, while DEL is. This is where the command name comes from: the command just unlinks the keys from the keyspace. The actual removal will happen later asynchronously.
So one can always (100% of the time) use UNLINK instead of DEL, since UNLINK is non-blocking, unlike DEL. Right?
Before discussing which one is better, let's take a look at the difference between these commands. Both DEL and UNLINK free the key part in blocking mode; the difference is the way they free the value part.
DEL always frees the value part in blocking mode. If the value is very large, e.g. too many allocations for a large LIST or HASH, it blocks Redis for a long time. To solve this problem, Redis implemented the UNLINK command, i.e. a 'non-blocking' delete.
In fact, UNLINK is NOT always non-blocking/async. If the value is small, e.g. a LIST or HASH with fewer than 64 items, the value is freed immediately. In that case UNLINK is almost the same as DEL, except that it costs a few more function calls. However, if the value is large, Redis puts it into a list, and the value is freed by another thread, i.e. the non-blocking free. In this way, the main thread has to do some synchronization with the background thread, and that's also a cost.
In conclusion, if the value is small, DEL is at least as good as UNLINK. If the value is very large, e.g. a LIST with thousands or millions of items, UNLINK is much better than DEL. You can always safely replace DEL with UNLINK. However, if you find that the thread synchronization becomes a problem (multi-threading is always a headache), you can roll back to DEL.
UPDATE:
Since Redis 6.0, there's a new configuration: lazyfree-lazy-user-del. You can set it to yes, and Redis will then run the DEL command as if it were an UNLINK command.
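For example, assuming Redis >= 6.0, the flag can be flipped at runtime (or set in redis.conf); a small redis-py sketch:

import redis

r = redis.Redis()

# Make DEL behave like UNLINK: large values are reclaimed
# by a background thread instead of blocking the main thread.
r.config_set('lazyfree-lazy-user-del', 'yes')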
YES. Please have a read of Lazy Redis is better Redis by antirez. But the reason is not that UNLINK is a non-blocking command; the reason is that UNLINK is smarter than DEL.
UNLINK is a smart command: it calculates the deallocation cost of an object, and if it is very small it will just do what DEL is supposed to do and free the object ASAP. Otherwise the object is sent to the background queue for processing.
Also, I think the faster way is to make the decision for Redis ourselves: use DEL for small keys, and UNLINK for a huge key such as a big list or set. That spares Redis the needless calculation.
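A rough sketch of that decision in Python with redis-py. The byte threshold below is an arbitrary illustration; Redis itself decides by the number of allocations to free (64), not by byte size, and MEMORY USAGE needs Redis >= 4.0:

import redis

r = redis.Redis()

BIG_VALUE_BYTES = 1024 * 1024  # illustrative cut-off, tune for your workload

def smart_delete(client, key):
    # DEL small keys directly; UNLINK big ones so the memory
    # is reclaimed by a background thread.
    size = client.memory_usage(key) or 0
    if size >= BIG_VALUE_BYTES:
        client.unlink(key)
    else:
        client.delete(key)

smart_delete(r, 'bloomFilterKey')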

How to flushdb in Redis in a smooth way?

I used Redis's flushdb to flush all data in Redis, but it caused redis-server to go away; I suspect the problem could be the cleanup of a large number of keys. Is there any way to flush Redis smoothly, perhaps taking more time to flush all the data?
flushall is "delete all keys" as described here: http://redis.io/commands/flushall
Delete operations are blocking operations.
Large delete operations may block Redis for a minute or more (e.g. deleting a 16GB hash with tons of fields).
You should write a script which uses cursors to do this.
//edit:
I found my old answer here and wanted to be more specific by providing resources:
Large number of keys: use SCAN to iterate over them with a cursor and do a graceful cleanup in smaller batches (see the sketch below).
Large hash: either use the UNLINK command to delete it asynchronously, or use HSCAN to iterate over it with a cursor and clean it up gracefully.
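A hedged sketch of that batched cleanup with redis-py; the key pattern, batch size, and pause are arbitrary placeholders:

import redis
import time

r = redis.Redis()

# Remove matching keys in small batches instead of one huge blocking delete.
batch = []
for key in r.scan_iter(match='session:*', count=500):  # cursor-based iteration
    batch.append(key)
    if len(batch) >= 500:
        r.unlink(*batch)  # async free on Redis >= 4.0; use r.delete(*batch) otherwise
        batch.clear()
        time.sleep(0.01)  # leave room for other clients between batches
if batch:
    r.unlink(*batch)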

Does Redis persist data?

I understand that Redis serves all data from memory, but does it also persist across server reboots, so that when the server reboots it reads all the data from disk into memory? Or is it always a blank store, only holding data while apps are running, with no persistence?
I suggest you read about this at http://redis.io/topics/persistence . Basically you lose the guaranteed persistence when you increase performance by serving only from memory. Imagine a scenario where you INSERT into memory, but lose power before it gets persisted to disk. There will be data loss.
Redis supports so-called "snapshots". This means that it makes a complete copy of what's in memory at certain points in time (e.g. every full hour). When you lose power between two snapshots, you lose the data from the time between the last snapshot and the crash (which doesn't have to be a power outage). Redis trades data safety for performance, like most NoSQL DBs do.
Most NoSQL databases follow a concept of replication among multiple nodes to minimize this risk. Redis is considered more of a speedy cache than a database that guarantees data consistency. Therefore its use cases typically differ from those of real databases:
You can, for example, store sessions, performance counters or whatever in it with unmatched performance and no real loss in case of a crash. But processing orders/purchase histories and so on is considered a job for traditional databases.
Redis server saves all its data to HDD from time to time, thus providing some level of persistence.
It saves data in one of the following cases:
automatically from time to time
when you manually call BGSAVE command
when redis is shutting down
But data in Redis is not really persistent, because:
a crash of the Redis process means losing all changes since the last save
the BGSAVE operation can only be performed if you have enough free RAM (the amount of extra RAM needed is equal to the size of the Redis DB)
N.B.: The BGSAVE RAM requirement is a real problem, because Redis continues to work until there is no more RAM to run in, but it stops saving data to HDD much earlier (at approx. 50% of RAM).
For more information see Redis Persistence.
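As an aside, the manual trigger mentioned above can be scripted; a small redis-py sketch (connection details assumed):

import redis

r = redis.Redis()

print(r.lastsave())  # time of the last successful save to disk

# BGSAVE forks a child process that writes dump.rdb in the background
# while the parent keeps serving commands.
r.bgsave()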
It is a matter of configuration. You can have no, partial, or full persistence of your data with Redis. The best decision will be driven by the project's technical and business needs.
According to the Redis documentation about persistence, you can set up your instance to save data to disk from time to time or on each query, in a nutshell. It provides two strategies/methods, AOF and RDB (read the documentation for details about them); you can use each one alone or both together.
If you want a "SQL like persistence", they have said:
The general indication is that you should use both persistence methods if you want a degree of data safety comparable to what PostgreSQL can provide you.
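A hedged redis-py sketch of enabling both methods at runtime (they can equally be set in redis.conf; the RDB schedule shown is just an example):

import redis

r = redis.Redis()

# RDB: snapshot every 60 seconds if at least 1000 keys changed.
r.config_set('save', '60 1000')

# AOF: log every write; fsync once per second is a common middle ground
# (use appendfsync 'always' for the PostgreSQL-like safety quoted above).
r.config_set('appendonly', 'yes')
r.config_set('appendfsync', 'everysec')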
The answer is generally yes; however, a fuller answer really depends on what type of data you're trying to store. In general, the more complete short answer is:
Redis isn't the best fit for persistent storage, as it's mainly performance-focused
Redis is really more suitable for reliable in-memory storage/caching of current-state data, particularly for allowing scalability by providing a central source for data used across multiple clients/servers
Having said this, by default Redis persists data snapshots at a periodic interval (apparently every 1 minute, but I haven't verified this; it is described in the article below, which is a good basic intro):
http://qnimate.com/redis-permanent-storage/
TL;DR
From the official docs:
RDB persistence [the default] performs point-in-time snapshots of your dataset at specified intervals.
AOF persistence [needs to be explicitly configured] logs every write operation received by the server, that will be played again at server startup, reconstructing the original dataset.
Redis must be explicitly configured for AOF persistence, if this is required, and this will result in a performance penalty as well as growing logs. It may suffice for relatively reliable persistence of a limited amount of data flow.
You can choose no persistence at all: better performance, but all the data is lost when Redis shuts down.
Redis has two persistence mechanisms: RDB and AOF. RDB uses a scheduled global snapshot, and AOF writes updates to an append-only log file, similar to MySQL.
You can use either of them or both. When Redis reboots, it reconstructs the data by reading the RDB file or the AOF file.
All the answers in this thread are talking about the possibility of Redis persisting the data: https://redis.io/topics/persistence (using AOF + fsync after every write/change).
It's a great link to get you started, but it is definitely not showing you the full picture.
Can/Should You Really Persist Unrecoverable Data/State On Redis?
The Redis docs do not talk about:
Which Redis providers support this (AOF + fsync on every write) option:
Almost none of them. Redis Labs on the cloud does NOT provide this option; you may need to buy the on-premise version of Redis Labs to get it. As not all companies are willing to go on-premise, they will have a problem.
Other Redis providers do not specify whether they support this option at all: AWS Cache, Aiven, ...
AOF + fsync on every write is slow; you will have to test it yourself on your production hardware to see whether it fits your requirements.
Redis Enterprise provides this option, and from this link: https://redislabs.com/blog/your-cloud-cant-do-that-0-5m-ops-acid-1msec-latency/ let's look at some benchmarks:
On a single x1.16xlarge instance on AWS they could not achieve less than 2ms latency:
where latency was measured from the time the first byte of the request arrived at the cluster until the first byte of the ‘write’ response was sent back to the client
They did additional benchmarking on a much better hard disk (Dell-EMC VMAX), which resulted in < 1ms operation latency (!!) and from 70K ops/sec (write-intensive test) to 660K ops/sec (read-intensive test). Pretty impressive!!!
But it definitely requires a (very) skilled devops engineer to help you create this infrastructure and maintain it over time.
One could (falsely) argue that if you have a cluster of Redis nodes (with replicas), you now have full persistence. This is a false claim:
All DBs (SQL, non-SQL, Redis, ...) have the same problem. For example, after running set x 1 on node1, how much time does it take for this (or any) change to reach all the other nodes, so that subsequent reads return the same output? Well, it depends on a lot of factors and configurations.
It is a nightmare to deal with inconsistency of a key's value across multiple nodes (in any type of DB). You can read more about it from the Redis author (antirez): http://antirez.com/news/66. Here is a short example of the actual nightmare of storing state in Redis (plus a solution: the WAIT command, to learn how many other Redis nodes received the latest change):
def save_payment(redis, payment_id)
  # `redis` is assumed to be a connected client (e.g. from the redis gem);
  # the original sketch used an ambient `redis` variable.
  redis.rpush(payment_id, "in progress") # Return false on exception
  if redis.wait(3, 1000) >= 3
    redis.rpush(payment_id, "confirmed") # Return false on exception
    if redis.wait(3, 1000) >= 3
      return true
    else
      redis.rpush(payment_id, "cancelled")
      return false
    end
  else
    return false
  end
end
The above example is not sufficient, and has the real problem of knowing in advance how many nodes actually exist (and are alive) at every moment.
Other DBs will have the same problem as well. Maybe they have better APIs, but the problem still exists.
As far as I know, a lot of applications are not even aware of this problem.
All in all, picking more DB nodes is not a one-click configuration. It involves a lot more.
To conclude this research, what to do depends on:
How many devs does your team have (so this task won't slow you down)?
Do you have a skilled devops engineer?
How strong are the distributed-systems skills on your team?
Money to buy hardware?
Time to invest in the solution?
And probably more...
Many not-so-well-informed and relatively new users think that Redis is only a cache and NOT an ideal choice for reliable persistence.
The reality is that the lines between a DB, a cache (and many more types) are blurred nowadays.
It's all configurable, and as users/engineers we have choices to configure it as a cache, as a DB (or even as a hybrid).
Each choice comes with benefits and costs. And this is NOT an exception for Redis: all well-known distributed systems provide options to configure different aspects (persistence, availability, consistency, etc.). So, if you configure Redis in default mode hoping that it will magically give you highly reliable persistence, then it's the team's/engineer's fault (and NOT that of Redis).
I have discussed these aspects in more detail on my blog here.
Also, here is a link from Redis itself.

How can I stop Redis gracefully when memory and swap are full?

Last night, I ran a job to insert data into a Redis set (because I want to keep my data unique). After I woke up this morning, I found that the insert operation had become very slow.
htop shows memory usage at 1884/2015MB and swap usage at 1019/1021MB.
I realize that 2G of memory cannot hold Redis.
Then I ran shutdown in redis-cli, but nothing happened; it just kept waiting and waiting...
I also tried service redis_6379 stop, but the terminal got stuck at 'stopping...'.
What can I do to make Redis save all its data to dump.rdb and close gracefully?
Normally, a simple redis-cli shutdown should suffice.
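If the plain shutdown stalls because the final save cannot complete (as here, with no free RAM), SHUTDOWN takes SAVE/NOSAVE modifiers; a sketch using redis-py's shutdown() helper:

import redis

r = redis.Redis()

# SHUTDOWN SAVE forces a final synchronous save to dump.rdb before exit.
# SHUTDOWN NOSAVE exits without saving: the escape hatch when the save itself
# is stuck, at the cost of losing changes since the last snapshot.
r.shutdown(save=True)  # or: r.shutdown(nosave=True)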
Are you using periodic snapshots? If so, you might be safe to reboot your machine. One important thing to note is that enabling periodic snapshots doubles the memory usage, since Redis has to create an in-memory copy of the dataset before writing it to disk.
Another important thing is to follow the advice from the Redis setup hints, if you haven't already.
This might not answer your question, but it should help you keep it from happening again.