Is it safe to delete keys from Redis while iterating through them using scan?

I want to iterate through a number of keys that are stored in Redis with the same prefix using scan and do some processing in my app code with the key's values. Is it safe to delete the keys returned from scan's output after processing them? I don't see this mentioned in the scan documentation: https://redis.io/commands/scan

Yes, it's safe to delete the returned keys.
Redis SCAN is stateless: keyspace changes during the scan (adding new keys or removing old ones) will not make the scan fail, although they may cause some keys to be missed or returned more than once.
See this answer for the details of how Redis SCAN works.
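A minimal sketch of that loop with Python and redis-py (the prefix, connection details, and processing step are placeholders):

import redis

r = redis.Redis(host="localhost", port=6379)

# scan_iter wraps SCAN's cursor loop; MATCH narrows the results
for key in r.scan_iter(match="myprefix:*", count=1000):
    value = r.get(key)  # process the value as needed
    r.delete(key)       # safe: deleting during the scan won't invalidate the cursor

Keys deleted mid-scan are simply no longer guaranteed to be returned; the cursor itself stays valid.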

Related

Efficiently delete RedisKeys in bulk via wildcard pattern

Problem:
I need to efficiently delete keys from my Redis Cache using a wildcard pattern. I don't need atomicity; eventual consistency is acceptable.
Tech stack:
.NET 6 (async all the way through)
StackExchange.Redis 2.6.66
Redis Server 6.2.6
I currently have ~500k keys in Redis.
I'm not able to use RedisJSON for various reasons
Example:
I store the following 3 STRING types with keys:
dailynote:getitemsforuser:region:sw:user:123
dailynote:getitemsforuser:region:fl:user:123
dailynote:getitemsforuser:region:sw:user:456
...
where each STRING stores JSON like so:
> get dailynote:getitemsforuser:region:fl:user:123
"{\"Name\":\"john\",\"Age\":22}"
The original solution used the KeysAsync method to retrieve the list of keys to delete via a wildcard pattern. Since the Redis server is 6.x, KeysAsync uses SCAN internally in the StackExchange.Redis NuGet package.
The original implementation used the wildcard pattern dailynote:getitemsforuser:region:*. As one would expect, this didn't scale well and we started seeing RedisTimeoutExceptions.
I'm aware of the "avoid this in PROD if you can" and have seen Marc Gravell respond to a couple other questions/issues on SO and StackExchange.Redis GitHub. The only potential alternative I could think of is to use a Redis SET to "track" each RedisKey and then retrieve the list of values from the SET (which are the keys I need to remove). Then delete the SET as well as the returned keys.
Potential Solution?:
Create a Redis SET with a key of dailynote:getitemsforuser with a value which is the key of the form dailynote:getitemsforuser:region:XX...
The SET would look like:
dailynote:getitemsforuser (KEY)
dailynote:getitemsforuser:region:sw:user:123 (VALUE)
dailynote:getitemsforuser:region:fl:user:123 (VALUE)
dailynote:getitemsforuser:region:sw:user:456 (VALUE)
...
I would still have each individual STRING type as well:
dailynote:getitemsforuser:region:sw:user:123
dailynote:getitemsforuser:region:fl:user:123
dailynote:getitemsforuser:region:sw:user:456
...
When it is time to do the "wildcard" remove, I get the members of the dailynote:getitemsforuser SET, then call RemoveAsync, passing the members of the set as the RedisKey[]. Then I call RemoveAsync with the key of the SET itself (dailynote:getitemsforuser).
I'm looking for feedback on how viable of a solution this is, alternative ideas, gotchas, and suggestions for improvement. TIA
UPDATE
I've added the solution I went with below...
The big problem with both KEYS and SCAN with Redis is that they require a complete scan of the massive hash table that stores every Redis key. Even if you use a pattern, it still needs to check each entry in that hash table to see if it matches.
Assuming you are calling SADD when you are also setting the value in your key, and thus avoiding the call to SCAN, this should work. It is worth noting that calls to SMEMBERS to get all the members of a Set can also cause issues if the Set is big: Redis, being single-threaded, will block while all the members are returned. You can mitigate this by using SSCAN instead. StackExchange.Redis might do this already; I'm not sure.
You might also be able to write a Lua script that reads the Set and UNLINKs all the keys atomically. This would reduce network round trips but could tie Redis up if it takes too long.
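A rough sketch of that idea with redis-py and an embedded Lua script, assuming the tracking set proposed in the question (note that the script discovers key names at runtime instead of receiving them all via KEYS, so it is not cluster-safe):

import redis

r = redis.Redis()

# Reads every member of the tracking set, UNLINKs each one, then the set itself.
DELETE_TRACKED = """
local members = redis.call('SMEMBERS', KEYS[1])
for _, key in ipairs(members) do
    redis.call('UNLINK', key)
end
redis.call('UNLINK', KEYS[1])
return #members
"""

deleted = r.eval(DELETE_TRACKED, 1, "dailynote:getitemsforuser")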
I ended up using the solution I suggested above where I use a Redis SET with a known/fixed key to "track" each of the necessary keys.
When a key that needs to be tracked is added, I call StackExchange.Redis.IDatabase.SetAddAsync (SADD) while calling StackExchange.Redis.IDatabase.HashSetAsync (HSET) for adding the "tracked" key (along with its TTL).
When it is time to remove the "tracked" keys, I first call StackExchange.Redis.IDatabase.SetScanAsync (SSCAN) with a page size of 250, iterating on the IAsyncEnumerable, and call StackExchange.Redis.IDatabase.KeyDeleteAsync (DEL) on chunks of the members of the SET. I then call KeyDeleteAsync on the key of the SET itself.
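A rough Python/redis-py translation of that flow (the original is .NET; the names, TTL handling, and 250 page size follow the description above):

import redis

r = redis.Redis()
TRACKING_SET = "dailynote:getitemsforuser"

def add_tracked(key, mapping, ttl_seconds):
    # Track the key in the SET, write the hash, and apply its TTL.
    r.sadd(TRACKING_SET, key)
    r.hset(key, mapping=mapping)
    r.expire(key, ttl_seconds)

def remove_tracked(page_size=250):
    # SSCAN pages through the SET; delete members in chunks, then the SET.
    batch = []
    for member in r.sscan_iter(TRACKING_SET, count=page_size):
        batch.append(member)
        if len(batch) >= page_size:
            r.delete(*batch)
            batch = []
    if batch:
        r.delete(*batch)
    r.delete(TRACKING_SET)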
Hope this helps someone else.

Delete large number of keys in Redis

I have a large number of simple keys with a common prefix to delete from Redis, and I am trying to find the most efficient way to do this atomically, inside a transaction or a Lua script:
Iterate with SCAN and DEL keys?
Iterate with SCAN and EXPIRE each key?
Iterate with SCAN and UNLINK keys?
Which of the above is the recommended way to proceed? Should I take a different approach, like using a hash with multiple keys inside it? Would any of the above be a problem in a Redis cluster?
I would suggest going with UNLINK, with batch processing like the following; it will clear the memory efficiently. I don't suggest expiry, as Redis only checks for expired keys 10 times per second (the default configuration), which may not be an efficient way to delete a large batch.
redis-cli --scan --pattern 'prefix:*' | xargs -L 1000 redis-cli unlink
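The same batching can be done from application code. A rough sketch with Python and redis-py (the pattern and batch size are illustrative):

import redis

r = redis.Redis()

batch = []
for key in r.scan_iter(match="prefix:*", count=1000):
    batch.append(key)
    if len(batch) >= 1000:
        r.unlink(*batch)  # UNLINK reclaims memory asynchronously, unlike DEL
        batch = []
if batch:
    r.unlink(*batch)      # flush the final partial batch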
If your keys are simple strings, you can organise your prefixes inside hashes, with keys as name/value pairs.
Then just drop the entire hash.
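For instance, a sketch of that reorganisation with redis-py (the key names and JSON values are illustrative):

import redis

r = redis.Redis()

# Instead of one STRING per key under a shared prefix, store each entry
# as a field of a single hash keyed by the prefix:
r.hset("prefix", "user:123", '{"Name":"john","Age":22}')
r.hset("prefix", "user:456", '{"Name":"jane","Age":30}')

# Reads become HGET instead of GET:
value = r.hget("prefix", "user:123")

# The "wildcard delete" becomes dropping one key:
r.unlink("prefix")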
PS: Yes, you have to rewrite all your read queries, but the performance impact should be negligible.
PS2: This doesn't address the clustered-Redis problems.
A Redis Lua script must be provided explicitly with all key names it touches via the KEYS array. All three approaches violate that principle as they fetch the key names with SCAN.

Get all hashes that exist in Redis

I have hashes in the Redis cache like:
Hash Key    Value
hashme:1    Hello World
hashme:2    Here Iam
myhash:1    Next One
My goal is to get the Hashes as output in the CLI like:
hashme
myhash
If there's no such option, this is ok too:
hashme:1
hashme:2
myhash:1
I didn't find any relevant command for this in the Redis API.
Any suggestions?
You can use the SCAN command to get all keys from Redis. Then for each key, use the TYPE command to check if it's a hash.
UPDATE:
With Redis 6.0, the SCAN command supports a TYPE option, which you can use to scan all keys of a specified type:
SCAN 0 TYPE hash
Also, never use the KEYS command in a production environment! It's a dangerous command that might block Redis for a long time.
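Putting the two parts of this answer together, a sketch with redis-py that collects the distinct prefixes using the TYPE option (splitting on ':' is an assumption based on the example keys):

import redis

r = redis.Redis(decode_responses=True)

prefixes = set()
# _type requires Redis >= 6.0, where SCAN gained the TYPE option
for key in r.scan_iter(_type="hash", count=1000):
    prefixes.add(key.split(":")[0])

print(sorted(prefixes))  # e.g. ['hashme', 'myhash']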
keys *
works for me; you can try it.
The idea of Redis (and other K/V stores) is for you to build an index; the database won't do it for you. It's a major difference from relational databases, and one that contributes to better performance.
So when your app creates a hash, put its key into a SET. When your app deletes a hash, remove its key from the SET. And then to get the list of hash IDs, just use SMEMBERS to get the content of the SET.
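A minimal sketch of that index pattern with redis-py (the set name hash:index is an assumption):

import redis

r = redis.Redis(decode_responses=True)

def create_hash(key, mapping):
    r.hset(key, mapping=mapping)
    r.sadd("hash:index", key)    # maintain the index on create

def delete_hash(key):
    r.delete(key)
    r.srem("hash:index", key)    # keep the index in sync on delete

hash_ids = r.smembers("hash:index")  # all hash IDs, no KEYS/SCAN needed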
connection.keys('*') will bring back all the keys irrespective of data type, as everything in Redis is stored in key/value form.
For Redis in Python, you can use the commands below to retrieve keys from the Redis db:

def standardize_list(bytelist):
    # redis-py returns bytes; decode each key to a UTF-8 string
    return [x.decode('utf-8') for x in bytelist]

>>> standardize_list(r.keys())
['hat:1236154736', 'hat:56854717', 'hat:1326692461']

Here the r variable is the Redis connection object.

Redis scan match performance with large number of keys?

I can't find any info about Redis SCAN with MATCH.
Does it mean that if I have 500,000 keys it will iterate over all of them one by one and check if they match the pattern? Or does it have some other clever trick to pull only the relevant keys?
If it actually scans them all, how is that performance-wise?
Thanks
SCAN is basically an alternative to the KEYS command, which is blocking. It returns a cursor, and with that cursor you need to scan again, and the process continues until the cursor comes back as 0. And yes, it does examine the whole keyspace: MATCH only filters the entries after they are fetched from the table, so it saves network traffic, not scanning work. Duplicates are also possible, so you need to handle them in the app logic, which means even if you have only 1 million keys and you ask for 10,000 items in each scan, it can take a hundred or more calls.
So it's a trade-off: instead of using KEYS, which is a blocking command but quick, you can use SCAN, which is slower in comparison but will not block the production environment, and still achieves what you need.
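A sketch of the raw cursor loop with duplicate handling, using redis-py (the pattern and COUNT are placeholders):

import redis

r = redis.Redis(decode_responses=True)

cursor, seen = 0, set()
while True:
    # COUNT is a hint for the work done per call, not an exact page size
    cursor, keys = r.scan(cursor, match="myprefix:*", count=10000)
    for key in keys:
        if key not in seen:   # SCAN may return the same key twice
            seen.add(key)
            # process key here
    if cursor == 0:           # a cursor of 0 means the iteration is complete
        break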
Hope this helps

paging through entries in redis hash

I couldn't find a way to "page" through redis hashes (doc).
I've got ~5 million hash entries in one Redis db. I am trying to iterate through all of them without having to resort to building a list of entry keys.
Can this be achieved?
Since all the Redis hash commands require the key, you need to store your set of keys to page through your hashes.
See my answer to this question for an example of key iteration using extra sets.
There is no way to avoid storing extra sets (or lists) and still iterate over a huge number of keys. The KEYS command is not an option.
I had exactly the same requirement for Redis hash pagination, and yes, it is possible to page through a Redis hash using the HSCAN command. Detailed documentation is at SCAN.
Usage: HSCAN <key> <cursor> COUNT <page-size>
The cursor passed initially is 0, and each call returns a new cursor plus roughly a page of data (COUNT is a hint, not an exact page size). The returned cursor needs to be passed in the subsequent call to fetch the next batch, until it comes back as 0.
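For example, with redis-py (the hash name and page size are placeholders; because COUNT is only a hint, pages may vary in size):

import redis

r = redis.Redis(decode_responses=True)

cursor = 0
while True:
    cursor, page = r.hscan("myhash", cursor, count=100)
    for field, value in page.items():
        print(field, value)   # process one "page" of hash fields
    if cursor == 0:           # cursor 0 signals the end of the iteration
        break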