Efficiently delete RedisKeys in bulk via wildcard pattern - redis

Problem:
I need to efficiently delete keys from my Redis Cache using a wildcard pattern. I don't need atomicity; eventual consistency is acceptable.
Tech stack:
.NET 6 (async all the way through)
StackExchange.Redis 2.6.66
Redis Server 6.2.6
I currently have ~500k keys in Redis.
I'm not able to use RedisJSON for various reasons
Example:
I store the following 3 STRING types with keys:
dailynote:getitemsforuser:region:sw:user:123
dailynote:getitemsforuser:region:fl:user:123
dailynote:getitemsforuser:region:sw:user:456
...
where each STRING stores JSON like so:
> dump dailynote:getitemsforuser:region:fl:user:123
"{\"Name\":\"john\",\"Age\":22}"
The original solution used the KeysAsync method to retrieve the list of keys to delete via a wildcard pattern. Since the Redis Server is 6.x, the SCAN feature is being used by KeysAsync internally by the StackExchange.Redis nuget.
Original implementation used a wildcard pattern dailynote:getitemsforuser:region:*. As one would expect, this solution didn't scale well and we started seeing RedisTimeoutExceptions.
I'm aware of the "avoid this in PROD if you can" and have seen Marc Gravell respond to a couple other questions/issues on SO and StackExchange.Redis GitHub. The only potential alternative I could think of is to use a Redis SET to "track" each RedisKey and then retrieve the list of values from the SET (which are the keys I need to remove). Then delete the SET as well as the returned keys.
Potential Solution?:
Create a Redis SET with a key of dailynote:getitemsforuser with a value which is the key of the form dailynote:getitemsforuser:region:XX...
The SET would look like:
dailynote:getitemsforuser (KEY)
dailynote:getitemsforuser:region:sw:user:123 (VALUE)
dailynote:getitemsforuser:region:fl:user:123 (VALUE)
dailynote:getitemsforuser:region:sw:user:456 (VALUE)
...
I would still have each individual STRING type as well:
dailynote:getitemsforuser:region:sw:user:123
dailynote:getitemsforuser:region:fl:user:123
dailynote:getitemsforuser:region:sw:user:456
...
when it is time to do the "wildcard" remove, I get the members of the dailynote:getitemsforuser SET, then call RemoveAsync passing the members of the set as the RedisKey[]. Then call RemoveAsync with the key of the SET (dailynote:getitemsforuser)
I'm looking for feedback on how viable of a solution this is, alternative ideas, gotchas, and suggestions for improvement. TIA
UPDATE
Added my solution I went with below...

The big problem with both KEYS and SCAN with Redis is that they require a complete scan of the massive hash table that stores every Redis key. Even if you use a pattern, it still needs to check each entry in that hash table to see if it matches.
Assuming you are calling SADD when you are also setting the value in your key—and thus avoiding the call to SCAN—this should work. It is worth noting that calls to SMEMBERS to get all the members of a Set can also cause issues if the Set is big. Redis—being single-threaded—will block while all the members are returned. You can mitigate this by using SSCAN instead. StackExchange.Redis might do this already. I'm not sure.
You might also be able to write a Lua script that reads the Set and UNLINKs all the keys atomically. This would reduce network but could tie Redis up if this takes too long.

I ended up using the solution I suggested above where I use a Redis SET with a known/fixed key to "track" each of the necessary keys.
When a key that needs to be tracked is added, I call StackExchange.Redis.IDatabase.SetAddAsync (SADD) while calling StackExchange.Redis.IDatabase.HashSetAsync (HSET) for adding the "tracked" key (along with its TTL).
When it is time to remove the "tracked" key, I first call StackExchange.Redis.IDatabase.SetScanAsync (SSCAN) (with a page size of 250) iterating on the IAsyncEnumerable and call StackExchange.Redis.IDatabase.KeyDeleteAsync (HDEL) on chunks of the members of the SET. I then call StackExchange.Redis.IDatabase.KeyDeleteAsync on the actual key of the SET itself.
Hope this helps someone else.

Related

Remodeling Redis structure as NoSQL suggests would record a single key set object?

Remodeling Redis structure as NoSQL for a find by value solution suggests to record a single key set object?
My initial problem as "find all keys by value" in Redis. And for my surprise, that's not possible as it sounds, or it not a good option considering Redis use.
Searching in SO, I found this question Find key by value the answer for it is a remodeling structure approach.
My scenario
I have keys which are random UUID generated in code which store a boolean value. The UUID represents an request-id for an application which will return the success of this request after being processed. Then my application will store this request-id until it's not processed - there are some republishing feature if a 2 minutes timeout no response occurs for this request-id.
So, the data stored in Redis seems like:
44f672a0-36da-4906-9fa0-d3ee0b0f1989 true
33749e7e-5e62-4340-8914-cb9f6eed6111 false
and so on...
In some point of my code I should find all keys not processed. Which is a problem with this structure, I should have to scan all keys and for each key find the associated value. It's like a 2 querying per key - not a good approach.
Solution scenario
So, according to this question and answer I should store a single key named false with a set of values which are my request-id. So it would looks like:
false [33749e7e-5e62-4340-8914-cb9f6eed6111, 36b1eb8f-1576-49e7-a95a-ab852cc2624d ...]
So here we solved the find by value problem. Since I just have a single key false I'll found all not keys not processed.
But now I have to update this set key all the time I receive a success processing request instead of deleting a single key.
Does this updating set scenario could be a performance problem?
The idea is not to have a large unprocessed request-id. This set aims to be small in size and values
I think this is a design problem not a driver or code problem, but I'm using Java with Jedis to communicate with Redis.

Use set or just create keys in redis to check existence?

I can think of two ways of checking existence using redis:
Use the whole database as a 'set', and just SET a key and checking existence by GETing it (or using EXISTS as mentioned in the comment by #Sergio Tulentsev)
Use SADD to add all members to a key and check existence by SISMEMBER
Which one is better? Will it be a problem, compared to the same amount of keys in a single set, if I choose the first method and the number of keys in a database gets larger?
In fact, besides these two methods, you can also use the HASH data structure with HEXISTS command (I'll call this method as the third solution).
All these solutions are fast enough, and it's NOT a problem if you have a large SET, HASH, or keyspace.
So, which one should we use? It depends on lots of things...
Does the key has value?
Keys of both the first and the third solution can have value, while the second solution CANNOT.
So if there's no value for each key, I'd prefer the second solution, i.e. SET solution. Otherwise, you have to use the first or third solution.
Does the value has structure?
If the value is NOT raw string, but a data structure, e.g. LIST, SET. You have to use the first solution, since HASH's value CAN only be raw string.
Do you need to do set operations?
If you need to do intersection, union or diff operations on multiple data sets, you should use the second solution. Redis has built-in commands for these operations, although they might be slow commands.
Memory efficiency consideration
Redis takes more memory-efficient encoding for small SET and HASH. So when you have lots of small data sets, take the second and the third solution can save lots of memory. See this for details.
UPDATE
Do you need to set TTL for these keys?
As #dizzyf points out in the comment, if you need to set TTL for these keys, you have to use the first solution. Because items of HASH and SET DO NOT have expiration property. You can only set TTL for the entire HASH or SET, NOT their elements.

Redis PFADD to check a exists-in-set query

I have a requirement to process multiple records from a queue. But due to some external issues the items may sporadically occur multiple times.
I need to process items only once
What I planned to use is PFADD into redis every record ( as a md5sum) and then see if that returns success. If that shows no increment then the record is a duplicate else process the record.
This seems pretty straightforward , but I am getting too many false positives while using PFADD
Is there a better way to do this ?
Being the probabilistic data structure that it is, Redis' HyperLogLog exhibits 0.81% standard error. You can reduce (but never get rid of) the probability for false positives by using multiple HLLs, each counting a the value of a different hash function on your record.
Also note that if you're using a single HLL there's no real need to hash the record - just PFADD as is.
Alternatively, use a Redis Set to keep all the identifiers/hashes/records and have 100%-accurate membership tests with SISMEMBER. This approach requires more (RAM) resources as you're storing each processed element, but unless your queue is really huge that shouldn't be a problem for a modest Redis instance. To keep memory consumption under control, switch between Sets according to the date and set an expiry on the Set keys (another approach is to use a single Sorted Set and manually remove old items from it by keeping their timestamp in the score).
In general in distributed systems you have to choose between processing items either :
at most once
at least once
Processing something exactly-once would be convenient however this is generally impossible.
That being said there could be acceptable workarounds for your specific use case, and as you suggest storing the items already processed could be an acceptable solution.
Be aware though that PFADD uses HyperLogLog, which is fast and scales but is approximate about the count of the items, so in this case I do not think this is what you want.
However if you are fine with having a small probability of errors, the most appropriate data structure here would be a Bloom filter (as described here for Redis), which can be implemented in a very memory-efficient way.
A simple, efficient, and recommended solution would be to use a simple redis key (for instance a hash) storing a boolean-like value ("0", "1" or "true", "false") for instance with the HSET or SET with the NX option instruction. You could also put it under a namespace if you wish to. It has the added benefit of being able to expire keys also.
It would avoid you to use a set (not the SET command, but rather the SINTER, SUNION commands), which doesn't necessarily work well with Redis cluster if you want to scale to more than one node. SISMEMBER is still fine though (but lacks some features from hashes such as time to live).
If you use a hash, I would also advise you to pick a hash function that has fewer chances of collisions than md5 (a collision means that two different objects end up with the same hash).
An alternative approach to the hash would be to assign an uuid to every item when putting it in the queue (or a squuid if you want to have some time information).

ServiceStack.Redis SearchKeys

I am using the ServiceStack.Redis client on C#.
I added about 5 million records of type1 using the following pattern a::name::1 and 11 million records of type2 using the pattern b::RecId::1.
Now I am using redis typed client as client = redis.As<String>. I want to retrieve all the keys of type2. I am using the following pattern:
var keys = client.SearchKeys("b::RecID::*");
But it takes forever (approximately 3-5 mins) to retrieve the keys.
Is there any faster and more efficient way to do this?
You should work hard to avoid the need to scan the keyspace. KYES is literally a server stopper, but even if you have SCAN available: don't do that. Now, you could choose to keep the keys of things you have available in a set somewhere, but there is no SRANGE etc - in 2. you'd have to use SMEMBERS, which is still going to need to return a few million records - but at least they will all be available. In later server versions, you have access to SCAN (think: KEYS) and SSCAN (think: SMEMBERS), but ultimately you simply have the issue of wanting millions of rows, which is never free.
If possible, you could mitigate the impact by using a master/slave pair, and running the expensive operations on the slave. At least other clients will be able to do something while you're killing the server.
The keys command in Redis is slow (well, not slow, but time consuming). It also blocks your server from accepting any other command while it's running.
If you really want to iterate over all of your keys take a look at the scan command instead- although I have no idea about ServiceStack for this
You can use the SCAN command, make a loop search, where each search is restricted to a smaller number of keys. For a complete example, refer to this article: http://blog.bossma.cn/csharp/nservicekit-redis-support-scan-solution/

Redis keys function for match with multiple pattern

How i can find keys with multiple match pattern, for example i've keys with
foo:*, event:*, poi:* and article:* patterns.
how i find keys with redis keys function for match with foo:* or poi:* pattern, its like
find all keys with preffix foo:* or poi:*
You should not do this. KEYS is mainly a debug command. It is not supposed to be used for anything else.
Redis is not a database supporting ad-hoc queries: you are supposed to provide access paths for the data you put into Redis (using extra set or hash or zset indexes).
If you really need to run arbitrary boolean expressions on keys to select data, I would suggest to do it offline by using the rdb-redis-tools package.