How do I delete all keys matching a specified key pattern using StackExchange.Redis? - redis

I've got about 150,000 keys in a Redis cache, and need to delete > 95% of them - all keys matching a specific key prefix - as part of a cache rebuild. As far as I can see, there are three ways to achieve this:
Use server.Keys(pattern) to pull out the entire key list matching my prefix pattern, and iterate through the keys calling KeyDelete for each one.
Maintain a list of keys in a Redis set - each time I insert a value, I also insert the key in the corresponding key set, and then retrieve these sets rather than using Keys. This would avoid the expensive Keys() call, but still relies on deleting tens of thousands of records one by one.
Isolate all of my volatile data in a specific numbered database, and just flush it completely at the start of a cache rebuild.
I'm using .NET and the StackExchange.Redis client - I've seen solutions elsewhere that use the CLI or rely on Lua scripting, but nothing that seems to address this particular use case - have I missed a trick, or is this just something you're not supposed to do with Redis?
(Background: Redis is acting as a view model in front of the Microsoft Dynamics CRM API, so the cache is populated on first run by pulling around 100K records out of CRM, and then kept in sync by publishing notifications from within CRM whenever an entity is modified. Data is cached in Redis indefinitely and we're dealing with a specific scenario here where the CRM plugins fail to fire for a period of time, which causes cache drift and eventually requires us to flush and rebuild the cache.)

Both options 2 & 3 are reasonable.
Steer clear of option 1. KEYS really is slow and only gets slower as your keyspace grows.
I'd normally go for 2 (without Lua - adding Lua would increase the learning curve needed to support the solution, which of course is fine when justified, and assuming its existence is clear and documented), but 3 could definitely be a contender: fast and simple, as long as you can be sure you won't exceed the configured database limit.
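For illustration, here's a rough StackExchange.Redis sketch of both suggestions - the database number and the "myprefix:all-keys" tracking set are placeholder assumptions, and FlushDatabase requires allowAdmin=true on the connection string:

var conn = ConnectionMultiplexer.Connect("localhost,allowAdmin=true");
var db = conn.GetDatabase();

// Option 3: keep the volatile data in its own numbered database and flush it in one go.
var server = conn.GetServer(conn.GetEndPoints()[0]);
server.FlushDatabase(database: 5); // hypothetical DB holding only the cache data

// Option 2: track every cache key in a Redis set as you write it, then delete the tracked keys.
const string keySetKey = "myprefix:all-keys";
RedisKey[] tracked = Array.ConvertAll(db.SetMembers(keySetKey), v => (RedisKey)(string)v);
if (tracked.Length > 0)
{
    db.KeyDelete(tracked); // a single DEL covering all tracked keys
}
db.KeyDelete(keySetKey);

If one big DEL blocks for longer than you'd like, split the tracked array into smaller batches and delete them in a loop.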

Use scanStream instead of keys and it will work like a charm.
Docs - https://redis.io/commands/scan
The code below collects an array of keys starting with LOGIN:: and then deletes the corresponding keys with a Redis DEL call.
Example code in Node.js (ioredis):
const Redis = require("ioredis");
const redis = new Redis();

const keysArray = [];
const stream = redis.scanStream({
  match: "LOGIN::*"
});

stream.on("data", (keys = []) => {
  for (const key of keys) {
    // SCAN may return the same key more than once, so de-duplicate.
    if (!keysArray.includes(key)) {
      keysArray.push(key);
    }
  }
});

stream.on("end", async () => {
  if (keysArray.length > 0) {
    await redis.del(keysArray); // delete all collected keys
  }
  console.log(`Deleted ${keysArray.length} keys`);
});

Related

Is hash always preferable if I'm always getting multiple fields in redis?

Let's say I am checking information about some of my users every second. I need to take an action on some of those users that may take more than a second. Something like this:
#pseudocode
users = DB.query("SELECT * FROM users WHERE state=5");
users.forEach(user => {
    if (user.needToDoThing()) {
        user.doThatThing();
    }
});
I want to make sure I won't accidentally run doThatThing on a user who already has it running. I am thinking of solving it by setting cache keys based on the user ID as things are processed:
#pseudocode
runningUsers = redis.getMeThoseUsers();
users = DB.query("SELECT * FROM users WHERE state=5 AND id NOT IN (runningUsers)");
redis.setThoseUsers(users);
users.forEach(user => {
    if (user.needToDoThing()) {
        user.doThatThing();
    }
    redis.unsetThatUser(user);
});
I am unsure if I should...
Use one hash with a field per user
Use multiple keys with MSET and MGET
Is there a performance or business reason I'd want one over the other? I am assuming I should use a hash so I can use hgetall to know who is running on that hash vs doing a scan on something like runningusers:*. Does that seem right?
Generally speaking, option 1 (one hash with a field per user) is probably the best method in most cases, because you want to access all of the fields for the users at once, which a single HGETALL gives you.
With the 2nd option (multiple keys with MSET and MGET), you have to query Redis with the key name of every user whose details you want. MGET lets you fetch all of the values in one call, but you need to know each user's key name. It is more suitable when you only access a few of the entries at a time; the disadvantage is that it can be slower when you need to access all or most of them.
NOTE: With the 1st option you can't set a TTL for a single user, because Redis has no support for expiring individual fields inside a hash - you can only expire the hash as a whole. With the 2nd option, you can set a TTL for every single user.
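To make the trade-off concrete, here's a rough StackExchange.Redis sketch of both layouts (the "running-users" and "running-user:*" key names are placeholders, not anything from the question):

var db = ConnectionMultiplexer.Connect("localhost").GetDatabase();

// Option 1: a single hash with one field per running user - read everyone back with HGETALL.
db.HashSet("running-users", new[] { new HashEntry("42", "processing") });
HashEntry[] allRunning = db.HashGetAll("running-users");

// Option 2: one string key per running user - a per-user TTL works here as a safety net.
db.StringSet("running-user:42", "processing", expiry: TimeSpan.FromMinutes(5));
RedisValue[] some = db.StringGet(new RedisKey[] { "running-user:42", "running-user:43" }); // MGET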

Ravendb memory leak on query

I'm having a hard time solving an issue with RavenDB.
At work we have a process that tries to identify potential duplicates in our database within a specific collection (let's call it the users collection).
That means I'm iterating through the collection, and for each document there is a query that tries to find similar entities. As you can imagine, it's quite a long task to run.
My problem is that when the task starts running, the memory consumption of RavenDB goes higher and higher - it just keeps growing, and seems to continue until it reaches the system's maximum memory.
But that doesn't really make sense, since I'm only querying: I'm using one single index and the default page size (128).
Has anybody met a similar problem? I really have no idea what is going on in RavenDB, but it looks like a memory leak.
RavenDB version: 3.0.179
When I need to do massive operations on large collections, I follow these steps to prevent memory-usage problems (see the sketch below):
I use query streaming to extract all the ids of the documents that I want to process (with a dedicated session).
I open a new session for each id, load the document, and then do what I need.
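A minimal sketch of that pattern with the RavenDB 3.x client - the User class and the "Users/ByDuplicateCandidates" index name are placeholders, and property names may differ slightly between client versions:

var ids = new List<string>();
using (var session = documentStore.OpenSession())
{
    var query = session.Query<User>("Users/ByDuplicateCandidates");
    using (var enumerator = session.Advanced.Stream(query))
    {
        while (enumerator.MoveNext())
        {
            ids.Add(enumerator.Current.Key); // collect ids only, so nothing gets tracked in memory
        }
    }
}

foreach (var id in ids)
{
    // A fresh, short-lived session per document keeps the client's tracked-entity cache small.
    using (var session = documentStore.OpenSession())
    {
        var user = session.Load<User>(id);
        // ... run the duplicate-detection logic for this document ...
    }
}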
First, a recommendation: if you don't want duplicates, store them with a well-known ID. For example, suppose you don't want duplicate User objects. You'd store them with an ID that makes them unique:
var user = new User() { Email = "foo@bar.com" };
var id = "Users/" + user.Email; // A well-known ID
dbSession.Store(user, id);
Then, when you want to check for duplicates, just check against the well known name:
public string RegisterNewUser(string email)
{
    // Unlike .Query, the .Load call is ACID and never stale.
    var existingUser = dbSession.Load<User>("Users/" + email);
    if (existingUser != null)
    {
        return "Sorry, that email is already taken.";
    }

    dbSession.Store(new User { Email = email }, "Users/" + email);
    return "Welcome aboard!";
}
If you follow this pattern, you won't have to run complex queries or worry about stale indexes.
If this scenario can't work for you for some reason, then we can help diagnose your memory issues. But to diagnose that, we'll need to see your code.

Alternatives to slow DEL large key

There is an async UNLINK in the upcoming Redis 4, but until then, what are some good alternatives for DELeting large set keys with no or minimal blocking?
Is RENAME to some unique name, followed by EXPIRE 1 second, a good solution? RENAME first so that the original key name becomes available for use again. Freeing the memory right away is not of immediate concern; Redis can do the garbage collection asynchronously when it can.
EXPIRE will not eliminate the delay, only postpone it until the server actually expires the value (note that Redis uses an approximate expiration algorithm). Once the server gets to actually expiring the value, it will issue a DEL command that blocks the server until the value is deleted.
If you are unable to use v4's UNLINK, the best way you could go about deleting a large set is by draining it incrementally. This can be easily accomplished with a server-side Lua script to reduce the bandwidth, such as this one:
local target = KEYS[1]
local count = tonumber(ARGV[1]) or 100
local reply = redis.call('SPOP', target, count)
if reply and #reply > 0 then
    return #reply
else
    return nil
end
To drain the set, repeatedly call the script above with the name of the key to be deleted, with or without a count argument, until you get a nil Redis reply.
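With StackExchange.Redis, the drain loop could look roughly like this (the embedded script is the Lua above, and "big-set" plus the count of 500 are placeholder values):

const string drainScript = @"
    local reply = redis.call('SPOP', KEYS[1], tonumber(ARGV[1]) or 100)
    if reply and #reply > 0 then return #reply else return nil end";

var db = ConnectionMultiplexer.Connect("localhost").GetDatabase();
RedisResult popped;
do
{
    // Each call removes at most 500 members, so no single command blocks for long.
    popped = db.ScriptEvaluate(drainScript, new RedisKey[] { "big-set" }, new RedisValue[] { 500 });
} while (!popped.IsNull);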

What Redis data type fit the most for following example

I have the following scenario:
1. Fetch an array of numbers (from Redis) conditionally
2. For each number, do some async work (fetch something from the DB based on the number)
3. For each item in the result set from the DB, do more async work
Periodically repeat 1., 2. and 3., because new numbers will constantly be added to the Redis structure. Those numbers represent Unix timestamps in milliseconds, so out of the box they will always be sorted by time of addition.
Conditionally means: fetch those Unix timestamps from Redis that are less than or equal to the current Unix timestamp in milliseconds (Date.now()).
The question is which Redis data type fits this use case best, keeping in mind that this code will be scaled up to N instances, so N instances will share access to a single Redis instance. To share the load equally, each instance will read, for example, the first (oldest) 5 numbers from Redis. Numbers are unique (adding the same number should fail silently), so a Redis SET seems like a good choice, but reading the first M elements of a Redis set seems impossible.
To prevent two different instances of the code from reading the same numbers, the Redis read operation should be atomic: it should read the numbers and delete them. If any async operation fails for a specific number (steps 2. and 3.), the number should be added back to Redis to be handled again. It should be re-added at the head, not the end, so it gets handled again as soon as possible. As far as I know, SADD would push it to the tail.
SMEMBERS key would read everything; it looks like a hammer to me. I would need application logic to take the first five, check which are less than or equal to Date.now(), delete those, and somehow wrap everything in a single transaction. Besides that, the set cardinality can be huge.
SSCAN sounds interesting, but I don't have any clue how it works in a "scaled" environment like the one described above. Besides that, per the Redis docs: The SCAN family of commands only offer limited guarantees about the returned elements since the collection that we incrementally iterate can change during the iteration process. As described above, the collection will change frequently.
A more appropriate data structure would be the Sorted Set - members have a float score that is very suitable for storing a timestamp, and you can perform range searches (i.e. anything less than or equal to a given value).
The relevant starting points are the ZADD, ZRANGEBYSCORE and ZREMRANGEBYSCORE commands.
To ensure atomicity when reading and removing members, you can choose between the following options: Redis transactions, a Redis Lua script and, in the next version (v4), a Redis module.
Transactions
Using transactions simply means running the following on your instances:
MULTI
ZRANGEBYSCORE <keyname> -inf <now-timestamp>
ZREMRANGEBYSCORE <keyname> -inf <now-timestamp>
EXEC
Where <keyname> is your key's name and <now-timestamp> is the current time.
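If you're driving this from .NET, a rough StackExchange.Redis equivalent of that transaction could look like this (the "timestamps" key name is a placeholder):

var db = ConnectionMultiplexer.Connect("localhost").GetDatabase();
double now = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds();

var tran = db.CreateTransaction();
var readTask = tran.SortedSetRangeByScoreAsync("timestamps", double.NegativeInfinity, now);
var removeTask = tran.SortedSetRemoveRangeByScoreAsync("timestamps", double.NegativeInfinity, now);

if (tran.Execute())
{
    // Both commands ran inside MULTI/EXEC, so the queued tasks are already complete here.
    RedisValue[] claimed = readTask.Result; // the members this instance now owns
}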
Lua script
A Lua script can be cached and runs embedded in the server, so in some cases it is a preferable approach. It is definitely the best approach for short snippets of atomic logic if you need flow control (remember that a MULTI transaction returns the values only after execution). Such a script would look as follows:
local r = redis.call('ZRANGEBYSCORE', KEYS[1], '-inf', ARGV[1])
redis.call('ZREMRANGEBYSCORE', KEYS[1], '-inf', ARGV[1])
return r
To run this, first cache it using SCRIPT LOAD and then call it with EVALSHA like so:
EVALSHA <script-sha> 1 <key-name> <now-timestamp>
Where <script-sha> is the sha1 of the script returned by SCRIPT LOAD.
Redis modules
In the near future, once v4 is GA you'll be able to write and use modules. Once this becomes a reality, you'll be able to use this module we've made that provides the ZPOP command and could be extended to cover this use case as well.

Redis value update

I currently have a Redis data set with keys representing ids and values that are JSON. I need to add a new entity to the JSON for every userid (key). Is there any existing open-source tool? How should I proceed to update 1M keys of data?
There are a few possibilities:
Here's some pseudo code for doing this with Redis 2.6 Lua scripting.
for userid in users:
    EVAL 'local obj = cjson.decode(redis.call("GET", KEYS[1])); obj.subobj.newjsonkey = ARGV[1]; redis.call("SET", KEYS[1], cjson.encode(obj));' 1 userid "new value!"
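If you're doing this from .NET, roughly the same idea with StackExchange.Redis could look like this (users, subobj and newjsonkey are placeholders carried over from the pseudo code):

const string updateScript = @"
    local obj = cjson.decode(redis.call('GET', KEYS[1]))
    obj.subobj.newjsonkey = ARGV[1]
    redis.call('SET', KEYS[1], cjson.encode(obj))";

var db = ConnectionMultiplexer.Connect("localhost").GetDatabase();
foreach (string userid in users)
{
    db.ScriptEvaluate(updateScript, new RedisKey[] { userid }, new RedisValue[] { "new value!" });
}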
Short of that, you may need to stop the service and do this with GETs and SETs since you probably don't have a locking mechanism in place. If you can enforce a lock, see http://redis.io/commands/setnx
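If you can enforce a lock, a minimal SET NX sketch with StackExchange.Redis might look like this (the lock key name and expiry are placeholder choices):

var db = ConnectionMultiplexer.Connect("localhost").GetDatabase();
bool gotLock = db.StringSet("json-rewrite-lock", Environment.MachineName,
                            expiry: TimeSpan.FromMinutes(10), when: When.NotExists); // SET ... EX 600 NX
if (gotLock)
{
    try
    {
        // ... do the GET / modify / SET pass here ...
    }
    finally
    {
        db.KeyDelete("json-rewrite-lock");
    }
}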
There are also a few tools for updating an RDB file: https://github.com/sripathikrishnan/redis-rdb-tools and https://github.com/nrk/redis-rdb
Note: this answer was adapted from my answer to: Working with nested objects in Redis?