Set and delete or just overwrite? - redis

I have keys in Redis that after being read once are no longer needed. Should I delete them or just let them sit in the database until I need the key again?
I guess the question is which costs more: unneeded data sitting in the database, or a delete operation?
The SET command overwrites any string data that already exists at the specified key, so in a sense there is already a delete-and-write command, and I could likewise use a get-and-delete pattern.
Or I could simply call DEL after GET. My question is: should I, or should I just let the key sit there?

If you're only dealing with one key at a time, then overwriting (SET) and deleting (DEL) both have a time complexity of O(1), per the Redis documentation. I'm personally a fan of deleting an entry as soon as I'm finished with it, since it is low cost and keeps storage to a minimum. That said, both operations are cheap, so overwriting shouldn't be an issue either :)
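If you do want the read and the delete in a single step, a client such as StackExchange.Redis can do it with GETDEL on Redis 6.2+. This is only an illustration (the question doesn't name a client, and the helper below is made up):

using System.Threading.Tasks;
using StackExchange.Redis;

// Illustrative only; "db" is an existing IDatabase and the key name is made up.
public static class ReadOnceExample
{
    public static async Task<RedisValue> ReadAndForgetAsync(IDatabase db, RedisKey key)
    {
        // On Redis 6.2+ this maps to GETDEL: read and remove in a single O(1) command.
        return await db.StringGetDeleteAsync(key);

        // On older servers the equivalent is simply:
        //   var value = await db.StringGetAsync(key);
        //   await db.KeyDeleteAsync(key);   // also O(1)
    }
}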

Related

Redis - prioritizing two sets of keys (eviction policy)

We have two separate sets of keys in one Redis instance (set1 and set2). All keys in both sets have an expire time set.
If the Redis instance hits the max memory cap, we want keys from set1 (and only from it!) to be evicted to free some memory, but we need a guarantee that keys from set2 will not be evicted before their time limit and thus will always expire in the normal way.
Is there any way to achieve this?
Thanx in advance!
Redis doesn't provide such fine-grained control over cache invalidation. You're restricted to the following options:
noeviction: New values aren't saved when the memory limit is reached. When a database uses replication, this applies to the primary database.
allkeys-lru: Keeps most recently used keys; removes least recently used (LRU) keys.
allkeys-lfu: Keeps frequently used keys; removes least frequently used (LFU) keys.
volatile-lru: Removes least recently used keys with the expire field set to true.
volatile-lfu: Removes least frequently used keys with the expire field set to true.
allkeys-random: Randomly removes keys to make space for the new data added.
volatile-random: Randomly removes keys with the expire field set to true.
volatile-ttl: Removes keys with the expire field set to true and the shortest remaining time-to-live (TTL) value.
The best you could do would be to set the policy to noeviction and then write your own cache-invalidation process. Or maybe set it to volatile-ttl and then have set2 be non-volatile keys that you remove manually. A fair bit of work and possibly not worth it.
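To make the volatile-ttl idea concrete, here is a rough sketch. Assumptions not stated in the question: StackExchange.Redis as the client, maxmemory-policy set to volatile-ttl on the server, and made-up helper names:

using System;
using System.Threading.Tasks;
using StackExchange.Redis;

// Rough sketch only: assumes the server is configured with
// maxmemory-policy volatile-ttl, and that "db" is an existing IDatabase.
public static class TwoTierWrites
{
    // set1 keys carry a TTL, so they are eviction candidates under volatile-ttl.
    public static Task WriteSet1Async(IDatabase db, RedisKey key, RedisValue value) =>
        db.StringSetAsync(key, value, expiry: TimeSpan.FromMinutes(30));

    // set2 keys carry no TTL, so volatile-ttl never evicts them; some process
    // of your own has to delete them when they are no longer needed.
    public static Task WriteSet2Async(IDatabase db, RedisKey key, RedisValue value) =>
        db.StringSetAsync(key, value);
}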
The documentation describing these options also provides some good insight into how Redis actually removes things and might be worth perusing.

Efficiently delete RedisKeys in bulk via wildcard pattern

Problem:
I need to efficiently delete keys from my Redis Cache using a wildcard pattern. I don't need atomicity; eventual consistency is acceptable.
Tech stack:
.NET 6 (async all the way through)
StackExchange.Redis 2.6.66
Redis Server 6.2.6
I currently have ~500k keys in Redis.
I'm not able to use RedisJSON for various reasons
Example:
I store the following 3 STRING types with keys:
dailynote:getitemsforuser:region:sw:user:123
dailynote:getitemsforuser:region:fl:user:123
dailynote:getitemsforuser:region:sw:user:456
...
where each STRING stores JSON like so:
> get dailynote:getitemsforuser:region:fl:user:123
"{\"Name\":\"john\",\"Age\":22}"
The original solution used the KeysAsync method to retrieve the list of keys to delete via a wildcard pattern. Since the Redis server is 6.x, KeysAsync uses SCAN internally in the StackExchange.Redis NuGet package.
The original implementation used the wildcard pattern dailynote:getitemsforuser:region:*. As one would expect, this didn't scale well and we started seeing RedisTimeoutExceptions.
I'm aware of the "avoid this in PROD if you can" advice and have seen Marc Gravell respond to a couple of other questions/issues on SO and the StackExchange.Redis GitHub. The only potential alternative I could think of is to use a Redis SET to "track" each RedisKey, retrieve the list of values from the SET (which are the keys I need to remove), and then delete the SET as well as the returned keys.
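For reference, a simplified sketch of what the original KeysAsync-based approach looked like (illustrative only; the class, method, and variable names here are made up, not the production code):

using System.Linq;
using System.Threading.Tasks;
using StackExchange.Redis;

// Simplified version of the original approach; names are illustrative.
public static class WildcardDelete
{
    public static async Task DeleteByPatternAsync(ConnectionMultiplexer mux, string pattern)
    {
        IDatabase db = mux.GetDatabase();
        IServer server = mux.GetServer(mux.GetEndPoints().First());

        // KeysAsync drives SCAN under the hood, but SCAN still has to walk the
        // entire keyspace, which is where the timeouts came from.
        await foreach (RedisKey key in server.KeysAsync(pattern: pattern, pageSize: 500))
        {
            await db.KeyDeleteAsync(key, CommandFlags.FireAndForget);
        }
    }
}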
Potential Solution?:
Create a Redis SET with a key of dailynote:getitemsforuser with a value which is the key of the form dailynote:getitemsforuser:region:XX...
The SET would look like:
dailynote:getitemsforuser (KEY)
dailynote:getitemsforuser:region:sw:user:123 (VALUE)
dailynote:getitemsforuser:region:fl:user:123 (VALUE)
dailynote:getitemsforuser:region:sw:user:456 (VALUE)
...
I would still have each individual STRING type as well:
dailynote:getitemsforuser:region:sw:user:123
dailynote:getitemsforuser:region:fl:user:123
dailynote:getitemsforuser:region:sw:user:456
...
When it is time to do the "wildcard" remove, I get the members of the dailynote:getitemsforuser SET, then call RemoveAsync passing the members of the set as the RedisKey[]. Then I call RemoveAsync with the key of the SET itself (dailynote:getitemsforuser).
I'm looking for feedback on how viable this solution is, alternative ideas, gotchas, and suggestions for improvement. TIA
UPDATE
Added my solution I went with below...
The big problem with both KEYS and SCAN with Redis is that they require a complete scan of the massive hash table that stores every Redis key. Even if you use a pattern, it still needs to check each entry in that hash table to see if it matches.
Assuming you are calling SADD when you are also setting the value in your key (and thus avoiding the call to SCAN), this should work. It is worth noting that calls to SMEMBERS to get all the members of a Set can also cause issues if the Set is big: Redis, being single-threaded, will block while all the members are returned. You can mitigate this by using SSCAN instead. StackExchange.Redis might do this already; I'm not sure.
You might also be able to write a Lua script that reads the Set and UNLINKs all the keys atomically. This would reduce network round trips, but could tie Redis up if the script takes too long.
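Something along these lines, for example (untested sketch using StackExchange.Redis's ScriptEvaluateAsync; the helper and key names are made up):

using System.Threading.Tasks;
using StackExchange.Redis;

// Hedged sketch of the Lua idea: atomically UNLINK every member of the
// tracking Set, then the Set itself. A long-running script still blocks
// Redis, so only use this when the Set is reasonably small.
public static class TrackedDelete
{
    private const string Script = @"
        local members = redis.call('SMEMBERS', KEYS[1])
        for i = 1, #members do
            redis.call('UNLINK', members[i])
        end
        redis.call('UNLINK', KEYS[1])
        return #members";

    public static async Task<long> DeleteTrackedAsync(IDatabase db, RedisKey trackingSet)
    {
        RedisResult result = await db.ScriptEvaluateAsync(Script, new[] { trackingSet });
        return (long)result; // number of tracked keys that were removed
    }
}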
I ended up using the solution I suggested above where I use a Redis SET with a known/fixed key to "track" each of the necessary keys.
When a key that needs to be tracked is added, I call StackExchange.Redis.IDatabase.SetAddAsync (SADD) while calling StackExchange.Redis.IDatabase.HashSetAsync (HSET) for adding the "tracked" key (along with its TTL).
When it is time to remove the "tracked" keys, I first call StackExchange.Redis.IDatabase.SetScanAsync (SSCAN) with a page size of 250, iterate over the IAsyncEnumerable, and call StackExchange.Redis.IDatabase.KeyDeleteAsync (DEL) on chunks of the members of the SET. I then call StackExchange.Redis.IDatabase.KeyDeleteAsync on the key of the SET itself.
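In rough code, the shape of it looks like this (simplified sketch; the class and method names are illustrative, not the exact production code):

using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using StackExchange.Redis;

// Simplified sketch of the tracking-set approach above; "db" is an existing
// IDatabase and the names are illustrative.
public static class DailyNoteCache
{
    private static readonly RedisKey TrackingSet = "dailynote:getitemsforuser";

    // Write path: store the entry as a hash with a TTL and record its key in the SET.
    public static async Task AddAsync(IDatabase db, RedisKey key, HashEntry[] fields, TimeSpan ttl)
    {
        await db.HashSetAsync(key, fields);               // HSET
        await db.KeyExpireAsync(key, ttl);                // TTL on the tracked key
        await db.SetAddAsync(TrackingSet, (string)key);   // SADD
    }

    // "Wildcard" removal: SSCAN the SET in pages and delete members in chunks.
    public static async Task RemoveAllAsync(IDatabase db, int chunkSize = 250)
    {
        var chunk = new List<RedisKey>(chunkSize);
        await foreach (RedisValue member in db.SetScanAsync(TrackingSet, pageSize: 250))
        {
            chunk.Add((string)member);
            if (chunk.Count == chunkSize)
            {
                await db.KeyDeleteAsync(chunk.ToArray()); // DEL a chunk of keys
                chunk.Clear();
            }
        }
        if (chunk.Count > 0)
            await db.KeyDeleteAsync(chunk.ToArray());

        await db.KeyDeleteAsync(TrackingSet);             // finally drop the SET itself
    }
}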
Hope this helps someone else.

How many keys can be deleted in a single redis del command?

I want to delete multiple redis keys using a single delete command on redis client.
Is there any limit in the number of keys to be deleted?
I will be using del key1 key2 ...
There's no hard limit on the number of keys, but the query buffer limit does provide a bound. Connections are closed when the buffer hits 1 GB, so practically speaking this is somewhat difficult to hit.
Docs:
https://redis.io/topics/clients
However! You may want to take into consideration that Redis is single-threaded: a time-consuming command will block all other commands until it completes. Depending on your use case, this may be a good reason to "chunk" your deletes into groups of, say, 1000 at a time, because it allows other commands to squeeze in between. (Whether or not the blocking is tolerable is something you'll need to determine based on your specific scenario.)
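For example, chunking with StackExchange.Redis might look like this (just an illustration; the question doesn't name a client, and the helper below is made up):

using System.Linq;
using System.Threading.Tasks;
using StackExchange.Redis;

// Illustrative only: delete a large list of keys in chunks so that each DEL
// stays short and other clients can run in between.
public static class ChunkedDelete
{
    public static async Task DeleteInChunksAsync(IDatabase db, RedisKey[] keys, int chunkSize = 1000)
    {
        foreach (var chunk in keys.Chunk(chunkSize)) // Enumerable.Chunk, .NET 6+
        {
            await db.KeyDeleteAsync(chunk); // one DEL with up to chunkSize keys
        }
    }
}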

Most efficient way to replace large redis keys

I have a large Redis sorted set. We need to re-index the data in the set daily, while clients actively request data from the set. My plan is to simply build a second set using a different key and then replace the existing key with the new one:
Build new "indexed" sorted set
RENAME "indexed" set to "live" to replace existing "live" set.
Looking at the RENAME documentation, it states:
If newkey already exists it is overwritten, when this happens RENAME executes an implicit DEL operation, so if the deleted key contains a very big value it may cause high latency even if RENAME itself is usually a constant-time operation.
I'm wondering, then, if it's better to rename the "live" sorted set (e.g. to "dead"), then rename the new "indexed" sorted set to "live" -- and pipeline those requests. And only then, issue a separate DEL command to delete the "dead" set:
Build new "indexed" sorted set
pipeline: RENAME existing "live" set to "dead"
pipeline: RENAME new "indexed" set to "live"
DEL "dead" set
ideas?
Using DEL, you are only postponing the problem: during the DEL, Redis blocks all other clients.
First, I'd investigate how big the problem actually is; for example, deleting a 3.5 GB ZSET key takes about 2 seconds on our staging system.
If it is a problem, split up the DEL by using ZREMRANGEBYRANK and ZCARD.
Pipelining is efficient (non-transactional, of course), so it helps to determine the total size up front with ZCARD and then issue N ZREMRANGEBYRANK commands (piped), each with a range of (for example) -10000 -1, ending with a final '0 -1' to clear whatever remains. As soon as all members are deleted, Redis automatically deletes the key (the Sorted Set) itself.
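In client code that could look roughly like this (sketch only, shown with StackExchange.Redis since the question doesn't name a client; for simplicity it awaits each slice rather than pipelining them as described above):

using System.Threading.Tasks;
using StackExchange.Redis;

// Illustrative sketch: shrink a huge sorted set in slices instead of one big DEL.
public static class IncrementalZsetDelete
{
    public static async Task DeleteSortedSetAsync(IDatabase db, RedisKey key, int sliceSize = 10_000)
    {
        long remaining = await db.SortedSetLengthAsync(key);            // ZCARD
        while (remaining > 0)
        {
            // ZREMRANGEBYRANK key -sliceSize -1: drop the last slice of members.
            await db.SortedSetRemoveRangeByRankAsync(key, -sliceSize, -1);
            remaining -= sliceSize;
        }
        // Once the last member is gone, Redis deletes the (now empty) key itself.
    }
}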
Hope this helps, TW

Which database operation is heaviest?

If I perform the CRUD operations on the same table, what is the heaviest operation in terms of performance?
People say DELETE and then INSERT is better than UPDATE in some cases; is this true? Does that make UPDATE the heaviest operation?
Like all things in life, it depends.
SQL Server uses WAL (write ahead logging) to maintain ACID (Atomicity, Consistency, Isolation, Durability) properties.
An insert needs to log entries for data page and index page changes. If page splits occur, it takes longer. The data is then written to the data file.
A delete marks the data and index pages for re-use; the data will still be there right after the operation.
An update is implemented as a delete plus an insert, and therefore generates roughly double the log entries.
What can help inserts is pre-allocating the space in the data file before running the job; auto-growing the data files is expensive.
In summary, I would expect updates on average to be the most expensive operation.
I am by no means an expert on the storage engine.
Please check out Paul Randal's blog at http://www.sqlskills.com and/or Kalen Delaney's SQL Server Internals book at http://sqlserverinternals.com/. These authors go in depth on all the cases that might come up.
It depends mostly on the foreign keys and indexes you have on the table. For deletion and insertion, every column that is a foreign key has to be checked against its foreign key references, and every index containing that column has to be updated.
If you do DELETE and then INSERT, that checking and index maintenance happens twice. On a really large table this can take a very long time, and in that case UPDATE will be MUCH faster.
That is, of course, assuming you have an index on the key you're filtering on in the UPDATE statement and you are not updating that key itself.
For a small table with almost no indexes/foreign keys the operations run so fast that it's not a big issue.