Alternatives to slow DEL of a large key - Redis

There is async UNLINK in the upcoming Redis 4, but until then, what are some good alternatives for deleting large set keys with no or minimal blocking?
Is RENAME to some unique name followed by EXPIRE 1 second a good solution? RENAME first, so that the original key name becomes available for use. Freeing the memory right away is not of immediate concern; Redis can do async garbage collection when it can.

EXPIRE will not eliminate the delay, only delay it until the server actually expires the value (note that Redis uses an approximate expiration algorithm). Once the server gets to actually expiring the value, it will issue a DEL command that will block the server until the value is deleted.
If you are unable to use v4's UNLINK, the best way you could go about deleting a large set is by draining it incrementally. This can be easily accomplished with a server-side Lua script to reduce the bandwidth, such as this one:
local target = KEYS[1]
local count = tonumber(ARGV[1]) or 100
-- SPOP with a count returns an array of popped members (empty when the set is gone)
local reply = redis.call('SPOP', target, count)
if reply and #reply > 0 then
    return #reply
else
    return nil
end
To drain the set, call the script above repeatedly with the name of the key to be deleted, with or without a count argument, until you get a nil Redis reply.
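For example, a minimal redis-py driver loop for the script above might look like this (a sketch only; the key name "huge:set" and the batch size of 500 are assumptions):
import redis

r = redis.Redis()  # assumed local Redis instance

# The same drain script as above, popping up to `count` members per call
DRAIN = """
local target = KEYS[1]
local count = tonumber(ARGV[1]) or 100
local reply = redis.call('SPOP', target, count)
if reply and #reply > 0 then
    return #reply
else
    return nil
end
"""

drain = r.register_script(DRAIN)  # caches the script (SCRIPT LOAD / EVALSHA)

# Keep popping batches until the script returns nil (None), i.e. the set is empty
while drain(keys=["huge:set"], args=[500]) is not None:
    pass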

Related

Can I rotate cache expiration in a Redis cluster

I have a redis cluster with several replica nodes, holding a cache of a time-consuming complex db query. The cache expires every minute, and lately with enough traffic volume I've had client timeouts while the cache is rebuilding and waiting for that complex db query to complete.
What I'd like to do is set it up so one node expires every even minute while the other expires every odd minute; this way, if one node is rebuilding the cache, the other node can serve it. Does Redis have such a feature, or is there a recommended workaround for a scenario like this? I couldn't find any docs on this. Thank you!
In a Redis cluster, the primary will expire the key and instruct its replicas to expire it too by propagating a DEL command to them over the replication link.
If you want the value to be always available for your clients, then you need a process that refreshes your key at the cadence you want, and use the expiration/cache-miss scenario as a fallback in case the refresh process fails.
If you really want to use two keys and have them expire every other minute, you can use EXPIREAT or its precise version PEXPIREAT. But this sounds unnecessary.
You can use TTL (or PTTL) to check how much time a key has left to live.
If your clients access the cache key in bursts and you just want to avoid some of them getting a cache miss every minute and therefore timing out, you can fetch both the value and the TTL; if the TTL is below a reasonable threshold, respond to your client immediately and then trigger the query to refresh the key.
You can use a simple Lua script to query both the value and the TTL of the key with one request to the Redis server. You can also do the same with pipelining or transactions; I just like to promote Lua scripting as it is a more powerful tool.
local val = {}
val[1] = redis.call('GET', KEYS[1])
if val[1] then
    val[2] = redis.call('PTTL', KEYS[1])
    return val
else
    return false
end
You use it as:
EVAL "local val = {} val[1] = redis.call('GET', KEYS[1]) if val[1] then val[2] = redis.call('PTTL', KEYS[1]) return val else return false end" 1 data
1) "queryResult"
2) (integer) 1664
You get both the value and the TTL, and then you can trigger a proactive refresh if your cache key is close to expiring.
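As a rough client-side sketch of that idea in Python with redis-py (the key name, the threshold, and the rebuild_cache/trigger_background_refresh helpers are hypothetical, not part of the answer above):
import redis

r = redis.Redis()  # assumed local Redis instance

# Same Lua snippet as above: return the value and its TTL in one round trip
GET_WITH_TTL = """
local val = {}
val[1] = redis.call('GET', KEYS[1])
if val[1] then
    val[2] = redis.call('PTTL', KEYS[1])
    return val
else
    return false
end
"""
get_with_ttl = r.register_script(GET_WITH_TTL)

def fetch(key="data", refresh_below_ms=5000):
    reply = get_with_ttl(keys=[key])
    if reply is None:
        return rebuild_cache(key)        # hypothetical: cache miss, run the slow query now
    value, pttl = reply
    if pttl < refresh_below_ms:
        trigger_background_refresh(key)  # hypothetical: refresh asynchronously, reply first
    return value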

Error on Write operation (code 22) after calling Truncate. - C# client

When I try to use Aerospike client Write() I obtain this error:
22 AS_PROTO_RESULT_FAIL_FORBIDDEN
The error occurs only when the Write operation is called after a Truncate() and only on specific keys.
I tried to:
change the key type (string, long, small numbers, big numbers)
change the Key type passed (Value, long, string)
change the retries number on WritePolicy
add a delay (200ms, 500ms) before every write
generate completely new keys (GUID.NewGuid().ToString())
None of these solved the problem, so I think the only remaining cause is the Truncate operation.
The error is systematic; for the same set of keys, it fails on exactly the same keys.
The error also occurs when, after calling Truncate, I wait X seconds and the Management Console shows the object count on the set as "0".
I have to wait minutes (1 to 5) before running the process again to be sure the problem is gone.
The cluster has 3 nodes with a replication factor of 2 and SSD persistence.
I'm using the NuGet C# Aerospike.Client v 3.4.4
Running the process on a single local node (docker, in memory) does not give any error.
How can I know when the Truncate() process (the delete operation behind it) is completely terminated and I can safely use the Set ?
[Solution]
As suggested, our devops engineer checked the time synchronization. He found that NTP was not enabled on the machine images (by mistake).
Enabled it. Tested again. No more errors.
Thanks,
Alex
Sounds like a potential issue with time synchronization across nodes; make sure you have NTP set up correctly. That would be my only guess at this point, especially as you mention it does work on a single node. The truncate command will capture the current time (if you don't specify a time) and will use that to prevent records written 'prior' to that time from being written. Check under /opt/aerospike/smd/truncate.smd (from the top of my head, sorry if not exactly this path) to see the timestamp of the truncate command on each node, and check the time across the different nodes.
[Thanks #kporter for the comment. So the time would be the same in all truncate.smd files, but a time discrepancy between machines would then still cause writes to fail against some of the nodes.]

How to synchronise multiple writers on Redis?

I have multiple writers overwriting the same key in Redis. How do I guarantee that only the chosen one writes last?
Can I perform write synchronisation in Redis without synchronising the writers first?
Background:
In my system a single dispatcher sends work to various workers. Each worker then writes the result to Redis, overwriting the same key. I need to be sure that only the last worker that received work from the dispatcher writes to Redis.
Use an ordered set (ZSET): add your entry with a score equal to the unix timestamp, then delete all but the top rank.
A Redis sorted set is a set where each entry also has a score. The set is ordered according to the score, and the position of an element in the sorted set is called its rank.
In order:
Remove all the entries with a score equal to or lower than the one you are adding (ZREMRANGEBYSCORE). Since you are adding to a set, if your value is a duplicate your new entry would be ignored; you want instead to keep the entry with the highest rank.
Add your value to the zset (ZADD).
Delete by rank all the entries but the one with the highest rank (ZREMRANGEBYRANK).
You should do all of this inside a transaction (pipeline).
Example in python:
# timestamp contains the time when the dispatcher sent a message to this worker
key = "key_zset:%s"%id
pipeline = self._redis_connection.db.pipeline(transaction=True)
pipeline.zremrangebyscore(key, 0, t)  # Avoid duplicate Scores and identical data
pipeline.zadd(key, t, "value")
pipeline.zremrangebyrank(key, 0, -2)
pipeline.execute(raise_on_error=True)
If I were you, I would use redlock.
Before you write to that key, you acquire the lock for it, then update it and then release the lock.
I use Node.js, so it would look something like this (not actually correct code, but you get the idea):
Promise.all(startPromises)
  .bind(this)
  .then(acquireLock)
  .then(withLock)
  .then(releaseLock)
  .catch(handleErr)

function acquireLock(key) {
  return redis.rl.lock(`locks:${key}`, 3000)
}

function withLock(lock) {
  this.lock = lock
  // do stuff here after getting the lock
}

function releaseLock() {
  this.lock.unlock()
}
You can use a Redis pipeline with a transaction.
Redis is a single-threaded server and executes commands synchronously. When a pipeline with a transaction is used, the server executes all the commands in the pipeline atomically.
Transactions
MULTI, EXEC, DISCARD and WATCH are the foundation of transactions in Redis. They allow the execution of a group of commands in a single step, with two important guarantees:
All the commands in a transaction are serialized and executed sequentially. It can never happen that a request issued by another client is served in the middle of the execution of a Redis transaction. This guarantees that the commands are executed as a single isolated operation.
A simple example in Python (redis-py), using WATCH/MULTI so the read-modify-write is atomic:
with redis_client.pipeline(transaction=True) as pipe:
    pipe.watch("mykey")            # reads execute immediately while the key is watched
    val = int(pipe.get("mykey"))
    val = val * val % 10
    pipe.multi()                   # start queuing the transactional commands
    pipe.set("mykey", val)
    pipe.execute()                 # raises WatchError if "mykey" changed in the meantime

What Redis data type fits best for the following example

I have following scenario:
Fetch array of numbers (from REDIS) conditionally
For each number do some async stuff (fetch something from DB based on number)
For each thing in result set from DB do another async stuff
Periodically repeat 1. 2. 3. because new numbers will constantly be added to the Redis structure. Those numbers represent Unix timestamps in milliseconds, so out of the box they will always be sorted by time of addition.
Conditionally means: fetch those Unix timestamps from Redis that are less than or equal to the current Unix timestamp in milliseconds (Date.now()).
The question is which Redis data type fits this use case best, keeping in mind that this code will be scaled up to N instances, so N instances will share access to a single Redis instance. To share the load equally, each instance will read, for example, the first (oldest) 5 numbers from Redis. Numbers are unique (adding the same number should fail silently), so a Redis SET seems like a good choice, but reading the first M elements from a Redis set seems impossible.
To prevent two different instances of the code from reading the same numbers, the Redis read operation should be atomic: it should read the numbers and delete them. If any async operation fails on a specific number (steps 2. and 3.), the number should be added back to Redis to be handled again. It should be re-added to the head, not to the end, so it is handled again as soon as possible. As far as I know, SADD would push it to the tail.
SMEMBERS key would read everything; it looks like a hammer to me. I would need some application logic to get the first five, check which are less than or equal to Date.now(), delete those, and somehow wrap everything in a single transaction. Besides that, the set cardinality can be huge.
SSCAN sounds interesting, but I don't have a clue how it works in a "scaled" environment like the one described above. Besides that, per the Redis docs: The SCAN family of commands only offer limited guarantees about the returned elements, since the collection that we incrementally iterate can change during the iteration process. As described above, the collection will change frequently.
A more appropriate data structure would be the Sorted Set - members have a float score that is very suitable for storing a timestamp, and you can perform range searches (i.e. anything less than or equal to a given value).
The relevant starting points are the ZADD, ZRANGEBYSCORE and ZREMRANGEBYSCORE commands.
To ensure atomicity when reading and removing members, you can choose between the following options: Redis transactions, a Redis Lua script and, in the next version (v4), a Redis module.
Transactions
Using transactions simply means running the following commands from your instances:
MULTI
ZRANGEBYSCORE <keyname> -inf <now-timestamp>
ZREMRANGEBYSCORE <keyname> -inf <now-timestamp>
EXEC
Where <keyname> is your key's name and <now-timestamp> is the current time.
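For illustration, the same transaction in Python with redis-py might look like this (a sketch; the key name "timestamps" is an assumption):
import time
import redis

r = redis.Redis()  # assumed local Redis instance
now_ms = int(time.time() * 1000)

pipe = r.pipeline(transaction=True)                  # MULTI ... EXEC
pipe.zrangebyscore("timestamps", "-inf", now_ms)     # fetch everything due so far
pipe.zremrangebyscore("timestamps", "-inf", now_ms)  # and remove it atomically
due, removed = pipe.execute()                        # values are only returned after EXEC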
Lua script
A Lua script can be cached and runs embedded in the server, so in some cases it is a preferable approach. It is definitely the best approach for short snippets of atomic logic if you need flow control (remember that a MULTI transaction returns the values only after execution). Such a script would look as follows:
local r = redis.call('ZRANGEBYSCORE', KEYS[1], '-inf', ARGV[1])
redis.call('ZREMRANGEBYSCORE', KEYS[1], '-inf', ARGV[1])
return r
To run this, first cache it using SCRIPT LOAD and then call it with EVALSHA like so:
EVALSHA <script-sha> 1 <key-name> <now-timestamp>
Where <script-sha> is the sha1 of the script returned by SCRIPT LOAD.
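With redis-py, the SCRIPT LOAD / EVALSHA bookkeeping can be handled for you by register_script (a sketch; the key name "timestamps" is an assumption):
import time
import redis

r = redis.Redis()  # assumed local Redis instance

POP_DUE = """
local r = redis.call('ZRANGEBYSCORE', KEYS[1], '-inf', ARGV[1])
redis.call('ZREMRANGEBYSCORE', KEYS[1], '-inf', ARGV[1])
return r
"""

pop_due = r.register_script(POP_DUE)  # SCRIPT LOAD once, EVALSHA on later calls
due = pop_due(keys=["timestamps"], args=[int(time.time() * 1000)])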
Redis modules
In the near future, once v4 is GA you'll be able to write and use modules. Once this becomes a reality, you'll be able to use this module we've made that provides the ZPOP command and could be extended to cover this use case as well.

How do I delete all keys matching a specified key pattern using StackExchange.Redis?

I've got about 150,000 keys in a Redis cache, and need to delete > 95% of them - all keys matching a specific key prefix - as part of a cache rebuild. As I can see it, there are three ways to achieve this:
Use server.Keys(pattern) to pull out the entire key list matching my prefix pattern, and iterate through the keys calling KeyDelete for each one.
Maintain a list of keys in a Redis set - each time I insert a value, I also insert the key in the corresponding key set, and then retrieve these sets rather than using Keys. This would avoid the expensive Keys() call, but still relies on deleting tens of thousands of records one by one.
Isolate all of my volatile data in a specific numbered database, and just flush it completely at the start of a cache rebuild.
I'm using .NET and the StackExchange.Redis client - I've seen solutions elsewhere that use the CLI or rely on Lua scripting, but nothing that seems to address this particular use case - have I missed a trick, or is this just something you're not supposed to do with Redis?
(Background: Redis is acting as a view model in front of the Microsoft Dynamics CRM API, so the cache is populated on first run by pulling around 100K records out of CRM, and then kept in sync by publishing notifications from within CRM whenever an entity is modified. Data is cached in Redis indefinitely and we're dealing with a specific scenario here where the CRM plugins fail to fire for a period of time, which causes cache drift and eventually requires us to flush and rebuild the cache.)
Both options 2 & 3 are reasonable.
Steer clear of option 1. KEYS really is slow and only gets slower as your keyspace grows.
I'd normally go for 2 (without Lua; including Lua would increase the learning curve for supporting the solution, which of course is fine when justified and assuming its existence is clear/documented), but 3 could definitely be a contender: fast and simple, as long as you can be sure you won't exceed the configured DB limit.
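As an illustration of option 2 (a hedged sketch in Python with redis-py rather than StackExchange.Redis; the index key name "cache:keys" and the batch size are assumptions):
import redis

r = redis.Redis()  # assumed local Redis instance
INDEX = "cache:keys"  # hypothetical set that tracks every cache key we insert

def cache_set(key, value):
    pipe = r.pipeline(transaction=True)
    pipe.set(key, value)
    pipe.sadd(INDEX, key)  # record the key so we can find it later without KEYS
    pipe.execute()

def cache_flush(batch=500):
    # Pop tracked keys in batches and delete them, avoiding one huge blocking call
    while True:
        keys = r.spop(INDEX, batch)
        if not keys:
            break
        r.delete(*keys)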
Use scanStream instead of keys and it will work like a charm.
Docs - https://redis.io/commands/scan
The code below gets you an array of keys starting with LOGIN::, and you can then loop through the array and execute the Redis DEL command to delete the corresponding keys.
Example code in Node.js:
const Redis = require('ioredis');
const redis = new Redis();

function scanKeys(pattern) {
  return new Promise((resolve, reject) => {
    const keysArray = [];
    const stream = redis.scanStream({
      match: pattern
    });
    stream.on("data", (keys = []) => {
      for (const key of keys) {
        if (!keysArray.includes(key)) {
          keysArray.push(key);
        }
      }
    });
    stream.on("error", reject);
    stream.on("end", () => resolve(keysArray));
  });
}

// e.g. scanKeys("LOGIN::*").then(keys => keys.length && redis.del(...keys));