Lua Scripts vs MULTI/EXEC in Redis

Is there any reason to use Lua scripts for atomicity in Redis rather than MULTI/EXEC-style transactions?
I see some implementations specifically choose Lua scripts when atomicity is needed, but wouldn't it be the same with MULTI/EXEC, or is it just a preference?

Lua is useful (and the only way) when you need the result of one operation in order to use it in another. With MULTI/EXEC, you get all the results at the end of the transaction as an array; there is no intermediate response you can use in the middle of the transaction.
Let's say you have a list: you LPOP one element and use that element as the key for an INCRBY on another key. You can't do that transactionally with MULTI/EXEC (you may use WATCH with them to fail if the watched key is modified). You need to know all the required parameters before starting the transaction. If you read the value first and then assign it, the assignment happens client side, not server side, which can cause a race condition.
In Lua (with EVAL), you can do that assignment server side:
-- Pop an element from the list and use it as the key of a counter
local elt = redis.call('LPOP', KEYS[1])
local result = redis.call('INCRBY', elt, 2)
return result
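You would invoke it like this (mylist is a hypothetical list key; its popped element is then used as the counter key):
EVAL "local elt = redis.call('LPOP', KEYS[1]) local result = redis.call('INCRBY', elt, 2) return result" 1 mylist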
There may be some cases where either of them is a valid option, but in others you need Lua.

Related

Efficiently delete RedisKeys in bulk via wildcard pattern

Problem:
I need to efficiently delete keys from my Redis Cache using a wildcard pattern. I don't need atomicity; eventual consistency is acceptable.
Tech stack:
.NET 6 (async all the way through)
StackExchange.Redis 2.6.66
Redis Server 6.2.6
I currently have ~500k keys in Redis.
I'm not able to use RedisJSON for various reasons
Example:
I store the following 3 STRING types with keys:
dailynote:getitemsforuser:region:sw:user:123
dailynote:getitemsforuser:region:fl:user:123
dailynote:getitemsforuser:region:sw:user:456
...
where each STRING stores JSON like so:
> get dailynote:getitemsforuser:region:fl:user:123
"{\"Name\":\"john\",\"Age\":22}"
The original solution used the KeysAsync method to retrieve the list of keys to delete via a wildcard pattern. Since the Redis server is 6.x, KeysAsync uses the SCAN command internally in the StackExchange.Redis NuGet package.
The original implementation used the wildcard pattern dailynote:getitemsforuser:region:*. As one would expect, this didn't scale well and we started seeing RedisTimeoutExceptions.
I'm aware of the "avoid this in PROD if you can" advice and have seen Marc Gravell respond to a couple of other questions/issues on SO and on the StackExchange.Redis GitHub. The only potential alternative I could think of is to use a Redis SET to "track" each RedisKey, retrieve the list of values from the SET (which are the keys I need to remove), and then delete the SET as well as the returned keys.
Potential Solution?:
Create a Redis SET with a key of dailynote:getitemsforuser whose members are the keys of the form dailynote:getitemsforuser:region:XX...
The SET would look like:
dailynote:getitemsforuser (KEY)
dailynote:getitemsforuser:region:sw:user:123 (VALUE)
dailynote:getitemsforuser:region:fl:user:123 (VALUE)
dailynote:getitemsforuser:region:sw:user:456 (VALUE)
...
I would still have each individual STRING type as well:
dailynote:getitemsforuser:region:sw:user:123
dailynote:getitemsforuser:region:fl:user:123
dailynote:getitemsforuser:region:sw:user:456
...
When it is time to do the "wildcard" remove, I get the members of the dailynote:getitemsforuser SET, then call RemoveAsync passing the members of the set as the RedisKey[]. Then I call RemoveAsync with the key of the SET itself (dailynote:getitemsforuser).
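At the command level the proposal looks roughly like this (a sketch of the Redis traffic, not the exact StackExchange.Redis calls):
> SADD dailynote:getitemsforuser dailynote:getitemsforuser:region:sw:user:123
> SET dailynote:getitemsforuser:region:sw:user:123 "{\"Name\":\"john\",\"Age\":22}"
...
> SMEMBERS dailynote:getitemsforuser
> DEL dailynote:getitemsforuser:region:sw:user:123 dailynote:getitemsforuser:region:fl:user:123 ...
> DEL dailynote:getitemsforuser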
I'm looking for feedback on how viable of a solution this is, alternative ideas, gotchas, and suggestions for improvement. TIA
UPDATE
Added my solution I went with below...
The big problem with both KEYS and SCAN in Redis is that they require a complete scan of the massive hash table that stores every Redis key. Even if you use a pattern, every entry in that hash table still has to be checked against it.
Assuming you are calling SADD when you are also setting the value in your key (and thus avoiding the call to SCAN), this should work. It is worth noting that calls to SMEMBERS to get all the members of a Set can also cause issues if the Set is big: Redis, being single-threaded, will block while all the members are returned. You can mitigate this by using SSCAN instead. StackExchange.Redis might do this already; I'm not sure.
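For example, you can walk the Set in pages instead of one big SMEMBERS call (the cursor values are illustrative):
> SSCAN dailynote:getitemsforuser 0 COUNT 250
1) "2816"
2) 1) "dailynote:getitemsforuser:region:sw:user:123"
   2) "dailynote:getitemsforuser:region:fl:user:123"
> SSCAN dailynote:getitemsforuser 2816 COUNT 250
...
You keep calling SSCAN with the returned cursor until it comes back as 0.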
You might also be able to write a Lua script that reads the Set and UNLINKs all the keys atomically. This would reduce network round trips, but could tie Redis up if it takes too long.
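A minimal sketch of such a script for a single (non-cluster) instance, assuming the tracking Set's key is passed as KEYS[1]:
-- Unlink every key tracked in the Set, then the Set itself.
-- Note: on Redis Cluster this fails with CROSSSLOT unless all the
-- tracked keys hash to the same slot as the Set.
local tracked = redis.call('SMEMBERS', KEYS[1])
for _, k in ipairs(tracked) do
  redis.call('UNLINK', k)
end
redis.call('UNLINK', KEYS[1])
return #tracked
Invoked as EVAL "<script>" 1 dailynote:getitemsforuser.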
I ended up using the solution I suggested above where I use a Redis SET with a known/fixed key to "track" each of the necessary keys.
When a key that needs to be tracked is added, I call StackExchange.Redis.IDatabase.SetAddAsync (SADD) at the same time as StackExchange.Redis.IDatabase.HashSetAsync (HSET), which adds the "tracked" key (along with its TTL).
When it is time to remove the "tracked" keys, I first call StackExchange.Redis.IDatabase.SetScanAsync (SSCAN) with a page size of 250, iterate the IAsyncEnumerable, and call StackExchange.Redis.IDatabase.KeyDeleteAsync (DEL/UNLINK) on chunks of the members of the SET. I then call StackExchange.Redis.IDatabase.KeyDeleteAsync on the key of the SET itself.
Hope this helps someone else.

Compare Redis commands: MULTI and MGET

There are two systems sharing a Redis database; one system only reads Redis, the other updates it.
The reading system is so busy that Redis can't handle the load. To reduce the number of requests to Redis, I found MGET, but I also found MULTI.
I'm sure MGET will reduce the number of requests, but will MULTI do the same? I think MULTI forces the Redis server to keep some information about the transaction and to collect the commands in the transaction from the client one by one, so the total number of requests sent stays the same; only the results are returned together. Is that right?
And if I just read keyA, keyB, keyC in MULTI while the other (writing) system changes keyB's value, what will happen?
Short Answer: You should use MGET
MULTI is used for transactions, and it won't reduce the number of requests. Also, the MULTI command MIGHT be deprecated in the future, since there's a better choice: Lua scripting.
So If I just read KeyA, keyB, keyC in "multi" when the other write system changed KeyB's value, what will happen?
Since the MULTI (with EXEC) command ensures a transaction, all three GET commands (read operations) execute atomically. If the update happens before the transaction executes, you'll get the new value; otherwise, you'll get the old value.
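You can see why MULTI doesn't save round trips from the protocol itself: each queued command is still sent individually, and the results only arrive at EXEC (values hypothetical):
> MULTI
OK
> GET keyA
QUEUED
> GET keyB
QUEUED
> GET keyC
QUEUED
> EXEC
1) "a-value"
2) "b-value"
3) "c-value"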
By the way, there's another option to reduce RTT: PIPELINE. However, in your case, MGET should be the best option.
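For comparison, MGET fetches all the keys in a single request and a single round trip (values hypothetical):
> MGET keyA keyB keyC
1) "a-value"
2) "b-value"
3) "c-value"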

Why can't my Redis Lua script atomically update keys on different Redis Cluster nodes?

I have a Redis Cluster consisting of multiple nodes. I want to update 3 different keys in a single atomic operation. My Lua script is like:
local u1 = redis.call('incrby', KEYS[1], ARGV[1])
local u2 = redis.call('incrby', KEYS[2], ARGV[1])
local u3 = redis.call('incrby', KEYS[3], ARGV[1])
And I fired it with:
EVAL script 3 key1 key2 key3 arg
But I got the error message:
WARN Resp(AppErr CROSSSLOT Keys in request don't hash to the same slot)
The above operations cannot be done, and the updates fail. It seems I cannot modify keys on different nodes with a single Lua script. But according to the docs:
All Redis commands must be analyzed before execution to determine which keys the command will operate on. In order for this to be true for EVAL, keys must be passed explicitly. This is useful in many ways, but especially to make sure Redis Cluster can forward your request to the appropriate cluster node.
Note this rule is not enforced in order to provide the user with opportunities to abuse the Redis single instance configuration, at the cost of writing scripts not compatible with Redis Cluster.
So I think that as long as I follow the key-passing rule, the script should be compatible with Redis Cluster. I wonder what the problem is here, and what I should do to update all the keys in a single script.
I'm afraid you've misunderstood the documentation. (And I agree that it's not very clear.)
Redis operations, whether single commands or Lua scripts, can only work when all the keys they touch are on the same server (in a cluster, in the same hash slot). The purpose of the key-passing rule is to let Cluster servers figure out where to send the script, and to fail fast when the keys don't all map to the same slot (which is what happened in your case).
So it's your responsibility to make sure that all the keys you want to operate on are located on the same server. The way to do that is to use hash tags, which force keys to hash to the same slot. See the documentation for more details.
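With a hash tag, only the part of the key between { and } is hashed, so keys that share that part are guaranteed to land in the same slot. Your call would become something like (the {mycounters} tag is hypothetical):
EVAL script 3 {mycounters}:key1 {mycounters}:key2 {mycounters}:key3 arg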

Redis PFADD to check an exists-in-set query

I have a requirement to process multiple records from a queue, but due to some external issues an item may sporadically occur multiple times.
I need to process each item only once.
What I planned was to PFADD every record into Redis (as an md5sum) and then check the return value. If it shows no increment, the record is a duplicate; otherwise, I process the record.
This seems pretty straightforward, but I am getting too many false positives while using PFADD.
Is there a better way to do this?
Being the probabilistic data structure that it is, Redis' HyperLogLog exhibits a 0.81% standard error. You can reduce (but never eliminate) the probability of false positives by using multiple HLLs, each counting the value of a different hash function applied to your record.
Also note that if you're using a single HLL there's no real need to hash the record: just PFADD it as is.
Alternatively, use a Redis Set to keep all the identifiers/hashes/records and get 100%-accurate membership tests with SISMEMBER. This approach requires more RAM since you're storing each processed element, but unless your queue is really huge that shouldn't be a problem for a modest Redis instance. To keep memory consumption under control, switch between Sets according to the date and set an expiry on the Set keys. (Another approach is to use a single Sorted Set, keeping each member's timestamp in its score and manually removing old members.)
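A minimal command-level sketch of the dated-Set approach (key name, hash, and TTL are illustrative):
> SADD dedup:2024-01-15 5f4dcc3b5aa765d61d8327deb882cf99
(integer) 1
> SADD dedup:2024-01-15 5f4dcc3b5aa765d61d8327deb882cf99
(integer) 0
> EXPIRE dedup:2024-01-15 172800
(integer) 1
A reply of 1 from SADD means the member was new (process the record); 0 means it was already in the Set (skip it as a duplicate).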
In general, in distributed systems you have to choose between processing items either:
at most once
at least once
Processing something exactly once would be convenient; however, this is generally impossible.
That being said, there could be acceptable workarounds for your specific use case, and, as you suggest, storing the items already processed could be an acceptable solution.
Be aware, though, that PFADD uses a HyperLogLog, which is fast and scales well but only approximates the count of items, so in this case I do not think it is what you want.
However, if you are fine with a small probability of errors, the most appropriate data structure here would be a Bloom filter (available for Redis, for instance via the RedisBloom module), which can be implemented in a very memory-efficient way.
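With RedisBloom, for instance, the membership check and the insert are a single command (key name hypothetical):
> BF.ADD dedup:items 5f4dcc3b5aa765d61d8327deb882cf99
(integer) 1
BF.ADD replies 1 when the item was (probably) not seen before and 0 when it may already have been added, so a 0 means the record can be skipped.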
A simple, efficient, and recommended solution would be to use a plain Redis key storing a boolean-like value ("0"/"1" or "true"/"false"), for instance with HSET, or with SET and the NX option. You could also put it under a namespace if you wish. It has the added benefit that keys can be expired.
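A minimal sketch with SET ... NX and a one-day TTL (the dedup: key name is hypothetical):
> SET dedup:5f4dcc3b5aa765d61d8327deb882cf99 1 NX EX 86400
OK
> SET dedup:5f4dcc3b5aa765d61d8327deb882cf99 1 NX EX 86400
(nil)
OK means the key was absent (first time seen: process the record); (nil) means it already exists (duplicate: skip).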
It would also avoid using a Set (not the SET command, but rather the multi-key Set commands such as SINTER and SUNION), which doesn't necessarily work well with Redis Cluster if you want to scale to more than one node. SISMEMBER is still fine, though (but lacks some features, such as a per-member time to live).
If you use a hash, I would also advise you to pick a hash function with a lower chance of collisions than MD5 (a collision means that two different objects end up with the same hash).
An alternative approach to hashing is to assign a UUID to every item when putting it in the queue (or a squuid if you want to embed some time information).

What is the conventional way to store objects in a sorted set in redis?

What is the most convenient/fastest way to implement a sorted set in Redis where the values are objects, not just strings?
Should I just store object ids in the sorted set and then query each of them individually by key, or is there a way to store the objects directly in the sorted set, i.e. must the value be a string?
It depends on your needs. If you need to share this data with other zsets/structures and want to write the value only once per change, you can put an id as the zset member and add a hash to store the object. However, this implies additional queries when you read from the zset (one ZRANGE plus n HGETALLs for n members), while writing and synchronising the value between many structures stays cheap (you only update the hash corresponding to the value).
But if the data is "self-contained", with no or few accesses from outside the zset, you can serialize your object to a format of your choice (JSON, MessagePack, Kryo...) and store it directly as the zset member. This way you get better read performance (a single ZRANGE query at O(log(N)+M), which is about the best you can get), but you may have to duplicate the value in other zsets/structures if you need to read or write it elsewhere, which also means maintaining that synchronisation by hand.
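A command-level sketch of the two options (key names and payloads hypothetical):
Option 1, id as the zset member with the object in a hash:
> ZADD scores 100 user:123
> HSET user:123 name "john" age "22"
Reading is one ZRANGE scores 0 -1 followed by one HGETALL per returned id.
Option 2, serialized object as the zset member:
> ZADD scores 100 "{\"id\":123,\"name\":\"john\",\"age\":22}"
Reading is a single ZRANGE, but since members are identified by their value, changing the object means removing and re-adding the member (ZREM then ZADD) everywhere it is stored.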
Redis has good documentation on performance of each command, so check what queries you would write and calculate the total cost, so that you can make a good comparison of these two options.
Also, don't forget that Redis comes with optimistic locking (WATCH), so if you need pessimistic locking (because of contention, for instance) you will have to do it by hand and/or with Lua scripts. If you need a lot of synchronisation, the first option seems better (lower read performance, but still good; fewer queries and less complexity on writes). If your values don't change much and memory space is not a problem, the second option gives better performance on reads (you can duplicate the value in Redis and synchronise the copies periodically, for instance).
Short answer: yes, everything must be stored as a string.
Longer answer: you can serialize your object into any text-based format of your choosing. Most people choose MessagePack or JSON because they are compact and serializers are available in just about any language.