What will happen if 2 workers call ZREM on the same element of a sorted set at the same time? Will it return true to the worker which actually removes the element and false to the other to indicate it doesn't exist or will it return true to both? In other words is ZREM atomic internally?
Redis is (mostly) single-threaded, so all of its commands are atomic, and ZREM is no exception. Your question, however, is really about performing a "ZPOP" atomically, and there are two ways to do that.
Option 1: WATCH/MULTI/EXEC
In pseudo code, this is how an optimistic transaction would look:
:start
WATCH somekey
member = ZREVRANGE somekey 0 0
MULTI
ZREM somekey member
if not EXEC goto :start // or quit trying
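To make the retry loop concrete, here is a server-free Python sketch of the same optimistic pattern; the `Store` class and its version counter are invented stand-ins for a Redis sorted set and WATCH's modification check, not a real client API.

```python
# Minimal in-memory model of WATCH/MULTI/EXEC semantics (illustration only;
# `Store` and its fields are invented for this sketch, not a Redis API).
class Store:
    def __init__(self):
        self.zset = {}      # member -> score
        self.version = 0    # bumped on every write, like WATCH's dirty check

    def ztop(self):
        # Highest-scored member, like ZREVRANGE key 0 0.
        return max(self.zset, key=self.zset.get) if self.zset else None

    def zrem(self, member):
        self.zset.pop(member, None)
        self.version += 1

def zpop(store):
    while True:
        watched = store.version        # WATCH somekey
        member = store.ztop()          # ZREVRANGE somekey 0 0
        if member is None:
            return None
        if store.version != watched:   # another writer got in: EXEC would fail
            continue                   # goto :start
        store.zrem(member)             # MULTI / ZREM / EXEC
        return member

s = Store()
s.zset = {"foo": 5, "bar": 9}
print(zpop(s))  # -> bar (the highest-scored member)
```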
Option 2: Lua script
zpop.lua:
local member = redis.call('ZREVRANGE', KEYS[1], 0, 0)[1]
if not member then return 0 end
return redis.call('ZREM', KEYS[1], member)
redis-cli --eval zpop.lua somekey
Note - The Importance of Atomicity
In case you decide not to use these mechanisms that ensure atomicity, you'll be running into issues sooner than you think. Here's a possible scenario:
Process A                 Redis Server                 Process B
ZREVRANGE somekey 0 0 -->
<---------------------- foo
                        <--------------- ZADD somekey +inf bar
                        OK ------------->
ZREM somekey foo ------>
<---------------------- 1
In the above example, after A fetches foo, B inserts bar with an absurdly high score, so bar becomes the top element of the set. A, however, continues and removes foo, which is no longer at the top.
Related
Assume that there is a key K in Redis that is holding a list of values.
Many producer clients are adding elements to this list, one by one using LPUSH or RPUSH.
On the other hand, another set of consumer clients pops elements from the list, with one restriction: a consumer will attempt to pop N items only if the list contains at least N items. This ensures that the consumer holds exactly N items once the popping finishes.
If the list contains fewer than N items, consumers shouldn't attempt to pop from the list at all, because they wouldn't end up with N items.
If there is only one consumer client, it can simply run the LLEN command to check whether the list contains at least N items, and then remove N of them using LPOP/RPOP.
However, with many consumer clients there can be a race condition: several of them may read LLEN >= N and then pop items simultaneously, so each consumer may end up with fewer than N elements while the list in Redis is left empty.
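The race is easy to reproduce without a server. This Python sketch deterministically replays the interleaving: both consumers pass the LLEN-style check against a plain list, then split the pops between them, so neither ends up with N items (all names here are illustrative):

```python
# Deterministic replay of the race (plain Python lists stand in for Redis):
# both consumers pass the length check, then interleave their pops.
queue = [1, 2, 3]          # the shared list; LLEN == 3
N = 3

# Step 1: both consumers check len(queue) >= N, and both checks pass.
a_ok = len(queue) >= N
b_ok = len(queue) >= N

# Step 2: their pops interleave; each ends up with fewer than N items.
a_items, b_items = [], []
while queue:
    if a_ok and len(a_items) < N and queue:
        a_items.append(queue.pop(0))
    if b_ok and len(b_items) < N and queue:
        b_items.append(queue.pop(0))

print(a_items, b_items)  # -> [1, 3] [2]: neither consumer got N items
```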
Using a separate locking system seems to be one way to tackle this issue, but I was curious if this type of operation can be done only using Redis commands, such as Multi/Exec/Watch etc.
I checked the MULTI/EXEC approach, and it seems transactions do not support rollback. Also, all commands queued in a MULTI/EXEC transaction return 'QUEUED', so I won't be able to know whether the N LPOP calls I execute in the transaction will all return elements or not.
So all you need is an atomic way to check the list length and pop conditionally.
This is what Lua scripts are for, see EVAL command.
Here a Lua script to get you started:
local len = redis.call('LLEN', KEYS[1])
local n = tonumber(ARGV[1])
if len >= n then
    local res = {}
    for i = 1, n do
        res[i] = redis.call('LPOP', KEYS[1])
    end
    return res
else
    return false
end
Use as
EVAL "local len = redis.call('LLEN', KEYS[1]) \n local n = tonumber(ARGV[1]) \n if len >= n then \n local res = {} \n for i = 1, n do \n res[i] = redis.call('LPOP', KEYS[1]) \n end \n return res \n else \n return false \n end" 1 list 3
This will only pop ARGV[1] elements (the number after the key name) from the list if the list has at least that many elements.
Lua scripts run atomically, so there is no race condition between competing clients.
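The script's check-then-pop logic can be modeled in plain Python, with a lock standing in for the atomicity Redis gives Lua scripts (a sketch with invented names, not a Redis API):

```python
import threading

# Server-free model of the script's logic: check the length and pop N as one
# atomic step. The lock stands in for Redis's single-threaded script execution.
_lock = threading.Lock()

def pop_n_if_available(queue, n):
    """Pop exactly n items from the front, or nothing at all."""
    with _lock:
        if len(queue) < n:
            return None                 # like the script returning false
        return [queue.pop(0) for _ in range(n)]

q = [1, 2, 3, 4]
print(pop_n_if_available(q, 3))  # -> [1, 2, 3]
print(pop_n_if_available(q, 3))  # -> None (only one item left)
```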
As the OP pointed out in the comments, there is a risk of data loss, e.g. if power fails between the LPOPs and the script returning. You can use RPOPLPUSH instead of LPOP, storing the popped elements on a temporary list; you then also need some tracking, deletion, and recovery logic. Note that your client could also die, leaving some elements unprocessed.
You may want to take a look at Redis Streams. This data structure is ideal for distributing load among many clients. When used with consumer groups, it has a pending entries list (PEL) that acts as that temporary list.
Clients then call XACK to remove elements from the PEL once they are processed, so you are also protected from client failures.
Redis Streams are a good fit for the problem you are trying to solve; there is a free course about them.
You could use a prefetcher.
Instead of each consumer greedily picking items from the queue, which leads to the problem of 'water, water everywhere, but not a drop to drink', you could have a prefetcher that builds a packet of size N (6 in this example). When the prefetcher has a full packet, it places it in a separate packet queue (another Redis key holding a list of packets) and pops the items from the main queue in a single transaction. Essentially, what you wrote:
If there is only 1 Consumer client, the client can simply run LLEN
command to check if the list contains at least N items, and subtract N
using LPOP/RPOP.
If the prefetcher doesn't have a full packet, it does nothing and keeps waiting for the main queue size to reach 6.
On the consumer side, they will just query the prefetched packets queue and pop the top packet and go. It is always 1 pre-built packet (size=6 items). If there are no packets available, they wait.
On the producer side, no changes are required. They can keep inserting into the main queue.
BTW, there can be more than one prefetcher task running concurrently; they can synchronize access to the main queue among themselves.
Implementing a scalable prefetcher
A prefetcher implementation can be described using a buffet-table analogy: think of the main queue as a restaurant buffet table where guests pick up their food and leave. Etiquette demands that the guests queue up and wait their turn, and prefetchers do something analogous. Here's the algorithm:
Algorithm Prefetch
Begin
while true
check = main queue has 6 items or more // this is a queue read. no locks required
if(check == true)
obtain an exclusive lock on the main queue
if lock successful
begin a transaction
create a packet and fill it with top 6 items from
the queue after popping them
add the packet to the prefetch queue
if packet added to prefetch queue successfully
commit the transaction
else
rollback the transaction
end if
release the lock
else
// someone else has the excl lock, we should just wait
sleep for xx millisecs
end if
end if
end while
End
I am showing an infinite polling loop here only for simplicity. This could instead be implemented with a pub/sub pattern via Redis keyspace notifications: the prefetcher waits for a notification that the main queue key received an LPUSH, then executes the logic in the body of the while loop above.
There are other ways you could do this. But this should give you some ideas.
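A minimal in-memory sketch of the prefetcher's core step (plain Python deques stand in for the two Redis lists; `prefetch_once` and PACKET_SIZE = 6 are invented for this illustration):

```python
from collections import deque

# In-memory sketch of one prefetcher pass; deques stand in for the two Redis
# lists, and PACKET_SIZE = 6 follows the example in the text.
PACKET_SIZE = 6

def prefetch_once(main_queue, packet_queue):
    """Move one full packet from the main queue to the packet queue, if possible."""
    if len(main_queue) < PACKET_SIZE:
        return False                    # not enough items yet; keep waiting
    packet = [main_queue.popleft() for _ in range(PACKET_SIZE)]
    packet_queue.append(packet)         # consumers pop whole packets from here
    return True

main, packets = deque(range(8)), deque()
prefetch_once(main, packets)
print(packets[0], len(main))  # -> [0, 1, 2, 3, 4, 5] 2
```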
Let's suppose I want to pop 3 elements from a set. How do I ensure the pop happens only if 3 elements are present in the set, and otherwise get an error or some other message?
How do I use the SPOP command with the COUNT argument?
What you want is to call SCARD myKey to check the number of members, and based on the result call SPOP.
SPOP with COUNT will return up to COUNT members, meaning if your set only has one or two, they'll be SPOPed and returned.
You probably want to do this as one atomic operation, so you have to use a Lua script:
EVAL "if redis.call('SCARD', KEYS[1]) >= tonumber(ARGV[1]) then return redis.call('SPOP', KEYS[1], ARGV[1]) else return redis.error_reply(KEYS[1]..' does NOT have at least '..ARGV[1]..' members') end" 1 myKey myNumber
Let's take a look at the script:
if redis.call('SCARD', KEYS[1]) >= tonumber(ARGV[1]) then
return redis.call('SPOP', KEYS[1], ARGV[1])
else
return redis.error_reply(KEYS[1]..' does NOT have at least '..ARGV[1]..' members')
end
KEYS[1] refers to the key parameter, the set you're interested in. It is important to pass keys through parameters for your script to be supported in a Redis Cluster.
ARGV[1] is an additional argument to pass your number of desired members, in your question, it is 3.
The script is run atomically server-side within Redis, and it is compiled only once as Redis caches it internally.
You can use SCRIPT LOAD to load the script and then reuse it with EVALSHA, to also improve networking performance.
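The same guard can be modeled without a server. In this Python sketch, a plain set stands in for the Redis set and the function raises when there are too few members, mirroring the script's error reply (`spop_at_least` is an invented name):

```python
import random

# In-memory model of the guarded SPOP: only remove `count` random members when
# the set holds at least that many.
def spop_at_least(members, count):
    if len(members) < count:
        raise ValueError("set does not have at least %d members" % count)
    picked = random.sample(sorted(members), count)  # SPOP picks at random
    for m in picked:
        members.discard(m)
    return picked

s = {"a", "b", "c", "d"}
out = spop_at_least(s, 3)
print(len(out), len(s))  # -> 3 1
```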
Is there a good way to pop members from a Redis Sorted Set, similar to the LPOP command for Lists?
What I figured out for popping a message from a Sorted Set is using ZRANGE + ZREM; however, that is not atomic from the client side, so it needs a distributed lock when multiple threads access the set at the same time from different hosts.
Please suggest a better way to pop members from the Sorted Set, if there is one.
In Redis 5.0 or above, you can use [B]ZPOP{MIN|MAX} key [count] for this scenario.
The MIN version takes the item(s) with the lowest scores; MAX takes the item(s) with the highest scores. count defaults to 1, and the B prefix blocks until the data is available.
ZPOPMIN
ZPOPMAX
BZPOPMIN
BZPOPMAX
On older versions, you can write a Lua script to do the job: wrap those two commands in a single script. Redis ensures that a Lua script runs atomically.
local key = KEYS[1]
local member = redis.call('ZRANGE', key, 0, 0)[1]
if member then
    redis.call('ZREM', key, member)
    return member
else
    return nil
end
I'm tracking members in multiple Sorted Sets in Redis as a way to do multi-column indexing on the members. As an example, let's say I have two Sorted Sets, lastseen (which is epoch time) and points, and I store usernames as members in these Sorted Sets.
I'm wanting to first sort by lastseen so I can get the users seen within the last day or month, then I'm wanting to sort the resulting members by points so I effectively have the members seen within the last day or month sorted by points.
This would be easy if I could store the result of a call to ZREVRANGEBYSCORE to a new Sorted Set (we'll call the new Sorted Set temp), because then I could sort lastseen with limits, store the result to temp, use ZINTERSTORE against temp and points with a weight of zero for out (stored to result), and finally use ZREVRANGEBYSCORE again on result. However, there's no built-in way in Redis to store the result of ZRANGE to a new Sorted Set.
I looked into using the solution posted here, and while it does seem to order the results correctly, the resulting scores in the Sorted Set can no longer be used to accurately limit results based on time (i.e. I only want ones within the last day).
For example:
redis> ZADD lastseen 12345 "foo"
redis> ZADD lastseen 12350 "bar"
redis> ZADD lastseen 12355 "sucka"
redis> ZADD points 5 "foo"
redis> ZADD points 3 "bar"
redis> ZADD points 9 "sucka"
What I'd like to end up with, assuming my time window is between 12349 and 12356, is the list of members ['sucka', 'bar'].
The solutions I can think of are:
1) Your wish was to ZREVRANGEBYSCORE and somehow save the temporary result. Instead you could copy the zset (which can be done with a ZINTERSTORE with only one set as an argument), then do a ZREMRANGEBYSCORE on the new copy to get rid of the times you're not interested in, then do the final ZINTERSTORE.
2) Do it in a loop on the client, as Eli suggested.
3) Do the same thing in a Lua script.
These are all potentially expensive operations, so what's going to work best will depend on your data and use case. Without knowing more, I would personally lean towards the Lua solution.
For queries that get this complex, you want to supplement Redis' built-in commands with another processing language. The easiest way to do that is calling from within whatever your backend language is and using that to process. An example in Python using redis-py is:
import redis
finish_time, start_time = 12356, 12349
r = redis.Redis(host='localhost', port=6379, db=0, password='some_pass', decode_responses=True)
entries_in_time_frame = r.zrevrangebyscore('lastseen', finish_time, start_time)
p = r.pipeline()
for entry in entries_in_time_frame:
p.zscore('points', entry)
scores = zip(entries_in_time_frame, p.execute())
sorted_entries = [tup[0] for tup in sorted(scores, key=lambda tup: tup[1], reverse=True)]
>>> ['sucka', 'bar']
Note the pipeline, so we're only ever sending two calls to the Redis server, so network latency shouldn't slow us down much. If you need to go even faster (perhaps if what's returned by the first ZREVRANGEBYSCORE is very long), you can rewrite the same logic as above as a Lua script. Here's a working example (note my lua is rusty, so this can be optimized):
local start_time = ARGV[1]
local finish_time = ARGV[2]
local entries_in_time_frame = redis.call('ZREVRANGEBYSCORE', KEYS[1], finish_time, start_time)
local sort_function = function (k0, k1)
    local s0 = tonumber(redis.call('ZSCORE', KEYS[2], k0))
    local s1 = tonumber(redis.call('ZSCORE', KEYS[2], k1))
    return s0 > s1
end
table.sort(entries_in_time_frame, sort_function)
return entries_in_time_frame
You can call it like so:
redis-cli -a some_pass EVAL "$(cat script.lua)" 2 lastseen points 12349 12356
Returning:
1) "sucka"
2) "bar"
I'm trying to add a value to a list but only if it hasn't been added yet.
Is there a command to do this or is there a way to test for the existence of a value within a list?
Thanks!
I need to do the same.
One approach is to remove the element from the list and then add it again. If the element is not in the list, LREM simply returns 0, so there is no error:
lrem mylist 0 myitem
rpush mylist myitem
As Tommaso Barbugli mentioned, you should use a set instead of a list if you only need unique values.
See the Redis documentation for SADD:
redis> SADD myset "Hello"
(integer) 1
redis> SADD myset "World"
(integer) 1
redis> SADD myset "World"
(integer) 0
redis> SMEMBERS myset
1) "World"
2) "Hello"
If you want to check the presence of a value in the set you may use SISMEMBER
redis> SADD myset "one"
(integer) 1
redis> SISMEMBER myset "one"
(integer) 1
redis> SISMEMBER myset "two"
(integer) 0
It looks like you need a set or a sorted set.
Sets have O(1) membership test and enforced uniqueness.
If you can't use sets (say, because you want the blocking POP/PUSH features of lists), you can use a simple script:
script load 'local exists = false; for idx = 0, redis.call("LLEN", KEYS[1]) - 1 do if redis.call("LINDEX", KEYS[1], idx) == ARGV[1] then exists = true; break; end end; if not exists then redis.call("RPUSH", KEYS[1], ARGV[1]); return 1; end; return 0'
This will return the SHA code of the script you've added.
Just call then:
evalsha <sha1> 1 test-list myval
where
<sha1> - the SHA1 digest returned by SCRIPT LOAD
1 - the number of keys (always 1 for this script)
test-list - the name of your list
myval - the value you want to add
It returns 1 if the new item was added, or 0 if it was already in the list.
A similar membership check is available for hashes via the HEXISTS command in Redis (and for sets via SISMEMBER), but not for lists.
Checking a list to see if a member exists within it is O(n), which can get quite expensive for big lists and is definitely not ideal. That said, everyone else seems to be giving you alternatives. I'll just tell you how to do what you're asking to do, and assume you have good reasons for doing it the way you're doing it. I'll do it in Python, assuming you have a connection to Redis called r, some list called some_list and some new item to add called new_item:
lst = r.lrange('some_list', 0, -1)
if new_item not in lst:
    r.rpush('some_list', new_item)
I encountered this problem while adding to a task worker queue, because I wanted to avoid adding many duplicate tasks. Using a Redis set (as many people are suggesting) would be nice, but Redis sets don't have a "blocking pop" like BRPOPLPUSH, so they're not good for task queues.
So, here's my slightly non-ideal solution (in Python):
def pushOnlyNewItemsToList(redis, list_name, items):
""" Adds only the items that aren't already in the list.
Though if run simultaneously in multiple threads, there's still a tiny chance of adding duplicate items.
O(n) on the size of the list."""
existing_items = set(redis.lrange(list_name,0,-1))
new_items = set(items).difference(existing_items)
if new_items:
redis.lpush(list_name, *new_items)
Note the caveats in the docstring.
If you need to truly guarantee no duplicates, the alternative is to run LREM, LPUSH inside a Redis pipeline, as in 0xAffe's answer. That approach causes less network traffic, but has the downside of reordering the list. It's probably the best general answer if you don't care about list order.
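The LREM-then-push trick can be modeled in plain Python to show both its dedup effect and the reordering caveat (`push_unique` is an invented name for this sketch; a plain list stands in for the Redis list):

```python
# Plain-Python model of the LREM + RPUSH trick: remove any existing copies,
# then append, so the value ends up exactly once, at the tail. Note the
# side effect: an existing value is moved, i.e. the list is reordered.
def push_unique(lst, value):
    removed = lst.count(value)              # what LREM mylist 0 value removes
    lst[:] = [x for x in lst if x != value]
    lst.append(value)                       # RPUSH mylist value
    return removed

items = ["a", "b", "a", "c"]
push_unique(items, "a")
print(items)  # -> ['b', 'c', 'a']
```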