I am looking at LPOP and RPOPLPUSH as valid options for atomically popping a value.
However I have a job that every 2 seconds pops 1000 values from that list - which is 1000 requests to Redis.
I would have used SPOP, which can return X values in one request, but those are random ones and not the leftmost ones.
I do need to pop them from the left side of the list.
What are my options for doing this as fast as possible, atomically and without locking? I have multiple servers that access this list and I can't retrieve duplicate values (that's why LRANGE doesn't work for me).
EDIT
The more I think about it, the more I see that I need to compromise and use SPOP.
The scenario is batching inserts into the DB with Redis. Instead of thousands of inserts per second to MySQL, I'm pushing to Redis, and every 2 seconds I fetch the values and insert them into MySQL in one go.
I guess I can use SPOP if I add a timestamp to the actual value in Redis, and to avoid a value getting stuck in the set forever I will run a loop of SPOP x 1000 until null.
There are two options:
Use Lua script to pop N elements in a single EVAL command:
EVAL 'local result = {}; for i = 1, ARGV[1] do result[i] = redis.call("lpop", KEYS[1]) end; return result' 1 key N
Use Redis pipeline to send N LPOP commands to reduce RTT.
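As a rough sketch of the pipelining option, assuming a redis-py style client (`pipeline()`, `lpop()`, `execute()`): the helper name `pop_batch` and the small in-memory stand-in classes are made up for illustration, so that the example runs without a server. redis-py wraps a pipeline in MULTI/EXEC by default, so the N pops are applied as one unit.

```python
from collections import deque


def pop_batch(client, key, n):
    """Queue n LPOPs on one pipeline and send them in a single round trip."""
    pipe = client.pipeline()
    for _ in range(n):
        pipe.lpop(key)
    # LPOP on an empty list yields None, so drop the trailing misses.
    return [value for value in pipe.execute() if value is not None]


class FakePipeline:
    """Hypothetical in-memory stand-in for a redis-py pipeline (illustration only)."""

    def __init__(self, lists):
        self._lists = lists
        self._queued = []

    def lpop(self, key):
        self._queued.append(key)
        return self

    def execute(self):
        results = []
        for key in self._queued:
            items = self._lists.get(key)
            results.append(items.popleft() if items else None)
        self._queued = []
        return results


class FakeRedis:
    """Hypothetical stand-in exposing only what pop_batch needs."""

    def __init__(self):
        self.lists = {}

    def rpush(self, key, *values):
        self.lists.setdefault(key, deque()).extend(values)

    def pipeline(self):
        return FakePipeline(self.lists)


client = FakeRedis()
client.rpush("jobs", "a", "b", "c")
first = pop_batch(client, "jobs", 2)   # the two leftmost values
rest = pop_batch(client, "jobs", 5)    # fewer values than requested remain
```

With a real server you would pass a `redis.Redis(...)` instance instead of the stand-in; the helper itself would not need to change.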
Related
I have a dozen Redis keys of type SET, say:
PUBSUB_USER_SET-1-1668985588478915880,
PUBSUB_USER_SET-2-1668985588478915880,
PUBSUB_USER_SET-3-1668988644477632747,
...
PUBSUB_USER_SET-10-1668983464477632083
Each set contains user IDs, and the problem is to check whether a given user is present in any of the sets.
The solution I tried is to join all the keys with a comma delimiter and pass the result as an argument to a Lua script, where I split the keys with gmatch and run SISMEMBER on each until there is a hit.
local vals = KEYS[1]
for match in (vals..","):gmatch("(.-)"..",") do
    local exist = redis.call('sismember', match, KEYS[2])
    if (exist == 1) then
        return 1
    end
end
return 0
Now, as the number of keys grows to PUBSUB_USER_SET-20 or PUBSUB_USER_SET-30, I see an increase in latency and a drop in throughput.
Is this a good way to do it, or is it better to batch the Lua scripts, so that instead of passing 30 keys as arguments I pass them in batches of 10 keys and return as soon as the user is found? Or is there a better way altogether?
I would propose a different solution: instead of storing keys arbitrarily across the sets, store each key in exactly one set and query only that set to check whether the key is present.
Let's say we have N sets numbered s-0, s-1, s-2, ..., s-19.
Put each key in one of these sets based on its hash, so you only need to query one set instead of checking all of them. You can use any hashing algorithm.
To take it further, you can try consistent hashing.
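A minimal sketch of the hash-based placement described above, using plain modulo hashing rather than consistent hashing. The `s-` prefix and the count of 20 follow the example; the function name `shard_key` is an illustrative assumption, and CRC32 is just one choice of stable hash.

```python
import zlib

NUM_SHARDS = 20  # assumed number of sets: s-0 .. s-19


def shard_key(user_id, num_shards=NUM_SHARDS):
    """Map a user id to the one set that may contain it."""
    # zlib.crc32 is stable across processes, unlike Python's built-in hash() for str.
    digest = zlib.crc32(str(user_id).encode("utf-8"))
    return f"s-{digest % num_shards}"


# Writers and readers must use the same function:
#   SADD      <shard_key(user_id)> <user_id>
#   SISMEMBER <shard_key(user_id)> <user_id>
key = shard_key("user-42")
```

The membership check then costs a single SISMEMBER no matter how many sets exist. Changing `num_shards` later moves most IDs to a different set, which is the problem consistent hashing is meant to reduce.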
You can use a Redis pipeline with batching (10 keys per iteration) to improve performance.
Business Objective
I'm creating a dashboard that will depend on some time-series and I'll use Redis to implement it. I'm new to using Redis and I'm trying to use Redis-Streams to count the elements in a stream.
XADD conversation:9:chat_messages * id 2583 user_type Bot
XADD conversation:9:chat_messages * id 732016 user_type User
XADD conversation:9:chat_messages * id 732017 user_type Staff
XRANGE conversation:9:chat_messages - +
I'm aware that I can get the total count of the elements using the XLEN command like this:
XLEN conversation:9:chat_messages
but I also want to know the number of elements in a given period, for example:
XLEN conversation:9:chat_messages 1579551316273 1579551321872
I know I can use Lua to count those elements, but I want a REALLY fast way to achieve this, and I assume a native Redis command would be the fastest way.
Is there any way to achieve this with a straightforward Redis command? Or do I have to write a Lua script to do this?
Additional information
I'm limited by AWS ElastiCache to Redis 5.0.6 only, and I cannot install modules such as RedisTimeSeries. I'd like to use that module, but it's not possible at the moment.
While the Redis Stream data structure doesn't support this, you can use a Sorted Set alongside it for keeping track of message ranges.
Basically, for each message ID you get from XADD - e.g. "1579551316273-0" - you need to do a ZADD conversation:9:ids 0 1579551316273-0. Then, you can use ZLEXCOUNT to get the "length" of a range.
Sorry, there is no single command that achieves this.
Your best option with Redis Streams would be to use a Lua script. You will get O(N) with N being the number of elements being counted, instead of O(log N) if a command existed.
-- KEYS[1]: stream key, ARGV[1] and ARGV[2]: start and end IDs of the range
local entries = redis.call('XRANGE', KEYS[1], ARGV[1], ARGV[2])
-- the reply is an array-style table, so its length is the entry count
return #entries
Note that the difference between O(N) and O(log N) is significant for large N. For a chat application tracked per conversation, though, it won't matter much if chats have hundreds or even thousands of entries, once you account for total command time, which is dominated by the round trip time. The Lua script above avoids the network payload and client-side processing of returning every entry.
You can switch to sorted sets if you really want O(log N) and you don't need consumer groups and other stream features. See "How to store in Redis sorted set with server-side timestamp as score?" if you want to use the Redis server timestamp atomically.
Then you can use ZCOUNT which is O(log(N)).
If you do need Stream features, then you would need to keep the sorted set as a secondary index.
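To illustrate why a sorted index makes range counts cheap, here is a small pure-Python model (not Redis code). Timestamps are kept in sorted order, so counting a range takes two binary searches, which is the same idea behind ZCOUNT. The class name `TimestampIndex` is made up for the example, and the timestamps are the two from the question plus invented neighbours.

```python
import bisect


class TimestampIndex:
    """Toy model of a sorted-set secondary index keyed by millisecond timestamp."""

    def __init__(self):
        self._scores = []  # kept sorted at all times

    def add(self, timestamp_ms):
        # Roughly what ZADD <index> <timestamp_ms> <message-id> maintains.
        bisect.insort(self._scores, timestamp_ms)

    def count(self, start_ms, end_ms):
        # Inclusive range count, like ZCOUNT <index> <start_ms> <end_ms>.
        lo = bisect.bisect_left(self._scores, start_ms)
        hi = bisect.bisect_right(self._scores, end_ms)
        return hi - lo


index = TimestampIndex()
for ts in (1579551316000, 1579551316273, 1579551320000, 1579551321872, 1579551330000):
    index.add(ts)

in_window = index.count(1579551316273, 1579551321872)
```

In a real deployment the index would be updated next to every XADD, so the stream and the sorted set stay in step.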
I have two sets in Redis - ProcessedUrls and PendingUrls.
I want to do the following in one Redis query:
Remove from the Pending set all the members that are in both the Pending and Processed sets, and then return 100 (or any other number X) values from the Pending set.
Should I do it via Lua (Redis's server-side scripting language)?
I would think there's a simpler way.
Thanks for the help.
You can use the SDIFFSTORE command to get the diff items and save it back to the pending set:
SDIFFSTORE PendingUrls PendingUrls ProcessedUrls
Then you can use the SRANDMEMBER PendingUrls N command to randomly get N members of the pending set.
If you want to make these two operations atomic, wrap them into a Lua script or transaction.
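For intuition, the two steps behave like the following set operations. This is a pure-Python model rather than Redis code; the function name `refresh_and_sample` and the sample URLs are invented for illustration.

```python
import random


def refresh_and_sample(pending, processed, n):
    """Model of SDIFFSTORE Pending Pending Processed, then SRANDMEMBER Pending n."""
    # SDIFFSTORE overwrites the destination with the difference.
    pending.difference_update(processed)
    # SRANDMEMBER with a positive count returns up to n distinct members
    # and leaves the set unchanged.
    return random.sample(sorted(pending), min(n, len(pending)))


pending_urls = {"a.example", "b.example", "c.example", "d.example"}
processed_urls = {"b.example", "d.example", "z.example"}
batch = refresh_and_sample(pending_urls, processed_urls, 100)
```

Because the sample does not remove anything, two workers calling this at the same time can receive overlapping batches.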
I have the following scenario:
1. Fetch an array of numbers (from Redis) conditionally.
2. For each number, do some async work (fetch something from the DB based on the number).
3. For each item in the result set from the DB, do some more async work.
4. Periodically repeat 1, 2 and 3, because new numbers will constantly be added to the Redis structure.
Those numbers represent Unix timestamps in milliseconds, so out of the box they will always be sorted by time of addition.
"Conditionally" means fetching those Unix timestamps from Redis that are less than or equal to the current Unix timestamp in milliseconds (Date.now()).
The question is which Redis data type fits this use case best, bearing in mind that this code will be scaled up to N instances, so N instances will share access to a single Redis instance. To share the load equally, each instance will read, for example, the first (oldest) 5 numbers from Redis. Numbers are unique (adding the same number should fail silently), so a Redis SET seems like a good choice, but reading the first M elements from a Redis set seems impossible.
To prevent two different instances of the code from reading the same numbers, the Redis read operation should be atomic: it should read the numbers and delete them. If any async operation fails for a specific number (steps 2 and 3), that number should be added to Redis again to be handled again. It should be re-added at the head, not the tail, so that it is handled again as soon as possible. As far as I know, SADD would push it to the tail.
SMEMBERS key would read everything, which looks like a hammer to me. I would need to include some application logic to get the first five, then check which are less than or equal to Date.now(), then delete those, and somehow wrap everything in a single transaction. Besides that, the set cardinality can be huge.
SSCAN sounds interesting, but I don't have any clue how it works in a "scaled" environment like the one described above. Besides that, per the Redis docs: "The SCAN family of commands only offer limited guarantees about the returned elements since the collection that we incrementally iterate can change during the iteration process." As described above, the collection will change frequently.
A more appropriate data structure would be the Sorted Set: members have a float score, which is well suited to storing a timestamp, and you can perform range searches (i.e. anything less than or equal to a given value).
The relevant starting points are the ZADD, ZRANGEBYSCORE and ZREMRANGEBYSCORE commands.
To ensure atomicity when reading and removing members, you can choose between the following options: Redis transactions, a Redis Lua script and, in the next version (v4), a Redis module.
Transactions
Using transactions simply means having the code on your instances run the following:
MULTI
ZRANGEBYSCORE <keyname> -inf <now-timestamp>
ZREMRANGEBYSCORE <keyname> -inf <now-timestamp>
EXEC
Where <keyname> is your key's name and <now-timestamp> is the current time.
Lua script
A Lua script can be cached and runs embedded in the server, so in some cases it is a preferable approach. It is definitely the best approach for short snippets of atomic logic if you need flow control (remember that a MULTI transaction returns the values only after execution). Such a script would look as follows:
local r = redis.call('ZRANGEBYSCORE', KEYS[1], '-inf', ARGV[1])
redis.call('ZREMRANGEBYSCORE', KEYS[1], '-inf', ARGV[1])
return r
To run this, first cache it using SCRIPT LOAD and then call it with EVALSHA like so:
EVALSHA <script-sha> 1 <key-name> <now-timestamp>
Where <script-sha> is the sha1 of the script returned by SCRIPT LOAD.
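The read-and-remove cycle can be modelled in a few lines of plain Python. This is a toy in-memory model, not a client for Redis; the class name `DueQueue` is hypothetical, and re-adding a failed member under its original timestamp is an assumption about how retries could be handled, not something the commands above enforce.

```python
class DueQueue:
    """Toy in-memory model of a sorted set holding unique timestamps."""

    def __init__(self):
        self._scores = {}  # member -> score

    def add(self, member, score):
        # Like ZADD with NX: adding an existing member is a silent no-op.
        self._scores.setdefault(member, score)

    def pop_due(self, now_ms):
        # Like ZRANGEBYSCORE key -inf now followed by ZREMRANGEBYSCORE key -inf now,
        # executed as one atomic unit.
        due = sorted((s, m) for m, s in self._scores.items() if s <= now_ms)
        for _, member in due:
            del self._scores[member]
        return [member for _, member in due]


queue = DueQueue()
for ts in (1000, 2000, 3000, 2000):  # the duplicate 2000 is ignored
    queue.add(str(ts), ts)

claimed = queue.pop_due(2500)   # members due at or before 2500, oldest first
queue.add("1000", 1000)         # processing of "1000" failed: put it back
retry = queue.pop_due(2500)     # the re-added member is picked up again
```

Since the score is the timestamp itself, a re-added member sorts ahead of newer entries, which gives the "back to the head" behaviour the question asks for.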
Redis modules
In the near future, once v4 is GA you'll be able to write and use modules. Once this becomes a reality, you'll be able to use this module we've made that provides the ZPOP command and could be extended to cover this use case as well.
I'm starting to use Redis, and I've run into the following problem.
I have a bunch of objects, let's say Messages in my system. Each time a new User connects, I do the following:
INCR some global variable, let's say g_message_id, and save INCR's return value (the current value of g_message_id).
LPUSH the new message (including the id and the actual message) into a list.
Other clients use the value of g_message_id to check if there are any new messages to get.
Problem is, one client could INCR the g_message_id, but not have time to LPUSH the message before another client tries to read it, assuming that there is a new message.
In other words, I'm looking for a way to do the equivalent of adding rows in SQL, and having an auto-incremented index to work with.
Notes:
I can't use the list indexes, since I often have to delete parts of the list, making it invalid.
My situation in reality is a bit more complex, this is a simpler version.
Current solution:
The best solution I've come up with and what I plan to do is use WATCH and Transactions to try and perform an "autoincrement" myself.
But this is such a common use case in Redis that I'm surprised there is no existing answer for it, so I'm worried I'm doing something wrong.
If I'm reading correctly, you are using g_message_id both as an id sequence and as a flag to indicate new message(s) are available. One option is to split this into two variables: one to assign message identifiers and the other as a flag to signal to clients that a new message is available.
Clients can then compare the current / prior value of g_new_message_flag to know when new messages are available:
> INCR g_message_id
(integer) 123
# construct the message with id=123 in code
> MULTI
OK
> INCR g_new_message_flag
QUEUED
> LPUSH g_msg_queue "{\"id\": 123, \"msg\": \"hey\"}"
QUEUED
> EXEC
Possible alternative, if your clients can support it: you might want to look into the Redis publish/subscribe commands, e.g. clients could publish notifications of new messages and subscribe to one or more message channels to receive notifications. You could keep the g_msg_queue to maintain a backlog of N messages for new clients, if necessary.
Update based on comment: If you want each client to detect there are available messages, pop all that are available, and zero out the list, one option is to use a transaction to read the list:
# assuming the message queue contains "123", "456", "789"..
# a client detects there are new messages, then runs this:
> WATCH g_msg_queue
OK
> MULTI
OK
> LRANGE g_msg_queue 0 100000
QUEUED
> DEL g_msg_queue
QUEUED
> EXEC
1) 1) "789"
   2) "456"
   3) "123"
2) (integer) 1
Update 2: Given the new information, here's what I would do:
Have your writer clients use RPUSH to append new messages to the list. This lets the reader clients start at 0 and iterate forward over the list to get new messages.
Readers need to only remember the index of the last message they fetched from the list.
Readers watch g_new_message_flag to know when to fetch from the list.
Each reader client will then use "LRANGE list start stop" to fetch the new messages. Suppose a reader client has seen a total of 5 messages: it would run "LRANGE g_msg_queue 5 14" to get the next 10 messages (the stop index is inclusive). Suppose 3 are returned, so it remembers the index 8. You can make the range as large as you want, and can walk through the list in small batches.
The reaper client should set a WATCH on the list and delete it inside a transaction, which aborts if another client modifies the list concurrently.
When a reader client tries LRANGE and gets 0 messages it can assume the list has been truncated and reset its index to 0.
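The reader-side bookkeeping in the steps above can be sketched as a small pure-Python model. A plain list stands in for the Redis list, and the class name `Reader` is made up for the example.

```python
class Reader:
    """Toy model of a reader that remembers how far into the list it has read."""

    def __init__(self):
        self.index = 0

    def fetch(self, messages, limit=10):
        # LRANGE key start stop treats stop as inclusive, so a batch of
        # `limit` items is LRANGE key index index+limit-1. The Python slice
        # below covers the same elements.
        batch = messages[self.index:self.index + limit]
        if batch:
            self.index += len(batch)
        else:
            # Nothing at our index: assume the list was truncated and start over.
            self.index = 0
        return batch


queue = [f"msg-{i}" for i in range(8)]
reader = Reader()
reader.index = 5            # the reader has already seen 5 messages
new = reader.fetch(queue)   # only three messages remain past index 5
```

After this call the reader's index is 8, matching the walkthrough above; a later fetch against an emptied list resets it to 0.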
Do you really need unique sequential IDs? You can use UUIDs for uniqueness and timestamps to check for new messages. If you keep the clocks on all your servers properly synchronized then timestamps with a one second resolution should work just fine.
If you really do need unique sequential IDs then you'll probably have to set up a Flickr style ticket server to properly manage the central list of IDs. This would, essentially, move your g_message_id into a database with proper transaction handling.
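You can simulate auto-incrementing a unique key for new rows. Use DBSIZE to get the current number of keys, then in your code increment that number by 1 and use it as the key for the new row. It's simple, but note that it is not atomic on its own (two clients can read the same DBSIZE before either writes) and it only works if keys are never deleted.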