Autoincrement in Redis

Autoincrement in Redis - redis

I'm starting to use Redis, and I've run into the following problem.
I have a bunch of objects, let's say Messages in my system. Each time a new User connects, I do the following:
INCR some global variable, let's say g_message_id, and save INCR's return value (the current value of g_message_id).
LPUSH the new message (including the id and the actual message) into a list.
Other clients use the value of g_message_id to check if there are any new messages to get.
Problem is, one client could INCR the g_message_id, but not have time to LPUSH the message before another client tries to read it, assuming that there is a new message.
In other words, I'm looking for a way to do the equivalent of adding rows in SQL, and having an auto-incremented index to work with.
Notes:
I can't use the list indexes, since I often have to delete parts of the list, making it invalid.
My situation in reality is a bit more complex, this is a simpler version.
Current solution:
The best solution I've come up with and what I plan to do is use WATCH and Transactions to try and perform an "autoincrement" myself.
But this is such a common use-case in Redis that I'm surprised there is not existing answer for it, so I'm worried I'm doing something wrong.

If I'm reading correctly, you are using g_message_id both as an id sequence and as a flag to indicate new message(s) are available. One option is to split this into two variables: one to assign message identifiers and the other as a flag to signal to clients that a new message is available.
Clients can then compare the current / prior value of g_new_message_flag to know when new messages are available:
> INCR g_message_id
(integer) 123
# construct the message with id=123 in code
> MULTI
OK
> INCR g_new_message_flag
QUEUED
> LPUSH g_msg_queue "{\"id\": 123, \"msg\": \"hey\"}"
QUEUED
> EXEC
Possible alternative, if your clients can support it: you might want to look into the
Redis publish/subscribe commands, e.g. cients could publish notifications of new messages and subscribe to one or more message channels to receive notifications. You could keep the g_msg_queue to maintain a backlog of N messages for new clients, if necessary.
Update based on comment: If you want each client to detect there are available messages, pop all that are available, and zero out the list, one option is to use a transaction to read the list:
# assuming the message queue contains "123", "456", "789"..
# a client detects there are new messages, then runs this:
> WATCH g_msg_queue
OK
> LRANGE g_msg_queue 0 100000
QUEUED
> DEL g_msg_queue
QUEUED
> EXEC
1) 1) "789"
2) "456"
3) "123"
2) (integer) 1
Update 2: Given the new information, here's what I would do:
Have your writer clients use RPUSH to append new messages to the list. This lets the reader clients start at 0 and iterate forward over the list to get new messages.
Readers need to only remember the index of the last message they fetched from the list.
Readers watch g_new_message_flag to know when to fetch from the list.
Each reader client will then use "LRANGE list index limit" to fetch the new messages. Suppose a reader client has seen a total of 5 messages, it would run "LRANGE g_msg_queue 5 15" to get the next 10 messages. Suppose 3 are returned, so it remembers the index 8. You can make the limit as large as you want, and can walk through the list in small batches.
The reaper client should set a WATCH on the list and delete it inside a transaction, aborting if any client is concurrently reading from it.
When a reader client tries LRANGE and gets 0 messages it can assume the list has been truncated and reset its index to 0.

Do you really need unique sequential IDs? You can use UUIDs for uniqueness and timestamps to check for new messages. If you keep the clocks on all your servers properly synchronized then timestamps with a one second resolution should work just fine.
If you really do need unique sequential IDs then you'll probably have to set up a Flickr style ticket server to properly manage the central list of IDs. This would, essentially, move your g_message_id into a database with proper transaction handling.

You can simulate auto-incrementing a unique key for new rows. Simply use DBSIZE to get the current number of rows, then in your code, increment that number by 1, and use that number as the key for the new row. It's simple and atomic.

Related

How do I get records in a Redis stream by position instead of ID?

With XREAD, I can get count from a specific ID (or first or last). With XRANGE, I can get a range from key to key (or from first - or to last +). Same in reverse with XREVRANGE.
How do I get by position? E.g. "give me the last 10 records in order"? XREVRANGE can do it with XREVRANGE stream + - COUNT 10, although it will be backwards in order, and XRANGE will give me the first 10 with XRANGE stream - + COUNT 10, but how do I:
get the last X in order?
get an arbitrary X in the middle, e.g. the 10 from position 100-109 (inclusive), when the stream has 5000 records in it?

Redis Streams currently (v6.0) do not provide an API for accessing their records via position/offset. The Stream's main data structure (a radix tree) can't provide this type of functionality efficiently.
This is doable with an additional index, for example a Sorted Set of the ids. A similar discussion is at https://groups.google.com/g/redis-db/c/bT00XGMq4DM/m/lWDzFa3zBQAJ
The real question is, as #sonus21 had asked, what's the use case.

It is an existing API that allows users to retrieve records from index
x to index y. It has a memory backend (just an array of records) and a
file one, want to put Redis in it as well.
We can just use Redis List to get the items randomly and at any offset using LRANGE
Attaching consumers to Redis LIST is easy but it does not come with an ack mechanism like a stream, you can build an ack mechanism in your code but that could be tricky. There could be many ways to build an ack mechanism in Redis, see one of them here Rqueue.
Ack mechanism is not always required but if you just want to consume from the LIST, you can use BLPOP/LPOP commands to consume elements from the list, that could be your simple consumer, but using BLPOP/LPOP commands would remove entries from the LIST so your offset would be dynamic, it will not always start from the 0. Instead of using BLPOP/LPOP, if you use LRANGE and track the offset in a simple key like my-consumer-offset, using this offset you can build your consumer that would always get the next element from the list based on the current offset.
Using LIST and STREAM to get random offset and streaming feature. Most important part would be you should use Consumer group so that streams are not trimmed. During push operation you should add element to STREAM as well as LIST. Once elements are inserted your consumer can work without any issue in a consumer group, for random offset you use LIST to get elements. Stream can grow as it'd be in append only mode, so you would need to periodically trim stream. See my other answer for deleting older entries

How to define TTL for redis streams?

I have two micro services and I need to implement reliable notifications between them. I thought about using redis streams -
serviceA will send a request to serviceB with an identifier X.
Once serviceB is done doing the work serviceA asked for, it'll create/add to a stream (the stream is specific for X) a new item to let it know it's done.
ServiceA can send multiple requests, each request may have a different identifier. So it'll block for new elements in different streams.
My question is how can I delete streams which are no longer needed, depending on their age. For example I'd like to have streams that were created over a day ago deleted. Is this possible?
If it's not, I'd love to hear any ideas you have as to how not to have unneeded streams in redis.
Thanks

There's no straight forward way to delete older entries based on the TTL/age. You can use a combination of XTRIM/XDEL with other commands to trim the stream.
Let's see how we can use XTRIM
XTRIM stream MAXLEN ~ SIZE
XTRIM trims the stream to a given number of items, evicting older items (items with lower IDs) if needed.
You generate the stream size every day or periodically based on your delete policy and store it somewhere using XLEN command
Run a periodic job that would call XTRIM as
XTRIM x-stream MAXLEN ~ (NEW_SIZE - PREVIOUS_SIZE)
For example, yesterday stream size was 500 now it's 600 then we need to delete 500 entries so we can just run
XTRIM x-stream MAXLEN ~ 100
You can use different policies for deletion for example daily, weekly, twice a week, etc.
XDEL stream ID [ID...]
Removes the specified entries from a stream, and returns the number of entries deleted, that may be different from the number of IDs passed to the command in case certain IDs do not exist.
So what you can do is whenever Service B consumes the event than the service itself can delete the stream entry as service B knows the stream ID, but this will not work as soon as you start using the consumer group. So I would say use Redis set or Redis map to track the acknowledge stream ids and run a periodic sweep job to clean up the stream.
For example
Service A sends a stream item with ID1 to service B
Service B acknowledges the stream item after consuming the items in the map
ack_stream = { ID1: true }, you can track other data e.g. count in case of the consumer group.
A sweep job would run at periodically like 1 AM daily that reads all the elements of ack_stream and filters out all items that require deletion. Now you can call XDEL commands in batch with the set of stream ids.

How to synchronise multiple writer on redis?

I have multiple writers overwriting the same key in redis. How do I guarantee that only the chosen one write last?
Can I perform write synchronisation in Redis withour synchronise the writers first?
Background:
In my system a unique dispatcher send works to do to various workers. Each worker then write the result in Redis overwrite the same key. I need to be sure that only the last worker that receive work from the dispatcher writes in Redis.

Use an ordered set (ZSET): add your entry with a score equal to the unix timestamp, then delete all but the top rank.
A Redis Ordered set is a set, where each entry also has a score. The set is ordered according to the score, and the position of an element in the ordered set is called Rank.
In order:
Remove all the entries with score equal or less then the one you are adding(zremrangebyscore). Since you are adding to a set, in case your value is duplicate your new entry would be ignored, you want instead to keep the entry with highest rank. 
Add your value to the zset (zadd)
delete by rank all the entries but the one with HIGHEST rank (zremrangebyrank)
You should do it inside a transaction (pipeline)
Example in python:
# timestamp contains the time when the dispatcher sent a message to this worker
key = "key_zset:%s"%id
pipeline = self._redis_connection.db.pipeline(transaction=True)
pipeline.zremrangebyscore(key, 0, t)  # Avoid duplicate Scores and identical data
pipeline.zadd(key, t, "value")
pipeline.zremrangebyrank(key, 0, -2)
pipeline.execute(raise_on_error=True)

If I were you, I would use redlock.
Before you write to that key, you acquire the lock for it, then update it and then release the lock.
I use Node.js so it would look something like this, not actually correct code but you get the idea.
Promise.all(startPromises)
.bind(this)
.then(acquireLock)
.then(withLock)
.then(releaseLock)
.catch(handleErr)
function acquireLock(key) {
return redis.rl.lock(`locks:${key}`, 3000)
}
function withLock(lock) {
this.lock = lock
// do stuff here after get the lock
}
function releaseLock() {
this.lock.unlock()
}

You can use redis pipeline with Transaction.
Redis is single threaded server. Server will execute commands syncronously. When Pipeline with transaction is used, server will execute all commands in pipeline atomically.
Transactions
MULTI, EXEC, DISCARD and WATCH are the foundation of transactions in Redis. They allow the execution of a group of commands in a single step, with two important guarantees:
All the commands in a transaction are serialized and executed sequentially. It can never happen that a request issued by another client is served in the middle of the execution of a Redis transaction. This guarantees that the commands are executed as a single isolated operation.
A simple example in python
with redis_client.pipeline(transaction=True) as pipe:
val = int(pipe.get("mykey"))
val = val*val%10
pipe.set("mykey",val)
pipe.execute()

Query random but unread keys the Redis way

I have thousands of messages each stored like a list of properties (text, subject, date, etc) in a separate key: msg:1001, msg:1002 etc...
There is also a list keyed as messages with ids of all existing messages: 1001,1002,1003...
Now I need to get 10 random messages.
But, I only need those messages that are not flagged by the user (sort of unread).
There is a hash for each user keyed as flags:USERID = 1001=red,1005=blue,1010=red,...
Currently I have to keep in memory of my application a full list of messages plus all flags for all users currently logged in and do all the math by hand (in JavaScript).
Is there a way to do such a query in Redis way, with no duplicating all the data on the application end?

Your question is an example of a space–time tradeoff. On the one hand, you say that you don't want to keep a list of the unflagged messages in your system, but I would guess that you also want to keep your application relatively fast. Therefore, I suggest giving up some space and keeping a set of unflagged messages.
As messages are created in your system, add them both to messages (SADD messages <messageid>) and messages_unflagged (SADD messages_unflagged <messageid>). After a user adds a flag to a message, remove the message from the unflagged set (SREM messages_unflagged <messageid>). When you need 10 random, unflagged messages, you can get their IDs in constant time (SRANDMEMBER messages_unflagged 10).

Get multiple sets

I've currently got a dataset which is something like:
channel1 = user1,user2,user3
channel2 = user4,user5,user6
(note- these are not actual names, the text is not a predictable sequence)
I would like to have the most optimized capability for the following:
1) Add user to a channel
2) Remove user from a channel
3) Get list of all users in several selected channels, maintaining knowledge of which channel they are in (in case it matters- this can also be simply checking whether a channel has any users or not without getting an actual list of them)
4) Detect if a specific user is in a channel (willing to forego this feature if necessary)
I'm a bit hungup on the fact that there are only two ways I can see of getting multiple keys at once:
A) Using regular keys and a mget key1, key2, key3
In this solution, each value would be a JSON string which can then be manipulated and queried clientside to add/remove/determine values. This itself has a couple problems- firstly that it's possible another client will change the data while it's being processed (i.e. this solution is not atomic) and it's not easy to detect right away if a channel contains a specific user even though it is easy to detect if a channel has any users (this is low priority, as stated above)
B) Using sets and sunion
I would really like to use sets for this solution somehow, the above solution just seems wrong... but I cannot see how to query multiple sets at once and maintain info about which set each member is from or if any of the sets in the union have 0 members (sunion only gives me a final set of all the combined members)
Any solutions which can implement the above points 1-4 in optimal time and atomic operations?
EDIT: One idea which might work in my specific case is to store the channel name as part of the username and then use sets. Still, it would be great to get a more generic answer

Short answer: use sets + pipelining + MULTI/EXEC, or sets + Lua.
1) Add user to a channel
SADD command
2) Remove user from a channel
SREM command
3) Get list of all users in several selected channels
There are several ways to do it.
If you don't need strict atomicity, you just have to pipeline several SMEMBERS commands to retrieve all the sets in one roundtrip. If you are just interested whether channels have users or not, you can replace SMEMBERS by SCARD.
If you need strict atomicity, you can pipeline a MULTI/EXEC block containing SMEMBERS or SCARD commands. The output of the EXEC command will contain all the results. This is the solution I would recommend.
An alternative (atomic) way is to call a server-side Lua script using the EVAL command. Lua script executions are always atomic. The script could take a number of channel as input parameters, and build a multi-layer bulk reply to return the output.
4) Detect if a specific user is in a channel
SISMEMBER command - pipeline them if you need to check for several users.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas