Best way to handle read/unread messages in Redis activity feed? - redis

I have an activity feed system that uses Redis sorted sets.
Events happen and a message is placed into a sorted set for each relevant user, with a timestamp for the score.
The messages are then distributed to the users, either when they log in, or through a push if the user is currently logged in.
I'd like to differentiate between messages that have been "read" by the user and ones that are still unread.
From my understanding, I can't just have a "read/unread" property as part of the member as changing it would cause the member to be different and therefore be added a second time, instead of replacing the current member.
So, what I'm thinking is that for each user, I have to sorted sets - an "unread" set and a "read" set.
When new events come in, they're added to the "unread" set
When the user reads a message, I add the message to the read set and remove it from the unread set.
A little less sure on how to deliver them. I can't just union them as I lose the distinction between read/unread unless I inverse the score on the unread ones, for instance.
Returning the two sets individually (and merging them in code) makes paging difficult. I'd like to be able to the 20 most recent messages regardless of read/unread status.
So, questions:
Is the read set/unread set way the best way to do it? Is there a better way?
What's the best way to return subsets of the merged/union'd data.
Thanks!

Instead of trying to update the member, you could just pop it and insert a new version. Should be no problem, because you know both member and its timestamp.

Related

Retrieving messages in chat application

I am working on an application for chatting. I wondering how to design the API for getting messages from a server.
When the application loads the chat window, I display the last 20 messages from the server using the endpoint:
/messages?user1={user1Id}&user2={user2Id}&page=0
Secondly, I allow user to load and display previous messages when the users scrolls to the top using the same endpoint but with different page (1)
/messages?user1={user1Id}&user2={user2Id}&page=1
But this design doesn't work correctly when users start to send messages to each others. The reason is that the endpoint returns the messages using descending order. Invoking the endpoint will give different result sets before sending/receiving a message and after.
My goal is to get always 20 previous messages when a user scrolls through conversation history.
How would you implement this in a clean way (including the REST API design)? I can imagine some solutions, but they seem dirty to me.
REST does not scale well for real time applications. It takes too many resources to open the HTTP connection again and again. Better to use websockets for it.
As of the upper problem if you start pagination with the latest message, not with the first message, then I would not wonder that it changes. To keep the pages after it you need to send the message id you started with. So for example: GET ...&count=20&latest=1, after that you get a 20 list of messages, you get the message id of the latest one and do the pagination with GET ...&count=20&basePoint={messageId}&page=0 to always get the exact same page no matter that you got new messages. After that you continue with GET ...&count=20&basePoint={messageId}&page=1 and so on... You will have negative pages GET ...&count=20&basePoint={messageId}&page=-1 if there are newer messages. Though I don't know any chatting application which uses pagination this way.
What I would do is GET ...&count=20&latest=1 and get the 20th message and do GET ...&count=20&before={messageId_20th} after that get the last message again from that list which is the 40th and do GET ...&count=20&before={messageId_40th} and so on. To get the new messages you can do GET ...&count=20&latest=1 again or GET ...&count=20&after={messageId_1st}.
None of the above is really good if you are using a caching mechanism in the client. I would add something like key frames in videos, so key messages. For example every 20th message can be a key message. I would do caching based on the key message ids so the responses could be cacheable, something like GET ...&count=20&latest=1 would return messages and one in the middle of the list would have a property of keyMessage=true. I would start the next request from the last key message something like GET ...&count=20&before={messageId_lastKey}, so the response will be cacheable.
Another way of doing this is starting pagination from the very first message. So when you do GET ...&count=20&latest=1 it will write the index of the message, something like 1234th message. You can do caching based on every 20th message just like in the key message solution, but instead of the message ids, you can do GET ...&count=20&to=1220 and merge the pages on the client. Or if you really want to spare with data, then GET ...&from=1201&to=1220&before=1215 or GET ...&from=1201&to=1220&last=1214 or GET ...&from=1201&to=1214, etc... And continue normal pagination with GET ...&count=20&to=1200 or GET ...&from=1181&to=1200.
What is good with the upper fixed pages approach, that you can use range headers too. For example GET .../latest would return the last 20 message with the header of Content-Range: messages 1215-1234/1234. After that you can do GET .../ Range: messages=1201-1214 and GET .../ Range: messages=1181-1200 and so on... When the total message count is updated in the response Content-Range: messages 1181-1200/1245, then you can automagically download the new message with GET .../ Range: messages=1235-1245 or with GET .../ Range: messages=1235- if you expect the 1245 to change meanwhile. You can do the same thing with the URI too, but if you have a standard solution like range headers it is better to use it instead of reinventing the wheel.
As of the &user1={user1Id}&user2={user2Id} part I would order it based on alphabet, so always user1 would be earlier in alphabetic order than user2, something like &user1=B&user2=A -> &user1=A&user2=B. So that will be properly cacheable too, not necassarily on the client, but you can add a server side cache too, so if the two participating users try to get the same conversation recently it won't be queried from the database again. Another way of doing this is adding a conversation id, which can be random unique or generated from the two user ids, but it must be always the same for the two users e.g. /conversations/A+B or /conversations/x23542asda. I would do the latter, because if you start support conversations with multiple users it is better to have a dedicated conversation identifier. Another thing you can support is having multiple topic related conversations between the two users, so you won't need to do /conversations/A+B/{topic} just use a unique id and search for conversations before creating a new one, so either /conversations?participants[]=A&participants[]=B&topic={topic} will give you an empty list or a list with a single conversation. This could be used to list coversations of a single user /conversations?participants[]=A or list conversations of a single user in a certain topic /conversations?participants[]=A&topic={topic}.
Generally speaking, for "system design" questions, there isn't any one correct answer. Multiple approaches can be considered, based on their pros and cons. You've mentioned you want a clean way to retrieve messages, so your expectation might be different from mine.
Here is a simple approach/idea to get you started:
Have a field in your database that can be used to order messages. It could be a timestamp, a message_id, or something else that you want. Let's call this m_id for now.
On your client, have a field that tracks the m_id of the earliest message currently available locally on the client. This key (let's call it earliest_m_id) will be used to query the database, because you can simply fetch x number of messages before the earliest_m_id.
This would solve your issue of incorrect messages being fetched. Using paging will not work, since pages will continuously change with the exchange of messages.

Redis: Expiration of secondary indexes

I'm curious about best practices as far as expiring secondary indexes in Redis.
For instance, say I have object IDs with positions that are only considered valid for 5 seconds before they expire. I have:
objectID -> hash of object info
I set these entries to automatically expire after 5 seconds.
I also want to look up objects by their zip code, so I have another mapping:
zipCode -> set or list of objectID
I know there's no way to automatically expire elements of a set, but I'd like to automatically remove objectIDs from that zip code mapping when they expire.
What is the standard practice for how to do this? If there were an event fired upon expiration I could find the associated zip code mapping and remove the objectID, but not sure there is (I will be using Go).
If there were an event fired upon expiration I could find the associated zip code mapping and remove the objectID, but not sure there is
YES, there is an expiration notification, check this for an example.
You can have a client subscribing to the expiration event, and delete items from the zipCode set or list.
However, this solution has two problems:
The notification is NOT reliable. If the client disconnects from Redis, or just crashes, client will miss some notifications. In order to make it more reliable, you can have multiple clients listening to the notification. If one client crashes, others can still remove items from zipCode.
It's NOT atomic. There's a time window between the key expiration and removing items from zipCode.
If you're fine with these 2 problems, you can try this solution.

RabbitMQ - How to ensure two queues stay synchronized

I have two queues that both have distinct data types that affect one another as they're being processed by my application, therefore processing messages from the two queues asynchronously would cause a data integrity issue.
I'm curious as to the best practice for making sure only one consumer is consuming at any given time. Here is a summary of what I have so far:
EventMessages receive information about external events that may or may not have an impact on the enqueued/existing PurchaseOrderMessages.
Since we anticipate we'll be consuming more PurchaseOrderMessage than EventMessage, maybe we should just ensure the EventMessage Queue is empty (via the API) before we process anything in PurchaseOrderMessage Queue - but that gets into the question of wait times, etc. and this all needs to happen as close to real time as possible.
If there's a way to simply pause a Consumer A until Consumer B is at rest that might be the simplest solution, I'm just not quite sure which direction I need to go in.
UPDATE
To provide some additional context, a PurchaseOrderMessage will contain a Origin and Destination.
A EventMessage also contains location data.
Each time a PurchaseOrderMessage is processed, it will query the current EventMessage records for any Event locations that match the Origin and Destination of that PurchaseOrder and create an association.
Each time an EventMessage is processed, it will query the current PurchaseOrderMessage records for any Origins of Destinations that match that Event and create an association.
If synchronous queues aren't a good solution, what's an alternative that would insure none of the associations are missed when EventMessages and PurchaseOrderMessages are getting published to the app at the same time?
UPDATE 2
Ultimately this data will serve a UI which will have a list of PurchaseOrders and the events that might be affecting their delivery dates. It would be too slow to do the "Event Check" as the PurchaseOrder data was being rendered/retrieved by the end user which is why we're wanting to do it as they're processed/consumed.
Let me begin with the bottom line up front - on the face of it, what you are asking doesn't make sense.
Queues should never require synchronization. The very thought of doing so entirely defeats the purpose of having a queue. For some background, visit this answer.
Let's consider some common places from real life where we encounter multiple queues:
Movie theaters (box office, concession counter, usher)
Theme parks (snack bars, major attractions)
Manufacturing floors (each station may have a queue waiting to process)
In each of these examples, from the point of view of the object in the queue, it can only wait in one at a time. It cannot wait in one line while it is waiting in another- such a thing is physically impossible.
Your example seems to take two completely unrelated things and merge them together. You have a queue for PurchaseOrder objects - but what is the queue for? That's the equivalent of going to Disney World and waiting in the Customer queue - what is the purpose of such a queue? If the purpose is not clear, it's not a real queue.
Addressing your issue
This particular issue needs to be addressed first by clearly defining the various operations that are being done to a PurchaseOrder, then creating queues for each of those operations. If these operations are truly synchronous, then your business logic should be coded to wait for one operation to complete before starting another. In this circumstance, it would be considered an exception if a PurchaseOrder got to the head of one queue without fulfilling a pre-requisite.
Please remember that a message queue typically serves a stateless operation. Good design dictates that messages in the queue contain all the information needed for the processor to process the message. If you don't adhere to this, then your database becomes a single point of contention for your system - and while this is not an insurmountable problem, it does make the design more complex.
Waiting in Multiple Queues
Now, if you've ever been to Disney World, you'll also know that they have something called a FastPass+ (FP+), which allows the holder to skip the line at the designated attraction. Disney allocates a certain number of slots per hour for each major attraction at the park, and guests are able to request up to three FP+s during each day. FP+ times are allocated for one hour blocks, and guests cannot have two overlapping FP+ time blocks. Once all FP+ slots have been issued for the ride, no more are made available. The FP+ system ensures these rules are enforced, independently of the standby queues for each ride. Essentially, by using FastPass+, guests can wait in multiple lines virtually and experience more attractions during their visit.
If you are unable to analyze your design and come up with an alternative, perhaps the FastPass+ approach could help alleviate some of the bottlenecks.
Disclaimer: I don't work for Disney, but I do go multiple times per month, always getting my FastPass first

How to handle a single publisher clogging up my RabbitMQ's queue

In my last project, I am using MassTransit (2.10.1) with RabbitMQ.
On some scenarios, a producer is allowed to send a bulk of messages to the queue.
For example - the user set to a bulk notification to his list of contacts - the list could be as large as 100000 contacts on some cases. This will send a message per each contact to the queue (I need to keep track of each message). Now since - as I understand - messages are being processed in the order of entrance, that user is clogging up the queue for a large amount of time while another user, which may have done a simple thing such as send a test message to himself, waits for the processing to end.
I have considered separating queues for regular VS bulk operations but this still doesn't solve the problem for small bulks (user with dozens of contacts waiting for users with hundred thousands) and causes extra maintenance.
The ideal solution for me - I think - would involve manipulating the routing in such a way that the consumer will be handling x messages from the same user, move the X messages from the next user, than again, and than moving back to the beginning of the queue, until all messages are processed.
Is that possible? Is there a better solution?
Thanks in advance.
You will to have to write code to manage this yourself. RabbitMQ doesn't really have any built-in mechanism to handle a scenario like this, without your code getting involved.
If you want to process a few at a time from bulk, then back to normal, then back to bulk, you'll need 2 queues and code to manage which one is being pulled from, when.
Just my opinion, seeing as how there is no built in way to my knowledge...Have you considered using whatever storage you are using to store the notifications, then just publish one message, with a List of Notifications, store it in you DB, and then have a retrieve notifications for user consumer. the response would be one message, it may have a massive payload, but even if that gets bogged down, add a skip and take property to the message and force it to be between 0 and 50 (or whatever). In what scenario would you want to show a user 100,000 notifications at once?

Redis usecase for notification system

Does anyone know the usecases for notification system (redis).
I tried many patterns, but not satisfied.
I would think using a List to create a queue would be the best approach. You can push a JSON document or some other serialized data representing the notice onto the list then pop them off as they are delivered (or keep them in the list depending upon your need). Using things like LRANGE you can easily paginate to handle any number of notices.
Take a look at Staircar: Redis-powered notifications. Tumblr team actually uses Redis SortedSet for notifications:
Redis Sorted Sets fit the characteristics of notifications perfectly, without the I/O and concurrency pitfalls of implementing a similar structure in MySQL. Sorted sets in Redis are ordered by a score (unix timestamp in our case), contain unique elements (non-repeating collections of strings in redis speak), can be trimmed or appended to cheaply, and are keyed off, well, a key (user in our case)
Take a look at Thoonk. It produces like pub/sub events that correspond to publish/edit/retract/resorts on higher level objects called feeds. It works well for notification inboxes and application state changes.
The contract/schema https://github.com/andyet/thoonk.js/blob/master/contract.txt provides a lot of recipes that you may be interested in.