Is it possible to use Redis Pub/Sub while persisting the latest message?

I have a project involving public transport vehicles: I poll a GTFS-RT feed every x seconds and receive the latest data for all vehicles. To push this to the frontend I am using Redis Pub/Sub with WebSockets.
Currently, however, I publish all the vehicles to Redis regardless of whether anything changed, which causes the frontend to re-render everything and slows down the client. I want to publish only when a vehicle actually changes. Doing that, however, means that new listeners would miss the vehicles that haven't changed recently, so I would have to persist the latest data for every vehicle.
My question is: is this achievable by replacing Redis Pub/Sub with Redis Streams and always deleting the current message when a new one appears? If so, how would I do that? With Pub/Sub I considered making a channel for every vehicle and letting the client listen to all of them via pattern matching, but that wouldn't solve the persistence problem. What would be the best approach to tackle this?
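For illustration, here is a minimal sketch of the "persist the latest data for every vehicle" idea without Streams: keep the current state of each vehicle in a hash and publish only deltas. It assumes the ioredis client, and the vehicles:latest / vehicles:updates names are made up:

```typescript
// Sketch of the hash-snapshot + pub/sub-delta pattern (assumed names).
import Redis from "ioredis";

const redis = new Redis();

// Publisher side: called for each vehicle after every GTFS-RT poll.
async function publishIfChanged(vehicle: { id: string; lat: number; lon: number }) {
  const next = JSON.stringify(vehicle);
  const prev = await redis.hget("vehicles:latest", vehicle.id);
  if (prev === next) return; // unchanged: publish nothing
  await redis.hset("vehicles:latest", vehicle.id, next); // persist latest state
  await redis.publish("vehicles:updates", next); // delta for live listeners
}

// Subscriber side: subscribe first, then read the snapshot, so no update
// arriving in between is missed (duplicates are harmless because the
// frontend just overwrites a vehicle by id).
async function attachClient(onVehicle: (v: unknown) => void) {
  const sub = new Redis(); // SUBSCRIBE needs its own connection
  await sub.subscribe("vehicles:updates");
  sub.on("message", (_channel, message) => onVehicle(JSON.parse(message)));
  const snapshot = await redis.hgetall("vehicles:latest");
  for (const json of Object.values(snapshot)) onVehicle(JSON.parse(json));
}
```

A Stream per vehicle trimmed to one entry would also work, but a hash keeps the "latest value per vehicle" lookup down to a single HGETALL.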

Related

How to scale Redis Queue

We are shifting from a monolithic to a microservice architecture for our e-commerce marketplace application. We chose Redis Pub/Sub for microservice-to-microservice communication and also for push notifications. The push notification strategy is as follows:
Whenever an order is created (i.e. a customer places an order), the backend publishes an event to the respective channel (queue), and the push-notification microservice consumes this event (a JSON message) and sends a push notification to the seller's mobile device.
For the time being we are using redis-server installed on an Ubuntu machine without any hassle. But the headache comes later, when millions of orders are generated at the same time; how do we handle that? In other words, we need to scale the Redis queue, right?
My exact question (regardless of the above scenario) is:
How can I scale a Redis queue horizontally instead of adding more RAM to the same machine?
Whenever an order is created (i.e. a customer places an order), the backend publishes an event to the respective channel (queue), and the push-notification microservice consumes this event (a JSON message) and sends a push notification to the seller's mobile device.
IIUC you're sending messages over Redis Pub/Sub, which is not durable: if the producer is up but a consumer is down, that consumer misses the messages. Any service that is down loses every message published while it was down.
Now let's assume you're using a Redis LIST (with other data structures as needed) to solve the missing-events issue.
Scaling a Redis queue is a little tricky, since the entire list lives on a single Redis machine/host. What you can do is create your own partitioning scheme and design your Redis keys according to it, much as Redis Cluster does internally when a new master is added; building proper consistent hashing would take some effort.
Very simply, you can distribute load based on userId: for example, if userId is between 0 and 999 use queue_0, between 1000 and 1999 use queue_1, and so on. This is a manual process that can be automated with a script. Whenever a new queue is added to the set, all consumers have to be notified and the publisher updated as well.
Dividing by number ranges is a range-partitioning scheme; you could use hash partitioning instead. Either way, whenever a new queue is added to the set, the consumers must be notified so they can spawn a worker for the new queue. Removing a queue is trickier, since all consumers must first drain their respective queues. A sketch of the range-partitioning idea follows.
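A minimal sketch of that scheme, assuming ioredis; the bucket size of 1000 and the queue_N key names are illustrative, and each partition's connection could point at a different Redis host:

```typescript
// Sketch of range partitioning: userId ranges map to queue keys.
import Redis from "ioredis";

const redis = new Redis(); // in a real partition scheme, one client per host
const BUCKET_SIZE = 1000;

function queueFor(userId: number): string {
  return `queue_${Math.floor(userId / BUCKET_SIZE)}`; // 0-999 -> queue_0, ...
}

// Producer: push the order event onto the partition owning this user.
async function enqueueOrder(userId: number, event: object) {
  await redis.lpush(queueFor(userId), JSON.stringify(event));
}

// Consumer: one worker loop per queue; BRPOP blocks until an event arrives.
async function consume(queue: string, handle: (e: object) => Promise<void>) {
  const conn = new Redis(); // blocking commands need a dedicated connection
  for (;;) {
    const res = await conn.brpop(queue, 0); // timeout 0 = block indefinitely
    if (res) await handle(JSON.parse(res[1]));
  }
}
```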
You might consider using Rqueue

Why pub sub in redis cannot be used together with other commands?

I'm reading here, and I see a warning stating that PUB/SUB subscribers in Redis should not issue other commands:
A client subscribed to one or more channels should not issue commands, although it can subscribe and unsubscribe to and from other channels.
I have two questions:
Why is this limitation?
For the scope of that paragraph, what is a "client"? A whole process? A Redis connection? A complete Redis instance? Or is it a bad idea in general to issue commands and subscribe to channels, and the admonition applies to every scope I can think of?
A client, in this case, is an instance of a connection to Redis. An application could well have multiple clients, each with different responsibilities or as a way to provide higher degrees of parallelism to the application.
What they are suggesting, however, is that you dedicate an individual client (think "connection") to handling your incoming subscription messages and reacting to them as its sole responsibility. The reason it's recommended not to issue other commands on this connection is that, while it waits for incoming messages from subscribed channels, the connection is in a blocked state.
Making a call on that connection won't work while it's awaiting responses from the blocking subscription.
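A minimal sketch of the two-connection pattern, assuming the ioredis client (most client libraries behave the same way):

```typescript
// One connection dedicated to SUBSCRIBE, a second for ordinary commands:
// a subscribed connection may only issue (P|UN)SUBSCRIBE and PING.
import Redis from "ioredis";

const commands = new Redis();   // normal request/response connection
const subscriber = new Redis(); // enters subscriber mode below

async function main() {
  await subscriber.subscribe("events");
  subscriber.on("message", async (_channel, message) => {
    // React to the message using the *other* connection.
    await commands.set("last-event", message);
  });
}
main();
```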

"Archiving" publish/subscribe message in Redis

I am using Redis's publish/subscribe feature: the server publishes 10 items and the client receives those 10 items.
Now, however, a new client subscribes to the feed. I would like them to get the previous 10 items as well as any new items.
Does Redis have a way of doing this with the publish/subscribe functionality? Is a feed history stored anywhere in the database? Is there an easy way to do this? Or is the best approach to also store the messages in a list and have the client do an LRANGE my_list 0 9 on it?
I'd keep a separate archive of the data and add events to both. New clients can subscribe and queue the real-time events, read the archive until it's caught up to the first published event, then switch over to the published events. That way you shouldn't miss anything while moving from the archive to the real-time feed.
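A minimal sketch of that archive-then-catch-up pattern, assuming ioredis; the feed channel and feed:archive list are made-up names:

```typescript
import Redis from "ioredis";

const redis = new Redis();

// Publisher: every item goes to the durable archive *and* the channel.
async function publishItem(item: string) {
  await redis.rpush("feed:archive", item); // history, oldest first
  await redis.publish("feed", item);
}

// New client: subscribe first and buffer live items, then replay the
// archive, then drain the buffer, so nothing is missed in between.
async function newClient(deliver: (item: string) => void) {
  const sub = new Redis();
  const buffer: string[] = [];
  let replaying = true;
  await sub.subscribe("feed");
  sub.on("message", (_ch, item) => (replaying ? buffer.push(item) : deliver(item)));
  const history = await redis.lrange("feed:archive", 0, -1);
  history.forEach(deliver);
  buffer.forEach(deliver); // may repeat an item already in history; dedupe if needed
  replaying = false;
}
```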
Stumbled on this during some research. I know it's old, but I wanted to add that with the Redis Streams data structure it is not overly complex to implement persistent messaging.
The publisher would publish messages to a stream, and a subscriber that only cares about the latest message can just read that. You can also create consumer groups to limit how many subscribers get each message and then mark messages as acknowledged to avoid duplicate processing. This is good when you want a message handled only once and need a way to confirm that.
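A sketch of that Streams approach, assuming ioredis; the stream, group, and consumer names are illustrative:

```typescript
import Redis from "ioredis";

const redis = new Redis();

async function setup() {
  // MKSTREAM creates the stream if absent; swallow the BUSYGROUP error
  // if the group already exists.
  await redis.xgroup("CREATE", "events", "workers", "$", "MKSTREAM").catch(() => {});
}

async function produce(payload: string) {
  await redis.xadd("events", "*", "payload", payload); // "*" = auto-generated ID
}

// A reader that only cares about the most recent entry:
async function latest() {
  return redis.xrevrange("events", "+", "-", "COUNT", 1);
}

// In a consumer group, each entry goes to exactly one consumer; XACK
// marks it processed so it is not redelivered.
async function consumeOnce(consumer: string) {
  const res: any = await redis.xreadgroup(
    "GROUP", "workers", consumer,
    "COUNT", 10, "BLOCK", 5000,
    "STREAMS", "events", ">"
  );
  if (!res) return;
  for (const [, entries] of res) {
    for (const [id, fields] of entries) {
      console.log(id, fields);
      await redis.xack("events", "workers", id);
    }
  }
}
```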
I ended up building a Node.js app for this sort of purpose. In my case, user data was published to the Redis server and I wanted to store it, so I subscribed to the Redis channel with a Node.js app and saved the details to a database. I've played around with MySQL and Mongo so far. Let me know if this is of any interest and I'll paste some code; there are some similarities in trying to store a publish history...
Cheers

How is Redis used in Trello?

I understand that, roughly speaking, Trello uses Redis for a transient data store.
Is anyone able to elaborate further on the part it plays in the application?
We use Redis on Trello for ephemeral data that we would be okay with losing. We do not persist the data in Redis to disk, and we run it with the allkeys-lru eviction policy, so we only store things there that can be evicted at any time with only very minor inconvenience to users (e.g. momentarily seeing an incorrect user status). That said, we give it more than 5x the space it needs for its actual working set, and the policy samples 10 keys to choose one for expiry, so in practice we never see anything we're using get evicted.
It's our pub/sub server. When a user does something to a board or a card, we want to send a message with that delta to all websocket-connected clients subscribed to the object that changed, so all of our Node processes are subscribed to a pub/sub channel that propagates those messages, and they forward them to the appropriately permissioned and subscribed websockets.
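A minimal sketch of that fan-out, assuming ioredis and the ws package; the channel name and the subscription bookkeeping are placeholders, not Trello's actual code:

```typescript
import Redis from "ioredis";
import { WebSocketServer, WebSocket } from "ws";

const sub = new Redis();
const wss = new WebSocketServer({ port: 8080 });

// Which object ids each connected socket is subscribed to (stand-in for
// a real permission + subscription model).
const subscriptions = new Map<WebSocket, Set<string>>();

wss.on("connection", (ws) => subscriptions.set(ws, new Set(["board:1"])));

async function main() {
  await sub.subscribe("deltas");
  sub.on("message", (_channel, raw) => {
    const delta = JSON.parse(raw); // e.g. { objectId: "board:1", ... }
    for (const [ws, ids] of subscriptions) {
      if (ids.has(delta.objectId) && ws.readyState === WebSocket.OPEN) {
        ws.send(raw); // forward only to subscribed, permissioned sockets
      }
    }
  });
}
main();
```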
We SORT OF use it to back socket.io, but since we only use the websockets, and since socket.io is too chatty to scale like we need it to at the moment, we have a patch that disables all but the one channel that is necessary to us.
For our users who don't have websockets, we have to keep a list of the actions that have happened on each object channel since the user's last poll request. For that we use a list capped at the most recent 100 elements, plus an auxiliary counter of how many elements have been added to the list since it was created. When we answer a poll request from such a browser, we check the last element the client reports having seen, and send down only the messages added to the queue since then. That gets a poll request down to a permissions check and a single Redis key check in most cases, which is very fast.
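A sketch of that capped-list-plus-counter scheme, assuming ioredis; the key names are made up, not Trello's:

```typescript
import Redis from "ioredis";

const redis = new Redis();

// On every action: prepend, cap at 100, and count everything ever added.
async function recordAction(objectId: string, action: string) {
  const listKey = `actions:${objectId}`;
  await redis.lpush(listKey, action);          // newest at index 0
  await redis.ltrim(listKey, 0, 99);           // keep only the most recent 100
  await redis.incr(`actions:${objectId}:seq`); // total added since creation
}

// Poll handler: the client reports the sequence number it saw last.
async function actionsSince(objectId: string, lastSeen: number) {
  const total = Number(await redis.get(`actions:${objectId}:seq`)); // 0 if unset
  const missed = total - lastSeen;
  if (missed <= 0) return { seq: total, actions: [] as string[] };
  if (missed > 100) return null; // fell off the cap; client must do a full refetch
  const newest = await redis.lrange(`actions:${objectId}`, 0, missed - 1);
  return { seq: total, actions: newest.reverse() }; // oldest first
}
```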
We store some ephemeral data about the active status of connected users in Redis, because that data changes frequently and it is not necessary to persist it to disk.
We store short-lived keys to support OAuth logins in Redis.
We love Redis; once you have an instance of it up and running, you want to use it for all kinds of things. The only real trouble we have had with it is with slow-consuming clients eating up the available space.
We use MongoDB for our more traditional database needs.
Trello uses Redis with Socket.IO (RedisStore) for scaling, with the following two features:
key-value store, to set and get values for a connected client
as a pub-sub service
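For illustration only (this is not the actual RedisStore source), those two roles look roughly like this with ioredis; the key and channel names are made up:

```typescript
import Redis from "ioredis";

const kv = new Redis();  // role 1: set/get values for a connected client
const pub = new Redis();
const sub = new Redis(); // role 2: pub/sub needs its own connection

async function main() {
  // Key-value store keyed by socket/client id.
  await kv.set("client:abc123:room", "board-42");

  // Pub/sub bus so every Node process sees every emitted event.
  await sub.subscribe("socketio:events");
  sub.on("message", (_ch, msg) => console.log("fan out:", msg));
  await pub.publish("socketio:events", JSON.stringify({ room: "board-42" }));
}
main();
```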
Resources:
Look at the code for RedisStore in Socket.IO here: https://github.com/LearnBoost/socket.io/blob/master/lib/stores/redis.js
Example of Socket.IO with RedisStore: http://www.ranu.com.ar/2011/11/redisstore-and-rooms-with-socketio.html

Does the redis pub/sub model require persistent connections to redis?

In a web application, if I need to write an event to a queue, I would make a connection to redis to write the event.
Now, if I want another backend process (say a daemon or cron job) to process or react to the publishing of the event in Redis, do I need a persistent connection?
I'm a little confused about how this pub/sub process works in a web application.
Basically in Redis there are two different messaging models:
Fire and Forget / One to Many: Pub/Sub. At the moment a message is PUBLISHed, all current subscribers receive it, but the message is then lost forever. If a client was not subscribed, there is no way for it to get the message back.
Persisting Queues / One to One: lists, possibly used with blocking commands such as BLPOP. With lists you have a producer pushing into a list and one or many consumers waiting for elements, but each message reaches only one of the waiting clients. Lists give you persistence: messages wait for a client to pop them instead of disappearing, so even if no one is listening there is a backlog (as big as your available memory, or you can cap it with LTRIM).
I hope this is clear. I suggest studying the following commands to understand more about Redis and messaging semantics (a short sketch using them follows the list):
LPUSH/RPUSH, RPOP/LPOP, BRPOP/BLPOP
PUBLISH, SUBSCRIBE, PSUBSCRIBE
Docs for these commands are available at redis.io.
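A short sketch contrasting the two models with ioredis (an assumed client); jobs and news are placeholder names:

```typescript
import Redis from "ioredis";

// One-to-one persisting queue: the item waits in the list until some
// consumer pops it, and exactly one consumer receives it.
async function queueDemo() {
  const producer = new Redis();
  const consumer = new Redis(); // BRPOP blocks, so give it its own connection
  await producer.lpush("jobs", "job-1");
  const job = await consumer.brpop("jobs", 0); // blocks until an item exists
  console.log("got", job?.[1]);
}

// Fire-and-forget fan-out: only clients subscribed *right now* see it.
async function pubsubDemo() {
  const pub = new Redis();
  const sub = new Redis();
  await sub.subscribe("news");
  sub.on("message", (_ch, msg) => console.log("heard", msg));
  await pub.publish("news", "hello"); // lost for anyone not subscribed
}
```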
I'm not totally sure, but I believe that yes, pub/sub requires a persistent connection.
For an alternative, I would take a peek at Resque and how it handles that. Instead of using pub/sub, it simply adds an item to a list in Redis, and then whatever daemon or cron job you have can use the LPOP command to grab the first one.
Sorry for only giving a pseudo answer and then a plug.