How is Redis used in Trello? - redis

I understand that, roughly speaking, Trello uses Redis for a transient data store.
Is anyone able to elaborate further on the part it plays in the application?

We use Redis on Trello for ephemeral data that we would be okay with losing. We do not persist the data in Redis to disk, and we use it allkeys-lru, so we only store things there can be kicked out at any time with only very minor inconvenience to users (e.g. momentarily seeing an incorrect user status). That being said, we give it more than 5x the space it needs to store its actual working set and choose from 10 keys for expiry, so we really never see anything get kicked out that we're using.
It's our pubsub server. When a user does something to a board or a card, we want to send a message with that delta to all websocket-connected clients that are subscribed to the object that changed, so all of our Node processes are subscribed to a pubsub channel that propagates those messages, and they propagate that out to the appropriately permissioned and subscribed websockets.
We SORT OF use it to back socket.io, but since we only use the websockets, and since socket.io is too chatty to scale like we need it to at the moment, we have a patch that disables all but the one channel that is necessary to us.
For our users who don't have websockets, we have to keep a list of the actions that have happened on each object channel since the user's last poll request. For that we use a list which we cap at the most recent 100 elements, and an auxilary counter of how many elements have been added to the list since it was created. So when we're answering a poll request from such a browser, we can check the last element it reports that it has seen, and only send down any messages that have been added to the queue since then. So that gets a poll request down to just a permissions check and a single Redis key check in most cases, which is very fast.
We store some ephemeral data about the active status of connected users in Redis, because that data changes frequently and it is not necessary to persist it to disk.
We store short-lived keys to support OAuth logins in Redis.
We love Redis; once you have an instance of it up and running, you want to use it for all kinds of things. The only real trouble we have had with it is with slow-consuming clients eating up the available space.
We use MongoDB for our more traditional database needs.

Trello uses Redis with Socket.IO (RedisStore) for scaling, with the following two features:
key-value store, to set and get values for a connected client
as a pub-sub service
Resources:
Look at the code for RedisStore in Socket.IO here: https://github.com/LearnBoost/socket.io/blob/master/lib/stores/redis.js
Example of Socket.IO with RedisStore: http://www.ranu.com.ar/2011/11/redisstore-and-rooms-with-socketio.html

Related

CoTurn Data Usage Stats on Multi User System

We want to track each users turn usage seperately. I inspected Turn Rest API, AFAIU it is just used to authorize the user which already exists in Coturn db. This is the point I couldn't realize exactly. I can use an ice server list which includes different username-credential peers. But I must have initialized these username-credential peers on coturn db before. Am I right? If I am right, I do not have an idea how to do this. Detection of user's request to use turn from frontend -> Should generate credentials like this CoTURN: How to use TURN REST API? which I am already achieved -> if this is a new user, my backend should go to my EC2 instance and run "turnadmin create user command" somehow -> then I will let the WebRTC connection - > then track the usage of specific user and send it back to my backend somehow.
Is this a true scenario? If not, how should it be? Is there another way to manage multiple users and their data usage? Any help would be appreciated.
AFAIU to get the stats data, we must use redis database. I tried to used it, I displayed the traffic data (with psubscribe turn/realm//user//allocation//traffic ) but the other subscribe events have never worked ( psubscribe turn/realm//user//allocation//traffic/peer or psubscribe turn/realm//user//allocation/*/total_traffic even if the allocation is deleted). So, I tried to get past traffic data from redis db but I couldn't find how. At redis, KEYS * command gives just "status" events.
Even if I get these traffic data, I couldn't realize how to use it with multi users. Currently in our project we have one user(in terms of coturn) and other users are using turn over this user.
BTW we tried to track the usage where we created peer connection object from RTCPeerConnection interface. I realized that incoming bytes are lower than the redis output. So I think there is a loss and I think I should calculate it from turn side.

Is it possible to use Redis Pub/Sub persisting latest message?

Currently, I have a project with public transport vehicles, I have to poll the GTFS-RT standard every x seconds and then I will receive the latest data for all vehicles. So to send this to the frontend I am using Redis Pub/Sub with WebSockets.
Currently, however I just publish all the vehicles to redis regardless of any changes. This causes the frontend to re-render everything => slows down the front-end client. I want to actually only publish whenever changes are made to the vehicles. Doing that however means that new listeners would miss out on vehicles. So I would have to persist the latest data for every vehicle.
My question is - is this achievable by replacing Redis Pub/Sub with Redis Streams and then always deleting the current message if another message appears? If that is possible how would I do that? With Pub/Sub I thought about just making a channel for every vehicle and letting the client listen to all the channels, as can be done with pattern matching, but it wouldn't solve the problem of persistence, so what would be the best approach to tackle this problem?

RabbitMQ Pub/Sub setup with large number of disconnected clients...

This is a new area for me so hopefully my question makes sense.
In my program I have a large number of clients which are windows services running on laptops - that are often disconnected. Occasionally they come on line and I want them to receive updates based on user profiles. There are many types of notifications that require the client to perform some work on the local application (i.e. the laptop).
I realize that I could do this with a series of restful database queries, but since there are so many clients (upwards to 10,000) and there are lots of different notification types, I was curious if perhaps this was not a problem better suited for a messaging product like RabbitMQ or even 0MQ.
But how would one set this up. (let's assume in RabbitMQ?
Would each user be assigned their own queue?
Or is it preferable to have each queue be a distinct notification type and you would use some combination of direct exchanges or filtering messages based on a routing key, where the routing key could be a username.
Since each user may potentially have a different set of notifications based on their user profile, I am thinking that each client/consumer would have a specific message for each notification sitting on a queue waiting for them to come online and process it.
Is this the right way of thinking about the problem? Thanks in advance.
It will be easier for you to balance a lot of queues than filter long ones, so it's better to use queue per consumer.
Messages can have arbitrary headers and body so it is the right place for notification types.
Since you will be using long-living queues, waiting for consumers on disk - you better use lazy queues https://www.rabbitmq.com/lazy-queues.html (it's available since version 3.6.0)

Redis as a message broker

Question
I want to pass data between applications, in a publish-subscribe manner. Data may be produced at a much higher rate than consumed and messages get lost, which is not a problem. Imagine a fast sensor and a slow sensor data processor. For that, I use redis pub/sub and wrote a class which acts as a subscriber, receives every message and puts that into a buffer. The buffer is overwritten when a new message comes in or nullified when the message is requested by the "real" function. So when I ask this class, I immediately get a response (hint that my function is slower than data comes in) or I have to wait (hint that my function is faster than the data).
This works pretty good for the case that data comes in fast. But for data which comes in relatively seldom, let's say every five seconds, this does not work: imagine my consumer gets launched slightly after the producer, the first message is lost and my consumer needs to wait nearly five seconds, until it can start working.
I think I have to solve this with Redis tools. Instead of a pub/sub, I could simply use the get/set methods, thus putting the cache functionality into Redis directly. But then, my consumer would have to poll the database instead of the event magic I have at the moment. Keys could look like "key:timestamp", and my consumer now has to get key:* and compare the timestamps permamently, which I think would cause a lot of load. There is no natural possibility to sleep, since although I don't care about dropped messages (there is nothing I can do about), I do care about delay.
Does someone use Redis for a similar thing and could give me a hint about clever use of Redis tools and data structures?
edit
Ideally, my program flow would look like this:
start the program
retrieve key from Redis
tell Redis, "hey, notify me on changes of key".
launch something asynchronously, with a callback for new messages.
By writing this, an idea came up: The publisher not only publishes message on topic key, but also set key message. This way, an application could initially get and then subscribe.
Good idea or not really?
What I did after I got the answer below (the accepted one)
Keyspace notifications are really what I need here. Redis acts as the primary source for information, my client subscribes to keyspace notifications, which notify the subscribers about events affecting specific keys. Now, in the asynchronous part of my client, I subscribe to notifications about my key of interest. Those notifications set a key_has_updates flag. When I need the value, I get it from Redis and unset the flag. With an unset flag, I know that there is no new value for that key on the server. Without keyspace notifications, this would have been the part where I needed to poll the server. The advantage is that I can use all sorts of data structures, not only the pub/sub mechanism, and a slow joiner which misses the first event is always able to get the initial value, which with pub/sib would have been lost.
When I need the value, I obtain the value from Redis and set the flag to false.
One idea is to push the data to a list (LPUSH) and trim it (LTRIM), so it doesn't grow forever if there are no consumers. On the other end, the consumer would grab items from that list and process them. You can also use keyspace notifications, and be alerted each time an item is added to that queue.
I pass data between application using two native redis command:
rpush and blpop .
"blpop blocks the connection when there are no elements to pop from any of the given lists".
Data are passed in json format, between application using list as queue.
Application that want send data (act as publisher) make a rpush on a list
Application that want receive data (act as subscriber) make a blpop on the same list
The code shuold be (in perl language)
Sender (we assume an hash pass)
#Encode hash in json format
my $json_text = encode_json \%$hash_ref;
#Connect to redis and send to list
my $r = Redis->new(server => "127.0.0.1:6379");
$r->rpush("shared_queue","$json_text");
$r->quit;
Receiver (into a infinite loop)
while (1) {
my $r = Redis->new(server => "127.0.0.1:6379");
my #elem =$r->blpop("shared_queue",0);
#Decode hash element
my $hash_ref=decode_json($elem\[1]);
#make some stuff
}
I find this way very usefull for many reasons:
The element are stored into list, so temporary disabling of receiver has no information loss. When recevier restart, can process all items into the list.
High rate of sender can be handled with multiple instance of receiver.
Multiple sender can send data on unique list. In ths case should be easily implmented a data collector
Receiver process that act as daemon can be monitored with specific tools (e.g. pm2)
From Redis 5, there is new data type called "Streams" which is append-only datastructure. The Redis streams can be used as reliable message queue with both point to point and multicast communication using consumer group concept Redis_Streams_MQ

Redis publish/subscribe: see what channels are currently subscribed to

I am currently interested in seeing what channels are subscribed to in a Redis pub/sub application I have. When a client connects to our server, we register them to a channel that looks like:
user:user_id
The reason for this is I want to be able to see who's "online". I currently blindly fire off messages to a channel without knowing if a client is online since it's not critical that they receive these types of messages.
In an effort to make my application smarter, I'd like to be able to discover if a client is online or not using the pub/sub API, and if they are offline, cache their messages to a separate redis queue which I can push to them when they get back online.
This does not have to be 100% accurate, but the more accurate it is, the better. I'm assuming a generic key does not get created when a channel gets subscribed to, so I cannot do something as trivial as:
redis-cli keys user* to find all online users.
The other strategy I've thought of is just maintaining my own Redis Set whenever a user published or removes themselves from a channel (which the client automatically handles when they hop online and close the app). That would be an additional layer of complexity that I need to manage and I'm hoping there is a more trivial approach with the data that's already available.
As of Redis 2.8 you can do:
PUBSUB CHANNELS [pattern]
The PUBSUB CHANNELS command has O(N) complexity, where N is the number of active channels.
So in your case:
redis-cli PUBSUB CHANNELS user*
would give you want you want.
There is currently no command for showing what channels "exist" by way of being subscribed to, but there is and "approved" issue and a pull request that implements this.
https://github.com/antirez/redis/issues/221
https://github.com/antirez/redis/pull/412
Due to the nature of this call, it is not something that can scale, and is thus a "DEBUG" command.
There are a few other ways to solve your problem, however.
If you have reason to believe that a channel may be subscribed to, you can send it a message and look at the result. The result is the number of subscribers that got the message. If you got 0, you know that they're not there.
Assuming that your user_ids are incremental, you might be interested in using SETBIT to set a 1 or 0 to a user's offset bit to track presence. You can then do cool things like the new BITCOUNT to see how many users are online, and GETBIT to determine if a specific user is online.
The way I have solved your problem more specifically in the past is by signaling a subscription manager that I have subscribed to a channel. The manager then "pings" the channel by sending a blank message to confirm that there is a subscriber, and occasionally pings the channel thereafter to determine if the user is still online. Not ideal, but better than using DEBUG CHANNELS in production.
From version 2.8.0 redis has a pubsub command that would help in this case:
http://redis.io/commands/pubsub
Remark: currently the state of 2.8.0 is not stable yet (RC2)
I am unaware of any specific way to query what channels are being subscribed to, and you are correct that there isn't any key created when this happens. Also, I wouldn't use the KEYS command in production anyway, as it's really a debugging command.
You have the right idea about using a set to add the user when they're online, and then query this with SISMEMBER <set> <user_id> to determine if the messages should be sent to them or added to a Redis list for processing once they do come online.
You will need to figure out when a user logs off so you can remove them from the list of online users, but I don't know enough about your system to know exactly how you would go about that.
If the connected clients have the ability to send a message back to inform the server that the message(s) were consumed, you could use this to keep track of which messages should be stored for later retrieval.
Cheers,
Mike
* PUBSUB NUMSUB [channel-1 ... channel-N]
Returns the number of subscribers (not counting clients subscribed to patterns) for the specified channels.
https://redis.io/commands/pubsub