How to create a channel programmatically in SocketCluster?

I'm considering using socketcluster to build a realtime app. The docs are very clear but I could not find a way to create a channel on demand programmatically.
My need is: as a user, I would like to call a REST API which will create a channel which would immediately be up and running on the server.
For example, calling from the client side: POST https://<myServer>/api/channels with JSON body { "channel": "myChannel" } would create a myChannel channel on the server, and my client-side code would be able to subscribe directly (after having received the server response):
var myChannel = socket.subscribe('myChannel');
// The channel object already knows its name, so publish() takes only the data.
myChannel.publish('I am here !');
myChannel.watch(function (data) {
  console.log('received data from myChannel:', data);
});
I suppose that this newly created channel would use my authorization middleware, since middlewares are defined at the server level (wsServer.addMiddleware(wsServer.MIDDLEWARE_SUBSCRIBE, ...)).
Thanks a lot for your help,
Pierre

With SocketCluster, channels are created and destroyed for you automatically, so you don't need to manage their lifecycle. A channel is created on the back end as soon as at least one client subscribes to it (based on the channel name) and is automatically destroyed once all of those clients have disconnected or unsubscribed from it. SC also accounts for failure cases, e.g. if internet connections are unexpectedly lost.
SC is designed to be efficient at creating and destroying lots of unique channels on the fly. You can have hundreds of unique channels per user (so possibly many thousands or even millions of unique channels in total). Channels don't consume any CPU at all if they're idle and each channel has a tiny memory footprint.
Channels in SC are not message queues (unlike what is offered by RabbitMQ, NSQ, Kafka, Stomp...); SC does not store messages on a persistent queue (though you can extend SC with your own persistence logic).
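To make the lifecycle above concrete, here is a minimal in-memory sketch. It is not SocketCluster's actual implementation; ChannelRegistry and its method names are hypothetical and only model "created on first subscribe, destroyed on last unsubscribe or disconnect".

```javascript
// Hypothetical model of the channel lifecycle; not SocketCluster code.
class ChannelRegistry {
  constructor() {
    this.channels = new Map(); // channel name -> Set of subscribed client ids
  }

  // The channel comes into existence with its first subscriber.
  subscribe(channelName, clientId) {
    if (!this.channels.has(channelName)) {
      this.channels.set(channelName, new Set());
    }
    this.channels.get(channelName).add(clientId);
  }

  // The channel is destroyed when its last subscriber leaves.
  unsubscribe(channelName, clientId) {
    const subscribers = this.channels.get(channelName);
    if (!subscribers) return;
    subscribers.delete(clientId);
    if (subscribers.size === 0) {
      this.channels.delete(channelName);
    }
  }

  // A lost connection is treated as unsubscribing from every channel.
  disconnect(clientId) {
    for (const channelName of [...this.channels.keys()]) {
      this.unsubscribe(channelName, clientId);
    }
  }

  exists(channelName) {
    return this.channels.has(channelName);
  }
}
```

So in the scenario from the question, no REST call is needed: the first socket.subscribe('myChannel') is what brings the channel into existence.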

Related

How to scale Redis Queue

We are shifting from a monolithic to a microservice architecture for our e-commerce marketplace application. We chose Redis pub/sub for microservice-to-microservice communication and also for some push notification purposes. The push notification strategy is as follows:
Whenever an order is created (i.e., a customer creates an order), the backend publishes an event in the respective channel (queue), and the specific push-notification-microservice consumes this event (JSON message) and sends a push notification to the seller's mobile.
For the time being we are using redis-server installed on our Ubuntu machine without any hassle. The headache is the future: when millions of orders are generated at one point in time, how can we handle that? That means we need to scale the Redis queue, right?
My exact question (regardless of the above scenario) is:
How can I horizontally scale a Redis queue instead of increasing the RAM on the same machine?
Whenever an order is created (i.e., a customer creates an order), the
backend publishes an event in respective channel (queue) and the
specific push-notification-microservice consumes this event (json
message) and sends push notification to the seller mobile.
IIUC you're sending messages over Redis PUB/SUB, which is not durable. That means if the producer is up while other services/consumers are down, those consumers will miss messages: any service that is down loses every message sent while it was down.
Now let's assume, you're using Redis LIST and other combinations of data structures to solve the missing events issue.
Scaling a Redis queue is a little tricky because the entire queue is stored in one list, which resides on a single Redis host. What you can do is create your own partitioning scheme and design your Redis keys around it, much as Redis redistributes keys internally when a new master is added to a cluster. Building consistent hashing yourself would take some effort.
A very simple option is to distribute load based on the userId: for example, if the userId is between 0 and 999 use queue_0, between 1000 and 1999 use queue_1, and so on. This is a manual process that you can automate with a script. Whenever a new queue is added to the set, all consumers have to be notified and the publisher has to be updated as well.
Dividing by number like this is a range partitioning scheme; you can use a hash partitioning scheme as well. With either scheme, whenever a new queue is added to the queue set, the consumers must be notified so they can react. Consumers can spawn a new worker for the new queue. Removing a queue can be tricky, because all consumers must first have drained it.
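As a rough sketch of the two schemes, here is how a publisher could pick the queue name. The function names, the 1000-wide half-open ranges and the djb2-style hash are my own assumptions for illustration; the returned name is what you would pass as the key to LPUSH/BRPOP or similar.

```javascript
// Range partitioning: ids 0-999 -> queue_0, 1000-1999 -> queue_1, ...
// rangeSize is an assumed parameter; pick whatever fits your id distribution.
function rangeQueue(userId, rangeSize = 1000) {
  return `queue_${Math.floor(userId / rangeSize)}`;
}

// Hash partitioning: hash the id (djb2 xor variant), then take it modulo
// the number of queues. The same userId always maps to the same queue as
// long as queueCount stays unchanged.
function hashQueue(userId, queueCount) {
  let hash = 5381;
  for (const ch of String(userId)) {
    hash = ((hash * 33) ^ ch.charCodeAt(0)) >>> 0; // keep it an unsigned 32-bit int
  }
  return `queue_${hash % queueCount}`;
}
```

Note that with plain modulo hashing, changing queueCount remaps most ids to a different queue, which is exactly why adding or removing a queue needs the drain-and-notify step described above.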
You might also consider using Rqueue.

Azure service bus multiple instances for the same subscriber

I have an ASP.NET Core application which registers a subscription client to a topic on startup (IHostedService). This subscription client essentially has a dictionary of callbacks that need to be fired whenever it detects a new message in the topic with a matching id (the id is stored in the message properties). This dictionary is in memory and lives for the lifetime of the application.
Everything works fine on a single instance of the ASP.NET Core app service on Azure. As soon as I scale out to 2 instances, I notice that sometimes the callbacks in the subscription are not firing. This makes sense, as we now have two instances, each with its own dictionary of callbacks.
So I updated the code to check whether the id of the subscription exists: if not, abandon the message; if yes, get the callback and invoke it.
public async Task HandleMessage(Microsoft.Azure.ServiceBus.Message message, CancellationToken cancellationToken)
{
    var queueItem = this.converter.DeserializeItem(message);
    var sessionId = // get the session id from the message
    if (string.IsNullOrEmpty(sessionId))
    {
        await this.subscriptionClient.AbandonAsync(message.SystemProperties.LockToken);
        return;
    }
    if (!this.subscriptions.TryGetValue(sessionId, out var subscription))
    {
        await this.subscriptionClient.AbandonAsync(message.SystemProperties.LockToken);
        return;
    }
    await subscription.Call(queueItem);
    // subscription was found and executed. Complete message
    await this.subscriptionClient.CompleteAsync(message.SystemProperties.LockToken);
}
However, the problem still occurs. My only guess is that when calling AbandonAsync, the same instance is picking up the message again?
I guess what I am really trying to ask is: if I have multiple instances of a topic subscription client all pointing to the same subscriber for the topic, is it possible for all the instances to get a copy of the message? Or is that not guaranteed?
if I have multiple instances of a topic subscription client all pointing to the same subscriber for the topic, is it possible for all the instances to get a copy of the message? Or is that not guaranteed.
No. If it's the same subscription all clients are pointing to, only one will be receiving that message.
You're running into an issue of scaling out with competing consumers. When you scale out, you never know which instance will pick up the message. And since your state is local (in the memory of each instance), this will fail from time to time. An additional downside is the cost: by fetching messages on the "wrong" instance and abandoning them, you're going to pay more on the messaging side.
To address this issue you either need to have a shared/centralized store for that state or change your architecture around this.
I managed to solve the issue by making use of service bus sessions. What I was trying to do with the dictionary of callbacks is basically a session manager anyway!
Service bus sessions allow me to have multiple instances of a session client all pointing to the same subscription. However, each instance will only know or care about the sessions it is currently dealing with.
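To illustrate why sessions fix the "wrong instance" problem, here is a toy in-memory model of session affinity. This is not the Azure Service Bus SDK; SessionBroker and its methods are hypothetical names that only mimic the idea that one instance holds a session at a time and every message for that session goes to the holder.

```javascript
// Toy model of session affinity; not the Azure Service Bus API.
class SessionBroker {
  constructor() {
    this.owners = new Map(); // sessionId -> name of the instance holding the session
  }

  // An instance tries to take a session. It fails if another instance holds it.
  acceptSession(sessionId, instanceName) {
    const owner = this.owners.get(sessionId);
    if (owner !== undefined && owner !== instanceName) {
      return false;
    }
    this.owners.set(sessionId, instanceName);
    return true;
  }

  // A message is delivered only to the instance holding its session,
  // or to nobody (null) until some instance accepts that session.
  route(message) {
    return this.owners.has(message.sessionId)
      ? this.owners.get(message.sessionId)
      : null;
  }

  // Only the holder can release the session so another instance can take it.
  closeSession(sessionId, instanceName) {
    if (this.owners.get(sessionId) === instanceName) {
      this.owners.delete(sessionId);
    }
  }
}
```

Compared with the dictionary-plus-AbandonAsync approach in the question, the instance that registered the callback is also the one that receives the messages, so nothing has to be abandoned and redelivered.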

Can RabbitMQ (or similar message queuing system) be used to single thread requests per user?

The issue is we have some modern web applications that are integrated with a legacy system that was never designed to support multiple concurrent requests from a single user. Basically there are certain types of requests that the legacy system can only handle one-at-a-time from a single user. It can handle multiple concurrent requests coming from different users, but for technical reasons cannot handle multiple from a single user. In these situations, the user's first request will complete successfully, but any subsequent requests from that same user that come in while the first request is still executing will fail.
Because our apps are ajax enabled, multi-tab/multi-browser friendly, and just the fact that there are multiple apps - there are certain scenarios where a user could wind up having more than one of these types of requests being sent to the legacy system at the same time.
I'm trying to determine if something like RabbitMQ could be positioned in front of the legacy system and leveraged to single-thread requests per user/IP. The thinking being that the web apps would send all requests to MQ, and they'd stack into per-user queues and pass on to the legacy system one at a time.
I don't know if there would be concerns about the potential number of queues this could create - we have a user-base of approx 4,000.
And I know we could somewhat address this in the web apps individually, but since there are multiple apps it'd be duplicating logic across them, and you'd still have the potential for two different apps to fire off concurrent requests.
Any feedback would be appreciated. Thanks-
I'm not sure a unique queue per user will work as you would need to have a backend worker process listening for messages on that queue that would need to be dynamically created.
Below is one option but it does have a performance bottleneck potential as a single backend process would be handling all requests sequentially. You could use multiple worker processes but you wouldn't know if one had completed before the other causing a race condition if your app requires a specific sequence of actions.
You could simply put all transactions (from all users) into a single queue and have a backend process pull off of that queue and service the request. If there needs to be a response back to the user once the request was serviced, then the worker process could respond back to a separate queue with a correlationId that could be used to send the response data back to the correct user.
I've done this before with ExpressJS apps where the following flow would happen:
The user/process/ajax makes a request
Express takes the payload from the request object and sends it to a RabbitMQ queue with a unique correlationId (e.g. UUID).
Express then takes the response object and stores it in a responseStore object with the key being the correlationId
Meanwhile, a backend worker process pulls the item from the queue, does some work and then sends a message to a different response queue with the same correlationId
The ExpressJS application has a connection to the response queue and when it receives a message, it takes the correlationId from the response and looks for a response object stored with same correlationId in the responseStore. If it finds it, it takes the payload from the message and does something like response.send(payload) or response.json(payload)
To do this, you should also have a mechanism that stores the creation time of the response object in the responseStore along with the response object. Then have a separate process that will check the responseStore and clean up old response objects after a certain timeout in case there are issues with the backend process completing.
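The responseStore with its creation-time bookkeeping could be sketched roughly like this. It is a sketch under assumptions, not the exact code I used: ResponseStore, ttlMs and the injectable now() clock are names made up for illustration, and res is expected to be an Express-style response object with a json() method.

```javascript
// Hypothetical responseStore: pending responses keyed by correlationId,
// each stored with its creation time so stale ones can be cleaned up.
class ResponseStore {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, mainly to make testing easy
    this.entries = new Map(); // correlationId -> { res, createdAt }
  }

  // Called right after the request has been sent to the work queue.
  add(correlationId, res) {
    this.entries.set(correlationId, { res, createdAt: this.now() });
  }

  // Called when a message arrives on the response queue.
  // Returns false if the response object is gone (e.g. already swept).
  resolve(correlationId, payload) {
    const entry = this.entries.get(correlationId);
    if (!entry) return false;
    this.entries.delete(correlationId);
    entry.res.json(payload);
    return true;
  }

  // Run periodically (e.g. from setInterval) to drop responses whose
  // worker never answered. Returns how many entries were removed.
  sweep() {
    let removed = 0;
    for (const [correlationId, entry] of this.entries) {
      if (this.now() - entry.createdAt >= this.ttlMs) {
        this.entries.delete(correlationId);
        removed++;
      }
    }
    return removed;
  }
}
```

In a real app you would probably also send an error or timeout response to the client inside sweep() before dropping the entry, otherwise the HTTP request just hangs until the client gives up.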
Look here for more info on RPC with RabbitMQ:
https://www.rabbitmq.com/tutorials/tutorial-six-javascript.html
Hope this helps.

Redis Stale Data

I'm new at Redis. I'm designing a pub/sub mechanism, in which there's a specific channel for every client (business client) that has at least one user (browser) connected. Those users then receive information of the client to which they belong.
I need Redis because I have a distributed system: there is a backend which pushes data to the corresponding client channels, and there is a webapp which has its own server (multiple instances) that holds the user connections (websockets).
To summarize:
The backend is the publisher and webapp server is the subscriber
A Client has multiple Users
One channel per Client with at least 1 User connected
If Client doesn't have connected Users, then no channel exists
Backend pushes data to every existing Client channel
Webapp Server consumes data only from the Client channels that correspond to the Users connected to itself.
So, in order to reduce work, I don't want my Backend to push data to Clients that don't have Users connected. It seems that I need a way to share the list of connected Users from my Webapp to my Backend, so that the Backend can decide which Clients' data to push to Redis. The obvious place to share that piece of data would be the same Redis instance.
My approach is to have a key in Redis with something like this:
[USERS: User1/ClientA/WebappServer1, User2/ClientB/WebappServer1,
User3/ClientA/WebappServer2]
So here comes my question...
How can I overcome stale data if, for example, one of my Webapp nodes crashes and doesn't get the chance to remove its list of connected Users from Redis?
Thanks a lot!
Firstly, good luck with the overall project - sounds challenging and fun :)
I'd use a slightly different design to keep track of my users - have each Client/Webapp maintain a set (possibly sorted with login time as score) of their users. Set a TTL for the set and have the client/webapp reset it periodically, or it will expire if the owning process crashes.
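Here is a small in-memory sketch of that idea, so you can see how the expiry takes care of crashed nodes. It does not talk to Redis; PresenceStore, ttlMs and the injectable clock are hypothetical names, with each entry standing in for one Redis set per webapp server plus its TTL.

```javascript
// In-memory stand-in for "one set per webapp server, with a TTL".
class PresenceStore {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock for testing
    this.sets = new Map(); // serverId -> { users: Map(userId -> clientId), expiresAt }
  }

  isExpired(entry) {
    return this.now() >= entry.expiresAt;
  }

  // A webapp server records a connected user. A missing or expired set is
  // created fresh with a new TTL.
  addUser(serverId, userId, clientId) {
    let entry = this.sets.get(serverId);
    if (!entry || this.isExpired(entry)) {
      entry = { users: new Map(), expiresAt: this.now() + this.ttlMs };
      this.sets.set(serverId, entry);
    }
    entry.users.set(userId, clientId);
  }

  // Each live webapp server calls this periodically to reset its TTL.
  heartbeat(serverId) {
    const entry = this.sets.get(serverId);
    if (entry && !this.isExpired(entry)) {
      entry.expiresAt = this.now() + this.ttlMs;
    }
  }

  // The backend asks which Clients currently have users connected.
  // Sets owned by servers that stopped heartbeating simply drop out.
  connectedClients() {
    const clients = new Set();
    for (const [serverId, entry] of this.sets) {
      if (this.isExpired(entry)) {
        this.sets.delete(serverId);
        continue;
      }
      for (const clientId of entry.users.values()) {
        clients.add(clientId);
      }
    }
    return clients;
  }
}
```

The heartbeat interval should be comfortably shorter than the TTL, so a slow tick on a healthy server doesn't make its users vanish.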

How is Redis used in Trello?

I understand that, roughly speaking, Trello uses Redis for a transient data store.
Is anyone able to elaborate further on the part it plays in the application?
We use Redis on Trello for ephemeral data that we would be okay with losing. We do not persist the data in Redis to disk, and we run it with the allkeys-lru eviction policy, so we only store things there that can be kicked out at any time with only very minor inconvenience to users (e.g. momentarily seeing an incorrect user status). That being said, we give it more than 5x the space it needs to store its actual working set and choose from 10 keys for expiry, so we really never see anything get kicked out that we're using.
It's our pubsub server. When a user does something to a board or a card, we want to send a message with that delta to all websocket-connected clients that are subscribed to the object that changed, so all of our Node processes are subscribed to a pubsub channel that propagates those messages, and they propagate that out to the appropriately permissioned and subscribed websockets.
We SORT OF use it to back socket.io, but since we only use the websockets, and since socket.io is too chatty to scale like we need it to at the moment, we have a patch that disables all but the one channel that is necessary to us.
For our users who don't have websockets, we have to keep a list of the actions that have happened on each object channel since the user's last poll request. For that we use a list which we cap at the most recent 100 elements, and an auxiliary counter of how many elements have been added to the list since it was created. So when we're answering a poll request from such a browser, we can check the last element it reports that it has seen, and only send down any messages that have been added to the queue since then. So that gets a poll request down to just a permissions check and a single Redis key check in most cases, which is very fast.
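The capped list plus counter can be pictured with a small in-memory sketch like the one below. This is only an illustration of the idea, not Trello's code and not Redis calls; CappedLog, since() and the shape of the return value are assumptions.

```javascript
// In-memory sketch of "list capped at N elements + counter of total adds".
class CappedLog {
  constructor(capacity = 100) {
    this.capacity = capacity;
    this.items = [];     // most recent messages, oldest first
    this.totalAdded = 0; // how many messages were ever added
  }

  // Append a message and trim the list to the cap.
  // Returns the new counter value, which doubles as a cursor.
  push(message) {
    this.items.push(message);
    this.totalAdded++;
    if (this.items.length > this.capacity) {
      this.items.shift();
    }
    return this.totalAdded;
  }

  // lastSeen is the counter value the polling client last reported.
  // complete is false when the client fell behind further than the cap,
  // meaning some messages are gone and it should do a full refresh.
  since(lastSeen) {
    const missed = this.totalAdded - lastSeen;
    const cursor = this.totalAdded;
    if (missed <= 0) {
      return { messages: [], complete: true, cursor };
    }
    if (missed > this.items.length) {
      return { messages: [...this.items], complete: false, cursor };
    }
    return {
      messages: this.items.slice(this.items.length - missed),
      complete: true,
      cursor,
    };
  }
}
```

The common case (nothing new since the last poll) only compares two numbers, which matches the "single key check" cost described above.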
We store some ephemeral data about the active status of connected users in Redis, because that data changes frequently and it is not necessary to persist it to disk.
We store short-lived keys to support OAuth logins in Redis.
We love Redis; once you have an instance of it up and running, you want to use it for all kinds of things. The only real trouble we have had with it is with slow-consuming clients eating up the available space.
We use MongoDB for our more traditional database needs.
Trello uses Redis with Socket.IO (RedisStore) for scaling, with the following two features:
key-value store, to set and get values for a connected client
as a pub-sub service
Resources:
Look at the code for RedisStore in Socket.IO here: https://github.com/LearnBoost/socket.io/blob/master/lib/stores/redis.js
Example of Socket.IO with RedisStore: http://www.ranu.com.ar/2011/11/redisstore-and-rooms-with-socketio.html