Erlang/Mochiweb newbie question about client communication - process

Every time a client/browser connects to the
Mochiweb server, it creates a new Loop process, doesn't it? So, if I want
to transfer a message from one client to another (a typical chat system), I
should use the self() of Loop to store all connected clients' PIDs, shouldn't
I?
If something (or everything) is wrong so far, please explain briefly how the
system works: where is the server process and where is the client process?
How do I send a message to a client's Loop process using its PID? I mean, where do I
put the "receive" in the Loop?

Here's a good article about a Mochiweb web chat implementation. HTTP clients don't have PIDs, as HTTP is a stateless protocol. You can use cookies to connect a request to a unique visitor of the chat room.

First, do your research right. Check out this article, and this one, and then this last one.
Let the mochiweb processes bring chat data into your other application server (this could be a gen_server, a worker in your OTP app with many supervisors, other distributed workers, etc.). You should not depend on the PID of the mochiweb process. Have another way of uniquely identifying your users: cookies, session IDs, auth tokens, etc., something managed only by your application. Let the mochiweb processes just deliver chat data to your servers as soon as it's available. You could do some kind of queuing in mnesia where every user has a message queue into which other users post chat messages. Then the mochiweb processes just keep asking mnesia whether a message is available for the user on each connection. In summary, it will depend on the chat methodology: HTTP long polling/COMET, REST, server push, keep-alive connections, and so on. Just keep it fault tolerant and do not involve mochiweb processes in the chat engine; let mochiweb be only the transport and do your chat jungle behind it!

You can use several data structures to avoid using PIDs for identity. Take a queue() as an example. Imagine you have a replicated Mnesia database with a RAM table in which you have implemented a uniquely identifiable queue() per user. A process (a mochiweb process) holding a connection to the user holds only the identity of this user's session. It then uses this identity to keep checking the user's queue() in Mnesia at regular intervals (if you intend to do it this way, keeping mochiweb processes alive for as long as the user's session). This means that no matter which process PID a user is connected through, as long as the process has the user's identity, it can fetch (read) messages from the user's message queue(). A consequence is that a user can have multiple client sessions. The same process can use this identity to put messages from this user into other users' queues.
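To make the idea concrete, here is a rough sketch of queues keyed by user identity rather than by process. It is written in Python with a plain dictionary standing in for the Mnesia RAM table, so the class and method names (MessageQueues, post, fetch) are made up for illustration and are not part of mochiweb or Mnesia.

```python
from collections import defaultdict, deque


class MessageQueues:
    """In-memory stand-in for the per-user queues kept in an Mnesia RAM table."""

    def __init__(self):
        # user_id -> pending messages, oldest first
        self._queues = defaultdict(deque)

    def post(self, sender_id, recipient_id, text):
        """Called by whichever connection process is handling the sender."""
        self._queues[recipient_id].append((sender_id, text))

    def fetch(self, user_id):
        """Drain and return every message waiting for user_id."""
        queue = self._queues[user_id]
        messages = list(queue)
        queue.clear()
        return messages


queues = MessageQueues()
queues.post("alice", "bob", "hi bob")
queues.post("carol", "bob", "lunch?")

# Any connection that knows it serves "bob" can collect his messages,
# regardless of which process (PID) is handling that connection.
first_poll = queues.fetch("bob")
second_poll = queues.fetch("bob")
```

Nothing in the store refers to a process, so a connection process can die and be replaced without losing the user's pending messages.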

Related

Redis - Max Subscriptions/Connections?

I'm designing a Java Spring based real-time notifications and chat system using Redis and WebSockets (with SockJS and STOMP). The requirement is for each user to subscribe to a unique channel (the channel name will be the user id). This is because notifications can be targeted at a single user and chat conversations can be 1-on-1. The very reason I'm using Redis is to get an event triggered in the corresponding application server (there are many) where the user is connected via WebSocket. As I understand it, when a publish happens to, say, "user1", and I want the "onMessage handler" fired for just that target user:
Do I need to maintain one Redis connection per user?
Is it okay to open 15k connections at a time, with 15k unique subscriptions, for that many users connected to the system at once?
Since you have tagged the question with Redisson, I presume you are using it already. If your choice of WebSocket framework is flexible, i.e. not limited to SockJS with STOMP, you could consider the netty-socketio project. It is written by the author of Redisson, and the integration between the two couldn't be more natural.
Netty-socketio is fully compatible with the popular Socket.IO client-side JS library and is used commercially by plenty of companies.
It doesn't require one Redis connection per user, and there are people whose usage is known to have already exceeded your requirement.
This is mentioned in the project's README file.
Customer feedback in 2014:
"To stress test the solution we run 30 000 simultaneous websocket clients and managed to peak at total of about 140 000 messages per second with less than 1 second average delay." (c) Viktor Endersz - Kambi Sports Solutions

Can RabbitMQ (or similar message queuing system) be used to single thread requests per user?

The issue is we have some modern web applications that are integrated with a legacy system that was never designed to support multiple concurrent requests from a single user. Basically there are certain types of requests that the legacy system can only handle one-at-a-time from a single user. It can handle multiple concurrent requests coming from different users, but for technical reasons cannot handle multiple from a single user. In these situations, the user's first request will complete successfully, but any subsequent requests from that same user that come in while the first request is still executing will fail.
Because our apps are ajax enabled, multi-tab/multi-browser friendly, and just the fact that there are multiple apps - there are certain scenarios where a user could wind up having more than one of these types of requests being sent to the legacy system at the same time.
I'm trying to determine if something like RabbitMQ could be positioned in front of the legacy system and leveraged to single-thread requests per user/IP. The thinking being that the web apps would send all requests to MQ, and they'd stack into per-user queues and pass on to the legacy system one at a time.
I don't know if there would be concerns about the potential number of queues this could create - we have a user-base of approx 4,000.
And I know we could somewhat address this in the web apps individually, but since there are multiple apps it'd be duplicating logic across them, and you'd still have the potential for two different apps to fire off concurrent requests.
Any feedback would be appreciated. Thanks-
I'm not sure a unique queue per user will work, as you would need a dynamically created backend worker process listening for messages on each queue.
Below is one option, but it has a potential performance bottleneck, as a single backend process would handle all requests sequentially. You could use multiple worker processes, but you wouldn't know whether one had completed before the other, which causes a race condition if your app requires a specific sequence of actions.
You could simply put all transactions (from all users) into a single queue and have a backend process pull from that queue and service each request. If a response needs to go back to the user once the request has been serviced, the worker process could reply on a separate queue with a correlationId that is used to send the response data back to the correct user.
I've done this before with ExpressJS apps where the following flow would happen:
1. The user/process/ajax makes a request
2. Express takes the payload from the request object and sends it to a RabbitMQ queue with a unique correlationId (e.g. UUID).
3. Express then takes the response object and stores it in a responseStore object with the key being the correlationId
4. Meanwhile, a backend worker process pulls the item from the queue, does some work and then sends a message to a different response queue with the same correlationId
5. The ExpressJS application has a connection to the response queue and when it receives a message, it takes the correlationId from the response and looks for a response object stored with same correlationId in the responseStore. If it finds it, it takes the payload from the message and does something like response.send(payload) or response.json(payload)
To do this, you should also have a mechanism that stores the creation time of the response object in the responseStore along with the response object. Then have a separate process that will check the responseStore and clean up old response objects after a certain timeout in case there are issues with the backend process completing.
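As a rough illustration of the responseStore and its cleanup, here is a sketch in Python rather than Express. The names (ResponseStore, register, resolve, sweep) are invented for this example, the respond callback stands in for the stored Express response object, and the current time is passed in explicitly so the behaviour is easy to follow.

```python
import uuid


class ResponseStore:
    """Holds pending responses keyed by correlationId, with their creation time."""

    def __init__(self, timeout_seconds):
        self._timeout = timeout_seconds
        self._pending = {}  # correlation_id -> (created_at, respond callback)

    def register(self, respond, now):
        """Store a pending response and return the id to send with the request."""
        correlation_id = str(uuid.uuid4())
        self._pending[correlation_id] = (now, respond)
        return correlation_id

    def resolve(self, correlation_id, payload):
        """Deliver a worker's reply; returns False if the entry is gone."""
        entry = self._pending.pop(correlation_id, None)
        if entry is None:
            return False
        _, respond = entry
        respond(payload)
        return True

    def sweep(self, now):
        """Drop entries older than the timeout and return their ids."""
        expired = [cid for cid, (created, _) in self._pending.items()
                   if now - created > self._timeout]
        for cid in expired:
            del self._pending[cid]
        return expired


sent = []
store = ResponseStore(timeout_seconds=30)
answered_id = store.register(sent.append, now=0)
abandoned_id = store.register(sent.append, now=0)

resolved = store.resolve(answered_id, {"status": "done"})  # worker replied
expired = store.sweep(now=31)                              # cleanup pass
late = store.resolve(abandoned_id, {"status": "too late"})
```

In a real app the sweep would also want to send a timeout error to the waiting client before discarding the entry; that part is left out here.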
Look here for more info on RPC with RabbitMQ:
https://www.rabbitmq.com/tutorials/tutorial-six-javascript.html
Hope this helps.

Redis Stale Data

I'm new to Redis. I'm designing a pub/sub mechanism in which there's a specific channel for every client (business client) that has at least one user (browser) connected. Those users then receive information about the client to which they belong.
I need Redis because I have a distributed system: there is a backend which pushes data to the corresponding client channels, and a webapp which has its own server (multiple instances) that holds the users' connections (websockets).
To summarize:
The backend is the publisher and webapp server is the subscriber
A Client has multiple Users
One channel per Client with at least 1 User connected
If Client doesn't have connected Users, then no channel exists
Backend pushes data to every existing Client channel
Webapp Server consumes data only from the Client channels that correspond to the Users connected to itself.
So, in order to reduce work, I don't want my Backend to push data to Clients that don't have Users connected. It seems that I need a way to share the list of connected Users from my Webapp to my Backend, so that the Backend can decide which Clients' data to push to Redis. The obvious place to share that piece of data would be the same Redis instance.
My approach is to have a key in Redis with something like this:
[USERS: User1/ClientA/WebappServer1, User2/ClientB/WebappServer1,
User3/ClientA/WebappServer2]
So here comes my question...
How can I overcome stale data if, for example, one of my Webapp nodes crashes and doesn't get the chance to remove its list of connected Users from Redis?
Thanks a lot!
Firstly, good luck with the overall project - sounds challenging and fun :)
I'd use a slightly different design to keep track of my users - have each Client/Webapp maintain a set (possibly sorted with login time as score) of their users. Set a TTL for the set and have the client/webapp reset it periodically, or it will expire if the owning process crashes.
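Here is a small sketch of that expiry idea. It uses an in-memory Python class as a stand-in for Redis (where the TTL would be set with EXPIRE and refreshed by the webapp), so PresenceRegistry and its methods are hypothetical names, and time is passed in by hand to keep the example deterministic.

```python
class PresenceRegistry:
    """In-memory stand-in for per-server user sets that expire via a TTL."""

    def __init__(self, ttl_seconds):
        self._ttl = ttl_seconds
        self._servers = {}  # server_id -> (expires_at, {(user, client), ...})

    def heartbeat(self, server_id, users, now):
        """Rewrite the server's set and push its expiry forward."""
        self._servers[server_id] = (now + self._ttl, set(users))

    def clients_with_users(self, now):
        """Clients that still have a user on a server whose set has not expired."""
        clients = set()
        for expires_at, users in self._servers.values():
            if expires_at > now:
                clients.update(client for _, client in users)
        return clients


registry = PresenceRegistry(ttl_seconds=30)
registry.heartbeat("WebappServer1", [("User1", "ClientA"), ("User2", "ClientB")], now=0)
registry.heartbeat("WebappServer2", [("User3", "ClientA")], now=0)
before_expiry = registry.clients_with_users(now=20)

# WebappServer1 crashes and stops refreshing; WebappServer2 carries on.
registry.heartbeat("WebappServer2", [("User3", "ClientA")], now=25)
after_expiry = registry.clients_with_users(now=40)
```

The backend never has to be told about the crash: ClientB simply drops out once WebappServer1's set expires.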

Camel route "to" specific websocket endpoint

I have some Camel routes with Mina sockets and Jetty websockets. I am able to broadcast a message to all the clients connected to the websocket, but how do I send a message to a specific endpoint? How do I maintain a list of all connected clients, with a client id as reference, so I can route to a specific client? Is that possible? Will I be able to specify a dynamic client in the to URI?
Or maybe I am thinking about this wrong and I need to create topics on ActiveMQ and have the clients subscribe to them. Would that mean I create a topic for every websocket client and route each message to the right topic?
Am I at least on the right track here? Are there any examples you can point to? Google was not helpful.
The approach you take depends on how sensitive the client information is. The downside of a single topic with selectors is that anyone can subscribe to the topic without a selector and see all the information for everyone - not usually something that you want to do.
A better scheme is to use a message distribution mechanism (a set of Camel routes) that acts as an intermediary between the websocket clients and the system producing the messages. This mechanism is responsible for distributing messages from a single destination to client-specific destinations. I have worked on a couple of banking web front-ends that used a similar scheme.
In order for this to work you first generate for each user a distinct token/UUID; this is presented to the user when the session is established (usually through some sort of profile query/message).
It's essential that the UUID can be worked out as a hash of the clientId rather than being stored in a DB, as it will be used all the time and you want to make sure this is worked out quickly.
The user then uses that information to connect to specific topics that use that UUID as a suffix. For example two users subscribing to an orderConfirmation topic would each subscribe to their own version of that topic:
clientA -> orderConfirmation.71jqsd87162iuhw78162wd7168
clientB -> orderConfirmation.76232hdwe7r23j92irjh291e0d
To keep track of "presence", your clients would need to periodically send a heartbeat message containing their clientId to a well-known topic that your distribution mechanism listens on. Clients should not be able to subscribe to this topic for reads (see ActiveMQ Security). The message distribution mechanism needs to keep in memory a data structure that contains the clientId and the time a heartbeat was last seen.
When a message is received by the distribution mechanism, it checks whether the clientID for which it received the message has a "live/present" session, determines the UUID for the client, and broadcasts the message on the appropriate topic.
Over time this will create a large number of topics on your broker that you don't want hanging around when the user has gone away. You can configure ActiveMQ to delete these if they have been inactive for some time.
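The pieces above (a token derived from the clientId, heartbeat tracking, and per-client topic names) could be sketched roughly like this. It is plain Python rather than Camel routes, and several details are assumptions of mine: the keyed HMAC-SHA256 hash (so the suffix cannot be guessed from the clientId alone), the SECRET value, the 60 second timeout, and the Distributor class itself.

```python
import hashlib
import hmac

SECRET = b"replace-with-a-server-side-secret"  # hypothetical key
HEARTBEAT_TIMEOUT = 60  # seconds; assumed value


def client_token(client_id):
    """Derive the per-client suffix from the clientId, with no DB lookup."""
    return hmac.new(SECRET, client_id.encode("utf-8"), hashlib.sha256).hexdigest()


class Distributor:
    """Tracks heartbeats and picks the client-specific topic for a message."""

    def __init__(self):
        self._last_seen = {}  # client_id -> time of last heartbeat

    def heartbeat(self, client_id, now):
        self._last_seen[client_id] = now

    def route(self, base_topic, client_id, now):
        """Return the client-specific topic, or None if the client is absent."""
        seen = self._last_seen.get(client_id)
        if seen is None or now - seen > HEARTBEAT_TIMEOUT:
            return None
        return f"{base_topic}.{client_token(client_id)}"


distributor = Distributor()
distributor.heartbeat("clientA", now=0)

live_topic = distributor.route("orderConfirmation", "clientA", now=30)
stale_topic = distributor.route("orderConfirmation", "clientA", now=120)
unknown_topic = distributor.route("orderConfirmation", "clientB", now=30)
```

Messages for clients with no recent heartbeat are simply not forwarded, which also limits how many idle topics get touched on the broker.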
You definitely do not want to create a separate endpoint for each client.
A topic and a subscription with a selector is an elegant way to solve this.
I would say the best one.
You need a single topic, which every client subscribes to with a selector along the lines of clientId in ('${myClientId}', 'EVERYONE'). When you want to publish a message to a specific client, you set the clientId property to that client's id. If you want to broadcast, you set it to 'EVERYONE'.
I hope I understand the problem right...
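To show what that selector does, here is a tiny Python imitation of the filtering. In a real setup the broker evaluates the selector for each subscriber; the matches_selector function and the sample messages below are invented purely for illustration.

```python
BROADCAST = "EVERYONE"


def matches_selector(message_properties, my_client_id):
    """Mimic the selector: clientId in (my_client_id, 'EVERYONE')."""
    return message_properties.get("clientId") in (my_client_id, BROADCAST)


messages = [
    {"clientId": "clientA", "body": "order 17 confirmed"},
    {"clientId": "clientB", "body": "order 18 confirmed"},
    {"clientId": "EVERYONE", "body": "maintenance at noon"},
]

# What a subscriber that identifies itself as clientA would receive.
seen_by_a = [m["body"] for m in messages if matches_selector(m, "clientA")]
```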

How is Redis used in Trello?

I understand that, roughly speaking, Trello uses Redis for a transient data store.
Is anyone able to elaborate further on the part it plays in the application?
We use Redis on Trello for ephemeral data that we would be okay with losing. We do not persist the data in Redis to disk, and we use it with allkeys-lru, so we only store things there that can be kicked out at any time with only very minor inconvenience to users (e.g. momentarily seeing an incorrect user status). That being said, we give it more than 5x the space it needs to store its actual working set and choose from 10 keys for expiry, so we really never see anything get kicked out that we're using.
It's our pubsub server. When a user does something to a board or a card, we want to send a message with that delta to all websocket-connected clients that are subscribed to the object that changed, so all of our Node processes are subscribed to a pubsub channel that propagates those messages, and they propagate that out to the appropriately permissioned and subscribed websockets.
We SORT OF use it to back socket.io, but since we only use the websockets, and since socket.io is too chatty to scale like we need it to at the moment, we have a patch that disables all but the one channel that is necessary to us.
For our users who don't have websockets, we have to keep a list of the actions that have happened on each object channel since the user's last poll request. For that we use a list which we cap at the most recent 100 elements, and an auxiliary counter of how many elements have been added to the list since it was created. So when we're answering a poll request from such a browser, we can check the last element it reports that it has seen, and only send down any messages that have been added to the queue since then. So that gets a poll request down to just a permissions check and a single Redis key check in most cases, which is very fast.
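A guess at the shape of that capped list plus counter, sketched in Python with an in-memory deque instead of a Redis list. Trello's actual code is not shown here, so ActionLog, add and since are made-up names, and returning None for "too far behind" is my own choice.

```python
from collections import deque


class ActionLog:
    """Keeps the most recent actions plus a count of everything ever added."""

    def __init__(self, cap=100):
        self._items = deque(maxlen=cap)  # older entries fall off the front
        self.total_added = 0

    def add(self, action):
        self._items.append(action)
        self.total_added += 1
        return self.total_added  # sequence number of this action

    def since(self, last_seen):
        """Actions added after sequence number last_seen.

        Returns None when some of them have already fallen off the list,
        in which case the caller needs a full refresh instead.
        """
        missed = self.total_added - last_seen
        if missed > len(self._items):
            return None
        if missed <= 0:
            return []
        return list(self._items)[-missed:]


log = ActionLog(cap=3)
for name in ["a", "b", "c", "d", "e"]:
    log.add(name)

recent = log.since(3)      # client last saw action number 3
up_to_date = log.since(5)  # client has seen everything
too_old = log.since(1)     # more was missed than the list still holds
```

The counter is what lets the server translate "the last element I saw" into a position in a list whose older entries have been trimmed away.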
We store some ephemeral data about the active status of connected users in Redis, because that data changes frequently and it is not necessary to persist it to disk.
We store short-lived keys to support OAuth logins in Redis.
We love Redis; once you have an instance of it up and running, you want to use it for all kinds of things. The only real trouble we have had with it is with slow-consuming clients eating up the available space.
We use MongoDB for our more traditional database needs.
Trello uses Redis with Socket.IO (RedisStore) for scaling, with the following two features:
key-value store, to set and get values for a connected client
as a pub-sub service
Resources:
Look at the code for RedisStore in Socket.IO here: https://github.com/LearnBoost/socket.io/blob/master/lib/stores/redis.js
Example of Socket.IO with RedisStore: http://www.ranu.com.ar/2011/11/redisstore-and-rooms-with-socketio.html