Imagine an application like WhatsApp that shows, for each chat, a count of unread mentions and unread messages.
I want to implement a scalable system to handle the notification counts of such an app. Here is what I've thought about possible solutions and their problems:
1) Create a counter for each user in each group collection and increment it by 1 for each new message:
➜ Problem: if I have chats with 500, 1000, or 10000 users, I will have to do 500, 1000, or 10000 field updates per message.
➜ Test: I created a new collection with 50M documents. Update time for 6000 users = 0.15 seconds. Update time for 100000 users = 14.2 seconds. It's not scalable.
Notifications Model: (compound index: roomId: 1, channelId: 1, userId: 1)
{
roomId: string,
channelId: string,
userId: string,
unread_messages: int,
unread_mentions: int,
last_read: date
}
2) Save the last message read by each user and, when doing the initial data GET, count the messages in each chat from the last message read to the most recent one, capping the count at a limit.
➜ Problem: if you have 200 chats, you cap the count at 100 notifications per chat, and you have not logged into the application for a while, you will have to count up to 100 messages in each of 200 rooms. The "count" operation is quite expensive for databases.
➜ Test: counting 100 messages per chat across 200 chats took 8.4 seconds, with messages indexed by id and timestamp. That is a long time for a client login.
3) Set up a PUB/SUB system using, for example, ActiveMQ, RabbitMQ or Kafka, and create a queue for each chat.
➜ Problem: you duplicate messages in the database and in the queues/topics. In addition, because the queues are shared, you would still have to query how far user X had read the last time, and when you connect as a subscriber those messages are consumed and are no longer available to other consumers.
In Kafka, if each topic is a chat, I can't count pending notifications without fetching all pending messages and consuming them. So, if I consume these messages and don't enter the chat, there will be no notifications the next time I log in.
Can you think of any more options, or are any of the ones I mentioned above scalable?
Thank you very much in advance.
In order to solve this, you can keep the count of written messages in every chat and the count of read messages for every user in every chat. Essentially, the difference between these numbers is the number of unread messages of a user for a specific chat.
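A minimal sketch of this idea, assuming a plain in-memory store; the class and method names (`UnreadCounters`, `post_message`, `mark_all_read`) are hypothetical and only illustrate the "written minus read" arithmetic, not a production design:

```python
from collections import defaultdict


class UnreadCounters:
    """Toy model: one written-counter per chat, one read-counter per (user, chat)."""

    def __init__(self):
        self.written = defaultdict(int)  # chat_id -> total messages written
        self.read = defaultdict(int)     # (user_id, chat_id) -> messages read so far

    def post_message(self, chat_id):
        # One increment per message, regardless of how many members the chat has.
        self.written[chat_id] += 1

    def mark_all_read(self, user_id, chat_id):
        # Called when the user opens the chat and catches up.
        self.read[(user_id, chat_id)] = self.written[chat_id]

    def unread(self, user_id, chat_id):
        # Unread count is the difference between the two counters.
        return self.written[chat_id] - self.read[(user_id, chat_id)]


counters = UnreadCounters()
for _ in range(10):
    counters.post_message("room-1")
counters.mark_all_read("alice", "room-1")
counters.post_message("room-1")
print(counters.unread("alice", "room-1"))  # 1
print(counters.unread("bob", "room-1"))    # 11
```

In this sketch a user who has never read anything sees every message as unread, so a real system would probably set the read counter to the chat's current count when a user joins.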
Let's say there are 1000 online users, all in 100 chat rooms, with 10 users active in each room and 990 inactive in each room. Each active user, all of a sudden, writes one message in the chat. This will produce 1000 messages and only 1000 counter updates (10 per chat). Users who are inactive will only receive the new count for each chat, but their own count of read messages stays the same. For those active in a chat there is nothing unread, since the number of their read messages will equal the count of the chat.
If one user is offline and comes online in one chat, he will get 10 messages and one update to his number of read messages. If he is enrolled in all 100 chats, he will get 1000 messages and 100 updates if he reads them all.
If one user is online, but not active in any chat (app in the background), he will get the new count for every chat that is written into. Since there is a read message count for every chat in his profile, the client will have to do the math and display the difference.
This can be further optimised by letting the client do some of the work and update the backend with the number of read messages. This basically relieves the backend of half of the operations in the example above, so the effective number of operations done in the backend will be 1000.
Of course, there can be further optimisations done, like bidirectional asynchronous updates that are being sent at controlled time intervals or number of messages. This allows for both the client and the backend to send bulk notifications and control use of resources.
Given the context you provided, I think solution 1) is perfectly viable, as long as you decouple the counter update from its display and keep this information in memory.
Now imagine the following process:
the application starts
during start-up a separate thread runs and does the first count (14.2 seconds is acceptable at start-up)
this information is loaded into some kind of in-memory database (for example Redis) for quick access -> this is your "in-memory per-user notification counter cache", a simple map (uid, [c]) where uid is the userId and [c] is an array of counters
you can limit this map for each user, for example to at most 255 chats/groups; otherwise your application needs to compute and update/extend the map (like the limit you mentioned)
periodically you can "compact" this map and purge unused counters from memory (each night, for example, or every 2 hours, depending on your requirements) to keep memory in check
user1 accesses the application for the first time
the application fires a request and gets the unread-message notifications from the cache (in memory, so really quick)
user2 sends a message to user1; now there are some scenarios:
user1 is not online (app closed), so a "slow" refresh of the unread notification counter for user1 (and only user1) can be triggered to update the in-memory cache (a few seconds is acceptable)
user1 is online, the chat is open and the message is delivered. In this case the counter cache doesn't require a refresh
user1 is online but not in that specific chat, for example in the chat list. I suppose some kind of trigger can be fired to request a refreshed list of notification messages for the user, BUT only for the chat with user2, not for all chats --> I think this is the key, so that the refresh happens both in the app and in the central in-memory cache
I think this will solve your problem and give you more speed, but it requires that:
the application knows the status of each user (online/offline) and stores it for quick access (another map in the in-memory database, maybe?)
the local user app knows when a new message is available in a specific chat
I suppose these two requirements are already part of your system, as for any chat messaging platform
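The process above could be sketched roughly as follows. This is only an illustration under assumptions: a plain Python dict stands in for Redis, the names (`NotificationCache`, `on_message`, `compact`) are hypothetical, and the cached value is incremented instead of being recomputed from the database, which simplifies the targeted refresh described above.

```python
class NotificationCache:
    """Hypothetical stand-in for the in-memory map (uid -> per-chat counters)."""

    def __init__(self, max_chats_per_user=255):
        self.max_chats_per_user = max_chats_per_user
        self.counters = {}  # user_id -> {chat_id: unread_count}

    def load(self, user_id, per_chat_counts):
        # Slow path: counts computed from the database at start-up or on refresh.
        items = list(per_chat_counts.items())[: self.max_chats_per_user]
        self.counters[user_id] = dict(items)

    def on_message(self, recipient_id, chat_id, chat_is_open):
        # If the recipient has the chat open, the message is read immediately
        # and the cached counter does not need to change.
        if chat_is_open:
            return
        chats = self.counters.setdefault(recipient_id, {})
        # Only this one chat is touched, never the whole list.
        if chat_id in chats or len(chats) < self.max_chats_per_user:
            chats[chat_id] = chats.get(chat_id, 0) + 1

    def unread(self, user_id):
        # Fast path used when the user opens the application.
        return dict(self.counters.get(user_id, {}))

    def compact(self, active_user_ids):
        # Periodic purge: drop counters for users that are no longer in use.
        for user_id in list(self.counters):
            if user_id not in active_user_ids:
                del self.counters[user_id]


cache = NotificationCache()
cache.load("user1", {"chat-with-user2": 0})
cache.on_message("user1", "chat-with-user2", chat_is_open=False)
cache.on_message("user1", "chat-with-user2", chat_is_open=True)
print(cache.unread("user1"))  # {'chat-with-user2': 1}
```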
We launched a new mobile app for iOS and Android devices in the first week of January and use FCM to send push notifications to users.
Thus far we've sent (based on the Firebase console report) ~60k notifications out to our users, and overall it's a very solid and reliable platform. We split our 'sends' into groups of 1000 push tokens / devices.
Question: ~15 times since we launched, we've received 'No Result' back from the cURL call that sends the notifications upstream to FCM... and on one occasion we received an error 500.
To work around this and not just assume success we are detecting when the result isn't what we expect it to be upon success, and we log the response (i.e. "no result")... then wait 5 seconds and retry, up to 3 times. (our log message denotes the 'try number' as well).
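For illustration, the workaround could look roughly like the sketch below. It is not our actual code: `send` is a hypothetical callable that returns the parsed response or `None` for "no result", and "up to 3 times" is read here as up to 3 retries after the first attempt.

```python
import time


def send_with_retry(send, max_retries=3, delay_seconds=5.0, sleep=time.sleep):
    """Call send(); on an unexpected result, log it, wait, and retry."""
    for attempt in range(max_retries + 1):  # attempt 0 is the initial send
        try:
            result = send()
            reason = "no result"
        except Exception as exc:  # e.g. a transport error or an HTTP 500
            result, reason = None, repr(exc)
        if result is not None:
            return result
        print(f"try {attempt}: {reason}")  # log message includes the try number
        if attempt < max_retries:
            sleep(delay_seconds)
    raise RuntimeError(f"send failed after {max_retries} retries")


# Usage with a fake sender that fails once and then succeeds.
responses = iter([None, {"success": 1000}])
result = send_with_retry(lambda: next(responses), sleep=lambda seconds: None)
print(result)  # {'success': 1000}
```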
We have, maybe twice a week, received the 'first try' message (meaning the first attempt failed and 5 secs later the 2nd attempt kicks off)... and only ONCE (this week) have we received the 'second try' message...
We're wondering if this is normal behavior for FCM. Is there some paid level of support or access that would alleviate these retry instances for us? I don't think there is an SLA for FCM, but generally speaking, are others seeing this same behavior, and is the rate I've described here what you'd consider 'normal'?
Thx!
Answer received from Google today:
Hello!
If I've correctly understood this, you have sent 60k messages and received 16 failures? That comes out to around 99.9997% success. Three nines is pretty much industry gold. So looking stellar so far.
There is no paid FCM version, but all clients, regardless of payment plan, run on the best hardware available so you're already in the premium servers. : )
We have a Java application that gets messages from rabbitmq using Spring AMQP.
For some of the queues, the number of consumers is not increasing, resulting in a slower message delivery rate.
For example, even though the max consumers is set to 50, the number of consumers remained at 6 most of the time under a load of 9000 messages.
However, this is not the case with other queues, i.e. the consumer count reached 35 for other queues.
We are using SimpleMessageListenerContainer's setMaxConcurrentConsumers API for setting max consumers.
Can someone please help me to understand this?
Configuration:
number of concurrent consumers: 4
number of max concurrent consumers: 50
When asking questions like this, you must always show configuration. Edit your question with complete details.
It depends on your configuration. By default, a new consumer is only added once every 10 seconds, and only if an existing consumer receives 10 messages without any gaps.
If that still doesn't answer your question, turn on DEBUG logging. If you can't figure it out from that, post the log (covering at least startConsumerMinInterval milliseconds) someplace like pastebin or dropbox.
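As a rough back-of-the-envelope illustration of the default behaviour described above, assuming the container adds at most one consumer per 10-second interval and that the activity condition is met every time (the function name is hypothetical):

```python
def min_ramp_up_seconds(current, maximum, start_interval_ms=10_000):
    """Rough estimate of the time needed to grow from `current` to `maximum`
    consumers when at most one consumer is added per interval."""
    return max(maximum - current, 0) * start_interval_ms / 1000


# From 4 concurrent consumers to the configured maximum of 50:
print(min_ramp_up_seconds(4, 50))  # 460.0
```

Under these assumptions the container needs several minutes of sustained load to approach the maximum, so a backlog that drains faster than that would never see all 50 consumers.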
I have a QuickBlox account that we're using internally for testing. Very low throughput (a total of around 600 messages across 2 days and never more than 3 or 4 per second at the very peak).
Today the messages stopped sending in the chatroom. There don't appear to be any errors coming through the network panel of Chrome, and no errors are popping up in the admin panel.
As a test, without changing any client code, I created a new room and simply updated my config so my client pointed there. This worked with absolutely no problems.
Are there any things I may be missing here? Is this possibly a free tier thing where only a few hundred messages may be sent at any one time or is this more likely something client side?
There were some maintenance periods; you should have received emails about them.
So maybe you were trying to use chat during one of those periods.
600 messages across 2 days is a very small volume, so there are no problems with limits here.
If I post 3 messages to a topic with 2 active consumers, what will the dequeue count be after all messages are successfully consumed, 3 or 6? From my JConsole I think 6 (it shows enQ=3 and deQ=6) but can you confirm?
Yes, your assumption is correct. But keep in mind that it might not always be an exact multiple: if one of the consumers disconnects for a period of time and then reconnects, the dequeue count will not include the messages missed by that client while it was disconnected.
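A toy model of that counting behaviour, not the broker's actual implementation: it assumes non-durable subscribers and that every delivered copy is acknowledged, and the class name `TopicStats` is made up for the example.

```python
class TopicStats:
    """Toy model of enqueue/dequeue counters on a topic with fan-out delivery."""

    def __init__(self):
        self.enqueue_count = 0
        self.dequeue_count = 0
        self.connected = set()

    def connect(self, consumer):
        self.connected.add(consumer)

    def disconnect(self, consumer):
        self.connected.discard(consumer)

    def publish(self):
        self.enqueue_count += 1
        # Each currently connected consumer gets (and acknowledges) its own copy.
        self.dequeue_count += len(self.connected)


topic = TopicStats()
topic.connect("A")
topic.connect("B")
for _ in range(3):
    topic.publish()
print(topic.enqueue_count, topic.dequeue_count)  # 3 6

# Same load, but consumer B is disconnected while the second message is sent.
topic = TopicStats()
topic.connect("A")
topic.connect("B")
topic.publish()
topic.disconnect("B")
topic.publish()
topic.connect("B")
topic.publish()
print(topic.enqueue_count, topic.dequeue_count)  # 3 5
```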