Usage rate limit for FCM - 2500? or 400? for admin SDK - firebase-cloud-messaging

Reading the docs I found:
XMPP server throttling
We limit the rate that you can connect to FCM XMPP servers to 400 connections per minute per project. This shouldn't be an issue for message delivery, but it is important for ensuring the stability of our system.
For each project, FCM allows 2500 connections in parallel.
Also within that page there was a description of different ways to connect to FCM to send messages. HTTP and XMPP are different mechanisms, so I am assuming that the admin SDK (Golang in my case) uses HTTP under the hood and not XMPP. Please correct me if that's not true.
If the admin SDK uses HTTP, that means there can only be 2500 simultaneous connections.
I'm making a scalable application where users basically define their own schedule for notifications (and the messages) and a server retrieves it, runs on a timer loop every 30 seconds or so to see who needs their message sent.
For all intents and purposes, each notification is different. However the vast majority of these notifications will land on the hour. Meaning my server will have to send possibly many thousands of notifications within the X:00 minute in the hour. It's important these notifications come on time (ie, I cannot space them all out within the hour).
Using workarounds like topics won't work in my case because everyone is individual.
I'm just thinking of options to deal with these limitations (and to make sure I understand them). If FCM allows 2500 connections in parallel via the admin SDK in Go, can I open 2500 async connections, wait until they all finish, then open another 2500, and so on? That way, if I have 25,000 subscribed users and each send takes 1 second, I could theoretically send all the notifications in 10 seconds, which is acceptable.
Are there any other rate limits that I need to be aware of?

I am assuming that the admin SDK (Golang in my case) uses HTTP under the hood
That is correct. The Admin SDKs use the versioned HTTP API to make calls to FCM.
The key to scaling your FCM usage is to use the resources efficiently. For example in the versioned API (that the Admin SDKs use under the hood) you can pass up to 500 requests over a single HTTP connection, meaning that you can amortize the cost of building the connection over many calls.
You can find an example of the actual HTTP calls in the REST example in the documentation on sending messages to multiple devices:
--subrequest_boundary
Content-Type: application/http
Content-Transfer-Encoding: binary
Authorization: Bearer ya29.ElqKBGN2Ri_Uz...HnS_uNreA

POST /v1/projects/myproject-b5ae1/messages:send
Content-Type: application/json
accept: application/json

{
  "message":{
    "token":"...",
    "notification":{
      "title":"FCM Message",
      "body":"This is an FCM notification message!"
    }
  }
}

--subrequest_boundary
Content-Type: application/http
Content-Transfer-Encoding: binary
Authorization: Bearer ya29.ElqKBGN2Ri_Uz...HnS_uNreA

POST /v1/projects/myproject-b5ae1/messages:send
Content-Type: application/json
accept: application/json

{
  "message":{
    "token":"...",
    "notification":{
      "title":"FCM Message",
      "body":"This is an FCM notification message!"
    }
  }
}

--subrequest_boundary--
In the Go Admin SDK this is equivalent to calling SendAll, which is documented as:
func (c Client) SendAll(ctx context.Context, messages []*Message) (*BatchResponse, error)
SendAll sends the messages in the given array via Firebase Cloud Messaging.
The messages array may contain up to 500 messages. SendAll employs batching to send the entire array of [messages] as a single RPC call. Compared to the Send() function, this is a significantly more efficient way to send multiple messages. The responses list obtained from the return value corresponds to the order of the input messages. An error from SendAll indicates a total failure -- i.e. none of the messages in the array could be sent. Partial failures are indicated by a BatchResponse return value.
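To make the batching concrete, here is a minimal sketch in Python (not the Go SDK itself) that splits a list of pending messages into groups of at most 500, the limit quoted above. The `chunk` helper and the `pending` list are illustrative names, not part of any Firebase library; each resulting batch would be handed to one SendAll call.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

MAX_BATCH_SIZE = 500  # per-call limit quoted in the SendAll documentation above


def chunk(items: Sequence[T], size: int = MAX_BATCH_SIZE) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Hypothetical workload from the question: 25,000 individual notifications.
pending = [f"message-{i}" for i in range(25_000)]
batches = list(chunk(pending))
print(len(batches))  # number of batched calls needed
```

With the 25,000 users from the question this yields 50 batches, so the work is 50 batched calls rather than 25,000 individual ones.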


ASP.NET Core and 102 status code implementation

I have a long-running operation that is called via Web API. The definition of status code 102 says:
An interim response used to inform the client that the server has
accepted the complete request, but has not yet completed it.
This status code SHOULD only be sent when the server has a reasonable
expectation that the request will take significant time to complete.
As guidance, if a method is taking longer than 20 seconds (a
reasonable, but arbitrary value) to process the server SHOULD return a
102 (Processing) response. The server MUST send a final response after
the request has been completed.
So, I want to return a 102 status code to the client, and then have the client wait for the response with the result of the operation. How can I implement this in .NET?
I read this thread: How To Return Http 102 Processing in Asp.Net Web Api?
That thread has a good explanation of what is necessary, but no answer. I don't understand how to actually implement it in .NET, as opposed to the theory.
Using HTTP 102 requires that the server send two responses for one request. ASP.NET (Core or not) does not support sending a response to the client without completely ending the request. Any attempt to send two responses will end up in throwing an exception and just not working. (I tried a couple different ways)
There's a good discussion here about how it's not actually in the HTTP spec, so implementing it isn't really required.
There are a couple alternatives I can think of:
Use web sockets (a persistent connection that allows data to be sent back and forth), like with SignalR, for example.
If your request takes a long time because it's getting data from elsewhere, you can try pulling in that data via a stream and send it to the client via a stream. That will send the data as it's coming in, rather than loading it all into memory first before sending it. Here's an example of streaming data from a database to the response:
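The sketch below illustrates the streaming idea in Python rather than .NET, so it is not the ASP.NET implementation. It assumes an in-memory SQLite table named `items` as a stand-in for the real data source, and the `stream_rows` helper and CSV-style row format are hypothetical. The point is only that rows are fetched and emitted in small pieces instead of being loaded in full first.

```python
import sqlite3
from typing import Iterator


def stream_rows(conn: sqlite3.Connection, batch_size: int = 2) -> Iterator[bytes]:
    """Yield the result set piece by piece instead of loading it all into memory."""
    cursor = conn.execute("SELECT id, name FROM items ORDER BY id")
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        # Each yielded piece could be written to the response body right away.
        yield "".join(f"{row_id},{name}\n" for row_id, name in rows).encode("utf-8")


# Stand-in data source (assumption: a tiny in-memory table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (name) VALUES (?)", [("a",), ("b",), ("c",)])

pieces = list(stream_rows(conn))
```

A web framework with streaming responses would consume the generator directly and flush each piece to the client as it is produced.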

Can RabbitMQ (or similar message queuing system) be used to single thread requests per user?

The issue is we have some modern web applications that are integrated with a legacy system that was never designed to support multiple concurrent requests from a single user. Basically there are certain types of requests that the legacy system can only handle one-at-a-time from a single user. It can handle multiple concurrent requests coming from different users, but for technical reasons cannot handle multiple from a single user. In these situations, the user's first request will complete successfully, but any subsequent requests from that same user that come in while the first request is still executing will fail.
Because our apps are ajax enabled, multi-tab/multi-browser friendly, and just the fact that there are multiple apps - there are certain scenarios where a user could wind up having more than one of these types of requests being sent to the legacy system at the same time.
I'm trying to determine if something like RabbitMQ could be positioned in front of the legacy system and leveraged to single-thread requests per user/IP. The thinking being that the web apps would send all requests to MQ, and they'd stack into per-user queues and pass on to the legacy system one at a time.
I don't know if there would be concerns about the potential number of queues this could create - we have a user-base of approx 4,000.
And I know we could somewhat address this in the web apps individually, but since there are multiple apps it'd be duplicating logic across them, and you'd still have the potential for two different apps to fire off concurrent requests.
Any feedback would be appreciated. Thanks-
I'm not sure a unique queue per user will work as you would need to have a backend worker process listening for messages on that queue that would need to be dynamically created.
Below is one option but it does have a performance bottleneck potential as a single backend process would be handling all requests sequentially. You could use multiple worker processes but you wouldn't know if one had completed before the other causing a race condition if your app requires a specific sequence of actions.
You could simply put all transactions (from all users) into a single queue and have a backend process pull off of that queue and service each request. If there needs to be a response back to the user once the request has been serviced, the worker process could respond on a separate queue with a correlationId that is used to send the response data back to the correct user.
I've done this before with ExpressJS apps where the following flow would happen:
The user/process/ajax makes a request
Express takes the payload from the request object and sends it to a RabbitMQ queue with a unique correlationId (e.g. UUID).
Express then takes the response object and stores it in a responseStore object with the key being the correlationId
Meanwhile, a backend worker process pulls the item from the queue, does some work and then sends a message to a different response queue with the same correlationId
The ExpressJS application has a connection to the response queue and when it receives a message, it takes the correlationId from the response and looks for a response object stored with same correlationId in the responseStore. If it finds it, it takes the payload from the message and does something like response.send(payload) or response.json(payload)
To do this, you should also have a mechanism that stores the creation time of the response object in the responseStore along with the response object. Then have a separate process that will check the responseStore and clean up old response objects after a certain timeout in case there are issues with the backend process completing.
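The flow above can be sketched roughly as follows. This is a simplified illustration in Python, not the ExpressJS code: two in-process `queue.Queue` objects stand in for the RabbitMQ request and response queues, and all function names (`submit`, `worker`, `dispatch_one`, `purge_expired`) are hypothetical.

```python
import queue
import threading
import time
import uuid

request_queue = queue.Queue()   # stand-in for the RabbitMQ request queue
response_queue = queue.Queue()  # stand-in for the RabbitMQ response queue

# correlationId -> (creation time, callback that completes the HTTP response)
response_store = {}
store_lock = threading.Lock()


def submit(payload, on_response):
    """Web tier: enqueue the request and remember how to answer it later."""
    correlation_id = str(uuid.uuid4())
    with store_lock:
        response_store[correlation_id] = (time.monotonic(), on_response)
    request_queue.put((correlation_id, payload))
    return correlation_id


def worker():
    """Single backend worker: handles requests strictly one at a time."""
    while True:
        correlation_id, payload = request_queue.get()
        if correlation_id is None:  # sentinel used only to stop this sketch
            break
        result = f"processed:{payload}"  # placeholder for the legacy-system call
        response_queue.put((correlation_id, result))


def dispatch_one(timeout=1.0):
    """Web tier: match one reply to its stored callback via the correlationId."""
    correlation_id, result = response_queue.get(timeout=timeout)
    with store_lock:
        entry = response_store.pop(correlation_id, None)
    if entry is not None:
        entry[1](result)


def purge_expired(max_age_seconds):
    """Drop callbacks whose request was never answered; return how many were removed."""
    cutoff = time.monotonic() - max_age_seconds
    with store_lock:
        expired = [cid for cid, (created, _) in response_store.items() if created < cutoff]
        for cid in expired:
            del response_store[cid]
    return len(expired)


received = []
thread = threading.Thread(target=worker, daemon=True)
thread.start()
submit("save-profile", received.append)
dispatch_one()
request_queue.put((None, None))  # stop the worker
thread.join()
```

Because a single worker drains the request queue, requests are serviced sequentially, which is also where the bottleneck mentioned earlier comes from.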
Look here for more info on RPC with RabbitMQ:
Hope this helps.

How REST API handle continuous data update

I have a REST backend API, and the front end calls the API to get data.
I was wondering how a REST API handles continuous data updates. For example,
in Jenkins, when we execute a build job, we can see the continuous log output on the page until the job finishes. How does REST accomplish that?
Jenkins will just continue to send data. That's it. It simply carries on sending (at least that's what I'd presume it does). Normally the response contains a header field indicating how much data the response contains (Content-Length). But this field is not necessary. The server can omit it. In such a case the response body ends when the server closes the connection. See RFC 7230:
Otherwise, this is a response message without a declared message body length, so the message body length is determined by the number of octets received prior to the server closing the connection.
Another possibility would be to use the chunked transfer encoding. Then the server sends the data as a series of chunks, each prefixed with its own length (in hexadecimal) rather than relying on a Content-Length header. The server terminates the body by sending a zero-length last chunk.
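As an illustration of the chunked framing, here is a small Python sketch that encodes a sequence of byte strings as a chunked message body. The `encode_chunked` helper is a hypothetical name written for this example, and trailers and chunk extensions are left out.

```python
from typing import Iterable


def encode_chunked(parts: Iterable[bytes]) -> bytes:
    """Frame `parts` as an HTTP/1.1 chunked message body (no trailers)."""
    out = bytearray()
    for part in parts:
        if not part:
            continue  # an empty chunk would be read as the terminator
        out += f"{len(part):X}\r\n".encode("ascii")  # chunk size in hexadecimal
        out += part + b"\r\n"
    out += b"0\r\n\r\n"  # zero-length last chunk ends the body
    return bytes(out)


body = encode_chunked([b"hello", b"world!"])
print(body)
```

A server producing log output would write each framed chunk as soon as it is available and send the final zero-length chunk only when the job finishes.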
WebSockets would be a third possibility.
I was searching for an answer myself and then the obvious solution struck me. In order to see what type of communication a service is using, you can simply view it from browser side using Developer Tools.
In Google Chrome it will be F12 -> Network.
In case of Jenkins, front-end is sending AJAX requests to backend for data:
every 5 seconds on Dashboards page
every second during Pipeline run (Console Output page), that you have mentioned.
I have also checked the approach in AWS. When checking the status of instances (example: Initializing... , Booting...), it queries the backend every second. It seems to be a standard interval for its services.
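The polling pattern described here can be sketched as a simple loop. This is a hedged Python illustration, not what Jenkins or AWS actually run: `fetch_status` is a hypothetical callable standing in for the AJAX request, and the status strings are made up for the example.

```python
import time
from typing import Callable, List


def poll_until_done(
    fetch_status: Callable[[], str],
    interval_seconds: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Call `fetch_status` at a fixed interval until it reports a final state."""
    seen: List[str] = []
    for _ in range(max_polls):
        status = fetch_status()
        seen.append(status)
        if status in ("SUCCESS", "FAILURE"):  # hypothetical terminal states
            break
        sleep(interval_seconds)
    return seen


# Fake backend that finishes on the third poll.
fake_statuses = iter(["RUNNING", "RUNNING", "SUCCESS"])
history = poll_until_done(lambda: next(fake_statuses), interval_seconds=0.0)
```

Each call is an ordinary, independent request, which is why this approach stays within plain REST.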
Additional note:
When running an AWS Remote Console though, it first sends requests for remote console instance status (backend answers with { status: "BOOTING" }, etc.). After backend returns status as "RUNNING", it starts a WebSocket session between your browser and AWS backend (you can notice it by applying WS filter in developer tools).
At that point it is no longer a REST API but WebSockets, which is a different, stateful protocol.

Number of concurrent connections to GCM service

We are sending push notifications to Android devices via GCM API.
People are allowed to subscribe to different topics and receive an alert every couple of days.
There are between 100,000 and 1,000,000 users subscribed to a given topic, so we wanted to speed things up by using more than ten connections.
We see answers with retry, so we retry after specified period of time as stated in the docs.
Can we get rid of retries by using more connections and sending the requests more slowly?
Or is the quota set for given API key and starting more connections will even hurt us?
We are using the GCM HTTP interface, to be precise the erlang-gcm library. We are sending a message to 1M users. We are not sending to a topic; we are performing a multicast send to a list of users. The gcm-erlang library allows us to pass 1000 users per request (which is also the limit of the GCM API). This means we have to perform at least 1000 requests.
It takes something around 10 minutes to process all those 1000 requests, so we wanted to make them in parallel, but it doesn't make it faster. Here I've found information on throttling:
"Messages are throttled on a per application"
Does it mean that even if these are messages to different users, we are still throttled, because they all use the single API key of our mobile application?
Will the XMPP endpoint be faster?
It is weird that parallelizing requests didn't make them faster. How come? Are you sure that the bottleneck is not on your side?
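To put numbers on that, here is a back-of-the-envelope calculation in Python using only the figures from the question. The `ideal_seconds` helper is a hypothetical name, and it assumes requests spread evenly over the connections with no shared bottleneck, which is exactly the assumption worth checking.

```python
import math

recipients = 1_000_000   # "1M users" from the question
per_request = 1_000      # multicast limit per request
total_seconds = 10 * 60  # observed time for all requests

requests_needed = math.ceil(recipients / per_request)
seconds_per_request = total_seconds / requests_needed


def ideal_seconds(connections: int) -> float:
    """Wall-clock time if requests spread evenly with no shared bottleneck."""
    return math.ceil(requests_needed / connections) * seconds_per_request


print(requests_needed, seconds_per_request, ideal_seconds(10))
```

If ten connections do not bring the total anywhere near one tenth of the sequential time, something other than the per-request latency is limiting throughput.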
No, it doesn't look like you're throttled (if you were, you would receive errors rather than being made to wait in line).
I still don't understand why topics don't work for you. They seem like a good match.
Anyway, if you want to send messages individually, I would highly recommend switching to XMPP. You will be able to send one hundred messages at a time per connection and open up to 1000 connections (but you're not going to need that many, really).

Spray with Air Timeout for Comet

I am building an Air application that long-polls a Spray server to get relevant updates.
I am new to Spray and have read that, if requests are not handled on time, a 500 timeout error is automatically sent to the client by the framework. I can catch this error on the Air side, and then send another request, etc.
Are there any drawbacks to using this approach (I cannot think of any) or is it better to avoid the timeout and send back some sort of "no news" message to the client instead?
I would say, from a RESTful perspective, that the response should pertain to the state of the resource. Looking at the available response codes:
204 No Content The server successfully processed the request, but is
not returning any content.
This states that the request was carried out successfully yet there is nothing to return.
204 No Content
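On the client side, treating 204 as "no news" could look like the following sketch. It is written in Python rather than ActionScript, and `request_updates` is a hypothetical callable standing in for the long-poll request to the Spray server; it returns a status code and an optional body.

```python
from typing import Callable, List, Optional, Tuple


def long_poll(
    request_updates: Callable[[], Tuple[int, Optional[str]]],
    max_requests: int,
) -> List[str]:
    """Issue long-poll requests back to back, collecting any updates returned."""
    updates: List[str] = []
    for _ in range(max_requests):
        status, body = request_updates()
        if status == 204:
            continue  # no news: simply issue the next request
        if status == 200:
            updates.append(body or "")
        else:
            raise RuntimeError(f"unexpected status {status}")
    return updates


# Fake server: one update between two empty responses.
responses = iter([(204, None), (200, "update-1"), (204, None)])
updates = long_poll(lambda: next(responses), max_requests=3)
```

With this shape, a 500 from the server remains a genuine error signal instead of doubling as the normal "nothing happened" case.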