How do Cloud Run instances perceive multiple requests from HTTP2 connections? - express

HTTP2 has this multiplexing feature.
From this [answer](Put simply, multiplexing allows your Browser to fire off multiple requests at once on the same connection and receive the requests back in any order.) we get that:
Put simply, multiplexing allows your Browser to fire off multiple requests at once on the same connection and receive the requests back in any order.
Let's say I split my app into 50 small bundled files, to take advantage of the multiplex communication.
My server is an express app hosted in a Cloud Run instance.
Here is what Cloud Run says about concurrency:
By default Cloud Run container instances can receive many requests at the same time (up to a maximum of 250).
So, if 5 users hit my app at the same time, does it mean that my instance will be max'ed out for a brief moment?
Because each browser (from the 5 users) will make 50 requests (for the 50 small bundled files), resulting on a total of 250.
Does the fact that multiplex traffic occurs on over the same connection change any thing? How does it work?
Does it mean that my cloud run will perceive 5 connections and my express server will perceive 250 requests? I think I'm confused about the request expression in these 2 perspectives (the cloud run instance and the express server).

A "request" is :
the establishment of the connexion between the server and the client (the browser here)
The data transfert
The connexion close.
With streaming capacity of HTTP2 and websocket, the connexion can takes minutes (and up to 1 hour) and you can send data through the channel as you want. 1 connexion = 1 request, 5 connexions = 5 requests.
But keep in mind that keeping this connexion open and processing data in it consume resources on your backend and you can't have dozens of connexion that actively send/receive data, you will saturate your instance.

Related

Load balancer confusion (Load balancer mechanism )

Hi I'm little confused about load balancer concept
I've read some articles about loadbalancer in nginx and from what I've understand is that the load balancer spread the request into multiple servers !
But i thought if one server is down another one is up and running (not simultaneously all server together)
and another thing is when request spread between servers what happen to static data like sessions and InMemory Database like RedisDB
I think i'm confused and missunderstood the loadbalancer mechanism
and from what I've understand is that the load balancer spread the request into multiple servers ! But i thought if one server is down another one is up and running (not simultaneously all server together)
As it comes from the name the goal of load balancer (LB) is to balance the load. As per wiki definition for example:
In computing, load balancing is the process of distributing a set of tasks over a set of resources (computing units), with the aim of making their overall processing more efficient. Load balancing can optimize the response time and avoid unevenly overloading some compute nodes while other compute nodes are left idle.
To perform this task load balancer obviously need to have some monitoring over the resources, including liveness checks (so it can bring out of the rotation the failing servers/nodes). Ideally LB should work with stateless services (i.e. request could be routed to any of the servers supporting handling such request type) but that is not always the case due to multiple reasons, for example in ASP.NET in case of non-distributed session requests should have been routed to servers which handled the previous request from the session, which could have been handled with so called sticky session/cookie.
and another thing is when request spread between servers what happen to static data like sessions and InMemory Database like RedisDB
It is not very clear what is the question here. As I mentioned before ideally you will want to have stateless services which will use some shared datastore (s) to handle the requests, so if request comes for any server/node it can load all the needed data to handle it.
So in short when request comes to LB it selects one of the servers based on some algorithm (round robin, resource based, sharding, response time based, etc.) and send this request to this server so in theory based on the used approach sequential requests from the same source can hit different nodes/servers (so basically this is one of the ways to horizontally scale your application).
I actually found my answer in nginx doc page
Short answer is IP-Hash mechanism
Nginx doc word :
Please note that with round-robin or least-connected load balancing, each subsequent client’s request can be potentially distributed to a different server. There is no guarantee that the same client will be always directed to the same server.
If there is the need to tie a client to a particular application server — in other words, make the client’s session “sticky” or “persistent” in terms of always trying to select a particular server — the ip-hash load balancing mechanism can be used.
With ip-hash, the client’s IP address is used as a hashing key to determine what server in a server group should be selected for the client’s requests. This method ensures that the requests from the same client will always be directed to the same server except when this server is unavailable.
To configure ip-hash load balancing, just add the ip_hash directive to the server (upstream) group configuration:
upstream myapp1 {
ip_hash;
server srv1.example.com;
server srv2.example.com;
server srv3.example.com;
}
http://nginx.org/en/docs/http/load_balancing.html

how the server send message SSE worked in multiple server instance environments

I have a question on how to make SSE worked in multiple server environments.
In UI, there are two steps:
1. source = new EventSource('http://localhost:3000/stream');
source.addEventListener('open', function(e) {
$("#state").text("Connected")
}, false);
user in UI can post to api to update data
after user post to api, server is sending event to UI to udpate UI
In one server environement, this worked perfect fine, no problem at all.
But in multi server instance environments, this won't be working. For example, I have two server instance, and UI subscribed to server 1, then server 1 is remembering the connection, but data update is from server 2, when data is changed, there is no connection for SSE in server 2. Then in this senario, how can server 2 send SSE to UI?
In order to make SSE working in multiple server environments, do we need to adopt any saving solution to save the connection information so that any server instance can send SSE accurately to UI?
Let me clarify this more:
yes, both service 1 and service 2 are behind load balancer, they do not have to have same URL. UI is pure frontend end application, can even be mobile app. So, if UI is sending a eventSource request to LB of server1, then only this instance can use this connection to send event back to UI, right? But if we have multiple instance of server 1, that means any server 1 instance other than current one can NOT send event back to UI.
I believe this is the limitation of SSE unless the connection can be shared among all the instances. But how.
Thanks
If you have two servers, with different URLs, make one SSE connection (from each client) to each server.
Be aware of CORS restrictions, i.e. the same origin policy. (It works identically to xhr2 CORS, so fairly easy to google; my book also covers it in detail, chapter 9.)
If you have two servers behind a load balancer, which is presenting a single URL to the clients, then you just have to make sure the load balancer is configured correctly. I.e. to always pass through that socket to the correct server. If a back-end server dies, and needs replacing, the load balancer should close the SSE socket; the client will then auto-reconnect, and get a new back-end server.
The multiple servers behind a load balancer, should either be having their own data push socket connections to a master data source, or should all be polling the master data source.

How long will google cloud HTTP(S) Load Balancers keep tcp connections open?

tl;dr:
When a google cloud HTTPS load balancer opens a tcp stream (with a "Connection: keep-alive" header in the request), are there any guarantees around how long (at max) that stream will be kept open to the backend server?
longer:
I deployed a Go http server behind an HTTPS load balancer and quickly ran into a lot of issues because I had set an aggressive (10s) read deadline on my socket connections, which meant that my server often closed connections in the middle of reading subsequent requests. So clearly I'm doing that wrong, but at the same time I don't want to not set ANY deadlines on my sockets, because I want to guard against the possibility of these servers slowly leaking dead connections over time, eating up all my file descriptors.
As such, it would be nice if, for example, the load balancers automatically close any tcp streams that they have open after 5 minutes. That way I can set my server's read deadline to (e.g.) 6 minutes and I can be sure that I'll never interrupt any requests - the deadline will only be invoked in exceptional cases (e.g. the FIN packet from the load balancer was not received by my server).
I was unable to get an official answer on this from Google enterprise support, but from my experiments (analyzing multi-hour tcpdumps) it looks like the load balancer will close connections after ~10 minutes of idleness (meaning no tcp data packets for 10 minutes).
Per here, idle TCP connections to Compute Instances are timed out after 10 minutes, which would seem to confirm your hypothesis.

How does dropbox server keep connection alive with all its client app?

Dropbox has more than 300M user.Dropbox desktop application need to keep connection alive with dropbox server for every updates.
But how does dropbox server keep connection alive with all its desktop user?
The dropbox client keeps a TCP connection constantly open to listen for server-side notifications. When it receives a notification, the client initiates an HTTPS conversation to see what changed and download it. When something changes on the client side, it also initiates an HTTPS conversation to update the files on the server.
Source: http://www-net.cs.umass.edu/imc2012/papers/p481.pdf
The Dropbox client keeps continuously opened a TCP
connection to a notification server (notifyX.dropbox.com),
used for receiving information about changes performed else-
where. In contrast to other traffic, notification connections
are not encrypted. Delayed HTTP responses are used to implement a push mechanism: a notification request is sent by the local client asking for eventual changes; the server response is received periodically about 60 seconds later in case of no change; after receiving it, the client immediately
sends a new request. Changes on the central storage are instead advertised as soon as they are performed.
While the decrypted headers give no indication of what servers Dropbox uses to keep so many open TCP connections, people report being able to keep over 600k (https://stackoverflow.com/a/9676852/15472) or even over 1M (http://blog.whatsapp.com/196/1-million-is-so-2011). With enough load-balancing, 300M users, of which only a fraction of which are connected simultaneously and actively share data within each other, certainly seems within reach.
I doubt that all 300M users are connected at the same time... And by the amount of storage they provide, they will have enough servers to handle the needed amount of connections, maybe 1% of their user count at a time.
If you like to investigate yourself, you could use tools like TCPView (part of Sysinternals Suite) to check which connections are opened by the application, or Wireshark to check the transferred data.
I assume that you mean 'update' of storage content; that could also happen on fixed intervals by opening a connection, getting the files list and closing the connection afterwards. In this case the connection would be used for a few seconds in an interval of e.g. 5 minutes. This would again reduce the number of needed simultaneous connections by factor ~100.

Http server - slow read

I am trying to simulate a slow http read attack against apache server running on my localhost.
But it seems like, the server does not complain and simply waits forever for the client to read.
This is what I do:
Request a huge file (say ~1MB) from the http server
Read the response from the server in a loop waiting 100 secs before successive reads
Since the file is huge and the client receive buffer is small, the server has to send the file in multiple chunks. But, at the client side, I wait for 100 secs between successive reads. As a result, the server often polls the client and finds that, the receive window size of the client is zero since the client has not yet read the receive buffer.
But it looks like the server does not bother to break the connection and it silently keeps polling the client. Server sends the data when the client window size is > 0 and again goes back to wait for the client.
I want to know whether there are any apache config parameters that I can set to break the connection from the server side after waiting sometime for the client to read the data.
Perhaps this would be more useful to you, (simpler and saves you time): http://ha.ckers.org/slowloris/ which is a Perl script that sends partial HTTP requests, the Apache server leaves the connection open (now unavailable to new users) and if executed on a Linux environment, (Linux does not limit threads beyond hardware capability) you can effectively block all open sockets, and in turn prevent other users from accessing the server. It uses minimal bandwidth because it does not "flood" the server with requests, it simply slowly takes the sockets hostage. You can download the file here: http://ha.ckers.org/slowloris/slowloris.pl
To prevent an attack like this (well, mitigate) see here: https://serverfault.com/questions/32361/how-to-best-defend-against-a-slowloris-dos-attack-against-an-apache-web-server
You could also use a load-balancer or round-robin setup.
Try slowhttptest to test the slow read attack you're describing. (It can also be used to test slow sending of headers.)