How do load balancer's handle situations where all servers are busy? - load-balancing

Consider the case where a load balancer is connected to 10 servers, each of which runs a single web server process that is single threaded. Handling a request takes 10 seconds.
Now the load balancer receives 10 requests in the span of 2 seconds. When an eleventh request comes in, what happens?
The load balancer realizes that all the servers are busy, and waits until one of them completes their task before forwarding the request to that server
The load balancer immediately forwards the request to one of the servers, despite it being busy.
I'm having trouble finding information on which of the two above scenarios is correct. Thanks for your help!

Related

Load balancer confusion (Load balancer mechanism )

Hi I'm little confused about load balancer concept
I've read some articles about loadbalancer in nginx and from what I've understand is that the load balancer spread the request into multiple servers !
But i thought if one server is down another one is up and running (not simultaneously all server together)
and another thing is when request spread between servers what happen to static data like sessions and InMemory Database like RedisDB
I think i'm confused and missunderstood the loadbalancer mechanism
and from what I've understand is that the load balancer spread the request into multiple servers ! But i thought if one server is down another one is up and running (not simultaneously all server together)
As it comes from the name the goal of load balancer (LB) is to balance the load. As per wiki definition for example:
In computing, load balancing is the process of distributing a set of tasks over a set of resources (computing units), with the aim of making their overall processing more efficient. Load balancing can optimize the response time and avoid unevenly overloading some compute nodes while other compute nodes are left idle.
To perform this task load balancer obviously need to have some monitoring over the resources, including liveness checks (so it can bring out of the rotation the failing servers/nodes). Ideally LB should work with stateless services (i.e. request could be routed to any of the servers supporting handling such request type) but that is not always the case due to multiple reasons, for example in ASP.NET in case of non-distributed session requests should have been routed to servers which handled the previous request from the session, which could have been handled with so called sticky session/cookie.
and another thing is when request spread between servers what happen to static data like sessions and InMemory Database like RedisDB
It is not very clear what is the question here. As I mentioned before ideally you will want to have stateless services which will use some shared datastore (s) to handle the requests, so if request comes for any server/node it can load all the needed data to handle it.
So in short when request comes to LB it selects one of the servers based on some algorithm (round robin, resource based, sharding, response time based, etc.) and send this request to this server so in theory based on the used approach sequential requests from the same source can hit different nodes/servers (so basically this is one of the ways to horizontally scale your application).
I actually found my answer in nginx doc page
Short answer is IP-Hash mechanism
Nginx doc word :
Please note that with round-robin or least-connected load balancing, each subsequent client’s request can be potentially distributed to a different server. There is no guarantee that the same client will be always directed to the same server.
If there is the need to tie a client to a particular application server — in other words, make the client’s session “sticky” or “persistent” in terms of always trying to select a particular server — the ip-hash load balancing mechanism can be used.
With ip-hash, the client’s IP address is used as a hashing key to determine what server in a server group should be selected for the client’s requests. This method ensures that the requests from the same client will always be directed to the same server except when this server is unavailable.
To configure ip-hash load balancing, just add the ip_hash directive to the server (upstream) group configuration:
upstream myapp1 {
ip_hash;
server srv1.example.com;
server srv2.example.com;
server srv3.example.com;
}
http://nginx.org/en/docs/http/load_balancing.html

How do Cloud Run instances perceive multiple requests from HTTP2 connections?

HTTP2 has this multiplexing feature.
From this [answer](Put simply, multiplexing allows your Browser to fire off multiple requests at once on the same connection and receive the requests back in any order.) we get that:
Put simply, multiplexing allows your Browser to fire off multiple requests at once on the same connection and receive the requests back in any order.
Let's say I split my app into 50 small bundled files, to take advantage of the multiplex communication.
My server is an express app hosted in a Cloud Run instance.
Here is what Cloud Run says about concurrency:
By default Cloud Run container instances can receive many requests at the same time (up to a maximum of 250).
So, if 5 users hit my app at the same time, does it mean that my instance will be max'ed out for a brief moment?
Because each browser (from the 5 users) will make 50 requests (for the 50 small bundled files), resulting on a total of 250.
Does the fact that multiplex traffic occurs on over the same connection change any thing? How does it work?
Does it mean that my cloud run will perceive 5 connections and my express server will perceive 250 requests? I think I'm confused about the request expression in these 2 perspectives (the cloud run instance and the express server).
A "request" is :
the establishment of the connexion between the server and the client (the browser here)
The data transfert
The connexion close.
With streaming capacity of HTTP2 and websocket, the connexion can takes minutes (and up to 1 hour) and you can send data through the channel as you want. 1 connexion = 1 request, 5 connexions = 5 requests.
But keep in mind that keeping this connexion open and processing data in it consume resources on your backend and you can't have dozens of connexion that actively send/receive data, you will saturate your instance.

Azure application gateway - caching and routing traffic

lets say there are 2 web services. The goal is, that the app gateway routes the requests to both of them. If one of them is down, it should cache all the requests. Once it is up again, which can happen hours later, all the requests cached in the meantime should be send to it in the correct sequence. This is to preserve both services in the same state. Is something like this possible with an application gateway? Or with any other webserver/tool?
Thanks!
u can do that but u need some configuration HTTP Load Balancing
Load Balancer Overview
The capacity of a single server is limited. Once a website gains more and more attraction the instance serving the site comes to a point where it can not handle any more users. The website starts to slow down or even become unavailable as the server goes down from the traffic.
This is the point where a load balancer enters the game. It allows to spread the “load” that all those visitors and their requests create to be “balanced” over a series of different instances.
In case of increasing load on a setup, capacity can easily be increased by adding more instances to the load balancers backend. This allows to scale your infrastructure without any downtime or delays whilst waiting for DNS zones to be updated.

How long will google cloud HTTP(S) Load Balancers keep tcp connections open?

tl;dr:
When a google cloud HTTPS load balancer opens a tcp stream (with a "Connection: keep-alive" header in the request), are there any guarantees around how long (at max) that stream will be kept open to the backend server?
longer:
I deployed a Go http server behind an HTTPS load balancer and quickly ran into a lot of issues because I had set an aggressive (10s) read deadline on my socket connections, which meant that my server often closed connections in the middle of reading subsequent requests. So clearly I'm doing that wrong, but at the same time I don't want to not set ANY deadlines on my sockets, because I want to guard against the possibility of these servers slowly leaking dead connections over time, eating up all my file descriptors.
As such, it would be nice if, for example, the load balancers automatically close any tcp streams that they have open after 5 minutes. That way I can set my server's read deadline to (e.g.) 6 minutes and I can be sure that I'll never interrupt any requests - the deadline will only be invoked in exceptional cases (e.g. the FIN packet from the load balancer was not received by my server).
I was unable to get an official answer on this from Google enterprise support, but from my experiments (analyzing multi-hour tcpdumps) it looks like the load balancer will close connections after ~10 minutes of idleness (meaning no tcp data packets for 10 minutes).
Per here, idle TCP connections to Compute Instances are timed out after 10 minutes, which would seem to confirm your hypothesis.

Using Load Balancers To Terminate TLS For TCP/IP Connections

I am writing a TCP/IP server that handlers persistent connections. I'll be using TLS to secure the communication and have a question about how to do this:
Currently I have a load balancer (AWS ELB) in front of a single server. In order for the load balancer to do the TLS termination for the duration of the connection it must hold on to the connection and forward the plain text to the application behind it.
client ---tls---> Load Balancer ---plain text---> App Server
This works great. Yay! My concern is that I'll need a load balancer in front of every app server because, presumably, the number of connections the load balancer can handle is the same as the number of connections the app server can handle (assuming the same OS and NIC). This means that if I had 1 load balancer and 2 app servers, I could wind up in a situation where the load balancer is at full capacity and each app server is at half capacity. In order to avoid this problem I'd have to create a 1 to 1 relationship between the load balancers and app servers.
I'd prefer the app server to not have to do the TLS termination because, well, why recreate the wheel? Are there better methods than to have a 1 to 1 relationship between the load balancer and the app server to avoid the capacity issue mentioned above?
There are two probable flaws in your presumption.
The first is the assumption that your application server will experience the same amount of load for a given number of connections as the load balancer. Unless your application server is extremely well-written, it seems reasonable that it would run out of CPU or memory or encounter other scaling issues before it reached the theoretical maximum ~64K concurrent connections IPv4 can handle on a given IP address. If that's really true, then great -- well done.
The second issue is that a single load balancer from ELB is not necessarily a single machine. A single ELB launches a hidden virtual machine in each availability zone where you've attached the ELB to a subnet, regardless of the number of instances attached, and the number of ELB nodes scales up automatically as load increases. (If I remember right, I've seen as many as nodes 8 running at the same time -- for a single ELB.) Presumably the class of those ELB instances could change , too, but that's not a facet that's well documented. There's not a charge for these machines, as they are included in the ELB price, so as they scale up, the monthly cost for the ELB doesn't change... but provisioning qty = 1 ELB does not mean you get only 1 ELB node.