Load balancer - how to write one for a custom application? - load-balancing

I've written a simple server application which will run distributed on several machines.
My question is how does a network load balancer works, in general?
I've heard of round-robin and other algorithms, but what I haven't got answer to is how does the process really goes? In socket terms.
The client connects to one of the load balancer machines, asks for a "free-to-connect-to" server and simply connects to it?
That's the simpliest way I can think of.
.. or, does it use the load balancer as a proxy (that implies that all the NBs must be always connected to the application servers, and data is transferred through them)?
It's more of a general question. How would you do this?
Thank you all!

There are several different ways to load balance an application. Some are physical devices that sit between your router and the servers; some are software based with a bit of code that runs on each of the load balanced devices.
Microsoft has load balancing built into Windows which is all software based. It's pretty good and easy to set up.
However, I'll cover the physical route.
There are several algorithms here, but the main one is Round Robin with an option for "sticky" sessions. Sticky in this case means that the load balancer will try to keep a history of clients and forward requests from the same client to the same machine. This means the load balancer needs to keep a list of clients and where it directed those clients. Depending on cache size, clients may fall off the list and on future requests they may be forwarded to a different server.
Round Robin is a pretty simple idea. For each request that comes in send it to the next server in the list. More complicated algorithms might take into account how many requests go to a particular server and how long are those requests taking; then try to rebalance new requests to favor faster servers. This part is complicated though.

Related

Load balancer confusion (Load balancer mechanism )

Hi I'm little confused about load balancer concept
I've read some articles about loadbalancer in nginx and from what I've understand is that the load balancer spread the request into multiple servers !
But i thought if one server is down another one is up and running (not simultaneously all server together)
and another thing is when request spread between servers what happen to static data like sessions and InMemory Database like RedisDB
I think i'm confused and missunderstood the loadbalancer mechanism
and from what I've understand is that the load balancer spread the request into multiple servers ! But i thought if one server is down another one is up and running (not simultaneously all server together)
As it comes from the name the goal of load balancer (LB) is to balance the load. As per wiki definition for example:
In computing, load balancing is the process of distributing a set of tasks over a set of resources (computing units), with the aim of making their overall processing more efficient. Load balancing can optimize the response time and avoid unevenly overloading some compute nodes while other compute nodes are left idle.
To perform this task load balancer obviously need to have some monitoring over the resources, including liveness checks (so it can bring out of the rotation the failing servers/nodes). Ideally LB should work with stateless services (i.e. request could be routed to any of the servers supporting handling such request type) but that is not always the case due to multiple reasons, for example in ASP.NET in case of non-distributed session requests should have been routed to servers which handled the previous request from the session, which could have been handled with so called sticky session/cookie.
and another thing is when request spread between servers what happen to static data like sessions and InMemory Database like RedisDB
It is not very clear what is the question here. As I mentioned before ideally you will want to have stateless services which will use some shared datastore (s) to handle the requests, so if request comes for any server/node it can load all the needed data to handle it.
So in short when request comes to LB it selects one of the servers based on some algorithm (round robin, resource based, sharding, response time based, etc.) and send this request to this server so in theory based on the used approach sequential requests from the same source can hit different nodes/servers (so basically this is one of the ways to horizontally scale your application).
I actually found my answer in nginx doc page
Short answer is IP-Hash mechanism
Nginx doc word :
Please note that with round-robin or least-connected load balancing, each subsequent client’s request can be potentially distributed to a different server. There is no guarantee that the same client will be always directed to the same server.
If there is the need to tie a client to a particular application server — in other words, make the client’s session “sticky” or “persistent” in terms of always trying to select a particular server — the ip-hash load balancing mechanism can be used.
With ip-hash, the client’s IP address is used as a hashing key to determine what server in a server group should be selected for the client’s requests. This method ensures that the requests from the same client will always be directed to the same server except when this server is unavailable.
To configure ip-hash load balancing, just add the ip_hash directive to the server (upstream) group configuration:
upstream myapp1 {
ip_hash;
server srv1.example.com;
server srv2.example.com;
server srv3.example.com;
}
http://nginx.org/en/docs/http/load_balancing.html

High availability using cloud load balancing

I am looking for the best way to achieve high availability for my organizations applications. Since they contain sensitive information, the applications must reside inside my organizations data centers.
I was thinking of using Google load balancing to direct requests to my servers, but I don't think they can be pointed at external servers, just Google VMs. Does anyone know if that's true?
My other thought was that I could use Google load balancing to point to Google VMs running Nginx and have that load balance between my data centers. Does anyone know if that is feasible? Under this scenario, can I terminate SSL on my servers, or does it have to terminate at the Google VM?
Unfortunately, you are correct: You cannot use Google Cloud's Network load-balancing with external servers.
You could do your second option, but I'd strongly suggest you reconsider the approach: too many moving parts, and for what benefit? If a server goes down you lose session state anyway, so maybe it'd be better for you to use DNS load balancing instead.
FYI I use Google LoadBalancing and AutoScaling, it works pretty good, but not perfect (frequent 502 burps), which is probably why it's still in "Beta".

Do companies that provide APIs use a shim or proxy in front of their APIs?

I'm researching how large companies manage their public APIs. I'm thinking of companies with mature established APIs such as Google, Facebook, Twitter, and Amazon.
These companies have a number of different APIs that they expose to the public. Google, for example, has Plus, AdSense, AdWords etc. APIs that are publicly consumable. I'd like to understand if they use a cluster of reverse-proxy servers in front of those APIs to provide common functionality so that their specialist API servers don't need to implement that.
For example: Throttling and Authentication could be handled at this layer instead of implementing it in each API cluster.
The questions: Does anyone use a shim or reverse proxy in front of their APIs to handle common tasks? What are the use cases that make a reverse-proxy a good or bad idea for a cluster of API servers?
Most large companies explore a variety of things to handle the traffic and load on their servers. Roughly speaking:
A load balancer sits between the entry point and the actual client.
A reverse proxy often times sits between these to handle static files, pre-computed/rendered views, and other such largely static assets.
Any cast is used for DNS purposes, so that you are routed towards the nearest server that handles that URL.
Back pressure is employed in systems to limit the amount of requests feeding through a single pipeline and so that services don't tip over.
Memcached, Redis and the like are used as short term caches. That is, if it's going to roughly be the same result every 5 seconds, then that result can be cached in memory for faster delivery. Some proxies can be configured to read out of these.
If you're really interested, start reading some of the Netflix blog. Take a look at some of the open source they've used like Hystrix or Zuul. You can also take a look at some of their videos. They make heavy use of proxies and have built in some very advanced distributed behavior.
As far as a reverse proxy being a good idea, think in terms of failure. If your service calls out to another API by direct route and that service fails, then your service will fail and cascade upwards to the end user. On the other hand, if it's hitting a reverse proxy, then that proxy can be configured or even auto detect failures and divert traffic to back up servers.
As far as a reverse proxy being a good idea, think in terms of load. Sometimes servers can only handle a fraction of the traffic individually so that load must be shared on many servers. This is true not just of CPU capped but also IO capped resources (even if the return signal itself will not be the cause of the IO capping.)
Daisy chaining like this presents its own special little hell but it's sometimes unavoidable. On the downsides and what makes it a really bad choice if you can avoid it at all costs is a loss of deterministic behavior. Sometimes the stupidest things will bring your servers down. And by stupid, I mean, really, really dumb stuff that you never thought in a million years might bite you in the butt (think server clocks out of sync.) You have to start using rolling deploys of code, take down servers manually or forcefully if they stop responding, and keep those proxy configs in good order.
HTTP1.1 support can also be an issue. Not all reverse proxy adhere to the spec. In fact, some of them only cover ~50%. HAProxy does not do SSL. If you're only limited hardware then thread based proxy can unexpectedly swamp the system with threads.
Finally, adding in a proxy is one more thing that will break (not can, will.) You have to monitor them just like any piece of the platform, aggregate their logs, and run mock drills on them too.

Web App: High Availability / How to prevent a single point of failure?

Can someone explain to me how high-availability ("HA") works for a web application ... because I assume HA means that there exist no single-point-of-failure.
However, even if a load balancer is used- isn't that the single point of failure?
I have found this article on the subject:
http://www.tenereillo.com/GSLBPageOfShame.htm
Basically if you do not require long lasting sticky sessions you can configure your DNS servers to return multiple A records (IP addresses) for your website.
Web browsers are smart enough to try all the addresses until they find one that works.
In simple words high availability can be defined as running a system 24*7 without a downtime even if there are hardware and software failures. In other way a fault tolerance application. This helps ensure uninterrupted use of the application for it’s intended users.
Read more on High Availability Deployment Architecture
It works the following way that you setup two HA Proxy servers with heartbeat, so when one fails (stops responding to queries), it's being removed from the cluster.
Requests from HA Proxy can be forwarded to web servers in round robin fashion, and if one web server fails, HA Proxy servers do not try to contact it until it's alive.
Web servers are storing all dynamic information in database, which is replicated across two MySQL instances.
As you can see, HA Proxy and Cluster MySQL (or simply MySQL replication) as well IP Clustering here is the key.
Sure it is when operated alone. Usual highly available setup includes 2 or more load balancers running in cluster in either active/active or active/passive configuration. To further increase the availability you can have 2 different Internet Service Providers (or geo distributed datacenters) each running a pair of clustered load balancers. Then you configure DNS A record resolving to 2 distinct public IP addresses which guarantees round-robin processing splitting DNS requests evenly (CloudFlare is very fast and reliable at this). There's also possibility to return IP address of datacenter closest to your originating geo location by using something like PowerDNS dnsdist
This is what big players do to make their services highly available.
Please read https://docs.oracle.com/cd/E23824_01/html/821-1453/gkkky.html for more clearity. Actually both load balancer uses same vip(Virtual IP Address. https://techterms.com/definition/vip).
HA architecture is a entire field and multiple books were written on it, so it is hard to answer in a short paragraph.
To sum up the ideal situation, you would be using multiple servers, interconnected to a layer of multiple load balancers. The nodes and LB will be located in a few different data centers, and connected to different network backbone. Ideally the data centers will be located all over the world.
In short, all component will have redundancy, including the load balancers.
For a starting point, see Wikipedia for High Availability Cluster

Apache and the c10k

How is Apache in respect to handling the c10k problem under normal conditions ?
Say while running very small scripts with little data, or do I need to scale out if I use Apache?
In the background heavy lifting is done by a few servers running specialized software that processes the requests but I'd like to use Apache as a front. Is this a viable plan?
I consider Apache to be more of an origin server - running something like mod_php or mod_perl to generate the content and being smart about routing to the appropriate system.
If you are getting thousands of concurrent hits to the front of your site, with a mix of types of data (static and dynamic) being returned, you may find it useful to put a more optimised system in front of it though.
The classic post-optimisation problem with Apache isn't generating the dynamic content (or at least, that can be optimised for early in the process), but simply waiting for a slow client to be able to receive the bytes that are being sent. It can therefore be a significant advantage to put a reverse proxy, in the form of Squid or Nginx, in front of the servers to take over the 'spoon-feeding' of the slow network clients, while allowing the content production to happen at full speed, and at local network speeds - 100Mb/sec or even gigabit speeds - if it even has to traverse a network at all.
I'm assuming you've probably seen this data, but if not, it might give you some idea.
Guys, imagine that you are running web server with 10K connections (simultaneous). How could it be?
You've got many many connections per second
Dynamic content
Are you sure that your CPU can handle that many PHP sessions for example? I guess no, so why are you thinking about C10K problem? :D
Static content - small files
And still soo many connections? On single server? Probably you've got problems with networking/throughput too or you are future competitor of Google. Use lighttpd which addresses C10K problem and is stable - fly light. Using Apache for only static files for large sites is obvious.
Your clients are downloading large files for a large time - static content
ISO images, archives etc
If you are doing it via web server - FTP may be more appropriate.
Video streaming
Use lighttpd or specialized software. And still... What about other resources?
I am using Linux Virtual Server as load balancer in front of apache servers (with specific patches for LVS-NAT) and I am happy :) This string is an answer you want to hear.