How to redirect connections to implement a load balancer? - load-balancing

I have to write a load balancer for our custom server (not HTTP).
I have gone through a lot of articles on the internet. Everywhere it is mentioned that the load balancer redirects the connection to the actual server, but nowhere is it explained how to redirect the connections.
Can somebody tell me how to implement the connection redirection in C?
Thanks

Redirecting a connection in this context means creating a proxy between two connections - an external one (client facing) and an internal one (server facing). On one end you listen for incoming connections; on the other you pick a backend server and redirect traffic from the client connection there. In essence you're creating a flow between two IP tuples:
((external ip, external port, external interface), (internal ip, internal port, internal interface))
The data flow is:
 client               load balancer                     server
[c1 sock] <---> [external socket | internal socket] <---> [s1 sock]
The basic mode of operation would be (a minimal C sketch follows this list):
When a client connects, the load balancer picks a server from the server pool.
When data is transferred on either end, the load balancer copies the data between the two sockets.
When the connection state changes on either end (e.g. it is closed), the load balancer replicates the state to the other socket.
When a backend server is down, the load balancer excludes it from the pool (some kind of monitoring is required).
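For illustration, here is a minimal sketch in C of such a proxy, assuming a single hard-coded backend (BACKEND_IP and BACKEND_PORT are placeholder values) and one client handled at a time; a real balancer would keep a pool of servers, handle many connections concurrently, and monitor backend health.

/* Minimal single-connection TCP proxy sketch. Error handling is abbreviated. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

#define LISTEN_PORT  9000
#define BACKEND_IP   "10.0.0.11"   /* placeholder backend address */
#define BACKEND_PORT 9001

static int connect_backend(void) {
    int s = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in a = {0};
    a.sin_family = AF_INET;
    a.sin_port = htons(BACKEND_PORT);
    inet_pton(AF_INET, BACKEND_IP, &a.sin_addr);
    if (connect(s, (struct sockaddr *)&a, sizeof a) < 0) { close(s); return -1; }
    return s;
}

/* Copy bytes in both directions until either side closes its connection. */
static void pump(int client, int backend) {
    char buf[4096];
    for (;;) {
        fd_set rds;
        FD_ZERO(&rds);
        FD_SET(client, &rds);
        FD_SET(backend, &rds);
        int maxfd = client > backend ? client : backend;
        if (select(maxfd + 1, &rds, NULL, NULL, NULL) < 0)
            return;
        if (FD_ISSET(client, &rds)) {
            ssize_t n = read(client, buf, sizeof buf);
            if (n <= 0) return;                 /* client closed: mirror it by closing both */
            write(backend, buf, (size_t)n);
        }
        if (FD_ISSET(backend, &rds)) {
            ssize_t n = read(backend, buf, sizeof buf);
            if (n <= 0) return;                 /* backend closed */
            write(client, buf, (size_t)n);
        }
    }
}

int main(void) {
    int lsock = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(LISTEN_PORT);
    bind(lsock, (struct sockaddr *)&addr, sizeof addr);
    listen(lsock, 16);
    for (;;) {
        int client = accept(lsock, NULL, NULL);
        if (client < 0) continue;
        int backend = connect_backend();        /* "pick a server from the pool" */
        if (backend >= 0) {
            pump(client, backend);
            close(backend);
        }
        close(client);
    }
}

select() is used here only to keep the sketch small; a production balancer would typically use an event loop (epoll/kqueue), threads, or a process per connection pair.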
You can implement it without using sockets, at the network layer, but that requires a userspace TCP/IP stack implementation and the ability to read packets directly from the network adapter queue.
nginx can load balance TCP and UDP connections. Why not use it instead of reinventing the wheel? It is probably far more tuned and battle tested than your solution will be in a few years.
https://docs.nginx.com/nginx/admin-guide/load-balancer/tcp-udp-load-balancer/

Related

What is the purpose of decrypting data at both the load balancer and the web server?

I heard that to relieve the web server of the burden of performing SSL termination, it is moved to the load balancer, and an HTTP connection is then made from the LB to the web server. However, to ensure security, an accepted practice is to re-encrypt the data on the LB and then transmit it to the web server. If we are eventually sending encrypted data to the web servers, what is the purpose of having the LB terminate SSL in the first place?
A load balancer will spread the load over multiple backend servers so that each backend server takes only a part of the load. This balancing of the load can be done in a variety of ways, also depending on the requirements of the web application:
If the application is fully stateless (e.g. only serving static content), each TCP connection can be sent to an arbitrary server. In this case no SSL inspection is needed, since the decision does not depend on the content of the traffic.
If the application is instead stateful, the decision of which backend to use might be based on the session cookie, so that requests end up at the same server as the previous requests for that session. Since the session cookie is part of the encrypted content, SSL inspection is needed. Note that in this case a simpler approach can often be used, like basing the decision on the client's source IP address, thus avoiding the costly SSL inspection (a small sketch of this follows below).
Sometimes load balancers also do more than just balance the load. They might incorporate security features, like a Web Application Firewall, or they might sanitize the traffic, and so on. These features work on the content, so SSL inspection is needed.
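As a rough illustration of the source-IP approach mentioned above (addresses and pool layout are made up for the example), the backend choice can be reduced to a hash of the client address, so the same client keeps landing on the same backend without any decryption:

/* Hypothetical source-IP based backend selection: the same client IP always
 * hashes to the same backend, so requests stick to one server without
 * inspecting the encrypted traffic. */
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>

struct backend { const char *host; int port; };

static const struct backend pool[] = {          /* placeholder pool */
    { "10.0.0.11", 8080 },
    { "10.0.0.12", 8080 },
    { "10.0.0.13", 8080 },
};

/* Pick a backend from the client's IPv4 source address. */
static const struct backend *pick_backend(const struct sockaddr_in *client) {
    size_t n = sizeof pool / sizeof pool[0];
    uint32_t ip = ntohl(client->sin_addr.s_addr);
    return &pool[ip % n];
}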

What is the difference between Reverse proxy and Load balancer?

I am trying to understand how a reverse proxy and load balancing differ from each other, and when it is useful to use a reverse proxy rather than load balancing.
Both promise to improve efficiency and sit between client and server. They look nearly the same when we try to understand them, but their functionality differs.
Load balancing: a hardware or software unit that distributes the total load on a website by spreading it across multiple servers.
The algorithms used for load balancing should be chosen so that they make the best use of each server's capacity and can provide the result as fast as possible.
Load balancers come in three categories: DNS round robin, L3/L4 load balancers (working at the IP and TCP layers), and L7 load balancers (working at the application layer).
The different kinds of algorithms used by a load balancer to distribute load are IP hash, least connections, round robin, least traffic, etc.
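As a small illustration, round robin, the simplest of the algorithms listed above, just cycles through the pool; the host names below are placeholders:

/* Round robin: each new connection goes to the next server in the pool,
 * wrapping around at the end. Not thread safe; illustrative only. */
#include <stddef.h>

static const char *pool[] = { "app1.internal", "app2.internal", "app3.internal" };

static const char *next_server(void) {
    static size_t i = 0;                        /* cursor persists across calls */
    const char *s = pool[i];
    i = (i + 1) % (sizeof pool / sizeof pool[0]);
    return s;
}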
Reverse proxy: it acts as the face of the website, or we can say it serves as a gateway that web traffic has to pass through. The main roles of a reverse proxy are:
Security: it acts as a wall in front of your backend servers, protecting the backend from direct interaction and thus improving the security of the overall system.
Web acceleration: it also provides features like caching, SSL encryption, and compression to reduce the time needed to serve responses to clients.
Flexibility: changes to the backend architecture become easier, since clients can only ever access the reverse proxy.
A reverse proxy can be relevant even when there is only one server in your system. In such cases there is no need for a load balancer, but the reverse proxy can still be useful, providing security, flexibility, and web acceleration.
According to this link,
A reverse proxy accepts a request from a client, forwards it to a server that can fulfill it, and returns the server's response to the client. Reverse proxies typically do this for HTTP traffic and application programming interfaces.
A load balancer distributes incoming client requests among a group of servers, in each case returning the response from the selected server to the appropriate client. Load balancers can deal with multiple protocols: HTTP as well as the Domain Name System protocol, Simple Mail Transfer Protocol, and Internet Message Access Protocol. A load balancer receives and routes client requests for application, text, image, or video data to any server in a pool that is capable of fulfilling them and then returns the server's response to the client.

Difference Between Load Balancing and Load Balancer

I need to know the difference between a load balancer and load balancing.
Load balancing is the functionality provided by a Load balancer :).
In software architecture, a load balancer proxies client requests to a pool of application servers, using an algorithm, with the objective of balancing the load of client requests evenly across the pool.
Load balancing refers to efficiently distributing incoming network traffic across a group of backend servers, also known as a server farm or server pool.
A load balancer acts as the “traffic cop” sitting in front of your servers and routing client requests across all servers capable of fulfilling those requests in a manner that maximizes speed and capacity utilization and ensures that no one server is overworked, which could degrade performance. If a single server goes down, the load balancer redirects traffic to the remaining online servers. When a new server is added to the server group, the load balancer automatically starts to send requests to it.
refer - https://www.nginx.com/resources/glossary/load-balancing/
Load balancing helps spread incoming request traffic across a cluster of servers. If a server is not available to take a request, the load balancer passes the request to another server.
Load balancers, in turn, are what achieve the above; they can sit between:
User - webserver
Webserver - internal application servers
Internal servers - database servers
Application servers - cache servers
Different types of Load Balancers:
Smart client - a client that takes a pool of service hosts and balances load across them, detects downed hosts, and avoids sending requests their way (a rough sketch follows this list).
Hardware load balancer - buy your own dedicated high-performance appliance, e.g. Citrix NetScaler.
Software load balancer - use a software load balancer to avoid all the pain of building your own smart client, or if you are not ready to spend on a dedicated appliance. It is more cost effective than the two options above, e.g. VMware, HAProxy, etc.
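A rough sketch of the smart-client idea from the list above (host addresses are placeholders, and a plain connect() attempt stands in for the health check):

/* "Smart client" sketch: walk the host pool starting after the last host
 * used, skip hosts that refuse the connection, return the first live socket. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

struct host { const char *ip; int port; };

static int try_connect(const struct host *h) {
    int s = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in a = {0};
    a.sin_family = AF_INET;
    a.sin_port = htons(h->port);
    inet_pton(AF_INET, h->ip, &a.sin_addr);
    if (connect(s, (struct sockaddr *)&a, sizeof a) == 0)
        return s;                               /* host is reachable */
    close(s);
    return -1;                                  /* treat as down, caller tries the next one */
}

/* Returns a connected socket to the first reachable host, or -1 if all are down.
 * *cursor keeps the rotation going so load is still spread across hosts. */
static int smart_connect(const struct host *pool, size_t n, size_t *cursor) {
    for (size_t i = 0; i < n; i++) {
        int s = try_connect(&pool[(*cursor + i) % n]);
        if (s >= 0) { *cursor = (*cursor + i + 1) % n; return s; }
    }
    return -1;
}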
As far as I know both refer to the same thing, but you could say that a load balancer is the device used for balancing traffic according to server availability, while load balancing is just the theoretical description of how to achieve this.
Please correct me if I'm wrong!

Why should all packets pass through the load balancer instead of redirecting the client to a specific server?

I am trying to understand how load balancing works.
Currently, as far as I understand, all the network packets pass through the load balancer. But this obviously seems to make the LB itself a bottleneck; it would be better to redirect clients to a specific server directly.
So why do most LB configurations show all packets passing through the LB machine?

Determine SSL connection behind a load balancer

Looking for best practice here. We handle SSL at our load balancer level, and hence all connections from our load balancer to our web servers are HTTP. Because of that we have no way of telling what kind of connection the client made to our web server, since everything arrives over HTTP. We currently have two solutions: one is to have the load balancer append a port number to the URL string so that we can determine the kind of request (e.g. 80 for HTTP and 443 for HTTPS). The other is for the load balancer to append a special header when it gets an HTTPS request, so the web servers know the type of connection.
Do you see cons in either solution? Is there any best practice regarding SSL being terminated at the load balancer level instead of the web server level?
I would prefer the header, I think. Adding something in the URL creates the possibility, however slim, that you'll collide with a query string parameter that an app wants to use. A custom header would be easier.
A third option could be to have SSL connections forwarded to a different port, say 8080, so on the back end you know that port 80 connections were HTTP to begin with and port 8080 connections were HTTPS (443) to begin with, even though both are plain HTTP at that point.
I suggest using the header. A related concept is determining the IP address of the client (for logging purposes), since all requests to your web server appear to originate at the load balancer. The X-Forwarded-For header is customarily used there.
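If you go with the header approach, the check on the web-server side stays trivial. The sketch below assumes the load balancer adds a header such as X-Forwarded-Proto: https (a common convention, though any agreed-upon custom header works the same way) and that the request headers are already parsed into name/value pairs:

/* Sketch: decide whether the original client connection was HTTPS, assuming
 * the load balancer adds "X-Forwarded-Proto: https" and headers are parsed. */
#include <stddef.h>
#include <strings.h>    /* strcasecmp */

struct header { const char *name; const char *value; };

static int client_used_https(const struct header *hdrs, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (strcasecmp(hdrs[i].name, "X-Forwarded-Proto") == 0)
            return strcasecmp(hdrs[i].value, "https") == 0;
    }
    return 0;           /* header absent: assume the client spoke plain HTTP */
}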