Custom load-balancing logic in HAProxy

I am working on a video-conferencing application. We have a pool of servers where rooms are created, and a room can have any number of users. I explored HAProxy and several other load balancers, but couldn't find a solution for what I was looking for.
My requirements are as follows:
A room should be created on the server with the lowest load at the time of creation.
All users of that room should join on the same server.
I have tried the url_param balance logic with consistent hashing, but it distributes load randomly. Is this even possible with modern L7 load balancers, or do I need to write some custom logic (in some load balancer) or a separate application for this scenario?
Is there any way of balancing load based on connections or CPU usage while maintaining session stickiness?

The balance documentation says you can choose an algorithm such as leastconn, and notes that the algorithm only applies "when no persistence information is available, or when a connection is redispatched to another server."
So the second part of the answer is stick tables. Read the docs about stick match and the other stick keywords.
With a stick table it looks like this:
backend foo
    mode http
    balance leastconn
    stick on src
    stick-table type ip size 200k expire 30m
    server s1 192.168.1.1:8080
    server s2 192.168.1.2:8080
There are more examples in the docs.
What you need to figure out (or tell us) is how we can identify, from the request, which room the client wants, and then build the stick table and rules accordingly. If the room id is in the URL or an HTTP header, then it is perfectly doable in HAProxy.
If leastconn is not good enough, there is the option of dynamically adjusting servers' weights through HAProxy's UNIX socket CLI while using the roundrobin algorithm. The agent options can also be configured on servers to set their weights dynamically.
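For example, assuming the room id travels in a `room` query parameter (that parameter name is an assumption for illustration), a sketch of a backend that creates rooms on the least-loaded server and then pins every later request for the same room to that server could look like this:

```
backend rooms
    mode http
    balance leastconn
    # key the stick table on the room id taken from the URL;
    # the first request for a room picks the least-loaded server,
    # later requests for the same room match the stored entry
    stick-table type string len 64 size 200k expire 30m
    stick on url_param(room)
    server s1 192.168.1.1:8080 check
    server s2 192.168.1.2:8080 check
```

`stick on` both matches and stores the key, so no separate stick match / stick store-request lines are needed here.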

Related

Algorithm to split traffic across variable number of servers

I have a .NET Core service AAA that retrieves a bit of data from another .NET Core service BBB. BBB has an in-memory cache (ConcurrentDictionary) and is deployed to 10 boxes. The total size of the data to be cached is around 100GB.
AAA will have a list of servers that run BBB, and I was thinking of doing something along the lines of ServerId = DataItemId % 10, so that each of the boxes gets to serve and cache 10% of the total dataset. What I can't figure out is what to do when one of the BBB boxes goes down (e.g. due to Windows Update).
Is there some algorithm to split the traffic that will allow servers to go down and come back up, but still redirect most requests to the server that has the relevant data cached?
Azure Load Balancer does not interact with the application payload. It makes decisions based on a hashing function over the 5-tuple of the TCP/UDP transport IP packet. There is a difference between Basic and Standard LB here in that Standard LB uses an improved hashing function. There's no strict guarantee for the share of requests, but the number of flows arriving over time should be relatively even. A health probe can be used to detect whether a backend instance is healthy or unhealthy; this controls whether new flows arrive on a backend instance. https://aka.ms/lbprobes has details.
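On the modulo question itself: `DataItemId % 10` reshuffles almost every key when a box disappears. A consistent-hash ring (a generic sketch, not tied to Azure or any particular library) keeps every key that did not live on the failed box on its current server:

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring: when a server leaves, only the keys it
    owned move to a neighbour; every other key keeps its server."""

    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes        # virtual nodes smooth the distribution
        self.ring = []              # sorted list of (point, server) tuples
        for s in servers:
            self.add(s)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, server):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash("%s#%d" % (server, i)), server))

    def remove(self, server):
        self.ring = [(p, s) for p, s in self.ring if s != server]

    def get(self, key):
        # first ring point clockwise from the key's hash (wrapping around)
        i = bisect.bisect(self.ring, (self._hash(key), ""))
        return self.ring[i % len(self.ring)][1]
```

AAA would build the ring from its list of BBB boxes and drop a box when a health check fails; when the box comes back, `add()` restores the original mapping, so the cache on the other nine boxes stays warm throughout.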

IP Load Balancing - Number of requests limit

I want to configure IP Load Balancing service for our VPS. I have got the documentation at http://docs.ovh.ca/en/products-iplb.html#presentation where I can integrate it.
I want to limit the number of requests on each server (S1, S2). How can I achieve this?
Suppose I want S1 to handle all requests as long as fewer than 3500 requests per minute are sent to the load balancer.
If requests exceed 3500 per minute, the load balancer should forward all extra requests to S2.
Regards,
I just had a look, and I believe you won't be able to achieve what you are looking for with the available load-balancing algorithms.
If you look at the available ones, you can see five load-balancing algorithms. From my experience with load balancers (not from OVH), they should do the following:
first: probably the first real server to reply (per the health monitor) gets the query.
leastconn: distributes connections to the server that is currently managing the fewest open connections at the time the new connection request is received.
roundrobin: the next connection is given to the next real server in line.
source: not sure about this one, but I believe it load-balances per source IP, e.g. if the request comes from 143.32.Y.Z, send it to server A, etc.
uri: I believe it load-balances by URI. Typical if you are hosting different web servers.
I would advise checking with OVH what you can do. Typically in these scenarios, with an F5 load balancer for example, you can configure a simple script for this, or priority groups: if the first group fails, traffic is sent to the second one.
A ratio (also called weighted) load-balancing algorithm can approximate the job, though it's not exactly what you want.
Cheers
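To make the overflow behaviour concrete: if you do end up fronting the VPSes with your own small dispatcher (a sketch under that assumption, not anything OVH provides), the "primary until 3500/min, then spill to secondary" rule is just a sliding-window counter:

```python
import time
from collections import deque

class OverflowRouter:
    """Route to the primary until it has received `limit` requests in
    the last `window` seconds, then spill the excess to the secondary."""

    def __init__(self, primary, secondary, limit=3500, window=60.0):
        self.primary, self.secondary = primary, secondary
        self.limit, self.window = limit, window
        self.stamps = deque()   # arrival times of requests sent to primary

    def pick(self, now=None):
        now = time.monotonic() if now is None else now
        # drop arrivals that fell out of the window
        while self.stamps and now - self.stamps[0] > self.window:
            self.stamps.popleft()
        if len(self.stamps) < self.limit:
            self.stamps.append(now)
            return self.primary
        return self.secondary
```

As old arrivals age out of the window, new requests flow back to S1 automatically.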

Limit bandwidth per-IP by value from HTTP Header

I have a file download site. What I am looking for is limiting bandwidth per IP (!). The limit should be set dynamically via an HTTP header from the backend.
My current implementation uses X-Accel-Limit-Rate (I can change that header; it's not hard-coded anywhere), but it limits only the current connection/request.
Is my idea doable in G-Wan?
Yes, this can be done.
Write a G-WAN handler to extract the X-Accel-Limit-Rate HTTP header, then enforce the policy using the throttle_reply() G-WAN API call documented here.
An example available called throttle.c might help you further.
The throttle_reply() G-WAN function lets you apply throttling globally or per connection, so you can apply the relevant throttling values per IP address or per authenticated user, depending on your needs.
throttle_reply() can change the download speed dynamically during the lifespan of each connection, so you can slow down old connections and create new ones with an adaptive download rate.
Of course, this can be enforced per client IP address (or cookie, or even ISP/datacenter AS record) to deal with huge workloads.

How does a load-balanced server work?

Thanks for taking the time to read my questions.
I have some basic doubts about load-balanced servers.
I assume that one application is hosted on two servers, and when one server is heavily loaded, the load balancer switches responsibility for handling a particular request to the other server.
That is how I understood load balancers.
What manages and monitors the load and does all the transferring of requests?
How are static variables handled? For example, I have a variable called 'totalNumberOfClicks' which is incremented whenever we hit the page.
If a GET request is handled by a server, should its POST also be handled by the same server? For example, a user requests a page for editing; the ASP.NET runtime creates the viewstate (which holds control IDs and their values), maintained on both the server and client side. When we hit the post button, the server validates the viewstate and does the rest of the processing.
If the POST gets transferred to another server, how can that server's runtime process it?
If you are using the load balancing built into Windows, then there are several options for how the load is distributed. The servers keep in communication with each other and organise the load between themselves.
The most scalable option is to evenly balance the requests across all of the servers. This means that each request could end up being processed by a different server so a common practice is to use "sticky sessions". These are tied to the user's IP address, and make sure that all requests from the same user go to the same server.
There is no way to share static variables across multiple servers so you will need to store the value in a database or on another server.
If you use an out-of-process session state store (such as StateServer or SQL Server), then you can process any request on any server. Viewstate allows the server to recreate most of the data that generated the page.
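The static-variable point above can be sketched in a few lines (the class names here are illustrative; in production the shared store would be a database row or a cache server's atomic increment, not an in-process object):

```python
import threading

class SharedCounterStore:
    """Stand-in for the database/cache that all web servers share."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self):
        with self._lock:        # atomic increment, safe across callers
            self._value += 1
            return self._value

class WebServer:
    """One of the load-balanced application servers."""

    def __init__(self, store):
        self.local_clicks = 0   # a "static" variable: visible only here
        self.store = store

    def handle_click(self):
        self.local_clicks += 1            # diverges across servers
        return self.store.increment()     # correct site-wide total
```

Each server's in-process counter only sees the requests routed to it, while the shared store sees them all, which is exactly why 'totalNumberOfClicks' belongs in a database rather than a static field.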
I have some answers for you.
When it comes to web applications, load balancers need to provide what is called session stickiness. That means that once a server is elected to serve a client's request, all subsequent requests will be directed to the same node as long as the session is active. Of course, this is not necessary if your web application does not rely on any state that has to be preserved (i.e. it is stateless/sessionless).
I think this can answer your third and maybe even your second question.
Your first question is about how load balancers work internally. Since I am not an expert in that, I can only guess that the load balancer each client is talking to measures ping response times to derive an estimated load on each server. More sophisticated techniques could also be used.

what are some good "load balancing issues" to know?

Hey there guys, I am a recent grad, and looking at a couple of jobs I am applying for, I see that I need to know things like runtime complexity (straightforward enough), caching (memcached!), and load-balancing issues (no idea on this!!).
So, what kind of load-balancing issues and solutions should I try to learn about, or at least be vaguely familiar with, for .NET or Java jobs?
Googling around gives me things like network load balancing, but wouldn't that usually be administered by someone other than a software developer?
One thing I can think of is session management. By default, a session ID points to some in-memory data on the server. However, when you use load balancing, there are multiple servers. What happens when data is stored in the session on machine 1, but for the next request the user is redirected to machine 2? His session data would be lost.
So, you'll have to make sure that either the user gets back to the same machine for every subsequent request ('sticky connection'), or you do not use in-proc session state but out-of-proc session state, where session data is stored in, for example, a database.
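One common sticky scheme simply hashes the client's address (a minimal sketch; real balancers often use cookies instead, since many clients can share one IP behind a NAT):

```python
import hashlib

def sticky_server(client_ip, servers):
    """Map a client IP to a fixed server so every request from the
    same address lands on the machine holding its session state."""
    h = int(hashlib.sha1(client_ip.encode()).hexdigest(), 16)
    return servers[h % len(servers)]
```

Because the hash is deterministic, the same IP always maps to the same server for a given server list; the trade-off is that the mapping reshuffles if the list changes.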
There is a concept of load distribution, where requests are sprayed across a number of servers (usually with session affinity). Here there is no feedback on how busy any particular server may be; we just rely on statistical sharing of the load. You could view the WebSphere HTTP plugin in WAS ND as doing this. It actually works pretty well, even for substantial web sites.
Load balancing tries to be cleverer than that: some feedback on the relative load of the servers determines where new requests go (even then, session affinity tends to take priority over balancing the load). The WebSphere On Demand Router, originally delivered in XD, does this. If you read this article you will see the kinds of algorithms used.
You can achieve balancing with network spraying devices; they can consult "agents" running on the servers, which give feedback to the sprayer as a basis for deciding where requests should go. Hence even this hardware-based approach can have a software element. See the Dynamic Feedback Protocol.
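The feedback-driven decision itself can be tiny once the agent reports are in hand (a sketch; how the loads are measured and reported is the hard part the products above solve):

```python
def pick_server(reported_loads):
    """Given the latest per-server load reports from agents, e.g.
    {"a": 0.9, "b": 0.2}, route the next request to the least-loaded
    server; iterate names in sorted order so ties break deterministically."""
    return min(sorted(reported_loads), key=lambda s: reported_loads[s])
```

Real balancers weight this against session affinity, so a sticky request overrides the load-based pick.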
Network combinatorics, and the max-flow/min-cut theorems and their use.