How can I configure HAProxy to use all servers in a backend if none are UP - load-balancing

I'm running HAProxy as a TCP loadbalancer in front of an on-prem Kubernetes cluster. I have set up a small app on each cluster node which return HTTP200 when the node is considered healthy. One of the healthchecks it performs is to query the KubeAPI and verify the status according to K8S itself. Now, if for some reason the Kube API goes down, all nodes will be considered unhealthy at the same time, even though the applications running on the workers are still available.
I'd like to set up HAProxy in such a way that whenever all worker nodes are down according to the health check, HAProxy just assumes they are all alive. If indeed all nodes are down, whether or not traffic is forwarded doesn't matter. If the reason they're all down is that some shared component doesn't respond, just blindly sending traffic will at least keep the service going.
I've parsed the HAProxy reference in search of an option which does this. I can't seem to find one. I think I should be able to get this functionality by registering each worker node twice, once regularly and once with the backup option specified. Then adding allbackups to the backend would make it so that if all worker nodes are down, alls worker nodes are used as a backup. That would look like this:
backend workers
mode tcp
option httpchk HEAD /
option allbackups
server worker-001-1 <address-1> check port 32000
server worker-001-2 <address-2> check port 32000
server worker-001-1-backup <address-1> backup
server worker-001-2-backup <address-2> backup
While this solution seems to work. It seems very hacky. Is there any way to do this in a cleaner way. Is there an option I missed in the reference?
Thanks!

I found a more suitable solution in this answer: https://serverfault.com/a/923624/255500
It boils down to using backend switching rules and creating two backends for each group of clusters:
frontend ingress
bind *:80 http
bind *:443 https
bind *:30000-32676 nodeports
mode tcp
default_backend workers
use_backend workers_backup if { nbsrv(workers) eq 0 }
backend workers
mode tcp
option httpchk HEAD /
server worker-001-1 <address-1> check port 32000
server worker-001-2 <address-2> check port 32000
backend workers_backup
mode tcp
server worker-001-1 <address-1> no-check
server worker-001-2 <address-2> no-check
Once backend workers has zero servers up, backend workers_backup will be used. It's still registering each node twice, but I think this is the better solution.

Is it possible that you're trying to solve the wrong problem? If the nodes report as unhealthy if the Kube API is unavailable, then should you focus on making Kube API highly available?
In this article, they describe a way to create a highly available control plane. https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/

Related

Apache threads stay in state reading after queries

My configuration is apache and tomcat behind an aws elb. Apache is configured with no keepalive, and set with max clients to a low number due to each query being very cpu intensive. I'll load test the machine with queries. Then the number of available requests goes to zero as can be seen by curl -s localhost/server-status?auto not responding immediately. When I stop the loadtest I can see the scoreboard from curl -s localhost/server-status?auto still is full of R's even though from the tomcat logs it is clear nothing is happening. Does anyone have an idea on what possible causes there might be?
If your apache display 'R' in the status it means there is open TCP connections from ELB to apache (just an open TCP connection, no data sent yet).
There is no official complete documentation on this subject (how the numbers of pre-opened connections is optimized ), but the amazon documentation state (at this page: https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/how-elastic-load-balancing-works.html ) that:
Classic Load Balancers use pre-open connections but Application Load Balancers do not.
So, the answer is: it is an optimization from amazon (TCP connection cost a little bit to be openned).

Why is mod_proxy_protocol or ELB causing high apache worker count?

We have a legacy cluster of servers running Apache 2.4 that run our application sitting behind an ELB. This ELB has two listeners, one HTTP, and one HTTPS which terminates at the ELB and sends regular HTTP traffic to the instances behind it. This ELB also has pre-open turned off (it was causing a busy worker buildup). Under normal load we have 1-3 busy workers per instance.
We have a new cluster of servers we are trying to migrate to behind a new ELB. The purpose of this migration is to allow for SNI – serving TLS traffic to thousands of domains. As such this cluster uses mod_proxy_protocol which has been enabled at the ELB level. For the purposes of testing we’ve been weighting traffic at the DNS (Route 53) level to send 30% of our traffic to the new load balancer. Under even this small load we see 5 – 10 busy workers and that grows as traffic does.
As a further test we took one of these new instances, disabled proxy_protocol, and moved it from the new ELB to the old ELB, the worker count drops to average levels, being 1-3 busy workers. This seems to indicate that there is an issue wither with the ELB (differences between HTTP and TCP handling?) or mod_proxy_protocol.
My question: Why is it that we have twice the busy apache workers when using proxy protocol and the new ELB? I would think that since TCP listeners are dumb and don’t do any processing on the traffic, they would be faster and as a result consume less workers time than HTTP listeners which actively ‘modify’ the traffic going thru them.
Any guidance to help us diagnose this issue is appreciated.
The difference is simple and significant:
An ELB in HTTP mode takes care of holding the idle keep-alive connections from browsers without holding open corresponding connections to the instance. There's no necessary correlation between browser connections and back-end connections -- a backend connection can be reused.
In TCP mode, it's 1:1. It has to be, because the ELB can't reuse a back-end connection for different browser connection on the front-end -- it's not interpreting what's going down the pipe. That's always true for TCP, but if the reason isn't intuitive, it should be particularly obvious with the proxy protocol enabled. The PROXY "header" is not in fact a "header" in the usual sense -- it's a preamble. It can only be sent at the very beginning of a connection, identifying the source address and port. The connection persists until the browser or server closes it, or it times out. It's 1:1.
This is not likely to be viable with Apache.
Back to HTTP mode, for a minute.
This ELB also has pre-open turned off (it was causing a busy worker buildup).
I don't know how you did that -- I've never seen it documented, so I assume this must have been through a support request.
This seems like a case of solving entirely the wrong problem. Instead of having a number of connections that seems to you to be artificially high, all you've really accomplished is keeping the number of connections artificially low -- ultimately, you're probably actually impairing your performance and ability to scale. Those spare connections are for the purpose of handling bursts of demand. If your instance is too small to handle them, then I would suggest that the real problem is just that: your instance is too small.
Another approach -- which is exactly the solution I use for my dreaded legacy Apache-based applications (one of which has a single Apache server sitting behind a total of about 15 to 20 different ELBs -- necessary because each ELB is offloading SSL using a certificate provided by one of the old platform's customers) -- is HAProxy between the ELBs and Apache. HAProxy can handle literally hundreds of connections and millions of requests per day on tiny instances (I'm talking tiny -- t2.nano and t2.micro), and it has no problem keeping the connections alive from all of the ELBs yet closing the Apache connection after each request... so it's optimizing things in both directions.
And of course, you can also use HAProxy with a TCP balancer and the proxy protocol -- the author of HAProxy was also the creator of the proxy protocol standard. You can also just run it on the instances with Apache rather than on separate instances. It's lightweight in memory and CPU and doesn't fork. I'm not affilated with the project, other than having submitted occasional bug reports during the development of the Lua integration.

How to scale out haproxy node itself?

As i am having one application in which architecture is as per below,
users ----> Haproxy load balancer(TCP Connection) -> Application server 1
-> Application server 2
-> Application server 3
Now i am able to scale out and scale in application servers. But due to high no of TCP connections(around 10000 TCP connections), my haproxy load balancer node RAM gets full, so further not able to achieve more no of TCP connections afterwords. So need to add one more node for happroxy itself. So my question is how to scale out haproxy node itself or is there any other solution for the same?
As i am deploying application on AWS. Your solution is highly appreciated.
Thanks,
Mayank
Use AWS Route53 and create CNAMEs that point to your haproxy instances.
For example:
Create a CNAME haproxy.example.com pointing to haproxy instance1.
Create a second CNAME haproxy.example.com pointing to haproxy instance2.
Both of your CNAMEs should use some sort of routing policy. Simplest is probably round robin. Simply rotates over your list of CNAMEs. When you lookup haproxy.example.com you will get the addresses of both instances. The order of ips returned will change with each request. This will distribute load evenly between your two instances.
There are many more options depending on your needs: http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy.html
Overall this is pretty easy to configure.
Few unrelated issues: You can also setup health checks and route traffic to the remaining healthy instance(s) if needed. If your running at capacity with two instances you might want to add a few more to able to cope with an instance failing.

Redis cluster via HAProxy

I have a Redis Cluster that clients are connecting to via HAPRoxy with a Virtual IP. The Redis cluster has three nodes (with each node sharing the same server with a running sentinel instance).
My question is, when i clients gets a "MOVED" error/message from a cluster node upon sending a request, does it bypass the HAProxy the second time when it connects since it has been provided with an IP:port when the MOVEd message was issued? If not, how does the HAProxy know the second time to send it to the correct node?
I just need to understand how this works under the hood.
If you want to use HAProxy in front of Redis Cluster nodes, you will need to either:
Set up an HAProxy for each master/slave pair, and wire up something to update HAProxy when a failure happens, as well as probably intercept the topology related commands to insert the virtual IPs rather than the IPs the nodes themselves have and report via the topology commands/responses.
Customize HAProxy to teach it how to be the cluster-aware Redis client so the actual client doesn't know about cluster at all. This means teaching it the Redis protocol, storing the cluster's topology information, and selecting the node to query based on the key(s) being accessed by the consumer code.
With Redis Cluster the client must be able to access every node in the cluster. Of the two options above Option 2 is the "easier" one, but at this point I wouldn't recommend either.
Conceivably you could use the VIP as a "first place to get the topology info" IP but I suspect you'd have serious issues develop as that original IP would not be one of the ones properly being reported as a nod handling data. For that you could simply use round-robin DNS and avoid that problem, or use the built-in "here is a list of cluster IPs (or names?)" to the initial connection configuration.
Your simplest, and least likely to be problematic, route is to go "full native" and simply give full and direct access to every node in the cluster to your clients and not use HAProxy at all.

Glassfish failover without load balancer

I have a Glassfish v2u2 cluster with two instances and I want to to fail-over between them. Every document that I read on this subject says that I should use a load balancer in front of Glassfish, like Apache httpd. In this scenario failover works, but I again have a single point of failure.
Is Glassfish able to do that fail-over without a load balancer in front?
The we solved this is that we have two IP addresses which both respond to the URL. The DNS provider (DNS Made Easy) will round robin between the two. Setting the timeout low will ensure that if one server fails the other will answer. When one server stops responding, DNS Made Easy will only send the other host as the server to respond to this URL. You will have to trust the DNS provider, but you can buy service with extremely high availability of the DNS lookup
As for high availability, you can have cluster setup which allows for session replication so that the user won't loose more than potentially one request which fails.
Hmm.. JBoss can do failover without a load balancer according to the docs (http://docs.jboss.org/jbossas/jboss4guide/r4/html/cluster.chapt.html) Chapter 16.1.2.1. Client-side interceptor.
As far as I know glassfish the cluster provides in-memory session replication between nodes. If I use Suns Glassfish Enterprise Application Server I can use HADB which promisses 99.999% of availability.
No, you can't do it at the application level.
Your options are:
Round-robin DNS - expose both your servers to the internet and let the client do the load-balancing - this is quite attractive as it will definitely enable fail-over.
Use a different layer 3 load balancing system - such as "Windows network load balancing" , "Linux Network Load balancing" or the one I wrote called "Fluffy Linux cluster"
Use a separate load-balancer that has a failover hot spare
In any of these cases you still need to ensure that your database and session data etc, are available and in sync between the members of your cluster, which in practice is much harder.