Is it possible to combine canary deployments to EKS and an ALB with the least outstanding requests algorithm? - load-balancing

Using an ALB with weighted targets and "round robin algorithm" seems to be the default way of doing canary. But what would happen if I change the algorithm? Would the weights still be respected? Is it possible (or does it even makes sense) to combine weights and "outstanding requests algorithm"?
Reason for this question is that I'm evaluating the integration of ArgoCD rollouts with AWS Load balancer controller.

Related

Mirror canary deployment in RTL?

I am new to canary deployments. We are going to start doing canary deployments via Istio.
I was assuming this would just be a deployment mechanism, probably with some Istio routing testing in a pre-prod env but in earlier test envs we'd ring fence to a version being tested as we do today.
It's been suggested the canary concept is applied to all test environments so we effectively run all versions we expect to canary test in prod in the Route To Live.
Wondring what approach others are taking?
Mirroring
As mentioned here
Using Istio, you can use traffic mirroring to duplicate traffic to another service. You can incorporate a traffic mirroring rule as part of a canary deployment pipeline, allowing you to analyze a service's behavior before sending live traffic to it.
If you're looking for best practices I would recommend to start with this tutorial on medium, because it is explained very well here.
How Traffic Mirroring Works
Traffic mirroring works using the steps below:
You deploy a new version of the application and switch on traffic
mirroring.
The old version responds to requests like before but also sends an asynchronous copy to the new version.
The new version processes the traffic but does not respond to the user.
The operations team monitor the new version and report any issues to the development team.
As the application processes live traffic, it helps the team uncover issues that they would typically not find in a pre-production environment. You can use monitoring tools, such as Prometheus and Grafana, for recording and monitoring your test results.
Additionally there is an example with nginx that perfectly shows how it should work.
Canary deployment
As mentioned here
One of the benefits of the Istio project is that it provides the control needed to deploy canary services. The idea behind canary deployment (or rollout) is to introduce a new version of a service by first testing it using a small percentage of user traffic, and then if all goes well, increase, possibly gradually in increments, the percentage while simultaneously phasing out the old version. If anything goes wrong along the way, we abort and rollback to the previous version. In its simplest form, the traffic sent to the canary version is a randomly selected percentage of requests, but in more sophisticated schemes it can be based on the region, user, or other properties of the request.
Depending on your level of expertise in this area, you may wonder why Istio’s support for canary deployment is even needed, given that platforms like Kubernetes already provide a way to do version rollout and canary deployment. Problem solved, right? Well, not exactly. Although doing a rollout this way works in simple cases, it’s very limited, especially in large scale cloud environments receiving lots of (and especially varying amounts of) traffic, where autoscaling is needed.
There are the differences between k8s canary deployment and istio canary deployment.
k8s
As an example, let’s say we have a deployed service, helloworld version v1, for which we would like to test (or simply rollout) a new version, v2. Using Kubernetes, you can rollout a new version of the helloworld service by simply updating the image in the service’s corresponding Deployment and letting the rollout happen automatically. If we take particular care to ensure that there are enough v1 replicas running when we start and pause the rollout after only one or two v2 replicas have been started, we can keep the canary’s effect on the system very small. We can then observe the effect before deciding to proceed or, if necessary, rollback. Best of all, we can even attach a horizontal pod autoscaler to the Deployment and it will keep the replica ratios consistent if, during the rollout process, it also needs to scale replicas up or down to handle traffic load.
Although fine for what it does, this approach is only useful when we have a properly tested version that we want to deploy, i.e., more of a blue/green, a.k.a. red/black, kind of upgrade than a “dip your feet in the water” kind of canary deployment. In fact, for the latter (for example, testing a canary version that may not even be ready or intended for wider exposure), the canary deployment in Kubernetes would be done using two Deployments with common pod labels. In this case, we can’t use autoscaling anymore because it’s now being done by two independent autoscalers, one for each Deployment, so the replica ratios (percentages) may vary from the desired ratio, depending purely on load.
Whether we use one deployment or two, canary management using deployment features of container orchestration platforms like Docker, Mesos/Marathon, or Kubernetes has a fundamental problem: the use of instance scaling to manage the traffic; traffic version distribution and replica deployment are not independent in these systems. All replica pods, regardless of version, are treated the same in the kube-proxy round-robin pool, so the only way to manage the amount of traffic that a particular version receives is by controlling the replica ratio. Maintaining canary traffic at small percentages requires many replicas (e.g., 1% would require a minimum of 100 replicas). Even if we ignore this problem, the deployment approach is still very limited in that it only supports the simple (random percentage) canary approach. If, instead, we wanted to limit the visibility of the canary to requests based on some specific criteria, we still need another solution.
istio
With Istio, traffic routing and replica deployment are two completely independent functions. The number of pods implementing services are free to scale up and down based on traffic load, completely orthogonal to the control of version traffic routing. This makes managing a canary version in the presence of autoscaling a much simpler problem. Autoscalers may, in fact, respond to load variations resulting from traffic routing changes, but they are nevertheless functioning independently and no differently than when loads change for other reasons.
Istio’s routing rules also provide other important advantages; you can easily control fine-grained traffic percentages (e.g., route 1% of traffic without requiring 100 pods) and you can control traffic using other criteria (e.g., route traffic for specific users to the canary version). To illustrate, let’s look at deploying the helloworld service and see how simple the problem becomes.
There is an example.
There are additional resources you may want to check about traffic mirroring in istio:
https://istio.io/latest/docs/tasks/traffic-management/mirroring/
https://itnext.io/use-istio-traffic-mirroring-for-quicker-debugging-a341d95d63f8
https://dev.to/peterj/mirroring-traffic-with-istio-service-mesh-2cm4
https://livebook.manning.com/book/istio-in-action/chapter-5/v-7/130
https://istio.io/latest/docs/tasks/traffic-management/traffic-shifting/#apply-weight-based-routing

How to increase availability of EKS

As EKS SLA, a 99.9% avialbility is committed by Amazon.
How I can increase that to 99.99% (or even 99.999%)? Would it help if I add master/slave nodes?
Thanks.
I don't think there is a way to do this and still call your setup an AWS EKS Cluster. It is defined in the EKS SLA that an EKS Cluster means that the control plane would be run by AWS. Three masters in different AZs provide pretty much good HA.
A workaround may be introducing a queue between the control plane (i.e. the k8s api) and your requests. The queue can retain the requests which were not successful due to availability issues and can again send the request based on some priority or time-based logic. This won't increase the HA for real-time tasks, but wouldn't let requests made from asynchronous use cases go to waste.

Traefik backend.loadbalancer.swarm or not

The tag traefik.backend.loadbalancer.swarm is default to false with --docker.swarmMode, however I wanted to know if there is any advantages / disadvantages to turn it on.
From what I've gathered so far, traefik.backend.loadbalancer.swarm with cause traefik to call the backend service (by a virtual ip) via the swarm mesh routing network, resulting a single backend record in the dashboard.
Advantage
Not sure, they both use round robin by default.
Disadvantage
Lose the following features:
wighted round robin
sticky session as swarm lb doesn't have it natively right now

What is the conceptual difference between Service Discovery tools and Load Balancers that check node health?

Recently several service discovery tools have become popular/"mainstream", and I’m wondering under what primary use cases one should employ them instead of traditional load balancers.
With LBs, you cluster a bunch of nodes behind the balancer, and then clients make requests to the balancer, who then (typically) round robins those requests to all the nodes in the cluster.
With service discovery (Consul, ZK, etc.), you let a centralized “consensus” service determine what nodes for particular service are healthy, and your app connects to the nodes that the service deems as being healthy. So while service discovery and load balancing are two separate concepts, service discovery gives you load balancing as a convenient side effect.
But, if the load balancer (say HAProxy or nginx) has monitoring and health checks built into it, then you pretty much get service discovery as a side effect of load balancing! Meaning, if my LB knows not to forward a request to an unhealthy node in its cluster, then that’s functionally equivalent to a consensus server telling my app not to connect to an unhealty node.
So to me, service discovery tools feel like the “6-in-one,half-dozen-in-the-other” equivalent to load balancers. Am I missing something here? If someone had an application architecture entirely predicated on load balanced microservices, what is the benefit (or not) to switching over to a service discovery-based model?
Load balancers typically need the endpoints of the resources it balances the traffic load. With the growth of microservices and container based applications, runtime created dynamic containers (docker containers) are ephemeral and doesnt have static end points. These container endpoints are ephemeral and they change as they are evicted and created for scaling or other reasons. Service discovery tools like Consul are used to store the endpoints info of dynamically created containers (docker containers). Tools like consul-registrator running on container hosts registers container end points in service discovery tools like consul. Tools like Consul-template will listen for changes to containers end points in consul and update the load balancer (nginx) for sending the traffic to. Thus both Service Discovery Tools like Consul and Load Balancing tools like Nginx co-exist to provide runtime service discovery and load balancing capability respectively.
Follow up: what are the benefits of ephemeral nodes (ones that come and go, live and die) vs. "permanent" nodes like traditional VMs?
[DDG]: Things that come quickly to my mind: Ephemeral nodes like docker containers are suited for stateless services like APIs etc. (There is traction for persistent containers using external volumes - volume drivers etc)
Speed: Spinning up or destroying ephemeral containers (docker containers from image) takes less than 500 milliseconds as opposed to minutes in standing up traditional VMs
Elastic Infrastructure: In the age of cloud we want to scale out and in according to users demand which implies there will be be containers of ephemeral in nature (cant hold on to IPs etc). Think of a markerting campaign for a week for which we expect 200% increase in traffic TPS, quickly scale with containers and then post campaign, destroy it.
Resource Utilization: Data Center or Cloud is now one big computer (compute cluster) and containers pack the compute cluster for max resource utilization and during weak demand destroy the infrastructure for lower bill/resource usage.
Much of this is possible because of lose coupling with ephemeral containers and runtime discovery using service discovery tool like consul. Traditional VMs and tight binding of IPs can stifle this capability.
Note that the two are not necessarily mutually exclusive. It is possible, for example, that you might still direct clients to a load balancer (which might perform other roles such as throttling) but have the load balancer use a service registry to locate instances.
Also worth pointing out that service discovery enables client-side load balancing i.e. the client can invoke the service directly without the extra hop through the load balancer. My understanding is that this was one of the reasons that Netflix developed Eureka, to avoid inter-service calls having to go out and back through the external ELB for which they would have had to pay. Client-side load balancing also provides a means for the client to influence the load-balancing decision based on its own perspective of service availability.
If you look at the tools from a completely different perspective, namely ITSM/ITIL, load balancing becomes "just that", whereas service discovery is a part of keeping your CMDB up to date, and ajour with all your services, and their interconnectivity, for better visibility of impact, in case of downtime, and an overview of areas that may need supplementing, in case of High availability applications.
Furthermore, service-discovery only gives you a picture as of the last scan, and not near-real-time (of course dependent on which scanning interval you have set), whereas load balancing will keep an up-to-date picture of your application's health.

Bluemix Load Balancer Algorithm

What algorithm is used to balance HTTP load among several instances running on Bluemix? It seems I can use auto-scaling service to scale horizontally, and want to know what algorithm is used when balancing the load.
Cloud Foundry uses round-robin load balancing to distribute requests across the running instances of your app.