I am looking for the best way to achieve high availability for my organization's applications. Since they contain sensitive information, the applications must reside inside my organization's data centers.
I was thinking of using Google load balancing to direct requests to my servers, but I don't think they can be pointed at external servers, just Google VMs. Does anyone know if that's true?
My other thought was that I could use Google load balancing to point to Google VMs running Nginx and have that load balance between my data centers. Does anyone know if that is feasible? Under this scenario, can I terminate SSL on my servers, or does it have to terminate at the Google VM?
Unfortunately, you are correct: You cannot use Google Cloud's Network load-balancing with external servers.
You could do your second option, but I'd strongly suggest you reconsider the approach: too many moving parts, and for what benefit? If a server goes down you lose session state anyway, so maybe it'd be better for you to use DNS load balancing instead.
FYI, I use Google load balancing and autoscaling. It works pretty well, but not perfectly (frequent 502 burps), which is probably why it's still in "Beta".
I have some questions regarding https://docs.konghq.com/2.0.x/clustering
I'd really appreciate it if someone could help me.
1) According to the Clustering Reference, I need a load balancer. Could you please recommend a free one that I can use in front of my Kong nodes?
2) I still don't know whether it is better to run Kong nodes in separate VMs or in Docker using a docker-compose file for a full production environment.
Best Regards,
I think both of your questions depend heavily on your tech stack and architecture.
Regarding the load balancing question, I can think of several options for different environments:
DNS load balancing, which depends on client-side load balancing
Services in a Kubernetes/OpenShift environment, which provide load balancing across a set of pods
AWS load balancers, if you deploy Kong directly on EC2 machines (I am sure other cloud providers have similar concepts)
Whether to deploy Kong on a VM or as a Docker container is quite hard to answer. It depends on the tech stack you already have in place and on your requirements (see https://docs.konghq.com/2.0.x/sizing-guidelines/). However, I would not recommend using docker-compose for this use case. If you decide on a Docker-based solution, you should take a look at container management solutions such as Kubernetes or OpenShift. They take care of managing your Kong containers (such as how many replicas are running and what happens if one replica fails), and they solve the load balancing issue (through Kubernetes/OpenShift service objects).
When running self-healing, scalable stateless services in frameworks such as Marathon, the "affirmed" pattern is to have a tool for service discovery (e.g., Bamboo) that feeds a load balancer (e.g., HAProxy), preferably with some automatic configuration, so that users can be proxied to services when hitting the load balancer.
I don't seem to find much material about how to make the load-balancer itself highly available.
If the host that runs the load-balancer dies, I would like the services to still be accessed on the same URIs without downtimes.
What I want can be achieved with Pacemaker/Corosync, but the fact that this specific point is often omitted in the various tutorials and blog posts makes me think that maybe there is a simpler pattern, or that I am overlooking something.
Do you have any suggestions?
One common approach is to run multiple machines with Bamboo/HAProxy on them pulling container locations from the Marathon/Mesos masters. In an AWS/Cloud-du-jour environment, these proxy machines can go behind an ELB (or even in an Autoscaling Group if your automation is good enough) to give you true HA functionality.
I'm researching how large companies manage their public APIs. I'm thinking of companies with mature established APIs such as Google, Facebook, Twitter, and Amazon.
These companies have a number of different APIs that they expose to the public. Google, for example, has Plus, AdSense, AdWords etc. APIs that are publicly consumable. I'd like to understand if they use a cluster of reverse-proxy servers in front of those APIs to provide common functionality so that their specialist API servers don't need to implement that.
For example: Throttling and Authentication could be handled at this layer instead of implementing it in each API cluster.
The questions: Does anyone use a shim or reverse proxy in front of their APIs to handle common tasks? What are the use cases that make a reverse-proxy a good or bad idea for a cluster of API servers?
Most large companies explore a variety of things to handle the traffic and load on their servers. Roughly speaking:
A load balancer sits between the entry point and the actual servers.
A reverse proxy often sits between these to handle static files, pre-computed/rendered views, and other such largely static assets.
Anycast is used for DNS purposes, so that you are routed towards the nearest server that handles that URL.
Back pressure is employed in systems to limit the amount of requests feeding through a single pipeline and so that services don't tip over.
Memcached, Redis and the like are used as short term caches. That is, if it's going to roughly be the same result every 5 seconds, then that result can be cached in memory for faster delivery. Some proxies can be configured to read out of these.
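The short-term caching idea can be sketched in a few lines. This is only an in-process illustration in Python, not how Memcached or Redis work internally; the class name TTLCache, the 5-second default and the injectable clock parameter are assumptions made up for the example.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl` seconds (hypothetical helper)."""

    def __init__(self, ttl: float = 5.0, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be exercised without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # still fresh: serve from memory
        value = compute()  # missing or stale: go to the backend
        self._entries[key] = (now, value)
        return value
```

A proxy or API layer would wrap the expensive backend call in get_or_compute, so that repeated requests within the TTL window never reach the backend.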
If you're really interested, start reading some of the Netflix blog. Take a look at some of the open source they've used like Hystrix or Zuul. You can also take a look at some of their videos. They make heavy use of proxies and have built in some very advanced distributed behavior.
As far as a reverse proxy being a good idea, think in terms of failure. If your service calls out to another API by direct route and that service fails, then your service will fail and cascade upwards to the end user. On the other hand, if it's hitting a reverse proxy, then that proxy can be configured or even auto detect failures and divert traffic to back up servers.
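To make the failover point concrete, here is a rough Python sketch of what "divert traffic to backup servers" could look like inside a proxy. The function name forward_with_failover and the send callable are hypothetical; a real proxy would also do health checks, timeouts and retry limits.

```python
from typing import Callable, List, Sequence


class AllBackendsFailed(Exception):
    """Raised when no backend could serve the request."""


def forward_with_failover(backends: Sequence[str], send: Callable[[str], str]) -> str:
    """Try each backend in order and return the first successful response.

    `send` is a hypothetical callable that performs the request against one
    backend and raises ConnectionError when that backend is unreachable.
    """
    errors: List[str] = []
    for backend in backends:
        try:
            return send(backend)
        except ConnectionError as exc:
            errors.append(f"{backend}: {exc}")  # remember why, then divert
    raise AllBackendsFailed("; ".join(errors))
```

The caller only sees an error when every backend in the list has failed, which is the difference from calling a single API by direct route.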
As far as a reverse proxy being a good idea, think in terms of load. Sometimes servers can only handle a fraction of the traffic individually so that load must be shared on many servers. This is true not just of CPU capped but also IO capped resources (even if the return signal itself will not be the cause of the IO capping.)
Daisy chaining like this presents its own special little hell, but it's sometimes unavoidable. The main downside, and what makes it a really bad choice if you can avoid it, is a loss of deterministic behavior. Sometimes the stupidest things will bring your servers down. And by stupid, I mean really, really dumb stuff that you never thought in a million years might bite you in the butt (think server clocks out of sync). You have to start using rolling deploys of code, take down servers manually or forcefully if they stop responding, and keep those proxy configs in good order.
HTTP/1.1 support can also be an issue. Not all reverse proxies adhere to the spec. In fact, some of them only cover ~50%. HAProxy did not do SSL natively before version 1.5. If you only have limited hardware, then a thread-based proxy can unexpectedly swamp the system with threads.
Finally, adding in a proxy is one more thing that will break (not can, will.) You have to monitor them just like any piece of the platform, aggregate their logs, and run mock drills on them too.
Can someone explain to me how high availability ("HA") works for a web application ... because I assume HA means that there is no single point of failure.
However, even if a load balancer is used, isn't that the single point of failure?
I have found this article on the subject:
http://www.tenereillo.com/GSLBPageOfShame.htm
Basically if you do not require long lasting sticky sessions you can configure your DNS servers to return multiple A records (IP addresses) for your website.
Web browsers are smart enough to try all the addresses until they find one that works.
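The "try every address until one works" behaviour can be imitated on the client side roughly like this. It is a Python sketch of the idea, not what any particular browser actually does; the function names and the 3-second timeout are assumptions for illustration.

```python
import socket
from typing import Callable, Iterable, List, Optional, Tuple

Address = Tuple[str, int]


def resolve_all(host: str, port: int) -> List[Address]:
    """Return each distinct IPv4 address (A record) that DNS gives for `host`."""
    infos = socket.getaddrinfo(host, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
    addresses: List[Address] = []
    for info in infos:
        address = (info[4][0], info[4][1])
        if address not in addresses:
            addresses.append(address)
    return addresses


def tcp_connect(address: Address) -> socket.socket:
    """Open a TCP connection, giving up after 3 seconds (arbitrary choice)."""
    return socket.create_connection(address, timeout=3.0)


def connect_first_available(
    addresses: Iterable[Address],
    connect: Callable[[Address], socket.socket] = tcp_connect,
) -> socket.socket:
    """Try the addresses in order and return the first connection that succeeds."""
    last_error: Optional[OSError] = None
    for address in addresses:
        try:
            return connect(address)
        except OSError as exc:
            last_error = exc  # this address is dead, move on to the next one
    raise OSError("no address accepted the connection") from last_error
```

Typical use would be connect_first_available(resolve_all("example.com", 80)). Note that each dead address costs up to one timeout before the next one is tried.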
In simple words, high availability can be defined as running a system 24/7 without downtime, even when hardware or software fails. In other words, it is a fault-tolerant application. This helps ensure uninterrupted use of the application for its intended users.
Read more on High Availability Deployment Architecture
It works like this: you set up two HAProxy servers with heartbeat, so that when one fails (stops responding to queries), it is removed from the cluster.
Requests from HAProxy can be forwarded to the web servers in round-robin fashion, and if one web server fails, the HAProxy servers do not try to contact it until it is alive again.
The web servers store all dynamic information in a database, which is replicated across two MySQL instances.
As you can see, HAProxy, MySQL Cluster (or simply MySQL replication) and IP clustering are the key here.
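The heartbeat part of this setup boils down to "a node that has not been heard from for a while is treated as dead". Here is a minimal Python sketch of that principle only; it is not how the Linux-HA Heartbeat tool is implemented, and the class name, the 3-second timeout and the injectable clock are assumptions for illustration.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Tracks the last heartbeat per node and reports which nodes still count as alive."""

    def __init__(self, timeout: float = 3.0, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.clock = clock  # injectable so the timeout can be exercised without sleeping
        self._last_seen: Dict[str, float] = {}

    def beat(self, node: str) -> None:
        """Record that `node` just reported in."""
        self._last_seen[node] = self.clock()

    def alive_nodes(self) -> List[str]:
        """Return the nodes whose last heartbeat is at most `timeout` seconds old."""
        now = self.clock()
        return sorted(n for n, seen in self._last_seen.items() if now - seen <= self.timeout)
```

In a real active/passive pair, the surviving node would additionally take over the shared IP address once its peer drops out of alive_nodes().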
Sure it is, when operated alone. A usual highly available setup includes 2 or more load balancers running in a cluster in either active/active or active/passive configuration. To further increase availability, you can have 2 different Internet service providers (or geo-distributed data centers), each running a pair of clustered load balancers. Then you configure a DNS A record that resolves to 2 distinct public IP addresses, so that DNS round-robin spreads requests across them roughly evenly (CloudFlare is very fast and reliable at this). It is also possible to return the IP address of the data center closest to the originating geo location by using something like PowerDNS dnsdist.
This is what big players do to make their services highly available.
Please read https://docs.oracle.com/cd/E23824_01/html/821-1453/gkkky.html for more clarity. Both load balancers actually use the same VIP (virtual IP address, see https://techterms.com/definition/vip).
HA architecture is an entire field, and multiple books have been written on it, so it is hard to answer in a short paragraph.
To sum up the ideal situation, you would be using multiple servers, interconnected to a layer of multiple load balancers. The nodes and load balancers will be located in a few different data centers and connected to different network backbones. Ideally the data centers will be located all over the world.
In short, every component will have redundancy, including the load balancers.
For a starting point, see Wikipedia for High Availability Cluster
I've written a simple server application which will run distributed on several machines.
My question is: how does a network load balancer work, in general?
I've heard of round-robin and other algorithms, but what I haven't found an answer to is how the process really goes, in socket terms.
Does the client connect to one of the load balancer machines, ask for a "free-to-connect-to" server, and simply connect to it?
That's the simplest way I can think of.
.. or, does it use the load balancer as a proxy (which implies that all the load balancers must always be connected to the application servers, and data is transferred through them)?
It's more of a general question. How would you do this?
Thank you all!
There are several different ways to load balance an application. Some are physical devices that sit between your router and the servers; some are software based with a bit of code that runs on each of the load balanced devices.
Microsoft has load balancing built into Windows which is all software based. It's pretty good and easy to set up.
However, I'll cover the physical route.
There are several algorithms here, but the main one is Round Robin with an option for "sticky" sessions. Sticky in this case means that the load balancer will try to keep a history of clients and forward requests from the same client to the same machine. This means the load balancer needs to keep a list of clients and where it directed those clients. Depending on cache size, clients may fall off the list and on future requests they may be forwarded to a different server.
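As a rough illustration of the sticky-session bookkeeping described above, here is a Python sketch with a bounded client table. The class name StickyBalancer, the max_clients limit and the least-recently-seen eviction policy are assumptions for the example; real devices may key on source IP or a cookie and evict differently.

```python
from collections import OrderedDict
from itertools import cycle
from typing import Sequence


class StickyBalancer:
    """Round robin for new clients, same server for clients still in the table."""

    def __init__(self, servers: Sequence[str], max_clients: int = 1000) -> None:
        self._next_server = cycle(servers)
        self._max_clients = max_clients
        self._assignments: "OrderedDict[str, str]" = OrderedDict()

    def pick(self, client_id: str) -> str:
        if client_id in self._assignments:
            self._assignments.move_to_end(client_id)  # mark as recently seen
            return self._assignments[client_id]
        server = next(self._next_server)  # unknown client: plain round robin
        self._assignments[client_id] = server
        if len(self._assignments) > self._max_clients:
            self._assignments.popitem(last=False)  # evict the least recently seen client
        return server
```

Once a client has been evicted from the table, its next request is treated as new and may land on a different server, which is the "fall off the list" effect mentioned above.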
Round Robin is a pretty simple idea. For each request that comes in send it to the next server in the list. More complicated algorithms might take into account how many requests go to a particular server and how long are those requests taking; then try to rebalance new requests to favor faster servers. This part is complicated though.
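Both ideas fit in a few lines of Python. This is only a sketch under simplifying assumptions: the names round_robin and FastestServerPicker are made up, and the "favor faster servers" part uses a plain lifetime average, whereas real balancers tend to use decaying averages, connection counts and health checks.

```python
from itertools import cycle
from typing import Dict, Iterator, Sequence


def round_robin(servers: Sequence[str]) -> Iterator[str]:
    """Yield the servers in list order, forever."""
    return cycle(servers)


class FastestServerPicker:
    """Prefer the server with the lowest average observed response time."""

    def __init__(self, servers: Sequence[str]) -> None:
        self._total_seconds: Dict[str, float] = {s: 0.0 for s in servers}
        self._requests: Dict[str, int] = {s: 0 for s in servers}

    def record(self, server: str, seconds: float) -> None:
        """Store how long one finished request took on `server`."""
        self._total_seconds[server] += seconds
        self._requests[server] += 1

    def pick(self) -> str:
        def average(server: str) -> float:
            count = self._requests[server]
            # Servers without samples score 0.0, so they get tried first.
            return self._total_seconds[server] / count if count else 0.0

        return min(self._total_seconds, key=average)
```

With round robin you call next() on the iterator for each incoming request; with the picker you call pick() before forwarding and record() after the response comes back.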