Hardware Load-Balancer for JBoss - apache

In what scenario does it make sense to put a hardware load balancer in front of the Apache servers that are running mod_cluster? Logically it seems like mod_cluster is doing all the load balancing. Is mod_cluster required if you're doing JBoss clustering?
Example architecture:
1 website (www.foo.bar) being served from:
4 Apache servers running mod_cluster
2 JBoss app servers forming 1 cluster
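For reference, a minimal sketch of the Apache side of such a setup, assuming mod_cluster 1.x on Apache httpd 2.2 (the module paths, listener address and allowed subnet are assumptions, not taken from the question). Unlike mod_jk, no static worker list is configured: the JBoss nodes register themselves with each Apache server over MCMP.

LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_ajp_module modules/mod_proxy_ajp.so
LoadModule slotmem_module modules/mod_slotmem.so
LoadModule manager_module modules/mod_manager.so
LoadModule proxy_cluster_module modules/mod_proxy_cluster.so
LoadModule advertise_module modules/mod_advertise.so

# Management channel on which the JBoss nodes register themselves
Listen 10.0.0.10:6666
<VirtualHost 10.0.0.10:6666>
    EnableMCPMReceive
    <Location />
        # Restrict registration to the app-server subnet
        Order deny,allow
        Deny from all
        Allow from 10.0.0.
    </Location>
</VirtualHost>

With four such Apache front ends, something still has to spread client traffic across the Apache tier itself - that is where a hardware load balancer (or round-robin DNS) in front of them comes in.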

Benefits of a load balancer:
If one server goes down, the balancer notices (some use heartbeats/health checks) and stops sending traffic to the dead server.
Downsides of clustering:
If a shared component breaks down, the failure can take down every server in the cluster.
Benefits of a cluster:
More capacity to serve web pages.
Everything can live on one disk (shared storage).
Downsides of a load balancer:
Costly hardware (or free software).
If the load balancer dies, everything dies.
Feel free to add to this answer.

Related

Build an app stack that has no single point of failure and is fault tolerant

Considering a three-tier app (web server, app server and database):
[Apache Web server -> Tomcat app server -> database]
How do you build an app stack (leaving out the database part) that has no single point of failure and is fault tolerant?
IMHO, this is quite an open-ended question. How specific is "a single point of failure" - a single app server, a single physical server, a single data centre, the network?
A starting point would be to run your Tomcat and Apache servers in clusters. Alternatively, you can run separate instances, fronted with a load balancer such as HAProxy - except, to avoid a single point of failure, you will need redundancy on the load balancer as well. I recently worked on a project where we had two instances of a load balancer, fronted with a virtual IP (VIP). The load balancers communicated with two different app server instances, using a round-robin approach. Clients connected to the VIP in order to use the application, and they were completely oblivious to the fact that there were multiple servers behind it.
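A minimal HAProxy sketch of that shape, with hypothetical addresses (10.0.0.100 as the VIP, two Tomcat instances on 10.0.0.11 and 10.0.0.12) and an assumed /health check URL; the same configuration would run on both load balancer hosts, with something like keepalived moving the VIP between them:

defaults
    mode http
    timeout connect 5s
    timeout client 30s
    timeout server 30s

frontend www
    bind 10.0.0.100:80              # the VIP
    default_backend tomcats

backend tomcats
    balance roundrobin
    option httpchk GET /health      # mark nodes down if the check fails
    server app1 10.0.0.11:8080 check
    server app2 10.0.0.12:8080 check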
As an additional comment, you may also want to look at space-based architecture - https://en.wikipedia.org/wiki/Space-based_architecture.

Time based apache web server load balancing

I have two Tomcat nodes behind an Apache web server reverse proxy used for load-balancing.
Is there a way for me to configure the workers such that additional hosts are added to the cluster only during certain times of the day?
Although neither mod_jk nor mod_proxy_ajp (nor mod_proxy_http) has any specific feature for high-load resizing (that I know of), I definitely know that mod_jk can be configured with a number of backend instances (Tomcat nodes), and they don't all have to be running all the time.
Take the following configuration for example:
worker.list=T1lb, T2lb, T3lb, T4lb
worker.T1lb.type=lb
worker.T1lb.balance_workers=h1t1, h2t1, h3t1, h4t1
worker.T2lb.type=lb
worker.T2lb.balance_workers=h1t2, h2t2, h3t2, h4t2
[etc for other combinations of TXlb]
worker.h1t1.host=host1.internal
worker.h1t1.port=1111
worker.h1t1.ping_mode=A
worker.h1t2.host=host1.internal
worker.h1t2.port=2222
worker.h1t2.ping_mode=A
worker.h1t3.host=host1.internal
worker.h1t3.port=3333
worker.h1t3.ping_mode=A
worker.h1t4.host=host1.internal
worker.h1t4.port=4444
worker.h1t4.ping_mode=A
[etc for other combinations of hXtY]
If you simply shut down (or don't start) the Tomcat nodes for h1t3 and h1t4, then mod_jk will know that they are unavailable and won't send requests to them. When you start them up, they'll start taking requests.
There is another option for this configuration. It's a little cleaner, but requires a little more work.
You have the same configuration as above, but you explicitly set the activation state of the nodes you usually don't keep online to stopped, like this:
worker.h1t3.host=host1.internal
worker.h1t3.port=3333
worker.h1t3.ping_mode=A
worker.h1t3.activation=S
worker.h1t4.host=host1.internal
worker.h1t4.port=4444
worker.h1t4.ping_mode=A
worker.h1t4.activation=S
If you want to spin up nodes h1t3 and h1t4, bring those Tomcat instances online, then change the activation state of those workers from S (stopped) to A (active); mod_jk will then start sending requests to them. When you want to take them offline again, put the workers back into the S (stopped) state, then stop those Tomcat instances.
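Since the question is about certain times of day, those activation flips can be automated through the mod_jk status worker rather than done by hand. A sketch, assuming a status worker named jkstatus mounted at /jk-status, and using the status worker's update parameters (w = lb worker, sw = member, vwa = activation, with 0 = active and 2 = stopped) - verify these names and values against the status worker reference for your mod_jk version:

# workers.properties: add a status worker
worker.list=T1lb,T2lb,T3lb,T4lb,jkstatus
worker.jkstatus.type=status

# httpd.conf: expose it (and restrict access to it!)
JkMount /jk-status jkstatus

# crontab on the Apache host: activate h1t3 at 08:00, stop it at 20:00
0 8 * * * curl -s "http://localhost/jk-status?cmd=update&w=T1lb&sw=h1t3&vwa=0" >/dev/null
0 20 * * * curl -s "http://localhost/jk-status?cmd=update&w=T1lb&sw=h1t3&vwa=2" >/dev/null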
Lots of this is documented in the Apache Tomcat Connectors load balancing howto, with a full reference in the Apache Tomcat Connectors worker reference.

What is the conceptual difference between Service Discovery tools and Load Balancers that check node health?

Recently several service discovery tools have become popular/"mainstream", and I'm wondering in what primary use cases one should employ them instead of traditional load balancers.
With LBs, you cluster a bunch of nodes behind the balancer, and clients make requests to the balancer, which then (typically) round-robins those requests across all the nodes in the cluster.
With service discovery (Consul, ZK, etc.), you let a centralized "consensus" service determine which nodes for a particular service are healthy, and your app connects to the nodes that the service deems healthy. So while service discovery and load balancing are two separate concepts, service discovery gives you load balancing as a convenient side effect.
But if the load balancer (say HAProxy or nginx) has monitoring and health checks built in, then you pretty much get service discovery as a side effect of load balancing! Meaning, if my LB knows not to forward a request to an unhealthy node in its cluster, then that's functionally equivalent to a consensus server telling my app not to connect to an unhealthy node.
So to me, service discovery tools feel like the "six-in-one, half-dozen-in-the-other" equivalent of load balancers. Am I missing something here? If someone had an application architecture entirely predicated on load-balanced microservices, what is the benefit (or not) of switching over to a service discovery-based model?
A load balancer typically needs the endpoints of the resources it balances traffic across. With the growth of microservices and container-based applications, dynamically created containers (Docker containers) are ephemeral and don't have static endpoints: their endpoints change as containers are evicted and re-created for scaling or other reasons. Service discovery tools like Consul are used to store the endpoint info of these dynamically created containers. A tool like consul-registrator, running on each container host, registers the container endpoints in Consul, and a tool like consul-template listens for changes to those endpoints in Consul and updates the load balancer (nginx) with the endpoints to send traffic to. Thus a service discovery tool like Consul and a load balancing tool like nginx co-exist, providing runtime service discovery and load balancing respectively.
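As an illustration of that pipeline, a consul-template template for an nginx upstream block might look like the sketch below (the service name "web" and the file paths are assumptions; by default the service function only returns instances whose Consul health checks are passing):

# upstream.ctmpl - rendered by consul-template whenever the healthy
# set of "web" instances changes in Consul
upstream web {
{{ range service "web" }}
  server {{ .Address }}:{{ .Port }};
{{ end }}
}

consul-template would then be started with something like consul-template -template "upstream.ctmpl:/etc/nginx/conf.d/web.conf:nginx -s reload", re-rendering the file and reloading nginx each time the endpoint set changes.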
Follow up: what are the benefits of ephemeral nodes (ones that come and go, live and die) vs. "permanent" nodes like traditional VMs?
[DDG]: Things that come quickly to my mind: ephemeral nodes like Docker containers are suited to stateless services like APIs etc. (there is traction for persistent containers using external volumes, volume drivers, etc.).
Speed: spinning up or destroying an ephemeral container (a Docker container from an image) takes less than 500 milliseconds, as opposed to the minutes it takes to stand up a traditional VM.
Elastic infrastructure: in the age of the cloud we want to scale out and in according to user demand, which implies containers that are ephemeral in nature (you can't hold on to IPs, etc.). Think of a marketing campaign running for a week, for which we expect a 200% increase in traffic (TPS): quickly scale out with containers, then destroy them once the campaign is over.
Resource utilization: the data center or cloud is now one big computer (a compute cluster), and containers pack the compute cluster for maximum resource utilization; during weak demand you destroy that infrastructure for a lower bill/resource usage.
Much of this is possible because of the loose coupling of ephemeral containers and runtime discovery using a service discovery tool like Consul. Traditional VMs and tight binding to IPs can stifle this capability.
Note that the two are not necessarily mutually exclusive. It is possible, for example, that you might still direct clients to a load balancer (which might perform other roles such as throttling) but have the load balancer use a service registry to locate instances.
Also worth pointing out that service discovery enables client-side load balancing i.e. the client can invoke the service directly without the extra hop through the load balancer. My understanding is that this was one of the reasons that Netflix developed Eureka, to avoid inter-service calls having to go out and back through the external ELB for which they would have had to pay. Client-side load balancing also provides a means for the client to influence the load-balancing decision based on its own perspective of service availability.
If you look at the tools from a completely different perspective, namely ITSM/ITIL, load balancing becomes "just that", whereas service discovery is part of keeping your CMDB up to date with all your services and their interconnectivity, giving better visibility of impact in case of downtime, and an overview of areas that may need supplementing for high-availability applications.
Furthermore, service discovery only gives you a picture as of the last scan, not near-real-time (depending, of course, on the scanning interval you have set), whereas load balancing keeps an up-to-date picture of your application's health.

WLS loadbalancing query

I have a WebLogic cluster and Apache is acting as a front-end proxy. By default, the WebLogic cluster and plugin use a round-robin algorithm. Suppose I change the load-balancing algorithm to weight-based or random: how will the WebLogic plugin come to know about the load-balancing algorithm change on the WLS cluster side? Do we need a hardware load balancer for this purpose? Will Apache as a front-end proxy with the WLS plugin only support round robin, or will it support other load-balancing algorithms? For an HA, large-scale production environment, should we prefer Apache with the WLS plugin, or a hardware load balancer like a BIG-IP or Cisco LB, as the front end for WebLogic?
The comment is right. The Apache plugin only uses round robin. For other load-balancing strategies, you have to choose a hardware load balancer.
As to your last question, we usually prefer a hardware load balancer over Apache + plugin due to the greater choice of load-balancing strategies, the better management interface, and some extra features that a load balancer provides, such as compression, SPDY support, etc. But if you have lots of static content, Apache + plugin is not a bad choice, as Apache can serve the static content directly without hitting the WebLogic cluster, thus reducing the load on the WLS servers. A common architecture is also: hardware load balancer --> Apache --> WebLogic cluster. That way there is load balancing and HA on the Apache tier too.
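For reference, the Apache side of that architecture is just the WebLogic proxy plugin pointed at the cluster. A minimal sketch, with hypothetical hostnames (the module file is mod_wl_22.so or mod_wl_24.so depending on your Apache version):

LoadModule weblogic_module modules/mod_wl_22.so

<Location /app>
    SetHandler weblogic-handler
    # The plugin round-robins across this initial list and refreshes
    # the member list dynamically from the cluster itself
    WebLogicCluster wls1.internal:7001,wls2.internal:7001
</Location>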
As the comment already indicates, a weight-based approach is only supported for EJBs and RMI objects.
Please refer to the WLS Cluster documentation: http://docs.oracle.com/cd/E23943_01/web.1111/e13709/load_balancing.htm#CHDGFIBD
If you want to load-balance your web sessions, you will probably need to look into specialized hardware or software components.

Glassfish failover without load balancer

I have a GlassFish v2u2 cluster with two instances and I want to fail over between them. Every document that I read on this subject says that I should use a load balancer in front of GlassFish, like Apache httpd. In this scenario failover works, but I again have a single point of failure.
Is GlassFish able to do that failover without a load balancer in front?
The way we solved this is that we have two IP addresses which both respond to the URL. The DNS provider (DNS Made Easy) will round-robin between the two. Setting the timeout (TTL) low will ensure that if one server fails, the other will answer. When one server stops responding, DNS Made Easy will only hand out the other host as the server to respond to this URL. You will have to trust the DNS provider, but you can buy service with extremely high availability of the DNS lookup.
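A sketch of what those records might look like in a zone file, with example IPs and a 60-second TTL (the failover behaviour itself comes from the provider's health monitoring, not from DNS):

www.example.com.   60   IN   A   203.0.113.10
www.example.com.   60   IN   A   203.0.113.11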
As for high availability, you can have a cluster setup which allows session replication, so that the user won't lose more than, potentially, the one request which fails.
Hmm.. JBoss can do failover without a load balancer according to the docs (http://docs.jboss.org/jbossas/jboss4guide/r4/html/cluster.chapt.html) Chapter 16.1.2.1. Client-side interceptor.
As far as I know, the GlassFish cluster provides in-memory session replication between nodes. If I use Sun's GlassFish Enterprise Application Server, I can use HADB, which promises 99.999% availability.
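As a side note on session replication: the web application itself generally has to be marked distributable before the container will replicate its sessions. A minimal web.xml sketch (schema version assumed):

<web-app xmlns="http://java.sun.com/xml/ns/javaee" version="2.5">
    <!-- allow the container to replicate HTTP sessions across the cluster -->
    <distributable/>
</web-app>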
No, you can't do it at the application level.
Your options are:
Round-robin DNS - expose both your servers to the internet and let the client do the load-balancing - this is quite attractive as it will definitely enable fail-over.
Use a different layer 3 load-balancing system - such as Windows Network Load Balancing, Linux network load balancing, or the one I wrote, called "Fluffy Linux cluster"
Use a separate load-balancer that has a failover hot spare
In any of these cases you still need to ensure that your database and session data etc, are available and in sync between the members of your cluster, which in practice is much harder.