Two separate AEM instances under the same subdomain? - apache

I have two geographically separate AEM (Adobe Experience Manager) instances hosted on different subdomains of the same parent domain.
For example, www.foo.com (instance 1) and www2.foo.com (instance 2).
Is it possible to have both of these AEM instances appear under a single hostname? For example, something like:
www.foo.com/instance1/ and www.foo.com/instance2/
Any help appreciated!

Yes, this sort of thing is often done using network tools. Typically a website built to handle load will have some sort of load balancer in front of it; in the overall flow, the load balancer sits even ahead of the dispatchers. With a load balancer you can specify routing rules (such as an iRule on an F5 load balancer) that send traffic to different places based on the rules you set up, such as the initial path segment of the URL. For more background on iRules, see https://devcentral.f5.com/articles/the101-irules-ndash-introduction-to-irules.
The same can also be done via content delivery networks (CDNs). Ultimately, what you are looking to do must be done at some network layer before the request actually hits an AEM server. The AEM instances themselves won't know that other instances exist. They will just respond to the requests that reach them, and it will be up to the routing layers in the network upstream from them to determine which HTTP requests go to which AEM servers.
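If your routing layer happens to be an Apache reverse proxy (the question is tagged apache), a path-based routing sketch might look like the following; the internal backend hostnames here are placeholders, and mod_proxy and mod_proxy_http must be enabled:

```apache
<VirtualHost *:80>
    ServerName www.foo.com

    # Send each path prefix to a different AEM instance/dispatcher.
    # The internal hostnames below are hypothetical placeholders.
    ProxyPass        "/instance1/" "http://aem1.internal.foo.com/"
    ProxyPassReverse "/instance1/" "http://aem1.internal.foo.com/"

    ProxyPass        "/instance2/" "http://aem2.internal.foo.com/"
    ProxyPassReverse "/instance2/" "http://aem2.internal.foo.com/"
</VirtualHost>
```

Note that the AEM applications would still need their link rewriting configured so that internally generated URLs match the new path prefixes.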
See also:
http route url parts to different server
Forward specific urls on same domain to different servers

Related

Application load balancer vs network load balancer

I am new to AWS. I can't get a clear idea behind ALB vs NLB. Could anyone explain in a simple way?
There are some excellent answers out there already; let me pick out some key points that may help.
Network Load Balancer
As the name implies, this operates at the network/transport level only, typically layer 4.
It neither sees nor cares about anything at the application layer, such as cookies, headers, etc.
It is context-less, caring only about the network-level information contained in the packets it is directing this way and that.
The 'balancing' here is done solely with IP addresses, port numbers, and other network variables.
Application Load Balancer
This takes into account multiple variables, from the application to the network. It can route its traffic based on this.
It is context-aware and can direct requests based on any single variable as easily as it can a combination of variables.
Key Differences
The network load balancer just forwards requests, whereas the application load balancer examines the contents of the HTTP request (headers, path, etc.) to determine where to route the request.
Network load balancing cannot assure availability of the application, whereas application load balancing can.
Some good sources from where I extracted this information are:
https://medium.com/awesome-cloud/aws-difference-between-application-load-balancer-and-network-load-balancer-cb8b6cd296a4
https://linuxacademy.com/community/show/22677-application-load-balancer-vs-network-load-balancer/
https://aws.amazon.com/elasticloadbalancing/features/#compare
In the main answer by James above, the term "network level" is used several times, and network-layer information is mentioned. However, I would like to point out that while the NLB does operate at layer 4, layer 4 is the transport layer, not the network layer. The NLB preserves the source IP, and thus an Elastic IP can be used with an NLB.

Why do we need web servers if we have load balancer to direct the requests?

Suppose we have two servers serving requests through a load balancer. Is it necessary to have a web server on both of our servers to process the requests? Can the load balancer itself act as a web server? Suppose we are using the Apache web server and HAProxy. Does that mean the web server (Apache) should be installed on both servers and the load balancer on any one of them? Why can't we have a load balancer on both of our server machines, receiving the requests and talking to each other to process them?
At the most basic level, you want web servers to fulfill requests for static content, while application servers handle business logic, i.e. requests for dynamic content.
But web servers can do many other things as well, such as authenticating and validating requests and logging metrics. An important part of the web server's job is also combining the content it gets from the application servers with a view to present to the client.
You want an LB sitting in front of both web and app servers if you have more than one of either. Also, there's nothing preventing you from putting a web server and an app server on the same machine.
The load balancer is in front of your web server(s) to redirect requests according to the number of sessions, a hash of source and destination IP, the requested URL, or other criteria. Additionally, it will check the availability of the backend servers to ensure requests get answered even if one server fails.
It's not installed on every web server - you only need one instance. It could be a hardware appliance, or software (like HAProxy) which may or may not be installed on one of the web servers - although that would not be prudent, as that web server could fail and then the proxy would not be able to redirect traffic to the remaining server.
There are several different scenarios for this. One is load balancing requests to 2 webservers which serve the same HTML content, to provide redundancy.
Another would be to provide multiple websites using just one public address, i.e. applying destination NAT according to the requested URL. For this, the software has to determine the URL in the HTTP request and redirect traffic to the backend web server servicing that site. This is sometimes called a 'reverse proxy', as it hides the internal server addresses from the outside.
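Since the question mentions HAProxy, here is a minimal sketch of the reverse-proxy scenario described above, routing by the requested hostname to different backends; the hostnames and internal addresses are made-up placeholders:

```haproxy
frontend www
    bind *:80
    mode http
    # Route by the Host header (hostnames are hypothetical).
    acl site_a hdr(host) -i a.example.com
    use_backend web_a if site_a
    default_backend web_b

backend web_a
    mode http
    # 'check' enables periodic health checks, so a failed
    # server is taken out of rotation automatically.
    server web1 10.0.0.11:80 check

backend web_b
    mode http
    server web2 10.0.0.12:80 check
```

A redundancy setup (the first scenario) would instead list both servers in one backend, letting HAProxy spread requests across them.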

Does a load-balancer need a load balancer?

Considering a situation in which we have a web-application which is deployed in multiple servers and client requests landing to a load balancer, which in turn routes requests to actual server.
Now, if we have too many requests coming concurrently, would the load balancer itself fail? Suppose we get 1 million requests per second, won't that be beyond the processing capacity of a single load balancer?
How do we design (at least conceptually) a system which handles situations like this?
Putting a load balancer in front of your load balancer will not solve the problem, simply because if one load balancer would fall over due to the high traffic, so would the one in front of it!
You can achieve what you're looking for with DNS. You can register multiple IP addresses for a domain name and hence have multiple load balancers.
Let's say you're making a request to www.example.com. Your browser will look up the record in DNS and receive a list of corresponding IP addresses. The request will then go to the first address on the list; if it's unavailable, the client can try the next one. DNS servers typically rotate the order of the list to spread the load, and managed DNS services can even do periodic health checks to remove unresponsive IPs. That means your requests will be split among your load balancers instead of all hitting just one.
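As a sketch, DNS round robin across two load balancers is just multiple A records for the same name; in BIND zone-file syntax it might look like this (the name and the documentation-range addresses are placeholders):

```zone
; Two A records for the same name: resolvers hand out the list
; in rotating order, spreading traffic across both load balancers.
; A short TTL (300 s) lets you remove a failed address quickly.
www.example.com.  300  IN  A  203.0.113.10   ; load balancer 1
www.example.com.  300  IN  A  203.0.113.20   ; load balancer 2
```

The short TTL is the usual trade-off here: faster failover when you pull a record, at the cost of more lookup traffic.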

Should I run Tomcat by itself or Apache + Tomcat?

I was wondering if it would be okay to run Tomcat as both the web server and the servlet container? On the other hand, it seems that the right way to scale your webapp is to use Apache HTTP Server listening on port 80 and connect it to Tomcat listening on another port?
Are both ways acceptable? What is being used nowadays? What's the prime difference? How do most major websites go about this?
Thanks.
Placing an Apache (or any other webserver) in front of your application server(s) (Tomcat) is a good thing for a number of reasons.
First consideration is about static resources and caching.
Tomcat will probably also serve a lot of static content, and even on dynamic content it will send some caching directives to browsers. However, each browser that hits your Tomcat for the first time will cause Tomcat to send the static file. Since processing a request is a bit more expensive in Tomcat than in Apache (because Apache is super-optimized and exploits very low-level facilities not always available in Tomcat, because Tomcat extracts much more information from the request than Apache needs, etc.), it may be better for static files to be served by Apache.
However, since configuring Apache to serve part of the content and Tomcat the rest of the URL space is a daunting task, it is usually easier to have Tomcat serve everything with the right cache headers, with Apache in front of it capturing the content, serving it to the requesting browser, and caching it so that other browsers hitting the same file get served directly from Apache without even disturbing Tomcat.
Beyond static files, much dynamic content may not need to be updated every millisecond either. For example, a JSON document loaded by the homepage that tells the user how much stuff is in your database may be an expensive query performed thousands of times, yet one that can safely be refreshed only once an hour without making your users angry. So Tomcat may serve the JSON with a proper one-hour caching directive, and Apache will cache the JSON fragment and serve it to any browser requesting it for one hour. There are obviously a ton of other ways to implement this (a caching filter, a JPA cache that caches the query, etc.), but sending proper cache headers and using Apache as a reverse proxy is quite easy, REST compliant, and scales well.
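The caching reverse proxy described above can be sketched in a few lines of Apache configuration; this assumes mod_proxy, mod_proxy_http, mod_cache, and mod_cache_disk are enabled, and that Tomcat listens on its default port 8080 (paths are illustrative):

```apache
# Disk-backed HTTP cache in front of Tomcat. Apache honours the
# Cache-Control/Expires headers Tomcat sends, so a response with
# "Cache-Control: max-age=3600" is served from this cache for an
# hour without touching Tomcat again.
CacheRoot   "/var/cache/apache2/mod_cache_disk"
CacheEnable disk "/"

ProxyPass        "/" "http://localhost:8080/"
ProxyPassReverse "/" "http://localhost:8080/"
```

The key design point is that the caching policy stays in the application (the headers Tomcat emits), while Apache merely enforces it.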
Another consideration is load balancing. Apache comes with a nice load balancing module that can help you scale your application across a number of Tomcat instances, provided that your application can scale horizontally or run on a cluster.
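For reference, Apache's load balancing module (mod_proxy_balancer) in front of two Tomcat instances can be sketched like this; the hostnames are placeholders, and mod_proxy, mod_proxy_http, mod_proxy_balancer, mod_lbmethod_byrequests, and mod_slotmem_shm are assumed to be enabled:

```apache
# Spread requests across two Tomcat instances by request count.
<Proxy "balancer://tomcatcluster">
    BalancerMember "http://tomcat1.internal:8080"
    BalancerMember "http://tomcat2.internal:8080"
    ProxySet lbmethod=byrequests
</Proxy>

ProxyPass        "/" "balancer://tomcatcluster/"
ProxyPassReverse "/" "balancer://tomcatcluster/"
```

If the application keeps session state in memory, sticky sessions (e.g. via the `stickysession` parameter) would also need to be configured, or sessions replicated across the cluster.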
A third consideration is about URLs, headers, etc. From time to time you may need to change some URLs, or remove or override some headers. For example, before a major update you may want to disable caching on browsers for a few hours to keep them from using stale data (much like lowering the DNS TTL before switching servers), or move the old application to another URL space, or rewrite old URLs to new ones where possible. While reconfiguring the servlets inside your web.xml files is possible, and filters can do wonders, if you are using a framework that interprets the URLs you may need to do a lot of work on your sitemap files or similar.
Having Apache or another web server in front of Tomcat may help a lot changing only Apache configuration files with modules like mod_rewrite.
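As a sketch of what that looks like on the Apache side, the two cases above (moving a URL space, and temporarily disabling browser caching) might be handled as follows; the paths are hypothetical, and mod_rewrite and mod_headers are assumed to be enabled:

```apache
RewriteEngine On

# Permanently redirect the old application's URL space to the new one.
RewriteRule "^/oldapp/(.*)$" "/newapp/$1" [R=301,L]

# Before a major update: tell browsers not to cache anything
# under /app/ until the rollout is done.
<LocationMatch "^/app/">
    Header set Cache-Control "no-cache, no-store, must-revalidate"
</LocationMatch>
```

Either change is a quick edit and reload of the Apache configuration, with no redeploy of the Tomcat application.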
So, I always recommend having Apache httpd in front of Tomcat. The small overhead in connection handling is usually recovered thanks to caching of resources, and the additional configuration work is repaid the first time you need to move URLs or handle some headers.
It depends on your network and how you wish to have security set up.
If you have a two-firewall DMZ, with applications deployed inside the second firewall, it makes sense to have an Apache or IIS instance in between the two firewalls to handle security and proxy calls into the app server. If it's acceptable to put the Tomcat instance in the DMZ you're free to do so. The only downside that I see is that you'll have to open a port in the second firewall to access a database inside. That might put the database at risk.
Another consideration is traffic. You don't say anything about traffic, sizing servers, and possible load balancing and clustering. A load balancer in front of a cluster of app servers is more likely to be kept inside the second firewall. The Tomcat instance is capable of handling traffic on its own, but there are always volume limitations depending on the hardware it's deployed on and what the application is doing with each request. It's almost impossible to give a yes or no answer without more detailed, application-specific information.
Search the site for "tomcat without apache" - it's been asked before. I voted to close before finding duplicates.

Round robin server setup

From what I understand, if you have multiple web servers, then you need some kind of load balancer that will split the traffic amongst your web servers.
Does this mean that the load balancer is the main connecting point on the network? ie. the load balancer has the IP address of the domain name?
If this is the case, it makes it really easy to add new hardware, since you don't have to wait for any DNS propagation, right?
There are several solutions to this "problem".
You could round-robin at the DNS level, i.e. have www.yourdomain.com point to several IP addresses (one for each of your servers).
This doesn't give you any intelligence in the load balancing - the load will be more or less randomly distributed - and you wouldn't be resilient to hardware failures, as removing a failed server would still require changes to DNS.
On the other hand, you could use a proxy or a load-balancing proxy that has a single IP but distributes the traffic to several back-end boxes. This gives you a single point of failure (the proxy - you could of course have several proxies to defeat that problem) and also gives you the added bonus of being able to use some metric to divide the load more evenly and intelligently than with plain round-robin DNS.
This setup can also handle hardware failure in the back-end pretty seamlessly. The end user never sees the back-end, just the front-end.
There are other issues to think about as well, if your page uses sessions or other smart logic, you can run into synchronisation problems when your user (potentially) hits different servers on every access.
It does (in general). It depends on what server OS and software you are using, but in general, you'll hit the load balancer for each request, and the load balancer will then farm out the work according to the scheme you have in place (round robin, least busy, session controlled, application controlled, etc...)
andy has part of the answer, but for true load balancing and high availability you would want to use a pair of hardware load balancers like F5 BIG-IPs in an active-passive configuration.
Yes, your domain's IP would be hosted on these devices, and traffic would connect first to them. BIG-IPs offer a lot of added functionality, including multiple ways of load balancing and some great URL rewriting, SSL acceleration, etc. They also allow you to run your web servers on a separate non-routable address scheme, and even run multiple sites on different ports with the F5s handling the translations.
Once you introduce load balancing you may have some other considerations to take into account for your application(s), like sticky sessions and session state, but that is a different subject.