I was wondering if it would be okay to run Tomcat as both the web server and container? On the other hand, it seems that the right way to go about scaling your webapp is to use Apache HTTP listening on port 80 and connecting that to Tomcat listening on another port?
Are both ways acceptable? What is being used nowdays? Whats the prime difference? How do most major websites go about this?
Thanks.
Placing an Apache (or any other webserver) in front of your application server(s) (Tomcat) is a good thing for a number of reasons.
First consideration is about static resources and caching.
Tomcat will probably serve also a lot of static content, or even on dynamic content it will send some caching directives to browsers. However, each browser that hits your tomcat for the first time will cause tomcat to send the static file. Since processing a request is a bit more expensive in Tomcat than it is in Apache (because of Apache being super-optimized and exploiting very low level stuff not always available in Tomcat, because Tomcat extracting much more informations from the request than Apache needs etc...), it may be better for the static files to be server by Apache.
Since however configuring Apache to serve part of the content and Tomcat for the rest or the URL space is a daunting task, it is usually easier to have Tomcat serve everything with the right cache headers, and Apache in front of it capturing the content, serving it to the requiring browser, and caching it so that other browser hitting the same file will get served directly from Apache without even disturbing Tomcat.
Other than static files, also many dynamic stuff may not need to be updated every millisecond. For example, a json loaded by the homepage that tells the user how much stuff is in your database, is an expensive query performed thousands of times that can safely be performed each hour or so without making your users angry. So, tomcat may serve the json with proper one hour caching directive, Apache will cache the json fragment and serve it to any browser requiring it for one hour. There are obviously a ton of other ways to implement it (a caching filter, a JPA cache that caches the query etc...), but sending proper cache headers and using Apache as a reverse proxy is quite easy, REST compliant and scales well.
Another consideration is load balancing. Apache comes with a nice load balancing module, that can help you scale your application on a number of Tomcat instances, supposed that your application can scale horizontally or run on a cluster.
A third consideration is about ulrs, headers etc.. From time to time you may need to change some urls, or remove or override some headers. For example, before a major update you may want to disable caching on browsers for some hours to avoid browsers keep using stale data (same as lowering the DNS TTL before switching servers), or move the old application on another url space, or rewrite old URLs to new ones when possible. While reconfiguring the servlets inside your web.xml files is possible, and filters can do wonders, if you are using a framework that interprets the URLs you may need to do a lot of work on your sitemap files or similar stuff.
Having Apache or another web server in front of Tomcat may help a lot changing only Apache configuration files with modules like mod_rewrite.
So, I always recommend having Apache httpd in front of Tomcat. The small overhead on connection handling is usually recovered thanks to caching of resources, and the additional configuration works is regained the first time you need to move URLs or handle some headers.
It depends on your network and how you wish to have security set up.
If you have a two-firewall DMZ, with applications deployed inside the second firewall, it makes sense to have an Apache or IIS instance in between the two firewalls to handle security and proxy calls into the app server. If it's acceptable to put the Tomcat instance in the DMZ you're free to do so. The only downside that I see is that you'll have to open a port in the second firewall to access a database inside. That might put the database at risk.
Another consideration is traffic. You don't say anything about traffic, sizing servers, and possible load balancing and clustering. A load balancer in front of a cluster of app servers is more likely to be kept inside the second firewall. The Tomcat instance is capable of handling traffic on its own, but there are always volume limitations depending on the hardware it's deployed on and what the application is doing with each request. It's almost impossible to give a yes or no answer without more detailed, application-specific information.
Search the site for "tomcat without apache" - it's been asked before. I voted to close before finding duplicates.
Related
I have a medium sized website called algebra.com. As of today, it is ranked 900th website in US in Quantcast ratings.
At the peak of its usage, during weekday evenings, it serves over 120-150 queries for objects per second. Almost all objects, INCLUDING IMAGES, are dynamically generated.
It has 7.5 million page views per month.
It is server by Apache2 on Ubuntu and is supplemented by Perlbal reverse proxy, which helps reduce the number of apache slots/child processes in use.
I spent an inordinate amount of time working on performance for HTTP and the result is a fairly well functioning website.
Now that the times call for transition to HTTPS (fully justified here, as I have logons and registered users), I want to make sure that I do not end up with a disaster.
I am afraid, however, that I may end up with a performance nightmare, as HTTPS sessions last longer and I am not sure whether a reverse proxy can help as much as it did with HTTP.
Secondly, I want to make sure that I will have enough CPU capacity to handle HTTPS traffic.
Again, this is not a small website with a few hits per second, we are talking 100+ hits per second.
Additionally, I run multiple sites on one server.
For example, can I have a reverse proxy, that supports several virtual domains on one IP (SNI), and translates HTTPS traffic into HTTP, so that I do not have to encrypt twice (once by apache for the proxy, and once by the proxy for the client browser)?
What is the "best practices approach" to have multiple websites, some large, served by a mix of HTTP and HTTPS?
Maybe I can continue running perlbal on port 80, and run nginx on port 443? Can nginx be configured as a reverse proxy for multiple HTTPS sites?
You really need to load test this, and no one can give a definitive answer other than that.
I would offer the following pieces of advice though:
First up Stack overflow is really for programming questions. This question probably belongs on the sister site www.serverfault.com.
Https processing is, IMHO, not an issue for modern hardware unless you are encrypting large volumes of traffic (e.g. video streaming). Especially with proper caching and other performance tuning that I presume you've already done from what you say in your question. However not dealt with a site of your traffic so it could become an issue there.
There will be a small hit to clients as the negotiate the https session on initial connection. This is in the order of a few hundred milliseconds, will only happen on initial connection for each session, is unlikely to be noticed by most people, but it is there.
There are several things you can do to optimise https including choosing fast ciphers, implementing session resumption (two methods for this - and this can get complicated on load balanced sites). Ssllabs runs an excellent https tester to check your set up, Mozilla has some great documentation and advice, or you could check out my own blog post on this.
As to whether you terminate https at your end point (proxy/load balanced) that's very much up to you. Yes there will be a performance hit if you re-encrypt to https again to connect to your actual server. Most proxy servers also allow you to just pass through the https traffic to your main server so you only decrypt once but then you lose the original IP address from your webserver logs which can be useful. It also depends on if you access your web server directly at all? For example at my company we don't go through the load balanced for internal traffic so we do enable https on the web server as well and make the LoadBalancer re-encrypt to connect to that so we can view the site over https.
Other things to be aware of:
You could see an SEO hit during migration. Make sure you redirect all traffic, tell Google Search Console your preferred site (http or https), update your sitemap and all links (or make them relative).
You need to be aware of insecure content issues. All resources (e.g. css, javascript and images) need to be served over https or you will get browsers warnings and refuse to use those resources. HSTS can help with links on your own domain for those browsers that support HSTS, and CSP can also help (either to report on them or to automatically upgrade them - for browsers that support upgrade insecure requests).
Moving to https-only does take a bit of effort but it's once off and after that it makes your site so much easier to manage than trying to maintain two versions of same site. The web is moving to https more and more - and if you have (or are planning to have) logged in areas then you have no choice as you should 100% not use http for this. Google gives a slight ranking boost to https sites (though it's apparently quite small so shouldn't be your main reason to move), and have even talked about actively showing http sites as insecure. Better to be ahead of the curve IMHO and make the move now.
Hope that's useful.
I just installed both Apache server and Tomcat, and I read that I should put static html pages in Apache and put dynamic pages, like JSP, Servlets, and all other full Java applications in Tomcat. Specifically, where should they go respectively?
For instance, should html files be placed under /var/www/html?
And all other files under /opt/apache-tomcat-7.0.34/webapps/?
Any tutorial for this? Thanks a lot.
The typical ways to forward requests from Apache to Tomcat involve the use of mod_proxy, mod_proxy_ajp or mod_jk (there might be more). All of them are well documented and basically involve requests that hit your Apache to be forwarded to tomcat if they match certain criteria (like path names) - all nonmatching requests will be handled by Apache, however you configure this one.
However, I'm seconding on JB Nizet's comment: Dividing the serving of different content to Apache and Tomcat is an optimization. It's arguable if you should add this complexity (not that it's too complex, but it's more to do than not separating it) when you don't have the need. E.g. if your nonoptimized website can handle 1000 concurrent users, but you'll rarely have more than 10 - don't bother.
We use Apache with Nginx(as reverse proxy) for more concurrency level because of the way that Nginx handles static contents and use fewer connections something that Apache lacks.
The question now is that is there any difference between the above scenario and using another server for serving static content (css,js,images,etc) with nginX and your primary server with Apache installed?
In my project there are millions of user with avatar,banner and ofcourse photo gallery. Project is nearly ready, and I want to make sure I'm on the right direction. Which scenario is the best?
EDIT:
What would happen if slow clients cause Apache to keep threads busy for longer than needed in the primary server?
One of the main purposes of nginx behind Apache is to handle slow clients to ensure that Apache doesn't have to keep its threads busy for this.
btw, I think it's relevant to the topic http://www.aosabook.org/en/nginx.html
Which one of these two is most commonly used scenario? I want to use the same scenario in my learning process. thanks.
Don't know about the rest of the industry, but where I work we have Apache HTTPD front-ending for Tomcat.
Any static content is directly provided by HTTPD for performance. Pain in the neck to separate every app out, but there is a noticeable payoff.
Also, HTTPD has some nice code for cookie handling, URL rewriting, clustering and so on.
Only if we determine that there's dynamic, database-bound data to show do we forward to Tomcat, which does an admirable job there.
Has been working well for us for almost a decade. Others too, I would wager.
In order to lighten Apache's load people often suggest using lighttpd to serve up static content.
e.g. http://www.linux.com/feature/51673
In this setup Apache passes requests for static content back to lighttpd via mod_proxy, while serving dynamic requests itself.
My question is: how does this reduce the load on the server? Since you still have an apache process spawned for every request that comes in, how does this positively impact the load? From what I can see the size of the Apache process proxying its request through lighttpd is as large as it would be if it were serving the file itself.
Running Lighttpd behind Apache to serve static files certainly seems braindead to me. Apache still has to unpack the HTTP packets and parse the request through its parse tree, send proxy requests, and then Lighttpd has to re-unpack, hit the filesystem and send the files back through Apache. I've never heard of anyone using a setup like this in production.
What you will see, is people using a lightweight webserver like Nginx as a frontend server to serve static files and proxy dynamic URLs to Apache. Or, you can run Varnish or Squid as a caching reverse proxy frontend, so that all your high-traffic static files (i.e. images, CSS etc. and any dynamic pages you're willing to send cache-friendly headers for) are served out of memory.
Apache can also be optimized to serve static files -- so often when I hear people complain about Apache, they really don't know how to configure it. They've only ever used the prefork MPM (vs. threaded or worker) and have all sorts of modules enabled (usually they're running from a Linux distribution's kitchen-sink Apache package that builds everything as modules and defaults to enabling 10-20 modules or more). Tune Apache by turning off unneeded modules/stupid features like support for .htaccess (which makes Apache scan the filesystem on every request!) first. (You can also run two instances of Apache, with a "light" Apache as frontend that proxies to a "heavy" Apache for dynamic requests ... maybe your frontend is threaded but your backend is prefork because you have to run thread-unsafe external modules like mod_php.)
Re:
Since you still have an apache process
spawned for every request that comes
in, how does this positively impact
the load? From what I can see the size
of the Apache process proxying its
request through lighttpd is as large
as it would be if it were serving the
file itself.
If you're spawning processes on every request, then that means you're using the prefork MPM. Keep in mind that when the OS reports memory usage for each of these processes, not all that memory is wired, a lot of those processes are idle. And when you're talking about speed, you're concerned more with request parsing and internal code branches for a given request (how much processing is the server doing?) than with memory usage reported by the OS.
For example, if you enable something like mod_php, then each of those worker processes is going to instantly go up by about 20-40M (depending on what's enabled in your PHP interpreter), but that doesn't mean Apache is using that memory on static requests. Of course if you're optimizing your server for maximum concurrency on small static files, then enabling mod_php would still be very bad, you're not going to be able to fit nearly as many prefork processes into RAM.
I probably could come up with a "nightmare configuration" for Apache that would make it actually slower serving static files than proxying those requests to a backend Lighttpd, but it would involve enabling expensive features like .htaccess in Apache that are disabled in Lighttpd, so it wouldn't really be fair.
If you still have the power to serve static and dynamic content from the same machine (as they in your referenced article do), then I really see no point in that setup.
Maybe it does reduce the Load of Apache, because it doesn't have to do IO to the disk, but it will increase the Load of Lighttpd on the same machine and thus reducing the available load to apache ...
Maybe Lighttpd IO access is lighter, than that of Apache 1.3, but why not just switch to Apache 2 or Lighttpd completely? And if the performance really start to suck, host the static files on another machine (media.yourdomain.com).
I small introduction to how you can make a performant setup is found here:
Deploying Django -> scroll to Scaling some page before the end
I don't know much about internal workings of Apache, but one explanation I've seen is about memory pressure. In short, Apache tries to balance the memory it uses for caching and for dynamic pages; but usually ends up with too much cache and too little for apps. If you separate them to different processes, each one will optimize for the kind of load.
Currently, what I'm doing is using nginx as front end. It's really fast and light, and specifically designed as a frontend proxy; but also serves static files. In fact, since it can also call FastCGI processes, you could get rid of Apache and still get the benefits of split file/app processes. (and there's some extra memcached magic that looks absolutely genius)
(Yes, lighttpd can also be used as frontend to Apache and/or FastCGI)
You don't have an Apache process spawned for each request - static files (images and the like) are fetched directly by lighttpd.
Use Apache MPM Worker fastcgi this will lower you server memory usage. MPM worker serves static content better then Prefork and is nearly on par with lighttpd when it comes to static content.