Communicating between Apache and a different process - apache

I want to communicate between Apache and an external process.
I can modify the source of the process (written in C++) as much as I want, but Apache should (hopefully) remain the same. I was thinking about just using an Intranet socket between PHP and the program, but that just seems inefficient and hard to do if there are multiple page loads at once, and using a file is even worse.
Essentially, Apache (and PHP) would query the external program, and should read or modify a hashtable. How should I go about doing this?

Make your 'external process' expose an HTTP server, then reverse-proxy from apache to that HTTP server. Done.
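As a rough illustration of that idea, here is a minimal sketch of a long-running process that keeps a hashtable in memory and exposes it over HTTP. It is written in Python for brevity, although the question's process is C++. The names `TABLE`, `TableHandler`, `start_server` and `query`, as well as the GET/PUT URL scheme, are assumptions made up for this example. Apache's mod_proxy (for instance a `ProxyPass` rule pointing at the service's port) would then forward a URL prefix to it, and PHP would only ever issue ordinary HTTP requests.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Shared hashtable plus a lock, because requests are handled on several threads.
TABLE = {}
TABLE_LOCK = threading.Lock()


class TableHandler(BaseHTTPRequestHandler):
    """GET /<key> reads a value; PUT /<key> stores the request body."""

    def _reply(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        key = self.path.lstrip("/")
        with TABLE_LOCK:
            found = key in TABLE
            value = TABLE.get(key)
        if found:
            self._reply(200, {"key": key, "value": value})
        else:
            self._reply(404, {"error": "no such key"})

    def do_PUT(self):
        key = self.path.lstrip("/")
        length = int(self.headers.get("Content-Length", 0))
        value = self.rfile.read(length).decode("utf-8")
        with TABLE_LOCK:
            TABLE[key] = value
        self._reply(200, {"key": key, "value": value})

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def start_server(port=0):
    """Start the service on localhost in a background thread (port 0 = any free port)."""
    server = ThreadingHTTPServer(("127.0.0.1", port), TableHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def query(port, method, key, body=None):
    """Tiny client standing in for what PHP would do through the proxy."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request(method, "/" + key, body=body)
        response = conn.getresponse()
        return response.status, json.loads(response.read())
    finally:
        conn.close()
```

Because each request runs on its own thread and the table is guarded by a lock, several page loads at once are not a problem in this sketch.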

Related

Is there a way to combine websockets and normal http through apache?

So I have this server where I host more than one website for professional purposes. But I also like to develop game websites and I would like to create a roguelike game with HTML5.
The game engine itself would be developed in C++ on the server, and the client would ask the server what changed in the environment after every move.
So, normally, I would send an ajax request to the server where apache would reroute the request to my C++ application which is running as a FastCGI Service. My C++ application would check the session, look up if the movement is valid, change the internal values so that the character moves, also changes other things in the environment, and would then send the changes back to the client.
But ajax requests can be relatively slow, opening and closing connections all the time. So when I read about websockets, I thought I was in heaven until I saw that it will interfere with Apache and Apache is not really optimized to work with it.
Obviously, I could create a web socket on a different port, but with all those firewalls out there, I don't think that's a good option.
So, is there a way to combine the two? Where apache is able to understand that a websocket request should be ignored and passed on to my application instead?

Should I run Tomcat by itself or Apache + Tomcat?

I was wondering if it would be okay to run Tomcat as both the web server and container? On the other hand, it seems that the right way to go about scaling your webapp is to use Apache HTTP listening on port 80 and connecting that to Tomcat listening on another port?
Are both ways acceptable? What is being used nowadays? What's the main difference? How do most major websites go about this?
Thanks.
Placing an Apache (or any other webserver) in front of your application server(s) (Tomcat) is a good thing for a number of reasons.
First consideration is about static resources and caching.
Tomcat will probably also serve a lot of static content, and even for dynamic content it will send some caching directives to browsers. However, each browser that hits your Tomcat for the first time will cause Tomcat to send the static file. Since processing a request is a bit more expensive in Tomcat than in Apache (because Apache is highly optimized and exploits low-level facilities not always available to Tomcat, because Tomcat extracts much more information from the request than Apache needs, etc.), it may be better for the static files to be served by Apache.
However, since configuring Apache to serve part of the content and Tomcat the rest of the URL space is a daunting task, it is usually easier to have Tomcat serve everything with the right cache headers, with Apache in front of it capturing the content, serving it to the requesting browser, and caching it so that other browsers requesting the same file are served directly from Apache without even disturbing Tomcat.
Besides static files, much dynamic content does not need to be updated every millisecond either. For example, a JSON document loaded by the homepage that tells the user how many items are in your database may require an expensive query that would otherwise run thousands of times, yet it can safely be run once an hour or so without making your users angry. So Tomcat can serve the JSON with a proper one-hour caching directive, and Apache will cache the JSON fragment and serve it to any browser requesting it for one hour. There are obviously a ton of other ways to implement this (a caching filter, a JPA cache that caches the query, etc.), but sending proper cache headers and using Apache as a reverse proxy is quite easy, REST compliant, and scales well.
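To make the one-hour caching directive concrete, here is a small sketch of the response headers the application could attach. It is in Python rather than a servlet purely for illustration, and the helper name `cache_headers` is hypothetical. `Cache-Control` and `Expires` are the standard HTTP headers that browsers and caching proxies look at.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def cache_headers(max_age_seconds, now=None):
    """Build headers telling browsers and proxies how long a response may be reused."""
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(seconds=max_age_seconds)
    return {
        # "public" allows shared caches (such as a reverse proxy) to store the response.
        "Cache-Control": "public, max-age=%d" % max_age_seconds,
        # Expires is the older, absolute-time equivalent of max-age.
        "Expires": format_datetime(expires, usegmt=True),
    }


# One hour, as in the JSON example above.
one_hour = cache_headers(3600)
```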
Another consideration is load balancing. Apache comes with a nice load balancing module that can help you scale your application across a number of Tomcat instances, provided that your application can scale horizontally or run on a cluster.
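The basic idea behind spreading requests over several Tomcat instances can be sketched as a simple round-robin picker. This is only an illustration of the concept, not how Apache's balancer module is implemented, and the class name and backend addresses are made up for the example.

```python
import itertools


class RoundRobinBalancer:
    """Hand out backends in turn so each instance gets an equal share of requests."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(backends)

    def pick(self):
        """Return the backend that should handle the next request."""
        return next(self._cycle)


# Hypothetical Tomcat instances behind the front-end server.
balancer = RoundRobinBalancer(["tomcat-a:8080", "tomcat-b:8080"])
```

A real balancer also has to deal with failed backends and, for stateful applications, with keeping a session on the same instance.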
A third consideration is about URLs, headers, etc. From time to time you may need to change some URLs, or remove or override some headers. For example, before a major update you may want to disable caching on browsers for some hours so that browsers do not keep using stale data (same as lowering the DNS TTL before switching servers), or move the old application to another URL space, or rewrite old URLs to new ones when possible. While reconfiguring the servlets inside your web.xml files is possible, and filters can do wonders, if you are using a framework that interprets the URLs you may need to do a lot of work on your sitemap files or similar.
Having Apache or another web server in front of Tomcat can help a lot here, since you only need to change Apache configuration files, using modules like mod_rewrite.
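What such a rewrite layer does can be sketched as an ordered list of pattern/replacement pairs applied to the request path before it reaches the application. The rules and paths below are invented for illustration, and the sketch is in Python rather than mod_rewrite syntax.

```python
import re

# Ordered (pattern, replacement) pairs; the first matching rule wins.
# Both rules are hypothetical examples of "old URL -> new URL" mappings.
REWRITE_RULES = [
    (re.compile(r"^/old-app/(.*)$"), r"/legacy/\1"),
    (re.compile(r"^/products/(\d+)\.html$"), r"/catalog/item/\1"),
]


def rewrite(path):
    """Return the rewritten path, or the original path if no rule matches."""
    for pattern, replacement in REWRITE_RULES:
        if pattern.match(path):
            return pattern.sub(replacement, path)
    return path
```

The application behind the proxy only ever sees the new paths, so its own routing does not have to change.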
So, I always recommend having Apache httpd in front of Tomcat. The small overhead on connection handling is usually recovered thanks to caching of resources, and the additional configuration work pays for itself the first time you need to move URLs or handle some headers.
It depends on your network and how you wish to have security set up.
If you have a two-firewall DMZ, with applications deployed inside the second firewall, it makes sense to have an Apache or IIS instance in between the two firewalls to handle security and proxy calls into the app server. If it's acceptable to put the Tomcat instance in the DMZ you're free to do so. The only downside that I see is that you'll have to open a port in the second firewall to access a database inside. That might put the database at risk.
Another consideration is traffic. You don't say anything about traffic, sizing servers, and possible load balancing and clustering. A load balancer in front of a cluster of app servers is more likely to be kept inside the second firewall. The Tomcat instance is capable of handling traffic on its own, but there are always volume limitations depending on the hardware it's deployed on and what the application is doing with each request. It's almost impossible to give a yes or no answer without more detailed, application-specific information.
Search the site for "tomcat without apache" - it's been asked before. I voted to close before finding duplicates.

nginx/apache/php vs nginx/php

I currently have one server with nginx that reverse-proxies to Apache (same server) for processing PHP requests. I'm wondering whether I'd see any performance increase if I dropped Apache and ran nginx with FastCGI to PHP. I'm assuming I would, since Apache is pretty bloated, but at the same time I'm not sure how reliable FastCGI/PHP is, especially in high-traffic situations.
My site gets around 200,000 unique visitors a month, with around 6,000,000 page crawls from the search engines monthly. This number is steadily increasing, so I'm looking at performance options.
My site is very optimized code wise and there isn't any caching (don't want that either), each page has a max of 2 sql queries without any joins on other tables, indexes are perfect as well.
In a year or so I'll be rewriting everything to use ClearSilver for the templates, and then probably use python or else c++ for extreme performance.
I suppose I'm more or less looking for advice from anyone who is familiar with nginx/FastCGI and is willing to provide some benchmarks. My sites run on one server with 1 quad-core Xeon, 8 GB RAM, and a 150 GB VelociRaptor drive.
nginx will definitely work faster than Apache. I can't tell about fastcgi since I never used it with nginx but this solution seems to make more sense on several servers (one for static contents and one for fastcgi/PHP).
If you are really targeting performance, and are even considering C/C++, then you should give G-WAN a try: an all-in-one server which provides (very fast) C scripts.
Not only does G-WAN have a ridiculously small memory footprint (120 KB), but it also scales like nothing else. There's work ahead of you if you migrate from PHP, but you can start with the performance-critical tasks and migrate progressively.
We have made the jump and cannot imagine going back to Apache!
Here is a chart showing the respective performances of nginx, apache and g-wan:
g-wan.com/imgs/gwan-lighttpd-nginx-cherokee.png
Apache does not seem to lead the pack (and that's on a quad Xeon @ 3GHz).
Here is an independent benchmark of G-WAN vs nginx, Varnish and others: http://nbonvin.wordpress.com/2011/03/14/apache-vs-nginx-vs-varnish-vs-gwan/
G-WAN handles many more requests per second with much less CPU time.
NGINX is the best choice as a web server nowadays.
The main difference between Apache and NGINX lies in their design architecture. Apache uses a process-driven approach and dedicates a process or thread to each connection, whereas NGINX uses an event-driven architecture to handle multiple requests within one thread.
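The event-driven style can be illustrated with a toy example: one thread registers several connections with the operating system's readiness API and answers whichever one has data. This is only a sketch of the general technique using Python's `selectors` module, not nginx's implementation. The function name `serve_all` and the "reply in upper case" behaviour are assumptions made for the example.

```python
import selectors
import socket


def serve_all(connections):
    """Answer one message on each connection using a single thread."""
    selector = selectors.DefaultSelector()
    for conn in connections:
        conn.setblocking(False)
        selector.register(conn, selectors.EVENT_READ)

    pending = len(connections)
    while pending:
        # Block until at least one connection is readable, then handle only those.
        for key, _events in selector.select():
            conn = key.fileobj
            data = conn.recv(1024)
            conn.sendall(data.upper())  # stand-in for building a real response
            selector.unregister(conn)
            pending -= 1
    selector.close()


# Three in-process "clients", each connected to a server-side socket.
pairs = [socket.socketpair() for _ in range(3)]
for i, (client, _server) in enumerate(pairs):
    client.sendall(b"req%d" % i)

serve_all([server for _client, server in pairs])
replies = [client.recv(1024) for client, _server in pairs]

for client, server in pairs:
    client.close()
    server.close()
```

No thread or process is created per connection here; the cost of an idle connection is just an entry in the selector.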
As far as static content is concerned, Nginx outperforms Apache. Both are great at processing dynamic content.
Apache runs on all Unix-like operating systems, such as Linux or BSD, and has full support for Microsoft Windows. NGINX also runs on several modern Unix-like systems and has support for Windows, but its performance on Windows is not as stable as on UNIX platforms.
Apache allows additional configuration on a per-directory basis via .htaccess files, whereas Nginx has no equivalent of .htaccess and does not allow per-directory overrides.
Request interpretation: Apache maps requests to file system locations, whereas Nginx interprets requests by URI.
Apache has 60 official dynamically loadable modules that can be turned on or off, so the web server can be customized through dynamic modules. Nginx relies on third-party modules alongside its core, and these traditionally had to be compiled in rather than loaded dynamically (dynamic module loading was only added in newer versions, 1.9.11 and later). NGINX provides all of the core features of a web server without sacrificing the lightweight and high-performance qualities that have made it successful.
On security, Apache offers configuration tips for DDoS attack handling, as well as the mod_evasive module for responding to HTTP DoS, DDoS, or brute-force attacks.
When to choose Apache over NGINX?
When you need .htaccess files, so that you can override system-wide settings on a per-directory basis.
In a shared hosting environment, where Apache works better because of its .htaccess configuration.
When you run into functionality limitations in NGINX.
When to choose NGINX over Apache?
For fast static content processing.
For high-traffic websites.
When to use both together?
You can put Nginx in front of Apache as a reverse proxy.

What is CGI mode?

What does it mean when we say an application can run in CGI mode? I was reviewing the features of various CMS systems on cmsmatrix.org and "CGI mode support" was listed as a feature. What are the other "modes" in which a web application can run?
Basically, CGI implies that the webserver will execute an external process, get its result (a generated HTML page, an image, ...) and send it back to the client.
This has a major drawback: the external process is launched each time it is needed, which can be a big overhead.
There is also FastCGI, which launches the external process once and reuses it when needed.
But usually, languages are integrated directly in the webserver.
For example, Apache has a mod_perl module to execute perl scripts, instead of executing perl scripts via CGI
CGI stands for 'Common Gateway Interface', which is an old architecture for web applications. CGI works by placing the variables from the HTTP request in the environment and fork/exec()ing the CGI process, whose output is sent back to the client. It gained popularity in the early days of web development as it worked well on a Unix host. Perl/CGI was a popular architecture in this era and it contributed substantially to the popularity of Perl as a language.
The main claim to fame of CGI is that it doesn't require much plumbing, so it will work with most web servers. The main drawback is that the fork-exec process is slow-ish as the CGI script has to be started (which may involve starting a perl or other interpreter). On Windows, spawning a new process is much slower than unix, so CGI is even more inefficient.
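The mechanics can be sketched as follows: the "server" side puts request data into environment variables, starts a fresh process for the request, and reads the headers and body from the script's standard output. `REQUEST_METHOD` and `QUERY_STRING` are standard CGI variables. The helper `run_cgi`, the inline script and the `name` parameter are assumptions made for this illustration.

```python
import os
import subprocess
import sys

# A minimal CGI script: reads the query string from the environment and
# writes headers, a blank line, then the body to standard output.
CGI_SCRIPT = r'''
import os
from urllib.parse import parse_qs
params = parse_qs(os.environ.get("QUERY_STRING", ""))
name = params.get("name", ["world"])[0]
print("Content-Type: text/plain")
print()
print("Hello, " + name)
'''


def run_cgi(query_string):
    """Do what a web server does for CGI: spawn a new process per request."""
    env = dict(os.environ, REQUEST_METHOD="GET", QUERY_STRING=query_string)
    result = subprocess.run(
        [sys.executable, "-c", CGI_SCRIPT],  # a new interpreter starts every time
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    headers, _, body = result.stdout.partition("\n\n")
    return headers, body
```

Every call to `run_cgi` pays the cost of starting an interpreter, which is exactly the overhead that FastCGI and in-server modules avoid.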
CGI is a protocol that is used by web servers to call executable files on the server. Upon receiving a request, it sends the information about the request to the cgi script and returns the result of that script back to the browser.
An alternative to that is FastCGI. Here the web server does not launch a script to answer each request, but instead talks to a long-running process. The request information passed along is still the same CGI-style set of variables, though (hence the name).

apache + lighttpd front-proxy concept

In order to lighten Apache's load people often suggest using lighttpd to serve up static content.
e.g. http://www.linux.com/feature/51673
In this setup Apache passes requests for static content back to lighttpd via mod_proxy, while serving dynamic requests itself.
My question is: how does this reduce the load on the server? Since you still have an apache process spawned for every request that comes in, how does this positively impact the load? From what I can see the size of the Apache process proxying its request through lighttpd is as large as it would be if it were serving the file itself.
Running Lighttpd behind Apache to serve static files certainly seems braindead to me. Apache still has to unpack the HTTP packets and parse the request through its parse tree, send proxy requests, and then Lighttpd has to re-unpack, hit the filesystem and send the files back through Apache. I've never heard of anyone using a setup like this in production.
What you will see, is people using a lightweight webserver like Nginx as a frontend server to serve static files and proxy dynamic URLs to Apache. Or, you can run Varnish or Squid as a caching reverse proxy frontend, so that all your high-traffic static files (i.e. images, CSS etc. and any dynamic pages you're willing to send cache-friendly headers for) are served out of memory.
Apache can also be optimized to serve static files -- so often when I hear people complain about Apache, they really don't know how to configure it. They've only ever used the prefork MPM (vs. threaded or worker) and have all sorts of modules enabled (usually they're running from a Linux distribution's kitchen-sink Apache package that builds everything as modules and defaults to enabling 10-20 modules or more). Tune Apache by turning off unneeded modules/stupid features like support for .htaccess (which makes Apache scan the filesystem on every request!) first. (You can also run two instances of Apache, with a "light" Apache as frontend that proxies to a "heavy" Apache for dynamic requests ... maybe your frontend is threaded but your backend is prefork because you have to run thread-unsafe external modules like mod_php.)
Re: "Since you still have an apache process spawned for every request that comes in, how does this positively impact the load? From what I can see the size of the Apache process proxying its request through lighttpd is as large as it would be if it were serving the file itself."
If you're spawning processes on every request, then that means you're using the prefork MPM. Keep in mind that when the OS reports memory usage for each of these processes, not all that memory is wired, a lot of those processes are idle. And when you're talking about speed, you're concerned more with request parsing and internal code branches for a given request (how much processing is the server doing?) than with memory usage reported by the OS.
For example, if you enable something like mod_php, then each of those worker processes is going to instantly go up by about 20-40M (depending on what's enabled in your PHP interpreter), but that doesn't mean Apache is using that memory on static requests. Of course if you're optimizing your server for maximum concurrency on small static files, then enabling mod_php would still be very bad, you're not going to be able to fit nearly as many prefork processes into RAM.
I probably could come up with a "nightmare configuration" for Apache that would make it actually slower serving static files than proxying those requests to a backend Lighttpd, but it would involve enabling expensive features like .htaccess in Apache that are disabled in Lighttpd, so it wouldn't really be fair.
If you still have the power to serve static and dynamic content from the same machine (as they do in your referenced article), then I really see no point in that setup.
Maybe it does reduce the load on Apache, because it doesn't have to do disk I/O, but it will increase the load from Lighttpd on the same machine and thus reduce the capacity available to Apache ...
Maybe Lighttpd's I/O access is lighter than that of Apache 1.3, but why not just switch to Apache 2 or to Lighttpd completely? And if the performance really starts to suck, host the static files on another machine (media.yourdomain.com).
A small introduction to how you can build a performant setup can be found here:
Deploying Django -> scroll to "Scaling", a few pages before the end
I don't know much about internal workings of Apache, but one explanation I've seen is about memory pressure. In short, Apache tries to balance the memory it uses for caching and for dynamic pages; but usually ends up with too much cache and too little for apps. If you separate them to different processes, each one will optimize for the kind of load.
Currently, what I'm doing is using nginx as front end. It's really fast and light, and specifically designed as a frontend proxy; but also serves static files. In fact, since it can also call FastCGI processes, you could get rid of Apache and still get the benefits of split file/app processes. (and there's some extra memcached magic that looks absolutely genius)
(Yes, lighttpd can also be used as frontend to Apache and/or FastCGI)
You don't have an Apache process spawned for each request - static files (images and the like) are fetched directly by lighttpd.
Use the Apache worker MPM with FastCGI; this will lower your server's memory usage. The worker MPM serves static content better than prefork and is nearly on par with lighttpd when it comes to static content.