What is Apache process model? - apache

I have been googling this question for some time but got no answers. What's the Apache process model?
By process model, I mean how Apache manage process or thread to handling HTTP request.
Does it fork one process for each HTTP request?
Does it have process/thread pool?
Can we config it?
Is there any online doc for such Apache details?

This depends on your system and configuration : see Core Features and Multi-Processing Modules : you could use, for instance :
Apache MPM winnt on windows -- that one uses threads
Or Apache MPM prefork -- that one uses processes
Or even Apache MPM worker -- which uses both several processes and threads.
Quoting the page of the last one, Apache MPM worker :
This Multi-Processing Module (MPM)
implements a hybrid multi-process
multi-threaded server. By using
threads to serve requests, it is able
to serve a large number of requests
with fewer system resources than a
process-based server. However, it
retains much of the stability of a
process-based server by keeping
multiple processes available, each
with many threads.

Related

Handling Http Request with Apache2 (or Nginx). Does a new process gets created for each or a set of N requests?

Will a web server (WS) (like apache2 or nginix (or container like tomcat(TC)) create a new process to handle incoming request. My concern is about servers that support high number of parallel users (say 20K+ parallel users).
I think load balancing happens on the other side of web server (if it is used to front Tomcat etc). So in theory, a single web server should be accepting all the (20K+)incoming request before it can distribute the load to other servers backing it.
So, the questions is: Does Web Server (WS) handle all these requests in a single process or it smartly spawns other process to help share the work (i know the "client - server" binding happens though - client_host:random_port plus server_host:fixed_port).
Reference: Prior to reading this article:Fronting Tomcat with Apache I was thinking it is a single process doing all the smart work. But in this article there is mentioning of MPM (Multi-Processing Module)
It combines the best from two worlds, having a set of child processes each having a set of separate threads. There are sites that are running 10K+ concurrent connections using this technology.
And as it goes, it is getting more sophisticated as threads also being spawned like mentioned above. (these are not the tomcat threads that serve each individual request by calling the service method, but these are threads on Apache WS to handle request and distribute them to nodes for processing).
If any one used MPM. Little further explanation of how all this works will be great.
Questions like -
(1) As child processes are spawned what is it exact role. Is the child process just for mediating the request to tomcat or any thing more. If so, then after the child process gets response from TC, does the child process forward the response to parent process or directly to the client (since it can know the client_host:random_port from parent process. I am not sure if this is allowed in theory, though the child process can not accept any new request as the fixed_port which can bind to only one process is already tied to parent process.
(2) What kind of load is shared to thread by the child or parent process. Again it must almost be same as in (1). But what I am not sure is that even in theory if a thread can directly send the request to client.
Apache historically use prefork model of processing. In this model each request == separate operation system (OS) process. It's calling "prefork" because Apache fork some spare processes and process request within. If number of preforked processes not enough - Apache fork new. Pros: process can execute other modules or processes and not care that they do; cons: each request = one process, too much memory used and OS fork also can be slow for your requests.
Other model of Apache - worker MPM. Almost same as prefork, but using not OS processes but OS threads. Thread - it's like lightweight process. One OS process can run many threads using one memory space. Worker MPM used much less memory and new threads created fast. Cons: modules need to support thread, crash of module can crash all threads of all OS process (but this it not important for you because you are using apache as reverse proxy only). Other cons: CPU switching context when switching between threads.
So yes, worker much better than prefork in your case, but...
But we have Nginx :) Nginx using other model (btw, Apache has event MPM too). In this case you has only one process (well, can be few processes, see below). How it works. New request rising special event, OS process waking up, receive request, prepare answer, write answer and gone sleep.
You can say "wow, but this is not multitasking" and will be right. But one big difference between this model and simple sequentially request processing. What happens if you need write big data to slow client? In synchronous way your process need to wait acknowledging about data receiving and only after - process new request. Nginx and Apache event model use asynchronous model. Nginx tell to OS to send some piece of data write this data to OS buffer and... gone sleep, or process new requests. When OS will send piece of data - special event will be sent to nginx. So, main difference - Nginx do not wait I/O (like connect, read, write), Nginx tell to OS that he want and OS send event to Nginx than this task ready (socket connected, data written or new data ready to read in local buffer). Also, modern OS can work asynchronously with HDD (read/write) and even can send files from HDD to tcp socket directly.
Sure, all math operations in this Nginx process will block this process and its stop to process new and existing requests. But when main workflow is work with network (reverse proxy, forward requests to FastCGI or other backend server) plus send static files (asynchronous too) - Nginx can serve thousands simultaneous requests in one OS process! Also, because one process of OS (and one thread) - CPU will execute it in one context.
How I told before - Nginx can start few OS processes and each of this process will be assigned by OS to separate CPU core. Almost no reasons to fork more Nginx OS processes (there is only one reason to do it: if you need to do some blocking operations, but simple reverse proxy with backend balancing - not this case)
So, pros: less CPU context switching, less memory (comparing with worker MPM too), fast connection processing. More pros: Nginx created as HTTP load balancer and have lot of options for it (and even more in commercial Nginx Plus). Cons: If you need some hard math inside OS process, this process will be blocked (but all you math in Tomcat, so Nginx only balancer).
PS: typo fix will come later, out of time. Also, my English bad, so fixes always welcome :)
PPS: Answer question about number of TC thread, asked in comments (was too long for post as comment):
Best way to know it - test it using stress loading tools. Because this number depend on application profile. Response time is not good enough to help answer. Because, for example, big difference between 200ms of 100% math (100% cpu bound) vs 50ms of math + 150ms of sleep waiting database answer.
If application is 100% CPU bound - probably one thread per one core, but in real cases all applications also spent some time in I/O (receive request, send answer to client).
If application work with I/O and need to wait for answers from other services (database, for example), this application spends some time in sleep state and CPU can process other tasks.
So best solution to create number of requests close to real load and run stress test increasing number of concurrent requests (and number of TC workers for sure). Find acceptable response time and fix this number of threads. Sure, need to check before that it is not database fault.
Sure, here I'm talking about dynamic content only, requests for static files from disk must be processed before tomcat (by Nginx, for example).

Apache or nginx ? I like to understand the basic working flow of Nginx , its advantage and disadvantage

Pros & cons over Apache or nginx and how they work internally in order to maximize the resource utilization
Can I use Apache & Nginx together ? If I use only Nginx then what problem I can face ?
Apache has some disadvantages, especially when it is used with the PHP module.
Apache's process model is such that each connection uses a separate process. Each process carries all the overhead of PHP and any other modules you may have loaded with it. An Apache process might run a PHP script or serve static content for one request. If the PHP has a memory leak (which does happen sometimes), the process continues to grow in size. Also, when KeepAlive is enabled, which is usually recommended, that process stays alive for a few seconds after the connection, consuming a "slot" that another client might be able to use and helping the server to reach its MaxClients sooner.
Nginx is an alternative webserver that normally uses the Linux "epoll" API to process requests in a non-blocking mode. This means that one single process can handle many simultaneous connections. Epoll is an efficient way to tell the single process which connection(s) it needs to deal with and which can wait. Nginx has a goal of solving the "C10k" problem - how to have 10,000 concurrent connections.
This naturally goes hand in hand with php-fpm, the FastCGI Process Manager. Nginx itself does not have PHP built-in. When it receives a request for a PHP script, it makes a call out to php-fpm to run the script, which then returns the result to nginx, which returns it to the client.
This all uses a lot less memory than a similar Apache+mod_php configuration.
There are a couple more huge advantages of php-fpm over mod_php:
It uses different "pools", each of which can run as a separate Linux user. This provides a simple and effective way of isolating websites (for example, if they are run by different customers who should not read each other's code) without the overhead or nastiness of suexec or suphp.
It has a slow log feature where it can dump a PHP stack trace of any script that has been running for greater than X seconds. This can help diagnose slow code issues.
Php-fpm can be run with Apache, and in fact this allows you to take advantage of Apache's more efficient Worker MPM (or Event in Apache 2.4). However, my experience is that configuring it in Apache is significantly more complex than configuring it in nginx, and even with Worker, it still is not quite as efficient with nginx.
Disadvantages of moving to nginx - not many, but things to keep in mind:
It does not support .htaccess files. I think this is a good thing personally as .htaccess files must be parsed by Apache for every request, which can cause significant overhead.
Configuration files need to be re-written. If you have many complex site configurations, this could take some doing. For simple cases it is not usually a big deal.
Feature Of Nginx
Nginx is fast because it does not need to create a new process for
each new request.
HTTP proxy and Web server features
Ability to handle more than 10,000 simultaneous connections with a
low memory footprint (~2.5 MB per 10k inactive HTTP keep-alive
connections)
Handling of static files, index files, and auto-indexing
Reverse proxy with caching
Load balancing with in-band health checks
Fault tolerance
Nginx uses very little memory, especially for static Web pages..
FastCGI, SCGI, uWSGI support with caching
Name- and IP address-based virtual servers
IPv6-compatible
SPDY protocol support
FLV and MP4 streaming
Web page access authentication
gzip compression and decompression
URL rewriting having its own rewrite engine
Custom logging with on-the-fly gzip compression
Response rate and concurrent requests limiting
Bandwidth throttling
Server Side Includes
IP address-based geolocation
User tracking
WebDAV
XSLT data processing
Embedded Perl scripting
Nginx is highly scalable, and performance is not dependent on
hardware.
With only Nginx, you lose a whole bunch of apache-specific features such as all the mod_dav stuff. You lose a lot of modules, effectively
Conclusion
The best use for nginx is in front of Apache if you need Apache modules. Use it as a load-balancer if you might, between multiple Apache instances, and you suddenly have a mixed set-up that is rather

Does Apache really "fork" in mod_php/python way for request handling?

I am a dummy in web apps. I have a doubt regaring the functioning of apache web server. My question is mainly centered on "how apache handles each incoming request"
Q: When apache is running in the mod_python/mod_php mode, then does a "fork" happen for each incoming reuest?
If it forks in mod_php/mod_python way, then where is the advantage over CGI mode, except for the fact that the forked process in mod_php way already contains an interpretor instance.
If it doesn't fork each time, how does it actually handle each incoming request in the mod_php/mod_python way. Does it use threads?
PS: Where does FastCGI stands in the above comparison?
With a modern version of Apache, unless you configure it in prefork mode, it should run threaded (and not fork). mod_python is threadsafe, and doesn't require that each instance of it is forked into its own space.

IIS, APACHE, YAWS runtime environment

Recently I gone through a an article explaining potentiality of YAWS server and the number of requests it processes per second. It was mentioned that YAWS can handle 80K requests per second and it also run in multi threaded environment to improve request processing limit.
How can we compare IIS, Apache with YAWS? Which one will process maximum requests? Can I find any comparisons somewhere?
Check this link out:http://www.sics.se/~joe/apachevsyaws.html Link to Yaws vs apache
You see that Yaws handles 80000 concurrent requests (and continuing) while apache fails at around 4000 connections. This is because Yaws runs on the Erlang/OTP VM. Processes belong to this machine and not the operating system. Erlang has been highly customised for concurrent programming. Infact, other erlang web applications like:mochiweb,webmachine, e.t.c are much more powerful than apache when it comes to handling many concurrent requests. Yaws web server scales better than any web server i know of today. With the ability to create appmods, you can create Erlang Applications that communicate over http protocol, making use of the power of yaws.
Yaws home page is: http://yaws.hyber.org/. Actually, Yaws gets its power from OTP (Open Telecom Platform). This set of powerful libraries found at http://erlang.org/, has the most advanced design patterns such as fail over systems, supervision trees, finite state machines, event handlers, e.t.c, You should actually start using erlang for your next web application!!!!

apache low memory error

mod_python(?) is eating a lot of ram (about 9mb per worker process). If i open several TRAC pages at once many of them will have an error due to no ram (64mb virtual limit). if i limit the worker threads to 3 i can get by alright. Problem is if no one is accessing TRAC i have A LOT of ram being unused.
Is there a way i can either
Limit the amount of worker process that can use python?
Limit the amount of worker process in my trac path?
Have apache spawn as many worker process or threads it wants but have it only spawn when X amount or ram is free (or when X amount or below is in use by apache)
Something else ?
You could configure a second mod_python apache with minimal worker threads to run only on the local interface and with a different port, i.e. http://127.0.0.1:9000/. Then for your public apache instance on port 80, disable mod_python and tune for optimal ram utilization. Proxy all trac and other python app requests to the local mod_python instance.
If the public facing apache is left only to serve static content, then consider replacing it with something lightweight such as nginx or lighttpd.