Apache webserver - What happens to requests, when all worker threads are busy - apache

As far as I researched, the scenario when all worker threads are busy serving requests, what happens to the requests that comes next.
Do they wait?
Is this related to some configurable parameters?
Can I get the count of such requests ?
Adding to this please can you explain or give a link where I can get a clear picture of request processing strategy of Apache webserver?
Thanks for Looking at!!

When all Apache worker threads are busy, the new request is stalled (it waits) until one of those worker threads is available. If the client gives up waiting, or you surpass the maximum wait time in your configuration file; it will drop the connection.

This answer is given in 2015. So I talk about apache httpd 2.4.
They wait because the connection is queued on the TCP socket (the connection is not ACKed) Although the default length of the backlog may be set way too high on linux boxes. This may result in connections being closed due to kernel limits being in place.
ListenBacklog (with caveats. See 1.)
This is described here. With lots of interesting stuff.
Read through Apache TCP Backlog by Ryan Frantz to get the glory details about the Apache backlog.

Related

Apache threads stay in state reading after queries

My configuration is apache and tomcat behind an aws elb. Apache is configured with no keepalive, and set with max clients to a low number due to each query being very cpu intensive. I'll load test the machine with queries. Then the number of available requests goes to zero as can be seen by curl -s localhost/server-status?auto not responding immediately. When I stop the loadtest I can see the scoreboard from curl -s localhost/server-status?auto still is full of R's even though from the tomcat logs it is clear nothing is happening. Does anyone have an idea on what possible causes there might be?
If your apache display 'R' in the status it means there is open TCP connections from ELB to apache (just an open TCP connection, no data sent yet).
There is no official complete documentation on this subject (how the numbers of pre-opened connections is optimized ), but the amazon documentation state (at this page: https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/how-elastic-load-balancing-works.html ) that:
Classic Load Balancers use pre-open connections but Application Load Balancers do not.
So, the answer is: it is an optimization from amazon (TCP connection cost a little bit to be openned).

Apache mod_wsgi slowloris DoS protection

Assuming the following setup:
Apache server 2.4
mpm_prefork with default settings (256 workers?)
Default Timeout (300s)
High KeepAliveTimeout (100s)
reqtimeout_mod enabled with the following config: RequestReadTimeout header=62,MinRate=500 body=62,MinRate=500
Outdated mod_wsgi 3.5 using Daemon mode with 15 threads and 1 process
AWS ElasticBeanstalk's load balancer acting as a reverse proxy to apache with 60s idle connection timeout
Python/Django being the wsgi application
A simple slowloris attack like the one described here, using a "slow" request body: https://www.blackmoreops.com/2015/06/07/attack-website-using-slowhttptest-in-kali-linux/
The above attack, with just 15 requests (same as mod_wsgi threads) can easily lock the server until a timeout happens, either due to:
Load balancer timeout (60s) happens due to no data sent, this kills the apache connection and mod_wsgi can once again serve requests
Apache RequestReadTimeout happens due to data being sent, but not enough, again mod_wsgi is able to serve requests after this
However, with just 15 concurrent "slow" requests, I was able to lock the server up to 60 seconds.
Repeating the same but with a more bizarre number, like 4096 requests, pretty much locks the server permanently since there will be always a new request that needs to be served by mod_wsgi once the previous times out.
I would expect that the load balancer should handle/detect this before even sending requests to apache, which it already does for similar attacks (partial headers, or tcp syn flood attacks never hit apache which is nice)
What options are available to help against this? I know there's no failproof option since these kind of attacks are difficult to detect and protect, but it's quite silly that the server can be locked that easily.
Also, if the wsgi application never reads request body, I would expect for the issue to not happen as well since the request should return immediately, but I'm not sure about this or the internals of mod_wsgi, for example, this is true when using a local dev wsgi server (the attack files since the request body is never read) but the attack succeeds when using mod_wsgi, which leads me to think it tries to read the body even before sending it to the wsgi code.
Slowloris is a very simple Denial-of-Service attack. This is easy to detect and block.
Detecting and preventing DoS and DDos attacks are complex topics with many solutions. In your case you are making the situation worse by using outdated software and picking a low worker thread count so that the problem arises quickly.
A combination of services are available that would be used to manage Dos and DDos attacks.
The front-end of the total system would be protected by a firewall. Typically this firewall would include a Web Application Firewall to understand the nuances of HTTP protocols. In the AWS world, Amazon WAF and Shield are commonly used.
Another service that helps is a CDN. Amazon CloudFront uses Amazon Shield so it has good DDoS support.
The next step is to combine load balancers with auto scaling mechanisms. When the health checks start to fail (caused by Slowloris), the auto scaler will begin launching new instances and terminating failed instances. However, a sustained Slowloris attack will just hit the new servers. This is why the Web Application Firewall needs to detect the attack and start blocking it.
For your studies, take a look at mod_reqtimeout. This is an effective and tuneable solution for Apache for most Slowloris attacks.
[Update]
In the Amazon DDoS White Paper June 2015, Slowloris is specifically mentioned.
On AWS, you can use Amazon CloudFront and AWS WAF to defend your
application against these attacks. Amazon CloudFront allows you to
cache static content and serve it from AWS Edge Locations that can
help reduce the load on your origin. Additionally, Amazon CloudFront
can automatically close connections from slow-reading or slow-writing
attackers (e.g., Slowloris).
Amazon DDoS White Paper June 2015
In mod_wsgi daemon mode there are a bunch of options to further help to combat such attacks by recovering from it and discarding queued requests as well which have been waiting too long. Try your tests using mod_wsgi-express as it defines defaults for a lot of these options whereas when using mod_wsgi yourself directly, there are no defaults. Use mod_wsgi-express start-server --help to see what defaults are. The actual options you want to look at for mod_wsgi daemon mode are request-timeout, connect-timeout, socket-timeout and queue-timeout. There are also other options related to buffer sizes and listener backlog you can play with. Do note that ultimately the listen backlog of the main Apache worker processes can still be an issue because it usually defaults to 500, which means a lot of requests can queue up stuck before you can even tag them with a time so as to help discard the backlog by tracking queue time.
You can find the documentation at:
http://modwsgi.readthedocs.io/en/develop/configuration-directives/WSGIDaemonProcess.html
On the point of whether mod_wsgi reads the request body before sending it, no it doesn't. Apache itself because it reads in block may partially read the request body when reading the headers, but it shouldn't block on it. Once the full request headers are passed off to mod_wsgi and sent through to the daemon process, then mod_wsgi will start transferring the request body.
Soloution:
If you are getting hit, I recommend you go to a provider that protects against DDoS attacks. However your best bet would be to programatically block the IP once it has been decided that it is being malicious. If you receive two large Content-Length POST requests than you should block the IP for a few minutes for suspicious activities. Many large companies are very cheap, and some of them are free for the basic package such as Cloud Flare. I use them for my company and I am beyond happy to have them!
Edit: Their job is literally just to protect you. That is it.

How does Apache detects a stopped Tomcat JVM?

We are running multiple Tomcat JVMs under a single Apache cluster. If we shut down all the JVMs except one, sometime we get 503s. If we increase the
retry interval to 180(from retry=10), problem goes away. That bring me
to this question, how does Apache detects a stopped Tomcat JVM? If I
have a cluster which contains multiple JVMs and some of them are down,
how Apache finds that one out? Somewhere I read, Apache uses a real
request to determine health of a back end JVM. In that case, will that
request failed(with 5xx) if JVM is stopped? Why higher retry value is
making the difference? Do you think introducing ping might help?
If someone can explain a bit or point me to some doc, that would be awesome.
We are using Apache 2.4.10, mod_proxy, byrequests LB algorithm, sticky session,
keepalive is on and ttl=300 for all balancer members.
Thanks!
Well let's examine a little what your configuration is actually doing in action and then move to what might help.
[docs]
retry - Here either you 've set it 10 or 180 what you specify is how much time apache will consider your backend server down and thus won't send him requests. So the higher the value, you gain the time for your backend to get up completely but you put more load to the others since you are -1 server for more time.
stickysession - Here if you lose a backend server for whatever reason all the sessions are on it get an error.
All right now that we described the relevant variables for your situation let's clear that apache mod_proxy does not have a health check mechanism embedded, it updates the status of your backend based on responses on real requests.
So your current configuration works as following:
Request arrives on apache
Apache send it to an alive backend
If request gets an error http code for response or doesn't get a response at all, apache puts that backend in ERROR state.
After retry time is passed apache sends to that backend server requests again.
So reading the above you understand that the first request that will reach a backend server which is down will get an error page.
One of the things you can do is indeed ping, according to the docs will check the backend before send any request. Consider of course the overhead that produces.
Although I would suggest you to configure mod_proxy_ajp which is offering extra functionality (and configuration ofc) to your tomcat backend failover detection.

Handling Http Request with Apache2 (or Nginx). Does a new process gets created for each or a set of N requests?

Will a web server (WS) (like apache2 or nginix (or container like tomcat(TC)) create a new process to handle incoming request. My concern is about servers that support high number of parallel users (say 20K+ parallel users).
I think load balancing happens on the other side of web server (if it is used to front Tomcat etc). So in theory, a single web server should be accepting all the (20K+)incoming request before it can distribute the load to other servers backing it.
So, the questions is: Does Web Server (WS) handle all these requests in a single process or it smartly spawns other process to help share the work (i know the "client - server" binding happens though - client_host:random_port plus server_host:fixed_port).
Reference: Prior to reading this article:Fronting Tomcat with Apache I was thinking it is a single process doing all the smart work. But in this article there is mentioning of MPM (Multi-Processing Module)
It combines the best from two worlds, having a set of child processes each having a set of separate threads. There are sites that are running 10K+ concurrent connections using this technology.
And as it goes, it is getting more sophisticated as threads also being spawned like mentioned above. (these are not the tomcat threads that serve each individual request by calling the service method, but these are threads on Apache WS to handle request and distribute them to nodes for processing).
If any one used MPM. Little further explanation of how all this works will be great.
Questions like -
(1) As child processes are spawned what is it exact role. Is the child process just for mediating the request to tomcat or any thing more. If so, then after the child process gets response from TC, does the child process forward the response to parent process or directly to the client (since it can know the client_host:random_port from parent process. I am not sure if this is allowed in theory, though the child process can not accept any new request as the fixed_port which can bind to only one process is already tied to parent process.
(2) What kind of load is shared to thread by the child or parent process. Again it must almost be same as in (1). But what I am not sure is that even in theory if a thread can directly send the request to client.
Apache historically use prefork model of processing. In this model each request == separate operation system (OS) process. It's calling "prefork" because Apache fork some spare processes and process request within. If number of preforked processes not enough - Apache fork new. Pros: process can execute other modules or processes and not care that they do; cons: each request = one process, too much memory used and OS fork also can be slow for your requests.
Other model of Apache - worker MPM. Almost same as prefork, but using not OS processes but OS threads. Thread - it's like lightweight process. One OS process can run many threads using one memory space. Worker MPM used much less memory and new threads created fast. Cons: modules need to support thread, crash of module can crash all threads of all OS process (but this it not important for you because you are using apache as reverse proxy only). Other cons: CPU switching context when switching between threads.
So yes, worker much better than prefork in your case, but...
But we have Nginx :) Nginx using other model (btw, Apache has event MPM too). In this case you has only one process (well, can be few processes, see below). How it works. New request rising special event, OS process waking up, receive request, prepare answer, write answer and gone sleep.
You can say "wow, but this is not multitasking" and will be right. But one big difference between this model and simple sequentially request processing. What happens if you need write big data to slow client? In synchronous way your process need to wait acknowledging about data receiving and only after - process new request. Nginx and Apache event model use asynchronous model. Nginx tell to OS to send some piece of data write this data to OS buffer and... gone sleep, or process new requests. When OS will send piece of data - special event will be sent to nginx. So, main difference - Nginx do not wait I/O (like connect, read, write), Nginx tell to OS that he want and OS send event to Nginx than this task ready (socket connected, data written or new data ready to read in local buffer). Also, modern OS can work asynchronously with HDD (read/write) and even can send files from HDD to tcp socket directly.
Sure, all math operations in this Nginx process will block this process and its stop to process new and existing requests. But when main workflow is work with network (reverse proxy, forward requests to FastCGI or other backend server) plus send static files (asynchronous too) - Nginx can serve thousands simultaneous requests in one OS process! Also, because one process of OS (and one thread) - CPU will execute it in one context.
How I told before - Nginx can start few OS processes and each of this process will be assigned by OS to separate CPU core. Almost no reasons to fork more Nginx OS processes (there is only one reason to do it: if you need to do some blocking operations, but simple reverse proxy with backend balancing - not this case)
So, pros: less CPU context switching, less memory (comparing with worker MPM too), fast connection processing. More pros: Nginx created as HTTP load balancer and have lot of options for it (and even more in commercial Nginx Plus). Cons: If you need some hard math inside OS process, this process will be blocked (but all you math in Tomcat, so Nginx only balancer).
PS: typo fix will come later, out of time. Also, my English bad, so fixes always welcome :)
PPS: Answer question about number of TC thread, asked in comments (was too long for post as comment):
Best way to know it - test it using stress loading tools. Because this number depend on application profile. Response time is not good enough to help answer. Because, for example, big difference between 200ms of 100% math (100% cpu bound) vs 50ms of math + 150ms of sleep waiting database answer.
If application is 100% CPU bound - probably one thread per one core, but in real cases all applications also spent some time in I/O (receive request, send answer to client).
If application work with I/O and need to wait for answers from other services (database, for example), this application spends some time in sleep state and CPU can process other tasks.
So best solution to create number of requests close to real load and run stress test increasing number of concurrent requests (and number of TC workers for sure). Find acceptable response time and fix this number of threads. Sure, need to check before that it is not database fault.
Sure, here I'm talking about dynamic content only, requests for static files from disk must be processed before tomcat (by Nginx, for example).

What do the different terms in Apache configuration means?

I keep coming across certain terms used in the Apache settings. While trying to understand the various discussions and Apache's docs, I need some help figuring out what some of these terms mean:
What is a Client?
What is the difference between a client and a child process? Are they the same?
If MaxClient = 255, does it mean that Apache will process up to 255 page loads in parallel and the rest are queued?
When is a KeepAlive request used?
What is the relationship between a child process and the request of this child process?
First, note that these answers apply either to Apache 1.x, or Apache 2.x only when using the prefork mode.
The machine that opens an HTTP connection and sends a request.
No, they are not the same. An Apache child can handle one request/client at a time, but when that one is finished, the same child can handle a new one.
Yes.
It is used to keep the HTTP connection open in case the client wants to issue another request. A client can remain connected, for example, to download images and such that are associated with a web page. Having KeepAlive On improves performance for the client (user), but having it off reduces memory usage by the server. It is a trade-off.
The Apache process launches a bunch of children. When a request comes in, the parent (root) process picks an idle child to handle that request. When that request is finished, the child is now idle and can handle a new request.
First, I hope you understand that apache 1.3 is very very old, and therefore the documentation will generally be somewhat harder to understand than the newer documentation (i.e. maybe you should upgrade if you have the choice).
I'm not sure where "Client" is referred to by itself in the apache docs by I would assume it refers to anything connecting to an open port and communicating.
Again, not sure where "child" is referred to by itself, so I can't help you there.
MaxClient is the number of processes apache will start to handle requests. It sounds like for Apache 1.3 that what you said is accurate, apache will only handle MaxClient requests in parallel (queuing the rest up to some other maximum for the queue).
KeepAlive is not really a request. It is sent in the request header to tell the server that the browser supports KeepAlive. It has to do with a feature of HTTP that allow one connection to be used for more than one access. If you allow KeepAlive your server will probably get less TCP connections.
I'm not even sure what you're asking here so you'll need to be more specific.