Conflicting keep-alive and idle timeouts - Apache

I have an AWS Application Load Balancer with Apache HTTP Server 2.4 behind it. I've been getting sporadic 502 errors, and I think I've traced them to the Event MPM: when I switched to the Worker MPM, the 502s went away. My trouble now is finding the optimal settings for the various timeouts along the ALB -> Apache -> Tomcat chain.
I have long-running APIs that are meant to be called system-to-system. They may take up to 15 minutes, and the client needs to wait for the response, so I've set the load balancer's idle timeout to 900 seconds. The same app serves sub-second requests as well.
I've read through this AWS article, which essentially says each timeout needs to be higher the further downstream it sits (ALB idle timeout < Apache keep-alive timeout < Tomcat proxy timeout).
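For concreteness, that ordering would look something like this on the Apache side; the numbers are illustrative assumptions built around my 900-second ALB idle timeout, not settled values:

KeepAlive On
# Keep-alive must outlive the ALB's 900s idle timeout
KeepAliveTimeout 920
# mod_proxy timeout toward Tomcat, higher still
ProxyTimeout 940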
However, the httpd documentation for KeepAliveTimeout says:
Setting KeepAliveTimeout to a high value may cause performance problems in heavily loaded servers. The higher the timeout, the more server processes will be kept occupied waiting on connections with idle clients.
That seems to contradict the AWS article. What values should I set for Apache's Timeout and KeepAliveTimeout directives?

Related

Apache http server drops connection every 60 seconds

I have an Apache HTTP web server that load balances two Tomcat instances using mod_jk. Every 60 seconds, it seems like the HTTP server loses its connection to the Tomcat instance but immediately reconnects. Is this something that can be turned off?
When I connect directly to one of the Tomcat instances, I never drop the connection; it only happens through the HTTP server.
Increase the value of the Timeout directive in your httpd.conf file. The default value is 60 seconds.
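For example, in httpd.conf (180 is just an illustrative value; pick something longer than the lull you're seeing):

# How long httpd waits on network I/O before giving up; the default is 60
Timeout 180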

httpd.conf timeout parameter change affecting ELB

Can changing the timeout parameter in httpd.conf affect the performance of an ELB in AWS?
I want to increase the timeout from 60 seconds to 120 seconds in httpd.conf. I don't want anything abnormal happening to the ELB once I change it.
It will not affect the ELB in any way, but on its own you will not get the desired effect. The ELB has a timeout of 60 seconds by default; if the instance behind it hasn't responded in that time, the client gets an HTTP 504 Gateway Timeout from the ELB. So you also need to raise that ELB timeout to the same value to benefit from increasing the Timeout on your web server. More details here: ELB Idle Timeout.

The KeepAliveTimeout from httpd.conf is something different. When you connect to your website through an ELB, it opens two connections: one with the client and one with the instance behind it where your web server runs. If you want those backend connections to be reused, set KeepAlive to On and make KeepAliveTimeout + Timeout bigger than the ELB's idle timeout, so that the ELB is the one managing the reusable connections rather than the backend instance. KeepAliveTimeout is how long to wait after a request has been served before closing the connection, so IMHO a value of 10-15 seconds will do.

Also note that turning KeepAlive on will decrease CPU consumption (fewer connections to create) but increase memory usage (you will have some live connections just waiting for clients). More details on keep-alive settings with ELB here.
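A sketch of that combination, assuming the default 60-second ELB idle timeout (values illustrative):

KeepAlive On
# Short post-request wait, as suggested above
KeepAliveTimeout 15
# KeepAliveTimeout + Timeout (15 + 120) stays above the ELB's 60s idle timeout
Timeout 120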

Effects of increasing timeout limit in Apache

How will increasing the Apache timeout affect my system? My web server sits behind an ELB, so how should their timeouts be set for everything to work properly?
I want to increase the Apache timeout from 60 seconds to 100 seconds.
Changing the timeout in Apache won't change anything unless you also change the ELB timeout, because control stays with the ELB, and the Apache timeout should be greater than or equal to the ELB's timeout.
If the Apache timeout is increased to 100, the ELB timeout should be either equal to it or a little less (95-100).
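For example, using the values from the answer above (the ELB idle timeout itself is changed in the AWS console or API, not in httpd.conf):

# Apache side: raised from the 60-second default
Timeout 100
# ELB side (not an Apache directive): set the idle timeout to roughly 95 seconds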

Does Apache's active and idle connections contribute to MaxClients?

Is there a way / command I can run to see:
Active Apache workers and their total count
The waiting workers held idle because of the KeepAlive timeout value (TIME_WAIT???)
The total number of queued visitors due to Apache having reached the MaxClients limit?
Any indication of whether queuing is happening, or if my MaxClients setting is in order.
Is the Apache MaxClients setting the total of the active and idle Apache processes, or just active?
We currently have a server that seems to be hitting MaxClients, but not with active connections alone; the waiting connections seem to count as well. Is this possible?
Thanks a million
Possibly mod_status is what you are looking for?
http://www.apache.org/server-status
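A minimal sketch of enabling it on Apache 2.4 (this assumes mod_status is loaded; ExtendedStatus adds per-request detail):

ExtendedStatus On
<Location "/server-status">
    SetHandler server-status
    # Restrict the page to local requests; loosen as needed
    Require local
</Location>

The scoreboard on /server-status distinguishes workers busy serving requests from those sitting idle in keep-alive, which is exactly the active-vs-waiting split the question asks about.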

Bad gateways with large POST uploads and my apache + varnish + plone setup

This is a rather complicated scenario, so I would highly appreciate any pointer in the right direction.
So I have set up Apache on server A to proxy HTTPS traffic to server B, which is a Plone site behind Varnish and Apache.
I connect to A and can browse the site over HTTPS; everything is fine. However, problems start when I upload files via Plone's POST forms. I can upload small files (~1 MB), but when I try to upload a 50 MB file, I wait the whole time while the file uploads, and when the indicator reads 100%, I get a Bad Gateway (The proxy server received an invalid response from an upstream server.)
It seems to me that something times out in the communication between A and B, and instead of being redirected to the correct URL I get a Bad Gateway, not to mention that the file is never uploaded.
In the Apache log I see
[error] proxy: pass request body failed
As suggested in other threads, I've experimented with the following values, with no luck:
force-proxy-request-1.0
proxy-nokeepalive
KeepAlive
KeepAliveTimeout
proxy-initial-not-pooled
Timeout
ProxyTimeout
Sooooo... any suggestions? Thanks a million in advance!
Did you check the Varnish configuration? Varnish has some timeouts of its own; I am familiar with send_timeout, which usually breaks downloads if they fail to finish within a few seconds (Varnish really isn't any good for large downloads, because you end up doing stupid things like configuring send_timeout=7200 to make it work).
Also, set first_byte_timeout to a larger number for that backend, because a large file upload might delay Plone's response just enough to cause this.
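A sketch of where those knobs live, assuming Varnish 4+ syntax and an illustrative backend definition (send_timeout is a varnishd runtime parameter, e.g. varnishd -p send_timeout=7200, not a VCL setting):

vcl 4.0;

backend plone {
    .host = "127.0.0.1";
    .port = "8080";
    # Give Plone longer to start responding after a large upload
    .first_byte_timeout = 600s;
}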
Setting the Timeout and KeepAliveTimeout in the Apache virtual host file worked for me.
Example:
Timeout 3600
KeepAliveTimeout 50