Getting lots of 408 status codes in the Apache access log after migrating from HTTP to HTTPS - apache

We are seeing a lot of 408 status codes in the Apache access log, and they started appearing after migrating from HTTP to HTTPS.
Our web server is behind a load balancer; KeepAlive is on and KeepAliveTimeout is set to 15 seconds.
Can someone please help us resolve this?

Same problem here, after migrating from HTTP to HTTPS. Don't panic, it is not a bug but a client feature ;)
I suspect you find these log entries only in the log of the default (or alphabetically first) Apache SSL conf, and that you have a low timeout (< 20 seconds).
From my tests, these are clients opening pre-connected/speculative sockets to your web server to speed up loading the next page or resource.
Since they only perform the initial socket connection or TLS handshake (150 bytes to a few thousand), they connect to the IP without specifying a vhost name, so they get logged in the default/first Apache conf's log.
A few seconds after the initial connection, the client either drops the socket if it is not needed or uses it for a faster follow-up request.
If your timeout is lower than those few seconds you get the 408; if it is higher, Apache doesn't log anything.
So either ignore them / add a separate default conf for Apache, or raise the timeout, at the cost of more Apache processes sitting busy while waiting for the client to drop or use the socket.
See https://bugs.chromium.org/p/chromium/issues/detail?id=85229 for some related discussion.
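If you go the "separate default conf" route, a minimal sketch could look like the following (certificate paths, hostnames, and the RequestReadTimeout values are placeholders/examples, not the original poster's configuration; RequestReadTimeout requires mod_reqtimeout):

# Catch-all vhost that absorbs connections which never send a Host/SNI name
<VirtualHost _default_:443>
    ServerName default.example.com
    SSLEngine on
    SSLCertificateFile /path/to/default.crt
    SSLCertificateKeyFile /path/to/default.key
    # Log these speculative connections separately so they don't pollute the real vhost logs
    CustomLog /var/log/apache2/default_ssl_access.log combined
    # Wait up to 30s for the first request bytes before giving up with a 408
    RequestReadTimeout header=30 body=30
</VirtualHost>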

Related

How should I configure the health check for an HTTPS backend of an OCI load balancer?

I have an OCI load balancer set up to front-end my web servers. The HTTP backendSet is configured to use the HTTP protocol on port 80 for the health check, checking for a status code of 200.
However, I also need a backendSet to handle HTTPS traffic. The protocol choices for the health check are only HTTP and TCP, but my understanding is that a TCP health check only verifies that the server is up, not that the web server is working properly. If I use the HTTP protocol with port 443, I get inconsistent status codes back from the web server: when the web server tries to perform the SSL handshake I get a 400 Bad Request, and at other times I get a 403 Forbidden or a 200 OK.
If I look at the Apache access_log on the backend web server I see these various status codes, so I am not sure how to configure the health check so that I get anything other than a critical health status.
Any suggestions would be appreciated.
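For illustration only, one common pattern is to point the HTTP health check at a dedicated health URL on the backend, so the result does not depend on vhost selection or access rules (the path and directory below are hypothetical, not from the question):

# Hypothetical health-check endpoint served by Apache, independent of the application vhosts
Alias /healthz /var/www/healthz/index.html
<Directory /var/www/healthz>
    Require all granted
</Directory>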

Why is Apache unable to serve static content with high concurrency on Windows?

When testing Apache 2.4.16 on Windows 7, 8, and 2012, there is a severe limitation when serving static content: Apache can't serve more than 700 concurrent requests for static content with KeepAlive off.
When you attempt to do that, one of two things will happen:
You will be able to serve a few thousand requests at first, and then the remaining requests will take up to 10 seconds to complete.
OR
You will receive a connection refused error.
Test method:
siege -b -c700 -t10s -v http://10.0.0.31/10k.txt (10KB file)
OR
ab -c 700 -n 40000 http://10.0.0.31/10k.txt
However, when testing with ApacheBench on localhost (bypassing the network), Apache works fine and can serve 1000 concurrent requests for the 10 KB static file.
Apache has ThreadsPerChild 7000 (increasing it to 14000 didn't make any difference)
MaxConnectionsPerChild 0
Stack parameters:
MaxUserPort = 65534
TcpTimedWaitDelay = 30
The server has over 60,000 ephemeral ports available, ranging from port 5000 to port 65534.
The load-testing client is a separate Linux server that sends requests to the Windows Apache server over a 10 Gb/s network.
There are no errors in the Apache log and nothing in the system logs. tasklist doesn't show anything unusual.
netstat shows a few thousand (about 5,000) open TCP connections, and then Apache stops responding. However, when testing at a lower concurrency of 300, the OS can open 60,000 TCP connections and Apache works fine.
Potential Conclusions:
At first I thought this was an OS stack tuning problem, but serving a PHP file at the same concurrency works fine.
ab -c 700 -n 10000 http://10.0.0.31/phpinfo.php
Then I tried Nginx for Windows on the same machine, and Nginx served this without a problem.
ab -c 700 -n 10000 http://10.0.0.31/10k.txt
Nginx was able to serve a much higher concurrency, up to 2000 requests per second (static content), and the OS opened about 40,000 TCP connections.
So this looks to me like a bug or a limitation in the way Apache communicates with the TCP/IP stack on Windows.
When trying to reproduce this problem, make sure KeepAlive is off and test over the network (not on localhost).
Any answers or comments on this subject will be greatly appreciated.
Thanks to covener's suggestion, here is the answer.
KeepAlive was intentionally disabled to simulate a large number of users connecting from different IP addresses and spawning new TCP connections.
Setting AcceptFilter http to "none", together with turning off MultiViews, improved performance on static content and allowed Apache on Windows to serve a concurrency of 2000 and beyond, until all ephemeral ports were exhausted.
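For reference, a minimal sketch of the directives involved (the https line and the directory path are assumptions added for illustration, not part of the original answer):

# httpd.conf: disable the Windows AcceptEx()-based accept filter
AcceptFilter http none
AcceptFilter https none

# Disable MultiViews content negotiation for the document root
<Directory "C:/Apache24/htdocs">
    Options -MultiViews
</Directory>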

(103) Software caused connection abort: proxy: pass request body failed

The following errors are being logged in our Apache proxy logs while processing requests against the Tomcat server:
(103)Software caused connection abort: proxy: pass request body failed
proxy: pass request body failed
We have an Apache reverse proxy that serves client requests from our Tomcat server. Sometimes a request through the proxy returns a 502 with the above error. There are no entries in the Tomcat server logs correlated with these proxy errors. Also, the requests didn't time out, since some of them have a response time of 1 second and our default timeout is 120 seconds.
We've added ProxyBadHeader Ignore to our httpd configuration [Ref: 502 Proxy Error / Uploading from Apache (mod_proxy) to Tomcat 7] and still don't see any errors in our Tomcat logs.
Has anyone seen this issue before?
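For context, a minimal sketch of the kind of mod_proxy configuration in play (hostnames, ports, and paths are hypothetical; ProxyBadHeader and the 120-second timeout are the settings mentioned above):

# Reverse proxy to Tomcat
ProxyPass        /app http://tomcat-host:8080/app
ProxyPassReverse /app http://tomcat-host:8080/app
ProxyBadHeader Ignore
ProxyTimeout 120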
We recently had this issue after upgrading one of our machines from Tomcat 6 to 7. Someone forgot to change the default apache-tomcat/conf/tomcat-users.xml file from our standard one and so the wrong password was getting checked by the server. Interestingly this results in the 502 Error you saw above. This can be avoided with some decent logging to determine it is actually an auth problem.

Plone taking a long time to respond to byte-range request

We have two recently upgraded Plone 4.3.2 instances behind an HAProxy load balancer, which itself is behind Apache.
We limit each Plone instance to serving two concurrent requests via the HAProxy configuration.
We recently encountered an issue whereby a client sent 4 byte-range requests in quick succession for a PDF, each of which took between 6 and 8 minutes to get a response. This locked up all available request slots for 6 minutes, so HAProxy timed out other requests in the queue. The PDF is stored as an ATFile object in Plone, which I believe should have been migrated to blob storage in our recent upgrade.
My question is: what steps should we take to prevent a similar scenario in the future?
I'm also interested in:
how to debug why the byte-range requests on an otherwise lightly loaded server should take so long to respond
how plone.app.blob deals with byte-range requests
is it possible to configure Apache such that byte-range requests are served from its cache but not from the back-end server
As requested, here is the haproxy.cfg with the superfluous configuration stripped out.
global
    maxconn 450
    spread-checks 3

defaults
    log /dev/log local0
    mode http
    option http-server-close
    option abortonclose
    option redispatch
    option httplog
    timeout connect 7s
    timeout client 300s
    timeout queue 120s
    timeout server 300s

listen cms 127.0.0.1:18181
    id 3
    balance leastconn
    option httpchk
    http-check send-state
    timeout check 10s
    acl cms_edit url_dom xxx.xxx.xxx.xxx
    acl cms_not_ok nbsrv() lt 2
    block if cms_edit cms_not_ok
    server cms_instance1 app:18081 check downinter 10s maxconn 2 rise 1 slowstart 300s
    server cms_instance2 app:18082 check downinter 10s maxconn 2 rise 1 slowstart 300s
You can install https://pypi.python.org/pypi/Products.LongRequestLogger and check its log file to see where the request gets stuck.
I've opted to disable byte-range requests to the back-end Zope server by adding the following to the cms listen section in HAProxy:
reqidel ^Range:.*
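Note that reqidel is one of the legacy reqxxx directives; if I recall correctly they were removed in newer HAProxy releases (2.x), where the equivalent would be something like:

http-request del-header Range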

Idle socket connection to Apache server timeout period

I open a socket connection to an Apache server but don't send any request, waiting for a specific time to do so. How long can I expect Apache to keep this idle socket connection alive?
The situation is that the Apache server has limited resources, and connections need to be allocated in advance before they are all gone.
After a request is sent, the server advertises its timeout policy:
Keep-Alive: timeout=15, max=50
If a subsequent request is sent after more than 15 seconds, it gets a 'server closed connection' error, so the policy is enforced.
However, it seems that if no request is sent after the connection is opened, Apache will not close it for as long as 10 minutes.
Can someone shed some light on Apache's behavior in this situation?
According to Apache Core Features, TimeOut Directive, the default timeout is 300 seconds, but it's configurable.
For keep-alive connections (after the first request) the default timeout is 5 seconds (see Apache Core Features, KeepAliveTimeout Directive); in Apache 2.0 the default value was 15 seconds. It's also configurable.
Furthermore, there is the mod_reqtimeout Apache module, which provides some fine-tuning settings.
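For completeness, the directives in question look roughly like this (the values are illustrative, taken from the defaults and the mod_reqtimeout documentation example, not recommendations):

# Core timeouts
Timeout 300
KeepAlive On
KeepAliveTimeout 5

# mod_reqtimeout: limit how long Apache waits for request headers/body
RequestReadTimeout header=20-40,MinRate=500 body=20,MinRate=500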
I don't think any of the mentioned values are exposed to HTTP clients via headers or in any other form (except the keep-alive value, of course).