So I've got a problem where a small percentage of incoming requests are resulting in "400 Bad Request" errors, and I could really use some input. At first I thought they were just caused by malicious spiders, scrapers, etc., but they seem to be legitimate requests.
I'm running Apache 2.2.15 and mod_perl2.
The first thing I did was turn on mod_logio, and interestingly enough, for every request where this happens the request headers are between 8000 and 9000 bytes, whereas for most requests they're under 1000. Hmm.
There are a lot of cookies being set, and it's happening across all browsers and operating systems, so I assumed it had to be related to bad or "corrupted" cookies somehow - but it's not.
I added \"%{Cookie}i\" to my LogFormat directive hoping that would provide some clues, but as it turns out half the time the 400 error is returned the client doesn't even have a cookie. Darn.
Next I fired up mod_log_forensic hoping to be able to see ALL the request headers, but as luck would have it nothing is logged when it happens. I guess Apache is returning the 400 error before the forensic module gets to do its logging?
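For completeness, enabling the forensic log only takes something like this (module and log paths are illustrative):
LoadModule log_forensic_module modules/mod_log_forensic.so
ForensicLog logs/forensic_log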
By the way, when this happens I see this in the error log:
request failed: error reading the headers
To me this says Apache doesn't like something about the raw incoming request, rather than a problem with our rewriting, etc. Or am I misunderstanding the error?
I'm at a loss where to go from here. Is there some other way that I can easily see all the request headers? I feel like that's the only thing that will possibly provide a clue as to what's going on.
We set a lot of cookies, and it turns out we just needed to bump up LimitRequestFieldSize, which defaults to 8190. Hope this helps someone else some day...
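Concretely, that was a one-line change in httpd.conf (the 16380 here is just an example; pick a value comfortably larger than your biggest headers):
# Default is 8190 bytes per request header field; big cookie sets can blow past it
LimitRequestFieldSize 16380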
I'm wondering if there is some way, in the ModSecurity Apache2 module (version 2.9.1), to log error messages to the log file specified by the SecDebugLog option without duplicating them in the standard Apache error log file.
According to the ModSecurity documentation, the error messages are always doubled in both log files: messages with levels 1–3 are designed to be meaningful, and are copied to Apache's error log. But I'd like to keep the ModSecurity output separate and not clutter the standard error log.
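For reference, the debug log I mean is the one configured with something like this (path and level are illustrative):
SecDebugLog /var/log/apache2/modsec_debug.log
SecDebugLogLevel 3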
You can remove the log action from any of the rules and just leave auditlog.
If using the OWASP CRS then change the default action from this:
SecDefaultAction "phase:1,deny,log"
SecDefaultAction "phase:2,deny,log"
to this:
SecDefaultAction "phase:1,deny,nolog,auditlog"
SecDefaultAction "phase:2,deny,nolog,auditlog"
This will turn off all logging, but then turn audit logging back on.
You may also want to add similar lines for phases 3 and 4, depending on whether you are also checking outbound traffic; see the sketch below.
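For example, assuming the same deny disposition (adjust to your own policy):
SecDefaultAction "phase:3,deny,nolog,auditlog"
SecDefaultAction "phase:4,deny,nolog,auditlog"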
However I would really, really, really caution against this for a number of reasons:
You will eventually block something with a ModSecurity rule, wonder why it's happening, skip over the audit log, and blame Apache. Trust me. "Why is this request returning 403 when I can see the page exists?!?!" At least if the message is in the error log, you have another chance to see why this is happening.
The entry in the error log is on one line. This makes it much easier to collect, parse, and deal with errors in tools like Splunk. The audit log is spread over several lines, so it is less machine-readable. And you should be reviewing your WAF logs regularly, not just assuming everything is working correctly and only looking at the logs when something goes wrong. Maybe not in detail at each log level, but at least in summary. Ivan Ristic, the original creator of ModSecurity, recently tweeted:
"If you’re not using your WAF as an IDS, you’re doing it wrong."
These are errors. And the error log is therefore the right place for them. The audit log is then a useful place to get extra detail if you cannot explain the errors.
I’m trying to get a mod_perl2 application ported to AWS. As part of the port I thought I’d move from Debian Squeeze to Wheezy with the latest stable mod_perl & Apache2 combination.
The application works right up to the point where I try to write JSON responses to the client. At that point, each request is canceled on the client, and on the server I get the error
Apache2::RequestIO::print: (103) Software caused connection abort
whenever I write to the client, i.e.:
$self->req->print($output);
I've tried tcpdumping the response to the client, and I can see it being written out, but no response is received on the client end; the client just barfs. I can't find any information on how to get around this.
I found quite a few people asking about this on the net without many answers. The solution to my problem was very specific, but I thought I'd post what I did anyway; it may help someone.
The client was canceling the request before the response was fully written, which was crapping out Apache2::RequestIO (for reasons I still don't know).
I couldn’t work out why I was seeing this behavior.
By using tcpdump I could see that data was being written out to the client – and it looked fine.
By inspecting the page in Chrome and looking at the network stack, I could see that my request for data was being canceled after no response was received (which was odd, because the code worked fine on other servers and I could see the response being written). Debugging was made harder because, with Apache crashing out with an error in print IO, I couldn't check whether the bytes written equaled the bytes of data. I wasn't sure if something was getting stuck on the server side.
So, I changed the Content-Type of the response from application/json to text/html, so that I could query the page and just look at the actual response as text. Once I did that, I could see that the response was fine.
I started to look for other causes, and I found that in the migration to the new server, I’d missed altering some URLs in the DB to point to the new server, which meant my application was trying to get some data from the old DB.
This in turn was causing a load of timing issues, which was causing my problems. Once I fixed the config, the problems went away.
A similar problem is described here: GWT IllegalArgumentException: encodedRequest cannot be empty
My GWT application is deployed in Tomcat 6, which is connected to Apache using the Coyote/JK2 connectors. For SSO I use mod_auth_sspi/1.0.4.
When I use IE8, pages are not displayed, but with Firefox everything is OK. In the Tomcat logs I see the following:
SEVERE: Exception while dispatching incoming RPC call
java.lang.IllegalArgumentException: encodedRequest cannot be empty
at com.google.gwt.user.server.rpc.RPC.decodeRequest(RPC.java:232)
at org.spring4gwt.server.SpringGwtRemoteServiceServlet.processCall(SpringGwtRemoteServiceServlet.java:32)
at com.google.gwt.user.server.rpc.RemoteServiceServlet.processPost(RemoteServiceServlet.java:248)
at com.google.gwt.user.server.rpc.AbstractRemoteServiceServlet.doPost(AbstractRemoteServiceServlet.java:62)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:643)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:723)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at gov.department.it.server.RequestInterceptorFilter.doFilter(RequestInterceptorFilter.java:90)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:190)
at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:311)
at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:776)
at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:705)
at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:898)
at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:690)
at java.lang.Thread.run(Thread.java:619)
What have I tried so far:
1) I can't find the registry key DisableNTLMPreAuth (IMHO it's not the solution anyway, because in my case IE8 is in active use).
2) I have installed and configured the Native Windows Authentication Framework WAFFLE
web.xml:
...
<filter>
<filter-name>NegotiateSecurityFilter</filter-name>
<filter-class>waffle.servlet.NegotiateSecurityFilter</filter-class>
<init-param>
<param-name>waffle.servlet.spi.NegotiateSecurityFilterProvider/protocols</param-name>
<param-value>NTLM</param-value>
</init-param>
</filter>
...
<filter-mapping>
<filter-name>NegotiateSecurityFilter</filter-name>
<url-pattern>/my-app/*</url-pattern>
</filter-mapping>
...
But it did not help.
3) In worker.properties I set socket_keepalive=0, but that did not help either:
worker.ajp13.type=ajp13
worker.ajp13.host=localhost
worker.ajp13.port=8009
worker.ajp13.lbfactor=50
worker.ajp13.cachesize=10
worker.ajp13.cache_timeout=600
worker.ajp13.socket_keepalive=0
worker.ajp13.socket_timeout=300
What else can I try?
You have rediscovered the 7-year-old bug #1 in mod_auth_sspi, which has affected numerous projects, frustrated numerous developers, and caused countless wasted man-hours over the years. Yet it still stands unresolved, because the maintainer doesn't consider it a bug. Nor has it been addressed by Microsoft for older browsers, because indications are that IE9 doesn't have this problem.
Cause
It is caused by IE trying to be 'smart' and sending a zero-content-length POST with an NTLM auth header, in anticipation of being challenged by the server (I named this request a 0POST, to try to make it an indexable term for those who rediscover it in the next 7 years). IE does this when it has been authenticated before in that protection space, so it knows that it will be challenged again. Sadly, mod_auth_sspi is not as smart as IE, so bad things happen on the server side when a 0POST arrives and is let through to the apps without being challenged. Sometimes this can happen even for unprotected areas, if they are under an area that requires authentication.
Other browsers don't pretend to be as smart as IE and don't try to save a few bytes on the first round trip for "performance", so they don't run into this problem. Here is Microsoft's explanation of this behavior.
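To make the pattern concrete, the offending request looks roughly like this on the wire (host, path, and the truncated token are illustrative; an NTLM Type 1 message base64-encodes to a string starting with TlRMTVNTUAAB):
POST /my-app/rpc HTTP/1.1
Host: intranet.example.com
Authorization: NTLM TlRMTVNTUAABAAAA...
Content-Length: 0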
Horrible Workaround
In Apache httpd.conf set
SSPIPerRequestAuth On
This is equivalent to the DisableNTLMPreAuth IE client-side fix you mentioned, which is impractical for a large user group. Plus, the client-side fix cripples the browser against all non-Apache apps as well, which may be perfectly capable of handling a 0POST. There are literally NO examples of this setting being discussed or its side effects explained on the web, so I am including the only link I found that sheds some light on it. Anyway, making one server-side change seems to be the lesser of the two evils. Although now, by changing the server config, you have crippled all the other innocent browsers visiting this site as well.
The problem with this workaround is that it forces EVERY request to perform an SSPI handshake, which results in a lot of extra 401 traffic and can affect performance. For performance, NTLM authentication is treated as 'session-based', not 'request-based', which means the handshake normally occurs only at the start of the session. When using this setting, you should also set up filters to prevent your log from filling up with 401s. Also note that this requires KeepAlive to be turned on.
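A minimal sketch of a protected location with this workaround applied (everything apart from SSPIPerRequestAuth is the usual mod_auth_sspi boilerplate; adjust paths and names to your own config):
# Keep-alive must be on for the multi-leg NTLM handshake to work
KeepAlive On
<Location /my-app>
    AuthType SSPI
    SSPIAuth On
    SSPIAuthoritative On
    # Force the full handshake on every request (the workaround)
    SSPIPerRequestAuth On
    Require valid-user
</Location>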
I am not sure your setup is the same as the one described in the WAFFLE fix; were they using Apache like you? I think WAFFLE applies to Tomcat, whereas you have Apache in front, so Apache is handling authentication. You might consider letting Tomcat and WAFFLE handle authentication instead of Apache. If you can use that setup, it may be a better option than this workaround, because WAFFLE explicitly accounts for the 0POST and can handle it. Its author had also discovered this gem while working with GWT, like you.
Interestingly, for jcifs, a fix for this very issue was posted 9 years ago. The author also provided an excellent explanation later:
The code in the filter examines all HTTP POST requests and determines
if they contain an NTLM type 1 message. If the request contains an
NTLM type 1 message, the filter responds with a dummy type 2 message
to entertain IE's desire to re-negotiate NTLM prior to submitting any
POST data. The browser should then respond with an NTLM type 3
message along with the post data which the filter should then allow to
chain to the rest of the web application.
A simple patch was also created for mod_auth_sspi 5 years ago, if you are interested; see the diff in the author's own repo. I am not sure I agree with that approach, though. It tries to detect IE/0POST, whereas I think the right fix should be to detect whether the client is requesting auth with an NTLM Type 1 header, as the jcifs filter does. (Type 1 simply means that it is the first message of the handshake.)
I wonder if anyone has used alternatives to mod_auth_sspi, like mod_auth_ntlm_winbind, and whether they exhibit this behavior. If you have, please leave a comment. We already know WAFFLE works, but it is not a mod_auth_sspi replacement.
One alternative is to forget NTLM and use Kerberos (mod_auth_kerb), but many people find that too complicated to set up. IE will behave this way with any challenge-response scheme, so odds are that Kerberos auth could run into the same problem, since a similar 401 sequence happens in both cases. But being a different module, it's possible it is capable of handling this.
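For completeness, a minimal mod_auth_kerb setup looks roughly like this (realm, keytab path, and location are placeholders):
<Location /my-app>
    AuthType Kerberos
    KrbMethodNegotiate On
    KrbMethodK5Passwd Off
    KrbAuthRealms EXAMPLE.COM
    Krb5KeyTab /etc/apache2/http.keytab
    Require valid-user
</Location>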
Lastly, I should mention that there is yet another issue that this per-request auth workaround doesn't seem to fix. I haven't seen it discussed anywhere, but I have found that sometimes, after the 0POST, the server waits a very long time before it responds with the final 200 response containing the results of the (proper) POST. This long delay happens only at the end, though, NOT immediately in response to the 0POST; that goes fine, and the handshake completes, but the server doesn't respond until after a long wait, which I have noticed is suspiciously close to 90 seconds, like some sort of timeout. The practical result is that when users log in, IE8 will sometimes hang for 90 seconds waiting for the server response. I thought KeepAlive might be causing it, but it is not even explicitly defined in my config, so I assume it is at the 15-second Apache default. But I am sure this is related to the 0POST, because it happens only right after a successful 0POST auth handshake. Our server is in a separate (two-way) trusted domain across a firewall, so maybe that has something to do with it.
Diverse Examples of This Issue
https://confluence.atlassian.com/display/JIRAKB/NullPointerException+when+Authenticating+from+IE
http://trac.edgewall.org/ticket/2696
http://trac.edgewall.org/ticket/4560
https://drupal.org/node/82530
http://www.webmasterworld.com/apache/3087425.htm
Why "Content-Length: 0" in POST requests?
https://jira.springsource.org/browse/SEC-1087
The most hilarious example is how IE's smartness affected Microsoft's own products! They themselves couldn't understand how to deal with IE's behavior, causing a bug in ISA Server 2006.
http://support.microsoft.com/kb/942638
I have a WCF service that among other bindings also uses WebHttpBinding for JSON inputs/results.
I made a custom IErrorHandler implementation in order to be able to set the StatusCode to 400 when something goes wrong, and also return an understandable JSON message. It's the straightforward implementation that you can find everywhere (a nice version is described here).
My problem is: when I test it locally using Visual Studio Web Development Server (Cassini) it works perfectly. However, when I deploy it to my test server (Windows 2008 with standard config for IIS and everything else) it does not work.
When I call it and debug with Firebug, I get an HTTP status code of 200 as the return and no response text. With Fiddler I get a 504 and nothing returned at all. However, the behavior I expected (and what happens locally) is a call to the error callback of the ajax call, with responseText set.
I debugged it remotely and everything looks just fine. The execution pipeline is OK and all the classes are called as they should be just like they are locally, except it does not work.
Any suggestions? I'm pretty much out of options here for figuring this out.
Thanks a lot!
If Firebug and Fiddler are giving different results, what happens if you telnet to the server directly and perform a request? Something like:
GET /VirtualDirectoryAndGetData HTTP/1.1
Host: example.com
[carriage return]
It wouldn't surprise me if you're somehow getting odd headers/formatting back (which would explain why Firebug and Fiddler disagree).
Another thing to test would be publishing to your dev machine to see if it's a machine-specific issue or a server vs dev webserver issue.
If it's happening anywhere outside VS, you might also try commenting out the lines where you set
rmp.StatusCode = System.Net.HttpStatusCode.BadRequest;
rmp.StatusDescription = "Bad request";
This may indicate whether it's a response code issue or an error handler issue.
If you can edit your question to include the results (with sensitive info removed), we'll see if we can track it down further.
Edit: after looking at the question again, it may well be that the server is erroring before it can send ANY response. Firebug might assume 200 by default, whereas Fiddler might report 504 (Gateway Timeout) when it gets no response. This is total speculation, but it's possible. Do you see anything in the event logs?
I had a similar issue which I was able to solve. Take a look at the IIS settings. Details on how I overcame the issue are in this post: IErrorHandler returning wrong message body when HTTP status code is 401 Unauthorized
If you have to take a site down for some type of unavoidable maintenance task (and it's not a big enough site that you have a backup server), what HTTP status code should you have your server return to minimize the possibility that search engines will think the site is gone?
I found this list of status codes from W3C, of which the following seem applicable:
503 Service Unavailable
500 Internal Server Error
408 Request Timeout
404 Not Found
I think 503 is the most appropriate, but I don't know what search engines might prefer.
From the horse's mouth:
If my site is down for maintenance, how can I tell Googlebot to come back later rather than to index the "down for maintenance" page?
You should configure your server to return a status of 503 (network unavailable) rather than 200 (successful). That lets Googlebot know to try the pages again later.
Don't send a 404 -- they may remove you from their index.
I'd probably send a 503 and an appropriate Retry-After, although I don't know if anything actually uses the header.
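For what it's worth, a minimal Apache maintenance switch along those lines might look like this (assumes mod_rewrite and mod_headers are loaded; the page name and the one-hour retry value are just examples):
# Answer everything with 503 except the maintenance page itself
ErrorDocument 503 /maintenance.html
RewriteEngine On
RewriteCond %{REQUEST_URI} !^/maintenance\.html$
RewriteRule ^ - [R=503,L]
# Tell well-behaved crawlers when to come back (in seconds)
Header always set Retry-After "3600"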
According to Google the 503 code would be the way to go, since it means "the server is temporarily unavailable."
Also check out the W3C page on the same.