Bots trigger 500 errors in the Apache access log

In my Apache error log I see the following error an enormous number of times every day:
[Tue Jan 15 13:37:39 2013] [error] [client 66.249.78.53] Request exceeded the limit of 10 internal redirects due to probable configuration error. Use 'LimitInternalRecursion' to increase the limit if necessary. Use 'LogLevel debug' to get a backtrace.
When I look up the corresponding IP, date and time in the access log, I see this:
66.249.78.53 - - [15/Jan/2013:13:37:39 +0000] "GET /robots.txt HTTP/1.1" 500 821 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
I've tested my robots.txt file in Google Webmaster Tools -> Health -> Blocked URLs and it's fine.
Also, when bots request certain images, the server throws the following error:
Error log:
[Tue Jan 15 12:14:16 2013] [error] [client 66.249.78.15] Request exceeded the limit of 10 internal redirects due to probable configuration error. Use 'LimitInternalRecursion' to increase the limit if necessary. Use 'LogLevel debug' to get a backtrace.
Access log:
66.249.78.15 - - [15/Jan/2013:12:14:16 +0000] "GET /userfiles_generic_imagebank/1335441506.jpg?1 HTTP/1.1" 500 821 "-" "Googlebot-Image/1.0"
The image URL above (and several other images in our access log) no longer exists on our site (those images existed before the website revamp we did in August 2012), and we return 404 errors when we request those invalid resources.
However, once in a while, bots (and even human visitors) generate this type of error in our access/error logs, only for static resources such as non-existent images and our robots.txt file. The server returns a 500 error for them, yet when I try the same URLs from my browser, the images return 404 and robots.txt returns 200 (success).
We are not sure why this is happening, or how a valid robots.txt and an invalid image can produce a 500 error. We do have a .htaccess file, and we are sure that our (Zend Framework) application is not being reached, because it has a separate log. Therefore, the server itself (or .htaccess) is throwing the 500 error "once in a while", and I can't imagine why. Could it be caused by too many requests to the server? If not, how can I debug this further?
Note that we only noticed these errors after our design revamp, but the web server itself stayed the same.
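The manual cross-check described above (looking up the same IP, date and time in the access log) can be automated. Below is a minimal Python sketch, assuming Apache 2.2-style error log lines like the ones quoted, an English (C) locale for day and month names, and that both logs are written in the same time zone (the access log shows +0000). The function name correlate and the regexes are illustrative, not part of any tool.

```python
import re
from datetime import datetime

# Apache 2.2-style error log line, e.g.
# [Tue Jan 15 13:37:39 2013] [error] [client 66.249.78.53] Request exceeded ...
ERROR_RE = re.compile(r"^\[(?P<time>[^\]]+)\] \[error\] \[client (?P<ip>[^\]]+)\] (?P<msg>.*)$")
# Common/combined access log line; referer and user agent are ignored.
ACCESS_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3})(?:\s|$)'
)

def parse_error_time(text):
    # "Tue Jan 15 13:37:39 2013"; assumes an English (C) locale for %a/%b.
    return datetime.strptime(text, "%a %b %d %H:%M:%S %Y")

def parse_access_time(text):
    # "15/Jan/2013:13:37:39 +0000"; the offset is dropped on the assumption
    # that both logs use the same local time zone.
    return datetime.strptime(text.split(" ")[0], "%d/%b/%Y:%H:%M:%S")

def correlate(error_lines, access_lines, needle="internal redirects"):
    """Pair each error containing `needle` with access lines of the same IP and second."""
    index = {}
    for line in access_lines:
        m = ACCESS_RE.match(line)
        if m:
            key = (m["ip"], parse_access_time(m["time"]))
            index.setdefault(key, []).append(line)
    results = []
    for line in error_lines:
        m = ERROR_RE.match(line)
        if m and needle in m["msg"]:
            key = (m["ip"], parse_error_time(m["time"]))
            results.append((line, index.get(key, [])))
    return results
```

Errors whose list of access lines comes back empty are worth a closer look, since they point to a clock or time zone mismatch between the two logs.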

It might be useful to log the domain that the client is accessing. Your server might be accessible via multiple domains, including the raw IP address. When you're testing, you're doing so via the primary domain and everything works as expected. What if you try to access the same files via your IP (http://1.2.3.4/robots.txt) vs. the domain (http://example.com/robots.txt)? Also compare example.com with www.example.com, or any other variation that points to the server.
Bots can sometimes hold on to IP/domain info long after an address has changed and may be attempting to access something that the rules were changed for months ago.
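To act on the first suggestion, here is a Python sketch. It assumes, hypothetically, that the access log's LogFormat has been changed to put the request's Host header (%{Host}i) in front of an otherwise common/combined line, and it counts status codes per host so that a 500 occurring only for, say, the bare IP stands out. The regex and the name status_by_host are illustrative.

```python
import re
from collections import Counter

# Hypothetical log layout: the Host request header (%{Host}i) is prepended
# to a common/combined line, e.g.
# www.example.com 66.249.78.53 - - [15/Jan/2013:13:37:39 +0000] "GET /robots.txt HTTP/1.1" 500 821
LINE_RE = re.compile(
    r'^(?P<host>\S+) (?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" (?P<status>\d{3})(?:\s|$)'
)

def status_by_host(lines, path=None):
    """Count (host, status) pairs, optionally only for one request path."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines written in another format
        parts = m["request"].split()
        if path is not None and (len(parts) < 2 or parts[1] != path):
            continue
        counts[(m["host"], int(m["status"]))] += 1
    return counts
```

Running status_by_host(open_log, path="/robots.txt") would then show whether the 500s cluster on one hostname variant.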

Related

Apache access log is reporting thousands of GET requests

A few weeks ago my server was overwhelmed by a DDoS attack.
After some research I fixed the proxy misconfiguration and enabled mod_evasive for Apache, and the slowness was gone.
I'm still receiving some requests like this one:
51.159.210.176 - - [01/Dec/2022:20:51:26 +0000] "GET http://chongzhi.jusunpay.com:8122/pay HTTP/1.1" 302 217
Are they harmless to my server? Also, is there a way to disable logging for these specific requests? The access_log file is getting close to 50GB.
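A request whose target is a full URL (http://host:port/path) rather than a path is what a client normally sends to a forward proxy, so lines like the one above are typically scanners checking whether the server will relay traffic. Before deciding whether to stop logging them, it can help to measure how much of the log they make up. A rough Python sketch follows; treating absolute-URL and CONNECT requests as "probes" is a heuristic assumption, and the function names are hypothetical.

```python
import re

# Matches the quoted request line: "METHOD target PROTOCOL"
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<target>\S+)(?: [^"]*)?"')

def is_proxy_probe(line):
    """True if the logged request uses a full URL or CONNECT as its target."""
    m = REQUEST_RE.search(line)
    if not m:
        return False
    target = m["target"].lower()
    return m["method"] == "CONNECT" or target.startswith(("http://", "https://"))

def split_log(lines):
    """Return (kept, probes) so the probes can be counted or dropped."""
    kept, probes = [], []
    for line in lines:
        (probes if is_proxy_probe(line) else kept).append(line)
    return kept, probes
```

Comparing len(probes) with the total line count gives a quick estimate of how much of the 50GB these requests account for.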

404 error doesn't appear in Apache error.log

If a visitor gets a 404 error, nothing is written to Apache's error.log. In the access log it appears like this:
GET /qqq HTTP/1.1" 404 409 "-"
But nothing appears in error.log. I have tried everything with LogLevel. As I understand it, this is because the 404 page is a custom page, defined like
ErrorDocument 404 /new404.html
But I searched all files under /etc/apache2 for the text "404" and found nothing (apart from commented-out lines). What could the problem be? Can I somehow disable the custom 404 page in an .htaccess file? Or is there any other way to get 404 errors into error.log?
As the person who filed the Apache bug which demoted 404 from Error to Info level as of Apache 2.4.1, here's the justification:
In production HTTP servers open to the Internet, 404s happen all the time. Malware, scanner scripts, and all sorts of other things probe Web servers for vulnerabilities or just because they can, and these things would all trigger errors which will end up being logged somewhere if the appropriate error level is set.
Most production Web server admins are content with seeing 404s in their access logs (which are logged right alongside 200s and 30x redirects), and want to see real server problems -- things they have control over fixing -- in the error log. The logging of 404s in error.log can, in some servers, be so much log spam that it drowns out legitimate problems needing the administrator's attention.
404 is a content issue, not a server issue. So my recommendation is to look in your access.log (or equivalent) for them. If you really want content related issues logged in error.log, you need to set LogLevel core:info. This will give you 404s there, and a few other kinds of content-related error messages too.
404 "errors" don't normally appear in the Apache error log, regardless of whether you have a custom ErrorDocument defined or not.
A 404 error is not strictly a server error. It's an expected HTTP response, so it naturally appears in the access log (as you have stated), not in the error log. The "404" is the HTTP response code, not a server error code.
However, you should be able to enable additional "information" messages in your error logging (e.g. LogLevel info on Apache 2.4) to get this "information" in your system error log:
[Mon Feb 06 08:00:00.090525 2017] [core:info] [pid 13876:tid 1748] [client 203.0.113.111:54493] AH00128: File does not exist: /home/user/public_html/path/to/file
Note, however, that there is no mention of "404", which may be why your searches came up blank. This LogLevel should not be left enabled on a production server.
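Because these messages say "File does not exist" rather than "404", a search has to look for that text (or the AH00128 message ID). Here is a small Python sketch that tallies missing paths from such lines, assuming the Apache 2.4 format shown above. Stripping a trailing ", referer: ..." is an assumption about how the referer gets appended, and missing_files is a hypothetical name.

```python
import re
from collections import Counter

# Apache 2.4 error log line produced with LogLevel core:info, e.g.
# [...] [core:info] [pid 13876:tid 1748] [client 203.0.113.111:54493] AH00128: File does not exist: /path
MISSING_RE = re.compile(r"AH00128: File does not exist: (?P<path>.+)$")

def missing_files(error_lines):
    """Count how often each missing file path was logged."""
    counts = Counter()
    for line in error_lines:
        m = MISSING_RE.search(line.rstrip("\n"))
        if m:
            # Drop an appended referer, if present (assumed format).
            path = m["path"].split(", referer: ")[0]
            counts[path] += 1
    return counts
```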
Maybe this helps somebody...
I had zombie Apache instances running (with a slightly different config loaded), and every other request for a static resource defined using an Alias returned 404.
Killing the zombies fixed it.
404 is a server response, not an error.
You can extract the 404s from the access log with something like this (assuming the common/combined log format, where the status code is the ninth field):
awk '$9 == 404 {print $4, $5, "-", $7}' /var/log/apache2/access_log > /root/404.log
and adjust the awk fields as needed!
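The same extraction in Python, as a sketch assuming the common/combined log format. Like the field comparison above, and unlike a plain grep " 404 ", it checks the status field itself, so a 200 response that happens to be 404 bytes long is not counted. The function name not_found is illustrative.

```python
import re

# Common/combined access log line; only the fields needed here are captured.
ACCESS_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3})(?:\s|$)'
)

def not_found(lines):
    """Yield 'time - path' for every request whose status field is 404."""
    for line in lines:
        m = ACCESS_RE.match(line)
        if not m or m["status"] != "404":
            continue
        parts = m["request"].split()
        path = parts[1] if len(parts) > 1 else "-"
        yield f'{m["time"]} - {path}'
```

For example, iterate over open("/var/log/apache2/access_log") with not_found and write each yielded line to /root/404.log.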

Why does Apache return 403?

Why can't I see why Apache returns 403?!
If I look in the access log, the only information I get is:
193.162.142.166 - - [29/Jan/2014:18:34:26 +0100] "POST /api_test/callback.php HTTP/1.1" 403 2293
How can I get more information about why the request is forbidden/rejected?
The call is made from a payment gateway...
If the callback URL uses HTTP, there are no problems and the server returns 200 OK.
If the callback URL uses HTTPS, my server returns 403. I need to know why.
The server has SSL and OpenSSL installed, and it works!
I have tried making the HTTPS request from http://web-sniffer.net/, and then there are no problems.
I don't get it. There must be something in the payment gateway's request headers that results in the 403.
Update
Error log:
[Wed Jan 29 20:45:55 2014] [error] No hostname was provided via SNI for a name based virtual host
Solution:
OK, it looks like the client doesn't support SNI:
http://wiki.apache.org/httpd/NameBasedSSLVHostsWithSNI
Use the LogLevel directive to adjust how verbose the error logs are and increase until you can see what you want.
httpd 2.4 has better messages in many respects, and a more extensive list of LogLevel settings, than 2.2. So if you're using 2.2, it may be a bit harder to figure this out.

Magento .htaccess file RewriteBase setting?

I've resolved a problem I got when I set up a staging environment for my existing live Magento store. But I don't understand why it worked & why I didn't have the problem on my live site.
This was the error I was getting: whenever I tried to navigate away from my site's staging homepage, I got a 500 Internal Server Error.
In the error logs I got this:
[Tue Dec 17 01:12:52 2013] [error] [client 127.0.0.1] Request exceeded the limit of 10 internal redirects due to probable configuration error. Use 'LimitInternalRecursion' to increase the limit if necessary. Use 'LogLevel debug' to get a backtrace.
[Tue Dec 17 12:56:17 2013] [error] [client 127.0.0.1] Request exceeded the limit of 10 internal redirects due to probable configuration error. Use 'LimitInternalRecursion' to increase the limit if necessary. Use 'LogLevel debug' to get a backtrace., referer: http://localhost.mysite.com/
With a little research online, I solved it by changing the RewriteBase setting in the .htaccess file to RewriteBase /. On my live site this setting is commented out as #RewriteBase /magento/.
Why is this setting only needed in my staging environment?
Should it be on the live environment too, or should it be avoided entirely?
I'm running the site locally on an Apache2 server on an Ubuntu machine; maybe it has something to do with my local server setup?
Why is this setting only needed in my staging environment?
Probably because your staging environment is in a directory called /magento/ inside your document root. When you have checks like:
RewriteCond %{REQUEST_FILENAME} !-f
and the base is wrong, the check to see if a file exists fails. The base is prepended to relative URL-paths. So if your files are in /magento/, then without the proper base your checks will fail and your rules will loop indefinitely (or until the internal recursion limit is reached). On your production environment, the files are probably in your document root, so the base isn't strictly necessary, since the rules are in the same directory as the files you are rewriting to.
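The looping described above can be illustrated with a deliberately simplified Python model (not mod_rewrite's actual algorithm): treat the rule set as "if the current URL-path is not an existing file, rewrite it to base + index.php and try again", and give up after 10 internal redirects, matching the limit in the error message in the question. The file layout under /magento/ and the names resolve and InternalRedirectLimit are hypothetical.

```python
class InternalRedirectLimit(Exception):
    """Raised when the simulated rewrite loop exceeds the redirect limit."""

def resolve(url_path, rewrite_base, existing_files, limit=10):
    """Toy model of a front-controller rule set such as

        RewriteCond %{REQUEST_FILENAME} !-f
        RewriteRule .* index.php [L]

    Whenever the current URL-path is not an existing file, it is rewritten
    to rewrite_base + "index.php" and processed again (an internal redirect).
    This is a simplification, not mod_rewrite's real behaviour.
    """
    current = url_path
    redirects = 0
    while current not in existing_files:
        if redirects == limit:
            raise InternalRedirectLimit(
                f"Request exceeded the limit of {limit} internal redirects"
            )
        current = rewrite_base.rstrip("/") + "/index.php"
        redirects += 1
    return current
```

With the files under /magento/, a base of /magento/ resolves to /magento/index.php after one redirect, while a base of / keeps producing /index.php, which never exists, until the limit is hit.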
As for the other two questions, I can't answer them without seeing all of your rules and your whole setup.

Apache: multiple ../ in query string = internal server error (error 500)

Here's the problem: when requesting a URL like http://server/path/to/file.html?param=../../something/something, I get this response:
500 Internal Server Error
The server encountered an internal error or misconfiguration and was unable to complete your request.
...
Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.
The log says:
xxx.xxx.xxx.xxx - - [05/Mar/2010:13:43:29 -0500] "GET /path/to/file.html?param=../../something/something HTTP/1.1" 404 - "-" ...
If I remove one instance of '../' from the query string (requesting http://server/path/to/file.html?param=../something/something), I get the requested page. The error only occurs with two or more '../'s.
This is on a hosting server; the same request gives no error on my local servers (LAMP, WAMP). I suppose it's about the Apache configuration, but I don't know which options to check.
The server runs Apache 2.2.14 (Unix). PHP is installed (but PHP clearly isn't involved when I'm requesting a plain old HTML file), and no mod_rewrite rules apply (there are no .htaccess files in the requested file's path).
Any ideas on how to pass multiple '../'s in a query string successfully?
It turned out to be a security precaution enabled by default by the hosting provider that disallows 'backpaths', but I'm not sure which mechanism it is or where it's set.
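If the value really has to travel in the query string, one possible workaround (an assumption, not something the hosting provider documents) is to encode it so the literal '../' never appears in the URL, for example with URL-safe Base64, and decode it on the receiving side. Whatever decodes it should still validate the path, since '../' sequences in user input are exactly what the provider's filter guards against. A Python sketch; the helper names are hypothetical.

```python
import base64
from urllib.parse import parse_qs, urlencode, urlsplit

def encode_param(value):
    """Encode a value so that '.' and '/' never appear in the query string."""
    return base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii").rstrip("=")

def decode_param(token):
    """Reverse encode_param, restoring any stripped '=' padding."""
    padded = token + "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")

def build_url(base, value):
    """Build base?param=<encoded value>."""
    return f"{base}?{urlencode({'param': encode_param(value)})}"

def read_param(url):
    """Extract and decode 'param' from a URL built by build_url."""
    token = parse_qs(urlsplit(url).query)["param"][0]
    return decode_param(token)
```

The URL-safe alphabet uses only letters, digits, '-' and '_', so the encoded value needs no percent-escaping and cannot contain a backpath sequence.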