Properly setting up Apache2 to avoid upload timeouts with Microsoft OneDrive

I have a simple Perl script that uploads a file from an HTML form, and it does work, i.e. it uploads a file from my local Mac HD to my web server via a webpage.
What I have noticed, however, is that if I try to upload files from Microsoft's OneDrive, I am more likely to get the errors shown below. I have no problems using my OneDrive via the Mac's Finder, my iPhone, etc.
access.log
[14/Feb/2022:23:36:51 -0500] "POST /cgi-bin/upload2.pl HTTP/1.1" 408 487 "http://example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.2 Safari/605.1.15"
error.log
[Mon Feb 14 23:37:02.121496 2022] [cgi:error] [pid 3734:tid 140367391328000] (70007)The timeout specified has expired: [client -.-.-.-:58184] AH01225: Error reading request entity data, referer: http://example.com/
My Apache2 settings are:
Timeout 300
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 5
With my slim knowledge of Apache, I am assuming that my problem is all about timing. If that's the case, can I change the settings above to help? Or am I off base?
One small thing I noticed, but I don't know if it means anything:
My web browser says "Server timeout waiting for the HTTP request from the client" and it mentions port 80.
However, the error.log mentions port 58184. I don't know if that's normal, due to routers or other routine behavior.

Set KeepAlive to Off; keep-alive seems to have a detrimental effect on busy Apache servers. (As for the port question: 58184 is simply the client's ephemeral source port for that connection, which is normal; the server is still listening on port 80.)
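A minimal sketch of the adjusted settings, assuming a stock Apache 2.4 install with mod_reqtimeout loaded; its RequestReadTimeout directive, not Timeout, governs how long Apache waits while reading the request body, and the values below are illustrative guesses, not defaults:
Timeout 300
KeepAlive Off
# Assumption: mod_reqtimeout is enabled (it often is by default).
# Relaxing the body read timeout may help clients that feed the upload slowly.
<IfModule reqtimeout_module>
    RequestReadTimeout header=20-60,MinRate=500 body=60,MinRate=250
</IfModule>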

Related

CORS problem before any code is hit, but only in browser XHR requests

I have an API built on CakePHP. It works for the most part but every once in a while browser access to the API dies. The error message on the XHR request response is:
Access to XMLHttpRequest at 'http://be:8888/api/pings' from origin 'http://localhost:8080' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource.
However if I make exactly the same request via POSTMAN (or if I browse directly to the URL, rather than via XHR) it works without any trouble. I thought it might be a pre-flight OPTIONS issue but the request headers don't list a Request Method and the Apache access log shows these to be GET requests. There's nothing related in the Apache error log.
Restarting MAMP – i.e. Apache – does not fix the issue, nor does flushing the local DNS cache. The only thing that fixes it is a restart, after which it all works fine again for a few hours before eventually going on the blink again.
I can't think what's causing this. I don't think it's a true CORS middleware error, because the restart fixes it and the API is accessible normally. Also, if I put a die() in the CORS middleware's __invoke method, it doesn't get that far (the die in the webroot index should be hit first anyway).
I get this error even if I disable the app by putting die('here'); at the start of the webroot index.php file.
Even if I delete the index.php files (both in the project root and webroot) so that browsing to the URL shows Apache's default 404 Not Found page ("The requested URL /webroot/index.php was not found on this server"), I still get the CORS errors when trying via XHR in the browser.
I've only noticed this issue since upgrading to Mac OS X Catalina.
What could be causing this?
Update:
Here's proof that it is working in the browser after a system restart:
Summary
URL: http://be:8888/api/clients
Status: 200 OK
Source: Network
Address: ::1.8888
Request
GET /api/clients HTTP/1.1
Accept: application/vnd.api+json
Content-Type: application/vnd.api+json
Origin: http://localhost:8080
Accept-Language: en-gb
Access-Control-Allow-Origin: *
Host: be:8888
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.5 Safari/605.1.15
Referer: http://localhost:8080/
Accept-Encoding: gzip, deflate
Connection: keep-alive
However, after a few hours it stops working. If it were actually a CORS issue, my understanding is that it would NEVER work.
I'm not an Apache/PHP professional, but make sure you query HTTPS via HTTPS and HTTP via HTTP; in other words, both sides should match. Also check the headers: Access-Control-Allow-Origin is a header the server has to send on the response, not something the browser sends on the request.
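If the response really is missing that header, a minimal sketch of setting it in the MAMP Apache config, assuming mod_headers is enabled (and that a wildcard origin is acceptable for local development only):
# Assumes mod_headers is loaded; "*" is for local development only.
<IfModule headers_module>
    Header always set Access-Control-Allow-Origin "*"
</IfModule>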

Random chars appearing in Apache access logs

We are seeing random letters appear in the access logs. The requests return 404 since the content does not exist. The requests are made by a variety of users, and other requests from the same IP usually look genuine. There is no way to request these from the site. Some of these requests even appear to come from internal traffic on our network.
Example:
157.203.177.191 - - [04/Feb/2018:23:51:20 +0000] "GET /VLTRP/content/dam/example/dotcom/images/ABtest/existing-customer-thumb.jpg HTTP/1.1" 404 60294 39082 "http://www.example.com/shop.html" "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0" 2
Without the /VLTRP this is a genuine request. Has anyone seen something similar before?
For info, we are running Apache/2.2.15 (Unix) with ModSecurity enabled. We see similar behaviour on another site where we do not have ModSecurity configured, and we see similar requests from internal, external and bot traffic.

modified .htaccess file doesn't work with BrowserMatchNoCase

The server (Ubuntu Server) is going down because 360Spider is making too many requests per second. I am trying to resolve this with the following configuration in the .htaccess file:
BrowserMatchNoCase "360Spider" bots
BrowserMatchNoCase ^360Spider bots
Order Allow,Deny
Allow from ALL
Deny from env=bots
This works partially, because the error.log records some of these events:
[Sun Jul 20 23:30:15 2014] [error] [client 10.183.200.5] client denied by server configuration: /var/www/view, referer: http://www.mysite.org/
But the access.log is still recording requests from 360Spider:
10.183.200.5 - - [20/Jul/2014:23:31:33 -0400] "GET /view/article/154967 HTTP/1.1" 403 536 "http://www.mysite.org/view/article/154967/" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0); 360Spider"
I want to block everything whose user agent contains the word 360Spider.
PS: I can't block the bot by IP because all the traffic comes from the same IP. I can only work with the .htaccess file.
Any IP address or bot going to a URL/website will most likely make a GET request, and Apache logs it. Just because you see the request in the log does not mean it isn't blocked; your access.log clearly shows that it is.
When the bot tried to GET /view/article/154967 it was denied (403 Forbidden).
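If the goal is just to keep those blocked hits out of access.log, that cannot be done from .htaccess; but as a sketch, assuming you ever get access to the server configuration, conditional logging would look roughly like this (the dontlog variable name and the Ubuntu-style log path are illustrative):
# Server config only (not .htaccess): mark 360Spider requests and skip them in the access log.
SetEnvIfNoCase User-Agent "360Spider" dontlog
CustomLog ${APACHE_LOG_DIR}/access.log combined env=!dontlog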

Why does Apache return 403

Why can't I see why Apache returns 403?!
If I look in the access log the only information I get is
193.162.142.166 - - [29/Jan/2014:18:34:26 +0100] "POST /api_test/callback.php HTTP/1.1" 403 2293
How can I get more information about why the request is forbidden/rejected?
The call is made from a payment gateway...
If the callback URL is plain http there are no problems and it returns 200 OK.
If the callback URL is https, my server returns 403. I need to know why.
The server has SSL and OpenSSL installed, and it works!
I have tried making the https request from http://web-sniffer.net/ and then there are no problems.
I don't get it. There must be something in the request headers from the payment gateway which results in the 403.
update
error log
[Wed Jan 29 20:45:55 2014] [error] No hostname was provided via SNI for a name based virtual host
solution
OK, it looks like the client doesn't support SNI:
http://wiki.apache.org/httpd/NameBasedSSLVHostsWithSNI
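For reference, a sketch of the server-side workaround, assuming the 403 comes from strict SNI checking on the name-based SSL virtual hosts: when SSLStrictSNIVHostCheck is on, Apache rejects non-SNI clients with a 403; off (the default) lets such clients fall back to the first SSL vhost.
# In the default (first-listed) SSL virtual host
SSLStrictSNIVHostCheck off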
Use the LogLevel directive to adjust how verbose the error logs are and increase until you can see what you want.
httpd 2.4 has better messages in many respects and a more extensive list of LogLevel settings than 2.2, so if you're using 2.2 it may be a bit harder to figure this out.
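For example, in the server or virtual host configuration (the per-module ssl:debug syntax is 2.4 only; on 2.2 you can only raise the global level):
# Apache 2.4: keep the global level at warn, but log mod_ssl at debug
LogLevel warn ssl:debug
# Apache 2.2: only a global level is available
# LogLevel debug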

Bots throw 500 errors in Apache access log

In my Apache error log I can see the following error occurring in enormous numbers every day.
[Tue Jan 15 13:37:39 2013] [error] [client 66.249.78.53] Request exceeded the limit of 10 internal redirects due to probable configuration error. Use 'LimitInternalRecursion' to increase the limit if necessary. Use 'LogLevel debug' to get a backtrace.
When I check the corresponding IP, date and time against the access log, I can see the following:
66.249.78.53 - - [15/Jan/2013:13:37:39 +0000] "GET /robots.txt HTTP/1.1" 500 821 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
I've tested my robots.txt file in the Google Webmaster tool -> Health -> Blocked URLs and it's fine.
Also, when some images are accessed by bots, they throw the following error:
Error_LOG
[Tue Jan 15 12:14:16 2013] [error] [client 66.249.78.15] Request exceeded the limit of 10 internal redirects due to probable configuration error. Use 'LimitInternalRecursion' to increase the limit if necessary. Use 'LogLevel debug' to get a backtrace.
Accessed_URL
66.249.78.15 - - [15/Jan/2013:12:14:16 +0000] "GET /userfiles_generic_imagebank/1335441506.jpg?1 HTTP/1.1" 500 821 "-" "Googlebot-Image/1.0"
Actually, the above image URL (and several other images in our access log) are not available on our site (they were available before a website revamp that we did in August 2012), and we return 404 errors when we go to those invalid resources.
However, once in a while it seems that bots (and even human visitors) generate this type of error in our access/error log, only for static resources like images that don't exist, and for our robots.txt file. The server throws a 500 error for them, but when I actually try it from my browser, the images are 404 and the robots.txt is 200 (success).
We are not sure why this is happening and how come a valid robots.txt and an invalid image can throw a 500 error. We do have a .htaccess file, and we are sure that our (Zend Framework) application is not being reached, because we have a separate log for that. Therefore, the server itself (or .htaccess) is throwing the 500 error "once in a while" and I can't imagine why. Could it be due to too many requests to the server, or how can I debug this further?
Note that we only noticed these errors after our design revamp, but the web server itself stayed the same.
It might be useful to log the domain that the client is accessing. Your server might be accessible via multiple domains, including the raw IP address. When you're testing, you're doing so via the primary domain and everything works as expected. What if you try to access the same files via your IP (http://1.2.3.4/robots.txt) vs. the domain (http://example.com/robots.txt)? Also example.com vs. www.example.com or any other variation that points to the server.
Bots can sometimes hold on to IP/domain info long after an address has changed and may be attempting to access something that the rules were changed for months ago.
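A sketch of one way to log the requested host, assuming access to the main configuration's LogFormat/CustomLog lines (the format name vcombined and the log path are illustrative):
# %{Host}i records the Host header the client actually sent
LogFormat "%{Host}i %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" vcombined
CustomLog /var/log/apache2/access.log vcombined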