Fail2Ban ignore 404 of local redirect - apache

Assume a bad actor scripts access to an Apache server to probe for vulnerabilities. With Fail2Ban we can catch some number of 404s and ban the IP. Now assume a single web page has a bad local reference to a CSS, JS, or image file. Repeated hits by the same legitimate site visitor will result in some number of 404s, and possibly an IP ban.
Is there a good way to separate these local requests from remote so that we don't ban the valued visitor?
I know all requests are remote, in that a page gets returned to a browser and the content of the page triggers more requests for assets. The thing is, how do we know the difference between that kind of page-load pattern and a script querying the same resource?
If we do know that a request is coming in based on a link that we just generated, we could do a 302 redirect rather than returning a 404, thus avoiding the banning process.
The HTTP Referer header could be used: if the Referer has the same origin as the requested page, or matches the local site's FQDN, then we should not ban. But that header can be spoofed. So is this a good tool to use?
I'm thinking cookies could be used, or a session nonce, so that a request for assets from a page arriving without a current session cookie could be treated differently. But I don't know if something like that is a built-in feature.
The best solution is obviously to make sure that all pages generated on a site include valid references back to the site, but we all know that's not always possible. Some CMSes add version info to files, or adjust image paths to include an image size based on the client device/size. Any of these generated references might simply be wrong until we can find and fix the code that creates them. Between the time we deploy something faulty and the time we fix it, I'm concerned about accidentally banning legitimate visitors with Fail2Ban (and other tools) that do not factor in where the request originates.
Is there another solution to this challenge? Thanks!

how do we know the difference between that kind of page load pattern
In the normal case you don't (at least not without some kind of whitelist or blacklist).
But you do know which URI or path segments, file extensions, etc. would practically never be the target of such attack vectors, and those you can ignore.
Some CMS add version info to files, or they adjust image paths to include an image size based on the client device/size.
But you surely know which prefixes are correct, so a regex allowing certain path segments is possible. For instance, this one:
# regex ignoring site and cms paths:
^<HOST> -[^"]*\"[A-Z]{3,}\s+/(?!site/|cms/)\S+ HTTP/[^"]+" 40\d\s\d+
will ignore this one:
192.0.2.1 - - [02/Mar/2021:18:01:06] "GET /site/style.css?ver=1.0 HTTP/1.1" 404 469
and match this one:
192.0.2.1 - - [02/Mar/2021:18:01:06] "GET /xampp/phpmyadmin/scripts/setup.php HTTP/1.1" 404 469
Similarly, you can write a regex with a negative lookahead to ignore certain extensions such as .css or .js, or arguments such as ?ver=1.0.
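For example, something along these lines (an untested sketch; adjust the extension list to whatever your site actually serves):
# regex ignoring 404s for common static-asset extensions, with or without a query string like ?ver=1.0:
^<HOST> -[^"]*\"[A-Z]{3,}\s+/(?!\S+\.(?:css|js|png|jpe?g|gif|ico|woff2?)(?:\?\S*)?\s)\S+ HTTP/[^"]+" 40\d\s\d+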
Another possibility would be to create a special fallback location that logs clearly bogus requests to a separate log file (not the access or error logs), as described in wiki :: Best practice. That way, evildoers requesting definitely wrong URIs, which match no proper location the web server can handle, can be considered on their own.
Or simply disable logging of 404s in locations that are known to be valid (paths, prefixes, extensions, whatever).
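With Apache, for instance, conditional logging can keep such requests out of the log that fail2ban watches (a sketch; the prefixes, extensions and log name are placeholders for your own setup, and the jail's logpath would then point at this dedicated log):
# mark requests for known-valid locations, then exclude them from the monitored log
SetEnvIf Request_URI "^/(site|cms)/" known_asset
SetEnvIf Request_URI "\.(css|js|png|jpe?g|gif|woff2?)$" known_asset
CustomLog /var/log/apache2/f2b_access.log combined env=!known_asset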
To reduce or completely avoid false positives, you can initially increase maxretry or reduce findtime and observe for a while (so that evildoers with too many attempts still get banned, while legitimate users whose "broken" requests cause 404s, but in far smaller numbers, are still ignored). This also lets you accumulate the full list of "valid" 404 requests your application produces, in order to write a more precise regex or filter them out for certain locations.
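In jail.local that could look roughly like this (filter name and values are purely illustrative):
[apache-404]
# generous limits while observing; tighten once the list of "valid" 404s is known
enabled  = true
port     = http,https
filter   = apache-404
logpath  = /var/log/apache2/access.log
maxretry = 20
findtime = 600
bantime  = 3600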

Related

Archiving an old PHP website: will any webhost let me totally disable query string support?

I want to archive an old website which was built with PHP. Its URLs are full of .phps and query strings.
I don't want anything to actually change from the perspective of the visitor -- the URLs should remain the same. The only actual difference is that it will no longer be interactive or dynamic.
I ran wget --recursive to spider the site and grab all the static content. So now I have thousands of files such as page.php?param1=a&param2=b. I want to serve them up as they were before, so that means they'll mostly have Content-Type: text/html, and the webserver needs to treat ? and & in the URL as literal ? and & in the files it looks up on disk -- in other words it needs to not support query strings.
And ideally I'd like to host it for free.
My first thought was Netlify, but deployment on Netlify fails if any files have ? in their filename. I'm also concerned that I may not be able to tell it that most of these files are to be served as text/html (and one as application/rss+xml) even though there's no clue about that in their filenames.
I then considered https://surge.sh/, but hit exactly the same problems.
I then tried AWS S3. It's not free but it's pretty close. I got further here: I was able to attach metadata to the files I was uploading so each would have the correct content type, and it doesn't mind the files having ? and & in their filenames. However, its webserver interprets ?... as a query string, and it looks up and serves the file without that suffix. I can't find any way to disable query strings.
Did I miss anything -- is there a way to make any of the above hosts act the way I want them to?
Is there another host which will fit the bill?
If all else fails, I'll find a way to transform all the filenames and all the links between the files. I found how to get wget to transform ? to #, which may be good enough. It would be a shame to go this route, however, since then the URLs are all changing.
I found a solution with Netlify.
I added the wget options --adjust-extension and --restrict-file-names=windows.
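Put together, the command looks roughly like this (a sketch; example.com stands in for the real site, and you may still want extras such as --page-requisites or --convert-links):
wget --recursive --adjust-extension --restrict-file-names=windows https://example.com/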
The --adjust-extension part adds .html at the end of filenames which were served as HTML but didn't already have that extension, so now we have for example index.php.html. This was the simplest way to get Netlify to serve these files as HTML. It may be possible to skip this and manually specify the content types of these files.
The --restrict-file-names=windows alters filenames in a few ways, the most important of which is that it replaces ? with #. This is needed since Netlify doesn't let us deploy files with ? in the name. It's a bit of a hack; this is not really what this option is meant for.
This gives static files with names like myfile.php#param1=value1&param2=value2.html and myfile.php.html.
I did some cleanup. For example, I needed to adjust a few link and resource paths to be absolute rather than relative due to how Netlify manages presence or lack of trailing slashes.
I wrote a _redirects file to define URL rewriting rules. As the Netlify redirect options documentation shows, we can test for specific query parameters and capture their values. We can use those values in the destinations, and we can specify a 200 code, which makes Netlify handle it as a rewrite rather than a redirection (i.e. the visitor still sees the original URL). An exclamation mark is needed after the 200 code if a "query-string-less" version (such as mypage.php.html) exists, to tell Netlify we are intentionally shadowing.
/mypage.php param1=:param1 param2=:param2 /mypage.php#param1=:param1&param2=:param2.html 200!
/mypage.php param1=:param1 /mypage.php#param1=:param1.html 200!
/mypage.php param2=:param2 /mypage.php#param2=:param2.html 200!
Of course, if not all query-parameter combinations are actually used in the dumped files, not all of these redirect lines need to be included.
There's no need for a final /mypage.php /mypage.php.html 200 line, since Netlify automatically looks for a file with a .html extension added to the requested URL and serves it if found.
I wrote a _headers file to set the content type of my RSS file:
/rss.php
  Content-Type: application/rss+xml
I hope this helps somebody.

DNS resolves correctly using command line tools but fails on browser

The dig, wget, nslookup and curl commands all work perfectly for a specific URL whose domain I pointed to another server less than 24 hours ago.
The problem is that it simply refuses to resolve in the browser (Chrome, Safari and Firefox). The strangest part is that it resolves successfully in Postman (testing the OPTIONS and GET methods separately), yet still doesn't return a proper response on the browser side of things.
The DNS checks come back positive, which is when I started suspecting that the problem actually lies in the headers of the HTTP requests being sent - especially since different responses come back for requests that don't include the default browser headers (issued through the various command-line tools and Postman) and the ones that do (issued by the browsers automatically, or manually through the dev tools).
After fully flushing the local system's DNS cache, including the browsers', and even trying another device on another network, I still get no response in the browser.
I kept going and attempted to verify that with a VPN (locally - which didn't work) and an online web proxy tool (which did work).
Finally, I extracted the router's default DNS server address and used nslookup to look up the domain again, this time explicitly specifying the desired DNS server (the one just mentioned). After getting a successful response with the correct values, I am now fairly sure the HTTP request is causing the problem.
The URL is hosted on Amazon S3 Static Hosting option, which I used many times before, and didn't have a problem with, with that exact same configuration. Looking up the recent changes/features that were possibly added, pointed out that I may need to explicitly set a CORS policy for the newly created bucket, on top of the usual public access policy that is needed.
After applying that as-well - it still doesn't seem to work.
As a change of direction that might make what's going on a bit clearer (I started to think the browser might not be getting the correct Content-Type header in the response, which should be text/html, and therefore might not handle the URL as expected), I applied a 301 redirection on the S3 bucket instead of the static-file hosting. Again, it all works perfectly through the command-line tools, but not through the browsers.
Anyway, the browser just doesn't seem to complete any of the requests being sent to the URL.
That might be the OPTIONS pre-flight request failing, so that the browser never goes on to issue the GET request; or the URL isn't being found along the DNS route the browser takes, though it's unclear to me whether that is really what's happening.
Any ideas? (Other than the fact that some DNS servers along the chosen route can simply take longer to update/refresh their cache - which doesn't appear to affect my local machine's DNS route in this case. I verified that, cautiously, by checking the various parts of the DNS configuration and resolution order on my system (Mac OS X), including the fact that the response does come back with the correct address.)
Found my answer here:
https://serverfault.com/questions/942030/aws-s3-static-hosting-how-to-debug-connection-timeout
As linked there, more details can be found here:
Non-Authoritative-Reason header field [HTTP]
Solution and explanation: because of the domain extension I purchased (.dev), Chrome was silently using HTTPS - the entire .dev TLD is on Chrome's HTTP Strict Transport Security (HSTS) preload list, so all .dev domains must be served over HTTPS only. That is why the issue kept showing up even when I explicitly typed http:// into the address bar.
This can be resolved by putting a CloudFront distribution with HTTPS support in front of the S3 static hosting, as usual (but note that HSTS listings can cause this in other situations too; the .dev domain extension is just one of them).
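That also explains why the command-line tools kept working: curl and friends don't ship Chrome's HSTS preload list, so a plain-HTTP request such as the following (example.dev is a placeholder) goes out exactly as typed, while Chrome silently rewrites the same URL to https://.
curl -I http://example.dev/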
Useful Resources (for debugging purposes)
In addition to what is stated here:
https://gist.github.com/stollcri/7c09bafc97223481920e
You can issue a lookup query (and also add or delete entries in your local set of HSTS listings) through Chrome's internal settings page at chrome://net-internals/#hsts
You can also check the current listings here: https://hstspreload.org/

10 internal redirects - can this limit be raised?

I have a client running into some 500 errors when using a CDN. These errors indicate that there are too many internal redirects, and our research confirms that. The client does not want to adjust their internal redirects, and wants to address this problem in another way.
Based on my research so far, this seems like a hard cap which is not specific to any one type of web server, and is in place to avoid endless loops. That being said, is there any way to raise this limit - for instance to 20 instead of 10?
Example:
1. Browser >> 9 redirects >> Origin 200 page (9 redirects total)
2. Browser >> 9 redirects >> Origin gives custom 404 page (+1 redirect for the custom 404 - 10 redirects total)
3. Browser >> CDN (+1 redirect from custom rule) >> 9 redirects >> Origin 200 page (10 redirects total)
4. Browser >> CDN (+1 redirect from custom rule) >> 9 redirects >> Origin gives custom 404 page (+1 redirect for the custom 404 - 11 redirects total)
Only example 4 gives a 500 error. Without adjusting the redirect configuration or removing the CDN, is there any way to get around this? (Unfortunately, I cannot provide the .htaccess for more detail on the redirects; my apologies.)
Unfortunately, it is the HTTP client that decides how many redirects it is willing to follow. The limitation you see stems from a recommendation originally given in RFC 2068, sec. 10.3 and quoted again in RFC 7231, sec. 6.4:
An earlier version of this specification recommended a maximum of five redirections [...] Content developers need to be aware that some clients might implement such a fixed limitation.
A rough estimate of how many redirects it takes to hit the limit in various browsers can be found in this answer. Most browsers allow this limit to be configured (e.g. Firefox exposes the network.http.redirection-limit setting).
Web servers are a different matter: Apache apparently had the MaxRedirects option for the RewriteOptions directive between v2.0.45 and 2.1; the LimitInternalRecursion directive seems to have taken over that role. I've been unable to find an equivalent setting for nginx.
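If the 500s really do come from Apache's own recursion cap, that limit is set with LimitInternalRecursion in the server or virtual-host configuration, for example:
# raises the internal redirect/subrequest cap from its default of 10;
# this does not change how many external redirects a browser will follow
LimitInternalRecursion 20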
As a final note: if you are really seeing this many internal redirects (i.e. redirects that are performed only within the rewrite engine and do not immediately lead to real HTTP redirects), that may be a strong indicator that your rewrite rules need revising.
You will need to impress on the client that you must fix the poor application logic that needs countless internal redirects instead of trying to change every browser on the planet.
** EDIT **
Apache apparently has the option to change this server-side using the RewriteOptions MaxRedirects option, but I think you will still have an issue with browsers, which often prompt your users to stop the redirects and bail out ... sometimes even before 10 redirects.

Aliases on Dreamhost, general management of http request / server errors

I had a hard time deciding how I should manage these errors (404, 500, ...), and now that I've finally decided, I am running into problems. This is a really long question; I appreciate anyone's attempt to help!
Let me first describe how I decided to set it up. I have several sites hosted on a shared Dreamhost account. In the folder structure that I see, everything of mine on the server is under /home/username, and for example, site1.com's web root is at /home/username/site1.com
I am creating a generic error handler (php script) for errors like 404 not found, 500, etc. that I want to store above the web roots of my sites at /home/username/error_handler/index.php so that I can use an .htaccess file at /home/username/.htaccess which includes something like the following:
ErrorDocument 404 /error_handler/index.php
ErrorDocument 500 /error_handler/index.php
...and many more
When these errors occur on any of my sites, I want the request to be directed to /home/username/error_handler/index.php. This is the problem I'm having a hard time figuring out: the ErrorDocument directives above will actually cause Apache to look for /home/username/site1.com/error_handler/index.php.
Anyway, the errors should be redirected to my error handling php script. The script will use $_SERVER['REDIRECT_STATUS'] to get the error code, then use $_SERVER['REDIRECT_URL'] and $_SERVER['HTTP_HOST'] to decide what to do. It will check if an error handler specific to that site exists (for example: site1.com/errors/404.php). If this custom page doesn't exist, it will output a generic message that is slightly more user-friendly and styled, and perhaps will include some contact info for me depending on the error.
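In rough outline, the handler would look something like this (just a sketch, not the final code; the per-site errors/404.php convention is the one mentioned above):
<?php
// /home/username/error_handler/index.php (sketch)
$code = isset($_SERVER['REDIRECT_STATUS']) ? (int) $_SERVER['REDIRECT_STATUS'] : 500;
$host = isset($_SERVER['HTTP_HOST']) ? $_SERVER['HTTP_HOST'] : '';
$url  = isset($_SERVER['REDIRECT_URL']) ? $_SERVER['REDIRECT_URL'] : '';

// per-site custom page, e.g. /home/username/site1.com/errors/404.php
$custom = "/home/username/{$host}/errors/{$code}.php";
if (is_file($custom)) {
    include $custom;
} else {
    // generic, slightly friendlier fallback
    echo 'Sorry, an error (' . $code . ') occurred requesting ' . htmlspecialchars($url) . '.';
}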
Doing it this way lets me funnel all these errors through this 1 php script. I can log the errors however I like or send email notifications if I want. It also lets me set up the ErrorDocument Apache directives once for all my sites instead of having to do it for every site. It will also continue to work without modification when I move the site around, since I already have a system that scans the folder structure to figure out where my site roots are when they really aren't at the web root technically speaking. This may not be possible with other solutions like using mod_rewrite for all 404 problems, which I know is common. Or if it is possible, it may be very difficult to do. Plus, I have already done that work, so it will be easy for me to adapt.
When I am working on sites for which I don't have a domain name yet (or sites where the domain name is already in use at the moment), I store them temporarily in site1.com/dev/site3.com for example. Moving the site to site3.com eventually would cause me to have to update the htaccess files if I had one for each site. Changing the domain name would do the same.
Ex: a site stored at site1.com/dev/site3.com would have this in its htaccess file:
ErrorDocument 404 /site1.com/dev/site3.com/error/404.php
And it would have to be changed to this:
ErrorDocument 404 /site3.com/error/404.php
Obviously, this isn't a huge amount of work, but I already manage a lot of sites and I will probably be making more every year, 95% of which will be hosted on my shared DreamHost account. And most of them get moved at least once. So setting up something automatic will save me some effort in the long run.
I already have a system set up for managing site-relative links on all my sites. These links will work whether the site exists in a subdirectory of an existing site, or in their own domain. They also work without change in a local development server despite a difference in the web root location. For example, on the live server, the site-relative http link /img/1.jpg would resolve to the file /home/username/site1.com/img/1.jpg while on my local development server it would resolve to C:\xampp\htdocs\img\1.jpg, despite what I consider the logical site root being at C:\xampp\htdocs\site1.com. I love this system, and it is what gave me the idea to set up something that would work automatically like I expected it to, based on the file structure I used.
So, if I could get it to work, I think this seems like a pretty good system. But I am still very new to apache configuration, mod_rewrite, etc. It's possible there is a much easier and better way to do this. If you know of one, please let me know.
Anyway, all that aside, I can't get it working. The easiest thing would be if I could have the ErrorDocument directive send the requests to folders above the web root. But the path is a URL path relative to the document root. Using the following in /home/username/.htaccess,
ErrorDocument 404 /error_handler/index.php
a request for a non-existent resource causes Apache to look for the file at
site1.com/error_handler/index.php
So I thought I should set up a redirection (on all my sites) that would redirect those URLs to /home/username/error_handler. I tried a few things and couldn't get any of them to work.
Alias seemed like the simplest solution, but it is something that has to be set at server runtime (not sure if that is the right terminology - when the server is started). On my local server, it worked fine using:
Alias /error_handler C:\xampp\htdocs\error_handler2
I changed the local folder to test that the Alias was functioning properly. (On the local server, the URL path specified by the ErrorDocument directive actually points to the right folder, since my local web root is technically C:\xampp\htdocs and the error handler I want to use is stored locally at C:\xampp\htdocs\error_handler\index.php.)
Dreamhost has a web client that can create what I am guessing is an Alias. When I tried to redirect the folder error_handler on site1.com to /home/username/error_handler, it seemed to work if I typed site1.com/error_handler in the browser. But if I typed site1.com/test1234 (non-existent), it said there was a 404 error while trying to use the error handler. Also, I would have to log in through the web client and point and click (and wait several minutes for the server to restart) every time I wanted to set this up for a new site, even if I could get it to work.
So I tried getting it to work with mod_rewrite, which seems like the most flexible solution. My first attempt looked something like this (stored in /home/username/site1.com/.htaccess for now, though it would eventually go in /home/username/.htaccess):
RewriteEngine On
RewriteRule ^error_handler/index.php$ /home/username/error_handler/index.php
The plain-English version of what I was trying to do above is to send requests on any of my sites for error_handler/index.php to /home/username/error_handler/index.php. The misunderstanding I had was that the substitution would be treated as a file path if it exists. But I missed that the documentation says "(or, in the case of using rewrites in a .htaccess file, relative to your document root)". So instead of rewriting to /home/username/error_handler/index.php, it's actually trying to rewrite to /home/username/site1.com/home/username/error_handler/index.php.
I tried including Options +FollowSymLinks because in the Apache documentation it says this:
To enable the rewrite engine in this context [per-directory re-writes in htaccess], you need to set "RewriteEngine On" and "Options FollowSymLinks" must be enabled. If your administrator has disabled override of FollowSymLinks for a user's directory, then you cannot use the rewrite engine. This restriction is required for security reasons.
I searched around for a while and I couldn't find anything about how Dreamhost handles this (probably because I don't know where to look).
I experimented with RewriteBase because in the Apache documentation it says this:
"This directive is required when you use a relative path in a substitution in per-directory (htaccess) context unless either of the following conditions are true:
The original request, and the substitution, are underneath the DocumentRoot (as opposed to reachable by other means, such as Alias)."
Since this is supposed to be a URL path, in my case it should be RewriteBase /, since all my redirects will be from site1.com/error_handler. I also tried RewriteBase /home/username and RewriteRule ^error_handler/index.php$ error_handler/index.php. However, RewriteBase takes a URL path relative to the document root, so I would still need something like an alias. The implication in the quote from the documentation above is that it is possible to use mod_rewrite to serve content from above the web root. One of the many things I don't know is what the 'other means' besides Alias might be. I believe Alias might not be an option on Dreamhost; at least I couldn't make sense of it.
Why not use error pages in the site root, then include the actual file from the shared section?
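For example, each site's web root could contain nothing more than a thin stub (sketch; paths follow the layout described in the question) that pulls in the shared handler:
<?php
// /home/username/site1.com/error_handler/index.php - per-site stub
require '/home/username/error_handler/index.php';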

Mask redirect to temporary domain with mod_rewrite

We are putting up a company blog at companyname.com/blog but for now the blog is a Wordpress installation that lives on a different server (blog.companyname.com).
The intention is to have the blog and web site both on the same server in a month or two, but that leaves a problem in the interim.
At the moment I am using mod_rewrite to do the following:
http://companyname.com/blog/article-name redirects to http://blog.companyname.com/article-name
Can I somehow keep the address bar displaying companyname.com/blog even though the content is coming from the latter blog.companyname.com?
I can see how to do this if it is on the same server and vhost, but not across a different server?
Thanks
Rather than using mod_rewrite, you could use mod_proxy to set up a reverse proxy on companyname.com, so that requests to http://companyname.com/blog/article-name are proxied (rather than redirected) to http://blog.companyname.com/article-name.
Here are more instructions and examples.
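A minimal sketch for the companyname.com virtual host (assumes mod_proxy and mod_proxy_http are enabled):
# serve /blog/... from the temporary blog server and fix Location headers in its responses
ProxyPass        /blog/ http://blog.companyname.com/
ProxyPassReverse /blog/ http://blog.companyname.com/
WordPress itself will usually also need its site/home URL set to http://companyname.com/blog so that the links it generates don't point back at the subdomain.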
ZoneEdit has functionality called WebForwards which could probably do this and hide what you are actually doing (unless someone looked closely).
When the target is on a different server, all mod_rewrite by itself can do is send HTTP redirects, and those always result in the browser address bar reflecting the real location.
You should instead consider writing a 404 script that 'reflects' the blog. This would essentially be a transparent proxy, and many are already written.
The script would check whether the requested page (the one that 404'd) starts with http://companyname.com/blog/. If it does, it would download the corresponding blog page and associated files (probably caching them as well) and send them on to the client.
So requesting http://companyname.com/blog/article_xyz would cause the 404 script to download and send http://blog.companyname.com/article_xyz.
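A bare-bones sketch of that idea in PHP (no caching or header pass-through; assumes allow_url_fopen is enabled):
<?php
// 404 handler on companyname.com: mirror /blog/... pages from the real blog host
$path = isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : '/';
if (strpos($path, '/blog/') === 0) {
    $remote = 'http://blog.companyname.com' . substr($path, strlen('/blog'));
    $body = @file_get_contents($remote);
    if ($body !== false) {
        header('HTTP/1.1 200 OK'); // override the 404 status
        echo $body;
        exit;
    }
}
header('HTTP/1.1 404 Not Found');
echo 'Not found';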
It's probably more work than it's worth, but you might be able to design a simple enough 404 script that it's worthwhile.
-Adam