htaccess send 404 if query string contains keyword - apache

I'm seeing a lot of traffic which I suspect is probing for a flaw or exploit with the request format of
https://example.com/?testword
I figured that, while I look into this more, I could save resources and disrupt or discourage these requests with a 404 or 500 response.
I have tried
RewriteEngine On
RewriteCond %{QUERY_STRING} !(^|&)testword($|&) [NC]
RewriteRule https://example.com/ [L,R=404]
And some other variations on the query string match, but none seem to return 404 when testing. Other questions I have found look for query string values/pairs and rewrite them, but no examples seem to exist for just a single value.

RewriteCond %{QUERY_STRING} !(^|&)testword($|&) [NC]
RewriteRule https://example.com/ [L,R=404]
There are a few issues here:
The CondPattern in your condition is negated (! prefix), so it's only successful when the testword is not present in the query string.
The RewriteRule directive is missing the pattern (first) argument (or substitution (second) argument depending on how you look at it). The RewriteRule directive matches against the URL-path only.
When you specify a non-3xx status code for the R flag, the substitution is ignored. You should specify a single hyphen (-) to indicate no substitution.
To test that the whole-word "testword" exists anywhere in the query string, you can use the regex \btestword\b - where \b are word boundaries. Or maybe you simply want the regex testword - to match "testword" literally anywhere, including when it appears as part of another word? In comparison, the regex (^|&)testword($|&) would miss instances where "testword" appears as a URL parameter name.
Try the following instead:
RewriteCond %{QUERY_STRING} \btestword\b [NC]
RewriteRule ^$ - [R=404]
This matches the homepage only (ie. empty URL-path). The L flag is not required when specifying a non-3xx return status, it is implied.
The - (second argument) indicates no substitution. As mentioned above, when specifying a non-3xx HTTP status, the substitution string is ignored anyway.
To test any URL-path then simply remove the $ (end-of-string anchor) on the RewriteRule pattern. For example:
RewriteCond %{QUERY_STRING} \btestword\b [NC]
RewriteRule ^ - [R=404]
If your homepage doesn't accept any query string parameters then you could simply reject the request (ie. 404 Not Found) when a query string is present. For example:
RewriteCond %{QUERY_STRING} .
RewriteRule ^$ - [R=404]
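As a rough summary of the regex choice discussed above, the sketch below lists which query strings each candidate pattern would match. The sample query strings (foo=1&testword, q=testword, mytestword2) are made up for illustration. The last pattern, (^|&)testword(=|&|$), is an extra variant that is not taken from the question: it assumes you only want to catch "testword" when it is used as a parameter name, with or without a value.

```apache
# Query string      \btestword\b   testword   (^|&)testword($|&)   (^|&)testword(=|&|$)
# testword          match          match      match                match
# foo=1&testword    match          match      match                match
# testword=1        match          match      no match             match
# q=testword        match          match      no match             no match
# mytestword2       no match       match      no match             no match

RewriteEngine On

# Hypothetical variant: send a 404 only when "testword" is a parameter name.
RewriteCond %{QUERY_STRING} (^|&)testword(=|&|$) [NC]
RewriteRule ^ - [R=404]
```

Which column is the right one depends on what the probing requests actually look like, so it is worth checking the access log before settling on a pattern.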

Related

.htaccess check header and domain conditions as chain

Sorry this might be an easy one.
I'd like to check that both match: the value of my header and the HTTP_REFERER.
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^http(s)?://(www\.)?alloweddomain.com [NC]
RewriteCond %{HTTP:X-SomeHeader} !somekey
RewriteRule ^ - [F]
Otherwise I'd like to block the user.
The header check works nicely, and the documents are only served when it is correct. However, the HTTP_REFERER seems to be ignored. The resources are even served when it is not present, e.g. with curl. How do I need to change the conditions so that both must match?
RewriteCond %{HTTP_REFERER} !^http(s)?://(www\.)?alloweddomain.com [NC]
RewriteCond %{HTTP:X-SomeHeader} !somekey
RewriteRule ^ - [F]
This is currently checking that both do not match. If the Referer header is not present, but somekey is passed then the request is not blocked.
You need an OR flag on the first condition, ie. if either one does not match then block the request. For example:
RewriteCond %{HTTP_REFERER} !^http(s)?://(www\.)?alloweddomain.com [NC,OR]
RewriteCond %{HTTP:X-SomeHeader} !=somekey
RewriteRule ^ - [F]
You also need the = operator on the second CondPattern for an exact match, otherwise you are checking whether somekey exists anywhere in the passed header.
OR, reverse the logic:
RewriteCond %{HTTP_REFERER} ^http(s)?://(www\.)?alloweddomain.com [NC]
RewriteCond %{HTTP:X-SomeHeader} =somekey
RewriteRule ^ - [S=1]
RewriteRule ^ - [F]
If both match then the following rule that blocks the request is skipped.
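The first variant (with the OR flag) can be read as a small truth table, sketched below. Two tweaks in this version are assumptions rather than part of the answer above: http(s)? is shortened to the equivalent https?, and the dot in alloweddomain.com is escaped so that it only matches a literal dot.

```apache
RewriteEngine On

# Block the request when EITHER check fails (conditions joined with OR).
#   Referer ok?   Header ok?   Result
#   yes           yes          served
#   yes           no           403 Forbidden
#   no            yes          403 Forbidden
#   no            no           403 Forbidden
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?alloweddomain\.com [NC,OR]
RewriteCond %{HTTP:X-SomeHeader} !=somekey
RewriteRule ^ - [F]
```

Without the OR flag the two conditions are implicitly ANDed, so only the last row of the table would be blocked, which matches the behaviour described in the question.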
UPDATE:
but a GET parameter, can you guide how I would do that? This is what I've tried RewriteCond %{QUERY_STRING} apikey!=somekey
With the RewriteCond directive you are performing a (regex) string comparison. The value of the QUERY_STRING server variable (ie. %{QUERY_STRING}) is compared against the string/regex apikey!=somekey. This will never match since you should be checking for the "string" apikey=somekey. The = (or !=) is not a comparison operator, it's just part of the string. (Not to be confused with the ! prefix operator on the CondPattern itself (as used above) that negates the whole expression.)
To check that the string "apikey=somekey" is not contained anywhere in the QUERY_STRING then use the CondPattern !apikey=somekey. However, this is potentially too broad, since (as mentioned) this is checking that the string is not contained anywhere in the QUERY_STRING. A query string of the form fooapikey=somekeybar would also pass the check (ie. it would not be blocked), because it contains the string. You could instead perform an exact string comparison (as above). For example:
RewriteCond %{QUERY_STRING} !=apikey=somekey
That's OK if the query string can only consist of the apikey URL parameter and nothing else, but if you are potentially expecting other URL parameters on the same request, eg. foo=1&apikey=somekey&bar=1 or apikey=somekey then you need to resort to a regex of the form:
RewriteCond %{QUERY_STRING} !(^|&)apikey=somekey($|&)
The condition is successful when the URL parameter apikey=somekey (exact string) is not contained anywhere in the query string.
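Put together with a blocking rule, the regex version might look like the sketch below. The [F] rule is carried over from the header example earlier in this answer, and the sample query strings in the comments are only illustrations.

```apache
RewriteEngine On

# Block (403) unless the query string carries the exact parameter apikey=somekey.
#   apikey=somekey                -> served
#   foo=1&apikey=somekey&bar=1    -> served
#   fooapikey=somekeybar          -> 403 Forbidden
#   apikey=somekey2               -> 403 Forbidden
#   (no query string)             -> 403 Forbidden
RewriteCond %{QUERY_STRING} !(^|&)apikey=somekey($|&)
RewriteRule ^ - [F]
```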

I want to remove a string with a question mark at the end of my URL with .htaccess

I want to remove the string
?mobile=1
out from different URLs with .htaccess. So:
https://www.example.com/?mobile=1 should become https://www.example.com/
and
https://www.example.com/something/?mobile=1 should become https://www.example.com/something/
I tried the following
RewriteEngine On
RewriteRule ^(.+)?mobile=1 /$1 [R=301,L,NC]
But that does not seem to work. Any ideas?
RewriteRule ^(.+)?mobile=1 /$1 [R=301,L,NC]
The RewriteRule pattern matches against the URL-path only, which notably excludes the query string. So the above would never match. (Unless there was a %-encoded ? in the URL-path, eg. %3F)
To match the query string you need an additional condition (RewriteCond directive) and match against the QUERY_STRING server variable.
The regex .+ (1 or more) will not match the document root (ie. your first example: https://www.example.com/?mobile=1). You need to allow for an empty URL-path in this case. eg. .* (0 or more).
For example, try the following near the top of your root .htaccess file:
RewriteCond %{QUERY_STRING} =mobile=1
RewriteRule (.*) /$1 [QSD,R=301,L]
This matches the query string mobile=1 exactly, case-sensitive (as in your examples). No other URL parameters can exist. The = prefix on the CondPattern makes this an exact match string comparison, rather than a regex as it normally would.
And redirects to the same URL-path, represented by the $1 backreference in the substitution string that contains the URL-path from the captured group in the RewriteRule pattern.
The QSD (Query String Discard) flag removes the query string from the redirect response.
Test first with a 302 (temporary) redirect and only change to a 301 (permanent) - if that is the intention - once you have confirmed this works as intended. 301s are cached persistently by the browser so can make testing problematic.
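For the testing phase, the same rule with a temporary redirect would look like the sketch below. The commented-out alternative assumes an older Apache 2.2 server, where the QSD flag is not available and a trailing ? on the substitution is used to drop the query string instead.

```apache
RewriteEngine On

# Temporary (302) redirect while testing; change to R=301 once confirmed.
RewriteCond %{QUERY_STRING} =mobile=1
RewriteRule (.*) /$1 [QSD,R=302,L]

# Apache 2.2 alternative (no QSD flag): the trailing "?" discards the query string.
# RewriteCond %{QUERY_STRING} =mobile=1
# RewriteRule (.*) /$1? [R=302,L]
```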

Why doesn't $1 store the complete URL when the pattern is ^(.*)$?

There are several topics already about this. But I haven't found an answer or I still don't understand it correctly. I know that $1 represents the match from the first set of parentheses in the RewriteRule regex. $1 also stores this value. But if there is only ^(.*)$, then it seems to work differently?
Example: URL: http://www.example.com/
RewriteBase /
RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC,OR]
RewriteCond %{HTTPS_HOST} ^www\.(.*)$ [NC]
RewriteRule ^(.*)$ https://%1/$1 [R=301,L]
What I understand:
1. http://www.example.com/ matches with RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC] and stores the match in %1 (=example.com/).
2. go to RewriteRule because the URL matched in step 1
3. RewriteRule gets the string http://www.example.com/. Because of ^(.*)$, http://www.example.com/ matches completely and is stored in $1.
4. I think this URL should appear : https://example.com/http://www.example.com/
What actually appears: https://example.com/
Why does $1 have an empty string? It's all matched, isn't it?
There are quite a few misconceptions here that I'll try to address...
RewriteBase /
RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC,OR]
RewriteCond %{HTTPS_HOST} ^www\.(.*)$ [NC]
RewriteRule ^(.*)$ https://%1/$1 [R=301,L]
I'll ignore the RewriteBase directive and the second RewriteCond directive...
The RewriteBase directive does not apply here, since there are no relative path substitution strings (the 2nd argument to the RewriteRule directive).
There is no HTTPS_HOST server variable, only HTTP_HOST. See the following question on ServerFault: https://serverfault.com/questions/953020/what-is-the-difference-between-http-host-and-https-host-in-apache-htaccess-file
I think HTTPS_HOST has been perpetuated around the internet due to a few typos/misconceptions that have been blindly copy/pasted.
HTTP_HOST contains the value of the Host HTTP request header (the hostname) eg. www.example.com or example.com, depending on what was requested. Hence the name HTTP_ + HOST. This is the same naming convention used for all HTTP request headers. A corresponding server variable is created for each.
So, this becomes (removing the OR flag from the first condition):
RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
RewriteRule ^(.*)$ https://%1/$1 [R=301,L]
The RewriteRule pattern (eg. ^(.*)$)
But if there is only ^(.*)$, then it seems to work differently?
No, it works the same. The confusion would seem to be what the RewriteRule pattern actually matches against.
The RewriteRule pattern matches against the URL-path only.
The URL-path is the part of the URL after the scheme + hostname and before the query string. eg. Given a request for http://example.com/ then the URL-path is simply /. Or request http://example.com/foo/bar?param=1 - the URL-path is /foo/bar.
HOWEVER, in a per-directory context like .htaccess (as opposed to a server or virtualhost context) the directory-prefix is first removed from the URL-path before the match occurs. (Because .htaccess is processed after the request is mapped to the filesystem and strictly speaking matches against a file-path.) The directory-prefix is the absolute filesystem path of the directory that contains the .htaccess file and notably ends with a slash. eg. When the .htaccess file is located in the document root, then the directory-prefix will be something like /var/www/user/public_html/ (the filesystem path to the document root).
So, given a request for http://example.com/ then the URL-path that is matched by the RewriteRule pattern in .htaccess is simply "" (empty string). Or request http://example.com/foo/bar?param=1 - the URL-path that is matched is foo/bar - no slash prefix.
This is more significant when the .htaccess file is located in a subdirectory off the document root. For example, if the .htaccess file is located in the /subdir subdirectory and there is a request of the form http://example.com/subdir/foo/bar, the RewriteRule pattern will again match against just foo/bar (not subdir/foo/bar or /subdir/foo/bar). This is a significant difference to when RewriteRule directives are used in a server (or virtualhost) context. In a server context, the RewriteRule pattern always matches against the full URL-path, starting with a slash - there is no concept of a directory-prefix when used in a server context, since the directives are processed before the request is mapped to the filesystem.
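The difference between the two contexts can be sketched with the same request handled in two places. The rules themselves are placeholders (a no-op with the L flag) and only serve to show what the pattern has to look like in each file.

```apache
# Request: http://example.com/subdir/foo/bar

# In /subdir/.htaccess (per-directory context):
# the directory-prefix is stripped, so the pattern sees "foo/bar".
RewriteRule ^foo/bar$ - [L]

# In the server or <VirtualHost> config (server context):
# the pattern sees the full URL-path "/subdir/foo/bar", including the leading slash.
RewriteRule ^/subdir/foo/bar$ - [L]
```

These two rules belong in different files; they are shown side by side here only for comparison.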
What I understand:
http://www.example.com/ matches with RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC] and stores the match in %1 (=example.com/).
go to RewriteRule because the URL matched in step 1
RewriteRule gets the string http://www.example.com/. Because of ^(.*)$, http://www.example.com/ matches completely and is stored in $1.
I think this URL should appear : https://example.com/http://www.example.com/
You've got the order of processing wrong. It's actually the RewriteRule pattern that is processed first. Only if the RewriteRule pattern matches are the preceding RewriteCond (conditions) processed. If all the conditions are successful then the RewriteRule substitution (2nd argument) occurs.
So, in order, given a request for http://www.example.com/:
1. RewriteRule ^(.*)$ - The resulting URL-path "" (empty string) matches the RewriteRule pattern ^(.*)$. The $1 backreference then holds an empty string (as does the $0 backreference - which stores the match of the entire pattern - the same in this case)
2. RewriteCond %{HTTP_HOST} ^www\.(.*)$ - If the RewriteRule pattern matched in step #1 (it does in this case) then the preceding RewriteCond directive is processed. This matches the Host header eg. www.example.com (no http://) against the regex ^www\.(.*)$. If this is successful then the %1 backreference holds the value of the first captured group, ie. example.com in this example.
3. RewriteRule ^(.*)$ https://%1/$1 [R=301,L] - If the preceding condition(s) is successful then the substitution (ie. https://%1/$1) in the RewriteRule directive occurs. ie. https://example.com/ - %1 is example.com from the captured group in the last matched CondPattern and $1 is an empty string, from the captured group in the RewriteRule pattern.
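The steps above can be annotated directly on the directives. This is only the rule from the question with comments added, assuming the .htaccess file is in the document root and the request is for http://www.example.com/.

```apache
# Step 2: the condition is checked second.
#         Host header "www.example.com" matches, so %1 = "example.com".
RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]

# Step 1: the pattern is checked first.
#         The URL-path "" (empty) matches ^(.*)$, so $1 = "".
# Step 3: the substitution is built last: https://example.com/
RewriteRule ^(.*)$ https://%1/$1 [R=301,L]
```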
Other notes:
Due to the order of processing, it is naturally more efficient to do as much pattern matching in the RewriteRule pattern as possible, instead of relying on preceding RewriteCond directives. (It is a common misconception that RewriteCond directives are processed first - that is not the case.)
Due to the order of processing, you can use $n backreferences in the TestString (first) argument of the preceding RewriteCond directives. (This wouldn't be possible if the directives were literally processed top-down.)
The %n back references are only from the last matched CondPattern. This is important to consider if you have multiple conditions.
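A common use of the second note is a condition that refers to what the rule pattern has already captured. The sketch below is a generic illustration and not part of the question: it assumes a root .htaccess file and that extensionless URLs should be served by a .php file of the same name, if one exists.

```apache
RewriteEngine On

# $1 comes from the RewriteRule pattern below, which is evaluated first.
# The condition checks whether <document root>/<captured path>.php is a file.
RewriteCond %{DOCUMENT_ROOT}/$1.php -f

# Only URL-paths without a dot are captured, so foo.php itself is not rewritten again.
RewriteRule ^([^.]+)$ $1.php [L]
```

With this in place a request for /foo would be internally rewritten to /foo.php when that file exists, and left alone otherwise.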

.htaccess skip all rules if url matches

I want to skip all rewrite URLs when specific URL matches. I want to open this page:
https://www.example.com/.well-known/pki-validation/godaddy.html
If godaddy.html matches the URL. Here is what i am doing:
RewriteCond "%{REQUEST_URI}" "==/godaddy.html"
RewriteRule ^(.*)$ https://www.example.com/.well-known/pki-validation/godaddy.html [L]
RewriteCond %{HTTP_HOST} ^example.com
RewriteRule ^(.*)$ https://www.example.com/index.php
but it does not work. I have also tried the [END] flag, but when I write flag [END] it gives me 500 internal server error.
If you want to stop rewriting when the requested URL ends with godaddy.html, you can use a dash - as the substitution:
Substitution of a rewrite rule is the string that replaces the original URL-path that was matched by Pattern. The Substitution may be a:
...
- (dash)
A dash indicates that no substitution should be performed (the existing path is passed through untouched). This is used when a flag (see below) needs to be applied without changing the path.
RewriteRule godaddy\.html$ - [L]
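For the exception to take effect it has to come before the other rules in the file. A minimal sketch of that layout follows; it matches the full path from the question rather than just the file name, which is a stricter choice than the rule above and assumes the .htaccess file is in the document root. The 500 error seen with END may simply mean the server runs Apache 2.2, since that flag was only added in 2.4, but that is a guess and the server error log would confirm it.

```apache
RewriteEngine On

# Exception first: pass the validation file through untouched and stop processing.
RewriteRule ^\.well-known/pki-validation/godaddy\.html$ - [L]

# ... the existing rewrite rules (such as the example.com redirect) follow here ...
```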

Trying to put an exception to RewriteRule in .htaccess

I am redirecting all requests like so:
RewriteRule ^sitemap.xml$ sitemap.php?/ [QSA,L]
# the line below is the one I'm having trouble with
RewriteCond %{REQUEST_URI} !^market-reports$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.*) /index.php?section=$1 [QSA,L]
All my incoming links are meant to go to index.php, as you can see. But now I want to stop one from going there. I've never written my own RewriteCond before, so I'm a little unsure if what I am doing is correct.
Basically what I'm trying to say is: "If incoming URL is a file, directory or /market-reports/ do nothing. Otherwise send on the URL to index.php?section="
What am I doing wrong? Thanks
So you just need to ignore http://yourdomain.com/market-reports (in addition to files/directories?). You should be fine with:
RewriteCond %{REQUEST_URI} !^/market-reports/?$
The regex matches both "http://yourdomain.com/market-reports" and "http://yourdomain.com/market-reports/", so the negated condition fails for those two URLs and they are not rewritten. The question mark "?" is a quantifier in the Perl Compatible Regular Expression (PCRE) syntax that mod_rewrite uses: it makes the preceding slash optional. The dollar sign "$" anchors the match to the end of the string.
The "^" symbol anchors the match to the beginning of the string and the "!" prefix negates the whole pattern, so the condition is successful only for URLs that do not match the expression, and only those are passed on to the RewriteRule that follows. Note that REQUEST_URI always starts with a slash, which is why the original !^market-reports$ never excluded anything.
See mod_rewrite regex vocabulary
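For reference, the rules from the question with the corrected condition dropped in would look roughly like this. Escaping the dot in sitemap.xml is an optional tidy-up that is not part of the answer above.

```apache
RewriteRule ^sitemap\.xml$ sitemap.php?/ [QSA,L]

# Leave /market-reports and /market-reports/ alone,
# as well as any existing file or directory.
RewriteCond %{REQUEST_URI} !^/market-reports/?$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.*) /index.php?section=$1 [QSA,L]
```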