What do "^." and "!" mean in Apache config? - apache

I have the following rewrite rule:
# Rewriting without query parameters to avoid cache overloading
RewriteCond %{REQUEST_URI} /(en|fr)/search-results.html
RewriteCond %{QUERY_STRING} ^.
RewriteCond %{QUERY_STRING} !referrerPage=automotive-home
RewriteRule ^(.*)/search-results.html$ $1/search-results.html? [NC]
As I understand
RewriteCond %{REQUEST_URI} /(en|fr)/search-results.html
will return true if the requested URL looks like:
https://www.trololo.com/en/search-results.html
https://www.trololo.com/fr/search-results.html
Please explain the last two RewriteConds:
RewriteCond %{QUERY_STRING} ^.
RewriteCond %{QUERY_STRING} !referrerPage=automotive-home
RewriteCond %{QUERY_STRING} ^.
Does this mean that QUERY_STRING is not blank?
%{QUERY_STRING} !referrerPage=automotive-home
Does this mean that QUERY_STRING doesn't contain referrerPage=automotive-home ?

The regex ^. means match any one character at the start of the string. The ^ itself represents the start of the string, and is often not really needed for generic expressions like this; it could have been omitted.
The . matches any one character. So in this context, it means the query string must have at least one character; if the query string is empty, the condition will not be met.
# If the requested query string is *not empty, having at least one character*
RewriteCond %{QUERY_STRING} ^.
# ...and the query string does not contain "referrerPage=automotive-home"
# It doesn't need to be the complete expression because it is not anchored
# with ^ at the start and $ at the end, so this pattern will match
# if it appears anywhere in the query string
RewriteCond %{QUERY_STRING} !referrerPage=automotive-home
# If the above 2 conditions were met, the next `RewriteRule` will be processed.
# This rewrite rule's purpose is to erase the query string. Since it terminates in
# ? without a [QSA] flag, any existing query string will be removed
RewriteRule ^(.*)/search-results.html$ $1/search-results.html? [NC]
In this case, the first RewriteCond could just be expressed without the ^
RewriteCond %{QUERY_STRING} .
As mentioned in the comments, the ! negates the subsequent expression. This, along with the anchors and the . character, is documented in the mod_rewrite regex vocabulary.
Finally, beginning with Apache 2.4, there is a [QSD] ("query string discard") flag which achieves the same thing as ending the target URI with ? to erase the query string.
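As a rough sketch of the [QSD] variant mentioned above, assuming Apache 2.4 or later, the same rule set could be written without the trailing ? on the target. Escaping the dot in search-results\.html is an optional tightening that is not part of the original rules.

```apache
# Assumes Apache 2.4+, where the QSD flag is available
RewriteCond %{REQUEST_URI} /(en|fr)/search-results\.html
# Query string is not empty
RewriteCond %{QUERY_STRING} .
# Query string does not contain referrerPage=automotive-home
RewriteCond %{QUERY_STRING} !referrerPage=automotive-home
# QSD discards the query string, so no trailing ? is needed on the target
RewriteRule ^(.*)/search-results\.html$ $1/search-results.html [NC,QSD]
```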

RewriteCond %{QUERY_STRING} ^. ==> the requested query string is not empty, having at least one character
%{QUERY_STRING} !referrerPage=automotive-home ==> the query string does not contain "referrerPage=automotive-home"

Related

.htaccess check header and domain conditions as chain

Sorry this might be an easy one.
I'd like to check that both match: the value of my header and the HTTP_REFERER.
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^http(s)?://(www\.)?alloweddomain.com [NC]
RewriteCond %{HTTP:X-SomeHeader} !somekey
RewriteRule ^ - [F]
Otherwise I'd like to block the user.
The header check works nicely, and the documents are only served when it is correct. However, the HTTP_REFERER seems to be ignored: the resources are served even when it is not present, e.g. with curl. How do I need to change the conditions so that both must match?
RewriteCond %{HTTP_REFERER} !^http(s)?://(www\.)?alloweddomain.com [NC]
RewriteCond %{HTTP:X-SomeHeader} !somekey
RewriteRule ^ - [F]
This is currently checking that both do not match. If the Referer header is not present, but somekey is passed then the request is not blocked.
You need an OR flag on the first condition, i.e. if either does not match then block the request. For example:
RewriteCond %{HTTP_REFERER} !^http(s)?://(www\.)?alloweddomain.com [NC,OR]
RewriteCond %{HTTP:X-SomeHeader} !=somekey
RewriteRule ^ - [F]
You also need the = operator on the second CondPattern for an exact match, otherwise you are checking whether somekey exists anywhere in the passed header.
OR, reverse the logic:
RewriteCond %{HTTP_REFERER} ^http(s)?://(www\.)?alloweddomain.com [NC]
RewriteCond %{HTTP:X-SomeHeader} =somekey
RewriteRule ^ - [S=1]
RewriteRule ^ - [F]
If both match then the following rule that blocks the request is skipped.
UPDATE:
but a GET parameter, can you guide how I would do that? This is what I've tried RewriteCond %{QUERY_STRING} apikey!=somekey
With the RewriteCond directive you are performing a (regex) string comparison. The value of the QUERY_STRING server variable (ie. %{QUERY_STRING}) is compared against the string/regex apikey!=somekey. This will never match since you should be checking for the "string" apikey=somekey. The = (or !=) is not a comparison operator, it's just part of the string. (Not to be confused with the ! prefix operator on the CondPattern itself (as used above) that negates the whole expression.)
To check that the string "apikey=somekey" is not contained anywhere in the QUERY_STRING then use the CondPattern !apikey=somekey. However, this is potentially too broad, since (as mentioned) this is checking that the string is not contained anywhere in the QUERY_STRING. A query string of the form fooapikey=somekeybar would also be successful. You could instead perform an exact string comparison (as above). For example:
RewriteCond %{QUERY_STRING} !=apikey=somekey
That's OK if the query string can only consist of the apikey URL parameter and nothing else, but if you are potentially expecting other URL parameters on the same request, eg. foo=1&apikey=somekey&bar=1 or apikey=somekey then you need to resort to a regex of the form:
RewriteCond %{QUERY_STRING} !(^|&)apikey=somekey($|&)
The condition is successful when the URL parameter apikey=somekey (exact string) is not contained anywhere in the query string.
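Putting that together, a minimal sketch of the complete block for the query-string variant might look like the following. It assumes the key is passed as a URL parameter named apikey with the value somekey, as in the comment quoted above, and that this check takes the place of the header check.

```apache
RewriteEngine On
# Block the request (403) unless the query string contains
# the exact URL parameter apikey=somekey
RewriteCond %{QUERY_STRING} !(^|&)apikey=somekey($|&)
RewriteRule ^ - [F]
```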

htaccess send 404 if query string contains keyword

I'm seeing a lot of traffic which I suspect is probing for a flaw or exploit with the request format of
https://example.com/?testword
I figured while I look into this more I could save resources and disrupt or discourage these requests with a 404 or 500 response
I have tried
RewriteEngine On
RewriteCond %{QUERY_STRING} !(^|&)testword($|&) [NC]
RewriteRule https://example.com/ [L,R=404]
And some other variations on the query string match, but none seem to return 404 when testing. Other questions I have found look for query string values/pairs and rewrite them, but no examples seem to exist for just a single value.
RewriteCond %{QUERY_STRING} !(^|&)testword($|&) [NC]
RewriteRule https://example.com/ [L,R=404]
There are a few issues here:
The CondPattern in your condition is negated (! prefix), so it's only successful when the testword is not present in the query string.
The RewriteRule directive is missing the pattern (first) argument (or substitution (second) argument depending on how you look at it). The RewriteRule directive matches against the URL-path only.
When you specify a non-3xx status code for the R flag, the substitution is ignored. You should specify a single hyphen (-) to indicate no substitution.
To test that the whole-word "testword" exists anywhere in the query string, you can use the regex \btestword\b - where \b are word boundaries. Or maybe you simply want the regex testword - to match "testword" literally anywhere, including when it appears as part of another word? In comparison, the regex (^|&)testword($|&) would miss instances where "testword" appears as a URL parameter name.
Try the following instead:
RewriteCond %{QUERY_STRING} \btestword\b [NC]
RewriteRule ^$ - [R=404]
This matches the homepage only (ie. empty URL-path). The L flag is not required when specifying a non-3xx return status, it is implied.
The - (second argument) indicates no substitution. As mentioned above, when specifying a non-3xx HTTP status, the substitution string is ignored anyway.
To test any URL-path then simply remove the $ (end-of-string anchor) on the RewriteRule pattern. For example:
RewriteCond %{QUERY_STRING} \btestword\b [NC]
RewriteRule ^ - [R=404]
If your homepage doesn't accept any query string parameters then you could simply reject the request (ie. 404 Not Found) when a query string is present. For example:
RewriteCond %{QUERY_STRING} .
RewriteRule ^$ - [R=404]
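If "testword" may also arrive as a parameter name with a value (for example ?testword=1), one possible pattern, offered as an untested sketch, is to accept = as well as & or the end of the string after the name:

```apache
# Matches testword, testword=1, foo=1&testword and foo=1&testword=2,
# but not footestword or testword2
RewriteCond %{QUERY_STRING} (^|&)testword(=|&|$) [NC]
RewriteRule ^ - [R=404]
```

Unlike \btestword\b, this only matches "testword" in the parameter-name position, not as a value such as foo=testword.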

htaccess environment variable based on query string without some parameters

I want to set an Apache environment variable based on the query string, but without some parameters.
For example, I have this query string:
utm_source=foo&my_param=baz&utm_medium=bar&_t=9999
Now, I want to set a variable without utm_source and utm_medium:
my_param=baz&_t=9999
I only know the parameters to remove (utm_source and utm_medium); the others are just an example.
I have written this code:
RewriteEngine On
RewriteBase /
RewriteRule ^ - [E=CustomQueryString:%{QUERY_STRING}]
RewriteCond %{ENV:CustomQueryString} ^(.*)&?(utm_source|utm_medium)=[^&]+&?(.*)$ [NC]
RewriteRule ^ - [E=CustomQueryString:%1%3]
the output is
utm_source=foo&my_param=baz&_t=9999
Why does the utm_source parameter persist?
I have also tried:
RewriteRule ^ - [E=CustomQueryString:%{QUERY_STRING}]
RewriteCond %{ENV:CustomQueryString} ^(.*)&?utm_source=[^&]+&?(.*)$ [NC]
RewriteRule ^ - [E=CustomQueryString:%1%2]
RewriteCond %{ENV:CustomQueryString} ^(.*)&?utm_medium=[^&]+&?(.*)$ [NC]
RewriteRule ^ - [E=CustomQueryString:%1%2]
the output is right!
a=b&my_param=baz&_t=9999
I don't understand why this version does not work:
RewriteCond %{ENV:CustomQueryString} ^(.*)&?(utm_source|utm_medium)=[^&]+&?(.*)$ [NC]
RewriteRule ^ - [E=CustomQueryString:%1%3]
This condition
RewriteCond %{ENV:CustomQueryString} ^(.*)&?(utm_source|utm_medium)=[^&]+&?(.*)$ [NC]
doesn't remove both utm_source and utm_medium, because it matches only one of them.
.* matches the longest possible string, including one of utm_source or utm_medium, whichever comes first. See also Repetition with Star and Plus, especially section Watch Out for The Greediness! and following, for a detailed explanation.
utm_source|utm_medium means in plain English: match either utm_source or utm_medium. So the regular expression matches
(.*) - %1, including utm_source
(utm_source|utm_medium)=[^&]+ - utm_medium=...
(.*) - %3, everything else after utm_medium
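To make the greedy match concrete, here is a hand-traced sketch of how the single condition splits the example query string from the question. The captures are worked out by hand from the pattern, not taken from server logs.

```apache
# Test string: utm_source=foo&my_param=baz&utm_medium=bar&_t=9999
# Pattern:     ^(.*)&?(utm_source|utm_medium)=[^&]+&?(.*)$
#
#   %1 = utm_source=foo&my_param=baz&   (greedy: runs up to the last alternative)
#   %2 = utm_medium
#   %3 = _t=9999
#
# %1%3 = utm_source=foo&my_param=baz&_t=9999, so utm_source survives
RewriteCond %{ENV:CustomQueryString} ^(.*)&?(utm_source|utm_medium)=[^&]+&?(.*)$ [NC]
RewriteRule ^ - [E=CustomQueryString:%1%3]
```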
The second variant with two conditions and rules works, because first you remove utm_source=..., and then pass the remaining string to the second condition. There the utm_medium=... part is removed, and finally you have the desired string without both utm_source and utm_medium.
You may enclose the test string in &s. This guarantees that you always have an ampersand before and after the utm_source=... part, and allows the regular expression to be simplified a little.
RewriteCond &%{ENV:CustomQueryString}& ^(.*)&utm_source=.+?&(.*)$ [NC]
Same goes for the second RewriteCond with utm_medium.
You can use these two rules to remove both query parameters:
RewriteRule ^ - [E=CustomQueryString:%{QUERY_STRING}]
# remove utm_source from query string and set env var
RewriteCond %{ENV:CustomQueryString} ^(.*&)?utm_source=[^&]*(?:&(.*))?$ [NC]
RewriteRule ^ - [E=CustomQueryString:%1%2]
# remove utm_medium from query string and reset env var
RewriteCond %{ENV:CustomQueryString} ^(.*&)?utm_medium=[^&]*(?:&(.*))?$ [NC]
RewriteRule ^ - [E=CustomQueryString:%1%2]
If we start with the URL /?utm_source=foo&my_param=baz&utm_medium=bar&_t=9999
then after the second rule we will have:
$_SERVER["CustomQueryString"] becomes `my_param=baz&_t=9999`

301 redirect old parameter names to new parameter names by htaccess

I just changed two parameter names and want to redirect the old names to the new ones, with any values, anywhere in the URL. E.g.:
product.php?colornew=anyvalue&productname=anyvalue
301 redirect to:
product.php?color=anyvalue&product=anyvalue
Please note that this is just an example and, as I said, these two parameters can be anywhere in the URL with any value.
You can use this code to rename your query parameters in any URL:
RewriteEngine On
# rename query parameter colornew=>color
RewriteCond %{QUERY_STRING} ^(.*&)?colornew=([^&]*)(&.*)?$ [NC]
RewriteRule ^ %{REQUEST_URI}?%1color=%2%3 [NC]
# rename query parameter productname=>product
RewriteCond %{QUERY_STRING} ^(.*&)?productname=([^&]*)(&.*)?$ [NC]
RewriteRule ^ %{REQUEST_URI}?%1product=%2%3 [NC,NE,L,R=302]
Try:
RewriteEngine on
RewriteCond %{THE_REQUEST} \?colornew=([^&]+)&productname=([^&\s]+)
RewriteRule ^ %{REQUEST_URI}?color=%1&product=%2 [QSA,NC,NE,L,R=301]
A simple fix of anubhava's otherwise excellent answer:
RewriteEngine On
# rename query parameter colornew=>color
RewriteCond %{QUERY_STRING} ^(.*&)?colornew=([^&]*)(&.*|)$ [NC]
RewriteRule ^ %{REQUEST_URI}?%1color=%2%3 [NC]
# rename query parameter productname=>product
RewriteCond %{QUERY_STRING} ^(.*&)?productname=([^&]*)(&.*|)$ [NC]
RewriteRule ^ %{REQUEST_URI}?%1product=%2%3 [NC,NE,L,R=302]
The difference is, at the end of the rewrite condition, the question mark is removed and there is an added "or" operator: |.
The reason is that, without this change, the URL becomes product.php?color=anyvalue&product=anyvalue%3 when there is nothing filled by the final match - (&.*)? - because the question mark in that match section says it either exists or doesn't; and if it doesn't, %3 doesn't exist so it just becomes appended as a string. Instead, (&.*|) says the match can either be populated by an ampersand plus anything, or it can be populated by nothing. In this way, %3 becomes "" when there is nothing and %3 does not get appended to the string.

htaccess check length of a part of a query string

I want apache to skip certain rewrites in case part of a request is shorter than 255 characters (has to do with caching and the 255 character filename limit in linux).
I've written this:
RewriteCond %{QUERY_STRING} "utm_campaign"
RewriteCond %{QUERY_STRING} "utm_medium"
RewriteCond %{QUERY_STRING} ^(.*\/)([^\/\n]{0,255})$
RewriteRule .* - [S=2]
I tested the regex against the URL (q=path/to/page?utm_campaign=xxx&utm_medium=xxx) and it matches, but the QUERY_STRING variable seems to have different content, because the 2 rules after this still get executed. The part that should match in this case is page?utm_campaign=xxx&utm_medium=xxx (and everything after this). If this is shorter than 255 characters, the next 2 rewrite rules can be skipped.
I'm using Drupal 6 btw.
The part before the ? is not in %{QUERY_STRING} (contrary to $_SERVER['QUERY_STRING'] in PHP in this case, hence the confusion). Adding %{REQUEST_URI} to the RewriteCond solved the problem:
RewriteCond %{QUERY_STRING} "utm_campaign"
RewriteCond %{REQUEST_URI}_%{QUERY_STRING} "utm_medium"
RewriteCond %{QUERY_STRING} ^(.*\/)([^\/\n]{0,255})$
RewriteRule .* - [S=2]
Not sure how to give #Kamil Šrot credit for this solution since the answer is in a comment?