htaccess environment variable based on query string without some parameters - apache

I want to set an Apache environment variable based on the query string, but without some of its parameters.
For example, I have this query string:
utm_source=foo&my_param=baz&utm_medium=bar&_t=9999
Now I want to set a variable without utm_source and utm_medium:
my_param=baz&_t=9999
I only know the parameters to remove (utm_source and utm_medium); the others are just an example.
I have written this code:
RewriteEngine On
RewriteBase /
RewriteRule ^ - [E=CustomQueryString:%{QUERY_STRING}]
RewriteCond %{ENV:CustomQueryString} ^(.*)&?(utm_source|utm_medium)=[^&]+&?(.*)$ [NC]
RewriteRule ^ - [E=CustomQueryString:%1%3]
The output is
utm_source=foo&my_param=baz&_t=9999
Why does the utm_source parameter persist?
I have also tried:
RewriteRule ^ - [E=CustomQueryString:%{QUERY_STRING}]
RewriteCond %{ENV:CustomQueryString} ^(.*)&?utm_source=[^&]+&?(.*)$ [NC]
RewriteRule ^ - [E=CustomQueryString:%1%2]
RewriteCond %{ENV:CustomQueryString} ^(.*)&?utm_medium=[^&]+&?(.*)$ [NC]
RewriteRule ^ - [E=CustomQueryString:%1%2]
The output is right!
a=b&my_param=baz&_t=9999
I don't understand why this version doesn't work:
RewriteCond %{ENV:CustomQueryString} ^(.*)&?(utm_source|utm_medium)=[^&]+&?(.*)$ [NC]
RewriteRule ^ - [E=CustomQueryString:%1%3]

This condition
RewriteCond %{ENV:CustomQueryString} ^(.*)&?(utm_source|utm_medium)=[^&]+&?(.*)$ [NC]
doesn't remove both utm_source and utm_medium, because it matches only one of them.
.* matches the longest possible string, so the first group swallows whichever of utm_source or utm_medium comes first. See also Repetition with Star and Plus, especially the section Watch Out for The Greediness! and following, for a detailed explanation.
utm_source|utm_medium means, in plain English: match either utm_source or utm_medium, but only one of them per match. So for the example query string the regular expression matches
(.*) - %1, including utm_source
(utm_source|utm_medium)=[^&]+ - utm_medium=...
(.*) - %3, everything else after utm_medium
The second variant with two conditions and rules works, because first you remove utm_source=..., and then pass the remaining string to the second condition. There the utm_medium=... part is removed, and finally you have the desired string without both utm_source and utm_medium.
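The greedy behaviour described above can be reproduced outside Apache. The following is a minimal sketch in Python, assuming that Python's re module treats these particular patterns the same way as the PCRE engine used by mod_rewrite; the helper strip_param is a hypothetical name introduced only for this illustration.

```python
import re

query = "utm_source=foo&my_param=baz&utm_medium=bar&_t=9999"

# Single pattern with an alternation, as in the question.
single = re.compile(r"^(.*)&?(utm_source|utm_medium)=[^&]+&?(.*)$", re.IGNORECASE)
m = single.match(query)
captured_name = m.group(2)          # the one parameter the alternation matched
one_pass = m.group(1) + m.group(3)  # corresponds to %1%3 in the RewriteRule


def strip_param(value, name):
    """Remove one name=value pair, mirroring one RewriteCond/RewriteRule pair."""
    pattern = re.compile(r"^(.*)&?" + re.escape(name) + r"=[^&]+&?(.*)$", re.IGNORECASE)
    found = pattern.match(value)
    return found.group(1) + found.group(2) if found else value


# Two passes, as in the second variant from the question.
two_pass = strip_param(strip_param(query, "utm_source"), "utm_medium")

print(captured_name)  # utm_medium
print(one_pass)       # utm_source=foo&my_param=baz&_t=9999
print(two_pass)       # my_param=baz&_t=9999
```

The single pattern removes only utm_medium because the greedy first group keeps utm_source, while the two passes remove one parameter each.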
You may enclose the test string in &s. This guarantees that there is always an ampersand before and after the utm_source=... part, which allows you to simplify the regular expression a little.
RewriteCond &%{ENV:CustomQueryString}& ^(.*)&utm_source=.+?&(.*)$ [NC]
Same goes for the second RewriteCond with utm_medium.

You can use these two rules to remove both query parameters:
RewriteRule ^ - [E=CustomQueryString:%{QUERY_STRING}]
# remove utm_source from query string and set env var
RewriteCond %{ENV:CustomQueryString} ^(.*&)?utm_source=[^&]*(?:&(.*))?$ [NC]
RewriteRule ^ - [E=CustomQueryString:%1%2]
# remove utm_medium from query string and reset env var
RewriteCond %{ENV:CustomQueryString} ^(.*&)?utm_medium=[^&]*(?:&(.*))?$ [NC]
RewriteRule ^ - [E=CustomQueryString:%1%2]
If we start with the URL /?utm_source=foo&my_param=baz&utm_medium=bar&_t=9999
then after the second rule
$_SERVER["CustomQueryString"] becomes `my_param=baz&_t=9999`

Related

htaccess send 404 if query string contains keyword

I'm seeing a lot of traffic which I suspect is probing for a flaw or exploit with the request format of
https://example.com/?testword
I figured while I look into this more I could save resources and disrupt or discourage these requests with a 404 or 500 response
I have tried
RewriteEngine On
RewriteCond %{QUERY_STRING} !(^|&)testword($|&) [NC]
RewriteRule https://example.com/ [L,R=404]
And some other variations on the query string match, but none seem to return 404 when testing. Other questions I have found look for query string values/pairs and rewrite them, but no examples seem to exist for just a single value.
RewriteCond %{QUERY_STRING} !(^|&)testword($|&) [NC]
RewriteRule https://example.com/ [L,R=404]
There are a few issues here:
The CondPattern in your condition is negated (! prefix), so it's only successful when the testword is not present in the query string.
The RewriteRule directive is missing the pattern (first) argument (or substitution (second) argument depending on how you look at it). The RewriteRule directive matches against the URL-path only.
When you specify a non-3xx status code for the R flag, the substitution is ignored. You should specify a single hyphen (-) to indicate no substitution.
To test that the whole-word "testword" exists anywhere in the query string, you can use the regex \btestword\b - where \b are word boundaries. Or maybe you simply want the regex testword - to match "testword" literally anywhere, including when it appears as part of another word? In comparison, the regex (^|&)testword($|&) would miss instances where "testword" appears as a URL parameter name.
Try the following instead:
RewriteCond %{QUERY_STRING} \btestword\b [NC]
RewriteRule ^$ - [R=404]
This matches the homepage only (ie. empty URL-path). The L flag is not required when specifying a non-3xx return status, it is implied.
The - (second argument) indicates no substitution. As mentioned above, when specifying a non-3xx HTTP status, the substitution string is ignored anyway.
To test any URL-path then simply remove the $ (end-of-string anchor) on the RewriteRule pattern. For example:
RewriteCond %{QUERY_STRING} \btestword\b [NC]
RewriteRule ^ - [R=404]
If your homepage doesn't accept any query string parameters then you could simply reject the request (ie. 404 Not Found) when a query string is present. For example:
RewriteCond %{QUERY_STRING} .
RewriteRule ^$ - [R=404]
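The difference between the three candidate patterns discussed in this answer can be checked with a small script. This is a sketch in Python, assuming its re module handles \b and the alternations the same way as Apache's PCRE for these inputs; the dictionary keys and the function name matching_patterns are hypothetical labels chosen for the illustration.

```python
import re

# The three patterns compared in the answer.
PATTERNS = {
    "word_boundary": r"\btestword\b",
    "bare_value": r"(^|&)testword($|&)",
    "substring": r"testword",
}


def matching_patterns(query_string):
    """Return the labels of all patterns that match the query string ([NC] style)."""
    return sorted(
        name
        for name, pattern in PATTERNS.items()
        if re.search(pattern, query_string, re.IGNORECASE)
    )


for sample in ["testword", "testword=1", "a=1&TestWord", "mytestwordx", ""]:
    print(repr(sample), matching_patterns(sample))
```

Only the substring pattern matches mytestwordx, and the (^|&)testword($|&) pattern is the only one of the three that misses testword=1, which is the parameter-name case mentioned above.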

Create two environmental variables with one RewriteCond

I have query strings like:
?request=/de/name/dieter
Using the rule below I catch the last part (dieter) and store it as an environment variable. I'd also like to be able to store the first part of the URL (de) as an environment variable, but I can't find a way to do that. Is it possible?
Current rule:
RewriteCond %{QUERY_STRING} ^request=([a-z\/]*)name\/(.*?)([^/]{3})([^/]+) [NC]
RewriteRule .* - [E=N:%2%3/%4]
RewriteRule .* - [E=LANG:%1]
Generally, the "rewrite flags" [] portion of a RewriteRule accepts a comma-separated list of flags, and the RewriteRule docs do not explicitly say that you may not repeat a flag. Since E= is a flag, it should work to provide a comma-separated list of E= along with other rewrite flags (like [L] if needed)
RewriteCond %{QUERY_STRING} ^request=([a-z\/]*)name\/(.*?)([^/]{3})([^/]+) [NC]
RewriteRule .* - [E=N:%2%3/%4,E=LANG:%1]

301 redirect old parameter names to new parameter names by htaccess

I just changed two parameter names and want to redirect the old names to the new ones, with any values, anywhere in the URL. E.g.:
product.php?colornew=anyvalue&productname=anyvalue
301 redirect to:
product.php?color=anyvalue&product=anyvalue
Please note that this is just an example and, as I said, these two parameters can be anywhere in the URL with any value.
You can use this code to rename your query parameters in any URL:
RewriteEngine On
# rename query parameter colornew=>color
RewriteCond %{QUERY_STRING} ^(.*&)?colornew=([^&]*)(&.*)?$ [NC]
RewriteRule ^ %{REQUEST_URI}?%1color=%2%3 [NC]
# rename query parameter productname=>product
RewriteCond %{QUERY_STRING} ^(.*&)?productname=([^&]*)(&.*)?$ [NC]
RewriteRule ^ %{REQUEST_URI}?%1product=%2%3 [NC,NE,L,R=302]
Try:
RewriteEngine on
# No QSA here: it would append the original query string (with the old names) to the redirect target
RewriteCond %{THE_REQUEST} \?colornew=([^&]+)&productname=([^&\s]+)
RewriteRule ^ %{REQUEST_URI}?color=%1&product=%2 [NE,L,R=301]
A simple fix of anubhava's otherwise excellent answer:
RewriteEngine On
# rename query parameter colornew=>color
RewriteCond %{QUERY_STRING} ^(.*&)?colornew=([^&]*)(&.*|)$ [NC]
RewriteRule ^ %{REQUEST_URI}?%1color=%2%3 [NC]
# rename query parameter productname=>product
RewriteCond %{QUERY_STRING} ^(.*&)?productname=([^&]*)(&.*|)$ [NC]
RewriteRule ^ %{REQUEST_URI}?%1product=%2%3 [NC,NE,L,R=302]
The difference is, at the end of the rewrite condition, the question mark is removed and there is an added "or" operator: |.
The reason is that, without this change, the URL becomes product.php?color=anyvalue&product=anyvalue%3 when there is nothing filled by the final match - (&.*)? - because the question mark in that match section says it either exists or doesn't; and if it doesn't, %3 doesn't exist so it just becomes appended as a string. Instead, (&.*|) says the match can either be populated by an ampersand plus anything, or it can be populated by nothing. In this way, %3 becomes "" when there is nothing and %3 does not get appended to the string.
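The distinction between an optional group and a group with an empty alternative can be observed in any regex engine. The sketch below uses Python's re module and only shows whether the third group takes part in the match; how Apache expands %3 for a group that did not take part is taken from the answer above and is not verified here. The sample query string is an assumption chosen so that productname is the last parameter.

```python
import re

query = "color=red&productname=shirt"  # productname is the last parameter

# (&.*)? may be skipped entirely, so group 3 takes no part in the match.
optional_group = re.match(r"^(.*&)?productname=([^&]*)(&.*)?$", query)

# (&.*|) always takes part; here it matches the empty alternative.
empty_alternative = re.match(r"^(.*&)?productname=([^&]*)(&.*|)$", query)

print(optional_group.group(3))           # None
print(repr(empty_alternative.group(3)))  # ''
```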

What does "^." and "!" mean in apache config?

I have following rewrite rule:
# Rewriting without query parameters to avoid cache overloading
RewriteCond %{REQUEST_URI} /(en|fr)/search-results.html
RewriteCond %{QUERY_STRING} ^.
RewriteCond %{QUERY_STRING} !referrerPage=automotive-home
RewriteRule ^(.*)/search-results.html$ $1/search-results.html? [NC]
As I understand it,
RewriteCond %{REQUEST_URI} /(en|fr)/search-results.html
will be true if the requested URL looks like:
https://www.trololo.com/en/search-results.html
https://www.trololo.com/fr/search-results.html
Please explain the last two RewriteConds:
RewriteCond %{QUERY_STRING} ^.
RewriteCond %{QUERY_STRING} !referrerPage=automotive-home
RewriteCond %{QUERY_STRING} ^.
Does this mean that QUERY_STRING is not blank?
%{QUERY_STRING} !referrerPage=automotive-home
Does this mean that QUERY_STRING doesn't contain referrerPage=automotive-home?
The regex ^. matches any one character at the start of the string. The ^ itself anchors the match to the start of the string and is not really needed for a generic expression like this; it could have been omitted.
The . matches any one character, so in this context the query string must have at least one character; if the query string is empty, the condition is not met.
# If the requested query string is *not empty, having at least one character*
RewriteCond %{QUERY_STRING} ^.
# ...and the query string does not contain "referrerPage=automotive-home"
# It doesn't need to be the complete expression because it is not anchored
# with ^ at the start and $ at the end, so this pattern will match
# if it appears anywhere in the query string
RewriteCond %{QUERY_STRING} !referrerPage=automotive-home
# If the above 2 conditions were met, the next `RewriteRule` will be processed.
# This rewrite rule's purpose is to erase the query string. Since it terminates in
# ? without a [QSA] flag, any existing query string will be removed
RewriteRule ^(.*)/search-results.html$ $1/search-results.html? [NC]
In this case, the first RewriteCond could just be expressed without the ^
RewriteCond %{QUERY_STRING} .
As mentioned in the comments, the ! negates the subsequent expression. This, along with the anchors and the . character are documented in the mod_rewrite regex vocabulary.
Finally, beginning with Apache 2.4, there is a [QSD] ("query string discard") flag which achieves the same thing as ending the target URI with ? to erase the query string.
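The combined effect of the two query string conditions can be written as a small predicate. This is a sketch in Python, assuming its re module behaves like Apache's regex engine for these two simple patterns; the function name should_strip_query is hypothetical.

```python
import re


def should_strip_query(query_string):
    """True when both RewriteCond lines on QUERY_STRING would be satisfied."""
    # RewriteCond %{QUERY_STRING} ^.   (at least one character)
    has_query = re.search(r"^.", query_string) is not None
    # RewriteCond %{QUERY_STRING} !referrerPage=automotive-home   (negated, unanchored)
    has_referrer = re.search(r"referrerPage=automotive-home", query_string) is not None
    return has_query and not has_referrer


for sample in ["", "a=1", "referrerPage=automotive-home", "x=1&referrerPage=automotive-home&y=2"]:
    print(repr(sample), should_strip_query(sample))
```

Only a non-empty query string without referrerPage=automotive-home satisfies both conditions, which is the case in which the rule erases the query string.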
RewriteCond %{QUERY_STRING} ^. ==> the requested query string is not empty, having at least one character
%{QUERY_STRING} !referrerPage=automotive-home ==> the query string does not contain "referrerPage=automotive-home"

htaccess check length of a part of a query string

I want apache to skip certain rewrites in case part of a request is shorter than 255 characters (has to do with caching and the 255 character filename limit in linux).
I've written this:
RewriteCond %{QUERY_STRING} "utm_campaign"
RewriteCond %{QUERY_STRING} "utm_medium"
RewriteCond %{QUERY_STRING} ^(.*\/)([^\/\n]{0,255})$
RewriteRule .* - [S=2]
I tested the regex against the URL (q=path/to/page?utm_campaign=xxx&utm_medium=xxx) and it matches, but the QUERY_STRING variable seems to have different content, because the 2 rules after this still get executed. The part that should match in this case is page?utm_campaign=xxx&utm_medium=xxx (and everything after it). If this is shorter than 255 characters, the next 2 rewrite rules can be skipped.
I'm using Drupal 6 btw.
The part before the ? is not in %{QUERY_STRING} (contrary to $_SERVER['QUERY_STRING'] in PHP in this case, hence the confusion), adding %{REQUEST_URI} to the RewriteCond solved the problem:
RewriteCond %{QUERY_STRING} "utm_campaign"
RewriteCond %{REQUEST_URI}%_{QUERY_STRING} "utm_medium"
RewriteCond %{QUERY_STRING} ^(.*\/)([^\/\n]{0,255})$
RewriteRule .* - [S=2]
Not sure how to give #Kamil Šrot credit for this solution since the answer is in a comment?
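Why the length check never matched against the query string alone can be illustrated as follows. This is a sketch in Python, with the host example.com and the joining of path and query with a ? both being assumptions made for the illustration; it does not reproduce the exact test string used in the answer.

```python
import re
from urllib.parse import urlsplit

# The length check from the question: a final segment of at most 255 characters.
LENGTH_CHECK = re.compile(r"^(.*/)([^/\n]{0,255})$")

url = "http://example.com/path/to/page?utm_campaign=xxx&utm_medium=xxx"
parts = urlsplit(url)

query_only = parts.query                         # roughly what %{QUERY_STRING} holds
path_and_query = parts.path + "?" + parts.query  # path and query tested together

query_only_match = LENGTH_CHECK.match(query_only)
combined_match = LENGTH_CHECK.match(path_and_query)

print(query_only_match)         # None: the query string contains no slash
print(combined_match.group(2))  # page?utm_campaign=xxx&utm_medium=xxx
```

The pattern requires at least one slash, which only the path part supplies, so the check can only succeed once the path is included in the test string.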