%N backreference inside RewriteCond - apache

I'm working on a virtual domain system. I have a wildcard DNS set up as *.loc, and I'm trying to work on my .htaccess file. The following code works:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www.)?example\.loc$ [NC]
RewriteCond %{REQUEST_URI} !^/example/
RewriteRule (.*) /example/$1 [L,QSA]
But, I want this to work with anything I put in. However, I need the %{REQUEST_URI} checked against the text found as the domain. I tried using this code:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www.)?([a-zA-Z0-9-]*.)?([a-zA-Z0-9-]+)\.loc$ [NC]
RewriteCond %{REQUEST_URI} !^/%3/
RewriteRule (.*) /%3/$1 [L,QSA]
But the line RewriteCond %{REQUEST_URI} !^/%3/ causes my code to throw an Internal Server Error. I understand this is because of the %N in my code but is there a way I can work with it? I need this line, otherwise, my code fails from internal redirects.
I hope this makes sense to someone. All I need is to be able to backreference a RewriteCond in a following RewriteCond.

There are two things you are doing wrong here.
First, your %{HTTP_HOST} regex is no good. You need to escape the . dots, otherwise they'll be treated as "any character that's not a newline". For a two-label hostname this makes the %3 backreference just the last character of the name before the TLD (e.g. for http://bar.loc, %3 = r).
Second, you can't use %N backreferences in the regex of a RewriteCond, only in the left-side test string; it's a somewhat odd limitation. However, you can use \1-style references inside the regex, so you can construct a clever left-side string to match against. Something like %3::%{REQUEST_URI}, which you can then match like this: !^(.*?)::/\1/?. This regex essentially says: "match and group the first block of text before the ::, then make sure the text following the :: starts with /(first block)". The leading ! negates the match, so the rule only fires when the URI does not already start with that prefix.
So your rules should look like this:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www\.)?([a-zA-Z0-9-]*\.)?([a-zA-Z0-9-]+)\.loc$ [NC]
RewriteCond %3::%{REQUEST_URI} !^(.*?)::/\1/?
RewriteRule (.*) /%3/$1 [L,QSA]
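Both points can be checked outside Apache. The sketch below uses Python's re module as a stand-in for the regex engine behind mod_rewrite, on the assumption that these particular patterns behave the same in both. The helper names third_group and needs_rewrite are made up for illustration.

```python
import re

# The asker's original pattern: the dots after "www" and after the
# optional subdomain are unescaped, so each one matches any character.
BROKEN = re.compile(r"^(www.)?([a-zA-Z0-9-]*.)?([a-zA-Z0-9-]+)\.loc$", re.I)
# The corrected pattern with those dots escaped.
FIXED = re.compile(r"^(www\.)?([a-zA-Z0-9-]*\.)?([a-zA-Z0-9-]+)\.loc$", re.I)

def third_group(pattern, host):
    """Return what %3 would hold for this host, or None if there is no match."""
    m = pattern.match(host)
    return m.group(3) if m else None

# Models: RewriteCond %3::%{REQUEST_URI} !^(.*?)::/\1/?
PREFIXED = re.compile(r"^(.*?)::/\1/?")

def needs_rewrite(name, request_uri):
    """True when the negated condition passes, i.e. the URI is not under /<name>."""
    return PREFIXED.search(f"{name}::{request_uri}") is None

print(third_group(BROKEN, "example.loc"))
print(third_group(FIXED, "example.loc"))
print(needs_rewrite("example", "/index.html"))
print(needs_rewrite("example", "/example/index.html"))
```

With the broken pattern the optional second group swallows most of the name, which is why the rewrite target ends up being a single character.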

Related

What is the meaning of ^ and $ in Apache HTTPD RewriteRule?

I have successfully added the following code to my Apache HTTPD configuration:
# Force www.
RewriteCond %{HTTP_HOST} !^www\.
RewriteRule ^(.*)$ https://www.%{HTTP_HOST}/$1 [R=301,L]
# Force https (SSL)
RewriteCond %{HTTPS} off
RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
Although it works as expected, I have a theoretical question:
Why is there a ^ and $ on the 3rd line, which enforces "www.", but not on the 6th line, which enforces "https"?
Sincerely, Dovid.
Both of your regex patterns, ^(.*)$ and (.*), behave the same. However, you don't need either of them. It is also far less error prone to avoid .* and use the %{REQUEST_URI} variable instead, since it holds the full URI (not the relative one that .* matches). So I suggest changing your rules to this:
# Force www.
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^ https://www.%{HTTP_HOST}%{REQUEST_URI} [R=301,L,NE]
# Force https (SSL)
RewriteCond %{HTTPS} off
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L,NE]
The NE flag means "no escape". It is useful when your original URI contains special characters such as #, (, ), [ or ].
The ^ in the RewriteRule pattern above does nothing except match every request: ^ means the start position of a string, and every string has one.
Both rules can be combined into a single rule, but it will look a bit complicated.
Here it is:
RewriteCond %{HTTP_HOST} !^www\. [NC,OR]
RewriteCond %{HTTPS} !on
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
RewriteRule ^ https://www.%1%{REQUEST_URI} [R=301,L,NE]
Here is the explanation of this rule:
RewriteCond %{HTTP_HOST} !^www\. [NC,OR]: if HTTP_HOST doesn't start with www.
[NC,OR]: match case-insensitively and OR this condition with the next one
RewriteCond %{HTTPS} !on: HTTPS is not turned on
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]: This condition will always match since www. is an optional match here. It is used to capture substring of HTTP_HOST without starting www. by using (.+) pattern in capture group #1 (to be back-referenced as %1 later). Note that (?:..) is a non-capturing group.
RewriteRule ^ https://www.%1%{REQUEST_URI} [R=301,L,NE]: ^ will always match. This rule will redirect to https://www.%1%{REQUEST_URI} with R=301 code by adding https:// and www. to %1. %1 is back-reference of capture group #1 from RewriteCond, as mentioned above.
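The logic of the combined rule can be modelled in a few lines. This is only a sketch in Python, assuming its re module treats these patterns the same way mod_rewrite does; redirect_target and its parameters are hypothetical names, with https_on standing in for %{HTTPS} being "on".

```python
import re

# Models: RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
HOST_RE = re.compile(r"^(?:www\.)?(.+)$", re.I)

def redirect_target(host, https_on, request_uri):
    """Return the redirect URL, or None when neither OR'ed condition fires."""
    missing_www = re.match(r"^www\.", host, re.I) is None
    if not (missing_www or not https_on):
        return None  # already https://www.<host>, so no redirect
    bare = HOST_RE.match(host).group(1)  # plays the role of %1
    return f"https://www.{bare}{request_uri}"

print(redirect_target("example.com", False, "/a"))
print(redirect_target("www.example.com", False, "/a"))
print(redirect_target("www.example.com", True, "/a"))
```

Because the third condition strips an existing www. before the rule adds it back, a host that already starts with www. is not given a second one.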
If using Apache's module mod_rewrite then you can define a RewriteRule.
RewriteRule uses a Regular Expression
The RewriteRule directive is followed by a Regular Expression (also known as a RegEx or pattern). This RegEx (e.g. ^(.*)$) is matched against incoming URLs in order to rewrite them.
Regular Expressions are coded using special characters
Within a RegEx pattern ^ marks the start of the line to match, whereas the end is denoted by $.
Both are called metacharacters and have special meaning:
^: Matches the starting position within the string. In line-based tools, it matches the starting position of any line.
$: Matches the ending position of the string or the position just before a string-ending newline. In line-based tools, it matches the ending position of any line.
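Why are they often redundant?
Since a URL reaching the HTTP server is always a single line, and .* already matches a whole line, these anchors can be omitted from a pattern like ^(.*)$ without affecting the rewrite rule. With a more specific pattern they do matter: ^admin$ matches only admin, while admin matches anywhere in the path.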
They're the same. There's no difference between ^(.*)$ and (.*).
.* matches any string. ^ and $ don't change that since all strings have a start and end.
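A quick check of that claim, sketched in Python on the assumption that its re module agrees with mod_rewrite for these simple patterns. The captures helper is just an illustrative name.

```python
import re

def captures(pattern, path):
    """Return group 1 of the first match of pattern in path."""
    return re.search(pattern, path).group(1)

path = "blog/2020/post.html"
# With .* the anchors change nothing: both capture the whole path.
print(captures(r"^(.*)$", path) == captures(r"(.*)", path) == path)
# A bare ^ matches any string, even an empty one.
print(re.search(r"^", "") is not None)
# With a more specific pattern the anchors do make a difference.
print(re.search(r"admin", "x/admin/y") is not None)
print(re.search(r"^admin$", "x/admin/y") is None)
```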
It depends on whether you made the certificate for the domain without www or with www.
In the provided example the redirection (6th line) is done to the domain without www. That guarantees that the correct certificate will be served and the browser won't display an alert while visiting your site.

Remove trailing slashes from Query String Apache

I am having an issue trying to remove trailing slashes from the end of a query string in apache.
I have the following rewrite rules in place right now to make the URL and Query String all lowercase:
RewriteEngine On
RewriteMap lc int:tolower
RewriteCond %{REQUEST_URI} ^[^A-Z]*[A-Z].* [OR]
RewriteCond %{QUERY_STRING} ^[^A-Z]*[A-Z].*
RewriteRule ^ ${lc:%{REQUEST_URI}}?${lc:%{QUERY_STRING}} [L,R=301]
I have tried to add:
RewriteCond %{QUERY_STRING} (.+)/$
RewriteRule ^ %1 [R=301,L]
But it breaks the website. I have been searching for a way to do this but haven't come up with any solutions yet. I tried the answers from this post but they didn't work.
The reason I need to do this is because our application firewall looks for "ID" in the url and if there is any non alphanumeric character that comes after then it blocks the request. The firewall is implemented after the Apache request hits the server.
Hoping someone with more experience with Apache Rewrite rules can help me out. Thanks in advance.
To remove trailing slash from query string you can use this rule:
RewriteCond %{QUERY_STRING} ^(.+)/$
RewriteRule ^ %{REQUEST_URI}?%1 [R=301,L,NE]
Make sure this is the first rule in your .htaccess, directly below the RewriteEngine On line.
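What the condition captures can be tried out with a small Python sketch (assuming Python's re behaves like the mod_rewrite engine for this pattern; strip_trailing_slash is a hypothetical helper name).

```python
import re

def strip_trailing_slash(request_uri, query_string):
    """Return the redirect target, or None when the condition does not match."""
    # Models: RewriteCond %{QUERY_STRING} ^(.+)/$
    m = re.search(r"^(.+)/$", query_string)
    if m is None:
        return None
    # Models: RewriteRule ^ %{REQUEST_URI}?%1
    return f"{request_uri}?{m.group(1)}"

print(strip_trailing_slash("/page", "id=123/"))
print(strip_trailing_slash("/page", "id=123"))
print(strip_trailing_slash("/page", "id=123//"))
```

Note that each pass removes a single slash, so a query string ending in several slashes would take one redirect per slash.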

Apache Rewrite rules: How to check for host name in cookies?

I am setting a cookie using rewrite rules, and that is working (simplified for the sake of brevity):
RewriteCond %{QUERY_STRING} set_cookie=1 [NC]
RewriteRule .* http://%{HTTP_HOST}%{REQUEST_URI}?skip=1 [QSA,NE,NC,L,CO=test_%{HTTP_HOST}:tmp:%{HTTP_HOST}:5:/]
This one sets a cookie with the name test_{host_name}. Now I want to read that cookie value on the next request. I tried this (and some variants), but it does not seem to work.
RewriteCond %{QUERY_STRING} skip [NC]
RewriteCond %{HTTP_COOKIE} ^.*test_%{HTTP_HOST}=tmp.*$ [NC]
RewriteRule ^(.*)$ - [L]
When I was googling, I found an article that stated the following:
If you are wondering, "Why not use %{HTTP_HOST} instead of corz.org,
create universal code?", as far as I know, it's not possible to test
one server variable against another with a RewriteCond without using
Atomic Back References and some serious POSIX 1003.2+ Jiggery-Pokery.
I guess that's my problem, but I am sort of at a loss on how to solve it. Any help is greatly appreciated.
Regards,
Joost.
There's a useful trick in this area. It is simple regex, but a unique kind of mod_rewrite style.
Note that I am not being too careful about the matching here, especially in the first condition; it is only there to illustrate the second condition:
RewriteEngine ON
RewriteCond %{HTTP_COOKIE} test_([^;]*)=tmp.*$
RewriteCond %1<>%{HTTP_HOST} ^(.+)<>\1
RewriteRule .* - [F]
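The unusual part (for mod_rewrite) is that %{VAR} variables and %N backreferences are only expanded in the first argument (the test string), but the second argument (the pattern) can still use regex backreferences such as \1, which refer to groups in the current pattern, not in a preceding condition.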
The little <> is just something unlikely to appear as a separator.
I have found a solution. This part in my original question
RewriteCond %{QUERY_STRING} skip [NC]
RewriteCond %{HTTP_COOKIE} ^.*test_%{HTTP_HOST}=tmp.*$ [NC]
RewriteRule ^(.*)$ - [L]
should be replaced by the following
RewriteCond %{QUERY_STRING} skip [NC]
RewriteCond %{HTTP_HOST}##%{HTTP_COOKIE} ^([^#]*)##.*test_\1=tmp.* [NC]
RewriteRule ^(.*)$ - [L]
Only the second RewriteCond has changed. Its left hand side (%{HTTP_HOST}##%{HTTP_COOKIE}) concatenates the HTTP host and cookie values, using ## as glue (## doesn't mean anything in particular, it's just unlikely to appear in a normal host or cookie string).
The right hand side (^([^#]*)##.*test_\1=tmp.*) matches everything to the first "#", which is the host name, and then checks if it can be found somewhere in the cookie values, preceded by "test_" and followed by "=tmp".
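The same concatenate-and-backreference check can be sketched in Python, assuming its re module handles \1 the way the mod_rewrite engine does here. The function name has_host_cookie and the sample cookie strings are invented for illustration.

```python
import re

# Models: RewriteCond %{HTTP_HOST}##%{HTTP_COOKIE} ^([^#]*)##.*test_\1=tmp.* [NC]
COND = re.compile(r"^([^#]*)##.*test_\1=tmp.*", re.I)

def has_host_cookie(http_host, http_cookie):
    """True when the cookie header contains test_<host>=tmp for this host."""
    return COND.search(f"{http_host}##{http_cookie}") is not None

print(has_host_cookie("example.com", "a=b; test_example.com=tmp"))
print(has_host_cookie("example.com", "a=b; test_other.com=tmp"))
```

A backreference matches the captured text literally, so the dots in the host name are not treated as wildcards in the second half of the pattern.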

Apache mod_rewrite query string without specific order

I have a rule that translates old-style urls into new style. It works ok as long as I use the same order of parameters in the query:
RewriteCond %{QUERY_STRING} ^country=([a-z]{2})&id=([0-9]+)$ [NC]
RewriteRule ^(.*)$ http://%1.localhost/%2? [R=301,L]
So url localhost/index.php?country=us&id=1234 would go to us.localhost/1234
But the problem is that using localhost/index.php?id=1234&country=us (note that arguments are now swapped in order) then the rule of course doesn't apply.
I thought about changing the rule to handle arguments separately, like this:
RewriteCond %{QUERY_STRING} country=([a-z]{2}) [NC]
RewriteCond %{QUERY_STRING} id=([0-9]+) [NC]
RewriteRule ^(.*)$ http://%1.localhost/%2? [R=301,L]
But when entering localhost/index.php?id=1234&country=us I get 1234.localhost/us which is not what I expect (I'd expect first cond to give me %1 and second cond %2 but it seems the order isn't determined this way)
Is there any easy way to achieve this? Of course I could write two separate rules each handling each case, but was wondering if some generic approach could be used (think if we had 3 parameters then permutations would make this unmanageable)
From the documentation, it looks like you can only reference groups from the last matched condition:
RewriteCond backreferences: These are backreferences of the form %N (0 <= N <= 9). %1 to %9 provide access to the grouped parts (again, in parentheses) of the pattern, from the last matched RewriteCond in the current set of conditions. %0 provides access to the whole string matched by that pattern.
I've been racking my brains on a good way to do this and the best I can come up with is to use multiple rules to extract the data into environment variables and then reference those at the end for your final rewrite.
Something like this:
RewriteCond %{QUERY_STRING} country=([a-z]{2}) [NC]
RewriteRule ^(.*)$ - [E=URL_COUNTRY:%1]
RewriteCond %{QUERY_STRING} id=([0-9]+) [NC]
RewriteRule ^(.*)$ - [E=URL_ID:%1]
RewriteRule ^(.*)$ http://%{ENV:URL_COUNTRY}.localhost/%{ENV:URL_ID}? [R=301,L]
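The idea of extracting each parameter separately and combining them at the end can be sketched in Python (again assuming its re module is an adequate stand-in for these patterns). The env dictionary plays the role of the environment variables; the final guard that returns None when a parameter is missing is an addition of this sketch, since the last rule as posted has no condition of its own.

```python
import re

def redirect_for(query_string):
    """Build the redirect URL from country and id, whatever their order."""
    env = {}
    # Models: RewriteCond %{QUERY_STRING} country=([a-z]{2}) [NC] + E=URL_COUNTRY:%1
    m = re.search(r"country=([a-z]{2})", query_string, re.I)
    if m:
        env["URL_COUNTRY"] = m.group(1)
    # Models: RewriteCond %{QUERY_STRING} id=([0-9]+) [NC] + E=URL_ID:%1
    m = re.search(r"id=([0-9]+)", query_string, re.I)
    if m:
        env["URL_ID"] = m.group(1)
    # Assumption of this sketch: skip the redirect if either value is missing.
    if "URL_COUNTRY" not in env or "URL_ID" not in env:
        return None
    return f"http://{env['URL_COUNTRY']}.localhost/{env['URL_ID']}"

print(redirect_for("country=us&id=1234"))
print(redirect_for("id=1234&country=us"))
```

Each value is stored under its own name, so the result no longer depends on which condition happened to match last.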

Use rewrite rule by query string

How can I make that if the QUERY_STRING matches something it would use that rule?
Options +FollowSymlinks -MultiViews
RewriteEngine On
RewriteBase /
RewriteRule ^(.*)/?$ index.php?uri=$1 [L,QSA]
RewriteCond %{QUERY_STRING} uri=admin
RewriteRule ^admin\?(.*)/?$ index.php?uri=admin&q=$1 [L,QSA]
Eg. http://localhost/admin?poll/new
What comes after the ? should be the parameter q, so the query would be uri=admin&q=poll/new
Any idea on how I could do this?
Thanks.
Well, it happens that your problem is more simple than the link I gave you as you do not want any analysis on the query string content.
If you use this single line:
RewriteRule ^admin index.php?uri=admin&q= [L,QSA]
Here QSA means "append the original query string to the result". You will get an internal redirection to:
index.php?uri=admin&q=&poll/new
Which is not OK. This is because the way you pass the argument (admin?poll/new) is not the standard way. So it seems we need to capture the query string content and put it into the RewriteRule by hand. This should work (if you need it only for the /admin URL):
RewriteCond %{REQUEST_FILENAME} admin [NC]
RewriteCond %{QUERY_STRING} (.*) [NC]
RewriteRule .* index.php?uri=admin&q=%1 [L]
Where %1 is the first parenthesised match in the RewriteCond, (.*), meaning everything in the query string, the query string being anything after the question mark. So in fact this allows admin?poll/new but also admin?poll/new&foo=toto, giving index.php?uri=admin&q=poll/new&foo=toto
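The difference between the two approaches comes down to how the query string is joined onto the target. A minimal Python sketch, with with_qsa and with_capture as made-up names for the two variants:

```python
import re

def with_qsa(query_string):
    """Models: RewriteRule ^admin index.php?uri=admin&q= [L,QSA]"""
    target = "index.php?uri=admin&q="
    # QSA appends the original query string after an extra "&".
    return target + ("&" + query_string if query_string else "")

def with_capture(query_string):
    """Models: RewriteCond %{QUERY_STRING} (.*) + RewriteRule .* ...&q=%1"""
    captured = re.search(r"(.*)", query_string).group(1)  # plays the role of %1
    return f"index.php?uri=admin&q={captured}"

print(with_qsa("poll/new"))
print(with_capture("poll/new"))
print(with_capture("poll/new&foo=toto"))
```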