What is the meaning of ^ and $ in Apache HTTPD RewriteRule?

I have successfully added the following code to my Apache HTTPD configuration:
# Force www.
RewriteCond %{HTTP_HOST} !^www\.
RewriteRule ^(.*)$ https://www.%{HTTP_HOST}/$1 [R=301,L]
# Force https (SSL)
RewriteCond %{HTTPS} off
RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
Although it works as expected, I have a theoretical question:
Why is there a ^ and a $ in the 3rd line (enforcing "www."), but not in the 6th line (enforcing "https")?
Sincerely, Dovid.

Your two regex patterns, ^(.*)$ and (.*), behave the same. However, you don't need either of them. In fact, it is also far less error-prone not to use .* at all and to use the %{REQUEST_URI} variable instead, which contains the full URL-path (not the relative path that .* matches in per-directory context such as .htaccess). So I suggest changing your rules to this:
# Force www.
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^ https://www.%{HTTP_HOST}%{REQUEST_URI} [R=301,L,NE]
# Force https (SSL)
RewriteCond %{HTTPS} off
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L,NE]
The NE (noescape) flag stops mod_rewrite from escaping special characters in the substitution. It is useful in case your original URI contains special characters such as #, (, ), [ or ].
The ^ in the RewriteRule pattern above does nothing except match every request: ^ means the start position of the string, and every string has a start, so it always matches.
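A quick way to check both claims is to try the patterns in Python's re module. This is only a stand-in for Apache: mod_rewrite uses PCRE, but for these simple patterns the two engines agree. The helper name group1 and the sample paths are made up for illustration.

```python
import re


def group1(pattern: str, path: str):
    """Return capture group 1 of the first match of pattern in path, or None."""
    m = re.search(pattern, path)
    return m.group(1) if m else None


# Made-up request paths, as a RewriteRule pattern might see them in .htaccess.
for path in ["", "index.html", "blog/2020/post", "a/b/c/"]:
    # A bare ^ only asserts "start of string", which every string has.
    assert re.search(r"^", path) is not None
    # ^(.*)$ and (.*) capture the same text on a single-line string,
    # because .* greedily consumes everything from the first position.
    assert group1(r"^(.*)$", path) == group1(r"(.*)", path) == path

print("^ matched every path, and ^(.*)$ captured the same as (.*)")
```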
Both rules can be combined into a single rule, although it looks a bit more complicated.
Here it is:
RewriteCond %{HTTP_HOST} !^www\. [NC,OR]
RewriteCond %{HTTPS} !on
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
RewriteRule ^ https://www.%1%{REQUEST_URI} [R=301,L,NE]
Here is the explanation of this rule:
RewriteCond %{HTTP_HOST} !^www\. [NC,OR]: if HTTP_HOST doesn't start with www.
[NC,OR]: match case-insensitively, and OR this condition with the next one
RewriteCond %{HTTPS} !on: HTTPS is not turned on
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]: This condition always matches, since www. is optional here. It is used to capture the part of HTTP_HOST after any leading www. with the (.+) pattern in capture group #1 (back-referenced later as %1). Note that (?:...) is a non-capturing group.
RewriteRule ^ https://www.%1%{REQUEST_URI} [R=301,L,NE]: ^ always matches. This rule redirects to https://www.%1%{REQUEST_URI} with a 301 status code, prefixing https://www. to %1. As mentioned above, %1 is the back-reference to capture group #1 from the RewriteCond.
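To make the condition logic concrete, here is a rough simulation of the combined rule in Python, again using re in place of PCRE. The function canonical_redirect and its parameters are hypothetical: host stands for %{HTTP_HOST}, https for %{HTTPS} ("on" or "off"), and request_uri for %{REQUEST_URI}. Query-string handling is deliberately ignored.

```python
import re
from typing import Optional


def canonical_redirect(host: str, https: str, request_uri: str) -> Optional[str]:
    """Mimic the combined rule: return the redirect target, or None if no redirect."""
    # RewriteCond %{HTTP_HOST} !^www\. [NC,OR]
    host_lacks_www = re.search(r"^www\.", host, re.IGNORECASE) is None
    # RewriteCond %{HTTPS} !on
    https_not_on = re.search(r"on", https) is None
    if not (host_lacks_www or https_not_on):
        return None  # already canonical, so the rule does not fire
    # RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]  (always matches)
    bare_host = re.search(r"^(?:www\.)?(.+)$", host, re.IGNORECASE).group(1)
    # RewriteRule ^ https://www.%1%{REQUEST_URI} [R=301,L,NE]
    return f"https://www.{bare_host}{request_uri}"


print(canonical_redirect("example.com", "on", "/shop/cart"))  # https://www.example.com/shop/cart
print(canonical_redirect("www.example.com", "off", "/"))      # https://www.example.com/
print(canonical_redirect("www.example.com", "on", "/"))       # None
```

The third condition only exists to produce %1; in the sketch that is the bare_host capture, which is always available because the www. prefix is optional.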

If you use Apache's mod_rewrite module, you can define a RewriteRule.
RewriteRule uses a Regular Expression
The keyword or directive RewriteRule is followed by a Regular Expression (also known as RegEx or pattern). This RegEx (e.g. ^(.*)$) is used to match input URLs in order to rewrite them.
Regular Expressions are coded using special characters
Within a RegEx pattern, ^ marks the start of the line to match, whereas $ denotes the end.
Both are called metacharacters and have special meaning:
^: Matches the starting position within the string. In line-based tools, it matches the starting position of any line.
$: Matches the ending position of the string or the position just before a string-ending newline. In line-based tools, it matches the ending position of any line.
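Why are they often redundant?
Since a URL reaching the HTTP server is always a single line, these anchors add nothing to a pattern like ^(.*)$: .* already consumes the whole line, so (.*) matches exactly the same. They still matter for other patterns, though: ^foo only matches paths that start with foo, whereas foo matches foo anywhere in the path.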
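The difference between "redundant with .*" and "significant with literal text" can be seen in a few lines of Python. This is a sketch using re as a stand-in for PCRE; the helper matches and the path /docs/foo/bar are invented for the example.

```python
import re


def matches(pattern: str, subject: str) -> bool:
    """True if pattern matches anywhere in subject (an unanchored search)."""
    return re.search(pattern, subject) is not None


path = "/docs/foo/bar"  # made-up URL-path

# With .* the anchors are redundant: both forms match every single-line path.
print(matches(r"^(.*)$", path), matches(r"(.*)", path))  # True True

# With literal text they are not: "foo" matches anywhere, "^foo" only at the start.
print(matches(r"foo", path), matches(r"^foo", path))     # True False

# $ also matches just before a string-ending newline (Python and PCRE agree here).
print(matches(r"bar$", "bar\n"))                         # True
```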

They're the same. There's no difference between ^(.*)$ and (.*).
.* matches any string. ^ and $ don't change that since all strings have a start and end.

It depends on whether you issued the certificate for the domain without www or with www.
In the provided example, the redirection on the 6th line goes to %{HTTP_HOST} as requested, without adding www. That guarantees that the correct certificate will be served and the browser won't display a warning while visiting your site.

Related

replace domain name in htaccess rule with dynamic hostname

I have this rule in htaccess:
1 ## Protect from spam bots ##
2 RewriteEngine On
3 RewriteCond %{REQUEST_METHOD} POST
4 RewriteCond %{REQUEST_URI} .wp-comments-post\.php*
5 RewriteCond %{HTTP_REFERER} !.DOMAIN.COM.* [OR]
6 RewriteCond %{HTTP_USER_AGENT} ^$
7 RewriteRule (.*) ^http://%{REMOTE_ADDR}/$ [R=301,L]
I want to replace DOMAIN.COM on line 5 with the dynamic hostname.
I would like to use the same rule with other domains without having to modify .htaccess.
RewriteCond %{HTTP_REFERER} !.DOMAIN.COM.* [OR]
The complexity comes about because server variables of the form %{HTTP_HOST} are not expanded in the CondPattern (2nd argument to the RewriteCond directive), since it is a PCRE (regular expression).
Instead of the above line, you can do something like this:
RewriteCond %{HTTP_HOST}##%{HTTP_REFERER} !^(.*?)##https?://\1/ [OR]
This checks that the requested Host header matches the hostname part of the HTTP Referer header.
The \1 backreference (in the Referer) matches against the Host. The ## string is just any unique string that cannot otherwise occur.
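The concatenation trick can be simulated in Python to see which requests the negated condition lets through. This is a sketch under the assumption that Python's re treats \1 the same way PCRE does for this pattern; the function name referer_is_foreign and all host names are hypothetical.

```python
import re

# The CondPattern from the answer, without its leading "!" negation.
SAME_HOST = re.compile(r"^(.*?)##https?://\1/")


def referer_is_foreign(host: str, referer: str) -> bool:
    """True when the negated condition succeeds, i.e. the Referer's host is not the Host.

    The TestString is assembled like "%{HTTP_HOST}##%{HTTP_REFERER}".
    """
    test_string = f"{host}##{referer}"
    return SAME_HOST.search(test_string) is None


print(referer_is_foreign("example.com", "https://example.com/blog/"))      # False
print(referer_is_foreign("example.com", "http://spam.example.net/"))       # True
print(referer_is_foreign("example.com", "https://example.com.evil.net/"))  # True
print(referer_is_foreign("example.com", ""))                               # True
```

The trailing / after \1 is what stops a look-alike host such as example.com.evil.net from passing, and the last call shows that an empty Referer is also treated as foreign.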
Note that it is possible for legitimate users to not send an HTTP Referer header at all, in which case your current ruleset will also fail.
RewriteRule (.*) ^http://%{REMOTE_ADDR}/$ [R=301,L]
Note that the substitution string in your RewriteRule is malformed. It is an "ordinary" string, not a regex. Consequently, the anchors ^ and $ will be seen as literal characters and should be removed:
RewriteRule .* http://%{REMOTE_ADDR}/ [R=301,L]

mod_rewrite - exclude urls

I need a mod_rewrite rule to redirect all HTTP requests to HTTPS, but I want to exclude some URLs.
# force https
RewriteCond %{HTTPS} off
RewriteCond %{HTTP_HOST} ^secure\. [NC]
RewriteCond %{REQUEST_URI} !gateway_callback [NC]
RewriteRule ^. https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301,QSA]
All URLs which match gateway_callback must be excluded.
This URL should not be redirected, but it is!?
http://secure.localhost/da/gateway_callback/29/
I have tried flushing the DNS cache in the browser, but the URL is still redirected to HTTPS.
The main problem with your config is that the REQUEST_URI variable contains the whole URL-path, starting with its leading forward slash (the query string is not included). The third RewriteCond statement needs to be updated to something like the following:
RewriteCond %{REQUEST_URI} !^/da/gateway_callback/.*$ [NC]
This should match the example you have provided. If the URI does not always start with /da/ then you might need to put in a wildcard:
RewriteCond %{REQUEST_URI} !^/[^/]+/gateway_callback/.*$ [NC]
where the [^/]+ matches one or more characters which is not a forward slash.
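A small Python sketch can confirm which paths each anchored exclusion lets through to the HTTPS redirect. Python's re stands in for PCRE, [NC] is modelled as re.IGNORECASE, and the function name redirected and the sample paths other than the question's are hypothetical.

```python
import re

# The two anchored CondPatterns from the answer (without "!"), with [NC] as IGNORECASE.
ONLY_DA = re.compile(r"^/da/gateway_callback/.*$", re.IGNORECASE)
ANY_PREFIX = re.compile(r"^/[^/]+/gateway_callback/.*$", re.IGNORECASE)


def redirected(request_uri: str, exclusion: re.Pattern) -> bool:
    """The negated condition passes (so the redirect fires) only when exclusion does not match."""
    return exclusion.search(request_uri) is None


print(redirected("/da/gateway_callback/29/", ONLY_DA))     # False: excluded
print(redirected("/en/gateway_callback/29/", ONLY_DA))     # True: /da/ is hard-coded
print(redirected("/en/gateway_callback/29/", ANY_PREFIX))  # False: excluded
print(redirected("/da/checkout/", ANY_PREFIX))             # True
```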
I would recommend using regex anchors wherever possible, as they remove ambiguity. The original RewriteCond matching REQUEST_URI does not use them, which can confuse admins at a casual glance.
Also note that all related examples for the RewriteCond and RewriteRule directives in the official documentation use the start anchor.
Could the trailing slash be keeping you from matching? I'm never sure whether the trailing slash is preserved or not. I suggest changing your 3rd condition to:
RewriteCond %{REQUEST_URI} !/gateway_callback/\d+/?$ [NC]

*Generic* httpd.conf redirect non-www to www

Is it possible to have a generic / multi-domain httpd.conf line that will redirect any non-www request to its www equivalent?
By generic, I mean something that does not rely on a hard-coded domain name like this:
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} ^yourdomain.com [NC]
RewriteRule ^(.*)$ http://www.yourdomain.com/$1 [L,R=301]
I really don't want to edit httpd.conf every time a website is added to or removed from my server, especially since websites are added and removed dynamically!
The mod_rewrite documentation has all the information you need, but there is a lot to read. There are two parts to what you want: first, you need to match any domain not starting with www.; then, you need to prefix www. to the current URL.
For the first part, there's this (which applies to both RewriteCond and RewriteRule):
You can prefix the pattern string with a '!' character (exclamation mark) to specify a non-matching pattern.
So "hostname doesn't begin with www." could be tested like this:
RewriteCond %{HTTP_HOST} !^www\. [NC]
For the second part, there's this:
In addition to plain text, the Substitution string can include [...] server-variables as in rule condition test-strings (%{VARNAME})
So the actual redirect can be made generic like this:
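# /? absorbs the leading slash the path has in httpd.conf (but not in .htaccess), avoiding a double slash
RewriteRule ^/?(.*)$ http://www.%{HTTP_HOST}/$1 [L,R=301]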
Incidentally, it's also possible to do the opposite (redirect everything to not have the www.) because RewriteRule substitutions can also use this:
back-references (%N) to the last matched RewriteCond pattern
So you could capture everything in the hostname after the www. and use that as the target of the rule:
RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
RewriteRule ^/?(.*)$ http://%1/$1 [R=301,L]
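The difference between $1 (a RewriteRule capture) and %1 (a RewriteCond capture) can be traced with a Python sketch of both directions. It assumes the path is what the rule pattern sees in .htaccess context (no leading slash), uses re in place of PCRE, and the functions add_www and strip_www are hypothetical.

```python
import re
from typing import Optional


def add_www(host: str, path: str) -> Optional[str]:
    # RewriteCond %{HTTP_HOST} !^www\. [NC]
    if re.search(r"^www\.", host, re.IGNORECASE):
        return None  # condition fails: host already starts with www.
    # $1 from the RewriteRule pattern, host from %{HTTP_HOST}
    dollar1 = re.search(r"^(.*)$", path).group(1)
    return f"http://www.{host}/{dollar1}"


def strip_www(host: str, path: str) -> Optional[str]:
    # RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
    cond = re.search(r"^www\.(.*)$", host, re.IGNORECASE)
    if cond is None:
        return None  # condition fails: nothing to strip
    percent1 = cond.group(1)  # %1 from the RewriteCond
    dollar1 = re.search(r"^(.*)$", path).group(1)  # $1 from the RewriteRule
    return f"http://{percent1}/{dollar1}"


print(add_www("example.org", "about/team"))        # http://www.example.org/about/team
print(strip_www("www.example.org", "about/team"))  # http://example.org/about/team
```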

%N backreference inside RewriteCond

I'm working on a virtual domain system. I have a wildcard DNS set up as *.loc, and I'm trying to work on my .htaccess file. The following code works:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www.)?example\.loc$ [NC]
RewriteCond %{REQUEST_URI} !^/example/
RewriteRule (.*) /example/$1 [L,QSA]
But I want this to work with anything I put in. For that, I need %{REQUEST_URI} checked against the text found in the domain. I tried using this code:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www.)?([a-zA-Z0-9-]*.)?([a-zA-Z0-9-]+)\.loc$ [NC]
RewriteCond %{REQUEST_URI} !^/%3/
RewriteRule (.*) /%3/$1 [L,QSA]
But the line RewriteCond %{REQUEST_URI} !^/%3/ causes my code to throw an Internal Server Error. I understand this is because of the %N in my code, but is there a way I can work with it? I need this line; otherwise my code fails because of internal redirects.
I hope this makes sense to someone. All I need is to be able to backreference a RewriteCond in a following RewriteCond.
There are two things you are doing wrong here.
First, your %{HTTP_HOST} regex is no good. You need to escape the . dots; otherwise they'll be treated as "any character that's not a newline". For a host without a subdomain, this makes the %3 backreference just the last character of the name before the TLD (e.g. for http://bar.loc, %3 = r rather than bar).
Second, you can't use %N backreferences in the regex (CondPattern) of a RewriteCond, only in the left-side TestString; it's sort of a weird limitation. However, you can use regex backreferences such as \1 in the pattern, so you can construct a clever TestString to match against, such as %3::%{REQUEST_URI}, and then match it like this: !^(.*?)::/\1/?. This regex essentially says: "match and group the first block of text before the ::, then make sure the block of text following the :: starts with /(first block)".
So your rules should look like this:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www\.)?([a-zA-Z0-9-]*\.)?([a-zA-Z0-9-]+)\.loc$ [NC]
RewriteCond %3::%{REQUEST_URI} !^(.*?)::/\1/?
RewriteRule (.*) /%3/$1 [L,QSA]
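Both fixes can be checked with a Python sketch, using re as a stand-in for PCRE: the first part shows what %3 holds with unescaped versus escaped dots, the second evaluates the %3::%{REQUEST_URI} condition. The names percent3 and needs_rewrite, and the sample hosts and paths, are hypothetical.

```python
import re

# The host pattern as first written (dots unescaped) and as corrected.
UNESCAPED = re.compile(r"^(www.)?([a-zA-Z0-9-]*.)?([a-zA-Z0-9-]+)\.loc$", re.IGNORECASE)
ESCAPED = re.compile(r"^(www\.)?([a-zA-Z0-9-]*\.)?([a-zA-Z0-9-]+)\.loc$", re.IGNORECASE)
# The second CondPattern, without its leading "!".
ALREADY_PREFIXED = re.compile(r"^(.*?)::/\1/?")


def percent3(host_pattern: re.Pattern, host: str) -> str:
    """What %3 would hold after the %{HTTP_HOST} condition matches."""
    return host_pattern.search(host).group(3)


def needs_rewrite(host: str, request_uri: str) -> bool:
    r"""True when RewriteCond %3::%{REQUEST_URI} !^(.*?)::/\1/? succeeds."""
    test_string = f"{percent3(ESCAPED, host)}::{request_uri}"
    return ALREADY_PREFIXED.search(test_string) is None


print(percent3(UNESCAPED, "bar.loc"), percent3(ESCAPED, "bar.loc"))  # r bar
print(needs_rewrite("www.bar.loc", "/index.php"))                    # True
print(needs_rewrite("www.bar.loc", "/bar/index.php"))                # False
```

Because the pattern ends in /? with no $, a path such as /barista would also count as already prefixed for the site bar.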

RewriteCond backreference to another RewriteCond causing 500

I have set up a *.mydomain.com wildcard subdomain in cPanel (no shell access).
Going to anything.mydomain.com takes me to the same directory, which I mounted for *.mydomain.com.
When I go to test.mydomain.com with the following in .htaccess,
this works properly:
RewriteEngine on
RewriteCond %{HTTP_HOST} ^(.+)\.(.+?\..+?)$ [NC]
RewriteCond %{REQUEST_URI} !^/test/(.+)?$
RewriteRule ^(.*) /%1/$1
What doesn't work and gives a 500 Error is this (Just replaced the test with %1):
RewriteEngine on
RewriteCond %{HTTP_HOST} ^(.+)\.(.+?\..+?)$ [NC]
RewriteCond %{REQUEST_URI} !^/%1/(.+)?$
RewriteRule ^(.*) /%1/$1
What I want to do is allow dynamic setup of subdomains if a subdirectory with the subdomain's name exists. The rewriting works when I hardcode the subdomain name test in the .htaccess, but not when I use the backreference %1 for it.
You can't use a % variable or a backreference as part of the regular expression in a RewriteCond. Instead, you can build the TestString using a backreference, and match it with a regex backreference:
RewriteCond %1:%{REQUEST_URI} !^([^:]+):/\1/(.+)?$
So you create a string made up of the subdomain of the host match, a colon, then the URI, match against the subdomain in your regex, and reference it using \1.
Additionally, you may need to add another %{HTTP_HOST} match right after it, because the %1 backreference might have been reset: a negated condition succeeds by not matching, and there is nothing to back-reference in a non-match.
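The subdomain:URI trick can be traced with a short Python sketch, with re standing in for PCRE. It takes %1 from the question's %{HTTP_HOST} pattern and then evaluates the negated condition; the function name should_rewrite and the sample paths are hypothetical.

```python
import re

# The %{HTTP_HOST} condition from the question; group 1 becomes %1.
HOST = re.compile(r"^(.+)\.(.+?\..+?)$", re.IGNORECASE)
# The CondPattern from the answer, without its leading "!".
ALREADY_IN_SUBDIR = re.compile(r"^([^:]+):/\1/(.+)?$")


def should_rewrite(host: str, request_uri: str) -> bool:
    r"""True when RewriteCond %1:%{REQUEST_URI} !^([^:]+):/\1/(.+)?$ succeeds."""
    subdomain = HOST.search(host).group(1)  # %1, e.g. "test"
    return ALREADY_IN_SUBDIR.search(f"{subdomain}:{request_uri}") is None


print(should_rewrite("test.mydomain.com", "/index.html"))       # True: rewrite to /test/...
print(should_rewrite("test.mydomain.com", "/test/index.html"))  # False: already rewritten
```

Using [^:]+ instead of .* keeps group 1 from running past the colon. Note that /test without a trailing slash does not count as already rewritten, because the pattern requires a / after \1.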