Just want to confirm something. From what I gather of how mod_rewrite works, Apache receives a URL and mod_rewrite immediately applies the (non-<Directory>) rules in httpd.conf, then the per-directory rewriting goes to work, and the process restarts with the new URL if any changes were made.
@JonLin's great answer to this question first says that when your per-directory rule specifies an absolute replacement (i.e. starting with a slash), it's assumed to be relative to the DocumentRoot, which I get. But of relative replacements (no slash), Jon then says:
it's based on the directory that the rule is in. So if
RewriteRule ^foo$ bar.php [L]
is in the "root" and you go to http://example.com/foo, you get served http://example.com/bar.php. But if that rule is in the "subdir1" directory, and you go to http://example.com/subdir1/foo, you get served http://example.com/subdir1/bar.php. etc. This sometimes works and sometimes doesn't, as the documentation says, it's supposed to be required for relative paths, but most of the time it seems to work. Except when you are redirecting (using the R flag, or implicitly because you have http://host in your rule's target). That means this rule:
RewriteRule ^foo$ bar.php [L,R]
if it's in the "subdir2" directory, and you go to http://example.com/subdir2/foo, mod_rewrite will mistake the relative path as a file-path instead of a URL-path and because of the R flag, you'll end up getting redirected to something like: http://example.com/var/www/localhost/htdocs/subdir1.
As Jon explains in the last bit, when a redirect will occur and there's no RewriteBase, a string intended as a file path gets appended to the site's base address to create a phony URL. But just to confirm: even in the former case Jon mentions, i.e. not an actual redirect, the substituted string does get sent back to Apache's URL-reception code, restarting the whole process, correct? The diagram on this page of the spec seems to imply that the process keeps restarting until no rule makes a change. These non-redirect cases would seem to be the time when it WOULD make sense to tack the file path, right from the file system root to the htaccess directory, onto the beginning of the substitution. But how does that get turned into a proper URL as expected by the URL-reception code? Does http://localhost get prepended? I think that would make everything relative to the DocumentRoot, not the actual file system root.
Thanks!
Been doing some more reading and think I've got this explained, for anyone who's interested.
Regarding my question about how a file system absolute path gets turned into a valid URL for the internal redirect: I was thinking that the URI in an HTTP request contained "http://hostname", but there is no such prefix, i.e. the URI is like /this/is/a/path. The host name is in a separate "Host" header field and is no longer a vital piece of information by the time mod_rewrite is running. Apache's initial Post Read Request phase has already noticed the GET request on the port and, if Name-Based Virtual Hosting is in use, has worked out things like the DocumentRoot from the Host header field, and has finally called the URI Translation Phase where mod_rewrite executes. So any time mod_rewrite is running, there could be only one host name that got us here.
So to summarize, what I had called the "URL-reception" part of Apache always deals with /paths/like/this/without/hostname, not just after internal redirects. The spec does say that RewriteCond/RewriteRule match against such paths, but I figured the host name was there initially and got removed. So all that's left is to ensure our rules are prepared for cases where they are running in an internal redirect spawned by an earlier run-through of themselves, and don't do something inadvertent when they see a file system absolute path caused by a replacement that didn't start with a slash. What a mouthful.
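For anyone wanting something concrete, here is a minimal, untested sketch of the two precautions discussed above. It assumes the .htaccess file sits in a subdir2 directory directly under the DocumentRoot; the baz and qux.php names are made up for illustration and are not from Jon's answer.

```apache
# Hypothetical /subdir2/.htaccess
RewriteEngine On

# State the URL-path this directory corresponds to, so that a relative
# substitution is not treated as a file system path when redirecting
RewriteBase /subdir2/

# External redirect: /subdir2/foo should now go to /subdir2/bar.php
RewriteRule ^foo$ bar.php [L,R]

# Internal rewrite that only fires on the original request, not on an
# internal redirect produced by an earlier pass through these rules
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^baz$ qux.php [L]
```

The REDIRECT_STATUS check is one common way to tell a first pass from a restarted one; whether you need it depends on whether your patterns could match their own output.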
I'm having some difficulty with a super simple htaccess redirect.
All I want to do is rewrite absolutely everything, except a couple files.
htaccess looks like this:
RewriteEngine On
RewriteCond %{REQUEST_URI} !sitemap
RewriteCond %{REQUEST_URI} !robots
RewriteRule ^(.*)$ http://example.com/$1 [L,R=301]
The part that works is that everything gets redirected to new domain as it should be. And I can also access robots.txt without being forwarded, but not with sitemap.xml. If I try to go to sitemap.xml, the domain forwards along anyway and opens the sitemap file on the new domain.
I have this exact same issue when trying to "ignore" index.html. I can ignore robots, I can ignore alternate html or php files, but if I want to ignore index.html, the regex fails.
Since I can't actually SEE what is in the REQUEST_URI variable, my guess is that somehow index.html and sitemap.xml are some kind of "special" files that don't end up in REQUEST_URI? I know this because of a stupid test. If I choose to ignore index.html like this:
RewriteCond %{REQUEST_URI} !index.html
Then if I type example.com/index.html I will be forwarded. But if I just type example.com/ the ignore actually works and it shows the content of index.html without forwarding!
How is it that when I choose to ignore the regex "index.html", it only works when "index.html" is not actually typed in the address bar!?!
And it gets even weirder! Should I type something like example.com/index.html?option=value, then the ignore rule works and I do NOT get forwarded when there are attributes like this. But index.html by itself doesn't work, and then just having the slash root, the rule works again.
I'm completely confused! Why does it seem like REQUEST_URI is not able to see some filenames like index.html and sitemap.xml? I've been Googling for 2 days and not only can I not find out if this is true, but I can't seem to find any websites which actually give examples of what these htaccess server variables actually contain!
Thanks!
my guess is that somehow index.html and sitemap.xml are some kind of "special" files that don't end up in REQUEST_URI?
This is not true. There is no such special treatment of any requested URL. The REQUEST_URI server variable contains the URL-path (only) of the request. This notably excludes the scheme + hostname and any query string (which are available in their own variables).
However, if there are any other mod_rewrite directives that precede this (including the server config) that rewrite the URL then the REQUEST_URI server variable is also updated to reflect the rewritten URL.
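Since the question asks what these variables actually hold, here is a small sketch. The request URL in the comments is a made-up example, and the anchored patterns are my own tightening of the conditions in the question rather than anything the asker posted.

```apache
# For a request to http://olddomain.example/sitemap.xml?x=1 (hypothetical):
#   %{HTTP_HOST}    -> olddomain.example
#   %{REQUEST_URI}  -> /sitemap.xml
#   %{QUERY_STRING} -> x=1
#   %{THE_REQUEST}  -> GET /sitemap.xml?x=1 HTTP/1.1
RewriteEngine On

# Anchored, escaped patterns match only these two exact URL-paths
RewriteCond %{REQUEST_URI} !^/sitemap\.xml$
RewriteCond %{REQUEST_URI} !^/robots\.txt$

# 302 rather than 301 so the browser does not cache the result while testing
RewriteRule ^(.*)$ http://example.com/$1 [L,R=302]
```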
index.html (Directory Index)
index.html is possibly a special case. Although, if you are explicitly requesting index.html as part of the URL itself (as you appear to be doing) then this does not apply.
If, on the other hand, you are requesting a directory, e.g. http://example.com/subdir/, and relying on mod_dir issuing an internal subrequest for the directory index (i.e. index.html), then the REQUEST_URI variable may or may not contain index.html, depending on the version of Apache (2.2 vs 2.4) you are on. On Apache 2.2, mod_dir executes first, so you would need to check for /subdir/index.html. However, on Apache 2.4, mod_rewrite executes first, so you simply check for the requested URL: /subdir/. It's safer to check for both, particularly if you have other rewrites and there is a possibility of a second pass through the rewrite engine.
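A sketch of "checking for both", assuming the directory is /subdir/ and reusing the redirect rule from the question (with a temporary status while testing):

```apache
RewriteEngine On

# Matches /subdir/ and /subdir/index.html, so the exception holds whether
# mod_dir or mod_rewrite gets to the request first
RewriteCond %{REQUEST_URI} !^/subdir/(index\.html)?$
RewriteRule ^(.*)$ http://example.com/$1 [L,R=302]
```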
Caching problems
However, the most probable cause in this scenario is simply a caching issue. If the 301 redirect has previously been in place without these exceptions then it's possible these redirections have been cached by the browser. 301 (permanent) redirects are cached persistently by the browser and can cause issues with testing (as well as your users that also have these redirects cached - there is little you can do about that unfortunately).
RewriteCond %{REQUEST_URI} !(sitemap|index|alternate|alt) [NC]
RewriteRule .* alternate.html [R,L]
The example you presented in comments further suggests a caching issue, since you are now getting different results for sitemap than those posted in your question. (It appears to be working as intended in your second example).
Examining Apache server variables
@zzzaaabbb mentioned one method to examine the value of the Apache server variable. (Note that the Apache server variable REQUEST_URI is different to the PHP variable of the same name.) You can also assign the value of an Apache server variable to an environment variable, which is then readable in your application code.
For example:
RewriteRule ^ - [E=APACHE_REQUEST_URI:%{REQUEST_URI}]
You can then examine the value of the APACHE_REQUEST_URI environment variable in your server-side code. Note that if you have any other rewrites that cause the rewriting process to start over, then you could get multiple env vars, each prefixed with REDIRECT_.
With the index.html problem, it is worth escaping the dot (index\.html), since you are in the regex pattern-matching area on the right-hand side of RewriteCond. Note, though, that an un-escaped dot matches any single character, including a literal dot, so index.html still matches a request for /index.html. Escaping only stops the pattern from also matching things like indexXhtml, so on its own it is unlikely to cure the unwanted forward.
For the sitemap not matching problem, you could check what REQUEST_URI actually contains by creating an empty dummy file (test.php in the rule below, to avoid throwing a 404) and then doing a redirect at the top of .htaccess. Then, in the browser, type in any URL you want to see the REQUEST_URI for, and it will show in the address bar.
RewriteCond %{QUERY_STRING} ^$
RewriteRule ^ /test.php?var=%{REQUEST_URI} [NE,R,L]
Credit MrWhite with that easy test method.
Hopefully that will show that sitemap in URL ends up as something else, so will at least partially explain why it's not pattern-matching and preventing redirect, when it should be pattern-matching and preventing redirect.
I would also test by being sure that the server isn't stepping in front of things with custom 301 directive that for whatever reason makes sitemap behave unexpectedly. Put this at the top of your .htaccess for that test.
ErrorDocument 301 default
Trying to configure apache2 to load example.com/forum/ from a different document root, relative to the site root. Forums are installed somewhere else on the server.
Is there a directory alias command? I've found the alias configuration entry for apache, but had no luck.
Basically, I want example.com to have the same directory it's always had, but example.com/forum/ to be hosted somewhere else, on the same server.
I tagged this question with mod_rewrite because I thought maybe it would be the key, here.
Cheers!
Alias is the right way, unless you have some subtlety that you didn't reveal in your question.
# httpd.conf
# or whatever path the forum is installed in
Alias /forum /usr/lib/bbs
The job of Alias is to take the abstract URL coming into your system and map it to a concrete filesystem path. Once it has done that, the request is no longer a URL but a path. If there is no Alias or similar directive handling that URL, then it will get mapped to a concrete path via DocumentRoot.
If this isn't working, you have to debug it further. Are you getting errors when you access /forum? Look in the error log.
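One common reason an Alias appears not to work is that the target directory lies outside the DocumentRoot and has no access rules of its own, which shows up as a 403 in the error log. A minimal sketch, assuming Apache 2.4 and the /usr/lib/bbs path from the example above (on 2.2 the equivalent would be Order allow,deny plus Allow from all):

```apache
# httpd.conf or the relevant VirtualHost
Alias /forum /usr/lib/bbs

# Grant access to the aliased directory, which is not covered by the
# <Directory> block for the DocumentRoot
<Directory /usr/lib/bbs>
    Require all granted
</Directory>
```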
It all depends on what you want. You can "hardlink" with the real path and it works (so you were right to think it could work with mod_rewrite).
Quick sample (that works on my production domains) to make an internal change (I add a subdirectory):
RewriteRule (.*) %{DOCUMENT_ROOT}/mysubfolder%{REQUEST_FILENAME} [QSA,L]
So you can easily do something like:
RewriteRule ^/forum/(.*) %{DOCUMENT_ROOT}/mysubfolder%{REQUEST_FILENAME} [QSA,L]
And my suggestion would be that if you plan to have more rewrite rules, keep everything homogeneous, i.e. keep on using only rewrite rules, as in my suggestion above. This way you'll not get a bad mix of Alias, RewriteRules and so on.
I have the following situation:
On my webserver I have an instance of websvn running, where specific repositories and revisions can be accessed by a URL like
http://www.myhost.com/listing.php?repname=repository1&path=%2Ftrunk%2Fbackend
Somehow, out there in the wild, a wrong URL is being used to access this
http://www.myhost.com/listing.php/?repname=repository1&path=%2Ftrunk%2Fbackend
(Notice the slash after listing.php)
Now, although the URL works and websvn still shows the webpage, images and stylesheets do not get loaded correctly, since they are referenced relatively.
I tried to add an .htaccess file to the webroot to redirect people accessing the file as directory to the correct URL.
I have tried multiple variations and ended up with this file:
RewriteEngine on
RewriteRule ^/listing.php/ listing.php [R=301,QSA]
But, since I am writing here, you already guessed it: It doesn't work.
I also tried
RewriteEngine on
RewriteRule ^/listing.php(.*) listing.php$1 [R=301,QSA]
What am I doing wrong?
Perhaps among other things, a RewriteRule within .htaccess that starts with “^/” will never match anything at all. (Examples that include a leading slash are for the global configuration file.) Remove the leading forward slash and see if that helps.
Also, I recommend changing the 301 to a 307 until you get it working. Otherwise, your browser will cache the 301 result, redirecting on subsequent references without consulting your server at all and likely giving you very confusing results.
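Putting both suggestions together, an untested sketch for the .htaccess in the webroot might look like the following. The escaped dot, the end anchor and the root-relative target are my additions, not part of the original rules.

```apache
RewriteEngine on

# No leading slash in the pattern: in .htaccess the directory prefix is stripped.
# The substitution has no query string of its own, so the original one
# (repname=...&path=...) is carried over to the redirect target.
RewriteRule ^listing\.php/$ /listing.php [R=307,L]
```

Once it behaves as expected, the 307 can be switched back to a 301.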
I'm having some trouble with Apache's mod_rewrite. One of the things I'm trying to get it to do is hide some of my implementation details, so that, for example, the user sees the URL http://www.mysite.com/login but Apache responds with the page at http://www.mysite.com/doc_root/login.php instead (preferably without showing the user that it's a PHP file or the directory structure). Here's what I have in my .htaccess file:
RewriteEngine on
RewriteCond %{HTTP_HOST} ^(www.)?mysite.com*
RewriteRule ^/(\w+) /doc_root/$1.php [L]
#Redirect http://www.mysite.com to the login page
RewriteRule ^/?$ https://www.mysite.com/doc_root/login.php
But when I go to http://www.mysite.com/login, I get a 404 error even though the page exists. I clearly don't have a great understanding of how the mod_rewrite conditionals and rules work, so can anyone please tell me what I'm doing wrong? Thanks.
Take doc_root out of all the stuff you have it in. That will give you the result you're asking for. However, I'm not sure if it's desired or not. How are you going to force someone to log in if they manually type http://www.mysite.com/index.php?
Also, if you're trying to force all traffic to SSL, it's better to use a second VirtualHost and Redirect instead of mod_rewrite. Those are all questions probably better suited for ServerFault.
Unless your site has a bunch of different domain names, and you only want mysite.com to do the rewriting, you don't need the RewriteCond. (Potential problem. Apache likes to dick around with the domain name unless you set UseCanonicalName off. If the name isn't what it's expecting, the rewrite won't happen.)
In RewriteCond (and RewriteRule) patterns, . matches any character. Add a backslash before them. (Minor bug. Shouldn't cause rewrites to fail, but they would match stuff like "mysite-com" as well.)
mod_rewrite is actually a URL-to-filename filter. Though it is often used to rewrite URLs to other URLs, sometimes it will misbehave if what you're rewriting to is a URL and it can't tell. (Especially if what it's rewriting to would be an alias, or would otherwise not translate directly to a real filename.) If you add a [PT] flag onto your rule, though, it will consider the rewritten thing a URL and pass it along to the other filters (including the ones that turn URLs into filenames).
Do you really need "/doc_root"? The document root should already be set up in Apache using the DocumentRoot directive, and shouldn't need to be part of the URL unless you have multiple apps on the same domain (in which case it's the app root; the document root doesn't change).
UPDATE:
Another thing I just thought about: Rewrite rules work differently in .htaccess files. Apache likes to strip off the leading slash. So you will probably want to get rid of the first slash in your patterns, or at least make it optional (^/?login instead of ^/login).
^/?(\w+) will match /doc_root/login.php, and cause a rewrite to /doc_root/doc_root.php. You should probably have a $ at the end of your pattern.
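Pulling the points above into one place, a sketch of the main rule with those fixes applied (untested, and it keeps the /doc_root prefix from the question even though it may not be needed):

```apache
RewriteEngine on

# Dots escaped, pattern anchored; drop this condition entirely if the
# site only answers to one domain
RewriteCond %{HTTP_HOST} ^(www\.)?mysite\.com$ [NC]

# Optional leading slash so the rule works in .htaccess as well as in the
# server config; the trailing $ stops it from matching /doc_root/login.php
RewriteRule ^/?(\w+)$ /doc_root/$1.php [L]
```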
I have a hyperlink that looks like this:
http://domain.com/sample/comments/65
And when I click on it, it goes to this:
http://domain.com/sample/comments/index.php?submissionid=65
I'm using a rewrite rule to make it do this. This is what I want, except I also want the URL displayed in the browser to still look like "http://domain.com/sample/comments/65."
How can I do this? The .htaccess file is displayed below.
RewriteEngine on
RewriteRule ^comments/([0-9]+)?$ http://domain.com/sample/comments/index.php?submissionid=$1 [NC,L]
Thanks in advance,
John
You must remove the part http://domain.com/sample/, otherwise it will force a redirect:
RewriteEngine on
RewriteRule ^comments/([0-9]+)?$ comments/index.php?submissionid=$1 [NC,L,B]
The B flag is also necessary because you're using the backreference inside a query string, which requires escaping.
The manual says (emphasis mine):
When using the rewrite engine in .htaccess files the per-directory prefix (which always is the same for a specific directory) is automatically removed for the pattern matching and automatically added after the substitution has been done. This feature is essential for many sorts of rewriting; without this, you would always have to match the parent directory, which is not always possible. There is one exception: If a substitution string starts with http://, then the directory prefix will not be added, and an external redirect (or proxy throughput, if using flag P) is forced. See the RewriteBase directive for more information.
This would not be the case if you put the rewrite rule in the virtual host or main configuration, as long as the request host and the host in the rewrite rule matched.