Tips for debugging .htaccess rewrite rules - apache

Many posters have problems debugging their RewriteRule and RewriteCond statements within their .htaccess files. Most of them are on a shared hosting service and therefore don't have access to the root server configuration. They cannot avoid using .htaccess files for rewriting and cannot enable RewriteLogLevel, as many respondents suggest. There are also many .htaccess-specific pitfalls and constraints that aren't covered well, and setting up a local test LAMP stack involves too steep a learning curve for most.
So my question here is: how would we recommend that they debug their rules themselves? I provide a few suggestions below. Other suggestions would be appreciated.
Understand that the mod_rewrite engine cycles through .htaccess files. The engine runs this loop:
do
    execute server and vhost rewrites (in the Apache Virtual Host Config)
    find the lowest "Per Dir" .htaccess file on the file path with rewrites enabled
    if found(.htaccess)
        execute .htaccess rewrites (in the user's directory)
while rewrite occurred
So your rules will get executed repeatedly, and if you change the URI path then the engine may end up executing other .htaccess files if they exist. Make sure that you terminate this loop, if necessary by adding extra RewriteCond statements to stop rules firing. Also delete any lower-level .htaccess rewrite rulesets unless you explicitly intend to use multi-level rulesets.
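One way to stop a rule from refiring on later passes is to guard it with the internal REDIRECT_STATUS variable, which is empty only on the first pass through the engine; a minimal sketch (the old/ and new/ paths are purely illustrative, and a later answer here uses the same trick):
# only rewrite on the first pass through the engine
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^old/(.*)$ new/$1 [L]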
Make sure that the syntax of each regexp is correct by testing it against a set of test patterns, to make sure that it is valid syntax and does what you intend with a full range of test URIs. See the answer below for more details.
Build up your rules incrementally in a test directory. You can make use of the "execute the deepest .htaccess file on the path" feature to set up a separate test directory (tree) and debug rulesets there without breaking your main rules and stopping your site from working. You have to add rules one at a time, because this is the only way to localise failures to individual rules.
Use a dummy script stub to dump out server and environment variables (see Listing 2). If your app uses, say, blog/index.php, then you can copy this into test/blog/index.php and use it to test out your blog rules in the test subdirectory. You can also use environment variables to make sure that the rewrite engine is interpreting substitution strings correctly, e.g.
RewriteRule ^(.*) - [E=TEST0:%{DOCUMENT_ROOT}/blog/html_cache/$1.html]
and look for these REDIRECT_* variables in the phpinfo dump. BTW, I used this one and discovered on my site that I had to use %{ENV:DOCUMENT_ROOT_REAL} instead. In the case of redirect looping, the REDIRECT_REDIRECT_* variables list the previous pass, and so on.
Make sure that you don't get bitten by your browser caching incorrect 301 redirects. See answer below. My thanks to Ulrich Palha for this.
The rewrite engine seems sensitive to cascaded rules within an .htaccess context (that is, where a RewriteRule results in a substitution that falls through to further rules). I found bugs with internal sub-requests (1) and incorrect PATH_INFO processing, which can often be prevented by use of the [NS], [L] and [PT] flags.
Any more comment or suggestions?
Listing 2 -- phpinfo
<?php phpinfo(INFO_ENVIRONMENT|INFO_VARIABLES);

Here are a few additional tips on testing rules that may ease the debugging for users on shared hosting
1. Use a fake user agent
When testing a new rule, add a condition to only execute it with a fake user-agent that you will use for your requests. This way it will not affect anyone else on your site.
e.g
#protect with a fake user agent
RewriteCond %{HTTP_USER_AGENT} ^my-fake-user-agent$
#Here is the actual rule I am testing
RewriteCond %{HTTP_HOST} !^www\.domain\.com$ [NC]
RewriteRule ^ http://www.domain.com%{REQUEST_URI} [L,R=302]
If you are using Firefox, you can use the User Agent Switcher to create the fake user agent string and test.
2. Do not use 301 until you are done testing
I have seen so many posts where people are still testing their rules and they are using 301's. DON'T.
If you are not using suggestion 1 on your site, not only you, but anyone visiting your site at the time will be affected by the 301.
Remember that they are permanent, and aggressively cached by your browser.
Use a 302 instead till you are sure, then change it to a 301.
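For example (old-page and new-page are purely illustrative), keep the redirect as a 302 while testing and only switch the flag once it behaves:
# while testing
RewriteRule ^old-page$ /new-page [R=302,L]
# once verified, change R=302 to R=301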
3. Remember that 301's are aggressively cached in your browser
If your rule does not work and it looks right to you, and you were not using suggestions 1 and 2, then re-test after clearing your browser cache or while in private browsing.
4. Use an HTTP capture tool
Use an HTTP capture tool like Fiddler to see the actual HTTP traffic between your browser and the server.
While others might say that your site does not look right, you could instead see and report that all of the images, css and js are returning 404 errors, quickly narrowing down the problem.
While others will report that you started at URL A and ended at URL C, you will be able to see that they started at URL A, were 302 redirected to URL B and 301 redirected to URL C. Even if URL C was the ultimate goal, you will know that this is bad for SEO and needs to be fixed.
You will be able to see cache headers that were set on the server side, replay requests, modify request headers to test ....

Online .htaccess rewrite testing
I found this while Googling for regex help; it saved me a lot of time from having to upload new .htaccess files every time I made a small modification.
from the site:
htaccess tester
To test your htaccess rewrite rules, simply fill in the url that you're applying the rules to, place the contents of your htaccess on the larger input area and press "Check Now" button.

Don't forget that in .htaccess files it is a relative URL that is matched.
In a .htaccess file the following RewriteRule will never match:
RewriteRule ^/(.*) /something/$1
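Written without the leading slash, the same rule does match (a sketch; /something/ is just the illustrative target from above):
RewriteRule ^(.*) /something/$1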

Set environment variables and use headers to receive them:
You can create new environment variables with RewriteRule lines, as mentioned by OP:
RewriteRule ^(.*) - [E=TEST0:%{DOCUMENT_ROOT}/blog/html_cache/$1.html]
But if you can't get a server-side script to work, how can you then read this environment variable? One solution is to set a header:
Header set TEST_FOOBAR "%{REDIRECT_TEST0}e"
The value accepts format specifiers, including the %{NAME}e specifier for environment variables (don't forget the lowercase e). Sometimes, you'll need to add the REDIRECT_ prefix, but I haven't worked out when the prefix gets added and when it doesn't.

Make sure that the syntax of each Regexp is correct
by testing it against a set of test patterns, to make sure that it is valid syntax and does what you intend with a full range of test URIs.
See regexpCheck.php below for a simple script that you can add to a private/test directory in your site to help you do this. I've kept this brief rather than pretty. Just paste this into a file regexpCheck.php in a test directory to use it on your website. This will help you build up a regexp incrementally and test it against a list of test cases as you go. I am using the PHP PCRE engine here, but having had a look at the Apache source, it is basically identical to the one used in Apache. There are many HowTos and tutorials which provide templates and can help you build up your regexp skills.
Listing 1 -- regexpCheck.php
<html><head><title>Regexp checker</title></head><body>
<?php
// Collect the pattern and test vectors from the previous form submission (if any)
$a_pattern = isset($_POST['pattern']) ? $_POST['pattern'] : "";
$a_ntests  = isset($_POST['ntests'])  ? (int)$_POST['ntests'] : 1;
$a_test    = isset($_POST['test'])    ? $_POST['test'] : array();

$res = array(); $maxM = -1;
foreach ($a_test as $t) {
    // Run each test vector against the pattern; @ suppresses warnings for invalid regexps
    $rtn = @preg_match('#'.$a_pattern.'#', $t, $m);
    if ($rtn == 1) {
        $maxM = max($maxM, count($m));
        $res[] = array_merge(array('matched'), $m);
    } else {
        $res[] = array(($rtn === FALSE ? 'invalid' : 'non-matched'));
    }
}
?> <p> </p>
<form method="post" action="<?php echo $_SERVER['SCRIPT_NAME'];?>">
  <label for="p">Regexp Pattern: </label>
  <input id="p" name="pattern" size="50" value="<?php echo htmlentities($a_pattern, ENT_QUOTES, "UTF-8");?>" />
  <label for="n"> Number of test vectors: </label>
  <input id="n" name="ntests" size="3" value="<?php echo $a_ntests;?>"/>
  <input type="submit" name="go" value="OK"/><hr/><p> </p>
  <table><thead><tr><td><b>Test Vector</b></td><td> <b>Result</b></td>
<?php
for ($i = 0; $i < $maxM; $i++) echo "<td> <b>\$$i</b></td>";
echo "</tr></thead><tbody>\n";
for ($i = 0; $i < $a_ntests; $i++) {
    echo '<tr><td> <input name="test[]" value="',
        htmlentities(isset($a_test[$i]) ? $a_test[$i] : '', ENT_QUOTES, "UTF-8"), '" /></td>';
    if (isset($res[$i])) {
        foreach ($res[$i] as $v) { echo '<td> ', htmlentities($v, ENT_QUOTES, "UTF-8"), ' </td>'; }
    }
    echo "</tr>\n";
}
?> </tbody></table></form></body></html>

One lesson from a couple of hours that I wasted:
If you've applied all these tips and are only getting 500 errors because you don't have access to the server error log, maybe the problem isn't in the .htaccess but in the files it redirects to.
After I had fixed my .htaccess problem, I spent two more hours trying to fix it further, even though I had simply forgotten about some file permissions.

Make sure you use the percent sign in front of variables, not the dollar sign.
It's %{HTTP_HOST}, not ${HTTP_HOST}. There will be nothing in the error_log and no Internal Server Error; your regexp is still correct, but the rule will just never match. This is really hideous if you work with Django/Genshi templates a lot and have ${} for variable substitution in muscle memory.

If you're creating redirections, test with curl to avoid browser caching issues.
Use -I to fetch the HTTP headers only.
Use -L to follow all redirections.

Regarding 4., you still need to ensure that your "dummy script stub" is actually the target URL after all the rewriting is done, or you won't see anything!
A similar/related trick (see this question) is to insert a temporary rule such as:
RewriteRule (.*) /show.php?url=$1 [END]
Where show.php is some very simple script that just displays its $_GET parameters (you can display environment variables too, if you want).
This will stop the rewriting at the point you insert it into the ruleset, rather like a breakpoint in a debugger.
If you're using Apache <2.3.9, you'll need to use [L] rather than [END], and you may then need to add:
RewriteRule ^show.php$ - [L]
at the very top of your ruleset, if the URL /show.php is itself being rewritten.
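Putting the pieces together, the layout might look roughly like this for older Apache versions (show.php is the hypothetical dump script described above):
# exclude the dump script itself from further rewriting
RewriteRule ^show\.php$ - [L]
# ... the rules you are actually testing go here ...
# "breakpoint": dump whatever URL reaches this point
RewriteRule (.*) /show.php?url=$1 [L]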

Some mistakes I have observed happen when writing .htaccess files:
Using ^(.*)$ repetitively in multiple rules. ^(.*)$ renders the other rules impotent in most cases, because it matches the whole URL in a single hit.
So, if we are using a rule for the URL sample/url, it will also consume the URL sample/url/string.
The [L] flag should be used to ensure our rule has finished processing.
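A sketch of that ordering (page.php and index.php are illustrative): the specific rule comes first, each rule is terminated with [L], and the catch-all is guarded so it doesn't swallow the result of the first rule on the engine's next pass:
RewriteRule ^sample/url$ page.php [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ index.php?path=$1 [L]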
Things you should know about:
The difference between %n and $n
%n refers to captures from the last matched RewriteCond, while $n refers to captures from the RewriteRule pattern.
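A small sketch of the difference (example.com and page.php are illustrative): here %1 captures "www" from the RewriteCond, while $1 captures the digits from the RewriteRule pattern:
RewriteCond %{HTTP_HOST} ^(www)\.example\.com$ [NC]
RewriteRule ^page/(\d+)$ page.php?sub=%1&id=$1 [L]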
How RewriteBase works
The RewriteBase directive specifies the URL prefix to be used for per-directory (htaccess) RewriteRule directives that substitute a relative path.
This directive is required when you use a relative path in a substitution in per-directory (htaccess) context unless any of the following conditions are true:
The original request, and the substitution, are underneath the DocumentRoot (as opposed to reachable by other means, such as Alias).
The filesystem path to the directory containing the RewriteRule, suffixed by the relative substitution, is also valid as a URL path on the server (this is rare).
In Apache HTTP Server 2.4.16 and later, this directive may be omitted when the request is mapped via Alias or mod_userdir.
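As a sketch (the /subapp/ prefix and file names are illustrative), an application served under a /subapp/ URL prefix that substitutes a relative path would declare:
RewriteBase /subapp/
RewriteRule ^index\.html$ welcome.php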

I found this question while trying to debug my mod_rewrite issues, and it definitely has some helpful advice. But in the end the most important thing is to make sure you have your regex syntax correct. Due to problems with my own RE syntax, installing the regexpCheck.php script was not a viable option.
But since Apache uses Perl-Compatible Regular Expressions (PCRE)s, any tool which helps writing PCREs should help. I've used RegexPlanet's tool with Java and Javascript REs in the past, and was happy to find that they support Perl as well.
Just type in your regular expression and one or more example URLs, and it will tell you if the regex matches (a "1" in the "~=" column) and if applicable, any matching groups (the numbers in the "split" column will correspond to the numbers Apache expects, e.g. $1, $2 etc.) for each URL. They claim PCRE support is "in beta", but it was just what I needed to solve my syntax problems.
http://www.regexplanet.com/advanced/perl/index.html
I'd have simply added a comment to an existing answer but my reputation isn't yet at that level. Hope this helps someone.

In case you are not working in a standard shared hosting environment, but in one to which you have administrative access (maybe your local test environment), make sure that the use of .htaccess and mod_rewrite is enabled. They are disabled in a default Apache installation. In that case, no action configured in your .htaccess file works, even if the regexes are perfectly valid.
To enable the use of .htaccess:
Find the file apache2.conf (on Debian/Ubuntu this is in /etc/apache2) and, within the file, the section
<Directory /var/www/>
Options Indexes FollowSymLinks
AllowOverride None
Require all granted
</Directory>
and change the line AllowOverride None to AllowOverride All.
To enable module mod_rewrite:
On Debian/Ubuntu, execute
sudo a2enmod rewrite
By the way, to disable a module you would use a2dismod instead of a2enmod.
After you have made the above configuration changes, restart Apache for them to take effect:
sudo systemctl restart apache2

If you're planning on writing more than just one line of rules in .htaccess,
don't even think about trying one of those hot-fix methods to debug it.
I have wasted days setting up multiple rules without feedback from logs, only to finally give up.
I got Apache on my PC, copied the whole site to its HDD, and got the whole ruleset sorted out using the logs, real fast.
Then I reviewed my old rules, which had been working, and saw that they were not really doing what was desired: a time bomb, given a slightly different address.
There are so many pitfalls in rewrite rules; it's not a straight logic thing at all.
You can get Apache up and running in ten minutes; it's about 10 MB, has a good license, is *NIX/WIN/MAC ready, and even runs without an install.
Also, check the header lines of your server and get the same version of Apache from their archive if it's old. My host is still on 2.0, and many things are not supported.

I'll leave this here; it is maybe an obvious detail, but it had me banging my head for hours:
be careful using %{REQUEST_URI}. What @Krist van Besien says in his answer is totally right, but it does not apply to the REQUEST_URI string, because the output of this TestString starts with a /. So take care:
RewriteCond %{REQUEST_URI} ^/assets/$
                            ^
                            | check this pesky fella right here if missing

Best way to debug it!
Add LogLevel notice rewrite:trace8 to Apache's httpd.conf to log all mod_rewrite trace messages. If you are on shared hosting and don't have access to httpd.conf, then test your rules locally and upload them to the live site afterwards. Once enabled, this generates a very large log in a very short time, so it can't be left on a live server anyway.
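In the test server's virtual host or httpd.conf that would look like this (Apache 2.4 syntax; adjust the trace level to taste):
LogLevel notice rewrite:trace8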

(Similar to Doin's idea)
To show what is being matched, I use this code:
$keys = array_keys($_GET);
foreach ($keys as $i => $key) {
    echo "$i => $key <br>";
}
Save it as r.php in the server root and then do some tests in .htaccess.
For example, I want to match URLs that do not start with a language prefix:
RewriteRule ^(?!(en|de)/)(.*)$ /r.php?$1&$2 [L] #$1&$2&...
RewriteRule ^(.*)$ /r.php?nomatch [L] #report nomatch and exit

As pointed out by @JCastell, the online tester does a good job of testing individual redirects against an .htaccess file. More interesting, however, is the API it exposes, which can be used to batch-test a list of URLs using a JSON object. To make it more useful, I have written a small bash script which uses curl and jq to submit a list of URLs and parse the JSON response into CSV-formatted output containing the line number and rule matched in the .htaccess file along with the redirected URL. This makes it quite handy to compare a list of URLs in a spreadsheet and quickly determine which rules are not working.

Perhaps the best way to debug rewrite rules is not to use rewrite rules at all, but to defer URL processing from the .htaccess file to a PHP file (let's call it router.php). Then you can use PHP to do any manipulation you like, with proper error detection and the usual ways of debugging. This runs faster, too, since you don't have to use the rewriting module.
To transfer control immediately from .htaccess to router.php for any URL that is not found in the file system, just put the following line in .htaccess:
FallbackResource router.php
Yes, it's really that easy. And yes, it really works. Give it a try.
Note: You may need an ErrorDocument directive in your .htaccess file to transfer control explicitly to your router.php file for certain URLs on HTTP status 404, especially if you inherit from a parent .htaccess file that handles status 404. So that would make it a total of two lines to transfer control to a router file.
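Assuming router.php sits in the document root, those two lines might look like this:
FallbackResource /router.php
ErrorDocument 404 /router.php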

If you are working with URLs, you might also want to check that you have actually enabled mod_rewrite.

Reference: mod_rewrite, URL rewriting and "pretty links" explained

"Pretty links" is an often requested topic, but it is rarely fully explained. mod_rewrite is one way to make "pretty links", but it's complex and its syntax is very terse, hard to grok, and the documentation assumes a certain level of proficiency in HTTP. Can someone explain in simple terms how "pretty links" work and how mod_rewrite can be used to create them?
Other common names, aliases, terms for clean URLs: RESTful URLs, user-friendly URLs, SEO-friendly URLs, slugging, and MVC URLs (probably a misnomer)
To understand what mod_rewrite does you first need to understand how a web server works. A web server responds to HTTP requests. An HTTP request at its most basic level looks like this:
GET /foo/bar.html HTTP/1.1
This is the simple request of a browser to a web server requesting the URL /foo/bar.html from it. It is important to stress that it does not request a file, it requests just some arbitrary URL. The request may also look like this:
GET /foo/bar?baz=42 HTTP/1.1
This is just as valid a request for a URL, and it has more obviously nothing to do with files.
The web server is an application listening on a port, accepting HTTP requests coming in on that port and returning a response. A web server is entirely free to respond to any request in any way it sees fit/in any way you have configured it to respond. This response is not a file, it's an HTTP response which may or may not have anything to do with physical files on any disk. A web server doesn't have to be Apache, there are many other web servers which are all just programs which run persistently and are attached to a port which respond to HTTP requests. You can write one yourself. This paragraph was intended to divorce you from any notion that URLs directly equal files, which is really important to understand. :)
The default configuration of most web servers is to look for a file that matches the URL on the hard disk. If the document root of the server is set to, say, /var/www, it may look whether the file /var/www/foo/bar.html exists and serve it if so. If the file ends in ".php" it will invoke the PHP interpreter and then return the result. All this association is completely configurable; a file doesn't have to end in ".php" for the web server to run it through the PHP interpreter, and the URL doesn't have to match any particular file on disk for something to happen.
mod_rewrite is a way to rewrite the internal request handling. When the web server receives a request for the URL /foo/bar, you can rewrite that URL into something else before the web server will look for a file on disk to match it. Simple example:
RewriteEngine On
RewriteRule /foo/bar /foo/baz
This rule says whenever a request matches "/foo/bar", rewrite it to "/foo/baz". The request will then be handled as if /foo/baz had been requested instead. This can be used for various effects, for example:
RewriteRule (.*) $1.html
This rule matches anything (.*) and captures it (the enclosing parentheses), then rewrites it to append ".html". In other words, if /foo/bar was the requested URL, it will be handled as if /foo/bar.html had been requested. See http://regular-expressions.info for more information about regular expression matching, capturing and replacements.
Another often encountered rule is this:
RewriteRule (.*) index.php?url=$1
This, again, matches anything and rewrites it to the file index.php with the originally requested URL appended in the url query parameter. I.e., for any and all requests coming in, the file index.php is executed and this file will have access to the original request in $_GET['url'], so it can do anything it wants with it.
Primarily you put these rewrite rules into your web server configuration file. Apache also allows* you to put them into a file called .htaccess within your document root (i.e. next to your .php files).
* If allowed by the primary Apache configuration file; it's optional, but often enabled.
What mod_rewrite does not do
mod_rewrite does not magically make all your URLs "pretty". This is a common misunderstanding. If you have this link in your web site:
<a href="/my/ugly/link.php?is=not&very=pretty">
there's nothing mod_rewrite can do to make that pretty. In order to make this a pretty link, you have to:
Change the link to a pretty link:
<a href="/my/pretty/link">
Use mod_rewrite on the server to handle the request to the URL /my/pretty/link using any one of the methods described above.
(One could use mod_substitute in conjunction to transform outgoing HTML pages and their contained links. Though this is usually more effort than just updating your HTML resources.)
There's a lot mod_rewrite can do and very complex matching rules you can create, including chaining several rewrites, proxying requests to a completely different service or machine, returning specific HTTP status codes as responses, redirecting requests etc. It's very powerful and can be used to great good if you understand the fundamental HTTP request-response mechanism. It does not automatically make your links pretty.
See the official documentation for all the possible flags and options.
To expand on deceze's answer, I wanted to provide a few examples and explanation of some other mod_rewrite functionality.
All of the below examples assume that you have already included RewriteEngine On in your .htaccess file.
Rewrite Example
Lets take this example:
RewriteRule ^blog/([0-9]+)/([A-Za-z0-9-\+]+)/?$ /blog/index.php?id=$1&title=$2 [NC,L,QSA]
The rule is split into 4 sections:
RewriteRule - starts the rewrite rule
^blog/([0-9]+)/([A-Za-z0-9-\+]+)/?$ - This is called the pattern, however I'll just refer to it as the left hand side of the rule - what you want to rewrite from
blog/index.php?id=$1&title=$2 - called the substitution, or right hand side of a rewrite rule - what you want to rewrite to
[NC,L,QSA] are flags for the rewrite rule, separated by a comma, which I will explain more on later
The above rewrite would allow you to link to something like /blog/1/foo/ and it would actually load /blog/index.php?id=1&title=foo.
Left hand side of the rule
^ indicates the start of the page name - so it will rewrite example.com/blog/... but not example.com/foo/blog/...
Each set of (…) parentheses represents a regular expression that we can capture as a variable in the right hand side of the rule. In this example:
The first set of brackets - ([0-9]+) - matches a string with a minimum of 1 character in length and with only numeric values (i.e. 0-9). This can be referenced with $1 in the right hand side of the rule
The second set of parentheses matches a string with a minimum of 1 character in length, containing only alphanumeric characters (A-Z, a-z, or 0-9) or - or + (note + is escaped with a backslash as without escaping it this will execute as a regex repetition character). This can be referenced with $2 in the right hand side of the rule
? means that the preceding character is optional, so in this case both /blog/1/foo/ and /blog/1/foo would rewrite to the same place
$ indicates this is the end of the string we want to match
Flags
These are options that are added in square brackets at the end of your rewrite rule to specify certain conditions. Again, there are a lot of different flags which you can read up on in the documentation, but I'll go through some of the more common flags:
NC
The no case flag means that the rewrite rule is case insensitive, so for the example rule above this would mean that both /blog/1/foo/ and /BLOG/1/foo/ (or any variation of this) would be matched.
L
The last flag indicates that this is the last rule that should be processed. This means that if and only if this rule matches, no further rules will be evaluated in the current rewrite processing run. If the rule does not match, all other rules will be tried in order as usual. If you do not set the L flag, all following rules will be applied to the rewritten URL afterwards.
END
Since Apache 2.4 you can also use the [END] flag. A matching rule with it will completely terminate further alias/rewrite processing. (Whereas the [L] flag can oftentimes trigger a second round, for example when rewriting into or out of subdirectories.)
QSA
The query string append flag allows us to pass in extra variables to the specified URL which will get added to the original get parameters. For our example this means that something like /blog/1/foo/?comments=15 would load /blog/index.php?id=1&title=foo&comments=15
R
This flag isn't one I used in the example above, but is one I thought is worth mentioning. This allows you to specify a http redirect, with the option to include a status code (e.g. R=301). For example if you wanted to do a 301 redirect on /myblog/ to /blog/ you would simply write a rule something like this:
RewriteRule ^myblog/(.*)$ /blog/$1 [R=301,QSA,L]
Rewrite Conditions
Rewrite conditions make rewrites even more powerful, allowing you to specify rewrites for more specific situations. There are a lot of conditions which you can read about in the documentation, but I'll touch on a few common examples and explain them:
# if the host doesn't start with www. then add it and redirect
RewriteCond %{HTTP_HOST} !^www\.
RewriteRule ^ http://www.%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
This is a very common practice, which will prepend your domain with www. (if it isn't there already) and execute a 301 redirect. For example, loading up http://example.com/blog/ it would redirect you to http://www.example.com/blog/
# if it can't find the image, try to find it on another domain
RewriteCond %{REQUEST_URI} \.(jpg|jpeg|gif|png)$ [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.*)$ http://www.example.com/$1 [L]
This is slightly less common, but is a good example of a rule that doesn't execute if the filename is a directory or file that exists on the server.
%{REQUEST_URI} \.(jpg|jpeg|gif|png)$ [NC] will only execute the rewrite for files with a file extension of jpg, jpeg, gif or png (case insensitive).
%{REQUEST_FILENAME} !-f will check to see if the file exists on the current server, and only execute the rewrite if it doesn't
%{REQUEST_FILENAME} !-d will check to see if a directory with that name exists on the current server, and only execute the rewrite if it doesn't
The rewrite will attempt to load the same file on another domain
References
Stack Overflow has many other great resources to get started:
Serverfault: Everything you ever wanted to know about mod_rewrite
(Keep in mind to remove the slash in ^/ pattern prefixes for .htaccess usage.)
Do's and Don'ts in Hidden features of mod_rewrite.
Look through our most popular mod-rewrite questions and answers.
Apache redirecting and remapping guide.
AskApache ultimate .htaccess guide
And the mod-rewrite tag wiki references.
And newcomer-friendly regex overviews even:
Our regex tag wiki for a syntax compendium.
And the short Apache regex summary.
Else regexp.info for easy-to-understand basics.
Oft-used placeholders
.* matches anything, even an empty string. You don't want to use this pattern everywhere, but often in the last fallback rule.
[^/]+ is more often used for path segments. It matches anything but the forward slash.
\d+ only matches numeric strings.
\w+ matches alphanumeric characters. It's basically shorthand for [A-Za-z0-9_].
[\w\-]+ for "slug"-style path segments, using letters, numbers, dash - and _
[\w\-.,]+ adds periods and commas. Prefer an escaped \- dash in […] charclasses.
\. denotes a literal period. Otherwise . outside of […] is placeholder for any symbol.
Each of these placeholders is usually wrapped in (…) parentheses as capture group. And the whole pattern often in ^………$ start + end markers. Quoting "patterns" is optional.
RewriteRules
The following examples are PHP-centric and a bit more incremental, easier to adapt for similar cases.
They're just summaries, often link to more variations or detailed Q&As.
Static mapping /contact, /about
Shortening a few page names to internal file schemes is most simple:
RewriteRule ^contact$ templ/contact.html
RewriteRule ^about$ about.php
Numeric identifiers /object/123
Introducing shortcuts like http://example.com/article/531 to existing PHP scripts is also easy. The numeric placeholder can just be remapped to a $_GET parameter:
RewriteRule ^article/(\d+)$ article-show.php?id=$1
# └───────────────────────────┘
Slug-style placeholders /article/with-some-title-slug
You can easily extend that rule to allow for /article/title-string placeholders:
RewriteRule ^article/([\w-]+)$ article-show.php?title=$1
# └────────────────────────────────┘
Note that your script must be able (or be adapted) to map those titles back to database-ids. RewriteRules alone can't create or guess information out of thin air.
Slugs with numeric prefixes /readable/123-plus-title
Therefore you'll often see mixed /article/529-title-slug paths used in practice:
RewriteRule ^article/(\d+)-([\w-]+)$ article.php?id=$1&title=$2
# └───────────────────────────────┘
Now you could just skip passing title=$2 along, because your script will typically rely on the database id anyway. The -title-slug has become arbitrary URL decoration.
Uniformity with alternative lists /foo/… /bar/… /baz/…
If you have similar rules for multiple virtual page paths, then you can match and compact them with | alternative lists. And again just reassign them to internal GET parameters:
# ┌─────────────────────────┐
RewriteRule ^(blog|post|user)/(\w+)$ disp.php?type=$1&id=$2
# └───────────────────────────────────┘
You can split them out into individual RewriteRules should this get too complex.
Dispatching related URLs to different backends /date/SWITCH/backend
A more practical use of alternative lists are mapping request paths to distinct scripts. For example to provide uniform URLs for an older and a newer web application based on dates:
# ┌─────────────────────────────┐
# │ ┌───────────┼───────────────┐
RewriteRule ^blog/(2009|2010|2011)/([\d-]+)/?$ old/blog.php?date=$2
RewriteRule ^blog/(\d+)/([\d-]+)/?$ modern/blog/index.php?start=$2
# └──────────────────────────────────────┘
This simply remaps 2009-2011 posts onto one script, and all other years implicitly to another handler.
Note the more specific rule coming first. Each script might use different GET params.
Other delimiters than just / path slashes /user-123-name
You're most commonly seeing RewriteRules to simulate a virtual directory structure. But you're not forced to be uncreative. You can as well use - hyphens for segmenting or structure.
RewriteRule ^user-(\d+)$ show.php?what=user&id=$1
# └──────────────────────────────┘
# This could use `(\w+)` alternatively for user names instead of ids.
For the also common /wiki:section:Page_Name scheme:
RewriteRule ^wiki:(\w+):(\w+)$ wiki.php?sect=$1&page=$2
# └─────┼────────────────────┘ │
# └────────────────────────────┘
Occasionally it's suitable to alternate between /-delimiters and : or . in the same rule even. Or have two RewriteRules again to map variants onto different scripts.
Optional trailing / slash /dir = /dir/
When opting for directory-style paths, you can make it reachable with and without a final /
RewriteRule ^blog/([\w-]+)/?$ blog/show.php?id=$1
# ┗┛
Now this handles both http://example.com/blog/123 and /blog/123/. And the /?$ approach is easy to append onto any other RewriteRule.
Flexible segments for virtual paths .*/.*/.*/.*
Most rules you'll encounter map a constrained set of /…/ resource path segments to individual GET parameters. Some scripts handle a variable number of options however.
The Apache regexp engine doesn't allow optionalizing an arbitrary number of them. But you can easily expand it into a rule block yourself:
RewriteRule ^(\w+)/?$ in.php?a=$1
RewriteRule ^(\w+)/(\w+)/?$ in.php?a=$1&b=$2
RewriteRule ^(\w+)/(\w+)/(\w+)/?$ in.php?a=$1&b=$2&c=$3
# └─────┴─────┴───────────────────┴────┴────┘
If you need up to five path segments, then copy this scheme along into five rules. You can of course use a more specific [^/]+ placeholder each.
Here the ordering isn't as important, as neither overlaps. So having the most frequently used paths first is okay.
Alternatively you can utilize PHP's array parameters via a ?p[]=$1&p[]=$2&p[]=$3 query string here - if your script merely prefers them pre-split.
(Though it's more common to just use a catch-all rule, and let the script itself expand the segments out of the REQUEST_URI.)
See also: How do I transform my URL path segments into query string key-value pairs?
Optional segments prefix/opt?/.*
A common variation is to have optional prefixes within a rule. This usually makes sense if you have static strings or more constrained placeholders around:
RewriteRule ^(\w+)(?:/([^/]+))?/(\w+)$ ?main=$1&opt=$2&suffix=$3
Here the more complex pattern (?:/([^/]+))? simply wraps the middle segment in a non-capturing (?:…) group and makes it optional with the trailing ?. The contained
placeholder ([^/]+) becomes substitution variable $2, but will be empty if there's no middle /…/ path.
Capture the remainder /prefix/123-capture/…/*/…whatever…
As said before, you don't often want too generic rewrite patterns. It does however make sense to combine static and specific comparisons with a .* sometimes.
RewriteRule ^(specific)/prefix/(\d+)(/.*)?$ speci.php?id=$2&otherparams=$3
This optionalizes any /…/…/… trailing path segments. The handling script then of course has to split them up and extract the parameters
itself (which is what web "MVC" frameworks do).
Trailing file "extensions" /old/path.HTML
URLs don't really have file extensions. Which is what this entire reference is about (= URLs are virtual locators, not necessarily a direct filesystem image).
However if you had a 1:1 file mapping before, you can craft simpler rules:
RewriteRule ^styles/([\w\.\-]+)\.css$ sass-cache.php?old_fn_base=$1
RewriteRule ^images/([\w\.\-]+)\.gif$ png-converter.php?load_from=$1
Other common uses are remapping obsolete .html paths to newer .php handlers, or just aliasing directory names only for individual (actual/real) files.
Ping-Pong (redirects and rewrites in unison) /ugly.html ←→ /pretty
So at some point you're rewriting your HTML pages to carry only pretty links, as outlined by deceze.
Meanwhile you'll still receive requests for the old paths, sometimes even from bookmarks. As workaround, you can ping-pong browsers to display/establish
the new URLs.
This common trick involves sending a 30x/Location redirect whenever an incoming URL follows the obsolete/ugly naming scheme.
Browsers will then rerequest the new/pretty URL, which afterwards is rewritten (just internally) to the original or new location.
# redirect browser for old/ugly incoming paths
RewriteRule ^old/teams\.html$ /teams [R=301,QSA,END]
# internally remap already-pretty incoming request
RewriteRule ^teams$ teams.php [QSA,END]
Note how this example just uses [END] instead of [L] to safely alternate. For older Apache 2.2 versions you can use other workarounds, besides also remapping
query string parameters for example:
Redirect ugly to pretty URL, remap back to the ugly path, without infinite loops
Spaces ␣ in patterns /this+that+
It's not that pretty in browser address bars, but you can use spaces in URLs. For rewrite patterns use backslash-escaped \␣ spaces.
Else just "-quote the whole pattern or substitution:
RewriteRule "^this [\w ]+/(.*)$" "index.php?id=$1" [L]
Clients serialize URLs with + or %20 for spaces. Yet in RewriteRules they're interpreted with literal characters for all relative path segments.
Frequent duplicates:
Catch-all for a central dispatcher / front-controller script
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^.*$ index.php [L]
Which is often used by PHP frameworks or WebCMS / portal scripts. The actual path splitting is then handled in PHP using $_SERVER["REQUEST_URI"]. So conceptually it's pretty much the opposite of URL handling "per mod_rewrite". (Just use FallbackResource instead.)
Remove www. from hostname
Note that this doesn't copy a query string along, etc.
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
See also:
· URL rewriting for different protocols in .htaccess
· Generic htaccess redirect www to non-www
· .htaccess - how to force "www." in a generic way?
Note that RewriteCond/RewriteRule combos can be more complex, with matches (%1 and $1) interacting in both directions even:
(Diagram: Apache manual - mod_rewrite intro, Copyright 2015 The Apache Software Foundation, AL-2.0)
Redirect to HTTPS://
RewriteCond %{SERVER_PORT} 80
RewriteRule ^(.*)$ https://example.com/$1 [R,L]
See also: https://wiki.apache.org/httpd/RewriteHTTPToHTTPS
"Removing" the PHP extension
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.+)$ $1.php [L] # or [END]
See also: Removing the .php extension with mod_rewrite
Aliasing old .html paths to .php scripts
See: http://httpd.apache.org/docs/2.4/rewrite/remapping.html#backward-compatibility
Rewrite from URL like "/page" to a script such as "/index.php/page"
See mod_rewrite, php and the .htaccess file
Redirect subdomain to a folder
See How can i get my htaccess to work (subdomains)?
Prevalent .htaccess pitfalls
Now take this with a grain of salt. Not every advise can be generalized to all contexts.
This is just a simple summary of well-known and a few unobvious stumbling blocks:
Enable mod_rewrite and .htaccess
To actually use RewriteRules in per-directory configuration files you must:
Check that your server has AllowOverride All enabled. Otherwise your per-directory .htaccess directives will go ignored, and RewriteRules won't work.
Obviously have mod_rewrite enabled in your httpd.conf modules section.
Prepend each list of rules with RewriteEngine On still. While mod_rewrite is implicitly active in <VirtualHost> and <Directory> sections,
the per-directory .htaccess files need it individually summoned.
The leading slash ^/ won't match
You shouldn't start your .htaccess RewriteRule patterns with ^/ normally:
RewriteRule ^/article/\d+$ …
             ↑
This is often seen in old tutorials. And it used to be correct for ancient Apache 1.x versions. Nowadays request paths are conveniently fully directory-relative in .htaccess RewriteRules. Just leave the leading / out.
· Note that the leading slash is still correct in <VirtualHost> sections though. Which is why you often see it ^/? optionalized for rule parity.
· Or when using a RewriteCond %{REQUEST_URI} you'd still match for a leading /.
· See also Webmaster.SE: When is the leading slash (/) needed in mod_rewrite patterns?
<IfModule *> wrappers begone!
You've probably seen this in many examples:
<IfModule mod_rewrite.c>
Rewrite…
</IfModule>
It does make sense in <VirtualHost> sections - if it was combined with another fallback option, such as ScriptAliasMatch. (But nobody ever does that).
And it's commonly distributed with the default .htaccess rulesets of many open source projects. There it's just meant as a fallback, and keeps "ugly" URLs working as the default.
However you don't want that usually in your own .htaccess files.
Firstly, mod_rewrite does not randomly disengage. (If it did, you'd have bigger problems).
Were it really disabled, your RewriteRules still wouldn't work anyway.
It's meant to prevent HTTP 500 errors. What it usually accomplishes is gracing your users with HTTP 404 errors instead. (Not so much more user-friendly if you think about it.)
Practically it just suppresses the more useful log entries, or server notification mails. You'd be none the wiser as to why your RewriteRules never work.
What seems enticing as generalized safeguard, often turns out to be an obstacle in practice.
Don't use RewriteBase unless needed
Many copy+paste examples contain a RewriteBase / directive. Which happens to be the implicit default anyway. So you don't actually need this. It's a workaround for fancy VirtualHost rewriting schemes, and misguessed DOCUMENT_ROOT paths for some shared hosters.
It makes sense to use with individual web applications in deeper subdirectories. It can shorten RewriteRule patterns in such cases. Generally it's best to prefer relative path specifiers in per-directory rule sets.
See also How does RewriteBase work in .htaccess
Disable MultiViews when virtual paths overlap
URL rewriting is primarily used for supporting virtual incoming paths. Commonly you just have one dispatcher script (index.php) or a few individual handlers (articles.php, blog.php, wiki.php, …). The latter might clash with similar virtual RewriteRule paths.
A request for /article/123 for example could map to article.php with a /123 PATH_INFO implicitly. You'd either have to guard your rules then with the commonplace RewriteCond !-f+!-d, and/or disable PATH_INFO support, or perhaps just disable Options -MultiViews.
Which is not to say you always have to. Content-Negotiation is just an automatism to virtual resources.
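As a sketch, the commonplace guard mentioned above looks like this (article.php is illustrative):
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^article/(\d+)$ article.php?id=$1 [L]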
Ordering is important
See Everything you ever wanted to know about mod_rewrite
if you haven't already. Combining multiple RewriteRules often leads to interaction. This isn't something to prevent habitually per [L] flag, but a scheme you'll embrace once versed.
You can re-re-rewrite virtual paths from one rule to another, until it reaches an actual target handler.
Still you'd often want to have the most specific rules (fixed string /forum/… patterns, or more restrictive placeholders [^/.]+) in the early rules.
Generic slurp-all rules (.*) are better left to the later ones. (An exception is a RewriteCond -f/-d guard as primary block.)
Stylesheets and images stop working
When you introduce virtual directory structures /blog/article/123 this impacts relative resource references in HTML (such as <img src=mouse.png>).
Which can be solved by:
Only using server-absolute references href="/old.html" or src="/logo.png"
Often simply by adding <base href="/index"> into your HTML <head> section.
This implicitly rebinds relative references to what they were before.
You could alternatively craft further RewriteRules to rebind .css or .png paths to their original locations.
But that's both unneeded, or incurs extra redirects and hampers caching.
See also: CSS, JS and images do not display with pretty url
RewriteConds just mask one RewriteRule
A common misinterpretation is that a RewriteCond blocks multiple RewriteRules (because they're visually arranged together):
RewriteCond %{SERVER_NAME} localhost
RewriteRule ^secret admin/tools.php
RewriteRule ^hidden sqladmin.cgi
Which it doesn't by default: a RewriteCond only applies to the single RewriteRule that follows it. You can chain rules using the [S=2] skip flag, as sketched below; otherwise you'll have to repeat the condition. Sometimes you can also craft an "inverted" primary rule to [END] the rewrite processing early.
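A minimal sketch of the [S=2] variant, reusing the example targets above: the condition is inverted, and when the host is not localhost the first rule skips the two protected rules:
RewriteCond %{SERVER_NAME} !localhost
RewriteRule ^ - [S=2]
RewriteRule ^secret admin/tools.php
RewriteRule ^hidden sqladmin.cgi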
QUERY_STRING exempt from RewriteRules
You can't match RewriteRule index.php\?x=y, because mod_rewrite compares only against the relative URL path by default. You can match the query string separately however via:
RewriteCond %{QUERY_STRING} \b(?:param)=([^&]+)(?:&|$)
RewriteRule ^add/(.+)$ add/%1/$1 # ←──﹪₁──┘
See also How can I match query string variables with mod_rewrite?
.htaccess vs. <VirtualHost>
If you're using RewriteRules in a per-directory config file, then worrying about regex performance is pointless. Apache retains
compiled PCRE patterns longer than a PHP process with a common routing framework. For high-traffic sites you should however consider
moving rulesets into the vhost server configuration, once they've been battle-tested.
In this case, prefer the optionalized ^/? directory separator prefix. This allows you to move RewriteRules freely between PerDir and server
config files.
Whenever something doesn't work
Fret not.
Compare access.log and error.log
Often you can figure out how a RewriteRule misbehaves just from looking at your error.log and access.log.
Correlate access times to see which request path originally came in, and which path/file Apache couldn't resolve to (error 404/500).
This doesn't tell you which RewriteRule is the culprit. But inaccessible final paths like /docroot/21-.itle?index.php may give away where to inspect further.
Otherwise disable rules until you get some predictable paths.
Enable the RewriteLog
See Apache RewriteLog docs. For debugging you can enable it in the vhost sections:
# Apache 2.2
RewriteLogLevel 5
RewriteLog /tmp/rewrite.log
# Apache 2.4
LogLevel alert rewrite:trace5
#ErrorLog /tmp/rewrite.log
That yields a detailed summary of how incoming request paths get modified by each rule:
[..] applying pattern '^test_.*$' to uri 'index.php'
[..] strip per-dir prefix: /srv/www/vhosts/hc-profi/index.php -> index.php
[..] applying pattern '^index\.php$' to uri 'index.php'
Which helps to narrow down overly generic rules and regex mishaps.
See also:
· .htaccess not working (mod_rewrite)
· Tips for debugging .htaccess rewrite rules
Before asking your own question
As you might know, Stack Overflow is very suitable for asking questions on mod_rewrite. Make them on-topic
by including prior research and attempts (avoid redundant answers), demonstrate basic regex understanding, and:
Include full examples of input URLs, falsely rewritten target paths, your real directory structure.
The complete RewriteRule set, but also single out the presumed defective one.
Apache and PHP versions, OS type, filesystem, DOCUMENT_ROOT, and PHP's $_SERVER environment if it's about a parameter mismatch.
An excerpt from your access.log and error.log to verify what the existing rules resolved to. Better yet, a rewrite.log summary.
This nets quicker and more exact answers, and makes them more useful to others.
Comment your .htaccess
If you copy examples from somewhere, take care to include a # comment and origin link. While it's merely bad manners to omit attribution,
it often really hurts maintenance later. Document any code or tutorial source. In particular while unversed you should be
all the more interested in not treating them like magic blackboxes.
It's not "SEO"-URLs
Disclaimer: Just a pet peeve. You often hear pretty URL rewriting schemes referred to as "SEO" links or something. While this is useful for googling examples, it's a dated misnomer.
None of the modern search engines are really disturbed by .html and .php in path segments, or ?id=123 query strings for that matter. Search engines of old, such as AltaVista, did avoid crawling websites with potentially ambiguous access paths. Modern crawlers are often even hungry for deep web resources.
What "pretty" URLs should conceptually be used for is making websites user-friendly.
Having readable and obvious resource schemes.
Ensuring URLs are long-lived (AKA permalinks).
Providing discoverability through /common/tree/nesting.
However don't sacrifice unique requirements for conformism.
Tools
There are various online tools to generate RewriteRules for most GET-parameterish URLs:
http://www.generateit.net/mod-rewrite/index.php
http://www.ipdistance.com/mod_rewrite.php
http://webtools.live2support.com/misc_rewrite.php
They mostly just output generic [^/]+ placeholders, but that likely suffices for trivial sites.
Alternatives to mod_rewrite
Many basic virtual URL schemes can be achieved without using RewriteRules. Apache allows PHP scripts to be invoked without .php extension, and with a virtual PATH_INFO argument.
Use the PATH_INFO, Luke
Nowadays AcceptPathInfo On is often enabled by default. Which basically allows .php and other resource URLs to carry a virtual argument:
http://example.com/script.php/virtual/path
Now this /virtual/path shows up in PHP as $_SERVER["PATH_INFO"] where you can handle any extra arguments however you like.
This isn't as convenient as having Apache separate input path segments into $1, $2, $3 and passing them as distinct $_GET variables to PHP. It's merely emulating "pretty URLs" with less configuration effort.
Enable MultiViews to hide the .php extension
The simplest option to also eschew .php "file extensions" in URLs is enabling:
Options +MultiViews
This has Apache select article.php for HTTP requests on /article due to the matching basename. And this works well together with the aforementioned PATH_INFO feature. So you can just use URLs like http://example.com/article/virtual/title. Which makes sense if you have a traditional web application with multiple PHP invocation points/scripts.
Note that MultiViews has a different/broader purpose though. It incurs a very minor performance penalty, because Apache always looks for other files with matching basenames. It's actually meant for Content-Negotiation, so browsers receive the best alternative among available resources (such as article.en.php, article.fr.php, article.jp.mp4).
SetType or SetHandler for extensionless .php scripts
A more directed approach to avoid carrying around .php suffixes in URLs is configuring the PHP handler for other file schemes. The simplest option is overriding the default MIME/handler type via .htaccess:
DefaultType application/x-httpd-php
This way you could just rename your article.php script to just article (without extension), but still have it processed as PHP script.
Now this can have some security and performance implications, because all extensionless files would be piped through PHP now. Therefore you can alternatively set this behaviour for individual files only:
<Files article>
SetHandler application/x-httpd-php
# or SetType
</Files>
This is somewhat dependent on your server setup and the used PHP SAPI. Common alternatives include ForceType application/x-httpd-php or AddHandler php5-script.
Again take note that such settings propagate from one .htaccess to subfolders. You should always disable script execution (SetHandler None and Options -ExecCGI, or php_flag engine off, etc.) for static resources and upload/ directories.
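For example, an uploads/.htaccess might contain something like the following sketch (whether each directive is available depends on your server setup and PHP SAPI):
# no handler, no PHP engine, no CGI execution for anything in this directory
SetHandler None
php_flag engine off
Options -ExecCGI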
Other Apache rewriting schemes
Among its many options, Apache provides mod_alias features - which sometimes work just as well as mod_rewrites RewriteRules. Note that most of those must be set up in a <VirtualHost> section however, not in per-directory .htaccess config files.
ScriptAliasMatch is primarily for CGI scripts, but ought to work for PHP as well. It allows regexps just like any RewriteRule. In fact it's perhaps the most robust option for configuring a catch-all front controller.
And a plain Alias helps with a few simple rewriting schemes as well.
Even a plain ErrorDocument directive could be used to let a PHP script handle virtual paths. Note that this is a kludgy workaround however, prohibits anything but GET requests, and floods the error.log by definition.
See http://httpd.apache.org/docs/2.2/urlmapping.html for further tips.
A frequent question about URL rewriting goes something like this:
I currently have URLs that look like this:
http://example.com/my-blog/entry.php?id=42
http://example.com/my-blog/entry.php?id=123
I made them pretty like this:
http://example.com/my-blog/42--i-found-the-answer
http://example.com/my-blog/123--count-on-me
By using this in my .htaccess file:
RewriteRule my-blog/(\d+)--i-found-the-answer my-blog/entry.php?id=$1
But I want them to look like this:
http://example.com/my-blog/i-found-the-answer
http://example.com/my-blog/count-on-me
How can I change my .htaccess file to make that work?
The simple answer is that you can't.
Rewrite rules don't make ugly URLs pretty, they make pretty URLs ugly
Whenever you type in a URL in a web browser, or follow a link, or display a page that references an image, etc, the browser makes a request for a particular URL. That request ends up at a web server, and the web server gives a response.
A rewrite rule is simply a rule that says "when the browser requests a URL that looks like X, give them the same response as if they'd requested Y".
When we make rules to handle "pretty URLs", the request is the pretty URL, and the response is based on the internal ugly URL. It can't go the other way around, because we're writing the rule on the server, and all the server sees is the request the browser sent it.
You can't use information that you don't have
Given this basic model of what a rewrite rule does, imagine you were giving the instructions to a human. You could say:
If you see a number in the request, like the "42" in "http://example.com/my-blog/42--i-found-the-answer", put that number on the end of "my-blog/entry.php?id="
But if the information isn't there in the request, your instructions won't make any sense:
If the request has "my-blog" in it, like "http://example.com/my-blog/i-found-the-answer", put the right number on the end of "my-blog/entry.php?id="
The person reading those instructions is going to say "Sorry, how do I know what the right number is?"
Redirects: "This URL is currently out of office..."
Sometimes, you see rules that are the other way around, like this:
RewriteRule my-blog/entry.php?id=(\d+) my-blog/$1--i-found-the-answer [R]
This rule does match an ugly URL on the left, and produce a pretty URL on the right. So surely we could write it without the ID at the beginning of the pretty part?
RewriteRule my-blog/entry.php?id=(\d+) my-blog/i-found-the-answer [R]
The important difference is the [R] flag, which means that this rule is actually a redirect - instead of "serve the response from this URL", it means "tell the browser to load this URL instead".
You can think of this like one of those automated e-mail replies, saying "Sorry, Joe Bloggs is currently on holiday; please send your message to Jane Smith instead." In the same way, the redirect above tells the browser "Sorry, there's no content for http://example.com/my-blog/entry.php?id=42; please request http://example.com/my-blog/42--i-found-the-answer instead."
The important point of this analogy is that the above message wouldn't be much use if there wasn't actually anyone called Jane Smith working there, or if they had no idea how to answer the questions Joe Bloggs normally dealt with. Similarly, a redirect is no use if the URL you tell the browser to request doesn't actually do anything useful. Once the browser follows the redirect, it's going to make a new request, and when the server receives the new request, it still won't know what the ID number is.
But some sites do it, so it must be possible!
A web server only has the information present in the request, but how it uses that information is up to you.
For instance, rather than looking up a blog post by ID, you could store its URL directly in the database, then write some code to do the matching directly in PHP, Python, node.js, etc. Or you could have the same URL show different content based on the language the user has set in their browser, or based on a cookie, etc.
Another thing you can do is use a form (or API request) with a method of POST rather than GET. That means additional information is sent in the "body" of the request, separate from the URL. It still has to be sent, but it's not as obvious in the browser, won't be included in bookmarks, etc.
But you can't write a single line in a .htaccess file that performs miracles.

ALL requests via index.php, no exceptions

I've seen a lot of good answers to this question, such as Pretty URLs in PHP frameworks, however they all explicitly exclude existing files and directories, existing .php files, the index.php file itself and/or .css/.js/etc. I want everything directed to the index.php file, including the index.php file, where I can choose what to do with it dynamically, such as compression and minifying of css/js files or 404-ing most files that really exist.
I've been trying many things to get this done with varying and weird effects and looking at other answers for help, but there's always problems with whatever method I try. The closest working method I've found is also the simplest...
RewriteRule .* index.php?__path__=$0 [L]
However, the problem is that when I go to an existing folder without entering the trailing / e.g. localhost/test it redirects to localhost/test/ - which it doesn't do if the directory doesn't exist on the server. This creates the very distinction between things that actually exist on the server and URL rewrites which I am trying to get rid of.
What's weirder, if I go to 127.0.0.1/test (which may be a fix if some DNS cache quirk is causing this), it redirects to http://127.0.0.1/test/?__path__=test, which is totally bizarre (especially since going to 127.0.0.1/test/ avoids redirection completely, as intended, and no [R] was specified in the rewrite), and reveals the very kind of query stringy URL which I am trying to destroy. /var=value makes a much better format for query strings, but I digress. Of course, redirection doesn't happen with 127.0.0.1/testa because the file doesn't exist, so Apache just seems to be doing something intentional I really don't want it to.
Also, since it is odd that I couldn't find any other examples of this being done, are there any big downsides to this? I hypothesise that aside from a slight amount of additional server load from starting up PHP and executing whatever it has to, there shouldn't be any problems - oh, and of course errors could destroy access to everything hosted. I am using the following code...
// trying to access a file directly?
if ($is_file) {
    // TODO: manage served files (gzip, minify, etc.)
    // is it an allowed extension?
    if (!is_array($CONFIG['allow_ext_access']) || !in_array($type, $CONFIG['allow_ext_access']))
        die('access denied');
    // it's not *this* file is it?
    if (!$is_index) {
        // try to get known file type
        $file_type = isset($CONFIG['file_types'][$type]) ? $CONFIG['file_types'][$type] : false;
        // if we have a file type we can properly pass it as what it is
        if ($file_type) {
            header('Content-Type: '.$file_type['mime']);
        }
        // execute php file or just output any other file
        if ($type == 'php') include($path);
        else readfile($path);
        die;
    }
}
// if it's not this file, then it's a path and a URL for re-routing
If I am understanding it right you can use:
DirectorySlash Off
RewriteEngine On
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule .* index.php?__path__=$0 [L,QSA]
DirectorySlash Off is used to disable adding a trailing slash after real directories as we are routing everything (including existing files / directories) to index.php.
Make sure to test it after clearing your browser cache.
We are using the ENV:REDIRECT_STATUS variable here, which Apache sets (typically to 200) once a successful internal rewrite has taken place. By checking
RewriteCond %{ENV:REDIRECT_STATUS} ^$
we make sure that only the first rewrite takes place and there is no looping.
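On Apache 2.4 and later, an alternative sketch of the same idea uses the END flag instead of the REDIRECT_STATUS guard (nothing here is assumed beyond the rule from the answer above):
DirectorySlash Off
RewriteEngine On
# [END] stops all rewrite processing for this request, so the rewritten
# index.php request is never run through the ruleset a second time
RewriteRule .* index.php?__path__=$0 [END,QSA]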

Reference: mod_rewrite, URL rewriting and "pretty links" explained

"Pretty links" is an often requested topic, but it is rarely fully explained. mod_rewrite is one way to make "pretty links", but it's complex and its syntax is very terse, hard to grok, and the documentation assumes a certain level of proficiency in HTTP. Can someone explain in simple terms how "pretty links" work and how mod_rewrite can be used to create them?
Other common names, aliases, terms for clean URLs: RESTful URLs, user-friendly URLs, SEO-friendly URLs, slugging, and MVC URLs (probably a misnomer)
To understand what mod_rewrite does you first need to understand how a web server works. A web server responds to HTTP requests. An HTTP request at its most basic level looks like this:
GET /foo/bar.html HTTP/1.1
This is a simple request from a browser to a web server, asking for the URL /foo/bar.html. It is important to stress that it does not request a file; it requests just some arbitrary URL. The request may also look like this:
GET /foo/bar?baz=42 HTTP/1.1
This is just as valid a request for a URL, and it even more obviously has nothing to do with files.
The web server is an application listening on a port, accepting HTTP requests coming in on that port and returning a response. A web server is entirely free to respond to any request in any way it sees fit, or in any way you have configured it to respond. This response is not a file, it's an HTTP response which may or may not have anything to do with physical files on any disk. A web server doesn't have to be Apache; there are many other web servers, which are all just programs that run persistently, are attached to a port, and respond to HTTP requests. You can write one yourself. This paragraph was intended to divorce you from any notion that URLs directly equal files, which is really important to understand. :)
The default configuration of most web servers is to look for a file that matches the URL on the hard disk. If the document root of the server is set to, say, /var/www, it may look whether the file /var/www/foo/bar.html exists and serve it if so. If the file ends in ".php" it will invoke the PHP interpreter and then return the result. All this association is completely configurable; a file doesn't have to end in ".php" for the web server to run it through the PHP interpreter, and the URL doesn't have to match any particular file on disk for something to happen.
mod_rewrite is a way to rewrite the internal request handling. When the web server receives a request for the URL /foo/bar, you can rewrite that URL into something else before the web server will look for a file on disk to match it. Simple example:
RewriteEngine On
RewriteRule /foo/bar /foo/baz
This rule says whenever a request matches "/foo/bar", rewrite it to "/foo/baz". The request will then be handled as if /foo/baz had been requested instead. This can be used for various effects, for example:
RewriteRule (.*) $1.html
This rule matches anything (.*) and captures it (the parentheses), then rewrites it to append ".html". In other words, if /foo/bar was the requested URL, it will be handled as if /foo/bar.html had been requested. See http://regular-expressions.info for more information about regular expression matching, capturing and replacements.
Another often encountered rule is this:
RewriteRule (.*) index.php?url=$1
This, again, matches anything and rewrites it to the file index.php with the originally requested URL appended in the url query parameter. I.e., for any and all requests coming in, the file index.php is executed and this file will have access to the original request in $_GET['url'], so it can do anything it wants with it.
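In practice this kind of catch-all is usually guarded by a couple of conditions, so that requests for files which really exist (images, CSS, and so on) are still served directly. A minimal sketch, reusing the index.php?url= scheme from above:
RewriteEngine On
# leave real files and directories alone
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# everything else goes to the front controller
RewriteRule (.*) index.php?url=$1 [QSA,L]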
Primarily you put these rewrite rules into your web server configuration file. Apache also allows* you to put them into a file called .htaccess within your document root (i.e. next to your .php files).
* If allowed by the primary Apache configuration file; it's optional, but often enabled.
What mod_rewrite does not do
mod_rewrite does not magically make all your URLs "pretty". This is a common misunderstanding. If you have this link in your web site:
<a href="/my/ugly/link.php?is=not&very=pretty">
there's nothing mod_rewrite can do to make that pretty. In order to make this a pretty link, you have to:
Change the link to a pretty link:
<a href="/my/pretty/link">
Use mod_rewrite on the server to handle the request to the URL /my/pretty/link using any one of the methods described above.
(One could use mod_substitute in conjunction to transform outgoing HTML pages and their contained links. Though this is usually more effort than just updating your HTML resources.)
There's a lot mod_rewrite can do and very complex matching rules you can create, including chaining several rewrites, proxying requests to a completely different service or machine, returning specific HTTP status codes as responses, redirecting requests etc. It's very powerful and can be used to great good if you understand the fundamental HTTP request-response mechanism. It does not automatically make your links pretty.
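A few of those capabilities as one-line illustrations (the paths and the backend.internal host below are made up, and [P] additionally requires mod_proxy to be loaded):
# answer retired URLs with 410 Gone
RewriteRule ^old-campaign/ - [G]
# external 301 redirect to a new location
RewriteRule ^promo$ /current-promo [R=301,L]
# proxy a path through to another backend
RewriteRule ^api/(.*)$ http://backend.internal/api/$1 [P]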
See the official documentation for all the possible flags and options.
To expand on deceze's answer, I wanted to provide a few examples and explanation of some other mod_rewrite functionality.
All of the below examples assume that you have already included RewriteEngine On in your .htaccess file.
Rewrite Example
Lets take this example:
RewriteRule ^blog/([0-9]+)/([A-Za-z0-9-\+]+)/?$ /blog/index.php?id=$1&title=$2 [NC,L,QSA]
The rule is split into 4 sections:
RewriteRule - starts the rewrite rule
^blog/([0-9]+)/([A-Za-z0-9-\+]+)/?$ - This is called the pattern, however I'll just refer to it as the left hand side of the rule - what you want to rewrite from
blog/index.php?id=$1&title=$2 - called the substitution, or right hand side of a rewrite rule - what you want to rewrite to
[NC,L,QSA] are flags for the rewrite rule, separated by a comma, which I will explain more on later
The above rewrite would allow you to link to something like /blog/1/foo/ and it would actually load /blog/index.php?id=1&title=foo.
Left hand side of the rule
^ indicates the start of the page name - so it will rewrite example.com/blog/... but not example.com/foo/blog/...
Each set of (…) parentheses represents a regular expression that we can capture as a variable in the right hand side of the rule. In this example:
The first set of parentheses - ([0-9]+) - matches a string with a minimum of 1 character in length and with only numeric values (i.e. 0-9). This can be referenced with $1 in the right hand side of the rule
The second set of parentheses matches a string with a minimum of 1 character in length, containing only alphanumeric characters (A-Z, a-z, or 0-9), - or + (the + is escaped with a backslash here; strictly speaking that isn't required inside a character class, where + has no special meaning, but the escape is harmless). This can be referenced with $2 in the right hand side of the rule
? means that the preceding character is optional, so in this case both /blog/1/foo/ and /blog/1/foo would rewrite to the same place
$ indicates this is the end of the string we want to match
Flags
These are options that are added in square brackets at the end of your rewrite rule to specify certain conditions. Again, there are a lot of different flags which you can read up on in the documentation, but I'll go through some of the more common flags:
NC
The no case flag means that the rewrite rule is case insensitive, so for the example rule above this would mean that both /blog/1/foo/ and /BLOG/1/foo/ (or any variation of this) would be matched.
L
The last flag indicates that this is the last rule that should be processed. This means that if and only if this rule matches, no further rules will be evaluated in the current rewrite processing run. If the rule does not match, all other rules will be tried in order as usual. If you do not set the L flag, all following rules will be applied to the rewritten URL afterwards.
END
Since Apache 2.4 you can also use the [END] flag. A matching rule with it will completely terminate further alias/rewrite processing. (Whereas the [L] flag can oftentimes trigger a second round, for example when rewriting into or out of subdirectories.)
QSA
The query string append flag allows us to pass in extra variables to the specified URL which will get added to the original get parameters. For our example this means that something like /blog/1/foo/?comments=15 would load /blog/index.php?id=1&title=foo&comments=15
R
This flag isn't one I used in the example above, but is one I thought was worth mentioning. It allows you to specify an HTTP redirect, with the option to include a status code (e.g. R=301). For example if you wanted to do a 301 redirect on /myblog/ to /blog/ you would simply write a rule something like this:
RewriteRule ^myblog/(.*)$ /blog/$1 [R=301,QSA,L]
Rewrite Conditions
Rewrite conditions make rewrites even more powerful, allowing you to specify rewrites for more specific situations. There are a lot of conditions which you can read about in the documentation, but I'll touch on a few common examples and explain them:
# if the host doesn't start with www. then add it and redirect
RewriteCond %{HTTP_HOST} !^www\.
RewriteRule ^ http://www.%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
This is a very common practice, which will prepend your domain with www. (if it isn't there already) and execute a 301 redirect. For example, loading up http://example.com/blog/ it would redirect you to http://www.example.com/blog/
# if it can't find the image, try to find the image on another domain
RewriteCond %{REQUEST_URI} \.(jpg|jpeg|gif|png)$ [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.*)$ http://www.example.com/$1 [L]
This is slightly less common, but is a good example of a rule that doesn't execute if the filename is a directory or file that exists on the server.
%{REQUEST_URI} \.(jpg|jpeg|gif|png)$ [NC] will only execute the rewrite for files with a file extension of jpg, jpeg, gif or png (case insensitive).
%{REQUEST_FILENAME} !-f will check to see if the file exists on the current server, and only execute the rewrite if it doesn't
%{REQUEST_FILENAME} !-d will check to see if the requested path is an existing directory on the server, and only execute the rewrite if it isn't
The rewrite will attempt to load the same file on another domain
References
Stack Overflow has many other great resources to get started:
Serverfault: Everything you ever wanted to know about mod_rewrite
(Remember to remove the leading slash from ^/ pattern prefixes for .htaccess usage.)
Do's and Don'ts in Hidden features of mod_rewrite.
Look through our most popular mod-rewrite questions and answers.
Apache redirecting and remapping guide.
AskApache ultimate .htaccess guide
And the mod-rewrite tag wiki references.
And newcomer-friendly regex overviews even:
Our regex tag wiki for a syntax compendium.
And the short Apache regex summary.
Else regexp.info for easy-to-understand basics.
Oft-used placeholders
.* matches anything, even an empty string. You don't want to use this pattern everywhere, but often in the last fallback rule.
[^/]+ is more often used for path segments. It matches anything but the forward slash.
\d+ only matches numeric strings.
\w+ matches alphanumeric characters. It's basically shorthand for [A-Za-z0-9_].
[\w\-]+ for "slug"-style path segments, using letters, numbers, dash - and _
[\w\-.,]+ adds periods and commas. Prefer an escaped \- dash in […] charclasses.
\. denotes a literal period. Otherwise . outside of […] is placeholder for any symbol.
Each of these placeholders is usually wrapped in (…) parentheses as capture group. And the whole pattern often in ^………$ start + end markers. Quoting "patterns" is optional.
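Put together, a typical rule built from these placeholders could look like this (the product.php script and its parameters are made up for illustration):
RewriteRule ^shop/([\w-]+)/(\d+)$ product.php?category=$1&id=$2 [L]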
RewriteRules
The following examples are PHP-centric and a bit more incremental, easier to adapt for similar cases.
They're just summaries, often link to more variations or detailed Q&As.
Static mapping /contact, /about
Shortening a few page names to internal file schemes is most simple:
RewriteRule ^contact$ templ/contact.html
RewriteRule ^about$ about.php
Numeric identifiers /object/123
Introducing shortcuts like http://example.com/article/531 to existing PHP scripts is also easy. The numeric placeholder can just be remapped to a $_GET parameter:
RewriteRule ^article/(\d+)$ article-show.php?id=$1
# └───────────────────────────┘
Slug-style placeholders /article/with-some-title-slug
You can easily extend that rule to allow for /article/title-string placeholders:
RewriteRule ^article/([\w-]+)$ article-show.php?title=$1
# └────────────────────────────────┘
Note that your script must be able (or be adapted) to map those titles back to database-ids. RewriteRules alone can't create or guess information out of thin air.
Slugs with numeric prefixes /readable/123-plus-title
Therefore you'll often see mixed /article/529-title-slug paths used in practice:
RewriteRule ^article/(\d+)-([\w-]+)$ article.php?id=$1&title=$2
# └───────────────────────────────┘
Now you could just skip passing the title=$2 along entirely, because your script will typically rely on the database id anyway. The -title-slug has become arbitrary URL decoration.
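A sketch of that simplification, using the article.php example from above: the slug is still accepted in the URL but simply not passed along.
# the -title-slug part is matched but ignored
RewriteRule ^article/(\d+)(?:-[\w-]+)?$ article.php?id=$1 [L]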
Uniformity with alternative lists /foo/… /bar/… /baz/…
If you have similar rules for multiple virtual page paths, then you can match and compact them with | alternative lists. And again just reassign them to internal GET parameters:
# ┌─────────────────────────┐
RewriteRule ^(blog|post|user)/(\w+)$ disp.php?type=$1&id=$2
# └───────────────────────────────────┘
You can split them out into individual RewriteRules should this get too complex.
Dispatching related URLs to different backends /date/SWITCH/backend
A more practical use of alternative lists are mapping request paths to distinct scripts. For example to provide uniform URLs for an older and a newer web application based on dates:
# ┌─────────────────────────────┐
# │ ┌───────────┼───────────────┐
RewriteRule ^blog/(2009|2010|2011)/([\d-]+)/?$ old/blog.php?date=$2
RewriteRule ^blog/(\d+)/([\d-]+)/?$ modern/blog/index.php?start=$2
# └──────────────────────────────────────┘
This simply remaps 2009-2011 posts onto one script, and all other years implicitly to another handler.
Note the more specific rule coming first. Each script might use different GET params.
Other delimiters than just / path slashes /user-123-name
You're most commonly seeing RewriteRules to simulate a virtual directory structure. But you're not forced to be uncreative. You can as well use - hyphens for segmenting or structure.
RewriteRule ^user-(\d+)$ show.php?what=user&id=$1
# └──────────────────────────────┘
# This could use `(\w+)` alternatively for user names instead of ids.
For the also common /wiki:section:Page_Name scheme:
RewriteRule ^wiki:(\w+):(\w+)$ wiki.php?sect=$1&page=$2
# └─────┼────────────────────┘ │
# └────────────────────────────┘
Occasionally it's suitable to alternate between /-delimiters and : or . in the same rule even. Or have two RewriteRules again to map variants onto different scripts.
Optional trailing / slash /dir = /dir/
When opting for directory-style paths, you can make it reachable with and without a final /
RewriteRule ^blog/([\w-]+)/?$ blog/show.php?id=$1
# ┗┛
Now this handles both http://example.com/blog/123 and /blog/123/. And the /?$ approach is easy to append onto any other RewriteRule.
Flexible segments for virtual paths .*/.*/.*/.*
Most rules you'll encounter map a constrained set of /…/ resource path segments to individual GET parameters. Some scripts handle a variable number of options however.
The Apache regexp engine doesn't allow optionalizing an arbitrary number of them. But you can easily expand it into a rule block yourself:
RewriteRule ^(\w+)/?$ in.php?a=$1
RewriteRule ^(\w+)/(\w+)/?$ in.php?a=$1&b=$2
RewriteRule ^(\w+)/(\w+)/(\w+)/?$ in.php?a=$1&b=$2&c=$3
# └─────┴─────┴───────────────────┴────┴────┘
If you need up to five path segments, then copy this scheme along into five rules. You can of course use a more specific [^/]+ placeholder each.
Here the ordering isn't as important, as neither overlaps. So having the most frequently used paths first is okay.
Alternatively you can utilize PHP's array parameters via a ?p[]=$1&p[]=$2&p[]=$3 query string here - if your script merely prefers them pre-split.
(Though it's more common to just use a catch-all rule, and let the script itself expand the segments out of the REQUEST_URI.)
See also: How do I transform my URL path segments into query string key-value pairs?
Optional segments prefix/opt?/.*
A common variation is to have optional prefixes within a rule. This usually makes sense if you have static strings or more constrained placeholders around:
RewriteRule ^(\w+)(?:/([^/]+))?/(\w+)$ ?main=$1&opt=$2&suffix=$3
Now the more complex pattern (?:/([^/]+))? there simply wraps a non-capturing (?:…) group, and makes it optional with the trailing ?. The contained
placeholder ([^/]+) becomes $2 in the substitution, but will be empty if there's no middle /…/ path.
Capture the remainder /prefix/123-capture/…/*/…whatever…
As said before, you don't often want too generic rewrite patterns. It does however make sense to combine static and specific comparisons with a .* sometimes.
RewriteRule ^(specific)/prefix/(\d+)(/.*)?$ speci.php?id=$2&otherparams=$3
This makes any /…/…/… trailing path segments optional. Which then of course requires the handling script to split them up and extract the parameters
itself (which is what Web-"MVC" frameworks do).
Trailing file "extensions" /old/path.HTML
URLs don't really have file extensions. Which is what this entire reference is about (= URLs are virtual locators, not necessarily a direct filesystem image).
However if you had a 1:1 file mapping before, you can craft simpler rules:
RewriteRule ^styles/([\w\.\-]+)\.css$ sass-cache.php?old_fn_base=$1
RewriteRule ^images/([\w\.\-]+)\.gif$ png-converter.php?load_from=$1
Other common uses are remapping obsolete .html paths to newer .php handlers, or just aliasing directory names only for individual (actual/real) files.
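A hedged sketch of the obsolete-.html case, assuming the replacement scripts share the same base name and this lives in the document root's .htaccess:
# only rewrite when a correspondingly named .php script actually exists
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
RewriteRule ^(.+)\.html$ $1.php [L]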
Ping-Pong (redirects and rewrites in unison) /ugly.html ←→ /pretty
So at some point you're rewriting your HTML pages to carry only pretty links, as outlined by deceze.
Meanwhile you'll still receive requests for the old paths, sometimes even from bookmarks. As workaround, you can ping-pong browsers to display/establish
the new URLs.
This common trick involves sending a 30x/Location redirect whenever an incoming URL follows the obsolete/ugly naming scheme.
Browsers will then rerequest the new/pretty URL, which afterwards is rewritten (just internally) to the original or new location.
# redirect browser for old/ugly incoming paths
RewriteRule ^old/teams\.html$ /teams [R=301,QSA,END]
# internally remap already-pretty incoming request
RewriteRule ^teams$ teams.php [QSA,END]
Note how this example just uses [END] instead of [L] to safely alternate. For older Apache 2.2 versions you can use other workarounds, besides also remapping
query string parameters for example:
Redirect ugly to pretty URL, remap back to the ugly path, without infinite loops
Spaces ␣ in patterns /this+that+
It's not that pretty in browser address bars, but you can use spaces in URLs. For rewrite patterns use backslash-escaped \␣ spaces.
Else just "-quote the whole pattern or substitution:
RewriteRule "^this [\w ]+/(.*)$" "index.php?id=$1" [L]
Clients encode spaces in URLs as + or %20. Yet RewriteRule patterns are compared against the already-decoded path, so it's the literal space character you match.
Frequent duplicates:
Catch-all for a central dispatcher / front-controller script
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^.*$ index.php [L]
Which is often used by PHP frameworks or WebCMS / portal scripts. The actual path splitting then is handled in PHP using $_SERVER["REQUEST_URI"]. So conceptually it's pretty much the opposite of URL handling "per mod_rewrite". (Just use FallbackResource instead.)
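The FallbackResource directive mentioned in passing is indeed the simpler spelling of this catch-all (available since Apache 2.2.16):
# existing files and directories are served as usual;
# anything unresolved is handed to the dispatcher instead
FallbackResource /index.php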
Remove www. from hostname
Note that this doesn't copy a query string along, etc.
# ┌──────────┐
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC] │
RewriteRule ^(.*)$ http://%1/$1 [R=301,L] │
# ↓ └───┼────────────┘
# └───────────────┘
See also:
· URL rewriting for different protocols in .htaccess
· Generic htaccess redirect www to non-www
· .htaccess - how to force "www." in a generic way?
Note that RewriteCond/RewriteRule combos can be more complex, with matches (%1 and $1) interacting in both directions even:
[Diagram from the Apache manual - mod_rewrite intro, Copyright 2015 The Apache Software Foundation, AL-2.0]
Redirect to HTTPS://
RewriteCond %{SERVER_PORT} 80
RewriteRule ^(.*)$ https://example.com/$1 [R,L]
See also: https://wiki.apache.org/httpd/RewriteHTTPToHTTPS
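A common variant checks the HTTPS variable and keeps whichever hostname was requested (a sketch; behind a TLS-terminating proxy you'd test a forwarded header instead):
RewriteCond %{HTTPS} !=on
RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [R=301,L]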
"Removing" the PHP extension
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.+)$ $1.php [L] # or [END]
See also: Removing the .php extension with mod_rewrite
Aliasing old .html paths to .php scripts
See: http://httpd.apache.org/docs/2.4/rewrite/remapping.html#backward-compatibility
Rewrite from URL like "/page" to a script such as "/index.php/page"
See mod_rewrite, php and the .htaccess file
Redirect subdomain to a folder
See How can i get my htaccess to work (subdomains)?
Prevalent .htaccess pitfalls
Now take this with a grain of salt. Not every piece of advice can be generalized to all contexts.
This is just a simple summary of well-known and a few unobvious stumbling blocks:
Enable mod_rewrite and .htaccess
To actually use RewriteRules in per-directory configuration files you must:
Check that your server has AllowOverride All enabled. Otherwise your per-directory .htaccess directives will go ignored, and RewriteRules won't work.
Obviously have mod_rewrite enabled in your httpd.conf modules section.
Prepend each list of rules with RewriteEngine On still. While mod_rewrite is implicitly active in <VirtualHost> and <Directory> sections,
the per-directory .htaccess files need it individually summoned.
The leading slash ^/ won't match
You shouldn't start your .htaccess RewriteRule patterns with ^/ normally:
RewriteRule ^/article/\d+$ …
↑
This is often seen in old tutorials. And it used to be correct for ancient Apache 1.x versions. Nowadays request paths are conveniently fully directory-relative in .htaccess RewriteRules. Just leave the leading / out.
· Note that the leading slash is still correct in <VirtualHost> sections though. Which is why you often see it ^/? optionalized for rule parity.
· Or when using a RewriteCond %{REQUEST_URI} you'd still match for a leading /.
· See also Webmaster.SE: When is the leading slash (/) needed in mod_rewrite patterns?
<IfModule *> wrappers begone!
You've probably seen this in many examples:
<IfModule mod_rewrite.c>
Rewrite…
</IfModule>
It does make sense in <VirtualHost> sections - if it was combined with another fallback option, such as ScriptAliasMatch. (But nobody ever does that).
And it's commonly distributed for default .htaccess rulesets with many open source projects. There it's just meant as fallback, and keeps "ugly" URLs working as the default.
However you don't want that usually in your own .htaccess files.
Firstly, mod_rewrite does not randomly disengage. (If it did, you'd have bigger problems).
Were it really disabled, your RewriteRules still wouldn't work anyway.
It's meant to prevent HTTP 500 errors. What it usually accomplishes is gracing your users with HTTP 404 errors instead. (Not so much more user-friendly if you think about it.)
Practically it just suppresses the more useful log entries, or server notification mails. You'd be none the wiser as to why your RewriteRules never work.
What seems enticing as generalized safeguard, often turns out to be an obstacle in practice.
Don't use RewriteBase unless needed
Many copy+paste examples contain a RewriteBase / directive. Which happens to be the implicit default anyway. So you don't actually need this. It's a workaround for fancy VirtualHost rewriting schemes, and misguessed DOCUMENT_ROOT paths for some shared hosters.
It makes sense to use with individual web applications in deeper subdirectories. It can shorten RewriteRule patterns in such cases. Generally it's best to prefer relative path specifiers in per-directory rule sets.
See also How does RewriteBase work in .htaccess
Disable MultiViews when virtual paths overlap
URL rewriting is primarily used for supporting virtual incoming paths. Commonly you just have one dispatcher script (index.php) or a few individual handlers (articles.php, blog.php, wiki.php, …). The latter might clash with similar virtual RewriteRule paths.
A request for /article/123 for example could map to article.php with a /123 PATH_INFO implicitly. You'd either have to guard your rules then with the commonplace RewriteCond !-f+!-d, and/or disable PATH_INFO support, or perhaps just disable Options -MultiViews.
Which is not to say you always have to. Content-Negotiation is just an automatism to virtual resources.
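A conservative combination for that situation might look like this, reusing the article.php example (Options in .htaccess requires AllowOverride Options):
Options -MultiViews
# don't rewrite requests for things that really exist
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^article/(\d+)$ article.php?id=$1 [L]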
Ordering is important
See Everything you ever wanted to know about mod_rewrite
if you haven't already. Combining multiple RewriteRules often leads to interaction. This isn't something to prevent habitually per [L] flag, but a scheme you'll embrace once versed.
You can re-re-rewrite virtual paths from one rule to another, until it reaches an actual target handler.
Still you'd often want to have the most specific rules (fixed string /forum/… patterns, or more restrictive placeholders [^/.]+) in the early rules.
Generic slurp-all rules (.*) are better left to the later ones. (An exception is a RewriteCond -f/-d guard as primary block.)
Stylesheets and images stop working
When you introduce virtual directory structures /blog/article/123 this impacts relative resource references in HTML (such as <img src=mouse.png>).
Which can be solved by:
Only using server-absolute references href="/old.html" or src="/logo.png"
Often simply by adding <base href="/index"> into your HTML <head> section.
This implicitly rebinds relative references to what they were before.
You could alternatively craft further RewriteRules to rebind .css or .png paths to their original locations.
But that's both unneeded, or incurs extra redirects and hampers caching.
See also: CSS, JS and images do not display with pretty url
RewriteConds just mask one RewriteRule
A common misinterpretation is that a RewriteCond blocks multiple RewriteRules (because they're visually arranged together):
RewriteCond %{SERVER_NAME} localhost
RewriteRule ^secret admin/tools.php
RewriteRule ^hidden sqladmin.cgi
Which it doesn't per default. You can chain them using the [S=2] flag. Else you'll have to repeat them. While sometimes you can craft an "inverted" primary rule to [END] the rewrite processing early.
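A sketch of the [S] skip technique with the rules from above: an inverted guard rule jumps over the following rules whenever the condition doesn't hold.
# if NOT on localhost, skip the two admin rules below
RewriteCond %{SERVER_NAME} !localhost
RewriteRule ^ - [S=2]
RewriteRule ^secret admin/tools.php
RewriteRule ^hidden sqladmin.cgi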
QUERY_STRING exempt from RewriteRules
You can't match RewriteRule index.php\?x=y, because mod_rewrite compares just against relative paths per default. You can match them separately however via:
RewriteCond %{QUERY_STRING} \b(?:param)=([^&]+)(?:&|$)
RewriteRule ^add/(.+)$ add/%1/$1 # ←──﹪₁──┘
See also How can I match query string variables with mod_rewrite?
.htaccess vs. <VirtualHost>
If you're using RewriteRules in a per-directory config file, then worrying about regex performance is pointless. Apache retains
compiled PCRE patterns longer than a PHP process with a common routing framework. For high-traffic sites you should however consider
moving rulesets into the vhost server configuration, once they've been battle-tested.
In this case, prefer the optionalized ^/? directory separator prefix. This allows to move RewriteRules freely between PerDir and server
config files.
Whenever something doesn't work
Fret not.
Compare access.log and error.log
Often you can figure out how a RewriteRule misbehaves just from looking at your error.log and access.log.
Correlate access times to see which request path originally came in, and which path/file Apache couldn't resolve to (error 404/500).
This doesn't tell you which RewriteRule is the culprit. But inaccessible final paths like /docroot/21-.itle?index.php may give away where to inspect further.
Otherwise disable rules until you get some predictable paths.
Enable the RewriteLog
See Apache RewriteLog docs. For debugging you can enable it in the vhost sections:
# Apache 2.2
RewriteLogLevel 5
RewriteLog /tmp/rewrite.log
# Apache 2.4
LogLevel alert rewrite:trace5
#ErrorLog /tmp/rewrite.log
That yields a detailed summary of how incoming request paths get modified by each rule:
[..] applying pattern '^test_.*$' to uri 'index.php'
[..] strip per-dir prefix: /srv/www/vhosts/hc-profi/index.php -> index.php
[..] applying pattern '^index\.php$' to uri 'index.php'
Which helps to narrow down overly generic rules and regex mishaps.
See also:
· .htaccess not working (mod_rewrite)
· Tips for debugging .htaccess rewrite rules
Before asking your own question
As you might know, Stack Overflow is very suitable for asking questions on mod_rewrite. Make them on-topic
by including prior research and attempts (this avoids redundant answers), demonstrating basic regex understanding, and:
Include full examples of input URLs, falsely rewritten target paths, and your real directory structure.
The complete RewriteRule set, but also single out the presumed defective one.
Apache and PHP versions, OS type, filesystem, DOCUMENT_ROOT, and PHP's $_SERVER environment if it's about a parameter mismatch.
An excerpt from your access.log and error.log to verify what the existing rules resolved to. Better yet, a rewrite.log summary.
This nets quicker and more exact answers, and makes them more useful to others.
Comment your .htaccess
If you copy examples from somewhere, take care to include a # comment and origin link. While it's merely bad manners to omit attribution,
it often really hurts maintenance later. Document any code or tutorial source. In particular, while you're still unversed, you should be
all the more interested in not treating them like magic black boxes.
It's not "SEO"-URLs
Disclaimer: Just a pet peeve. You often hear pretty URL rewriting schemes referred to as "SEO" links or something. While this is useful for googling examples, it's a dated misnomer.
None of the modern search engines are really disturbed by .html and .php in path segments, or ?id=123 query strings for that matter. Search engines of old, such as AltaVista, did avoid crawling websites with potentially ambiguous access paths. Modern crawlers are often even hungry for deep web resources.
What "pretty" URLs should conceptionally be used for is making websites user-friendly.
Having readable and obvious resource schemes.
Ensuring URLs are long-lived (AKA permalinks).
Providing discoverability through /common/tree/nesting.
However don't sacrifice unique requirements for conformism.
Tools
There are various online tools to generate RewriteRules for most GET-parameterish URLs:
http://www.generateit.net/mod-rewrite/index.php
http://www.ipdistance.com/mod_rewrite.php
http://webtools.live2support.com/misc_rewrite.php
They mostly just output generic [^/]+ placeholders, but that likely suffices for trivial sites.
Alternatives to mod_rewrite
Many basic virtual URL schemes can be achieved without using RewriteRules. Apache allows PHP scripts to be invoked without .php extension, and with a virtual PATH_INFO argument.
Use the PATH_INFO, Luke
Nowadays AcceptPathInfo On is often enabled by default. Which basically allows .php and other resource URLs to carry a virtual argument:
http://example.com/script.php/virtual/path
Now this /virtual/path shows up in PHP as $_SERVER["PATH_INFO"] where you can handle any extra arguments however you like.
This isn't as convenient as having Apache separate input path segments into $1, $2, $3 and passing them as distinct $_GET variables to PHP. It's merely emulating "pretty URLs" with less configuration effort.
Enable MultiViews to hide the .php extension
The simplest option to also eschew .php "file extensions" in URLs is enabling:
Options +MultiViews
This has Apache select article.php for HTTP requests on /article due to the matching basename. And this works well together with the aforementioned PATH_INFO feature. So you can just use URLs like http://example.com/article/virtual/title. Which makes sense if you have a traditional web application with multiple PHP invocation points/scripts.
Note that MultiViews has a different/broader purpose though. It incurs a very minor performance penalty, because Apache always looks for other files with matching basenames. It's actually meant for Content-Negotiation, so browsers receive the best alternative among available resources (such as article.en.php, article.fr.php, article.jp.mp4).
SetType or SetHandler for extensionless .php scripts
A more directed approach to avoid carrying around .php suffixes in URLs is configuring the PHP handler for other file schemes. The simplest option is overriding the default MIME/handler type via .htaccess:
DefaultType application/x-httpd-php
This way you could just rename your article.php script to just article (without extension), but still have it processed as a PHP script. (Note that this DefaultType trick only works up to Apache 2.2; in 2.4 the directive was effectively disabled, so prefer the per-file handler approach below.)
Now this can have some security and performance implications, because all extensionless files would be piped through PHP now. Therefore you can alternatively set this behaviour for individual files only:
<Files article>
SetHandler application/x-httpd-php
# or SetType
</Files>
This is somewhat dependent on your server setup and the used PHP SAPI. Common alternatives include ForceType application/x-httpd-php or AddHandler php5-script.
Again take note that such settings propagate from one .htaccess to subfolders. You should always disable script execution (SetHandler None and Options -ExecCGI or php_flag engine off etc.) for static resources and upload/ directories etc.
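For example, a minimal lockdown dropped into an upload directory's own .htaccess might look like this (assuming mod_php; FPM/FastCGI setups need the equivalent handler- or pool-level switches):
# never execute anything in this directory as a script
SetHandler None
Options -ExecCGI
php_flag engine off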
Other Apache rewriting schemes
Among its many options, Apache provides mod_alias features - which sometimes work just as well as mod_rewrite's RewriteRules. Note that most of those must be set up in a <VirtualHost> section however, not in per-directory .htaccess config files.
ScriptAliasMatch is primarily for CGI scripts, but also ought to work for PHP. It allows regexps just like any RewriteRule. In fact it's perhaps the most robust option to configure a catch-all front controller.
And a plain Alias helps with a few simple rewriting schemes as well.
Even a plain ErrorDocument directive could be used to let a PHP script handle virtual paths. Note that this is a kludgy workaround however, prohibits anything but GET requests, and floods the error.log by definition.
See http://httpd.apache.org/docs/2.2/urlmapping.html for further tips.

How do I configure apache for a custom directory?

Trying to configure apache2 to load example.com/forum/ from a different document root, relative to the site root. Forums are installed somewhere else on the server.
Is there a directory alias command? I've found the alias configuration entry for apache, but had no luck.
Basically, I want example.com to have the same directory its always had, but example.com/forum/ to be hosted somewhere else, on the same server.
I tagged this question with mod_rewrite because I thought maybe it would be the key, here.
Cheers!
Alias is the right way, unless you have some subtlety that you didn't reveal in your question.
# http.conf
Alias /forum /usr/lib/bbs/ # or whatever
The job of Alias is to take the abstract URL coming into your system and map it to a concrete filesystem path. Once it has done that, the request is no longer a URL but a path. If there is no Alias or similar directive handling that URL, then it will get mapped to a concrete path via DocumentRoot.
If this isn't working, you have to debug it further. Are you getting errors when you access /forum? Look in the error log.
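One Apache 2.4 subtlety worth checking while you debug: a directory outside the normal document root also has to be explicitly allowed in the server config, roughly like this (path taken from the example above):
Alias /forum /usr/lib/bbs/
<Directory /usr/lib/bbs/>
    Require all granted
</Directory>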
It all depends on what you want. You can "hardlink" with a real path and it works (so you were right to think it could work with mod_rewrite).
Quick sample (that works on my production domains) to make an internal change (I add a subdirectory):
RewriteRule (.*) %{DOCUMENT_ROOT}/mysubfolder%{REQUEST_FILENAME} [QSA,L]
So you can easily do something like:
RewriteRule ^/forum/(.*) %{DOCUMENT_ROOT}/mysubfolder%{REQUEST_FILENAME} [QSA,L]
And my suggestion would be that if you plan to have more rewrite rules, keep everything homogeneous, i.e.: keep on using only rewrite rules, so use my suggestion above. This way you'll not get a bad mix of Alias, RewriteRules and so on. For nice and clean stuff: keep everything homogeneous.

Using .htaccess rewrite rules to reflect a "fake" directory structure in the address bar

I'm working with an online encyclopedia and I am trying to achieve the following:
Given the physical location of a file in http://example.com/articles/c/a/t/Cat.html,
Get the location in the address bar to show http://example.com/encyclopedia/Cat.html
This also needs to work so that if a link is clicked or someone types in "example.com/encyclopedia/Cat.html", the server will look for the file in "/articles/c/a/t/Cat.html", yet still serve the shorter URI in the address bar.
I understand this may involve some heavy .htaccess voodoo to accomplish, or perhaps that it would be better to use a PHP script to serve this purpose.
So far I have the following in my .htaccess:
<IfModule mod_rewrite.c>
Options +FollowSymLinks
RewriteEngine On
RewriteRule ^encyclopedia/(.*)\.html$ articles/$1.html [NC]
RewriteCond %{THE_REQUEST} ^GET\ articles/(.*)
RewriteRule ^articles/(.*) /encyclopedia/$1 [L,R=301]
</IfModule>
However with this code, it only works by going to "example.com/encyclopedia/c/a/t/Cat.html" and showing the proper page, and when you go to "/articles/c/a/t/Cat.html" it still doesn't rewrite it as "/encyclopedia/", it just stays the same.
Edit - By removing the GET\ part from the RewriteCond and removing the leading forward-slash from /encyclopedia/$1 in the following line, any requests to "/articles/c/a/t/Cat.html" are correctly redirected to "/encyclopedia/c/a/t/Cat.html". I am still at a loss trying to remove the "/c/a/t" part though.
I've tried using the following two rules to remove the "c/a/t/" part:
RewriteRule ^encyclopedia/((.)(.)(.).*)\.html$ articles/$2/$3/$4/$1.html [NC]
RewriteRule ^articles/(.)/(.)/(.)/(.*) /encyclopedia/$4 [L,R=301]
But with no success as I'm sure what's happening is I'm getting the capital "C" from "Cat.html" and putting that in as "/articles/C/a/t/Cat.html" which will obviously not work.
I've been looking around studying .htaccess RewriteRule and RewriteCond for days but I still haven't been able to figure this out, and I've banged my head on the keyboard enough to cause a few migraines.
Would this be better accomplished using a PHP script? Or can this voodoo be easily enough accomplished via only .htaccess rules?
First thing, forget about .htaccess files. .htaccess files are just an extension of the Apache configuration that you can put in some directories. They really slow down your Apache server, which has to check part of its configuration at runtime. They exist to allow some configuration on hosted environments.
Put everything you have in .htaccess files in <Directory> sections on your VirtualHost and use AllowOverride None to tell Apache to forget about trying to read .htaccess files.
So what you need is mod-rewrite voodoo, not .htaccess voodoo :-)
Now your rewrite problem is quite complex. If you need some mod-rewrite help do not forget to read this Server Fault article: Everything You Ever Wanted to Know about Mod_Rewrite Rules but Were Afraid to Ask?
I assume that your Cat.html -> c/a/t/Cat.html is just an example and that you can have more than 3 letters : CatAndDogs.html -> c/a/t/a/n/d/d/o/g/s/CatAndDogs.html.
The part of mod_rewrite you need is (I think) RewriteMap. There you will find some helpers like lowercase: that could help you, but you will also find prg:, which means using an external program to perform the mapping. I would take the Perl RewriteMap examples available via Google and adapt them. It should be quite easy and fast in Perl to transform CatAndDogs.html into c/a/t/a/n/d/d/o/g/s/CatAndDogs.html.
Note that RewriteMap will never work inside a .htaccess. Forget .htaccess files. The prg: keyword will launch your Perl program as a parallel daemon and will feed it quite a lot of data, so you should really write something robust & fast. Do not forget to use the RewriteLock directive to avoid mixing results (some prg: mappers do not care about mixing results, think about load balancers for example, but you do want to avoid mixing results for parallel queries).
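To make that more concrete, the wiring in the <VirtualHost> (not in .htaccess) could look roughly like the sketch below. The map program name and path are assumptions; the Perl script itself would read "Cat.html" on stdin and print "c/a/t/Cat.html" back on stdout.
# server/vhost config only - RewriteMap cannot be declared in .htaccess
RewriteEngine On
RewriteMap splitpath prg:/usr/local/bin/split-path.pl
# Apache 2.2 needs RewriteLock for prg: maps; Apache 2.4 handles this via its Mutex directive instead
RewriteLock /var/lock/rewrite-splitpath.lock
RewriteRule ^/encyclopedia/(.+\.html)$ /articles/${splitpath:$1} [L]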