Hidden features of mod_rewrite - apache

There seem to be a decent number of mod_rewrite threads floating around lately with a bit of confusion over how certain aspects of it work. As a result I've compiled a few notes on common functionality, and perhaps a few annoying nuances.
What other features / common issues have you run across using mod_rewrite?

Where to place mod_rewrite rules
mod_rewrite rules may be placed within the httpd.conf file, or within the .htaccess file. If you have access to httpd.conf, placing rules there will offer a performance benefit (the rules are parsed once when the server starts, whereas .htaccess files are read and parsed again on every request).
Logging mod_rewrite requests
Logging may be enabled from within the httpd.conf file (including <VirtualHost> sections):
# logs can't be enabled from .htaccess
# loglevel > 2 is really spammy!
RewriteLog /path/to/rewrite.log
RewriteLogLevel 2
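RewriteLog and RewriteLogLevel were removed in Apache 2.4, so the two directives above only work on 2.2 and earlier. The following is a sketch of the 2.4 equivalent. It uses a per-module LogLevel, and mod_rewrite's trace output then goes to the normal ErrorLog:

```apache
# Apache 2.4+: rewrite tracing is written to the ErrorLog
# levels run from trace1 (least) to trace8 (most); trace3 is usually enough to follow rule matching
LogLevel alert rewrite:trace3
```

Just like a high RewriteLogLevel, high trace levels slow the server down considerably. Don't leave them on in production.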
Common use cases
To funnel all requests to a single point:
RewriteEngine on
# ignore existing files
RewriteCond %{REQUEST_FILENAME} !-f
# ignore existing directories
RewriteCond %{REQUEST_FILENAME} !-d
# map requests to index.php and append as a query string
RewriteRule ^(.*)$ index.php?query=$1
Since Apache 2.2.16 you can also use FallbackResource.
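For the front-controller case above, here is a FallbackResource sketch (mod_dir). It assumes index.php sits in the document root. Unlike the RewriteRule, it does not pass the path in a query string, so the script has to read the original path from something like $_SERVER['REQUEST_URI'] rather than $_GET['query']:

```apache
# mod_dir, Apache 2.2.16+: any request that would otherwise end in a 404
# is handed to /index.php
FallbackResource /index.php
```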
Handling 301/302 redirects:
RewriteEngine on
# 302 Temporary Redirect (302 is the default, but can be specified for clarity)
RewriteRule ^oldpage\.html$ /newpage.html [R=302]
# 301 Permanent Redirect
RewriteRule ^oldpage2\.html$ /newpage.html [R=301]
Note: when the substitution is an absolute URL on another host, the redirect is implicitly a 302:
# this rule:
RewriteRule ^somepage\.html$ http://google.com
# is equivalent to:
RewriteRule ^somepage\.html$ http://google.com [R]
# and:
RewriteRule ^somepage\.html$ http://google.com [R=302]
Forcing SSL
RewriteEngine on
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://example.com/$1 [R,L]
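The rule above hard-codes example.com. Below is a host-agnostic sketch. It assumes every host that reaches this block should be served over HTTPS, and it reuses the requested host and path. It uses a 301 once you have confirmed it works:

```apache
RewriteEngine on
RewriteCond %{HTTPS} off
# %{REQUEST_URI} always carries the leading slash, so this works unchanged
# in httpd.conf and in .htaccess; the query string is passed along automatically
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
```

If TLS is terminated by a load balancer in front of Apache, %{HTTPS} is always off and this rule loops. In that setup you would have to test a header set by the proxy instead (for example X-Forwarded-Proto, assuming your proxy sends it).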
Common flags:
[R] or [redirect] - force a redirect (defaults to a 302 temporary redirect)
[R=301] or [redirect=301] - force a 301 permanent redirect
[L] or [last] - stop rewriting process (see note below in common pitfalls)
[NC] or [nocase] - specify that matching should be case insensitive
Using the long form of flags is often more readable and will help others who come to read your code later.
You can separate multiple flags with a comma:
RewriteRule ^olddir(.*)$ /newdir$1 [L,NC]
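For comparison, here are two rules written with long-form flags. They behave the same as their short forms:

```apache
# identical to [L,NC]
RewriteRule ^olddir(.*)$ /newdir$1 [last,nocase]
# identical to [R=301,L]
RewriteRule ^oldpage\.html$ /newpage.html [redirect=301,last]
```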
Common pitfalls
Mixing mod_alias style redirects with mod_rewrite
# Bad
Redirect 302 /somepage.html http://example.com/otherpage.html
RewriteEngine on
RewriteRule ^(.*)$ index.php?query=$1
# Good (use mod_rewrite for both)
RewriteEngine on
# 302 redirect and stop processing
RewriteRule ^somepage\.html$ /otherpage.html [R=302,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# route everything else to index.php
RewriteRule ^(.*)$ index.php?query=$1
Note: you can mix mod_alias with mod_rewrite, but it involves more work than just handling basic redirects as above.
Context affects syntax
Within .htaccess files, a leading slash is not used in the RewriteRule pattern:
# given: GET /directory/file.html
# .htaccess
# result: /newdirectory/file.html
RewriteRule ^directory(.*)$ /newdirectory$1
# .htaccess
# result: no match!
RewriteRule ^/directory(.*)$ /newdirectory$1
# httpd.conf
# result: /newdirectory/file.html
RewriteRule ^/directory(.*)$ /newdirectory$1
# Putting a "?" after the slash will allow it to work in both contexts:
RewriteRule ^/?directory(.*)$ /newdirectory$1
[L] is not last! (sometimes)
The [L] flag stops processing any further rewrite rules for that pass through the rule set. However, if the URL was modified in that pass and you're in the .htaccess context or the <Directory> section, then your modified request is going to be passed back through the URL parsing engine again. And on the next pass, it may match a different rule this time. If you don't understand this, it often looks like your [L] flag had no effect.
# processing does not stop here
RewriteRule ^dirA$ /dirB [L]
# /dirC will be the final result
RewriteRule ^dirB$ /dirC
Our rewrite log shows that the rules are run twice and the URL is updated twice:
rewrite 'dirA' -> '/dirB'
internal redirect with /dirB [INTERNAL REDIRECT]
rewrite 'dirB' -> '/dirC'
The best way around this is to use the [END] flag (see Apache docs) instead of the [L] flag, if you truly want to stop all further processing of rules (and subsequent passes). However, the [END] flag is only available for Apache v2.3.9+, so if you have v2.2 or lower, you're stuck with just the [L] flag.
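Applied to the dirA/dirB example, here is a sketch with [END] (Apache 2.3.9+, .htaccess context). It leaves /dirB as the final result, because the rule set is never run on the rewritten request:

```apache
# [END] stops this pass and any later pass in per-directory context
RewriteRule ^dirA$ /dirB [END]
# not applied to the rewritten /dirB (a direct request for /dirB still matches)
RewriteRule ^dirB$ /dirC
```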
For earlier versions, you must rely on RewriteCond statements to prevent matching of rules on subsequent passes of the URL parsing engine.
# Only process the following RewriteRule if on the first pass
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ...
Or you must ensure that your RewriteRules are in a context (e.g. httpd.conf) that will not cause your request to be re-parsed.
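Here is the dirA/dirB example with the REDIRECT_STATUS guard, as a sketch for versions without [END]. On the first pass the variable is empty. After mod_rewrite's internal redirect it is set (typically to 200), so the guarded rule is skipped on the second pass:

```apache
RewriteRule ^dirA$ /dirB [L]
# first pass only: a request for /dirA now stops at /dirB
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^dirB$ /dirC
```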

If you need to 'block' internal redirects / rewrites from happening in .htaccess, take a look at the
RewriteCond %{ENV:REDIRECT_STATUS} ^$
condition.

The deal with RewriteBase:
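You almost always need to set RewriteBase in .htaccess when your rules use relative substitutions. If you don't, Apache uses the physical disk path of your directory as the base, and that path then shows up in the rewritten URL (most visibly in external redirects). So start with this: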
RewriteBase /
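Here is a sketch that makes the effect concrete. It assumes an .htaccess in a subdirectory; the /blog/ path and the /var/www/html DocumentRoot are hypothetical. With a relative target and [R], leaving out RewriteBase would send the browser to a URL built from the filesystem path, such as http://example.com/var/www/html/blog/new-page.html:

```apache
# .htaccess in /var/www/html/blog/ (served as http://example.com/blog/)
RewriteEngine on
# relative substitutions are resolved against this URL path
RewriteBase /blog/
# redirects the browser to /blog/new-page.html
RewriteRule ^old-page\.html$ new-page.html [R=301,L]
```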

Other Pitfalls:
1- Sometimes it's a good idea to disable MultiViews
Options -MultiViews
I'm not well versed in all of MultiViews' capabilities, but I know that it messes up my mod_rewrite rules when active, because one of its properties is to try and 'guess' the extension of a file that it thinks I'm looking for.
I'll explain:
Suppose you have two PHP files in your web dir, file1.php and file2.php, and you add these conditions and this rule to your .htaccess:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ file1.php/$1
You assume that all urls that do not match a file or a directory will be grabbed by file1.php. Surprise! This rule is not being honored for the url http://myhost/file2/somepath. Instead you're taken inside file2.php.
What's going on is that MultiViews automagically guessed that the url that you actually wanted was http://myhost/file2.php/somepath and gladly took you there.
Now, you have no clue what just happened and you're at that point questioning everything that you thought you knew about mod_rewrite. You then start playing around with rules to try to make sense of the logic behind this new situation, but the more you're testing the less sense it makes.
OK, in short: if you want mod_rewrite to work in a way that approximates logic, turning off MultiViews is a step in the right direction.
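2- Enable FollowSymLinks
Options +FollowSymLinks
In per-directory context (.htaccess or <Directory>), mod_rewrite refuses to run unless FollowSymLinks or SymLinksIfOwnerMatch is enabled, and Apache answers with 403 Forbidden instead. That's why you see this line mentioned so often, so just do it.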

Equality checks (testing whether one value matches another) can be done with the following example:
RewriteCond %{REQUEST_URI} ^/(server0|server1).*$ [NC]
# %1 is the string that was captured above
# %1<>%{HTTP_COOKIE} concatenates the first match with a mod_rewrite variable -> "server0<>foo=bar;"
# the pattern captures the part before <> as (.*); \1 then looks for that same text in the cookie part
# <> is used as a string separator/indicator and can be replaced by any other characters that don't occur in either value
RewriteCond %1<>%{HTTP_COOKIE} !^(.*)<>.*stickysession=\1.*$ [NC]
RewriteRule ^(.*)$ https://notmatch.domain.com/ [R=301,L]
Dynamic Load Balancing:
If you use mod_proxy to balance your system, it's possible to add a dynamic range of worker servers.
RewriteCond %{HTTP_COOKIE} ^.*stickysession=route\.server([0-9]{1,2}).*$ [NC]
RewriteRule (.*) https://worker%1.internal.com/$1 [P,L]
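Here is a slightly tighter sketch. It assumes server or virtual-host context and the same hypothetical worker hostnames, and makes three changes:
- The cookie name is anchored, so that something like mystickysession= does not match.
- The leading slash is kept out of $1, which avoids a double slash in the backend URL.
- SSLProxyEngine is switched on, because the backend URLs are https.

[P] also needs mod_proxy and mod_proxy_http to be loaded.

```apache
SSLProxyEngine on
RewriteEngine on
# cookie "stickysession=route.server7" -> %2 = 7 (%1 holds the cookie separator)
RewriteCond %{HTTP_COOKIE} (^|;\s*)stickysession=route\.server([0-9]{1,2})(;|$) [NC]
RewriteRule ^/(.*)$ https://worker%2.internal.com/$1 [P,L]
```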

A better understanding of the [L] flag is in order. The [L] flag is last; you just have to understand what will cause your request to be routed through the URL parsing engine again. From the docs (http://httpd.apache.org/docs/2.2/rewrite/flags.html#flag_l):
The [L] flag causes mod_rewrite to stop processing the rule set. In
most contexts, this means that if the rule matches, no further rules
will be processed. This corresponds to the last command in Perl, or
the break command in C. Use this flag to indicate that the current
rule should be applied immediately without considering further rules.
If you are using RewriteRule in either .htaccess files or in <Directory> sections, it is important to have some understanding of
how the rules are processed. The simplified form of this is that once
the rules have been processed, the rewritten request is handed back to
the URL parsing engine to do what it may with it. It is possible that
as the rewritten request is handled, the .htaccess file or <Directory>
section may be encountered again, and thus the ruleset may be run
again from the start. Most commonly this will happen if one of the
rules causes a redirect - either internal or external - causing the
request process to start over.
So the [L] flag does stop processing any further rewrite rules for that pass through the rule set. However, if your rule marked with [L] modified the request, and you're in the .htaccess context or a <Directory> section, then your modified request is going to be passed back through the URL parsing engine again. And on the next pass, it may match a different rule this time. If you don't understand what happened, it looks like your first rewrite rule with the [L] flag had no effect.
The best way around this is to use the [END] flag (http://httpd.apache.org/docs/current/rewrite/flags.html#flag_end) instead of the [L] flag, if you truly want to stop all further processing of rules (and subsequent reparsing). However, the [END] flag is only available for Apache v2.3.9+, so if you have v2.2 or lower, you're stuck with just the [L] flag. In this case, you must rely on RewriteCond statements to prevent matching of rules on subsequent passes of the URL parsing engine. Or you must ensure that your RewriteRules are in a context (e.g. httpd.conf) that will not cause your request to be re-parsed.

Another great feature is RewriteMap expansion. It's especially useful if you have a massive amount of hosts / rewrites to handle.
Maps work like key-value replacement:
RewriteMap examplemap txt:/path/to/file/map.txt
Then you can use a mapping in your rules like:
RewriteRule ^/ex/(.*) ${examplemap:$1}
More information on this topic can be found here:
http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html#mapfunc
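Here is a slightly fuller sketch. It assumes server or virtual-host context, because RewriteMap cannot be declared in .htaccess (a map declared in the server config can still be used from .htaccess rules). A txt map holds one key and value per line. The optional |default after the key is used when the key is missing. The map contents and the /pages/... paths are hypothetical:

```apache
# httpd.conf or <VirtualHost>
RewriteMap examplemap txt:/path/to/file/map.txt
RewriteEngine on
# /ex/about -> /pages/about.html; unknown keys fall back to /pages/notfound.html
# [PT] treats the result as a URL path rather than a filesystem path (and implies [L])
RewriteRule ^/ex/(.*) ${examplemap:$1|/pages/notfound.html} [PT]

# /path/to/file/map.txt:
#   about     /pages/about.html
#   contact   /pages/contact.html
```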

mod_rewrite can modify aspects of request handling without altering the URL, e.g. setting environment variables, setting cookies, etc. This is incredibly useful.
Conditionally set an environment variable:
RewriteCond %{HTTP_COOKIE} myCookie=(a|b) [NC]
RewriteRule .* - [E=MY_ENV_VAR:%1]
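The sketch below shows the variable being used once it is set. It reads the variable with mod_headers, and adds the related [CO] flag to set a cookie without touching the URL. The header name, cookie name and example.com domain are hypothetical. In .htaccess, a later internal rewrite renames the variable to REDIRECT_MY_ENV_VAR.

```apache
RewriteEngine on
RewriteCond %{HTTP_COOKIE} myCookie=(a|b) [NC]
RewriteRule .* - [E=MY_ENV_VAR:%1]
# echo the value in a response header, only when the variable is set (needs mod_headers)
Header set X-My-Var "%{MY_ENV_VAR}e" env=MY_ENV_VAR

# [CO=name:value:domain:lifetime-in-minutes:path]
RewriteRule ^ - [CO=visited:yes:.example.com:1440:/]
```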
Return a 503 response:
RewriteRule's [R] flag can take a non-3xx value and return a non-redirecting response, e.g. for managed downtime/maintenance:
RewriteRule .* - [R=503,L]
will return a 503 response (not a redirect per se).
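Here is a slightly fuller maintenance-mode sketch. It assumes a static /maintenance.html page and one admin address that should still get through (203.0.113.10 is a placeholder). The error page itself is excluded, so serving it as the 503 body does not trip the rule again:

```apache
ErrorDocument 503 /maintenance.html
RewriteEngine on
# let the admin IP through
RewriteCond %{REMOTE_ADDR} !^203\.0\.113\.10$
# don't answer the error page itself with another 503
RewriteCond %{REQUEST_URI} !=/maintenance.html
RewriteRule ^ - [R=503,L]
```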
Also, mod_rewrite can act like a super-powered interface to mod_proxy, so you can do this instead of writing ProxyPass directives:
RewriteRule ^/(.*)$ balancer://cluster%{REQUEST_URI} [P,QSA,L]
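The balancer://cluster target still has to be defined somewhere. The sketch below assumes these modules are loaded: mod_proxy, mod_proxy_http, mod_proxy_balancer, and an lbmethod module (e.g. mod_lbmethod_byrequests on 2.4). The backend hosts are hypothetical. ProxyPassReverse is still worth adding, because [P] on its own does not rewrite Location headers coming back from the backends:

```apache
<Proxy "balancer://cluster">
    BalancerMember "http://app1.internal:8080"
    BalancerMember "http://app2.internal:8080"
</Proxy>

RewriteEngine on
RewriteRule ^/(.*)$ balancer://cluster%{REQUEST_URI} [P,QSA,L]
# fix up redirects issued by the backends
ProxyPassReverse "/" "balancer://cluster/"
```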
Opinion:
Using RewriteRules and RewriteConds to route requests to different applications or load balancers based on virtually any conceivable aspect of the request is just immensely powerful. Controlling requests on their way to the backend, and being able to modify the responses on their way back out, makes mod_rewrite the ideal place to centralize all routing-related config.
Take the time to learn it, it's well worth it! :)

Related

Editing .htaccess file to modify URL

I'm trying to modify my .htaccess file to modify my URL and have tried many methods but cannot achieve exactly what I want. For example I have this URL:
http://mywebsite.com/FOLDER/index.php?id=5
Now I want it to look like:
http://mywebsite.com/FOLDER/5
or
http://mywebsite.com/FOLDER/ID/5
My .htaccess contains the following code:
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.php -f
RewriteRule ^index/([0-9]+)/([0-9a-zA-Z_-]+) index.php?id=$1 [NC]
I cannot figure out what's wrong. Thanks.
You can use:
RewriteEngine on
# external redirect from actual URL to pretty one
RewriteCond %{THE_REQUEST} \s/+FOLDER/index\.php\?id=(\d+) [NC]
RewriteRule ^ /FOLDER/%1? [R=301,L,NE]
# internal forward from pretty URL to actual one
RewriteRule ^FOLDER/(\d+)/?$ FOLDER/index.php?id=$1 [L,QSA,NC]
The first argument of RewriteRule is the pattern that the incoming URL is matched against, without the domain and without the path of the directory that holds the .htaccess file. In your case the URL is http://mywebsite.com/FOLDER/5. Assuming that your .htaccess file is in your DocumentRoot, the regex will match against FOLDER/5.
You are currently trying to match FOLDER/5 with ^index/([0-9]+)/([0-9a-zA-Z_-]+), which is not going to work. A better regex would be ^(.*)/([0-9]+)$ or ^(.*)/ID/([0-9]+)$. You can then rewrite to $1/index.php?id=$2. I would recommend using the [L] flag to stop rewriting for this round to avoid common problems with multiple rules matching while you do not expect them to.
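Here is a sketch of that suggestion, for an .htaccess in the DocumentRoot. The ID variant has to come first. Otherwise ^(.*)/([0-9]+)$ would also match FOLDER/ID/5 and capture FOLDER/ID as $1:

```apache
RewriteEngine on
# FOLDER/ID/5 -> FOLDER/index.php?id=5
RewriteRule ^(.*)/ID/([0-9]+)$ $1/index.php?id=$2 [L]
# FOLDER/5 -> FOLDER/index.php?id=5
RewriteRule ^(.*)/([0-9]+)$ $1/index.php?id=$2 [L]
```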
Besides this, make sure that your .htaccess files are being read (e.g. by checking that you get a 500 Internal Server Error when you enter garbage), that mod_rewrite is enabled, and that AllowOverride lets you override FileInfo. You also may need to turn AcceptPathInfo off.
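The last two points are set in the server config. The sketch below assumes a hypothetical DocumentRoot path. The LoadModule path differs between distributions; Debian-style systems use a2enmod rewrite instead:

```apache
# httpd.conf / virtual host
LoadModule rewrite_module modules/mod_rewrite.so

<Directory "/var/www/html">
    # FileInfo: allows RewriteEngine, RewriteRule and AcceptPathInfo in .htaccess
    # Options:  allows .htaccess to use Options +FollowSymLinks -MultiViews
    AllowOverride FileInfo Options
</Directory>
```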

Apache mod_rewrite for specific folders and paths

I have found dozens of articles online on how to setup mod_rewrites but for the love of God I can't figure out how to PROPERLY force HTTPS on ALL pages and after that force HTTP on certain directories or (already rewritten) pages.
Now this one gets really tricky, as I need HTTPS on this directory except for two cases: "/surf", which is actually rewritten from "surf.php", and "promote-([0-9a-zA-Z-]+)$", which is rewritten from "promote.php?user=$1":
<Directory /home/rotate/public_html/ptp/>
AllowOverride None
Order Deny,Allow
Allow from all
Options +SymLinksIfOwnerMatch
ErrorDocument 404 "<h1>Oops! Couldn't find that page.</h1>"
RewriteEngine On
RewriteRule ^promote-([0-9]+)$ promote.php?user=$1 [QSA,NC,L]
RewriteRule ^([^.?]+)$ %{REQUEST_URI}.php [L]
</Directory>
I have tried some stuff but which only resulted in some weird redirection loops...
RewriteCond %{HTTPS} on
RewriteRule !^(surf|promote-([0-9]+)$) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
So basically I need to force HTTPS everywhere in /ptp/ except /ptp/surf (which is rewritten from surf.php) and /ptp/promote-123 (which is rewritten from promote.php?user=123).
Currently I'm using PHP to redirect to HTTP or HTTPS as per my needs but I know that it would be much faster if I could manage to do it via rewrites.
Any pointers, tips, suggestions? Thanks.
UPDATE2: This worked:
RewriteCond %{HTTPS} off
RewriteRule !^(surf|promote(-[0-9]+)?) https://%{HTTP_HOST}%{REQUEST_URI} [R=301]
RewriteRule ^promote-([0-9]+)$ promote.php?user=$1 [NC,L]
RewriteRule ^([^.?]+)$ %{REQUEST_URI}.php
However, the resources such as javascript, fonts etc, were being blocked by the browser, unless I specified absolute HTTPS paths. Note that this never happened when redirecting through PHP...
I changed a little bit and it works perfectly
Changes
Put the RewriteRule that maps requests to .php files at the bottom.
Remove the $ (end-of-pattern anchor) from the exclusion pattern.
As said in the update, promote-1111 is rewritten to promote.php?user=$1, so change promote-[0-9]+ to promote(-[0-9]+)?; otherwise the rewritten promote.php request is caught again on the second pass, since you are rewriting it to promote.php?user=$1.
The Code
RewriteCond %{HTTPS} off
RewriteRule !^(surf|promote(-[0-9]+)?) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
RewriteRule ^([^.?]+)$ %{REQUEST_URI}.php [L]
[Screenshots: the /surf page and the index page]
Never mind the error message shown in the screenshot. I tested it on localhost, which has no certificate; it will work on a real server.
Your rules in the "update" aren't working because of side effects of using <Directory> context: each substitution starts processing again.
When you request /promote-123 and rewrite it to put the numbers in the query string, you can't then match the numbers as if they're still in the path. You'll need to match the rewritten path, and match the numbers with RewriteCond %{QUERY_STRING} (if you care about the numbers).
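Here is a sketch of that approach. It assumes the same <Directory /home/rotate/public_html/ptp/> context as in the question. The first two rules recognize the already-rewritten second-pass requests and stop: promote.php with a numeric user in the query string, and surf.php. As a result, only genuine requests outside the two exceptions are sent to HTTPS. One side effect: direct requests for those two .php URLs also stay on HTTP.

```apache
RewriteEngine On

# second pass: /ptp/promote-123 has already become promote.php?user=123
RewriteCond %{QUERY_STRING} ^user=[0-9]+
RewriteRule ^promote\.php$ - [L]
# second pass: /ptp/surf has already become surf.php
RewriteRule ^surf\.php$ - [L]

# everything except the two pretty URLs goes to HTTPS
RewriteCond %{HTTPS} off
RewriteRule !^(surf|promote-[0-9]+)$ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

RewriteRule ^promote-([0-9]+)$ promote.php?user=$1 [QSA,NC,L]
RewriteRule ^([^.?]+)$ %{REQUEST_URI}.php [L]
```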

Why does mod_rewrite ignore my [L] flag?

This is my .htaccess file. It should deliver static files from the assets folder if the URL matches them. Otherwise, everything should be redirected to index.php.
Note that the URL doesn't contain assets as a segment here. So example.com/css/style.css maps to assets/css/style.css.
RewriteEngine on
# disable directory browsing
Options -Indexes
# static assets
RewriteCond %{DOCUMENT_ROOT}/assets/$1 -f
RewriteRule ^(.*)$ assets/$1 [L]
# other requests to index.php
RewriteRule !^asset/ index.php [L]
Unfortunately, urls like example.com/assets/css/style.css also deliver the file, since for that url none of my rules applies and Apache's default behavior is applied which delivers the file.
So I tried changing the last line to this. I thought that this would work since the [L] flag in the rule above should stop execution for asset urls and deliver them.
RewriteRule ^(.*)$ index.php [L]
Instead, now all requests are redirected to index.php, even static assets like example.com/css/style.css. Why does the flag not stop execution of rewrite rules, and how do I fix my problem then?
I found the solution on the pages of the official documentation.
If you are using RewriteRule in either .htaccess files or in
<Directory> sections, it is important to have some understanding of
how the rules are processed. The simplified form of this is that once
the rules have been processed, the rewritten request is handed back to
the URL parsing engine to do what it may with it. It is possible that
as the rewritten request is handled, the .htaccess file or <Directory>
section may be encountered again, and thus the ruleset may be run
again from the start. Most commonly this will happen if one of the
rules causes a redirect - either internal or external - causing the
request process to start over.
It is therefore important, if you are using RewriteRule directives in
one of these contexts, that you take explicit steps to avoid rules
looping, and not count solely on the [L] flag to terminate execution
of a series of rules, as shown below.
An alternative flag, [END], can be used to terminate not only the
current round of rewrite processing but prevent any subsequent rewrite
processing from occurring in per-directory (htaccess) context. This
does not apply to new requests resulting from external redirects.
To fix my problem, I changed the [L] flags to [END].
RewriteEngine on
# disable directory browsing
Options -Indexes
# static assets
RewriteCond %{DOCUMENT_ROOT}/assets/$1 -f
RewriteRule ^(.*)$ assets/$1 [END]
# other requests to index.php
RewriteRule !^asset/ index.php [END]
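For servers older than 2.3.9, where [END] does not exist, here is a sketch of the same rules using the REDIRECT_STATUS guard instead. The guard rule ends processing on every pass that follows an internal redirect, so the rewritten assets/... request is served as-is:

```apache
RewriteEngine on
# disable directory browsing
Options -Indexes
# stop on any pass after an internal redirect (REDIRECT_STATUS is set then)
RewriteCond %{ENV:REDIRECT_STATUS} !^$
RewriteRule ^ - [L]
# static assets
RewriteCond %{DOCUMENT_ROOT}/assets/$1 -f
RewriteRule ^(.*)$ assets/$1 [L]
# other requests to index.php
RewriteRule !^asset/ index.php [L]
```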

Redirect loop with simple htaccess rule

I have been pulling my hair out over this. It worked before the server migration!
Ok so basically it's as simple as this:
I have a .php file whose content I want to view at an SEO-friendly URL via a rewrite rule.
Also to canonicalise and to prevent duplicate content I want to 301 the .php version to the SEO friendly version.
This is what I used and has always worked till now on the new server:
RewriteRule ^friendly-url/$ friendly-url.php [L,NC]
RewriteRule ^friendly-url.php$ /friendly-url/$1 [R=301,L]
However disaster has struck and now it causes a redirect loop.
Logically I can only assume that in this version of Apache it is tripping up as it's seeing that the script being run is the .php version and so it tries the redirect again.
How can I re-work this to make it work? Or is there a config I need to switch in WHM?
Thanks!!
This is how your .htaccess should look:
Options +FollowSymLinks -MultiViews
RewriteEngine On
RewriteBase /
# To externally redirect /friendly-url.php to /friendly-url/
RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\s/+(friendly-url)\.php [NC]
RewriteRule ^ /%1/? [R=302,L]
## To internally redirect /anything/ to /anything.php
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{DOCUMENT_ROOT}/$1\.php -f
RewriteRule ^(.+?)/$ $1.php [L]
Note how I am using R=302: I don't want the redirect cached by my browser until I confirm it's working as expected. Once I've confirmed that, I switch from R=302 to R=301.
Keep in mind that your browser may also have cached earlier attempts, since you were using R=301, so you're better off testing from a browser you haven't used for this yet, just to make sure it's working.
However disaster has struck and now it causes a redirect loop.
It causes a redirect loop because you're redirecting it to itself. The difference in my code is that I match the original request line (THE_REQUEST, which internal rewrites don't change), redirect the .php URL from there to the friendly one, and then use the internal rewrite.
The exact same .htaccess file will work differently depending on where it's placed because the [L]ast flag means something different depending on location. In ...conf, [L]ast means all finished processing so get out, but in .htaccess the exact same [L]ast flag means start all over at the top of this file.
To work as expected when moving a block of code from ...conf to .htaccess, most .htaccess files will need one or the other of these tweaks:
Change the [L]ast flags to [END]. (Problem is, the [END] flag is only available in newer [version 2.3.9 and later] Apaches, and won't even "fall back" in earlier versions.)
Add boilerplate code like this at the top of each of your .htaccess files:
RewriteCond %{ENV:REDIRECT_STATUS} !^[\s/]*$
RewriteRule ^ - [L]
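Here is an [END] sketch (Apache 2.3.9+) of the question's two rules. The internally rewritten friendly-url.php is not run through the rules again, so the redirect only fires when a browser actually requests friendly-url.php. The stray $1 in the question's redirect target is dropped, since that pattern has no capture group:

```apache
RewriteEngine On
# internal rewrite; [END] prevents the second pass that caused the loop
RewriteRule ^friendly-url/$ friendly-url.php [END,NC]
# only reached by real requests for the .php URL
RewriteRule ^friendly-url\.php$ /friendly-url/ [R=301,L]
```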

mod_rewrite ignores [L]

I want to be able to rewrite this
http://localhost/.../identicon/f528764d624db129b32c21fbca0cb8d6.png
to
http://localhost/.../identicon.php?hash=f528764d624db129b32c21fbca0cb8d6
so I added this to /.../.htaccess:
RewriteEngine On
RewriteRule ^resource/ - [L]
RewriteRule ^identicon/(.+)\.png$ identicon.php?hash=$1 [QSA,L]
RewriteRule ^(.*)$ index.php?t=$1 [QSA,L]
This doesn't work, though: it rewrites it to index.php?t=identicon.php, even though the L flag is set! Why?
Add a condition to the last rule to exclude requests that can be mapped to existing files:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ index.php?t=$1 [QSA,L]
That is necessary because, in per-directory context, the rewrite generates an internal redirect with the new URL as the request URL, so the rule set runs again despite the L flag:
Remember, however, that if the RewriteRule generates an internal redirect (which frequently occurs when rewriting in a per-directory context), this will reinject the request and will cause processing to be repeated starting from the first RewriteRule.
(Not correct answer; left for reference)
I just figured out what may be the issue - it's something that thwarted me for a long time.
Depending on your server settings, it very well may be interpreting identicon/xxx.png as a request to identicon.php/xxx.png, assuming that the PHP extension is what you wanted. Try going to /index instead of /index.php - if it loads the PHP file, this is the issue affecting you.
This is the MultiViews Apache option, and it's stupid, but it has to be enabled specifically. Go into your site configuration file and see where it is enabled, and remove it.
If you don't have total control over your server configuration, the following may work in .htaccess (depending, ironically, on your server configuration).
Options -Multiviews