How can I get mod_rewrite to match a rule just once - apache

I have the following URL...
http://localhost/http.mygarble.com/foundationsofwebprogramming/86
...that I want to convert into the following:
http://localhost/http.mygarble.com/php/blog.php?subdomain=foundationsofwebprogramming&page=posts&label=86
I thought I could achieve this with the following rule:
RewriteRule ([^/]+)/([^/]+)$ php/blog.php?subdomain=$1&page=post&label=$2 [NC,L]
However, what I find is that this rule is applied repeatedly, resulting in an internal server error. I understand that when the URI is transformed using this rule, the resulting URI will also match the rule, and therefore it is applied again ad infinitum.
My previous (admittedly rather hazy) understanding was that the [L] flag would stop further processing, although I now understand that this simply means that only the remainder of the rules are skipped, and does not stop the rewrite engine running through the rules again.
I can fix this problem by adding the following condition...
RewriteCond $0 !php/blog.php
RewriteRule ([^/]+)/([^/]+)$ php/blog.php?subdomain=$1&page=post&label=$2 [NC,L]
...or by writing a more specific regular expression. But what I really want to do is find a way of stopping the rewrite engine from attempting ANY further matches once this rule is matched once. Is this possible?
Many thanks.

Usually 2 methods are used.
The first one is a rewrite condition testing that the requested file is not a real file. When the internal redirect sends the request back through the rules, php/blog.php is a real file, so the RewriteRule is not executed a second time. A side effect is that any request for a file that exists won't be rewritten (which can be a good side effect):
RewriteCond %{REQUEST_FILENAME} !-f
The second solution is to check you're not in an internal redirection with:
RewriteCond %{ENV:REDIRECT_STATUS} ^$
A side effect of this second solution is that the RewriteRule is no longer applied once an earlier rule has already triggered an internal redirect, which matters if you want some rewriting to happen on a second pass.
Edit
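For completeness, a third option that is sometimes suggested is the [NS] (or [nosubreq]) flag. Note, however, that it only skips the rule for internal sub-requests (for example, those issued by mod_dir or mod_include). It does not skip the rule after an internal redirect, so on its own it may not stop this loop.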
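As a sketch, this is how each of the first two conditions might be combined with the rule from the question. The two variants are alternatives rather than something to use together, and neither has been tested against the asker's setup.

```apache
# Variant A: skip the rule when the request already maps to an existing file
# (php/blog.php exists, so the second pass is left alone).
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ([^/]+)/([^/]+)$ php/blog.php?subdomain=$1&page=post&label=$2 [NC,L]

# Variant B: skip the rule on any pass that follows an internal redirect
# (REDIRECT_STATUS is only empty on the initial request).
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ([^/]+)/([^/]+)$ php/blog.php?subdomain=$1&page=post&label=$2 [NC,L]
```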

And the third method is to upgrade Apache to 2.3.9 or higher and use the [END] flag instead of [L].
No side effects
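Applied to the rule from the question, that would presumably look like the following (a sketch, assuming Apache 2.3.9 or later):

```apache
# [END] stops this pass and also prevents any further rewrite passes,
# so the rewritten php/blog.php URL is never fed back through the rules.
RewriteRule ([^/]+)/([^/]+)$ php/blog.php?subdomain=$1&page=post&label=$2 [NC,END]
```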

Related

Multiple RewriteRules for a single URL

I want to use multiple RewriteRules in .htaccess to modify a single URL, but only the last rule gets applied.
Example INPUT (link loaded by a browser):
http://example.com/aaa/foo/bar
.htaccess:
RewriteRule ^(.*)foo/(.*)$ $1nofoo/$2
RewriteRule ^(.*)bar(.*)$ $1nobar$2
EXPECTED OUTPUT (what Apache should actually look at):
http://example.com/aaa/nofoo/nobar
ACTUAL OUTPUT:
http://example.com/aaa/foo/nobar/
As you can see, only the last rule was applied. Is there any way to make it work the way I want? All suggestions are welcome.
PS. I want to avoid creating a static, ugly rule like
^(.*)foo/bar(.*) $1nofoo/nobar$2
I need all the modifications to work independently of each other.
UPDATE
So here is exactly what I am trying to achieve. I have some links to a backend server:
http://myserver.com/api/user/$userid/car/$carid/getSpeedRecordDetails
http://myserver.com/api/user/$userid/getUserDetails
http://myserver.com/api/car/$carid/getCarDetails
Where $userid and $carid are some unique 12-char-long strings.
And I want to transform them to these:
http://myserver.com/api/getSpeedRecordDetails.php?userid=$userid&carid=$carid
http://myserver.com/api/getUserDetails.php?userid=$userid
http://myserver.com/api/getCarDetails.php?carid=$carid
And I want to achieve it using the least RewriteRules possible (I am looking for a dynamic solution).
UPDATE #2
I love the SO community! Your patience and willingness to help is truly amazing :)
So the very reason why I am interested in modifying the URL using multiple RewriteRules is because I expect that my backend might soon need to implement hundreds (if not thousands) of user-friendly URLs, and mapping all of them individually would be a waste of time and money. Therefore, I want to take advantage of the fact that all the user-friendly URLs consist of repetitive chunks that can be easily translated. The calls below represent the general three types of the user-friendly URLs I need to manage. The only difference within each type is $userid, $carid, and XXXX_of_a_thousand_functions.
http://myserver.com/backend-api/user/$userid/car/$carid/first_of_a_thousand_functions.do
http://myserver.com/backend-api/user/$userid/second_of_a_thousand_functions.do
http://myserver.com/backend-api/car/$carid/sixteenth_of_a_thousand_functions.do
All of these calls (and remember, there will be hundreds, or even thousands of them) need to be translated into these:
http://myserver.com/backend-api/first_of_a_thousand_functions.php?USER_ID=$userid&CAR_ID=$carid
http://myserver.com/backend-api/second_of_a_thousand_functions.php?USER_ID=$userid
http://myserver.com/backend-api/sixteenth_of_a_thousand_functions.php?CAR_ID=$carid
Seeing that there is a simple pattern governing the translation (I hope it is also visible to you), I thought I could create some simple rules for translating different 'chunks' of the user-friendly URL into the internal URL. For example:
RewriteRule ^(.*)user/([A-Za-z0-9]+)/(.*)$ $1$3&USER_ID=$2
Would be responsible for translating the piece user/HSGRE8563LOS into &USER_ID=HSGRE8563LOS
And because some calls have more than one 'chunk' to process, I need to be able to use multiple RewriteRules on a single URL, which I hope somewhat correlates with the title of the question :)
UPDATE #3 - a future reference
So there are a couple of things I believe need to be said regarding this question.
Apparently .htaccess DOES by default apply all the rules whose conditions are met. However, some CGI/Fast-CGI installations ruin it, resulting in the kind of behaviour depicted in the first example.
Also, one thing I have NEVER seen mentioned anywhere is that Apache applies the RewriteRules not in the order in which they are listed in .htaccess, but it starts 'scanning' the URL from its beginning, and as soon as the conditions for ANY of the rules are met, the URL gets modified accordingly.
After UPDATE#1... to internally rewrite from the stated friendly URLs using mod_rewrite. Try the following directives in the root .htaccess file:
RewriteEngine On
# http://myserver.com/api/user/$userid/car/$carid/getSpeedRecordDetails
RewriteRule ^api/user/(\w{12})/car/(\w{12})/(getSpeedRecordDetails)$ /api/$3.php?userid=$1&carid=$2 [L]
# http://myserver.com/api/user/$userid/getUserDetails
RewriteRule ^api/user/(\w{12})/(getUserDetails)$ /api/$2.php?userid=$1 [L]
# http://myserver.com/api/car/$carid/getCarDetails
RewriteRule ^api/car/(\w{12})/(getCarDetails)$ /api/$2.php?carid=$1 [L]
\w{12} matches a 12 char long string of letters (upper/lower), numbers and underscore. However, this should be made as restrictive as possible. eg. if a valid id is only numeric then \d{12} would be preferable.
UPDATE#2 The process is almost the same as above....
RewriteBase /backend-api
# http://myserver.com/backend-api/user/$userid/car/$carid/<any>.do
RewriteRule ^backend-api/user/(\w{12})/car/(\w{12})/(\w+)\.do$ $3.php?USER_ID=$1&CAR_ID=$2 [L]
# http://myserver.com/backend-api/user/$userid/<any>.do
RewriteRule ^backend-api/user/(\w{12})/(\w+)\.do$ $2.php?USER_ID=$1 [L]
# http://myserver.com/backend-api/car/$carid/<any>.do
RewriteRule ^backend-api/car/(\w{12})/(\w+)\.do$ $2.php?CAR_ID=$1 [L]
Note the use of RewriteBase. This allows the URL-path to be removed from the RewriteRule substitution.
If /backend-api exists as a physical directory then these rules could instead go in /backend-api/.htaccess. Then you could remove the RewriteBase directive and modify the RewriteRule pattern by removing the backend-api/ portion from near the start of the pattern.
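A sketch of that alternative layout, assuming /backend-api is a physical directory that contains the .php scripts; the rules below would go in /backend-api/.htaccess:

```apache
RewriteEngine On

# Patterns are matched relative to /backend-api/, so the "backend-api/"
# prefix is dropped and no RewriteBase is set.
# /backend-api/user/$userid/car/$carid/<any>.do
RewriteRule ^user/(\w{12})/car/(\w{12})/(\w+)\.do$ $3.php?USER_ID=$1&CAR_ID=$2 [L]
# /backend-api/user/$userid/<any>.do
RewriteRule ^user/(\w{12})/(\w+)\.do$ $2.php?USER_ID=$1 [L]
# /backend-api/car/$carid/<any>.do
RewriteRule ^car/(\w{12})/(\w+)\.do$ $2.php?CAR_ID=$1 [L]
```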
UPDATE#3 - a future reference
Apparently .htaccess DOES by default apply all the rules whose conditions are met. However, some CGI/Fast-CGI installations ruin it, resulting in the kind of behaviour depicted in the first example.
There are certainly some server (mis)configurations that appear to affect certain aspects of .htaccess/mod_rewrite, however, not in the way suggested earlier in your question and I have never encountered this directly myself. And this should have nothing to do with CGI/Fast-CGI. (?)
...Apache applies the RewriteRules not in the order in which they are listed in .htaccess, ...
Not sure exactly what you mean by this, but RewriteRules are processed "in the order in which they are listed in .htaccess" - this is fundamental to how mod_rewrite works. (However, different modules execute at different times, regardless of the order in .htaccess, but within each module the directives execute top-down, in order. eg. mod_rewrite executes before mod_alias (usually), so if you have a mod_alias Redirect before a mod_rewrite RewriteRule, the RewriteRule is still processed first - which is why it is a bad idea to mix redirects from both modules, as you can end up with confusing conflicts.)
...but it starts 'scanning' the URL from its beginning, and as soon as the conditions for ANY of the rules are met, the URL gets modified accordingly.
Exactly, in the order in which they appear in the file.
Note that it's the RewriteRule directives that are scanned (top - down), not the conditions (ie. RewriteCond directives) - in case that is what you were implying. If the RewriteRule pattern matches the URL-path then the preceding RewriteCond directives are processed and if all these pass then the substitution occurs. (Which is why it is always more efficient to match what you can with the RewriteRule and not rely totally on the RewriteCond directives - the RewriteRule is processed first, not the RewriteCond directives.)
Crucially (and this might be where you are tripping up), if you have multiple RewriteRule directives, then each subsequent RewriteRule matches against the output/substitution of the previously matched RewriteRule (if any), not against the URL-path of the initial request. Only the first matched RewriteRule matches against the URL-path of the request. So, yes, RewriteRule directives do chain together.
The exception to this is the L (LAST) flag on the RewriteRule. This "breaks" the chain. Although not completely... it causes the current round of processing to stop, but then it all starts again from the top! Only when the URL passes through unchanged is processing finished. (However, in Apache 2.4 you do have the END flag - this does indeed halt processing completely.)
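To illustrate the chaining and the difference between L and END, here is a minimal sketch for a root .htaccess file. The paths old/, new/ and final.php are hypothetical names chosen for the example, not taken from the question.

```apache
RewriteEngine On

# Request: /old/item (the first rule sees the URL-path "old/item").
# No [L] here, so processing continues with the rewritten path "new/item".
RewriteRule ^old/(.*)$ new/$1

# This pattern is tested against the output of the previous rule,
# not against the original request.
# [END] (Apache 2.4) stops all further rewriting; with [L] the whole
# ruleset would be run again from the top against final.php.
RewriteRule ^new/(.*)$ final.php?path=$1 [END]
```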

Using [L] flag when redirecting to another host

The apache docs mention
You will almost always want to use [R] in conjunction with [L]
(that is, use [R,L]) because on its own, the [R] flag prepends
http://thishost[:thisport] to the URI, but then passes this on to
the next rule in the ruleset, which can often result in 'Invalid URI
in request' warnings.
Now, let's say I've a rewrite statement as follows:
RewriteRule ^pass/?$ https://www.example.com/ [R,NC]
Since, here I am redirecting my client to another host/domain, do I really need to pass the [L] flag?
Would there be effects if I am also using the [QSA] flag?
Good question indeed.
In all my testing I couldn't create a scenario where omitting L from this rule changed the overall behavior. The only difference is internal: the L flag forces mod_rewrite to end the current pass immediately rather than continuing to the end of the file. So IMO having L is better for performance reasons, if nothing else.
About QSA: It is absolutely not needed here since QSA is needed only when your rule is modifying QUERY_STRING which is not the case here.
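As a sketch of the difference, the two rules below are alternatives (only one would be used at a time). The from=pass parameter is made up for the example.

```apache
# No query string in the substitution: the original query string is passed
# through unchanged, so /pass?x=1 redirects to https://www.example.com/?x=1.
RewriteRule ^pass/?$ https://www.example.com/ [R,NC,L]

# The substitution has its own query string: without QSA the original query
# string would be replaced; with QSA it is appended after from=pass.
RewriteRule ^pass/?$ https://www.example.com/?from=pass [R,NC,L,QSA]
```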

Using variable in RewriteCond condition pattern

I want to associate user Ids to a specific application Id like:
<user_id> <app_id>
615 1
616 7
617 3
618 3
My URIs look like:
/<app_id>/<user_id>/...
Now, I want to be able to easily change the application without impacting the user bookmarks. In my example, I want both
/1/615/index.html or /3/615/index.html
to be served as
/1/615/index.html
With the following rule, I get infinite loop:
RewriteMap map dbm:user-application.map
RewriteRule ^/([0-9]+)/([0-9]+)/(.*)$ /${map:$2}/$2/$3 [R,L]
...
#other proxy code to forward request to applications
I understand that after the redirection, Apache will always execute the same rule.
I then tried to add a rewrite condition to block the loop, like
RewriteMap map dbm:user-application.map
RewriteCond %{REQUEST_URI} !^/${map:$2}
RewriteRule ^/([0-9]+)/([0-9]+)/(.*)$ /${map:$2}/$2/$3 [R,L]
If I read my rewrite logs correctly, I can see that the variable !^/${map:$2} is not replaced in the condition pattern, but checked "as is". And then the condition is always true, and I still get my infinite loop.
Any idea to block the loop as soon as the application id match my map?
/3/615/index.html is correctly redirecting to /1/615/index.html
The problem is that you are redirecting /1/615/index.html to /1/615/index.html as well - you want to detect the case in which the map transform is a no-op and not redirect at all in that case.
If you don't care about the user-facing URL, just change the [R,L] to [L] (removing the R) and you should be fine since it won't trigger a new round-trip from the client.
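You're right that ${map:$2} is not expanded in the condition pattern: mod_rewrite only expands variables, backreferences and map lookups in the TestString (the first argument of RewriteCond) and treats the CondPattern as a plain regular expression. You might be able to use %n-style backreferences to a regex in a previous RewriteCond, although the same limitation applies to the pattern of the second condition...
RewriteCond %{REQUEST_URI} ^/([0-9]+)/
RewriteCond %{REQUEST_URI} !^/${map:%1}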
But I have not tested this, so YMMV.
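One commonly used way around the unexpanded pattern is to put both values into the TestString and compare them with a regex backreference. This is an untested sketch; it assumes every user id has an entry in the map (an empty lookup would still trigger the redirect).

```apache
RewriteMap map dbm:user-application.map

# TestString is "<app_id from the URL>::<app_id from the map>".
# The pattern matches when both halves are identical; the leading "!"
# negates it, so the redirect only fires when the URL carries a different
# application id than the one the map returns for this user.
RewriteCond $1::${map:$2} !^([0-9]+)::\1$
RewriteRule ^/([0-9]+)/([0-9]+)/(.*)$ /${map:$2}/$2/$3 [R,L]
```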

ShortUrl: why does it choose one line over another?

RewriteRule ^ihome/([^/]+) /index_ip.php?page=ihome [L]
RewriteRule ^([^/\.]+)/?$ index.php?page=$1 [L]
When I go to site.local/ihome.... it goes to index.php?page=ihome.
Yet the ihome rule is above it.
I have tried it the other way around too.
I need to force ihome to a particular template file.
It's because the first rule requires a / to be present, and at least one non-/ character after that. The second rule does not (note the ? after the /). That's why browsing to ihome sends you to index.php?page=ihome.
You could modify your first rule to something like ^ihome/?([^/]+)? so the slash and characters are optional. The first rule looks a bit strange though, as it would skip ihome/example/secondslash as well. Is that intended?
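A possible variant, assuming the intent is that ihome itself and everything below it should use the dedicated template (that intent is a guess, not something stated in the question):

```apache
# ihome, ihome/ and anything below ihome/ go to the dedicated template.
RewriteRule ^ihome(/.*)?$ /index_ip.php?page=ihome [L]

# Every other single-segment URL falls through to the generic handler.
RewriteRule ^([^/.]+)/?$ index.php?page=$1 [L]
```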

Apache rewrite rule leading slash

Leading slash first argument: ignored?
What's the syntax difference between
RewriteRule help help.php?q=noslash [L] #1
RewriteRule /help help.php?q=withslash [L] #2
If I hit http://localhost/help, it goes to #1, if I hit http://localhost//help it still goes to #1.
Am I right in saying the leading slash in the first argument to RewriteRule is essentially ignored?
Leading slash second argument: error?
Also, why doesn't this rewrite rule work?
RewriteRule help /help.php [L] #1
Putting a leading slash in front of the second arg actually creates a 500 error for the server. Why?
I should note I'm using a .htaccess file to write these rules in
Strangely enough,
RewriteRule ^/help help.php?q=2 [L]
The above rule fails and never matches.
This rule:
RewriteRule ^help help.php?q=1 [L]
Matches http://localhost/help, http://localhost//help and http://localhost///help
It appears RewriteRule never sees leading slashes of the path, and as TheCoolah said they are collapsed (to 0.. when using a .htaccess file anyway) no matter how many there are.
For the second part of the question,
RewriteRule ^help /help.php
I'm getting the answer from Definitive Guide to Apache Mod_rewrite
... a rewrite target that does not begin with http:// or another protocol
designator is assumed to be a file system path. File paths that do not begin with a slash are interpreted as being relative to the directory in which the rewriting is taking place.
So /help.php looks in the root of the system for a file called help.php, which on my system it cannot find.
To make /help.php appear as a relative URL (relative to the root of the site) you can use the [PT] flag:
RewriteRule ^/help /help.php [PT]
That directs http://localhost/help to http://localhost/help.php.
Regarding double slashes: Most Web servers silently collapse multiple slashes into a single slash early in the request processing pipeline. This is true for at least Apache, Tomcat and Jetty. Most Unix-based file systems work the same way. If you really want to check for this, you need to do something like:
RewriteCond %{REQUEST_URI} ^(.*)//(.*)$
help matches "help" anywhere in the path.
/help matches nothing since the rewriterule directive omits the leading slash for matching purposes (i.e., you must use ^, not / or ^/, to reference the current directory).
(This can be very confusing if you've used %{REQUEST_URI} in rewritecond because %{REQUEST_URI} does begin with a leading slash. When matching against %{REQUEST_URI}, ^ and ^/ are equivalent and a directory name will always be preceded by a slash character regardless of whether or not it is in the top-level directory.)
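A small sketch of the two forms side by side, assuming the rules live in the .htaccess file of the document root:

```apache
# %{REQUEST_URI} keeps the leading slash, so the condition uses ^/help$.
RewriteCond %{REQUEST_URI} ^/help$
# The RewriteRule pattern sees the path without the leading slash.
RewriteRule ^help$ help.php?q=1 [L]
```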
The server error is caused by an infinite loop. "help" becomes "/help.php" which is then matched by the same directive that did the rewriting. So, after the first match, "/help.php" becomes "/help.php" infinitely resulting in a URL that can't be resolved.
I believe such loops can be fixed with the end flag (i.e., [end]), but that flag requires Apache 2.3.9+ whereas Apache 2.2 seems to be more common in deployment. It'd probably be better to just fix the regular expression anyway; ^help$ would seem to be the better choice here.
The way RewriteRule works is that if the given regular expression matches any part of the path part of the URL (the part after the host and port but before the query string), then the entire path part is completely replaced with the given substitution. This explains the behaviour you're seeing in the first part of your question.
I'm not sure what could be causing the 500 errors on the second part; maybe the collapsing of doubled slashes doesn't happen after the rewrite engine has run and then generates a server error.
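The reason for the 500 error is an infinite loop:
help gets rewritten to /help.php
/help.php gets stripped to help.php, which still matches the pattern help
help.php gets rewritten to /help.php
etc. until the internal redirect limit (LimitInternalRecursion, 10 by default) is hit -> 500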
Whereas if the rule rewrites help to help, Apache is smart enough to abort rewriting at that point.
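Anchoring the pattern at both ends, as suggested earlier, should break that loop. A minimal sketch for the same .htaccess context:

```apache
# Matches the path "help" only; the rewritten help.php no longer matches,
# so the next pass leaves the URL unchanged and processing ends.
RewriteRule ^help$ /help.php [L]
```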