mod_rewrite and dot at END of URI - apache

I have been developing a shop, which uses mod_rewrite to allow us to make the URIs more readable, for instance:
http://www.example.com/shop/Tools
Will be rewritten to
http://www.example.com/index.php?area=shop&folder=Tools
My rewrite rule is as follows:
RewriteRule ^shop/([^?#]+) index.php?area=shop&folder=$1 [NC,QSA,L]
However, this breaks when the folder name ends in . (dot), as I discovered when testing with a folder name ending in "etc."
It seems any trailing dots are removed entirely before $_GET is populated. If I put another character after the dot, it works fine; if the URI ends in any number of dots, they are all removed.
Is there a way to stop this from happening?

You don't need to exclude "?" and "#" in the RewriteRule pattern, since it is matched only against the URL path, never against the query string or the fragment (anchor).
So, this is enough:
RewriteRule ^shop/(.+) index.php?area=shop&folder=$1 [NC,QSA,L]
That being said, this does not change the fact that dots get stripped.
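To confirm that the pattern itself is not what drops the dots, here is a minimal Python sketch that emulates the rule's regex. Apache uses PCRE, and Python's `re` module treats this particular pattern the same way. The function name `rewrite_shop` is hypothetical, and the sketch assumes the rule sees the path without its leading slash and without the query string, as it does in a .htaccess file.

```python
import re
from typing import Optional

# Emulates: RewriteRule ^shop/(.+) index.php?area=shop&folder=$1 [NC,QSA,L]
# [NC] makes the match case-insensitive. The path is assumed to arrive
# without a leading slash and without the query string (per-directory context).
SHOP_RULE = re.compile(r"^shop/(.+)", re.IGNORECASE)


def rewrite_shop(path: str) -> Optional[str]:
    """Return the substitution for `path`, or None if the rule does not match."""
    match = SHOP_RULE.match(path)
    if match is None:
        return None
    # $1 is the first capture group; nothing in the regex trims trailing dots.
    return "index.php?area=shop&folder=" + match.group(1)


if __name__ == "__main__":
    for path in ("shop/Tools", "shop/etc.", "shop/etc...", "about"):
        print(path, "->", rewrite_shop(path))
```

The capture keeps every trailing dot, so if $_GET['folder'] arrives without them, the dots are being removed somewhere outside the regex.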
This may be a result of MultiViews being enabled. This option makes Apache try to resolve URIs without regard to their file extensions.
So, add this as well:
Options -MultiViews

Related

.htaccess : Pretty URL with whatever number+names of parameters

Hello!
I know there are already a lot of topics about URL rewriting, and I honestly swear I've spent a lot of time trying to apply them to my problem, but none of them seems to fit my situation exactly (if you find one that does, please give the link).
-----
Here's the problem:
I'm learning the MVC pattern and URL rewriting, and my URL looks like this:
http://localhost/blahblahblah/mywebsite/index.php?param1=value1&param2=value2&param3=value3 ... etc ...
What I want (for some MVC template goals) is this kind of URL:
http://localhost/blahblahblah/mywebsite/value1/value2/value3 ... etc ...
-----
This has to work whatever the parameter names and values are.
This is the most essential thing I can't find a solution for.
(Also, don't mind the localhost/blahblahblah part: this has to work on remote websites too, but I trust it will, since that part of the URL shouldn't matter for what I want to do.)
Thanks a lot for your time if you can help me see more clearly what I need to do.
If the .htaccess file is located in the document root (i.e. effectively at http://localhost/.htaccess), then you would need to do something like the following using mod_rewrite:
RewriteEngine On
RewriteRule ^(blahblahblah/mywebsite)/(\w+)$ $1/index.php?param1=$2 [L]
RewriteRule ^(blahblahblah/mywebsite)/(\w+)/(\w+)$ $1/index.php?param1=$2&param2=$3 [L]
RewriteRule ^(blahblahblah/mywebsite)/(\w+)/(\w+)/(\w+)$ $1/index.php?param1=$2&param2=$3&param3=$4 [L]
# etc.
Where $n is a backreference to the corresponding captured group in the preceding RewriteRule pattern (1st argument).
UPDATE: \w is a shorthand character class that matches a-z, A-Z, 0-9 and _ (underscore).
A new directive is required for every number of parameters. You could combine them into a single (complex) directive but you would have lots of empty parameters when only a few parameters were passed (rather than not passing those parameters at all).
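A rough Python emulation of the three document-root rules may make the "one directive per parameter count" point concrete. It assumes the path is seen without its leading slash (as in a root .htaccess file) and uses `re.ASCII` so that `\w` means a-z, A-Z, 0-9 and _. The names `RULES` and `rewrite` are hypothetical.

```python
import re
from typing import Optional

# One (pattern, substitution) pair per directive, tried in order.
# re.ASCII restricts \w to [a-zA-Z0-9_].
RULES = [
    (re.compile(r"^(blahblahblah/mywebsite)/(\w+)$", re.ASCII),
     r"\1/index.php?param1=\2"),
    (re.compile(r"^(blahblahblah/mywebsite)/(\w+)/(\w+)$", re.ASCII),
     r"\1/index.php?param1=\2&param2=\3"),
    (re.compile(r"^(blahblahblah/mywebsite)/(\w+)/(\w+)/(\w+)$", re.ASCII),
     r"\1/index.php?param1=\2&param2=\3&param3=\4"),
]


def rewrite(path: str) -> Optional[str]:
    """Apply the first matching rule ([L] stops processing), else return None."""
    for pattern, template in RULES:
        match = pattern.match(path)
        if match:
            # expand() fills in \1..\4 the way $1..$4 work in RewriteRule.
            return match.expand(template)
    return None
```

A URL with more segments than any rule covers (or with a trailing slash) simply falls through unchanged, which is why another directive is needed for each extra parameter.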
I'm assuming your URLs do not end in a trailing slash.
If, however, the .htaccess file is located in the /blahblahblah/mywebsite directory, then the directives can be simplified a bit:
RewriteRule ^(\w+)$ index.php?param1=$1 [L]
RewriteRule ^(\w+)/(\w+)$ index.php?param1=$1&param2=$2 [L]
RewriteRule ^(\w+)/(\w+)/(\w+)$ index.php?param1=$1&param2=$2&param3=$3 [L]
# etc.
Don't use URL parameters (alternative method)
An alternative approach is to not convert the path segments into URL parameters in .htaccess and instead just pass everything to index.php and let your PHP script split the URL into parameters. This allows for any number of parameters.
For example, your .htaccess file then becomes rather more simple:
RewriteRule ^\w+(/\w+)*$ index.php [L]
(This assumes the .htaccess file is located in /blahblahblah/mywebsite directory, otherwise you need to add the necessary directory prefix as above.)
The RewriteRule pattern simply validates that the requested URL path is of the form /value1 or /value1/value2 or /value1/value2/value3, etc. The request is then rewritten to index.php (the front controller), which handles everything.
In index.php you then examine $_SERVER['REQUEST_URI'] and parse the requested URL.
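The answer leaves the parsing step to PHP; as an illustration of the same front-controller logic, here is a Python sketch. It assumes the application lives under /blahblahblah/mywebsite/ (the hypothetical `base` parameter) and that values are positional; `parse_params` is a hypothetical name, and in PHP the input would be $_SERVER['REQUEST_URI'].

```python
import re
from typing import List

# Same shape check as the .htaccess pattern ^\w+(/\w+)*$ (ASCII \w assumed).
VALID_PATH = re.compile(r"^\w+(/\w+)*$", re.ASCII)


def parse_params(request_uri: str, base: str = "/blahblahblah/mywebsite/") -> List[str]:
    """Split a REQUEST_URI such as /base/value1/value2 into ["value1", "value2"]."""
    path = request_uri.split("?", 1)[0]  # REQUEST_URI may carry a query string
    if not path.startswith(base):
        raise ValueError("request is outside the application base path")
    relative = path[len(base):]
    if not VALID_PATH.match(relative):
        raise ValueError("unexpected URL shape: " + relative)
    return relative.split("/")
```

Because the values are returned as a list, any number of parameters is handled without touching .htaccess again.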

how do I redirect a url and remove part of the string

I have an url like this:
domain.com/thispart/blah-something-blah-remove
I need to redirect this url like this:
domain.com/newpart/blah-something-blah
I need to change the directory and remove the last part (the last part is always constant; this time it's "remove").
How do I do this? I managed to redirect the directory, but I don't know how to remove the last part.
Providing you don't have any existing mod_rewrite directives then you can do something like the following using a mod_alias RedirectMatch directive near the top of your .htaccess file:
RedirectMatch ^/thispart/([\w-]+)-remove$ /newpart/$1
Note that this removes "-remove", as in your example, not simply the string "remove".
The path segment before the "-remove" part is assumed to consist of the characters 0-9, a-z, A-Z, _ or -.
This is a temporary (302) redirect.
You will need to clear your browser cache before testing.
However, if you have existing mod_rewrite directives then you should use mod_rewrite instead to avoid potential conflicts (and improve efficiency). For example:
RewriteRule ^thispart/([\w-]+)-remove$ /newpart/$1 [R=302,L]
(Note the absence of the slash prefix on the RewriteRule pattern.)
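The slash difference between the two directives can be checked with a short Python sketch. It assumes RedirectMatch sees the full URL-path while a RewriteRule in .htaccess sees the path without its leading slash; `redirect_target` is a hypothetical helper, and `re.ASCII` keeps `\w` to 0-9, a-z, A-Z and _.

```python
import re
from typing import Optional

# mod_alias RedirectMatch: matched against the full URL-path (leading slash included).
REDIRECT_MATCH = re.compile(r"^/thispart/([\w-]+)-remove$", re.ASCII)
# mod_rewrite in .htaccess: matched against the path with the leading slash removed.
REWRITE_RULE = re.compile(r"^thispart/([\w-]+)-remove$", re.ASCII)


def redirect_target(pattern: "re.Pattern[str]", path: str) -> Optional[str]:
    """Return the Location of the 302 redirect, or None if there is no match."""
    match = pattern.match(path)
    return "/newpart/" + match.group(1) if match else None
```

Because `[\w-]+` is greedy, only the final "-remove" is dropped: /thispart/a-remove-remove would redirect to /newpart/a-remove.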

Apache Content Negotiation, File Extensions, and Permanent Redirects

Given an Apache 2.x web server that uses Content Negotiation (+MultiViews) to access URLs without their extensions (e.g., allow /foo vs. /foo.php or /foo.html), how does one issue a 301 permanent redirect when someone does in fact try to use those extensions?
The goal is for all roads to lead to the sans extension version of the URL, so /foo/ goes to /foo, and /foo.html goes to /foo. It's the latter one that is proving tricky. (Use case: There are scattered legacy URLs out on the internets that still use the extension. We want those to be permanently redirected.)
There is the canonical link element but, even in the accompanying slides, the suggestion is it's better to do the redirect server-side in the first place.
I've been trying this with mod_rewrite, but it seems like Apache "beats me to it" as it were. It's as if the extension is simply ignored. "No need, I've got it covered" says Apache. But then you can't handle the permanent redirect, and thus extension and no-extension variants are both allowed. Not the desired result. :)
Here's one example. Given a 2-4 character filename consisting of lower-case letters, and a test file placed in /foo/file.html, we want to permanently redirect to /foo/file:
Options +MultiViews
RewriteEngine on
...
RewriteRule ^foo/([a-z]{2,4}).html/$ /foo/$1 [R=301,L]
/foo/file/ and /foo/file.html/ do redirect to /foo/file, but of course /foo/file.html does not. If we try a rule like the following (note the lack of a trailing slash before $):
RewriteRule ^foo/([a-z]{2,4}).html$ /foo/$1 [R=301,L]
... we end up with too many redirects because Apache acts as if the rule is the following, and so it ends up chasing its own tail:
RewriteRule ^foo/([a-z]{2,4})$ /foo/$1 [R=301,L]
In an attempt to be too clever for my own good, I also tried nested parentheses:
RewriteRule ^foo/(([a-z]{2,4}).html)$ /foo/$2 [R=301,L]
No dice. Redirect loop city.
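The loop does not seem to come from the pattern itself. A small Python check (`redirect_for` is a hypothetical name) shows that the redirect target /foo/file, seen as the per-directory path foo/file, does not match the rule. It also shows that the unescaped `.` in the original patterns matches any character, so `\.html` is the safer spelling. A plausible explanation, not confirmed in the question, is that with MultiViews on, the new request for /foo/file is negotiated back to foo/file.html before the .htaccess rules run, so the rule fires again on every request.

```python
import re
from typing import Optional

# The asker's rule, with the dot escaped so it only matches a literal ".".
HTML_RULE = re.compile(r"^foo/([a-z]{2,4})\.html$")


def redirect_for(path: str) -> Optional[str]:
    """Return the 301 target for a per-directory path, or None if no match."""
    match = HTML_RULE.match(path)
    return "/foo/" + match.group(1) if match else None


if __name__ == "__main__":
    for path in ("foo/file.html", "foo/file", "foo/filexhtml"):
        print(path, "->", redirect_for(path))
```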
What would be really good is to capture this sort of thing "en masse" so I don't have all these special cases floating around in htaccess.
Another SO question began to address this for the single case of handling html files, but the proposed solution ostensibly requires disabling of Content Negotiation, which isn't good if you still want to use it for images and other file extensions (as it is in my case).
Extra credit: We also want to avoid trailing slashes, so if someone tries /foo/ (which could itself be a .html or .php file) it goes to /foo no matter what. The first rule (above) accomplishes this, but I think it's due to +MultiViews. I have my doubts about using DirectorySlash here, as there may be some risk there that makes it not as worthwhile.

Apache mod_rewrite not persisting the name

I usually put my mod_rewrite conditions in an .htaccess file, but this is a case where it must go into the httpd.conf file.
I am confused because what I want to do seems simple:
The root of the site is a nested directory: mydomain.com/foo/bar/
It just has to be that way.
I want to write a rule so a person can enter:
mydomain.com/simple and it will show content from mydomain.com/foo/bar
Also, if a person clicks around the site, I want the mydomain.com/simple/some-other-page structure to persist.
The closest I've gotten is this:
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteRule ^simple$ /foo/bar/$1 [PT]
</IfModule>
However, using this rule, when a person types mydomain.com/simple it rewrites the URI in the browser to mydomain.com/foo/bar
What am I doing wrong?
Thanks in advance.
First, there may be a problem with this rule:
RewriteRule ^simple$ /foo/bar/$1 [PT]
The pattern ^simple$ will never match here: in httpd.conf (server or virtual-host context), RewriteRule is matched against the full URL-path, which always starts with a /.
You're using $1 in the right-hand side of the rule, but there are no match groups in the left-hand side that will populate this. This means that a request for /simple would get you /foo/bar/, but a request for /simple/somethingelse wouldn't match the rule. If this isn't the behavior you want, you probably mean this:
RewriteRule ^/simple(.*)$ /foo/bar$1 [PT]
(Note that I've added the missing leading / here as well).
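A quick Python emulation of the corrected server-context pattern shows how both /simple and deeper paths are mapped. `map_simple` is a hypothetical helper, and the [PT] flag (handing the result back to URL mapping) is not modelled.

```python
import re
from typing import Optional

# Server-context rule: RewriteRule ^/simple(.*)$ /foo/bar$1 [PT]
SIMPLE_RULE = re.compile(r"^/simple(.*)$")


def map_simple(url_path: str) -> Optional[str]:
    """Return the rewritten URL-path, or None when the rule does not match."""
    match = SIMPLE_RULE.match(url_path)
    return "/foo/bar" + match.group(1) if match else None
```

Note that `(.*)` also matches /simpleton, mapping it to /foo/barton; if that matters, a pattern such as ^/simple(/.*)?$ would limit the match to /simple and the paths below it.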
With these changes in place, this rule behaves on my system as I think you're expecting.
Lastly, turning on RewriteLog and setting RewriteLogLevel (on Apache versions before 2.4; in 2.4 and later, use LogLevel with the rewrite:trace levels instead) will help expose exactly what's happening.

Apache rewrite rule leading slash

Leading slash first argument: ignored?
What's the syntax difference between
RewriteRule help help.php?q=noslash [L] #1
RewriteRule /help help.php?q=withslash [L] #2
If I hit http://localhost/help, it goes to #1, if I hit http://localhost//help it still goes to #1.
Am I right in saying the leading slash in the first argument to RewriteRule is essentially ignored?
Leading slash second argument: error?
Also, why doesn't this rewrite rule work?
RewriteRule help /help.php [L] #1
Putting a leading slash in front of the second arg actually creates a 500 error for the server. Why?
I should note that I'm writing these rules in a .htaccess file.
Strangely enough,
RewriteRule ^/help help.php?q=2 [L]
The above rule fails and never matches.
This rule:
RewriteRule ^help help.php?q=1 [L]
Matches http://localhost/help, http://localhost//help and http://localhost///help
It appears RewriteRule never sees the leading slashes of the path; as TheCoolah said, they are collapsed (to zero, when using a .htaccess file anyway) no matter how many there are.
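The observed behaviour can be approximated in Python, assuming the .htaccess file sits in the document root (`dir_prefix="/"`). `per_dir_path` is a hypothetical helper; Apache's real per-directory handling works on the mapped filesystem path, so this is only a model of what the pattern ends up seeing.

```python
import re


def per_dir_path(url_path: str, dir_prefix: str = "/") -> str:
    """Approximate the string a .htaccess RewriteRule is matched against."""
    merged = re.sub(r"/+", "/", url_path)  # //help and ///help become /help
    # Strip the directory prefix, which includes the leading slash.
    return merged[len(dir_prefix):] if merged.startswith(dir_prefix) else merged


if __name__ == "__main__":
    for path in ("/help", "//help", "///help"):
        seen = per_dir_path(path)
        print(path, "->", seen, bool(re.match(r"^help", seen)), bool(re.match(r"^/help", seen)))
```

Under this model ^help matches all three requests and ^/help matches none, which is what the tests above report.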
For the second part of the question,
RewriteRule ^help /help.php
I'm getting the answer from The Definitive Guide to Apache mod_rewrite:
... a rewrite target that does not begin with http:// or another protocol designator is assumed to be a file system path. File paths that do not begin with a slash are interpreted as being relative to the directory in which the rewriting is taking place.
So /help.php makes Apache look in the root of the filesystem for a file called help.php, which on my system it cannot find.
To make /help.php be treated as a URL-path (relative to the root of the site), you can use the [PT] (passthrough) flag:
RewriteRule ^/help /help.php [PT]
That directs http://localhost/help to http://localhost/help.php.
Regarding double slashes: Most Web servers silently collapse multiple slashes into a single slash early in the request processing pipeline. This is true for at least Apache, Tomcat and Jetty. Most Unix-based file systems work the same way. If you really want to check for this, you need to do something like:
RewriteCond %{REQUEST_URI} ^(.*)//(.*)$
help matches "help" anywhere in the path.
/help matches nothing, since RewriteRule in a .htaccess file omits the leading slash for matching purposes (i.e., you must use ^, not / or ^/, to anchor at the current directory).
(This can be very confusing if you've used %{REQUEST_URI} in a RewriteCond, because %{REQUEST_URI} does begin with a leading slash. When matching against %{REQUEST_URI}, ^ and ^/ are equivalent, and a directory name will always be preceded by a slash regardless of whether or not it is in the top-level directory.)
The server error is caused by an infinite loop. "help" becomes "/help.php", which is then matched by the same directive that did the rewriting. So after the first match, "/help.php" is rewritten to "/help.php" over and over, resulting in a URL that can't be resolved.
I believe such loops can be fixed with the END flag (i.e., [END]), but that flag requires Apache 2.3.9+, whereas Apache 2.2 seems to be more common in deployment. It'd probably be better to just fix the regular expression anyway; ^help$ would seem to be the better choice here.
The way RewriteRule works is that if the given regular expression matches any part of the path part of the URL (the part after the host and port but before the query string), then the entire path part is completely replaced with the given substitution. This explains the behaviour you're seeing in the first part of your question.
I'm not sure what could be causing the 500 errors on the second part; maybe the collapsing of doubled slashes doesn't happen after the rewrite engine has run and then generates a server error.
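The "match anywhere, replace everything" behaviour described above can be sketched in a few lines of Python. `apply_rule` is a hypothetical helper that ignores backreferences and the query-string split; it only shows that an unanchored pattern rewrites any path containing "help", while ^help$ does not.

```python
import re


def apply_rule(pattern: str, substitution: str, path: str) -> str:
    """If `pattern` matches anywhere in `path`, the whole path is replaced."""
    return substitution if re.search(pattern, path) else path


if __name__ == "__main__":
    for path in ("help", "docs/helpful/page"):
        print(path, "->", apply_rule("help", "help.php?q=noslash", path))
```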
The reason for the 500 error is an infinite loop:
help gets rewritten to /help.php
/help.php gets stripped to help.php, which still matches help
help.php gets rewritten to /help.php
etc. until the internal redirect limit (LimitInternalRecursion, 10 by default) is hit -> 500
Whereas if the rule rewrites help to help, Apache is smart enough to abort rewriting at that point.
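The loop described above can be reproduced with a toy Python model. It is not Apache's actual code: it assumes each internal redirect re-applies the same rule to the result with its leading slash stripped, and that rewriting stops when the substitution equals what the rule saw. `rewrite_until_stable` is a hypothetical name; the limit of 10 mirrors the LimitInternalRecursion default.

```python
import re

# Default value of Apache's LimitInternalRecursion directive.
MAX_INTERNAL_REDIRECTS = 10


def rewrite_until_stable(pattern: str, substitution: str, path: str) -> str:
    """Toy model of one .htaccess rule re-applied after internal redirects.

    `path` is what RewriteRule sees (no leading slash). Returns the final path,
    or raises RuntimeError when the redirect limit would be hit (HTTP 500).
    """
    for _ in range(MAX_INTERNAL_REDIRECTS + 1):
        if not re.search(pattern, path):
            return path  # no match: nothing left to rewrite
        if substitution == path:
            return path  # rewrote to itself: rewriting stops
        # Internal redirect: the next pass sees the result without its leading slash.
        path = substitution.lstrip("/")
    raise RuntimeError("500: internal redirect limit exceeded")
```

In this model "help" to "/help.php" never settles, because "/help.php" never equals the stripped "help.php"; "help" to "help" stops at once, and anchoring the pattern as ^help$ stops after one pass.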