mod_rewrite rules for existing files - apache

I am defining a mod_rewrite rule that rewrites all requests to my /application.php if the requested file does not exist, and does no rewriting otherwise. It is simple:
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-s
RewriteRule .* application.php [PT]
There is only one problem with the code. Assume I have a foo.html file. Then requests like:
http://example.com/foo.html/some/other/string
will fail with a 404 error.
Why?

will fail with a 404 error. Why?
Because that URL doesn't exist. It's looking for the file string in the folder /foo.html/some/other and it's not there.
The behaviour that you want to exploit using the http://example.com/foo.html/some/other/string URL structure - treating the first entry as a file name, and the rest as a parameter to it - is called "pathinfo". It has nothing to do with mod_rewrite, but will be available if you enable the following in your Apache configuration:
AcceptPathInfo On
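It looks like that setting is currently off for you. (With the default setting, AcceptPathInfo Default, the core handler for plain files such as foo.html rejects requests that carry trailing path info.)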
If you enable it, the part after the file name will be available to foo.html - in PHP, it would be in the
$_SERVER["PATH_INFO"]
variable.
Because this method doesn't require the rewrite module to be active, it is sometimes called "the poor man's mod_rewrite". It works fine, but isn't quite as pretty or as flexible as "real" rewriting.
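A minimal sketch of how this could be enabled for static .html files only, assuming Apache httpd and that the directive is permitted in your context (in .htaccess it needs AllowOverride FileInfo). The <Files> pattern is an illustrative choice, not something taken from the question:

```apache
# Accept trailing path info (e.g. /foo.html/some/other/string)
# for files whose name ends in .html.
<Files "*.html">
    AcceptPathInfo On
</Files>
```

Scoping the directive with <Files> keeps the default behaviour for everything else; placing AcceptPathInfo On at the top level would apply it to all handlers in that scope.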

Apache 301 redirect with get parameters

I am trying to do a 301 redirect in the .htaccess file of a LiteSpeed web server, with no luck.
I need to do a url to url redirect without any related parameters.
for example:
from: http://www.example.com/?cat=123
to: http://www.example.com/some_url
I have tried:
RewriteRule http://www.example.com/?cat=123 http://www.example.com/some_url/ [R=301,L,NC]
Any help will be appreciated.
Thanks for adding your code to your question. Once more we see how important that is:
Your issue is that a RewriteRule does not operate on URLs, but on paths. So you need something like this instead:
RewriteEngine on
RewriteRule ^/?$ /some_url/ [R=301,L,NC,QSD]
From your question it is not clear if you want to ignore any GET parameters or if you only want to redirect if certain parameters are set. So here is a variant that will only get applied if some parameter is actually set in the request:
RewriteEngine on
RewriteCond %{QUERY_STRING} (?:^|&)cat=123(?:&|$)
RewriteRule ^/?$ /some_url/ [R=301,L,NC,QSD]
Another thing that is not really clear is whether you want all URLs below http://www.example.com/ (so below the path /) to be rewritten, or only that exact URL. If you want to keep any potential further path component of a request and still rewrite (for example http://www.example.com/foo => http://www.example.com/some_url/foo), then you need to add a capture in your regular expression and reuse the captured path components:
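RewriteEngine on
# Skip requests already under /some_url/, otherwise the redirect would loop
RewriteCond %{REQUEST_URI} !^/some_url/
RewriteRule ^/?(.*)$ /some_url/$1 [R=301,L,NC,QSD]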
For any of these to work you need to have the interpretation of .htaccess style files enabled by means of the AllowOverride directive. See the official documentation of the rewriting module for details. You also have to take care that the .htaccess style file is actually readable by the http server process and that it is located right inside the http host's DOCUMENT_ROOT folder in the local file system.
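As a rough sketch of what enabling .htaccess processing might look like on Apache httpd (LiteSpeed may handle this differently): the directory path below is a placeholder, not taken from the question.

```apache
# Hypothetical document root; adjust to the real path of your host.
<Directory "/var/www/example">
    # Allow mod_rewrite (and other FileInfo) directives in .htaccess files.
    AllowOverride FileInfo
    # mod_rewrite in per-directory context needs symlink following enabled.
    Options +FollowSymLinks
</Directory>
```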
And a general hint: you should always prefer to place such rules inside the http server's host configuration instead of using .htaccess style files. Those files are notoriously error prone, hard to debug and they really slow down the server. They are only provided as a last option for situations where you do not have control over the host configuration (read: really cheap hosting service providers) or if you have an application that relies on writing its own rewrite rules (which is an obvious security nightmare).
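For illustration, the conditional variant could be placed in the host configuration roughly like this. This is a sketch assuming Apache httpd 2.4 (the QSD flag is not available in 2.2); ServerName and DocumentRoot are placeholders.

```apache
# Hypothetical virtual host; names and paths are placeholders.
<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot "/var/www/example"

    RewriteEngine on
    # In virtual-host context the pattern is matched against the full
    # URL-path including the leading slash; ^/?$ works in both contexts.
    RewriteCond %{QUERY_STRING} (?:^|&)cat=123(?:&|$)
    RewriteRule ^/?$ /some_url/ [R=301,L,QSD]
</VirtualHost>
```

Writing the pattern as ^/?$ means the same rule can be moved between .htaccess and the host configuration without changes.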

Does REQUEST_URI hide or ignore some filenames in .htaccess?

I'm having some difficulty with a super simple htaccess redirect.
All I want to do is rewrite absolutely everything, except a couple files.
htaccess looks like this:
RewriteEngine On
RewriteCond %{REQUEST_URI} !sitemap
RewriteCond %{REQUEST_URI} !robots
RewriteRule ^(.*)$ http://example.com/$1 [L,R=301]
The part that works is that everything gets redirected to new domain as it should be. And I can also access robots.txt without being forwarded, but not with sitemap.xml. If I try to go to sitemap.xml, the domain forwards along anyway and opens the sitemap file on the new domain.
I have this exact same issue when trying to "ignore" index.html. I can ignore robots, I can ignore alternate html or php files, but if I want to ignore index.html, the regex fails.
Since I can't actually SEE what is in the REQUEST_URI variable, my guess is that somehow index.html and sitemap.xml are some kind of "special" files that don't end up in REQUEST_URI? I know this because of a stupid test. If I choose to ignore index.html like this:
RewriteCond %{REQUEST_URI} !index.html
Then if I type example.com/index.html I will be forwarded. But if I just type example.com/ the ignore actually works and it shows the content of index.html without forwarding!
How is it that when I choose to ignore the regex "index.html", it only works when "index.html" is not actually typed in the address bar!?!
And it gets even weirder! Should I type something like example.com/index.html?option=value, then the ignore rule works and I do NOT get forwarded when there are attributes like this. But index.html by itself doesn't work, and then just having the slash root, the rule works again.
I'm completely confused! Why does it seem like REQUEST_URI is not able to see some filenames like index.html and sitemap.xml? I've been Googling for 2 days and not only can I not find out if this is true, but I can't seem to find any websites which actually give examples of what these htaccess server variables actually contain!
Thanks!
my guess is that somehow index.html and sitemap.xml are some kind of "special" files that don't end up in REQUEST_URI?
This is not true. There is no such special treatment of any requested URL. The REQUEST_URI server variable contains the URL-path (only) of the request. This notably excludes the scheme + hostname and any query string (which are available in their own variables).
However, if there are any other mod_rewrite directives that precede this (including the server config) that rewrite the URL then the REQUEST_URI server variable is also updated to reflect the rewritten URL.
index.html (Directory Index)
index.html is possibly a special case. Although, if you are explicitly requesting index.html as part of the URL itself (as you appear to be doing) then this does not apply.
If, on the other hand, you are requesting a directory, e.g. http://example.com/subdir/, and relying on mod_dir issuing an internal subrequest for the directory index (i.e. index.html), then the REQUEST_URI variable may or may not contain index.html, depending on the version of Apache (2.2 vs 2.4) you are on. On Apache 2.2 mod_dir executes first, so you would need to check for /subdir/index.html. However, on Apache 2.4, mod_rewrite executes first, so you simply check for the requested URL: /subdir/. It's safer to check for both, particularly if you have other rewrites and there is a possibility of a second pass through the rewrite engine.
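A sketch of "checking for both" in the asker's root .htaccess, using anchored patterns with escaped dots. The file names sitemap.xml and robots.txt are assumed from the question; the exact patterns are illustrative.

```apache
RewriteEngine On
# Matches both "/" (the bare directory request) and "/index.html".
RewriteCond %{REQUEST_URI} !^/(index\.html)?$
RewriteCond %{REQUEST_URI} !^/sitemap\.xml$
RewriteCond %{REQUEST_URI} !^/robots\.txt$
RewriteRule ^(.*)$ http://example.com/$1 [L,R=301]
```

Anchoring with ^ and $ also avoids accidentally exempting unrelated URLs that merely contain "sitemap" or "robots" somewhere in the path.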
Caching problems
However, the most probable cause in this scenario is simply a caching issue. If the 301 redirect has previously been in place without these exceptions then it's possible these redirections have been cached by the browser. 301 (permanent) redirects are cached persistently by the browser and can cause issues with testing (as well as your users that also have these redirects cached - there is little you can do about that unfortunately).
RewriteCond %{REQUEST_URI} !(sitemap|index|alternate|alt) [NC]
RewriteRule .* alternate.html [R,L]
The example you presented in comments further suggests a caching issue, since you are now getting different results for sitemap than those posted in your question. (It appears to be working as intended in your second example).
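One way to keep browser caching out of the picture while testing is to use a temporary redirect first. A sketch based on the rules from the question:

```apache
RewriteEngine On
RewriteCond %{REQUEST_URI} !sitemap
RewriteCond %{REQUEST_URI} !robots
# Temporary redirect while testing; switch to R=301 once the rules behave.
RewriteRule ^(.*)$ http://example.com/$1 [L,R=302]
```

Testing in a private window or with the browser cache cleared helps rule out previously cached 301 responses.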
Examining Apache server variables
@zzzaaabbb mentioned one method to examine the value of the Apache server variable. (Note that the Apache server variable REQUEST_URI is different to the PHP variable of the same name.) You can also assign the value of an Apache server variable to an environment variable, which is then readable in your application code.
For example:
RewriteRule ^ - [E=APACHE_REQUEST_URI:%{REQUEST_URI}]
You can then examine the value of the APACHE_REQUEST_URI environment variable in your server-side code. Note that if you have any other rewrites that cause the rewriting process to start over, then you could get multiple env vars, each prefixed with REDIRECT_.
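If no server-side code is at hand, the same environment variable could also be echoed back in a response header and inspected in the browser's developer tools. This is a sketch assuming mod_headers is loaded; the header name X-Debug-Request-Uri is made up for this example.

```apache
RewriteEngine On
RewriteRule ^ - [E=APACHE_REQUEST_URI:%{REQUEST_URI}]
<IfModule mod_headers.c>
    # Hypothetical debug header; remove it again once you are done testing.
    Header always set X-Debug-Request-Uri "%{APACHE_REQUEST_URI}e"
</IfModule>
```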
With the index.html problem, it is worth escaping the dot (index\.html), since you are in the regex pattern-matching area on the right-hand side of RewriteCond. Note, though, that an unescaped dot matches any single character, including a literal dot, so the unescaped pattern still matches index.html. Escaping only makes the pattern stricter and is unlikely, on its own, to explain the unwanted forward.
For the sitemap not matching problem, you could check what REQUEST_URI actually contains by creating an empty dummy file (to avoid throwing a 404) and then adding a redirect at the top of .htaccess. Then, in the browser's address bar, type in anything you want to see the REQUEST_URI for, and it will show in the address bar.
RewriteCond %{QUERY_STRING} ^$
RewriteRule ^ /test.php?var=%{REQUEST_URI} [NE,R,L]
Credit to MrWhite for that easy test method.
Hopefully that will show that sitemap in URL ends up as something else, so will at least partially explain why it's not pattern-matching and preventing redirect, when it should be pattern-matching and preventing redirect.
I would also test to make sure that the server isn't stepping in front of things with a custom 301 directive that, for whatever reason, makes sitemap behave unexpectedly. Put this at the top of your .htaccess for that test.
ErrorDocument 301 default

mod_rewrite misbehaves if regex matches existing file, but only in .htaccess file

I've got an .htaccess rule like this:
RewriteRule ^z/([^/]+)$ y.html?$1 [NC,L]
This rule works fine. But if I change the rule slightly, to:
RewriteRule ^y/([^/]+)$ y.html?$1 [NC,L]
When I try to load y/anything I get a 404 and the following message in the error log:
File does not exist: /var/www/y.html/anything
The only difference I can see is that z.html does not exist, but y.html does. At first I thought maybe the initial transform was triggering a recursive re-write, but I don't see how this could be. It should rewrite:
`y/anything`
to
`y.html?anything`
Which does NOT have a slash in it. In fact, the only problem with the re-written URL is that it has a slash where I specified a question mark. What is going on here?
It gets stranger. If I change the rewrite URL, e.g. to
RewriteRule ^y/([^/]+)$ /q.html?$1 [NC,L]
it is STILL telling me /var/www/y.html/anything is not found, not q.html.
If I move y.html to y.js on the server, then it tells me /var/www/y.js/anything is not found. It really seems like it is somehow matching /dir/ and changing it to an existing file. Is there a default rule somewhere in apache that might do this?
I tried a hard reload in the browser, which had no effect.
Update: I tried to use RewriteLog to see what was going on with the rewriting. However, to do this I had to move my rewrite entries to the VirtualHost section of my main config. After I did this the pattern matching completely stopped until I changed my rule to:
RewriteRule ^/y/([^/]+)$ /y.html?$1 [NC,L]
After making this change, everything works as expected. So why can't I get it to work in the .htaccess file? Neither regex works properly there (with or without the leading slash).
For anyone who runs into this problem in the future, the issue I was having was that MultiViews was interfering with my URL resolution. When I removed "MultiViews" from the list of options for the <Location/> in the <VirtualHost> the issue went away.
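If the host configuration cannot be changed, MultiViews can usually be switched off from the .htaccess file itself, provided AllowOverride permits Options there. A sketch using the rule from the question:

```apache
# Disable content negotiation so a request for /y/anything is not
# resolved to the existing file y.html before the rewrite rule runs.
Options -MultiViews

RewriteEngine On
RewriteRule ^y/([^/]+)$ /y.html?$1 [NC,L]
```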
Having the following in the htaccess file
RewriteEngine On
RewriteRule ^z/([^/]+)$ /y.html?$1 [NC,L]
RewriteRule ^y/([^/]+)$ /y.html?$1 [NC,L]
will show the contents of y.html
You might want to do a hard refresh of the page because if it was previously redirected to /y.html/anything, but then changed to /y.html?anything it might still be in a cache on your browser. Double check with your other browsers that the same thing is happening on them.

htaccess redirect without .php extension

I recently changed a directory /old_dir/ to be /new_dir/ using this:
RedirectMatch 301 /old_dir/(.*) /new_dir/$1
This seems to work perfectly for the URL:
http://www.mysite.com/old_dir/test.php?var=xxxx
goes to
http://www.mysite.com/new_dir/test.php?var=xxxx
where test.php is the filename. But in many places I use:
http://www.mysite.com/old_dir/test?var=xxxx
which comes up with:
The requested URL /old_dir/test was not found on this server.
Not using the .php extension takes advantage of some sort of Apache plugin that knows it's a PHP handler, which seemingly messes up the redirect because it says the file doesn't exist now.
I am not sure how to fix this issue.
Edit: All the solutions are for this special case, but note that I have about 1000 other files that may not be PHP, or named the same.
For right now I just made a symbolic link in the old_dir with the name "test" to point to the new_dir's test.php. But I am still looking for a non-specific solution that includes my scenario.
Have you ever tried using mod_rewrite?
Options +FollowSymlinks
RewriteEngine On
RewriteRule ^([^/]+)/([^/]+)$ $1/$2.php [QSA]
RewriteRule ^old_dir/([^/]+)/$ new_dir/$1.php [QSA]

mod_rewrite to alias one file suffix type to another

I hope I can explain this clearly enough, but if not let me know and I'll try to clarify.
I'm currently developing a site using ColdFusion and have a mod_rewrite rule in place to make it look like the site is using PHP. Any requests for index.php get processed by index.cfm (the rule maps *.php to *.cfm).
This works great - so far, so good. The problem is that I want to return a 404 status code if index.cfm (or any ColdFusion page) is requested directly.
If I try to block access to *.cfm files using mod_rewrite it also returns a 404 for requests to *.php.
I figure I might have to change my Apache config rather than use .htaccess
You can use the S flag to skip the 404 rule, like this:
RewriteEngine on
# Do not separate these two rules so long as the first has S=1
RewriteRule (.*)\.php$ $1.cfm [S=1]
RewriteRule \.cfm$ - [R=404]
If you are also using the Alias option then you should also add the PT flag. See the mod_rewrite documentation for details.
Post the rules you already have as a starting point so people don't have to recreate it to help you.
I would suggest testing [L] on the rule that maps .php to .cfm files as the first thing to try.
You have to use two distinct groups of rewrite rules, one for .php and the other for .cfm, and make them mutually exclusive with RewriteCond %{REQUEST_FILENAME}. Also make use of the [L] flag, as suggested by jj33.
You can keep your rules in .htaccess.
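One possible way to keep the two cases apart in .htaccess is to test what the client originally asked for, via THE_REQUEST, rather than the current (possibly already rewritten) path. This is an alternative sketch, not the method proposed in the answers above, and the patterns are illustrative:

```apache
RewriteEngine on
# THE_REQUEST holds the original request line, e.g. "GET /index.cfm HTTP/1.1",
# and is not changed by internal rewrites.
# Return 404 only when the client itself asked for a .cfm URL.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s[^?\s]*\.cfm[?\s] [NC]
RewriteRule \.cfm$ - [R=404,L]

# Internally map *.php to *.cfm.
RewriteRule ^(.*)\.php$ $1.cfm [L]
```

Because the internally rewritten request still carries the original request line for the .php URL, the 404 rule should not fire on the second pass through the rules.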