Why is Apache's RewriteRule revealing local paths?

I'm trying to use RewriteRules in .htaccess with relative paths, but whenever the substitution is a relative path, Apache seems to output the physical filesystem path instead of the URL-path. Absolute and server-root-relative paths work fine. For example:
RewriteEngine On
# this works fine, 127.0.0.1/ab redirects to 127.0.0.1/cd
RewriteRule ^ab$ /cd [R]
# this doesn't work... 127.0.0.1/wx redirects to 127.0.0.1/C:/path/to/files/yz
RewriteRule ^wx$ yz [R]
Adding a "RewriteBase /" solves the problem, but it's tedious to add the path to every .htaccess, and it makes it harder to change the directory structure. Is there a reason RewriteBase defaults to the current physical path instead of the current URI path?

For those who happen to arrive here from Google (like me), the short checklist:
Make sure you have RewriteBase / (or any other value; what matters is that the directive is present)
If you use a redirect ([R], [R=30x], etc.), make sure the new URI starts with a / and contains a path relative to your domain root
If the above still hasn't helped, restart Apache and clear your browser's cache (especially if you have used [R=301] at some point)
That's what saved my day, maybe it will save yours too.

It's because of the [R] flag, which means the server will redirect to the new path (so the user's browser will issue a new request for the newly sent URI) instead of internally translating the URI to a local path.
In your first RewriteRule there is a slash at the start of the new path, so the server doesn't try to translate it to a local path; in the second rule there is no slash, which is why it redirects to a complete local path. This also explains why it works when RewriteBase is set.
Either remove the [R] (in your case you can replace it with [L], which stops the server from trying to match other rules once it has found a matching one), or add a slash before "yz" in your second RewriteRule.
I'd suggest simply replacing the [R] with [L]: this way, the user won't see the rewritten path, which is generally what RewriteRules are intended for (mainly for SEO purposes), unless you specifically want to redirect your users to a new URL.
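For comparison, the alternatives discussed in this thread could look like this in an .htaccess file, assuming that file sits in the document root (adjust the paths if yours lives elsewhere). Only one option should be active at a time, so the other two are commented out. This is a sketch, not a tested configuration:

```apache
RewriteEngine On

# Option 1: make the substitution root-relative, so the redirect goes to /yz.
# [L] is added so no later rules are processed once the redirect is issued.
RewriteRule ^wx$ /yz [R,L]

# Option 2: keep the relative substitution, but tell mod_rewrite which
# URL-path to prefix it with instead of the physical directory path.
# RewriteBase /
# RewriteRule ^wx$ yz [R,L]

# Option 3: no external redirect at all; /wx is served from yz internally
# and the address bar keeps showing /wx.
# RewriteRule ^wx$ yz [L]
```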

Try this and tell me the result:
# with the leading slash, 127.0.0.1/wx should now redirect to 127.0.0.1/yz
RewriteRule ^wx$ /yz [R]
Note the / before yz.

Related

Mod_rewrite rules not working in .htaccess to change the URL

I'm trying to rewrite the URL below, but the URLs just don't change, and there are no errors.
Current URL:
https://example.com/test/news/?c=value1&s=value2&id=9876
Expected URL:
https://example.com/test/news/value1/value2
My .htaccess
RewriteEngine On
RewriteRule ^test/news/([^/]*)/([^/]*)$ /test/news/?c=$1&s=$2&id=1 [L]
But I've seen many articles where a URL such as example.com/display_article.php?articleId=my-article can be rewritten as example.com/articles/my-article with .htaccess, for example.
But the important point here (that I think you are missing) is that the URL must already have been changed internally in your application - in all your internal links. It is a common misconception that .htaccess alone can be used to change the format of the URL. Whilst .htaccess is an important part of this, it is only part of it.
Yes, you can implement a redirect in .htaccess to redirect from the old to new URL - and this is essential to preserve SEO (see below), but it is not critical to your application working. If you don't first change the URL in your internal links then:
The "old" URL is still exposed in the HTML source. When a user hovers over or copies the link, they are seeing and copying the "old" URL.
Every time a user clicks one of your internal links they are externally redirected to the "new" URL. This is slow for your users, bad for SEO (you should never link to a URL that is redirected) and bad for your server, as it potentially doubles the number of requests hitting your server (OK, 301s are cached locally).
To quote from @IMSoP's answer to this reference question on the subject:
Rewrite rules don't make ugly URLs pretty, they make pretty URLs ugly
So, once you have changed your internal links to the "new" (expected) format, eg. /test/news/value1/value2 (or should that be /test/news/value1/value2/id or even /test/news/id/value1/value2? See below), then you can do as follows...
RewriteRule ^test/news/([^/]*)/([^/]*)$ /test/news/?c=$1&s=$2&id=1 [L]
This internally rewrites a request from /test/news/<value1>/<value2> to /test/news/?c=<value1>&s=<value2>&id=1. However, there are a couple of issues with this:
/test/news/ is not itself a valid endpoint. This requires further rewriting. Perhaps you are serving a DirectoryIndex document (eg. index.php)? This might appear seamless to you, but this requires an additional internal subrequest and makes the rule dependent on other elements of the config. You should rewrite directly to the file that handles the request. eg. /test/news/index.php?c=<value1>&s=<value2>&id=1 (remember, this is entirely hidden from the user).
You are hardcoding the id=1 parameter? Should every URL have the same id? Or should this be passed in the "new" URL (which is what I would expect)? What does the id represent? If this is critical to the routing of the URL then the id should appear earlier in the URL-path, in case the URL gets accidentally truncated when copy/pasted/shared.
If the id is required then it needs to be passed in the "new" URL. We only have the "new" URL to route the request, so the information can't be hidden.
So, if the "new" URL is now /test/news/<id>/<value1>/<value2> then the rewrite would need to be like this instead:
# Rewrite new URLs to old/actual URL
# "/test/news/<id>/<value1>/<value2>" to "/test/news/?c=<value1>&s=<value2>&id=<id>"
RewriteRule ^test/news/(\d+)/([^/]+)/([^/]+)$ /test/news/?c=$2&s=$3&id=$1 [L]
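Following the earlier point about rewriting directly to the file that handles the request, the same rule could target the script itself instead of relying on the DirectoryIndex. This sketch assumes the handler really is /test/news/index.php, which the question does not confirm:

```apache
# Rewrite "/test/news/<id>/<value1>/<value2>" straight to the (assumed) handler
# script, avoiding the extra DirectoryIndex subrequest for /test/news/
RewriteRule ^test/news/(\d+)/([^/]+)/([^/]+)$ /test/news/index.php?c=$2&s=$3&id=$1 [L]
```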
Then (optionally*1) you can implement an external redirect in order to preserve SEO. This is for search engines that have indexed the "old" URLs or third party inbound links that cannot be updated - these need to be corrected to inform search engines of the change and get the user on the "new" canonical URL having followed an out-of-date inbound link.
(*1 It's not "optional" if you are changing an existing URL, but optional with regards to your application being functional.)
This "redirect" goes before the above rewrite:
# Redirect old URLs to the new "canonical" URL
# "/test/news/?c=<value1>&s=<value2>&id=<id>" to "/test/news/<id>/<value1>/<value2>"
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} ^c=([^&]+)&s=([^&]+)&id=(\d+)
RewriteRule ^test/news/$ /$0%3/%1/%2 [QSD,R=301,L]
The $0 backreference contains the full match from the RewriteRule pattern, ie. test/news/ in this case - this simply saves repetition.
The %1, %2 and %3 backreferences contain the values captured from the preceding condition. ie. the values of the c, s and id URL parameters respectively.
Note that the URL parameters / path segments should not be optional as in your original directive (ie. ([^/]*)). If they are optional and they are omitted, then the resulting URL becomes ambiguous. eg. <value2> becomes <value1> if <value1> is omitted.
Note that the URL parameters must be in the order as stated. If you have a mismatch of "old" URLs with these params in a different order (or even intermixed with other params) then this can be accounted for with additional complexity. (It may be easier to perform this redirect in your server-side script, instead of .htaccess.)
The first condition, which checks the REDIRECT_STATUS environment variable, ensures that we only redirect direct requests and not requests that have already been rewritten by the later rule (which would otherwise result in a redirect loop). An alternative on Apache 2.4 is to use the END flag on the later rewrite rule instead.
The QSD flag (Apache 2.4) discards the original query string from the request.
You should test first with a 302 (temporary) redirect to avoid potential caching issues and only change to a 301 (permanent) redirect once you have tested that everything works as intended. 301s are cached persistently by the browser so can make testing problematic.
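The END alternative could look something like the following. This is an untested sketch for Apache 2.4: the REDIRECT_STATUS condition is dropped because END stops the rewrite engine from processing the internally rewritten request in a further pass, and the redirect uses a 302 while testing, as recommended:

```apache
RewriteEngine On

# Redirect old URLs to the new "canonical" URL
# (302 while testing; change to 301 once it works as intended)
RewriteCond %{QUERY_STRING} ^c=([^&]+)&s=([^&]+)&id=(\d+)
RewriteRule ^test/news/$ /$0%3/%1/%2 [QSD,R=302,L]

# Rewrite new URLs to old/actual URL. END (instead of L) stops any further
# rewriting pass on the rewritten request, so the redirect above is not
# triggered by the query string this rule creates.
RewriteRule ^test/news/(\d+)/([^/]+)/([^/]+)$ /test/news/?c=$2&s=$3&id=$1 [END]
```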
Summary
Your complete .htaccess file should look something like this:
Options -MultiViews +FollowSymLinks
# If relying on the DirectoryIndex to handle the request
DirectoryIndex index.php
RewriteEngine On
# Redirect old URLs to the new "canonical" URL
# "/test/news/?c=<value1>&s=<value2>&id=<id>" to "/test/news/<id>/<value1>/<value2>"
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} ^c=([^&]+)&s=([^&]+)&id=(\d+)
RewriteRule ^test/news/$ /$0%3/%1/%2 [QSD,R=301,L]
# Rewrite new URLs to old/actual URL
# "/test/news/<id>/<value1>/<value2>" to "/test/news/?c=<value1>&s=<value2>&id=<id>"
RewriteRule ^test/news/(\d+)/([^/]+)/([^/]+)$ /test/news/?c=$2&s=$3&id=$1 [L]

Redirect to a specific page from any dir or subdir in htaccess

Is it possible to use a universal rule to redirect to a specific page from any directory or subdirectory using .htaccess?
To be more precise, if I want a URL like example.com/login to redirect to example.com/login.php?action=login, I use the following line in my .htaccess file:
RewriteRule ^login$ /login.php?action=login [L]
But is it possible to have a rule that redirects from example.com/any_directory/login to example.com/login.php?action=login? So from anywhere down the example.com subdirectories to example.com/login.php?action=login. And if so, how can I do this?
Certainly that is possible. The easiest approach is to use a rewrite condition, since that operates on the absolute request path even inside a dynamic configuration file. Rewrite rules operate on a relative path in such a location, which makes matching complicated.
Take a look at this simple example:
RewriteEngine on
RewriteCond %{REQUEST_URI} /login$
RewriteRule ^ /login.php?action=login [L]
If you do the rewriting in the HTTP server's host configuration instead, you can simplify this, because in that location the rules always work on absolute paths:
RewriteEngine on
RewriteRule /login$ /login.php?action=login [L]
The key idea in both variants is to rely on the slash preceding the login keyword, rather than insisting on a match at the start of the path: the slash is always present in an absolute request path and clearly delimits the keyword on the left.
And a general hint: you should always prefer to place such rules inside the HTTP server's host configuration instead of using dynamic configuration files (".htaccess"). Those files are notoriously error-prone, hard to debug, and they really slow down the server. They are only provided as a last resort for situations where you do not have control over the host configuration (read: really cheap hosting providers), or where you have an application that relies on writing its own rewrite rules (which is an obvious security nightmare).
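For the host-configuration variant, a minimal sketch might look like this. The port, ServerName and DocumentRoot path are placeholders, not values from the question. Apache does not allow trailing comments on a directive line, so comments go on their own lines:

```apache
<VirtualHost *:80>
    ServerName example.com
    # Placeholder path; use your real document root
    DocumentRoot "/var/www/example"

    RewriteEngine on
    # In server context the pattern sees the full URL-path including its
    # leading slash, so this matches /login as well as /any_directory/login
    RewriteRule /login$ /login.php?action=login [L]
</VirtualHost>
```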
You need to adjust your regex pattern. Simply remove the ^ so that it can match any characters before /login in the URI, e.g. /foobar/login.
RewriteRule /login/?$ /login.php?action=login [L]
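One caveat with that pattern in .htaccess: per-directory RewriteRule patterns are matched against a path without the leading slash, so a request for /login in the document root is seen as login, and /login/?$ will not match it. A sketch that covers both the root and any subdirectory, assuming the .htaccess file lives in the document root:

```apache
RewriteEngine On
# Match "login" either at the start of the per-directory path or after a slash,
# with an optional trailing slash: login, login/, foo/login, foo/bar/login/ ...
RewriteRule (^|/)login/?$ /login.php?action=login [L]
```

The rewritten path (login.php) no longer ends in login or login/, so the rule does not match again on the next pass.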

.htaccess rewrites root to subfolder, yet subfolder app 302 redirects right back to full path

I have a standard .htaccess RewriteRule that silently rewrites any request for webroot into a subfolder which contains a MantisBT installation. So the user types in "example.com" and my server secretly serves them files from "example.com/path/to/mantisbt".
The problem now is that MantisBT's index page immediately does some authentication based logic routing and sends a 302 redirect to the FULL "example.com/path/to/mantis/login", which subverts my rewriting. I'm trying to have everyone access my MantisBT installation as if it resided in the webroot.
Now, I'm aware that after MantisBT's 302 redirect to the full path, I could redirect them AGAIN back to webroot. But redirecting people twice every time MantisBT goes through some routing logic seems like a dirty hack. I also know that I could hack up the MantisBT code, but I hate re-hacking code every time a new version comes out.
So, is there a way to trick MantisBT (or any other app for that matter) into thinking it resides in the root, and therefore crafting its redirect paths based on a webroot-relative URL? For example: "example.com/login" instead of "example.com/path/to/mantis/login".
I'd really prefer to resolve this using an Apache .htaccess method, or httpd.conf change. Perhaps DocumentRoot or RewriteBase?
Try adding this rule above the internal rewrite rule you already have:
RewriteCond %{THE_REQUEST} \ /+path/to/mantisbt/([^\?\ ]*)
RewriteRule ^ /%1 [L,R]
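This redirects the browser whenever it directly requests anything in /path/to/mantisbt/ (including after MantisBT's own 302 to the full path). The rule you already have for internally rewriting into the mantisbt directory then takes effect on the new, root-relative request. Because THE_REQUEST holds the original request line and is not changed by internal rewrites, this rule does not fire again after your internal rewrite, so it does not cause a redirect loop.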
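Put together with an internal rewrite, the .htaccess might look like the sketch below. The question does not show the existing rewrite rule, so the second rule is an assumed stand-in for it, and /path/to/mantisbt is the placeholder path from the question:

```apache
RewriteEngine On

# 1. Externally redirect direct requests for /path/to/mantisbt/... to the
#    root-relative URL. THE_REQUEST is the original request line, so this
#    does not fire again after the internal rewrite below.
RewriteCond %{THE_REQUEST} \ /+path/to/mantisbt/([^\?\ ]*)
RewriteRule ^ /%1 [L,R]

# 2. Assumed form of the existing rule: internally rewrite everything else
#    into the MantisBT directory, skipping URLs that are already there
RewriteCond %{REQUEST_URI} !^/path/to/mantisbt/
RewriteRule ^(.*)$ /path/to/mantisbt/$1 [L]
```

Note that this still costs one extra redirect whenever MantisBT itself redirects to the full path; it only hides the subdirectory from the address bar.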

Does REQUEST_URI hide or ignore some filenames in .htaccess?

I'm having some difficulty with a super simple htaccess redirect.
All I want to do is rewrite absolutely everything, except a couple files.
htaccess looks like this:
RewriteEngine On
RewriteCond %{REQUEST_URI} !sitemap
RewriteCond %{REQUEST_URI} !robots
RewriteRule ^(.*)$ http://example.com/$1 [L,R=301]
The part that works is that everything gets redirected to new domain as it should be. And I can also access robots.txt without being forwarded, but not with sitemap.xml. If I try to go to sitemap.xml, the domain forwards along anyway and opens the sitemap file on the new domain.
I have this exact same issue when trying to "ignore" index.html. I can ignore robots, I can ignore alternate html or php files, but if I want to ignore index.html, the regex fails.
Since I can't actually SEE what is in the REQUEST_URI variable, my guess is that somehow index.html and sitemap.xml are some kind of "special" files that don't end up in REQUEST_URI? I know this because of a stupid test. If I choose to ignore index.html like this:
RewriteCond %{REQUEST_URI} !index.html
Then if I type example.com/index.html I will be forwarded. But if I just type example.com/ the ignore actually works and it shows the content of index.html without forwarding!
How is it that when I choose to ignore the regex "index.html", it only works when "index.html" is not actually typed in the address bar!?!
And it gets even weirder! Should I type something like example.com/index.html?option=value, then the ignore rule works and I do NOT get forwarded when there are attributes like this. But index.html by itself doesn't work, and then just having the slash root, the rule works again.
I'm completely confused! Why does it seem like REQUEST_URI is not able to see some filenames like index.html and sitemap.xml? I've been Googling for 2 days and not only can I not find out if this is true, but I can't seem to find any websites which actually give examples of what these htaccess server variables actually contain!
Thanks!
my guess is that somehow index.html and sitemap.xml are some kind of "special" files that don't end up in REQUEST_URI?
This is not true. There is no such special treatment of any requested URL. The REQUEST_URI server variable contains the URL-path (only) of the request. This notably excludes the scheme + hostname and any query string (which are available in their own variables).
However, if there are any other mod_rewrite directives that precede this (including the server config) that rewrite the URL then the REQUEST_URI server variable is also updated to reflect the rewritten URL.
index.html (Directory Index)
index.html is possibly a special case. Although, if you are explicitly requesting index.html as part of the URL itself (as you appear to be doing) then this does not apply.
If, on the other hand, you are requesting a directory, eg. http://example.com/subdir/ and relying on mod_dir issuing an internal subrequest for the directory index (ie. index.html), then the REQUEST_URI variable may or may not contain index.html - depending on the version of Apache (2.2 vs 2.4) you are on. On Apache 2.2 mod_dir executes first, so you would need to check for /subdir/index.html. However, on Apache 2.4, mod_rewrite executes first, so you simply check for the requested URL: /subdir/. It's safer to check for both, particularly if you have other rewrites and there is possibility of a second pass through the rewrite engine.
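A sketch of exceptions that allow for either form of the directory index request, assuming the files live in the document root and keeping http://example.com/ from the question as the placeholder new domain. The dots are escaped so they only match a literal dot, and a 302 is used so that a mistake is not cached persistently by the browser:

```apache
RewriteEngine On

# Don't redirect the document root, whether REQUEST_URI holds "/" (the
# requested URL) or "/index.html" (after mod_dir's internal subrequest)
RewriteCond %{REQUEST_URI} !^/(index\.html)?$
RewriteCond %{REQUEST_URI} !^/sitemap\.xml$
RewriteCond %{REQUEST_URI} !^/robots\.txt$
RewriteRule ^(.*)$ http://example.com/$1 [L,R=302]
```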
Caching problems
However, the most probable cause in this scenario is simply a caching issue. If the 301 redirect has previously been in place without these exceptions then it's possible these redirections have been cached by the browser. 301 (permanent) redirects are cached persistently by the browser and can cause issues with testing (as well as your users that also have these redirects cached - there is little you can do about that unfortunately).
RewriteCond %{REQUEST_URI} !(sitemap|index|alternate|alt) [NC]
RewriteRule .* alternate.html [R,L]
The example you presented in comments further suggests a caching issue, since you are now getting different results for sitemap than those posted in your question. (It appears to be working as intended in your second example).
Examining Apache server variables
@zzzaaabbb mentioned one method to examine the value of the Apache server variable. (Note that the Apache server variable REQUEST_URI is different to the PHP variable of the same name.) You can also assign the value of an Apache server variable to an environment variable, which is then readable in your application code.
For example:
RewriteRule ^ - [E=APACHE_REQUEST_URI:%{REQUEST_URI}]
You can then examine the value of the APACHE_REQUEST_URI environment variable in your server-side code. Note that if you have any other rewrites that cause the rewriting process to start over, you could get multiple env vars, each prefixed with REDIRECT_.
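If you would rather not touch the application code, another option (assuming mod_headers is enabled) is to echo the variable back in a response header and inspect it with your browser's developer tools or curl -I. The header name X-Apache-Request-URI is arbitrary, and this sketch is easiest to read on a request for an existing file that returns a normal 200 response:

```apache
RewriteEngine On
# Copy the current value of REQUEST_URI into an environment variable
RewriteRule ^ - [E=APACHE_REQUEST_URI:%{REQUEST_URI}]

# Send it back to the client; the %{...}e format reads an environment variable
Header set X-Apache-Request-URI "%{APACHE_REQUEST_URI}e"
```

As noted above, if another rewrite starts a new pass, the variable may only exist with the REDIRECT_ prefix, in which case the header comes back empty.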
With the index.html problem, it is worth escaping the dot anyway (index\.html). On the right-hand side of RewriteCond you are in regex pattern-matching territory, where an unescaped dot matches any single character, so the pattern is looser than intended (although it still matches a literal dot, so this alone would not explain the unwanted forward).
For the sitemap problem, you could check what REQUEST_URI actually contains by creating an empty dummy file (to avoid a 404) and then doing a redirect at the top of .htaccess. Then, in the browser, type in any URL you want to see the REQUEST_URI for; it will show up in the address bar.
RewriteCond %{QUERY_STRING} ^$
RewriteRule ^ /test.php?var=%{REQUEST_URI} [NE,R,L]
Credit to MrWhite for that easy test method.
Hopefully that will show that sitemap in the URL ends up as something else, which would at least partially explain why the condition isn't matching and preventing the redirect when it should be.
I would also make sure that the server isn't stepping in front of things with a custom 301 directive that, for whatever reason, makes sitemap behave unexpectedly. Put this at the top of your .htaccess for that test:
ErrorDocument 301 default

apache redirect / Rewrite-Engine

Is the following possible?
A user requests the URL http://example1.com/example.php and Apache opens http://example1.com/example.php?id=1.
A user requests the URL http://example2.com/example.php and Apache opens http://example2.com/example.php?id=2.
But the user should not see the id in the browser's address bar (the user should only see http://example1.com/example.php or http://example2.com/example.php).
In other words, the id is invisible to the user but passed to example.php.
How can I implement this?
Is that the correct solution?
RewriteEngine On
RewriteRule ^/example.php http://example1.com/example.php$1 [P]
ProxyPassReverse /example.php?id=1 http://example1.com/example.php
RewriteEngine On
RewriteRule ^/example.php http://example2.com/example.php$1 [P]
ProxyPassReverse /example.php?id=2 http://example2.com/example.php
You have to understand several concepts.
Once the server has received the URL requested by the user, it can do several things:
Take the requested path from the URL and use it without modification. That's the default solution.
Map the requested path to any other physical path, which can be done via Alias, AliasMatch or RewriteRules.
Map the requested path to another website while hiding the fact that another website is requested. That's the proxy solution, which mod_proxy or mod_rewrite could handle (but you do not need that).
Redirect the user to another path, sending them a new URL to use and making another client/server round trip, with Redirect instructions or mod_rewrite (the Swiss Army knife). But you do not need that either.
So you want a server-side only remapping of the requested path.
Let's say we use mod_rewrite to do this mapping. If you check all the flags available for RewriteRule (summary here), the interesting ones are:
passthrough|PT : Forces the resulting URI to be passed back to the URL mapping engine for processing of other URI-to-filename translators, such as Alias or Redirect.
qsappend|QSA: Appends any query string from the original request URL to any query string created in the rewrite target
last|L: Stop the rewriting process immediately and don't apply any more rules. Especially note caveats for per-directory and .htaccess context (see also the END flag)
nocase|NC: Makes the pattern comparison case-insensitive.
The details on the PT flag show that:
The target (or substitution string) in a RewriteRule is assumed to be a file path, by default.
Well, that's maybe enough for you. But using PT is a good idea: if you have other Apache configuration elements, you should let them apply after mod_rewrite has done its job.
So... assuming you may need to handle some query string arguments, that this id argument is based on the domain name in the request, and that only the example.php script needs this behavior, you should start your research with rules like these (untested):
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example1\.com$ [nocase]
RewriteRule ^example\.php$ example.php?id=1 [passthrough,qsappend,last]
RewriteCond %{HTTP_HOST} ^example2\.com$ [nocase]
RewriteRule ^example\.php$ example.php?id=2 [passthrough,qsappend,last]
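If you have access to the server config and each domain has its own VirtualHost, a variant of these rules could skip the HTTP_HOST checks altogether. This is also an untested sketch: the ServerName values come from the question, DocumentRoot and other directives are omitted, and in server context the pattern needs the leading slash:

```apache
<VirtualHost *:80>
    ServerName example1.com
    RewriteEngine On
    # Serve /example.php as /example.php?id=1, keeping any other query arguments
    RewriteRule ^/example\.php$ /example.php?id=1 [PT,QSA,L]
</VirtualHost>

<VirtualHost *:80>
    ServerName example2.com
    RewriteEngine On
    RewriteRule ^/example\.php$ /example.php?id=2 [PT,QSA,L]
</VirtualHost>
```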