How to get the real request when using mod_rewrite - Apache

I'm wondering how to get the "real" requested URL when using mod_rewrite. There are several rewrite rules in my .htaccess file for caching purposes. First they check whether a cache file exists. If it does, the request is rewritten to the cache file. Otherwise the request is rewritten to a PHP script, which creates this cache file.
But I suspect the rules don't match the way I want them to. Is there a way to trace the "real" requests, to see which URL the client requested and which file is served in the background?
Thanks in advance.

You may want the %{THE_REQUEST} special variable. The mod_rewrite docs say this:
The full HTTP request line sent by the browser to the server (e.g., "GET /index.html HTTP/1.1"). This does not include any additional headers sent by the browser. This value has not been unescaped (decoded), unlike most other variables below.
So if someone enters http://your-domain/path/file.html into their browser and your webserver rewrites /path/file.html into something entirely different, the %{THE_REQUEST} variable will still be GET /path/file.html HTTP/1.1, or something similar.
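Here is a rough illustration of that difference. It is a sketch of a condition that only fires when the client itself asked for /index.html. It assumes the rules live in the document root's .htaccess, and the target and 301 status are just examples:

```apache
RewriteEngine On
# THE_REQUEST is the raw request line, e.g. "GET /index.html HTTP/1.1".
# Unlike REQUEST_URI, it is not changed by earlier rewrites or by mod_dir
# serving index.html for a request to "/".
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/index\.html[\s?]
# In .htaccess the leading slash is stripped before RewriteRule matches.
RewriteRule ^index\.html$ / [R=301,L]
```

Without the THE_REQUEST check, a rule like this can loop. A request for / is internally mapped to index.html by mod_dir, which would then be redirected back to /.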
As for what the request finally got rewritten to, you can turn on rewrite logging to see it:
RewriteLog /some-path/rewrite.log
RewriteLogLevel 9
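This would go in your server or virtual host config (RewriteLog is not allowed in .htaccess). Only use it for debugging, since high log levels slow the server down considerably. The rewrite.log file will contain details of the rewriting process, including the final URI. These two directives only exist up to Apache 2.2. Apache 2.4 removed them in favor of per-module log levels.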
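Apache 2.4 no longer has RewriteLog or RewriteLogLevel, so here is a rough equivalent sketched for a virtual host. The log paths and the `rewritedebug` format name are placeholders. The second part uses mod_log_config to log the original request line next to the file that was finally served, which is close to what the question asks for:

```apache
<VirtualHost *:80>
    ServerName example.com
    DocumentRoot /var/www/example

    # Apache 2.4: normal logging at "warn", mod_rewrite traced in detail
    # (trace1 = least, trace8 = most). Output goes to the ErrorLog.
    LogLevel warn rewrite:trace3
    ErrorLog /var/log/apache2/example-error.log

    # By default %r refers to the original request (the line the client sent),
    # while %f refers to the final request, i.e. the file actually served.
    # %>s is the final status.
    LogFormat "%h \"%r\" %>s -> %f" rewritedebug
    CustomLog /var/log/apache2/example-rewrite.log rewritedebug
</VirtualHost>
```

You can filter the trace lines out of the error log with something like `fgrep '[rewrite:'`. As with the 2.2 directives, turn the tracing off again once you are done.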

Related

rewrite url using htaccess or hide some text from url

The url showing in the address bar: www.testsite.com/news#tab-1
The url which I want to show: www.testsite.com/news
The url showing in the address bar: www.testsite.com/news#tab-2
The url which I want to show: www.testsite.com/events
I tried these rewrite rules in .htaccess:
RewriteCond %{REQUEST_URI} /news#tab-2$
RewriteRule .* /news [L]
and
RewriteRule www.testsite.com/test www.testsite.com/news#tab-1
But it didn't work. Please help.
You can't rewrite anchors (URL fragments) with .htaccess. You need to use something client-side, like JavaScript, to do so.
I found this in another, similar question. You can read it here:
Remove fragment in URL with JavaScript w/out causing page reload
Client browsers do not send the "#" character, or anything after it, to the server. If you have access to the server logs, you will see that all the server gets is "GET /news" and the rest is omitted. "#" is interpreted on the client side only.
If you insist on sending it to the server, you will have to percent-encode it in the URL (%23). It is probably better, though, to use a more common URI path, or even a query string ("?"), if you want to do internal rewrites on the server.
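Here is a minimal sketch of the path/query-string approach. It assumes the page is served by a hypothetical news.php that reads a `tab` parameter and selects the tab itself. The client sees /events while the server handles news.php?tab=2:

```apache
RewriteEngine On
# The browser never sends "#tab-2", so give each tab a real URL-path instead.
# news.php and its "tab" parameter are hypothetical.
# QSA keeps any other query parameters the client sent.
RewriteRule ^events/?$ news.php?tab=2 [L,QSA]
RewriteRule ^news/?$ news.php?tab=1 [L,QSA]
```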
As a friendly side note: only use .htaccess if you are not the admin of the Apache HTTPD server. If you are the admin, you don't need it for redirects and rewrites. It complicates them and adds overhead to the server, since Apache has to look for and read the file on every request.
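For comparison, here is a sketch of the same idea in the virtual host config. The hostname, paths and news.php are placeholders. With AllowOverride None, Apache stops looking for .htaccess files altogether. In server/vhost context, the pattern sees the URL-path with its leading slash:

```apache
<VirtualHost *:80>
    ServerName www.testsite.com
    DocumentRoot /var/www/testsite

    <Directory /var/www/testsite>
        # Don't search for or read .htaccess files under this tree.
        AllowOverride None
    </Directory>

    RewriteEngine On
    # Server context: no per-directory prefix stripping, so keep the leading slash.
    RewriteRule ^/events/?$ /news.php?tab=2 [L,QSA]
</VirtualHost>
```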

mod_rewrite behaviour when no rewriteBase

Just want to confirm something. From what I gather of how mod_rewrite works, Apache receives a URL and mod_rewrite immediately applies the (non-<Directory>) rules in httpd.conf. Then per-directory rewriting goes to work, and the process restarts with the new URL if any changes were made.
Jon Lin's great answer to this question first says that when your per-directory rule specifies an absolute replacement (i.e. starting with a slash), it's assumed to be relative to the DocumentRoot, which I get. But of relative replacements (no slash), Jon then says:
it's based on the directory that the rule is in. So if
RewriteRule ^foo$ bar.php [L]
is in the "root" and you go to http://example.com/foo, you get served http://example.com/bar.php. But if that rule is in the "subdir1" directory and you go to http://example.com/subdir1/foo, you get served http://example.com/subdir1/bar.php, and so on. This sometimes works and sometimes doesn't. As the documentation says, RewriteBase is supposed to be required for relative paths, but most of the time things work without it. The exception is when you are redirecting (using the R flag, or implicitly because you have http://host in your rule's target). That means this rule:
RewriteRule ^foo$ bar.php [L,R]
Suppose it is in the "subdir2" directory and you go to http://example.com/subdir2/foo. mod_rewrite will mistake the relative path for a file path instead of a URL-path. Because of the R flag, you'll end up getting redirected to something like http://example.com/var/www/localhost/htdocs/subdir2/bar.php.
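Here is a sketch of the usual fix. It assumes the .htaccess file sits in the directory that is served as /subdir2/. Setting RewriteBase makes mod_rewrite turn the relative substitution back into a URL-path before redirecting. An absolute target such as /subdir2/bar.php has the same effect:

```apache
# .htaccess in the directory served as http://example.com/subdir2/
RewriteEngine On
# Map relative substitutions onto this URL-path rather than onto the
# file-system path (/var/www/localhost/htdocs/subdir2/).
RewriteBase /subdir2/
# Now redirects to http://example.com/subdir2/bar.php
RewriteRule ^foo$ bar.php [L,R]
```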
As Jon explains in the last bit, if a redirect will occur and there's no RewriteBase, a string intended as a file path gets appended to the site's base address to create a bogus URL. But just to confirm: even in the former case Jon mentions (no actual redirect), the substituted string does get sent back to Apache's URL-reception code, restarting the whole process, correct? The diagram on this page of the spec seems to imply that the process keeps restarting until no rule makes a change. These non-redirect cases would seem to be when it WOULD make sense to prepend the file-system path (from the root down to the .htaccess directory) to the substitution. But how does that get turned into a proper URL, as expected by the URL-reception code? Does http://localhost get prepended? I think that would make everything relative to the DocumentRoot, not the actual file-system root.
Thanks!
Been doing some more reading and think I've got this explained, for anyone who's interested.
Regarding my question about how a file-system absolute path gets turned into a valid URL for the internal redirect: I was thinking that the URI in an HTTP request contained "http://hostname". Normally it doesn't. The request line only carries the path, e.g. /this/is/a/path. The host name is in a separate "Host" header field, and it is no longer a vital piece of information by the time mod_rewrite runs. Apache's initial Post Read Request phase has already read the GET request on the port. If name-based virtual hosting is in use, it has used the Host header field to work out things like the DocumentRoot. Finally it has called the URI Translation phase, where mod_rewrite executes. So whenever mod_rewrite is running, only one host name could have gotten us here.
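Because the host only arrives in the Host header, a RewriteRule pattern never sees it. Here is a sketch of how rules usually test the host instead, through the HTTP_HOST variable. It is written for .htaccess, and example.com and the redirect to the bare domain are only illustrations:

```apache
RewriteEngine On
# RewriteRule matches only the URL-path. The host name has to be checked
# through %{HTTP_HOST}, which comes from the request's Host header.
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]
```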
So, to summarize: what I had called the "URL-reception" part of Apache always deals with /paths/like/this/without/hostname, not just after internal redirects. The spec does say that RewriteCond/RewriteRule match against such paths, but I figured the host name was there initially and got removed. So all that's left is to make sure our rules can handle running in an internal redirect spawned by an earlier run-through of themselves. They must not do something inadvertent when they see a file-system absolute path produced by a substitution that didn't start with a slash. What a mouthful.
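Below are two common ways to make per-directory rules behave on that second, internal pass. foo and bar.php come from the example above, while baz and qux.php are hypothetical. REDIRECT_STATUS is empty for the client's original request and is set once Apache has performed an internal redirect. The END flag (Apache 2.3.9 and later) stops rewriting for the rest of the request:

```apache
RewriteEngine On

# Option 1: only act on the client's original request, not on the
# internal redirect mod_rewrite triggers after a per-directory rewrite.
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^foo$ bar.php [L]

# Option 2 (Apache 2.3.9+): END ends this pass and also prevents any
# further per-directory rewriting after the internal redirect.
RewriteRule ^baz$ qux.php [END]
```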

Does REQUEST_URI hide or ignore some filenames in .htaccess?

I'm having some difficulty with a super simple htaccess redirect.
All I want to do is rewrite absolutely everything, except a couple files.
htaccess looks like this:
RewriteEngine On
RewriteCond %{REQUEST_URI} !sitemap
RewriteCond %{REQUEST_URI} !robots
RewriteRule ^(.*)$ http://example.com/$1 [L,R=301]
The part that works is that everything gets redirected to the new domain, as it should. I can also access robots.txt without being forwarded, but not sitemap.xml. If I try to go to sitemap.xml, I get forwarded anyway and the sitemap file opens on the new domain.
I have this exact same issue when trying to "ignore" index.html. I can ignore robots, I can ignore alternate html or php files, but if I want to ignore index.html, the regex fails.
Since I can't actually SEE what is in the REQUEST_URI variable, my guess is that somehow index.html and sitemap.xml are some kind of "special" files that don't end up in REQUEST_URI? I know this because of a stupid test. If I choose to ignore index.html like this:
RewriteCond %{REQUEST_URI} !index.html
Then if I type example.com/index.html I will be forwarded. But if I just type example.com/ the ignore actually works and it shows the content of index.html without forwarding!
How is it that when I choose to ignore the regex "index.html", it only works when "index.html" is not actually typed in the address bar!?!
And it gets even weirder! Should I type something like example.com/index.html?option=value, then the ignore rule works and I do NOT get forwarded when there are attributes like this. But index.html by itself doesn't work, and then just having the slash root, the rule works again.
I'm completely confused! Why does it seem like REQUEST_URI is not able to see some filenames like index.html and sitemap.xml? I've been Googling for 2 days and not only can I not find out if this is true, but I can't seem to find any websites which actually give examples of what these htaccess server variables actually contain!
Thanks!
my guess is that somehow index.html and sitemap.xml are some kind of "special" files that don't end up in REQUEST_URI?
This is not true. There is no such special treatment of any requested URL. The REQUEST_URI server variable contains the URL-path (only) of the request. This notably excludes the scheme + hostname and any query string (which are available in their own variables).
However, if there are any other mod_rewrite directives that precede this (including the server config) that rewrite the URL then the REQUEST_URI server variable is also updated to reflect the rewritten URL.
index.html (Directory Index)
index.html is possibly a special case. Although, if you are explicitly requesting index.html as part of the URL itself (as you appear to be doing) then this does not apply.
If, on the other hand, you are requesting a directory, e.g. http://example.com/subdir/, and relying on mod_dir issuing an internal subrequest for the directory index (i.e. index.html), then the REQUEST_URI variable may or may not contain index.html. This depends on the version of Apache (2.2 vs 2.4) you are on. On Apache 2.2 mod_dir executes first, so you would need to check for /subdir/index.html. On Apache 2.4, however, mod_rewrite executes first, so you simply check for the requested URL: /subdir/. It's safer to check for both, particularly if you have other rewrites and there is a possibility of a second pass through the rewrite engine.
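Here is a sketch of the "check for both" advice. It assumes the directory is /subdir/ and the rest of the site is redirected to example.com, with a 302 used while testing:

```apache
RewriteEngine On
# Apache 2.4: mod_rewrite sees the requested URL (/subdir/).
# Apache 2.2, or a later pass: it may see mod_dir's /subdir/index.html.
# The optional group covers both.
RewriteCond %{REQUEST_URI} !^/subdir/(index\.html)?$
RewriteRule ^(.*)$ http://example.com/$1 [R=302,L]
```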
Caching problems
However, the most probable cause in this scenario is simply a caching issue. If the 301 redirect has previously been in place without these exceptions then it's possible these redirections have been cached by the browser. 301 (permanent) redirects are cached persistently by the browser and can cause issues with testing (as well as your users that also have these redirects cached - there is little you can do about that unfortunately).
RewriteCond %{REQUEST_URI} !(sitemap|index|alternate|alt) [NC]
RewriteRule .* alternate.html [R,L]
The example you presented in comments (the two rules just above) further suggests a caching issue, since you are now getting different results for sitemap than the ones posted in your question. (It appears to be working as intended in your second example.)
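While testing, a sketch like the following avoids the caching trap. It is the question's own rules with only the status changed. Browsers do not cache a 302 persistently unless caching headers say otherwise, so edits to the exceptions show up immediately. Switch to 301 only once it works, and clear the browser cache first, since old 301s may still be stored:

```apache
RewriteEngine On
RewriteCond %{REQUEST_URI} !sitemap
RewriteCond %{REQUEST_URI} !robots
# Temporary redirect while testing; change to R=301 once it behaves as intended.
RewriteRule ^(.*)$ http://example.com/$1 [R=302,L]
```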
Examining Apache server variables
zzzaaabbb mentioned one method to examine the value of an Apache server variable. (Note that the Apache server variable REQUEST_URI is different from the PHP variable of the same name.) You can also assign the value of an Apache server variable to an environment variable, which your application code can then read.
For example:
RewriteRule ^ - [E=APACHE_REQUEST_URI:%{REQUEST_URI}]
You can then examine the value of the APACHE_REQUEST_URI environment variable in your server-side code. Note that if any other rewrites cause the rewriting process to start over, you could get multiple env vars, each prefixed with REDIRECT_.
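If you'd rather not touch application code, a possible variation is to echo the variable back as a response header and read it in the browser's developer tools. This is a sketch that assumes mod_headers is enabled, and the X-Apache-Request-URI header name is made up:

```apache
RewriteEngine On
# Copy the server variable into an environment variable (as in the answer).
RewriteRule ^ - [E=APACHE_REQUEST_URI:%{REQUEST_URI}]

# Send it back to the browser. "always" should make the header appear on
# redirect responses as well as on normal 200 responses.
Header always set X-Apache-Request-URI "%{APACHE_REQUEST_URI}e"
```

After an internal rewrite, the value from the first pass may only survive as REDIRECT_APACHE_REQUEST_URI, so you might want to echo that name as well.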
With the index.html problem, you should escape the dot (index\.html), since you are in the regex pattern-matching area on the right-hand side of RewriteCond. An unescaped dot matches any single character, so the pattern is looser than intended. Note, however, that an unescaped dot also matches a literal dot, so "index.html" still matches /index.html. Escaping alone will therefore not stop the unwanted forward.
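Here is a sketch of conditions with the dot escaped and the pattern anchored, so each one matches exactly one URL-path. The file names are the ones from the question, and a 302 is used while testing:

```apache
RewriteEngine On
# "\." matches only a literal dot; ^...$ stop partial matches such as
# /old-index.html.bak or /sitemap-archive/.
RewriteCond %{REQUEST_URI} !^/index\.html$
RewriteCond %{REQUEST_URI} !^/sitemap\.xml$
RewriteCond %{REQUEST_URI} !^/robots\.txt$
RewriteRule ^(.*)$ http://example.com/$1 [R=302,L]
```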
For the sitemap-not-matching problem, you could check what REQUEST_URI actually contains. Create an empty dummy file, test.php, so the test doesn't throw a 404, and then add a redirect at the top of .htaccess. Next, type any URL you want to see the REQUEST_URI for into the browser. The value will show in the address bar, in the query string of the URL you get redirected to.
RewriteCond %{QUERY_STRING} ^$
RewriteRule ^ /test.php?var=%{REQUEST_URI} [NE,R,L]
Credit to MrWhite for that easy test method.
Hopefully that will show that sitemap in the URL ends up as something else. That would at least partly explain why the condition isn't matching and preventing the redirect when it should.
I would also make sure the server isn't stepping in with a custom ErrorDocument for 301 responses that, for whatever reason, makes sitemap behave unexpectedly. Put this at the top of your .htaccess for that test:
ErrorDocument 301 default

Level of obscurity of destination URLs via mod_rewrite

To achieve a single layer of content delivery security, I'm looking into the possibility of obscuring a resource URL via an .htaccess RewriteRule:
RewriteEngine on
RewriteBase /js/
RewriteRule obscure-alias\.js http://example.com/sensitive.js
It would of course be implemented as:
<script type="text/javascript" src="obscure-alias.js"></script>
This is not a 301 redirect but rather a routing scenario, similar to the routing in many of the frameworks we use today. So would it be safe to say that this RewriteRule adequately obfuscates the actual URL where this resource is located? Or:
Can the destination URL still be found out via some HTTP header sniffing utility
Might a web browser be able to reveal the "Download URL"
I'm going to pre-answer my own questions by saying no to both, since the "internal proxy" takes place on the server side and not on the client side, if I understand it correctly: http://httpd.apache.org/docs/current/mod/mod_rewrite.html. I just wanted to confirm that when Apache serves the destination URL, it isn't also telling the user agent which URL it rewrote the original request to.
It depends on how you specify the redirect target.
If your http://example.com/ is running on the same server, there will be an internal redirect that is invisible to the client. From the manual:
Absolute URL
If an absolute URL is specified, mod_rewrite checks to see whether the hostname matches the current host. If it does, the scheme and hostname are stripped out and the resulting path is treated as a URL-path. Otherwise, an external redirect is performed for the given URL. To force an external redirect back to the current host, see the [R] flag below.
If the absolute URL points to a remote domain, a header redirect will be performed. A header redirect is visible to the client and will reveal the sensitive location.
To make sure no external redirect takes place, specify a relative URL like
RewriteRule obscure-alias\.js sensitive.js
Note that the sensitive JS file's URL can still be guessed.
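To make guessing less useful, one option is to refuse direct requests for the real file while still allowing the internal rewrite. This sketch assumes the .htaccess in /js/ from the question. It relies on THE_REQUEST still holding the client's original request line on the internal pass. It only hides the file name: anyone can still fetch the content through the alias, so it is not access control.

```apache
# .htaccess in /js/
RewriteEngine On
RewriteBase /js/

# Block clients that ask for the real file directly. On the internal pass
# after the alias rewrite, THE_REQUEST still names obscure-alias.js, so
# this condition does not match and the file is served.
RewriteCond %{THE_REQUEST} \s/js/sensitive\.js[\s?]
RewriteRule ^sensitive\.js$ - [F]

# Internal rewrite from the alias to the real file.
RewriteRule ^obscure-alias\.js$ sensitive.js [L]
```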
To find out whether a request results in a header redirect, log in to a terminal (e.g. on a Linux server) and run
wget --server-response http://www.example.com
If the first HTTP/... status line (there may be more than one) has a 3xx status code, like
HTTP request sent, awaiting response...
HTTP/1.1 302 Moved Temporarily
you are looking at a header redirect.
This is also possible by proxying the request through Apache (the P flag), so the server fetches the resource itself.
See http://httpd.apache.org/docs/2.4/rewrite/proxy.html
Also alluded to here as well: mod_rewrite not working as internal proxy
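Here is a sketch of the proxy variant. It assumes mod_proxy and mod_proxy_http are loaded and that the file lives on a different host (other-host.example is a placeholder). Apache fetches the file itself and returns it, so the client sees no 3xx response. Headers passed through from the remote server could still give hints, though:

```apache
# .htaccess in /js/ (requires mod_proxy + mod_proxy_http)
RewriteEngine On
RewriteBase /js/
# [P] forces a proxy request: Apache retrieves the remote URL and returns
# the body as the response to /js/obscure-alias.js. No redirect is sent.
RewriteRule ^obscure-alias\.js$ http://other-host.example/sensitive.js [P]
```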

Apache rewrite rule - prevent rewritten URL appearing in browser URL bar

I have a rewrite rule which is looking for a particular URI. When it matches the particular URL it rewrites it with a proper file path so the required content can be found. It then changes the protocol to HTTPS and allows the request to pass through.
I have two problems:
I don't want the rewritten path to appear in the user's browser. I want to keep the vanity URL.
I do want the HTTPS protocol to appear, indicating to the user that they are accessing the site over a secure connection.
I have tried a couple of options, with no success. If I include the [R] flag, the URL and protocol remain unchanged, but that is not the desired effect.
Any suggestions on how I can achieve this?
This is my rule:
RewriteMap redirectsIfSecure txt:/myserver/content/secure_urls.txt
RewriteCond ${lowercase:%{REQUEST_URI}} ^/(.+)$
RewriteCond ${redirectsIfSecure:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^(.*)$ https://myserver.com${redirectsIfSecure:%1} [PT]
From the mod_rewrite documentation:
If an absolute URL is specified, mod_rewrite checks to see whether the hostname matches the current host. If it does, the scheme and hostname are stripped out and the resulting path is treated as a URL-path. Otherwise, an external redirect is performed for the given URL. To force an external redirect back to the current host, see the [R] flag below.
If you rewrite the request to a fully qualified URL (that is, anything starting with http://, https://, etc) that doesn't match your ServerName, then mod_rewrite will issue an HTTP redirect, which will cause the client browser to request the resource from the new location.
If you're not trying to switch between http and https you can use a proxy rule (the P flag) to have Apache make the request on behalf of the client and return the result, thus masking the rewritten URL.
However, if you're trying to upgrade from http to https (or the other way around), this will always require a client redirect.
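Putting both parts of the answer together, one possible layout is to redirect listed vanity URLs to the same path on HTTPS, and only then rewrite them internally. The browser ends up showing https:// with the vanity path. This is a sketch: the map file and map names are from the question, while the `lowercase` map definition, the certificate setup and the vhost details are assumptions.

```apache
<VirtualHost *:80>
    ServerName myserver.com
    RewriteEngine On
    RewriteMap lowercase int:tolower
    RewriteMap redirectsIfSecure txt:/myserver/content/secure_urls.txt

    # Step 1: a listed vanity URL requested over HTTP gets an external
    # redirect to the SAME path over HTTPS (this part has to be visible).
    RewriteCond ${lowercase:%{REQUEST_URI}} ^/(.+)$
    RewriteCond ${redirectsIfSecure:%1|NOT_FOUND} !NOT_FOUND
    RewriteRule ^ https://myserver.com%{REQUEST_URI} [R=301,L]
</VirtualHost>

<VirtualHost *:443>
    ServerName myserver.com
    # SSLEngine on and the certificate directives go here (omitted).
    RewriteEngine On
    RewriteMap lowercase int:tolower
    RewriteMap redirectsIfSecure txt:/myserver/content/secure_urls.txt

    # Step 2: over HTTPS, map the vanity URL to the real path internally.
    # No scheme or host in the target, so no redirect is sent and the
    # address bar keeps https://myserver.com/<vanity-path>.
    # Assumes the map values are URL-paths starting with "/".
    RewriteCond ${lowercase:%{REQUEST_URI}} ^/(.+)$
    RewriteCond ${redirectsIfSecure:%1|NOT_FOUND} !NOT_FOUND
    RewriteRule ^ ${redirectsIfSecure:%1} [PT,L]
</VirtualHost>
```

The maps are declared in each virtual host because rewrite configuration from the main server is not inherited by virtual hosts unless RewriteOptions Inherit is used.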