mod_rewrite force internal redirect - apache

My goal is to reduce the visibility of my app's signature. This is not security by obscurity, just a superficial bit of defence in depth, so that at first glance an attacker cannot tell if it is a static site or not. (Also cosmetic; it just feels "cleaner" to hide app details even if they would never become visible in normal operation). Therefore I want to deny access to some directories without revealing that they exist, so I must give the exact same 404 response my app would give if the user requested a non-existent page.
In an .htaccess file, I have the following:
RewriteEngine on
RewriteCond "%{REQUEST_FILENAME}" "!-f"
RewriteCond "%{REQUEST_FILENAME}" "!-d"
RewriteRule "^(.*)" "index.php?page=$1"
RewriteRule "^(secret_dir1|secret_dir2)(/.*)?$" "index.php?page=404"
where index.php renders a nice pretty webpage according to the value of the "page" GET parameter; if "page" does not correspond to a page at the app level, or "page" is set to 404, the script renders a pretty 404 page with proper headers and everything.
Here's where the problem happens. "App-level" 404s work as expected; a 404 page is rendered. However, if the user requests mydomain.com/dir_i_am_trying_to_hide, they are given a 301 redirect to mydomain.com/dir_i_am_trying_to_hide/?page=404: an external redirect instead of an internal rewrite.
Why is it sending out an external redirect instead of just rewriting the url? How am I supposed to avoid this properly? Barring that, is there a way to force the server to do an internal rewrite instead? (The Apache docs seem to indicate you can force a RewriteRule to be external, but not the other way around)

Turns out my rewrite rule was not causing the external redirect; Apache's DirectorySlash was; I would query hostname/secret_dir1 and it would send a redirect to hostname/secret_dir1/.
I'm not sure why the query string was changed, but adding DirectorySlash off fixed it.

Related

Mod_rewrite rules not working in .htaccess to change the URL

I'm trying to rewrite the below URL but the URLs just don't change, no errors.
Current URL:
https://example.com/test/news/?c=value1&s=value2&id=9876
Expected URL:
https://example.com/test/news/value1/value2
My .htaccess
RewriteEngine On
RewriteRule ^test/news/([^/]*)/([^/]*)$ /test/news/?c=$1&s=$2&id=1 [L]
but I've seen many articles where a url such as example.com/display_article.php?articleId=my-article can be rewritten as example.com/articles/my-article for example with .htaccess
But the important point here (that I think you are missing) is that the URL must already have been changed internally in your application - in all your internal links. It is a common misconception that .htaccess alone can be used to change the format of the URL. Whilst .htaccess is an important part of this, it is only part of it.
Yes, you can implement a redirect in .htaccess to redirect from the old to new URL - and this is essential to preserve SEO (see below), but it is not critical to your application working. If you don't first change the URL in your internal links then:
The "old" URL is still exposed in the HTML source. When a user hovers over or copies the link, they are seeing and copying the "old" URL.
Every time a user clicks one of your internal links they are externally redirected to the "new" URL. This is slow for your users, bad for SEO (you should never link to a URL that is redirected) and bad for your server, as it potentially doubles the number of requests hitting your server (OK, 301s are cached locally).
To quote from #IMSoP's answer to this reference question on the subject:
Rewrite rules don't make ugly URLs pretty, they make pretty URLs ugly
So, once you have changed your internal links to the "new" (expected) format, eg. /test/news/value1/value2 (or should that be /test/news/value1/value2/id or even /test/news/id/value1/value2? See below), then you can do as follows...
RewriteRule ^test/news/([^/]*)/([^/]*)$ /test/news/?c=$1&s=$2&id=1 [L]
This internally rewrites a request from /test/news/<value1>/<value2> to /test/news/?c=<value1>&s=<value2>&id=1. However, there are a couple of issues with this:
/test/news/ is not itself a valid endpoint. This requires further rewriting. Perhaps you are serving a DirectoryIndex document (eg. index.php)? This might appear seamless to you, but this requires an additional internal subrequest and makes the rule dependent on other elements of the config. You should rewrite directly to the file that handles the request. eg. /test/news/index.php?c=<value1>&s=<value2>&id=1 (remember, this is entirely hidden from the user).
You are hardcoding the id=1 parameter? Should every URL have the same id? Or should this be passed in the "new" URL (which is what I would expect)? What does the id represent? If this is critical to the routing of the URL then the id should appear earlier in the URL-path, in case the URL gets accidentally truncated when copy/pasted/shared.
If the id is required then it needs to be passed in the "new" URL. We only have the "new" URL to route the request, so the information can't be hidden.
So, if the "new" URL is now /test/news/<id>/<value1>/<value2> then the rewrite would need to be like this instead:
# Rewrite new URLs to old/actual URL
# "/test/news/<id>/<value1>/<value2>" to "/test/news/?c=<value1>&s=<value2>&id=<id>"
RewriteRule ^test/news/(\d+)/([^/]+)/([^/]+)$ /test/news/?c=$2&s=$3&id=$1 [L]
Then (optionally*1) you can implement an external redirect in order to preserve SEO. This is for search engines that have indexed the "old" URLs or third party inbound links that cannot be updated - these need to be corrected to inform search engines of the change and get the user on the "new" canonical URL having followed an out-of-date inbound link.
(*1 It's not "optional" if you are changing an existing URL, but optional with regards to your application being functional.)
This "redirect" goes before the above rewrite:
# Redirect old URLs to the new "canonical" URL
# "/test/news/?c=<value1>&s=<value2>&id=<id>" to "/test/news/<id>/<value1>/<value2>"
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} ^c=([^&]+)&s=([^&]+)&id=(\d+)
RewriteRule ^test/news/$ /$0%3/%1/%2 [QSD,R=301,L]
The $0 backreference contains the full match from the RewriteRule pattern, ie. test/news/ in this case - this simply saves repetition.
The %1, %2 and %3 backreferences contain the values captured from the preceding condition. ie. the values of the c, s and id URL parameters respectively.
Note that the URL parameters / path segments should not be optional as in your original directive (ie. ([^/]*)). If they are optional and they are omitted, then the resulting URL becomes ambiguous. eg. <value2> becomes <value1> if <value1> is omitted.
Note that the URL parameters must be in the order as stated. If you have a mismatch of "old" URLs with these params in a different order (or even intermixed with other params) then this can be accounted for with additional complexity. (It may be easier to perform this redirect in your server-side script, instead of .htaccess.)
The first condition that checks against the REDIRECT_STATUS environment variable ensures that we only redirect direct requests and not rewritten requests by the later rewrite (which would otherwise result in a redirect loop). An alternative on Apache 2.4 is to use the END flag on the RewriteRule instead.
The QSD flag (Apache 2.4) discards the original query string from the request.
You should test first with a 302 (temporary) redirect to avoid potential caching issues and only change to a 301 (permanent) redirect once you have tested that everything works as intended. 301s are cached persistently by the browser so can make testing problematic.
Summary
Your complete .htaccess file should look something like this:
Options -MultiViews +FollowSymLinks
# If relying on the DirectoryIndex to handle the request
DirectoryIndex index.php
RewriteEngine On
# Redirect old URLs to the new "canonical" URL
# "/test/news/?c=<value1>&s=<value2>&id=<id>" to "/test/news/<id>/<value1>/<value2>"
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} ^c=([^&]+)&s=([^&]+)&id=(\d+)
RewriteRule ^test/news/$ /$0%3/%1/%2 [QSD,R=301,L]
# Rewrite new URLs to old/actual URL
# "/test/news/<id>/<value1>/<value2>" to "/test/news/?c=<value1>&s=<value2>&id=<id>"
RewriteRule ^test/news/(\d+)/([^/]+)/([^/]+)$ /test/news/?c=$2&s=$3&id=$1 [L]

POST information getting lost in .htaccess redirect

So, I have a fully working CRUD. The problem is, because of my file structure, my URLs were looking something like https://localhost/myapp/resources/views/add-product.php but that looked too ugly, so after research and another post here, I was able to use a .htaccess file to make the links look like https://localhost/myapp/add-product (removing .php extension and the directories), and I'm also using it to enforce HTTPS. Now, most of the views are working fine, but my Mass Delete view uses POST information from a form on my index. After restructuring the code now that the redirect works, the Mass Delete view is receiving an empty array. If I remove the redirect and use the "ugly URLs" it works fine. Here's how my .htaccess file is looking like:
Options +FollowSymLinks +MultiViews
RewriteEngine On
RewriteBase /myapp/
RewriteRule ^resources/views/(.+)\.php$ $1 [L,NC,R=301]
RewriteCond %{DOCUMENT_ROOT}/myapp/resources/views/$1.php -f
RewriteRule ^(.+?)/?$ resources/views/$1.php [END]
RewriteCond %{HTTPS} off
RewriteRule .* https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
I didn't actually write any of it, it's a mesh between answered questions and research. I did try to change the L flag to a P according to this post: Is it possible to redirect post data?, but that gave me the following error:
Internal Server Error
The server encountered an internal error or misconfiguration and was unable to complete your request.
Please contact the server administrator at admin#example.com to inform them of the time this error occurred, and the actions you performed just before this error.
More information about this error may be available in the server error log.
Apache/2.4.52 (Win64) OpenSSL/1.1.1m PHP/8.1.2 Server at localhost Port 443
POST information getting lost in .htaccess redirect
You shouldn't be redirecting the form submission in the first place. Ideally, you should be linking directly to the "pretty" URL in your form action. If you are unable to change the form action in the HTML then include an exception in your .htaccess redirect to exclude this particular URL from being redirected.
Redirecting the form submission is not really helping anyone here. Users and search engines can still see the "ugly" URL (it's in the HTML source) and you are doubling the form submission that hits your server (and doubling the user's bandwidth).
"Redirects" like this are only for when search engines have already indexed the "ugly" URL and/or is linked to by external third parties that you have no control over. This is in order to preserve SEO, just like when you change any URL structure. All internal "ugly" URLs should have already been converted to the "pretty" version. The "ugly" URLs are then never exposed to users or search engines.
So, using a 307 (temporary) or 308 (permanent) status code to get the browser to preserve the request method across the redirect should not be necessary in the first place. For redirects like this it is common to see an exception for POST requests (because the form submission shouldn't be redirected). Or only target GET requests. For example:
RewriteCond %{REQUEST_METHOD} GET
:
Changing this redirect to a 307/8 is a workaround, not a solution. And if this redirect is for SEO (as it only should be) then this should be a 308 (permanent), not a 307 (temporary).
Aside:
RewriteCond %{HTTPS} off
RewriteRule .* https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
Your HTTP to HTTPS redirect is in the wrong place. This needs to go as the first rule, or make sure you are redirecting to HTTPS in the current first rule and include this as the second rule, before the rewrite (to ensure you never get a double redirect).
By placing this rule last then any HTTP requests to /resources/views/<something>.php (or /<something>) will not be upgraded to HTTPS.

Rewrite URL in XAMPP VirtualHost

Been trying to play with mod_rewrite through .htaccess in one Xampp virtualhost, but I am not getting the results that I am looking for.
What I am tring to do is to rewrite the following: www.example.com/name/billy.html to: www.example.com/billy
Without trying to rewrite the URLs the virtualhost is working fine, I have access to all pages. However, when I add the .htaccess with the corresponding rewrite rule I get a 404 page not found. The regex is working as expected though. I see that the request to www.example.com/name/billy.html it's been rewritten to www.example.com/billy, but the page doesn't load.
The name folder exists in the file structure and the .htaccess is inside the example folder.
Currently, my vh configuration looks like this:
<VirtualHost *:80>
DocumentRoot "/Applications/XAMPP/xamppfiles/htdocs/example"
ServerName www.example.com
<Directory "/Applications/XAMPP/xamppfiles/htdocs/example">
Options Indexes FollowSymLinks ExecCGI Includes
AllowOverride All
Require all granted
</Directory>
</VirtualHost>
And this is the content of the .htaccess file:
RewriteEngine On
RewriteRule ^name\/([a-z]+).html$ /$1 [L,NC,R]
What is missing?
What I am tring to do is to rewrite the following: www.example.com/name/billy.html to: www.example.com/billy
You seem to have the process the wrong way round and possibly mixing up "rewrites" and "redirects"?
You should be internally rewriting from the visible "friendly" URL to the underlying file-path that actually handles the request. You are trying to do the opposite here. How is your system expected to handle a request for /billy? (It doesn't, and generates a 404.)
You may be thinking you can change the URL using .htaccess (mod_rewrite) alone? But no, that is not how this works.
RewriteRule ^name\/([a-z]+).html$ /$1 [L,NC,R]
You mention "rewrite", but this directive is in fact a "redirect" (as indicated by the R flag). Specifically, a 302 (temporary) redirect in this instance.
(You might actually want to implement a redirect like this, if you are changing an existing URL structure, but more on that later*1)
A URL "rewrite" is entirely internal to the server. The user only sees the public URL, they do not see the URL that it might be rewritten to. You (the user) can't "see" a rewrite.
A "redirect" on the other hand, usually refers to an external redirect, ie. a 3xx response sent back to the client with an instruction to make a new request to a different URL. The URL being redirected to is visible to the user. This is used when content has moved to a different URL.
So, following your example, you should be requesting/linking to (in your HTML source) the short/friendly URL /billy and internally rewriting the request to /name/billy.html that actually handles the request.
For example:
RewriteEngine On
# Rewrite from "billy" to "name/billy.html"
RewriteRule ^[a-z]+$ name/$0.html [NC,L]
The $0 backreference contains the entire URL-path that is matched by the RewriteRule pattern.
Only use the NC flag if you do need to match uppercase letters as well. But a request for /Billy won't serve /name/billy.html on a case-sensitive OS.
And that's really it, with regards to the URL-rewritting, you can stop reading here.
*1 Redirect from old to new
Regarding the external "redirect" mentioned above. You might choose to implement a redirect (in the opposite direction) if you are changing an existing URL structure and the old URLs have been indexed by search engines and/or linked (or bookmarked) to by external third parties - in order to preserve SEO and keep users happy.
For example, say your original URLs were of the form /name/billy.html and you later decided to change your URLs to /billy instead. You first change the URLs in the HTML source and implement the "rewrite" as mentioned above so the new URLs now work. You then might implement an external redirect from the old /name/billy.html URL to the new /billy URL.
For this, you would use a directive like you had initially, except you have to be careful of redirect-loops because you are already rewriting the request in the opposite directive. You only want to redirect "direct/initial" requests and not rewritten requests by the earlier rewrite (that is actually later in the file). An easy way to check for "direct" requests is to check against the REDIRECT_STATUS environment variable, which is empty on the initial request and set to 200 (as in 200 OK status) when the request is rewritten.
For example, the following "redirect" would go before the above "rewrite", immediately after the RewriteEngine directive:
# Redirect from "name/billy.html" to "/billy"
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^name/([a-z]+)\.html$ /$1 [R=301,NC,L]
This should ultimately be a 301 (permanent) redirect since the URL has presumably changed permanently. However, always test with 302 (temporary) redirects to avoid potential caching issues.
Further reading:
Reference: mod_rewrite, URL rewriting and "pretty links" explained

Does REQUEST_URI hide or ignore some filenames in .htaccess?

I'm having some difficulty with a super simple htaccess redirect.
All I want to do is rewrite absolutely everything, except a couple files.
htaccess looks like this:
RewriteEngine On
RewriteCond %{REQUEST_URI} !sitemap
RewriteCond %{REQUEST_URI} !robots
RewriteRule ^(.*)$ http://example.com/$1 [L,R=301]
The part that works is that everything gets redirected to new domain as it should be. And I can also access robots.txt without being forwarded, but not with sitemap.xml. If I try to go to sitemap.xml, the domain forwards along anyway and opens the sitemap file on the new domain.
I have this exact same issue when trying to "ignore" index.html. I can ignore robots, I can ignore alternate html or php files, but if I want to ignore index.html, the regex fails.
Since I can't actually SEE what is in the REQUEST_URI variable, my guess is that somehow index.html and sitemap.xml are some kind of "special" files that don't end up in REQUEST_URI? I know this because of a stupid test. If I choose to ignore index.html like this:
RewriteCond %{REQUEST_URI} !index.html
Then if I type example.com/index.html I will be forwarded. But if I just type example.com/ the ignore actually works and it shows the content of index.html without forwarding!
How is it that when I choose to ignore the regex "index.html", it only works when "index.html" is not actually typed in the address bar!?!
And it gets even weirder! Should I type something like example.com/index.html?option=value, then the ignore rule works and I do NOT get forwarded when there are attributes like this. But index.html by itself doesn't work, and then just having the slash root, the rule works again.
I'm completely confused! Why does it seem like REQUEST_URI is not able to see some filenames like index.html and sitemap.xml? I've been Googling for 2 days and not only can I not find out if this is true, but I can't seem to find any websites which actually give examples of what these htaccess server variables actually contain!
Thanks!
my guess is that somehow index.html and sitemap.xml are some kind of "special" files that don't end up in REQUEST_URI?
This is not true. There is no such special treatment of any requested URL. The REQUEST_URI server variable contains the URL-path (only) of the request. This notably excludes the scheme + hostname and any query string (which are available in their own variables).
However, if there are any other mod_rewrite directives that precede this (including the server config) that rewrite the URL then the REQUEST_URI server variable is also updated to reflect the rewritten URL.
index.html (Directory Index)
index.html is possibly a special case. Although, if you are explicitly requesting index.html as part of the URL itself (as you appear to be doing) then this does not apply.
If, on the other hand, you are requesting a directory, eg. http://example.com/subdir/ and relying on mod_dir issuing an internal subrequest for the directory index (ie. index.html), then the REQUEST_URI variable may or may not contain index.html - depending on the version of Apache (2.2 vs 2.4) you are on. On Apache 2.2 mod_dir executes first, so you would need to check for /subdir/index.html. However, on Apache 2.4, mod_rewrite executes first, so you simply check for the requested URL: /subdir/. It's safer to check for both, particularly if you have other rewrites and there is possibility of a second pass through the rewrite engine.
Caching problems
However, the most probable cause in this scenario is simply a caching issue. If the 301 redirect has previously been in place without these exceptions then it's possible these redirections have been cached by the browser. 301 (permanent) redirects are cached persistently by the browser and can cause issues with testing (as well as your users that also have these redirects cached - there is little you can do about that unfortunately).
RewriteCond %{REQUEST_URI} !(sitemap|index|alternate|alt) [NC]
RewriteRule .* alternate.html [R,L]
The example you presented in comments further suggests a caching issue, since you are now getting different results for sitemap than those posted in your question. (It appears to be working as intended in your second example).
Examining Apache server variables
#zzzaaabbb mentioned one method to examine the value of the Apache server variable. (Note that the Apache server variable REQUEST_URI is different to the PHP variable of the same name.) You can also assign the value of an Apache server variable to an environment variable, which is then readable in your application code.
For example:
RewriteRule ^ - [E=APACHE_REQUEST_URI:%{REQUEST_URI}]
You can then examine the value of the APACHE_REQUEST_URI environment variable in your server-side code. Note that if you have any other rewrites that result in the rewritting process to start over then you could get multiple env vars, each prefixed with REDIRECT_.
With the index.html problem, you probably just need to escape the dot (index\.html). You are in the regex pattern-matching area on the right-hand side of RewriteCond. With the un-escaped dot in there, there would need to be a character at that spot in the request, to match, and there isn't, so you're not matching and are getting the unwanted forward.
For the sitemap not matching problem, you could check to see what REQUEST_URI actually contains, by just creating an empty dummy file (to avoid 404 throwing) and then do a redirect at top of .htaccess. Then, in browser URL, type in anything you want to see the REQUEST_URI for -- it will show in address bar.
RewriteCond %{QUERY_STRING} ^$
RewriteRule ^ /test.php?var=%{REQUEST_URI} [NE,R,L]
Credit MrWhite with that easy test method.
Hopefully that will show that sitemap in URL ends up as something else, so will at least partially explain why it's not pattern-matching and preventing redirect, when it should be pattern-matching and preventing redirect.
I would also test by being sure that the server isn't stepping in front of things with custom 301 directive that for whatever reason makes sitemap behave unexpectedly. Put this at the top of your .htaccess for that test.
ErrorDocument 301 default

SEO URLs with ColdFusion controller?

quick ref: area = portal type page.
I would like old urls http://domain.com/long/rubbish/url/blah/blah/index.cfm?id=12345
to redirect to http://domain.com/area/12345-short-title
http://domain.com/area/12345-short-title should display the content.
I have worked out so far to do this I could use apache to write all URLs to
http://domain.com/index.cfm/long/rubbish/url/blah/blah/index.cfm?id=12345
and
http://domain.com/index.cfm/area/12345-short-title
The index.cfm will either server the content or apply a permanent redirect, but it will need to get the title and area information from the database first.
There are 50,000 pages on this website. I also have other ideas for subdomain redirects, and permanent subdomains and controlling how they act through the index.cfm.
Infrastructure are keen to do as much through Apache rewrite as possible, we suspect it would be faster. However I'm not sure we have that choice if we need to get the area and title information for each page.
Has anyone got some experience with this that can provide input?
--
Something to note, I'm assuming we'll have to keep all the internal URLs used on the website in the old format. It would be a mega job to change them all.
This means all internal URLs will have to use a permanent redirect every time.
Rather than redirecting both groups of URLs to the same script, why not simply send them to two distinct scripts?
Simply like this:
RewriteCond ${REQUEST_URI} !-f
RewriteRule ^\w+/\d+-[\w-]+$ /content.cfm/$0 [L]
RewriteCond ${REQUEST_URI} !-f
RewriteRule ^.* /redirect.cfm/$0 [L,QSA]
Then, the redirect.cfm can lookup the replacement URL and do the 301 redirect, whilst content.cfm simply serves the content.
(You haven't specified how your CF is setup; you may need to update the Jrun/Tomcat/other config to support /content.cfm/* and /redirect.cfm/* - it'll be done the same as it's done for index.cfm)
For performance reasons, you still want to avoid the database hits for redirecting if you can, and you can do that by generating rewrite rules for each page that performs the 301 redirect on the Apache side. This can be as simple as appending a line to the .htaccess file, like so:
<cfset NewLine = 'RewriteRule #ReEscape(OldUrl)# #NewUrl# [L,QSA,R=301]' />
<cffile action="append" file="./.htaccess" output=#NewLine# />
(Where OldUrl and NewUrl have been looked-up from the database.)
You might also want to investigate using mod_alias redirect instead of mod_rewrite RewriteRule, where the syntax would be Redirect permanent #OldUrl# #NewUrl# - since the OldUrl is an exact path match it would likely be faster.
Note that these rules will need to be checked before the above redirect.cfm redirect is done - if they are in the same .htaccess you can't simply do an append, but if they are in the site's general Apache config files then the .htaccess rules will be checked first.
Also, as per Sharon's comment, you should verify if your Apache will handle 50k rules - whilst I've seen it reported that "thousands" of regex-based Apache rewrites are perfectly fine, there may well be some limit (or at least the need to split across multiple files).
Using apache rewrites would only be faster if they were static rewrites, or if they all followed some rule that you could write in regex within the .htaccess file. If you're having to touch the database for these redirects, then it may not make sense to do it in .htaccess.
Another approach is the one used by most CMSs for handling virtual directories and redirects. An index.cfm file at the root of the site handles all incoming requests and returns the correct pages and pathing. MURA CMS uses this approach (as well as Joomla and most of the others.)
Basically you're using the CGI.path_info variable on an incoming request, searching for it in your DB, and doing a redirect to the new path. As usual, Ben Nadel has a good write-up of how to use this approach: Ben Nadel: Using IIS URL Rewriting And CGI.PATH_INFO With IIS MOD-Rewrite
You can, however, use the .htaccess to remove the "index.cfm" from the url string entirely if you want by redirecting all incoming requests to the root URL with something that looks like this in your .htaccess:
RewriteEngine On
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-d
RewriteRule ^([a-zA-Z0-9-]{1,})/([a-zA-Z0-9/-]+)$ /$1/index.cfm/$2 [PT]
Basically this would redirect something like http://www.yourdomain.com/your-new-url/ to http://www.yourdomain.com/index.cfm/your-new-url/ where it could be processed as described by the blog post above. The user would never see the index.cfm.