Using .htaccess rewrite rules to reflect a "fake" directory structure in the address bar - apache

I'm working with an online encyclopedia and I am trying to achieve the following:
Given the physical location of a file in http://example.com/articles/c/a/t/Cat.html,
Get the location in the address bar to show http://example.com/encyclopedia/Cat.html
This also needs to work so that if a link is clicked or someone types in "example.com/encyclopedia/Cat.html", the server will look for the file in "/articles/c/a/t/Cat.html", yet still serve the shorter URI in the address bar.
I understand this may involve some heavy .htaccess voodoo to accomplish, or perhaps that it would be better to use a PHP script to serve this purpose.
So far I have the following in my .htaccess:
<IfModule mod_rewrite.c>
Options +FollowSymLinks
RewriteEngine On
RewriteRule ^encyclopedia/(.*)\.html$ articles/$1.html [NC]
RewriteCond %{THE_REQUEST} ^GET\ articles/(.*)
RewriteRule ^articles/(.*) /encyclopedia/$1 [L,R=301]
</IfModule>
However, with this code it only works when you go to "example.com/encyclopedia/c/a/t/Cat.html", which shows the proper page. When you go to "/articles/c/a/t/Cat.html" it still isn't rewritten to "/encyclopedia/"; it just stays the same.
Edit - By removing the GET\ part from the RewriteCond and removing the leading forward-slash from /encyclopedia/$1 in the following line, any requests to "/articles/c/a/t/Cat.html" are correctly redirected to "/encyclopedia/c/a/t/Cat.html". I am still at a loss trying to remove the "/c/a/t" part though.
I've tried using the following two rules to remove the "c/a/t/" part:
RewriteRule ^encyclopedia/((.)(.)(.).*)\.html$ articles/$2/$3/$4/$1.html [NC]
RewriteRule ^articles/(.)/(.)/(.)/(.*) /encyclopedia/$4 [L,R=301]
But with no success as I'm sure what's happening is I'm getting the capital "C" from "Cat.html" and putting that in as "/articles/C/a/t/Cat.html" which will obviously not work.
I've been looking around studying .htaccess RewriteRule and RewriteCond for days but I still haven't been able to figure this out and been BHOK enough to cause a few migraines.
Would this be better accomplished using a PHP script? Or can this voodoo be easily enough accomplished via only .htaccess rules?

First thing, forget about .htaccess files. .htaccess files are just an extension of the Apache configuration files that you can put in some directories. They really slow down your Apache server, because it has to check part of its configuration at runtime. They exist to allow some configuration in hosted environments.
Put everything you have in .htaccess files in <Directory> sections in your VirtualHost and use AllowOverride None to tell Apache to stop trying to read .htaccess files.
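As a rough sketch of what that looks like, assuming Apache 2.4 and a hypothetical site rooted at /var/www/example (the host name and paths are placeholders, not taken from the question):

```apache
<VirtualHost *:80>
    # Hypothetical host name and document root
    ServerName example.com
    DocumentRoot "/var/www/example"

    <Directory "/var/www/example">
        # Stop Apache from looking for .htaccess files under this directory
        AllowOverride None
        Require all granted

        # Directives that used to live in .htaccess go here
        Options +FollowSymLinks
        RewriteEngine On
    </Directory>
</VirtualHost>
```

Changes made here only take effect after the server is reloaded, unlike .htaccess edits, which are picked up on the next request.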
So what you need is mod-rewrite voodoo, not .htaccess voodoo :-)
Now your rewrite problem is quite complex. If you need some mod_rewrite help, do not forget to read this ServerFault article: Everything You Ever Wanted to Know about Mod_Rewrite Rules but Were Afraid to Ask?
I assume that your Cat.html -> c/a/t/Cat.html is just an example and that you can have more than 3 letters : CatAndDogs.html -> c/a/t/a/n/d/d/o/g/s/CatAndDogs.html.
The part of mod_rewrite you need is (I think) RewriteMap. There you will find some helpers, such as the internal int:tolower function, that could help you, but you will also find prg:, which means using an external program to perform the mapping. I would start from one of the Perl RewriteMap examples you can find via Google and adapt it. It should be quite easy and fast in Perl to transform CatAndDogs.html into c/a/t/a/n/d/d/o/g/s/CatAndDogs.html.
Note that the RewriteMap directive itself cannot be declared inside a .htaccess file; it has to go in the server or virtual host configuration. Forget .htaccess files. The prg: keyword will launch your Perl program as a parallel daemon and will feed it quite a lot of data, so you should really write something robust and fast. Do not forget to use the RewriteLock directive to avoid mixing results (on Apache 2.4 it was replaced by the Mutex directive with the rewrite-map mutex name). Some prg: mappers do not care about mixing results (think of load balancers, for example), but you do want to avoid mixing results for parallel queries.
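If the layout really is fixed at three letters, as in the question, a smaller sketch avoids the external program altogether by using the built-in int:tolower function. The map name lc is an arbitrary choice. The rules assume Apache 2.4 and must sit in the server or virtual host configuration, outside any <Directory> block, so that the URL path still has its leading slash. Titles shorter than three characters are not handled:

```apache
# Declare a map backed by mod_rewrite's internal lowercase function
RewriteMap lc int:tolower

RewriteEngine On

# A client asked for the physical path directly: send it to the short URL.
# THE_REQUEST holds the original request line, so the internal rewrite
# below cannot re-trigger this redirect.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/articles/ [NC]
RewriteRule ^/articles/[^/]/[^/]/[^/]/([^/]+\.html)$ /encyclopedia/$1 [R=302,L]

# /encyclopedia/Cat.html -> /articles/c/a/t/Cat.html (internal, URL unchanged).
# $2, $3, $4 are the first three characters; each is lowercased via the map.
RewriteRule ^/encyclopedia/(([^/])([^/])([^/])[^/]*\.html)$ /articles/${lc:$2}/${lc:$3}/${lc:$4}/$1 [PT]
```

The redirect uses R=302 while testing; it can be switched to R=301 once the behaviour is confirmed.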

Related

Apache RewriteRule for Two URLs

I'm trying to redirect the following two URLs:
https://www.example.com/blog/content/Das.com
https://www.example.com/blog/content/page/2
To:
https://www.example.com/blog/content
Using:
RewriteEngine on
RewriteRule (blog/content/Das.com|blog/content/page/2) /blog/content [L,R=301]
But it's not working. What am I doing wrong?
Are you sure you want to redirect different requested URLs to the same target URL? That means you will lose the information about which URL was originally requested, so you can no longer distinguish between the two requests. If you actually only want to internally rewrite those URLs, so that they can be processed by the same controller, then just leave out the R=301 flag in the rules below.
I personally would suggest implementing two separate rules. Readability of code is of high importance: it should be possible to immediately understand what the code does, even for someone who did not write it:
RewriteEngine on
RewriteRule ^/?blog/content/Das\.com$ /blog/content [R=301,END]
RewriteRule ^/?blog/content/page/2$ /blog/content [R=301,END]
But if you prefer a single rule you certainly can combine that:
RewriteEngine on
RewriteRule ^/?blog/content/(?:Das\.com|page/2)$ /blog/content [R=301,END]
For this to work the rewriting module needs to be loaded into the http server obviously. You should prefer to implement such rules in the actual http server's host configuration. You can use a distributed configuration file (".htaccess") in case you do not have control over the normal configuration, but that comes with a performance penalty. And obviously also needs to be enabled first. You'd need to place that file in the top folder of your DOCUMENT_ROOT in that case.
In general it is a good idea to start out with a R=302 temporary redirection and only to change that to a R=301 permanent redirection once you are sure things work as expected. That prevents annoying caching issues.
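A minimal sketch of that workflow, using the combined rule with a temporary status while testing (the END flag assumes Apache 2.4 or later):

```apache
RewriteEngine on
# While testing: temporary redirect, which browsers do not cache permanently
RewriteRule ^/?blog/content/(?:Das\.com|page/2)$ /blog/content [R=302,END]
# Once verified, change R=302 to R=301 in the rule above
```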

Domain handling with a controller

I'm running an MVC-based application on my mainsite, and I have 2 other domains (for the sake of an example, www.a.com & www.b.com).
I'd like to be able to handle all a.com's requests with mainsite.com/a/ and similarly b.com with mainsite.com/b/
However I do not want the url to be redirected/changed in the browser.
I've been trying with mod_rewrite, however it seems to be clashing with my existing .htaccess rules set for mainsite.com
this is my existing .htaccess
Could anyone please suggest the best way to do this?
In the existing .htaccess, I don't see any rules redirecting the domains a.com or b.com. To do that is pretty straightforward, though.
A condition for selecting the proper host www.a.com or a.com
RewriteCond %{HTTP_HOST} ^(?:www\.)?a\.com$
prevent an endless loop
RewriteCond %{REQUEST_URI} !^/a/
and do the actual rewrite
RewriteRule ^ /a%{REQUEST_URI} [L]
As long as you don't use the R flag, the URL shouldn't change in the browser.
The rule for host b.com is analogous.
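Spelled out for the second domain, with the /b/ prefix taken from the question and the NC flag added as an assumption so that the host name matches regardless of case:

```apache
# Match www.b.com or b.com
RewriteCond %{HTTP_HOST} ^(?:www\.)?b\.com$ [NC]
# Skip requests that already start with /b/ to prevent an endless loop
RewriteCond %{REQUEST_URI} !^/b/
# Internally rewrite to /b/<original path>; no R flag, so the browser URL stays put
RewriteRule ^ /b%{REQUEST_URI} [L]
```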
Update:
Since you already have a very large .htaccess file, the performance impact shouldn't matter too much. If you want to know for sure, there's no substitute for measuring.
If you want to reduce the performance hit nevertheless, you have two options:
Move the directives in the .htaccess file to your main config or virtual config file, see When (not) to use .htaccess files for an explanation.
Do some custom rewriting with PHP in your front controller. This depends on the framework or routing mechanism you use, of course.

rewritemap for SEO and pretty URLs

I am attempting to redirect & rewrite some dynamic PHP URLs to pretty and SEO friendly URLs. I have managed to do this successfully through .htaccess with the following code:
RewriteCond %{QUERY_STRING} ^somevar=green&nodescription=([a-zA-Z0-9_-]*)$
RewriteRule (.*) /green\/%1\/? [L,R=301]
RewriteRule ^green/([^/]*)/$ /script.php?somevar=green&nodescription=$1&rewrite=on [L]
This creates a somewhat pretty URL as follows:
http://www.mysite.com/green/aA43-/
As I say, this works absolutely fine, apart from one thing: the parameter nodescription contains a non-descriptive random set of letters, numbers and other characters.
I would like to rewrite the nodescription parameter to a more descriptive one. I understand that I can do this with a rewritemap through Apache. However, I have no experience at doing something like this, and I'm not entirely sure where to start.
Normally I would simply alter script.php so that it contains more descriptive parameters, but this time I have no control over the script; I am pulling it from another site using cURL.
Can anybody give me an example of how to pull this off?
Thanks!
Matt
Well, to answer my own question, to pull this off you need access to the httpd.conf file on your Apache server. My shared hosting company didn't allow access to this file (I doubt any would allow you access).
So I bit the bullet and purchased a VPS. I will post the steps I took here in order to set the rewritemap up in the hope that it will help a lost soul :) Ok, here goes...
My VPS has WHM installed, so in WHM I went to:
Server Configuration >> Apache Configuration >> Include Editor
Pre Virtual Host Include >> All Versions
This feature takes any text you put in and includes it in your httpd.conf file without worrying that it will be overwritten at a later stage. If you don't have WHM on your server then you can add the text directly to your httpd.conf file; make sure it is outside and before any virtual hosts.
OK, so I included the following map declaration and rewrite rule:
#Map to redirect (swaps key and value)
RewriteMap rwmap txt:/home/*/public_html/rdmap.txt
<Directory /home/*/public_html/test>
Options All -Indexes
RewriteEngine on
RewriteRule ^url/([^/]*)/$ /script.php?foo=${rwmap:$1|$1}&rewrite=on [L]
</Directory>
The actual map is a simple text file containing key/value pairs - you need to place this file in the directory declared in RewriteMap rwmap txt:/home/*/public_html/rdmap.txt.
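For illustration, a map file for the rule above might look like the sketch below. The entries are made up. Each line holds a lookup key (the descriptive slug that appears in the URL) followed by whitespace and the value handed to script.php. The file contents are shown as comments so that the rule using them can sit alongside:

```apache
# rdmap.txt (hypothetical entries): "key value", one pair per line
#
#   fresh-green-apples   aA43-
#   lime-garden-chairs   Zk9_q
#
# With these entries, a per-directory path of url/fresh-green-apples/ is
# looked up in rwmap and rewritten to /script.php?foo=aA43-&rewrite=on.
# A slug with no entry falls back to itself because of the |$1 default:
RewriteRule ^url/([^/]*)/$ /script.php?foo=${rwmap:$1|$1}&rewrite=on [L]
```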
And there you go. Apache now rewrites my URLs for me and I now have some nice and pretty SEO optimized links thanks to my rewrite map! Hoorah!
RewriteEngine on
RewriteRule ^green/([^/]*)/(.*)$ /script.php?somevar=green&nodescription=$1&rewrite=on [L]
This rewrite will allow you to pass "arbitrary text" that has nothing to do with the rewrite. For example:
http://www.mysite.com/green/aA43-/some-seo-boosting-title
Will still reroute correctly to script.php; the latter part will simply be ignored by the rewrite.

Apache mod_rewrite not doing anything (?)

I'm having some trouble with Apache's mod_rewrite. One of the things I'm trying to get it to do is hide some of my implementation details, so that, for example, the user sees the URL http://www.mysite.com/login but Apache responds with the page at http://www.mysite.com/doc_root/login.php instead (preferably without showing the user that it's a PHP file or the directory structure). Here's what I have in my .htaccess file:
RewriteEngine on
RewriteCond %{HTTP_HOST} ^(www.)?mysite.com*
RewriteRule ^/(\w+) /doc_root/$1.php [L]
#Redirect http://www.mysite.com to the login page
RewriteRule ^/?$ https://www.mysite.com/doc_root/login.php
But when I go to http://www.mysite.com/login, I get a 404 error even though the page exists. I clearly don't have a great understanding of how the mod_rewrite conditionals and rules work, so can anyone please tell me what I'm doing wrong? Thanks.
Take doc_root out of all the stuff you have it in. That will give you the result you're asking for. However I'm not sure if it's desired or not. How are you going to force someone to login if they manually type http://www.mysite.com/index.php?
Also, if you're trying to force all traffic to SSL, it's better to use a second VirtualHost and Redirect instead of mod_rewrite. Those are all questions probably better suited for ServerFault.
Unless your site has a bunch of different domain names, and you only want mysite.com to do the rewriting, you don't need the RewriteCond. (Potential problem. Apache likes to dick around with the domain name unless you set UseCanonicalName off. If the name isn't what it's expecting, the rewrite won't happen.)
In RewriteCond (and RewriteRule) patterns, . matches any character. Put a backslash before each dot that is meant literally. (Minor bug. Shouldn't cause rewrites to fail, but the pattern would match stuff like "mysite-com" as well.)
mod_rewrite is actually a URL-to-filename filter. Though it is often used to rewrite URLs to other URLs, sometimes it will misbehave if what you're rewriting to is a URL and it can't tell. (Especially if what it's rewriting to would be an alias, or would otherwise not translate directly to a real filename.) If you add a [PT] flag onto your rule, though, it will consider the rewritten thing a URL and pass it along to the other filters (including the ones that turn URLs into filenames).
Do you really need "/doc_root"? The document root should already be set up in Apache using the DocumentRoot directive, and shouldn't need to be part of the URL unless you have multiple apps on the same domain (in which case it's the app root; the document root doesn't change).
UPDATE:
Another thing I just thought about: rewrite rules work differently in .htaccess files. Apache strips off the leading slash there. So you will probably want to get rid of the first slash in your patterns, or at least make it optional (^/?login instead of ^/login).
^/?(\w+) will match /doc_root/login.php, and cause a rewrite to /doc_root/doc_root.php. You should probably have a $ at the end of your pattern.
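Putting those points together, a hedged sketch of the .htaccess might look like this. It keeps the /doc_root/ prefix from the question, drops the host condition, replaces the external https redirect for the root URL with an internal rewrite, and has not been tested against the asker's setup:

```apache
RewriteEngine on

# Empty path: serve the login page. An internal rewrite (no scheme or host in
# the target) keeps login.php out of the address bar.
RewriteRule ^/?$ /doc_root/login.php [L]

# One word with no slash or dot: /login -> /doc_root/login.php.
# The trailing $ stops this from matching doc_root/login.php on the next pass.
RewriteRule ^/?(\w+)$ /doc_root/$1.php [L]
```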

mod_rewrite to alias one file suffix type to another

I hope I can explain this clearly enough, but if not let me know and I'll try to clarify.
I'm currently developing a site using ColdFusion and have a mod_rewrite rule in place to make it look like the site is using PHP. Any requests for index.php get processed by index.cfm (the rule maps *.php to *.cfm).
This works great - so far, so good. The problem is that I want to return a 404 status code if index.cfm (or any ColdFusion page) is requested directly.
If I try to block access to *.cfm files using mod_rewrite it also returns a 404 for requests to *.php.
I figure I might have to change my Apache config rather than use .htaccess.
You can use the S flag to skip the 404 rule, like this:
RewriteEngine on
# Do not separate these two rules so long as the first has S=1
RewriteRule (.*)\.php$ $1.cfm [S=1]
RewriteRule \.cfm$ - [R=404]
If you are also using the Alias option then you should also add the PT flag. See the mod_rewrite documentation for details.
Post the rules you already have as a starting point so people don't have to recreate it to help you.
I would suggest testing [L] on the rule that maps .php to .cfm files as the first thing to try.
You have to use two distinct groups of rewrite rules, one for .php and the other for .cfm, and make them mutually exclusive with RewriteCond %{REQUEST_FILENAME}. Also make use of the [L] flag, as suggested by jj33.
You can keep your rules in .htaccess.
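One way to make the two cases mutually exclusive in .htaccess is to test the original request line rather than the rewritten path. This is a sketch of a commonly used variation, not the exact rules from the answers above. It relies on %{THE_REQUEST}, which still holds what the client sent even after an internal rewrite:

```apache
RewriteEngine on

# 404 only when the client itself asked for a .cfm URL
RewriteCond %{THE_REQUEST} ^[A-Z]+\s[^?\s]*\.cfm[?\s] [NC]
RewriteRule \.cfm$ - [R=404,L]

# Serve *.php requests from the matching *.cfm file
RewriteRule ^(.*)\.php$ $1.cfm [L]
```

Because the condition looks at the request line, a request for index.php that has been internally rewritten to index.cfm does not trip the 404 rule when the rules are evaluated again.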