Sub-domain tangle (knotsmith required) - apache

I seem to have painted myself into a corner with my plans for a series of website sub-domains. I wonder whether my plans can be rescued, or whether I have truly shot myself in the foot and will need to rewrite a ton of stuff. Here's the problem:
I planned a series of sub-domain sites all dealing with variations on a theme. For illustrative purposes let's pretend I have the site www.colour.com (this is the 'hub', whose files reside in public_html) and then I add the subdomains red.colour.com (in public_html/red), green.colour.com (in public_html/green) and blue.colour.com (in public_html/blue). OK, all good up to this point.
The thing is that these sites all share a lot of resources: style sheets, JavaScript files, images etc. I don't want to replicate these because it's a waste of space, but more importantly I don't want to risk ending up with different versions of the files that don't keep in step with each other. So I did what I thought was the sane thing and stored all of these at the 'hub' (in public_html/css, public_html/js etc).
What I discovered when my site went nearly live was that as soon as I defined e.g. public_html/red as red.colour.com, it could no longer 'see' any of its supporting files that were located one level higher (in ../), and so the appearance and functionality broke down into a complete screen mess.
Short of a major re-write, is there any way out of this mess that anyone can think of?
Thanks in advance!
Frank.

I cracked it!
It required 1 pint of blood, 2 gallons of sweat and 3 magnums of tears, but I finally cracked it.
The secret is to use mod_rewrite, as I was starting to suspect. You indeed cannot address areas above the document root, but there are other ways of referring to such locations; specifically, you can match requests for files or directories that 'don't exist' using these rewrite conditions:
RewriteCond %{SCRIPT_FILENAME} !-d
RewriteCond %{SCRIPT_FILENAME} !-f
So, for example, when you want to refer to the style sheet directory that sits one level above the current context, you would refer to ../css; but once the current context has become its own sub-domain, the browser resolves ../css to /css on that sub-domain, which doesn't exist. That means one or both of the above rewrite conditions will match. Now it's just a case of pointing the client to the (e.g.) css directory via the original domain, www.colour.com/css in my example. Here, for completeness, and to save any other poor soul from killing themselves over this (I had to learn regular expressions, mod_rewrite and Lord knows what else in 6 hours to solve this one, so I hope someone else benefits as well as me!), is the full .htaccess:
DirectoryIndex index.php
Options +FollowSymLinks
RewriteEngine On
RewriteCond %{SCRIPT_FILENAME} !-d
RewriteCond %{SCRIPT_FILENAME} !-f
RewriteRule ^(.*)$ http://www.colour.com/$1 [R=301,L]
There may be some other directives worth adding, for example to stop it rewriting hidden files or to prevent it looping, but I haven't learned enough about mod_rewrite yet to make this squeaky clean. However, it does work as it is, and that is 99.99999% of my main goal.
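One way to tighten the .htaccess above, as a sketch: it assumes the shared directories are called css, js and images (adjust to your own) and only redirects requests for those, so a mistyped page URL on red.colour.com still gets that subdomain's own 404 instead of being bounced to the hub. It is still an external redirect, so it keeps the cost of one extra request per resource.

```apache
# Sketch for public_html/red/.htaccess (likewise green and blue)
Options +FollowSymLinks
RewriteEngine On
# Leave alone anything that really exists in this subdomain
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Only the shared asset directories (assumed names) are sent to the hub
RewriteRule ^(css|js|images)/(.*)$ http://www.colour.com/$1/$2 [R=301,L]
```

While testing, R=302 may be more convenient, since browsers cache 301 redirects.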
Edit, one week on:
Having lived with this solution a while, I have to admit that there is a price to pay for this little bit of magic. I'm no expert, as you will have gleaned, but this solution makes the browser request each of the resources involved twice: once from the sub-domain, which answers with the redirect, and once from www.colour.com. This might not be a problem if your site is small and the efficiency hit unnoticeable, but if you want to be a stickler for optimum load times, you might have to weigh up the pros and cons of this solution versus recoding your links into something more direct. Just so you know!

Frank, one slight tease: give us XYZ "then I can post a solution" isn't quite consistent with "every source including my web hosting company said it wasn't possible to do this".
Your approach only sort of does what you want, because a 301 redirect like this just passes the problem back to the client's browser, telling it to refetch the resource from your other subdomain. So it makes two requests for each resource, which slows down the load time at the client.
Your first option would have been to code http://common.colour.com/... directly in the src and href attributes in your HTML. The downside is that you need to change your scripts. This approach is actually quite common: many high-volume sites use a separate domain for such static resources, and nowadays often use a CDN for this. It can also have performance advantages, as browsers limit the number of parallel connections they open per domain (traditionally two), so spreading resources across domains lets more of them load in parallel.
Your second option is to do this within your server's filesystem. What you need to do is create symbolic links from DOCROOT/red/css to DOCROOT/css etc. If you are scripting in PHP, you can do this by creating a temporary script in your DOCROOT and invoking it once to add these links (they appear in your FTP browser as separate directory copies, but in fact all point to the shared resources). The key function you need is symlink(target, link), like this:
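<?php
// Run once from DOCROOT: creates DOCROOT/<colour>/<dir> -> DOCROOT/<dir>
$root = dirname(__FILE__);
foreach (array('red', 'green', 'blue') as $c) {
    foreach (array('css', 'js') as $d) {
        $link = "$root/$c/$d";
        // symlink(target, link): the link lives in the subdomain folder
        if (!file_exists($link) && !symlink("$root/$d", $link)) {
            echo "Could not create $link\n";
        }
    }
}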
You'd need to tweak the colours and resource directories. Note that I've just typed this in and not debugged it -- costs extra ;-)
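Apache only serves files through symbolic links where Options FollowSymLinks (or SymLinksIfOwnerMatch) is enabled. Many shared hosts already allow this, but if the linked directories return 403 Forbidden, a line like the following in each subdomain's .htaccess may be needed (assuming the host permits Options overrides):

```apache
# public_html/red/.htaccess (likewise green and blue)
# Allow Apache to follow the css/ and js/ symlinks in this directory
Options +FollowSymLinks
```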
The third option is to use an .htaccess file, but with internal rewrites rather than external redirects. Here the Apache rewrite module maps the request directly to a different path/filename. I assume that your cPanel lets you map XXX.colour.com to DOCROOT/xxx etc., so that a "GET /css/sheet.css" with HTTP_HOST red.colour.com gets processed as DOCROOT/red/css/sheet.css. How you approach this depends on whether you use a single .htaccess in DOCROOT or separate red/.htaccess etc. files. If the former, you can use a rule like:
RewriteEngine On
RewriteBase /
RewriteRule (red|green|blue)/(css|js|images)/(.*) /$2/$3 [L]
You do want the leading / on the rule target string here.
Try these out, and hope this helps :-)

Related

How to implement url page redirection for a massive huge website

My site (e.g. carparts.co.uk) has 355,000 unique URLs; it is a car parts catalogue site, and Webmaster Tools shows that 174,000 of these are indexed.
We want to move our site to a new shopping cart platform (PrestaShop), and have completely changed the structure of the catalogue, which means we now have a new set of URLs (although the main domain is unchanged and is still carparts.co.uk).
I now have an Excel sheet with a column of the 355,000 'old' URLs matched against the closest equivalent URL in the new catalogue.
e.g.
old url: "carparts.co.uk/ford-ranger-alternator belts.htm"
goes to: "carparts.co.uk/belt-drive"
(and there are 355,000 similar redirects)
My question is: how should I do this?
I've read that you can use htaccess to do this, but I'm worried because I've also read that htaccess slows down sites if it is very large. Is this slowness only encountered when accessing one of the old URLs, or will it impact the speed of all my URLs?
So what is the best thing for me to do with such a large number of URLs?
Your best bet is probably setting up a RewriteMap. This requires access to the server or vhost config, as you can't define the map in an htaccess file (though you can use it from one). Apache caches the map lookups, so you don't need to worry about constant file access.
Something simple like:
RewriteMap redirects txt:/full/path/to/redirect-map.txt
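The file redirect-map.txt would then simply contain a "from" and a "to" on each line, separated by whitespace:
old-url.htm new-url
etc...
A txt map has no quoting, so a key can't contain a space; an old URL such as "ford-ranger-alternator belts.htm" (which should go to belt-drive) needs its own RewriteRule instead.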
Then in your htaccess file just do the following (in vhost config the matched path starts with a slash, so use ^/(.*)$ there instead):
RewriteCond ${redirects:$1|0} !=0
RewriteRule ^(.*)$ ${redirects:$1} [L,R=301]
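Put together at virtual host level, it might look like the sketch below. The ServerName comes from the question and the map path is a placeholder; to be safe, the map values in this variant are best written with a leading slash (e.g. /belt-drive), since substitutions in server context are normally absolute URL-paths.

```apache
<VirtualHost *:80>
    ServerName carparts.co.uk
    # DocumentRoot and the rest of the site config omitted

    RewriteEngine On
    # RewriteMap can only be declared at server or virtual host level
    RewriteMap redirects txt:/full/path/to/redirect-map.txt

    # The path here starts with "/", so the key is whatever follows it;
    # the condition skips the rule when the map has no entry (default 0)
    RewriteCond ${redirects:$1|0} !=0
    RewriteRule ^/(.*)$ ${redirects:$1} [R=301,L]
</VirtualHost>
```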
Using htaccess slows down a website because Apache has to look for several files on each request, and it re-reads them dynamically for every request.
It's more of a problem for sites with deeply nested paths. For example, a request to:
www.example.com/folder1/folder2/folder3/folder4/index.htm
would need to check:
The main config file.
Then add any overrides in the document root htaccess file.
Then add any overrides in the folder1 htaccess file.
Then add any overrides in the folder2 htaccess file.
...etc.
However, if you don't have deep nesting then it's not so bad: still slower than not using them, but it may not be noticeable on most sites.
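Those per-directory lookups only happen where AllowOverride permits them. If you do have access to the main config and decide to keep everything there, a sketch like this (the directory path is a placeholder) stops Apache from even looking for .htaccess files anywhere in the site's tree:

```apache
<Directory "/path/to/document/root">
    # Applies to this directory and everything below it:
    # .htaccess files are ignored and never opened
    AllowOverride None
</Directory>
```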
The benefit of htaccess for you here is that you wouldn't need to put all the redirects in one place, and could split them up among several htaccess files. I'm not sure of the impact of adding 355,000 redirects to the main Apache config, but it is a fair number, so I imagine it could have a performance impact. The htaccess files, on the other hand, are read dynamically as the request is made, so all the redirects would not need to be loaded into Apache.
So, this might be one of the few use cases where htaccess might be a better solution, even if you do have access to the main config files.

mod_rewrite for favicon control?

I'm making several subdomains that will basically be portals to the same site on Namecheap. Redirecting subdomains is actually really easy (especially since the plumbing is hidden from me), but I want the favicons to be different. This is crucial because the site is crawled by robots that probably don't care about JavaScript or the like.
How would I get a request for http://newsubdomain.example.com/favicon.ico to go to http://oldsubdomain.example.com/differentfavicon.ico instead?
Since I'm a huge n00b in mod_rewrite and most of .htaccess in general, I don't know if it's significant that I'm ultimately storing the files in a structure similar to
http://example.com/oldsubdomain/differentfavicon.ico ...
I could probably use PHP if worse came to worst, but I'm trying to avoid adding yet another language to the list of things my little project requires.
How would I get a request for http://newsubdomain.example.com/favicon.ico to go to http://oldsubdomain.example.com/differentfavicon.ico
You can use this code in your DOCUMENT_ROOT/.htaccess file:
RewriteEngine On
RewriteCond %{HTTP_HOST} =newsubdomain.example.com
RewriteRule ^(favicon\.ico)$ http://oldsubdomain.example.com/different$1 [L,NC,R=301]
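If both subdomains are served from the same document root, an internal rewrite is an alternative that avoids the extra round trip of the 301, because the browser (or robot) gets the icon directly under /favicon.ico. A sketch, with /favicons/newsubdomain.ico as a hypothetical location for the second icon:

```apache
RewriteEngine On
# Same host test as the rule above, written as a case-insensitive regex
RewriteCond %{HTTP_HOST} ^newsubdomain\.example\.com$ [NC]
# Serve a different file under the same URL; no redirect is sent
RewriteRule ^favicon\.ico$ /favicons/newsubdomain.ico [L]
```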

Is there a way to know the original URL from the friendly?

Let's say that it's been two years (and many people have left the company) and we no longer know the real path of:
http://mysubdomain.domain.com/some-friendly-url
Is there a way to know?
Without access to the Apache configuration (or .htaccess), no, you can't. It's totally transparent to the browser.
Some (PHP) websites just do this:
RewriteRule ^ index.php [QSA,L]
Which means that all the logic is hidden in your web application.
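A common variant of that rule only hands over requests that don't match a real file or directory. The original friendly path still reaches the script (in PHP it is in $_SERVER['REQUEST_URI']), so the routing to look for lives in the application's code rather than in Apache:

```apache
RewriteEngine On
# Real files and directories (images, CSS, ...) are served as-is
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Everything else goes to the front controller; QSA keeps the query string
RewriteRule ^ index.php [QSA,L]
```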

We've created a new website, but the old URLs should continue to work because people have bookmarked them

The only problem is that the old URLs look something like this: www.example.com/?pt#!/2/1270/something-etc-etc/
and we want to redirect them, but we need to pass the something-etc-etc to the new URL.
Something like this: new.example.com/old/ plus something-etc-etc
I've tried so many ways that I'm now lost:
RewriteCond %{QUERY_STRING} ([:alnum:]-)+?[:alnum:]/$
RedirectMatch www.example.com/ http://new.example.com/old/
I was hoping that this regex would return only the ending part, but instead it returns ?pt#!/2/1270/something-etc-etc/
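The best way is to use the Apache mod_rewrite module as described in the URL Rewriting Guide. It's a bit of a heavy read if you haven't used mod_rewrite before, but well worth learning, and there are lots of examples to make things concrete. One caveat: everything after the # in the old URLs is a fragment, which browsers never send to the server, so no rewrite rule can see "something-etc-etc".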
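Since Apache only ever receives /?pt for these bookmarks, one workaround (a sketch, assuming every old link has exactly the query string pt at the site root) is to redirect that request to a target without a fragment. In that case browsers carry the original fragment over to the new location (RFC 7231, section 7.1.2), so the visitor lands on http://new.example.com/old/#!/2/1270/something-etc-etc/, where the new site would still need a little client-side code to read the part after the # and show the right page.

```apache
# Document-root .htaccess for www.example.com
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
# Old bookmarks arrive as "/?pt"; the "#!/..." part is never sent
RewriteCond %{QUERY_STRING} ^pt$
# The trailing "?" drops the old query string from the redirect target
RewriteRule ^$ http://new.example.com/old/? [R=301,L]
```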

Upgrading a site with SEO in mind

I'm managing an established site which is currently in the process of being upgraded (completely replaced anew), but I'm worried that I'll lose all my Google indexing (that is, there will be a lot of pages in Google's index which won't exist in that place any more).
The last time I upgraded a (different) site, someone told me I should have done something so that my SEO isn't adversely affected. The problem is, I can't remember what that something was.
Update for some clarification: Basically I'm looking for some way to map the old paths to the new ones. For example:
User searches for "awesome page"
Google returns mysite.com/old_awesome_page.php, user clicks it.
My site takes them to mysite.com/new_awesome_page.php
And when Google gets around to crawling the site again...
Google crawls my site, refreshing the existing indexes.
Requests old_awesome_page.php
My site tells Google that the page has now moved to new_awesome_page.php.
There won't be a simple 1:1 mapping like that, it'll be more like (old) index.php?page=awesome --> (new) index.php/pages/awesome, so I can't just replace the contents of the existing files with redirects.
I'm using PHP on Apache
301 redirect all your old (gone) pages to the new ones.
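For the pattern in the question, where the old page name is in the query string, a RewriteCond on %{QUERY_STRING} is needed, because RewriteRule patterns never see the query string. A sketch for the document-root .htaccess, assuming the parameter is always called page:

```apache
RewriteEngine On
# Capture the value of page= wherever it appears in the query string
RewriteCond %{QUERY_STRING} (^|&)page=([^&]+)
# %2 is that value; the trailing "?" drops the old query string
RewriteRule ^index\.php$ /index.php/pages/%2? [R=301,L]
```

After the redirect the request is /index.php/pages/awesome with no query string, so the condition no longer matches and the rule does not loop.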
Edit:
Here's a link to help. It has a few links to other places too.
You need to put some rewrite rules in an .htaccess file.
You can find lots of good information here. It's for Apache 1.3, but it works for Apache 2, too.
From that article, a sample for redirecting to files that have moved directories:
RewriteEngine on
RewriteRule ^/~(.+) http://newserver/~$1 [R,L]
This reads:
Turn on the rewrite engine.
For anything that starts with /~, followed by one or more of "anything", redirect it to http://newserver/~ followed by that "anything".
The [R] makes this an external redirect, and [L] means that rewriting should stop after this rule.
By default [R] sends a temporary (302) redirect; use [R=301,L] to send the permanent 301 redirect you want here.
You could do:
RewriteEngine on
RewriteRule ^old_page\.php$ /new_page.php [R=301,L]
But you'd have to have a rule for every page. To avoid this, I'd look at using Regular Expressions, as in the first example.
You can tune Google's view of your site, and probably notify it of your changes, from within Google Webmaster Tools. I think you should build a sitemap of your current site, and have it checked again when the site changes.