Upgrading a site with SEO in mind - apache

I'm managing an established site which is currently in the process of being upgraded (completely replaced anew), but I'm worried that I'll lose all my Google indexing (that is, there will be a lot of pages in Google's index which won't exist in that place any more).
The last time I upgraded a (different) site, someone told me I should have done something so that my SEO isn't adversely affected. The problem is, I can't remember what that something was.
Update for some clarification: Basically I'm looking for some way to map the old paths to the new ones. For example:
User searches for "awesome page"
Google returns mysite.com/old_awesome_page.php, user clicks it.
My site takes them to mysite.com/new_awesome_page.php
And when Google gets around to crawling the site again...
Google crawls my site, refreshing the existing indexes.
Requests old_awesome_page.php
My site tells Google that the page has now moved to new_awesome_page.php.
There won't be a simple 1:1 mapping like that, it'll be more like (old) index.php?page=awesome --> (new) index.php/pages/awesome, so I can't just replace the contents of the existing files with redirects.
I'm using PHP on Apache

301 redirect all your old (gone) pages to the new ones.
Edit:
Here's a link to help. It has a few links to other places too.

You need to put some rewrite rules in an .htaccess file.
You can find lots of good information here. It's for Apache 1.3, but it works for Apache 2, too.
From that article, a sample for redirecting to files that have moved directories:
RewriteEngine on
RewriteRule ^/~(.+) http://newserver/~$1 [R,L]
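This reads:
Turn on the rewrite engine.
For anything that starts with /~, followed by one or more of "anything", redirect it to http://newserver/~ followed by that "anything".
The [R] means the client is sent an external redirect (a 302 by default), and the [L] means that the rewriting should stop after this rule.
To make it a permanent redirect, use [R=301] instead of [R].
Note that in an .htaccess file the leading slash is stripped before matching, so the pattern there would be ^~(.+) rather than ^/~(.+).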
You could do:
RewriteEngine on
RewriteRule ^old_page\.php$ /new_page.php [R=301,L]
But you'd have to have a rule for every page. To avoid this, I'd look at using Regular Expressions, as in the first example.
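For the mapping in the question, (old) index.php?page=awesome to (new) index.php/pages/awesome, the old URL differs only in its query string, which RewriteRule patterns never see. A minimal sketch for an .htaccess file in the document root, assuming the parameter is always called page and the new path is always /index.php/pages/<name>:

```apache
RewriteEngine On

# The query string is not part of what RewriteRule matches,
# so it is tested separately. ([^&]+) captures the page name.
RewriteCond %{QUERY_STRING} (?:^|&)page=([^&]+)

# %1 is the value captured by the RewriteCond above.
# The trailing "?" drops the old query string from the target.
RewriteRule ^index\.php$ /index.php/pages/%1? [R=301,L]
```

With this in place a request for /index.php?page=awesome should be answered with a 301 pointing at /index.php/pages/awesome, which is what both visitors and Google's crawler need to see.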

You can tune how Google sees your site, and probably notify it of changes, from within Google Webmaster Tools. I think you should build a sitemap of your current site and have it verified again when the site changes.

Related

How to do a basic, total, overall redirect with htaccess

For some time I used a free account to host my online forum. Recently I bought a domain name, and it now contains a full copy of all the PHP and MySQL data in the forum, so there are two copies of the forum in different locations. The last step I need to accomplish is to erase all the data in the old location and replace it with a short htaccess file that will make a global redirect. What would the contents of this htaccess file look like? I don't know enough Apache to be sure.
I found many topics related to this one here on MSE and elsewhere, but none that match my needs exactly. Most are about redirections to just a subfolder inside a single server.
Note: the whole point of this question is that I want the users of the forum to keep using the same URL as before. I want to avoid having to tell everyone that the forum's URL has changed. Nonprofessional users dislike changes.
If mod_alias is active:
Redirect 301 / http://your-new-site.com/
otherwise, if mod_rewrite is active:
RewriteEngine On
RewriteRule ^ http://your-new-site.com%{REQUEST_URI} [R=301,L]
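If it is not known which module the old host has enabled, the two variants can be wrapped in IfModule blocks so that only one of them takes effect. This is a sketch, assuming the host allows these directives in .htaccess; your-new-site.com is the placeholder from the answer above:

```apache
# Preferred: mod_alias. The trailing slash on the target matters,
# because the rest of the requested path is appended to it.
<IfModule mod_alias.c>
    Redirect 301 / http://your-new-site.com/
</IfModule>

# Fallback: used only when mod_alias is missing but mod_rewrite is loaded.
<IfModule !mod_alias.c>
    <IfModule mod_rewrite.c>
        RewriteEngine On
        RewriteRule ^ http://your-new-site.com%{REQUEST_URI} [R=301,L]
    </IfModule>
</IfModule>
```

Either way, a request for /viewtopic.php?t=12 on the old host should be redirected to the same path and query string on the new domain.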

How to implement url page redirection for a massive huge website

My site, e.g. carparts.co.uk, has 355,000 unique URLs. (It is a car parts catalogue site; Webmaster Tools shows that 174,000 of these are indexed.)
We want to move our site to a new shopping cart platform (PrestaShop), and have completely changed the structure of the catalogue, which means we now have a new set of URLs (although the main domain is unchanged and is still carparts.co.uk).
I now have an Excel sheet with a column of the 355,000 'old' URLs matched against the closest equivalent URL in the new catalogue.
e.g.
old url: "carparts.co.uk/ford-ranger-alternator belts.htm"
goes to: "carparts.co.uk/belt-drive"
(and there are 355,000 of similar redirects)
My question is: how should I do this?
I've read that you can use htaccess to do this, but I'm worried because I've also read that htaccess slows down sites if it is very large. Is this slowness only encountered when trying to access one of the old URLs, or will it impact the speed of all my URLs?
So what is the best thing for me to do with such a large number of URLs?
Your best bet is probably setting up a RewriteMap. This requires server vhost config access as you can't configure the map from an htaccess file (though you can use one). The mapping is cached by apache so you don't need to worry about constant file access.
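Something simple like:
RewriteMap redirects txt:/full/path/to/redirect-map.txt
Then the file redirect-map.txt simply has a "from" and a "to" on each line, separated by whitespace:
old-url.htm new-url
another-old-url.htm another-new-url
etc...
(A key in a txt: map cannot contain whitespace and quotes are not special there, so an old URL with a space in it, such as "ford-ranger-alternator belts.htm", needs separate handling, for example its own RewriteRule.)
Then in either your htaccess file or in vhost config, just do:
RewriteEngine On
RewriteCond ${redirects:$1|0} !=0
RewriteRule ^/?(.*)$ /${redirects:$1} [L,R=301]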
Use of htaccess slows down the website because Apache has to look for and read several files dynamically on every request.
It's more a problem for deep routed sites. For example, a request to:
www.example.com/folder1/folder2/folder3/folder4/index.htm
would need to check
The main config file.
Then add any overrides in the document root htaccess file.
Then add any overrides in the folder1 htaccess file.
Then add any overrides in the folder2 htaccess file.
...etc.
However if you don't have deep nesting then it's not so bad. Still slower than not using them, but may not be noticeable on most sites.
The benefit of htaccess for you here is that you wouldn't need to put all the redirects in one place, and could split them up amongst the htaccess files. I'm not sure of the impact of adding 355,000 redirects to the main Apache config, but it is a fair number, so I imagine it could have a performance impact. The htaccess files, on the other hand, are read dynamically as the request is made, so not all the redirects would need to be loaded into Apache at once.
So, this might be one of the few use cases where htaccess might be a better solution, even if you do have access to the main config files.
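For comparison, here is a rough sketch of the main-config alternative, where .htaccess lookups are switched off entirely and the redirects come from a RewriteMap. The document root /var/www/carparts is an assumption, and Require all granted is Apache 2.4 syntax:

```apache
<VirtualHost *:80>
    ServerName carparts.co.uk
    # Hypothetical document root, adjust to the real one.
    DocumentRoot /var/www/carparts

    <Directory /var/www/carparts>
        # With AllowOverride None, Apache does not look for .htaccess
        # files in this tree, so the per-request file checks go away.
        AllowOverride None
        Require all granted
    </Directory>

    RewriteEngine On
    # The map is declared once here and looked up per request.
    RewriteMap redirects txt:/full/path/to/redirect-map.txt

    # Redirect only when the requested path has an entry in the map.
    RewriteCond ${redirects:$1|0} !=0
    RewriteRule ^/?(.*)$ /${redirects:$1} [R=301,L]
</VirtualHost>
```

Which of the two approaches is faster for 355,000 entries is something to measure on the actual server rather than assume.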

Is there an Apache/Plesk server setting that governs https:// behavior?

Context:
I've recently moved a site to a new host, and moved the SSL certificate from the old host to the new one. The code, written in PHP, is a big mess written many years ago by someone who is no longer available. Because of this, I'm hoping to find something in the configuration of the server that can fix the issue, so I don't have to reverse-engineer the rather messy code.
Problem:
When users navigate to an area of the site that uses https://, all goes according to plan. The problem, however, arises when they click a link in the navigation that is normally to an http:// part of the site. On hover, you can see that the target URL incorrectly includes "https://". When the user tries to go to a non-secure area with https:// in front, either by clicking one of those altered links or by typing it into the location bar of the browser, they are redirected to the directory without any domain. For example, if you try to go to "https://domain.org/site/", the browser is redirected to only "/site", which of course cannot be found.
Theoretical solutions:
Is there a setting in Plesk which governs the "stickiness" of https? One way to fix the problem is to stop the non-secure links from acquiring https://.
Is there an obvious reason why whatever script or file the site is using to redirect would break when a non-secure area is accessed via https://? Is there a server setting that would have made this function differently on the new server versus the old server?
I don't have access to see what exactly the configuration of the old server was. Is it likely that this could be caused by a difference in PHP version? If so, any suspicions about what the problem would be?
Is there some workaround with .htaccess that can manually redirect all but certain secure areas of the site to http:// when they are accessed via https://, presumably before the site's redirect script is activated?
Thank you for any help!!
Yes, since Plesk 17 (Onyx):
For older versions you can create an .htaccess file that redirects requests from https to http, based on the referrer (the HTTPS condition keeps requests that are already on http from being redirected again):
RewriteEngine on
RewriteCond %{HTTPS} =on
RewriteCond %{HTTP_REFERER} ^https://domain\.org [NC]
RewriteRule ^(.*)$ http://domain.org/$1 [L,R=301]
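The last idea in the question, sending everything except certain secure areas back to http://, can also be done without looking at the referrer. A sketch only: the directory names /secure/ and /checkout/ are hypothetical stand-ins for whatever the real secure areas are called.

```apache
RewriteEngine On

# Act only on requests that arrived over https.
RewriteCond %{HTTPS} =on

# Leave the secure areas alone (hypothetical directory names).
RewriteCond %{REQUEST_URI} !^/(secure|checkout)/

# Everything else is sent to the same path on plain http.
RewriteRule ^ http://domain.org%{REQUEST_URI} [R=301,L]
```

Because this sits in .htaccess it runs before the site's own PHP redirect script, which is the ordering the question asks for.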

Changing site URLs in Joomla with Apache mod rewrite

So I'm trying to clean up the URLs on my site, and have dug around for information here but haven't figured out why I'm still getting errors.
Basically, I have a site that revolves around a search engine, and once the user sees the results and clicks on one, it goes to a URL that looks like:
www.mysearch.com/searchresults/204982398sjfdkf&thisismorejunk=junkjunk=1331
Well, sort of, but you get the point.
I want to clean this up for each result, so it looks like
www.mysearch.com/searchresult1001
I'm using the Joomla platform on my backend, and enabled 'Search Friendly URLs' and there was no problem (although it did almost nothing for me). Then before I enabled 'Use APACHE mod rewrite' I put the following code into my .htaccess file
RewriteEngine On
Options +FollowSymLinks
RewriteRule ^Joomla161_2/joomla\.html http://www.joomla.org/? [R=301,L]
(The last rule was just to see if the rewrite rule works, which it does)
First problem - My host automatically overrides the Options command, saying I can't do it for security reasons - but I figure maybe this isn't big because the rewrite rule still works.
But then when I try to enable 'Use Apache rewrite' my whole site breaks. Worse yet, I have no idea what to do next to actually CHANGE the URLs of my search results.
Options +FollowSymLinks
It is very possible that your host has this enabled by default and that adding it to your .htaccess file is causing a conflict.
If your rewrites are working without it then there is no reason to add it.
Joomla Apache Rewrite
Change the SEO settings and htaccess back to their default settings.
BEFORE! you change any of the options in the Joomla admin for SEO friendly URLs or Apache rewrite you must FIRST! rename your htaccess.txt to .htaccess
In some cases you may have to give the server a min or two to notice it.
Now change the Joomla admin settings to seo friendly and enable rewrite.
Search Rewrite
If the above settings are configured correctly then your website's search results will have landing pages with nice URLs.

Apache URL Redirect Alternatives

One of my clients (before I came along) decided to use htaccess redirects as their form of URL shortening/search engine friendly URLs. They have literally thousands of them.
The new version of the site now has friendly urls but they aren't equivalent to their redirects so they still need them.
My question to you all is: Is there another way than to populate this file with thousands of lines of "Redirect /folder1 /folder2"?
Thanks
If you cannot write simple rules that catch all of them, as in @chris henry's solution, you can use the RewriteMap feature of mod_rewrite. You can write these thousands of mappings in a text file, then convert that text file into a hash file, and mod_rewrite will look up each URL in it (with a hash file this is quite fast). mod_rewrite can then issue a 301 redirect with the [L,R=301] flags.
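A sketch of the hash-file variant, assuming the text map holds lines such as "folder1 /folder2" (key without a leading slash, target with one). The file paths are placeholders. httxt2dbm is the conversion tool that ships with Apache, and depending on the DBM type it may create more than one output file.

```apache
# Build the hash file once from the text map (run in a shell, not in Apache):
#   httxt2dbm -i /full/path/to/redirects.txt -o /full/path/to/redirects.map
#
# RewriteMap must go in the server or virtual-host config, not in .htaccess.
RewriteMap redirects dbm:/full/path/to/redirects.map

RewriteEngine On

# Redirect only when the lookup returns something (no entry gives an empty string).
RewriteCond ${redirects:$1} !^$
RewriteRule ^/?(.+)$ ${redirects:$1} [R=301,L]
```

The hash file has to be rebuilt with httxt2dbm whenever the text file changes.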
Yep, look at using the Apache config (httpd.conf or httpd-vhosts.conf) to set up site wide folder aliasing. Eg:
Alias /folder1 c:/www/folder2
Look at http://httpd.apache.org/docs/2.0/mod/core.html#directory for more info.
Depending on how different the URLs being redirected are, one solution might be to come up with a rewrite rule that covers all of them, and maintain the short / long URLs in your application, or even in a database.
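One way this could look, as a sketch: a single catch-all rule hands every request that does not match a real file or directory to a script, which looks up the long URL and issues the redirect itself. The script name redirect.php and its path parameter are hypothetical.

```apache
RewriteEngine On

# Skip anything that exists on disk, so normal pages and assets are untouched.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

# Pass the requested path to a (hypothetical) lookup script.
# QSA keeps any query string that was on the original request.
RewriteRule ^(.*)$ /redirect.php?path=$1 [L,QSA]
```

The script would then send the 301 and Location header for known short URLs and a 404 for everything else, so the thousands of mappings live in the application or database instead of in the htaccess file.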