Major site rewrite and SEO with 301 redirects

I am currently working on a relaunch of a high-traffic website with thousands of pages. All URLs are changing, and for SEO reasons I know we should be putting 301 redirects in place for old-page-to-new-page mappings. However, the old site has tens of thousands of pages. Do I really put thousands of 301 redirects in place? Isn't that bad for performance? Any other suggestions or approaches?
Thanks in advance.

If the URLs follow a regular pattern, you can set up more general redirects based on that pattern. For example, if you're redirecting /category/page.php to /othername/page, that is very easy to do with regular expressions in any major web server.
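As a rough illustration of the pattern approach, here is a minimal Python sketch that applies regex rules to old paths. It assumes the /category/page.php to /othername/page scheme from the example; `REDIRECT_RULES` and `redirect_for` are hypothetical names, and on a real server the same regex would normally go into the server config (for example an Apache RewriteRule with the [R=301] flag).

```python
import re

# Hypothetical rule table: (compiled pattern, replacement template).
# The single rule mirrors the example: /category/page.php -> /othername/page
REDIRECT_RULES = [
    (re.compile(r"^/category/([^/]+)\.php$"), r"/othername/\1"),
]


def redirect_for(path):
    """Return the new path for an old URL path, or None if no rule matches."""
    for pattern, template in REDIRECT_RULES:
        match = pattern.match(path)
        if match:
            # Substitute the captured page name into the new URL template
            return match.expand(template)
    return None


print(redirect_for("/category/page.php"))  # /othername/page
print(redirect_for("/about.html"))         # None
```

One rule covers every page in the folder, which is why a handful of patterns can stand in for thousands of individual redirects.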
If there is no pattern, you could keep a lookup database that you check to decide where to redirect. It may be a little slower, but the performance won't be terrible. You can still list the top 50 or so pages to redirect directly in your server config.
Note: if you are using Apache, it's strongly recommended to put the redirect rules in your httpd.conf (parsed once when Apache starts) rather than in .htaccess files (which are read on every request).

At that scale you probably want to write some custom code and an indexed database table to retrieve the redirect info.
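A minimal sketch of such an indexed lookup table, using Python's built-in sqlite3 module so the example is self-contained. The table and column names, the sample paths and both helper functions are assumptions, and any database with an index on the old path would do. The idea is that the 404 handler (or front controller) calls `lookup_redirect` only for requests that no static rule matched, and issues a 301 when it finds a row.

```python
import sqlite3


def build_redirect_table(conn, mappings):
    """Create an old->new URL table and load it with (old, new) pairs."""
    # PRIMARY KEY gives old_path a unique index, so each lookup is an index search
    conn.execute(
        "CREATE TABLE IF NOT EXISTS redirects ("
        "old_path TEXT PRIMARY KEY, new_path TEXT NOT NULL)"
    )
    conn.executemany("INSERT OR REPLACE INTO redirects VALUES (?, ?)", mappings)
    conn.commit()


def lookup_redirect(conn, old_path):
    """Return the new path for old_path, or None if there is no mapping."""
    row = conn.execute(
        "SELECT new_path FROM redirects WHERE old_path = ?", (old_path,)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")  # a file path would be used in practice
build_redirect_table(conn, [
    ("/old/widgets.html", "/products/widgets"),
    ("/old/about-us.html", "/about"),
])
print(lookup_redirect(conn, "/old/widgets.html"))  # /products/widgets
```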

You'll probably want to look at the most important pages on the site (perhaps the category pages) and start redirecting from there. There's no way you're going to be able to do tens of thousands of redirects because, as you mentioned, there will be performance issues.

To keep this simple, you can use wildcards and regular expressions in .htaccess. This only works if your website is properly organized into categories.

Related

Deprecated domain in google index

We have a deprecated domain, www.deprecateddomain.com. Our specific setup is a reverse proxy that redirects all requests from this domain to the new one, www.newdomain.com.
The problem is that when you type "deprecateddomain.com" into Google search, there is a link to www.deprecateddomain.com in the search results alongside results for "newdomain.com". This means there are still such entries in Google's index. Our customer doesn't want to see links to the old site.
It was suggested that we create a fake robots.txt with a Disallow: / directive for www.deprecateddomain.com and add reverse proxy rules to serve this file from some directory. But after investigating the subject, I began to doubt that it will help. Will it remove entries for the old domain from the index?
Why not just submit a request in Search Console to remove www.deprecateddomain.com from the index? In my opinion, that might help.
Anyway, I'm a novice on this topic. Could you advise me what to do?
Google takes time to remove old/obsolete entries from its rankings, especially for rarely visited or low-value pages. You have no control over it. Google needs to revisit each page to see the redirection you have implemented.
So DO NOT implement a Disallow on the old website, because it will make the problem worse. Bots won't be able to crawl those pages and see the redirection you have implemented, so the pages will stay in the rankings longer.
You must also make sure you implement a proper 301 redirection (i.e. a permanent one, not a temporary 302) for all pages of the old website. Otherwise, some pages may stay in the rankings for quite some time.
If some pages are obsolete and should be deleted rather than redirected, return a 404 for them. Google will remove them quickly from its index.
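To make "a proper 301 for all pages" concrete, here is a minimal Python sketch: each old URL maps to the same path on the new domain with a permanent status, rather than everything landing on the new home page. It assumes the new site keeps the old paths and serves HTTPS; `NEW_HOST`, `redirect_target` and `OldDomainHandler` are hypothetical names, and in the question's setup this logic would live in the reverse proxy configuration instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit, urlunsplit

NEW_HOST = "www.newdomain.com"  # new domain from the question


def redirect_target(request_path):
    """Map a request path (with query string) on the old domain to the new domain."""
    parts = urlsplit(request_path)
    # Keep path and query so each old page points at its own new page.
    # Assumption: the new site is served over HTTPS.
    return urlunsplit(("https", NEW_HOST, parts.path or "/", parts.query, ""))


class OldDomainHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # self.path is the path plus query string, e.g. "/page?x=1"
        self.send_response(301)  # permanent, not a temporary 302
        self.send_header("Location", redirect_target(self.path))
        self.end_headers()

    do_HEAD = do_GET
```

For a quick local test, `HTTPServer(("", 8080), OldDomainHandler).serve_forever()` would run it; the key points are the 301 status and the path-preserving Location header.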

Disallow double or junk wildcard subdomains in htaccess mod rewrite for SEO

I have wildcard subdomains enabled on my domain. I use this so that I can rewrite URLs like es.domain.com to domain.com/page.php?lang=es and display the local-language version of page.php to the user.
The one potential problem I see with allowing wildcard subdomains is that people can link to www.es.domain.com, or even anything.they.like.domain.com, and it will display a perfectly working clone of the website. I presume this 'duplicate content' is bad for SEO.
Can anyone come up with a RewriteRule that detects subdomains of more than 2 letters (excluding www., of course) and 301-redirects offending URLs to the clean base domain.com? I'm having trouble when I consider domains like domain.co.uk, which already look like they are on a subdomain.
As a side note, are there any similar implications for SEO on the opposite side of the url, with query parameters? For example, domain.com?param=anything-I-like will surely show a duplicate page. How does Google handle this content?
UPDATE:
Here's the rewrite rule I'm currently using. If I wanted to clean up bad URLs with PHP, I'd need to modify it to catch all subdomains. I need to do this generically (without specifying domain.com), as it's going to be used in a CMS. Any suggestions?
RewriteCond %{HTTP_HOST} ^([a-z]{2})\.
RewriteRule p/(.*) page.php?p=$1&lang=%1
I honestly can't speak to fixing your actual issue, but I can confirm that anything.I.want.domain.com is really, REALLY bad for SEO. I've got two years' experience in the field, and I'm currently working on a project cleaning up links for our main U.S. site. A couple of the biggest problems have come from sites just like the one you described, with around 100 *.domain.com subdomains. The biggest issue is the effect on trust flow: it basically sends a link's trust rating to 0 and tells Google that not only should this link be disregarded, but the domain it came from and the domains it links to should be investigated for potential spamminess.
As to your final question on implications:
Query parameters can be just as helpful or detrimental as any other part of the URL structure, so be careful with those as well. If you have different language versions of your site, be sure to designate one as the rel="canonical" page (especially if the content isn't entirely unique). The thing is, link structure matters to search engines, but not overly so; it's one of many metrics. I'd be far more concerned about the subdomains. If you can work a few small, basic keywords that describe the page into your query vars, it could help a bit. I would, however, strongly suggest a three- or four-tier structure for your site, reflected in the URLs.
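For the query-parameter duplicates from the question (domain.com?param=anything-I-like), one way to produce the rel="canonical" URL is to keep only the parameters that actually change the content. This is a minimal Python sketch; the `CONTENT_PARAMS` whitelist and both function names are assumptions, and the resulting tag would go in the page's <head>.

```python
import html
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical whitelist: query parameters that really change the page content.
CONTENT_PARAMS = {"p"}


def canonical_url(url):
    """Drop unknown query parameters and sort the rest so duplicates share one URL."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k in CONTENT_PARAMS)
    # Host names are case-insensitive, so normalise them to lower case
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path or "/",
                       urlencode(kept), ""))


def canonical_link_tag(url):
    """Build the <link rel="canonical"> tag, HTML-escaping the URL."""
    return '<link rel="canonical" href="%s">' % html.escape(canonical_url(url), quote=True)


print(canonical_link_tag("http://Domain.com/?param=anything-I-like"))
# <link rel="canonical" href="http://domain.com/">
```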
Google tends to like URLs such as: domain.com/landingpage/category/subcategory?somevars=44
Going more than three levels deep spreads you too thin, and fewer than that makes the site too bulky to navigate. I believe it's covered somewhat here, if you haven't seen it: http://moz.com/beginners-guide-to-seo
Sites such as Search Engine Journal, Single Grain and Moz can answer a lot of your SEO questions, and tools like Majestic, Soolve, Mozcast and SERPMetrics Flux can help a lot, too. Try doing a little reading and see if you can decide on a good scheme for your links.
Again, sorry, I don't really know Apache, but hopefully that helps!
Presumably you have a rewrite rule that takes anything in front of domain.com and puts it into the lang parameter. Rather than having a rewrite rule to do the redirecting, have your page.php script examine the lang parameter and issue a redirect for invalid values.
Thanks to all for the info and replies on this. The solution I've found is to write a more generic .htaccess rule that catches all subdomains and forwards them to PHP for processing. PHP then checks whether the subdomain is valid and, if not, 301-redirects the visitor to the root domain. This way, if someone links to blah.blah.domain.com, search engines should see that as a link to just domain.com. I'm only using language subdomains on my site, but it should work for any subdomains you want to use.
Here's the htaccess rewrite:
The regex works by finding the last run of three or more domain-name-valid characters that is followed by a dot and then any other string. The idea is that it finds the domain name in the host and captures everything before it. Obviously, this won't work for domain names shorter than three characters.
#All sub domains are redirected to p.php for processing:
RewriteCond %{HTTP_HOST} ^(.*)\.[a-z0-9\-]{3,}\..*
RewriteRule (.*) p.php?subdom=%1 [L]
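The rule's host pattern can be checked outside Apache, since Python's `re` module handles this particular regex the same way. The sketch below is only a test harness; `captured_subdomain` is a hypothetical helper that returns what the RewriteCond would pass as %1. It also shows a limitation beyond the three-character one: a two-part suffix whose first part has three or more characters, such as .com.au, gets swallowed into the captured subdomain.

```python
import re

# Same pattern as the RewriteCond; group 1 is what gets passed as subdom.
HOST_PATTERN = re.compile(r"^(.*)\.[a-z0-9\-]{3,}\..*")


def captured_subdomain(host):
    """Return what the RewriteCond would capture as %1, or None if it doesn't match."""
    match = HOST_PATTERN.match(host)
    return match.group(1) if match else None


print(captured_subdomain("domain.com"))            # None (no rewrite)
print(captured_subdomain("es.domain.com"))         # es
print(captured_subdomain("blah.blah.domain.com"))  # blah.blah
print(captured_subdomain("es.domain.co.uk"))       # es
print(captured_subdomain("es.domain.com.au"))      # es.domain (wrong split)
```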
Here's the PHP:
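function redirect301($url = '/') {
    header('HTTP/1.1 301 Moved Permanently');
    header("Location: {$url}");
    exit();
}
// The .htaccess rule passes the captured subdomain as "subdom" (not "subdomain").
$subdom = isset($_GET['subdom']) ? strtolower($_GET['subdom']) : ''; // you should sanitise this further if using this script!
$defaultLang = 'en';
$alternateLangs = array('de', 'es'); // list of allowed subdomains
$ISOlangCode = $defaultLang; // language used when there is no language subdomain
if ($subdom !== '' && $subdom !== 'www') {
    // Root domain of the current host, e.g. "domain.com" for "es.domain.com".
    $rootHost = substr($_SERVER['HTTP_HOST'], strlen($subdom) + 1);
    $scheme = (empty($_SERVER['HTTPS']) || $_SERVER['HTTPS'] === 'off') ? 'http' : 'https';
    // Unknown subdomain, or a subdomain for the default language: redirect to the
    // root domain (a relative '/' would stay on the same bad host and loop).
    if (!in_array($subdom, $alternateLangs, true) || $subdom === $defaultLang) {
        redirect301("{$scheme}://{$rootHost}/");
    }
    $ISOlangCode = $subdom; // de, es, etc. - capture code for use later
}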
Hopefully this helps someone out.

How Can I Deal With Those Dead Links After Revamping My Web Site?

A couple of months ago, we revamped our website. We adopted a totally new site structure and, specifically, merged several pages into one. Everything looks great.
However, there are lots of dead links, which produce a large number of 404 errors.
So how should I deal with them? If I leave them alone, could it bite back someday, say by eating up my PageRank (PR)?
One basic option is a 301 redirect, but that seems almost impossible given the number of dead links.
So is there any workaround? Thanks for your consideration!
301 is an excellent idea.
Keep in mind that you can use pattern-based configuration to map a whole group of pages; you don't necessarily need to write one redirect for every 404.
For example, if you removed the http://example.org/foo folder, with Apache you can write the following configuration:
RedirectMatch 301 ^/foo/(.*)$ http://example.org/
to catch all 404s generated by the removed folder.
Also, consider redirecting selectively. You can use Google Webmaster Tools to check which 404 URIs are receiving the most inbound links and create redirect rules only for those.
Chances are the number of redirection rules you need to create will decrease drastically.
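If you'd rather work from your own server logs, a sketch like this ranks 404 URIs by how often they are requested, which is a rough proxy for the inbound-link count the answer suggests using. It assumes Apache's Common or Combined Log Format; `top_404s` is a hypothetical name.

```python
import re
from collections import Counter

# Matches the request and status fields of Apache's Common/Combined Log Format,
# e.g. ... "GET /old/page.html HTTP/1.1" 404 512 ...
LOG_LINE = re.compile(r'"(?:GET|HEAD|POST) (\S+) [^"]*" (\d{3}) ')


def top_404s(lines, limit=50):
    """Count 404 responses per requested URI and return the most frequent ones."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group(2) == "404":
            counts[match.group(1)] += 1
    return counts.most_common(limit)
```

For example, `with open("access.log") as f: print(top_404s(f, 50))` would list the 50 most-requested missing URIs (the log file name is an assumption), which are the natural candidates for hand-written 301s.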
301 is definitely the correct route to go down to preserve your page rank.
Alternatively, you could catch 404 errors and redirect either to a "This content has moved" type of page or to your home page. If you do this, I would still recommend cherry-picking busy pages and important content and setting up 301s for these; that way you preserve PR on your most important content and deal gracefully with the rest of the dead links.
I agree with the other posts - using mod_rewrite you can remap URLs and return 301s. Note - it's possible to call an external program or database with mod_rewrite - so there's a lot you can do there.
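To make the external-program option concrete: mod_rewrite's RewriteMap directive accepts a `prg:` map, a program that Apache starts once and that answers one lookup key per line on stdin, replying with the mapped value or the string NULL when there is no match. The script below is a minimal Python sketch of such a program; the mapping dictionary, the script path and the map name in the comments are assumptions, and RewriteMap must be declared in the server or virtual-host config, not in .htaccess.

```python
#!/usr/bin/env python3
# Assumed Apache config (server or virtual-host context):
#   RewriteEngine On
#   RewriteMap oldurls "prg:/usr/local/bin/redirect_map.py"
#   RewriteCond ${oldurls:%{REQUEST_URI}} ^(.+)$
#   RewriteRule ^ %1 [R=301,L]
import sys

# Hypothetical old-to-new mapping; a real map might query a database instead.
REDIRECTS = {
    "/old/widgets.html": "/products/widgets",
    "/old/about-us.html": "/about",
}


def lookup(key):
    """Return the new path, or "NULL", which mod_rewrite treats as 'no value'."""
    return REDIRECTS.get(key, "NULL")


def serve(stdin, stdout):
    # Apache keeps the program running and sends one key per line.
    for line in stdin:
        stdout.write(lookup(line.strip()) + "\n")
        stdout.flush()  # replies must not be buffered, or Apache waits forever


if __name__ == "__main__":
    serve(sys.stdin, sys.stdout)
```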
If your new and old sites don't follow any pattern you can remap, then I suggest you make your 404 page as useful as possible. Google has a widget that suggests the page the user is probably looking for. This works well once Google has spidered your new site.
Along with the other 301 suggestions, you could also turn the requested URL into a search string and route it to your default search page (if you have one), passing those terms to the search automatically.
For example, if someone tries to visit http://example.com/2009/01/new-years-was-a-blast, this would route to your search page and automatically search for "new years was a blast", returning the best result for those keywords and, hopefully, your most relevant article.
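The search-routing idea can be sketched as a small function that turns a dead path into a search URL. It assumes the site has a search page at /search taking a `q` parameter and that old slugs use hyphens or underscores between words; `SEARCH_PAGE` and `search_redirect` are hypothetical names. Only the last path segment is used, so date folders such as /2009/01/ are ignored.

```python
import re
from urllib.parse import urlencode

SEARCH_PAGE = "/search"  # hypothetical search URL; adjust to your site's search page


def search_redirect(dead_path):
    """Turn a dead URL path into a search URL built from the words in its slug."""
    segments = [s for s in dead_path.split("/") if s]
    if not segments:
        return SEARCH_PAGE
    # Use the last path segment and drop a file extension such as .php or .html
    slug = re.sub(r"\.[a-z0-9]+$", "", segments[-1], flags=re.IGNORECASE)
    # Hyphens and underscores separate words in most slugs
    words = [w for w in re.split(r"[-_]+", slug) if w]
    return SEARCH_PAGE + "?" + urlencode({"q": " ".join(words)})


print(search_redirect("/2009/01/new-years-was-a-blast"))
# /search?q=new+years+was+a+blast
```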

SEO and hard links with dynamic URLs

With ASP.NET MVC (or using HttpHandlers) you can dynamically generate URLs, like the one in this question, which includes the title.
What happens if the title changes (for example, editing it) and there's a link pointing to the page from another site, or Google's Pagerank was calculated for that URL?
I guess it's all lost, right? (The link points nowhere and the calculated PageRank is lost.)
If so, is there a way to avoid it?
I use the same system as this site: everything after the number in the URL is ignored by the database query, and any URL whose title part doesn't match the current title gets a 301 redirect to the URL with the correct title.
In other words, if the title changed, the old URL would redirect to the correct place. I do it in PHP rather than .htaccess, as that makes more complex logic easier to manage.
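The answer does this in PHP; here is a rough sketch of the same logic in Python, assuming a Stack Overflow-style /questions/<id>/<slug> URL scheme. The `TITLES` dictionary stands in for the database, and `slugify` and `resolve` are hypothetical helpers: only the numeric id is used for the lookup, and any path whose slug differs from the current title's slug gets a 301 to the canonical URL.

```python
import re

# Hypothetical data store: question id -> current title.
TITLES = {123: "SEO and hard links with dynamic URLs"}


def slugify(title):
    """Lower-case the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def resolve(path):
    """Return (200, id), (301, canonical_path) or (404, None) for a request path."""
    match = re.match(r"^/questions/(\d+)(?:/([^/]*))?$", path)
    if not match or int(match.group(1)) not in TITLES:
        return 404, None
    qid = int(match.group(1))
    # Only the id is used for the lookup; the slug is just decoration
    canonical = "/questions/%d/%s" % (qid, slugify(TITLES[qid]))
    if path != canonical:
        return 301, canonical  # missing, wrong or outdated slug: permanent redirect
    return 200, qid
```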
I think you're generally best off having the server send a permanent redirect to the new location, if possible.
That way, any rank gained from third-party links should, in theory, be transferred to the new location. I'm not sure whether this happens in practice, but it should.
The way Stack Overflow seems to be implemented, everything after the question number is superfluous as far as linking to the question goes. For instance:
SEO and hard links with dynamic URLs
links to this question, despite the fact that I just made up the 'question title' part out of thin air. So the link will not point to nowhere and the PageRank is not lost (though it may be split between the two URLs, depending on whether or not Google can canonicalize them into a single URL).
Have your app redirect the old URL via a 301 Redirect. This will tell Google to transfer the pagerank to the new URL.
If a document is moved to a different URL, the server should be configured to return an HTTP status code of 301 (Moved Permanently) for the old URL to tell the client where the document has moved. With Apache, this can be done with mod_rewrite's RewriteRule (or with the Redirect directive from mod_alias).
The best thing to help Google in this instance is to return a permanent redirect on the old URL to the new one.
I'm not an ASP.NET hacker - so I can't recommend the best way to implement this - but Googling the topic looks fairly productive :-)
Yes, all SEO is lost upon a URL change: it forks to an entirely new record. The way to handle that is to leave a 301 redirect at the old title pointing to the new one; some search engines (read: Google) are smart enough to pick that up.
EDIT: Fixed to 301 redirect!

Google Page Rank - New Domain / Link Structure Migration

I've been tasked with reorganizing a pure HTML site into a CMS. If all goes well, the new site will eventually become the main URL, and the old domain will be phased out. The old domain has a decent PageRank, and the company wishes to mitigate any loss of it. In looking over the available options, I've discovered a few things:
It's better to use a 301 redirect when you're ready to make the switch (source).
The current site does not have a sitemap, so adding one and submitting it may help their future PageRank.
I'll need to suggest that they contact the people currently linking to them to update their links.
Regaining an old PageRank takes a while, so plan on rebuilding links while we see whether the new site is flexible enough to warrant switching over completely.
My question is: as a result of the move to a CMS-driven site, the links to various pages will change to accommodate the new structure. Will this be an issue for maintaining (or improving) the current PageRank? What methods are available to mitigate the effect of changing individual page URLs? Is there a preferable method beyond mapping individual pages to their new locations with 301 redirects? (The site has literally hundreds of pages, ugh...)
Example:
http://domain.com/Messy_HTML_page_with_little_categorization.html ->
http://newdomain.com/nice/structured/pages.php
I realize this isn't strictly a programming question; however, I felt the information could be useful to developers who are tasked with handling this sort of thing in addition to developing the site.
If you really want to ensure that PageRank is not lost, you will want to replace the old content with something that performs a proper 301 redirect to the new location. With a 301 redirect, search spiders know that the content has moved, and the PageRank typically carries over. It also keeps external links working.
However, the downside is that after a certain period of time you just have to get rid of the old domains.
You can make a handler for HTML files and map the old pages to the new structure with a 301 redirect.
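The handler idea can be sketched as a small WSGI application (Python's standard web-server interface) to which the old domain routes its .html requests. The mapping table reuses the example URLs from the question and is otherwise hypothetical, as is the choice to return 404 for unmapped pages; in practice the table would be built from a spreadsheet of old and new URLs.

```python
# Hypothetical old-page -> new-page table, built once from a list of URL pairs.
OLD_TO_NEW = {
    "/Messy_HTML_page_with_little_categorization.html":
        "http://newdomain.com/nice/structured/pages.php",
}


def legacy_redirect_app(environ, start_response):
    """WSGI app that answers old .html URLs with a 301 to their new location."""
    path = environ.get("PATH_INFO", "")
    target = OLD_TO_NEW.get(path)
    if target is None:
        # Unmapped old page: a plain 404 lets search engines drop it.
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found"]
    start_response("301 Moved Permanently", [("Location", target)])
    return [b""]
```

For local testing it could be served with `wsgiref.simple_server.make_server("", 8000, legacy_redirect_app).serve_forever()`.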