How Can I Deal With Those Dead Links After Revamping My Web Site? - seo

A couple of months ago, we revamped our web site. We adopted a totally new site structure and, specifically, merged several pages into one. Everything looks great.
However, there are now lots of dead links, which produce a large number of 404 errors.
So what can I do about it? If I leave it alone, could it come back to bite me someday, say by eating up my PR?
One basic option is to use 301 redirects; however, that is almost impossible considering how many there are.
So is there any workaround? Thanks for your consideration!

301 is an excellent idea.
Keep in mind that you can take advantage of global configuration to map a whole group of pages. You don't necessarily need to write one redirect for every 404.
For example, if you removed the http://example.org/foo folder, with Apache you can write the following configuration
RedirectMatch 301 ^/foo/(.*)$ http://example.org/
to catch all the 404s generated by the removed folder.
Also, consider redirecting selectively. You can use Google Webmaster Tools to check which 404 URIs receive the highest number of inbound links and create redirect rules only for those.
Chances are the number of redirection rules you need to create will decrease drastically.
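To make the "pattern first, then cherry-pick" idea concrete, here is a minimal Python sketch. It assumes you have exported the dead paths and their inbound link counts somewhere; the counts, the paths other than /foo/, and the function names below are all made up for illustration.

```python
import re

# Hypothetical pattern rules: (regex on the old path, new location).
# The single rule mirrors the RedirectMatch example above.
PATTERN_RULES = [
    (re.compile(r"^/foo/(.*)$"), "http://example.org/"),
]

def covered_by_pattern(path):
    """Return the redirect target if a pattern rule matches, else None."""
    for pattern, target in PATTERN_RULES:
        if pattern.match(path):
            return target
    return None

def urls_needing_individual_rules(inbound_links, top_n):
    """Pick the most-linked dead paths that no pattern rule already handles."""
    uncovered = {path: count for path, count in inbound_links.items()
                 if covered_by_pattern(path) is None}
    ranked = sorted(uncovered, key=uncovered.get, reverse=True)
    return ranked[:top_n]

# Made-up inbound link counts per dead path.
inbound = {
    "/foo/a.html": 40,
    "/foo/b.html": 3,
    "/old-about.html": 25,
    "/press/2008.html": 12,
    "/tmp/test.html": 0,
}

print(urls_needing_individual_rules(inbound, top_n=2))
# ['/old-about.html', '/press/2008.html']
```

Everything under /foo/ is already handled by the one pattern rule, so only the two most-linked leftovers would need their own redirect.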

301 is definitely the correct route to go down to preserve your page rank.
Alternatively, you could catch 404 errors and redirect either to a "This content has moved" type page, or your home page. If you do this I would still recommend cherry picking busy pages and important content and setting up 301s for these - then you can preserve PR on your most important content, and deal gracefully with the rest of the dead links...

I agree with the other posts - using mod_rewrite you can remap URLs and return 301s. Note - it's possible to call an external program or database with mod_rewrite - so there's a lot you can do there.
If your new and old sites don't follow any remappable pattern, then I suggest you make your 404 page as useful as possible. Google has a widget which will suggest the page the user is probably looking for. This works well once Google has spidered your new site.

Along with the other 301 suggestions, you could also split the requested URL into search terms and route the visitor to your default search page (if you have one), passing those terms to the search automatically.
For example, if someone tries to visit http://example.com/2009/01/new-years-was-a-blast, this would route to your search page and automatically search for "new years was a blast" returning the best result for those key words and hopefully your most relevant article.
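A rough sketch of that slug-to-search idea in Python. The /search path and the q parameter are assumptions; substitute whatever your own search page expects.

```python
from urllib.parse import urlparse, urlencode

def search_url_for(dead_url, search_path="/search"):
    """Turn the last path segment of a dead URL into a search query URL.

    search_path and the 'q' parameter name are hypothetical.
    """
    path = urlparse(dead_url).path
    segments = [s for s in path.split("/") if s]
    slug = segments[-1] if segments else ""
    # Drop a file extension such as .html if there is one.
    if "." in slug:
        slug = slug.rsplit(".", 1)[0]
    # Hyphens and underscores in the slug become spaces between keywords.
    words = slug.replace("-", " ").replace("_", " ").strip()
    return f"{search_path}?{urlencode({'q': words})}"

print(search_url_for("http://example.com/2009/01/new-years-was-a-blast"))
# /search?q=new+years+was+a+blast
```

The date segments are ignored on purpose here; only the final segment usually carries the title keywords.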

Related

Deprecated domain in google index

We have a deprecated domain, www.deprecateddomain.com. We have a reverse proxy that redirects all requests from this domain to the new one, www.newdomain.com.
The problem is that when you type "deprecateddomain.com" into Google search, the results include a link to www.deprecateddomain.com alongside the results for "newdomain.com". That means there are still such entries in the Google index. Our customer doesn't want to see links to the old site.
It was suggested that we create a fake robots.txt with a Disallow: / directive for www.deprecateddomain.com, plus reverse proxy rules to serve this file from some directory. But after investigating the subject, I started to doubt that it will help. Will it remove the entries for the old domain from the index?
Why not just submit a request in Search Console to remove www.deprecateddomain.com from the index? In my opinion it might help.
Anyway, I'm a novice in this area. Could you advise me on what to do?

Google takes time to remove old or obsolete entries from its rankings, especially for rarely visited or low-value pages. You have no control over it. Google needs to revisit each page to see the redirect you have implemented.
So DO NOT implement a Disallow on the old website, because it will make the problem worse. Bots won't be able to crawl those pages and see the redirect you have implemented, so the pages will stay in the rankings longer.
You must also make sure you implement a proper 301 redirect (i.e. a permanent one, not a temporary one) for all pages of the old website. Otherwise, some pages may stay in the rankings for quite some time.
If some pages are obsolete and should be deleted rather than redirected, return a 404 for them. Google will remove them quickly from its index.

Block google from indexing some pages from site

I have a problem with lots of 404 errors on one site. I figured out that these errors happen because Google is trying to find pages that no longer exist.
Now I need to tell Google not to index those pages again.
I found some solutions on the internet that use a robots.txt file. But this is not a site that I built; I just need to fix those errors. The thing is, those pages are generated. They do not physically exist in that form, so I cannot add anything in the PHP code.
And I am not quite sure how to add them to robots.txt.
When I just write:
User-agent: *
noindex: /objekten/anzeigen/haus_antea/5-0000001575
and hit the test button in Webmaster Tools,
I get this from Googlebot:
Allowed
Detected as a directory; specific files may have different restrictions
And I do not know what that means.
I am new to this kind of thing, so please keep your answer as simple as possible.
Sorry for my bad English.

I think Google will automatically remove pages that return a 404 error from its index, and it will not display them in the results. So you don't need to worry about that.
Just make sure that these pages are not linked from other pages. If they are, Google may try to index them from time to time. In that case you should return a 301 status (Moved Permanently) and redirect to the correct URL. Google will follow the 301 redirects and use the target URL instead.
Robots.txt is only necessary if you want to remove pages that are already in the search results. But I think pages that return a 404 will not be displayed there anyway.
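As for the "Allowed" result in the question: one likely reason is that noindex is not a directive a robots.txt parser acts on when deciding what may be crawled, so the line is simply ignored. The sketch below uses Python's standard-library parser as a stand-in (it is not Google's parser, and the example.com host is made up) to show the difference between that line and a Disallow line.

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt, url, agent="Googlebot"):
    """Parse robots.txt text and report whether the agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# Hypothetical host; the path is the one from the question.
URL = "http://example.com/objekten/anzeigen/haus_antea/5-0000001575"

with_noindex = (
    "User-agent: *\n"
    "noindex: /objekten/anzeigen/haus_antea/5-0000001575\n"
)
with_disallow = (
    "User-agent: *\n"
    "Disallow: /objekten/anzeigen/haus_antea/5-0000001575\n"
)

print(is_allowed(with_noindex, URL))   # True: the unknown line is ignored
print(is_allowed(with_disallow, URL))  # False: crawling is blocked
```

Note that blocking crawling is not the same as removing a page from the index, so this only explains the tester output; it is not a recommendation to add the Disallow line.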

Major site rewrite and SEO with 301 redirects

I am currently working on a relaunch of a high-traffic website with thousands of pages. All URLs are changing, and for SEO reasons I know we should put 301 redirects in place for the old-page-to-new-page mappings. However, the old site has tens of thousands of pages. Do I really put thousands of 301 redirects in place? Isn't that bad for performance? Are there any other suggestions or approaches?
Thanks in advance.

If the URLs follow a regular pattern, then you can set up more general redirects based on the format. For example, if you're redirecting /category/page.php to /othername/page, that is very easy to do with regular expressions in any server.
If there is no pattern, then you could keep a lookup database that you check in order to redirect. It may be a little slower, but performance should not be terrible. You can still list the top 50 or so pages to redirect in your server config.
Note, if you are using Apache then it's strongly recommended to put the redirect rules in your httpd.conf (stored in memory when Apache starts) and not .htaccess files (which are loaded on every page request).
At that scale you probably want to write some custom code and an indexed database table to retrieve the redirect info.
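For what it's worth, a lookup table like that can be very small. Here is a sketch using SQLite from Python; the table name, column names and sample rows are invented, and a real site would do the lookup inside its request handler.

```python
import sqlite3

def build_redirect_table(mappings):
    """Create an in-memory table of old_path -> new_path.

    PRIMARY KEY on old_path gives an index, so each lookup is a
    single indexed read rather than a scan over every rule.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE redirects ("
        "old_path TEXT PRIMARY KEY, new_path TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO redirects VALUES (?, ?)", mappings)
    conn.commit()
    return conn

def lookup_redirect(conn, old_path):
    """Return (301, new_path) for a known old path, else (404, None)."""
    row = conn.execute(
        "SELECT new_path FROM redirects WHERE old_path = ?", (old_path,)
    ).fetchone()
    if row is None:
        return 404, None
    return 301, row[0]

# Made-up sample mappings.
conn = build_redirect_table([
    ("/category/page.php", "/othername/page"),
    ("/about_us.html", "/about"),
])

print(lookup_redirect(conn, "/category/page.php"))  # (301, '/othername/page')
print(lookup_redirect(conn, "/missing.html"))       # (404, None)
```

With the index in place, the cost of a lookup grows very slowly with the number of rows, which is the main difference from a long list of server rules evaluated one by one.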
You're probably going to want to look at the most important pages on the site, perhaps the categories, and start redirecting from there. There's no way you're going to be able to do tens of thousands of individual redirects because, as you mentioned, there will be performance issues with that.

To keep this simple, you can use wildcards and regular expressions in .htaccess.
It'll only work if your website is properly organized into categories.

SEO and hard links with dynamic URLs

With ASP.NET MVC (or using HttpHandlers) you can dynamically generate URLs, like the one in this question, which includes the title.
What happens if the title changes (for example, through editing) and there's a link pointing to the page from another site, or Google's PageRank was calculated for that URL?
I guess it's all lost, right? (The link points nowhere and the calculated PageRank is lost.)
If so, is there a way to avoid it?

I use the same system as is in place here: everything after the number in the URL is ignored by the DB query, and I 301 redirect any other variant to the URL with the current title.
In other words, if the title changed, then it would redirect to the correct place. I do it in PHP rather than htaccess as it's easier to manage more complex ideas.
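A minimal sketch of that scheme, written in Python rather than PHP. The /questions/ prefix, the ID 1234 and the in-memory TITLES dict are placeholders for whatever your routes and database look like.

```python
import re

# Hypothetical store: numeric ID -> current title.
TITLES = {1234: "SEO and hard links with dynamic URLs"}

URL_PATTERN = re.compile(r"^/questions/(\d+)(?:/([^/]*))?/?$")

def slugify(title):
    """Lowercase the title and join runs of letters/digits with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def resolve(path):
    """Look up by ID only; 301 to the canonical URL if the slug differs."""
    match = URL_PATTERN.match(path)
    if not match:
        return 404, None
    title = TITLES.get(int(match.group(1)))
    if title is None:
        return 404, None
    canonical = f"/questions/{match.group(1)}/{slugify(title)}"
    if path != canonical:
        return 301, canonical  # old or made-up title: redirect
    return 200, canonical      # already canonical: serve the page

print(resolve("/questions/1234/some-old-title"))
# (301, '/questions/1234/seo-and-hard-links-with-dynamic-urls')
print(resolve("/questions/1234/seo-and-hard-links-with-dynamic-urls"))
# (200, '/questions/1234/seo-and-hard-links-with-dynamic-urls')
```

Because only the ID is used for the lookup, an old link keeps working after a title edit and ends up on the single canonical URL.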
I think you're generally best off having the server send a permanent redirect to the new location, if possible.
That way any rank which is gained from third-party links should, in theory, be transferred to the new location. I'm not convinced this happens in practice, but it should.
The way Stack Overflow seems to be implemented, everything after the question number is superfluous as far as linking to the question goes. For instance:
SEO and hard links with dynamic URLs
links to this question, despite the fact that I just made up the 'question title' part out of thin air. So the link will not point to nowhere and the PageRank is not lost (though it may be split between the two URLs, depending on whether or not Google can canonicalize them into a single URL).
Have your app redirect the old URL via a 301 redirect. This will tell Google to transfer the PageRank to the new URL.
If a document is moved to a different URL, the server should be configured to return an HTTP status code of 301 (Moved Permanently) for the old URL to tell the client where the document has been moved to. With Apache, this can be done using mod_rewrite and RewriteRule.
The best thing to help Google in this instance is to return a permanent redirect on the old URL to the new one.
I'm not an ASP.NET hacker - so I can't recommend the best way to implement this - but Googling the topic looks fairly productive :-)
Yes, all SEO is lost upon a URL change: it forks to an entirely new record. The way to handle that is to leave a 301 redirect from the old title to the new one; some search engines (read: Google) are smart enough to pick that up.
EDIT: Fixed to 301 redirect!

Google Page Rank - New Domain / Link Structure Migration

i've been tasked with re-organizing a pure HTML site into a CMS. if all goes well, the new site will eventually become the main URL, and the old domain will be phased out. the old domain has a decent enough page rank, and the company wishes to mitigate any loss of page rank for that. in looking over the options available, i've discovered a few things:
it's better to use a 301 redirect when you're ready to make the switch (source).
the current site does not have a sitemap, so adding one and submitting it may help their future page rank.
i'll need to suggest to them that they contact people currently linking to them to update their links.
the process for regaining an old page rank takes a while, so plan on rebuilding links while we see if the new site is flexible enough to warrant switching over completely.
my question is: as a result of a move to a CMS-driven site, the links to various pages will change to accommodate the new structure. will this be an issue for trying to maintain (or improve) the current page rank? what sort of methods are available to mitigate the issue of changing individual page URLs? is there a preferable method beyond mapping individual pages to their new locations with 301 redirects? (the site has literally hundreds of pages, ugh...)
ex.
http://domain.com/Messy_HTML_page_with_little_categorization.html ->
http://newdomain.com/nice/structured/pages.php
i realize this isn't strictly a programming question, however i felt the information could be useful to developers who are tasked with handling this sort of thing in addition to development of the site.
edit: additions in italics

If you really truly want to ensure that page rank is not lost, you will want to replace the old content with something that performs a proper 301 redirect to the new location. With a 301 redirect the search spiders will know that the content has moved, and the page rank typically carries over. It also helps external links.
However, the downside is that after a certain period of time you just have to get rid of the old domains.
You can make a handler for HTML files and map the old pages to the new structure with a 301 redirect.
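As an illustration only, here is what such a handler could look like as a small Python WSGI app. The question doesn't say what the new stack is, so treat this as a sketch of the idea rather than the way to do it on your platform; the single mapping is the example pair from the question, and the function names are made up.

```python
from wsgiref.util import setup_testing_defaults

# Old path on the old domain -> full new URL (example pair from the question).
OLD_TO_NEW = {
    "/Messy_HTML_page_with_little_categorization.html":
        "http://newdomain.com/nice/structured/pages.php",
}

def legacy_redirect_app(environ, start_response):
    """Answer known old .html paths with a 301, everything else with a 404."""
    target = OLD_TO_NEW.get(environ.get("PATH_INFO", ""))
    if target is not None:
        start_response("301 Moved Permanently",
                       [("Location", target), ("Content-Length", "0")])
        return [b""]
    body = b"Not Found"
    start_response("404 Not Found",
                   [("Content-Type", "text/plain"),
                    ("Content-Length", str(len(body)))])
    return [body]

def simulate(path):
    """Call the app with a fake request and return (status, Location)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    legacy_redirect_app(environ, start_response)
    return captured["status"], captured["headers"].get("Location")

print(simulate("/Messy_HTML_page_with_little_categorization.html"))
# ('301 Moved Permanently', 'http://newdomain.com/nice/structured/pages.php')
print(simulate("/unknown.html"))
# ('404 Not Found', None)
```

With a few hundred pages, the mapping can live in a plain dictionary or a small file that the handler loads at startup.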