How to deal with missing items the SEO way?

I am working on a public-facing web site which serves up articles for people to read. After some time, articles become stale and we remove them from the site. My question is this: what is the best way to handle the situation when a search engine visits a URL corresponding to a removed article? Should the app respond with a permanent redirect (301 Moved Permanently) to an "article not found" page, or is there a better way to handle this?
Edit
These articles are actually not removed, but they are "unpublished" - and they may return to the "published" state eventually.

If the article is removed you should respond with 410 Gone. Your error page can still have some useful info on it as long as the response code is correct. This indicates that the page has been intentionally removed and is not just "not found" (as would happen with a bad URL).
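A minimal sketch of that status-code decision (the function name is illustrative, not any real framework's API). Given the asker's edit that articles are only "unpublished" and may return, a 404 may be the safer choice for those:

```python
HTTP_GONE = 410       # intentionally and permanently removed
HTTP_NOT_FOUND = 404  # not available now, but may exist (again) later

def status_for_missing_article(permanently_removed: bool) -> int:
    """Pick the status for a missing article: 410 tells crawlers to drop
    the URL for good; 404 leaves the door open for republishing."""
    return HTTP_GONE if permanently_removed else HTTP_NOT_FOUND
```

Either way, the response body can still be a helpful "article removed" page; only the status code matters to crawlers.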

You might consider keeping the content up, with some sort of indicator to the person reading it that the content is stale. Then you could also include more relevant content on the page or links to more relevant content.
This might not be appropriate for your situation, or could be more work than it's worth, but it may be a good way not to waste potential traffic.
I feel like 410 Gone would be the appropriate response; however, you'd basically be telling the search engine "we don't have this content anymore, so stop linking here" - which isn't advantageous to your SEO strategy.

Well, if you want to be proper about it, it should redirect not to an "article not found" page but to an "article removed" page, because "article not found" suggests it should be a 404.

My gut tells me that you should probably have an article removed page, but in practice many sites will simply do a 301 redirect to the home page.
I think the idea there is that any "link juice" from the old article will then be transferred to the home page rather than a generic "article removed" page. I get the feeling though that search engines might not look too kindly on that practice.

Related

Is rel=self the correct rel tag to use for forum permalinks?

I have been building a forum from scratch with my friends just for fun, and we're starting to see bots and scrapers go by. The problem we're having is that you can load a page /post/1 with four replies, and each reply includes a little permalink to itself /reply/1#reply-1. If I am on /post/1 and navigate to /reply/1, I'll end up right back where I started, just with the anchor to the reply. But! Scrapers have no idea this is the case, so they're opening every /post link and then following every /reply link, and it's causing performance issues, so I've been looking around SEO sites to try to fix it.
I've started using rel=canonical on the /reply page, to tell the bots they're all the same, but as far as I can tell that doesn't help me until the bot has already loaded the page, and thus I wind up with tons of traffic. Would it be correct to change my
<a href="/reply/1#reply-1">Permalink</a>
tags to
<a href="/reply/1#reply-1" rel="self">Permalink</a>
since they should be the same content? Or would this be misusing rel="self" and there's another, better rel tag I should be using instead?
The self link type is not defined for HTML (but for Atom), so it can’t be used in HTML5 documents.
The canonical link type is appropriate for your case (if you make sure that it always points to the correct page, in case the thread is paginated), but it doesn’t prevent bots from crawling the URLs.
If you want to prevent crawling, no link type will help (not even the nofollow link type, but it’s not appropriate for your case anyway). You’d have to use robots.txt, e.g.:
User-agent: *
Disallow: /reply/
That said, you might want to consider changing the permalink design. I don't think such an architecture is useful, for your users or for bots. It's good practice to have exactly one URL per document, and if users want to link to a certain post, there is no reason to require a new page load when it's actually the same document.
So I would either use the "canonical" URL and add a fragment component (/post/1#reply-1, or what might make more sense: /threads/1#post-1), or (if you think it can be useful for your users) I would create a page that only contains the reply (with a link back to the full thread).
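A sketch of the fragment-style permalink builder suggested above (the /threads/…#post-… pattern is an assumption about the forum's routes, not the asker's actual code):

```python
def permalink(thread_id: int, post_number: int) -> str:
    """Build a one-URL-per-document permalink: the thread page itself
    plus a fragment pointing at the individual post's anchor."""
    return f"/threads/{thread_id}#post-{post_number}"
```

Since the fragment never reaches the server, crawlers see only the one canonical thread URL, while readers still land on the right post.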

What should be located at the homepage of a REST API?

I'm currently in the process of writing a REST API, and this question always seems to pop up.
I've always just added a description, quick links to docs, server time etc, but see now (after looking around a bit) that a simple redirect to the API docs would be even better.
My question is what would be the accepted norm to have as the root - '/' - "homepage" of your API?
I've been looking at a few implementations:
Facebook: Just gives an error of "Unsupported get request.";
Twitter: Shows an actual 404 page;
StackOverflow: Redirects to a quick "usage" page.
After looking at those it's clear everyone is doing it differently.
In the bigger picture this is of little significance, but it would be interesting to see what the "RESTful" way of doing it (if there is one) might be.
Others have had the same question, and as you discovered yourself, everyone is doing it their own way. There is some movement toward standardizing this, so see if you find this draft useful:
Home Documents for HTTP APIs aka JSON Home.
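For a rough idea, a home document following that draft might look something like this (the link-relation name and paths here are made up for illustration):

```json
{
  "resources": {
    "tag:example.org,2016:articles": {
      "href": "/articles",
      "hints": {
        "allow": ["GET"]
      }
    }
  }
}
```

Clients then discover the API's actual resources from the root instead of hard-coding paths.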
I've given this much thought, and right now I either return a 404 page, a health status page, a dummy page, or a redirect to another page, most likely one within the organization.
An API homepage isn't something everyone should be looking at, and believe me, it will be found. There are plenty of people like me who love to open the browser's inspector and see how a website is performing.

SEO and 404 error redirection

Recently I removed some products and categories from my Magento store, and that generated many 404 errors, as the pages were in the search index and don't exist anymore.
I was thinking about developing/using a module that takes requests that would otherwise return a 404 and uses keywords from the request URL to build a search query on the website, so customers don't get stopped by a dead link.
But the question is:
Will that kill my SEO?
How does Google, for instance, cope with 404 error suppression?
Has anyone else encountered this problem and tried something like this?
Since the operation will take quite some time, I would like some feedback before going into this road.
As of now, I only know that redirecting 404 errors to the homepage or another page is bad because it keeps dead links alive - but will redirecting based on some criteria have the same "zombifying" effect?
In their SEO guide, Google recommends building a good custom 404 page for your case (do not forget to return the 404 status code).
Quoting: "A good custom 404 page will help people find the information they're looking for, as well as providing other helpful content and encouraging them to explore your site further."
Google's recommendations about 404 pages are available here.
Also do not forget to check their starter SEO guide in the "Best Practices" chapter.
I would just try to follow the recommendations as close as possible.
Good luck!
Since the solution is already mentioned here, I would just like to add a few words:
you don't need to worry about the number of 404 pages shown in Webmaster Tools, because it is natural to create and delete web pages. All you need to do is set up a proper custom 404 page, so that when users land on a page that is no longer available, they can easily navigate to the similar content they were looking for on your website.
This improves the user experience, and Google favors sites that serve their visitors well.

How Can I Deal With Those Dead Links After Revamping My Web Site?

A couple of months ago, we revamped our web site. We adopted a totally new site structure, specifically merging several pages into one. Everything looks charming.
However, there are lots of dead links which produce a large number of 404 errors.
So what can I do about it? If I leave it alone, could it bite back someday, say by eating up my PageRank?
One basic option is a 301 redirect; however, that is almost impossible considering the number of them.
So is there any workaround? Thanks for your consideration!
301 is an excellent idea.
Consider that you can take advantage of global configurations to map a group of pages. You don't necessarily need to write one redirect for every 404.
For example, if you removed the http://example.org/foo folder, with Apache you can write the following configuration
RedirectMatch 301 ^/foo/(.*)$ http://example.org/
to catch all 404 generated from the removed folder.
Also, consider redirecting selectively. You can use Google Webmaster Tools to check which 404 URIs are receiving the highest number of inbound links and create a redirect configuration only for those.
Chances are the number of redirection rules you need to create will decrease drastically.
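If the new site keeps a parallel structure, the captured part of the old path can also be reused in the target rather than dropped (assuming a /bar/ folder really mirrors the removed /foo/ one):

```apache
RedirectMatch 301 ^/foo/(.*)$ http://example.org/bar/$1
```

This preserves deep links instead of funneling every old URL to the homepage.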
301 is definitely the correct route to go down to preserve your page rank.
Alternatively, you could catch 404 errors and redirect either to a "This content has moved" type page, or your home page. If you do this I would still recommend cherry picking busy pages and important content and setting up 301s for these - then you can preserve PR on your most important content, and deal gracefully with the rest of the dead links...
I agree with the other posts - using mod_rewrite you can remap URLs and return 301s. Note - it's possible to call an external program or database with mod_rewrite - so there's a lot you can do there.
If your new and old site don't follow any remapable pattern, then I suggest you make your 404 page as useful as possible. Google has a widget which will suggest the page the user is probably looking for. This works well once Google has spidered your new site.
Along with the other 301 suggestions, you could also split the requested URL string into a search string and route it to your default search page (if you have one), passing those keywords automatically to the search.
For example, if someone tries to visit http://example.com/2009/01/new-years-was-a-blast, this would route to your search page and automatically search for "new years was a blast", returning the best result for those keywords and, hopefully, your most relevant article.
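A minimal sketch of that slug-to-query step in Python (the function name and URL scheme are assumptions; a real store would want extra cleanup, such as stripping IDs and file extensions):

```python
from urllib.parse import urlparse

def slug_to_query(url: str) -> str:
    """Turn a dead URL into a search query: keep the last path segment
    and split it on hyphens."""
    last_segment = urlparse(url).path.rstrip("/").split("/")[-1]
    return " ".join(last_segment.split("-"))
```

The 404 handler could then render search results for that query while still returning a 404 status, which avoids the "zombifying" redirect the asker worries about.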

About Isolated Page In My Web Site

I produced a page that I have no intention of letting search engines find and crawl.
The advisable solution is robots.txt, but it is not applicable in my situation.
So I isolated this page from my site by removing all links from other pages to it, and I never put its URL on external sites.
Logically, then, it is impossible for search engines to find this page. And that means that no matter how many outbound links are nested in this page, the PR of the site is safe.
Am I right?
Thank you very much!
Hope this question is programming related!
No, there's still a chance your page can be found by search engine crawlers. For example, it's been speculated that data from the Google Toolbar can be used to alert Googlebot to the presence of a page. And there's still a chance others might link to your page from external sites if the URL becomes known.
Your best bet is to add a robots meta tag to your page; this will prevent it from being indexed and prevent crawlers from following any links:
<meta name="robots" content="noindex,nofollow" />
If it is on the internet and not restricted, it will be found. It may make it harder to find, but it is still possible a crawler may happen across it.
What is the link so I can check? ;)
If you have outbound links on this "isolated" page, then your page will probably show up as a referrer in the logs of the linked-to pages. Depending on how closely the owners of those pages track their stats, they may find your page.
I've seen httpd log files turn up in Google searches. This in turn may lead others to find your page, including crawlers and other robots.
The easiest solution might be to password protect the page?
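Under Apache, protecting just that one page might look like this (the file name, realm, and path are placeholders; the .htpasswd file is created with the htpasswd utility):

```apache
<Files "isolated-page.html">
    AuthType Basic
    AuthName "Private"
    AuthUserFile /path/to/.htpasswd
    Require valid-user
</Files>
```

Unlike link isolation or a robots meta tag, this actually denies access rather than relying on crawlers behaving well.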