How to restrict search engines from indexing my MediaWiki site? - seo

Is there a foolproof way to restrict your content from being indexed by the major search engines?
Thanks
Prady

One possible way is a robots.txt file:
User-Agent: *
Disallow: /
Here is a blog post discussing other techniques, including meta tags.
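For example, beyond robots.txt there is the per-page robots meta tag, which you can add to each page's <head> (a generic illustration, not taken from that post):
<meta name="robots" content="noindex, nofollow" />
If I remember correctly, MediaWiki can also apply a policy like this site-wide via a setting in LocalSettings.php, along these lines (check the MediaWiki documentation for the exact variable name):
$wgDefaultRobotPolicy = 'noindex,nofollow';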

Most search engines follow robots.txt. I've heard Yahoo! Slurp does not.
You could scan the user agent string for well-known bots, such as Google, Yahoo, Bing, the Internet Archive, etc., and produce blank output for them. Normally you would be penalised for serving Google alternate content, but since you are blocking them anyway, it won't be a problem.
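A rough sketch of that idea, written here as Python WSGI middleware purely for illustration (MediaWiki itself is PHP, and the bot list below is an example, not an exhaustive one):
# Hypothetical middleware: serve a blank page to well-known crawlers,
# pass everyone else through to the real application untouched.
KNOWN_BOTS = ("googlebot", "slurp", "bingbot", "ia_archiver")

def block_bots(app):
    def middleware(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "").lower()
        if any(bot in ua for bot in KNOWN_BOTS):
            start_response("200 OK", [("Content-Type", "text/html")])
            return [b""]  # blank output for the crawler
        return app(environ, start_response)
    return middleware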
The most important thing to remember is that whatever you publish publicly can and will be accessed by bots such as search engine spiders.
Don't forget that bots have a nasty habit of being where you don't want them to be (which, mixed with bad coding practices, can be quite disastrous).

Foolproof? I think not. You can restrict IPs, use robots.txt and meta tags, but if a search engine really, really wants your content indexed, it will find a way.

Related

robots.txt disallow property

I have disallowed certain pages using robots.txt for all crawlers. Do I have to add meta tags to those pages as well, or will web crawlers just skip them so there is no need to do so?
If the crawler you want to limit obeys robots.txt then you are fine, but if it doesn't then you are probably screwed either way, because chances are it will ignore the meta tags too.
All major search engine crawlers do obey it, however, so you are probably fine.
You are good to go. All of the big search engines (Google, really) obey any entries you make in robots.txt. http://www.robotstxt.org/robotstxt.html
Also, be aware that the robots.txt file itself is viewable, so don't use this as a security measure. http://www.cre8asiteforums.com/forums/index.php?showtopic=55546
Well-written bots will ignore those pages (provided your robots.txt syntax is correct).
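For reference, a robots.txt that blocks only certain paths rather than the whole site looks something like this (the paths are placeholders for your own):
User-agent: *
Disallow: /private/
Disallow: /drafts/old-page.html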

Is there a way that is more efficient than a sitemap to add / force a recrawl of / remove your website's index entries in Google?

Pretty much, that is the question. Is there a way that is more efficient than the standard sitemap.xml to add, force a recrawl of, or remove (i.e. manage) your website's index entries in Google?
I remember reading an article a few years ago by a blogger who said that when he published news on his website, the URL of the news item would appear in Google's search results almost immediately. I think he was referring to something special, but I don't remember exactly what... some automatic re-crawling system offered by Google themselves? However, I'm not sure about it. So I ask: am I fooling myself, and is there NO OTHER way to manage index content besides sitemap.xml? I just need to be sure about this.
Thank you.
I don't think you will find that magical "silver bullet" answer you're looking for, but here's some additional information and tips that may help:
Depth of crawl and rate of crawl are directly influenced by PageRank (one of the few things it does influence). So increasing the number and quality of back-links to your site's homepage and internal pages will assist you.
QDF - this Google algorithm factor, "Query Deserves Freshness", does have a real impact and is one of the core reasons behind the Google Caffeine infrastructure project to allow much faster finding of fresh content. This is one of the main reasons that blogs and sites like SE do well - because the content is "fresh" and matches the query.
XML sitemaps do help with indexation, but they won't result in better ranking. Use them to assist search bots in finding content that is deep in your architecture (a minimal example appears after these tips).
Pinging services that monitor site changes, such as Ping-O-Matic, especially from blogs, can really assist in pushing notification of your new content; this can also ensure the search engines become aware of it immediately.
Crawl Budget - be mindful of wasting a search engine's time on parts of your site that don't change or don't deserve a place in the index - using robots.txt and the robots meta tags can herd the search bots to different parts of your site (use with caution so as to not remove high value content).
Many of these topics are covered online, but there are other intrinsic things like navigational structure, internal linking, site architecture etc that also contribute just as much as any "trick" or "device".
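For reference, a minimal sitemap.xml following the sitemaps.org protocol looks like this (the URL and values are placeholders):
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/some-page.html</loc>
    <lastmod>2010-06-01</lastmod>
    <changefreq>weekly</changefreq>
  </url>
</urlset>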
Getting many links from good sites to your website will make the Google "spiders" reach your site faster.
Also, links from social sites like Twitter can help the crawlers visit your site (although Twitter links do not pass "link juice", the spiders still follow them).
One last thing: update your content regularly and think of content as "Google spider food". If the spiders come to your site and don't find new food, they won't come back again soon; if there is new food each time they come, they will come a lot. Article directories, for example, get indexed several times a day.

Do lots of links with keywords affect SEO?

As you know, some websites have lots of keyword-rich links on their front page. Sometimes they use tag clouds, and in other cases they even link to the "most popular" searches.
Do you think that could be a good idea for SEO?
The more links you have the better, but remember that search engine developers are not foolish and will try to favour websites with genuinely interesting content first.
So invest in your content, then in links to your content.

What are the cons of not using <meta name="keywords" content="some, words" />?

If I only use <meta name="description" content="lorem ipsum." />?
I heard search engines do not give importance to keywords:
<meta name="keywords" content="some, words" />
So is it OK not to use the keywords tag?
I have been looking for evidence of meta keywords support for years and have never found any documentation that they are supported by anyone. Never. Most of the recommendations supporting them are just recycled from everyone else.
Some people say that they may be used in the future... well, I'll get to that in a moment. Other people say that Keywords can't hurt so just include them anyway. But they are incorrect.
Meta keywords are great for letting your competitors know your SEO secrets. You wouldn't tell your competitors this information directly so, don't use them. These are the only people that are likely to look at your Meta Keywords.
Since Google set the benchmark for quality software, search engines must perform to very high standards to be successful. It's too easy for consumers to switch to Google, which is trusted and reliable.
Consider this:
To build a quality search engine you must, first of all, acquire high-quality information for indexing. This is the foundation of your product.
You must also protect your search index from being manipulated by third parties for their own benefit. Your users will probably not have the same interests as a third party who can modify your search engine's behaviour.
Meta keywords are not derived from the content of the web page through any process that can be considered reliable; they are not tied to the page's actual content in any way and can be manipulated without consequence. This makes meta keywords a low-quality source of information. They are what programmers call "tainted data": data that is not to be trusted.
If you build your Search Engine to index low quality information, your Search Engine won't return useful search results. I propose that it would be impossible to build a search engine today that uses meta keywords that would work well at all.
It's important to stop using meta keywords and to put the meta keywords myth to rest. They just waste everybody's time and are counterproductive. Remember, it's not good practice to add features to your website that don't work. The time you spend on something that doesn't work could be better spent on something that does. Or maybe go look out the window and admire the sky. You'll be better off.
I heard search engines do not give importance to keywords.
Google doesn't use the keywords meta tag for web search (Source).
However, Yahoo (Source), Bing (Source), and other search engines may still be using them with various degrees of importance. They may also be used by internal search engines.
So is it OK not to use the keywords tag?
"... I hope this clarifies that the keywords meta tag is not something that you need to worry about, or at least not in Google." - Mutt Cutts (Google doesn’t use the keywords meta tag in web search)
I have heard the same. However, search engine algorithms are not static and may change over time. Furthermore, not all search engines treat the keywords tag equally. I think you should include it if possible.
Google analyzes your page content and gives higher priority to other parts, but I don't know of any reason not to include the meta keywords tag.

How to Develop a Successful Sitemap

I have been browsing around on the internet researching effective sitemap web pages. I have encountered these two sitemaps and am questioning their effectiveness.
http://www.webanswers.com/sitemap/
http://www.answerbag.com/sitemap/
Are these sitemaps effective?
Jeff Atwood (one of the guys who made this site) wrote a great article on the importance of sitemaps:
I'm a little aggravated that we have to set up this special file for the Googlebot to do its job properly; it seems to me that web crawlers should be able to spider down our simple paging URL scheme without me giving them an explicit assist.
The good news is that since we set up our sitemaps.xml, every question on Stack Overflow is eminently findable. But when 50% of your traffic comes from one source, perhaps it's best not to ask these kinds of questions.
So yeah, effective for people, or effective for Google?
I would have thought an HTML sitemap should be useful to a human, whereas these two aren't. If you're trying to target a search engine, then a sitemap.xml file that conforms to sitemaps.org would be a better approach. While the HTML approach would work, it's easier to generate an XML file and have your robots.txt file point at it.
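Pointing robots.txt at the XML sitemap is a one-line addition (the URL is a placeholder for your own):
Sitemap: http://www.example.com/sitemap.xml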