How to make sure a link in a spam post won't get any benefit in search-engine results - seo

I have a wiki website. Many spammers are using it for SEO: they add spam posts containing a link to an external website. Is there a way to make sure they won't benefit from it? My thought is to add a text file like robots.txt to tell the search engine "don't consider external website links for search results". I don't want to prevent spammers from creating posts for the sake of advertisements :)

Add rel="nofollow" to the links when you output them on your site.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=96569
They will still spam your site with links, so you'll need to monitor as well.
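For illustration, a spam-posted link rendered by the wiki might end up looking like this (the URL is just a placeholder):
<a href="http://spammer-example.com/" rel="nofollow">spammer's anchor text</a>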

Related

How to ignore some links in my website?

I am working on a small PHP script and I have some links like this:
*-phones-*.html
The * parts are variables. I want to disallow Google from indexing this kind of link using robots.txt; is that possible?
You're not disallowing anything. robots.txt is just a set of guidelines for web crawlers, which can choose to follow them or not.
Rude crawlers should of course be IP-banned, but you can't avoid a web crawler coming across that page. Anyway, you can add it to your robots.txt and Google's web crawler might obey.
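If you do want to try it, Google's crawler (and a few others) understands simple wildcards in robots.txt, so a rule along these lines should cover URLs of that shape - a sketch only, adjust the path prefix to your site:
User-agent: Googlebot
Disallow: /*-phones-*.html$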

External links: when to use rel="external" or rel="nofollow"?

On most of my websites I have a lot of external links, both to my other sites and to other external sites.
I need to know: when is it better to use rel="nofollow" or rel="external" on a website?
You may use external for every link to a different website, no matter if it’s yours or not, if it’s on the same host or not.
You may use nofollow for every link that you don’t endorse (for example: search engines shouldn’t assume that it’s a relevant link and should not give any ranking credit to this link).
You may use both values for the same link:
<a href="http://example.com/" rel="external nofollow">Foobar</a>
Note that external doesn’t convey that the link should be opened in a new window.
Note that search engine bots (that support nofollow) might still follow a nofollow link (it doesn't forbid following it). FWIW, there is also the nofollow value for the meta-robots keyword (which may mean the same … or not, depending on which definition you follow).
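For reference, that page-level meta-robots variant looks like this; unlike the link attribute, it applies to every link on the page:
<meta name="robots" content="nofollow">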
The nofollow attribute tells search engine bots not to follow the link.
If you have rel="nofollow" then the link juice stops.
rel="external" doesn't act like nofollow; it is still a followed ("dofollow") link.
For rel="external" it means the file is on a different site to the current one.
rel="external" is the XHTML valid version that informs search engine spiders that the link is external.
However, using this does not open the link in a new window. target="_blank" and target="_new" do this, but are not XHTML valid. I hope that helps.
I advise you to use nofollow links for the following content:
Links in Comments or on Forums - Anything that has user-generated content is likely to be a source of spam. Even if you carefully moderate, things will slip through (see the sketch after this list).
Advertisements & Sponsored Links - Any links that are meant to be advertisements or are part of a sponsorship arrangement must be nofollowed.
Paid Links - If you charge in any way for a link (directory submission, quality assessment, reviews, etc.), nofollow the outbound links.
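As a minimal sketch, a user-submitted link in a comment might be emitted like this (the URL is just a placeholder coming from user input):
<a href="http://commenters-site-example.com/" rel="nofollow">commenter's site</a>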
If you have an external link to your own site then use
<a href="http://www.your-other-site.com/" rel="external">Your Link</a>
If you have an external link to someone else's site that you don't trust, then you can combine both and use
<a href="http://www.untrusted-example.com/" rel="external nofollow">Other Domain Link</a>
If you have an external link to someone else's site and you consider it trustworthy, then use
<a href="http://www.trusted-example.com/" rel="external">External Useful Link</a>
It depends on what you mean by "better". Those are two completely different attributes.
rel="nofollow" tells search engine crawlers not to pass ranking credit through this link (you probably don't want this for your other websites, but you will use it for other people's websites). Documentation: rel=nofollow - https://support.google.com/webmasters/answer/96569?hl=en
rel="external" indicates that the link is not part of the current website; combined with a little JavaScript it is used as a valid XHTML alternative to target="_blank" for opening the link in a new window (this doesn't work in older IE). Here you can learn how to use it: http://www.copterlabs.com/blog/easily-create-external-links-without-the-target-attribute/

Sub-domain vs Sub-directory to block from crawlers

I've googled a lot and read a lot of articles, but got mixed reactions.
I'm a little confused about which is the better option if I want a certain section of my site to be blocked from being indexed by search engines. Basically I make a lot of updates to my site and also design for clients, and I don't want all the "test data" that I upload for previews to be indexed, to avoid the duplicate-content issue.
Should I use a sub-domain and block the whole sub-domain
or
Create a sub-directory and block it using robots.txt.
I'm new to web design and was a little insecure about using sub-domains (I read somewhere that it's a slightly advanced procedure and even a tiny mistake could have big consequences; moreover, Matt Cutts has also mentioned something similar (source)):
"I’d recommend using sub directories until you start to feel pretty
confident with the architecture of your site. At that point, you’ll be
better equipped to make the right decision for your own site."
But on the other hand I'm hesitant about using robots.txt as well, since anyone can access the file.
What are the pros and cons of both?
For now I am under the impression that Google treats both similarly and it would be best to go for a sub-directory with robots.txt, but I'd like a second opinion before "taking the plunge".
Either you ask bots not to index your content (→ robots.txt) or you lock everyone out (→ password protection).
For this decision it's not relevant whether you use a separate subdomain or a folder. You can use robots.txt or password protection for both. Note that the robots.txt always has to be put in the document root.
Using robots.txt gives no guarantee; it's only a polite request. Polite bots will honor it, others won't. Human users will still be able to visit your "disallowed" pages. Even those bots that honor your robots.txt (e.g. Google) may still link to your "disallowed" content in their search results (they won't index the content, though).
Using a login mechanism protects your pages from all bots and visitors.
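To make the two options concrete, here is a sketch assuming the test material lives under /test/ on an Apache server (all paths and names are placeholders):
# robots.txt in the document root - a polite request only
User-agent: *
Disallow: /test/

# .htaccess inside /test/ - actually locks out bots and visitors
AuthType Basic
AuthName "Work in progress"
AuthUserFile /path/to/.htpasswd
Require valid-user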

How to tell search engines NOT to look at this specific link?

Suppose I have a link in the page, "My Messages", which on click will display an alert message "You must log in to access my messages".
Maybe it's better to just not display this link when the user is not logged in, but I want "My Messages" to be visible even if the user is not logged in.
I think this link is user-friendly, but search engines will get redirected to the login page, which I think is... bad for SEO? Or is it fine?
I thought of keeping "My Messages" displayed as normal text (not as a link) and then wrapping it with a link tag using JavaScript/jQuery. Is this solution good or bad? Any other ideas, please? Thank you.
Try to create a robots.txt file and write:
User-agent: *
Disallow: /mymessages
This will keep SEO bots out of that folder.
Use a robots.txt file to tell search engines which pages they should not index.
Using nofollow to block access to a page is erroneous - this is not what nofollow is for. This attribute was designed to allow you to place a link in a page without conferring any weight or endorsement on the link. In other words, it's not a link that search engines should regard as significant for page-ranking algorithms. It does not mean "do not index this page" - just "don't follow this particular link to that page".
Here's what Google has to say about nofollow:
...However, the target pages may still appear in our index if other sites link to them without using nofollow or if the URLs are submitted to Google in a Sitemap. Also, it's important to note that other search engines may handle nofollow in slightly different ways.
One way of keeping the URL from affecting your rank is setting the rel attribute of your link:
<a href="/mymessages" rel="nofollow">My Messages</a>
Another option is robots.txt; that way you can disallow the bots from the URL entirely.
You might want to use robots.txt to exclude /mymessages. This will also prevent engines which have already visited /mymessages from visiting it again.
Alternatively, add the following to the top of the /mymessages script:
<meta name="robots" content="noindex" />
If you want to tell search engines not to follow a particular link, then use rel="nofollow".
It is a way to tell search engines and bots: don't follow this link.
Google will then not crawl that link and will not transfer PageRank or anchor text across it.

Is there a way to prevent Googlebot from indexing certain parts of a page?

Is it possible to fine-tune directives to Google to such an extent that it will ignore part of a page, yet still index the rest?
There are a couple of different issues we've come across which would be helped by this, such as:
RSS feed/news ticker-type text on a page displaying content from an external source
users entering contact details (phone numbers etc.) who want them visible on the site but would rather they not be googleable
I'm aware that both of the above can be addressed via other techniques (such as writing the content with JavaScript), but am wondering if anyone knows if there's a cleaner option already available from Google?
I've been doing some digging on this and came across mentions of googleon and googleoff tags, but these seem to be exclusive to Google Search Appliances.
Does anyone know if there's a similar set of tags to which Googlebot will adhere?
Edit: Just to clarify, I don't want to go down the dangerous route of cloaking/serving up different content to Google, which is why I'm looking to see if there's a "legit" way of achieving what I'd like to do here.
What you're asking for can't really be done; Google either takes the entire page or none of it.
You could do some sneaky tricks, though, like inserting the part of the page you don't want indexed in an iframe and using robots.txt to ask Google not to index that iframe's source page.
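A sketch of that workaround, assuming the content to hide is moved to its own page under /noindex/ (names are placeholders):
<!-- in the main page: pull the unwanted text in via an iframe -->
<iframe src="/noindex/ticker.html" title="News ticker"></iframe>

# robots.txt
User-agent: *
Disallow: /noindex/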
In short, NO - unless you use cloaking, which is discouraged by Google.
Please check out the official documentation here:
http://code.google.com/apis/searchappliance/documentation/46/admin_crawl/Preparing.html
Go to the section "Excluding Unwanted Text from the Index":
<!--googleoff: index-->
here will be skipped
<!--googleon: index-->
I found a useful resource for marking certain duplicate content so that it is not indexed by the search engine:
<p>This is normal (X)HTML content that will be indexed by Google.</p>
<!--googleoff: index-->
<p>This (X)HTML content will NOT be indexed by Google.</p>
<!--googleon: index-->
At your server, detect the search bot by IP using PHP or ASP. Then feed the IP addresses that fall into that list a version of the page you wish to be indexed. In that search-engine-friendly version of your page, use the canonical link tag to tell the search engine which version of the page should be treated as the canonical one.
This way, the page carrying the content you don't want indexed is known only by its address, while only the content you wish to be indexed actually gets indexed. This method will not get you blocked by the search engines and is completely safe.
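For what it's worth, the canonical link tag referred to above goes in the page's <head> and looks like this (the URL is just an example):
<link rel="canonical" href="http://www.example.com/preferred-page.html">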
Yes, you can definitely stop Google from indexing some parts of your website by creating a custom robots.txt and listing the portions you don't want indexed, such as wp-admin or a particular post or page. Before creating it, check your site's existing robots.txt, for example at www.yoursite.com/robots.txt.
All search engines either index or ignore the entire page. The only possible way to implement what you want is to:
(a) have two different versions of the same page
(b) detect the browser used
(c) If it's a search engine, serve the second version of your page.
This link might prove helpful.
There are meta tags for bots, and there's also robots.txt, with which you can restrict access to certain directories.
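For completeness, a sketch of both mechanisms (the directory name is a placeholder):
<meta name="robots" content="noindex, nofollow">

# robots.txt
User-agent: *
Disallow: /private-directory/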