Suppose I have a link "My Messages" on a page which, when clicked by a user who is not logged in, displays an alert: "You must login to access my messages".
Maybe it's better to just hide this link when the user is not logged in, but I want "My Messages" to be visible even then.
I think this link is user-friendly, but search engines following it will get redirected to the login page, which I suspect is bad for SEO. Or is it fine?
I thought of displaying My Messages as plain text (not as a link) and then wrapping it in a link tag with JavaScript/jQuery. Is this solution good or bad? Any other ideas? Thank you.
Try creating a robots.txt file with these lines:
User-agent: *
Disallow: /mymessages
This will keep search engine crawlers out of that folder.
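To sanity-check rules like these before deploying them, Python's standard-library `urllib.robotparser` can evaluate a robots.txt body against sample paths without fetching anything. A minimal sketch, assuming the site lives at the placeholder origin `https://www.example.com`; `build_parser` and `is_allowed` are hypothetical helper names. This parser uses simple prefix matching and does not implement Google's `*`/`$` wildcard extensions.

```python
from urllib.robotparser import RobotFileParser

# The rules from the answer above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /mymessages
"""

# Placeholder origin; only the path part of the URL is used for matching.
SITE = "https://www.example.com"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse a robots.txt body directly, without a network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, path: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may crawl `path` under the parsed rules."""
    return parser.can_fetch(agent, SITE + path)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for path in ["/mymessages", "/mymessages/inbox", "/mymessages-old", "/about"]:
        print(path, is_allowed(rules, path))
```

Because `Disallow: /mymessages` is a prefix rule, it also blocks paths such as `/mymessages-old`; ending the rule with a slash (`/mymessages/`) limits it to the folder's contents.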
Use a robots.txt file to tell search engines which pages they should not crawl.
Using nofollow to block access to a page is a mistake; this is not what nofollow is for. The attribute was designed to let you place a link in a page without conferring any weight or endorsement on it. In other words, it marks a link that search engines should not regard as significant for page-ranking algorithms. It does not mean "do not index this page", just "don't follow this particular link to that page".
Here's what Google has to say about nofollow:
...However, the target pages may still appear in our index if other sites link to them without using nofollow or if the URLs are submitted to Google in a Sitemap. Also, it's important to note that other search engines may handle nofollow in slightly different ways.
One way of keeping the URL from affecting your rank is setting the rel attribute of your link:
<a href="/mymessages" rel="nofollow">My Messages</a>
Another option is robots.txt; that way you can disallow bots from the URL entirely.
You might want to use robots.txt to exclude /mymessages. This will also prevent engines which have already visited /mymessages from visiting it again.
Alternatively, add the following inside the <head> of the page that the /mymessages script outputs:
<meta name="robots" content="noindex" />
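One way to confirm the tag actually ends up in the rendered page is to parse it back out. A minimal sketch using the standard-library `html.parser`; `robots_directives` is a hypothetical helper name, and it only reads `<meta name="robots">`, not crawler-specific variants such as `name="googlebot"`.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collects directives from every <meta name="robots" content="..."> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; values may be None.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for token in attributes.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)

    # Self-closing tags such as <meta ... /> go through handle_startendtag,
    # whose default implementation calls handle_starttag, so no override is needed.


def robots_directives(html: str) -> set:
    """Return the lower-cased robots meta directives found in `html`."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex" /></head></html>'
    print(robots_directives(page))
```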
If you want to tell search engines not to follow a particular link, use rel="nofollow".
It tells search engines and other bots not to follow this link.
Google will then not crawl that link, and it does not transfer PageRank or anchor text across it.
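To see which links on a page are marked this way, you can list each anchor along with whether its `rel` attribute contains `nofollow`. A minimal sketch with the standard-library `html.parser`; `audit_links` is a hypothetical helper name and the sample markup is made up for illustration.

```python
from html.parser import HTMLParser


class _LinkAuditor(HTMLParser):
    """Records (href, is_nofollow) for every <a href="..."> in the document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = {name: (value or "") for name, value in attrs}
        href = attributes.get("href")
        if href is None:
            return  # named anchors without href are not links
        # rel is a space-separated token list, e.g. rel="nofollow noopener".
        rel_tokens = attributes.get("rel", "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def audit_links(html: str) -> list:
    """Return (href, is_nofollow) pairs in document order."""
    auditor = _LinkAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.links


if __name__ == "__main__":
    sample = (
        '<a href="/mymessages" rel="nofollow">My Messages</a> '
        '<a href="/about">About</a>'
    )
    for href, nofollow in audit_links(sample):
        print(href, "nofollow" if nofollow else "followed")
```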
Related
I have a site with a text input.
The user types the name of a city, hits Enter, and is taken to that city's page.
My sitemap.xml looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>http://www.example.com/rome.html</loc></url>
<url><loc>http://www.example.com/london.html</loc></url>
<url><loc>http://www.example.com/newyork.html</loc></url>
<url><loc>http://www.example.com/paris.html</loc></url>
<url><loc>http://www.example.com/berlin.html</loc></url>
<url><loc>http://www.example.com/toronto.html</loc></url>
<url><loc>http://www.example.com/milan.html</loc></url>
<url><loc>http://www.example.com/edinburgh.html</loc></url>
<url><loc>http://www.example.com/nice.html</loc></url>
<url><loc>http://www.example.com/boston.html</loc></url>
...
</urlset>
My question is:
Will I be penalized (from an SEO point of view) because my links only appear in sitemap.xml instead of as a list of anchors in the HTML page?
Note: the anchor approach was excluded because I have about 5,000 listed cities.
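With around 5,000 cities, the sitemap is usually generated rather than written by hand. A minimal sketch with `xml.etree.ElementTree`, assuming each city page lives at `<base>/<slug>.html` as in the example above; `build_sitemap` is a hypothetical name. The sitemaps.org protocol caps a single sitemap file at 50,000 URLs, so 5,000 fits comfortably in one file.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemaps.org protocol


def build_sitemap(base_url: str, slugs: list) -> str:
    """Return sitemap XML listing <base_url>/<slug>.html for every slug."""
    if len(slugs) > MAX_URLS_PER_SITEMAP:
        raise ValueError("too many URLs for one sitemap; split into several files")
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for slug in slugs:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as '&' in text automatically.
        ET.SubElement(url, "loc").text = f"{base_url.rstrip('/')}/{slug}.html"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    cities = ["rome", "london", "newyork", "paris"]  # in practice, ~5,000 slugs
    print(build_sitemap("http://www.example.com", cities))
```

Keeping the slug list as the single source of truth means the sitemap and the city pages cannot drift apart.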
It won't be penalised. Google themselves say the primary purpose of a sitemap is "a way to tell Google about pages on your site we might not otherwise discover."
https://support.google.com/webmasters/answer/156184?hl=en
You are rare in that you are using the sitemap correctly to help Google find your pages.
Often SEOs just add one for the sake of it, rather than taking the time to identify potential crawling problems and using the sitemap to fix them.
The only negative aspect for SEO I can think of is that PageRank will not flow between your pages if there is no direct link.
No, you will not be penalized. The sole purpose of sitemaps is to tell search engines where to find your content. That content may or may not be available through hyperlinks on your website.
I have a wiki website. Many spammers are using it for SEO: they add spam posts with links to external websites. Is there a way to make sure they don't benefit from it? My thought is to add a text file like robots.txt telling search engines "don't consider external website links for search results". I don't want to prevent spammers from creating posts for the sake of advertisement :)
Add rel="nofollow" to the links when you output them on your site.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=96569
They will still spam your site with links, so you'll need to monitor as well.
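In practice this means rewriting user-supplied links at render time instead of outputting them as typed. A minimal sketch, assuming the wiki stores the raw `href` and link text and knows its own hostname; `render_user_link` and the hostnames in the tests are hypothetical. It also rejects non-HTTP schemes such as `javascript:`, which HTML escaping alone would not neutralize.

```python
from html import escape
from urllib.parse import urlparse


def render_user_link(href: str, text: str, site_host: str) -> str:
    """Render a user-supplied link, adding rel="nofollow" to external targets."""
    parsed = urlparse(href)
    # Only allow http(s) and relative links; urlparse lower-cases the scheme.
    if parsed.scheme not in ("", "http", "https"):
        raise ValueError(f"unsupported link scheme: {parsed.scheme!r}")
    host = parsed.hostname  # None for relative links such as /wiki/Page
    is_external = host is not None and host != site_host.lower()
    rel = ' rel="nofollow"' if is_external else ""
    return f'<a href="{escape(href)}"{rel}>{escape(text)}</a>'


if __name__ == "__main__":
    print(render_user_link("http://spam.example.net/buy", "cheap pills", "wiki.example.org"))
    print(render_user_link("/wiki/Main_Page", "Home", "wiki.example.org"))
```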
I have a WordPress website which has been indexed by search engines.
I have edited robots.txt to disallow certain directories and webpages from the search index.
I only know how to use allow and disallow, but I don't know how to use follow and nofollow in a robots.txt file.
I read somewhere while Googling this that I can have webpages that won't be indexed by Google but will still be crawled for PageRank. Supposedly this can be achieved by disallowing the webpages in robots.txt and using follow for them.
Please let me know how to use follow and nofollow in a robots.txt file.
Thanks
Sumit
a.) The follow/nofollow and index/noindex rules are not for robots.txt (which sets general site rules) but for an on-page meta robots tag (which sets the rules for that specific page).
More info about Meta-Robots
b.) Google won't crawl Disallowed pages, but it can still index them and show them in the SERPs (using info from inbound links or website directories like Dmoz).
Having said that, there is no PR value you can gain from this.
More info about Googlebot's indexing behavior
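Since these rules live in the page itself, a template helper can emit the right tag per page. A minimal sketch; `meta_robots` is a hypothetical helper name. "index, follow" is the default behavior, so a page that needs neither restriction can simply omit the tag.

```python
def meta_robots(index: bool = True, follow: bool = True) -> str:
    """Build a robots meta tag for a page's <head>."""
    content = ", ".join([
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ])
    return f'<meta name="robots" content="{content}" />'


if __name__ == "__main__":
    # A page that should stay out of the index but whose links may be followed.
    print(meta_robots(index=False, follow=True))
```

If the same page is also Disallowed in robots.txt, crawlers never fetch it and so never see this tag.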
Google did, at least at the time, recognize the Noindex: directive inside robots.txt, although it officially dropped support for it in September 2019. Here's Matt Cutts talking about it: http://www.mattcutts.com/blog/google-noindex-behavior/
If you put "Disallow" in robots.txt for a page already in Google's index, you will usually find that the page stays in the index, like a ghost, stripped of its keywords. I suppose this is because they know they won't be crawling it, and they don't want the index containing bit-rot. So they replace the page description with "A description for this result is not available because of this site's robots.txt – learn more."
So, the problem remains: How do we remove that link from Google since "Disallow" didn't work? Typically, you'd want to use meta robots noindex on the page in question because Google will actually remove the page from the index if it sees this update, but with that Disallow directive in your robots file, they'll never know about it.
So you could remove that page's Disallow rule from robots.txt and add a meta robots noindex tag to the page's header, but now you've got to wait for Google to go back and look at a page you told them to forget about.
You could create a new link to it from your homepage in hopes that Google will get the hint, or you could avoid the whole thing by just adding that Noindex rule directly to the robots.txt file. In the post above, Matt says that this will result in the removal of the link.
No, you can't.
You can set which directories you want to block and for which bots, but you can't set nofollow via robots.txt.
Use the robots meta tag on the pages to set nofollow.
Is it possible to fine-tune directives to Google to such an extent that it will ignore part of a page, yet still index the rest?
There are a couple of different issues we've come across which would be helped by this, such as:
RSS feed/news ticker-type text on a page displaying content from an external source
users entering contact phone etc. details who want them visible on the site but would rather they not be google-able
I'm aware that both of the above can be addressed via other techniques (such as writing the content with JavaScript), but am wondering if anyone knows if there's a cleaner option already available from Google?
I've been doing some digging on this and came across mentions of googleon and googleoff tags, but these seem to be exclusive to Google Search Appliances.
Does anyone know if there's a similar set of tags to which Googlebot will adhere?
Edit: Just to clarify, I don't want to go down the dangerous route of cloaking/serving up different content to Google, which is why I'm looking to see if there's a "legit" way of achieving what I'd like to do here.
What you're asking for can't really be done: Google either takes the entire page or none of it.
You could do some sneaky tricks, though, such as putting the part of the page you don't want indexed in an iframe and using robots.txt to ask Google not to crawl that iframe's URL.
In short, no, unless you use cloaking, which is discouraged by Google.
Please check out the official documentation here:
http://code.google.com/apis/searchappliance/documentation/46/admin_crawl/Preparing.html
Go to the section "Excluding Unwanted Text from the Index".
<!--googleoff: index-->
here will be skipped
<!--googleon: index-->
I found a useful resource on marking certain content, such as duplicate content, so that search engines don't index it:
<p>This is normal (X)HTML content that will be indexed by Google.</p>
<!--googleoff: index-->
<p>This (X)HTML content will NOT be indexed by Google.</p>
<!--googleon: index-->
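To preview roughly which text the marked regions exclude, you can strip them yourself. A rough approximation using a regular expression, assuming well-formed, non-nested `googleoff: index` / `googleon: index` pairs; it ignores the other modes (`anchor`, `snippet`, `all`) that the Search Appliance documentation describes, and leaves an unterminated `googleoff` region untouched. As the question notes, these comments appear to be specific to the Google Search Appliance.

```python
import re

# Everything from <!--googleoff: index--> up to the next <!--googleon: index-->.
_GOOGLEOFF_INDEX = re.compile(
    r"<!--\s*googleoff:\s*index\s*-->.*?<!--\s*googleon:\s*index\s*-->",
    re.DOTALL | re.IGNORECASE,
)


def indexable_html(html: str) -> str:
    """Return `html` with googleoff/googleon index regions removed."""
    return _GOOGLEOFF_INDEX.sub("", html)


if __name__ == "__main__":
    page = (
        "<p>This is normal (X)HTML content that will be indexed by Google.</p>\n"
        "<!--googleoff: index-->\n"
        "<p>This (X)HTML content will NOT be indexed by Google.</p>\n"
        "<!--googleon: index-->\n"
    )
    print(indexable_html(page))
```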
On your server, detect the search bot by IP address using PHP or ASP. Then serve requests from IP addresses on that list a version of the page you wish to be indexed. In that search-engine-friendly version of your page, use the canonical link tag to point the search engine to the page version that you do not want to be indexed.
This way, the page containing the content you don't want indexed will be indexed by address only, while only the content you wish to be indexed will actually be indexed. This method will not get you blocked by the search engines and is completely safe.
Yes, you can stop Google from indexing some parts of your website by creating a custom robots.txt and listing the portions you don't want indexed, such as wp-admin, or a particular post or page. Before creating one, check your site's existing robots.txt, for example www.yoursite.com/robots.txt.
All search engines either index or ignore the entire page. The only possible way to implement what you want is to:
(a) have two different versions of the same page
(b) detect the browser (user agent) making the request
(c) if it's a search engine, serve the second version of your page.
This link might prove helpful.
There are meta-tags for bots, and there's also the robots.txt, with which you can restrict access to certain directories.
I produced a page that I have no intention of letting search engines find and crawl.
The advisable solution is robots.txt, but it is not applicable in my situation.
So I isolated this page from my site by removing all links to it from other pages, and I never posted its URL on external sites.
Logically, then, it is impossible for search engines to find this page. That means no matter how many outbound links are on this page, my site's PR is safe.
Am I right?
Thank you very much!
Hope this question is programming related!
No, there's still a chance your page can be found by search engine crawlers. For example, it's been speculated that data from the Google Toolbar can be used to alert Googlebot to the presence of a page. And there's still a chance others might link to your page from external sites if the URL becomes known.
Your best bet is to add a robots meta tag to your page; this will prevent it from being indexed and prevent crawlers from following any links:
<meta name="robots" content="noindex,nofollow" />
If it is on the internet and not restricted, it will be found. It may make it harder to find, but it is still possible a crawler may happen across it.
What is the link so I can check? ;)
If you have outbound links on this "isolated" page, then your page will probably show up as a referrer in the logs of the linked-to sites. Depending on how closely the owners of those sites track their stats, they may find your page.
I've seen httpd log files turn up in Google searches. This in turn may lead others to find your page, including crawlers and other robots.
The easiest solution might be to password-protect the page?
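Unlike keeping the URL obscure, a password actually stops crawlers from fetching the page. A minimal sketch of HTTP Basic authentication as a plain WSGI app using only the standard library; the credentials, realm, and the `private_page` name are hypothetical. Basic auth only base64-encodes the password, so it should be served over HTTPS.

```python
import base64
import hmac

# Hypothetical credentials for illustration only; load real ones from
# configuration or a secrets store rather than hard-coding them.
EXPECTED_USER = "editor"
EXPECTED_PASSWORD = "change-me"


def _credentials_ok(auth_header: str) -> bool:
    """Check an HTTP Basic Authorization header against the expected pair."""
    scheme, _, token = auth_header.partition(" ")
    if scheme.lower() != "basic" or not token:
        return False
    try:
        decoded = base64.b64decode(token, validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # compare_digest avoids leaking how many characters matched via timing.
    user_ok = hmac.compare_digest(user.encode(), EXPECTED_USER.encode())
    password_ok = hmac.compare_digest(password.encode(), EXPECTED_PASSWORD.encode())
    return user_ok and password_ok


def private_page(environ, start_response):
    """WSGI app that serves the page only to clients with valid credentials."""
    if not _credentials_ok(environ.get("HTTP_AUTHORIZATION", "")):
        start_response("401 Unauthorized", [
            ("WWW-Authenticate", 'Basic realm="private", charset="UTF-8"'),
            ("Content-Type", "text/plain; charset=utf-8"),
        ])
        return [b"Authentication required.\n"]
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<p>Private content</p>\n"]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, private_page).serve_forever()` serves it on port 8000; a crawler without credentials only ever receives the 401 response.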