Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 9 years ago.
I've researched many methods on how to prevent Google/other search engines from crawling a specific directory. The two most popular ones I've seen are:
Adding it into the robots.txt file: Disallow: /directory/
Adding a meta tag: <meta name="robots" content="noindex, nofollow">
Which method would work the best? I want this directory to remain "invisible" from search engines so it does not affect any of my site's ranking.
In other words, I want this directory to be neutral/invisible and "just there." I don't want it to affect any ranking. Which method would be the best to achieve this?
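robots.txt is the way to go for this.
According to Google, you only need the meta tag if you don't have rights to create or edit the robots.txt file. Keep in mind that the two work differently: robots.txt stops compliant crawlers from fetching the directory, while the noindex meta tag keeps a fetched page out of the index, so a crawler must be able to fetch the page in order to see the tag.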
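To see what the Disallow rule from the question actually covers, here is a minimal sketch using Python's standard urllib.robotparser. The `User-agent: *` line, the site.com URLs, and the `is_crawlable` helper are assumptions for illustration; the question only gives the Disallow line.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: the question only shows the Disallow line,
# the "User-agent: *" line is added here so the rule applies to every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /directory/
"""


def is_crawlable(url, user_agent="Googlebot"):
    """Return True if a compliant crawler may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_crawlable("http://site.com/directory/page.html"))
    print(is_crawlable("http://site.com/index.html"))
```

This only tells you whether a well-behaved crawler is allowed to fetch a URL; it says nothing about whether the URL ends up listed in search results.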
Closed 8 years ago.
I'm working on an application whose content is taken from books, and I can find similar content on a competitor's site. Although I'm copying the content from the books and not from their site, I'm afraid it could still be flagged as duplicate content. How can I resolve this issue or avoid the duplication?
You're correct. Without proper care, your site could get dinged for having duplicate content. There are several options that you can take:
1) If you want the page to be indexed, then putting the book content into an iframe (which search engines may not treat as part of your page, although that is not guaranteed) is a good solution. Include some original content as an introduction to the page and then place the "duplicate content" into the iframe. This will allow the page to get indexed in the search engine results without putting you at risk. I recommend having at least 500 words of unique content per page.
2) The other option, if you don't want to write introductory text, is to tell Google not to index those pages. Add a noindex,follow tag to pages that have duplicate content:
<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW">
Closed 9 years ago.
I'm having some problems and I'm confused by Google Webmaster Tools.
For example, I have this page:
http://site.com/link/text.html
and I'm using a CustomVar in Google Analytics to track clicks from an external site,
for example:
http://site.com/link/text.html?promoid=123
Now Webmaster Tools reports tons of duplicate links.
I added this to robots.txt:
Disallow: *?promoid
but I'm not sure this is a good idea.
What should I do now: keep disallowing promoid in robots.txt, or use rel="canonical" instead?
Edit: all the links with ?promoid=123 are posted on the external site, not on mine.
This is exactly what canonical URLs are for. It will tell Google that http://site.com/link/text.html is the main URL to use for that page and that any other page using that canonical URL is just a minor variation of that page.
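As a rough sketch of how the canonical URL could be generated server-side, the Python below strips the tracking parameter and builds the tag that would go in the page's head. The `TRACKING_PARAMS` set and both function names are hypothetical; only the promoid parameter and the example URLs come from the question.

```python
import html
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: promoid is the only parameter that does not change the page content.
TRACKING_PARAMS = {"promoid"}


def canonical_url(url):
    """Drop tracking parameters and any fragment, keep everything else."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url):
    """Build the <link> element to place inside the page's <head>."""
    href = html.escape(canonical_url(url), quote=True)
    return '<link rel="canonical" href="%s">' % href


if __name__ == "__main__":
    print(canonical_link_tag("http://site.com/link/text.html?promoid=123"))
```

Because every promoid variant then declares the same canonical URL, the pages stay crawlable (so the tag can be read), which would not be the case if they were disallowed in robots.txt.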
Closed 11 months ago.
this is just a simple SEO question.
I have a wordpress SEO plug-in that has this option:
Meta robots: [checkbox] noindex, follow
Should I check this option if I want my page to be available on Google?
noindex means that the page may not be indexed, so a page carrying this robots meta directive will not appear in major search engines, including Google. If you want your page to be available on Google, leave the option unchecked. The follow directive only tells crawlers that they may follow the links on the page; it doesn't have much to do with whether the page itself appears in a search engine's results.
See this for more info (applies to most search engines):
Control Crawling/Indexing
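To make the two directives concrete, here is a small sketch that interprets the content value of a robots meta tag. The function names are made up for illustration, and treating "none" as shorthand for "noindex, nofollow" follows Google's documented behaviour; other crawlers may differ.

```python
def parse_robots_meta(content):
    """Split a robots meta content value into a set of lowercase directives."""
    return {part.strip().lower() for part in content.split(",") if part.strip()}


def may_index(content):
    """True unless the value asks engines to keep the page out of the index."""
    return not ({"noindex", "none"} & parse_robots_meta(content))


def may_follow(content):
    """True unless the value asks crawlers not to follow the page's links."""
    return not ({"nofollow", "none"} & parse_robots_meta(content))


if __name__ == "__main__":
    value = "noindex, follow"  # what the plugin checkbox would emit
    print(may_index(value), may_follow(value))
```

With the checkbox ticked, the page is excluded from the index while its links can still be followed.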
Closed 4 years ago.
What is the use of the relativepagescore meta tag, and how does it affect SEO?
<meta name="relativepagescore" content="555">
After extensive research I found the source of the tag: it is associated with the OpenSearch XML file.
This meta tag indicates the internal rating of the page.
There is no evidence that search engines even know about or read the relativepagescore meta tag, so it has no effect on the relevance of Google's search results.
But if your site has an internal search engine, I recommend that you consider using the OpenSearch project,
which can directly influence your site's traffic, help build loyal users, and support your website's SEO work.
You will find an explanation of how to install and use the file here
Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
This is a very basic question, but I can't find a direct answer anywhere online. When searching for my website on Google, sitemap.xml and robots.txt are returned as search results (amongst more useful results). To prevent this, should I add the following lines to robots.txt?
Disallow: /sitemap.xml
Disallow: /robots.txt
Or would this stop search engines from accessing the sitemap and robots files altogether?
Also/Instead should I use google's URL removal tool?
You won't stop the crawler from fetching robots.txt, because it's a chicken-and-egg situation: the crawler has to read the file before it can learn that the file is disallowed. However, if you aren't telling Google and other search engines to look directly at the sitemap, you could lose some indexing weight by denying access to your sitemap.xml.
Is there a particular reason why you don't want users to be able to see the sitemap?
This is what I actually do, which is specific to the Google crawler:
Allow: /
# Sitemap
Sitemap: http://www.mysite.com/sitemap.xml
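If you want to check how a parser reads a file like this, here is a minimal sketch with Python's standard urllib.robotparser (the `site_maps()` method needs Python 3.8 or later). The `User-agent: Googlebot` line is an assumption: the snippet above is described as Google-specific but does not show that line.

```python
from urllib.robotparser import RobotFileParser

# Assumption: a "User-agent: Googlebot" line precedes the Allow rule,
# since the original snippet is said to target the Google crawler.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /
# Sitemap
Sitemap: http://www.mysite.com/sitemap.xml
"""


def load_rules(text):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rules = load_rules(ROBOTS_TXT)
    print(rules.site_maps())
    print(rules.can_fetch("Googlebot", "http://www.mysite.com/sitemap.xml"))
```

The Sitemap line is independent of any User-agent group, so the sitemap location is advertised to every crawler that reads the file.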