Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 5 years ago.
I have about 100 pages on my website that I don't want indexed by Google. Is there any way to block them using robots.txt? It would be very tiresome to edit each page and add a noindex meta tag.
All the URLs I want to block look like this:
www.example.com/index-01.html
www.example.com/index-02.html
www.example.com/index-03.html
www.example.com/index-04.html
...
www.example.com/index-100.html
I'm not sure, but would adding something like the following work?
User-Agent: *
Disallow: /index-*.html
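Yes, it will work for crawlers that support the * wildcard, which includes Googlebot. Keep in mind that Disallow only stops crawling: a blocked URL can still be indexed if other sites link to it.
Reference: https://geoffkenyon.com/how-to-use-wildcards-robots-txt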
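To see why that pattern covers all 100 pages, here is a minimal Python sketch of how a wildcard-aware crawler could match a Disallow rule against a URL path. It assumes Google-style semantics (`*` matches any run of characters, a trailing `$` anchors the end, otherwise the rule is a prefix match) and deliberately ignores Allow rules and rule precedence. `rule_matches` and `is_blocked` are hypothetical helper names, not part of any library.

```python
import re
from typing import Iterable


def rule_matches(rule: str, path: str) -> bool:
    """Return True if a robots.txt path rule (with * and $ wildcards) matches path."""
    # A trailing '$' means the rule must match up to the end of the path.
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape every literal piece, then turn each '*' into '.*'.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    if anchored:
        pattern += "$"
    # re.match anchors only at the start, which gives prefix semantics.
    return re.match(pattern, path) is not None


def is_blocked(path: str, disallow_rules: Iterable[str]) -> bool:
    """Simplified check: blocked if any Disallow rule matches (Allow rules ignored)."""
    return any(rule_matches(rule, path) for rule in disallow_rules)


rules = ["/index-*.html"]
for path in ["/index-01.html", "/index-100.html", "/index.html", "/about.html"]:
    print(path, "blocked" if is_blocked(path, rules) else "allowed")
```

Under these semantics /index-01.html and /index-100.html are blocked while /index.html and /about.html stay allowed. The rule would also block any other path that starts with /index- and contains .html later (for example /index-foo.html), so make sure no page you want crawled shares that prefix.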
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 4 years ago.
I want to hide some products from the entire PrestaShop front end so that they are visible only through their direct link and nowhere else.
Is this possible?
Thank you.
Yes. In PrestaShop 1.7, open the product's admin page, go to the Options tab, and set Visibility to "Nowhere". The product will then not be displayed anywhere in the shop, but anyone who has its URL can still open it.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 8 years ago.
I would like to have something cleared up.
On a member-based website, certain pages can only be accessed by a particular member, such as edit profile, edit password, etc.
My question is, do those pages need to be included in the sitemap that is submitted to search engines?
No. Only add pages that you want search engines to index and that search engines can actually crawl. These pages meet neither criterion.
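As a minimal sketch of that rule, the following Python builds a sitemap only from pages that are both publicly crawlable and meant to be indexed, using the standard library's xml.etree.ElementTree. The `Page` record and its `public` and `indexable` flags are hypothetical; a real site would derive them from its own routing or access-control settings.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass
class Page:
    url: str
    public: bool     # hypothetical flag: reachable without logging in
    indexable: bool  # hypothetical flag: we want it in search results


def build_sitemap(pages: List[Page]) -> str:
    """Return sitemap XML containing only public, indexable pages."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        # Member-only pages (edit profile, edit password, ...) fail this test.
        if page.public and page.indexable:
            url = ET.SubElement(urlset, "url")
            ET.SubElement(url, "loc").text = page.url
    return ET.tostring(urlset, encoding="unicode")


pages = [
    Page("https://www.example.com/", public=True, indexable=True),
    Page("https://www.example.com/about", public=True, indexable=True),
    Page("https://www.example.com/account/edit-profile", public=False, indexable=False),
    Page("https://www.example.com/account/edit-password", public=False, indexable=False),
]
print(build_sitemap(pages))
```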
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 9 years ago.
I am generating a sitemap for my website according to http://www.sitemaps.org/.
Is it possible to include external links in the sitemap, or should a sitemap always include only internal links?
Thanks.
According to sitemaps.org:
Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling.
So you should not include external URLs in your sitemap at all.
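The sitemaps.org protocol also expects every URL in a sitemap to use the same protocol and host as the sitemap file itself. A minimal Python sketch of filtering a link list down to internal URLs before writing the sitemap might look like this; `internal_urls`, the sitemap location and the sample links are illustrative placeholders, and the sketch assumes the sitemap sits at the site root (it does not check the protocol's directory restriction).

```python
from typing import List
from urllib.parse import urlsplit


def internal_urls(sitemap_location: str, links: List[str]) -> List[str]:
    """Keep only links with the same scheme and host as the sitemap file."""
    base = urlsplit(sitemap_location)
    kept = []
    for link in links:
        parts = urlsplit(link)
        # Compare scheme and network location (host and optional port);
        # host names are case-insensitive, so compare them lowercased.
        if parts.scheme == base.scheme and parts.netloc.lower() == base.netloc.lower():
            kept.append(link)
    return kept


links = [
    "http://www.example.com/",
    "http://www.example.com/products.html",
    "https://www.example.com/secure.html",  # different protocol
    "http://other-site.example.org/page",   # external host
]
print(internal_urls("http://www.example.com/sitemap.xml", links))
```

Only the first two links survive; the https link and the link to another host are dropped.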
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question appears to be off-topic because it lacks sufficient information to diagnose the problem. Describe your problem in more detail or include a minimal example in the question itself.
Closed 9 years ago.
I am trying to find out how to block crawlers from accessing links like this:
site.com/something-search.html
I want to block all /something-*
Can someone help me?
User-agent: *
Disallow: /something-
This blocks all URLs whose path starts with /something-. For example, with a robots.txt served at http://example.com/robots.txt, these URLs would be blocked:
http://example.com/something-
http://example.com/something-foo
http://example.com/something-foo.html
http://example.com/something-foo/bar
…
The following URLs would still be allowed:
http://example.com/something
http://example.com/something.html
http://example.com/something/
…
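This prefix behaviour can be checked with Python's standard urllib.robotparser module, which applies Disallow rules as plain path prefixes. It does not understand * wildcards inside paths, so it suits this rule but not wildcard patterns. The "Googlebot" user-agent string is only an example; with a single `User-agent: *` group every agent gets the same answer.

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /something-
"""

parser = RobotFileParser()
# parse() accepts the robots.txt content as a list of lines.
parser.parse(robots_txt.splitlines())

urls = [
    "http://example.com/something-",
    "http://example.com/something-foo",
    "http://example.com/something-foo.html",
    "http://example.com/something-foo/bar",
    "http://example.com/something",
    "http://example.com/something.html",
    "http://example.com/something/",
]
for url in urls:
    allowed = parser.can_fetch("Googlebot", url)
    print(url, "allowed" if allowed else "blocked")
```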
In your robots.txt
User-agent: *
Disallow: /something-(1st link)
...
Disallow: /something-(last link)
Add an entry for each page that you don't want crawled. Each Disallow value is a URL path starting with /, without the domain name.
Regular expressions are not part of robots.txt, but some crawlers (Googlebot, for example) understand simple * wildcards.
Have a look here.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 11 months ago.
This is just a simple SEO question.
I have a WordPress SEO plugin that has this option:
Meta robots: [checkbox] noindex, follow
Should I check this option if I want my page to be available on Google?
No. The noindex directive tells search engines not to index the page, so a page carrying it will not appear in the results of major search engines, including Google. The follow directive only tells crawlers that they may follow the links on the page; it has no bearing on whether the page itself appears in search results.
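To confirm what the plugin actually emits, a minimal Python sketch using the standard library's html.parser can read a page's robots meta tag and report whether it contains noindex. `RobotsMetaParser` and `is_noindex` are hypothetical names, and the sample HTML is illustrative; in practice you would feed it the page source. It only looks at `<meta name="robots">`, not crawler-specific tags or HTTP headers.

```python
from html.parser import HTMLParser
from typing import List


class RobotsMetaParser(HTMLParser):
    """Collects the directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            content = attributes.get("content", "")
            # Directives are comma-separated and case-insensitive.
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def is_noindex(html: str) -> bool:
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(is_noindex(page))
```

For the sample page this prints True, meaning the page would be kept out of search results.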
See this for more information (it applies to most search engines):
Control Crawling/Indexing