is there any any to prevent google from indexing my folder? Not "Page" but folder
using htaccess or robot.txt or any other way?
add below to your robots.txt file
Disallow: /foldername/
let me know if that helps
Related
I am redesigning my site and It is located in sub folder of website directory. And Google have indexed our new site from sub folder which is affecting my search results of live site.
Is there any specific way, that I can remove sub folder from google search index and google search results ?
e.g. My Live site is www.xyz.com and
I am redesigning on www.xyz.com/newsite
Is there anyway that I can remove /newsite from google search index and results ?
Refer http://www.robotstxt.org/robotstxt.html
Add this robots.txt file
User-agent: *
Disallow: /newsite/
or best suited, get access to Google Webmaster
https://www.google.com/webmasters/tools/url-removal?hl=en&siteUrl=
add your website url after =
For example:
https://www.google.com/webmasters/tools/url-removal?hl=en&siteUrl=http://www.techplayce.com/
Yes by uploading robots.txt file on your site directory...
User-agent: *
Disallow: /newsite/
add this code if you have wordpress site then install a plugin for robots.txt
In robots.txt can I write the following relative URL for the sitemap file?
sitemap: /sitemap.ashx
Or do I have to use the complete (absolute) URL for the sitemap file, like:
sitemap: http://subdomain.domain.com/sitemap.ashx
Why I wonder:
I own a new blog service, www.domain.com, that allow users to blog on accountname.domain.com.
I use wildcards, so all subdomains (accounts) point to: "blog.domain.com".
In blog.domain.com I put the robots.txt to let search engines find the sitemap.
But, due to the wildcards, all user account share the same robots.txt file.Thats why I can't use the second alternative. And for now I can't use url rewrite for txt files. (I guess that later versions of IIS can handle this?)
According to the official documentation on sitemaps.org it needs to be a full URL:
You can specify the location of the Sitemap using a robots.txt file. To do this, simply add the following line including the full URL to the sitemap:
Sitemap: http://www.example.com/sitemap.xml
Google crawlers are not smart enough, they can't crawl relative URLs, that's why it's always recommended to use absolute URL's for better crawlability and indexability.
Therefore, you can not use this variation
> sitemap: /sitemap.xml
Recommended syntax is
Sitemap: https://www.yourdomain.com/sitemap.xml
Note:
Don't forgot to capitalise the first letter in "sitemap"
Don't forgot to put space after "Sitemap:"
Good technical & logical question my dear friend.
No in robots.txt file you can't go with relative URL of the sitemap; you need to go with the complete URL of the sitemap.
It's better to go with "sitemap: https://www.example.com/sitemap_index.xml"
In the above URL after the colon gives space.
I also like to support Deepak.
I am wondering if there is a way to include in my robots.txt a line which stops Google from indexing any URL in my website, that contains specific text.
I have different sections, all of which contain different pages. I don't want Google to index page2, page3, etc, just the main page.
The URL structure I have is as follows:
http://www.domain.com/section
http://www.domain.com/section/page/2
http://www.domain.com/section/article_name
Is there any way to put in my robots.txt file a way to NOT index any URL containing:
/page/
Thanks in advance everyone!
User-agent: Googlebot
Disallow: http://www.domain.com/section/*
or depending on your requirement:
User-agent: Googlebot
Disallow: http://www.domain.com/section/page/*
Also you may use the Google Webmaster tools rather than the robots.txt file
Goto GWT / Crawl / URL Parameters
Add Parameter: page
Set to: No URLs
You can directly use
Disallow: /page
I did:
<meta name="robots" content="none"/>
Is that the best way to go about it, or is there a better way?
You can create a file called robots.txt in the root of your site. The format is this:
User-agent: (user agent string to match)
Disallow: (URL here)
Disallow: (other URL here)
...
Example:
User-agent: *
Disallow: /
Will make (the nice) robots not index anything on your site. Of course, some robots will completely ignore robots.txt, but for the ones that honor it, it's your best bet.
If you'd like more information on the robots.txt file, please see http://www.robotstxt.org/
You could use a robots.txt file to direct search engines which pages not to index.
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
Or use the meta noindex tag in each page.
Let's say I have a web site for hosting community generated content that targets a very specific set of users. Now, let's say in the interest of fostering a better community I have an off-topic area where community members can post or talk about anything they want, regardless of the site's main theme.
Now, I want most of the content to get indexed by Google. The notable exception is the off-topic content. Each thread has it's own page, but all the threads are listed in the same folder so I can't just exclude search engines from a folder somewhere. It has to be per-page. A traditional robots.txt file would get huge, so how else could I accomplish this?
This will work for all well-behaving search engines, just add it to the <head>:
<meta name="robots" content="noindex, nofollow" />
If using Apache I'd use mod-rewrite to alias robots.txt to a script that could dynamically generate the necessary content.
Edit: If using IIS you could use ISAPIrewrite to do the same.
You can implement it by substituting robots.txt with dynamic script generating the output.
With Apache You could make simple .htaccess rule to acheive that.
RewriteRule ^robots\.txt$ /robots.php [NC,L]
Simlarly to #James Marshall's suggestion - in ASP.NET you could use an HttpHandler to redirect calls to robots.txt to a script which generated the content.
Just for that thread , make sure your head contains a noindex meta tag. Thats one more way to tell search engines not to crawl your page other than blocking in robots.txt
Just keep in mind that a robots.txt disallow will NOT prevent Google from indexing pages that have links from external sites, all it does is prevent crawling internally. See http://www.webmasterworld.com/google/4490125.htm or http://www.stonetemple.com/articles/interview-matt-cutts.shtml.
You can disallow search engines to read or index your content by restricting robot meta tags. In this way, spider will consider your instructions and will index only such pages that you want.
block dynamic webpage by robots.txt use this code
User-agent: *
Disallow: /setnewsprefs?
Disallow: /index.html?
Disallow: /?
Allow: /?hl=
Disallow: /?hl=*&