Avoid Google indexing subdomains

I have searched, but I could not find a proper answer for this kind of problem.
I have several domains hosted on the same cPanel account.
One of them is the main domain, and the rest are secondary domains.
When you add a secondary domain, cPanel automatically creates a subdomain for it, something like:
http://secondary.maindomain.com
My problem is that Google has indexed my pages under both addresses, for example:
secondary.com/blabla.html
secondary.maindomain.com/blabla.html
How can I remove those entries from Google's index?
And how can I keep those subdomains from being indexed in the future?

For this purpose, you can add a robots.txt file to your document root and use 'Disallow:' rules to stop Google or any other search engine from crawling your files or directories.
For example, to keep Google from crawling your subdomain, add the entries below to robots.txt and place the file in the document root of your subdomain:
User-agent: Googlebot
Disallow: /
Or, for all search engines:
User-agent: *
Disallow: /
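To see the difference between the two variants, they can be evaluated offline with Python's standard urllib.robotparser module. This is a minimal sketch: the page URL comes from the question, and Bingbot simply stands in for any non-Google crawler. It only answers "may this crawler fetch this URL?", which is all that robots.txt controls.

```python
from urllib.robotparser import RobotFileParser

def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

# Variant 1: block Googlebot only.
google_only = make_parser("User-agent: Googlebot\nDisallow: /\n")
# Variant 2: block every crawler that honors robots.txt.
everyone = make_parser("User-agent: *\nDisallow: /\n")

url = "http://secondary.maindomain.com/blabla.html"
for agent in ("Googlebot", "Bingbot"):
    # Columns: agent, allowed under variant 1, allowed under variant 2
    print(agent, google_only.can_fetch(agent, url), everyone.can_fetch(agent, url))
```

With the first file, only Googlebot is refused; other crawlers fall through to "allowed" because no rule group matches them.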

Related

Noindex Only One Subdomain

I am having difficulty finding information on how to completely noindex one particular subdomain via .htaccess (from my understanding, that's the best way?). It is important that only that one subdomain and its files are never indexed or crawled.
I have an Apache server managed with Plesk, and the subdomain hosts the email software we use for newsletter campaigns, etc.
The subdomain is "mail" (e.g. https://mail.test.com), and my goal is to make only "mail" noindex, because for some reason the software has SEO features that could end up harming our overall goals.
Create a robots.txt file in the subdomain's document root with the following content:
User-agent: *
Disallow: /

How to remove folder and Its child pages from Google Search Index

I am redesigning my site, and the new version is located in a subfolder of the website directory. Google has indexed the new site from that subfolder, which is affecting the search results for my live site.
Is there a specific way to remove the subfolder from Google's search index and search results?
For example, my live site is www.xyz.com,
and I am redesigning it at www.xyz.com/newsite.
Is there any way I can remove /newsite from Google's index and search results?
See http://www.robotstxt.org/robotstxt.html for the format.
Add this robots.txt file:
User-agent: *
Disallow: /newsite/
Or, better suited, use the URL removal tool in Google Webmaster Tools:
https://www.google.com/webmasters/tools/url-removal?hl=en&siteUrl=
Add your website URL after the = sign.
For example:
https://www.google.com/webmasters/tools/url-removal?hl=en&siteUrl=http://www.techplayce.com/
Yes, by uploading a robots.txt file to your site's root directory:
User-agent: *
Disallow: /newsite/
Add this code. If you have a WordPress site, you can instead install a plugin for robots.txt.
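A quick way to confirm that Disallow: /newsite/ blocks only the redesign folder is to evaluate it with Python's urllib.robotparser. This is a sketch; the individual page names are made up for illustration. It also shows a detail worth knowing: the rule is matched as a path prefix, so /newsite without the trailing slash is not covered (Google documents the same prefix behavior for such rules).

```python
from urllib.robotparser import RobotFileParser

rules = RobotFileParser()
rules.parse(["User-agent: *", "Disallow: /newsite/"])

# Disallow values are compared as path prefixes.
urls = [
    "http://www.xyz.com/",                       # live site
    "http://www.xyz.com/about.html",             # live site
    "http://www.xyz.com/newsite/",               # redesign folder
    "http://www.xyz.com/newsite/products.html",  # page inside the redesign folder
    "http://www.xyz.com/newsite",                # no trailing slash: "/newsite/" is not a prefix of it
]
for url in urls:
    print(url, rules.can_fetch("Googlebot", url))
```

Writing Disallow: /newsite (without the slash) would also cover the bare /newsite URL, but as a prefix it would then match any other path starting with /newsite, such as /newsite-old.html.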

Robots.txt and sub-folders

Several domains are configured as add-ons to my primary hosting account (shared hosting).
The directory structure looks like this (primary domain is example.com):
public_html (example.com)
    _sub
        ex1 --> displayed as example-realtor.com
        ex2 --> displayed as example-author.com
        ex3 --> displayed as example-blogger.com
(The SO requirement to use "example" as the domain makes this harder to explain. For example, ex1 might point to plutorealty and ex2 to amazon, or to some other business I host. The point is that each ex# is a different company's website, so mentally substitute a normal, distinct name for each "example".)
Because these domains (ex1, ex2, etc.) are add-on domains, they are accessible in two ways (ideally, the second method is known only to me):
(1) http://example1.com
(2) http://example.com/_sub/ex1/index.php
Again, example1.com is a website/domain name completely unrelated to example.com.
QUESTIONS:
(a) How will the site be indexed by search engines? Will both (1) and (2) show up in search results? (It is undesirable for method 2 to show up in Google.)
(b) Should I put a robots.txt in public_html that disallows each folder in the _sub folder? Eg:
User-agent: *
Disallow: /_sub/
Disallow: /_sub/ex1/
Disallow: /_sub/ex2/
Disallow: /_sub/ex3/
(c) Is there a more common way to configure add-on domains?
This robots.txt would be sufficient; you don’t have to list anything that comes after /_sub/:
User-agent: *
Disallow: /_sub/
This disallows bots (those that honor robots.txt) from crawling any URL whose path starts with /_sub/. But it doesn’t necessarily stop them from indexing the URL itself (e.g., listing it in their search results).
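The claim that the single Disallow: /_sub/ line is enough can be checked with Python's urllib.robotparser, which compares each rule as a path prefix, as described above. This is a sketch; the page names under each folder are made up.

```python
from urllib.robotparser import RobotFileParser

def rules_from(lines):
    """Build a RobotFileParser from a list of robots.txt lines."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser

# The one-line version from this answer and the longer list from the question.
short_rules = rules_from(["User-agent: *", "Disallow: /_sub/"])
long_rules = rules_from(["User-agent: *", "Disallow: /_sub/", "Disallow: /_sub/ex1/",
                         "Disallow: /_sub/ex2/", "Disallow: /_sub/ex3/"])

urls = [
    "http://example.com/",                    # primary site
    "http://example.com/_sub/ex1/index.php",  # add-on path
    "http://example.com/_sub/ex2/",
    "http://example.com/_sub/ex3/about.php",
]
for url in urls:
    # Both files give the same answer, because "/_sub/" is already a prefix of every add-on path.
    print(url, short_rules.can_fetch("Googlebot", url), long_rules.can_fetch("Googlebot", url))
```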
Ideally, you would redirect from http://example.com/_sub/ex1/ to http://example1.com/ with HTTP status code 301. How to do that depends on your server (for Apache, you could use a .htaccess file). That way, everyone ends up on the canonical URL for your site.
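On Apache, the redirect itself would normally live in .htaccess (for example with mod_rewrite or mod_alias). As a server-neutral illustration of the logic only, here is a hypothetical Python WSGI app: the CANONICAL mapping (only ex1 -> example1.com, taken from the question) and the 404 fallback are assumptions, not part of the original setup.

```python
# Hypothetical mapping from add-on folders under /_sub/ to their canonical domains.
CANONICAL = {
    "ex1": "http://example1.com",
}

def redirect_app(environ, start_response):
    """Minimal WSGI app: 301-redirect /_sub/<folder>/<rest> to <canonical domain>/<rest>."""
    path = environ.get("PATH_INFO", "/")
    # "/_sub/ex1/dir/page.php" -> ["", "_sub", "ex1", "dir/page.php"]
    parts = path.split("/", 3)
    if len(parts) >= 3 and parts[1] == "_sub" and parts[2] in CANONICAL:
        rest = parts[3] if len(parts) == 4 else ""
        location = CANONICAL[parts[2]] + "/" + rest
        query = environ.get("QUERY_STRING", "")
        if query:
            location += "?" + query  # keep the query string on the new URL
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]
    # Placeholder: everything else would be served by the normal site.
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

A request for /_sub/ex1/index.php?a=1 gets a 301 with Location: http://example1.com/index.php?a=1. Using 301 (permanent) rather than 302 is what tells search engines to treat the add-on domain as the canonical address.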
Do not use multi-site setups with Google: ranking effects also carry over to the main domain, especially if black-hat or spam sites are generated in subdirectories.
My suggestion: if you need important sites in subfolders, set all subdomains to noindex.
robots.txt:
User-agent: *
Disallow: /_sub/
Disallow: /_sub/ex1/
Disallow: /_sub/ex2/
Disallow: /_sub/ex3/

How can I prevent search engines from crawling a subdomain on my website?

I have cPanel installed on my website.
I went to the Domains section in cPanel.
I clicked on Subdomains.
I assigned the subdomain name (e.g. personal.mywebsite.com).
It also asked me to assign a document root folder. I assigned mywebsite.com/personal.
If I create a robots.txt in my website root (e.g. mywebsite.com) with
User-agent: *
Disallow: /personal/
will it also block personal.mywebsite.com?
What should I do?
Thanks.
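If you want to block URLs on personal.example.com, what counts is the file served at http://personal.example.com/robots.txt (or at https://personal.example.com/robots.txt if the subdomain uses HTTPS).
It doesn’t matter how your server organizes folders in the back end; it only matters which robots.txt is available at that URL. With the document root you assigned, that is a robots.txt file inside the personal folder. The Disallow: /personal/ rule in the main robots.txt only applies to URLs under example.com/personal/, not to personal.example.com.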
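To make the per-host rule concrete, here is a small Python sketch that, like a crawler, first picks the robots.txt of the URL's own host and only then checks the path. The file contents in ROBOTS_BY_HOST are assumptions for illustration; the hostnames follow the answer's example.com naming.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

def rules_from(text):
    """Build a RobotFileParser from robots.txt text (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

# Assumed contents of the file each host serves at /robots.txt.
ROBOTS_BY_HOST = {
    # robots.txt in the main site's root, served as example.com/robots.txt
    "example.com": rules_from("User-agent: *\nDisallow: /personal/\n"),
    # robots.txt inside the personal folder, served as personal.example.com/robots.txt
    "personal.example.com": rules_from("User-agent: *\nDisallow: /\n"),
}

def can_crawl(url, agent="Googlebot"):
    """Check a URL against the robots.txt of its own host, as crawlers do."""
    rules = ROBOTS_BY_HOST.get(urlsplit(url).hostname)
    if rules is None:
        return True  # host serves no robots.txt: nothing is disallowed
    return rules.can_fetch(agent, url)

print(can_crawl("http://example.com/personal/page.html"))  # blocked by the main file
print(can_crawl("http://personal.example.com/page.html"))  # blocked only by the subdomain's own file
```

If the "personal.example.com" entry were removed, the second URL would become crawlable, even though the main file still contains Disallow: /personal/.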

How to block all subdomains of a site from search engines?

I have 300+ subdomains for my site, and I need to block them from being indexed by search engines.
I saw this robots.txt code:
User-agent: *
Disallow: /
But I would need to do this for every subdomain. Is there an easier way to do it with a single robots.txt file in the root of the main domain?
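A single static robots.txt in the main domain's root cannot cover the subdomains, because crawlers request /robots.txt separately on each host. What is possible, assuming all subdomains are served by the same application (for example via a wildcard DNS record and a catch-all virtual host), is to generate robots.txt per request from the Host header. The same idea would also fit the cPanel case at the top, where secondary.com and secondary.maindomain.com usually share one document root, so a static file there applies to both hosts. Below is a sketch as a Python WSGI app; CANONICAL_HOSTS is a hypothetical list of hosts that should stay crawlable.

```python
# Hypothetical: the only hosts that should stay crawlable.
CANONICAL_HOSTS = {"example.com", "www.example.com"}

ALLOW_ALL = b"User-agent: *\nDisallow:\n"    # empty Disallow = nothing blocked
BLOCK_ALL = b"User-agent: *\nDisallow: /\n"  # block every path on this host

def robots_app(environ, start_response):
    """WSGI app that serves /robots.txt depending on the requested Host header."""
    if environ.get("PATH_INFO") != "/robots.txt":
        # Placeholder: a real setup would hand other paths to the normal site.
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    # Drop any port and normalize case, e.g. "Shop.Example.com:443" -> "shop.example.com".
    host = environ.get("HTTP_HOST", "").split(":")[0].lower()
    body = ALLOW_ALL if host in CANONICAL_HOSTS else BLOCK_ALL
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

Every one of the 300+ subdomains then gets the blocking file without a separate copy per folder. For a quick local check, wsgiref.simple_server.make_server from the standard library can serve this app.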