I am just a learner in the field of SEO, and I have a main domain and an addon domain. Both have separate websites. Consider main.com as my main domain and addon.com as my addon domain, which points to a subdirectory called "addon".
I can access addon.com in the following three ways:
addon.com
main.com/addon
addon.main.com
Are these URLs indexed separately by search engines? If so, how can I prevent this?
Do search engines treat main.com/addon as a page of main.com?
I am not sure whether I need to worry about all this or just leave it as it is. I searched Google but couldn't find a clear answer.
It may be too late to answer, but it may benefit others.
The primary domain and a subdomain or addon domain will not be linked together by search engines automatically, unless you link them yourself, purposefully or inadvertently. The exception is when all of the following conditions are true:
Your web root (normally public_html) has no index page.
Directory indexing of your web root is enabled, eventually exposing/linking your sub-folder - the one attached to your addon domain - to Google and the entire world.
In that scenario a robots.txt solution is not recommended, because search engines may ignore robots.txt rules.
Google will only index pages that are linked to or listed in a sitemap. You can stop addon.main.com or main.com/addon from being indexed by using noindex tags:
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
or by disallowing it in robots.txt.
Search engines will consider main.com/addon a page of main.com. If the sites are completely separate, I'd recommend using a separate domain (preferably a keyword-rich domain), but it's really up to you.
You have three domain names serving the same content, and each of them returns a 200 OK HTTP code, so they will look like duplicates of the same content. It would be better if there were a canonical tag on every page.
The best option would be to create a redirect in the subdomains panel in cPanel so that at least addon.main.com redirects to addon.com.
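A minimal .htaccess sketch of that redirect (assuming an Apache/cPanel host, with the file placed in the addon folder; addon.com is just the placeholder name from the question):
RewriteEngine On
# Send any request that did not arrive on addon.com (for example via
# addon.main.com or main.com/addon) to the same path on addon.com
RewriteCond %{HTTP_HOST} !^(www\.)?addon\.com$ [NC]
RewriteRule ^(.*)$ http://addon.com/$1 [R=301,L]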
Then you can add a robots.txt at the root of the primary domain containing:
User-agent: *
Disallow: /addon/
so that no robot will visit main.com/addon.
Google gives less weight to a site hosted on a subdomain of another domain. That is very bad for SEO.
If you are hosting with SEO in mind and love the convenience of cPanel, then forget about hosting domains as addon domains.
#Vasanthan R.P.
It's an excellent question, often overlooked by SEO professionals. +1 for you.
I am doing some research on canonical pages on our site.
Does Google create two index entries in this case:
http://www.foo.com/folder/index.html
http://www.foo.com/folder/
Or does it only index one of the above?
I am curious if I need to add a rel="canonical" or if I am just overthinking this simple idea.
After some research: it depends on the web server.
In our case it was a Sun ONE web server, where you could hit both foo.com/ and foo.com/index.jsp.
Even though these pulled up the same content, they are two different URLs, and Google saw them as two separate pages with duplicate content. This was hurting our SEO.
The fix was to modify the web server to automatically redirect /index.jsp pages to /.
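Purely as an illustration (our server was Sun ONE, not Apache), roughly the same redirect in Apache mod_rewrite syntax would look like:
RewriteEngine On
# 301-redirect any request for .../index.jsp to the bare directory URL
RewriteRule ^(.*/)?index\.jsp$ /$1 [R=301,L]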
So yes, Google will index any page that you can browse to in your browser, unless it's blocked in your robots.txt or you are explicitly telling Google not to index it in some fashion.
If my website only responds to www.example.com, and not example.com, does this affect search rankings at all? I haven't found anything to confirm or deny this for any major search engine, and I'm curious.
I was reading an article on this a while back from Scott Guthrie that relates to the IIS SEO Toolkit. The main points are as follows:
4 Really Common SEO Problems Your Sites Might Have
Below are 4 really common scenarios that can cause your site to inadvertently expose multiple URLs for the same content. When this happens external sites linking to yours will end up splitting their page links across multiple URLs - and as a result cause you to have a lower page ranking with search engines than you deserve.
SEO Problem #1: Default Document
IIS (and other web servers) supports the concept of a "default document". This allows you to avoid having to explicitly specify the page you want to serve at either the root of the web-site/application, or within a sub-directory. This is convenient - but means that by default this content is available via two different publicly exposed URLs (which is bad). For example:
http://scottgu.com/
http://scottgu.com/default.aspx
SEO Problem #2: Different URL Casings
Web developers often don’t realize URLs are case sensitive to search engines on the web. This means that search engines will treat the following links as two completely different URLs:
http://scottgu.com/Albums.aspx
http://scottgu.com/albums.aspx
SEO Problem #3: Trailing Slashes
Consider the below two URLs – they might look the same at first, but they are subtly different. The trailing slash creates yet another situation that causes search engines to treat the URLs as different and so split search rankings:
http://scottgu.com
http://scottgu.com/
SEO Problem #4: Canonical Host Names
Sometimes sites are set up so that they respond both with a leading "www" hostname prefix and with just the hostname itself. This causes search engines to treat the URLs as different and split search ranking:
http://scottgu.com/albums.aspx/
http://www.scottgu.com/albums.aspx/
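The article's fixes use the IIS URL Rewrite extension. As a rough sketch only (see the full article for the exact rules), the kind of rule that handles problem #4 looks like this in web.config under <system.webServer>, with scottgu.com standing in for your own host:
<rewrite>
  <rules>
    <!-- 301-redirect requests for the bare host name to the canonical www host -->
    <rule name="Canonical Host Name" stopProcessing="true">
      <match url="(.*)" />
      <conditions>
        <add input="{HTTP_HOST}" pattern="^scottgu\.com$" />
      </conditions>
      <action type="Redirect" url="http://www.scottgu.com/{R:1}" redirectType="Permanent" />
    </rule>
  </rules>
</rewrite>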
Full article at http://weblogs.asp.net/scottgu/archive/2010/04/20/tip-trick-fix-common-seo-problems-using-the-url-rewrite-extension.aspx
Google treats www.example.com and example.com as two separate domains (since 'www' is technically a sub-domain). Neither is better than the other in terms of SEO, as long as you don't mix and match links - i.e. have some links point to example.com while others point to www.example.com.
If you don't have any redirects from one to the other, then links into the site (and so visitor traffic) may be split between the two sub-domains, effectively meaning you're competing with yourself in search engine rankings. It's probably a good idea to pick one (either example.com or www.example.com) then set up redirects on the other domain, and/or add canonical links to pages so that search engines know that the pages should be treated as the same site.
See here for more on canonical links in www vs non-www links.
When I searched for our web site on Google, I found three sites with the same content showing up. I always thought we were using only one site, www.foo.com, but it turns out we also have www.foo.net and www.foo.info with the same content as www.foo.com.
I know it is extremely bad to have the same content under different URLs, yet it seems we have been using three domains for years and I have not seen any penalty so far. What is going on? Is Google using a new policy like the one this blog advocates: http://www.seodenver.com/duplicate-content-over-multiple-domains-seo-issues/ ? Or is it OK to use a DNS redirect? What should I do? Thanks.
If you are managing the websites via Google Webmaster Tools, it is possible to specify the preferred domain.
However, the world of search engines doesn't stop with Google, so your best bet is to send a 301 redirect to your primary domain. For example:
www.foo.net should 301 redirect to www.foo.com
www.foo.net/bar should 301 redirect to www.foo.com/bar
and so on.
This will ensure that www.foo.com gets the entire score, rather than (potentially) a third of the score that you might get for link-backs (internal and external).
Look into canonical links, as documented by Google.
If your site has identical or vastly similar content that's accessible through multiple URLs, this format provides you with more control over the URL returned in search results. It also helps to make sure that properties such as link popularity are consolidated to your preferred version.
They explicitly state it will work cross-domain.
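For example, a minimal sketch (using the foo.* placeholder names from the question): a page on www.foo.net would point at its preferred copy on www.foo.com from within its <head>:
<!-- on http://www.foo.net/bar, telling search engines the preferred copy lives on foo.com -->
<link rel="canonical" href="http://www.foo.com/bar" />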
We're doing a whitelabel site which mustn't be indexed by Google.
Does anyone know a tool to check whether Googlebot will index a given URL?
I've put <meta name="robots" content="noindex" /> on all pages, so it shouldn't be indexed - however I'd rather be 110% certain by testing it.
I know I could use robots.txt, however the problem with robots.txt is as follows:
Our main site should be indexed, and it's the same application on IIS (ASP.NET) as the whitelabel site - the only difference is the URL.
I cannot vary robots.txt depending on the incoming URL, but I can add a meta tag to all pages from my code-behind.
You should add a Robots.txt to your site.
However, the only perfect way to prevent search engines from indexing a site is to require authentication. (Some spiders ignore Robots.txt)
EDIT: You would need to add a handler for robots.txt to serve different files depending on the Host header.
You'll need to configure IIS to send the Robots.txt request through ASP.Net; the exact instructions depend on the IIS version.
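If the URL Rewrite module is available, an alternative sketch (not the handler approach above; the hostname and file name are hypothetical) is a rewrite rule in web.config that serves a different robots file only on the whitelabel host:
<rewrite>
  <rules>
    <!-- Serve a disallow-everything robots file only for the whitelabel hostname -->
    <rule name="Whitelabel robots.txt" stopProcessing="true">
      <match url="^robots\.txt$" />
      <conditions>
        <add input="{HTTP_HOST}" pattern="^whitelabel\.example\.com$" />
      </conditions>
      <action type="Rewrite" url="robots-disallow.txt" />
    </rule>
  </rules>
</rewrite>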
Google Webmaster Tools (google.com/webmasters/tools) will (besides letting you upload a sitemap) do a test crawl of your site and tell you what was crawled, how it rates for certain queries, and what will and won't be crawled.
The test crawl isn't automatically included in Google's results. In any case, if you're trying to hide sensitive data from the prying eyes of Google, you cannot count on that alone: put some authentication in front of it, no matter what.
A few days ago we replaced our web site with an updated version. The original site's content was migrated to http://backup.example.com. Search engines do not need to know about the old site, and I do not want them to.
While we were in the process of updating our site, Google crawled the old version.
Now when using Google to search for our web site, we get results for both the new and old sites (e.g., http://www.example.com and http://backup.example.com).
Here are my questions:
Can I update the backup site's content with the new content? Then we could get rid of all the old content. My concern is that Google will lower our page ranking due to duplicate content.
If I prevent the old site from being accessed, how long will it take for the information to clear out of Google's search results?
Can I use a robots.txt disallow to block Google from the old web site?
You should probably put a robots.txt file on your backup site and tell robots not to crawl it at all. Google will obey the restrictions, though not all crawlers will. You might want to check out the options available to you at Google's Webmaster Central. Ask Google and see if they will remove the errant links from their data for you.
You can always use robots.txt on the backup.* site to disallow Google from indexing it.
Are the URL formats consistent enough between the backup and current site that you could redirect a given page on the backup site to its equivalent on the current one? If so, have the backup site send 301 Permanent Redirects to each of the equivalent pages on the site you actually want indexed. The redirecting pages should drop out of the index (after how much time, I do not know).
If not, definitely look into robots.txt as Zepplock mentioned. After setting up robots.txt, you can expedite removal from Google's index with their Webmaster Tools.
You can also add a rule to your scripts to redirect each old page to its new one with a 301 header.
Robots.txt is a good suggestion, but... Google doesn't always listen. Yeah, that's right, they don't always listen.
So, disallow all spiders, but... also put this in your header:
<meta name="robots" content="noindex, nofollow, noarchive" />
It's better to be safe than sorry. Meta commands are like yelling at Google "I DONT WANT YOU TO DO THIS TO THIS PAGE". :)
Do both, save yourself some pain. :)
I suggest you either add a noindex meta tag to all of the old pages or just disallow them in robots.txt; blocking them via robots.txt is the simplest way. One more thing: add a sitemap to the new site and submit it in Webmaster Tools, which will improve indexing of your new website.
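A minimal sitemap for the new site would look something like this (the URL below is just a placeholder):
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- one <url> entry per page of the new site -->
  <url>
    <loc>http://www.example.com/</loc>
  </url>
</urlset>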
Password-protect the web pages or directories that you don't want web spiders to crawl or index by putting password-protection directives in the .htaccess file (if one is present in your website's root directory on the server; otherwise create a new one and upload it).
Web spiders will never know that password and hence won't be able to index the protected directories or web pages.
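A minimal sketch of that protection (assuming Apache; the AuthUserFile path is hypothetical and must point at an htpasswd file you create, for example with the htpasswd tool):
# Ask for a username and password for everything in this directory,
# which also keeps crawlers out
AuthType Basic
AuthName "Private area"
AuthUserFile /home/youraccount/.htpasswd
Require valid-user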
You can block any particular URLs in Webmaster Tools; check that once. You can also block them using robots.txt. Remove the sitemap for your old backup site and put a noindex, nofollow tag on all of your old backup pages. I handled this situation for one of my clients too.