How to prevent a search engine from indexing a directory for a particular domain?

I have a web hosting package with 2 domains pointing to it. I've noticed that Google has indexed a directory belonging to one of the domains under the other domain. Is there a way of preventing this from happening?

You could try the Robots exclusion standard (robots.txt), but it is no guarantee.
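For example, a minimal robots.txt on the domain that should not expose the directory might look like this (the /otherdomain/ path is just a placeholder for whatever folder the second site lives in):
# Ask well-behaved crawlers not to crawl the other site's directory
User-agent: *
Disallow: /otherdomain/
Keep in mind that robots.txt only asks crawlers not to crawl; it does not remove pages that are already indexed, and blocked URLs can still appear in results if other sites link to them.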

Redirect all pages of one of your domains to the other one. You can do that with .htaccess and mod_rewrite, similar to this:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
This would perform a 301 redirect (Moved Permanently) from example.com to www.example.com.
For SEO purposes you never want duplicate content (identical pages on different URLs); there should always be exactly one URL for your content, and all other possible URLs should redirect to it.

Updating your robots.txt will help going forward, but I think the question you should be asking is: how did Google know those pages were there?
First, you should ensure that visitors can't browse your site's directory listings (on a *nix server, your .htaccess should have something like Options -Indexes). If you had a public link anywhere that joined the two sites on a single domain, that could be how Google found it. If you are careful to keep your site clean and never link to files in the other docroot, there should be no problem hosting one domain out of a subdirectory of another domain.
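As a minimal sketch, the relevant .htaccess line (assuming your host allows overriding Options in .htaccess) would be:
# Disable automatic directory listings so crawlers can't discover files by browsing folders
Options -Indexes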
You can clear Google's index of those pages by using their Webmaster Tools. To identify yourself as the site's owner, you'll need to place a unique file (they generate it for you) in each of your document roots; then you can manually request updates to the parts of your site that they've indexed. This applies only to Google.
If you've been indexed by other search engines (and you probably have been if Google indexed you), you should try to figure out how they got there, fix the problem, move the second site to another folder (so those pages return 404 Not Found on your main domain) and then get the search engines to reindex.

If you are using Linux, then some additions to your .htaccess file would probably work, but the specifics would depend on your site setup.

Related

How to move opencart site http to https

I have already installed an SSL certificate on my OpenCart site, but while some pages work fine with https, the category pages do not. Do I need to change all the URLs in the database as well? I have already set https in the config file.
Some of these may not apply to your particular installation but in the interest of creating a comprehensive answer, I've tried to cover all the bases here:
Note: you might need to adjust the table names depending on your store's table prefix if they don't begin with oc_
Open config.php and admin/config.php and change all of the constant URL declarations to https - make sure to include HTTP_SERVER and HTTP_CATALOG.
In your admin panel go to System > Settings, click Edit, and in the Server tab set Use SSL to Yes.
In your database, update the store_url column in the oc_order table so that all links are https. This is important because updating orders can fail if the API attempts to access the http version of your site. You can use this query: UPDATE oc_order SET store_url = REPLACE(store_url, 'http:', 'https:')
If you have any hard-coded images and links in your description tables, you should replace those as well. SSL will still work, but the browser will show a mixed-content warning in the address bar. This includes oc_product_description, oc_category_description, and any other tables where you might have created HTML content.
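As a rough sketch of that cleanup, assuming the default oc_ prefix and using yourstore.com as a placeholder domain, a query like this would rewrite links inside product descriptions (back up your database first and repeat for the other description tables):
-- Hypothetical example: replace hard-coded http links in product descriptions
UPDATE oc_product_description
SET description = REPLACE(description, 'http://www.yourstore.com', 'https://www.yourstore.com');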
Same as above for your theme files. It's fairly common to find hard-coded http:// links and images in footer.tpl and header.tpl, for starters. You can simply browse your site to see if any of the pages are not showing the green lock icon in the browser and take it from there.
Another culprit that breaks https can be third-party extensions, which can exist both as files and, in OC2, as ocmods in the oc_modification table.
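To track those down, a query along these lines (assuming the standard OC2 schema, where the ocmod XML is stored in the xml column of oc_modification) should list modifications that still reference plain http:
-- Hypothetical helper query: find ocmods containing hard-coded http:// references
SELECT modification_id, name FROM oc_modification WHERE xml LIKE '%http://%';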
Finally, create a redirect in .htaccess to gracefully let traffic know that your pages can now be found on https. I've excluded robots.txt and any connections for the openbay routes because, based on experience, redirecting the eBay webhooks broke things, and they seem to be http only by default. I suspect this may be a shortcoming in how openbay handles those requests, or possibly a configuration issue, but I was unable to find a workaround that didn't break openbay, so for now I'd recommend leaving those requests untouched. I am using this in .htaccess:
RewriteCond %{HTTPS} off
RewriteCond %{REQUEST_URI} !/robots\.txt$
RewriteCond %{QUERY_STRING} !^route=ebay/openbay/*
RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
That should do it!

Redirecting old unused domain to new domain will help to increase Google authority?

I had been blogging for about five years, but last year I stopped and deleted the domain's content. (Over the last year all of its pages were removed from the index.)
Now I have purchased a new domain, BlogTechie, and I am planning to 301 redirect the old domain to the new one.
Will this help me gain SEO authority in Google, or should I start from scratch without worrying about the old domain?
I am also adding settings in Webmaster Tools to inform Google of the change.
SEOs attribute a large portion of most search engines' ranking algorithms to link-based factors. It's possible there are still old links to your pages out there on other websites. If you still own the old domain, you can capitalize on those links and boost your new domain's ranking with redirects.
If you know some of your older content's URLs, it might make sense to set up one-to-one redirects to the corresponding new pages. If you're using Apache, you can do this with an .htaccess file:
RewriteEngine On
# Redirect a specific old page to its new equivalent
RewriteRule ^folder/oldpage\.php$ http://www.newdomain.org/newpage.php [R=301,L]
Anything remaining can redirect to the root:
RewriteRule ^(.*)$ http://newdomain.com/ [R=301,L]
Check out SEO Moz for more explanation on this: http://moz.com/learn/seo/redirection

sitemap for multiple domains of same site

Here is the situation: I have a website that can be accessed from multiple domains, let's say www.domain1.com, www.domain2.net, and www.domain3.com. The domains serve the exact same code base, but depending on the domain, different CSS, graphics, etc. are loaded.
Everything works fine, but now my question is: how do I deal with the sitemap.xml?
I wrote the sitemap.xml for the default domain (www.domain1.com), but what about when the site is accessed from the other domains? The content of the sitemap.xml will contain the wrong domain.
I read that I can add multiple sitemap files to robots.txt, so does that mean that I can, for example, create sitemap-domain2.net.xml and sitemap-domain3.com.xml (containing the links with the matching domains) and simply add them to robots.txt?
Somehow I have doubts that this would work, so I turn to you experts to shed some light on the subject :)
Thanks
You should use server-side code to send the correct sitemap based on the domain name for requests to /sitemap.xml
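If you'd rather not write application code, roughly the same thing can be sketched with mod_rewrite in .htaccess, analogous to the robots.txt approach described below; the sitemaps/ directory and the host-based file naming here are just assumptions for illustration:
# Serve a per-domain sitemap from a sitemaps/ folder named after the requested host
RewriteEngine On
RewriteCond %{DOCUMENT_ROOT}/sitemaps/%{HTTP_HOST}.xml -f
RewriteRule ^sitemap\.xml$ sitemaps/%{HTTP_HOST}.xml [L]
You would then create sitemaps/www.domain1.com.xml, sitemaps/www.domain2.net.xml and so on, each containing only that domain's URLs.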
Apache rewrite rules for /robots.txt requests
If you're using Apache as a webserver, you can create a directory called robots and put a robots.txt file in it for each website you run on that VHOST, using rewrite rules in your .htaccess file like this:
# URL rewrite solution for robots.txt for multiple domains on a single docroot
# Conditions: the request is not an existing directory or file,
# and a robots file specific to the requested host exists
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{DOCUMENT_ROOT}/robots/%{HTTP_HOST}.txt -f
RewriteRule ^robots\.txt$ robots/%{HTTP_HOST}.txt [L]
NginX mapping for /robots.txt requests
When using NginX as a webserver (taking yourdomain1.tld and yourdomain2.tld as example domains), you can achieve the same goal as above with the following map block (place this outside your server directive):
map $host $robots_file {
    default            /robots/default.txt;
    yourdomain1.tld    /robots/yourdomain1.tld.txt;
    yourdomain2.tld    /robots/yourdomain2.tld.txt;
}
This way you can use this variable in a try_files statement inside your server directive:
location = /robots.txt {
    try_files $robots_file =404;
}
Content of /robots/*.txt
After setting up the aliases to the domain-specific robots.txt files, add the sitemap to each of the robots files (e.g. /robots/yourdomain1.tld.txt) using this syntax at the bottom of the file:
# Sitemap for this specific domain
Sitemap: https://yourdomain1.tld/sitemaps/yourdomain1.tld.xml
Do this for all domains you have, and you'll be set!
You have to make sure the URLs in each XML sitemap match the domain or subdomain it is served from. But if you really want to, you can host all the sitemaps on one domain using "Sitemaps & Cross Submits".
I'm not an expert on this, but I had a similar situation: one domain with three subdomains.
Each subdomain has its own sitemap.xml. In my case each subdomain pointed to a different directory, but I'm pretty sure a sitemap.xml can be specified for each domain.
The easiest method I have found is to use an XML sitemap generator to create a sitemap for each domain name.
Place each sitemap.xml in the root directory of the corresponding domain or subdomain.
Go to Google Search Console and create a separate property for each domain name.
Submit the appropriate sitemap for each domain in Search Console. The submission should show as successful.
I'm facing a similar situation for a project I'm working on right now, and Google Search Central actually has the following answer:
If you have multiple websites, you can simplify the process of creating and submitting sitemaps by creating one or more sitemaps that include URLs for all your verified sites, and saving the sitemap(s) to a single location. All sites must be verified in Search Console.
So it seems that as long as you have added the different domains as your properties in Google Search Console, at least Google will know how to deal with the rest, even if you upload sitemaps for the other domains to only one of your properties in the Google Search Console.
For my use case, I then use server side code to generate sitemaps where all the dynamic pages with English content end up getting a location on my .io domain, and my pages with German content end up with a location on the .de domain:
<url>
  <loc>https://www.mydomain.io/page/some-english-content</loc>
  <changefreq>weekly</changefreq>
</url>
<url>
  <loc>https://www.mydomain.de/page/some-german-content</loc>
  <changefreq>weekly</changefreq>
</url>
And then Google handles the rest. See docs.

Proper 301 redirect for sites

I have a bit of a complex question. I am moving sites from
http://www.hikingsanfrancisco.com
to
http://www.comehike.com
The directory structures will not be the same throughout both sites. What are some of the best practice things I can do in order to retain most of my existing SEO strength in both the general domain and individual pages for searches related to the other pages?
Thank you,
Alex
If most of the URLs are staying the same and just the domain is changing, you could create an .htaccess file in the root folder at the old site with the following:
Options +FollowSymLinks
RewriteEngine on
RewriteRule (.*) http://www.comehike.com/$1 [R=301,L]
This will make hikingsanfrancisco.com/some-page go to comehike.com/some-page.
Otherwise, in that same .htaccess file, you could add a line for each redirect. So if hikingsanfrancisco.com/big-hikes is now comehike.com/even-bigger-hikes, the redirect would look like:
Redirect 301 /big-hikes http://www.comehike.com/even-bigger-hikes
That 301 tells Google to now consider the new URL correct.
To redirect the whole site no matter what to the new URL you could use this:
Redirect 301 / http://www.comehike.com/
A 301 redirect, page by page, is the best option (if you can use regular expressions it's easier; see the sketch after these points). Redirect each old page to a page on the new site with similar content.
Use the Change of Address tool in Google Webmaster Tools.
Try to contact some of your referrers to update the links that point to your site.
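As a sketch of the regular-expression approach mentioned above (the paths here are made up for illustration), a single RedirectMatch in the old site's .htaccess can map a whole section of old URLs onto the new structure:
# Hypothetical example: send every old /hikes/... page to the matching page under a new /outdoor/hiking/ section
RedirectMatch 301 ^/hikes/(.*)$ http://www.comehike.com/outdoor/hiking/$1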

Multiple domains for one site: alias or redirect?

I'm setting up a number of sites right now and many of them have multiple domains. The question is: do I alias the domain (with ServerAlias) or do I redirect the request?
Obviously ServerAlias is better/easier from a readability or scripting perspective. I have heard however that Google likes it better if everything redirects to one domain. Is this true? If so, what redirect code should be used?
Common vhost examples will have:
ServerName example.net
ServerAlias www.example.net
Is this wrong? Should the www also be a redirect, in addition to example2.net and www.example2.net? Or is Google smart enough to know that all these sites (or at least the www) are the same site?
UPDATE: Part of the reasoning for wanting aliases is that they are much faster. A redirect for a dialup user just because they did (or didn't) use the www adds significantly to initial page load.
UPDATE and ANSWER: Thanks Paul for finding the Google link which instructs us to "help your fellow webmasters by not perpetuating the myth of duplicate content penalties". Note, however, this only applies to content ON THE SAME SITE, exemplified in the article with "www.example.com/skates.asp?color=black&brand=riedell or www.example.com/skates.asp?brand=riedell&color=black". In fact, the article explicitly says "Don't create multiple pages, subdomains, or domains with substantially duplicate content."
Redirecting is better; then there is always one canonical domain for your content. I hear Google penalises multiple domains hosting the same content, but I can't find a source for that at the moment (edit: here's one article, but from 2005, which is ancient history in Internet years!) (not correct, see edit below).
Here are some mod_rewrite rules to redirect to a canonical domain:
# Redirect any non-empty host other than the canonical www.foobar.com to the canonical domain
RewriteCond %{HTTP_HOST} !^www\.foobar\.com$ [NC]
RewriteCond %{HTTP_HOST} !^$
RewriteRule ^/?(.*) http://www.foobar.com/$1 [L,R=permanent]
That checks that the host isn't the canonical domain (www.foobar.com) and checks that a domain has actually been specified, before deciding to redirect the request to the canonical domain.
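Alternatively, if you control the server configuration, a minimal sketch of the same idea at the vhost level (reusing the question's example domains, and assuming example.net is the canonical name) is to keep one real vhost and add a second, redirect-only vhost for every other name:
<VirtualHost *:80>
    # Redirect-only vhost: every non-canonical name gets a permanent redirect to the canonical domain
    ServerName www.example.net
    ServerAlias example2.net www.example2.net
    Redirect permanent / http://example.net/
</VirtualHost>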
Further Edit: Here's an article straight from the horse's mouth - it seems it's not as big an issue as you might think. Please read the article CAREFULLY, as it distinguishes between duplicate content on the same site (as in "www.example.com/skates.asp?color=black&brand=riedell and www.example.com/skates.asp?brand=riedell&color=black") and specifically says "Don't create multiple pages, subdomains, or domains with substantially duplicate content."
SSL certificates can also be an issue (wildcard certs mitigate this but are more expensive).
So if the cert is only bound to www.example.com, it won't validate for example.com. If this applies to your case, then carefully handling redirects and hyperlink references in your HTML and JavaScript is very important.
If they are entirely different domain names, you will want to redirect because otherwise cookies can not be shared between the two. If a user logs into your website at example1.com, they will need to log in again if they visit example2.com.
If they are just different subdomains (example.com vs www.example.com) this won't matter.
Server aliasing can cause problems with CGI session continuity: since cookies are attached to the domain they were served from, CGI scripts have to be carefully written so that they are aware of the aliasing, or all links within and into the site have to be relative, or both. Otherwise it is hard to avoid niggly little hard-to-debug problems caused by the browser serving different cookies depending on whether the user last entered your site through name.tld or www.name.tld.
Nowadays I doubt it matters much. If you see both entries in Google, then you know you're doing it wrong.
If half the links to your site refer to one URL and half refer to another, each URL is only going to get half the pagerank. Even if Google doesn't penalize your rank for having duplicate content, you're going to suffer.