Sitemap for multiple domains of the same site - SEO

Here is the situation: I have a website that can be accessed from multiple domains, let's say www.domain1.com, www.domain2.net, and www.domain3.com. The domains access the exact same code base, but depending on the domain, different CSS, graphics, etc. are loaded.
Everything works fine, but now my question is: how do I deal with the sitemap.xml?
I wrote the sitemap.xml for the default domain (www.domain1.com), but what about when the site is accessed from the other domains? The content of the sitemap.xml will contain the wrong domain.
I read that I can add multiple sitemap files to robots.txt, so does that mean that I can, for example, create sitemap-domain2.net.xml and sitemap-domain3.com.xml (containing the links with the matching domains) and simply add them to robots.txt?
Somehow I have doubts that this would work, so I turn to you experts to shed some light on the subject :)
Thanks

You should use server-side code to send the correct sitemap based on the domain name for requests to /sitemap.xml.
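One way to achieve this without application code (just a sketch, assuming you keep one pre-generated sitemap per domain in a sitemaps/ directory inside the shared docroot, named <host>.xml; that layout is an assumption, not part of the answer) is an Apache rewrite similar to the robots.txt approach below:
# Sketch: serve sitemaps/<host>.xml for requests to /sitemap.xml (assumed layout)
RewriteEngine On
RewriteCond %{DOCUMENT_ROOT}/sitemaps/%{HTTP_HOST}.xml -f
RewriteRule ^sitemap\.xml$ sitemaps/%{HTTP_HOST}.xml [L]
Each sitemaps/<host>.xml would then list the URLs with that domain's hostname.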

Apache rewrite rules for /robots.txt requests
If you're using Apache as a web server, you can create a directory called robots, put a robots.txt file in it for each website you run on that virtual host, and serve them with rewrite rules in your .htaccess file like this:
# URL rewrite solution for robots.txt for multiple domains on a single docroot
RewriteEngine On
# Not an existing directory
RewriteCond %{REQUEST_FILENAME} !-d
# Not an existing file
RewriteCond %{REQUEST_FILENAME} !-f
# The domain-specific robots file exists
RewriteCond %{DOCUMENT_ROOT}/robots/%{HTTP_HOST}.txt -f
RewriteRule ^robots\.txt$ robots/%{HTTP_HOST}.txt [L]
Nginx mapping for /robots.txt requests
When using nginx as a web server (taking yourdomain1.tld and yourdomain2.tld as example domains), you can achieve the same goal as the Apache approach above with the following map variable (place this outside your server block):
map $host $robots_file {
    default            /robots/default.txt;
    yourdomain1.tld    /robots/yourdomain1.tld.txt;
    yourdomain2.tld    /robots/yourdomain2.tld.txt;
}
You can then use this variable in a try_files statement inside your server block:
location = /robots.txt {
    try_files $robots_file =404;
}
Content of /robots/*.txt
After setting up the aliases to the domain-specific robots.txt files, add the sitemap to each of the robots files (e.g. /robots/yourdomain1.tld.txt) using this syntax at the bottom of the file:
# Sitemap for this specific domain
Sitemap: https://yourdomain1.tld/sitemaps/yourdomain1.tld.xml
Do this for all domains you have, and you'll be set!

You have to make sure the URLs in each XML sitemap match the domain or subdomain they belong to. But if you really want to, you can host all of the sitemaps on one domain using "Sitemaps & Cross Submits" (for example, by declaring each sitemap's location in the robots.txt of the domain whose URLs it contains).

I'm not an expert on this, but I had a similar situation.
In my case there is one domain with three sub-domains, and each sub-domain contains its own sitemap.xml.
My setup used a separate directory for each sub-domain, but I'm pretty sure a sitemap.xml can be specified for each domain either way.

The easiest method that I have found to achieve this is to use an XML sitemap generator to create a sitemap for each domain name.
Place each /sitemap.xml in the root directory of its domain or sub-domain.
Go to Google Search Console and create a separate property for each domain name.
Submit the appropriate sitemap for each domain in Search Console. The submission should report success.

I'm facing a similar situation for a project I'm working on right now, and Google Search Central actually has the following answer:
If you have multiple websites, you can simplify the process of creating and submitting sitemaps by creating one or more sitemaps that include URLs for all your verified sites, and saving the sitemap(s) to a single location. All sites must be verified in Search Console.
So it seems that as long as you have added the different domains as your properties in Google Search Console, at least Google will know how to deal with the rest, even if you upload sitemaps for the other domains to only one of your properties in the Google Search Console.
For my use case, I then use server-side code to generate sitemaps where all the dynamic pages with English content end up getting a location on my .io domain, and my pages with German content end up with a location on the .de domain:
<url>
  <loc>https://www.mydomain.io/page/some-english-content</loc>
  <changefreq>weekly</changefreq>
</url>
<url>
  <loc>https://www.mydomain.de/page/some-german-content</loc>
  <changefreq>weekly</changefreq>
</url>
And then Google handles the rest. See docs.

Related

.htaccess rewrite domain but keep directory structure and preserve URL in address bar

I have copied a Joomla site from one domain to a new domain.
I want to rewrite only the domain name and keep the directory structure.
And I want to keep the original URL in the address bar to preserve SEO ranking.
Joomla uses relative URLs, so the real domain name of the new server will not, as such, be invoked by Joomla.
How to do this in .htaccess on Apache?
"And I want to keep the original URL in the address bar to preserve SEO ranking."
That won't really help you. Just add proper 301 redirects and make sure you catch as many of the indexed URLs as possible with the redirect component within Joomla to prevent dead links (Google hates those and will penalize your domain for them). Also add a sitemap, upload it to Google Webmaster Tools, and ask Google to index it.
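As a rough sketch (olddomain.tld and newdomain.tld are placeholders, not taken from the question), a blanket 301 in the old domain's .htaccess that preserves the directory structure could look like this:
RewriteEngine On
# Redirect every path on the old domain to the same path on the new domain
RewriteCond %{HTTP_HOST} ^(www\.)?olddomain\.tld$ [NC]
RewriteRule ^(.*)$ https://www.newdomain.tld/$1 [R=301,L]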

Single robots.txt file for all subdomains

I have a site (example.com) and have my robots.txt set up in the root directory. I also have multiple subdomains (foo.example.com, bar.example.com, and more to come in the future) whose robots.txt will all be identical to that of example.com. I know that I can place a file at the root of each subdomain, but I'm wondering if it's possible to redirect the crawlers searching for robots.txt on any subdomain to example.com/robots.txt?
Sending a redirect header for your robots.txt file is not advised, nor is it officially supported.
Google's documentation specifically states:
Handling of robots.txt redirects to disallowed URLs is undefined and discouraged.
But the documentation does say that such redirects "will be generally followed". If you add your subdomains to Google Webmaster Tools and go to "Crawl > Blocked URLs", you can test your subdomains' robots.txt files that are 301 redirecting. It should come back as working.
However, with that said, I would strongly suggest that you just symlink the files into place so that each robots.txt file responds with a 200 OK at the appropriate URL. This is much more in line with the original robots.txt specification, as well as Google's documentation, and who knows exactly how Bing/Yahoo will handle it over time.
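If the subdomains have separate document roots and you would rather not manage symlinks, a sketch of an alternative (the paths here are made up) is to alias /robots.txt in each subdomain's VirtualHost to the one shared file, which still answers with a 200 OK:
# In each subdomain's <VirtualHost> block; adjust the path to your main docroot
Alias /robots.txt /var/www/example.com/robots.txt
You may also need to allow access to that path with a <Directory> block if it lies outside the subdomain's docroot.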

Multiple Domains to Display Content from Landing Pages on Another Domain

We have created a bunch of landing pages on a Joomla CMS system, such that the URL for each landing page is www.domain.com/page1.html, www.domain.com/page2.html, and so on. Of course page1.html isn't really an HTML file; it is a dynamic CMS page, just rewritten with .htaccess.
The goal is to have one of our other domains, something like www.uniquedomain1.com, show the content of www.domain.com/page1.html, and another domain like www.uniquedomain2.com show the content of www.domain.com/page2.html.
This needs to be search-engine friendly, so we can't use URL masking. Also, we can't use .htaccess redirects, as that actually changes the URL in the browser bar. We need to keep the www.uniquedomain1.com URL in the browser bar.
I tried Apache VirtualHost options without any luck. You can park a domain on a directory, but not on a URL.
I ended up parking the domains on one folder and then creating a PHP script to detect the host and use cURL to fetch the correct URL and deliver the content. This whole thing seems ridiculously overcomplicated, and of course cURL isn't the best option, but it is all we could get to work.
Any thoughts on how to do this, or a better solution?
You can use .htaccess rewrite rules to do it without performing a visible redirect.
Name the HTML files after the desired domains, like domain.tld.html, and do something like this in an .htaccess file:
RewriteEngine on
RewriteCond %{HTTP_HOST} ^(?:www\.)?([a-z0-9\.-]+\.[a-z]+) [NC]
RewriteRule ^$ /%1.html [L]
A quick test of this worked for two of my test (sub)domains, test.domain.tld and test2.domain.tld. Both were rewritten internally to the files named test.domain.tld.html and test2.domain.tld.html without modifying the URL.
You could also just use your PHP wrapper script to grab the content of each of the miscellaneous HTML files and output it.
If you renamed all of your HTML files (as in my previous suggestion) to be domain.tld.html, you could do it fairly easily. It might look something like:
<?php
// Include the static HTML file named after the requested host, e.g. domain.tld.html
require($_SERVER['SERVER_NAME'] . '.html');

Redirecting Pages: Names to Standard Address

I have WordPress installed in the root of a website, and recently enabled a custom permalink structure just for the sake of having good looking page URLs (only pages are used in this website, no posts at all — it's not a blog). Unfortunately this is causing some problems with other parts of the website, outside WordPress.
So I'd like to go the manual way and redirect URLs like /my-page to /?page_id=32, just for a selected set of pages. Is it possible to do that using the .htaccess file? What would the rules look like?
If you're redirecting pages from WordPress to other URLs, you can use .htaccess. But it's probably easier to use a plugin rather than edit .htaccess.
See the Redirection plugin (WordPress › Redirection « WordPress Plugins) to easily set up redirects and log redirects, errors, and more.
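If you do want to do it directly in .htaccess, a minimal sketch (using the slug and page ID from the question, one rule per page, placed before the standard WordPress rules) is an internal rewrite so the pretty URL keeps working without a visible redirect; /index.php?page_id=32 is equivalent to /?page_id=32 here:
RewriteEngine On
# Internally map the pretty URL to the WordPress page ID
RewriteRule ^my-page/?$ /index.php?page_id=32 [L]
Use [R=301,L] instead if you want a visible redirect rather than an internal rewrite.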

How to prevent a search engine from indexing a directory for a particular domain?

I have a web hosting package with two domains pointing to it. I've noticed on Google that it has indexed the directory of one of the domains for the other domain. Is there a way of preventing this from happening?
You could try the Robots Exclusion Standard, but there is no guarantee.
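A stronger option than robots.txt alone (this is a sketch added here, not part of the answer above) is to send an X-Robots-Tag: noindex header only when that directory is served under the unwanted host, e.g. in that directory's .htaccess using mod_setenvif and mod_headers (example.com is a placeholder):
# Flag requests arriving via the unwanted host name (placeholder)
SetEnvIfNoCase Host "^(www\.)?example\.com$" NOINDEX_HOST
# Tell search engines not to index responses served under that host
Header set X-Robots-Tag "noindex, nofollow" env=NOINDEX_HOST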
Redirect all pages of one of your domains to the other one. You can do that with .htaccess and mod_rewrite, similar to this:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
This would perform a 301 redirect (Permanently moved) from example.com to www.example.com.
For SEO purposes you never want to have duplicate content (identical pages on different URLs), there should always be exactly one URL for your content, all other possible URLs should redirect to that one.
Updating your robots.txt will help prevent the problem in the future, but I think the question you should be asking is: how did Google know those pages were there?
First, you should ensure that a user can't browse your site's directory listings (if your server runs Apache, your .htaccess should have something like Options -Indexes). And if you had a public link anywhere that joined the two sites on a single domain, that could be how Google found them. If you are careful to keep your site clean and never point to the files in the other docroot, there should be no problem hosting one domain off the subdirectory of another domain.
You can clear Google's index of those pages by using their Webmaster Tools. In order to identify yourself as the site's owner, you'll need to install a unique file (they create it for you) in the root directory of your various document roots, then you can manually update the parts of your site that they've indexed. This applies only to Google.
If you've been indexed by other search engines (and you probably have been if Google indexed you), you should try to figure out how they got there, fix the problem, move the second site to another folder (causing the pages to report 404 Not Found on your main domain), and then get the search engines to reindex.
If you are using Linux, then some additions to your .htaccess file would probably work, but the specifics would depend on your site setup.