Sitemaps on cross submits - SEO

I went through http://www.sitemaps.org/protocol.html for sitemaps & cross submits, but did not find clarification on whether the following approach is correct:
I have two websites, one for desktop users and one for mobile users, each serving different content:
https://www.mainsite.com and https://mobile.site.com. Both domains point to the same physical root directory of the website, and the domain shown is switched based on the user's device.
I've placed a robots.txt file in this root directory with an entry pointing to a sitemap_index.xml file:
Sitemap: https://www.mainsite.com/sitemap_index.xml
The sitemap_index.xml file contains:
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.mainsite.com/sitemap_desktop_www.xml</loc>
    <lastmod>2015-09-04</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://mobile.site.com/sitemap_mobile_www.xml</loc>
    <lastmod>2015-05-22</lastmod>
  </sitemap>
</sitemapindex>
Is this approach correct?
If a bot reads the robots.txt and sitemap_index.xml for the domain www.mainsite.com, will it ignore the mobile.site.com entry in sitemap_index.xml?

You have to add your two websites in Google Webmaster Tools as two separate properties; in each property you can submit that site's XML sitemap without needing the robots.txt entry.
Google will then start crawling and indexing your websites and will pick up the XML sitemap for each domain.
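Regarding cross submits specifically: per the sitemaps.org protocol, a sitemap hosted on www.mainsite.com may list mobile.site.com URLs as long as mobile.site.com's robots.txt points to that sitemap. Since both domains in the question share one physical docroot, the single robots.txt is served for both hosts and already satisfies this. A sketch of what https://mobile.site.com/robots.txt needs to return:

User-agent: *
Disallow:

# Cross submit: pointing at the sitemap hosted on the other host
# proves that mobile.site.com consents to being listed there
Sitemap: https://www.mainsite.com/sitemap_index.xml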

Related

Sitemap files on different domains

I am creating multiple sitemap files for my website. The issue is that my sitemap files are hosted on a different server from my website.
For example, I have a website at www.example.com, but my sitemap index file and the other sitemap files reside on www.filestack.com.
My sitemap index file will look like:
<sitemapindex xmlns="http://www.google.com/schemas/sitemap/0.84">
<sitemap>
<loc>
https://www.filestack.com/sitemap1.xml
</loc>
</sitemap>
My sitemap1.xml, meanwhile, will contain:
<url>
  <loc>https://www.example.com/test</loc>
  <lastmod>2017-09-04</lastmod>
  <changefreq>weekly</changefreq>
</url>
Is it possible to link across domains like this, and how?
See Sitemaps & Cross Submits.
You have to provide a robots.txt at https://www.example.com/robots.txt which links to the external sitemap:
Sitemap: https://www.filestack.com/sitemap1.xml
(This sitemap may only contain URLs from https://www.example.com/.)
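For illustration, a complete sitemap1.xml hosted at https://www.filestack.com/sitemap1.xml could then look like this (the /test URL is the one from the question):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/test</loc>
    <lastmod>2017-09-04</lastmod>
    <changefreq>weekly</changefreq>
  </url>
</urlset>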
As Matt Cutts says, you can use either an XML sitemap or an HTML sitemap; it's not mandatory to use both. You can't submit an HTML sitemap to search engines, but spiders can crawl it and reach pages deeper in your site. You cannot, however, use an XML sitemap hosted on a different server unless you follow the cross-submit steps above.

How to remove a folder and its child pages from the Google search index

I am redesigning my site in a subfolder of the website directory, and Google has indexed the new site from that subfolder, which is affecting the search results of my live site.
Is there a specific way to remove the subfolder from Google's search index and search results?
e.g. my live site is www.xyz.com, and
I am redesigning it on www.xyz.com/newsite.
Is there any way to remove /newsite from Google's search index and results?
Refer to http://www.robotstxt.org/robotstxt.html and add this to your robots.txt file:
User-agent: *
Disallow: /newsite/
Or, better suited, use the URL removal tool in Google Webmaster Tools:
https://www.google.com/webmasters/tools/url-removal?hl=en&siteUrl=
and add your website URL after the =.
For example:
https://www.google.com/webmasters/tools/url-removal?hl=en&siteUrl=http://www.techplayce.com/
Yes, by uploading a robots.txt file to your site's root directory:
User-agent: *
Disallow: /newsite/
Add this code. If you have a WordPress site, you can install a plugin to manage robots.txt.
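A complementary option (not in the original answers): robots.txt only blocks crawling, so pages that are already indexed may linger. Serving a noindex header for everything under /newsite/ asks search engines to drop them. A sketch assuming Apache with mod_headers enabled, placed in /newsite/.htaccess:

# /newsite/.htaccess - ask search engines to drop these pages.
# Note: remove the robots.txt Disallow while this runs, or
# crawlers will never fetch the pages and see the header.
<IfModule mod_headers.c>
  Header set X-Robots-Tag "noindex, nofollow"
</IfModule>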

Google isn't indexing all the pages on my site

On my website I use jQuery to dynamically swap the primary content of a div, so that pages don't reload when someone clicks a link; the new content is loaded into the div instead.
Google crawls the links on my site, finds only fragment URLs like #smth, and does not index the pages. What should I do so that Google will index my other pages?
Any thoughts?
You can add a sitemap.xml file using the Sitemaps protocol to the root of your website (or another location specified in robots.txt). The Sitemaps protocol allows you to inform search engines about URLs on a website that are available for crawling (wiki).
An example sitemap (from referenced wiki above) looks like this:
<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
  <url>
    <loc>http://example.com/</loc>
    <lastmod>2006-11-18</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
The crawler of a search engine will visit your sitemap.xml and index the locations specified.
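If the sitemap is not at the web root, you can point crawlers to it from robots.txt; a minimal example:

User-agent: *
Disallow:

Sitemap: http://example.com/sitemap.xml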
I found out that the answer is to add ! after the # in your hash URLs (the "hashbang", #!) and configure the server to send an HTML snapshot of the page to Google.

Sitemap for multiple domains of the same site

Here is the situation: I have a website that can be accessed from multiple domains, say www.domain1.com, www.domain2.net, and www.domain3.com. The domains access the exact same code base, but depending on the domain, different CSS, graphics, etc. are loaded.
Everything works fine, but now my question is: how do I deal with the sitemap.xml?
I wrote the sitemap.xml for the default domain (www.domain1.com), but what about when the site is accessed from the other domains? The sitemap.xml will contain the wrong domain.
I read that I can add multiple sitemap files to robots.txt, so does that mean I can, for example, create sitemap-domain2.net.xml and sitemap-domain3.com.xml (containing the links with the matching domains) and simply add them to robots.txt?
Somehow I have doubts that this would work, so I turn to you experts to shed some light on the subject :)
Thanks
You should use server-side code to send the correct sitemap, based on the domain name, for requests to /sitemap.xml.
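As a sketch of that idea, assuming a Python/Flask app and per-host sitemap files such as sitemaps/www.domain1.com.xml (the directory and file names are illustrative):

from flask import Flask, request, send_from_directory

app = Flask(__name__)

@app.route("/sitemap.xml")
def sitemap():
    # Serve the sitemap that matches the requesting host,
    # e.g. sitemaps/www.domain1.com.xml; send_from_directory
    # responds with 404 if no file exists for this host.
    host = request.host.split(":")[0]  # strip any port
    return send_from_directory("sitemaps", f"{host}.xml",
                               mimetype="application/xml")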
Apache rewrite rules for /robots.txt requests
If you're using Apache as a webserver, you can create a directory called robots and put a robots.txt file there for each website you run on that VHOST, using rewrite rules in your .htaccess file like this:
# URL rewrite solution for robots.txt for multiple domains on a single docroot
# Not an existing directory:
RewriteCond %{REQUEST_FILENAME} !-d
# Not an existing file:
RewriteCond %{REQUEST_FILENAME} !-f
# And a robots file for this host exists (-f needs a filesystem path):
RewriteCond %{DOCUMENT_ROOT}/robots/%{HTTP_HOST}.txt -f
RewriteRule ^robots\.txt$ robots/%{HTTP_HOST}.txt [L]
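With this rule in place, a request for https://www.domain1.com/robots.txt is served from robots/www.domain1.com.txt; %{HTTP_HOST} holds the exact hostname the visitor used, so the file names must include any www prefix.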
NginX mapping for /robots.txt requests
When using nginx as a webserver (taking yourdomain1.tld and yourdomain2.tld as example domains), you can achieve the same goal as the Apache setup above with the following map variable (place this outside your server directive):
map $host $robots_file {
    default            /robots/default.txt;
    yourdomain1.tld    /robots/yourdomain1.tld.txt;
    yourdomain2.tld    /robots/yourdomain2.tld.txt;
}
This way you can use this variable in a try_files statement inside your server directive:
location = /robots.txt {
    # $robots_file already contains the /robots/ prefix from the map
    try_files $robots_file =404;
}
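With this in place, a request for https://yourdomain1.tld/robots.txt serves /robots/yourdomain1.tld.txt, and any host not listed in the map falls back to /robots/default.txt.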
Content of /robots/*.txt
After setting up the aliases to the domain-specific robots.txt-files, add the sitemap to each of the robots files (e.g.: /robots/yourdomain1.tld.txt) using this syntax at the bottom of the file:
# Sitemap for this specific domain
Sitemap: https://yourdomain1.tld/sitemaps/yourdomain1.tld.xml
Do this for all domains you have, and you'll be set!
You have to make sure the URLs in each XML sitemap stay within their own domain/subdomain. But if you really want, you can host all the sitemaps on one domain; see "Sitemaps & Cross Submits" in the protocol.
I'm not an expert on this, but I had a similar situation: one domain with three subdomains, where each subdomain contains its own sitemap.xml. In my case each subdomain had its own directory, but I'm fairly sure a sitemap.xml can be specified for each domain.
The easiest method I have found is to use an XML sitemap generator to create a sitemap for each domain name.
Place each /sitemap.xml in the root directory of the corresponding domain or subdomain.
Go to Google Search Console and create separate properties for each domain name.
Submit the appropriate sitemap for each domain in Search Console; each submission should report success.
I'm facing a similar situation for a project I'm working on right now, and Google Search Central actually has the following answer:
If you have multiple websites, you can simplify the process of creating and submitting sitemaps by creating one or more sitemaps that include URLs for all your verified sites, and saving the sitemap(s) to a single location. All sites must be verified in Search Console.
So it seems that as long as you have added the different domains as properties in Google Search Console, Google will know how to deal with the rest, even if you upload the sitemaps for the other domains to only one of your properties.
For my use case, I then use server-side code to generate sitemaps, where all the dynamic pages with English content get a location on my .io domain and the pages with German content get a location on the .de domain:
<url>
  <loc>https://www.mydomain.io/page/some-english-content</loc>
  <changefreq>weekly</changefreq>
</url>
<url>
  <loc>https://www.mydomain.de/page/some-german-content</loc>
  <changefreq>weekly</changefreq>
</url>
And then Google handles the rest. See docs.
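As an illustration of that generation step, a minimal Python sketch; the page list, language codes, and domains are stand-ins for whatever your data source provides:

# Minimal sketch: assign each page a domain by content language
PAGES = [
    ("page/some-english-content", "en"),
    ("page/some-german-content", "de"),
]
DOMAIN_BY_LANG = {
    "en": "https://www.mydomain.io",
    "de": "https://www.mydomain.de",
}

def build_sitemap(pages):
    entries = []
    for path, lang in pages:
        entries.append(
            "  <url>\n"
            f"    <loc>{DOMAIN_BY_LANG[lang]}/{path}</loc>\n"
            "    <changefreq>weekly</changefreq>\n"
            "  </url>"
        )
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            + "\n".join(entries)
            + "\n</urlset>")

print(build_sitemap(PAGES))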

SEO question: where should my blog sitemap.xml live?

We have a blog in a subdirectory of the main URL:
http://www.domain.com/blog/
The blog is run by WordPress, and we are using Google Sitemap Generator to create the XML file.
We have an index of all of our sitemaps in the main sitemap.xml, which leads to many sitemaps.
From an SEO standpoint, would it be best to link directly to the sitemap that sits under the blog directory,
e.g. http://www.domain.com/blog/sitemap.xml,
or should we run a daily cron job to copy the file to the main domain's directory,
e.g. http://www.domain.com/sitemap_blog.xml,
which would be linked from the main index with the other sitemaps?
What is the best way from an SEO standpoint?
It doesn't matter where the sitemap is, but you will want to register its location with the search engines you want to find it. The main thing, though, is to have a link to your sitemap location in the robots.txt file, using the following line:
Sitemap: <sitemap_location>
Your robots.txt file should be in your domain's root.
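Linking directly is fine: a sitemap at /blog/sitemap.xml may list URLs under /blog/, so the main index can simply reference it in place, with no cron copy needed. A sketch (sitemap_pages.xml is a stand-in name for your other sitemaps):

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.domain.com/sitemap_pages.xml</loc>
  </sitemap>
  <sitemap>
    <loc>http://www.domain.com/blog/sitemap.xml</loc>
  </sitemap>
</sitemapindex>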