Sitemap files on different domains - seo

I am creating multiple sitemap files for my website. The issue is that the sitemap files are hosted on a different file server from the website itself.
For example, my website's domain is www.example.com, but my sitemap index file and the other sitemap files reside on www.filestack.com.
My sitemap index file will look like this:
<sitemapindex xmlns="http://www.google.com/schemas/sitemap/0.84">
<sitemap>
<loc>
https://www.filestack.com/sitemap1.xml
</loc>
</sitemap>
</sitemapindex>
And my sitemap1.xml will contain entries like:
<url>
<loc>
https://www.example.com/test
</loc>
<lastmod>2017-09-04</lastmod>
<changefreq>weekly</changefreq>
</url>
Is it possible to set this up, and if so, how?

See the "Sitemaps & Cross Submits" section of the Sitemaps protocol (sitemaps.org).
You have to provide a robots.txt at https://www.example.com/robots.txt that
points to the external sitemap:
Sitemap: https://www.filestack.com/sitemap1.xml
(This sitemap may only contain URLs from https://www.example.com/.)
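
As a rough illustration of the robots.txt approach above, the following Python sketch lists the Sitemap entries in a robots.txt file that point at a different host than the site itself. The function name `external_sitemaps` and the sample robots.txt body are assumptions made for this example; it uses the standard library's `urllib.robotparser` (Python 3.8+ for `site_maps()`).

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def external_sitemaps(robots_txt, site_host):
    """Return Sitemap URLs declared in robots_txt whose host differs from site_host."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    declared = parser.site_maps() or []  # site_maps() returns None when there are no entries
    return [url for url in declared if urlsplit(url).hostname != site_host]


# Hypothetical robots.txt served at https://www.example.com/robots.txt
robots = """User-agent: *
Disallow:

Sitemap: https://www.filestack.com/sitemap1.xml
"""

print(external_sitemaps(robots, "www.example.com"))
```

Any URL this returns is a cross-submitted sitemap, so it should list only URLs from the site that serves the robots.txt.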

You can use either an XML sitemap or an HTML sitemap, according to Matt Cutts. It is not mandatory to use both. Although you can't submit an HTML sitemap to search engines, spiders can crawl your HTML sitemap and follow it to pages deeper in your site. Note, however, that search engines will not accept an XML sitemap hosted on a different server unless you prove ownership of the site it describes, for example through a Sitemap entry in that site's robots.txt.

Related

How to detect amazon sitemap

I am trying to scrape some products from amazon.com, but I can't find a sitemap in its robots.txt.
I tried
amazon.com/sitemap.xml
amazon.com/sitemap.xml.gz
amazon.com/sitemap1.xml.gz
amazon.com/sitemap1.xml
All of them turn up nothing.
I also tried a sitemap detector such as
https://seositecheckup.com/tools/sitemap-test
The result says Amazon doesn't have a sitemap.
Is that true, or is my approach wrong?
Look at robots.txt: you will see a sitemap link at the bottom, which returns "access denied".
These resources may be accessible only to robots (specific user agent, IP address, ...).

sitemaps on cross submits

I went through http://www.sitemaps.org/protocol.html for sitemaps and cross submits, but it did not clarify whether the following approach is correct:
I have two websites, one for desktop users and one for mobile users, which serve different content:
https://www.mainsite.com and https://mobile.site.com. Both domains point to the same physical root directory, and the domain in the URL changes based on the user's device.
I've placed a robots.txt file in this root directory with an entry for the sitemap_index.xml file:
sitemap: https://www.mainsite.com/sitemap_index.xml
The sitemap_index.xml file contains:
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>
https://www.mainsite.com/sitemap_desktop_www.xml
</loc>
<lastmod>2015-09-04</lastmod>
</sitemap>
<sitemap>
<loc>
https://mobile.site.com/sitemap_mobile_www.xml
</loc>
<lastmod>2015-05-22</lastmod>
</sitemap>
</sitemapindex>
Is this approach correct?
If a bot reads robots.txt and sitemap_index.xml for www.mainsite.com, will it ignore the mobile.site.com entry in sitemap_index.xml?
You have to add your two websites in Google Webmaster Tools as two separate properties; in each one you can then submit its XML sitemap, without needing robots.txt.
Google will then crawl and index each site and read the XML sitemap for each domain.
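
To see which sitemaps would need to go under which property, here is a minimal Python sketch that groups the entries of a sitemap index by host. The helper name `sitemaps_by_host` is made up for this example, and the index content is copied from the question. It only shows how the entries split across hosts; it says nothing about how a particular crawler treats cross-host entries.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from urllib.parse import urlsplit

# Namespace used by the sitemap index in the question
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemaps_by_host(index_xml):
    """Group the <loc> URLs of a sitemap index by their host name."""
    root = ET.fromstring(index_xml)
    grouped = defaultdict(list)
    for loc in root.findall("sm:sitemap/sm:loc", NS):
        url = (loc.text or "").strip()  # the question's <loc> values are wrapped in newlines
        grouped[urlsplit(url).hostname].append(url)
    return dict(grouped)


index = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>
https://www.mainsite.com/sitemap_desktop_www.xml
</loc>
<lastmod>2015-09-04</lastmod>
</sitemap>
<sitemap>
<loc>
https://mobile.site.com/sitemap_mobile_www.xml
</loc>
<lastmod>2015-05-22</lastmod>
</sitemap>
</sitemapindex>"""

for host, urls in sitemaps_by_host(index).items():
    print(host, urls)
```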

Google isn't indexing all the Pages in my Site

On my website I use jQuery to dynamically replace the contents of a div (my primary content), so that pages do not reload when someone clicks a link; new content is loaded into the div instead.
Google follows the links on my site, finds only #smth fragments, and does not index the pages. What should I do so that Google indexes my other pages?
Any thoughts?
You can add a sitemap.xml file that follows the Sitemaps protocol to the root of your website (or to another location specified in robots.txt). The Sitemaps protocol lets you inform search engines about the URLs on a website that are available for crawling (wiki).
An example sitemap (from the wiki article referenced above) looks like this:
<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<url>
<loc>http://example.com/</loc>
<lastmod>2006-11-18</lastmod>
<changefreq>daily</changefreq>
<priority>0.8</priority>
</url>
</urlset>
A search engine's crawler can then fetch your sitemap.xml and crawl the locations it lists.
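
If the list of pages is only known at run time, a file like the one above can be generated rather than written by hand. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function name `build_sitemap` and the page list are assumptions for illustration, not part of the original answer.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build a sitemap document from an iterable of (loc, lastmod) pairs."""
    # Serialize the sitemap namespace as the default xmlns instead of an ns0: prefix
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages; each must be reachable at a real URL, not only via a #fragment
pages = [
    ("http://example.com/", "2006-11-18"),
    ("http://example.com/about", "2006-11-18"),
]
print(build_sitemap(pages))
```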
I found out that the answer is to add a ! after the # in your hash URLs (#!) and to configure the server to send Google a snapshot of the page.

Will Googlebot automatically attempt to index sitemap.xml?

Will Googlebot automatically attempt to index sitemap.xml if my sitemap.xml file wasn't submitted to Google? For example, will Googlebot attempt to index http://www.example.com/sitemap.xml if by chance the file is there?
Google's documentation says to submit the sitemap, but what Googlebot actually does is a separate question.
http://support.google.com/sites/bin/answer.py?hl=en&answer=100283
A sitemap file can have any name and path, so I don't think Google will look for it unless it is explicitly specified in robots.txt, using its full URL:
User-agent: *
Sitemap: http://www.example.com/sitemap.xml

seo question: where should my blog sitemap.xml live

We have a blog in a subdirectory of the main URL:
http://www.domain.com/blog/
The blog runs on WordPress, and we are using Google Sitemap Generator to create the XML file.
We have an index of all of our sitemaps in the main sitemap.xml, which points to many sitemaps.
From an SEO standpoint, would it be best to link directly to the sitemap under the blog directory,
e.g. http://www.domain.com/blog/sitemap.xml
or should we run a daily cron job to copy the file to the main domain's directory,
e.g. http://www.domain.com/sitemap_blog.xml
which would be linked from the main index with the other sitemaps?
What is the best way from an SEO standpoint?
It doesn't matter where the sitemap is. You will want to register its location with the search engines that you want to find it. The main thing, though, is to link to your sitemap location in the robots.txt file using the following line:
Sitemap: <sitemap_location>
Your robots.txt file should be in your domain's root.
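
As a small sketch of the two points above, the Python helpers below derive the root robots.txt URL for a site and format the Sitemap line. Both function names are hypothetical and exist only for this example, which assumes the sitemap location is given as a full URL.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(any_url_on_site):
    """Return the URL of robots.txt at the root of the host serving any_url_on_site."""
    parts = urlsplit(any_url_on_site)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def sitemap_directive(sitemap_url):
    """Format a robots.txt Sitemap line; rejects values that are not absolute URLs."""
    parts = urlsplit(sitemap_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError("Sitemap location must be a full URL")
    return f"Sitemap: {sitemap_url}"


blog_sitemap = "http://www.domain.com/blog/sitemap.xml"
print(robots_txt_url(blog_sitemap))
print(sitemap_directive(blog_sitemap))
```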