Manually add a sitemap located in S3 to Google Webmaster Tools - amazon-s3

I have an app running in Heroku.
I am using sitemap_generator to generate a sitemap and save it to S3.
I have added the sitemap location to my robots.txt.
My questions are:
How can I know whether my sitemap has been found by search engines like Google?
How can I monitor my sitemap?
If my sitemap were located on my app server, I could add it manually in Google Webmaster Tools for monitoring. However, when I click "Test/Add sitemap" in Google Webmaster Tools, the URL defaults to the same server.
Thanks for your help.

I got it to work.
Google has something called cross submission: http://googlewebmastercentral.blogspot.com/2007/10/dealing-with-sitemap-cross-submissions.html
You might want to visit this blog as well:
http://stanicblog.blogspot.sg/2012/02/how-to-add-your-sitemap-file-located-in.html
Thanks for your help, yacc.

Let me answer your first two questions, one at a time (I'm not sure what you mean by 'how can I monitor my sitemap', so I'll skip it):
Manually submit a sitemap to Google
If you can't use the Google Webmaster Tools form to submit your sitemap, use an HTTP GET request to notify Google of your new sitemap.
If your sitemap is located at https://s3.amazonaws.com/sitemapbucket/sitemap.gz , first URL-encode your sitemap URL (you can use this online URL encoder/decoder for that), then use curl or wget to submit the encoded URL to Google. Quote the URL so the shell does not try to interpret the ? character:
curl "www.google.com/webmasters/tools/ping?sitemap=https%3A%2F%2Fs3.amazonaws.com%2Fsitemapbucket%2Fsitemap.gz"
If your request is successful you'll get a 200 answer with a message like this:
... cut ...
<body><h2>Sitemap Notification Received</h2>
<br>
Your Sitemap has been successfully added to our list of Sitemaps to crawl.
... cut ...
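The URL-encoding step above can also be done in code rather than with an online encoder. The following is a minimal sketch in Python, assuming the ping endpoint from the curl command above (the https:// scheme and the helper name build_ping_url are my own additions, not part of the original answer). It only builds the URL and does not send the request. Google has since announced that the sitemap ping endpoint is deprecated, so treat this as an illustration of the encoding step, not as something guaranteed to still be accepted.

```python
from urllib.parse import quote

# Endpoint copied from the curl command above; the https:// scheme is an assumption.
PING_ENDPOINT = "https://www.google.com/webmasters/tools/ping"


def build_ping_url(sitemap_url: str) -> str:
    """Return the ping URL with the sitemap location percent-encoded.

    build_ping_url is a hypothetical helper name used for illustration.
    """
    # safe="" makes quote() encode ':' and '/' as well, which is required
    # because the sitemap URL is passed as a query parameter value.
    encoded = quote(sitemap_url, safe="")
    return f"{PING_ENDPOINT}?sitemap={encoded}"


if __name__ == "__main__":
    print(build_ping_url("https://s3.amazonaws.com/sitemapbucket/sitemap.gz"))
```

The resulting string can be passed to curl or wget exactly as in the command above.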
Checking that Google knows about your new sitemap
Open Webmaster Tools and navigate to Site configuration->Sitemaps; there you should see the sitemaps that you've submitted. It might take some time for a new sitemap to show up there, so check back periodically.

Related

Indexing a URL in Google that has no referring link on the website

I have created a link such as http://example.com/s/an-arduino-product (in PHP), but no page on my website links to it. If someone types this link into the browser, they are redirected dynamically to a valid page (by reading information from the database). My question is: what extra work can I do to make Google automatically recognize these links and index them? I do not want to do it manually via "Google Search Console".
Add this link to your sitemap.xml and resubmit the sitemap. When you resubmit sitemap.xml, Google can discover and reindex the links listed in it.
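As a rough illustration of adding such a link to a sitemap, here is a minimal Python sketch that builds a sitemap.xml document following the sitemaps.org 0.9 schema. The function name build_sitemap is hypothetical, and the original question uses PHP, so this is only one way the same idea could look.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string containing one <url><loc> entry per URL.

    build_sitemap is a hypothetical helper name used for illustration.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url
    # encoding="unicode" returns a str instead of bytes.
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # URL taken from the question above.
    print(build_sitemap(["http://example.com/s/an-arduino-product"]))
```

In practice the list of URLs would come from the same database that drives the dynamic redirects, so that every unlinked page is listed.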

Advice regarding Google Webmaster 404 errors

I created a CMS for a website and integrated Google Analytics. The site changes its content every week (adding, editing, removing pages and URLs), and I rewrite the sitemap every time one of these actions occurs.
The problem is that the web crawlers from Google detect a lot of 404 error pages.
What am I doing wrong?
Getting reports about 404s is perfectly normal, and there is generally no need to worry about them.
Check where Google finds those 404 URLs (you can see that in Search Console, formerly Webmaster Tools) and see if you can fix the links. If you cannot, and you have great content, sooner or later you'll get better links.
What you could do additionally is create a custom 404 page that links to content on your site that's similar to the missing page (if it's possible to determine that) or that's popular on the site.
Also, if the page is for content that won't be coming back to the site, you can remove the URL from Google's index by using the remove URL option.

Submitting sitemap.xml to Google via php

sitemaps.org says that it is possible to submit the sitemap.xml to a search engine via an HTTP request. However, I'm unable to find documentation on how to do this for Google. I'm only finding documentation on submitting it via Google Webmaster Tools.
Any ideas, is this even possible?
You can ping the sitemap url :
http://www.google.com/webmasters/sitemaps/ping?sitemap=URLOFSITEMAP.xml
Pinging google sitemap after every new article submission?
You don't need to submit and resubmit sitemap.xml to search engines. You can list your sitemaps in your robots.txt file, and web crawlers will find them and crawl them regularly.
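To make the robots.txt approach concrete, here is a small Python sketch that parses a robots.txt body and lists the sitemaps it declares, which is a quick way to check that the Sitemap line is written in a form crawlers can read. The robots.txt content and the example.com URL are made up for illustration, and RobotFileParser.site_maps() requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the sitemap URL is a placeholder.
ROBOTS_TXT = """\
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
"""


def sitemaps_from_robots(robots_txt: str):
    """Return the sitemap URLs declared in a robots.txt body.

    sitemaps_from_robots is a hypothetical helper name used for illustration.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap lines are present.
    return parser.site_maps() or []


if __name__ == "__main__":
    print(sitemaps_from_robots(ROBOTS_TXT))
```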

pages indexing in google crawl error

I want to remove from Google the pages that I have removed from the server, or redirect them.
The pages that I have removed from the server are
www.mysite.com/id?=9898
and
www.mysite.com/pagename.html
The new pages are
www.mysite.com/pagename
So I removed the sitemap from Google, created a new one, and uploaded it.
My problem now is that Google gives me crawl errors because of the removed pages, like www.sitename.com/contact.html, and only 2 pages are indexed now.
Why can't Google see that I have removed these pages? When I search on Google, the removed pages still appear.
Better than uploading a new sitemap is to redirect from your old pages to your new ones. Here you can get information about a 301 redirect.
Your problem is that Google cannot know that the pages www.blabalbal.com/id?234234 and www.blabalbal.com/speaking_url are the same page, so you have to tell Google. A good method is the redirect mentioned above.
You should fix this, because Google may crawl both pages as separate ones and conclude that these two pages are duplicate content, which is bad for your rankings.
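The redirect idea can be sketched as a lookup from old paths to new ones that answers with status 301 and a Location header. The following Python example is only an illustration using the standard library's http.server: the mapping entries are modeled on the URLs in the question (the /contact target is an assumption), and on a real site this would normally be configured in the web server or application framework instead.

```python
from http.server import BaseHTTPRequestHandler
from typing import Optional

# Hypothetical old-path -> new-path mapping, modeled on the question's URLs.
# The "/contact" target is an assumption.
REDIRECTS = {
    "/pagename.html": "/pagename",
    "/contact.html": "/contact",
}


def redirect_target(path: str) -> Optional[str]:
    """Return the new path for a removed page, or None if there is none."""
    return REDIRECTS.get(path)


class RedirectHandler(BaseHTTPRequestHandler):
    """Answers old URLs with a permanent (301) redirect to the new URL."""

    def do_GET(self):
        target = redirect_target(self.path)
        if target is None:
            self.send_error(404)
            return
        # 301 tells crawlers that the move is permanent.
        self.send_response(301)
        self.send_header("Location", target)
        self.end_headers()


if __name__ == "__main__":
    print(redirect_target("/pagename.html"))
```

Keeping the mapping in one place makes it easy to add an entry every time a page is renamed.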
To remove the old URLs from Google's results completely, use the Remove URLs tool in Google Webmaster Tools (the Disavow Tool is meant for discounting inbound links, not for removing your own pages). Log in with the same Google ID that was used to upload the sitemap in Google Webmaster Tools and follow the instructions.

google indexing wrong page

When I google for cms tutorial, my website is the first link, which is obviously great. Unfortunately, it is showing the "Under construction" title I was using while I was updating my site. When you click on the link, you of course go to my website, but the title link still remains "CMS TUTORIAL SITE - Under construction" in Google instead of the name of the actual page.
How can I request Google to re-index that page? I already requested removal of the cache for that URL with the Google Webmaster Tools.
You could request to update your url here.