Can we stop Googlebot from crawling an old PDF URL? - seo

On my site there is a button linking to a PDF. Let's say the current PDF URL on the button is http://www.abc.come/wp-content/uploads/2016/09/xyz.pdf, and Googlebot has crawled this URL. A month later, the administrator uploads a new PDF from the admin panel, let's say http://www.abc.come/wp-content/uploads/2016/09/xyz-latest.pdf, and updates the URL on the button.
The issue is that Googlebot is still crawling the old URL with xyz.pdf and reporting a 404 in Webmaster Tools.
How can we make Googlebot stop crawling the old URL and crawl the new one?
Thanks.

Yup, you can.
In Webmaster Tools go to Google Index -> Remove URLs. Remove your URL there and then from your app. Works for me.

I had the same issue. My solution was an entry in .htaccess that returns a 410 ('Gone') status code for the old URL. After some time Google stops crawling it.
I have also read that Google will stop crawling a URL that returns 404, but on my site it keeps crawling 404 pages.
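To illustrate the 410 idea outside of .htaccess, here is a minimal Python sketch, not the poster's actual configuration. It is a tiny WSGI app that answers 410 Gone for paths that were removed on purpose. The names `RETIRED_PATHS`, `status_for` and `app` are hypothetical, the path is the one from the question, and treating every other path as existing is a simplification.

```python
# Hypothetical set of paths that were removed on purpose.
RETIRED_PATHS = {"/wp-content/uploads/2016/09/xyz.pdf"}


def status_for(path):
    """Return the HTTP status line for a request path."""
    if path in RETIRED_PATHS:
        # 410 tells crawlers the resource is gone for good.
        return "410 Gone"
    # Simplification: every other path is treated as existing.
    return "200 OK"


def app(environ, start_response):
    """Minimal WSGI application that reports the chosen status."""
    status = status_for(environ.get("PATH_INFO", ""))
    body = status.encode("ascii")
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]
```

To try it locally, `app` can be served with any WSGI server, for example `wsgiref.simple_server.make_server` from the standard library.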

Related

Google bots are indexing pages from the second domain on the server

Could you help me? Recently my websites on the server, vectorization-eu and pixsector-com, were updated to HTTPS. The problem is that Google bots, for some strange reason, are indexing pages from pixsector under the vectorization-eu domain. Vectorization-eu doesn't have an .htaccess file. Could this be the issue?
In Google Search Console, request removal of the URLs that do not exist.
Upload a sitemap to Google Search Console for each domain.

Advice regarding Google Webmaster 404 errors

I created a CMS for a website and integrated Google Analytics. The site changes its content every week (adding, editing, removing pages and URLs), and I rewrite the sitemap every time one of these actions occurs.
The problem is that Google's web crawlers detect a lot of 404 error pages.
What am I doing wrong?
Getting reports about 404s is perfectly normal, and there is generally no need to worry about them.
Check where Google finds those 404 URLs (you can see that in Search Console, formerly Webmaster Tools) and see if you can fix them. If you cannot, and you have great content, sooner or later you'll get better links.
What you could do additionally is create custom 404 pages, where you link to content on your site that's similar to the missing page (if it's possible to determine that), or that's popular on the site.
Also, if you feel that the page is for content that won't be coming back to the site, you can remove the URL from their index by using the Remove URL option.
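Since the sitemap is rewritten on every change, one way to see which URLs are likely to turn up as 404s is to compare the previous sitemap with the new one. The following is only a sketch under that assumption. `sitemap_urls` and `removed_urls` are hypothetical helper names, and the example.com URLs are placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the set of <loc> values found in a sitemap document."""
    root = ET.fromstring(xml_text)
    locs = root.findall("sm:url/sm:loc", SITEMAP_NS)
    return {loc.text.strip() for loc in locs if loc.text}


def removed_urls(old_xml, new_xml):
    """Return URLs listed in the old sitemap but missing from the new one."""
    return sorted(sitemap_urls(old_xml) - sitemap_urls(new_xml))


# Placeholder sitemaps for illustration only.
OLD_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/old-page</loc></url>
</urlset>"""

NEW_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/new-page</loc></url>
</urlset>"""

for url in removed_urls(OLD_SITEMAP, NEW_SITEMAP):
    print("removed since last sitemap:", url)
```

Each URL reported this way is a candidate for a redirect, a custom 404 page, or a removal request.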

Submitting sitemap.xml to Google via php

On sitemaps.org it says that it is possible to submit the sitemap.xml to the search engine via an HTTP request. However, I'm unable to find documentation on how to do this for Google. I'm only finding documentation on submitting it via Google Webmaster Tools.
Any ideas? Is this even possible?
You can ping the sitemap URL:
http://www.google.com/webmasters/sitemaps/ping?sitemap=URLOFSITEMAP.xml

Pinging Google sitemap after every new article submission?

You don't need to submit and resubmit sitemap.xml to search engines. You can declare it in your robots.txt file, and web crawlers will find it and crawl it frequently.
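As a rough illustration of the robots.txt approach, the sketch below shows a `Sitemap:` line and reads it back with Python's standard `urllib.robotparser` to confirm that it is declared the way a crawler would expect. The robots.txt content, the example.com URL and the `declared_sitemaps` helper are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content with a sitemap declaration.
ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap.xml
"""


def declared_sitemaps(robots_text):
    """Return the sitemap URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    # site_maps() (Python 3.8+) returns None when nothing is declared.
    return parser.site_maps() or []


print(declared_sitemaps(ROBOTS_TXT))
```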

pages indexing in google crawl error

I want to remove from Google the pages that I have removed from the server, or redirect them.
The pages that I have removed from the server are
www.mysite.com/id?=9898
and
www.mysite.com/pagename.html
The new pages are
www.mysite.com/pagename
So I removed the sitemap from Google, created a new one and uploaded it.
My problem now is that Google gives me crawl errors because of the removed pages, like www.sitename.com/contact.html, and only 2 pages are indexed now.
Why can't it see that I have removed these pages? When I search on Google, the removed pages still appear.
Better than uploading a new sitemap is to redirect from your old pages to your new ones. Here you can get information about a 301 redirect.
Your problem is that Google in fact cannot know that the pages www.blabalbal.com/id?234234 and www.blabalbal.com/speaking_url are the same page. So you have to tell Google. A good method is the redirect mentioned above.
You should fix this, because Google may crawl both pages as unique ones and conclude that these two pages are duplicate content, which is bad for your rankings.
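The redirect idea can be sketched as a simple lookup from old paths to new ones. This is a minimal Python illustration, not a server configuration. The `REDIRECTS` table and the `redirect_for` helper are hypothetical, and the paths are taken from the question.

```python
# Hypothetical mapping from removed paths to their replacements.
REDIRECTS = {
    "/pagename.html": "/pagename",
}


def redirect_for(path):
    """Return (301, new_path) if the old path has moved, else None."""
    target = REDIRECTS.get(path)
    if target is None:
        return None
    # 301 signals a permanent move, so crawlers can transfer the old
    # URL's indexing to the new one.
    return 301, target


print(redirect_for("/pagename.html"))
```

A web server or framework would then send the status together with a `Location` header pointing at the new path.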
You have to use the Disavow Tool from Google to get this done completely. Search Google for "Disavow Tool - Google" and open the first link, log in with your Google ID and follow the instructions. Use the same Google ID with which the sitemap was uploaded in Google Webmaster Tools.

Manually add sitemap located in s3 into google webmaster tools

I have an app running on Heroku.
I am using sitemap_generator to generate a sitemap and save it to S3.
I have added my sitemap location to robots.txt.
My questions are:
How can I know whether my sitemap has been successfully found by search engines like Google?
How can I monitor my sitemap?
If my sitemap were located on my app server, I could add it manually in Google Webmaster Tools for monitoring, because when I click "Test/Add sitemap" in Google Webmaster Tools, it defaults to the same server.
Thanks for your help.
I got it to work.
Google has something called cross submission: http://googlewebmastercentral.blogspot.com/2007/10/dealing-with-sitemap-cross-submissions.html
You might want to visit this blog as well:
http://stanicblog.blogspot.sg/2012/02/how-to-add-your-sitemap-file-located-in.html
Thanks for your help, yacc.
Let me answer your first two questions, one at a time (I'm not sure what you mean by 'how can I monitor my sitemap', so I'll skip it):
Manually submit a sitemap to Google
If you can't use the Google Webmaster form to submit your sitemap, use an HTTP GET request to notify Google of your new sitemap.
If your sitemap is located at https://s3.amazonaws.com/sitemapbucket/sitemap.gz, first URL-encode your sitemap URL (you can use this online URL encoder/decoder for that), then use curl or wget to submit your encoded URL to Google:
curl "www.google.com/webmasters/tools/ping?sitemap=https%3A%2F%2Fs3.amazonaws.com%2Fsitemapbucket%2Fsitemap.gz"
If your request is successful, you'll get a 200 response with a message like this:
... cut ...
<body><h2>Sitemap Notification Received</h2>
<br>
Your Sitemap has been successfully added to our list of Sitemaps to crawl.
... cut ...
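The URL-encoding step can also be done in code instead of an online encoder. Below is a minimal Python sketch that builds the same ping URL as the curl example. The `build_ping_url` helper is hypothetical, and the `http://` scheme on the endpoint is an assumption, since the curl command omits it. As far as I know, Google has since announced that it is retiring the sitemap ping endpoint, so check its current status before relying on it.

```python
from urllib.parse import quote

# Endpoint taken from the curl example; the scheme is an assumption.
PING_ENDPOINT = "http://www.google.com/webmasters/tools/ping"


def build_ping_url(sitemap_url):
    """Return the ping URL with the sitemap URL percent-encoded."""
    # safe="" also encodes ":" and "/", matching the curl example.
    return PING_ENDPOINT + "?sitemap=" + quote(sitemap_url, safe="")


print(build_ping_url("https://s3.amazonaws.com/sitemapbucket/sitemap.gz"))
```

The resulting string can be passed to curl or wget, or fetched with `urllib.request.urlopen`.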
Checking that Google knows about your new sitemap
Open Webmaster Tools and navigate to Site configuration -> Sitemaps; there you should see the sitemaps that you've submitted. It might take some time for a new sitemap to show up there, so check frequently.