A site I recently worked on used to be Joomla based and had a ton of articles in it, and the business itself is now completely different.
After clearing out the site over FTP and starting fresh, everything was finally finished. However, the site's rankings on Google are plagued by old pages which no longer exist. Google still seems to think these pages exist.
I was under the impression that after recrawling the site (at whatever time it saw fit), Google would recognise that those pages are now non-existent and drop them from its results.
It's driving me insane. There are hundreds of pages, so I can't put in removal requests for them all. Won't they ever be removed automatically?
It will take a while, but Google will eventually stop looking for those pages. It keeps trying for a little while on the assumption that their absence is an error and they will return. If you're not going to file removal requests, you will simply have to wait it out.
Make sure all the old pages return a 404 or 410 status. Once Googlebot has encountered a 404/410 status for a page multiple times, it will remove that page from the index.
I'd also suggest checking whether any of those pages have backlinks. If Googlebot keeps encountering backlinks to an outdated page, it may keep holding that page in the search index. If some pages do have valid backlinks, 301-redirect them to valid pages instead.
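If the site runs on Apache, one way to handle both cases is a few rules in .htaccess using mod_rewrite. This is only a rough sketch and the paths are made-up placeholders, not your real Joomla URLs:
RewriteEngine On
# Tell crawlers an old article is gone for good (410) - placeholder path
RewriteRule ^old-section/removed-article\.html$ - [G,L]
# 301-redirect an old URL that still has backlinks to its replacement - placeholder paths
RewriteRule ^old-section/popular-article\.html$ /new-section/replacement-article [R=301,L]
Anything that no longer exists and has no replacement can simply be left to return the default 404.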
Try the approach using Google Webmaster Tools described in the answer you may find here.
I am the developer of Infermap.com. We regularly monitor and work on our SEO and presence on Google SERPs. In the past 3-4 days we have seen a sudden, steep drop in the number of impressions on Google.
Can someone suggest possible reasons why this might happen and ways I can prevent it?
Also, I have submitted around 11k URLs to be indexed, of which only 1.5k have been indexed. What are the possible reasons for this?
(note: this question should probably be moved to Webmasters Stack Exchange)
It looks like your 11k new URLs have not been picked up as quality content by Google. You might even be cloaking: when I click on a result, I get completely different text on your site.
Ways to avoid it:
avoid cloaking
avoid adding similar-looking pages without unique content, e.g. make sure your pages are unique enough before publishing them (a rough way to check this is sketched after this list)
feed new content that looks alike gradually, e.g. start with 100 pages, wait a week or two, and add another 200. Once you are confident your pages are picked up well you can add everything at once.
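As an illustration of the "unique enough" check mentioned above, here is a minimal sketch in PHP. This is not anything Infermap actually runs; the 80% threshold and the file layout are assumptions to adapt:
<?php
// Flag a new page whose body text is too similar to an already-published page.
function isUniqueEnough(string $newText, array $publishedTexts, float $threshold = 80.0): bool
{
    foreach ($publishedTexts as $existingText) {
        similar_text($newText, $existingText, $percent); // $percent is set by reference
        if ($percent >= $threshold) {
            return false; // too close to an existing page
        }
    }
    return true;
}

$published = array_map('file_get_contents', glob('published_pages/*.txt'));
$candidate = file_get_contents('drafts/new-page.txt');

echo isUniqueEnough($candidate, $published)
    ? "Looks unique enough to publish\n"
    : "Too similar to an existing page - rework it before publishing\n";
Note that similar_text() is slow on large texts, so this is only workable as an offline pre-publish check, not something to run on every request.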
I have a website with over 400,000 pages. I have created 10 sitemaps with 40,000 links in each, generated dynamically with PHP, and submitted them in my Google Webmaster Tools account. I add 50-60 pages to the website daily and I don't want to have to create yet another sitemap after every 40,000 links.
The solution I have in mind is to dynamically generate a single sitemap that contains only the links to pages created within the last 30 days and resubmit it once a day (with a cron job). The problem is that pages created more than 30 days ago would then no longer be in any sitemap. What I want to know is this: if those links are already indexed by Google and, after I resubmit the sitemap, they are no longer in it, will they get unindexed? And if yes, I would really like to know the solution for this.
I am something of a beginner at SEO, so if it's a bad question I am really sorry, but I searched a lot before posting this question and couldn't find any solution.
You might want to look at the Sitemap index standard to see whether it can help you break your very large site into more manageable chunks for Google and other search engines to traverse. Particularly since you are generating the files with PHP, keep in mind that the "last updated" date and the assigned weight still factor into the crawl frequency.
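A sitemap index is just a small XML file that points at your individual sitemap files, roughly like this (the file names and dates are placeholders):
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.example.com/sitemap-archive-1.xml</loc>
    <lastmod>2013-01-01</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap-last-30-days.xml</loc>
    <lastmod>2013-06-15</lastmod>
  </sitemap>
</sitemapindex>
You submit only the index file once; your cron job can keep regenerating the "last 30 days" sitemap while the archive sitemaps stay in place, so older pages never drop out of a sitemap.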
To answer your question, though, I am fairly sure the answer is "No". Google has no reason to delete a page from its index unless you explicitly tell it to (using the URL removal tool in Webmaster Tools, or if your server responds with a 301 or 404 HTTP status code).
But I really do think you could benefit from using the Sitemap index schema described above.
I've seen tutorials/articles discussing using Robots.txt. Is this still a necessary practice? Do we still need to use this technique?
A robots.txt file is not necessary, but it is recommended if you want to block a few pages or folders on your website from being crawled by search engine crawlers.
I agree with the above answer. A robots.txt file is used to block pages and folders from being crawled by search engines. For example, you can block search engines from crawling and indexing URLs that carry session IDs, which in rare cases could become a security threat! Other than this, I don't see much importance in it.
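To illustrate that kind of blocking, a robots.txt might look like this (the paths are placeholders; note that the * wildcard in a path is understood by Googlebot but is not part of the original robots.txt standard):
User-agent: *
Disallow: /admin/
Disallow: /tmp/
Disallow: /*?sessionid=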
The way that a lot of robots crawl through your site and rank your pages has changed recently as well.
I believe that for a short period of time the use of robots.txt may have helped quite a bit, but nowadays most other steps you take with regard to SEO will have more of a positive impact than this little .txt file ever will.
The same goes for backlinks: they used to be far, far more important for getting ranked than they are now.
Robots.txt is not for indexing. It's used to block the things that you don't want search engines to index.
Robots.txt can help with indexation on large sites if you use it to reveal an XML sitemap file.
Like this:
Sitemap: http://www.domain.com/sitemap.xml
Within the XML file, you can list up to 50,000 URLs for search engines to index. There are plugins for many content management systems that can generate and update these files automatically.
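A minimal sitemap.xml looks something like this (the URL and date are placeholders):
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.domain.com/some-page/</loc>
    <lastmod>2014-01-15</lastmod>
  </url>
</urlset>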
If you google a specific entity, occasionally the website listed first is given a little listing of content, sort of like a mini site-map that the user can click on to navigate the linked site, bypassing the home page.
My question is this: can I control this mini-sitemap when my site is only PR1? If so, how do I do so? I'm trying to build a list of relevant links so users can more effectively hit my site, but I'm not sure how to go about doing this.
Help?
No, you cannot turn this on. Google decides on its own whether or not to generate them, and for which search terms. If you sign up for Google Webmaster Tools you can see the status (whether Google has generated some for your site) and read more about their background.
Google generates the sitelinks itself, but only for certain sites. As for how it determines which sites get it and which don't, I'm not really sure, but I suspect it has something to do with the pagerank of the site and the amount of content you have.
For a while, I had sitelinks for my site (PR4 with about 40,000 pages indexed in Google) but then a while later, they went away. In my case it generated sitelinks for the main tabs on the site, probably because they are in the header navigation and therefore on every single page near the top of the page.
The only control you have over them is that you can use Google Webmaster Tools to remove sitelinks you don't like; you can't change the existing ones or suggest new ones.
They are called Sitelinks - there's a FAQ entry about them here.
You can't control them (except to remove ones you don't like) - the FAQ says "At the moment, sitelinks are completely automated."
I was wondering how (or if) I should guide Googlebot through my blog. Should I only allow it to visit pages with single entries, or should it also crawl the main page (which also shows full entries)? My concern is that the main page changes when I add a new post and Google keeps the old version for some time. I also find directing people to the main page annoying - you have to look through all the posts before you find the one you're interested in. So what is the proper way to solve this issue?
Why not submit a sitemap with appropriate <changefreq> tags? If you set it to "always" for the homepage, the crawler will know that your homepage is very volatile (and you can set an accurate change frequency for the other URLs too, of course). You can also give a lower priority to your homepage and a higher one to the pages you prefer to see higher in the index.
I do not recommend telling crawlers to avoid indexing your homepage completely, as that would throw away any link juice you might be getting from links to it from other sites -- tweaking change freq and priority seems preferable.
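A sketch of what that could look like (the URLs and values here are just examples to adapt):
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <changefreq>always</changefreq>
    <priority>0.3</priority>
  </url>
  <url>
    <loc>http://www.example.com/2009/06/single-post-title/</loc>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>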
Make a sitemap.xml and regenerate it periodically. Check out Google Webmaster Tools.