I've submitted sitemap.xml files to Google Webmaster Tools and it shows the total number of pages, but under "indexed" it says "--". How long does it take for Google to start indexing? This was a couple of days ago.
A Sitemap is a way for webmasters to help Search Engines to easily discover more pages from their websites. A Sitemap should be considered an aid, not a duty. Even if you submit a Sitemap there's no guarantee that the URLs listed in the Sitemap will be read or included in Search Engine indexes.
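For reference, a minimal Sitemap file looks roughly like this (example.com is a placeholder domain):
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
  </url>
</urlset>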
Usually it takes anywhere from a few hours to a few days to be indexed.
A quote from a Google source:
"We don't guarantee that we'll crawl
or index all of your URLs. For
example, we won't crawl or index image
URLs contained in your Sitemap.
However, we use the data in your
Sitemap to learn about your site's
structure, which will allow us to
improve our crawler schedule and do a
better job crawling your site in the
future. In most cases, webmasters will
benefit from Sitemap submission, and
in no case will you be penalized for
it."
Mod Note: An attribution link was originally here, but the site linked to no longer exists
It usually takes up to two weeks to be indexed. Just give it some time :)
In short: it depends.
If your website is new, Google will have to crawl and index it first. This can take time and depends on many factors (see the Google FAQs on indexing).
If the website is not new, it's possible that you are submitting URLs in the Sitemap file which do not match the URLs that were crawled and indexed. In this case, the indexed URL count is usually not zero, but it could theoretically be zero if the URLs in the Sitemap file are drastically wrong (e.g. they include session IDs).
Finally, if you are submitting a non-web Sitemap file (e.g. for Google Video or Google News), it's normal for the indexed URL count to be zero: the count only applies to URLs within the normal web-search results.
Without knowing the URL of the Sitemap file it's impossible to say for sure which of the above applies.
Recently my site got hacked and it has been restored. Thousands of spam URLs were indexed by Google. I do have a Google Webmaster account and I can update and submit my sitemap. But how do I tell Google to strictly index only the URLs inside my sitemap? I want to prevent any new spam URLs created by the hackers from being indexed.
Is there any parameter inside sitemap.xml that I can use to do this?
Your sitemap should only include the legitimate URLs, and Google will crawl and index them.
If you have removed the old spammed URLs and they now return a 404 (Not Found) status, Google will remove them from the index (albeit quite slowly; it can take 1-2 months).
If you need those URLs removed from the search results sooner, there's a section about it in the Webmaster guide: https://support.google.com/webmasters/answer/1663419?hl=en
I have a website with updated content daily. I have two questions:
How does Google see this content? Do I need SEO in this case?
Does the 404 error page have an influence on ranking in search engines? (I do not have a static page.)
So Google can "know" the content is supposed to be updated daily, it may be useful (if you don't do it yet) to implement a sitemap (and update, if necessary, dynamically). In this simemap, you can specify for each page, the update period.
This is not a constraint for Google, but it can help to adjust the frequency of indexing robots visit.
If you do, you must be "honest" with Google about times updates. If Google realizes that the frequency defined in the sitemap does not correspond to the actual frequency, it can be bad for your rankings.
404 errors (and other HTTP errors) can indirectly have an adverse effect on the site's ranking. Of course, if the robot cannot access content at a given moment, that content cannot be indexed. Above all, if too many problems are encountered when web crawlers visit your site, Google will adjust the frequency of its visits downwards.
You can get some personalized advice and monitor the indexing of your site using Google Webmaster Tools (and, to a lesser extent, Analytics or any other tool that can monitor web crawlers' visits).
You can see the date and time when Google last visited your page, so you can tell whether or not Google has picked up your updated content. If your website has content that is updated daily, you can ping many search engines and also submit your site to Google.
You can make a sitemap containing only those URLs that have daily updated content and submit it to Google Webmaster Tools. You can define the date and time when a URL was last modified in the <lastmod> tag, give a hint about how frequently the page is likely to change in the <changefreq> tag, and set a higher priority for pages that are modified daily in the <priority> tag, as shown in the sketch below.
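A rough sketch of such an entry, with a placeholder URL and values:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/daily-news.html</loc>
    <lastmod>2014-05-20</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>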
If you have 404 (file not found) error pages, put them in one directory and disallow it in your robots.txt file (see the sketch below). Google will then not crawl those pages, so they will not be indexed and will not influence your SERP ranking.
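For example, if those error pages live in a hypothetical /errors/ directory, the robots.txt rule would look roughly like:
User-agent: *
Disallow: /errors/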
I have made changes to my website's keywords, description, and title, but Google is not indexing the new keyword. Instead, I have found that Google is indexing the older one.
How can I get Google to index my site using the new keywords that I have added?
Periods between crawling a page vary a lot across pages. A post to SO will be crawled and indexed by Google in seconds. Your personal page that hasn't changed content in 20 years might not even be crawled as much as once a year.
Submitting a sitemap in Webmaster Tools will likely trigger a re-crawl of your website to validate the sitemap. You could use this to speed up the re-crawling.
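You can also ping Google directly with your sitemap URL to prompt a re-fetch; if I recall correctly the request looks roughly like this (the sitemap URL is a placeholder):
http://www.google.com/ping?sitemap=http://www.example.com/sitemap.xml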
However, as @Charles noted, the keywords meta tag is mostly ignored by Google. So it sounds like you're wasting your time.
I've recently been involved in the redevelopment of a website (a search engine for health professionals: http://www.tripdatabase.com), and one of the goals was to make it more search engine "friendly", not through any black magic, but through better xhtml compliance, more keyword-rich urls, and a comprehensive sitemap (>500k documents).
Unfortunately, shortly after launching the new version of the site in October 2009, we saw site visits (primarily via organic searches from Google) drop substantially to 30% of their former glory, which wasn't the intention :)
We've brought in a number of SEO experts to help, but none have been able to satisfactorily explain the immediate drop in traffic, and we've heard conflicting advice on various aspects, which I'm hoping someone can help us with.
My questions are thus:
do pages present in sitemaps also need to be spiderable from other pages? We had thought the point of a sitemap was specifically to help spiders get to content not already "visible". But now we're getting the advice to make sure every page is also linked to from another page. Which prompts the question... why bother with sitemaps?
some months on, and only 1% of the sitemap (well-formatted, according to webmaster tools) seems to have been spidered - is this usual?
Thanks in advance,
Phil Murphy
The XML sitemap helps search engine spiders index all the web pages of your site.
The sitemap is very useful if you frequently publish many pages, but it does not replace a correct internal linking structure: every document should also be linked from another related page.
Since your site is very large, you must pay attention to the number of URLs published in the Sitemap, because there is a limit of 50,000 URLs per XML file.
The full documentation is available at Sitemaps.org
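If you exceed that limit, the approach described at Sitemaps.org is to split the URLs across several sitemap files and list them in a sitemap index file, roughly like this (file names are placeholders):
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.example.com/sitemap-1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap-2.xml</loc>
  </sitemap>
</sitemapindex>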
re: do pages present in sitemaps also need to be spiderable from other pages?
Yes, in fact this should be one of the first things you do. Make your website more usable to users before the search engines, and the search engines will love you for it. Heavy internal linking between pages is a must as a first step. Most of the time you can do this with internal sitemap pages, category pages, etc.
re: why bother with sitemaps?
Yes! A sitemap helps you set priorities for certain content on your site (like the homepage) and tell the search engines what to look at more often. NOTE: do not set all your pages to the highest priority; it confuses Google and doesn't help you.
re: some months on, and only 1% of the sitemap seems to have been spidered - is this usual?
Yes! I have a website with 100k+ pages. Google has never indexed them all in a single month; it takes small chunks of about 20k at a time each month. If you use the priority setting properly, you can tell the spider which pages it should re-index on each visit.
As Rinzi mentioned, more documentation is available at Sitemaps.org.
Also, try building more backlinks and "trust" (links from quality sources); that may help speed up indexing further :)
Our SEO team would like to open up our main dynamic search results page to spiders and remove the 'nofollow' from the meta tags. It is currently accessible to spiders via allowing the path in robots.txt, but with a 'nofollow' clause in the meta tag which prevents spiders from going beyond the first page.
<meta name="robots" content="index,nofollow">
I am concerned that if we remove the 'nofollow', the impact to our search system will be catastrophic, as spiders will start crawling through all pages in the result set. I would appreciate advice as to:
1) Is there a way to remove the 'nofollow' from the meta tag, but prevent spiders from following only certain links on the page? I have read mixed opinions on rel="nofollow", is this a viable option?
<a rel="nofollow" href="http://www.mysite.com/paginglink" >Next Page</a>
2) Is there a way to control the 'depth' of how far spiders will go? It wouldn't be so bad if they hit a few pages, then stopped.
3) Our search results pages have the standard next/previous links, which would in theory cause spiders to hit pages recursively to infinity. What is the effect of this on SEO?
I understand that different spiders behave differently, but am mainly concerned with the big players, such as Google, Yahoo, MSN.
Note our Search results pages and paging links are not bot-friendly, in that they are not re-written and have a ?name=value query string, but from what I've seen spiders no longer just abort when they see the '?' as the results pages ARE getting indexed with decent page rank.
To be honest, you are looking at nofollow wrong. Chances are the search spiders, especially Google, Yahoo, and MSN, are already crawling the nofollow'd pages, because they still have to fetch those pages to see whether they carry a noindex.
The real problem is that nofollow doesn't actually mean "don't follow"; it just means "don't pass on my reputation to this link". So unless you are aggressively blocking bots, which it doesn't sound like you are, changing the ROBOTS meta tag and the robot directives on links will not affect performance, because the bots are already hitting your site. To confirm this, just look at your HTTP server log.
So my vote is that you will not see any problem with removing the robot limits.
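If the goal is to keep the deeper result pages out of the index entirely, rather than just limiting crawling, the usual tool is a robots meta tag with noindex on those pages (a sketch, not specific to your setup):
<meta name="robots" content="noindex,follow">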
I've seen Google index a calendar system that had relative links on each page through the end of time (Jan 19, 2038 - see: http://en.wikipedia.org/wiki/Year_2038_problem). We didn't notice the load on our servers until it exposed a bug in the source code dealing with dates in 2038.
I don't know about the other search engines, but Google offers a number of helpful tools for controlling how much the googlebot impacts your server infrastructure. See http://www.google.com/webmasters/.
There is an option in webmaster tools to set the crawl rate for your site.
Google bots are pretty intelligent about not traversing an entire database of dynamically-generated pages, as long as the URLs give some hint that they are dynamic (i.e. file extension of .asp or .jsp, etc. and numeric ids as query parameters). If you use rewrite rules to make your URLs "friendly", then the bots have a harder time determining whether or not it's a static page they are reading or a dynamically generated page. See this Google article for more information about dynamic vs. static URLs.
You may also want to consider creating a Google Sitemap to give the bots a better idea about what pages on your site can be indexed and which cannot.