I have submitted the sitemap to Google Webmaster Tools, but it is not getting indexed.
It has been almost a month since I submitted it. Webmaster Tools says "No data available." on almost every section.
As far as I can tell there is nothing blocking Google from indexing: robots.txt, as you can see, is not blocking anything, and there are no meta tags blocking crawling.
Here is a screen shot of the webmaster tools for the sitemap:
http://www.2shared.com/photo/4HLbsOte/webmaster.html
I am not sure why it says Processed May 3, 2012 when I submitted it earlier last month. But nothing has been indexed, and it looks like there are no issues with it either.
Any ideas?
Thanks for the help.
SOLVED Edit:
Looks like I had X-Robots-Tag: noindex, nofollow in my HTTP header.
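For anyone else hitting this, the header is easy to miss because nothing in the page source shows it. A small Python sketch like the one below (the URL is just a placeholder, not my real site) prints the X-Robots-Tag header if the server sends one:

# Sketch: check whether a page is served with an X-Robots-Tag header.
# The URL is a placeholder; point it at your own page.
import urllib.request

url = "http://www.example.com/"
with urllib.request.urlopen(url) as resp:
    x_robots = resp.headers.get("X-Robots-Tag")

if x_robots:
    print("X-Robots-Tag present:", x_robots)  # e.g. "noindex, nofollow"
else:
    print("No X-Robots-Tag header found.")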
In the sitemap section of webmaster tools, does it say that there are any errors with the sitemap you submitted?
Also, how many pages are in that sitemap? If there are very few pages, you are likely to see very low indexing, because Google usually doesn't index all of your pages.
There may have been some issue when you submitted your website's sitemap to Google Webmaster Tools. Try adding the sitemap again: just delete the previous sitemap and resubmit it. I hope it will work now.
Related
My website has about 500,000 pages. I made sitemap.xml and listed all pages in it (I know about the limitation of 50,000 links per file, so I have 10 sitemaps). Anyway, I submitted the sitemaps in Webmaster Tools and everything seems OK (no errors, and I can see submitted and indexed links). However, I have a problem with crawl frequency: Googlebot crawls the same page 4 times per day, even though in sitemap.xml I say that the page changes yearly.
This is an example:
<url>
  <loc>http://www.domain.com/destitution</loc>
  <lastmod>2015-01-01T16:59:23+02:00</lastmod>
  <changefreq>yearly</changefreq>
  <priority>0.1</priority>
</url>
1) So how do I tell Googlebot not to crawl so frequently, since it overloads my server?
2) The website has several pages like http://www.domain.com/destitution1, http://www.domain.com/destitution2 ... and I set the canonical URL to http://www.domain.com/destitution. Might that be the reason for the repeated crawling?
You can report this to the Google crawling team; see here:
In general, specific Googlebot crawling-problems like this are best
handled through Webmaster Tools directly. I'd go through the Site
Settings for your main domain, Crawl Rate, and then use the "Report a
problem with Googlebot" form there. The submissions through this form
go to our Googlebot team, who can work out what (or if anything) needs
to be changed on our side. They generally won't be able to reply, and
won't be able to process anything other than crawling issues, but they
sure know Googlebot and can help tweak what it does.
https://www.seroundtable.com/google-crawl-report-problem-19894.html
The crawling will slow down progressively. Bots are likely revisiting your pages because there are internal links between your pages.
In general, canonicals tend to reduce crawling rates. But at the beginning, Google bots need to crawl both the source and the target page. You will see the benefit later.
Google bots don't necessarily take lastmod and changefreq information into account. But if they establish that the content has not been modified, they will come back less often. It is a matter of time. Every URL has a scheduler for revisits.
Bots adapt to the capacity of the server (see the crawling summary I maintain for more details). You can temporarily slow down bots by returning HTTP error code 500 to them if that is an issue. They will stop and come back later.
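If you do need to apply the brakes, here is a rough sketch of that idea as Python WSGI middleware. I've used a 503 with a Retry-After header here (the 500 mentioned above works in a similar way), and the overload flag and the wrapped app are placeholders, not anything Googlebot provides:

# Sketch: return an error status to crawlers while the server is overloaded,
# so they back off and retry later. Flag and wrapped app are placeholders.
SERVER_IS_OVERLOADED = False  # set this from your own load check

def throttle_crawlers(app):
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if SERVER_IS_OVERLOADED and "Googlebot" in user_agent:
            start_response("503 Service Unavailable",
                           [("Retry-After", "3600"), ("Content-Type", "text/plain")])
            return [b"Temporarily unavailable, please retry later."]
        return app(environ, start_response)
    return middleware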
I don't believe there is a crawling issue with your site. What you see is normal behavior. When several sitemaps are submitted at once, the crawling rates can be temporarily raised.
I have a site, www.megalim.co.il.
Recently, due to a version upgrade, I discovered that I had a robots.txt file that disallowed all search engines. My Google ranking dropped, and I couldn't find the site's main page anymore.
I changed the robots.txt file to one that allows everything, and now Webmaster Tools no longer tells me that the site is blocked from Google.
I did this about 5 days ago. I've also done Fetch as Google
and submitted www.megalim.co.il for indexing with all linked pages,
but still, when I search this: "site:www.megalim.co.il"
I get a bunch of results from my site, but not the main page!
What else should I look for?
Thanks!
Igal
You don't see your main page because of your old robots.txt. Five days is nothing for Google bots to re-index your whole website.
Just wait a little and you will see your website fully indexed in Google results.
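If you want to double-check that the new robots.txt really does allow crawling, a quick Python sketch like this (using only your site's URL as given) will tell you whether Googlebot may fetch the home page:

# Sketch: verify that the live robots.txt no longer blocks Googlebot
# from the home page (the page missing from the index).
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("http://www.megalim.co.il/robots.txt")
rp.read()
print(rp.can_fetch("Googlebot", "http://www.megalim.co.il/"))  # expect True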
Issue sorted out...
Embarrassing...
Apparently we (inexplicably) had a nofollow, noindex meta tag.
After a day we started reappearing in Google.
Thanks :)
I have made changes to my website's keywords, description, and title, but Google is not indexing the new keywords. Instead, I have found that Google is still indexing the old ones.
How can I get Google to index my site using the new keywords that I have added?
Intervals between crawls of a page vary a lot across pages. A post to SO will be crawled and indexed by Google in seconds. A personal page whose content hasn't changed in 20 years might not be crawled even once a year.
Submitting a sitemap in Webmaster Tools will likely trigger a re-crawl of your website to validate the sitemap. You could use this to speed up the re-crawling.
However, as @Charles noted, the keywords meta tag is mostly ignored by Google. So it sounds like you're wasting your time.
I've recently been involved in the redevelopment of a website (a search engine for health professionals: http://www.tripdatabase.com), and one of the goals was to make it more search engine "friendly", not through any black magic, but through better xhtml compliance, more keyword-rich urls, and a comprehensive sitemap (>500k documents).
Unfortunately, shortly after launching the new version of the site in October 2009, we saw site visits (primarily via organic searches from Google) drop substantially to 30% of their former glory, which wasn't the intention :)
We've brought in a number of SEO experts to help, but none have been able to satisfactorily explain the immediate drop in traffic, and we've heard conflicting advice on various aspects, which I'm hoping someone can help us with.
My questions are thus:
do pages present in sitemaps also need to be spiderable from other pages? We had thought the point of a sitemap was specifically to help spiders get to content not already "visible". But now we're getting the advice to make sure every page is also linked to from another page. Which prompts the question... why bother with sitemaps?
some months on, and only 1% of the sitemap (well-formatted, according to webmaster tools) seems to have been spidered - is this usual?
Thanks in advance,
Phil Murphy
The XML sitemap helps search engine spiders index all the web pages of your site.
The sitemap is very useful if you frequently publish many pages, but it does not replace correct internal linking on the site: every document should be linked from another related page.
Your site is very large, so pay attention to the number of URLs published in the sitemap: there is a limit of 50,000 URLs per XML file.
The full documentation is available at Sitemaps.org
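As a rough illustration of staying under that limit, here is a Python sketch that splits a large list of URLs into sitemap files of at most 50,000 entries each and writes a sitemap index pointing at them; the file names and base URL are made up for the example:

# Sketch: split a large URL list into sitemap files of at most 50,000 URLs
# each, plus a sitemap index referencing them. Names here are illustrative.
MAX_URLS_PER_SITEMAP = 50000

def write_sitemaps(urls, base_url="http://www.example.com"):
    sitemap_names = []
    for i in range(0, len(urls), MAX_URLS_PER_SITEMAP):
        chunk = urls[i:i + MAX_URLS_PER_SITEMAP]
        name = f"sitemap-{i // MAX_URLS_PER_SITEMAP + 1}.xml"
        with open(name, "w", encoding="utf-8") as f:
            f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
            f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
            for url in chunk:
                f.write(f"  <url><loc>{url}</loc></url>\n")
            f.write("</urlset>\n")
        sitemap_names.append(name)

    # Sitemap index pointing at each individual sitemap file.
    with open("sitemap-index.xml", "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write('<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
        for name in sitemap_names:
            f.write(f"  <sitemap><loc>{base_url}/{name}</loc></sitemap>\n")
        f.write("</sitemapindex>\n")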
re: do pages present in sitemaps also need to be spiderable from other pages?
Yes, in fact this should be one of the first things you do. Make your website more usable to users before the search engines, and the search engines will love you for it. Heavy internal linking between pages is a must as a first step. Most of the time you can do this with internal sitemap pages or category pages, etc.
re: why bother with sitemaps?
Yes! Sitemaps help you set priorities for certain content on your site (like the homepage) and tell the search engines what to look at more often. NOTE: do not set all your pages to the highest priority; it confuses Google and doesn't help you.
re: some months on, and only 1% of the sitemap seems to have been spidered - is this usual?
YES! I have a website with 100k+ pages. Google has never indexed them all in a single month; it takes small chunks of about 20k at a time each month. If you use the priority settings properly, you can tell the spider which pages it should re-index on each visit.
As Rinzi mentioned, more documentation is available at Sitemaps.org.
Try building more backlinks and "trust" (links from quality sources).
It may help speed up indexing further :)
I produced a page that I have no intention of letting search engines find and crawl.
The advisable solution is robots.txt, but it is not applicable in my situation.
So I isolated this page from my site by removing all links from other pages to it, and I never put its URL on external sites.
Logically, then, it should be impossible for search engines to find this page. And that means no matter how many outbound links are nested in this page, the site's PageRank is safe.
Am I right?
Thank you very much!
Hope this question is programming related!
No, there's still a chance your page can be found by search engine crawlers. For example, it's been speculated that data from the Google Toolbar can be used to alert Googlebot to the presence of a page. And there's still a chance others might link to your page from external sites if the URL becomes known.
Your best bet is to add a robots meta tag to your page; this will prevent it from being indexed and prevent crawlers from following any links:
<meta name="robots" content="noindex,nofollow" />
If it is on the internet and not restricted, it will be found. Isolating it may make it harder to find, but it is still possible a crawler will happen across it.
What is the link so I can check? ;)
If you have outbound links on this "isolated" page, then your page will probably show up as a referrer in the logs of the linked-to pages. Depending on how closely the owners of those pages track their stats, they may find your page.
I've seen httpd log files turn up in Google searches. This in turn may lead others to find your page, including crawlers and other robots.
The easiest solution might be to password protect the page?