Sitemap.xml - Google not indexing - seo

I have created a sitemap for my site and it complies with the protocol set by http://www.sitemaps.org/
Google has been told about this sitemap via Webmaster Tools. It has tracked all the URLs within the sitemap (500+) but has indexed only 1 of them. The last time Google downloaded the sitemap was on the 21st of Oct 2009.
When I do a Google search for site:url it picks up 2500+ results.
Google says it can crawl the site.
Does anyone have any ideas as to why only 1 URL is actually indexed?
Cheers,
James

First off, make sure Google hasn't been forbidden from those pages using robots.txt, etc. Also make sure those URLs are correct. :)
Second, Google doesn't just take your sitemap at face value. It uses other factors, such as inbound links, to determine whether it wants to crawl all of the pages in your sitemap. The sitemap serves mostly as a hint (it helps Google learn more quickly when pages are updated, for example). Get high-quality, relevant, useful links (inbound and outbound) and your site should start getting indexed.
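For reference, a minimal sitemap that complies with the sitemaps.org protocol looks like this (the URL and date below are placeholders; only <loc> is required, the other tags are optional hints):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- loc is the only required child element of url -->
    <loc>http://www.example.com/</loc>
    <lastmod>2009-10-21</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```

It's worth re-validating yours against the protocol, since a malformed sitemap can be fetched and tracked but still ignored for indexing.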

Your two statements seem to contradict one another.
but has only indexed 1 of them.
and
When I do a google search for site:url it picks up 2500+ results
bdonlan is correct in their logic (robots.txt and Google's lack of trust for sitemaps) but I think the issue is what you "think" is true about your site.
That is, Google Webmaster Tools says you only have 1 page indexed but site:yoursite.com shows 2.5k.
Google Webmaster Tools isn't very accurate. It's nice, but it's buggy and MIGHT help you learn about issues with your site. Trust the site: command. You're in Google's index if you search site:yoursite.com and you see more than 1 result.
I'd trust site:yoursite.com. You have 2.5k pages in Google, indexed and search-able.
So, now optimize those pages and see the traffic flow. :D
Sidenote: Google can crawl any site: Flash, JavaScript, etc.

Related

When will Google stop showing a site's page after a robots.txt has been placed in it?

Google is showing www.example.com/myPage as a search result.
I do not want /myPage to be indexed by Google, so a robots.txt rule was added to block it.
How long will it take to stop being shown in Google?
I know that people can still visit it if they have the URL, but my aim is just to remove it from Google's search results.
My knowledge of SEO is limited, and I feel the answer may vary depending on site traffic and other SEO-related factors, but generally speaking, how long would this take?
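As an aside, robots.txt is a single file served from the site root, not something placed inside a page. A hypothetical rule blocking /myPage would look like this (the path is a placeholder):

```
User-agent: *
Disallow: /myPage
```

Note that robots.txt only blocks crawling; an already-indexed URL can keep appearing in results. To have the page itself dropped from the index, the usual approach is a robots meta tag in the page's head, such as <meta name="robots" content="noindex">, which Google honors the next time it crawls the page.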
Crawls are based on many factors such as PageRank, links to a page, and crawling constraints such as the number of parameters in a URL. Any number of factors can affect the crawl frequency of individual sites.
The crawl process is algorithmic; computer programs determine which sites to crawl, how often, and how many pages to fetch from each site. They don't accept payment to crawl a site more frequently. For tips on maintaining a crawler-friendly website, please visit the Webmaster Guidelines.
I would suggest using Google Webmaster Tools for your SEO. It will show you when Google last crawled your website, and it has many SEO options that will help you get your site indexed better.
There is also an option in Webmaster Tools to ask Google to crawl your site again, telling Google's bots to re-crawl it because the content on your site has changed.
This link might help you understand better. Also, for an overview of the Webmaster Tools setup and features, visit this link.

Automatic Google Indexing

We have implemented Google Site Search on our company website, and we need to automate Google's indexing of the site.
For example, when our customers update the forum, we need the up-to-date forum information to show in our forum search.
Is there any option in a Google API, or any other API, that can help?
You can use an XML sitemap. This will tell the search engines where your content is so they can find and crawl it. Keep in mind there is no way to make the search engines crawl your site when you want them to; they will crawl on a schedule they determine to be right for your site. (You can set a crawl rate in Google Webmaster Tools, but that rate is relative to the crawl rate Google has already set for you. Setting it to fastest will not speed up their crawl rate.)
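One thing you can automate is pinging Google whenever the sitemap changes, so it knows to re-fetch it sooner. A minimal sketch, assuming the sitemaps.org ping convention and a placeholder sitemap URL:

```python
import urllib.parse

def build_ping_url(sitemap_url):
    # Google's sitemap ping endpoint takes the sitemap URL,
    # percent-encoded, in the "sitemap" query parameter.
    return ("http://www.google.com/ping?sitemap="
            + urllib.parse.quote(sitemap_url, safe=""))

ping_url = build_ping_url("http://www.example.com/sitemap.xml")
print(ping_url)
# After the forum updates, an HTTP GET of ping_url (e.g. with
# urllib.request.urlopen) notifies Google that the sitemap changed.
```

This only invites a re-fetch of the sitemap; it does not force a crawl, for the reasons described above.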
Unfortunately, Google will only crawl your site when it feels like it. It is based on many variables to determine how often this occurs (i.e. site ranking, standards compliance, and so on). The sitemap XML is a helpful way to help Google determine what parts of your site to index, however if you don't have one Google will find it by crawling links on other parts of your page and updating its index if the page changes.
The more visitors you get, and the more often your site's links appear on other sites, the more frequently Google will index it.
To start, I'd suggest using http://validator.w3.org/ to validate your site and get it as close as possible to error-free. This makes it easier for Google to index your site because it can find the information it expects without having to crawl over invalid markup. Also, chances are, a site that validates with very few errors is more credible than one containing many errors. It tells the search engine that you maintain your site so that almost all browsers can use it and that it is accessible.
Also validating your site gives you some bragging rights over those who don't meet W3 standards :)
Hope this helps!

SEO: Adding to Google other than submitting directly to Google's crawler - http://www.enshaeyah.webs.com

What are other ways of making your website searchable by Google, other than submitting the link directly to Google?
Submitting links to Yahoo is a breeze; they get crawled within a day or two. Google, though, takes a while...
Thanks...
If you add a link to your website on a site that's already indexed by Google, Google will follow it and reach your site without you needing to submit anything to their page. It's actually not recommended to submit your site to their page, because then you're put at the end of the queue. But if you have a link on a page that Google indexes in the next minute, it will get to you much faster. The more links on many high-ranking pages, the better. Cheers
Add your site to DMOZ.org, and encourage everyone you know to link to your site. The more places that link to your site, the more likely it'll get indexed sooner (and more fully), and the better it will rank.
Also, if your site is very large, it is not unreasonable to sign up for their webmaster tools and submit a sitemap index. This is especially effective for fast ranking, and showing up in obscure search results, but it will not help you rank for difficult terms.
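For a large site, a sitemap index is just a small XML file at the root that points at several ordinary sitemaps, each of which can hold up to 50,000 URLs under the sitemaps.org protocol (the URLs below are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.example.com/sitemap-posts.xml</loc>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap-pages.xml</loc>
  </sitemap>
</sitemapindex>
```

You submit the index file itself in Webmaster Tools, and the individual sitemaps are discovered from it.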
Also note that even if your site was visited by Googlebot, it doesn't necessarily end up in the Google index.
Use this link to check:
http://www.google.com/webmasters/tools/sitestatus

How is it possible for new content to appear in Google results mere minutes after it is created?

For example, when I post to Stackoverflow, the post appears in the Google index a minute later. How is this accomplished? What do I have to do to my web-site to get the same frequency of indexing?
You could start by:
getting 65,000-odd regular users on your site.
getting your site linked to from all over the place.
making your site very active.
providing very useful content.
This is all standard SEO stuff which will up your "importance" in the eyes of Google (and other search engines, presumably, but who cares :-).
The faster a page changes, the more often Google will re-index it.
Obviously, that's only if your site is "important" enough for Google.
You should check out Google Webmaster Tools here http://www.google.com/webmasters/tools
To help with indexing from Google, but also Yahoo and MS, you'll want to use the sitemap protocol, see http://en.wikipedia.org/wiki/Sitemaps .
Simply put, if you want to do that, you first need to lure the Google robot to your site.
To do this, you should do these things:
Build as many hyperlinks as possible from high-ranked, active, relevant sites.
Make your own site active; that way, Google believes your site is worthwhile to visit frequently!
In addition to this, provide premier content and structure (a site map).
To sum it all up, you need to build a great site in the eyes of the search engines!
Good luck!

Is this a blackhat SEO technique?

I have a site which has been developed completely in Flash. The site owners do not want to shift to a more text/HTML-based site, so I am planning to create an alternative HTML/text-based site that Googlebot will be redirected to (by checking the user agent). My question is: is this officially allowed by Google?
If not, then how come there are many subscription-based sites which display a different set of data to Google than to their users? Is that allowed?
Thank you very much.
I've dealt with this exact scenario for a large ecommerce site and Google essentially ignored the site. Google considers it cloaking and addresses it directly here and says:
Cloaking refers to the practice of presenting different content or URLs to users and search engines. Serving up different results based on user agent may cause your site to be perceived as deceptive and removed from the Google index.
Instead, create an ADA-compliant version of the website so that users with screen readers and vision aids can use your site. As long as there is a link from your home page to your ADA-compliant pages, Google will index them.
The official advice seems to be: offer a visible link to a non-flash version of the site. Fooling the googlebot is a surefire way to get in trouble. And remember, Google results will link to the matching page! Do not make useless results.
Google already indexes flash content so my suggestion would be to check how your site is being indexed. Maybe you don't have to do anything.
I don't think showing an alternate version of the site is good from a Google perspective.
If you serve up your page with the exact same address, then you're probably fine. For example, if you show 'http://www.somesite.com/' but direct googlebot to 'http://www.somesite.com/alt.htm', then Google might direct search users to alt.htm. You don't want that, right?
This is called cloaking. I'm not sure what the exact effects are, but it is certainly not whitehat. I am pretty sure Google is working on a way to crawl Flash now, so it might not even be a concern.
I'm assuming you're not really doing a redirect but instead a PHP import or something similar so it shows up as the same page. If you're actually redirecting then it's just going to index the other page like normal.
Some sites offer a different level of content: they LIMIT the content rather than offering alternative or additional content. This is generally done so that unrelated things don't get indexed.