SEO And AJAX Sites - seo

Is it possible to help search engines by giving them a list of urls to crawl? It might be hard to make the site SEO friendly when using heavy AJAX logic. Let's say that the user chooses a category, then a sub-category and a product. It seems unnecessary to give categories and subcategories urls. But giving only products a url makes sense. When I see the url for the product, I can make the application navigate to that product. So, is it possible to use robots.txt or some other method to direct search engines to the urls I designate?
I am open to other suggestions if this somehow does not make sense.

Yes. What you're describing is called a sitemap -- it's a list of pages on your site which search engines can use to help them crawl your web site.
There are a couple ways of formatting a sitemap, but by far the easiest is to just list out all the URLs in a text file available on your web site -- one per line -- and reference it in robots.txt like so:
Sitemap: http://example.com/sitemap.txt
Here's Google's documentation on the topic: https://support.google.com/webmasters/answer/183668?hl=en
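As a rough illustration of the text-file approach described above, here is a minimal Python sketch. The function names and the product URLs are made up for the example; only the one-URL-per-line format and the `Sitemap:` line come from the answer.

```python
def build_text_sitemap(urls):
    """Return the body of a plain-text sitemap: one absolute URL per line."""
    seen = set()
    lines = []
    for url in urls:
        url = url.strip()
        # Skip blank entries and duplicates so each page is listed once.
        if url and url not in seen:
            seen.add(url)
            lines.append(url)
    return "\n".join(lines) + "\n"


def robots_sitemap_line(sitemap_url):
    """Return the robots.txt line that points crawlers at the sitemap."""
    return f"Sitemap: {sitemap_url}"


if __name__ == "__main__":
    # Hypothetical product URLs; only products get a URL, as in the question.
    product_urls = [
        "http://example.com/product/1",
        "http://example.com/product/2",
    ]
    print(build_text_sitemap(product_urls), end="")
    print(robots_sitemap_line("http://example.com/sitemap.txt"))
```

The returned string would be saved as sitemap.txt on the site, and the last line added to robots.txt.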

Related

How to remove duplicate title and meta description tags if google indexed them

So, I have been building an ecommerce site for a small company.
The url structure is : www.example.com/product_category/product_name and the site has around 1000 products.
I've checked google webmaster tools and in the HTML improvements section it shows that I have multiple title and meta description tags for all the product pages. They all appear two times, both:
-www.example.com/product_category/product_name
and
-www.example.com/product_category/product_name/ (with slash in the end)
got indexed as separate pages.
I've added a 301 redirect from every www.example.com/product_category/product_name/ to www.example.com/product_category/product_name, but this was almost two weeks ago. I have resubmitted my sitemap and asked google to fetch the whole page a few times. Nothing has changed, GWT still shows the pages as duplicate tags.
I did not get any manual action message.
So I have two questions:
-how can I accelerate the reindexation process, if it's possible?
-and do these tags hurt my organic search results? I've googled it, and some say they do and some say they don't.
An option is to set a canonical link on both URLs (with and without /) using the URL without a /. Little by little, Google will stop complaining. Keep in mind Google Webmaster Tools is slow to react, especially when you don't have much traffic or backlinks.
And yes, duplicate tags can influence your rankings negatively because users won't have proper and specific information for each page.
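A small Python sketch of the canonical-link idea from this answer, assuming the page template can call a helper while rendering. The helper names are hypothetical; the URL shape is the one from the question.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url):
    """Return the URL without a trailing slash on the path (the root stays '/')."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    # Keep the query string, drop any fragment.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


def canonical_link_tag(url):
    """Build the <link rel="canonical"> element for the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


if __name__ == "__main__":
    # Both variants of the product URL produce the same tag.
    print(canonical_link_tag("http://www.example.com/product_category/product_name/"))
    print(canonical_link_tag("http://www.example.com/product_category/product_name"))
```

Both the slash and no-slash versions of a page then emit the same tag, which is the point of the suggestion.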
Setting a canonical link on both URLs is a solution, but in my experience it takes time.
The fastest way is to block the old URL in the robots.txt file:
Disallow: /old_url
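To see which URLs a rule like this covers, one option is Python's standard-library robots.txt parser. This is only a sketch: the `User-agent: *` line and the test URLs are assumptions added for the example, and a given search engine's own matching may differ from this parser's.

```python
from urllib.robotparser import RobotFileParser

# The Disallow line is from the answer; the User-agent line is assumed.
rules = [
    "User-agent: *",
    "Disallow: /old_url",
]

parser = RobotFileParser()
parser.parse(rules)


def is_allowed(url):
    """Return True if a generic crawler may fetch the URL under these rules."""
    return parser.can_fetch("*", url)


if __name__ == "__main__":
    for url in ("http://www.example.com/old_url",
                "http://www.example.com/old_url/child",
                "http://www.example.com/new_url"):
        print(url, is_allowed(url))
```

Disallow rules match by path prefix, so anything under /old_url is blocked too.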
A canonical tag is an option, but why not add a different title and description for each page?
You can set up dynamic meta tags once and they will be generated automatically for all pages, so you don't have to worry about duplication.

SEO Search Only content

We have a ton of content on our website which a user can get to by performing a search on the website. For example, we have data for all Public companies, in the form of individual pages per company. So think like 10,000 pages in total. Now in order to get to these pages, a user needs to search for the company name and from the search results, click on the company name they are interested in.
How would a search bot find this page? There is no page on the website which has links to these 10,000 pages. Think amazon, you need to search for your product and then from the search results, click on the product you are interested in to get to it.
The closest solution I could find was the sitemap.xml, is that it? Anything which doesn't require adding 10,000 links to an xml file?
You need to link to a page, or for it to be close to the homepage for it to stand a decent chance of getting indexed by Google.
A sitemap helps, sure, but a page still needs to exist in the menu / site structure. A sitemap reference alone does not guarantee a resource will be indexed.
Google - Webmaster Support on Sitemaps: "Google doesn't guarantee that we'll crawl or index all of your URLs. However, we use the data in your Sitemap to learn about your site's structure, which will allow us to improve our crawler schedule and do a better job crawling your site in the future. In most cases, webmasters will benefit from Sitemap submission, and in no case will you be penalized for it."
If you browse Amazon, it will be possible to find 99% of the products available. Amazon do a lot of interesting stuff in their faceted navigation, you could write a book on it.
Speak to an SEO or a usability / CRO expert - they will be able to tell you what you need to do - which is basically create a user friendly site with categories & links to all your products.
An XML sitemap pretty much is your only on-site option if you do not or cannot link to these products on your website. You could link to these pages from other websites but that doesn't seem like a likely scenario.
Adding 10,000 products to an XML sitemap is easy to do. Your sitemap can be dynamic just like your web pages are. Just generate it on the fly when requested like you would a regular web page and include whatever products you want to be found and indexed.
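A minimal sketch of generating the XML on the fly, in Python. The company slugs and the URL pattern are hypothetical; in practice the list would come from whatever database backs the search. The namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_xml_sitemap(urls):
    """Return a sitemap.xml document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as '&' in the text for us.
        ET.SubElement(url_el, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical company pages; a real site would query its database here.
    slugs = ["acme", "globex", "initech"]
    urls = [f"http://example.com/company/{slug}" for slug in slugs]
    print(build_xml_sitemap(urls))
```

A request handler for /sitemap.xml could return this string directly. The sitemaps.org protocol allows up to 50,000 URLs per file, so 10,000 pages fit in a single sitemap.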

Associated Content & SEO, Sitemaps with External links, using CNAMEs to include External Links as my own in the sitemap

Is there any HTML code, page parameter, or meta name that can tell search engines that the content of a page is closely linked to another page on another domain?
I keep the content metatag updated and also the keyword metatag.
I don't want to show these links to my visitors.
1)
I need to know if there is a protocol for communicating related links specifically to crawlers so as to improve my ranking
Is there any way via code I can tell crawlers (crawlers specifically, like how No Follow is addressed to crawlers) that mydomain.com/Porduct.php is closely linked to say
http://ebay.com/sameProduct
http://wikipedia.com/GenericProduct or
http://google.com?q=someKeywords
Should I include external links or CNAME mapped External links(Read Q3) inside the content tag ?? Would that make a difference
2)
Can I include these links in my Sitemap? Common sense would suggest that links in my sitemap should be hosted on my domain. Still, I ask because the sitemap takes the full URL including the domain name.
3)
If a particular well indexed page has content largely similar to mine can I map a CNAME of my page to that site and include that in the sitemap?? would that amount to cheating ??
First of all, I'm not sure what you want to achieve here. Search engines in general are already pretty good at recognizing what your page is about. If your content is about product A, write a description about product A, have images about product A, let your users comment about or review product A, or add microdata to your page (i.e. http://schema.org/Product). All these will help search engines recognize that your page is about that product, just like the page on the other site, which also has content about the same product.
To answer your questions:
1) I'm not aware of any tag like that which would also be supported by search engines.
2) In your Sitemap you can include only URLs that point to a location on the same hostname the Sitemap is hosted on (there are some exceptions, but those are irrelevant now). See http://www.sitemaps.org/protocol.html for more info about Sitemaps.
3) A CNAME resource record specifies that the domain name is an alias of another domain name, and thus it can't be used the way you described.
Lastly, you're trying to do something for crawlers which is usually a bad idea. Create an awesome website, something useful for the users, something they would love and they'd miss in case you closed the shop. Just focus on the user and all else will come.

will limiting dynamic urls with robots.txt improve my SEO ranking?

My website has about 200 useful articles. Because the website has an internal search function with lots of parameters, the search engines end up spidering urls with all possible permutations of additional parameters such as tags, search phrases, versions, dates etc. Most of these pages are simply a list of search results with some snippets of the original articles.
According to Google's Webmaster-tools Google spidered only about 150 of the 200 entries in the xml sitemap. It looks as if Google has not yet seen all of the content years after it went online.
I plan to add a few "Disallow:" lines to robots.txt so that the search engines no longer spider those dynamic urls. In addition I plan to disable some url parameters in the Webmaster-tools "website configuration" --> "url parameter" section.
Will that improve or hurt my current SEO ranking? It will look as if my website is losing thousands of content pages.
This is exactly what canonical URLs are for. If one page (e.g. article) can be reached by more than one URL then you need to specify the primary URL using a canonical URL. This prevents duplicate content issues and tells Google which URL to display in their search results.
So do not block any of your articles and you don't need to enter any parameters, either. Just use canonical URLs and you'll be fine.
As nn4l pointed out, canonical is not a good solution for search pages.
The first thing you should do is have search results pages include a robots meta tag saying noindex. This will help get them removed from your index and let Google focus on your real content. Google should slowly remove them as they get re-crawled.
Other measures:
In GWMT tell Google to ignore all those search parameters. Just a band aid but may help speed up the recovery.
Don't block the search page in the robots.txt file as this will block the robots from crawling and cleanly removing those pages already indexed. Wait till your index is clear before doing a full block like that.
Your search system must be based on links (a tags) or GET based forms and not POST based forms. This is why they got indexed. Switching them to POST based forms should stop robots from trying to index those pages in the first place. JavaScript or AJAX is another way to do it.
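The noindex suggestion at the start of this answer could look roughly like the following Python sketch. The parameter names in `SEARCH_PARAMS` are guesses based on the question (tags, search phrases, versions, dates) and would need to match the site's real query parameters.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical names of the query parameters used by the internal search.
SEARCH_PARAMS = {"q", "tag", "version", "date"}


def is_search_results_url(url):
    """Return True if the URL carries any internal-search parameter."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return any(name in SEARCH_PARAMS for name in query)


def robots_meta_tag(url):
    """Return a noindex robots meta tag for search result pages, else ''."""
    if is_search_results_url(url):
        return '<meta name="robots" content="noindex">'
    return ""


if __name__ == "__main__":
    print(repr(robots_meta_tag("http://example.com/search?q=foo&tag=bar")))
    print(repr(robots_meta_tag("http://example.com/articles/42")))
```

The page template would place the returned tag in the head of search result pages and leave article pages untouched.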

Should a sitemap have *every* url

I have a site with a huge number (well, thousands or tens of thousands) of dynamic URLs, plus a few static URLs.
In theory, due to some cunning SEO linkage on the homepage, it should be possible for any spider to crawl the site and discover all the dynamic urls via a spider-friendly search.
Given this, do I really need to worry about expending the effort to produce a dynamic sitemap index that includes all these URLs, or should I simply ensure that all the main static URLs are in there?
The actual way in which I would generate this isn't a concern - I'm just questioning the need to actually do it.
Indeed, the Google FAQ (and yes, I know they're not the only search engine!) about this recommends including URLs in the sitemap that might not be discovered by a crawl; based on that fact, then, if every URL in your site is reachable from another, surely the only URL you really need as a baseline in your sitemap for a well-designed site is your homepage?
If there is more than one way to get to a page, you should pick a main URL for each page that contains the actual content, and put those URLs in the site map. I.e. the site map should contain links to the actual content, not every possible URL to get to the same content.
Also consider putting canonical link tags (rel="canonical") in the pages, pointing to this main URL, so that spiders can recognise a page even if it's reachable through different dynamic URLs.
Spiders only spend a limited time searching each site, so you should make it easy to find the actual content as soon as possible. A site map can be a great help as you can use it to point directly to the actual content so that the spider doesn't have to look for it.
We have had pretty good results using these methods, and Google now indexes 80-90% of our dynamic content. :)
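One way to pick a single main URL per page before writing the sitemap, sketched in Python. It assumes that only certain query parameters actually select content; the `CONTENT_PARAMS` set and the example URLs are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: the only query parameter that changes which content is shown.
CONTENT_PARAMS = {"id"}


def main_url(url):
    """Drop parameters that do not select content, keeping one URL per page."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k in CONTENT_PARAMS)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def sitemap_urls(urls):
    """Collapse URL variants into a sorted list of unique main URLs."""
    return sorted({main_url(u) for u in urls})


if __name__ == "__main__":
    variants = [
        "http://example.com/item?id=7&sort=price",
        "http://example.com/item?session=abc&id=7",
        "http://example.com/about",
    ]
    for url in sitemap_urls(variants):
        print(url)
```

The resulting list is what would go into the sitemap, and the same `main_url` value could be used for each page's canonical link.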
In an SO podcast they talked about limitations on the number of links you could include/submit in a sitemap (around 500 per page with a page limit based on pagerank?) and how you would need to break them over multiple pages.
"Given this, do I really need to worry about expending the effort to produce a dynamic sitemap index that includes all these URLs, or should I simply ensure that all the main static URLs are in there?"
I was under the impression that the sitemap wasn't necessarily about disconnected pages but rather about increasing the crawling of existing pages. In my experience when a site includes a sitemap, minor pages even when prominently linked to are more likely to appear on Google results. Depending on the pagerank/inbound links etc. of your site this may be less of an issue.