SEO: search-only content

We have a ton of content on our website which a user can get to by performing a search on the website. For example, we have data for all Public companies, in the form of individual pages per company. So think like 10,000 pages in total. Now in order to get to these pages, a user needs to search for the company name and from the search results, click on the company name they are interested in.
How would a search bot find these pages? There is no page on the website which has links to these 10,000 pages. Think Amazon: you need to search for your product and then, from the search results, click on the product you are interested in to get to it.
The closest solution I could find was a sitemap.xml. Is that it? Is there anything that doesn't require adding 10,000 links to an XML file?

A page needs to be linked to, or to be close to the homepage, to stand a decent chance of getting indexed by Google.
A sitemap helps, sure, but a page still needs to exist in the menu / site structure. A sitemap reference alone does not guarantee a resource will be indexed.
Google - Webmaster Support on Sitemaps: "Google doesn't guarantee that we'll crawl or index all of your URLs. However, we use the data in your Sitemap to learn about your site's structure, which will allow us to improve our crawler schedule and do a better job crawling your site in the future. In most cases, webmasters will benefit from Sitemap submission, and in no case will you be penalized for it."
If you browse Amazon, you can find 99% of the products available. Amazon do a lot of interesting stuff in their faceted navigation; you could write a book on it.
Speak to an SEO or a usability / CRO expert - they will be able to tell you what you need to do - which is basically create a user friendly site with categories & links to all your products.

An XML sitemap pretty much is your only on-site option if you do not or cannot link to these products on your website. You could link to these pages from other websites but that doesn't seem like a likely scenario.
Adding 10,000 products to an XML sitemap is easy to do. Your sitemap can be dynamic just like your web pages are. Just generate it on the fly when requested like you would a regular web page and include whatever products you want to be found and indexed.
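A dynamic sitemap like that can be a very small amount of code. Here is a minimal sketch in Python; `fetch_product_urls()` is a hypothetical stand-in for whatever database query returns your product page URLs, and the example.com URLs are placeholders:

```python
# Minimal sketch of a dynamically generated XML sitemap.
# fetch_product_urls() is a hypothetical placeholder for the real
# database query that returns your product page URLs.
from xml.sax.saxutils import escape

def fetch_product_urls():
    # Placeholder data; in a real app this would query your database.
    return ["https://example.com/company/acme-corp",
            "https://example.com/company/globex"]

def build_sitemap(urls):
    # Emit the sitemaps.org urlset format, escaping each URL for XML.
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for url in urls:
        lines.append("  <url><loc>%s</loc></url>" % escape(url))
    lines.append("</urlset>")
    return "\n".join(lines)

print(build_sitemap(fetch_product_urls()))
```

In practice you would serve this from the route mapped to /sitemap.xml, so the file is regenerated whenever a crawler requests it.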

Related

SEO And AJAX Sites

Is it possible to help search engines by giving them a list of urls to crawl? It might be hard to make the site SEO friendly when using heavy AJAX logic. Let's say that the user chooses a category, then a sub-category and a product. It seems unnecessary to give categories and subcategories urls. But giving only products a url makes sense. When I see the url for the product, I can make the application navigate to that product. So, is it possible to use robots.txt or some other method to direct search engines to the urls I designate?
I am open to other suggestions if this somehow does not make sense.
Yes. What you're describing is called a sitemap -- it's a list of pages on your site which search engines can use to help them crawl your web site.
There are a couple ways of formatting a sitemap, but by far the easiest is to just list out all the URLs in a text file available on your web site -- one per line -- and reference it in robots.txt like so:
Sitemap: http://example.com/sitemap.txt
Here's Google's documentation on the topic: https://support.google.com/webmasters/answer/183668?hl=en
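Generating that text file is about as simple as it gets; a sketch (the URLs and file name are just examples):

```python
# A plain-text sitemap is just one absolute URL per line, UTF-8 encoded.
urls = [
    "http://example.com/products/1",
    "http://example.com/products/2",
    "http://example.com/products/3",
]

# Write the file that robots.txt will point at via
# "Sitemap: http://example.com/sitemap.txt".
with open("sitemap.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(urls) + "\n")
```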

SEO: secure pages and rel=nofollow

Should one apply rel="nofollow" attribute to site links that are bound for secure/login required pages?
We have a date-based URI link structure where the previous year's news content is free, while the current year, and any year before the last, are paid, login-required content.
The net effect is that when doing a search for our company name in Google, what comes up first is Contact, About, Login, etc. - standard non-login-required content. That's fine, but ideally we'd have our free content, the pages we want to promote, shown first in the search engine results.
Toward this end, the link structure now generates rel="follow" for the free content we want to promote, and rel="nofollow" for all paid content and Contact, About, Login, etc. screens that we want at the bottom of the SEO search result ladder.
I have yet to deploy the new linking scheme for fear of, you know, blowing up the site SEO-wise ;-) It's not in great shape to begin with, despite our decent ranking, but I don't want us to disappear either.
Anyway, words of wisdom appreciated.
Thanks
nofollow
I think Emil Vikström is wrong about nofollow. You can use the rel value nofollow for internal links. Neither the microformats spec nor the HTML5 spec says otherwise.
Google even gives such an example:
Crawl prioritization: Search engine robots can't sign in or register as a member on your forum, so there's no reason to invite Googlebot to follow "register here" or "sign in" links. Using nofollow on these links enables Googlebot to crawl other pages you'd prefer to see in Google's index. However, a solid information architecture — intuitive navigation, user- and search-engine-friendly URLs, and so on — is likely to be a far more productive use of resources than focusing on crawl prioritization via nofollowed links.
This does apply to your use case, so you could nofollow the links to your login page. Note, however, that if you also apply a meta noindex, people who search for "YourSiteName login" probably won't find that page in their search results.
follow
There is no rel value "follow". It's not defined in the HTML5 spec nor in the HTML5 Link Type extensions. It isn't even mentioned in http://microformats.org/wiki/existing-rel-values at all. A link without the rel value nofollow is automatically a "follow link".
You can't overwrite a meta nofollow for certain links (the two nofollow values even have different semantics).
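The distinction is easy to see in how links get rendered: paid or login-required links carry rel="nofollow", and free links simply carry no rel attribute at all. A small sketch (the function and URLs are hypothetical):

```python
def render_link(url, text, paid=False):
    # Paid/login-required pages get rel="nofollow"; free pages get a
    # plain link, because there is no rel="follow" value -- a link
    # without nofollow is already a "follow link".
    rel = ' rel="nofollow"' if paid else ''
    return '<a href="%s"%s>%s</a>' % (url, rel, text)

print(render_link("/news/2009/story", "Free story"))
print(render_link("/login", "Login", paid=True))
```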
Your case
I'd use nofollow for all links to restricted/paid content. I wouldn't nofollow the links to the informational pages about the site (About, Contact, Login), because they are useful, people might search especially for them, and they give information about your site, while all the content pages give information about the various topics.
Nofollow is only for external links; it does not apply to links within your own domain. Search engines will try to give the most relevant content for the query asked, and they generally actively avoid taking the website owner's wishes into account. Thus, nofollow will not help you here.
What you really want to do is make the news content the best choice for a search on your company name. A user searching for your company name may do this for two reasons: They want your homepage (the first page) or they more specifically want to know more about your company. This means that your homepage as well as "About", "Contact", etc, are generally actually what the user is looking for and the search engines will show them at the top of their results pages.
If you don't want this you must make those pages useless for one wanting to know more about your company. This may sound really silly. To make your "About" and "Contact" pages useless to one searching for your company you should remove your company name from those pages, as well as any information about what your company does. Put that info on the news pages instead and the search engines may start to rank the news higher.
Another option is to not let the search engine index those other pages at all by adding them to a robots.txt file.
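Strictly speaking, robots.txt blocks crawling rather than indexing, but a rule along those lines might look like this (the paths are examples only, not your actual URLs):

```
User-agent: *
Disallow: /about
Disallow: /contact
Disallow: /login
```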

SEO: Allowing crawler to index all pages when only few are visible at a time

I'm working on improving the site for the SEO purposes and hit an interesting issue. The site, among other things, includes a large directory of individual items (it doesn't really matter what these are). Each item has its own details page, which is accessed via
http://www.mysite.com/item.php?id=item_id
or
http://www.mysite.com/item.php/id/title
The directory is large - having about 100,000 items in it. Naturally, on any of the pages only a few items are listed. For example, on the main site homepage there are links to about 5 or 6 items, from some other page there are links to about a dozen different items, etc.
When real users visit the site, they can use the search form to find items by keyword or location - so a list is produced matching their search criteria. However, when, for example, a Google crawler visits the site, it won't even attempt to put text into the keyword search field and submit the form. Thus, as far as the bot is concerned, after indexing the entire site it has covered only a few dozen items at best. Naturally, I want it to index each individual item separately. What are my options here?
One thing I considered is to check the user agent and IP ranges and, if the requestor is a bot (as best I can tell), add a div to the end of the most relevant page with links to each individual item. Yes, this would be a huge page to load - and I'm not sure how the Google bot would react to this.
Any other things I can do? What are best practices here?
Thanks in advance.
One thing I considered is to check the user agent and IP ranges and if the requestor is a bot (as best I can say), then add a div to the end of the most relevant page with links to each individual item. Yes, this would be a huge page to load - and I'm not sure how google bot would react to this.
That would be a very bad thing to do. Serving up different content to the search engines specifically for their benefit is called cloaking and is a great way to get your site banned. Don't even consider it.
Whenever a webmaster is concerned about getting their pages indexed, having an XML sitemap is an easy way to ensure the search engines are aware of your site's content. They're very easy to create and update, too, if your site is database driven. The XML file does not have to be static, so you can dynamically produce it whenever the search engines request it (Google, Yahoo, and Bing all support XML sitemaps). You can find out more about XML sitemaps at sitemaps.org.
If you want to make your content available to search engines and benefit from semantic markup (i.e. HTML), you should also make sure all of your content can be reached through hyperlinks (in other words, not through form submissions or JavaScript). The reason for this is twofold:
The anchor text in the links to your items will contain the keywords you want to rank well for. This is one of the more heavily weighted ranking factors.
Links count as "votes", especially to Google. Links from external websites, especially related websites, are what you'll hear people recommend the most and for good reason. They're valuable to have. But internal links carry weight, too, and can be a great way to prop up your internal item pages.
(Bonus) Google has PageRank which used to be a huge part of their ranking algorithm but plays only a small part now. But it still has value and links "pass" PageRank to each page they link to increasing the PageRank of that page. When you have as many pages as you do that's a lot of potential PageRank to pass around. If you built your site well you could probably get your home page to a PageRank of 6 just from internal linking alone.
Having an HTML sitemap that somehow links to all of your products is a great way to ensure that search engines, and users, can easily find all of your products. It is also recommended that you structure your site so more important pages are closer to the root of your website (home page) and then as you branch out gets to sub pages (categories) and then to specific items. This gives search engines an idea of what pages are important and helps them organize them (which helps them rank them). It also helps them follow those links from top to bottom and find all of your content.
Each item has its own details page, which is accessed via
http://www.mysite.com/item.php?id=item_id
or
http://www.mysite.com/item.php/id/title
This is also bad for SEO. When you can pull up the same page using two different URLs you have duplicate content on your website. Google is on a crusade to increase the quality of their index and they consider duplicate content to be low quality. Their infamous Panda Algorithm is partially out to find and penalize sites with low quality content. Considering how many products you have it is only a matter of time before you are penalized for this. Fortunately the solution is easy. You just need to specify a canonical URL for your product pages. I recommend the second format as it is more search engine friendly.
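Specifying the canonical URL means adding a link element to the <head> of every variant of the page. For the example URLs above, assuming the second format is the preferred one, that would look something like:

```
<link rel="canonical" href="http://www.mysite.com/item.php/id/title" />
```

Search engines then consolidate the duplicate URLs onto the canonical one instead of treating them as separate low-quality pages.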
Read my answer to an SEO question at the Pro Webmaster's site for even more information on SEO.
I would suggest for starters having an xml sitemap. Generate a list of all your pages, and submit this to Google via webmaster tools. It wouldn't hurt having a "friendly" sitemap either - linked to from the front page, which lists all these pages, preferably by category, too.
If you're concerned with SEO, then having links to your pages is hugely important. Google could see your page and think "wow, awesome!" and give you lots of authority -- this authority (some call it "link juice") is then passed down to the pages linked from it. You ought to build a hierarchy of pages, with more important ones closer to the top, and make it wide rather than deep.
Also, showing different stuff to the Google crawler than the "normal" visitor can be harmful in some cases, if Google thinks you're trying to con it.
Sorry -- a little biased toward Google here - but the other engines are similar.

Use of sitemaps

I've recently been involved in the redevelopment of a website (a search engine for health professionals: http://www.tripdatabase.com), and one of the goals was to make it more search engine "friendly", not through any black magic, but through better xhtml compliance, more keyword-rich urls, and a comprehensive sitemap (>500k documents).
Unfortunately, shortly after launching the new version of the site in October 2009, we saw site visits (primarily via organic searches from Google) drop substantially to 30% of their former glory, which wasn't the intention :)
We've brought in a number of SEO experts to help, but none have been able to satisfactorily explain the immediate drop in traffic, and we've heard conflicting advice on various aspects, which I'm hoping someone can help us with.
My questions are thus:
do pages present in sitemaps also need to be spiderable from other pages? We had thought the point of a sitemap was specifically to help spiders get to content not already "visible". But now we're getting the advice to make sure every page is also linked to from another page. Which prompts the question... why bother with sitemaps?
some months on, and only 1% of the sitemap (well-formatted, according to webmaster tools) seems to have been spidered - is this usual?
Thanks in advance,
Phil Murphy
An XML sitemap helps search engine spiders index all the pages of your site.
The sitemap is very useful if you frequently publish many pages, but it does not replace a correct linking structure for the site: every document should be linked from another related page.
Your site is very large, so pay attention to the number of URLs published in the sitemap: there is a limit of 50,000 URLs per XML file.
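Above that 50,000-URL limit, the usual approach is to split the list across several sitemap files and tie them together with a sitemap index file. A sketch of that splitting, with example.com standing in for your domain and the file names chosen arbitrarily:

```python
# Split a large URL list into sitemap files of at most 50,000 URLs
# each, and build a sitemap index that references them.
MAX_URLS = 50000

def chunk(urls, size=MAX_URLS):
    # Yield successive slices of at most `size` URLs.
    for i in range(0, len(urls), size):
        yield urls[i:i + size]

def build_index(base, n_files):
    # Build a sitemap index referencing sitemap-1.xml ... sitemap-N.xml.
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for i in range(n_files):
        lines.append("  <sitemap><loc>%s/sitemap-%d.xml</loc></sitemap>"
                     % (base, i + 1))
    lines.append("</sitemapindex>")
    return "\n".join(lines)

urls = ["http://example.com/doc/%d" % i for i in range(120000)]
files = list(chunk(urls))
print(len(files))  # 120,000 URLs -> 3 sitemap files
print(build_index("http://example.com", len(files)))
```

Each chunk would then be written out with the same urlset format as a normal sitemap, and only the index file needs to be submitted.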
The full documentation is available at Sitemaps.org
re: do pages present in sitemaps also need to be spiderable from other pages?
Yes, in fact this should be one of the first things you do. Make your website more usable to users before the search engines, and the search engines will love you for it. Heavy internal linking between pages is a must-do first step. Most of the time you can do this with internal sitemap pages or category pages, etc.
re: why bother with sitemaps?
Yes! Sitemaps help you set priorities for certain content on your site (like the homepage) and tell the search engines what to look at more often. NOTE: Do not set all your pages to the highest priority; it confuses Google and doesn't help you.
re: some months on, and only 1% of the sitemap seems to have been spidered - is this usual?
Yes! I have a website with 100k+ pages. Google has never indexed them all in a single month; it takes small chunks of about 20k at a time each month. If you use the priority settings properly, you can tell the spider which pages it should re-index on each visit.
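In the XML format, priority and change frequency are per-URL elements. A hedged example entry (the URL and values are purely illustrative - useful values depend on your own site):

```
<url>
  <loc>http://example.com/important-page</loc>
  <changefreq>daily</changefreq>
  <priority>0.9</priority>
</url>
```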
As Rinzi mentioned more documentation is available at Sitemaps.org
Try building more backlinks and "trust" (links from quality sources).
May help speed indexing further :)

What should I add to my site to make Google index the subpages as well?

I am a beginner web developer and I have a site, JammuLinks.com, built on PHP. It is a city local-listing search engine. Basically, I've written search pages which take in a parameter, fetch the records from the database and display them. So the content is generated dynamically. However, if you look at the bottom of the site, I have added many static links where I have hard-coded the parameters in the link, like searchresult.php?tablename='schools'. So my questions are:
Since Google crawls the page and also the links listed on the page, will it crawl the results page data as well? How can I identify whether it has? So far I tried site:www.jammulinks.com but it returns only the homepage and the blog.
What more can I add to get the static links indexed as well?
The best way to do this is to create a sitemap document (you can even get the template from the webmaster portion of Google's site, www.google.com/webmasters/, I believe).