SEO: how can dynamic URL with query strings be searched by search engine bots? - sql

I’m developing an ecommerce web site in ASP.NET using SQL server 2008 database.
Most of my pages are database driven and all the content is gathered from a SQL Server.
Every product page is created dynamically from data coming from the database, hence every product’s page URL has a unique query string, containing a “product_id” variable.
*Example: http://www.myecommence.com/products.aspx?product_id=1*
I'd like to improve my Search Engine Optimization.
Dealing with a small number of products could be fine but what if I
had more than 1000 products, how could every product be crawled?
How does the google spider/bot know that a product_id with a
hypothetical number of 767 exists?
I’ve been googleing this, still I can’t understand how pages that
have absolutely no reference in the site or external sites can be
crawled? If this is possible the spider should know how to read the
website’s database tables, but I guess that this is not the case.
At this point since most of the pages and links are dynamic how
could they be indexed, the same thing applies to “user detail” pages
that are accessed via query string using a “user id=n”?
Probably what I’m asking has already been discussed but still I don’t have clear some points.

I would advise using Mod Rewrite rules to make your URLs search engine friendly.
This is very important for Google.
As is a good category structure.
Eg:
domain.com/t-shirts/girls/star-wars-t-shirt/
is far better than
domain.com/products.aspx?product_id=1*
Here is some info:
http://msdn.microsoft.com/en-us/library/ms972974.aspx
http://www.wrox.com/WileyCDA/Section/id-305997.html
To answer your questions:
Dealing with a small number of products could be fine but what if I had more than 1000 products, how could every product be crawled?
If you have a good sitemap / menu structure etc, it is likely that Google will crawl all your pages.
How does the google spider/bot know that a product_id with a hypothetical number of 767 exists?
Via crawling your site, via your sitemap, via the menu system on the site etc. However always remember: Google is not psychic - it cannot find a page unless you tell how to / link to it.
I’ve been googleing this, still I can’t understand how pages that have absolutely no reference in the site or external sites can be crawled? If this is possible the spider should know how to read the website’s database tables, but I guess that this is not the case.
If you have no reference - you are doing something wrong. Improve your site structure.
At this point since most of the pages and links are dynamic how could they be indexed, the same thing applies to “user detail” pages that are accessed via query string using a “user id=n”?
Nothing wrong with a dynamic URL per-se - but again I would recommend implementing search engine friendly URLs via Mod Rewrite or similar - see the above resources.
Good luck,
Colin

Modern systems optimize for SEO by allowing for either custom or automated URLs that remap to your id based url pattern. This URL style allows for a fully custom word for word product title or keyword/description, which carries more weight than a random id number in a URL.
To ensure all individual pages are indexed, you generally benefit most from submitting or making available a sitemap xml. More info from google on generating one here:
https://code.google.com/p/googlesitemapgenerator/
Hope that gets you going in the right direction!

Related

SEO Search Only content

We have a ton of content on our website which a user can get to by performing a search on the website. For example, we have data for all Public companies, in the form of individual pages per company. So think like 10,000 pages in total. Now in order to get to these pages, a user needs to search for the company name and from the search results, click on the company name they are interested in.
How would a search bot find this page? There is no page on the website which has links to these 10,000 pages. Think amazon, you need to search for your product and then from the search results, click on the product you are interested in to get to it.
The closest solution I could find was the sitemap.xml, is that it? Anything which doesn't require adding 10,000 links to an xml file?
You need to link to a page, or for it to be close to the homepage for it to stand a decent chance of getting indexed by Google.
A sitemap helps, sure, but a page still needs to exist in the menu / site structure. A sitemap reference alone does not guarantee a resource will be indexed.
Google - Webmaster Support on Sitemaps: "Google doesn't guarantee that we'll crawl or index all of your URLs. However, we use the data in your Sitemap to learn about your site's structure, which will allow us to improve our crawler schedule and do a better job crawling your site in the future. In most cases, webmasters will benefit from Sitemap submission, and in no case will you be penalized for it."
If you browse Amazon, it will be possible to find 99% of the products available. Amazon do a lot of interesting stuff in their faceted navigation, you could write a book on it.
Speak to an SEO or a usability / CRO expert - they will be able to tell you what you need to do - which is basically create a user friendly site with categories & links to all your products.
An XML sitemap pretty much is your only on-site option if you do not or cannot link to these products on your website. You could link to these pages from other websites but that doesn't seem like a likely scenario.
Adding 10,000 products to an XML sitemap is easy to do. Your sitemap can be dynamic just like your web pages are. Just generate it on the fly when requested like you would a regular web page and include whatever products you want to be found and indexed.

SEO: Allowing crawler to index all pages when only few are visible at a time

I'm working on improving the site for the SEO purposes and hit an interesting issue. The site, among other things, includes a large directory of individual items (it doesn't really matter what these are). Each item has its own details page, which is accessed via
http://www.mysite.com/item.php?id=item_id
or
http://www.mysite.com/item.php/id/title
The directory is large - having about 100,000 items in it. Naturally, on any of the pages only a few items are listed. For example, on the main site homepage, there are links to about 5 or 6 items, from some other page there links to about a dozen different items, etc.
When real users visits the site, they can use search form to find item by keyword or location - so there would be a list produced matching their search criteria. However when, for example, a google crawler visits the site, it won't even attempt to put a text into the keyword search field and submit the form. Thus as far as the bot is concern, after indexing the entire site, it has covered only a few dozen items at best. Naturally, I want it to index each individual item separately. What are my options here?
One thing I considered is to check the user agent and IP ranges and if the requestor is a bot (as best I can say), then add a div to the end of the most relevant page with links to each individual item. Yes, this would be a huge page to load - and I'm not sure how google bot would react to this.
Any other things I can do? What are best practices here?
Thanks in advance.
One thing I considered is to check the user agent and IP ranges and if
the requestor is a bot (as best I can say), then add a div to the end
of the most relevant page with links to each individual item. Yes,
this would be a huge page to load - and I'm not sure how google bot
would react to this.
That would be a very bad thing to do. Serving up different content to the search engines specifically for their benefit is called cloaking and is a great way to get your site banned. Don't even consider it.
Whenever a webmaster is concerned about getting their pages indexed having an XML sitemap is an easy way to ensure the search engines are aware of your site's content. They're very easy to create and update, too, if your site is database driven. The XML file does not have to be static so you can dynamically produce it whenever the search engines request it (Google, Yahoo, and Bing all support XML sitemaps). You can find out mroe about XML sitemaps at sitemaps.org.
If you want to make your content available to search engines and want to benefit from semantic markup (i.e. HTML) you should also make sure your all of content can be reached through hyperlinks (in other words not through form submissions or JavaScript). The reason for this is twofold:
The anchor text in the links to your items will contain the keywords you want to rank well for. This is one of the more heavily weighted ranking factors.
Links count as "votes", especially to Google. Links from external websites, especially related websites, are what you'll hear people recommend the most and for good reason. They're valuable to have. But internal links carry weight, too, and can be a great way to prop up your internal item pages.
(Bonus) Google has PageRank which used to be a huge part of their ranking algorithm but plays only a small part now. But it still has value and links "pass" PageRank to each page they link to increasing the PageRank of that page. When you have as many pages as you do that's a lot of potential PageRank to pass around. If you built your site well you could probably get your home page to a PageRank of 6 just from internal linking alone.
Having an HTML sitemap that somehow links to all of your products is a great way to ensure that search engines, and users, can easily find all of your products. It is also recommended that you structure your site so more important pages are closer to the root of your website (home page) and then as you branch out gets to sub pages (categories) and then to specific items. This gives search engines an idea of what pages are important and helps them organize them (which helps them rank them). It also helps them follow those links from top to bottom and find all of your content.
Each item has its own details page, which is accessed via
http://www.mysite.com/item.php?id=item_id
or
http://www.mysite.com/item.php/id/title
This is also bad for SEO. When you can pull up the same page using two different URLs you have duplicate content on your website. Google is on a crusade to increase the quality of their index and they consider duplicate content to be low quality. Their infamous Panda Algorithm is partially out to find and penalize sites with low quality content. Considering how many products you have it is only a matter of time before you are penalized for this. Fortunately the solution is easy. You just need to specify a canonical URL for your product pages. I recommend the second format as it is more search engine friendly.
Read my answer to an SEO question at the Pro Webmaster's site for even more information on SEO.
I would suggest for starters having an xml sitemap. Generate a list of all your pages, and submit this to Google via webmaster tools. It wouldn't hurt having a "friendly" sitemap either - linked to from the front page, which lists all these pages, preferably by category, too.
If you're concerned with SEO, then having links to your pages is hugely important. Google could see your page and think "wow, awesome!" and give you lots of authority -- this authority (some like to call it link juice" is then passed down to pages that are linked from it. You ought to make a hierarchy of files, more important ones closer to the top and/or making it wide instead of deep.
Also, showing different stuff to the Google crawler than the "normal" visitor can be harmful in some cases, if Google thinks you're trying to con it.
Sorry -- A little bias on Google here - but the other engines are similar.

SEO Question, and about Server.Transfer (Asp.net)

So, we're trying to up our application in the rankings in the search engines, and one way our SEO guy told us to do that was to register similar domains...for example we have something like
http://www.myapplication.com/parks.html
so..we acquired the domain parks.com (again just an example).
Now when people go to http://www.parks.com ...we want it to display the content of http://www.myapplication.com/parks.html.
I could just put a forwarding page there, but from what i've been told that makes us look bad because it's technically a permanent redirect..and we're trying to get higher in the search engine rankings, not lower.
Is this a situation where we would use the Server.Transfer method of ASP.net?
How are situations like this handled, because I've defiantly seen this done by many websites.
We also don't want to cheat the system, we are showing relevant content and not spam or tricking customers in anyway, so the proper way to do achieve what i'm looking for would be great.
Thanks
Use your "similar" domain names to host individual and targetted landing pages that will point to your master content.
It's easier to manage and you will get a higher conversion rate.
Having to create individual page will force you to write relevent content and will increase the popularity of the page.
I also suggest you to not only build landing pages, but mini sites (of few pages).
SEO is sa very high demanding task.
Regarding technical aspects: Server.Transfer is what you should use. Never use Response.Redirect, Google and other search engines will drop your ranking.
I used permanent URL rewrite in the past. I changed my website and since lots of traffic was coming from others website linking mine, I wanted to have a permanent solution.
Read more about URL rewriting : http://msdn.microsoft.com/en-us/library/ms972974.aspx

Should a sitemap have *every* url

I have a site with a huge number (well, thousands or tens of thousands) of dynamic URLs, plus a few static URLs.
In theory, due to some cunning SEO linkage on the homepage, it should be possible for any spider to crawl the site and discover all the dynamic urls via a spider-friendly search.
Given this, do I really need to worry about expending the effort to produce a dynamic sitemap index that includes all these URLs, or should I simply ensure that all the main static URLs are in there?
That actual way in which I would generate this isn't a concern - I'm just questioning the need to actually do it.
Indeed, the Google FAQ (and yes, I know they're not the only search engine!) about this recommends including URLs in the sitemap that might not be discovered by a crawl; based on that fact, then, if every URL in your site is reachable from another, surely the only URL you really need as a baseline in your sitemap for a well-designed site is your homepage?
If there is more than one way to get to a page, you should pick a main URL for each page that contains the actual content, and put those URLs in the site map. I.e. the site map should contain links to the actual content, not every possible URL to get to the same content.
Also consider putting canonical meta tags in the pages with this main URL, so that spiders can recognise a page even if it's reachable through different dynamical URLs.
Spiders only spend a limited time searching each site, so you should make it easy to find the actual content as soon as possible. A site map can be a great help as you can use it to point directly to the actual content so that the spider doesn't have to look for it.
We have had a pretty good results using these methods, and Google now indexes 80-90% of our dynamic content. :)
In an SO podcast they talked about limitations on the number of links you could include/submit in a sitemap (around 500 per page with a page limit based on pagerank?) and how you would need to break them over multiple pages.
Given this, do I really need to worry
about expending the effort to produce
a dynamic sitemap index that includes
all these URLs, or should I simply
ensure that all the main static URLs
are in there?
I was under the impression that the sitemap wasn't necessarily about disconnected pages but rather about increasing the crawling of existing pages. In my experience when a site includes a sitemap, minor pages even when prominently linked to are more likely to appear on Google results. Depending on the pagerank/inbound links etc. of your site this may be less of an issue.

Is a deep directory structure a bad thing for SEO?

a friend of mine told me that the company he works at are redoing their SEO for their large website. Large == both number of pages and traffic they get a day.
Currently they have a (quote) deeply nested site , which i'm assuming means /x/y/z/a/b/c.. or something. I also know it's very unRESTful from some of the pages i've also seen -> eg. foo.blah?a=1&b=2&c=3......z=24 (yep, lots of crap in the url).
So updating their SEO sounds like a much needed thing.
But, they are going flat. I mean -> totally flat. eg. /foo-bar-pew-pew-abc-article1
This scares the bollox out of me.
From what he said (if i understood him right), each - character doesn't mean a new heirachial level.
so /foo-bar-pew-pew-abc-article1 does not mean /foo/bar/pew/pew/abc/article1
A space could be replace by a -. A + represents a space, but only if the two words are suppose to be one word (whatever that means). ie. Jean-Luke will be jean+luke but if i had a subject like 'hello world, that would be listed ashello-world`.
Excuse me while i blow my head up.
Is this just mean or is it totally silly to go completly flat. To mean, I was under the impression that when SEO people say keep it as flat as possible, they are trying to say keep it to 1 or 2 levels. 4 is the utter max=.
Is this me or is a flat heirachy a 'really really good thing' for seo ... for MEDIUM and LARGE sites (lots of resources, not necessairly lots of hits/page views).
Well, let's take a step back and look at what SEO is supposed to accomplish; it's meant to help a search engine identify quality, relevant content for users based on key phrases and terms.
Take, for example, the following blog URLs:
* http://blog.example.com/articles/2010/01/20/how-to-improve-seo/
* http://blog.example.com/how-to-improve-seo/
Yes, one is deep and the other is flat; but the URL structure is important for two reasons:
URL terms and phrases are high-value targets for determining relevance of a page by a search engine
A confusing URL may immediately force a user to skip your link in the search results
Let's face it: Google and other search engines can associate even the worst URLs with relevant content.
Take, for example, a search for "sears kenmore white refrigerator" in Google: http://www.google.com/search?q=sears+kenmore+white+refrigerator&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a.
Notice the top hit? The URL is http://www.sears.com/shc/s/p_10153_12605_04665802000P , and yet Google replaces the lousy URL with www.sears.com › Refrigerators › Top Freezers. (Granted, 2 results down is the true URL.)
If your goal for SEO is optimized organic relevance, then I would wholeheartedly recommend generating either key/value pairs in the URL, like www.sears.com/category/refrigerators/company/kenmore (meh), or phrase-like URLs like www.sears.com/kenmore/refrigerators/modelNumber. You want to align your URLs with the user's search terms and phrases to maximize your effort.
In the end, if you offer valuable content and you structure your content and site properly, the search engines will accurately gather it. You just need to help them realize how specific and authoritative your content is. :)
Generally the less navigation to reach content the better. But with a logical breadcrumb strategy and well thought out deep linking the excess of directory depth can be managed and not hurt seo and the visibility in search.
Remember that Google is trying to return the most relevant link and the best user experience, so if your site has 3 urls coming up for the same search term and it take 2 or 3 exits to find the appropriate content, Google will read that as bad and start lowering all of your urls in SERPs.
You have to consider how visitors will find your content - not navigate it. Think content discovery and just navigation.
HTH
Flat or deeply nested really shouldn't affect the SEO. The key part is how those individual pages are linked to will determine how they get ranked. I did write some basic stuff on this years ago see here, but essentially if pages are not buried deeply within a site, i.e. it takes several clicks (or links from Google's perspective) then they should rank fairly much the same in either case. Google used to put a lot more weight on keywords in URL's but this has been scaled back in more recent algorithm changes. It helps to have keywords there, but its no longer the be-all and end-all.
What you/they will need to consider are the following two important points:
1) How will the URL structure be perceived by the users of the site? Will they they be able to easily navigate the site and not have to rely on the URL structure in the address bar?
2) In making navigational changes such as this its vitally important to set-up redirects from old url's. Google, hates 404's and they should either put in 410 (Gone) HTTP responses for pages are no longer valid or 301 HTTP response for permanent redirects (with new url).
In making any large changes such as this you can save loads of time getting the site indexed successfully by utilising XML sitemaps and Google's webmaster console.