Is it really helpful to store content in flat files rather than a database for better Google/Yahoo/Bing searches?

I just came across a few articles while selecting a wiki for my personal site, and I am confused. As I am setting up a personal wiki for my own projects, I think a flat-file system is a good fit, even for maintaining revisions of design documents, design decisions, and comments/feedback from peers.
But the internet gives a mixed bag of responses, mostly irrelevant information. Can anyone please shed some light on the selection? It would be nice if someone could share their experience choosing a wiki for a personal/small-business site.

You're asking more about Search Engine Optimization (SEO), which has little to do with how you store your content on the server. Whether it is static HTML or a DB-driven application, search engines will still index your pages by trawling from link to link.
Some factors that do affect search engines' ability to index your site:
Over-dependency on JavaScript to drive dynamic content. If certain blocks of information can't even be rendered on the page without invoking JavaScript, that is a problem. Search engines typically don't execute the JS on your page; they just take the content as-is.
Not making use of proper HTML tags to represent the varying classes of data. An <h1> tag is given more emphasis by search engines than a <p> tag. Basically, you just need a proper grasp of which HTML element to tag your content with (see the sketch after these points).
URLs. Strictly speaking, I don't think complicated dynamic URLs are a problem for search engines. However, I've seen some weird content management systems that expose several different URL mappings all pointing to the same content. It would be logical for search engines to treat that same content as separate pages, which can dilute your ranking.
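As a rough illustration of the second and third points, here is a hedged sketch. The page, headings, and URL are invented, and the canonical link is just one common way to consolidate duplicate URL variants, not something the answer above names:

    <html>
    <head>
      <title>Design decisions for Project X - My Wiki</title>
      <!-- If the CMS exposes several URLs for this page, point them all at one canonical address -->
      <link rel="canonical" href="http://example.com/wiki/design-decisions" />
    </head>
    <body>
      <h1>Design decisions for Project X</h1>  <!-- the main topic, weighted more heavily than body text -->
      <h2>Peer feedback</h2>                   <!-- sub-topics as h2/h3 rather than bold paragraphs -->
      <p>Regular paragraphs carry the body copy.</p>
    </body>
    </html>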
There are other factors. I suggest you look up "accessible web content" as your Google search term.
As for flat files vs DB-driven content, think about how you're going to manage the system. At the end of the day, it's your own labor (or your subordinates'). I, for one, sure don't want to spend my time managing content manually. So, a convenient content management system is pretty much mandatory. I know that there are a couple of Wiki implementations that write directly to flat files. As long as the management part of it is good enough, I'm sure they'd be fine for your purposes.


Hiding a page part from Google, does it hurt SEO?

We all know that showing Googlebot content that doesn't exist for visitors is not allowed and will hurt your search positioning, but what about the other way around: showing visitors content that is not shown to Google's bots?
I need to do this because I have photo pages, each with a short title and the photo, along with a textarea containing the embed HTML code. Googlebot is taking the embed code and using it as the page description in its search results, which is very ugly.
Please advise.
When you start playing with tricks like that, you need to consider several things.
... showing visitors content that is not shown to Google's bots.
That approach is a bit tricky.
You can certainly check User-agents to see if a visitor is Googlebot, but Google can add any number of new spiders with different User-agents, which will end up indexing your images anyway. You will have to monitor that constantly.
Testing of each code release will have to cover the "images and Googlebot" scenario. That will extend the testing phase and increase testing costs.
It can also affect future development - all changes will have to be made with the "images and Googlebot" scenario in mind, which can introduce additional constraints into your system.
Personally I would choose a bit different approach:
First of all, review whether you can use any of the methods recommended by Google. Google provides a few helpful pages describing this problem, e.g. Blocking Google or Block or remove pages using a robots.txt file.
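For example, if the content you want kept away from crawlers lives under a predictable path, a robots.txt rule along these lines keeps compliant bots out of it entirely (the /photos/embed/ path is invented for illustration, and note that this blocks those URLs from crawling altogether, which may be too blunt if you still want the photo pages themselves indexed):

    User-agent: *
    Disallow: /photos/embed/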
If that is not enough, maybe restructuring your HTML would help. Consider using JavaScript to build some customer-facing interfaces.
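A hedged sketch of what that restructuring might look like: leave the embed code out of the served HTML and fill the textarea with JavaScript after the page loads, so crawlers that don't execute scripts never see it. The element id, photo URL, and embed markup below are all hypothetical, and a crawler that does run JavaScript would still see the result:

    <!-- Served HTML: the textarea starts out empty -->
    <textarea id="embed-code" rows="3" cols="60"></textarea>

    <script type="text/javascript">
      // Build the embed snippet client-side so it is absent from the raw HTML
      // that script-less crawlers index. All names here are illustrative.
      var photoUrl = "http://example.com/photos/1234.jpg";
      document.getElementById("embed-code").value =
        '<img src="' + photoUrl + '" alt="Photo title" />';
    </script>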
And whatever you do, try to keep it as simple as possible, otherwise very complex solutions can turn around and bite you.
It is very difficult to give good advice without knowing your system, constraints, and strategy, but I hope my answer helps you choose a good architecture/solution for your system.
You're overthinking this.
Google doesn't judge you as cheating just because of something like this; it looks at intent. As long as your purpose is to improve the user experience and you stay away from the common cheating tactics, Google won't consider it cheating.
Just block these pages with robots.txt and you'll be fine. It is not cheating - that's why they came up with a solution like that in the first place.

Crawling for Eternity

I've recently been building a new web app dealing with Recurring Events. These events can recur on a daily, weekly or monthly basis.
This is all working great. But when I started creating the Event Browser page (which will be visible to the public internet), a thought crossed my mind.
If a crawler hits this page, with next and previous buttons to browse the dates, will it just continue forever? So I opted against using plain HTML links and used AJAX instead, which means that bots will not be able to follow the links.
But this approach means I'm losing that functionality for users without JavaScript. Or is the number of users without JavaScript too small to worry about?
Is there a better way to handle this?
I'm also very interested in how bots like the Google crawler detect black holes like these and what they do to handle them.
Add a nofollow hint for the page as a whole (via the robots meta tag) or rel="nofollow" on the individual links you don't want crawled; you can also disallow the paths in robots.txt. See the Robots Exclusion Standard.
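A hedged sketch of both variants, assuming a hypothetical /events URL scheme:

    <!-- On the event browser page: keep well-behaved bots from following the endless pager -->
    <a href="/events?date=2011-06-01" rel="nofollow">&laquo; Previous day</a>
    <a href="/events?date=2011-06-03" rel="nofollow">Next day &raquo;</a>

    <!-- Or, in the <head> of the deep date pages, if they should stay out of the index entirely -->
    <meta name="robots" content="noindex, follow" />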
You may still need to think about how to fend off ill-behaved bots which do not respect the standard.
Even a minimally functional web crawler requires a lot more sophistication than you might imagine, and the situation you describe is not a problem. Crawlers operate on some variant of a breadth-first search, so even if they do nothing to detect black holes, it's not a big deal. Another typical feature of web crawlers that helps is that they avoid fetching a lot of pages from the same domain in a short time span, because otherwise they would inadvertently be performing a DOS attack against any site with less bandwidth than the crawler.
Even though it's not strictly necessary for a crawler to detect black holes, a good one might have all sorts of heuristics to avoid wasting time on low-value pages. For instance, it may choose to ignore pages that don't have a minimum amount of English (or whatever language) text, pages that contain nothing but links, pages that seem to contain binary data, etc. The heuristics don't have to be perfect, because the basic breadth-first nature of the search ensures that no single site can waste too much of the crawler's time, and the sheer size of the web means that even if it misses some "good" pages, there are always plenty of other good pages to be found. (Of course this is from the perspective of the web crawler; if you own the pages being skipped, it might be more of a problem for you, but companies like Google that run web crawlers are intentionally secretive about the exact details of things like that because they don't want people trying to outguess their heuristics.)

Is there a way that is more efficient than a sitemap to add / force a recrawl of / remove your website's index entries in Google?

Pretty much that is the question. Is there a way that is more efficient than the standard sitemap.xml to [add / force a recrawl of / remove], i.e. manage, your website's index entries in Google?
I remember reading an article a few years ago by an unknown blogger who said that when he published news on his website, the URL of the news item would appear immediately in Google's search results. I think he was referring to something special - I don't remember exactly what... some automatic re-crawling system offered by Google themselves? However, I'm not sure about it. So I ask: am I deluding myself, and is there NO OTHER way to manage index content besides sitemap.xml? I just need to be sure about this.
Thank you.
I don't think you will find that magical "silver bullet" answer you're looking for, but here's some additional information and tips that may help:
Depth of crawl and rate of crawl are directly influenced by PageRank (one of the few things it does influence), so increasing the back-link count and quality of your site's homepage and internal pages will assist you.
QDF - this Google algorithm factor, "Query Deserves Freshness", does have a real impact and is one of the core reasons behind the Google Caffeine infrastructure project to allow much faster finding of fresh content. This is one of the main reasons that blogs and sites like SE do well - because the content is "fresh" and matches the query.
XML sitemaps do help with indexation, but they won't result in better ranking. Use them to assist search bots in finding content that is deep in your architecture (a minimal example is shown below).
Pinging, especially by blogs, to services that monitor site changes like ping-o-matic, can really assist in pushing notification of your new content - this can also ensure the search engines become immediately aware of it.
Crawl Budget - be mindful of wasting a search engine's time on parts of your site that don't change or don't deserve a place in the index - using robots.txt and the robots meta tags can herd the search bots to different parts of your site (use with caution so as not to remove high-value content).
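Regarding the sitemap point above, a minimal sitemap.xml looks roughly like this; the URL, date, and other values are placeholders:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://example.com/news/some-fresh-article</loc>
        <lastmod>2011-06-01</lastmod>
        <changefreq>daily</changefreq>
        <priority>0.8</priority>
      </url>
    </urlset>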
Many of these topics are covered online, but there are other intrinsic things like navigational structure, internal linking, site architecture etc that also contribute just as much as any "trick" or "device".
Getting many links from good sites to your website will make the Google "spiders" reach your site faster.
Also, links from social sites like Twitter can help the crawlers visit your site (although the Twitter links do not pass "link juice", the spiders still go through them).
One last thing: update your content regularly - think of content as "Google spider food". If the spiders come to your site and don't find new food, they will not come back soon; if there is new food each time they come, they will come often. Article directories, for example, get indexed several times a day.

What are the cons of not using <meta name="keywords" content="some, words" />?

What if I only use <meta name="description" content="lorem ipsum." />?
I heard that search engines do not give importance to the keywords tag:
<meta name="keywords" content="some, words" />
So is it OK not to use keywords?
I have been looking for evidence of Meta Keyword support for years and never found any documentation that they are supported by anyone. Never. Most of the recommendations supporting them are recycled from everyone else.
Some people say that they may be used in the future... well, I'll get to that in a moment. Other people say that Keywords can't hurt so just include them anyway. But they are incorrect.
Meta keywords are great for letting your competitors know your SEO secrets. You wouldn't tell your competitors this information directly, so don't use them. Competitors are the only people likely to look at your meta keywords.
Since Google set the benchmark for quality, search engines must perform to very high standards to be successful. It's too easy for consumers to switch to Google, which is trusted and reliable.
Consider this:
To build a quality search engine you must, first of all, acquire high-quality information for indexing. This is the foundation of your product.
You must also protect your search index from being manipulated by third parties for their own benefit. Your users will probably not have the same interests as a third party who can modify your search engine's behaviour.
Meta keywords are not derived from the content of the web page through any process that can be considered reliable. They are not tied to the visible content of the page in any verifiable way and can be manipulated without consequence. This makes meta keywords a low-quality source of information - what programmers call "tainted data", data that is not to be trusted.
If you build your Search Engine to index low quality information, your Search Engine won't return useful search results. I propose that it would be impossible to build a search engine today that uses meta keywords that would work well at all.
It's important to stop using meta keywords and to put the meta keywords myth to rest. They just waste everybody's time and are counterproductive. Remember, it's not good practice to add features to your website that don't work. The time you spend on something that doesn't work could be better spent on something that does. Or maybe go look out the window and admire the sky. You'll be better off.
I heard that search engines do not give importance to the keywords tag.
Google doesn't use the keywords meta tag in web search (Source).
However, Yahoo (Source), Bing (Source), and other search engines may still be using them with various degrees of importance. They may also be used by internal search engines.
So is it OK not to use keywords?
"... I hope this clarifies that the keywords meta tag is not something that you need to worry about, or at least not in Google." - Matt Cutts (Google doesn't use the keywords meta tag in web search)
I have heard the same. However, search engine algorithms are not static and may change over time. Furthermore, not all search engines treat the keywords tag equally. I think you should include it if possible.
Google analyzes your page content and gives higher priority to other signals, but I don't know of any reason not to include the keywords meta tag.
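For concreteness, a minimal head following this advice might look like the sketch below; the title, description, and keyword values are placeholders. The description is what often ends up as the search snippet, while the keywords tag is the optional part:

    <head>
      <title>Blue Widgets - Bob's Widget Shop</title>
      <meta name="description" content="Hand-made blue widgets, shipped worldwide." />
      <!-- Ignored by Google for web search; possibly still read by other or internal search engines -->
      <meta name="keywords" content="widgets, blue widgets, bob's widgets" />
    </head>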

SEO for product known by different names

If you're selling widgets, we all know that having "Bob's Widgets" in the title and the H1 gives you a better ranking in Google when people search for "widgets".
But what if, as someone explained to me the other day, their product is known by different names in different parts of the world?
In the US, it's called a Widget. In Canada, it's called a Flidget. In Australia, it's called a Zidget. There's really no official name for it, just informal names.
Meta tags are no problem, but apart from that, what's the best way to cope with that situation? Just make separate pages? You can't have three H1s on the page. One H1 that says "Widgets (aka Flidgets, Zidgets)"?
Or do I just trust that Google is smart enough and some magical taxonomy database groups those three words together as the same thing?
EDIT: This question got downvoted simply because it's about SEO? How bizarre. If you even bother to read the question, you can see I'm not trying to game the system or get away with anything. I have a genuinely interesting question and a valid client need.
Please note also, that I always use semantic HTML, I am well aware of how search engine rankings work, and I'm not trying to get away with anything shady.
If my client was selling beer, I would simply use semantic HTML to put the word "beer" first and foremost. If I was selling beer to French people, I would make another page in French and do the same with "bière". But imagine for a second that beer isn't called "beer" in other English-speaking nations. Imagine it's called "reeb". How do I correctly, semantically code an English-language page when different English-language users will be searching with different strings, but searching for the same thing?
HTML meta-tags were originally created for the purpose of embedding exactly such metadata into a webpage. But because of the SEO industry and the commercialization of the web, meta-tags like 'keywords' are no longer used by major search engines.
With all of the advances in page-ranking algorithms and intelligent search robots over the years, there's really not much to do in terms of active 'search engine optimization' for legitimate websites. In today's search environment, all you have to do is optimize your site for your visitors, and it will automatically be optimized for searching.
So you can passively optimize your site's ranking by doing any (or all) of the following:
Use good spelling and writing etiquette (like not writing your entire site in caps or text-message-speak)
Format your pages using proper markup. (Title your document, mark your headings with H1/H2/etc., delimit your paragraphs, and so on and so forth.)
Abide by established web standards and write well-formed code.
Weed out broken links and make sure your site works properly.
Don't use pop-ups, cover your site with banner ads, or otherwise bombard visitors with advertising
Don't link to disreputable websites
Simply put, make your site as user-friendly and as accessible as possible. If your site is useful to visitors and provides valuable content, most major search engines like Google or Yahoo! are smart enough to rank it fairly. Your ranking may be modest at first. But if you're genuinely supplying quality content then, as your site becomes better established on the web, other sites will start linking to you, increasing your search ranking.
And if other webpages linking to your site use the various names & nicknames your product is referred to by, then your site will also be associated with those names/keywords (that's how Google Bombing works). Google also tracks synonymous search terms and is even smart enough to recommend related/alternative search terms in some cases.
On the other hand, if you're creating a spam site or the 10 millionth affiliate marketing website with the same exact products and content as the other 9,999,999 sites of the same exact nature, then expect your search engine ranking to be reasonably poor.
It's generally only websites with no original content and that provide no legitimate value to visitors that require active (black hat) SEO techniques to gain a decent ranking--polluting search results in the process. Otherwise, if you're actually building a useful website, then just optimize it for your visitors and let Google/Yahoo! do their job.
The anchor text of your inbound links is a lot more important than the tags you use. So try getting links to your page with both "beer" and "reeb" as anchor text. As long as you get enough links with both terms, you'll do well in the SERPs, no matter which keywords you use on the page.
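A hedged sketch of how that might look in practice, with invented URLs and copy: one h1 carrying the primary name, the regional names worked naturally into the title and body text, and inbound links from other sites using the different terms as anchor text.

    <!-- On the page itself -->
    <title>Widgets (also known as Flidgets and Zidgets) - Bob's Widgets</title>
    <h1>Widgets</h1>
    <p>Known as flidgets in Canada and zidgets in Australia, our widgets ...</p>

    <!-- Inbound links from other sites, varying the anchor text -->
    <a href="http://example.com/widgets">widgets</a>
    <a href="http://example.com/widgets">flidgets</a>
    <a href="http://example.com/widgets">zidgets</a>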
One option is to localize pages for the different target regions you are interested in.
If you use a local domain, Google will give it priority in default searches for that country. When I hit www.google.com, it redirects me to www.google.com.mx, and any search I do tends to rank results from Mexican domains highly. I actually have to change a couple of options when I don't want that behavior.
I also think Google has an option to map parts of the site to a region, so you can keep a single domain.
Update: Regarding the beer example, you can localize per country (which is what I mention above). Actually, it's not that unusual a need, since British English and US English have their own differences.
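The option mentioned above is Google's geographic targeting setting, but, as an assumption on my part rather than something this answer names, regional variants of a page can also be declared in the markup itself with hreflang alternate links (the URLs are placeholders):

    <link rel="alternate" hreflang="en-us" href="http://example.com/us/beer" />
    <link rel="alternate" hreflang="en-ca" href="http://example.com/ca/beer" />
    <link rel="alternate" hreflang="en-au" href="http://example.com/au/beer" />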
The discussion has been language-agnostic so far, but consider how .NET handles resources. Let's say the current request is being processed for en-GB and you look up a resource (e.g. a text string, an image, etc.). It will first try to find the resource for the specific culture, en-GB; if it isn't found, it will look under the more general en (and then in the default resource file).
This lets you selectively localize only what you really need in the more specific resource files. If you only need to localize the resource with the key beerName, you can configure just that for the specific languages and leave the rest alone.