How does Google Know you are Cloaking? - seo

I can't seem to find any information on how google determines if you are cloaking your content. How, from a technical standpoint, do you think they are determining this? Are they sending in things other than the googlebot and comparing it to the googlebot results? Do they have a team of human beings comparing? Or can they somehow tell that you have checked the user agent and executed a different code path because you saw "googlebot" in the name?
It's in relation to this question on legitimate url cloaking for seo. If textual content is exactly the same, but the rendering is different (1995-style html vs. ajax vs. flash), is there really a problem with cloaking?
Thanks for your put on this one.

As far as I know, how Google prepares search engine results is secret and constantly changing. Spoofing different user-agents is easy, so they might do that. They also might, in the case of Javascript, actually render partial or entire pages. "Do they have a team of human beings comparing?" This is doubtful. A lot has been written on Google's crawling strategies including this, but if humans are involved, they're only called in for specific cases. I even doubt this: any person-power spent is probably spent by tweaking the crawling engine.

Google looks at your site while presenting user-agent's other than googlebot.

See the Google Chrome comic book page 11 where it describes (even better than layman's terms) about how a Google tool can take a schematic of a web page. They could be using this or similar technology for Google search indexing and cloak detection - at least that would be another good use for it.

Google does hire contractors (indirectly, through an outside agency, for very low pay) to manually review documents returned as search results and judge their relevance to the search terms, quality of translations, etc. I highly doubt that this is their only tool for detecting cloaking, but it is one of them.

In reality, many of Google's algos are trivially reversed and are far from rocket science. In the case of, so called, "cloaking detection" all of the previous guesses are on the money (apart from, somewhat ironically, John K lol) If you don't believe me set up some test sites (inputs) and some 'cloaking test cases' (further inputs), submit your sites to uncle Google (processing) and test your non-assumptions via pseudo-advanced human-based cognitive correlationary quantum perceptions (<-- btw, i made that up for entertainment value (and now i'm nesting parentheses to really mess with your mind :)) AKA "checking google resuts to see if you are banned yet" (outputs). Loop until enlightenment == True (noob!) lol

A very simple test would be to compare the file size of a webpage the Googlbot saw against the file size of the page scanned by an alias user of Google that looks like a normal user.
This would detect most suspect candidates for closeer examination.

They call your page using tools like curl and they construct a hash based on the page without the user agent, then they construct another hash with the googlebot user-agent. Both hashes must be similars, they have algorithms to check the hashes and know if its cloaking or not

Related

Hiding a page part from Google, does it hurt SEO?

We all know that showing inexistent stuff to Google bots is not allowed and will hurt the search positioning but what about the other way around; showing stuff to visitors that are not displayed for Google bots?
I need to do this because I have photo pages each with the short title and the photo along with textarea containing the embed HTML code. googlebot is taking the embed code and putting it at the page description on its search results which is very ugly.
Please advise.
When you start playing with tricks like that, you need to consider several things.
... showing stuff to visitors that are not displayed for Google bots.
That approach is a bit tricky.
You can certainly check User-agents to see if a visitor is Googlebot, but Google can add any number of new spiders with different User-agents, which will index your images in the end. You will have to constantly monitor that.
Testing of each code release your website will have to check "images and Googlebot" scenario. That will extend testing phase and testing cost.
That can also affect future development - all changes will have to be done with "images and Googlebot" scenario in mind which can introduce additional constraints to your system.
Personally I would choose a bit different approach:
First of all review if you can use any methods recommended by Google. Google provides a few nice pages describing that problem e.g. Blocking Google or Block or remove pages using a robots.txt file.
If that is not enough, maybe restructuring of you HTML would help. Consider using JavaScript to build some customer facing interfaces.
And whatever you do, try to keep it as simple as possible, otherwise very complex solutions can turn around and bite you.
It is very difficult to give you very good advise without knowledge of your system, constraints and strategy. But I hope my answer will help you out to choose good architecture / solution for your system.
Boy, you want more.
Google does not because of a respect therefore judge you cheat, he needs a review, as long as your purpose to the user experience, the common cheating tactics, Google does not think you cheating.
just block these pages with robots.txt and you`ll be fine, it is not cheating - that's why they came with solution like that in the first place

reCAPTCHA vs other captcha systems

What is a good reason to choose reCAPTCHA over a well known and tested captcha generator on the server. Is it just philanthropy (helping with digitizing texts) or are there other good reasons.
reCAPTCHA is rather neat. Not only does it stop spammers but it helps digitize books. Each word that appears in the captcha has actually been scanned in from a book but sometimes the character recognition is off so the computer my save some gibberish of a sentence without knowing any better.
See the image off their site:
By making people type in what they think the word is, it helps create a digital copy of the book or word that was scanned with accuracy while at the same time checking what the user submit, comparing it to other's submissions, and determining if the user is human or not.
For that reason I use reCAPTCHA. I'm not just selfishly protecting my site, I'm providing a service for others.
Not only that but it's fairly simple to implement and provided by a reliable company (Google).
The question was "why should I use it"; that question must include "why shouldn't I use it", so some criticisms:
Recaptcha volunteers your users to be OCR monkeys, without bothering to ask their opinion.
It requires that you advertise recaptcha in the captcha widget, which isn't always appropriate.
It's a web service, which means there's no hard guarantee it'll still exist a week or a year or two years from now. (Google has crippled or removed public, widely-used APIs in the past, such as their translation API.)
It only supports web pages, loading everything with scripts and iframes. It doesn't have a proper API, so if you ever want to have an iOS or Android app that logs into your system, and need to show a captcha there, you'll be out of luck.
You have no control over the complexity of the generated captcha. Captchas always have a tradeoff between how hard they are to read and how difficult they are to OCR. There are no knobs to adjust, based on how important stopping robots is to your use case. If they decide to make the captchas much harder to read (which they've done at times), and this becomes a nuisance to your users, there's nothing you can do about it.
reCAPTCHA is quite good. Most other generators are broken easily while reCAPTCHA usually gets good scores.
Another good thing is that it has the accessiblity button so that it would read the text.
This is an old threat but I would just like to confirm that in my case we used reCAPTCHA on a number of Drupal 6 websites in combination with the Honeypot module. We did that to stop automated spam user registrations.
I presume these user accounts were being created automatically by desktop applications such as SEnuke XCr and XRumer with the aim of then posting spam. They create the user account but they rarely do anything further but I found it annoying. Further reading on this subject can be found here: How to prevent spam user registrations? (links to an article on Drupal.org).
I can confirm that the above reduced my spam user registrations from a little over 100 a day to none at all.
We need to register our IP address on which server would be running. Its seems some what risky. So we might be required to change registration work flow in case of use of reCAPTCHA.

Is listing all products on the homepage's footer making a real difference SEO-wise?

I'm working on a website on which I am asked to add to the homepage's footer a list of all the products that are sold on the website along with a link to the products' detail pages.
The problem is that there are about 900 items to display.
Not only that doesn't look good but that makes the page render a lot slower.
I've been told that such a technique would improve the website's visibility in Search Engine.
I've also heard that such techniques could lead to the opposite effect: google seeing it as "spam".
My question is: Is listing products of a website on its homepage really efficient when it comes to becoming more visible on search engines?
That technique is called keyword stuffing and Google says that it's not a good idea:
"Keyword stuffing" refers to the practice of loading a webpage with keywords in an attempt to manipulate a site's ranking in Google's search results. Filling pages with keywords results in a negative user experience, and can harm your site's ranking. Focus on creating useful, information-rich content that uses keywords appropriately and in context.
Now you might want to ask: Does their crawler really realize that the list at the bottom of the page is just keyword stuffing? Well, that's a question that only Google could answer (and I'm pretty sure that they don't want to). In any case: Even if you could make a keyword stuffing block that is not recognized, they will probably improve they algorithm and -- sooner or later -- discover the truth. My recommendation: Don't do it.
If you want to optimize your search engine page ranking, do it "the right way" and read the Search Engine Optimization Guide published by Google.
Google is likely to see a huge list of keywords at the bottom of each page as spam. I'd highly recommend not doing this.
When is it ever a good idea to specify 900 items to a user? good practice dictates that large lists are usually paginated to avoid giving the user a huge blob of stuff to look through at once.
That's a good rule of thumb, if you're doing it to help the user, then it's probably good ... if you're doing it purely to help a machine (ie. google/bing), then it might be a bad idea.
You can return different html to genuine users and google by inspecting the user agent of the web request.
That way you can provide the google bot with a lot more text than you'd give a human user.
Update: People have pointed out that you shouldn't do this. I'm leaving this answer up though so that people know it's possible but bad.

What are the common sense SEO practices that aren't dodgy or crap? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Improve this question
In SEO there are a few techniques that have been flagged that need to avoided at all costs. These are all techniques that used to be perfectly acceptable but are now taboo. Number 1: Spammy guest blogging: Blowing up a page with guest comments is no longer a benefit. Number 2: Optimized Anchors: These have become counterproductive, instead use safe anchors. Number 3: Low Quality Links: Often sites will be flooded with hyperlinks that take you to low quality Q&A sites, don’t do this. Number 4: Keyword Heavy Content: Try and avoid too many of these, use longer well written sections more liberally. Number 5: Link-Back Overuse: Back links can be a great way to redirect to your site but over saturation will make people feel trapped
Content, Content, CONTENT! Create worthwhile content that other people will want to link to from their sites.
Google has the best tools for webmasters, but remember that they aren't the only search engine around. You should also look into Bing and Yahoo!'s webmaster tool offerings (here are the tools for Bing; here for Yahoo). Both of them also accept sitemap.xml files, so if you're going to make one for Google, then you may as well submit it elsewhere as well.
Google Analytics is very useful for helping you tweak this sort of thing. It makes it easy to see the effect that your changes are having.
Google and Bing both have very useful SEO blogs. Here is Google's. Here is Bing's. Read through them--they have a lot of useful information.
Meta keywords and meta descriptions may or may not be useful these days. I don't see the harm in including them if they are applicable.
If your page might be reached by more than one URL (i.e., www.mysite.com/default.aspx versus mysite.com/default.aspx versus www.mysite.com/), then be aware that that sort of thing sometimes confuses search engines, and they may penalize you for what they perceive as duplicated content. Use the link rel="canoncial" element to help avoid this problem.
Adjust your site's layout so that the main content comes as early as possible in the HTML source.
Understand and utilize your robots.txt and meta robots tags.
When you register your domain name, go ahead and claim it for as long of a period of time as you can. If your domain name registration is set to expire ten years from now rather than one year from now, search engines will take you more seriously.
As you probably know already, having other reputable sites that link to your site is a good thing (as long as those links are legitimate).
I'm sure there are many more tips as well. Good luck!
In addition to having quality content, content should be added/updated regularly. I believe that Google (an likely others) will have some bias toward the general "freshness" of content on your site.
Also, try to make sure that the content that the crawler sees is as close as possible to what the user will see (can be tricky for localized pages). If you're careless, your site may be be blacklisted for "bait-and-switch" tactics.
Don't implement important text-based
sections in Flash - Google will
probably not see them and if it does,
it'll screw it up.
Google can Index Flash. I don't know how well but it can. :)
A well organized, easy to navigate, hierarchical site.
There are many SEO practices that all work and that people should take into consideration. But fundamentally, I think it's important to remember that Google doesn't necessarily want people to be using SEO. More and more, google is striving to create a search engine that is capable of ranking websites based on how good the content is, and solely on that. It wants to be able to see what good content is in ways in which we can't trick it. Think about, at the very beginning of search engines, a site which had the same keyword on the same webpage repeated 200 times was sure to rank for that keyword, just like a site with any number of backlinks, regardless of the quality or PR of the sites they come from, was assured Google popularity. We're past that now, but is SEO is still , in a certain way, tricking a search engine into making it believe that your site has good content, because you buy backlinks, or comments, or such things.
I'm not saying that SEO is a bad practice, far from that. But Google is taking more and more measures to make its search results independant of the regular SEO practices we use today. That is way I can't stress this enough: write good content. Content, content, content. Make it unique, make it new, add it as often as you can. A lot of it. That's what matters. Google will always rank a site if it sees that there is a lot of new content, and even more so if it sees content coming onto the site in other ways, especially through commenting.
Common sense is uncommon. Things that appear obvious to me or you wouldn't be so obvious to someone else.
SEO is the process of effectively creating and promoting valuable content or tools, ensuring either is totally accessible to people and robots (search engine robots).
The SEO process includes and is far from being limited to such uncommon sense principles as:
Improving page load time (through minification, including a trailing slash in URLs, eliminating unnecessary code or db calls, etc.)
Canonicalization and redirection of broken links (organizing information and ensuring people/robots find what they're looking for)
Coherent, semantic use of language (from inclusion and emphasis of targeted keywords where they semantically make sense [and earn a rankings boost from SE's] all the way through semantic permalink architecture)
Mining search data to determine what people are going to be searching for before they do, and preparing awesome tools/content to serve their needs
SEO matters when you want your content to be found/accessed by people -- especially for topics/industries where many players compete for attention.
SEO does not matter if you do not want your content to be found/accessed, and there are times when SEO is inappropriate. Motives for not wanting your content found -- the only instances when SEO doesn't matter -- might vary, and include:
Privacy
When you want to hide content from the general public for some reason, you have no incentive to optimize a site for search engines.
Exclusivity
If you're offering something you don't want the general public to have, you need not necessarily optimize that.
Security
For example, say, you're an SEO looking to improve your domain's page load time, so you serve static content through a cookieless domain. Although the cookieless domain is used to improve the SEO of another domain, the cookieless domain need not be optimized itself for search engines.
Testing In Isolation
Let's say you want to measure how many people link to a site within a year which is completely promoted with AdWords, and through no other medium.
When One's Business Doesn't Rely On The Web For Traffic, Nor Would They Want To
Many local businesses or businesses which rely on point-of-sale or earning their traffic through some other mechanism than digital marketing may not want to even consider optimizing their site for search engines because they've already optimized it for some other system, perhaps like people walking down a street after emptying out of bars or an amusement park.
When Competing Differently In An A Saturated Market
Let's say you want to market entirely through social media, or internet cred & reputation here on SE. In such instances, you don't have to worry much about SEO.
Go real and do for user not for robots you will reach the success!!
Thanks!

SEO for product known by different names

If you're selling widgets, we all know that having "Bob's Widgets" in the title and the H1 gives you a better ranking in Google when people search for "widgets".
But what if, as someone explained to me the other day, their product is known by different names in different parts of the world?
In the US, it's called a Widget. In Canada, it's called a Flidget. In Australia, it's called a Zidget. There's really no official name for it, just informal names.
Meta-tags are no problem, but apart from that, what's the best way to cope with that situation? Just make separate pages? You can't have 3 H1s on the page. One H1 which says "Widgets, (aka Flidgets, Zidgets)"?
Or do I just trust that Google is smart enough and some magical taxonomy database groups those three words together as the same thing?
EDIT: This question got downvoted simply because it's about SEO? How bizarre. If you even bother to read the question, you can see I'm not trying to game the system or get away with anything. I have a genuinely interesting question and a valid client need.
Please note also, that I always use semantic HTML, I am well aware of how search engine rankings work, and I'm not trying to get away with anything shady.
If my client was selling beer, I would simply use semantic HTML to put the word "beer" first and foremost. If I was selling beer to French people, I would make another page in French and do the same with "biere". But imagine for a second that beer isn't called "beer" in other English-speaking nations. Imagine it's called "reeb". How do I correctly, semantically code an English-language page when different English-language users will be searching using a different string, but searching for the same thing.
HTML meta-tags were originally created for the purpose of embedding exactly such metadata into a webpage. But because of the SEO industry and the commercialization of the web, meta-tags like 'keywords' are no longer used by major search engines.
With all of the advances in page ranking algorithms and intelligent search robots over the years, there's really not much to do in terms of active 'search engine optimization' for legitimate websites. In today's search environment, all you have to do is optimize your site for your visitors, and it will automatically be optimize for searching.
So you can passively optimize your site's ranking by doing any(or all) of the following:
Use good spelling and writing etiquette (like not writing your entire site in caps or text-message-speak)
Format your pages using proper markup. (Title your document, mark your headings with H1/H2/etc., delimit your paragraphs, and so on and so forth.)
Abide by established web standards and write well-formed code.
Weed out broken links and make sure your site works properly.
Don't use pop-ups, cover your site with banner ads, or otherwise bombard visitors with advertising
Don't link to disreputable websites
Simply put, make your site as user-friendly and as accessible as possible. If your site is useful to visitors and provides valuable content, most major search engines like Google or Yahoo! are smart enough to rank it fairly. Your ranking may be modest at first. But if you're genuinely supplying quality content then, as your site becomes better established on the web, other sites will start linking to you, increasing your search ranking.
And if other webpages linking to your site use the various names & nicknames your product is referred to by, then your site will also be associated with those names/keywords (that's how Google Bombing works). Google also tracks synonymous search terms and is even smart enough to recommend related/alternative search terms in some cases.
On the other hand, if you're creating a spam site or the 10 millionth affiliate marketing website with the same exact products and content as the other 9,999,999 sites of the same exact nature, then expect your search engine ranking to be reasonably poor.
It's generally only websites with no original content and that provide no legitimate value to visitors that require active (black hat) SEO techniques to gain a decent ranking--polluting search results in the process. Otherwise, if you're actually building a useful website, then just optimize it for your visitors and let Google/Yahoo! do their job.
The anchor text of your inbound links is a lot more important than the tags you use. So try getting links to your page with both "beer" and "reeb". As long as you'll get enough links with both terms, you'll do well in SERPs, no matter the keywords you use in it.
One option is to localize pages for the different target regions you are interested.
If you use a local domain, google will give it priority on default searches on that country. When I hit www.google.com, it redirects me to www.google.com.mx, and any search I do tends to display high results from mexico domains. I actually have to hit a couple options, when I don't want that behavior.
I also think google has an option to map parts of the site to a region, so you can keep the single domain.
Update: Regarding the beer example, you can localize per country (which is what I mention above). Actually its not that of a special need, since english british and english US have their differences.
The talk has been language agnostic, but consider how .net handle resources. Lets say the current request is being processed for en-GB, and you look for a resource (i.e. a text, image, etc). It will first try to find the resource for the specific culture: en-GB, if it isn't found it will look under the more general en (and then in the default resource file).
The previous allows you to selectively localize what you really need on the more specific resource files. If you only need to localize the resources with the key beerName, you can just configure that on the specific languages and leave the rest.