reCAPTCHA vs other captcha systems

What is a good reason to choose reCAPTCHA over a well-known and tested captcha generator on the server? Is it just philanthropy (helping to digitize texts), or are there other good reasons?

reCAPTCHA is rather neat. Not only does it stop spammers, but it helps digitize books. Each word that appears in the captcha has actually been scanned in from a book, but sometimes the character recognition is off, so the computer may save a gibberish sentence without knowing any better.
By making people type in what they think the word is, reCAPTCHA helps create an accurate digital copy of the scanned book or word, while at the same time checking what the user submits, comparing it to others' submissions, and determining whether the user is human.
For that reason I use reCAPTCHA. I'm not just selfishly protecting my site, I'm providing a service for others.
Not only that but it's fairly simple to implement and provided by a reliable company (Google).

The question was "why should I use it?"; that question must include "why shouldn't I use it?", so here are some criticisms:
Recaptcha volunteers your users to be OCR monkeys, without bothering to ask their opinion.
It requires that you advertise recaptcha in the captcha widget, which isn't always appropriate.
It's a web service, which means there's no hard guarantee it'll still exist a week or a year or two years from now. (Google has crippled or removed public, widely-used APIs in the past, such as their translation API.)
It only supports web pages, loading everything with scripts and iframes. It doesn't have a proper API, so if you ever want to have an iOS or Android app that logs into your system, and need to show a captcha there, you'll be out of luck.
You have no control over the complexity of the generated captcha. Captchas always have a tradeoff between how hard they are to read and how difficult they are to OCR. There are no knobs to adjust, based on how important stopping robots is to your use case. If they decide to make the captchas much harder to read (which they've done at times), and this becomes a nuisance to your users, there's nothing you can do about it.

reCAPTCHA is quite good. Most other generators are easily broken, while reCAPTCHA usually gets good scores.
Another good thing is that it has an accessibility button that reads the text aloud.

This is an old thread, but I would just like to confirm that in my case we used reCAPTCHA on a number of Drupal 6 websites in combination with the Honeypot module. We did that to stop automated spam user registrations.
I presume these user accounts were being created automatically by desktop applications such as SEnuke XCr and XRumer with the aim of then posting spam. They create the user account but rarely do anything further; still, I found it annoying. Further reading on this subject can be found here: How to prevent spam user registrations? (links to an article on Drupal.org).
I can confirm that the above reduced my spam user registrations from a little over 100 a day to none at all.

You need to register the IP address on which your server will be running, which seems somewhat risky. So we might be required to change our registration workflow if we use reCAPTCHA.

Related

A captcha and forced registration still aren't stopping robot spam

I'm using the phpdug template at http://fantasybookmark.com and there's still spam coming through. What additional steps should I take? Is recaptcha THAT much better than the captcha system in place here?
It is an open secret that bots easily pass reCAPTCHA. You can find references and code for this by searching the internet, but do not use Google, since it bought reCAPTCHA and suppresses such searches.
Even if not, reCAPTCHA does not protect against the sweatshops (engaged automatically when circumventing reCAPTCHA fails) that are integrated into many professional spam bots. That is, nothing in it prevents a bot from screenshotting the reCAPTCHA and passing it to third-party human solvers. See my other answer on this.
The captcha you have there is extremely weak... very easy for a program like Tesseract to break. And yes, reCAPTCHA is a much better quality CAPTCHA.

How does Google know you are cloaking?

I can't seem to find any information on how google determines if you are cloaking your content. How, from a technical standpoint, do you think they are determining this? Are they sending in things other than the googlebot and comparing it to the googlebot results? Do they have a team of human beings comparing? Or can they somehow tell that you have checked the user agent and executed a different code path because you saw "googlebot" in the name?
It's in relation to this question on legitimate url cloaking for seo. If textual content is exactly the same, but the rendering is different (1995-style html vs. ajax vs. flash), is there really a problem with cloaking?
Thanks for your input on this one.
As far as I know, how Google prepares search engine results is secret and constantly changing. Spoofing different user-agents is easy, so they might do that. They also might, in the case of Javascript, actually render partial or entire pages. "Do they have a team of human beings comparing?" This is doubtful. A lot has been written on Google's crawling strategies including this, but if humans are involved, they're only called in for specific cases. I even doubt this: any person-power spent is probably spent by tweaking the crawling engine.
Google looks at your site while presenting user agents other than Googlebot.
See page 11 of the Google Chrome comic book, where it describes (even more simply than layman's terms) how a Google tool can take a schematic of a web page. They could be using this or similar technology for Google search indexing and cloak detection - at least that would be another good use for it.
Google does hire contractors (indirectly, through an outside agency, for very low pay) to manually review documents returned as search results and judge their relevance to the search terms, quality of translations, etc. I highly doubt that this is their only tool for detecting cloaking, but it is one of them.
In reality, many of Google's algos are trivially reversed and are far from rocket science. In the case of, so called, "cloaking detection" all of the previous guesses are on the money (apart from, somewhat ironically, John K lol) If you don't believe me set up some test sites (inputs) and some 'cloaking test cases' (further inputs), submit your sites to uncle Google (processing) and test your non-assumptions via pseudo-advanced human-based cognitive correlationary quantum perceptions (<-- btw, i made that up for entertainment value (and now i'm nesting parentheses to really mess with your mind :)) AKA "checking google results to see if you are banned yet" (outputs). Loop until enlightenment == True (noob!) lol
A very simple test would be to compare the file size of a webpage that Googlebot saw against the file size of the page fetched by a Google alias that looks like a normal user.
This would flag most suspect candidates for closer examination.
They call your page using tools like curl, and they construct a hash based on the page fetched without the Googlebot user agent, then construct another hash with the Googlebot user agent. Both hashes must be similar; they have algorithms to compare the hashes and determine whether it's cloaking or not.
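As a rough illustration only, here is a minimal PHP sketch of that dual-fetch comparison, assuming the curl extension is available; the URL, user-agent strings, and thresholds are made up for the example and are not anything Google has published:
<?php
// Fetch the same URL twice with different user agents and compare the
// responses. Hypothetical thresholds; a real system would be far smarter.
function fetch_page(string $url, string $userAgent): string {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $body = curl_exec($ch);
    curl_close($ch);
    return $body === false ? '' : $body;
}

$url     = 'http://example.com/';
$asBot   = fetch_page($url, 'Googlebot/2.1 (+http://www.google.com/bot.html)');
$asHuman = fetch_page($url, 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36');

// Crude checks: compare sizes first (cheap), then textual similarity.
$sizeRatio = min(strlen($asBot), strlen($asHuman)) / max(1, strlen($asBot), strlen($asHuman));
similar_text($asBot, $asHuman, $percent); // note: expensive on large pages

if ($sizeRatio < 0.8 || $percent < 80.0) {
    echo "Responses differ significantly - possible cloaking.\n";
} else {
    echo "Responses look consistent.\n";
}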

Another answer to the CAPTCHA problem? [closed]

Most sites at least employ server access log checking and banning along with some kind of bot prevention measure like a CAPTCHA (those messed-up text images).
The problem with CAPTCHAs is that they pose a threat to the user experience. Luckily, they now come with user-friendly features like refresh and audio versions.
Anyway, like Linux vs. Windows, it isn't worth a spammer's time to customize and/or build a script to handle a custom CAPTCHA that only pertains to one site. Therefore, I was wondering if there might be better ways to handle the whole CAPTCHA thing.
In A Better CAPTCHA, Peter Bromberg mentions that one way would be to convert the image to HTML and display it embedded in the page. On http://shiflett.org/, Chris simply asks users to type his name into an input. Examples like these are ways of simplifying the CAPTCHA experience while decreasing its value to spammers. Does anyone know of more good examples I could use, or see any problem with the embedded-image idea?
An image presented as an HTML table is just a technical speed bump. There's no difficulty in extracting the pixels from such a document.
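To see why, here is a minimal PHP sketch, assuming the hypothetical table encodes one pixel per cell in a bgcolor attribute (the file name and markup convention are made up for the example):
<?php
// Read a table-based "image" back into a pixel grid - trivially OCR-able.
$captchaHtml = file_get_contents('captcha-page.html'); // page with the table

$dom = new DOMDocument();
@$dom->loadHTML($captchaHtml); // suppress warnings from sloppy markup

$pixels = [];
foreach ($dom->getElementsByTagName('tr') as $y => $row) {
    foreach ($row->getElementsByTagName('td') as $x => $cell) {
        $pixels[$y][$x] = $cell->getAttribute('bgcolor'); // e.g. "#000000"
    }
}
// $pixels is now a plain bitmap you can hand to any OCR routine.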
IMHO, CAPTCHA puts the focus on the wrong thing - you're not interested in whether there's a human on the other side. You wouldn't like a human to spam you either. So take a step back and focus on the spam:
Analyze the text (look for spammy keywords, use Bayesian filtering)
Analyze the links (blacklist spammy domains - SURBL, LinkSleeve)
Look at traffic patterns and block floods
There's no single perfectly accurate method, but you can use a few of them and weight the result to get pretty close (see the sketch below).
Have a look at the source code of Sblam! (it's a completely transparent server-side comment spam filter).
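As a sketch of that weighting idea in PHP (the keywords, weights, and threshold below are invented for illustration; a real filter like Sblam! is far more sophisticated):
<?php
// Score a submission by combining several weak spam signals.
function spam_score(string $text, array $recentPostTimes): float {
    $score = 0.0;

    // Signal 1: spammy keywords (Bayesian filtering would replace this).
    foreach (['viagra', 'casino', 'cheap pills'] as $keyword) {
        if (stripos($text, $keyword) !== false) {
            $score += 0.4;
        }
    }

    // Signal 2: link density - spam tends to be link-heavy.
    $score += 0.2 * preg_match_all('#https?://#i', $text);

    // Signal 3: flood detection - many recent posts from the same source.
    $lastMinute = array_filter($recentPostTimes, fn($t) => $t > time() - 60);
    if (count($lastMinute) > 5) {
        $score += 1.0;
    }

    return $score;
}

// Reject anything above a tuned threshold.
if (spam_score($_POST['comment'] ?? '', []) >= 1.0) {
    exit('Your comment was flagged as spam.');
}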
Alternatives to captchas require considering the problem from other angles. Captchas are built around the idea that a human actor and a computer actor can be distinguished; as artificial intelligence progresses, this becomes an increasingly difficult problem, because the gap between computer and human users shrinks.
The technique used on Slashdot is for other users of the site to act as gatekeepers, marking abuse and removing offending posts before they become noticeable to a wide audience.
Another technique is to detect spam-like posts directly, using the same technology used to filter spam from email. Obviously it isn't 100% effective for email, and won't be for other uses either, but if you can filter out 75% of the spam with very few false positives, then other techniques only have to deal with the remaining 25%.
Keep a log of spam-related activity, so that you can track trends in offending IP addresses, post content, claimed user agents, and so forth, and block abusive users at the routing level.
In nearly all cases, your users would rather put up with the slight inconvenience of abuse prevention, than the huge inconvenience of a major spam problem.
Ultimately, the arms race between you and spammers is one of cost-benefit. Initially, it will cost spammers close to nothing to spam your site, but you can change that to make it very difficult. Even if they continue to spam your site, the benefit they receive will never grow beyond a few innocent users falling for their schemes. Once the cost of spamming rises sharply above the benefit, the spammers will go away.
Another way to benefit from that dynamic is to allow advertising on your site. Make it inexpensive (but not free, of course) and easy for legitimate advertisers to post responsible marketing material for your users to see. Would-be spammers may find that it's a better deal to just pay you a few dollars and get their offering seen than to pursue clandestine methods.
Obviously, most spammers won't fit into this category, since spam is often about getting your users to fall victim to malware exploits. You can do your part there by encouraging users to use modern, up-to-date browsers or plugins, so that they become less vulnerable to those same exploits.
This article describes a technique based on hashed field names that change with each page view, some of which are honeypot fields (i.e., the request is rejected if they're filled in) hidden from human users via various techniques.
Basically, it relies on spam scripts not being sophisticated enough to determine which form fields are actually visible. In a way, that is a CAPTCHA: to solve it reliably, they would not only have to implement HTML, CSS, and JavaScript fully, they'd also have to recognize when a field is too small to see, colored the same as the background, hidden behind another field, placed outside the browser's viewport, etc.
It's the same basic problem that makes Web Standards a farce: there is no algorithm to determine whether a webpage "looks right" - only a human can decide that.
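Here is a minimal sketch of that hashed-field-names idea in PHP; the secret, nonce scheme, and field set are assumptions for illustration, not the article's exact method:
<?php
// Render a form whose field names change on every page view, plus a honeypot.
session_start();

$secret = 'replace-with-a-long-random-secret';
$nonce  = bin2hex(random_bytes(8));
$_SESSION['form_nonce'] = $nonce;

// Only the server can reproduce these names from the stored nonce.
$emailField    = 'f_' . hash_hmac('sha256', 'email' . $nonce, $secret);
$honeypotField = 'f_' . hash_hmac('sha256', 'honeypot' . $nonce, $secret);
?>
<form method="post" action="submit.php">
  <input type="text" name="<?= $emailField ?>">
  <!-- Hidden from humans; a bot that fills every field gives itself away. -->
  <input type="text" name="<?= $honeypotField ?>" style="display:none">
  <input type="submit">
</form>
On submission, the server recomputes both names from the session's nonce, reads the real fields, and rejects the request if the honeypot field arrived non-empty.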
seen this?
It's a system with cute pictures instead of captcha ;)
But I still think honeypots are a better solution - they're so cheap, easy, and invisible.
I really think that Dinah hit the nail on the head. The fact seems to be that the beauty of the whole CAPTCHA setup is that there is no standard. Standardizing would only help the market to be more profitable.
Therefore it seems that the best way to handle the CAPTCHA problem is to come up with a system that is fairly hard for bots to crack and that is NOT used by anyone else on the planet. It could be a question system, a very custom image creator, or even a mix of JS calls that only browsers respect.
By the time your site is big enough for spammers to care, you should have the budget to rethink your CAPTCHA setup and optimize it much more. In the meantime, we should be monitoring our server logs and banning bad agents, referrers, and IPs.
In my case I created a CAPTCHA image that I believe is very different from any other CAPTCHA I have seen. This should do fine for now alongside my Apache logs + htaccess banning and Akismet checking. Maybe I should spend time on a reporting feature as well.
Although not a true image captcha, a good Turing test is asking users a random question - common options are: Is ice hot or cold? What is 5+2? Etc.
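For example, a minimal PHP sketch of such a question captcha (the question list and session handling are invented for illustration):
<?php
// Ask a random question; verify the answer against the one stored in session.
session_start();

$questions = [
    'Is ice hot or cold?' => 'cold',
    'What is 5 + 2?'      => '7',
    'What color is the sky on a clear day?' => 'blue',
];

if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    $expected = $_SESSION['expected_answer'] ?? null;
    if ($expected !== null && strcasecmp(trim($_POST['answer'] ?? ''), $expected) === 0) {
        echo 'Looks human. Processing the form...';
    } else {
        echo 'Wrong answer, please try again.';
    }
} else {
    $q = array_rand($questions);              // random key = the question text
    $_SESSION['expected_answer'] = $questions[$q];
    echo '<form method="post">' . htmlspecialchars($q)
       . ' <input name="answer"><input type="submit"></form>';
}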

Implementing CAPTCHA after 50% of Article

We are planning to put a large number of business research reports and articles from our intranet onto the Internet. However, we don't want others to copy the content and host it on their own sites.
I read about protection by CAPTCHA and was wondering if this is possible. Readers should be able to read 50% of the article for FREE, after which a CAPTCHA must be entered to read the rest of the article. [In this way we are making life a little harder for those copycats.]
Any pointers on how to implement this? The content is in HTML, and we have programming experience in Perl and PHP. We can hire others if required.
Additionally, search engines will crawl only half of each article, and I am wondering whether they will penalize the site for not being able to crawl the rest, since they won't be able to crack the CAPTCHA.
Thanks.
There's a really good Captcha service provided by Recaptcha - http://recaptcha.net/
There is a PHP class that you can use to do all the hard work.
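For illustration, here's roughly how the classic recaptchalib.php helper was used (this reflects the old v1-era API; the keys are placeholders, and you should check the function names against the library version you actually download):
<?php
require_once 'recaptchalib.php';

$publicKey  = 'your-public-key';   // placeholder
$privateKey = 'your-private-key';  // placeholder

if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    // Verify the user's answer against the reCAPTCHA service.
    $resp = recaptcha_check_answer(
        $privateKey,
        $_SERVER['REMOTE_ADDR'],
        $_POST['recaptcha_challenge_field'],
        $_POST['recaptcha_response_field']
    );
    echo $resp->is_valid ? 'Human verified.' : 'Captcha failed, try again.';
} else {
    // Renders the reCAPTCHA widget inside your form.
    echo '<form method="post">' . recaptcha_get_html($publicKey)
       . '<input type="submit"></form>';
}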
It's important to bear in mind that search engines aren't able to solve a Captcha, so they will only index the first half of the report. As long as this half contains most of the right keywords, it shouldn't cause a massive problem. Don't make the mistake of "detecting" a search engine and showing it different content than a normal user sees, as the major search engines consider this spamming.
An alternative solution would be to use a service like Copyscape (http://www.copyscape.com/) to protect your content.
I know this is not what you're asking, but please take into account that CAPTCHAs are universally broken and will not protect your content. You said the first half is free; does that mean you intend to charge for the other half? CAPTCHA won't help you here at all...
But even if you're just trying to prevent automated scraping, CAPTCHA still won't do the trick. Check out my answer to another captcha question... Or you can go straight to the ppt I presented at OWASP last year.
Readers should be able to read 50% of the article for FREE, after which a CAPTCHA must be entered to read the rest of the article.
Have your PHP programmer output 50% of the article. On the bottom, add a captcha. If the user types in the correct captcha, output 100% of the article.
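A minimal sketch of that flow in PHP (the file name, split logic, and session flag are assumptions for illustration; a real implementation should split on a paragraph boundary rather than a byte count):
<?php
// Show half the article; show it all once the captcha has been passed.
session_start();

$article = file_get_contents('article.html');  // full article markup
$halfway = (int) (strlen($article) / 2);        // naive 50% split point

if (!empty($_SESSION['captcha_passed'])) {
    echo $article;                               // the whole article
} else {
    echo substr($article, 0, $halfway);          // first half only
    // ...render the captcha form here; when the answer verifies, set:
    // $_SESSION['captcha_passed'] = true;
}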
Any pointers on how to implement this? The content is in HTML, and we have programming experience in Perl and PHP. We can hire others if required.
As a PHP programmer, I use http://www.phpcaptcha.org to implement captcha.
Additionally, search engines will crawl only half of each article, and I am wondering whether they will penalize the site for not being able to crawl the rest, since they won't be able to crack the CAPTCHA.
No, it won't penalize you but that particular section will not be shown on the search results.
As already mentioned reCAPTCHA is a good way to go.
Have a look at Captcha::reCAPTCHA on CPAN which, according to its CPAN reviews, "works out of the box".
If you want a captcha, there are plenty of modules that do this on CPAN ;-)
Hope that helps.

Most effective form of CAPTCHA?

Of all the forms of CAPTCHA available, which one is the "least crackable" while remaining fairly human readable?
I believe that CAPTCHA is dying. If someone really wants to break it, it will be broken. I read (somewhere, I don't remember where) about a site that gave visitors free porn in exchange for answering CAPTCHAs, so even good CAPTCHAs can be rendered obsolete by bots that relay them to humans. So, why bother?
Anyone who really wants to break this padlock can use a pair of bolt cutters, so why bother with the lock?
Anyone who really wants to steal this car can drive up with a tow truck, so why bother locking my car?
Anyone who really wants to open this safe can cut it open with an oxyacetylene torch, so why bother putting things in the safe?
Because using the padlock, locking your car, putting valuables in a safe, and using a CAPTCHA weeds out a large spectrum of relatively unsophisticated or unmotivated attackers. The fact that it doesn't stop sophisticated, highly motivated attackers doesn't mean that it doesn't work at all. Using a CAPTCHA isn't going to stop all spammers, but it's going to tremendously reduce the amount that requires filtering or manual intervention.
Heck, look at the lame CAPTCHA that Jeff uses on his blog. Even a wimpy barrier like that still provides a lot of protection.
I agree with Thomas. Captcha is on its way out. But if you must use it, reCAPTCHA is a pretty good provider with a simple API.
I believe that CAPTCHA is dying. If someone really wants to break it, it will be broken. I read (somewhere, I don't remember where) about a site that gave visitors free porn in exchange for answering CAPTCHAs, so even good CAPTCHAs can be rendered obsolete by bots that relay them to humans. So, why bother?
If you're a small enough site, no one would bother.
If you're still looking for a CAPTCHA, I like tEABAG_3D by the OCR Research Team. It's complicated to break and uses your 3D vision. Plus, it's developed by people who break CAPTCHAs for fun.
If you're just looking for a captcha to prevent spammers from bombing your blog, the best option is something simple but unique. For example, ask users to write the word "Cat" into a box. The advantage of this is that no targeted captcha-breaker has been developed for this solution, and your small blog isn't important enough for someone to actually develop one. I've used such a captcha on my blog with some success for a couple of years now.
This information is hard to really know because I believe a CAPTCHA gets broken long before anybody knows about it. There is economic incentive for those that break them to keep it quiet.
I used to work with a guy whose job revolved mostly around breaking CAPTCHA's and I can tell you the one giving them fits currently is reCAPTCHA.
Now, does that mean it will forever? Call me skeptical.
I wonder if a CAPTCHA mechanism that uses a collage made of pictures and asks the human to identify what they see in the collage image would be much more crack-proof than the text-and-number kind. Imagine that the mechanism stitches pictures of a cat, a cup, and a car into a collage image and expects the human visitor to tick (checkboxes) cat, cup, and car. How long do you think it would take hackers and crackers to come up with an algorithm to crack the mechanism (i.e., extract the image elements from the collage and recognize the object depicted by each picture)?
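A minimal sketch of generating such a collage with PHP's GD extension (the image files, tile size, and object list are assumptions for illustration):
<?php
// Stitch three random 100x100 object tiles side by side into one collage.
$objects = ['cat', 'cup', 'car', 'tree', 'boat', 'lamp'];
shuffle($objects);
$chosen = array_slice($objects, 0, 3);     // the three objects in the collage

$collage = imagecreatetruecolor(300, 100);
foreach ($chosen as $i => $name) {
    $tile = imagecreatefrompng(__DIR__ . "/images/{$name}.png"); // assumed to exist
    imagecopy($collage, $tile, $i * 100, 0, 0, 0, 100, 100);
    imagedestroy($tile);
}

session_start();
$_SESSION['collage_answer'] = $chosen;     // verify the ticked checkboxes later

header('Content-Type: image/png');
imagepng($collage);
imagedestroy($collage);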
If you wanted you could try out the Microsoft Research project Asirra: http://research.microsoft.com/asirra/
CAPTCHAs, I believe, should be weighed carefully when designing the UX. They're slow, cumbersome, and a very poor user experience. They are useful, don't get me wrong, but perhaps you should look into designing a honeypot.
A honeypot is created by adding a hidden field at the bottom of the form. Because spam bots blindly fill in all the fields on the page, you can do a check, for example in PHP:
// Bots fill in every field; a real user never sees (or fills) the honeypot.
if (!empty($_POST['honeypot_field'])) {
    die('No spam, thank you.');
} else {
    // Proceed with processing the form.
}
This works until a spambot is specifically designed for your site, at which point it can choose to fill out only selected input fields.
For more information: http://haacked.com/archive/2007/09/11/honeypot-captcha.aspx/
As far as I know, Google's is the best there is. It hasn't been broken by computer programs yet. What I do know is that crackers have been copying the image and sending it to many phishing websites, where humans solve it to enter those websites.
It doesn't matter if captchas are broken or not now -- there are Indian firms that do nothing but process captchas. I'm with the rest of the group in saying that Captchas are on their way out.
Here is a cool link for creating a CAPTCHA: http://www.codeproject.com/aspnet/CaptchaImage.asp
Just... don't. There are several reasons the use of captchas is not advised.
http://www.interfacegeek.com/dont-ever-use-captchas/
I use uniqpin.com - it's easy to use and not annoying for users. Bots can recognize text, but they can't recognize an image.
Death by Captcha can solve any regular CAPTCHA (including reCAPTCHA), but not the Speedcoin Cryptocurrency Captcha.
Death by Captcha - http://deathbycaptcha.com
Speedcoin Captcha - http://speedcoin.co/info/captcha/Speedcoin_Captcha.html