Captcha Alternative, how secure? - captcha

I do the web page for my local library, and I was thinking it might be kind of appealing to have a "custom" captcha based on book covers. So serve up one of several dozen book covers, and have the patron filling out the form type the book title to prove they're human. Assuming I stripped the title/author info from the image and filename, would that be enough? Would the fact that it was a unique system on a fairly small website be enough to make it effective? Just how tricky are the spam bots these days?
Would having the image name be the ISBN # be too obvious?
Here is a sample cover:
(source: mfrl.org)

You need to make it difficult for an OCR system to read the text. Otherwise the spam bot will easily get through your captcha, without any customisation from a human spammer.
That's why you see funny XORing, noise and distortion on most captchas these days.
As a matter of principle, it makes sense to NOT base the image name on something that can be looked up, although in the case of a local library, chances are low that any spammers will be writing custom scripts to defeat your captcha...

Try, jQuery and html version from this:
Practical non-image based CAPTCHA approaches?

Related

Hiding a page part from Google, does it hurt SEO?

We all know that showing inexistent stuff to Google bots is not allowed and will hurt the search positioning but what about the other way around; showing stuff to visitors that are not displayed for Google bots?
I need to do this because I have photo pages each with the short title and the photo along with textarea containing the embed HTML code. googlebot is taking the embed code and putting it at the page description on its search results which is very ugly.
Please advise.
When you start playing with tricks like that, you need to consider several things.
... showing stuff to visitors that are not displayed for Google bots.
That approach is a bit tricky.
You can certainly check User-agents to see if a visitor is Googlebot, but Google can add any number of new spiders with different User-agents, which will index your images in the end. You will have to constantly monitor that.
Testing of each code release your website will have to check "images and Googlebot" scenario. That will extend testing phase and testing cost.
That can also affect future development - all changes will have to be done with "images and Googlebot" scenario in mind which can introduce additional constraints to your system.
Personally I would choose a bit different approach:
First of all review if you can use any methods recommended by Google. Google provides a few nice pages describing that problem e.g. Blocking Google or Block or remove pages using a robots.txt file.
If that is not enough, maybe restructuring of you HTML would help. Consider using JavaScript to build some customer facing interfaces.
And whatever you do, try to keep it as simple as possible, otherwise very complex solutions can turn around and bite you.
It is very difficult to give you very good advise without knowledge of your system, constraints and strategy. But I hope my answer will help you out to choose good architecture / solution for your system.
Boy, you want more.
Google does not because of a respect therefore judge you cheat, he needs a review, as long as your purpose to the user experience, the common cheating tactics, Google does not think you cheating.
just block these pages with robots.txt and you`ll be fine, it is not cheating - that's why they came with solution like that in the first place

reCAPTCHA vs other captcha systems

What is a good reason to choose reCAPTCHA over a well known and tested captcha generator on the server. Is it just philanthropy (helping with digitizing texts) or are there other good reasons.
reCAPTCHA is rather neat. Not only does it stop spammers but it helps digitize books. Each word that appears in the captcha has actually been scanned in from a book but sometimes the character recognition is off so the computer my save some gibberish of a sentence without knowing any better.
See the image off their site:
By making people type in what they think the word is, it helps create a digital copy of the book or word that was scanned with accuracy while at the same time checking what the user submit, comparing it to other's submissions, and determining if the user is human or not.
For that reason I use reCAPTCHA. I'm not just selfishly protecting my site, I'm providing a service for others.
Not only that but it's fairly simple to implement and provided by a reliable company (Google).
The question was "why should I use it"; that question must include "why shouldn't I use it", so some criticisms:
Recaptcha volunteers your users to be OCR monkeys, without bothering to ask their opinion.
It requires that you advertise recaptcha in the captcha widget, which isn't always appropriate.
It's a web service, which means there's no hard guarantee it'll still exist a week or a year or two years from now. (Google has crippled or removed public, widely-used APIs in the past, such as their translation API.)
It only supports web pages, loading everything with scripts and iframes. It doesn't have a proper API, so if you ever want to have an iOS or Android app that logs into your system, and need to show a captcha there, you'll be out of luck.
You have no control over the complexity of the generated captcha. Captchas always have a tradeoff between how hard they are to read and how difficult they are to OCR. There are no knobs to adjust, based on how important stopping robots is to your use case. If they decide to make the captchas much harder to read (which they've done at times), and this becomes a nuisance to your users, there's nothing you can do about it.
reCAPTCHA is quite good. Most other generators are broken easily while reCAPTCHA usually gets good scores.
Another good thing is that it has the accessiblity button so that it would read the text.
This is an old threat but I would just like to confirm that in my case we used reCAPTCHA on a number of Drupal 6 websites in combination with the Honeypot module. We did that to stop automated spam user registrations.
I presume these user accounts were being created automatically by desktop applications such as SEnuke XCr and XRumer with the aim of then posting spam. They create the user account but they rarely do anything further but I found it annoying. Further reading on this subject can be found here: How to prevent spam user registrations? (links to an article on Drupal.org).
I can confirm that the above reduced my spam user registrations from a little over 100 a day to none at all.
We need to register our IP address on which server would be running. Its seems some what risky. So we might be required to change registration work flow in case of use of reCAPTCHA.

Captcha's + Differnet Possibilities

I wanted to run some captcha possibities past people to see if they are easily by passed by bots etc.
What if colors were used - eg: there is a string of 10 characters are you ask people to type the red characters of where there are 5? Easy to bypass?
I've noticed a captcha on plentyoffish that involves typing in the characters under the circles. This seems a touch more complex - would this be more challenging for bots?
The other idea I was thinking was putting the requirement in an image as well meaning like in no. 1 above - you can put "type the red characters" in an image and this could change with different colors. Any value here?
Interested in what people think.
cheers
Colours are easy to bypass. A bot just takes the red channel and gets the answer. It is even easier than choosing between many possible solutions. The same applies to any noise that has another colour than the letters the user needs to find.
Symbols that don't touch the letters are very easy to ignore. Why would a bot even look at those circles that probably always stay at the same position? (valid but wasn't asked here)
Identifying circles or other symbols is easier than identifying letters, if one can do the latter, a simple symbol is no challenge.
I think captchas are used too frequently in places where they aren't the best tool. For instance, are you trying to prevent registration spam? Why use a captcha rather than email validation?
What are your intentions and have you considered alternatives to the (relatively ineffective) captcha technology?
As a side note, if you have to use them, I prefer KittyAuth myself :) http://thepcspy.com/kittenauth/#5
Color blind people will have trouble separating red from green letters. People who have trouble reading and understanding descriptions, or have other disabilities may have trouble reading the captchas too.
In some of these, the texts are so mangled that almost everyone has a hard time reading them.
I think captcha's, if used at all, should be quite easy to read. The one with the dots and triangles is doable, although it's a matter of time before someone writes an algorithm to hack them. It is very easy for computers to read this kind too.
The best way to deal with this, is increase moderation. Make your site so that it isn't rewarding to spam it at all. Don't make it the problem of your users.
Also, if you're gonna use captcha's, it may be better to build something yourself than to use common libraries. I've found that these are easier hacked, probably because it is more rewarding to write a captcha solver for something that is used by thhousands of sites.
No matter which CAPTCHA you construct, spammers will find a way to work around it, given enough incentive. Large CAPTCHA services like reCAPTCHA, for instance, get bypassed by outsourcing solving them to cheap labor in India(source).
If you run a small site, your best bet is to make your own mini-CAPTCHA, which asks a simple question. If it isn't a standard question, isn't a standard CAPTCHA module and isn't a large site, it isn't worth it for the spammers to automate bypassing it.
I've been working on a community site for an organization at my university, and we've had trouble with spammers registering, despite us using every CAPTCHA module in the book. As soon as we made our own simple one-question CAPTCHA, all spam stopped. The key to preventing this sort of spam often lies in uniqueness.

How does Google Know you are Cloaking?

I can't seem to find any information on how google determines if you are cloaking your content. How, from a technical standpoint, do you think they are determining this? Are they sending in things other than the googlebot and comparing it to the googlebot results? Do they have a team of human beings comparing? Or can they somehow tell that you have checked the user agent and executed a different code path because you saw "googlebot" in the name?
It's in relation to this question on legitimate url cloaking for seo. If textual content is exactly the same, but the rendering is different (1995-style html vs. ajax vs. flash), is there really a problem with cloaking?
Thanks for your put on this one.
As far as I know, how Google prepares search engine results is secret and constantly changing. Spoofing different user-agents is easy, so they might do that. They also might, in the case of Javascript, actually render partial or entire pages. "Do they have a team of human beings comparing?" This is doubtful. A lot has been written on Google's crawling strategies including this, but if humans are involved, they're only called in for specific cases. I even doubt this: any person-power spent is probably spent by tweaking the crawling engine.
Google looks at your site while presenting user-agent's other than googlebot.
See the Google Chrome comic book page 11 where it describes (even better than layman's terms) about how a Google tool can take a schematic of a web page. They could be using this or similar technology for Google search indexing and cloak detection - at least that would be another good use for it.
Google does hire contractors (indirectly, through an outside agency, for very low pay) to manually review documents returned as search results and judge their relevance to the search terms, quality of translations, etc. I highly doubt that this is their only tool for detecting cloaking, but it is one of them.
In reality, many of Google's algos are trivially reversed and are far from rocket science. In the case of, so called, "cloaking detection" all of the previous guesses are on the money (apart from, somewhat ironically, John K lol) If you don't believe me set up some test sites (inputs) and some 'cloaking test cases' (further inputs), submit your sites to uncle Google (processing) and test your non-assumptions via pseudo-advanced human-based cognitive correlationary quantum perceptions (<-- btw, i made that up for entertainment value (and now i'm nesting parentheses to really mess with your mind :)) AKA "checking google resuts to see if you are banned yet" (outputs). Loop until enlightenment == True (noob!) lol
A very simple test would be to compare the file size of a webpage the Googlbot saw against the file size of the page scanned by an alias user of Google that looks like a normal user.
This would detect most suspect candidates for closeer examination.
They call your page using tools like curl and they construct a hash based on the page without the user agent, then they construct another hash with the googlebot user-agent. Both hashes must be similars, they have algorithms to check the hashes and know if its cloaking or not

Most effective form of CAPTCHA?

Of all the forms of CAPTCHA available, which one is the "least crackable" while remaining fairly human readable?
I believe that CAPTCHA is dying. If someone really wants to break it, it will be broken. I read (somewhere, don't remember where) about a site that gave you free porn in exchange for answering CAPTCHAs to they can be rendered obsolete by bots. So, why bother?
Anyone who really wants to break this padlock can use a pair of bolt cutters, so why bother with the lock?
Anyone who really wants to steal this car can drive up with a tow truck, so why bother locking my car?
Anyone who really wants to open this safe can cut it open with an oxyacetylene torch, so why bother putting things in the safe?
Because using the padlock, locking your car, putting valuables in a safe, and using a CAPTCHA weeds out a large spectrum of relatively unsophisticated or unmotivated attackers. The fact that it doesn't stop sophisticated, highly motivated attackers doesn't mean that it doesn't work at all. Using a CAPTCHA isn't going to stop all spammers, but it's going to tremendously reduce the amount that requires filtering or manual intervention.
Heck look at the lame CAPTCHA that Jeff uses on his blog. Even a wimpy barrier like that still provides a lot of protection.
I agree with Thomas. Captcha is on its way out. But if you must use it, reCAPTCHA is a pretty good provider with a simple API.
I believe that CAPTCHA is dying. If someone really wants to break it, it will be broken. I read (somewhere, don't remember where) about a site that gave you free porn in exchange for answering CAPTCHAs to they can be rendered obsolete by bots. So, why bother?
If you're a small enough site, no one would bother.
If you're still looking for a CAPTCHA, I like tEABAG_3D by the OCR Research Team. It's complicated to break and uses your 3D vision. Plus, it being developed by people who break CAPTCHAs for fun.
If you're just looking for a captcha to prevent spammers from bombing your blog, the best option is something simple but unique. For example, ask to write the word "Cat" into a box. The advantage of this is that no targeted captcha-breaker was developed for this solution, and your small blog isn't important enough for someone to actually develop one. I've used such a captcha on my blog with some success for a couple of years now.
This information is hard to really know because I believe a CAPTCHA gets broken long before anybody knows about it. There is economic incentive for those that break them to keep it quiet.
I used to work with a guy whose job revolved mostly around breaking CAPTCHA's and I can tell you the one giving them fits currently is reCAPTCHA.
Now, does that mean it will forever, call me skeptical.
I wonder if a CAPTCHA mechanism that uses collage made of pictures and asks human to type what he sees in the collage image will be much more crack-proof than the text and number image one. Imagine that the mechanism stitches pictures of cat, cup and car into a collage image and expects human visitor to tick (checkboxes) cat, cup, and car. How long do you think will hackers and crackers will come up with an algorithm to crack the mechanism (i.e. extract image elements from the collage and recognize the object depicted by each picture) ...
If you wanted you could try out the Microsoft Research project Asirra: http://research.microsoft.com/asirra/
CAPTCHAS, I believe should start being considered heavily when designing the UX. They're slow, cumbersome, and a very poor user experience. They are useful, don't get me wrong but perhaps you should look into designing a honeypot.
A honeypot is created by adding a hiddenfield at the bottom of the form. Because spam bots will fill in all the fields on the page blindly you can do a check:
If honeypotfield <> Empty Then
"No Spam TY"
Else
//Proceed with the form
End If
This works until there is a specifically designed spambot for your site, so they can choose to fill out selected input fields.
For more information: http://haacked.com/archive/2007/09/11/honeypot-captcha.aspx/
As far as I know, the Google's one is the best that there is. It hasn't been broken by computer programs yet. What I know that the crackers have been doing is to copy the image and then send it to many phishing websites where humans solve them to enter those websites.
It doesn't matter if captchas are broken or not now -- there are Indian firms that do nothing but process captchas. I'm with the rest of the group in saying that Captchas are on their way out.
Here is a cool link to create CAPTCHA..... http://www.codeproject.com/aspnet/CaptchaImage.asp
Just.. don't.. There are several reasons use of captcha is not advised.
http://www.interfacegeek.com/dont-ever-use-captchas/
I use uniqpin.com - it's easy to use and not annoying for users. So, bots can recognise a text, but can't recognize a image.
Death by Captcha can solve any Regular CAPTCHA (incude reCAPTCHA), but not Speedcoin Cryptocurrency Captcha.
Death by Captcha - http://deathbycaptcha.com
Speedcoin Captcha - http://speedcoin.co/info/captcha/Speedcoin_Captcha.html