A captcha and forced registration still aren't stopping robot spam

I'm using the phpdug template at http://fantasybookmark.com and there's still spam coming through. What additional steps should I take? Is reCAPTCHA THAT much better than the captcha system in place here?

It is an open secret that bots easily pass reCAPTCHA. You can find references and code for this by searching the internet, but do not use Google, since it bought reCAPTCHA and suppresses such searches.
Even if that were not so, reCAPTCHA offers no protection against the sweatshops integrated into many professional spam bots, which are engaged automatically whenever the bot fails to circumvent reCAPTCHA on its own. That is, nothing in it prevents a bot from taking a screenshot of the reCAPTCHA and passing it to third-party human solvers. See my other answer on it.

The captcha you have there is extremely weak and very easy for a program like Tesseract to break. And yes, reCAPTCHA is a much better quality CAPTCHA.

Related

reCAPTCHA vs other captcha systems

What is a good reason to choose reCAPTCHA over a well-known and tested captcha generator on the server? Is it just philanthropy (helping with digitizing texts), or are there other good reasons?
reCAPTCHA is rather neat. Not only does it stop spammers, it also helps digitize books. Each word that appears in the captcha has actually been scanned in from a book, but sometimes the character recognition is off, so the computer may save some gibberish instead of the real sentence without knowing any better.
See the image off their site:
By making people type in what they think the word is, it helps create an accurate digital copy of the book or word that was scanned, while at the same time checking what the user submits, comparing it to others' submissions, and determining whether the user is human or not.
For that reason I use reCAPTCHA. I'm not just selfishly protecting my site, I'm providing a service for others.
Not only that but it's fairly simple to implement and provided by a reliable company (Google).
The question was "why should I use it"; that question must include "why shouldn't I use it", so some criticisms:
reCAPTCHA volunteers your users to be OCR monkeys, without bothering to ask their opinion.
It requires that you advertise reCAPTCHA in the captcha widget, which isn't always appropriate.
It's a web service, which means there's no hard guarantee it'll still exist a week or a year or two years from now. (Google has crippled or removed public, widely-used APIs in the past, such as their translation API.)
It only supports web pages, loading everything with scripts and iframes. It doesn't have a proper API, so if you ever want to have an iOS or Android app that logs into your system, and need to show a captcha there, you'll be out of luck.
You have no control over the complexity of the generated captcha. Captchas always have a tradeoff between how hard they are to read and how difficult they are to OCR. There are no knobs to adjust, based on how important stopping robots is to your use case. If they decide to make the captchas much harder to read (which they've done at times), and this becomes a nuisance to your users, there's nothing you can do about it.
reCAPTCHA is quite good. Most other generators are broken easily while reCAPTCHA usually gets good scores.
Another good thing is that it has an accessibility button that reads the text aloud.
This is an old thread, but I would just like to confirm that in my case we used reCAPTCHA on a number of Drupal 6 websites in combination with the Honeypot module. We did that to stop automated spam user registrations.
I presume these user accounts were being created automatically by desktop applications such as SEnuke XCr and XRumer with the aim of then posting spam. They create the user account but they rarely do anything further but I found it annoying. Further reading on this subject can be found here: How to prevent spam user registrations? (links to an article on Drupal.org).
I can confirm that the above reduced my spam user registrations from a little over 100 a day to none at all.
We need to register the IP address of the server it will be running on, which seems somewhat risky. So we might have to change our registration workflow if we use reCAPTCHA.

How do I make sure my website can block automation scripts, bots?

I'd like to make sure that my website blocks automation tools like Selenium and QTP. Is there a way to do that?
Which settings on a website is Selenium bound to fail with?
With due consideration to the comments on the original question asking "why on earth would you do this?", you basically need to follow the same strategy that any site uses to verify that a user is actually human. Methods such as asking users to authenticate or to enter text from images will probably work, but they will likely also block Google's crawlers and everything else.
Doing anything based on user agent strings or anything like that is mostly useless. Those are trivial to fake.
Rate-limiting connections or similar might have limited effectiveness, but it seems like you're going to inadvertently block any web crawlers too.
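For what it's worth, the rate-limiting idea can be sketched as a small fixed-window counter per client address. This is only an illustration under assumptions: the names createRateLimiter and allow, the limit and the window length are all made up for the example, and as noted above it will throttle well-behaved crawlers just as readily as bots.

```javascript
// Hypothetical fixed-window rate limiter keyed by client address.
// limit and windowMs are illustrative values, not recommendations.
function createRateLimiter(limit, windowMs) {
  const hits = new Map(); // clientId -> { windowStart, count }

  // Returns true if the request is allowed, false if the client is over the limit.
  return function allow(clientId, now = Date.now()) {
    const entry = hits.get(clientId);
    if (!entry || now - entry.windowStart >= windowMs) {
      // First request, or the previous window has expired: start a new window.
      hits.set(clientId, { windowStart: now, count: 1 });
      return true;
    }
    entry.count += 1;
    return entry.count <= limit;
  };
}

// Usage: at most 3 requests per client per 10 seconds.
const allowRequest = createRateLimiter(3, 10000);
```

A bot spread across many proxies gets a fresh counter per address, which is one reason this has limited effectiveness.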
While this question seems strange, it is a fun one, so I tried to investigate the possibilities.
Besides adding a CAPTCHA which is the best and the only ultimate solution, you can block Selenium by adding the following JavaScript to your pages (this example will redirect to the Google page, but you can do anything you want):
<script>
// Read the URL of the parent frame that hosts this page
var loc = window.parent.location.toString();
if (loc.indexOf("RemoteRunner.html") !== -1) {
  // The page is running under Selenium RC, so do something
  document.location = "http://www.google.com";
}
</script>
I do not know how you can block other automation tools, and I am not sure whether this will also block Selenium IDE.
To be 100% certain that no automated bots/scripts can be run against your website, don't have a website online. This will meet your requirement with certainty.
CAPTCHAs are easy, if not cheap, to break thanks to crowdsourcing and OCR methods.
Proxies can be found in the wild for free, or are available in bulk at extremely low cost. So, again, it is useless to limit connection rates or to try to detect bots.
One possible approach is to build ways of increasing the time and cost of access into your application logic, such as phone verification or credit card verification. But then your website will never get off the ground, because nobody will trust your site in its infancy.
Solution: Do not put your website online and expect to be able to effectively eliminate bots and scripts from running.

Negative Captchas - help me understand spam bots better

I have to decide on a technique to prevent spam bots from registering on my site. In this question I am mainly asking about negative captchas.
I have learned about many weaknesses of bots but want to know more. I read somewhere that the majority of bots do not render/support JavaScript. Why is that so? How do I test that the visiting program can't evaluate JavaScript?
I started with this question: Need suggestions/ideas for easy-to-use but secure captchas
Please answer that question if you have some good captcha ideas.
Then I got ideas about negative captchas here
http://damienkatz.net/2007/01/negative_captch.html
But Damien has written that though this technique likely won't work on big community sites (for long), it will work just fine for most smaller sites.
So, what are the chances of somebody making site-specific bots? I assume my site will be a very popular one. How safe will this technique be, considering that?
Negative captchas using complex honeypot implementations are described here
http://nedbatchelder.com/text/stopbots.html
Does anybody know how easily this can be implemented? Are there any plugins available?
Thanks,
Sandeepan
I read somewhere that the majority of bots do not render/support JavaScript. Why is that so?
Simplicity of implementation: you can read web page source and post forms with just a dozen lines of code in a high-level language. I've seen bots that are ridiculously bad, e.g. parsing HTML with regular expressions and getting ../ in URLs wrong. But apparently it works well enough.
However, running a JavaScript engine and implementing a DOM library is a much more complex task. You have to deal with scripts that do while(1);, that depend on timers, external resources and CSS, that sniff browsers and do lots of other crazy stuff. The amount of work you need to do quickly starts looking like writing a full browser engine.
It's also computationally much more expensive, so it's probably not as profitable for spammers: they can have a dumb bot that silently spams 100 pages/second, or a fully-featured one that spams 2 pages/second and hogs the victim's computer like a typical web browser would.
There's a middle ground in implementing just a simple site-specific hack, like filling in a certain form field if a known script pattern is noticed in the page.
So, what are the chances of somebody making site-specific bots? I assume my site will be a very popular one. How safe will this technique be, considering that?
It's a cost/benefit trade-off. If you have high PageRank, lots of visitors, or something of monetary value or useful for spamming, then some spammer might notice you and decide that a workaround is worth his time. OTOH, if you just have a personal blog or a small forum, there are a million others sitting unprotected, waiting to be spammed.
How do I test that the visiting program can't evaluate JavaScript?
Create a hidden field with some fixed value, then write a script that increments or changes it. The value that comes back in the submitted form tells you whether the script ran.
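A minimal sketch of that idea, assuming a Node.js-style server. The function names, the field name js_check and the arithmetic are all hypothetical, chosen only for illustration. The server embeds a per-request seed in a hidden field, an inline script rewrites the value, and the server then checks which value came back.

```javascript
// Hypothetical helpers; the field name and the transformation are illustrative only.

// The value a script-capable client is expected to send back for a given seed.
function expectedValue(seed) {
  return String(seed * 2 + 1);
}

// Build the form markup: the hidden field starts as the raw seed and the
// inline script overwrites it. A client that ignores scripts posts the seed unchanged.
function renderForm(seed) {
  return [
    '<form method="post">',
    '  <input type="hidden" id="js_check" name="js_check" value="' + seed + '">',
    '  <script>',
    '    var f = document.getElementById("js_check");',
    '    f.value = String(Number(f.value) * 2 + 1);',
    '  </script>',
    '  <input type="submit">',
    '</form>'
  ].join("\n");
}

// Server-side check on the submitted value.
function clientRanJavaScript(submitted, seed) {
  return submitted === expectedValue(seed);
}
```

The server has to remember which seed it issued (for example in the session) so it can compare on submit. A site-specific bot can of course hard-code the same arithmetic, which is the middle ground mentioned in the answer above.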

Implementing CAPTCHA after 50% of Article

We are planning to put a large number of Business Research Reports and Articles from our intranet on to the Internet. However, we don't want others to copy the content and host it on their own.
I read about protection by CAPTCHA and was wondering if this is possible. Readers should be able to read 50% of the article for FREE after which a CAPTCHA should be entered to read the rest of the article [In this way we are making life a little harder for those copycats]
Any pointers on how to implement this? The content is in HTML and we have programming experience in Perl and PHP. We can hire others if required.
Additionally, search engines will crawl only half of the article, and I am wondering whether they will penalize the site because they can't crack the CAPTCHA to crawl the rest?
Thanks.
There's a really good Captcha service provided by Recaptcha - http://recaptcha.net/
There is a PHP class that you can use to do all the hard work.
It's important to bear in mind that search engines aren't able to solve a Captcha and so they will only index the first half of the report. As long as this half contains largely the correct key words, it shouldn't cause a massive problem. Don't make the mistake of "detecting" a search engine and showing them different content to a normal user as the major search engines think that this is spamming.
An alternative solution would be to use a service like Copyscape (http://www.copyscape.com/) to protect your content.
I know this is not what you're asking, but please take into account that CAPTCHAs are universally broken, and will not protect your content. You said the first half is free, does that mean you intend to charge for the other half? CAPTCHA won't help you here at all...
But even if you're just trying to prevent automated scraping, CAPTCHA still won't do the trick. Check out my answer to another captcha question... Or you can go straight to the ppt I presented at OWASP last year.
Readers should be able to read 50% of the article for FREE after which a CAPTCHA should be entered to read the rest of the article
Have your PHP programmer output 50% of the article. On the bottom, add a captcha. If the user types in the correct captcha, output 100% of the article.
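The gating logic itself is small. Here is a rough sketch, written in JavaScript purely for illustration (the same few lines translate directly to PHP or Perl). The function name visiblePortion and the captchaPassed flag are hypothetical; the flag is assumed to come from whatever server-side verification your CAPTCHA library provides.

```javascript
// Hypothetical sketch: return the first half of an article unless the
// visitor has already passed the CAPTCHA check.
function visiblePortion(paragraphs, captchaPassed) {
  if (captchaPassed) {
    // Verified visitor: show everything and no CAPTCHA form.
    return { paragraphs: paragraphs, showCaptcha: false };
  }
  // Cut on a paragraph boundary so the preview does not end mid-sentence.
  const half = Math.ceil(paragraphs.length / 2);
  return { paragraphs: paragraphs.slice(0, half), showCaptcha: true };
}
```

The important part is that the cut happens on the server. If the second half is sent to the browser and merely hidden, a scraper never needs to solve anything.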
Any pointers on how to implement this? The content is in HTML and we have programming experience in Perl and PHP. We can hire others if required.
As a PHP programmer, I use http://www.phpcaptcha.org to implement captcha.
Additionally, search engines will crawl only half of the article, and I am wondering whether they will penalize the site because they can't crack the CAPTCHA to crawl the rest?
No, it won't penalize you but that particular section will not be shown on the search results.
As already mentioned reCAPTCHA is a good way to go.
Have a look at Captcha::reCAPTCHA on CPAN which according to the CPAN rating reviews "Works out of the box"
If you want a CAPTCHA, there are plenty of modules on CPAN that do this ;-)
Hope that helps.

Most effective form of CAPTCHA?

Of all the forms of CAPTCHA available, which one is the "least crackable" while remaining fairly human readable?
I believe that CAPTCHA is dying. If someone really wants to break it, it will be broken. I read (somewhere, don't remember where) about a site that gave you free porn in exchange for answering CAPTCHAs, so they can be rendered obsolete by bots. So, why bother?
Anyone who really wants to break this padlock can use a pair of bolt cutters, so why bother with the lock?
Anyone who really wants to steal this car can drive up with a tow truck, so why bother locking my car?
Anyone who really wants to open this safe can cut it open with an oxyacetylene torch, so why bother putting things in the safe?
Because using the padlock, locking your car, putting valuables in a safe, and using a CAPTCHA weeds out a large spectrum of relatively unsophisticated or unmotivated attackers. The fact that it doesn't stop sophisticated, highly motivated attackers doesn't mean that it doesn't work at all. Using a CAPTCHA isn't going to stop all spammers, but it's going to tremendously reduce the amount that requires filtering or manual intervention.
Heck look at the lame CAPTCHA that Jeff uses on his blog. Even a wimpy barrier like that still provides a lot of protection.
I agree with Thomas. Captcha is on its way out. But if you must use it, reCAPTCHA is a pretty good provider with a simple API.
I believe that CAPTCHA is dying. If someone really wants to break it, it will be broken. I read (somewhere, don't remember where) about a site that gave you free porn in exchange for answering CAPTCHAs, so they can be rendered obsolete by bots. So, why bother?
If you're a small enough site, no one would bother.
If you're still looking for a CAPTCHA, I like tEABAG_3D by the OCR Research Team. It's complicated to break and relies on your 3D vision. Plus, it's being developed by people who break CAPTCHAs for fun.
If you're just looking for a captcha to prevent spammers from bombing your blog, the best option is something simple but unique. For example, ask to write the word "Cat" into a box. The advantage of this is that no targeted captcha-breaker was developed for this solution, and your small blog isn't important enough for someone to actually develop one. I've used such a captcha on my blog with some success for a couple of years now.
This information is hard to really know because I believe a CAPTCHA gets broken long before anybody knows about it. There is economic incentive for those that break them to keep it quiet.
I used to work with a guy whose job revolved mostly around breaking CAPTCHAs, and I can tell you the one giving them fits currently is reCAPTCHA.
Now, does that mean it will forever? Call me skeptical.
I wonder whether a CAPTCHA mechanism that uses a collage of pictures and asks the human to type what he sees in the collage image would be much more crack-proof than the text-and-number image kind. Imagine that the mechanism stitches pictures of a cat, a cup and a car into a collage image and expects the human visitor to tick the checkboxes for cat, cup and car. How long do you think it would take hackers and crackers to come up with an algorithm to crack the mechanism (i.e. extract the image elements from the collage and recognize the object depicted by each picture)?
If you wanted you could try out the Microsoft Research project Asirra: http://research.microsoft.com/asirra/
I believe CAPTCHAs should start being weighed heavily when designing the UX. They're slow, cumbersome, and a very poor user experience. They are useful, don't get me wrong, but perhaps you should look into designing a honeypot.
A honeypot is created by adding a hidden field at the bottom of the form. Because spam bots will blindly fill in all the fields on the page, you can do a check:
If honeypotField <> "" Then
    ' A bot filled in the hidden field: reject the submission, e.g. respond with "No Spam TY"
Else
    ' Proceed with the form
End If
This works until there is a specifically designed spambot for your site, so they can choose to fill out selected input fields.
For more information: http://haacked.com/archive/2007/09/11/honeypot-captcha.aspx/
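The same check as a small runnable sketch, this time in JavaScript. The function name isLikelyBot and the field name "website" are assumptions made up for the example; any field that is hidden from real visitors with CSS would do.

```javascript
// Hypothetical honeypot check. "website" is an assumed name for the hidden
// field; real visitors never see it, so it should arrive empty.
function isLikelyBot(formFields, honeypotName = "website") {
  const value = formFields[honeypotName];
  // A missing or empty field is treated as a human submission;
  // any non-blank text means something filled in a field it could not see.
  return typeof value === "string" && value.trim() !== "";
}

// Usage with an already-parsed form body:
// if (isLikelyBot(body)) { /* reject */ } else { /* proceed with the form */ }
```

Giving the field a plausible name tends to matter, since a bot that fills in everything blindly is more likely to take the bait than one that skips fields it does not recognize.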
As far as I know, Google's is the best there is. It hasn't been broken by computer programs yet. What I do know is that crackers have been copying the image and sending it to phishing websites, where humans solve it in order to enter those websites.
It doesn't matter if captchas are broken or not now -- there are Indian firms that do nothing but process captchas. I'm with the rest of the group in saying that Captchas are on their way out.
Here is a cool link for creating a CAPTCHA: http://www.codeproject.com/aspnet/CaptchaImage.asp
Just... don't. There are several reasons why using a captcha is not advised.
http://www.interfacegeek.com/dont-ever-use-captchas/
I use uniqpin.com. It's easy to use and not annoying for users. Bots can recognize text, but they can't recognize an image.
Death by Captcha can solve any regular CAPTCHA (including reCAPTCHA), but not the Speedcoin Cryptocurrency Captcha.
Death by Captcha - http://deathbycaptcha.com
Speedcoin Captcha - http://speedcoin.co/info/captcha/Speedcoin_Captcha.html