Stop spam without CAPTCHA

I want to stop spammers from using my site, but I find CAPTCHAs very annoying. I am not just talking about the "type the text" kind, but anything that requires the user to waste time proving they are human.
What can I do here?

Requiring JavaScript to post data blocks a fair number of spam bots while not interfering with most users.
You can also use a nifty trick:
<input type="text" id="not_human" name="name" />
<input type="text" name="actual_name" />
<style>
#not_human { display: none }
</style>
Most bots will populate the first field, so you can block them.
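A minimal sketch of the matching server-side check in PHP (assuming the form posts to your own script, with the field names from the markup above):
<?php
// The decoy field (name="name") is invisible to humans, so any value
// in it almost certainly came from a bot.
if (!empty($_POST['name'])) {
    header('HTTP/1.1 403 Forbidden');
    exit('Submission rejected.');
}
// Otherwise process the real value as usual.
$actualName = isset($_POST['actual_name']) ? trim($_POST['actual_name']) : '';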

I combine a few methods that have proven quite successful so far (a sketch of several of these checks follows below):
Provide an input field with the name email and hide it with CSS (display: none). When the form is submitted, check whether this field is empty. Bots tend to fill it with a bogus email address.
Provide another hidden input field containing the time the page was loaded. On submission, check that the time between loading and submitting the page is larger than the minimum it takes to fill in the form; I use between 5 and 10 seconds.
Then check that the number of GET parameters is what you would expect. If your form's action is POST and the underlying URL of your submission page is index.php?p=guestbook&sub=submit, then you expect 2 GET parameters. Bots try to add extra GET parameters, so this check will catch them.
And finally, check that HTTP_USER_AGENT is set (bots sometimes don't set it) and that HTTP_REFERER is the URL of the page holding your form. Bots sometimes POST directly to the submission page, causing HTTP_REFERER to be something else.
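A rough sketch of the timing, GET-parameter, and header checks in PHP (the field name loaded_at is illustrative, not from the original):
<?php
// Timing: a hidden field carried the time the page was rendered.
$loadedAt = isset($_POST['loaded_at']) ? (int) $_POST['loaded_at'] : 0;
if (time() - $loadedAt < 5) {
    exit('Form submitted too quickly.');  // faster than any human
}

// GET parameters: expect exactly the two the page was served with.
if (count($_GET) !== 2 || !isset($_GET['p'], $_GET['sub'])) {
    exit('Unexpected query string.');     // bots often append extras
}

// Headers: a user agent must be present, and the referer (when set)
// should be the page that hosts the form.
if (empty($_SERVER['HTTP_USER_AGENT'])) {
    exit('Missing user agent.');
}
if (isset($_SERVER['HTTP_REFERER'])
    && strpos($_SERVER['HTTP_REFERER'], 'p=guestbook') === false) {
    exit('Unexpected referer.');
}
Note that a raw timestamp field is trivially forged by a targeted bot; signing it, or keeping it in the session instead, makes the check harder to bypass.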
I got most of my information from http://www.braemoor.co.uk/software/antispam.shtml and http://www.nogbspam.com/.

Integrate the Akismet API to automatically filter your users' posts.
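In PHP, a minimal comment check might look like this sketch (the endpoint and field names follow Akismet's documented comment-check call; the key, blog URL, and posted field are placeholders):
<?php
$apiKey   = 'your-akismet-key';                       // placeholder
$endpoint = 'https://' . $apiKey . '.rest.akismet.com/1.1/comment-check';

$fields = http_build_query(array(
    'blog'            => 'http://www.example.com/',   // placeholder
    'user_ip'         => $_SERVER['REMOTE_ADDR'],
    'user_agent'      => $_SERVER['HTTP_USER_AGENT'],
    'comment_type'    => 'comment',
    'comment_content' => $_POST['comment'],           // placeholder field
));

$context = stream_context_create(array('http' => array(
    'method'  => 'POST',
    'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
    'content' => $fields,
)));

// Akismet answers with the literal string "true" for spam.
$isSpam = trim(file_get_contents($endpoint, false, $context)) === 'true';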

If you're looking for a .NET solution, the Ajax Control Toolkit has a control named NoBot.
NoBot is a control that attempts to provide CAPTCHA-like bot/spam prevention without requiring any user interaction. NoBot has the benefit of being completely invisible. NoBot is probably most relevant for low-traffic sites where blog/comment spam is a problem and 100% effectiveness is not required.
NoBot employs a few different anti-bot techniques:
Forcing the client's browser to perform a configurable JavaScript calculation and verifying the result as part of the postback. (Ex: the calculation may be a simple numeric one, or may also involve the DOM for added assurance that a browser is involved)
Enforcing a configurable delay between when a form is requested and when it can be posted back. (Ex: a human is unlikely to complete a form in less than two seconds)
Enforcing a configurable limit to the number of acceptable requests per IP address per unit of time. (Ex: a human is unlikely to submit the same form more than five times in one minute)
There is more discussion and a demonstration in this blog post by Jacques-Louis Chereau on NoBot.
<ajaxToolkit:NoBot
ID="NoBot2"
runat="server"
OnGenerateChallengeAndResponse="CustomChallengeResponse"
ResponseMinimumDelaySeconds="2"
CutoffWindowSeconds="60"
CutoffMaximumInstances="5" />

I would be careful using CSS or JavaScript tricks to ensure a user is a genuine, real-life human, as you could be introducing accessibility issues, cross-browser issues, etc. Not to mention that spam bots can be fairly sophisticated, so employing cute little CSS display tricks may not even work anyway.
I would look into Akismet.
Also, you can be creative in the way you validate user data. For example, let's say you have a registration form that requires an email address and a postal address. You can be fairly hardcore in how you validate the email address, even going so far as to ensure that the domain is actually set up to receive mail and that there is a mailbox on that domain matching what was provided. You could also use the Google Maps API to try to geolocate the address and ensure it's valid.
To take this even further, you could implement "hard" and "soft" validation errors. If the email address doesn't match a validation regex, that's a hard fail. Not being able to check the DNS records of the domain to ensure it accepts mail, or that the mailbox exists, is a "soft" fail. When you encounter a soft fail, you could then ask for CAPTCHA validation. This would hopefully reduce the number of times you'd have to push for CAPTCHA verification, because if you're getting enough activity on the site, valid people should be entering valid data at least some of the time!
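A sketch of that hard/soft split in PHP (the function names are made up; checkdnsrr does the MX lookup):
<?php
// Hard fail: the address does not even parse.
function emailHardFail($email) {
    return filter_var($email, FILTER_VALIDATE_EMAIL) === false;
}

// Soft fail: the address parses, but the domain has no MX record,
// so we cannot confirm it accepts mail.
function emailSoftFail($email) {
    $domain = substr(strrchr($email, '@'), 1);
    return !checkdnsrr($domain, 'MX');
}

$email = isset($_POST['email']) ? $_POST['email'] : '';
if (emailHardFail($email)) {
    exit('Invalid email address.');   // reject outright
} elseif (emailSoftFail($email)) {
    // Unverifiable: fall back to a CAPTCHA challenge here.
}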

I realize this is a rather old post; however, I came across an interesting solution called the "honeypot CAPTCHA" that is easy to implement and doesn't require JavaScript:
Provide a hidden text box!
Most spambots will gladly complete the hidden text box allowing you to politely ignore them.
Most of your users will never even know the difference.
To prevent a user with a screen reader from falling into your trap, simply label the text box "If you are human, leave blank" or something to that effect.
Tada! Non-intrusive spam-blocking! Here is the article:
http://www.campaignmonitor.com/blog/post/3817/stopping-spambots-with-two-simple-captcha-alternatives
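The markup could look something like this (the field name and label wording are just one possibility):
<div style="display: none">
  <label for="url-check">If you are human, leave this field blank</label>
  <input type="text" id="url-check" name="url" />
</div>
Screen readers that honor display: none will skip the field entirely; the label is a safety net for those that do not, or for when you hide the field by off-screen positioning instead.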

Since it is extremely hard to block spam 100%, I recommend reading this IBM article posted two years ago, titled 'Real Web 2.0: Battling Web spam', in which visitor behavior and control workflow are analyzed well and concisely:
Web spam comes in many forms, including:
Spam articles and vandalized articles on wikis
Comment spam on Weblogs
Spam postings on forums, issue trackers, and other discussion sites
Referrer spam (when spam sites pretend to refer users to a target site that lists referrers)
False user entries on social networks
Dealing with Web spam is very difficult, but a Web developer neglects spam prevention at his or her peril. In this article, and in a second part to come later, I present techniques, technologies, and services to combat the many sorts of Web spam.
The article also links to a very interesting "...hashcash technique for minimizing spam on Wikis and such, in addition to e-mail."

How about a human-readable question that tells the user to put in the first letter of the value entered in the first-name field and the last letter of the value in the last-name field, or something like that?
Or show some hidden fields which are filled by JavaScript with values like the referrer and so on. Check these fields for equality with the ones you stored in the session beforehand.
If the values are empty, the user simply has no JavaScript, so it is not spam. But a bot will at least fill in some of them.
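A sketch of that idea in PHP: store a token in the session, let JavaScript copy it into an initially empty hidden field, and compare on submission (the field and token names are invented for illustration):
<?php
session_start();

// When rendering the form:
$_SESSION['js_token'] = md5(uniqid(mt_rand(), true));
echo '<input type="hidden" name="js_token" id="js_token" value="">';
echo '<script>document.getElementById("js_token").value = "'
   . $_SESSION['js_token'] . '";</script>';

// When handling the submission:
$sent = isset($_POST['js_token']) ? $_POST['js_token'] : '';
if ($sent === '') {
    // Empty: probably a real user without JavaScript, not necessarily spam.
} elseif ($sent !== $_SESSION['js_token']) {
    exit('Token mismatch: likely a bot.');   // filled in, but wrongly
}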

Surely you should pick one of the ready-made solutions, such as Honeypot or BOTCHA.

Related

Client does not want invalid logins to redirect

I would like to ask this question to developers who have a good sense of design. I see that whenever a website uses a popup box for their login page, they will always post and then redirect to the next page whether it be content or a dedicated login page for an invalid login error.
I have a client who has been asking to cut out as much page refreshing as possible, including the login function. They would like to see the login error appear on the login popup box without a page refresh.
I have not noticed web-based businesses doing this, so I'm wondering if there's a valid reason to avoid it. I personally think that a page refresh allows users to recognize that their input has been registered, and the next page appearing is a solid response to their action, either good or bad.
Is having no refresh, and expecting the user to notice that some error text has appeared, a bad idea?
Notes
The question is most likely more appropriate for https://ux.stackexchange.com .
You can find a lot of stuff by searching "ajax logins" in a search engine
There is already this question, which might indeed be a duplicate of this one. Since I was not sure, and I had already written most of this answer before finding it, I am posting this nonetheless.
The title ought to be changed to a question (maybe something like "Is it a bad idea to use ajax to return login errors?").
Actual Answer
In my opinion, ajax logins could indeed make it less clear whether there has really been a successful interaction with the server or not.
Some ideas to improve it might be:
to include the time of the login request in the error message
to explicitly assure in the error message that the credentials have been received and checked
to make sure that this error does indeed only occur after the credentials have been determined to be invalid (and not because of problems with scripts or the network, for example).
A good way to ensure this might be to have the server always send the full text of the error, rather than a code that selects a message stored in the page source (and to be careful about its caching).
This becomes relevant only after the user has been using the site for some time, of course (and has encountered the error and verified that it was indeed due to a mistake on their part).
to use some animated feedback to highlight the dispatch and the reception of the reply. As with the text, you should ensure that the animations do not give (too) incorrect indications.
Basically these suggestions would be applicable to any ajax form entry, but they are more important for logins because:
in this context it's a lot easier to make typing mistakes (in the typing of the password)
and mistakes have a drastic, immediately annoying outcome: the inability to authenticate and the need to enter the whole password again.
So uncertainty about whether the input has really been received and processed is a lot more bothersome.
All in all, though, it's pretty complex to do this well, with both an appealing appearance and reassuring feedback.
The ajax logins I have come across did not do a good job (I think I have indeed experienced false login errors with them).
You can find several ajax login frameworks/plugins by searching for "ajax logins". I have not looked into any of them.

Protect page from requests

What's another way, instead of reCAPTCHA, to protect the booting page?
My customers do not like filling these things out, and I'm out of ideas.
Without more details, it will be difficult to know what is appropriate for your situation. Here are some things that may or may not work for you depending on your situation:
Instead of a CAPTCHA, use a simple question; arithmetic, perhaps (see the sketch after this list). Or ask someone to type a word that is undistorted, rather than distorted like a CAPTCHA. Or have someone always type the same word into a box. This is nowhere near as strong as a CAPTCHA, but it may be enough depending on your needs.
Password protect your site. Depending on your needs, you can use a shared password, individual accounts specific to your site, or something like Facebook Connect / OAuth / OpenID.
You can try a robots.txt file, but that will only keep well-behaved robots away. Attackers will, of course, ignore it.
You can firewall off your server so people can only access it from a certain subnet. If everyone using the site has access to the same VPN, then they can use VPN to access the site.
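The simple-question option can be as small as this sketch (the question wording and field names are made up):
<?php
session_start();

// When rendering the form: ask a trivial, randomly generated question.
$a = mt_rand(1, 9);
$b = mt_rand(1, 9);
$_SESSION['expected'] = $a + $b;
echo '<label>What is ' . $a . ' plus ' . $b . '? <input type="text" name="answer"></label>';

// When handling the submission:
if (!isset($_POST['answer']) || (int) $_POST['answer'] !== $_SESSION['expected']) {
    exit('Wrong answer, please try again.');
}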
If you don't like any of @Trott's suggestions, here is a simple CAPTCHA replacement, though I am not sure how long it will hold up (a somewhat sophisticated attacker could crack it):
Add this into your form:
<input name="dummy" value="" style="display: none"/>
Then in your server code,
if params['dummy'].empty?
  # field left blank: a real user
else
  # spambot filled the hidden field!
end
This relies on spambots compulsively filling out unknown form fields (so that they don't leave out any mandatory ones); a real user will never see the field and thus will always leave it empty.

How to Verify whether a Robot is Entering Information

I have a web form which users fill in, with the info sent to the server and stored in a database. I am worried that robots might just fill in the form, and I will end up with a database full of useless records. How can I prevent robots from filling in my forms? I am thinking of something like Stack Overflow's robot detection, where if it thinks you are a robot, it asks you to verify that you are not. Is there a server-side API in Perl, Java or PHP?
There are several solutions.
Use a CAPTCHA. SO uses reCAPTCHA as far as I know.
Add an extra field to your form and hide it with CSS (display:none). A normal user would not see this field and therefore will not fill it. You check at the submission if this field is empty. If not, then you are dealing with a robot that has carefully filled out all form fields. This technique is usually referred to as a "honeypot".
Add a JavaScript timer function. At page load it starts a value at zero and then increases it as time passes. A normal user will read and fill out your form for some time and only then submit it, while a robot will fill out and submit the form immediately upon receiving it. At submission, check whether the value has moved far from zero: if it has, it is likely a real user; if you see just a couple of seconds (or no value at all, because robots usually don't execute JavaScript), it is likely a robot. This will, however, only work if you require your users to have JavaScript on in order to perform "write" operations. A client-side sketch follows after this list.
There are other techniques for sure. But these are quite simple and effective.
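For the timer technique, a minimal client-side sketch (the field name is invented; remember the value is client-supplied and can be forged by a targeted bot):
<input type="hidden" name="seconds_open" id="seconds_open" value="0">
<script>
// Count the seconds the form has been open; a bot that posts immediately
// (or never runs JavaScript) will submit 0.
setInterval(function () {
    var field = document.getElementById('seconds_open');
    field.value = parseInt(field.value, 10) + 1;
}, 1000);
</script>
On the server, you would then reject submissions where seconds_open is implausibly low.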
You can use reCAPTCHA (the same as Stack Overflow); they have libraries for a number of programming languages.
I've always preferred the honeypot CAPTCHA (see the article by Phil Haack), as it's less invasive to the user.
CAPTCHAs bring accessibility problems and will ultimately be defeated by software recognition.
I recommend reading this short article about bot traps, which includes hidden fields, as Matthew Vines and New in town already suggested.
Anyway, you are still free to use both captcha and bot traps.
CAPTCHA is great. The other thing you can do, which will prevent 99% of your robot traffic yet not annoy your users, is to validate fields.
On my site, I check for text in fields like zip code and phone number. That has removed all of the non-targeted robot misinformation.
You could create a two-step system in which a user fills in the form but then must reply to an e-mail to "activate" the record within a set period of time, say 24 hours.
In the back end, instead of populating your current table with all the form submissions, you could put them into a temporary table that automatically deletes any row older than your time allotment. Unless you have a serious bot problem, the table shouldn't get that big, especially if the first form has just a few fields.
A benefit of this approach is that you don't have to use a CAPTCHA or some other technology that might create accessibility problems.
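A sketch of the temporary-table side in PHP/SQL (the table, column, and variable names are invented, and $pdo is an assumed database handle):
<?php
// Store the submission with an activation token, pending email confirmation.
$token = md5(uniqid(mt_rand(), true));
$stmt  = $pdo->prepare(
    'INSERT INTO pending_entries (email, payload, token, created_at)
     VALUES (?, ?, ?, NOW())'
);
$stmt->execute(array($email, $payload, $token));
// ...email the user a link containing $token to activate the record...

// Run periodically (e.g. from cron): drop anything older than 24 hours.
$pdo->exec(
    'DELETE FROM pending_entries WHERE created_at < NOW() - INTERVAL 24 HOUR'
);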

Block spam from "tell a friend" forms

You have to have a form on your website so people can send an email to a friend when they find something interesting. You can force people to be logged in (which is not a good option in my case). You can add a time delay (this is not really urgent email, so it can wait five minutes). Do you have this problem? How would you solve it?
Edit: I am mostly interested in stopping manual spam.
Do you have a problem with automated scripting of your form, or with people genuinely using it too much?
The simple solution to the bot problem is a Captcha, such as ReCaptcha. The user-friendliness is questionable, but it would perhaps solve your problem.
You can also use something different from all those CAPTCHA scripts. Let me tell you what I do: I create an MD5 hash:
$secretWord='TryToHashMe';
$formID='myForm';
$md5Value=md5($secretWord.$formID);
echo '<input type="hidden" name="form-check" value="'.$md5Value.'">';
echo '<input type="hidden" name="bot-check" value="">';
These are two very simple checks, because: 1) auto bots try to fill in all your inputs, so the empty bot-check field gets populated; and 2) if the hash is not provided or doesn't match, the POST request came from outside your site. The hashing could be extended with a session or cookie, too.
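On the submission side, the matching check might look like this sketch (using the same $secretWord and $formID as above):
<?php
$secretWord = 'TryToHashMe';
$formID     = 'myForm';

// The honeypot must still be empty, and the hash must match what we issued.
$botCheck  = isset($_POST['bot-check'])  ? $_POST['bot-check']  : 'x';
$formCheck = isset($_POST['form-check']) ? $_POST['form-check'] : '';

if ($botCheck !== '' || $formCheck !== md5($secretWord . $formID)) {
    exit('Spam check failed.');
}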
All the best!
I would recommend a CAPTCHA or, if you would like something a bit less intrusive, a simple math problem (which changes), so you get something like:
For spam protection: type what Two Plus Two is here _________
I did this on my personal website and never had a problem (and I saw a lot of failed attempts by spambots).
This service has very good anti-spam measures.
http://www.tellafriendking.com/features.php?showall=1#spam-free
FYI, I am involved with the company, so I'm not entirely unbiased, but we do get a lot of refugees who come to us to end their spam problems with other services or downloaded scripts.
Edit:
If you feel the need to vote down, perhaps you should leave a comment too...
The best solution is to use an all-purpose bot-filtering service. I know this is an old post, but a new botnet was recently discovered that uses these send-to-a-friend modules to send spam (not a new technique, but with some interesting new advancements).
According to one security vendor (good tips), “At a minimum, they should include a rate-limiting mechanism that will prevent an IP address from issuing unreasonable numbers of requests over a specific period of time. Other DIY solutions are to have all users fill in CAPTCHAs and to enforce registration as a prerequisite to sending out an email message.”
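A crude per-IP rate limiter along those lines (a sketch; the send_log table, the limits, and the $pdo handle are all assumptions):
<?php
// Allow at most 5 sends per IP per hour.
$ip   = $_SERVER['REMOTE_ADDR'];
$stmt = $pdo->prepare(
    'SELECT COUNT(*) FROM send_log
     WHERE ip = ? AND sent_at > NOW() - INTERVAL 1 HOUR'
);
$stmt->execute(array($ip));

if ($stmt->fetchColumn() >= 5) {
    header('HTTP/1.1 429 Too Many Requests');
    exit('Too many requests, try again later.');
}

$pdo->prepare('INSERT INTO send_log (ip, sent_at) VALUES (?, NOW())')
    ->execute(array($ip));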

How does Google track search result clicks? Is this the best way?

As the question states, I'm trying to figure out how Google tracks clicks on search results. When you view the source, you find the following:
<a href="http://www.yahoo.com/" onmousedown="return rwt(this, …)"><em>Yahoo</em>!</a>
The rwt function, which is pretty messy, is:
window.rwt = function (b, d, e, g, h, f, i, j) {
    var a = encodeURIComponent || escape,
        c = b.href.split("#");
    b.href = ["/url?sa=t\x26source\x3dweb",
              d ? "&oi=" + a(d) : "",
              e ? "&cad=" + a(e) : "",
              "&ct=", a(g),
              "&cd=", a(h),
              "&url=", a(c[0]).replace(/\+/g, "%2B"),
              "&ei=7_C2SbqXBMW0-AbU4OWnCw",
              f ? "&usg=" + f : "",
              i,
              c[1] ? "#" + c[1] : ""].join("");
    b.onmousedown = "";
    return true;
};
So it looks like Google is changing the href of the a tag to /url?... which I'm assuming is where their tracking is. From LiveHeaders in Firefox, it looks like this page is redirecting the browser to the original href of the a tag.
Is this correct and is this the best method of tracking clicks on links on your site, such as ads?
It's actually changing the href of the link rather than the window location. It's setting b.href, and b refers to the link itself. This runs in onmousedown, so when you release the mouse and the click is handled you magically get sent to that new href.
Any click tracking pretty much comes down to sending the user to some equivalent of Google's /url?... script, counting the click, and performing a 302 redirect to the real destination.
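In its simplest form, such a counting redirect could look like this sketch (the tracked_links table, the id parameter, and the $pdo handle are assumptions):
<?php
// url.php?id=123 : log the click, then bounce to the real destination.
$id   = isset($_GET['id']) ? (int) $_GET['id'] : 0;
$stmt = $pdo->prepare('SELECT target FROM tracked_links WHERE id = ?');
$stmt->execute(array($id));
$target = $stmt->fetchColumn();

if ($target === false) {
    header('HTTP/1.1 404 Not Found');
    exit;
}

$pdo->prepare('UPDATE tracked_links SET clicks = clicks + 1 WHERE id = ?')
    ->execute(array($id));

header('Location: ' . $target, true, 302);   // temporary redirect
exit;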
This javascript href replacement has the advantage of automatically filtering out any robots that don't run scripts. The downside is that it also filters out any real people that have javascript disabled. If, like Google, you just care which link is most popular with your real human users, this works out quite well. The clicks that you do record should be representative of real human traffic, and you can safely ignore the clicks from non-javascript users because they probably have the same preferences anyway.
Most adverts just link straight to the counting URL with no javascript replacement. This means that you definitely count every real click on the link, but you need to worry about filtering out requests from robots, since they'll now see your counting URL too.
Which you prefer really depends on why you want to track the clicks.
I think most people expect ads to click through via some sort of tracking system, so I wouldn't worry too much about following this particular JavaScript implementation. As much as anything, it's probably there to ensure that the user sees the correct link in the browser's status bar, that various other interesting bits of info (search terms, position in the result set at the time, who you are, etc.) are sent across (without you realising it), and that the links still work if JavaScript is disabled.
Generally, yes: directing the user through a tracking page with the ID of the ad they clicked on, and possibly some additional indication of where they came from, is sensible. That way you aren't relying on other mechanisms (such as JS event handlers) to track clicks on the links, and it's certainly the way most ad systems I've used work.