How to Verify whether a Robot is Entering Information - captcha

I have a web form that users fill in; the information is sent to the server and stored in a database. I am worried that robots might fill in the form and leave me with a database full of useless records. How can I prevent robots from filling in my forms? I am thinking of something like Stack Overflow's robot detection, where if it thinks you are a robot, it asks you to verify that you are not. Is there a server-side API in Perl, Java or PHP?

There are several solutions.
Use a CAPTCHA. SO uses reCAPTCHA as far as I know.
Add an extra field to your form and hide it with CSS (display:none). A normal user will not see this field and therefore will not fill it in. On submission, check whether this field is empty. If it is not, you are dealing with a robot that has dutifully filled out every form field. This technique is usually referred to as a "honeypot".
Add a JavaScript timer function. On page load it starts a value at zero and increases it as time passes. A normal user spends some time reading and filling out the form before submitting it. A robot fills out and submits the form immediately upon receiving it. On submission, check how far the value has moved from zero. If it has moved a long way, it is likely a real user. If you see just a couple of seconds (or no value at all, because the robot does not execute JavaScript), it is likely a robot. This only works, however, if you are willing to require that users have JavaScript enabled in order to perform "write" operations.
There are other techniques for sure. But these are quite simple and effective.
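As a rough illustration of the honeypot and timer checks described above, here is a minimal server-side sketch in Python. The field names `website` and `elapsed_ms` and the 3-second threshold are assumptions made up for this example, not part of any library.

```python
from typing import Mapping

# Hypothetical field names and threshold, chosen for illustration only.
HONEYPOT_FIELD = "website"   # hidden with CSS; a human leaves it empty
TIMER_FIELD = "elapsed_ms"   # filled in by the JavaScript timer
MIN_ELAPSED_MS = 3000        # assumed minimum time a human needs


def looks_like_robot(form: Mapping[str, str]) -> bool:
    """Return True when a submission matches the robot patterns above."""
    # Honeypot: any text in the hidden field means every input was filled in.
    if form.get(HONEYPOT_FIELD, "").strip():
        return True
    # Timer: a missing or non-numeric value suggests JavaScript never ran.
    try:
        elapsed_ms = int(form.get(TIMER_FIELD, ""))
    except ValueError:
        return True
    # A form submitted almost immediately was probably not read by a person.
    return elapsed_ms < MIN_ELAPSED_MS
```

The timer value is supplied by the client, so it is only a heuristic: a bot that runs or imitates the script can send any number it likes.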

You can use reCAPTCHA (same as Stack Overflow); they have libraries for a number of programming languages.

I've always preferred the honeypot captcha (article by Phil Haack), as it's less invasive to the user.

Captchas bring accessibility problems and will be ultimately defeated by software recognition.
I recommend reading this short article about bot traps, which include hidden fields, as Matthew Vines and New in town already suggested.
Anyway, you are still free to use both captcha and bot traps.

CAPTCHA is great. The other thing you can do that will prevent 99% of your robot traffic yet not annoy your users is to validate fields.
On my site, I check for text in fields like zip code and phone number. That has removed all of the non-targeted robot misinformation.
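A small sketch of that kind of field validation, in Python. The formats are assumptions for illustration (US ZIP codes and 10-digit phone numbers); the answer does not say which formats its site accepts.

```python
import re

# Assumed formats: US ZIP codes (12345 or 12345-6789) and 10-digit phone numbers.
ZIP_RE = re.compile(r"\d{5}(-\d{4})?")
PHONE_DIGITS = 10


def is_plausible_zip(value: str) -> bool:
    """Accept only values that look like a ZIP code, rejecting free text."""
    return ZIP_RE.fullmatch(value.strip()) is not None


def is_plausible_phone(value: str) -> bool:
    """Strip common separators, then require exactly PHONE_DIGITS digits."""
    stripped = re.sub(r"[\s().+-]", "", value)
    return stripped.isdigit() and len(stripped) == PHONE_DIGITS
```

A bot that pastes promotional text into every input fails both checks, while a user who types a real value passes.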

You could create a two-step system in which a user fills the form, but then must reply to an e-mail to "activate" the record within a set period of time - say 24 hours.
In the back end, instead of populating your current table with all the form submissions, you could put them into a temporary table that automatically deletes any row that is older than your time allotment. Unless you have a serious bot problem, I would think that the table wouldn't get that big, especially if the first form is just a few fields.
A benefit of this approach is that you don't have to use a captcha or a similar technology that might create accessibility problems.
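The two-step flow above could be sketched like this, using Python's built-in sqlite3 module. The table names `pending` and `records`, the token scheme, and the in-memory database are assumptions for the example; a real system would also send the token by e-mail.

```python
import secrets
import sqlite3
import time
from typing import Optional

ACTIVATION_WINDOW_SECONDS = 24 * 60 * 60  # the 24-hour example from the text


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with a pending table and a final table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pending (token TEXT PRIMARY KEY, payload TEXT, created REAL)"
    )
    conn.execute("CREATE TABLE records (payload TEXT)")
    return conn


def submit(conn: sqlite3.Connection, payload: str, now: Optional[float] = None) -> str:
    """Store a submission as pending and return the token to be e-mailed."""
    created = time.time() if now is None else now
    token = secrets.token_urlsafe(16)
    conn.execute("INSERT INTO pending VALUES (?, ?, ?)", (token, payload, created))
    return token


def purge_expired(conn: sqlite3.Connection, now: Optional[float] = None) -> None:
    """Delete pending rows older than the activation window."""
    current = time.time() if now is None else now
    cutoff = current - ACTIVATION_WINDOW_SECONDS
    conn.execute("DELETE FROM pending WHERE created < ?", (cutoff,))


def activate(conn: sqlite3.Connection, token: str, now: Optional[float] = None) -> bool:
    """Move a pending row into the real table if its token is still valid."""
    purge_expired(conn, now)
    row = conn.execute(
        "SELECT payload FROM pending WHERE token = ?", (token,)
    ).fetchone()
    if row is None:
        return False
    conn.execute("INSERT INTO records VALUES (?)", (row[0],))
    conn.execute("DELETE FROM pending WHERE token = ?", (token,))
    return True
```

Here the purge runs whenever someone activates; a scheduled job calling `purge_expired` would work just as well.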

Related

Counting the amount of users or executions of an application.

I made a program that reads the data from the clipboard and saves it in a string variable. It then looks for specific words in that string and generates several URLs. Afterwards it opens the browser and shows each URL in its own tab.
Some of my friends already use this program frequently, and I want some statistics about how often. A simple counter variable would be enough, but I need to be able to access it.
I came up with two options that could work:
I could send an email to a specific address every time my app is executed. Then I could track the number of uses by manually or automatically counting the emails in the mailbox. I think this would be a very dirty solution.
I could create and publish a website containing a counter. This counter could be refreshed by my application. This solution is a bit better, I think, but a lot more work for a single counter.
Do you have better ideas to solve my problem or is one of mine already a good one?
Thank you in advance!
You can use the Google Analytics Measurement Protocol (see the Measurement Protocol Overview). It gives you usage statistics for your application through Google Analytics. You can even see geographic statistics, version distribution, and crash reports. It is easy to use from .NET: it just amounts to sending HTTP requests to Google.
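To give an idea of what such a request looks like, here is a sketch in Python that only builds the form-encoded body for one "launch" event. It assumes the Universal Analytics (v1) form of the protocol, with its `v`, `tid`, `cid`, `t`, `ec` and `ea` parameters and the `/collect` endpoint; that may not match what Google currently accepts, so check the current documentation. The tracking ID, category, and action values are placeholders.

```python
from urllib.parse import urlencode

# Assumed v1 endpoint; the body would be sent to it as an HTTP POST.
COLLECT_URL = "https://www.google-analytics.com/collect"


def build_launch_hit(tracking_id: str, client_id: str) -> str:
    """Build the form-encoded body for a single 'app launch' event hit."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # property ID (placeholder in this sketch)
        "cid": client_id,    # anonymous per-installation ID
        "t": "event",        # hit type
        "ec": "app",         # event category (made up for this example)
        "ea": "launch",      # event action (made up for this example)
    }
    return urlencode(params)
```

The application would POST this body once per start, for example with `urllib.request.urlopen(COLLECT_URL, data=body.encode())`, and the counting happens on the analytics side.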

how to get the data from captcha in selenium webdriver

I'm using Selenium webdriver (Java).
I need to test the registration form, but before submitting, an image box (captcha) appears, and it changes on every execution. I want to know how to get the data from the image (captcha).
Can anyone help me?
If the captcha is coming from an environment under your control, you will likely need to implement some sort of method indicating you are in a test environment and have the captcha system return a known value or some indicator of what the expected value is.
If, on the other hand, the captcha is coming from another source out of your control, you are probably out of luck. At that point, you are essentially in the same boat as the spammers who are in a constant arms race to write software that can visually parse a captcha.
UPDATE
I feel the need to add some clarification to the ideas put forth in the question, answer and comments. Essentially you are dealing with one of the following situations (note that when I say 'your', I am referring to you, your company, client, etc):
1) Your form, Your captcha system: If this is the case, your best solution is to work with your developers to add a 'test' mode to your captchas, returning either a known value, or additional information in the page that indicates what the expected value should be. If you are able to make use of a tool, either written by you, or by another, that can successfully 'read' the captcha image, your system is broken. If you can do it in test mode, what is to stop anyone else (spammer, hacker, etc) from bypassing your captcha in exactly the same manner?
2) Your form, 3rd Party captcha system: If this is the case, your best solution is again to see if the system has some 'test' mode that you can make use of. I have no experience with these systems myself but in general would guess that test methods exist for the major systems out there. A Google search of {Captcha System Name} automated testing should return some good hints as to how to go about testing with the system. If nothing good comes from that, your next bet would be to implement your own, internal, test only, dummy captcha system that works with some known value and make your captcha provider configurable so that you can point to your test system in test/dev/etc and your real system in production.
3) Another Form, Unknown captcha system: I am going to make a leap of faith here and assume this is not your case, but just for completeness I will include it. If this is your case, you're not testing anything at all and are simply asking for help bypassing someone else's security mechanisms for your own reasons. If that is the case, please seek your assistance on less scrupulous sites.
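The "configurable captcha provider" idea from case 2 could look roughly like the Python sketch below. All names here (`CaptchaVerifier`, `choose_verifier`, the environment labels, and the known test answer) are hypothetical; a real production verifier would wrap whatever captcha service you use.

```python
from typing import Callable

# A verifier takes the user's captcha answer and says whether it is correct.
CaptchaVerifier = Callable[[str], bool]

# Known value accepted by the dummy verifier (hypothetical, test only).
TEST_CAPTCHA_ANSWER = "test-captcha"


def make_test_verifier() -> CaptchaVerifier:
    """Dummy captcha for test/dev: accepts only the known value."""
    return lambda answer: answer == TEST_CAPTCHA_ANSWER


def choose_verifier(
    environment: str, production_verifier: CaptchaVerifier
) -> CaptchaVerifier:
    """Point test/dev at the dummy system and everything else at the real one."""
    if environment in {"test", "dev"}:
        return make_test_verifier()
    return production_verifier
```

The Selenium test then types the known value into the captcha box, while production keeps rejecting it because only the real verifier is consulted there.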
Captchas were introduced to block robots and automation code, so there is no way to automate the captcha itself.
1. You can add a wait time to the automation so that a person can enter the captcha code.
2. If the project is running at a test URL, you can ask your system admin and developer to disable the captcha validation.
Maybe this can help you, though I haven't tried it:
Developers generate a random value for the captcha, convert that value into an image, and store the value in the session so that the entered input can be compared with the captcha code.
So, if possible, you can take that session value and supply it as the input.

May we use Yii flash messages on this scenario?

I haven't seen this scenario covered here:
Yii Framework: How to work with Flash Messages.
So, after user registration, I wish to redirect the user to a thank you page where he/she could read more about what he/she should do, and what would happen next. It's a fair amount of information, so adding that message to an already existing page is not an option, because it would get too noisy. Displaying a temporary message isn't an option either, because it's a fair amount of text to be read.
In cases like this:
Should we still use flash messages and use a conditional so that what normally exists on the page stays hidden while displaying a success flash message?
OR
Should we simply redirect to a given thank you view (by creating the respective thankyou action?)
Is there a better option?
You could use a flash message. But these are really for things like "Your account is now created".
If you want to include a good amount of information, I think it best to have a separate thankyou action/view that people are redirected to after the sign up process is complete.

Stop spam without captcha

I want to stop spammers from using my site. But I find CAPTCHA very annoying. I am not just talking about the "type the text" type, but anything that requires the user to waste his time to prove himself human.
What can I do here?
Requiring Javascript to post data blocks a fair amount of spam bots while not interfering with most users.
You can also use a nifty trick:
<input type="text" id="not_human" name="name" />
<input type="text" name="actual_name" />
<style>
#not_human { display: none }
</style>
Most bots will populate the first field, so you can block them.
I combine a few methods that seem quite successful so far:
Provide an input field with the name email and hide it with CSS display: none. When the form is submitted, check whether this field is empty. Bots tend to fill it with a bogus email address.
Provide another hidden input field that contains the time the page was loaded. Check whether the time between loading and submitting the page is larger than the minimum time it takes to fill in the form. I use between 5 and 10 seconds.
Then check whether the number of GET parameters is what you would expect. If your form's action is POST and the underlying URL of your submission page is index.php?p=guestbook&sub=submit, then you expect 2 GET parameters. Bots try to add GET parameters, so this check would fail.
And finally, check that HTTP_USER_AGENT is set, which bots sometimes don't do, and that HTTP_REFERER is the URL of the page containing your form. Bots sometimes just POST to the submission page, causing HTTP_REFERER to be something else.
I got most of my information from http://www.braemoor.co.uk/software/antispam.shtml and http://www.nogbspam.com/.
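The four checks in this answer could be combined roughly as follows. This is a Python sketch rather than the PHP the answer implies; the hidden field name `loaded_at`, the form URL, and the 5-second minimum are assumptions for illustration.

```python
from typing import Mapping, Optional
from urllib.parse import parse_qsl

# Assumed values for this sketch.
FORM_URL = "https://example.com/index.php?p=guestbook"
EXPECTED_GET_PARAMS = 2   # p=guestbook&sub=submit
MIN_FILL_SECONDS = 5.0


def rejection_reason(
    form: Mapping[str, str],
    query_string: str,
    headers: Mapping[str, str],
    now: float,
) -> Optional[str]:
    """Return why a submission looks automated, or None if it passes."""
    # 1. Honeypot: the CSS-hidden "email" field must stay empty.
    if form.get("email", ""):
        return "honeypot filled"
    # 2. Timing: compare the load time from the hidden field with now.
    try:
        loaded_at = float(form.get("loaded_at", ""))
    except ValueError:
        return "missing load time"
    if now - loaded_at < MIN_FILL_SECONDS:
        return "submitted too fast"
    # 3. GET parameters: the submission URL should carry exactly the expected ones.
    if len(parse_qsl(query_string, keep_blank_values=True)) != EXPECTED_GET_PARAMS:
        return "unexpected GET parameters"
    # 4. Headers: a user agent must be present and the referer must be the form page.
    if not headers.get("User-Agent"):
        return "no user agent"
    if headers.get("Referer") != FORM_URL:
        return "unexpected referer"
    return None
```

Returning the reason rather than a bare boolean makes it easier to log which check is catching the most bots.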
Integrate the Akismet API to automatically filter your users' posts.
If you're looking for a .NET solution, the Ajax Control Toolkit has a control named NoBot.
NoBot is a control that attempts to provide CAPTCHA-like bot/spam prevention without requiring any user interaction. NoBot has the benefit of being completely invisible. NoBot is probably most relevant for low-traffic sites where blog/comment spam is a problem and 100% effectiveness is not required.
NoBot employs a few different anti-bot techniques:
Forcing the client's browser to perform a configurable JavaScript calculation and verifying the result as part of the postback. (Ex: the calculation may be a simple numeric one, or may also involve the DOM for added assurance that a browser is involved)
Enforcing a configurable delay between when a form is requested and when it can be posted back. (Ex: a human is unlikely to complete a form in less than two seconds)
Enforcing a configurable limit to the number of acceptable requests per IP address per unit of time. (Ex: a human is unlikely to submit the same form more than five times in one minute)
More discussion and demonstration at this blogpost by Jacques-Louis Chereau on NoBot.
<ajaxToolkit:NoBot
ID="NoBot2"
runat="server"
OnGenerateChallengeAndResponse="CustomChallengeResponse"
ResponseMinimumDelaySeconds="2"
CutoffWindowSeconds="60"
CutoffMaximumInstances="5" />
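For readers outside .NET, the third technique (a limit on requests per IP address per unit of time) can be sketched in a few lines of Python. This is not NoBot's implementation, only an illustration of the idea; the class name is made up, and the defaults mirror the 5-per-60-seconds values in the markup above.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SubmissionLimiter:
    """Sliding-window limit on form submissions per IP address."""

    def __init__(self, max_requests: int = 5, window_seconds: float = 60.0) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._seen: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        """Record a submission at time `now` and say whether to accept it."""
        timestamps = self._seen[ip]
        # Forget submissions that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False
        timestamps.append(now)
        return True
```

Rejected attempts are not recorded here, so a client that keeps hammering the form is allowed again once its earlier accepted submissions age out of the window.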
I would be careful using CSS or Javascript tricks to ensure a user is a genuine real life human, as you could be introducing accessibility issues, cross browser issues, etc. Not to mention spam bots can be fairly sophisticated, so employing cute little CSS display tricks may not even work anyway.
I would look into Akismet.
Also, you can be creative in the way you validate user data. For example, let's say you have a registration form that requires a user email and address. You can be fairly hardcore in how you validate the email address, even going so far as to ensure the domain is actually set up to receive mail, and that there is a mailbox on that domain that matches what was provided. You could also use Google Maps API to try and geolocate an address and ensure it's valid.
To take this even further, you could implement "hard" and "soft" validation errors. If the mail address doesn't match a regex validation string, then that's a hard fail. Not being able to check the DNS records of the domain to ensure it accepts mail, or that the mailbox exists, is a "soft" fail. When you encounter a soft fail, you could then ask for CAPTCHA validation. This would hopefully reduce the number of times you'd have to push for CAPTCHA verification, because if you're getting enough activity on the site, valid people should be entering valid data at least some of the time!
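The hard/soft distinction could be sketched like this in Python. The DNS check is passed in as a callback (`domain_accepts_mail`, a hypothetical name) so the example stays self-contained; in practice it might wrap an MX lookup. The regex is deliberately loose and is an assumption, not a full address grammar.

```python
import re
from enum import Enum
from typing import Callable


class Verdict(Enum):
    ACCEPT = "accept"
    SOFT_FAIL = "soft fail"   # ask for a CAPTCHA
    HARD_FAIL = "hard fail"   # reject outright


# Loose shape check: something@domain.tld with no spaces or extra @ signs.
EMAIL_RE = re.compile(r"[^@\s]+@([^@\s]+\.[^@\s]+)")


def check_email(address: str, domain_accepts_mail: Callable[[str], bool]) -> Verdict:
    """Classify an address as accepted, soft-failed, or hard-failed."""
    match = EMAIL_RE.fullmatch(address)
    if match is None:
        return Verdict.HARD_FAIL
    try:
        reachable = domain_accepts_mail(match.group(1))
    except OSError:
        # The lookup itself failed, so we could not confirm anything.
        reachable = False
    return Verdict.ACCEPT if reachable else Verdict.SOFT_FAIL
```

Only a `SOFT_FAIL` triggers the CAPTCHA, so users whose data validates cleanly never see one.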
I realize this is a rather old post, however, I came across an interesting solution called the "honey-pot captcha" that is easy to implement and doesn't require javascript:
Provide a hidden text box!
Most spambots will gladly complete the hidden text box allowing you to politely ignore them.
Most of your users will never even know the difference.
To prevent a user with a screen reader from falling into your trap, simply label the text box "If you are human, leave blank" or something to that effect.
Tada! Non-intrusive spam-blocking! Here is the article:
http://www.campaignmonitor.com/blog/post/3817/stopping-spambots-with-two-simple-captcha-alternatives
Since it is extremely hard to prevent spam 100% of the time, I recommend reading this IBM article, posted 2 years ago and titled 'Real Web 2.0: Battling Web spam', where visitor behavior and control workflow are analyzed well and concisely.
Web spam comes in many forms, including:
Spam articles and vandalized articles on wikis
Comment spam on Weblogs
Spam postings on forums, issue trackers, and other discussion sites
Referrer spam (when spam sites pretend to refer users to a target site that lists referrers)
False user entries on social networks
Dealing with Web spam is very difficult, but a Web developer neglects spam prevention at his or her peril. In this article, and in a second part to come later, I present techniques, technologies, and services to combat the many sorts of Web spam.
The article also links to a very interesting "...hashcash technique for minimizing spam on Wikis and such, in addition to e-mail."
How about a human readable question that tells the user to put in the first letter of the value he put in the first name field and the last letter of the last name field or something like this?
Or add some hidden fields that are filled by JavaScript with values like the referer and so on. Check that these fields equal the values you stored in the session beforehand.
If the values are empty, the user has no JavaScript, so the submission is probably not spam. A bot, however, will fill in at least some of them.
In any case, you should pick one approach: a honeypot or BOTCHA.

invisible captcha

I'm using the following security measure (an invisible captcha) for my site's form submission to prevent automated submissions:
generate the md5 hash of a number x with a fixed salt and render it inside the form as a hidden field
generate 2 hidden fields a and b where a + b = x; a and b are not hashed
upon submission, use JavaScript to add another plain hidden field c where c = a + b
on the server side, apply md5 to c with the salt and compare it with the hashed x
However, this system was cracked in production: one person was able to auto-submit thousands of forms successfully. Any idea how?
One way to do it: the hacker already knows that the operation is + (simple to find out by reading the JavaScript), so he reads the form, adds a and b, and creates a new form with the extra field c where c = a + b. He has to first read a form, then create one for submission.
My questions are:
Is the hypothesis I presented above the likely way to break my system?
If so, what should I do to prevent this kind of hack?
What are other alternative hacks the hacker might use?
I don't want to use real captcha because it degrades user experience. All suggestions are welcome.
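To make the hypothesis concrete, here is a small Python model of the scheme as described, plus a "bot" that defeats it without running any JavaScript. The salt value, the field name `x_hash`, and the way the salt is concatenated are assumptions; the real site may differ in those details.

```python
import hashlib
from typing import Dict

SALT = "fixed-salt"  # hypothetical stand-in for the site's fixed salt


def hash_number(n: int) -> str:
    """md5 of the salted number, as the question describes."""
    return hashlib.md5(f"{SALT}{n}".encode()).hexdigest()


def render_form(a: int, b: int) -> Dict[str, str]:
    """Hidden fields the server sends: plain a and b, plus the hash of x = a + b."""
    return {"a": str(a), "b": str(b), "x_hash": hash_number(a + b)}


def server_accepts(submitted: Dict[str, str]) -> bool:
    """Server-side check: hash the submitted c and compare it with the hashed x."""
    try:
        c = int(submitted.get("c", ""))
    except ValueError:
        return False
    return hash_number(c) == submitted.get("x_hash")


def bot_submit(form: Dict[str, str]) -> Dict[str, str]:
    """What a bot does: read a and b from the form and add them itself."""
    forged = dict(form)
    forged["c"] = str(int(form["a"]) + int(form["b"]))
    return forged
```

Everything needed to compute c is in the page, so the check costs a bot one addition. In this model nothing ties `x_hash` to a session or an expiry time either, so a single valid pair of c and `x_hash` could also be replayed; whether that applies to the real site depends on details the question does not give.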
Alternatively, the hacker could just execute your javascript themselves.
If you want to validate that the user isn't a robot, you'll have to get the user to do something a robot can't. It's really that simple.
A further step would be to increase the amount of computation required; make it infeasible to submit the forms too rapidly. Try looking at HashCash.
I can't give advice on your specific case, but Django has some nice approaches for suppressing spam in comment fields without captchas: Nice approaches here.
Your system is not working because the attacker(s) are just executing your JavaScript themselves. If you want to use a somewhat similar scheme that will prevent automated submissions you need to put a workload factor on the client. This will not stop the automated software from being able to submit to your site but it will slow them down and increase the cost of an attack. The goal is to increase the cost and slow them down enough that the attack is just not worthwhile. Instead of trying to build it yourself try using this proof of work service.
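A client workload factor in the spirit of HashCash can be sketched as follows. This is a simplified illustration using SHA-256 and a plain "challenge:nonce" string, not Hashcash's actual stamp format and not the linked service's API; the function names and the 12-bit difficulty in the usage note are assumptions.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: a single hash confirms that the client did the work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: try nonces until the hash has enough leading zero bits."""
    for nonce in count():
        if verify(challenge, nonce, difficulty_bits):
            return nonce
    raise AssertionError("unreachable: count() never ends")
```

With a per-form challenge issued by the server, `solve(challenge, 12)` takes a few thousand hashes on average while `verify` takes one. Each extra difficulty bit doubles the expected client work, which is what makes thousands of automated submissions expensive.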