Are you human? (or How to prevent spam) [closed]

Closed 10 years ago.
What mechanisms do you know of that prevent your site from being abused by anonymous spammers?
For example, let's say that I have a site where people can vote on things. But I don't want someone to spam something all the way to the top. So far I have found (a) requiring an account, with each account allowed to vote only once, and (b) CAPTCHAs to decrease spam. What other methods do you know of, and how well do they work?

[Image from xkcd]

The big thing I've noticed is that whatever you do, you want your system to be unique. You want an attacker to have to tailor their automation program for your specific site, rather than just throw a pre-existing script at it that will work almost anywhere. It doesn't even have to be cryptographically secure; it just has to make your site a little different from the norm.
This doesn't mean you can't or shouldn't use something like a pre-built CAPTCHA widget. Absolutely do use one of those as a starting point! It just means you have to customize it somewhere so that something extra happens outside the norm, something that will break any pre-existing script that could normally defeat it.
If your site gets big enough that you have attackers targeting it specifically, then your simple little customization probably won't hold up anymore, and you might have to do something a little more special and think about real cryptography and all that. But that's one of those things that's a "good" problem to have.

For a CAPTCHA system, I heartily recommend reCAPTCHA.
Traditional computer-generated CAPTCHAs will eventually be broken by a sufficiently intelligent system. For instance, here's someone who claims to break the Google CAPTCHA, formerly considered unbreakable, with a 30% hit rate. reCAPTCHA, by definition, shows you only images that cannot be recognized by optical character recognition.
And at the same time, your users' effort will be directed towards the common good - they help digitize books by recognizing words that cannot be recognized automatically.
See here for further explanation and to try it out.

[Image from the Quantum Random Bit Generator Service, via MNeylon]

Limit the number of votes per IP address per time period.
Block anonymizing proxies.
For voting: how about shuffling the values that have to be returned by the form on a per-session basis? In one session "1" means the first item and "2" the second; in the next, "77" means the first item and "812" the second. There could be some simple maths behind the scenes, but it prevents users from just sending the same HTTP query over and over again.
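To make the idea concrete, here is a minimal Python sketch of per-session vote tokens (the function names and the use of a plain dict as the session store are illustrative assumptions):

    # Each session gets its own random mapping from opaque tokens to item IDs,
    # so a replayed or pre-recorded request is meaningless in any other session.
    import secrets

    def issue_vote_tokens(item_ids, session):
        """Generate a fresh token per votable item and remember the mapping."""
        mapping = {secrets.token_hex(8): item_id for item_id in item_ids}
        session["vote_tokens"] = mapping
        return mapping  # render these tokens into the form instead of raw IDs

    def resolve_vote(token, session):
        """Translate a submitted token back to an item ID, at most once."""
        mapping = session.get("vote_tokens", {})
        return mapping.pop(token, None)  # None: stale, replayed, or forged

    session = {}  # stand-in for a real server-side session
    tokens = issue_vote_tokens(["item-1", "item-2"], session)
    first = next(iter(tokens))
    assert resolve_vote(first, session) is not None  # valid vote
    assert resolve_vote(first, session) is None      # replay rejected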
What's worked for me very well: use AJAX forms, not simple HTTP forms. Technically it's not much more complicated to fake votes, but I have written a simple blog software whose only spam-protection mechanism is to submit the comments via AJAX - no spam so far.

I'm a fan of the "hidden field" CAPTCHA. I don't remember where I read about it, but the idea is this:
create your form as normal
add an extra field but hide it (i.e. style="display:none" on the surrounding div or table row)
after submission, if the field is blank, do the appropriate action (e.g. send an email); if the field has been filled in, then it's a robot submitter
The only case where this falls down is if the user's browser doesn't handle CSS (or they have it switched off), which is very rare.
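For reference, a minimal sketch of the server-side half of this check (the field name "website" and the markup are arbitrary examples, not a standard):

    # The form contains an input a human never sees:
    #   <div style="display:none"><input type="text" name="website"></div>
    # Bots that blindly fill every field give themselves away.

    def is_probably_bot(form_data: dict) -> bool:
        """Reject the submission if the hidden field came back non-empty."""
        return bool(form_data.get("website", "").strip())

    print(is_probably_bot({"email": "a@b.com", "website": ""}))          # False
    print(is_probably_bot({"email": "a@b.com", "website": "spam.biz"}))  # True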

Charge for votes, like they do on some television "talent" shows, and get spammed all the way to the bank!
Seriously, this is a really tough problem, and someday (maybe soon, if you listen to Ray Kurzweil) computers will pass whatever tests we devise to screen them out. The answers I'm adding to the list have obvious drawbacks, but just for the sake of enumeration: moderation (have humans do the testing), and IP-based tracking (limit the number of votes from a host).
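For the IP-based tracking option, a rough sketch of a sliding-window throttle (the limits and the in-memory store are invented for illustration; a real site would use something like Redis):

    import time
    from collections import defaultdict

    WINDOW_SECONDS = 3600   # look at the last hour
    MAX_VOTES = 5           # per IP per window (arbitrary example values)

    _votes = defaultdict(list)  # ip -> list of vote timestamps

    def allow_vote(ip: str) -> bool:
        """Allow at most MAX_VOTES votes per IP inside the sliding window."""
        now = time.time()
        _votes[ip] = [t for t in _votes[ip] if now - t < WINDOW_SECONDS]
        if len(_votes[ip]) >= MAX_VOTES:
            return False
        _votes[ip].append(now)
        return True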

stackoverflow has a few features that help with this; I think the single most useful step you can take is disabling the ability of anonymous users and new accounts to vote. This way, no one can sign up for hundreds of accounts and use their one vote to overpower other users. I'd say requiring a few posts or membership for a certain period of time are both decent options.
Some would say you could allow one vote per IP address to help address this, but I've played plenty of games where malicious users with a nigh-infinite number of proxies defied IP address-based security. It's a deterrent, but a savvy user will get around it easily.

This is the study area of Human Computation.
There is an excellent video by Luis von Ahn here:
http://video.google.com/videoplay?docid=-8246463980976635143

There are a few ideas in the answers to the Best non-image based CAPTCHA? question, if you haven't seen it already.

I normally use a combination of the two: an anonymous user is free to browse everything, but if he wants to vote, then he has to register.
In the registration process, depending on the situation, I use an opt-in via email (to complete registration and confirm that at least the mailbox exists) and/or a CAPTCHA.
From that point on you can decide whether the user can vote more than once, or apply any other rule.
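A minimal sketch of the email opt-in step (the secret, URL, and helper names are placeholders, not from any particular framework):

    import hashlib, hmac
    from urllib.parse import quote

    SECRET = b"replace-with-a-real-server-secret"

    def confirmation_token(email: str) -> str:
        """Derive a token only the server can compute for this address."""
        return hmac.new(SECRET, email.encode(), hashlib.sha256).hexdigest()

    def confirmation_link(email: str) -> str:
        """Mail this link; clicking it proves the mailbox exists."""
        return (f"https://example.com/confirm"
                f"?email={quote(email)}&token={confirmation_token(email)}")

    def confirm(email: str, token: str) -> bool:
        """Activate the account only if the token round-tripped intact."""
        return hmac.compare_digest(token, confirmation_token(email))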
By the way, I'm not a fan of IP-based constraints: there are a lot of situations in which a big organization's network routes all of its users through a few IPs, so the risk of blocking users who could legitimately vote is high.

Related

reCAPTCHA vs other captcha systems

What is a good reason to choose reCAPTCHA over a well-known and tested CAPTCHA generator on the server? Is it just philanthropy (helping with digitizing texts) or are there other good reasons?
reCAPTCHA is rather neat. Not only does it stop spammers, but it helps digitize books. Each word that appears in the CAPTCHA has actually been scanned in from a book, but sometimes the character recognition is off, so the computer may save some gibberish of a sentence without knowing any better.
[Image from the reCAPTCHA site]
By making people type in what they think the word is, it helps create an accurate digital copy of the book or word that was scanned, while at the same time checking what the user submits, comparing it to others' submissions, and determining whether the user is human.
For that reason I use reCAPTCHA. I'm not just selfishly protecting my site, I'm providing a service for others.
Not only that but it's fairly simple to implement and provided by a reliable company (Google).
The question was "why should I use it"; that question must include "why shouldn't I use it", so some criticisms:
Recaptcha volunteers your users to be OCR monkeys, without bothering to ask their opinion.
It requires that you advertise recaptcha in the captcha widget, which isn't always appropriate.
It's a web service, which means there's no hard guarantee it'll still exist a week or a year or two years from now. (Google has crippled or removed public, widely-used APIs in the past, such as their translation API.)
It only supports web pages, loading everything with scripts and iframes. It doesn't have a proper API, so if you ever want to have an iOS or Android app that logs into your system, and need to show a captcha there, you'll be out of luck.
You have no control over the complexity of the generated captcha. Captchas always have a tradeoff between how hard they are to read and how difficult they are to OCR. There are no knobs to adjust, based on how important stopping robots is to your use case. If they decide to make the captchas much harder to read (which they've done at times), and this becomes a nuisance to your users, there's nothing you can do about it.
reCAPTCHA is quite good. Most other generators are broken easily while reCAPTCHA usually gets good scores.
Another good thing is that it has an accessibility button which reads the text aloud.
This is an old thread, but I would just like to confirm that in my case we used reCAPTCHA on a number of Drupal 6 websites in combination with the Honeypot module. We did that to stop automated spam user registrations.
I presume these user accounts were being created automatically by desktop applications such as SEnuke XCr and XRumer with the aim of then posting spam. They create the user account but rarely do anything further; still, I found it annoying. Further reading on this subject can be found here: How to prevent spam user registrations? (links to an article on Drupal.org).
I can confirm that the above reduced my spam user registrations from a little over 100 a day to none at all.
We need to register the IP address on which the server will be running. That seems somewhat risky, so we might be required to change our registration workflow if we use reCAPTCHA.

Need suggestions/ideas for easy-to-use but secure captchas [closed]

Closed 10 years ago.
To start with, I am well aware of the security/usability trade-off associated with captchas and do not need any explanation on that.
I know that reCAPTCHA is the state of the art in CAPTCHA technology, but we just do not want to use it for our site because of the difficulty users face in reading the distorted words. Our site is a study portal for students offering live online classes, so the users will be students (leaving-certificate level) and teachers.
I have been searching for different ideas and have found some good ones, like:
The Sesame Street Solution as given in http://www.usereffect.com/topic/2009-07-13-captcha-is-there-a-better-way.
Asking questions which are very easy for humans, like "which one tastes better: … or …?". But how many such questions do I need to store to be safe?
My purpose of asking this question is to get as many ideas as possible. I think there are still a lot of user-friendly but secure ways I could analyse before finalizing.
Please highlight the pros and cons of the method you suggest with reference to the way spam bots work. I am not much aware of many of their strengths and weaknesses.
Thanks,
Sandeepan
Reading distorted words is one thing, but asking legitimate users to decipher text like that can get quite annoying. So it's important that you don't burden the user with anti-spam measures.
Damien Katz has used a negative CAPTCHA to stop spam bots. This technique, also called a honeypot field, is easy to implement and doesn't require the user to do anything.
A more complex honeypot implementation is described by Ned Batchelder. It involves randomized field names and hashed values to make sure bots haven't tampered with the form.
In his article he states the following:
Spammers don't make software that can post to any form, they make software that can post to many forms.
So it only takes a simple trick to confuse the majority of spam bots. A little bit more magic will take care of the remaining bots.
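A rough sketch of the randomized-field-name idea (the hashing scheme below is illustrative, not the article's exact construction):

    import hashlib, hmac, secrets

    SECRET = b"server-side-secret"           # assumption: kept out of the page
    REAL_FIELDS = ["name", "email", "comment"]

    def field_name(real: str, nonce: str) -> str:
        """Derive an opaque, per-view name for a real field."""
        msg = f"{real}:{nonce}".encode()
        return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()[:16]

    def render_form():
        """Return a per-view nonce (carried in a hidden field) and the
        obfuscated names to use in the markup."""
        nonce = secrets.token_hex(8)
        return nonce, {real: field_name(real, nonce) for real in REAL_FIELDS}

    def decode_submission(nonce: str, posted: dict):
        """Map obfuscated names back; None means a bot posted canned names."""
        expected = {field_name(real, nonce): real for real in REAL_FIELDS}
        if not set(expected).issubset(posted):
            return None
        return {real: posted[obf] for obf, real in expected.items()}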
Regarding the Sesame Street solution, asking a simple question or selecting the correct animal from a list: these are questions that are hard for spam bots to answer, but they can be difficult for users as well. Especially if your site has an international audience, people whose first language is not English may have trouble understanding the questions. It may not be an issue with your student audience, but it is something to keep in mind.
One approach a colleague of mine implemented was to present a series of random images of things like tea cups, boats, cats etc. with checkboxes, and ask the user to tick all the cats (say), or perhaps the boat and the tree.
The images were fairly simple two colour icons really, though you could use real photos if necessary.
Just make sure that your image names aren't representative of their contents.
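A toy version of that check might look like this (the categories, filenames, and session handling are all invented for illustration):

    import secrets

    CATEGORIES = {  # server-side knowledge only
        "cat": ["tabby.png", "siamese.png"],
        "boat": ["dinghy.png"],
        "teacup": ["cup1.png"],
    }

    def make_grid(target="cat"):
        """Assign each image an opaque ID and record which IDs are targets."""
        grid, answers = {}, set()
        for category, files in CATEGORIES.items():
            for filename in files:
                opaque = secrets.token_hex(4)  # shown to the client instead
                grid[opaque] = filename        # of the telltale filename
                if category == target:
                    answers.add(opaque)
        return grid, answers  # keep `answers` in the session, never the page

    def check(ticked: set, answers: set) -> bool:
        """Pass only if exactly the target images were ticked."""
        return ticked == answers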
First, ASP.NET has a control that isn't truly a "captcha," but in fact quite the reverse - a very simple script which makes sure that the visiting program can evaluate JavaScript. This gets rid of all but the most complex scrapers, especially if the JavaScript test has a structure that changes (i.e. it isn't just var y = 2; var x=y+(random number from server); verify(x))
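In the same spirit, here is a sketch of a server-verified JavaScript challenge whose contents change per request (the snippet and field name are invented examples, not the ASP.NET control):

    import random

    def make_challenge():
        """Emit a JS snippet for the page and the answer to keep server-side."""
        a, b = random.randint(10, 99), random.randint(10, 99)
        js = f'document.getElementById("js_check").value = {a} * {b} + 7;'
        return js, a * b + 7  # store the expected value in the session

    def verify(submitted: str, expected: int) -> bool:
        """A scraper that never ran the JS leaves the field blank or wrong."""
        try:
            return int(submitted) == expected
        except (TypeError, ValueError):
            return False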
Google and Craigslist both use phone numbers, which mandate that a nasty bot at least have access to an SMS-capable number (or speech recognition + voice line)
My favorite captcha is clicking on something that a computer can't recognize, such as picking out a cat from a short list of animal pictures.
It's important to consider accessibility and ease of implementation, which reCAPTCHA does very well.

Another answer to the CAPTCHA problem? [closed]

Closed 10 years ago.
Most sites at least employ server access log checking and banning along with some kind of bot prevention measure like a CAPTCHA (those messed-up text images).
The problem with CAPTCHAs is that they pose a threat to the user experience. Luckily they now come with user-friendly features like refresh buttons and audio versions.
Anyway, like Linux vs. Windows, it isn't worth a spammer's time to customize and/or build a script to handle a custom CAPTCHA that only pertains to one site. Therefore, I was wondering if there might be better ways to handle the whole CAPTCHA thing.
In A Better CAPTCHA, Peter Bromberg mentions that one way would be to convert the image to HTML and display it embedded in the page. On http://shiflett.org/ Chris simply asks users to type his name into an input. Examples like this are ways of simplifying the CAPTCHA experience while decreasing its value to spammers. Does anyone know of more good examples I could use, or see any problem with the embedded-image idea?
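For what it's worth, the embedded-image idea is easy to prototype; here's a toy Python version using Pillow (and, as the next answer points out, it is only a speed bump):

    from PIL import Image

    def image_to_html_table(path: str, cell: int = 4) -> str:
        """Render an image as an HTML table of colored cells."""
        img = Image.open(path).convert("RGB")
        width, height = img.size
        rows = []
        for y in range(0, height, cell):
            cells = []
            for x in range(0, width, cell):
                r, g, b = img.getpixel((x, y))
                cells.append(f'<td style="background:rgb({r},{g},{b});'
                             f'width:{cell}px;height:{cell}px"></td>')
            rows.append("<tr>" + "".join(cells) + "</tr>")
        return ('<table cellspacing="0" cellpadding="0">'
                + "".join(rows) + "</table>")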
An image presented as an HTML table is just a technical speed bump. There's no difficulty in extracting the pixels from such a document.
IMHO CAPTCHA puts the focus on the wrong thing - you're not interested in whether there's a human on the other side. You wouldn't like a human to spam you either. So take a step back and focus on the spam:
Analyze text (look for spammy keywords, use bayesian filtering)
Analyze links (blacklist spammy domains – SURBL, LinkSleeve)
Look at traffic patterns and block floods
There's no single perfectly accurate method, but you can use a few of them and weight the results to get pretty close.
Have a look at the source code of Sblam! (it's a completely transparent server-side comment spam filter).
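As a sketch of the weighting idea (the keywords, domains, weights, and threshold are all made-up examples, not Sblam!'s actual rules):

    SPAM_WORDS = {"viagra": 2.0, "casino": 1.5, "cheap": 0.5}
    BAD_DOMAINS = {"spam.example"}  # in practice, query SURBL or similar

    def spam_score(text: str, link_domains: list, posts_last_minute: int) -> float:
        """Combine weak signals: keywords, link blacklist, flood rate."""
        score = sum(w for word, w in SPAM_WORDS.items() if word in text.lower())
        score += 3.0 * sum(d in BAD_DOMAINS for d in link_domains)
        if posts_last_minute > 5:   # crude flood heuristic
            score += 2.0
        return score

    def is_spam(text, link_domains, posts_last_minute, threshold=3.0) -> bool:
        return spam_score(text, link_domains, posts_last_minute) >= threshold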
Alternatives to CAPTCHAs are going to require considering the problem from other angles. The reason is that CAPTCHAs are built around the idea that a human and a computer actor can be distinguished. As artificial intelligence progresses, this becomes an increasingly difficult problem, because the gap between computer and human users shrinks.
The technique used on Slashdot is for other users of the site to act as gatekeepers, marking abuse and removing offending posts before they become noticeable to a wide audience.
Another technique is to detect spam-like posts directly, using the same technology used to filter spam from email. Obviously it isn't 100% effective for email, and won't be for other uses either, but if you can filter out 75% of the spam with very few false positives, then other techniques only have to deal with the remaining 25%.
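A toy version of the email-style approach is a naive Bayes scorer over word counts (the smoothing and tokenization here are deliberately minimal):

    import math
    from collections import Counter

    spam_words, ham_words = Counter(), Counter()
    n_spam = n_ham = 0

    def train(text: str, is_spam: bool) -> None:
        """Count words from a labeled post."""
        global n_spam, n_ham
        (spam_words if is_spam else ham_words).update(text.lower().split())
        if is_spam:
            n_spam += 1
        else:
            n_ham += 1

    def spamminess(text: str) -> float:
        """Log-odds that the text is spam; positive means 'probably spam'."""
        score = math.log((n_spam + 1) / (n_ham + 1))
        for w in text.lower().split():
            p_spam = (spam_words[w] + 1) / (sum(spam_words.values()) + 1)
            p_ham = (ham_words[w] + 1) / (sum(ham_words.values()) + 1)
            score += math.log(p_spam / p_ham)
        return score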
Keep a log of spam-related activity, so that you can track trends about offending ip addresses, content of posts, claimed user agent, and so forth, so that you can block abusive users at a routing level.
In nearly all cases, your users would rather put up with the slight inconvenience of abuse prevention, than the huge inconvenience of a major spam problem.
Ultimately, the arms race between you and spammers is one of cost-benefit. Initially, it will cost spammers close to nothing to spam your site, but you can change that and make it very difficult. Even if they continue to spam your site, the benefit they receive will never grow beyond a few innocent users falling for their schemes. Once the cost of spamming rises sharply above the benefit, the spammers will go away.
Another way to benefit from that is to allow advertising on your site. Make it inexpensive (but not free, of course) and easy for legitimate advertisers to post responsible marketing material for your users to see. Would be spammers may find that it is a better deal to just pay you a few dollars and get their offering seen than to pursue clandestine methods.
Obviously most spammers won't fit in this category, since spam is often more about getting your users to fall victim to malware exploits. You can do your part by encouraging users to use modern, up-to-date browsers and plugins so that they become less vulnerable to those same exploits.
This article describes a technique based on hashed field names (changing with each page view) with some of them being honeypot fields (i.e. the request is rejected if they're filled) that are hidden from human users via various techniques.
Basically, it relies on spam scripts not being sophisticated enough to determine which form fields are actually visible. In a way, that is a CAPTCHA, since in order to solve it reliably, not only would they have to implement HTML, CSS and JavaScript fully, they'd also have to recognize when a field is too small to see, colored the same as the background, hidden behind another field, placed outside the browser's viewport, etc.
It's the same basic problem that makes Web Standards a farce: there is no algorithm to determine whether a webpage "looks right" - only a human can decide that.
Seen this?
It's a system with cute pictures instead of a CAPTCHA ;)
But I still think honeypots are a better solution - they're so cheap, easy, and invisible.
I really think that Dinah hit the nail on the head. The fact seems to be that the beauty of the whole CAPTCHA setup is that there is no standard. Standardizing would only help the market to be more profitable.
Therefore it seems that the best way to handle the CAPTCHA problem is to come up with a system that is fairly hard for bots to beat and that is NOT used by anyone else on the planet. It could be a question system, a very custom image creator, or even a mix of JS calls that only browsers respect.
By the time your site is big enough for spammers to care, you should have the budget to rethink your CAPTCHA setup and optimize it much more. In the meantime we should be monitoring our server logs and banning bad user agents, referrers, and IPs.
In my case I created a CAPTCHA image that I believe is very different from any other CAPTCHA I have seen. This should do fine for now alongside my Apache logs + htaccess banning and Akismet checking. Maybe I should spend time on a reporting feature as well.
Although not a true image CAPTCHA, a good Turing test is asking users a random question - common options are: is ice hot or cold? 5+2=..? etc.
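A trivial implementation of such a question gate (the questions are just the examples above; a real bank would need enough entries that harvesting them all is impractical):

    import random

    QUESTIONS = [
        ("Is ice hot or cold?", {"cold"}),
        ("What is 5 + 2?", {"7", "seven"}),
        ("What colour is the sky on a clear day?", {"blue"}),
    ]

    def ask():
        """Pick a random question; keep the answer set server-side."""
        question, answers = random.choice(QUESTIONS)
        return question, answers

    def check(reply: str, answers: set) -> bool:
        return reply.strip().lower() in answers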

Moving from Enterprise to World Wide Web [closed]

Closed 10 years ago.
I am going to change my working sphere from enterprise web applications written for a concrete business process to public web sites accessible to all users around the world.
What is the difference between these two spheres at the highest level? What specific characteristics do I need to know about modern web site development?
I suspect one could write books about this.
I suppose the first difference is the user base. With an enterprise, you can, at least partly, ensure the users are doing what they are supposed to - and if not you know who they are and where they live. Further, they can be fired for abuse. On a public web site, you almost have to assume that some part of your user base is not there for a positive reason. So be paranoid - if they're not attacking you yet, just wait.
A second related point is that users will find ways to use (abuse?) your site you never thought of. Plan for the worst, hope for better.
Third, language, culture and usage vary across the world. A form, for example, with "zip code" that accepts just 5 digits may make sense in the US but is useless in the UK. And asking for a state and restricting it to two characters likewise makes no sense, say, in Italy, where Italy IS the "state". This also applies to actual content - that joke you think is so very funny may be offensive in other countries. And never underestimate the ability of some folks to be offended at anything.
Fourth, get a good bunch of beta testers and test your site, and updates, carefully and thoroughly.
Fifth, have a plan for scalability - if you suddenly get "discovered", can your site take the traffic?
That's 5 things at least.
In an enterprise application, functionality and efficiency trump aesthetics every time. This is because you have a captive audience. The people who use your application are being paid to use it.
However, when opening an application up to the public, aesthetics becomes more important. There are always alternatives, and a given person will be more attracted to the application which looks better. Granted, functionality is still very important for repeat users, but you won't get people in the door if your application looks amateurish.
Browser agnosticism - In enterprise apps, it used to be that the developer would target the app at a specific browser, just for simplicity's sake.
In internet-accessible apps, the developer must target the vast majority of browsers. While this has gotten easier in the last few years, it is still an issue that needs attention.
Scalability - it's easier to scale an enterprise app: it's easier to predict the growth of usage, or simply to design for access by all users in the org at once. This is not generally the case for internet sites. The day you get slashdotted or dugg is the day you learn this. Better to design scalability in from the start than to have to learn it when your site starts to suffer.
In addition to Zack's answer, I would say that a web site/application that is open to the public needs to be constantly evolving/refreshed in order to grow your user base and keep them. Whereas on a more closed system, consistency and reliability are key priorities.
Depending on the nature of the application, if it has significant amounts of content, internationalization and presentation of that content are hugely important.
As Zack mentions, public users have a lot less tolerance for poor UI than enterprise customers do. That said, public users are more tolerant of incremental change; you can upgrade a live site as you feel like it (as long as it works, of course!!) without having to go through endless feature-request prioritization committees and user-training requirements.
Public web sites needs to be easy to use. While it's important that they look somewhat polished, don't ever let polish get in the way of ease of use. For example many designers like fixed width layouts because they are more predictable, many users like fluid width layouts because they use the space more efficiently. Side with your users.
Enterprise users can be forced to deal with needlessly-complex systems (lord knows I am more than I'd like), the general public cannot.

Good tool to collect issues, improvements, ideas [closed]

Closed 9 years ago.
I need a tool for collecting feedback and new ideas inside our company regarding our internal IS product. The problem is the acceptance level for such a tool.
Most of our colleagues are not IT-oriented, so a solution like Bugzilla or Jira is way too complicated for them to use: you need to create an account, take care of a lot of parameters before submission, new ideas about new software don't really fit well in these tools, etc.
So, here are my requirements:
No login needed, or optional login.
Few fields to enter.
If possible a WYSIWYG editor for the main description field.
Web based or e-mail based (we use Outlook internally).
Free (as in beer).
Not too chaotic (a Wiki is not an option)
I've taken a look at UserVoice (of course); it's really a nice tool for experienced people, but too complex for my target users.
Could the feedback you are seeking be collected through a questionnaire? There are many free solutions that provide questionnaire forms that are very easy to use, and if none apply, it is also something relatively easy to implement.
I also do not understand why a wiki would not be a good solution. Regarding Outlook, you have the possibility of doing simple votes (approve/reject, yes/no):
See: http://www.microsoft.com/atwork/worktogether/forms.mspx
If the barrier to actually use the tool should be minimal, then perhaps the best way to collect the feedback is to use an e-mail address. Everybody knows how to use the system, so there is practically no barrier. And the feedback that is provided has to be processed by developers / management anyway, in order to decide what concrete actions are going to be taken. The developers can then use whatever system suits them best in order to keep track of bugs, immediately required functionality, nice-to-have features that can be implemented later, etc.
Some "defect tracking tools" handle this.
Don't vote down because of "defect tracking". Some of these tools are enterprise-grade and handle incidents, requests, requirements, etc. And you can go to one place for bugs and enhancement requests.
Microsoft's Exchange server has support for Public Folders, email lists/groups. This may be an easy introduction to collaboration for your environment, using tools that are familiar. From the Microsoft Help on Public Folders:
Public folders are an easy and effective way to collect, organize, and share information with other people in your workgroup or organization. You can use public folders to share files or post information on an electronic bulletin board.
I'm not sure how effective the tools for managing those "lists" are - I'm not sure if you can mark responses such that all users see the mark, for example.
But it is probably a good start. As people start to see the value of collaboration, something along the lines of a Wiki becomes more appealing.
I've got to say that Confluence, especially now that editing with OpenOffice or Microsoft Office tools is possible, really deserves a look. Not free (as in beer), though.
I would think a locally hosted php-bb (or other...) forum would be a good choice, as you could moderate it and have a FAQ and history that people could check before duplicating suggestions. So, that's the advantage over a simple email address, and it has a simple, known interface.
What's too complex about Uservoice? The main UI is a single question ("I suggest you ..."). Your users can be anonymous, one field to enter, web based, free for small users. Seems to tick all the boxes except the visual editor. Even administering it is not terribly tricky. (I use it for my iPhone app.)
It looks like you're facing a very standard tradeoff - you want your feedback to be structured, but you don't want any impositions upon your users.
You can't have your cake and eat it too. Why is a wiki off the table? Wikis were designed to balance this kind of tradeoff.
You could use Google Documents to create a shared spreadsheet. Your users will need Google accounts, but they only need to log in once and a cookie will remember them for next time.
Hmm, I've found that we also have InfoPath as part of our toolset. I've never used it, but maybe it could do the job.
How about using for example Google groups? I've found a mailing list works quite well for this kind of purpose.
Edit: or how about http://getsatisfaction.com/