CAPTCHA alternative - translation test

In order to implement a CAPTCHA for my login page, I would like to understand how a translation test compares, security-wise, to the popular image-recognition patterns.
All customers will be bilingual speakers of an orally learnt and used Polynesian language, i.e., one with no formal spelling conventions (hence the translation into English and not the reverse). So instead of asking them to read distorted letters, I would like to ask them to translate a simple sentence into English, to be validated on the PHP server side.
Is this secure/accurate?
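A minimal sketch of the server-side check being proposed, assuming a hand-maintained whitelist of challenge sentences and accepted translations; every sentence, translation, and key below is a placeholder, not a real challenge:

```php
<?php
// Sketch of the proposed translation CAPTCHA (illustrative placeholders only).
// $whitelist maps each challenge sentence to its accepted English translations.
session_start();

$whitelist = [
    'challenge-sentence-1' => ['the canoe is red', 'the boat is red'],
    'challenge-sentence-2' => ['we are going fishing tomorrow'],
];

// Pick a random sentence and remember it server-side.
function issueChallenge(array $whitelist): string {
    $challenge = array_rand($whitelist);
    $_SESSION['captcha_challenge'] = $challenge;
    return $challenge;
}

// Validate the submitted translation against the whitelist.
function checkAnswer(array $whitelist, string $answer): bool {
    $challenge = $_SESSION['captcha_challenge'] ?? null;
    if ($challenge === null) {
        return false;
    }
    // Translation is imprecise, so normalise and accept any listed variant.
    $normalised = strtolower(trim(preg_replace('/\s+/', ' ', $answer)));
    return in_array($normalised, $whitelist[$challenge], true);
}
```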

The basic reason this kind of CAPTCHA ("Completely Automated Public Turing test to tell Computers and Humans Apart") is insecure is that, while the OP states that Google Translate does not "currently" support the Polynesian language in question, it cannot be excluded that it will in the future.
More generally, translation is not a valid CAPTCHA test because of the following considerations:
If you validate by comparing a random sentence against its automated translation from a public translator (e.g. a future version of Google or Bing), an attacker can obtain the expected answer simply by submitting the same phrase to the same translation engine.
If you instead use a whitelist of sentences and their translations, it will eventually be overtaken by the accuracy of the public automated translators.
By this I mean that modern public machine translators keep perfecting their accuracy. If you assume a public translator is unable to do an accurate job today and challenge the user with a known phrase it cannot process, technology will tend to fix that translation eventually, and robots will then spot the challenge sentence easily.
That is the main principle behind ReCaptcha being used as an OCR, just from the opposite side. I suggest you read this paper; briefly, the researchers state that ReCaptcha is destined to improve its accuracy far beyond automated OCRs because of user input.
Since Google and Bing Translate make wide use of user-submitted data to improve their translation process, they will be subject to a human-aided machine learning that eventually breaks the Turing test for this kind of challenge (e.g. ReCaptcha will read like a human, Translate will translate like a human).
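To make the first point concrete: if a public engine ever supports the language, the bot's job is a few lines. The endpoint below is entirely hypothetical; the point is only how short the round trip is.

```php
<?php
// Hypothetical bot: forward the challenge to a public translation API
// (the URL is made up) and submit whatever comes back as the answer.
function botSolve(string $challengeSentence): string {
    $url = 'https://translation-api.example.com/translate'
         . '?source=xx&target=en&q=' . urlencode($challengeSentence);
    return trim(file_get_contents($url));
}
```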

After reading the comments, it seems the only danger I face is a vague future Google Translate one, which is unlikely to eventuate. So I'm going to stick my head out and say that this is indeed a good security measure which could conceivably be useful to many businesses or organisations that have such a customer base. Thanks for the assist.
A major point in its favour is ease of use for the customers, all of whom so far prefer it to trying to read a CAPTCHA. I put it on a live system, so 80+ people used it today.

I presume they all speak English too then? It is unusual to require your users to be bilingual. Even if this is the case today, is it possible that with future growth you might be excluding certain users? What if someone moves into the area who wants to sign up but only speaks English?
Language is a funny imprecise thing. You could take a sentence and probably translate it a number of different ways. Computers deal in precision so you need a question where there can only be one answer.
Also, the whole idea of a CAPTCHA is to make sure it's a real person, but it may not be too hard to write a program that uses Google Translate or something similar. It may not always get it right, but it would probably get through some of the time.

Related

How to optimise Google Translate API calls to translate multiple words in a single request

Everyone, I recently integrated Google Translate into my project, where it plays the role of translating product names, product descriptions, and product-related category names. But because there are plenty of products in my database (and the number increases quickly), the Google Translate API would cost considerable money.
I want to call Google Translate as little as possible. Many words are the same among many products, for example: 阿迪达斯 - Adidas, 苹果 - iPhone, 篮球 - Basketball, and so on. I want to do some tricks to exploit this, but have no idea how.
Has anyone encountered such a question?
Any help would be appreciated.
It sounds like what you need is actually the ability to reuse translation at the string or substring level (in other words, per database entry). You can't really do that with Google, that I know of. You've got a few options, as I see it:
You could switch over to Microsoft Translator and use their methods that allow you to place translations yourself, such as their Collaborative Translation feature, which lets you override the MT with a preferred translation and even vote translations up/down. Quality here will be broadly comparable to Google (I often find it better), and you have methods at your disposal that allow this override. Also, unlike Google, the Microsoft API is free up to a certain volume. Take a look:
http://www.microsoft.com/en-us/translator/developers.aspx
Microsoft also has a unique feature called the Microsoft Translator Hub, which can use your terminology, for example, for translations. However, depending on how you implement any solution with Microsoft, you might still have the problem that you are making more calls out to Microsoft than you'd like; moreover, "matching" only takes place at the level of a whole record or string, so it would not cover the case of shared linguistic elements being concatenated into one string.
There's a commercial offering called GeoFluent (full disclosure: I am the product manager for this product, so I'm clearly biased :)) that works with Microsoft Translator but provides pre- and post-translation processing that can deal with sub-segment matching, and may therefore reduce the volume you put through translation each time. It could make sense if, as you mention, you are rapidly adding to your database. Of course, this is a commercial offering too, so you'd have to balance the costs.
Let me know if this helps, and happy to answer any other questions you have.
Marcus
There is a PHP sample here: http://weblite.ca/svn/dataface/modules/tm/trunk/lib/googleTranslatePlugin.php
It allows you to send an array and get an array back: getTranslations() translates all of the user-provided strings into the target language using the Google Translate API and returns an array of source=>target strings.
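For the original cost question, the practical trick is string-level caching, so each distinct source string is paid for exactly once. A rough sketch, where googleTranslate() stands in for whatever API wrapper is already in use and the table name is illustrative:

```php
<?php
// Translate a batch of strings, hitting the paid API only on cache misses.
// Repeated product terms (阿迪达斯, 篮球, ...) cost one call each, ever.
function translateWithCache(array $sourceStrings, PDO $db): array {
    $results = [];
    $lookup  = $db->prepare('SELECT target FROM translation_cache WHERE source = ?');
    $store   = $db->prepare('INSERT INTO translation_cache (source, target) VALUES (?, ?)');

    foreach (array_unique($sourceStrings) as $source) {
        $lookup->execute([$source]);
        $target = $lookup->fetchColumn();
        if ($target === false) {                 // cache miss: one paid API call
            $target = googleTranslate($source);  // hypothetical wrapper function
            $store->execute([$source, $target]);
        }
        $results[$source] = $target;
    }
    return $results; // array(source => target), like the linked plugin
}
```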

How usable and secure is Confident CAPTCHA? Are there other options?

I am trying to find an easier CAPTCHA to use with my website. I currently have reCAPTCHA but the users are struggling to get the words right the first time.
I have come across Confident CAPTCHA (here) and would like to know what you guys think about it.
Has anyone used it before?
How safe is it?
Are there similar CAPTCHAs, excluding reCAPTCHA?
Interesting captcha, I have not seen this one before.
I will try to address your second question, "How safe is it?". There are no docs or sample code available to check, so this analysis is based on using it a few times.
It seems like it should be reasonably secure. I see that it uses a 3rd party service, so you will rely on API calls to generate the HTML markup and validate the captcha.
In their demo, you are required to choose 4 images out of a total of 9, which means the probability of guessing the correct sequence is about 0.033% (1/9 × 1/8 × 1/7 × 1/6 = 1/3024).
It essentially works by creating an alpha captcha code based on the sequence of images you choose. So the server generates a random challenge (cat, vehicle, drink, house) and associates each element with a random letter from the range [A-Z].
Clicking the sequence of images creates a captcha code based on the letter assigned to each image (e.g. PKIR) if cat = P, vehicle = K, drink = I, house = R that gets placed in a hidden input and submitted with the form.
Therefore the only way to pass the captcha is to come up with a code that agrees with the sequence of images on the server side.
I would conclude it is relatively secure in that there is no way to defeat the captcha solely on the client side (see this question for example). Since there is no reason for them to ever present anything related to the solution to the client (browser), it seems logical that the only way to get the correct captcha code is to select the correct images in the correct sequence.
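To illustrate, here is a rough server-side reconstruction of that mechanism; this is my reading of the demo, not Confident CAPTCHA's actual code:

```php
<?php
// Reconstruction of the described flow: each challenge category gets a
// random letter; the widget writes the letters of the clicked images, in
// order, into a hidden input that is checked on submission.
session_start();

$categories = ['cat', 'vehicle', 'drink', 'house'];
$letters    = range('A', 'Z');
shuffle($letters);

$solution = '';
foreach ($categories as $i => $category) {
    $_SESSION['captcha_map'][$category] = $letters[$i];
    $solution .= $letters[$i];                 // e.g. "PKIR"
}
$_SESSION['captcha_solution'] = $solution;

// Later, when the form comes back:
$passed = hash_equals($_SESSION['captcha_solution'] ?? '',
                      (string)($_POST['captcha_code'] ?? ''));
```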
Conclusion:
At first glance, the captcha seems secure (no easy bypasses).
This specific captcha may be more difficult to farm out to human solvers (a positive)
Depending on the number of objects and images in the database, it may be possible to build a database mapping words to images.
One potential downfall to the captcha is that certain words may require a moderate level of understanding the English language; non-English speaking users may be completely cut off or at least have to put in additional effort to translate words to their native language.
You may want to do a usability check of this captcha on mobile devices (just a thought).
That's my 2 cents, I hope that helps you out.
I'm using it with ads, and it is very secure.
Regarding the English language concern: the API supports many languages and adapts the questions based on the browser language.
I have used Google Translate to help people whose spoken language is out of Confident CAPTCHA's reach.
No problems so far. They are very responsive, with very good support.
Regarding mobile: if you don't use ads, there is a special mobile mode which makes it very easy and well adapted to tiny devices.

CAPTCHAs + Different Possibilities

I wanted to run some CAPTCHA possibilities past people to see if they are easily bypassed by bots etc.
What if colours were used? E.g. there is a string of 10 characters and you ask people to type only the red characters, of which there are 5. Easy to bypass?
I've noticed a captcha on plentyoffish that involves typing in the characters under the circles. This seems a touch more complex - would this be more challenging for bots?
The other idea I was thinking of was putting the requirement itself in an image as well, as in no. 1 above: you could put "type the red characters" in an image, and this could change with different colours. Any value here?
Interested in what people think.
cheers
Colours are easy to bypass: a bot just takes the red channel and gets the answer. That is even easier than choosing between many possible solutions. The same applies to any noise in a colour different from the letters the user needs to find.
Symbols that don't touch the letters are very easy to ignore. Why would a bot even look at those circles, which probably always stay in the same position? (valid, but wasn't asked here)
Identifying circles or other symbols is easier than identifying letters; if one can do the latter, a simple symbol is no challenge.
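To show how cheap the red-channel attack actually is, here is a sketch using PHP's GD extension (the filename and thresholds are illustrative):

```php
<?php
// Isolate the red characters: keep strongly red pixels, blank the rest.
// The surviving glyphs can then be fed to ordinary OCR.
$img = imagecreatefrompng('captcha.png');  // illustrative input file
imagepalettetotruecolor($img);

$black = imagecolorallocate($img, 0, 0, 0);
$white = imagecolorallocate($img, 255, 255, 255);

for ($y = 0, $h = imagesy($img); $y < $h; $y++) {
    for ($x = 0, $w = imagesx($img); $x < $w; $x++) {
        $rgb = imagecolorat($img, $x, $y);
        $r = ($rgb >> 16) & 0xFF;
        $g = ($rgb >> 8) & 0xFF;
        $b = $rgb & 0xFF;
        $isRed = $r > 150 && $g < 100 && $b < 100;  // crude threshold
        imagesetpixel($img, $x, $y, $isRed ? $black : $white);
    }
}
imagepng($img, 'red-channel-only.png');
```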
I think captchas are used too frequently in places where they aren't the best tool. For instance, are you trying to prevent registration spam? Why use a captcha rather than email validation?
What are your intentions and have you considered alternatives to the (relatively ineffective) captcha technology?
As a side note, if you have to use them, I prefer KittenAuth myself :) http://thepcspy.com/kittenauth/#5
Color blind people will have trouble separating red from green letters. People who have trouble reading and understanding descriptions, or have other disabilities may have trouble reading the captchas too.
In some of these, the texts are so mangled that almost everyone has a hard time reading them.
I think CAPTCHAs, if used at all, should be quite easy to read. The one with the dots and triangles is doable, although it's a matter of time before someone writes an algorithm to hack it; it is very easy for computers to read this kind too.
The best way to deal with this, is increase moderation. Make your site so that it isn't rewarding to spam it at all. Don't make it the problem of your users.
Also, if you're gonna use CAPTCHAs, it may be better to build something yourself than to use common libraries. I've found that these are more easily hacked, probably because it is more rewarding to write a CAPTCHA solver for something that is used by thousands of sites.
No matter which CAPTCHA you construct, spammers will find a way to work around it, given enough incentive. Large CAPTCHA services like reCAPTCHA, for instance, get bypassed by outsourcing their solving to cheap labour in India (source).
If you run a small site, your best bet is to make your own mini-CAPTCHA, which asks a simple question. If it isn't a standard question, isn't a standard CAPTCHA module and isn't a large site, it isn't worth it for the spammers to automate bypassing it.
I've been working on a community site for an organization at my university, and we've had trouble with spammers registering, despite us using every CAPTCHA module in the book. As soon as we made our own simple one-question CAPTCHA, all spam stopped. The key to preventing this sort of spam often lies in uniqueness.
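The whole "own one-question CAPTCHA" idea fits in a few lines of PHP; the questions and answers below are made-up examples:

```php
<?php
// Site-specific mini-CAPTCHA: a question no generic spam bot ships a module
// for. Questions/answers here are placeholders for your own.
session_start();

$questions = [
    'What is the name of our university mascot?'        => ['rex'],
    'What colour is the banner at the top of the page?' => ['blue'],
];

// When rendering the form:
$q = array_rand($questions);
$_SESSION['mini_captcha'] = $q;
echo htmlspecialchars($q);

// When handling the submission:
$expected = $questions[$_SESSION['mini_captcha'] ?? ''] ?? [];
$given    = strtolower(trim($_POST['answer'] ?? ''));
$isHuman  = in_array($given, $expected, true);
```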

How does Google Know you are Cloaking?

I can't seem to find any information on how google determines if you are cloaking your content. How, from a technical standpoint, do you think they are determining this? Are they sending in things other than the googlebot and comparing it to the googlebot results? Do they have a team of human beings comparing? Or can they somehow tell that you have checked the user agent and executed a different code path because you saw "googlebot" in the name?
It's in relation to this question on legitimate url cloaking for seo. If textual content is exactly the same, but the rendering is different (1995-style html vs. ajax vs. flash), is there really a problem with cloaking?
Thanks for your input on this one.
As far as I know, how Google prepares search engine results is secret and constantly changing. Spoofing different user-agents is easy, so they might do that. They also might, in the case of Javascript, actually render partial or entire pages. "Do they have a team of human beings comparing?" This is doubtful. A lot has been written on Google's crawling strategies including this, but if humans are involved, they're only called in for specific cases. I even doubt this: any person-power spent is probably spent by tweaking the crawling engine.
Google looks at your site while presenting user agents other than Googlebot.
See the Google Chrome comic book, page 11, where it describes (in better than layman's terms) how a Google tool can take a schematic of a web page. They could be using this or similar technology for Google search indexing and cloak detection; at least that would be another good use for it.
Google does hire contractors (indirectly, through an outside agency, for very low pay) to manually review documents returned as search results and judge their relevance to the search terms, quality of translations, etc. I highly doubt that this is their only tool for detecting cloaking, but it is one of them.
In reality, many of Google's algos are trivially reversed and are far from rocket science. In the case of so-called "cloaking detection", all of the previous guesses are on the money (apart from, somewhat ironically, John K lol). If you don't believe me, set up some test sites (inputs) and some 'cloaking test cases' (further inputs), submit your sites to uncle Google (processing) and test your non-assumptions via pseudo-advanced human-based cognitive correlationary quantum perceptions (<-- btw, I made that up for entertainment value (and now I'm nesting parentheses to really mess with your mind :)) AKA "checking Google results to see if you are banned yet" (outputs). Loop until enlightenment == True (noob!) lol
A very simple test would be to compare the file size of a webpage the Googlebot saw against the file size of the page scanned by an alias user of Google that looks like a normal user.
This would detect most suspect candidates for closer examination.
They call your page using tools like curl, and they construct a hash based on the page fetched without the user agent, then they construct another hash with the Googlebot user agent. Both hashes must be similar; they have algorithms to check the hashes and know whether it's cloaking or not.
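A sketch of the comparison these last answers speculate about: fetch the same URL under two user agents and compare the results. This is guesswork about Google's internals, not documented behaviour:

```php
<?php
// Fetch a page twice, once as a browser and once as Googlebot, then compare.
function fetchAs(string $url, string $userAgent): string {
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_FOLLOWLOCATION => true,
    ]);
    $body = curl_exec($ch);
    curl_close($ch);
    return $body === false ? '' : $body;
}

$url      = 'http://example.com/';  // illustrative target
$asUser   = fetchAs($url, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)');
$asGoogle = fetchAs($url, 'Googlebot/2.1 (+http://www.google.com/bot.html)');

// Exact hashes only catch byte-identical pages, so also compare sizes
// (as suggested above) to tolerate ads, timestamps and session tokens.
$suspicious = md5($asUser) !== md5($asGoogle)
           && abs(strlen($asUser) - strlen($asGoogle)) > 1024;
```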

When is it time to switch from one programming language to another?

I have wondered why many big applications (e.g. social websites such as Facebook) are built with many languages in their platform.
They usually start with AJAX browser support, then move to PHP scripting, then towards a powerful OOP technology such as Java or .NET, and finally to a lower-level language such as C to increase performance in crucial operations.
My question is how I should determine the boundaries between the language layers: when PHP, when Java, when C, and so on. The other question is whether those languages should integrate in a vertical fashion for simplicity and maintenance, or whether there are cases where you would program one module of your app in Java and another in native C.
What are the context variables that push me to move to a higher-performance language? (e.g. concurrency issues due to an increase in users)
Don't tell me that PHP overlaps the .NET and Java technologies. At the starting point it does, but when the network is overloaded you start seeing the differences. I mean, how can I achieve multithreading in PHP with the same performance as in Java? What makes my question hard to answer is that there is not much reading about this: you may find some good books covering PHP, but few telling you how, when, and why to integrate different languages.
Each language was created for different purposes: Python is strong with string operations, Perl very powerful in batch scripting, PHP a very reliable web application server, C the mother of most popular languages.
Best,
Demian.
On one end of the scale, you move to a higher-performance language whenever your profiling and measurements tell you that you have a bottleneck that can't be fixed with better algorithms, data structures, or other optimisation (see the sketch below).
At the other end, you move to a higher-level language (i.e. more abstraction, better libraries) whenever your management allows you to do so. ;)
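As a concrete version of the "profiling and measurements" step from the first point: even a crude timing harness tells you whether the hot path justifies a language switch (the loop body is a stand-in for your suspected bottleneck):

```php
<?php
// Time the suspected hot path before blaming the language.
$start = microtime(true);
$payload = '';
for ($i = 0; $i < 100000; $i++) {
    $payload = md5($payload . $i);   // stand-in for the real hot spot
}
$elapsed = microtime(true) - $start;
printf("hot path: %.3f s for 100k iterations\n", $elapsed);
// Only if this stays a bottleneck after better algorithms and data
// structures does dropping to a lower-level language pay off.
```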
I believe most teams simply use what they are most familiar with.
There are also questions of licensing that can influence the decision.
That is, if you're talking about technologies that are comparable and solve the problem at the same level (for example ASP.NET/JSF/JSP/PHP...). But you can't compare .NET with C++, for example; they are meant to solve different problems at different abstraction levels.
My criterion for any programming language is "does it help me to get the job done or does it just get in the way?" If the latter, then it's time to move on.
From an economic point of view the answer is easy: on a regular basis, just look at what will be cheaper. Either continue with the current technology and maybe stretch the envelope a bit more, or switch to something new. When you compare the two alternatives, the cost of the investment already made is not important anymore, since you've already spent that money/effort. You only have to look ahead: cost of licenses, education, etc.
Of course this is easier said than done, but just sitting down with a few people, thinking about it, and maybe trying to come up with some numbers already helps a lot. I have seen too many projects that continued with technology that really wasn't suited for the job anymore.
Also, hard numbers don't tell the whole story. There will be resistance because of unfamiliar technology, experts who are losing their status, etc.
1. Identify the bottleneck
2. Solve the bottleneck
3. Go to 1
I'm sure you can imagine that step 2 is the one where decisions like "what programming language do we use" and "where do we put the coffee machine" come into play. That's the basic rule.