How do robots see a CAPTCHA, or a group of alphanumeric characters?

Why do websites insist on making you enter CAPTCHAs?
I think that even robots are capable of reading them...
I want to know how this CAPTCHA thing works.

Ten years ago, robots were unable to identify the distorted text used in CAPTCHAs. Nowadays robots can identify distorted text, and even the house number plates that many websites use as CAPTCHAs.

I am not sure if this is the perfect answer, but you might want to take a look at this blog post, where the author says that so far no program or robot has been able to read CAPTCHAs or any other form of distorted text. While computer programs can read plain text or text in images, no system has so far been developed that can scan and read distorted text or CAPTCHAs.

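There is a thing called the Turing test, which differentiates a machine and a human. CAPTCHAs (the name stands for "Completely Automated Public Turing test to tell Computers and Humans Apart") are designed so that machines fail that test. But with AI, machines have nowadays started passing such tests.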

CAPTCHAs typically display alphanumeric characters in a distorted way, so that the human brain can process them in a way that automatic character recognition cannot. This trick is how the website administrator can tell humans and robots apart.
As machine learning algorithms get better, captchas are getting more complicated for actual humans to solve. This is an arms race between website administrators trying to keep their site safe from robots and hackers trying to create fake accounts in an automated way.
Google's reCAPTCHA asks you to read two fields. One is an actual CAPTCHA; the other is an image that Google's machine learning system has failed to read (such as a house number from Google Street View). By solving it, you are helping the machine to learn.
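To make the two-field scheme concrete, here is a minimal Python sketch of how such a check could work. It is an illustration under assumptions, not Google's actual implementation: the names check_submission and consensus, the in-memory votes store, and the three-vote threshold are all made up for the example. Only the known (control) word is graded; the answer for the unreadable image is just recorded as a vote.

```python
from collections import Counter, defaultdict

# Hypothetical in-memory store: unknown image id -> tally of human transcriptions.
votes = defaultdict(Counter)


def normalize(text):
    """Compare answers case-insensitively and ignore surrounding whitespace."""
    return text.strip().lower()


def check_submission(control_answer, typed_control, unknown_id, typed_unknown):
    """Return True if the user passes; only the control word is graded."""
    if normalize(typed_control) != normalize(control_answer):
        return False
    # The unknown word cannot be graded, so it is only recorded as a vote.
    votes[unknown_id][normalize(typed_unknown)] += 1
    return True


def consensus(unknown_id, min_votes=3):
    """Return the agreed transcription once enough humans agree, else None."""
    if not votes[unknown_id]:
        return None
    answer, count = votes[unknown_id].most_common(1)[0]
    return answer if count >= min_votes else None
```

A side effect of a design like this is that a wrong answer for the unknown image alone cannot be detected at submission time; it only loses out later when other users disagree with it.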

Related

On-device single-word voice recognition

Does needing just a single word voice recognition reduce the complexity of the task enough to be able to fully perform voice recognition processing offline, on an iOS or Android smartphone? (E.g., could a reasonably accurate counter for the number of times that a single, pre-programmed word was spoken while the microphone is active be developed to work offline on a standard iOS or Android smartphone?).
I've found plenty of tools and examples capturing voice and sending it to an online service (e.g., the Google cloud voice-to-text), but does the single-word focus reduce the complexity enough for the recognition to be doable offline today? If so, do you have any libraries to suggest or where would you start?
Cloud services are good for various reasons relating to your question:
It makes deployment of new versions of the algorithm (which happen much more frequently than most people realize) a lot easier
It allows the developer to collect your data and use it in future algorithm development (or whatever they please)
From a practical standpoint, most deployed models (at least the effective ones) can be quite large and take up quite a bit of space on a mobile device.
In addition to the above, I don't think that the singular word focus changes much, if anything. The model has to not just account for words, but also for the different ways those words can be said (volume, tone, accents, inflection, etc, etc).
So what you are asking can be done, but there are also good reasons why it's in the cloud.

Getting the number of results of a Google (or other) search programmatically

I am making a little personal project.
Ideally I would like to run a Google search programmatically and get the count of results. (My goal is to compare the result counts for a lot (100000+) of different phrases.)
Is there a free way to run a web search and compare the popularity of different texts, using Google, Bing, or whatever (the source is not really important)?
I tried Google, but it seems that I can only do 10 requests per day for free.
Bing is more permissive (5000 free requests per month).
Are there other tools or ways to get the number of results for a particular sentence for free?
Thanks in advance.
There are several things you're going to need if you're seeking to create a simple search engine.
First of all, you should read and understand where the field of information retrieval started, with G. Salton's paper, or at least read the wiki page on the vector space model. It will require you to learn at least some undergraduate linear algebra. I suggest Gilbert Strang's MIT video lectures for this.
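As a taste of what the vector space model boils down to, here is a small Python sketch: each text becomes a vector of term counts, and relevance is the cosine of the angle between the query vector and each document vector. This is a deliberately simplified illustration (raw counts, whitespace tokenization, no IDF weighting or stemming), and the function names are my own, not taken from any paper or library.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words vector: term -> raw count (no stemming, no IDF weighting)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return (score, document) pairs sorted by similarity, best first."""
    q = term_vector(query)
    scored = [(cosine_similarity(q, term_vector(d)), d) for d in documents]
    return sorted(scored, reverse=True)
```

Calling rank("captcha text", docs) on a handful of strings returns them ordered by how much vocabulary they share with the query, normalized by length.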
You can then move on to the Brin/Page PageRank paper, which lays out the original concept behind the hyperlink matrix and quickly calculating eigenvectors for ranking, or read the wiki page.
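The eigenvector calculation mentioned there is usually approximated by power iteration. The following Python sketch shows the idea on a tiny link graph. It is a textbook-style illustration, not the code from the paper; the damping factor of 0.85 is the commonly quoted value, while the iteration count and the handling of pages without outgoing links are assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration. links maps each page to the list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank
```

For example, pagerank({"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}) gives "d" the lowest score, because nothing links to it.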
You may also be interested in looking at the code for Apache Lucene.
To get into contemporary search algorithm techniques, you need calculus and regression analysis to learn machine learning and deep learning, as the current Google search has moved away from PageRank and utilizes these. This is partially due to how link farming enabled people to artificially engineer search results, and to the huge amount of metadata that modern browsers and web servers allow to be collected.
EDIT:
For the web-crawler portion alone, I'd recommend WebSPHINX. I used this in my senior research in college in conjunction with Lucene.

What methods can recognize handwriting at the sentence level?

I mean recognition per sentence, not per letter, such as a doctor's prescription handwriting, which is hard to read, not just normal handwriting.
For example:
I use data mining or machine learning to train on handwritten paper.
The user scans a paper with hard-to-read writing.
The application does the image processing.
And the output is the sentences from the paper.
And what device should I use? (Scanner or webcam)
I am a newbie. If possible, I need some examples in VB.NET with EmguCV/OpenCV, and research journals.
Any help would be appreciated.
Welcome to Stack Overflow! The answer to your question is twofold:
a. If you want to recognize handwriting that has already happened i.e. it is presented to you as an image you are in trouble. Computer Vision is still not good enough to provide you with reasonable accuracy.
b. If you have a chance to recognize handwriting “as it's happening” - you are in luck. Download, for example, a Gesture Search app from Android play store and you are in business.
The difference between the two scenarios is subtle but significant. In the second case you have an extra piece of information that makes handwriting recognition possible. This piece is timing of each stroke. In other words, instead of an image with handwriting you have a bunch of strokes that are all labeled with their time stamps. You can think about it as a sequence of lines and curves or as image segmentation - in any way this provides a big hint for the system. Additional help comes from the dictionary on your phone but this is typically used by any handwriting system.
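To illustrate why the timing information helps, here is a small Python sketch that turns a stream of timestamped pen samples into separate strokes. It is only an assumption-laden illustration: the (x, y, t) tuple format and the 0.15 second gap threshold are invented for the example, and real touch APIs normally report pen-down and pen-up events explicitly instead of leaving you to guess from gaps.

```python
def split_into_strokes(samples, max_gap=0.15):
    """Group (x, y, t) pen samples into strokes.

    A new stroke starts whenever the time since the previous sample
    exceeds max_gap seconds (an assumed pen-up threshold).
    """
    strokes = []
    previous_t = None
    for x, y, t in samples:
        if previous_t is None or t - previous_t > max_gap:
            strokes.append([])
        strokes[-1].append((x, y))
        previous_t = t
    return strokes


# A "t"-like shape: one vertical stroke, a pause, then one horizontal stroke.
samples = [(0, 0, 0.00), (0, 5, 0.02), (0, 10, 0.04),
           (-3, 5, 0.50), (3, 5, 0.52)]
strokes = split_into_strokes(samples)
```

With an image you would have to work out that segmentation from pixels alone; with timestamps it falls out almost for free.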
Android of course has an open source library for stroke recognition (find more on your own). If you still want to go for recognizing images though, you have to first detect text (e.g. as a bounding box) and second use any of the existing engines to process detected regions. For text detection I can recommend MSER. But be careful trying to implement even text detection on your own - you are entering a world of pain here ;). Here is an article that can help.
As for learning how to recognize text from images on the Internet - this can be your plan B or C or Z when you master above mentioned stages. Don’t try to abuse learning methods and make them do hard work for you - you will hit a wall if you don’t understand what’s going on under the hood.

API to break voice into phonemes / synthesize new speech given speech samples?

You know those movies where the tech geeks record someone's voice, and their software breaks it into phonemes? Which they can then use to type in any phrase, and make it seem as if the target is saying it?
Does that software exist in an API version? I don't even know what to Google.
There is no such software. Breaking arbitrary speech into its constituent phonemes is only a partially solved problem: speech-to-text software is still imperfect, as is text-to-speech.
The idea is to reproduce the timbre of the target's voice. Even if you were able to segment the audio perfectly, reordering the phonemes would produce audio with unnatural cadence and intonation, not to mention splicing artifacts. At that point you're getting into smoothing, time-scaling, and pitch correction, all of which are possible and well-understood in theory, but operate poorly on real-world data, especially when the audio sample in question is as short as a single phoneme, and further when the timbre needs to be preserved.
These problems are compounded on the phonetic side by allophonic variation in sounds based on accent and surrounding phonemes; in order to faithfully produce even a low-quality approximation of the audio, you'd need a detailed understanding of the target's language, accent, and speech patterns.
Furthermore, your ultimate problem is one of social engineering, and people are not easy to fool when it comes to the voices of people they know. Even with a large corpus of input data, at best you could get a short low-quality sample, hardly enough for a conversation.
So while it's certainly possible, it's difficult; even if it existed, it wouldn't always be good enough.
SRI International (the company that created Siri for iOS) has an SDK called EduSpeak, which will take audio input and break it down into individual phonemes. I know this because I sat through a demo of the product about a week ago. During the demo, the presenter showed us an application that was created using the SDK. The application gave a few lines of text for the presenter to read. After reading the text, the application displayed a bar chart where each bar represented a phoneme from his speech. The height of each bar represented a score of how well each phoneme was pronounced (the presenter was not a native English speaker, so he received lower scores on certain phonemes compared to others). The presenter could also click on each individual bar to have only that individual phoneme played back using the original audio.
So yes, software exists that divides audio up by phoneme, and it does a very good job of it. Now, whether or not those phonemes can be re-assembled into speech is an open question. If we end up getting a trial version of the SDK, I'll try it out and let you know.
If your aim is to mimic someone else's voice, then another approach is to convert your own voice (instead of assembling phonemes). It is (unsurprisingly) called voice conversion, e.g. http://www.busim.ee.boun.edu.tr/~speech/projects/Voice_Conversion.htm
The technology is called "voice synthesis" and "voice recognition".
The Java API for this can be found here: Java voice JSAPI.
Apple has an API for this: Apple speech.
Microsoft has several; one is discussed here: Vista speech.
Lyrebird is a start-up that is working on this very problem. Given samples of a person's voice and some written text, it can synthesize a spoken version of that written text in the voice of the person in the samples.
You can get interesting voice warping effects with a formant-aware pitch shift. Adobe Audition has a pretty good implementation. Antares produces some interesting vocal effects VST plugins.
These techniques use some form of linear predictive coding (LPC) to treat the voice as a source-filter model. LPC works on speech signals by estimating the resonances of the vocal tract (the formants), reversing their effect with an inverse filter, and then coding the resulting residual signal. The residual signal is ideally an impulse train that represents the glottal pulses. This allows pitch and formants to be scaled independently, which leads to a much better gender conversion result than simple pitch shifting.
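For the curious, the estimate-then-inverse-filter step can be sketched in a few lines of plain Python. This is a bare-bones illustration of the autocorrelation method with the Levinson-Durbin recursion, not what Audition or any plugin actually ships; it skips windowing, pre-emphasis and framing, and the function names are my own.

```python
def autocorrelation(signal, max_lag):
    """r[lag] = sum over n of x[n] * x[n + lag], for lag = 0..max_lag."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc_coefficients(signal, order):
    """Levinson-Durbin recursion.

    Returns a[0..order] with a[0] == 1, so that the prediction error is
    e[n] = sum over k of a[k] * x[n - k].
    """
    r = autocorrelation(signal, order)
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        if error == 0:
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        error *= (1.0 - k * k)
    return a


def residual(signal, a):
    """Inverse filter: run the signal through A(z) to estimate the excitation."""
    return [sum(a[k] * signal[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(signal))]


# Toy "vocal tract": a single impulse fed through a one-pole resonator.
voiced = [0.9 ** n for n in range(200)]
coeffs = lpc_coefficients(voiced, order=1)
excitation = residual(voiced, coeffs)
```

On this toy signal the inverse filter strips the decay away and leaves (almost) only the original impulse, which is the separation of source and filter described above.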
I dunno about a commercially available solution, but the concept isn't entirely out of the range of possibility. For example, the University of Delaware has fairly decent software for doing just that.
http://www.modeltalker.com

Harder, Better, Faster, Stronger... Techniques for an image-based CAPTCHA?

There are lots of non-image-based CAPTCHA ideas floating around. But what about the old-fashioned way?
What are the elements of a good image CAPTCHA? What visual elements are hard for computers, but easier for humans? What about mistakes, elements that are easier for computers than they are for humans? What are good techniques for increasing the speed of a CAPTCHA generator?
Here's an example of a CAPTCHA I've been working on. It generates the functions for two sine waves, then stretches a text between them. It lays that over a background drawn from a pool of images.
How could this be improved? (Specifically, I'm using PHP GD.) Things that come to mind are:
Change the color of the text, possibly making it multicolored.
Add "scratches" or marks that mildly obscure the text.
Add to the distortion so that it's affected by sine waves horizontally as well.
What goes into a superb image CAPTCHA?
Edit:
I know that there are some very worthy third-party CAPTCHA resources. I'm looking for attributes that make them good. I'd like to use my own CAPTCHAs, just for the purpose of self-improvement. So, you can talk about reCAPTCHA, but it's not exactly what I'm looking for.
Also, it has been brought up that not only the image, but also the experience matters, so feel free to comment on that.
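For readers who want something to experiment with, the "sine waves horizontally as well" idea from the list above can be sketched as a per-pixel displacement along both axes. This is a simplified Python illustration on a plain grid of pixel values, not the PHP GD code from the question, and it displaces pixels with a single sine per axis instead of stretching text between two curves; all parameter names and defaults are assumptions.

```python
import math


def wave_distort(img, amp_y=2.0, period_x=16.0, amp_x=2.0, period_y=16.0, fill=0):
    """Displace pixels vertically by a sine of x and horizontally by a sine of y.

    img is a list of rows of pixel values; pixels pulled in from outside
    the image are set to fill.
    """
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            src_y = y + round(amp_y * math.sin(2 * math.pi * x / period_x))
            src_x = x + round(amp_x * math.sin(2 * math.pi * y / period_y))
            if 0 <= src_y < h and 0 <= src_x < w:
                out[y][x] = img[src_y][src_x]
    return out
```

Setting one amplitude to zero gives the single-axis distortion; randomizing the amplitudes and periods per image keeps the warp from being the same every time.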
Make each letter/number out of a pattern, i.e. unconnected dots, meaning the computer has no way of knowing that a dot is part of a letter other than pattern recognition (which they don't have yet). Then add the usual distortions and random lines.
How you do this is the challenge.
EDIT: Also, bonus points for patterns of different shapes, and try alpha transparency on the characters (on the edges or the whole character), so they merge with the background.
Make letters difficult to separate. Use handwriting-like font or add lines that join letters. Decrease and randomize spacing between letters.
Add wave distortion in other axis too. Distortion in one axis only can be relatively easily analyzed and reversed.
Don't bother with color background at all. It's super-easy to automatically filter black from other colors. Your background hinders only humans.
Don't add scratches or other noise unless it has the same thickness as letters. Noise-removal algorithms can easily remove things that are thinner than letters.
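To see why thin scratches are so easy to strip, here is a Python sketch of a morphological opening (erosion followed by dilation) on a tiny binary image. It is an illustration of one standard noise-removal technique, not necessarily what any particular CAPTCHA breaker uses; the 3x3 neighbourhood and the helper names are choices made for this example.

```python
NEIGHBOURS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def parse(rows):
    """Turn rows of '0'/'1' characters into a grid of ints."""
    return [[int(c) for c in row] for row in rows]


def erode(img):
    """Keep a pixel only if its whole 3x3 neighbourhood is ink."""
    h, w = len(img), len(img[0])
    return [[1 if all(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                      for dy, dx in NEIGHBOURS) else 0
             for x in range(w)] for y in range(h)]


def dilate(img):
    """Set a pixel if any pixel in its 3x3 neighbourhood is ink."""
    h, w = len(img), len(img[0])
    return [[1 if any(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                      for dy, dx in NEIGHBOURS) else 0
             for x in range(w)] for y in range(h)]


def opening(img):
    """Erosion then dilation: removes ink thinner than 3 pixels."""
    return dilate(erode(img))


# A 3-pixel-wide letter stem crossed by a 1-pixel-thick scratch.
noisy = parse([
    "000000000",
    "011100000",
    "011100000",
    "111111111",
    "011100000",
    "011100000",
    "000000000",
])
cleaned = opening(noisy)
```

The scratch vanishes while the stem survives intact. A scratch as thick as the stem would survive the same operation, which is the point of the advice above.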
What if the color of the letters faded into other colors... for instance the 5 can start off as yellow on top and fade into blue or something. The colors chosen should be random.
With the multicolored background it might be hard for the computer to pick up where the background ends and the character begins, and hopefully it would not be too difficult for the human to actually pick up the pattern.
Instead of generating CAPTCHAs, you can create a CAPTCHA table in your database and fill it yourself by searching Google for good CAPTCHA images.
So no need to worry "Will this generation method work?"
I really hate CAPTCHA on sites, they just annoy me, but if you want to try and make a robust one try the following:
Ability to get a new image without submitting
Spoken version for the visually impaired
Non-uniform characters
I've used reCAPTCHA on a few sites; it's a nice and robust solution.
Or if you want to be really funky about it check out this: http://research.microsoft.com/asirra/
Algorithms that try to break CAPTCHAs are pattern matchers that work in a few different ways: scaling and skewing the symbols that they already know about, finding and tracing edges, and counting interior holes to help. If you can break the letter up into pieces, vary the letter quality, or add strong lines or “scratches” along the letters, that will help against these techniques. However, all of this is fairly moot considering we have reCAPTCHA for this purpose, and it's a wonderful third-party app for this. Additionally, a CAPTCHA will help the security of your site, but will not stop those who are truly determined.
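The hole-counting feature is simple enough to show in code. The following Python sketch counts the enclosed background regions of a glyph with a flood fill: an "O" has one, a "B" or "8" has two, an "L" has none. It is only an illustration of that one feature under my own assumptions (a character grid with '#' for ink, 4-connected background), not a full recognizer.

```python
from collections import deque


def count_holes(glyph):
    """glyph: list of equal-length strings, '#' = ink, '.' = background.

    A hole is a connected background region that never touches the border.
    """
    h, w = len(glyph), len(glyph[0])
    seen = [[False] * w for _ in range(h)]

    def flood(start_y, start_x):
        """Mark one background region; return True if it touches the border."""
        touches_border = False
        queue = deque([(start_y, start_x)])
        seen[start_y][start_x] = True
        while queue:
            y, x = queue.popleft()
            if y in (0, h - 1) or x in (0, w - 1):
                touches_border = True
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < h and 0 <= nx < w
                        and not seen[ny][nx] and glyph[ny][nx] == "."):
                    seen[ny][nx] = True
                    queue.append((ny, nx))
        return touches_border

    holes = 0
    for y in range(h):
        for x in range(w):
            if glyph[y][x] == "." and not seen[y][x]:
                if not flood(y, x):
                    holes += 1
    return holes
```

Breaking a letter into pieces or cutting a strong line through it opens the enclosed regions up to the outside, which is why those tricks throw this kind of feature off.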
I like the idea of KittenAuth and Microsoft's Asirra project. The idea is that, while OCR will eventually evolve to break your traditional captcha, the ability to distinguish a kitten from a dog is many orders of magnitude more complex a problem, while absolutely trivial for humans.
This solution, while probably the sexiest captcha idea ever, has the limitation of not being easily portable to hearing-impaired methods.
What about shearing and shuffling bands to mangle display and mouse-only input?
Start by taking your sine-wave morphed text and dividing it into horizontal bands, or maybe even a grid.
That makes optical recognition harder and might allow you to avoid the kind of nasty background games that make some captchas hard for humans.
For a site where you can rely on local drag in the browser, instead of typing in an entry use shuffling requiring the user to re-order pieces (just in sloppy order, not like one of those puzzles). Or, if you wanted to use clicks alone, the classic sliding tile puzzle.
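A rough Python sketch of the band idea, for anyone who wants to play with it: cut the image rows into equal horizontal bands, shuffle them, and keep the permutation on the server so the user's re-ordering can be checked. Everything here (function names, the requirement that the height divides evenly, passing in a random generator) is an assumption for illustration, and it only covers the shuffling, not the drag-and-drop front end.

```python
import random


def shuffle_bands(rows, band_height, rng=None):
    """Split rows into horizontal bands of band_height rows and shuffle them.

    Returns (shuffled_rows, order), where order[i] is the original index
    of the band now shown at position i.
    """
    if len(rows) % band_height != 0:
        raise ValueError("image height must be a multiple of band_height")
    rng = rng or random.Random()
    bands = [rows[i:i + band_height] for i in range(0, len(rows), band_height)]
    order = list(range(len(bands)))
    rng.shuffle(order)
    shuffled = [row for index in order for row in bands[index]]
    return shuffled, order


def restore(shuffled, band_height, order):
    """Undo shuffle_bands given the secret order."""
    bands = [shuffled[i:i + band_height]
             for i in range(0, len(shuffled), band_height)]
    original = [None] * len(bands)
    for position, index in enumerate(order):
        original[index] = bands[position]
    return [row for band in original for row in band]
```

The server would keep order secret and accept the submission when the arrangement sent back by the user matches it.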
Note, I've run into a captcha where you had to identify which of N cartoons had an animal in them which succeeded in blocking me!
Wellington Grey sums up the AI CAPTCHA race nicely.
You could add a random array of fonts so that GD renders each character using a different one.
Be wary of suggestions of reCAPTCHA. I have submitted incorrect input into it a few dozen times, and have had success each time. Several of those times I have submitted incorrect input for both words rather than just the most obscured word; the success rate, as I said, has been 100%.
I also think that image-based CAPTCHAs are user-hostile and should be avoided wherever possible. The advantage of text-based solutions is that you can tailor them to your site's audience, adding a level of obscurity that may trip up machines as they become more savvy with text-based solutions.
At the very least, don't use this all the time:
(source: codinghorror.com)