Which OCR did Google use to build Google Books?

In particular, did they release an API that developers can use?
I would think that an OCR engine trained on millions of books would be close to perfect by now, and I've also heard that Google greatly improved its algorithms with neural networks trained to recognize house numbers in Street View.
So I really wonder whether such an API is available.

For Google Books, Google customized the open-source OCRopus engine and combined it with a number of other techniques.
The most interesting is crowdsourcing via reCAPTCHA: each captcha is composed of two words, one generated to provide the actual security check, the other an unrecognized word from Google Books or a building number from Street View.


What is needed for a recommendation engine based on word/text input

I'm new to machine learning (AI). I'm developing a messenger app for Android/iOS, and I would like to recommend products from a relatively small product portfolio to users, based on their texts/words/conversations.
Example 1:
If a user of the messenger writes a sentence including the words "wine", "dinner", and "date", the AI should recommend a bottle of wine.
Example 2:
If a user writes that they drank a good coffee this morning, the AI should recommend a mug.
Example 3:
If a user writes about a cute boy she met the day before, the AI should recommend a teddy bear.
I have been a software developer for almost 20 years, with experience developing C/C++/Java-based applications (including Android and iOS apps) as well as some experience with Google Cloud Platform. ML/AI technology is completely new to me. I know the basics (input data is needed to train an ML/AI system, etc.), but I wonder if there is already a framework that could help me develop a system that solves the use cases described above.
I would appreciate it if you could give me some hints on where and how to start.
Thank you and regards
It is definitely possible to implement such an application; if you want to do it in Google Cloud, you will need some understanding of TensorFlow.
First of all, I recommend doing the Machine Learning Crash Course, which is a good introduction to machine learning and a way to start familiarizing yourself with TensorFlow. Afterwards, I recommend taking a look at the TensorFlow tutorials, which give a more practical introduction and include various examples of building/training/testing models.
Once you are familiar with TensorFlow, you can move on to learning how to run jobs in the Machine Learning Engine; you can start by following the quickstart. The documentation includes detailed guides on how to use the ML Engine, plus multiple samples and tutorials.
Since I believe your application falls into the recommender-system category, here you can see an example model in Google Cloud ML Engine that recommends items to users based on their previous searches. In your case, you would build a model that recommends items to users based on the words in their previous sentences.
The second option, if you don't want to go through the hassle of building a new model from scratch, is the Google Cloud Natural Language API, which you can think of as a set of pre-trained models built on Google's (incredibly large) data. In your case, I believe the content-classification feature would help you achieve what your application intends to do. However, the output categories (which you can see here) are limited to what the model was trained on and might not be specific enough for your application. Still, it is an easy solution, and you can also use this API to extract labels/information and feed them as input to another model, as in the sketch below.
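As a rough sketch of that second option (the category-to-product mapping below is entirely made up, you need Cloud credentials, and the calls follow the google-cloud-language v1 Python client, so double-check against the current docs):

```python
# Sketch: classify a chat message with the Cloud Natural Language API and
# map the returned categories to products. PRODUCT_MAP is a hypothetical
# placeholder; classify_text also needs a reasonably long input text.
from google.cloud import language_v1

# Hypothetical mapping from API category paths to catalog items.
PRODUCT_MAP = {
    "/Food & Drink": "bottle of wine",
    "/Shopping": "mug",
}

def recommend_products(message: str) -> list:
    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=message, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    response = client.classify_text(request={"document": document})
    # Recommend any product whose category prefix matches a returned label.
    return [
        product
        for category in response.categories
        for prefix, product in PRODUCT_MAP.items()
        if category.name.startswith(prefix)
    ]
```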
I hope that these links provide you with some foundations on what is possible to do with Tensorflow in the ML Engine, and are useful to you.

On-device single-word voice recognition

Does needing to recognize just a single word reduce the complexity of the task enough to perform voice recognition fully offline, on an iOS or Android smartphone? (E.g., could a reasonably accurate counter for the number of times a single, pre-programmed word was spoken while the microphone is active be built to work offline on a standard iOS or Android smartphone?)
I've found plenty of tools and examples that capture voice and send it to an online service (e.g., Google Cloud Speech-to-Text), but does the single-word focus reduce the complexity enough for the recognition to be doable offline today? If so, which libraries would you suggest, or where would you start?
Cloud services are good for various reasons relating to your question:
It makes deployment of new versions of the algorithm (which happen much more frequently than most people realize) a lot easier
It allows the developer to collect your data and use it in future algorithm development (or whatever they please)
From a practical standpoint, most deployed models (at least the effective ones) can be quite large and take up quite a bit of space on a mobile device.
In addition to the above, I don't think the single-word focus changes much, if anything. The model has to account not just for the word itself, but also for the different ways it can be said (volume, tone, accent, inflection, etc.).
So what you are asking can be done (see the sketch below), but there are also good reasons why this processing usually lives in the cloud.
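That said, offline keyword spotting is essentially how wake words ("OK Google", "Hey Siri") already work on phones. Here is a minimal sketch of the kind of small model involved, loosely following TensorFlow's speech-commands tutorial; the input shape and layer sizes are illustrative, not tuned:

```python
# Sketch of a tiny keyword-spotting classifier: a log-mel spectrogram of a
# 1-second clip goes in, "target word" vs "everything else" comes out.
import tensorflow as tf

def build_keyword_model(num_classes: int = 2) -> tf.keras.Model:
    inputs = tf.keras.Input(shape=(98, 40, 1))  # (frames, mel bins, 1)
    x = tf.keras.layers.Conv2D(16, 3, activation="relu")(inputs)
    x = tf.keras.layers.MaxPooling2D()(x)
    x = tf.keras.layers.Conv2D(32, 3, activation="relu")(x)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

model = build_keyword_model()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# After training, the model can be converted to TensorFlow Lite and run
# entirely on-device, which is the offline deployment asked about.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
```

A network like this has only a few thousand parameters, which is why single-word spotting, unlike general speech-to-text, fits comfortably on a phone.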

IPA (International Phonetic Alphabet) Transcription with Tensorflow

I'm looking into designing a software platform that will aid linguists and anthropologists in their study of previously unstudied languages. Statistics show that around 1,000 languages exist that have never been studied by anyone outside of their respective speaker communities.
My goal is to utilize TensorFlow to build a platform that allows linguists to study and document these languages more efficiently, and that helps them create writing systems for the languages that don't already have one. One of their current methods of accomplishing this is three-fold: 1) record a native speaker conversing in the language, 2) listen to that recording and try to transcribe it into the IPA, 3) from the phonetics, analyze the phonemics and phonotactics of the language to eventually create a writing system for its speakers.
My proposed platform would cut that research time down from a minimum of a year to a maximum of six months. Before I start, I have some questions...
What would be required to train TensorFlow to transcribe live audio into the IPA? Has this already been done? If so, how would I build on a previous solution for this project? Is a project like this even possible with TensorFlow? If not, what would you recommend using instead?
My apologies for the magnitude of this question. I don't have much experience in the realm of machine learning, as I am just beginning the research process for this project. Any help is appreciated!
I guess I will take a first shot at answering this. Since the question is pretty general, my answer will have to be pretty general as well.
What would be required? At the very least, you would need a large dataset of pre-transcribed data: ideally a large amount of spoken-language audio mapped to characters of the phonetic alphabet, so the system could learn the sound of individual characters rather than whole transcribed words. If such a dataset doesn't exist, a less granular one could be used, mapping single words to their transcriptions. Then you would need a model, that is, the actual neural network architecture implemented in code. Lastly, you would need computing resources; this is not something you can train casually, so you would either have to buy time on a cloud-based machine learning platform (like Google Cloud ML) or build a fairly expensive machine to train on at home.
Has this been done? I don't know; I don't think so. There have been published papers reporting various degrees of success at training systems to transcribe speech. Here is one, for example: http://deeplearning.stanford.edu/lexfree/lexfree.pdf. Since the alphabet you want to transcribe to is specifically designed to capture the way words sound, rather than just write the words down, you might have more success at training such a model.
Is it possible with TensorFlow? Yes, most likely. TensorFlow is well suited to implementing most modern deep learning architectures. Unless you end up designing some really unusual and original model for this purpose, TensorFlow should work just fine.
Edit: after some thought on the first part, you would probably have to use a dataset mapping spoken words to their transcriptions, since I expect the same sound pronounced in isolation to differ from that sound as used within a word.
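To make the "model" part above a bit more concrete, here is a very rough sketch (not a tested recipe) of the standard CTC-based approach to sequence transcription; the feature dimensionality and the IPA inventory size are placeholder values:

```python
# Sketch: a bidirectional-LSTM acoustic model trained with CTC loss to map
# variable-length audio features to a sequence of IPA symbols. Sizes are
# illustrative placeholders, not tuned values.
import tensorflow as tf

NUM_IPA_SYMBOLS = 100   # placeholder size of the IPA symbol inventory
NUM_FEATURES = 13       # e.g. MFCC coefficients per audio frame

inputs = tf.keras.Input(shape=(None, NUM_FEATURES))  # variable-length input
x = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(128, return_sequences=True)
)(inputs)
# One extra output unit for the CTC "blank" symbol.
logits = tf.keras.layers.Dense(NUM_IPA_SYMBOLS + 1)(x)
model = tf.keras.Model(inputs, logits)

# Training would minimize tf.nn.ctc_loss between these frame-level logits
# and the IPA transcription; decoding uses tf.nn.ctc_beam_search_decoder.
```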
This has actually been done, albeit in PyTorch, by a group at CMU: https://github.com/xinjli/allosaurus
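Judging by the allosaurus README (the API may have changed between versions, so check the repo), basic usage looks roughly like this:

```python
# Sketch of allosaurus usage per its README: load the pretrained universal
# phone recognizer and transcribe a WAV file into IPA phones.
from allosaurus.app import read_recognizer

model = read_recognizer()              # loads the default pretrained model
print(model.recognize("sample.wav"))   # prints a sequence of IPA phones
```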

Getting the number of results of a Google (or other) search programmatically

I am working on a little personal project.
Ideally, I would like to be able to make a Google search programmatically and get the count of results. (My goal is to compare the result counts of a large number (100,000+) of different phrases.)
Is there a free way to make a web search and compare the popularity of different texts, using Google, Bing, or whatever (the exact source is not really important)?
I tried Google, but it seems that I can only make 10 free requests per day.
Bing is more permissive (5,000 free requests per month).
Are there other tools or ways to freely get the number of results for a particular sentence?
Thanks in advance.
There are several things you're going to need if you're seeking to create a simple search engine.
First of all, you should read and understand where the field of information retrieval started, with G. Salton's paper, or at least read the wiki page on the vector space model. It will require learning at least some undergraduate linear algebra; I suggest Gilbert Strang's MIT video lectures for this.
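To give a feel for the vector space model, here is a toy illustration using scikit-learn (a convenience choice on my part, not something from the references above): documents become TF-IDF vectors, and relevance to a query is their cosine similarity.

```python
# Toy vector space model: rank documents by cosine similarity between
# their TF-IDF vectors and the query's vector.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the cat sat on the mat", "dogs and cats playing", "stock market report"]
query = ["cat on a mat"]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(docs)   # one row per document
query_vector = vectorizer.transform(query)

scores = cosine_similarity(query_vector, doc_vectors)[0]
print(sorted(zip(scores, docs), reverse=True))  # best match first
```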
You can then move on to the Brin/Page PageRank paper, which lays out the original concept behind the hyperlink matrix and the fast calculation of eigenvectors for ranking, or read the wiki page.
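The eigenvector idea is easy to demonstrate with power iteration on a made-up three-page link graph:

```python
# Toy PageRank by power iteration. M is column-stochastic: entry [i, j] is
# the probability of following a link from page j to page i.
import numpy as np

M = np.array([[0.0, 0.5, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0]])
damping = 0.85
n = M.shape[0]
rank = np.ones(n) / n   # start from a uniform distribution

for _ in range(100):
    rank = (1 - damping) / n + damping * M @ rank

print(rank)  # converges to the dominant eigenvector: the PageRank scores
```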
You may also be interested in looking at the code for Apache Lucene.
To get into contemporary search-ranking techniques, you need calculus and regression analysis in order to learn machine learning and deep learning, since current Google Search has moved away from pure PageRank and utilizes these instead. This is partly because link farming enabled people to artificially engineer search results, and partly because of the huge amount of metadata that modern browsers and web servers allow to be collected.
EDIT:
For the webcrawler-only portion, I'd recommend WebSPHINX. I used it in my senior research in college, in conjunction with Lucene.
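As for the result-count part of the original question, a minimal sketch against the Bing Web Search API v7 could look like this (the endpoint and the totalEstimatedMatches field come from Microsoft's documentation; note that the count is a rough estimate, and quotas may have changed):

```python
# Sketch: query the Bing Web Search API and read off Bing's estimated
# result count for an exact phrase. Requires an API subscription key.
import requests

def result_count(phrase: str, api_key: str) -> int:
    response = requests.get(
        "https://api.bing.microsoft.com/v7.0/search",
        headers={"Ocp-Apim-Subscription-Key": api_key},
        params={"q": f'"{phrase}"'},   # quote the phrase for an exact match
    )
    response.raise_for_status()
    return response.json()["webPages"]["totalEstimatedMatches"]

# Example (hypothetical key):
# print(result_count("machine learning", "YOUR_KEY"))
```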

Audio analysis for voice, gender diarization/recognition

Does anyone know of a library, program, project, etc. that tries to determine how many speakers were active in an audio file, label each speaker, identify each speaker's gender, and so on?
So far I found the following:
Identifying segments when a person is speaking?
Audio analysis to detect human voice, gender, age and emotion — any prior open-source work done?
The task of identifying how many people there are in an audio file and assigning segments to speakers is known as speaker diarization. Using this keyword for your search, you can find lots of research papers and some Python libraries. Most current research uses deep learning models, typically RNNs, to generate embeddings and then cluster them into chunks that ideally belong to different speakers. It is a difficult task, especially if your files are noisy. I didn't find any library/tool that was very accurate; even IBM's API is not that accurate.
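As a highly simplified illustration of that embed-then-cluster pipeline (crude MFCC means stand in for the learned speaker embeddings real systems use, the input file is hypothetical, and the segment length and clustering threshold are arbitrary):

```python
# Toy diarization sketch: split audio into fixed segments, embed each one,
# and cluster the embeddings so each cluster approximates one speaker.
import librosa
import numpy as np
from sklearn.cluster import AgglomerativeClustering

audio, sr = librosa.load("meeting.wav", sr=16000)
segment_len = sr * 2   # 2-second segments (arbitrary choice)

embeddings = []
for start in range(0, len(audio) - segment_len, segment_len):
    segment = audio[start:start + segment_len]
    mfcc = librosa.feature.mfcc(y=segment, sr=sr, n_mfcc=20)
    embeddings.append(mfcc.mean(axis=1))   # one crude vector per segment

# With a distance threshold, the number of clusters (speakers) does not
# have to be known in advance.
labels = AgglomerativeClustering(
    n_clusters=None, distance_threshold=50.0
).fit_predict(np.array(embeddings))
print(labels)   # cluster id per segment ≈ which speaker is talking
```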
We have developed some deep learning models of our own for this task, which are exposed through APIs. You can take a look at https://developers.deepaffects.com/ for more info. We also have gender- and emotion-recognition APIs.
Disclosure: I work at DeepAffects.