Is Google reCAPTCHA v3 a perfect fit for bot detection? I doubt 100% accuracy is needed. What we want is statistical significance: confidence that an account generating posts, tweets, or other electronic messages is mostly automated rather than human.
I would run a reCAPTCHA check every few messages sent. Over a long enough period, one should be able to gather enough statistics to determine with confidence whether the account is used mostly by a human or mostly by a bot.
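A minimal sketch of the aggregation idea, assuming each reCAPTCHA v3 check has already been verified server-side and yields a score between 0.0 (likely a bot) and 1.0 (likely a human). The 0.5 threshold, the normal-approximation confidence interval, and the treatment of consecutive checks as independent samples are all assumptions for illustration; real scores from one account are probably correlated.

```python
import math
import statistics


def classify_account(scores, threshold=0.5, z=1.96):
    """Classify an account from its history of reCAPTCHA v3 scores.

    Returns "bot", "human", or "undecided" depending on whether an
    approximate 95% confidence interval for the mean score lies entirely
    below, entirely above, or straddles the threshold.
    """
    if len(scores) < 2:
        return "undecided"  # too few samples to estimate the spread
    mean = statistics.fmean(scores)
    # Standard error of the mean (normal approximation, assumes independence).
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    low, high = mean - z * sem, mean + z * sem
    if high < threshold:
        return "bot"
    if low > threshold:
        return "human"
    return "undecided"


# Hypothetical score histories, one check every few messages.
print(classify_account([0.1, 0.3, 0.1, 0.2, 0.1, 0.3]))  # bot
print(classify_account([0.9, 0.7, 0.9, 0.9, 0.7, 0.9]))  # human
print(classify_account([0.9, 0.1]))  # undecided
```

The "undecided" outcome is the useful part: instead of acting on a single score, the account keeps accumulating checks until the interval clears the threshold.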
Could anyone tell me which identifiers in the WebRTC Statistics API are directly related to the quality of experience (QoE) users have during a connection?
This depends on the type of session. A video call in which many participants collaborate has different needs than an audio call in which one person talks and the others mainly listen.
In general, the elements that impact perceived quality are packetsLost, jitter, currentRoundTripTime, framesDropped, and pliCount.
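As a sketch of how these fields might be turned into per-interval numbers, assume two consecutive `inbound-rtp` reports (as returned by `RTCPeerConnection.getStats()` in the browser) have been serialized to dictionaries and sent to a Python backend. Cumulative counters such as `packetsLost`, `packetsReceived`, `framesDropped`, and `pliCount` must be differenced between snapshots, while `jitter` is already a current value in seconds; `currentRoundTripTime` lives on the `candidate-pair` report and is left out here. The choice of fields and the loss formula are illustrative assumptions, not a standard QoE model.

```python
def interval_metrics(prev, curr):
    """Per-interval quality metrics from two inbound-rtp stats snapshots.

    prev, curr: dicts with the cumulative counters reported by getStats()
    for the same inbound-rtp stream, taken some seconds apart.
    """
    lost = curr["packetsLost"] - prev["packetsLost"]
    received = curr["packetsReceived"] - prev["packetsReceived"]
    expected = lost + received
    # packetsLost can go down (e.g. duplicates), so clamp the ratio at zero.
    loss_pct = max(0.0, 100.0 * lost / expected) if expected > 0 else 0.0
    return {
        "loss_pct": loss_pct,
        "jitter_ms": 1000.0 * curr["jitter"],  # jitter is reported in seconds
        # Video-only counters; default to 0 for audio streams.
        "frames_dropped": curr.get("framesDropped", 0) - prev.get("framesDropped", 0),
        "pli_count": curr.get("pliCount", 0) - prev.get("pliCount", 0),
    }


# Hypothetical snapshots of a video stream, taken a few seconds apart.
prev = {"packetsLost": 10, "packetsReceived": 1000, "jitter": 0.004,
        "framesDropped": 2, "pliCount": 1}
curr = {"packetsLost": 30, "packetsReceived": 1980, "jitter": 0.012,
        "framesDropped": 5, "pliCount": 3}
m = interval_metrics(prev, curr)
print(f"loss {m['loss_pct']:.1f}%, jitter {m['jitter_ms']:.1f} ms")  # loss 2.0%, jitter 12.0 ms
```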
You should also consider that the bandwidth estimators adapt the bandwidth (and so the quality) based on the feedback from the other party.
If you search for "Quality of experience estimators for WebRTC" you'll find studies that use the above statistics to estimate the QoE.
I'm new to machine learning (AI). I'm developing a messenger app for Android/iOS in which I would like to recommend to users a product from a relatively small portfolio, based on the text of their conversations.
Example 1:
If the user writes a sentence containing the words "wine", "dinner", and "date", the AI should recommend a bottle of wine to the user.
Example 2:
If the user writes that he had a good coffee this morning, the AI should recommend a mug to the user.
Example 3:
If the user writes something about a cute boy she met yesterday, the AI should recommend a teddy bear to the user.
I have been a software developer for almost 20 years, with experience developing C/C++/Java applications (including Android and iOS apps) and some experience with Google Cloud Platform. ML/AI is completely new to me. I know the basics (for example, that input data is needed to train an ML system), but I wonder if there is already a framework that could help me build a system for the use case described above.
I would appreciate it if you could give me some hints on where and how to start.
Thank you and regards
It is definitely possible to implement such an application. If you want to do it in Google Cloud, you will need some understanding of TensorFlow.
First of all, I recommend the Machine Learning Crash Course, which gives a good introduction to machine learning and helps you start getting familiar with TensorFlow. Afterwards, I recommend looking at the TensorFlow tutorials, which give a more practical introduction to TensorFlow and include various examples of building, training, and testing models.
Once you are familiar with TensorFlow, you can learn how to run jobs in the Machine Learning Engine, starting with the quickstart. The documentation includes detailed guides on how to use ML Engine, plus multiple samples and tutorials.
Since I believe your application falls into the recommender-system category, here you can see an example model in Google Cloud ML Engine that recommends items to users based on their previous searches. In your case, you would have to build a model that recommends items to users based on the previous words in their messages.
The second option, if you don't want the hassle of building a new model from scratch, is to use the Google Cloud Natural Language API, which you can think of as models pre-trained on Google's (incredibly large) data. In your case, I believe the content classification feature could help you achieve what your application intends to do. However, its outputs (which you can see here) are limited to the categories the model was trained on and might not be specific enough for your application. Still, it is an easy solution, and you can use this API to extract labels/information and feed them as input to another model.
I hope these links give you some foundations on what is possible with TensorFlow in ML Engine, and that they are useful to you.
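To make "a model that recommends items based on the words in a message" concrete, here is a minimal sketch using a multinomial naive Bayes text classifier written in plain Python rather than TensorFlow, so it runs without any dependencies. This is a swapped-in, deliberately simple technique, not the ML Engine example referred to above, and the tiny training set mirroring the question's examples is hypothetical; a real system would need many more labelled messages.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesRecommender:
    """Multinomial naive Bayes over bag-of-words, with Laplace smoothing."""

    def fit(self, messages, products):
        self.class_counts = Counter(products)
        self.word_counts = defaultdict(Counter)  # product -> word frequencies
        for text, product in zip(messages, products):
            self.word_counts[product].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def recommend(self, text):
        """Return the most likely product, or None if no word is known."""
        words = [w for w in tokenize(text) if w in self.vocab]
        if not words:
            return None
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for product, n in self.class_counts.items():
            counts = self.word_counts[product]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for w in words:
                score += math.log((counts[w] + 1) / denom)  # smoothed likelihood
            if score > best_score:
                best, best_score = product, score
        return best


# Hypothetical training data echoing the question's examples.
messages = [
    "wine for our dinner date tonight",
    "romantic dinner date with wine",
    "had a good coffee this morning",
    "need my morning coffee",
    "met a cute boy yesterday",
    "that boy was so cute",
]
products = ["wine", "wine", "mug", "mug", "teddy bear", "teddy bear"]

model = NaiveBayesRecommender().fit(messages, products)
print(model.recommend("planning a dinner date"))  # wine
print(model.recommend("I had a great coffee"))  # mug
print(model.recommend("she met a cute boy"))  # teddy bear
```

Returning None for messages with no known words avoids recommending a product on prior probability alone, which would otherwise happen for most ordinary chat messages.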
Does needing to recognize only a single word reduce the complexity of the task enough to perform voice recognition fully offline on an iOS or Android smartphone? (E.g., could a reasonably accurate counter of how many times a single, pre-programmed word is spoken while the microphone is active be developed to work offline on a standard iOS or Android smartphone?)
I've found plenty of tools and examples that capture voice and send it to an online service (e.g., Google Cloud Speech-to-Text), but does the single-word focus reduce the complexity enough for the recognition to be doable offline today? If so, are there any libraries you would suggest, or where would you start?
Cloud services are good for various reasons relating to your question:
It makes deployment of new versions of the algorithm (which happen much more frequently than most people realize) a lot easier
It allows the developer to collect your data and use it in future algorithm development (or whatever they please)
From a practical standpoint, most deployed models (at least the effective ones) can be quite large and take up quite a bit of space on a mobile device.
In addition to the above, I don't think the single-word focus changes much, if anything. The model has to account not just for the word itself but also for the different ways it can be said (volume, tone, accent, inflection, etc.).
So what you are asking can be done, but there are also good reasons why it is usually done in the cloud.
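Whichever way the recognition is done, the counting step on top of it is simple. Assuming an on-device keyword-spotting model (hypothetical here) emits, for each audio frame, a probability that the target word is being spoken, a counter can smooth those probabilities with a moving average and count threshold crossings, using a refractory period so one utterance spanning several frames is not counted twice. The window size, threshold, and refractory length are illustrative assumptions that would need tuning against a real model.

```python
from collections import deque


def count_keyword(frame_probs, threshold=0.8, window=3, refractory=10):
    """Count keyword utterances from per-frame detector probabilities.

    frame_probs: iterable of floats in [0, 1], one per audio frame, produced
    by some on-device keyword-spotting model (hypothetical).
    """
    recent = deque(maxlen=window)  # moving-average window over recent frames
    count = 0
    cooldown = 0  # frames left before another detection is allowed
    for p in frame_probs:
        recent.append(p)
        smoothed = sum(recent) / len(recent)
        if cooldown > 0:
            cooldown -= 1
        elif smoothed >= threshold:
            count += 1
            cooldown = refractory  # suppress re-triggering on the same word
    return count


# Hypothetical detector output containing two spoken instances of the word.
probs = [0.1] * 5 + [0.9] * 6 + [0.1] * 20 + [0.95] * 4 + [0.1] * 5
print(count_keyword(probs))  # 2
```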
I've just seen this genius script on GitHub:
https://github.com/jsavoie/proof-of-work-login
My question is: why is PoW login not a world standard right now, in 2018? It's absolutely genius!
Why are old-fashioned CAPTCHAs and reCAPTCHAs still so widespread?
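For readers unfamiliar with the idea, here is a minimal hashcash-style sketch of proof-of-work login, assuming the server issues a random challenge and the client must find a nonce such that SHA-256(challenge + nonce) starts with a given number of zero bits. This illustrates the general technique only; it is not the linked repository's code, and the 8-byte nonce encoding and difficulty value are assumptions.

```python
import hashlib
import os


def leading_zero_bits(digest: bytes) -> int:
    """Number of leading zero bits in a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: brute-force a nonce meeting the difficulty target."""
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
        nonce += 1


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: checking a solution costs a single hash."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty


challenge = os.urandom(16)  # issued by the server alongside the login form
nonce = solve(challenge, difficulty=16)  # about 65,536 hashes on average
print(verify(challenge, nonce, difficulty=16))  # True
```

Each extra bit of difficulty doubles the client's expected work while verification stays at one hash, which is the asymmetry the scheme relies on.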
The current going rate for reCAPTCHA solves is upwards of $3 per 1000. Meanwhile, the going rate for a t2.large spot instance is 2.78 cents per hour for two cores with burst capability. So, for a proof of work to impose the same cost on an attacker as reCAPTCHA, it would have to take well over a minute, maybe two, on a single 3 GHz core. That could mean as much as 5 minutes on an iPhone 6. Most users would prefer to click on street signs.
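The break-even arithmetic can be reproduced directly from the two quoted prices (the answer's 2018 figures, not current ones). Taking them at face value, and assuming the attacker gets full use of both vCPUs (a t2 instance's burst credits may not allow that), the break-even works out to roughly 13 core-minutes per solve, which suggests the one-to-two-minute estimate is conservative.

```python
def break_even_core_minutes(price_per_1000_solves, instance_price_per_hour, cores):
    """Core-minutes of proof-of-work that cost an attacker as much as one paid solve."""
    price_per_solve = price_per_1000_solves / 1000
    price_per_core_hour = instance_price_per_hour / cores
    return 60 * price_per_solve / price_per_core_hour


# Figures quoted above: $3 per 1000 solves, $0.0278 per hour for 2 vCPUs.
minutes = break_even_core_minutes(3.0, 0.0278, cores=2)
print(f"{minutes:.1f} core-minutes")  # 12.9 core-minutes
```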
Real-world data shows that the difficulty levels associated with Proof-of-Work would mean that significant numbers of legitimate users would be unable to continue their current levels of activity.
It seems to be a classic example of security over usability.
Source
The main purpose of reCAPTCHA is to stop automated bots.
Preventing spam is an obvious consequence of reCAPTCHA, since bots are usually used for spamming. Proof-of-work can reduce spam (by making the process slow), but it doesn't actually stop bots. It might be used as a separate mechanism, but it is by no means a replacement for reCAPTCHA, whose purpose is to detect humans.
I am training wit.ai's understanding: I use a Python script to make API calls and feed wit.ai sentences stored in a local file.
There is more data (thousands of sentences) than I can manually validate (hundreds).
Does wit.ai prioritize what it asks me to validate, e.g., sentences with a low confidence score or no match?
From my experience, I have not seen this kind of optimization in wit.ai. If that is true, I need to optimize my own training process and avoid flooding wit.ai with similar sentences that quickly reach a high confidence score.
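One way to do that optimization locally is a simple form of uncertainty sampling from active learning: validate the lowest-confidence sentences first and skip near-duplicates. The sketch assumes each sentence has already been sent to wit.ai and its top intent confidence extracted into a (sentence, confidence) pair, with 0.0 for "no match"; the token-set Jaccard similarity and the 0.7 cutoff are illustrative assumptions, not wit.ai features.

```python
import re


def tokens(text):
    """Lowercased set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Overlap between two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    return len(a & b) / len(a | b) if a | b else 1.0


def pick_for_validation(results, budget, max_similarity=0.7):
    """Choose which sentences to validate by hand.

    results: list of (sentence, confidence) pairs. Lowest-confidence
    sentences come first; near-duplicates of an already chosen sentence
    are skipped so the manual budget is not spent on similar data.
    """
    chosen, chosen_tokens = [], []
    for sentence, conf in sorted(results, key=lambda r: r[1]):
        toks = tokens(sentence)
        if any(jaccard(toks, seen) >= max_similarity for seen in chosen_tokens):
            continue  # too similar to something already queued
        chosen.append(sentence)
        chosen_tokens.append(toks)
        if len(chosen) == budget:
            break
    return chosen


# Hypothetical confidences collected from earlier API calls.
results = [
    ("book a table for two", 0.95),
    ("cancel my order", 0.20),
    ("cancel my order please", 0.25),
    ("what's the weather", 0.0),
    ("book a table for three", 0.93),
]
print(pick_for_validation(results, budget=3))
# ["what's the weather", 'cancel my order', 'book a table for three']
```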