I've had excellent results with the Google speech recognition API on natural dialogue; however, for audio from YouTube videos or movies, recognition is poor or nonexistent.
Recording my own voice on an iPhone 4, it is recognized in both Spanish and English, but with the same phone at a movie it is almost impossible, even in a scene with a single character talking over little background noise. I have only succeeded once.
I tried to clean up the sound with SoX (Sound eXchange) using the noisered and compand effects, without any success; my attempts looked roughly like the sketch below.
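A minimal version of what I ran, driven from Python here just to keep the example self-contained (the file names and the noisered/compand parameter values are just what I experimented with, not recommendations):

```python
import subprocess

# Build a noise profile from a clip containing only background noise,
# then apply noise reduction plus a compander to the movie recording.
# File names and parameter values are placeholders from my experiments.
subprocess.run(["sox", "noise_only.wav", "-n", "noiseprof", "noise.prof"],
               check=True)
subprocess.run(["sox", "movie_clip.wav", "cleaned.wav",
                "noisered", "noise.prof", "0.21",
                "compand", "0.3,1", "6:-70,-60,-20", "-5", "-90", "0.2"],
               check=True)
```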
Any ideas? Or are these simply sounds that the Google API cannot identify, no matter how much you process them? Would I have better luck with other speech recognition software?
The Google voice recognizer (and most other recognizers) does not cope with reverberation. In most video scenes the distance between the speaker and the microphone is more than 1-3 meters. Try putting your phone on a table and recognizing something spoken from 3 meters away: you won't get anywhere, even though the sound quality will seem very good.
Are there any wearable sensors available that can detect human emotions?
Something like the one in this link: https://www.technologyreview.com/s/421316/sensor-detects-emotions-through-the-skin/ (though it doesn't capture many of the human emotions).
I am looking for a wearable sensor that can estimate a person's level of anger, disgust, fear, happiness, sadness, surprise, excitement, etc. at any particular instant. I am NOT looking for emotion detection from facial expressions or voice recognition.
Your help is much appreciated! Thanks.
Take a look at the Fraunhofer SHORE software. Fraunhofer has created really great things, like MP3. But I think this software would be really expensive.
http://www.iis.fraunhofer.de/en/ff/bsy/tech/bildanalyse/shore-gesichtsdetektion.html
Or here is an open source solution:
https://github.com/auduno/clmtrackr
How to recognize and categorize a photo as a selfie?
I was wondering if we could use the metadata, which contains the type of camera used to snap the picture. (In the case of a selfie it could be the front (secondary) camera of a phone, but not necessarily: the primary (back) camera could also be used, or even a non-phone camera.)
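For example, reading the EXIF with Pillow in Python shows what the metadata approach would have to work with (a sketch; "photo.jpg" is a placeholder, and the LensModel hint is phone-specific and not guaranteed to be present):

```python
from PIL import Image
from PIL.ExifTags import TAGS

# Flatten the photo's EXIF tags into a name -> value dict.
# _getexif() is a private Pillow method, but it is widely used.
exif = Image.open("photo.jpg")._getexif() or {}
named = {TAGS.get(tag, tag): value for tag, value in exif.items()}

# On some phones the lens description mentions the front camera for
# selfies, but many cameras omit or word these fields differently.
print(named.get("Make"), named.get("Model"), named.get("LensModel"))
```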
Do we use AI techniques to learn what a selfie looks like, or do we use something like the focal length to recognize a given picture as a selfie?
This is just an open-ended question. Any comments or thoughts are appreciated.
Metadata wouldn't be useful, as smartphones are becoming so advanced. I know Apple and Microsoft have both shot commercials with their phones, as well as using phone photos for giant billboards.
You could probably find, without a lot of difficulty, the kind of face detection software that cameras use to find faces (OpenCV is a place to look). From there you could measure the size of the face in relation to the photo; if it's large enough, it's probably a selfie.
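A minimal sketch of that idea using OpenCV's bundled Haar cascade (the 0.15 area threshold is a guess you'd have to tune on real data):

```python
import cv2

# Detect faces with OpenCV's bundled frontal-face Haar cascade, then
# flag the photo as a likely selfie if any face covers a large enough
# fraction of the frame.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def looks_like_selfie(path, min_face_ratio=0.15):
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    frame_area = img.shape[0] * img.shape[1]
    return any((w * h) / frame_area >= min_face_ratio
               for (x, y, w, h) in faces)
```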
With selfie sticks around, categorizing a photo as a selfie by the size of the face is probably not the most accurate solution. Of course, if you think that none of your photos will be taken with a selfie stick, then, as roro said, measuring the size of the face would be a workable solution.
What I would suggest is to keep a photo of the owner of the phone and use facial recognition to estimate whether a given photo contains the owner.
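A rough sketch of that, assuming the face_recognition Python package (the file names are placeholders, and it uses the library's default matching tolerance):

```python
import face_recognition

# Encode the owner's face once from a reference photo (assumed to
# contain exactly one face), then check whether any face in a new
# photo matches it.
owner_encoding = face_recognition.face_encodings(
    face_recognition.load_image_file("owner.jpg"))[0]

photo = face_recognition.load_image_file("photo.jpg")
contains_owner = any(
    face_recognition.compare_faces([owner_encoding], enc)[0]
    for enc in face_recognition.face_encodings(photo))
print(contains_owner)
```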
I just bought a Sony A7 and I am blown away with the incredible pictures it takes, but now I would like to interact with and automate this camera using the Sony Remote Camera API. I consider myself a maker and would like to do some fun stuff: add a laser trigger with an Arduino, do some computer-controlled light painting, and shoot some long-term (on the order of weeks) time-lapse photography. One reason I purchased this Sony camera over other models from famous brands such as Canon, Nikon, or Samsung is the ingenious Sony Remote Camera API. However, after reading through the API reference it seems that many of the features cannot be accessed. Is this true? Does anyone know a workaround?
Specifically, I am interested in changing a lot of the manual settings that you can change through the menu system on the camera such as ISO, shutter speed, and aperture. I am also interested in taking HDR images in a time-lapse manner and it would be nice to change this setting through the API as well. If anyone knows, why wasn't the API opened up to the whole menu system in the first place?
Finally, if any employee of Sony is reading this I would like to make this plea: PLEASE PLEASE PLEASE keep supporting the Remote Camera API and improve upon an already amazing idea! I think the more control you offer to makers and developers, the more popular your cameras will become. I think you could create a cult following if you manage to capture the imagination of makers across the world and get just one cool project to go viral on the internet. Using HTTP POST commands is super awesome, because it is OS agnostic and makes communication a breeze. Did I mention that it's awesome?! Sony's cameras will integrate nicely into the internet of things.
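For instance, taking a picture is just a JSON-RPC call over POST; a sketch (the endpoint below is typical of the QX-series cameras and may differ on other models):

```python
import requests

# Camera Remote API calls are JSON-RPC requests over HTTP POST.
ENDPOINT = "http://10.0.0.1:10000/sony/camera"  # adjust to your camera

def call(method, params=None):
    payload = {"method": method, "params": params or [],
               "id": 1, "version": "1.0"}
    return requests.post(ENDPOINT, json=payload).json()

print(call("getAvailableApiList"))  # list what this camera exposes
print(call("actTakePicture"))       # trigger the shutter
```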
I think the Remote Camera API strategy is better than the strategies of Sony's competitors. Nikon and Canon have nothing comparable. The closest thing is Samsung gluing Android onto the Galaxy NX, but that is a completely unnecessary cost since most people already own a smart phone; all that needs to exist is a link that allows the camera to talk to the phone, like the Sony API. Sony gets it. Please don't abandon this direction you are taking or the Remote Camera API, because I love where it is heading.
Thanks!
The API features for the Lens-Style Cameras DSC-QX100 and DSC-QX10 will be expanded during the spring of 2014. Shutter speed functionality, white balance, ISO settings, and more will be included! Check out the official announcement here: https://developer.sony.com/2014/02/24/new-cameras-now-support-camera-remote-api-beta-new-api-features-coming-this-spring-to-selected-cameras/
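Once those land, setting exposure values should follow the same JSON-RPC pattern already used for taking pictures. A sketch (the setShutterSpeed/setIsoSpeedRate method names and accepted values should be checked against the current API reference for your camera):

```python
import requests

ENDPOINT = "http://10.0.0.1:10000/sony/camera"  # adjust to your camera

def call(method, params):
    # One JSON-RPC request per setting change.
    return requests.post(ENDPOINT, json={
        "method": method, "params": params, "id": 1, "version": "1.0",
    }).json()

# Example values only; query the corresponding getAvailable* methods
# to see what your particular camera and lens accept.
call("setShutterSpeed", ["1/30"])
call("setIsoSpeedRate", ["800"])
```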
Thanks a lot for your valuable feedback. It's great to hear that the APIs are being used, and we are looking forward to nice implementations!
Peter
I am trying to build an app like Talking Tom Cat. I am able to record a voice and play it back in chipmunk style, but it records background noise as well; it doesn't capture only the human voice, it records everything. I want to record only the human voice. Could anyone help with this?
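Not iOS code, but the usual approach is voice activity detection: classify short audio frames as speech or non-speech and keep only the speech. A sketch with the webrtcvad Python package (the file name and sample rate are placeholders; on iOS you would apply the same idea with an audio-processing library):

```python
import wave
import webrtcvad

# Voice activity detection: keep only the 30 ms frames that the
# detector classifies as speech. Assumes 16-bit mono PCM at 16 kHz.
vad = webrtcvad.Vad(3)                      # 3 = most aggressive mode
SAMPLE_RATE = 16000
FRAME_BYTES = int(SAMPLE_RATE * 0.03) * 2   # 30 ms of 16-bit samples

with wave.open("recording.wav", "rb") as wav:  # placeholder file
    pcm = wav.readframes(wav.getnframes())

voiced = b"".join(
    pcm[i:i + FRAME_BYTES]
    for i in range(0, len(pcm) - len(pcm) % FRAME_BYTES, FRAME_BYTES)
    if vad.is_speech(pcm[i:i + FRAME_BYTES], SAMPLE_RATE))
```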
Is there any code out there that I can use in Cocoa to recognize text from photos? Let's say I snap a photo with my iPhone of a page of a book; I'd like to capture the text in it.
There is the Tesseract OCR toolkit, an open source OCR engine currently maintained by Google. "Olipion" created a cross-compilation tutorial to get it running on the iPhone. I would say that this is a good place to start.
However, there are reasons why you might not want to do OCR on the phone even if you could. Some of these include:
Even the new iPhone 4's processor is not that fast, and since your app can't really run in the background doing the processing, the user experience might not be optimal.
Running OCR on a mobile device would probably be a killer for battery life.
Every time you wanted to update the OCR engine, everybody who installed your app would have to upgrade.
For an always-connected mobile device, running the OCR on a server somewhere would probably be better. You could upgrade your OCR software easily, run much more powerful algorithms than a mobile device could handle, and so on.
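To get a feel for how little code the server side needs, here's a sketch using Tesseract's Python wrapper (pytesseract and Pillow are my choice here, not something the toolkit mandates; "page.jpg" stands in for the uploaded image):

```python
import pytesseract
from PIL import Image

# Hand an uploaded photo to the Tesseract engine, get plain text back.
text = pytesseract.image_to_string(Image.open("page.jpg"))
print(text)
```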
I am not so sure that you would be able to get good results from photos taken using a mobile camera -- accuracy of OCR systems goes way down with the kind of poorly lit, noisy, distorted images likely to be captured using a phone camera.
As far as commercial products go, there is Evernote, which gives you OCR capability if you buy their premium service.
As an alternative to machine OCR, there is always Mechanical Turk, where you could pay people a small amount to do the OCR for you. Humans would probably do better at transcription, given the image source.