Speech Recognition API match grammar when there is no sound (Microsoft) - kinect

I have build a Speech Recognition tool using Microsoft SAPI and a Kinect.
Following code sample I load XML grammar and start a SpeechRecognitionEngine.
Sometimes when there is few or no sound the SpeechRecognitionEngine have a match with a very high confidence (0.85) on a simple sentence: "Sarah what time is it"
Why Engine trigger this strong match in silence ?!
Any Workaroud ?
Here is my main class on GitHub
I also write (in french) a blog post with dump (wav + xml)

I am not sure exactly which wave file you are talking about (I haven't spoken French since middle school). But I think this wave from your group qualifies: dump_2012_12.16_12.47.33.wav. It has a high confidence value .857 and does not appear to have any speech in the audio file. Looking at a spectrogram (see below) you can see the audio file does contain energy in the speech range.
Most speech recognition engines these days use a Hidden Markov Model (aka HMM) to match audio vector patterns to speech. The state of the art today is not always accurate at doing this. HMM's tend to be really sensitive to background noise.
This is why most speech type features in production today (like Siri) are push to talk. You need to push a button and you have 5 seconds to speak into the microphone. They do this so they can be sure there is some type of speech signal. For those systems that are open mic (Kinect is the only one I know of) they try and use a form of echo cancellation to suppress background audio. But even with the state of the art there is still bleed through.
The only relatively easy work arounds (again not 100%) that I know of involve editing your grammar to include a garbage rule and shortening the possible phrase list. The garbage rule will give the speech engine a "run home to momma" option when it does not know what to do.
http://www.w3.org/TR/speech-grammar/#S2.2.3
Although I don't think this is recommended usage I have seen some systems behave better when using the garbage rule to help filter out background noise. Of course they then have to ignore the garbage reco events.

Related

Voice Recording Has Lower Decibels Than "Silent" Recording

I have looked at a number of articles on detecting silence in a recording using NAudio. But, I ran into a snag. I am now looking into using the more complex silence detection methods such as Fourier Transform. In the meantime, perhaps someone can shed some light on the problem I have run into.
I wrote a program in C# using NAudio to detect silence in a WAV file. I have a sample dictation file that I used to test it and it works fine.
As a further test, I used Audacity to create a Wav file that has 1 minute of silence in it using the Noise Reduction feature. When I listen to it, I don't hear sound. When I run my program on it, the lowest decibel reading is higher than the lowest decibel reading in the dictation file. I'm wondering why because this concerns me that there might be dictation files that have silence in them that I cannot detect.

Good speech recognition engine for Mac, not iOS?

Sorry if this is a repeat question, but I didn't see it anywhere.
I'm working on a Mac program that will take voice commands, and NSSpeechRecognizer isn't quite doing it for me.
I want something a little more dynamic so I can set alarms, make dates, give more natural commands, etc.
Every open source speech engine I've found is tailored toward iOS. Do openears/vocalkit etc. still work just as fine for Mac programs?
Speech recognition is exceptionally non-trivial. The engines that are free are free for a reason. If you expect dictation in any amount (like an alarm label), you're out of luck. There are reasons Siri requires an entire data center. The open source packages available won't get you much further than simple telephone auto-attendants.
Unless you have an extensive statistics background and free time, I'd recommend that you pursue licensing a commercial library or server implementation.
pocket sphinx from Carnegie Melon is about the only option
http://cmusphinx.sourceforge.net/

Mac OS X equivalent for DirectShow, GraphEdit

New to Mac OS X, familiar with Windows. Windows has DirectShow, a good number of built-in filters, COM programming, and GraphEdit for very fast prototyping and snooping on the graphs you've constructed in code.
I'm now about to go to the Mac to work with cameras, webcams, microphones, color spaces, files, splitting, synchronization, rendering, file reading, file saving, and many of things I've come to take for granted with DirecShow when putting together applications for live performance. On the Mac side, so far I've found ... nothing! Either I don't know where to look or I'm having the toughest time tying the Mac's reputation for its ease of handling media with a coherent programmatic ability to get in there and start messin' with media manipulatin' building blocks.
I've seen some weak suggestions to use gstreamer or some library for QT but I can't bring myself to believe that this is the Apple way to go. And I've come across some QuickTime documentation but I'm not looking to do transitions, sprites, broadcasting, ...
Having a brain trained on DirectShow means I don't even know how Apple thinks about providing DirectShow-like functionality. That means I don't know the right keywords and don't even know where to look. Books? Bought a few. Now I might be able to write some code that can edit your sister's wedding video (if I can't make decent headway on this topic I may next be asking what that'd be worth to you), but for identifying what filters are available and how to string them together ... nothing. Suggestions?
Video handling is going through a huge transition on the Mac at the moment. QuickTime is very old, but also big and powerful, so it's been undergoing an incremental replacement process for the past 5 years or so.
That said, QTKit is the QuickTime subset (capture, playback, format conversion and basic video editing) which is supported going forward. The legacy QuickTime APIs are still there for the moment, and probably will remain at least until its major features are available elsewhere, but are 32-bit only. For some involved video stuff you may end up needing to use it in places.
At the moment, iOS is ahead of the Mac because it could start from scratch with AV Foundation. The future of the Mac media frameworks will probably either be AV Foundation directly (with QTKit being a lightweight shim over the top) or an extension of QTKit that looks very similar.
For audio there's Core Audio which is on Mac and iOS and isn't going away any time soon. It's quite powerful but somewhat obtuse in places. Luckily online support is very good; the mailing list is an essential resource.
For filters and frame-level processing you've got Core Video as someone else mentioned, as well as Core Image. For motion graphics there's Quartz Composer which includes a graphical editor and a plugin architecture to add your own patches. For programmatic procedural animation and easily mixing rendering models (OpenGL, Quartz, video, etc.) there's Core Animation.
In addition to all of these, of course there's no reason you can't use open source libraries where the built-in stuff doesn't do what you want.
To address your comment below:
In QuickTime (and QTKit), individual data types like audio and video are represented as tracks. It may not be immediately clear that QuickTime can open audio as well as video file formats. A common way to combine audio and video would be:
Create a QTMovie with your video file.
Create a QTMovie with your audio file.
Take the QTTrack object representing the audio and add it to the QTMovie with the video in it.
Flatten the movie, so it doesn't simply contain a reference to the other movie but actually contains the audio data.
Write the movie to disk.
Here's an example from Blender. You'll see how the A/V muxing is done in the end_qt function. There's also some use of Core Audio in there (AudioConverter*). (There's some classic QuickTime export code in quicktime_export.c but it doesn't seem to do audio.)

Convert audio (wav file) to text using SAPI?

My task is to convert an Audio file not from Direct Speech from Human into text.
e.g If I have "Hello there" store in wav file to it will transcribe it into text and show "Hello there" string on screen.
Any language code in preferred but priority is C#.
SAPI can certainly do what you want. Start with an in-proc recognizer, connect up your audio as a file stream, set dictation mode, and off you go.
Now the disappointing bit. You probably won't get terribly good results; in fact, I suspect that unless you're very lucky, you'll probably get total garbage.
There are several problems:
Dictation really only works well once the SR engine has been trained. If you're lucky (like me), you can get OK results, but if the speaker has an accent, training is a must.
Training only works well for a single voice. If you've got multiple speakers in a single audio file, it's not going to work well.
The audio model for dictation (and Speech Recognition in general) assumes that you're using a close-talk microphone (i.e., a microphone right next to your face, to minimize noise pickup). If your WAV files have extra noise, accuracy will go down dramatically.
Dragon Naturally Speaking Professional has support for transcription, but it still requires training and a single voice. (I do believe that DNS has a custom audio model that works well for voice recorders.) I haven't used it myself, so I don't know how well it would work in your situation.
Now, if you are looking for specific keywords, other people have had success using "Audio Mining" - running the recognizer looking for a specific keyword on an audio stream

What are some ideas for an embedded and/or robotics project?

I'd like to start messing around programming and building something with an Arduino board, but I can't think of any great ideas on what to build. Do you have any suggestions?
I show kids, who have never programmed, or done any electronics before, to make a simple 'Phototrope', a light sensitive robot, in about a day. It costs under £30 (GBP) including Arduino, electronics and off-the-shelf mechanics. If folks really get into mobile robots, the initial project can grow and grow (which I feel is part of the fun).
There are international robot competitions which require relatively simple mechanics to get started, e.g. in the UK http://www.tic.ac.uk/micromouse/toh.asp
Ultimate performance require specially built machines (for lightness) , but folks would get creditable results with an Arduino Nano, the right electronics, and a couple of good motors.
A line following robot is the classic mobile robot project. The track can be as simple as electrical tape. Pololu have some fun videos about their near-Arduino 3PI robot. The sensors are about £1, and there are a bunch of simple motor+gearbox kits from lots of places for under £10. Add a few £ for motor control, and you have autonomous robot mechanics, in need of programming! Add an Infrared Remote receiver (about £1), and you can drive it around using your TV remote. Add a small solar cell, use an Arduino analogue input to measure voltage, and it can find the sun. With a bit more electronics, it can 'feed' itself. And so it gets more sophisticated. Each step might be no more than a few hours to a few days effort, and you'll find new problems to solve and learn from.
IMHO, the most interesting (low-cost) competitions are maze solving robots. The international competition rule require the robot to explore a walled maze, usually using Infrared sensors, and calculate their optimal route. The challenges include keeping track of current position to near-millimeter accuracy, dealing with real world's unpredictably noisy environment and optimising straight-line speed with shortest distance cornering.
All that in 16K of program, and 1K RAM, with real-time interrupt handling (as much as 100K interrupts/second for some motor systems), sensor sampling, motor speed control, and maze solving is an interesting programming challenge. (You might make it 'easy' with 32K of program, and 2K RAM :-)
I'm working on a 'constrained' robot challenge (based on Arduino) so that robot performance is mainly about programming rather than having a big budget.
Start small and build up to something more complex. Control servos. Blink LEDs. Debounce inputs. Read analog sensors. Display text on an LCD. Then put it together.
Despite the name, I like the "Evil Genius" book for PIC microcontrollers because of the small, easily digestible projects that tend to build on one another. It is, of course, aimed at PIC programmers rather than the Arduino, but the material covered will be useful no matter what you're developing on.
I know Arduino is trendy right now, but I also like the Teensy++ development board because of its low price-point ($24), breadboard-compatible PCB, relatively high pin count, Linux development environment, USB connectivity, and not needing a programmer. Worth considering for smaller projects.
If you come up with something cool, let me know. I need an excuse to do something fun :)
Bicycle-related ideas:
theft alarm (perhaps with radio link to a base station which is connected to a PC by Ethernet)
fancy trip computer (with reed switch or opto sensor on wheel)
integrate with a GPS telematics unit (trip logging) with Ethernet/USB download of logged data to PC. Also has an interesting PC programming component--integrate with Google Maps.
Other ideas:
Clock with automatic time sync from:
GPS receiver
FM radio signal with embedded RDS data with CT code
Digital radio (DAB+)
Mobile phone tower (would it require a subscription and SIM card for this receive-only operation?)
NTP server via:
Ethernet
WiFi
ZigBee (with a ZigBee coordinator that gets its time from e.g. Ethernet or GPS)
Mains electricity smart meter via ZigBee (I'm interested now that smart meters are being introduced in Victoria, Australia; not sure if the smart meters broadcast the time info though, and whether it requires authentication)
Metronome
Instrument tuner
This reverse-geocache puzzle box was an awesome Arduino project. You could take this to the next step, e.g. have a reverse-geocache box that gives out a clue only at a specific location, and then using physical clues found at that location coupled with the next clue from the box, determine where to go for the next step.
You could do one of the firefighting robot competitions. We built a robot in university for my bachelor's final project, but didn't have time to enter the competition. Plus the robot needed some polish anyway... :)
Video here.
Mind you, this was done with a Motorola HC12 and a C compiler, and most components outside the microcontroller board were made from scratch, so it took longer than it should. Should be much easier with prefab components.
Path finding/obstacle navigation is typically a good project to start with. If you want something practical, take a look at how iRobot vacuums the floor and come up with a better scheme.
Depends on your background and if you want practical or cool. On the practical side, a remote control could be a simple starting point. It's got buttons and lights but isn't too demanding.
For a cool project maybe a Simon-style memory game or anything with lights & noises (thinking theremin-style).
I don't have suggestions or perhaps something like a line follower robot. I could help you with some links for inspiration
Arduino tutorials
Top 40 Arduino Projects of the Web
20 Unbelievable Arduino Projects
I'm currently developing plans to automate my 30 year old model train layout.
A POV device could be fun to build (just google for POV Arduino). POV means persistence of vision.