VBA speech recognition / audio input / voice command

Speech recognition may be too grand a term for this problem.
I want my VBA program to wait for the user to say something like "next" or "continue" before it carries on processing.
This is the equivalent of the traditional "Press any key to continue" loop.
This should be fairly simple. All the examples I have found do complicated things like defining lexica and registering callback functions for recognition events. All very nice, but not necessary in my case.
Maybe I can/should use some other (audio) library instead of SpeechLib (Microsoft Speech Object Library)?
Thanks for any advice.

I think this is a bad idea.
What would happen if I stepped away with the TV or radio on and someone said "Next"?
That would be pretty funny, I think... or not.

There's no drop-dead simple way to do speech recognition. You have to define a grammar (so the SR engine knows what to listen for) and a recognition handler (so the SR engine can tell you when it's heard something).
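To make the two pieces concrete, here is a minimal sketch in Python of the "grammar plus handler" pattern. It is not SAPI and does no audio processing: ToyCommandRecognizer is a hypothetical stand-in that is fed already-transcribed text, purely to show how a small grammar and a callback fit together.

```python
class ToyCommandRecognizer:
    """Hypothetical stand-in for a speech engine; it receives text, not audio."""

    def __init__(self, grammar, on_recognized):
        # The grammar: the only phrases the engine should listen for.
        self.grammar = {phrase.lower() for phrase in grammar}
        # The recognition handler: called when a grammar phrase is heard.
        self.on_recognized = on_recognized

    def feed(self, utterance):
        """Return True and fire the handler if the utterance is in the grammar."""
        phrase = utterance.strip().lower()
        if phrase in self.grammar:
            self.on_recognized(phrase)
            return True
        return False


heard = []
recognizer = ToyCommandRecognizer(["next", "continue"], heard.append)
for utterance in ["hello", "Next"]:
    recognizer.feed(utterance)
print(heard)
```

With a real engine the structure is the same: the program blocks (or pumps events) until the handler fires for one of the grammar phrases, then carries on.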

Related

Using different intonations with Watson text to speech

I am developing a PoC using Watson text to speech and Watson conversation.
Sometimes, the chatbot needs to ask a question, so I'd like text to speech to synthesize the voice using an interrogation intonation.
Is it possible to be done?
Watson Text to Speech supports SSML, and has expressive SSML tags.
The one you want is Uncertainty, which is defined as "conveys an uncertain, interrogative message".
Example:
<express-as type="Uncertainty">
Could she still be in the office? She told me that she might leave early.
</express-as>
More details on its usage are here:
https://console.bluemix.net/docs/services/text-to-speech/SSML-expressive.html#the-express-as-element
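If the chatbot's replies are generated at run time, the markup can be built in code before the text is sent for synthesis. A minimal sketch in Python, assuming you only need to wrap plain text in the element shown above; express_as is a hypothetical helper name, and the call to the Watson service itself is left out.

```python
from xml.sax.saxutils import escape, quoteattr


def express_as(text, style="Uncertainty"):
    """Wrap plain text in an <express-as> element, escaping XML special characters."""
    return f"<express-as type={quoteattr(style)}>{escape(text)}</express-as>"


ssml = express_as("Could she still be in the office?")
print(ssml)
```

Escaping matters because chatbot text may contain characters such as & or <, which would otherwise make the SSML invalid.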
Yes, you can certainly use text-to-speech (TTS) for output and speech-to-text (STT) for input. You would need to use a middleware or app layer to drive the conversation and route the input/output to the other services (see "how to use" in the docs).
I have used the following TJBot recipe as a simple and good starter for some projects: https://github.com/damiancummins/tell_the_time
Unfortunately, concatenative TTS may have trouble producing the correct intonation for questions. If you think this happens consistently or too often, please open a bug.
If a specific question gets the wrong intonation, try rephrasing it a little if possible. A useful trick for this voice could be to use a double question mark '??'.

How to simulate user input in objective-c?

On Windows, with C#, I was able to capture another window's "screen", use that for processing, and then send program-generated user input events to that window.
I would like to do the same with Objective-C on Mac OS X. Any resources, or even a name for what I'm trying to do, would be great. It's very frustrating to look for information on this when the only ways I can think of to phrase my searches are too ambiguous.
Thanks!
EDIT: As a specific example, there might be a particular game that I want to make an AI for. In that case I would need to be able to send mouse and keyboard events to the game.
If you're looking to do automated user interface testing, look at Squish, eggPlant, expect and OS X Accessibility. If you're looking to programmatically control another program, use scripting (if you wrote the program to control, add scripting support; otherwise, see what scripting support the program offers), or try Automator.

Change dictation topic on the fly

I am scoping out a custom dictation application to be built using MS SAPI 5. I would like to be able to change the grammar (topic) of dictation dynamically based on what is being recognized. For example, if my dictation application deals with car repair, then, if I detect the speaker talking about engine, I want to bring in a dictation topic optimized for recognizing engine part names, as opposed to cabin upholstery.
Anyone know if this is possible?
Thanks.
-Raj
I believe your biggest hurdle will be developing a "foolproof" method of identifying which topic is being discussed. To reference your own statement, "talking about engine": if you simply listen for "engine" and key off that word, you cannot tell a car engine from, for instance, a software gaming engine. I have used a couple of speech synthesizers. The ones I have used wait for specific commands before they begin listening. Perhaps you could use a combination of start-listening commands.
USER "Computer, start listening."
COMPUTER "Ready to Listen."
USER "Car engines."
COMPUTER "Loading Car Engine Library."
Something like this might be a reasonable approach to your problem while still allowing yourself the flexibility of adding libraries. You could also utilize this approach to implement a default library. If the second command given isn't a recognized library then the program could use the default library.
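The dialogue above amounts to a small two-state machine. Here is a minimal sketch in Python, assuming the recognizer already delivers each phrase as text; the topic names, word lists and the TopicSwitcher class are all made up for illustration, and handing the chosen topic to SAPI is outside the sketch.

```python
# Hypothetical topic libraries; a real application would load dictation topics instead.
TOPIC_LIBRARIES = {
    "car engines": ["piston", "crankshaft", "spark plug"],
    "cabin upholstery": ["headliner", "seat cover", "carpet"],
}
DEFAULT_TOPIC = "general"
DEFAULT_LIBRARY = ["car", "repair"]


class TopicSwitcher:
    WAKE_PHRASE = "computer, start listening"

    def __init__(self):
        self.awaiting_topic = False
        self.active_topic = DEFAULT_TOPIC

    def hear(self, phrase):
        """Process one recognized phrase; return the spoken reply, or None."""
        normalized = phrase.strip().lower().rstrip(".")
        if not self.awaiting_topic:
            if normalized == self.WAKE_PHRASE:
                self.awaiting_topic = True
                return "Ready to listen."
            return None  # ordinary dictation, no topic change
        # The phrase after the wake phrase names the library to load.
        self.awaiting_topic = False
        if normalized in TOPIC_LIBRARIES:
            self.active_topic = normalized
        else:
            self.active_topic = DEFAULT_TOPIC  # unrecognized library: fall back
        return f"Loading {self.active_topic} library."

    def vocabulary(self):
        return TOPIC_LIBRARIES.get(self.active_topic, DEFAULT_LIBRARY)


switcher = TopicSwitcher()
print(switcher.hear("Computer, start listening."))
print(switcher.hear("Car engines."))
print(switcher.vocabulary())
```

Requiring the wake phrase first is what sidesteps the ambiguity problem: the word "engine" in ordinary dictation never triggers a switch on its own.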

Mac OS X speech to text API. Howto?

I have a program that receives an audio (mono) stream of bits from TCP/IP. I am wondering whether the speech (speech-recognition) API in Mac OS X would be able to do a speech-to-text transform for me.
(I don't mind saving the audio to a .wav file first and reading it back, as opposed to doing the transform on the fly.)
I have read the official docs online, but they are a bit confusing, and I couldn't find any good examples on this topic.
Also, should I do it in Cocoa/Carbon/Java or Objective-C?
Can someone please shed some light?
Thanks.
There are a number of examples that get copied under /Developer/Examples/Speech/Recognition when you install Xcode.
The Cocoa class for speech recognition is NSSpeechRecognizer.
I've not used it, but as far as I know speech recognition requires you to build a grammar so the engine can choose from a fixed set of options, rather than allowing you to pass free-form input. This is all explained in the examples referred to above.
This comes a bit late perhaps, but I'll chime in anyway.
The speech recognition facilities in OS X (on both the Carbon and Cocoa side of things) are for speech command recognition, which means that they will recognize words (or phrases, commands) that have been loaded into the speech system language model. I've done some stuff with small dictionaries and it works pretty well, but if you want to recognize arbitrary speech things may turn hairier.
Something else to keep in mind is that the functionality that the speech APIs in OS X provide is not one to one. The Carbon stuff provides functionality that has not made it to NSSpeechRecognizer (the docs make some mention of this).
I don't know about Cocoa, but the Carbon Speech Recognition Manager does allow you to specify inputs other than a microphone so a sound stream would work just fine.
Here's a good O'Reilly article to get you started.
You can use either ApplicationServices's SpeechSynthesis (10.0+):
CFStringRef cfstr = CFStringCreateWithCString(NULL, "Hello World!", kCFStringEncodingMacRoman);
Str255 pstr;
CFStringGetPascalString(cfstr, pstr, 255, kCFStringEncodingMacRoman);
SpeakString(pstr);
or AppKit's NSSpeechSynthesizer (10.3+):
NSSpeechSynthesizer *synth = [[NSSpeechSynthesizer alloc] initWithVoice:@"com.apple.speech.synthesis.voice.Alex"];
[synth startSpeakingString:@"Hello world!"];

3d files in vb.net

I know this will be a difficult question, so I am not necessarily looking for a direct answer but maybe a tutorial or a point in the right direction.
What I am doing is programming a robot that will be controlled by a remote operator. We have a 3D rendering of the robot in SolidWorks. What I am looking to do is get the 3D file into VB (probably using DX9) and be able to manipulate it using code so that the remote operator will have a better idea of what the robot is doing. The operator will also have live video to look at, but that doesn't really matter for this question.
Any help would be greatly appreciated. Thanks!
Sounds like a tough idea to implement. For VB you are stuck with MDX 1.1 (comes with the DirectX SDK) or SlimDX (or another third-party managed DirectX wrapper). The latest XNA (the replacement for MDX 1.1/2.0b) is only available to C# coders. You can try some workarounds, but that's not recommended and you won't get much community support. These are the minimum you need to get VB to display 3D content.
If you want to save some trouble, you could use a ready-made game engine to simplify your job. Try Ogre and its managed wrapper, MOgre. It was one of the candidates for my project, but I ended up with SlimDX because Ogre did not support video very well. Since video is not your requirement, it is worth considering. Most samples are in C#, so you would need to convert them to VB.NET, which won't be hard.
Here comes the harder part: you need to export your SolidWorks model to the DirectX format (*.x). A quick Google search turned up only a few paid tools for that. You might need to spend a bit on one, or spend more time looking for a free converter.
That's about it. If you have more questions, post again. Good luck.
I'm not sure what the real question is, but I suspect that you are trying to manipulate a SW model of a robot with some sort of manual input. Assuming that this is the correct question, there are two aspects that need to be dealt with:
1) The SolidWorks model: Once the model of the robot is working properly in SW, a program can be written in VB.NET that manipulates the positional mates for each of the joints. Also using VB, a window can be programmed with slide bars etc. that allows the operator to "remotely" control the robot. Once this is done, there is a great opportunity to set up a table that stores the sequential steps. When completed, the VB program could be further developed to let the robot "cycle" through a sequence of moves. If any obstacles are also added to the model, this would be a great tool for collision detection and offline training.
2) If the question also includes the incorporation of a physical operator pendant, there are a number of potential solutions. Ideally the robot software would provide a VB library for communicating with and commanding the robot programmatically. If so, the VB code could then be developed with a "run" mode where the SW robot is controlled by the operator pendant instead of the controls in the VB window (as mentioned above). This would then allow the operator to work "offline" with a virtual robot.
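The table of sequential steps from point 1 can be very simple. Here is a minimal sketch in Python of storing poses and cycling through them; the joint names and angles are made up, and apply_pose is a hypothetical callback that in the real program would update the SolidWorks mates (or command the physical robot).

```python
from itertools import cycle, islice

# One row per step: joint name -> target angle in degrees (made-up values).
SEQUENCE = [
    {"base": 0.0, "shoulder": 30.0, "elbow": 60.0},
    {"base": 45.0, "shoulder": 30.0, "elbow": 90.0},
    {"base": 45.0, "shoulder": 10.0, "elbow": 90.0},
]


def play(sequence, apply_pose, repeats=1):
    """Send every stored pose to apply_pose, repeating the table `repeats` times."""
    steps = islice(cycle(sequence), len(sequence) * repeats)
    for pose in steps:
        apply_pose(pose)


log = []
play(SEQUENCE, log.append, repeats=2)
print(len(log))
```

Because the playback only depends on the callback, the same table could drive the virtual robot in the window and, in "run" mode, the real one.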
Hope this helps.