How to detect when SAPI TTS engine is busy - text-to-speech

The SAPI engine can only render TTS from one application at a time (I have run a test with two instances of the Windows SDK TTSApplication sample to verify this). I am writing an application in which I need to detect whether the TTS engine is currently speaking (i.e. under control of a separate application, not mine).
Does anyone know how I can programmatically (in C++) detect the SAPI TTS engine's busy/ready state? I have tried using ISpVoice::GetStatus(), but that only seems to report TTS activity in my own application.
Thanks.

This is how to check whether the speech synthesis system is speaking:
ISpVoice *pVoice; // must already point to a voice created with CoCreateInstance(CLSID_SpVoice, ...)
SPVOICESTATUS status;
HRESULT hr = pVoice->GetStatus(&status, NULL);
if (SUCCEEDED(hr) && status.dwRunningState == SPRS_IS_SPEAKING)
    std::cout << "The Speech Synthesis System is speaking.";
else
    std::cout << "The Speech Synthesis System is not speaking.";

For example, in SAPI 4, IVTxtAttributes::IsSpeaking retrieves this status (whether the engine is currently playing samples to an audio device).
Anyway, IMO a SAPI engine in general is not limited to one application. I believe this behaviour is specific to your engine.

From http://msdn.microsoft.com/en-us/library/ee431864%28v=vs.85%29.aspx
SPRUNSTATE lists the voice running states.
typedef enum SPRUNSTATE
{
SPRS_DONE,
SPRS_IS_SPEAKING
} SPRUNSTATE;
Elements:
SPRS_DONE
The voice has completed processing all queued streams.
SPRS_IS_SPEAKING
The voice instance currently has the audio claimed.
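
As a rough illustration of how the two states above might be interpreted, the sketch below turns a dwRunningState value into a readable label. The enum is a local stand-in so the snippet compiles without the Windows SDK; its numeric values are an assumption (believed to match sapi.h), and DescribeRunState is a hypothetical helper, not part of SAPI. As noted in the question, the status only describes the voice object it was queried from.

```cpp
#include <string>

// Local stand-ins for the SAPI enum so this compiles without <sapi.h>.
// The numeric values are an assumption (believed to match sapi.h).
enum RunState : unsigned long {
    kDone       = 1UL << 0,  // stands in for SPRS_DONE
    kIsSpeaking = 1UL << 1   // stands in for SPRS_IS_SPEAKING
};

// Hypothetical helper: turns a dwRunningState value into a label.
std::string DescribeRunState(unsigned long dwRunningState) {
    if (dwRunningState & kIsSpeaking) return "speaking";
    if (dwRunningState & kDone)       return "done";
    return "waiting";  // neither flag set; treated here as "not yet speaking"
}
```

In a real application the argument would come from the dwRunningState field filled in by ISpVoice::GetStatus.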

Related

How to get the device platform on Windows 10

Since one release binary will run on pc, xbox and phones, I need a way to fetch the device type on runtime.
It is doable by checking with the ApiInformation for present types, methods etc., but I believe there should be a more reliable way.
Currently (with the preview tools released 23-Mar-2015) there isn't an easy way to do this, other than (as you mention) using the ApiInformation methods to detect implementations of things that only exist on the specific platform you're after.
It would be nice if there were some helpers to do this and if none are in the final tooling I'm sure some will be created by helpful people in the community.
However, there is a really good reason not to have this in that it encourages broad assumptions about the device.
If it were possible to ask "Am I running on a phone?" and get the answer 'Yes', it would be easy to make assumptions about what that device can do, but not all phones have the same capabilities.
It looks like there will be a "mobile" version of Windows 10 for both phones and small tablets. If you were able to say "am I the 'mobile' version?" then again that wouldn't potentially answer all your questions and you'd have to still check individual API availabilities as the capabilities of a cheap tablet and a high end phone could be vastly different. (The inclusion of physical buttons on the device and the ability to make phone calls are two obvious examples.)
Extending this further, there are plenty of scenarios where you'd treat different platforms the same because the functionality exists on all of them. In this scenario your code would be better off asking "Is such and such API available?" rather than "Am I running on desktop, Xbox or SurfaceHub?".
The IOT platform will likely further complicate this due to the range of functionality and capabilities different IOT devices will have available.
There are very few scenarios where you want to know the platform you're running on and not whether a specific API is available. Hopefully, by only exposing API availability Microsoft are encouraging developers to think about checking for what they actually need, rather than relying on broad, potentially incomplete, classifications of devices.
Just as with web development, where you don't know what platform or browser you are running on, you shouldn't detect the platform and make assumptions about what capabilities the device will therefore have. Instead, detect whether the specific capability you require is supported/enabled on the device before using it or exposing associated UI in your app.
It seems there is a new API to detect Device Family:
Windows.System.Profile.AnalyticsInfo.VersionInfo.DeviceFamily
You can find more information here: https://msdn.microsoft.com/en-us/library/windows/apps/dn705767.aspx
Updated:
https://msdn.microsoft.com/en-us/library/windows/apps/windows.system.profile.analyticsversioninfo.aspx
[Edit July 3 to replace //build-era information with current information]
Although you can try and infer the device you're on by using the ApiInformation APIs to detect APIs, this is a very bad solution since APIs can be added to devices over time. Please don't do that; your future self (or your replacement ;-) ) will thank you.
If you really do need to programmatically detect the device family that you're running on (and in most cases you don't) then you can use AnalyticsInfo.VersionInfo.DeviceFamily. This returns a string for which there is no published standard set of values, because device families could be introduced or retired at any time.
If you want to provide different resources per device-family (strings, images, XAML files, HTML pages, etc.) then you don't need to detect the device-family in code; instead you can use an MRT qualifier DeviceFamily (such as Logo.DeviceFamily-Mobile.png). Just make sure you always have a fallback resource (image, string, etc) for use when the app is running on a device family you've never heard of before. And don't fall into the trap of assuming things like "Desktop requires higher-res assets than Mobile" because that is often not true.
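
The resource system resolves DeviceFamily qualifiers for you, so no code is needed in a real app. Purely to illustrate the "always keep a fallback" idea, here is a minimal sketch of the lookup logic. PickResource and the available set are hypothetical names invented for this example, and this is not how MRT is implemented.

```cpp
#include <set>
#include <string>

// Hypothetical helper: picks a device-family-specific file if one exists,
// otherwise falls back to the unqualified name.
// `available` stands in for the files packaged with the app.
std::string PickResource(const std::string& stem,       // e.g. "Logo"
                         const std::string& extension,  // e.g. ".png"
                         const std::string& family,     // e.g. "Mobile"
                         const std::set<std::string>& available) {
    const std::string qualified = stem + ".DeviceFamily-" + family + extension;
    if (available.count(qualified) != 0) return qualified;
    return stem + extension;  // fallback for families we have never heard of
}
```

With only Logo.png and Logo.DeviceFamily-Mobile.png packaged, a request for the "Mobile" family gets the qualified file and any other family gets Logo.png.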
Additionally, to support the scenario Alan describes in his comment, you can check for a Contract rather than a specific type, as this indicates a block of related functionality. There is one such contract for the Windows Phone specific APIs. I described it here: http://inthehand.com/2015/03/26/determine-if-running-on-windows-phone-from-a-uap-application/
Since this contract provides compatibility APIs for current Windows Phone apps we can assume at this point that it won't be implemented in small tablets as they won't have this. Obviously since the OS or APIs are not final this is not set in stone yet. This is a useful thing to know for Windows Phone especially if during the transition you want to cross promote legacy WP apps only on WP devices. For custom IoT devices I would check availability at the API level.
You can specify device family exclusive resources and views using specially named folders: (http://www.sharpgis.net/post/2015/04/01/Creating-DeviceFamily-specific-layouts-in-a-Universal-App).
You could, for the "advertising only same family apps" scenario described above, place a JSON or XML file in that device family's folder and fetch it at runtime using the storage APIs.
I use this for phone (mobile):
if (Windows.System.Profile.AnalyticsInfo.VersionInfo.DeviceFamily == "Windows.Mobile")
{
// code for phone
}
else
{
// other code
}
An example is here.
This is just repeating one of the previous answers which suggests using Windows.System.Profile.AnalyticsInfo.VersionInfo.DeviceFamily but I thought I'd include the complete code for a check:
// ----------------------------------------------------------------------
// IsRunningOnXbox
// Determines whether or not the game is running on an xbox console
// (requires <algorithm>, <cwctype> and <string>)
bool IsRunningOnXbox()
{
    // Skip if already checked
    static bool bChecked = false;
    static bool bRunningOnXbox = false;
    if (bChecked)
        return bRunningOnXbox;
    // Retrieve the platform device family
    Platform::String^ strVersionInfoDeviceFamily = Windows::System::Profile::AnalyticsInfo::VersionInfo->DeviceFamily;
    if (strVersionInfoDeviceFamily != nullptr)
    {
        // Check to see if the device belongs to the xbox family
        std::wstring strDeviceFamily = strVersionInfoDeviceFamily->Data();
        // Use the wide-character towlower, since the string holds wchar_t values
        std::transform(strDeviceFamily.begin(), strDeviceFamily.end(), strDeviceFamily.begin(), ::towlower);
        if (strDeviceFamily.find(L"xbox") != std::wstring::npos)
            bRunningOnXbox = true;
    }
    // Check complete
    bChecked = true;
    // Return whether or not the host platform is xbox
    return bRunningOnXbox;
}
I do agree with Chuck's comment that this is probably not what AnalyticsInfo is intended for... but at the same time, we're talking about the xbox - a device with a single manufacturer who is also responsible for the OS. So in my mind at least, this seems pretty safe. Plus, if you wrap it like this, it's incredibly easy to swap in a different check should something better come along.
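
One way to make the check easy to swap out, and testable away from the device, is to keep the string comparison in a small pure function and pass the device family in from the platform call. The sketch below assumes that split. IsXboxFamily is a hypothetical name, and the input strings used to exercise it are only examples, since there is no published list of values.

```cpp
#include <algorithm>
#include <cwctype>
#include <string>

// Hypothetical pure helper: true if the device family string mentions "xbox",
// ignoring case. Takes a plain std::wstring so it can be tested off-device.
bool IsXboxFamily(std::wstring family) {
    std::transform(family.begin(), family.end(), family.begin(),
                   [](wchar_t c) { return static_cast<wchar_t>(std::towlower(c)); });
    return family.find(L"xbox") != std::wstring::npos;
}
```

The platform-specific wrapper would then only need to fetch the DeviceFamily string once and hand it to this function.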

Speech Recognition API

I need to automatically transcribe some short MP3s as part of a proof of concept I am working on. I am currently looking into cloud solutions or web API services to send the MP3 as a simple HTTP request and receive a transcription back.
The only free/open source solution I have found is here, but the demos don't seem to work (at least not on the files I need to transcribe). I have found some enterprise solutions for call centers, but so far nothing I can simply integrate into a project.
Are there any web based speech recognition services available? One that is able to filter out small noise would be a plus.
Here is an unofficial method to access Google's ASR capability. I tested it yesterday and it still works: you can get JSON-style ASR output, with words and associated confidence scores, from FLAC audio sampled at 16 kHz.
You can also try the speech recognition engine in Windows 7 to produce subtitles. Here is the tool for that.
This may be a good match. Also, their techcrunch profile (See this) lists competitors as: SimulScribe, SpinVox, Vlingo, Nuance, Microsoft, Google
Some of these links may be helpful.
Vlingo, Bing and Google have recognizers in the cloud, but I don't think they make them publicly programmable. I believe they are accessible only from their authorized clients.
For a proof of concept (and low volume), have you considered just using the desktop speech engines that come in Windows 7? What is the difference between System.Speech.Recognition and Microsoft.Speech.Recognition? may be helpful. The MS desktop recognizers ship with a dictation grammar and it sounds like that is what you will need.

Change dictation topic on the fly

I am scoping out a custom dictation application to be built using MS SAPI 5. I would like to be able to change the grammar (topic) of dictation dynamically based on what is being recognized. For example, if my dictation application deals with car repair, then, if I detect the speaker talking about engine, I want to bring in a dictation topic optimized for recognizing engine part names, as opposed to cabin upholstery.
Anyone know if this is possible?
Thanks.
-Raj
I believe your biggest hurdle will be developing a "fool proof" method of identifying what topic is being discussed. To reference your own statement, "talking about engine": if you simply listen for "engine" and key off that word, you could not, for instance, use the word to represent both a car engine and a software gaming engine. I have used a couple of speech synthesizers, and the ones I have used wait for specific commands to begin listening. Perhaps you could have a combination of start-listening commands.
USER "Computer, start listening."
COMPUTER "Ready to Listen."
USER "Car engines."
COMPUTER "Loading Car Engine Library."
Something like this might be a reasonable approach to your problem while still allowing yourself the flexibility of adding libraries. You could also utilize this approach to implement a default library. If the second command given isn't a recognized library then the program could use the default library.
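
The "recognized library, else default" rule described above can be sketched as a simple lookup. This is only an illustration of the selection logic, not SAPI code. SelectLibrary and the library names are hypothetical, and actually loading a dictation topic is left out.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Hypothetical mapping from a spoken topic command to a dictation library name.
std::string SelectLibrary(std::string spokenTopic) {
    static const std::map<std::string, std::string> libraries = {
        {"car engines", "CarEngineLibrary"},
        {"cabin upholstery", "UpholsteryLibrary"},
    };
    // Normalise case so "Car engines" and "car engines" match.
    std::transform(spokenTopic.begin(), spokenTopic.end(), spokenTopic.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    const auto it = libraries.find(spokenTopic);
    // Unrecognised topics fall back to the default library.
    return it != libraries.end() ? it->second : "DefaultLibrary";
}
```

Adding a new topic then means adding one entry to the map, and anything the user says that is not in the map still gets a usable default.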

Mac OS X speech to text API. Howto?

I have a program that receives an audio (mono) stream of bits from TCP/IP. I am wondering whether the speech (speech-recognition) API in Mac OS X would be able to do a speech-to-text transform for me.
(I don't mind saving the audio into a .wav file first and reading it, as opposed to doing the transform on the fly.)
I have read the official docs online, but they are a bit confusing, and I couldn't find any good examples on this topic.
Also, should I do it in Cocoa/Carbon/Java or Objective-C?
Can someone please shed some light?
Thanks.
There are a number of examples that get copied under /Developer/Examples/Speech/Recognition when you install Xcode.
The Cocoa class for speech recognition is NSSpeechRecognizer.
I've not used it, but as far as I know speech recognition requires you to build a grammar to help the engine choose from a number of choices, rather than allowing you to pass free-form input. This is all explained in the examples referred to above.
This comes a bit late perhaps, but I'll chime in anyway.
The speech recognition facilities in OS X (on both the Carbon and Cocoa side of things) are for speech command recognition, which means that they will recognize words (or phrases, commands) that have been loaded into the speech system language model. I've done some stuff with small dictionaries and it works pretty well, but if you want to recognize arbitrary speech things may turn hairier.
Something else to keep in mind is that the functionality that the speech APIs in OS X provide is not one to one. The Carbon stuff provides functionality that has not made it to NSSpeechRecognizer (the docs make some mention of this).
I don't know about Cocoa, but the Carbon Speech Recognition Manager does allow you to specify inputs other than a microphone so a sound stream would work just fine.
Here's a good O'Reilly article to get you started.
You can use either ApplicationServices's SpeechSynthesis (10.0+)
CFStringRef cfstr = CFStringCreateWithCString(NULL,"Hello World!", kCFStringEncodingMacRoman);
Str255 pstr;
CFStringGetPascalString(cfstr, pstr, 255, kCFStringEncodingMacRoman);
SpeakString(pstr);
or AppKit's NSSpeechSynthesizer (10.3+)
NSSpeechSynthesizer *synth = [[NSSpeechSynthesizer alloc] initWithVoice:@"com.apple.speech.synthesis.voice.Alex"];
[synth startSpeakingString:@"Hello world!"];

Does Vista Voice Recognition engine have scripting like Naturally Speaking?

I want to have an action performed whenever the user (while using Vista voice recognition) says "Wingbats are crazy!". How do I do this? Is there scripting or is there a dll to tie into?
You might want to check out the Microsoft Speech API (SAPI). I used this in Windows XP a while ago and it supports an XML markup that declares the command(s) that you want the system to recognise. Your application then determines what needs to happen when a speech command is recognised.
For speech recognition, check out the ISpRecoContext interface.
Previously this was a COM interface, but since Vista you can use .NET. Or apparently you can use Python if that's your preference!
Edit
Microsoft Speech Server 2007 supports VoiceXML, mentioned in another response to this question.
I would recommend the WSR Macro toolkit. It lets you easily integrate your custom scripts into the Windows Speech Recognition system. SAPI and System.Speech.Recognition are great if you need more control, but given your question, I suspect that the learning curve will be much easier with WSR Macros.
Check out Voice XML. A list of systems implementing the standard can be found on w3.org.