How to set the input audio device for the Microsoft.Speech recognizer to any audio device in VB.Net or C#

I want to use the Microsoft.Speech namespace in VB.NET to create a telephony application. I need to be able to set the recognizer input to any audio device installed on the system. Microsoft has the recognizer.SetInputToDefaultAudioDevice() method, but I need something like .SetInputToAudioDeviceID. How can I choose another wave audio input from the list of devices installed on my system? In SAPI, I would use MMSystem and SpVoice:
Dim MMSysAudioIn1 As New SpMMAudioIn()
MMSysAudioIn1.DeviceId = WindowsAudioDeviceID 'set audio input to this wave device ID
MMSysAudioIn1.Format.Type = SpeechAudioFormatType.SAFT11kHz8BitMono 'set wave format; change to 8kHz 16-bit mono for other devices
Dim fmt As New SpeechAudioFormatInfo(11000, AudioBitsPerSample.Eight, AudioChannel.Mono) '11 kHz, 8-bit mono, matching the SAPI format above
Recognizer.SetInputToAudioStream(MMSysAudioIn1, fmt)
How do I do this with Microsoft.Speech?
MORE INFO: I want to take any wave input device in the Windows list of wave drivers and use that as input to speech recognition. Specifically, I may have a Dialogic card whose wave inputs are reported by TAPI as device IDs 1-4. In SAPI, I can use the SpMMAudioIn class to create a stream and set which device ID is associated with that stream; you can see some of that code above. Can I directly set Recognizer1.SetInputToAudioStream by the device ID of the device, like I can in SAPI? Or do I have to write code that reads bytes, manages buffers, etc.? Do I have to create a MemoryStream object? I can't find any example code anywhere. What do I have to reference in .NET to get access to ISpeechMMSysAudio/SpMMAudioIn in case something like this would work? Hopefully there is a way to use a MemoryStream, or something like it that takes a device ID, and pass that stream to the recognizer.
NOTE 2: I added "Imports SpeechLib" to the VB project and then tried to run the following code. It throws the error noted in the comment below about not being able to cast the COM object to the audio stream type.
Dim sre As New SpeechRecognitionEngine
Dim fmt As New SpeechAudioFormatInfo(8000, AudioBitsPerSample.Sixteen, AudioChannel.Mono)
Dim audiosource As ISpeechMMSysAudio
audiosource = New SpMMAudioIn
audiosource.DeviceId = WindowsAudioDeviceID 'set audio input to audio device Id
' audiosource.Format.Type = SpeechAudioFormatType.SAFT11kHz16BitMono
sre.SetInputToAudioStream(audiosource, fmt) ' <-- throws InvalidCastException (COM) here
It also appears the SpeechAudioFormatType does not support 8kHz formats. This just gets more and more complicated.

You would use SpeechRecognitionEngine.SetInputToAudioStream. Note that if you're having problems with streaming input, you may need to wrap the stream, as illustrated here.
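For the device-ID part of the question, one way to make that work (a sketch, untested against a Dialogic card; it assumes the SpeechLib COM interop is referenced alongside Microsoft.Speech, and the MMAudioStream class name is my own) is to keep using SpMMAudioIn to select the wave device, and wrap its ISpeechBaseStream in a System.IO.Stream. That also sidesteps the InvalidCastException from NOTE 2, because the recognizer then receives a real .NET stream instead of the raw COM object:

using System;
using System.IO;
using SpeechLib;
using Microsoft.Speech.Recognition;
using Microsoft.Speech.AudioFormat;

// Read-only, non-seekable bridge from SAPI's COM audio stream
// to the System.IO.Stream that SetInputToAudioStream expects.
class MMAudioStream : Stream
{
    private readonly ISpeechBaseStream source;

    public MMAudioStream(ISpeechBaseStream source) { this.source = source; }

    public override int Read(byte[] buffer, int offset, int count)
    {
        object chunk;
        int bytesRead = source.Read(out chunk, count); // COM call; chunk comes back as a byte[]
        if (bytesRead > 0)
            Array.Copy((byte[])chunk, 0, buffer, offset, bytesRead);
        return bytesRead;
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override void Flush() { }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}

Usage would look roughly like this:

SpMMAudioIn mic = new SpMMAudioIn();
mic.DeviceId = 1; // e.g. one of the Dialogic wave device IDs reported by TAPI
SpeechRecognitionEngine sre = new SpeechRecognitionEngine();
SpeechAudioFormatInfo fmt = new SpeechAudioFormatInfo(8000, AudioBitsPerSample.Sixteen, AudioChannel.Mono);
sre.SetInputToAudioStream(new MMAudioStream(mic), fmt);

If recognition then stalls mid-utterance, that is the streaming problem mentioned above: the wrapper's Read needs to block until more audio is available rather than return 0.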

Related

Pop at the beginning of playback

When playing a memory stream containing wav encoded audio, the playback starts with a sharp pop/crackle:
ms = new MemoryStream(File.ReadAllBytes(audio_filename));
[...]
dispose_audio();
sound_output = new DirectSoundOut();
IWaveProvider provider = new RawSourceWaveStream(ms, new WaveFormat());
sound_output.Init(provider);
sound_output.Play();
That pop/crackle does not occur when playing the wav file directly:
dispose_audio();
NAudio.Wave.WaveStream pcm = new WaveChannel32(new NAudio.Wave.WaveFileReader(audio_filename));
audio_stream = new BlockAlignReductionStream(pcm);
sound_output = new DirectSoundOut();
sound_output.Init(audio_stream);
sound_output.Play();
Same file is playing, but when the wav data are stored in a memory stream first, there is a somewhat loud pop at the beginning of the playback.
I am very much a newbie with NAudio and audio in general, so it's probably something silly, but I can't seem to figure it out.
You are playing the WAV file header as though it were audio. Instead of RawSourceWaveStream, you still need to use WaveFileReader, just pass in your memory stream.
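In other words, a minimal sketch of the fix (variable names as in the question):

ms = new MemoryStream(File.ReadAllBytes(audio_filename));
dispose_audio();
sound_output = new DirectSoundOut();
// WaveFileReader parses the RIFF header instead of playing it as samples
WaveFileReader reader = new WaveFileReader(ms);
sound_output.Init(reader);
sound_output.Play();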

SpeechRecognitionEngine Spoken and Recorded do not match

I am using SpeechRecognitionEngine to recognize information being spoken by a user. The method will be running on the client's computer and it is working just fine and recognizing text almost like I want it to. So I'm happy.
However, I want to be able to do some processing of the wave file on my server. Right now I am testing on my local machine, and when I use the SetInputToWaveFile method on the recognizer and pass the same audio clip back through (the one recorded by the engine originally), it does not give anything close to the original matches (or alternates).
For example:
User speaks and recognizer returns the phrase: "Hello how are you today" with 10 alternates.
The wave file is saved and then passed back in using SetInputToWaveFile (or SetInputToAudioStream). The recognizer will return a phrase (usually one word) that is nothing like the spoken text, for example "Moon", along with ZERO alternates.
Often, when doing this, the recognizer will not raise the RecognizeCompleted event. It will, however, sometimes raise the SpeechHypothesized event, and other times AudioSignalProblemOccurred.
Shouldn't passing the audio clip that was captured from the recognizer results, back through the same recognizer, return the same matches?
Original:
Private _recognizer As New SpeechRecognitionEngine(New CultureInfo("en-US"))
_recognizer.UnloadAllGrammars()
_recognizer.LoadGrammar(New DictationGrammar())
_recognizer.SetInputToDefaultAudioDevice()
_recognizer.InitialSilenceTimeout = TimeSpan.FromSeconds(2)
_recognizer.MaxAlternates = 10
_recognizer.BabbleTimeout = TimeSpan.FromSeconds(1)
Dim result As RecognitionResult = _recognizer.Recognize()
Dim aud As RecognizedAudio = result.Audio 'This is the audio that gets saved
Using fs As New IO.FileStream("mypath", IO.FileMode.Create)
    aud.WriteToWaveStream(fs) 'WriteToWaveStream expects a Stream, not a path
End Using
(I've removed some of the logic code in between that pulls out the results, and does some processing)
Now, trying to pull the results back out from the audio file:
_recognizer.SetInputToWaveFile("mypath")
'Doesn't work either
'_recognizer.SetInputToAudioStream(File.OpenRead("mypath"), New SpeechAudioFormatInfo(44100, AudioBitsPerSample.Sixteen, AudioChannel.Mono))
Dim result2 As RecognitionResult = _recognizer.Recognize()
The recognition/matches from result and result2 are not even close.
I manually set the speech audio format info to match the format the recognizer had actually saved (16 kHz, 16-bit mono PCM, hence 32,000 average bytes per second and a block align of 2), and now it works perfectly.
_recognizer.SetInputToAudioStream(File.OpenRead("mypath"), New SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, Nothing))
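If you'd rather not hard-code those numbers, you can read them from the WAV header itself. A sketch in C# using NAudio (which the pop-at-playback question above already uses); recognizer stands in for the _recognizer field, and the usings assume System.Speech, so swap in Microsoft.Speech if that's the library in play:

using System.IO;
using System.Speech.AudioFormat;
using System.Speech.Recognition;
using NAudio.Wave;

// Build the SpeechAudioFormatInfo from the file's actual format.
SpeechAudioFormatInfo fmt;
using (var reader = new WaveFileReader("mypath"))
{
    WaveFormat wf = reader.WaveFormat;
    fmt = new SpeechAudioFormatInfo(EncodingFormat.Pcm, wf.SampleRate,
        wf.BitsPerSample, wf.Channels, wf.AverageBytesPerSecond, wf.BlockAlign, null);
}
recognizer.SetInputToAudioStream(File.OpenRead("mypath"), fmt);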

UVC Extension Unit transfers only one byte of data?

I am using a UVC extension unit between Windows 7 / Windows XP and a custom device.
To access the custom device I am using the following COM interface:
KSP_NODE s;
s.Property.Set = Guid_KSPROPSETID;
s.Property.Id = PropID;
s.Property.Flags = KSPROPERTY_TYPE_GET | KSPROPERTY_TYPE_TOPOLOGY;
s.NodeId = dwExtensionNode;
hr = pIKsControl->KsProperty( (PKSPROPERTY) &s, sizeof(s), pbPropertyValue, ulSize, &ulBytesReturned);
It works fine, but on the Windows UVC side I can't transfer more than one byte of the complete pbPropertyValue buffer of size ulSize. Does anyone know why, and how to fix it?
One more question: how do I access UVC_GET_MIN, UVC_GET_MAX, UVC_GET_INFO, UVC_GET_DEF and UVC_GET_RES through an extension unit? For the standard properties I use the pVideoProcAmp->GetRange method, but I couldn't find the equivalent for the extension unit.
Finally, the problem came from the UVC_GET_LEN return value: the length returned needs to be 0x02, and the data returned needs to be equal to ulSize (the length of the buffer).

How to get notifications when headphones are plugged in/out on a Mac?

I'd like to get notified when headphones are plugged into or unplugged from the headphone jack.
I've searched around for this on Stack Overflow, but I can't seem to find what I'm looking for for the Mac; I can only find solutions for iOS.
So, do you have any ideas on how to perform this? What I want to do with it: when headphones are unplugged, I want to programmatically pause iTunes (an iOS-like feature).
Thank you!
You can observe changes using the CoreAudio framework.
Both the headphones and the speakers are data sources on the same audio output device (of type built-in). One of the two will be active on that device, depending on whether headphones are plugged in or not.
To get notifications you listen to changes of the active datasource on the built-in output device.
1. Get the built-in output device
To keep this short we'll use the default output device. In most cases this is the built-in output device. In real-life applications you'll want to loop over all available devices to find it, because the default device could be set to a different audio device (Soundflower or AirPlay, for example).
AudioDeviceID defaultDevice = 0;
UInt32 defaultSize = sizeof(AudioDeviceID);
const AudioObjectPropertyAddress defaultAddr = {
kAudioHardwarePropertyDefaultOutputDevice,
kAudioObjectPropertyScopeGlobal,
kAudioObjectPropertyElementMaster
};
AudioObjectGetPropertyData(kAudioObjectSystemObject, &defaultAddr, 0, NULL, &defaultSize, &defaultDevice);
2. Read its current data source
The current datasource on a device is identified by an ID of type UInt32.
AudioObjectPropertyAddress sourceAddr;
sourceAddr.mSelector = kAudioDevicePropertyDataSource;
sourceAddr.mScope = kAudioDevicePropertyScopeOutput;
sourceAddr.mElement = kAudioObjectPropertyElementMaster;
UInt32 dataSourceId = 0;
UInt32 dataSourceIdSize = sizeof(UInt32);
AudioObjectGetPropertyData(defaultDevice, &sourceAddr, 0, NULL, &dataSourceIdSize, &dataSourceId);
3. Observe for changes to the data source
// note: defaultDevice comes from step 1; dispatch_get_current_queue() is deprecated, so use an explicit queue
AudioObjectAddPropertyListenerBlock(defaultDevice, &sourceAddr, dispatch_get_main_queue(), ^(UInt32 inNumberAddresses, const AudioObjectPropertyAddress *inAddresses) {
// move to step 2. to read the updated value
});
4. Determine the data source type
When you have the data source id as a UInt32, you can query the audio object for properties using a value transformer. For example, to get the source name as a string, use kAudioDevicePropertyDataSourceNameForIDCFString. This will result in the string "Internal Speaker" or "Headphones"; note, however, that the name may differ based on the user's locale.
An easier way is to compare the data source id directly:
if (dataSourceId == 'ispk') {
// Recognized as internal speakers
} else if (dataSourceId == 'hdpn') {
// Recognized as headphones
}
However, I couldn't find any constants defined for these values, so this is kind of undocumented.
I was looking for a similar solution and found AutoMute in the App Store. It works well.
I'm also working on some scripts of my own, and wrote this script to test whether headphones are plugged in:
#!/bin/bash
if system_profiler SPAudioDataType | grep --quiet Headphones; then
echo plugged in
else
echo not plugged in
fi

Mimic file IO in j2me midlet using RMS

I want to be able to record audio and save it to persistent storage in my J2ME application. As I understand it, J2ME does not expose the handset's file system; instead it wants the developer to use the RMS system. I understand the idea behind RMS, but cannot seem to think of the best way to implement audio recording with it. I have a continuous stream of bits from the audio input which must be saved: 1) should I fill a buffer and periodically create a new record with the bytes in the buffer? 2) Should I put each sample in a new record? 3) Should I keep the entire recording in a byte array and only write it to RMS when recording stops?
Is there a better way to achieve this than RMS?
Consider the code below and edit it as necessary; it should solve your problem by writing to the phone's file system directly through the JSR-75 FileConnection API.
// imports: javax.microedition.io.Connector, javax.microedition.io.file.FileConnection,
// java.io.DataOutputStream, java.io.IOException
// getRoots() in the original presumably wrapped FileSystemRegistry.listRoots();
// use it to discover which roots (e.g. "E:/") exist on the handset.
FileConnection fc = null;
DataOutputStream dos = null;
try {
    fc = (FileConnection) Connector.open("file:///E:/");
    if (!fc.exists()) {
        fc.mkdir();
    }
    fc.close();
    fc = (FileConnection) Connector.open("file:///E:/test.wav");
    if (!fc.exists()) {
        fc.create();
    }
    dos = fc.openDataOutputStream();
    dos.write(recordedSoundArray);
} finally {
    try { if (dos != null) dos.close(); } catch (IOException e) {}
    try { if (fc != null) fc.close(); } catch (IOException e) {}
}