MFT NAudio Resampling on the fly - naudio

I want to resample an audio file using NAudio and MFT on-the-fly.
For example, I have the following audio file:
File name: MyAudioFile.mp3
Duration: 10 sec
When this file is being played, I only want to resample that particular position to WAV in the desired format.
So, if the length of "MyAudioFile.mp3" is 10 sec and the "current play position" is 2.5 sec, I want to resample only that portion of the data into WAV format at a sampling rate of 48 kHz.
When the audio progresses further, again, only the "current play position" must be resampled.
I tried the following code:
WaveStream reader = new MediaFoundationReaderRT([path of "MyAudioFile.mp3"]);
MemoryStream outMemStream = new MemoryStream(); // decode to memory stream
using (reader)
using (var resampler = new MediaFoundationResampler(reader, 48000)) // target rate: 48 kHz
{
    WaveFileWriter.CreateWaveFile(outMemStream, resampler);
    rsws = new RawSourceWaveStream(outMemStream, resampler.WaveFormat);
}
WaveChannel32 waveformInputStream = new WaveChannel32(rsws);
The resampling happens properly; however it resamples the whole audio file, which takes time.
What I am looking for is a way to resample just the "current play position" of the audio and discard everything else.
Thanks! I would appreciate it if you could provide a sample.

To resample on the fly, just pass the reader directly into MediaFoundationResampler. You will now have an ISampleProvider so you won't be able to use WaveChannel32, but really that is an obsolete class now, and you should be able to do anything you need with other ISampleProvider classes from NAudio.
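A minimal sketch of what this could look like, assuming the desktop MediaFoundationReader (rather than MediaFoundationReaderRT) and WaveOutEvent for playback. The file path and class name are placeholders. The resampler is handed straight to the output device, so audio is decoded and resampled only as playback pulls it:

```csharp
using System;
using System.Threading;
using NAudio.Wave;

class OnTheFlyResample
{
    static void Main(string[] args)
    {
        // Hypothetical input path; replace with your own file.
        string path = args.Length > 0 ? args[0] : "MyAudioFile.mp3";

        using (var reader = new MediaFoundationReader(path))
        // Resample to 48 kHz; this overload derives the rest of the
        // output format from the source.
        using (var resampler = new MediaFoundationResampler(reader, 48000))
        using (var output = new WaveOutEvent())
        {
            // Nothing is converted up front: the output device pulls
            // resampled blocks from the chain as playback proceeds.
            output.Init(resampler);
            output.Play();
            while (output.PlaybackState == PlaybackState.Playing)
            {
                Thread.Sleep(100);
            }
        }
    }
}
```

If further processing needs an ISampleProvider, calling ToSampleProvider() on the resampler should give one without buffering the whole file.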

Related

MS Cognitive custom voice-submitting sample data-returning "Only the RIFF(WAV) format is accepted. Check the format of your audio files."

Just checking to make sure that this should be supported. The page here says that you should be able to use any PCM file that's at least 16kHz. I'm trying to segment a longer wav file into utterances using NAudio, and I can generate the files, but all of the training data that I submit is coming back with the processing error "Only the RIFF(WAV) format is accepted. Check the format of your audio files." The audio files are 16 bit PCM, mono, 44kHz wav files, and are all under 60s. Is there another constraint on the file format that I might be missing? The wav files do have a valid RIFF header (verified that the bytes exist).
I managed to figure this out by explicitly re-encoding audio that I received back from the SpeechRecognizer. Definitely not an efficient solution, but this was just a hack to test things out. Here's the code for reference (put this in Recognizer.Recognized):
string rawResult = ea.Result.ToString(); // can get access to raw value this way.
Regex r = new Regex(@".*Offset"":(\d*),.*");
UInt64 offset = Convert.ToUInt64(r?.Match(rawResult)?.Groups[1]?.Value);
r = new Regex(@".*Duration"":(\d*),.*");
UInt64 duration = Convert.ToUInt64(r?.Match(rawResult)?.Groups[1]?.Value);
// create segment files
File.AppendAllText($@"{path}\{fileName}\{fileName}.txt", $"{segmentNumber}\t{ea.Result.Text}\r\n");
// offset and duration are in 100 ns units
WaveFileReader w = new WaveFileReader(v);
long totalDurationInMs = w.SampleCount * 1000 / w.WaveFormat.SampleRate; // total length of the file; multiply first to avoid truncation
ulong offsetInMs = offset / 10000; // convert from 100 ns intervals to ms
ulong durationInMs = duration / 10000;
// multiply before dividing so fractional bytes per millisecond are not lost,
// then round down to a whole sample frame
long startByte = w.WaveFormat.AverageBytesPerSecond * (long)offsetInMs / 1000;
w.Position = startByte - startByte % w.WaveFormat.BlockAlign;
long bytesToRead = w.WaveFormat.AverageBytesPerSecond * (long)durationInMs / 1000;
bytesToRead -= bytesToRead % w.WaveFormat.BlockAlign;
byte[] buffer = new byte[bytesToRead];
int bytesRead = w.Read(buffer, 0, (int)bytesToRead);
string wavFileName = $@"{path}\{fileName}\{segmentNumber}.wav";
string tempFileName = wavFileName + ".tmp";
WaveFileWriter wr = new WaveFileWriter(tempFileName, w.WaveFormat);
wr.Write(buffer, 0, bytesRead);
wr.Close();
// this is probably really inefficient, but it's also the simplest way to get things in the right format. It's a prototype; deal with it...
// from other project
var desiredOutputFormat = new WaveFormat(16000, 16, 1);
using (var r2 = new WaveFileReader(tempFileName))
using (var converter = new WaveFormatConversionStream(desiredOutputFormat, r2))
{
    WaveFileWriter.CreateWaveFile(wavFileName, converter);
}
segmentNumber++;
This splits the input file to separate per-turn files, and appends the turn transcripts in the text file using the filenames.
The good news is that this produced a "valid" dataset, and I was able to create a voice from it. The bad news is that the voice font produced audio that was almost completely unintelligible, which I'm going to attribute to a combination of using machine-transcribed samples along with irregular turn breaks and possibly noisy audio. I may see if there's anything that can be done to improve the accuracy by hand editing a few files, but I at least wanted to post an answer here in case anyone else has the same problem.
Also, it appears that either 16 kHz or 44 kHz PCM will work with custom voice, so that's a plus if you have higher quality audio available.
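The conversion from 100 ns units to a byte position is easy to get slightly wrong with integer division. Below is a small, library-free sketch of one way to do it. SegmentMath and TicksToAlignedBytes are hypothetical names, not part of NAudio or the Speech SDK. It converts to whole sample frames first, so the result always lands on a frame boundary:

```csharp
using System;

static class SegmentMath
{
    // Converts a time in 100 ns ticks to a byte offset in PCM data,
    // rounded down to a whole sample frame (blockAlign bytes).
    public static long TicksToAlignedBytes(ulong ticks, int sampleRate, int blockAlign)
    {
        // There are 10,000,000 ticks per second. Multiply before dividing
        // so the fractional part is not discarded too early.
        long frames = (long)(ticks * (ulong)sampleRate / 10000000UL);
        return frames * blockAlign;
    }

    static void Main()
    {
        // 16-bit mono at 44.1 kHz: blockAlign is 2 bytes per frame.
        long start = TicksToAlignedBytes(25000000UL, 44100, 2); // 2.5 s
        Console.WriteLine(start); // 110250 frames * 2 bytes = 220500
    }
}
```

With a WaveFileReader, sampleRate and blockAlign would come from WaveFormat.SampleRate and WaveFormat.BlockAlign.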

How to set the input audio device for Microsoft.Speech recognizer in VB.Net or C# to any audio device

I want to use the Microsoft.Speech namespace in VB.NET to create a telephony application. I need to be able to set the recognizer input to any audio device installed on the system. Microsoft has the recognizer.SetInputToDefaultAudioDevice() method, but I need something like .SetInputToAudioDeviceID. How can I choose another wave audio input from the list of devices installed on my system? In SAPI, I would use MMSystem and SpVoice:
Set MMSysAudioIn1 = New SpMMAudioIn
MMSysAudioIn1.DeviceId = WindowsAudioDeviceID 'set audio input to audio device Id
MMSysAudioIn1.Format.Type = SAFT11kHz8BitMono 'set wave format, change to 8kHz, 16bit mono for other devices
Dim fmt As New SpeechAudioFormatInfo(1000, AudioBitsPerSample.Eight, AudioChannel.Mono)
Recognizer.SetInputToAudioStream(MMSysAudioIN1, fmt)
How do I do this with Microsoft.Speech?
MORE INFO: I want to take any wave input device in the Windows list of wave drivers and use that as input to speech recognition. Specifically, I may have a Dialogic card with wave input reported by TAPI as deviceID 1-4. In SAPI, I can use the SpMMAudioIn class to create a stream and set which device ID is associated with that stream. You can see some of that code above. Can I directly set Recognizer1.SetInputToAudioStream by the device ID of the device like I can in SAPI? Or do I have to create code that reads bytes and uses buffers, etc.? Do I have to create a MemoryStream object? I can't find any example code anywhere. What do I have to check in .NET to get access to ISpeechMMSysAudio/SpMMAudioIn in case something like this would work? But hopefully, there is a way to use MemoryStream or something like it that takes a device ID and lets me pass that stream to the recognizer.
NOTE 2: I added "imports Speechlib" to the VB project and then tried to run the following code. It gives the error listed in the comments below about not being able to set the audio stream to a COM object.
Dim sre As New SpeechRecognitionEngine
Dim fmt As New SpeechAudioFormatInfo(8000, AudioBitsPerSample.Sixteen, AudioChannel.Mono)
Dim audiosource As ISpeechMMSysAudio
audiosource = New SpMMAudioIn
audiosource.DeviceId = WindowsAudioDeviceID 'set audio input to audio device Id
' audiosource.Format.Type = SpeechAudioFormatType.SAFT11kHz16BitMono
sre.SetInputToAudioStream(audiosource, fmt) ' <-- invalid cast with COM object here
It also appears the SpeechAudioFormatType does not support 8kHz formats. This just gets more and more complicated.
You would use SpeechRecognitionEngine.SetInputToAudioStream. Note that if you're having problems with streaming input, you may need to wrap the stream, as illustrated here.

Feed m4s files directly into SourceBuffer doesn't work

I downloaded an init MP4 file (init.mp4) and a sequence of m4s files (such as 744397965.m4s, 744397966.m4s, 744397967.m4s...) from a live stream http://vm2.dashif.org/livesim/testpic_2s/Manifest.mpd using Dash.js.
Then I tried to feed these files directly into a SourceBuffer bound to a video element. No picture is shown and no error is thrown. Why?
Did you seek the video element to the earliest timestamp in the resultant buffer (if it is not zero) and then call play() ?

pop at the beginning of playback

When playing a memory stream containing wav encoded audio, the playback starts with a sharp pop/crackle:
ms = new MemoryStream(File.ReadAllBytes(audio_filename));
[...]
dispose_audio();
sound_output = new DirectSoundOut();
IWaveProvider provider = new RawSourceWaveStream(ms, new WaveFormat());
sound_output.Init(provider);
sound_output.Play();
That pop/crackle does not occur when playing the wav file directly:
dispose_audio();
NAudio.Wave.WaveStream pcm = new WaveChannel32(new NAudio.Wave.WaveFileReader(audio_filename));
audio_stream = new BlockAlignReductionStream(pcm);
sound_output = new DirectSoundOut();
sound_output.Init(audio_stream);
sound_output.Play();
Same file is playing, but when the wav data are stored in a memory stream first, there is a somewhat loud pop at the beginning of the playback.
I am very much a newbie with NAudio and audio in general, so it's probably something silly, but I can't seem to figure it out.
You are playing the WAV file header as though it were audio. Instead of RawSourceWaveStream, you still need to use WaveFileReader, just pass in your memory stream.
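A minimal sketch of that change, assuming the same DirectSoundOut output as in the question. The file name and class name are placeholders. WaveFileReader accepts a Stream, parses the RIFF header itself, and exposes only the audio data to the player:

```csharp
using System.IO;
using System.Threading;
using NAudio.Wave;

class PlayWavFromMemory
{
    static void Main(string[] args)
    {
        // Hypothetical file name; replace with your own WAV file.
        string audioFilename = args.Length > 0 ? args[0] : "sound.wav";
        var ms = new MemoryStream(File.ReadAllBytes(audioFilename));

        // WaveFileReader reads the header and takes the WaveFormat from it,
        // so the header bytes are never sent to the sound card.
        using (var reader = new WaveFileReader(ms))
        using (var output = new DirectSoundOut())
        {
            output.Init(reader);
            output.Play();
            while (output.PlaybackState == PlaybackState.Playing)
            {
                Thread.Sleep(100);
            }
        }
    }
}
```

RawSourceWaveStream remains the right choice only when the stream holds headerless PCM and the format is known separately.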

Mimic file IO in j2me midlet using RMS

I want to be able to record audio and save it to persistent storage in my J2ME application. As I understand it, J2ME does not expose the handset's file system; instead it wants the developer to use the RMS system. I understand the idea behind RMS but cannot think of the best way to implement audio recording with it. I have a continuous stream of bits from the audio input which must be saved. Should I (1) make a buffer and periodically create a new record with the bytes in the buffer, (2) put each sample in a new record, or (3) keep the entire recording in a byte array and only write it to the RMS when recording stops?
Is there a better way to achieve this other than RMS?
Consider the code below and edit it as necessary. It should solve your problem by writing to the phone's file system directly:
getRoots(); // helper that is not shown in this answer
FileConnection fc = null;
DataOutputStream dos = null;
// make sure the target root exists
fc = (FileConnection) Connector.open("file:///E:/");
if (!fc.exists())
{
    fc.mkdir();
}
fc.close();
// create the output file if needed, then write the recorded bytes
fc = (FileConnection) Connector.open("file:///E:/test.wav");
if (!fc.exists())
{
    fc.create();
}
dos = fc.openDataOutputStream();
dos.write(recordedSoundArray);
dos.close();
fc.close();