Event when N bytes have been played in a playback - NAudio

I want to play an audio file and check whether the buffer of size N that was just played has the same dominant frequency as the buffer of size N just captured from the mic. Looking at the SpectrumVisualization sample, it seems I have to implement my own ISampleProvider to be able to fire an event when N bytes have been played from my file. Is that correct? Can I use any of the existing providers to do that?
Thanks

Yes, this is the general technique. In the demo, 1024 samples are batched up before being passed to an FFT. If your algorithm can cope with arbitrary batch sizes, it might be easier to just pass the audio received in each call to Read straight into it, which will align with the buffer size being used by the playback device.
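The batching itself is language-independent; in NAudio you would do it inside your ISampleProvider's Read. Here is a minimal sketch in C of accumulating variable-sized reads into fixed 1024-sample blocks (SampleBatcher and its fields are illustrative names, not part of any library):

```c
#include <string.h>

#define FFT_BLOCK 1024

/* Illustrative accumulator: collects samples from variable-sized Read()
 * calls and fires on_block() each time FFT_BLOCK samples are ready,
 * mirroring how the demo batches 1024 samples before the FFT. */
typedef struct {
    float block[FFT_BLOCK];
    int   filled;
    void (*on_block)(const float *samples, int count); /* may be NULL */
} SampleBatcher;

/* Feed 'count' freshly played samples; returns how many full blocks
 * were completed (and passed to on_block, if set) during this call. */
int batcher_feed(SampleBatcher *b, const float *samples, int count)
{
    int blocks = 0;
    while (count > 0) {
        int space = FFT_BLOCK - b->filled;
        int n = count < space ? count : space;
        memcpy(b->block + b->filled, samples, n * sizeof(float));
        b->filled += n;
        samples   += n;
        count     -= n;
        if (b->filled == FFT_BLOCK) {
            if (b->on_block)
                b->on_block(b->block, FFT_BLOCK);
            b->filled = 0;
            blocks++;
        }
    }
    return blocks;
}
```

Each Read call passes whatever it received into batcher_feed; the callback fires exactly once per 1024 samples regardless of the read sizes.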

Related

What characteristics should a .wav file produced by a TTS engine have to sound high quality?

I'm trying to generate a high-quality voice-over using the Microsoft Speech API. What values should I pass to this constructor to guarantee high-quality audio?
The .wav file will later be fed to FFmpeg, so the audio will be re-encoded into a more compact form. My main goal is to keep the voice as clear as I can, but I really don't know which values give the best quality as perceived by humans.
First of all, just so you know, I haven't used this Speech API; I'll give you an answer based on my audio processing work.
You can choose EncodingFormat.Pcm for Pulse Code Modulation.
samplesPerSecond is the sampling frequency. Because it is voice, 16,000 Hz will certainly cover it. If you are a real perfectionist you can go with 22,050, for example. The higher the value, the larger the file. If file size isn't a problem you can even go with 32,000 or 44,100, but there won't be much noticeable difference.
bitsPerSample - go with 16 if possible
channels - 1 or 2, mono or stereo; it won't affect the quality of the sound.
averageBytesPerSecond - this is samplesPerSecond * bytesPerSample * numberOfChannels (for example, 22050 * 2 for 16-bit mono).
blockAlign - this is bytesPerSample * numberOfChannels (for example, for 16-bit PCM mono audio, 16 bits is 2 bytes and mono is 1 channel, so blockAlign is 2 * 1 = 2).
As for the last one, the byte array doesn't say much for itself; I'm not sure what it is for. I believe the first six arguments are enough for the audio to be generated.
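How those two derived fields relate to the others can be sketched as plain arithmetic (the function names here are mine, not the Speech API's):

```c
/* blockAlign: size in bytes of one sample frame across all channels. */
int block_align(int bits_per_sample, int channels)
{
    return (bits_per_sample / 8) * channels;
}

/* averageBytesPerSecond: for uncompressed PCM this is exact, not an
 * average -- sample rate times the size of one frame. */
int average_bytes_per_second(int samples_per_second,
                             int bits_per_sample, int channels)
{
    return samples_per_second * block_align(bits_per_sample, channels);
}
```

For 16-bit mono at 22,050 Hz this gives blockAlign = 2 and averageBytesPerSecond = 44,100.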
I hope this was helpful
Cheers

Using NAudio, How do I get the amplitude and rhythm of an MP3 file?

The wife asked for a device to make the Christmas lights 'rock' with the best of music. I am going to use an Arduino microcontroller to control relays hooked up to the lights, sending six signals down from a C# WinForms app to turn them off and on. I want to use NAudio to separate the amplitude and rhythm into those six signals: a specific range of hertz for each, like an equalizer with six bars, plus the timing from the rhythm. I have seen the WPF demo, and the waveform display seems like the answer. I want to know how to get those values in real time while the song is playing.
I'm thinking ...
1. Create a simple mp3 player and load all my songs.
2. Start the songs playing.
3. Sample the current dynamics of the song and turn that into an integer I can send to the appropriate channel on the Arduino microcontroller via USB.
I'm not sure how to capture the current sound information in real time and get integer values for that moment. I can read the e.MaxSampleValues[0] values in real time while the song is playing, but I want to be able to distinguish which frequency range is active at that moment.
Any help or direction would be appreciated for this interesting project.
Thank you
Sounds like a fun signal processing project.
Using the NAudio.Wave.WasapiLoopbackCapture object you can get the audio data being produced by the sound card on the local computer. This lets you skip the 'create an MP3 player' step, although at the cost of a slight delay between sound and lights. To get better synchronization, you can do the MP3 decoding yourself and pre-calculate the beat patterns and output states, then send them during playback. This lets you adjust the delay between sending the outputs and playing the audio block those outputs were generated from, getting near-perfect synchronization between lights and music.
Once you have the samples, the next step is to use an FFT to find the frequency components. Fortunately NAudio includes a class to help with this: NAudio.Dsp.FastFourierTransform. (Thank you Mark!) Take the output of the FFT() function and sum the frequency ranges you want for each controlled light.
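The bin-summing step can be sketched as follows; this is a simplified illustration using equal-width bands (a real setup would likely use logarithmic band edges, since musical content is concentrated at low frequencies):

```c
#define NUM_BANDS 6  /* one band per controlled light channel */

/* Sum FFT magnitude bins into NUM_BANDS equal-width groups covering
 * bins [0, bin_count). bands[] receives one energy value per light. */
void sum_bands(const float *magnitudes, int bin_count,
               float bands[NUM_BANDS])
{
    int per_band = bin_count / NUM_BANDS;
    for (int b = 0; b < NUM_BANDS; b++) {
        bands[b] = 0.0f;
        for (int i = b * per_band; i < (b + 1) * per_band; i++)
            bands[b] += magnitudes[i];
    }
}
```

Each of the six sums becomes the input to that channel's beat detector in the next stage.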
The next step is Beat Detection. There's an interesting article on this here. The main difference is that instead of doing energy detection on a stream of sample blocks you'll be using the data from your spectral analysis stage to feed the beat detection algorithm. Those ranges you summed become inputs into individual beat detection processors, giving you one output for each frequency range you defined. You might want to add individual scaling/threshold factors for each frequency group, with some sort of on-screen controls to adjust these for best effect.
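A per-band energy detector of the kind described, with an adjustable threshold, might look like this (a minimal sketch; the struct and constants are illustrative, and real detectors usually add hold-off times and variance-based thresholds):

```c
/* Per-band beat detector: flags a beat when the current band energy
 * exceeds 'threshold' times a slow running average of past energy. */
typedef struct {
    float average;   /* running average energy, seeded non-zero    */
    float alpha;     /* smoothing factor, e.g. 0.1                 */
    float threshold; /* sensitivity, e.g. 1.5 -- your on-screen knob */
} BeatDetector;

/* Returns 1 (beat) or 0, and updates the running average. */
int beat_update(BeatDetector *d, float energy)
{
    int beat = energy > d->threshold * d->average;
    d->average = d->average + d->alpha * (energy - d->average);
    return beat;
}
```

You would keep one BeatDetector per frequency band and feed it that band's sum on every FFT block; the returned flag drives the corresponding relay.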
At the end of the process you will have a stream of sample blocks, each with a set of output flags. Push the flags out to your Arduino and queue the samples for playback, delaying one of those operations as needed to achieve synchronization.

h264 high profile video: playback from specified point

What is the proper and fast way to start streaming/playback of h264 high profile HDTV video dump from the specific point?
A huge sample of the real-life stream: this file.
According to 'ffprobe -show_frames', this 10 GB, 105-minute video dump has only 28 video frames marked as 'key_frame=1' and 10 I-frames.
The application I am trying to improve uses such frames as a kind of index, allowing it to rewind and play from any key-frame or I-frame.
It works perfectly with other streams, but not in this case, as you can easily understand: only 28 starting points for playback in 100+ minutes of a show is far too few.
I've checked for packets with the 'random access indicator' enabled, but such packets in this stream aren't on frame boundaries; they don't have the 'frame begin' bit set, so I can't rely on them.
Is there a way at all to implement 'rewind/pause/play from the specified time point' feature for this codec?
Solved by interpreting as index frames the ones that contain the NAL units 'nal slice idr' and 'nal slice pps'.
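Finding those NAL units can be sketched as a scan for Annex B start codes; this assumes Annex B byte-stream framing (H.264 defines nal_unit_type 5 for IDR slices and 8 for PPS), and the helper names are mine:

```c
#include <stddef.h>

/* Return the nal_unit_type (low 5 bits of the byte after the start
 * code), or -1 if no 00 00 01 start code begins at 'p'. A 4-byte
 * 00 00 00 01 start code is also caught, one byte in. */
int nal_type_at(const unsigned char *p, size_t len)
{
    if (len >= 4 && p[0] == 0 && p[1] == 0 && p[2] == 1)
        return p[3] & 0x1F;
    return -1;
}

/* Count IDR slices (nal_unit_type 5) in a buffer; their byte offsets
 * are the candidate seek points for the playback index. */
int count_idr(const unsigned char *buf, size_t len)
{
    int count = 0;
    for (size_t i = 0; i + 4 <= len; i++)
        if (nal_type_at(buf + i, len - i) == 5)
            count++;
    return count;
}
```

In the real application you would record the byte offset of each IDR (and the preceding SPS/PPS) while indexing, then seek to those offsets for playback.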

audio-unit sample rate and buffer size

I am facing a real point of confusion when sampling the iPhone audio with RemoteIO.
On one side, I can do this math: a 44 kHz sample rate means 44 samples per millisecond, which means that if I set the buffer duration to 0.005 with:
float bufferLength = 0.005;
AudioSessionSetProperty(kAudioSessionProperty_PreferredHardwareIOBufferDuration, sizeof(bufferLength), &bufferLength);
I should get a 5 ms buffer, which means 44 * 5 = 220 samples in the buffer for each callback.
BUT I get 512 samples from inNumberFrames in each callback, and it stays fixed even when I change the buffer length.
Another thing: my callbacks come every 11 ms, and that is not changing! I need faster callbacks.
So, what is going on here? Who sets what?
I need to pass digital information using FSK modulation, and I have to know the exact buffer size in samples, and the time span of the signal it holds, in order to know how to FFT it correctly.
Any explanation of this?
Thanks a lot.
There is no way on all current iOS 10 devices to get RemoteIO audio recording buffer callbacks at a faster rate than every 5 to 6 milliseconds. The OS may even decide to switch to sending even larger buffers at a lower callback rate at runtime. The rate you request is merely a request, the OS then decides on the actual rates that are possible for the hardware, device driver, and device state. This rate may or may not stay fixed, so your app will just have to deal with different buffer sizes and rates.
One of your options might be to concatenate each callback buffer onto your own buffer, and chop up this second buffer however you like outside the audio callback. But this won't reduce actual latency.
Added: some newer iOS devices allow returning audio unit buffers that are shorter than 5.x ms in duration, usually a power-of-2 size at a 48,000 sample rate.
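The 'concatenate and chop' approach above is essentially a ring buffer between the audio callback and your FSK processing: the callback writes whatever buffer size the OS hands it, and your code reads fixed-size chunks. A minimal single-producer/single-consumer sketch (illustrative names, and a real implementation would need atomic index updates for thread safety):

```c
#define RING_CAP 4096  /* must hold at least a few callback buffers */

/* Ring of float samples: the audio callback writes, your processing
 * thread reads fixed-size chunks at its own pace. */
typedef struct {
    float data[RING_CAP];
    int   head;  /* write index */
    int   tail;  /* read index  */
} SampleRing;

int ring_count(const SampleRing *r)
{
    return (r->head - r->tail + RING_CAP) % RING_CAP;
}

/* Append 'n' samples; returns 0 if there is not enough free space
 * (one slot is kept empty to distinguish full from empty). */
int ring_write(SampleRing *r, const float *src, int n)
{
    if (ring_count(r) + n >= RING_CAP) return 0;
    for (int i = 0; i < n; i++) {
        r->data[r->head] = src[i];
        r->head = (r->head + 1) % RING_CAP;
    }
    return 1;
}

/* Pop exactly 'n' samples into dst; returns 0 if fewer are available. */
int ring_read(SampleRing *r, float *dst, int n)
{
    if (ring_count(r) < n) return 0;
    for (int i = 0; i < n; i++) {
        dst[i] = r->data[r->tail];
        r->tail = (r->tail + 1) % RING_CAP;
    }
    return 1;
}
```

With this, a 512-frame callback can feed a demodulator that consumes, say, 220-sample chunks, independent of what buffer size the OS chooses.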
"I need to pass digital information using FSK modulation, and have to know the exact buffer size in samples, and the time span of the signal it holds, in order to know how to FFT it correctly."
It doesn't work that way: you don't get to mandate that hosts or hardware operate in the exact manner that is optimal for your processing. You can request reduced latency, but only up to a point. Audio systems generally pass streaming PCM data in blocks of samples whose size is a power of two, for efficient real-time I/O.
You should create your own buffer for your processor, and report latency (where applicable). You can attempt to reduce wall-clock latency by choosing another sample rate, or by using a smaller N.
The audio session property is a suggested value. You can put in a really tiny number, but it will just go to the lowest value it can. The fastest I have seen on an iOS device using 16-bit stereo was 0.002902 seconds (~3 ms).
That is 128 samples (L/R stereo frames) per callback, and thus 512 bytes per callback.
So 128 / 44100 = 0.002902 seconds.
You can check it with:
AudioSessionGetProperty(kAudioSessionProperty_CurrentHardwareIOBufferDuration, &size, &bufferDuration)
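The frame/byte/duration arithmetic above can be sanity-checked with two tiny helpers (names are mine, for illustration):

```c
/* Duration in seconds of one callback buffer of 'frames' sample frames. */
double buffer_duration(int frames, double sample_rate)
{
    return frames / sample_rate;
}

/* Bytes per callback for interleaved samples: frames x channels x
 * bytes per sample (e.g. 2 for 16-bit). */
int buffer_bytes(int frames, int channels, int bytes_per_sample)
{
    return frames * channels * bytes_per_sample;
}
```

128 stereo 16-bit frames gives 512 bytes, and 128 frames at 44,100 Hz is ~2.9 ms, matching the measured minimum above.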
Could the value 512 in the original post have meant bytes instead of samples?

samples per second on the iphone audio input

After reading so much about RemoteIO on the iPhone and its buffers, I wrote some code and I am getting audio samples.
But I can't understand something about the buffer size.
I know that every time the buffer is full, the callback function is called.
The buffer size is 2 bytes, 16 bits.
What I don't know is the frequency at which the callback is called to fetch those 16 bits.
Somehow, when I log the buffer out, I get only 2500 samples per 7 seconds, which is about 400 samples a second. Which is far too BAD!
What am I doing wrong? Or what don't I understand here?
my code is here from another post of me :
error in audio Unit code -remoteIO for iphone
The problem is that NSLog is way too slow compared to the audio sample rate, and thus blocks your audio callback from being called often enough, so you are losing almost all of the samples. Take all of the NSLogs out of the audio callback, and just increment a counter to count your samples.
If you want to NSLog something, figure out how to do that outside the audio callback.
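The counter approach can be sketched like this (the callback signature is simplified for illustration; a real RemoteIO render callback has the AURenderCallback signature):

```c
/* Count frames inside the audio callback instead of logging: the
 * callback only increments a counter, and something outside the
 * callback (e.g. a timer on the main thread) reads and logs it. */
static volatile long g_frames_seen = 0;

void audio_callback(const short *samples, int frame_count)
{
    (void)samples;               /* real code would process them    */
    g_frames_seen += frame_count; /* cheap enough for the audio thread */
}

long frames_seen(void) { return g_frames_seen; }
```

After running for a known wall-clock interval, frames_seen() divided by the elapsed seconds gives the true sample rate you are receiving.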
Your code sample seems to be requesting 44100 samples per second from the audio unit. Check the error return value to make sure.
Also, the number of samples in a buffer is not something you get from strlen().
Maybe it's just the logging that runs at 400 Hz, and the audio is fine?
If I understand correctly, you have no problem hearing the audio, which means the sample rate is sufficient. At 400 Hz you would have aliasing for all bands over 200 Hz, which is very low (we can hear up to 20 kHz), so your audio would be strongly distorted. See the Nyquist theorem.
Maybe what you get is not a single sample but an array of samples, i.e. an audio buffer, containing perhaps 128 samples (~400 callbacks/second * 128 samples ≈ 44100 samples/second), and maybe multiple channels (depending on your configuration).