Handling Dynamic Sample Rates when Saving Audio File - naudio

So, I am recording a WAVE file using 16-bit PCM samples that are received from a widget that streams them over in real time. Pretty basic stuff, right? Except the problem is that the widget might dynamically change the sample rate of the audio data that it is sending.
It might start out at a nice 44.1 kHz stream but then might change to a 23 kHz sample rate, or vice-versa. My understanding is that conventional WAVE files do not handle varying sample rates like this, so I am trying to determine the best way to handle the situation.
One approach I came up with was to put something like a ResamplerDmoStream in front of the WaveFileWriter, locking the WaveFileWriter to a sample rate of 44.1 kHz, and just resampling all incoming data to 44.1 kHz.
Another idea that might work is to find a supported output file format that may have native support for varying sample rates, write all the data to that file and then perform a post-process resampling step to create a conventional 44.1 kHz WAVE file.
Has anyone else out there had to deal with this kind of situation and found a better approach?
Thanks!
Peace!
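A minimal sketch of the first approach (lock the output to 44.1 kHz and resample each incoming block), written in Python rather than NAudio for brevity. The `(rate, samples)` block format, the function names, and the linear-interpolation resampler are all assumptions for illustration. A real resampler, such as the ResamplerDmoStream mentioned above, would also low-pass filter and keep state across block boundaries.

```python
import io
import struct
import wave

TARGET_RATE = 44100  # fixed rate the output file is locked to


def resample_linear(samples, src_rate, dst_rate):
    """Resample mono 16-bit samples by linear interpolation (no anti-alias filter)."""
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # fractional position in the source
        j = int(pos)
        frac = pos - j
        a = samples[min(j, last)]
        b = samples[min(j + 1, last)]      # clamp at the end of the block
        out.append(int(round(a + (b - a) * frac)))
    return out


def write_blocks(blocks, fileobj):
    """Write (rate, samples) blocks to a mono 16-bit WAV at TARGET_RATE."""
    with wave.open(fileobj, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(TARGET_RATE)
        for rate, samples in blocks:
            fixed = resample_linear(samples, rate, TARGET_RATE)
            w.writeframes(struct.pack("<%dh" % len(fixed), *fixed))
```

Because every block is converted before it reaches the writer, the file header never has to change, which is the main appeal of this option over a post-processing pass.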

Related

Converting an executable file into an analog waveform signal

I have been trying to convert a digital binary file (.exe) into a waveform so I can listen to the resulting audio. I have been looking for any software or open source code that could help me achieve this, but with no luck.
My ultimate goal is to represent the .exe file as a spectrogram to analyse the behaviour of the frequencies in the executable file. My understanding is that I have to identify the range of frequencies first, which could be done by plotting the waveform.
Any reference would be appreciated.
Edit:
I have a collection of binary files and I need to classify them according to the statistical features of their sound (frequency behaviour). My plan is to get the waveform of the actual binary file (by splitting the file into signed 1-byte samples), then convert the waveform into a spectrogram picture and apply deep learning analysis of the kind used for voice recognition.
So the depth of each sample will be 8 bits, and the sampling rate will be either 8 kHz or 16 kHz. But I am confused about how to determine the frequencies from the executable file.
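A rough sketch of the plan described above, assuming each byte is read as one signed 8-bit sample. The function names, frame size, hop, and the naive DFT are illustrative choices, not a recommendation. The frequencies are not stored in the file: they follow from the sampling rate you pick, since bin k of an N-point transform corresponds to k * sample_rate / N Hz.

```python
import cmath
import math


def bytes_to_samples(data):
    """Interpret each byte as one signed 8-bit sample in [-128, 127]."""
    return [b - 256 if b > 127 else b for b in data]


def spectrogram(samples, frame_size=64, hop=32):
    """Magnitude spectrogram: Hann-windowed frames, naive DFT per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        mags = []
        for k in range(frame_size // 2 + 1):   # bins from 0 Hz up to Nyquist
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def bin_frequency(k, sample_rate, frame_size):
    """Centre frequency in Hz of DFT bin k for the assumed sample rate."""
    return k * sample_rate / frame_size
```

For real files, read the bytes with `open(path, "rb").read()` and swap the naive DFT for an FFT library. The same file yields the same picture at 8 kHz or 16 kHz; only the frequency labels on the axis change.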

Parsing HEVC for Motion Information

I parsed the HEVC stream by simply identifying the start code (000001 or 00000001), and now I am looking for the motion information in the NAL payload. My goal is to calculate the percentage of the motion information in the stream. Any ideas?
Your best bet is to start with the HM reference software (get it here: https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/trunk/) and add some debug output as the different kinds of data are read from the bitstream. This is likely much easier than writing a bitstream decoder from scratch.
Check out the debug that is built into the software already, for example RExt__DECODER_DEBUG_BIT_STATISTICS or DEBUG_CABAC_BINS. This may do what you want already, if not it will be pretty close. I think information about bit usage can be best collected in source/Lib/TLibDecoder/TDecBinCoderCABAC.cpp during decode.
If you need to speed this up, you can of course skip the actual decode steps :)
On the decoder side, the motion vector information is coded as MVDs (motion vector differences), so you need to follow the decoding process to recover the motion information. That requires understanding how inter prediction works in HEVC.
Thank you!
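For reference, the start-code scan from the question can be sketched as follows, together with reading nal_unit_type from the two-byte HEVC NAL header. The function names are made up for this example. Note that this only separates slice (VCL) NAL units from parameter sets and the like; the motion data inside a slice is CABAC-coded, so measuring its share still needs a real entropy decoder such as HM.

```python
def split_nal_units(stream):
    """Split an Annex B byte stream into NAL units (start codes removed)."""
    starts = []
    i = stream.find(b"\x00\x00\x01")
    while i >= 0:
        starts.append(i + 3)                    # payload begins after 00 00 01
        i = stream.find(b"\x00\x00\x01", i + 3)
    units = []
    for idx, begin in enumerate(starts):
        end = starts[idx + 1] - 3 if idx + 1 < len(starts) else len(stream)
        # A NAL unit never ends in 0x00, so trailing zeros belong to the
        # next (4-byte) start code or to stream padding.
        units.append(stream[begin:end].rstrip(b"\x00"))
    return units


def nal_unit_type(nal):
    """6-bit nal_unit_type from the first header byte."""
    return (nal[0] >> 1) & 0x3F


def vcl_byte_share(units):
    """Fraction of bytes in VCL NAL units (types 0..31, i.e. slice data)."""
    total = sum(len(u) for u in units)
    vcl = sum(len(u) for u in units if nal_unit_type(u) < 32)
    return vcl / total if total else 0.0
```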

How to detect silence and cut mp3 file without re-encoding using NAudio and .NET

I've been looking for an answer everywhere and I was only able to find some bits and pieces. What I want to do is to load multiple mp3 files (kind of temporarily merge them) and then cut them into pieces using silence detection.
My understanding is that I can use Mp3FileReader for this but the questions are:
1. How do I read say 20 seconds of audio from an mp3 file? Do I need to read 20 times reader.WaveFormat.AverageBytesPerSecond? Or maybe keep on reading frames until the sum of Mp3Frame.SampleCount / Mp3Frame.SampleRate exceeds 20 seconds?
2. How do I actually detect the silence? I would look at an appropriate number of the consecutive samples to check if they are all below some threshold. But how do I access the samples regardless of them being 8 or 16bit, mono or stereo etc.? Can I directly decode an MP3 frame?
3. After I have detected silence at say sample 10465, how do I map it back to the mp3 frame index to perform the cutting without re-encoding?
Here's the approach I'd recommend (which does involve re-encoding)
Use AudioFileReader to get your MP3 as floating point samples directly in the Read method
Find an open source noise gate algorithm, port it to C#, and use that to detect silence (i.e. when noise gate is closed, you have silence. You'll want to tweak threshold and attack/release times)
Create a derived ISampleProvider that uses the noise gate, and in its Read method, does not return samples that are in silence
Either: Pass the output into WaveFileWriter to create a WAV file and then encode the WAV file to MP3
Or: use NAudio.Lame to encode directly without a WAV step. You'll probably need to go from SampleProvider back down to 16 bit WAV provider first
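To illustrate the noise gate step, here is a very small gate in Python operating on floating point samples. It is a sketch under assumptions: the function name, the instant attack, and the default threshold and release time are placeholders you would tune, and it is not a port of any particular open source gate.

```python
import math


def gate_samples(samples, sample_rate, threshold=0.01, release_s=0.05):
    """Return only the samples for which a simple peak-envelope gate is open."""
    # Per-sample decay factor giving an exponential release of release_s seconds.
    release = math.exp(-1.0 / (release_s * sample_rate))
    env = 0.0
    kept = []
    for x in samples:
        level = abs(x)
        # Instant attack, exponential release.
        env = level if level > env else env * release
        if env > threshold:        # gate open: pass the sample through
            kept.append(x)
    return kept
```

In NAudio terms, the loop body is what the Read method of the derived ISampleProvider would do: read from the source, then copy only the gated samples into the output buffer and return how many were copied.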
BEFORE READING BELOW: Mark's answer is far easier to implement, and you'll almost certainly be happy with the results. This answer is for those who are willing to spend an inordinate amount of time on it.
So with that said, cutting an MP3 file based on silence without re-encoding or full decoding is actually possible... Basically, you can look at each frame's side info and each granule's gain & huffman data to "estimate" the silence.
Find the silence
Copy all the frames from before the silence to a new file
now it gets tricky...
Pull the audio data from the frames after the silence, keeping track of which frame header goes with what audio data.
Start writing the second new file, but as you write out the frames, update the main_data_begin field so the bit reservoir is in sync with where the audio data really is.
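As a starting point for that frame walk, this sketch parses one MPEG-1 Layer III frame header and reads the 9-bit main_data_begin field at the start of the side info. It is deliberately limited: MPEG-2/2.5 streams, free-format bitrates, and ID3 tags are not handled, and the function name and returned dictionary are invented for the example.

```python
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112,
                 128, 160, 192, 224, 256, 320, None]   # MPEG-1 Layer III
SAMPLE_RATES = [44100, 48000, 32000, None]             # MPEG-1


def parse_frame(buf, offset):
    """Parse the MPEG-1 Layer III frame at offset; return a dict or None."""
    if offset + 4 > len(buf):
        return None
    b1, b2 = buf[offset + 1], buf[offset + 2]
    if buf[offset] != 0xFF or (b1 & 0xE0) != 0xE0:
        return None                                    # no 11-bit frame sync
    if (b1 >> 3) & 0x03 != 0x03 or (b1 >> 1) & 0x03 != 0x01:
        return None                                    # not MPEG-1 Layer III
    has_crc = (b1 & 0x01) == 0                         # protection bit 0 = CRC
    bitrate = BITRATES_KBPS[b2 >> 4]
    rate = SAMPLE_RATES[(b2 >> 2) & 0x03]
    if bitrate is None or rate is None:
        return None
    padding = (b2 >> 1) & 0x01
    length = 144 * bitrate * 1000 // rate + padding    # frame size in bytes
    side = offset + 4 + (2 if has_crc else 0)          # side info start
    if side + 2 > len(buf):
        return None
    # main_data_begin: how many bytes before this frame its audio data starts.
    main_data_begin = (buf[side] << 1) | (buf[side + 1] >> 7)
    return {"length": length, "sample_rate": rate,
            "bitrate_kbps": bitrate, "main_data_begin": main_data_begin}
```

Stepping `offset += frame["length"]` walks the file frame by frame. A non-zero main_data_begin means the frame borrows from the bit reservoir of earlier frames, which is exactly why the second file cannot simply start at an arbitrary frame.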
MP3 is a compressed audio format. You can't just cut bits out and expect the remainder to still be a valid MP3 file. In fact, since it's a DCT-based transform, the bits are in the frequency domain instead of the time domain. There simply are no bits for sample 10465. There's a frame which contains sample 10465, and there's a set of bits describing all frequencies in that frame.
Plain cutting the audio at sample 10465 and continuing with some random other sample probably causes a discontinuity, which means the number of frequencies present in the resulting frame skyrockets. So that definitely means a full recode. The better way is to smooth the transition, but that's not a trivial operation. And the result is of course slightly different than the input, so it still means a recode.
I don't understand why you'd want to read 20 seconds of audio anyway. Where's that number coming from? You usually want to read everything.
Sound is a wave; it's entirely expected that it crosses zero. So being close to zero isn't special. For a 20 Hz wave (threshold of hearing), zero crossings happen 40 times per second, and each time you'll have multiple samples near zero. So you basically need multiple consecutive samples that are all close to zero, on both sides of it. 5 6 7 isn't much for 16-bit sound, but it might very well be part of a wave that will have a maximum at 10000. You really should check for at least 0.05 seconds (one full period at 20 Hz) to catch those 20 Hz sounds.
Since you detected silence in a 50 millisecond interval, you have a "position" that's roughly 2,200 samples wide at 44.1 kHz. With any bit of luck, there's a frame boundary in there. Cut there. Else it's time for reencoding.
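A sketch of that check in Python, assuming decoded floating point samples: find runs that stay below a threshold for at least 50 ms, then see whether a frame boundary falls inside the run. The names and defaults are illustrative, 1152 samples per frame applies to MPEG-1 Layer III only, and encoder delay and the bit reservoir are ignored here.

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III (MPEG-2/2.5 Layer III uses 576)


def find_silence(samples, sample_rate, threshold=0.01, min_seconds=0.05):
    """Return (start, end) index pairs where |x| < threshold for min_seconds."""
    min_len = int(min_seconds * sample_rate)
    runs = []
    start = None
    for i, x in enumerate(samples):
        if abs(x) < threshold:
            if start is None:
                start = i                  # a quiet run begins here
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    if start is not None and len(samples) - start >= min_len:
        runs.append((start, len(samples)))  # run reaches the end of the data
    return runs


def frame_boundary_in(run, samples_per_frame=SAMPLES_PER_FRAME):
    """Index of the first frame that starts inside the run, or None."""
    start, end = run
    frame = -(-start // samples_per_frame)  # ceiling division
    return frame if frame * samples_per_frame < end else None
```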

NAudio - Create software beat machine / sampler - General Strategy

I'm new to NAudio, but the goal of my project is to let the user listen to an MP3 and then select a "chunk" of that song as a sample which could be saved to disk. These samples should be able to be replayed at the same time (i.e. not merged but played at the same time).
Could someone please let me know the overall strategy required to achieve this (not necessarily the specifics, more like pseudocode)?
For example, would the samples / chunks of a song need to be saved as WAV files? And could these samples be played together in the WAV format, etc.?
I have seen a few small examples that implement some of the ideas I've mentioned above, but I don't have a good sense of the big picture just yet.
Thanks in advance,
Andrew
The chunks wouldn't need to be saved as WAV files unless you were keeping them for future use. You can store the PCM audio (Mp3FileReader automatically converts to PCM) in a byte array and use RawSourceWaveStream to play them.
As for mixing them, I'd recommend using the MixingSampleProvider. This does mean you need to convert your RawSourceWaveStream to IEEE float, but you can use Pcm16BitToSampleProvider to do this. This will give the advantage that you can adjust volumes (and do other DSP) easily on the samples you are mixing. MixingSampleProvider also auto-removes completed inputs, so you can just add new inputs whenever you want to trigger a sound.
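Conceptually, the convert-then-mix step looks like the Python below. This is not NAudio's implementation, just the arithmetic such a mixer performs; the function names and the hard clamp at the end are assumptions for the example.

```python
import struct


def pcm16_to_float(data):
    """Convert little-endian 16-bit PCM bytes to floats in [-1.0, 1.0)."""
    count = len(data) // 2
    ints = struct.unpack("<%dh" % count, data[:count * 2])
    return [s / 32768.0 for s in ints]


def mix(inputs, volumes=None):
    """Sum several float sample lists, padding shorter ones with silence."""
    volumes = volumes or [1.0] * len(inputs)
    length = max((len(x) for x in inputs), default=0)
    out = [0.0] * length
    for samples, vol in zip(inputs, volumes):
        for i, s in enumerate(samples):
            out[i] += s * vol              # per-input volume, then sum
    # Summed signals can exceed full scale, so clamp before output.
    return [max(-1.0, min(1.0, s)) for s in out]
```

Working in floating point is what makes the per-input volume a single multiplication, and it keeps headroom until the final clamp.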

Getting Pitch with VB.net

I want to get the pitch of a song at any point. I plan on storing the pitches later. How can I read say... an mp3 file or wav file to get the pitch played at a certain point?
Here is a visual example:
Say I wanted to get the pitch that is here at ^this point of the song.
Thanks if you can!
The matter is a tad more complicated than you may be anticipating.
While time-domain approaches exist (that is, approaches which work with the PCM data directly), frequency-domain pitch detection is going to be more accurate. You can read a very simplified overview here.
What you probably want is a Fourier Transform, which can be used to transform blocks of your signal from time-domain to frequency-domain (that is, a distribution of frequency content over a given span of the signal). From there, you would need to analyze the frequency spectrum within that block. The problem becomes even harder still, because there is no best way to deduce pitch from a sampled frequency spectrum in the general case. The aforementioned Wikipedia article should give you a foundation for looking into those algorithms.
Finally, it's worth noting that this is really a language-agnostic question, unless your primary interest is in reading a WAV file specifically using VB.NET.
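To give a feel for the simpler time-domain route mentioned above, here is a bare-bones autocorrelation pitch estimate in Python. It is a sketch only: the function name and the fmin/fmax search range are assumptions, it handles a single clean tone, and real music with several simultaneous notes needs the more elaborate frequency-domain methods.

```python
import math


def estimate_pitch(samples, sample_rate, fmin=50.0, fmax=1000.0):
    """Estimate the fundamental in Hz of a block via autocorrelation."""
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the block with a copy of itself delayed by `lag` samples.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    # The best-matching delay is one period of the waveform.
    return sample_rate / best_lag if best_lag else None
```

To get the pitch "at a certain point" in a song, decode the file to PCM, take a short block (tens of milliseconds) around that position, and run the estimate on that block.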