Is it possible to set Windows.Media.SpeechSynthesis stream format as in SAPI 5.3? - c++-winrt

I'm using Windows.Media.SpeechSynthesis (C++/WinRT) to convert text to an audio file. Previously I used SAPI, where it was possible to set the audio format when binding to a file via SPBindToFile(...) before speaking.
Is there any similar method in Windows.Media.SpeechSynthesis? It seems it is only possible to get a 16 kHz, 16-bit, mono wave stream, isn't it?
Does SpeechSynthesisStream already contain real audio data after speech synthesis, or does it hold precalculated raw data, with the actual encoding happening only when its data is accessed (playback on a device, or copying to another, non-speech-specific stream)?
Thank you!
I think it should be possible to control the speech synthesis stream format somehow.

The WinRT synthesis engines output 16 kHz, 16-bit mono data. There isn't any resampling layer to change the format.
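For illustration, here is a minimal C++/WinRT sketch (the coroutine wrapper, function name, and file handling are illustrative choices, not part of the API in question). Since the stream returned by SynthesizeTextToStreamAsync already holds the complete WAV payload in the engine's fixed format, saving it is a plain byte copy:

    #include <winrt/Windows.Foundation.h>
    #include <winrt/Windows.Media.SpeechSynthesis.h>
    #include <winrt/Windows.Storage.h>
    #include <winrt/Windows.Storage.Streams.h>

    using namespace winrt;
    using namespace Windows::Media::SpeechSynthesis;
    using namespace Windows::Storage;
    using namespace Windows::Storage::Streams;

    Windows::Foundation::IAsyncAction SaveSpeechToFileAsync(hstring text, StorageFile file)
    {
        SpeechSynthesizer synth;
        // The stream already contains a complete WAV payload (16 kHz, 16-bit, mono).
        SpeechSynthesisStream speech = co_await synth.SynthesizeTextToStreamAsync(text);

        Buffer buffer{ static_cast<uint32_t>(speech.Size()) };
        IBuffer filled = co_await speech.ReadAsync(buffer, buffer.Capacity(), InputStreamOptions::None);

        // No re-encoding happens here; this is a byte-for-byte copy of the synthesized data.
        co_await FileIO::WriteBufferAsync(file, filled);
    }

If a different sample rate or format is needed, the WAV data has to be converted in a separate step after synthesis, since the synthesis API itself offers no format knob.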

Related

How to Get Audio sample data from mp3 using NAudio

I want to read an MP3 file into one large array of audio samples, and I want the audio samples to be floats.
NAudio.Wave.WaveStream pcm = NAudio.Wave.WaveFormatConversionStream.CreatePcmStream(new NAudio.Wave.Mp3FileReader(OFD.FileName));
So far I get the PCM stream and can play it back fine, but I don't know how to read the raw data out of the stream.
Use AudioFileReader. This implements ISampleProvider, so its Read method lets you read directly into a float array of samples.
Alternatively, use the ToSampleProvider method after your Mp3FileReader. You don't need WaveFormatConversionStream, since Mp3FileReader (and AudioFileReader) already decompress the MP3 frames.
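A minimal sketch of the AudioFileReader route (the file name and buffer sizing are illustrative; AudioFileReader exposes the decoded audio as 32-bit floats):

    using NAudio.Wave;

    // AudioFileReader decodes the MP3 and presents it as 32-bit float samples.
    using (var reader = new AudioFileReader("input.mp3"))
    {
        // Length is in bytes; each float sample is 4 bytes.
        var samples = new float[(int)(reader.Length / 4)];
        int samplesRead = reader.Read(samples, 0, samples.Length);
    }

reader.WaveFormat reports the sample rate and channel count, should you need them for drawing or further processing.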

core audio: is zero equivalent to silence only for PCM audio?

I'm trying to create a basic algorithm that does packet loss concealment for Core Audio. I simply want to replace the missing data with silence. In the book Learning Core Audio, the author says that in lossless PCM, zeros mean silence. I was wondering: if I'm playing VBR (i.e., compressed) data, would putting in zeros suffice for silence as well?
In my existing code, when I plug zeros into the audio queue, it suddenly jams (i.e., it no longer frees up consumed data in the audio queue callback), and I'm wondering why.
PCM is the raw encoded sample. All zeros (when using signed data for samples) is indeed silence. (In fact, any constant value is silence, but such a DC offset has the potential to damage your amplifier and/or speakers if it isn't filtered out.)
When you compress with a lossy codec, you enter a digital format where it is not trivial to just add silence. Think of appending null bytes to a file stored inside a ZIP archive: it isn't as simple as inserting them arbitrarily into the ZIP file.
If you want to add silence to a compressed file, you must encode it with the appropriate codec. Then you have to fit it into the bitstream, which is also not trivial. Usually the stream is broken up into frames, but in some formats you can't even split on those frames. MP3 and AAC use a bit reservoir, where unused space in prior frames can be used to encode more complicated frames later on, making splitting the file very difficult.
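So concealment by zeroing only works on the decoded side. A minimal sketch, assuming signed 16-bit interleaved PCM (the function name and buffer layout are illustrative):

    #include <string.h>
    #include <stdint.h>

    /* Conceal a lost packet in decoded PCM by writing silence.
       For signed 16-bit samples, silence is the zeroed midpoint of the range;
       for unsigned 8-bit PCM the midpoint would instead be 0x80. */
    void conceal_lost_pcm(int16_t *buffer, size_t frames, size_t channels)
    {
        memset(buffer, 0, frames * channels * sizeof(int16_t));
    }

For compressed data in an audio queue, you would instead enqueue a properly encoded silent frame, or decode to PCM first and zero that.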

Record audio in OS X into FLAC using Cocoa

I am trying to record audio from a microphone/iSight camera on a Mac into an NSData object.
I have tried to do it using QTKit, but I found out that you can only save it as a .mov file.
The fact is that I want to encode the audio into a FLAC file. Is that possible, or will I need to use another framework?
Thanks.
Grab the source for VLC (if you can deal with the GPL -- it has limitations on use that many find onerous) and have a read. It does transcoding, amongst other things.
Beyond that, one dead-simple approach is to save as AIFF and then use a command-line tool (via NSTask) to do the conversion.
Or you could just go with Apple Lossless -- it is open source these days.
Of course, this also begs the question: why do you need lossless compression when recording voice [low bandwidth in the first place] via a relatively sub-par microphone?
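A minimal sketch of the NSTask route, assuming the flac command-line encoder is installed (the install location and file paths here are illustrative):

    // Convert a recorded AIFF file to FLAC by shelling out to the flac CLI.
    NSTask *task = [[NSTask alloc] init];
    task.launchPath = @"/usr/local/bin/flac";            // assumed install location
    task.arguments  = @[ @"-o", @"/tmp/recording.flac",  // output file
                         @"/tmp/recording.aiff" ];       // input file
    [task launch];
    [task waitUntilExit];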

When reading an audio file with ExtAudioFileRead, is it possible to read audio floats non-consecutively?

I'm trying to draw a waveform from an MP3 file.
I've succeeded in extracting floats using the ExtAudioFileReadTest app provided with the Core Audio SDK documentation (link: http://stephan.bernsee.com/ExtAudioFileReadTest.zip), but it reads the floats consecutively.
The problem is that my audio file is very long (about 1 hour), so reading the floats consecutively takes a very long time.
Is it possible to skip through the audio file, read a small portion of the audio, then skip again and read?
Yes, use ExtAudioFileSeek() to seek to the desired sample frame. It has some bugs depending on what format you're using (or did on 10.6), but MP3 should be OK.
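A minimal sketch of seek-then-read, assuming the client data format has already been set to mono 32-bit float via kExtAudioFileProperty_ClientDataFormat (the function name and the omitted error handling are illustrative):

    #include <AudioToolbox/AudioToolbox.h>

    // Decode only a small window of frames starting at an arbitrary position,
    // instead of reading the whole file consecutively.
    void ReadFloatsAt(ExtAudioFileRef file, SInt64 startFrame,
                      float *out, UInt32 frameCount)
    {
        ExtAudioFileSeek(file, startFrame);   // jump straight to the desired sample frame

        AudioBufferList bufferList;
        bufferList.mNumberBuffers = 1;
        bufferList.mBuffers[0].mNumberChannels = 1;  // mono client format assumed
        bufferList.mBuffers[0].mDataByteSize   = (UInt32)(frameCount * sizeof(float));
        bufferList.mBuffers[0].mData           = out;

        UInt32 frames = frameCount;
        ExtAudioFileRead(file, &frames, &bufferList); // decodes just this window
    }

Calling this once per screen column (or per waveform bucket) lets you build an overview of an hour-long file without decoding all of it.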

Scheme to play video file in own container format on Mac OS X

I am planning to write an application (C/C++/Objective-C) that will play media files in my own (private) container format. The files will contain: multiple video streams encoded by a video codec (like XviD or H.264; it is assumed that components capable of decoding the video formats are present in the system); and multiple audio streams in some compressed formats (it is assumed that decoding will be performed by a system component or by my own code).
So, it seems it is required to implement the following scheme:
1) Implement a container demuxer (perhaps in the form of a media handler component).
2) Pass video frames to a video decoder component, and mix the decompressed frames (using some rules of my own).
3) Pass audio data to an audio decoder component, or decompress the audio with my own code, and mix the decoded audio data.
4) Render video frames to a window.
5) Pass audio data to a selected audio board.
Could anybody provide tips on any of the above stages, that is: toolkits I should use, useful samples, maybe names of functions to be used, maybe improvements to the scheme, ...
I know I am quite late, so you might not need this anymore, but I just wanted to mention that the right way to do it is to write a QuickTime component.
Although it is pretty old school, it's the same way Apple itself supports new formats and codecs.
Look at the Perian project as an orientation point.
Best