Does changing the MCI_RECORD parameters affect SAPI speech recognition?

Initially I specified MCI_WAVE_SET_PARMS at recording time as follows:
MCI_WAVE_SET_PARMS mciSetParms;
mciSetParms.wFormatTag = WAVE_FORMAT_PCM;
mciSetParms.wBitsPerSample = 16;
mciSetParms.nChannels = 2;
mciSetParms.nSamplesPerSec = 11050;
Now if I change it to
MCI_WAVE_SET_PARMS mciSetParms;
mciSetParms.wFormatTag = WAVE_FORMAT_PCM;
mciSetParms.wBitsPerSample = 8;
mciSetParms.nChannels = 1;
mciSetParms.nSamplesPerSec = 8000;
Will it affect the speech recognition performed by SAPI?

Yes, it will. SAPI's recognizer really needs at least 11 kHz, 16-bit audio for good recognition. Giving it 8 kHz, 8-bit audio will substantially impair the recognition rate (if you get any recognitions at all).
Note: dropping the channels from 2 to 1 won't affect SAPI at all.
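If you want to keep using MCI while meeting that recommendation, here is a minimal sketch (not from the question; wDeviceID is assumed to be the device ID returned by your MCI_OPEN call) of a 16-bit, 11025 Hz mono setup, with nBlockAlign and nAvgBytesPerSec kept consistent with the other fields:
// Sketch only: 16-bit mono PCM at 11025 Hz, applied with MCI_SET.
// Requires <windows.h>/<mmsystem.h> and linking against winmm.lib.
MCI_WAVE_SET_PARMS mciSetParms = {};
mciSetParms.wFormatTag      = WAVE_FORMAT_PCM;
mciSetParms.wBitsPerSample  = 16;
mciSetParms.nChannels       = 1;
mciSetParms.nSamplesPerSec  = 11025;
mciSetParms.nBlockAlign     = mciSetParms.nChannels * (mciSetParms.wBitsPerSample / 8);  // 2 bytes per sample frame
mciSetParms.nAvgBytesPerSec = mciSetParms.nSamplesPerSec * mciSetParms.nBlockAlign;      // 22050 bytes per second
MCIERROR err = mciSendCommand(wDeviceID, MCI_SET,
    MCI_WAIT | MCI_WAVE_SET_FORMATTAG | MCI_WAVE_SET_BITSPERSAMPLE | MCI_WAVE_SET_CHANNELS |
    MCI_WAVE_SET_SAMPLESPERSEC | MCI_WAVE_SET_BLOCKALIGN | MCI_WAVE_SET_AVGBYTESPERSEC,
    (DWORD_PTR)&mciSetParms);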

I think it will affect recognition; the acoustic model used by the recognizer is trained on audio with particular characteristics, such as the sample rate.
See the following link:
http://en.wikipedia.org/wiki/Acoustic_Model

Why am I getting such a large alignment memory requirement for an image?

I create an image in Vulkan and I get an alignment requirement in the memory requirements of 131072. This seems like an enormous alignment and I'm not sure why anything larger than 128 or 256 bytes would be needed. It's so big that my memory allocation algorithm can't even handle it, and will never be able to practically handle it given that each allocation of this strict an alignment will waste too much space. What's the deal behind this? Here is how I create the image:
VkImageCreateInfo image_create_info{};
image_create_info.sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO;
image_create_info.imageType = VK_IMAGE_TYPE_2D;
image_create_info.pNext = nullptr;
image_create_info.sharingMode = VK_SHARING_MODE_EXCLUSIVE;
image_create_info.samples = VkSampleCountFlagBits::VK_SAMPLE_COUNT_1_BIT;
image_create_info.queueFamilyIndexCount = 0;
image_create_info.extent.width = 1716;
image_create_info.extent.height = 1731;
image_create_info.extent.depth = 1;
image_create_info.usage = VkImageUsageFlagBits::VK_IMAGE_USAGE_SAMPLED_BIT;
image_create_info.tiling = VkImageTiling::VK_IMAGE_TILING_OPTIMAL;
image_create_info.initialLayout = VkImageLayout::VK_IMAGE_LAYOUT_UNDEFINED;
image_create_info.flags = 0;
image_create_info.mipLevels = 1;
image_create_info.format = VK_FORMAT_R8G8B8A8_UINT;
image_create_info.arrayLayers = 1;
VkImage vk_image;
VkResult result = vkCreateImage((VkDevice)VK::logicalDevice, &image_create_info, nullptr, &vk_image);
VkMemoryRequirements requirements;
vkGetImageMemoryRequirements(VK::logicalDevice, vk_image, &requirements);
Another interesting thing about the requirements returned by the function is that the memory size requirement for format VK_FORMAT_R8G8B8A8_UINT is about 12 MB, which makes sense, but with a format of VK_FORMAT_R8G8B8_UINT (so without the alpha channel), it gives a size requirement of only 3 MB, about a quarter of the size. Have I run into some sort of bug?
I know the dimensions of the image I created aren't powers of two, but surely this shouldn't lead to such strange behaviour, should it?
It's so big that my memory allocation algorithm can't even handle it and will never be able to practically handle it given that each allocation of this strict an alignment will waste too much space.
Then fix that.
Implementations are allowed to require all kinds of alignments, especially for optimally-tiled images. 128KiB alignment is hardly unreasonable for images. So your sub-allocator needs to be able to account for this.
As for "waste too much space," perhaps you should take another look at those numbers. The example texture must take up at least 11'881'584 bytes. 128KiB is slightly more than 1% of that storage. That's not a lot of waste.

Audio through CAN FD into headphones

I am trying to record audio using a 12-bit ADC, take the sample buffer and send it over CAN FD to another device, which takes the audio samples, creates a .wav and plays it. The problem is that I can see the microphone data arriving at the other device over CAN FD, but I am not able to turn this data into a proper .wav file and hear what I said into the microphone; I only hear beeps.
I am creating a new .wav every 4 CAN FD messages in order to get some kind of real-time communication and keep the delay down, but I am not sure whether this is feasible or whether I am thinking about it the right way.
In this thread I take each message received over CAN FD and append it to a buffer, which I then write into a .wav file. I have tried bigger buffers, but that doesn't change the outcome.
How can I take the data from CAN FD and hear it?
Clarification: I know that using CAN FD to transmit audio isn't the proper way to do this, but it is for a master's project.
struct canfd_frame frame;
CAN_MSG msg;
int trama_can[72];
int nbytes;
while (status_libreria == 0)
    ;
unsigned char buffer[256];
// FILE * fPtr;
int i = 0, x = 0;
// fPtr = fopen("Test.txt", "w");
while (1) {
    do {
        nbytes = read(s, &frame, sizeof(struct canfd_frame));
    } while (nbytes == 0);
    msg.id.ext = frame.can_id;
    msg.dlc = frame.len;
    if (msg.dlc > 8)
        msg.dlc = 8; // Protection until AC3LIB is adapted to CAN FD
    Numas_memcpy(&(msg.data.bdata), &(frame.data), msg.dlc);
    can_frame_2_ac3lib(&msg, BUS_VERTICAL);
    for (x = 0; x < 64; x++) buffer[i * 64 + x] = frame.data[x];
    printf("%d \r\n", frame.data[x]);
    printf("i:%d \r\n", i);
    // Copy the data to the .wav file and play it at the same time
    if (i == 3) {
        printf("Datos IN\r\n");
        write_wav("prueba.wav", 256, (short int *)buffer, 16000);
        // fwrite(buffer, 1, sizeof(buffer), fPtr);
        // fclose(fPtr);
        system("aplay prueba.wav -f cd");
        i = 0;
        system("rm prueba.wav");
    }
    i++;
}
First 32 bytes of the audio file being recorded:
As you can see in the picture, the data is being recorded; moreover, it is the same data as in the ADC, but when I play it I only hear noise.
Simplify the problem first. Make sure you can transmit known data from one end to the other at low rates. I'm sure the suggestions below will sound far too trivial, but until you are absolutely confident you understand it all, I predict you will have many struggles.
Slowly - one frame per second, or even slower.
Learn to send one 0x55 byte from one end to the other and verify at the receiver.
Learn to send a few 0x55 in one frame and verify.
Learn to send 0x12345678 and verify it ends up with the bytes in the right order at the other end.
Learn to send a counter. Check it at the receiver and make sure you do not drop any data (a rough sketch of this step appears after this list).
Now do it all again, but 10x faster.
Continue until you can send a counter at 10x the rate you need for the audio without dropping any frames at all, for minutes and then hours.
Stress the rest of the system to make sure it still works under stress.
Only now, can you start to learn about sending audio.
Trust me, you will learn a lot!
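To make the counter step concrete, here is a rough SocketCAN sketch. Assumptions: the raw CAN FD socket s is already opened with CAN_RAW_FD_FRAMES enabled, as in the question; the identifier 0x123 and the one-second pacing are arbitrary; it needs <linux/can.h>, <linux/can/raw.h>, <unistd.h>, <string.h>, <stdio.h> and <stdint.h>.
// Sender: put an incrementing 32-bit counter into every CAN FD frame.
struct canfd_frame tx = {0};
uint32_t counter = 0;
tx.can_id = 0x123;                 // arbitrary test identifier
tx.len = sizeof(counter);
for (;;) {
    memcpy(tx.data, &counter, sizeof(counter));
    if (write(s, &tx, sizeof(tx)) != sizeof(tx)) { perror("write"); break; }
    counter++;
    sleep(1);                      // start at one frame per second, then speed up
}

// Receiver: verify that no counter value is ever skipped.
struct canfd_frame rx;
uint32_t expected = 0;
for (;;) {
    if (read(s, &rx, sizeof(rx)) < 0) { perror("read"); break; }
    uint32_t got;
    memcpy(&got, rx.data, sizeof(got));
    if (got != expected)
        printf("dropped frames: expected %u, got %u\r\n", expected, got);
    expected = got + 1;
}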

How to make Timer1 more accurate as a real time clock?

I have a PIC18F87J11 with an 8 MHz oscillator and I am using Timer1 as a real-time clock. At the moment I have it toggle an LED every minute. I noticed it works perfectly fine the first few times, but slowly it starts toggling the LED every 59 seconds, and every few minutes it keeps drifting down to 58, 57, and so on. I don't know if it's impossible to get an accurate clock from the internal oscillator or if I need an external oscillator. My Timer1 settings look right to me; I just hope I can resolve this issue with the current hardware.
Prescaler 1:8, TMR1 preload = 15536, actual interrupt time: 200 ms
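(For reference, those numbers are consistent: with the 8 MHz oscillator, Timer1 is clocked at Fosc/4 = 2 MHz; the 1:8 prescaler brings that to 250 kHz, i.e. 4 µs per tick, and 65536 - 15536 = 50000 ticks per overflow works out to the 200 ms interrupt time.)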
// Timer 1 Settings
RCONbits.IPEN = 1; // Enable interrupt system priority feature
INTCONbits.GIEL = 1; // Enable low priority interrupts
// 1:8 prescaler
T1CONbits.T1CKPS1 = 1;
T1CONbits.T1CKPS0 = 1;
// Use Internal Clock
T1CONbits.TMR1CS = 0;
// Timer1 overflow interrupt
PIE1bits.TMR1IE = 1;
IPR1bits.TMR1IP = 0; // Timer 1 -> Low priority interrupt group
PIE1bits.TMR1IE = 1; // Enable Timer1 interrupt
// TMR1 Preload = 15536;
TMR1H = 0x3C;
TMR1L = 0xB0;
Interrupt Routine
void interrupt low_priority lowISR(void) {
    if (PIR1bits.TMR1IF == 1) {
        oneSecond++;
        if (oneSecond == 5) {
            minute_Counter++;
            if (minute_Counter >= 60) {
                // One minute passed
                Printf("\r\n One minute Passed");
                ToggleLed();
                minute_Counter = 0;
            }
            oneSecond = 0;
        }
        // TMR1 Preload = 15536;
        TMR1H = 0x3C;
        TMR1L = 0xB0;
        PIR1bits.TMR1IF = 0;
    }
}
The internal oscillator is a simple RC oscillator (a resistor/capacitor time constant determines its frequency). This kind of circuit may be accurate to only +/-10% over the operating temperature range of the device, and the device will also be self-heating due to normal operating power dissipation.
An external crystal or other accurate external clock source is required to get accurate timing. Alternatively, if you have some other stable and accurate but low-frequency clock source, such as the output of an RTC with a 32768 Hz crystal, you can use it to calibrate the internal RC oscillator and dynamically adjust it with the OSCTUNE register: by using a timer gated by the low-frequency source, you can determine the actual frequency of INTOSC and adjust accordingly. It will not be perfect, but it will be better; of course, it can be no better than the precision of the calibrating source.
Some devices have a die temperature sensor that can also be used to compensate, but that is not available on your device.
The RC error can cause serial communications mistiming to the extent that you cannot communicate with a device using asynchronous (UART) serial comms.
There is some relevant material in the datasheet you linked, section "2.5.3 INTERNAL OSCILLATOR OUTPUT FREQUENCY AND TUNING" on page 38.
The datasheet says that "the INTOSC frequency may drift as VDD or temperature changes".
Are VDD and temperature stable?
It notes three ways to deal with this by tuning the OSCTUNE register. All three of them need an external "oscillator" of some kind:
dealing with the errors of the EUSART (this reference signal has to come from somewhere);
a peripheral clock;
the CCP module in capture mode (you may use any stable AC signal as input).
Good luck!
Reload the timer as soon as it expires; the delay between timer overflow and re-arming affects the total period, so this should solve your problem.
void interrupt low_priority lowISR(void)
{
    if (PIR1bits.TMR1IF)
    {
        PIR1bits.TMR1IF = 0;
        TMR1H = 0x3C;
        TMR1L = 0xAF;
        /* rest of the code here */
        . . . .
    }
}
One more recommendation: don't load up the ISR; keep it simple.
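A further refinement (not part of the answer above, just a sketch of a common technique) is to add the reload offset to whatever is already in the timer instead of writing a fixed constant, so the latency between overflow and reload stops accumulating as error. Writing TMR1L still resets the 1:8 prescaler, so this reduces the drift rather than eliminating it:
void interrupt low_priority lowISR(void)
{
    if (PIR1bits.TMR1IF)
    {
        // Ticks already elapsed since the overflow (the interrupt latency).
        unsigned int elapsed = ((unsigned int)TMR1H << 8) | TMR1L;
        // Fold them back into the reload so they are not lost each period.
        unsigned int reload = 15536u + elapsed;
        TMR1H = (unsigned char)(reload >> 8);
        TMR1L = (unsigned char)(reload & 0xFF);
        PIR1bits.TMR1IF = 0;
        /* rest of the ISR */
    }
}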
For all timing, time, and frequency applications, the first and most important thing to do is to CALIBRATE THE CRYSTAL OSCILLATOR! The oscillator and its crystal must run at their nominal frequency to better than 1 part per million (1 ppm). Crystals straight out of the factory (except some very specialized and expensive ones, costing hundreds of dollars) do not run exactly at their nominal frequency. If this calibration is not done, all time- and frequency-related functions will be off, because the oscillator frequency is used as the reference for all of the PIC's internal functions. The calibration must be done against an accurate frequency counter, by adjusting one of the capacitors from the crystal pins to ground. Processor routines for frequency (and time) calibration are not accurate enough.

How do I analyze video stream on iOS?

For example, there are QR scanners which scan a video stream in real time and extract the QR code information.
I would like to check from the video whether a light source is on or off; it is quite powerful, so that should not be a problem.
I will probably take a video stream as input, maybe turn it into images, and analyse the images or the stream in real time for the presence of the light source (maybe by counting the number of pixels of a certain colour in the image?).
How do I approach this problem? Maybe there is some sort of library for this?
It sounds like you are asking for information about several discrete steps. There are a multitude of ways to do each of them, and if you get stuck on any individual step it would be a good idea to post a question about it individually.
1: Get video Frame
Like chaitanya.varanasi said, the AVFoundation framework is the best way of getting access to a video frame on iOS. If you want something less flexible and quicker, try looking at OpenCV's video capture. The goal of this step is to get access to a pixel buffer from the camera. If you have trouble with this, ask about it specifically.
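As a point of comparison, the OpenCV capture route mentioned above is only a few lines. This is a generic, desktop-style C++ sketch; backend availability varies by platform, and on iOS you would normally still go through AVFoundation, so treat it as illustration only:
#include <opencv2/opencv.hpp>

cv::VideoCapture cap(0);            // open the default camera
cv::Mat frame;
while (cap.isOpened() && cap.read(frame))
{
    // 'frame' now holds a BGR pixel buffer you can analyse directly
}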
2: Put pixel buffer into OpenCV
This part is really easy. If you get it from OpenCV's video capture, you are already done. If you get it from AVFoundation, you will need to put it into OpenCV like this:
//Buffer is of type CVImageBufferRef, which is what AVFoundation should be giving you
//I assume it is BGRA or RGBA formatted, if it isn't, change CV_8UC4 to the appropriate format
CVPixelBufferLockBaseAddress( Buffer, 0 );
int bufferWidth = CVPixelBufferGetWidth(Buffer);
int bufferHeight = CVPixelBufferGetHeight(Buffer);
unsigned char *pixel = (unsigned char *)CVPixelBufferGetBaseAddress(Buffer);
cv::Mat image = cv::Mat(bufferHeight,bufferWidth,CV_8UC4,pixel); //put buffer in open cv, no memory copied
//Process image Here
//End processing
CVPixelBufferUnlockBaseAddress( Buffer, 0 );
Note that I am assuming you plan to do this in OpenCV, since you used its tag. Also, I assume you can get the OpenCV framework to link to your project. If that is an issue, ask a specific question about it.
3: Process Image
This part is by far the most open-ended. All you have said about your problem is that you are trying to detect a strong light source. One very quick and easy way of doing that would be to look at the mean pixel value of a greyscale image. If you get the image in colour, you can convert it with cvtColor. Then just call Avg on it to get the mean value. Hopefully you can tell whether the light is on by how that value fluctuates.
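A minimal sketch of that idea (cv::mean is the modern C++ equivalent of Avg; the threshold is something you would tune for your scene, and image is the Mat built in step 2):
cv::Mat grey;
cv::cvtColor(image, grey, cv::COLOR_BGRA2GRAY);   // convert the BGRA frame to greyscale
double meanBrightness = cv::mean(grey)[0];        // average grey level in the range 0..255
bool lightIsOn = (meanBrightness > 128.0);        // illustrative threshold; tune for your scene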
chaitanya.varanasi suggested another option; you should check it out too.
OpenCV is a very large library that can do a wide variety of things. Without knowing more about your problem, I don't know what else to tell you.
Look at the AVFoundation Framework from Apple.
Hope it helps!
You can try this method: start by getting all images into an AVCaptureVideoDataOutput. From the method captureOutput:didOutputSampleBuffer:fromConnection, you can sample/calculate every pixel. Source: answer
Also, you can take a look at this SO question where they check whether a pixel is black. If it is such a powerful light source, you can take the inverse of the pixel and then decide using a set threshold for black.
The above sample code only provides access to the pixel values stored in the buffer; you cannot run any other commands but those that change those values on a pixel-by-pixel basis:
for ( uint32_t y = 0; y < height; y++ )
{
    for ( uint32_t x = 0; x < width; x++ )
    {
        bgraImage.at<cv::Vec<uint8_t,4> >(y,x)[1] = 0;
    }
}
The following, to use your example, will not work with the code you provided:
cv::Mat bgraImage = cv::Mat( (int)height, (int)extendedWidth, CV_8UC4, base );
cv::Mat grey = bgraImage.clone();
cv::cvtColor(grey, grey, 44);

Streaming non-PCM raw audio using NAudio

I'm hell-bent on making this work with NAudio, so please tell me if there's a way around this. I have streaming raw audio coming in from a serial device, which I'm trying to play through WaveOut.
Attempt 1:
' Constants: SampleRate = 8000, Channels = 1, AverageBPS = 8000 * 1, BlockAlign = 1, BitsPerSample = 8
Dim CustomWaveOutFormat = WaveFormat.CreateCustomFormat(WaveFormatEncoding.Pcm, SampleRate, Channels, AverageBPS, BlockAlign, BitsPerSample)
Dim rawStream = New RawSourceWaveStream(VoicePort.BaseStream, CustomWaveOutFormat)
'Run in background
Dim waveOut = New WaveOut(WaveCallbackInfo.FunctionCallback())
'Play stream
waveOut.Init(rawStream)
waveOut.Play()
This code works, but there's a tiny problem: the actual audio stream isn't raw PCM, it's raw mu-law, and it plays out the companding like Beethoven's 5th on a cheese-grater. If I change the WaveFormat to WaveFormatEncoding.MuLaw, I get a bad format exception because it's raw audio and there are no RIFF headers.
So I moved over to converting it to PCM:
Attempt 2:
Dim reader = New MuLawWaveStream(VoicePort.BaseStream, SampleRate, Channels)
Dim pcmStream = WaveFormatConversionStream.CreatePcmStream(reader)
Dim waveOutStream = New BlockAlignReductionStream(pcmStream)
waveOut.Init(waveOutStream)
Here, CreatePcmStream tries to get the length of the stream (even though CanSeek = false) and fails.
Attempt 3
waveOutStream = New BufferedWaveProvider(WaveFormat.CreateMuLawFormat(SampleRate, Channels))
*add samples when OnDataReceived()*
It, too, seems to suffer from the lack of a header.
I'm hoping there's something minor I missed in all of this. The device only streams audio when in use, and no data is received otherwise - a case which is handled by (1).
To make attempt (1) work, your RawSourceWaveStream should specify the format that the data really is in. Then just use WaveFormatConversionStream.CreatePcmStream, taking that stream as the input:
Dim muLawStream = New RawSourceWaveStream(VoicePort.BaseStream, WaveFormat.CreateMuLawFormat(SampleRate, Channels))
Dim pcmStream = WaveFormatConversionStream.CreatePcmStream(muLawStream)
Attempt (2) is actually very close to working. You just need to make MuLawStream.Length return 0. You don't need it for what you are doing. BlockAlignReductionStream is irrelevant to mu-law as well since mu law block align is 1.
Attempt (3) should work. I don't know what you mean by lack of a header?
In NAudio you are building a pipeline of audio data. Each stage in the pipeline can have a different format. Your audio starts off in mu-law, then gets converted to PCM, and then it can be played. A BufferedWaveProvider is used when you want playback to continue even though your device has stopped providing audio data.
Edit: I should add that the IWaveProvider interface in NAudio is a simplified WaveStream. It has only a format and a Read method, and it is useful for situations where Length is unknown and repositioning is not possible.