I am making HID for some data acquisition system. There are a lot of sensors who store test data and when I need I get to them and connect via USB and take it. USB host sent 3 bytes and USB device, if bytes are correct, sends its stored data. Sounds simple.
Previously it was implemented on PC, but now I try to implement it on STM32F769 Discovery and have some serious problems.
I am using ARM Keil 5.27, code generated with STM32CubeMX 5.3.0. I tried just to make a plain simple program, later to integrate with the entire touchscreen interface. I tried to implement this code in main:
if (HAL_GPIO_ReadPin(BUTTON_GPIO_Port, BUTTON_Pin))
while (HAL_GPIO_ReadPin(BUTTON_GPIO_Port, BUTTON_Pin))
{
Transmission_function();
}
And the function itself:
#define DLE 0x10
#define STX 0x2
uint8_t tx_buf[]={DLE, STX, 120}, RX_FLAG;
uint32_t size_tx=sizeof(tx_buf);
void Transmission_function (void)
{
if (Appli_state == APPLICATION_READY)
{
i=0;
USBH_CDC_Transmit(&hUsbHostHS, tx_buf, size_tx);
HAL_Delay(50);
RX_FLAG=0;
}
}
It should send the message after I press the blue button on the Discovery board. All that I get is Hard Fault. While trying to debug, I tried manually to check after which action I get this error and it was functioning in stm32f7xx_ll_usb.c:
HAL_StatusTypeDef USB_WritePacket(USB_OTG_GlobalTypeDef *USBx, uint8_t *src,
uint8_t ch_ep_num, uint16_t len, uint8_t dma)
{
uint32_t USBx_BASE = (uint32_t)USBx;
uint32_t *pSrc = (uint32_t *)src;
uint32_t count32b, i;
if (dma == 0U)
{
count32b = ((uint32_t)len + 3U) / 4U;
for (i = 0U; i < count32b; i++)
{
USBx_DFIFO((uint32_t)ch_ep_num) = *((__packed uint32_t *)pSrc);
pSrc++;
}
}
return HAL_OK;
}
But trying to scroll back in disassembly I notice, that just before Hard Fault program was in this function inside stm32f7xx_hal_hcd.c, in case GRXSTS_PKTSTS_IN:
static void HCD_RXQLVL_IRQHandler(HCD_HandleTypeDef *hhcd)
{
USB_OTG_GlobalTypeDef *USBx = hhcd->Instance;
uint32_t USBx_BASE = (uint32_t)USBx;
uint32_t pktsts;
uint32_t pktcnt;
uint32_t temp;
uint32_t tmpreg;
uint32_t ch_num;
temp = hhcd->Instance->GRXSTSP;
ch_num = temp & USB_OTG_GRXSTSP_EPNUM;
pktsts = (temp & USB_OTG_GRXSTSP_PKTSTS) >> 17;
pktcnt = (temp & USB_OTG_GRXSTSP_BCNT) >> 4;
switch (pktsts)
{
case GRXSTS_PKTSTS_IN:
/* Read the data into the host buffer. */
if ((pktcnt > 0U) && (hhcd->hc[ch_num].xfer_buff != (void *)0))
{
(void)USB_ReadPacket(hhcd->Instance, hhcd->hc[ch_num].xfer_buff, (uint16_t)pktcnt);
/*manage multiple Xfer */
hhcd->hc[ch_num].xfer_buff += pktcnt;
hhcd->hc[ch_num].xfer_count += pktcnt;
if ((USBx_HC(ch_num)->HCTSIZ & USB_OTG_HCTSIZ_PKTCNT) > 0U)
{
/* re-activate the channel when more packets are expected */
tmpreg = USBx_HC(ch_num)->HCCHAR;
tmpreg &= ~USB_OTG_HCCHAR_CHDIS;
tmpreg |= USB_OTG_HCCHAR_CHENA;
USBx_HC(ch_num)->HCCHAR = tmpreg;
hhcd->hc[ch_num].toggle_in ^= 1U;
}
}
break;
case GRXSTS_PKTSTS_DATA_TOGGLE_ERR:
break;
case GRXSTS_PKTSTS_IN_XFER_COMP:
case GRXSTS_PKTSTS_CH_HALTED:
default:
break;
}
}
Last few lines from Dissasembly shows this:
0x080018B4 E8BD81F0 POP {r4-r8,pc}
0x080018B8 0000 DCW 0x0000
0x080018BA 1FF8 DCW 0x1FF8
Why it fails? How could I fix it? I do not have much experience with USB protocol.
I will post my walkaround this, but I am not sure why it worked. Solution was to use EXTI0 interrupt instead of just detection if PA0 is high, as I showed I used here:
if (HAL_GPIO_ReadPin(BUTTON_GPIO_Port, BUTTON_Pin))
while (HAL_GPIO_ReadPin(BUTTON_GPIO_Port, BUTTON_Pin))
Transmission_function();
I changed it to this:
void EXTI0_IRQHandler(void)
{
/* USER CODE BEGIN EXTI0_IRQn 0 */
if(Appli_state == APPLICATION_READY){
USBH_CDC_Transmit(&hUsbHostHS, Buffer, 3);
}
/* USER CODE END EXTI0_IRQn 0 */
HAL_GPIO_EXTI_IRQHandler(GPIO_PIN_0);
/* USER CODE BEGIN EXTI0_IRQn 1 */
/* USER CODE END EXTI0_IRQn 1 */
}
Originally the problem appeared when I tried to optimize an algorithm for neon arm and some minor part of it was taking 80% of according to profiler. I tried to test to see what can be done to improve it and for that I created array of function pointers to different versions of my optimized function and then I run them in the loop to see in profiler which one performs better:
typedef unsigned(*CalcMaxFunc)(const uint16_t a[8][4], const uint16_t b[4][4]);
CalcMaxFunc CalcMaxFuncs[] =
{
CalcMaxFunc_NEON_0,
CalcMaxFunc_NEON_1,
CalcMaxFunc_NEON_2,
CalcMaxFunc_NEON_3,
CalcMaxFunc_C_0
};
int N = sizeof(CalcMaxFunc) / sizeof(CalcMaxFunc[0]);
for (int i = 0; i < 10 * N; ++i)
{
auto f = CalcMaxFunc[i % N];
unsigned retI = f(a, b);
// just random code to ensure that cpu waits for the results
// and compiler doesn't optimize it away
if (retI > 1000000)
break;
ret |= retI;
}
I got surprising results: performance of a function was totally depend on its position within CalcMaxFuncs array. For example, when I swapped CalcMaxFunc_NEON_3 to be first it would be 3-4 times slower and according to profiler it would stall at the last bit of the function where it tried to move data from neon to arm register.
So, what does it make stall sometimes and not in other times? BY the way, I profile on iPhone6 in xcode if that matters.
When I intentionally introduced neon pipeline stalls by mixing-in some floating point division between calling these functions in the loop I eliminated unreliable behavior, now all of them perform the same regardless of the order in which they were called. So, why in the first place did I have that problem and what can I do to eliminate it in actual code?
Update:
I tried to create a simple test function and then optimize it in stages and see how I could possibly avoid neon->arm stalls.
Here's the test runner function:
void NeonStallTest()
{
int findMinErr(uint8_t* var1, uint8_t* var2, int size);
srand(0);
uint8_t var1[1280];
uint8_t var2[1280];
for (int i = 0; i < sizeof(var1); ++i)
{
var1[i] = rand();
var2[i] = rand();
}
#if 0 // early exit?
for (int i = 0; i < 16; ++i)
var1[i] = var2[i];
#endif
int ret = 0;
for (int i=0; i<10000000; ++i)
ret += findMinErr(var1, var2, sizeof(var1));
exit(ret);
}
And findMinErr is this:
int findMinErr(uint8_t* var1, uint8_t* var2, int size)
{
int ret = 0;
int ret_err = INT_MAX;
for (int i = 0; i < size / 16; ++i, var1 += 16, var2 += 16)
{
int err = 0;
for (int j = 0; j < 16; ++j)
{
int x = var1[j] - var2[j];
err += x * x;
}
if (ret_err > err)
{
ret_err = err;
ret = i;
}
}
return ret;
}
Basically it it does sum of squared difference between each uint8_t[16] block and returns index of the block pair that has lowest squared difference. So, then I rewrote it in neon intrisics (no particular attempt was made to make it fast, as it's not the point):
int findMinErr_NEON(uint8_t* var1, uint8_t* var2, int size)
{
int ret = 0;
int ret_err = INT_MAX;
for (int i = 0; i < size / 16; ++i, var1 += 16, var2 += 16)
{
int err;
uint8x8_t var1_0 = vld1_u8(var1 + 0);
uint8x8_t var1_1 = vld1_u8(var1 + 8);
uint8x8_t var2_0 = vld1_u8(var2 + 0);
uint8x8_t var2_1 = vld1_u8(var2 + 8);
int16x8_t s0 = vreinterpretq_s16_u16(vsubl_u8(var1_0, var2_0));
int16x8_t s1 = vreinterpretq_s16_u16(vsubl_u8(var1_1, var2_1));
uint16x8_t u0 = vreinterpretq_u16_s16(vmulq_s16(s0, s0));
uint16x8_t u1 = vreinterpretq_u16_s16(vmulq_s16(s1, s1));
#ifdef __aarch64__1
err = vaddlvq_u16(u0) + vaddlvq_u16(u1);
#else
uint32x4_t err0 = vpaddlq_u16(u0);
uint32x4_t err1 = vpaddlq_u16(u1);
err0 = vaddq_u32(err0, err1);
uint32x2_t err00 = vpadd_u32(vget_low_u32(err0), vget_high_u32(err0));
err00 = vpadd_u32(err00, err00);
err = vget_lane_u32(err00, 0);
#endif
if (ret_err > err)
{
ret_err = err;
ret = i;
#if 0 // enable early exit?
if (ret_err == 0)
break;
#endif
}
}
return ret;
}
Now, if (ret_err > err) is clearly data hazard. Then I manually "unrolled" loop by two and modified code to use err0 and err1 and check them after performing next round of compute. According to profiler I got some improvements. In simple neon loop I got roughly 30% of entire function spent in the two lines vget_lane_u32 followed by if (ret_err > err). After I unrolled by two these operations started to take 25% (e.g. I got roughly 10% overall speedup). Also, check armv7 version, there is only 8 instructions between when err0 is set (vmov.32 r6, d16[0]) and when it's accessed (cmp r12, r6). T
Note, in the code early exit is ifdefed out. Enabling it would make function even slower. If I unrolled it by four and changed to use four errN variables and deffer check by two rounds then I still saw vget_lane_u32 in profiler taking too much time. When I checked generated asm, appears that compiler destroys all the optimizations attempts because it reuses some of the errN registers which effectively makes CPU access results of vget_lane_u32 much earlier than I want (and I aim to delay access by 10-20 instructions). Only when I unrolled by 4 and marked all four errN as volatile vget_lane_u32 totally disappeared from the radar in profiler, however, the if (ret_err > errN) check obviously got slow as hell as now these probably ended up as regular stack variables overall these 4 checks in 4x manual loop unroll started to take 40%. Looks like with proper manual asm it's possible to make it work properly: have early loop exit, while avoiding neon->arm stalls and have some arm logic in the loop, however, extra maintenance required to deal with arm asm makes it 10x more complex to maintain that kind of code in a large project (that doesn't have any armasm).
Update:
Here's sample stall when moving data from neon to arm register. To implement early exist I need to move from neon to arm once per loop. This move alone takes more than 50% of entire function according to sampling profiler that comes with xcode. I tried to add lots of noops before and/or after the mov, but nothing seems to affect results in profiler. I tried to use vorr d0,d0,d0 for noops: no difference. What's the reason for the stall, or the profiler simply shows wrong results?
I'm trying to comunicate a pic18f24k50 with an arduino 101. I'm using two lines to establish a synchronized comunication. On every change from low to high of the first line, I read the value from the second line.
I have no problems with the arduino code, my problem is that in the pic, the Interrupt on change triggers on the second change from low to high instead of triggering on the first change. This only happens the first time I send data, after that, it works perfectly (it triggers on the first change and I receive the byte properly). Sorry for my english, I'll try to explain myself better with this image:
Channel 1 is the clock signal, channel2 is the data (Im sending one byte for the moment, with bit values 10101010). Channel 4 is an output I'm changing every time I process a bit. (as you can see, it begins on the second rise of the clock signal instead of the first one). This is captured on the first sent byte, the next ones works ok.
I post the relevant code in the pic here:
This is where I initialize things:
TRISCbits.TRISC6 = 0;
TRISCbits.TRISC1 = 1;
TRISCbits.TRISC2 = 1;
IOCC1 = 1;
ANSELCbits.ANSC2=0;
IOCC2 = 0;
INTCONbits.IOCIE = 1;
INTCONbits.IOCIF = 0;
And this is on the interrupt code:
void interrupt SYS_InterruptHigh(void)
{
if (INTCONbits.IOCIE==1 && INTCONbits.IOCIF==1)
{
readByte();
}
}
void readByte(void)
{
while(contaBits<8)
{
INTCONbits.IOCIE = 0;
INTCONbits.IOCIF = 0;
while (PORTCbits.RC1 != HIGH)
{
}
if (PORTCbits.RC1 == HIGH)
{
LATCbits.LATC6 = !LATCbits.LATC6;
//LATCbits.LATC6 = ~LATCbits.LATC6;
switch (contaBits)
{
case 0:
if (PORTCbits.RC2 == HIGH)
varByte.b0 = 1;
else
varByte.b0 = 0;
break;
case 1:
if (PORTCbits.RC2 == HIGH)
varByte.b1 = 1;
else
varByte.b1 = 0;
break;
case 2:
if (PORTCbits.RC2 == HIGH)
varByte.b2 = 1;
else
varByte.b2 = 0;
break;
case 3:
if (PORTCbits.RC2 == HIGH)
varByte.b3 = 1;
else
varByte.b3 = 0;
break;
case 4:
if (PORTCbits.RC2 == HIGH)
varByte.b4 = 1;
else
varByte.b4 = 0;
break;
case 5:
if (PORTCbits.RC2 == HIGH)
varByte.b5 = 1;
else
varByte.b5 = 0;
break;
case 6:
if (PORTCbits.RC2 == HIGH)
varByte.b6 = 1;
else
varByte.b6 = 0;
break;
case 7:
if (PORTCbits.RC2 == HIGH)
varByte.b7 = 1;
else
varByte.b7 = 0;
break;
}
contaBits++;
}
}//while(contaBits<8)
INTCONbits.IOCIE = 1;
contaBits=0;
}
LATCbits.LATC6 = !LATCbits.LATC6; <-- this is the line corresponding to channel 4.
RC1 is channel 1
and RC2 is channel 2
My question is what am I doing wrong, why on the first sent of bytes the interrupt doesn't triggers on the first change of the line 1?
Thank you.
What are you trying to achieve?
The communication protocol as you're describing is commonly refered to as SPI (Serial Peripheral Interface). You should use the hardware implementation available by PIC/Microchip, if possible, for best performance.
Keep your code documented/formatted/logic
I noticed your code being a little, weird.
Weird code gives weird errors.
First:
You're using interrupts, you have this blocking code in your interrupt. Which isn't nice at all.
Second:
Your "contabits" and "varByte" variable come out of nowhere, probably they're global. Which might not be the best practice.
Though, if you're using interrupts, and when it might make sense to transfer the variables to your main program by using a global. The global should be volatile.
Third:
That case switch is just 8x the same code, but a little different.
Also, add some comments to your code.
To be honest
I didn't spot your actual error, has been a long time since I worked with PIC.
But, you should try hardware SPI.
Below is my attempt at software SPI as you described, it might be a little more logical, and takes out any annoyance of the interrupts.
I would recommend replacing the while(CLOCK_PIN_MACRO != 1){}; with for(unsigned int timeout = 64000; timeout > 0; timeout--){};.
So that you can be sure your program won't hang while waiting for an SPI signal that won't come. (Make the timeout something what fits your needs)
#define CLOCK_PIN_MACRO PORTCbits.RC1
#define DATA_PIN_MACRO PORTCbits.RC2
/*
Test, without interrupts.
I would strongly advise to use Hardware SPI or non-blocking code.
*/
unsigned char readByte(void){
unsigned char returnByte = 0;
for(int i = 0; i < 8; i++){ //Do this for bit 0 to 7, to get a complete byte.
while(CLOCK_PIN_MACRO != 1){}; //Wait until the Clock pin goes high.
if(DATA_PIN_MACRO == 1) //Check the data pin.
returnByte = returnByte & (1 << i); //Shift the bit in our byte. (https://www.cs.umd.edu/class/sum2003/cmsc311/Notes/BitOp/bitshift.html)
while(CLOCK_PIN_MACRO == 1){}; //Wait untill the Clock pin goes low again.
}
return returnByte;
}
void setup(){
TRISCbits.TRISC6 = 0;
TRISCbits.TRISC1 = 1;
TRISCbits.TRISC2 = 1;
//IOCC1 = 1;
ANSELCbits.ANSC2=0;
//IOCC2 = 0;
INTCONbits.IOCIE = 1;
INTCONbits.IOCIF = 0;
}
void run(){
unsigned char theByte = readByte();
}
main(){
setup();
while(1){
run();
}
}
You can find the following from the datasheet.
The pins are compared with the old value latched on the last read of
PORTB. The "mismatch" outputs of RB7:RB4 are ORed together to generate
the RB Port Change Interrupt with Flag bit,RBIF (INTCON<0>).
This means that when you read PORTB RB7:RB4 pins, the value of each pin will be stored in an internal latch. Interrupt will be generated only when there is any change in the pin input from the previously latched value.
Then, what will be the default initial value in the latch? i.e. Before we read anything from PORTB.
What ever it is. in your case the default latch value is different from the logic state of your input signal. During the next transition of the signal the latch value and signal level becomes same. So no intterrupt is generated first time.
What you can do is setting the latch value to the initial state of your signal. This can be done by reading PORTB (should be done before enabling the intterrupt).
The below code is enough to simply read the register.
unsigned char ch;
ch = PORTB;
Hope this will help.
I am trying to do a simple playback from a file functionality and it appears that my callback function is never called. It doesn't really make sense because all of the OSStatuses come back 0 and other numbers all appear correct as well (like the output packets read pointer from AudioFileReadPackets).
Here is the setup:
OSStatus stat;
stat = AudioFileOpenURL(
(CFURLRef)urlpath, kAudioFileReadPermission, 0, &aStreamData->aFile
);
UInt32 dsze = 0;
stat = AudioFileGetPropertyInfo(
aStreamData->aFile, kAudioFilePropertyDataFormat, &dsze, 0
);
stat = AudioFileGetProperty(
aStreamData->aFile, kAudioFilePropertyDataFormat, &dsze, &aStreamData->aDescription
);
stat = AudioQueueNewOutput(
&aStreamData->aDescription, bufferCallback, aStreamData, NULL, NULL, 0, &aStreamData->aQueue
);
aStreamData->pOffset = 0;
for(int i = 0; i < NUM_BUFFERS; i++) {
stat = AudioQueueAllocateBuffer(
aStreamData->aQueue, aStreamData->aDescription.mBytesPerPacket, &aStreamData->aBuffer[i]
);
bufferCallback(aStreamData, aStreamData->aQueue, aStreamData->aBuffer[i]);
}
stat = AudioQueuePrime(aStreamData->aQueue, 0, NULL);
stat = AudioQueueStart(aStreamData->aQueue, NULL);
(Not shown is where I'm checking the value of stat in between the functions, it just comes back normal.)
And the callback function:
void bufferCallback(void *uData, AudioQueueRef queue, AudioQueueBufferRef buffer) {
UInt32 bread = 0;
UInt32 pread = buffer->mAudioDataBytesCapacity / player->aStreamData->aDescription.mBytesPerPacket;
OSStatus stat;
stat = AudioFileReadPackets(
player->aStreamData->aFile, false, &bread, NULL, player->aStreamData->pOffset, &pread, buffer->mAudioData
);
buffer->mAudioDataByteSize = bread;
stat = AudioQueueEnqueueBuffer(queue, buffer, 0, NULL);
player->aStreamData->pOffset += pread;
}
Where aStreamData is my user data struct (typedefed so I can use it as a class property) and player is a static instance of the controlling Objective-C class. If any other code is wanted please let me know. I am a bit at my wit's end. Printing any of the numbers involved here yields the correct result, including functions in bufferCallback when I call it myself in the allocate loop. It just never gets called thereafter. The start up method returns and nothing happens.
Also anecdotally, I am using a peripheral device (an MBox Pro 3) to play the sound which CoreAudio only boots up when it is about to output. IE if I start iTunes or something, the speakers pop faintly and there is an LED that goes from blinking to solid. The device boots up like it does so CA is definitely doing something. (Also I've of course tried it with the onboard Macbook sound sans the device.)
I've read other solutions to problems that sound similiar and they don't work. Stuff like using multiple buffers which I am doing now and doesn't appear to make any difference.
I basically assume I am doing something obviously wrong somehow but not sure what it could be. I've read the relevant documentation, looked at the available code examples and scoured the net a bit for answers and it appears that this is all I need to do and it should just go.
At the very least, is there anything else I can do to investigate?
My first answer was not good enough, so I compiled a minimal example that will play a 2 channel, 16 bit wave file.
The main difference from your code is that I made a property listener listening for play start and stop events.
As for your code, it seems legit at first glance. Two things I will point out, though:
1. Is seems you are allocating buffers with TOO SMALL a buffer size. I have noticed that AudioQueues won't play if the buffers are too small, which seems to fit your problem.
2. Have you verified the properties returned?
Back to my code example:
Everything is hard coded, so it is not exactly good coding practice, but it shows how you can do it.
AudioStreamTest.h
#import <Foundation/Foundation.h>
#import <AudioToolbox/AudioToolbox.h>
uint32_t bufferSizeInSamples;
AudioFileID file;
UInt32 currentPacket;
AudioQueueRef audioQueue;
AudioQueueBufferRef buffer[3];
AudioStreamBasicDescription audioStreamBasicDescription;
#interface AudioStreamTest : NSObject
- (void)start;
- (void)stop;
#end
AudioStreamTest.m
#import "AudioStreamTest.h"
#implementation AudioStreamTest
- (id)init
{
self = [super init];
if (self) {
bufferSizeInSamples = 441;
file = NULL;
currentPacket = 0;
audioStreamBasicDescription.mBitsPerChannel = 16;
audioStreamBasicDescription.mBytesPerFrame = 4;
audioStreamBasicDescription.mBytesPerPacket = 4;
audioStreamBasicDescription.mChannelsPerFrame = 2;
audioStreamBasicDescription.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
audioStreamBasicDescription.mFormatID = kAudioFormatLinearPCM;
audioStreamBasicDescription.mFramesPerPacket = 1;
audioStreamBasicDescription.mReserved = 0;
audioStreamBasicDescription.mSampleRate = 44100;
}
return self;
}
- (void)start {
AudioQueueNewOutput(&audioStreamBasicDescription, AudioEngineOutputBufferCallback, (__bridge void *)(self), NULL, NULL, 0, &audioQueue);
AudioQueueAddPropertyListener(audioQueue, kAudioQueueProperty_IsRunning, AudioEnginePropertyListenerProc, NULL);
AudioQueueStart(audioQueue, NULL);
}
- (void)stop {
AudioQueueStop(audioQueue, YES);
AudioQueueRemovePropertyListener(audioQueue, kAudioQueueProperty_IsRunning, AudioEnginePropertyListenerProc, NULL);
}
void AudioEngineOutputBufferCallback(void *inUserData, AudioQueueRef inAQ, AudioQueueBufferRef inBuffer) {
if (file == NULL) return;
UInt32 bytesRead = bufferSizeInSamples * 4;
UInt32 packetsRead = bufferSizeInSamples;
AudioFileReadPacketData(file, false, &bytesRead, NULL, currentPacket, &packetsRead, inBuffer->mAudioData);
inBuffer->mAudioDataByteSize = bytesRead;
currentPacket += packetsRead;
if (bytesRead == 0) {
AudioQueueStop(inAQ, false);
}
else {
AudioQueueEnqueueBuffer(inAQ, inBuffer, 0, NULL);
}
}
void AudioEnginePropertyListenerProc (void *inUserData, AudioQueueRef inAQ, AudioQueuePropertyID inID) {
//We are only interested in the property kAudioQueueProperty_IsRunning
if (inID != kAudioQueueProperty_IsRunning) return;
//Get the status of the property
UInt32 isRunning = false;
UInt32 size = sizeof(isRunning);
AudioQueueGetProperty(inAQ, kAudioQueueProperty_IsRunning, &isRunning, &size);
if (isRunning) {
currentPacket = 0;
NSString *fileName = #"/Users/roy/Documents/XCodeProjectsData/FUZZ/03.wav";
NSURL *fileURL = [[NSURL alloc] initFileURLWithPath: fileName];
AudioFileOpenURL((__bridge CFURLRef) fileURL, kAudioFileReadPermission, 0, &file);
for (int i = 0; i < 3; i++){
AudioQueueAllocateBuffer(audioQueue, bufferSizeInSamples * 4, &buffer[i]);
UInt32 bytesRead = bufferSizeInSamples * 4;
UInt32 packetsRead = bufferSizeInSamples;
AudioFileReadPacketData(file, false, &bytesRead, NULL, currentPacket, &packetsRead, buffer[i]->mAudioData);
buffer[i]->mAudioDataByteSize = bytesRead;
currentPacket += packetsRead;
AudioQueueEnqueueBuffer(audioQueue, buffer[i], 0, NULL);
}
}
else {
if (file != NULL) {
AudioFileClose(file);
file = NULL;
for (int i = 0; i < 3; i++) {
AudioQueueFreeBuffer(audioQueue, buffer[i]);
buffer[i] = NULL;
}
}
}
}
-(void)dealloc {
[super dealloc];
AudioQueueDispose(audioQueue, true);
audioQueue = NULL;
}
#end
Lastly, I want to include some research I have done today to test the robustness of AudioQueues.
I have noticed that if you make too small AudioQueue buffers, it won't play at all. That made me play around a bit to see why it is not playing.
If I try buffer size that can hold only 150 samples, I get no sound at all.
If I try buffer size that can hold 175 samples, it plays the whole song through, but with A lot of distortion. 175 amounts to a tad less than 4 ms of audio.
AudioQueue keeps asking for new buffers as long as you keep supplying buffers. That is regardless of AudioQueue actually playing your buffers or not.
If you supply a buffer with size 0, the buffer will be lost and an error kAudioQueueErr_BufferEmpty is returned for that queue enqueue request. You will never see AudioQueue ask you to fill that buffer again. If this happened for the last queue you have posted, AudioQueue will stop asking you to fill any more buffers. In that case you will not hear any more audio for that session.
To see why AudioQueues is not playing anything with smaller buffer sizes, I made a test to see if my callback is called at all even when there is no sound. The answer is that the buffers gets called all the time as long as AudioQueues is playing and needs data.
So if you keep feeding buffers to the queue, no buffer is ever lost. It doesn't happen. Unless there is an error, of course.
So why is no sound playing?
I tested to see if 'AudioQueueEnqueueBuffer()' returned any errors. It did not. No other errors within my play routine either. The data returned from reading from file is also good.
Everything is normal, buffers are good, data re-enqueued is good, there is just no sound.
So my last test was to slowly increase buffer size till I could hear anything. I finally heard faint and sporadic distortion.
Then it came to me...
It seems that the problem lies with that the system tries to keep the stream in sync with time so if you enqueue audio, and the time for the audio you wanted to play has passed, it will just skip that part of the buffer. If the buffer size becomes too small, more and more data is dropped or skipped until the audio system is in sync again. Which is never if the buffer size is too small. (You can hear this as distortion if you chose a buffer size that is barely large enough to support continuous play.)
If you think about it, it is the only way the audio queue can work, but it is a good realisation when you are clueless like me and "discover" how it really works.
I decided to take a look at this again and was able to solve it by making the buffers larger. I've accepted the answer by #RoyGal since it was their suggestion but I wanted to provide the actual code that works since I guess others are having the same problem (question has a few favorites that aren't me at the moment).
One thing I tried was making the packet size larger:
aData->aDescription.mFramesPerPacket = 512; // or some other number
aData->aDescription.mBytesPerPacket = (
aData->aDescription.mFramesPerPacket * aData->aDescription.mBytesPerFrame
);
This does NOT work: it causes AudioQueuePrime to fail with an AudioConverterNew returned -50 message. I guess it wants mFramesPerPacket to be 1 for PCM.
(I also tried setting the kAudioQueueProperty_DecodeBufferSizeFrames property which didn't seem to do anything. Not sure what it's for.)
The solution seems to be to only allocate the buffer(s) with the specified size:
AudioQueueAllocateBuffer(
aData->aQueue,
aData->aDescription.mBytesPerPacket * N_BUFFER_PACKETS / N_BUFFERS,
&aData->aBuffer[i]
);
And the size has to be sufficiently large. I found the magic number is:
mBytesPerPacket * 1024 / N_BUFFERS
(Where N_BUFFERS is the number of buffers and should be > 1 or playback is choppy.)
Here is an MCVE demonstrating the issue and solution:
#import <Foundation/Foundation.h>
#import <AudioToolbox/AudioToolbox.h>
#import <AudioToolbox/AudioQueue.h>
#import <AudioToolbox/AudioFile.h>
#define N_BUFFERS 2
#define N_BUFFER_PACKETS 1024
typedef struct AStreamData {
AudioFileID aFile;
AudioQueueRef aQueue;
AudioQueueBufferRef aBuffer[N_BUFFERS];
AudioStreamBasicDescription aDescription;
SInt64 pOffset;
volatile BOOL isRunning;
} AStreamData;
void printASBD(AudioStreamBasicDescription* desc) {
printf("mSampleRate = %d\n", (int)desc->mSampleRate);
printf("mBytesPerPacket = %d\n", desc->mBytesPerPacket);
printf("mFramesPerPacket = %d\n", desc->mFramesPerPacket);
printf("mBytesPerFrame = %d\n", desc->mBytesPerFrame);
printf("mChannelsPerFrame = %d\n", desc->mChannelsPerFrame);
printf("mBitsPerChannel = %d\n", desc->mBitsPerChannel);
}
void bufferCallback(
void *vData, AudioQueueRef aQueue, AudioQueueBufferRef aBuffer
) {
AStreamData* aData = (AStreamData*)vData;
UInt32 bRead = 0;
UInt32 pRead = (
aBuffer->mAudioDataBytesCapacity / aData->aDescription.mBytesPerPacket
);
OSStatus stat;
stat = AudioFileReadPackets(
aData->aFile, false, &bRead, NULL, aData->pOffset, &pRead, aBuffer->mAudioData
);
if(stat != 0) {
printf("AudioFileReadPackets returned %d\n", stat);
}
if(pRead == 0) {
aData->isRunning = NO;
return;
}
aBuffer->mAudioDataByteSize = bRead;
stat = AudioQueueEnqueueBuffer(aQueue, aBuffer, 0, NULL);
if(stat != 0) {
printf("AudioQueueEnqueueBuffer returned %d\n", stat);
}
aData->pOffset += pRead;
}
AStreamData* beginPlayback(NSURL* path) {
static AStreamData* aData;
aData = malloc(sizeof(AStreamData));
OSStatus stat;
stat = AudioFileOpenURL(
(CFURLRef)path, kAudioFileReadPermission, 0, &aData->aFile
);
printf("AudioFileOpenURL returned %d\n", stat);
UInt32 dSize = 0;
stat = AudioFileGetPropertyInfo(
aData->aFile, kAudioFilePropertyDataFormat, &dSize, 0
);
printf("AudioFileGetPropertyInfo returned %d\n", stat);
stat = AudioFileGetProperty(
aData->aFile, kAudioFilePropertyDataFormat, &dSize, &aData->aDescription
);
printf("AudioFileGetProperty returned %d\n", stat);
printASBD(&aData->aDescription);
stat = AudioQueueNewOutput(
&aData->aDescription, bufferCallback, aData, NULL, NULL, 0, &aData->aQueue
);
printf("AudioQueueNewOutput returned %d\n", stat);
aData->pOffset = 0;
for(int i = 0; i < N_BUFFERS; i++) {
// change YES to NO for stale playback
if(YES) {
stat = AudioQueueAllocateBuffer(
aData->aQueue,
aData->aDescription.mBytesPerPacket * N_BUFFER_PACKETS / N_BUFFERS,
&aData->aBuffer[i]
);
} else {
stat = AudioQueueAllocateBuffer(
aData->aQueue,
aData->aDescription.mBytesPerPacket,
&aData->aBuffer[i]
);
}
printf(
"AudioQueueAllocateBuffer returned %d for aBuffer[%d] with capacity %d\n",
stat, i, aData->aBuffer[i]->mAudioDataBytesCapacity
);
bufferCallback(aData, aData->aQueue, aData->aBuffer[i]);
}
UInt32 numFramesPrepared = 0;
stat = AudioQueuePrime(aData->aQueue, 0, &numFramesPrepared);
printf("AudioQueuePrime returned %d with %d frames prepared\n", stat, numFramesPrepared);
stat = AudioQueueStart(aData->aQueue, NULL);
printf("AudioQueueStart returned %d\n", stat);
UInt32 pSize = sizeof(UInt32);
UInt32 isRunning;
stat = AudioQueueGetProperty(
aData->aQueue, kAudioQueueProperty_IsRunning, &isRunning, &pSize
);
printf("AudioQueueGetProperty returned %d\n", stat);
aData->isRunning = !!isRunning;
return aData;
}
void endPlayback(AStreamData* aData) {
OSStatus stat = AudioQueueStop(aData->aQueue, NO);
printf("AudioQueueStop returned %d\n", stat);
}
NSString* getPath() {
// change NO to YES and enter path to hard code
if(NO) {
return #"";
}
char input[512];
printf("Enter file path: ");
scanf("%[^\n]", input);
return [[NSString alloc] initWithCString:input encoding:NSASCIIStringEncoding];
}
int main(int argc, const char* argv[]) {
NSAutoreleasePool* pool = [[NSAutoreleasePool alloc] init];
NSURL* path = [NSURL fileURLWithPath:getPath()];
AStreamData* aData = beginPlayback(path);
if(aData->isRunning) {
do {
printf("Queue is running...\n");
[NSThread sleepForTimeInterval:1.0];
} while(aData->isRunning);
endPlayback(aData);
} else {
printf("Playback did not start\n");
}
[pool drain];
return 0;
}
I am currently working at a logger that uses a MSP430F2618 MCU and SanDisk 4GB SDHC Card.
Card initialization works as expected, I also can read MBR and FAT table.
The problem is that I can't write any data on it. I have checked if it is write protected by notch, but it's not. Windows 7 OS has no problem reading/writing to it.
Though, I have used a tool called "HxD" and I've tried to alter some sectors (under Windows). When I try to save the content to SD card, the tool pop up a windows telling me "Access denied!".
Then I came back to my code for writing to SD card:
uint8_t SdWriteBlock(uchar_t *blockData, const uint32_t address)
{
uint8_t result = OP_ERROR;
uint16_t count;
uchar_t dataResp;
uint8_t idx;
for (idx = RWTIMEOUT; idx > 0; idx--)
{
CS_LOW();
SdCommand(CMD24, address, 0xFF);
dataResp = SdResponse();
if (dataResp == 0x00)
{
break;
}
else
{
CS_HIGH();
SdWrite(0xFF);
}
}
if (0x00 == dataResp)
{
//send command success, now send data starting with DATA TOKEN = 0xFE
SdWrite(0xFE);
//send 512 bytes of data
for (count = 0; count < 512; count++)
{
SdWrite(*blockData++);
}
//now send tow CRC bytes ,through it is not used in the spi mode
//but it is still needed in transfer format
SdWrite(0xFF);
SdWrite(0xFF);
//now read in the DATA RESPONSE TOKEN
do
{
SdWrite(0xFF);
dataResp = SdRead();
}
while (dataResp == 0x00);
//following the DATA RESPONSE TOKEN are a number of BUSY bytes
//a zero byte indicates the SD/MMC is busy programing,
//a non_zero byte indicates SD/MMC is not busy
dataResp = dataResp & 0x0F;
if (0x05 == dataResp)
{
idx = RWTIMEOUT;
do
{
SdWrite(0xFF);
dataResp = SdRead();
if (0x0 == dataResp)
{
result = OP_OK;
break;
}
idx--;
}
while (idx != 0);
CS_HIGH();
SdWrite(0xFF);
}
else
{
CS_HIGH();
SdWrite(0xFF);
}
}
return result;
}
The problem seems to be when I am waiting for card status:
do
{
SdWrite(0xFF);
dataResp = SdRead();
}
while (dataResp == 0x00);
Here I am waiting for a response of type "X5"(hex value) where X is undefined.
But most of the cases the response is 0x00 (hex value) and I don't get out of the loop. Few cases are when the response is 0xFF (hex value).
I can't figure out what is the problem.
Can anyone help me? Thanks!
4GB SDHC
We need to see much more of your code. Many µC SPI codebases only support SD cards <= 2 GB, so using a smaller card might work.
You might check it yourself: SDHC needs a CMD 8 and an ACMD 41 after the CMD 0 (GO_IDLE_STATE) command, otherwise you cannot read or write data to it.
Thank you for your answers, but I solved my problem. It was a problem of timing. I had to put a delay at specific points.