How are USB peripherals' bIntervals enforced? - usb

I have a FullSpeed USB Device that sends a Report Descriptor, whose relevant Endpoint Descriptor declares a bInterval of 8, meaning 8ms.
The following report extract is obtained from a USB Descriptor Dumper when the device's driver is HidUsb:
Interface Descriptor: // +several attributes
------------------------------
0x04 bDescriptorType
0x03 bInterfaceClass (Human Interface Device Class)
0x00 bInterfaceSubClass
0x00 bInterfaceProtocol
0x00 iInterface
HID Descriptor: // +bLength, bCountryCode
------------------------------
0x21 bDescriptorType
0x0110 bcdHID
0x01 bNumDescriptors
0x22 bDescriptorType (Report descriptor)
0x00D6 bDescriptorLength
Endpoint Descriptor: // + bLength, bEndpointAddress, wMaxPacketSize
------------------------------
0x05 bDescriptorType
0x03 bmAttributes (Transfer: Interrupt / Synch: None / Usage: Data)
0x08 bInterval (8 frames)
After switching the driver to WinUSB to be able to use it, if I repeatedly query IN interrupt transfers using libusb, and time the real time spent between 2 libusb calls and during the libusb call using this script :
for (int i = 0; i < n; i++) {
start = std::chrono::high_resolution_clock::now();
forTime = (double)((start - end).count()) / 1000000;
<libusb_interrupt_transfer on IN interrupt endpoint>
end = std::chrono::high_resolution_clock::now();
std::cout << "for " << forTime << std::endl;
transferTime = (double)((end - start).count()) / 1000000;
std::cout << "transfer " << transferTime << std::endl;
std::cout << "sum " << transferTime + forTime << std::endl << std::endl;
}
Here's a sample of obtained values :
for 2.60266
transfer 5.41087
sum 8.04307 //~8
for 3.04287
transfer 5.41087
sum 8.01353 //~8
for 6.42174
transfer 9.65907
sum 16.0808 //~16
for 2.27422
transfer 5.13271
sum 7.87691 //~8
for 3.29928
transfer 4.68676
sum 7.98604 //~8
The sum values consistently stay very close to 8ms, except when the time elapsed before initiating a new interrupt transfer call is too long (the threshold appear to be between 6 and 6.5 for my particular case) in which case it's equal to 16. I have once seen a "for" measure equal to 18ms, and the sum precisely equal to 24ms. Using an URB tracker (Microsoft Message Analyzer in my case), the time differences between Complete URB_FUNCTION_BULK_OR_INTERRUPT_TRANSFER message are also multiples of 8ms - usually 8ms. In short, they match the "sum" measures.
So, it is clear that the time elapsed between two returns of "libusb interrupt transfer calls" is a multiple of 8ms, which I assume is related to the bInterval value of 8 (FullSpeed -> *1ms -> 8ms).
But now that I have, I hope, made it clear what I'm talking about - where is that enforced ? Despite research, I cannot find a clear explanation of how the bInterval value affects things.
Apparently, this is enforced by the driver.
Therefore, is it :
The driver forbids the request from firing until 8ms have passed. Sounds like the most reasonable option to me, but from my URB Trace, Dispatch message events were raised several milliseconds before the request came back. This would mean the real time the data left the host is hidden from me/the message analyzer.
The driver hides the response from me and the analyzer until 8ms have passed since the last response.
If it is indeed handled by the driver, there is a lie somewhere to what's shown to me in the log of exchanged message. A response should come immediately after the a request, yet this is not the case. So, either the request is sent after the displayed time, or the response comes earlier than what's displayed.
How does the enforcement of the respect of the bInterval work ?
My ultimate goal is to disregard that bInterval value and poll the device more frequently than 8ms (I have good reason to believe it can be polled up to every 2ms, and a period of 8ms is unacceptable for its usage), but first I would like to know how its current limitation works, if what I'm seeking is possible at all, so I can understand what to study next (ex. writing a custom WinUSB driver)

I have a FullSpeed USB Device
Careful: Did you verify this? The 8ms are the limit for low speed USB devices - which many common mice or keyboards may still be using.
The 8ms scheduling is done inside the USB host driver (ehci/xhci) AFAIK. You could try to gamble this by releasing and reclaiming the interface - not tested, though. (Edit: Won't work, see comment).
An USB device cannot talk on its own, so it has to be the request that is delayed. Note that a device can also NAK any interrupt IN requests when there is no new data available. This simply adds another bInterval ms to the timing.
writing a custom WinUSB driver
Not recommended - replacing a windows supplied driver is quite a hassle. Our libusb-win32 replacer for an USB CDC device breaks on all big windows 10 upgrades - the device uses a COM port instead of libusb once the upgrade is finished.

Related

STM32 USB Custom HID only 1 byte per transaction

I know that maximum speed of USB HID device is 64 kbps, but on oscilloscope I get transactions every 1 ms, which contain only ONE byte. My HID report descriptor listed below. What i must change to achieve 64Kbps? Currently my bInterval = 0x01 (1 ms polling for interrupt endpoint), but actual speed is 65 bytes/s, because it add reportID byte to my 64-byte data. I think, USB should not divide single 64+1 packet to 65 singlebyte packets. For experiment I use reportID=1 (from STM32 to PC). From PC side I use hidapi.dll to interact.
__ALIGN_BEGIN static uint8_t CUSTOM_HID_ReportDesc_FS[USBD_CUSTOM_HID_REPORT_DESC_SIZE] __ALIGN_END =
{
/* USER CODE BEGIN 0 */
USAGE_PAGE(USAGE_PAGE_UNDEFINED)
USAGE(USAGE_UNDEFINED)
COLLECTION(APPLICATION)
REPORT_ID(1)
USAGE(1)
LOGICAL_MIN(0)
LOGICAL_MAX(255)
REPORT_SIZE(8)
REPORT_COUNT(64)
INPUT(DATA | VARIABLE | ABSOLUTE)
REPORT_ID(2)
USAGE(2)
LOGICAL_MIN(0)
LOGICAL_MAX(255)
REPORT_SIZE(8)
REPORT_COUNT(64)
OUTPUT(DATA | VARIABLE | ABSOLUTE)
REPORT_ID(3)
USAGE(3)
LOGICAL_MIN(0)
LOGICAL_MAX(255)
REPORT_SIZE(8)
REPORT_COUNT(64)
OUTPUT(DATA | VARIABLE | ABSOLUTE)
REPORT_ID(4)
USAGE(4)
LOGICAL_MIN(0)
LOGICAL_MAX(255)
REPORT_SIZE(8)
REPORT_COUNT(64)
OUTPUT(DATA | VARIABLE | ABSOLUTE)
/* USER CODE END 0 */
0xC0 /* END_COLLECTION */
};
HID uses interrupt IN/OUT to convey reports. In USB, Interrupt transfers are polled by host every 1 ms. Every time endpoint is polled, it may yield a 64-byte report (for Low/Full speed). That's probably where you get the 64kB/s figure from. Actually, limit is 1k report / second. Also note these limits are different for High-speed and Super-speed devices.
Report descriptor is one thing. What you actually send as interrupt-IN is something else. It should match, but this is not enforced by anything. You should probably look into the code that builds the interrupt IN transfer payload.
Side note: all you seem interested in is to send arbitrary chunks of data, then HID is probably not the relevant profile. Using bulk endpoints looks more appropriate (and you'll not be limited by interrupt endpoint polling rate).

Every SPI send results in receiving a 0 on MSP430

I run this simplified program for SPI communication, running on the TI MSP430FR5969 on its corresponding launchpad MSP-EXP430FR5969, and set breakpoints just before TX and just after RX in CCS (Code Composer Studio). The breakpoints are labelled with comments.
My launchpad is not connected to anything. (Once I figure this out I intend to communicate it to some other device for real communication.)
I do not expect to receive any data because the launchpad is not connected to anything. But I receive exactly one zero for every send. The breakpoints are hit in alternate order starting with the first TX breakpoint.
Why am I receiving data? Is it because I need to enable pullup registers on some of the pins? I believe the launchpad itself uses the USCI "A" module(s) so the "B" module that I am using should have nothing connected to it.
#include <msp430.h>
int main(void) {
WDTCTL = WDTPW | WDTHOLD;
P1SEL0 &= ~BIT3; // UCB0STE
P1SEL0 &= ~BIT6; // UCB0SIMO
P1SEL0 &= ~BIT7; // UCB0SOMI
P2SEL0 &= ~BIT2; // UCB0CLK
P1SEL1 |= BIT3; // UCB0STE
P1SEL1 |= BIT6; // UCB0SIMO
P1SEL1 |= BIT7; // UCB0SOMI
P2SEL1 |= BIT2; // UCB0CLK
PM5CTL0 &= ~LOCKLPM5;
CSCTL0_H = CSKEY_H;
CSCTL1 &= ~DCORSEL;
CSCTL1 = (CSCTL1 & ~0x000e) | DCOFSEL_0; // 1 MHz
CSCTL3 |= DIVA__1 | DIVS__1 | DIVM__1; // clock dividers = 1
CSCTL0_H = 0;
UCB0CTLW0 |= UCSWRST;
UCB0CTLW0 |= UCCKPH;
UCB0CTLW0 |= UCCKPL;
UCB0CTLW0 |= UCMSB;
UCB0CTLW0 |= UCMST;
UCB0CTLW0 |= UCMODE_2;
UCB0CTLW0 |= UCSYNC;
UCB0CTLW0 |= UCSSEL__SMCLK;
UCB0CTLW0 |= UCSTEM;
// UCB0STATW |= UCLISTEN; // OK, if enabled i receive what i send
UCB0CTLW0 &= ~UCSWRST;
UCB0IE |= UCRXIE;
_enable_interrupts();
_delay_cycles(100000);
int send = 0;
while (1) {
while (!(UCB0IFG & UCTXIFG));
UCB0TXBUF = send; // BREAKPOINT 1
send = (send + 1) % 100;
_delay_cycles(100000);
}
return 0;
}
#pragma vector = USCI_B0_VECTOR
__interrupt void isr_usci_b0 (void) {
static volatile int received = 0;
switch (__even_in_range(UCB0IV, USCI_SPI_UCTXIFG)) {
case USCI_NONE:
break;
case USCI_SPI_UCRXIFG:
received = UCB0RXBUF;
UCB0IFG &= ~UCRXIFG; // BREAKPOINT 2
_no_operation();
break;
case USCI_SPI_UCTXIFG:
break;
}
}
The SPI peripheral does two things if MISO and MOSI are enabled (CLK enabled as well, of course). Assuming Master mode operation, it clocks out data from the TX shift register on the MOSI line and simultaneously clocks in data to the RX shift register from the MISO line.
In your circuit, the MISO input is hanging since you have not enabled either pull-up or pull-down internal resistances. Thus, observing 0x00 would not be out of the ordinary. If you had enabled the pull-up resistance, then you would have seen 0xFF in the receive buffer.
Another rule of thumb:
If you are using the peripheral functions then configure the GPIO pins of the MSP430 as output/input. (i.e. MOSI, CLK = output, MISO = input for SPI master mode)
Answer to the questions in the comments:
The MSP430 is configured in the listed code to be the SPI master. I see little point in the using a dedicated RX interrupt service routine, unless you want the controller to do something else in the time between shifting data from the TX buffer to the shift register and shifting data from the RX shift register to the RX buffer, i.e. one "byte" transfer period. You could as well have polled for the RX interrupt as you have for the TX interrupt. But you must wait for the RX interrupt.
Excerpt from the user guide:
The eUSCI initiates data transfer when data is moved to the transmit data buffer UCxTXBUF. The UCxTXBUF data is moved to the transmit (TX) shift register when the TX shift register is empty, initiating data transfer on UCxSIMO starting with either the MSB or LSB, depending on the UCMSB setting. Data on UCxSOMI is shifted into the receive shift register on the opposite clock edge. When the character is received, the receive data is moved from the receive (RX) shift register to the received data buffer UCxRXBUF and the receive interrupt flag UCRXIFG is set, indicating the RX/TX operation is complete.
A set transmit interrupt flag, UCTXIFG, indicates that data has moved from UCxTXBUF to the TX shift register and UCxTXBUF is ready for new data. It does not indicate RX/TX completion.
To receive data into the eUSCI in master mode, data must be written to UCxTXBUF, because receive and transmit operations operate concurrently.
The client will not send data by itself to the MSP430. The client device may need some time to execute the command the master just sent. Typically an "erase flash" command for SPI Flash chips.
In this case the master, i.e. MSP430, must poll the client device to see if it has data to send/completed the command. This is done typically either by polling a status register of the client device (or by using a dedicated IRQ interrupt). i.e. the client signals "completion of command"/"availability of data" via the status byte (or IRQ interrupt). On this event, the master could read out data from the client.
At first glance it may seem rather counter intuitive that data (dummy bytes) needs to be written in order to read data - perhaps your source of confusion as well :)
Perhaps reading about an SPI client may help. For example this SPI memory.
The SPI peripheral transmits a bit and receives a bit for every clock cycle. Instead of wondering how some unconnected device has sent a byte, think that your SPI peripheral has clocked in a receive byte even though nothing is connected. The byte you receive is 0 because the MISO line happens to be low while nothing is connected.
The SPI peripheral does not know the meaning of the data and does not know how many bytes must be transmitted and received for any particular command. It's up to your application to know when to transmit and receive dummy bytes. For example, if the slave responds to a command in the next byte then your application has to transmit two bytes (the command byte followed by a dummy byte) and at the same time receive two bytes (a dummy byte, followed by the response). Some slaves may send a generic status byte instead of a dummy byte as the first byte of all responses. It's up to your application to use or ignore the status byte.
The MSP430's SPI documentation is not going to tell you when you need to send/receive dummy bytes. You'll have to read the SPI documentation for the slave device for that information. Each slave may have different requirements. Some slaves my receive a command byte and send a reply. Other slaves may receive a command and address byte before sending a reply. Some slaves may reply with multiple bytes. You'll have to program your application to transmit/receive the appropriate number of bytes.
There are no stop bits. The master is both transmitting and receiving with every clock. If you want to stop receiving then stop transmitting. If you want to continue receiving then transmit dummy bytes.
Yes, you should use the RX interrupt. The RX interrupt indicates that it is safe for your application to read the received byte from the RX register. The SPI peripheral is shifting receive bits into a shift register with each clock. But after a byte has been received the SPI peripheral still has to copy the contents of the shift register to the RX register and then set the RX interrupt. You shouldn't assume that the received byte can be read from the RX register until the RX interrupt is indicated.
The behavior you describe is to be expected. With SPI, it is movement on the clock line that indicates the presence of data. The input line can be idle, and data will be received because, in order to send a byte, the clock must be toggled back and forth to latch the transmitted data, but at same time, data is latched into the receive buffer.
An SPI bus is intended to be a closed pathway. The TX line from your processor goes out and is daisy-chained to one or more slave devices and then looped back to your RX pin. Every transition on the clock line drives a data bit. This means that for every transition your hardware will shift one bit into your receive buffer. It's up to your code to know how many of those bits to discard before you start reading real data.
You are reading 0's because nothing is driving the RX pin. When you're connected to a real device, the first several bytes you send will also likely generate 00's on your RX pin. Usually you'll have to send some sort of command byte to the slave device which then will start sending real data. The length of that command should be discarded because the slave will not have started driving its output pin until the command byte (word, string, whatever) is complete.

Tx LPC2148 with Xbee in API mode with Rx Xbee with Relay not works

I started to work on lpc2148 with xbee series 2.
On the transmitting side i am using lpc2148 with xbee coordinator in API mode,
and Rx side i am using xbee on Shield in router AT mode.
I want XBee to activate a D3 pin, which could be used to turn on relay on Rx side
API frame format as below code using c program .
enter code here
#define Delimeter 0x7E
void Init_UART1(void) //This function setups UART1
{
unsigned int Baud16;
U1LCR = 0x83; // DLAB = 1
Baud16 = (Fpclk / 16) / UART_BPS;
U1DLM = Baud16 / 256;
U1DLL = Baud16 % 256;
U1LCR = 0x03;
}
void main() {
Init_UART1();
LED1_ON();
setRemoteState(0x5);//AD3 config DOUT HIGH
Delay(25);
LED1_OFF();
setRemoteState(0x4);//AD3 config DOUT LOW
Delay(25);
void setRemoteState (char value) {
UART1_Write(Delimeter);//start byte
UART1_Write(0);//high part of length
UART1_Write(0X10);//low part of length
UART1_Write(0X17);//remote AT command
UART1_Write(0X0);//frame id 0 for no reply
UART1_Write(0X0);
UART1_Write(0X0);
UART1_Write(0X0);
UART1_Write(0X0);
UART1_Write(0X0);
UART1_Write(0X0);
UART1_Write(0XFF);// broadcast
UART1_Write(0XFF);// broadcast
UART1_Write(0XFF);
UART1_Write(0XFE);
UART1_Write(0X02);//apply changes immediately on remote
UART1_Write('D');//writing on AD3 pin
UART1_Write('3');
UART1_Write(value);
sum = 0x17 + 0xFF + 0xFF + 0xFF + 0xFE + 0x02 + 'D' + '3' + value;
UART1_Write(0xFF - (0xFF & sum));//checksum
Delay(25);
}
}
i am not able to get any communication or data on my Rx side. D3 pinout volage is still low.
Please guide on this point...
This program is working fine with arduino using Serial.write Function.
Regards,
Vijay
Are you using the correct baud rate? Are you sure you've connected TX/RX correctly and haven't crossed them? If you have hardware flow control enabled, is the RTS signal into the XBee asserted? Is the XBee module powering up and receiving enough current?
If you monitor the XBee transmit signal on another device (computer via FTDI's TTL-to-USB cable) are you seeing bytes at startup (I believe it sends a Modem Status during it's startup)? If you monitor the LPC2148 transmit signal, are you seeing the byte stream you think you're sending (confirming that you're driving UART1 correctly)?
Can you tell if the XBee module is receiving your requests, perhaps by toggling ATD0 between high and low output and checking with an LED or scope? Do you have any hardware you can use to monitor the serial stream between the two devices to see if it's sending the bytes you think you're sending? Are you sure it's calculating the correct checksum (dump the bytes somehow and try running them through X-CTU to see if they work).
If you're going to be doing a lot of communications between the LPC2148 and the XBee module, you might want to try porting this Open Source ANSI C XBee Host Library to the platform. It includes multiple layers of XBee frame processing that should reduce the amount of software you need to write.

Implementing SPI slave ISR on PIC32?

I have two PIC32MX microcontrollers that are connected over a 1.53MHz SPI bus with Chip Select. I am having trouble getting my slave side interrupt service routine to transmit data correctly. As a test case, I'm having the master send out two bytes (0x01, 0x00) every 10 ms. The slave is supposed to receive the 0x01 command id and respond with a 0x02 when the master sends the 2nd byte (the dummy 0x00).
Ideally each transfer should look like this.
Master Slave
0x01 0x00
0x00 0x02
I'm really not sure where to start with the slave interrupt though. I'm using a fifo buffer called airsysTx to hold data that needs to be shifted out the next time the master makes a request. The slave receives the 0x01 from the master just fine and writes 0x02 to the fifo buffer when it does. I'm not sure how to code the interrupt so that it will be sure to transmit correctly. The code I have below is a good start, but it's wrong. Suggestions?
/*******************************************************************************
* Interrupt service routine for SPI3 interrupts from Air MCU.
* The user's code at this vector should perform any application specific
* operations and MUST clear the SPI3 interrupt flags before exiting.
******************************************************************************/
void __ISR(_SPI_3_VECTOR, ipl7) _SPI3Interrupt()
{
BYTE MasterCMD;
SET_D1();//Set debug LED
// RX INTERRUPT
if(IFS0bits.SPI3RXIF) // receive data available in SPI3BUF Rx buffer
{
MasterCMD = SPI3BUF;
if(AirCMD == 0x01)
{
airsysTxFlush();
airsysTxWrite(0x02);
}
}
//Transmit data if needed.
if(SPI3STATbits.SPITBE)
{
if(!airsysTxIsEmpty())
{
SPI3BUF = airsysTxRead();
}
else
{
//Else write 0 to the tx buffer to clear the spi shift reg
SPI3BUF = 0x00;
}
}
IFS0bits.SPI3RXIF = 0;
IFS0bits.SPI3TXIF = 0;
IFS0bits.SPI3EIF = 0;
SPI3STATbits.SPIROV = 0;// clear the Overflow
CLEAR_D1();//CLEAR Debug LED
} // end ISR
What this code is actually transmitting is something like this:
Ideally each transfer should look like this.
Master Slave
0x01 0x02
0x00 0x01
Generally you can't write a slave SPI driver to interact in the way you describe because you can't control the timing precisely as a slave. What generates your ISR, is it Rx of first byte from master or assertion of chip select?
As the slave, you need to have set up the data bytes you want to transmit before the master starts the transaction. You usually don't have time to react to the first byte. There are a couple of ways to do this:
1) You could use a protocol where master does a 1 or 2 byte write-only transaction that tells the slave what it wants to read. Then master waits a few milliseconds to allow the slave to prepare the response. Then master does a read-only transaction to get the slave response.
2) If using DMA or FIFO, slave preloads the first padding byte(s) into the fifo before master starts the transaction. Then as you get the ISR you put the remaining response data into the fifo (without a flush). You need to have enough pad bytes to accommodate the slave ISR latency in forming the response. So for example, you may define your protocol where master knows that the first N bytes of response are pad bytes, followed by response data. Padding requirement would depend on your master clock speed and slave CPU speed/interrupt latency.

Initializing SD card in SPI issues

I've had a look at Stack Overflow question Initialization of a microSD card using an SPI interface and didn't see any answers that matched my issue (that is, things I haven't already tried).
I have a similar issue where I'm trying to access a SD card through a microcontroller's SPI interface (specifically an HC908). I've tried following the flow charts in the Physical Layer Simplified Specification v2.00 and it seems to initialize correctly on Transcend 1 GB & 2 GB and an AE&C 1 GB card. But I'm having problems on three other random cards from my stash of old cards that I've used on my camera.
My code is all HC908 assembler. I scoped out the SPI clock line and during initialization it's running about 350 kHz (the only speed multiplier that the HC908 supplies at my low MCU clock speed that falls within the 100 - 400 kHz window).
Here are the results of the three cards that aren't completing my initialization routine (all done consecutively without changing any code or timing parameters):
Canon 16Meg card (labeled as SD):
Set card select high
Send 80 SPI clock cycles (done by writing 0xFF 10 times)
Set card select low
Send CMD0 [0x400000000095] and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x01 (indicates idle)
Send CMD8 [0x48000001AA87] and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x05 (idle and illegal command)
Because illegal command set local flag to indicate v1 or MMC card
Send CMD58 [0x7A00000000FD] and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x05 (idle and illegal command)
because illegal command branch to error routine
Send CMD13 [0x4D000000000D] (show status buffer) and Loop up to 8 times waiting for high bit on response to go low
R1= 0x05 (idle and illegal command)
Is the illegal command flag stuck? Should I be doing something after CMD8 to clear that flag?
SanDisk UltraII 256Meg
Set card select high
Send 80 SPI clock cycles (done by writing 0xFF 10 times)
Set card select low
Send CMD0 [0x400000000095] and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x01 (idle)
Send CMD8 [0x48000001AA87] and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x05 (idle and illegal command)
Because illegal command set local flag to indicate v1 or MMC card
Send CMD58 [0x7A00000000FD] and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x01 (idle)
Send 0xFF 4 times to read OCR
OCR = 0xFFFFFFFF
Send CMD55 [0x770000000065] (1st part of ACMD41) and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x01 (idle)
Send CMD41 [0x6900000000E5] (2nd part of ACMD41) and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x05 (idle and illegal command)
Because illegal command, assume card is MMC
Send CMD1 [0x4100000000F9] (for MMC) and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x05 (idle and illegal command)
Repeat the CMD1 50 times (my arbitrary number to wait until idle clears)
Every R1 response is 0x05 (idle and illegal command)
Why is OCR all F? Doesn't seem proper at all. Also, why does ACMD41 and CMD1 respond illegal command? Is CMD1 failing because the card is waiting for a valid ACMD after the CMD55 even with the illegal command response?
SanDisk ExtremeIII 2G:
Set card select high
Send 80 SPI clock cycles (done by writing 0xFF 10 times)
Set card select low
Send CMD0 [0x400000000095] and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x01 (idle)
Send CMD8 [0x40000001AA87] and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x7F (??? My loop shows the responses for each iteration and I got 0xFF 0xFF 0xC1 0x7F... is the card getting out of sync?)
Send CMD58 [0x7A00000000FD] and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x01 (idle and back in sync)
Send 0xFF 4 times to read OCR
OCR = 0x00FF80
Send CMD55 [0x770000000065] (1st part of ACMD41) and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x5F (??? loop responses are 0xFF 0xFF 0xF0 0x5F... again out of sync?)
Send CMD41 [0x6900000000E5] (2nd part of ACMD41) and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x05 (idle and illegal command, but back in sync???)
Because illegal command, assume card is MMC
Send CMD1 [0x4100000000F9] (for MMC) and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x7F (??? loop responses are 0xFF 0xFF 0xC1 0x7F... again out of sync?)
Repeat CMD1 and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x01 (idle)
Repeat CMD1 and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x7F (??? loop responses are 0xFF 0xFF 0xC1 0x7F... again out of sync?)
Repeat CMD1 and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x00 (out of idle)
Send CMD9 [0x4900000000AF] (get CSD) and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x3F (??? loop responses are 0xFF 0xFF 0xC1 0x3F... again out of sync?)
Code craps out because Illegal command bit is high.
What on Earth is wrong with that card?
Sometimes it is in sync, other times not. (The above pattern is repeatable.) I've scoped this one out and I'm not seeing any rogue clock cycles going through between MOSI/MISO transfers.
OK... I found my problem. For anyone else who runs into this issue, it is important to remember to send an extra 0xFF after getting responses. This gives the card an extra eight clock cycles to prepare itself for the next command. Some cards don't seem to need it (the Transcends that I'm using for example), but others require it.
I actually put a simple loop at the beginning of my 'write command' routine that sends 0xFF until it gets 0xFF as a response just so I don't have to go to all the different places where I read responses to make sure I put an extra send 0xFF. Because as far as the SD card is (usually) concerned in SPI mode, if there are no clock cycles coming in, time stands still.
One thing that I noted and have yet to find an answer for (but so far it isn't hurting anything), after I read the 16 bytes of the CSR, there seem to be an additional 2 bytes of non-0xFF that comes out... Is that a CRC16? Odd since the CSR has a CRC built in...
If you enabled CRC (with CMD59), then yes, data blocks will have CRC16 appended.
For more info see "Physical Layer Simplified Specification Version 2.00", chapters "Bus Transfer Protection" and "Data Read".
This is important: I've had very much trouble with SD/MMC card, until I found out that I had to select an operating voltage.
You do so, by sending ACMD41 with the bit set for the voltage you're supplying the card with.
Note: Only a single bit may be selected.
If you don't select a voltage or select more than one, it will KEEP looping in idle-state and never exit on some SD cards.
That is: If your ACMD41 keeps sending response 0x01, you have not selected a voltage.
The voltage is in ACMD41's 32-bit parameter bits 23...8.
For 3.2V ... 3.3V, this is bit 20, so for instance you could:
acmdSDAppOpCond[2] = (1 << (20 & 7)); /* 3.2V .. 3.3V */
That's hex-value 0x10, so your ACMD41 would look like this...
0x69 0x40 0x10 0x00 0x00 0xCD
...or if it's a SDSC card...
0x69 0x00 0x10 0x00 0x00 0x5F
Here's a short (and incomplete) table of the most common values:
Bit23: 3.5V..3.6V
Bit22: 3.4V..3.5V
Bit21: 3.3V..3.4V
Bit20: 3.2V..3.3V
Bit19: 3.1V..3.2V
Bit18: 3.0V..3.1V
Bit17: 2.9V..3.0V
Bit16: 2.8V..2.9V
Bit15: 2.7V..2.8V
You do NOT have to switch CS high at ANY point in time. You can keep it low ALL the time.