Initializing SD card in SPI issues - embedded

I've had a look at Stack Overflow question Initialization of a microSD card using an SPI interface and didn't see any answers that matched my issue (that is, things I haven't already tried).
I have a similar issue where I'm trying to access a SD card through a microcontroller's SPI interface (specifically an HC908). I've tried following the flow charts in the Physical Layer Simplified Specification v2.00 and it seems to initialize correctly on Transcend 1 GB & 2 GB and an AE&C 1 GB card. But I'm having problems on three other random cards from my stash of old cards that I've used on my camera.
My code is all HC908 assembler. I scoped out the SPI clock line and during initialization it's running about 350 kHz (the only speed multiplier that the HC908 supplies at my low MCU clock speed that falls within the 100 - 400 kHz window).
Here are the results of the three cards that aren't completing my initialization routine (all done consecutively without changing any code or timing parameters):
Canon 16Meg card (labeled as SD):
Set card select high
Send 80 SPI clock cycles (done by writing 0xFF 10 times)
Set card select low
Send CMD0 [0x400000000095] and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x01 (indicates idle)
Send CMD8 [0x48000001AA87] and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x05 (idle and illegal command)
Because illegal command set local flag to indicate v1 or MMC card
Send CMD58 [0x7A00000000FD] and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x05 (idle and illegal command)
because illegal command branch to error routine
Send CMD13 [0x4D000000000D] (show status buffer) and Loop up to 8 times waiting for high bit on response to go low
R1= 0x05 (idle and illegal command)
Is the illegal command flag stuck? Should I be doing something after CMD8 to clear that flag?
SanDisk UltraII 256Meg
Set card select high
Send 80 SPI clock cycles (done by writing 0xFF 10 times)
Set card select low
Send CMD0 [0x400000000095] and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x01 (idle)
Send CMD8 [0x48000001AA87] and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x05 (idle and illegal command)
Because illegal command set local flag to indicate v1 or MMC card
Send CMD58 [0x7A00000000FD] and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x01 (idle)
Send 0xFF 4 times to read OCR
OCR = 0xFFFFFFFF
Send CMD55 [0x770000000065] (1st part of ACMD41) and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x01 (idle)
Send CMD41 [0x6900000000E5] (2nd part of ACMD41) and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x05 (idle and illegal command)
Because illegal command, assume card is MMC
Send CMD1 [0x4100000000F9] (for MMC) and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x05 (idle and illegal command)
Repeat the CMD1 50 times (my arbitrary number to wait until idle clears)
Every R1 response is 0x05 (idle and illegal command)
Why is OCR all F? Doesn't seem proper at all. Also, why does ACMD41 and CMD1 respond illegal command? Is CMD1 failing because the card is waiting for a valid ACMD after the CMD55 even with the illegal command response?
SanDisk ExtremeIII 2G:
Set card select high
Send 80 SPI clock cycles (done by writing 0xFF 10 times)
Set card select low
Send CMD0 [0x400000000095] and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x01 (idle)
Send CMD8 [0x40000001AA87] and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x7F (??? My loop shows the responses for each iteration and I got 0xFF 0xFF 0xC1 0x7F... is the card getting out of sync?)
Send CMD58 [0x7A00000000FD] and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x01 (idle and back in sync)
Send 0xFF 4 times to read OCR
OCR = 0x00FF80
Send CMD55 [0x770000000065] (1st part of ACMD41) and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x5F (??? loop responses are 0xFF 0xFF 0xF0 0x5F... again out of sync?)
Send CMD41 [0x6900000000E5] (2nd part of ACMD41) and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x05 (idle and illegal command, but back in sync???)
Because illegal command, assume card is MMC
Send CMD1 [0x4100000000F9] (for MMC) and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x7F (??? loop responses are 0xFF 0xFF 0xC1 0x7F... again out of sync?)
Repeat CMD1 and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x01 (idle)
Repeat CMD1 and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x7F (??? loop responses are 0xFF 0xFF 0xC1 0x7F... again out of sync?)
Repeat CMD1 and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x00 (out of idle)
Send CMD9 [0x4900000000AF] (get CSD) and Loop up to 8 times waiting for high bit on response to go low
R1 = 0x3F (??? loop responses are 0xFF 0xFF 0xC1 0x3F... again out of sync?)
Code craps out because Illegal command bit is high.
What on Earth is wrong with that card?
Sometimes it is in sync, other times not. (The above pattern is repeatable.) I've scoped this one out and I'm not seeing any rogue clock cycles going through between MOSI/MISO transfers.

OK... I found my problem. For anyone else who runs into this issue, it is important to remember to send an extra 0xFF after getting responses. This gives the card an extra eight clock cycles to prepare itself for the next command. Some cards don't seem to need it (the Transcends that I'm using for example), but others require it.
I actually put a simple loop at the beginning of my 'write command' routine that sends 0xFF until it gets 0xFF as a response just so I don't have to go to all the different places where I read responses to make sure I put an extra send 0xFF. Because as far as the SD card is (usually) concerned in SPI mode, if there are no clock cycles coming in, time stands still.
One thing that I noted and have yet to find an answer for (but so far it isn't hurting anything), after I read the 16 bytes of the CSR, there seem to be an additional 2 bytes of non-0xFF that comes out... Is that a CRC16? Odd since the CSR has a CRC built in...

If you enabled CRC (with CMD59), then yes, data blocks will have CRC16 appended.
For more info see "Physical Layer Simplified Specification Version 2.00", chapters "Bus Transfer Protection" and "Data Read".

This is important: I've had very much trouble with SD/MMC card, until I found out that I had to select an operating voltage.
You do so, by sending ACMD41 with the bit set for the voltage you're supplying the card with.
Note: Only a single bit may be selected.
If you don't select a voltage or select more than one, it will KEEP looping in idle-state and never exit on some SD cards.
That is: If your ACMD41 keeps sending response 0x01, you have not selected a voltage.
The voltage is in ACMD41's 32-bit parameter bits 23...8.
For 3.2V ... 3.3V, this is bit 20, so for instance you could:
acmdSDAppOpCond[2] = (1 << (20 & 7)); /* 3.2V .. 3.3V */
That's hex-value 0x10, so your ACMD41 would look like this...
0x69 0x40 0x10 0x00 0x00 0xCD
...or if it's a SDSC card...
0x69 0x00 0x10 0x00 0x00 0x5F
Here's a short (and incomplete) table of the most common values:
Bit23: 3.5V..3.6V
Bit22: 3.4V..3.5V
Bit21: 3.3V..3.4V
Bit20: 3.2V..3.3V
Bit19: 3.1V..3.2V
Bit18: 3.0V..3.1V
Bit17: 2.9V..3.0V
Bit16: 2.8V..2.9V
Bit15: 2.7V..2.8V
You do NOT have to switch CS high at ANY point in time. You can keep it low ALL the time.

Related

Reading data over the SPI bus using DMA witth STM32F479

I am using the STM32F479 microcontroller along with a AFE440 Analog Front End. When data is ready to be read on the AFE I get a trigger via the ADC_RDY pin on the Microcontroller. At this point I need to read 4 different registers on the AFE all with 3 bytes of data and store them in a buffer. (3 * 4 = 12 bytes total). Then I want my processor to sleep until I get another event on the ADC_RDY pin at which point I read another 12 bytes. I want to store the 12 bytes read each time in a FIFO buffer of size 120 bytes.
I would like to read and store the bytes into the Buffer all using DMA. My processor will be a sleep during this transaction. It will wake up once the FIFO buffer is full with 120 bytes and process the data.
How would I go about setting this up with ST ?

How to test Bit-Banged communication's assembly routines

For one MCU I have written some assembly routines performing RX and TX of a proprietary protocol (UART-based) in a bit-bang fashion. How can I test them?
TX might be tested by sending data, and at the same time, with the help of a logic analyzer, checking that all the sampled timings are correct (manually or with some scripts).
RX on the other hand is more difficult. On one hand I can check if I'm receiving what someone else is sending, but on the other hand how do I know that the RX sampling is happening correctly (timing-wise)?
For example, my RX routine may return the correct data by sampling at the edge of the "bit window" instead of the middle.
I thought about toggling a "debug pin" to indicate when the sampling is actually happening, but this introduces delays in the sampling procedure, hence I wouldn't be testing my original routine.
Some things worth clarifying after reading comments:
I know that hardware UART is better (it depends, though), but I can't use it. This is not a matter of "have you tried this ...?";
I know how to do the bit banging (I have already written the assembly routines);
I can't connect TX to RX because I'm only using 1 wire (the communication is half-duplex);
I'm asking how to test the RX sampling timings, not how to implement UART.
I thought about toggling a "debug pin" to indicate when the sampling
is actually happening, but this introduces delays in the sampling
procedure, hence I wouldn't be testing my original routine.
Test with the instrumentation code, and then leave the instrumentation - or near-equivalent code that doesn't actually twiddle hardware - in place.
You'll need something to send data to the MCU, perhaps a second MCU. I've worked on similar code for both 6502 and Z80 for old 8 bit Atari peripherals. These are half duplex protocols, so whenever the device is idle, it's polling for a start bit. After detecting a start bit, it delays 1.5 bit times, then receives 8 bits, with 1 bit time between bits. Both receiving and sending of data routines are coded to get exact cycle counts for timing. These were old devices, and even the fastest bit rate was relatively slow at 19 microseconds per bit ~= 52600 baud.
The question has been updated. If the input and output instructions take the exact same time to run (cycle count), you could modify the receive code to transmit data to verify the bit time, and confirm exactly how fast the processor is running.
For the timing regarding sensing the start bit and doing a 1.5 bit time wait, you'd have to calculate the minimum and maximum number of cycles to sense the start bit. The maximum cycle count would be an input instruction that just misses the trailing edge of the start bit, the test instruction, and the loop back to the input, followed by another test and then a fall through the loop to continue the receive. The minimum cycle count would be an input that just barely catches the leading edge of the start bit, does a test, then falls through the loop. Then the remainder of the receive code needs to sample as close as possible to the middle of the data bit periods.
Here is example of code for a 4mhz Z80 that receives data at 19 microseconds == 76 cycles per data bit. The comments include the cycle count for each instruction. The ideal wait time for start bit to 1st data bit is 114 cycles. The min,max cycle time for the start bit loop is 20,50 cycles. An additional delay plus the input of the first data bit of 79 cycles is used so min,max cycle time to sense start to receive 1st data bit is 99,129 cycles, within the min,max bounds of 76,152 cycles. The remaining data bits are read at exactly 76 cycles per bit.
LD E,0 ;SET UP
; ; START BIT TO DATA BIT=114
NRXF0: LD A,(FBS) ;(13) WAIT FOR START BIT
AND FBSRXD ;(7)
JP NZ,NRXF0 ;(10)
; ; NOTE: 20 MIN, 50 MAX, 35 AVG
EX (SP),HL ;(19) DELAY
EX (SP),HL ;(19)
LD A,(HL) ;(7)
NRXF1: LD A,(HL) ;(7)
LD A,(HL) ;(7)
LD D,8 ;(7) 8 BITS PER BYTE
; ; 76 CYCLES PER DATA BIT
NRXF2: LD A,(FBS) ;(13) GET DATA BIT
AND FBSRXD ;(7)
ADD A,0FFH ;(7)
RR C ;(8)
PUSH BC ;(11) DELAY
POP BC ;(10)
NOP ;(4)
DEC D ;(4) LP TIL BYTE DONE
JR NZ,NRXF2 ;(12/7)
RET NZ ;(5) DELAY
NRXF4: LD A,(FBS) ;(13) WAIT FOR NEXT START BIT
AND FBSRXD ;(7)
JP NZ,NRXF4 ;(10)
; ; START BIT TO DATA BIT=114
LD (HL),C ;(7) STORE BYTE
LD A,C ;(4) DO CKSUM
ADD A,E ;(4)
ADC A,0 ;(7)
LD E,A ;(4)
INC HL ;(6) ADV ADR
DJNZ NRXF1 ;(13/8) LP IF MORE BYTES

Every SPI send results in receiving a 0 on MSP430

I run this simplified program for SPI communication, running on the TI MSP430FR5969 on its corresponding launchpad MSP-EXP430FR5969, and set breakpoints just before TX and just after RX in CCS (Code Composer Studio). The breakpoints are labelled with comments.
My launchpad is not connected to anything. (Once I figure this out I intend to communicate it to some other device for real communication.)
I do not expect to receive any data because the launchpad is not connected to anything. But I receive exactly one zero for every send. The breakpoints are hit in alternate order starting with the first TX breakpoint.
Why am I receiving data? Is it because I need to enable pullup registers on some of the pins? I believe the launchpad itself uses the USCI "A" module(s) so the "B" module that I am using should have nothing connected to it.
#include <msp430.h>
int main(void) {
WDTCTL = WDTPW | WDTHOLD;
P1SEL0 &= ~BIT3; // UCB0STE
P1SEL0 &= ~BIT6; // UCB0SIMO
P1SEL0 &= ~BIT7; // UCB0SOMI
P2SEL0 &= ~BIT2; // UCB0CLK
P1SEL1 |= BIT3; // UCB0STE
P1SEL1 |= BIT6; // UCB0SIMO
P1SEL1 |= BIT7; // UCB0SOMI
P2SEL1 |= BIT2; // UCB0CLK
PM5CTL0 &= ~LOCKLPM5;
CSCTL0_H = CSKEY_H;
CSCTL1 &= ~DCORSEL;
CSCTL1 = (CSCTL1 & ~0x000e) | DCOFSEL_0; // 1 MHz
CSCTL3 |= DIVA__1 | DIVS__1 | DIVM__1; // clock dividers = 1
CSCTL0_H = 0;
UCB0CTLW0 |= UCSWRST;
UCB0CTLW0 |= UCCKPH;
UCB0CTLW0 |= UCCKPL;
UCB0CTLW0 |= UCMSB;
UCB0CTLW0 |= UCMST;
UCB0CTLW0 |= UCMODE_2;
UCB0CTLW0 |= UCSYNC;
UCB0CTLW0 |= UCSSEL__SMCLK;
UCB0CTLW0 |= UCSTEM;
// UCB0STATW |= UCLISTEN; // OK, if enabled i receive what i send
UCB0CTLW0 &= ~UCSWRST;
UCB0IE |= UCRXIE;
_enable_interrupts();
_delay_cycles(100000);
int send = 0;
while (1) {
while (!(UCB0IFG & UCTXIFG));
UCB0TXBUF = send; // BREAKPOINT 1
send = (send + 1) % 100;
_delay_cycles(100000);
}
return 0;
}
#pragma vector = USCI_B0_VECTOR
__interrupt void isr_usci_b0 (void) {
static volatile int received = 0;
switch (__even_in_range(UCB0IV, USCI_SPI_UCTXIFG)) {
case USCI_NONE:
break;
case USCI_SPI_UCRXIFG:
received = UCB0RXBUF;
UCB0IFG &= ~UCRXIFG; // BREAKPOINT 2
_no_operation();
break;
case USCI_SPI_UCTXIFG:
break;
}
}
The SPI peripheral does two things if MISO and MOSI are enabled (CLK enabled as well, of course). Assuming Master mode operation, it clocks out data from the TX shift register on the MOSI line and simultaneously clocks in data to the RX shift register from the MISO line.
In your circuit, the MISO input is hanging since you have not enabled either pull-up or pull-down internal resistances. Thus, observing 0x00 would not be out of the ordinary. If you had enabled the pull-up resistance, then you would have seen 0xFF in the receive buffer.
Another rule of thumb:
If you are using the peripheral functions then configure the GPIO pins of the MSP430 as output/input. (i.e. MOSI, CLK = output, MISO = input for SPI master mode)
Answer to the questions in the comments:
The MSP430 is configured in the listed code to be the SPI master. I see little point in the using a dedicated RX interrupt service routine, unless you want the controller to do something else in the time between shifting data from the TX buffer to the shift register and shifting data from the RX shift register to the RX buffer, i.e. one "byte" transfer period. You could as well have polled for the RX interrupt as you have for the TX interrupt. But you must wait for the RX interrupt.
Excerpt from the user guide:
The eUSCI initiates data transfer when data is moved to the transmit data buffer UCxTXBUF. The UCxTXBUF data is moved to the transmit (TX) shift register when the TX shift register is empty, initiating data transfer on UCxSIMO starting with either the MSB or LSB, depending on the UCMSB setting. Data on UCxSOMI is shifted into the receive shift register on the opposite clock edge. When the character is received, the receive data is moved from the receive (RX) shift register to the received data buffer UCxRXBUF and the receive interrupt flag UCRXIFG is set, indicating the RX/TX operation is complete.
A set transmit interrupt flag, UCTXIFG, indicates that data has moved from UCxTXBUF to the TX shift register and UCxTXBUF is ready for new data. It does not indicate RX/TX completion.
To receive data into the eUSCI in master mode, data must be written to UCxTXBUF, because receive and transmit operations operate concurrently.
The client will not send data by itself to the MSP430. The client device may need some time to execute the command the master just sent. Typically an "erase flash" command for SPI Flash chips.
In this case the master, i.e. MSP430, must poll the client device to see if it has data to send/completed the command. This is done typically either by polling a status register of the client device (or by using a dedicated IRQ interrupt). i.e. the client signals "completion of command"/"availability of data" via the status byte (or IRQ interrupt). On this event, the master could read out data from the client.
At first glance it may seem rather counter intuitive that data (dummy bytes) needs to be written in order to read data - perhaps your source of confusion as well :)
Perhaps reading about an SPI client may help. For example this SPI memory.
The SPI peripheral transmits a bit and receives a bit for every clock cycle. Instead of wondering how some unconnected device has sent a byte, think that your SPI peripheral has clocked in a receive byte even though nothing is connected. The byte you receive is 0 because the MISO line happens to be low while nothing is connected.
The SPI peripheral does not know the meaning of the data and does not know how many bytes must be transmitted and received for any particular command. It's up to your application to know when to transmit and receive dummy bytes. For example, if the slave responds to a command in the next byte then your application has to transmit two bytes (the command byte followed by a dummy byte) and at the same time receive two bytes (a dummy byte, followed by the response). Some slaves may send a generic status byte instead of a dummy byte as the first byte of all responses. It's up to your application to use or ignore the status byte.
The MSP430's SPI documentation is not going to tell you when you need to send/receive dummy bytes. You'll have to read the SPI documentation for the slave device for that information. Each slave may have different requirements. Some slaves my receive a command byte and send a reply. Other slaves may receive a command and address byte before sending a reply. Some slaves may reply with multiple bytes. You'll have to program your application to transmit/receive the appropriate number of bytes.
There are no stop bits. The master is both transmitting and receiving with every clock. If you want to stop receiving then stop transmitting. If you want to continue receiving then transmit dummy bytes.
Yes, you should use the RX interrupt. The RX interrupt indicates that it is safe for your application to read the received byte from the RX register. The SPI peripheral is shifting receive bits into a shift register with each clock. But after a byte has been received the SPI peripheral still has to copy the contents of the shift register to the RX register and then set the RX interrupt. You shouldn't assume that the received byte can be read from the RX register until the RX interrupt is indicated.
The behavior you describe is to be expected. With SPI, it is movement on the clock line that indicates the presence of data. The input line can be idle, and data will be received because, in order to send a byte, the clock must be toggled back and forth to latch the transmitted data, but at same time, data is latched into the receive buffer.
An SPI bus is intended to be a closed pathway. The TX line from your processor goes out and is daisy-chained to one or more slave devices and then looped back to your RX pin. Every transition on the clock line drives a data bit. This means that for every transition your hardware will shift one bit into your receive buffer. It's up to your code to know how many of those bits to discard before you start reading real data.
You are reading 0's because nothing is driving the RX pin. When you're connected to a real device, the first several bytes you send will also likely generate 00's on your RX pin. Usually you'll have to send some sort of command byte to the slave device which then will start sending real data. The length of that command should be discarded because the slave will not have started driving its output pin until the command byte (word, string, whatever) is complete.

lpc1788 ssp (SPI) - proc to proc communication

I would like to send string of chars from one proc (master) to another (slave) and then read string from a slave.
Currently im mixing up the arduino and LPC1788, using lpc as master and arduino as slave.
LPC sent's the string correctly which is received by the arduino in ISR. In loop function i check if all of the chars are received and then try to send string back. On LPC side ISR is not working for some reason. I have set SR as
SR = (1<<TNF) | (1<<RNE);
So i have put delay after sending the string from LPC and then initiate read from arduino.
What i see on LA for sending the string is:
but reading of string from Arduino looks odd (string should be "Pong\n", it is not always P that i received... it varies)
i guess majority of problem is within the sync of sending and reading of SPI buffer. How do i achieve that without functional ISR on LPC?
The SPI specification states that the CS (SSEL) line should be active during a frame and become inactive in between. NXP interpreted this as a word being one frame. This means that the CS as generated by the SSP block (the same goes for the legacy SPI) is only active during one transaction of up to 16 bits.
Note also that there is always a gap in between the words/frames being sent. So even when you fill the FIFO or use DMA you will see 16 clock pulses, a short delay and then 16 more pulses.
When using a GPIO pin as SSEL, please note you have to wait for SSEL assertion or de-assertion until the peripheral is idle.

Implementing SPI slave ISR on PIC32?

I have two PIC32MX microcontrollers that are connected over a 1.53MHz SPI bus with Chip Select. I am having trouble getting my slave side interrupt service routine to transmit data correctly. As a test case, I'm having the master send out two bytes (0x01, 0x00) every 10 ms. The slave is supposed to receive the 0x01 command id and respond with a 0x02 when the master sends the 2nd byte (the dummy 0x00).
Ideally each transfer should look like this.
Master Slave
0x01 0x00
0x00 0x02
I'm really not sure where to start with the slave interrupt though. I'm using a fifo buffer called airsysTx to hold data that needs to be shifted out the next time the master makes a request. The slave receives the 0x01 from the master just fine and writes 0x02 to the fifo buffer when it does. I'm not sure how to code the interrupt so that it will be sure to transmit correctly. The code I have below is a good start, but it's wrong. Suggestions?
/*******************************************************************************
* Interrupt service routine for SPI3 interrupts from Air MCU.
* The user's code at this vector should perform any application specific
* operations and MUST clear the SPI3 interrupt flags before exiting.
******************************************************************************/
void __ISR(_SPI_3_VECTOR, ipl7) _SPI3Interrupt()
{
BYTE MasterCMD;
SET_D1();//Set debug LED
// RX INTERRUPT
if(IFS0bits.SPI3RXIF) // receive data available in SPI3BUF Rx buffer
{
MasterCMD = SPI3BUF;
if(AirCMD == 0x01)
{
airsysTxFlush();
airsysTxWrite(0x02);
}
}
//Transmit data if needed.
if(SPI3STATbits.SPITBE)
{
if(!airsysTxIsEmpty())
{
SPI3BUF = airsysTxRead();
}
else
{
//Else write 0 to the tx buffer to clear the spi shift reg
SPI3BUF = 0x00;
}
}
IFS0bits.SPI3RXIF = 0;
IFS0bits.SPI3TXIF = 0;
IFS0bits.SPI3EIF = 0;
SPI3STATbits.SPIROV = 0;// clear the Overflow
CLEAR_D1();//CLEAR Debug LED
} // end ISR
What this code is actually transmitting is something like this:
Ideally each transfer should look like this.
Master Slave
0x01 0x02
0x00 0x01
Generally you can't write a slave SPI driver to interact in the way you describe because you can't control the timing precisely as a slave. What generates your ISR, is it Rx of first byte from master or assertion of chip select?
As the slave, you need to have set up the data bytes you want to transmit before the master starts the transaction. You usually don't have time to react to the first byte. There are a couple of ways to do this:
1) You could use a protocol where master does a 1 or 2 byte write-only transaction that tells the slave what it wants to read. Then master waits a few milliseconds to allow the slave to prepare the response. Then master does a read-only transaction to get the slave response.
2) If using DMA or FIFO, slave preloads the first padding byte(s) into the fifo before master starts the transaction. Then as you get the ISR you put the remaining response data into the fifo (without a flush). You need to have enough pad bytes to accommodate the slave ISR latency in forming the response. So for example, you may define your protocol where master knows that the first N bytes of response are pad bytes, followed by response data. Padding requirement would depend on your master clock speed and slave CPU speed/interrupt latency.