How to understand the SPI clock modes?

There are many links on the web describing the SPI timing/clock modes. For example, the following picture from here shows how the 4 combinations of CPOL/CPHA determine when data is sampled/transmitted with respect to the clock's rising/falling edges:
And for SPI to work correctly, both the controller (or master) and the device (or slave) are required to work in the same clock mode.
A few days ago, I encountered a datasheet describing a QSPI controller, saying that it only supports mode 1 (CPOL=0, CPHA=1). It also contains an AC timing requirement for the SPI interface, as below:
The timing diagram confirms that the controller works in mode 1: SCLK is low when idle, and data is sampled on the falling edge of SCLK. So far so good.
What puzzles me is another statement in the user guide: "The SPI only supports SPI mode 1. but, if the SPI device follows the Soc AC timing, it is usable regardless of the mode." How is that possible? Apparently it violates the "same mode" rule. For example, how does it work when the controller is in mode 1 (sampling on the falling edge) and the device is in mode 0 or 3 (sampling on the rising edge)?
Btw, a working system shows that the controller actually works with a QSPI flash device whose datasheet says that it only supports modes 0 and 3. This implies that I must be missing some point in my understanding of the SPI clock modes...
Any explanation is appreciated.

Your "mode 1" qspi controller is not behaving like a normal mode 1 spi controller.
Look at the timing diagram: It outputs new data on MOSI on the falling edge of CLK, AND it samples MISO on the falling edge of CLK. That is not equivalent of any of the four SPI modes.
It is done to allow much higher performance.
(Q)SPI flash memories sample MOSI on the rising edge of CLK, and output new data on MISO on the falling edge of CLK.
Regular SPI controllers will sample MISO on the rising edge of CLK. This means the entire round trip (CLK from controller to flash, flash access, and MISO back to the controller) must happen in just half a clock cycle.
This will typically limit you to 25MHz or less.
By having the controller sample MISO on the falling edge, the round-trip gets a whole clock period worth of time, which means you can reach 50MHz or more.
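As a rough worked example (the 20 ns round-trip figure below is purely illustrative, not taken from any datasheet): if the controller's clock-to-output delay, board routing, the flash's clock-to-MISO-valid time and the controller's setup time add up to about 20 ns, then

$$f_{\max} \le \frac{1}{2 \times 20\,\text{ns}} = 25\ \text{MHz} \quad\text{(round trip within half a period)}, \qquad f_{\max} \le \frac{1}{20\,\text{ns}} = 50\ \text{MHz} \quad\text{(round trip within a full period)}.$$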

In general:
In SPI there is only one clock edge that matters to the receiver. In modes 0 and 3 it is the rising edge, in modes 1 and 2 it is the falling edge.
The receiver requires the data that it is going to read to be valid for some short period immediately before the edge that matters (called the "setup time") and requires that it remains valid for some short period after the edge that matters (called the "hold time").
Outside this small window (the setup time plus the hold time) the value of the data lines does not matter at all. The receiver does not do anything at the clock edge other than the one where it samples the data, and the line may have any value then, including an intermediate voltage which is between high and low.
The only responsibility of each transmitter is to meet the requirements of the receiver. That is, the master must control MOSI in such a way that it meets the requirements of the slave, and the slave must control MISO so that it meets the requirements of the master.
At slow speeds the easiest way for each transmitter to meet the requirements of the other device is to change the data at the opposite clock edge to the one the receiver cares about. This implies using the same clock mode at both ends, as shown in the diagrams in your question.
At higher speeds this is not always ideal and the timing method described in your datasheet extract is widespread and perfectly normal.
For your particular device:
Your slave device samples MOSI on the falling edge of CLK. It requires the master to have driven MOSI to the correct level at least tIS (input setup time) before this and to hold it for at least tIH (input hold time) after this.
In the other direction, your slave device holds the previous MISO value for at least tCLQX (clock low output hold time) after the edge where it assumes the master would have sampled it.
Your slave device outputs the next MISO value no later than tCLQV (clock low to output valid time) after the same falling edge.
In the case of the naive timing diagrams in your question, you would assume that tCLQX would have the same value as tCLQV and would also coincide with the next rising edge of the clock, but the datasheet is saying that this does not have to be the case.
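Putting those numbers together (with tSU and tH standing for the controller's own setup and hold requirements at its falling-edge sample point, symbols introduced here for illustration rather than taken from the datasheet), the scheme works at a clock period T_CLK as long as roughly

$$t_{CLQV} + t_{SU} \le T_{CLK} \qquad\text{and}\qquad t_{CLQX} \ge t_{H}.$$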

Related

What is the best way to design a shift register with an STM32?

I am using an STM32F031K6, clocked at 40 MHz, and I want to design a program which acts as a looping shift register: an external trigger is used to clock it, and the values in the shift register shift left every time a rising/falling edge is received. The output is one pin, either high or low.
I need to make the time between the clocking edge and the output less than 0.5 µs, or failing that as quick as possible. The values of the shift register can be changed and the length can also be changed, but for now I'm just starting with a byte like 11000010.
I initially thought to implement this with an external interrupt, but it was suggested there may be a better way to implement it.
Any help much appreciated.
You might use the SPI peripheral of the STM32F0 for your task. When configured in slave mode, each time an external clock edge is detected on the SCK signal, the MISO will be set to the next bit of a value loaded into an internal shift register via the SPI data register.
Check out the chapter on the Serial peripheral interface (SPI) in STM32F0 reference manual.
Especially have a look at the sections addressing the following keywords:
General description: SPI block diagram
Slave Mode (Master selection: Slave configuration)
Simplex communication: Transmit-only mode (RXONLY=0)
Slave select (NSS) pin management: Software NSS management (SSM = 1)
Data frame format (data size can be set from 4-bit up to 16-bit length)
Configuration of SPI
The SPI unit is highly configurable, e.g. regarding the polarity of the clock signal. Since it is an independent hardware unit, it should be able to handle your 0.5 µs reaction time requirement. The MCU firmware needs to set up the SPI unit and then provide new data to the SPI unit each time the Tx buffer empty flag (TXE) is set. This can also be done by interrupt (TXEIE) or even using a DMA channel (TXDMAEN) with a circular buffer. In the latter case the "shift register functionality" runs completely independently of the MCU core (after setup).
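A minimal bare-register sketch of that approach (CMSIS register names; the GPIO/alternate-function setup, the pattern byte, and leaving CPOL/CPHA at their mode 0 defaults are assumptions to adapt to your hardware):

```c
#include "stm32f0xx.h"          /* CMSIS device header, assumed available */

static volatile uint8_t pattern = 0xC2;   /* 0b11000010, the example byte */

void shift_register_init(void)
{
    RCC->APB2ENR |= RCC_APB2ENR_SPI1EN;   /* clock the SPI1 peripheral */
    /* SCK and MISO pins must be configured for the SPI1 alternate
       function beforehand (GPIO setup omitted here). */

    SPI1->CR1 = SPI_CR1_SSM;              /* slave mode (MSTR = 0); software NSS,
                                             SSI = 0 keeps the slave selected.
                                             CPOL/CPHA left at reset (mode 0);
                                             adjust to match your external clock */
    SPI1->CR2 = SPI_CR2_DS_0 | SPI_CR2_DS_1 | SPI_CR2_DS_2  /* DS = 0111: 8-bit frames */
              | SPI_CR2_TXEIE;            /* interrupt when the Tx buffer empties */

    *(volatile uint8_t *)&SPI1->DR = pattern;   /* preload first byte (8-bit access) */
    SPI1->CR1 |= SPI_CR1_SPE;             /* enable the peripheral */
    NVIC_EnableIRQ(SPI1_IRQn);
}

void SPI1_IRQHandler(void)
{
    if (SPI1->SR & SPI_SR_TXE) {
        /* Refill with the same byte so the 8-bit pattern repeats forever;
           change or rotate 'pattern' here to alter the sequence. Incoming
           MOSI data is ignored, so the RX overrun flag may be set and can
           simply be left alone. */
        *(volatile uint8_t *)&SPI1->DR = pattern;
    }
}
```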

Where is the SPI multiplexer of my dreams?

Consider you have an SPI bus with only a single chip select.
Is there a chip such that I can connect 8 or more devices to that SPI bus?
To simplify things, you may assume that all devices agree on the SPI mode (data needs to be valid on a rising edge). Also, all devices are of the type where the chip select stays low for the whole transfer and is not toggled after each word.
The SPI multiplexer could have 4 inputs:
MISO, MOSI, input clock, master chip select
and 9 outputs:
output clock, 8 slave chip selects
MISO and MOSI are connected directly to the slaves. The slaves have their SPI clock connected to the output clock and their chips selects are connected to one of the 8 slave chip selects.
The SPI multiplexer would take the two bytes of each SPI transfer as its own input. The first byte could indicate which slave is to be selected. For configuration of the multiplexer, a ninth address could be allowed.
If one of the 8 slaves is selected, the multiplexer would then activate the slave's chip select after the first byte (or even after the first few bits of the first byte). The output clock would activate with the start of the second byte and it would be synchronous to the input clock. Leaving the clock inactive during the first byte makes sure that the slaves never take notice of the first byte.
Such a chip does not seem to exist. I found solutions with two chip selects, but that's not an option to upgrade old hardware designs with just a single chip select.
Does such a thing exist?
It does not exist because there is no need for it. Often CS is only controlled from software as a regular GPIO before initiating an SPI transfer, in which case you can just wire the different slaves to different GPIOs and use them as CS. If the CS is generated and needed by the hardware block, you gate this signal with the external chip select to choose the wanted slave.
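As a sketch of that "regular GPIO as CS" approach (gpio_write() and spi_transfer() are hypothetical placeholders for whatever your platform provides, and the pin numbers are made up):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical platform helpers; substitute your HAL's equivalents. */
extern void gpio_write(int pin, bool level);
extern void spi_transfer(const uint8_t *tx, uint8_t *rx, size_t len);

/* One ordinary GPIO per slave, used as an active-low chip select. */
static const int cs_pin[8] = { 10, 11, 12, 13, 14, 15, 16, 17 };

static void spi_select_and_transfer(unsigned slave,
                                    const uint8_t *tx, uint8_t *rx, size_t len)
{
    gpio_write(cs_pin[slave], false);   /* assert CS (low) for the whole transfer */
    spi_transfer(tx, rx, len);
    gpio_write(cs_pin[slave], true);    /* release CS */
}
```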
Your proposed one-byte slave-select scheme would also break all software which relies on writing from and reading into the same buffer, and it would decrease the available bandwidth with control overhead.
As a side note, with recent developments in secure enclaves and TrustZone, the trend is not to share SPI buses but rather to hard-wire them so that only code running in the trusted part of the SoC is able to access the connected slave.

How UART handles spike or noise state change?

In UART, the default state is high; the receiver starts the process when it receives a start bit, i.e. a state change from high to low.
My question is: in some worst-case scenario, a spike, noise, or interference may change the default state from high to low, which is enough for the UART to detect it as a start bit, even though it should not start processing the following bits. I know the UART internally handles these scenarios and avoids the processing.
Can anyone explain to me how the UART behaves in this case?
If the signal drop is short enough, the UART might not detect it as a start bit. For example, a UART might oversample the line level at 16x the bit rate and when it detects a falling edge in the RX line will then look for a certain number of 0 samples in the next 16 to detect the start bit. If it doesn't see enough 0 samples, it won't consider it a start bit (and might set some condition bit that indicates a noisy/dirty line).
But if the UART sees the line drop for long enough to consider it a start bit, then the UART will start processing as if a character is being received. If the signal is "junk" this might result in:
a spurious character,
a framing error,
a parity error (if the UART is configured to check parity),
a 'break' signal (if the line stays in the low state long enough)
The datasheet for the UART device should give details on how bits are sampled & detected.
A UART receiver samples the RX line 16 times (in most microcontrollers) before confirming the value of each bit. For example, if the baud rate is 9600, each bit time will be 104 µs. To correctly detect the value of each bit (i.e. whether it is high or low), the UART receiver samples the bus every 104/16 µs. A majority vote of these samples is then used to decide the value of the bit. The start bit is used to indicate to the receiver that data bits are about to be received; it is mandatory that the start bit be low. By employing such a sampling scheme, the receiver ensures that the effects of noise are largely eliminated.
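As a simplified software model of that scheme (a sketch of the idea only, not how any particular UART's silicon is implemented; sample_rx() is a hypothetical helper that reads the RX line level and is called 16 times per bit period):

```c
#include <stdbool.h>

/* Hypothetical helper returning the current RX line level (0 or 1). */
extern int sample_rx(void);

/* Returns true if the line really carries a start bit: after the falling
   edge, the majority of the 16 samples in the bit period must be 0.
   A short glitch gives too few 0 samples and is rejected as noise. */
static bool start_bit_valid(void)
{
    int zeros = 0;
    for (int i = 0; i < 16; i++)
        if (sample_rx() == 0)
            zeros++;
    return zeros > 8;               /* majority vote */
}

/* Each data bit is decided the same way: majority of 16 samples. */
static int read_bit(void)
{
    int ones = 0;
    for (int i = 0; i < 16; i++)
        if (sample_rx() == 1)
            ones++;
    return (ones > 8) ? 1 : 0;
}
```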
UART is an old, increasingly obsolete 1970s technology, so the error handling is poor. On the most fundamental level, the UART will trigger any read by a falling edge, which is the start bit.
So if there is noise which gets interpreted as the falling edge of the start bit, the UART will then sample 8 data bits, with sample times according to its pre-configured baudrate (usually it samples 16 times faster than the set baudrate). It will sample regardless of any edges present. After that, it will clock in a potential parity bit, and then 1 or 2 stop bits.
At the point where the stop bit is read, the UART checks to ensure that the parity bit (if present) is correct, if not, you will get parity errors. And then it checks the stop bit, which must be high, otherwise you get a framing error.
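As an illustration only, the end-of-frame checks just described could be modelled roughly like this (even parity assumed; a software sketch, not an actual UART implementation):

```c
#include <stdint.h>

typedef enum { FRAME_OK, PARITY_ERROR, FRAMING_ERROR } frame_status_t;

/* data: the 8 received data bits; parity_bit/stop_bit: the sampled levels. */
static frame_status_t check_frame(uint8_t data, int parity_bit, int stop_bit)
{
    /* even parity: the count of 1s across data bits plus parity must be even */
    int ones = parity_bit;
    for (uint8_t b = data; b != 0; b >>= 1)
        ones += b & 1;
    if (ones & 1)
        return PARITY_ERROR;

    if (stop_bit != 1)              /* stop bit must be sampled high */
        return FRAMING_ERROR;

    return FRAME_OK;
}
```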
These kinds of error checks are of roughly fifty/fifty quality; double-bit errors or larger will cause lots of problems. "Parity" in particular is a poor method of detecting errors. Professionals stopped using it around 30 years ago. Overall, the error detection chance by hardware is very poor, by design.
This is why all UART-based protocols must have lots of checksums, sync bytes and other such overhead in order to work. UART is very vulnerable to EMI, to the point where you cannot reliably use UART buses outside a circuit board, without using differential signals (like RS-422).

When putting an SD/MMC card into SPI mode, can CS go high in between bytes?

I have a microprocessor with the card's chip select (CS) line tied to a 'frame' signal automatically driven by the SPI (SSP) circuit. This causes CS to go high between each byte.
The MMC/SD specs require that CS be held low in order to enter SPI mode. Does it need to be held low the entire time, or only when transmitting each byte of CMD0?
At the sdcard.org site, I found various PDF specifications for SDIO. None of these seem to have an explicit timing statement which clarifies this. However, this statement does occur:
(1) SD Bus mode is selected by CMD0 (Keep Pin 1 to high during CMD0 execution).
from page 88 of SD Host Controller Simplified Specification Version 2.00. ("Pin 1" is Chip Select (CS))
Given that sentence, an SD card manufacturer would be justified in requiring that you assert CS through the entire D0..D15 bits being sent. In other words, I think you cannot use the SPI frame signal and will need a GPIO pin or similar.
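For illustration, a sketch of clocking out CMD0 with CS held low by an ordinary GPIO for the whole command (spi_xfer_byte() and gpio_write() are hypothetical helpers and the pin number is made up; the command bytes and the 0x95 CRC are the standard CMD0 values):

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helpers; substitute your platform's SPI/GPIO routines. */
extern uint8_t spi_xfer_byte(uint8_t out);
extern void    gpio_write(int pin, bool level);

#define SD_CS_PIN 4   /* assumed GPIO used as chip select */

/* Send CMD0 (GO_IDLE_STATE) keeping CS low across the entire command,
   rather than letting a hardware frame signal pulse it between bytes. */
static uint8_t sd_send_cmd0(void)
{
    static const uint8_t cmd0[6] = { 0x40, 0x00, 0x00, 0x00, 0x00, 0x95 };
    uint8_t r1 = 0xFF;

    gpio_write(SD_CS_PIN, false);           /* assert CS for the whole frame */
    for (int i = 0; i < 6; i++)
        spi_xfer_byte(cmd0[i]);

    /* poll up to 8 bytes for the R1 response (0x01 = card in idle state) */
    for (int i = 0; i < 8 && r1 == 0xFF; i++)
        r1 = spi_xfer_byte(0xFF);

    gpio_write(SD_CS_PIN, true);            /* release CS afterwards */
    return r1;
}
```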

How CPU finds ISR and distinguishes between devices

I should first share all that I know, and that is complete chaos. There are several different questions on the topic, so please don't get irritated :).
1) To find an ISR, the CPU is provided with an interrupt number. In x86 machines (286/386 and above) there is an IVT with ISRs in it; each entry is 4 bytes in size. So we need to multiply the interrupt number by 4 to find the ISR. So the first bunch of questions is: I am completely confused about the mechanism of the CPU receiving the interrupt. To raise an interrupt, the device first asserts an IRQ, then what? Does the interrupt number travel "on the IRQ line" towards the CPU? I also read something about the device putting the ISR address on the data bus; what is that then? What is the concept of devices overriding the ISR? Can somebody tell me a few example devices where the CPU polls for interrupts? And where does it find the ISR for them?
2) If two devices share an IRQ (which is very much possible), how does the CPU differentiate between them? What if both devices raise an interrupt of the same priority simultaneously? I got to know there will be masking of same-type and lower-priority interrupts, but how does this communication happen between the CPU and the device controller? I studied the role of the PIC and APIC for this problem, but could not understand it.
Thanks for reading.
Thank you very much for answering.
CPUs don't poll for interrupts, at least not in a software sense. With respect to software, interrupts are asynchronous events.
What happens is that hardware within the CPU recognizes the interrupt request, which is an electrical input on an interrupt line, and in response, sets aside the normal execution of events to respond to the interrupt. In most modern CPUs, what happens next is determined by a hardware handshake particular to the type of CPU, but most of them receive a number of some kind from the interrupting device. That number can be 8 bits or 32 or whatever, depending on the design of the CPU. The CPU then uses this interrupt number to index into the interrupt vector table, to find an address to begin execution of the interrupt service routine. Once that address is determined, (and the current execution context is safely saved to the stack) the CPU begins executing the ISR.
When two devices share an interrupt request line, they can cause different ISRs to run by returning a different interrupt number during that handshaking process. If you have enough vector numbers available, each interrupting device can use its own interrupt vector.
But two devices can even share an interrupt request line and an interrupt vector, provided that the shared ISR is clever enough to go back to all the possible sources of the given interrupt, and check status registers to see which device requested service.
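A sketch of such a shared handler (the device descriptor, the status-register layout, and all the names are invented for illustration):

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative device descriptor: where its status register lives, which
   bit signals "I requested service", and the routine that services it. */
struct shared_irq_device {
    volatile uint32_t *status_reg;
    uint32_t           pending_mask;
    void             (*service)(void);
};

/* All devices known to share this interrupt vector (filled in elsewhere). */
extern struct shared_irq_device devices[];
extern size_t                   device_count;

/* The shared ISR asks each possible source whether it was the one (or one
   of the ones) that raised the request, and services those that did. */
void shared_irq_handler(void)
{
    for (size_t i = 0; i < device_count; i++) {
        if (*devices[i].status_reg & devices[i].pending_mask)
            devices[i].service();
    }
}
```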
A little more detail
Suppose you have a system composed of a CPU, an interrupt controller, and an interrupting device. In the old days, these would have been separate physical devices; now all three might even reside in the same chip, but all the signals are still there inside the ceramic case. I'm going to use a PowerPC (PPC) CPU with an integrated interrupt controller, connected to a device on a PCI bus, as an example that should serve nicely.
Let's say the device is a serial port that's transmitting some data. A typical serial port driver will load a bunch of data into the device's FIFO, and the CPU can do regular work while the device does its thing. Typically these devices can be configured to generate an interrupt request when the device is running low on data to transmit, so that the device driver can come back and feed more into it.
The hardware logic in the device will expect a PCI bus interrupt acknowledge, at which point, a couple of things can happen. Some devices use 'autovectoring', which means that they rely on the interrupt controller to see to it that the correct service routine gets selected. Others will have a register, which the device driver will pre-program, that contains an interrupt vector that the device will place on the data bus in response to the interrupt acknowledge, for the interrupt controller to pick up.
A PCI bus has only four interrupt request lines, so our serial device will have to assert one of those. (It doesn't matter which at the moment, it's usually somewhat slot dependent..) Next in line is the interrupt controller (e.g. PIC/APIC), that will decide whether to acknowledge the interrupt based on mask bits that have been set in its own registers. Assuming it acknowledges the interrupt, it either then obtains the vector from the interrupting device (via the data bus lines), or may if so programmed use a 'canned' value provided by the APIC's own device driver. So far, the CPU has been blissfully unaware of all these goings-on, but that's about to change.
Now it's time for the interrupt controller to get the attention of the CPU core. The CPU will have its own interrupt mask bit(s) that may cause it to just ignore the request from the PIC. Assuming that the CPU is ready to take interrupts, it's now time for the real action to start. The current instruction usually has to be retired before the ISR can begin, so with pipelined processors this is a little complicated, but suffice it to say that at some point in the instruction stream, the processor context is saved off to the stack and the hardware-determined ISR takes over.
Some CPU cores have multiple request lines, and can start the process of narrowing down which ISR runs via hardware logic that jumps the CPU instruction pointer to one of a handful of top-level handlers. The old 68K, and possibly others, did it that way. The PowerPC (and I believe the x86) have a single interrupt request input. The x86 itself behaves a bit like a PIC, and can obtain a vector from the external PIC(s), but the PowerPC just jumps to a fixed address, 0x00000500.
In the PPC, the code at 0x0500 is probably just going to immediately jump out to somewhere in memory where there's room enough for some serious decision making code, but it's still the interrupt service routine. That routine will first go to the PIC and obtain the vector, and also ask the PIC to stop asserting the interrupt request into the CPU core. Once the vector is known, the top level ISR can case out to a more specific handler that will service all the devices known to be using that vector. The vector specific handler then walks down the list of devices assigned to that vector, checking interrupt status bits in those devices, to see which ones need service.
When a device, like the hypothetical serial port, is found wanting service, the ISR for that device takes appropriate actions, for example, loading the next FIFO's worth of data out of an operating system buffer into the port's transmit FIFO. Some devices will automatically drop their interrupt request in response to being accessed, for example, writing a byte into the transmit FIFO might cause the serial port device to de-assert the request line. Other devices will require a special control register bit to be toggled, set, cleared, what-have-you, in order to drop the request. There are zillions of different I/O devices and no two of them ever seem to do it the same way, so it's hard to generalize, but that's usually the way of it.
Now, obviously there's more to say - what about interrupt priorities? what happens in a multi-core processor? What about nested interrupt controllers? But I've burned enough space on the server. Hope any of this helps.
I came across this question about 3 years later... hope I can help ;)
The Intel 8259A, or simply the "PIC", has 8 pins, IRQ0-IRQ7, and every pin connects to a single device.
Let's suppose that you pressed a button on the keyboard. The voltage on the IRQ1 pin, which is connected to the keyboard, goes high. So after the CPU gets interrupted, acknowledges the interrupt, and so on, the PIC simply adds 8 to the number of the IRQ line, so IRQ1 means 1 + 8, which means 9.
So the CPU sets its CS and IP from the 9th entry in the vector table, and because the IVT is an array of 4-byte entries, it just multiplies the entry number by 4 ;)
CPU.CS=IVT[9].CS
CPU.IP=IVT[9].IP
The ISR then deals with the device through the I/O ports ;)
Sorry for my bad English, I'm an Arab though :)