What is the best way to design a shift register with an STM32? - embedded

I am using an STM32F031K6, clocked at 40 MHz, and I want to design a program which acts as a looping shift register: an external trigger is used to clock it, and the values in the shift register shift left every time a rising/falling edge is received. The output is a single pin, either high or low.
I need the time between the clocking edge and the output to be less than 0.5 µs, or failing that as quick as possible. The values in the shift register can be changed and the length can also be changed, but for now I'm just starting with a byte like 11000010.
I initially thought to implement this with an external interrupt, but it was suggested there may be a better way.
Any help much appreciated.

You might use the SPI peripheral of the STM32F0 for your task. When configured in slave mode, each time an external clock edge is detected on the SCK signal, the MISO will be set to the next bit of a value loaded into an internal shift register via the SPI data register.
Check out the chapter on the Serial peripheral interface (SPI) in STM32F0 reference manual.
Especially have a look at the sections addressing the following keywords:
General description: SPI block diagram
Slave Mode (Master selection: Slave configuration)
Simplex communication: Transmit-only mode (RXONLY=0)
Slave select (NSS) pin management: Software NSS management (SSM = 1)
Data frame format (data size can be set from 4-bit up to 16-bit length)
Configuration of SPI
The SPI unit is highly configurable, e.g. regarding the polarity of the clock signal. Since it is an independent hardware unit, it should be able to handle your 0.5 µs reaction time requirement. The MCU firmware needs to set up the SPI unit and then provide new data to it each time the Tx buffer empty flag (TXE) is set. This can also be done by interrupt (TXEIE) or even using a DMA channel (TXDMAEN) with a circular buffer. In the latter case the "shift register functionality" runs completely independently of the MCU core (after setup).
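As an illustration, here is a minimal register-level sketch of this idea, assuming SPI1 of an STM32F0 with software NSS management and 8-bit frames; the clock tree and the GPIO alternate-function setup for SCK/MISO are assumed to be done elsewhere, and the bit names are the CMSIS ones from the STM32F0 device header.

```c
#include "stm32f0xx.h"   /* CMSIS device header (assumed available) */

/* Pattern to shift out, MSB first: 0b11000010 */
static volatile uint8_t pattern = 0xC2;

static void spi_slave_shiftreg_init(void)
{
    RCC->APB2ENR |= RCC_APB2ENR_SPI1EN;            /* clock the SPI1 block */

    /* Slave mode (MSTR=0), software NSS with SSI=0 -> always selected.
     * CPOL/CPHA are left at mode 0 here; pick them to match the polarity
     * of the external clock. */
    SPI1->CR1 = SPI_CR1_SSM;

    /* 8-bit frames (DS = 0111), interrupt whenever the TX buffer empties */
    SPI1->CR2 = SPI_CR2_DS_0 | SPI_CR2_DS_1 | SPI_CR2_DS_2 | SPI_CR2_TXEIE;

    /* Preload the first byte; the byte-wide access to DR matters on the
     * F0 because of the TX FIFO data packing. */
    *(volatile uint8_t *)&SPI1->DR = pattern;

    SPI1->CR1 |= SPI_CR1_SPE;                      /* enable the peripheral */
    NVIC_EnableIRQ(SPI1_IRQn);
}

/* Refill the TX buffer with the same byte each time it drains, so the
 * pattern repeats like a looping shift register. */
void SPI1_IRQHandler(void)
{
    if (SPI1->SR & SPI_SR_TXE)
        *(volatile uint8_t *)&SPI1->DR = pattern;
}
```

Replacing TXEIE with TXDMAEN and a circular DMA channel pointed at the pattern buffer removes even this per-byte interrupt load, as described above.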

Related

How to understand the SPI clock modes?

There are many links on the web describing the SPI timing/clock modes. For example, the following picture from here shows how the 4 combinations of CPOL/CPHA determine when data is sampled/transmitted with respect to the clock's rising/falling edges:
And for SPI to work correctly, both the controller (or master) and the device (or slave) are required to work in the same clock mode.
A few days ago, I encountered a datasheet describing a QSPI controller which says that it only supports mode 1 (CPOL=0, CPHA=1). It also contains an AC timing requirement for the SPI interface, as below:
The timing diagram confirms that the controller works in mode 1: SCLK is low when idle, and data is sampled on the falling edge of SCLK. So far so good.
What puzzles me is another statement in the user guide: "The SPI only supports SPI mode 1, but if the SPI device follows the SoC AC timing, it is usable regardless of the mode." How is that possible? Apparently it violates the "same mode" rule. For example, how does it work when the controller is in mode 1 (sampling on the falling edge) and the device is in mode 0 or 3 (sampling on the rising edge)?
By the way, a working system shows that the controller actually works with QSPI flash devices whose datasheets say they only support modes 0 and 3. This implies that I must be missing something in my understanding of the SPI clock modes...
Any explanation is appreciated.
Your "mode 1" qspi controller is not behaving like a normal mode 1 spi controller.
Look at the timing diagram: It outputs new data on MOSI on the falling edge of CLK, AND it samples MISO on the falling edge of CLK. That is not equivalent of any of the four SPI modes.
It is done to allow much higher performance.
(Q)SPI flash memories sample MOSI on the rising edge of CLK, and output new data on MISO on the falling edge of CLK.
Regular SPI controllers will sample MISO on the rising edge of CLK. This means the entire round-trip of CLK from controller to flash, flash access, and MISO back to the controller, must happen in just half a clock cycle.
This will typically limit you to 25MHz or less.
By having the controller sample MISO on the falling edge, the round-trip gets a whole clock period worth of time, which means you can reach 50MHz or more.
In general:
In SPI there is only one clock edge that matters to the receiver. In modes 0 and 3 it is the rising edge, in modes 1 and 2 it is the falling edge.
The receiver requires the data that it is going to read to be valid for some short period immediately before the edge that matters (called the "setup time") and requires that it remains valid for some short period after the edge that matters (called the "hold time").
Outside this small window (the setup time plus the hold time) the value of the data lines does not matter at all. The receiver does not do anything at the opposite clock edge to the one where it samples the data, and the line may have any value then, including an intermediate voltage between high and low.
The only responsibility of each transmitter is to meet the requirements of the receiver. That is, the master must control MOSI in such a way that it meets the requirements of the slave, and the slave must control MISO so that it meets the requirements of the master.
At slow speeds the easiest way for each transmitter to meet the requirements of the other device is to change the data at the moment of the opposite clock edge to the one the receiver cares about. This implies using the same clock mode at both ends, as shown in the diagrams in your question.
At higher speeds this is not always ideal and the timing method described in your datasheet extract is widespread and perfectly normal.
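To make the mode numbering concrete, here is a small self-contained sketch mapping the mode number to its CPOL/CPHA bits and to the edge the receiver samples on; the bit positions shown match the STM32 SPI_CR1 layout and are only an illustrative assumption, since other controllers name and place these bits differently.

```c
#include <stdint.h>
#include <stdbool.h>

/* CPOL/CPHA bit positions as found in the STM32 SPI_CR1 register
 * (illustrative only; adjust for your controller). */
#define SPI_CPHA_BIT  (1u << 0)
#define SPI_CPOL_BIT  (1u << 1)

/* SPI "mode" number -> CPOL/CPHA configuration bits */
static uint32_t spi_mode_bits(unsigned mode)
{
    switch (mode) {
    case 0:  return 0;                               /* CPOL=0, CPHA=0 */
    case 1:  return SPI_CPHA_BIT;                    /* CPOL=0, CPHA=1 */
    case 2:  return SPI_CPOL_BIT;                    /* CPOL=1, CPHA=0 */
    case 3:  return SPI_CPOL_BIT | SPI_CPHA_BIT;     /* CPOL=1, CPHA=1 */
    default: return 0;
    }
}

/* Which clock edge the receiver samples on: modes 0 and 3 use the
 * rising edge, modes 1 and 2 the falling edge. */
static bool spi_samples_on_rising_edge(unsigned mode)
{
    return (mode == 0) || (mode == 3);
}
```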
For your particular device:
Your slave device samples MOSI on the falling edge of CLK. It requires the master to have driven MOSI to the correct level at least tIS (input setup time) before this and to hold it for at least tIH (input hold time) after this.
In the other direction, your slave device holds the previous MISO value for at least tCLQX (clock low output hold time) after the edge where it assumes the master would have sampled it.
Your slave device outputs the next MISO value no later than tCLQV (clock low to output valid time) after the same falling edge.
In the case of the naive timing diagrams in your question, you would assume that tCLQX would have the same value as tCLQV and would also coincide with the next rising edge of the clock, but the datasheet is saying that this does not have to be the case.

Freertos and the necessity of uart transmit interrupt

For UART reception, it's pretty obvious to me what can go wrong in the case of a 'blocking receive' over UART. Even in FreeRTOS with a dedicated task to read from the UART, context/task switching could result in missing bytes that were received in the UART peripheral.
But for transmission I am not really sure if there is a need for an interrupt-based approach. I transmit from a task, and in my design it's no problem if that task is blocked for a short while (it also blocks/sleeps on mutexes, for example).
Is there another strong argument to use UART transmit in interrupt mode? I am not risking any loss of data, right?
In my case I use an STM32, but I guess the type of MCU is not really relevant here.
Let's focus on TX only and assume that we don't use interrupts and handle all the transmission with the tools provided by the RTOS.
µC UART hardware generally has a transmit shift register (TSR) and some kind of data register (DR). The software loads the DR, and if the TSR is empty, the DR is instantly transferred into the TSR and TX begins. The software is free to load another byte into the DR, and the hardware loads the new byte from the DR into the TSR whenever the TX (shift-out) of the previous byte finishes. The hardware provides status bits for querying the status of the DR & TSR. This way, the software can use a polling method and still achieve continuous transmission with no gaps between the bytes.
I'm not sure if the hardware configuration I described above holds for every µC. I have experience with 8 & 16-bit PICs and STM32 F0, F1, F4 series. They are all similar. UART hardware doesn't provide additional hardware buffers.
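As a concrete illustration of that polling approach, here is a minimal sketch using STM32F0-style USART register names (TXE and TC flags in the ISR register, data written to TDR); the USART itself is assumed to have been initialised elsewhere.

```c
#include "stm32f0xx.h"   /* assumed CMSIS device header */
#include <stddef.h>
#include <stdint.h>

/* Polled transmit: spin on TXE before loading each byte, then wait for
 * TC so the last byte has fully left the shift register. */
static void uart_tx_polled(USART_TypeDef *uart, const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        while (!(uart->ISR & USART_ISR_TXE))
            ;                     /* ~87 µs of busy-waiting per byte at 115200 */
        uart->TDR = buf[i];       /* hardware moves TDR into the TSR */
    }
    while (!(uart->ISR & USART_ISR_TC))
        ;                         /* wait for the final shift-out to complete */
}
```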
Now, back to the RTOS... Obviously, your TX task needs to poll the UART status bits. If we assume that the UART baud rate is 115200 (which is a common value), you waste ~90 µs of polling for each byte. The general rule of an RTOS is that if you are waiting for something to happen, your task needs to be blocked so other tasks can run. But block on what? What will tell you when to unblock? For this you need interrupts. Your task blocks on a task notification (ulTaskNotifyTake()), and the interrupt gives the notification using xTaskNotifyGive().
So, I can't imagine any other way without using interrupts. But the method mentioned above isn't good either. It makes no sense to block and unblock with each byte.
There are 2 possible solutions:
Move TX handling completely to the interrupt handler (ISR), and notify the task when TX is complete.
Use DMA instead! Almost all modern 32-bit µCs have DMA support. DMA generates a single interrupt when the TX is completed. You can notify the task from the DMA transfer complete interrupt.
In this answer I've focused on TX, but using DMA is the proper way of handling reception (RX) too.
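A minimal sketch of option 2, assuming the STM32 HAL UART driver with a TX DMA channel and FreeRTOS task notifications; huart1 and tx_task_handle are hypothetical handles created elsewhere.

```c
#include "FreeRTOS.h"
#include "task.h"
#include "stm32f0xx_hal.h"   /* or the HAL header for your STM32 family */

extern UART_HandleTypeDef huart1;    /* configured with a TX DMA channel */
extern TaskHandle_t tx_task_handle;  /* the task that calls uart_send()  */

/* Called from the TX task: start the DMA transfer and block until the
 * transfer-complete interrupt notifies us. Other tasks run meanwhile. */
void uart_send(const uint8_t *buf, uint16_t len)
{
    HAL_UART_Transmit_DMA(&huart1, (uint8_t *)buf, len);
    ulTaskNotifyTake(pdTRUE, portMAX_DELAY);
}

/* Invoked by the HAL from the DMA/UART transfer-complete interrupt. */
void HAL_UART_TxCpltCallback(UART_HandleTypeDef *huart)
{
    BaseType_t higher_prio_woken = pdFALSE;
    if (huart == &huart1)
        vTaskNotifyGiveFromISR(tx_task_handle, &higher_prio_woken);
    portYIELD_FROM_ISR(higher_prio_woken);
}
```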

SPI slave within boundary scan on STM32F4

I'd like to test an SPI slave on my STM32F4 via JTAG boundary-scan methods (ideally using OpenOCD rather than some other special tool).
Does somebody know the details and typical pitfalls of such a thing?
What I found was this site, and this neatly explains boundary scanning.
I am thankful for any hint on that topic.
As the linked site points out, testing your µC's output on the SPI pins through boundary scan will suffer from very low speed (because you have to feed the corresponding bit-banging commands through the boundary-scan protocol, which is far from efficient).
Since you are using the STM32F4 controller, I suggest you keep the CPU in debug (break), and set up the GPIOs and SPI through JTAG (as if the firmware were doing this from the inside). Then you are free to put entire data bytes/words into the TX register and poll the SPI status and RX registers. This is one or two levels above the (plain) boundary-scan method, but it will be quite easy to implement.
(Only) if you want to take this idea a few steps further, you can use JTAG first to switch the clock settings to a higher speed or to add DMA (and write larger amounts of data to RAM before triggering the SPI transfer).

Sampling a high speed serial bit stream with MCU

I'm currently working on an application where an MCU is receiving data from a hardware chip in the form of an asynchronous serial bit transmission at 2 Mbps. This data has no encoding and no protocol aside from a start sequence, after which it is raw binary data.
The current approach for recovery is to use the SPI module in 3-pin mode to oversample the stream 4x at 8 MHz, allowing recovery of the asynchronous data. While seemingly effective so far with a simulated testbench, this method is rather complicated, as an internal clock needs to be routed to the SPI CLK (the device is run in slave mode) so that DMA can recover the transmitted data while the processor executes another task.
Would it be possible to use any other peripherals efficiently for this task aside from SPI? Faking a communication protocol to recover a serial bit stream seems a bit roundabout, but I am not sure how to utilize UART or I2C without doing the same, and those might not even be possible to use as the protocol bits are not present in the stream. I also want to avoid using an ADC in the interest of power, along with the fact that the data is already digital so it seems unnecessary.
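For illustration only, the approach described above might look roughly like this on an STM32F0 (the question does not name the MCU, so the register names and the DMA channel mapping are assumptions): SPI1 as a slave clocked by the externally routed 8 MHz sampling clock, with a circular DMA channel depositing the oversampled bytes into RAM while the CPU does other work.

```c
#include "stm32f0xx.h"   /* assumed CMSIS device header */

#define OVERSAMPLE_BUF_LEN 512
static uint8_t oversample_buf[OVERSAMPLE_BUF_LEN];

static void sampler_init(void)
{
    RCC->APB2ENR |= RCC_APB2ENR_SPI1EN;
    RCC->AHBENR  |= RCC_AHBENR_DMA1EN;

    /* SPI1 slave, 8-bit frames, software NSS (always selected), RXNE on
     * every byte, RX requests routed to DMA. The 8 MHz oversampling
     * clock must be fed into SCK externally (e.g. from a timer output). */
    SPI1->CR2 = SPI_CR2_DS_0 | SPI_CR2_DS_1 | SPI_CR2_DS_2
              | SPI_CR2_FRXTH | SPI_CR2_RXDMAEN;
    SPI1->CR1 = SPI_CR1_SSM;

    /* On the F0, SPI1_RX maps to DMA1 channel 2 (check the request
     * mapping table for your exact device): peripheral -> memory,
     * byte-wide, memory increment, circular. */
    DMA1_Channel2->CPAR  = (uint32_t)&SPI1->DR;
    DMA1_Channel2->CMAR  = (uint32_t)oversample_buf;
    DMA1_Channel2->CNDTR = OVERSAMPLE_BUF_LEN;
    DMA1_Channel2->CCR   = DMA_CCR_MINC | DMA_CCR_CIRC | DMA_CCR_EN;

    SPI1->CR1 |= SPI_CR1_SPE;
}
```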

How the CPU finds the ISR and distinguishes between devices

I should first share all that I know - and it is complete chaos. There are several different questions in this topic, so please don't get irritated :).
1) To find an ISR, the CPU is provided with an interrupt number. In x86 machines (286/386 and above) there is an IVT with ISR addresses in it, each entry 4 bytes in size. So we need to multiply the interrupt number by 4 to find the ISR. So the first bunch of questions is: I am completely confused about the mechanism by which the CPU receives the interrupt. To raise an interrupt, the device first asserts an IRQ - then what? Does the interrupt number travel "on the IRQ" towards the CPU? I also read something about the device putting the ISR address on the data bus; what's that then? What is the concept of devices overriding the ISR? Can somebody tell me a few example devices where the CPU polls for interrupts? And where does it find the ISR for them?
2) If two devices share an IRQ (which is very much possible), how does the CPU differentiate between them? What if both devices raise an interrupt of the same priority simultaneously? I have learned that there will be masking of same-type and low-priority interrupts - but how does this communication happen between the CPU and the device controller? I studied the role of the PIC and APIC for this problem, but could not understand.
Thanks for reading.
Thank you very much for answering.
CPUs don't poll for interrupts, at least not in a software sense. With respect to software, interrupts are asynchronous events.
What happens is that hardware within the CPU recognizes the interrupt request, which is an electrical input on an interrupt line, and in response sets aside the normal flow of execution to respond to the interrupt. In most modern CPUs, what happens next is determined by a hardware handshake particular to the type of CPU, but most of them receive a number of some kind from the interrupting device. That number can be 8 bits or 32 or whatever, depending on the design of the CPU. The CPU then uses this interrupt number to index into the interrupt vector table, to find an address to begin execution of the interrupt service routine. Once that address is determined (and the current execution context is safely saved to the stack), the CPU begins executing the ISR.
When two devices share an interrupt request line, they can cause different ISRs to run by returning a different interrupt number during that handshaking process. If you have enough vector numbers available, each interrupting device can use its own interrupt vector.
But two devices can even share an interrupt request line and an interrupt vector, provided that the shared ISR is clever enough to go back to all the possible sources of the given interrupt, and check status registers to see which device requested service.
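A sketch of what such a shared-vector ISR might look like in C; all of the names here are invented for the example. Each driver registers a "does my hardware need service?" check plus a service routine at init time, and the handler loops over every possible source until nothing is pending.

```c
#include <stdbool.h>
#include <stddef.h>

struct irq_source {
    bool (*is_pending)(void *dev);   /* reads the device's status register   */
    void (*service)(void *dev);      /* acknowledges and handles the request */
    void *dev;
};

#define MAX_SOURCES 4
static struct irq_source sources[MAX_SOURCES];  /* filled in by drivers at init */
static size_t num_sources;

void shared_vector_isr(void)
{
    bool again;
    do {
        /* Re-scan until no source is pending, in case a second device
         * raised its request while the first was being serviced. */
        again = false;
        for (size_t i = 0; i < num_sources; i++) {
            if (sources[i].is_pending(sources[i].dev)) {
                sources[i].service(sources[i].dev);
                again = true;
            }
        }
    } while (again);
}
```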
A little more detail
Suppose you have a system composed of a CPU, an interrupt controller, and an interrupting device. In the old days these would have been separate physical devices; nowadays all three might even reside in the same chip, but all the signals are still there inside the ceramic case. I'm going to use a PowerPC (PPC) CPU with an integrated interrupt controller, connected to a device on a PCI bus, as an example that should serve nicely.
Let's say the device is a serial port that's transmitting some data. A typical serial port driver will load a bunch of data into the device's FIFO, and the CPU can do regular work while the device does its thing. Typically these devices can be configured to generate an interrupt request when the device is running low on data to transmit, so that the device driver can come back and feed more into it.
The hardware logic in the device will expect a PCI bus interrupt acknowledge, at which point, a couple of things can happen. Some devices use 'autovectoring', which means that they rely on the interrupt controller to see to it that the correct service routine gets selected. Others will have a register, which the device driver will pre-program, that contains an interrupt vector that the device will place on the data bus in response to the interrupt acknowledge, for the interrupt controller to pick up.
A PCI bus has only four interrupt request lines, so our serial device will have to assert one of those. (It doesn't matter which at the moment; it's usually somewhat slot-dependent.) Next in line is the interrupt controller (e.g. PIC/APIC), which will decide whether to acknowledge the interrupt based on mask bits that have been set in its own registers. Assuming it acknowledges the interrupt, it then either obtains the vector from the interrupting device (via the data bus lines), or, if so programmed, may use a 'canned' value provided by the APIC's own device driver. So far, the CPU has been blissfully unaware of all these goings-on, but that's about to change.
Now it's time for the interrupt controller to get the attention of the CPU core. The CPU will have its own interrupt mask bit(s) that may cause it to just ignore the request from the PIC. Assuming that the CPU is ready to take interrupts, it's now time for the real action to start. The current instruction usually has to be retired before the ISR can begin, so with pipelined processors this is a little complicated, but suffice it to say that at some point in the instruction stream, the processor context is saved off to the stack and the hardware-determined ISR takes over.
Some CPU cores have multiple request lines, and can start the process of narrowing down which ISR runs via hardware logic that jumps the CPU instruction pointer to one of a handful of top-level handlers. The old 68K, and possibly others, did it that way. The PowerPC (and, I believe, the x86) has a single interrupt request input. The x86 itself behaves a bit like a PIC and can obtain a vector from the external PIC(s), but the PowerPC just jumps to a fixed address, 0x00000500.
In the PPC, the code at 0x0500 is probably just going to immediately jump out to somewhere in memory where there's room enough for some serious decision-making code, but it's still the interrupt service routine. That routine will first go to the PIC and obtain the vector, and also ask the PIC to stop asserting the interrupt request into the CPU core. Once the vector is known, the top-level ISR can case out to a more specific handler that will service all the devices known to be using that vector. The vector-specific handler then walks down the list of devices assigned to that vector, checking interrupt status bits in those devices, to see which ones need service.
When a device, like the hypothetical serial port, is found wanting service, the ISR for that device takes appropriate actions, for example, loading the next FIFO's worth of data out of an operating system buffer into the port's transmit FIFO. Some devices will automatically drop their interrupt request in response to being accessed, for example, writing a byte into the transmit FIFO might cause the serial port device to de-assert the request line. Other devices will require a special control register bit to be toggled, set, cleared, what-have-you, in order to drop the request. There are zillions of different I/O devices and no two of them ever seem to do it the same way, so it's hard to generalize, but that's usually the way of it.
Now, obviously there's more to say - what about interrupt priorities? What happens in a multi-core processor? What about nested interrupt controllers? But I've burned enough space on the server. Hope some of this helps.
I came across this question about 3 years later.. Hope I can help ;)
The Intel 8259A, or simply the "PIC", has 8 pins, IRQ0-IRQ7, and every pin connects to a single device.
Let's suppose that you pressed a button on the keyboard. The voltage on the IRQ1 pin, which is connected to the keyboard controller, goes high. So after the CPU gets interrupted, acknowledges the interrupt, bla bla bla... the PIC simply adds 8 to the number of the IRQ line, so IRQ1 means 1+8, which means 9.
So the CPU sets its CS and IP from the 9th entry in the vector table, and because the IVT is an array of 4-byte entries, it just multiplies the entry number by 4 ;)
CPU.CS=IVT[9].CS
CPU.IP=IVT[9].IP
The ISR deals with the device through the I/O ports ;)
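To spell out the arithmetic in C - a sketch of the real-mode x86 scheme described above, not something you would run as-is on a modern machine:

```c
#include <stdint.h>

/* One 4-byte entry of the real-mode IVT, which starts at address 0:
 * the offset (IP) comes first, then the segment (CS). */
struct ivt_entry {
    uint16_t ip;
    uint16_t cs;
};

/* vector = IRQ line + the PIC's base (8 for the master 8259A in real
 * mode), and each entry is 4 bytes, so the address is vector * 4. */
static inline uint32_t ivt_entry_addr(uint8_t irq_line)
{
    uint8_t vector = irq_line + 8u;
    return (uint32_t)vector * sizeof(struct ivt_entry);
}

/* Example: ivt_entry_addr(1) == 9 * 4 == 36 (0x24), the keyboard entry. */
```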
Sorry for my bad English.. I'm an Arab though :)