Is there a way to synchronize custom interrupt signals with AXI master transactions in Vitis HLS?

I have been unable to find an answer, possibly because I do not know the precise terminology for the processes involved.
I use Vitis HLS to synthesize designs where one call of the main function takes one clock cycle (pipelined, of course). This works fine for almost all of our cases. Where this is not possible (i.e. for components where we need to guarantee certain latencies / pipelining depths) I use Verilog.
The goal is to transfer data via DMA to a Zynq-7000's memory and THEN issue an interrupt to let the PS know that the DMA transfer is finished.
Suppose I have a Vitis HLS project, where the PS can initiate a DMA transfer of uint32s using (a rising edge on a signal in) an s_axilite interface to my component, like in the code below:
#include <cstdint>

void Example
(
    uint32_t *dmaRegion,
    bool &intrSig,
    volatile bool writeNow
)
{
#pragma HLS PIPELINE II=1
#pragma HLS INLINE RECURSIVE
#pragma HLS INTERFACE s_axilite port=return bundle=registers
#pragma HLS INTERFACE ap_ctrl_none port=return
#pragma HLS INTERFACE m_axi port=dmaRegion offset=slave bundle=x
#pragma HLS INTERFACE s_axilite port=dmaRegion bundle=registers
#pragma HLS INTERFACE ap_none port=dmaRegion
#pragma HLS INTERFACE s_axilite port=writeNow bundle=registers
#pragma HLS INTERFACE ap_none port=writeNow
#pragma HLS INTERFACE ap_none port=intrSig
    static bool lastWriteNow { false };
    static uint32_t Ctr { 0 };
    bool intr = false;
    if (!lastWriteNow && writeNow)
    {
        Ctr++;
        dmaRegion[10] = Ctr;
        intr = true;
    }
    intrSig = intr;
    lastWriteNow = writeNow;
}
Now, this seems to work fine and cause a 1-clock-cycle-pulse interrupt as long as WREADY is driven high by the Zynq (and through a SmartConnect to my component) and I have found some examples where this is done this way. Also, the PS grabs the correct data from the DDR memory (L2 Data Cache has been disabled for this memory region) directly after the interrupt.
However, what will happen if for example more AXI masters are trying to drive the Smart Connect and cause congestion, effectively causing WREADY to go low for this component? In my tests, where I drove the WREADY signal of the AXI Smart Connect Master Interface to a constant zero to simulate (permanent) congestion, the interrupt signal (and WVALID) was driven to a permanent high, which would mean.... what? That the HLS design blocked inside the if clause? I do not quite get it as it seems to me that this would contradict the II=1 constraint (which is reported by Vitis HLS as being satisfied).
In a way it makes sense of course, since WVALID must go high when data is available and it must stay high until WREADY is high as well. But why the interrupt line goes (and stays) high no matter what even though the transaction is not yet finished evades me.
Is this at all possible with any guarantees about the m_axi interface, or will I have to find other solutions?
Any hint and information (especially background information about that behaviour) is very much appreciated.
Edit:
For example, this works fine: [waveform screenshot not preserved]
but this causes the interrupt to stay high forever: [waveform screenshot not preserved]
Of course, the transaction cannot finish. But it seems I have no way of unblocking the design so long as the AXI bus is congested.

Vitis Scheduler view
When I compile your code and look at the schedule view this is the result:
As I understand it, there is a phi node (a term borrowed from LLVM), which means that the value of intrSig cannot be set before the AXI4 write response has finished. Since this is then converted into RTL, the signal must always have a value, so if it goes high while there is congestion on the AXI4 bus, it stays high until the AXI transaction has finished.
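The stalling behaviour described above can be illustrated with a toy cycle-level model. This is only a sketch of the idea, not the RTL that Vitis HLS generates: the struct name StalledStage and its fields are hypothetical, and the assumption is simply that the whole pipeline stage freezes while a write is pending and WREADY is low.

```cpp
#include <cstdint>

// Toy model (assumption, not generated RTL): one pipeline register that
// carries both the pending-write flag and the interrupt bit.
struct StalledStage {
    bool wvalid = false;  // models WVALID: a write beat is being offered
    bool intr = false;    // models intrSig

    // Advance one clock cycle.
    // trigger: rising edge detected on writeNow in this cycle
    // wready:  the slave/interconnect accepts the write beat
    void clock(bool trigger, bool wready) {
        const bool stalled = wvalid && !wready;
        if (stalled) {
            return;  // stage frozen: wvalid and intr keep their old values
        }
        wvalid = trigger;
        intr = trigger;
    }
};
```

In this model a single trigger followed by permanently low wready leaves both wvalid and intr high forever, which matches the observation in the question.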
HLS craziness
I tried to look into the HDL, without much luck. I only have an intuition, which I will try to share:
The red wires are the ones that eventually drive the intrSig signal. The flip flop is driven to 1 through the SET port, and to 0 by the RST port.
Long way to intrSig from this FF, but it eventually gets there:
The SET signal is driven by combinatorial logic using writeNow:
And lastly, WREADY goes a long way, but it feeds into the pipeline chain of registers that eventually drives intrSig.
Is this proof of what is happening? Unfortunately not, but there are some hints that the outcome of the m_axi transaction stops the interrupt pipeline from advancing.
Some debugging hints
I don't know whether clearing the WREADY signal actually simulates congestion. An AXI write starts with the AWREADY handshake on the address channel, and I would expect a congested interconnect to refuse transactions from the beginning.
Also, I would instantiate your IP alone, then attach some AXI VIPs (AXI Verification IPs), which are provided in Vivado by Xilinx and can be programmed in SystemVerilog to give you the output you want while recording all your data. You will also be able to look at all the waveforms and detect where your issues are.
You can have your IP write into one of these AXI VIPs configured in slave mode, or you can write to a BRAM.
I'll leave here some documentation.

Related

Will semaphore corrupt data transmission of peripherals like UART in a microcontroller?

Semaphore disables interrupts and so will this cause other operations like receiving data on SPI to get corrupt?
Disabling interrupts cannot corrupt the data on the hardware interface.
The problem is that if the hardware peripheral receives data and then raises an interrupt to have the processor collect it, the collection will be delayed. If it is delayed for too long, more data may have been received in the meantime. Depending on the peripheral, either the new data or the old data will have to be discarded. Either way, the stream of data will be incomplete.
In most cases it is difficult to predict or test how long it is safe to disable interrupts for, so if possible it is best to avoid turning interrupts off.
If the peripheral includes a FIFO buffer, then the length of time that it is safe to disable interrupts for may be increased (although still difficult to predict).
Most modern microcontrollers have many ways to avoid disabling interrupts:
A better approach is to have the peripheral transfer the data to memory with DMA, so no interrupt is required at all.
Most modern processor cores provide ways to implement a semaphore that do not even need to disable interrupts.
There's no standard way of implementing a semaphore. To disable all interrupts on the MCU is one way to do it, but it's a very poor amateur way of doing so. Because in more complex applications with multiple interrupts, this will make all real-time considerations and calculations a nightmare.
It creates subtle but severe bugs. Particularly when some quack has done so from deep inside some driver code. You import the driver into your project and suddenly previously working code breaks. In particular, be very careful about using various libs provided by silicon vendors - they are often of very poor quality.
There are better ways to do it, including:
Ensuring atomic access of shared variables, which can only be done with inline assembler or C11 _Atomic if supported.
Disabling one specific interrupt for a specific hardware peripheral, if it is possible to do so given the real-time considerations. Then this should be handled by the driver for that hardware peripheral in the form of setter/getter functions.
Use a "poor man's semaphore" in the form of a plain flag variable, by relying on the interrupt mechanism of the MCU blocking all other interrupts while the ISR is executing. Example.
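As an illustration of the atomic-access option above, here is a minimal sketch of a counting semaphore that never disables interrupts. It uses std::atomic, the C++ counterpart of C11 _Atomic. The class name TrySemaphore is hypothetical, and the sketch assumes the target provides lock-free atomics for int (not the case on every small core, so check before relying on it).

```cpp
#include <atomic>

// Non-blocking counting semaphore built on compare-and-swap.
// Assumption: std::atomic<int> is lock-free on the target.
class TrySemaphore {
public:
    explicit TrySemaphore(int initial) : count_(initial) {}

    // Take one unit if available; never blocks, never masks interrupts.
    bool try_acquire() {
        int current = count_.load(std::memory_order_relaxed);
        while (current > 0) {
            // On failure, current is reloaded with the latest value.
            if (count_.compare_exchange_weak(current, current - 1,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed)) {
                return true;
            }
        }
        return false;
    }

    // Give one unit back.
    void release() { count_.fetch_add(1, std::memory_order_release); }

private:
    std::atomic<int> count_;
};
```

A caller that fails to acquire simply retries later instead of spinning with interrupts off, which keeps the interrupt latency of the rest of the system unaffected.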

TinyAVR 0-Series: Can I use pin-change sensing without entering interrupt handler?

I am evaluating the ATtiny806 running at 20MHz to build a cycle-accurate Intel 4004 microprocessor emulator. (I know it will be a bit too slow, but AVRs have a huge community.)
I need to synchronize to the external, two-phase non-overlapping clocks. These are not fast clocks (the original 4004 ran at 750kHz)
but if I spin-wait for every clock edge, I risk wasting most of my time budget.
The TinyAVR 0-series has a very nice pin-change interrupt facility that can be configured to trigger only on rising edges.
But, an interrupt routine round-trip is 8 cycles (3 in, 5 out).
My question is:
Can I leverage the pin-change sensing mechanism while never visiting an ISR?
(Other processor families let you poll for interruptible conditions without enabling interrupts from that peripheral). Can polling be done with a tight skip-on-bit/jump-back loop, followed by a set-bit instruction?
Straightforward way
You can always just poll on the level of the GPIO pin using the single cycle skip if bit set/clear instruction on the appropriate PORT register and bit.
But as you mention, polling does burn cycles so I'm not sure exactly what you want here - either a poll (that burns cycles but has low latency) or an interrupt (that has higher latency but allows processing to continue until the condition is true).
Note that if things get really tight and you are looking for, say, power savings by sleeping between clock signal transitions, then you can do tricks like having an ISR that never returns (saving the RETI cycles), but that requires some careful coding, probably with something like a state machine.
INTFLAG way
Alternately, if you want to use the internal pin state machine logic and you can live without interrupts, then you can use the INTFLAGS flags to check for the pin change configured in the ISC bits of the PINxCTRL register. As long as global interrupts are not enabled in SREG then you can spin poll on the appropriate INTFLAG bit to check/wait for the desired condition, and then write a 1 to that bit to clear the flag.
Note that if you want to make this fast, you will probably want to map the appropriate PORT to a VPORT since the VPORT registers are in I/O Memory. This lets you use SBIS to test the INTFLAGS bit in a single cycle and SBI to clear the bit in a single cycle (these instructions only work on I/O memory and the normal PORT registers are not in I/O Memory).
Finally one more complication, if you need to leave the interrupts on when doing this, it is probably possible by hacking the interrupt priority registers. You'd set the pin change to be on level 0, and then make sure the interrupts you care about are level 1 or higher, and then trick the interrupt controller into thinking that there is already a level 0 running so these interrupts do not actually fire. There are also other restrictions to this strategy so avoid it if at all possible.
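The INTFLAGS polling approach can be sketched as follows. The helper name poll_and_ack is hypothetical and takes the flag register by reference so that the logic can be checked on a host. On the device it would presumably be called with something like VPORTA.INTFLAGS and a pin bit mask from the device header (an assumption to verify against your toolchain). Whether the compiler actually emits the single-cycle SBIS/SBI instructions depends on inlining and on the register address being a compile-time constant.

```cpp
#include <cstdint>

// Check a pin-change flag without taking an interrupt.
// Returns true if any flag in mask was set, and acknowledges it.
inline bool poll_and_ack(volatile std::uint8_t& intflags, std::uint8_t mask) {
    if ((intflags & mask) == 0) {
        return false;  // edge not seen yet
    }
    // On the real peripheral, writing 1 to a flag bit clears it.
    intflags = mask;
    return true;
}
```

A tight wait loop is then simply `while (!poll_and_ack(reg, mask)) {}`, with the pin interrupt left out of the picture as described above.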
Programmable logic way
If you want to get really esoteric, it is likely possible to route the input value of a pin to a configurable custom logic LUT in the chip and then route the output of that module to a bit that you test using a 1-cycle bit test (maybe an unused I/O pin). To do this, you'd feed the output of the LUT back into one of its inputs and then use the LUT to create a strobe on the edge you are looking for. This is very complex, and since the strobe has no acknowledgement, if the signal changes when you are not looking for it (in a spin check), it will be lost and you will have to wait for the next edge (probably fatal in your application).

what is the best way to design a shift register with stm32

I am using an STM32F031K6, clocked at 40MHz, and I want to design a program which acts as a looping shift register: an external trigger is used to clock it, and the values in the shift register shift left every time a rising/falling edge is received. The output is one pin, either high or low.
I need to make the time between the clocking edge and the output less than 0.5us, or failing that as quick as possible. The values of the shift register can be changed and the length can also be changed, but for now I'm just starting with a byte like 11000010.
I initially thought to implement this with an external interrupt, but it was suggested there may be a better way to implement it.
Any help much appreciated.
You might use the SPI peripheral of the STM32F0 for your task. When configured in slave mode, each time an external clock edge is detected on the SCK signal, the MISO will be set to the next bit of a value loaded into an internal shift register via the SPI data register.
Check out the chapter on the Serial peripheral interface (SPI) in STM32F0 reference manual.
Especially have a look at the sections addressing the following keywords:
General description: SPI block diagram
Slave Mode (Master selection: Slave configuration)
Simplex communication: Transmit-only mode (RXONLY=0)
Slave select (NSS) pin management: Software NSS management (SSM = 1)
Data frame format (data size can be set from 4-bit up to 16-bit length)
Configuration of SPI
The SPI unit is highly configurable, e.g. regarding the polarity of clock signal. Since it is an independent hardware unit, it should be able to handle your 0.5us reaction time requirement. The MCU firmware needs to set up the SPI unit and then provide new data to the SPI unit, each time the Tx buffer empty flag (TXE) is set. This can also be done by interrupt (TXEIE) or even using a DMA channel (TXDMAEN) with a circular buffer. In the latter case the "shift register functionality" runs completely independent of the MCU core (after setup).
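Before loading a pattern into the circular buffer, it can help to check the expected output sequence on a host. The following is a small software model of the looping shift register from the question, not STM32 code. The struct name LoopingShiftRegister is hypothetical, and MSB-first output is an assumption that should match the SPI frame format you configure.

```cpp
#include <cstdint>

// Host-side model of a rotating shift register of 1..16 bits.
struct LoopingShiftRegister {
    std::uint16_t bits;  // current contents, right-aligned
    unsigned length;     // number of valid bits, assumed to be 1..16

    // One external clock edge: output the top bit and rotate it back in.
    bool clock() {
        const bool out = ((bits >> (length - 1u)) & 1u) != 0u;
        const std::uint16_t mask =
            static_cast<std::uint16_t>((1u << length) - 1u);
        bits = static_cast<std::uint16_t>(
            ((static_cast<unsigned>(bits) << 1) | (out ? 1u : 0u)) & mask);
        return out;
    }
};
```

Clocking the model `length` times returns it to its starting value, which is the behaviour a circular DMA buffer feeding the SPI data register would reproduce in hardware.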

STM32F4 Handling peripheral error while making a DMA Transfer (RX)

I am trying to communicate with the UART peripheral using DMA for both RX and TX.
I am using the HAL library that is supplied by ST (Generated with STCubeMX).
I am handling a UART channel with 1.5MBaud, so in order to not lose any data, I've configured the DMA in direct mode with a circular buffer, and I handle the half-transfer interrupts to take care of the data and keep the DMA online for more data to come.
The problem is that sometimes I can see in the Status Register of the UART that the Frame Error bit is on, and sometimes the Overrun Error flag is also on.
I can handle lost bytes (using a CRC on the structured packets), but the problem is that the peripheral stops receiving data, while the DMA neither raises an error nor stops the transfer.
So if I try to receive data, and the flag is on the system hangs.
I saw that the HAL provides a __weak function that should handle UART_Error, but it is never called - and the status in the HAL handle remains normal.
Only a look at the register can tell that there is a problem.
How should I detect/handle this kind of error?
Thanks
I do not use the HAL for performance reasons, as it is very clumsy and - imo also does not provide much abstraction to justify that. Handling the hardware directly is not much more complicated; even more as you still have to understand very well what goes on. And as you already detected, the HAL does only support a certain approach; once you follow your own trail, you are lost.
You apparently have similar issues, as the overrun flag is set. In general, after such an error you have to re-sync the receiver with the transmitter's byte stream. That would require out-of-band signalling using a symbol or line condition that does not occur within a packet. Framing errors are a good indicator that there are problems syncing to the start of a symbol (start bit) properly.
If the line is clean (no EMC problems), there should be no framing errors or data corruption (unless timing parameters do not match).
If using a simple ping-pong, a timeout might be sufficient. However, the proper solution depends on the protocol. A good protocol design takes transmission errors and overflows into account.
Note that you have to enable receive-error interrupts in addition to DMA transfers to be informed. However, if you use a timeout (and a ping-pong protocol), you can just clear the flags, as the data apparently did not arrive in time. If you actually use error interrupts, be aware of race conditions, too.
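Clearing the flags can be sketched as below. The function name read_and_clear_rx_errors and the constants are hypothetical. The bit positions (FE bit 1, NF bit 2, ORE bit 3) and the clear sequence of reading the status register followed by the data register are taken from the STM32F4 USART description as I understand it, so verify them against the reference manual for your part. On the device, the arguments would presumably be the CMSIS registers, e.g. USART1->SR and USART1->DR.

```cpp
#include <cstdint>

// Assumed STM32F4 USART_SR error bits (check the reference manual).
constexpr std::uint32_t kSrFramingError = 1u << 1;
constexpr std::uint32_t kSrNoiseError = 1u << 2;
constexpr std::uint32_t kSrOverrunError = 1u << 3;
constexpr std::uint32_t kSrErrorMask =
    kSrFramingError | kSrNoiseError | kSrOverrunError;

// Returns the error bits that were set. If any were set, performs the
// dummy data-register read that completes the clear sequence.
inline std::uint32_t read_and_clear_rx_errors(
    const volatile std::uint32_t& status_reg,
    const volatile std::uint32_t& data_reg) {
    const std::uint32_t errors = status_reg & kSrErrorMask;  // read SR
    if (errors != 0u) {
        const std::uint32_t discarded = data_reg;  // then read DR
        (void)discarded;
    }
    return errors;
}
```

Called from a timeout handler or an error interrupt, a non-zero return value is the cue to discard the partial packet and re-sync as described above.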

How to simulate a scheduler?

In my firmware I write to MicroSD in a background task, and I've got a lot of higher-priority interrupts enabled, some of which can take several milliseconds.
So the writing/reading from SPI can be interrupted at any moment, and for writes that may not be such a problem (if SPI behaves anything like UART), but during reads I'm afraid that my hardware SPI FIFO's will overflow if the task just happens to be interrupted while the MicroSD card is sending a datablock.
Now the obvious solution would be to decrease the time that the higher-priority interrupts take, but this seems very hard, because sometimes they have to wait on other peripherals too, and to prevent that I would have to rewrite a lot of code that does polling now into an interrupt structure, which would make the overall code much more complicated.
I think in modern OSes this is solved by letting all those tasks run synchronously at the same priority, and give them all an equal time slice. But I don't have any mechanisms for threading, or an OS, so what would be the simplest way to solve this?
write to MicroSD [...]
hardware SPI FIFO's will overflow
You are the Master of the SPI: You control the SPI clock. The SPI Master will only generate a clock signal when it has a data frame to transfer - otherwise the clock is in idle state. This is also true for read operations: SPI always reads and writes at the same time.
In short, SPI will never overflow if you are the master. Hardware FIFOs do not change this fact.
"I think in modern OSes this is solved by letting all those tasks run synchronously at the same priority, and give them all an equal time slice. But I don't have any mechanisms for threading, or an OS, so what would be the simplest way to solve this?"
Multitasking in an OS is not the same as interrupts.
I would lay out the following:
SPI interrupt handlers for reading and writing. You've got the SPI FIFOs that you need to be aware of. You might get interrupts for overflow and "watermark" conditions. Be sure and handle these. Read your MCU user guide for specifics. Give your interrupt handlers their own circular queues in software. Size the queues appropriately given the size of the FIFO's (this is a choice based on hardware FIFO size, page size of the device you are reading/writing, and available memory.)
State machine module to be called from your application. This should have its own circular queues. The state machine should have functions to read and write data, as well as a "pump" function (i.e. a function that is called periodically from the main loop, aka "scheduled" from the main loop).
Your "tasks" that read and write from the SPI device state machine, should also be state machines, and they should also be able to handle not being able to write, or no data ready. In general, DO NOT BLOCK! Write your functions so that if what they need is not ready/available, they just quit, expecting to be called, aka "scheduled" at a later time.
So a general flow would be:
[task that wants to write] -> queue -> [Device state machine] -> queue -> [SPI interrupt] -> hardware queue -> [hardware] -> wire
wire -> [hardware] -> hardware queue -> [SPI interrupt] -> queue -> [Device state machine] -> queue -> [task that wants to read]
Without specifics on your architecture, it's hard to provide more details. But I've successfully used this pattern in many embedded device drivers.
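The software queues between the stages above could look like the following minimal sketch. The class name RingQueue is hypothetical. It assumes a single producer and a single consumer, and it leaves out the atomic or volatile index handling that a real ISR/main-loop split would need. Both operations return immediately instead of blocking, in line with the "do not block" rule.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Fixed-size circular byte queue; holds at most N - 1 bytes.
// Assumption: one producer and one consumer, no concurrent access
// protection shown here.
template <std::size_t N>
class RingQueue {
public:
    // Returns false (and drops nothing) if the queue is full.
    bool push(std::uint8_t byte) {
        const std::size_t next = (head_ + 1) % N;
        if (next == tail_) {
            return false;  // full: caller retries on a later pass
        }
        buffer_[head_] = byte;
        head_ = next;
        return true;
    }

    // Returns false if there is no data ready.
    bool pop(std::uint8_t& byte) {
        if (tail_ == head_) {
            return false;  // empty: caller simply quits for now
        }
        byte = buffer_[tail_];
        tail_ = (tail_ + 1) % N;
        return true;
    }

private:
    std::array<std::uint8_t, N> buffer_{};
    std::size_t head_ = 0;  // next slot to write
    std::size_t tail_ = 0;  // next slot to read
};
```

A task or "pump" function checks the return value and, on false, returns to the main loop to be called again later.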