How do you avoid interrupt starvation in a nested interrupt system? - embedded

I am learning about interrupts and couldn't understand what happens when there are too many interrupts to a point where the CPU can't process the foreground loop or complete the existing interrupts. I read through this article https://www.cs.utah.edu/~regehr/papers/interrupt_chapter.pdf but didn't completely understand how a scheduler would help, if there are simply too many interrupts?
Do we switch to a faster CPU if the interrupts can not be missed?

Yes, you have to switch to a faster CPU!
You have to ensure that there is enough time left for the main loop. It is therefore really important to keep your interrupt service routines as short as possible and to run some CPU workload tests.

Indeed, any time there is contention over a shared resource, there is the possibility of starvation. The schedulers discussed in the paper limit the interrupt rate, thus ensuring some interrupt-free processing time during each interval. During high activity periods, interrupt handling is disabled, and the scheduler switches to polling mode where it interrogates the state of the interrupt request lines periodically, effectively throttling the stream of interrupts. The operating system strives to do as little as possible in each interrupt handler - tasks are often simply queued so they can be handled later at a different stage. There are many considerations and trade-offs that go into any scheduling algorithm.
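To make the throttling idea concrete, here is a minimal sketch of a per-window interrupt budget. It is not the scheduler from the paper: the type `irq_limiter_t` and the three functions are hypothetical names, and the `irq_enabled` field only stands in for whatever mask bit the real interrupt controller provides. Once the budget for the current timer window is used up, the source is masked until the next tick, which guarantees some interrupt-free time in every window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical rate limiter: at most max_per_window interrupts are
 * taken in each timer window; after that the source stays masked
 * until the next timer tick. */
typedef struct {
    uint32_t max_per_window;  /* interrupt budget per window */
    uint32_t count;           /* interrupts taken in the current window */
    bool     irq_enabled;     /* stands in for the hardware mask bit */
} irq_limiter_t;

void limiter_init(irq_limiter_t *l, uint32_t max_per_window)
{
    assert(max_per_window > 0u);
    l->max_per_window = max_per_window;
    l->count = 0u;
    l->irq_enabled = true;
}

/* Call at the top of the ISR. Returns true while the source stays enabled. */
bool limiter_on_interrupt(irq_limiter_t *l)
{
    l->count++;
    if (l->count >= l->max_per_window) {
        l->irq_enabled = false;  /* real code would mask the IRQ line here */
    }
    return l->irq_enabled;
}

/* Call from a periodic timer tick: start a new window and unmask.
 * Real code would also poll the request line here to pick up
 * anything that arrived while the source was masked. */
void limiter_on_tick(irq_limiter_t *l)
{
    l->count = 0u;
    l->irq_enabled = true;       /* real code would unmask the IRQ line here */
}
```

With a budget of 3, the third interrupt in a window masks the source, and the next tick re-enables it. The trade-off is that events arriving while masked are only seen at the tick, so this only suits sources that can tolerate that latency or that buffer in hardware.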

Overall you need a clue of how much time each part of your program consumes. This is pretty easy to measure in practice live with an oscilloscope. If you activate a GPIO when entering and de-activate it when leaving the interrupt, you don't only get to see how much time the ISR consumes, but also how often it kicks in. If you do this for each ISR you get a good idea how much time they need. You can then do something similar in main(), to get a rough estimate of the complete execution cycle of the program, main + interrupts.
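Once you have the pulse width and repetition rate of each ISR from the scope, turning them into a load figure is simple arithmetic. The helper below is only an illustrative sketch; the struct, the function name and the numbers in the usage note are made up, not measurements from a real system.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per ISR, filled in from the oscilloscope readings. */
typedef struct {
    uint32_t duration_us;  /* measured GPIO pulse width */
    uint32_t rate_hz;      /* measured pulses per second */
} isr_measurement_t;

/* Total ISR load in tenths of a percent (1000 means 100.0 %). */
uint32_t isr_load_permille(const isr_measurement_t *m, size_t n)
{
    uint64_t busy_us_per_s = 0u;
    for (size_t i = 0u; i < n; i++) {
        busy_us_per_s += (uint64_t)m[i].duration_us * m[i].rate_hz;
    }
    /* 1 000 000 us per second, so dividing by 1000 yields permille. */
    return (uint32_t)(busy_us_per_s / 1000u);
}
```

For example, an ISR of 5 us firing at 10 kHz plus one of 20 us firing at 1 kHz gives 50 000 + 20 000 us of ISR time per second, which is 7.0 % of the CPU. Whatever is left over is what main() gets.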
As for the best solution, it is obviously to reduce the number of interrupts. Use polling if possible. Use DMA. Use serial peripherals (UART, CAN etc) that are hardware-buffered instead of interrupt-intensive ones. Use hardware PWM instead of output compare timers. And so on. These things need to be considered early on when you pick a suitable MCU for your project. If you picked the wrong MCU, then you'll obviously have to change. Twiddling with the CPU clock sounds like a quick & dirty fix. Get the design right instead.

Related

Is there any problem with computing the whole program in an ISR?

I have a program that should be run periodically on a MCU (ex. STM32). For example it should be run at every 1 ms. If I program an 1ms ISR and call my complete program in it, assuming it will not exceed 1ms, is that a good way? Is there any problem that I could be facing? Will it be precise?
The usual answer would be probably that ISRs should be generally kept to a minimum and most of the work should be performed in the "background loop". (Since you are apparently using the traditional "foreground-background" architecture, a.k.a. "main+ISRs").
But ARM Cortex-M (e.g., STM32) has been specifically designed such that ISRs can be written as plain C functions. In that case, working with ISRs is no different than working with any other C code. This includes ease of debugging.
Moreover, ARM Cortex-M comes with the NVIC (Nested Vectored Interrupt Controller). The NVIC allows you to prioritize interrupts and they can preempt each other. This means that you can quite easily build sets of periodic "tasks" (ISRs), which run under the preemptive, priority-based scheduler. Interestingly, this scheduler (implemented in the NVIC hardware) meets all requirements of RMA/RMS (Rate Monotonic Analysis/Scheduling), so you could prove the schedulability of your system. Of course, the ISR "tasks" cannot block internally, but this is not required for RMA/RMS. Also, if you have any shared resources between ISRs running at different priorities (or ISR and the background loop), you need to properly protect the resources by disabling interrupts.
So, your idea of using ISRs as "tasks" makes a lot of sense to me. Your system will be optimal meaning that any other approach would be less efficient. (This includes the use of any kind of RTOS.) Also, this design can be low-power because you could use the "background loop" in main() to put your CPU and peripherals into low-power sleep mode (WFI instruction, etc.) In fact, you can view the "background loop" as the idle "task" in this "hardware-RTOS".
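If you want to check schedulability rather than just assume it, one common way is the classic fixed-priority response-time iteration. The sketch below is my own illustration, not something the NVIC provides: `isr_task_t` and `isr_set_schedulable` are hypothetical names, the deadline of each ISR is assumed equal to its period, and interrupt entry/exit overhead and blocking from interrupts-disabled sections are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t period_us;  /* T: period, also used as the deadline */
    uint32_t wcet_us;    /* C: worst-case execution time */
} isr_task_t;

/* tasks[] must be sorted highest priority first (for rate-monotonic
 * assignment that means shortest period first). Returns true if every
 * task's worst-case response time fits within its period. */
bool isr_set_schedulable(const isr_task_t *tasks, size_t n)
{
    for (size_t i = 0u; i < n; i++) {
        uint32_t r = tasks[i].wcet_us;
        for (;;) {
            /* Own execution time plus preemption by higher priorities. */
            uint32_t next = tasks[i].wcet_us;
            for (size_t j = 0u; j < i; j++) {
                assert(tasks[j].period_us > 0u);
                uint32_t releases =
                    (r + tasks[j].period_us - 1u) / tasks[j].period_us;
                next += releases * tasks[j].wcet_us;
            }
            if (next > tasks[i].period_us) {
                return false;    /* deadline missed in the worst case */
            }
            if (next == r) {
                break;           /* fixed point reached: r is the response time */
            }
            r = next;
        }
    }
    return true;
}
```

For instance, three ISRs with (period, WCET) of (1000, 200), (5000, 1000) and (10000, 2000) microseconds pass this test, while (1000, 600) together with (2000, 900) does not.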

Will semaphore corrupt data transmission of peripherals like UART in a microcontroller?

Semaphore disables interrupts and so will this cause other operations like receiving data on SPI to get corrupt?
Disabling interrupts cannot corrupt the data on the hardware interface.
The problem is that if the data is received by the hardware peripheral, which then raises an interrupt to have the processor collect the data, the collection will be delayed. If it is delayed for too long then potentially more data will have been received. Depending on the peripheral, either the new data or the old data will have to be discarded. Either way the stream of data will be incomplete.
In most cases it is difficult to predict or test how long it is safe to disable interrupts for, so if possible it is best to avoid turning interrupts off.
If the peripheral includes a FIFO buffer, then the length of time that it is safe to disable interrupts for may be increased (although still difficult to predict).
Most modern microcontrollers have many ways to avoid disabling interrupts:
A better approach is to have the peripheral transfer the data to memory with DMA, so no interrupt is required at all.
Most modern processor cores provide ways to implement a semaphore that do not even need to disable interrupts.
There's no standard way of implementing a semaphore. To disable all interrupts on the MCU is one way to do it, but it's a very poor amateur way of doing so. Because in more complex applications with multiple interrupts, this will make all real-time considerations and calculations a nightmare.
It creates subtle but severe bugs. Particularly when some quack has done so from deep inside some driver code. You import the driver into your project and suddenly previously working code breaks. In particular, be very careful about using various libs provided by silicon vendors - they are often of very poor quality.
There are better ways to do it, including:
Ensuring atomic access of shared variables, which can only be done with inline assembler or C11 _Atomic if supported.
Disabling one specific interrupt for a specific hardware peripheral, if it is possible to do so given the real-time considerations. Then this should be handled by the driver for that hardware peripheral in the form of setter/getter functions.
Use a "poor man's semaphore" in the form of a plain flag variable, by relying on the interrupt mechanism of the MCU blocking all other interrupts while the ISR is executing. Example.
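As an illustration of the atomic-access option, here is a minimal try-lock built on C11 `atomic_flag`. The names `resource_busy`, `resource_try_acquire` and `resource_release` are hypothetical. Whether this compiles down to interrupt-free instructions depends on the core and the toolchain, so check the generated code for your target.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Clear means the shared resource is free. */
static atomic_flag resource_busy = ATOMIC_FLAG_INIT;

/* Non-blocking acquire: returns true if the caller now owns the resource.
 * test_and_set returns the previous value, so "false" means it was free. */
bool resource_try_acquire(void)
{
    return !atomic_flag_test_and_set_explicit(&resource_busy,
                                              memory_order_acquire);
}

void resource_release(void)
{
    atomic_flag_clear_explicit(&resource_busy, memory_order_release);
}
```

An ISR must never spin on such a flag, since the code it interrupted may be the owner. It should call `resource_try_acquire()` once and defer its work if that fails.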

Disable interrupt to let freeRTOS run on stm32

I'm working on a project where I am getting digital samples continuously through DMA on STM32f4. DMA generates a complete callback interrupt after every sample where I do some DSP. My plan is to allow freeRTOS to work on other tasks while DMA is waiting on the callback. However, DMA is generating callbacks too frequently, not allowing freeRTOS to run. I want to make it so that after every DMA complete callback, freeRTOS tasks are allowed to run for 6ms. I thought of calling __disable_irq() from the complete callback and __enable_irq() from one of the tasks, but that would not guarantee 6ms, and I also have a high priority button interrupt. I also tried disabling just the DMA interrupt by calling __set_BASEPRI(priority<<(8-__NVIC_PRIO_BITS)) and then starting a timer for 6ms. In the timer period elapsed callback I call __set_BASEPRI(0) to enable the DMA interrupt again. But for some reason this did not allow freeRTOS to run at all. It goes back and forth between the DMA complete callback and the timer period elapsed callback.
I am new to embedded programming so any comment on this will help. Thank You.
You should not think of the DSP process being separate from the RTOS tasks, do the DSP in an RTOS task - the signal processing is the most time critical aspect of your system, you have to process the data as fast as it arrives with no loss.
If the DSP is being done in an interrupt context and starving your tasks, then clearly you are doing too much work in the interrupt context, and have too high an interrupt rate. You need to fix your design for something more schedulable.
If your DMA transfers are single samples, you will get one interrupt per sample - the ADC will do that on its own; so using DMA in that manner offers no advantage over direct ADC interrupt processing.
Instead you should use block processing, so you DMA a block of say 80 samples cyclically, for which you get a half-transfer interrupt at 40 samples, and a full-transfer interrupt at 80 samples. For each interrupt you might then trigger a task-event or semaphore to defer the DSP processing to a high-priority RTOS task. This achieves two things:
For the entirety of the n sample block acquisition time, the RTOS is free to:
be performing the DSP processing for the previous block,
use any remaining time to process the lower priority tasks.
Any interrupt overhead spent context switching etc. is reduced by a factor of n, allowing more time performing core signal processing and background tasks.
Apart from reducing the number of interrupts and software overhead, the signal processing algorithms themselves can be optimised more readily when performing block-processing.
A variation on the above is that rather than triggering a task event or semaphore from the DMA interrupt handler, you could place the new sample block in a message queue, which will then provide some buffering. This is useful if the DSP processing might be less deterministic, so cannot always guarantee to complete processing of one block before the next is ready. However overall it remains necessary that on average you complete block processing in the time it takes to acquire a block, with time to spare for other tasks.
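The half-transfer / full-transfer scheme described above can be sketched in a host-testable form as follows. This is not HAL or FreeRTOS code: the two `on_dma_*` functions are hypothetical hooks that you would call from the real DMA callbacks, `process_block` is a placeholder for the actual DSP, and the plain pointer hand-off stands in for the semaphore, task notification or queue you would use in a real RTOS task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SAMPLES 80u
#define HALF_SAMPLES  (BLOCK_SAMPLES / 2u)

/* Target of the circular DMA transfer. */
static uint16_t dma_buffer[BLOCK_SAMPLES];

/* Set by the interrupt hooks, consumed by the DSP task. */
static const uint16_t *volatile ready_half;

/* Hypothetical hooks, to be called from the DMA half-transfer and
 * transfer-complete interrupts. They only record which half is ready. */
void on_dma_half_transfer(void) { ready_half = &dma_buffer[0]; }
void on_dma_full_transfer(void) { ready_half = &dma_buffer[HALF_SAMPLES]; }

/* Placeholder for the real DSP: here it just sums the samples. */
static uint32_t process_block(const uint16_t *samples, size_t n)
{
    uint32_t sum = 0u;
    for (size_t i = 0u; i < n; i++) {
        sum += samples[i];
    }
    return sum;
}

/* Body of the high-priority DSP task. In an RTOS it would block on a
 * semaphore or queue instead of polling the pointer. Returns true if
 * a half-buffer was processed. */
bool dsp_task_poll(uint32_t *result)
{
    assert(result != NULL);
    const uint16_t *half = ready_half;
    if (half == NULL) {
        return false;
    }
    ready_half = NULL;
    *result = process_block(half, HALF_SAMPLES);
    return true;
}
```

While the task works on one half, the DMA keeps filling the other, so the task has the acquisition time of 40 samples to finish before its data is overwritten.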
If your lower priority tasks are still starved, then the clear indication is that your DSP process is simply too much for your processor. There may be scope for optimisation, but that would be a different question.
Using the suggested block-processing strategy I have in the past migrated an application from a TI C2000 DSP running at 200MHz and 98% CPU load, to a 72MHz STM32F1xx at 60% CPU load. The performance improvement is potentially very significant if you get it right.
With respect to your "high-priority" button interrupt, I would question your priority assignment. Buttons are operated manually with human response and perception times measured in 10's or even 100's of milliseconds. That is hardly your time critical task, whereas missing an ADC sample of a few microseconds would cause your signal processing to go seriously awry.
You may be making the mistake of confusing "high-priority" with "important". In the context of a real-time system, they are not the same thing. You could simply poll the button in a low-priority task, or if you use an interrupt, the interrupt should do no more than signal a task (or more realistically trigger a de-bounce timer) (see Rising edge interrupt triggering multiple times on STM32 Nucleo for example).

TinyAVR 0-Series: Can I use pin-change sensing without entering interrupt handler?

I am evaluating the ATtiny806 running at 20MHz to build a cycle-accurate Intel 4004 microprocessor emulator. (I know it will be a bit too slow, but AVRs have a huge community.)
I need to synchronize to the external, two-phase non-overlapping clocks. These are not fast clocks (the original 4004 ran at 750kHz)
but if I spin-wait for every clock edge, I risk wasting most of my time budget.
The TinyAVR 0-series has a very nice pin-change interrupt facility that can be configured to trigger only on rising edges.
But, an interrupt routine round-trip is 8 cycles (3 in, 5 out).
My question is:
Can I leverage the pin-change sensing mechanism while never visiting an ISR?
(Other processor families let you poll for interruptible conditions without enabling interrupts from that peripheral). Can polling be done with a tight skip-on-bit/jump-back loop, followed by a set-bit instruction?
Straightforward way
You can always just poll on the level of the GPIO pin using the single cycle skip if bit set/clear instruction on the appropriate PORT register and bit.
But as you mention, polling does burn cycles so I'm not sure exactly what you want here - either a poll (that burns cycles but has low latency) or an interrupt (that has higher latency but allows processing to continue until the condition is true).
Note that if things get really tight and you are looking for, say, power savings by sleeping between clock signal transitions then you can do tricks like having an ISR that never returns (saving the RETI cycles) but that requires some careful coding, probably with something like a state machine.
INTFLAG way
Alternately, if you want to use the internal pin state machine logic and you can live without interrupts, then you can use the INTFLAGS flags to check for the pin change configured in the ISC bits of the PINxCTRL register. As long as global interrupts are not enabled in SREG then you can spin poll on the appropriate INTFLAG bit to check/wait for the desired condition, and then write a 1 to that bit to clear the flag.
Note that if you want to make this fast, you will probably want to map the appropriate PORT to a VPORT since the VPORT registers are in I/O Memory. This lets you use SBIS to test the INTFLAG bit in a single cycle and SBI to clear the bit in a single cycle (these instructions only work on IO memory and the normal PORT registers are not in IO Memory).
Finally one more complication, if you need to leave the interrupts on when doing this, it is probably possible by hacking the interrupt priority registers. You'd set the pin change to be on level 0, and then make sure the interrupts you care about are level 1 or higher, and then trick the interrupt controller into thinking that there is already a level 0 running so these interrupts do not actually fire. There are also other restrictions to this strategy so avoid it if at all possible.
Programmable logic way
If you want to get really esoteric, it is likely possible that you could route the input value of a pin to a configurable custom logic LUT in the chip and then route the output of that module to a bit that you test using a 1-cycle bit test (maybe an unused IO Pin). To do this, you'd feed the output of the LUT back into one of its inputs and then use the LUT to create a strobe on the edge you are looking for. This is very complex. Also, since the strobe has no acknowledgement, if the signal changes while you are not looking for it (that is, outside a spin check) then the edge will be lost and you will have to wait for the next one (probably fatal in your application).

How to keep interrupts short?

The most heard advice in embedded programming is "keep your interrupts short".
Now my situation is that I have a very long running task in my main() loop (writing large blocks of data to SDcard), which can sometimes take 100ms. So to keep my system responsive I moved all other stuff to interrupt-handlers.
For example, normally one would handle the incoming UART data in an interrupt, then process the incoming command in the main() loop, and then send back the response. But in my case, the whole processing/handling of the commands also takes places in the interrupts, because my main() loop can be blocked for (relatively) long periods.
The optimal solution would be to switch to an RTOS but I don't have the RAM for it. Are there alternatives for my design where the interrupts can be short?
The traditional approach for this is for Interrupts to schedule a deferred procedure and end the interrupt as soon as possible.
Once the interrupt has finished, the list of deferred procedures is walked from most-important to least important.
Consider the case where you have your main (lower priority) action, and two interrupts I1 and I2, where I2 is more important than main, but less important than I1.
In this case, let's suppose you're running main and I1 fires. I1 schedules a deferred procedure and signals to the hardware that I1 is done. I1's DPC now begins running. Suddenly I2 comes in from the hardware. I2's interrupt takes over from I1's DPC and schedules I2's DPC and signals to the hardware that it's done.
The scheduler then returns to I1's DPC (because it is more important), and when I1's DPC completes, I2's DPC begins (because it is more important than main), and then eventually returns execution to main.
This design allows you to schedule the importance of different interrupts, encourages you to keep your interrupts small, and allows you to complete DPCs in an ordered and in-order prioritized way.
There are 100 different ways to skin this cat, depending on CPU architecture (interrupt nesting & prioritization, software interrupt support, etc.) but let's take a pretty straightforward approach that is relatively simple to understand and free from the race conditions and resource-sharing hazards of a preemptive kernel.
(Disclaimer: my first choice is typically a preemptive real time kernel, many of them can run in extremely resource-constrained systems... SecurityMatt's suggestion is good but if you're not comfortable implementing your own preemptible kernel / task switcher, particularly one that handles asynchronous (interrupt-triggered) preemption, you can get wrapped around the axle pretty quickly. So what I'm proposing below is not as responsive as a preemption-based kernel, but it's much simpler and often adequate).
Create 3 event/work queues:
Q1 is the lowest priority and handles your slow, background SD card writes
Q2 holds requests to process incoming UART packets
Q3 (highest priority) holds UART RX FIFO read requests.
I split up the UART RX FIFO reading and the processing of the read packet so that the FIFO reading is always serviced ahead of the packet processing; maybe you want to keep them together, your choice.
For this to work, you break your large (~100ms) SD card write process into a bunch of smaller, discrete, run to completion steps.
So for example, to write 5 blocks, 20ms each, you write the first block, then enqueue "write next block" to Q1. You go back to your scheduler at the end of each step & scan the queues in priority order, starting with Q3. If Q2 and Q3 are empty, you pull the next event off of Q1 ("write next block"), and run that command for another 20ms before returning and scanning the queues again. If 20ms is not responsive enough, you break up each 20ms block write into a more fine-grained set of steps, continually posting to Q1 the next work step.
Now for the incoming UART stuff; in the UART RX ISR, you simply enqueue a "read UART FIFO" command in Q3, and return from interrupt back into the 20ms "write block" step that was interrupted. As soon as the CPU finishes the write, it goes back and scans the queues in priority order (worst case response will be 20ms if the block write had just begun at the time of the interrupt). The queue scanner (scheduler) will see that Q3 now has work to do, and it will run that command before going back and scanning again.
The responsiveness in your system, worst case, will be determined by the longest run-to-completion step in the system, regardless of priority. You keep your system very responsive by doing work in small, discrete, run to completion steps.
Note that I have to speak in generalities here. Maybe you want to read the UART RX FIFO in the ISR, put the data into a buffer, and only defer the packet processing, not the actual reading of the FIFO (then you'd only have 2 queues). You have to work this out for yourself. But I hope the approach makes sense.
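The three-queue scheme above could be sketched roughly like this. All names (`post`, `run_next_step`, the three example steps and the `trace` buffer) are made up for illustration, and the step functions only record that they ran. Each queue is assumed to have a single producer context; if both an ISR and the main context post to the same queue, `post` needs extra protection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*step_fn)(void);

#define NUM_QUEUES  3u   /* index 0 = Q1 (lowest) ... index 2 = Q3 (highest) */
#define QUEUE_DEPTH 8u

typedef struct {
    step_fn items[QUEUE_DEPTH];
    volatile unsigned head;  /* next entry to run */
    volatile unsigned tail;  /* next free slot */
} work_queue_t;

static work_queue_t queues[NUM_QUEUES];

/* Enqueue one run-to-completion step. Returns false if the queue is full. */
bool post(unsigned q, step_fn fn)
{
    assert(q < NUM_QUEUES);
    work_queue_t *wq = &queues[q];
    unsigned next = (wq->tail + 1u) % QUEUE_DEPTH;
    if (next == wq->head) {
        return false;
    }
    wq->items[wq->tail] = fn;
    wq->tail = next;
    return true;
}

/* Scan from the highest priority queue down and run exactly one step.
 * Returns false if all queues are empty. */
bool run_next_step(void)
{
    for (unsigned q = NUM_QUEUES; q-- > 0u; ) {
        work_queue_t *wq = &queues[q];
        if (wq->head != wq->tail) {
            step_fn fn = wq->items[wq->head];
            wq->head = (wq->head + 1u) % QUEUE_DEPTH;
            fn();
            return true;
        }
    }
    return false;
}

/* Example steps: each one just logs a letter so the order is visible. */
static char trace[16];
static size_t trace_len;

static void log_step(char c)
{
    if (trace_len < sizeof trace - 1u) {
        trace[trace_len++] = c;
    }
}

void sd_write_block(void) { log_step('S'); }   /* one short SD write step (Q1) */
void process_packet(void) { log_step('P'); }   /* handle a complete packet (Q2) */
void read_uart_fifo(void)                      /* drain the RX FIFO (Q3) ...   */
{
    log_step('R');
    post(1u, process_packet);                  /* ... then defer processing to Q2 */
}
```

The main loop is then just `for (;;) { if (!run_next_step()) { /* idle or sleep */ } }`. If two SD write steps are pending in Q1 and the UART ISR posts `read_uart_fifo` to Q3, the steps run in the order R, P, S, S.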
This event-driven approach with prioritized queues is exactly the approach used by the Quantum Platform (QP) event-driven framework. The QP actually supports an underlying non-preemptive (cooperative) scheduler, such as what was described here, or a preemptive scheduler which runs the scheduler each time an event is queued (similar to the approach suggested by SecurityMatt). You can see the code/implementation of the QP's cooperative scheduler over at the QP website.
An alternative solution would be as follows:
Anywhere the FAT library can capture the processor for a long time, you insert a call to a new function which is normally very fast and returns to the caller after a few machine cycles. Such a fast function would not impact the real-time performance of your time-consuming operation, such as reading from or writing to SD Flash. You would insert such a call in any loop that waits for a flash sector to be erased. You also insert a call to this function in between every 512 bytes written or 512 bytes read.
The goal of that function is to perform most of the tasks that you would normally have inside the "while(1)" loop in a typical "main()" for an embedded device. It first increments an integer and performs a fast modulo on the new value, then returns if the result is not equal to an arbitrary constant. The code is as follows:
void premption_check(void)
{
    //unsigned, so that the counter wraps around safely on overflow
    static unsigned int fast_modulo = 0;

    //divide the number of calls
    fast_modulo++;
    if ((fast_modulo & 0x003F) != 3)
    {
        return;
    }
    //the processor continues here only once every 64 calls to "premption_check"
    //(service the serial port, process complete commands, etc.)
}
Next, you call the functions that extract RS232 characters/strings from the serial port interrupts, process any command if complete strings are received, etc
The binary mask 0x3F used above means that we look only at the 6 least significant bits of the counter. When these 6 bits happen to be equal to the arbitrary value 3, we go ahead with the calls to functions which may take some microseconds or even milliseconds to execute. You may want to try a smaller or larger binary mask depending on the speed at which you want to service the serial port and other operations. You may even use more than one mask simultaneously to service some operations faster than others.
The FAT library and the SD card should not experience any problem when some sporadic delay happens in between two Flash erase operations, for example.
The solution given here works even on a micro-controller with only 2K bytes, like many variants of the 8051. As incredible as it may seem, the pinball machines of 1980 to 1990 had a few K of RAM and slow processors (like 10 MHz), and they were able to test one hundred switches... fully debounced, update an X/Y matrix display, produce sound effects, etc. The solutions developed by these engineers can still be used to boost the performance of large systems. Even with the best servers with 64 gigabytes of RAM and many terabytes of hard disk, I presume that every byte counts when some company wants to index billions of web pages.
As no-one has suggested coming at it from this end yet I'll throw it in the hat:
It's possible that sticking the SD card service routine in a low-priority interrupt, maybe throwing in some DMA if you can, would free up your main loop & other interrupts to be more responsive, rather than being stuck in a main() loop waiting for a long time for something to finish.
The caveat to this is I don't know if the hardware has any way of triggering the interrupt when the SD card is ready for more, you might have to cheat by running a polling timer to check & force the interrupt. I'm not above that sort of thing though, if you have spare hardware timers & interrupts it can be done with very little overhead.
Resorting to an RTOS for something like this would seem overkill & an admission of failure to me... ;)