Disable interrupt to let freeRTOS run on stm32

I'm working on a project where I am getting digital samples continuously through DMA on an STM32F4. The DMA generates a transfer-complete callback interrupt after every sample, where I do some DSP. My plan is to allow FreeRTOS to work on other tasks while the DMA is waiting on the callback. However, the DMA is generating callbacks too frequently, not allowing FreeRTOS to run. I want to make it so that after every DMA complete callback, FreeRTOS tasks are allowed to run for 6 ms. I thought of calling __disable_irq() from the complete callback and __enable_irq() from one of the tasks, but that would not guarantee 6 ms, and I also have a high-priority button interrupt. I also tried disabling just the DMA interrupt by calling __set_BASEPRI(priority<<(8-__NVIC_PRIO_BITS)) and then starting a timer for 6 ms. In the timer period-elapsed callback I call __set_BASEPRI(0) to re-enable the DMA interrupt. But for some reason this did not allow FreeRTOS to run at all. It goes back and forth between the DMA complete callback and the timer period-elapsed callback.
I am new to embedded programming so any comment on this will help. Thank You.

You should not think of the DSP process as being separate from the RTOS tasks. Do the DSP in an RTOS task: the signal processing is the most time-critical aspect of your system, and you have to process the data as fast as it arrives with no loss.
If the DSP is being done in an interrupt context and starving your tasks, then clearly you are doing too much work in the interrupt context, and have too high an interrupt rate. You need to fix your design for something more schedulable.
If your DMA transfers are single samples, you will get one interrupt per sample - the ADC will do that on its own; so using DMA in that manner offers no advantage over direct ADC interrupt processing.
Instead you should use block processing: DMA a block of, say, 80 samples cyclically, for which you get a half-transfer interrupt at 40 samples and a full-transfer interrupt at 80 samples. For each interrupt you might then trigger a task event or semaphore to defer the DSP processing to a high-priority RTOS task. This achieves two things:
1. For the entirety of the n-sample block acquisition time, the RTOS is free to:
   - perform the DSP processing for the previous block,
   - use any remaining time to process the lower-priority tasks.
2. Any interrupt overhead spent context switching etc. is reduced by a factor of n, allowing more time for core signal processing and background tasks.
Apart from reducing the number of interrupts and the software overhead, the signal processing algorithms themselves can be optimised more readily when performing block processing.
A variation on the above is that, rather than triggering a task event or semaphore from the DMA interrupt handler, you could place the new sample block in a message queue, which will then provide some buffering. This is useful if the DSP processing might be less deterministic, so that it cannot always guarantee to complete processing of one block before the next is ready. However, overall it remains necessary that on average you complete block processing in the time it takes to acquire a block, with time to spare for other tasks.
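The half-transfer / full-transfer hand-off described above can be sketched in portable C. This is a minimal host-side model, not STM32 HAL or FreeRTOS code: `dma_half_ready_isr`, `dsp_task_step`, `ready_half` and the averaging "DSP" are all hypothetical names chosen for illustration. On a real target the flag would typically be replaced by a task notification or semaphore given from the interrupt, and the buffer would be filled by the DMA controller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_LEN 80u               /* whole circular DMA buffer, in samples */
#define HALF_LEN  (BLOCK_LEN / 2u)  /* samples handed over per interrupt */

static int16_t dma_buffer[BLOCK_LEN];  /* on target: filled cyclically by the DMA */
static volatile int ready_half = -1;   /* -1: nothing pending, 0: first half, 1: second half */
static volatile uint32_t overruns;     /* halves that arrived before the previous one was taken */

/* Called from the half-transfer (half = 0) or full-transfer (half = 1)
   interrupt. It only records which half is complete; no DSP runs here. */
void dma_half_ready_isr(int half)
{
    assert(half == 0 || half == 1);
    if (ready_half != -1) {
        overruns++;                    /* the processing task fell behind */
    }
    ready_half = half;
}

/* Placeholder for the real DSP: the mean of one half-buffer. */
static int32_t process_block(const int16_t *samples, size_t n)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += samples[i];
    }
    return sum / (int32_t)n;
}

/* One iteration of the high-priority DSP task.
   Returns 1 if a block was processed, 0 if nothing was pending. */
int dsp_task_step(int32_t *result)
{
    int half = ready_half;
    if (half < 0) {
        return 0;
    }
    ready_half = -1;
    *result = process_block(&dma_buffer[(size_t)half * HALF_LEN], HALF_LEN);
    return 1;
}
```

While the task works on one half, the DMA is filling the other, so the interrupt rate drops from one per sample to one per 40 samples. A non-zero `overruns` count is the sign that processing does not keep up with acquisition.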
If your lower priority tasks are still starved, then the clear indication is that your DSP process is simply too much for your processor. There may be scope for optimisation, but that would be a different question.
Using the suggested block-processing strategy I have in the past migrated an application from a TI C2000 DSP running at 200MHz and 98% CPU load, to a 72MHz STM32F1xx at 60% CPU load. The performance improvement is potentially very significant if you get it right.
With respect to your "high-priority" button interrupt, I would question your priority assignment. Buttons are operated manually, with human response and perception times measured in tens or even hundreds of milliseconds. That is hardly your time-critical task, whereas missing an ADC sample by a few microseconds would cause your signal processing to go seriously awry.
You may be making the mistake of confusing "high-priority" with "important". In the context of a real-time system, they are not the same thing. You could simply poll the button in a low-priority task, or if you use an interrupt, the interrupt should do no more than signal a task (or more realistically trigger a de-bounce timer) (see Rising edge interrupt triggering multiple times on STM32 Nucleo for example).
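Polling the button from a low-priority task could look roughly like the sketch below. It is one common counting debounce, not the method from the linked question; `debounce_t`, `debounce_init` and `debounce_update` are hypothetical names, and the raw pin level is assumed to be read by the caller once per poll period.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    bool    stable;     /* last accepted (debounced) level */
    uint8_t count;      /* consecutive polls that disagreed with 'stable' */
    uint8_t threshold;  /* polls required before a change is accepted */
} debounce_t;

void debounce_init(debounce_t *d, bool initial, uint8_t threshold)
{
    assert(d != NULL && threshold > 0);
    d->stable = initial;
    d->count = 0;
    d->threshold = threshold;
}

/* Call once per poll period with the raw pin level.
   Returns true only on the poll where the debounced level changes. */
bool debounce_update(debounce_t *d, bool raw)
{
    assert(d != NULL);
    if (raw == d->stable) {
        d->count = 0;           /* a bounce back to the old level restarts the count */
        return false;
    }
    if (++d->count < d->threshold) {
        return false;           /* not stable for long enough yet */
    }
    d->stable = raw;
    d->count = 0;
    return true;
}
```

With a 10 ms poll period and a threshold of 3, a press is reported about 30 ms after the contact settles, which is well within human perception limits and costs no interrupt at all.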

Related

How do you avoid interrupt starvation in a nested interrupt system?

I am learning about interrupts and couldn't understand what happens when there are too many interrupts to a point where the CPU can't process the foreground loop or complete the existing interrupts. I read through this article https://www.cs.utah.edu/~regehr/papers/interrupt_chapter.pdf but didn't completely understand how a scheduler would help, if there are simply too many interrupts?
Do we switch to a faster CPU if the interrupts can not be missed?

Yes, you have to switch to a faster CPU!
You have to ensure that there is enough time for the main loop. Therefore it is really important to keep your interrupt service routines as short as possible and to run some CPU load tests.

Indeed, any time there is contention over a shared resource, there is the possibility of starvation. The schedulers discussed in the paper limit the interrupt rate, thus ensuring some interrupt-free processing time during each interval. During high activity periods, interrupt handling is disabled, and the scheduler switches to polling mode where it interrogates the state of the interrupt request lines periodically, effectively throttling the stream of interrupts. The operating system strives to do as little as possible in each interrupt handler - tasks are often simply queued so they can be handled later at a different stage. There are many considerations and trade-offs that go into any scheduling algorithm.
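As a rough illustration of the rate-limiting idea (not the schedulers from the paper themselves), the sketch below allows a fixed budget of interrupts per interval, masks the source once the budget is used up, and falls back to polling the request at the next interval boundary. All names are hypothetical, and `irq_enabled` merely stands in for the enable bit of a real interrupt controller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t budget;       /* interrupts allowed per interval */
    uint32_t count;        /* interrupts taken in the current interval */
    bool     irq_enabled;  /* stand-in for the source's enable bit */
} irq_throttle_t;

void throttle_init(irq_throttle_t *t, uint32_t budget)
{
    assert(t != NULL && budget > 0);
    t->budget = budget;
    t->count = 0;
    t->irq_enabled = true;
}

/* Call at the top of the interrupt handler. Once the budget is spent,
   the source stays masked for the rest of the interval. */
void throttle_on_interrupt(irq_throttle_t *t)
{
    assert(t != NULL);
    if (++t->count >= t->budget) {
        t->irq_enabled = false;
    }
}

/* Call from a periodic timer at each interval boundary.
   'request_pending' is the polled state of the request line.
   Returns true if the handler's work should be done now by polling. */
bool throttle_on_interval(irq_throttle_t *t, bool request_pending)
{
    assert(t != NULL);
    bool was_masked = !t->irq_enabled;
    t->count = 0;
    t->irq_enabled = true;
    return was_masked && request_pending;
}
```

The time between hitting the budget and the next interval boundary is guaranteed to be free of this interrupt, which is what leaves room for the foreground code.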

Overall you need an idea of how much time each part of your program consumes. This is pretty easy to measure live with an oscilloscope. If you set a GPIO when entering the interrupt and clear it when leaving, you see not only how much time the ISR consumes, but also how often it fires. If you do this for each ISR you get a good idea of how much time they need. You can then do something similar in main(), to get a rough estimate of the complete execution cycle of the program, main + interrupts.
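The scope readings convert directly into a CPU budget: pulse width multiplied by pulse rate is the time per second spent in that ISR. A small sketch of that arithmetic follows; the struct and function names are hypothetical, and the numbers in the note after it are made up for illustration rather than measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t pulse_us;  /* width of the GPIO pulse, i.e. time spent inside the ISR */
    uint32_t rate_hz;   /* how many times per second the pulse occurs */
} isr_measurement_t;

/* Total microseconds per second spent in the measured ISRs.
   Dividing the result by 10000 gives the load in percent. */
uint64_t isr_busy_us_per_second(const isr_measurement_t *m, size_t n)
{
    assert(m != NULL || n == 0);
    uint64_t busy = 0;
    for (size_t i = 0; i < n; i++) {
        busy += (uint64_t)m[i].pulse_us * m[i].rate_hz;
    }
    return busy;
}
```

For example, a 20 µs pulse at 10 kHz plus a 50 µs pulse at 1 kHz adds up to 250000 µs per second, so those two ISRs alone would take 25 % of the CPU.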
As for the best solution, it is obviously to reduce the number of interrupts. Use polling if possible. Use DMA. Use serial peripherals (UART, CAN etc.) that are hardware-buffered instead of interrupt-intensive ones. Use hardware PWM instead of output compare timers. And so on. These things need to be considered early on when you pick a suitable MCU for your project. If you picked the wrong MCU, then you'll obviously have to change. Twiddling with the CPU clock sounds like a quick & dirty fix. Get the design right instead.

Operating System Basics

I am reading about process management, and I have a few doubts:
What is meant by an I/O request? For example, a process that is executing is in the running state, and it is in the waiting state if it is waiting for the completion of an I/O request. I do not understand what is meant by an I/O request. Can you please give an example to elaborate?
Another doubt: let's say that a process is executing and suddenly an interrupt occurs. The process stops its execution and is put in the ready state. Is it possible that some other process begins its execution while the interrupt is also being processed?

Regarding the first question:
A simple way to think about it...
Your computer has lots of components. CPU, Hard Drive, network card, sound card, gpu, etc. All those work in parallel and independent of each other. They are also generally slower than the CPU.
This means that whenever a process makes a call that down the line (on the OS side) ends up communicating with an external device, there is no point for the OS to be stuck waiting for the result since the time it takes for that operation to complete is probably an eternity (in the CPU view point of things).
So, the OS fires up whatever communication the process requested (call it IO request), flags the process as waiting for IO, and switches execution to another process so the CPU can do something useful instead of sitting around blocked waiting for the IO request to complete.
When the external device finishes whatever operation was requested, it generates an interrupt, so the OS is informed the work is done, and it can then flag the blocked process as ready again.
This is all a very simplified view of course, but that's the main idea. It allows the CPU to do useful work instead of waiting for IO requests to complete.
Regarding the second question:
It's tricky, even for single CPU machines, and depends on how the OS handles interrupts.
For code simplicity, a simple OS might, for example, process each interrupt in one go whenever it happens, then resume whatever process it decides is appropriate once the interrupt handling is done. So in this case, no other process would run until the interrupt handling is complete.
In practice, things get a bit more complicated for performance and latency reasons.
If you think about an interrupt lifetime as just another task for the CPU (From when the interrupt starts to the point the OS considers that handling complete), you can effectively code the interrupt handling to run in parallel with other things.
Just think of the interrupt as notification for the OS to start another task (that interrupt handling). It grabs whatever context it needs at the point the interrupt started, then keeps processing that task in parallel with other processes.

An I/O request generally just means a request to do input, output, or both. The exact meaning varies depending on your context, such as HTTP, networks, console operations, or maybe some process in the CPU.
A process is waiting for I/O: say, for example, you were writing a program in C to accept the user's name on the command line and then print 'Hello User' back. Your code will go into the waiting state until the user enters their name and hits Enter. This is a higher-level example, but even at a very low level, a process executing on your computer's processor works on the same basic principle.
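The example just described might look like this in C. It is only a sketch: `greet` is a hypothetical helper that takes the input stream as a parameter so that it can be fed from a file as well as from the keyboard.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reads one line (the user's name) from 'in' and writes a greeting to 'out'.
   fgets() does not return until a whole line or end-of-file is available.
   When 'in' is a terminal, the operating system keeps the process in the
   waiting state during that time and runs other processes instead. */
int greet(FILE *in, char *out, size_t out_size)
{
    char name[64];
    assert(in != NULL && out != NULL && out_size > 0);
    if (fgets(name, sizeof name, in) == NULL) {
        return -1;                        /* end of input or read error */
    }
    name[strcspn(name, "\r\n")] = '\0';   /* strip the trailing newline */
    snprintf(out, out_size, "Hello %s", name);
    return 0;
}
```

Calling `greet(stdin, buf, sizeof buf)` from `main()` and then printing `buf` gives the interactive version: the read is the I/O request, and the program makes no progress until the Enter key completes it.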
Can the processor work on other processes when the current one is interrupted and waiting on something? Yes! You better hope it does. That's what scheduling algorithms and stacks are for. However, the real answer depends on what architecture you are on, whether it supports parallel or serial processing, etc.

How can polling be faster than interrupt?

I'm trying to learn interrupts by reading these slides and am wondering, why can polling be faster than interrupts? If a device has a direct wire to the CPU that it can use to signal an interrupt, I can't imagine something being faster than that.
Give each device a wire (interrupt line) that it can use to signal the
processor
• When interrupt signaled, processor executes a routine
called an interrupt handler to deal with the interrupt
(does it literally mean a wire by the way?)
Polling can be better if processor has to respond to an event ASAP

Interrupt handling needs context switching (pipeline break, saving the stack pointer, CPU registers, etc.) before servicing the interrupt, which takes some time (dependent on the architecture). Polling can be faster if it's the only task (keep polling for the event), as you stay in the same context. In this case, it's only the poll + loop instructions time.

How to keep interrupts short?

The most heard advice in embedded programming is "keep your interrupts short".
Now my situation is that I have a very long running task in my main() loop (writing large blocks of data to SDcard), which can sometimes take 100ms. So to keep my system responsive I moved all other stuff to interrupt-handlers.
For example, normally one would handle the incoming UART data in an interrupt, then process the incoming command in the main() loop, and then send back the response. But in my case, the whole processing/handling of the commands also takes place in the interrupts, because my main() loop can be blocked for (relatively) long periods.
The optimal solution would be to switch to an RTOS but I don't have the RAM for it. Are there alternatives for my design where the interrupts can be short?

The traditional approach for this is for interrupts to schedule a deferred procedure and end the interrupt as soon as possible.
Once the interrupt has finished, the list of deferred procedures is walked from most important to least important.
Consider the case where you have your main (lower priority) action, and two interrupts I1 and I2, where I2 is more important than main, but less important than I1.
In this case, let's suppose you're running main and I1 fires. I1 schedules a deferred procedure and signals to the hardware that I1 is done. I1's DPC now begins running. Suddenly I2 comes in from the hardware. I2's interrupt takes over from I1's DPC and schedules I2's DPC and signals to the hardware that it's done.
The scheduler then returns to I1's DPC (because it is more important), and when I1's DPC completes, I2's DPC begins (because it is more important than main), and then eventually returns execution to main.
This design allows you to schedule the importance of different interrupts, encourages you to keep your interrupts small, and allows you to complete DPCs in a prioritized, orderly way.

There are 100 different ways to skin this cat, depending on CPU architecture (interrupt nesting & prioritization, software interrupt support, etc.) but let's take a pretty straightforward approach that is relatively simple to understand and free from the race conditions and resource-sharing hazards of a preemptive kernel.
(Disclaimer: my first choice is typically a preemptive real time kernel, many of them can run in extremely resource-constrained systems... SecurityMatt's suggestion is good but if you're not comfortable implementing your own preemptible kernel / task switcher, particularly one that handles asynchronous (interrupt-triggered) preemption, you can get wrapped around the axle pretty quickly. So what I'm proposing below is not as responsive as a preemption-based kernel, but it's much simpler and often adequate).
Create 3 event/work queues:
Q1 is the lowest priority and handles your slow, background SD card writes
Q2 holds requests to process incoming UART packets
Q3 (highest priority) holds UART RX FIFO read requests.
I split up the UART RX FIFO reading and the processing of the read packet so that the FIFO reading is always serviced ahead of the packet processing; maybe you want to keep them together, your choice.
For this to work, you break your large (~100ms) SD card write process into a bunch of smaller, discrete, run to completion steps.
So for example, to write 5 blocks, 20ms each, you write the first block, then enqueue "write next block" to Q1. You go back to your scheduler at the end of each step & scan the queues in priority order, starting with Q3. If Q2 and Q3 are empty, you pull the next event off of Q1 ("write next block"), and run that command for another 20ms before returning and scanning the queues again. If 20ms is not responsive enough, you break up each 20ms block write into a more fine-grained set of steps, continually posting to Q1 the next work step.
Now for the incoming UART stuff; in the UART RX ISR, you simply enqueue a "read UART FIFO" command in Q3, and return from interrupt back into the 20ms "write block" step that was interrupted. As soon as the CPU finishes the write, it goes back and scans the queues in priority order (worst case response will be 20ms if the block write had just begun at the time of the interrupt). The queue scanner (scheduler) will see that Q3 now has work to do, and it will run that command before going back and scanning again.
The responsiveness in your system, worst case, will be determined by the longest run-to-completion step in the system, regardless of priority. You keep your system very responsive by doing work in small, discrete, run to completion steps.
Note that I have to speak in generalities here. Maybe you want to read the UART RX FIFO in the ISR, put the data into a buffer, and only defer the packet processing, not the actual reading of the FIFO (then you'd only have 2 queues). You have to work this out for yourself. But I hope the approach makes sense.
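The three queues and the run-to-completion scanner can be sketched as below. This is a simplified host-side model with hypothetical names (`post`, `run_next`, and the three step functions, which only append a letter to a trace instead of touching hardware). On a real target, `post()` called from an ISR would need the queue update wrapped in a brief critical section, which is left out here.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_QUEUES 3   /* index 0 = Q1 (lowest) ... index 2 = Q3 (highest) */
#define QUEUE_LEN  8u

typedef void (*step_fn)(void);

typedef struct {
    step_fn  steps[QUEUE_LEN];
    unsigned head, tail, count;
} work_queue_t;

static work_queue_t queues[NUM_QUEUES];

/* Enqueue one run-to-completion step. Returns -1 if that queue is full. */
int post(unsigned prio, step_fn step)
{
    assert(prio < NUM_QUEUES && step != NULL);
    work_queue_t *q = &queues[prio];
    if (q->count == QUEUE_LEN) {
        return -1;
    }
    q->steps[q->tail] = step;
    q->tail = (q->tail + 1u) % QUEUE_LEN;
    q->count++;
    return 0;
}

/* Scan from highest to lowest priority and run exactly one pending step.
   Returns 0 when every queue is empty. */
int run_next(void)
{
    for (int p = NUM_QUEUES - 1; p >= 0; p--) {
        work_queue_t *q = &queues[p];
        if (q->count == 0) {
            continue;
        }
        step_fn step = q->steps[q->head];
        q->head = (q->head + 1u) % QUEUE_LEN;
        q->count--;
        step();
        return 1;
    }
    return 0;
}

/* Example steps: each records a letter so the execution order is visible. */
static char   trace[32];
static size_t trace_len;
static int    blocks_left = 3;          /* SD blocks still to be written */

static void record(char c)
{
    if (trace_len < sizeof trace - 1u) {
        trace[trace_len++] = c;
    }
}

static void process_packet(void)   { record('P'); }
static void read_uart_fifo(void)   { record('R'); (void)post(1, process_packet); }
static void write_next_block(void)
{
    record('W');
    if (--blocks_left > 0) {
        (void)post(0, write_next_block);   /* continue the SD write later */
    }
}
```

The main loop is then just `for (;;) { run_next(); }`, the SD write is started with `post(0, write_next_block)`, and the UART RX interrupt does nothing more than `post(2, read_uart_fifo)`.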
This event-driven approach with prioritized queues is exactly the approach used by the Quantum Platform (QP) event-driven framework. The QP actually supports an underlying non-preemptive (cooperative) scheduler, such as what was described here, or a preemptive scheduler which runs the scheduler each time an event is queued (similar to the approach suggested by SecurityMatt). You can see the code/implementation of the QP's cooperative scheduler on the QP website.

An alternative solution would be as follows:
Anywhere the FAT library can hold the processor for a long time, you insert a call to a new function which is normally very fast and returns to the caller after a few machine cycles. Such a fast function would not impact the real-time performance of your time-consuming operation, such as reading from or writing to SD flash. You would insert such a call in any loop that waits for a flash sector to be erased. You would also insert a call to this function between every 512 bytes written or read.
The goal of that function is to perform most of the tasks that you would normally have inside the "while(1)" loop in a typical "main()" for an embedded device. It first increments an integer and performs a fast modulo on the new value, then returns if the result is not equal to an arbitrary constant. The code is as follows:
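void preemption_check(void)
{
    // unsigned, so that the counter wraps around instead of overflowing
    static unsigned int fast_modulo = 0;

    // divide the number of calls
    fast_modulo++;
    if ((fast_modulo & 0x003F) != 3)
    {
        return;
    }
    // the processor continues here only once every 64 calls to "preemption_check"
    // (service the serial port and other housekeeping here)
}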
Next, you call the functions that extract RS232 characters/strings from the serial port interrupts, process any command if a complete string has been received, etc.
The binary mask 0x3F used above means that we look only at the 6 least significant bits of the counter. When these 6 bits happen to be equal to the arbitrary value 3, we go ahead with the calls to functions which may take some microseconds or even milliseconds to execute. You may want to try a smaller or larger binary mask depending on the speed at which you want to service the serial port and other operations. You may even use more than one mask simultaneously to service some operations faster than others.
The FAT library and the SD card should not experience any problem when some sporadic delay happens between two flash erase operations, for example.
The solution given here works even with a microcontroller with only 2 KB of memory, like many variants of the 8051. As incredible as it may seem, the pinball machines of 1980 to 1990 had a few K of RAM and slow processors (like 10 MHz), and they were able to test one hundred switches, fully debounced, update an X/Y matrix display, produce sound effects, etc. The solutions developed by these engineers can still be used to boost the performance of large systems. Even with the best servers with 64 GB of RAM and many terabytes of hard disk, I presume that every byte counts when some company wants to index billions of web pages.

As no-one has suggested coming at it from this end yet I'll throw it in the hat:
It's possible that sticking the SD card service routine in a low-priority interrupt, maybe throwing in some DMA if you can, would free up your main loop & other interrupts to be more responsive, rather than being stuck in a main() loop waiting for a long time for something to finish.
The caveat to this is I don't know if the hardware has any way of triggering the interrupt when the SD card is ready for more, you might have to cheat by running a polling timer to check & force the interrupt. I'm not above that sort of thing though, if you have spare hardware timers & interrupts it can be done with very little overhead.
Resorting to an RTOS for something like this would seem overkill & an admission of failure to me... ;)

Which Cortex-M3 interrupts can I use for general purpose work?

I have some code that needs to be run as the result of a particular interrupt going off.
I don't want to execute it in the context of the interrupt itself but I also don't want it to execute in thread mode.
I would like to run it at a priority that's lower than the high-level interrupt that precipitated its running but also a priority that's higher than thread level (and some other interrupts as well).
I think I need to use one of the other interrupt handlers.
Which ones are the best to use and what the best way to invoke them?
At the moment I'm planning on just using the interrupt handlers for some peripherals that I'm not using and invoking them by setting bits directly through the NVIC but I was hoping there's a better, more official way.
Thanks,

ARM Cortex supports a very special kind of exception called PendSV. It seems that you could use this exception exactly to do your work. Virtually all preemptive RTOSes for ARM Cortex use PendSV to implement the context switch.
To make it work, you need to prioritize PendSV low (write 0xFF to the PRI_14 field, which sits in System Handler Priority Register 3). You should also prioritize all IRQs above PendSV (write lower numbers in the respective priority registers in the NVIC). When you are ready to process the whole message, trigger PendSV from the high-priority ISR:
*((uint32_t volatile *)0xE000ED04) = 0x10000000; // trigger PendSV
The ARM Cortex CPU will then finish your ISR and all other ISRs that possibly were preempted by it, and eventually it will tail-chain to the PendSV exception. This is where your code for parsing the message should be.
Please note that PendSV could be preempted by other ISRs. This is all fine, but you need to obviously remember to protect all shared resources by a critical section of code (briefly disabling and enabling interrupts). In ARM Cortex, you disable interrupts by executing __asm("cpsid i") and you enable interrupts by __asm("cpsie i"). (Most C compilers provide built-in intrinsic functions or macros for this purpose.)

Are you using an RTOS? Generally this type of thing would be handled by having a high priority thread that gets signaled to do some work by the interrupt.
If you're not using an RTOS, you only have a few tasks, and the work being kicked off by the interrupt isn't too resource intensive, it might be simplest having your high priority work done in the context of the interrupt handler. If those conditions don't hold, then implementing what you're talking about would be the start of a basic multitasking OS itself. That can be an interesting project in its own right, but if you're looking to just get work done, you might want to consider a simple RTOS.
Since you mentioned some specifics about the work you're doing, here's an overview of how I've handled a similar problem in the past:
For handling received data over a UART, one method that I've used when dealing with a simpler system that doesn't have full support for tasking (i.e., the tasks are round-robined in a simple while loop) is to have a shared queue for data that's received from the UART. When a UART interrupt fires, the data is read from the UART's RDR (Receive Data Register) and placed in the queue. The trick to doing this without corrupting the queue pointers is to make the queue pointers volatile, and to make certain that only the interrupt handler modifies the tail pointer and that only the 'foreground' task that's reading data off the queue modifies the head pointer. A high-level overview:
producer (the UART interrupt handler):
1. read queue.head and queue.tail into locals;
2. increment the local tail pointer (not the actual queue.tail pointer). Wrap it to the start of the queue buffer if you've incremented past the end of the queue's buffer.
3. compare local.tail and local.head - if they're equal, the queue is full, and you'll have to do whatever error handling is appropriate.
4. otherwise write the new data to the slot that queue.tail still points to (the value before the increment), which is where the consumer will look for it
5. only now can you set queue.tail = local.tail
6. return from the interrupt (or handle other UART related tasks, if appropriate, like reading from a transmit queue)
consumer (the foreground 'task'):
1. read queue.head and queue.tail into locals;
2. if local.head == local.tail the queue is empty; return to let the next task do some work
3. read the byte pointed to by local.head
4. increment local.head and wrap it if necessary;
5. set queue.head = local.head
6. goto step 1
Make sure that queue.head and queue.tail are volatile (or write these bits in assembly) to make sure there are no sequencing issues.
Now just make sure that your UART received data queue is large enough that it'll hold all the bytes that could be received before the foreground task gets a chance to run. The foreground task needs to pull the data off the queue into its own buffers to build up the messages to give to the 'message processor' task.
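A minimal C sketch of such a single-producer, single-consumer queue follows. It assumes a single-core MCU where the producer is the UART interrupt and the consumer is the foreground loop; `uart_queue_t`, `queue_put` and `queue_get` are hypothetical names. The sketch stores each byte at the old tail index and only then publishes the incremented index, and it keeps one slot unused so that "full" can be told apart from "empty".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_SIZE 8u   /* holds at most QUEUE_SIZE - 1 bytes */

typedef struct {
    uint8_t buf[QUEUE_SIZE];
    volatile size_t head;   /* written only by the consumer (foreground task) */
    volatile size_t tail;   /* written only by the producer (UART ISR) */
} uart_queue_t;

/* Producer side: call from the UART receive interrupt with the byte read
   from the receive data register. Returns false if the queue is full. */
bool queue_put(uart_queue_t *q, uint8_t byte)
{
    assert(q != NULL);
    size_t tail = q->tail;
    size_t next = (tail + 1u) % QUEUE_SIZE;
    if (next == q->head) {
        return false;        /* full: the caller decides how to report the overrun */
    }
    q->buf[tail] = byte;     /* store the data first ... */
    q->tail = next;          /* ... then publish it to the consumer */
    return true;
}

/* Consumer side: call from the foreground task.
   Returns false if the queue is empty. */
bool queue_get(uart_queue_t *q, uint8_t *byte)
{
    assert(q != NULL && byte != NULL);
    size_t head = q->head;
    if (head == q->tail) {
        return false;        /* empty: let the next task do some work */
    }
    *byte = q->buf[head];
    q->head = (head + 1u) % QUEUE_SIZE;
    return true;
}
```

Because each index has exactly one writer, neither side needs to disable interrupts. The foreground task simply calls `queue_get()` in a loop until it returns false and appends the bytes to its own message buffer.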

What you are asking for is pretty straightforward on the Cortex-M3. You need to enable the STIR register so you can trigger the low priority ISR with software. When the high-priority ISR gets done with the critical stuff, it just triggers the low priority interrupt and exits. The NVIC will then tail-chain to the low-priority handler, if there is nothing more important going on.

The "more official way" or rather the conventional method is to use a priority based preemptive multi-tasking scheduler and the 'deferred interrupt handler' pattern.

Check your processor documentation. Some processors will interrupt if you write the bit that you normally have to clear inside the interrupt. I am presently using a SiLabs c8051F344 and in the spec sheet section 9.3.1:
"Software can simulate an interrupt by setting any interrupt-pending flag to logic 1. If interrupts are enabled for the flag, an interrupt request will be generated and the CPU will vector to the ISR address associated with the interrupt-pending flag."
