I have a program that should run periodically on an MCU (e.g. an STM32), say every 1 ms. If I program a 1 ms ISR and call my complete program from it, assuming it will not exceed 1 ms, is that a good way? Is there any problem I could be facing? Will it be precise?
The usual answer would probably be that ISRs should generally be kept to a minimum and that most of the work should be performed in the "background loop" (since you are apparently using the traditional "foreground-background" architecture, a.k.a. "main+ISRs").
But ARM Cortex-M (e.g., STM32) has been specifically designed such that ISRs can be written as plain C functions. In that case, working with ISRs is no different than working with any other C code. This includes ease of debugging.
Moreover, ARM Cortex-M comes with the NVIC (Nested Vectored Interrupt Controller). The NVIC allows you to prioritize interrupts and they can preempt each other. This means that you can quite easily build sets of periodic "tasks" (ISRs), which run under the preemptive, priority-based scheduler. Interestingly, this scheduler (implemented in the NVIC hardware) meets all requirements of RMA/RMS (Rate Monotonic Analysis/Scheduling), so you could prove the schedulability of your system. Of course, the ISR "tasks" cannot block internally, but this is not required for RMA/RMS. Also, if you have any shared resources between ISRs running at different priorities (or ISR and the background loop), you need to properly protect the resources by disabling interrupts.
So, your idea of using ISRs as "tasks" makes a lot of sense to me. Your system will be optimal, meaning that any other approach would be less efficient. (This includes the use of any kind of RTOS.) Also, this design can be low-power, because you can use the "background loop" in main() to put your CPU and peripherals into a low-power sleep mode (the WFI instruction, etc.). In fact, you can view the "background loop" as the idle "task" in this "hardware-RTOS".
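For concreteness, here is a minimal sketch of this architecture in C (CMSIS-style), assuming a 1 kHz SysTick and a hypothetical my_task_1ms() that always finishes well under 1 ms:

#include "stm32f4xx.h"            /* assumed device header; use your part's */

extern void my_task_1ms(void);    /* hypothetical: your periodic work */

void SysTick_Handler(void)        /* the 1 ms "task", scheduled by the hardware */
{
    my_task_1ms();
}

int main(void)
{
    SysTick_Config(SystemCoreClock / 1000u);  /* 1 ms tick */
    for (;;) {
        __WFI();    /* background loop doubles as the idle "task": sleep until the next interrupt */
    }
}

The timing will be as precise as the tick source, because the hardware rather than software schedules each run; jitter appears only if a higher-priority ISR happens to be executing when the tick fires.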
Related
I am evaluating the ATtiny806 running at 20MHz to build a cycle-accurate Intel 4004 microprocessor emulator. (I know it will be a bit too slow, but AVRs have a huge community.)
I need to synchronize to the external, two-phase non-overlapping clocks. These are not fast clocks (the original 4004 ran at 750 kHz), but if I spin-wait for every clock edge, I risk wasting most of my time budget.
The TinyAVR 0-series has a very nice pin-change interrupt facility that can be configured to trigger only on rising edges.
But, an interrupt routine round-trip is 8 cycles (3 in, 5 out).
My question is:
Can I leverage the pin-change sensing mechanism while never visiting an ISR?
(Other processor families let you poll for interruptible conditions without enabling interrupts from that peripheral.) Can polling be done with a tight skip-on-bit/jump-back loop, followed by a set-bit instruction?
Straightforward way
You can always just poll the level of the GPIO pin using the single-cycle skip-if-bit-set/clear instruction on the appropriate PORT register and bit.
But as you mention, polling does burn cycles so I'm not sure exactly what you want here - either a poll (that burns cycles but has low latency) or an interrupt (that has higher latency but allows processing to continue until the condition is true).
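For illustration, a minimal level-polling sketch, assuming the clock input is on PA2 (VPORTA keeps the reads single-cycle, as discussed under the INTFLAG approach below):

#include <avr/io.h>

/* Catch a rising edge on PA2 by waiting for low, then for high.
 * PA2 is an assumed wiring; adjust VPORTA/PIN2_bm for your pin. */
static inline void wait_rising_edge_pa2(void)
{
    while (VPORTA.IN & PIN2_bm)    { }   /* wait for the line to go low  */
    while (!(VPORTA.IN & PIN2_bm)) { }   /* wait for it to go high again */
}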
Note that if things get really tight and you are looking for, say, power savings by sleeping between clock signal transitions, then you can do tricks like having an ISR that never returns (saving the RETI cycles), but that requires some careful coding, probably with something like a state machine.
INTFLAG way
Alternately, if you want to use the internal pin state-machine logic and you can live without interrupts, then you can use the PORTx.INTFLAGS register to check for the pin change configured in the ISC bits of the PINxCTRL register. As long as global interrupts are not enabled in SREG, you can spin-poll on the appropriate INTFLAGS bit to check/wait for the desired condition, and then write a 1 to that bit to clear the flag.
Note that if you want to make this fast, you will probably want to map the appropriate PORT to a VPORT, since the VPORT registers are in I/O memory. This lets you use SBIS to test the INTFLAGS bit in a single cycle and SBI to clear it in a single cycle (these instructions only work on I/O memory, and the normal PORT registers are not in I/O memory).
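A minimal sketch of this INTFLAGS spin-poll, again assuming the clock input is on PA0 (with VPORTA, the compiler can typically reduce the flag test and clear to SBIS/SBI):

#include <avr/io.h>

/* Global interrupts stay disabled (no sei()), so the sensed edge only
 * sets INTFLAGS and never vectors into an ISR. PA0 is an assumed wiring. */
static inline void wait_rising_edge_pa0(void)
{
    while (!(VPORTA.INTFLAGS & PIN0_bm)) { }   /* spin on the flag    */
    VPORTA.INTFLAGS = PIN0_bm;                 /* write 1 to clear it */
}

int main(void)
{
    PORTA.DIRCLR   = PIN0_bm;                /* PA0 as input            */
    PORTA.PIN0CTRL = PORT_ISC_RISING_gc;     /* sense rising edges only */

    for (;;) {
        wait_rising_edge_pa0();
        /* ... per-edge emulation work here ... */
    }
}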
Finally, one more complication: if you need to leave interrupts on while doing this, it is probably possible by hacking the interrupt priority registers. You'd set the pin change to be level 0, make sure the interrupts you care about are level 1 or higher, and then trick the interrupt controller into thinking that a level-0 interrupt is already running, so the pin-change interrupt never actually fires. There are other restrictions to this strategy as well, so avoid it if at all possible.
Programmable logic way
If you want to get really esoteric, you could likely route the input value of a pin to a Configurable Custom Logic (CCL) LUT in the chip and then route the output of that module to a bit that you test with a single-cycle bit-test instruction (maybe an unused I/O pin). To do this, you'd feed the output of the LUT back into one of its inputs and use the LUT to create a strobe on the edge you are looking for. This is very complex, and since the strobe carries no acknowledgement, if the signal changes while you are not looking (in a spin check), the edge will be lost and you will have to wait for the next one (probably fatal in your application).
Background:
I am using uCOS-II, Keil uVision 5, and a Tiva board with the TM4C123GH6PM MCU on it. I was given the port for uCOS-II as well as a blank project file to get started. I wrote the tasks needed and the program works correctly, but now I am interested in implementing interrupts and trying to understand how they can coexist with an RTOS. This is all done in C.
Issue:
Interrupts do not work; they simply don't fire. There are instances where the other tasks won't execute either. The core issue is that I don't really understand how interrupts can coexist with the RTOS. I've written code (in both assembly and C) on bare metal where interrupts work perfectly, and I fully understand how they work when there is no layer between the code and the CPU.
What I've Tried:
I read the book and reference manual that came with uCOS-II and searched for ways to implement interrupts. There is no mention of the practicalities whatsoever; the only thing said about interrupts is how they interact with the scheduler, so they are covered only in the theoretical domain.
I asked on the Micrium (original vendor) forum and got no reply; it seems to be a dead forum.
I looked at the libraries included with the uCOS port and found something useful:
bsp_int is the library that deals with the interrupts. BSP stands for Board Support Package, and it is intended to facilitate the interaction between the software and the hardware.
The library has functions to register an interrupt handler and enable its interrupt. The RTOS uses its own table of ISR handlers mapped to the NVIC of the CPU, and all handlers are filtered through a generic handler. The two useful functions from this library are:
bsp_intVectSet, which takes the interrupt trigger ID (e.g. bsp_int_id_gpiof) and a pointer to the interrupt handler, and registers it
bsp_intEn, which takes the interrupt ID and enables the interrupt
The bsp_int library is included in bsp.c, which calls the interrupt initialization function from bsp_int (bsp_IntInit()).
The bsp.h file is included in the main application file (app.c)
app.c's main() is the entry point of the program. It disables interrupts, initializes uCOS (i.e. the kernel), creates the first/starting task called AppTaskStart, and starts multitasking (i.e. gives control to the RTOS; the function never returns). I'm assuming the kernel re-enables interrupts, since it needs those to run.
So the way the RTOS works (to my understanding) is that it hijacks the SysTick timer, so at every clock tick the kernel is invoked and is able to schedule the tasks.
AppTaskStart, which is the very first task to execute within the kernel domain, calls bsp_init (in which bsp_IntInit is called to initialize the interrupt table and more) and performs other initialization tasks.
The way I set up interrupts without a kernel before was using the TivaWare library (in C) provided by TI. It has functions for creating interrupts, specifying the trigger (i.e. rising/falling edge, timer overflow, etc.), and enabling them. This method works, and I thought it was what I should be using to set up the interrupt I want.
So I used the TivaWare library to set up interrupts on one of the GPIO ports (to which mechanical switches are connected) on the rising edge. The code for this, as well as the other code to start the Port F peripheral, set the switch pins to input, and enable pull-ups, is included in bsp_init (bsp.c), which is called from AppTaskStart, which is called from main. So far everything works perfectly: the RTOS initializes and all its tasks execute accordingly. But when I move that code directly into main and flash the program onto the board, the RTOS initializes (LEDs blink) but the tasks don't execute. Any ideas why that might be?
If I add the code to also enable and register the interrupt for when the switch is closed in the same function, using code from the TivaWare library, the RTOS does not initialize.
Do I need to set up/register/enable interrupts using the TivaWare library as well as register and enable them using the board support package (BSP) library? The way I understand it so far, the BSP registers/enables interrupts for the kernel only, whereas the TivaWare code enables them by writing directly to the registers, so the latter is needed to set up the CPU portion of the interrupts and the former is needed to set up the OS portion. But I don't know. I really don't understand how interrupts are meant to be incorporated under uCOS-II. They do specify how the interrupt handler should be written and what macros to use, but nothing else.
What should I try next? Does anybody have experience working with these two components (the RTOS and the board)?
I am just stuck at this point, and I've been playing with the code, moving stuff around, trying to find a clue/lead to solving this issue. I can't even debug the RTOS, because uVision has no kernel-aware support for uCOS, and I can't use step-debugging, because interrupts fire at every clock tick and the PC changes constantly, so the IDE can't follow it.
I know IAR Embedded Workbench has support for uCOS-II, and I have the application on my laptop and tried setting up a project, but I was only given a port/starter project for Keil and I don't know how to set one up for IAR EW. The only ports on Micrium's website are for the TM4C129 series; I tried using one of those to start an IAR EW project, but I couldn't get it to work (libraries not being linked/missing files).
Thank You!
Does anybody have experience working with these two components (the RTOS and the board)?
I'm afraid I haven't worked with uCOS (only with other OSes, mainly SYS/BIOS and FreeRTOS), and I haven't worked with Tiva (only with the Sitara AM335x) yet. Still, I think some of the hints below may be helpful for you (and apply in spite of the different implementations you are using).
What should I try next?
These are the steps I recommend you consider. You can take them in any order you find most helpful.
Interrupt priorities of ISRs that call RTOS library APIs must not be higher than the level that is taken into account by the RTOS; otherwise the RTOS-internal state may get corrupted, and anything can happen. Please check your OS documentation.
Please verify the position of the interrupt vector table and its contents:
Does every vector table entry point to one of the ISR wrapper handlers provided by the RTOS, or do you also find "independent" ISR implementations? If so, what do the latter do?
If you find pointers into third-party libraries you don't have the code for, don't give up. These can be as important...
Even more important than including the right header for the bsp_Int... APIs is that the interrupt management of all software components runs through one unique API, e.g. the bsp_Int... one.
Your assumptions about app.c/main() sound reasonable. Please make sure that you also know about every component that accesses interrupts indirectly.
AppTaskStart, which is the very first task to execute within the kernel domain, calls bsp_init (in which bsp_IntInit is called to initialize the interrupt table and more) and performs other initialization tasks.
Please check what happens if you place a breakpoint at the top of every task function. You should be able to watch each task start and run into its breakpoint once.
The way I set up interrupts without a kernel before was using the TivaWare library (in C) provided by TI. It has functions for creating interrupts, specifying the trigger (i.e. rising/falling edge, timer overflow, etc.), and enabling them. This method works, and I thought it was what I should be using to set up the interrupt I want.
You should make sure that the TivaWare library only uses interrupts in a way that is compatible with your RTOS. You can do this by reading the manual or the sources.
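For illustration, here is a hedged sketch of what a single RTOS-aware GPIO ISR typically looks like under uC/OS-II when registered through the BSP layer. OSIntEnter()/OSIntExit()/OSSemPost() are standard uC/OS-II calls and GPIOIntClear() is TivaWare, but SwitchSem, the handler name, and the PF4 pin choice are assumptions for this example:

#include "ucos_ii.h"                  /* uC/OS-II kernel API  */
#include "inc/hw_memmap.h"            /* GPIO_PORTF_BASE      */
#include "driverlib/gpio.h"           /* TivaWare GPIO driver */

extern OS_EVENT *SwitchSem;           /* assumed: created with OSSemCreate(0) */

/* Registered with bsp_intVectSet(bsp_int_id_gpiof, GPIOPortF_ISR) and
 * enabled with bsp_intEn(), so the kernel's generic handler dispatches here. */
void GPIOPortF_ISR(void)
{
    OSIntEnter();                                   /* tell the kernel we entered an ISR */
    GPIOIntClear(GPIO_PORTF_BASE, GPIO_INT_PIN_4);  /* ack the switch edge (assumed PF4) */
    OSSemPost(SwitchSem);                           /* defer the real work to a task     */
    OSIntExit();                                    /* may context-switch on the way out */
}

The point is that both layers are involved: TivaWare configures and acknowledges the peripheral, while the BSP/kernel calls bracket the handler so the scheduler stays consistent.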
So I used the TivaWare library to set up interrupts on one of the GPIO ports (to which mechanical switches are connected) on the rising edge. The code for this, as well as the other code to start the Port F peripheral, set the switch pins to input, and enable pull-ups, is included in bsp_init (bsp.c), which is called from AppTaskStart, which is called from main. So far everything works perfectly: the RTOS initializes and all its tasks execute accordingly. But when I move that code directly into main and flash the program onto the board, the RTOS initializes (LEDs blink) but the tasks don't execute. Any ideas why that might be?
Could it be that an electronic problem at one of the controller pins connected to the interrupt starts triggering that interrupt all the time?
If I add the code to [...]
Have you tried creating a minimal reproducible example? When you do this, you can enhance the effect by simultaneously performing rubber duck debugging.
Do I need to set up/register/enable interrupts using the TivaWare library as well as register and enable them using the board support package (BSP) library? The way I understand it so far, the BSP registers/enables interrupts for the kernel only, whereas the TivaWare code enables them by writing directly to the registers, so the latter is needed to set up the CPU portion of the interrupts and the former is needed to set up the OS portion. But I don't know. I really don't understand how interrupts are meant to be incorporated under uCOS-II. They do specify how the interrupt handler should be written and what macros to use, but nothing else.
This sounds dangerous. I haven't worked with Tiva yet, but I have with another TI chip (the AM335x). There we had a similar situation: different libraries accessing different/overlapping parts of the same system resource by means of different abstraction layers. The situation only started improving when we tidied up the mess of abstraction layers ignoring each other and ported some code to a common abstraction-layering scheme.
And some PS:
You can write your ISRs in C or assembler, as you like. Depending on the quality of your toolchain and optimisation settings, assembler may yield better performance (or no improvement at all), and by calling C APIs from assembler, some programmers tend to make new mistakes. I'd recommend staying within C until you know in detail what is happening around your OS and IRQs.
I am learning about interrupts and couldn't understand what happens when there are too many interrupts, to the point where the CPU can't process the foreground loop or complete the pending interrupts. I read through this article https://www.cs.utah.edu/~regehr/papers/interrupt_chapter.pdf but didn't completely understand how a scheduler would help if there are simply too many interrupts.
Do we switch to a faster CPU if the interrupts cannot be missed?
Yes, you would have to switch to a faster CPU!
You have to ensure that there is enough time left for the main loop. Therefore it is really important to keep your interrupt service routines as short as possible and to do some CPU workload tests.
Indeed, any time there is contention over a shared resource, there is the possibility of starvation. The schedulers discussed in the paper limit the interrupt rate, thus ensuring some interrupt-free processing time during each interval. During high activity periods, interrupt handling is disabled, and the scheduler switches to polling mode where it interrogates the state of the interrupt request lines periodically, effectively throttling the stream of interrupts. The operating system strives to do as little as possible in each interrupt handler - tasks are often simply queued so they can be handled later at a different stage. There are many considerations and trade-offs that go into any scheduling algorithm.
Overall you need an idea of how much time each part of your program consumes. This is pretty easy to measure live with an oscilloscope. If you drive a GPIO high when entering an ISR and low when leaving it, you not only see how much time the ISR consumes but also how often it kicks in. If you do this for each ISR, you get a good idea of how much time they need. You can then do something similar in main() to get a rough estimate of the complete execution cycle of the program, main plus interrupts.
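A minimal sketch of that instrumentation; DEBUG_PORT and DEBUG_PIN_bm are hypothetical stand-ins for whatever spare output pin and port register your MCU provides:

/* On the scope: pulse width = ISR execution time, pulse rate = ISR frequency. */
void UART_RX_ISR(void)             /* handler name illustrative */
{
    DEBUG_PORT |= DEBUG_PIN_bm;    /* pin high on entry */

    /* ... the actual interrupt work ... */

    DEBUG_PORT &= ~DEBUG_PIN_bm;   /* pin low on exit */
}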
As for the best solution, it is obviously to reduce the number of interrupts. Use polling if possible. Use DMA. Use serial peripherals (UART, CAN, etc.) that are hardware-buffered instead of interrupt-intensive ones. Use hardware PWM instead of output-compare timers. And so on. These things need to be considered early on, when you pick a suitable MCU for your project. If you picked the wrong MCU, then you'll obviously have to change. Twiddling with the CPU clock sounds like a quick & dirty fix. Get the design right instead.
I'm trying to figure out this basic scenario:
Suppose my CPU receives an exception or an interrupt. What I do know is that the CPU starts to perform an interrupt service routine (it looks at the IDTR register to locate the IDT, and goes to the appropriate entry to retrieve the ISR address), but in what context is that code running?
Meaning, if I have a thread currently running and it generates an interrupt of some sort, in which context will the ISR run: in the initial process that "holds" the thread, or in some other magical thread?
Thanks!
Interesting question, which raises a few different issues.
The first is that interrupts don't actually run inside of any thread from the CPU's perspective. Indeed, the CPU itself is barely aware of threads; it may know a bit more if it has hyper-threading or some similar technology, but a thread is really an operating-system thing (or, sometimes, an application thing).
The second is that ISRs (Interrupt Service Routines) generally run at some elevated privilege level; you don’t really say which processor family you’re talking about, so it’s difficult to be specific, but modern processors normally have at least one special mode that they enter for handling interrupts — often with its own register bank. One might also ask, as part of your question, whose page table is active during an interrupt?
Third is the question of whose memory map ISRs have when they are entered. The answer, again, is going to be highly processor specific; it’s possible to imagine architectures that disable paging on ISR entry, other architectures that switch automatically to an interrupt page table, and (probably the most common approach) those that decide not to bother doing anything about the page table when entering an ISR.
The fourth is that some operating systems have policies of their own on these kinds of things. A common approach on modern operating systems is to make ISRs themselves as short as possible, and where any significant work needs to be done, convert the interrupt into some kind of event that can be handled by a kernel thread (or even, potentially, by a user thread). In this kind of system, the code that actually handles an interrupt may well be running in a specific thread, though it probably isn’t actually an interrupt service routine at that point.
Summary:
ISRs themselves don’t really run in the context of any given thread.
ISRs may run with the page table of the interrupted thread (depends on architecture).
ISRs may start with a copy of that thread’s registers (depends on architecture).
In modern systems, ISRs commonly try to schedule an event and then exit quickly. That event might be handled by a specific thread (e.g. for processor exceptions, it’s usually delivered as a signal or Structured Exception or similar to the thread that caused it); or by a pool of threads (e.g. to service I/O in the kernel).
If you’re interested in the specifics for x86 (I guess you are, as you use some Intel specific terms in your question), you need to look at the Intel 64 and IA-32 Architectures Software Developer’s Manual, volume 3B, and you’ll need to look at the operating system documentation. x86 is a very complicated architecture compared to some others — for instance, it can optionally perform a task switch on interrupt delivery (if you put a “task gate” in the IDT), in which case it will certainly have its own set of registers and quite possibly its own page table; even if this feature is used by a given operating system, there is no guarantee that x86 tasks map straightforwardly (or at all) to operating system processes and/or threads.
The most heard advice in embedded programming is "keep your interrupts short".
Now my situation is that I have a very long-running task in my main() loop (writing large blocks of data to an SD card), which can sometimes take 100 ms. So to keep my system responsive, I moved all the other stuff into interrupt handlers.
For example, normally one would handle the incoming UART data in an interrupt, then process the incoming command in the main() loop, and then send back the response. But in my case, the whole processing/handling of the commands also takes place in the interrupts, because my main() loop can be blocked for (relatively) long periods.
The optimal solution would be to switch to an RTOS but I don't have the RAM for it. Are there alternatives for my design where the interrupts can be short?
The traditional approach for this is for interrupts to schedule a deferred procedure call (DPC) and end the interrupt as soon as possible.
Once the interrupt has finished, the list of deferred procedures is walked from most important to least important.
Consider the case where you have your main (lower-priority) action and two interrupts I1 and I2, where I2 is more important than main but less important than I1.
In this case, let's suppose you're running main and I1 fires. I1 schedules a deferred procedure and signals to the hardware that I1 is done. I1's DPC now begins running. Suddenly I2 comes in from the hardware. I2's interrupt preempts I1's DPC, schedules I2's DPC, and signals to the hardware that it's done.
The scheduler then returns to I1's DPC (because it is more important), and when I1's DPC completes, I2's DPC begins (because it is more important than main), and then eventually returns execution to main.
This design allows you to rank the importance of different interrupts, encourages you to keep your interrupts small, and lets you complete DPCs in a prioritized order.
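A hedged sketch of such a DPC mechanism in C; the one-slot-per-priority design and all the names are illustrative, not taken from any particular kernel:

#include <stdint.h>

#define NUM_PRIO 4u                         /* priority 0 = most important  */
typedef void (*dpc_fn)(void);
static volatile dpc_fn dpc_slot[NUM_PRIO];  /* one pending DPC per priority */

/* Called from an ISR: record the deferred work, then end the interrupt. */
void dpc_schedule(uint8_t prio, dpc_fn fn)
{
    dpc_slot[prio] = fn;
}

/* Called once the interrupts have finished: walk the slots from most to
 * least important. A real implementation must guard the read-and-clear
 * against interrupts (e.g. briefly masking IRQs) so a freshly posted DPC
 * is never lost. */
void dpc_dispatch(void)
{
    uint8_t p = 0;
    while (p < NUM_PRIO) {
        dpc_fn fn = dpc_slot[p];
        if (fn != 0) {
            dpc_slot[p] = 0;                /* consume before running       */
            fn();
            p = 0;                          /* rescan: a more important DPC
                                               may have arrived meanwhile   */
        } else {
            p++;
        }
    }
}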
There are 100 different ways to skin this cat, depending on CPU architecture (interrupt nesting & prioritization, software interrupt support, etc.) but let's take a pretty straightforward approach that is relatively simple to understand and free from the race conditions and resource-sharing hazards of a preemptive kernel.
(Disclaimer: my first choice is typically a preemptive real time kernel, many of them can run in extremely resource-constrained systems... SecurityMatt's suggestion is good but if you're not comfortable implementing your own preemptible kernel / task switcher, particularly one that handles asynchronous (interrupt-triggered) preemption, you can get wrapped around the axle pretty quickly. So what I'm proposing below is not as responsive as a preemption-based kernel, but it's much simpler and often adequate).
Create 3 event/work queues:
Q1 is the lowest priority and handles your slow, background SD card writes
Q2 holds requests to process incoming UART packets
Q3 (highest priority) holds UART RX FIFO read requests.
I split up the UART RX FIFO reading and the processing of the read packet so that the FIFO reading is always serviced ahead of the packet processing; maybe you want to keep them together, your choice.
For this to work, you break your large (~100 ms) SD card write process into a bunch of smaller, discrete, run-to-completion steps.
So for example, to write 5 blocks, 20ms each, you write the first block, then enqueue "write next block" to Q1. You go back to your scheduler at the end of each step & scan the queues in priority order, starting with Q3. If Q2 and Q3 are empty, you pull the next event off of Q1 ("write next block"), and run that command for another 20ms before returning and scanning the queues again. If 20ms is not responsive enough, you break up each 20ms block write into a more fine-grained set of steps, continually posting to Q1 the next work step.
Now for the incoming UART stuff: in the UART RX ISR, you simply enqueue a "read UART FIFO" command in Q3 and return from interrupt back into the 20 ms "write block" step that was interrupted. As soon as the CPU finishes the write, it goes back and scans the queues in priority order (worst case, the response will be 20 ms if the block write had just begun at the time of the interrupt). The queue scanner (scheduler) will see that Q3 now has work to do, and it will run that command before going back and scanning again.
The responsiveness in your system, worst case, will be determined by the longest run-to-completion step in the system, regardless of priority. You keep your system very responsive by doing work in small, discrete, run to completion steps.
Note that I have to speak in generalities here. Maybe you want to read the UART RX FIFO in the ISR, put the data into a buffer, and only defer the packet processing, not the actual reading of the FIFO (then you'd only have 2 queues). You have to work this out for yourself. But I hope the approach makes sense.
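Here is a hedged sketch of the cooperative scheduler loop with the three queues. The queue type, its size, and the stubbed SD step are illustrative, and a real version needs a critical section around q_push() whenever it is called from an ISR:

#include <stdint.h>

typedef void (*event_fn)(void);

#define QLEN 8u
typedef struct {
    event_fn buf[QLEN];
    volatile uint8_t head, tail;    /* head: next write, tail: next read */
} queue_t;

static queue_t q1, q2, q3;          /* Q1 lowest .. Q3 highest priority  */

/* Post an event; mask interrupts around this when calling from an ISR. */
static void q_push(queue_t *q, event_fn fn)
{
    q->buf[q->head] = fn;
    q->head = (uint8_t)((q->head + 1u) % QLEN);
}

/* Pop into *out; returns 1 if the queue was non-empty. */
static int q_pop(queue_t *q, event_fn *out)
{
    if (q->head == q->tail) {
        return 0;
    }
    *out = q->buf[q->tail];
    q->tail = (uint8_t)((q->tail + 1u) % QLEN);
    return 1;
}

/* Hypothetical SD step: one ~20 ms run-to-completion block write that
 * re-posts itself to Q1 until the whole multi-block write is finished. */
static void sd_write_next_block(void)
{
    /* write_one_block();                          -- assumed driver call */
    /* if (!write_finished()) q_push(&q1, sd_write_next_block); */
}

/* The "scheduler": scan the queues in priority order, forever. */
void scheduler_loop(void)
{
    event_fn fn;
    for (;;) {
        if (q_pop(&q3, &fn)) { fn(); continue; }   /* UART FIFO reads first  */
        if (q_pop(&q2, &fn)) { fn(); continue; }   /* then packet processing */
        if (q_pop(&q1, &fn)) { fn(); continue; }   /* then SD write steps    */
        /* all queues empty: this is where you could sleep (WFI etc.) */
    }
}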
This event-driven approach with prioritized queues is exactly the approach used by the Quantum Platform (QP) event-driven framework. QP supports an underlying non-preemptive (cooperative) scheduler, such as the one described here, or a preemptive scheduler which runs each time an event is queued (similar to the approach suggested by SecurityMatt). You can see the code/implementation of QP's cooperative scheduler on the QP website.
An alternative solution would be as follows:
Anywhere the FAT library can hold the processor for a long time, you insert a call to a new function which is normally very fast and returns to the caller after a few machine cycles. Such a fast function will not impact the real-time performance of your time-consuming operation, such as reading/writing SD flash. You would insert such a call in any loop that waits for a flash sector to be erased, and also in between every 512 bytes written or 512 bytes read.
The goal of that function is to perform most of the work that you would normally have inside the "while(1)" loop of a typical "main()" on an embedded device. It first increments an integer and performs a fast modulo on the new value, then returns if the modulo is not equal to an arbitrary constant. The code is as follows:
void premption_check(void)
{
    static int fast_modulo = 0;

    // divide the number of calls
    fast_modulo++;
    if ((fast_modulo & 0x003F) != 3) {
        return;
    }

    // the processor continues here only once every 64 calls to premption_check()
    // service the serial port and other background work here (see below)
}
In that slow path, you call the functions that extract RS-232 characters/strings from the serial port interrupts, process any command once complete strings are received, etc.
The binary mask 0x3F used above means that we look only at the 6 least significant bits of the counter. When these 6 bits happen to equal the arbitrary value 3, we go ahead with the calls to functions which may take some microseconds or even milliseconds to execute. You may want to try a smaller or larger binary mask depending on the speed at which you want to service the serial port and other operations. You may even use more than one mask simultaneously to service some operations faster than others.
The FAT library and the SD card should not experience any problem when a sporadic delay happens between two flash erase operations, for example.
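To make the insertion points concrete, a hypothetical busy-wait loop inside the FAT/SD driver would become:

// Hypothetical driver loop; sd_flash_busy() is an assumed status poll.
while (sd_flash_busy()) {
    premption_check();   // give the serial port work a chance to run
}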
The solution given here works even on a microcontroller with only 2K bytes of memory, like many variants of the 8051. As incredible as it may seem, the pinball machines of the 1980s and 1990s had a few KB of RAM and slow processors (around 10 MHz), and they were able to scan a hundred switches (fully debounced), update an X/Y matrix display, produce sound effects, etc. The solutions developed by those engineers can still be used to boost the performance of large systems. Even on the best servers with 64 GB of RAM and many terabytes of disk, I presume every byte counts when a company wants to index billions of web pages.
As no-one has suggested coming at it from this end yet I'll throw it in the hat:
It's possible that sticking the SD card service routine in a low-priority interrupt, maybe throwing in some DMA if you can, would free up your main loop and other interrupts to be more responsive, rather than being stuck in a main() loop waiting a long time for something to finish.
The caveat is that I don't know if the hardware has any way of triggering the interrupt when the SD card is ready for more; you might have to cheat by running a polling timer to check and force the interrupt. I'm not above that sort of thing though, and if you have spare hardware timers and interrupts it can be done with very little overhead.
Resorting to an RTOS for something like this would seem overkill & an admission of failure to me... ;)