How to best convert a legacy polling embedded firmware architecture into an event-driven one?

I've got a family of embedded products running a typical main-loop-based firmware with 150k+ lines of code. A number of complex, timing-critical features are realized by a combination of hardware interrupt handlers, timer polling and protothreads (think co-routines). In fact the protothreads poll as well; they are "only" syntactic sugar to mimic pseudo-parallel scheduling of multiple threads (endless loops). I add bugfixes and extensions to the firmware all the time. There are about 30k devices of about 7 slightly different hardware types and versions out in the field.
For a new product-family member I need to integrate an external FreeRTOS-based project on the new product only, while all older products still need to receive further features and improvements.
In order not to have to port the whole complex legacy firmware to FreeRTOS, with all the risk of breaking perfectly fine products, I plan to let the old firmware run inside a FreeRTOS task. On the older products the firmware shall still run without FreeRTOS. Inside the FreeRTOS task the legacy firmware would consume all available processor time, because its underlying implementation scheme is polling based. But because it uses protothreads (timer and hardware polling based behind the scenes) and polls on a free-running processor counter register, I hope that I can convert the polling into event-driven behaviour.
Here are two examples:
// first example: do something every 100 ms
if (GET_TICK_COUNT() - start > MS(100))
{
    start = GET_TICK_COUNT();
    // do something every 100 ms
}
// second example: wait for hardware event
setup_hardware();
PT_WAIT_UNTIL(hardware_ready(), pt);
// hardware is ready, do something else
So I got the impression that I can convert these two programming patterns (e.g. through macro magic and FreeRTOS functionality) into an underlying event-based scheme.
So now my question: has anybody done such a thing already? Are there patterns and/or best practices to follow?
[Update]
Thanks for the detailed responses. Let me comment on some details: my need is to combine a multithreading-simulation-based legacy firmware (using the co-routine implementation protothreads) with a FreeRTOS-based project composed of a couple of interacting FreeRTOS tasks. The idea is to let the complete old firmware run in its own RTOS task alongside the other, new tasks. I'm aware of RTOS principles and patterns (pre-emption, resource sharing, blocking operations, signals, semaphores, mutexes, mailboxes, task priorities and so forth), and I have planned to base the interaction of the old and new parts exactly on these mechanisms. What I'm asking for is 1) ideas how to convert the legacy firmware (150k+ LOC) in a semi-automated way, so that the busy-waiting / polling schemes I presented above either use the new mechanisms when run inside the RTOS task, or just work the old way when built and run as the current main-loop kind of firmware - a complete rewrite / full port of the legacy code is no option - and 2) more ideas how to teach the old firmware, which is used to having the full CPU at its disposal, to behave nicely inside its new prison of an RTOS task, and not just consume all available CPU cycles (when given the highest priority) or produce large new real-time latencies (when run at less than the highest RTOS priority). For 1) I'm considering build-time macro switches, sketched below.
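A minimal sketch of that idea for 1), assuming a USE_FREERTOS build flag (the flag name and the one-tick yield are placeholders of mine; note that making PT_WAIT_UNTIL really block changes the semantics if several protothreads share one task):
#if defined(USE_FREERTOS)
#include "FreeRTOS.h"
#include "task.h"
// tick source maps to the RTOS tick instead of the free-running counter
#define GET_TICK_COUNT()  xTaskGetTickCount()
#define MS(ms)            pdMS_TO_TICKS(ms)
// instead of busy-spinning, give up the CPU for one tick between polls
#define PT_WAIT_UNTIL(condition, pt) \
    do { while (!(condition)) vTaskDelay(1); } while (0)
#else
// main-loop build: keep the original protothreads and timer macros
#endif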
I guess nobody here has already done such a very special task, so I will just have to do the hard work and solve the described issues one after another.

In an RTOS you create and run tasks. If you are not running more than one task, then there is little advantage in using an RTOS.
I don't use FreeRTOS (but have done), but the following applies to any RTOS and is pseudo-code rather than FreeRTOS-API specific - many details such as task priorities and stack allocation are deliberately missing.
First, in most simple RTOSes, including FreeRTOS, main() is used for hardware initialisation, task creation, and scheduler start:
int main( void )
{
    // Necessary h/w & kernel initialisation
    initHardware() ;
    initKernel() ;

    // Create tasks
    createTask( task1 ) ;
    createTask( task2 ) ;

    // Start scheduling
    schedulerStart() ;

    // schedulerStart should not normally return
    return 0 ;
}
Now let us assume that your first example is implemented in task1. A typical RTOS will have both timer and delay functions. The simplest to use is a delay, and this is suitable when the periodic processing is guaranteed to take less than one OS tick period:
void task1()
{
    // do something every 100 ms
    for(;;)
    {
        delay( 100 ) ; // assuming 1ms tick period
        // something
        ...
    }
}
If the something takes more than 1ms in this case, it will not be executed every 100ms, but every 100ms plus the execution time of something, which may itself be variable or non-deterministic, leading to undesirable timing jitter. In that case you should use a timer:
void task1()
{
    // do something every 100 ms
    TIMER timer = createTimer( 100 ) ; // assuming 1ms tick period
    for(;;)
    {
        timerWait( timer ) ;
        // something
        ...
    }
}
That way something can take up to 100ms and will still be executed accurately and deterministically every 100ms.
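In FreeRTOS specifically, this periodic pattern is usually written with vTaskDelayUntil(), which delays until an absolute wake time rather than for a relative period; a minimal sketch (task creation omitted):
#include "FreeRTOS.h"
#include "task.h"

void task1( void *pvParameters )
{
    TickType_t lastWake = xTaskGetTickCount() ;
    const TickType_t period = pdMS_TO_TICKS( 100 ) ;
    for(;;)
    {
        // Blocks until exactly 100ms after the previous wake time, so the
        // execution time of "something" does not accumulate as drift.
        vTaskDelayUntil( &lastWake, period ) ;
        // something
    }
}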
Now to your second example; that is a little more complex. If nothing useful can happen until the hardware is initialised, then you may as well use your existing pattern in main() before starting the scheduler. However as a generalisation, waiting for something in a different context (task or interrupt) to occur is done using a synchronisation primitive such as a semaphore or task event flag (not all RTOS have task event flags). So in a simple case in main() you might create a semaphore:
createSemaphore( hardware_ready ) ;
Then in the context performing the process that must complete:
// Init hardware
...
// Tell waiting task hardware ready
semaphoreGive( hardware_ready ) ;
Then in some task that will wait for the hardware to be ready:
void task2()
{
    // wait for hardware ready
    semaphoreTake( hardware_ready ) ;

    // do something else
    for(;;)
    {
        // This loop must block if any lower-priority task
        // is to run. Equal-priority tasks may run if round-robin
        // scheduling is implemented.
        ...
    }
}
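For reference, in actual FreeRTOS code the pseudo-code above maps roughly onto the binary-semaphore API; a minimal sketch (error checking omitted):
#include "FreeRTOS.h"
#include "semphr.h"

static SemaphoreHandle_t hardware_ready ;

void create_sync_objects( void )    // called from main() before the scheduler starts
{
    hardware_ready = xSemaphoreCreateBinary() ;
}

void hardware_init_done( void )     // called wherever the initialisation completes
{
    xSemaphoreGive( hardware_ready ) ;   // use xSemaphoreGiveFromISR() from an ISR
}

void task2( void *pvParameters )    // the waiting task
{
    xSemaphoreTake( hardware_ready, portMAX_DELAY ) ;
    // hardware is ready, do something else
}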

You are facing two big gotchas...
Since the old code is implemented using protothreads (coroutines), there is never any asynchronous resource contention between them. If you split these into FreeRTOS tasks, then there will be preemptive task switches; these switches may occur at points the protothreads were not expecting, leaving data or other resources in an inconsistent state.
If you convert any of your protothreads' PT_WAIT calls into real waits in FreeRTOS, the call will really block. But the protothreads assume that other protothreads continue while they're blocked.
So, #1 implies you cannot just convert protothreads to tasks, and #2 implies you must convert protothreads to tasks (if you use FreeRTOS blocking primitives like xEventGroupWaitBits()).
The most straightforward approach will be to put all your protothreads in one task and continue polling within that task, for example:
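A minimal sketch of that single-task arrangement (legacy_scheduler_poll() is a placeholder for one pass over all the existing protothreads):
#include "FreeRTOS.h"
#include "task.h"

extern void legacy_scheduler_poll( void ) ;   // one pass over every protothread

void legacyTask( void *pvParameters )
{
    for(;;)
    {
        // Run every protothread once, exactly as the old main loop did.
        legacy_scheduler_poll() ;
        // Yield for one tick so this task cannot starve lower-priority tasks;
        // the cost is up to one tick of extra polling latency.
        vTaskDelay( 1 ) ;
    }
}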

Related

Difference between non-preemptive, cooperative and rate-monotonic schedulers?

I have read about cooperative schedulers, where the scheduler does not let a higher-priority task run until the lower-priority task blocks itself. So if a task contains no delay, that task will keep the CPU forever; is that correct? I had thought that "non-preemptive" was just another name for "cooperative", but another article confused me by saying that in a non-preemptive system a higher-priority task can interrupt a lower-priority one at the sys tick, just not in the middle between ticks. So what's correct?
Are cooperative and non-preemptive actually the same?
And rate monotonic is one type of preemptive scheduler, right?
Its priority isn't set manually; the scheduler algorithm decides priority based on execution time or deadline. Is that correct?
Is rate monotonic better than a fixed-priority preemptive kernel (the one FreeRTOS uses)?
These terms can never fully cover the range of possibilities that can exist. The truth is that people can write whatever kind of scheduler they like, and then other people try to put what is written into one or more categories.
Pre-emptive implies that an interrupt (e.g. from a clock or peripheral) can cause a task switch to occur, as well as a switch occurring when a scheduling OS function is called (like a delay, or taking or giving a semaphore).
Co-operative means that the task function must either return or else call an OS function to cause a task switch.
Some OSes might have one specific timer interrupt which causes context switches. The ARM SysTick interrupt is suitable for this purpose. Because the tasks themselves don't have to call a scheduling function, this is one kind of pre-emption.
If a scheduler uses a timer to allow multiple tasks of equal priority to share processor time, then one common name for this is a "round-robin scheduler". I have not heard the term "rate monotonic", but I assume it means something very similar.
It sounds like the article you have read describes a very simple pre-emptive scheduler, where tasks do have different priorities, but task switching can only occur when the timer interrupt runs.
Co-operative scheduling is non-preemptive, but "non-preemptive" might describe any scheduler that does not use preemption. It is a rather non-specific term.
The article you describe (without citation), however, seems confused. Context switching on a tick event is preemption if the interrupted task did not explicitly yield. Not everything you read on the Internet is true or authoritative; always check your sources to determine their level of expertise. Enthusiastic amateurs abound.
A fully preemptive priority based scheduler can context switch on "scheduling events" which include not just the timer tick, but also whenever a running thread or interrupt handler triggers an IPC or synchronisation mechanism on which a higher-priority thread than the current thread is waiting.
What you describe as "non-preemptive" I would suggest is in fact a time-triggered preemptive scheduler, where a context switch occurs only on a tick event and not asynchronously on, say, a message queue post or a semaphore give.
A rate-monotonic scheduler does not necessarily determine the priority automatically (in fact I have never come across one that did). Rather the priority is set (manually) according to rate-monotonic analysis of the tasks to be executed. It is "rate-monotonic" in the sense that it supports rate-monotonic scheduling. It is still possible for the system designer to apply entirely inappropriate priorities or partition tasks in such a way that they are insufficiently deterministic for RMS to actually occur.
Most RTOS schedulers support RMS, including FreeRTOS. Most RTOSes also support variable task priority, both as a priority-inversion mitigation and via an API. But to be honest, if your application relies on either, I would argue that it is a failed design.
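To illustrate, "supporting RMS" just means fixed priorities that the designer assigns by period, shortest period highest; a sketch in FreeRTOS terms (the task functions, periods and stack depth are invented for the example):
#include "FreeRTOS.h"
#include "task.h"

void task10ms( void *p ) ;    // 10ms period  - highest rate
void task50ms( void *p ) ;    // 50ms period
void task200ms( void *p ) ;   // 200ms period - lowest rate

void createTasksRMS( void )
{
    // Rate-monotonic assignment: shorter period => higher fixed priority.
    xTaskCreate( task10ms,  "t10",  256, NULL, 3, NULL ) ;
    xTaskCreate( task50ms,  "t50",  256, NULL, 2, NULL ) ;
    xTaskCreate( task200ms, "t200", 256, NULL, 1, NULL ) ;
}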

How to get data from an interrupt into a task-based system?

I am preparing for an embedded systems interview and was given this question as one of the questions to prepare with.
From my research, I learned that interrupts are raised by hardware when there is an important event that the OS should take care of, and that because of this, data cannot be returned by an interrupt.
However, I didn't find any definitive information about how interrupts work with a task-based system. Is this a trick question about interrupts or is there a method to get data from them?
It is true that an interrupt cannot "return" data to a caller, because there is no caller. Interrupts are triggered by asynchronous events independent of normal program flow.
However it is possible for an interrupt to pass data to a thread/task context (or even to other interrupts) via shared memory or inter-process communication (IPC) such as a message queue, pipe or mailbox. Such data exchange must be non-blocking. So, for example, if a message queue is full, the ISR may not wait on the queue to become available - it must baulk and discard the data.
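In FreeRTOS, for example, an ISR posts to a queue with the non-blocking FromISR variant of the API; if the queue is full the send simply fails and the data is dropped (a sketch; read_uart_register() is a placeholder for the hardware access):
#include <stdint.h>
#include "FreeRTOS.h"
#include "queue.h"

static QueueHandle_t rx_queue ;                // created at startup with xQueueCreate()
extern uint8_t read_uart_register( void ) ;    // placeholder hardware read

void uart_rx_isr( void )
{
    BaseType_t woken = pdFALSE ;
    uint8_t byte = read_uart_register() ;
    // Never blocks: returns errQUEUE_FULL and drops the byte if there is no room.
    xQueueSendFromISR( rx_queue, &byte, &woken ) ;
    portYIELD_FROM_ISR( woken ) ;              // switch now if a higher-priority task woke
}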
interrupts are called [...] when there is an important issue that the OS should take care of [...]
It is not about "importance"; it is about timeliness, determinism, meeting real-time deadlines, and dealing with data before buffers or FIFOs are overrun, for example. There need not even be an OS, and interrupts are generally application specific and not an issue for the OS at all.
I didn't find any definitive information about how interrupts work with a task-based system.
Perhaps you need to hone your research technique (or Google Fu). https://www.google.com/search?q=rtos+interrupt+communication
It's not a trick question. The problem is the same whether you have a "task-based system" (RTOS) or not:
Interrupts will happen at random times, relative to your main program. I.e. you could be calculating something like a = b + c and the interrupt could happen at a time where b is loaded into a register, but before c is.
Interrupts execute in a different context from your main program. Depending on the architecture, you could have a separate stack, for example.
Given the above, the interrupt service routines cannot take any parameters and cannot return any values. The way to pass data in and out of an ISR is to use shared memory (e.g. a global variable).
There are several ways to get data from an interrupt:
Using a queue or similar approach (implemented in FreeRTOS). Note: there is a special API for use from ISRs.
Using a global variable. With this approach you need to take care of data consistency yourself, because the interrupt can happen at any time; a sketch follows below.
...
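A minimal sketch of the global-variable approach, with the consistency caveat made explicit (all names invented):
#include <stdint.h>

static volatile uint16_t latest_sample ;        // written only by the ISR
static volatile uint8_t  sample_ready ;         // set by the ISR, cleared by the consumer

extern uint16_t read_adc_result( void ) ;       // placeholder hardware read
extern void process_sample( uint16_t ) ;        // placeholder consumer

void adc_isr( void )
{
    latest_sample = read_adc_result() ;
    sample_ready = 1 ;
}

void consumer_poll( void )                      // called from task/main context
{
    if( sample_ready )
    {
        // On targets where a 16-bit read is not atomic, briefly disable
        // interrupts around this copy to avoid reading a torn value.
        uint16_t local = latest_sample ;
        sample_ready = 0 ;
        process_sample( local ) ;
    }
}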

How to restart a task

How to restart a lower priority task from a higher priority task?
This is a general question about how RTOSes in embedded systems work.
I have multiple tasks with different priorities. The lower priority task has certain steps e.g., step1, step2, step3.
The highest priority task handles system malfunctions. If a malfunction occurs then an ISR in the system will cause the higher priority task to immediately run.
My question ...
If a system malfunction occurs while the lower-priority task is in the middle of its steps, e.g. step2, and we do not want to run the rest of the steps in the lower-priority task, then how do we accomplish that?
My understanding is that when the scheduler is ready to run the lower priority task then it will continue from where it left off prior to system malfunction. So step3 will be executed.
Does embedded RTOS (e.g., Keil RTX or FreeRTOS) provide such a mechanism so that on certain signal from higher priority task/ISR the lower priority task can restart?
The mechanism you are suggesting is unlikely to be supported by an RTOS, since it would be non-deterministic in its behaviour. For example, such an RTOS mechanism would have no knowledge of resource allocation or initialisation within the task, and whether it would be safe to simply "restart", or how to clean up if it were not.
Moreover the RTOS preempts at the machine instruction level, not between logical functional "steps" - there is no determination of where it is in the process.
Any such mechanism must be built into the task's implementation at the application level not the RTOS level in order that the process is controlled and deterministic. For example you might have a structure such as:
for(;;)
{
    step1() ;
    if( restart() )
    {
        clean_up() ;
        continue ;
    }
    step2() ;
    if( restart() )
    {
        clean_up() ;
        continue ;
    }
    step3() ;
}
Where the malfunction requests a restart, and the request is polled through restart() at specific points in the task where a restart is valid or safe. The clean_up() performs any necessary resource management, and the continue causes a jump to the start of the task loop (in a more complex situation, a goto might be used, but this is already probably a bad idea - don't make it worse!).
Fundamentally the point is you have to code the task to handle the malfunction appropriately and the RTOS cannot "know" what is appropriate.
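For example, under FreeRTOS restart() could be a non-blocking poll of a direct-to-task notification (a sketch; worker_handle is assumed to be stored when the task is created):
#include <stdbool.h>
#include "FreeRTOS.h"
#include "task.h"

static TaskHandle_t worker_handle ;

// In the high-priority malfunction handler
// (or vTaskNotifyGiveFromISR() from an ISR):
void request_restart( void )
{
    xTaskNotifyGive( worker_handle ) ;
}

// Polled by the worker at the safe points shown above; never blocks.
static bool restart( void )
{
    return ulTaskNotifyTake( pdTRUE, 0 ) != 0 ;
}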
While there is no generic RTOS mechanism for what you are suggesting, it is possible perhaps to implement a framework to support the required behaviour, but it would require you to write all your tasks to a specific pattern dictated by such a framework - implementing a comprehensive solution that handles resource clean-up in a simple manner however is non-trivial.
QNX Neutrino has a "High Availability Framework" for example that supports process restart and automatic clean-up. It is an example of what can be done, but it is of course specific to that RTOS. If this behaviour is critical to your application, then you need to select your RTOS accordingly rather than rely on "traditional" mechanisms available to any RTOS.

operating system - context switches

I have been confused about the issue of context switches between processes, given a round-robin scheduler with a certain time slice (which is what Unix/Windows both use in a basic sense).
So, suppose we have 200 processes running on a single-core machine. If the scheduler is using even a 1ms time slice, each process would get its share only every 200ms, which is probably not the case (imagine a Java high-frequency app; I would not assume it gets scheduled every 200ms to serve requests). Having said that, what am I missing in the picture?
Furthermore, Java and other languages allow putting the running thread to sleep for e.g. 100ms. Am I correct in saying that this does not cause a context switch, and if so, how is this achieved?
So, suppose we have 200 processes running on a single-core machine. If the scheduler is using even a 1ms time slice, each process would get its share only every 200ms, which is probably not the case (imagine a Java high-frequency app; I would not assume it gets scheduled every 200ms to serve requests). Having said that, what am I missing in the picture?
No, you aren't missing anything, and it's the same in non-pre-emptive systems. Processes with pre-emptive rights (meaning higher priority compared to other processes) can displace less important processes, to the extent that a high-priority process might run, say, ten times as often as the lowest-priority one (the actual ratio depends entirely on the situation and the implementation), up to the point where it starves the lowest-priority process.
For processes of similar priority, it comes down to the round-robin algorithm you mentioned, though which process is picked first is again based on the implementation. Windows and classic Unix both use round-robin scheduling within a priority level; the Linux task scheduler, however, is the Completely Fair Scheduler (CFS).
Furthermore, Java and other languages allow putting the running thread to sleep for e.g. 100ms. Am I correct in saying that this does not cause a context switch, and if so, how is this achieved?
Programming languages and libraries implement "sleep" functionality with the aid of the kernel. Without kernel-level support, they'd have to busy-wait, spinning in a tight loop, until the requested sleep duration elapsed. This would wastefully consume the processor.
For threads that are put to sleep (Thread.sleep(long millis)), most systems generally do the following:
Suspend execution of the process and mark it as not runnable.
Set a timer for the given wait time. Systems provide hardware timers that let the kernel register to receive an interrupt at a given point in the future.
When the timer hits, mark the process as runnable.
I hope you are aware of threading models such as one-to-one, many-to-one, and many-to-many. I am not getting into much detail here, just a reference for yourself.
It might appear to you as if this increases the overhead/complexity, but that is how threads (user threads created in the JVM) are operated upon, and the selection is then based upon those threading models which I mentioned above. Check this Quora question and the answers to it, and please go through the best answer, given by Robert Love.
For further reading, I'd suggest the Scheduling Algorithms explanation on OSDev.org and the Operating System Concepts book by Galvin, Gagne and Silberschatz.

How to keep interrupts short?

The most heard advice in embedded programming is "keep your interrupts short".
Now my situation is that I have a very long running task in my main() loop (writing large blocks of data to SDcard), which can sometimes take 100ms. So to keep my system responsive I moved all other stuff to interrupt-handlers.
For example, normally one would handle the incoming UART data in an interrupt, then process the incoming command in the main() loop, and then send back the response. But in my case, the whole processing/handling of the commands also takes place in the interrupts, because my main() loop can be blocked for (relatively) long periods.
The optimal solution would be to switch to an RTOS but I don't have the RAM for it. Are there alternatives for my design where the interrupts can be short?
The traditional approach for this is for interrupts to schedule a deferred procedure and end the interrupt as soon as possible.
Once the interrupt has finished, the list of deferred procedures is walked from most important to least important.
Consider the case where you have your main (lower-priority) action and two interrupts I1 and I2, where I2 is more important than main but less important than I1.
In this case, let's suppose you're running main and I1 fires. I1 schedules a deferred procedure and signals to the hardware that I1 is done. I1's DPC now begins running. Suddenly I2 comes in from the hardware. I2's interrupt takes over from I1's DPC, schedules I2's DPC and signals to the hardware that it's done.
The scheduler then returns to I1's DPC (because it is more important), and when I1's DPC completes, I2's DPC begins (because it is more important than main), and then execution eventually returns to main.
This design allows you to rank the importance of different interrupts, encourages you to keep your interrupts small, and lets you complete DPCs in an orderly, prioritized way.
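A minimal sketch of the deferred-procedure idea (all names invented; a real implementation needs atomic queue operations and, typically, more than one pending slot per priority):
typedef void (*dpc_fn)( void ) ;

#define NUM_PRIORITIES 4
static volatile dpc_fn pending[NUM_PRIORITIES] ;   // slot 0 = most important

// Called from an ISR: record the deferred work, then return from interrupt quickly.
void dpc_schedule( unsigned priority, dpc_fn fn )
{
    pending[priority] = fn ;
}

// Background loop: always run the most important pending DPC first.
void dpc_dispatch_loop( void )
{
    for(;;)
    {
        for( unsigned i = 0 ; i < NUM_PRIORITIES ; ++i )
        {
            dpc_fn fn = pending[i] ;
            if( fn != 0 )
            {
                pending[i] = 0 ;
                fn() ;     // runs with interrupts enabled, so new DPCs can be scheduled
                break ;    // rescan from the top so higher priorities always win
            }
        }
    }
}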
There are 100 different ways to skin this cat, depending on CPU architecture (interrupt nesting & prioritization, software interrupt support, etc.) but let's take a pretty straightforward approach that is relatively simple to understand and free from the race conditions and resource-sharing hazards of a preemptive kernel.
(Disclaimer: my first choice is typically a preemptive real time kernel, many of them can run in extremely resource-constrained systems... SecurityMatt's suggestion is good but if you're not comfortable implementing your own preemptible kernel / task switcher, particularly one that handles asynchronous (interrupt-triggered) preemption, you can get wrapped around the axle pretty quickly. So what I'm proposing below is not as responsive as a preemption-based kernel, but it's much simpler and often adequate).
Create 3 event/work queues:
Q1 is the lowest priority and handles your slow, background SD card writes
Q2 holds requests to process incoming UART packets
Q3 (highest priority) holds UART RX FIFO read requests.
I split up the UART RX FIFO reading and the processing of the read packet so that the FIFO reading is always serviced ahead of the packet processing; maybe you want to keep them together, your choice.
For this to work, you break your large (~100ms) SD card write process into a bunch of smaller, discrete, run-to-completion steps.
So for example, to write 5 blocks, 20ms each, you write the first block, then enqueue "write next block" to Q1. You go back to your scheduler at the end of each step & scan the queues in priority order, starting with Q3. If Q2 and Q3 are empty, you pull the next event off of Q1 ("write next block"), and run that command for another 20ms before returning and scanning the queues again. If 20ms is not responsive enough, you break up each 20ms block write into a more fine-grained set of steps, continually posting to Q1 the next work step.
Now for the incoming UART stuff: in the UART RX ISR, you simply enqueue a "read UART FIFO" command in Q3 and return from interrupt back into the 20ms "write block" step that was interrupted. As soon as the CPU finishes the write, it goes back and scans the queues in priority order (the worst-case response will be 20ms, if the block write had only just begun at the time of the interrupt). The queue scanner (scheduler) will see that Q3 now has work to do, and it will run that command before going back and scanning again.
The responsiveness of your system, worst case, will be determined by the longest run-to-completion step in the system, regardless of priority. You keep your system very responsive by doing work in small, discrete, run-to-completion steps.
Note that I have to speak in generalities here. Maybe you want to read the UART RX FIFO in the ISR, put the data into a buffer, and only defer the packet processing, not the actual reading of the FIFO (then you'd only have 2 queues). You have to work this out for yourself. But I hope the approach makes sense.
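A sketch of the queue scanner described above (the queue type, event type and handler names are all invented; queue_pop() is assumed to return false when the queue is empty):
#include <stdbool.h>

typedef struct { int type ; } event_t ;        // payload as needed
typedef struct queue queue_t ;                 // opaque FIFO, implemented elsewhere

extern queue_t q1, q2, q3 ;                    // Q1 lowest .. Q3 highest priority
bool queue_pop( queue_t *q, event_t *e ) ;     // returns false when empty

void handle_fifo_read( event_t *e ) ;          // UART RX FIFO reads
void handle_packet( event_t *e ) ;             // packet processing
void write_next_sd_block( event_t *e ) ;       // one SD step; re-enqueues itself until done

void scheduler_loop( void )
{
    event_t e ;
    for(;;)
    {
        // Scan in priority order; each handler is one short run-to-completion step.
        if( queue_pop( &q3, &e ) )        handle_fifo_read( &e ) ;
        else if( queue_pop( &q2, &e ) )   handle_packet( &e ) ;
        else if( queue_pop( &q1, &e ) )   write_next_sd_block( &e ) ;
        // else: idle; optionally sleep until the next interrupt
    }
}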
This event-driven approach with prioritized queues is exactly the approach used by the Quantum Platform (QP) event-driven framework. QP actually supports an underlying non-preemptive (cooperative) scheduler, such as what was described here, or a preemptive scheduler which runs the scheduler each time an event is queued (similar to the approach suggested by SecurityMatt). You can see the code/implementation of QP's cooperative scheduler at the QP website.
An alternative solution would be as follows:
Anywhere the FAT library can capture the processor for a long time, you insert a call to a new function which is normally very fast and returns to the caller after a few machine cycles. Such a fast function would not impact the real-time performance of your time-consuming operation, such as reading/writing to SD flash. You would insert such a call in any loop that waits for a flash sector to be erased. You would also insert a call to such a function between every 512 bytes written or 512 bytes read.
The goal of that function is to perform most of the work that you would normally have inside the "while(1)" loop of a typical "main()" for an embedded device. It would first increment an integer and perform a fast modulo on the new value, then return if the modulo is not equal to an arbitrary constant. The code is as follows:
void preemption_check(void)
{
    static int fast_modulo = 0;

    // divide down the number of calls
    fast_modulo++;
    if( (fast_modulo & 0x003F) != 3 )
    {
        return;
    }

    // the processor continues here only once every 64 calls to
    // preemption_check(); the servicing calls described below go here
}
In that once-every-64-calls body, you call the functions that extract RS232 characters/strings from the serial port interrupts, process any command once complete strings are received, and so on.
The binary mask 0x3F used above means that we look only at the 6 least significant bits of the counter. When these 6 bits happen to be equal to the arbitrary value 3, we go ahead with the calls to functions which may take some microseconds or even milliseconds to execute. You may want to try a smaller or larger binary mask depending on the speed at which you want to service the serial port and other operations. You may even use more than one mask simultaneously to service some operations faster than others.
The FAT library and the SD card should not experience any problem when a sporadic delay happens between two flash erase operations, for example.
The solution given here works even on a microcontroller with only 2K bytes of RAM, like many variants of the 8051. As incredible as it may seem, the pinball machines of the 1980s and 1990s had a few KB of RAM and slow processors (around 10 MHz), yet they were able to scan a hundred switches (fully debounced), update an X/Y matrix display, produce sound effects, and so on. The techniques those engineers developed can still be used to boost the performance of large systems; even on the best servers with 64 GB of RAM and many terabytes of disk, I presume every byte counts when a company wants to index billions of web pages.
As no-one has suggested coming at it from this end yet, I'll throw it into the hat:
It's possible that sticking the SD card service routine in a low-priority interrupt, maybe throwing in some DMA if you can, would free up your main loop and other interrupts to be more responsive, rather than being stuck in a main() loop waiting a long time for something to finish.
The caveat to this is I don't know if the hardware has any way of triggering the interrupt when the SD card is ready for more, you might have to cheat by running a polling timer to check & force the interrupt. I'm not above that sort of thing though, if you have spare hardware timers & interrupts it can be done with very little overhead.
Resorting to an RTOS for something like this would seem overkill & an admission of failure to me... ;)