Strategy for feeding a watchdog in a multitask environment - embedded

Having moved some embedded code to FreeRTOS, I'm left with an interesting dilemma about the watchdog. The watchdog timer is a must for our application. Using FreeRTOS has been a huge boon for us too. When the application was more single-tasked, it fed the watchdog at timely points in its logic flow so that we could make sure the task was making logical progress in a timely fashion.
With multiple tasks though, that's not easy. One task could be bound up for some reason, not making progress, but another is doing just fine and making enough progress to keep the watchdog fed happily.
One thought was to launch a separate task solely to feed the watchdog, and then use some counters that the other tasks increment regularly. When the watchdog task ticks, it would check that every counter shows progress being made on the corresponding task and, if so, go ahead and feed the watchdog.
I'm curious what others have done in situations like this?
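For reference, the counter idea could be sketched roughly as follows. This is only a minimal illustration in plain C: NUM_TASKS, task_checkpoint() and all_tasks_progressed() are hypothetical names, and the actual watchdog feed is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_TASKS 3  /* hypothetical number of monitored tasks */

/* Each monitored task increments its own slot at a known-good point in its loop. */
static volatile uint32_t progress[NUM_TASKS];
/* Snapshot taken by the watchdog task the last time it looked. */
static uint32_t last_seen[NUM_TASKS];

/* Called by a monitored task to show that it is making progress. */
void task_checkpoint(size_t task_index)
{
    assert(task_index < NUM_TASKS);
    progress[task_index]++;
}

/* Called by the watchdog task on every tick of its own.
 * Returns true only if every counter has moved since the previous call. */
bool all_tasks_progressed(void)
{
    bool ok = true;
    for (size_t i = 0; i < NUM_TASKS; i++) {
        uint32_t now = progress[i];
        if (now == last_seen[i]) {
            ok = false;           /* this task did not check in */
        }
        last_seen[i] = now;       /* remember the value for the next round */
    }
    return ok;
}
```

The watchdog task would feed the hardware watchdog only when all_tasks_progressed() returns true. Each counter has a single writer, so the sketch assumes that a 32-bit read is atomic on the target.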

A watchdog task that monitors the status of all the other tasks is a good solution. But instead of a counter, consider using a status flag for each task. The status flag should have three possible values: UNKNOWN, ALIVE, and ASLEEP. When a periodic task runs, it sets the flag to ALIVE. Tasks that block on an asynchronous event should set their flag to ASLEEP before they block and to ALIVE when they run. When the watchdog monitor task runs, it should kick the watchdog if every task is either ALIVE or ASLEEP. Then the watchdog monitor task should set all of the ALIVE flags to UNKNOWN. (ASLEEP flags should remain ASLEEP.) The tasks with the UNKNOWN flag must run and set their flags to ALIVE or ASLEEP again before the monitor task will kick the watchdog again.
See the "Multitasking" section of this article for more details: http://www.embedded.com/design/debug-and-optimization/4402288/Watchdog-Timers
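A minimal sketch of this flag scheme in plain C is shown here. The names task_report() and monitor_check() and the fixed NUM_TASKS are assumptions made for illustration, not taken from the article.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { TASK_UNKNOWN = 0, TASK_ALIVE, TASK_ASLEEP } task_status_t;

#define NUM_TASKS 3  /* hypothetical number of monitored tasks */

static volatile task_status_t status[NUM_TASKS];  /* all start as UNKNOWN */

/* Called by a task: ALIVE when it runs, ASLEEP just before it blocks. */
void task_report(size_t i, task_status_t s)
{
    assert(i < NUM_TASKS);
    status[i] = s;
}

/* Called periodically by the monitor task.
 * Returns true if the hardware watchdog should be kicked.
 * A real system would need a critical section around this function so that
 * a task cannot change its flag between the check and the reset. */
bool monitor_check(void)
{
    for (size_t i = 0; i < NUM_TASKS; i++) {
        if (status[i] == TASK_UNKNOWN) {
            return false;             /* someone has not reported: no kick */
        }
    }
    for (size_t i = 0; i < NUM_TASKS; i++) {
        if (status[i] == TASK_ALIVE) {
            status[i] = TASK_UNKNOWN; /* must report again; ASLEEP stays ASLEEP */
        }
    }
    return true;
}
```

The monitor task would call monitor_check() once per period and kick the watchdog only on a true result.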

This is indeed a big pain with watchdog timers.
My boards have an LED on a GPIO line, so I flash that in a while/sleep loop (750 ms on, 250 ms off) in a next-to-lowest priority thread (the lowest is the idle thread, which just goes into low-power mode in a loop). I have put a wdog feed in the LED-flash thread.
This helps with complete crashes and higher-priority threads that spin on the CPU, but doesn't help if the system deadlocks. Luckily, my message-passing designs do not deadlock (well, not often, anyway :).

Do not forget to handle the possible situation where tasks are deleted, or dormant for longer periods of time. If those tasks were previously checked in with a watchdog task, they also need a 'check out' mechanism.
In other words, the list of tasks for which a watchdog task is responsible should be dynamic, and it should be organized so that some wild code cannot easily delete a task from the list.
I know, easier said than done...
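A dynamic list with check-in and check-out could look roughly like the sketch below. It is plain C and uses a bitmask with at most 32 task slots. All names are hypothetical, and it does not attempt the "protect the list from wild code" part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WD_MAX_TASKS 32u  /* hypothetical limit: one bit per task */

static uint32_t registered;  /* bit n set: task n is currently monitored */
static uint32_t checked_in;  /* bit n set: task n reported since the last check */

/* Adds a task to the monitored set. A new task starts as "checked in",
 * so it does not trip the watchdog before it has had a chance to run. */
bool wd_register(unsigned id)
{
    if (id >= WD_MAX_TASKS) {
        return false;
    }
    registered |= UINT32_C(1) << id;
    checked_in |= UINT32_C(1) << id;
    return true;
}

/* "Check out": the task is deleted or will be dormant for a long time. */
void wd_unregister(unsigned id)
{
    assert(id < WD_MAX_TASKS);
    registered &= ~(UINT32_C(1) << id);
    checked_in &= ~(UINT32_C(1) << id);
}

/* Called by a monitored task; ignored if the task is not registered. */
void wd_checkin(unsigned id)
{
    assert(id < WD_MAX_TASKS);
    checked_in |= (UINT32_C(1) << id) & registered;
}

/* Called by the watchdog task; true means it is safe to feed the watchdog. */
bool wd_all_checked_in(void)
{
    bool ok = (checked_in & registered) == registered;
    checked_in = 0;  /* everyone has to report again before the next check */
    return ok;
}
```

Note that with no task registered, wd_all_checked_in() returns true, which may or may not be what a given application wants.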

I've designed a solution using FreeRTOS software timers:
A SystemSupervisor SW timer feeds the HW WD. A FreeRTOS failure causes a reset.
Each task creates "its own" SW timer with a SystemReset callback function.
Each task is responsible for "manually" reloading its timer before it expires.
The SystemReset function saves data before committing suicide.
Here is a pseudo-code listing:
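//---------------------------------
//
// System WD
//
// HW_WD_Init(), HW_WD_Feed() and HW_WD_INTERVAL are board specific
// and defined elsewhere.
#include "FreeRTOS.h"
#include "task.h"
#include "timers.h"

static void SYS_WD_Feed(TimerHandle_t xTimer);

void WD_init(void)
{
    HW_WD_Init();
    // Read saved failure data, send to monitor
    // Create and start the monitor timer
    TimerHandle_t sysWd = xTimerCreate("System WD",        // Name
                                       HW_WD_INTERVAL / 2, // Reload value
                                       pdTRUE,             // Auto reload
                                       NULL,               // Timer ID (data per timer)
                                       SYS_WD_Feed);
    if (sysWd != NULL) {
        xTimerStart(sysWd, 0);
    }
}

// Runs in the timer service task, so the HW WD is fed only while FreeRTOS is alive
static void SYS_WD_Feed(TimerHandle_t xTimer)
{
    (void)xTimer;
    HW_WD_Feed();
}
//-------------------------
// Tasks WD
//
typedef TimerHandle_t WD_Handler;

static void Task_WD_Reset(TimerHandle_t xTimer);

WD_Handler WD_Create(void)
{
    return xTimerCreate("",                                  // Name
                        100,                                 // Dummy reload value
                        pdFALSE,                             // No auto reload (one-shot)
                        (void *)xTaskGetCurrentTaskHandle(), // Timer ID: the calling task
                        Task_WD_Reset);
}

// Called when a task failed to feed its timer in time
static void Task_WD_Reset(TimerHandle_t xTimer)
{
    TaskHandle_t th = (TaskHandle_t)pvTimerGetTimerID(xTimer);
    (void)th;
    // Save task name and status
    // Reset
}

void Task_WD_Feed(WD_Handler wd, uint32_t ms)
{
    // Changing the period also (re)starts the timer
    xTimerChangePeriod(wd, pdMS_TO_TICKS(ms), 100);
}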

FreeRTOS stuck in osDelay

I'm working on a project using a STM32F446 with a boilerplate created with STM32CubeMX (for peripherals initialization and middleware like the FreeRTOS with the CMSIS-V1 interface).
I have two threads which communicate using mailboxes, but I encountered a problem. The body of one of the threads is:
void StartDispatcherTask(void const * argument)
{
    mailCommand *commandData = NULL;
    mailCommandResponse *commandResponse = NULL;
    osEvent event;
    for(;;)
    {
        event = osMailGet(commandMailHandle, osWaitForever);
        commandData = (mailCommand *)event.value.p;
        // Here is the problem
        osDelay(5000);
    }
}
It gets to the delay but never gets out. Is there a problem with using the mailbox and the delay in the same thread? I also tried moving the delay before the for(;;) and it works.
EDIT: I guess I can try to add more detail to the problem. The first thread sends a mail of a certain type and then waits for a mail of another type. The thread in which I get the problem receives the mail of the first type, executes some code based on what it receives, and then sends the result as a mail of the second type. Sometimes it has to wait using osDelay, and there it stops working, but without going into any fault handler.
I would rather use the standard FreeRTOS API. The ARM CMSIS wrapper is rubbish.
BTW, I rather suspect osMailGet(commandMailHandle, osWaitForever);
The delay is not needed at all in this case. If you wait for the data in the BLOCKED state, the task does not consume any processing power.
My other guesses are:
You are landing in the HardFault handler (HF)
You are stuck in the context switch (wrong interrupt priorities)
Use your debugger and see what is going on.
osStatus osDelay (uint32_t millisec)
The millisec value specifies the number of timer ticks.
The exact time delay depends on the actual time elapsed since the last timer tick.
For a value of 1, the system waits until the next timer tick occurs.
=> You have to check whether the timer tick is actually running.
As P__J__ pointed out in an earlier answer, you shouldn't use the osDelay() call in the loop [1], because your task loop will wait at the osMailGet() call for the next request/mail until it arrives anyhow.
But this hint drew my attention to another possible reason for your observation, so I'm opening this new answer [2]:
As the loop execution is interrupted by a delay of 5000 ticks, could it be that the producer of the mails is filling the mailbox faster than the task is consuming mails? Then you should inspect whether this situation is detected/handled in the producer context.
If the producer ignores "queue full" return values and discards the mails before they have been transmitted, the system will only process a few mails every 5000 ticks (or it may lose all but a few mails after the first fill of the mailbox, if the producer in your example only fills the mailbox queue once).
This could look like the consumer task being stuck, even if the main problem is in the producer context (task/ISR).
[1] The osDelay() call can only help you if you want to avoid processing another mail within 5000 ticks when request mails are produced faster than the task processes them.
But then you'd have a different problem, and you should open a different question...
[2] Edit: I just noticed that Clifford already mentioned this option in one of his comments to the question. I think this option must be covered by an answer.
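To make the "queue full" scenario concrete, here is a small stand-alone model in plain C. It does not use the CMSIS-RTOS API. mailbox_t, MAILBOX_SLOTS and the function names are hypothetical, and the real mail queue reports a full pool through the return values of its own calls instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAILBOX_SLOTS 4  /* hypothetical mailbox capacity */

typedef struct {
    int items[MAILBOX_SLOTS];
    size_t head;   /* index of the oldest mail */
    size_t count;  /* number of mails currently queued */
} mailbox_t;

/* Returns false when the mailbox is full; the caller must handle that. */
bool mailbox_put(mailbox_t *mb, int item)
{
    assert(mb != NULL);
    if (mb->count == MAILBOX_SLOTS) {
        return false;
    }
    mb->items[(mb->head + mb->count) % MAILBOX_SLOTS] = item;
    mb->count++;
    return true;
}

/* Returns false when the mailbox is empty. */
bool mailbox_get(mailbox_t *mb, int *out)
{
    assert(mb != NULL && out != NULL);
    if (mb->count == 0) {
        return false;
    }
    *out = mb->items[mb->head];
    mb->head = (mb->head + 1) % MAILBOX_SLOTS;
    mb->count--;
    return true;
}

/* A producer that sends n mails in a burst while the consumer is delayed.
 * Returns how many mails could not be queued. A producer that ignores the
 * result of mailbox_put() would lose these mails silently. */
size_t produce_burst(mailbox_t *mb, int first, size_t n)
{
    size_t dropped = 0;
    for (size_t i = 0; i < n; i++) {
        if (!mailbox_put(mb, first + (int)i)) {
            dropped++;
        }
    }
    return dropped;
}
```

With four slots and a burst of six mails, two mails are dropped unless the producer retries or waits, which from the outside can look like a stuck consumer.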

What's the difference between Blocked and Busy Waiting?

I know how busy waiting is implemented. It's an infinite loop like this:
//main thread
while (true) {
    msg = msgQueue.next();
    msg.runnable.run();
}
//....msg queue
public Message next() {
    while (true) {
        if (!queue.isEmpty()) {
            return queue.dequeue();
        }
    }
}
So the method "next()" only looks blocked; actually it runs all the time.
The book calls this "busy waiting".
So what is a "blocked process"? How is it implemented?
Is it an infinite loop too, or something else, like a signal mechanism?
For instance:
cat xxx | grep "abc"
The process "cat" reads a file and outputs its contents.
The process "grep" waits for input from "cat".
So before "cat" outputs data, "grep" should be blocked, waiting for input before it goes on.
What are the details of this "blocked" state? Is it an infinite loop reading the input stream all the time, or does the process really stop running and wait for a signal to wake it up?
The difference is basically in what happens to the process:
1. Busy Waiting
A process that is busy waiting is essentially continuously running, asking "Are we there yet? Are we there yet? How about now, are we there yet?" which consumes 100% of CPU cycles with this question:
bool are_we_there = false;
while(!are_we_there)
{
    // ask if we're there (without blocking)
    are_we_there = ask_if_we_are_there();
}
2. A process that is blocked (or that blocks)
A process that is blocked is suspended by the operating system and will be automatically notified when the data that it is waiting on becomes available. This cannot be accomplished without assistance from the operating system.
An example is a process that is waiting for a long-running I/O operation, or waiting for a timer to expire:
// use a system call to create a waitable timer
var timer = CreateWaitableTimer()
// use another system call that waits on a waitable object
WaitFor(timer); // this will block the current thread until the timer is signaled
// .. some time in the future, the timer might expire and its object will be signaled
// causing the WaitFor(timer) call to resume operation
UPDATE
Waitable objects may be implemented in different ways at the operating system level, but generally it's probably going to be a combination of hardware timers, interrupts and lists of waitable objects that are registered with the operating system by client code. When an interrupt occurs, the operating system's interrupt handler is called, which in turn scans through any waitable objects associated with that event and invokes the relevant callbacks, which eventually signal the waitable objects (put them in a signaled state). This is an over-simplification, but if you'd like to learn more you could read up on interrupts and hardware timers.
When you say "a process is blocked" you actually mean "a thread is blocked", because threads are the only schedulable entities that get CPU time. When a thread is busy waiting, it wastes CPU time in a loop. When a thread is blocked, the kernel code inside the system call sees that the data or lock is not immediately available, so it marks the thread as waiting. It then jumps to the scheduler, which picks another thread that is ready for execution. Such code in a blocking system call might look like this:
100: if (data_available()) {
101: return;
102: } else {
103: jump_to_scheduler();
104: }
Later on the thread is rescheduled and restarts at line 100 but it immediately gets to the else branch and gets off the CPU again. When data becomes available, the system call finally returns.
Don't take this verbatim, it's my guess based on what I know about operating systems, but you should get the idea.
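The cat | grep case from the question can be reproduced in miniature with a POSIX pipe. The sketch below (plain C, assuming a POSIX system; the function name read_from_slow_writer() is made up) forks a child that plays the role of "cat" and writes after one second, while the parent plays "grep" and simply calls read(). The parent does not loop at all; the kernel keeps it off the CPU until data arrives.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reads from a pipe whose writer is slow.
 * Returns the number of bytes read into buf, or -1 on error. */
ssize_t read_from_slow_writer(char *buf, size_t cap)
{
    assert(buf != NULL && cap > 0);

    int fds[2];
    if (pipe(fds) != 0) {
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                 /* child: the "cat" side */
        close(fds[0]);
        sleep(1);                   /* pretend the data takes a while */
        const char msg[] = "abc";
        if (write(fds[1], msg, sizeof msg - 1) < 0) {
            _exit(1);
        }
        close(fds[1]);
        _exit(0);
    }

    /* parent: the "grep" side */
    close(fds[1]);
    ssize_t n = read(fds[0], buf, cap);  /* blocks here until data or EOF */
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n;
}
```

While the parent sits in read(), a tool such as top should show it using essentially no CPU, in contrast to the busy-waiting next() loop from the question.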

Why do blocking functions not use 100% CPU?

Sure, the while loop over the function call blocks inside your app's scope, but something outside still has to be looping right? Does it finally lead up to some hardware blocking event? How else can the CPU not be pegged at 100%?
Remember that the operating system is in charge of the CPU. Your code only gets to run when the operating system calls it.
If you ask the operating system to wait for something, the operating system won't call your code until that thing happens.
Imagine the operating system scheduler as a loop like this:
while(true)
{
    for(Process *p : all_processes)
    {
        RunSomeCodeInProcess(p);
    }
}
This would always use 100% CPU, even if your process wasn't running. But actually, the loop is more like this: (still simplified)
while(true)
{
    // assume everything is blocked until we find a process that is not
    bool all_processes_blocked = true;
    for(Process *p : all_processes)
    {
        if(!IsProcessBlocked(p))
        {
            all_processes_blocked = false;
            RunSomeCodeInProcess(p);
        }
    }
    if (all_processes_blocked)
    {
        StopCPU();
    }
}
The OS will not bother running processes that are blocked. It will skip over your process and only run other processes. If all processes are blocked (note: this is normal) then the OS will stop the CPU. When the CPU is stopped, it uses way less power, creates way less heat, and it doesn't execute instructions. That means StopCPU won't return.
... until the CPU gets an interrupt from some hardware device, like a mouse saying it got moved. Then the CPU automatically starts up again and runs the interrupt handler. When the interrupt handler returns, it goes back to StopCPU, so StopCPU returns and the OS checks for unblocked processes again. The hardware interrupt probably unblocked one of the processes. For example, if the interrupt was because the computer got a network packet, then now the process that was waiting for the packet is unblocked. If it was because the user pressed a key on the keyboard, then the process that was waiting for the key is unblocked, and so on.
So there are two main advantages to using blocking I/O instead of polling:
1. You don't waste CPU time that other processes could get.
2. If all processes are blocked (this is most of the time!) the CPU can save power and heat.
This is also how sleep works. There's a hardware timer that counts down and then sends an interrupt. When you do sleep(1), the OS sets the timer to one second, then blocks the process. When the interrupt comes in, it unblocks the process.
There's only one timer, but if more than one process is sleeping, the OS sets the timer to the one that wakes up first, and then when the interrupt comes in, it unblocks the first process and sets the timer for the next one. This technique is called a "timer queue".
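The timer queue idea can be sketched in a few lines of plain C. This is a toy model, not how any particular kernel does it: timer_queue_t, tq_sleep() and tq_on_timer_interrupt() are hypothetical names, and the "hardware timer" is just a field that records the deadline it would be programmed with.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SLEEPERS 8  /* hypothetical limit on sleeping processes */

typedef struct {
    int pid;
    uint64_t wake_at;   /* absolute time at which the process wakes up */
} sleeper_t;

typedef struct {
    sleeper_t entries[MAX_SLEEPERS];  /* kept sorted, earliest wake-up first */
    size_t count;
    uint64_t hw_timer_deadline;       /* what the single hardware timer is set to */
    bool hw_timer_armed;
} timer_queue_t;

/* The one hardware timer always tracks the earliest sleeper. */
static void program_hw_timer(timer_queue_t *q)
{
    q->hw_timer_armed = q->count > 0;
    if (q->hw_timer_armed) {
        q->hw_timer_deadline = q->entries[0].wake_at;
    }
}

/* sleep(): insert the process in wake-up order and block it. */
bool tq_sleep(timer_queue_t *q, int pid, uint64_t wake_at)
{
    assert(q != NULL);
    if (q->count == MAX_SLEEPERS) {
        return false;
    }
    size_t i = q->count;
    while (i > 0 && q->entries[i - 1].wake_at > wake_at) {
        q->entries[i] = q->entries[i - 1];  /* shift later sleepers back */
        i--;
    }
    q->entries[i] = (sleeper_t){ pid, wake_at };
    q->count++;
    program_hw_timer(q);
    return true;
}

/* Timer interrupt: unblock the earliest sleeper and re-arm for the next one.
 * Returns the pid that was unblocked, or -1 if nobody was sleeping. */
int tq_on_timer_interrupt(timer_queue_t *q)
{
    assert(q != NULL);
    if (q->count == 0) {
        return -1;
    }
    int pid = q->entries[0].pid;
    for (size_t i = 1; i < q->count; i++) {
        q->entries[i - 1] = q->entries[i];
    }
    q->count--;
    program_hw_timer(q);
    return pid;
}
```

If process 1 sleeps until time 500 and process 2 until time 200, the timer is set to 200 first, and only after that interrupt is it reprogrammed to 500.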

VxWorks signals

I have a question regarding a previous question asked in the VxWorks forum.
My goal is that when the high priority function generates a signal, the low priority function handles it immediately (the high priority function must be preempted).
The code is:
sig_hdr () { ... }
task_low_priority() {
    ...
    // Install signal handler for SIGUSR1
    signal(SIGUSR1, sig_hdr);
    ...
}
task_high_priority() {
    ...
    kill(pid, SIGUSR1); //pid is the ID of task_low_priority
    ...
}
After the line:
signal(SIGUSR1, sig_hdr);
I added
taskDelay(0).
I wanted to block the high priority task so the low priority task can gain the CPU in order to execute the signal handler, but it does not happen unless I do taskDelay(1).
Can anyone explain why it does not work with taskDelay(0)?
Indeed, taskDelay(0) will not let lower priority tasks run because of the following:
1. The high priority task is executing.
2. The high priority task issues taskDelay(0).
3. The scheduler is invoked and scans for the next task to run; it selects the highest priority task that is "ready".
4. The task that issued the taskDelay(0) is ready because the delay has expired (i.e. 0 ticks have elapsed).
So the high priority task is rescheduled immediately; in this case taskDelay(0) is effectively a waste of CPU cycles.
Now in the case where you issue taskDelay(1), the same steps are followed, but the difference is that the high priority task isn't in the ready state because one tick has not elapsed, so a lower priority task that is ready can have up to 1 tick of CPU time before it is preempted by the high priority task.
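The steps above can be mimicked with a tiny scheduler model in plain C. This is not VxWorks code: sim_task_t, sim_task_delay() and sim_pick_next() are invented for the illustration, and time is counted in whole ticks only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int priority;       /* VxWorks convention: 0 is highest, 255 is lowest */
    uint32_t ready_at;  /* first tick at which the task is ready again */
} sim_task_t;

/* Models taskDelay(): the task is not ready until `ticks` more ticks have passed. */
void sim_task_delay(sim_task_t *t, uint32_t now, uint32_t ticks)
{
    assert(t != NULL);
    t->ready_at = now + ticks;
}

/* Models the scheduler: returns the index of the highest priority task that
 * is ready at tick `now`, or -1 if no task is ready. */
int sim_pick_next(const sim_task_t *tasks, size_t n, uint32_t now)
{
    assert(tasks != NULL);
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].ready_at > now) {
            continue;  /* still delayed */
        }
        if (best < 0 || tasks[i].priority < tasks[best].priority) {
            best = (int)i;
        }
    }
    return best;
}
```

With one task at priority 10 and one at priority 200, a delay of 0 at tick 5 makes the scheduler pick the priority 10 task again straight away. A delay of 1 lets the priority 200 task run during tick 5, and the priority 10 task takes over again at tick 6.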
Now there are some poorly designed systems out there that do things like:
    taskLock();
    ...
    taskDelay(0);
    ...
    taskUnlock();
With the intention of having a low priority task hog the CPU until some point where it then allows a high priority task to take over by issuing a taskDelay(0). However if you play games like this then you should reconsider your design.
Also, in your case I would consider a more robust design. Rather than calling taskDelay() to allow a low priority task to process an event, you should send a message to the low priority task and have it process its message queue, while your high priority task blocks on a semaphore that is given by your event handler or something similar. Right now you are hoping to force a ping-pong between two different tasks to get a job done. If you add a queue that acts as a buffer, then as long as your system is schedulable (i.e. there is enough time to respond to all events, queue them up and fully process them), it will work.
Update
I assume your system is supposed to be something like this:
Event occurs (interrupt driven?).
High priority task runs to gather data.
Data is processed by low priority task.
If this is the case the pattern you want to follow is actually quite simple, and in fact could be accomplished with just 1 task:
Interrupt handler gathers data, and sends a message (msgQSend()) to task.
Task is pending on the message queue with msgQReceive.
But it might help if I knew more about your system (what are you really trying to do?) and also why you are using POSIX calls rather than native VxWorks calls.
If you are new to real-time systems, you should learn about rate-monotonic analysis. There is a very brief summary on Wikipedia:
http://en.wikipedia.org/wiki/Rate-monotonic_scheduling
Also note that in VxWorks a "high priority" is 0 and a "low priority" is 255; the actual numbers are inversely related to their meaning :D
This is exactly the point I don't understand: how will the low priority task get any CPU time while the high priority task is running?
The high priority task will continue to run until it gets blocked. Once it gets blocked, lower priority tasks that are ready to run will run.
My answer has 2 parts:
1. How to use taskDelay correctly in VxWorks
2. Why taskDelay is not the correct solution for your problem
First part:
taskDelay in VxWorks can be confusing:
taskDelay(0) – doesn't perform a delay at all!
It is a request to the scheduler to take the current task off the CPU. If it is still the highest priority task in the system, it returns to the head of the queue with no delay at all. You would use this call when the scheduler is configured for FIFO scheduling among tasks of the same priority and your task has a CPU-intensive function to run; the task can then release the CPU to other tasks of the same priority (being nice).
BTW, it is the same as taskDelay(NO_WAIT).
taskDelay(1) – delays the calling task for somewhere between zero (!) and 1 system tick. A delay in VxWorks ends on a system tick boundary.
taskDelay(2) – somewhere between 1 and 2 system ticks.
taskDelay(3) … (and so on)
taskDelay(-1) (a.k.a. taskDelay(WAIT_FOREVER)) – delays the task forever (not recommended).
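The "between n-1 and n ticks" behaviour described above is simple arithmetic, sketched here in plain C. The function name, the microsecond units and the 10 ms tick in the usage note are assumptions made for the example. The real tick period depends on how the system clock is configured.

```c
#include <assert.h>
#include <stdint.h>

/* Time in microseconds for which a task is actually delayed when it asks for
 * `ticks` system ticks, given that the call is made `phase_us` microseconds
 * after the most recent tick. The delay ends on a tick boundary, so part of
 * the first tick has already been used up when the call is made. */
uint32_t actual_delay_us(uint32_t ticks, uint32_t tick_period_us, uint32_t phase_us)
{
    assert(tick_period_us > 0 && phase_us < tick_period_us);
    if (ticks == 0) {
        return 0;  /* no delay at all, only a reschedule */
    }
    return ticks * tick_period_us - phase_us;
}
```

With a 10 ms tick, a delay of 1 tick requested 9.999 ms after the last tick lasts only 1 µs, while the same request made right on a tick boundary lasts the full 10 ms. A delay of 2 ticks requested halfway through a tick lasts 15 ms.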
Second part:
Using taskDelay to let a low priority task run might be the wrong idea. You didn't provide all the details of the problem, but please note that delaying the high priority task does not ensure that your low priority task will run (regardless of the sleep time you choose). Other tasks with a higher priority than your low priority task might run for the whole 'sleep time'.
There are several synchronization mechanisms in VxWorks, such as binary semaphores, changing task priority, signals, …

What happens when a thread makes kernel disable the interrupts and then that thread goes to sleep

I have this kernel code where I disable interrupts to make the lock acquire operation atomic. But if you look at the last else branch, i.e. when the lock is not available, the thread goes to sleep, and interrupts are enabled only after the thread comes back from sleep. My question is: are interrupts disabled for the whole OS until this thread comes out of sleep?
void Lock::Acquire()
{
    IntStatus oldLevel = interrupt->SetLevel(IntOff); // Disable interrupts to make the following statements atomic
    if(lockOwnerThread == currentThread) // Check whether the requesting thread already owns the lock
    {
        //printf("SM:error:%s already owns the lock\n",currentThread->getName());
        DEBUG('z', "SM:error:%s already owns the lock\n",currentThread->getName());
        (void) interrupt->SetLevel(oldLevel);
        return;
    }
    if(lockOwnerThread==NULL)
    {
        lockOwnerThread = currentThread; // Lock ownership is given to the current thread
        DEBUG('z', "SM:The ownership of the lock %s is given to %s \n",name,currentThread->getName());
    }
    else
    {
        DEBUG('z', "SM:Adding thread %s to request queue and putting it to sleep\n",currentThread->getName());
        queueForLock->Append((void *)currentThread); // Lock is busy, so add the thread to the queue
        currentThread->Sleep(); // And go to sleep
    }
    (void) interrupt->SetLevel(oldLevel); // Restore the previous interrupt level
}
I don't know NACHOS and I would not make any assumptions on my own, so you will have to test it.
The idea is simple. If this interrupt enable/disable functionality is local to the current process context, then the following should happen when you call Sleep():
The process is marked as not running, i.e. it is excluded from the list of processes that the scheduler will consider for CPU time. Then the Sleep() function makes the scheduler do its regular work: find a process to run. If the list of runnable processes is not empty, the scheduler picks the next available process and makes a context switch to it. After this, the interrupt state is restored from this new context.
If there are no processes to run, the scheduler enters the idle loop and usually enables interrupts. While the scheduler is in the idle loop, it continues to poll the queue of runnable processes until it gets something to schedule.
Your process gets control again once it is marked as running. This could happen if some other process calls WakeUp() (or the like; as I mentioned, the API is unknown to me).
When the scheduler picks your process to switch to, it performs the usual (for your system) context switch, which has the interrupts-enabled flag set to false, so execution continues at the statement after the Sleep() call with interrupts disabled.
If the assumptions above are incorrect and the interrupts-enabled flag is global, then there are two possibilities: either the system hangs because it can't service interrupts, or it has some workaround for such situations.
So, you need to try. The best way is of course to read the kernel sources, if you have access.