WDT in a preemptive RTOS kernel - embedded

I heard that the best way to use a watchdog timer in a preemptive kernel is to assign it to the lowest-priority/idle task and refresh it there. I fail to understand why, though: what if higher-priority tasks keep running and the idle task doesn't run before the timeout?
Any clarifications?
Thanks.

I fail to understand why though, what if high priority tasks keep running and the idle task doesn't run before the timeout.
Well, that is kind of the point. If the lowest priority thread (or better, the idle thread) is starved, then your system will be missing deadlines, and is either poorly designed or some unexpected condition has occurred.
If it were reset at a high priority or interrupt, then all lower priority threads could be in a failed state, either running busy or never running at all, and the watchdog would be uselessly maintained while not providing any protection whatsoever.
It is nonetheless only a partial solution to system integrity monitoring. It addresses the problem of an errant task hogging the CPU, but it does not deal with the issue of a task blocking and never being scheduled as intended. There are any number of ways of dealing with that, but a simple approach is to have "software watchdogs", counters that get reset by each task, and decremented in a high-priority timer thread or handler. If any thread counter reaches zero, then the respective thread blocked for longer than intended, and action may be taken. This requires that each thread runs at an interval shorter than its watchdog counter reset value. For threads that otherwise block indefinitely waiting for infrequent aperiodic events, you might use a blocking timeout just to update the software watchdog.
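To make the "software watchdog" idea concrete, here is a minimal sketch (not the author's code): each task reloads its own counter, a high-priority periodic callback decrements them all, and the hardware watchdog is refreshed only while every counter is still non-zero. NUM_TASKS, hw_watchdog_kick() and system_fault() are hypothetical placeholders.

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_TASKS 4U

extern void hw_watchdog_kick(void);   /* hypothetical hardware kick routine */
extern void system_fault(void);       /* hypothetical fault handler         */

/* Reload values in timer ticks; each task must check in at least this often. */
static const uint32_t reload[NUM_TASKS]     = { 10U, 50U, 100U, 500U };
static volatile uint32_t sw_wdt[NUM_TASKS]  = { 10U, 50U, 100U, 500U };

/* Called by each task from its main loop (or from a blocking-timeout path). */
void task_checkin(unsigned task_id)
{
    sw_wdt[task_id] = reload[task_id];
}

/* Called once per tick from a high-priority timer task or handler. */
void watchdog_monitor_tick(void)
{
    bool all_alive = true;

    for (unsigned i = 0U; i < NUM_TASKS; i++) {
        if (sw_wdt[i] == 0U) {
            all_alive = false;          /* task i missed its deadline */
        } else {
            sw_wdt[i]--;
        }
    }

    if (all_alive) {
        hw_watchdog_kick();             /* only then refresh the hardware WDT */
    } else {
        system_fault();                 /* log, reset, or let the WDT expire  */
    }
}
```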

There is no absolute rule about the priority of a watchdog task. It depends on your design and goals.
Generally speaking, if the watchdog task is the lowest priority task then it will fail to run (and the watchdog will reset) if any higher priority task becomes stuck or consumes too much of the CPU time. Consider that if the high priority task(s) is running 100% of the time then that's probably too much because lower priority tasks are getting starved. And maybe you want the watchdog to reset if lower priority tasks are getting starved.
But that general idea isn't a complete design. See this answer, and especially the "Multitasking" section of this article (https://www.embedded.com/watchdog-timers/) for a more complete watchdog task design. The article suggests making the watchdog task the highest priority task but discusses the trade-offs of the alternative.
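As a sketch of the "refresh it in the idle task" approach the original question asks about, a FreeRTOS idle hook could kick the hardware watchdog as shown below. IWDG_Refresh() is a hypothetical stand-in for your part's kick routine, and configUSE_IDLE_HOOK must be set to 1 in FreeRTOSConfig.h.

```c
#include "FreeRTOS.h"
#include "task.h"

extern void IWDG_Refresh(void);   /* hypothetical hardware watchdog kick */

/* Runs only when no higher-priority task is ready. If any task hogs the CPU,
 * this hook is starved, the watchdog is not refreshed, and the system resets. */
void vApplicationIdleHook(void)
{
    IWDG_Refresh();
}
```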

Related

Difference between non-preemptive, cooperative, and rate-monotonic schedulers?

I have read about cooperative schedulers, which don't let a higher-priority task run until the lower-priority task blocks itself. So if there is no delay in the task, the lower-priority task will take the CPU forever, is that correct? I thought non-preemptive was just another name for cooperative, but another article confused me by saying that in a non-preemptive scheduler a higher-priority task can interrupt a lower-priority one at the sys tick, just not in the middle between ticks. So which is correct?
Are cooperative and non-preemptive actually the same?
And rate monotonic is one type of preemptive scheduler, right?
Its priority isn't set manually; the scheduler algorithm decides priority based on execution time or deadline, is that correct?
Is rate monotonic better than a fixed-priority preemptive kernel (the one FreeRTOS uses)?
These terms can never fully cover the range of possibilities that can exist. The truth is that people can write whatever kind of scheduler they like, and then other people try to put what is written into one or more categories.
Pre-emptive implies that an interrupt (e.g. from a clock or peripheral) can cause a task switch to occur, as well as a task switch occurring when a scheduling OS function is called (like a delay, or taking or giving a semaphore).
Co-operative means that the task function must either return or else call an OS function to cause a task switch.
Some OSes might have one specific timer interrupt which causes context switches; the ARM SysTick interrupt is suitable for this purpose. Because the tasks themselves don't have to call a scheduling function, this is one kind of pre-emption.
If a scheduler uses a timer to allow multiple tasks of equal priority to share processor time then one common name for this is a "round-robin scheduler". I have not heard the term "rate monotonic" but I assume it means something very similar.
It sounds like the article you have read describes a very simple pre-emptive scheduler, where tasks do have different priorities, but task switching can only occur when the timer interrupt runs.
Co-operative scheduling is non-preemptive, but "non-preemptive" might describe any scheduler that does not use preemption. It is a rather non-specific term.
The article you describe (without citation), however, seems confused. Context switching on a tick event is preemption if the interrupted task did not explicitly yield. Not everything you read on the Internet is true or authoritative; always check your sources to determine their level of expertise. Enthusiastic amateurs abound.
A fully preemptive priority based scheduler can context switch on "scheduling events" which include not just the timer tick, but also whenever a running thread or interrupt handler triggers an IPC or synchronisation mechanism on which a higher-priority thread than the current thread is waiting.
What you describe as "non-preemptive" I would suggest is in fact a time triggered preemptive scheduler, where a context switch occurs only in a tick event and not asynchronously on say a message queue post or a semaphore give for example.
A rate-monotonic scheduler does not necessarily determine the priority automatically (in fact I have never come across one that did). Rather the priority is set (manually) according to rate-monotonic analysis of the tasks to be executed. It is "rate-monotonic" in the sense that it supports rate-monotonic scheduling. It is still possible for the system designer to apply entirely inappropriate priorities or partition tasks in such a way that they are insufficiently deterministic for RMS to actually occur.
Most RTOS schedulers support RMS, including FreeRTOS. Most RTOS also support variable task priority as both a priority inversion mitigation, and via an API. But to be honest if your application relies on either I would argue that it is a failed design.
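For illustration, here is a hedged FreeRTOS sketch of rate-monotonic priority assignment done by hand: the task with the shortest period gets the highest priority. The task functions, periods, and stack depths are made-up placeholders.

```c
#include "FreeRTOS.h"
#include "task.h"

extern void vSensorTask(void *pv);   /* runs every 10 ms  */
extern void vControlTask(void *pv);  /* runs every 50 ms  */
extern void vLoggerTask(void *pv);   /* runs every 500 ms */

void create_tasks(void)
{
    /* Highest rate -> highest priority, per rate-monotonic analysis. */
    xTaskCreate(vSensorTask,  "sensor",  256, NULL, tskIDLE_PRIORITY + 3, NULL);
    xTaskCreate(vControlTask, "control", 256, NULL, tskIDLE_PRIORITY + 2, NULL);
    xTaskCreate(vLoggerTask,  "logger",  256, NULL, tskIDLE_PRIORITY + 1, NULL);
}
```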

How do you avoid interrupt starvation in a nested interrupt system?

I am learning about interrupts and couldn't understand what happens when there are too many interrupts to a point where the CPU can't process the foreground loop or complete the existing interrupts. I read through this article https://www.cs.utah.edu/~regehr/papers/interrupt_chapter.pdf but didn't completely understand how a scheduler would help, if there are simply too many interrupts?
Do we switch to a faster CPU if the interrupts can not be missed?
Yes, you may have to switch to a faster CPU!
You have to ensure that there is enough time left for the main loop. That is why it is really important to keep your interrupt service routines as short as possible and to do some CPU load tests.
Indeed, any time there is contention over a shared resource, there is the possibility of starvation. The schedulers discussed in the paper limit the interrupt rate, thus ensuring some interrupt-free processing time during each interval. During high activity periods, interrupt handling is disabled, and the scheduler switches to polling mode where it interrogates the state of the interrupt request lines periodically, effectively throttling the stream of interrupts. The operating system strives to do as little as possible in each interrupt handler - tasks are often simply queued so they can be handled later at a different stage. There are many considerations and trade-offs that go into any scheduling algorithm.
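A minimal sketch of that rate-limiting idea, assuming a UART receive interrupt and a periodic tick; the threshold and all of the uart_* driver calls are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

#define MAX_IRQS_PER_TICK 32U

extern void    uart_rx_irq_disable(void);
extern void    uart_rx_irq_enable(void);
extern bool    uart_rx_ready(void);
extern uint8_t uart_read_byte(void);
extern void    handle_byte(uint8_t b);

static volatile uint32_t irq_count;
static volatile bool     polling_mode;

void uart_rx_isr(void)
{
    handle_byte(uart_read_byte());
    if (++irq_count >= MAX_IRQS_PER_TICK) {
        uart_rx_irq_disable();     /* throttle: stop taking interrupts */
        polling_mode = true;
    }
}

void tick_handler(void)            /* periodic timer, e.g. 1 ms */
{
    irq_count = 0U;
    if (polling_mode) {
        polling_mode = false;
        uart_rx_irq_enable();      /* give interrupts another chance */
    }
}

void main_loop_poll(void)          /* called from the foreground loop */
{
    while (polling_mode && uart_rx_ready()) {
        handle_byte(uart_read_byte());
    }
}
```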
Overall you need an idea of how much time each part of your program consumes. This is pretty easy to measure live with an oscilloscope. If you activate a GPIO when entering the interrupt and de-activate it when leaving, you not only get to see how much time the ISR consumes, but also how often it kicks in. If you do this for each ISR you get a good idea how much time they need. You can then do something similar in main(), to get a rough estimate of the complete execution cycle of the program, main + interrupts.
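For example (a sketch, with gpio_set()/gpio_clear() and DEBUG_PIN standing in for whatever GPIO interface your part provides):

```c
extern void gpio_set(int pin);
extern void gpio_clear(int pin);

#define DEBUG_PIN 5

void timer_isr(void)
{
    gpio_set(DEBUG_PIN);      /* scope channel high while the ISR runs     */
    /* ... actual interrupt work ... */
    gpio_clear(DEBUG_PIN);    /* pulse width = ISR time, rate = ISR rate   */
}
```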
As for the best solution, it is obviously to reduce the number of interrupts. Use polling if possible. Use DMA. Use serial peripherals (UART, CAN etc.) that are hardware-buffered instead of interrupt-intensive ones. Use hardware PWM instead of output-compare timers. And so on. These things need to be considered early on, when you pick a suitable MCU for your project. If you picked the wrong MCU, then you'll obviously have to change. Twiddling with the CPU clock sounds like a quick & dirty fix; get the design right instead.

Operating System Basics

I am reading about process management and I have a few doubts:
What is meant by an I/O request? For example, a process is executing and hence is in the running state; it is in the waiting state if it is waiting for the completion of an I/O request. I don't get what an I/O request actually is. Can you please give an example to elaborate?
Another doubt: let's say a process is executing and suddenly an interrupt occurs, so the process stops its execution and is put in the ready state. Is it possible that some other process begins its execution while the interrupt is also being processed?
Regarding the first question:
A simple way to think about it...
Your computer has lots of components: CPU, hard drive, network card, sound card, GPU, etc. They all work in parallel and independently of each other. They are also generally slower than the CPU.
This means that whenever a process makes a call that down the line (on the OS side) ends up communicating with an external device, there is no point in the OS sitting stuck waiting for the result, since the time that operation takes to complete is probably an eternity from the CPU's point of view.
So, the OS fires up whatever communication the process requested (call it IO request), flags the process as waiting for IO, and switches execution to another process so the CPU can do something useful instead of sitting around blocked waiting for the IO request to complete.
When the external device finishes whatever operation was requested, it generates an interrupt, so the OS is informed the work is done, and it can then flag the blocked process as ready again.
This is all a very simplified view of course, but that's the main idea. It allows the CPU to do useful work instead of waiting for IO requests to complete.
Regarding the second question:
It's tricky, even for single CPU machines, and depends on how the OS handles interrupts.
For code simplicity, a simple OS might, for example, process an interrupt in one go whenever it happens, and only then resume whatever process it decides is appropriate once the interrupt handling is done. In that case, no other process would run until the interrupt handling is complete.
In practice, things get a bit more complicated for performance and latency reasons.
If you think about an interrupt lifetime as just another task for the CPU (From when the interrupt starts to the point the OS considers that handling complete), you can effectively code the interrupt handling to run in parallel with other things.
Just think of the interrupt as notification for the OS to start another task (that interrupt handling). It grabs whatever context it needs at the point the interrupt started, then keeps processing that task in parallel with other processes.
An I/O request generally just means a request to do input, output, or both. The exact meaning varies depending on your context: HTTP, networking, console operations, or maybe some process inside the machine.
A process waiting for I/O: say, for example, you were writing a program in C to accept the user's name on the command line and then print 'Hello User' back. Your code goes into the waiting state until the user enters their name and hits Enter. This is a high-level example, but even a very low-level process executing in your computer's processor works on the same basic principle.
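A minimal version of that C example: the call to fgets() blocks on terminal input, so the OS marks the process as waiting and runs something else until the Enter key arrives.

```c
#include <stdio.h>
#include <string.h>

int main(void)
{
    char name[64];

    printf("Enter your name: ");
    if (fgets(name, sizeof name, stdin) != NULL) {   /* process blocks here on I/O */
        name[strcspn(name, "\n")] = '\0';            /* strip the trailing newline */
        printf("Hello %s\n", name);
    }
    return 0;
}
```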
Can the processor work on other processes when the current one is interrupted and waiting on something? Yes! You'd better hope it does; that's what scheduling algorithms and stacks are for. The real answer, however, depends on the architecture you are on and whether it supports parallel or only serial processing, etc.

operating system - context switches

I have been confused about the issue of context switches between processes, given a round-robin scheduler with a certain time slice (which is what Unix/Windows both use in a basic sense).
So, suppose we have 200 processes running on a single core machine. If the scheduler is using even 1ms time slice, each process would get its share every 200ms, which is probably not the case (imagine a Java high-frequency app, I would not assume it gets scheduled every 200ms to serve requests). Having said that, what am I missing in the picture?
Furthermore, java and other languages allows to put the running thread to sleep for e.g. 100ms. Am I correct in saying that this does not cause context switch, and if so, how is this achieved?
So, suppose we have 200 processes running on a single core machine. If the scheduler is using even 1ms time slice, each process would get its share every 200ms, which is probably not the case (imagine a Java high-frequency app, I would not assume it gets scheduled every 200ms to serve requests). Having said that, what am I missing in the picture?
No, you aren't missing anything, and the same holds for non-pre-emptive systems. Processes with pre-emption rights (meaning a higher priority than other processes) can easily displace a less important process, to the point that a high-priority process might run, say, ten times as often as the lowest-priority process (the actual ratio depends entirely on the situation and the implementation), as long as it stops short of starving the lowest-priority process.
For processes of similar priority, it comes down to the round-robin algorithm you mentioned, though which process gets picked first is again implementation-dependent. Windows and Unix-like systems use broadly similar priority-based, time-sliced scheduling; Windows applies round-robin within each priority level, while the Linux task scheduler is the Completely Fair Scheduler (CFS) rather than a plain round-robin.
Furthermore, java and other languages allows to put the running thread to sleep for e.g. 100ms. Am I correct in saying that this does not cause context switch, and if so, how is this achieved?
Programming languages and libraries implement "sleep" functionality with the aid of the kernel. Without kernel-level support, they'd have to busy-wait, spinning in a tight loop, until the requested sleep duration elapsed. This would wastefully consume the processor.
For threads that are put to sleep (Thread.sleep(long millis)), most systems generally do the following (a minimal sketch follows the list):
Suspend execution of the process and mark it as not runnable.
Set a timer for the given wait time. Systems provide hardware timers that let the kernel register to receive an interrupt at a given point in the future.
When the timer hits, mark the process as runnable.
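Here is a toy C model of those three steps (just a sketch; current_task, mark_not_runnable(), mark_runnable(), schedule() and hw_timer_arm() are hypothetical kernel primitives, and real kernels keep a sorted timer list rather than arming one timer per sleeper):

```c
#include <stdint.h>

struct task;                                   /* opaque task control block   */
extern struct task *current_task;
extern void mark_not_runnable(struct task *t);
extern void mark_runnable(struct task *t);
extern void schedule(void);                    /* pick another task to run    */
extern void hw_timer_arm(uint64_t deadline_ticks,
                         void (*callback)(void *ctx), void *ctx);
extern uint64_t now_ticks(void);

static void wakeup(void *ctx)                  /* runs in the timer interrupt */
{
    mark_runnable((struct task *)ctx);         /* step 3: make it ready again */
}

void k_sleep(uint64_t ticks)
{
    mark_not_runnable(current_task);                      /* step 1: block    */
    hw_timer_arm(now_ticks() + ticks, wakeup, current_task); /* step 2: timer */
    schedule();                                /* context switch happens here */
}
```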
I hope you are aware of threading models like one-to-one, many-to-one, and many-to-many, so I am not going into much detail; this is just a reference for you.
It might appear to you as if this increases the overhead/complexity. But that's how threads (user threads created in the JVM) are operated on, and the mapping is then based on the threading models I mentioned above. Check this Quora question and its answers, and please go through the best answer, given by Robert Love.
For further reading, I'd suggest the Scheduling Algorithms explanation on OSDev.org and the Operating System Concepts book by Galvin, Gagne, and Silberschatz.

About Watchdog Timer

Can anyone tell me whether we should enable or disable the watchdog while the startup/boot code executes? My friend told me that we usually disable the watchdog in the boot code. Can anyone tell me what the advantage or disadvantage of doing so is?
It really depends on your project. The watchdog is there to help you ensure that your program won't get "stuck" while executing code. -- If there is a chance that your program may hang during the boot-procedure, it may make sense to incorporate the watchdog there too.
That being said, I generally start the watchdog at the end of my boot-up procedures.
Usually the WD (watchdog) is enabled after the boot-up procedure, because this is when the program enters its main loop and periodically kicks the WD. During boot-up, by which I suppose you mean linear initialization of hardware and peripherals, there is much less periodicity in your code, and it is hard to insert a WD kicking cycle.
Production code should always enable the watchdog. Hobby and/or prototype projects are obviously a special case that may not require the watchdog.
If the watchdog is enabled during boot, there is a special case which must be considered. Erasing and writing flash memory take a long time (erasing an entire device may take seconds to complete), so you must ensure that your erase and write routines periodically service the watchdog to prevent a reset.
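A sketch of that special case, with flash_erase_sector(), NUM_SECTORS and wdt_kick() standing in for your part's flash driver and watchdog interface:

```c
#include <stdbool.h>

#define NUM_SECTORS 64U

extern bool flash_erase_sector(unsigned sector);  /* may take tens of ms each */
extern void wdt_kick(void);

bool erase_application_area(void)
{
    for (unsigned s = 0U; s < NUM_SECTORS; s++) {
        wdt_kick();                    /* keep the watchdog alive per sector  */
        if (!flash_erase_sector(s)) {
            return false;              /* report the failure, don't hang      */
        }
    }
    return true;
}
```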
If you're debugging, you want it off or the device will reboot on you when you try to step through code. Otherwise it's up to you. I've seen watchdogs save projects' butts and I've seen watchdogs lead to inadvertent reboot loops that cause customers to clog up the support lines and thus cost the company a ton.
You make the call.
The best practice would be to have the watchdog activate automatically on power-up. If your hardware is not designed for that, then switch it on as soon as possible. Generally I set the watchdog up for a long duration during boot-up, but once I am past boot-up I go for a short timeout and service the watchdog regularly.
You might not always be around to reset a board that hung after a plant shutdown and restart at a remote location. Or the board is located in an inaccessible basement crawl space and it did not restart after a power dip. Easy lab practice is not real-world best practice.
Try to design your hardware so that your software can check the reset cause at boot-up and report it. If you get a watchdog timeout you need to know, because it is a failure in your system and ignoring it can cause problems later.
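For instance, a boot-time reset-cause report might look like the sketch below; the register and bit masks are hypothetical stand-ins, since every MCU family exposes its reset-status flags differently.

```c
#include <stdint.h>

extern volatile uint32_t RESET_STATUS_REG;   /* memory-mapped status register */
#define RST_WATCHDOG  (1UL << 2)             /* placeholder bit positions     */
#define RST_POWER_ON  (1UL << 0)

extern void log_event(const char *msg);

void report_reset_cause(void)
{
    uint32_t cause = RESET_STATUS_REG;

    if (cause & RST_WATCHDOG) {
        log_event("boot: watchdog reset");   /* a fault occurred: investigate */
    } else if (cause & RST_POWER_ON) {
        log_event("boot: power-on reset");
    } else {
        log_event("boot: other reset");
    }
}
```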
It is easier to debug with the watchdog off but during development regularly test with the watchdog on to ensure everything is on track.
I always have it enabled. What is the advantage of disabling it? So what if I have to reset it during the bootup code?
Watchdogs IMHO serve two distinct primary purposes, along with a third, less-strongly-related one: (1) Ensure that in all cases where the system is knocked out of whack, it will recover, eventually; (2) Ensure that when hardware is enabled which must not go too long without service, anything that would prevent such servicing shuts down the system, reasonably quickly; (3) Provide a means by which a system can go to sleep for a while, without sleeping forever.
While disabling a watchdog during a boot loader may not interfere with purpose #2, it may interfere with purpose #1. My preference is to leave watchdogs enabled during a boot loader, and have the boot loader hit the watchdog any time something happens to indicate that the system is really supposed to be in the boot loader (e.g. every time it receives a valid boot-loader-command packet). On one project where I didn't do this, and just had the boot loader blindly feed the watchdog, static zaps could sometimes knock units into bootloader mode where they would sit, forever. Having watchdog kick the system out of the boot loader when no actual boot-loading is going on alleviates that problem.
Incidentally, if I were designing my 'ideal' embedded-watchdog circuit, I would have a hardware-configurable parameter for maximum watchdog time, and would have software settings for 'requested watchdog time' and 'maximum watchdog time'. Initially, both software settings would be set to maximum; any time the watchdog is fed, the time would be set to the minimum of the three settings. Software could change the 'requested watchdog time' any time, to any value; the 'maximum watchdog time' setting could be decreased at any time, but could only be increased via system reset.
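A small software model of that feed rule (purely illustrative; no real part exposes exactly these settings): the counter reloads to the minimum of the hardware maximum, the software maximum, and the requested time, and the software maximum can only shrink until the next reset.

```c
#include <stdint.h>

static uint32_t hw_max = 1000U;   /* hardware-configured ceiling, in ms     */
static uint32_t req    = 1000U;   /* requested watchdog time, any value     */
static uint32_t sw_max = 1000U;   /* software maximum, decrease-only        */
static uint32_t counter;          /* time remaining before a forced reset   */

static uint32_t min3(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t m = (a < b) ? a : b;
    return (m < c) ? m : c;
}

void wdt_set_requested(uint32_t ms) { req = ms; }
void wdt_lower_sw_max(uint32_t ms)  { if (ms < sw_max) sw_max = ms; }

void wdt_feed(void)
{
    counter = min3(req, sw_max, hw_max);  /* feed reloads to the smallest setting */
}
```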
BTW, I might also include a "periodic reset" timer, which would force the system to unconditionally reset at some interval. Software would not be able to override the behavior of this timer, but would be able to query it and request a reset early. Even systems which try to do everything right with a watchdog can still fall into states which are 'broken' but the watchdog gets fed just fine. If periodic scheduled downtime is acceptable, periodic resets can avoid such issues. One may minimize the effect of such resets on system usefulness by performing them early whenever it wouldn't disrupt some action in progress which would be disrupted. For example, if the reset interval is set to seven hours one could, any time the clock got down to one hour, ask that no further actions be requested, wait a few seconds to see if anyone tried to send an action just as they were asked to stop, and if no actions were requested, reset, and then invite further requests. A request which would have been sent just as the system was about to reset would be delayed until after the reset occurred, but provided no requests would take longer than an hour to complete, no requests would be lost or disrupted.
Fewer transistors switching, I suppose, so minuscule power savings. Depending on how much you sleep, this might actually be a big savings. Your friend might be referring to the practice of turning off the WDT when you're actually doing something, then turning it on when you sleep. There's a nice little point that Microchip gives about their PICs:
"If the WDT is disabled during normal operation (FWDTEN = 0), then the SWDTEN bit (RCON<5>) can be used to turn on the WDT just before entering Sleep mode"