About Watchdog Timer - embedded

Can anyone tell me whether we should enable or disable the watchdog while the startup/boot code executes? My friend told me that we usually disable the watchdog in the boot code. Can anyone tell me what the advantages or disadvantages of doing so are?

It really depends on your project. The watchdog is there to help you ensure that your program won't get "stuck" while executing code. -- If there is a chance that your program may hang during the boot-procedure, it may make sense to incorporate the watchdog there too.
That being said, I generally start the watchdog at the end of my boot-up procedures.

Usually the WD (watchdog) is enabled after the boot-up procedure, because this is when the program enters its "loop" and periodically kicks the WD. During boot-up, by which I suppose you mean linear initialization of hardware and peripherals, your code has much less periodicity, and it is hard to insert a WD-kicking cycle.

Production code should always enable the watchdog. Hobby and/or prototype projects are obviously a special case that may not require the watchdog.
If the watchdog is enabled during boot, there is a special case that must be considered. Erasing and writing memory take a long time (erasing an entire device may take seconds to complete), so you must ensure that your erase and write routines periodically service the watchdog to prevent a reset.
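Below is a minimal sketch of that idea. It is simulated on a PC so it runs without hardware. `wdt_kick`, `busy_for` and `erase_sectors` are hypothetical stand-ins, not a vendor API. On a real MCU the kick would be a register write and the erase a flash-controller command. The part worth copying is the structure: split a long erase into sector-sized steps and service the watchdog between them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative numbers only: a real part's timeout and erase time come from its datasheet. */
#define WDT_TIMEOUT_MS   500u  /* watchdog fires if not serviced within this window */
#define SECTOR_ERASE_MS  100u  /* simulated duration of one sector erase */

static uint32_t now_ms;        /* simulated clock */
static uint32_t last_kick_ms;  /* time of the last watchdog service */
static bool     wdt_fired;     /* latched when the watchdog would have reset us */

static void wdt_kick(void) { last_kick_ms = now_ms; }

/* Stand-in for a blocking flash operation: time passes and the watchdog keeps counting. */
static void busy_for(uint32_t ms)
{
    now_ms += ms;
    if (now_ms - last_kick_ms > WDT_TIMEOUT_MS)
        wdt_fired = true;
}

bool wdt_has_fired(void) { return wdt_fired; }

/* Erase `count` sectors one at a time. With service_watchdog set, the dog is fed after
   every sector, so no single wait exceeds the timeout; without it, a long erase trips it. */
void erase_sectors(uint32_t count, bool service_watchdog)
{
    for (uint32_t s = 0; s < count; ++s) {
        busy_for(SECTOR_ERASE_MS);
        if (service_watchdog)
            wdt_kick();
    }
}
```

This only works if a single sector erase is shorter than the watchdog timeout. If it is not, the timeout has to be lengthened for the duration of the operation.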

If you're debugging, you want it off, or the device will reboot on you when you try to step through code. Otherwise it's up to you. I've seen watchdogs save projects' butts, and I've seen watchdogs lead to inadvertent reboot loops that cause customers to clog up the support lines and thus cost the company a ton.
You make the call.

The best practice would be to have the watchdog activate automatically on power up. If your hardware is not designed for that then switch it on as soon as possible. Generally I set the watchdog up for long duration during boot up but once I am past boot up I go for a short time out and service the watchdog regularly.
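Here is a sketch of the long-then-short timeout scheme, using a software-simulated watchdog. The type `sim_wdt`, its functions and both timeout values are hypothetical. A real part is configured through its own registers. On some parts the timeout is fixed (for example by configuration fuses), and this two-phase scheme may then not be possible as written.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values; pick real ones from the datasheet and your measured boot time. */
#define BOOT_TIMEOUT_MS 4000u  /* generous: slow, linear initialization */
#define RUN_TIMEOUT_MS    50u  /* tight: the main loop services it every pass */

typedef struct {
    uint32_t timeout_ms;
    uint32_t since_kick_ms;
    bool     expired;          /* latched: the part would have reset */
} sim_wdt;

void wdt_configure(sim_wdt *w, uint32_t timeout_ms)
{
    w->timeout_ms = timeout_ms;
    w->since_kick_ms = 0;      /* in this model, reconfiguring also counts as a service */
}

void wdt_service(sim_wdt *w) { w->since_kick_ms = 0; }

/* Advance simulated time; the watchdog expires if it goes unserviced too long. */
void wdt_elapse(sim_wdt *w, uint32_t ms)
{
    w->since_kick_ms += ms;
    if (w->since_kick_ms > w->timeout_ms)
        w->expired = true;
}
```

Usage: call `wdt_configure(&w, BOOT_TIMEOUT_MS)` as early as possible after power-up. Once initialization is finished, call `wdt_configure(&w, RUN_TIMEOUT_MS)` and call `wdt_service` once per main-loop pass.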
You might not always be around to reset a board that hung after a plant shutdown and restart at a remote location. Or the board may be located in an inaccessible basement crawl space and did not restart after a power dip. Easy lab practices are not real-world best practices.
Try and design your hardware so that your software can check the reset cause at boot up and report. If you get a watchdog timeout you need to know because it is a failure in your system and ignoring it can cause problems later.
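Here is a sketch of decoding the reset cause at boot. The flag names and bit layout are hypothetical. Real MCUs expose this in a device-specific register, and it often has to be cleared by software after reading so that the next boot sees fresh flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reset-cause flags, one bit per cause. */
enum {
    RST_POWER_ON = 1u << 0,
    RST_PIN      = 1u << 1,
    RST_WATCHDOG = 1u << 2,
    RST_BROWNOUT = 1u << 3,
};

/* When several flags are set, report the most alarming cause. */
const char *reset_cause_name(uint32_t flags)
{
    if (flags & RST_WATCHDOG) return "watchdog";
    if (flags & RST_BROWNOUT) return "brownout";
    if (flags & RST_PIN)      return "reset pin";
    if (flags & RST_POWER_ON) return "power-on";
    return "unknown";
}

/* A watchdog or brownout reset indicates a failure that should be logged and reported. */
bool reset_was_failure(uint32_t flags)
{
    return (flags & (RST_WATCHDOG | RST_BROWNOUT)) != 0;
}
```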
It is easier to debug with the watchdog off but during development regularly test with the watchdog on to ensure everything is on track.

I always have it enabled. What is the advantage of disabling it? So what if I have to reset it during the bootup code?

Watchdogs IMHO serve two related but distinct primary purposes, along with a third, less strongly related purpose: (1) ensure that in all cases where the system is knocked out of whack, it will eventually recover; (2) ensure that when hardware is enabled which must not go too long without service, anything that would prevent such servicing shuts down the system reasonably quickly; (3) provide a means by which a system can go to sleep for a while without sleeping forever.
While disabling a watchdog during a boot loader may not interfere with purpose #2, it may interfere with purpose #1. My preference is to leave watchdogs enabled during a boot loader, and have the boot loader hit the watchdog any time something happens to indicate that the system is really supposed to be in the boot loader (e.g. every time it receives a valid boot-loader-command packet). On one project where I didn't do this, and just had the boot loader blindly feed the watchdog, static zaps could sometimes knock units into boot-loader mode, where they would sit forever. Having the watchdog kick the system out of the boot loader when no actual boot-loading is going on alleviates that problem.
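The following sketch feeds the watchdog only on valid boot-loader traffic. It assumes a hypothetical packet format in which the payload bytes and a trailing checksum byte sum to zero modulo 256. The timeout is counted in hypothetical polling ticks. Real boot-loader protocols and timeouts differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BOOT_WDT_TICKS 3u  /* illustrative timeout, in polling ticks */

static unsigned ticks_since_kick;

/* Hypothetical check: all bytes, including the trailing checksum, sum to 0 mod 256. */
static bool packet_is_valid(const uint8_t *pkt, size_t len)
{
    if (len < 2)
        return false;
    uint8_t sum = 0;
    for (size_t i = 0; i < len; ++i)
        sum += pkt[i];
    return sum == 0;
}

/* Called once per polling tick with whatever was received (len == 0 means nothing).
   Returns false when the watchdog would reset the unit out of the boot loader. */
bool bootloader_tick(const uint8_t *pkt, size_t len)
{
    if (len > 0 && packet_is_valid(pkt, len))
        ticks_since_kick = 0;   /* genuine boot-loading activity: feed the dog */
    else
        ++ticks_since_kick;     /* noise or silence: let the watchdog run down */
    return ticks_since_kick < BOOT_WDT_TICKS;
}
```

A unit knocked into the boot loader by a static zap receives no valid packets, so it falls out within a few ticks. A real update session keeps it there.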
Incidentally, if I were designing my 'ideal' embedded-watchdog circuit, I would have a hardware-configurable parameter for maximum watchdog time, and would have software settings for 'requested watchdog time' and 'maximum watchdog time'. Initially, both software settings would be set to maximum; any time the watchdog is fed, the time would be set to the minimum of the three settings. Software could change the 'requested watchdog time' any time, to any value; the 'maximum watchdog time' setting could be decreased at any time, but could only be increased via system reset.
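The "ideal" watchdog described above can be modeled in software to check that the rules hang together. This is a simulation of a hypothetical design, not a real peripheral. All names are illustrative, and times are in arbitrary ticks.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t hw_max;     /* hardware-configured ceiling */
    uint32_t sw_max;     /* may only decrease until the next system reset */
    uint32_t requested;  /* may be changed to any value at any time */
    uint32_t remaining;  /* countdown reloaded on every feed */
} ideal_wdt;

static uint32_t min3(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t m = a < b ? a : b;
    return m < c ? m : c;
}

/* System reset: both software settings start at the hardware maximum. */
void iwdt_reset(ideal_wdt *w, uint32_t hw_max)
{
    w->hw_max = w->sw_max = w->requested = w->remaining = hw_max;
}

void iwdt_request(ideal_wdt *w, uint32_t t) { w->requested = t; }

/* Lowering the software maximum always succeeds; raising it is refused. */
bool iwdt_set_max(ideal_wdt *w, uint32_t t)
{
    if (t > w->sw_max)
        return false;
    w->sw_max = t;
    return true;
}

/* Feeding reloads the countdown with the minimum of the three settings. */
void iwdt_feed(ideal_wdt *w) { w->remaining = min3(w->hw_max, w->sw_max, w->requested); }
```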
BTW, I might also include a "periodic reset" timer, which would force the system to unconditionally reset at some interval. Software would not be able to override the behavior of this timer, but would be able to query it and request a reset early. Even systems which try to do everything right with a watchdog can still fall into states which are 'broken' even though the watchdog gets fed just fine. If periodic scheduled downtime is acceptable, periodic resets can avoid such issues. One may minimize the effect of such resets on system usefulness by performing them early whenever doing so wouldn't disrupt an action in progress. For example, if the reset interval is set to seven hours, one could, any time the clock got down to one hour, ask that no further actions be requested, wait a few seconds to see if anyone tried to send an action just as they were asked to stop, and if no actions were requested, reset, and then invite further requests. A request which would have been sent just as the system was about to reset would be delayed until after the reset occurred, but provided no requests would take longer than an hour to complete, no requests would be lost or disrupted.
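The early-reset policy from the example (one-hour window, a few seconds' grace) can be written as a small decision function. The constants, the names and the idea that the caller tracks `busy` and the time since clients were asked to stop are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define EARLY_WINDOW_S 3600u  /* start trying to reset early in the last hour */
#define GRACE_S           5u  /* wait for requests sent just as we asked to stop */

typedef enum { ACCEPT_REQUESTS, DRAINING, RESET_NOW } reset_action;

/* secs_left: time remaining on the unoverridable periodic-reset timer.
   busy: an action is still in progress (including one that arrived late).
   secs_since_stop_asked: time since clients were asked to stop sending requests. */
reset_action periodic_reset_policy(uint32_t secs_left, bool busy, uint32_t secs_since_stop_asked)
{
    if (secs_left > EARLY_WINDOW_S)
        return ACCEPT_REQUESTS;  /* far from the forced reset: business as usual */
    if (busy || secs_since_stop_asked < GRACE_S)
        return DRAINING;         /* ask for no new work and let in-flight work finish */
    return RESET_NOW;            /* quiet: reset early, then invite requests again */
}
```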

Fewer transistors switching, I suppose, so minuscule power savings. Depending on how much you sleep, this might actually be a big savings. Your friend might be referring to the practice of turning off the WDT when you're actually doing something, then turning it on when you sleep. There's a nice little point that Microchip gives about their PICs:
"If the WDT is disabled during normal operation (FWDTEN = 0), then the SWDTEN bit (RCON<5>) can be used to turn on the WDT just before entering Sleep mode"

Related

How to interpret Hardware watchdog exceptions on a ESP chip?

For one of our Projects we have a Hardware Watchdog reset which happens on roughly 0.1% of our devices each day, resulting in many unwanted hardware resets.
We are trying to figure out what causes this Hardware Watchdog reset, but have failed to find anything relevant in our code which would result in this behavior.
We are using Arduino version 2.4.2. We are not sure when the problem started affecting our solution, since we had other issues which have now mostly been resolved.
Luckily our devices send us their reboot reasons when they reconnect, there we are receiving the following:
ResetReason=Hardware Watchdog;ResetInfo=Fatal exception:4 flag:1 (WDT)
epc1:0x40102329 epc2:0x00000000 epc3:0x00000000 excvaddr:0x00000000
depc:0x00000000;
We ran this through the EspStackTraceDecoder and ended up with:
0x40102329: wDev_ProcessFiq at ??:?
Searching through various projects that asked similar questions, most of them seemed to involve a DNS query. But not all of them, so is it a general issue?
What additional information could we extract that might help us identify the issue?
Some Additional Information
Memory is stable and we have ~15-17 KB of free heap, depending on the mode and the amount of data queued to send / receive.
Our side of the code uses yield, delay etc. so the S/W watchdog should always be fed. This also applies to the Async callback code.
Check whether you are doing any invalid memory reads. The hardware WDT exists to reset the device when the software or the CPU is no longer working.
Your CPU might have gotten stuck while executing some instruction that never returns.

How do you avoid interrupt starvation in a nested interrupt system?

I am learning about interrupts and couldn't understand what happens when there are too many interrupts to a point where the CPU can't process the foreground loop or complete the existing interrupts. I read through this article https://www.cs.utah.edu/~regehr/papers/interrupt_chapter.pdf but didn't completely understand how a scheduler would help, if there are simply too many interrupts?
Do we switch to a faster CPU if the interrupts can not be missed?
Yes, you have to switch to a faster CPU!
You have to ensure that there is enough time left for the main loop. It is therefore really important to keep your interrupt service routines as short as possible and to run some CPU workload tests.
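A common way to keep an ISR short is to have it only store the incoming data in a queue, while the main loop does the slow processing. Below is a sketch of a single-producer/single-consumer byte queue, with the ISR as producer and the main loop as consumer. The names are hypothetical. The pattern assumes a single-core MCU with atomic 32-bit loads and stores. On cores that reorder memory accesses, add the barriers your toolchain provides.

```c
#include <stdbool.h>
#include <stdint.h>

#define QLEN 16u  /* must be a power of two so the index mask works */

typedef struct {
    volatile uint8_t  buf[QLEN];
    volatile uint32_t head;  /* written only by the producer (ISR) */
    volatile uint32_t tail;  /* written only by the consumer (main loop) */
} byte_queue;

/* Called from the ISR: store the byte and return, nothing more. */
bool q_push_from_isr(byte_queue *q, uint8_t b)
{
    if (q->head - q->tail == QLEN)
        return false;                  /* full: caller can count an overrun */
    q->buf[q->head & (QLEN - 1u)] = b;
    q->head++;
    return true;
}

/* Called from the main loop, where the slow processing happens. */
bool q_pop(byte_queue *q, uint8_t *out)
{
    if (q->head == q->tail)
        return false;                  /* empty */
    *out = q->buf[q->tail & (QLEN - 1u)];
    q->tail++;
    return true;
}
```

The free-running unsigned indices wrap correctly because QLEN divides 2^32. This is why the length is restricted to a power of two.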
Indeed, any time there is contention over a shared resource, there is the possibility of starvation. The schedulers discussed in the paper limit the interrupt rate, thus ensuring some interrupt-free processing time during each interval. During high activity periods, interrupt handling is disabled, and the scheduler switches to polling mode where it interrogates the state of the interrupt request lines periodically, effectively throttling the stream of interrupts. The operating system strives to do as little as possible in each interrupt handler - tasks are often simply queued so they can be handled later at a different stage. There are many considerations and trade-offs that go into any scheduling algorithm.
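A rough sketch of that rate-limiting idea is shown below. It enforces a minimum gap between handled interrupts. When requests arrive faster, the source is masked and the code falls back to polling until the gap has passed. This is a simplified illustration in the spirit of the paper, not its actual algorithm. The names and time units are hypothetical, and the actual masking or unmasking of the hardware line is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t min_gap;       /* minimum time between handled interrupts */
    uint32_t last_handled;  /* when the last request was serviced */
    bool     have_last;
    bool     masked;        /* true while the source is disabled (polling mode) */
} irq_limiter;

/* Called at the top of the ISR. Returns true if the request may be handled now. */
bool limiter_on_irq(irq_limiter *l, uint32_t now)
{
    if (l->have_last && now - l->last_handled < l->min_gap) {
        l->masked = true;         /* too soon: disable the line, defer to polling */
        return false;
    }
    l->last_handled = now;
    l->have_last = true;
    return true;
}

/* Called periodically from the main loop or a timer. Once the gap has passed,
   re-enable the line and report whether a pending request should be serviced. */
bool limiter_poll(irq_limiter *l, uint32_t now, bool line_pending)
{
    if (!l->masked || now - l->last_handled < l->min_gap)
        return false;
    l->masked = false;            /* back to interrupt mode */
    if (line_pending) {
        l->last_handled = now;    /* service the request found by polling */
        return true;
    }
    return false;
}
```

The main loop is therefore guaranteed at least roughly `min_gap` minus one handler's duration per interval, no matter how fast the requests arrive. The trade-off is added latency for bursts.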
Overall, you need an idea of how much time each part of your program consumes. This is pretty easy to measure live with an oscilloscope. If you set a GPIO when entering the interrupt and clear it when leaving, you see not only how much time the ISR consumes but also how often it fires. If you do this for each ISR, you get a good idea of how much time they need. You can then do something similar in main() to get a rough estimate of the complete execution cycle of the program, main plus interrupts.
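Once the scope gives you each ISR's duration and rate, the CPU share they take is the sum of duration times rate. The helper below does that arithmetic. The struct and function names are illustrative.

```c
#include <stddef.h>

typedef struct {
    double duration_us;  /* GPIO high time per ISR invocation */
    double rate_hz;      /* how often the ISR fires */
} isr_measurement;

/* Fraction of CPU time consumed by all ISRs; 1.0 would mean the main loop never runs. */
double isr_cpu_load(const isr_measurement *m, size_t n)
{
    double load = 0.0;
    for (size_t i = 0; i < n; ++i)
        load += m[i].duration_us * 1e-6 * m[i].rate_hz;
    return load;
}
```

Take, for example, a hypothetical 20 µs ISR firing at 10 kHz and a 5 µs ISR firing at 1 kHz. The first alone takes 20% of the CPU (20 µs × 10 000/s = 0.2 s per second), and together they take 20.5%.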
As for the best solution, it is obviously to reduce the number of interrupts. Use polling if possible. Use DMA. Use serial peripherals (UART, CAN, etc.) that are hardware-buffered instead of interrupt-intensive ones. Use hardware PWM instead of output compare timers. And so on. These things need to be considered early on, when you pick a suitable MCU for your project. If you picked the wrong MCU, then you'll obviously have to change. Twiddling with the CPU clock sounds like a quick and dirty fix. Get the design right instead.

What mechanism is used to account CPU usage for a process, particularly `sys` (time spent in kernel)

What is the mechanism used to account for cpu time, including that spent in-kernel (sys in the output of top)?
I'm thinking about limitations here because I remember reading about processes being able to avoid having their CPU usage show up if they yield before completing their time slice.
Context
Specifically, I'm working on some existing code in KVM virtualization.
if (guest_tsc < tsc_deadline)
    __delay(tsc_deadline - guest_tsc);
The code is called with interrupts disabled. I want to know if Linux will correctly account for long busy-waits with interrupts disabled.
If it does, it would help me worry less about certain edge-case configurations which might cause long, but bounded, busy-waits. System administrators could at least notice if it was bad enough to degrade throughput (though not necessarily latency), and identify the specific process responsible (in this case, QEMU, and the process ID would allow identifying the specific virtual machine).
In Linux 4.6, I believe process times are still accounted by sampling in the timer interrupt.
/*
* Called from the timer interrupt handler to charge one tick to current
* process. user_tick is 1 if the tick is user time, 0 for system.
*/
void update_process_times(int user_tick)
So it may indeed be possible for a process to game this approximation.
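The toy model below shows why sampling can be gamed. Time is split into units, and a tick fires at the end of every `TICK_LEN` units. Each tick is charged entirely to whichever process is running at that instant, which is the idea behind `update_process_times` being called from the timer interrupt. All names here are hypothetical, and the model ignores everything else the kernel does.

```c
#define TICK_LEN 10u
#define NPROC 2

typedef int (*schedule_fn)(unsigned t);  /* which process runs during unit t */

void account(schedule_fn who, unsigned ticks, unsigned actual[NPROC], unsigned charged[NPROC])
{
    for (int p = 0; p < NPROC; ++p)
        actual[p] = charged[p] = 0;
    for (unsigned t = 0; t < ticks * TICK_LEN; ++t) {
        actual[who(t)]++;                  /* real CPU time used */
        if ((t + 1) % TICK_LEN == 0)
            charged[who(t)] += TICK_LEN;   /* the tick samples the current process */
    }
}

/* Process 0 runs for all but the last unit of every tick period, then yields;
   process 1 happens to be running whenever the tick fires. */
int sneaky_schedule(unsigned t) { return (t % TICK_LEN) == TICK_LEN - 1u ? 1 : 0; }
```

Over 100 ticks, process 0 really uses 900 of 1000 units but is charged 0. Process 1 uses 100 units and is charged all 1000.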
In answer to my specific query, it looks like CPU time spent with interrupts disabled will not be accounted to the specific process :(.

Inter Processor Interrupt usage

An educational principle is: there is no such thing as a stupid question. The basic idea behind this is that people learn by asking.
I was asked: "Can you show and explain, at the programming level, what bad things would happen if every task could execute all instructions?"
I gave this code:
int main(void)
{
    __asm__ volatile("cli");  /* privileged: disable interrupts (faults in user mode on a real OS) */
    while (1)
        ;
}
and explained it: on a uniprocessor (UP) system, the machine freezes for good.
Then I was asked: "Is it possible to give an example where the system does not freeze even though interrupts are cleared?"
I modified the previous example and gave this code:
int i;

int main(void)
{
    __asm__ volatile("cli");
    i = i / 0;  /* touches i's page first, then divides by zero */
    while (1)
        ;
}
and explained it.
Trivially: with demand paging, i=i/0 first causes a page fault (the data page is not present), so another task can be scheduled to run with interrupts enabled during the disk read; later, the divide by zero throws this task away for good.
But these answers were based on UP. What about SMP? I must admit that the answers are incomplete.
It is still easy enough to construct:
#include <unistd.h>

int i;

int main(void)
{
    for (i = 0; i < 100; i++)      /* suppose we have fewer than 100 CPUs */
        if (fork()) {
            sleep(5);              /* the child keeps forking meanwhile (most probably finishing) */
            __asm__ volatile("cli");
            while (1)
                ;
        }
    return 0;
}
which will disable interrupts for all CPUs, because every CPU gets a poisonous task to run.
Even so far, a "stupid" question has revealed many things worth learning for a beginner: privileged instructions, paging, fault handling, scheduling during DMA, fork, and so on.
But a minor doubt remains (shame on me) about the first program running on an SMP system.
Will one CPU be out permanently or not?
The other CPUs continue and can send a re_schedule() IPI message.
What happens then?
It is easy to speculate that the frozen CPU does not wake up, because interrupts are disabled.
But to be perfectly sure, I must know more.
My question was:
Is the Inter Processor Interrupt (IPI) maskable or non-maskable?
I mean in the most common "popular" implementations?
Excuse my stupid question. It can't be very difficult to find an answer. I will seek it.
I mean the interrupt pin number (which, I guess, tells whether it is maskable).
My own answer - correct?
I studied the issue, since nobody else seemed to, and came to the following thoughts:
Important real-time applications have long used a watchdog timer (hardware that interrupts the CPU, which must somehow answer "I am alive").
For example, we have a main control computer and a standby computer that takes over the system if the main computer goes down.
What about Linux?
What kind of watchdog does it have, if any?
We can compile the Linux kernel with or without watchdog.
What does the Linux watchdog do?
On many(!) x86/x86-64 type hardware there is a feature that enables us to generate 'watchdog NMI interrupts'.
It's even possible to disable the NMI watchdog in run-time by writing "0" to /proc/sys/kernel/nmi_watchdog.
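A small sketch for checking that setting from C reads the integer stored in the sysctl file. The helper name is hypothetical. It returns -1 when the file is missing, for example on a kernel built without the watchdog or on a non-Linux system.

```c
#include <stdio.h>

/* Read a single integer from a procfs-style file such as
   /proc/sys/kernel/nmi_watchdog; returns -1 if missing or unreadable. */
int read_int_file(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int value = -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

Usage: `read_int_file("/proc/sys/kernel/nmi_watchdog")` returns 1 when the NMI watchdog is enabled and 0 when it is disabled. Writing "0" to that file requires root.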
If any CPU in the system does not execute the periodic local timer interrupt for more than 5 seconds, the APIC tries to fix the situation with a non-maskable interrupt (the CPU executes the handler, which kills the process)!
(SCC Linux is a different case with respect to NMI.)
My answers (in the original question) were based on the system without watchdog!
It is problematic to answer at a general level and give examples based on one fixed system. The answers can be correct or not depending on the CPU, the configuration and the settings.
Anyway, did talking about NMI make some sense?
If the CPU didn't restrict access to some instructions, it would be too easy to accidentally or deliberately cause a catastrophe.
push $0
push $0
lidt (%esp)
int $42
This code sequence will reset an x86 processor. Here's why:
The code loads the IDTR register with an interrupt descriptor table (IDT) at linear address 0, with a size of one byte.
Raises interrupt 42, which can't work because it is beyond the 1-byte limit of the IDT.
The CPU tries to raise a general protection fault, interrupt 13. This fails too, because interrupt 13 is beyond the one byte limit.
The CPU tries to raise a double fault exception, interrupt 8. This fails too, because interrupt 8 is beyond the limit of the IDT.
This is known as a triple-fault. The CPU does a shutdown bus cycle to tell the motherboard that it is now ignoring everything and stopping execution. The motherboard asserts reset, rebooting the machine.
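The escalation can be captured in a toy model, which is not a CPU emulator. In protected mode each IDT gate is 8 bytes, so vector v is usable only if bytes v*8 through v*8+7 lie within the IDT limit. The limit is size minus one, so a limit of 0 is the one-byte table above. Real delivery also checks gate types and privileges and applies the full exception-class rules, all of which this model ignores. The names are hypothetical.

```c
#include <stdint.h>

typedef enum { DELIVERED, GP_DELIVERED, DF_DELIVERED, TRIPLE_FAULT } delivery;

static int gate_fits(uint16_t idt_limit, uint8_t vector)
{
    return (uint32_t)vector * 8u + 7u <= idt_limit;
}

delivery raise_interrupt(uint16_t idt_limit, uint8_t vector)
{
    if (gate_fits(idt_limit, vector))
        return DELIVERED;
    if (gate_fits(idt_limit, 13))  /* general protection fault */
        return GP_DELIVERED;
    if (gate_fits(idt_limit, 8))   /* double fault */
        return DF_DELIVERED;
    return TRIPLE_FAULT;           /* shutdown cycle; the motherboard asserts reset */
}
```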
This is actually negligible compared to what code could do. A code sequence could easily hijack the machine altogether and start destroying all of the data on the hard drive, it could send all of your files to a malicious server on the internet, it could change your password, enable remote access, connect out to a malicious server and grant an attacker unlimited shell access. There's no limit on what a program could do.
Processors have privileged instructions for two reasons. The primary purpose is to protect the operating system from buggy programs that might accidentally bring down or hijack the whole machine. The secondary purpose is to restrict deliberately malicious programs from doing the same.

Using only free CPU time with Objective-C program

The BOINC client (which does distributed processing jobs, as SETI@home does) is able to turn processing on or off based on whether other processes are using a certain percentage of CPU time. That is, if the user starts to do some work and their processes start using 60% CPU, BOINC can pause to avoid interfering with the user's work.
I would like to do the same thing (monitor CPU usage by other processes). The difficulty as I see it is not monitoring CPU usage, but rather making sure that the information isn't skewed by my own usage. For example, if my process is using a ton of CPU time it may prevent another process from using enough to trigger the pause.
Can someone point me in the right direction? Even a suggestion for what to search for would be useful. I'm not really sure what this feature would be called.
You can use NSTask to set the 'nice' value of the process when your process starts.
Also, [[NSThread mainThread] setThreadPriority:0.0], where the priority value is between 0.0 and 1.0, is a Cocoa API which may save you frakking about with sudo.
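As a plain POSIX alternative, a process can lower its own priority directly with `setpriority`. Raising your own nice value needs no privileges. This only makes your work yield to other processes. It does not by itself measure their CPU usage. The helper names are hypothetical.

```c
#include <errno.h>
#include <sys/resource.h>

/* Raise this process's nice value (lower its priority) so the scheduler favors
   other processes whenever they want the CPU. Returns 0 on success, -1 on error. */
int run_in_background(int nice_value)
{
    return setpriority(PRIO_PROCESS, 0, nice_value);
}

/* getpriority() can legitimately return -1, so clear errno first and check it after. */
int current_nice(int *out)
{
    errno = 0;
    int n = getpriority(PRIO_PROCESS, 0);
    if (n == -1 && errno != 0)
        return -1;
    *out = n;
    return 0;
}
```

Calling `run_in_background(19)` at startup gives roughly the effect of launching the program under `nice -n 19`. Lowering the nice value again afterwards generally requires privileges.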