When to use a windowed watchdog for embedded systems

This post is not about how to use one, but when.
There is a lot of documentation about windowed watchdogs (WW), and most microcontrollers already include one. Every vendor states that WWs are meant for safety applications, but no one says much more about the topic.
I would like to be pointed to specific examples, but examples that are a little more detailed than "for a car's brake system".
We all know that a WW must be fed neither too early nor too late, but how does this scenario help improve safety?
Thank you!!

The overall point of a Watchdog is to ensure that the firmware is executing as expected. The theory is that if your firmware can periodically kick the watchdog, then the other functions it is responsible for are also happening.
From a system-design perspective, they're the last level of fail-safe. It's basically saying "we don't know what the system is doing, because it's not able to kick the watchdog. So, reset the device and hope the problem goes away."
They can protect you from accidental infinite loops, stack corruption, RAM bit flips, etc.
A windowed watchdog is a better solution than a single-sided watchdog because the window can protect against more failure modes. For example, with a single-sided watchdog, if the loop you're stuck in includes the watchdog kick, you'd never know you had a problem. With a windowed watchdog, you have a better chance of triggering a reset because of the likelihood of kicking too fast.
So, to answer your question: you'd use a windowed watchdog any time you wanted to be reasonably sure that the firmware is doing what it is supposed to, or to fall back to a safe state if it's not. They are generally emphasized in safety systems, but all embedded devices can benefit from their use. (For example, a house thermostat is not considered a safety-critical system, but if it completely locks up and requires someone to remove the batteries to restart it, that would be an annoyance.)
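A minimal sketch of how the feed logic often ends up looking, assuming a hypothetical MCU whose windowed watchdog is reloaded by writing a key to a register (the register name, address, and task flags below are made up for illustration):

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical windowed-watchdog reload register -- check your MCU's
 * reference manual for the real names, addresses, and window setup. */
#define WWDG_RELOAD_REG  (*(volatile uint32_t *)0x40002C00u)
#define WWDG_RELOAD_KEY  0x7Fu   /* writing this reloads the down-counter */

/* Each monitored activity sets its flag once per control cycle. */
static volatile bool sensors_done;
static volatile bool control_done;

/* Called at the end of every main-loop cycle. The watchdog is kicked only
 * when every monitored task has checked in. If the loop runs too fast, the
 * kick lands while the window is still closed and the hardware resets; if a
 * task hangs, the kick never happens, the counter times out, and the
 * hardware also resets. */
void watchdog_service(void)
{
    if (sensors_done && control_done) {
        WWDG_RELOAD_REG = WWDG_RELOAD_KEY;
        sensors_done = false;
        control_done = false;
    }
}
```

The important design point is that the kick is tied to real work completing at the expected rate, not placed somewhere (such as a timer interrupt) that keeps running even when the main loop is stuck.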

Related

How to interpret Hardware watchdog exceptions on an ESP chip?

For one of our projects we see a hardware watchdog reset on roughly 0.1% of our devices each day, resulting in many unwanted hardware resets.
We are trying to figure out what causes this hardware watchdog reset, but have failed to find anything relevant in our code that would result in this behavior.
We are using version 2.4.2 of the Arduino core. We are not sure how long the problem has affected our solution, since we had other issues that have now mostly been resolved.
Luckily our devices send us their reboot reason when they reconnect; we are receiving the following:
ResetReason=Hardware Watchdog;ResetInfo=Fatal exception:4 flag:1 (WDT)
epc1:0x40102329 epc2:0x00000000 epc3:0x00000000 excvaddr:0x00000000
depc:0x00000000;
After running this through the EspStackTraceDecoder we ended up with:
0x40102329: wDev_ProcessFiq at ??:?
A search of various projects that have asked similar questions showed that most involve a DNS query, but not all of them, so it seems to be a more general issue?
What additional information could we extract that might help us identify the issue?
Some additional information:
Memory is stable and we have ~15-17 KB of free heap, depending on the mode and the amount of data in the send/receive queues.
Our side of the code uses yield(), delay(), etc., so the software watchdog should always be fed. This also applies to the async callback code.
Check whether you are doing any invalid memory reads. The main purpose of the hardware WDT is to trigger a reset if the software or CPU is no longer making progress.
Your CPU might have become stuck while executing some instruction and never returned.

Permanent DOS Attacks - Anyone Knowledgeable?

So, I'm looking into Permanent DOS attacks for a class, and I'm having a hard time coming up with concrete examples. There's a lot of information about Phlashing (flashing firmware to either brick the device, or put malicious firmware in its place, for those of you who don't know the term) but I'd like to have a broader set of examples.
That being said, there has to be a way to write code that will do something like wear out disk arms, right? Something that will have the disk seek to the end of the disk, then back to the front, on and on. Anyone have an example of how that would be accomplished? Is there some way to specify where to track to on a disk in C (similar to traversing to a certain point in a file, but for the entire HDD!)? If not, I guess there's always trying to force a file's location on the disk... which seems like less fun trying to accomplish. Again, can you do something like that programmatically?
If anyone has any insight into these types of attacks, or any good resources for me to check into, I'd appreciate it. Maybe you read a story about it on Slashdot a few years back? Let me know! The more info I can gather, the less likely I'll be forced to kill time during my talk by bricking my router in the class :) I'm not made of money OR routers!
Seems like these would primarily be limited to physical attacks and social engineering ("To enable your computer's hidden turbo function, remove the cover and pry this part."). But:
Adjust screen refresh rates to insane values to blow older CRTs
Monkey with ACPI fan, charge, or battery controls if possible to cause overheating or battery failure.
Overwrite every rewritable storage device of every kind attached to any bus. Discover and overwrite any IDE, USB, etc... device you know the flash updater details for.
Of course nothing is permanent. You can replace the hard drive, BIOS chips, CPU, motherboard, memory, etc...
Although it is mostly fictional, the Halt and Catch Fire instruction would be a very convenient and permanent DoS attack.
Steve Gibson (google his name) has a paper he wrote a few years back about protocol-level vulnerabilities in TCP/IP. Some of it is still pertinent today.
Socially engineer the power company or ISP to turn off service at the location in question.
Many devices in the computer today have their own firmware, including but not limited to the CPU, DVD drive, HDD, VGA card, and motherboard (BIOS). Most of these devices also have a way of updating their respective firmware, which can also be used to brick them pretty efficiently. This does, however, require an individual approach to every device, often using privileged instructions and undocumented interfaces.
It's possible for a virus to do this. I seem to recall an actual virus doing this back in the day, but can't find anything to back that up.
I was able to find an article where the author has a conversation with a VP from Western Digital, wherein he states that a program could potentially access a hard drive's firmware, causing such a DoS attack:
There are back doors if you will that allow us to get into places that the operating system can't go through the IDE connector
There used to be a few viruses that could cause old CRT monitors to break. They could send invalid sync signals out the VGA port that were too high in frequency for the video sweep. I also remember a few that would use bad-sector flagging to draw images on the old versions of Scandisk (we are talking early 90s or older). I don't remember any of the names or have any references, but they used to be quite annoying.
Fortunately, better circuits, memory protection, and API abstraction have made such attacks very difficult, if not impossible.

Determining failing sectors on portable flash memory

I'm trying to write a program that will detect signs of failure for portable flash memory devices (thumb drives, etc).
I have seen tools in the past that are able to detect failing sectors and other kinds of trouble on conventional mechanical hard drives, but I fear that flash memory does not have the same kind of predictable low-level access to the hardware due to the internal workings of the storage. Things like wear-leveling and other block-remapping techniques (to skip over 'dead' sectors?) lead me to believe that determining if a flash drive is failing will be difficult at best, if not impossible (short of having constant read failures and device unmounts).
Flash drives at their end-of-life should be easy to detect (constant CRC discrepancies during reads and all-out failure). But what about drives that might be failing early? Are there any tell-tale signs like slower throughput speeds that might indicate a flash drive is going to fail much sooner than normal?
Along the lines of detecting potentially bad blocks, I had considered attempting random reads/writes to a file close to or exactly the size of the entire volume, but even then is it possible that the drive might report sizes under its maximum capacity to account for 'dead' blocks?
In short, is there any way to circumvent or at least detect (algorithmically or otherwise) the use of block-remapping or other life extension techniques for flash memory?
Let me end this question by expressing my uncertainty as to whether or not this belongs on serverfault.com. This is definitely a hardware-related question, but I also desire a software solution - preferably one that I can program myself.
If this question is misplaced, I will be happy to migrate it to serverfault - but I do need a programming solution. Please let me know if you need clarification :)
Thanks!
It would be interesting to see whether badblocks can help in this case.
AFAIK, wear leveling happens at the firmware level. The hardware does not know about a bad block until the firmware detects one.
And there is no known way to find these bad sectors beforehand. BTW, I guess it is not bad sectors but bad blocks: once a sector is bad, the whole block is marked as bad...
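As a rough illustration of the write-and-verify idea raised in the question, here is a minimal sketch, assuming the drive is mounted as an ordinary volume and that a reproducible pseudorandom pattern is enough to detect read-back errors (the path and sizes below are made up):

```c
#include <stdio.h>
#include <stdlib.h>

#define CHUNK (64 * 1024)

/* Writes a reproducible pseudorandom pattern to a test file on the flash
 * drive, reads it back, and counts mismatching bytes. A rising mismatch
 * count over repeated runs would be one (crude) sign of trouble. */
int main(void)
{
    const char *path = "/media/usbstick/testfile.bin"; /* hypothetical mount */
    const long total = 64L * 1024 * 1024;              /* 64 MB test area    */
    static unsigned char buf[CHUNK], check[CHUNK];
    long written = 0, mismatches = 0;

    FILE *f = fopen(path, "wb");
    if (!f) { perror("open for write"); return 1; }
    srand(12345);                      /* fixed seed: pattern is reproducible */
    while (written < total) {
        for (size_t i = 0; i < CHUNK; i++) buf[i] = (unsigned char)rand();
        if (fwrite(buf, 1, CHUNK, f) != CHUNK) { perror("write"); break; }
        written += CHUNK;
    }
    fclose(f);

    f = fopen(path, "rb");
    if (!f) { perror("open for read"); return 1; }
    srand(12345);                      /* regenerate the same pattern */
    for (long done = 0; done < written; done += CHUNK) {
        for (size_t i = 0; i < CHUNK; i++) buf[i] = (unsigned char)rand();
        if (fread(check, 1, CHUNK, f) != CHUNK) { perror("read"); break; }
        for (size_t i = 0; i < CHUNK; i++)
            if (buf[i] != check[i]) mismatches++;
    }
    fclose(f);

    printf("%ld bytes checked, %ld mismatches\n", written, mismatches);
    return 0;
}
```

Note that on a real system the OS page cache may satisfy the reads from RAM, so you would need to flush caches, remount, or open the file with O_DIRECT for the reads to actually exercise the flash; and as the answers point out, wear leveling still decides which physical blocks get touched.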

What Skill set should a low level programmer possess?

I am an embedded software engineer with less than 3 years of experience. I aim to "sharpen the saw" continuously. I was wondering if there was anything specific to low-level programming that C/C++ coders should be proficient with.
What comes to my mind is familiarity with the hardware's architecture and instruction set. Knowing how to fiddle with bits is also important; resource management and performance have been part of my job. Is there anything else?
EDIT: I work with an in-house customized RTOS, not embedded Linux.
I see a lot of high-level operating system answers here, but you specifically said low-level.
Some scattered thoughts:
Design for test. As you work through a problem only change one thing at a time per test.
You need to understand buses and interfaces: SPI, I2C, USB, Ethernet, etc. The number one interface, today, yesterday, and tomorrow, is the UART (serial).
The steps involved in programming a flash.
Tricks to avoid making the product easily brickable.
Bootloaders in general.
Bit-banging the above interfaces on various families of parts (different chip vendors have different ideas about I/O pins, pull-ups, direction controls, etc.).
Board and chip bring-up: you certainly never want to boot a program of many tens of thousands of lines of code on the first power-up (think LED on, LED off).
How to debug a product without using too much test equipment (logic analyzers and scopes). At the same time, you have to learn to use a scope for debugging; you are far more valuable if you don't HAVE TO have a tech or engineer in the lab with you.
How would you reprogram the unit in the field? What would you do to minimize human error when allowing the user to field-upgrade the unit? Remember field downgrades as well.
What would you do to discourage hacking (binaries, etc).
Efficient use of the flash/ROM (don't wear out one bank or section; spread the wear around, or see if the flash is doing it for you).
How and when to use a watchdog timer.
State machines: very useful with byte streams (serial and Ethernet). Design packet structures that stream well and are tailored to a state machine, and that have a header and checksum or other structure that ensures you do not interpret partial packets or random data as a good packet (see the sketch after this list).
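A minimal sketch of that last point, assuming a made-up packet format of sync byte, length, payload, and a simple additive checksum:

```c
#include <stdint.h>

/* Hypothetical packet layout: 0xA5 sync, 1-byte length, payload, 1-byte
 * checksum (sum of length + payload). Fed one byte at a time from a UART
 * ISR or a network stream. */
#define SYNC_BYTE   0xA5
#define MAX_PAYLOAD 64

typedef enum { WAIT_SYNC, WAIT_LEN, WAIT_DATA, WAIT_SUM } rx_state_t;

static rx_state_t state = WAIT_SYNC;
static uint8_t    payload[MAX_PAYLOAD];
static uint8_t    length, idx, sum;

static void handle_packet(const uint8_t *data, uint8_t len)
{
    (void)data; (void)len;   /* application-specific processing goes here */
}

void rx_byte(uint8_t b)
{
    switch (state) {
    case WAIT_SYNC:
        if (b == SYNC_BYTE) state = WAIT_LEN;
        break;
    case WAIT_LEN:
        if (b == 0 || b > MAX_PAYLOAD) { state = WAIT_SYNC; break; }
        length = b; idx = 0; sum = b;
        state = WAIT_DATA;
        break;
    case WAIT_DATA:
        payload[idx++] = b; sum += b;
        if (idx == length) state = WAIT_SUM;
        break;
    case WAIT_SUM:
        if (b == sum) handle_packet(payload, length);  /* good packet      */
        state = WAIT_SYNC;                             /* resync otherwise */
        break;
    }
}
```

A dropped or corrupted byte costs at most a packet or two: the parser simply falls back to hunting for the next sync byte, and the checksum rejects most data that merely looks like a packet.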
Specific concepts like:
Endianness (this link is to an old but good Linux Journal article; see the sketch after this answer)
Effective use of multithreading architectures (the Embedded site is good in general)
Debugging embedded and multithreaded systems
Understand, Learn and Follow good programming techniques (the link is very old and the point very generic and subjective, but think about it)
Other things (this IBM page on embedded linux sums up most of the other points I want to make)
One more thing -- never underestimate testing! Or planning test cases!!
Use the reference links I give as starting points for the concepts; please follow up further for deeper knowledge.
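A small illustration of the endianness point above: detecting host byte order at runtime and converting a 32-bit value (all names are chosen just for this example):

```c
#include <stdint.h>
#include <stdio.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    return *(const uint8_t *)&probe == 1;   /* low byte stored first? */
}

/* Byte-swap a 32-bit value; needed when the wire format and host disagree. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

int main(void)
{
    uint32_t wire_value = 0x12345678u;   /* say the protocol is big-endian */
    uint32_t host_value = host_is_little_endian() ? swap32(wire_value)
                                                  : wire_value;
    printf("host is %s-endian, value = 0x%08X\n",
           host_is_little_endian() ? "little" : "big", (unsigned)host_value);
    return 0;
}
```

In production code you would normally reach for htonl()/ntohl() or compiler byte-swap builtins rather than hand-rolled shifts, but the point is to know what they are doing.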
I'd study the electronics of the actual chips. Learn how they work internally (such as architecture), interface with peripherals, electrical and timing characteristics, etc.
Basically, read the data sheet start to finish a few times and dig into anything you've not seen/used before.
By the way, what chips do you work with?
Similar to what Brian said, learn how to create unit tests and automated builds.
These skills are good for software engineers at all levels to be proficient in. They will help improve the quality of your code while also making it easier to refactor and improve the code base.
If you haven't yet, I think every software engineer should read The Pragmatic Programmer and Code Complete. I know these are not specific to low-level programming, but they hold a large wealth of knowledge that applies to all sub-disciplines.
Great familiarity with pointers, with the checks these languages don't do for you (like buffer overflows and the like), and with digital electronics. Operating system internals might also help.
Get to know how stuff is represented internally, especially ready-made data structures (supposing you won't build your own).
Above all, practice a lot. Doing it brings much more to you than just reading about it ;)
Bit operations (see the sketch after this answer)
Processor architectures (caches, etc.)
WCET analysis
Scheduling
Edit: What I forgot to mention is model-based development.
Today, control algorithms are often implemented as some kind of automaton from which C code is generated afterwards.
Commercially available tools include, for example, MATLAB/Simulink, ASCET, and SCADE.
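A tiny sketch of the bit-operations item above: the classic set/clear/toggle/test idioms on a memory-mapped register (the register address and bit positions are made up):

```c
#include <stdint.h>

/* Hypothetical memory-mapped GPIO data register. */
#define GPIO_DATA  (*(volatile uint32_t *)0x40020014u)

#define LED_PIN    5u
#define BUTTON_PIN 7u

void bit_ops_demo(void)
{
    GPIO_DATA |=  (1u << LED_PIN);              /* set bit 5    */
    GPIO_DATA &= ~(1u << LED_PIN);              /* clear bit 5  */
    GPIO_DATA ^=  (1u << LED_PIN);              /* toggle bit 5 */

    if (GPIO_DATA & (1u << BUTTON_PIN)) {       /* test bit 7   */
        /* button pressed */
    }

    /* Read-modify-write a 3-bit field at bits 8..10. */
    GPIO_DATA = (GPIO_DATA & ~(0x7u << 8)) | (0x5u << 8);
}
```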
Get yourself a copy of the MISRA-C book. It was originally written by members of the automotive industry, and attempts to make software written in C more robust by applying a number (quite a large number!) of rules and guidelines.
Then, buy PC-Lint (or another static analysis tool) to check your code for MISRA and other rules.
These are particularly relevant to low-level and embedded C, as between them they deal with the causes of a lot of bugs in such software, such as issues relating to pointers, memory leaks, integer promotion (there's a whole chapter on that in the MISRA book), endianness, and undefined behaviour.
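As a small illustration of the integer-promotion pitfalls that MISRA and static analysis tools are concerned with (this example is not taken from the MISRA book itself):

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t a = 0x80;
    uint8_t b = 0x80;

    /* The operand is promoted to int before the shift, so the result is
     * 0x8000 rather than wrapping around in 8 bits. */
    if ((a << 8) == 0x8000) {
        printf("promoted to int: no 8-bit wrap-around\n");
    }

    /* ~b is also evaluated as int: ~0x80 is 0xFFFFFF7F (i.e. -129), which
     * only equals the intended 8-bit value 0x7F after a cast back down. */
    if ((uint8_t)~b == 0x7F) {
        printf("cast restores the intended 8-bit result\n");
    }
    return 0;
}
```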
Good question. Some that haven't been mentioned...
Learn your various options for achieving low-level multitasking. From basic round-robin (non-preemptive) schedulers, with timing ticks from a hardware timer, up to a preemptive RTOS. Learn why you might need an RTOS, and why you might not. If you use an RTOS, learn that beginners with a PC background probably tend to want to create too many tasks.
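A minimal sketch of the non-preemptive round-robin idea, assuming a hardware timer ISR (not shown) that increments a millisecond tick; all the names here are hypothetical:

```c
#include <stdint.h>
#include <stddef.h>

/* Incremented by a 1 ms hardware timer interrupt (not shown). */
static volatile uint32_t g_ticks_ms;

typedef struct {
    void     (*run)(void);   /* task body: must return quickly    */
    uint32_t period_ms;      /* how often the task should run     */
    uint32_t last_run;       /* tick count at the last invocation */
} task_t;

static void read_sensors(void)   { /* ... */ }
static void update_control(void) { /* ... */ }
static void send_telemetry(void) { /* ... */ }

static task_t tasks[] = {
    { read_sensors,    10, 0 },
    { update_control,  50, 0 },
    { send_telemetry, 500, 0 },
};

int main(void)
{
    for (;;) {
        for (size_t i = 0; i < sizeof tasks / sizeof tasks[0]; i++) {
            uint32_t now = g_ticks_ms;
            if ((uint32_t)(now - tasks[i].last_run) >= tasks[i].period_ms) {
                tasks[i].last_run = now;
                tasks[i].run();          /* cooperative: no preemption */
            }
        }
    }
}
```

The unsigned subtraction keeps the period check correct across tick-counter wrap-around, and because nothing preempts anything, each task simply has to return quickly.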
Getting visibility into the internals for debugging can be a challenge. There's no screen typically, so no throwing in "printf" calls wherever you want. An emulator or JTAG interface is ideal--you can set breakpoints and step through your program (as long as halting the micro doesn't make hardware go crazy, like swinging a robot arm around at full speed!). If emulator/JTAG is not available, learn how to use a spare serial port (or maybe even bit-bang a pin to make a serial port) for a debug channel, with some simple memory peek/poke commands.

Is "the optimized delay" a myth or is it real?

From time to time you hear stories that are meant to illustrate how good someone is at something, and sometimes you hear about the guy who is so into code optimization that he optimizes his delay loop.
Since this really sounds like a strange thing to do (it's much better to set up a timer interrupt than an optimized busy wait), and since nobody ever tends to tell you the name of the optimizing hacker, I've been left wondering whether it is an urban myth or whether it is real.
What do you say, reality or fiction?
Thanks
Johan
Update: It sounds like ShuggyCoUk was on to something; I wonder if we can find an example.
Update: Just a little clarification: this question is about the "delay" function itself and how it is implemented (not how and where you call it), what its purpose was, and how that made the system better.
Update: It's no myth; those guys seem to exist.
Thanks
ShuggyCoUk
This has more than a kernel of truth about it...
A spin wait can be much better than a signal-based interrupt or a yield.
You trade some throughput for much reduced latency.
Often this is vitally important within an OS itself.
You allow yourself the freedom to do operations not possible within an interrupt handler, memory allocation for example.
You can get considerably finer grained control of the interval waited since you can essentially measure the cycle count.
However spin waits are tricky to get right.
If you can, you should use proper idle instructions, which:
can power down parts of the core, improving power usage/heat dissipation and even allowing other cores to go faster.
on hyper-threaded CPUs allow the other logical thread to use the full CPU pipeline while you spin.
An instruction you might think is a no-op can still be executed out of order by the superscalar execution units, and the resulting code may hit unforeseen out-of-order artefacts that force the CPU into unwanted stalls and memory barriers.
This is why you let someone else write the spin-wait loop for you in most cases:
In Linux there is the cpu_relax macro
on arm this is barrier()
on x86 this is rep_nop()
In Windows there is YieldProcessor
Accessible in .Net via Thread.SpinWait
OS X eschews providing a standard implementation unless you are in the kernel; see this document and note that it encourages the use only of lck_spin_t.
As to some citations of using PAUSE for spin waits:
PostgreSQL
Linux
See also the note that this is better on non-P4 processors as well, due to reduced power consumption.
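For illustration, here is a minimal user-space sketch of a spin wait around an atomic flag using the x86 PAUSE hint via GCC/Clang inline assembly (on other architectures you would substitute the equivalent hint, such as YIELD on ARM):

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool data_ready = false;

/* Architecture-specific "relax" hint; PAUSE tells the CPU we are spinning,
 * reducing power and letting a hyper-threaded sibling use the pipeline. */
static inline void cpu_relax_hint(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("pause");
#else
    __asm__ __volatile__("" ::: "memory");  /* plain compiler barrier */
#endif
}

/* The producer (another thread or an ISR) eventually does:
 *     atomic_store_explicit(&data_ready, true, memory_order_release);
 */

void wait_for_data(void)
{
    while (!atomic_load_explicit(&data_ready, memory_order_acquire))
        cpu_relax_hint();     /* busy-wait: low latency, but burns a core */
}
```

As the answer notes, production code should normally use the platform's own primitive (cpu_relax, YieldProcessor, Thread.SpinWait) rather than a hand-rolled loop like this.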
The version I've always heard is of a group of hardware programmers who developed a special instruction that optimised the idle (not busy) loop of their operating system. This is mentioned in Kernighan & Pike's book The Practice Of Programming, but even there they admit it may be an Urban Myth.
I've heard stories of programmers who intentionally put in long delay loops early in projects and removed them later as "optimizations" to impress management. Never figured out if the stories were apocryphal or not.