I am talking about a system which uses an ARM Cortex-M3. The code I am referring to is firmware. The user sends commands to the firmware to do a particular job, and the firmware calls specific software interrupt handlers to do the task corresponding to the command being sent. I know that the software interrupt handlers are listed in the interrupt vector table, but how does a command issued by the user, e.g. erase, result in a software interrupt being called inside the firmware to do the erase operation?
The software interrupt is an instruction (SWI, called SVC in newer ARM documentation; same instruction). When it executes, the logic in the processor switches to whatever mode is correct (supervisor on a full-sized ARM, handler mode on a Cortex-M3) and begins executing, kind of like a jump, at the address indicated by the vector table. The software there handles the command: something you set up before executing the software interrupt instruction tells what is, or what is effectively, the operating system which system call to do for you.
At the application layer, the library code that is linked into the application makes the system call: it takes parameters from the application, sets up the appropriate information for the software interrupt, performs the software interrupt, gathers the results when the interrupt returns, and cleans up.
EDIT.
All of the vectors in the vector table work this way, even reset. The logic knows when an event has occurred: an interrupt, a data abort, an undefined instruction, etc. The logic is hardwired to go to a specific address, read the value there, which is itself an address, and then start executing at that handler address.
The swi/svc is just another "event", but one that we create directly rather than by executing an undefined instruction or making an unaligned access. All of these do basically the same thing: they trigger an event, normal execution stops, machine state may or may not be saved (on a Cortex-M3 some of it is, though that may depend on the event), and the handler executes. (The M3 does not have supervisor vs. user modes; that is the full-sized ARM.) The svc/swi is the one we want to create; an undefined instruction we could create, but normally don't want to. Hardware interrupts are not that much different, except that we did not insert an instruction to cause them; other logic causes them based on some event in that logic.
In all cases there is code that we (the programmers) have to write for each event we care to handle (well, each that we need to handle, covering all the ones that might happen), one of which might be the svc/swi event. Within that handler, ARM does not define what you call the system functions or how they are defined. ARM may have a set that they use, but you are technically free to create any mechanism and any set of system calls you want; you just have to make sure the caller and the callee agree on the definition and on who is responsible for what.
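To make the path from "user command" to "handler" concrete, here is a rough host-side simulation in C. It is only a sketch: on a real Cortex-M3 the trap would be the svc instruction and the vector table would sit at a fixed address, whereas here a plain array of function pointers stands in for the table. Every name in it (sys_erase, sys_write, svc_trap, svc_frame, fake_flash, the call numbers) is made up for illustration and is not any vendor's API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical call numbers; caller and callee only have to agree on them. */
enum { SYS_ERASE = 0, SYS_WRITE = 1, SYS_COUNT };

/* What the library stub packs before trapping (stands in for r0-r3). */
typedef struct {
    uint32_t arg0;
    uint32_t arg1;
    int32_t  result;
} svc_frame;

#define FLASH_SECTORS 4u
#define SECTOR_BYTES  16u
static uint8_t fake_flash[FLASH_SECTORS][SECTOR_BYTES]; /* simulated flash */

static void do_erase(svc_frame *f) {
    if (f->arg0 >= FLASH_SECTORS) { f->result = -1; return; }
    for (size_t i = 0; i < SECTOR_BYTES; i++) fake_flash[f->arg0][i] = 0xFF;
    f->result = 0;
}

static void do_write(svc_frame *f) {
    if (f->arg0 >= FLASH_SECTORS) { f->result = -1; return; }
    fake_flash[f->arg0][0] = (uint8_t)f->arg1;
    f->result = 0;
}

/* Service table inside the handler, indexed by call number. */
typedef void (*service_fn)(svc_frame *);
static const service_fn svc_table[SYS_COUNT] = { do_erase, do_write };

/* Stand-in for the SVC exception handler. */
static void svc_handler(uint32_t number, svc_frame *f) {
    if (number < SYS_COUNT) svc_table[number](f);
    else f->result = -1;
}

/* Stand-in for the vector table slot the core reads the handler address from. */
typedef void (*vector_fn)(uint32_t, svc_frame *);
static const vector_fn vector_table[1] = { svc_handler };

/* Stand-in for executing the svc instruction. */
static void svc_trap(uint32_t number, svc_frame *f) {
    vector_table[0](number, f);
}

/* Library stubs the application calls: pack, trap, return the result. */
int32_t sys_erase(uint32_t sector) {
    svc_frame f = { sector, 0u, 0 };
    svc_trap(SYS_ERASE, &f);
    return f.result;
}

int32_t sys_write(uint32_t sector, uint8_t value) {
    svc_frame f = { sector, value, 0 };
    svc_trap(SYS_WRITE, &f);
    return f.result;
}
```

The code that parses the user's "erase" command would simply call sys_erase(sector). Everything below that call is the mechanism described above: set up parameters, trap, let the handler pick the service, hand back the result.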
It might be a stupid question where the answer is "no", but is that theoretically possible? I wonder why not?
And I don't know what for...
There are several different types of "interrupt handlers".
The first, the hardware IRQ handlers, are modified when the OS loads drivers and such.
The second, the software interrupt handlers, are used to call OS level services in modern OSes.
Those two kinds have hardware support (either across the entire computer, or within the processor).
A third kind, without hardware support, are "signal handlers" (in UNIX), which are basically OS-level and relate to OS events.
The common concept between them is that the responses are programmable. The idea is that you know how you want your software/OS to respond to them, so you add the code necessary to service them. In that sense, they are "modifiable at runtime".
But there are rules as to what to do in these things. Primarily, you don't want to waste too much time handling them, because whatever you do with them prevents other interrupts (of the same or lower priority) from occurring while you're processing them. (For example, you don't want to be in the middle of handling one interrupt and get another interrupt for the same thing before you finish handling the first one, because an interrupt handler can do things that would otherwise require a lock (loading and incrementing the current or last pointers on a ring queue, for example) and would clobber the state if it re-entered.)
So, typically interrupt handlers do the least of what they need to do, and set a flag for the software to recognize that processing on that needs to be done once it gets back out of interrupt mode.
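A minimal sketch of that pattern in C, written so it can be exercised on a host. The names (uart_rx_isr, queue_pop, process_pending) are hypothetical, and on real hardware the ISR would be hooked into the vector table and would read the byte from a data register instead of taking it as an argument. Here "head differs from tail" plays the role of the flag.

```c
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_SIZE 8u /* power of two, so indices wrap with a mask */

static volatile uint8_t queue[QUEUE_SIZE];
static volatile uint8_t head; /* written only by the ISR */
static volatile uint8_t tail; /* written only by the main loop */

/* ISR side: store the byte and get out. No parsing, no waiting. */
void uart_rx_isr(uint8_t byte) {
    uint8_t next = (uint8_t)((head + 1u) & (QUEUE_SIZE - 1u));
    if (next != tail) {      /* queue full: drop the byte */
        queue[head] = byte;
        head = next;
    }
}

/* Main-loop side: fetch one queued byte, if there is one. */
bool queue_pop(uint8_t *out) {
    if (tail == head) return false;
    *out = queue[tail];
    tail = (uint8_t)((tail + 1u) & (QUEUE_SIZE - 1u));
    return true;
}

/* The slow work happens here, outside interrupt context.
 * Summing the bytes is just a placeholder for real processing. */
uint32_t process_pending(void) {
    uint32_t sum = 0;
    uint8_t b;
    while (queue_pop(&b)) sum += b;
    return sum;
}
```

Because the ISR only ever writes head and the main loop only ever writes tail, this particular queue needs no lock as long as there is exactly one producer and one consumer.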
Historically, DOS and other non-protected OSes allowed software to modify the interrupt tables willy-nilly. This worked out okay when people who understood how interrupts were supposed to work were programming them, but it was also easy to completely screw over the state of the system with them. This is why modern, protected OSes don't typically allow user software to modify the interrupt tables. (If you're running in kernel mode as a driver, you can do it, but it's still really not a good idea.)
But, UNIX allows for user software to change its process's signal handlers. This is typically done to allow (for example) SIGHUP to tell Apache to reload its configuration files.
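For illustration, a small POSIX sketch of a process installing its own SIGHUP handler. This is not how Apache itself is written; the function names (install_sighup_handler, consume_reload_request) are invented for the example. The handler follows the same rule as any interrupt handler: set a flag and return.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stddef.h>

/* sig_atomic_t is the type the C standard allows a handler to write. */
static volatile sig_atomic_t reload_requested = 0;

/* The handler only records that the signal arrived. */
static void on_sighup(int signo) {
    (void)signo;
    reload_requested = 1;
}

/* Install the handler for this process only. Returns 0 on success. */
int install_sighup_handler(void) {
    struct sigaction sa;
    sa.sa_handler = on_sighup;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGHUP, &sa, NULL);
}

/* Called from the main loop: returns 1 once per received request. */
int consume_reload_request(void) {
    if (!reload_requested) return 0;
    reload_requested = 0;
    return 1;
}
```

The main loop would call consume_reload_request() on each pass and re-read its configuration when it returns 1. Nothing here touches the system's interrupt table; only this process's signal disposition changes.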
Modifying the interrupt table that the OS uses modifies that table for all software running on the system. This is generally not something that a user running a secure OS would particularly want, if they wanted to retain security of their system.
I learned years ago that in the application world global variables are "bad" or "frowned upon", so it became a habit to try to avoid them and use them very sparingly.
Seems like that in the embedded world they are almost unavoidable when it comes to working with hardware interrupts. They also have to be made volatile so that the compiler does not optimize them out if it sees them never being touched in the running program.
Are both of these statements true ? is there a way to avoid those variables in the case I described without bending too far backward ?
Neither of those statements is true.
First, let's clarify that by global variables we mean file scope variables that have external linkage. These are variables that could be called upon with the extern keyword or by mistake.
With regard to the first statement:
Seems like that in the embedded world they are almost unavoidable when
it comes to working with hardware interrupts.
Global variables are avoidable when working with hardware interrupts. As others have pointed out in the comments, global variables in an embedded environment are not uncommon, but they aren't encouraged either, especially if you can afford to implement proper encapsulation. This article, which someone provided in the comments to your question, actually contains a reader response that provides a good example of where a proper implementation of encapsulation was not possible (you don't have to go far, it's the first one).
With regard to the second statement:
They also have to be made volatile so that the compiler does not optimize them out if it
sees them never being touched in the running program.
This statement is, let's say, 'almost true'. With optimization turned on, the compiler may keep a variable's value in a register and skip any memory access (read or write) that it considers unnecessary. The volatile keyword tells the compiler not to do that, which means the memory location is actually accessed every time the variable is used.
Cases where using the volatile keyword is necessary:
Global variable(s) updated by an interrupt
Global variable(s) accessed by multiple threads in a multi-threaded application
Memory-mapped peripheral registers
In the case of a global variable that is updated by an interrupt, the volatile keyword is imperative because the interrupt can happen at any time and we do not want to miss that update. For global variables that are not updated by an interrupt and the application is single threaded, the volatile keyword is completely unnecessary and can actually slow your code down since you'll be accessing the memory location for that variable every time!
is there a way to avoid those variables in the case I described without bending too far backward ?
The answer to that really depends. Probably most important is what do you have to gain from making this design change? Also, is this a professional project, one for school, or one for fun?
From my experience as an engineer, time to market is often times the most important thing a company is worried about when developing a new product. There is usually going to be some legacy code that you get stuck with that was developed during the research and development phase, and it 'works' so why spend time to fix something that isn't broken? Seriously, it better be a very convincing argument otherwise don't waste your time.
From my educational experience, taking the time to go back and implement a proper design philosophy and document it is definitely worth it, but only if you actually have the time to do so! If you are close to a deadline, don't risk it. If you are ahead of the game, do it. It'll be worth it in more ways than one.
Lastly, to properly encapsulate the Interrupt Service Routine (ISR) for a hardware interrupt, you need to place it in an actual device driver (CAN, UART, SPI, etc). All communication with the ISR should be facilitated by the device driver and device driver only. Variables shared between the ISR and the device driver should be declared static and volatile. If you need access to any of those variables externally, you create setters and getters as part of your public API for the driver. Check this answer out for a general guideline to follow.
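As a rough sketch of that layout, here is what a single-file driver could look like in C. All names are hypothetical (uart_isr, uart_try_read, uart_get_overrun_count), and the ISR takes the received byte as a parameter only so the example can run on a host; a real ISR would read it from the peripheral's data register.

```c
#include <stdbool.h>
#include <stdint.h>

/* uart_driver.c: shared state is static (file scope, internal linkage),
 * so no other file can reach it with extern, and volatile because the
 * ISR changes it behind the main code's back. */
static volatile uint8_t  last_byte;
static volatile bool     byte_ready;
static volatile uint32_t overrun_count;

/* ISR: the only writer of last_byte. */
void uart_isr(uint8_t received) {
    if (byte_ready) overrun_count++; /* previous byte was never read */
    last_byte = received;
    byte_ready = true;
}

/* Public API: the only way the application sees received data.
 * On real hardware the read-and-clear below would typically be done
 * with this interrupt briefly masked. */
bool uart_try_read(uint8_t *out) {
    if (!byte_ready) return false;
    *out = last_byte;
    byte_ready = false;
    return true;
}

/* Getter for a statistic the application may want to report. */
uint32_t uart_get_overrun_count(void) {
    return overrun_count;
}
```

The application never declares or names the shared variables; it only calls uart_try_read() and uart_get_overrun_count(), so the variables stay out of the global namespace while still being volatile where it matters.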
From the Cortex-R reference manual, probably not Cortex-R specific
Asynchronous abort masking
The nature of asynchronous aborts means that they can occur while the processor is handling a different abort. If an asynchronous abort generates a new exception in such a situation, the r14_abt and SPSR_abt values are overwritten. If this occurs before the data is pushed to the stack in memory, the state information about the first abort is lost. To prevent this from happening, the CPSR contains a mask bit, the A-bit, to indicate that an asynchronous abort cannot be accepted. When the A-bit is set, any asynchronous abort that occurs is held pending by the processor until the A-bit is cleared, when the exception is actually taken. The A-bit is automatically set when abort, IRQ or FIQ exceptions are taken, and on reset. You must only clear the A-bit in an abort handler after the state information has either been stacked to memory, or is no longer required.
My question is, if I have the A bit masked since reset how can I know if an asynchronous abort is pending? Can pending external aborts be cleared without unmasking the A bit and taking the exception? Or more generally, is there advice on clearing the A bit after a reset?
Apparently something in my current boot chain has a pending external abort (but only after a hard power on). I would like to enable the external aborts, but it seems rather cumbersome to special case the first external abort in the exception code.
On a system that implements the security extensions, the Interrupt Status Register, ISR, can tell you if there's an external abort pending. Sadly this doesn't help much if you're on R4 which doesn't implement them.
Otherwise, there's nothing that I can see in the architecture to identify or deal with an abort short of taking the exception as you say. This doesn't really surprise me - in general an external abort that can be safely ignored is very much a special case.
If the bug in the system can't be fixed (is the bootloader probing devices in the wrong order, or similar?) then a workaround, however cumbersome, is the order of the day - if there's some reasonably straightforward way to tell a cold boot from a warm reset I can imagine a pretty trivial self-contained shim to handle it so the main code never needs to know.
When things go badly awry in embedded systems I tend to write an error to a special log file in flash and then reboot (there's not much option if, say, you run out of memory).
I realize even that can go wrong, so I try to minimize it (by not allocating any memory during the final write, and boosting the write process's priority).
But that relies on someone retrieving the log file. Now I was considering sending a message over the intertubes to report the error before rebooting.
On second thoughts, of course, it would be better to send that message after reboot, but it did get me to thinking...
What sort of things ought I be doing if I discover an irrecoverable error, and how can I do them as safely as possible in a system which is in an unstable state?
One strategy is to use a section of RAM that is not initialised during power-on/reboot. That can be used to store data that survives a reboot, and then when your app restarts, early on in the code it can check that memory and see if it contains any useful data. If it does, then write it to a log, or send it over a comms channel.
How to reserve a section of RAM that is non-initialised is platform-dependent, and depends if you're running a full-blown OS (Linux) that manages RAM initialisation or not. If you're on a small system where RAM initialisation is done by the C start-up code, then your compiler probably has a way to put data (a file-scope variable) in a different section (besides the usual e.g. .bss) which is not initialised by the C start-up code.
If the data is not initialised, then it will probably contain random data at power-up. To determine whether it contains random data or valid data, use a hash, e.g. CRC-32, to determine its validity. If your processor has a way to tell you if you're in a reboot vs a power-up reset, then you should also use that to decide that the data is invalid after a power-up.
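A minimal sketch of the validity check in C. The record layout and the function names (crash_record_store, crash_record_valid, crash_record_clear, crc32_compute) are assumptions made up for this example. How the variable is actually kept out of the start-up initialisation is toolchain-specific and is only hinted at in a comment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    char     message[56];
    uint32_t crc;          /* CRC-32 over message[] */
} crash_record;

/* On a target this variable would be placed in a section that the C
 * start-up code leaves alone (for example a GCC section attribute plus
 * a matching linker script entry). On a host it is an ordinary static. */
static crash_record crash_slot;

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). */
uint32_t crc32_compute(const void *data, size_t len) {
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Called just before rebooting. */
void crash_record_store(const char *msg) {
    memset(crash_slot.message, 0, sizeof crash_slot.message);
    strncpy(crash_slot.message, msg, sizeof crash_slot.message - 1);
    crash_slot.crc = crc32_compute(crash_slot.message,
                                   sizeof crash_slot.message);
}

/* Called early after restart: random power-up contents fail this check. */
bool crash_record_valid(void) {
    return crash_slot.crc == crc32_compute(crash_slot.message,
                                           sizeof crash_slot.message);
}

/* Invalidate the record once it has been logged or sent. */
void crash_record_clear(void) {
    crash_slot.crc = ~crc32_compute(crash_slot.message,
                                    sizeof crash_slot.message);
}
```

After a restart the application would call crash_record_valid(), report crash_slot.message if it returns true, and then call crash_record_clear() so the same record is not reported twice.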
There is no single answer to this. I would start with a Watchdog timer. This reboots the system if things go terribly awry.
Something else to consider - what is not in a log file is also important. If you have routine updates from various tasks/actions logged then you can learn from what is missing.
Finally, in the case that things go bad and you are still running: enter a critical section, turn off as much of the OS as possible, shut down peripherals, log as much state info as possible, then reboot!
The one thing you want to make sure of is that you do not corrupt data that might legitimately be in flash. If you try to write information in a crash situation, do so carefully and with the knowledge that the system might be in a very bad state, so anything you do needs to be done in a way that doesn't make things worse.
Generally, when I detect a crash state I try to spit information out a serial port. A UART driver that's accessible from a crashed state is usually pretty simple - it just needs to be a simple polling driver that writes characters to the transmit data register when the busy bit is clear - a crash handler generally doesn't need to play nice with multitasking, so polling is fine. And it generally doesn't need to worry about incoming data; or at least not needing to worry about incoming data in a fashion that can't be handled by polling. In fact, a crash handler generally cannot expect that multitasking and interrupt handling will be working since the system is screwed up.
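Something along these lines, as a sketch in C. The register layout and the busy bit position are hypothetical; the real ones come from the part's reference manual. Passing the register block as a pointer is only there so the code can be tried against a fake block on a host.

```c
#include <stdint.h>

/* Hypothetical UART register block; not any real device's layout. */
typedef struct {
    volatile uint32_t status;
    volatile uint32_t tx_data;
} uart_regs;

#define UART_STATUS_TX_BUSY (1u << 0) /* hypothetical bit position */

/* Polling transmit: no interrupts, no buffers, no OS services. */
void crash_putc(uart_regs *uart, char c) {
    while (uart->status & UART_STATUS_TX_BUSY) {
        /* spin until the transmitter can take another character */
    }
    uart->tx_data = (uint8_t)c;
}

void crash_puts(uart_regs *uart, const char *s) {
    while (*s) crash_putc(uart, *s++);
}

/* Print a register value as 8 hex digits without using printf,
 * which may need heap or locks that can't be trusted after a crash. */
void crash_put_hex32(uart_regs *uart, uint32_t value) {
    static const char digits[] = "0123456789ABCDEF";
    for (int shift = 28; shift >= 0; shift -= 4)
        crash_putc(uart, digits[(value >> shift) & 0xFu]);
}
```

On a target the crash handler would pass the UART's fixed base address cast to uart_regs *, then dump the most important values first in case the watchdog fires before it finishes.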
I try to have it write the register file, a portion of the stack and any important OS data structures (the current task control block or something) that might be available and interesting. A watchdog timer usually is responsible for resetting the system in this state, so the crash handler might not have the opportunity to write everything, so dump the most important stuff first (do not have the crash handler kick the watchdog - you don't want to have some bug mistakenly prevent the watchdog from resetting the system).
Of course this is most useful in a development setup, since when the device is released it might not have anything attached to the serial port. If you want to be able to capture these kinds of crash dumps after release, then they need to get written somewhere appropriate (like maybe a reserved section of flash - just make sure it's not part of the normal data/file system area unless you're sure it can't corrupt that data). Of course you'd need to have something examine that area at boot so it can be detected and sent somewhere useful or there's no point, unless you might get units back post-mortem and can hook them up to a debugging setup that can look at the data.
I think the most well-known example of proper exception handling is a missile self-destruction. The exception was caused by an arithmetic overflow in software. There obviously was a lot of tracing/recording media involved, because the root cause is known: it was discovered and debugged.
So, every embedded design must include 2 features: recording media like your log file and graceful halt, like disabling all timers/interrupts, shutting all ports and sitting in infinite loop or in case of a missile - self-destruction.
Writing messages to flash before reboot in embedded systems is often a bad idea. As you point out, no one is going to read the message, and if the problem is not transient you wear out the flash.
When the system is in an inconsistent state, there is almost nothing you can do reliably, and the best thing to do is to restart the system as quickly as possible so that you can recover from transient failures (timing, special external events, etc.). In some systems I have written a trap handler that uses some reserved memory so that it can set up the serial port and then emit a stack dump and register contents without requiring extra stack space or clobbering registers.
A simple restart with a dump like that is reasonable because if the problem is transient the restart will resolve the problem and you want to keep it simple and let the device continue. If the problem is not transient you are not going to make forward progress anyway and someone can come along and connect a diagnostic device.
Very interesting paper on failures and recovery: "Why Do Computers Stop and What Can Be Done About It?"
For a very simple system, do you have a pin you can wiggle? For example, when you start up configure it to have high output, if things go way south (i.e. watchdog reset pending) then set it to low.
Have you ever considered using a garbage collector ?
And I'm not joking.
If you do dynamic allocation at runtime in embedded systems,
why not reserve a mark buffer and mark and sweep when the excrement hits the rotating air blower.
You've probably got the malloc (or whatever) implementation's source, right ?
If you don't have library sources for your embedded system forget I ever suggested it, but tell the rest of us what equipment it is in so we can avoid ever using it. Yikes (how do you debug without library sources?).
If your system is already dead.... who cares how long it takes. It obviously isn't critical that it be running this instant;
if it was you couldn't risk "dying" like this anyway ?
You may know a lot of programs, e.g. some password cracking programs, that we can stop while they're running, and when we run the program again (with or without entering the same input), they are able to continue from where they left off. I wonder what kind of technique those programs are using?
[Edit] I am writing a program based mainly on recursive functions. As far as I know, it is incredibly difficult to save such state in my program. Is there any technique that somehow saves the stack contents, function calls, and data involved in my program, so that when it is restarted it can run as if it hadn't been stopped? These are just some concepts I have in my mind, so please forgive me if it doesn't make sense...
It's going to be different for every program. For something as simple as, say, a brute force password cracker, all that would really need to be saved is the last password tried. For other apps you may need to store several data points, but that's really all there is to it: saving and loading the minimum amount of information needed to reconstruct where you were.
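A toy sketch of that idea in C, where the whole resumable state is one 64-bit counter written to a file. The function names, the file format and the max_steps parameter (which just simulates being interrupted) are all invented for the example; a real cracker would map the counter to a candidate password.

```c
#include <stdint.h>
#include <stdio.h>

/* Save the index of the next candidate to try. Returns 0 on success. */
int checkpoint_save(const char *path, uint64_t next_candidate) {
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t written = fwrite(&next_candidate, sizeof next_candidate, 1, f);
    if (fclose(f) != 0 || written != 1) return -1;
    return 0;
}

/* Load the saved index, or 0 if there is no usable checkpoint. */
uint64_t checkpoint_load(const char *path) {
    uint64_t value = 0;
    FILE *f = fopen(path, "rb");
    if (!f) return 0;
    if (fread(&value, sizeof value, 1, f) != 1) value = 0;
    fclose(f);
    return value;
}

/* Try candidates below limit, resuming from the checkpoint.
 * Stops after max_steps candidates to simulate an interruption.
 * Returns the matching candidate, or -1 if none was found this run. */
int64_t search(const char *path, uint64_t target,
               uint64_t limit, uint64_t max_steps) {
    uint64_t i = checkpoint_load(path);
    for (uint64_t steps = 0; i < limit && steps < max_steps; i++, steps++) {
        if (i == target) return (int64_t)i;
        if (i % 100 == 0) checkpoint_save(path, i); /* periodic save */
    }
    checkpoint_save(path, i); /* save where this run stopped */
    return -1;
}
```

Running search() twice with the same path shows the effect: the first call gives up after max_steps candidates and records its position, and the second call picks up from there instead of starting at zero.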
Another common technique is to save an image of the entire program state. If you've ever played with a game console emulator with the ability to save state, this is how they do it. A similar technique exists in Python with pickling. If the environment is stable enough (i.e. no varying pointers) you simply copy the entire app's memory state into a binary file. When you want to resume, you copy it back into memory and begin running again. This gives you near perfect state recovery, but whether or not it's at all possible is highly environment/language dependent. (For example: most C++ apps couldn't do this without help from the OS, or unless they were built VERY carefully with this in mind.)
Use Persistence.
Persistence is a mechanism through which the life of an object extends beyond the program's execution lifetime.
Store the state of the objects involved in the process on the local hard drive using serialization.
Implement Persistent Objects with Java Serialization
To achieve this, you need to continually save state (i.e. where you are in your calculation). This way, if you interrupt the program, when it restarts it will know that it was in the middle of a calculation, and where it was in that calculation.
You also probably want to have your main calculation in a separate thread from your user interface - this way you can respond to "close / interrupt" requests from your user interface and handle them appropriately by stopping / pausing the thread.
For Linux, there is a project named CRIU, which supports process-level save and resume. It is quite like hibernation and resuming of the OS, but the granularity is broken down to processes. It also supports container technologies, specifically Docker. Refer to http://criu.org/ for more information.