It might be a stupid question where the answer is "no", but is that theoretically possible? I wonder why not?
And I don't know what for...
There are several different types of "interrupt handlers".
The first, the hardware IRQ handlers, are modified when the OS loads drivers and such.
The second, the software interrupt handlers, are used to call OS level services in modern OSes.
Those are the ones that have hardware support (either across the entire computer, or within the processor).
A third kind, without hardware support, are "signal handlers" (in UNIX), which are basically OS-level and relate to OS events.
The common concept between them is that the responses are programmable. The idea is that you know how you want your software/OS to respond to them, so you add the code necessary to service them. In that sense, they are "modifiable at runtime".
But there are rules about what to do in these handlers. Primarily, you don't want to waste too much time in them, because whatever you do prevents other interrupts (of the same or lower priority) from being serviced while you're processing. For example, you don't want to be in the middle of handling one interrupt and get another interrupt for the same thing before you finish the first, because an interrupt handler can do things that would otherwise require a lock (loading and incrementing the head or tail pointers of a ring queue, for example) and would clobber that state if it were re-entered.
So, typically interrupt handlers do the least of what they need to do, and set a flag for the software to recognize that processing on that needs to be done once it gets back out of interrupt mode.
Historically, DOS and other non-protected OSes allowed software to modify the interrupt tables willy-nilly. This worked out okay when people who understood how interrupts were supposed to work were programming them, but it was also easy to completely screw over the state of the system with them. This is why modern, protected OSes don't typically allow user software to modify the interrupt tables. (If you're running in kernel mode as a driver, you can do it, but it's still really not a good idea.)
But, UNIX allows for user software to change its process's signal handlers. This is typically done to allow (for example) SIGHUP to tell Apache to reload its configuration files.
Modifying the interrupt table that the OS uses modifies that table for all software running on the system. This is generally not something that a user running a secure OS would particularly want, if they wanted to retain security of their system.
Related
I have an application for a Zynq MPSoC (Vitis 2020.2) written in C++ using FreeRTOS V10.3.0. This application runs very well if it stops at a breakpoint once. If I disable all breakpoints, the program runs buggy. What might be the problem?
How many ways could this happen?! It is a real-time operating system, and presumably also a real-time application. If you stop the processor you affect the timing. Without knowing the hardware, the software, where you are placing the breakpoint, and the bugs that arise when free-running, it is not possible to answer your specific question. I.e. you need to debug it - there is no generic explanation as to why the intrusive action of stopping the processor "fixes" your system.
You clearly have buggy code that is affected by timing. Stopping the code does not necessarily stop peripherals, and it certainly does not stop the outside world that your system interacts with. For example, when you stop on a breakpoint, the world continues and interrupts become pending (possibly several). When you resume execution, all those pending interrupts are handled and in turn issue events that make different tasks ready-to-run, so the execution path and thread scheduling order are likely to differ considerably from a free run.
Ultimately you are asking the wrong question; the breakpoint is not magically "fixing" your code, rather it is significantly changing the way that it runs such that some existing bug (or bugs) is hidden or avoided. The bug is still there, so the question would better focus on finding the bug than "magic thinking".
The bugs could be at any level, but most likely are design issues with inappropriate task partitioning, priority assignment, IPC, task synchronisation or resource protection. Generally probably rather too broad to deal with in a single SO question.
I am talking about a system which uses an ARM Cortex-M3. The code which I am referring to is written in firmware. The user sends commands to the firmware to do a particular job, and the firmware calls specific software interrupt handlers to do the task corresponding to the command being sent. I know that the software interrupt handlers are mentioned in the interrupt vector table, but how does a command issued by the user, e.g. erase, result in a software interrupt being raised inside the firmware which will do the erase operation?
The software interrupt is an instruction (it goes by other names too; same instruction). The logic in the processor knows to switch modes to supervisor (or whatever is correct) and begins execution (kind of like a jump) at the code indicated by the address in the vector table. There is software there that handles the command; something you set up before executing the software interrupt instruction tells what is (or what is effectively) the operating system to do that system call for you.
At the application layer, the code which makes a system call goes through library code linked into the application: it takes parameters from the application, sets up the appropriate information for the software interrupt, performs the software interrupt, gathers results when the interrupt returns, and cleans up.
EDIT.
All of the vectors in the vector table work this way, even reset. The logic knows when an event has occurred: an interrupt, data abort, undefined instruction, etc. The logic is hardwired to go to a specific address, read the value there, which is itself an address, and then start executing at that handler address. The swi/svc is just another "event", but one that we want to create directly, versus creating an undefined instruction or an unaligned access, etc., which all do basically the same thing: an event is triggered, normal execution stops, machine state may or may not be saved (some portion is on a Cortex-M3, but it may depend on the event), and execution of the handler happens. (On the M3 there isn't supervisor vs. user as on a full-sized ARM.)

Hardware interrupts are not that much different, except we didn't insert an instruction to cause them; other logic causes them based on some event in that logic. In all cases there is code that we (the programmers) have to write for each event we care to handle (well, each that we need to handle, covering all the ones that might happen), one of which might be the svc/swi event. Within that, it is not defined by ARM what you call the system functions or how they are defined. ARM may have a set they use, but you are technically free to create any mechanism and any set of system calls you want; you just have to make sure the caller and the callee agree on the definition and on who is responsible for what.
Both libs are designed for async I/O scheduling, and both engage epoll on Linux, kqueue on FreeBSD, etc.
Superficial differences aside, what is the TRUE difference between these two libraries, in terms of architecture or design philosophy?
As for design philosophy, libev was created to improve on some of the architectural decisions in libevent. For example: global variable usage made it hard to use libevent safely in multithreaded environments; watcher structures are big because they combine I/O, time and signal handlers in one; the extra components such as the http and dns servers suffered from bad implementation quality and resultant security issues; and timers were inexact and didn't cope well with time jumps.
Libev tried to improve each of these, by not using global variables but using a loop context for all functions, by using small watchers for each event type (an I/O watcher uses 56 bytes on x86_64 compared to 136 for libevent), allowing extra event types such as timers based on wallclock vs. monotonic time, inter-thread interruptions, prepare and check watchers to embed other event loops or to be embedded and so on.
The extra component problem is "solved" by not having them at all, so libev can be small and efficient, but you also need to look elsewhere for an http library, because libev simply doesn't have one (for example, there is a very related library called libeio that does asynchronous I/O, which can be used independently or together with libev, so you can mix and match).
So in short, libev tries to do one thing only (POSIX event library), and this in the most efficient way possible. Libevent tries to give you the full solution (event lib, non-blocking I/O library, http server, DNS client).
Or, even shorter, libev tries to follow the UNIX toolbox philosophy of doing one thing only, as good as possible.
Note that this is the design philosophy, which I can state with authority because I designed libev. Whether these design goals have actually been reached, or whether the philosophy is based on sound principles, is up to you to judge.
Update 2017:
I was asked multiple times what timer inexactness I refer to, and why libev doesn't support IOCPs on windows.
As for timers, libevent schedules timers relative to some unknown base time that is in the future, without you knowing it. Libev can tell you in advance what base time it will use to schedule timers, which allows programs to use both the libevent approach and the libev approach. Furthermore, libevent would sometimes expire timers early, depending on the backend. The former is an API issue; the latter is fixable (and might have been fixed since - I didn't check).
As for IOCP support - I don't think it can be done, as IOCPs are simply not powerful enough. For one thing, they need a special socket type, which would limit the set of handles allowed on windows even more (for example, the sockets used by perl are of the "wrong" type for IOCPs). Furthermore, IOCPs simply don't support I/O readiness events at all; they can only do actual I/O. There are workarounds for some handle types, such as doing a dummy 0-byte read, but again, this would limit the handle types you can use on windows even more, and furthermore would rely on undocumented behaviour that is likely not shared by all socket providers.
To my knowledge, no other event library supports IOCPs on windows, either. What libevent does is, in addition to the event library, it allows you to queue read/write operations which then can be done via IOCPs. Since libev does not do I/O for you, there is no way to use IOCPs in libev itself.
This is indeed by design - libev tries to be small and POSIX-like, and windows simply does not have an efficient way to get POSIX-style I/O events. If IOCPs are important, you either have to use them yourself, or indeed use some of the many other frameworks that do I/O for you and therefore can use IOCPs.
The great advantage of libevent for me is the built-in OpenSSL support. Bufferevent interface, introduced in 2.0 version of libevent API, handles the secure connections almost painlessly for the developer.
Maybe my knowledge has gone out of date, but it seems like libev does not support this.
Here are the source code links:
libev: https://github.com/enki/libev
libevent: https://github.com/libevent/libevent
I find the latest commit to libev was in 2015, but libevent still had commits as of July 2022, so whether a library has people maintaining it is also important.
When things go badly awry in embedded systems I tend to write an error to a special log file in flash and then reboot (there's not much option if, say, you run out of memory).
I realize even that can go wrong, so I try to minimize it (by not allocating any memory during the final write, and boosting the write process's priority).
But that relies on someone retrieving the log file. Now I was considering sending a message over the intertubes to report the error before rebooting.
On second thoughts, of course, it would be better to send that message after reboot, but it did get me to thinking...
What sort of things ought I be doing if I discover an irrecoverable error, and how can I do them as safely as possible in a system which is in an unstable state?
One strategy is to use a section of RAM that is not initialised during power-on/reboot. That can be used to store data that survives a reboot; then when your app restarts, early on in the code it can check that memory and see if it contains any useful data. If it does, it can write it to a log or send it over a comms channel.
How to reserve a section of RAM that is non-initialised is platform-dependent, and depends on whether you're running a full-blown OS (e.g. Linux) that manages RAM initialisation or not. If you're on a small system where RAM initialisation is done by the C start-up code, then your compiler probably has a way to put data (a file-scope variable) in a different section (besides the usual ones, e.g. .bss) which is not initialised by the C start-up code.
If the data is not initialised, then it will probably contain random data at power-up. To determine whether it contains random data or valid data, use a hash, e.g. CRC-32, to determine its validity. If your processor has a way to tell you if you're in a reboot vs a power-up reset, then you should also use that to decide that the data is invalid after a power-up.
There is no single answer to this. I would start with a Watchdog timer. This reboots the system if things go terribly awry.
Something else to consider - what is not in a log file is also important. If you have routine updates from various tasks/actions logged then you can learn from what is missing.
Finally, in the case that things go bad and you are still running: enter a critical section, turn off as much of the OS as possible, shut down peripherals, log as much state info as possible, then reboot!
The one thing you want to make sure of is that you do not corrupt data that might legitimately be in flash. If you try to write information in a crash situation, you need to do so carefully, with the knowledge that the system might be in a very bad state; anything you do needs to be done in a way that doesn't make things worse.
Generally, when I detect a crash state I try to spit information out a serial port. A UART driver that's accessible from a crashed state is usually pretty simple: it just needs to be a polling driver that writes characters to the transmit data register when the busy bit is clear. A crash handler generally doesn't need to play nice with multitasking, so polling is fine, and it generally doesn't need to worry about incoming data (or at least not in a fashion that can't be handled by polling). In fact, a crash handler generally cannot expect that multitasking and interrupt handling will be working, since the system is screwed up.
I try to have it write the register file, a portion of the stack and any important OS data structures (the current task control block or something) that might be available and interesting. A watchdog timer usually is responsible for resetting the system in this state, so the crash handler might not have the opportunity to write everything, so dump the most important stuff first (do not have the crash handler kick the watchdog - you don't want to have some bug mistakenly prevent the watchdog from resetting the system).
Of course this is most useful in a development setup, since when the device is released it might not have anything attached to the serial port. If you want to be able to capture these kinds of crash dumps after release, then they need to get written somewhere appropriate (like maybe a reserved section of flash - just make sure it's not part of the normal data/file system area unless you're sure it can't corrupt that data). Of course you'd need to have something examine that area at boot so it can be detected and sent somewhere useful or there's no point, unless you might get units back post-mortem and can hook them up to a debugging setup that can look at the data.
I think the most well-known example of proper exception handling is a missile self-destruction. The exception was caused by an arithmetic overflow in software. There obviously was a lot of tracing/recording media involved, because the root cause is known: it was discovered and debugged.
So, every embedded design must include two features: recording media, like your log file, and a graceful halt, like disabling all timers/interrupts, shutting all ports and sitting in an infinite loop, or, in the case of a missile, self-destruction.
Writing messages to flash before reboot in embedded systems is often a bad idea. As you point out, no one is going to read the message, and if the problem is not transient you wear out the flash.
When the system is in an inconsistent state, there is almost nothing you can do reliably, and the best thing to do is to restart the system as quickly as possible so that you can recover from transient failures (timing, special external events, etc.). In some systems I have written a trap handler that uses some reserved memory so that it can set up the serial port and then emit a stack dump and register contents without requiring extra stack space or clobbering registers.
A simple restart with a dump like that is reasonable because if the problem is transient the restart will resolve the problem and you want to keep it simple and let the device continue. If the problem is not transient you are not going to make forward progress anyway and someone can come along and connect a diagnostic device.
Very interesting paper on failures and recovery: WHY DO COMPUTERS STOP AND WHAT CAN BE DONE ABOUT IT?
For a very simple system, do you have a pin you can wiggle? For example, when you start up configure it to have high output, if things go way south (i.e. watchdog reset pending) then set it to low.
Have you ever considered using a garbage collector ?
And I'm not joking.
If you do dynamic allocation at runtime in embedded systems,
why not reserve a mark buffer and mark and sweep when the excrement hits the rotating air blower.
You've probably got the malloc (or whatever) implementation's source, right ?
If you don't have library sources for your embedded system forget I ever suggested it, but tell the rest of us what equipment it is in so we can avoid ever using it. Yikes (how do you debug without library sources?).
If your system is already dead... who cares how long it takes? It obviously isn't critical that it be running this instant;
if it were, you couldn't risk "dying" like this anyway.
You may know a lot of programs, e.g. some password-cracking programs, which we can stop while they're running; when we run the program again (with or without entering the same input), they are able to continue from where they left off. I wonder what kind of technique those programs use?
[Edit] I am writing a program mainly based on recursive functions. To my knowledge, it is incredibly difficult to save such states in my program. Is there any technique that somehow saves the stack contents, function calls, and data involved in my program, so that when it is restarted, it can run as if it hadn't been stopped? This is just a concept I have in mind, so please forgive me if it doesn't make sense...
It's going to be different for every program. For something as simple as, say, a brute-force password cracker, all that would really need to be saved is the last password tried. For other apps you may need to store several data points, but that's really all there is to it: saving and loading the minimum amount of information needed to reconstruct where you were.
Another common technique is to save an image of the entire program state. If you've ever played with a game console emulator that has the ability to save state, this is how they do it. A similar technique exists in Python with pickling. If the environment is stable enough (i.e. no varying pointers) you simply copy the entire app's memory state into a binary file. When you want to resume, you copy it back into memory and begin running again. This gives you near-perfect state recovery, but whether or not it's at all possible is highly environment/language dependent. (For example: most C++ apps couldn't do this without help from the OS, or unless they were built VERY carefully with this in mind.)
Use Persistence.
Persistence is a mechanism through which the life of an object extends beyond the program's execution lifetime.
Store the state of the objects involved in the process on the local hard drive using serialization.
Implement Persistent Objects with Java Serialization
To achieve this, you need to continually save state (i.e. where you are in your calculation). This way, if you interrupt the program, when it restarts it will know it was in the middle of a calculation, and where it was in that calculation.
You also probably want to have your main calculation in a separate thread from your user interface - this way you can respond to "close / interrupt" requests from your user interface and handle them appropriately by stopping / pausing the thread.
For linux, there is a project named CRIU, which supports process-level save and resume. It is quite like hibernation and resuming of the OS, but the granularity is broken down to processes. It also supports container technologies, specifically Docker. Refer to http://criu.org/ for more information.