bottom halves of an ISR - interrupt-handling

I was asked this question in an interview.
Why is the bottom half of an interrupt service routine not allowed to sleep? My answer was that since interrupts are masked while the ISR executes, we would miss some interrupts if the bottom halves slept. I was not able to think of anything else. Is this the right answer? Can anyone think of any other reason for this?

The following is for Linux.
There are two types of bottom halves. The first type comprises softirqs and tasklets.
Tasklets are built on top of softirqs and are very similar. Neither runs in process context, so they cannot sleep.
The second type is the workqueue, which runs in a kernel thread and can sleep.
There are some bottom halves which have to sleep. To the best of my knowledge the network system uses workqueues. I have written bottom halves that sleep.
Maybe you were asking about top halves, which can't sleep for two reasons: they do not run in process context, so they can't; and they should execute as fast as possible and defer all work to bottom halves, which will sleep if needed.
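For illustration, here is a minimal sketch (not from the original answer) of deferring sleep-capable work to the shared workqueue from a kernel module; the handler runs in a kernel thread, so sleeping in it is legal:

#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/delay.h>

static void my_deferred_fn(struct work_struct *work)
{
    msleep(100);                        /* process context: sleeping is allowed here */
    pr_info("deferred work done after sleeping\n");
}

static DECLARE_WORK(my_work, my_deferred_fn);

static int __init demo_init(void)
{
    /* A top-half ISR could do exactly this: queue the work and return at once. */
    schedule_work(&my_work);
    return 0;
}

static void __exit demo_exit(void)
{
    flush_work(&my_work);               /* wait for the handler to finish */
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");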

Related

TinyAVR 0-Series: Can I use pin-change sensing without entering interrupt handler?

I am evaluating the ATtiny806 running at 20MHz to build a cycle-accurate Intel 4004 microprocessor emulator. (I know it will be a bit too slow, but AVRs have a huge community.)
I need to synchronize to the external, two-phase non-overlapping clocks. These are not fast clocks (the original 4004 ran at 750kHz)
but if I spin-wait for every clock edge, I risk wasting most of my time budget.
The TinyAVR 0-series has a very nice pin-change interrupt facility that can be configured to trigger only on rising edges.
But, an interrupt routine round-trip is 8 cycles (3 in, 5 out).
My question is:
Can I leverage the pin-change sensing mechanism while never visiting an ISR?
(Other processor families let you poll for interruptible conditions without enabling interrupts from that peripheral). Can polling be done with a tight skip-on-bit/jump-back loop, followed by a set-bit instruction?
Straightforward way
You can always just poll on the level of the GPIO pin using the single cycle skip if bit set/clear instruction on the appropriate PORT register and bit.
But as you mention, polling does burn cycles so I'm not sure exactly what you want here - either a poll (that burns cycles but has low latency) or an interrupt (that has higher latency but allows processing to continue until the condition is true).
Note that if things get really tight and you are looking for, say, power savings by sleeping between clock signal transitions, then you can do tricks like having an ISR that never returns (saving the RETI cycles), but that requires some careful coding, probably with something like a state machine.
INTFLAG way
Alternately, if you want to use the internal pin state-machine logic and you can live without interrupts, then you can use the INTFLAGS register to check for the pin change configured in the ISC bits of the PINxCTRL register. As long as global interrupts are not enabled in SREG, you can spin-poll on the appropriate INTFLAGS bit to check/wait for the desired condition, and then write a 1 to that bit to clear the flag.
Note that if you want to make this fast, you will probably want to map the appropriate PORT to a VPORT, since the VPORT registers are in I/O memory. This lets you use SBIS to test the INTFLAGS bit in a single cycle and SBI to clear it in a single cycle (these instructions only work on I/O memory, and the normal PORT registers are not in I/O memory).
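As a rough sketch of this (assuming avr-gcc device headers and, purely for illustration, that the clock phase is wired to PA2; global interrupts are left disabled, so the latched flag is only ever polled and no ISR runs):

#include <avr/io.h>

static inline void edge_sense_init(void)
{
    PORTA.PIN2CTRL = PORT_ISC_RISING_gc;   /* latch rising edges on PA2 into INTFLAGS */
}

static inline void wait_for_rising_edge(void)
{
    while (!(VPORTA.INTFLAGS & PIN2_bm))   /* VPORT access lets the compiler use SBIS here */
        ;
    VPORTA.INTFLAGS = PIN2_bm;             /* write a 1 to clear the latched flag */
}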
Finally one more complication, if you need to leave the interrupts on when doing this, it is probably possible by hacking the interrupt priority registers. You'd set the pin change to be on level 0, and then make sure the interrupts you care about are level 1 or higher, and then trick the interrupt controller into thinking that there is already a level 0 running so these interrupts do not actually fire. There are also other restrictions to this strategy so avoid it if at all possible.
Programmable logic way
If you want to get really esoteric, you could likely route the input value of a pin to a configurable custom logic LUT in the chip and then route the output of that module to a bit that you test using a 1-cycle bit test (maybe an unused I/O pin). To do this, you'd feed the output of the LUT back into one of its inputs and use the LUT to create a strobe on the edge you are looking for. This is very complex, and since the strobe has no acknowledgement, if the signal changes while you are not looking (in a spin check) it will be lost and you will have to wait for the next edge (probably fatal in your application).

How do you avoid interrupt starvation in a nested interrupt system?

I am learning about interrupts and couldn't understand what happens when there are too many interrupts, to the point where the CPU can't process the foreground loop or complete the existing interrupts. I read through this article https://www.cs.utah.edu/~regehr/papers/interrupt_chapter.pdf but didn't completely understand how a scheduler would help if there are simply too many interrupts.
Do we switch to a faster CPU if the interrupts can not be missed?
Yes, you may have to switch to a faster CPU!
You have to ensure that there is enough time left for the main loop. Therefore it is really important to keep your interrupt service routines as short as possible and to do some CPU load tests.
Indeed, any time there is contention over a shared resource, there is the possibility of starvation. The schedulers discussed in the paper limit the interrupt rate, thus ensuring some interrupt-free processing time during each interval. During high activity periods, interrupt handling is disabled, and the scheduler switches to polling mode where it interrogates the state of the interrupt request lines periodically, effectively throttling the stream of interrupts. The operating system strives to do as little as possible in each interrupt handler - tasks are often simply queued so they can be handled later at a different stage. There are many considerations and trade-offs that go into any scheduling algorithm.
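As a hedged illustration of that throttling idea (this is not code from the paper; the uart_* helpers are placeholders for real device and interrupt-controller accesses), the switch-over between interrupt and polling mode might look roughly like this:

#include <stdbool.h>

#define BURST_LIMIT 32            /* max interrupts tolerated per scheduling interval */

static volatile unsigned irq_count;
static volatile bool     polling_mode;

/* Placeholder hardware hooks - replace with real register accesses. */
static void uart_irq_enable(void)  { /* unmask the UART interrupt source */ }
static void uart_irq_disable(void) { /* mask the UART interrupt source   */ }
static bool uart_data_ready(void)  { return false; /* poll a status flag */ }
static void uart_handle_byte(void) { /* read one byte and queue the work */ }

void uart_isr(void)               /* keep the handler itself as short as possible */
{
    uart_handle_byte();
    if (++irq_count > BURST_LIMIT) {      /* too many interrupts this interval */
        uart_irq_disable();               /* throttle: fall back to polling    */
        polling_mode = true;
    }
}

void scheduler_tick_isr(void)     /* periodic timer marking each interval */
{
    irq_count = 0;
    if (polling_mode) {                   /* interval over: re-enable interrupts */
        polling_mode = false;
        uart_irq_enable();
    }
}

void foreground_poll(void)        /* called regularly from the main loop */
{
    if (polling_mode && uart_data_ready())
        uart_handle_byte();               /* service the device by polling */
}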
Overall you need a clue of how much time each part of your program consumes. This is pretty easy to measure live, in practice, with an oscilloscope. If you activate a GPIO when entering the interrupt and de-activate it when leaving, you not only see how much time the ISR consumes, but also how often it kicks in. If you do this for each ISR you get a good idea of how much time they need. You can then do something similar in main() to get a rough estimate of the complete execution cycle of the program, main + interrupts.
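For example (a hedged sketch; the pin macros and the handler name are placeholders for whatever MCU and toolchain are in use):

/* Bracketing a handler with a set/clear of a spare GPIO: the scope then
   shows both the ISR's execution time and how often it fires. */
#define DEBUG_PIN_HIGH()  do { /* set a spare GPIO here  */ } while (0)
#define DEBUG_PIN_LOW()   do { /* clear that GPIO here   */ } while (0)

void uart_rx_isr(void)            /* register as the real vector in your toolchain */
{
    DEBUG_PIN_HIGH();             /* rising edge on the scope: ISR entry */
    /* ... the actual interrupt work ... */
    DEBUG_PIN_LOW();              /* falling edge: ISR exit */
}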
As for the best solution, it is obviously to reduce the number of interrupts. Use polling if possible. Use DMA. Use serial peripherals (UART, CAN etc.) that are hardware-buffered instead of interrupt-intensive ones. Use hardware PWM instead of output-compare timers. And so on. These things need to be considered early on, when you pick a suitable MCU for your project. If you picked the wrong MCU, then you'll obviously have to change. Twiddling with the CPU clock sounds like a quick & dirty fix. Get the design right instead.

operating system - context switches

I have been confused about the issue of context switches between processes, given a round-robin scheduler with a certain time slice (which is basically what Unix/Windows both use).
So, suppose we have 200 processes running on a single-core machine. If the scheduler uses even a 1ms time slice, each process would get its share only every 200ms, which is probably not the case (imagine a Java high-frequency app; I would not assume it gets scheduled only once every 200ms to serve requests). Having said that, what am I missing in the picture?
Furthermore, Java and other languages allow you to put the running thread to sleep for e.g. 100ms. Am I correct in saying that this does not cause a context switch, and if so, how is this achieved?
So, suppose we have 200 processes running on a single-core machine. If the scheduler uses even a 1ms time slice, each process would get its share only every 200ms, which is probably not the case (imagine a Java high-frequency app; I would not assume it gets scheduled only once every 200ms to serve requests). Having said that, what am I missing in the picture?
No, you aren't missing anything; it is much the same in non-preemptive systems. Processes with pre-emptive rights (meaning higher priority than other processes) can easily displace a less important process, to the extent that a high-priority process might run, say, ten times as often as the lowest-priority one (the actual numbers depend entirely on the situation and implementation), as long as it does not produce outright starvation of the least-priority process.
As for processes of similar priority, it depends entirely on the Round-Robin algorithm you mentioned, though which process gets picked first is again implementation-dependent. Windows and Unix-like systems both use round-robin scheduling among equal-priority processes in the basic sense, but their schedulers are not identical; Linux's task scheduler, for example, is the Completely Fair Scheduler (CFS).
Furthermore, Java and other languages allow you to put the running thread to sleep for e.g. 100ms. Am I correct in saying that this does not cause a context switch, and if so, how is this achieved?
Programming languages and libraries implement "sleep" functionality with the aid of the kernel. Without kernel-level support, they'd have to busy-wait, spinning in a tight loop, until the requested sleep duration elapsed. This would wastefully consume the processor.
As for threads that are put to sleep (Thread.sleep(long millis)), most systems generally do the following (a conceptual sketch in code follows the list):
Suspend execution of the process and mark it as not runnable.
Set a timer for the given wait time. Systems provide hardware timers that let the kernel register to receive an interrupt at a given point in the future.
When the timer hits, mark the process as runnable.
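As a toy illustration of those three steps (purely conceptual: the task_t structure, tick counter, and scheduling are invented stand-ins for real kernel data structures and hardware timers):

#include <stdbool.h>
#include <stdio.h>

typedef struct {
    const char *name;
    bool runnable;
    unsigned wake_tick;                   /* tick at which the task may run again */
} task_t;

static unsigned current_tick = 0;

/* Steps 1 and 2: mark the task not runnable and record when to wake it. */
static void sleep_for(task_t *t, unsigned ticks)
{
    t->runnable  = false;
    t->wake_tick = current_tick + ticks;
}

/* Step 3: the (simulated) timer interrupt marks expired sleepers runnable. */
static void timer_tick(task_t *tasks, int n)
{
    current_tick++;
    for (int i = 0; i < n; i++)
        if (!tasks[i].runnable && tasks[i].wake_tick <= current_tick)
            tasks[i].runnable = true;     /* the scheduler may now pick it again */
}

int main(void)
{
    task_t tasks[1] = { { "worker", true, 0 } };
    sleep_for(&tasks[0], 3);              /* the toy equivalent of Thread.sleep() */
    while (!tasks[0].runnable) {
        timer_tick(tasks, 1);             /* other runnable tasks would execute here */
        printf("tick %u: worker is %s\n", current_tick,
               tasks[0].runnable ? "runnable" : "sleeping");
    }
    return 0;
}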
I hope you are aware of threading models like one-to-one, many-to-one, and many-to-many, so I am not getting into much detail; just a reference for yourself.
It might appear that this increases the overhead/complexity, but that's how threads (user threads created in the JVM) are operated upon, and the mapping is then based on those threading models I mentioned above. Check this Quora question and its answers, and please go through the answer given by Robert Love.
For further reading, I'd suggest the Scheduling Algorithms explanation on OSDev.org and the Operating System Concepts book by Galvin, Gagne, and Silberschatz.

Inter Processor Interrupt usage

An educational principle is: there is no such thing as a stupid question. The basic idea behind this is that people learn by asking.
I was asked: "Can you show and explain, at a programming level, what bad things would happen if every task could execute all instructions?"
I gave the code
int main(void)
{
    __asm__("cli");
    while (1)
        ;
}
and explained it (the system freezes for good, on a uniprocessor (UP) system).
Then I was asked: "Is it possible to give an example where the system does not freeze even though interrupts have been cleared?"
I modified the previous example and gave this code:
int i;

int main(void)
{
    __asm__("cli");
    i = i / 0;
    while (1)
        ;
}
and explained it.
Trivially: if we have demand paging, i=i/0 first causes a page fault (the data page is not present), and another task can be scheduled to run, with interrupts enabled, during the disk read; later on, the divide by zero will throw this task away for good.
But those answers were based on UP. What about SMP? I must admit the answers are incomplete.
It is still easy enough to construct:
#include <unistd.h>

int i;

int main(void)
{
    for (i = 0; i < 100; i++)   /* suppose we have fewer than 100 CPUs */
        if (fork()) {
            sleep(5);           /* give the chain of forks (most probably) time to complete */
            __asm__("cli");
            while (1)
                ;
        }
    return 0;
}
which will disable interrupts for all CPUs, because every CPU gets a poisonous task to run.
Even so, a supposedly stupid question revealed many things worth learning for a beginner: privileged instructions, paging, fault handling, scheduling during DMA, fork.....
But a minor doubt remains (shame on me) about the first program running on SMP.
Will one CPU be out permanently or not?
The other CPUs continue and can send a re_schedule() IPI message.
What happens then?
It is easy to speculate that the frozen CPU does not wake up, because its interrupts are disabled.
But to be perfectly sure, one must know more.
My question was:
Is the Inter Processor Interrupt (IPI) maskable or non-maskable?
I mean in the most common "popular" implementations?
Excuse my stupid question. It can't be very difficult to find an answer. I will seek it.
I mean the interrupt pin number (which tells whether it is maskable, I guess).
My own answer - correct?
I studied the issue, since nobody else took it up, and came to the following thoughts:
For important real-time applications we have long had watchdog timers (hardware that interrupts the CPU, which must somehow answer "I am alive").
For example, we may have a main control computer and a standby computer that takes care of the system if the main computer goes down.
What about Linux?
What kind of watchdog do we have, if any?
We can compile the Linux kernel with or without a watchdog.
What does the Linux watchdog do?
On a lot of (!) x86/x86-64 hardware there is a feature that enables us to generate 'watchdog NMI interrupts'.
It's even possible to disable the NMI watchdog in run-time by writing "0" to /proc/sys/kernel/nmi_watchdog.
If any CPU in the system does not execute the periodic local timer interrupt for more than 5 seconds, the APIC tries to fix the situation with a non-maskable interrupt (the CPU executes the handler and kills the offending process)!
(SCC Linux is a different case with respect to the NMI.)
My answers (in the original question) were based on the system without watchdog!
It is problematic to answer at a general level and give examples based on some fixed system. The answers can be correct or not depending on the CPU, configuration, and settings.
Anyway, did talking about the NMI make some sense? Did it?
If the CPU didn't restrict access to some instructions, it would be too easy to accidentally or deliberately cause a catastrophe.
push $0
push $0
lidt (%esp)
int $42
This code sequence will reset an x86 processor. Here's why:
The code loads the IDTR register with an interrupt descriptor table (IDT) at linear address 0, with a size of one byte.
It then raises interrupt 42, which can't work because it is beyond the 1-byte limit of the IDT.
The CPU tries to raise a general protection fault, interrupt 13. This fails too, because interrupt 13 is beyond the one byte limit.
The CPU tries to raise a double fault exception, interrupt 8. This fails too, interrupt 8 is beyond the limit of the IDT.
This is known as a triple-fault. The CPU does a shutdown bus cycle to tell the motherboard that it is now ignoring everything and stopping execution. The motherboard asserts reset, rebooting the machine.
This is actually negligible compared to what code could do. A code sequence could easily hijack the machine altogether and start destroying all of the data on the hard drive; it could send all of your files to a malicious server on the internet; it could change your password, enable remote access, or connect out to a malicious server and grant an attacker unlimited shell access. There's no limit on what a program could do.
Processors have privileged instructions for two reasons: the primary purpose is to protect the operating system from buggy programs that might accidentally do something to bring down or hijack the whole machine; the secondary purpose is to restrict deliberately malicious programs from doing the same.

How to keep interrupts short?

The most heard advice in embedded programming is "keep your interrupts short".
Now my situation is that I have a very long-running task in my main() loop (writing large blocks of data to an SD card), which can sometimes take 100ms. So to keep my system responsive I moved all other stuff to interrupt handlers.
For example, normally one would handle the incoming UART data in an interrupt, then process the incoming command in the main() loop, and then send back the response. But in my case, the whole processing/handling of the commands also takes place in the interrupts, because my main() loop can be blocked for (relatively) long periods.
The optimal solution would be to switch to an RTOS but I don't have the RAM for it. Are there alternatives for my design where the interrupts can be short?
The traditional approach for this is for Interrupts to schedule a deferred procedure and end the interrupt as soon as possible.
Once the interrupt has finished, the list of deferred procedures is walked from most-important to least important.
Consider the case where you have your main (lower priority) action, and two interrupts I1 and I2, where I2 is more important than main, but less important than I1.
In this case, let's suppose you're running main and I1 fires. I1 schedules a deferred procedure and signals to the hardware that I1 is done. I1's DPC now begins running. Suddenly I2 comes in from the hardware. I2's interrupt takes over from I1's DPC and schedules I2's DPC and signals to the hardware that it's done.
The scheduler then returns to I1's DPC (because it is more important), and when I1's DPC completes, I2's DPC begins (because it is more important than main), and then eventually returns execution to main.
This design allows you to rank the importance of different interrupts, encourages you to keep your interrupts small, and allows you to complete DPCs in prioritized order.
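As a rough sketch of that idea (every name here is invented for illustration; this is not the API of any particular OS), a minimal prioritized DPC list might look like this:

#include <stddef.h>

typedef void (*dpc_fn)(void);

typedef struct dpc {
    dpc_fn      fn;
    int         priority;         /* larger value = more important */
    struct dpc *next;
} dpc_t;

static dpc_t *pending = NULL;     /* kept sorted, highest priority first */

/* Called from an ISR: a short sorted insert, then the ISR returns. */
void dpc_schedule(dpc_t *d)
{
    dpc_t **p = &pending;
    while (*p && (*p)->priority >= d->priority)
        p = &(*p)->next;
    d->next = *p;
    *p = d;
}

/* Called once the interrupt has finished (and from the idle loop in main). */
void dpc_dispatch(void)
{
    while (pending) {
        dpc_t *d = pending;       /* in real code, guard this pop with interrupts off */
        pending = d->next;
        d->fn();                  /* each DPC runs to completion */
    }
}

In the I1/I2 example above, I2's interrupt would call dpc_schedule() for its DPC and return; dpc_dispatch() then runs I1's DPC before I2's because of the priority ordering, and main resumes only when the list is empty.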
There are 100 different ways to skin this cat, depending on CPU architecture (interrupt nesting & prioritization, software interrupt support, etc.) but let's take a pretty straightforward approach that is relatively simple to understand and free from the race conditions and resource-sharing hazards of a preemptive kernel.
(Disclaimer: my first choice is typically a preemptive real time kernel, many of them can run in extremely resource-constrained systems... SecurityMatt's suggestion is good but if you're not comfortable implementing your own preemptible kernel / task switcher, particularly one that handles asynchronous (interrupt-triggered) preemption, you can get wrapped around the axle pretty quickly. So what I'm proposing below is not as responsive as a preemption-based kernel, but it's much simpler and often adequate).
Create 3 event/work queues:
Q1 is the lowest priority and handles your slow, background SD card writes
Q2 holds requests to process incoming UART packets
Q3 (highest priority) holds UART RX FIFO read requests.
I split up the UART RX FIFO reading and the processing of the read packet so that the FIFO reading is always serviced ahead of the packet processing; maybe you want to keep them together, your choice.
For this to work, you break your large (~100ms) SD card write process into a bunch of smaller, discrete, run to completion steps.
So for example, to write 5 blocks, 20ms each, you write the first block, then enqueue "write next block" to Q1. You go back to your scheduler at the end of each step & scan the queues in priority order, starting with Q3. If Q2 and Q3 are empty, you pull the next event off of Q1 ("write next block"), and run that command for another 20ms before returning and scanning the queues again. If 20ms is not responsive enough, you break up each 20ms block write into a more fine-grained set of steps, continually posting to Q1 the next work step.
Now for the incoming UART stuff; in the UART RX ISR, you simply enqueue a "read UART FIFO" command in Q3, and return from interrupt back into the 20ms "write block" step that was interrupted. As soon as the CPU finishes the write, it goes back and scans the queues in priority order (the worst-case response will be 20ms if the block write had just begun at the time of the interrupt). The queue scanner (scheduler) will see that Q3 now has work to do, and it will run that command before going back and scanning again.
The responsiveness in your system, worst case, will be determined by the longest run-to-completion step in the system, regardless of priority. You keep your system very responsive by doing work in small, discrete, run to completion steps.
Note that I have to speak in generalities here. Maybe you want to read the UART RX FIFO in the ISR, put the data into a buffer, and only defer the packet processing, not the actual reading of the FIFO (then you'd only have 2 queues). You have to work this out for yourself. But I hope the approach makes sense.
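To make the shape of this concrete, here is a hedged sketch in C of such a cooperative, run-to-completion scheduler with three queues. Everything here (the queue type and depth, the handler names, the single-producer/single-consumer ring) is an illustrative assumption, not code from the original system:

#include <stdbool.h>
#include <stddef.h>

typedef void (*event_fn)(void);

#define QUEUE_DEPTH 8
typedef struct {
    event_fn items[QUEUE_DEPTH];
    volatile unsigned head, tail;        /* head: consumer (main), tail: producer (ISR) */
} queue_t;

static queue_t q3, q2, q1;               /* q3 = highest priority, q1 = lowest */

static bool queue_put(queue_t *q, event_fn fn)    /* callable from an ISR */
{
    unsigned next = (q->tail + 1u) % QUEUE_DEPTH;
    if (next == q->head)
        return false;                    /* full: caller decides how to handle overflow */
    q->items[q->tail] = fn;
    q->tail = next;
    return true;
}

static event_fn queue_get(queue_t *q)
{
    event_fn fn;
    if (q->head == q->tail)
        return NULL;                     /* empty */
    fn = q->items[q->head];
    q->head = (q->head + 1u) % QUEUE_DEPTH;
    return fn;
}

/* Placeholder work steps - stand-ins for the real ones described above. */
void read_uart_fifo(void)  { /* drain the RX FIFO into a buffer */ }
void process_packet(void)  { /* parse one complete command      */ }
void write_next_block(void)
{
    /* write one ~20 ms chunk of the SD card transfer, then re-queue the
       next chunk so higher-priority work can run in between chunks */
    queue_put(&q1, write_next_block);
}

int main(void)
{
    /* A UART RX ISR would do: queue_put(&q3, read_uart_fifo);
       and the FIFO reader would then queue_put(&q2, process_packet). */
    queue_put(&q1, write_next_block);    /* kick off the background SD write */
    for (;;) {
        event_fn fn;
        if      ((fn = queue_get(&q3)) != NULL) fn();   /* UART RX FIFO reads  */
        else if ((fn = queue_get(&q2)) != NULL) fn();   /* packet processing   */
        else if ((fn = queue_get(&q1)) != NULL) fn();   /* SD card write steps */
        /* else: idle - could sleep until the next interrupt */
    }
}

The worst-case latency is then the longest single run-to-completion step (here roughly 20ms), exactly as described above.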
This event-driven approach with prioritized queues is exactly the approach used by the Quantum Platform (QP) event-driven framework. The QP actually supports an underlying non-preemptive (cooperative) scheduler, such as the one described here, or a preemptive scheduler which runs the scheduler each time an event is queued (similar to the approach suggested by SecurityMatt). You can see the code/implementation of the QP's cooperative scheduler over at the QP website.
An alternative solution would be as follows:
Anywhere the FAT library can capture the processor for a long time, you insert a call to a new function which is normally very fast and returns to the caller after a few machine cycles. Such a fast function would not impact the real-time performance of your time-consuming operation, such as reading/writing to SD flash. You would insert such a call in any loop that waits for a flash sector to be erased. You would also insert a call to this function between every 512 bytes written or read.
The goal of that function is to perform most of the work that you would normally have inside the "while(1)" loop in a typical "main()" for an embedded device. It would first increment an integer and perform a fast modulo on the new value, then return if the modulo is not equal to an arbitrary constant. The code is as follows:
void premption_check(void)
{
    static int fast_modulo = 0;

    /* divide the number of calls */
    fast_modulo++;
    if ((fast_modulo & 0x003F) != 3) {
        return;
    }

    /* the processor continues here only once every 64 calls to "premption_check" */
    /* ... service the serial port, process complete commands, etc. ... */
}
At that point (inside the function, after the early return) you call the functions that extract RS232 characters/strings received by the serial port interrupts, process any command once complete strings have arrived, etc.
The binary mask 0x3F used above means that we look only at the 6 least significant bits of the counter. When these 6 bits happen to equal the arbitrary value 3, we go ahead with the calls to functions which may take some microseconds or even milliseconds to execute. You may want to try a smaller or larger binary mask depending on how quickly you want to service the serial port and other operations. You may even use more than one mask simultaneously, to service some operations more often than others.
The FAT library and the SD card should not experience any problem when a sporadic delay happens between two flash erase operations, for example.
The solution given here works even on a microcontroller with only 2K bytes, like many variants of the 8051. As incredible as it may seem, the pinball machines of the 1980s and 1990s had a few KB of RAM and slow processors (around 10 MHz), yet they were able to scan a hundred switches (fully debounced), update an X/Y matrix display, produce sound effects, etc. The solutions developed by those engineers can still be used to boost the performance of large systems. Even on the best servers with 64 GB of RAM and many terabytes of disk, I presume that every byte counts when a company wants to index billions of web pages.
As no-one has suggested coming at it from this end yet I'll throw it in the hat:
It's possible that sticking the SD card service routine in a low-priority interrupt, maybe throwing in some DMA if you can, would free up your main loop and other interrupts to be more responsive, rather than being stuck in the main() loop waiting a long time for something to finish.
The caveat to this is I don't know if the hardware has any way of triggering the interrupt when the SD card is ready for more, you might have to cheat by running a polling timer to check & force the interrupt. I'm not above that sort of thing though, if you have spare hardware timers & interrupts it can be done with very little overhead.
Resorting to an RTOS for something like this would seem overkill & an admission of failure to me... ;)