Do threads/processes have to disable interrupts while executing a critical section - locking

Let us consider a scenario:-
A Kernel thread acquires a lock and is in the middle of a critical section when an interrupt occurs.
The interrupt handler runs and arrives at the same critical section and tries to acquire lock and go to sleep.
Can this happen or are interrupts disabled during a critical section ?
what steps are taken to to avoid it ?
// Some Code
Acquire_lock()
Critical section //Interrupt occurs and arrives to acquire the same lock.
Disable_lock()

You would never allow code that holds a lock to be interrupted by code that attempts to acquire that same lock. If you mean inside the OS, it may require disabling all interrupts in code that interacts with objects that are also manipulated by interrupt handlers.
User-space threads and processes have no such issue. No interrupt handler acquires a lock that user-space threads can acquire. And if a thread that holds a user-space lock is interrupted, it will release it as soon as it gets rescheduled -- the user-space thread is still ready-to-run.

Related

what's different between the Blocked and Busy Waiting?

I known the implement of Busy Waiting. it's a death loop like this:
//main thread
while (true) {
msg = msgQueue.next();
msg.runnable.run();
}
//....msg queue
public Message next() {
while (true) {
if (!queue.isEmpty()) {
return queue.dequeue();
}
}
}
so, the method "next()" just looks like blocked, actually it runs all the time.
this was called "busy waiting" on book.
and what's the "process blocked"? what about its implement details?
is a death loop too? or some others? like signal mechanism?
For instance:
cat xxx | grep "abc"
process "cat" read a file and output them.
process "grep" waiting for input from "cat".
so before the "cat" output data, "grep" should be blocked, waiting for input and go on.
what details about this "blocked", a death loop read the input stream all the time? or really stop running, waiting a signal to wake up it to run?
The difference is basically in what happens to the process:
1. Busy Waiting
A process that is busy waiting is essentially continuously running, asking "Are we there yet? Are we there yet? How about now, are we there yet?" which consumes 100% of CPU cycles with this question:
bool are_we_there = false;
while(!are_we_there)
{
// ask if we're there (without blocking)
are_we_there = ask_if_we_are_there();
}
2. A process that is blocked (or that blocks)
A process that is blocked is suspended by the operating system and will be automatically notified when the data that it is waiting on becomes available. This cannot be accomplished without assistance from the operating system.
And example is a process that is waiting for a long-running I/O operation, or waiting for a timer to expire:
// use a system call to create a waitable timer
var timer = CreateWaitableTime()
// use another system call that waits on a waitable object
WaitFor(timer); // this will block the current thread until the timer is signaled
// .. some time in the future, the timer might expire and it's object will be signaled
// causing the WaitFor(timer) call to resume operation
UPDATE
Waitable objects may be implemented in different ways at the operating system level, but generally it's probably going to be a combination of hardware timers, interrupts and lists of waitable objects that are registered with the operating system by client code. When an interrupt occurs, the operating system's interrupt handler is called which in turn will scan though any waitable objects associated with that event, and invoke certain callback which in turn will eventually signal the waitable objects (put them in a signaled state). This is an over-simplification but if you'd like to learn more you could read up on interrupts and hardware timers.
When you say "a process is blocked" you actually mean "a thread is blocked" because those are the only schedulable entities getting CPU time. When a thread is busy waiting, it wastes CPU time in a loop. When a thread is blocked, the kernel code inside the system call sees that data or lock is not immediately available so it marks the thread as waiting. It then jumps to the scheduler which picks up another thread ready for execution. Such a code in a blocking system call might look like this:
100: if (data_available()) {
101: return;
102: } else {
103: jump_to_scheduler();
104: }
Later on the thread is rescheduled and restarts at line 100 but it immediately gets to the else branch and gets off the CPU again. When data becomes available, the system call finally returns.
Don't take this verbatim, it's my guess based on what I know about operating systems, but you should get the idea.

Process information when interrupted: stack or process control block

I am studying an Operating System course and we have this chapter about Processes. In this chapter we define the Process Control Block, which keeps the information about a process such as the program counter, content of registers, state, priority and so on. In this chapter it says that when the processor switches to another process (by interrupt), information will be saved in this process control block (PC, registers,...). In another chapter (1.4 Interrupts) it says when a process gets interrupted the PSW, PC and registers get put on the stack and when processor retakes control of this process it takes it from the stack.
It seems to be there are 2 different explanations here for what happens when an interrupt occurs. Do they both happen simultaneously or what? Can anyone explain this to me?
Thanks in advance
Sander
Think of an interrupt as a function call with the difference that it stores more information on the stack and occurs at any time breaking into the normal flow of the program instructions. So, if interrupt handler function decides just to return from the interrupt call, the state is restored from the stack.
Otherwise, if inside the interrupt call, OS decides to preempt the current user process, it saves all the process state to PCB and switch the stack to another process.
BTW, switching to another process can happen not only by interrupt but during any normal call to OS kernel API (syscall).

understanding the concept of running a program in interrupt handler

Early Cisco routers running IOS operating system enhanced their packet processing speed by doing packet switching within the interrupt handler instead in "regular" operating system process. Doing packet processing in interrupt handler ensured that context switching within operating system does not affect the packet processing. As I understand, interrupt handler is a piece of software in operating system meant for handling the interrupts. How to understand the concept of packet switching done within the interrupt handler?
use of interrupts is preferred when an event requires some immediate attention by the operating system, or a program which installed an interrupt service routine. This as opposed to polling, where software checks periodically whether a condition exists, which indicates that the event has occurred.
interrupt service routines aren't commonly meant to do a lot of work themselves. They are rather written to reach their end as quickly as possible, so that normal execution can resume. "normal execution" meaning, the location and state previous processing was interrupted when the interrupt occurred. reason is that it must be avoided that the same interrupt occurs again while its handler is still executed, or it may be ignored, or lead to incorrect results, or even worse, to software failure (crashes). So what an interrupt service routine usually does is, reading any data associated with that event and storing it in a queue, signalling that the queue experienced mutation, and setting things such that another interrupt may occur, then resume by restoring pre-interrupt context. the queued data, associated with that interrupt, can now be processed asynchronously, without risking that interrupts pile up.
The following is the procedure for executing interrupt-level switching:
Look up the memory structure to determine the next-hop address and outgoing interface.
Do an Open Systems Interconnection (OSI) Layer 2 rewrite, also called MAC rewrite, which means changing the encapsulation of the packet to comply with the outgoing interface.
Put the packet into the tx ring or output queue of the outgoing interface.
Update the appropriate memory structures (reset timers in caches, update counters, and so forth).
The interrupt which is raised when a packet is received from the network interface is called the "RX interrupt". This interrupt is dismissed only when all the above steps are executed. If any of the first three steps above cannot be performed, the packet is sent to the next switching layer. If the next switching layer is process switching, the packet is put into the input queue of the incoming interface for process switching and the interrupt is dismissed. Since interrupts cannot be interrupted by interrupts of the same level and all interfaces raise interrupts of the same level, no other packet can be handled until the current RX interrupt is dismissed.
Different interrupt switching paths can be organized in a hierarchy, from the one providing the fastest lookup to the one providing the slowest lookup. The last resort used for handling packets is always process switching. Not all interfaces and packet types are supported in every interrupt switching path. Generally, only those that require examination and changes limited to the packet header can be interrupt-switched. If the packet payload needs to be examined before forwarding, interrupt switching is not possible. More specific constraints may exist for some interrupt switching paths. Also, if the Layer 2 connection over the outgoing interface must be reliable (that is, it includes support for retransmission), the packet cannot be handled at interrupt level.
The following are examples of packets that cannot be interrupt-switched:
Traffic directed to the router (routing protocol traffic, Simple Network Management Protocol (SNMP), Telnet, Trivial File Transfer Protocol (TFTP), ping, and so on). Management traffic can be sourced and directed to the router. They have specific task-related processes.
OSI Layer 2 connection-oriented encapsulations (for example, X.25). Some tasks are too complex to be coded in the interrupt-switching path because there are too many instructions to run, or timers and windows are required. Some examples are features such as encryption, Local Area Transport (LAT) translation, and Data-Link Switching Plus (DLSW+).
More here: http://www.cisco.com/c/en/us/support/docs/ios-nx-os-software/ios-software-releases-121-mainline/12809-tuning.html

Strategy for feeding a watchdog in a multitask environment

Having moved some embedded code to FreeRTOS, I'm left with an interesting dilemma about the watchdog. The watchdog timer is a must for our application. Using FreeRTOS has been a huge boon for us too. When the application was more single-tasked, it fed the watchdog at timely points in its logic flow so that we could make sure the task was making logical progress in a timely fashion.
With multiple tasks though, that's not easy. One task could be bound up for some reason, not making progress, but another is doing just fine and making enough progress to keep the watchdog fed happily.
One thought was to launch a separate task solely to feed the watchdog, and then use some counters that the other tasks increment regularly, when the watchdog task ticks, it would make sure that all the counters looked like progress was being made on all the other tasks, and if so, go ahead and feed the watchdog.
I'm curious what others have done in situations like this?
A watchdog task that monitors the status of all the other tasks is a good solution. But instead of a counter, consider using a status flag for each task. The status flag should have three possible values: UNKNOWN, ALIVE, and ASLEEP. When a periodic task runs, it sets the flag to ALIVE. Tasks that block on an asynchronous event should set their flag to ASLEEP before they block and ALIVE when the run. When the watchdog monitor task runs it should kick the watchdog if every task is either ALIVE or ASLEEP. Then the watchdog monitor task should set all of the ALIVE flags to UNKNOWN. (ASLEEP flags should remain ASLEEP.) The tasks with the UNKNOWN flag must run and set their flags to ALIVE or ASLEEP again before the monitor task will kick the watchdog again.
See the "Multitasking" section of this article for more details: http://www.embedded.com/design/debug-and-optimization/4402288/Watchdog-Timers
This is indeed a big pain with watchdog timers.
My boards have an LED on a GPIO line, so I flash that in a while/sleep loop, (750ms on, 250ms off), in a next-to-lowest priority thread, (lowest is idle thread which just goes onto low power mode in a loop). I have put a wdog feed in the LED-flash thread.
This helps with complete crashes and higher-priority threads that CPU loop, but doesn't help if the system deadlocks. Luckily, my message-passing designs do not deadlock, (well, not often, anyway:).
Do not forget to handle possible situation where tasks are deleted, or dormant for longer periods of time. If those tasks were previously checked in with a watchdog task, they also need to have a 'check out' mechanism.
In other words, the list of tasks for which a watchdog task is responsible should be dynamic, and it should be organized so that some wild code cannot easily delete the task from the list.
I know, easier said then done...
I've design the solution using the FreeRTOS timers:
SystemSupervisor SW Timer which feed the HW WD. FreeRTOS Failure
causes reset.
Each task creates "its own" SW timer with SystemReset function.
Each task responsible to "manually" reload its timer before it expired.
SystemReset function saves data before commiting a suiside
Here is some pseudo-code listing:
//---------------------------------
//
// System WD
//
void WD_init(void)
{
HW_WD_Init();
// Read Saved Failure data, Send to Monitor
// Create Monitor timer
xTimerCreate( "System WD", // Name
HW_WD_INTERVAL/2, // Reload value
TRUE, // Auto Reload
0, // Timed ID (Data per timer)
SYS_WD_Feed);
}
void SYS_WD_Feed(void)
{
HW_WD_Feed();
}
//-------------------------
// Tasks WD
//
WD_Handler WD_Create()
{
return xTimerCreate( "", // Name
100, // Dummy Reload value
FALSE, // Auto Reload
pxCurrentTCB, // Timed ID (Data per timer)
Task_WD_Reset);
}
Task_WD_Reset(pxTimer)
{
TaskHandler_t th = pvTimerGetTimerID(pxTimer)
// Save Task Name and Status
// Reset
}
Task_WD_Feed(WD_Handler, ms)
{
xTimerChangePeriod(WD_Handler, ms / portTICK_PERIOD_MS, 100);
}

What happens when a thread makes kernel disable the interrupts and then that thread goes to sleep

I have this kernel code where I disable the interrupt to make this lock acquire operation atomic, but if u see the last else condition i.e. when lock is not available thread goes to sleep and interrupts are enable only after thread comes back from sleep. My question is so interrupts are disabled for whole OS until this thread comes out of sleep?
void Lock::Acquire()
{
IntStatus oldLevel = interrupt->SetLevel(IntOff); // Disabling the interrups to make the following statements atomic
if(lockOwnerThread == currentThread) //Checking if the requesting thread already owns lock
{
//printf("SM:error:%s already owns the lock\n",currentThread->getName());
DEBUG('z', "SM:error:%s already owns the lock\n",currentThread->getName());
(void) interrupt->SetLevel(oldLevel);
return;
}
if(lockOwnerThread==NULL)
{
lockOwnerThread = currentThread; // Lock owner ship is given to current thread
DEBUG('z', "SM:The ownership of the lock %s is given to %s \n",name,currentThread->getName());
}
else
{
DEBUG('z', "SM:Adding thread %s to request queue and putting it to sleep\n",currentThread->getName());
queueForLock->Append((void *)currentThread); // Lock is busy so add the thread to queue;
currentThread->Sleep(); // And go to sleep
}
(void) interrupt->SetLevel(oldLevel); // Enable the interrupts
}
I don't know the NACHOS and I would not make any assumptions on my own. So you have to test it.
The idea is simple. If this interrupt enable/disable functionality is local to the current process context then the following should happen when you call Sleep():
the process is marked as not-running, i.e. it is excluded from the list of processes the scheduler will consider to give a CPU time. Then the Sleep() function enforces the scheduler to do it's regular work - to find a process to run. If the list of running processes is not empty, the scheduler picks up a next available process and makes a context switch to this process. After this the state of interrupt management is restored from this new context.
If there are no processes to run then scheduler enters the Idle loop state and usually enables the interrupts. While the scheduler is in Idle loop it continues to poll the queue of the running processes until it get something to schedule.
Your process will get the control when it will be marked as running again. This could happen if some other process calls WakeUp() (or a like, as I mentioned the API is unknown to me)
When the scheduler will pick up your process to switch to it performs the usual (for your system) context switch that has the interrupts enabled flag set to false, so the execution continues at statement after the Sleep() call with interrupts disabled.
If the assumptions above are incorrect and the interrupts enabled flag is global, then there are two possibilities: either the system hangs as it can't serve the interrupts, or it has some workaround for such a situations.
So, you need to try. The best way is to read the kernel sources of course, if you have the access.))