Is there an efficient busy-waiting method in the kernel?

Suppose a process is waiting for a lock held by another process. Either it spins (busy-waits) or it goes to sleep, to be woken up when the lock is released. If the expected wait is long, it is better to sleep, since spinning would waste CPU time; if the wait is short, spinning is preferred, to avoid the overhead of sleeping and waking. Usually the process itself decides whether to spin or sleep. But is there a way to make this decision in the OS?
So the question is: is there a technique by which the OS can know when a process is waiting on a lock, and if so, an efficient technique by which the OS itself decides (based on how long the wait for the lock is likely to be) whether to let the process busy-wait or to put it to sleep?

Related

WDT in a preemptive RTOS kernel

I heard that the best way to use a watchdog timer in a preemptive kernel is to assign it to the lowest-priority/idle task and refresh it there. I fail to understand why, though: what if high-priority tasks keep running and the idle task doesn't run before the timeout?
Any clarifications?
Thanks.
I fail to understand why, though: what if high-priority tasks keep running and the idle task doesn't run before the timeout?
Well, that is kind of the point. If the lowest priority thread (or better, the idle thread) is starved, then your system will be missing deadlines, and is either poorly designed or some unexpected condition has occurred.
If it were refreshed from a high-priority task or an interrupt, then all lower-priority threads could be in a failed state, either running busy or never running at all, and the watchdog would be uselessly maintained while providing no protection whatsoever.
It is nonetheless only a partial solution to system integrity monitoring. It addresses the problem of an errant task hogging the CPU, but it does not deal with the issue of a task blocking and never being scheduled as intended. There are any number of ways of dealing with that, but a simple approach is to have "software watchdogs", counters that get reset by each task, and decremented in a high-priority timer thread or handler. If any thread counter reaches zero, then the respective thread blocked for longer than intended, and action may be taken. This requires that each thread runs at an interval shorter than its watchdog counter reset value. For threads that otherwise block indefinitely waiting for infrequent aperiodic events, you might use a blocking timeout just to update the software watchdog.
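A minimal sketch of the software-watchdog idea, assuming a hypothetical reset_hardware_watchdog() hook, illustrative task names, and illustrative reload values (none of these come from the answer above):

    /* Software-watchdog sketch: each monitored task periodically "kicks" its
     * own counter; a high-priority periodic timer thread or handler decrements
     * all counters and flags any task whose counter reaches zero. */
    #include <stdbool.h>
    #include <stdint.h>

    extern void reset_hardware_watchdog(void);   /* hypothetical hardware hook */

    enum { TASK_COMMS, TASK_CONTROL, TASK_LOGGER, NUM_TASKS };

    /* Reload values in timer ticks; each task must kick more often than this. */
    static const uint32_t reload[NUM_TASKS]   = { 10, 5, 100 };
    static volatile uint32_t counter[NUM_TASKS] = { 10, 5, 100 };

    /* Called by each task from its main loop (or from a blocking-timeout path). */
    void software_watchdog_kick(int task_id)
    {
        counter[task_id] = reload[task_id];
    }

    /* Called from the high-priority periodic timer thread or handler. */
    void software_watchdog_tick(void)
    {
        bool all_alive = true;

        for (int i = 0; i < NUM_TASKS; i++) {
            if (counter[i] == 0 || --counter[i] == 0)
                all_alive = false;           /* task i missed its deadline */
        }

        if (all_alive)
            reset_hardware_watchdog();       /* only kick when every task checked in */
        /* else: withhold the hardware kick (forcing a reset) or take other action */
    }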
There is no absolute rule about the priority of a watchdog task. It depends on your design and goals.
Generally speaking, if the watchdog task is the lowest priority task then it will fail to run (and the watchdog will reset) if any higher priority task becomes stuck or consumes too much of the CPU time. Consider that if the high priority task(s) is running 100% of the time then that's probably too much because lower priority tasks are getting starved. And maybe you want the watchdog to reset if lower priority tasks are getting starved.
But that general idea isn't a complete design. See this answer, and especially the "Multitasking" section of this article (https://www.embedded.com/watchdog-timers/) for a more complete watchdog task design. The article suggests making the watchdog task the highest priority task but discusses the trade-offs of the alternative.

It is said that using sleep() is poor design, but what about waiting for hardware to settle?

I am using sleep() in two ways in my current embedded (real time) software design:
To throttle a processing loop; this is discussed here, and as pointed out, thread priority will most likely address that well.
Waiting for hardware to "settle". Let's say I am writing an interface to some hardware. Communication with the hardware is all good, but I want to change its mode, and I know it only takes a small number of instruction cycles to do so. I am using sleep(1); to pause briefly to allow for this. I could set up a loop that keeps pinging the device until I receive a valid response, but that would arguably be harder to read (much more code) and, in fact, slower because of data transfer times. In fact, I could probably do a usleep(100) or less in my case.
So my question is: is this good practice? And if not, is there a better/more efficient alternative?
Callback
The ideal solution would be to have the hardware notify you when a particular operation is complete, through some form of callback/signal.
When writing production code, I would almost always favor this solution above all others, provided the API you are using exposes such a mechanism.
Poll
If there is no way for you to receive such events, then the only other option is to check whether the operation has completed. The most naive approach is one that checks constantly (a busy-wait/spin).
However, if you know roughly how long the operation should take, you can sleep for that duration, wake up, check the operation status, then sleep again or continue (see the sketch below).
If you are 100% sure about the timings and can guarantee that your thread is not woken up early, then you can rely solely on sleep.
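A minimal sketch of that sleep-based poll, assuming a hypothetical check_mode_ready() hardware query and illustrative timing constants:

    /* Poll-with-sleep sketch: check the device, and if it is not ready yet,
     * sleep briefly and try again, up to an overall timeout. */
    #include <stdbool.h>
    #include <unistd.h>

    extern bool check_mode_ready(void);          /* hypothetical hardware query */

    /* Returns true once the device reports the new mode, false on timeout. */
    bool wait_for_mode_change(void)
    {
        const useconds_t poll_interval_us = 100; /* per the usleep(100) idea above */
        const int max_attempts = 100;            /* ~10 ms total budget */

        for (int i = 0; i < max_attempts; i++) {
            if (check_mode_ready())
                return true;
            usleep(poll_interval_us);            /* yield the CPU between checks */
        }
        return false;                            /* hardware never settled */
    }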
Poor Design?
I wouldn't necessarily say that using sleep for this task is poor design; sometimes you have no other choice. I would say that relying solely on sleep is poor design when you cannot guarantee the timing, because you cannot be 100% sure that the operation you are waiting for has in fact completed.
On Linux I use sigsuspend; it suspends the process until it receives a signal.
Example
My main thread needs some data, but the data isn't ready, so the main thread is suspended.
Another thread reads the data and, when it finishes, it fires a signal.
Now the main thread continues, and the data is ready.
If you use sleep, the data may or may not be ready when you wake up.
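A minimal sketch of this pattern, assuming SIGUSR1 as the signal and a plain int as the data (both are illustrative choices, not from the answer above):

    #include <pthread.h>
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static volatile sig_atomic_t data_ready = 0;
    static int data;

    static void on_sigusr1(int sig) { (void)sig; data_ready = 1; }

    static void *reader(void *arg)
    {
        pthread_t main_thread = *(pthread_t *)arg;
        sleep(1);                              /* stand-in for reading the data */
        data = 42;
        pthread_kill(main_thread, SIGUSR1);    /* "fire a signal" at the main thread */
        return NULL;
    }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_sigusr1;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGUSR1, &sa, NULL);

        /* Block SIGUSR1 so it can only be delivered inside sigsuspend(); this
         * closes the race between checking data_ready and suspending. */
        sigset_t block, oldmask;
        sigemptyset(&block);
        sigaddset(&block, SIGUSR1);
        pthread_sigmask(SIG_BLOCK, &block, &oldmask);

        pthread_t self = pthread_self(), worker;
        pthread_create(&worker, NULL, reader, &self);

        while (!data_ready)
            sigsuspend(&oldmask);              /* suspend until the signal arrives */

        printf("data = %d\n", data);           /* data is ready at this point */
        pthread_join(worker, NULL);
        return 0;
    }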

Operating System Basics

I am reading about process management, and I have a few doubts:
What is meant by an I/O request? For example, a process that is executing is in the running state, and it is in the waiting state if it is waiting for the completion of an I/O request. I do not understand what an I/O request is; can you please give an example to elaborate?
Another doubt: let's say a process is executing and suddenly an interrupt occurs, so the process stops its execution and is put in the ready state. Is it possible that some other process begins its execution while the interrupt is being processed?
Regarding the first question:
A simple way to think about it...
Your computer has lots of components: CPU, hard drive, network card, sound card, GPU, etc. They all work in parallel and independently of each other, and they are generally slower than the CPU.
This means that whenever a process makes a call that, down the line (on the OS side), ends up communicating with an external device, there is no point in the OS sitting there waiting for the result, since the time that operation takes to complete is practically an eternity from the CPU's point of view.
So the OS fires up whatever communication the process requested (call it an I/O request), flags the process as waiting for I/O, and switches execution to another process so the CPU can do something useful instead of sitting around blocked waiting for the I/O request to complete.
When the external device finishes whatever operation was requested, it generates an interrupt, so the OS is informed the work is done, and it can then flag the blocked process as ready again.
This is all a very simplified view of course, but that's the main idea. It allows the CPU to do useful work instead of waiting for IO requests to complete.
Regarding the second question:
It's tricky, even on single-CPU machines, and depends on how the OS handles interrupts.
For code simplicity, a simple OS might, for example, process each interrupt in one go whenever it happens, and then resume whichever process it decides is appropriate once the interrupt handling is done. In that case, no other process would run until the interrupt handling is complete.
In practice, things get a bit more complicated for performance and latency reasons.
If you think of an interrupt's lifetime as just another task for the CPU (from when the interrupt arrives to the point the OS considers its handling complete), you can effectively code the interrupt handling to run in parallel with other things.
Just think of the interrupt as a notification for the OS to start another task (the interrupt handling). The OS grabs whatever context it needs at the point the interrupt arrived, then processes that task in parallel with other processes.
An I/O request generally just means a request to do input, output, or both. The exact meaning varies with the context: HTTP, networking, console operations, or maybe some operation inside the CPU.
A process is waiting for I/O: say, for example, you were writing a program in C to accept the user's name on the command line and then print 'Hello User' back. Your code will sit in the waiting state until the user enters their name and hits Enter. This is a high-level example, but even a very low-level process executing on your computer's processor works on the same basic principle.
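A minimal version of that example; the fgets() call is the blocking I/O request here:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char name[64];

        printf("Enter your name: ");
        fflush(stdout);

        /* The process blocks here on terminal input (an I/O request). */
        if (fgets(name, sizeof name, stdin) != NULL) {
            name[strcspn(name, "\n")] = '\0';   /* strip the trailing newline */
            printf("Hello %s\n", name);
        }
        return 0;
    }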
Can the processor work on other processes when the current one is interrupted and waiting on something? Yes! You'd better hope it does; that's what scheduling algorithms and stacks are for. The real answer, however, depends on what architecture you are on, whether it supports parallel or only serial processing, and so on.

spin lock vs mutex sleep lock

Spin locks (busy-waiting locks) are more efficient than mutex sleep locks for very short critical sections. Suppose that the context-switch time for a system (the time it takes to save the current process and load the next) is time T. How long can a critical section be before it is more efficient to use a mutex sleep lock rather than a spin lock?
It depends on the specific case. How much CPU time are you willing to burn on spinning in order to wake up faster?
Intel once said 20 cycles, but that was very long ago, in the days of many more threads than cores.
If your waiting thread is very critical to you, it would have its own dedicated core, and you would not mind spinning forever for the benefit of the ultimate fastest wake-up.
If it is less critical than that, and the core is shared with other threads, you may want to give the CPU time to some other thread. If you don't do that, the OS will eventually do it for you, but that is less than optimal, obviously.
Bottom line: test and see the differences in performance, then re-tune, re-test, and so on.
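As a starting point for such experiments, here is a minimal spin-then-yield sketch using C11 atomics and sched_yield(); SPIN_LIMIT is an arbitrary knob to be tuned by benchmarking, not a recommended value:

    #include <sched.h>
    #include <stdatomic.h>

    #define SPIN_LIMIT 1000   /* measure, don't guess */

    typedef struct { atomic_flag locked; } hybrid_lock;
    #define HYBRID_LOCK_INIT { ATOMIC_FLAG_INIT }

    void hybrid_lock_acquire(hybrid_lock *l)
    {
        /* Phase 1: busy-wait briefly, hoping the holder releases soon. */
        for (int i = 0; i < SPIN_LIMIT; i++)
            if (!atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
                return;

        /* Phase 2: stop burning CPU; yield to other threads between attempts. */
        while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
            sched_yield();    /* or nanosleep() for a longer back-off */
    }

    void hybrid_lock_release(hybrid_lock *l)
    {
        atomic_flag_clear_explicit(&l->locked, memory_order_release);
    }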

Is it safe to access the hard drive via many different GCD queues?

Is it safe? For instance, if I create a bunch of different GCD queues that each compress (tar cvzf) some files, am I doing something wrong? Will the hard drive be destroyed?
Or does the system properly take care of such things?
Dietrich's answer is correct save for one detail (that is completely non-obvious).
If you were to spin off, say, 100 asynchronous tar executions via GCD, you'd quickly find that you have 100 threads running in your application (which would also be dead slow due to gross abuse of the I/O subsystem).
In a fully asynchronous concurrent system with queues, there is no way to know if a particular unit of work is blocked because it is waiting for a system resource or waiting for some other enqueued unit of work. Therefore, anytime anything blocks, you pretty much have to spin up another thread and consume another unit of work or risk locking up the application.
In such a case, the "obvious" solution is to wait a bit when a unit of work blocks before spinning up another thread to de-queue and process another unit of work with the hope that the first unit of work "unblocks" and continues processing.
Doing so, though, would mean that any asynchronous concurrent system with interaction between units of work -- a common case -- would be so slow as to be useless.
Far more effective is to limit the # of units of work that are enqueued in the global asynchronous queues at any one time. A GCD semaphore makes this quite easy; you have a single serial queue into which all units of work are enqueued. Every time you dequeue a unit of work, you increment the semaphore. Every time a unit of work is completed, you decrement the semaphore. As long as the semaphore is below some maximum value (say, 4), then you enqueue a new unit of work.
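A sketch of that throttling idea using the real dispatch_semaphore API (the limit of 4, the dispatch group, and the printf stand-in for the tar work are illustrative assumptions; this version gates submission with the semaphore rather than using a separate serial queue):

    /* Compile as C with blocks, e.g. clang on macOS. */
    #include <dispatch/dispatch.h>
    #include <stdio.h>

    int main(void)
    {
        dispatch_semaphore_t slots = dispatch_semaphore_create(4); /* at most 4 in flight */
        dispatch_queue_t work_q =
            dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
        dispatch_group_t group = dispatch_group_create();

        for (int i = 0; i < 100; i++) {
            /* Block the submitting loop until one of the 4 slots frees up. */
            dispatch_semaphore_wait(slots, DISPATCH_TIME_FOREVER);
            dispatch_group_async(group, work_q, ^{
                printf("archiving item %d\n", i);    /* stand-in for the tar work */
                dispatch_semaphore_signal(slots);    /* release the slot when done */
            });
        }

        dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
        return 0;
    }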
If you take something that is normally I/O-limited, such as tar, and run a bunch of copies in GCD:
It will run more slowly, because you are throwing more CPU at an I/O-bound task; the I/O will be more scattered and there will be more of it in flight at the same time.
No more than N tasks will run at a time (which is the point of GCD), so "a billion queue entries" and "ten queue entries" give you the same thing if you have fewer than 10 threads.
Your hard drive will be fine.
Even though this question was asked back in May, it's still worth noting that GCD has now provided I/O primitives with the release of 10.7 (OS X Lion). See the man pages for dispatch_read and dispatch_io_create for examples on how to do efficient I/O with the new APIs. They are smart enough to properly schedule I/O against a single disk (or multiple disks) with knowledge of how much concurrency is, or is not, possible in the actual I/O requests.
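For example, a minimal dispatch_read call might look like this (the file path, read length, and the semaphore used to keep main() alive are illustrative assumptions):

    #include <dispatch/dispatch.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/etc/hosts", O_RDONLY);   /* any readable file */
        if (fd < 0) return 1;

        dispatch_queue_t q =
            dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
        dispatch_semaphore_t done = dispatch_semaphore_create(0);

        /* GCD schedules the actual read and calls the handler with the result;
         * concurrency against the disk is managed by the library. */
        dispatch_read(fd, 4096, q, ^(dispatch_data_t data, int error) {
            if (error == 0)
                printf("read %zu bytes\n", dispatch_data_get_size(data));
            dispatch_semaphore_signal(done);
        });

        dispatch_semaphore_wait(done, DISPATCH_TIME_FOREVER);
        close(fd);
        return 0;
    }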