I am trying to understand the concept of semaphores, and I have the following piece of code. Initially, the semaphore mutex is initialized to 1.
// Structure of process Pi
do {
    wait(mutex);
    // critical section
    signal(mutex);
    // remainder section
} while (1);
Considering N processes, does the above algorithm provide a good solution to the critical-section problem?
My observation is that the first two conditions, i.e. mutual exclusion and progress, are being satisfied, but not the bounded buffer. Is that correct?
Mutual exclusion is being satisfied if the semaphore maximum count is 1. Typically you would use a lock if you want mutual exclusion.
Progress isn't necessarily being satisfied. It depends on whether the semaphore implementation guarantees fairness. On some operating systems, given two high-priority threads and one with lower priority, it's possible for the low-priority thread to be starved.
The bounded buffer problem is not being satisfied, but then what you show is not a producer-consumer program.
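For concreteness, here is a minimal sketch of the same pattern in C using a POSIX semaphore with an initial count of 1; the worker function and the shared_counter variable are invented for illustration and are not part of the original code.

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

static sem_t mutex;              /* binary semaphore, initialized to 1 */
static long shared_counter = 0;  /* the shared resource */

static void *worker(void *arg) {
    for (int i = 0; i < 100000; i++) {
        sem_wait(&mutex);        /* wait(mutex) */
        shared_counter++;        /* critical section */
        sem_post(&mutex);        /* signal(mutex) */
        /* remainder section */
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    sem_init(&mutex, 0, 1);      /* count starts at 1 */
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("%ld\n", shared_counter);  /* 200000 if mutual exclusion holds */
    sem_destroy(&mutex);
    return 0;
}

Mutual exclusion holds here, but which of several waiting threads gets past sem_wait next is up to the implementation, which is exactly why bounded waiting is not guaranteed by the pattern itself.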
I have read about a cooperative scheduler, which will not let a higher-priority task run until the lower-priority task blocks itself. So if there is no delay in the task, the lower-priority task will take the CPU forever. Is that correct? I thought non-preemptive was just another name for cooperative, but another article confused me by saying that in non-preemptive scheduling a higher-priority task can interrupt a lower-priority task at the sys tick, not in the middle between ticks. So which is correct?
Are cooperative and non-preemptive actually the same thing?
And rate monotonic is one type of preemptive scheduler, right?
Its priorities are not set manually; the scheduling algorithm decides priority based on execution time or deadline. Is that correct?
Is rate monotonic better than a fixed-priority preemptive kernel (the one FreeRTOS uses)?
These terms can never fully cover the range of possibilities that can exist. The truth is that people can write whatever kind of scheduler they like, and then other people try to put what is written into one or more categories.
Pre-emptive implies that an interrupt (e.g. from a clock or peripheral) can cause a task switch to occur, as well as when a scheduling OS function is called (like a delay, or taking or giving a semaphore).
Co-operative means that the task function must either return or else call an OS function to cause a task switch.
Some OSes might have one specific timer interrupt which causes context switches. The ARM SysTick interrupt is suitable for this purpose. Because the tasks themselves don't have to call a scheduling function, this is one kind of pre-emption.
If a scheduler uses a timer to allow multiple tasks of equal priority to share processor time then one common name for this is a "round-robin scheduler". I have not heard the term "rate monotonic" but I assume it means something very similar.
It sounds like the article you have read describes a very simple pre-emptive scheduler, where tasks do have different priorities, but task switching can only occur when the timer interrupt runs.
Co-operative scheduling is non-preemptive, but "non-preemptive" might describe any scheduler that does not use preemption. It is a rather non-specific term.
The article you describe (without citation), however, seems confused. Context switching on a tick event is preemption if the interrupted task did not explicitly yield. Not everything you read on the Internet is true or authoritative; always check your sources to determine their level of expertise. Enthusiastic amateurs abound.
A fully preemptive priority based scheduler can context switch on "scheduling events" which include not just the timer tick, but also whenever a running thread or interrupt handler triggers an IPC or synchronisation mechanism on which a higher-priority thread than the current thread is waiting.
What you describe as "non-preemptive" I would suggest is in fact a time triggered preemptive scheduler, where a context switch occurs only in a tick event and not asynchronously on say a message queue post or a semaphore give for example.
A rate-monotonic scheduler does not necessarily determine the priority automatically (in fact I have never come across one that did). Rather the priority is set (manually) according to rate-monotonic analysis of the tasks to be executed. It is "rate-monotonic" in the sense that it supports rate-monotonic scheduling. It is still possible for the system designer to apply entirely inappropriate priorities or partition tasks in such a way that they are insufficiently deterministic for RMS to actually occur.
Most RTOS schedulers support RMS, including FreeRTOS. Most RTOS also support variable task priority as both a priority inversion mitigation, and via an API. But to be honest if your application relies on either I would argue that it is a failed design.
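To make the point about priorities being set manually according to rate-monotonic analysis concrete, here is a minimal FreeRTOS-flavoured sketch (the task names, periods, and priority numbers are invented for illustration): under RMA the task with the shorter period simply gets the higher priority, chosen by the designer, not by the scheduler.

#include "FreeRTOS.h"
#include "task.h"

/* Two hypothetical periodic tasks. Rate-monotonic analysis says:
 * shorter period => higher priority. */
static void vSensorTask(void *pvParameters)    /* period 10 ms */
{
    TickType_t xLastWake = xTaskGetTickCount();
    for (;;) {
        /* sample sensor ... */
        vTaskDelayUntil(&xLastWake, pdMS_TO_TICKS(10));
    }
}

static void vControlTask(void *pvParameters)   /* period 50 ms */
{
    TickType_t xLastWake = xTaskGetTickCount();
    for (;;) {
        /* run control loop ... */
        vTaskDelayUntil(&xLastWake, pdMS_TO_TICKS(50));
    }
}

void vCreateTasks(void)
{
    /* Priorities assigned by hand from the periods: the 10 ms task sits
     * above the 50 ms task. The kernel itself knows nothing about RMA. */
    xTaskCreate(vSensorTask,  "sensor",  configMINIMAL_STACK_SIZE, NULL, 3, NULL);
    xTaskCreate(vControlTask, "control", configMINIMAL_STACK_SIZE, NULL, 2, NULL);
}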
My understanding of the Bulkhead pattern is that it's a way of isolating thread pools. Hence, interactions with different services use different thread pools: if the same thread pool is shared, one service timing out constantly might exhaust the entire thread pool, taking down the communication with the other (healthy) services. By using different ones, the impact is reduced.
Given my understanding, I don't see any reason to apply this pattern to non-blocking applications as threads don't get blocked and, therefore, thread pools wouldn't get exhausted either way.
I would appreciate it if someone could clarify this point in case I'm missing something.
EDIT (explain why it's not a duplicate):
There's another (more generic) question asking why one would use the Circuit-Breaker and Bulkhead patterns with Reactor. That question was answered in a very generic way, explaining why all Resilience4J decorators are relevant when working with Reactor.
My question, on the other hand, is particular to the Bulkhead pattern, as I don't understand its benefits in scenarios where threads don't get blocked.
The Bulkhead pattern is not only about isolating thread pools.
Think of Little's law: L = λ * W
Where:
L – the average number of concurrent tasks in a queuing system
λ – the average number of tasks arriving at a queuing system per unit of time
W – the average time a task spends in a queuing system
The Bulkhead pattern is more about controlling L in order to prevent resource exhaustion. This can be done by using:
bounded queues + thread pools
semaphores
Even non-blocking applications require resources per concurrent task which you might want to restrict. Semaphores could help to restrict the number of concurrent tasks.
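To make that concrete with Little's law: if λ is 50 requests per second and W is 0.2 seconds, then on average L = 10 requests are in flight, so a concurrency limit around that value bounds resource usage without throttling normal traffic. Below is a minimal sketch of a semaphore-based bulkhead in C (the call_downstream_service function and the limit of 10 are invented for illustration); in a fully non-blocking application the permit would be released from the completion callback rather than inline, but the bounding idea is the same.

#include <semaphore.h>
#include <stdio.h>

#define MAX_CONCURRENT_CALLS 10          /* illustrative bulkhead limit */

static sem_t bulkhead;                   /* counting semaphore capping L */

/* Hypothetical downstream call; could just as well kick off non-blocking I/O. */
static int call_downstream_service(void) { return 0; }

int guarded_call(void) {
    if (sem_trywait(&bulkhead) != 0) {
        return -1;                       /* bulkhead full: fail fast */
    }
    int result = call_downstream_service();
    sem_post(&bulkhead);                 /* release the permit */
    return result;
}

int main(void) {
    sem_init(&bulkhead, 0, MAX_CONCURRENT_CALLS);
    printf("call result: %d\n", guarded_call());
    sem_destroy(&bulkhead);
    return 0;
}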
The RateLimiter pattern is about controlling λ, and the TimeLimiter is about controlling the maximum time a task is allowed to spend.
An adaptive Bulkhead can even replace RateLimiters. Have a look at this awesome talk, "Stop Rate Limiting! Capacity Management Done Right", by Jon Moore.
We are currently developing an AdaptiveBulkhead in Resilience4j which adapts the concurrency limit of tasks dynamically. The implementation is comparable to TCP Congestion Control algorithms which are using an additive increase/multiplicative decrease (AIMD) scheme to dynamically adapt a congestion window.
But the AdaptiveBulkhead is of course protocol-agnostic.
If you have a semaphore that is being used to restrict access to a shared resource or limit the number of concurrent actions, what is the locking algorithm that lets you change the maximum value of that semaphore once it's in use?
Example 1:
In NSOperationQueue, there is a property named maxConcurrentOperationCount. This value can be changed after the queue has been created. The documentation notes that changing this value doesn't affect any operations already running, but it does affect pending jobs, which presumably are waiting on a lock or semaphore to execute.
Since that semaphore is potentially being waited on by pending operations, you can't just replace it with one that has a new count. So another lock must be needed somewhere in the change, but where?
Example 2:
In most of Apple's Metal sample code, they use a semaphore with an initial count of 3 to manage in-flight buffers. I'd like to experiment with changing that number while my application is running, just to see how big of a difference it makes. I could tear down the entire class that uses that semaphore and then rebuild the Metal pipeline, but that's a bit heavy-handed. Like above, I'm curious how I can structure a sequence of locks or semaphores to allow me to swap out that semaphore for a different one while everything is running.
My experience is with Grand Central Dispatch, but I'm equally interested in a C++ implementation that might use those locking or atomic constructs.
I should add that I'm aware I can technically just make unbalanced calls to signal and wait, but that doesn't seem right to me. Namely, whatever code is making these changes needs to be able to block itself if wait takes a while to reduce the count...
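Since the question boils down to building a counting semaphore whose limit can change while it is in use, here is one possible sketch in C using a mutex and condition variable instead of a fixed-count semaphore (the type and function names are hypothetical, and this is just one way to structure it): the limit is an ordinary variable protected by the mutex, raising it wakes waiters, and lowering it simply stops new acquirers until enough releases have happened, which mirrors the NSOperationQueue behaviour of not affecting work already running.

#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int in_use;      /* permits currently held */
    int limit;       /* current maximum, adjustable at runtime */
} adjustable_sem_t;

void asem_init(adjustable_sem_t *s, int limit) {
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cond, NULL);
    s->in_use = 0;
    s->limit = limit;
}

void asem_acquire(adjustable_sem_t *s) {
    pthread_mutex_lock(&s->lock);
    while (s->in_use >= s->limit)            /* wait while at (or over) the limit */
        pthread_cond_wait(&s->cond, &s->lock);
    s->in_use++;
    pthread_mutex_unlock(&s->lock);
}

void asem_release(adjustable_sem_t *s) {
    pthread_mutex_lock(&s->lock);
    s->in_use--;
    pthread_cond_broadcast(&s->cond);        /* a freed slot may unblock a waiter */
    pthread_mutex_unlock(&s->lock);
}

/* Change the limit while the semaphore is in use. Raising it wakes waiters
 * immediately; lowering it never blocks the caller and never revokes permits
 * that are already held. */
void asem_set_limit(adjustable_sem_t *s, int new_limit) {
    pthread_mutex_lock(&s->lock);
    s->limit = new_limit;
    pthread_cond_broadcast(&s->cond);
    pthread_mutex_unlock(&s->lock);
}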
I need to solve the following problem:
a. Show how to implement acquire() and release() lock operations using TestandSet instruction.
b. Identify a performance problem, that could occur in your solution when it runs on a multiprocessor, but does not occur on a uniprocessor. Describe a concrete scenario where the performance problem arises.
c. Describe an alternative lock implementation that reduces the performance problem in b, and explain how it helps in the concrete scenario you presented in b.
I have my acquire() and release() setup like these:
acquire() {
    while (TestandSet(true)) {
        // wait for lock to be released
    }
}

release() {
    TestandSet(false);
}
However, I could not identify any performance issue regarding multiple processors or a single processor. What is the performance issue? Or, is my implementation of acquire() and release() correct?
Found on the testAndSet wiki:
The four major evaluation metrics for locks in general are uncontended lock-acquisition latency, bus traffic, fairness, and storage.
Test-and-set scores low on two of them, namely, high bus traffic and unfairness.
When processor P1 has obtained a lock and processor P2 is also waiting for the lock, P2 will keep incurring bus transactions in attempts to acquire the lock. When a processor has obtained a lock, all other processors which also wish to obtain the same lock keep trying to obtain the lock by initiating bus transactions repeatedly until they get hold of the lock. This increases the bus traffic requirement of test-and-set significantly. This slows down all other traffic from cache and coherence misses. It slows down the overall system, since the traffic is saturated by failed lock acquisition attempts. Test-and-test-and-set is an improvement over TSL since it does not initiate lock acquisition requests continuously.
When we consider fairness, we consider if a processor gets a fair chance of acquiring the lock when it is set free. In an extreme situation the processor might starve i.e. it might not be able to acquire the lock for an extended period of time even though it has become free during that time.
Storage overhead for TSL is next to nothing since only one lock is required. Uncontended latency is also low since only one atomic instruction and branch are needed.
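To connect this back to part c of the exercise, here is a sketch of a test-and-test-and-set (TTAS) spinlock in C11 (the type name ttas_lock_t is my own, and atomic_exchange stands in for a raw TestandSet instruction): waiters spin on an ordinary cached read and only issue the expensive atomic exchange, and hence bus traffic, when the lock appears free.

#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_bool locked; } ttas_lock_t;   /* false = free */

void ttas_acquire(ttas_lock_t *l) {
    for (;;) {
        /* Plain test: spin on the locally cached value, generating no bus
         * traffic while the lock is held (unlike the pure test-and-set loop
         * in the question). */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ;  /* spin */
        /* Test-and-set: only now attempt the atomic read-modify-write. */
        if (!atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
            return;  /* lock acquired */
    }
}

void ttas_release(ttas_lock_t *l) {
    atomic_store_explicit(&l->locked, false, memory_order_release);
}

In the scenario above, P2 now spins in its own cache until P1's release invalidates the line, so the storm of failed bus transactions largely disappears; something like a ticket lock would additionally address the fairness concern.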
In Objective-C, there are (at least) two approaches to synchronizing concurrent accesses to a shared resource: the older lock-based approach, and the newer approach with Grand Central Dispatch (GCD), which uses dispatch_sync to dispatch all accesses to a shared queue.
In the Concurrency Programming Guide, section Eliminating Lock-Based Code, it is stated that "the use of locks comes at a cost. Even in the non-contested case, there is always a performance penalty associated with taking a lock."
Is this a valid argument for the GCD approach?
I think it's not for the following reason:
A queue must have a list of queued tasks to do. One or more threads can add tasks to this list via dispatch_sync, and one or more worker threads need to remove elements from this list in order to execute the tasks. This must be guarded by a lock. So a lock needs to be taken there as well.
Please tell me if there is any other way how queues can do this without a lock that I'm not aware of.
UPDATE: Further on in the guide, it is implied that there is something I'm not aware of: "queueing a task does not require trapping into the kernel to acquire a mutex."
How does that work?
On current releases of OS X and iOS, both pthread mutexes and GCD queues (as well as GCD semaphores) are implemented purely in userspace without needing to trap into the kernel, except when there is contention (i.e. a thread blocking in the kernel waiting for an "unlock").
The conceptual advantage of GCD queues over locks is more about them being able to be used asynchronously, the asynchronous execution of a "locked" critical section on a queue does not involve any waiting.
If you are just replacing locks with calls to dispatch_sync you are not really taking full advantage of the features of GCD (though the implementation of dispatch_sync happens to be slightly more efficient mainly due to pthread mutexes having to satisfy additional constraints).
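For what it's worth, here is a minimal sketch of that asynchronous usage against the C-level GCD API with the blocks extension (the queue label and shared_counter are invented for illustration): the writer submits the critical section with dispatch_async and never waits, while the dispatch_sync variant behaves much like taking a lock.

#include <dispatch/dispatch.h>
#include <stdio.h>

static long shared_counter = 0;

int main(void) {
    /* A serial queue plays the role of the lock: blocks run one at a time. */
    dispatch_queue_t q = dispatch_queue_create("com.example.counter", DISPATCH_QUEUE_SERIAL);

    /* Asynchronous "locked" update: the caller never blocks on the queue. */
    dispatch_async(q, ^{ shared_counter++; });

    /* Synchronous variant, roughly equivalent to a lock around the read. */
    __block long snapshot;
    dispatch_sync(q, ^{ snapshot = shared_counter; });

    printf("%ld\n", snapshot);
    dispatch_release(q);
    return 0;
}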
There exist lock-free queuing implementations. One reason they are often pooh-poohed is that they are platform-specific, since they rely on the processor's atomic operations (like increment, decrement, compare-and-swap, etc.), and the exact implementation of those will vary from one CPU architecture to another. Since Apple is both the OS and hardware vendor, this criticism is far less of an issue for Apple platforms.
The implication from the documentation is that GCD queue management uses one of these lock-free queues to achieve thread safety without trapping into the kernel.
For more information about one possible macOS/iOS lock-free queue implementation, look at these functions:
void OSAtomicEnqueue( OSQueueHead *__list, void *__new, size_t __offset);
void* OSAtomicDequeue( OSQueueHead *__list, size_t __offset);
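As a rough usage sketch (my own example, not taken from the documentation): each element carries a link pointer, and the offset argument tells the functions where that link lives inside the element. As far as I know this pair behaves as a LIFO.

#include <libkern/OSAtomic.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical element type; the link field is used by the queue. */
typedef struct node {
    void *link;            /* reserved for OSAtomicEnqueue/OSAtomicDequeue */
    int   value;
} node_t;

int main(void) {
    OSQueueHead head = OS_ATOMIC_QUEUE_INIT;

    node_t *n = malloc(sizeof *n);
    n->value = 42;
    OSAtomicEnqueue(&head, n, offsetof(node_t, link));             /* lock-free push */

    node_t *out = OSAtomicDequeue(&head, offsetof(node_t, link));  /* lock-free pop */
    if (out != NULL) {
        printf("%d\n", out->value);
        free(out);
    }
    return 0;
}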
It's worth mentioning here that GCD has been (mostly) open-sourced, so if you're truly curious about the implementation of its queues, go forth and use the source, Luke.