Operating Systems: deadlock possible if a process can only lock one mutex at a time?

Is a deadlock possible in an operating system which disallows nested locking, so that a process can only lock one mutex at a time?
I think it wouldn't be possible, since for a process to acquire another lock it would need to release any lock it's holding. But I am not that familiar with deadlock situations. Is my logic correct?
Thanks.

This depends on how you define "lock". If you mean any activity with the potential to block, then yes, you would be correct that a deadlock is not possible if only one lock can be taken at a time. If you only mean explicitly created mutexes and semaphores, deadlock is still possible, because there are things other than taking a lock that can cause blocking.
However, the only implementations I know of that did something like this were old operating systems which had a single lock for all shared resources and effectively allowed only one thread to be in kernel space at a time. This causes exceedingly poor performance on multi-core systems; in a modern multi-core or multi-CPU system it is better to use other techniques, such as ordered lock acquisition and time-outs, rather than revert to a single global lock.
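To make the second case concrete, here is a minimal C sketch (illustrative, not from the question): each thread holds at most one mutex, yet the program deadlocks, because one thread blocks on a read() while holding the mutex and the thread that would do the matching write() blocks waiting for that same mutex.

#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static int pipefd[2];

static void *thread_a(void *arg) {
    char c;
    pthread_mutex_lock(&m);
    read(pipefd[0], &c, 1);      /* blocks forever: nothing is ever written */
    pthread_mutex_unlock(&m);
    return NULL;
}

static void *thread_b(void *arg) {
    pthread_mutex_lock(&m);      /* blocks forever: A never releases the mutex */
    write(pipefd[1], "x", 1);
    pthread_mutex_unlock(&m);
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pipe(pipefd);
    pthread_create(&a, NULL, thread_a, NULL);
    pthread_create(&b, NULL, thread_b, NULL);
    pthread_join(a, NULL);       /* never returns: a circular wait with just one lock */
    pthread_join(b, NULL);
    return 0;
}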

Related

Can a Deadlock occur with CPU as a resource?

I am in my fourth year of Software Engineering and we are covering the topic of deadlocks.
The generalization goes that a deadlock occurs when two processes A and B each hold one of two resources X and Y and wait for the other process to release its resource before releasing their own.
My question would be: given that the CPU is a resource in itself, is there a scenario where there could be a deadlock involving the CPU as a resource?
My first thought on this problem is that you would require a system where a process cannot be forced off the CPU by timer interrupts (it could just be an FCFS algorithm). You would also require that there be no waiting queues for resources, because entering a wait queue would mean giving up the CPU. But then I also ask: can there be deadlocks when there are queues?
A CPU scheduler can be implemented in any way; you could build one that uses an FCFS algorithm and lets processes decide when to relinquish control of the CPU. But these kinds of implementations are neither practical nor reliable, since the CPU is the single most important resource an operating system has, and allowing a process to take control of it in such a way that it can never be preempted effectively makes that process the owner of the system, which contradicts the basic idea that the operating system should always be in control of the system.
As far as contemporary operating systems (Linux, Windows, etc.) are concerned, this will never happen, because they use preemptive scheduling and do not allow such situations to arise.

acquire() and release() lock operations with testandset()

I need to solve the following problem:
a. Show how to implement acquire() and release() lock operations using the TestandSet instruction.
b. Identify a performance problem that could occur in your solution when it runs on a multiprocessor but does not occur on a uniprocessor. Describe a concrete scenario where the performance problem arises.
c. Describe an alternative lock implementation that reduces the performance problem in b, and explain how it helps in the concrete scenario you presented in b.
I have my acquire() and release() setup like these:
acquire() {
    while (TestandSet(true)) {
        // wait for the lock to be released
    }
}

release() {
    TestandSet(false);
}
However, I could not identify any performance issue that would arise on multiple processors but not on a single processor. What is the performance issue? Or is my implementation of acquire() and release() incorrect?
Found on the testAndSet wiki:
The four major evaluation metrics for locks in general are uncontended lock-acquisition latency, bus traffic, fairness, and storage.
Test-and-set scores low on two of them, namely, high bus traffic and unfairness.
When processor P1 has obtained a lock and processor P2 is also waiting for the lock, P2 will keep incurring bus transactions in attempts to acquire the lock. More generally, when one processor holds a lock, all other processors that wish to obtain the same lock keep retrying by initiating bus transactions repeatedly until they get hold of it. This increases the bus traffic requirement of test-and-set significantly and slows down all other traffic from cache and coherence misses, since the bus is saturated by failed lock acquisition attempts. Test-and-test-and-set is an improvement over TSL since it does not initiate lock acquisition requests continuously.
When we consider fairness, we ask whether a processor gets a fair chance of acquiring the lock when it is set free. In an extreme situation a processor might starve, i.e. it might not be able to acquire the lock for an extended period of time even though the lock has become free during that time.
Storage overhead for TSL is next to nothing since only one lock is required. Uncontended latency is also low since only one atomic instruction and branch are needed.
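To illustrate both the plain test-and-set lock (part a) and the test-and-test-and-set improvement mentioned above (part c), here is a sketch using C11 atomics; the atomic_flag/atomic_bool primitives below are stand-ins, since the assignment presumably expects the course's own TestandSet primitive.

#include <stdatomic.h>
#include <stdbool.h>

/* Plain test-and-set lock (part a): every acquisition attempt is an atomic
 * read-modify-write, so each waiter generates bus/coherence traffic on every
 * iteration of its spin loop. */
static atomic_flag tas_lock = ATOMIC_FLAG_INIT;

static void tas_acquire(void) {
    while (atomic_flag_test_and_set(&tas_lock))
        ;                         /* spin */
}
static void tas_release(void) {
    atomic_flag_clear(&tas_lock);
}

/* Test-and-test-and-set (part c): spin on an ordinary read, and only attempt
 * the atomic exchange once the lock looks free. */
static atomic_bool ttas_lock = false;

static void ttas_acquire(void) {
    for (;;) {
        while (atomic_load(&ttas_lock))
            ;                     /* read-only spin, served from the local cache */
        if (!atomic_exchange(&ttas_lock, true))
            return;               /* we got the lock */
    }
}
static void ttas_release(void) {
    atomic_store(&ttas_lock, false);
}

The second version spins on plain reads that can be satisfied from the local cache, so waiters only hit the bus when the lock actually appears free, which directly addresses the bus-traffic problem described above.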

Distributed Locking for Device

We have a distributed, clustered WebLogic setup.
Our use case: whenever a device contacts our system, we need to compute parameters and provision them to the device. There can be concurrent requests from devices, and we can't reject any request from a device, so we are going with an async processing approach.
The problem we are facing is that whenever a device contacts us, we need to lock the source device as well as its neighbor devices in order to provision optimized parameters.
Since we have a clustered system, we require a distributed locking mechanism that provides high performance.
Could you suggest a framework in Java for distributed locking that suits our requirements?
Regards,
Sakumar
Typically, when you sense a need for distributed locking, that indicates a design flaw. Distributed locking is usually either slow or unsafe. It's slow when done correctly because strong consistency guarantees are required to ensure two processes can't hold the same lock at the same time, and unsafe when consistency constraints are relaxed in favor of performance gains.
Often you can find a better solution than distributed locking by doing something like consistent hashing to ensure related requests are handled by the same process. Similarly, leader election can be a more performant alternative to distributed locking if you can elect a leader and route related requests to the leader. But certainly there must be some cases where these solutions are not possible, and so I'd better answer your question...
Assuming fault tolerance is a requirement, and considering the performance and safety concerns mentioned above, Hazelcast may be a good option for your use case. It's a fast embedded in-memory data grid that has a distributed Lock implementation. Often it's nice to use an embedded system like Hazelcast rather than relying on another cluster, but Hazelcast does have the potential for consistency issues in certain partition scenarios, and that could result in two processes acquiring the same lock. TBH I've heard more than a few complaints about locks in Hazelcast, but no doubt others have had positive experiences.
Alternatively, ZooKeeper is likely the most common system for distributed locking in Java. However, ZooKeeper tends to be significantly slower for writes than reads since it's quorum-based - though it is relatively fast and very mature - and locking is a write-heavy workload. Also, in contrast to Hazelcast, one major downside to ZooKeeper is that it's a separate cluster and thus a dependency on another external system. I think ZooKeeper's stability and maturity make it worth a look.
There don't currently seem to be many proven projects in between Hazelcast (an embedded, eventually consistent framework) and ZooKeeper (a strongly consistent external service), which is why (disclaimer: self-promotion incoming) I created Atomix to provide safe distributed locking and leader elections as an embedded system for Java. It's a decent option if you need a framework like Hazelcast that has the same (actually stronger) consistency guarantees as ZooKeeper.
If performance and scalability are paramount and you're expecting high rates of requests, you will likely have to sacrifice consistency and look at Hazelcast or something similar.
Alternatively, if fault tolerance is not a requirement (I don't think you specified that it is), you can even just use a Redis instance :-)

Does dispatch_sync have a conceptual performance advantage over a lock?

In Objective-C, there are (at least) two approaches to synchronizing concurrent accesses to a shared resource: the older lock-based approach, and the newer approach using Grand Central Dispatch (GCD), where dispatch_sync is used to dispatch all accesses to a shared queue.
In the Concurrency Programming Guide, section Eliminating Lock-Based Code, it is stated that "the use of locks comes at a cost. Even in the non-contested case, there is always a performance penalty associated with taking a lock."
Is this a valid argument for the GCD approach?
I think it's not for the following reason:
A queue must have a list of queued tasks to do. One or more threads can add tasks to this list via dispatch_sync, and one or more worker threads need to remove elements from this list in order to execute the tasks. This must be guarded by a lock, so a lock needs to be taken there as well.
Please tell me if there is any other way how queues can do this without a lock that I'm not aware of.
UPDATE: Further on in the guide, it is implied that there is something I'm not aware of: "queueing a task does not require trapping into the kernel to acquire a mutex."
How does that work?
On current releases of OS X and iOS, both pthread mutexes and GCD queues (as well as GCD semaphores) are implemented purely in userspace without needing to trap into the kernel, except when there is contention (i.e. a thread blocking in the kernel waiting for an "unlock").
The conceptual advantage of GCD queues over locks is more that they can be used asynchronously; the asynchronous execution of a "locked" critical section on a queue does not involve any waiting.
If you are just replacing locks with calls to dispatch_sync you are not really taking full advantage of the features of GCD (though the implementation of dispatch_sync happens to be slightly more efficient mainly due to pthread mutexes having to satisfy additional constraints).
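As a rough side-by-side sketch of the two styles (the counter, function names, and queue label below are made up for illustration):

#include <dispatch/dispatch.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical shared counter, protected two different ways. */
static int counter;

/* Lock-based critical section. */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static void increment_with_lock(void) {
    pthread_mutex_lock(&counter_lock);
    counter++;
    pthread_mutex_unlock(&counter_lock);
}

/* GCD-based: funnel all access to the counter through one serial queue. */
static dispatch_queue_t counter_queue;
static void increment_with_queue(void) {
    dispatch_sync(counter_queue, ^{ counter++; });
}

int main(void) {
    counter_queue = dispatch_queue_create("com.example.counter", DISPATCH_QUEUE_SERIAL);
    increment_with_lock();
    increment_with_queue();
    printf("%d\n", counter);   /* 2 */
    return 0;
}

Used synchronously like this, the two versions are nearly equivalent; the queue only becomes clearly preferable once callers can use dispatch_async instead of waiting.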
There exist lock-free queueing implementations. One reason they are often pooh-poohed is that they are platform-specific, since they rely on the processor's atomic operations (like increment, decrement, compare-and-swap, etc.), and the exact implementation of those will vary from one CPU architecture to another. Since Apple is both the OS and hardware vendor, this criticism is far less of an issue for Apple platforms.
The implication from the documentation is that GCD queue management uses one of these lock-free queues to achieve thread safety without trapping into the kernel.
For more information about one possible macOS/iOS lock-free queue implementation, see these functions:
void OSAtomicEnqueue( OSQueueHead *__list, void *__new, size_t __offset);
void* OSAtomicDequeue( OSQueueHead *__list, size_t __offset);
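A minimal usage sketch, assuming each element embeds its own link pointer and you pass the offset of that field to the queue functions:

#include <libkern/OSAtomic.h>
#include <stddef.h>
#include <stdio.h>

/* Each queued element carries a link pointer used internally by the queue. */
typedef struct node {
    int value;
    struct node *link;
} node_t;

int main(void) {
    OSQueueHead head = OS_ATOMIC_QUEUE_INIT;
    node_t n = { .value = 42, .link = NULL };

    OSAtomicEnqueue(&head, &n, offsetof(node_t, link));
    node_t *out = OSAtomicDequeue(&head, offsetof(node_t, link));
    printf("%d\n", out->value);  /* 42 */
    return 0;
}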
It's worth mentioning here that GCD has been (mostly) open-sourced, so if you're truly curious about the implementation of its queues, go forth and use the source, Luke.

Ticket lock algorithm performance?

Is anyone familiar with the ticket lock algorithm which replaces the basic spinlock algorithm in the Linux kernel? I am hoping to find an expert on this. I've read from a few online sources that the ticket lock algorithm is supposed to be faster, since the naive algorithm overwhelms the CPU bus with all threads trying to get the lock at the same time. Can anyone confirm/deny this for me?
I did some experiments of my own. The ticket lock is indeed fair, but its performance is just about on par with the pthread spinlock algorithm. In fact, it is just a touch slower.
The way I see it, an unfair algorithm should be a bit faster since the thread that hogs the lock early on finishes more quickly, giving the scheduler less work to do.
I'd like to get some more perspective on this. If it isn't faster, why is the ticket lock implemented in the kernel, and why is it not used in user space? Thanks!
> Is anyone familiar with the ticket lock algorithm which replaces the basic spinlock algorithm in the Linux kernel? I am hoping to find an expert on this. I've read from a few online sources that the ticket lock algorithm is supposed to be faster, since the naive algorithm overwhelms the CPU bus with all threads trying to get the lock at the same time. Can anyone confirm/deny this for me?
> I did some experiments of my own. The ticket lock is indeed fair, but its performance is just about on par with the pthread spinlock algorithm. In fact, it is just a touch slower.
I think the ticket lock was introduced mainly for fairness reasons. In terms of speed and scalability, ticket locks and plain spinlocks are almost the same when compared to a scalable lock like MCS: both cause a lot of cache-line invalidations and memory reads, which overwhelm the CPU bus.
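For reference, a ticket lock is essentially just two counters, which is where the FIFO fairness comes from; here is a minimal user-space sketch in C11 (not the Linux kernel's implementation):

#include <stdatomic.h>

typedef struct {
    atomic_uint next_ticket;   /* ticket handed to the next arriving thread */
    atomic_uint now_serving;   /* ticket currently allowed into the critical section */
} ticket_lock_t;

#define TICKET_LOCK_INIT { 0, 0 }

static void ticket_lock(ticket_lock_t *l) {
    /* take a ticket; the fetch_add establishes FIFO order, hence the fairness */
    unsigned my = atomic_fetch_add_explicit(&l->next_ticket, 1, memory_order_relaxed);
    /* busy-wait until our number is called */
    while (atomic_load_explicit(&l->now_serving, memory_order_acquire) != my)
        ;
}

static void ticket_unlock(ticket_lock_t *l) {
    /* only the lock holder updates now_serving, so a plain read-then-store is fine */
    unsigned next = atomic_load_explicit(&l->now_serving, memory_order_relaxed) + 1;
    atomic_store_explicit(&l->now_serving, next, memory_order_release);
}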
> The way I see it, an unfair algorithm should be a bit faster since the thread that hogs the lock early on finishes more quickly, giving the scheduler less work to do.
There's no scheduler involved. Ticket locks and spinlocks are busy-waiting locks: a waiter does not block, but keeps checking the lock value, and the program moves on as soon as the lock is free. Control flow never passes to the scheduler and back. The reason we use a spinlock instead of a block-and-wakeup lock is that blocking and waking up involve context switches, which are expensive; just waiting and burning CPU time turns out to be cheaper. That is also why busy-waiting locks should only be used for "short" critical sections.
> I'd like to get some more perspective on this. If it isn't faster, why is the ticket lock implemented in the kernel, and why is it not used in user space? Thanks!
It's in the kernel because kernel code has critical sections too, so you need kernel-space locks to protect kernel data. But of course, you can implement a user-space ticket lock and use it in your application.