PThread RWLock Deadlocking with Recursive Locks

I've been working on a small sandboxed example to help me figure out how to use rwlocks. Everything seems fairly straightforward; however, I'm getting deadlocks in my example every once in a while and don't understand why it's happening.
I've put the code example on pastebin because it's more than a few lines of code: http://pastebin.org/359203
If you run the example, then when it eventually deadlocks, the last three print statements will be one of two cases:
one:
th4: request lock
th3: request lock
th4: locked
two:
th3: request lock
th4: request lock
th3: locked
Based on the output, it seems to me that the eventual deadlock comes from a second call to a locking function, whether it's for a read lock or a write lock. But since one of the threads has the lock, and that same thread is the one making the second locking call, why is it deadlocking? Even more interesting, what is it in this small case that is causing the deadlock?
Note that I'm on Mac OS X, and that this is a seriously contrived example. It's sandboxed from something else I'm working on, and I wanted to make sure I get this part right.

pthread_rwlock supports recursive read locking, but not recursive write locking. If you write lock the lock while you already hold it, you have entered the realm of undefined behavior. This is the case for your thfn3().
It's clearer if you call the threads the "reader" (thfn4) and the "writer" (thfn3). Case one is then:
reader tries to lock
writer tries to lock and blocks waiting for reader to release lock
reader gets lock
reader tries to lock again and blocks waiting for writer to acquire lock and release lock
In this case, the reader is likely unable to lock again because there is a writer waiting on the lock, and would-be writers block would-be readers.
Case two is:
writer tries to lock
reader tries to lock and blocks waiting for writer to finish
writer gets lock
writer tries to lock again and blocks
This case can likely only be explained by appeal to details of the rwlock implementation.
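Case one can be walked through with a toy model. The sketch below is a hypothetical, single-threaded Python model of a writer-preferring read-write lock; the class and method names are invented for illustration, and it is not a description of how pthread_rwlock is implemented on any platform. It never blocks. It only reports whether a request would have to wait.

```python
class WriterPreferringRWLock:
    """Toy state model of a read-write lock that favours waiting writers."""

    def __init__(self):
        self.active_readers = 0
        self.writer_active = False
        self.waiting_writers = 0

    def request_read(self):
        """Return True if a read lock is granted, False if the caller would block."""
        # New read requests are refused while a writer holds or waits for the lock.
        if self.writer_active or self.waiting_writers > 0:
            return False
        self.active_readers += 1
        return True

    def request_write(self):
        """Return True if the write lock is granted, False if the caller would block."""
        if self.writer_active or self.active_readers > 0:
            self.waiting_writers += 1  # the writer queues up behind current holders
            return False
        self.writer_active = True
        return True


lock = WriterPreferringRWLock()
reader_first = lock.request_read()    # the reader gets the lock
writer_first = lock.request_write()   # the writer must wait for the reader
reader_second = lock.request_read()   # the same reader asks again and must wait for the writer

print(reader_first, writer_first, reader_second)
```

In this model the reader waits for the writer and the writer waits for the reader, which is the cycle described in case one.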

Your problem is that pthread_rwlock_wrlock(3) is not reentrant. The documentation clearly states that the results of calling this method when the thread already holds the lock are undefined. Your code specifically calls the method twice without releasing the lock in between.
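As an analogy only (Python has no pthread_rwlock, so this swaps in Python's own lock types), the difference between a non-reentrant and a reentrant lock can be seen with a non-blocking second acquire. With a blocking call, the non-reentrant case would simply hang.

```python
import threading

# threading.Lock is not reentrant: a second acquire by the holder cannot succeed.
plain = threading.Lock()
plain.acquire()
second_plain = plain.acquire(blocking=False)  # would block forever if blocking=True
plain.release()

# threading.RLock is reentrant: the holder may acquire it again.
reentrant = threading.RLock()
reentrant.acquire()
second_reentrant = reentrant.acquire(blocking=False)
reentrant.release()
reentrant.release()  # one release per successful acquire

print(second_plain, second_reentrant)
```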

See the bug I reported to Apple. This is the problem.
https://bugreport.apple.com/cgi-bin/WebObjects/RadarWeb.woa/7/wo/0blX77DJS8lBTTxVnTsNDM/5.83.28.0.13

Here's the open radar bug.
http://openradar.appspot.com/8588290

I posted a workaround for this question in
Pthread RWLock on MAC Deadlocking but not on Linux?
It is platform independent, and the general method ought to allow for other tricks like upgrading from read to write, etc.

Related

Semaphore/Mutex lock/unlock frequency

I have some code which I need to lock using a semaphore or mutex.
The code is something like this:
callA();
callB();
callC();
.
.
.
callZ();
I would like to know the most efficient way to lock it. The options I am considering are:
lock before callA() and unlock after callZ(). My concern is that the lock remains held for a pretty long period.
lock and unlock around each function call. I am worried about the overhead of repeatedly grabbing and releasing the lock.
Appreciate your help!
It all depends on your use case. How much lock/unlock/lock/unlock performance penalty can you tolerate? Weighed against this, how long are you willing to make another task block while waiting for the lock? Are some of the threads latency-critical or interactive and other threads bulk or low-priority? Are there other tasks that will take the same lock(s) through other code paths? If so, what do those look like? If the critical sections in callA, callB, etc. are really separate, then do you want to use 26 different locks? Or do they manipulate the same data, forcing you to use a single lock?
By the way, if you are using Linux, definitely use (pthreads) mutexes, not semaphores. The fast path for mutexes is completely in userspace. Locking and unlocking them when there is no contention is quite cheap. There is no fast path for semaphores.
Without knowing anything else, I would advise fine grained locking, especially if your individual functions are already organized to not make assumptions that would only be true if the lock is held across them all. But as I said, it really depends what you're doing and why you're doing it.
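A minimal sketch of the fine-grained option, assuming that callA and callB touch unrelated data (the function names, the two dictionaries, and the thread counts are all hypothetical). Each piece of shared state gets its own lock, so a thread inside call_a never delays a thread inside call_b.

```python
import threading

# One lock per independent piece of shared state (hypothetical data).
inventory_lock = threading.Lock()
stats_lock = threading.Lock()
inventory = {"widgets": 0}
stats = {"calls": 0}


def call_a():
    with inventory_lock:  # held only while inventory is touched
        inventory["widgets"] += 1


def call_b():
    with stats_lock:  # independent of inventory_lock
        stats["calls"] += 1


def worker(iterations):
    for _ in range(iterations):
        call_a()
        call_b()


threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(inventory["widgets"], stats["calls"])
```

If the functions actually share data, or rely on invariants that only hold across the whole sequence, this split is not safe and a single lock around the sequence is the simpler choice.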

iOS SQLite Slow Performance

I am using SQLite in my iOS app and I have a lot of saving/loading to do while the user is interacting with the UI. This is a problem as it makes the UI very jittery and slow.
I've tried doing the operations in an additional thread but I don't think this is possible in SQLite. I get the error codes SQLITE_BUSY and SQLITE_LOCKED frequently if I do that.
Is there a way to do this in multithreading without those error codes, or should I abandon SQLite?
It's perfectly possible, you just need to serialise the access to SQLite in your background thread.
My answer on this recent question should point you in the right direction I think.
As mentioned elsewhere, SQLite is fine for concurrent reads, but locks at the database level for writes. That means if you're reading and writing in different threads, you'll get SQLITE_BUSY and SQLITE_LOCKED errors.
The most basic way to avoid this is to serialise all DB access (reads and writes) either in a dispatch queue or an NSOperationQueue that has a concurrency of 1. As this access is not taking place on the main thread, your UI will not be impacted.
This will obviously stop reads and writes overlapping, but it will also stop simultaneous reads. It's not clear whether that's a performance hit that you can take or not.
To initialise a queue as described above:
NSOperationQueue *backgroundQueue = [[NSOperationQueue alloc] init];
[backgroundQueue setMaxConcurrentOperationCount:1];
Then you can just add operations to the queue as you see fit.
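The same idea can be sketched outside Objective-C. The example below is a Python analogue, not the answerer's code: a thread pool with a single worker plays the role of the concurrency-1 operation queue, and the table and helper names are hypothetical. The connection is opened inside the worker so that every database call runs on the same thread.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# A pool with one worker behaves like a serial queue: tasks run one at a time.
db_queue = ThreadPoolExecutor(max_workers=1)
_conn = None  # only ever touched from the worker thread


def _open(path):
    global _conn
    _conn = sqlite3.connect(path)
    _conn.execute("CREATE TABLE IF NOT EXISTS notes (body TEXT)")
    _conn.commit()


def _insert(body):
    with _conn:  # commits on success, rolls back on error
        _conn.execute("INSERT INTO notes (body) VALUES (?)", (body,))


def _count():
    return _conn.execute("SELECT COUNT(*) FROM notes").fetchall()[0][0]


db_queue.submit(_open, ":memory:").result()
pending = [db_queue.submit(_insert, f"note {i}") for i in range(5)]
for future in pending:
    future.result()  # the UI thread would normally not wait here
note_count = db_queue.submit(_count).result()
db_queue.shutdown()

print(note_count)
```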
Having everything in a dedicated SQLite thread, or a one-op-at-a-time operation queue are great solutions, especially to solve your jittery UI. Another technique (which may not help the jitters) is to spot those error codes, and simply loop, retrying the update until you get a successful return code.
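A rough sketch of the retry approach, using Python's sqlite3 module for illustration. The helper name, the attempt limit, and the delay are assumptions, and detecting the busy condition by matching the error message text is a heuristic rather than an official API. The loop is bounded so that a permanently locked database does not spin forever.

```python
import sqlite3
import time


def retry_while_busy(operation, attempts=5, delay=0.01):
    """Call operation(), retrying a limited number of times on busy/locked errors."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except sqlite3.OperationalError as exc:
            busy = "locked" in str(exc) or "busy" in str(exc)
            if not busy or attempt == attempts:
                raise  # unrelated error, or out of attempts
            time.sleep(delay * attempt)  # wait a little longer each time


# Stand-in for a real write that fails twice before succeeding.
calls = {"count": 0}


def flaky_write():
    calls["count"] += 1
    if calls["count"] < 3:
        raise sqlite3.OperationalError("database is locked")
    return "written"


result = retry_while_busy(flaky_write)
print(result, calls["count"])
```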
Put SQLite into WAL mode. Then reads won't be blocked. Writes are a different matter: you still need to serialize them. There are various ways to achieve this. One of them is offered by SQLite itself: the WAL hook can be used to signal that the next write can start.
WAL mode should generally improve performance of your app. Most things will be a bit faster. Reads won't be blocked at all. Only large transactions (several MB) will slow down. Generally nothing dramatic.
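A small sketch of what WAL mode changes, written with Python's sqlite3 module for illustration (the file location and table name are hypothetical). A second connection reads while the first one has an uncommitted write open, and the read is not blocked; it sees the last committed state.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "example.db")

# isolation_level=None puts the connection in autocommit mode,
# so transactions are controlled explicitly with BEGIN/COMMIT.
writer = sqlite3.connect(db_path, isolation_level=None)
journal_mode = writer.execute("PRAGMA journal_mode=WAL").fetchall()[0][0]
writer.execute("CREATE TABLE items (name TEXT)")

# Open a write transaction and leave it uncommitted.
writer.execute("BEGIN IMMEDIATE")
writer.execute("INSERT INTO items VALUES ('pending')")

# timeout=0 means the reader would fail immediately if it were blocked.
reader = sqlite3.connect(db_path, timeout=0, isolation_level=None)
seen_during_write = reader.execute("SELECT COUNT(*) FROM items").fetchall()[0][0]

writer.execute("COMMIT")
seen_after_commit = reader.execute("SELECT COUNT(*) FROM items").fetchall()[0][0]

print(journal_mode, seen_during_write, seen_after_commit)
reader.close()
writer.close()
```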
Don't abandon SQLite. You can definitely do it in a thread different than the UI thread to avoid slowness. Just make sure only one thread is accessing the database at a time. SQLite is not great when dealing with concurrent access.
I recommend using Core Data which sits on top of sqlite. I use it in a multithreaded environment. Here's a guide on Concurrency with Core Data.
OFF:
Have you checked out FMDB? It is a SQLite wrapper and is thread safe. I have used it in all my SQLite projects.

Retrying deadlocks in a loop, will they eventually resolve?

If I write code in a PL/SQL function which, upon catching an ORA-00060 deadlock exception (Oracle 10g), just rolls back and retries the transaction, will such a function complete in a finite amount of time (you can assume that the work to be done by the database is finite, not an infinite stream)?
Is there any particular reason why I should add a wait before retrying?
In general, it depends on what other transactions in the system, especially the other(s) involved in the deadlock, are doing. You could have a situation where the second attempt would block indefinitely due to locked resources, or even encounter a second deadlock.
At the very least, before implementing this solution, I think you should understand how the deadlock is arising and consider what is likely to happen in the other sessions involved when the first session gets the exception.

Restarting agent program after it crashes

Consider a distributed bank application, wherein distributed agent machines modify the value of a global variable : say "balance"
So, the agents' requests are queued. Each request adds a value to the global variable on behalf of a particular agent. The code for the agent is of the form:
agent
{
    look_queue(); // take a look at the leftmost request on queue without dequeuing
    lock_global_variable(balance,agent_machine_id);
    ///////////////////// **POINT A**
    modify(balance,value);
    unlock_global_variable(balance,agent_machine_id);
    /////////////////// **POINT B**
    dequeue(); // once transaction is complete, request can be dequeued
}
Now, if an agent's code crashes at POINT B, then obviously the request should not be processed again, otherwise the variable will be modified twice for the same request. To avoid this, we can make the code atomic, thus :
agent
{
    look_queue(); // take a look at the leftmost request on queue without dequeuing
    *atomic*
    {
        lock_global_variable(balance,agent_machine_id);
        modify(balance,value);
        unlock_global_variable(balance,agent_machine_id);
        dequeue(); // once transaction is complete, request can be dequeued
    }
}
I am looking for answers to these questions:
How to identify points in code which need to be executed atomically 'automatically'?
If the code crashes during execution, how much will "logging the transaction and variable values" help? Are there other approaches for solving the problem of crashed agents?
Again, logging is not scalable to big applications with a large number of variables. What can we do in those cases, instead of restarting execution from scratch?
In general, how can we identify such atomic blocks in the case of agents that work together? If one agent fails, do the others have to wait for it to restart? How can software testing help us identify potential cases where an inconsistent program state is observed if an agent crashes?
How to make the atomic blocks more fine-grained, to reduce performance bottlenecks?
Q> How to identify points in code which need to be executed atomically 'automatically'?
A> Whenever stateful data is shared across different contexts. Not every party needs to be a mutator; it is enough to have at least one. In your case, balance is shared between different agents.
Q> If the code crashes during execution, how much will "logging the transaction and variable values" help? Are there other approaches for solving the problem of crashed agents?
A> It can help, but it has high costs attached. You need to roll back X entries, replay the scenario, etc. A better approach is to either make it all transactional or have an effective automatic rollback scenario.
Q> Again, logging is not scalable to big applications with a large number of variables. What can we do in those cases, instead of restarting execution from scratch?
A> In some cases you can relax consistency. For example, CopyOnWriteArrayList performs each write on a fresh copy of the underlying array and publishes the new copy to readers only once the write has completed. If the write fails, the copy can be safely discarded. There's also compare and swap. Also see the link for the previous question.
Q> In general, how can we identify such atomic blocks in the case of agents that work together?
A> See your first question.
Q> If one agent fails, others have to wait for it to restart ?
A> Most policies/APIs define maximum timeouts for critical section execution; otherwise the system risks ending up in a perpetual deadlock.
Q> How can software testing help us in identifying potential cases, wherein if an agent crashes, an inconsistent program state is observed.
A> It can, to a fair degree. However, testing concurrent code requires as much skill as writing the code itself, if not more.
Q> How to make the atomic blocks more fine-grained, to reduce performance bottlenecks?
A> You have answered the question yourself :) If one atomic operation needs to modify 10 different shared state variables, there's not much you can do apart from trying to push the external contract down so it needs to modify fewer of them. This is pretty much the reason why databases are not as scalable as NoSQL stores: they might need to modify dependent foreign keys, execute triggers, etc. Or try to promote immutability.
If you are a Java programmer, I would definitely recommend reading this book. I'm sure there are good counterparts for other languages, too.
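One way to picture the "all-transactional" approach from the answers above is to keep both the balance and the request queue in a transactional store, so that the update and the dequeue commit or roll back together. The sketch below uses an in-memory SQLite database from Python purely for illustration; the schema, the function name, and the simulated crash flag are assumptions, not part of the original agent code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE TABLE requests (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.execute("INSERT INTO requests VALUES (1, 25)")
conn.commit()


def process_next_request(conn, crash_before_dequeue=False):
    """Apply the oldest request and dequeue it in one transaction."""
    row = conn.execute(
        "SELECT id, amount FROM requests ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return False
    request_id, amount = row
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = 1", (amount,)
        )
        if crash_before_dequeue:
            raise RuntimeError("simulated crash at POINT B")
        conn.execute("DELETE FROM requests WHERE id = ?", (request_id,))
    return True


def balance(conn):
    return conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]


try:
    process_next_request(conn, crash_before_dequeue=True)
except RuntimeError:
    pass
balance_after_crash = balance(conn)  # the update was rolled back

process_next_request(conn)           # the retry applies the request exactly once
balance_after_retry = balance(conn)
remaining = conn.execute("SELECT COUNT(*) FROM requests").fetchone()[0]

print(balance_after_crash, balance_after_retry, remaining)
```

Because the crash rolls back the balance change along with the pending dequeue, reprocessing the request after a restart does not apply it twice.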

Will pooling the connection help threading in SQLite (and how)?

I currently use a singleton to access my database (see related question), but now that I am trying to add some background processing, everything falls apart. I read the SQLite docs and found that SQLite can work thread-safely, but each thread must have its own database connection. I tried egodatabase, which promises a SQLite wrapper with thread safety, but it is very buggy, so I returned to my old FMDB library and started looking at how to use it in a multi-threaded way.
Because all my code is built around the singleton, changing everything would be expensive (and a lot of opening and closing of connections could become slow), so I wonder whether, as the SQLite docs hint, building a connection pool would help. If so, how should I build it? How do I know which connection to take from the pool (because two threads can't share a connection)?
I wonder if somebody has already used SQLite in a multi-threaded setup with NSOperation or similar; my searching only returns "yeah, it's possible" and leaves the details to my imagination...
You should look at using thread-local variables to hold the connection. If the variable is empty (i.e., holding something like NULL), you know you can safely open a connection at that point to serve the thread and store the connection back in the variable. I don't know how to do this in Obj-C, though.
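The thread-local idea can be sketched in Python (not Obj-C, which the answer leaves open). The helper name and the database path are hypothetical. Each thread lazily opens its own connection on first use and gets the same one back on later calls.

```python
import os
import sqlite3
import tempfile
import threading

db_path = os.path.join(tempfile.mkdtemp(), "app.db")
_local = threading.local()  # each thread sees its own attributes on this object


def get_connection():
    """Return this thread's connection, opening it on first use."""
    conn = getattr(_local, "conn", None)
    if conn is None:  # empty slot: this thread has no connection yet
        conn = sqlite3.connect(db_path)
        _local.conn = conn
    return conn


seen = {}


def worker():
    seen["worker"] = get_connection()


main_first = get_connection()
main_second = get_connection()

thread = threading.Thread(target=worker)
thread.start()
thread.join()

print(main_first is main_second, seen["worker"] is main_first)
```

This removes the question of which pooled connection a thread should take: a thread only ever sees the one it opened.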
Also be aware that SQLite is not tuned for concurrent writes. Writer locks are expensive, so keep any time in a writing transaction (i.e., one that includes an INSERT, UPDATE or DELETE) to a minimum in all threads. Transaction commits are expensive too.