Clarification about the following API: ReentrantReadWriteLock

Directly from this API:
When constructed as fair, threads contend for entry using an
approximately arrival-order policy. When the currently held lock is
released either the longest-waiting single writer thread will be
assigned the write lock, or if there is a group of reader threads
waiting longer than all waiting writer threads, that group will be
assigned the read lock.
It compares a single writing thread to a group of reading threads. What if there were only one waiting reader thread instead of a group, as the API specifies? Would that change anything, or does the statement cover both individual threads and groups of threads?
Thanks in advance.

I'm 95% certain that "group" in this case can be read as "one or more". It should be easy enough to write a test for this. Harder, but also possible, is to crack open the Java source and see what it's doing.
The idea here is you can give the lock to 1 writer or 1+ readers at the same time. It's just trying to say that if there are multiple readers waiting before the next writer, they all get the lock at the same time. This is safe because they're just reading.
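Here's a rough, timing-based sketch (not a rigorous test) you could adapt to see this: a fair lock is held for writing while a single reader queues up, and that lone reader is handed the read lock on release.

import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FairRwlDemo {
    public static void main(String[] args) throws InterruptedException {
        ReentrantReadWriteLock rwl = new ReentrantReadWriteLock(true); // fair mode

        rwl.writeLock().lock(); // main thread holds the write lock

        // A single reader queues up; the javadoc's "group" of waiting readers
        // can be just this one thread.
        Thread reader = new Thread(() -> {
            rwl.readLock().lock();
            System.out.println("lone waiting reader acquired the read lock");
            rwl.readLock().unlock();
        });
        reader.start();

        Thread.sleep(100);        // crude: give the reader time to start waiting
        rwl.writeLock().unlock(); // on release, the lone reader is assigned the read lock
        reader.join();
    }
}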

Implementing a mutual exclusion system / distributed queue in Postgres

I want to implement a mutual exclusion system in PostgreSQL where multiple worker processes will temporarily lock resources (rows) from a table (queue) while they work on them. If the worker processes crash, I want the lock to be cleanly released and not have to rely on another process to clean up the leaked locks.
What I have come up with so far is to use a SELECT ... FOR UPDATE SKIP LOCKED query within a transaction, which locks the row it finds and skips any rows that are already locked.
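In rough JDBC terms the pattern looks like this (table and column names are placeholders):

import java.sql.*;

public class QueueWorker {
    // Claim one row, work on it, then delete it. The transaction has to stay
    // open for the whole task, which is exactly the drawback described below.
    static void claimAndProcess(Connection conn) throws SQLException {
        conn.setAutoCommit(false);
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                 "SELECT id FROM queue ORDER BY id FOR UPDATE SKIP LOCKED LIMIT 1")) {
            if (rs.next()) {
                long id = rs.getLong("id");
                // ... do the (possibly long-running) work here ...
                try (PreparedStatement del =
                         conn.prepareStatement("DELETE FROM queue WHERE id = ?")) {
                    del.setLong(1, id);
                    del.executeUpdate();
                }
            }
            conn.commit(); // releases the row lock; a crash or disconnect rolls back instead
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}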
It works well, but one issue is that the worker might take a while to do its task, and I need to keep the transaction open for the entire duration of the task.
Another problem is that the workers work incrementally and persist their state to the database, so that if they're stopped or crash, they can resume quickly from where they were. The locked row makes it impossible to persist their state in the same table (though I think I can work around that by persisting the state in another table).
I've searched the Web for how to implement a semaphore or a resource-borrowing system in SQL/PostgreSQL, but I haven't found anything that fits my needs. Is there a simple way of achieving this with PostgreSQL?

Deadlock graph interpretation

We have SQL Server 2016.
We faced this deadlock issue and we don't understand something in the deadlock graph.
Here is how we are interpreting this:
The two processes in question were not waiting on each other. They were waiting on themselves and trying to acquire the same lock on the same resource in spite of already being the owner of that lock.
I think something is not right with this interpretation.
Could someone more knowledgeable explain what happened here?
The isolation level is READ_COMMITTED.
If someone could explain every bit here, that would be really helpful.
Many thanks in advance.
A deadlock occurs when two processes each hold a lock on some data object that the other needs, so neither can complete while the other holds its lock. SQL Server picks one to continue and terminates the other as the victim.
In your pic, it appears that process 8557c42ca8 has a shared lock on something and process d468 has an exclusive lock on something. Each process is also a waiter, waiting on the lock held by the other. Since neither can proceed until the other releases its lock, they would wait forever; SQL Server detects this cycle and kills one of the processes.
The two processes in question were not waiting on each other. They were waiting on themselves
No. Each session owns a lock on a key and waits on a lock on the key locked by the other.
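For illustration, a hypothetical JDBC repro of that situation (a table t with rows 1 and 2 is assumed): two sessions update the same two rows in opposite order, so each ends up holding an exclusive key lock that the other is waiting for.

import java.sql.*;

public class DeadlockRepro {
    // Run concurrently on two connections: update(connA, 1, 2) and
    // update(connB, 2, 1). One session will be chosen as the deadlock
    // victim and fail with error 1205.
    static void update(Connection conn, int firstId, int secondId) throws Exception {
        conn.setAutoCommit(false);
        try (PreparedStatement ps =
                 conn.prepareStatement("UPDATE t SET val = val + 1 WHERE id = ?")) {
            ps.setInt(1, firstId);
            ps.executeUpdate(); // takes an exclusive (X) key lock on firstId
            Thread.sleep(1000); // let the other session lock its first row
            ps.setInt(1, secondId);
            ps.executeUpdate(); // blocks: the other session holds X on secondId
        }
        conn.commit();
    }
}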
Could someone more knowledgeable explain what happened here?
You'd need to post the rest of the deadlock graph, the table definition (including indexes) and the code that each session had run from the beginning of its transaction up to the deadlock.

How to create a distributed 'debounce' task to drain a Redis List?

I have the following use case: multiple clients push to a shared Redis List. A separate worker process should drain this list (process and delete). Wait/multi-exec is in place to make sure this goes smoothly.
For performance reasons I don't want to call the 'drain'-process right away, but after x milliseconds, starting from the moment the first client pushes to the (then empty) list.
This is akin to a distributed underscore/lodash debounce function, for which the timer starts to run the moment the first item comes in (i.e.: 'leading' instead of 'trailing')
I'm looking for the best way to do this reliably in a fault tolerant way.
Currently I'm leaning to the following method:
Use Redis SET with the NX and PX options. This allows:
setting a value (a mutex) on a dedicated key only if it doesn't yet exist; this is what the NX argument is for
expiring the key after x milliseconds; this is what the PX argument is for
The command succeeds (returns OK) if the value could be set, meaning no value previously existed, and fails (returns nil) otherwise. A success means the current client is the first client to run the process since the Redis List was last drained. Therefore,
this client puts a job on a distributed queue, scheduled to run in x milliseconds.
After x milliseconds, the worker that receives the job starts draining the list.
This works on paper, but it feels a bit complicated. Are there other ways to make this work in a distributed, fault-tolerant way?
Btw: Redis and a distributed queue are already in place, so I don't consider it an extra burden to use them for this.
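For what it's worth, here is a rough Jedis sketch of the gate just described (key names are placeholders; scheduling the drain job is left to whatever queue is already in place):

import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class DebounceGate {
    // Returns true if this client is the first pusher since the last drain,
    // i.e. it should schedule the drain job to run in delayMs.
    static boolean pushAndCheckFirst(Jedis jedis, String value, long delayMs) {
        jedis.lpush("myList", value);
        // SET key value NX PX delayMs: succeeds only if the mutex is absent.
        String reply = jedis.set("myListDrainScheduled", "1",
                SetParams.setParams().nx().px(delayMs));
        return "OK".equals(reply); // "OK" = we won the race; null = already set
    }
}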
Apologies in advance: a proper response requires a fair amount of text and theory. But your question is a good one, and you've already written most of a good answer yourself :)
First of all, we should define the terms. "Debounce" in the underscore/lodash sense is best understood through David Corbacho's article explanation:
Debounce: Think of it as "grouping multiple events in one". Imagine that you go home, enter in the elevator, doors are closing... and suddenly your neighbor appears in the hall and tries to jump on the elevator. Be polite! and open the doors for him: you are debouncing the elevator departure. Consider that the same situation can happen again with a third person, and so on... probably delaying the departure several minutes.
Throttle: Think of it as a valve, it regulates the flow of the executions. We can determine the maximum number of times a function can be called in certain time. So in the elevator analogy you are polite enough to let people in for 10 secs, but once that delay passes, you must go!
You are asking about a debounce that starts when the first element is pushed to the list:
By analogy with the elevator: the elevator should go up 10 minutes after the first person steps in, no matter how many more people cram in after that.
For a distributed, fault-tolerant system, this should be viewed as a set of requirements:
Processing of the list must begin within X time after the first element is inserted (i.e. after the list is created).
A worker crash must not break anything.
No deadlocks.
The first requirement must be fulfilled regardless of the number of workers, be it 1 or N.
In other words, you need to know, in a distributed way, whether the group of workers still has to wait or whether list processing can start. And as soon as we utter the words "distributed" and "fault-tolerant", these concepts always bring along their companions:
Atomicity (e.g. via locking)
Reservation
In practice
In practice, I'm afraid your system needs to be a little more complicated (perhaps you already have this and just didn't write it down).
Your method:
Pessimistic locking with a mutex via SET NX PX. NX guarantees that only one process at a time does the work (atomicity). PX ensures that if something happens to that process, Redis releases the lock when it expires (the fault-tolerance part that guards against deadlock).
All workers try to grab the one mutex (per list key), so only one succeeds and processes the list after X time. That process can extend the mutex TTL if it needs more time than originally planned. If the process crashes, the mutex is unlocked after the TTL and can be grabbed by another worker.
My suggestion
Fault-tolerant, reliable queue processing in Redis is built around RPOPLPUSH:
RPOPLPUSH each item from the main list to a special processing list (one per worker per list).
Process the item.
Remove the item from the processing list.
So if a worker crashes, we can always return the orphaned item from its processing list to the main list, and Redis guarantees the atomicity of RPOPLPUSH/RPOP. That leaves only one problem: making the group of workers wait a while.
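A minimal Jedis sketch of those three steps (list and key names assumed):

import redis.clients.jedis.Jedis;

public class ReliableDrain {
    // Drain mainList via the RPOPLPUSH pattern: each item is parked on a
    // per-worker processing list until it is fully handled, so a crashed
    // worker's items can later be pushed back to mainList.
    static void drain(Jedis jedis, String mainList, String workerId) {
        String processing = "processing:" + mainList + ":" + workerId;
        String item;
        while ((item = jedis.rpoplpush(mainList, processing)) != null) {
            handle(item);                    // do the actual work
            jedis.lrem(processing, 1, item); // done: drop the parked copy
        }
    }

    static void handle(String item) { /* application-specific processing */ }
}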
From there, there are two options. First, if you have many clients and fewer workers, lock on the worker side: each worker tries to acquire the mutex and, on success, starts processing.
And vice versa: use SET NX PX each time you execute LPUSH/RPUSH (a "wait N ms before popping from me" arrangement, useful if you have many workers and few push clients). So a push becomes:
SET myListLock 1 PX 10000 NX
LPUSH myList value
Each worker then simply checks: if myListLock exists, it should wait at least the key's remaining TTL before setting the processing mutex and starting to drain.

Handle Lock Manually in SQL Server?

I am new to SQL Server, but I have a fair knowledge of simple things like select/update/delete and other transactions. I am facing a deadlock scenario in my application. As I understand it, many threads are trying to run a set of update operations in parallel. It is not a single update but a set of update operations.
I understand that this cannot be avoided in my application, as many people want to update simultaneously. So I want to have a manual locking system: thread 1 should first check whether the manual lock is available and then start the transaction. Meanwhile, if a second thread requests the lock, it should find it busy and wait. Once the first thread completes, the second should acquire the lock and start its transaction.
This is just logic I have thought about, but I have no idea how to do this in SQL Server. Are there any examples that can help me? Sample SQL scripts or links would be very helpful. Thank you for your time and help.
You probably mean "semaphore", that is, something to serialise execution of the DML so that only one process can run at a time.
This is native in SQL Server using sp_getapplock
You can configure subsequent processes to wait or fail when they call sp_getapplock, and in "Transaction" owner mode the lock releases itself when the transaction ends.
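For example, a minimal JDBC sketch (connection handling and the lock name are placeholders):

import java.sql.*;

public class AppLockExample {
    static void runSerialized(Connection conn) throws SQLException {
        conn.setAutoCommit(false); // Transaction owner mode needs an open transaction
        try (CallableStatement cs = conn.prepareCall("{ ? = call sp_getapplock(?, ?, ?, ?) }")) {
            cs.registerOutParameter(1, Types.INTEGER);
            cs.setString(2, "MyUpdateBatch"); // @Resource: arbitrary lock name
            cs.setString(3, "Exclusive");     // @LockMode
            cs.setString(4, "Transaction");   // @LockOwner: released on commit/rollback
            cs.setInt(5, 10000);              // @LockTimeout in ms; later callers wait up to this
            cs.execute();
            if (cs.getInt(1) >= 0) {          // >= 0 means the lock was granted
                // ... run the set of UPDATE statements here ...
                conn.commit();                // commit releases the app lock
            } else {
                conn.rollback();              // timed out (or deadlocked); retry or give up
            }
        }
    }
}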
You will still most likely end up in the same scenario: a deadlock based around your tailor-made locks. SQL Server internally implements a very robust locking mechanism; you should use it.
The problem you're having is that resources (tables, indexes, etc.) are accessed (or modified) in a conflicting order by different transactions/threads.
If you create your own locking mechanism, you may end up with a dead lock just the same. Example:
Thread 1 creates a lock on Customer record
Thread 2 creates a lock on Order record
Thread 1 attempts to create a lock on Order record (but cannot proceed due to step 2)
Thread 2 attempts to create a lock on Customer record (but cannot proceed due to step 1)
Voila ... deadlock
The solution is to refactor the way resources are accessed, so records are always accessed in the same order and the problem will go away.
Thread 1 creates a lock on Customer record
Thread 2 attempts to create a lock on Customer record (but cannot proceed due to step 1)
Thread 1 creates a lock on Order record
Thread 1 completes transaction and unlocks both Order and Customer records
Thread 2 creates a lock on Customer record
Thread 2 creates a lock on Order record
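The same ordering discipline, sketched in application code with plain Java locks (the two resources are stand-ins for whatever records you touch):

import java.util.concurrent.locks.ReentrantLock;

public class OrderedLocking {
    // One lock per resource; hypothetical stand-ins for the row locks above.
    static final ReentrantLock customerLock = new ReentrantLock();
    static final ReentrantLock orderLock = new ReentrantLock();

    // Every thread acquires in the same global order: Customer, then Order,
    // so the circular wait that causes a deadlock can never form.
    static void updateCustomerAndOrder(Runnable work) {
        customerLock.lock();
        try {
            orderLock.lock();
            try {
                work.run(); // update both records
            } finally {
                orderLock.unlock();
            }
        } finally {
            customerLock.unlock();
        }
    }
}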
Also, have a look here to read how locking can happen on a single table.
Your manual lock system sounds interesting, but you need to be aware that it will sacrifice concurrency, which is quite important for many OLTP applications.
Advanced databases like Oracle and SQL Server are quite good at avoiding deadlocks and give you tools to resolve them, which let you simply kill the session that caused the deadlock and let the other query finish its job first.
Microsoft has documentation, which can be found here: http://support.microsoft.com/kb/832524
Besides, there are many other reasons that could lead to a deadlock. You can find some examples here: how to solve deadlock problem?

LockObtainFailedException updating Lucene search index using solr

I've googled this a lot. Most of these issues are caused by a lock being left around after a JVM crash. This is not my case.
I have an index with multiple readers and writers. I am trying to do a mass index update (delete and add; that's how Lucene does updates). I'm using Solr's embedded server (org.apache.solr.client.solrj.embedded.EmbeddedSolrServer). Other writers are using the remote, non-streaming server (org.apache.solr.client.solrj.impl.CommonsHttpSolrServer).
I kick off this mass update; it runs fine for a while, then dies with:
Caused by:
org.apache.lucene.store.LockObtainFailedException:
Lock obtain timed out:
NativeFSLock#/.../lucene-ff783c5d8800fd9722a95494d07d7e37-write.lock
I've adjusted my lock timeouts in solrconfig.xml
<writeLockTimeout>20000</writeLockTimeout>
<commitLockTimeout>10000</commitLockTimeout>
I'm about to start reading the Lucene code to figure this out. Any help so I don't have to do that would be great!
EDIT: All my updates go through the following code (Scala):
val req = new UpdateRequest
req.setAction(AbstractUpdateRequest.ACTION.COMMIT, false, false) // commit; don't wait for flush or a new searcher
req.add(docs)
val rsp = req.process(solrServer)
solrServer is an instance of org.apache.solr.client.solrj.impl.CommonsHttpSolrServer, org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer, or org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.
ANOTHER EDIT:
I stopped using EmbeddedSolrServer and it works now. I have two separate processes that update the solr search index:
1) Servlet
2) Command line tool
The command line tool was using the EmbeddedSolrServer and it would eventually crash with the LockObtainFailedException. When I started using StreamingUpdateSolrServer, the problems went away.
I'm still a little confused that the EmbeddedSolrServer would work at all. Can someone explain this? I thought it would play nicely with the servlet process and each would wait while the other was writing.
I'm assuming that you're doing something like:
writer1.writeSomeStuff();
writer2.writeSomeStuff(); // this one doesn't write
The reason this won't work is that the writer stays open unless you close it. So writer1 writes and holds on to the lock, even after it's done writing. (Once a writer gets the lock, it never releases it until it's closed.) writer2 can't get the lock, since writer1 is still holding onto it, so it throws a LockObtainFailedException.
If you want to use two writers, you'd need to do something like:
writer1.writeSomeStuff();
writer1.close(); // releases the write lock
writer2 = new IndexWriter(...); // can now acquire the lock
writer2.writeSomeStuff();
writer2.close();
Since you can only have one writer open at a time, this pretty much negates any benefit you would get from using multiple writers. (It's actually much worse to open and close them all the time since you'll be constantly paying a warmup penalty.)
So the answer to what I suspect is your underlying question is: don't use multiple writers. Use a single writer with multiple threads accessing it (IndexWriter is thread safe). If you're connecting to Solr via REST or some other HTTP API, a single Solr writer should be able to handle many requests.
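As a minimal sketch of that pattern (index path and analyzer are assumptions, shown with a recent Lucene API):

import java.nio.file.Paths;
import java.util.concurrent.*;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.store.FSDirectory;

public class SharedWriterExample {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter(
                FSDirectory.open(Paths.get("/tmp/index")),
                new IndexWriterConfig(new StandardAnalyzer()));
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 100; i++) {
            final int n = i;
            pool.submit(() -> {
                Document doc = new Document();
                doc.add(new StringField("id", "doc-" + n, Field.Store.YES));
                writer.addDocument(doc); // IndexWriter is thread safe
                return null;
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        writer.close(); // the single close releases write.lock
    }
}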
I'm not sure what your use case is, but another possible answer is to see Solr's Recommendations for managing multiple indices. Particularly the ability to hot-swap cores might be of interest.
>> But you have multiple Solr servers writing to the same location, right?
No, wrong. Solr uses the Lucene libraries, and it is stated in "Lucene in Action"* that there can only be one process/thread writing to the index at a time. That is why the writer takes a lock.
Your concurrent processes that are trying to write could, perhaps, check for the org.apache.lucene.store.LockObtainFailedException when instantiating the writer.
You could, for instance, put the process that instantiates writer2 in a waiting loop until the active writing process finishes and calls writer1.close(), which releases the lock and makes the Lucene index available for writing again. Alternatively, you could have multiple Lucene indexes (in different locations) written to concurrently, and when searching you would need to search across all of them.
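A rough sketch of that waiting loop (directory path assumed, recent Lucene API):

import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.*;
import org.apache.lucene.store.*;

public class WaitForLock {
    // Retry IndexWriter construction until the other process releases write.lock.
    static IndexWriter openWhenFree(String path) throws Exception {
        while (true) {
            try {
                return new IndexWriter(FSDirectory.open(Paths.get(path)),
                        new IndexWriterConfig(new StandardAnalyzer()));
            } catch (LockObtainFailedException e) {
                Thread.sleep(1000); // another writer holds the lock; wait and retry
            }
        }
    }
}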
* "In order to enforce a single writer at a time, which means an IndexWriter or an IndexReader doing deletions or changing norms, Lucene uses a file-based lock: If the lock file (write.lock, by default) exists in your index directory, a writer currently has the index open. Any attempt to create another writer on the same index will hit a LockObtainFailedException. This is a vital protection mechanism, because if two writers are accidentally created on a single index, it will very quickly lead to index corruption."
Section 2.11.3, Lucene in Action, Second Edition, Michael McCandless, Erik Hatcher, and Otis Gospodnetić, 2010