Avoid distributed lock when executing an update transaction on Redis? - redis

I have the following Redis reads and writes that need to run in a transaction (atomically):
Read key 1
Read key 2
Read key 3
Write key X
And,
When a transaction is running, other processes are not allowed to write key 1, 2, 3, or X.
The transaction is atomic, no two transactions can be run at the same time.
I can use a Redis lock (distributed lock) to achieve this. However, I heard from another team that a lot of the time the lock cannot be acquired. Is there a way to avoid the Redis lock?

Related

How to reduce downtime of table during inserts SQL Server

I have an operative table, call it Ops. The table gets queried by our customers via a web service every other second.
There are two processes that affect the table:
Deleting expired records (daily)
Inserting new records (weekly)
My goal is to reduce downtime to a minimum during these processes. I know Oracle, but this is the first time I'm using SQL Server and T-SQL. In Oracle, I would do a truncate to speed up the first process of deleting expired records and a partition exchange to insert new records.
Partition Exchanges for SQL Server seem a bit harder to handle, because from what I can read, one has to create file groups, partition schemes and partition functions (?).
What are your recommendations for reducing downtime?
A table is not offline because someone is deleting or inserting rows. The table can be read and updated concurrently.
However, under the default isolation level, READ COMMITTED, readers are blocked by writers and writers are blocked by readers. This means that a SELECT statement can take longer to complete because a not-yet-committed transaction is locking some rows the SELECT statement is trying to read. The SELECT statement is blocked until the transaction completes. This can be a problem if the transaction takes a long time, since it appears as if the table were offline.
On the other hand, under the READ COMMITTED SNAPSHOT and SNAPSHOT isolation levels, readers don't block writers and writers don't block readers. This means that a SELECT statement can run concurrently with INSERT, UPDATE and DELETE statements without waiting to acquire locks, because under these isolation levels SELECT statements read row versions instead of requesting shared locks on the rows.
The simplest thing you can do is to enable READ COMMITTED SNAPSHOT isolation level on the database. When this isolation level is enabled it becomes the default isolation level, so you don't need to change the code of your application.
ALTER DATABASE MyDataBase SET READ_COMMITTED_SNAPSHOT ON
If your problem is "selects getting blocked," you can try the NOLOCK hint. But be sure to read up on the implications first, since it allows dirty reads. You can check https://www.mssqltips.com/sqlservertip/2470/understanding-the-sql-server-nolock-hint/ for details.

Strange PostgreSQL deadlock issue with SELECT FOR UPDATE

I am building a locking system based on PostgreSQL. I have two methods, acquire and release.
For acquire, it works like this:
BEGIN
while True:
    SELECT id FROM my_locks WHERE locked = false AND id = '<NAME>' FOR UPDATE
    if no rows return:
        continue
    UPDATE my_locks SET locked = true WHERE id = '<NAME>'
    COMMIT
    break
And for release:
BEGIN
UPDATE my_locks SET locked = false WHERE id = '<NAME>'
COMMIT
This looks pretty straightforward, but it doesn't work. The strange part of it is, I thought
SELECT id FROM my_locks WHERE locked = false AND id = '<NAME>' FOR UPDATE
should acquire the lock on the target row only if the target row's locked is false. But in reality, it's not like that. Somehow, even when no locked = false row exists, it acquires the lock anyway. As a result, I have a deadlock issue. It looks like this:
Release is waiting for SELECT FOR UPDATE, and SELECT FOR UPDATE is looping forever while holding a lock for no reason.
To reproduce the issue, I wrote a simple test here
https://gist.github.com/victorlin/d9119dd9dfdd5ac3836b
You can run it with psycopg2 and pytest, remember to change the database setting, and run
pip install pytest psycopg2
py.test -sv test_lock.py
The test case plays out like this:
Thread-1 runs the SELECT and acquires the record lock.
Thread-2 runs the SELECT and enters the lock's wait queue.
Thread-1 runs the UPDATE / COMMIT and releases the lock.
Thread-2 acquires the lock. Detecting that the record has changed since its SELECT, it rechecks the data against its WHERE condition. The check fails, and the row is filtered out of the result set, but the lock is still held.
This behaviour is mentioned in the FOR UPDATE documentation:
...rows that satisfied the query conditions as of the query snapshot will be locked, although they will not be returned if they were updated after the snapshot and no longer satisfy the query conditions.
This can have some unpleasant consequences, so a superfluous lock isn't that bad, all things considered.
Probably the simplest workaround is to limit the lock duration by committing after every iteration of acquire. There are various other ways to prevent it from holding this lock (e.g. SELECT ... NOWAIT, running in a REPEATABLE READ or SERIALIZABLE isolation level, SELECT ... SKIP LOCKED in Postgres 9.5).
I think the cleanest implementation using this retry-loop approach would be to skip the SELECT altogether, and just run an UPDATE ... WHERE locked = false, committing each time. You can tell if you acquired the lock by checking cur.rowcount after calling cur.execute(). If there is additional information you need to pull from the lock record, you can use an UPDATE ... RETURNING statement.
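To make that concrete, here is a minimal sketch of the UPDATE-and-check-rowcount idea. It uses Python's built-in sqlite3 as a stand-in for Postgres so it runs without a server; the helper names (make_lock_db, try_acquire, release) are made up for illustration, and with psycopg2 you would use %s placeholders and real booleans instead of ? and 0/1.

```python
import sqlite3


def make_lock_db():
    """Create an in-memory table with one unlocked lock row (demo only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_locks (id TEXT PRIMARY KEY, locked INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO my_locks VALUES ('job', 0)")
    conn.commit()
    return conn


def try_acquire(conn, name):
    """Try once to flip locked from 0 to 1; True means we now own the lock."""
    cur = conn.execute(
        "UPDATE my_locks SET locked = 1 WHERE id = ? AND locked = 0", (name,)
    )
    # Commit every attempt so nothing stays locked between retries.
    conn.commit()
    return cur.rowcount == 1


def release(conn, name):
    """Flip locked back to 0; True means the lock was actually held."""
    cur = conn.execute(
        "UPDATE my_locks SET locked = 0 WHERE id = ? AND locked = 1", (name,)
    )
    conn.commit()
    return cur.rowcount == 1
```

A caller would loop on try_acquire with a short sleep between attempts. The point of the design is that the test and the set happen in one statement, so there is no window between a SELECT and an UPDATE for another session to slip into.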
But I would have to agree with @Kevin, and say that you'd probably be better off leveraging Postgres' built-in locking support than trying to reinvent it. It would solve a lot of problems for you, e.g.:
Deadlocks are automatically detected
Waiting processes are put to sleep, rather than having to poll the server
Lock requests are queued, preventing starvation
Locks would (generally) not outlive a failed process
The easiest way might be to implement acquire as SELECT FROM my_locks FOR UPDATE, release simply as COMMIT, and let the processes contend for the row lock. If you need more flexibility (e.g. blocking/non-blocking calls, transaction/session/custom scope), advisory locks should prove useful.
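For the advisory-lock route, a rough sketch might look like the following. It assumes a psycopg2-style cursor (execute with %s parameters, fetchone); pg_try_advisory_lock and pg_advisory_unlock are Postgres built-ins, while advisory_key, try_advisory_lock and advisory_unlock are hypothetical helper names. Advisory locks are keyed by integers, so the lock name is hashed down to a non-negative 63-bit value first.

```python
import hashlib


def advisory_key(name):
    """Map a lock name to a non-negative integer that fits in a bigint."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # Drop one bit so the value stays within 0 .. 2**63 - 1.
    return int.from_bytes(digest[:8], "big") >> 1


def try_advisory_lock(cur, name):
    """Non-blocking attempt at a session-level advisory lock."""
    cur.execute("SELECT pg_try_advisory_lock(%s)", (advisory_key(name),))
    return bool(cur.fetchone()[0])


def advisory_unlock(cur, name):
    """Release the lock; Postgres reports False if it was not held."""
    cur.execute("SELECT pg_advisory_unlock(%s)", (advisory_key(name),))
    return bool(cur.fetchone()[0])
```

Session-level advisory locks are held until explicitly released or until the session ends, which is what gives you the "locks do not outlive a failed process" property from the list above.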
PostgreSQL normally aborts transactions which deadlock:
The use of explicit locking can increase the likelihood of deadlocks, wherein two (or more) transactions each hold locks that the other wants. For example, if transaction 1 acquires an exclusive lock on table A and then tries to acquire an exclusive lock on table B, while transaction 2 has already exclusive-locked table B and now wants an exclusive lock on table A, then neither one can proceed. PostgreSQL automatically detects deadlock situations and resolves them by aborting one of the transactions involved, allowing the other(s) to complete. (Exactly which transaction will be aborted is difficult to predict and should not be relied upon.)
Looking at your Python code, and at the screenshot you showed, it appears to me that:
Thread 3 is holding the locked=true lock, and is waiting to acquire a row lock.
Thread 1 is also waiting for a row lock, and also the locked=true lock.
The only logical conclusion is that Thread 2 is somehow holding the row lock, and waiting for the locked=true lock (note the short time on that query; it is looping, not blocking).
Since Postgres is not aware of the locked=true lock, it is unable to abort transactions to prevent deadlock in this case.
It's not immediately clear to me how T2 acquired the row lock, since all the information I've looked at says it can't do that:
FOR UPDATE causes the rows retrieved by the SELECT statement to be locked as though for update. This prevents them from being locked, modified or deleted by other transactions until the current transaction ends. That is, other transactions that attempt UPDATE, DELETE, SELECT FOR UPDATE, SELECT FOR NO KEY UPDATE, SELECT FOR SHARE or SELECT FOR KEY SHARE of these rows will be blocked until the current transaction ends; conversely, SELECT FOR UPDATE will wait for a concurrent transaction that has run any of those commands on the same row, and will then lock and return the updated row (or no row, if the row was deleted). Within a REPEATABLE READ or SERIALIZABLE transaction, however, an error will be thrown if a row to be locked has changed since the transaction started. For further discussion see Section 13.4.
I was not able to find any evidence of PostgreSQL "magically" upgrading row locks to table locks or anything similar.
But what you're doing is not obviously safe, either. You're acquiring lock A (the row lock), then acquiring lock B (the explicit locked=true lock), then releasing and re-acquiring A, before finally releasing B and A in that order. This does not properly observe a lock hierarchy since we try both to acquire A while holding B and vice-versa. But OTOH, acquiring B while holding A should not fail (I think), so I'm still not sure this is outright wrong.
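To illustrate what observing a lock hierarchy means, here is a small sketch using plain Python threading locks rather than database locks (so it is an analogy, not the poster's setup). OrderedLocks and run_demo are invented names. Each lock gets a rank, and every thread takes locks in rank order regardless of the order in which it asked for them.

```python
import threading


class OrderedLocks:
    """Take several locks in a fixed global order; release in reverse."""

    def __init__(self, *ranked_locks):
        # ranked_locks are (rank, lock) pairs; lower rank is taken first.
        ordered = sorted(ranked_locks, key=lambda pair: pair[0])
        self._locks = [lock for _, lock in ordered]

    def __enter__(self):
        for lock in self._locks:
            lock.acquire()
        return self

    def __exit__(self, *exc_info):
        for lock in reversed(self._locks):
            lock.release()
        return False


def run_demo(rounds=1000):
    """Two threads request A and B in opposite orders without deadlocking."""
    a, b = threading.Lock(), threading.Lock()
    counter = {"value": 0}

    def worker(first, second):
        for _ in range(rounds):
            with OrderedLocks(first, second):
                counter["value"] += 1

    t1 = threading.Thread(target=worker, args=((1, a), (2, b)))
    t2 = threading.Thread(target=worker, args=((2, b), (1, a)))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return counter["value"]
```

If the workers took the locks in the order given instead of by rank, one could hold A while waiting for B and the other hold B while waiting for A, which is the situation described above.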
Quite frankly, it's my opinion that you'd be better off just using the LOCK TABLE statement on an empty table. Postgres is aware of these locks and will detect deadlocks for you. It also saves you the trouble of the SELECT FOR UPDATE finagling.
Also, you should add AND locked = true to the WHERE clause of the release code:
BEGIN
UPDATE my_locks SET locked = false WHERE id = '<NAME>' AND locked = true
COMMIT
If not, you are updating the record whatever its locked state is (in your case, even when locked = false), which increases the odds of causing a deadlock.

Why is an implicit table lock being released prior to end of transaction in RedShift?

I have an ETL process that is building dimension tables incrementally in RedShift. It performs actions in the following order:
Begins transaction
Creates a table staging_foo like foo
Copies data from external source into staging_foo
Performs mass insert/update/delete on foo so that it matches staging_foo
Drops staging_foo
Commits transaction
Individually this process works, but in order to achieve continuous streaming refreshes to foo and redundancy in the event of failure, I have several instances of the process running at the same time. And when that happens I occasionally get concurrent serialization errors. This is because both processes are replaying some of the same changes to foo from staging_foo in overlapping transactions.
What happens is that the first process creates the staging_foo table, and the second process is blocked when it attempts to create a table with the same name (this is what I want). When the first process commits its transaction (which can take several seconds) I find that the second process gets unblocked before the commit is complete. So it appears to be getting a snapshot of the foo table before the commit is in place, which causes the inserts/updates/deletes (some of which may be redundant) to fail.
I am theorizing based on the documentation http://docs.aws.amazon.com/redshift/latest/dg/c_serial_isolation.html where it says:
Concurrent transactions are invisible to each other; they cannot detect each other's changes. Each concurrent transaction will create a snapshot of the database at the beginning of the transaction. A database snapshot is created within a transaction on the first occurrence of most SELECT statements, DML commands such as COPY, DELETE, INSERT, UPDATE, and TRUNCATE, and the following DDL commands :
ALTER TABLE (to add or drop columns)
CREATE TABLE
DROP TABLE
TRUNCATE TABLE
The documentation quoted above is somewhat confusing to me because it first says a snapshot will be created at the beginning of a transaction, but subsequently says a snapshot will be created only at the first occurrence of some specific DML/DDL operations.
I do not want to do a deep copy where I replace foo instead of incrementally updating it. I have other processes that continually query this table so there is never a time when I can replace it without interruption. Another question asks a similar question for deep copy but it will not work for me: How can I ensure synchronous DDL operations on a table that is being replaced?
Is there a way for me to perform my operations in a way that I can avoid concurrent serialization errors? I need to ensure that read access is available for foo so I can't LOCK that table.
OK, Postgres (and therefore Redshift [more or less]) uses MVCC (Multi Version Concurrency Control) for transaction isolation instead of a db/table/row/page locking model (as seen in SQL Server, MySQL, etc.). Simplistically every transaction operates on the data as it existed when the transaction started.
So your comment "I have several instances of the process running at the same time" explains the problem. If Process 2 starts while Process 1 is running then Process 2 has no visibility of the results from Process 1.

Design a Lock for SQL Server to help relax the conflict between INSERT and SELECT

The SQL Server in question is SQL Azure; for normal processing it is basically SQL Server 2008.
I have a table, called TASK, that constantly has new data added (new tasks) and removed (completed tasks).
For new data coming in, I use INSERT INTO .. SELECT ..., which most of the time takes very long, let's say dozens of minutes.
For old data going out, I first use SELECT (WITH NOLOCK) to get a task, then UPDATE to let other threads know this task has already started processing, then DELETE once it is finished.
Deadlocks sometimes happen on SELECT, but most of the time they happen on UPDATE and DELETE.
This is not a time-critical task, so I can start processing the new data once all INSERTs have finished. Is there any kind of LOCK that asks SELECT not to select the data before the INSERT has finished? Or any other suggestion to avoid the conflict? I can redesign the table if needed.
From SQL Server 2005 onward, resolving lock conflicts is easy.
For the conflict:
1. You can use Service Broker.
2. Use the isolation level.
Run DBCC USEROPTIONS; in the last row you can see that the default isolation level is read committed. This is the session-level setting.
To deal with the conflict, we can change it to READ_COMMITTED_SNAPSHOT. SQL Server's row locking is not really like Oracle's, but this setting lets us get similar behavior.
ALTER DATABASE DBName
SET READ_COMMITTED_SNAPSHOT ON;
To turn this feature on, the database must be in single-user mode (no other connections may be active while the ALTER DATABASE runs).
Then you can test it with two sessions, A and B:
A: UPDATE table1 WITH (XLOCK) SET name = 'new' WHERE id = 1
B: can still update other rows and select all the data from the table.
My English is not very good, but I do know locking.
Functionally, SQL Server has three kinds of locking:
1. Optimistic locking, controlled with a timestamp (rowversion) column.
2. Pessimistic locking, which forces a lock when the data is used, via hints such as UPDLOCK, XLOCK and so on.
3. Virtual (application) locks, using the procedure sp_getapplock.
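As an illustration of the first option, here is a small sketch of optimistic locking. It uses Python's built-in sqlite3 and a hand-maintained integer version column as a stand-in for SQL Server's rowversion (which the server bumps automatically); the table layout and the helper names make_task_db, read_task and update_if_unchanged are invented for the example.

```python
import sqlite3


def make_task_db():
    """In-memory demo table holding one task at version 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task ("
        "id INTEGER PRIMARY KEY, status TEXT NOT NULL, version INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO task VALUES (1, 'new', 1)")
    conn.commit()
    return conn


def read_task(conn, task_id):
    """Return (status, version) as seen by the reader, without locking."""
    return conn.execute(
        "SELECT status, version FROM task WHERE id = ?", (task_id,)
    ).fetchone()


def update_if_unchanged(conn, task_id, new_status, seen_version):
    """Write only if nobody changed the row since we read seen_version."""
    cur = conn.execute(
        "UPDATE task SET status = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_status, task_id, seen_version),
    )
    conn.commit()
    return cur.rowcount == 1
```

A worker that gets False back knows another worker touched the task first and can simply move on to the next one, so no lock is held between the read and the write.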
If you need a locking scheme for your system architecture, please email me: mjjjj2001#163.com
Consider using service broker if this is a processing queue.
There are a number of considerations that affect performance and locking. I surmise that the data is being updated and deleted in a separate session. Which transaction isolation level is in use for the insert session and the delete session?
Have the insert session and all its transactions been committed and closed when the delete session runs? Are there multiple delete sessions running concurrently? It is very important to have an index on the columns you are using to identify a task for the SELECT/UPDATE/DELETE statements, especially if you move to a higher isolation level such as REPEATABLE READ or SERIALIZABLE.
All of these issues could be solved by moving to Service Broker if it is appropriate.

Voluntary transaction priority in Oracle

I'm going to make up some sql here. What I want is something like the following:
select ... for update priority 2; // Session 2
So when I run in another session
select ... for update priority 1; // Session 1
It immediately returns, and throws an error in session 2 (and hence does a rollback), and locks the row in session 1.
Then, whilst session 1 holds the lock, running the following in session 2.
select ... for update priority 2; // Session 2
Will wait until session 1 releases the lock.
How could I implement such a scheme? The priority x syntax is just something I've made up. I only need something that can do two priority levels.
Also, I'm happy to hide all my logic in PL/SQL procedures, I don't need this to work for generic SQL statements.
I'm using Oracle 10g if that makes any difference.
I'm not aware of a way to interrupt an atomic process in Oracle like you're suggesting. I think the only thing you could do would be to programmatically break down your larger processes into smaller ones and poll some type of sentinel table. So instead of doing a single update for 1 million rows, perhaps you could write a proc that would update 1k, check a jobs table (or something similar) to see if there's a higher priority process running, and, if there is, pause its own execution through a wait loop. This is the only thing I can think of that would keep your session alive during this process.
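The shape of that loop, sketched in Python rather than PL/SQL and with no database involved: run_in_batches, process_batch and higher_priority_running are all hypothetical names, with higher_priority_running standing in for the query against the jobs table and process_batch for "update these rows and commit".

```python
import time


def run_in_batches(items, process_batch, higher_priority_running,
                   batch_size=1000, poll_seconds=0.01, sleep=time.sleep):
    """Process items in small batches, pausing while higher-priority work runs."""
    done = 0
    for start in range(0, len(items), batch_size):
        # Poll the sentinel before each batch and wait while it is set.
        while higher_priority_running():
            sleep(poll_seconds)
        batch = items[start:start + batch_size]
        # In the real procedure this would update the rows and commit,
        # so locks are only held for one batch at a time.
        process_batch(batch)
        done += len(batch)
    return done
```

Committing per batch is what makes this work: the low-priority job never holds locks for longer than one batch, so the high-priority job waits at most that long.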
If you truly want to abort the progress of your currently running, lower priority thread and losing your session is acceptable, then I would suggest a jobs table again that registered the SQL that was being run and the session ID that it is run on. If you run a higher priority statement it should again check the jobs table and then issue a kill command to the low priority session (http://www.oracle-base.com/articles/misc/KillingOracleSessions.php) along with inserting a record into the jobs table to note the fact that it was killed. When a higher-priority process finishes it could check the jobs table to see if it was responsible for killing anything and if so, reissue it.
That's what resource manager was implemented for.