How to prevent locks in Redshift (shared lock stopping a write job) - sql

I have a data warehouse which is used by multiple downstream users. They read the data from a Redshift table. While they read the data, a shared lock is held on the table. During that time, my daily job, which is supposed to write to the table, cannot write because it cannot take an exclusive lock until the shared lock is released.
Ideally my write job should take priority over any read job. Can I enforce this in some way?

Usually this is done by your update process not requiring an exclusive lock or managing the need for locks so that the update process isn't blocked.
Can you describe your update process and which steps are requiring the exclusive locks?
Look at the locks, and the statements causing them, when things are not making forward progress. Reworking these parts should allow you to keep your updates moving while the read sessions keep working on the versions of the data they started with.
It is also important not to have user transactions that hang around for days on end. This can happen when interactive sessions are left open mid-transaction. Avoiding this also prevents errors caused by some sessions seeing very old versions of data.
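As a rough sketch of an update process that does not need an exclusive lock: in Redshift, statements such as TRUNCATE, DROP and ALTER TABLE ask for an exclusive lock, while DELETE and INSERT take a write lock that running SELECTs do not block. The Python below only assembles the statements for such a refresh. The helper build_refresh_statements and the table names sales_daily and sales_daily_staging are made up for illustration, and actually running the statements would need a Redshift connection, which is not shown.

```python
def build_refresh_statements(target, staging):
    """Return SQL for a full refresh that avoids DDL-style exclusive locks.

    Assumption: `target` and `staging` are hypothetical tables with the
    same columns, and the statements are run in order on one connection.
    """
    return [
        "BEGIN;",
        # DELETE takes a write lock, which running SELECTs do not block.
        # TRUNCATE, DROP or ALTER TABLE would ask for an exclusive lock.
        f"DELETE FROM {target};",
        f"INSERT INTO {target} SELECT * FROM {staging};",
        # Readers that started earlier keep their old view of the data.
        "COMMIT;",
    ]


if __name__ == "__main__":
    for statement in build_refresh_statements("sales_daily", "sales_daily_staging"):
        print(statement)
```

Whether this fits depends on which step of the real job asks for the exclusive lock, which is why the answer asks for a description of the update process first.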

Related

Will a very slow write on a master DB cause read locking on a read replica DB?

Will a bulk insert on a master DB (only user is one job that writes to it) that takes 20 seconds cause the read replicas to also get locked for 20 seconds while the change gets propagated to them?
The only locks that get replicated with streaming replication (I guess that's what you are talking about) are ACCESS EXCLUSIVE locks, which are taken by LOCK TABLE, DROP, TRUNCATE and similar statements, but also (and that's the most frequent case) by vacuum truncation (removing empty blocks at the end of a table).
These locks will block activity on the standby. Everything else will not interfere with the activity on the standby.
The activity on the standby is limited to reading, and reads do not conflict with data modification: the documentation shows that the ACCESS SHARE lock taken by a SELECT on the table doesn't conflict with the ROW EXCLUSIVE lock taken by data modifications, so the latter lock doesn't even have to be replicated. And since SELECT doesn't take row locks at all, you can never be blocked that way.

Implementing a mutual exclusion system / distributed queue in Postgres

I want to implement a mutual exclusion system in PostgreSQL where multiple worker processes will temporarily lock resources (rows) from a table (queue) while they work on them. If the worker processes crash, I want the lock to be cleanly released and not have to rely on another process to clean up the leaked locks.
What I have come up with so far is to use a SELECT ... FOR UPDATE SKIP LOCKED query within a transaction, which locks the row it finds and skips any other locked row.
It works well but one of the issues is that the worker might take a while to do its task and I need to keep the transaction open for the entire duration of its task.
Another problem is that the workers work incrementally and persist their state to the database so that if they're stopped or crash, they can resume quickly where they were. The row being locked makes it impossible to persist their state in the same table (though I think I can get away from that by using another table to persist the state).
I've searched on the Web on how to implement a semaphore or a resource borrowing system in SQL/PostgreSQL but I haven't found something that fits my needs. Is there a simple way of achieving this with PostgreSQL?

How are transactions partitioned/isolated in SQLite?

I have been reading the SQLite documentation and also referencing code I have written previously but I don't seem to be able to find a definitive answer to what I imagine to be a rather simple question.
I would like to execute many (separate) compiled statements within a transaction, but child threads may also be creating transactions or just executing statements at the same time and I would not want them included in this particular transaction. Currently, I have a single database handle that I share between all threads.
So, my question is,
1) .. is it generally better to have some kind of semaphore around transactions to ensure they will not clash/collide with other statements being executed against a database handle? I already marshal writes to prevent multithreading problems with SQLite (although with WAL it's now very hard to unsettle it at all).
2) .. or are you expected to open multiple database connections and start/commit the transactions one per database connection if they will be concurrent?
Changes made in one database connection are invisible to all other database connections prior to commit.
So it seems a hybrid approach of having several connections open to the database provides adequate concurrency guarantees, trading off the expense of opening a new connection with the benefit of allowing multi-threaded write transactions.
A query sees all changes that are completed on the same database connection prior to the start of the query, regardless of whether or not those changes have been committed.
If changes occur on the same database connection after a query starts running but before the query completes, then it is undefined whether or not the query will see those changes.
If changes occur on the same database connection after a query starts running but before the query completes, then the query might return a changed row more than once, or it might return a row that was previously deleted.
For the purposes of the previous four items, two database connections that use the same shared cache and which enable PRAGMA read_uncommitted are considered to be the same database connection, not separate database connections.
The SQLite documentation on isolation is exceptionally useful to read and understand for this problem.
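The visibility rules quoted above can be checked with a small experiment. This is a minimal sketch using Python's standard sqlite3 module, assuming two separate connections to the same file in WAL mode; the table name items and the function visibility_demo are made up for illustration.

```python
import os
import sqlite3
import tempfile


def visibility_demo():
    """Return the row counts each connection sees around a commit."""
    seen = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # isolation_level=None: the module does not open transactions for us,
        # so the explicit BEGIN/COMMIT below are the only transaction control.
        writer = sqlite3.connect(path, isolation_level=None)
        reader = sqlite3.connect(path, isolation_level=None)
        writer.execute("PRAGMA journal_mode=WAL")
        writer.execute("CREATE TABLE items (name TEXT)")
        count = "SELECT COUNT(*) FROM items"

        writer.execute("BEGIN")
        writer.execute("INSERT INTO items VALUES ('a')")
        # Same connection: the uncommitted insert is visible.
        seen["writer_before_commit"] = writer.execute(count).fetchone()[0]
        # Other connection: the insert is invisible until it is committed.
        seen["reader_before_commit"] = reader.execute(count).fetchone()[0]
        writer.execute("COMMIT")
        seen["reader_after_commit"] = reader.execute(count).fetchone()[0]

        writer.close()
        reader.close()
    return seen


if __name__ == "__main__":
    print(visibility_demo())
```

So a transaction belongs to the connection it was started on: statements issued by other threads through the same handle become part of it, while statements on a second connection do not.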

Handle Lock Manually in SQL Server?

I am new to SQL Server, but I have a fair knowledge of simple things like select/update/delete and other transactions. I am facing a deadlock scenario in my application. As I understand it, many threads are trying to run a set of update operations in parallel. It is not a single update but a set of update operations.
I have understood that this cannot be avoided in my application, as many people want to do an update simultaneously. So I want to have a manual lock system. First, thread 1 should check if the manual lock is available and then start the transaction. Meanwhile, if the second thread requests the lock, it should be busy and hence the second thread should wait. Once the first is completed, the second should acquire the lock and start its transaction.
This is just logic I have thought about, but I do not have any idea how to do this in SQL Server. Are there any examples which can help me? Please let me know if you can give me some sample SQL scripts or links that would be helpful. Thank you for your time and help.
You probably mean "semaphore". That is, something to serialise execution of the DML so that only one process can run at a time.
This is native in SQL Server using sp_getapplock.
You can configure 2nd processes to wait or fail when they call sp_getapplock, and also it can be self-cancelling in "transaction" mode.
You will most likely still end up in the same scenario: a deadlock built around your tailor-made locks. SQL Server internally implements a very robust locking mechanism. You should use it.
The problem you're having is that resources (tables, indexes, etc.) are accessed (or modified) in a conflicting order by different transactions/threads.
If you create your own locking mechanism, you may end up with a deadlock just the same. Example:
1. Thread 1 creates a lock on Customer record
2. Thread 2 creates a lock on Order record
3. Thread 1 attempts to create a lock on Order record (but cannot proceed due to step 2)
4. Thread 2 attempts to create a lock on Customer record (but cannot proceed due to step 3)
Voila ... deadlock
The solution is to refactor the way resources are accessed, so records are always accessed in the same order and the problem will go away.
1. Thread 1 creates a lock on Customer record
2. Thread 2 attempts to create a lock on Customer record (but cannot proceed due to step 1)
3. Thread 1 creates a lock on Order record
4. Thread 1 completes transaction and unlocks both Order and Customer records
5. Thread 2 creates a lock on Customer record
6. Thread 2 creates a lock on Order record
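The consistent ordering in the steps above can be sketched outside the database. This is only an illustration, assuming in-process threading.Lock objects as stand-ins for the Customer and Order record locks; the names customer_lock, order_lock, update_customer_and_order and run_workers are made up.

```python
import threading

# Stand-ins for the two record locks; a real database manages these itself.
customer_lock = threading.Lock()
order_lock = threading.Lock()


def update_customer_and_order(log, name):
    # Every thread takes the locks in the same order: Customer, then Order.
    # A thread that cannot get the Customer lock holds nothing, so it can
    # never block the thread that is already ahead of it.
    with customer_lock:
        with order_lock:
            log.append(name)


def run_workers(count=8):
    """Start `count` threads and return the names of those that finished."""
    log = []
    threads = [
        threading.Thread(target=update_customer_and_order, args=(log, f"thread-{i}"))
        for i in range(count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join(timeout=5)
    return log


if __name__ == "__main__":
    print(sorted(run_workers()))
```

If one of the workers took order_lock first instead, the four-step deadlock from the first example would become possible again.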
Also, have a look here to read how locking can happen on a single table.
Your manual lock system sounds interesting, but you need to be aware that it will sacrifice concurrency, which is quite important for many OLTP applications.
Advanced databases such as Oracle and SQL Server are quite good at avoiding deadlocks and give you tools to resolve them, which help you kill the session that causes the deadlock and let the other query finish its job first.
Microsoft has documentation, which can be found here:
http://support.microsoft.com/kb/832524
Besides, there are many other reasons that could lead to deadlock. You can find some examples here: How to solve deadlock problem?

What are the problems of using transactions in a database?

From this post: one obvious problem is scalability/performance. What other problems will the use of transactions provoke?
Could you say there are two sets of problems, one for long running transactions and one for short running ones? If yes, how would you define them?
EDIT: Deadlock is another problem, but data inconsistency might be worse, depending on the application domain. Assuming a transaction-worthy domain (banking, to use the canonical example), the possibility of deadlock is more like a cost to pay for ensuring data consistency than a problem with the use of transactions. Or would you disagree? If so, what other solutions would you use to ensure data consistency which are deadlock-free?
It depends a lot on the transactional implementation inside your database and may also depend on the transaction isolation level you use. I'm assuming "repeatable read" or higher here. Holding transactions open for a long time (even ones which haven't modified anything) forces the database to hold on to deleted or updated rows of frequently-changing tables (just in case you decide to read them) which could otherwise be thrown away.
Also, rolling back transactions can be really expensive. I know that in MySQL's InnoDB engine, rolling back a big transaction can take FAR longer than committing it (we've seen a rollback take 30 minutes).
Another problem has to do with database connection state. In a distributed, fault-tolerant application, you can't ever really know what state a database connection is in. Stateful database connections can't be maintained easily as they could fail at any moment (the application needs to remember what it was in the middle of doing and redo it). Stateless ones can just be reconnected and have the (atomic) command re-issued without (in most cases) breaking state.
You can get deadlocks even without using explicit transactions. For one thing, most relational databases will apply an implicit transaction to each statement you execute.
Deadlocks are fundamentally caused by acquiring multiple locks, and any activity that involves acquiring more than one lock can deadlock with any other activity that involves acquiring at least two of the same locks as the first activity. In a database transaction, some of the acquired locks may be held longer than they would otherwise be held -- to the end of the transaction, in fact. The longer locks are held, the greater the chance for a deadlock. This is why a longer-running transaction has a greater chance of deadlock than a shorter one.
One issue with transactions is that it's possible (unlikely, but possible) to get deadlocks in the DB. You do have to understand how your database works, locks, transacts, etc in order to debug these interesting/frustrating problems.
-Adam
I think the major issue is at the design level: at what level or levels within my application do I utilise transactions?
For example I could:
Create transactions within stored procedures
Use the data access API (ADO.NET) to control transactions
Use some form of implicit rollback higher in the application
Use a distributed transaction (via DTC / COM+)
Using more than one of these levels in the same application often seems to create performance and/or data integrity issues.