Is there a Redshift cluster option to enforce autocommit=True for all connections?
I.e. not to allow to establish a connection / run a query unless the setting was set.
We're having a lot of locking issues because some SQL tools have autocommit=False by default.
I know of no such option and you likely wouldn't want one. The down side of autocommit is that all statements are their own transaction and must be processed through the commit queue. The commit queue is a singular resource in the cluster and can be overloaded if you have too many commits. ETL flows generally require performing multiple statements within a transaction to maintain database consistency.
With that said I feel your pain. Large poorly trained database users can cause all sorts of locking issues. Standardizing workbenches and their configuration can help but there will always be a few that have to use their own. Creating daemons that look for idle connections and close them also can help. Education is also key. The most important action is to rationalize your ETL flows to not be blocked by read locks. Some statements, like drop, will be blocked as long as a read lock is on the table but truncate can proceed with such locks. So drop-and-swap is not a good choice for tables that are used for queries by general users.
Sometimes I also wish there was a way to put certain users into autocommit along with read-only.
Related
I have a data warehouse which are used by multiple downstream users. They read the data from the redshift table. When they read the data, there is a shared lock enforced on the table. At that time, my daily job which is supposed to write on the table does not write as it cannot put an exclusive lock until the shared lock is clear.
Ideally my write job should take priority over any other read job. Can I enforce this is some way?
Usually this is done by your update process not requiring an exclusive lock or managing the need for locks so that the update process isn't blocked.
Can you describe your update process and which steps are requiring the exclusive locks?
Look at the locks and statements causing them when things are making forward progress. Reworking these parts should allow you to keep you updates moving while these read sessions are acting on the versions of data they started with.
It is also important to not have user transactions that hang around for days on end. This can happen when interactive sessions are just left open mid transaction. The also prevents errors due to some sessions seeing very old versions of data.
I was wondering if it is possible to add/update/delete an SQL Server database table, as well as an Informix database table at the same time.
Both databases will have the same table (data and all), so the query would only change just based on which database it is going to. For some reason, we need the data inside both databases and kept up in real time.
Is it possible to do this with a SQL Trigger or maybe a SProc?
Any insight of how to do this, or a push in the right direction would be very much appreciated.
Doing a synchronous update, ie. a distributed transaction by using a linked server, possible for a trigger, while technically possible, I would definitely advise against it. Aaron brings the issue of how reliable XA in general is, but my point is different: availability. Your update in SQL Server will fail if it cannot connect and update in Informix. Downtime (patching, maintenance, not to mention disasters) of the Informix site will imply downtime of the SQL Server site, driving your five 9's toward nine 5's quite fast... This is why I strongly advocate decoupling the application of updates. Transactional Replication is such an example of decoupling and it supports heterogenous environments (ie. Informix client downstream to accept the changes).
You will have a delay of update visibility (state in SQL Server will be reflected in Informix after delay that can be milliseconds, seconds, minutes, even hours in a bad day). And the updates are one way, nothing flows back from Informix to SQL Server. But doing master-master replication in an heterogeneous environment is something that not even Chuck Norris would attempt, just saying.
Maintaining two different DBMS with a single transaction requires a transaction monitor such as the XA system to coordinate the transactions. There are such systems. The XA specification is typically the underlying standard. Both Microsoft's SQL Server and IBM's Informix work with such systems, and it is possible to have SQL Server and Informix controlled by the same transaction monitor. I have fewer qualms about the technical competency of such systems than the others who've answered; I share their concerns about whether it is appropriate for you.
Such systems are very heavyweight. If you want consistency, all transactions that modify the single table described in the question will need to use the same XA services (plural; likely one for insert, one for update, one for delete) to do so. Further, if the same transactions need to manage any other tables too, then you need to add and use services for those tables as well. It is this aspect that tends to make such systems difficult to manage.
Using a replication system with the potential for delay before the sites are consistent is probably better than trying for absolute synchronicity, unless there are cogent demands for such synchronicity.
If there really is a demand for absolute synchronicity, then use a transaction monitor.
Do not roll your own.
They are hard to get right. Handling all the special cases is tricky. And (under the hypothesis that you need absolute synchronicity) doing it wrong is costly but easy.
That depends on your definition of "possible". Technically, you can use a technique called "two-phase commit."
The idea is you send the data to both databases and then a "prepare commit" command which does everything necessary to commit the data except for committing it. If the prepare fails, the commit would fail too. If prepare succeeds, then commit must succeed.
Brilliant idea, doesn't work in practice. One common case is that you send the commit to both databases and one of them gets lost on the way (network outage). Happens rarely but when it happens, you have an inconsistent state and, since this step must not fail, no good way to clean up.
So my solution works like this:
You load the data into a new table which has two extra columns where you can say "server X has seen this record"
You add a job which copies all jobs for server X to server X and updates the respective column. Write the job in such a way that it can be aborted and restarted at any time (i.e. it must be able to cope with cases where data already exists on the target side).
That way, you can distribute the data to any number of servers in a consistent, fault tolerant way.
I have an issue that people can't work with an intranet application that is querying the same tables that i'm querying from SQL Server Management Studio. They must wait until the query finished what can last ~10 minutes.
I've already reduced the deadlock-priority of SSMS but that seems to have no effect on the delays now.
I would think it should be possible that SQL-Server could handle both queries parallel or at least have an option to reduce SSMS' priority. So is there any way/option to ensure that some processes like w3sp.exe get high-priority and other "internal"(SSMS related queries) get low-priority in processor time? The server wasn't busy at all, only 6% CPU was used, why is this so? Also, why does SQL-Server seem to lock tables that are queried without any changes(updates/deletes). Can i avoid that?
Thank you in advance.
It's possible that you could reduce the transaction isolation level for some/all of the read-only operations being performed by either or both of the intranet application and your SSMS queries. But if you do this, you have to be wary of such things as dirty reads (where you read one or more rows from a table and these rows later turns out not to be committed by their owning transaction).
There are no priority level settings for connections within SQL Server (other than, as you've noted, volunteering to be the deadlock victim). OS level settings (e.g. process priorities) will have no effect on SQL Server - all it cares about are locks, and whether these locks are compatible between different connections.
You can use the SET TRANSACTION ISOLATION LEVEL statement to change your isolation at the connection level, or you can use locking hints (such as WITH NOLOCK) on individual tables within statements to have more granular control over what locks are being taken.
Note, though, that if you're running DML statements (e.g. INSERT or DELETE), these will still need to take exclusive locks, so if your intranet application wishes to query the same tables, it had to wait for the DML statement to complete, or it has to be modified to relaz its isolation. There's no means to specify the behaviour of other connections from your own queries - they have to choose their own isolation settings.
I would like to ask couple of questions regarding SQL Server Locking mechanism
If i am not using the lock hint with SQL Statement, SQL Server uses PAGELOCK hint by default. am I right??? If yes then why? may be its due to the factor of managing too many locks this is the only thing i took as drawback but please let me know if there are others. and also tell me if we can change this default behavior if its reasonable to do.
I am writing a server side application, a Sync Server (not using sync framework) and I have written database queries in C# code file and using ODBC connection to execute them. Now question is what is the best way to change the default locking from Page to Row keeping drawbacks in mind (e.g. adding lock hint in queries this is what i am planning for).
What if a sql query(SELECT/DML) is being executed without the scope of transaction and statement contains lock hint then what kind of lock will be acquired (e.g. shared, update, exclusive)? AND while in transaction scope does Isolation Level of transaction has impact on lock type if ROWLOCK hint is being used.
Lastly, If some could give me sample so i could test and experience all above scenarios my self (e.g. dot net code or sql script)
Thanks
Mubashar
No. It locks as it sees fit and escalates locks as needed
Let the DB engine manage it
See point 2
See point 2
I'd only use lock hints if you want specific and certain behaviours eg queues or non-blocking (dirty) reads.
More generally, why do you think the DB engine can't do what you want by default?
The default locking is row locks not page locks, although the way in which the locking mechanism works means you will be placing locks on all the objects within the hierarchy e.g. reading a single row will place a shared lock on the table, a shared lock on the page and then a shared lock on the row.
This enables an action requesting an exclusive lock on the table to know it may not take it yet, since there is a shared lock present (otherwise it would have to check every page / row for locks.)
If you issue too many locks for an individual query however, it performs lock escalation which reduces the granularity of the lock - so that is it managing less locks.
This can be turned off using a trace flag but I wouldn't consider it.
Until you know you actually have a locking / lock escalation issue you risk prematurely optimizing a non-existant problem.
From this post. One obvious problem is scalability/performance. What are the other problems that transactions use will provoke?
Could you say there are two sets of problems, one for long running transactions and one for short running ones? If yes, how would you define them?
EDIT: Deadlock is another problem, but data inconsistency might be worse, depending on the application domain. Assuming a transaction-worthy domain (banking, to use the canonical example), deadlock possibility is more like a cost to pay for ensuring data consistency, rather than a problem with transactions use, or you would disagree? If so, what other solutions would you use to ensure data consistency which are deadlock free?
It depends a lot on the transactional implementation inside your database and may also depend on the transaction isolation level you use. I'm assuming "repeatable read" or higher here. Holding transactions open for a long time (even ones which haven't modified anything) forces the database to hold on to deleted or updated rows of frequently-changing tables (just in case you decide to read them) which could otherwise be thrown away.
Also, rolling back transactions can be really expensive. I know that in MySQL's InnoDB engine, rolling back a big transaction can take FAR longer than committing it (we've seen a rollback take 30 minutes).
Another problem is to do with database connection state. In a distributed, fault-tolerant application, you can't ever really know what state a database connection is in. Stateful database connections can't be maintained easily as they could fail at any moment (the application needs to remember what it was in the middle of doing it and redo it). Stateless ones can just be reconnected and have the (atomic) command re-issued without (in most cases) breaking state.
You can get deadlocks even without using explicit transactions. For one thing, most relational databases will apply an implicit transaction to each statement you execute.
Deadlocks are fundamentally caused by acquiring multiple locks, and any activity that involves acquiring more than one lock can deadlock with any other activity that involves acquiring at least two of the same locks as the first activity. In a database transaction, some of the acquired locks may be held longer than they would otherwise be held -- to the end of the transaction, in fact. The longer locks are held, the greater the chance for a deadlock. This is why a longer-running transaction has a greater chance of deadlock than a shorter one.
One issue with transactions is that it's possible (unlikely, but possible) to get deadlocks in the DB. You do have to understand how your database works, locks, transacts, etc in order to debug these interesting/frustrating problems.
-Adam
I think the major issue is at the design level. At what level or levels within my application do I utilise transactions.
For example I could:
Create transactions within stored procedures,
Use the data access API (ADO.NET) to control transactions
Use some form of implicit rollback higher in the application
A distributed transaction in (via DTC / COM+).
Using more then one of these levels in the same application often seems to create performance and/or data integrity issues.