Update via a temp table - sql

So I have a rather large table (150 million rows) that data scrub queries get run on nightly. These queries don't update a lot of records, but to find the records they need, they have to query that single table multiple times in subqueries, which takes some time.
So, would it be better for me to do a normal update statement, or would it be better to put the few results I need into a temp table and then just run an update for those few rows, which would greatly reduce the locks held during the update?
I'm unsure how an update statement's locks work when most of the time is spent querying. If it is only going to update 5 records but runs for half an hour, will it release a record that it updated in the first minute, or does it hold the lock until the end of the query?
Thanks

You need to use (and look into) the ROWLOCK table hint. You can use it with the update statement while updating in batches of fewer than 5000 rows. This will attempt to place row locks on the target table (or on index keys, if a covering index is present). If for some reason that fails, the lock will be escalated to a table lock.
From MSDN (on why lock escalation might occur):
When the Database Engine checks for possible escalations at every 1250 newly acquired locks, a lock escalation will occur if and only if a Transact-SQL statement has acquired at least 5,000 locks on a single reference of a table. For example, lock escalation is not triggered if a statement acquires 3,000 locks in one index and 3,000 locks in another index of the same table. Similarly, lock escalation is not triggered if a statement has a self join on a table, and each reference to the table only acquires 3,000 locks in the table.
Actually, there's more to read in this last article. You should have a look at the mixed lock type escalation section.
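To tie this back to the original question, here is a rough sketch of the temp-table approach. The table dbo.BigTable, the key column Id, the scrubbed column SomeColumn, and the predicates are all placeholders standing in for the real scrub logic; the point is that the expensive search only populates the temp table, and each UPDATE batch stays below the 5,000-lock escalation threshold.

-- Stage only the handful of keys that actually need scrubbing
-- (dbo.BigTable, Id and SomeColumn are hypothetical names)
SELECT b.Id
INTO #ToScrub
FROM dbo.BigTable AS b
WHERE b.SomeColumn = 'bad value';      -- the expensive multi-subquery logic goes here

-- Update in small batches so no single statement acquires 5,000+ locks
WHILE 1 = 1
BEGIN
    UPDATE TOP (4000) b
    SET b.SomeColumn = 'clean value'
    FROM dbo.BigTable AS b WITH (ROWLOCK)
    INNER JOIN #ToScrub AS t ON t.Id = b.Id
    WHERE b.SomeColumn = 'bad value';  -- skip rows already cleaned in earlier batches

    IF @@ROWCOUNT = 0 BREAK;           -- nothing left to update
END;

DROP TABLE #ToScrub;

With this pattern the long-running search no longer holds update locks on the base table, and each batch releases its row locks as soon as it completes.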

Related

Snowflake database: question about the performance of a table stored in Snowflake

We have continuous inserts, updates, and deletes on a table in Snowflake. Can this slow down the performance of the table in Snowflake over a period of time?
Yes. For two reasons.
First, the changes made by INSERT, UPDATE, & DELETE fragment the partition data, so even if the same number of rows is present after N hours/days, the layout of the rows can become unaligned with the affinity of the queries you run; your performance profile can go from highly pruned partition reads to full table reads.
Also, with a large number of changes, even if the data is perfectly ordered afterwards, the sheer fact that many changes are being made means you end up with far too many partitions, which slows down your SQL compilation.
You can also get bad performance if you INSERT, UPDATE, & DELETE against the same table at the same time, as the second operation will be blocked by the first. This can waste wall-clock time and credit allocation (if they are running in different warehouses).
Some things you can do to avoid this are to run clustering and to rebuild the tables during "down time". Instead of deleting the data, insert the keys into "delete tables" and then left join and exclude the matches, as sketched below. We have done all of the above.
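A rough sketch of the "delete table" idea, assuming hypothetical tables events and events_deleted keyed by event_id; the names, the date predicate, and the ordering column are placeholders:

-- Record the keys to remove instead of deleting from the busy table
INSERT INTO events_deleted (event_id)
SELECT event_id FROM events WHERE event_date < '2020-01-01';

-- During down time, rebuild the table without the "deleted" rows,
-- in clustering order, which also compacts the micro-partitions
CREATE OR REPLACE TABLE events AS
SELECT e.*
FROM events AS e
LEFT JOIN events_deleted AS d ON d.event_id = e.event_id
WHERE d.event_id IS NULL
ORDER BY e.event_date;

Day-to-day queries can apply the same left-join exclusion until the rebuild runs, so the busy table is never blocked by deletes.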

postgresql: delete all locks

Question: there is a table with over 9,000 rows. It must be cleaned, but without any locks (the table is in active use). I tried to use pg_advisory_unlock_all, but with no result.
-- session 1
select pg_advisory_unlock_all();
start transaction;
delete from table where id='1';
DELETE 1
-- session 2
start transaction;
delete from table where id='1';
-- (waits for the first transaction to finish)
There is no way to delete data from a table without locking the rows you want to delete.
That shouldn't be a problem as long as concurrent access doesn't try to modify these rows or insert new ones with id = '1', because writers never block readers and vice versa in PostgreSQL.
If concurrent transactions keep modifying the rows you want to delete, that's a little odd (why would you want to delete data you still need?). You'd have to wait for the exclusive locks, and you might well run into deadlocks. In that case, it might be best to lock the whole table with the LOCK statement before you start; deleting from a table that small should then only take a very short time.
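For example, a minimal sketch of the LOCK approach; the table name and predicate are placeholders:

BEGIN;
-- EXCLUSIVE blocks concurrent writers but still allows plain SELECTs
LOCK TABLE my_table IN EXCLUSIVE MODE;
DELETE FROM my_table WHERE id = '1';
COMMIT;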

Postgres - Deadlock while updating a column that is also part of where clause

My colleague at work and I were wondering: during an UPDATE, if the column being updated is also used in the WHERE clause, is there a chance of deadlock?
For ex:
UPDATE EMPLOYEES
SET DEPT_ID = NULL
WHERE DEPT_ID = 13;
So if the table EMPLOYEES contains about a million records, are there chances of deadlock?
There is no chance for a deadlock at all. Not only will a single query never deadlock itself in Postgres (see comments), there is also no chance for a deadlock in combination with the same query in a concurrent transaction.
The minimum "requirements" for a deadlock:
At least two competing concurrent transactions.
Each of the two must lock a resource that the other will try to access later.
Each of the two must then try to access a resource already locked by the other transaction, so that both end up waiting for the other to finish.
In theory, two concurrent, identical calls like the one you display have the potential for a deadlock if there are multiple rows with the same DEPT_ID. Since there is no ORDER BY for an UPDATE, it can take exclusive row locks on the rows to update in any arbitrary order. Two identical commands might start with different rows and end up deadlocking each other.
In practice, this is not going to happen, because both concurrent updates will take locks in the same order, thereby voiding any potential for deadlocks. We would need additional concurrent transactions, or more commands in the same transaction, trying to lock resources out of order.
But all of this is completely unrelated to the fact that a column to be updated is also in the WHERE clause. (Even if indexes on the column are involved.) Due to the MVCC model of Postgres, it writes a new row version anyway, no matter which columns are actually updated.
If you should run into deadlocks involving out-of-order row locks, you can solve it using SELECT .. FOR UPDATE with a deterministic ORDER BY in a subquery:
Avoiding PostgreSQL deadlocks when performing bulk update and delete operations
Postgres, update and lock ordering
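A minimal sketch of that pattern for the query from the question, assuming employees has a primary key column id:

UPDATE employees e
SET    dept_id = NULL
FROM  (
   SELECT id
   FROM   employees
   WHERE  dept_id = 13
   ORDER  BY id          -- deterministic lock order
   FOR    UPDATE
   ) sub
WHERE  e.id = sub.id;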

Does TRUNCATE TABLE grow the transaction log?

I have read that one of the differences between DELETE and TRUNCATE TABLE in SQL is that the TRUNCATE operation cannot be rolled back and no triggers will be fired (as written on this site, for example):
QUESTION:
Does this mean that when I TRUNCATE a table containing millions of records, I should not be affecting the transaction log file (that is, the transaction log file should not grow during the truncation)? Am I correct?
In MS SQL Server (Books Online)
Compared to the DELETE statement, TRUNCATE TABLE has the following advantages:
Less transaction log space is used.
The DELETE statement removes rows one at a time and records an entry in the transaction log for each deleted row. TRUNCATE TABLE removes the data by deallocating the data pages used to store the table data and records only the page deallocations in the transaction log.
Fewer locks are typically used.
When the DELETE statement is executed using a row lock, each row in the table is locked for deletion. TRUNCATE TABLE always locks the table (including a schema (SCH-M) lock) and page but not each row.
Without exception, zero pages are left in the table.
After a DELETE statement is executed, the table can still contain empty pages. For example, empty pages in a heap cannot be deallocated without at least an exclusive (LCK_M_X) table lock. If the delete operation does not use a table lock, the table (heap) will contain many empty pages. For indexes, the delete operation can leave empty pages behind, although these pages will be deallocated quickly by a background cleanup process.
TRUNCATE TABLE removes all rows from a table, but the table structure and its columns, constraints, indexes, and so on remain. To remove the table definition in addition to its data, use the DROP TABLE statement.
If the table contains an identity column, the counter for that column is reset to the seed value defined for the column. If no seed was defined, the default value 1 is used. To retain the identity counter, use DELETE instead.
From: http://msdn.microsoft.com/en-us/library/ms177570.aspx
To the original question:
Technically, TRUNCATE deallocates the data pages from the table, effectively removing all records from it. In theory, this action can be rolled back as long as none of the data pages have been re-used. The information on the deallocated pages is not removed; it is still available in the data file. These deallocated pages can be re-used (allocated to another table, for example), and the data on them can then be overwritten.
The transaction log contains the list of pages (in SQL Server) being deallocated from the table during TRUNCATE, but this list is much shorter than the list of all records, therefore the transaction log will not grow to the same extent.
Depending on how transactions and TRUNCATE are implemented in different RDBMSs, it can be possible to roll it back within a transaction. Sometimes it is even possible to do a 'rollback' (restore the table) after the transaction is committed, if the data on the pages is still intact and all information is available, but that is some black magic and usually not supported directly by the RDBMS.
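For example, in SQL Server a TRUNCATE inside an explicit transaction can still be rolled back, precisely because the page deallocations are logged (dbo.BigTable is a placeholder name):

BEGIN TRANSACTION;
TRUNCATE TABLE dbo.BigTable;
SELECT COUNT(*) FROM dbo.BigTable;   -- 0 inside the transaction
ROLLBACK TRANSACTION;
SELECT COUNT(*) FROM dbo.BigTable;   -- original row count is back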
Does this mean that when I TRUNCATE a table containing millions of records, I should not be affecting the transaction log file (that is, the transaction log file should not grow during the truncation)? Am I correct?
Well, you don't specify the actual server software, but in all cases that I'm aware of, that's correct.
DELETE effectively works row-by-row, deleting the record, firing any appropriate triggers, and adding an entry to the transaction log for each row.
TRUNCATE just removes all of the data in one sweep, logging only the page deallocations (so the log does not grow in proportion to the row count) and not executing triggers.

When does sql exclusively lock a row in an update statement?

Can a race condition occur in sql under these conditions?
If I have this SQL update running in one thread call it statement 1:
Update Items
Set Flag = 'B'
where Flag = 'A';
And this SQL update running in another call it statement 2:
Update Items
Set Flag = 'C'
where Flag = 'A';
Is it possible for each thread to read the same record where Flag is equal to A and write the record with its own value, such that statement 1 writes it first and then statement 2 overwrites it, or vice versa?
The answer to this question depends on when the database exclusively locks the update. Does it happen before it finds the records or after it finds the records and evaluates the where clause?
First, there are three lock contexts:
Database level lock
Table level lock
Row level lock
Then you have four lock modes:
IX
IS
X
S
IX and IS locks are "intention" locks. These locks are held before acquiring other types of locks. X locks are exclusive (write) locks and S locks are shared (read) locks.
These locks (IX, IS, X, or S) can be taken at any context level. An X lock at the database level will block all other operations in the database, for example. This is the type of lock that SQLite takes: an S lock is taken on the entire database during reads, and an X lock is taken on the entire database during writes. Writes will wait for any S locks to be released and will block new S and X locks until the write lock is released. This provides a serializable transaction isolation level.
For MySQL, the locking depends on the storage engine. MyISAM will take X and S locks on entire (sets of) tables. X locks will wait on existing S or X locks and block new locks. New X locks will be given higher priority in the queue, moved ahead of new S locks. This behavior can be changed by setting LOW_PRIORITY_UPDATES, which could result in write starvation because writes will be de-prioritized in favor of reads.
It is possible in MySQL to obtain an X lock over the entire database using 'FLUSH TABLES WITH READ LOCK'.
InnoDB locks rows as they are encountered via an index read; it places locks on the index records it traverses. InnoDB uses special locks called 'gap' locks to ensure the REPEATABLE READ transaction isolation level. Locks are held on index entries, so if a table is not well indexed for an UPDATE query, then many rows will be locked. Note that InnoDB does not take S locks for normal SELECT queries; it uses row versioning, not row-level locking, for consistent snapshots.
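As a rough illustration of the "locks are held on index entries" point, using the Items table from the question (the index name is made up):

-- Without an index on Flag, InnoDB must scan and lock every row it examines.
-- With the index, only the matching index entries (plus gaps under
-- REPEATABLE READ) are locked.
CREATE INDEX idx_items_flag ON Items (Flag);

START TRANSACTION;
UPDATE Items SET Flag = 'B' WHERE Flag = 'A';
COMMIT;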
When acquiring X locks, the database needs to detect deadlocks. Consider the following:
-- connection 1
start transaction;
update T set c = c + 1 order by id asc;
-- connection 2
start transaction;
update T set c = c - 1 order by id desc;
In a row-locking model, these two statements cannot both complete successfully: the first would wait forever to acquire locks the second holds, and vice versa. The database will pick one of the connections to roll back. InnoDB will pick the connection that has made the fewest changes. MyISAM will simply lock the whole table for whichever connection acquires the lock first, and the second will run after the first completes.
The simple example you give will be resolved by X locks at any context (database, table, or row). If two connections begin at exactly the same time, both running updates which try to update the same row, both will attempt to acquire an X lock. Only one connection can acquire it, and it is not possible to determine exactly which one will. The other connection will have to wait until the lock is released before it can acquire the X lock. Keep in mind that if the row was locked by a DELETE or UPDATE, the waiter might end up not acquiring a lock at all after waiting, because there is nothing left in the database to lock.
In your example, the first UPDATE will acquire the X lock; the second UPDATE will then wait on the X lock and will eventually execute, but will match no rows.
An exclusive lock, used for data-modification operations such as INSERT, UPDATE, or DELETE, will be used in this scenario.
An exclusive lock ensures that multiple updates cannot be made to the same resource at the same time.
You will not get a race condition in this scenario.
If you have a more complex scenario involving multiple tables then you may get race conditions, or deadlocks. There are many ways to avoid this, simplifying and separating queries, etc.
You can also apply hints to queries that tell SQL what type of lock to use.
http://msdn.microsoft.com/en-us/library/aa213026(v=sql.80).aspx
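For instance, a hint is attached to the table reference inside the statement; a rough sketch using the Items table from the question:

UPDATE i
SET    i.Flag = 'B'
FROM   Items AS i WITH (ROWLOCK, UPDLOCK)   -- request row-level update locks
WHERE  i.Flag = 'A';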
Sounds like you should read about locking. SQL Server has a complex set of logic and will take either table- or row-level locks based on the number of rows it estimates will require updates. Unless you specifically tell it which one you want, it can even vary from query to query. Usually, if you are modifying a small subset of the table, it will choose a row-level lock.
SQL Server is designed with ACID in mind, so it writes changes to its logs before performing any actual updates to the data. This allows any failed updates to be rolled back and provides consistency between queries (like the one you're asking about). You can perform dirty reads to get around locking issues; however, you cannot prevent SQL Server from locking inserted, updated and/or deleted records.
SQL Server Locking
EDIT: Here is an article about ACID.
ACID - Wikipedia
All SQL databases pretty much guarantee that such a collision will not occur. "When" locking occurs depends on whether locking is at the table, partition, page, or row level. Or, whether you have turned off such locking in your database.
What can happen, if you have concurrent update statements and multiple rows being updated, is that some rows are updated by the first statement and some by the second.
In general, I think of the WHERE clause as being evaluated to select the row set; the rows are then locked one at a time, updated, and unlocked. However, this depends on the type of locking. Under row-at-a-time locking, the scenario above could continue with the values flipping from row to row.
If you are concerned about this situation, use table level locking to force serialization when concurrent update requests are being processed.
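A minimal sketch of forcing serialization with an exclusive table lock (SQL Server syntax, using the Items table from the question):

BEGIN TRANSACTION;

UPDATE Items WITH (TABLOCKX)   -- exclusive table lock, held until commit
SET    Flag = 'B'
WHERE  Flag = 'A';

COMMIT TRANSACTION;

A second session running the same pattern will block on the table lock, and once it runs it will match no rows, so only one of the two flag values wins.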