Run UPDATE query while the INSERTs are running - sql

I have a DB table with around 100,000 rows. I need to run an UPDATE query on that table (affecting every row) which takes approximately 25 minutes. During those 25 minutes, there will be approximately 300 INSERTs into the same table.
I'm wondering whether those INSERTs will run. Will they be blocked while the UPDATE is running and then executed, or will they just never be executed?
I'm using a Postgres database.

Yes, those inserts will run.
An UPDATE, even one that changes all rows, does not block the INSERTs.

If the UPDATE starts before an INSERT commits, then the 100,000 existing records will be updated and the 300 new ones will not be. They will not block each other, unless there is something else going on, such as attempts to violate a constraint or elaborate triggers.
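A minimal sketch of what this looks like in two psql sessions; big_table and some_column are stand-in names, not from the question:

-- session 1: long-running UPDATE that touches every existing row
BEGIN;
UPDATE big_table SET some_column = some_column + 1;   -- runs for a long time
-- (transaction still open)

-- session 2, while session 1 is running: the INSERT is not blocked and
-- returns immediately; writers only conflict when they touch the same rows
INSERT INTO big_table (id, some_column) VALUES (100001, 0);

-- back in session 1:
COMMIT;   -- only the rows visible in the UPDATE's snapshot were changed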

Related

SQL stored procedure - how to optimize slow delete?

I've got a seemingly simple stored procedure that is taking too long to run (25 minutes on about 1 million records). I was wondering what I can do to speed it up. It's just deleting records in a given set of statuses.
Here's the entire procedure:
ALTER PROCEDURE [dbo].[spTTWFilters]
AS
BEGIN
DELETE FROM TMW
WHERE STATUS IN ('AVAIL', 'CANCL', 'CONTACTED', 'EDI-IN', 'NOFRGHT', 'QUOTE');
END
I can obviously beef up my Azure SQL instance to run faster, but are there other ways to improve? Is my syntax not ideal? Do I need to index the STATUS column? Thanks!
So the answer, as is generally the case with large data-modification operations, is to break the delete up into several smaller batches.
Every DML statement runs in its own implicit transaction unless an explicit one is declared. By running a delete that affects a large number of rows as a single batch, locks are held on the indexes and the base table for the duration of the operation, and the log file will continue to grow, internally creating new VLFs for the entire transaction.
Moreover, if the delete is aborted before it completes, the rollback may well take considerably longer to complete, since rollbacks are always single-threaded.
Breaking the work into batches, usually via some form of loop working progressively through a range of key values, allows the deletes to occur in smaller, more manageable chunks. In this case, deleting each of the different status values separately appears to be enough to effect a worthwhile improvement.
You can also use the TOP keyword to delete a large amount of data in a loop, or use the = sign instead of the IN keyword.
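For illustration, a rough sketch of the batching pattern both answers describe, against the TMW table from the question (the 5,000-row batch size is an arbitrary choice, not something prescribed here):

DECLARE @rows INT = 1;

WHILE @rows > 0
BEGIN
    -- delete a limited chunk per iteration so each implicit transaction,
    -- and therefore the locks and log usage, stays small
    DELETE TOP (5000) FROM TMW
    WHERE STATUS IN ('AVAIL', 'CANCL', 'CONTACTED', 'EDI-IN', 'NOFRGHT', 'QUOTE');

    SET @rows = @@ROWCOUNT;
END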

postgresql: delete all locks

Question: there is a table with over 9000 rows. It must be cleaned out, but without any locks (the table is in active use). I tried pg_advisory_unlock_all, but with no result.
-- session 1
select pg_advisory_unlock_all();
start transaction;
delete from table where id='1';
DELETE 1

-- session 2
start transaction;
delete from table where id='1';
-- waits until the first transaction commits or rolls back
There is no way to delete data from a table without locking the rows you want to delete.
That shouldn't be a problem as long as concurrent access doesn't try to modify these rows or insert new ones with id = '1', because writers never block readers and vice versa in PostgreSQL.
If concurrent transactions keep modifying the rows you want to delete, that's a little funny (why would you want to delete data you need?). You'd have to wait for the exclusive locks, and you might well run into deadlocks. In that case, it might be best to lock the whole table with the LOCK statement before you start. Deleting from a table that small should then only take a very short time.
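If you do decide to lock the whole table first, a minimal sketch (mytable is a stand-in name):

BEGIN;
-- EXCLUSIVE mode blocks concurrent INSERT/UPDATE/DELETE but still allows plain SELECTs
LOCK TABLE mytable IN EXCLUSIVE MODE;
DELETE FROM mytable WHERE id = '1';
COMMIT;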

Update via a temp table

So I have a rather large table (150 million rows) that data-scrub queries get run on nightly. These queries don't update a lot of records, but to find the records they need, they have to query that single table multiple times in subqueries, which takes some time.
So, would it be better for me to do a normal update statement, or would it be better to put the few results I need into a temp table and then just do an update for those few rows, which would greatly reduce the locks during the update?
I'm unsure how locking works for an update statement when most of the time is spent querying. If it is only going to update 5 records but runs for half an hour, will it release a record that it updated in the first minute, or does it hold the lock until the end of the query?
Thanks
You need to look into the ROWLOCK table hint. You can use it with the update statement while updating in batches of 5,000 rows or less. This will attempt to place row locks on the target table (or on index keys, if a covering index is present). If for some reason that fails, the lock will be escalated to a table lock.
From MSDN (on the reasons why lock escalation might occur):
When the Database Engine checks for possible escalations at every 1,250 newly acquired locks, a lock escalation will occur if and only if a Transact-SQL statement has acquired at least 5,000 locks on a single reference of a table. For example, lock escalation is not triggered if a statement acquires 3,000 locks in one index and 3,000 locks in another index of the same table. Similarly, lock escalation is not triggered if a statement has a self join on a table, and each reference to the table only acquires 3,000 locks in the table.
Actually, there's more to read in that article. You should also have a look at the mixed lock type escalation section.
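A rough sketch of that pattern, combined with the temp table idea from the question; the table and column names below are hypothetical:

-- 1) do the expensive querying once and keep only the keys that need changing
SELECT t.Id
INTO #ToFix
FROM dbo.BigTable AS t
WHERE t.SomeColumn = 'bad';   -- stand-in for the real scrub conditions

-- 2) apply the update in small batches with the ROWLOCK hint,
--    staying under the 5,000-lock escalation threshold
DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    UPDATE TOP (4000) t
    SET t.SomeColumn = 'good'
    FROM dbo.BigTable AS t WITH (ROWLOCK)
    JOIN #ToFix AS f ON f.Id = t.Id
    WHERE t.SomeColumn = 'bad';

    SET @rows = @@ROWCOUNT;
END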

Is there any real world difference in the following two SQL statements (updates that result in no net change)

Say I have a table with 1 million rows, and let's say 50% of a particular column is NULL (so 500k NULL and 500k non-NULL), and I want to set all the rows to NULL.
Assume no indexing, to simplify things.
UPDATE MyTable
SET MyColumn = NULL
or
UPDATE MyTable
SET MyColumn = NULL
WHERE MyColumn IS NOT NULL
Logic dictates that the latter is more efficient. However, won't the optimiser realise the first is the same as the second, since the WHERE condition and the SET only reference MyColumn?
The optimizer works against SELECT statements; it does not affect how a table is updated.
When you ask SQL Server to update every row, it will update every row.
It will also take a lot longer to do this because you're affecting every row, which I believe means it will hit your transaction log too.
Be VERY careful not to do this.
You will create exclusive locks on every record in the entire table when this happens.
Even if the data is not actually changing, SQL Server will still update each record nonetheless.
This might cause deadlocks on that table if another process tries to use it during that time.
I speak from experience: every night our main database table would lock up for 15 minutes while a process (someone else wrote) was updating the entire table... twice.
This caused all the other queries to wait for it to complete (some would time out).
Not even a simple SELECT statement could be run against it while it was updating.
The optimizer will not realize that the first is the same as the second.
You should use the second form. The first form will log the changes to the records that are not actually changed under some circumstances (but perhaps not in this particular case). Here is a good reference on this subject.
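If you want to see the difference for yourself, one way (using the MyTable/MyColumn names from the question) is to compare the I/O and timing statistics of the two forms; note you would need to reset the data between runs, since the first form leaves nothing for the second to change:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

UPDATE MyTable
SET MyColumn = NULL;                      -- touches every row

-- restore or reload the test data here, then:

UPDATE MyTable
SET MyColumn = NULL
WHERE MyColumn IS NOT NULL;               -- touches only the rows that actually change

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;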

Performance with inserting rows from inside CLR Stored Procedure

I have an SP written in C# which performs calculations on around 2 million rows. The calculations take about 3 minutes. For each row, the result is three numbers.
Those results are inserted into a temporary table which is later processed further.
Results are added in chunks, and inserting sometimes takes over 200 minutes (yes, over 3 hours!). Sometimes it takes "only" 50 minutes.
I have modified it so that results are kept in memory until the end and then all 2 million rows are dumped in one loop inside one transaction. Still, it takes around 20 minutes.
A similar loop written in SQL with transaction begin/commit takes less than 30 seconds.
Does anyone have an idea where the problem is?
Processing the 2 million rows (selecting them, etc.) takes 3 minutes; inserting the results in the best version still takes 20 minutes.
UPDATE: this table has one clustered index on an identity column (to ensure that rows are physically appended at the end), no triggers, no other indexes, and no other process is accessing it.
As long as we are all being vague, here is a vague answer. If inserting 2 million rows takes that long, I would check four problems, in this order:
1. Validating foreign key references or uniqueness constraints. You shouldn't need any of these on your temporary table. Do the validation in your CLR before the record gets to the insert step.
2. Complicated triggers. Please tell me you don't have any triggers on the temporary table. Finish your inserts and then do more processing after everything is in.
3. Trying to recalculate indexes after each insert. Try dropping the indexes before the insert step and recreating them after (see the sketch after this list).
4. If these aren't it, you might be dealing with record locking. Do you have other processes that hit the temporary table that might be getting in the way? Can you stop them during your insert?
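For the index point, a minimal sketch of the drop/recreate pattern, assuming a hypothetical results table with one nonclustered index (per the update above, the asker's table has no such index, so this is purely illustrative):

-- drop the nonclustered index before the bulk insert
DROP INDEX IX_Results_Values ON dbo.Results;

-- ... run the CLR procedure that performs the 2 million inserts ...

-- rebuild the index once, after all rows are in
CREATE NONCLUSTERED INDEX IX_Results_Values
ON dbo.Results (Value1, Value2, Value3);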