SQL locking issue - sql

I have much information in the table named "A". The table named "B" is empty.
I get 500 rows from table A, which are not in the table "B":
SELECT TOP 500 * FROM A WHERE
NOT EXISTS (
SELECT * FROM B WHERE
(
A.id1=B.id1 AND A.id2=B.id2
)
)
Everything works well, But I have one problem here.
I have a web application, which calls that SQL query. Imagine that 200 people together call my application, there will be duplicates in the table named "B". Sow how can I lock those rows?
Is this correct?
SELECT TOP 500 * FROM A WITH (ROWLOCK) WHERE
NOT EXISTS (
SELECT * FROM B WITH (ROWLOCK) WHERE
(
A.id1=B.id1 AND A.id2=B.id2
)
)
FOR MORE CLEARLY:
imagine that when you open my web application, page calls (for instance Servlet or PHP) SQL query.
If many person together open my application there is threading problem. while open query is writing data to the table named B, another query parallely doing same thing.
Threading problem: while one thread read 500 row, another thread may read the same data because first thread is still writing data at this moment and has not finished it yet.
thread 1 read that data from table "A":
1,2,3,4,5,6,7 ... 500
thread 1 wriote that data to the table named "B":
1,2,3,4,5,6,7,...495
thread 1 is not finished and thread 2 read data that from table "a":
496, 497 .. 500 ,.... 995
thread 2 write
496, 497 .. 500 ,.... 995
then again thread 1 write data
495 .. 500
For exampl

A rowlock is only going to lock the row.
Moving to tablelock would impact performance.
I would add a status column to A of InProcess.
Then in a transaction mark that to true.
So your query would also have a where A.InProcess = false.

I need to point out that the queries that you have posted won't actually write anything to Table B.
That being said, if you're just looking to lock down rows that are being worked on in Table A, I would suggest adding a status column to the table and using a stored procedure similar to the following.
DECLARE #Results TABLE
(
id1 int NOT NULL,
id2 int NOT NULL --Assuming these are ints
)
UPDATE TOP (500) A with (ROWLOCK, READPAST)
SET [Status] = 'InProcess'
OUTPUT inserted.id1,
inserted.id2
INTO #Results
WHERE [Status] <> 'InProcess'
SELECT * FROM #Results temp
INNER JOIN A
ON A.id1 = temp.id1 AND A.id2=temp.id2
I have used this method when I had multiple workers coming to a single table looking for tasks to do.
The keys points here are:
SELECT statement do not lock or block, but UPDATE statments will.
The ROWLOCK hint is only used to prevent table locks. It does not magically create row locks. We provide the hint only so that we don't lock down the whole table.
The READPAST hint tells each query that if it encounters a row lock, it should proceed on the the next row instead of waiting on the lock to release. We provide this hint so that multiple queries can execute at the same time.
We set the [Status] field to indicate that this row is being worked on and should be ignored until such time as we decide to reset the status (if ever). This prevents multiple queries from reaching the status after the locks have been released.
While we've used an UPDATE to give us the locks we want, we still need to actually retrieve some data. The OUTPUT statement into a table variable lets us capture the keys so that we can merge back into the table later.
There's a lot going on here, and maybe you don't need all this, but this is the method I found to give me high concurrency while preventing race conditions.

Related

Synchronization of queries to a MariaDB database

Because of some high-availability considerations, I design a system where multiple processes will communicate/synchronize via the database (most likely MariaDB, but I am open to looking into PostgreSQL and MySQL options).
One of the requirements identified is that a process must take a piece of work from the database, without letting another process take the same piece of work concurrently.
Specifically, here is the race condition I have in mind:
Process A starts a SQL transaction and runs SELECT * FROM requests WHERE ReservedTS IS NULL ORDER BY CreatedTS LIMIT 100. Here ReservedTS and CreatedTS are DATETIME columns storing the time the piece of work was created by a work submitter process and reserved by a work executor process correspondingly.
Process B starts a transaction, runs the same query and gets the same set of results.
Process A runs UPDATE requests WHERE id IN (<list of IDs selected above>) AND ReservedTS IS NULL SET ReservedTS=NOW()
Process B runs the same query, however, because its transaction has its own snapshot of the data, the ReservedTS will appear not null to Process B, so the items get reserved twice.
Process A commits the transaction.
Process B commits the transaction, overwriting the values of process A.
Could you please help to resolve the above data race?
You can easily do that by using exclusive locks:
For simplification the test table:
CREATE TABLE t1 (id int not null auto_increment primary key, reserved int);
INSERT INTO t1 VALUES (0,0), (1,0);
Process A:
BEGIN
SELECT id, reserved from t1 where id=2 and reserved=0 FOR UPDATE;
UPDATE t1 SET reserved=1 WHERE id=2 and reserved=0;
COMMIT
If Process B tries to update the same entry before Process A finished the transaction it has to wait until lock was released (or a timeout occurred):
update t1 set reserved=1 where id=2 and reserved=0;
Query OK, 0 rows affected (12.04 sec)
Rows matched: 0 Changed: 0 Warnings: 0
And as you can see, Process B didn't update anything.

How to SELECT COUNT from tables currently being INSERT?

Hi consider there is an INSERT statement running on a table TABLE_A, which takes a long time, I would like to see how has it progressed.
What I tried was to open up a new session (new query window in SSMS) while the long running statement is still in process, I ran the query
SELECT COUNT(1) FROM TABLE_A WITH (nolock)
hoping that it will return right away with the number of rows everytime I run the query, but the test result was even with (nolock), still, it only returns after the INSERT statement is completed.
What have I missed? Do I add (nolock) to the INSERT statement as well? Or is this not achievable?
(Edit)
OK, I have found what I missed. If you first use CREATE TABLE TABLE_A, then INSERT INTO TABLE_A, the SELECT COUNT will work. If you use SELECT * INTO TABLE_A FROM xxx, without first creating TABLE_A, then non of the following will work (not even sysindexes).
Short answer: You can't do this.
Longer answer: A single INSERT statement is an atomic operation. As such, the query has either inserted all the rows or has inserted none of them. Therefore you can't get a count of how far through it has progressed.
Even longer answer: Martin Smith has given you a way to achieve what you want. Whether you still want to do it that way is up to you of course. Personally I still prefer to insert in manageable batches if you really need to track progress of something like this. So I would rewrite the INSERT as multiple smaller statements. Depending on your implementation, that may be a trivial thing to do.
If you are using SQL Server 2016 the live query statistics feature can allow you to see the progress of the insert in real time.
The below screenshot was taken while inserting 10 million rows into a table with a clustered index and a single nonclustered index.
It shows that the insert was 88% complete on the clustered index and this will be followed by a sort operator to get the values into non clustered index key order before inserting into the NCI. This is a blocking operator and the sort cannot output any rows until all input rows are consumed so the operators to the left of this are 0% done.
With respect to your question on NOLOCK
It is trivial to test
Connection 1
USE tempdb
CREATE TABLE T2
(
X INT IDENTITY PRIMARY KEY,
F CHAR(8000)
);
WHILE NOT EXISTS(SELECT * FROM T2 WITH (NOLOCK))
LOOP:
SELECT COUNT(*) AS CountMethod FROM T2 WITH (NOLOCK);
SELECT rows FROM sysindexes WHERE id = OBJECT_ID('T2');
RAISERROR ('Waiting for 10 seconds',0,1) WITH NOWAIT;
WAITFOR delay '00:00:10';
SELECT COUNT(*) AS CountMethod FROM T2 WITH (NOLOCK);
SELECT rows FROM sysindexes WHERE id = OBJECT_ID('T2');
RAISERROR ('Waiting to drop table',0,1) WITH NOWAIT
DROP TABLE T2
Connection 2
use tempdb;
--Insert 2000 * 2000 = 4 million rows
WITH T
AS (SELECT TOP 2000 'x' AS x
FROM master..spt_values)
INSERT INTO T2
(F)
SELECT 'X'
FROM T v1
CROSS JOIN T v2
OPTION (MAXDOP 1)
Example Results - Showing row count increasing
SELECT queries with NOLOCK allow dirty reads. They don't actually take no locks and can still be blocked, they still need a SCH-S (schema stability) lock on the table (and on a heap it will also take a hobt lock).
The only thing incompatible with a SCH-S is a SCH-M (schema modification) lock. Presumably you also performed some DDL on the table in the same transaction (e.g. perhaps created it in the same tran)
For the use case of a large insert, where an approximate in flight result is fine, I generally just poll sysindexes as shown above to retrieve the count from metadata rather than actually counting the rows (non deprecated alternative DMVs are available)
When an insert has a wide update plan you can even see it inserting to the various indexes in turn that way.
If the table is created inside the inserting transaction this sysindexes query will still block though as the OBJECT_ID function won't return a result based on uncommitted data regardless of the isolation level in effect. It's sometimes possible to get around that by getting the object_id from sys.tables with nolock instead.
Use the below query to find the count for any large table or locked table or being inserted table in seconds . Just replace the table name which you want to search.
SELECT
Total_Rows= SUM(st.row_count)
FROM
sys.dm_db_partition_stats st
WHERE
object_name(object_id) = 'TABLENAME' AND (index_id < 2)
For those who just need to see the record count while executing a long running INSERT script, I found you can see the current record count through SSMS by right clicking on the destination database table, -> Properties -> Storage, then view the "Row Count" value like so:
Close window and repeat to see the updated record count.

Simultaneous updates to SQL table

I have a table will needs to be updated based on values within the table. The whole table will be updated and I wanted to ensure that SQL-Server will handle the locking for me. I know I can use WITH(LOCKTYPE). But as my update will be something like:
SELECT MyTable.ID, MyTable.Cost
INTO #tempTable from MyTable
-- More complex calculations here
UPDATE MyTable
SET MyTable.Cost = #tempTable.Cost+1
FROM #tempTable
WHERE MyTable.ID = #TempTable.ID
I wanted to be sure that I could lock the entire table so that if another request comes in to update the table it has to wait, but ideally, if a read request comes in it is processed. Is that possible?
Edit
Process:
Read from Table X
Perform calculations
Write to temp table Y
Average values between X and Y
Write back to X
Within the SP, once step 1 has happened, I don't want any other calls to the SP to be processed, I want them to sit in a queue. Because each update is dependent on the values already in the table, I cannot allow another call to the SP to read from the table.
In general, the SP needs to be run serially, the SP should not be allowed to be running twice at the same time. I don't mind if the call goes through and the SELECT is queued, but once a select happens, another select (within the SP) cannot be allowed to happen in another thread.
You can use a SQL transaction : http://msdn.microsoft.com/en-us/library/ms188929.aspx#fbid=-CQjt8-slkk

How can you track the progress of a SQL update?

Let's say I have an update such as:
UPDATE [db1].[sc1].[tb1]
SET c1 = LEFT(c1, LEN(c1)-1)
WHERE c1 like '%:'
This update is basically going to go through millions of rows and trim the colon if there is one in the c1 column.
How can I track how far along in the table this has progressed?
Thanks
This is sql server 2008
You can use the sysindexes table, which keeps track of how much an index has changed. Because this is done in an atomic update, it won't have a chance to recalc statistics, so rowmodctr will keep growing. This is sometimes not noticeable in small tables, but for millions, it will show.
-- create a test table
create table testtbl (id bigint identity primary key clustered, nv nvarchar(max))
-- fill it up with dummy data. 1/3 will have a trailing ':'
insert testtbl
select
convert(nvarchar(max), right(a.number*b.number+c.number,30)) +
case when a.number %3=1 then ':' else '' end
from master..spt_values a
inner join master..spt_values b on b.type='P'
inner join master..spt_values c on c.type='P'
where a.type='P' and a.number between 1 and 5
-- (20971520 row(s) affected)
update testtbl
set nv = left(nv, len(nv)-1)
where nv like '%:'
Now in another query window, run the below continuously and watch the rowmodctr going up and up. rowmodctr vs rows gives you an idea where you are up to, if you know where rowmodctr needs to end up being. In our case, it is 67% of just over 2 million.
select rows, rowmodctr
from sysindexes with (nolock)
where id = object_id('testtbl')
Please don't run (nolock) counting queries on the table itself while it is being updated.
Not really... you can query with the nolock hint and same where, but this will take resources
It isn't an optimal query with a leading wildcard of course...)
Database queries, particularly Data Manipulation Language (DML), are atomic. That means that the INSERT/UPDATE/DELETE either successfully occurs, or it doesn't. There's no means to see what record is being processed -- to the database, they all had been changed once the COMMIT is issued after the UPDATE. Even if you were able to view the records in process, by the time you would see the value, the query will have progressed on to other records.
The only means to knowing where in the process is to script the query to occur within a loop, so you can use a counter to know how many are processed. It's common to do this so large data sets are periodically committed, to minimize the risk of failure requiring having to run the entire query over again.

DELETE SQL with correlated subquery for table with 42 million rows?

I have a table cats with 42,795,120 rows.
Apparently this is a lot of rows. So when I do:
/* owner_cats is a many-to-many join table */
DELETE FROM cats
WHERE cats.id_cat IN (
SELECT owner_cats.id_cat FROM owner_cats
WHERE owner_cats.id_owner = 1)
the query times out :(
(edit: I need to increase my CommandTimeout value, default is only 30 seconds)
I can't use TRUNCATE TABLE cats because I don't want to blow away cats from other owners.
I'm using SQL Server 2005 with "Recovery model" set to "Simple."
So, I thought about doing something like this (executing this SQL from an application btw):
DELETE TOP (25) PERCENT FROM cats
WHERE cats.id_cat IN (
SELECT owner_cats.id_cat FROM owner_cats
WHERE owner_cats.id_owner = 1)
DELETE TOP(50) PERCENT FROM cats
WHERE cats.id_cat IN (
SELECT owner_cats.id_cat FROM owner_cats
WHERE owner_cats.id_owner = 1)
DELETE FROM cats
WHERE cats.id_cat IN (
SELECT owner_cats.id_cat FROM owner_cats
WHERE owner_cats.id_owner = 1)
My question is: what is the threshold of the number of rows I can DELETE in SQL Server 2005?
Or, if my approach is not optimal, please suggest a better approach. Thanks.
This post didn't help me enough:
SQL Server Efficiently dropping a group of rows with millions and millions of rows
EDIT (8/6/2010):
Okay, I just realized after reading the above link again that I did not have indexes on these tables. Also, some of you have already pointed out that issue in the comments below. Keep in mind this is a fictitious schema, so even id_cat is not a PK, because in my real life schema, it's not a unique field.
I will put indexes on:
cats.id_cat
owner_cats.id_cat
owner_cats.id_owner
I guess I'm still getting the hang of this data warehousing, and obviously I need indexes on all the JOIN fields right?
However, it takes hours for me to do this batch load process. I'm already doing it as a SqlBulkCopy (in chunks, not 42 mil all at once). I have some indexes and PKs. I read the following posts which confirms my theory that the indexes are slowing down even a bulk copy:
SqlBulkCopy slow as molasses
What’s the fastest way to bulk insert a lot of data in SQL Server (C# client)
So I'm going to DROP my indexes before the copy and then re CREATE them when it's done.
Because of the long load times, it's going to take me awhile to test these suggestions. I'll report back with the results.
UPDATE (8/7/2010):
Tom suggested:
DELETE
FROM cats c
WHERE EXISTS (SELECT 1
FROM owner_cats o
WHERE o.id_cat = c.id_cat
AND o.id_owner = 1)
And still with no indexes, for 42 million rows, it took 13:21 min:sec versus 22:08 with the way described above. However, for 13 million rows, took him 2:13 versus 2:10 my old way. It's a neat idea, but I still need to use indexes!
Update (8/8/2010):
Something is terribly wrong! Now with the indexes on, my first delete query above took 1:9 hrs:min (yes an hour!) versus 22:08 min:sec and 13:21 min:sec versus 2:10 min:sec for 42 mil rows and 13 mil rows respectively. I'm going to try Tom's query with the indexes now, but this is heading in the wrong direction. Please help.
Update (8/9/2010):
Tom's delete took 1:06 hrs:min for 42 mil rows and 10:50 min:sec for 13 mil rows with indexes versus 13:21 min:sec and 2:13 min:sec respectively. Deletes are taking longer on my database when I use indexes by an order of magnitude! I think I know why, my database .mdf and .ldf grew from 3.5 GB to 40.6 GB during the first (42 mil) delete! What am I doing wrong?
Update (8/10/2010):
For lack of any other options, I have come up with what I feel is a lackluster solution (hopefully temporary):
Increase timeout for database connection to 1 hour (CommandTimeout=60000; default was 30 sec)
Use Tom's query: DELETE FROM WHERE EXISTS (SELECT 1 ...) because it performed a little faster
DROP all indexes and PKs before running delete statement (???)
Run DELETE statement
CREATE all indexes and PKs
Seems crazy, but at least it's faster than using TRUNCATE and starting over my load from the beginning with the first owner_id, because one of my owner_id takes 2:30 hrs:min to load versus 17:22 min:sec for the delete process I just described with 42 mil rows. (Note: if my load process throws an exception, I start over for that owner_id, but I don't want to blow away previous owner_id, so I don't want to TRUNCATE the owner_cats table, which is why I'm trying to use DELETE.)
Anymore help would still be appreciated :)
There is no practical threshold. It depends on what your command timeout is set to on your connection.
Keep in mind that the time it takes to delete all of these rows is contingent upon:
The time it takes to find the rows of interest
The time it takes to log the transaction in the transaction log
The time it takes to delete the index entries of interest
The time it takes to delete the actual rows of interest
The time it takes to wait for other processes to stop using the table so you can acquire what in this case will most likely be an exclusive table lock
The last point may often be the most significant. Do an sp_who2 command in another query window to make sure that there isn't lock contention going on, preventing your command from executing.
Improperly configured SQL Servers will do poorly at this type of query. Transaction logs which are too small and/or share the same disks as the data files will often incur severe performance penalties when working with large rows.
As for a solution, well, like all things, it depends. Is this something you intend to be doing often? Depending on how many rows you have left, the fastest way might be to rebuild the table as another name and then rename it and recreate its constraints, all inside a transaction. If this is just an ad-hoc thing, make sure your ADO CommandTimeout is set high enough and you can just bear the cost of this big delete.
If the delete will remove "a significant number" of rows from the table, this can be an alternative to a DELETE: put the records to keep somewhere else, truncate the original table, put back the 'keepers'. Something like:
SELECT *
INTO #cats_to_keep
FROM cats
WHERE cats.id_cat NOT IN ( -- note the NOT
SELECT owner_cats.id_cat FROM owner_cats
WHERE owner_cats.id_owner = 1)
TRUNCATE TABLE cats
INSERT INTO cats
SELECT * FROM #cats_to_keep
Have you tried no Subquery and use a join instead?
DELETE cats
FROM
cats c
INNER JOIN owner_cats oc
on c.id_cat = oc.id_cat
WHERE
id_owner =1
And if you have have you also tried different Join hints e.g.
DELETE cats
FROM
cats c
INNER HASH JOIN owner_cats oc
on c.id_cat = oc.id_cat
WHERE
id_owner =1
If you use an EXISTS rather than an IN, you should get much better performance. Try this:
DELETE
FROM cats c
WHERE EXISTS (SELECT 1
FROM owner_cats o
WHERE o.id_cat = c.id_cat
AND o.id_owner = 1)
There's no threshold as such - you can DELETE all the rows from any table given enough transaction log space - which is where your query is most likely falling over. If you're getting some results from your DELETE TOP (n) PERCENT FROM cats WHERE ... then you can wrap it in a loop as below:
SELECT 1
WHILE ##ROWCOUNT <> 0
BEGIN
DELETE TOP (somevalue) PERCENT FROM cats
WHERE cats.id_cat IN (
SELECT owner_cats.id_cat FROM owner_cats
WHERE owner_cats.id_owner = 1)
END
As others have mentioned, when you delete 42 million rows, the db has to log 42 million deletions against the database. Thus, the transaction log has to grow substantially. What you might try is to break up the delete into chunks. In the following query, I use the NTile ranking function to break up the rows into 100 buckets. If that is too slow, you can expand the number of buckets so that each delete is smaller. It will help tremendously if there is an index on owner_cats.id_owner, owner_cats.id_cats and cats.id_cat (which I assumed the primary key and numeric).
Declare #Cats Cursor
Declare #CatId int --assuming an integer PK here
Declare #Start int
Declare #End int
Declare #GroupCount int
Set #GroupCount = 100
Set #Cats = Cursor Fast_Forward For
With CatHerd As
(
Select cats.id_cat
, NTile(#GroupCount) Over ( Order By cats.id_cat ) As Grp
From cats
Join owner_cats
On owner_cats.id_cat = cats.id_cat
Where owner_cats.id_owner = 1
)
Select Grp, Min(id_cat) As MinCat, Max(id_cat) As MaxCat
From CatHerd
Group By Grp
Open #Cats
Fetch Next From #Cats Into #CatId, #Start, #End
While ##Fetch_Status = 0
Begin
Delete cats
Where id_cat Between #Start And #End
Fetch Next From #Cats Into #CatId, #Start, #End
End
Close #Cats
Deallocate #Cats
The notable catch with the above approach is that it is not transactional. Thus, if it fails on the 40th chunk, you will have deleted 40% of the rows and the other 60% will still exist.
Might be worth trying MERGE e.g.
MERGE INTO cats
USING owner_cats
ON cats.id_cat = owner_cats.id_cat
AND owner_cats.id_owner = 1
WHEN MATCHED THEN DELETE;
<Edit> (9/28/2011)
My answer performs basically the same way as Thomas' solution (Aug 6 '10). I missed it when I posted my answer because it he uses an actual CURSOR so I thought to myself "bad" because of the # of records involved. However, when I reread his answer just now I realize that the WAY he uses the cursor is actually "good". Very clever. I just voted up his answer and will probably use his approach in the future. If you don't understand why, take a look at it again. If you still can't see it, post a comment on this answer and I will come back and try to explain in detail. I decided to leave my answer because someone may have a DBA who refuses to let them use an actual CURSOR regardless of how "good" it is. :-)
</Edit>
I realize that this question is a year old but I recently had a similar situation. I was trying to do "bulk" updates to a large table with a join to a different table, also fairly large. The problem was that the join was resulting in so many "joined records" that it took too long to process and could have led to contention problems. Since this was a one-time update I came up with the following "hack." I created a WHILE LOOP that went through the table to be updated and picked 50,000 records to update at a time. It looked something like this:
DECLARE #RecId bigint
DECLARE #NumRecs bigint
SET #NumRecs = (SELECT MAX(Id) FROM [TableToUpdate])
SET #RecId = 1
WHILE #RecId < #NumRecs
BEGIN
UPDATE [TableToUpdate]
SET UpdatedOn = GETDATE(),
SomeColumn = t2.[ColumnInTable2]
FROM [TableToUpdate] t
INNER JOIN [Table2] t2 ON t2.Name = t.DBAName
AND ISNULL(t.PhoneNumber,'') = t2.PhoneNumber
AND ISNULL(t.FaxNumber, '') = t2.FaxNumber
LEFT JOIN [Address] d ON d.AddressId = t.DbaAddressId
AND ISNULL(d.Address1,'') = t2.DBAAddress1
AND ISNULL(d.[State],'') = t2.DBAState
AND ISNULL(d.PostalCode,'') = t2.DBAPostalCode
WHERE t.Id BETWEEN #RecId AND (#RecId + 49999)
SET #RecId = #RecId + 50000
END
Nothing fancy but it got the job done. Because it was only processing 50,000 records at a time, any locks that got created were short lived. Also, the optimizer realized that it did not have to do the entire table so it did a better job of picking an execution plan.
<Edit> (9/28/2011)
There is a HUGE caveat to the suggestion that has been mentioned here more than once and is posted all over the place around the web regarding copying the "good" records to a different table, doing a TRUNCATE (or DROP and reCREATE, or DROP and rename) and then repopulating the table.
You cannot do this if the table is the PK table in a PK-FK relationship (or other CONSTRAINT). Granted, you could DROP the relationship, do the clean up, and re-establish the relationship, but you would have to clean up the FK table, too. You can do that BEFORE re-establishing the relationship, which means more "down-time", or you can choose to not ENFORCE the CONSTRAINT on creation and clean up afterwards. I guess you could also clean up the FK table BEFORE you clean up the PK table. Bottom line is that you have to explicitly clean up the FK table, one way or the other.
My answer is a hybrid SET-based/quasi-CURSOR process. Another benefit of this method is that if the PK-FK relationship is setup to CASCADE DELETES you don't have to do the clean up I mention above because the server will take care of it for you. If your company/DBA discourage cascading deletes, you can ask that it be enabled only while this process is running and then disabled when it is finished. Depending on the permission levels of the account that runs the clean up, the ALTER statements to enable/disable cascading deletes can be tacked onto the beginning and the end of the SQL statement.
</Edit>
Bill Karwin's answer to another question applies to my situation also:
"If your DELETE is intended to eliminate a great majority of the rows in that table, one thing that people often do is copy just the rows you want to keep to a duplicate table, and then use DROP TABLE or TRUNCATE to wipe out the original table much more quickly."
Matt in this answer says it this way:
"If offline and deleting a large %, may make sense to just build a new table with data to keep, drop the old table, and rename."
ammoQ in this answer (from the same question) recommends (paraphrased):
issue a table lock when deleting a large amount of rows
put indexes on any foreign key columns