I'm trying to find a way to do a conditional DELETE on an InnoDB table which contains millions of records, without locking it (thus not bringing the website down).
I've tried to find information on mysql.com, but to no avail. Any tips on how to proceed?
I don't think it is possible to delete without locking. That said, I don't think locking the record you want to delete is a problem. What would be a problem is locking other rows.
I found some information on that subject here: http://dev.mysql.com/doc/refman/5.0/en/innodb-locks-set.html
What I would suggest, is to try and do a million single row deletes. I think that if you do all those in a single transaction, performance should not hurt too much. so you would get something like:
START TRANSACTION;
DELETE FROM tab WHERE id = 1;
..
..
DELETE FROM tab WHERE id = x;
COMMIT;
You can generate the required statments by doing something like
SELECT CONCAT('DELETE FROM tab WHERE id = ', id)
FROM tab
WHERE <some intricate condition that selects the set you want to delete>
So the advantage over this method instead of doing:
DELETE FROM tab
WHERE <some intricate condition that selects the set you want to delete>
is that in the first approach you only ever lock the record you're deleting, whereas in the second approach you could run the risk of locking other records that happen to be in the same range as the rows you are deleteing.
If it fits your application, then you could limit the number of rows to delete, and setup a cronjob for repeating the deletion. E.g.:
DELETE FROM tab WHERE .. LIMIT 1000
I found this to be good compromise in a similar scenario.
I use procedure to delete
create procedure delete_last_year_data()
begin
DECLARE del_row varchar(255);
DECLARE done INT DEFAULT 0;
declare del_rows cursor for select CONCAT('DELETE FROM table_name WHERE id = ', id)
from table_name
where created_time < '2018-01-01 00:00:00';
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
open del_rows;
repeat
fetch del_rows into del_row;
if not done
then
set #del = del_row;
prepare stmt from #del;
execute stmt;
DEALLOCATE PREPARE stmt;
end if;
until done end repeat;
close del_rows;
end //
Related
I need to delete a bunch of records (literally millions) but I don't want to make it in an individual statement, because of performance issues. So I created a view:
CREATE VIEW V1
AS
SELECT FIRST 500000 *
FROM TABLE
WHERE W_ID = 14
After that I do a bunch deletes for example:
DELETE FROM V1 WHERE TS < 2021-01-01
What I want is to import this logic in a While loop and in stored procedure. I tried SELECT COUNT query like this:
SELECT COUNT(*)
FROM TABLE
WHERE W_ID = 14 AND TS < 2021-01-01;
Can I use this number in the same procedure as a condition and how can I manage that?
This is what I have tried and I get an error
ERROR: Dynamic SQL Error; SQL error code = -104; Token unknown; WHILE
Code:
CREATE PROCEDURE DeleteBatch
AS
DECLARE VARIABLE CNT INT;
BEGIN
SELECT COUNT(*) FROM TABLE WHERE W_ID = 14 AND TS < 2021-01-01 INTO :cnt;
WHILE cnt > 0 do
BEGIN
IF (cnt > 0) THEN
DELETE FROM V1 WHERE TS < 2021-01-01;
END
ELSE break;
END
I just can't wrap my head around this.
To clarify, in my previous question I wanted to know how to manage the garbage_collection after many deleted records, and I did what was suggested - SELECT * FROM TABLE; or gfix -sweep and that worked very well. As mentioned in the comments the correct statement is SELECT COUNT(*) FROM TABLE;
After that another even bigger database was given to me - above 50 million. And the problem was the DB was very slow to operate with. And I managed to get the server it was on, killed with a DELETE statement to clean the database.
That's why I wanted to try deleting in batches. The slow-down problem there was purely hardware - HDD has gone, and we replaced it. After that there was no problem with executing statements and doing backup and restore to reclaim disk space.
Provided the data that you need to delete, doesn't ever need to be rollbacked once the stored procedure is kicked off, there is another way to handle massive DELETEs in a Stored Procedure.
The example stored procedure will delete the rows 500,000 at a time. It will loop until there aren't any more rows to delete. The AUTONOMOUS TRANSACTION will allow you to put each delete statement in its own transaction and it will commit immediately after the statement completes. This is issuing an implicit commit inside a stored procedure, which you normally can't do.
CREATE OR ALTER PROCEDURE DELETE_TABLEXYZ_ROWS
AS
DECLARE VARIABLE RC INTEGER;
BEGIN
RC = 9999;
WHILE (RC > 0) DO
BEGIN
IN AUTONOMOUS TRANSACTION DO
BEGIN
DELETE FROM TABLEXYZ ROWS 500000;
RC = ROW_COUNT;
END
END
SELECT COUNT(*)
FROM TABLEXYZ
INTO :RC;
END
because of performance issues
What are those exactly? I do not think you actually are improving performance, by just running delete in loops but within the same transaction, or even different TXs but within the same timespan. You seem to be solving some wrong problem. The issue is not how you create "garbage", but how and when Firebird "collects" it.
For example, Select Count(*) in Interbase/Firebird engines means natural scan over all the table and the garbage collection is often trigggered by it, which can itself get long if lot of garbage was created (and massive delete surely does, no matter if done by one million-rows statement or million of one-row statements).
How to delete large data from Firebird SQL database
If you really want to slow down deletion - you have to spread that activity round the clock, and make your client application call a deleting SP for example once every 15 minutes. You would have to add some column to the table, flagging it is marked for deletion and then do the job like that
CREATE PROCEDURE DeleteBatch(CNT INT)
AS
DECLARE ROW_ID INTEGER;
BEGIN
FOR SELECT ID FROM TABLENAME WHERE MARKED_TO_DEL > 0 INTO :row_id
DO BEGIN
CNT = CNT - 1;
DELETE FROM TABLENAME WHERE ID = :ROW_ID;
IF (CNT <= 0) THEN LEAVE;
END
SELECT COUNT(1) FROM TABLENAME INTO :ROW_id; /* force GC now */
END
...and every 15 minutes you do EXECUTE PROCEDURE DeleteBatch(1000).
Overall this probably would only be slower, because of single-row "precision targeting" - but at least it would spread the delays.
Use DELETE...ROWS.
https://firebirdsql.org/file/documentation/html/en/refdocs/fblangref25/firebird-25-language-reference.html#fblangref25-dml-delete-orderby
But as I already said in the answer to the previous question it is better to spend time investigating source of slowdown instead of workaround it by deleting data.
I'm trying to figure out how to create a looped truncate to truncate (or delete) the data in specific tables only. There are something like 30 or so tables that need to be truncated, and I want to avoid just using a list of truncate statements. I can't seem to find any good examples of doing this, and no matter what I try, I get a "right truncated" error.
Example 1:
FOR anlyc_tables AS curs CURSOR FOR
SELECT table_name FROM systable WHERE table_name LIKE 'table_to_truncate_prefix%'
DO EXECUTE (
'TRUNCATE TABLE ' + table_name
);
END FOR;
This one throws no errors, but completes in .078 seconds and doesn't actually truncate anything.
Example 2:
ALTER PROCEDURE truncate_analytics()
BEGIN
DECLARE #table_name VARCHAR;
DECLARE curs DYNAMIC SCROLL CURSOR FOR SELECT table_name FROM systable WHERE table_name LIKE 't_anlyc%';
OPEN curs WITH HOLD;
FETCH NEXT curs INTO #table_name;
WHILE(sqlstate = 0) LOOP
FETCH NEXT curs INTO #table_name;
TRUNCATE TABLE table_name;
END LOOP;
END
GO
CALL truncate_analytics()
GO
Results in the "right truncated" error and the tables are not truncated.
I think I'm missing something really obvious here, but I don't have a ton of experience with SQL scripting in this manner, and can't seem to find any working examples of this to prove it's even possible.
Can anyone point me in the right direction?
There may be a few things at work that are preventing the code from working.
With your second example, the text "TRUNCATE TABLE table_name" is going to look for a table named table_name. table_name won't be treated for it's variable value, but will be considered a object identifier. FWIW, I'm not sure how this would cause a right truncation error.
Depending on what you consider the prefix for a table, you might not be getting the list of tables you expect. If you are considering the user to be the prefix (i.e. dba.my_table), your select isn't going to grab the necessary tables; there can be multiple tables with the same name, just different owners. If you want to select based on user, you'll need to join to the SYSUSER view.
Lastly, using a cursor for this can be done. It would require WITH HOLD as you already have, because the TRUNCATE TABLE statement issues an implicit commit and closes cursors otherwise. (Note that WITH HOLD requires explicit closing of the cursor, otherwise it will remain open until connection close.) Another option is to build a string of statements to execute, and use the execute or EXECUTE IMMEDIATE on them as a batch.
Hope This Helps,
Tyson
I am writing a T-SQL stored procedure that conditionally adds a record to a table only if the number of similar records is below a certain threshold, 10 in the example below. The problem is this will be run from a web application, so it will run on multiple threads, and I need to ensure that the table never has more than 10 similar records.
The basic gist of the procedure is:
BEGIN
DECLARE #c INT
SELECT #c = count(*)
FROM foo
WHERE bar = #a_param
IF #c < 10 THEN
INSERT INTO foo
(bar)
VALUES (#a_param)
END IF
END
I think I could solve any potential concurrency problems by replacing the select statement with:
SELECT #c = count(*) WITH (TABLOCKX, HOLDLOCK)
But I am curious if there any methods other than lock hints for managing concurrency problems in T-SQL
One option would be to use the sp_getapplock system stored procedure. You can place your critical section logic in a transaction and use the built in locking of sql server to ensure synchronized access.
Example:
CREATE PROC MyCriticalWork(#MyParam INT)
AS
DECLARE #LockRequestResult INT
SET #LockRequestResult=0
DECLARE #MyTimeoutMiliseconds INT
SET #MyTimeoutMiliseconds=5000--Wait only five seconds max then timeouit
BEGIN TRAN
EXEC #LockRequestResult=SP_GETAPPLOCK 'MyCriticalWork','Exclusive','Transaction',#MyTimeoutMiliseconds
IF(#LockRequestResult>=0)BEGIN
/*
DO YOUR CRITICAL READS AND WRITES HERE
*/
--Release the lock
COMMIT TRAN
END ELSE
ROLLBACK TRAN
Use SERIALIZABLE. By definition it provides you the illusion that your transaction is the only transaction running. Be aware that this might result in blocking and deadlocking. In fact this SQL code is a classic candidate for deadlocking: Two transactions might first read a set of rows, then both will try to modify that set of rows. Locking hints are the classic way of solving that problem. Retry also works.
As stated in the comment. Why are you trying to insert on multiple threads? You cannot write to a table faster on multiple threads.
But you don't need a declare
insert into [Table_1] (ID, fname, lname)
select 3, 'fname', 'lname'
from [Table_1]
where ID = 3
having COUNT(*) <= 10
If you need to take a lock then do so
The data is not 3NF
Should start any design with a proper data model
Why rule out table lock?
That could very well be the best approach
Really, what are the chances?
Even without a lock you would have to have two at a count of 9 submit at exactly the same time. Even then it would stop at 11. Is the 10 an absolute hard number?
I have database on Sql Server 2008 R2.
On that database a delete query on 400 Million records, has been running for 4 days , but I need to reboot the machine. How can I force it to commit whatever is deleted so far? I want to reject that data which is deleted by running query so far.
But problem is that query is still running and will not complete before the server reboot.
Note : I have not set any isolation / begin/end transaction for the query. The query is running in SSMS studio.
If machine reboot or I cancelled the query, then database will go in recovery mode and it will recovering for next 2 days, then I need to re-run the delete and it will cost me another 4 days.
I really appreciate any suggestion / help or guidance in this.
I am novice user of sql server.
Thanks in Advance
Regards
There is no way to stop SQL Server from trying to bring the database into a transactionally consistent state. Every single statement is implicitly a transaction itself (if not part of an outer transaction) and is executing either all or nothing. So if you either cancel the query or disconnect or reboot the server, SQL Server will from transaction log write the original values back to the updated data pages.
Next time when you delete so many rows at once, don't do it at once. Divide the job in smaller chunks (I always use 5.000 as a magic number, meaning I delete 5000 rows at the time in the loop) to minimize transaction log use and locking.
set rowcount 5000
delete table
while ##rowcount = 5000
delete table
set rowcount 0
If you are deleting that many rows you may have a better time with truncate. Truncate deletes all rows from the table very efficiently. However, I'm assuming that you would like to keep some of the records in the table. The stored procedure below backs up the data you would like to keep into a temp table then truncates then re-inserts the records that were saved. This can clean a huge table very quickly.
Note that truncate doesn't play well with Foreign Key constraints so you may need to drop those then recreate them after cleaned.
CREATE PROCEDURE [dbo].[deleteTableFast] (
#TableName VARCHAR(100),
#WhereClause varchar(1000))
AS
BEGIN
-- input:
-- table name: is the table to use
-- where clause: is the where clause of the records to KEEP
declare #tempTableName varchar(100);
set #tempTableName = #tableName+'_temp_to_truncate';
-- error checking
if exists (SELECT [Table_Name] FROM Information_Schema.COLUMNS WHERE [TABLE_NAME] =(#tempTableName)) begin
print 'ERROR: already temp table ... exiting'
return
end
if not exists (SELECT [Table_Name] FROM Information_Schema.COLUMNS WHERE [TABLE_NAME] =(#TableName)) begin
print 'ERROR: table does not exist ... exiting'
return
end
-- save wanted records via a temp table to be able to truncate
exec ('select * into '+#tempTableName+' from '+#TableName+' WHERE '+#WhereClause);
exec ('truncate table '+#TableName);
exec ('insert into '+#TableName+' select * from '+#tempTableName);
exec ('drop table '+#tempTableName);
end
GO
You must know D(Durability) in ACID first before you understand why database goes to Recovery mode.
Generally speaking, you should avoid long running SQL if possible. Long running SQL means more lock time on resource, larger transaction log and huge rollback time when it fails.
Consider divided your task some id or time. For example, you want to insert large volume data from TableSrc to TableTarget, you can write query like
DECLARE #BATCHCOUNT INT = 1000;
DECLARE #Id INT = 0;
DECLARE #Max = ...;
WHILE Id < #Max
BEGIN
INSERT INTO TableTarget
FROM TableSrc
WHERE PrimaryKey >= #Id AND #PrimaryKey < #Id + #BatchCount;
SET #Id = #Id + #BatchCount;
END
It's ugly more code and more error prone. But it's the only way I know to deal with huge data volume.
I've only been writing DB2 procedures for a few days, but trying to do a "batch delete" on a given table. My expected logic is:
to open a cursor
walk through it until EOF
issue a DELETE on each iteration
For sake of simplifying this question, assume I only want to issue a single COMMIT (of all DELETEs), after the WHILE loop is completed (ie. once cursor reaches EOF). So given the code sample below:
CREATE TABLE tableA (colA INTEGER, ...)
CREATE PROCEDURE "SCHEMA"."PURGE_PROC"
(IN batchSize INTEGER)
LANGUAGE SQL
SPECIFIC SQL140207163731500
BEGIN
DECLARE tempID INTEGER;
DECLARE eof_bool INTEGER DEFAULT 0;
DECLARE sqlString VARCHAR(1000);
DECLARE sqlStmt STATEMENT;
DECLARE myCurs CURSOR WITH HOLD FOR sqlStmt;
DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET eof_bool = 1;
SET sqlString = 'select colA from TableA';
PREPARE sqlStmt FROM sqlString;
OPEN myCurs;
FETCH myCurs INTO tempID;
WHILE (eof_bool = 0) DO
DELETE FROM TableA where colA = tempID;
FETCH myCurs INTO tempID;
END WHILE;
COMMIT;
CLOSE myCurs;
END
Note: In my real scenario:
I am not deleting all records from the table, just certain ones based on some additional criteria; and
I plan to perform a COMMIT every N# iterations of the WHILE loop (say 500 or 1000), not the entire mess like above; and
I plan to DELETE against multiple tables, not just this one;
But again, to simplify, I tested the above code, and what I'm seeing is that the DELETEs seem to be getting committed 1-by-1. I base this on the following test:
I pre-load the table with (say 50k) records;
then run the purge storedProc which takes ~60 secs to run;
during this time, from another sql client, I continuously "SELECT COUNT(*) FROM tableA" and see count reducing incrementally.
If all DELETEs were committed at once, I would expect to see the record count(*) only drop from to 0 at the end of the ~60 seconds. That is what I see with comparable SPs written for Oracle or SQLServer.
This is DB2 v9.5 on Win2003.
Any ideas what I'm missing?
You are missing the difference in concurrency control implementation between the different database engines. In an Oracle database another session would see data that have been committed prior to the beginning of its transaction, that is, it would not see any deletes until the first session commits.
In DB2, depending on the server configuration parameters (e.g. DB2_SKIPDELETED) and/or the second session isolation level (e.g. uncommitted read) it can in fact see (or not see) data affected by in-flight transactions.
If your business logic requires different transaction isolation, speak with your DBA.
It should be pointed out that you're deleting "outside of the cursor"
The right way to delete using the cursor would be using a "positioned delete"
DELETE FROM tableA WHERE CURRENT OF myCurs;
The above deletes the row just fetched.