Stored Procedure for batch delete in Firebird - sql

I need to delete a bunch of records (literally millions), but I don't want to do it in a single statement because of performance issues. So I created a view:
CREATE VIEW V1
AS
SELECT FIRST 500000 *
FROM TABLE
WHERE W_ID = 14
After that I run a batch of deletes, for example:
DELETE FROM V1 WHERE TS < '2021-01-01'
What I want is to put this logic in a WHILE loop inside a stored procedure. I tried a SELECT COUNT query like this:
SELECT COUNT(*)
FROM TABLE
WHERE W_ID = 14 AND TS < '2021-01-01';
Can I use this number in the same procedure as a condition and how can I manage that?
This is what I have tried, and I get an error:
ERROR: Dynamic SQL Error; SQL error code = -104; Token unknown; WHILE
Code:
CREATE PROCEDURE DeleteBatch
AS
DECLARE VARIABLE CNT INT;
BEGIN
SELECT COUNT(*) FROM TABLE WHERE W_ID = 14 AND TS < 2021-01-01 INTO :cnt;
WHILE cnt > 0 do
BEGIN
IF (cnt > 0) THEN
DELETE FROM V1 WHERE TS < 2021-01-01;
END
ELSE break;
END
I just can't wrap my head around this.
To clarify, in my previous question I wanted to know how to manage garbage collection after deleting many records, and I did what was suggested - SELECT * FROM TABLE; or gfix -sweep - and that worked very well. As mentioned in the comments, the correct statement is SELECT COUNT(*) FROM TABLE;
After that an even bigger database was given to me - over 50 million records. The problem was that the DB was very slow to work with, and I managed to kill the server it was on with a DELETE statement while cleaning the database.
That's why I wanted to try deleting in batches. The slow-down problem there turned out to be purely hardware - the HDD had failed, and we replaced it. After that there was no problem executing statements and doing a backup and restore to reclaim disk space.

Provided the data you need to delete never has to be rolled back once the stored procedure is kicked off, there is another way to handle massive DELETEs in a stored procedure.
The example stored procedure deletes rows 500,000 at a time and loops until there are no more rows to delete. The AUTONOMOUS TRANSACTION puts each DELETE statement in its own transaction, which commits immediately after the statement completes. This is effectively an implicit commit inside a stored procedure, which you normally can't do.
CREATE OR ALTER PROCEDURE DELETE_TABLEXYZ_ROWS
AS
DECLARE VARIABLE RC INTEGER;
BEGIN
  RC = 9999;

  WHILE (RC > 0) DO
  BEGIN
    IN AUTONOMOUS TRANSACTION DO
    BEGIN
      DELETE FROM TABLEXYZ ROWS 500000;
      RC = ROW_COUNT;
    END
  END

  SELECT COUNT(*)
  FROM TABLEXYZ
  INTO :RC;
END
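A minimal usage sketch, assuming the procedure above has been created (for example via isql with SET TERM) against a table named TABLEXYZ:
EXECUTE PROCEDURE DELETE_TABLEXYZ_ROWS;
-- the call only returns once the loop has removed every row from TABLEXYZ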

because of performance issues
What are those exactly? I do not think you are actually improving performance by just running deletes in loops within the same transaction, or even in different transactions within the same timespan. You seem to be solving the wrong problem. The issue is not how you create "garbage", but how and when Firebird "collects" it.
For example, SELECT COUNT(*) in the InterBase/Firebird engines means a natural scan over the whole table, and garbage collection is often triggered by it, which can itself take long if a lot of garbage was created (and a massive delete surely creates a lot, no matter whether it is done as one million-row statement or a million one-row statements).
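As a rough sketch of that effect (TABLENAME stands in for the question's table, the filter is the question's): after a mass delete commits, a later full scan is what lets the engine collect the dead record versions, provided no older transactions still need them:
DELETE FROM TABLENAME WHERE W_ID = 14 AND TS < '2021-01-01';
COMMIT;
-- the natural scan below triggers cooperative garbage collection of the deleted versions
SELECT COUNT(*) FROM TABLENAME;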
How to delete large data from Firebird SQL database
If you really want to slow deletion down, you have to spread that activity around the clock and make your client application call a deleting SP, for example, once every 15 minutes. You would have to add a column to the table flagging rows as marked for deletion, and then do the job like this:
CREATE PROCEDURE DeleteBatch(CNT INT)
AS
DECLARE VARIABLE ROW_ID INTEGER;
BEGIN
  FOR SELECT ID FROM TABLENAME WHERE MARKED_TO_DEL > 0 INTO :ROW_ID
  DO BEGIN
    CNT = CNT - 1;
    DELETE FROM TABLENAME WHERE ID = :ROW_ID;
    IF (CNT <= 0) THEN LEAVE;
  END
  SELECT COUNT(1) FROM TABLENAME INTO :ROW_ID; /* force GC now */
END
...and every 15 minutes you do EXECUTE PROCEDURE DeleteBatch(1000).
Overall this would probably only be slower, because of the single-row "precision targeting" - but at least it would spread out the delays.
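A minimal setup sketch for that flag-and-sweep approach, assuming a table named TABLENAME with an ID primary key (the MARKED_TO_DEL column and the filter values are placeholders):
ALTER TABLE TABLENAME ADD MARKED_TO_DEL SMALLINT DEFAULT 0;
COMMIT;
-- mark the candidate rows once; DeleteBatch then removes them a little at a time
UPDATE TABLENAME SET MARKED_TO_DEL = 1 WHERE W_ID = 14 AND TS < '2021-01-01';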

Use DELETE...ROWS.
https://firebirdsql.org/file/documentation/html/en/refdocs/fblangref25/firebird-25-language-reference.html#fblangref25-dml-delete-orderby
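A hedged example of what that looks like with the filter from the question (the table name is a placeholder); each statement removes at most 500,000 matching rows, so you repeat it, each run in its own transaction, until no rows are affected:
DELETE FROM TABLENAME WHERE W_ID = 14 AND TS < '2021-01-01' ROWS 500000;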
But as I already said in the answer to the previous question, it is better to spend time investigating the source of the slowdown than to work around it by deleting data.

Related

How to read all "last_changed" records from Firebird DB?

My question is a bit tricky, because it's mostly a logical problem.
I've tried to optimize my app's speed by reading everything into memory, but only those records which changed since the "last read" = the greatest timestamp among the records loaded last time.
The FirebirdSQL database engine does not allow updating a field directly in an AFTER trigger, so the obvious approach is to use BEFORE UPDATE OR INSERT triggers that set new.last_changed = current_timestamp;
The problem:
As it turns out, this is a totally WRONG method, because those triggers fire when the statement executes, well before the transaction commits!
So if one transaction takes longer than another, its saved "last changed" time can be earlier than that of a short-burst transaction started and committed in between.
1st tr.: 13:00:01.400 .............................Commit << this record will be skipped!
2nd tr.: 13:00:01.500......Commit << reading of data happens here.
The next read will be >= 13:00:01.500
I've tried:
to rewrite all the triggers so they fire AFTER and call an UPDATE orders SET ... << but this causes circular, self-calling, endless events.
Would a SET_CONTEXT lock interfere with multi-row update and nested triggers?
(I do not see any possibility this method would work well when running multiple updates in the same transaction.)
What is the common solution for all this?
Edit1:
What I want to happen is to read only those records from the DB that actually changed since the last read. For that to happen, I need the engine to update records AFTER COMMIT. (Not during it, "in the middle".)
This trigger is NOT good, because it fires at the moment of the change (not after the commit):
alter trigger SYNC_ORDERS active after insert or update position 999 AS
declare variable N timestamp;
begin
  N = cast('NOW' as timestamp);
  if (new.last_changed <> :N) then
    update ORDERS set last_changed = :N where ID = new.ID;
end
And from the application I do:
Query1.SQL.Text := 'SELECT * FROM orders WHERE last_changed >= ' + DateTimeToStr( latest_record );
Query1.Open;
latest_record := Query1.FieldByName('last_changed').asDateTime;
.. this code will list only the record committed in the 2nd transaction (which commits earlier) and never the one from the first, longer-running transaction (which commits later).
Edit2:
It seems I have the same question as here..., but specifically for FirebirdSQL.
There are not really any good solutions there, but it gave me an idea:
- What if I create an extra table and log changes earlier than 5 minutes there per table?
- Before each SQL query, first I will ask for any changes in that table, sequenced via ID grow!
- Delete lines older than 23 hours
ID  TableID  Changed
===========================
1   5        2019.11.27 19:36:21
2   5        2019.11.27 19:31:19
Edit3:
As Arioch already suggested, one solution is to:
create a "logger table" filled on every BEFORE INSERT OR UPDATE
trigger by every table
and update the "last_changed" sequence of it
by the ON TRANSACTION COMMIT trigger
But would this not be a better approach?:
add one last_sequence INT64 DEFAULT NULL column to every table
create a global generator LAST_GEN
update every NULL row of every table with gen_id(LAST_GEN,1) inside the ON TRANSACTION COMMIT trigger
SET it to NULL again in every BEFORE INSERT OR UPDATE trigger
So basically the last_sequence column of a record switches to:
NULL > 1 > NULL > 34 ... every time it gets modified.
This way I:
do not have to fill the DB with log data,
and I can query the tables directly with WHERE last_sequence > 1;
No need to pre-query the "logger table" first.
I'm just afraid: WHAT happens if the ON TRANSACTION COMMIT trigger tries to update a last_sequence field while a second transaction's BEFORE trigger is locking the record (of another table)?
Can this happen at all?
The final solution is based on the idea that:
Each table's BEFORE INSERT OR UPDATE trigger can push the time of the transaction: RDB$SET_CONTEXT('USER_TRANSACTION', 'table31', current_timestamp);
The global ON TRANSACTION COMMIT trigger can insert a sequence + time into a "logging table" if it finds such a context.
It can even take care of "daylight saving changes" and "intervals" by logging only "big time differences", like >= 1 minute, to reduce the number of records.
A stored procedure can ease and speed up the calculation of the 'LAST_QUERY_TIME' for each query.
Example:
1.)
create trigger ORDERS_BI active before insert or update position 0 AS
BEGIN
  IF (NEW.ID IS NULL) THEN
    NEW.ID = GEN_ID(GEN_ORDERS, 1);
  RDB$SET_CONTEXT('USER_TRANSACTION', 'orders_table', current_timestamp);
END
2, 3.)
create trigger TRG_SYNC_AFTER_COMMIT ACTIVE ON transaction commit POSITION 1 as
declare variable N TIMESTAMP;
declare variable T VARCHAR(255);
begin
  N = cast('NOW' as timestamp);
  T = RDB$GET_CONTEXT('USER_TRANSACTION', 'orders_table');
  if (:T is not null) then begin
    if (:N < :T) then T = :N; -- system time changed, e.g. daylight saving -1 hour
    if (datediff(second from :T to :N) > 60) then -- more than 1 min. passed
      insert into "SYNC_PAST_TIMES" (ID, TABLE_NUMBER, TRG_START, SYNC_TIME, C_USER)
      values (GEN_ID(GEN_SYNC_PAST_TIMES, 1), 31, cast(:T as timestamp), :N, CURRENT_USER);
  end;
  -- other tables too:
  T = RDB$GET_CONTEXT('USER_TRANSACTION', 'details_table');
  -- ...
  when any do EXIT;
end
Edit1:
It is possible to speed up the readout of the "last-time-changed" value from the SYNC_PAST_TIMES table with the help of a stored procedure. Logically, you have to keep both the ID (PT_ID) and the time (PT_TM) in memory in your program and pass them in for each table.
CREATE PROCEDURE SP_LAST_MODIF_TIME (
  TABLE_NUMBER SMALLINT,
  LAST_PASTTIME_ID BIGINT,
  LAST_PASTTIME TIMESTAMP)
RETURNS (
  PT_ID BIGINT,
  PT_TM TIMESTAMP)
AS
declare variable TEMP_TIME TIMESTAMP;
declare variable TBL SMALLINT;
begin
  PT_TM = :LAST_PASTTIME;
  FOR SELECT p.ID, p.SYNC_TIME, p.TABLE_NUMBER FROM SYNC_PAST_TIMES p
      WHERE (p.ID > :LAST_PASTTIME_ID)
      ORDER BY p.ID ASC
      INTO PT_ID, TEMP_TIME, TBL DO -- PT_ID immediately gets an increasing value
  begin
    if (:TBL = :TABLE_NUMBER) then
      if (:TEMP_TIME < :PT_TM) then
        PT_TM = :TEMP_TIME; -- searching for the smallest
  end
  if (:PT_ID IS NULL) then begin
    PT_ID = :LAST_PASTTIME_ID;
    PT_TM = :LAST_PASTTIME;
  end
  suspend;
END
You can use this procedure by including it in your select, using the WITH .. AS format:
with UTLS as (select first 1 PT_ID, PT_TM from SP_LAST_MODIF_TIME (55, -- TABLE_NUMBER
0, '1899.12.30 00:00:06.000') ) -- last PT_ID, PT_TM from your APP
select first 1000 u.PT_ID, current_timestamp as NOWWW, r.*
from UTLS u, "Orders" r
where (r.SYNC_TIME >= u.PT_TM);
Using FIRST 1000 is a must to prevent reading the whole table if all values changed at once.
Upgrading the SQL, adding a new column, etc. changes SYNC_TIME to NOW on all rows of the table at the same time.
You may adjust it per table individually, just like the interval of seconds used to monitor changes. Add a check to your APP for how to handle the case where the new data reaches 1000 lines at once ...

Conditionally inserting records into a table in multithreaded environment based on a count

I am writing a T-SQL stored procedure that conditionally adds a record to a table only if the number of similar records is below a certain threshold, 10 in the example below. The problem is this will be run from a web application, so it will run on multiple threads, and I need to ensure that the table never has more than 10 similar records.
The basic gist of the procedure is:
BEGIN
    DECLARE @c INT

    SELECT @c = COUNT(*)
    FROM foo
    WHERE bar = @a_param

    IF @c < 10
        INSERT INTO foo (bar)
        VALUES (@a_param)
END
I think I could solve any potential concurrency problems by replacing the select statement with:
SELECT @c = COUNT(*) FROM foo WITH (TABLOCKX, HOLDLOCK) WHERE bar = @a_param
But I am curious if there are any methods other than lock hints for managing concurrency problems in T-SQL.
One option would be to use the sp_getapplock system stored procedure. You can place your critical-section logic in a transaction and use the built-in locking of SQL Server to ensure synchronized access.
Example:
CREATE PROC MyCriticalWork(@MyParam INT)
AS
DECLARE @LockRequestResult INT
SET @LockRequestResult = 0

DECLARE @MyTimeoutMilliseconds INT
SET @MyTimeoutMilliseconds = 5000 -- wait only five seconds max, then time out

BEGIN TRAN
    EXEC @LockRequestResult = SP_GETAPPLOCK 'MyCriticalWork', 'Exclusive', 'Transaction', @MyTimeoutMilliseconds
    IF (@LockRequestResult >= 0) BEGIN
        /*
        DO YOUR CRITICAL READS AND WRITES HERE
        */
        -- Release the lock
        COMMIT TRAN
    END ELSE
        ROLLBACK TRAN
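A minimal usage sketch (the parameter value is arbitrary); every caller serializes on the same 'MyCriticalWork' application lock, so the critical section runs one session at a time:
EXEC MyCriticalWork @MyParam = 42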
Use SERIALIZABLE. By definition it provides you the illusion that your transaction is the only transaction running. Be aware that this might result in blocking and deadlocking. In fact this SQL code is a classic candidate for deadlocking: Two transactions might first read a set of rows, then both will try to modify that set of rows. Locking hints are the classic way of solving that problem. Retry also works.
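A hedged sketch of that approach, reusing the table and parameter names from the question; the UPDLOCK/HOLDLOCK hints are one common way to avoid the read-then-write deadlock described above:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN;
    IF (SELECT COUNT(*) FROM foo WITH (UPDLOCK, HOLDLOCK) WHERE bar = @a_param) < 10
        INSERT INTO foo (bar) VALUES (@a_param);
COMMIT TRAN;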
As stated in the comment: why are you trying to insert on multiple threads? You cannot write to a table faster by using multiple threads.
But you don't need a DECLARE:
insert into [Table_1] (ID, fname, lname)
select 3, 'fname', 'lname'
from [Table_1]
where ID = 3
having COUNT(*) < 10
If you need to take a lock, then do so.
The data is not 3NF.
You should start any design with a proper data model.
Why rule out a table lock?
That could very well be the best approach.
Really, what are the chances?
Even without a lock you would need two inserts at a count of 9 submitted at exactly the same time. Even then it would stop at 11. Is the 10 an absolute hard number?

How to force a running t-sql query (half done) to commit?

I have a database on SQL Server 2008 R2.
On that database a DELETE query on 400 million records has been running for 4 days, but I need to reboot the machine. How can I force it to commit whatever has been deleted so far? I don't want the rows that were already deleted to come back.
The problem is that the query is still running and will not complete before the server reboot.
Note: I have not set any isolation level or BEGIN/END TRANSACTION for the query. The query is running in SSMS.
If the machine reboots or I cancel the query, the database will go into recovery mode and keep recovering for the next 2 days; then I would need to re-run the delete and it would cost me another 4 days.
I really appreciate any suggestion, help or guidance on this.
I am a novice user of SQL Server.
There is no way to stop SQL Server from bringing the database back into a transactionally consistent state. Every single statement is implicitly a transaction itself (if not part of an outer transaction) and executes either completely or not at all. So whether you cancel the query, disconnect or reboot the server, SQL Server will use the transaction log to write the original values back to the updated data pages.
Next time you delete that many rows at once, don't do it all in one go. Divide the job into smaller chunks (I always use 5,000 as a magic number, meaning I delete 5,000 rows at a time in a loop) to minimize transaction log use and locking.
set rowcount 5000
delete from MyTable
while @@rowcount = 5000
    delete from MyTable
set rowcount 0
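On SQL Server 2005 and later, a sketch of the same idea without SET ROWCOUNT (which is deprecated for data modification statements) uses DELETE TOP; the table name and date filter below are only illustrative placeholders:
WHILE 1 = 1
BEGIN
    DELETE TOP (5000) FROM MyBigTable
    WHERE CreatedDate < '2010-01-01';   -- hypothetical purge condition

    IF @@ROWCOUNT < 5000 BREAK;         -- last (possibly partial) batch done
END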
If you are deleting that many rows, you may have a better time with TRUNCATE. TRUNCATE removes all rows from the table very efficiently. However, I'm assuming that you would like to keep some of the records. The stored procedure below copies the data you would like to keep into a temp table, truncates the original table, and then re-inserts the saved records. This can clean a huge table very quickly.
Note that TRUNCATE doesn't play well with foreign key constraints, so you may need to drop those and recreate them after the table is cleaned.
CREATE PROCEDURE [dbo].[deleteTableFast] (
    @TableName VARCHAR(100),
    @WhereClause VARCHAR(1000))
AS
BEGIN
    -- input:
    -- table name: is the table to use
    -- where clause: is the where clause of the records to KEEP

    DECLARE @tempTableName VARCHAR(100);
    SET @tempTableName = @TableName + '_temp_to_truncate';

    -- error checking
    IF EXISTS (SELECT [Table_Name] FROM Information_Schema.COLUMNS WHERE [TABLE_NAME] = (@tempTableName)) BEGIN
        PRINT 'ERROR: already temp table ... exiting'
        RETURN
    END
    IF NOT EXISTS (SELECT [Table_Name] FROM Information_Schema.COLUMNS WHERE [TABLE_NAME] = (@TableName)) BEGIN
        PRINT 'ERROR: table does not exist ... exiting'
        RETURN
    END

    -- save wanted records via a temp table to be able to truncate
    EXEC ('select * into ' + @tempTableName + ' from ' + @TableName + ' WHERE ' + @WhereClause);
    EXEC ('truncate table ' + @TableName);
    EXEC ('insert into ' + @TableName + ' select * from ' + @tempTableName);
    EXEC ('drop table ' + @tempTableName);
END
GO
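A hypothetical call, keeping only rows from 2021 onward in a table named MyBigTable (both names are placeholders; the WHERE clause is passed as a string, so quotes inside it are doubled):
EXEC [dbo].[deleteTableFast] @TableName = 'MyBigTable', @WhereClause = 'CreatedDate >= ''2021-01-01''';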
You need to understand the D (Durability) in ACID before you can understand why the database goes into recovery mode.
Generally speaking, you should avoid long-running SQL if possible. Long-running SQL means more lock time on resources, a larger transaction log, and a huge rollback time when it fails.
Consider dividing your task by some ID or time range. For example, if you want to insert a large volume of data from TableSrc into TableTarget, you can write a query like:
DECLARE @BatchCount INT = 1000;
DECLARE @Id INT = 0;
DECLARE @Max INT = ...;
WHILE @Id < @Max
BEGIN
    INSERT INTO TableTarget
    SELECT *
    FROM TableSrc
    WHERE PrimaryKey >= @Id AND PrimaryKey < @Id + @BatchCount;
    SET @Id = @Id + @BatchCount;
END
It's uglier, with more code and more room for error, but it's the only way I know to deal with huge data volumes.

db2 stored procedure - trouble batching DELETE statements

I've only been writing DB2 procedures for a few days, but I'm trying to do a "batch delete" on a given table. My expected logic is:
to open a cursor
walk through it until EOF
issue a DELETE on each iteration
For the sake of simplifying this question, assume I only want to issue a single COMMIT (of all DELETEs) after the WHILE loop is completed (i.e. once the cursor reaches EOF). So given the code sample below:
CREATE TABLE tableA (colA INTEGER, ...)

CREATE PROCEDURE "SCHEMA"."PURGE_PROC"
  (IN batchSize INTEGER)
  LANGUAGE SQL
  SPECIFIC SQL140207163731500
BEGIN
  DECLARE tempID INTEGER;
  DECLARE eof_bool INTEGER DEFAULT 0;
  DECLARE sqlString VARCHAR(1000);
  DECLARE sqlStmt STATEMENT;
  DECLARE myCurs CURSOR WITH HOLD FOR sqlStmt;
  DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET eof_bool = 1;

  SET sqlString = 'select colA from TableA';
  PREPARE sqlStmt FROM sqlString;
  OPEN myCurs;
  FETCH myCurs INTO tempID;

  WHILE (eof_bool = 0) DO
    DELETE FROM TableA WHERE colA = tempID;
    FETCH myCurs INTO tempID;
  END WHILE;

  COMMIT;
  CLOSE myCurs;
END
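For reference, assuming the procedure compiled as shown, a call from the DB2 command line processor would look like this (the batchSize argument is not yet used inside the sample):
CALL "SCHEMA"."PURGE_PROC"(1000);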
Note: In my real scenario:
I am not deleting all records from the table, just certain ones based on some additional criteria; and
I plan to perform a COMMIT every N iterations of the WHILE loop (say 500 or 1000), not the entire mess at once like above; and
I plan to DELETE against multiple tables, not just this one;
But again, to simplify, I tested the above code, and what I'm seeing is that the DELETEs seem to be getting committed 1-by-1. I base this on the following test:
I pre-load the table with (say 50k) records;
then run the purge storedProc which takes ~60 secs to run;
during this time, from another SQL client, I continuously run "SELECT COUNT(*) FROM tableA" and see the count reducing incrementally.
If all DELETEs were committed at once, I would expect to see the record count only drop (from ~50k to 0) at the end of the ~60 seconds. That is what I see with comparable SPs written for Oracle or SQL Server.
This is DB2 v9.5 on Win2003.
Any ideas what I'm missing?
You are missing the difference in concurrency control implementation between the different database engines. In an Oracle database another session would see data that have been committed prior to the beginning of its transaction, that is, it would not see any deletes until the first session commits.
In DB2, depending on the server configuration parameters (e.g. DB2_SKIPDELETED) and/or the second session isolation level (e.g. uncommitted read) it can in fact see (or not see) data affected by in-flight transactions.
If your business logic requires different transaction isolation, speak with your DBA.
It should be pointed out that you're deleting "outside of the cursor".
The right way to delete using the cursor would be to use a "positioned delete":
DELETE FROM tableA WHERE CURRENT OF myCurs;
The above deletes the row just fetched.
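A hedged fragment showing how the body of the procedure above could be adapted; the dynamic statement is declared FOR UPDATE so the cursor is deletable, and the positioned DELETE replaces the searched one:
SET sqlString = 'select colA from TableA FOR UPDATE';
PREPARE sqlStmt FROM sqlString;
OPEN myCurs;
FETCH myCurs INTO tempID;
WHILE (eof_bool = 0) DO
  DELETE FROM TableA WHERE CURRENT OF myCurs;
  FETCH myCurs INTO tempID;
END WHILE;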

Deleting from table with millions of records

I'm trying to find a way to do a conditional DELETE on an InnoDB table which contains millions of records, without locking it (thus not bringing the website down).
I've tried to find information on mysql.com, but to no avail. Any tips on how to proceed?
I don't think it is possible to delete without locking. That said, I don't think locking the record you want to delete is a problem. What would be a problem is locking other rows.
I found some information on that subject here: http://dev.mysql.com/doc/refman/5.0/en/innodb-locks-set.html
What I would suggest is to try doing a million single-row deletes. I think that if you do all of those in a single transaction, performance should not hurt too much. So you would get something like:
START TRANSACTION;
DELETE FROM tab WHERE id = 1;
..
..
DELETE FROM tab WHERE id = x;
COMMIT;
You can generate the required statements by doing something like:
SELECT CONCAT('DELETE FROM tab WHERE id = ', id)
FROM tab
WHERE <some intricate condition that selects the set you want to delete>
The advantage of this method over doing:
DELETE FROM tab
WHERE <some intricate condition that selects the set you want to delete>
is that in the first approach you only ever lock the records you're deleting, whereas in the second approach you run the risk of locking other records that happen to be in the same range as the rows you are deleting.
If it fits your application, you could limit the number of rows to delete and set up a cronjob to repeat the deletion. E.g.:
DELETE FROM tab WHERE .. LIMIT 1000
I found this to be a good compromise in a similar scenario.
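If an OS-level cronjob is inconvenient, MySQL's event scheduler can handle the repetition instead; a sketch assuming the scheduler is enabled (event_scheduler=ON) and a hypothetical created_time cutoff as the purge condition:
CREATE EVENT purge_old_rows
ON SCHEDULE EVERY 15 MINUTE
DO
  DELETE FROM tab WHERE created_time < '2018-01-01 00:00:00' LIMIT 1000;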
I use a procedure to delete:
CREATE PROCEDURE delete_last_year_data()
BEGIN
  DECLARE del_row VARCHAR(255);
  DECLARE done INT DEFAULT 0;
  DECLARE del_rows CURSOR FOR
    SELECT CONCAT('DELETE FROM table_name WHERE id = ', id)
    FROM table_name
    WHERE created_time < '2018-01-01 00:00:00';
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

  OPEN del_rows;
  REPEAT
    FETCH del_rows INTO del_row;
    IF NOT done THEN
      SET @del = del_row;
      PREPARE stmt FROM @del;
      EXECUTE stmt;
      DEALLOCATE PREPARE stmt;
    END IF;
  UNTIL done END REPEAT;
  CLOSE del_rows;
END //
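Assuming the procedure was created under DELIMITER //, as the trailing // suggests, it is then invoked as usual:
CALL delete_last_year_data();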