The following query is extremely slow. It seems the subselect is executed for each row in the table?!
delete from HISTORY
where ID in (
select ID from (
select ID, ROW_NUMBER() over(partition by SOURCE order by ID desc) as NUM from HISTORY
) where NUM > 100
);
This is a cleanup query. It should delete everything but the 100 most recent records per SOURCE.
The time required seems to depend only on the number of records in the table and not on how many records are to be deleted. Even with only 10,000 records it takes several minutes. However, if I only execute the sub-select, it is fast.
Of course there is a PK on ID and a FK and index on SOURCE (both are Integer columns).
Firebird 3 added a DELETE option to the MERGE statement. It was first mentioned in the Release Notes and is now properly documented in the Firebird 3 SQL Reference.
Modelling on the examples there, the cleanup query would look like this:
merge into HISTORY HDel
using ( select ID, SOURCE, ROW_NUMBER() over
(partition by SOURCE order by ID desc) as NUM
from HISTORY ) HVal
on (HVal.NUM > 100) and (HVal.ID = HDel.ID) and (HVal.Source = HDel.Source)
WHEN MATCHED THEN DELETE
In your specific database the (HVal.Source = HDel.Source) condition seems redundant, but I still decided to add it to make the query as generic as possible for future readers. Better safe than sorry :-)
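For readers on other engines: the same keep-N-newest-per-group delete can be sketched in SQLite, which (like Firebird 3) supports window functions. This is an illustration, not Firebird code; the table and column names mirror the question, and the demo keeps only the 2 newest rows per SOURCE so the data stays small:

```python
import sqlite3

# In-memory demo table shaped like the question's HISTORY table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE HISTORY (ID INTEGER PRIMARY KEY, SOURCE INTEGER)")
# Two sources with five rows each.
con.executemany("INSERT INTO HISTORY (ID, SOURCE) VALUES (?, ?)",
                [(i, i % 2) for i in range(1, 11)])

def cleanup(keep_per_source):
    # Delete everything but the `keep_per_source` highest IDs per SOURCE,
    # using ROW_NUMBER() exactly as in the question's query.
    con.execute("""
        DELETE FROM HISTORY
        WHERE ID IN (
            SELECT ID FROM (
                SELECT ID,
                       ROW_NUMBER() OVER (PARTITION BY SOURCE
                                          ORDER BY ID DESC) AS NUM
                FROM HISTORY
            )
            WHERE NUM > ?
        )""", (keep_per_source,))

cleanup(2)
remaining = con.execute(
    "SELECT SOURCE, COUNT(*) FROM HISTORY GROUP BY SOURCE").fetchall()
```

Whether this form is fast depends on the engine's optimizer; the point of the demo is only the shape of the statement.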
Firebird 2.x did not provide that feature, so with FB3's MERGE ... DELETE and window functions unavailable, one can fall back to explicit imperative programming and write good old loops. That means writing and executing a small PSQL program (either a persistent named stored procedure or an ad hoc EXECUTE BLOCK statement) with an explicit loop over the SOURCE values.
Something like this (I did not syntax-check it, just scratching from memory):
execute block as
  declare variable SRC_VAL integer;
  declare variable ID_VAL integer;
begin
  for select distinct SOURCE from HISTORY into :SRC_VAL do
  begin
    ID_VAL = NULL;
    select first 1 skip 100 ID from HISTORY
      where SOURCE = :SRC_VAL
      order by ID desc
      into :ID_VAL;
    if (ID_VAL is not null) then
      delete from HISTORY
        where SOURCE = :SRC_VAL
          and ID <= :ID_VAL;
  end
end
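For illustration, here is the same per-SOURCE loop sketched in Python against SQLite (names mirror the question; KEEP = 2 only to keep the demo small). LIMIT 1 OFFSET n plays the role of FIRST 1 SKIP n: it finds the newest ID that falls outside the keep window, and everything at or below it is deleted.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE HISTORY (ID INTEGER PRIMARY KEY, SOURCE INTEGER)")
con.executemany("INSERT INTO HISTORY VALUES (?, ?)",
                [(i, i % 2) for i in range(1, 11)])

KEEP = 2  # the question would use 100
for (src,) in con.execute("SELECT DISTINCT SOURCE FROM HISTORY").fetchall():
    # Find the highest ID that is NOT among the KEEP newest for this source.
    row = con.execute("""SELECT ID FROM HISTORY
                         WHERE SOURCE = ?
                         ORDER BY ID DESC
                         LIMIT 1 OFFSET ?""", (src, KEEP)).fetchone()
    if row is not None:
        con.execute("DELETE FROM HISTORY WHERE SOURCE = ? AND ID <= ?",
                    (src, row[0]))

left = [r[0] for r in con.execute("SELECT ID FROM HISTORY ORDER BY ID")]
```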
Related
I am working on a project where I need to delete a random row from an Oracle database. Which query should I use for that?
Well obviously it's a DELETE statement. It's the random part which is tricky.
Oracle has a PL/SQL package to handle randomness, DBMS_RANDOM. If the table is small or performance is not critical this will work:
delete from your_table
where id = ( select id from
( select id from your_table
order by dbms_random.value)
where rownum = 1)
/
The innermost query sorts the table in a random order. The sub-query selects the top row from that set, and that's the row which gets deleted.
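The equivalent idea can be sketched in SQLite, where ORDER BY RANDOM() LIMIT 1 takes the place of ordering by dbms_random.value and keeping ROWNUM = 1 (the table name is illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE your_table (id INTEGER PRIMARY KEY)")
con.executemany("INSERT INTO your_table VALUES (?)",
                [(i,) for i in range(1, 101)])

# Pick one row at random and delete it; the subquery shuffles the table
# and keeps the top row, exactly like the Oracle version above.
con.execute("""DELETE FROM your_table
               WHERE id = (SELECT id FROM your_table
                           ORDER BY RANDOM() LIMIT 1)""")
count = con.execute("SELECT COUNT(*) FROM your_table").fetchone()[0]
```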
Alternatively, if you know how many records you have (and the IDs are contiguous)...
delete from your_table
where id = ( select round(dbms_random.value(1,10000))
from dual )
I have two tables:
1. main_table
2. log_table
On main_table a trigger is defined that copies the affected rows to log_table on delete and update. The columns in log_table are the same, except for two extra columns: pid (primary key, int) and updateat (timestamp).
Every month a cron job deletes all the records from log_table that are more than 10 days old. I need to change it to delete all records except the latest 10 for each id.
I did it via LINQ: I select the distinct ids, then loop over all of them and delete all data for each id except the top 10.
This works, but it is painfully slow and unrealistic in a production environment. I also tried to move this logic to a stored procedure with a cursor, but it is still slow.
I am not sure, but I think there must be other methods for achieving this where I don't have to loop?
The server is MSSQL 2012, if that matters.
Is this what you want?
with todelete as (
select l.*,
row_number() over (partition by id order by updateat desc) as seqnum
from log_table l
)
delete todelete
where seqnum > 10;
I have an existing app I can’t modify. It needs to execute a SQL GROUP BY, but cannot. However it can and does read a GroupNumber field from the same table.
What I’m doing now is executing the grouping SQL statement, processing it in code and writing back the GroupNumber to the table so that App can do its thing. What I’d like to do is execute a single SQL statement to do both the grouping and the writeback in a single step. I can’t figure out how to do this, if indeed it’s possible. Simple example:
SELECT FirstName, LastName, Age
FROM Persons
WHERE ....
GROUP BY Age
ORDER BY Age
I execute this, then do
for ( i = 1; i <= result_set.n; i++ )
    Sql = "UPDATE Persons "
        + "SET GroupNumber = " + fixed( i )
        + " WHERE Age = " + fixed( result_set.Age[i] )
I need to do this every time a record gets added to the table (so yes, if someone younger than me gets added, my group number changes - don’t ask).
Clearly you want a trigger. However trigger definitions vary from database server to database server. I'll hazard a guess and say you are using some version of Microsoft SQL Server: the create trigger syntax and a couple of examples can be found at http://msdn.microsoft.com/en-us/library/ms189799.aspx. There might be some small complication with the trigger modifying the same table it is sourcing data from, but I believe you can generally do that in most SQL server databases (SQLite may be one of the few where that is difficult).
Try that and see if that helps.
I'm not really sure what you are after, here is my best guess:
;WITH AllRows AS (--get one row per age, and number them
SELECT
Age, ROW_NUMBER() OVER (ORDER BY Age) AS RowNumber
FROM Persons
WHERE ...
GROUP BY Age
)
UPDATE p --update all the people, getting their GroupNumber based on their Age's row number
SET GroupNumber=a.RowNumber
FROM Persons p
INNER JOIN AllRows a ON p.Age=a.Age
WHERE GroupNumber IS NULL OR GroupNumber!=a.RowNumber
I use SQL Server, but this is fairly standards based code.
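If UPDATE ... FROM is not available on your engine, a correlated subquery achieves the same numbering portably. Here is a small SQLite sketch (names follow the example above) that assigns each person the rank of their distinct Age in one statement:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE Persons
               (Name TEXT, Age INTEGER, GroupNumber INTEGER)""")
con.executemany("INSERT INTO Persons (Name, Age) VALUES (?, ?)",
                [("a", 30), ("b", 25), ("c", 30), ("d", 40)])

# GroupNumber = how many distinct ages are <= this person's age,
# i.e. the 1-based rank of the age group, same result as numbering
# the grouped ages and joining back.
con.execute("""
    UPDATE Persons
    SET GroupNumber = (SELECT COUNT(DISTINCT p2.Age)
                       FROM Persons p2
                       WHERE p2.Age <= Persons.Age)""")
groups = dict(con.execute("SELECT Name, GroupNumber FROM Persons"))
```

The correlated form re-scans per row, so on large tables the numbered-CTE join above is the better choice; this is just the portable fallback.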
I am trying to update a table in my database with a row from another table. I have two parameters: an ID and a row number (since you can select which row you want from the GUI).
This part of the code works fine; it returns one column of a single row:
(SELECT txtPageContent
FROM (select *, Row_Number() OVER (ORDER BY ArchiveDate asc) as rowid
from ARC_Content Where ContentID = #ContentID) as test
Where rowid = #rowID)
It's just that when I try to add the UPDATE/SET it won't work. I am probably missing something:
UPDATE TBL_Content
Set TBL_Content.txtPageContent = (select txtPageContent
FROM (select *, Row_Number() OVER (ORDER BY ArchiveDate asc) as rowid
from ARC_Content Where ContentID = #ContentID) as test
Where rowid = #rowID)
Thanks for the help! (I have tried TOP 1 to no avail.)
I see a few issues with your update. First, I don't see any joining or selection criteria for the table that you're updating. That means that every row in the table will be updated with this new value. Is that really what you want?
Second, the row number between what is on the GUI and what you get back in the database may not match. Even if you reproduce the query used to create your list in the GUI (which is dangerous anyway, since it involves keeping the update and the select code always in sync), it's possible that someone could insert or delete or update a row between the time that you fill your list box and send that row number to the server for the update. It's MUCH better to use PKs (probably IDs in your case) to determine which row to use for updating.
That said, I think that the following will work for you (untested):
;WITH cte AS (
SELECT
txtPageContent,
ROW_NUMBER() OVER (ORDER BY ArchiveDate ASC) AS rowid
FROM
ARC_Content
WHERE
ContentID = #ContentID)
UPDATE
TC
SET
txtPageContent = cte.txtPageContent
FROM
TBL_Content TC
INNER JOIN cte ON
rowid = #rowID
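As a portable sketch of the same pattern, here is a SQLite version in Python; LIMIT 1 OFFSET rowNum-1 stands in for ROW_NUMBER() = rowNum, and, per the advice above, the UPDATE is restricted to the matching ContentID (all names follow the question):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE TBL_Content
               (ContentID INTEGER PRIMARY KEY, txtPageContent TEXT)""")
con.execute("""CREATE TABLE ARC_Content
               (ContentID INTEGER, txtPageContent TEXT, ArchiveDate TEXT)""")
con.execute("INSERT INTO TBL_Content VALUES (1, 'current')")
con.executemany("INSERT INTO ARC_Content VALUES (?, ?, ?)",
                [(1, 'v1', '2020-01-01'), (1, 'v2', '2020-02-01')])

def restore(content_id, row_num):
    # LIMIT 1 OFFSET row_num-1 selects the row whose position in
    # ArchiveDate order is row_num; the outer WHERE keeps the update
    # from touching every row in TBL_Content.
    con.execute("""
        UPDATE TBL_Content
        SET txtPageContent = (SELECT txtPageContent FROM ARC_Content
                              WHERE ContentID = ?
                              ORDER BY ArchiveDate ASC
                              LIMIT 1 OFFSET ?)
        WHERE ContentID = ?""", (content_id, row_num - 1, content_id))

restore(1, 2)  # restore the second-oldest archived version
value = con.execute("SELECT txtPageContent FROM TBL_Content "
                    "WHERE ContentID = 1").fetchone()[0]
```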
I have the following UPDATE scenario:
UPDATE destTable d
SET d.test_count = ( SELECT COUNT( employee_id )
FROM sourceTable s
WHERE d.matchCode1 = s.matchCode1 AND
d.matchCode2 = s.matchCode2 AND
d.matchCode3 = s.matchCode3
GROUP BY matchCode1, matchCode2, matchCode3, employee_id )
I have to execute this in a loop changing out the match codes for each iteration.
Between two large tables (~500k records each), this query takes an unacceptably long time to execute. If I just had to execute it once, I wouldn't care too much. Given it is being executed about 20 times, it takes way too long for my needs.
It requires two full table scans (one for the destTable and another for the subquery).
Questions:
What techniques do you recommend to speed this up?
Does the SQL-optimizer run the subquery for each row I'm updating in the destTable to satisfy the where-clause of the subquery or does it have some super intelligence to do this all at once?
In Oracle 9i and higher:
MERGE
INTO destTable d
USING (
      SELECT matchCode1, matchCode2, matchCode3,
             COUNT(employee_id) AS cnt
      FROM sourceTable
      GROUP BY
             matchCode1, matchCode2, matchCode3
      ) s
ON   (d.matchCode1 = s.matchCode1 AND
      d.matchCode2 = s.matchCode2 AND
      d.matchCode3 = s.matchCode3)
WHEN MATCHED THEN
UPDATE
SET  d.test_count = s.cnt
To speed up your query, make sure you have a composite index on (matchCode1, matchCode2, matchCode3) in destTable, and a composite index on (matchCode1, matchCode2, matchCode3, employee_id) in sourceTable.
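When MERGE is not available, the same idea (one grouped pass instead of a correlated subquery per destination row) can also be sketched client-side. This Python/SQLite example uses the question's table and column names and is an illustration, not Oracle code:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE destTable
               (matchCode1 INT, matchCode2 INT, matchCode3 INT,
                test_count INT)""")
con.execute("""CREATE TABLE sourceTable
               (matchCode1 INT, matchCode2 INT, matchCode3 INT,
                employee_id INT)""")
con.executemany("INSERT INTO destTable VALUES (?, ?, ?, NULL)",
                [(1, 1, 1), (2, 2, 2)])
con.executemany("INSERT INTO sourceTable VALUES (?, ?, ?, ?)",
                [(1, 1, 1, 10), (1, 1, 1, 11), (2, 2, 2, 12)])

# One grouped scan of sourceTable computes every count at once...
counts = con.execute("""SELECT matchCode1, matchCode2, matchCode3,
                               COUNT(employee_id)
                        FROM sourceTable
                        GROUP BY matchCode1, matchCode2, matchCode3""").fetchall()
# ...then one batch of keyed updates applies them, instead of
# re-running the subquery for every destTable row.
con.executemany("""UPDATE destTable SET test_count = ?
                   WHERE matchCode1 = ? AND matchCode2 = ?
                     AND matchCode3 = ?""",
                [(cnt, m1, m2, m3) for (m1, m2, m3, cnt) in counts])
result = con.execute(
    "SELECT test_count FROM destTable ORDER BY matchCode1").fetchall()
```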
I have to execute this in a loop
The first thing to do is build the loop into your subquery or WHERE clause. You're updating data and then immediately replacing some of the data you just updated. You should be able to either filter your update so it only changes records appropriate to the current iteration, or make your query complex enough to update everything in one statement; probably both.
Have you considered an UPDATE FROM query?