I am facing an issue with a never-ending query in one of my stored procedures (SQL Server 2005).
The stored procedure runs many UPDATE statements and a few INSERT statements inside a single transaction.
While debugging the procedure, I found one specific UPDATE query that runs indefinitely, as shown below.
BEGIN TRAN
...
...-- some INSERT & UPDATE statements
UPDATE PIH
SET PIH.RemittanceHeaderID = RD.RemittanceHeaderID
FROM TempPayroll TP
INNER JOIN RemittanceDetails RD ON TP.UniqueID = RD.PLInvoiceDetailID
INNER JOIN PLInvoiceHeader PIH ON RD.PLInvoiceUniqueID = PIH.InvoiceUniqueID AND PIH.CompanyCode = TP.CompanyCode
WHERE TP.PLSelfBill = 1
AND TP.UsePLPaymentTerms = 1
AND RD.PLInvoiceDetailID IS NOT NULL
AND TP.RecordType LIKE 'INVOICE%'
AND RD.RecordType LIKE 'INVOICE%'
... --some UPDATE statements
COMMIT TRAN
However, when I remove the CompanyCode condition, the stored procedure runs fast with no issues. Also, if I execute just this statement on its own in a separate transaction, it runs fine.
There are around 240 matching records for this query. Each table contains only 30k-50k records.
How do I debug this query, and how can I fix it?
Why does just that one JOIN condition (CompanyCode) cause the issue?
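One thing I can check while it is stuck (a rough sketch; session id 53 is just a placeholder for the spid running the procedure) is whether the UPDATE is blocked by something or simply executing a bad plan:
SELECT session_id, status, command, blocking_session_id,
       wait_type, wait_time, last_wait_type
FROM sys.dm_exec_requests
WHERE session_id = 53;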
Thanks in advance.
I need to delete a bunch of records (literally millions), but I don't want to do it in one individual statement because of performance issues. So I created a view:
CREATE VIEW V1
AS
SELECT FIRST 500000 *
FROM TABLE
WHERE W_ID = 14
After that I do a bunch of deletes, for example:
DELETE FROM V1 WHERE TS < 2021-01-01
What I want is to put this logic into a WHILE loop inside a stored procedure. I tried a SELECT COUNT query like this:
SELECT COUNT(*)
FROM TABLE
WHERE W_ID = 14 AND TS < 2021-01-01;
Can I use this count as a condition in the same procedure, and how can I manage that?
This is what I have tried, and I get an error:
ERROR: Dynamic SQL Error; SQL error code = -104; Token unknown; WHILE
Code:
CREATE PROCEDURE DeleteBatch
AS
DECLARE VARIABLE CNT INT;
BEGIN
SELECT COUNT(*) FROM TABLE WHERE W_ID = 14 AND TS < 2021-01-01 INTO :cnt;
WHILE cnt > 0 do
BEGIN
IF (cnt > 0) THEN
DELETE FROM V1 WHERE TS < 2021-01-01;
END
ELSE break;
END
I just can't wrap my head around this.
To clarify: in my previous question I wanted to know how to manage garbage collection after deleting many records, and I did what was suggested (SELECT * FROM TABLE; or gfix -sweep) and that worked very well. As mentioned in the comments, the correct statement is SELECT COUNT(*) FROM TABLE;
After that I was given another, even bigger database of over 50 million rows, and it was very slow to work with. I also managed to bring down the server it was on with a DELETE statement meant to clean up the database.
That's why I wanted to try deleting in batches. The slowdown there turned out to be purely a hardware problem: the HDD had failed, and we replaced it. After that there was no problem executing statements or doing a backup and restore to reclaim disk space.
Provided the data you need to delete never has to be rolled back once the stored procedure is kicked off, there is another way to handle massive DELETEs in a stored procedure.
The example stored procedure below deletes rows 500,000 at a time and loops until there are no more rows to delete. The AUTONOMOUS TRANSACTION lets you put each DELETE statement in its own transaction, which commits immediately after the statement completes. This effectively issues an implicit commit inside a stored procedure, which you normally can't do.
CREATE OR ALTER PROCEDURE DELETE_TABLEXYZ_ROWS
AS
DECLARE VARIABLE RC INTEGER;
BEGIN
  RC = 9999;
  WHILE (RC > 0) DO
  BEGIN
    IN AUTONOMOUS TRANSACTION DO
    BEGIN
      -- each batch runs in its own transaction and commits as soon as this block ends
      DELETE FROM TABLEXYZ ROWS 500000;
      RC = ROW_COUNT;
    END
  END

  -- final full scan forces garbage collection of the deleted record versions
  SELECT COUNT(*)
  FROM TABLEXYZ
  INTO :RC;
END
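The procedure can then simply be run on demand (or from a scheduled job), for example:
EXECUTE PROCEDURE DELETE_TABLEXYZ_ROWS;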
"because of performance issues"
What are those exactly? I do not think you are actually improving performance just by running deletes in a loop within the same transaction, or even in different transactions within the same timespan. You seem to be solving the wrong problem. The issue is not how you create "garbage", but how and when Firebird "collects" it.
For example, SELECT COUNT(*) in InterBase/Firebird engines means a natural scan over the whole table, and garbage collection is often triggered by it. That scan can itself take a long time if a lot of garbage was created (and a massive delete surely creates it, no matter whether it is done by one million-row statement or a million one-row statements).
How to delete large data from Firebird SQL database
If you really want to slow down deletion, you have to spread that activity around the clock and make your client application call a deleting stored procedure, for example, once every 15 minutes. You would have to add a column to the table flagging rows as marked for deletion, and then do the job like this:
CREATE PROCEDURE DeleteBatch(CNT INT)
AS
DECLARE ROW_ID INTEGER;
BEGIN
FOR SELECT ID FROM TABLENAME WHERE MARKED_TO_DEL > 0 INTO :row_id
DO BEGIN
CNT = CNT - 1;
DELETE FROM TABLENAME WHERE ID = :ROW_ID;
IF (CNT <= 0) THEN LEAVE;
END
SELECT COUNT(1) FROM TABLENAME INTO :ROW_id; /* force GC now */
END
...and every 15 minutes you do EXECUTE PROCEDURE DeleteBatch(1000).
Overall this probably would only be slower, because of single-row "precision targeting" - but at least it would spread the delays.
Use DELETE...ROWS.
https://firebirdsql.org/file/documentation/html/en/refdocs/fblangref25/firebird-25-language-reference.html#fblangref25-dml-delete-orderby
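Applied to the table and filter from the question, a batched delete might look like this (a sketch; note the date is quoted so it is compared as a date instead of being evaluated as arithmetic):
DELETE FROM TABLE
WHERE W_ID = 14 AND TS < '2021-01-01'
ROWS 500000;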
But as I already said in the answer to the previous question, it is better to spend time investigating the source of the slowdown than to work around it by deleting data.
I executed a few queries like the one below, which INSERTs one row into the table and then UPDATEs it, and I didn't get any error.
But then I found out that it locked the table, so that no one else could query it.
Do you know why the query below would lock the table?
Can I not SET vchB = vchNumber, vchC = vchNumber right after the INSERT INTO?
I read about when and what locks are held and released under the READ COMMITTED isolation level, and it says all locks are released only after the transaction is committed or rolled back.
The INSERT and UPDATE statements below committed successfully (they only insert and update one row in the table) and took only a second to run, yet when other people query the table, it just hangs.
Thank you.
BEGIN TRANSACTION T1
INSERT INTO myTbl(vchSN,vchNumber,vchName)
SELECT 'AB12','1234','My Name'
UPDATE myTbl
SET vchB = vchNumber, vchC = vchNumber, vchtab = 'N' where vchSN = 'AB12'
COMMIT TRANSACTION T1
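For reference, a rough way to check whether a transaction is still open and holding locks on myTbl while the other sessions hang (a sketch, run in the same database):
DBCC OPENTRAN;   -- oldest active transaction in the current database, if any
EXEC sp_lock;    -- current locks; look for entries whose ObjId matches myTbl
SELECT OBJECT_ID('myTbl') AS myTbl_object_id;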
I'm implementing a new method for a warehouse. The new method consists of performing incremental loading between the source and destination tables (Insert, Update, or Delete).
All the tables are working really well, except for one table whose source has more than 3 million rows; as you can see in the image below, it just starts running and never finishes.
Probably I'm not doing the update the correct way, or there is another way to do it.
Here are some pictures of my SSIS package:
The highlighted object is where it hangs.
This is the stored procedure I call to update the table:
ALTER PROCEDURE [dbo].[UpdateDim_A]
    @ID INT
    ,@FileDataID INT
    ,@CategoryID SMALLINT
    ,@FirstName VARCHAR(50)
    ,@LastName VARCHAR(50)
    ,@Company VARCHAR(100)
    ,@Email VARCHAR(250)
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRAN
    UPDATE DIM_A
    SET
        [FileDataID] = @FileDataID,
        [CategoryID] = @CategoryID,
        [FirstName] = @FirstName,
        [LastName] = @LastName,
        [Company] = @Company,
        [Email] = @Email
    WHERE PartyID = @ID
    COMMIT TRAN;
END
Note:
I already tried dropping the constraints and indexes and changing the recovery model of the database to simple.
Any help will be appreciated.
After applying the solution provided by @Prabhat G, this is how my package looks, running in 39 seconds (avg)!!!
Inside Dim_A DataFlow
Follow these 2 performance enhancers and you'll avoid your bottleneck.
Remove the Sort transformation. In your source, use ORDER BY in the SQL that fetches the data. The reason is that the Sort transformation pulls all the records into memory before sorting, and you don't want that, be it an incremental or a full load.
In the last update step, introduce another staging table (a replica of the Dim table) instead of the 'update records' OLE DB Command. Once all the matching records are inserted into this new staging table, exit the Data Flow Task and create an Execute SQL Task that simply UPDATEs the Dim table based on the joining ID/conditions.
The reason for this is that the OLE DB Command executes row by row. Always prefer updating via an Execute SQL Task, as it is a set-based batch operation.
Edit:
As per the comments, to update only changed rows in the Execute SQL Task, add the conditions in the WHERE clause, e.g.:
UPDATE x
SET
x.attribute_A = y.attribute_A
,x.attribute_B = y.attribute_B
FROM
DimA x
inner join stg_DimA y
ON x.Id = y.Id
WHERE
(x.Attribute_A <> y.Attribute_A
OR x.Attribute_B <> y.Attribute_B)
So your problem is actually very simple: the method you are using executes that stored procedure once for every row returned. If you have 9,961 rows to update (as in your picture), it will run that statement 9,961 separate times. Chances are, if you look at the active queries running on the SQL Server, you'll see that procedure executing over and over.
What you should do to speed this up is dump that data into a staging table, then use an Execute SQL Task later in your package to run a standard set-based SQL UPDATE, as sketched below. This will run much faster.
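A rough sketch of that Execute SQL Task (the staging table name Staging_DIM_A and the join key are placeholders; the columns come from the stored procedure in the question):
UPDATE d
SET d.FileDataID = s.FileDataID,
    d.CategoryID = s.CategoryID,
    d.FirstName  = s.FirstName,
    d.LastName   = s.LastName,
    d.Company    = s.Company,
    d.Email      = s.Email
FROM DIM_A AS d
INNER JOIN Staging_DIM_A AS s
    ON s.ID = d.PartyID;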
The problem is that you are trying to execute a stored procedure within the data flow. The correct SqlCommand is an explicit UPDATE query; you then map the columns from SSIS to the columns of the table you are updating.
UPDATE DIM_A
SET FileDataID = ?
,CategoryID = ?
,FirstName = ?
,LastName = ?
,Company = ?
,Email = ?
WHERE PartyID = ?
Note: the @ID needs to be included as a column in your data flow.
One final thing you should consider, as Zane correctly pointed out: you should only update rows that have changed. So, in your data flow, add a Conditional Split transformation that checks whether any of the columns in the new source row differ from the existing table row. Only rows that are different should be sent to the OLE DB Command; the rest can be disregarded.
I have 3 tables: ticket_addresses, tickets, and ['2014nosec add']. I want to update the ticket_addresses table, but unfortunately I ran the query below and it updated the ta_address_2 column to '.' across the entire table.
My doubt is that my query is wrong because the FROM table ['2014nosec add'] is different from the table being updated and does not even have a ta_address_2 column, so it should have given me an error, since the table being updated is not in the FROM list.
Is there any way to roll back the update query, as I did not run it inside a transaction? I am using SQL Server Management Studio.
update
ticket_addresses set ta_address_2 = '.'
FROM ['2014nosec add'] inner join tickets ------> I think this is wrong here.. it should be ticket_addresses table(right)
on ['2014nosec add'].[PCN] = tickets.t_reference
where ta_address_2 = ''
and ta_address_1 <> ' ' and t_camera_ticket = '-1'
and
convert (datetime,t_date_time_issued,101) between convert(datetime,'2014/04/15',101) and convert (datetime,'2014/06/06',101)
By default SQL Server uses "autocommit" mode for transaction management, so you cannot roll back this query because it has already been committed.
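For future runs, a simple safeguard is to start an explicit transaction and check the number of affected rows before making the change permanent, along these lines (a sketch):
BEGIN TRANSACTION;
UPDATE ticket_addresses
SET ta_address_2 = '.'
WHERE ta_address_2 = '';   -- whatever the intended filter really is
SELECT @@ROWCOUNT AS rows_updated;   -- sanity-check the count before deciding
-- ROLLBACK TRANSACTION;   -- if the count looks wrong
-- COMMIT TRANSACTION;     -- if it looks right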
I am wondering how to rewrite the following SQL Server 2005/2008 script for SQL Server 2000, which did not have OUTPUT yet.
Basically, I would like to update rows and return the updated rows without creating deadlocks.
Thanks in advance!
UPDATE TABLE
SET Locked = 1
OUTPUT INSERTED.*
WHERE Locked = 0
You can't do this cleanly in SQL Server 2000.
What you can do is use a transaction and some lock hints to prevent a race condition. Your main problem is two processes accessing the same row(s), not a deadlock. See SQL Server Process Queue Race Condition for more.
BEGIN TRANSACTION

-- READPAST skips rows already locked by other sessions;
-- UPDLOCK keeps update locks on the selected rows until the transaction ends,
-- so another process cannot grab them between the SELECT and the UPDATE
SELECT * FROM TABLE WITH (ROWLOCK, READPAST, UPDLOCK) WHERE Locked = 0

UPDATE TABLE
SET Locked = 1
WHERE Locked = 0

COMMIT TRANSACTION
I haven't tried this, but you could also try a SELECT from INSERTED in an UPDATE trigger.
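A minimal, untested sketch of that idea, using the table and column names from the question (returning a result set from a trigger works on SQL Server 2000, though it is generally discouraged):
CREATE TRIGGER trg_Table_ReturnLocked ON [TABLE]
FOR UPDATE
AS
-- echo back the rows this UPDATE just touched
SELECT * FROM inserted WHERE Locked = 1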