SSIS hangs during an Update with 3 million rows - sql

I'm implementing a new method for a warehouse. The new method performs incremental loading between the source and destination tables (Insert, Update, or Delete).
All the tables work really well, except for one table whose source has more than 3 million rows; as you will see in the image below, it starts running but never finishes.
Probably I'm not doing the update the correct way, or there is a better way to do it.
Here are some pictures of my SSIS package:
Highlighted object is where it hangs.
This is the stored procedure I call to update the table:
ALTER PROCEDURE [dbo].[UpdateDim_A]
    @ID INT,
    @FileDataID INT,
    @CategoryID SMALLINT,
    @FirstName VARCHAR(50),
    @LastName VARCHAR(50),
    @Company VARCHAR(100),
    @Email VARCHAR(250)
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRAN;
    UPDATE DIM_A
    SET [FileDataID] = @FileDataID,
        [CategoryID] = @CategoryID,
        [FirstName]  = @FirstName,
        [LastName]   = @LastName,
        [Company]    = @Company,
        [Email]      = @Email
    WHERE PartyID = @ID;
    COMMIT TRAN;
END
Note:
I already tried dropping the constraints and indexes and changing the recovery model of the database to Simple.
Any help will be appreciated.
After applying the solution provided by @Prabhat G, this is how my package looks, running in 39 seconds (avg)!!!
Inside Dim_A DataFlow

Follow these 2 performance enhancers and you'll avoid your bottleneck.
Remove the sort transformation. In your source, fetch the data already ordered with an ORDER BY in the SQL. The reason is that the Sort transformation pulls all the records into memory before sorting. You don't want that, be it an incremental or a full load.
In the last step of the update, introduce another staging table (a replica of the Dim table) instead of the OLE DB Command that updates records. Once all the matching records are inserted into this new staging table, exit the Data Flow Task and add an Execute SQL Task that simply UPDATEs the Dim table by joining on the ID/conditions.
The reason for this is that the OLE DB Command fires row by row. Always prefer updating via an Execute SQL Task, as it is a batch operation.
Edit:
As per the comments, to update only changed rows in the Execute SQL Task, add the conditions to the WHERE clause:
eg:
UPDATE x
SET x.attribute_A = y.attribute_A,
    x.attribute_B = y.attribute_B
FROM DimA x
INNER JOIN stg_DimA y
    ON x.Id = y.Id
WHERE x.Attribute_A <> y.Attribute_A
   OR x.Attribute_B <> y.Attribute_B;

So your problem is actually very simple: the method you are using executes that stored procedure once for every row returned. If you have 9961 rows to update (as in your picture), it will run that statement 9961 separate times. Chances are, if you look at the active queries running on the SQL Server, you'll see that procedure executing over and over.
What you should do to speed this up is dump that data into a staging table, then use an Execute SQL Task further along in your package to run a standard set-based SQL update. This will run much faster.
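As a sketch of that set-based update (the staging table name `stg_Dim_A` is an assumption; it would be loaded by the data flow with the same columns as the dimension):

```sql
-- Hypothetical staging table stg_Dim_A, loaded by the data flow.
UPDATE d
SET d.FileDataID = s.FileDataID,
    d.CategoryID = s.CategoryID,
    d.FirstName  = s.FirstName,
    d.LastName   = s.LastName,
    d.Company    = s.Company,
    d.Email      = s.Email
FROM DIM_A d
INNER JOIN stg_Dim_A s
    ON d.PartyID = s.PartyID;

-- Clear the staging table for the next run.
TRUNCATE TABLE stg_Dim_A;
```

One statement touches all the matched rows at once, instead of one procedure call per row.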

The problem is that you are trying to execute a stored procedure within the data flow. The correct SqlCommand is an explicit UPDATE query; you then map the columns from SSIS to the columns of the table you are updating.
UPDATE DIM_A
SET FileDataID = ?,
    CategoryID = ?,
    FirstName  = ?,
    LastName   = ?,
    Company    = ?,
    Email      = ?
WHERE PartyID = ?
Note: The @ID needs to be included as a column in your data flow.
One final thing you should consider, as Zane correctly pointed out: you should only update rows that have changed. So, in your data flow, add a Conditional Split transformation that checks whether any of the columns in the new source row differ from the existing table row. Only rows that are different should be sent to the OLE DB Command; the rest can be disregarded.

Related

How to update and insert in T-SQL in one query

I have a database that needs an update from time to time.
It may also happen that there is new data while the update runs.
In MySQL there is the option
INSERT IGNORE INTO ...
I can't find something like this in T-SQL.
It's no problem to update IDs 1-4, but then there is a new record for ID 5.
The UPDATE query doesn't work here.
And when I try to INSERT all the data again, I get a DUPLICATE KEY error.
Additional Infos:
I forgot to say that my data comes from external sources. I call an API to get the data, and from there I have to insert it into my database.
I have to admit that I don't understand MERGE. So my solution for now is to TRUNCATE the table first and then insert all the data again.
Not the best solution, but MERGE works, as far as I understand it, with two tables. I have only one table, and creating a table temporarily just to use MERGE and dropping it later seems a bit too much for my little table with 200 records.
You can use the MERGE keyword. Basically, you need to specify the column(s) on which to join the source data with the target table, and depending on whether a row matches (existing record) or not (new record), you run an UPDATE or an INSERT.
Reference: http://msdn.microsoft.com/en-us/library/bb510625.aspx
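A minimal sketch of how that looks (the table and column names are made up); note that the MERGE source does not have to be a second table, a VALUES row constructor built from your API data works too, which addresses the "I have only one table" concern:

```sql
-- Hypothetical target table: dbo.tblData(ID int PRIMARY KEY, Value varchar(50))
MERGE dbo.tblData AS target
USING (VALUES (1, 'alpha'),
              (5, 'echo')) AS source (ID, Value)
    ON target.ID = source.ID
WHEN MATCHED THEN
    UPDATE SET Value = source.Value
WHEN NOT MATCHED THEN
    INSERT (ID, Value) VALUES (source.ID, source.Value);
```

Existing IDs get updated, new IDs get inserted, and no DUPLICATE KEY error is raised.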
Is a stored procedure an option?
CREATE PROCEDURE dbo.Testing (@ID int, @Field1 varchar(20))
AS
BEGIN
    UPDATE tblTesting
    SET Field1 = @Field1
    WHERE ID = @ID

    IF @@ROWCOUNT = 0
        INSERT INTO tblTesting (ID, Field1) SELECT @ID, @Field1
END

Create a Trigger to insert a rows in another table

After creating a stored procedure on the table dbo.terms to insert data into it using this code:
CREATE PROCEDURE dbo.terms
    @Term_en NVARCHAR(50) = NULL,
    @Createdate DATETIME = NULL,
    @Writer NVARCHAR(50) = NULL,
    @Term_Subdomain NVARCHAR(50) = NULL
AS
BEGIN
    SET NOCOUNT ON
    INSERT INTO dbo.terms
    (
        Term_en,
        Createdate,
        Writer,
        Term_Subdomain
    )
    VALUES
    (
        @Term_en,        -- e.g. 'Cat'
        @Createdate,     -- e.g. '2013-12-12'
        @Writer,         -- e.g. 'Fadi'
        @Term_Subdomain  -- e.g. 'English'
    )
END
GO
I want to create a trigger on it that adds rows to another table, dbo.term_prop.
I used this code:
CREATE TRIGGER triggerdata
AFTER INSERT
ON dbo.terms
FOR EACH ROW
BEGIN
INSERT INTO dbo.term_prop VALUES
('قطة', term_ar, upper(:new.term_ar) , null , 'chat', term_fr, upper(:new.term_fr) , null ,'Animal', Def_en, upper(:new.Def_en) , null ,'حيوان', Def_ar, upper(:new.Def_ar) , null ,'Animal', Def_fr, upper(:new.Def_fr) , null);
END;
and it shows me an error.
To add more rows you can use the inserted table.
This is a special table populated with the rows inserted by your statement.
An example is:
INSERT INTO dbo.term_prop
SELECT * FROM inserted
So you mustn't use FOR EACH ROW.
The correct definition of your trigger will be
CREATE TRIGGER triggername ON table AFTER INSERT
AS BEGIN
END
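Putting the two together, a sketch of the full trigger (it assumes the column names line up between dbo.terms and dbo.term_prop, which may not match your real schema):

```sql
CREATE TRIGGER triggerdata
ON dbo.terms
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- inserted holds every row added by the triggering statement,
    -- so this handles single- and multi-row inserts alike.
    INSERT INTO dbo.term_prop (Term_en, Createdate, Writer, Term_Subdomain)
    SELECT Term_en, Createdate, Writer, Term_Subdomain
    FROM inserted;
END
```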
Joe's answer is a good one, and this is more a piece of advice.
Avoid triggers; they can cause maintenance nightmares: they are tricky to maintain and debug.
If you want to insert rows into another table after inserting into the first one, just put that code in the same stored procedure.
If you need an auto-generated identity value, you can get it using @@IDENTITY, SCOPE_IDENTITY(), or IDENT_CURRENT().
Try to keep things simple.
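A sketch of that single-procedure approach (the procedure name is made up, and it assumes dbo.terms has an identity key that dbo.term_prop references via a Term_ID column):

```sql
CREATE PROCEDURE dbo.InsertTermWithProp  -- hypothetical name
    @Term_en NVARCHAR(50),
    @Term_Subdomain NVARCHAR(50)
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.terms (Term_en, Term_Subdomain)
    VALUES (@Term_en, @Term_Subdomain);

    -- SCOPE_IDENTITY() returns the identity generated by the INSERT
    -- above, unaffected by other sessions or by nested triggers.
    INSERT INTO dbo.term_prop (Term_ID, Term_en)
    VALUES (SCOPE_IDENTITY(), @Term_en);
END
```

Both inserts live in one place, so there is no trigger to hunt down later.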
Wow, I am still surprised that triggers get a bad rap! I wrote a dozen articles on them a long time ago ...
Like anything in life, the use of triggers depends on the situation.
1 - Triggers are great for tracking DDL changes. Who changed that table?
http://craftydba.com/?p=2015
2 - Triggers can track DML changes (insert, update, delete). However, on large tables with high transaction volumes, they can slow down processing.
http://craftydba.com/?p=2060
However, with today's hardware, what is slow for me might not be slow for you.
3 - Triggers are great at tracking logins and/or server changes.
http://craftydba.com/?p=1909
So, let's get back to center and talk about your situation.
Why are you trying to make a duplicate entry on just an insert action?
Other options, right out of the SQL Server engine, to solve this problem are:
1 - Move data from table 1 to table 2 via a custom job: "INSERT INTO table1 SELECT * FROM table2 WHERE etl_flag = 0;". Of course, make it transactional and update the flag after the insert completes. I am only considering inserts, without deletes or updates.
2 - If you want to track just changes, check out Change Data Capture. It reads from the transaction log. It is not as instant as a trigger, i.e. it does not fire for every record; it just runs as a SQL Agent job in the background to load the cdc tables.
3 - Replicate the data from one server1.database1.table1 to server2.database2.table2.
ETC ...
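For reference, Change Data Capture (option 2 above) is enabled with two system procedures; a sketch, where the tracked table name is an assumption:

```sql
-- Run in the target database; requires sysadmin to enable at the DB level.
EXEC sys.sp_cdc_enable_db;

EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'terms',  -- table to track (assumption)
    @role_name     = NULL;      -- no gating role for reading the change data
```

After that, a SQL Agent capture job populates the cdc change tables from the transaction log.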
I hope my post reminds everyone that the situation determines the solution.
Triggers are good in certain situations, otherwise, they would have been removed from the product a long time ago.
And if the situation changes, then the solution might have to change ...

Deleted Rows Daily

I have a database table from which some data matching a certain condition is lost at a specific time daily, as if a statement like this were performed:
DELETE FROM table WHERE category = 1
I'd like to list all delete actions on this table through a SQL script, to learn exactly how records are deleted: by which statement, by which user, and at what time.
Does anyone have such a script? Or has anyone had a similar case and can advise?
The SQL version is SQL Server 2008 Enterprise Edition.
If this is just a short-term debugging issue, the easiest way to address this is probably to run SQL Server Profiler, with filters set to capture the data you're interested in. No code changes that way.
For best performance, try to run SQL Profiler on a machine other than the DB server, if you can.
Use an AFTER DELETE trigger on the table to log deletions to another table, along with the user and the time the delete was performed.
Using some advanced tricks you can extract the query text that deleted the rows, though I'm not sure it is possible inside a trigger.
The trigger might look like this:
CREATE TABLE YourLogTable
(
    ID int identity primary key,
    Date datetime NOT NULL DEFAULT GETDATE(),
    [User] nvarchar(128) NOT NULL DEFAULT SUSER_SNAME(),
    [SqlText] NVARCHAR(MAX)
    -- , [any other interesting columns from deleted rows]
)
GO

CREATE TRIGGER [TR.AD#YourTable]
ON YourTable
AFTER DELETE
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @sqlText NVARCHAR(MAX)

    SELECT @sqlText = txt.Text
    FROM sys.dm_exec_connections c
    CROSS APPLY sys.dm_exec_sql_text(c.most_recent_sql_handle) txt
    WHERE session_id = @@SPID

    INSERT YourLogTable ([SqlText] /*, any other interesting columns */)
    SELECT @sqlText /*, any other interesting columns */
    FROM DELETED
END

Stored procedure and trigger

I had a task: to create an update trigger that fires only on a real table-data change (not just an update with the same values). For that purpose I created a copy table and compare the updated rows with the old, copied ones. When the trigger completes, it's necessary to refresh the copy:
UPDATE CopyTable SET
id = s.id,
-- many, many fields
FROM MainTable s WHERE s.id IN (SELECT [id] FROM INSERTED)
AND CopyTable.id = s.id;
I don't like having this ugly code in the trigger anymore, so I extracted it into a stored procedure:
CREATE PROCEDURE UpdateCopy AS
BEGIN
UPDATE CopyTable SET
id = s.id,
-- many, many fields
FROM MainTable s WHERE s.id IN (SELECT [id] FROM INSERTED)
AND CopyTable.id = s.id;
END
The result is: Invalid object name 'INSERTED'. How can I work around this?
Regards,
Leave the code in the trigger. INSERTED is a pseudo-table available only inside trigger code. Do not try to pass this pseudo-table's values around; it may contain a very large number of entries.
This is T-SQL, a declarative data-access language. It is not your run-of-the-mill procedural programming language, and common wisdom like 'code reuse' does not apply in SQL; it will only cause you performance issues. Leave the code in the trigger, where it belongs. For ease of refactoring, generate triggers through a code-generation tool so you can easily regenerate them.
The problem is that INSERTED is only available during the trigger:
-- Trigger changes to build a list of ids
DECLARE @idStack VARCHAR(max)
SET @idStack = ','
SELECT @idStack = @idStack + ltrim(str(id)) + ',' FROM INSERTED
-- Trigger changes to call the stored proc
EXEC updateCopy @idStack
-- Procedure taking a comma-separated list of ids
CREATE PROCEDURE UpdateCopy (@IDList VARCHAR(max)) AS
BEGIN
    UPDATE CopyTable SET
        id = s.id,
        -- many, many fields
    FROM MainTable s
    WHERE charindex(',' + ltrim(str(s.id)) + ',', @IDList) > 0
      AND CopyTable.id = s.id;
END
Performance will not be great, but it should allow you to do what you want.
I just typed this in on the fly, but it should run OK.
The real question is "How to pass array of GUIDs in a stored procedure?" or, more wide, "How to pass an array in a stored procedure?".
Here are the answers:
http://www.sommarskog.se/arrays-in-sql-2005.html
http://www.sommarskog.se/arrays-in-sql-2008.html
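On SQL Server 2008+, a table-valued parameter is the idiomatic way to pass such an id list; a sketch (the type and procedure names are made up, and the column list mirrors the placeholder above):

```sql
-- One-time setup: a table type for the id list.
CREATE TYPE dbo.IdList AS TABLE (id int PRIMARY KEY);
GO

CREATE PROCEDURE UpdateCopyTvp  -- hypothetical variant of UpdateCopy
    @Ids dbo.IdList READONLY
AS
BEGIN
    UPDATE c SET
        c.id = s.id
        -- many, many fields
    FROM CopyTable c
    JOIN MainTable s ON c.id = s.id
    JOIN @Ids i      ON s.id = i.id;
END
GO

-- In the trigger:
-- DECLARE @Ids dbo.IdList;
-- INSERT @Ids (id) SELECT id FROM INSERTED;
-- EXEC UpdateCopyTvp @Ids;
```

This avoids both the string concatenation and the charindex scan of the comma-list approach.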

atomic compare and swap in a database

I am working on a work-queueing solution. I want to query a given row in the database where a status column has a specific value, modify that value, and return the row, and I want to do it atomically, so that no other query will see it in between:
begin transaction
select * from table where pk = x and status = y
update table set status = z where pk = x
commit transaction
--(the row would be returned)
it must be impossible for two or more concurrent queries to return the same row (only one query execution may see the row while its status = y) -- sort of like an interlocked CompareAndExchange operation.
I know the code above runs (on SQL Server), but will the swap always be atomic?
I need a solution that will work for both SQL Server and Oracle.
Is PK the primary key? Then this is a non-issue: if you already know the primary key, there is no contention. And if pk is the primary key, that begs the obvious question of how you know the pk of the item to dequeue...
The problem arises when you don't know the primary key and want to dequeue the next 'available' item (i.e. status = y) and mark it as dequeued (delete it, or set status = z).
The proper way to do this is to use a single statement. Unfortunately the syntax differs between Oracle and SQL Server. The SQL Server syntax is:
update top (1) [<table>]
set status = z
output DELETED.*
where status = y;
I'm not familiar enough with Oracle's RETURNING clause to give an example similar to SQL's OUTPUT one.
Other SQL Server solutions require lock hints on the SELECT (with UPDLOCK) to be correct.
In Oracle the preferred avenue is to use FOR UPDATE, but that does not work in SQL Server, since in SQL Server FOR UPDATE is used only in conjunction with cursors.
In any case, the behavior you have in the original post is incorrect: multiple sessions can all select the same row(s), and even all update them, returning the same dequeued item(s) to multiple readers.
As a general rule, to make an operation like this atomic you'll need to ensure that you set an exclusive (or update) lock when you perform the select so that no other transaction can read the row before your update.
The typical syntax for this is something like:
select * from table where pk = x and status = y for update
but you'd need to look it up to be sure.
I have some applications that follow a similar pattern. There is a table like yours that represents a queue of work. The table has two extra columns: thread_id and thread_date. When the app asks the queue for work, it submits a thread id. A single UPDATE statement then stamps all applicable rows with the submitted thread id and the current time. After that update, the app selects all rows with that thread id. This way you don't need to declare an explicit transaction: the "locking" occurs in the initial update.
The thread_date column is used to ensure that you do not end up with orphaned work items. What happens if items are pulled from the queue and then your app crashes? You have to be able to retry those work items. So you might grab all items off the queue that have not been marked complete but were assigned to a thread with a thread date in the distant past. It's up to you to define "distant."
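The claim-then-select idea is portable; here is a runnable sketch in Python with sqlite3 (the table and column names are made up), where the single UPDATE is the atomic step that prevents two callers from claiming the same row:

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE work (id INTEGER PRIMARY KEY, status TEXT, thread_id TEXT)")
conn.executemany("INSERT INTO work (status) VALUES (?)", [("ready",)] * 3)

def claim_work(conn, limit=2):
    """Atomically claim up to `limit` ready items for this caller."""
    thread_id = uuid.uuid4().hex
    # The single UPDATE is the atomic "lock": once a row's status flips
    # to 'claimed', no concurrent caller's UPDATE can match it again.
    conn.execute(
        "UPDATE work SET status = 'claimed', thread_id = ? "
        "WHERE id IN (SELECT id FROM work WHERE status = 'ready' LIMIT ?)",
        (thread_id, limit),
    )
    conn.commit()
    # Now safely read back exactly the rows this caller stamped.
    return conn.execute(
        "SELECT id FROM work WHERE thread_id = ?", (thread_id,)
    ).fetchall()

first = claim_work(conn)   # claims two items
second = claim_work(conn)  # claims the remaining one
print(len(first), len(second))  # → 2 1
```

No explicit transaction around a separate SELECT and UPDATE is needed, because the row is stamped and withdrawn from the 'ready' pool in one statement.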
Try this. The validation is in the UPDATE statement.
Code
IF EXISTS (SELECT * FROM sys.tables WHERE name = 't1')
    DROP TABLE dbo.t1
GO

CREATE TABLE dbo.t1 (
    ColID int IDENTITY,
    [Status] varchar(20)
)
GO

DECLARE @id int
DECLARE @initialValue varchar(20)
DECLARE @newValue varchar(20)

SET @initialValue = 'Initial Value'
INSERT INTO dbo.t1 (Status) VALUES (@initialValue)
SELECT @id = SCOPE_IDENTITY()

SET @newValue = 'Updated Value'

BEGIN TRAN
UPDATE dbo.t1
SET @initialValue = [Status],
    [Status] = @newValue
WHERE ColID = @id
  AND [Status] = @initialValue
SELECT ColID, [Status] FROM dbo.t1
COMMIT TRAN

SELECT @initialValue AS '@initialValue', @newValue AS '@newValue'
Results
ColID Status
----- -------------
1 Updated Value
@initialValue @newValue
------------- -------------
Initial Value Updated Value