SQL Merge - How can I optimize this?

Table A (table to merge into) has 90,000 rows
Table B (source table) has 3,677 rows
I would expect this to merge very quickly, but it's taking 30 minutes (and counting).
How can it be optimized to run faster?
ALTER PROCEDURE [dbo].[MergeAddressFromGraph]
-- no params
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
-- first add fids to the MergeFundraiserNameAddress table instead of the temp table?
SELECT fundraiserid, ein
INTO #fids
FROM bb02_fundraiser
BEGIN TRAN;
MERGE BB02_FundraiserNameAddress AS T
USING
(
select f.fundraiserid,
n.addresslines,
n.town,
n.county,
n.postcode,
n.country,
n.fulladdress,
n.ein
from MergeFundraiserNameAddress n
join bb02_fundraiser f
on f.ein = n.ein and f.isdefault = 1
group by n.ein,
f.fundraiserid,
n.addresslines,
n.town,
n.county,
n.postcode,
n.country,
n.fulladdress
) AS S
ON (T.fundraiserid in( (select fundraiserid from #fids where ein = S.ein)) )
WHEN MATCHED
THEN UPDATE
SET
-- ADDRESS
T.addresslines = S.addresslines
,T.town = S.town
,T.county = S.county
,T.postcode = S.postcode
,T.country = S.country
,T.fulladdress = S.fulladdress
;
DELETE FROM MergeFundraiserNameAddress
COMMIT TRAN;
drop table #fids
END
UPDATE
I was able to improve the stored procedure which now runs in just a few seconds. I joined on the temp table instead of the bb02_fundraiser table and removed the subquery in the ON clause.
I realize now that the Merge is not necessary and I could have used an Update instead, but I'm ok with this right now because an INSERT may be needed soon in a refactor.
UPDATED STORED PROCEDURE BELOW
IF OBJECT_ID('tempdb..#fids') IS NOT NULL
DROP TABLE #fids
SELECT fundraiserid, ein
INTO #fids
FROM bb02_fundraiser
where isdefault = 1
BEGIN TRAN;
MERGE BB02_FundraiserNameAddress AS T
USING
(
select f.fundraiserid,
n.addresslines,
n.town,
n.county,
n.postcode,
n.country,
n.fulladdress,
n.ein
from MergeFundraiserNameAddress n
join #fids f
on f.ein = n.ein
group by n.ein,
f.fundraiserid,
n.addresslines,
n.town,
n.county,
n.postcode,
n.country,
n.fulladdress
) AS S
ON (T.fundraiserid = S.fundraiserid)
WHEN MATCHED
THEN UPDATE
SET
-- ADDRESS
T.addresslines = S.addresslines
,T.town = S.town
,T.county = S.county
,T.postcode = S.postcode
,T.country = S.country
,T.fulladdress = S.fulladdress
;
DELETE FROM MergeFundraiserNameAddress
COMMIT TRAN;
IF OBJECT_ID('tempdb..#fids') IS NOT NULL
DROP TABLE #fids

See whether this statement alone does the job for you.
UPDATE T
SET T.addresslines = n.addresslines
,T.town = n.town
,T.county = n.county
,T.postcode = n.postcode
,T.country = n.country
,T.fulladdress = n.fulladdress
from MergeFundraiserNameAddress n join bb02_fundraiser f
on f.ein = n.ein and f.isdefault = 1
INNER JOIN BB02_FundraiserNameAddress T
ON T.fundraiserid = f.fundraiserid AND T.ein = f.ein
-- No GROUP BY here: T-SQL does not allow GROUP BY in an UPDATE statement.
-- If MergeFundraiserNameAddress can hold several rows per ein, de-duplicate it in a derived table first.
As other users have mentioned in the comments, why use a MERGE statement when you're only updating records? MERGE is meant for cases where you perform several operations at once, such as UPDATE, DELETE and INSERT.
Since you are only updating records, there is no need for a MERGE statement.
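If the INSERT mentioned in the question does become necessary, MERGE starts to make sense. A rough sketch, assuming #fids (fundraiserid, ein) has been filled as in the updated procedure and that the target's insertable columns are exactly the ones listed (both are assumptions, not something the question confirms):

```sql
MERGE BB02_FundraiserNameAddress AS T
USING
(
    -- DISTINCT only removes exact duplicates; MERGE still fails if two
    -- different source rows match the same target row.
    SELECT DISTINCT f.fundraiserid, n.addresslines, n.town, n.county,
           n.postcode, n.country, n.fulladdress
    FROM MergeFundraiserNameAddress n
    JOIN #fids f ON f.ein = n.ein
) AS S
ON T.fundraiserid = S.fundraiserid
WHEN MATCHED THEN
    UPDATE SET T.addresslines = S.addresslines,
               T.town = S.town,
               T.county = S.county,
               T.postcode = S.postcode,
               T.country = S.country,
               T.fulladdress = S.fulladdress
WHEN NOT MATCHED BY TARGET THEN
    -- Assumed column list for the target table.
    INSERT (fundraiserid, addresslines, town, county, postcode, country, fulladdress)
    VALUES (S.fundraiserid, S.addresslines, S.town, S.county,
            S.postcode, S.country, S.fulladdress);
```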
Reason For Slow Performance
You are loading all the records into a temp table and then joining it with other tables without creating any indexes on that temp table. The absence of indexes will hurt query performance.
When you do a SELECT * INTO #TempTable FROM Some_Table, it copies all the data from Some_Table into the temp table, but not the indexes. You can see this for yourself by running this simple query; the only row it returns is the heap entry (type_desc = HEAP):
select * from tempdb.sys.indexes
where object_id = OBJECT_ID('tempdb..#TempTable')
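A minimal sketch of the fix, assuming the #fids temp table from the question; the index name is made up, and the key column is chosen to match the join predicate f.ein = n.ein:

```sql
SELECT fundraiserid, ein
INTO #fids
FROM bb02_fundraiser
WHERE isdefault = 1;

-- Hypothetical index name. Created after the load so the rows are sorted once.
CREATE CLUSTERED INDEX IX_fids_ein ON #fids (ein);
```

Whether the index pays off with only a few thousand rows is something to confirm in the execution plan rather than assume.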

Also, why DELETE when you can TRUNCATE?
truncate table MergeFundraiserNameAddress

Related

UPDATE from another table with multiple WHERE criteria

In Postgres 9.5, I want to connect to another DB using Postgres' dblink, get data and then use them to update another table.
-- connect to another DB, get data from table, put it in a WITH
WITH temp_table AS
(
SELECT r_id, descr, p_id
FROM
dblink('myconnection',
'SELECT
r_id, descr, p_id
FROM table
WHERE table.p_id
IN (10,20);'
)
AS tempTable(r_id integer, descr text, p_id integer)
)
-- now use temp_table to update
UPDATE anothertable
SET
descr =temp_table.descr
FROM anothertable AS x
INNER JOIN temp_table
ON
x.r_id = temp_table.r_id
AND
x.p_id = temp_table.p_id
AND
x.p_id IN (2) ;
dblink works fine and if I do select * from temp_table before the UPDATE, it has data.
The issue is the UPDATE itself. It runs with no errors, but it never actually updates the table.
I tried changing the UPDATE to:
UPDATE anothertable
SET
descr =temp_table.descr
FROM anothertable AS x , temp_table
WHERE x.r_id = temp_table.r_id
AND
x.p_id = temp_table.p_id
AND
x.p_id IN (2) ;
Same as above: runs with no errors, but it never actually updates the table.
I also tried to change the UPDATE to:
UPDATE anothertable
INNER JOIN temp_table
ON x.r_id = temp_table.r_id
AND
x.p_id = temp_table.p_id
AND
x.p_id IN (2)
SET descr =temp_table.descr
But I get:
ERROR: syntax error at or near "INNER" SQL state: 42601
Character: 1894
How can I fix this to actually update?

Don't repeat the target table in the FROM clause of the UPDATE:
WITH temp_table AS ( ... )
UPDATE anothertable x
SET descr = t.descr
FROM temp_table t
WHERE x.r_id = t.r_id
AND x.p_id = t.p_id
AND x.p_id IN (2);
Or simplified:
...
AND x.p_id = 2
AND t.p_id = 2
The manual:
Do not repeat the target table as a from_item unless you intend a self-join (in which case it must appear with an alias in the from_item).
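For contrast, this is roughly what an intentional self-join looks like. The employees table and its columns are hypothetical, purely for illustration:

```sql
-- Hypothetical table: employees(id, manager_id, dept).
-- Copy each manager's department onto their direct reports.
UPDATE employees e
SET    dept = m.dept
FROM   employees m           -- second reference to the same table, with its own alias
WHERE  e.manager_id = m.id;  -- ties the target row to the joined row
```

Here the WHERE clause links the target alias to the second reference, which is exactly what was missing in the question's attempts.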
Related:
UPDATE statement with multiple joins in PostgreSQL
SQL update query with substring WHERE clause

SQL Server : trigger deadlock

There is an InventoryCategory table with the following trigger attached to it. The main purpose of the trigger is to update the ProcessQueue table with the InventoryCategory RowID whenever someone updates, inserts, or deletes a record in InventoryCategory, so that I know which records were changed recently and can then update other systems.
I have a multi-threaded C# application updating InventoryCategory, and this trigger is causing a deadlock. If I run it with a single thread, I don't get any errors.
CREATE trigger [dbo].[tr_InventoryCategory_Queue]
On [dbo].[InventoryCategory]
After Insert, Update, Delete
as
if @@ROWCOUNT = 0
return
-- inserted records need to be either inserted or updated into the queue
if exists(select 1 from inserted)
begin
-- update existing queue records
update ProcessQueue
set ParentTable = 'InventoryCategory', UpdatedTime = getdate()
from inserted i
join ProcessQueue p on i.Id = p.RowID
where exists (select * from ProcessQueue q
where q.RowID = p.RowID and q.ParentTable = 'InventoryCategory')
and p.ParentTable = 'InventoryCategory'
-- insert new queue records
insert into ProcessQueue (ParentTable, RowID)
select
'InventoryCategory', i.id
from
inserted i
where
i.ID not in (select q.RowID from ProcessQueue q
where q.ParentTable = 'InventoryCategory')
end
-- deleted records need to be either inserted or updated into the queue
if exists(select 1 from deleted)
begin
-- update existing queue records
update ProcessQueue
set ParentTable = 'InventoryCategory', UpdatedTime = getdate()
from deleted d
join ProcessQueue p on d.Id = p.RowID
where exists (select * from ProcessQueue q
where q.RowID = p.RowID and q.ParentTable = 'InventoryCategory')
and p.ParentTable = 'InventoryCategory'
-- insert new queue records
insert into ProcessQueue (ParentTable, RowID)
select
'InventoryCategory', d.Id
from
deleted d
where
d.Id not in (select q.RowID from ProcessQueue q
where q.ParentTable = 'InventoryCategory')
end
Any suggestions?
Thanks in advance

DELETE Row of a table with a condition with another table

I have the following SQL which is working,
IF NOT EXISTS(SELECT 1 FROM [Batch] B
INNER JOIN BatchProducts BP
ON (B.ID = BP.BatchID)
WHERE Bp.ID = @ID AND B.RetailerID = @RetailerID)
BEGIN
RETURN;
END
DELETE FROM BatchProducts WHERE BatchProducts.ID = @ID;
But it is composed with 2 statements. I want to use a single DELETE with a condition that RetailerID must match in BatchProducts table.

You can do it this way:
DELETE FROM BatchProducts
WHERE BatchProducts.ID = @ID
AND EXISTS (SELECT * FROM [Batch]
WHERE [Batch].ID = BatchProducts.BatchID AND [Batch].RetailerID = @RetailerID)

Is there a particular reason you need to check if it exists?
DELETE bp
FROM BatchProducts bp
JOIN Batch b
ON bp.BatchID = b.ID
WHERE bp.ID = @ID
AND b.RetailerID = @RetailerID

SQL Server 2008 R2 Trigger multiple rows in a single batch update issue

I have implemented a trigger in SQL Server 2008 R2 in this way -
ALTER TRIGGER [dbo].[trgtblOrgStaffAssocLastUpdate] ON [dbo].[tblOrgStaffAssoc]
AFTER UPDATE
AS
IF EXISTS (SELECT i.* FROM INSERTED i inner join deleted d on i.ORG_ID = d.ORG_ID and isnull(i.StaffType, -1111) = isnull(d.StaffType, -1111)
where i.Deleted = 0
and (isnull(i.PER_ID, cast(cast(0 as binary) as uniqueidentifier)) <> isnull(d.PER_ID, cast(cast(0 as binary) as uniqueidentifier))
))
BEGIN
IF EXISTS (SELECT * FROM DELETED)
BEGIN
--UPDATE PER_ID
update l
set PER_ID = 1, UpdatedOn = GETDATE()
from dbo.tblOrgStaffAssocLastUpadate l inner join [dbo].[inserted] i on l.ORG_ID = i.ORG_ID and l.StaffType = i.StaffType
inner join [dbo].[deleted] d on i.ORG_ID = d.ORG_ID and i.StaffType = d.StaffType
where isnull(i.PER_ID, cast(cast(0 as binary) as uniqueidentifier)) <> isnull(d.PER_ID, cast(cast(0 as binary) as uniqueidentifier))
END
END
My purpose is to update tblOrgStaffAssocLastUpadate when tblOrgStaffAssoc is updated. For a single row, it works fine. However, when multiple rows are sent over in one batch, it updates only one row in tblOrgStaffAssocLastUpadate even though tblOrgStaffAssoc has multiple rows updated.
When I use intermediate permanent tables _inserted and _deleted to buffer the INSERTED and DELETED data, and run the update against those permanent tables like this -
ALTER TRIGGER [dbo].[trgtblOrgStaffAssocLastUpdate] ON [dbo].[tblOrgStaffAssoc]
AFTER UPDATE
AS
IF EXISTS (SELECT i.* FROM INSERTED i inner join deleted d on i.ORG_ID = d.ORG_ID and isnull(i.StaffType, -1111) = isnull(d.StaffType, -1111)
where i.Deleted = 0
and (isnull(i.PER_ID, cast(cast(0 as binary) as uniqueidentifier)) <> isnull(d.PER_ID, cast(cast(0 as binary) as uniqueidentifier))
))
BEGIN
IF EXISTS (SELECT * FROM DELETED)
BEGIN
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[_inserted]') AND type in (N'U'))
insert into [dbo].[_inserted]
select * from inserted
else
select * into [dbo].[_inserted]
from inserted
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[_deleted]') AND type in (N'U'))
insert into [dbo].[_deleted]
select * from deleted
else
select * into [dbo].[_deleted]
from deleted
--UPDATE PER_ID
update l
set PER_ID = 1, UpdatedOn = GETDATE()
from dbo.tblOrgStaffAssocLastUpadate l inner join [dbo].[_inserted] i on l.ORG_ID = i.ORG_ID and l.StaffType = i.StaffType
inner join [dbo].[_deleted] d on i.ORG_ID = d.ORG_ID and i.StaffType = d.StaffType
where isnull(i.PER_ID, cast(cast(0 as binary) as uniqueidentifier)) <> isnull(d.PER_ID, cast(cast(0 as binary) as uniqueidentifier))
END
END
It works fine. However, replacing the permanent table _inserted with a temp table #inserted or a table variable @inserted does not work.
Apparently using permanent tables is not a good idea. I don't understand why the trigger only works this way. Can anyone help out?
Thanks
Edit to answer @usr's comment -
It does not work correctly if I use -
update l set PER_ID = 1, UpdatedOn = GETDATE() from dbo.tblOrgStaffAssocLastUpadate l inner join [dbo].[inserted] i on l.ORG_ID = i.ORG_ID and l.StaffType = i.StaffType inner join [dbo].[deleted] d on i.ORG_ID = d.ORG_ID and i.StaffType = d.StaffType where isnull(i.PER_ID, cast(cast(0 as binary) as uniqueidentifier)) <> isnull(d.PER_ID, cast(cast(0 as binary) as uniqueidentifier))
Only one row is updated. The remaining rows are not updated at all, even though I can see multiple rows returned if I use
select l.* from dbo.tblOrgStaffAssocLastUpadate l inner join [dbo].[inserted] i on l.ORG_ID = i.ORG_ID and l.StaffType = i.StaffType inner join [dbo].[deleted] d on i.ORG_ID = d.ORG_ID and i.StaffType = d.StaffType where isnull(i.PER_ID, cast(cast(0 as binary) as uniqueidentifier)) <> isnull(d.PER_ID, cast(cast(0 as binary) as uniqueidentifier))
right before the update statement. That is where I am totally confused: why does the following update statement update only one row, and why does only a permanent table keep all the updated rows, but not a temporary table or a table variable?
Updated question -
Multiple updated rows are sent to the table in a single batch. It seems that the trigger's INSERTED and DELETED tables hold only one row (the last updated row) by the time I reach the "update l" statement. Before that, they hold multiple updated rows; I can see that when I use permanent tables. I just don't understand why SQL Server behaves this way. Has anyone seen the same thing?

You are seriously over-complicating this. You do not need all those checks and links between INSERTED and DELETED.
This is a trigger on UPDATE only, therefore, there will always be a DELETED table that contains for each changed row what was in the table and there will always be an INSERTED that contains for each changed row what will be in the table. So for your purposes you only need to deal with one of these tables. Rows that are not changed are in neither table.
If this was a trigger on INSERT there would only be an INSERTED table. Similarly, a DELETE trigger only has a DELETED table.
Secondly, you seem to think that NULL=NULL is true - it isn't. The statement NULL=NULL returns NULL. So does NULL<>NULL, NULL>=NULL etc. etc. In a database NULL means not a value and something that does not have a value is incomparable. So your where statement is also superfluous.
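You can check the NULL behaviour directly. A small sketch, assuming the default ANSI_NULLS ON setting; the variable names are just for illustration:

```sql
DECLARE @a int = NULL, @b int = NULL;

-- With ANSI_NULLS ON, comparing NULLs with = or <> yields UNKNOWN,
-- so those two CASE expressions fall through to ELSE. Only IS NULL is true.
SELECT
    CASE WHEN @a = @b    THEN 'true' ELSE 'not true' END AS a_equals_b,
    CASE WHEN @a <> @b   THEN 'true' ELSE 'not true' END AS a_differs_from_b,
    CASE WHEN @a IS NULL THEN 'true' ELSE 'not true' END AS a_is_null;
```

Under that setting the first two columns come back as 'not true' and the third as 'true'.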
So I think the code you want is:
ALTER TRIGGER [dbo].[trgtblOrgStaffAssocLastUpdate] ON [dbo].[tblOrgStaffAssoc]
AFTER UPDATE
AS
UPDATE l
SET PER_ID = 1
,UpdatedOn = GETDATE()
FROM dbo.tblOrgStaffAssocLastUpadate l
INNER JOIN
inserted i ON l.ORG_ID = i.ORG_ID and l.StaffType = i.StaffType

Avoiding creating the same query twice in SQL

I have a fairly simple, self-explanatory SQL statement:
ALTER PROCEDURE [dbo].[sp_getAllDebatesForAlias](@SubjectAlias nchar(30))
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
-- Insert statements for procedure here
SELECT *
FROM tblDebates
WHERE (SubjectID1 in (SELECT SubjectID FROM tblSubjectAlias WHERE SubjectAlias = @SubjectAlias))
OR (SubjectID2 in (SELECT SubjectID FROM tblSubjectAlias WHERE SubjectAlias = @SubjectAlias)) ;
END
I am certain there is a way to make this statement more efficient, or at least to avoid building the same result set twice in the IN clauses, i.e., the
SELECT SubjectID FROM tblSubjectAlias WHERE SubjectAlias = @SubjectAlias
part.
Any ideas?

Try:
select d.* from tblDebates d
where exists
(select 1
from tblSubjectAlias s
where s.SubjectID in (d.SubjectID1, d.SubjectID2) and
s.SubjectAlias = @SubjectAlias)

SELECT d.*
FROM tblDebates d
inner join tblSubjectAlias s on s.SubjectID in (d.SubjectID1, d.SubjectID2)
where s.SubjectAlias = @SubjectAlias
