MERGE - only update if values have changed (SQL Server)

I am running a MERGE in SQL Server. In my update, I want to update the row only if the values have changed. There is a VERSION column that increments on each update. Below is an example:
MERGE Employee AS tgt USING
(SELECT Employee_History.Emp_ID
, Employee_History.First_Name
, Employee_History.Last_Name
FROM Employee_History)
AS src (Emp_ID, First_Name, Last_Name)
ON tgt.Emp_ID = src.Emp_ID
WHEN MATCHED THEN
UPDATE SET
Emp_ID = src.Emp_ID
,[VERSION] = tgt.VERSION + 1
,First_Name = src.First_Name
,Last_Name = src.Last_Name
WHEN NOT MATCHED BY TARGET THEN
INSERT (Emp_ID, [VERSION], First_Name, Last_Name)
VALUES
(src.Emp_ID, 0, src.First_Name, src.Last_Name);
Now, suppose I only want to update the row, and thus increment the version, if the name has changed.

WHEN MATCHED can take an additional AND condition. Also, there is no need to update Emp_ID.
...
WHEN MATCHED AND (tgt.First_Name <> src.First_Name
OR tgt.Last_Name <> src.Last_Name) THEN UPDATE
SET
[VERSION] = tgt.VERSION + 1
,First_Name = src.First_Name
,Last_Name = src.Last_Name
...
If Last_Name or First_Name are nullable, you need to take care of NULL values in the comparison tgt.Last_Name <> src.Last_Name, for instance ISNULL(tgt.Last_Name,'') <> ISNULL(src.Last_Name,'')
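An alternative that handles NULLs without picking a sentinel value is to compare the rows with NOT EXISTS plus INTERSECT, which treats two NULLs as equal. A sketch against the example above:

```sql
WHEN MATCHED AND NOT EXISTS (
    -- INTERSECT uses distinctness semantics, so NULL matches NULL;
    -- the condition is true only when at least one column differs
    SELECT src.First_Name, src.Last_Name
    INTERSECT
    SELECT tgt.First_Name, tgt.Last_Name
) THEN
    UPDATE SET
        [VERSION] = tgt.VERSION + 1
       ,First_Name = src.First_Name
       ,Last_Name = src.Last_Name
```

This scales to many columns by just extending both SELECT lists.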

The answer provided by a1ex07 is the right answer, but I just wanted to expand on the difficulty of comparing a large number of columns, watching for NULLs, etc.
I found that I could generate a checksum in some CTEs with HASHBYTES, target those CTEs in the MERGE, and then use the "WHEN MATCHED AND ..." condition described above to compare the hashes:
with SourcePermissions as (
SELECT 1 as Code, 1013 as ObjectTypeCode, 'Create Market' as ActionName, null as ModuleCode, 1 as AssignableTargetFlags
union all SELECT 2, 1013, 'View Market', null, 1
union all SELECT 3, 1013, 'Edit Market', null, 1
--...shortened....
)
,SourcePermissions2 as (
select sp.*, HASHBYTES('sha2_256', xmlcol) as [Checksum]
from SourcePermissions sp
cross apply (select sp.* for xml raw) x(xmlcol)
)
,TargetPermissions as (
select p.*, HASHBYTES('sha2_256', xmlcol) as [Checksum]
from Permission p
cross apply (select p.* for xml raw) x(xmlcol)
) --select * from SourcePermissions2 sp join TargetPermissions tp on sp.code=tp.code where sp.Checksum = tp.Checksum
MERGE TargetPermissions AS target
USING (select * from SourcePermissions2) AS source ([Code] , [ObjectTypeCode] , [ActionName] , [ModuleCode] , [AssignableTargetFlags], [Checksum])
ON (target.Code = source.Code)
WHEN MATCHED and source.[Checksum] != target.[Checksum] then
UPDATE SET [ObjectTypeCode] = source.[ObjectTypeCode], [ActionName]=source.[ActionName], [ModuleCode]=source.[ModuleCode], [AssignableTargetFlags] = source.[AssignableTargetFlags]
WHEN NOT MATCHED THEN
INSERT ([Code] , [ObjectTypeCode] , [ActionName] , [ModuleCode] , [AssignableTargetFlags])
VALUES (source.[Code] , source.[ObjectTypeCode] , source.[ActionName] , source.[ModuleCode] , source.[AssignableTargetFlags])
OUTPUT deleted.*, $action, inserted.[Code]
--the only minor issue is that you can no longer do inserted.* here, since it gives error 404 (the SQL error, not the web one), complaining about returning [Checksum], which is included in the target CTE but not the underlying table
,inserted.[ObjectTypeCode] , inserted.[ActionName] , inserted.[ModuleCode] , inserted.[AssignableTargetFlags]
;
A couple of notes: I could have simplified greatly with CHECKSUM or BINARY_CHECKSUM, but I always get collisions with those.
As to the 'why': this is part of an automated deployment that keeps a lookup table up to date. The problem with the MERGE, though, is that there is a complex and heavily used indexed view over the table, so updates to the related tables are quite expensive.

Rather than avoiding the update altogether, you could change your [VERSION] + 1 code to add zero when the names match:
[VERSION] = tgt.VERSION + (CASE
WHEN tgt.First_Name <> src.First_Name OR tgt.Last_Name <> src.Last_Name
THEN 1
ELSE 0 END)

@a1ex07 thanks for the answer. A slight correction (I am not following the SQL version, so this could be a change in the SQL specification): in Oracle,
WHEN MATCHED AND condition THEN UPDATE
is not valid syntax. The following is valid:
WHEN MATCHED THEN UPDATE SET ... WHERE condition WHEN NOT MATCHED THEN INSERT ...
So I would change it to:
WHEN MATCHED THEN UPDATE
SET
[VERSION] = tgt.VERSION + 1
,First_Name = src.First_Name
,Last_Name = src.Last_Name
WHERE
tgt.First_Name <> src.First_Name
OR tgt.Last_Name <> src.Last_Name
https://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_9016.htm#SQLRF01606

Related

How to remove duplicates from a stored procedure without using DISTINCT

A stored procedure has been written that returns duplicates. ROW_NUMBER was tried but did not work. DISTINCT removed the duplicates but was unable to retrieve the large number of records required (about 700,000). Is there another way, using RANK or GROUP BY, to remove the duplicates?
I have used DISTINCT, and it does not retrieve enough records. I have not successfully used GROUP BY.
I have attempted to use ROW_NUMBER, but it did not work either (you can see where it's commented out).
CREATE PROCEDURE [report].[get_foodDetails]
@foodgroup_id INT,
@shop_id INT = 0,
@product_id INT = 0,
@maxrows INT = 600,
@expiry INT = 1,
@productactive INT = 1,
@expiryPeriod DATETIME = '9999-12-31 23:59:59'
AS
BEGIN
IF (@expiryPeriod >= '9999-12-31')
BEGIN
SET @expiryPeriod = GETDATE()
END
SELECT
-- dp.RowNumber
ISNULL([FoodType], '') AS [Foodtype],
ISNULL([FoodColour], '') AS [FoodColour],
ISNULL([FoodBarcode], '') AS [FoodBarcode],
ISNULL([FoodArticleNum], 0) AS [FoodArticleNum],
ISNULL([FoodShelfLife], '9999-12-31') AS [FoodShelfLife]
INTO
#devfood
FROM
report.[GetOrderList] (@foodgroup_id, @product_id, @productactive, @expiry, @expiryPeriod, @shop_id, @maxrows) dp
INNER JOIN
food_group fg ON fg.food_group_id = dp.item_FK_item_group_id
SELECT TOP(@maxrows) *
FROM #devfood
ORDER BY [device_packet_created_date]
END
Around 700,000 records need to be retrieved. This is currently achieved, although there are duplicates. Only 20,000 are retrieved when using DISTINCT (but with no duplicates).
The sample code below is from a presentation I've used to demonstrate CTEs. This is a common mechanism for removing duplicates and is very fast. In this case the duplicates are removed directly from the table; if that is not your objective, you could use a temp table or a prior chained CTE. Note that the important thing is which columns you partition by. If, in the example, you partitioned by only [name], you would not see both the red rose and the white rose.
-------------------------------------------------
if object_id(N'[flower].[order]', N'U') is not null
drop table [flower].[order];
go
create table [flower].[order]
(
[id] int identity(1, 1) not null constraint [flower.order.id.clustered_primary_key] primary key clustered
, [flower] nvarchar(128)
, [color] nvarchar(128)
, [count] int
);
go
insert into [flower].[order]
([flower]
, [color]
, [count])
values (N'rose',N'red',5),
(N'rose',N'red',3),
(N'rose',N'white',2),
(N'rose',N'red',1),
(N'rose',N'red',9),
(N'marigold',N'yellow',2),
(N'marigold',N'yellow',9),
(N'marigold',N'yellow',4),
(N'chamomile',N'amber',9),
(N'chamomile',N'amber',4),
(N'lily',N'white',12);
go
select [flower]
, [color]
from [flower].[order];
go
--
-------------------------------------------------
with [duplicate_finder]([name], [color], [sequence])
as (select [flower]
, [color]
, row_number()
over (
partition by [flower], [color]
order by [flower] desc) as [sequence]
from [flower].[order])
delete from [duplicate_finder]
where [sequence] > 1;
--
-- no duplicates
-------------------------------------------------
select [flower]
, [color]
from [flower].[order];
I know you said you tried ROW_NUMBER, but did you try it in either of these ways?
First, a CTE. The CTE here is just your existing query, but with a ROW_NUMBER windowing function attached. For each duplicate iteration of a record, it adds one to RowNumber; with the next unique group of records, RowNumber resets to 1.
After the pull, only take the records with RowNumber = 1. I use this all the time for deleting dupes out of an underlying record set, but it works just as well to simply identify them.
WITH NoDupes AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY
ISNULL(FoodType, '')
,ISNULL(FoodColour, '')
,ISNULL(FoodBarcode, '')
,ISNULL(FoodArticleNum, 0)
,ISNULL(FoodShelfLife, '9999-12-31')
ORDER BY
(
SELECT
0
)
) AS RowNumber
,ISNULL(FoodType, '') AS Foodtype
,ISNULL(FoodColour, '') AS FoodColour
,ISNULL(FoodBarcode, '') AS FoodBarcode
,ISNULL(FoodArticleNum, 0) AS FoodArticleNum
,ISNULL(FoodShelfLife, '9999-12-31') AS FoodShelfLife
FROM
report.GetOrderList(@foodgroup_id, @product_id, @productactive, @expiry, @expiryPeriod, @shop_id, @maxrows) AS dp
INNER JOIN
food_group AS fg
ON
fg.food_group_id = dp.item_FK_item_group_id
)
SELECT
nd.Foodtype
,nd.FoodColour
,nd.FoodBarcode
,nd.FoodArticleNum
,nd.FoodShelfLife
INTO
#devfood
FROM
NoDupes AS nd
WHERE
nd.RowNumber = 1;
Alternatively (and shorter), you could try SELECT TOP (1) WITH TIES, using that same ROW_NUMBER function to order the record set. The TOP (1) WITH TIES part does functionally the same thing as the CTE, returning only the first record of each set of duplicates.
SELECT
TOP (1) WITH TIES
ISNULL(FoodType, '') AS Foodtype
,ISNULL(FoodColour, '') AS FoodColour
,ISNULL(FoodBarcode, '') AS FoodBarcode
,ISNULL(FoodArticleNum, 0) AS FoodArticleNum
,ISNULL(FoodShelfLife, '9999-12-31') AS FoodShelfLife
INTO
#devfood
FROM
report.GetOrderList(@foodgroup_id, @product_id, @productactive, @expiry, @expiryPeriod, @shop_id, @maxrows) AS dp
INNER JOIN
food_group AS fg
ON
fg.food_group_id = dp.item_FK_item_group_id
ORDER BY
ROW_NUMBER() OVER (PARTITION BY
ISNULL(FoodType, '')
,ISNULL(FoodColour, '')
,ISNULL(FoodBarcode, '')
,ISNULL(FoodArticleNum, 0)
,ISNULL(FoodShelfLife, '9999-12-31')
ORDER BY
(
SELECT
0
)
);
The CTE is maybe a little clearer in its intention for the next person who looks at the code, but the TOP might perform a little better.

MS SQL Execute multiple updates based on a list

I have found out how to fix the database; the only problem is that I have to insert the CaseNumber for every single execution.
In C# I would use some kind of string list for the broken records. Is there something similar in MS SQL?
In my code so far I implemented a variable, CaseNumber. I have a table with a lot of CaseNumber records that are broken. Is there a way to execute this for every CaseNumber in that other table?
Like:
1. Take the first CaseNumber and run this script.
2. Then take the second one and run the script again, until every CaseNumber has been fixed.
Thanks in advance for any ideas.
GO
DECLARE @CaseNumber VARCHAR(50)
SET @CaseNumber = '25615'
PRINT 'Start fixing broken records.'
PRINT 'Fixing FIELD2'
UPDATE t
SET t.FIELD2 = ( SELECT DISTINCT TOP 1 FIELD2
FROM {myTable} t2
WHERE IDFIELD = @CaseNumber
AND FIELD2 IS NOT NULL )
FROM {myTable} t
WHERE FIELD2 IS NULL
AND IDFIELD = @CaseNumber
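If you really do want to run the script above once per case number, a cursor over the table holding the broken case numbers is the direct equivalent of the C# loop. A sketch, assuming a hypothetical dbo.BrokenCases table with a CaseNumber column ({myTable} is the placeholder from the question):

```sql
DECLARE @CaseNumber VARCHAR(50);

DECLARE case_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT CaseNumber FROM dbo.BrokenCases;  -- hypothetical list of broken case numbers

OPEN case_cursor;
FETCH NEXT FROM case_cursor INTO @CaseNumber;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- the single-case fix from the question, driven by @CaseNumber
    UPDATE t
    SET t.FIELD2 = (SELECT TOP 1 FIELD2
                    FROM {myTable} t2
                    WHERE t2.IDFIELD = @CaseNumber
                      AND t2.FIELD2 IS NOT NULL)
    FROM {myTable} t
    WHERE t.FIELD2 IS NULL
      AND t.IDFIELD = @CaseNumber;

    FETCH NEXT FROM case_cursor INTO @CaseNumber;
END;

CLOSE case_cursor;
DEALLOCATE case_cursor;
```

A set-based rewrite avoids the row-by-row overhead and is usually much faster on anything but trivial row counts.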
Here are a couple of different options...
-- This version will just "fix" everything that can be fixed.
UPDATE mt1 SET
mt1.FIELD2 = mtx.FIELD2
FROM
dbo.myTable mt1
CROSS APPLY (
SELECT TOP (1)
mt2.FIELD2
FROM
dbo.myTable mt2
WHERE
mt1.IDFIELD = mt2.IDFIELD
AND mt2.FIELD2 IS NOT NULL
) mtx
WHERE
mt1.FIELD2 IS NULL;
And if, for whatever reason, you don't want to fix the entire table all in one go, you can restrict it to just the cases you specify...
-- This version works off the same principle but limits itself to only those values in the @CaseNumCSV parameter.
DECLARE @CaseNumCSV VARCHAR(8000) = '25615,25616,25617,25618,25619';
IF OBJECT_ID('tempdb..#CaseNum', 'U') IS NOT NULL
BEGIN DROP TABLE #CaseNum; END;
CREATE TABLE #CaseNum (
CaseNumber VARCHAR(50) NOT NULL,
PRIMARY KEY (CaseNumber)
WITH(IGNORE_DUP_KEY = ON) -- just in case the same CaseNumber is in the string multiple times.
);
INSERT #CaseNum(CaseNumber)
SELECT
CaseNumber = dsk.Item
FROM
dbo.DelimitedSplit8K(@CaseNumCSV, ',') dsk;
-- a copy of DelimitedSplit8K can be found here: http://www.sqlservercentral.com/articles/Tally+Table/72993/
UPDATE mt1 SET
mt1.FIELD2 = mtx.FIELD2
FROM
#CaseNum cn
JOIN dbo.myTable mt1
ON cn.CaseNumber = mt1.IDFIELD
CROSS APPLY (
SELECT TOP (1)
mt2.FIELD2
FROM
dbo.myTable mt2
WHERE
mt1.IDFIELD = mt2.IDFIELD
AND mt2.FIELD2 IS NOT NULL
) mtx
WHERE
mt1.FIELD2 IS NULL;
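On SQL Server 2016 and later, the built-in STRING_SPLIT function can stand in for DelimitedSplit8K when populating #CaseNum (note that STRING_SPLIT does not guarantee element order, which does not matter here):

```sql
-- same effect as the DelimitedSplit8K insert above, using the built-in splitter
INSERT #CaseNum (CaseNumber)
SELECT ss.value
FROM STRING_SPLIT(@CaseNumCSV, ',') AS ss;
```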

Removing duplicated Data in SQL Server 2008

I have the below code, which compares each string with the one before it to see if they match. The code works on records where there are only one or two dupes.
If there are three or more, the current code does not work.
What I need to do is display the old code as the ID of that line. The new code needs to be the first matched item in the list; in the example it will be 131133. This needs to become the new code for all of the items that match.
I then need the code that has been replaced to appear in the DeleteCode column, and only the code that has been replaced; in the example this should be 141439.
Can I achieve this with my code below, or do I need to tackle it from another angle?
Thank you in advance.
;WITH MyCTE AS
(
SELECT *,
ROW_NUMBER()OVER (ORDER BY SortField) AS rn
FROM Aron_Reporting.dbo.Customer_Sort
)
SELECT T1.Forename as Forename, T1.pcode, T1.Surname as Surname,T1.SortField AS T1String,
T2.SortField AS T2String,
T1.IDNO as OldCode,
CASE
WHEN T1.SortField IS NULL OR T1.SortField = ' ' OR T2.SortField = ' ' or T2.SortField IS NULL THEN T1.IDNO
WHEN T1.SortField = T2.SortField THEN T2.IDNO ELSE T1.IDNO END AS NewCode,
CASE
WHEN T1.SortField IS NULL OR T1.SortField = ' ' or T2.SortField = ' ' or T2.SortField IS NULL THEN ' '
WHEN T1.SortField = T2.SortField THEN T2.IDNO ELSE ' ' END AS DeleteCode
FROM MyCTE T1
LEFT JOIN MyCTE T2
ON T1.rn = T2.rn+1
I'm not exactly sure what you are trying to do, so hopefully this helps.
Using a sub-query to calculate the desired code can handle any number of duplicates. The sub-query needs to return the first record that is a match (including itself).
The below example is broken up into small steps so you can see exactly how the data is being manipulated.
-- Generate table structure
DECLARE @TestData TABLE (
ID INT
, ValueToCompare VARCHAR(MAX)
, Code INT
, NewCode INT
)
-- Generate test data
INSERT INTO @TestData
( ID, ValueToCompare, Code )
VALUES (1, 'John', 1134), (2, 'Joe', 1546), (3, 'Joe', 1893), (4, 'Joe', 9785), (5, 'Joe', 9452)
-- View the original data
SELECT *
FROM @TestData
-- View what the NewCode will be
SELECT ID
, ValueToCompare
, Code
, (SELECT MIN(Code) -- This subquery uses MIN to grab the first record from a list of matching records
FROM @TestData SubQueryData
WHERE MainQueryData.ValueToCompare = SubQueryData.ValueToCompare
) AS 'New_Code'
FROM @TestData MainQueryData
-- Set the NewCode value
UPDATE @TestData
SET NewCode = (SELECT MIN(Code)
FROM @TestData SubQueryData
WHERE MainQueryData.ValueToCompare = SubQueryData.ValueToCompare
)
FROM @TestData MainQueryData
-- Delete duplicate records
DELETE
FROM @TestData
WHERE Code <> NewCode
-- View the resulting data
SELECT *
FROM @TestData

Trying to merge rows into one row with certain conditions

Given 2 or more rows that are selected to merge, one of them is identified as being the template row. The other rows should merge their data into any null value columns that the template has.
Example data:
Id Name Address City State Active Email Date
1 Acme1 NULL NULL NULL NULL blah@yada.com 3/1/2011
2 Acme1 1234 Abc Rd Springfield OR 0 blah@gmail.com 1/12/2012
3 Acme2 NULL NULL NULL 1 blah@yahoo.com 4/19/2012
Say that a user has chosen the row with Id 1 as the template row, and the rows with Ids 2 and 3 are to be merged into row 1 and then deleted. Any null-value column in row 1 should be filled with the most recent non-null value (see the Date column), if one exists, and non-null values already present in row 1 are to be left as is. The result of this query on the above data should be exactly this:
Id Name Address City State Active Email Date
1 Acme1 1234 Abc Rd Springfield OR 1 blah@yada.com 3/1/2011
Notice that the Active value is 1, and not 0 because row Id 3 had the most recent date.
P.S. Also, is there any way to do this without explicitly defining/knowing all the column names beforehand? The actual table I'm working with has a ton of columns, with new ones being added all the time. Is there a way to look up all the column names in the table, and then use a subquery or temp table to do the job?
You might do it by ordering rows first by the template flag, then by date descending, so the template row is always last. Each row is assigned a number in that order. Using max() we find the first occupied cell per column (in descending order of numbers). Then we select each column from the row matching that maximum.
; with rows as (
select test.*,
-- Template row must be last - how do you decide which one is template row?
-- In this case template row is the one with id = 1
row_number() over (order by case when id = 1 then 1 else 0 end,
date) rn
from test
-- Your list of rows to merge goes here
-- where id in ( ... )
),
-- Finding first occupied row per column
positions as (
select
max (case when Name is not null then rn else 0 end) NamePosition,
max (case when Address is not null then rn else 0 end) AddressPosition,
max (case when City is not null then rn else 0 end) CityPosition,
max (case when State is not null then rn else 0 end) StatePosition,
max (case when Active is not null then rn else 0 end) ActivePosition,
max (case when Email is not null then rn else 0 end) EmailPosition,
max (case when Date is not null then rn else 0 end) DatePosition
from rows
)
-- Finally join this columns in one row
select
(select Name from rows cross join Positions where rn = NamePosition) name,
(select Address from rows cross join Positions where rn = AddressPosition) Address,
(select City from rows cross join Positions where rn = CityPosition) City,
(select State from rows cross join Positions where rn = StatePosition) State,
(select Active from rows cross join Positions where rn = ActivePosition) Active,
(select Email from rows cross join Positions where rn = EmailPosition) Email,
(select Date from rows cross join Positions where rn = DatePosition) Date
from test
-- Any id will suffice, or even DISTINCT
where id = 1
You might check it at Sql Fiddle.
EDIT:
The cross joins in the last section could actually be inner joins on rows.rn = xxxPosition. It works this way, but changing to inner joins would be an improvement.
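For a single column, that inner-join form would look something like this (a sketch for Name; the other columns follow the same pattern):

```sql
-- one scalar subquery from the final SELECT, rewritten as an inner join
(select r.Name
 from rows r
 inner join positions p on r.rn = p.NamePosition) as Name
```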
It's not so complicated.
At first..
DECLARE @templateID INT = 1
..so you can remember which row is treated as the template..
Now find the latest NOT NULL values (excluding the template row). The easiest way is to use TOP 1 subqueries for each column:
SELECT
(SELECT TOP 1 Name FROM DataTab WHERE Name IS NOT NULL AND NOT ID = @templateID ORDER BY Date DESC) AS LatestName,
(SELECT TOP 1 Address FROM DataTab WHERE Address IS NOT NULL AND NOT ID = @templateID ORDER BY Date DESC) AS LatestAddress
-- add more columns here
Wrap the above into a CTE (Common Table Expression) so you have nice input for your UPDATE..
WITH Latest_CTE (CTE_LatestName, CTE_LatestAddress) -- add more columns here; I like the CTE_ prefix to distinguish source columns from target columns..
AS
-- Define the CTE query.
(
SELECT
(SELECT TOP 1 Name FROM DataTab WHERE Name IS NOT NULL AND NOT ID = @templateID ORDER BY Date DESC) AS LatestName,
(SELECT TOP 1 Address FROM DataTab WHERE Address IS NOT NULL AND NOT ID = @templateID ORDER BY Date DESC) AS LatestAddress
-- add more columns here
)
UPDATE
<update statement here (below)>
Now, do a smart UPDATE of your template row using ISNULL; it acts as a conditional update, updating only when the target column is null:
WITH
<common expression statement here (above)>
UPDATE DataTab
SET
Name = ISNULL(Name, CTE_LatestName), -- if Name is null then set Name to CTE_LatestName else keep Name as Name
Address = ISNULL(Address, CTE_LatestAddress)
-- add more columns here..
WHERE ID = @templateID
And the last task is to delete the rows other than the template row..
DELETE FROM DataTab WHERE NOT ID = @templateID
Clear?
For dynamic columns, you need a solution using dynamic SQL.
You can query sys.columns and sys.tables to get the list of columns you need, then loop backwards once for each null column, finding the first non-null row for that column and updating your output row for that column. Once the loop reaches 0, you have a complete row which you can then display to the user.
I should pay more attention to posting dates. In any case, here's a solution using dynamic SQL to build out an UPDATE statement. It should give you something to build from, anyway.
There's some extra code in there to validate the results along the way, but I tried to comment in a way that made that non-vital code apparent.
CREATE TABLE
dbo.Dummy
(
[ID] int ,
[Name] varchar(30),
[Address] varchar(40) null,
[City] varchar(30) NULL,
[State] varchar(2) NULL,
[Active] tinyint NULL,
[Email] varchar(30) NULL,
[Date] date NULL
);
--
INSERT dbo.Dummy
VALUES
(
1, 'Acme1', NULL, NULL, NULL, NULL, 'blah@yada.com', '3/1/2011'
)
,
(
2, 'Acme1', '1234 Abc Rd', 'Springfield', 'OR', 0, 'blah@gmail.com', '1/12/2012'
)
,
(
3, 'Acme2', NULL, NULL, NULL, 1, 'blah@yahoo.com', '4/19/2012'
);
DECLARE
@TableName nvarchar(128) = 'Dummy',
@TemplateID int = 1,
@SetStmtList nvarchar(max) = '',
@LoopCounter int = 0,
@ColumnCount int = 0,
@SQL nvarchar(max) = ''
;
--
--Create a table to hold the column names
DECLARE
@ColumnList table
(
ColumnID tinyint IDENTITY,
ColumnName nvarchar(128)
);
--
--Get the column names
INSERT @ColumnList
(
ColumnName
)
SELECT
c.name
FROM
sys.columns AS c
JOIN
sys.tables AS t
ON
t.object_id = c.object_id
WHERE
t.name = @TableName;
--
--Create loop boundaries to build out the SQL statement
SELECT
@ColumnCount = MAX( l.ColumnID ),
@LoopCounter = MIN( l.ColumnID )
FROM
@ColumnList AS l;
--
--Loop over the column names
WHILE @LoopCounter <= @ColumnCount
BEGIN
--Dynamically construct SET statements for each column except ID (see the WHERE clause)
SELECT
@SetStmtList = @SetStmtList + ',' + l.ColumnName + ' = COALESCE(' + l.ColumnName + ', (SELECT TOP 1 ' + l.ColumnName + ' FROM ' + @TableName + ' WHERE ' + l.ColumnName + ' IS NOT NULL AND ID <> ' + CAST(@TemplateID AS NVARCHAR(MAX)) + ' ORDER BY Date DESC)) '
FROM
@ColumnList AS l
WHERE
l.ColumnID = @LoopCounter
AND
l.ColumnName <> 'ID';
--
SELECT
@LoopCounter = @LoopCounter + 1;
--
END;
--TESTING - Validate the initial table values
SELECT * FROM dbo.Dummy;
--
--Get rid of the leading comma in the SetStmtList
SET @SetStmtList = SUBSTRING( @SetStmtList, 2, LEN( @SetStmtList ) - 1 );
--Build out the rest of the UPDATE statement
SET @SQL = 'UPDATE ' + @TableName + ' SET ' + @SetStmtList + ' WHERE ID = ' + CAST(@TemplateID AS NVARCHAR(MAX))
--Then execute the update
EXEC sys.sp_executesql
@SQL;
--
--TESTING - Validate the updated table values
SELECT * FROM dbo.Dummy;
--
--Build out the DELETE statement
SET @SQL = 'DELETE FROM ' + @TableName + ' WHERE ID <> ' + CAST(@TemplateID AS NVARCHAR(MAX))
--Execute the DELETE
EXEC sys.sp_executesql
@SQL;
--
--TESTING - Validate the final table values
SELECT * FROM dbo.Dummy;
--
DROP TABLE dbo.Dummy;

How can I efficiently do a database massive update?

I have a table with some duplicate entries. I have to discard all but one of each, and then update the remaining one. I've tried with a temporary table and a WHILE statement, in this way:
CREATE TABLE #tmp_ImportedData_GenericData
(
Id int identity(1,1),
tmpCode varchar(255) NULL,
tmpAlpha3Code varchar(50) NULL,
tmpRelatedYear int NOT NULL,
tmpPreviousValue varchar(255) NULL,
tmpGrowthRate varchar(255) NULL
)
INSERT INTO #tmp_ImportedData_GenericData
SELECT
MCS_ImportedData_GenericData.Code,
MCS_ImportedData_GenericData.Alpha3Code,
MCS_ImportedData_GenericData.RelatedYear,
MCS_ImportedData_GenericData.PreviousValue,
MCS_ImportedData_GenericData.GrowthRate
FROM MCS_ImportedData_GenericData
INNER JOIN
(
SELECT CODE, ALPHA3CODE, RELATEDYEAR, COUNT(*) AS NUMROWS
FROM MCS_ImportedData_GenericData AS M
GROUP BY M.CODE, M.ALPHA3CODE, M.RELATEDYEAR
HAVING count(*) > 1
) AS M2 ON MCS_ImportedData_GenericData.CODE = M2.CODE
AND MCS_ImportedData_GenericData.ALPHA3CODE = M2.ALPHA3CODE
AND MCS_ImportedData_GenericData.RELATEDYEAR = M2.RELATEDYEAR
WHERE
(MCS_ImportedData_GenericData.PreviousValue <> 'INDEFINITO')
-- SELECT * from #tmp_ImportedData_GenericData
-- DROP TABLE #tmp_ImportedData_GenericData
DECLARE @counter int
DECLARE @rowsCount int
DECLARE @Code varchar(255)
DECLARE @Alpha3Code varchar(50)
DECLARE @RelatedYear int
DECLARE @OldValue varchar(255)
DECLARE @GrowthRate varchar(255)
SET @counter = 1
SELECT @rowsCount = count(*) from #tmp_ImportedData_GenericData
-- PRINT @rowsCount
WHILE @counter <= @rowsCount
BEGIN
SELECT
@Code = tmpCode,
@Alpha3Code = tmpAlpha3Code,
@RelatedYear = tmpRelatedYear,
@OldValue = tmpPreviousValue,
@GrowthRate = tmpGrowthRate
FROM
#tmp_ImportedData_GenericData
WHERE
Id = @counter
DELETE FROM MCS_ImportedData_GenericData
WHERE
Code = @Code
AND Alpha3Code = @Alpha3Code
AND RelatedYear = @RelatedYear
AND (PreviousValue <> 'INDEFINITO' OR PreviousValue IS NULL)
UPDATE
MCS_ImportedData_GenericData
SET
PreviousValue = @OldValue, GrowthRate = @GrowthRate
WHERE
Code = @Code
AND Alpha3Code = @Alpha3Code
AND RelatedYear = @RelatedYear
AND MCS_ImportedData_GenericData.PreviousValue = 'INDEFINITO'
SET @counter = @counter + 1
END
but it takes too long, even though there are just 20,000 - 30,000 rows to process.
Does anyone have suggestions for improving performance?
Thanks in advance!
WITH q AS (
SELECT m.*, ROW_NUMBER() OVER (PARTITION BY CODE, ALPHA3CODE, RELATEDYEAR ORDER BY CASE WHEN PreviousValue = 'INDEFINITO' THEN 1 ELSE 0 END) AS rn
FROM MCS_ImportedData_GenericData m
WHERE PreviousValue <> 'INDEFINITO'
)
DELETE
FROM q
WHERE rn > 1
Quassnoi's answer uses SQL Server 2005+ syntax, so I thought I'd put in my tuppence worth using something more generic...
First, to delete all the duplicates but not the "original", you need a way of differentiating the duplicate records from each other (the ROW_NUMBER() part of Quassnoi's answer).
It would appear that in your case the source data has no identity column (you create one in the temp table). If that is the case, two choices come to mind:
1. Add an identity column to the data, then remove the duplicates
2. Create a "de-duped" set of data, delete everything from the original, and insert the de-duped data back into the original
Option 1 could be something like...
(With the newly created ID field)
DELETE
[data]
FROM
MCS_ImportedData_GenericData AS [data]
WHERE
id > (
SELECT
MIN(id)
FROM
MCS_ImportedData_GenericData
WHERE
CODE = [data].CODE
AND ALPHA3CODE = [data].ALPHA3CODE
AND RELATEDYEAR = [data].RELATEDYEAR
)
OR...
DELETE
[data]
FROM
MCS_ImportedData_GenericData AS [data]
INNER JOIN
(
SELECT
MIN(id) AS [id],
CODE,
ALPHA3CODE,
RELATEDYEAR
FROM
MCS_ImportedData_GenericData
GROUP BY
CODE,
ALPHA3CODE,
RELATEDYEAR
)
AS [original]
ON [original].CODE = [data].CODE
AND [original].ALPHA3CODE = [data].ALPHA3CODE
AND [original].RELATEDYEAR = [data].RELATEDYEAR
AND [original].id <> [data].id
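Option 2 could be sketched like this, assuming one row per CODE/ALPHA3CODE/RELATEDYEAR key is what counts as de-duped (MAX is just one way to pick a surviving value per column):

```sql
-- build the de-duped set
SELECT CODE, ALPHA3CODE, RELATEDYEAR,
       MAX(PreviousValue) AS PreviousValue,
       MAX(GrowthRate)    AS GrowthRate
INTO   #deduped
FROM   MCS_ImportedData_GenericData
GROUP BY CODE, ALPHA3CODE, RELATEDYEAR;

-- delete everything from the original...
TRUNCATE TABLE MCS_ImportedData_GenericData;

-- ...and insert the de-duped data back
INSERT INTO MCS_ImportedData_GenericData (CODE, ALPHA3CODE, RELATEDYEAR, PreviousValue, GrowthRate)
SELECT CODE, ALPHA3CODE, RELATEDYEAR, PreviousValue, GrowthRate
FROM   #deduped;
```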
I don't understand the syntax used well enough to post an exact answer, but here's an approach:
1. Identify the rows you want to preserve (e.g. select value, ... from ... where ...)
2. Apply the update logic while identifying them (e.g. select value + 1 ... from ... where ...)
3. Insert-select the result into a new table.
4. Drop the original, rename the new table to the original, and recreate all grants/synonyms/triggers/indexes/FKs/... (or truncate the original and insert-select from the new table).
Obviously this has a pretty big overhead, but if you want to update/clear millions of rows, it will be the fastest way.
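Steps 3 and 4 in SQL Server terms might be sketched as follows (a sketch only; constraints, indexes, and grants must be recreated by hand, and sp_rename takes the current name followed by the new name):

```sql
-- insert-select the de-duped, updated rows into a new table
SELECT ...   -- de-duplication/update logic from steps 1-2 goes here
INTO   MCS_ImportedData_GenericData_new
FROM   MCS_ImportedData_GenericData;

-- drop the original and rename the new table into place
DROP TABLE MCS_ImportedData_GenericData;
EXEC sp_rename 'MCS_ImportedData_GenericData_new', 'MCS_ImportedData_GenericData';

-- recreate grants/synonyms/triggers/indexes/FKs here
```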