Leveraging CHECKSUM in MERGE but unable to get all rows to merge - sql

I am having trouble getting MERGE statements to work properly, and I have recently started to try to use checksums.
In the toy example below, I cannot get this row to insert (1, 'ANDREW', 334.3) that is sitting in the staging table.
DROP TABLE TEMP1
DROP TABLE TEMP1_STAGE
-- create table
CREATE TABLE TEMP1
(
[ID] INT,
[NAME] VARCHAR(55),
[SALARY] FLOAT,
[SCD] INT
)
-- create stage
CREATE TABLE TEMP1_STAGE
(
[ID] INT,
[NAME] VARCHAR(55),
[SALARY] FLOAT,
[SCD] INT
)
-- insert vals into stage
INSERT INTO TEMP1_STAGE (ID, NAME, SALARY)
VALUES
(1, 'ANDREW', 333.3),
(2, 'JOHN', 555.3),
(3, 'SARAH', 444.3)
-- insert stage table into main table
INSERT INTO TEMP1
SELECT *
FROM TEMP1_STAGE;
-- clean up stage table
TRUNCATE TABLE TEMP1_STAGE;
-- put some new values in the stage table
INSERT INTO TEMP1_STAGE (ID, NAME, SALARY)
VALUES
(1, 'ANDREW', 334.3),
(4, 'CARL', NULL)
-- CHECKSUMS
update TEMP1_STAGE
set SCD = binary_checksum(ID, NAME, SALARY);
update TEMP1
set SCD = binary_checksum(ID, NAME, SALARY);
-- run merge
MERGE TEMP1 AS TARGET
USING TEMP1_STAGE AS SOURCE
-- match
ON (SOURCE.[ID] = TARGET.[ID])
WHEN NOT MATCHED BY TARGET
THEN INSERT (
[ID], [NAME], [SALARY], [SCD]) VALUES (
SOURCE.[ID], SOURCE.[NAME], SOURCE.[SALARY], SOURCE.[SCD]);
-- the value: (1, 'ANDREW', 334.3) is not merged in
SELECT * FROM TEMP1;
How can I use the checksum to my advantage in the MERGE?

Your issue is that the NOT MATCHED condition is only considering the ID values specified in the ON condition.
If you want rows that share an ID but differ in content to be stored as separate records, add SCD to the ON condition.
If (more likely) your intent is that record ID = 1 be updated with the new SALARY, you will need to add a WHEN MATCHED AND SOURCE.SCD <> TARGET.SCD THEN UPDATE ... clause.
That said, the 32-bit int value returned by the binary_checksum() function is not sufficiently distinct to avoid collisions, and a collision means a missed update. Take a look at HASHBYTES instead. See Binary_Checksum Vs HashBytes function.
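As a rough sketch of the HASHBYTES alternative, the query below computes a SHA-256 hash per staging row. The column alias ROW_HASH, the '|' delimiter, and float style 2 are illustrative assumptions, not part of the original schema. The result is VARBINARY(32), so it would not fit the existing INT SCD column without a type change. Note also that CONCAT treats NULL as an empty string, so NULL and '' hash identically in this form.

```sql
-- Sketch only: hash the comparable columns of each staging row.
-- ROW_HASH and the '|' delimiter are illustrative choices.
SELECT [ID], [NAME], [SALARY],
       HASHBYTES('SHA2_256',
                 CONCAT([ID], '|', [NAME], '|',
                        CONVERT(VARCHAR(30), [SALARY], 2))) AS ROW_HASH
FROM TEMP1_STAGE;
```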
Even that may not yield your intended performance gain. Assuming that you have to calculate the hash for all records in the staging table for each update cycle, you may find that it is simpler to just compare each potentially different field before the update. Something like:
WHEN MATCHED AND (SOURCE.NAME <> TARGET.NAME OR SOURCE.SALARY <> TARGET.SALARY)
THEN UPDATE ...
Even then, you need to be careful of potential NULL values and collation. NULL <> 50000.00 evaluates to UNKNOWN rather than true, so that row would be skipped, and under a case-insensitive collation 'Andrew' <> 'ANDREW' is false. It might be easiest and most reliable to just code WHEN MATCHED THEN UPDATE ....
Lastly, I suggest using DECIMAL instead of FLOAT for Salary.
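Putting these points together, one possible shape for the MERGE against the question's TEMP1 and TEMP1_STAGE tables is sketched below. It uses the EXISTS (... EXCEPT ...) idiom as a NULL-safe "is anything different" test; that idiom is a swapped-in technique rather than something from the answer above, and it still compares strings under the columns' collation.

```sql
-- Sketch: update changed rows and insert new ones in a single MERGE.
MERGE TEMP1 AS TARGET
USING TEMP1_STAGE AS SOURCE
    ON SOURCE.[ID] = TARGET.[ID]
-- EXCEPT treats two NULLs as equal, so this is true only on a real difference
WHEN MATCHED AND EXISTS (
        SELECT SOURCE.[NAME], SOURCE.[SALARY]
        EXCEPT
        SELECT TARGET.[NAME], TARGET.[SALARY])
    THEN UPDATE SET
        TARGET.[NAME]   = SOURCE.[NAME],
        TARGET.[SALARY] = SOURCE.[SALARY],
        TARGET.[SCD]    = SOURCE.[SCD]
WHEN NOT MATCHED BY TARGET
    THEN INSERT ([ID], [NAME], [SALARY], [SCD])
         VALUES (SOURCE.[ID], SOURCE.[NAME], SOURCE.[SALARY], SOURCE.[SCD]);

SELECT * FROM TEMP1;
```

With the question's data, the row for ID 1 should be updated to the new salary and the row for ID 4 inserted, while IDs 2 and 3 are left alone because no WHEN NOT MATCHED BY SOURCE clause is present.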

Related

While updating table1, how do I INSERT to table2 for every change in table 1?

I have a MEMBER table and a NOTIFICATION table. On the client side, I list all of the records in the MEMBER table; there is a points column, shown as a text input. After I change the values for some members, I click the save button and this updates the records in my MEMBER table. That part works.
What I want to accomplish is this: for every record whose points value has changed, I want to INSERT a record into my notifications table.
I couldn't think of anything. How can I approach this problem?
For notifications I made 3 tables by following the article here.
Use the OUTPUT clause instead of a trigger; triggers are bad.
You need the condition "where data_old <> data_new" because if you update a column to the same value, SQL Server still reports the row as changed, even though the value hasn't changed.
create table #example (id int identity(1,1) not null, data nvarchar(max));
insert into #example (data) values ('value 1'),('value 2'), ('value 3');
create table #audit (id int, data_old nvarchar(max), data_new nvarchar(max), [When] datetime not null default (getdate()));
insert into #audit (id, data_old, data_new)
select id, data_old, data_new
from (
update #example
set data = 'value changed'
output inserted.id, deleted.data as data_old, inserted.data as data_new
where id = 2
)changed (id, data_old, data_new)
where data_old <> data_new
select * from #audit
This leaves one row in #audit: id 2, data_old 'value 2', data_new 'value changed', plus the timestamp in [When].
You have described what a trigger does.
create trigger trig_member_insert on members after update
as
begin
insert into notifications ( . . . )
select . . ., i.points as new_points, d.points as old_points -- what you want to insert
from inserted i join
deleted d -- the pre-update rows; there is no "updated" pseudo-table
on i.member_id = d.member_id
where d.points <> i.points
end;
Storing something called "points" as a string seems like a very poor choice. It sounds like a number.
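For a concrete picture of the trigger approach, here is a minimal self-contained sketch. The table layouts, the column names (member_id, old_points, new_points, created_at), the trigger name, and the INT type for points are all assumptions made for illustration; GO is the batch separator used by SSMS and sqlcmd, needed because CREATE TRIGGER must start its own batch.

```sql
-- Hypothetical schema for illustration only.
CREATE TABLE dbo.members (member_id INT PRIMARY KEY, points INT);
CREATE TABLE dbo.notifications (
    member_id  INT,
    old_points INT,
    new_points INT,
    created_at DATETIME NOT NULL DEFAULT (GETDATE())
);
GO
CREATE TRIGGER dbo.trig_member_points ON dbo.members AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- inserted = rows after the update, deleted = the same rows before it
    INSERT INTO dbo.notifications (member_id, old_points, new_points)
    SELECT i.member_id, d.points, i.points
    FROM inserted AS i
    JOIN deleted  AS d ON d.member_id = i.member_id
    WHERE EXISTS (SELECT i.points EXCEPT SELECT d.points);  -- NULL-safe "changed" test
END;
GO
INSERT INTO dbo.members (member_id, points) VALUES (1, 10), (2, 20);

-- Touches both rows, but only member 1 actually changes value.
UPDATE dbo.members
SET points = CASE member_id WHEN 1 THEN 15 ELSE points END;

SELECT member_id, old_points, new_points FROM dbo.notifications;
```

Because the filter compares old and new values, only the row whose points really changed should produce a notification, even though the UPDATE statement touched every row.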

Using OUTPUT INTO with from_table_name in an INSERT statement [duplicate]

Microsoft's OUTPUT Clause documentation says that you are allowed to use from_table_name in the OUTPUT clause's column name.
There are two examples of this:
Using OUTPUT INTO with from_table_name in an UPDATE statement
Using OUTPUT INTO with from_table_name in a DELETE statement
Is it possible to also use it in an INSERT statement?
INSERT INTO T ( [Name] )
OUTPUT S.Code, inserted.Id INTO #TMP -- The multi-part identifier "S.Code" could not be bound.
SELECT [Name] FROM S;
Failing example using table variables
-- A table to insert into.
DECLARE @Item TABLE (
[Id] [int] IDENTITY(1,1),
[Name] varchar(100)
);
-- A table variable to store inserted Ids and related Codes
DECLARE @T TABLE (
Code varchar(10),
ItemId int
);
-- Insert some new items
WITH S ([Name], Code) AS (
SELECT 'First', 'foo'
UNION ALL SELECT 'Second', 'bar'
-- Etc.
)
INSERT INTO @Item ( [Name] )
OUTPUT S.Code, inserted.Id INTO @T -- The multi-part identifier "S.Code" could not be bound.
SELECT [Name] FROM S;
No, because an INSERT doesn't have a FROM; it has a set of values prepared either by the VALUES keyword or by a query. Even though that query has a FROM, think of it as having already run and been turned into a block of values by the time the insert happens; there is no S.Code any more.
If you want to output something from the table that drove the insert, you'll need to use a MERGE statement that never matches any records (so it only inserts). Alternatively, insert all your data into #tmp first and then insert from #tmp into the real table: #tmp is still the record of the rows that were inserted, it was just created to drive the insert rather than as a consequence of it (with the caveat that it won't contain calculated columns).
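The never-matching MERGE can be sketched as follows, reusing the table variables and CTE from the question. The alias tgt and the ON 1 = 0 predicate are illustrative choices; any condition that is always false would serve.

```sql
DECLARE @Item TABLE (
    [Id] int IDENTITY(1,1),
    [Name] varchar(100)
);
DECLARE @T TABLE (
    Code varchar(10),
    ItemId int
);

WITH S ([Name], Code) AS (
    SELECT 'First', 'foo'
    UNION ALL SELECT 'Second', 'bar'
)
MERGE INTO @Item AS tgt
USING S ON 1 = 0                       -- never matches, so every source row is inserted
WHEN NOT MATCHED BY TARGET THEN
    INSERT ([Name]) VALUES (S.[Name])
OUTPUT S.Code, inserted.Id INTO @T (Code, ItemId);  -- MERGE's OUTPUT can see source columns

SELECT Code, ItemId FROM @T;
```

Unlike INSERT, the OUTPUT clause of a MERGE may reference columns of the USING source, which is what makes the pairing of Code with the generated Id possible.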

SQL Merge not inserting new row

I am trying to use T-SQL Merge to check for the existence of records and update, if not then insert.
The update works fine, but the insert is not working.
Any and all help on this would be gratefully received.
DECLARE
@OperatorID INT = 2,
@CurrentCalendarView VARCHAR(50) = 'month';
WITH CTE AS
(
SELECT *
FROM dbo.OperatorOption
WHERE OperatorID = @OperatorID
)
MERGE INTO OperatorOption AS T
USING CTE S ON T.OperatorID = S.OperatorID
WHEN MATCHED THEN
UPDATE
SET T.CurrentCalendarView = @CurrentCalendarView
WHEN NOT MATCHED BY TARGET THEN
INSERT (OperatorID, PrescriptionPrintingAccountID, CurrentCalendarView)
VALUES (@OperatorID, NULL, @CurrentCalendarView);
When would a row Selected from OperatorOption not already exist in OperatorOption?
If you're saying this code does not insert - you're right it doesn't because the row has to be there to begin with (in which case it won't insert), or the row is not there to begin with, in which case there is nothing in the source dataset to insert.
Does
SELECT *
FROM dbo.OperatorOption
WHERE OperatorID = @OperatorID
return anything or not?
This does not work the way you think it does. There is nothing in the source CTE.
The answer to 'was a blank dataset missing from the target' is 'No' so nothing is inserted
To do this operation, I use this construct:
INSERT INTO dbo.OperatorOption
(OperatorID, PrescriptionPrintingAccountID, CurrentCalendarView)
SELECT @OperatorID, NULL, @CurrentCalendarView
WHERE NOT EXISTS (
SELECT * FROM dbo.OperatorOption
WHERE OperatorID = @OperatorID
)
It does not matter that you are inserting the values from variables. Because the source is empty, MERGE sees nothing to insert.
You need a source that produces a row which does not match the target.
Like this:
DECLARE @OperatorID INT = 3, @CurrentCalendarView VARCHAR(50) = 'month';
declare @t table (operatorID int, CurrentCalendarView varchar(50));
insert into @t values (2, 'year');
MERGE @t AS TARGET
USING (SELECT @OperatorID, @CurrentCalendarView) AS source (operatorID, CurrentCalendarView)
on (TARGET.operatorID = Source.operatorID)
WHEN MATCHED THEN
UPDATE SET TARGET.CurrentCalendarView = @CurrentCalendarView
WHEN NOT MATCHED BY TARGET THEN
INSERT (OperatorID, CurrentCalendarView)
VALUES (source.OperatorID, source.CurrentCalendarView);
select * from @t
The insert probably isn't working because your source CTE does not produce any rows. Depending on how your table is organised, you might need to select from some other source, or use a table value constructor to produce the source data.
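A table value constructor as the MERGE source might look like the sketch below, written against the question's dbo.OperatorOption table and assuming its columns are as listed in the question's INSERT. The source always contains exactly one row built from the variables, so either the UPDATE or the INSERT branch fires.

```sql
DECLARE @OperatorID INT = 2,
        @CurrentCalendarView VARCHAR(50) = 'month';

MERGE INTO dbo.OperatorOption AS T
USING (VALUES (@OperatorID, @CurrentCalendarView))
      AS S (OperatorID, CurrentCalendarView)   -- one-row source, independent of the target
   ON T.OperatorID = S.OperatorID
WHEN MATCHED THEN
    UPDATE SET T.CurrentCalendarView = S.CurrentCalendarView
WHEN NOT MATCHED BY TARGET THEN
    INSERT (OperatorID, PrescriptionPrintingAccountID, CurrentCalendarView)
    VALUES (S.OperatorID, NULL, S.CurrentCalendarView);
```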

Set Identity ON with a merge statement

I am inserting and deleting elements in a table. As a result, when I insert a new element it gets a new id number, but that id is not the last id + 1. For example: the last id is 5, I insert 5 elements and then delete them; the next id will be 11, but I need 6. Here is my code:
CREATE TABLE #FC
(
Code varchar(25),
Description varchar(50),
Category varchar(10),
CreatedDate datetime,
LastModifiedDate datetime
);
--Adding just one record
INSERT INTO #FC (Code, Description, Category, CreatedDate, LastModifiedDate)
VALUES ('DELETE_MEMBER', 'Delete Member', 'POLICY', @Now, @Now);
SET IDENTITY_INSERT [dbo].[Function_Code] ON;
MERGE
INTO [dbo].[Function_Code] AS T
USING #FC AS S
ON (T.Code = S.Code) AND (T.Description = S.Description) AND(T.Category = S.Category)
WHEN MATCHED THEN
UPDATE SET
[Code] = S.[Code]
, [Description] = S.Description
, [Category] = S.Category
, [CreatedDate] = S.CreatedDate
, [LastModifiedDate] = S.LastModifiedDate
WHEN NOT MATCHED THEN
INSERT (Code, Description, Category, CreatedDate, LastModifiedDate)
VALUES(S.Code, S.Description, S.Category, S.CreatedDate, S.LastModifiedDate)
;
SET IDENTITY_INSERT [dbo].[Function_Code] OFF;
An identity is a technical field that you should not handle yourself. If you want to manage the sequence yourself, then don't use an identity field.
Nevertheless, if you really want to do it, you'll have to reseed the table to the desired value after your delete and before your next insert:
DELETE YourTable -- your delete statement
DECLARE @n INT;
SELECT @n = MAX(YourId) FROM YourTable
DBCC CHECKIDENT ('YourTable', RESEED, @n)
INSERT YourTable -- your insert statement
What you are asking is dangerous. If you make a column an identity column, don't touch it; let SQL Server do its job. Otherwise you can start getting primary key errors. The identity column is ready to insert 11. If you insert six through eleven yourself by running your code multiple times, you can get a primary key error the next time the identity tries to insert a row into the table.
As Thomas Haratyk said you can reseed your table. Or you can use:
select MAX(YourId) + 1 FROM YourTable
and insert that into your identity column if you are sure you will always insert an id that has already been used by the identity column.
However, if you are commonly overwriting the default identity behavior, it may be better to manage this column yourself because deleting from an identity column results in gaps by default.
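The reseed approach can be tried end to end with a throwaway table. The table dbo.ReseedDemo and its columns are hypothetical, and this is only a sketch of the mechanics, not a recommendation to renumber identities in production.

```sql
-- Hypothetical demo table.
CREATE TABLE dbo.ReseedDemo (
    Id   INT IDENTITY(1,1) PRIMARY KEY,
    Code VARCHAR(25)
);

INSERT INTO dbo.ReseedDemo (Code) VALUES ('A'), ('B'), ('C'), ('D'), ('E');
INSERT INTO dbo.ReseedDemo (Code) VALUES ('F'), ('G');   -- consumes two more identity values
DELETE FROM dbo.ReseedDemo WHERE Code IN ('F', 'G');     -- leaves a gap at the end

-- Move the identity back to the highest Id still present.
DECLARE @n INT;
SELECT @n = ISNULL(MAX(Id), 0) FROM dbo.ReseedDemo;
DBCC CHECKIDENT ('dbo.ReseedDemo', RESEED, @n);

-- The next insert is expected to take @n + 1 rather than skipping the gap.
INSERT INTO dbo.ReseedDemo (Code) VALUES ('H');
SELECT Id, Code FROM dbo.ReseedDemo ORDER BY Id;
```

This only closes a gap at the end of the sequence; gaps in the middle remain, and concurrent inserts between the MAX lookup and the reseed could still collide.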

How to Update, Insert, Delete in one MERGE query in Sql Server 2008?

I have two tables - the source and the destination. I would like to merge the source into the destination using the MERGE query (SQL Server 2008).
My setup is as follows:
Each destination record has three fields (in a real application there are more than 3, of course) - id, checksum and timestamp.
Each source record has two fields - id and checksum.
A source record is to be inserted into the destination if there is no destination record with the same id.
A destination record will be updated from the source record with the same id provided the source record checksum IS NOT NULL. It is guaranteed that if the checksum IS NOT NULL then it is different from the respective destination checksum. This is a given.
A destination record will be deleted if there is no source record with the same id.
This setup should lend itself quite well to the MERGE statement semantics, yet I am unable to implement it.
My poor attempt is documented in this SQL Fiddle
What am I doing wrong?
EDIT
BTW, a non-MERGE-based solution is here.
create table #Destination
(
id int,
[Checksum] int,
[Timestamp] datetime
)
create table #Source
(
id int,
[Checksum] int
)
insert #Destination
values (1, 1, '1/1/2001'),
(2, 2, '2/2/2002'),
(3, 3, getdate()),
(4, 4, '4/4/2044')
insert #Source
values (1, 11),
(2, NULL),
(4, 44);
merge #destination as D
using #Source as S
on (D.id = S.id)
when not matched by Target then
Insert (id, [Checksum], [Timestamp])
Values (s.id, s.[Checksum], Getdate())
when matched and S.[Checksum] is not null then
Update
set D.[Checksum]=S.[Checksum],
D.[Timestamp]=Getdate()
when not matched by Source then
Delete
Output $action, inserted.*,deleted.*;
select *
from #Destination