How to Update, Insert, Delete in one MERGE query in SQL Server 2008?

I have two tables - the source and the destination. I would like to merge the source into the destination using the MERGE query (SQL Server 2008).
My setup is as follows:
Each destination record has three fields (in a real application there are more than 3, of course) - id, checksum and timestamp.
Each source record has two fields - id and checksum.
A source record is to be inserted into the destination if there is no destination record with the same id.
A destination record will be updated from the source record with the same id provided the source record checksum IS NOT NULL. It is guaranteed that if the checksum IS NOT NULL then it is different from the respective destination checksum. This is a given.
A destination record will be deleted if there is no source record with the same id.
This setup should lend itself quite well to the MERGE statement semantics, yet I am unable to implement it.
My poor attempt is documented in this SQL Fiddle
What am I doing wrong?
EDIT
BTW, a solution that is not based on MERGE is here.

create table #Destination
(
id int,
[Checksum] int,
[Timestamp] datetime
)
create table #Source
(
id int,
[Checksum] int
)
insert #Destination
values (1, 1, '1/1/2001'),
(2, 2, '2/2/2002'),
(3, 3, getdate()),
(4, 4, '4/4/2044')
insert #Source
values (1, 11),
(2, NULL),
(4, 44);
-- Insert new ids, update matched ids whose source checksum is not NULL,
-- and delete destination rows that have no source row.
merge #Destination as D
using #Source as S
    on (D.id = S.id)
when not matched by target then
    insert (id, [Checksum], [Timestamp])
    values (S.id, S.[Checksum], getdate())
when matched and S.[Checksum] is not null then
    update
    set D.[Checksum] = S.[Checksum],
        D.[Timestamp] = getdate()
when not matched by source then
    delete
output $action, inserted.*, deleted.*;
select *
from #Destination

Related

Leveraging CHECKSUM in MERGE but unable to get all rows to merge

I am having trouble getting MERGE statements to work properly, and I have recently started to try to use checksums.
In the toy example below, I cannot get this row to insert (1, 'ANDREW', 334.3) that is sitting in the staging table.
DROP TABLE TEMP1
DROP TABLE TEMP1_STAGE
-- create table
CREATE TABLE TEMP1
(
[ID] INT,
[NAME] VARCHAR(55),
[SALARY] FLOAT,
[SCD] INT
)
-- create stage
CREATE TABLE TEMP1_STAGE
(
[ID] INT,
[NAME] VARCHAR(55),
[SALARY] FLOAT,
[SCD] INT
)
-- insert vals into stage
INSERT INTO TEMP1_STAGE (ID, NAME, SALARY)
VALUES
(1, 'ANDREW', 333.3),
(2, 'JOHN', 555.3),
(3, 'SARAH', 444.3)
-- insert stage table into main table
INSERT INTO TEMP1
SELECT *
FROM TEMP1_STAGE;
-- clean up stage table
TRUNCATE TABLE TEMP1_STAGE;
-- put some new values in the stage table
INSERT INTO TEMP1_STAGE (ID, NAME, SALARY)
VALUES
(1, 'ANDREW', 334.3),
(4, 'CARL', NULL)
-- CHECKSUMS
update TEMP1_STAGE
set SCD = binary_checksum(ID, NAME, SALARY);
update TEMP1
set SCD = binary_checksum(ID, NAME, SALARY);
-- run merge
MERGE TEMP1 AS TARGET
USING TEMP1_STAGE AS SOURCE
-- match
ON (SOURCE.[ID] = TARGET.[ID])
WHEN NOT MATCHED BY TARGET
THEN INSERT (
[ID], [NAME], [SALARY], [SCD]) VALUES (
SOURCE.[ID], SOURCE.[NAME], SOURCE.[SALARY], SOURCE.[SCD]);
-- the value: (1, 'ANDREW', 334.3) is not merged in
SELECT * FROM TEMP1;
How can I use the checksum to my advantage in the MERGE?
Your issue is that the NOT MATCHED condition is only considering the ID values specified in the ON condition.
If you want duplicate, but distinct, records, add SCD to the ON condition.
If (more likely) your intent is that record ID = 1 be updated with the new SALARY, you will need to add a WHEN MATCHED AND SOURCE.SCD <> TARGET.SCD THEN UPDATE ... clause.
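As a rough sketch of that second option, the MERGE from the question could be extended as follows. It assumes SCD has already been populated on both tables, as in the question, and it inherits the collision caveat: a changed row whose checksum happens to equal the old one would be skipped.

```sql
MERGE TEMP1 AS TARGET
USING TEMP1_STAGE AS SOURCE
    ON (SOURCE.[ID] = TARGET.[ID])
-- Same ID but a different checksum: treat the row as changed and overwrite it.
WHEN MATCHED AND SOURCE.[SCD] <> TARGET.[SCD] THEN
    UPDATE SET TARGET.[NAME]   = SOURCE.[NAME],
               TARGET.[SALARY] = SOURCE.[SALARY],
               TARGET.[SCD]    = SOURCE.[SCD]
-- ID not present in the main table yet: insert it.
WHEN NOT MATCHED BY TARGET THEN
    INSERT ([ID], [NAME], [SALARY], [SCD])
    VALUES (SOURCE.[ID], SOURCE.[NAME], SOURCE.[SALARY], SOURCE.[SCD]);
```

With the sample data, the row with ID = 1 is matched on ID and its SALARY differs, so it is a candidate for the UPDATE branch rather than the INSERT branch.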
That said, the 32-bit int value returned by the BINARY_CHECKSUM() function is not sufficiently distinct to avoid collisions and unwanted missed updates. Take a look at HASHBYTES instead. See Binary_Checksum Vs HashBytes function.
Even that may not yield your intended performance gain. Assuming that you have to calculate the hash for all records in the staging table for each update cycle, you may find that it is simpler to just compare each potentially different field before the update. Something like:
WHEN MATCHED AND (SOURCE.NAME <> TARGET.NAME OR SOURCE.SALARY <> TARGET.SALARY)
THEN UPDATE ...
Even then, you need to be careful of potential NULL values and COLLATION. Both NULL <> 50000.00 and 'Andrew' <> 'ANDREW' may not give you the results you expect. It might be easiest and most reliable to just code WHEN MATCHED THEN UPDATE ....
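One way to sidestep the NULL problem, offered here as a sketch rather than a drop-in fix, is to compare the columns with EXCEPT, which treats two NULLs as equal. The column list is an assumption based on the tables in the question. Case differences such as 'Andrew' vs 'ANDREW' still depend on the column collation.

```sql
MERGE TEMP1 AS TARGET
USING TEMP1_STAGE AS SOURCE
    ON (SOURCE.[ID] = TARGET.[ID])
-- EXISTS (... EXCEPT ...) is true when at least one compared column differs,
-- including the case where only one side is NULL.
WHEN MATCHED AND EXISTS (
        SELECT SOURCE.[NAME], SOURCE.[SALARY]
        EXCEPT
        SELECT TARGET.[NAME], TARGET.[SALARY]
    ) THEN
    UPDATE SET TARGET.[NAME]   = SOURCE.[NAME],
               TARGET.[SALARY] = SOURCE.[SALARY]
WHEN NOT MATCHED BY TARGET THEN
    INSERT ([ID], [NAME], [SALARY])
    VALUES (SOURCE.[ID], SOURCE.[NAME], SOURCE.[SALARY]);
```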
Lastly, I suggest using DECIMAL instead of FLOAT for Salary.

Is there a way to populate column based on conditions stored as rows in a table

I am working on a project that has a C# front end that will be used to select a file for importing into a Microsoft SQL Server database. In the table there will be an additional column called 'recommendedAction' (tinyint, values 0-5 only).
I would like to have sql fill in the 'recommendedAction' column based on criteria in a different table.
Is there a way that when SQL is importing (SSIS or pure TSQL) it could read the values of a table and fill in the 'action' based on the criteria? Or is this something that should be done in the C# frontend?
EDIT
SQL table structure for imported data (with additional column)
Create Table ImportedData (
Column1 INT Identity,
Column2 VARCHAR(10) NOT NULL,
Column3 CHAR(6) NOT NULL,
RecommendedAction TINYINT NOT NULL
)
Table structure of recommended action criteria
Create Table RecommendedActions(
ID INT Identity,
ActionID TINYINT NOT NULL, --value to put in the RecommendedAction column if criteria is a match
CriteriaColumn VARCHAR(255) NOT NULL --Criteria to match against the records
)
Example records for RecommendedActions
ID ActionID CriteriaColumn
1 2 'Column2 LIKE ''6%'''
2 3 'Column2 LIKE ''4%'''
Now when a new set of data is imported, if Column2 has a value of '6032' it would fill in a RecommendedAction of 2
Many ways exist. For example you can insert into the tb table a value selected from the ta table according to criteria.
Example setup
create table ta(
Id int,
val int);
insert into ta(ID, val) values
(1, 30)
,(2, 29)
,(3, 28)
,(4, 27)
,(5, 26);
create table tb
(Id int,
ref int);
Example insert
-- parameters
declare @p1 int = 1,
        @p2 int = 27;
-- parameterized INSERT
insert tb(Id, ref)
values(@p1, (select ta.id from ta where ta.val = @p2));
The stored procedure below will do the job. It looks up the action value based on the Column2 parameter and inserts the row into the ImportedData table. You can execute this stored procedure from the C# code with the required parameters. I added sample EXEC statements to test it.
Sample data inserted to the RecommendedActions Table:
INSERT INTO RecommendedActions
VALUES
(2, 'Column2 LIKE ''6%''')
,(3, 'Column2 LIKE ''4%''')
Stored Procedure Implementation :
CREATE PROCEDURE Insert_ImportedData(
    @Column2 AS VARCHAR(10)
   ,@Column3 AS CHAR(3)
)
AS
BEGIN
    DECLARE @RecommendedAction AS TINYINT
    -- Character 15 of the stored criteria text is the digit inside LIKE '6%'
    SELECT @RecommendedAction = ActionID
    FROM RecommendedActions
    WHERE SUBSTRING(CriteriaColumn, 15, 1) = LEFT(@Column2, 1)
    INSERT INTO ImportedData VALUES (@Column2, @Column3, @RecommendedAction)
END
GO
This is the execute statement for the Above Stored procedure
EXEC Insert_ImportedData '43258' , 'ATT'
EXEC Insert_ImportedData '63258' , 'AOT'
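The stored procedure above depends on the digit sitting at a fixed position in the criteria text. If the criteria really are arbitrary predicates stored as rows, another option is to evaluate them with dynamic SQL after the import. The following is only a sketch under two loudly stated assumptions that are not in the question: the import loads every row with a placeholder RecommendedAction of 0, and RecommendedActions.CriteriaColumn contains trusted predicates only, since the text is concatenated into the statement.

```sql
DECLARE @ActionID TINYINT,
        @Criteria VARCHAR(255),
        @sql      NVARCHAR(MAX);

-- Walk the criteria rows in ID order; the first matching criterion wins.
DECLARE criteria_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT ActionID, CriteriaColumn
    FROM RecommendedActions
    ORDER BY ID;

OPEN criteria_cursor;
FETCH NEXT FROM criteria_cursor INTO @ActionID, @Criteria;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Assumption: 0 marks rows that have not been assigned an action yet.
    SET @sql = N'UPDATE ImportedData SET RecommendedAction = @Action '
             + N'WHERE RecommendedAction = 0 AND (' + @Criteria + N');';

    EXEC sys.sp_executesql @sql, N'@Action TINYINT', @Action = @ActionID;

    FETCH NEXT FROM criteria_cursor INTO @ActionID, @Criteria;
END

CLOSE criteria_cursor;
DEALLOCATE criteria_cursor;
```

Because the predicate text is executed as SQL, anyone who can write to RecommendedActions can run arbitrary statements, so this only makes sense if that table is tightly controlled.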
You can use SQLAlchemy in Python: load your data into a DataFrame, then append the DataFrame to the SQL table. You can set the data type of each field by passing a dictionary as the dtype argument of read_csv. Loading the data with Python works well because the bulk load is fast. Use your C# code to build the CSV file with stream I/O and use LINQ to apply your conditions to the data fields. Then use Python to load the CSV.
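import pandas as pd
from sqlalchemy import create_engine
# connectionstring is a placeholder for your SQLAlchemy database URL
engine = create_engine(connectionstring)
# the CSV has no header row, so name the columns explicitly
df = pd.read_csv("your_data.csv", header=None)
df.columns = ['field1', 'field2', 'field3']
# pass the engine created above (the original passed an undefined name)
df.to_sql(name="my_sql_table", con=engine, if_exists='append', index=False)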

SQL Merge not inserting new row

I am trying to use T-SQL Merge to check for the existence of records and update, if not then insert.
The update works fine, but the insert is not working.
Any and all help on this would be gratefully received.
DECLARE
    @OperatorID INT = 2,
    @CurrentCalendarView VARCHAR(50) = 'month';
WITH CTE AS
(
    SELECT *
    FROM dbo.OperatorOption
    WHERE OperatorID = @OperatorID
)
MERGE INTO OperatorOption AS T
USING CTE S ON T.OperatorID = S.OperatorID
WHEN MATCHED THEN
    UPDATE
    SET T.CurrentCalendarView = @CurrentCalendarView
WHEN NOT MATCHED BY TARGET THEN
    INSERT (OperatorID, PrescriptionPrintingAccountID, CurrentCalendarView)
    VALUES (@OperatorID, NULL, @CurrentCalendarView);
When would a row Selected from OperatorOption not already exist in OperatorOption?
If you're saying this code does not insert, you're right, it doesn't: either the row is there to begin with (in which case it is matched and won't be inserted), or the row is not there to begin with, in which case there is nothing in the source dataset to insert.
Does
SELECT *
FROM dbo.OperatorOption
WHERE OperatorID = @OperatorID
return anything or not?
This does not work the way you think it does. There is nothing in the source CTE.
The answer to 'was a blank dataset missing from the target' is 'No', so nothing is inserted.
To do this operation, I use this construct:
INSERT INTO dbo.OperatorOption
    (OperatorID, PrescriptionPrintingAccountID, CurrentCalendarView)
SELECT @OperatorID, NULL, @CurrentCalendarView
WHERE NOT EXISTS (
    SELECT * FROM dbo.OperatorOption
    WHERE OperatorID = @OperatorID
)
It does not matter that the values you insert come from variables. The source produces no rows, so MERGE sees nothing to insert.
You need to produce data that does not match.
Like this:
DECLARE @OperatorID INT = 3, @CurrentCalendarView VARCHAR(50) = 'month';
declare @t table (operatorID int, CurrentCalendarView varchar(50));
insert into @t values (2, 'year');
MERGE @t AS TARGET
USING (SELECT @OperatorID, @CurrentCalendarView) AS source (operatorID, CurrentCalendarView)
    on (TARGET.operatorID = Source.operatorID)
WHEN MATCHED THEN
    UPDATE SET TARGET.CurrentCalendarView = @CurrentCalendarView
WHEN NOT MATCHED BY TARGET THEN
    INSERT (OperatorID, CurrentCalendarView)
    VALUES (source.OperatorID, source.CurrentCalendarView);
select * from @t
The insert probably isn't working because your source CTE does not produce any rows. Depending on how your table is organised, you might need to select from some other source, or use a table value constructor to produce the source data.
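A minimal sketch of the table value constructor variant, written against the dbo.OperatorOption table from the question (the column list is taken from the question's INSERT; anything else about the table is assumed):

```sql
DECLARE @OperatorID INT = 2,
        @CurrentCalendarView VARCHAR(50) = 'month';

MERGE INTO dbo.OperatorOption AS T
-- The source is always exactly one row built from the variables,
-- so it exists even when the target has no row for this operator.
USING (VALUES (@OperatorID, @CurrentCalendarView))
      AS S (OperatorID, CurrentCalendarView)
    ON T.OperatorID = S.OperatorID
WHEN MATCHED THEN
    UPDATE SET T.CurrentCalendarView = S.CurrentCalendarView
WHEN NOT MATCHED BY TARGET THEN
    INSERT (OperatorID, PrescriptionPrintingAccountID, CurrentCalendarView)
    VALUES (S.OperatorID, NULL, S.CurrentCalendarView);
```

The difference from the original query is only the source: it no longer reads from the table it is merging into, so the NOT MATCHED BY TARGET branch has a row to work with.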

Not in In SQL statement?

I have a set of around 5,000 IDs in Excel, and the table has around 30,000 IDs. If I use an IN condition in the SQL statement, I get back around 4,300 of the IDs I have in Excel. But if I use NOT IN with the Excel IDs, I get 25,000+ records. I just want to find out which of the Excel IDs are missing from the table.
How do I write the SQL for this?
Example:
Excel Ids are
1,
2,
3,
4,
5,
6,
7,
8,
9,
10,
Table has IDs
1,
2,
3,
4,
6,
8,
9,
11,
12,
14,
15
Now I want to get the values 5, 7 and 10, which are in Excel but missing from the table.
Update:
What I am doing is
SELECT [GLID]
FROM [tbl_Detail]
where datasource = 'China' and ap_ID not in (5206896,
5206897,
5206898,
5206899,
5117083,
5143565,
5173361,
5179096,
5179097,
5179150)
Try this:
SELECT tableExcel.ID
FROM tableExcel
WHERE tableExcel.ID NOT IN(SELECT anotherTable.ID FROM anotherTable)
Here's an SQL Fiddle to try this: sqlfiddle.com/#!6/31af5/14
You're probably looking for EXCEPT:
SELECT Value
FROM @Excel
EXCEPT
SELECT Value
FROM @Table;
Edit:
Unlike NOT IN, EXCEPT treats NULL differently (NULL values match each other) and applies DISTINCT.
Here's your sample data:
declare @Excel Table(Value int);
INSERT INTO @Excel VALUES(1);
INSERT INTO @Excel VALUES(2);
INSERT INTO @Excel VALUES(3);
INSERT INTO @Excel VALUES(4);
INSERT INTO @Excel VALUES(5);
INSERT INTO @Excel VALUES(6);
INSERT INTO @Excel VALUES(7);
INSERT INTO @Excel VALUES(8);
INSERT INTO @Excel VALUES(9);
INSERT INTO @Excel VALUES(10);
declare @Table Table(Value int);
INSERT INTO @Table VALUES(1);
INSERT INTO @Table VALUES(2);
INSERT INTO @Table VALUES(3);
INSERT INTO @Table VALUES(4);
INSERT INTO @Table VALUES(6);
INSERT INTO @Table VALUES(8);
INSERT INTO @Table VALUES(9);
INSERT INTO @Table VALUES(11);
INSERT INTO @Table VALUES(12);
INSERT INTO @Table VALUES(14);
INSERT INTO @Table VALUES(15);
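To illustrate the NULL difference mentioned in the edit above, here is a small self-contained sketch. The table variables and values are made up for the demonstration and are not the asker's data.

```sql
DECLARE @Excel TABLE (Value int);
DECLARE @Table TABLE (Value int);

INSERT INTO @Excel VALUES (5), (7), (10);
INSERT INTO @Table VALUES (1), (NULL);

-- Returns no rows: with a NULL in the subquery result, every
-- "Value NOT IN (...)" test evaluates to UNKNOWN instead of TRUE.
SELECT Value
FROM @Excel
WHERE Value NOT IN (SELECT Value FROM @Table);

-- Returns 5, 7 and 10: EXCEPT is not affected by the NULL.
SELECT Value FROM @Excel
EXCEPT
SELECT Value FROM @Table;

-- Returns 5, 7 and 10 as well, and keeps duplicates if @Excel had any.
SELECT e.Value
FROM @Excel AS e
WHERE NOT EXISTS (SELECT 1 FROM @Table AS t WHERE t.Value = e.Value);
```

If the ID column in the real table is nullable, NOT EXISTS or EXCEPT is the safer choice of the three.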
Import your excel file into SQL Server using the Import Data Wizard found in SQL Server Management Studio.
Then you can write the following query to find any IDs which are in the file but not in the table:
SELECT id
FROM imported_table
WHERE id NOT IN (SELECT id FROM db_table)
You should move excel data to a table in SQL Server, and then do the query in SQL Server.
select distinct id from Excel where id not in (select your ids from Sqltable)
(Obviously select your ids from Sqltable is a select which returns the Ids existing on SQL Server).
You may think that moving data to SQL Server is hard to do, but, on the contrary, it's very easy:
1) create a table
CREATE TABLE ExcelIds (Id int)
2) add a new column in excel with the following formula:
="insert into ExcelIds values(" & XX & ")"
where XX is the reference to the cell in the column with excel Ids.
3) copy the "inserts" from Excel into SSMS or whatever tool you're using with SQL Server, and execute them.
Now you have 2 tables in SQL Server, so querying them is easy.
When you're done, just drop the table
DROP TABLE ExcelIds
NOTE: I didn't create a key on the SQL Server table because I suppose that the Ids can be repeated. For an ad hoc solution like this, it is not worth writing a more complex SQL query just to avoid duplicates in ExcelIds.

Insert statements fail when run against SQL Server 2008

I have to deploy my application, which was developed in VB.NET with Visual Studio 2005. The customer is using SQL Server 2008, while the application was built against SQL Server 2000.
I received the following error against SQL Server 2008:
An explicit value for identity column in 'Outgoing_Invoice' table can only be specified when column list is used and Identity Insert is ON
Here is my query for inserting data in two tables:
Dim cmd1 As New SqlCommand("Insert into Stock values(@invoice_no, @gate_pass, @exp_no, @clm_no, @category, @item_name, @weight, @units_case, 0, 0, @crtns_removed, @pieces_removed, 0, 0, @date_added, @date_removed, @inc_total_price, @out_total_price, @discount, @amount, 'Sold', @expiry_date) Insert into Outgoing_Invoice values(@invoice_no, @exp_no, @party_name, @party_code, @city, @contact, @category, @item_name, @weight, @units_case, @crtns_issued, @pieces_issued, @crtns_removed, @pieces_removed, 0, 0, @scheme, @unit_price, @out_total_price, @discount, @amount, @date_removed, @expiry_date, @order_booker, @salesman)", con)
The error is raised at cmd1.ExecuteNonQuery. Both of these tables, Stock and Outgoing_Invoice, have an identity column labelled serial that comes before the invoice number column.
The problem only arose when insert was tried on SQL Server 2008. When run against SQL Server 2000, it works as expected.
What can be the possible reason for this issue and how can it be resolved?
Your INSERT query needs to specify the column names before the VALUES clause otherwise these will be attempted in column order as defined in the DB (which is subject to change - this is not fixed).
Since you are getting an error it appears that the INSERT tries to insert into the identity column.
In general - when not inserting to all columns, you must specify column names. I would always specify column names as best practice.
So - specify a column list:
INSERT INTO aTable
(col1, col2)
VALUES
(@val1, @val2)
The insert into Outgoing_Invoice has one too many parameters.
This will work just fine. Values 1 and 2 go to C1 and C2, and ID is assigned automatically.
declare @T table
(
    ID int identity,
    C1 int,
    C2 int
)
insert into @T values (1, 2)
This will give the exact error you have
insert into @T values (1, 2, 3)
Check the table structure in your SQL Server 2000. It probably has one extra field. That would explain why it is working there.
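To compare the two servers, a query along these lines can be run on both the SQL Server 2000 and the SQL Server 2008 database and the results compared side by side. It is a sketch that assumes the tables are named exactly as in the question.

```sql
-- Lists each column in table order and flags the identity column (1 = identity).
SELECT c.TABLE_NAME,
       c.ORDINAL_POSITION,
       c.COLUMN_NAME,
       c.DATA_TYPE,
       COLUMNPROPERTY(OBJECT_ID(c.TABLE_SCHEMA + '.' + c.TABLE_NAME),
                      c.COLUMN_NAME, 'IsIdentity') AS IsIdentity
FROM INFORMATION_SCHEMA.COLUMNS AS c
WHERE c.TABLE_NAME IN ('Stock', 'Outgoing_Invoice')
ORDER BY c.TABLE_NAME, c.ORDINAL_POSITION;
```

If the column counts differ between the two servers, an INSERT without a column list will line its values up differently on each of them.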
You should specify the column list explicitly, especially when the table has an IDENTITY column.
I.e. your query should look like this:
Insert into Stock
(
    here,
    comes,
    your,
    real,
    column,
    names
)
values
(
    @invoice_no,
    @gate_pass,
    @exp_no,
    @clm_no,
    @category,
    @item_name,
    @weight,
    @units_case,
    0,
    0,
    @crtns_removed,
    @pieces_removed,
    0,
    0,
    @date_added,
    @date_removed,
    @inc_total_price,
    @out_total_price,
    @discount,
    @amount,
    'Sold',
    @expiry_date
)
Insert into Outgoing_Invoice
(
    here,
    comes,
    your,
    real,
    column,
    names,
    too
)
values
(
    @invoice_no,
    @exp_no,
    @party_name,
    @party_code,
    @city,
    @contact,
    @category,
    @item_name,
    @weight,
    @units_case,
    @crtns_issued,
    @pieces_issued,
    @crtns_removed,
    @pieces_removed,
    0,
    0,
    @scheme,
    @unit_price,
    @out_total_price,
    @discount,
    @amount,
    @date_removed,
    @expiry_date,
    @order_booker,
    @salesman
)