Merge with primary key violation - sql

I have a file based import thingy where the users can post files to be imported in the database. New records are inserted and records with an already existing Id are updated.
After posting a file with
ID NAME
5 Silly
they can correct this by posting a new file with
ID NAME
5 Sally
I have a bulk insert (C# Windows service) of the file into a bulk table (SQL Server Azure v12). The files can contain millions of rows, so I'd like to avoid iterating through rows. After the bulk insert I have an SP that does a merge: it updates already existing rows and inserts new ones.
The problem I've come across is when the users post a new record and a correction of the same record in the same file. I get a PRIMARY KEY VIOLATION on the target table.
Is there a nice way to solve this?
Here's an example:
--drop table #bulk
--drop table #target
create table #bulk(
id int,
name varchar(10)
)
insert into #bulk values (1,'John')
insert into #bulk values (2,'Sally')
insert into #bulk values (3,'Paul')
insert into #bulk values (4,'Gretchen')
insert into #bulk values (5,'Penny')
insert into #bulk values (5,'Peggy')
create table #target(
id int not null,
name varchar(10),
primary key (id))
merge #target as target
using(select id, name from #bulk) as bulktable
on target.id = bulktable.id
when matched then update
set target.name = bulktable.name
when not matched then
insert(id, name) values (bulktable.id, bulktable.name);

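The approach below keeps only the latest value of name for each id.
You need a new create script for #bulk, with an identity column that records the order in which the rows were loaded: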
CREATE TABLE #bulk
(
row_id int identity(1,1),
id int,
name varchar(10)
)
This is the script you can use with the new bulk table:
;WITH CTE as
(
SELECT
id, name,
row_number() over (partition by id order by row_id desc) rn
FROM #bulk
), CTE2 as
(
SELECT id, name
FROM CTE
WHERE rn = 1
)
MERGE #target as target
USING CTE2 as bulktable
on target.id = bulktable.id
WHEN matched and
exists(SELECT target.name except SELECT bulktable.name)
-- EXCEPT treats NULLs as equal, so this is a null-safe "names differ" test.
-- Without NULLs it could simply have been:
-- matched and target.name <> bulktable.name
THEN update
SET target.name = bulktable.name
WHEN not matched THEN
INSERT(id, name) VALUES (bulktable.id, bulktable.name);
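To see which row the ROW_NUMBER() filter keeps before running the MERGE, a quick preview query can help. This is only a sketch: #bulk_demo is a made-up table name so it does not clash with #bulk, and the three rows are a cut-down copy of the sample data from the question.

```sql
-- Hypothetical demo table with the same shape as the new #bulk.
CREATE TABLE #bulk_demo
(
    row_id int identity(1,1),
    id int,
    name varchar(10)
);

-- Separate INSERTs so row_id follows the order of the statements.
INSERT INTO #bulk_demo (id, name) VALUES (4, 'Gretchen');
INSERT INTO #bulk_demo (id, name) VALUES (5, 'Penny');
INSERT INTO #bulk_demo (id, name) VALUES (5, 'Peggy');

-- Same filter as the CTE above: the highest row_id per id wins.
SELECT id, name
FROM (
    SELECT id, name,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY row_id DESC) AS rn
    FROM #bulk_demo
) AS ranked
WHERE rn = 1
ORDER BY id;
-- Returns (4, 'Gretchen') and (5, 'Peggy'); the earlier 'Penny' row is dropped.

DROP TABLE #bulk_demo;
```

Since the source of the MERGE then has at most one row per id, the same key can no longer be inserted twice in one statement.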


How to insert a query into SQLite with an autoincrementing value for each row

Suppose I am inserting the following queryset into a new table in SQLite:
CREATE TABLE queryset_cache AS
SELECT ROW_NUMBER() over () AS rowid, * FROM mytable ORDER BY product;
Is it possible to either:
Set the rowid as auto-incrementing PK in sqlite from the insert, or;
Exclude the rowid and have SQLite auto-add in an autoincrementing primary key for each inserted record.
How would this be done?
Currently, without that when I do the insert, the rowid is not indexed.
rowid is already there. You can just do:
CREATE TABLE queryset_cache AS
SELECT t.*
FROM mytable t
ORDER BY product;
You will see it if you do:
SELECT rowid, t.*
FROM queryset_cache t;
Here is a db<>fiddle
Auto increment should solve this. Documentation here:
https://www.sqlite.org/autoinc.html
Create source table:
create table sourceTable (oldID integer, data TEXT);
Add source data:
insert into sourceTable values(7, 'x');
insert into sourceTable values(8, 'y');
insert into sourceTable values(9, 'z');
Create target table with auto-increment:
create table target(newID INTEGER PRIMARY KEY AUTOINCREMENT, data TEXT);
Move data from source to target:
insert into target select null, data from sourceTable
If we have a table like:
create table employee (empID integer, name text, address text);
insert data into it, then create the table you want to copy the employee data into:
create table newEmployee (newempID integer PRIMARY KEY, name text, address text);
Copy the data to the newEmployee table:
insert into newEmployee select * from employee;
Here select * from employee copies all the columns, including the existing empID values.
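If the goal is for SQLite to generate fresh keys instead of copying the old empID values, one option is to leave the key column out of the insert. The sketch below assumes a fresh database (so the CREATE statements do not collide with existing tables), and the two employee rows are invented for illustration.

```sql
CREATE TABLE employee (empID integer, name text, address text);
INSERT INTO employee VALUES (17, 'Ann', 'Oslo');   -- made-up sample rows
INSERT INTO employee VALUES (42, 'Bob', 'Rome');

-- INTEGER PRIMARY KEY makes newempID an alias for the rowid.
CREATE TABLE newEmployee (newempID integer PRIMARY KEY, name text, address text);

-- newempID is omitted, so SQLite assigns it for every inserted row.
INSERT INTO newEmployee (name, address)
SELECT name, address
FROM employee
ORDER BY name;

-- On an empty newEmployee table the generated keys start at 1.
SELECT newempID, name, address FROM newEmployee;
```

Because newempID is the table's INTEGER PRIMARY KEY, lookups on it use the rowid directly and no separate index is needed.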

Generate ID for duplicate values in sql server

I found the following link on assigning an identical ID to duplicates in SQL Server.
My understanding is that there is no SQL Server function that generates it automatically, other than the insert and update queries in the linked answer. Is that true? If so, what would the trigger look like if, for example, someone inserts data into MyTable and the insert and update queries from the link should then run:
Assign identical ID to duplicates in SQL server
INSERT INTO secondTable (word) SELECT distinct word FROM MyTable;
UPDATE MyTable SET ID = (SELECT id from secondTable where MyTable.word = secondTable.word)
thanks,
S
Is this what you want? I can't think of an "automatic" solution that would just increase the Id for new words.
CREATE TABLE MyTable (
Id INT NOT NULL,
Word NVARCHAR(255) NOT NULL,
PRIMARY KEY (Id, Word)); -- primary key will make it impossible to have more than one combination of word and id
DECLARE @word NVARCHAR(255) = 'Hello!';
-- Get existing id or calculate a new id
DECLARE @Id INT = (SELECT Id FROM MyTable WHERE Word = @word);
-- ISNULL covers the empty-table case, where MAX(Id) is NULL
IF(@Id IS NULL) SET @Id = (SELECT ISNULL(MAX(Id), 0) + 1 FROM MyTable);
INSERT INTO MyTable (Id, Word)
VALUES (@Id, @word);
SELECT * FROM MyTable
If for some reason you can't have id and word as a combined primary key, you can use a unique index to make sure that each combination occurs only once.
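A minimal sketch of the unique index variant follows. The table name MyTableNoPk and the index name UX_MyTableNoPk_Id_Word are made up for this example; the columns are the same as in MyTable above.

```sql
-- Same columns as MyTable, but without the combined primary key.
CREATE TABLE MyTableNoPk (
    Id INT NOT NULL,
    Word NVARCHAR(255) NOT NULL
);

-- The unique index rejects a second row with the same (Id, Word) pair.
CREATE UNIQUE INDEX UX_MyTableNoPk_Id_Word
    ON MyTableNoPk (Id, Word);

INSERT INTO MyTableNoPk (Id, Word) VALUES (1, N'Hello!');
-- Running the same INSERT again would fail with a duplicate key error.
```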

SSIS Staging Table to Normalized form

I could be down the wrong path with this. However, here goes. I am trying to take multiple excel sheets and load them into SQL Server using SSIS.
Excel sheet:
RQ|Descr|PartNum|Manufacturer|...
I am loading this into a staging table with a couple of derived columns:
RQ|Descr|PartNum|Manufacturer|Origin|DateTime|...
This is no big deal, I am able to do this easily. However, the problem is how to get the data from the staging table to the correct table and ensuring FK constraints are followed. See below for an illustration.
My goal is to take RQ|Descr|PartNum|Manufacturer|Origin|DateTime|...
and populate multiple tables
[t1] id|RQ|Descr|Origin|DateTime
[t2] id|t1_id|PartNum|Manufacturer
[t3] id|t1_id|...
I have tried MERGE however I am unsure how to keep the FK relationship.
MERGE INTO spin_item AS targ
USING ssis_stage AS src ON 1=0 -- always generates "not matched by target"
WHEN NOT MATCHED BY TARGET THEN
-- INSERT into spin_item:
INSERT (description, reqqty, price, origin, datetime, exclude, status, siteid, production, repairable)
VALUES (src.description, src.rq, src.price, src.origin, GETDATE(), 0, 'N', '', 0, 0)
-- INSERT into spin_part:
OUTPUT inserted.ID, src.manufacturer, src.partnum
INTO spin_part (ID, src.manufacturer, src.partnum);
I have looked into this SSIS : Using multicast to enter data into 2 RELATED destinations but this is for a one-to-many relationship. So, I am not sure how to populate my t1 table and use the id to populate t2, t3 from the staging table.
EDIT: Below, seems to be a working solution. However, I am not sure that it is a good solution.
BEGIN
SET IDENTITY_INSERT dbo.spin_item ON
--Insert into spin_item
MERGE INTO spin_item AS targ
USING ssis_stage AS src ON 1=0
WHEN NOT MATCHED BY TARGET THEN
INSERT (id, description, reqqty, price, origin, datetime, exclude, status, siteid, production, repairable)
VALUES (src.id, src.description, src.rq, src.price, src.origin, GETDATE(), 0, 'N', '', 0, 0);
SET IDENTITY_INSERT dbo.spin_item OFF
--Insert into spin_part
MERGE INTO spin_part AS targ
USING ssis_stage AS src ON 1=0
WHEN NOT MATCHED BY TARGET AND src.partnum IS NOT NULL THEN
INSERT (itemid_id, manufacturer, partnum, catalognum, [primary])
VALUES (src.id, src.manufacturer, src.partnum, src.partnum, 1);
--Insert into spin_stock
MERGE INTO spin_stock AS targ
USING ssis_stage AS src ON 1=0
WHEN NOT MATCHED BY TARGET AND src.stock IS NOT NULL THEN
INSERT (itemid_id, stocknum)
VALUES (src.id, src.stock);
--Insert into spin_collaboration
MERGE INTO spin_collaboration AS targ
USING ssis_stage AS src ON 1=0
WHEN NOT MATCHED BY TARGET AND src.notes IS NOT NULL THEN
INSERT (itemid_id, comment, datetime)
VALUES (src.id, src.notes, GETDATE());
DELETE FROM ssis_stage WHERE id > 0 --Instead of Truncate since auto_increment will reset.
END
You can create an ID column on your staging table, based off your target tables that is then used as the FK in each table insert:
declare @source table (ID int, a int, b int, c int);
insert into @source values
(null,1,1,1)
,(null,1,1,2)
,(null,1,2,2)
,(null,5,3,2)
,(null,7,1,2)
,(null,2,1,2);
declare @target1 table (ID int, a int);
insert into @target1 values
(1,5)
,(2,6)
,(3,99);
declare @target2 table (ID int, b int, c int);
insert into @target2 values
(1,3,2)
,(2,9,7)
,(3,57,3);
update s
set ID = ss.IDNew
from @source s
inner join (
select row_number() over (order by a,b,c) + (select max(ID) from @target1) as IDNew
,a
,b
,c
from @source
) ss
on(s.a = ss.a
and s.b = ss.b
and s.c = ss.c
);
select * from @target1;
select * from @source;
insert into @target1
select ID
,a
from @source;
insert into @target2
select ID
,b
,c
from @source;
select * from @target1;
select * from @target2;
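A different technique from the one above, and closer to the MERGE attempt in the question, is to let spin_item generate its own IDs and capture them with OUTPUT into a mapping table. The sketch below is untested against the real schema and makes several assumptions: ssis_stage has a key column named id (as the EDIT in the question suggests), spin_item.id is an identity column, and @map, stage_id and item_id are names invented here.

```sql
-- Hypothetical mapping from staging row to generated spin_item id.
DECLARE @map TABLE (stage_id INT, item_id INT);

-- ON 1 = 0 never matches, so every staging row is inserted.
-- Unlike INSERT, MERGE lets OUTPUT read columns from the source (src.id).
MERGE INTO spin_item AS targ
USING ssis_stage AS src ON 1 = 0
WHEN NOT MATCHED BY TARGET THEN
    INSERT (description, reqqty, price, origin, datetime, exclude, status, siteid, production, repairable)
    VALUES (src.description, src.rq, src.price, src.origin, GETDATE(), 0, 'N', '', 0, 0)
OUTPUT src.id, inserted.id INTO @map (stage_id, item_id);

-- Child rows pick up the generated parent id through the mapping.
INSERT INTO spin_part (itemid_id, manufacturer, partnum, catalognum, [primary])
SELECT m.item_id, s.manufacturer, s.partnum, s.partnum, 1
FROM @map AS m
INNER JOIN ssis_stage AS s ON s.id = m.stage_id
WHERE s.partnum IS NOT NULL;
```

The same join against @map would serve for spin_stock and spin_collaboration, and it avoids SET IDENTITY_INSERT altogether.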

Insert into a Informix table or update if exists

I want to add a row to an Informix database table, but when a row exists with the same unique key I want to update the row.
I have found a solution for MySQL here which is as follows but I need it for Informix:
INSERT INTO table (id, name, age) VALUES(1, "A", 19) ON DUPLICATE KEY UPDATE name="A", age=19
You probably should use the MERGE statement.
Given a suitable table:
create table table (id serial not null primary key, name varchar(20) not null, age integer not null);
this SQL works:
MERGE INTO table AS dst
USING (SELECT 1 AS id, 'A' AS name, 19 AS age
FROM sysmaster:'informix'.sysdual
) AS src
ON dst.id = src.id
WHEN NOT MATCHED THEN INSERT (dst.id, dst.name, dst.age)
VALUES (src.id, src.name, src.age)
WHEN MATCHED THEN UPDATE SET dst.name = src.name, dst.age = src.age
Informix has interesting rules allowing the use of keywords as identifiers without needing double quotes (indeed, unless you have DELIMIDENT set in the environment, double quotes are simply an alternative to single quotes around strings).
You can try the same behavior using the MERGE statement:
Example, creation of the target table:
CREATE TABLE target
(
id SERIAL PRIMARY KEY CONSTRAINT pk_tst,
name CHAR(1),
age SMALLINT
);
Create a temporary source table and insert the record you want:
CREATE TEMP TABLE source
(
id INT,
name CHAR(1),
age SMALLINT
) WITH NO LOG;
INSERT INTO source (id, name, age) VALUES (1, 'A', 19);
The MERGE would be:
MERGE INTO target AS t
USING source AS s ON t.id = s.id
WHEN MATCHED THEN
UPDATE
SET t.name = s.name, t.age = s.age
WHEN NOT MATCHED THEN
INSERT (id, name, age)
VALUES (s.id, s.name, s.age);
You'll see that the record was inserted. Then you can run:
UPDATE source
SET age = 20
WHERE id = 1;
And test the MERGE again.
Another way is to create a stored procedure: run the INSERT, check the SQL error code, and if it is -100, fall back to the UPDATE.
Something like:
CREATE PROCEDURE sp_insrt_target(v_id INT, v_name CHAR(1), v_age SMALLINT)
ON EXCEPTION IN (-100)
UPDATE target
SET name = v_name, age = v_age
WHERE id = v_id;
END EXCEPTION
INSERT INTO target VALUES (v_id, v_name, v_age);
END PROCEDURE;
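A possible way to exercise the procedure is sketched below, assuming the target table from the earlier example exists and is empty. I have not run this, and I am not sure that -100 is the number the handler will actually see: on some setups a duplicate key insert is reported as SQL error -239 or -268 with -100 as the ISAM code, so it is worth checking which code your server raises and adjusting ON EXCEPTION IN accordingly.

```sql
-- First call: no row with id 1 yet, so the INSERT should succeed.
EXECUTE PROCEDURE sp_insrt_target(1, 'A', 19);

-- Second call: the INSERT hits the primary key; if the handler traps
-- the error, the UPDATE in the exception block changes age to 20.
EXECUTE PROCEDURE sp_insrt_target(1, 'A', 20);

-- Check the result.
SELECT id, name, age FROM target WHERE id = 1;
```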

Update records based on inserted IDs and another non source column in SQL

There is probably already an answer for this, but I couldn't find it. I have two tables and data in a third one. Let's call them Source, Target and UpdateTarget.
I need to insert records from Source to Target, then grab autoincremented IDs from Target and update UpdateTarget table with these IDs based on filters from Source table. I've tried to use OUTPUT, but it gives me an error:
The multi-part identifier "s.EmployeeID" could not be bound.
Here is my current SQL query:
CREATE TABLE dbo.target
(
id INT IDENTITY(1,1) PRIMARY KEY,
employee VARCHAR(32)
);
CREATE TABLE dbo.source
(
id INT IDENTITY(1,1) PRIMARY KEY,
employee VARCHAR(32),
EmployeeID int
);
CREATE TABLE dbo.updateTarget
(
id INT IDENTITY(1,1) PRIMARY KEY,
ExternalID int
);
DECLARE @MyTableVar TABLE
(
id INT,
EmployeeID int
);
INSERT dbo.target (employee)
OUTPUT
inserted.id, -- autoincremented ID
s.EmployeeID -- here i got an error
INTO @MyTableVar
SELECT s.employee
FROM dbo.source AS s
UPDATE dbo.updateTarget
SET ExternalID = data.ID
FROM @MyTableVar data
WHERE updateTarget.ID = data.EmployeeID
DROP TABLE source
DROP TABLE target
DROP TABLE updateTarget
I don't have EmployeeID column in target table.
Is there a way to achieve this without running two queries for each record? Or can you point me to an existing answer, if there is one?
Thanks!
1) INSERT INTO table variable generated id, and EmployeeId for usage in update
2) MERGE instead of INSERT (it allows to get column EmployeeId from SRC)
3) OUTPUT result, action inserted, getting id from TGT and EmployeeId
INSERT INTO @MyTableVar(id, EmployeeId)
SELECT id, EmployeeId
FROM (
MERGE dbo.target TGT
USING dbo.source SRC
ON TGT.employee = SRC.employee
WHEN NOT MATCHED THEN
INSERT (employee)
VALUES (src.employee)
OUTPUT inserted.id, SRC.EmployeeId)
AS out(id, EmployeeId);
MERGE gives better OUTPUT options
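If every source row should be inserted, as the plain INSERT in the question does, a shorter variant may also work: MERGE with a condition that never matches and OUTPUT ... INTO writing straight to the table variable. This is a sketch under that assumption; it reuses the table and column names from the question and differs from the answer above in that it does not skip employees that already exist in dbo.target.

```sql
DECLARE @MyTableVar TABLE
(
    id INT,
    EmployeeID INT
);

-- ON 1 = 0 never matches, so each source row goes down the insert branch.
-- MERGE's OUTPUT can reference src.EmployeeID, which INSERT's OUTPUT cannot.
MERGE dbo.target AS tgt
USING dbo.source AS src ON 1 = 0
WHEN NOT MATCHED THEN
    INSERT (employee)
    VALUES (src.employee)
OUTPUT inserted.id, src.EmployeeID INTO @MyTableVar (id, EmployeeID);

-- Same follow-up update as in the question.
UPDATE ut
SET ExternalID = data.id
FROM dbo.updateTarget AS ut
INNER JOIN @MyTableVar AS data ON ut.id = data.EmployeeID;
```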