Data deduplication - SQL

The code below best explains what I'm trying to accomplish. I know I can use a cursor or another looping routine to walk through the records, find the duplicates, and create my notes records from what is found, but I'd like to avoid that unless there's no better option.
DROP TABLE IF EXISTS #orig
DROP TABLE IF EXISTS #parts
DROP TABLE IF EXISTS #part_notes
CREATE TABLE #orig(partnum VARCHAR(20), notes VARCHAR(100));
INSERT INTO #orig VALUES ('A123', 'To be used on Hyster models only')
INSERT INTO #orig VALUES ('A123', 'Right Hand model only')
INSERT INTO #orig VALUES ('A125', 'Not to be used by Jerry')
INSERT INTO #orig VALUES ('A125', NULL)
INSERT INTO #orig VALUES ('A125', 'asdfasdlfj;lsdf')
INSERT INTO #orig VALUES ('A128', 'David test')
INSERT INTO #orig VALUES ('A129', 'Fake part')
SELECT COUNT(*) FROM #orig
-- SHOW ME UNIQUE PARTS, MY PARTS TABLE SHOULD BE UNIQUE!
SELECT DISTINCT partnum FROM #orig
CREATE TABLE #parts(id INT IDENTITY(1,1), partnum VARCHAR(20));
INSERT INTO #parts
SELECT DISTINCT partnum FROM #orig
SELECT * FROM #parts
CREATE TABLE #part_notes(id INT IDENTITY(1,1), part_id INT, line_number INT, notes VARCHAR(100));
/*
HOW DO I AT THIS POINT POPULATE the #part_notes table so that it looks like this:
(note: any NULL or empty note strings should be ignored)
id  part_id  line_number  notes
1   1        1            To be used on Hyster models only
2   1        2            Right Hand model only
3   2        1            Not to be used by Jerry
4   2        2            asdfasdlfj;lsdf
5   3        1            David test
6   4        1            Fake part
*/

The query below assigns line_number arbitrarily, because there doesn't seem to be anything in the data that defines an order for the notes.
SELECT p.id part_id,
p.partnum ,
ROW_NUMBER() over (partition BY p.id ORDER BY (SELECT 0)) line_number,
notes
FROM #parts p
JOIN #orig o
ON o.partnum=p.partnum
WHERE notes IS NOT NULL
AND notes <> ''
ORDER BY part_id
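To actually populate #part_notes, the numbering query can feed an INSERT ... SELECT directly. The sketch below recreates the question's temp tables so it runs on its own (DROP TABLE IF EXISTS needs SQL Server 2016 or later). NULLIF(notes, '') turns empty strings into NULL, so a single IS NOT NULL test drops both kinds of blank notes, and the trailing ORDER BY makes SQL Server assign #part_notes.id in part/line order. The line numbers within a part are still arbitrary; if load order matters, #orig would need an extra column to order by.

```sql
-- Recreate the question's temp tables and sample data.
DROP TABLE IF EXISTS #orig, #parts, #part_notes;

CREATE TABLE #orig (partnum VARCHAR(20), notes VARCHAR(100));
INSERT INTO #orig (partnum, notes) VALUES
    ('A123', 'To be used on Hyster models only'),
    ('A123', 'Right Hand model only'),
    ('A125', 'Not to be used by Jerry'),
    ('A125', NULL),
    ('A125', 'asdfasdlfj;lsdf'),
    ('A128', 'David test'),
    ('A129', 'Fake part');

CREATE TABLE #parts (id INT IDENTITY(1,1), partnum VARCHAR(20));
-- ORDER BY makes the generated part ids follow partnum order.
INSERT INTO #parts (partnum)
SELECT DISTINCT partnum FROM #orig ORDER BY partnum;

CREATE TABLE #part_notes (id INT IDENTITY(1,1), part_id INT, line_number INT, notes VARCHAR(100));

-- Number the notes per part and store them in one set-based statement.
INSERT INTO #part_notes (part_id, line_number, notes)
SELECT p.id AS part_id,
       ROW_NUMBER() OVER (PARTITION BY p.id ORDER BY (SELECT 0)) AS line_number,
       o.notes
FROM #parts AS p
JOIN #orig AS o
  ON o.partnum = p.partnum
WHERE NULLIF(o.notes, '') IS NOT NULL   -- skips NULL and '' notes
ORDER BY part_id, line_number;          -- order in which #part_notes.id is assigned

SELECT id, part_id, line_number, notes
FROM #part_notes
ORDER BY id;
```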

Related

Insert data into 2 tables at the same time using the first row's id

I have 2 tables: Order and product_order. Every order contains some products, which is why I store the products in another table.
Table Order:
Id name
Table PRODUCT_ORDER:
id product_id order_id
Before I start to insert, I don't know what the Order Id is. I want to insert the data into both tables at once and I need the order id to do that.
Both ids are auto-incremented. I'm using SQL Server. I can insert the order first, look up its id, and then execute the second insert, but I want both inserts to execute at once.
The output clause is your friend here.
DECLARE @Orders TABLE (OrderID INT IDENTITY, OrderDateUTC DATETIME, CustomerID INT)
DECLARE @OrderItems TABLE (OrderItemID INT IDENTITY, OrderID INT, ProductID INT, Quantity INT, Priority TINYINT)
We'll use these table variables as demo tables with IDs to insert into. You're likely going to pass in the set of items for an order together, but for the purpose of this demo we'll supply them ad hoc as a VALUES list.
DECLARE @Output TABLE (OrderID INT)
INSERT INTO @Orders (OrderDateUTC, CustomerID)
OUTPUT INSERTED.OrderID INTO @Output
VALUES (GETUTCDATE(), 1)
We inserted the order into the Orders table and used the OUTPUT clause to write the inserted OrderID (generated by the engine) into the table variable @Output. We can now use this table however we'd like:
INSERT INTO @OrderItems (OrderID, ProductID, Quantity, Priority)
SELECT OrderID, ProductID, Quantity, Priority
FROM (VALUES (5,1,1),(2,1,2),(3,1,3)) AS x(ProductID, Quantity, Priority)
CROSS APPLY @Output
We cross applied the captured OrderID to our items list and inserted the result like any other set of rows.
DELETE FROM @Output
INSERT INTO @Orders (OrderDateUTC, CustomerID)
OUTPUT INSERTED.OrderID INTO @Output
VALUES (GETUTCDATE(), 1)
INSERT INTO @OrderItems (OrderID, ProductID, Quantity, Priority)
SELECT OrderID, ProductID, Quantity, Priority
FROM (VALUES (1,1,1)) AS x(ProductID, Quantity, Priority)
CROSS APPLY @Output
Just to demo a little further, here's another insert. (You likely wouldn't need the DELETE normally, but we're reusing the same variable here.)
Now when we select that data, we can see the two separate orders, with their IDs and the products that belong to them:
SELECT *
FROM @Orders o
INNER JOIN @OrderItems oi
ON o.OrderID = oi.OrderID
OrderID OrderDateUTC            CustomerID OrderItemID OrderID ProductID Quantity Priority
------- ----------------------- ---------- ----------- ------- --------- -------- --------
1       2022-12-08 23:23:21.923 1          1           1       5         1        1
1       2022-12-08 23:23:21.923 1          2           1       2         1        2
1       2022-12-08 23:23:21.923 1          3           1       3         1        3
2       2022-12-08 23:23:21.927 1          4           2       1         1        1
Dale is correct: a single INSERT cannot insert into multiple tables. However, if you use a stored procedure to handle your inserts, you can capture the new ID and use it in the next insert.
-- table definitions
create table [order]([id] int identity, [name] nvarchar(100))
go
create table [product_order]([id] int identity, [product_id] nvarchar(100), [order_id] int)
go
-- stored procedure to handle inserts
create procedure InsertProductWithOrder(
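@OrderName nvarchar(100),
@ProductID nvarchar(100))
as
begin
declare @orderID int
insert into [order] ([name]) values(@OrderName)
-- scope_identity() returns the identity generated in this scope; @@identity could also pick up one generated by a trigger
set @orderID = scope_identity()
insert into [product_order]([product_id], [order_id]) values(@ProductID, @orderID)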
end
go
-- insert records using the stored procedure
exec InsertProductWithOrder 'Order ONE', 'AAAAA'
exec InsertProductWithOrder 'Order TWO', 'BBBBB'
-- verify the results
select * from [order]
select * from [product_order]
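If "at once" mainly means that both inserts should succeed or fail together, wrapping them in a transaction gives that guarantee. A minimal sketch of the same procedure with a transaction, assuming SQL Server 2016 SP1 or later (for CREATE OR ALTER and DROP TABLE IF EXISTS). The procedure name InsertProductWithOrderTx is hypothetical, and the demo tables are recreated so the script runs on its own.

```sql
-- Demo tables with the same columns as in the answer above (dropped first if they exist).
DROP TABLE IF EXISTS [product_order], [order];
CREATE TABLE [order] ([id] int IDENTITY, [name] nvarchar(100));
CREATE TABLE [product_order] ([id] int IDENTITY, [product_id] nvarchar(100), [order_id] int);
GO
-- Hypothetical procedure name.
CREATE OR ALTER PROCEDURE dbo.InsertProductWithOrderTx
    @OrderName nvarchar(100),
    @ProductID nvarchar(100)
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;  -- a run-time error rolls back the whole open transaction

    BEGIN TRANSACTION;

    INSERT INTO [order] ([name]) VALUES (@OrderName);

    -- Identity value generated by the INSERT above, in this scope only.
    DECLARE @orderID int = SCOPE_IDENTITY();

    INSERT INTO [product_order] ([product_id], [order_id])
    VALUES (@ProductID, @orderID);

    COMMIT TRANSACTION;
END;
GO
EXEC dbo.InsertProductWithOrderTx N'Order ONE', N'AAAAA';
EXEC dbo.InsertProductWithOrderTx N'Order TWO', N'BBBBB';

-- Each product_order row points at the order created in the same call.
SELECT o.[id], o.[name], po.[product_id]
FROM [order] AS o
JOIN [product_order] AS po ON po.[order_id] = o.[id];
```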

Replacing a 3 level deep nested cursors in SQL Server

I have three SQL Server tables that I need to loop through and update. I did it successfully with a cursor, but it is so slow that it is pretty pointless, since the main table with all the data to loop through is over 1,000 rows long.
The tables are (with some sample data):
-- The PK is InvoiceId and IsMajorPart is '0' or '1'.
-- MajorPartId and SubPartId1 to 4 are "technically" FKs to PartsId but aren't hooked up and never will be, due to some external issues outside of scope.
-- The part ids can be NULL or empty.
-- This table exists elsewhere and is loaded with the ids as varchars, but in the transfer they will go in as ints, which is the proper way.
CREATE TABLE dbo.Invoices(
InvoiceId varchar(50),
PartName varchar(255),
IsMajorPart varchar(1),
MajorPartId varchar(50),
SubPartId1 varchar(50),
SubPartId2 varchar(50),
SubPartId3 varchar(50),
SubPartId4 varchar(50));
-- Sample inserts
INSERT INTO dbo.Invoices VALUES ('1', 'A Part', '0', '', '100', '105', '' ,'');
INSERT INTO dbo.Invoices VALUES ('5', 'E Part', '1', '101', '110', '', '' ,'');
INSERT INTO dbo.Invoices VALUES ('11', 'Z Part', '1', '201', '100', '115', '' ,'');
-- Essentially, the old table above is being moved into the normalized, correct tables below.
-- The PK is PartsId
CREATE TABLE dbo.Parts(
PartsId int,
PartName varchar(255));
-- Sample inserts (that will be updated or inserted by looping through the first table)
INSERT INTO dbo.Parts VALUES (100,'A Part');
INSERT INTO dbo.Parts VALUES (110,'B Part');
INSERT INTO dbo.Parts VALUES (201,'C Part');
-- The PK is the combination of InvoiceId and PartsId
CREATE TABLE dbo.InvoiceToParts(
InvoiceId int,
PartsId int,
IsMajorPart bit);
-- Sample inserts (that will be inserted from the first table, but conflicts might occur if an InvoiceId from the first table has 2 part ids that are the same)
INSERT INTO dbo.InvoiceToParts VALUES (1, 100, 0);
INSERT INTO dbo.InvoiceToParts VALUES (5, 100, 1);
INSERT INTO dbo.InvoiceToParts VALUES (17, 201, 0);
The INSERTs above are only samples, to show what kind of data is in the tables.
The rules for moving Invoices (I don't care what happens to this table) into the correct tables, Parts and InvoiceToParts, are below (these last two tables are the only ones I care about).
Loop through Invoices and get all the data.
First, check whether IsMajorPart is '1' and, if so, get the MajorPartId.
Insert the MajorPartId with its PartName into the Parts table if it DOESN'T already exist.
Next, check InvoiceToParts to see whether the PK (InvoiceId, PartsId) exists.
If it does, update IsMajorPart to '1'.
If it doesn't exist, INSERT it.
Then do the same process for SubPartId1 to SubPartId4.
I have a 3-level nested cursor that ran for over 30 minutes before I stopped it; it wasn't even close to finishing and was using up all the resources. I am looking for a faster way to do this. The Invoices table can have up to about 5,000 rows.
You need to unpivot your data and then just do what is called an UPSERT, which has two steps:
If exists, update record(s)
If not exists, insert record(s)
You can find plenty of UPSERT examples online.
Table Setup
DROP TABLE IF EXISTS #Invoice
DROP TABLE IF EXISTS #Unpivot
DROP TABLE IF EXISTS #InvoiceToParts
DROP TABLE IF EXISTS #Parts
CREATE TABLE #Parts(
PartsId int,
PartName varchar(255)
)
CREATE TABLE #InvoiceToParts(
InvoiceId int,
PartsId int,
IsMajorPart bit
);
CREATE TABLE #Invoice(
InvoiceId varchar(50),
PartName varchar(255),
IsMajorPart varchar(1),
MajorPartsID varchar(50),
SubPartsID1 varchar(50),
SubPartsID2 varchar(50),
SubPartsID3 varchar(50),
SubPartsID4 varchar(50)
);
INSERT INTO #Invoice
VALUES ('1', 'A Part', '0', '', '100', '105', '' ,'')
,('5', 'E Part', '1', '101', '110', '', '' ,'')
,('11', 'Z Part', '1', '201', '100', '115', '' ,'')
SQL to Process Data
First unpivot the data, then load the Parts table, so that the part IDs exist before inserting into the junction table InvoiceToParts.
SELECT A.InvoiceId
,B.*
INTO #Unpivot
FROM #Invoice AS A
CROSS APPLY (
VALUES
(NULLIF(MajorPartsID,''),PartName,IsMajorPart)
,(NULLIF(SubPartsID1,''),NULL,0)
,(NULLIF(SubPartsID2,''),NULL,0)
,(NULLIF(SubPartsID3,''),NULL,0)
,(NULLIF(SubPartsID4,''),NULL,0)
) AS B(PartsID,PartName,IsMajorPart)
WHERE B.PartsID IS NOT NULL /*Filter out empty part IDs*/
/*INSERT into table Parts if not exists*/
INSERT INTO #Parts
SELECT PartsID,PartName
FROM #Unpivot AS A
WHERE A.IsMajorPart = 1
AND NOT EXISTS (
SELECT *
FROM #Parts AS DTA
WHERE A.PartsID = DTA.PartsID
)
GROUP BY PartsID,PartName
/*UPSERT into table #InvoiceToParts*/
UPDATE #InvoiceToParts
SET IsMajorPart = B.IsMajorPart
FROM #InvoiceToParts AS A
INNER JOIN #Unpivot AS B
ON A.InvoiceId = B.InvoiceId
AND A.PartsId = B.PartsID
INSERT INTO #InvoiceToParts(InvoiceId,PartsId,IsMajorPart)
SELECT InvoiceId
,PartsId
,IsMajorPart
FROM #Unpivot AS A
WHERE NOT EXISTS (
SELECT *
FROM #InvoiceToParts AS DTA
WHERE A.InvoiceId = DTA.InvoiceID
AND A.PartsID = DTA.PartsID
)
SELECT *
FROM #InvoiceToParts
SELECT *
FROM #Parts
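The question points out that one invoice can list the same part twice, for example once as the major part and once as a sub-part. #Unpivot then holds two rows for the same (InvoiceId, PartsID) pair, so the UPDATE above picks one of them arbitrarily and the NOT EXISTS insert adds both, which breaks the composite PK. A minimal sketch that collapses duplicates before the UPSERT, treating a pair as major if any of its rows is; #Dedup and the sample rows are made up for illustration.

```sql
DROP TABLE IF EXISTS #Unpivot, #Dedup, #InvoiceToParts;

CREATE TABLE #InvoiceToParts (InvoiceId int, PartsId int, IsMajorPart bit);
INSERT INTO #InvoiceToParts VALUES (1, 100, 0);   -- an existing link

-- Hypothetical unpivoted rows: invoice 1 now flags part 100 as major,
-- and invoice 5 lists part 110 both as major part and as sub-part.
CREATE TABLE #Unpivot (InvoiceId varchar(50), PartsID varchar(50), IsMajorPart int);
INSERT INTO #Unpivot VALUES ('1', '100', 1), ('5', '110', 1), ('5', '110', 0), ('5', '120', 0);

-- One row per (InvoiceId, PartsId); the pair is major if any source row says so.
SELECT CAST(InvoiceId AS int) AS InvoiceId,
       CAST(PartsID AS int) AS PartsId,
       CAST(MAX(IsMajorPart) AS bit) AS IsMajorPart
INTO #Dedup
FROM #Unpivot
GROUP BY CAST(InvoiceId AS int), CAST(PartsID AS int);

-- UPSERT step 1: update pairs that already exist.
UPDATE itp
SET IsMajorPart = d.IsMajorPart
FROM #InvoiceToParts AS itp
JOIN #Dedup AS d
  ON d.InvoiceId = itp.InvoiceId
 AND d.PartsId = itp.PartsId;

-- UPSERT step 2: insert pairs that do not exist yet.
INSERT INTO #InvoiceToParts (InvoiceId, PartsId, IsMajorPart)
SELECT d.InvoiceId, d.PartsId, d.IsMajorPart
FROM #Dedup AS d
WHERE NOT EXISTS (SELECT 1
                  FROM #InvoiceToParts AS itp
                  WHERE itp.InvoiceId = d.InvoiceId
                    AND itp.PartsId = d.PartsId);

SELECT * FROM #InvoiceToParts ORDER BY InvoiceId, PartsId;
```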

INSERT inside an INSERT statement and use its ID in the outer INSERT [duplicate]

In a very simplified form, I have two tables, Source and Target.
declare @Source table (SourceID int identity(1,2), SourceName varchar(50))
declare @Target table (TargetID int identity(2,2), TargetName varchar(50))
insert into @Source values ('Row 1'), ('Row 2')
I would like to move all rows from @Source to @Target and know the TargetID for each SourceID, because there are also the tables SourceChild and TargetChild that need to be copied as well, and I need to put the new TargetID into the TargetChild.TargetID FK column.
There are a couple of solutions to this.
Use a while loop or cursors to insert one row at a time (RBAR) into Target and use scope_identity() to fill the FK of TargetChild.
Add a temp column to Target and insert SourceID into it. You can then join on that column to fetch the TargetID for the FK in TargetChild.
SET IDENTITY_INSERT ON for Target and handle assigning new values yourself. You get a range that you then use in TargetChild.TargetID.
I'm not all that fond of any of them. The one I used so far is cursors.
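The second option (a helper column on Target) can be sketched like this. It uses temp tables rather than table variables, because a table variable cannot be altered; the child tables and the helper column name SourceID on Target are assumptions for illustration. The GO separators matter: the statements that reference the new column have to compile after the ALTER TABLE has run.

```sql
-- Demo temp tables; #SourceChild and #TargetChild are hypothetical child tables.
DROP TABLE IF EXISTS #SourceChild, #TargetChild, #Source, #Target;
CREATE TABLE #Source (SourceID int IDENTITY(1,2), SourceName varchar(50));
CREATE TABLE #Target (TargetID int IDENTITY(2,2), TargetName varchar(50));
CREATE TABLE #SourceChild (SourceID int, ChildName varchar(50));
CREATE TABLE #TargetChild (TargetID int, ChildName varchar(50));
INSERT INTO #Source (SourceName) VALUES ('Row 1'), ('Row 2');   -- SourceIDs 1 and 3
INSERT INTO #SourceChild (SourceID, ChildName)
VALUES (1, 'Child 1a'), (1, 'Child 1b'), (3, 'Child 2a');

-- Helper column that remembers which source row each target row came from.
-- The next GO starts a new batch, so later statements compile against it.
ALTER TABLE #Target ADD SourceID int NULL;
GO
INSERT INTO #Target (TargetName, SourceID)
SELECT SourceName, SourceID FROM #Source;

-- Translate the child foreign keys by joining through the helper column.
INSERT INTO #TargetChild (TargetID, ChildName)
SELECT t.TargetID, sc.ChildName
FROM #SourceChild AS sc
JOIN #Target AS t ON t.SourceID = sc.SourceID;
GO
ALTER TABLE #Target DROP COLUMN SourceID;
```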
What I would really like to do is to use the output clause of the insert statement.
insert into @Target(TargetName)
output inserted.TargetID, S.SourceID
select SourceName
from @Source as S
But that is not possible; it fails with:
The multi-part identifier "S.SourceID" could not be bound.
But it is possible with a merge.
merge @Target as T
using @Source as S
on 0=1
when not matched then
insert (TargetName) values (SourceName)
output inserted.TargetID, S.SourceID;
Result
TargetID SourceID
----------- -----------
2 1
4 3
Have you used this? Do you have any thoughts about the solution, or do you see any problems with it? It works fine in simple scenarios, but perhaps something ugly could happen when the query plan gets really complicated due to a complicated source query. The worst case would be that the TargetID/SourceID pairs don't actually match.
MSDN has this to say about the from_table_name of the output clause.
Is a column prefix that specifies a table included in the FROM clause of a DELETE, UPDATE, or MERGE statement that is used to specify the rows to update or delete.
For some reason they don't say "rows to insert, update or delete" only "rows to update or delete".
Any thoughts are welcome, and totally different solutions to the original problem are much appreciated.
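For reference, the whole flow from the question fits in one batch: send the MERGE ... OUTPUT rows INTO a table variable and use it as the old-to-new map for the child rows. A minimal sketch; @Map, @SourceChild and @TargetChild are hypothetical stand-ins for the mapping table and the SourceChild/TargetChild tables mentioned above.

```sql
DECLARE @Source TABLE (SourceID int IDENTITY(1,2), SourceName varchar(50));
DECLARE @Target TABLE (TargetID int IDENTITY(2,2), TargetName varchar(50));
DECLARE @SourceChild TABLE (SourceID int, ChildName varchar(50));
DECLARE @TargetChild TABLE (TargetID int, ChildName varchar(50));
DECLARE @Map TABLE (SourceID int PRIMARY KEY, TargetID int NOT NULL);

INSERT INTO @Source (SourceName) VALUES ('Row 1'), ('Row 2');   -- SourceIDs 1 and 3
INSERT INTO @SourceChild (SourceID, ChildName)
VALUES (1, 'Child 1a'), (1, 'Child 1b'), (3, 'Child 2a');

-- ON 1 = 0 never matches, so every source row is inserted. Unlike INSERT,
-- MERGE lets OUTPUT reference source columns such as S.SourceID.
MERGE @Target AS T
USING @Source AS S
ON 1 = 0
WHEN NOT MATCHED THEN
    INSERT (TargetName) VALUES (S.SourceName)
OUTPUT S.SourceID, inserted.TargetID INTO @Map (SourceID, TargetID);

-- Use the map to point the copied child rows at their new parents.
INSERT INTO @TargetChild (TargetID, ChildName)
SELECT m.TargetID, sc.ChildName
FROM @SourceChild AS sc
JOIN @Map AS m ON m.SourceID = sc.SourceID;

SELECT m.SourceID, m.TargetID, tc.ChildName
FROM @Map AS m
JOIN @TargetChild AS tc ON tc.TargetID = m.TargetID
ORDER BY m.SourceID, tc.ChildName;
```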
In my opinion, this is a great use of MERGE and OUTPUT. I've used it in several scenarios and haven't experienced any oddities to date.
For example, here is a test setup that clones a Folder and all Files (identity keys) within it into a newly created Folder (GUID key).
DECLARE @FolderIndex TABLE (FolderId UNIQUEIDENTIFIER PRIMARY KEY, FolderName varchar(25));
INSERT INTO @FolderIndex
(FolderId, FolderName)
VALUES(newid(), 'OriginalFolder');
DECLARE @FileIndex TABLE (FileId int identity(1,1) PRIMARY KEY, FileName varchar(10));
INSERT INTO @FileIndex
(FileName)
VALUES('test.txt');
DECLARE @FileFolder TABLE (FolderId UNIQUEIDENTIFIER, FileId int, PRIMARY KEY(FolderId, FileId));
INSERT INTO @FileFolder
(FolderId, FileId)
SELECT FolderId,
FileId
FROM @FolderIndex
CROSS JOIN @FileIndex; -- just to illustrate
DECLARE @sFolder TABLE (FromFolderId UNIQUEIDENTIFIER, ToFolderId UNIQUEIDENTIFIER);
DECLARE @sFile TABLE (FromFileId int, ToFileId int);
-- copy Folder Structure
MERGE @FolderIndex fi
USING ( SELECT 1 [Dummy],
FolderId,
FolderName
FROM @FolderIndex [fi]
WHERE FolderName = 'OriginalFolder'
) d ON d.Dummy = 0
WHEN NOT MATCHED
THEN INSERT
(FolderId, FolderName)
VALUES (newid(), 'copy_'+FolderName)
OUTPUT d.FolderId,
INSERTED.FolderId
INTO @sFolder (FromFolderId, toFolderId);
-- copy File structure
MERGE @FileIndex fi
USING ( SELECT 1 [Dummy],
fi.FileId,
fi.[FileName]
FROM @FileIndex fi
INNER
JOIN @FileFolder fm ON
fi.FileId = fm.FileId
INNER
JOIN @FolderIndex fo ON
fm.FolderId = fo.FolderId
WHERE fo.FolderName = 'OriginalFolder'
) d ON d.Dummy = 0
WHEN NOT MATCHED
THEN INSERT ([FileName])
VALUES ([FileName])
OUTPUT d.FileId,
INSERTED.FileId
INTO @sFile (FromFileId, toFileId);
-- link new files to Folders
INSERT INTO @FileFolder (FileId, FolderId)
SELECT sfi.toFileId, sfo.toFolderId
FROM @FileFolder fm
INNER
JOIN @sFile sfi ON
fm.FileId = sfi.FromFileId
INNER
JOIN @sFolder sfo ON
fm.FolderId = sfo.FromFolderId
-- return
SELECT *
FROM @FileIndex fi
JOIN @FileFolder ff ON
fi.FileId = ff.FileId
JOIN @FolderIndex fo ON
ff.FolderId = fo.FolderId
I would like to add another example to @Nathan's, as I found his somewhat confusing.
Mine uses real tables for the most part, and not temp tables.
I also got my inspiration from here: another example
-- Copy the FormSectionInstance
DECLARE @FormSectionInstanceTable TABLE(OldFormSectionInstanceId INT, NewFormSectionInstanceId INT)
;MERGE INTO [dbo].[FormSectionInstance]
USING
(
SELECT
fsi.FormSectionInstanceId [OldFormSectionInstanceId]
, @NewFormHeaderId [NewFormHeaderId]
, fsi.FormSectionId
, fsi.IsClone
, @UserId [NewCreatedByUserId]
, GETDATE() NewCreatedDate
, @UserId [NewUpdatedByUserId]
, GETDATE() NewUpdatedDate
FROM [dbo].[FormSectionInstance] fsi
WHERE fsi.[FormHeaderId] = @FormHeaderId
) tblSource ON 1=0 -- use always false condition
WHEN NOT MATCHED
THEN INSERT
( [FormHeaderId], FormSectionId, IsClone, CreatedByUserId, CreatedDate, UpdatedByUserId, UpdatedDate)
VALUES( [NewFormHeaderId], FormSectionId, IsClone, NewCreatedByUserId, NewCreatedDate, NewUpdatedByUserId, NewUpdatedDate)
OUTPUT tblSource.[OldFormSectionInstanceId], INSERTED.FormSectionInstanceId
INTO @FormSectionInstanceTable(OldFormSectionInstanceId, NewFormSectionInstanceId);
-- Copy the FormDetail
INSERT INTO [dbo].[FormDetail]
(FormHeaderId, FormFieldId, FormSectionInstanceId, IsOther, Value, CreatedByUserId, CreatedDate, UpdatedByUserId, UpdatedDate)
SELECT
@NewFormHeaderId, FormFieldId, fsit.NewFormSectionInstanceId, IsOther, Value, @UserId, CreatedDate, @UserId, UpdatedDate
FROM [dbo].[FormDetail] fd
INNER JOIN @FormSectionInstanceTable fsit ON fsit.OldFormSectionInstanceId = fd.FormSectionInstanceId
WHERE [FormHeaderId] = @FormHeaderId
Here's a solution that doesn't use MERGE (I've had problems with it many times, so I try to avoid it if possible). It relies on two table variables (you could use temp tables if you want) whose IDENTITY columns get matched, and, importantly, on using ORDER BY in the INSERTs, with WHERE conditions that match between the two INSERTs. The first table holds the source IDs and the second holds the target IDs.
-- Setup... We have a table that we need to know the old IDs and new IDs after copying.
-- We want to copy all of DocID=1
DECLARE @newDocID int = 99;
DECLARE @tbl table (RuleID int PRIMARY KEY NOT NULL IDENTITY(1, 1), DocID int, Val varchar(100));
INSERT INTO @tbl (DocID, Val) VALUES (1, 'RuleA-2'), (1, 'RuleA-1'), (2, 'RuleB-1'), (2, 'RuleB-2'), (3, 'RuleC-1'), (1, 'RuleA-3')
-- Create a break in IDENTITY values, just to simulate more realistic data
INSERT INTO @tbl (Val) VALUES ('DeleteMe'), ('DeleteMe');
DELETE FROM @tbl WHERE Val = 'DeleteMe';
INSERT INTO @tbl (DocID, Val) VALUES (6, 'RuleE'), (7, 'RuleF');
SELECT * FROM @tbl t;
-- Declare TWO table variables, each with an IDENTITY - one will hold the RuleID of the items we are copying, the other will hold the RuleID that we create
DECLARE @input table (RID int IDENTITY(1, 1), SourceRuleID int NOT NULL, Val varchar(100));
DECLARE @output table (RID int IDENTITY(1,1), TargetRuleID int NOT NULL, Val varchar(100));
-- Capture the IDs of the rows we will be copying by inserting them into the @input table
-- Important - we must specify the sort order - the best choice is the IDENTITY of the source table (t.RuleID) that we are copying
INSERT INTO @input (SourceRuleID, Val) SELECT t.RuleID, t.Val FROM @tbl t WHERE t.DocID = 1 ORDER BY t.RuleID;
-- Copy the rows, and use the OUTPUT clause to capture the IDs of the inserted rows.
-- Important - we must use the same WHERE and ORDER BY clauses as above
INSERT INTO @tbl (DocID, Val)
OUTPUT Inserted.RuleID, Inserted.Val INTO @output(TargetRuleID, Val)
SELECT @newDocID, t.Val FROM @tbl t
WHERE t.DocID = 1
ORDER BY t.RuleID;
-- Now @input and @output should have the same # of rows, and the order of both inserts was the same, so the IDENTITY columns (RID) can be matched
-- Use this as the map from old to new when you are copying sub-table rows
-- Technically, @input and @output don't even need the 'Val' columns, just RID and RuleID - they were included here to prove that the rules matched
SELECT i.*, o.* FROM @output o
INNER JOIN @input i ON i.RID = o.RID
-- Confirm the matching worked
SELECT * FROM @tbl t
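One caveat with matching on the two RID columns: as I read the SQL Server documentation, INSERT ... SELECT ... ORDER BY guarantees the order in which the new IDENTITY values are computed, but the OUTPUT clause does not guarantee the order in which its rows are written, so the RID values in the output table are not guaranteed to follow the ORDER BY. A sketch of a variant that pairs old and new IDs by ranking the new RuleID values themselves; it reuses the demo data, and @new and @map are hypothetical names.

```sql
DECLARE @newDocID int = 99;
DECLARE @tbl table (RuleID int PRIMARY KEY NOT NULL IDENTITY(1, 1), DocID int, Val varchar(100));
INSERT INTO @tbl (DocID, Val)
VALUES (1, 'RuleA-2'), (1, 'RuleA-1'), (2, 'RuleB-1'), (2, 'RuleB-2'), (3, 'RuleC-1'), (1, 'RuleA-3');

DECLARE @new table (TargetRuleID int NOT NULL);
DECLARE @map table (SourceRuleID int PRIMARY KEY, TargetRuleID int NOT NULL);

-- Copy DocID = 1. The ORDER BY fixes the order in which the new RuleIDs are generated;
-- the order of the rows written by OUTPUT is not relied on.
INSERT INTO @tbl (DocID, Val)
OUTPUT inserted.RuleID INTO @new (TargetRuleID)
SELECT @newDocID, t.Val
FROM @tbl AS t
WHERE t.DocID = 1
ORDER BY t.RuleID;

-- Rank old and new IDs separately, then pair the n-th old ID with the n-th new ID.
-- The WHERE in src must select exactly the rows that were copied.
WITH src AS (
    SELECT RuleID AS SourceRuleID, ROW_NUMBER() OVER (ORDER BY RuleID) AS rn
    FROM @tbl
    WHERE DocID = 1
), tgt AS (
    SELECT TargetRuleID, ROW_NUMBER() OVER (ORDER BY TargetRuleID) AS rn
    FROM @new
)
INSERT INTO @map (SourceRuleID, TargetRuleID)
SELECT s.SourceRuleID, t.TargetRuleID
FROM src AS s
JOIN tgt AS t ON t.rn = s.rn;

SELECT m.SourceRuleID, m.TargetRuleID, o.Val
FROM @map AS m
JOIN @tbl AS o ON o.RuleID = m.SourceRuleID
ORDER BY m.SourceRuleID;
```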

What's the best way to get values from multiple rows and set variables without cursors in SQL Server? (PIVOT, more or less)

I could not think of a better question for the problem, but here it is.
I have 3 tables in a many-to-many relationship, like:
Students -> StudentTasks <- Tasks
The StudentTasks table has a column called "Marked", and when a student is created I create a record for each possible task with Marked = 0, e.g.:
Student ---------
Given
"StudentA -> Id = 1"
Tasks ------------
Given
"Task A -> Id = 1"
"Task B -> Id = 2"
StudentTasks ------
StudentId -- TaskId -- Marked
1 -- 1 -- 0
1 -- 2 -- 0
What I need is, for each student in StudentTasks, to set the variables @TaskA and @TaskB
to the respective "Marked" values, so I can update another table's columns with them.
Another way to solve the problem would be to update that table directly. Given the same scenario, plus a table like this:
StudentId -- TaskA -- TaskB
I would like to see it filled like this:
StudentId -- TaskA -- TaskB
1 -- 0 -- 0
and if we had "StudentB -> Id = 2" with TaskB marked as 1, we would have:
StudentId -- TaskA -- TaskB
1 -- 0 -- 0
2 -- 0 -- 1
The way I am doing it is not efficient at all: it takes 40 seconds to go through 3,300 records (I am using cursors to walk through the list of students). Any suggestions are welcome.
UPDATES:
Using the idea @djangojazz had with the self-contained queries:
DECLARE @Student TABLE ( StudentID INT IDENTITY, Name VARCHAR(50));
INSERT INTO @Student VALUES ('Brett'),('Sean')
DECLARE @StudentTasks TABLE (StudentID INT, TaskId INT, Marked BIT);
INSERT INTO @StudentTasks VALUES (1,1,0),(1,2,0),(2,1,0),(2,2,1)
DECLARE @Tasks TABLE (TaskID INT IDENTITY, NAME VARCHAR(50));
INSERT INTO @Tasks VALUES ('Study'),('Do Test')
SELECT * FROM @Student
SELECT * FROM @StudentTasks
SELECT * FROM @Tasks
-- THIS IS WHAT I NEED IN THE RESULT
DECLARE @ResultTable TABLE (StudentId INT, Study BIT, DoTest BIT);
INSERT INTO @ResultTable VALUES (1,0,0),(2,0,1)
SELECT * FROM @ResultTable
UPDATED 6-13-13. I don't think you even need the Students table, since you just want to pivot on the identifier. A problem may arise, though, if you ever store repeated values: if a StudentId can be repeated for another value of a type other than bit, you would need extra steps to determine which row is the most current (by an identity column or similar), or this logic will break. That said, I can give you what you asked for.
DECLARE @Student TABLE ( StudentID INT IDENTITY, Name VARCHAR(50));
INSERT INTO @Student VALUES ('Brett'),('Sean')
DECLARE @StudentTasks TABLE (StudentID INT, TaskId INT, Marked BIT);
INSERT INTO @StudentTasks VALUES (1,1,0),(1,2,0),(2,1,0),(2,2,1)
DECLARE @Tasks TABLE (TaskID INT IDENTITY, NAME VARCHAR(50));
INSERT INTO @Tasks VALUES ('Study'),('Do Test')
DECLARE @ResultTable TABLE (StudentId INT, Study BIT, DoTest BIT);
INSERT INTO @ResultTable VALUES (1,0,0),(2,0,1)
select 'Results you wanted'
SELECT *
FROM @ResultTable
select 'Method A'
;
-- with case when forcing a pivot on an expression
With x as
(
select
st.StudentID
, cast(st.Marked as tinyint) as Marked
, t.NAME
from @StudentTasks st
join @Tasks t on st.TaskId = t.TaskID
)
Select
x.StudentID
, max(case when NAME = 'Study' then Marked end) as Study
, max(case when NAME = 'Do Test' then Marked end) as DoTest
FROM x
group by x.StudentID
Select 'Method B'
;
-- with traditional pivot you need to translate names I believe
With x as
(
select
st.StudentID
, cast(st.Marked as tinyint) as Marked
, case when t.NAME = 'Study' then 0 else 1 end as Name
from @StudentTasks st
join @Tasks t on st.TaskId = t.TaskID
)
Select
pvt.StudentID
, [0] as Study
, [1] as 'Do Test'
FROM x
pivot(max(x.Marked) for x.Name in ([0], [1])) as pvt
PIVOT the StudentTasks table for the two tasks, A and B, JOIN the result with the additional table on StudentID, and UPDATE it.
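Pivoting StudentTasks and joining the result to the additional table in a single UPDATE could look like the sketch below. It assumes the additional table has the StudentId/Study/DoTest shape of @ResultTable from the question (here called @StudentSummary, a hypothetical name) and uses the MAX(CASE ...) form from Method A; the task ids are given explicitly so the names line up with the sample rows.

```sql
DECLARE @StudentTasks TABLE (StudentID INT, TaskId INT, Marked BIT);
INSERT INTO @StudentTasks VALUES (1,1,0),(1,2,0),(2,1,0),(2,2,1);
DECLARE @Tasks TABLE (TaskID INT, NAME VARCHAR(50));
INSERT INTO @Tasks VALUES (1,'Study'),(2,'Do Test');

-- Hypothetical target table, pre-filled with one row per student.
DECLARE @StudentSummary TABLE (StudentId INT PRIMARY KEY, Study BIT, DoTest BIT);
INSERT INTO @StudentSummary (StudentId) VALUES (1), (2);

-- Pivot once for all students (MAX needs tinyint, not bit), then update every row in one statement.
WITH pvt AS (
    SELECT st.StudentID,
           MAX(CASE WHEN t.NAME = 'Study'   THEN CAST(st.Marked AS tinyint) END) AS Study,
           MAX(CASE WHEN t.NAME = 'Do Test' THEN CAST(st.Marked AS tinyint) END) AS DoTest
    FROM @StudentTasks AS st
    JOIN @Tasks AS t ON t.TaskID = st.TaskId
    GROUP BY st.StudentID
)
UPDATE s
SET s.Study  = p.Study,
    s.DoTest = p.DoTest
FROM @StudentSummary AS s
JOIN pvt AS p ON p.StudentID = s.StudentId;

SELECT * FROM @StudentSummary ORDER BY StudentId;
```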