SSIS incremental data load error - sql

I am trying to perform an incremental insert from a staging table (cust_reg_dim_stg) into the warehouse table (dim_cust_reg). This is the query I am using.
insert into dim_cust_reg WITH(TABLOCK)
(
channel_id
,cust_reg_id
,cust_id
,status
,date_created
,date_activated
,date_archived
,custodian_id
,reg_type_id
,reg_flags
,acc_name
,acc_number
,sr_id
,sr_type
,as_of_date
,ins_timestamp
)
select channel_id
,cust_reg_id
,cust_id
,status
,date_created
,date_activated
,date_archived
,custodian_id
,reg_type_id
,reg_flags
,acc_name
,acc_number
,sr_id
,sr_type
,as_of_date
,getdate() ins_timestamp
from umpdwstg..cust_reg_dim_stg stg with(nolock)
join lookup_channel ch with(nolock) on stg.channel_name = ch.channel_name
where not exists
(select * from dim_cust_reg dest
where dest.cust_reg_id=stg.cust_reg_id
and dest.sr_id=stg.sr_id
and dest.channel_id=ch.channel_id )
Here channel_id is not present in the staging table; it is obtained via a channel lookup table (lookup_channel). On running this query I get the following error:
Violation of PRIMARY KEY constraint 'PK__dim_cust__4A293521A789A5FA'.
Cannot insert duplicate key in object 'dbo.dim_cust_reg'.
What is wrong with the query? channel_id, sr_id and cust_reg_id form the unique key combination. There seems to be no data error.

There are two areas where you will need to troubleshoot:
In this code below:
join lookup_channel ch with(nolock) on stg.channel_name = ch.channel_name
The incoming channel_name in the staging table may differ from the channel name on the record already in the destination dimension.
OR
it may be because of this join condition inside the NOT EXISTS condition:
and dest.sr_id=stg.sr_id
and dest.channel_id=ch.channel_id
Here again, the incoming channel_id may differ between the staged data and the destination. So my suggestion is to leave channel_id out of the comparison once and troubleshoot from there. Once the data is loaded into the target, you can see exactly whether the error was caused by channel_id.
Happy troubleshooting!
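It is also worth checking whether the staged data, after the lookup join, contains more than one row per target key: NOT EXISTS only compares incoming rows against rows already in the destination, not against other rows in the same batch, so batch-internal duplicates still violate the primary key. A hedged diagnostic sketch, using the same table and column names as the question:
select ch.channel_id, stg.cust_reg_id, stg.sr_id, count(*) as rows_per_key
from umpdwstg..cust_reg_dim_stg stg
join lookup_channel ch on stg.channel_name = ch.channel_name
group by ch.channel_id, stg.cust_reg_id, stg.sr_id
having count(*) > 1
Any row returned means either duplicate staging rows or a fan-out from lookup_channel (two lookup rows sharing one channel_name).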

If there are already duplicate entries in the staging table cust_reg_dim_stg, then the SELECT query will produce both of those records and try to insert both into the dim_cust_reg table. So use DISTINCT in the SELECT statement.
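Note that DISTINCT only collapses rows that are identical in every selected column. If the duplicates differ in non-key columns, a ROW_NUMBER() ranking keeps exactly one row per key instead. A hedged sketch with the question's names; the ORDER BY choice, i.e. which duplicate survives, is an assumption:
;with ranked as (
    select ch.channel_id, stg.*,
           row_number() over (
               partition by ch.channel_id, stg.cust_reg_id, stg.sr_id
               order by stg.as_of_date desc) as rn   -- keep the newest row per key (assumption)
    from umpdwstg..cust_reg_dim_stg stg
    join lookup_channel ch on stg.channel_name = ch.channel_name
)
select *
from ranked
where rn = 1   -- feed this into the INSERT in place of the raw join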

Why PostgreSQL CTE with DELETE is not working?

I am trying, using two CTEs, to delete a record from my stock table if an update in the same table results in a quantity of 0.
The upserts are working, but the delete is not producing the result I was expecting: the quantity in the stock table is changing to zero, but the record is not being deleted.
Table structure:
CREATE TABLE IF NOT EXISTS stock_location (
stock_location_id SERIAL
, site_code VARCHAR(10) NOT NULL
, location_code VARCHAR(50) NOT NULL
, status CHAR(1) NOT NULL DEFAULT 'A'
, CONSTRAINT pk_stock_location PRIMARY KEY (stock_location_id)
, CONSTRAINT ui_stock_location__keys UNIQUE (site_code, location_code)
);
CREATE TABLE IF NOT EXISTS stock (
stock_id SERIAL
, stock_location_id INT NOT NULL
, item_code VARCHAR(50) NOT NULL
, quantity FLOAT NOT NULL
, CONSTRAINT pk_stock PRIMARY KEY (stock_id)
, CONSTRAINT ui_stock__keys UNIQUE (stock_location_id, item_code)
, CONSTRAINT fk_stock__stock_location FOREIGN KEY (stock_location_id)
REFERENCES stock_location (stock_location_id)
ON DELETE CASCADE ON UPDATE CASCADE
);
This is what the statement looks like; the in* identifiers (inSiteCode, inQuantity, etc.) are parameters of the surrounding function:
WITH stock_location_upsert AS (
INSERT INTO stock_location (
site_code
, location_code
, status
) VALUES (
inSiteCode
, inLocationCode
, inStatus
)
ON CONFLICT ON CONSTRAINT ui_stock_location__keys
DO UPDATE SET
status = inStatus
RETURNING stock_location_id
)
, stock_upsert AS (
INSERT INTO stock (
stock_location_id
, item_code
, quantity
)
SELECT
slo.stock_location_id
, inItemCode
, inQuantity
FROM stock_location_upsert slo
ON CONFLICT ON CONSTRAINT ui_stock__keys
DO UPDATE SET
quantity = stock.quantity + inQuantity
RETURNING stock_id, quantity
)
DELETE FROM stock stk
USING stock_upsert stk2
WHERE stk.stock_id = stk2.stock_id
AND stk.quantity = 0;
Does anyone know what's going on?
This is an example of what I'm trying to do:
DROP TABLE IF EXISTS test1;
CREATE TABLE IF NOT EXISTS test1 (
id serial
, code VARCHAR(10) NOT NULL
, description VARCHAR(100) NOT NULL
, quantity INT NOT NULL
, CONSTRAINT pk_test1 PRIMARY KEY (id)
, CONSTRAINT ui_test1 UNIQUE (code)
);
-- UPSERT
WITH test1_upsert AS (
INSERT INTO test1 (
code, description, quantity
) VALUES (
'01', 'DESC 01', 1
)
ON CONFLICT ON CONSTRAINT ui_test1
DO UPDATE SET
description = 'DESC 02'
, quantity = 0
RETURNING test1.id, test1.quantity
)
DELETE FROM test1
USING test1_upsert
WHERE test1.id = test1_upsert.id
AND test1_upsert.quantity = 0;
The second time the UPSERT command runs, it should delete the record from test1, since the quantity is updated to zero.
Makes sense?
Here, DELETE is working in the way it was designed to work. The answer is actually pretty straightforward and documented. I've experienced the same behaviour years ago.
The reason your delete is not actually removing the data is that, as far as the DELETE statement can see, your WHERE condition doesn't match what's stored in the table.
All sub-statements within a CTE (Common Table Expression) are executed with the same snapshot of the data, so they can't see one another's effects on the target table. In this case, when you run UPDATE and then DELETE, the DELETE statement sees the same data that the UPDATE did, and does not see the updated rows that the UPDATE statement modified.
How can you work around that? You need to separate UPDATE & DELETE into two independent statements.
In case you need to pass along the information about what to delete, you could for example (1) create a temporary table holding the primary keys of the updated rows, so that the later DELETE can join to it, (2) add a marker column to the updated table and set it to flag the updated rows, or (3) do whatever else gets the job done. These examples should give you a feeling for what needs to be done.
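A minimal sketch of the two-statement approach, reusing the test1 example from the question (the split into separate statements is the whole point; everything else is unchanged):
-- Statement 1: the upsert on its own
INSERT INTO test1 (code, description, quantity)
VALUES ('01', 'DESC 01', 1)
ON CONFLICT ON CONSTRAINT ui_test1
DO UPDATE SET description = 'DESC 02', quantity = 0;

-- Statement 2: runs with a fresh snapshot, so it sees the updated quantity
DELETE FROM test1 WHERE quantity = 0;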
Quoting the manual to support my findings:
7.8.2. Data-Modifying Statements in WITH
The sub-statements in WITH are executed concurrently with each other
and with the main query. Therefore, when using data-modifying
statements in WITH, the order in which the specified updates actually
happen is unpredictable. All the statements are executed with the same
snapshot (see Chapter 13), so they cannot “see” one another's effects
on the target tables.
(...)
This also applies to deleting a row that was already updated in the same statement: only the update is performed.
Adding to the helpful explanation above... Whenever possible, it is best to break modifying statements out into their own statements.
However, when the CTE has multiple modifying statements that reference the same subquery and temporary tables are not ideal (such as in stored procedures), you just need a good solution.
In that case, if you'd like a simple trick for ensuring a bit of order, consider this example:
WITH
to_insert AS
(
SELECT
*
FROM new_values
)
, first AS
(
DELETE FROM some_table
WHERE
id in (SELECT id FROM to_insert)
RETURNING *
)
INSERT INTO some_other_table
SELECT * FROM new_values
WHERE
exists (SELECT count(*) FROM first)
;
The trick here is the exists (SELECT count(*) FROM first) part: because the INSERT references the first CTE, that DELETE must be executed before the insert can happen. This is a way (which I wouldn't consider too hacky) to enforce an order while keeping everything within one CTE.
But this is just the concept - there are more optimal ways of doing the same thing for a given context.

PostgreSQL: Get last updates by joining 2 tables

I have 2 tables that I need to join to get the last/latest update in the 2nd table based on valid rows in the 1st table.
The code below is an example.
Table 1: Registered users
This table contains a list of users registered in the system.
When a user is registered, a row is added to this table. A user is registered with a name and a registration time.
A user can get de-registered from the system. When this is done, the de-registration column gets updated to the time that the user was removed. If this value is NULL, it means that the user is still registered.
CREATE TABLE users (
entry_idx SERIAL PRIMARY KEY,
name TEXT NOT NULL,
reg_time TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
dereg_time TIMESTAMP WITH TIME ZONE DEFAULT NULL
);
Table 2: User updates
This table contains updates on the users. Each time a user changes a property (for example, position), the change gets stored in this table. Updates must never be removed, since there is a requirement to keep history in the table.
CREATE TABLE user_updates (
entry_idx SERIAL PRIMARY KEY,
name TEXT NOT NULL,
position INTEGER NOT NULL,
time TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
Required output
So given the above information, I need to get a new table that contains only the last update for the currently registered users.
Test Data
The following data can be used as test data for the above tables:
-- Register 3 users
INSERT INTO users(name) VALUES ('Person1');
INSERT INTO users(name) VALUES ('Person2');
INSERT INTO users(name) VALUES ('Person3');
-- Add some updates for all users
INSERT INTO user_updates(name, position) VALUES ('Person1', 0);
INSERT INTO user_updates(name, position) VALUES ('Person1', 1);
INSERT INTO user_updates(name, position) VALUES ('Person1', 2);
INSERT INTO user_updates(name, position) VALUES ('Person2', 1);
INSERT INTO user_updates(name, position) VALUES ('Person3', 1);
-- Unregister the 2nd user
UPDATE users SET dereg_time = NOW() WHERE name = 'Person2';
From the above, I want the last updates for Person 1 and Person 3.
Failed attempt
I have tried using joins and other methods, but the results are not what I am looking for. The question is almost the same as one asked here. I have used the solution in answer 1 and it does give the correct answer, but it takes too long to get to the answer on my system.
Based on the above link I have created the following query that 'works':
SELECT
t1.*
, t2.*
FROM
users t1
JOIN (
SELECT
t.*,
row_number()
OVER (
PARTITION BY
t.name
ORDER BY t.entry_idx DESC
) rn
FROM user_updates t
) t2
ON
t1.name = t2.name
AND
t2.rn = 1
WHERE
t1.dereg_time IS NULL;
Problem
The problem with the above query is that it takes very long to complete. Table 1 contains a small list of users, while table 2 contains a huge number of updates. I think the query might be inefficient in the way it handles the two tables (based on my limited understanding of the query). From pgAdmin's EXPLAIN, it does a lot of sorting and aggregation on the updates first, before joining with the registered table.
Question
How can I formulate a query to efficiently and fast get the latest updates for registered users?
PostgreSQL has a special DISTINCT ON syntax for this type of query:
select distinct on(t1.name)
--it's better to specify columns explicitly, * just for example
t1.*, t2.*
from users as t1
left outer join user_updates as t2 on t2.name = t1.name
where t1.dereg_time is null
order by t1.name, t2.entry_idx desc
sql fiddle demo
You can try it, though your original query looks like it should work fine too.
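If either version is still slow, an index matching the join and sort may help; a hedged suggestion (the index name is arbitrary):
create index user_updates_name_entry_idx on user_updates (name, entry_idx desc);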
I am using q1 to get the last update time of each user, then joining with users to remove entries that have been de-registered, then joining with q2 to get the rest of the user_updates fields.
select users.*,q2.* from users
join
(select name,max(time) t from user_updates group by name) q1
on users.name=q1.name
join user_updates q2 on q1.t=q2.time and q1.name=q2.name
where
users.dereg_time is null
(I haven't tested it; I have edited some things.)
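Another option worth benchmarking when users is small and user_updates is huge is a lateral join, which fetches just the newest update per user instead of ranking or aggregating the whole updates table first. A hedged sketch against the question's schema (it relies on an index like the one suggested above to be fast):
select u.*, lu.*
from users u
cross join lateral (
    select uu.*
    from user_updates uu
    where uu.name = u.name
    order by uu.entry_idx desc
    limit 1                      -- only the latest update for this user
) lu
where u.dereg_time is null;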

in SQL, best way to join first and last instance of child table without NOT EXISTS?

In PostgreSQL, I have an issue table and a child issue_step table; an issue contains one or more steps.
The view issue_v pulls things from the issue and from its first and last steps: author and from_ts come from the first step, while status and thru_ts come from the last step.
the tables
create table if not exists seeplai.issue(
isu_id serial primary key,
subject varchar(240)
);
create table if not exists seeplai.issue_step(
stp_id serial primary key,
isu_id int not null references seeplai.issue on delete cascade,
status varchar(12) default 'open',
stp_ts timestamp(0) default current_timestamp,
author varchar(40),
notes text
);
the view
create view seeplai.issue_v as
select isu.*,
first.stp_ts as from_ts,
first.author as author,
first.notes as notes,
last.stp_ts as thru_ts,
last.status as status
from seeplai.issue isu
join seeplai.issue_step first on( first.isu_id = isu.isu_id and not exists(
select 1 from seeplai.issue_step where isu_id=isu.isu_id and stp_id>first.stp_id ) )
join seeplai.issue_step last on( last.isu_id = isu.isu_id and not exists(
select 1 from seeplai.issue_step where isu_id=isu.isu_id and stp_id<last.stp_id ) );
note 1: issue_step.stp_id is guaranteed to be chronologically sequential, so I use it instead of stp_ts because it's already indexed
this works, but it's ugly as sin, and it cannot be the most efficient query in the world.
In this code, I use a sub-query to find the first and last step IDs, and then join to the two instances of the step table by using those found values.
SELECT ISU.*
,S1.STP_TS AS FROM_TS
,S1.AUTHOR AS AUTHOR
,S1.NOTES AS NOTES
,S2.STP_TS AS THRU_TS
,S2.STATUS AS STATUS
FROM SEEPLAI.ISSUE ISU
INNER JOIN
(
SELECT ISU_ID
,MIN(STP_ID) AS MIN_ID
,MAX(STP_ID) AS MAX_ID
FROM SEEPLAI.ISSUE_STEP
GROUP BY
ISU_ID
) SQ
ON SQ.ISU_ID = ISU.ISU_ID
INNER JOIN
SEEPLAI.ISSUE_STEP S1
ON S1.STP_ID = SQ.MIN_ID
INNER JOIN
SEEPLAI.ISSUE_STEP S2
ON S2.STP_ID = SQ.MAX_ID
Note: you really shouldn't be using SELECT * in a view. It is much better practice to explicitly list all the fields that you need in the view.
Have you considered using window functions?
http://www.postgresql.org/docs/9.2/static/tutorial-window.html
http://www.postgresql.org/docs/9.2/static/functions-window.html
A starting point:
select steps.*,
first_value(steps.stp_id) over w as first_id,
last_value(steps.stp_id) over w as last_id
from issue_step steps
-- the explicit frame matters: with the default frame (which ends at the
-- current row), last_value would simply return the current row's stp_id
window w as (partition by steps.isu_id order by steps.stp_id
             rows between unbounded preceding and unbounded following)
Btw, if you know the IDs in advance, you'll be much better off getting the details in a separate query. (Trying to fetch everything in one go will just yield sucky plans due to subqueries or joins on aggregates, which will result in inefficiently considering/joining the entire tables together.)
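A hedged, untested sketch of how that starting point could be folded into the view (the DISTINCT collapses the per-step rows down to one row per issue before joining back for the first- and last-step details):
create view seeplai.issue_v as
with ranked as (
    select s.isu_id,
           first_value(s.stp_id) over w as first_id,
           last_value(s.stp_id) over w as last_id
    from seeplai.issue_step s
    window w as (partition by s.isu_id order by s.stp_id
                 rows between unbounded preceding and unbounded following)
)
select isu.*,
       f.stp_ts as from_ts, f.author, f.notes,
       l.stp_ts as thru_ts, l.status
from seeplai.issue isu
join (select distinct isu_id, first_id, last_id from ranked) r
  on r.isu_id = isu.isu_id
join seeplai.issue_step f on f.stp_id = r.first_id
join seeplai.issue_step l on l.stp_id = r.last_id;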

Delete duplicates with no primary key

Here we want to delete rows with a duplicated value in a column (Product) which will then be used as a primary key.
The column is of type nvarchar, and we don't want two rows for one product.
The database is large, with thousands of rows we need to remove.
When querying for the duplicates, we want to keep the first item and remove the second one as the duplicate.
There is no primary key yet; we want to create it after this round of duplicate removal.
Then the Product column could be our primary key.
The database is SQL Server CE.
I tried several methods, and mostly get errors similar to:
There was an error parsing the query. [ Token line number = 2,Token line offset = 1,Token in error = FROM ]
A method I tried:
DELETE FROM TblProducts
FROM TblProducts w
INNER JOIN (
SELECT Product
FROM TblProducts
GROUP BY Product
HAVING COUNT(*) > 1
)Dup ON w.Product = Dup.Product
The way I would prefer, and am trying to learn and adjust my code toward, is something like this
(it's not correct yet):
SELECT Product, COUNT(*) TotalCount
FROM TblProducts
GROUP BY Product
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC
--
;WITH cte -- These 3 lines are the ones I have the most doubt about
AS (SELECT ROW_NUMBER() OVER (PARTITION BY Product
ORDER BY ( SELECT 0)) RN
FROM TblProducts)
DELETE FROM cte
WHERE RN > 1
If you have two DIFFERENT records with the same Product column, then you can SELECT the unwanted records with some criterion, e.g.
CREATE TABLE victims AS
SELECT MAX(entryDate) AS date, Product, COUNT(*) AS dups FROM ProductsTable WHERE ...
GROUP BY Product HAVING dups > 1;
Then you can do a DELETE JOIN between ProductTable and Victims.
Alternatively, you can select Product only, and then do a DELETE on some other JOIN condition, for example having an invalid CustomerId, or EntryDate NULL, or anything else. This works if you know that there is one and only one valid copy of each Product, and all the others are recognizable by the invalid data.
Suppose you instead have IDENTICAL records (or you have both identical and non-identical, or you may have several dupes for some product and you don't know which). You run exactly the same query. Then, you run a SELECT query on ProductsTable and SELECT DISTINCT all products matching the product codes to be deduped, grouping by Product, and choosing a suitable aggregate function for all fields (if identical, any aggregate should do. Otherwise I usually try for MAX or MIN). This will "save" exactly one row for each product.
At that point you run the DELETE JOIN and kill all the duplicated products. Then, simply reimport the saved and deduped subset into the main table.
Of course, between the DELETE JOIN and the INSERT SELECT, you will have the DB in an unstable state, with every product that had at least one duplicate simply missing.
Another way which should work in MySQL:
-- Create an empty table
CREATE TABLE deduped AS SELECT * FROM ProductsTable WHERE false;
CREATE UNIQUE INDEX deduped_ndx ON deduped(Product);
-- DROP duplicate rows, Joe the Butcher's way
INSERT IGNORE INTO deduped SELECT * FROM ProductsTable;
ALTER TABLE ProductsTable RENAME TO ProductsBackup;
ALTER TABLE deduped RENAME TO ProductsTable;
-- TODO: Copy all indexes from ProductsTable on deduped.
NOTE: the way above DOES NOT WORK if you want to distinguish "good records" and "invalid duplicates". It only works if you have redundant DUPLICATE records, or if you do not care which row you keep and which you throw away!
EDIT:
You say that "duplicates" have invalid fields. In that case you can modify the above with a sorting trick:
SELECT * FROM ProductsTable ORDER BY Product, FieldWhichShouldNotBeNULL IS NULL;
Then, if you have only one row for a product, all well and good: it will get selected. If you have more, the one for which FieldWhichShouldNotBeNULL IS NULL is FALSE (i.e. the one where the field is actually not null, as it should be) sorts first and gets inserted. All the others bounce silently, due to the IGNORE clause, against the uniqueness of Product. Not a really pretty way to do it (and check I didn't mix true with false in my clause!), but it ought to work.
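A hedged completion of that trick, combined with the deduped table from the earlier block (same assumed names; in MySQL, false sorts before true, so rows whose field is populated come first):
-- rows with FieldWhichShouldNotBeNULL populated insert first;
-- later duplicates of the same Product bounce off the unique index
INSERT IGNORE INTO deduped
SELECT * FROM ProductsTable
ORDER BY Product, FieldWhichShouldNotBeNULL IS NULL;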
EDIT
actually more of a new answer
This is a simple table to illustrate the problem
CREATE TABLE ProductTable ( Product varchar(10), Description varchar(10) );
INSERT INTO ProductTable VALUES ( 'CBPD10', 'C-Beam Prj' );
INSERT INTO ProductTable VALUES ( 'CBPD11', 'C Proj Mk2' );
INSERT INTO ProductTable VALUES ( 'CBPD12', 'C Proj Mk3' );
There is no index yet, and no primary key. We could still declare Product to be the primary key.
But something bad happens. Two new records get in, and both have NULL description.
Yet, the second one is a valid product since we knew nothing of CBPD14 before now, and therefore we do NOT want to lose this record completely. We do want to get rid of the spurious CBPD10 though.
INSERT INTO ProductTable VALUES ( 'CBPD10', NULL );
INSERT INTO ProductTable VALUES ( 'CBPD14', NULL );
A rude DELETE FROM ProductTable WHERE Description IS NULL is out of the question; it would kill CBPD14, which isn't a duplicate.
So we do it like this. First get the list of duplicates:
SELECT Product, COUNT(*) AS Dups FROM ProductTable GROUP BY Product HAVING Dups > 1;
We assume that: "There is at least one good record for every set of bad records".
We check this assumption by positing the opposite and querying for it. If all is copacetic we expect this query to return nothing.
SELECT Dups.Product FROM ProductTable
RIGHT JOIN ( SELECT Product, COUNT(*) AS Dups FROM ProductTable GROUP BY Product HAVING Dups > 1 ) AS Dups
ON (ProductTable.Product = Dups.Product
AND ProductTable.Description IS NOT NULL)
WHERE ProductTable.Description IS NULL;
To further verify, I insert two records that represent this mode of failure; now I do expect the query above to return the new code.
INSERT INTO ProductTable VALUES ( "AC5", NULL ), ( "AC5", NULL );
Now the "check" query indeed returns,
AC5
So, the generation of Dups looks good.
I proceed now to delete all duplicate records that are not valid. If there are duplicate valid records, they will stay duplicated unless some condition can be found that distinguishes one "good" record among them and declares all the others "invalid" (maybe by repeating the procedure with a different field than Description).
But ay, there's a rub. Currently, you cannot delete from a table and select from the same table in a subquery ( http://dev.mysql.com/doc/refman/5.0/en/delete.html ). So a little workaround is needed:
CREATE TEMPORARY TABLE Dups AS
SELECT Product, COUNT(*) AS Duplicates
FROM ProductTable GROUP BY Product HAVING Duplicates > 1;
DELETE ProductTable FROM ProductTable JOIN Dups USING (Product)
WHERE Description IS NULL;
Now this will delete all invalid records, provided that they appear in the Dups table.
Therefore our CBPD14 record will be left untouched, because it does not appear there. The "good" record for CBPD10 will be left untouched because it's not true that its Description is NULL. All the others - poof.
Let me state again: if a product has no valid record and yet appears duplicated, then all copies of that record will be killed - there will be no survivors.
To avoid this, you may first SELECT the rows representing this mode of failure (using the check query above, the one which should return nothing) into another TEMPORARY TABLE, then INSERT them back into the main table after the deletion (using transactions might be in order).
Create a new table by scripting the old one out and renaming it. Also script all objects (indexes etc.) from the old table onto the new one. Insert the keepers into the new table. If your database is in the bulk-logged or simple recovery model, this operation will be minimally logged. Drop the old table and then rename the new one to the old name.
The advantage of this over a delete will be that the insert can be minimally logged. Deletes do double work because not only does the data get deleted, but the delete has to be written to the transaction log. For big tables, minimally logged inserts will be much faster than deletes.
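A hedged sketch of that rebuild (this assumes full SQL Server, as the recovery-model point implies; SQL Server CE has neither SELECT INTO nor sp_rename, and the ORDER BY (SELECT 0) picks an arbitrary survivor per Product):
-- keep one arbitrary row per Product in a new table
SELECT *
INTO TblProducts_new
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Product ORDER BY (SELECT 0)) AS rn
    FROM TblProducts
) t
WHERE rn = 1;

ALTER TABLE TblProducts_new DROP COLUMN rn;  -- drop the helper column

DROP TABLE TblProducts;
EXEC sp_rename 'TblProducts_new', 'TblProducts';
-- then recreate indexes and add the primary key on Product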
If it's not that big, you have some downtime, and you have SQL Server Management Studio, you can put an identity field on the table using the GUI. Now you have the situation from your CTE, except the rows themselves are truly distinct. So now you can do the following:
SELECT MIN(lhs.MyTempIDField)
FROM
table_a lhs
join table_a rhs
on lhs.field1 = rhs.field1
and lhs.field2 = rhs.field2 [etc]
WHERE
lhs.MyTempIDField <> rhs.MyTempIDField
GROUP BY
lhs.field1, lhs.field2 [etc]
This gives you the one 'good' row to keep from each group of duplicates. Now you can wrap this query with a DELETE FROM query.
DELETE FROM lhs
FROM table_a lhs
join table_a rhs
on lhs.field1 = rhs.field1
and lhs.field2 = rhs.field2 [etc]
WHERE
lhs.MyTempIDField <> rhs.MyTempIDField
and lhs.MyTempIDField not in (
SELECT MIN(lhs.MyTempIDField)
FROM
table_a lhs
join table_a rhs
on lhs.field1 = rhs.field1
and lhs.field2 = rhs.field2 [etc]
WHERE
lhs.MyTempIDField <> rhs.MyTempIDField
GROUP BY
lhs.field1, lhs.field2 [etc]
)
Try this:
DELETE FROM TblProducts
WHERE Product IN
(
SELECT Product
FROM TblProducts
GROUP BY Product
HAVING COUNT(*) > 1)
This suffers from the defect that it deletes ALL the records with a duplicated Product. What you probably want to do is delete all but one of each group of records with a given Product. It might be worthwhile to copy all the duplicates to a separate table first, and then somehow remove duplicates from that table, then apply the above, and then copy remaining products back to the original table.
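Since the question is about SQL Server CE (which has no CTEs or ROW_NUMBER), here is a hedged sketch of the keep-one-per-group idea using a temporary identity column; I believe CE accepts this subquery form, but verify on your version:
-- give every row a unique, ordered id
ALTER TABLE TblProducts ADD TempID INT IDENTITY(1,1);

-- keep the first row of each Product group, delete the rest
DELETE FROM TblProducts
WHERE TempID NOT IN (SELECT MIN(TempID) FROM TblProducts GROUP BY Product);

-- remove the helper column; Product can now become the primary key
ALTER TABLE TblProducts DROP COLUMN TempID;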

How to use SQL Merge populating Master Detail related tables

I've been searching and I've yet to find an example using merge for populating related tables.
The Northwind DB Order & OrderDetail tables could be used. (In our scenario, our tables are 3 levels deep.)
For simplicity let's say we have the following tables.
Orders
OrderID - PK
OrderNumber
OrderDetails
OrderID - PK - FK to Orders.OrderID
OrderLineItemNumber - PK
OrderDetailDetails
OrderID - PK - FK to OrderDetails.OrderID
OrderLineItemNumber - PK - FK to OrderDetails.OrderLineItemNumber
OrderSequenceNumber - PK
Also, in this scenario, records get written to staging tables that are identical to the tables above. The merge would need to merge records from the 3 staging tables into the 3 matching production tables.
The production Orders table's OrderID will not share the staging Orders table's OrderID value.
So if the merge conditions are met, then there must be an insert into the Order table to generate OrderId (set to identity) because OrderId is needed for the OrderDetail & OrderDetailDetails rows to be created.
Right now I've written a service in C# that does all this but it's not that performant.
We discovered MERGE, so we're looking into whether it can be used in a situation such as this. Any tips or pointers would be greatly appreciated.
Thanks.
Edit: I am now using OUTPUT to store values in a table variable called @MergeOutput.
Declare #MergeOutput Table
(
ActionType varchar(10),
InsertedOrderId int,
StagingOrderID int,
DeletedOrderId int
);
However, I need to do a Merge on all 3 tables. (Order, OrderDetail & OrderDetailDetails)
Also, these tables have more fields than just the Id's.
So I've started creating the 2nd Merge for the OrderDetail table.
MERGE OrderDetail AS OD
USING(
SELECT OrderID,
OrderLineItemNumber,
ProductId
FROM OrderDetail AS OD
where OrderId IN (Select StagingOrderID from @MergeOutput where ActionType = 'INSERT')
) AS src(OrderID,
OrderLineItemNumber,
ProductId
)
ON (OD.OrderId = src.OrderID AND OD.OrderLineItemNumber = src.OrderLineItemNumber)
WHEN NOT MATCHED By Target THEN
INSERT INTO <-- (This doesn't work no matter what I've tried so far.)
Select (Select Distinct InsertedOrderID from @MergeOutput where StagingOrderId = OrderID), src.OrderLineItemNumber, src.ProductId
;
I see the following error with the code above:
"Incorrect syntax near the keyword 'into'"
I still need the Merge functionality to move records across all 3 tables.
Looks like I've finally got this working. I had to change the Insert statement as follows.
Insert (OrderId, OrderLineItemNumber, ProductID)
Values ((Select Distinct InsertedOrderID from @MergeOutput where StagingOrderId = OrderID), src.OrderLineItemNumber, src.ProductId)
I had tried this Insert statement earlier on; I just figured out I had to wrap the subselect in parentheses ().
Thanks for everyone's help. I'm hoping I can carry this over to the merge for the 3rd table.
This is a nasty problem that keeps coming up. You need to extract the inserted identity values. In SQL Server you can do this using the OUTPUT clause (http://msdn.microsoft.com/en-us/library/ms177564.aspx) with the INSERT "virtual table". This allows you to get all inserted IDs out in one statement.
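A hedged sketch of the master-level MERGE (the staging table name Staging_Orders and matching on OrderNumber are assumptions based on the question's description; a table variable is used to match the asker's edit, but a temp table works the same way). The useful property here is that MERGE's OUTPUT clause, unlike a plain INSERT's, can reference source columns, which is what lets you capture the staging-to-production ID mapping:
DECLARE @MergeOutput TABLE (
    ActionType      varchar(10),
    InsertedOrderId int,
    StagingOrderId  int
);

MERGE Orders AS tgt
USING Staging_Orders AS src
    ON tgt.OrderNumber = src.OrderNumber      -- match on the business key, not the identity
WHEN NOT MATCHED BY TARGET THEN
    INSERT (OrderNumber) VALUES (src.OrderNumber)
OUTPUT $action, inserted.OrderID, src.OrderID
    INTO @MergeOutput (ActionType, InsertedOrderId, StagingOrderId);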
You can then push the IDs into a temp table and use them to insert the detail records with the appropriate master IDs like this:
INSERT INTO Detail
SELECT Staging_Detail.*   -- in practice, list the columns explicitly, swapping in the new master ID
FROM Staging_Detail
JOIN #MasterIDs ON Staging_Detail.MasterID = #MasterIDs.MasterID