Mapping surrogate keys in SSIS - SQL

How do I map a surrogate key (which is a foreign key to another dimension table) in SSIS?
I have a Dim.Camp table like this:
Dim.Camp(campkey int identity(1,1),Advkey int,campbk int,campname varchar(10))
Dim.Adv(Advkey int identity(1,1),Advbk int)
The above are my dimension tables; these are my staging tables:
Camp(Advid int,campid int,campname varchar(10))
Adv(Advid int)
I load my Dim.Camp through a Lookup task in SSIS using my staging tables.
Then I get:
Dim.Camp(campkey int identity(1,1),Advkey int,campbk int,campname varchar(10)) populated, except that
Advkey gets all NULLs in its column, because there is no corresponding mapping in the staging tables.
Can somebody tell me what I'm doing wrong, or how to get this done?

I would like to know what the relationship is between the entities or tables "Camp" and "Adv" in your source system or in the staging tables. You always need the business key to perform a lookup in the dimension tables; that is, you need the campbk to look up records in Dim.Camp and the Advbk to look up records in the Dim.Adv dimension.
If I understood correctly, you should first load Dim.Adv from staging, using for instance a MERGE command in an Execute SQL task or a data flow task. Then you can load Dim.Camp from staging, something like this:
Source query:
select Advid as Advbk, campid as campbk, campname from Staging.Camp
Then add a Lookup (lookup table: Dim.Adv) and get Advkey where Dim.Adv.Advbk = Advbk.
Finally, add another Lookup (table: Dim.Camp) and determine whether you have to update or insert each record.
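For that first step (loading Dim.Adv), a minimal MERGE sketch, assuming the staging Adv table shown above, could be:

merge Dim.Adv as tgt
using (select distinct Advid from Adv) as src
    on tgt.Advbk = src.Advid
when not matched by target then
    insert (Advbk) values (src.Advid);   -- Advkey is identity, so it is generated automatically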
Let me know if you have further questions.
Kind Regards,
Paul

Related

Migrate data between two SQL databases with identity column

Here is the scenario... I have two databases (A & B) with the same schema but different records. I'd like to transfer B's data into the corresponding tables in DB A.
Let's say we have tables named Question and Answer in both databases. DB A contains 10 records in the Question table and 30 in the Answer table. Both tables have an identity column Id starting at 1 (auto increment), and there is a 1-to-many relation between Question and Answer.
In DB B, we have 5 entries in the Question table and 20 in Answer.
My requirement is to copy the data of both tables from source DB B into the destination DB A without any conflict in the identity columns, while maintaining the relation between the two tables during the transfer.
Any solution or potential workaround would be highly appreciated.
I will not write full SQL here, but here is what I think can be done. Make sure to use SET IDENTITY_INSERT ON and OFF.
Take the max IDs of both tables from DB A, e.g. A_maxidquestion and A_maxidanswer.
Select from B's Question table. In the select list, add a derived column QuestionID + A_maxidquestion. This will be your new ID.
Select from B's Answer table. In the select list, add a derived column AnswerID + A_maxidanswer, and compute the FK as QuestionID + A_maxidquestion.
Note: make sure the destination tables are not being used by any other process for inserts while you are inserting.
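A minimal T-SQL sketch of that idea, assuming both databases live on the same instance and hypothetical columns Question(Id, Title) and Answer(Id, QuestionId, Body):

DECLARE @A_maxidquestion INT = (SELECT MAX(Id) FROM A.dbo.Question);
DECLARE @A_maxidanswer   INT = (SELECT MAX(Id) FROM A.dbo.Answer);

SET IDENTITY_INSERT A.dbo.Question ON;
INSERT INTO A.dbo.Question (Id, Title)
SELECT Id + @A_maxidquestion, Title       -- shifted PK
FROM B.dbo.Question;
SET IDENTITY_INSERT A.dbo.Question OFF;

SET IDENTITY_INSERT A.dbo.Answer ON;
INSERT INTO A.dbo.Answer (Id, QuestionId, Body)
SELECT Id + @A_maxidanswer,               -- shifted PK
       QuestionId + @A_maxidquestion,     -- FK shifted by the parent's offset
       Body
FROM B.dbo.Answer;
SET IDENTITY_INSERT A.dbo.Answer OFF;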
One of the best approaches to something like this is to use the OUTPUT clause. https://learn.microsoft.com/en-us/sql/t-sql/queries/output-clause-transact-sql?view=sql-server-2017 You can insert the new parent and capture the newly inserted identity value, which you can then use to insert the children.
You can do this set-based if you also include a temp table that holds the original identity value and the new identity value.
With no details of the tables, that is the best I can do.
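A sketch of the set-based idea, with the same assumed Question/Answer columns as above; MERGE is used here because, unlike a plain INSERT, its OUTPUT clause can reference source columns, which is what lets you build the old-to-new ID map:

DECLARE @IdMap TABLE (OldId INT, NewId INT);

-- The never-true match condition makes MERGE insert every source row
MERGE INTO A.dbo.Question AS tgt
USING B.dbo.Question AS src
    ON 1 = 0
WHEN NOT MATCHED THEN
    INSERT (Title) VALUES (src.Title)
OUTPUT src.Id, inserted.Id INTO @IdMap (OldId, NewId);

-- Re-parent the children through the old-to-new map
INSERT INTO A.dbo.Answer (QuestionId, Body)
SELECT m.NewId, b.Body
FROM B.dbo.Answer AS b
JOIN @IdMap AS m ON b.QuestionId = m.OldId;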

Incremental load for Updates into Warehouse

I am planning for an incremental load into warehouse (especially for updates of source tables in RDBMS).
I am capturing the updated rows in staging tables from the RDBMS based on the update datetime. But how do I determine which columns of a particular row need to be updated in the target warehouse tables?
Or do I just delete a particular row in the warehouse table (based on the primary key of the row in staging table) and insert the new updated row?
What is the best way to implement the incremental load between the RDBMS and the warehouse using PL/SQL and SQL?
In my opinion, the easiest way to accomplish this is as follows:
Create a stage table identical to your host table. When you do your incremental/net-change load, load all changed records into this table (based on whatever your "last updated" field is).
Delete the records from your actual table based on the primary key. For example, if your primary key is (customer, part), the query might look like this:
delete from main_table m
where exists (
    select null
    from stage_table s
    where m.customer = s.customer
      and m.part = s.part
);
Insert the records from the stage to the main table.
You could also update existing records and insert new records, but either way that's two steps. The advantage of the method I listed is that it will work even if your tables have partitions and the newly updated data violates one of the original partition rules, whereas an update would not handle that. Also, the syntax is much simpler: an update would have to list every single field, whereas the delete/insert allows you to list only the primary key fields.
Oracle also has a merge clause that will update if it exists or insert if it does not. I honestly don't know how that would be impacted if you had partitions.
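A minimal MERGE sketch under the same assumed keys (the non-key columns qty and last_updated are made up for illustration):

merge into main_table m
using stage_table s
    on (m.customer = s.customer and m.part = s.part)
when matched then
    update set m.qty = s.qty, m.last_updated = s.last_updated
when not matched then
    insert (customer, part, qty, last_updated)
    values (s.customer, s.part, s.qty, s.last_updated);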
One major caveat: if your changes include deletes (records that need to be removed from the main table), none of these approaches will handle that, and you will need some other way to deal with it. It may not be necessary, depending on your circumstances, but it's something to consider.

Pre-process validation of an INSERT INTO statement

I have a stored procedure that creates many INSERT INTO statements using Dynamic SQL. I have no control over the data that I am attempting to insert and the values being inserted are derived from a
SELECT * FROM sourceTable
Occasionally, when inserting, a foreign key constraint violation is triggered (the data is being taken from environment A and inserted into environment B, so it is possible certain other tables have not been kept up to date).
My question is: is there a way to pre-validate all my INSERT statements for any errors (foreign key constraint or otherwise) before executing them? Or do I need to create checkpoints and use the rollback functionality?
---OVERVIEW OF PROCESS
We create tables on environment A (source) containing subsets of data based on selection criteria.
Using the SQL Export Wizard, these tables are copied across to environment B (target).
A stored procedure is run to import the data from these tables into the corresponding tables on environment B. This sp uses the INSERT INTO tableA SELECT * FROM tableAFromSource command within a dynamic SQL loop containing all the table names.
This approach is used due to external factors we cannot control (servers cannot be linked, data structures, permissions on servers, etc.).
I don't know of any type of pre-processing that checks what the result of a given query would be, but it's often easy to check for conditions using plain T-SQL. If you just want to avoid primary key constraint violations, you can use the following variation of the query you provided (considering Id the primary key):
INSERT INTO tableA
SELECT *
FROM tableAFromSource src
WHERE src.Id NOT IN ( SELECT tba.Id FROM tableA tba );
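The same pattern extends to foreign key checks; here parentTable and src.ParentId are hypothetical stand-ins for whichever reference is failing:

INSERT INTO tableA
SELECT *
FROM tableAFromSource src
WHERE EXISTS ( SELECT 1
               FROM parentTable p              -- hypothetical referenced table
               WHERE p.Id = src.ParentId )     -- hypothetical FK column
   OR src.ParentId IS NULL;                    -- NULL FKs never violate the constraint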

Foreign key Constraints - Writing to Error Table - SQL Server 2008

I am new to SQL Server. I have a batch process that loads data into my stage tables. I have some foreign keys on those tables. I want to dump all the foreign key errors encountered while loading into an error table. How do I do that?
Thanks
New Novice
Use SSIS to load the data. Records which fail validation can be sent off to an exception table.
One approach would be to load the data into a temporary table which has no FK constraints, remove the bad records (those that violate the FK constraints), then move the data from the temp table into the stage table. If you have lots of FKs on your table this will probably be a bit tedious, so you may want to automate the process.
Here's some pseudo-code to show what I mean...
-- First put the raw data into MyTempTable
-- Find the records that are "bad" -- you can SELECT INTO a "bad records" table
-- for later inspection if you want...
SELECT *
INTO #BadRecords
FROM MyTempTable
WHERE ForeignKeyIDColumn NOT IN
(
SELECT ID FROM ForeignKeyTable
)
-- Remove the bad records now
DELETE
FROM MyTempTable
WHERE ForeignKeyIDColumn NOT IN
(
SELECT ID FROM ForeignKeyTable
)
-- Now the data is "clean" (won't violate the FK) so you can insert it
-- from MyTempTable into the stage table
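A minimal completion of that last step might look like this (the stage table name MyStageTable is assumed):

INSERT INTO MyStageTable
SELECT *
FROM MyTempTable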

Merging databases how to handle duplicate PK's

We have three databases that are physically separated by region, one each in LA, SF, and NY. All the databases share the same schema but contain data specific to their region. We're looking to merge these databases into one and mirror it. We need to preserve the data for each region but merge them into one db. This presents quite a few issues for us; for example, we will certainly have duplicate primary keys, and foreign keys will potentially be invalid.
I'm hoping to find someone who has had experience with a task like this who could provide some tips, strategies and words of experience on how we can accomplish the merge.
For example, one idea was to create composite keys and then change our code and sprocs to find the data via the composite key (region/original pk). But this requires us to change all of our code and sprocs.
Another idea was to just import the data and let it generate new PK's and then update all the FK references to the new PK. This way we potentially don't have to change any code.
Any experience is welcome!
I have no first-hand experience with this, but it seems to me like you ought to be able to uniquely map PK -> New PK for each server. For instance, generate new PKs such that data from LA server has PK % 3 == 2, SF has PK % 3 == 1, and NY has PK % 3 == 0. And since, as I understood your question anyway, each server only stores FK relationships to its own data, you can update the FKs in identical fashion.
NewLA = OldLA*3-1
NewSF = OldSF*3-2
NewNY = OldNY*3
You can then merge those and have no duplicate PKs. This is essentially, as you already said, just generating new PKs, but structuring it this way allows you to trivially update your FKs (assuming, as I did, that the data on each server is isolated). Good luck.
BEST: add a RegionCode column and include it in your PKs, but then you have to do all the leg work you wanted to avoid.
HACK: if your IDs are INTs, a quick fix would be to add a fixed value based on region to each key on import. INTs can be as large as 2,147,483,647.
local server data:
LA IDs: 1,2,3,4,5,6
SF IDs: 1,2,3,4,5
NY IDs: 1,2,3,4,5,6,7,9
add 100000000 to LA's IDs
add 200000000 to SF's IDs
add 300000000 to NY's IDs
combined server data:
LA IDs: 100000001,100000002,100000003,100000004,100000005,100000006
SF IDs: 200000001,200000002,200000003,200000004,200000005
NY IDs: 300000001,300000002,300000003,300000004,300000005,300000006,300000007,300000009
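A sketch of that import for one hypothetical table Customer(Id, Name), with Combined as the merged database (remember to shift every FK that references the table by the same offset):

SET IDENTITY_INSERT Combined.dbo.Customer ON;

INSERT INTO Combined.dbo.Customer (Id, Name)
SELECT Id + 100000000, Name    -- LA's offset; use 200000000 for SF, 300000000 for NY
FROM LA.dbo.Customer;

SET IDENTITY_INSERT Combined.dbo.Customer OFF;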
I have done this and I say change your keys (pick a method) rather than changing your code. Invariably you will either miss a stored procedure or introduce a bug. With data changes, it is pretty easy to write tests to look for orphaned records or to verify that things were matched up correctly. With code changes, especially code that is working correctly, it is too easy to miss something.
One thing you could do is set up the tables with regional data to use GUIDs. That way, the primary keys in each region are unique, and you can mix and match data (import data from one region to another). For the tables which have shared data (like type tables), you can keep the primary keys the way they are (since they should be the same everywhere).
Here is some information about GUIDs:
http://www.sqlteam.com/article/uniqueidentifier-vs-identity
Maybe SQL Server Management Studio lets you convert columns to use GUIDs easily. I hope so!
Best of luck.
What I have done in a situation like this is:
1. Create a new db with the same schema, but only tables: no PKs, FKs, checks, etc.
2. Transfer data from DB1 to this source db.
3. For each table in the target database, find the top number for the PK.
4. For each table in the source database, update their PKs, FKs, etc., starting with the (top number + 1) from the target db.
5. For each table in the target database, set identity insert to on.
6. Import data from the source db to the target db.
7. For each table in the target database, set identity insert to off.
8. Clear the source db.
9. Repeat for DB2.
As Jon mentioned, I would use GUIDs to solve the merge task, and I see two different solutions that require GUIDs:
1) Permanently change your database schema to use GUIDs instead of INTEGER (IDENTITY) as primary key.
This is a good solution in general, but if you have a lot of non-SQL code that is somehow bound to the way your identifiers work, it could require quite a few code changes. Then again, since you are merging databases, you may need to update your application anyway so that it works with one region's data only, based on the user logged in, etc.
2) Temporarily add GUIDs for migration purposes only, and after the data is migrated, drop them:
This one is a bit more tricky, but once you write the migration script, you can (re-)run it multiple times to merge the databases again in case you get it wrong the first time. Here is an example:
Table: PERSON (ID INT PRIMARY KEY, Name VARCHAR(100) NOT NULL)
Table: ADDRESS (ID INT PRIMARY KEY, City VARCHAR(100) NOT NULL, PERSON_ID INT)
Your alter scripts are (note that for all PKs we automatically generate the GUID):
ALTER TABLE PERSON ADD UID UNIQUEIDENTIFIER NOT NULL DEFAULT (NEWID())
ALTER TABLE ADDRESS ADD UID UNIQUEIDENTIFIER NOT NULL DEFAULT (NEWID())
ALTER TABLE ADDRESS ADD PERSON_UID UNIQUEIDENTIFIER NULL
Then you populate the GUID FKs from the existing integer ones:
--// set ADDRESS.PERSON_UID
UPDATE ADDRESS
SET ADDRESS.PERSON_UID = PERSON.UID
FROM ADDRESS
INNER JOIN PERSON
ON ADDRESS.PERSON_ID = PERSON.ID
You do this for all PKs (automatically generate GUID) and FKs (update as shown above).
Now you create your target database. In this target database you also add the UID columns for all the PKs and FKs. Also disable all FK constraints.
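One quick way to do that last part is the undocumented but widely used sp_MSforeachtable helper:

EXEC sp_MSforeachtable 'ALTER TABLE ? NOCHECK CONSTRAINT ALL'

and, once the merge is done, you can re-enable and re-validate everything with:

EXEC sp_MSforeachtable 'ALTER TABLE ? WITH CHECK CHECK CONSTRAINT ALL'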
Now you insert from each of your source databases to the target one (note: we do not insert PKs and integer FKs):
INSERT INTO TARGET_DB.dbo.PERSON (UID, NAME)
SELECT UID, NAME FROM SOURCE_DB1.dbo.PERSON
INSERT INTO TARGET_DB.dbo.ADDRESS (UID, CITY, PERSON_UID)
SELECT UID, CITY, PERSON_UID FROM SOURCE_DB1.dbo.ADDRESS
Once you have inserted the data from all the databases, you run the reverse of the original update to make the integer FKs consistent with the GUIDs in the target database:
--// set ADDRESS.PERSON_ID
UPDATE ADDRESS
SET ADDRESS.PERSON_ID = PERSON.ID
FROM ADDRESS
INNER JOIN PERSON
ON ADDRESS.PERSON_UID = PERSON.UID
Now you may drop all the UID columns:
ALTER TABLE PERSON DROP COLUMN UID
ALTER TABLE ADDRESS DROP COLUMN UID
ALTER TABLE ADDRESS DROP COLUMN PERSON_UID
So at the end you should get a rather long migration script that should do the job for you. The point is: IT IS DOABLE.
NOTE: nothing written here has been tested.