I'm trying to migrate some data from two tables in an OLD database to a NEW database.
The problem is that I wish to generate new primary keys in the new database for the first table that is getting imported. That's simple.
But the second table in the old database has a foreign key dependency on the first table. So when I want to migrate the old data from the second table, the foreign keys don't match any more.
Are there any tricks/best practices involved to help me migrate the data?
Serious note: I cannot change the current schema of the new tables, which do not have any 'old id' column.
Let's use the following table schema:
Old Table1                 New Table1
ParentId  INT PK           ParentId  INT PK
Name      VARCHAR(50)      Name      VARCHAR(50)
Old Table2                 New Table2
ChildId   INT PK           ChildId   INT PK
ParentId  INT FK           ParentId  INT FK
Foo       VARCHAR(50)      Foo       VARCHAR(50)
So the table schemas are identical.
Thoughts?
EDIT:
For those who are asking, the RDBMS is SQL Server 2008. I didn't specify the software because I was hoping I would get an agnostic answer with some generic T-SQL :P
I think you need to do this in 2 steps.
You need to import the old tables and keep the old ids (and generate new ones). Then, once they're in the new database and have both new and old ids, you can use the old ids to associate the new ids, and then drop the old ids.
You can do this by importing into temporary tables (i.e. ones that will be thrown away), then inserting into the permanent tables, leaving out the old ids.
Or import directly into the new tables (with the schema modified to also hold the old ids), then drop the old ids when they're no longer necessary.
EDIT:
OK, I'm a bit clearer on what you're looking for, thanks to comments here and on other answers. I knocked this up; I think it'll do what you want.
Basically, without cursors, it steps through the parent table row by row, inserting the new parent row and all the child rows for that parent row, keeping the new ids in sync.
I tried it out and it should work; it doesn't need exclusive access to the tables and should be orders of magnitude faster than a cursor.
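declare @oldId as int
declare @newId as int
-- Start with the lowest old parent key
select @oldId = Min(ParentId) from OldTable1
while @oldId is not null
begin
    -- Insert the parent; NewTable1.ParentId is assumed to be an IDENTITY column
    Insert Into NewTable1 (Name)
    Select Name from OldTable1 where ParentId = @oldId
    -- Capture the key that was just generated
    Select @newId = SCOPE_IDENTITY()
    -- Copy this parent's children, pointing them at the new key
    Insert Into NewTable2 (ParentId, Foo)
    Select @newId, Foo From OldTable2 Where ParentId = @oldId
    -- Move on to the next old parent key
    select @oldId = Min(ParentId) from OldTable1 where ParentId > @oldId
end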
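A minimal sketch of the same row-by-row loop, runnable with Python's built-in sqlite3 module so it can be tried without a SQL Server instance. SQLite stands in for SQL Server here (an assumption): an INTEGER PRIMARY KEY column plays the part of an IDENTITY column, and `cursor.lastrowid` plays the part of SCOPE_IDENTITY(). The sample rows are made up.

```python
import sqlite3

# One in-memory database stands in for both the old and the new database.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE OldTable1 (ParentId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE OldTable2 (ChildId INTEGER PRIMARY KEY, ParentId INTEGER, Foo TEXT);
    -- INTEGER PRIMARY KEY values are generated automatically, like IDENTITY.
    CREATE TABLE NewTable1 (ParentId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE NewTable2 (ChildId INTEGER PRIMARY KEY,
                            ParentId INTEGER REFERENCES NewTable1(ParentId),
                            Foo TEXT);
    INSERT INTO OldTable1 VALUES (10, 'a'), (20, 'b');
    INSERT INTO OldTable2 VALUES (1, 10, 'a-1'), (2, 10, 'a-2'), (3, 20, 'b-1');
    -- A pre-existing row, so the new keys differ from the old ones.
    INSERT INTO NewTable1 VALUES (1, 'existing');
""")

# Walk the old parents in key order, one row at a time.
parents = con.execute("SELECT ParentId, Name FROM OldTable1 ORDER BY ParentId").fetchall()
for old_id, name in parents:
    new_id = con.execute("INSERT INTO NewTable1 (Name) VALUES (?)", (name,)).lastrowid
    # Copy this parent's children, pointing them at the freshly generated key.
    con.execute(
        "INSERT INTO NewTable2 (ParentId, Foo) "
        "SELECT ?, Foo FROM OldTable2 WHERE ParentId = ?",
        (new_id, old_id),
    )

rows = con.execute(
    "SELECT p.Name, c.Foo FROM NewTable2 c "
    "JOIN NewTable1 p ON p.ParentId = c.ParentId ORDER BY c.Foo"
).fetchall()
print(rows)
```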
Hope this helps,
Well, I guess you'll have to determine other criteria to create a map like oldPK => newPK (for example: is the Name field equal?).
Then you can determine the new PK that matches the old PK and adjust the ParentID accordingly.
You could also do a little trick: add a new column to the original Table1 which stores the new PK value for a copied record. Then you can easily copy the values of Table2, pointing them to the value of the new column instead of the old PK.
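If the Name column really is unique in both tables (an assumption the question does not guarantee), the oldPK => newPK map can come from a plain join instead of a loop. A small sqlite3 sketch with made-up data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE OldTable1 (ParentId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE OldTable2 (ChildId INTEGER PRIMARY KEY, ParentId INTEGER, Foo TEXT);
    CREATE TABLE NewTable1 (ParentId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE NewTable2 (ChildId INTEGER PRIMARY KEY, ParentId INTEGER, Foo TEXT);
    INSERT INTO OldTable1 VALUES (10, 'alpha'), (20, 'beta');
    INSERT INTO OldTable2 VALUES (1, 10, 'x'), (2, 20, 'y');
    INSERT INTO NewTable1 VALUES (1, 'pre-existing');

    -- Step 1: copy the parents; new keys are generated automatically.
    INSERT INTO NewTable1 (Name) SELECT Name FROM OldTable1 ORDER BY ParentId;

    -- Step 2: translate each child's old ParentId into the new one by going
    -- old child -> old parent -> new parent, matching parents on Name.
    INSERT INTO NewTable2 (ParentId, Foo)
    SELECT n.ParentId, c.Foo
    FROM OldTable2 c
    JOIN OldTable1 o ON o.ParentId = c.ParentId
    JOIN NewTable1 n ON n.Name = o.Name;
""")
result = con.execute(
    "SELECT n.Name, c.Foo FROM NewTable2 c "
    "JOIN NewTable1 n ON n.ParentId = c.ParentId ORDER BY c.Foo"
).fetchall()
print(result)
```

If Name is not unique, the join multiplies child rows, so it is worth checking for duplicate names before relying on this.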
EDIT: I'm trying to provide some sample code of what I meant by my little trick. I'm not altering the original database structure; I'm using a temporary table instead.
OK, you might try the following:
1) Create temporary table that holds the values of the old table, plus, it gets a new PK:
CREATE TABLE #tempTable1
(
newPKField INT,
oldPKField INT,
Name VARCHAR(50)
)
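2) Insert all the values from your old table into the temporary table, calculating a new PK and copying the old PK:
INSERT INTO #tempTable1 (newPKField, oldPKField, Name)
SELECT
-- New key: current maximum in NewTable1 plus the row's position in old-key order
(SELECT ISNULL(MAX(ParentId), 0) FROM NewTable1)
+ ROW_NUMBER() OVER (ORDER BY ParentID) AS newPKField,
ParentID AS oldPKField,
Name
FROM
Table1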
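3) Copy the values to the new table (if NewTable1.ParentId is an IDENTITY column, wrap this in SET IDENTITY_INSERT NewTable1 ON / OFF):
INSERT INTO NewTable1 (ParentId, Name)
SELECT
newPKField AS ParentId,
Name
FROM
#tempTable1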
4) Copy the values from Table2 to NewTable2:
INSERT INTO NewTable2 (ChildId, ParentId, Foo)
SELECT
Table2.ChildID,
t.newPKField AS ParentId,
Table2.Foo
FROM
Table2
INNER JOIN #tempTable1 t ON t.oldPKField = Table2.ParentId
This should do it. Please note that this is only pseudo T-SQL code; I have not tested it on a real database! However, it should come close to what you need.
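The four steps can be tried end to end in SQLite through Python's sqlite3 module. This is a sketch under assumptions: SQLite stands in for SQL Server, the new keys are computed as the target's current maximum plus each row's position in old-key order, and nothing else inserts into NewTable1 while it runs. The position is counted with a correlated COUNT(*) rather than ROW_NUMBER() so the sketch does not depend on SQLite's window-function support.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Table1 (ParentId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Table2 (ChildId INTEGER PRIMARY KEY, ParentId INTEGER, Foo TEXT);
    CREATE TABLE NewTable1 (ParentId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE NewTable2 (ChildId INTEGER PRIMARY KEY, ParentId INTEGER, Foo TEXT);
    INSERT INTO Table1 VALUES (7, 'a'), (9, 'b');
    INSERT INTO Table2 VALUES (100, 7, 'a-1'), (101, 9, 'b-1');
    INSERT INTO NewTable1 VALUES (1, 'x'), (2, 'y');

    -- 1) + 2) Temporary table holding the new key, the old key and the data.
    CREATE TEMP TABLE tempTable1 AS
    SELECT (SELECT COALESCE(MAX(ParentId), 0) FROM NewTable1)
           + (SELECT COUNT(*) FROM Table1 t2 WHERE t2.ParentId <= t.ParentId)
             AS newPKField,
           t.ParentId AS oldPKField,
           t.Name
    FROM Table1 t;

    -- 3) Copy the parents with their precomputed keys.
    INSERT INTO NewTable1 (ParentId, Name)
    SELECT newPKField, Name FROM tempTable1;

    -- 4) Copy the children, translating old keys through the temp table.
    INSERT INTO NewTable2 (ParentId, Foo)
    SELECT t.newPKField, c.Foo
    FROM Table2 c
    JOIN tempTable1 t ON t.oldPKField = c.ParentId;
""")
result = con.execute(
    "SELECT n.Name, c.Foo, c.ParentId FROM NewTable2 c "
    "JOIN NewTable1 n ON n.ParentId = c.ParentId ORDER BY c.Foo"
).fetchall()
print(result)
```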
Can you change the schema of the old tables? If so, you could put a "new id" column on the old tables and use that as the reference.
You might have to do a row-by-row insert into the new table, retrieve SCOPE_IDENTITY() after each insert, and store it in old table1. But for table2, you can then join to old table1 and grab the new id.
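A sketch of that idea in Python's sqlite3 (standing in for the real DBMS; the lowercase table names and the data are hypothetical): the old parent table gets a new_id column, the parents are inserted one at a time, and the children are then copied in one set-based statement.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE old_table1 (ParentId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE old_table2 (ChildId INTEGER PRIMARY KEY, ParentId INTEGER, Foo TEXT);
    CREATE TABLE new_table1 (ParentId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE new_table2 (ChildId INTEGER PRIMARY KEY, ParentId INTEGER, Foo TEXT);
    INSERT INTO old_table1 VALUES (5, 'a'), (6, 'b');
    INSERT INTO old_table2 VALUES (1, 5, 'a-1'), (2, 6, 'b-1'), (3, 6, 'b-2');
    INSERT INTO new_table1 VALUES (40, 'existing');

    -- The allowed schema change: a helper column on the OLD table.
    ALTER TABLE old_table1 ADD COLUMN new_id INTEGER;
""")

# Row-by-row insert of the parents, storing each generated key on the old row.
parents = con.execute("SELECT ParentId, Name FROM old_table1 ORDER BY ParentId").fetchall()
for old_id, name in parents:
    new_id = con.execute("INSERT INTO new_table1 (Name) VALUES (?)", (name,)).lastrowid
    con.execute("UPDATE old_table1 SET new_id = ? WHERE ParentId = ?", (new_id, old_id))

# The children are then copied set-based by joining through old_table1.new_id.
con.execute("""
    INSERT INTO new_table2 (ParentId, Foo)
    SELECT o.new_id, c.Foo
    FROM old_table2 c JOIN old_table1 o ON o.ParentId = c.ParentId
""")
print(con.execute("SELECT ParentId, Foo FROM new_table2 ORDER BY Foo").fetchall())
```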
First of all - can you not even have some temporary schema that you can later drop?! That would make life easier. Assuming you can't:
If you're lucky (and if you can guarantee that no other inserts will be happening at the same time), then when you insert Table1's data into your new table you could perhaps cheat by relying on the sequential order of the inserts.
You could then create a view that joins the two tables on a row number so that you have a way to correlate the keys to each other. That way you'd be one step closer to being able to identify the 'ParentId' for the new Table2.
I'm not sure from your question what database software you're using, but if temporary tables are an option, create a temporary table containing the original primary key of table1 and the new primary key of table1. Then create another temporary table with a copy of table2, update the copy using the "old key, new key" table you created earlier, then use "insert into select from" (or whatever the appropriate command is for your database) to copy the revised temporary table into its permanent location.
I had the wonderful opportunity to be knee-deep in migration scripts last summer. I was using Oracle's PL/SQL for the task. But you did not mention what technology you are using. What are you migrating the data into? SQL Server? Oracle? MySQL?
The approach is to INSERT a row from table1, RETURNING the new primary key generated (probably by a SEQUENCE [in Oracle]), and then INSERT the dependent records from table2, changing their foreign key value to the value returned by the first INSERT. I can't help you any better unless you specify which DBMS you are migrating the data into.
The following pseudo-ish code should work for you:
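CREATE TABLE newtable1 (
    ParentId INT IDENTITY PRIMARY KEY,
    OldId INT,            -- temporary: key from oldtable1
    Name VARCHAR(50)
)
CREATE TABLE newtable2 (
    ChildId INT IDENTITY PRIMARY KEY,
    ParentId INT NULL REFERENCES newtable1 (ParentId),
    OldParent INT,        -- temporary: parent key from oldtable2
    Foo VARCHAR(50)
)
-- Copy the parents, remembering their old keys
INSERT INTO newtable1 (OldId, Name)
SELECT ParentId, Name FROM oldtable1
-- Copy the children, remembering their old parent keys
INSERT INTO newtable2 (OldParent, Foo)
SELECT ParentId, Foo FROM oldtable2
-- Translate the old parent keys into the new ones
UPDATE newtable2 SET ParentId = (
    SELECT n.ParentId
    FROM newtable1 AS n
    WHERE n.OldId = newtable2.OldParent
)
-- Remove the helper columns once the keys are wired up
ALTER TABLE newtable1 DROP COLUMN OldId
ALTER TABLE newtable2 DROP COLUMN OldParent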
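The same steps, runnable against SQLite through Python's sqlite3 module. Assumptions: SQLite replaces SQL Server, the sample rows are made up, and the final drop of the helper columns is only described in a comment because older SQLite versions lack DROP COLUMN.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE oldtable1 (ParentId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE oldtable2 (ChildId INTEGER PRIMARY KEY, ParentId INTEGER, Foo TEXT);
    INSERT INTO oldtable1 VALUES (10, 'a'), (20, 'b');
    INSERT INTO oldtable2 VALUES (1, 10, 'a-1'), (2, 20, 'b-1');

    -- The new tables temporarily carry an extra column for the old key.
    CREATE TABLE newtable1 (ParentId INTEGER PRIMARY KEY, OldId INTEGER, Name TEXT);
    CREATE TABLE newtable2 (ChildId INTEGER PRIMARY KEY,
                            ParentId INTEGER REFERENCES newtable1(ParentId),
                            OldParent INTEGER, Foo TEXT);

    INSERT INTO newtable1 (OldId, Name) SELECT ParentId, Name FROM oldtable1;
    INSERT INTO newtable2 (OldParent, Foo) SELECT ParentId, Foo FROM oldtable2;

    -- Resolve each child's new ParentId through the remembered old key.
    UPDATE newtable2 SET ParentId = (
        SELECT n.ParentId FROM newtable1 AS n WHERE n.OldId = newtable2.OldParent
    );
""")
# A real migration would now drop OldId and OldParent (ALTER TABLE ... DROP COLUMN).
result = con.execute(
    "SELECT n.Name, c.Foo FROM newtable2 c "
    "JOIN newtable1 n ON n.ParentId = c.ParentId ORDER BY c.Foo"
).fetchall()
print(result)
```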
Related
I have imported some data into a temp SQL table from an Excel file. Then I tried to insert all rows into two related tables, simply like this: there are Events and Actors tables with a many-to-many relationship in my database. Actors are already added. I want to add all events to the Events table and then add the relation (ActorId) for each event to the EventActors table.
(dbo.TempTable has Title, ActorId columns)
insert into dbo.Event (Title)
Select Title
From dbo.TempTable
insert into dbo.EventActor (EventId, ActorId)
Select SCOPE_IDENTITY(), ActorId --SCOPE_IDENTITY() is for EventId
From dbo.TempTable
When this code ran, all events were inserted into Events, but the relations were not inserted into EventActors because of a foreign key error.
I think there should be a loop, but I am confused. I don't want to write C# code for this. I suspect there is a simple but clever trick for this in SQL Server. Thanks for your help.
Use the OUTPUT clause to capture the new IDs, with a MERGE statement to allow capture from both the source and destination tables.
Having captured this information, join it back to the temp table for the second insert.
Note you need a unique id per row, and this assumes 1 row in the temp table creates 1 row in both the Event and the EventActor tables.
-- Ensure every row has a unique id - could be part of the table create
ALTER TABLE dbo.TempTable ADD id INT IDENTITY(1,1);
-- End the batch so the new id column is visible to the statements below
GO
-- Create table variable for storing the new IDs in
DECLARE @NewId TABLE (id INT, EventId INT);
-- Use Merge to Insert with Output to allow us to access all tables involved,
-- as Insert with Output only allows access to columns in the destination table
MERGE INTO dbo.[Event] AS Target
USING dbo.TempTable AS Source
ON 1 = 0 -- Force an insert regardless
WHEN NOT MATCHED THEN
INSERT (Title)
VALUES (Source.Title)
OUTPUT Source.id, Inserted.EventId
INTO @NewId (id, EventId);
-- Insert using new Ids just created
INSERT INTO dbo.EventActor (EventId, ActorId)
SELECT I.EventId, T.ActorId
FROM dbo.TempTable T
INNER JOIN @NewId I on T.id = I.id;
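SQLite has no MERGE ... OUTPUT, so this Python sqlite3 sketch (an illustration, not the SQL Server code above) captures the same (temp row id, new EventId) pairs one row at a time, then performs the same joined second insert. Table contents are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
    CREATE TABLE Actor (ActorId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Event (EventId INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE EventActor (EventId INTEGER REFERENCES Event(EventId),
                             ActorId INTEGER REFERENCES Actor(ActorId));
    -- Imported rows, each with its own unique id.
    CREATE TABLE TempTable (id INTEGER PRIMARY KEY, Title TEXT, ActorId INTEGER);
    INSERT INTO Actor VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Event VALUES (50, 'older event');
    INSERT INTO TempTable (Title, ActorId) VALUES ('Gig', 1), ('Gig', 2), ('Play', 2);
    -- Plays the role of the @NewId table variable filled by OUTPUT.
    CREATE TEMP TABLE NewId (id INTEGER, EventId INTEGER);
""")

# Capture (source id, new EventId) for every imported row.
for temp_id, title in con.execute("SELECT id, Title FROM TempTable ORDER BY id").fetchall():
    event_id = con.execute("INSERT INTO Event (Title) VALUES (?)", (title,)).lastrowid
    con.execute("INSERT INTO NewId (id, EventId) VALUES (?, ?)", (temp_id, event_id))

# Second insert: join the captured pairs back to the temp table on id.
con.execute("""
    INSERT INTO EventActor (EventId, ActorId)
    SELECT I.EventId, T.ActorId
    FROM TempTable T JOIN NewId I ON T.id = I.id
""")
print(con.execute("SELECT EventId, ActorId FROM EventActor ORDER BY EventId").fetchall())
```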
I have created tables T1 with columns (id as primary key, and name) and T2 with columns (id as primary key, name, and t_id as a foreign key referencing T1(id)). I inserted some values from inputs on a Windows form. After querying SELECT * FROM T2; using isql, all the values in the foreign key column are null, instead of repeating the values from T1(id) because of the relationship I created. Is there anything I have left out or need to add? The primary keys of both tables are auto-incremented.
You are confusing auto-incremented keys with how relationships are used.
Auto-incremented keys (or, more generally, fields) only help you when you are inserting a new record into the key's own table. But when you are inserting a new record that references a record in another table, you must specify that record using the foreign key field. In your case, whoever inserts the "name" in T2 must say which record in T1 that "name" in T2 references.
Your confusion about the relationship is that you think an established relationship will fill in those values automatically. But a relationship only validates the values. So the field t_id in T2 will not automatically take the value of the last record in T1. But if you try to insert a value into t_id that does not exist in T1, the relationship will not let you.
So, to answer your question: what did you leave out and need to add?
You left out the part of the code that inserts the value into the t_id field of the T2 table.
Let me try to explain with a more common example.
The most common case is that the application inserts the T1 record first, and then, when the user is inserting T2, the application provides a way for the user to choose which T1 record their T2 record references.
Suppose T1 is a publishers table and T2 is a books table. The user inserts a publisher, and when inserting a book can choose which publisher published that book.
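A small Python sqlite3 sketch of the publisher/book case (names and data are hypothetical). It shows the three behaviours described above: the child row must be given the parent's key explicitly, an omitted foreign key just stays NULL, and a key with no matching parent is rejected.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FOREIGN KEYs only when this is on
con.executescript("""
    CREATE TABLE publisher (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, name TEXT,
                       publisher_id INTEGER REFERENCES publisher(id));
""")

# The parent's key is generated automatically...
pub_id = con.execute("INSERT INTO publisher (name) VALUES ('Acme Press')").lastrowid
# ...but the child row has to be told which parent it belongs to.
con.execute("INSERT INTO book (name, publisher_id) VALUES (?, ?)", ("SQL Basics", pub_id))

# Leaving the foreign key out does not copy anything in; the column stays NULL.
con.execute("INSERT INTO book (name) VALUES ('Orphan')")

# A value that does not exist in publisher is rejected by the relationship.
try:
    con.execute("INSERT INTO book (name, publisher_id) VALUES ('Bad', 999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```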
The field "ID" of CUSTOMERS can be auto-incremented by default with a BEFORE INSERT trigger on the CUSTOMERS table (nametrigger and nametable below are placeholders; GEN_PK_ID is the generator). Look at this:
CREATE TRIGGER nametrigger FOR nametable
ACTIVE BEFORE INSERT POSITION 0
AS
BEGIN
IF (NEW.ID IS NULL) THEN BEGIN
NEW.ID = GEN_ID(GEN_PK_ID, 1);
END
END
Now insert a new record into Customers:
INSERT INTO Customers (CustomerName, ContactName, Address, City, PostalCode, Country)
VALUES ('Cardinal','Tom B. Erichsen','Skagen 21','Stavanger','4006','Norway');
Then ID will automatically get a sequential number, from 1 up to the largest INTEGER, SMALLINT or BIGINT value, depending on how you defined the column in your CREATE TABLE. Note that the ID field is not included in the field list or in VALUES, because the trigger fills it in.
Now you can use the dataset components' options to link the MASTER and DETAIL tables (see the Delphi help),
or in SQL you can use parameters.
After inserting a new record into the MASTER table, try...
INSERT INTO xTable2 (IDcustomersField, ..., ..., ...., ....)
VALUES ( :IDcustomersField, ..., ..., ...., ....);
xTable2 may use an auto-increment ID field (primary key) too. This helps when deleting or updating records in this table.
Then you can set the value of :IDcustomersField in the detail table using
xQuery.Params[0].Value or xQuery.ParamByName('IDcustomersField').Value (here I'm using a Query object as an example).
You can also use a DataSource in code to supply the value for IDcustomersField.
You can also use events (triggers) in SQL,
or stored procedures in SQL.
DON'T FORGET:
you have to create the relationship between the two tables (referential integrity, and a primary key in the master table), and make both fields NOT NULL.
I hope you understand my poor explanation (I don't speak English well).
You need to insert the values for t_id manually, after you get the ID value from the main table T1.
Depending on the logic in your database, you can also use a trigger or a stored procedure. Can you give us more information about what values you expect in the NAME field of T2 after the insert? Are they duplicates from T1 or independent of T1?
If T1.NAME=T2.NAME, you can automate the process with a trigger
CREATE OR ALTER TRIGGER TR_T1_AI0 FOR T1
ACTIVE AFTER INSERT POSITION 0
AS
BEGIN
INSERT INTO T2(NAME, T_ID)
VALUES (NEW.NAME, NEW.ID);
END
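The trigger variant can be tried in SQLite via Python's sqlite3 module. SQLite's trigger syntax differs from Firebird's (no ACTIVE or POSITION clauses), so this is an equivalent sketch rather than the same DDL:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE T1 (ID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE T2 (ID INTEGER PRIMARY KEY, NAME TEXT, T_ID INTEGER REFERENCES T1(ID));

    -- Same idea as TR_T1_AI0: every new T1 row creates a T2 row pointing at it.
    CREATE TRIGGER TR_T1_AI0 AFTER INSERT ON T1
    BEGIN
        INSERT INTO T2 (NAME, T_ID) VALUES (NEW.NAME, NEW.ID);
    END;

    INSERT INTO T1 (NAME) VALUES ('first'), ('second');
""")
print(con.execute("SELECT NAME, T_ID FROM T2 ORDER BY ID").fetchall())
```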
If T2.NAME's value is different from T1.NAME you can use a stored procedure with parameters both names:
CREATE OR ALTER PROCEDURE XXXX(
P_NAME_T1 TYPE OF T1.NAME,
P_NAME_T2 TYPE OF T2.NAME)
AS
DECLARE VARIABLE L_ID TYPE OF T1.ID;
BEGIN
INSERT INTO T1(NAME)
VALUES (:p_NAME_T1)
RETURNING ID INTO :L_ID;
INSERT INTO T2(NAME, T_ID)
VALUES (:P_NAME_T2, :l_ID);
END
You can use both statements from the stored procedure directly in your program if it supports the returning syntax. If not, you need an additional query with SELECT NEXT VALUE FOR GENERATOR_FOR_T1 FROM RDB$DATABASE; and use the value returned from it in both INSERT statements.
Okay, first a little background: I've inherited the maintenance of a database on MSSQL 2000.
In the database there's a massive collection of tables, interconnected through foreign keys.
What I'm attempting to do is rebuild each table in sorted order to eliminate gaps in the table's IDENTITY column.
On one table in particular I have the following columns:
RL_ID, RL_FK_RaidID, RL_FK_MemberID, RL_FK_ItemID, RL_ItemValue, RL_Notes, RL_IsUber, RL_IsWishItem, RL_LootModifier, RL_WishItemValue, RL_WeightedLootValue
It uses RL_ID as the IDENTITY column, whose current value DBCC CHECKIDENT (Table) reports as 32620.
There are, however, only 12128 rows of information in this table.
So I tried a simple script to copy all the information in a sorted fashion into a new table:
INSERT INTO Table_1
SELECT RL_ID, RL_FK_RaidID, RL_FK_MemberID, RL_FK_ItemID, RL_ItemValue, RL_Notes, RL_IsUber, RL_IsWishItem, RL_LootModifier, RL_WishItemValue, RL_WeightedLootValue
FROM RaidLoot
ORDER BY RL_ID
Then delete all the rows from the source table with:
TRUNCATE TABLE RaidLoot
Verify the IDENT is 1 with:
DBCC CHECKIDENT (RaidLoot)
Now copy the data back into the original table from row 1 to the end:
SET IDENTITY_INSERT RaidLoot ON
INSERT INTO RaidLoot (RL_ID, RL_FK_RaidID, RL_FK_MemberID, RL_FK_ItemID, RL_ItemValue, RL_Notes, RL_IsUber, RL_IsWishItem, RL_LootModifier, RL_WishItemValue, RL_WeightedLootValue)
SELECT RL_ID, RL_FK_RaidID, RL_FK_MemberID, RL_FK_ItemID, RL_ItemValue, RL_Notes, RL_IsUber, RL_IsWishItem, RL_LootModifier, RL_WishItemValue, RL_WeightedLootValue
FROM Table_1
ORDER BY RL_ID
SET IDENTITY_INSERT RaidLoot OFF
Now verify that I only have the 12128 rows of data:
DBCC CHECKIDENT (RaidLoot)
(Note: I end up with 32620 again since it never did renumber the RL_ID, it just put them back into the same spots leaving the gaps). So where / how can I get it to Renumber the RL_ID column starting from 1 so that when it writes the data back to the original table I don't have the gaps?
The only other solution I can see is the painful process of manually changing each row's RL_ID in Table_1 before I write it back to the original table. This isn't impossible, but I have another table that has approximately 306,000 rows of data while its IDENTITY value is reported as 450,123, so I'm hoping there is an easier way to automate the renumbering process.
If you really have to do this (seems like a great waste of time to me), you will have to adjust all of the foreign key references as well.
Consider the strategy of adding a NewID column to each table and populating the new column sequentially. Then you can use this NewID column in the queries needed to adjust the foreign keys. Very messy nonetheless, unless you can come up with a consistent pattern for doing so.
Since you can query the metadata to determine foreign keys, etc., this is certainly possible, and it should definitely be considered seriously if you really do have lots of tables.
ADDED
There is a simple way to populate the NewID column:
declare @id int
set @id = 0
-- For each row: increment @id and store the result in NewID (1, 2, 3, ...)
update MyTable set @id = NewID = @id + 1
It is not obvious that this works, but it does.
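A sketch of that strategy in SQLite via Python's sqlite3, assuming a hypothetical child table LootComment that references RaidLoot.RL_ID. The parent is rebuilt rather than updated in place so new keys never collide with old ones mid-update. On SQL Server 2000 the rebuilt table's RL_ID would be an IDENTITY column, so the copy back would need SET IDENTITY_INSERT.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE RaidLoot (RL_ID INTEGER PRIMARY KEY, RL_Notes TEXT);
    -- Hypothetical child table referencing RaidLoot.
    CREATE TABLE LootComment (ID INTEGER PRIMARY KEY, RL_ID INTEGER, Body TEXT);
    INSERT INTO RaidLoot VALUES (3, 'a'), (17, 'b'), (42, 'c');  -- keys with gaps
    INSERT INTO LootComment VALUES (1, 17, 'on b'), (2, 42, 'on c');

    -- 1) Add a NewID column and number the rows 1..N in RL_ID order.
    ALTER TABLE RaidLoot ADD COLUMN NewID INTEGER;
    UPDATE RaidLoot SET NewID =
        (SELECT COUNT(*) FROM RaidLoot r2 WHERE r2.RL_ID <= RaidLoot.RL_ID);

    -- 2) Re-point the foreign keys through the old -> new pairs.
    UPDATE LootComment SET RL_ID =
        (SELECT r.NewID FROM RaidLoot r WHERE r.RL_ID = LootComment.RL_ID);

    -- 3) Rebuild the parent with the new keys.
    CREATE TABLE RaidLoot_new (RL_ID INTEGER PRIMARY KEY, RL_Notes TEXT);
    INSERT INTO RaidLoot_new (RL_ID, RL_Notes) SELECT NewID, RL_Notes FROM RaidLoot;
    DROP TABLE RaidLoot;
    ALTER TABLE RaidLoot_new RENAME TO RaidLoot;
""")
print(con.execute("SELECT * FROM RaidLoot ORDER BY RL_ID").fetchall())
print(con.execute("SELECT * FROM LootComment ORDER BY ID").fetchall())
```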
I don't think it has to do with RL_ID being referenced by other tables in the schema. If I set up a single-table test, the identity will always show up as the max number in the identity field:
CREATE TABLE #temp (id INT IDENTITY(1,1), other VARCHAR(1))
INSERT INTO #temp (other)
VALUES ('a'), ('b'), ('c'), ('d'), ('e')
SELECT *
FROM #temp
-- Keep only the row with id 3
SELECT *
INTO #holder
FROM #temp
WHERE other = 'c'
TRUNCATE TABLE #temp
SET IDENTITY_INSERT #temp ON
INSERT INTO #temp (id, other)
SELECT id, other
FROM #holder
SET IDENTITY_INSERT #temp OFF
-- Should report a current identity value of 3, the highest id inserted
DBCC CHECKIDENT (#temp)
DROP TABLE #temp
DROP TABLE #holder
So your new identity is 32620 because that is the MAX(RL_ID)
Suppose I have Table A and Table B. Table B references Table A. I want to deep copy a set of rows in Table A and Table B. I want all of the new Table B rows to reference the new Table A rows.
Note that I'm not copying the rows into any other tables. The rows in table A will be copied into table A, and the rows in table B will be copied into table B.
How can I ensure that the foreign key references get readjusted as part of the copy?
To clarify, I'm trying to find a generic way to do this. The example I'm giving involves two tables, but in practice the dependency graph may be much more complicated. Even a generic way to dynamically generate SQL to do the work would be fine.
UPDATE:
People are asking why this is necessary, so I'll give some background. It may be way too much, but here goes:
I'm working with an old desktop application that's been moved to a client-server model. But, the application still uses a rudimentary in-house binary file format for storing data for its tables. A data file is just a header followed by a series of rows, each of which is just the binary serialized field values, the order of which is determined by a schema text file. The only thing good about it is that it's very fast. It's terrible in every other respect. I'm moving the application to SQL Server and trying not to degrade the performance too badly.
This is a kind of scheduling application; the data's not critical to anybody, and there's no audit tracking, etc. necessary. It's not a supermassive amount of data, and we don't necessarily need to keep very old data around if the database grows too large.
One feature that they are accustomed to is the ability to duplicate entire schedules in order to create "what-if" scenarios that they can muck with. Any user can do this as many times as they want, as often as they want. In the old database, the data files for each schedule are stored in their own data folder, identified by name. So, copying a schedule was as simple as copying the data folder and renaming it.
I must be able to do effectively the same thing with SQL Server or the migration will not work. Maybe you're thinking that I can just only copy the data that actually gets changed in order to avoid redundancy; but that honestly sounds too complicated to be feasible.
To throw another wrench into the mix, there can be a hierarchy of schedule data folders. So, a data folder may contain a data folder, which may contain a data folder. And the copying can occur at any level.
In SQL Server, I'm implementing a nested set hierarchy to mimic this. I have a DATA_SET table like this:
CREATE TABLE dbo.DATA_SET
(
DATA_SET_ID UNIQUEIDENTIFIER PRIMARY KEY,
NAME NVARCHAR(128) NOT NULL,
LFT INT NOT NULL,
RGT INT NOT NULL
)
So, there's a tree structure of data sets. Each data set represents a schedule and may contain child data sets. Every row in every table has a DATA_SET_ID FK reference indicating which data set it belongs to. Whenever I copy a data set, I copy all the rows in each table for that data set, and for every data set nested under it, into the same table, but referencing new data sets.
So, here's a simple concrete example:
CREATE TABLE FOO
(
FOO_ID BIGINT PRIMARY KEY,
DATA_SET_ID BIGINT FOREIGN KEY REFERENCES DATA_SET(DATA_SET_ID) NOT NULL
)
CREATE TABLE BAR
(
BAR_ID BIGINT PRIMARY KEY,
DATA_SET_ID BIGINT FOREIGN KEY REFERENCES DATA_SET(DATA_SET_ID) NOT NULL,
FOO_ID BIGINT FOREIGN KEY REFERENCES FOO(FOO_ID) NOT NULL
)
INSERT INTO FOO
SELECT 1, 1 UNION ALL
SELECT 2, 1 UNION ALL
SELECT 3, 1
INSERT INTO BAR
SELECT 1, 1, 1 UNION ALL
SELECT 2, 1, 2 UNION ALL
SELECT 3, 1, 3
So, let's say I copy data set 1 into a new data set of ID 2. After I copy, the tables will look like this:
FOO
FOO_ID  DATA_SET_ID
1       1
2       1
3       1
4       2
5       2
6       2
BAR
BAR_ID  DATA_SET_ID  FOO_ID
1       1            1
2       1            2
3       1            3
4       2            4
5       2            5
6       2            6
As you can see, the new BAR rows are referencing the new FOO rows. It's not the rewiring of the DATA_SET_ID's that I'm asking about. I'm asking about rewiring the foreign keys in general.
So, that was surely too much information, but there you go.
I'm sure there are a lot of concerns about performance with the idea of bulk copying the data like this. The tables are not going to be huge. I'm not expecting more than 1000 records in any table, and most of the tables will be much much smaller than that. Old data sets can be deleted outright with no repercussions.
Thanks,
Tedderz
Here is an example with three tables that can probably get you started.
DB schema
CREATE TABLE users
(user_id int auto_increment PRIMARY KEY,
user_name varchar(32));
CREATE TABLE agenda
(agenda_id int auto_increment PRIMARY KEY,
`user_id` int, `agenda_name` varchar(7));
CREATE TABLE events
(event_id int auto_increment PRIMARY KEY,
`agenda_id` int,
`event_name` varchar(8));
A stored procedure to clone a user together with their agenda and event records:
DELIMITER $$
CREATE PROCEDURE clone_user(IN uid INT)
BEGIN
DECLARE last_user_id INT DEFAULT 0;
INSERT INTO users (user_name)
SELECT user_name
FROM users
WHERE user_id = uid;
SET last_user_id = LAST_INSERT_ID();
INSERT INTO agenda (user_id, agenda_name)
SELECT last_user_id, agenda_name
FROM agenda
WHERE user_id = uid;
INSERT INTO events (agenda_id, event_name)
SELECT a3.agenda_id_new, e.event_name
FROM events e JOIN
(SELECT a1.agenda_id agenda_id_old,
a2.agenda_id agenda_id_new
FROM
(SELECT agenda_id, @n := @n + 1 n
FROM agenda, (SELECT @n := 0) n
WHERE user_id = uid
ORDER BY agenda_id) a1 JOIN
(SELECT agenda_id, @m := @m + 1 m
FROM agenda, (SELECT @m := 0) m
WHERE user_id = last_user_id
ORDER BY agenda_id) a2 ON a1.n = a2.m) a3
ON e.agenda_id = a3.agenda_id_old;
END$$
DELIMITER ;
To clone a user
CALL clone_user(3);
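Instead of pairing rows by their position, the copy can keep an explicit old-key to new-key map per table and process tables parent-first. For a deeper dependency graph the same loop would run once per table in dependency order, reusing the maps of the tables it references. A Python sqlite3 sketch using the FOO/BAR example from the question (integer DATA_SET_IDs assumed, as in the sample data; `copy_data_set` is a hypothetical helper):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE DATA_SET (DATA_SET_ID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE FOO (FOO_ID INTEGER PRIMARY KEY,
                      DATA_SET_ID INTEGER NOT NULL REFERENCES DATA_SET(DATA_SET_ID));
    CREATE TABLE BAR (BAR_ID INTEGER PRIMARY KEY,
                      DATA_SET_ID INTEGER NOT NULL REFERENCES DATA_SET(DATA_SET_ID),
                      FOO_ID INTEGER NOT NULL REFERENCES FOO(FOO_ID));
    INSERT INTO DATA_SET VALUES (1, 'original'), (2, 'what-if copy');
    INSERT INTO FOO VALUES (1, 1), (2, 1), (3, 1);
    INSERT INTO BAR VALUES (1, 1, 1), (2, 1, 2), (3, 1, 3);
""")


def copy_data_set(con, src, dst):
    """Copy the FOO and BAR rows of data set `src` into data set `dst`."""
    foo_map = {}  # old FOO_ID -> new FOO_ID
    # Parents first: copy each FOO row and remember its newly generated key.
    old_foos = con.execute(
        "SELECT FOO_ID FROM FOO WHERE DATA_SET_ID = ? ORDER BY FOO_ID", (src,)).fetchall()
    for (old_foo,) in old_foos:
        foo_map[old_foo] = con.execute(
            "INSERT INTO FOO (DATA_SET_ID) VALUES (?)", (dst,)).lastrowid
    # Children second: rewrite each foreign key through the map before inserting.
    old_bars = con.execute(
        "SELECT FOO_ID FROM BAR WHERE DATA_SET_ID = ? ORDER BY BAR_ID", (src,)).fetchall()
    for (old_foo,) in old_bars:
        con.execute("INSERT INTO BAR (DATA_SET_ID, FOO_ID) VALUES (?, ?)",
                    (dst, foo_map[old_foo]))


copy_data_set(con, 1, 2)
print(con.execute("SELECT * FROM BAR ORDER BY BAR_ID").fetchall())
```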
I recently found myself needing to solve a similar problem: I needed to copy a set of rows in a table (Table A) as well as all of the rows in related tables that have foreign keys pointing to Table A's primary key. I was using Postgres, so the exact queries may differ, but the overall approach is the same. The biggest benefit of this approach is that it can be applied recursively to go arbitrarily deep.
TL;DR: the approach looks like this:
1) find all the related tables/columns of Table A
2) copy the necessary data into temporary tables
3) create a trigger and function to propagate primary key column updates to related foreign key columns in the temporary tables
4) update the primary key column in the temporary tables to the next value in the auto-increment sequence
5) re-insert the data back into the source tables, and drop the temporary tables/triggers/function
1) The first step is to query the information schema to find all of the tables and columns which are referencing Table A. In Postgres this might look like the following:
SELECT tc.table_name, kcu.column_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
ON tc.constraint_name = kcu.constraint_name
JOIN information_schema.constraint_column_usage ccu
ON ccu.constraint_name = tc.constraint_name
WHERE constraint_type = 'FOREIGN KEY'
AND ccu.table_name='<Table A>'
AND ccu.column_name='<Primary Key>'
2) Next we need to copy the data from Table A, and from any other tables that reference Table A; let's say there is one called Table B. To start this process, let's create a temporary table for each of these tables and populate it with the data that we need to copy. This might look like the following:
CREATE TEMP TABLE temp_table_a AS (
SELECT * FROM <Table A> WHERE ...
)
CREATE TEMP TABLE temp_table_b AS (
SELECT * FROM <Table B> WHERE <Foreign Key> IN (
SELECT <Primary Key> FROM temp_table_a
)
)
3) We can now define a function that will cascade primary key column updates out to related foreign key columns, and a trigger which will execute whenever the primary key column changes. For example:
CREATE OR REPLACE FUNCTION cascade_temp_table_a_pk()
RETURNS trigger AS
$$
BEGIN
UPDATE <Temp Table B> SET <Foreign Key> = NEW.<Primary Key>
WHERE <Foreign Key> = OLD.<Primary Key>;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trigger_temp_table_a
AFTER UPDATE
ON <Temp Table A>
FOR EACH ROW
WHEN (OLD.<Primary Key> != NEW.<Primary Key>)
EXECUTE PROCEDURE cascade_temp_table_a_pk();
4) Now we just update the primary key column in <Temp Table A> to the next value of the sequence of the source table (<Table A>). This will activate the trigger, and the updates will be cascaded out to the foreign key columns in <Temp Table B>. In Postgres you can do the following:
UPDATE <Temp Table A>
SET <Primary Key> = nextval(pg_get_serial_sequence('<Table A>', '<Primary Key>'))
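5) Insert the data from the temporary tables back into the source tables, then drop the temporary tables, triggers, and functions. If Table B has its own serial primary key, give the rows in <Temp Table B> new values from its sequence as well before re-inserting them, or the insert will violate that key.
INSERT INTO <Table A> (SELECT * FROM <Temp Table A>);
INSERT INTO <Table B> (SELECT * FROM <Temp Table B>);
DROP TRIGGER trigger_temp_table_a ON <Temp Table A>;
DROP FUNCTION cascade_temp_table_a_pk();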
It is possible to take this general approach and turn it into a script that can be called recursively in order to go arbitrarily deep. I ended up doing just that using Python (our application was using Django, so I was able to use the Django ORM to make some of this easier).
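The five steps can be sketched in SQLite through Python's sqlite3 module. Assumptions: SQLite replaces Postgres (its trigger syntax differs), and because SQLite has no nextval() the keys are shifted past the current maximum of each source table instead, which is safe only if nothing else inserts meanwhile. Table B's own primary key is shifted too, so the re-insert does not collide. Table and column names are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY,
                          a_id INTEGER REFERENCES table_a(id), note TEXT);
    INSERT INTO table_a VALUES (1, 'x'), (2, 'y');
    INSERT INTO table_b VALUES (1, 1, 'x-1'), (2, 2, 'y-1'), (3, 2, 'y-2');

    -- 2) Copy the rows to be cloned into temporary tables.
    CREATE TEMP TABLE temp_table_a AS SELECT * FROM table_a WHERE id IN (1, 2);
    CREATE TEMP TABLE temp_table_b AS
        SELECT * FROM table_b WHERE a_id IN (SELECT id FROM temp_table_a);

    -- 3) Cascade primary-key changes in temp_table_a to temp_table_b.
    CREATE TEMP TRIGGER trigger_temp_table_a
    AFTER UPDATE OF id ON temp_table_a
    FOR EACH ROW WHEN OLD.id <> NEW.id
    BEGIN
        UPDATE temp_table_b SET a_id = NEW.id WHERE a_id = OLD.id;
    END;

    -- 4) Move the keys past everything already in the source tables.
    UPDATE temp_table_a SET id = id + (SELECT MAX(id) FROM table_a);
    UPDATE temp_table_b SET id = id + (SELECT MAX(id) FROM table_b);

    -- 5) Re-insert the copies and clean up (dropping a table drops its triggers).
    INSERT INTO table_a SELECT * FROM temp_table_a;
    INSERT INTO table_b SELECT * FROM temp_table_b;
    DROP TABLE temp_table_a;
    DROP TABLE temp_table_b;
""")
print(con.execute("SELECT * FROM table_b ORDER BY id").fetchall())
```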
I have this table which doesn't have a primary key.
I'm going to insert some records into a new table to analyze them, and I'm thinking of creating a new primary key from the values of all the available columns.
If this were a programming language like Java I would:
int hash = column1 * 31 + column2 * 31 + column3*31
Or something like that. But this is SQL.
How can I create a primary key from the values of the available columns? It won't work for me to simply mark all the columns as PK, for what I need to do is to compare them with data from other DB table.
My table has 3 numbers and a date.
EDIT: What my problem is
I think a bit more background is needed. I'm sorry for not providing it before.
I have a database (dm) that is updated every day from another db (original source). It has records from the past two years.
Last month (July) the update process broke, and for a month no data was updated into dm.
I manually created a table with the same structure in my Oracle XE and copied the records from the original source into my db (myxe). I copied only records from July, to create a report needed by the end of the month.
Finally, on Aug 8, the update process was fixed and the records that had been waiting to be migrated by this automatic process were copied into the database (from original source to dm).
This process deletes the data from the original source once it has been copied (into dm).
Everything looked fine, but we have just realized that some of the records got lost (about 25% of July).
So, what I want to do is use my backup (myxe) and insert all the missing records into the database (dm).
The problems here are:
They don't have a well-defined PK.
They are in separate databases.
So I thought that if I could create a unique PK from both tables that gave the same number, I could tell which were missing and insert them.
EDIT 2
So I did the following in my local environment:
select a.* from the_table@PRODUCTION a, the_table b where
a.idle = b.idle and
a.activity = b.activity and
a.finishdate = b.finishdate
This returns all the rows that are present in both databases (the intersection). I got 2,000 records.
What I'm going to do next is delete them all from the target db and then just insert all of them from my db into the target table.
I hope I don't make things worse :-S
The danger of creating a hash value by combining the 3 numbers and the date is that it might not be unique and hence cannot be used safely as a primary key.
Instead I'd recommend using an autoincrementing ID for your primary key.
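To make the uniqueness danger concrete: in the formula from the question every column gets the same weight (31), so the result depends only on the sum of the columns. A tiny Python check, with a position-sensitive variant for contrast (both functions are illustrative, not a recommendation):

```python
def naive_hash(col1: int, col2: int, col3: int) -> int:
    # The formula from the question: equal weights, so only the sum matters.
    return col1 * 31 + col2 * 31 + col3 * 31


rows = [(1, 2, 3), (3, 2, 1), (2, 2, 2)]
hashes = {row: naive_hash(*row) for row in rows}
print(hashes)  # three different rows, all mapped to 186


def positional_hash(*cols: int) -> int:
    # Multiply-then-add makes column order matter, but collisions are still
    # possible, so equal hashes must be confirmed by comparing the columns.
    h = 17
    for c in cols:
        h = h * 31 + c
    return h
```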
Just create a surrogate key:
ALTER TABLE mytable ADD pk_col INT
UPDATE mytable
SET pk_col = rownum
ALTER TABLE mytable MODIFY pk_col INT NOT NULL
ALTER TABLE mytable ADD CONSTRAINT pk_mytable_pk_col PRIMARY KEY (pk_col)
or this:
ALTER TABLE mytable ADD pk_col RAW(16)
UPDATE mytable
SET pk_col = SYS_GUID()
ALTER TABLE mytable MODIFY pk_col RAW(16) NOT NULL
ALTER TABLE mytable ADD CONSTRAINT pk_mytable_pk_col PRIMARY KEY (pk_col)
The latter uses GUIDs, which are unique across databases, but they consume more space and are much slower to generate (your INSERTs will be slow).
Update:
If you need to create same PRIMARY KEYs on two tables with identical data, use this:
MERGE
INTO mytable v
USING (
SELECT rowid AS rid,
ROW_NUMBER() OVER (ORDER BY col1, col2, col3) AS rn
FROM mytable
)
ON (v.rowid = rid)
WHEN MATCHED THEN
UPDATE
SET pk_col = rn
Note that the tables must be identical down to the last row (i.e. have the same number of rows with the same data in them).
Update 2:
For your very problem, you don't need a PK at all.
If you just want to select the records missing in dm, use this one (on dm side)
SELECT *
FROM mytable@myxe
MINUS
SELECT *
FROM mytable
This will return all records that exist in mytable@myxe but not in mytable@dm.
Note that it will collapse any duplicates.
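The MINUS idea can be tried with SQLite, which spells it EXCEPT. In this Python sqlite3 sketch an attached in-memory database stands in for the myxe link; the column `amount` is a hypothetical name for the third numeric column, and the rows are made up:

```python
import sqlite3

con = sqlite3.connect(":memory:")                   # plays the role of dm
con.execute("ATTACH DATABASE ':memory:' AS myxe")   # plays the role of the backup
con.executescript("""
    CREATE TABLE main.mytable (idle INTEGER, activity INTEGER, amount INTEGER, finishdate TEXT);
    CREATE TABLE myxe.mytable (idle INTEGER, activity INTEGER, amount INTEGER, finishdate TEXT);
    INSERT INTO myxe.mytable VALUES
        (1, 10, 5, '2009-07-01'), (2, 20, 7, '2009-07-02'),
        (2, 20, 7, '2009-07-02'), (3, 30, 9, '2009-07-03');
    INSERT INTO main.mytable VALUES (1, 10, 5, '2009-07-01');
""")
missing = con.execute("""
    SELECT * FROM myxe.mytable
    EXCEPT              -- SQLite's spelling of Oracle's MINUS
    SELECT * FROM main.mytable
""").fetchall()
print(sorted(missing))
```

The duplicated backup row comes back only once, which is the collapsing mentioned above. The same query could feed an INSERT INTO main.mytable SELECT ... EXCEPT ... to copy the missing rows across.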
Assuming that you have ensured uniqueness... you can do almost the same thing in SQL. The only problem will be converting the date to a numeric value so that you can hash it.
Select Table2.SomeFields
FROM Table1 LEFT OUTER JOIN Table2 ON
(Table1.col1 * 31) + (Table1.col2 * 31) + (Table1.col3 * 31) +
((DatePart(year,Table1.date) + DatePart(month,Table1.date) + DatePart(day,Table1.date) )* 31) = Table2.hashedPk
The above query would work for SQL Server, the only difference for Oracle would be in terms of how you handle the date conversion. Moreover, there are other functions for converting dates in SQL Server as well, so this is by no means the only solution.
And, you can combine this with Quassnoi's SET statement to populate the new field as well. Just use the left side of the Join condition logic for the value.
If you're loading your new table with values from the old table, and you then need to join the two tables, you can only "properly" do this if you can uniquely identify each row in the original table. Quassnoi's solution will allow you to do this, IF you can first alter the old table by adding a new column.
If you cannot alter the original table, generating some form of hash code based on the columns of the old table would work, but again only if the hash codes uniquely identify each row. (Oracle has hash functions such as ORA_HASH; if so, use them.)
If hash code uniqueness cannot be guaranteed, you may have to settle for a primary key composed of as many columns as are required to ensure uniqueness (e.g. the natural key). If there is no natural key, well, I heard once that Oracle provides a rownum for each row of data; could you use that?