Export / import tree (ID conflicts) - SQL

Let's assume we have a table in database with the following structure:
id (int32), parentId (int32), nodeName, nodeBodyText, ...
Of course some kind of "tree" is stored there.
User exports some branch of the tree to csv/xml/etc file.
When this file is imported into another DB (with different nodes, of course), ID conflicts can easily occur:
1) Records with the same IDs may already exist
2) The DB has auto-incrementing enabled on the id column
(so you can't explicitly specify the id for a newly created record)
How is this problem usually solved?
Especially in the case where nodeBodyText may also contain references to other nodes
(using hardcoded IDs from the previous DB).
P.S.
Using GUIDs is not acceptable for us.

Assuming that the imported subtree has parent references confined to that subtree only, and that you are inserting the nodes only, not updating, in SQL Server you can do this:
You need a mapping table to store new and old ids.
declare @idmap table
(
old_id int, new_id int
)
Then insert the imported nodes using the MERGE command:
MERGE [target] as t
USING [source] as s ON 1=0 -- don't match anything, all nodes are new
WHEN NOT MATCHED
THEN INSERT(parentid,nodename) VALUES(s.parentid,s.nodename)
OUTPUT s.id, inserted.id INTO @idmap; -- store new and old id in mapping table
Finally, re-map the target table's parent ids:
update t
set parentid = x.new_id
from [target] t
inner join @idmap x on x.old_id = t.parentid
where t.parentid is not null
and -- only the newly inserted nodes
exists(select * from @idmap where new_id = t.id);
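The question also mentions references to other nodes embedded in nodeBodyText. The answer above doesn't touch that, but the same @idmap can drive a rewrite. A minimal sketch, assuming the body text marks references with a delimited token such as [node:123] (the token format is purely an assumption here; bare numeric ids cannot be rewritten safely with a plain string replace):
-- Sketch only: assumes embedded references look like [node:<old id>]
DECLARE @old int, @new int;
DECLARE map_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT old_id, new_id FROM @idmap;
OPEN map_cur;
FETCH NEXT FROM map_cur INTO @old, @new;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- rewrite one old/new pair at a time so a body that references
    -- several nodes gets every token replaced
    UPDATE t
    SET nodeBodyText = REPLACE(t.nodeBodyText,
                               '[node:' + CAST(@old AS varchar(12)) + ']',
                               '[node:' + CAST(@new AS varchar(12)) + ']')
    FROM [target] t
    WHERE EXISTS (SELECT * FROM @idmap m WHERE m.new_id = t.id); -- only the imported rows
    FETCH NEXT FROM map_cur INTO @old, @new;
END
CLOSE map_cur;
DEALLOCATE map_cur;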

Related

How to pull data from 2 tables that have Uniqueidentifiers as PKs and populate the data into 2 new tables that have AutoIncremented IDs as PKs? [duplicate]

I have a few tables that contain IDs of type GUID; pretty much every table is using GUIDs instead of incremented IDs, but that's neither here nor there, just the scope of what I am dealing with.
So I have two tables called MaterialType and MaterialSubType.
MaterialType is built as follows:
MaterialTypeID (PK, Uniqueidentifier, not null)
MaterialType (varchar(40), not null)
Code (varchar(100), not null)
EnabledInd (tinyint, not null)
The MaterialSubType is built as follows
MaterialSubTypeID (PK, Uniqueidentifier, not null)
MaterialTypeID (PK, Uniqueidentifier, not null)
MaterialSubType (varchar(40), not null)
Code (varchar(100), not null)
EnabledInd (tinyint, not null)
The problem I am running into is that I have two updated tables that are pretty much identical, other than the fact that I am using auto-incremented IDs. I need to figure out how to query the data from the original tables and insert it into the new tables.
I know I can do an "insert into select" to the tables, but that doesn't (at least to my knowledge) help me because I need to have a foreign key in the MaterialSubType to the MaterialType.
So I am not sure how this should be done or how to do it, the reason being that I am not an expert with SQL, nor a DBA.
Again I'm making assumptions:
The Code column is the universal identifier that you're using to identify new records in both tables
The autoincrement identity values are on MaterialType.MaterialTypeID and MaterialSubType.MaterialSubTypeID (though it doesn't really make sense to have a PK on two columns when only one would be required in this case)
The databases are on the same SQL Server
First add new records to MaterialType so we can generate a key:
INSERT INTO TargetDB..MaterialType (MaterialType, Code, EnabledInd)
SELECT MaterialType, Code, EnabledInd
FROM SourceDB..MaterialType SRC
WHERE NOT EXISTS (
SELECT * FROM TargetDB..MaterialType TGT
WHERE TGT.Code = SRC.Code
)
Now add new records to MaterialSubType. To get the target FK value we need to first look up the code value in the source, then use that to lookup the correct FK value in the target.
I strongly suggest you first just run the select without the insert and test some records manually.
INSERT INTO TargetDB..MaterialSubType (
MaterialTypeID,
MaterialSubType,
Code,
EnabledInd
)
SELECT
-- We get this FK value by first looking up the code in the source
-- then using that to look up the FK in the target
MT.MaterialTypeID,
SRC.MaterialSubType,
SRC.Code,
SRC.EnabledInd
FROM
SourceDB..MaterialSubType SRC
INNER JOIN
-- Join to source lookup to find the code
SourceDB..MaterialType FK
ON SRC.MaterialTypeID = FK.MaterialTypeID
-- Now we have the source code, look up the target to get the target FK
INNER JOIN
TargetDB..MaterialType MT
ON FK.Code = MT.Code
WHERE NOT EXISTS (
SELECT *
FROM TargetDB..MaterialSubType TGT
WHERE SRC.Code = TGT.Code
)
There might be some errors here. If there are, and you would like them fixed, please also confirm my assumptions.

How Do I Deep Copy a Set of Data, and Change FK References to Point to All the Copies?

Suppose I have Table A and Table B. Table B references Table A. I want to deep copy a set of rows in Table A and Table B. I want all of the new Table B rows to reference the new Table A rows.
Note that I'm not copying the rows into any other tables. The rows in table A will be copied into table A, and the rows in table B will be copied into table B.
How can I ensure that the foreign key references get readjusted as part of the copy?
To clarify, I'm trying to find a generic way to do this. The example I'm giving involves two tables, but in practice the dependency graph may be much more complicated. Even a generic way to dynamically generate SQL to do the work would be fine.
UPDATE:
People are asking why this is necessary, so I'll give some background. It may be way too much, but here goes:
I'm working with an old desktop application that's been moved to a client-server model. But, the application still uses a rudimentary in-house binary file format for storing data for its tables. A data file is just a header followed by a series of rows, each of which is just the binary serialized field values, the order of which is determined by a schema text file. The only thing good about it is that it's very fast. It's terrible in every other respect. I'm moving the application to SQL Server and trying not to degrade the performance too badly.
This is a kind of scheduling application; the data's not critical to anybody, and there's no audit tracking, etc. necessary. It's not a supermassive amount of data, and we don't necessarily need to keep very old data around if the database grows too large.
One feature that they are accustomed to is the ability to duplicate entire schedules in order to create "what-if" scenarios that they can muck with. Any user can do this as many times as they want, as often as they want. In the old database, the data files for each schedule are stored in their own data folder, identified by name. So, copying a schedule was as simple as copying the data folder and renaming it.
I must be able to do effectively the same thing with SQL Server or the migration will not work. Maybe you're thinking that I can just copy only the data that actually gets changed in order to avoid redundancy; but that honestly sounds too complicated to be feasible.
To throw another wrench into the mix, there can be a hierarchy of schedule data folders. So, a data folder may contain a data folder, which may contain a data folder. And the copying can occur at any level.
In SQL Server, I'm implementing a nested set hierarchy to mimic this. I have a DATA_SET table like this:
CREATE TABLE dbo.DATA_SET
(
DATA_SET_ID UNIQUEIDENTIFIER PRIMARY KEY,
NAME NVARCHAR(128) NOT NULL,
LFT INT NOT NULL,
RGT INT NOT NULL
)
So, there's a tree structure of data sets. Each data set represents a schedule, and may contain child data sets. Every row in every table has a DATA_SET_ID FK reference, indicating which data set it belongs to. Whenever I copy a data set, I copy all the rows in the table for that data set, and every other data set, into the same table, but referencing new data sets.
So, here's a simple concrete example:
CREATE TABLE FOO
(
FOO_ID BIGINT PRIMARY KEY,
DATA_SET_ID BIGINT FOREIGN KEY REFERENCES DATA_SET(DATA_SET_ID) NOT NULL
)
CREATE TABLE BAR
(
BAR_ID BIGINT PRIMARY KEY,
DATA_SET_ID BIGINT FOREIGN KEY REFERENCES DATA_SET(DATA_SET_ID) NOT NULL,
FOO_ID BIGINT FOREIGN KEY REFERENCES FOO(FOO_ID) NOT NULL
)
INSERT INTO FOO
SELECT 1, 1 UNION ALL
SELECT 2, 1 UNION ALL
SELECT 3, 1

INSERT INTO BAR
SELECT 1, 1, 1 UNION ALL
SELECT 2, 1, 2 UNION ALL
SELECT 3, 1, 3
So, let's say I copy data set 1 into a new data set of ID 2. After I copy, the tables will look like this:
FOO
FOO_ID, DATA_SET_ID
1 1
2 1
3 1
4 2
5 2
6 2
BAR
BAR_ID, DATA_SET_ID, FOO_ID
1 1 1
2 1 2
3 1 3
4 2 4
5 2 5
6 2 6
As you can see, the new BAR rows are referencing the new FOO rows. It's not the rewiring of the DATA_SET_ID's that I'm asking about. I'm asking about rewiring the foreign keys in general.
So, that was surely too much information, but there you go.
I'm sure there are a lot of concerns about performance with the idea of bulk copying the data like this. The tables are not going to be huge. I'm not expecting more than 1000 records in any table, and most of the tables will be much much smaller than that. Old data sets can be deleted outright with no repercussions.
Thanks,
Tedderz
Here is an example with three tables that can probably get you started.
DB schema
CREATE TABLE users
(user_id int auto_increment PRIMARY KEY,
user_name varchar(32));
CREATE TABLE agenda
(agenda_id int auto_increment PRIMARY KEY,
`user_id` int, `agenda_name` varchar(7));
CREATE TABLE events
(event_id int auto_increment PRIMARY KEY,
`agenda_id` int,
`event_name` varchar(8));
An SP to clone a user with his agenda and events records
DELIMITER $$
CREATE PROCEDURE clone_user(IN uid INT)
BEGIN
DECLARE last_user_id INT DEFAULT 0;
INSERT INTO users (user_name)
SELECT user_name
FROM users
WHERE user_id = uid;
SET last_user_id = LAST_INSERT_ID();
INSERT INTO agenda (user_id, agenda_name)
SELECT last_user_id, agenda_name
FROM agenda
WHERE user_id = uid;
INSERT INTO events (agenda_id, event_name)
SELECT a3.agenda_id_new, e.event_name
FROM events e JOIN
(SELECT a1.agenda_id agenda_id_old,
a2.agenda_id agenda_id_new
FROM
(SELECT agenda_id, @n := @n + 1 n
FROM agenda, (SELECT @n := 0) n
WHERE user_id = uid
ORDER BY agenda_id) a1 JOIN
(SELECT agenda_id, @m := @m + 1 m
FROM agenda, (SELECT @m := 0) m
WHERE user_id = last_user_id
ORDER BY agenda_id) a2 ON a1.n = a2.m) a3
ON e.agenda_id = a3.agenda_id_old;
END$$
DELIMITER ;
To clone a user
CALL clone_user(3);
Here is SQLFiddle demo.
I recently found myself needing to solve a similar problem; that is, I needed to copy a set of rows in a table (Table A) as well as all of the rows in related tables which have foreign keys pointing to Table A's primary key. I was using Postgres, so the exact queries may differ, but the overall approach is the same. The biggest benefit of this approach is that it can be applied recursively to go infinitely deep.
TLDR: the approach looks like this:
1) Find all the related tables/columns of Table A
2) Copy the necessary data into temporary tables
3) Create a trigger and function to propagate primary key column updates to related foreign key columns in the temporary tables
4) Update the primary key column in the temporary tables to the next value in the auto-increment sequence
5) Re-insert the data back into the source tables, and drop the temporary tables/triggers/function
1) The first step is to query the information schema to find all of the tables and columns which are referencing Table A. In Postgres this might look like the following:
SELECT tc.table_name, kcu.column_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
ON tc.constraint_name = kcu.constraint_name
JOIN information_schema.constraint_column_usage ccu
ON ccu.constraint_name = tc.constraint_name
WHERE constraint_type = 'FOREIGN KEY'
AND ccu.table_name='<Table A>'
AND ccu.column_name='<Primary Key>'
2) Next we need to copy the data from Table A, and from any other tables which reference Table A - let's say there is one called Table B. To start this process, let's create a temporary table for each of these tables and populate it with the data that we need to copy. This might look like the following:
CREATE TEMP TABLE temp_table_a AS (
SELECT * FROM <Table A> WHERE ...
)
CREATE TEMP TABLE temp_table_b AS (
SELECT * FROM <Table B> WHERE <Foreign Key> IN (
SELECT <Primary Key> FROM temp_table_a
)
)
3) We can now define a function that will cascade primary key column updates out to related foreign key columns, and a trigger which will execute whenever the primary key column changes. For example:
CREATE OR REPLACE FUNCTION cascade_temp_table_a_pk()
RETURNS trigger AS
$$
BEGIN
UPDATE <Temp Table B> SET <Foreign Key> = NEW.<Primary Key>
WHERE <Foreign Key> = OLD.<Primary Key>;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trigger_temp_table_a
AFTER UPDATE
ON <Temp Table A>
FOR EACH ROW
WHEN (OLD.<Primary Key> != NEW.<Primary Key>)
EXECUTE PROCEDURE cascade_temp_table_a_pk();
4) Now we just update the primary key column in the temporary tables to the next value of the source table's sequence (Table A's sequence, in this example). This will activate the trigger, and the updates will be cascaded out to the foreign key columns in the temporary tables. In Postgres you can do the following:
UPDATE <Temp Table A>
SET <Primary Key> = nextval(pg_get_serial_sequence('<Table A>', '<Primary Key>'))
5) Insert the data from the temporary tables back into the source tables, and then drop the temporary tables, triggers, and function.
INSERT INTO <Table A> SELECT * FROM <Temp Table A>;
INSERT INTO <Table B> SELECT * FROM <Temp Table B>;
DROP TRIGGER trigger_temp_table_a ON <Temp Table A>;
DROP FUNCTION cascade_temp_table_a_pk();
It is possible to take this general approach and turn it into a script which can be called recursively in order to go infinitely deep. I ended up doing just that using Python (our application was using Django, so I was able to use the Django ORM to make some of this easier).

Migrating categories using SQL OUTPUT

I want to do something like
DECLARE @idTable TABLE
(
hierakiId INT,
katId INT
);
DECLARE @id int
SET @id = (SELECT MIN([ID]) FROM [Hieraki] WHERE Navn = 'Sagsskabeloner')
INSERT INTO HierakiMedlem(Navn, HierakiID)
OUTPUT INSERTED.ID, s.ID INTO @idTable
SELECT s.Navn, @id, s.ID FROM SagSkabelonKategori s
UPDATE s SET s.HierakiMedlem = @idTable.hierakiId
FROM SagSkabelon s INNER JOIN @idTable
ON s.SagSkabelonKategoriID = @idTable.katId
resulting in a map in @idTable, mapping the old identity of each category to the new one, so that I can change references as needed. Obviously this results in an error (3rd line), as the SELECT returns more columns than are used by the INSERT INTO.
Any suggestions on the cleanest way to do this?
I'm on SQL Server 2005.
/edit
now w. complete source code.
We are switching from a semi-flat, non-nested category sorting to a hierarchy-based one. All the categories are to be copied as root-level nodes in the new hierarchy, and the former members of each category must have a new field set referencing the newly created root node.
What I want to do:
1. Copy all categories to the hierarchy table, setting their parent (HierakiID) to the same value.
2. Update a column in all references to the categories so they now (also) reference the hierarchy nodes.
3. Delete references to categories.
4. Delete categories.
The tricky part for me is to get a map between the category id and the hierarchy id.
/edit
In an INSERT statement, OUTPUT can only project columns from the INSERTED table, and your SELECT must match the INSERT. Assuming HierakiMedlem.ID is a generated identity value, try something like:
INSERT INTO HierakiMedlem(Navn, HierakiID)
OUTPUT INSERTED.ID, INSERTED.HierakiID
INTO @idTable (ID, HierakiID)
SELECT s.Navn, s.ID
FROM SagSkabelonKategori s
Your subsequent update uses column names like @idTable.katId, whose meaning I cannot guess, so my answer likely won't compile directly. If you want a correct answer you should always include the exact definition of your tables (including table variables).

Doing UPSERT when row is referenced by a FK

Let's say that I have a table of items, and for each item, there can be additional information stored for it, which goes into a second table. The additional information is referenced by a FK in the first table, which can be NULL (if the item doesn't have additional info).
TABLE item (
...
item_addtl_info_id INTEGER
)
CONSTRAINT fk_item_addtl_info FOREIGN KEY (item_addtl_info_id)
REFERENCES addtl_info (addtl_info_id)
TABLE addtl_info (
addtl_info_id INTEGER NOT NULL
GENERATED BY DEFAULT
AS IDENTITY (
INCREMENT BY 1
NO CACHE
),
addtl_info_text VARCHAR(100)
...
CONSTRAINT pk_addtl_info PRIMARY KEY (addtl_info_id)
)
What is the "best practice" to update an item's additional info (in IBM DB2 SQL, preferably)?
It should be an UPSERT operation, meaning that if additional info does not yet exist then a new record is created in the second table, but if it does, then it is only updated, and the FK in the first table does not change.
So imperatively, this is the logic:
UPSERT(item, item_info):
CASE WHEN item.item_addtl_info_id IS NULL THEN
INSERT INTO addtl_info (item_info)
UPDATE item.item_addtl_info_id (addtl_info.addtl_info_id)
^^^^^^^^^^^^^
ELSE
UPDATE addtl_info (item_info)
END
My main problem is how to get the newly inserted addtl_info row's id (underlined above). In a stored proc I can request the id from a sequence and store it in a variable, but maybe there is a more straightforward way. Isn't it something that comes up all the time when programming databases?
I mean, I'm really not interested in what the id of the addtl_info record is, as long as it remains unique and is referenced properly. So using sequences seems a bit of overkill to me in this case.
As a matter of fact, this UPSERT operation should be part of the SQL language as a standard operation (maybe it is, and I just don't know about it?)...
The syntax I was looking for is:
SELECT * FROM NEW TABLE ( INSERT INTO phone_book VALUES ( 'Peter Doe','555-2323' ) )
from Wikipedia (http://en.wikipedia.org/wiki/Insert_%28SQL%29)
This is how to refer to the record that was just inserted in the table.
My colleague called this construct an "in-place trigger", which is what it really is...
Here is the first version that I put together as a compound SQL statement:
begin atomic
declare addtl_id integer;
set addtl_id = (select item_addtl_info_id from item where item.item_id = XXX);
if addtl_id is null
then
set addtl_id = (select addtl_info_id from new table
(insert into addtl_info
(addtl_info_text)
values ('My brand new additional info')
)
);
update item set item.item_addtl_info_id = addtl_id
where item.item_id = XXX;
else
update addtl_info set addtl_info_text = 'My updated additional info'
where addtl_info.addtl_info_id = addtl_id;
end if;
end
XXX being equal to the item id to be updated - this code can now be easily inserted into a sproc, and XXX can be converted to an input parameter.
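For illustration, a sketch of what that sproc might look like (the procedure and parameter names are invented here; the body is simply the compound statement above with XXX and the literal text turned into parameters):
create procedure upsert_addtl_info (in p_item_id integer,
                                    in p_text varchar(100))
language sql
begin
    declare addtl_id integer;
    set addtl_id = (select item_addtl_info_id from item where item.item_id = p_item_id);
    if addtl_id is null
    then
        -- insert the new additional info and capture its generated id
        set addtl_id = (select addtl_info_id from new table
            (insert into addtl_info (addtl_info_text) values (p_text)));
        update item set item.item_addtl_info_id = addtl_id
            where item.item_id = p_item_id;
    else
        -- additional info already exists, just update it in place
        update addtl_info set addtl_info_text = p_text
            where addtl_info.addtl_info_id = addtl_id;
    end if;
end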
I also tried using MERGE INTO, but I couldn't figure out a syntax for updating a table different from what was specified as the target.

Stuck trying to migrate two tables from one DB to another DB

I'm trying to migrate some data from two tables in an OLD database to a NEW database.
The problem is that I wish to generate new primary keys in the new database for the first table that is getting imported. That's simple.
But the 2nd table in the old database has a foreign key dependency on the first table. So when I want to migrate the old data from the second table, the foreign keys don't match any more.
Are there any tricks/best practices involved to help me migrate the data?
Serious Note: I cannot change the current schema of the new tables, which do not have any 'old id' column.
Let's use the following table schema:
Old Table1 New Table1
ParentId INT PK ParentId INT PK
Name VARCHAR(50) Name VARCHAR(50)
Old Table 2 New Table 2
ChildId INT PK ChildId INT PK
ParentId INT FK ParentId INT FK
Foo VARCHAR(50) Foo VARCHAR(50)
So the table schema's are identical.
Thoughts?
EDIT:
For those that are asking, the RDBMS is SQL Server 2008. I didn't specify the software because I was hoping I would get an agnostic answer with some generic T-SQL :P
I think you need to do this in 2 steps.
You need to import the old tables and keep the old ids (and generate new ones). Then, once they're in the new database and have both new and old ids, you can use the old ids to associate the new ids, and then you drop the old ids.
You can do this by importing into temporary (i.e. they will be thrown away) tables, then inserting into the permanent tables, leaving out the old ids.
Or import directly into the new tables (with the schema modified to also hold the old ids), then drop the old ids when they're no longer necessary.
EDIT:
OK, I'm a bit clearer on what you're looking for thanks to comments here and on other answers. I knocked this up, I think it'll do what you want.
Basically, without cursors, it steps through the parent table row by row, and inserts the new parent row and all the child rows for that parent row, keeping the new ids in sync.
I tried it out and it should work; it doesn't need exclusive access to the tables and should be orders of magnitude faster than a cursor.
declare @oldId as int
declare @newId as int
select @oldId = Min(ParentId) from OldTable1
while not @oldId is null
begin
Insert Into NewTable1 (Name)
Select Name from OldTable1 where ParentId = @oldId
Select @newId = SCOPE_IDENTITY()
Insert Into NewTable2 (ParentId, Foo)
Select @newId, Foo From OldTable2 Where ParentId = @oldId
select @oldId = Min(ParentId) from OldTable1 where ParentId > @oldId
end
Hope this helps,
Well, I guess you'll have to determine other criteria to create a map like oldPK => newPK (for example: is the Name field equal?).
Then you can determine the new PK that matches the old PK and adjust the ParentID accordingly.
You may also do a little trick: Add a new column to the original Table1 which stores the new PK value for a copied record. Then you can easily copy the values of Table2 pointing them to the value of the new column instead of the old PK.
EDIT: I'm trying to provide some sample code of what I meant by my little trick. I'm not altering the original database structure, but I'm using a temporary table now.
OK, you might try to following:
1) Create temporary table that holds the values of the old table, plus, it gets a new PK:
CREATE TABLE #tempTable1
(
newPKField INT,
oldPKField INT,
Name VARCHAR(50)
)
2) Insert all the values from your old table into the temporary table calculating a new PK, copying the old PK:
INSERT INTO #tempTable1
SELECT
newPKValueHere AS newPKField,
ParentID as oldPKField,
Name
FROM
Table1
3) Copy the values to the new table
INSERT INTO NewTable1
SELECT
newPKField as ParentId,
Name
FROM
#tempTable1
4) Copy the values from Table2 to NewTable2
INSERT INTO NewTable2
SELECT
ChildID,
t.newPKField AS ParentId,
Foo
FROM
Table2
INNER JOIN #tempTable1 t ON t.oldPKField = Table2.ParentId
This should do. Please note that this is only pseudo T-SQL Code - I have not tested this on a real database! However, it should come close to what you need.
Can you change the schema of the old tables? If so, you could put a "new id" column on the old tables, and use that as the reference.
You might have to do a row by row insert on the new table and then retrieve the scope_identity, store it in the old table1. But for table2, you can then join to the old table1 and grab the new_id.
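A rough sketch of that idea, assuming you are allowed to alter the OLD tables (the NewParentId scratch column is invented here for illustration):
-- add a scratch column on the OLD parent table to hold the generated keys
ALTER TABLE OldTable1 ADD NewParentId INT NULL
GO

DECLARE @oldId int
SELECT @oldId = MIN(ParentId) FROM OldTable1
WHILE @oldId IS NOT NULL
BEGIN
    -- insert one parent at a time so SCOPE_IDENTITY() is its new key
    INSERT INTO NewTable1 (Name)
    SELECT Name FROM OldTable1 WHERE ParentId = @oldId

    -- remember the new identity value on the old row
    UPDATE OldTable1 SET NewParentId = SCOPE_IDENTITY() WHERE ParentId = @oldId

    SELECT @oldId = MIN(ParentId) FROM OldTable1 WHERE ParentId > @oldId
END

-- the children can now pick up the new parent key with a join
INSERT INTO NewTable2 (ParentId, Foo)
SELECT o1.NewParentId, o2.Foo
FROM OldTable2 o2
INNER JOIN OldTable1 o1 ON o1.ParentId = o2.ParentId

ALTER TABLE OldTable1 DROP COLUMN NewParentId  -- optional cleanup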
First of all - can you not even have some temporary schema that you can later drop?! That would make life easier. Assuming you can't:
If you're lucky (and if you can guarantee that no other inserts will be happening at the same time) then when you insert the Table1's data into your new table you could perhaps cheat by relying on the sequential order of the inserts.
You could then create a view that joins the 2 tables on a row-count so that you have a way to correlate the keys to each other. That way you'd be one step closer to being able to identify the 'ParentId' for the new Table2.
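If you do go that route, a sketch of the row-count correlation (SQL Server 2005+ syntax; the ParentKeyMap view name is made up, and the mapping only holds if both tables were loaded in the same key order with nothing else inserting rows in between):
-- map old keys to new keys purely by insert order
CREATE VIEW ParentKeyMap AS
SELECT o.ParentId AS OldParentId, n.ParentId AS NewParentId
FROM (SELECT ParentId, ROW_NUMBER() OVER (ORDER BY ParentId) AS rn FROM OldTable1) o
INNER JOIN (SELECT ParentId, ROW_NUMBER() OVER (ORDER BY ParentId) AS rn FROM NewTable1) n
    ON o.rn = n.rn
GO

INSERT INTO NewTable2 (ParentId, Foo)
SELECT m.NewParentId, o2.Foo
FROM OldTable2 o2
INNER JOIN ParentKeyMap m ON m.OldParentId = o2.ParentId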
I'm not sure from your question what database software you're using, but if temporary tables are an option, create a temporary table containing the original primary key of table1 and the new primary key of table1. Then create another temporary table with a copy of table2, update the copy using the "old key, new key" table you created earlier, then use "insert into select from" (or whatever the appropriate command is for your database) to copy the revised temporary table into its permanent location.
I had the wonderful opportunity to be dug deep in migration scripts last summer. I was using Oracle's PL/SQL for the task. But you did not mention what technology you are using. What are you migrating the data into? SQL Server? Oracle? MySQL?
The approach is to INSERT a row from table1 RETURNING the new primary key generated (probably by a SEQUENCE [in Oracle]) and then INSERT the dependent records from table2, changing their foreign key value to the value returned by the first INSERT. Can't help you any better unless you can specify what DBMS you are migrating the data into.
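Since this answer is Oracle-flavoured, here is a hedged PL/SQL sketch of that INSERT ... RETURNING pattern (the sequence names are invented; on SQL Server you would use SCOPE_IDENTITY() instead, as in the loop answer above):
-- insert each old parent, capture its new key via RETURNING,
-- then copy that parent's children with the captured key
DECLARE
    v_new_parent_id NewTable1.ParentId%TYPE;
BEGIN
    FOR old_parent IN (SELECT ParentId, Name FROM OldTable1) LOOP
        INSERT INTO NewTable1 (ParentId, Name)
        VALUES (newtable1_seq.NEXTVAL, old_parent.Name)
        RETURNING ParentId INTO v_new_parent_id;

        INSERT INTO NewTable2 (ChildId, ParentId, Foo)
        SELECT newtable2_seq.NEXTVAL, v_new_parent_id, Foo
        FROM OldTable2
        WHERE ParentId = old_parent.ParentId;
    END LOOP;
END;
/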
The following Pseudo-ish code should work for you
CREATE TABLE newtable1
ParentId INT PK
OldId INT
Name VARCHAR(50)
CREATE TABLE newtable2
ChildId INT pk
ParentId INT FK
OldParent INT
Foo VARCHAR(50)
INSERT INTO newtable1(OldId, Name)
SELECT ParentId, Name FROM oldtable1
INSERT INTO newtable2(OldParent, Foo)
SELECT ParentId, Foo FROM oldtable2
UPDATE newtable2 SET ParentId = (
SELECT n.ParentId
FROM newtable1 AS n
WHERE n.OldId = newtable2.oldParent
)
ALTER TABLE newtable1 DROP COLUMN OldId
ALTER TABLE newtable2 DROP COLUMN OldParent