Suppose you have two tables that are in a one-to-one relationship; i.e. the primary key of the child table is also the foreign key that links it to the parent table. Suppose also that the primary key of the parent is an identity field (a monotonically increasing integer that is assigned by the database when the record is inserted).
Suppose that you need to copy records from these two tables into a second pair of identical tables -- the primary key of the parent is an identity, and the foreign key linking the child to the parent is also the child's primary key.
How should I copy records from one set of tables to the other set?
I currently have three solutions, but I'd like to know if there are others that are better.
Option 1: Temporarily disable the identity property in the destination parent table. Copy records from the parent table, then the child table, keeping the same values for the primary key. Cross your fingers that there are no conflicts (value of primary key of source table already exists in destination table).
Option 2: Temporarily add a column to the destination parent table to hold the "old" (source) primary key. Copy records from the parent table, allowing the database to assign a new primary key but saving the old primary key in the temporary column. Copy records from the child table, joining the source child table to the destination parent table via the old primary key, using the join to insert the record into the destination child table with the new primary key. Drop the temporary column from the destination parent table.
Option 3: Copy sequentially record-by-record, first from parent to parent, then child to child, using DB-provided "identity of last inserted record" functions to ensure that the link is maintained.
Of these options, I think option 2 is my preference. Does anyone prefer one of the other two options, and if so, why? Does anyone have a different solution that is "better"?
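For anyone who wants to see Option 2 end to end, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for the real database, and every table and column name (src_parent, dst_parent, old_id and so on) is made up for illustration. Treat it as one possible shape of the approach, not a tested migration script.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_parent (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT);
    CREATE TABLE src_child  (id INTEGER PRIMARY KEY REFERENCES src_parent (id), detail TEXT);
    CREATE TABLE dst_parent (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT);
    CREATE TABLE dst_child  (id INTEGER PRIMARY KEY REFERENCES dst_parent (id), detail TEXT);

    INSERT INTO src_parent (name) VALUES ('alpha'), ('beta');
    INSERT INTO src_child (id, detail) VALUES (1, 'alpha detail'), (2, 'beta detail');

    -- The destination already holds a row, so the source ids cannot be reused.
    INSERT INTO dst_parent (name) VALUES ('existing');
    INSERT INTO dst_child (id, detail) VALUES (1, 'existing detail');
""")

# Step 1: temporary column that remembers the source primary key.
conn.execute("ALTER TABLE dst_parent ADD COLUMN old_id INTEGER")

# Step 2: copy parents; the database assigns fresh ids, old ids go to old_id.
conn.execute("INSERT INTO dst_parent (name, old_id) SELECT name, id FROM src_parent")

# Step 3: copy children, translating old id -> new id through the join.
conn.execute("""
    INSERT INTO dst_child (id, detail)
    SELECT p.id, c.detail
    FROM src_child AS c
    JOIN dst_parent AS p ON p.old_id = c.id
""")

# Step 4: drop the temporary column (SQLite only supports this from 3.35 on).
if sqlite3.sqlite_version_info >= (3, 35, 0):
    conn.execute("ALTER TABLE dst_parent DROP COLUMN old_id")

rows = conn.execute("""
    SELECT p.name, c.detail
    FROM dst_parent AS p JOIN dst_child AS c ON c.id = p.id
""").fetchall()
print(sorted(rows))
```

The copied children end up attached to the newly numbered parents, and the pre-existing destination row is untouched.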
This is one reason why it is so critical to remember that even if you use a surrogate key (like an identity column), you always need a business key. That is, there always needs to be some other unique constraint on the table. If you had that, then another choice would be to insert the values into the copy of the parent table without the identity values and use that unique key to look up the proper new parent value for the child rows.
If you do not have that unique key, then given your situation, I agree that your best solution would likely be Option #2.
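A rough sketch of the business-key route, again using Python's sqlite3 as a stand-in. The unique `code` column is my assumption of what a business key might look like, and all table names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_parent (id INTEGER PRIMARY KEY, code TEXT NOT NULL UNIQUE, name TEXT);
    CREATE TABLE src_child  (id INTEGER PRIMARY KEY REFERENCES src_parent (id), detail TEXT);
    CREATE TABLE dst_parent (id INTEGER PRIMARY KEY, code TEXT NOT NULL UNIQUE, name TEXT);
    CREATE TABLE dst_child  (id INTEGER PRIMARY KEY REFERENCES dst_parent (id), detail TEXT);

    INSERT INTO src_parent (id, code, name) VALUES (1, 'A-1', 'alpha'), (2, 'B-2', 'beta');
    INSERT INTO src_child (id, detail) VALUES (1, 'alpha detail'), (2, 'beta detail');

    -- Existing destination data occupies id 1, so new ids will differ from source ids.
    INSERT INTO dst_parent (code, name) VALUES ('Z-9', 'existing');
""")

# Copy parents without their ids; the destination assigns new ones.
conn.execute("INSERT INTO dst_parent (code, name) SELECT code, name FROM src_parent")

# Copy children: source child -> source parent -> destination parent via the business key.
conn.execute("""
    INSERT INTO dst_child (id, detail)
    SELECT dp.id, sc.detail
    FROM src_child  AS sc
    JOIN src_parent AS sp ON sp.id = sc.id
    JOIN dst_parent AS dp ON dp.code = sp.code
""")

pairs = conn.execute("""
    SELECT p.code, c.detail
    FROM dst_parent AS p JOIN dst_child AS c ON c.id = p.id
""").fetchall()
print(sorted(pairs))
```

No temporary column is needed here because the business key already identifies each parent on both sides.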
Before you decide on an approach for copying data to the new set of tables, you should investigate the following items:
Which tables reference the data in the parent and child tables (in both sets of tables)?
Are there any stored procedures/triggers that utilize the data in these tables?
How does this table get populated? Is there an application/data feed that inserts data in this table?
How does the data in this table get deleted?
What is the purpose of the primary key beyond ensuring uniqueness in the table? For this you will have to understand how the data in the table is used by the application.
Based on the answers, you should be able to pick the right solution that will meet the requirements of the application.
My money is on Option 1 (see SET IDENTITY_INSERT, http://msdn.microsoft.com/en-us/library/ms188059.aspx).
But: Why are you copying them?
If you are just altering the table schema, or migrating to new tables and retiring the old ones, why not use ALTER TABLE?
If you are going to run them side-by-side you probably need the keys to match.
But to answer your question, use Option 1, definitely.
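If you go with Option 1, the "cross your fingers" part can be replaced with an explicit check. Below is a minimal sketch using Python's sqlite3. Note the assumptions: SQLite accepts explicit values for an INTEGER PRIMARY KEY without any equivalent of SET IDENTITY_INSERT, so that step does not appear, and the function and table names are made up.

```python
import sqlite3

def copy_keeping_keys(conn):
    """Copy parent and child rows with their original keys.

    Returns the list of clashing keys and copies nothing if any key
    already exists in the destination parent table.
    """
    clashes = [row[0] for row in conn.execute("""
        SELECT s.id
        FROM src_parent AS s JOIN dst_parent AS d ON d.id = s.id
        ORDER BY s.id
    """)]
    if clashes:
        return clashes
    with conn:  # both inserts commit together or not at all
        conn.execute("INSERT INTO dst_parent (id, name) SELECT id, name FROM src_parent")
        conn.execute("INSERT INTO dst_child (id, detail) SELECT id, detail FROM src_child")
    return []

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_parent (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE src_child  (id INTEGER PRIMARY KEY REFERENCES src_parent (id), detail TEXT);
    CREATE TABLE dst_parent (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dst_child  (id INTEGER PRIMARY KEY REFERENCES dst_parent (id), detail TEXT);

    INSERT INTO src_parent VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO src_child  VALUES (1, 'alpha detail'), (2, 'beta detail');
    INSERT INTO dst_parent VALUES (2, 'already here');
""")

first_attempt = copy_keeping_keys(conn)   # key 2 is already taken
# Resolve the clash by hand (here simply by removing the stale row), then retry.
conn.execute("DELETE FROM dst_parent WHERE id = 2")
second_attempt = copy_keeping_keys(conn)
copied = conn.execute("SELECT COUNT(*) FROM dst_child").fetchone()[0]
print(first_attempt, second_attempt, copied)
```

Because the child's primary key is also its foreign key, checking the parent keys is enough as long as the destination child table has no orphan rows.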
Related
I want to create a 1-to-1 relationship on a table with itself.
I have a table MenuItem, but I want the items to be able to have a parent MenuItem. One item can only have one parent, but an item can be parent to multiple items.
I am currently working with a link table, MenuItemParent, but I can't figure out how to get the keys and constraints correctly. It has two columns: MenuItemId and ParentId. Both are foreign keys to the MenuItem table.
If I make the first or both columns Primary key, I seem to end up with a 1-to-many relationship. (I'm generating code from the DB so I can verify it.)
If I only make the first column Primary Key, I end up in a sort of Schrödinger state where a MenuItem can both have a single parent and have multiple parents (i.e. the generated POCO has both a MenuItem property and an EntitySet<MenuItem> property.) I could build my code around this, but then it's not clear from either the model or the generated code what kind of relationship it actually is.
What am I missing?
As to why I'm using a link table, I'm trying to employ vertical segmentation, as this data will not be accessed as often.
A 1-1 relationship effectively partitions the attributes (columns) in a table into two tables. This is called vertical segmentation. This is often done for sub-classing the table entities, or, for another reason, if the usage patterns on the columns in the table indicate that a few of the columns need to be accessed significantly more often than the rest of the columns. (Say one or two columns will be accessed 1000s of times per second and the other 40 columns will be accessed only once a month). Partitioning the table in this way in effect will optimize the storage pattern for those two different queries.
From: https://stackoverflow.com/a/5112498/125938
Edit: premature optimization aside, I now understand I could simply use a ParentId column in the MenuItem table, but is this really better than using a link table?
You should add a ParentID column to your table MenuItem with a foreign key.
This is an example on how to do that.
alter table MenuItem
add ParentID int null;
alter table MenuItem
add constraint FK_MenuItemParent foreign key (ParentID) references MenuItem (ID);
Now you have a hierarchical table, which means that a menu item can have only one parent, but many other menu items can have the same menu item as their parent.
A link table is only needed when you need a many-to-many relationship, which is not the case here.
You can also create a unique index on both columns, as suggested, but beware that ParentID will often be null, so add a filter clause to exclude those rows:
create unique nonclustered index idx_MenuParentID
on MenuItem(ID, ParentID)
where ParentID is not null;
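To see the self-referencing column behave as described, here is a small sketch using Python's sqlite3 rather than SQL Server. The Name column and the sample rows are my own additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
    CREATE TABLE MenuItem (
        ID       INTEGER PRIMARY KEY,
        Name     TEXT NOT NULL,
        ParentID INTEGER NULL REFERENCES MenuItem (ID)  -- at most one parent per item
    );
    INSERT INTO MenuItem (ID, Name, ParentID) VALUES
        (1, 'File', NULL),   -- top-level item, no parent
        (2, 'Open', 1),
        (3, 'Save', 1);      -- several items may share one parent
""")

children = conn.execute(
    "SELECT Name FROM MenuItem WHERE ParentID = ? ORDER BY ID", (1,)
).fetchall()

# A parent that does not exist is rejected by the foreign key.
try:
    conn.execute("INSERT INTO MenuItem (ID, Name, ParentID) VALUES (4, 'Orphan', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(children, orphan_rejected)
```

Since ParentID is a single column on the row itself, "one parent per item" is guaranteed by the table shape, with no link table involved.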
Get rid of the "link" table. Just set up your MenuItem table with an ID (PK) column and a ParentID (FK) column. Set up the foreign key relationship (I'll assume you can figure that out). Then set up a "Unique Key" constraint on the ParentID and ID columns.
I think you should make one column the PRIMARY KEY and the other a FOREIGN KEY that references MenuItem. A relationship between a table and itself is called a self-reference (you can search for more info), and it can't have two FOREIGN KEYs.
I was wondering, if we have two tables that share one column in common and in the first table this column is a primary key but in the second, another is chosen as a primary key... then does SQL treat the common column in the second table as just another ordinary column? Hence no optimization is present if the second table is searched based on the common column info, i.e. primary keys between two related tables are completely independent?
Yes they are independent: primary keys are completely unique to a table.
They are not shared across tables, even if the type of the column is the same, but you can share the primary key of a table as foreign key in another table.
No optimization is performed, because the common column is not the primary key in the second table. By default, the database creates an index on the primary key, which speeds up lookups on that column.
If a proper PK - FK relationship is established between the two columns in their respective tables, then any joins should be optimized.
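How much you get "for free" depends on the database. As a hedged illustration, here is what SQLite does (via Python's sqlite3): declaring the foreign key does not create an index on the referencing column, so a lookup on it scans the table until you add an index yourself. The customer and purchase tables are invented for the example, and other engines may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,                           -- this table's own key
        customer_id INTEGER REFERENCES customer (customer_id),     -- the shared column
        amount      REAL
    );
""")

def plan(sql):
    """Return SQLite's query plan as one string (detail is the last column of each row)."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT amount FROM purchase WHERE customer_id = 42"

plan_before = plan(query)   # no index on purchase.customer_id yet
conn.execute("CREATE INDEX idx_purchase_customer ON purchase (customer_id)")
plan_after = plan(query)    # the explicit index can now be used

print(plan_before)
print(plan_after)
```

The exact plan wording varies between SQLite versions, so compare the two printed lines rather than relying on a specific string.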
Assume that I know that updating a primary key is bad.
There are other questions which imply that the inserted and deleted table records match by position (the first of one matches the first of the other). Is this a fact or coincidence?
Is there anything that could join the two tables together when the primary key changes on an update?
There is no match of inserted+deleted virtual table row positions.
And no, you can't match rows.
Some options:
there is another unique unchanging (for that update) key to link rows
limit to single row actions.
use a stored procedure with the OUTPUT clause to capture before and after keys
INSTEAD OF trigger with OUTPUT clause (TBH not sure if you can do this)
disallow primary key updates (added after comment)
Each table is allowed to have one identity column. Identity columns are not updateable; they are assigned a value when the records are inserted (or when the column is added), and they can never change. If the primary key is updateable, it must not be an identity column. So, either the table has another column which is an identity column, or you can add one to it. There is no rule that says the identity column has to be the primary key. Then in the trigger, rows in inserted and deleted that have the same identity value are the same row, and you can support updating the primary key on multiple rows at a time.
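The join this answer describes can be illustrated outside a trigger. The sketch below uses Python's sqlite3 with two ordinary tables standing in for the trigger's deleted and inserted pseudo-tables; row_id plays the role of the unchanging identity column. All names and the second product code are hypothetical, and this is a simulation of the idea, not trigger code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-ins for the pseudo-tables a trigger would see after a multi-row update.
    CREATE TABLE deleted_rows  (row_id INTEGER, product_code TEXT);  -- values before
    CREATE TABLE inserted_rows (row_id INTEGER, product_code TEXT);  -- values after

    INSERT INTO deleted_rows VALUES (1, 'P23A1'), (2, 'P24B7');
    -- Stored in the opposite order on purpose: position tells us nothing.
    INSERT INTO inserted_rows VALUES (2, 'CAT24MOD7'), (1, 'CAT23MOD1');
""")

# Pair old and new primary key values through the column that never changes.
key_changes = conn.execute("""
    SELECT d.product_code AS old_code, i.product_code AS new_code
    FROM deleted_rows AS d
    JOIN inserted_rows AS i ON i.row_id = d.row_id
    WHERE d.product_code <> i.product_code
    ORDER BY d.row_id
""").fetchall()

print(key_changes)
```

The pairing comes entirely from row_id, so it holds no matter what order the rows arrive in.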
Yes -- create an "old_primary_key" field in the table you're updating, and populate it first.
There is nothing you can do to match up the inserted and deleted pseudo-table record keys -- even if you store their data in a log table somewhere.
I guess alternatively, you could create a separate log table that tracked changes to primary keys (old and new). This might be more useful than adding a field to the table you're updating as I suggested right at first, as it would allow you to track more than one change for a given record. Just depends on your situation, I guess.
But that said -- before you do anything, please go find a chalk board and write this 100 times:
I know that updating a primary key is bad.
I know that updating a primary key is bad.
I know that updating a primary key is bad.
I know that updating a primary key is bad.
I know that updating a primary key is bad.
...
:-) (just kidding)
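For what it's worth, the separate log table idea might look something like the sketch below. It uses Python's sqlite3, and that changes the picture: SQLite triggers fire once per row and expose OLD and NEW for that row, so the matching problem discussed in this thread does not arise there. The sketch therefore only illustrates the log table itself, with made-up table and trigger names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (code TEXT PRIMARY KEY, description TEXT);

    -- One row per key change, so a record can be renamed more than once.
    CREATE TABLE product_key_log (
        old_code   TEXT NOT NULL,
        new_code   TEXT NOT NULL,
        changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    CREATE TRIGGER trg_product_key_change
    AFTER UPDATE OF code ON product
    FOR EACH ROW
    WHEN OLD.code <> NEW.code
    BEGIN
        INSERT INTO product_key_log (old_code, new_code) VALUES (OLD.code, NEW.code);
    END;

    INSERT INTO product VALUES ('P23A1', 'widget'), ('P24B7', 'gadget');
""")

# A multi-row primary key update: replace the leading 'P' with 'CAT'.
conn.execute("UPDATE product SET code = 'CAT' || substr(code, 2)")

log = conn.execute(
    "SELECT old_code, new_code FROM product_key_log ORDER BY old_code"
).fetchall()
print(log)
```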
I have a table with one column source_id whose value should be the primary key of another table, though which table it is will vary from record to record. Every record must have a value for source_table that specifies the table for the source record, and a value for source_id that specifies the row in the source table.
Is there any way to accomplish this to take advantage of the DB's foreign key constraints and validation? Or will I have to move my validation logic into the application layer? Alternately, is there another design that will just let me avoid this problem?
Foreign key constraints can only reference one target table. "Conditional" foreign keys which reference a different target table based on some other field are not available in SQL. As @OMG Ponies noted in a comment below, you can have more than one foreign key on the same column, referencing more than one table, but that would mean the value of that column will have to exist in all the referenced tables. I guess this is not what you are after.
For a few possible solutions, I suggest checking out @Bill Karwin's answer to this question:
Possible to do a MySQL foreign key to one of two possible tables?
I like the "supertable" approach in general. You may also want to check out this post for another example:
MySQL - Conditional Foreign Key Constraints
I think the previous answers cover the first part of the question well. However, the link recommended by Daniel provides a solution only for the case where the number of referenced "source" tables is reasonably small, and that solution will not scale easily if you decide to increase the number of "source" tables.
To recommend a better strategy, it would be nice to have a few more details on what the task is and whether the "source" tables have anything in common that would allow them to be combined.
In the current structure (as far as I can infer from the question), I would reverse the relationship:
I would create a table (let's call it AllSources) that would work as a repository of all available sources with columns source_id and source_table. Both included in the primary key.
I would create foreign keys from each "source" table referencing AllSources table so that they could have only sources already registered in it.
Then I would create the table you mentioned in your question with foreign key referencing the AllSources table (not separate "source" tables).
Drawback: you will have to manage AllSources and "source" tables together ensuring that if you create a record in AllSources, you also create a corresponding record in proper "source" table, which in reality is not that hard.
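Here is a rough sketch of that layout, using Python's sqlite3 so it can be run as-is. AllSources, source_id and source_table come from the description above; the articles and notes tables, and the CHECK that pins each source table to its own name, are my assumptions about how one might fill in the details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
    -- Registry of every source row, keyed by (table name, id).
    CREATE TABLE AllSources (
        source_table TEXT    NOT NULL,
        source_id    INTEGER NOT NULL,
        PRIMARY KEY (source_table, source_id)
    );

    -- One hypothetical "source" table; its rows must be registered first.
    CREATE TABLE articles (
        id           INTEGER PRIMARY KEY,
        source_table TEXT NOT NULL DEFAULT 'articles' CHECK (source_table = 'articles'),
        title        TEXT,
        FOREIGN KEY (source_table, id) REFERENCES AllSources (source_table, source_id)
    );

    -- The table from the question: it references the registry, not the source tables.
    CREATE TABLE notes (
        note_id      INTEGER PRIMARY KEY,
        source_table TEXT    NOT NULL,
        source_id    INTEGER NOT NULL,
        body         TEXT,
        FOREIGN KEY (source_table, source_id) REFERENCES AllSources (source_table, source_id)
    );

    INSERT INTO AllSources (source_table, source_id) VALUES ('articles', 1);
    INSERT INTO articles (id, title) VALUES (1, 'First article');
    INSERT INTO notes (source_table, source_id, body) VALUES ('articles', 1, 'valid reference');
""")

# A reference to a source that was never registered is rejected by the database.
try:
    conn.execute(
        "INSERT INTO notes (source_table, source_id, body) VALUES ('videos', 7, 'dangling')"
    )
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

note_count = conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
print(dangling_rejected, note_count)
```

Adding another source table means adding one more table with the same pattern; the notes table does not change.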
Our application uses an Oracle 10g database where several primary keys are exposed to the end user: product codes and such. Unfortunately it's too late to do anything about this, as there are tons of reports and custom scripts out there that we do not have control over. We can't redefine the primary keys or mess up the database structure.
Now some customers want to change some of the primary key values. What they initially wanted to call P23A1 should now be called CAT23MOD1 (not a real example, but you get my meaning.)
Is there an easy way to do this? I would prefer a script of some sort, that could be parametrized to fit other tables and keys, but external tools would be acceptable if no other way exists.
The problem is presumably with the foreign keys that reference the PK. You must define the foreign keys as "deferrable initially immediate", as described in this Tom Kyte article: http://www.oracle.com/technology/oramag/oracle/03-nov/o63asktom.html
That lets you ...
Defer the constraints
Modify the parent value
Modify the child values
Commit the change
Simple.
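The four steps can be tried out in miniature with Python's sqlite3. This is not Oracle: SQLite has no SET CONSTRAINTS statement, so in this sketch the foreign key is declared DEFERRABLE INITIALLY DEFERRED instead of being deferred per transaction. The table names are invented, and the product codes are the ones from the question.

```python
import sqlite3

# isolation_level=None gives manual control over BEGIN / COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE product (code TEXT PRIMARY KEY, description TEXT)")
conn.execute("""
    CREATE TABLE order_line (
        line_id      INTEGER PRIMARY KEY,
        product_code TEXT NOT NULL
            REFERENCES product (code) DEFERRABLE INITIALLY DEFERRED
    )
""")
conn.execute("INSERT INTO product VALUES ('P23A1', 'widget')")
conn.execute("INSERT INTO order_line (product_code) VALUES ('P23A1')")

conn.execute("BEGIN")
# Modify the parent value; the child row is briefly orphaned, which is tolerated
# because the constraint is not checked until commit.
conn.execute("UPDATE product SET code = 'CAT23MOD1' WHERE code = 'P23A1'")
# Modify the child values so they point at the new key.
conn.execute(
    "UPDATE order_line SET product_code = 'CAT23MOD1' WHERE product_code = 'P23A1'"
)
# Commit; the foreign key is verified here and passes.
conn.execute("COMMIT")

parents = conn.execute("SELECT code FROM product").fetchall()
children = conn.execute("SELECT product_code FROM order_line").fetchall()
print(parents, children)
```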
Oops. A little googling makes it appear that, inexplicably, Oracle does not implement ON UPDATE CASCADE, only ON DELETE CASCADE. To find workarounds google ORACLE ON UPDATE CASCADE. Here's a link on Creating A Cascade Update Set of Tables in Oracle.
Original answer:
If I understand correctly, you want to change the values of data in primary key columns, not the actual constraint names of the keys themselves.
If this is true, it can most easily be accomplished by redefining ALL the foreign keys that reference the affected primary key constraint as ON UPDATE CASCADE. This means that when you make a change to the primary key value, the engine will automatically update all related values in foreign key tables.
Be aware that if this results in a lot of changes it could be prohibitively expensive in a production system.
If you have to do this on a live system with no DDL changes to the tables involved, then I think your only option is to (for each value of the PK that needs to be changed):
Insert into the parent table a copy of the row with the PK value replaced
For each child table, update the FK value to the new PK value
Delete the parent table row with the old PK value
If you have a list of parent tables and the PK values to be renamed, it shouldn't be too hard to write a procedure that does this - the information in USER_CONSTRAINTS can be used to get the FK-related tables for a given parent table.
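As a sketch of what such a procedure could look like, here is the insert / update / delete sequence written as a parametrized Python function over sqlite3 instead of Oracle PL/SQL. The function name and tables are hypothetical, the child tables are passed in explicitly rather than looked up in USER_CONSTRAINTS, and table and column names are assumed to come from trusted code because they are spliced into the SQL text.

```python
import sqlite3

def rename_key(conn, parent, key_col, children, old, new):
    """Change a primary key value without altering any table definitions.

    children is a list of (table, fk_column) pairs that reference parent.key_col.
    """
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({parent})")]
    others = [c for c in columns if c != key_col]
    with conn:  # all three steps succeed or fail together
        # 1. Insert a copy of the parent row with the key value replaced.
        col_list = ", ".join([key_col] + others)
        select_list = ", ".join(["?"] + others)
        conn.execute(
            f"INSERT INTO {parent} ({col_list}) "
            f"SELECT {select_list} FROM {parent} WHERE {key_col} = ?",
            (new, old),
        )
        # 2. Point every child row at the new key.
        for table, fk_col in children:
            conn.execute(
                f"UPDATE {table} SET {fk_col} = ? WHERE {fk_col} = ?", (new, old)
            )
        # 3. Remove the parent row that still carries the old key.
        conn.execute(f"DELETE FROM {parent} WHERE {key_col} = ?", (old,))

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE product (code TEXT PRIMARY KEY, description TEXT);
    CREATE TABLE order_line (
        line_id      INTEGER PRIMARY KEY,
        product_code TEXT NOT NULL REFERENCES product (code)
    );
    INSERT INTO product VALUES ('P23A1', 'widget');
    INSERT INTO order_line (product_code) VALUES ('P23A1');
""")

rename_key(conn, "product", "code", [("order_line", "product_code")], "P23A1", "CAT23MOD1")

products = conn.execute("SELECT code, description FROM product").fetchall()
lines = conn.execute("SELECT product_code FROM order_line").fetchall()
print(products, lines)
```

The ordering matters: the new parent row exists before any child is repointed, and the old one is deleted only after nothing references it, so ordinary (non-deferred) foreign keys are satisfied at every step.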