Understanding the role of foreign key constraints. Am I using them properly?

I need help understanding how foreign keys apply when setting up constraints. I understand that the role of foreign keys is to prevent orphaned data, but I have found myself wanting to put the foreign key in the child, which seems to break a pattern. I am not sure I am doing this right, and would like some advice on whether my constraints are set up correctly.
Here is the design I have:
(1) I want all my "product"s to have a type of unit associated with the quantity. Units being like "Each", "Foot", "Gallon", etc., so between the quantity and the unit, you would have something like:
Quantity Unit
5 Gallons
I do not want to allow a bunch of crazy units, so I set this constraint up. This is pretty much by the book.
(2) I also believe that not all products will have an "Image", so I put the foreign key in the "ProductImage" table so that "Product" would not have a column that is empty for many rows, because I am also trying to "Normalize" the design.
The same issue applies to "FeeTypes" because not all "Product"s will have fees.
I feel guilt about breaking the pattern of putting the foreign key constraint in the child and not the parent. I just cannot wrap my head around "FeeType" being a parent. This conflict in logic is where I have the question.
Is my design correct, from a design perspective?
Am I still constraining the data properly?
Is there another "role" besides preventing orphaned data?
Thanks in advance.

There are three cases here (from the Product table's point of view):
1. Many-to-one relationship, e.g. many products having the same unit type (one unit type per product). In this case the foreign key must be in the Product table, referencing the primary key UnitType.UnitTypeID.
2. One-to-many relationship, e.g. one product can have multiple images (one image can belong to only one product). In this case the foreign key must be in the ProductImages table, referencing Product.ProductID.
3. Many-to-many relationship, e.g. any product can have many categories (any category might describe many products). In this case you will need a connection table that contains ProductID/CategoryID pairs, with the columns being foreign keys referencing Product.ProductID and Category.CategoryID respectively.
So, the design of UnitType (case 1.) and ProductImage (case 2.) tables is OK, but FeeType should probably be case 1. and Category should be case 3.
BTW, it is perfectly OK to have NULL in a foreign key column; it does not break the rules of normalization. So, for example, if some products do not have fees associated, you can have NULL in the Product.FeeTypeID column. But you will need to use an outer join in your queries to ensure that products with no fees are not excluded from the results.
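The three cases and the nullable foreign key can be sketched roughly as follows. This is only an illustration run against an in-memory SQLite database through Python's sqlite3 module; the column names other than the ones mentioned in the answer (Name, Url, ProductImageID) and the sample rows are assumptions, as is the ProductCategory table name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked

conn.executescript("""
CREATE TABLE UnitType (UnitTypeID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE FeeType  (FeeTypeID  INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Category (CategoryID INTEGER PRIMARY KEY, Name TEXT NOT NULL);

CREATE TABLE Product (
    ProductID  INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL,
    -- Case 1 (many-to-one): the FK lives in Product.
    UnitTypeID INTEGER NOT NULL REFERENCES UnitType(UnitTypeID),
    -- Nullable FK: a product may have no fee at all.
    FeeTypeID  INTEGER REFERENCES FeeType(FeeTypeID)
);

-- Case 2 (one-to-many): the FK lives in ProductImage.
CREATE TABLE ProductImage (
    ProductImageID INTEGER PRIMARY KEY,
    ProductID      INTEGER NOT NULL REFERENCES Product(ProductID),
    Url            TEXT NOT NULL
);

-- Case 3 (many-to-many): a connection table of ProductID/CategoryID pairs.
CREATE TABLE ProductCategory (
    ProductID  INTEGER NOT NULL REFERENCES Product(ProductID),
    CategoryID INTEGER NOT NULL REFERENCES Category(CategoryID),
    PRIMARY KEY (ProductID, CategoryID)
);

INSERT INTO UnitType VALUES (1, 'Gallon'), (2, 'Each');
INSERT INTO FeeType  VALUES (1, 'Disposal fee');
INSERT INTO Product  VALUES (1, 'Paint', 1, 1), (2, 'Brush', 2, NULL);
""")

# An inner join silently drops the product that has no fee ...
inner_rows = conn.execute("""
    SELECT p.Name FROM Product p
    JOIN FeeType f ON f.FeeTypeID = p.FeeTypeID
    ORDER BY p.ProductID
""").fetchall()

# ... while an outer join keeps it, with NULL for the fee name.
outer_rows = conn.execute("""
    SELECT p.Name, f.Name FROM Product p
    LEFT JOIN FeeType f ON f.FeeTypeID = p.FeeTypeID
    ORDER BY p.ProductID
""").fetchall()

# A product pointing at a unit type that does not exist is rejected.
try:
    conn.execute("INSERT INTO Product VALUES (3, 'Rope', 99, NULL)")
    bad_unit_rejected = False
except sqlite3.IntegrityError:
    bad_unit_rejected = True
```

Here inner_rows contains only the product with a fee, while outer_rows lists both products, which is the difference the outer-join remark is about.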


Is it fine to have multiple foreign keys from one table to single primary key in another table?

Say I have two tables
Products table:
PrebuiltSystems table:
The values for the Processor, Motherboard etc. are all existing ProductIDs. I am now creating foreign key relationships from each of the part name columns to the one ProductID, and a bunch of navigation properties and relationship lines are being created for each part. Is this ok?
Or is there some kind of relationship merger/rule that I can use to say all those columns are foreign keys to ProductID without creating one-to-one relationships?
"Is this ok?"
Yes, this is the right way to do it.
"Or is there some kind of relationship merger/rule that I can use to say all those columns are foreign keys to productID"
No, they are all different relationships.
"without creating one to one relationships?"
Note that these are many-to-one, not one-to-one, relationships: many PrebuiltSystems will have the same Processor.
Yes, it is fine to have multiple foreign keys. However, in your example, I'd still do it differently; if you have one column for each component type you limit yourself very much. What do you do with a multi-CPU system? What about multiple hard disks, etc.?
You should rather normalize the PrebuiltSystems table so that you have a link table which creates an n-to-n relationship to the products (i.e. each product can be part of any number of prebuilt systems, and each prebuilt system can have any number of products in it).
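Both approaches discussed in these answers can be sketched side by side. This is a minimal illustration using SQLite through Python's sqlite3 module; the SystemParts table, its Quantity column, the SystemID column and the sample rows are all hypothetical names chosen for the example, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, Name TEXT NOT NULL);

-- Approach 1: one column per part. Each column is its own many-to-one
-- foreign key to the same Products.ProductID.
CREATE TABLE PrebuiltSystems (
    SystemID    INTEGER PRIMARY KEY,
    Processor   INTEGER NOT NULL REFERENCES Products(ProductID),
    Motherboard INTEGER NOT NULL REFERENCES Products(ProductID)
);

-- Approach 2: a link table (hypothetical name), so a system can contain
-- any number of products, including several of the same kind.
CREATE TABLE SystemParts (
    SystemID  INTEGER NOT NULL REFERENCES PrebuiltSystems(SystemID),
    ProductID INTEGER NOT NULL REFERENCES Products(ProductID),
    Quantity  INTEGER NOT NULL DEFAULT 1,
    PRIMARY KEY (SystemID, ProductID)
);

INSERT INTO Products VALUES (1, 'CPU X'), (2, 'Board Y'), (3, 'Disk Z');
INSERT INTO PrebuiltSystems VALUES (10, 1, 2), (11, 1, 2);
INSERT INTO SystemParts VALUES (10, 1, 2), (10, 3, 4);
""")

# Many-to-one: several systems share the same processor.
systems_with_cpu = conn.execute(
    "SELECT COUNT(*) FROM PrebuiltSystems WHERE Processor = 1"
).fetchone()[0]

# Link table: system 10 has two CPUs and four disks.
parts_of_10 = conn.execute("""
    SELECT p.Name, sp.Quantity
    FROM SystemParts sp
    JOIN Products p ON p.ProductID = sp.ProductID
    WHERE sp.SystemID = 10
    ORDER BY p.ProductID
""").fetchall()
```

The link table is what makes the multi-CPU and multi-disk cases expressible without adding more columns to PrebuiltSystems.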

Implementing many-to-many with one "primary" value

I have many products that can each be in many categories.
products: id, ...
products_categories: product_id, category_id
categories: id, ...
Now I want to have many products, each with one master category, and 0 or more secondary categories. I can think of two ways to model this in SQL.
1. Add an is_primary column to products_categories
2. Add a primary_category_id column to products
What is the best way to implement this in pure SQL and/or ActiveRecord? I'm using PostgreSQL, for what it's worth.
I would go with the first option unless I had a good reason for choosing the second (like the cost of an extra join when getting the primary category).
Reason: you probably need to add the primary category to the products_categories table anyway (in order to use it in a uniform and simple way in queries like getting all categories for a product).
Option 1 avoids duplicating the primary category and is thus simpler.
I would go with option (1). The reason is that, since your products can belong to more than one category, the relationship attribute (that it's a 'primary' category) belongs in the table that defines the relationship.
I would even go further and suggest that instead of labeling the field 'is_primary', you should have the field labeled as 'association_type'. And instead of just adding a bit field, make it an integer field, and have all the association types defined. In your case today, there are only two association types - secondary and primary. The advantage is that this design is much more scalable. If tomorrow, you are asked to define a 'primary', a 'secondary' and all other tertiary categories, this design will be able to handle it, instead of having to add another field to designate the 'secondary' field.
It really depends on the exact details of what you're trying to accomplish. Here are some of the things to consider while deciding what's best for you. Other answers already tackled the first case, so I'm going to focus on the second one.
If you have primary_category_id:
It seems cleaner to have one field in product that tells which category is the primary one, than to have a field in every product_category which has 1 in one row and 0 in every other row, although the suggestion by M.R. to use association_type sounds clean too - but what's the chance you're going to have "tertiary" categories?
It's slightly easier to get to the primary category
It's easy to ensure every product always has a primary category (just make the field NOT NULL)
It automatically enforces that a product may only have one primary category
Should you also insert the primary category to products_categories?
Neither option is enforced.
If you don't, it's awkward to query all the categories
If you do, it's still easy to query, but without additional work, nothing guarantees the primary category is also inserted in the other table
If you use the is_primary method, you should somehow ensure that every product always has exactly one primary category.
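One way to get part of that guarantee for the is_primary method is a partial unique index, which PostgreSQL supports. The sketch below uses SQLite through Python's sqlite3 module purely so that it is runnable; the index name and sample data are assumptions. Note that it enforces "at most one" primary category per product, not "exactly one".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE products   (id INTEGER PRIMARY KEY);
CREATE TABLE categories (id INTEGER PRIMARY KEY);

CREATE TABLE products_categories (
    product_id  INTEGER NOT NULL REFERENCES products(id),
    category_id INTEGER NOT NULL REFERENCES categories(id),
    is_primary  INTEGER NOT NULL DEFAULT 0 CHECK (is_primary IN (0, 1)),
    PRIMARY KEY (product_id, category_id)
);

-- Partial unique index: among the rows flagged as primary, product_id
-- must be unique, so a product can have at most one primary category.
CREATE UNIQUE INDEX one_primary_per_product
    ON products_categories (product_id) WHERE is_primary = 1;

INSERT INTO products VALUES (1);
INSERT INTO categories VALUES (10), (20);
INSERT INTO products_categories VALUES (1, 10, 1), (1, 20, 0);
""")

# Flagging a second category as primary for the same product fails.
try:
    conn.execute(
        "UPDATE products_categories SET is_primary = 1 WHERE category_id = 20"
    )
    second_primary_rejected = False
except sqlite3.IntegrityError:
    second_primary_rejected = True

primary_count = conn.execute(
    "SELECT COUNT(*) FROM products_categories "
    "WHERE product_id = 1 AND is_primary = 1"
).fetchone()[0]
```

Nothing here stops a product from having no primary category at all; that would still need application logic or a trigger.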
What are each way's pros and cons?
Option 1. I can be sure that the primary category for a product is indeed one of its categories. But there may be a problem of ensuring that a product has no more than one primary category.
Option 2. This lets me make sure that a product has only one primary category. But then I don't seem to have a way to make sure that it's one of this same product's categories.
So, I would probably go for a third option, using a table Products_PrimaryCategories:
Products_PrimaryCategories: product_id, category_id
It seems the same as product_categories, but has some additional properties:
product_id has an associated unique index, making sure you can only have one primary category for each product;
(product_id, category_id) is a foreign key referencing products_categories (product_id, category_id) ensuring that a product's primary category is one of its categories (which implies that (product_id, category_id) should be products_categories's primary key).
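A rough sketch of this third option, again using SQLite through Python's sqlite3 module as a stand-in for whichever database is actually in use. The table and column names follow the answer (lower-cased); the sample rows and the rejected helper are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE products   (id INTEGER PRIMARY KEY);
CREATE TABLE categories (id INTEGER PRIMARY KEY);

CREATE TABLE products_categories (
    product_id  INTEGER NOT NULL REFERENCES products(id),
    category_id INTEGER NOT NULL REFERENCES categories(id),
    PRIMARY KEY (product_id, category_id)
);

CREATE TABLE products_primary_categories (
    product_id  INTEGER NOT NULL UNIQUE,   -- one primary category per product
    category_id INTEGER NOT NULL,
    -- The pair must already exist as one of the product's categories.
    FOREIGN KEY (product_id, category_id)
        REFERENCES products_categories (product_id, category_id)
);

INSERT INTO products VALUES (1), (2);
INSERT INTO categories VALUES (10), (20);
INSERT INTO products_categories VALUES (1, 10), (1, 20), (2, 20);
INSERT INTO products_primary_categories VALUES (1, 10);
""")


def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Product 2 is not in category 10, so the composite foreign key fails.
foreign_category_rejected = rejected(
    "INSERT INTO products_primary_categories VALUES (2, 10)"
)
# Product 1 already has a primary category, so the unique index fails.
second_primary_rejected = rejected(
    "INSERT INTO products_primary_categories VALUES (1, 20)"
)
# A valid primary category for product 2 is accepted.
valid_rejected = rejected(
    "INSERT INTO products_primary_categories VALUES (2, 20)"
)
```

As with the other options, this design does not by itself force every product to have a primary category; it only constrains the rows that exist.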

Do link tables need a meaningless primary key field?

I am working on a couple of link tables and I got to thinking (Danger Will Robinson, Danger) about what the possible structures of a link table are and what their pros and cons are.
I came up with a few possible structures for the link table:
Traditional 3 column model
id - auto-numbered PRIMARY
table1fk - foreign key
table2fk - foreign key
It's a classic, in most of the books, 'nuff said.
Indexed 3 column model
id - auto-numbered PRIMARY
table1fk - foreign key INDEX ('table1fk')
table2fk - foreign key INDEX ('table2fk')
In my own experience, the fields that you are querying against are not indexed in the traditional model. I have found that indexing the foreign key fields does improve performance as would be expected. Not a major change but a nice optimizing tweak.
Composite key 2 columns ADD PRIMARY KEY ('table1fk' , 'table2fk')
table1fk - foreign key
table2fk - foreign key
With this I use a composite key so that a record from table1 can only be linked to a record on table2 once. Because the key is composite I can add records (1,1), (1,2), (2,2) without any duplication errors.
Any potential problems with the composite key 2 columns option? Is there an indexing issue that this might cause? A performance hit? Anything that would disqualify this as a possible option?
I would use composite key, and no extra meaningless key.
I would not use an ORM system that enforces such rules on my db structure.
For true link tables, they typically do not exist as object entities in my object models. Thus the surrogate key is not ever used. The removal of an item from a collection results in a removal of an item from a link relationship where both foreign keys are known (Person.Siblings.Remove(Sibling) or Person.RemoveSibling(Sibling), which is appropriately translated at the data access layer as usp_Person_RemoveSibling(PersonID, SiblingID)).
As Mike mentioned, if it does become an actual entity in your object model, then it may merit an ID. However, even with addition of temporal factors like effective start and end dates of the relationship and things like that, it's not always clear. For instance, the collection may have an effective date associated at the aggregate level, so the relationship itself may still not become an entity with any exposed properties.
I'd like to add that you might very well need the table indexed both ways on the two foreign key columns.
If this is a true many-to-many join table, then dump the unnecessary id column (unless your ORM requires one; in that case you've got to decide whether your intellect is going to trump your practicality).
But I find that true join tables are pretty rare. It usually isn't long before I start wanting to put some other data in that table. Because of that I almost always model these join tables as entities from the beginning and stick an id in there.
Having a single-column PK can help out a lot in a disaster recovery situation. So while it is correct in theory that you only need the two foreign keys, in practice, when the shit hits the fan, you may want the single-column key. I have never been in a situation where I was screwed because I had a single-column identifier, but I have been in ones where I was screwed because I didn't.
Composite PK and turn off clustering.
I have used a composite key to prevent duplicate entries and let the database handle the exception. With a single key, you are relying on the front-end application to check the database for duplicates before adding a new record.
There is something called identifying and non-identifying relationship. With identifying relationships the FK is a part of the PK in the many-to-many table. For example, say we have tables Person, Company and a many-to-many table Employment. In an identifying relationship both fk PersonID and CompanyID are part of the pk, so we can not repeat PersonID, CompanyID combination.
TABLE Employment(PersonID int (PK,FK), CompanyID int (PK,FK))
Now, suppose we want to capture history of employment, so a person can leave a company, work somewhere else and return to the same company later. The relationship is non-identifying here, combination of PersonID, CompanyID can now repeat, so the table would look something like:
TABLE Employment(EmploymentID int (PK), PersonID int (FK), CompanyID int (FK),
FromDate datetime, ToDate datetime)
If you are using an ORM to get to/alter the data, some of them require a single-column primary key (Thank you Tom H for pointing this out) in order to function correctly (I believe Subsonic 2.x was this way, not sure about 3.x).
In my mind, having the primary key doesn't impact performance to any measurable degree, so I usually use it.
If you need to traverse the join table 'in both directions', that is starting with a table1fk or a table2fk key only, you might consider adding a second, reversed, composite index.
ADD KEY ('table2fk', 'table1fk')
The correct answer is:
Primary key is ('table1fk' , 'table2fk')
Another index on ('table2fk' , 'table1fk')
You don't need an index on table1fk or table2fk alone: the optimiser will use the PK
You'll most likely use the table "both" ways
Adding a surrogate key is only needed because of braindead ORMs
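The composite-key layout with a reversed second index might look like the sketch below. It is written against SQLite via Python's sqlite3 module only so that it can be run as-is; the parent tables table1 and table2, the table name link and the index name link_reverse are assumptions. It does not try to show which index the optimiser picks, only that duplicates are rejected and that lookups work from either side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY);
CREATE TABLE table2 (id INTEGER PRIMARY KEY);

CREATE TABLE link (
    table1fk INTEGER NOT NULL REFERENCES table1(id),
    table2fk INTEGER NOT NULL REFERENCES table2(id),
    PRIMARY KEY (table1fk, table2fk)   -- no surrogate id column
);

-- Reversed composite index for lookups that start from table2fk.
CREATE INDEX link_reverse ON link (table2fk, table1fk);

INSERT INTO table1 VALUES (1), (2);
INSERT INTO table2 VALUES (1), (2);
INSERT INTO link VALUES (1, 1), (1, 2), (2, 2);
""")

# The same pair cannot be linked twice; the database raises the error.
try:
    conn.execute("INSERT INTO link VALUES (1, 2)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Traversing the link table in both directions.
from_table1 = [r[0] for r in conn.execute(
    "SELECT table2fk FROM link WHERE table1fk = 1 ORDER BY table2fk")]
from_table2 = [r[0] for r in conn.execute(
    "SELECT table1fk FROM link WHERE table2fk = 2 ORDER BY table1fk")]
```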
I've used both. The only benefit of using the first model (with uid) is that you can transport the identifier around as a number, whereas in some cases you would have to do some string concatenation with the composite key to transport it around.
I agree that not indexing the foreign keys is a bad idea whichever way you go.
I (almost) always use the additional single-column primary key. This generally makes it easier to build user interfaces, because when a user selects that particular linking entity I can identify with a single integer value rather than having to create and then parse compound identifiers.

Multiple foreign keys to a single column

I'm defining a database for a customer/ order system where there are two highly distinct types of customers. Because they are so different having a single customer table would be very ugly (it'd be full of null columns as they are pointless for one type).
Their orders though are in the same format. Is it possible to have a CustomerId column in my Order table which has a foreign key to both the Customer Types? I have set it up in SQL server and it's given me no problems creating the relationships, but I'm yet to try inserting any data.
Also, I'm planning on using nHibernate as the ORM, could there be any problems introduced by doing the relationships like this?
No, you can't have a single field as a foreign key to two different tables. How would you tell where to look for the key?
You would at least need a field that tells what kind of user it is, or two separate foreign keys.
You could also put the information that is common for all users in one table and have separate tables for the information that is specific for the user types, so that you have a single table with user id as primary key.
A foreign key can only reference a single primary key, so no. However, you could use a bridge table:
CustomerA <---- CustomerA_Orders ----> Order
CustomerB <---- CustomerB_Orders ----> Order
So Order doesn't even have a foreign key; whether this is desirable, though...
I inherited a SQL Server database where this was done (a single column used in four foreign key relationships with four unrelated tables), so yes, it's possible. My predecessor is gone, though, so I can't ask why he thought it was a good idea.
He used a GUID column ("uniqueidentifier" type) to avoid the ambiguity problem, and he turned off constraint checking on the foreign keys, since it's guaranteed that only one will match. But I can think of lots of reasons that you shouldn't, and I haven't thought of any reasons you should.
Yours does sound like the classical "specialization" problem, typically solved by creating a parent table with the shared customer data, then two child tables that contain the data unique to each class of customer. Your foreign key would then be against the parent customer table, and your determination of which type of customer would be based on which child table had a matching entry.
You can create a foreign key referencing multiple tables. This feature exists to allow vertical partitioning of your table while still maintaining referential integrity. In your case, however, this is not applicable.
Your best bet would be to have a CustomerType table with possible columns CustomerTypeID and CustomerID, where CustomerID is the PK, and then reference CustomerID from your Order table.
I know this is a very old question; however if other people are finding this question through the googles, and you don't mind adding some columns to your table, a technique I've used (using the original question as a hypothetical problem to solve) is:
Add a [CustomerType] column. The purpose of storing a value here is to indicate which table holds the PK for your (assumed) [CustomerId] FK column. Optional - addition of a check constraint (to ensure CustomerType is in CustomerA or CustomerB) will help you sleep better at night.
Add a computed column for each [CustomerType], eg:
[CustomerTypeAId] as case when [CustomerType] = 'CustomerA' then [CustomerId] end persisted
[CustomerTypeBId] as case when [CustomerType] = 'CustomerB' then [CustomerId] end persisted
Add your foreign keys to the calculated (and persisted) columns.
Caveat: I'm primarily in a MSSQL environment; so I don't know how well this translates to other DBMS (ie: Postgres, ORACLE, etc).
As noted, if the key is, say, 12345, how would you know which table to look it up in? You could, I suppose, do something to ensure that the key values for the two tables never overlapped, but this is too ugly and painful to contemplate. You could have a second field that says which customer type it is. But if you're going to have two fields, why not have one field for customer type 1 id and another for customer type 2 id?
Without knowing more about your app, my first thought is that you really should have a general customer table with the data that is common to both, and then have two additional tables with the data specific to each customer type. I would think that there must be a lot of data common to the two -- basic stuff like name and address and customer number at the least -- and repeating columns across tables sucks big time. The additional tables could then refer back to the base table. As there is then a single key for the base table, the issue of foreign keys having to know which table to refer to evaporates.
Two distinct types of customer is a classic case of types and subtypes or, if you prefer, classes and subclasses. Here is an answer from another question.
Essentially, the class-table-inheritance technique is like Arnand's answer. The use of the shared-primary-key technique is what allows you to get around the problems created by two types of foreign key in one column. The foreign key will be customer-id. That will identify one row in the customer table, and also one row in the appropriate kind of customer type table, as the case may be.
Create a "customer" table that includes all the columns holding the same data for both types of customer.
Then create the tables "customer_a" and "customer_b".
Use "customer_id" from the "customer" table as a foreign key in "customer_a" and "customer_b".
       customer
       |      |
customer_a  customer_b
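The parent/child layout that several of these answers describe, with a shared primary key, can be sketched like this. It is only an illustration on SQLite via Python's sqlite3 module; the orders table name, the columns name, a_only and b_only, and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
-- Columns common to both kinds of customer.
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);

-- Each subtype table reuses customer_id as both its PK and an FK.
CREATE TABLE customer_a (
    customer_id INTEGER PRIMARY KEY REFERENCES customer(customer_id),
    a_only      TEXT
);
CREATE TABLE customer_b (
    customer_id INTEGER PRIMARY KEY REFERENCES customer(customer_id),
    b_only      TEXT
);

-- Orders need a single foreign key, to the parent table.
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);

INSERT INTO customer   VALUES (1, 'Acme'), (2, 'Bob');
INSERT INTO customer_a VALUES (1, 'wholesale terms');
INSERT INTO customer_b VALUES (2, 'loyalty card');
INSERT INTO orders     VALUES (100, 1), (101, 2);
""")

# The customer type is derived from which child table has a matching row.
order_rows = conn.execute("""
    SELECT o.order_id, c.name,
           CASE WHEN a.customer_id IS NOT NULL THEN 'A'
                WHEN b.customer_id IS NOT NULL THEN 'B' END AS kind
    FROM orders o
    JOIN customer c        ON c.customer_id = o.customer_id
    LEFT JOIN customer_a a ON a.customer_id = c.customer_id
    LEFT JOIN customer_b b ON b.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

# An order for a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (102, 999)")
    unknown_customer_rejected = False
except sqlite3.IntegrityError:
    unknown_customer_rejected = True
```

Nothing in this sketch prevents the same customer_id from appearing in both child tables; if that matters, it would need an extra constraint or application check.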

Owner ID type database fields

Suppose you have these tables: RestaurantChains, Restaurants, MenuItems - with the obvious relations between them. Now, you have tables Comments and Ratings, which store the customer comments/ratings about chains, restaurants and menu items. What would be the best way to link these tables? The obvious solutions could be:
Use columns OwnerType and OwnerID in the tables Comments and Ratings, but now I can't add foreign keys to link comments/ratings with the objects they are meant for.
Create separate tables of Comments and Ratings for each table, e.g. MenuItemRatings, MenuItemComments etc. This solution has the advantage that all the correct foreign keys are present and has the obvious disadvantage of having lots and lots of tables with basically the same structure.
So, which solution works better? Or is there even a better solution that I don't know about?
Since comments about a menu item are different from comments about a restaurant (even if they happen to share the same structure) I would put them in separate tables and have the appropriate FKs to enforce some data integrity in your database.
I don't know why there is an aversion to having more tables in your database. Unless you're going from 50 tables to 50,000 tables you're not going to see a performance problem due to large catalog tables (and having more, smaller tables in this case should actually give you better performance). I would also tend to think that it would be a lot clearer to understand when dealing with tables called "Menu_Item_Comments" and "Restaurant_Comments" than it would to deal with a table called "Comments" and not knowing what exactly is really in it just by the name of it.
How about this? (image: http://www.freeimagehosting.net/uploads/8241ff5c76.png)
Have a single Comments/Rating table for all the objects and don't use automatically generated foreign keys. The key in the ratings table, e.g. RatingID, can be placed in a field in the Restaurant, Chain and MenuItems tables, and they can all point to the same table; they are still foreign keys.
If you need to know in reverse what object the review relates to you would need to have a field specifying the type of review it was, but that should be all.
Use a single table for comments and use GUIDs as primary keys for your entities.
Then you can select comments without even knowing beforehand where they belong:
SELECT c.CommentText
FROM Comments c
JOIN Restaurants r ON c.Source = r.Id;

SELECT c.CommentText
FROM Comments c
JOIN Chains ch ON c.Source = ch.Id;
You can't use foreign keys for comments, of course, but it's not that comments cannot live without foreign keys.
You may clean orphaned comments in triggers but there's nothing bad if some of them are left.
You may also create a global Entity table (with a single GUID column), make your Chains, Restaurants, MenuItems and Comments refer to that table with a FOREIGN KEY ON DELETE CASCADE, and when DELETE'ing, say, a restaurant, delete it from that table instead. It will delete both the restaurant and all comments on it, and you still have your integrity.
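A small sketch of the global Entity table idea, using SQLite through Python's sqlite3 module with GUIDs stored as text. Only Restaurants and Comments are shown; the Name column and the sample data are assumptions, and the other entity tables would follow the same pattern.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
-- One row per entity of any kind, identified by a GUID.
CREATE TABLE Entity (Id TEXT PRIMARY KEY);

CREATE TABLE Restaurants (
    Id   TEXT PRIMARY KEY REFERENCES Entity(Id) ON DELETE CASCADE,
    Name TEXT NOT NULL
);

-- Source points at whatever entity the comment is about.
CREATE TABLE Comments (
    Id          TEXT PRIMARY KEY,
    Source      TEXT NOT NULL REFERENCES Entity(Id) ON DELETE CASCADE,
    CommentText TEXT NOT NULL
);
""")

restaurant_id = str(uuid.uuid4())
conn.execute("INSERT INTO Entity VALUES (?)", (restaurant_id,))
conn.execute("INSERT INTO Restaurants VALUES (?, ?)", (restaurant_id, "Luigi's"))
conn.execute(
    "INSERT INTO Comments VALUES (?, ?, ?)",
    (str(uuid.uuid4()), restaurant_id, "Great pasta"),
)

# Deleting the Entity row cascades to the restaurant and to its comments.
conn.execute("DELETE FROM Entity WHERE Id = ?", (restaurant_id,))

restaurants_left = conn.execute("SELECT COUNT(*) FROM Restaurants").fetchone()[0]
comments_left = conn.execute("SELECT COUNT(*) FROM Comments").fetchone()[0]
```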
If you want to take advantage of foreign key constraint and normalize the attributes of comments (and ratings) across base tables, you may need to create relationship tables between base tables and comments (and ratings).
e.g. for Restaurants and Comments:
Restaurants
  id (PK)
  (attributes of restaurants...)
Relationship table (links Restaurants to Comments)
  id (PK)
  restaurantid (FK to Restaurants)
  commentid (FK to Comments)
Comments
  id (PK)
  (attributes of comments...)
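As a rough illustration of that layout, the sketch below builds the three tables in SQLite through Python's sqlite3 module. The relationship table name RestaurantComments and the attribute columns name and body are hypothetical; the answer does not name them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Restaurants (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Comments    (id INTEGER PRIMARY KEY, body TEXT NOT NULL);

-- Relationship table: both columns are real, enforceable foreign keys.
CREATE TABLE RestaurantComments (
    id           INTEGER PRIMARY KEY,
    restaurantid INTEGER NOT NULL REFERENCES Restaurants(id),
    commentid    INTEGER NOT NULL REFERENCES Comments(id)
);

INSERT INTO Restaurants VALUES (1, 'Luigi''s');
INSERT INTO Comments VALUES (1, 'Great pasta'), (2, 'Slow service');
INSERT INTO RestaurantComments (restaurantid, commentid) VALUES (1, 1), (1, 2);
""")

comments_for_restaurant = [r[0] for r in conn.execute("""
    SELECT c.body
    FROM RestaurantComments rc
    JOIN Comments c ON c.id = rc.commentid
    WHERE rc.restaurantid = 1
    ORDER BY c.id
""")]

# Linking to a comment that does not exist is rejected by the foreign key.
try:
    conn.execute(
        "INSERT INTO RestaurantComments (restaurantid, commentid) VALUES (1, 99)"
    )
    dangling_link_rejected = False
except sqlite3.IntegrityError:
    dangling_link_rejected = True
```

Chains and menu items would each get their own relationship table of the same shape, while the attributes of a comment live in one place.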