Are unique foreign keys across multiple tables possible via normalization and without NULL columns? - sql

This is a relational database design question, not specific to any RDBMS. A simplified case:
I have two tables, Cars and Trucks. They both have a column, say RegistrationNumber, that must be unique across the two tables.
This could probably be enforced with some insert/update triggers, but I'm looking for a more "clean" solution if possible.
One way to achieve this could be to add a third table, Vehicles, that holds the RegistrationNumber column, then add two additional columns to Vehicles, CarID and TruckID. But then for each row in Vehicles, one of the columns CarID or TruckID would always be NULL, because a RegistrationNumber applies to either a Car or a Truck leaving the other column with a NULL value.
Is there any way to enforce a unique RegistrationNumber value across multiple tables, without introducing NULL columns or relying on triggers?

This is a bit complicated. Having the third table, Vehicles, is definitely part of the solution. The second part is guaranteeing that a vehicle is either a car or a truck, but not both.
One method is the "list-all-the-possibilities" method. This is what you propose with two columns. In addition, this should have a constraint to verify that only one of the ids is filled in. A similar approach is to have the CarId and TruckId actually be the VehicleId. This reduces the number of different ids floating around.
Another approach uses composite keys. The idea is:
create table Vehicles (
    VehicleId int primary key,
    Registration varchar(255) unique,
    Type varchar(255),
    constraint chk_type check (Type in ('car', 'truck')),
    constraint unq_type_Vehicle unique (Type, VehicleId), -- this is redundant, but necessary as the target of the composite foreign key
    . . .
);
create table car (
    VehicleId int,
    Type varchar(255), -- always 'car' in this table
    constraint fk_car_vehicle foreign key (VehicleId) references Vehicles(VehicleId),
    constraint fk_car_vehicle_type foreign key (Type, VehicleId) references Vehicles(Type, VehicleId)
);
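To make the composite-key idea concrete, here is a minimal sketch that runs the schema through SQLite from Python. The choice of SQLite, the Trucks table, the CHECK on each subtype's Type column, the primary keys on the subtype tables, and the sample registration 'ABC-123' are all assumptions added for illustration; they are not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript("""
CREATE TABLE Vehicles (
    VehicleId    INTEGER PRIMARY KEY,
    Registration TEXT NOT NULL UNIQUE,
    Type         TEXT NOT NULL CHECK (Type IN ('car', 'truck')),
    UNIQUE (Type, VehicleId)          -- target of the composite foreign keys
);
CREATE TABLE Cars (
    VehicleId INTEGER PRIMARY KEY,
    Type      TEXT NOT NULL CHECK (Type = 'car'),    -- assumed extra check
    FOREIGN KEY (Type, VehicleId) REFERENCES Vehicles (Type, VehicleId)
);
CREATE TABLE Trucks (
    VehicleId INTEGER PRIMARY KEY,
    Type      TEXT NOT NULL CHECK (Type = 'truck'),  -- assumed extra check
    FOREIGN KEY (Type, VehicleId) REFERENCES Vehicles (Type, VehicleId)
);
""")


def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


conn.execute("INSERT INTO Vehicles VALUES (1, 'ABC-123', 'car')")
conn.execute("INSERT INTO Cars VALUES (1, 'car')")

# Vehicle 1 is registered as a car, so there is no ('truck', 1) row to reference.
truck_rejected = rejected("INSERT INTO Trucks VALUES (1, 'truck')")
# The registration number is unique across all vehicles, whatever their type.
duplicate_registration_rejected = rejected(
    "INSERT INTO Vehicles VALUES (2, 'ABC-123', 'truck')"
)
```

Because each subtype row must match a (Type, VehicleId) pair in Vehicles, and each subtype table pins Type to a single value, one vehicle cannot appear in both Cars and Trucks.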

See the following tags: class-table-inheritance shared-primary-key
You have already outlined class table inheritance in your question. The tag will just add a few details, and show you some other questions whose answers may help.
Shared primary key is a handy way of enforcing the one-to-one nature of IS-A relationships such as the relationship between a vehicle and a truck. It also allows a foreign key in some other table to reference a vehicle and also a truck or a car, as the case may be.

You can add the third table Vehicles containing a single column RegistrationNumber on which you apply the unique constraint, then on the existing tables - Cars and Trucks - use the RegistrationNumber as a foreign key referencing the Vehicles table. In this way you don't need an extra id, avoid the null problem and enforce the uniqueness of the registration number.
Update - this solution doesn't prevent a car and a truck from sharing the same registration number. To enforce this constraint you need to add either triggers or logic beyond plain SQL. Otherwise you may want to take a look at Gordon's solution that involves Composite Foreign Keys.
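The limitation described in the update can be reproduced with a short sketch. It assumes SQLite via Python's sqlite3 module and a made-up registration number; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Vehicles (RegistrationNumber TEXT PRIMARY KEY);
CREATE TABLE Cars (
    RegistrationNumber TEXT PRIMARY KEY REFERENCES Vehicles (RegistrationNumber)
);
CREATE TABLE Trucks (
    RegistrationNumber TEXT PRIMARY KEY REFERENCES Vehicles (RegistrationNumber)
);
""")

conn.execute("INSERT INTO Vehicles VALUES ('ABC-123')")
conn.execute("INSERT INTO Cars VALUES ('ABC-123')")
# Accepted: nothing ties a registration number to exactly one subtype table.
conn.execute("INSERT INTO Trucks VALUES ('ABC-123')")

shared = conn.execute(
    "SELECT COUNT(*) FROM Cars JOIN Trucks USING (RegistrationNumber)"
).fetchone()[0]
```

Here `shared` counts registration numbers that are both a car and a truck, which is exactly the case the plain foreign keys fail to rule out.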

Related

composite primary key with practical approach

I just need to understand the concept behind the composite primary key. I have googled it and understand that it is a combination of more than one column of a table. But my question is: what is the practical use of this key over any data? When should I use this concept? Can you show me any practical usage of this key in Excel or SQL Server?
It may be a weird type of question for any sql expert. I apologize for this kind of idiotic question. If anybody feels it is an idiot question, please forgive me.
A typical use-case for a composite primary key is a junction/association table. Consider orders and products. One order could have many products. One product could be in many orders. The orderProducts table could be defined as:
create table orderProducts (
orderId int not null references orders(orderId),
productId int not null references products(productId),
quantity int,
. . .
);
It makes sense to declare (orderId, productId) as a composite primary key. This would impose the constraint that any given order has any given product only once.
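As a rough illustration of that constraint, the sketch below declares (orderId, productId) as the composite primary key and tries to add the same product to the same order twice. SQLite via Python's sqlite3 module is an assumption (the question asks about SQL Server), and the sample ids and quantities are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE orders   (orderId   INTEGER PRIMARY KEY);
CREATE TABLE products (productId INTEGER PRIMARY KEY);
CREATE TABLE orderProducts (
    orderId   INTEGER NOT NULL REFERENCES orders (orderId),
    productId INTEGER NOT NULL REFERENCES products (productId),
    quantity  INTEGER,
    PRIMARY KEY (orderId, productId)   -- the composite primary key
);
INSERT INTO orders VALUES (1), (2);
INSERT INTO products VALUES (10), (11);
""")

# The same order may hold several products, and the same product
# may appear in several orders.
conn.execute("INSERT INTO orderProducts VALUES (1, 10, 2)")
conn.execute("INSERT INTO orderProducts VALUES (1, 11, 1)")
conn.execute("INSERT INTO orderProducts VALUES (2, 10, 5)")

# The same (order, product) pair a second time violates the key.
try:
    conn.execute("INSERT INTO orderProducts VALUES (1, 10, 3)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM orderProducts").fetchone()[0]
```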
That said, I would normally use a synthetic key (orderProductId) and simply declare the combination as unique.
The benefit of a composite primary key is that it enforces uniqueness (which could also be done with a uniqueness constraint). It also wastes no space that would be needed for an additional key.
There are downsides to composite primary keys as compared to identity keys:
Identity keys keep track of the order of inserts.
Identity keys are typically only 4 bytes.
Foreign key references consist of only one column.
By default, SQL Server clusters on primary keys. This imposes an ordering and can result in fragmentation (although that is doubtful for this example).
Let's say I have a table of cars. It includes the model and make of the cars. I do not want to insert the same exact car into my table, but there are cars that will have the same make and cars that will have the same model (assume both Ford and Toyota make a car called the 'BlergWagon').
I could enforce uniqueness of make/model with a composite key that includes both values. A unique key on just make would not allow me to add more than 1 Toyota and a unique key on just model would not allow me to enter more than 1 BlergWagon.
Another example would be grades, terms, years, students, and classes. I could enforce uniqueness for a student in a class and a specific semester in a specific year so that my table does not have 2 dupe records that show the same class in the same semester in the same year with the same student.
Another part of your post is about the primary key, which I'll assume means you are talking about a clustered index. A clustered index enforces the order of the table. So you could put it on an identity column to order the table and add a unique, nonclustered index to enforce uniqueness on your other columns.
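The alternative mentioned in these answers, a surrogate identity key plus a separate uniqueness constraint on the natural columns, might look like the sketch below. It assumes SQLite via Python's sqlite3 module, where AUTOINCREMENT stands in for a SQL Server identity column; SQLite has no clustered/nonclustered distinction, so only the uniqueness part is shown. The 'Corolla' row is invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cars (
    carId INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    make  TEXT NOT NULL,
    model TEXT NOT NULL,
    UNIQUE (make, model)                      -- natural key stays unique
);
""")

insert = "INSERT INTO cars (make, model) VALUES (?, ?)"
conn.execute(insert, ("Ford", "BlergWagon"))
conn.execute(insert, ("Toyota", "BlergWagon"))  # same model, other make: fine
conn.execute(insert, ("Toyota", "Corolla"))     # same make, other model: fine

try:
    conn.execute(insert, ("Ford", "BlergWagon"))  # exact duplicate
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

car_ids = [row[0] for row in conn.execute("SELECT carId FROM cars ORDER BY carId")]
```

Other tables can then reference cars through the single carId column, while the UNIQUE constraint does the job the composite primary key would otherwise do.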

One Primary Key Value in many tables

This may seem like a simple question, but I am stumped:
I have created a database about cars (in Oracle SQL developer). I have amongst other tables a table called: Manufacturer and a table called Parentcompany.
Since some manufacturers are owned by bigger corporations, I will also show them in my database.
The parentcompany table is the "parent table" and the Manufacturer table the "child table".
For both I have created columns, each table having its own primary key.
For some reason, when I inserted the values for my columns, I was able to use the same value for the primary key of Manufacturer and Parentcompany.
The column: ManufacturerID is primary Key of Manufacturer. The value for this is: 'MBE'
The column: ParentcompanyID is primary key of Parentcompany. The value for this is 'MBE'
Both have the same value. Do I have a problem with the thinking logic?
Or do I just not understand how primary keys work?
Does a primary key only need to be unique in a table, and not the database?
I would appreciate it if someone shed light on the situation.
A primary key is unique for each table.
Have a look at this tutorial: SQL - Primary key
A primary key is a field in a table which uniquely identifies each
row/record in a database table. Primary keys must contain unique
values. A primary key column cannot have NULL values.
A table can have only one primary key, which may consist of single or
multiple fields. When multiple fields are used as a primary key, they
are called a composite key.
If a table has a primary key defined on any field(s), then you cannot
have two records having the same value of that field(s).
A primary key is unique per table. You can use the same PK value in every separate table in the DB. Actually that happens often, as the PK is frequently an incremental number representing the ID of a row: 1, 2, 3, 4...
For your case, a more common implementation would be to have a hierarchical table called Company, with the fields company_name and parent_company_name. If a company has a parent, its parent_company_name field holds a value from the company_name field.
There are several reasons why the same value in two different PKs might work out with no problems. In your case, it seems to flow naturally from the semantics of the data.
A row in the Manufacturers table and a row in the ParentCompany table both appear to refer to the same thing, namely a company. In that case, giving a company the same id in both tables is not only possible, but actually useful. It represents a 1 to 1 correspondence between manufacturers and parent companies without adding extra columns to serve as FKs.
Thanks for the quick answers!
I think I know what to do now. I will create a general company table, in which all companies will be stored. Then I will create, as I go along specific company tables like Manufacturer and parent company that reference a certain company in the company table.
To clarify, the only column I would put into the sub-company tables is a column with a foreign key referencing a column of the company table, yes?
For the primary key, I was just confused, because I hear so much about the key needing to be unique, and can't have the same value as another. So then this condition only goes for tables, not the whole database. Thanks for the clarification!
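The design settled on here (one general company table plus sub-company tables whose only column is a key referencing it) could be sketched as follows. SQLite via Python's sqlite3 module is an assumption, since the question uses Oracle; the column name CompanyID, the Name column, and the placeholder company name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Company (
    CompanyID TEXT PRIMARY KEY,
    Name      TEXT NOT NULL
);
-- Each sub-company table reuses the company's key as its own primary key.
CREATE TABLE Manufacturer (
    CompanyID TEXT PRIMARY KEY REFERENCES Company (CompanyID)
);
CREATE TABLE Parentcompany (
    CompanyID TEXT PRIMARY KEY REFERENCES Company (CompanyID)
);
""")

conn.execute("INSERT INTO Company VALUES ('MBE', 'Placeholder name for MBE')")
# The same value is a valid primary key in both tables: a primary key
# only has to be unique within its own table.
conn.execute("INSERT INTO Manufacturer VALUES ('MBE')")
conn.execute("INSERT INTO Parentcompany VALUES ('MBE')")

same_key_in_both = conn.execute(
    "SELECT COUNT(*) FROM Manufacturer JOIN Parentcompany USING (CompanyID)"
).fetchone()[0] == 1

# A sub-company row without a matching company is refused by the foreign key.
try:
    conn.execute("INSERT INTO Manufacturer VALUES ('XYZ')")
    unknown_company_rejected = False
except sqlite3.IntegrityError:
    unknown_company_rejected = True
```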

SQL sub-types with overlapping child tables

Consider the problem above where the 'CommonChild' entity can be a child of either sub-type A or B, but not C. How would I go about designing the physical model in a relational [SQL] database?
Ideally, the solution would allow...
for an identifying relationship between CommonChild and its related sub-type.
a 1:N relationship.
Possible Solutions
Add an additional sub-type to the super-type and move sub-type A and B under the new sub-type. The CommonChild can then have a FK constraint on the newly created sub-type. Works for the above, but not if an additional entity is added which can have a relationship with sub-type A and C, but not B.
Add a FK constraint between the CommonChild and SuperType. Use a trigger or check constraint (w/ UDF) against the super-type's discriminator before allowing a new tuple into CommonChild. Seems straightforward, but now CommonChild almost seems like a new subtype itself (which it is not).
My model is fundamentally flawed. Remodel and the problem should go away.
I'm looking for other possible solutions or confirmation of one of the above solutions I've already proposed.
Thanks!
EDIT
I'm going to implement the exclusive foreign key solution provided by Branko Dimitrijevic (see accepted answer).
I am going to make a slight modification in this case, as:
the super-type, sub-type, and "CommonChild" all have the same PKs and;
the PKs are 3 column composites.
The modification is to create an intermediate table whose sole role is to enforce the exclusive FK constraint between the sub-types and the "CommonChild" (the exact model provided by Dimitrijevic minus the "CommonChild's" attributes). The CommonChild's PK will have a normal FK constraint to the intermediate table.
This will prevent the "CommonChild" from having 2 sets of 3 column composite FKs. Plus, since the identifying relationship is maintained from super-type to "CommonChild", [read] queries can effectively ignore the intermediate table altogether.
Looks like you need a variation of exclusive foreign keys:
CREATE TABLE CommonChild (
Id AS COALESCE(SubTypeAId, SubTypeBId) PERSISTED PRIMARY KEY,
SubTypeAId int REFERENCES SubTypeA (SuperId),
SubTypeBId int REFERENCES SubTypeB (SuperId),
Attr6 varchar,
CHECK (
(SubTypeAId IS NOT NULL AND SubTypeBId IS NULL)
OR (SubTypeAId IS NULL AND SubTypeBId IS NOT NULL)
)
);
There are a couple of things to note here:
There are two NULL-able FOREIGN KEYs.
There is a CHECK that allows exactly one of these FKs to be non-NULL.
There is a computed column Id which equals one of the FKs (whichever is currently non-NULL) which is also a PRIMARY KEY. This ensures that:
One parent cannot have multiple children.
A "grandchild" table can reference the CommonChild.Id directly from its FK. The SuperType.Id is effectively propagated all the way down.
We don't have to mess with NULL-able UNIQUE constraints, which are problematic in MS SQL Server (see below).
A DBMS-agnostic way of doing something similar would be...
CREATE TABLE CommonChild (
Id int PRIMARY KEY,
SubTypeAId int UNIQUE REFERENCES SubTypeA (SuperId),
SubTypeBId int UNIQUE REFERENCES SubTypeB (SuperId),
Attr6 varchar,
CHECK (
(SubTypeAId IS NOT NULL AND SubTypeAId = Id AND SubTypeBId IS NULL)
OR (SubTypeAId IS NULL AND SubTypeBId IS NOT NULL AND SubTypeBId = Id)
)
);
Unfortunately a UNIQUE column containing more than one NULL is not allowed by MS SQL Server, which is not the case in most DBMSes. However, you can just omit the UNIQUE constraint if you don't want to reference SubTypeAId or SubTypeBId directly.
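For what it's worth, the DBMS-agnostic variant can be tried out in SQLite, which is one of the engines that accepts several NULLs in a UNIQUE column. The sketch below assumes SQLite via Python's sqlite3 module, minimal one-column SubTypeA and SubTypeB tables, and made-up ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE SubTypeA (SuperId INTEGER PRIMARY KEY);
CREATE TABLE SubTypeB (SuperId INTEGER PRIMARY KEY);
CREATE TABLE CommonChild (
    Id         INTEGER PRIMARY KEY,
    SubTypeAId INTEGER UNIQUE REFERENCES SubTypeA (SuperId),
    SubTypeBId INTEGER UNIQUE REFERENCES SubTypeB (SuperId),
    Attr6      TEXT,
    CHECK (
        (SubTypeAId IS NOT NULL AND SubTypeAId = Id AND SubTypeBId IS NULL)
        OR (SubTypeAId IS NULL AND SubTypeBId IS NOT NULL AND SubTypeBId = Id)
    )
);
INSERT INTO SubTypeA VALUES (1), (3);
INSERT INTO SubTypeB VALUES (2);
""")


def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# Both parents set, or neither set: the CHECK refuses the row.
both_parents_rejected = rejected("INSERT INTO CommonChild VALUES (1, 1, 2, NULL)")
no_parent_rejected = rejected("INSERT INTO CommonChild VALUES (5, NULL, NULL, NULL)")

# Exactly one parent set: accepted. Rows 1 and 3 both leave SubTypeBId NULL,
# which SQLite's UNIQUE constraint tolerates.
conn.execute("INSERT INTO CommonChild VALUES (1, 1, NULL, 'child of A')")
conn.execute("INSERT INTO CommonChild VALUES (3, 3, NULL, 'child of A')")
conn.execute("INSERT INTO CommonChild VALUES (2, NULL, 2, 'child of B')")

child_count = conn.execute("SELECT COUNT(*) FROM CommonChild").fetchone()[0]
```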
Wondering what am I missing here?
Admittedly, it is hard without having the wording of the specific problem, but things do feel a bit upside-down.

MS SQL creating many-to-many relation with a junction table

I'm using Microsoft SQL Server Management Studio and while creating a junction table should I create an ID column for the junction table, if so should I also make it the primary key and identity column? Or just keep 2 columns for the tables I'm joining in the many-to-many relation?
For example, if these were the many-to-many tables:
MOVIE
Movie_ID
Name
etc...
CATEGORY
Category_ID
Name
etc...
Should I make the junction table:
MOVIE_CATEGORY_JUNCTION
Movie_ID
Category_ID
Movie_Category_Junction_ID
[and make the Movie_Category_Junction_ID my Primary Key and use it as the Identity Column] ?
Or:
MOVIE_CATEGORY_JUNCTION
Movie_ID
Category_ID
[and just leave it at that with no primary key or identity column] ?
I would use the second junction table:
MOVIE_CATEGORY_JUNCTION
Movie_ID
Category_ID
The primary key would be the combination of both columns. You would also have a foreign key from each column to the Movie and Category table.
The junction table would look similar to this:
create table movie_category_junction
(
movie_id int,
category_id int,
CONSTRAINT movie_cat_pk PRIMARY KEY (movie_id, category_id),
CONSTRAINT FK_movie
FOREIGN KEY (movie_id) REFERENCES movie (movie_id),
CONSTRAINT FK_category
FOREIGN KEY (category_id) REFERENCES category (category_id)
);
See SQL Fiddle with Demo.
Using these two fields as the PRIMARY KEY will prevent duplicate movie/category combinations from being added to the table.
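Here is a small sketch of how that junction table behaves in practice, including the joins that read the many-to-many relation in both directions. It assumes SQLite via Python's sqlite3 module rather than SQL Server, a name column on each parent table, and invented movie and category names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE movie    (movie_id    INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE movie_category_junction (
    movie_id    INTEGER NOT NULL,
    category_id INTEGER NOT NULL,
    CONSTRAINT movie_cat_pk PRIMARY KEY (movie_id, category_id),
    CONSTRAINT FK_movie
        FOREIGN KEY (movie_id) REFERENCES movie (movie_id),
    CONSTRAINT FK_category
        FOREIGN KEY (category_id) REFERENCES category (category_id)
);
INSERT INTO movie VALUES (1, 'Movie A'), (2, 'Movie B');
INSERT INTO category VALUES (1, 'Comedy'), (2, 'Drama');
INSERT INTO movie_category_junction VALUES (1, 1), (1, 2), (2, 2);
""")

# All categories of one movie.
categories_of_movie_a = [row[0] for row in conn.execute("""
    SELECT c.name
    FROM movie_category_junction j
    JOIN category c ON c.category_id = j.category_id
    WHERE j.movie_id = 1
    ORDER BY c.name
""")]

# All movies in one category.
movies_in_drama = [row[0] for row in conn.execute("""
    SELECT m.name
    FROM movie_category_junction j
    JOIN movie m ON m.movie_id = j.movie_id
    WHERE j.category_id = 2
    ORDER BY m.name
""")]


def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


duplicate_rejected = rejected("INSERT INTO movie_category_junction VALUES (1, 1)")
orphan_rejected = rejected("INSERT INTO movie_category_junction VALUES (3, 1)")
```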
There are different schools of thought on this. One school prefers including a primary key and naming the linking table something more significant than just the two tables it is linking. The reasoning is that although the table may start out seeming like just a linking table, it may become its own table with significant data.
An example is a many-to-many between magazines and subscribers. Really that link is a subscription with its own attributes, like expiration date, payment status, etc.
However, I think sometimes a linking table is just a linking table. The many to many relationship with categories is a good example of this.
So in this case, a separate one-field primary key is not necessary. You could have an auto-assign key, which wouldn't hurt anything, and would make deleting specific records easier. It might be good as a general practice, so if the table later develops into a significant table with its own significant data (as subscriptions) it will already have an auto-assign primary key.
You can put a unique index on the two fields to avoid duplicates. This will even prevent duplicates if you have a separate auto-assign key. You could use both fields as your primary key (which is also a unique index).
So, the one school of thought can stick with integer auto-assign primary keys, and avoids compound primary keys. This is not the only way to do it, and maybe not the best, but it won't lead you wrong, into a problem where you really regret it.
But, for something like what you are doing, you will probably be fine with just the two fields. I'd still recommend either making the two fields a compound primary key, or at least putting a unique index on the two fields.
I would go with the 2nd junction table, but make those two fields the primary key. That will prevent duplicate entries.

Foreign key with multiple columns from different tables

Let's take a stupid example : I have many domestic animals, each one with a NAME as an id and a type (being CAT or DOG), let's write it this way (pseudo code) :
TABLE ANIMALS (
NAME char,
ANIMAL_TYPE char {'DOG', 'CAT'},
PRIMARY KEY(NAME)
)
(for instance, I have a CAT named Felix, and a dog called Pluto)
In another table, I'd like to store the preferred food for each one of my animals:
TABLE PREFERED_FOOD (
ANIMAL_NAME char,
PREF_FOOD char,
FOREIGN KEY (ANIMAL_NAME) REFERENCES ANIMALS(NAME)
)
(for instance, Felix likes milk, and Pluto likes bones)
As I would like to define a set of possible preferred foods, I store the food types for each type of animal in a third table:
TABLE FOOD (
ANIMAL_TYPE char {'DOG', 'CAT'},
FOOD_TYPE char
)
(for instance, DOGs eat bones and meat, CATs eat fish and milk)
Here comes my question: I'd like to add a foreign key constraint in PREFERED_FOOD, so that PREF_FOOD is a FOOD_TYPE from FOOD with FOOD.ANIMAL_TYPE = ANIMALS.ANIMAL_TYPE. How can I define this foreign key without duplicating ANIMAL_TYPE in PREFERED_FOOD?
I'm not an expert with SQL, so you can call me stupid if it is really easy ;-)
You can't in SQL. I think you could if SQL supported assertions. (The SQL-92 standard defined assertions. Nobody supports them yet, as far as I know.)
To work around that problem, use overlapping constraints.
-- Nothing special here.
create table animal_types (
animal_type varchar(15) primary key
);
create table animals (
name varchar(15) primary key,
animal_type varchar(15) not null references animal_types (animal_type),
-- This constraint lets us work around SQL's lack of assertions in this case.
unique (name, animal_type)
);
-- Nothing special here.
create table animal_food_types (
animal_type varchar(15) not null references animal_types (animal_type),
food_type varchar(15) not null,
primary key (animal_type, food_type)
);
-- Overlapping foreign key constraints.
create table animals_preferred_food (
animal_name varchar(15) not null,
-- This column is necessary to implement your requirement.
animal_type varchar(15) not null,
pref_food varchar(15) not null,
primary key (animal_name, pref_food),
-- This foreign key constraint requires a unique constraint on these
-- two columns in "animals".
foreign key (animal_name, animal_type)
references animals (name, animal_type),
-- Since the animal_type column is now in this table, this constraint
-- is simple.
foreign key (animal_type, pref_food)
references animal_food_types (animal_type, food_type)
);
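The overlapping constraints can be exercised with the sample animals from the question. This is a sketch that assumes SQLite via Python's sqlite3 module; the schema follows the answer above, with the first foreign key pointing at the animals columns (name, animal_type).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE animal_types (animal_type TEXT PRIMARY KEY);
CREATE TABLE animals (
    name        TEXT PRIMARY KEY,
    animal_type TEXT NOT NULL REFERENCES animal_types (animal_type),
    UNIQUE (name, animal_type)
);
CREATE TABLE animal_food_types (
    animal_type TEXT NOT NULL REFERENCES animal_types (animal_type),
    food_type   TEXT NOT NULL,
    PRIMARY KEY (animal_type, food_type)
);
CREATE TABLE animals_preferred_food (
    animal_name TEXT NOT NULL,
    animal_type TEXT NOT NULL,
    pref_food   TEXT NOT NULL,
    PRIMARY KEY (animal_name, pref_food),
    FOREIGN KEY (animal_name, animal_type)
        REFERENCES animals (name, animal_type),
    FOREIGN KEY (animal_type, pref_food)
        REFERENCES animal_food_types (animal_type, food_type)
);
INSERT INTO animal_types VALUES ('CAT'), ('DOG');
INSERT INTO animals VALUES ('Felix', 'CAT'), ('Pluto', 'DOG');
INSERT INTO animal_food_types VALUES
    ('DOG', 'bones'), ('DOG', 'meat'), ('CAT', 'fish'), ('CAT', 'milk');
""")


def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


conn.execute("INSERT INTO animals_preferred_food VALUES ('Felix', 'CAT', 'milk')")
conn.execute("INSERT INTO animals_preferred_food VALUES ('Pluto', 'DOG', 'bones')")

# Bones are not a cat food: the second foreign key fails.
cat_with_dog_food_rejected = rejected(
    "INSERT INTO animals_preferred_food VALUES ('Felix', 'CAT', 'bones')"
)
# Felix is not a dog: the first foreign key fails.
wrong_type_rejected = rejected(
    "INSERT INTO animals_preferred_food VALUES ('Felix', 'DOG', 'bones')"
)
```

The shared animal_type column is what links the two foreign keys: it has to agree with the animal's real type and with the food's allowed type at the same time.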
Use
FOREIGN KEY (PREF_FOOD) REFERENCES FOOD (FOOD_TYPE)
in the PREFERED_FOOD table; this will make sure that every PREF_FOOD in the PREFERED_FOOD table is already present in the FOOD_TYPE column of the FOOD table.
And in the FOOD table use the following, which should be self-explanatory by now:
FOREIGN KEY (ANIMAL_TYPE) REFERENCES ANIMALS (ANIMAL_TYPE)
Depending on what DBMS you are using (please edit your question to include this), you would probably want to create a unique constraint on the ANIMAL_TYPE and PREFERED_FOOD columns.
Something like this:
ALTER TABLE PREFERED_FOOD
ADD CONSTRAINT uc_FoodAnimal UNIQUE (ANIMAL_TYPE,PREFERED_FOOD)
Frankly, I had some trouble following your requirements, but a straightforward model for representing animals and their food would probably look like this:
The SPECIES_FOOD lists all foods a given species can eat, and the INDIVIDUAL then just picks one of them through the PREFERRED_FOOD_NAME field.
Since INDIVIDUAL.SPECIES_NAME is a FK towards both SPECIES and SPECIES_FOOD, an individual can never prefer a food that is not edible by its species.
This of course assumes an individual animal cannot have more than one preferred food [1]. It also assumes it can have none - if that's not the case, just make the INDIVIDUAL.PREFERRED_FOOD_NAME NOT NULL.
The INDIVIDUAL_NAME was intentionally not made a key, so you can have, say, two cats with the name "Felix". If that's not desirable, you can easily add the appropriate key.
If all you need to know about the food is its name, and you don't need to represent a food independently from any species, the FOOD table can be omitted altogether.
[1] In case there can be multiple preferred foods per individual animal, you'd need one more table "between" INDIVIDUAL and SPECIES_FOOD, and be careful to keep using identifying relationships, so SPECIES_NAME is migrated all the way down (to prevent preferring a food not edible by the species).
If you take the (natural) JOIN of ANIMALS and PREFERRED_FOOD, then you get a table in which for each animal, its type and its preferred food are listed.
You want that combination to be "valid" for each individual animal, where "valid" means "to appear in the enumeration of valid animal type/food type combinations that are listed in FOOD".
So you have a constraint that is somewhat similar to an FK, but this time the "foreign key" appears not in a base table, but in a join of two tables. For this type of constraint, the SQL language has CHECK constraints and ASSERTIONS.
The ASSERTION version is the simplest. It is a constraint like (I've been somewhat liberal with the attribute names in order to avoid mere attribute renames that obfuscate the point)
CREATE ASSERTION <name for your constraint here>
CHECK (NOT EXISTS (SELECT ANIMAL_TYPE, FOOD_TYPE
                   FROM ANIMALS NATURAL JOIN PREF_FOOD
                   WHERE (ANIMAL_TYPE, FOOD_TYPE) NOT IN
                         (SELECT ANIMAL_TYPE, FOOD_TYPE FROM FOOD)));
But your average SQL engine won't support ASSERTIONs. So you have to use CHECK constraints. For the PREF_FOOD table, for example, the CHECK constraint you need might look something like
CHECK (EXISTS (SELECT 1
               FROM FOOD NATURAL JOIN ANIMALS
               WHERE ANIMAL_TYPE = <animal type of inserted row> AND
                     FOOD_TYPE = <food type of inserted row>));
In theory, this should suffice to enforce your constraint, but then again your average SQL engine will once again not support this kind of CHECK constraint, because of the references to tables other than the one the constraint is defined on.
So the options you have are to resort to rather complex (*) setups like catcall's, or to enforce the constraint using triggers (you'll have to write quite a lot of them: three or six at least, I haven't thought this through in detail). Your next best option is to enforce this in application code, and once again there will be three or six (more or less) distinct places where the same number of distinct checks need to be implemented.
In all three of these scenarios, you will preferably want to document the existence of the constraint, and what exactly it is about, in some other place. None of the three will make it very obvious to a third party reading this design what the heck this is all about.
(*) "complex" might not exactly be the right word, but note that such solutions rely on deliberate redundancy, thus deliberately going below 3NF with the design. And this means that your design is exposed to update anomalies, meaning that it will be harder for the user to update the database AND keep it consistent (precisely because of the deliberate redundancies).
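To give an idea of the trigger route, here is a sketch of just one of the several triggers that would be needed: it guards INSERTs into PREFERED_FOOD only. Updates to PREFERED_FOOD and changes to ANIMALS or FOOD are deliberately left uncovered. The trigger syntax is SQLite's (run through Python's sqlite3 module) and is an assumption; other engines spell triggers differently. The trigger name and error message are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE ANIMALS (
    NAME        TEXT PRIMARY KEY,
    ANIMAL_TYPE TEXT NOT NULL CHECK (ANIMAL_TYPE IN ('DOG', 'CAT'))
);
CREATE TABLE FOOD (
    ANIMAL_TYPE TEXT NOT NULL,
    FOOD_TYPE   TEXT NOT NULL,
    PRIMARY KEY (ANIMAL_TYPE, FOOD_TYPE)
);
CREATE TABLE PREFERED_FOOD (
    ANIMAL_NAME TEXT NOT NULL REFERENCES ANIMALS (NAME),
    PREF_FOOD   TEXT NOT NULL
);
-- Hypothetical trigger: refuse a preferred food that FOOD does not list
-- for the animal's type. Only INSERTs on PREFERED_FOOD are guarded.
CREATE TRIGGER pref_food_must_match_type
BEFORE INSERT ON PREFERED_FOOD
BEGIN
    SELECT RAISE(ABORT, 'food not allowed for this animal type')
    WHERE NOT EXISTS (
        SELECT 1
        FROM ANIMALS a
        JOIN FOOD f ON f.ANIMAL_TYPE = a.ANIMAL_TYPE
        WHERE a.NAME = NEW.ANIMAL_NAME
          AND f.FOOD_TYPE = NEW.PREF_FOOD
    );
END;
INSERT INTO ANIMALS VALUES ('Felix', 'CAT'), ('Pluto', 'DOG');
INSERT INTO FOOD VALUES
    ('DOG', 'bones'), ('DOG', 'meat'), ('CAT', 'fish'), ('CAT', 'milk');
""")

conn.execute("INSERT INTO PREFERED_FOOD VALUES ('Felix', 'milk')")
conn.execute("INSERT INTO PREFERED_FOOD VALUES ('Pluto', 'bones')")

try:
    conn.execute("INSERT INTO PREFERED_FOOD VALUES ('Felix', 'bones')")
    cat_with_bones_rejected = False
except sqlite3.DatabaseError:
    cat_with_bones_rejected = True

stored = conn.execute("SELECT COUNT(*) FROM PREFERED_FOOD").fetchone()[0]
```

Note that this keeps ANIMAL_TYPE out of PREFERED_FOOD, as the question asked, at the price of logic that lives outside the declarative constraints.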