Foreign key with multiple columns from different tables - sql

Let's take a stupid example : I have many domestic animals, each one with a NAME as an id and a type (being CAT or DOG), let's write it this way (pseudo code) :
TABLE ANIMALS (
NAME char,
ANIMAL_TYPE char {'DOG', 'CAT'}
PRIMARY KEY(NAME)
)
(for instance, I have a CAT named Felix, and a dog called Pluto)
In another table, I'd like to store the prefered food for each one of my animals :
TABLE PREFERED_FOOD (
ANIMAL_NAME char,
PREF_FOOD char
FOREIGN KEY (ANIMAL_NAME) REFERENCES ANIMALS(NAME)
)
(for instance, Felix likes milk, and Pluto likes bones)
As I would like to define a set of possible prefered foods, I store in a third table the food types, for each type of animal :
TABLE FOOD (
ANIMAL_TYPE char {'DOG', 'CAT'},
FOOD_TYPE char
)
(for instance, DOGs eat bones and meat, CATs eat fish and milk)
Here comes my question : I'd like to add a foreign constraint in PREFERED_FOOD, so as the PREF_FOOD is a FOOD_TYPE from FOOD with FOOD.ANIMAL_TYPE=ANIMALS.TYPE. How can I define this foreign key without duplicating the ANIMAL_TYPE on PREFERED_FOOD ?
I'm not an expert with SQL, so you can call me stupid if it is really easy ;-)

You can't in SQL. I think you could if SQL supported assertions. (The SQL-92 standard defined assertions. Nobody supports them yet, as far as I know.)
To work around that problem, use overlapping constraints.
-- Nothing special here.
create table animal_types (
animal_type varchar(15) primary key
);
create table animals (
name varchar(15) primary key,
animal_type varchar(15) not null references animal_types (animal_type),
-- This constraint lets us work around SQL's lack of assertions in this case.
unique (name, animal_type)
);
-- Nothing special here.
create table animal_food_types (
animal_type varchar(15) not null references animal_types (animal_type),
food_type varchar(15) not null,
primary key (animal_type, food_type)
);
-- Overlapping foreign key constraints.
create table animals_preferred_food (
animal_name varchar(15) not null,
-- This column is necessary to implement your requirement.
animal_type varchar(15) not null,
pref_food varchar(10) not null,
primary key (animal_name, pref_food),
-- This foreign key constraint requires a unique constraint on these
-- two columns in "animals".
foreign key (animal_name, animal_type)
references animals (animal_name, animal_type),
-- Since the animal_type column is now in this table, this constraint
-- is simple.
foreign key (animal_type, pref_food)
references animal_food_types (animal_type, food_type)
);

FOREIGN KEY (PREF_FOOD) REFERENCES FOOD (FOOD_TYPE)
in the PREFERRED_FOOD table, this will make sure that every PREFFOOD in the PREFERRED_FOOD table is already present in the FOOD_TYPE of FOOD table.
and in the FOOD table use, its quite self-explanatory now.
FOREIGN KEY (ANIMAL_TYPE) REFERENCES ANIMALS (ANIMAL_TYPE)

Depending on what DBMS you are using (please edit your question to include this), you would probably want to create a unique constraint on the ANIMAL_TYPE and PREFERED_FOOD columns.
Something like this:
ALTER TABLE PREFERED_FOOD
ADD CONSTRAINT uc_FoodAnimal UNIQUE (ANIMAL_TYPE,PREFERED_FOOD)

Frankly, I had some trouble following your requirements, but a straightforward model for representing animals and their food would probably look like this:
The SPECIES_FOOD lists all foods a given species can eat, and the INDIVIDUAL then just picks one of them through the PREFERRED_FOOD_NAME field.
Since INDIVIDUAL.SPECIES_NAME is a FK towards both SPECIES and SPECIES_FOOD, an individual can never prefer a food that is not edible by its species.
This of course assumes an individual animal cannot have more than one preferred food.1 It also assumes it can have none - if that's not the case, just make the INDIVIDUAL.PREFERRED_FOOD_NAME NOT NULL.
The INDIVIDUAL_NAME was intentionally not made a key, so you can have, say, two cats with the name "Felix". If that's not desirable, you'll easy add the appropriate key.
If all you need to know about the food is its name, and you don't need to represent a food independently from any species, the FOOD table can be omitted altogether.
1 In case there can be multiple preferred foods per individual animal, you'd need one more table "between" INDIVIDUAL and SPECIES_FOOD, and be careful to keep using identifying relationships, so SPECIES_NAME is migrated all the way down (to prevent preferring a food not edible by the species).

If you take the (natural) JOIN of ANIMALS and PREFERRED_FOOD, then you get a table in which for each animal, its type and its preferred food are listed.
You want that combination to be "valid" for each individual animal where "valid" means "to appear in the enumeration of valid animal type/food type combinations that are listed in FOOD.
So you have a constraint that is somewhat similar to an FK, but this time the "foreign key" appears not in a base table, but in a join of two tables. For this type of constraint, the SQL language has CHECK constraints and ASSERTIONS.
The ASSERTION version is the simplest. It is a constraint like (I've been somewhat liberal with the attribute names in order to avoid mere attribute renames that obfuscate the point)
CREATE ASSERTION <name for your constraint here>
CHECK NOT EXISTS (SELECT ANIMAL_TYPE, FOOD_TYPE
FROM ANIMALS NATURAL JOIN PREF_FOOD
WHERE (ANIMAL_TYPE, FOOD_TYPE) NOT IN
SELECT ANIMAL_TYPE, FOOD_TYPE FROM FOOD_TYPE);
But your average SQL engine won't support ASSERTIONs. So you have to use CHECK constraints. For the PREF_FOOD table, for example, the CHECK constraint you need might look something like
CHECK EXISTS (SELECT 1
FROM FOOD NATURAL JOIN ANIMAL
WHERE ANIMAL_TYPE = <animal type of inserted row> AND
FOOD_TYPE = <food type of inserted row>);
In theory, this should suffice to enforce your constraint, but then again your average SQL engine will once again not support this kind of CHECK constraint, because of the references to tables other than the one the constraint is defined on.
So the options you have is to resort to rather complex (*) setups like catcall's, or enforcing the constraint using triggers (and you'll have to write quite a lot of them (three or six at least, haven't thought this through in detail), and your next best option is to enforce this in application code, and once again there will be three or six (more or less) distinct places where the same number of distinct checks need to be implemented.
In all of these three scenario's, you will preferably want to document the existence of the constraint, and what exactly it is about, in some other place. None of the three will make it very obvious to a third party reading this design what the heck this is all about.
(*) "complex" might not exactly be the right word, but note that such solutions rely on deliberate redundancy, thus deliberately going below 3NF with the design. And this means that your design is exposed to update anomalies, meaning that it will be harder for the user to update the database AND keep it consistent (precisely because of the deliberate redundancies).

Related

What does it mean when there is no "Id" at the end of a column name which appears to be a foreign key?

This is the databases ERD my final project for school is on (at the bottom), I am required to make a database using this information. I understand how to add the tables that are setup like 'trainer' and even how to add self-joining tables to my database, but something we have NOT learned is what it means or what to do when there is no Id at the end? Like 'evolvesfrom' and 'pokemonfightexppoint'.
Do you not have to add an Id at the end? From what my teacher taught us, I assumed you did. From what I see in this ERD is how evolvesfrom is self-joining itself to pokemonId. I know how to complete this only when there is an Id at the end of evolvesfrom.
For something like trainerId, it is super easy to understand how to add the constraints and everything like so:
CREATE TABLE trainer (
trainerId INT IDENTITY(1, 1),
trainerName VARCHAR(50) NOT NULL,
CONSTRAINT pk_trainer_trainerId PRIMARY KEY (trainerId)
);
I just don't understand how to do this when there is no Id added. For the pokemonFight table, it is noted that "It is assumed that a
Pokémon can play any battles at any battle locations. In other words, the battle experience points are functionally dependent on Pokémon, battle, and battle location", if that makes a difference.
If possible, could anyone show me an example on how to add a table, with constraints on either the pokemon or the pokemonFight table? (obviously you don't have to include the data types or anything).
Thank you in advance.
I am using SQL Server.
There is no required naming convention for columns in SQL Server that differentiates between a data column, a primary key column or a foreign key column.
The only constraints on column names are that they follow the rules for SQL Server identifier naming. However in a particular work environment you might well use a naming convention which does include ID at the end of the column name in order to clearly make the intention of the column obvious.
To create a self-referencing foreign key you just do the same as normal which can be as part of the create table or an alter table.
CREATE TABLE pokemon (
pokemonId INT IDENTITY(1, 1),
...
CONSTRAINT fk_pokemon_evolvesFrom FOREIGN KEY (evolvesFrom) REFERENCES pokemon (pokemonId)
);
-- OR
ALTER TABLE pokemon
ADD CONSTRAINT fk_pokemon_evolvesFrom FOREIGN KEY (evolvesFrom)
REFERENCES pokemon (pokemonId)

Is unique foreign keys across multiple tables via normalization and without null columns possible?

This is a relational database design question, not specific to any RDBMS. A simplified case:
I have two tables Cars and Trucks. They have both have a column, say RegistrationNumber, that must be unique across the two tables.
This could probably be enforced with some insert/update triggers, but I'm looking for a more "clean" solution if possible.
One way to achieve this could be to add a third table, Vehicles, that holds the RegistrationNumber column, then add two additional columns to Vehicles, CarID and TruckID. But then for each row in Vehicles, one of the columns CarID or TruckID would always be NULL, because a RegistrationNumber applies to either a Car or a Truck leaving the other column with a NULL value.
Is there anyway to enforce a unique RegistrationNumber value across multiple tables, without introducing NULL columns or relying on triggers?
This is a bit complicated. Having the third table, Vehicles is definitely part of the solution. The second part is guaranteeing that a vehicle is either a car or a truck, but not both.
One method is the "list-all-the-possibilities" method. This is what you propose with two columns. In addition, this should have a constraint to verify that only one of the ids is filled in. A similar approach is to have the CarId and TruckId actually be the VehicleId. This reduces the number of different ids floating around.
Another approach uses composite keys. The idea is:
create table Vehicles (
Vehicle int primary key,
Registration varchar(255),
Type varchar(255),
constraint chk_type check (type in ('car', 'truck')),
constraint unq_type_Vehicle unique (type, vehicle), -- this is redundant, but necessary
. . .
);
create table car (
VehicleId int,
Type varchar(255), -- always 'car' in this table
constraint fk_car_vehicle foreign key (VehicleId) references Vehicles(VehicleId),
constraint fk_car_vehicle_type foreign key (Type, VehicleId) references Vehicles(Type, VehicleId)
);
See the following tags: class-table-inheritance shared-primary-key
You have already outlined class table inheritance in your question. The tag will just add a few details, and show you some other questions whose answers may help.
Shared primary key is a handy way of enforcing the one-to-one nature of IS-A relationships such as the relationship between a vehicle and a truck. It also allows a foreign key in some other table to reference a vehicle and also a truck or a car, as the case may be.
You can add the third table Vehicles containing a single column RegistratioNumber on which you apply the unique constraint, then on the existing tables - Cars and Trucks - use the RegistrationNumber as a foreign key on the Vehicles table. In this way you don't need an extra id, avoid the null problem and enforce the uniqueness of the registration number.
Update - this solution doesn't prevent a car and a truck to share the same registration number. To enforce this constraint you need to add either triggers or logic beyond plain SQL. Otherwise you may want to take a look at Gordon's solution that involves Composite Foreign Keys.

Best database design for multiple entity types

I'm working on a web app and I have to design it's database. There's a part that didn't come very straightforward to me, so after some thinking and research I came with multiple ideas. Still neither seems completely suitable, so I'm not sure which one to implement and why.
The simplified problem looks as follows:
I have a table Teacher. There are 2 types of teachers, according to the relations with their Fields and Subjects:
A Teacher that's related to a Field, the Field is obligatory related to a Category
A Teacher that's not related to a Field, but directly to a Category
My initial idea was to have two nullable foreign keys, one to the table Field, and the other to the table Category. But in this case, how can I make sure that exactly one is null, and the other one is not?
The other idea is to create a hierarchy, with two types of Teacher tables derived from the table Teacher (is-a relation), but I couldn't find any useful tutorial on this.
I'm developing the app using Django with SQLite db
OK, your comment made it much clearer:
If a Teacher belongs to exactly one category, you should keep this in the Teacher's table directly:
Secondly each teacher belongs to "one or zero" fields. If this is sure for ever you should use a nullable FieldID column. This is set or remains empty.
Category (CategoryID, Name, ...)
Field (FieldID,Name,...)
Teacher (TeacherID,FieldID [NULL FK],CategoryID [NOT NULL FK], FirstName, Lastname, ...)
Remark: This is almost the same as my mapping table of the last answer. The only difference is, that you'll have a strict limitation with your "exactly one" or "exactly none or one"... From my experience I'd still prefer the open approach. It is easy to enforce your rules with unique indexes including the TeacherID-column. Sooner or later you'll probably have to re-structure this...
As you continue, one category is related to "zero or more" fields. There are two approaches:
Add a CategoryID-column to the Field-table (NOT NULL FK). This way you define a field several times with differing CategoryIDs (combined unique index!). A category's fields list you'll get simply by asking the Field-table for all fields with the given CategoryID.
Better in my eyes was a mapping table CategoryField. If you enforce a unique FieldID you'll get for sure, that no field is mapped twice. And add a unique index on the combination of CategoryID and FieldID...
A SELECT could be something like this (SQL Server Syntax, untested):
SELECT Teacher.TeacherID
,Teacher.FieldID --might be NULL
,Teacher.CategoryID --never NULL
,Teacher.[... Other columns ...]
,Field.Name --might be NULL
--The following columns you pick from the right source,
--depending on the return value of the LEFT JOIN to Field and the related "catField"
--the directly joined "Category" (which is never NULL) is the "default"
,ISNULL(catField.CategoryID,Category.CategoryID) AS ResolvedCategoryID
,ISNULL(catField.Name,Category.Name) AS ResolvedCategoryName
,[... Other columns ...]
FROM Teacher
INNER JOIN Category ON Teacher.CategoryID=Category.CategoryID --never NULL
LEFT JOIN Field ON Teacher.FieldID=Field.FieldID --might be NULL
LEFT JOIN Category AS catField ON Field.CategoryID=catField.CategoryID
This was the answer before the EDIT:
I try to help you even if the concept is not absolutely clear to me
Teacher-Table: TeacherID, person's data (name, address...), ...
Category-Table: CategoryID, category title, ...
Field-Tabls: FieldID, field title, ...
You say, that fields are bound to a category in all cases. If this is the same category in all cases, you should set the category as a FK-column in the Field-Table. If there is the slightest chance, that a field's category could differ with the context, you should not...
Same with teachers: If a teacher is ever bound to one single category set a FK-column within the Teacher-table, otherwise don't.
The most flexible you'll be with at least one mapping table:
(SQL Server Syntax)
CREATE TABLE TeacherFieldCategory
(
--A primary key to identify this row. This is not needed actually, but it will serve as clustered key index as a lookup index...
TeacherFieldCategoryID INT IDENTITY NOT NULL CONSTRAINT PK_TeacherFieldCategory PRIMARY KEY
--Must be set
,TeacherID INT NOT NULL CONSTRAINT FK_TeacherFieldCategory_TeacherID FOREIGN KEY REFERENCES Teacher(TeacherID)
--Field may be left NULL
,FieldID INT NULL CONSTRAINT FK_TeacherFieldCategory_FieldID FOREIGN KEY REFERENCES Field(FieldID)
--Must be set. This makes sure, that a teacher ever has a category and - if the field is set - the field will have a category
,CategoryID INT NOT NULL CONSTRAINT FK_TeacherFieldCategory_CategoryID FOREING KEY REFERENCES Category(CategoryID)
);
--This unique index will ensure, that each combination will exist only once.
CREATE UNIQUE INDEX IX_TeacherFieldCategory_UniqueCombination ON TeacherFieldCategory(TeacherID,FieldID,CategoryID);
It could be a better concept to have a mapping table FieldCategory and this table mapped to the mapping table above through a foreign key. Doing so you could avoid invalid field-category combinations.
Hope this helps...

SQL - NULL foreign key

Please have a look at the database design below:
create table Person (id int identity, InvoiceID int not null)
create table Invoice (id int identity, date datetime)
Currently all persons have an invoiceID i.e. the InvoiceID is not null.
I want to extend the database so that some person does not have an Invoice. The original developer hated nulls and never uses them. I want to be consistent so I am wondering if there are other patterns I can use to extend the database to meet this requirement. How can this be approached without using nulls?
Please note that the two tables above are for illustration purposes. They are not the actual tables.
NULL is a very important feature in databases and programming in general. It is significantly different from being zero or any other value. It is most commonly used to signify absence of value (though it also can mean unknown value, but that's less used as the interpretation). If some people do not have an invoice, then you should truly allow NULL, as that matches your desired Schema
A common pattern would be to store that association in a separate table.
Person: Id
Invoice: Id
Assoc: person_id, assoc_id
Then if a person doesn't have an invoice, you simply don't have a row. This approach also allows a person to have more than one invoice id which might make sense.
The only way to represent the optional relationship while avoiding nulls is to use another table, as some other answers have suggested. Then the absence of a row for a given Person indicates the person has no Invoice. You can enforce a 1:1 relationship between this table and the Person table by making person_id be the primary or unique key:
CREATE TABLE PersonInvoice (
person_id INT NOT NULL PRIMARY KEY,
invoice_id INT NOT NULL,
FOREIGN KEY (person_id) REFERENCES Person(id),
FOREIGN KEY (invoice_id) REFERENCES Invoice(id)
);
If you want to permit each person to have multiple invoices, you can declare the primary key as the pair of columns instead.
But this solution is to meet your requirement to avoid NULL. This is an artificial requirement. NULL has a legitimate place in a data model.
Some relational database theorists like Chris Date eschew NULL, explaining that the existence of NULL leads to some troubling logical anomalies in relational logic. For this camp, the absence of a row as shown above is a better way to represent missing data.
But other theorists, including E. F. Codd who wrote the seminal paper on relational theory, acknowledged the importance of a placeholder that means either "not known" or "not applicable." Codd even proposed in a 1990 book that SQL needed two placeholders, one for "missing but applicable" (i.e. unknown), and the other for "missing but inapplicable."
To me, the anomalies we see when using NULL in certain ways are like the undefined results we see in arithmetic when we divide by zero. The solution is: don't do that.
But certainly we shouldn't use any non-NULL value like 0 or '' (empty string) to represent missing data. And likewise we shouldn't use a NULL as if it were an ordinary scalar value.
I wrote more about NULL in a chapter titled "Fear of the Unknown" in my book, SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
You need to move the invoice/person relation to another table.
You end up with
create table Person (id int person_identity)
create table PersonInvoice (id int person_id, InvoiceID int not null)
create table Invoice (id int identity, date datetime)
You need this for some databases to allow in InvoiceId to be a foreign key as some do not allow NULLS in a foreign key.
If a person only can have one invoice then PersonInvoice can have a unique constraint on the person_id as well as the two columns together. You can also enforce having a single person for a invoice by adding a unique constraint to the invoiceID field.

ORACLE Table design: M:N table best practice

I'd like to hear your suggestions on this very basic question:
Imagine these three tables:
--DROP TABLE a_to_b;
--DROP TABLE a;
--DROP TABLE b;
CREATE TABLE A
(
ID NUMBER NOT NULL ,
NAME VARCHAR2(20) NOT NULL ,
CONSTRAINT A_PK PRIMARY KEY ( ID ) ENABLE
);
CREATE TABLE B
(
ID NUMBER NOT NULL ,
NAME VARCHAR2(20) NOT NULL ,
CONSTRAINT B_PK PRIMARY KEY ( ID ) ENABLE
);
CREATE TABLE A_TO_B
(
id NUMBER NOT NULL,
a_id NUMBER NOT NULL,
b_id NUMBER NOT NULL,
somevalue1 VARCHAR2(20) NOT NULL,
somevalue2 VARCHAR2(20) NOT NULL,
somevalue3 VARCHAR2(20) NOT NULL
) ;
How would you design table a_to_b?
I'll give some discussion starters:
synthetic id-PK column or combined a_id,b_id-PK (dropping the "id" column)
When synthetic: What other indices/constraints?
When combined: Also index on b_id? Or even b_id,a_id (don't think so)?
Also combined when these entries are referenced themselves?
Also combined when these entries perhaps are referenced themselves in the future?
Heap or Index-organized table
Always or only up to x "somevalue"-columns?
I know that the decision for one of the designs is closely related to the question how the table will be used (read/write ratio, density, etc.), but perhaps we get a 20/80 solution as blueprint for future readers.
I'm looking forward to your ideas!
Blama
I have always made the PK be the combination of the two FKs, a_id and b_id in your example. Adding a synthetic id field to this table does no good, since you never end up looking for a row based on a knowledge of its id.
Using the compound PK gives you a constraint that prevents the same instance of the relationship between a and b from being inserted twice. If duplicate entries need to be permitted, there's something wrong with your data model at the conceptual level.
The index you get behind the scenes (for every DBMS I know of) will be useful to speed up common joins. An extra index on b_id is sometimes useful, depending on the kinds of joins you do frequently.
Just as a side note, I don't use the name "id" for all my synthetic pk columns. I prefer a_id, b_id. It makes it easier to manage the metadata, even though it's a little extra typing.
CREATE TABLE A_TO_B
(
a_id NUMBER NOT NULL REFERENCES A (a_id),
b_id NUMBER NOT NULL REFERENCES B (b_id),
PRIMARY KEY (a_id, b_id),
...
) ;
It's not unusual for ORMs to require (or, in more clueful ORMs, hope for) an integer column named "id" in addition to whatever other keys you have. Apart from that, there's no need for it. An id number like that makes the table wider (which usually degrades I/O performance just slightly), and adds an index that is, strictly speaking, unnecessary. It isn't necessary to identify the entity--the existing key does that--and it leads new developers into bad habits. (Specifically, giving every table an integer column named "id", and believing that that column alone is the only key you need.)
You're likely to need one or more of these indexed.
a_id
b_id
{a_id, b_id}
{b_id, a_id}
I believe Oracle should automatically index {a_id, b_id}, because that's the primary key. Oracle doesn't automatically index foreign keys. Oracle's indexing guidelines are online.
In general, you need to think carefully about whether you need ON UPDATE CASCADE or ON DELETE CASCADE. In Oracle, you only need to think carefully about whether you need ON DELETE CASCADE. (Oracle doesn't support ON UPDATE CASCADE.)
the other comments so far are good.
also consider adding begin_dt and end_dt to the relationship. in this way, you can manage a good number of questions about each relationship through time. (consider baseline issues)