Foreign key with many-to-many containment constraint - sql

Quick question that I'm having trouble putting into search terms:
Suppose I have a many-to-many relationship between players and teams:
CREATE TABLE players (
id bigserial primary key,
name text not null
);
CREATE TABLE teams (
id bigserial primary key,
name text not null,
team_captain_id bigint not null references players(id)
);
CREATE TABLE team_players (
id bigserial primary key,
player_id bigint not null,
team_id bigint not null
);
ALTER TABLE team_players ADD CONSTRAINT uq_team_players UNIQUE (player_id,team_id);
Now, each team is required to have a team captain, who is also a player. But I want to enforce that the team captain is also a member of that team (or, semantically equivalent, that the team captain is not redundantly in the join table).
Is there a standard way to model this? I can think of several ways that would actually get the job done, but I'm wondering if there's a standard, elegant way of doing it.
Thanks!
EDIT: Although it would be nice to have the captain a required field, I would also be content with the following condition: If the team has at least 1 member, then a captain is defined for it.
EDIT 2: OK, here's an attempt for clarification. Pardon the unnecessary "id" columns.
CREATE TABLE players (
id bigserial primary key,
name text not null
);
CREATE TABLE teams (
id bigserial primary key,
name text not null
);
CREATE TABLE leaderships (
id bigserial primary key,
team_id bigint not null references teams(id),
captain_id bigint not null references players(id),
-- Make a key.
UNIQUE (team_id,captain_id),
-- Only one leadership per team.
UNIQUE (team_id)
);
CREATE TABLE team_players (
id bigserial primary key,
team_id bigint not null,
captain_id bigint not null,
player_id bigint not null,
-- One entry per player.
UNIQUE (team_id,captain_id,player_id),
-- Valid reference to a leadership.
FOREIGN KEY (team_id,captain_id) references leaderships(team_id,captain_id),
-- Not the captain.
CHECK (player_id <> captain_id)
);

You need to learn about database design.
Find fill-in-the-(named-)blank statements that describe your application. Each statement gets a table. A table holds the rows that make a true statement.
// [player_id] is a player
player(player_id)
// [team_id] is a team
team(team_id)
// player [player_id] plays for team [team_id]
team_players(team_id,player_id)
Turns out you don't need a player_team_id. The team_players (player_id,team_id) pairs are 1:1 with them so you can use those instead. On the other hand team-player contracts are 1:1 with them so they might have a role.
Every team_players player_id is a player player_id (since every team player is a player). We say that via a FOREIGN KEY declaration (and the DBMS enforces it):
FOREIGN KEY (team_id) REFERENCES team (team_id)
FOREIGN KEY (player_id) REFERENCES player (player_id)
It's true that team_players (player_id,team_id) is unique. But more than that is true. No contained subrow is unique. This matters to database design.
A unique subrow is a "superkey". A unique subrow containing no smaller unique subrow is a "key". Use KEY for that. Any superset of key columns is unique. But SQL requires that the target of a FOREIGN KEY be explicitly declared so. Use UNIQUE for that. Traditionally in SQL you pick one key as PRIMARY KEY. This matters to some SQL functionality. (Technically, in SQL KEY means UNIQUE and PRIMARY KEY means UNIQUE NOT NULL. I.e., SQL does not enforce no-smaller-contained-unique-subrow.)
KEY (team_id,player_id)
(If you also had a team_player_id in team_players it too would be a KEY, typically the PK.)
Some players are captains. It's 1:1. So both team_id and player_id are unique.
// [player_id] captains [team_id]
team_captains(team_id,player_id)
FOREIGN KEY (team_id) REFERENCES team (team_id)
FOREIGN KEY (player_id) REFERENCES player (player_id)
KEY (team_id)
KEY (player_id)
A team-captain pair must appear as a team-player pair.
FOREIGN KEY (team_id,player_id) REFERENCES team_players (team_id,player_id)
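As a rough sketch, the declarations above might translate into PostgreSQL DDL along these lines. The bigint column types and the choice of PRIMARY KEY versus UNIQUE for each key are assumptions made for illustration; the KEY notation used in this answer is shorthand, not literal SQL.

```sql
-- [player_id] is a player
CREATE TABLE player (
  player_id bigint PRIMARY KEY
);

-- [team_id] is a team
CREATE TABLE team (
  team_id bigint PRIMARY KEY
);

-- player [player_id] plays for team [team_id]
CREATE TABLE team_players (
  team_id   bigint NOT NULL REFERENCES team (team_id),
  player_id bigint NOT NULL REFERENCES player (player_id),
  PRIMARY KEY (team_id, player_id)          -- KEY (team_id, player_id)
);

-- [player_id] captains [team_id]
CREATE TABLE team_captains (
  team_id   bigint PRIMARY KEY REFERENCES team (team_id),        -- KEY (team_id)
  player_id bigint NOT NULL UNIQUE REFERENCES player (player_id), -- KEY (player_id)
  -- a team-captain pair must appear as a team-player pair
  FOREIGN KEY (team_id, player_id) REFERENCES team_players (team_id, player_id)
);
```

With this layout a captain row can only be inserted after the matching team_players row exists, and that team_players row cannot be deleted while the captain row still refers to it.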
Your thoughts on redundancy re captains are admirable. It is true that there is a sense in which it is redundant to have the database record that a person is a team's captain and that they are on a given team.
-- instead of having team_players(team_id,player_id)
-- team_players team_players FK now to here
// player [player_id] is a non-captain on team [team_id]
team_non_captains(team_id,player_id)
FOREIGN KEY (team_id) REFERENCES team (team_id)
FOREIGN KEY (player_id) REFERENCES player (player_id)
KEY (team_id,player_id)
However, every time you wanted the players on a team you'd have to say:
-- now team_player =
// player [player_id] is a non-captain on team [team_id]
// OR player [player_id] is captain of team [team_id]
select * from team_non_captains UNION select * from team_captains
It turns out it is probably worse to have one "redundant" row per captain than it is to have one "redundant" union operation (and "redundant" human parsing of a sub-expression) per query involving a whole team. Just make the most straightforward statements.
(Avoid nulls in an initial design. They complicate table meanings and query meanings. Especially query meanings, because SQL does not evaluate expressions involving nulls in a way that means anything in particular in terms of the meanings of tables in a query, let alone "not known" or "not applicable". One uses them as an engineering tradeoff which you must learn to judge.)

This is the simplest possible design; a real-world solution could possibly involve more complexity.
a player is part of exactly one team so player.team_id is functionally dependent on player.id, and points to teams.id
a team has one captain which should be present in the players table
The captain should be part of the team, which requires an additional constraint teams.{team_id,captain_id} -> players.{team_id,id}
CREATE TABLE players
( id bigserial primary key
, team_id bigint NOT NULL -- same type as teams.id (bigserial)
, name text not null
, UNIQUE (team_id, id) -- we need this because the FK requires it
);
CREATE TABLE teams
( id bigserial primary key
, captain_id bigint not null references players(id)
, name text not null
);
ALTER TABLE players
ADD FOREIGN KEY (team_id )
REFERENCES teams(id)
;
-- captain must be part of the team
ALTER TABLE teams
ADD FOREIGN KEY (id, captain_id )
REFERENCES players( team_id, id)
;
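One practical wrinkle with this design: every player needs an existing team and every team needs an existing captain, so the first rows cannot be added with one plain INSERT at a time while both foreign keys are checked immediately. A minimal sketch of one way around this in PostgreSQL, assuming the constraints are declared DEFERRABLE INITIALLY DEFERRED (the constraint names and sample rows are hypothetical, and the single-column reference from captain_id to players(id) is left out because the composite key already covers it):

```sql
CREATE TABLE players
( id bigserial PRIMARY KEY
, team_id bigint NOT NULL
, name text NOT NULL
, UNIQUE (team_id, id)            -- target of the composite FK below
);
CREATE TABLE teams
( id bigserial PRIMARY KEY
, captain_id bigint NOT NULL
, name text NOT NULL
);
-- Hypothetical constraint names; both checks are postponed until COMMIT.
ALTER TABLE players
  ADD CONSTRAINT players_team_fk FOREIGN KEY (team_id)
  REFERENCES teams (id) DEFERRABLE INITIALLY DEFERRED;
ALTER TABLE teams
  ADD CONSTRAINT teams_captain_fk FOREIGN KEY (id, captain_id)
  REFERENCES players (team_id, id) DEFERRABLE INITIALLY DEFERRED;

BEGIN;
-- Explicit ids are used so the two rows can point at each other;
-- note that this does not advance the bigserial sequences.
INSERT INTO teams (id, captain_id, name) VALUES (1, 1, 'Red');
INSERT INTO players (id, team_id, name) VALUES (1, 1, 'Alice');
COMMIT;  -- both foreign keys are verified here
```

If either row were missing at COMMIT, the whole transaction would be rejected.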
UPDATE: it appears a player can be part of more than one team, so an N:M junction table will be needed:
CREATE TABLE players
( player_id bigserial PRIMARY KEY
, name text not null
);
CREATE TABLE teams
( team_id bigserial PRIMARY KEY
, captain_id bigint
, name text not null
);
CREATE TABLE players_teams
( player_id bigint NOT NULL REFERENCES players(player_id)
, team_id bigint NOT NULL REFERENCES teams(team_id)
, PRIMARY KEY (player_id,team_id)
);
ALTER TABLE teams
ADD FOREIGN KEY (team_id,captain_id)
REFERENCES players_teams(team_id,player_id)
;
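A short usage sketch of the junction-table design, assuming PostgreSQL and made-up sample rows. Because captain_id is nullable, a team can be created first and given a captain once that player has a membership row; with the default MATCH SIMPLE behaviour, the composite foreign key is not checked while captain_id is NULL.

```sql
CREATE TABLE players
( player_id bigserial PRIMARY KEY
, name text NOT NULL
);
CREATE TABLE teams
( team_id bigserial PRIMARY KEY
, captain_id bigint
, name text NOT NULL
);
CREATE TABLE players_teams
( player_id bigint NOT NULL REFERENCES players(player_id)
, team_id bigint NOT NULL REFERENCES teams(team_id)
, PRIMARY KEY (player_id, team_id)
);
ALTER TABLE teams
  ADD FOREIGN KEY (team_id, captain_id)
  REFERENCES players_teams(team_id, player_id);

-- Hypothetical sample data
INSERT INTO players (player_id, name) VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO teams (team_id, captain_id, name) VALUES (1, NULL, 'Red');
INSERT INTO players_teams (player_id, team_id) VALUES (1, 1);

-- Accepted: Alice is a member of team 1
UPDATE teams SET captain_id = 1 WHERE team_id = 1;

-- Rejected with a foreign key violation: Bob is not a member of team 1
UPDATE teams SET captain_id = 2 WHERE team_id = 1;
```

The same foreign key also stops the captain's membership row from being deleted while the team still names that player as captain.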

Related

Enforce referential integrity in a ternary relation

Question
Employees are organized into teams. Each team can have multiple employees, and each employee can belong to multiple teams. This many-to-many relationship is represented by the team_membership table.
Each project is assigned to one team. Projects are subdivided into tasks, and each task is assigned to an employee.
Is it possible to guarantee that a task's employee is a member of the corresponding project's team, without adding triggers or redundant columns?
Example tables
CREATE TABLE employee
(
employee_id bigserial PRIMARY KEY,
employee_name text
);
CREATE TABLE team
(
team_id bigserial PRIMARY KEY,
team_name text
);
CREATE TABLE team_membership
(
team_id bigint NOT NULL REFERENCES team,
employee_id bigint NOT NULL REFERENCES employee,
PRIMARY KEY (team_id, employee_id)
);
CREATE TABLE project
(
project_id bigserial PRIMARY KEY,
team_id bigint NOT NULL REFERENCES team,
project_name text
);
CREATE TABLE task
(
task_id bigserial PRIMARY KEY,
task_name text,
project_id bigint NOT NULL REFERENCES project,
employee_id bigint NOT NULL REFERENCES employee
);
What I have already tried
Use a trigger to check validity when data changes. This would require writing similar trigger procedures for the employee and team_membership tables.
CREATE FUNCTION check_employee_member_of_team() RETURNS trigger AS $$
BEGIN
IF NOT new.employee_id IN (
SELECT employee_id FROM team_membership tm
JOIN project pr ON tm.team_id = pr.team_id
WHERE pr.project_id = new.project_id
) THEN
RAISE EXCEPTION 'Employee is not a member of project';
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER insert_or_update_task_trigger
BEFORE INSERT OR UPDATE ON task
FOR EACH ROW EXECUTE PROCEDURE check_employee_member_of_team();
Add a team_id column to the task table, and enforce the constraint using composite foreign keys. project.team_id and task.team_id are redundant.
CREATE TABLE project
(
project_id bigserial PRIMARY KEY,
team_id bigint NOT NULL REFERENCES team,
project_name text,
UNIQUE (project_id, team_id)
);
CREATE TABLE task
(
task_id bigserial PRIMARY KEY,
task_name text,
project_id bigint NOT NULL,
team_id bigint NOT NULL,
employee_id bigint NOT NULL REFERENCES employee,
FOREIGN KEY (project_id, team_id) REFERENCES project (project_id, team_id),
FOREIGN KEY (team_id, employee_id) REFERENCES team_membership (team_id, employee_id)
);
Add a redundant team_id to task and add the new column to the foreign key referencing project. This requires a redundant unique constraint on project(team_id, project_id).
Then create a foreign key constraint from task to team_membership. Then the task can only be assigned to a team member.
The following solution, a variation on that of @LaurenzAlbe, reduces the redundancy while maintaining all the constraints:
employee (empl_id, empl_name), primary key empl_id
team (team_id, team_name), primary key team_id
team_membership (empl_id, team_id),
primary key (empl_id, team_id),
foreign key empl_id references employee,
foreign key team_id references team
project (project_num, team_id, project_name)
primary key(project_num, team_id)
foreign key team_id references team
task (task_num, project_num, team_id, empl_id, task_name)
primary key(task_num, project_num, team_id)
foreign key (project_num, team_id) references project,
foreign key (empl_id, team_id) references team_membership
Note that project_num is a sequential number internal to each team (so different teams can have the same set of numbers for their projects), while task_num is a sequential number internal to each project.
Finally, note that, as pointed out in the comment by @DamirSudarevic, the foreign key to team_membership is enough to ensure that empl_id and team_id refer to an existing employee and team, respectively.
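For concreteness, here is one way the relational outline above might be written as PostgreSQL DDL. The column types are assumptions (the outline does not give any), and project_num and task_num are plain integers here because they are not globally unique surrogate keys.

```sql
CREATE TABLE employee (
  empl_id   bigserial PRIMARY KEY,
  empl_name text
);
CREATE TABLE team (
  team_id   bigserial PRIMARY KEY,
  team_name text
);
CREATE TABLE team_membership (
  empl_id bigint NOT NULL REFERENCES employee,
  team_id bigint NOT NULL REFERENCES team,
  PRIMARY KEY (empl_id, team_id)
);
CREATE TABLE project (
  project_num  integer NOT NULL,
  team_id      bigint  NOT NULL REFERENCES team,
  project_name text,
  PRIMARY KEY (project_num, team_id)
);
CREATE TABLE task (
  task_num    integer NOT NULL,
  project_num integer NOT NULL,
  team_id     bigint  NOT NULL,
  empl_id     bigint  NOT NULL,
  task_name   text,
  PRIMARY KEY (task_num, project_num, team_id),
  -- the task belongs to a project of this team
  FOREIGN KEY (project_num, team_id) REFERENCES project (project_num, team_id),
  -- the assigned employee is a member of that same team
  FOREIGN KEY (empl_id, team_id) REFERENCES team_membership (empl_id, team_id)
);
```

Since task.team_id takes part in both composite foreign keys, a task can only name an employee who is a member of the team that owns the project, and no extra UNIQUE constraint on project is needed.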
Normalisation to the n-th degree might not always be a very good idea. Storage is far cheaper now than in the days of Boyce–Codd, while the costs of additional complexity and of operations are higher by orders of magnitude. Avoiding triggers and similar native features will make the solution more portable. Let's say that a few years from now your business is super successful and you have to scale this out to a NoSQL solution or a platform where triggers are not supported.
It might be wise to drop the purist pursuit of too much normalisation and look at a more pragmatic solution. There are many ways to skin this cat using foreign keys alone. The other answers are also good in this respect. One way I can think of is below, and I haven't denormalised too much yet:
CREATE TABLE employee
(
employee_id bigserial PRIMARY KEY,
employee_name text
);
CREATE TABLE team
(
team_id bigserial PRIMARY KEY,
team_name text
);
CREATE TABLE team_membership
(
team_id bigint NOT NULL REFERENCES team,
employee_id bigint NOT NULL REFERENCES employee,
-- PRIMARY KEY (team_id, employee_id)
team_membership_id bigserial PRIMARY KEY
);
CREATE TABLE project
(
project_id bigserial PRIMARY KEY,
-- team_id bigint NOT NULL REFERENCES team,
project_name text
);
CREATE TABLE project_team
(
project_team_id bigserial PRIMARY KEY,
project_id bigint REFERENCES project,
team_membership_id bigint REFERENCES team_membership
);
CREATE TABLE task
(
task_id bigserial PRIMARY KEY,
task_name text,
project_team_id bigint NOT NULL REFERENCES project_team
);
Add a surrogate key to team_membership, called team_membership_id
Remove team_id from project table
Create project-team mapping tables because probably not all employees in a team may be in a project. Also modify the task table.

relational model query for simple person-car example

I'd like some insight on a relational model I have created for PostgreSQL. It has to do with person and car relationships.
CREATE TABLE "person" (
"id" serial NOT NULL PRIMARY KEY,
"name" varchar(300) NOT NULL,
"car_id" integer REFERENCES car (id));
CREATE TABLE "car" (
"id" serial NOT NULL PRIMARY KEY,
"type" varchar(50) NOT NULL);
CREATE TABLE "car_person_relations" (
"id" serial NOT NULL PRIMARY KEY,
"car_type" varchar(50) NOT NULL REFERENCES "car" ("type"),
"person_id" integer NOT NULL REFERENCES "person" ("id"));
Ultimately, I'd like to get the most popular car type based on how many people have it,
i.e. how many "person"s it is associated with. What query can I use to achieve this? And is this relational table (car_person_relations) sufficient to achieve it?
Any insight would be much appreciated
If I understand correctly, this is a simple aggregation with some limiting:
select car_type, count(*) as num_persons
from car_person_relations cpr
group by car_type
order by count(*) desc
limit 1;
First off, take a look at your data model; it has a couple of fundamental problems.
In the CAR_PERSON_RELATIONS table you are attempting to create a FK on car_type. However, to do so car_type would have to be
unique in the car table. From the documentation, section 5.4.5, Foreign Keys:
A foreign key constraint specifies that the values in a column (or a group of columns) must match the values
appearing in some row of another table. We say this maintains the
referential integrity between two related tables. ... A foreign key
must reference columns that either are a primary key or form a unique
constraint.
But this would mean there could only be 1 car for a given type in the
table.
Now look at the PERSON table. This table contains a FK to car_id, thus establishing a M:1 to CAR. Basically this relationship
says "A person can own only 1 car, but a car can be owned by many persons". Your description and table name indicate
this is NOT the relationship you are trying to define.
Correcting requires the following changes:
Add car_id to CAR_PERSON_RELATIONS.
Remove car_id from PERSON.
Remove car_type from CAR_PERSON_RELATIONS.
Redefine the CAR_PERSON_RELATIONS PK as (car_id, person_id). Alternatively,
leave the surrogate PK and create a UK on (car_id, person_id).
Create the FKs from CAR_PERSON_RELATIONS to CAR and PERSON.
Revised Model
create table person (
id serial
,name varchar(300) not null
,constraint person_pk primary key (id)
);
create table car (
id serial
,type varchar(50) not null
,constraint car_pk primary key (id)
);
create table car_person_relations (
car_id integer
,person_id integer
,constraint car_person_relations_pk primary key (car_id, person_id)
,constraint car_person_relations_2_car_fk
foreign key (car_id)
references car (id)
,constraint car_person_relations_2_person_fk
foreign key (person_id)
references person (id)
);
And then the necessary query becomes:
select c.type as car_type, count(*) as num_persons
from car_person_relations cpr
join car c
on cpr.car_id = c.id
group by c.type
order by count(*) desc
limit 1;

Composite key for a one-to-many relationship?

For a store I have many store_offers which is a one-to-many relationship.
However, for a table
create table store (
id bigserial primary key
);
I can use a single primary key id (SQLfiddle):
create table store_offer (
id bigserial primary key,
store_id bigint not null,
constraint fk__store_offer__store
foreign key (store_id)
references store(id)
);
or a composite primary key (id, store_id) (SQLFiddle):
create table store_offer (
id bigserial not null,
store_id bigint not null,
constraint fk__store_offer__store
foreign key (store_id)
references store(id),
primary key(id, store_id)
);
My question is "what makes more sense here?" IMHO the composite key should be the correct way to do it, since a store_offer is actually "bound" to a store. But one can argue that this is already the case, since the first version has a foreign key. On the other hand, a store_offer primary key must not change once it's created. You have to create a new store_offer and delete the old one if you want to discard one. But you cannot simply change store_id in the second approach.
So what is the correct answer here?
Using primary key(id, store_id) is a bad idea. This will make many queries more complicated and more prone to error. It sounds like what you are really trying to make is a many-to-many relationship between stores and offers. If this is the case you should have a store table with unique store_id as a primary key, an offer table with unique offer_id as a primary key and a store_offer table that has a primary key of store_id and offer_id.
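If the relationship really is many-to-many, the three tables described above could be sketched in PostgreSQL as follows. The offer table and the store_id / offer_id column names follow the wording of this answer rather than the question's schema, so treat them as assumptions.

```sql
CREATE TABLE store (
  store_id bigserial PRIMARY KEY
);
CREATE TABLE offer (
  offer_id bigserial PRIMARY KEY
);
-- Junction table: one row per (store, offer) pairing
CREATE TABLE store_offer (
  store_id bigint NOT NULL REFERENCES store (store_id),
  offer_id bigint NOT NULL REFERENCES offer (offer_id),
  PRIMARY KEY (store_id, offer_id)
);
```

If each offer belongs to exactly one store, as the question states, the first version in the question (single-column id plus a store_id foreign key) already models that.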

How to represent partially nullable foreign keys?

Say that you want to store contact phone numbers, people, and households in a database. Every person belongs to exactly one household. A phone number may be associated with a particular individual in a household, or may be a general number for the household. These relationships are partially expressed in the following Oracle SQL:
CREATE TABLE HOUSEHOLD (
HOUSEHOLD_ID INTEGER PRIMARY KEY
);
CREATE TABLE PERSON (
PERSON_ID INTEGER PRIMARY KEY,
HOUSEHOLD_ID INTEGER NOT NULL,
CONSTRAINT FK_PERSON_HOUSEHOLD
FOREIGN KEY (HOUSEHOLD_ID)
REFERENCES HOUSEHOLD (HOUSEHOLD_ID)
);
CREATE TABLE CONTACT_PHONE (
PHONE_NUMBER CHAR(10) PRIMARY KEY,
HOUSEHOLD_ID INTEGER NOT NULL,
PERSON_ID INTEGER NULL,
CONSTRAINT FK_PHONE_HOUSEHOLD
FOREIGN KEY (HOUSEHOLD_ID)
REFERENCES HOUSEHOLD (HOUSEHOLD_ID),
CONSTRAINT FK_PHONE_PERSON
FOREIGN KEY (PERSON_ID)
REFERENCES PERSON (PERSON_ID)
);
The foreign keys and NULL/NOT NULL constraints ensure that every person belongs to exactly one household, that every contact phone is associated with exactly one household, and that a contact phone may or may not be associated with person. One thing that they do not prevent is a phone number that is associated with one household, and with a person who belongs to a different household. Is there a standard way to express this kind of relationship using database constraints? The example given is for Oracle, but solutions for other database platforms would be welcome, as well.
We want to have a foreign key to the HOUSEHOLD_ID and PERSON_ID columns of the PERSON table, but don't want it to be checked if the PERSON_ID column of the CONTACT_PHONE table is NULL. The solution is to create a virtual/computed column that replicates HOUSEHOLD_ID, but only when PERSON_ID is not NULL, then use it in the foreign key instead of HOUSEHOLD_ID:
CREATE TABLE CONTACT_PHONE (
PHONE_NUMBER CHAR(10) PRIMARY KEY,
HOUSEHOLD_ID INTEGER NOT NULL,
PERSON_ID INTEGER NULL,
PERSON_HOUSEHOLD_ID GENERATED ALWAYS AS (
CAST(DECODE(PERSON_ID, NULL, NULL, HOUSEHOLD_ID) AS INTEGER)
) VIRTUAL,
CONSTRAINT FK_PHONE_HOUSEHOLD
FOREIGN KEY (HOUSEHOLD_ID)
REFERENCES HOUSEHOLD (HOUSEHOLD_ID),
CONSTRAINT FK_PHONE_PERSON
FOREIGN KEY (PERSON_HOUSEHOLD_ID, PERSON_ID)
REFERENCES PERSON (HOUSEHOLD_ID, PERSON_ID)
);
In this way, when PERSON_ID is not NULL, PERSON_HOUSEHOLD_ID will be the same as HOUSEHOLD_ID, and FK_PHONE_PERSON will be checked normally.
But, when PERSON_ID is NULL, PERSON_HOUSEHOLD_ID will also be NULL. Since both of the local columns participating in FK_PHONE_PERSON are NULL, the constraint will not be checked.
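One prerequisite worth spelling out: a foreign key has to reference a primary key or unique key, so PERSON needs a unique constraint on (HOUSEHOLD_ID, PERSON_ID) before the CONTACT_PHONE definition above will compile. A minimal sketch, assuming the HOUSEHOLD and PERSON tables from the question; the constraint name UQ_PERSON_HOUSEHOLD and the sample rows are hypothetical.

```sql
-- Run before creating CONTACT_PHONE: gives FK_PHONE_PERSON a key to reference
ALTER TABLE PERSON
  ADD CONSTRAINT UQ_PERSON_HOUSEHOLD UNIQUE (HOUSEHOLD_ID, PERSON_ID);

-- After CONTACT_PHONE has been created as shown above:
INSERT INTO HOUSEHOLD (HOUSEHOLD_ID) VALUES (1);
INSERT INTO HOUSEHOLD (HOUSEHOLD_ID) VALUES (2);
INSERT INTO PERSON (PERSON_ID, HOUSEHOLD_ID) VALUES (10, 1);

-- Accepted: general household number, no person attached
INSERT INTO CONTACT_PHONE (PHONE_NUMBER, HOUSEHOLD_ID, PERSON_ID)
  VALUES ('5550000001', 1, NULL);

-- Accepted: person 10 belongs to household 1
INSERT INTO CONTACT_PHONE (PHONE_NUMBER, HOUSEHOLD_ID, PERSON_ID)
  VALUES ('5550000002', 1, 10);

-- Rejected by FK_PHONE_PERSON: person 10 does not belong to household 2
INSERT INTO CONTACT_PHONE (PHONE_NUMBER, HOUSEHOLD_ID, PERSON_ID)
  VALUES ('5550000003', 2, 10);
```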

What is FOREIGN KEY for?

let's say i have 2 tables:
Department(depNum)
Worker(id,deptNum) - key should be id
now i want dept to reference an existing value in Department.
so i write in create table:
CREATE TABLE Worker(
id integer primary key,
dept integer references Department);
my question is, i've seen in many examples that you also put foreign key with the references statement. i don't understand what is primary key for.
does it mean that dept will be also a key on Worker?
thank you
From Wikipedia:
A primary key is a combination of columns which uniquely specify a
row. It is a special case of unique keys. . . . Primary keys were
added to the SQL standard mainly as a convenience to the application
programmer.
You cannot reference a record in a table without a primary key. A foreign key lets you reference a record in another table within an individual record. This foreign key is usually referencing the primary key in the foreign table.
This post has a lot of great information. In particular, check out the highest ranked answer for a bullet list of do's and do not's.
What's wrong with foreign keys?
This post gives a pretty decent explanation, given the poster's
original example:
What will these foreign keys do?
Let's say that each worker can only work in one department at any one time.
So each department has its own unique ID. This is the department's primary key because two departments should never have the same id.
Now, each individual worker must be tracked so they are also assigned their own unique ID. This is their primary key. You need to link the worker to the department that they work in and since they can only work in one department at a time, you can have their department as a foreign key. The foreign key in the worker table is linked to the ID of the department table.
This has more information: http://www.1keydata.com/sql/sql-foreign-key.html
You have two tables:
PLAYER,
primary key (unique) PK_player_id
player_name
foreign key to TEAM.team_id FK_team_id
TEAM
primary key (unique) PK_team_id
team_name
Every PLAYER is in exactly one TEAM.
The PLAYER has a FOREIGN KEY to the TEAM (FK_team_id). You can also delete the TEAM, which will delete all players in it by cascading (if configured).
Now you can't create a player without an existing TEAM, because the database ensures this.
EDIT:
Didn't you ask for the foreign key?
The primary key is one or more columns that identify one data row within your database. If you want to create a foreign key, you have to reference a column (or more than one column) that is unique.
In my example, there is a unique key (the primary key) for every table, because the name may change. To identify the 'target' of the foreign key, it has to be unique, so it is likely that you will use the primary key of the second table (TEAM.PK_team_id).
I am not clear, the requirement should be
Department(dept)
Worker(id,dept) - key should be id
which means dept is the primary key in Department and foreign key in worker.
the foreign key is not unique in worker table but it is unique in Department Table.
The worker table cannot have some unknown department which is not defined in the Department.
Did I make sense ?
To ensure the integrity of the tables, not allowing you to enter values in the table (Worker) without referencing an existing row (at Department)
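A small illustration of that integrity guarantee, using the question's Worker table. The Department definition, its dept column name, and the sample values are assumptions, since the question does not show them.

```sql
CREATE TABLE Department (
  dept integer PRIMARY KEY
);
CREATE TABLE Worker (
  id   integer PRIMARY KEY,
  dept integer REFERENCES Department   -- foreign key to Department's primary key
);

INSERT INTO Department (dept) VALUES (10);

-- Accepted: department 10 exists
INSERT INTO Worker (id, dept) VALUES (1, 10);

-- Rejected with a foreign key violation: there is no department 99
INSERT INTO Worker (id, dept) VALUES (2, 99);

-- Accepted: a NULL dept is not checked against Department
INSERT INTO Worker (id, dept) VALUES (3, NULL);
```

Note that dept does not become a key of Worker; many workers can share the same department.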
According to the SQL-92 Standard:
A foreign key (FK) may reference either a PRIMARY KEY or a UNIQUE CONSTRAINT. In the case of a PRIMARY KEY, the referenced columns may be omitted from the foreign key declaration e.g. the following three are all valid:
CREATE TABLE Department ( dept INTEGER NOT NULL PRIMARY KEY, ...);
CREATE TABLE Worker (dept INTEGER REFERENCES Department, ...);
CREATE TABLE Department ( dept INTEGER NOT NULL PRIMARY KEY, ...);
CREATE TABLE Worker (dept INTEGER REFERENCES Department (dept), ...);
CREATE TABLE Department ( dept INTEGER NOT NULL UNIQUE, ...);
CREATE TABLE Worker (dept INTEGER REFERENCES Department (dept), ...);
The following is not valid:
CREATE TABLE Department ( dept INTEGER NOT NULL UNIQUE, ...);
CREATE TABLE Worker (dept INTEGER REFERENCES Department, ...);
...because the referenced columns involved in the foreign key must be declared.
When declaring a simple (single-column) FK in-line the FOREIGN KEY keywords are omitted as above.
A composite (multiple-column) FK cannot be declared in-line, and a simple FK need not be declared in-line. In these cases, the referencing column(s) and the FOREIGN KEY keywords are required (the rules for the referenced columns remain the same as stated earlier). Here are just a few examples:
CREATE TABLE Department ( dept INTEGER NOT NULL PRIMARY KEY, ...);
CREATE TABLE Worker (dept INTEGER, FOREIGN KEY (dept) REFERENCES Department, ...);
CREATE TABLE Department ( dept INTEGER NOT NULL UNIQUE, ...);
CREATE TABLE Worker (dept INTEGER, FOREIGN KEY (dept) REFERENCES Department (dept), ...);
CREATE TABLE DepartmentHistory
(
dept INTEGER NOT NULL,
dt DATE NOT NULL,
PRIMARY KEY (dt, dept),
...
);
CREATE TABLE Worker
(
dept INTEGER NOT NULL,
dept_dt DATE NOT NULL,
FOREIGN KEY (dept_dt, dept) REFERENCES DepartmentHistory,
...
);