In a SQL database, when should a one-to-one relationship be in the same table and when in separate tables?

Can anyone provide some examples of when in a SQL database it's a better choice to keep one-to-one relationships on the same table, and when instead it makes more sense to have them on separate tables?

Use separate tables when you have several entities which all must be able to act as a foreign key to another entity, the "several entities" have both common properties and unique properties, and you want a NOT NULL constraint on the unique properties (or, less importantly, you don't want a bunch of NULL values for the unique properties that don't apply to the other entity). Even if you didn't have the unique/common properties and didn't care about the NULL values, you might still wish to do so if you wanted individual foreign key constraints on each subtype table as well as on the supertype table. This strategy is called supertype/subtype modelling.
Let me give you an example.
people
    id (PK)
    name
    age
teachers
    id (PK, and FK to people.id)
    years_teaching NOT NULL
    whatever NOT NULL
students
    id (PK, and FK to people.id)
    grade NOT NULL
    whatever NOT NULL
As you see, teachers and students can have a single common table for some of the properties and can each have their own NOT NULL unique properties. Furthermore, you can JOIN people, teachers, and students to other tables and keep referential integrity.
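A minimal sketch of the layout above, assuming SQLite driven from Python's sqlite3 module (the engine choice, the column types, and the add_teacher helper are assumptions for illustration, and the placeholder "whatever" columns are left out). It shows the subtype table rejecting a row that has no matching people row, and a row that leaves a NOT NULL subtype property empty:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked
conn.executescript("""
CREATE TABLE people (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    age  INTEGER
);
-- Subtype tables share the supertype's primary key (one-to-one).
CREATE TABLE teachers (
    id             INTEGER PRIMARY KEY REFERENCES people(id),
    years_teaching INTEGER NOT NULL
);
CREATE TABLE students (
    id    INTEGER PRIMARY KEY REFERENCES people(id),
    grade INTEGER NOT NULL
);
""")

def add_teacher(person_id, years_teaching):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO teachers (id, years_teaching) VALUES (?, ?)",
                (person_id, years_teaching),
            )
        return True
    except sqlite3.IntegrityError:
        return False

with conn:
    conn.execute("INSERT INTO people (id, name, age) VALUES (1, 'Ada', 40)")
    conn.execute("INSERT INTO people (id, name, age) VALUES (2, 'Ben', 35)")

accepted = add_teacher(1, 12)       # person 1 exists, property supplied
orphan = add_teacher(99, 3)         # no people row with id 99
incomplete = add_teacher(2, None)   # NOT NULL property missing
```

Here only the first call succeeds; the other two are stopped by the foreign key and the NOT NULL constraint respectively.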
Another application might be if you had separate databases, with some of each record's properties in one and some in the other; however, I have never done this.

Related

Is unique foreign keys across multiple tables via normalization and without null columns possible?

This is a relational database design question, not specific to any RDBMS. A simplified case:
I have two tables, Cars and Trucks. They both have a column, say RegistrationNumber, that must be unique across the two tables.
This could probably be enforced with some insert/update triggers, but I'm looking for a more "clean" solution if possible.
One way to achieve this could be to add a third table, Vehicles, that holds the RegistrationNumber column, then add two additional columns to Vehicles, CarID and TruckID. But then for each row in Vehicles, one of the columns CarID or TruckID would always be NULL, because a RegistrationNumber applies to either a Car or a Truck leaving the other column with a NULL value.
Is there any way to enforce a unique RegistrationNumber value across multiple tables, without introducing NULL columns or relying on triggers?
This is a bit complicated. Having the third table, Vehicles is definitely part of the solution. The second part is guaranteeing that a vehicle is either a car or a truck, but not both.
One method is the "list-all-the-possibilities" method. This is what you propose with two columns. In addition, this should have a constraint to verify that only one of the ids is filled in. A similar approach is to have the CarId and TruckId actually be the VehicleId. This reduces the number of different ids floating around.
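As a rough sketch of the "only one of the ids is filled in" constraint, assuming SQLite via Python's sqlite3 module: the CHECK below accepts a Vehicles row only when exactly one of CarID and TruckID is set. The table layouts, the UNIQUE constraints on the two id columns, and the register helper are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked
conn.executescript("""
CREATE TABLE Cars   (CarID   INTEGER PRIMARY KEY);
CREATE TABLE Trucks (TruckID INTEGER PRIMARY KEY);
CREATE TABLE Vehicles (
    RegistrationNumber TEXT NOT NULL PRIMARY KEY,
    CarID   INTEGER UNIQUE REFERENCES Cars(CarID),
    TruckID INTEGER UNIQUE REFERENCES Trucks(TruckID),
    -- exactly one of the two ids must be filled in
    CHECK ((CarID IS NOT NULL AND TruckID IS NULL)
        OR (CarID IS NULL AND TruckID IS NOT NULL))
);
INSERT INTO Cars (CarID) VALUES (1);
INSERT INTO Trucks (TruckID) VALUES (1);
""")

def register(registration, car_id, truck_id):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Vehicles (RegistrationNumber, CarID, TruckID) "
                "VALUES (?, ?, ?)",
                (registration, car_id, truck_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False

car_only = register("ABC-1", 1, None)    # exactly one id: accepted
both = register("ABC-2", 1, 1)           # both ids: rejected by CHECK
neither = register("ABC-3", None, None)  # no id: rejected by CHECK
duplicate = register("ABC-1", None, 1)   # registration already used
```

The NULL column from the question is still there; the CHECK only guarantees that it is never the wrong one.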
Another approach uses composite keys. The idea is:
create table Vehicles (
    VehicleId int primary key,
    Registration varchar(255),
    Type varchar(255),
    constraint chk_type check (Type in ('car', 'truck')),
    constraint unq_type_Vehicle unique (Type, VehicleId), -- this is redundant, but necessary
    . . .
);
create table car (
    VehicleId int,
    Type varchar(255), -- always 'car' in this table
    constraint chk_car_type check (Type = 'car'), -- enforces the comment above
    constraint fk_car_vehicle foreign key (VehicleId) references Vehicles(VehicleId),
    constraint fk_car_vehicle_type foreign key (Type, VehicleId) references Vehicles(Type, VehicleId)
);
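To see the composite-key idea in action, here is a small sketch assuming SQLite through Python's sqlite3 module. The Truck table, the DEFAULT values, the UNIQUE constraint on Registration, and the try_insert helper are additions for the illustration and are not part of the answer's schema. Because each subtype table pins Type to a single value and references (Type, VehicleId), a vehicle registered as a car cannot also get a Truck row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked
conn.executescript("""
CREATE TABLE Vehicles (
    VehicleId    INTEGER PRIMARY KEY,
    Registration TEXT NOT NULL UNIQUE,
    Type         TEXT NOT NULL CHECK (Type IN ('car', 'truck')),
    UNIQUE (Type, VehicleId)  -- target of the composite foreign keys
);
CREATE TABLE Car (
    VehicleId INTEGER PRIMARY KEY,
    Type      TEXT NOT NULL DEFAULT 'car' CHECK (Type = 'car'),
    FOREIGN KEY (Type, VehicleId) REFERENCES Vehicles(Type, VehicleId)
);
CREATE TABLE Truck (
    VehicleId INTEGER PRIMARY KEY,
    Type      TEXT NOT NULL DEFAULT 'truck' CHECK (Type = 'truck'),
    FOREIGN KEY (Type, VehicleId) REFERENCES Vehicles(Type, VehicleId)
);
INSERT INTO Vehicles (VehicleId, Registration, Type) VALUES (1, 'ABC-123', 'car');
""")

def try_insert(sql, params=()):
    """Return True if the statement was accepted, False if a constraint rejected it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

as_car = try_insert("INSERT INTO Car (VehicleId) VALUES (1)")
# ('truck', 1) does not exist in Vehicles, so the composite FK fails.
as_truck = try_insert("INSERT INTO Truck (VehicleId) VALUES (1)")
# The registration is unique across all vehicle types.
reused_registration = try_insert(
    "INSERT INTO Vehicles (VehicleId, Registration, Type) VALUES (2, 'ABC-123', 'truck')"
)
```

Only the Car insert goes through; the other two statements are rejected without any trigger.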
See the following tags: class-table-inheritance shared-primary-key
You have already outlined class table inheritance in your question. The tag will just add a few details, and show you some other questions whose answers may help.
Shared primary key is a handy way of enforcing the one-to-one nature of IS-A relationships such as the relationship between a vehicle and a truck. It also allows a foreign key in some other table to reference a vehicle and also a truck or a car, as the case may be.
You can add the third table Vehicles containing a single column RegistrationNumber on which you apply the unique constraint, then in the existing tables, Cars and Trucks, make RegistrationNumber a foreign key referencing the Vehicles table. In this way you don't need an extra id, you avoid the null problem, and you enforce the uniqueness of the registration number.
Update - this solution doesn't prevent a car and a truck from sharing the same registration number. To enforce this constraint you need to add either triggers or logic beyond plain SQL. Otherwise you may want to take a look at Gordon's solution, which involves composite foreign keys.

Best database design for multiple entity types

I'm working on a web app and I have to design its database. There's a part that wasn't very straightforward to me, so after some thinking and research I came up with multiple ideas. Still, neither seems completely suitable, so I'm not sure which one to implement and why.
The simplified problem looks as follows:
I have a table Teacher. There are 2 types of teachers, according to the relations with their Fields and Subjects:
A Teacher that's related to a Field; the Field must in turn be related to a Category
A Teacher that's not related to a Field, but directly to a Category
My initial idea was to have two nullable foreign keys, one to the table Field, and the other to the table Category. But in this case, how can I make sure that exactly one is null, and the other one is not?
The other idea is to create a hierarchy, with two types of Teacher tables derived from the table Teacher (is-a relation), but I couldn't find any useful tutorial on this.
I'm developing the app using Django with an SQLite database.
OK, your comment made it much clearer:
If a Teacher belongs to exactly one category, you should keep this in the Teacher table directly.
Secondly, each teacher belongs to "one or zero" fields. If you are sure this will never change, you should use a nullable FieldID column, which is either set or left empty.
Category (CategoryID, Name, ...)
Field (FieldID,Name,...)
Teacher (TeacherID,FieldID [NULL FK],CategoryID [NOT NULL FK], FirstName, Lastname, ...)
Remark: This is almost the same as the mapping table from my earlier answer. The only difference is that you'll have a strict limitation with your "exactly one" or "exactly none or one"... From my experience I'd still prefer the open approach. It is easy to enforce your rules with unique indexes that include the TeacherID column. Sooner or later you'll probably have to restructure this...
As you continue, one category is related to "zero or more" fields. There are two approaches:
Add a CategoryID column to the Field table (NOT NULL FK). This way you define a field several times with differing CategoryIDs (combined unique index!). You get a category's list of fields simply by querying the Field table for all fields with the given CategoryID.
Better, in my eyes, would be a mapping table CategoryField. If you enforce a unique FieldID, you guarantee that no field is mapped twice. Also add a unique index on the combination of CategoryID and FieldID...
A SELECT could be something like this (SQL Server Syntax, untested):
SELECT Teacher.TeacherID
,Teacher.FieldID --might be NULL
,Teacher.CategoryID --never NULL
,Teacher.[... Other columns ...]
,Field.Name --might be NULL
--The following columns you pick from the right source,
--depending on the return value of the LEFT JOIN to Field and the related "catField"
--the directly joined "Category" (which is never NULL) is the "default"
,ISNULL(catField.CategoryID,Category.CategoryID) AS ResolvedCategoryID
,ISNULL(catField.Name,Category.Name) AS ResolvedCategoryName
,[... Other columns ...]
FROM Teacher
INNER JOIN Category ON Teacher.CategoryID=Category.CategoryID --never NULL
LEFT JOIN Field ON Teacher.FieldID=Field.FieldID --might be NULL
LEFT JOIN Category AS catField ON Field.CategoryID=catField.CategoryID
This was the answer before the EDIT:
I try to help you even if the concept is not absolutely clear to me
Teacher-Table: TeacherID, person's data (name, address...), ...
Category-Table: CategoryID, category title, ...
Field-Table: FieldID, field title, ...
You say that fields are bound to a category in all cases. If this is the same category in all cases, you should set the category as an FK column in the Field table. If there is the slightest chance that a field's category could differ with the context, you should not...
Same with teachers: if a teacher is always bound to one single category, set an FK column within the Teacher table; otherwise don't.
You'll be most flexible with at least one mapping table:
(SQL Server Syntax)
CREATE TABLE TeacherFieldCategory
(
    --A primary key to identify this row. This is not actually needed, but it will serve as the clustered index and as a lookup key...
    TeacherFieldCategoryID INT IDENTITY NOT NULL CONSTRAINT PK_TeacherFieldCategory PRIMARY KEY
    --Must be set
    ,TeacherID INT NOT NULL CONSTRAINT FK_TeacherFieldCategory_TeacherID FOREIGN KEY REFERENCES Teacher(TeacherID)
    --Field may be left NULL
    ,FieldID INT NULL CONSTRAINT FK_TeacherFieldCategory_FieldID FOREIGN KEY REFERENCES Field(FieldID)
    --Must be set. This makes sure that a teacher always has a category and, if the field is set, that the field has a category
    ,CategoryID INT NOT NULL CONSTRAINT FK_TeacherFieldCategory_CategoryID FOREIGN KEY REFERENCES Category(CategoryID)
);
--This unique index ensures that each combination exists only once.
CREATE UNIQUE INDEX IX_TeacherFieldCategory_UniqueCombination ON TeacherFieldCategory(TeacherID,FieldID,CategoryID);
It could be a better concept to have a mapping table FieldCategory and have the mapping table above reference it through a foreign key. Doing so, you could avoid invalid field-category combinations.
Hope this helps...

SQL Table Design: Multiple foreign key columns or general "fkName" and "fkValue" columns

Given a table (Contacts) which could apply to distinct items in a database (Employers, Churches, Hospitals, Government Groups, etc.) which are stored in different tables, when leveraging this single contacts table in the end I've found there exist two choices for relating a contact back to one particular "item"
One column for each "item" type with a Foreign Key association, this results in a table looking like:
contactID empID churchID hospID govID conFN conLN ...
One column indicating the type of "item" (fkName) and one column for the value corresponding to the item of that type (fkValue). This results in a table looking like:
contactID fkName fkValue conFN conLN ...
The first means that out of the X possible foreign keys, X-1 will be NULL, but I get the advantages of hard-associated foreign keys.
The second means that I can set fkName and fkValue as NOT NULL but I don't get the advantages of DB-supported foreign keys.
Ultimately, is there a "right" answer? Are there other advantages / disadvantages that I haven't thought about (performance, security, growth/expansion)?
The second approach is an anti-pattern.
You need to set up many-to-many relationship tables between each entity (Hospitals, Churches, Employers, Government Groups, etc.) and Contacts.
If you want to make it easier to query for all of the entities a contact is related to, consider creating a view on top of the many-to-many relationship tables.
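A sketch of what those relationship tables and the view could look like, assuming SQLite via Python's sqlite3 module and showing only two of the entity types. The junction table names, the view name ContactEntities, and the sample rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked
conn.executescript("""
CREATE TABLE Contacts  (contactID INTEGER PRIMARY KEY, conFN TEXT, conLN TEXT);
CREATE TABLE Employers (empID  INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Hospitals (hospID INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- One many-to-many table per entity type, each with real foreign keys.
CREATE TABLE EmployerContacts (
    empID     INTEGER NOT NULL REFERENCES Employers(empID),
    contactID INTEGER NOT NULL REFERENCES Contacts(contactID),
    PRIMARY KEY (empID, contactID)
);
CREATE TABLE HospitalContacts (
    hospID    INTEGER NOT NULL REFERENCES Hospitals(hospID),
    contactID INTEGER NOT NULL REFERENCES Contacts(contactID),
    PRIMARY KEY (hospID, contactID)
);

-- A single place to ask "which entities is this contact related to?"
CREATE VIEW ContactEntities AS
    SELECT contactID, 'employer' AS entityType, empID AS entityID
    FROM EmployerContacts
    UNION ALL
    SELECT contactID, 'hospital' AS entityType, hospID AS entityID
    FROM HospitalContacts;

-- Hypothetical sample data.
INSERT INTO Contacts  VALUES (1, 'Bob', 'Smith');
INSERT INTO Employers VALUES (10, 'Acme');
INSERT INTO Hospitals VALUES (20, 'General');
INSERT INTO EmployerContacts VALUES (10, 1);
INSERT INTO HospitalContacts VALUES (20, 1);
""")

rows = conn.execute(
    "SELECT entityType, entityID FROM ContactEntities "
    "WHERE contactID = ? ORDER BY entityType",
    (1,),
).fetchall()
print(rows)
```

No column in Contacts is NULL by design, and adding a new entity type means adding one junction table and one branch to the view.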
I think the first option is better, as it will allow you to maintain the referential integrity of your database using the built-in SQL features (foreign keys), rather than relying on your code to maintain it.
This is the solution that you should be going towards:
type
----------------
typeId  name
1       hospital
2       church
contact
-----------------------------------------
contactId  firstName  LastName  typeId (fk)
1          bob        is        1
2          your       uncle     2
If Bob can be a contact for more than one type, then you will need a junction table.

What's the proper name for an "add-on" table?

I have a table a with primary key id and a table b that represents a specialized version of a (it has all the same characteristics to track as a does, plus some specific to its b-ness--the latter are all that are stored in b). If I decide to represent this by having b's primary key be also a foreign key to a.id, what's the proper terminology for b in relation to a?
A real world example might be a person table with student and teacher add-on tables. A student might also be a teacher (a TA for example) but they're both the same person.
I would call it a 'child table' of a but I already use that as a synonym for 'detail table', like lines on a purchase order, for example.
Your design sounds like Concrete Table Inheritance.
I'd call table B a concrete table that extends table A.
The relationship is one-to-one.
Other answers have suggested storing only the columns specific to the extended table. This design would be called Class Table Inheritance.
OK, this is sort of off topic, but first things first: why does B have all of A's columns? It should only have the added columns, ESPECIALLY if you are referencing A with a foreign key.
"Add on" records are usually called "Detail(s)"
For example, let's say my Table A is "Cars"; my Table B would be "CarDetails".
As Neil N said, you shouldn't have the columns in both places if you're referencing table A in table B through a foreign key.
What you're describing sounds a bit like a parallel to inheritance in object-oriented programming. Personally, I don't use any specific naming convention in this case. I name A what it is and I name B what it is. For example, I might have:
CREATE TABLE People
(
people_id INT NOT NULL,
first_name VARCHAR(40) NOT NULL,
last_name VARCHAR(40) NOT NULL,
...
CONSTRAINT PK_People PRIMARY KEY CLUSTERED (people_id)
)
GO
CREATE TABLE My_Application_Users
(
people_id INT NOT NULL,
user_name VARCHAR(20) NOT NULL,
security_level INT NOT NULL,
CONSTRAINT PK_My_Application_Users PRIMARY KEY CLUSTERED (people_id),
CONSTRAINT UI_My_Application_Users_user_name UNIQUE (user_name)
)
GO
This is just an example, so please don't tell me that my name columns are too long or too short or that they should allow NULLs, etc. ;)
what's the proper terminology for b in relation to a?
Table B is a child of Table A (the parent), because in order for a record to exist in the child, it must first exist in the parent.
Tables should be modeled as having either one-to-many or many-to-one relationships, depending on the context, and each of those can be either optional or required. A table that links two lists together relates to every table involved in a many-to-one fashion. For example, with users, groups, and user_groups_xref, the user_groups_xref table can hold numerous rows for a specific user record, and it has the same relationship to the groups table.
There's no point in one-to-one relationships - these should never be allowed to exist because it should only be one table.

Constraints instead Triggers (Specific question)

I read here some reasons to use constraints instead of triggers, but I have a doubt. How can I ensure, using only constraints, the coherence between SUPERCLASS tables and SUBCLASS tables?
With a trigger it is only a matter of checking on INSERT, UPDATE...
Is there a way to define that kind of relation using only constraints? (I'm a newbie at this.) Thanks!
You can use constraints to ensure that every ContractEmployees row has a corresponding Employees row, and likewise for SalariedEmployees. I don't know of a way to use constraints to enforce the opposite: making sure that for every Employees row, there is a row in either ContractEmployees or SalariedEmployees.
Backing up a bit... there are three main ways to model OO inheritance in a relational DB. The terminology is from Martin Fowler's Patterns of Enterprise Application Architecture:
Single table inheritance: everything is just in one big table, with lots of optional columns that apply only to certain subclasses. Easy to do but not very elegant.
Concrete table inheritance: one table for each concrete type. So if all employees are either salaried or contract, you'd have two tables: SalariedEmployees and ContractEmployees. I don't like this approach either, since it makes it harder to query all employees regardless of type.
Class table inheritance: one table for the base class and one per subclass. So three tables: Employees, SalariedEmployees, and ContractEmployees.
Here is an example of class table inheritance with constraints (code for MS SQL Server):
CREATE TABLE Employees
(
ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
FirstName VARCHAR(100) NOT NULL DEFAULT '',
LastName VARCHAR(100) NOT NULL DEFAULT ''
);
CREATE TABLE SalariedEmployees
(
ID INT NOT NULL PRIMARY KEY REFERENCES Employees(ID),
Salary DECIMAL(12,2) NOT NULL
);
CREATE TABLE ContractEmployees
(
ID INT NOT NULL PRIMARY KEY REFERENCES Employees(ID),
HourlyRate DECIMAL(12,2) NOT NULL
);
The "REFERENCES Employees(ID)" part on the two subclass tables defines a foreign key constraint. This ensures that there must be a row in Employees for every row in SalariedEmployees or ContractEmployees.
The ID column is what links everything together. In the subclass tables, the ID is both a primary key for that table, and a foreign key pointing at the base class table.
Here's how I'd model a contract vs salary employee setup:
EMPLOYEE_TYPE_CODE table
    EMPLOYEE_TYPE_CODE, pk
    DESCRIPTION
Examples:
EMPLOYEE_TYPE_CODE  DESCRIPTION
-----------------------------------
CONTRACT            Contractor
SALARY              Salaried
WAGE_SLAVE          I can't be fired - slaves are sold
EMPLOYEES table
    EMPLOYEE_ID, pk
    EMPLOYEE_TYPE_CODE, foreign key to the EMPLOYEE_TYPE_CODE table
    firstname, lastname, etc..
If you're wanting to store a hierarchical relationship, say between employee and manager (who by definition is also an employee):
EMPLOYEES table
    EMPLOYEE_ID, pk
    EMPLOYEE_TYPE_CODE, foreign key to the EMPLOYEE_TYPE_CODE table
    MANAGER_ID
The MANAGER_ID would be filled with the employee_id of the employee who is their manager. This setup assumes that an employee could only have one manager. If you worked in a place like what you see in the movie "Office Space", you need a different setup to allow for an employee record to associate with 2+ managers:
MANAGE_EMPLOYEES_XREF table
    MANAGER_EMPLOYEE_ID, pk, fk to EMPLOYEES table
    EMPLOYEE_ID, pk, fk to EMPLOYEES table
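The cross-reference table could be sketched like this, assuming SQLite via Python's sqlite3 module. The FIRSTNAME column type, the sample names, and the CHECK that stops an employee from managing themselves are assumptions added for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked
conn.executescript("""
CREATE TABLE EMPLOYEES (
    EMPLOYEE_ID INTEGER PRIMARY KEY,
    FIRSTNAME   TEXT NOT NULL
);
-- Both columns point at EMPLOYEES; together they form the primary key,
-- so the same manager/employee pair cannot be stored twice.
CREATE TABLE MANAGE_EMPLOYEES_XREF (
    MANAGER_EMPLOYEE_ID INTEGER NOT NULL REFERENCES EMPLOYEES(EMPLOYEE_ID),
    EMPLOYEE_ID         INTEGER NOT NULL REFERENCES EMPLOYEES(EMPLOYEE_ID),
    PRIMARY KEY (MANAGER_EMPLOYEE_ID, EMPLOYEE_ID),
    CHECK (MANAGER_EMPLOYEE_ID <> EMPLOYEE_ID)  -- assumption: no self-management
);
-- Hypothetical sample data: Ann reports to both Ben and Cy.
INSERT INTO EMPLOYEES VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cy');
INSERT INTO MANAGE_EMPLOYEES_XREF VALUES (2, 1), (3, 1);
""")

managers = [
    row[0]
    for row in conn.execute(
        "SELECT m.FIRSTNAME "
        "FROM MANAGE_EMPLOYEES_XREF x "
        "JOIN EMPLOYEES m ON m.EMPLOYEE_ID = x.MANAGER_EMPLOYEE_ID "
        "WHERE x.EMPLOYEE_ID = ? "
        "ORDER BY m.FIRSTNAME",
        (1,),
    )
]
print(managers)
```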
Databases are relational and constraints enforce relational dependencies pretty well, been doing so for some 30 years now. What is this super and sub class you talk about?
Update
Introducing the OO inheritance relationships in databases is actually quite problematic. To take your example, contract-employee and fulltime-employee. You can model this as 1) a single table with a discriminator field, as 2) two unrelated tables, or as 3) three tables (one with the common parts, one with contract specific info, one with fulltime specific info).
However, if you approach the very same problem from a traditional normal form point of view, you may end up with a structure similar to 1) or 3), but never 2). More often than not, you'll end up with something that looks like nothing you'd recommend from your OO design board.
The problem is that when this collision of requirements happens, today almost invariably the OO design will prevail. Oftentimes, the relational model will not even be on the table. Why I see this as a 'problem' is that most times databases far outlive their original application. All too often I see some design that can be traced back to an OO domain-driven design session from an application long forgotten, and one can see in the database schema the places where, over time, the OO design was 'hammered' into place to fit what the relational engine underneath could support, scale and deliver. The telltale sign for me is tables organized on a clustered index around an identity ID when no one ever queries those tables for a specific ID.