Should Foreign Key Columns be Unique and Not Null?

Should Foreign Key Columns be Unique and Not Null? - sql

I understand that, unless specified, foreign key columns can be NULL and duplicated (at least in Oracle SQL). Is it better practice to have foreign key columns declared not null and unique or leave them as is? Is this a decision that should be made based on the situation at hand, or is there a general rule that should be followed?

All databases allow foreign keys to be NULLable and non-UNIQUE. How you choose to declare a particular foreign key depends on the business case.
Consider the following tables used by a company that sells supplies to secret agents.
CountryList (
CountryCode NOT NULL PRIMARY KEY,
CountryName NOT NULL
)
SecretAgents (
CodeName NOT NULL PRIMARY KEY,
HomeCountryCode FOREIGN KEY REFERENCES CountryList(CountryCode)
)
Clearly, HomeCountryCode will not be unique because you may sell to more than one spy in each country. Is it NULLable? That depends on whether your business model requires each customer to declare their home country or not. If the model allows you to do business with someone who does not have a home country, or does not wish to reveal the home country to you, then the field should be NULLable. But if a state-less actor is not contemplated in your business model you should declare the column NOT NULL so that an invalid customer record cannot be created.
Now, consider the additional table
SpyMasters (
CountryCode NOT NULL PRIMARY KEY References CountryList(CountryCode),
Name NOT NULL PRIMARY KEY
)
This table lists the (singleton) head of spying for those countries that have a spy master. Not all countries will appear in this list, but each country can appear only once. In this case the CountryCode field is UNIQUE -- but you don't have to declare that explicitly because PRIMARY KEY always includes uniqueness.

The foreign key is an attribute in another table. In the original table ("referenced table"), the foreign key should be unique and non-NULL. In fact, it should almost always be the primary key of that table.
In the referencing table, the referencing column should only be declared not-NULL if a value is always required. It should only be declared unique if you never want duplicates. In other words, it depends on the characteristics in the referencing table, not the referenced table.

Related

The usability of Unique Constraint

I would like to ask in which cases its proper to use UNIQUE keyword in SQL. I know that if I declare a column as a primary key has uniqueness on its own but what happens with other attributes like country? Is it proper to use a unique constraint there?

The unique keyword in sql is used whenever you want each and every row entry of that column to be different from each other. A primary key column is automatically unique but there are some cases in which you may want more columns to be unique.
For example if you have a product_id as primary key it will ensure that no other row will have a product with product_id as that row. And in addition to that, you want that no two rows should have the same product_imei, then you can make the product_imei unique.
You can make a composite primary key like Primary Key(column1,column2) but that will mean that the combination you get from product_id and product_imei will be unique.
For example
(DLK-22,356938035643809) and (DLK-22, 11111111111111) both can exist in a table if (product_id,product_imei) is the primary key.
So you can use a unique constraint on as much columns as you like and its need depends on the scenario of the problem you are facing. You can use the unique constraint with the country if that helps you, there is no problem in doing so

The UNIQUE constraint ensures that all values in a column are different. Both the UNIQUE and PRIMARY KEY constraints provide a guarantee for uniqueness for a column or set of columns. A PRIMARY KEY constraint automatically has a UNIQUE constraint.

As any other constraint the UNIQUE constraint enforces some level of data quality. If you add this constraint to a column, then all values on that column will be different.
For example, on a table where EMPLOYEE_PK is already unique (because it's the PK) you may want to enforce the CARD_NUMBER column is also unique; that's the number displayed on the employee card. In your model the card number may be different from the PK, and you may also need to make sure it's unique.
Another extra benefit of a UNIQUE constraint is that other tables can link foreign keys to it. Since a UNIQUE column effectively acts as a "table key", any other table can establish a foreign key pointing to it. I've met many people who [wrongly] think that foreign keys can only point to primary keys.

Partial index on value from related table, rather than foreign key?

I'm working on a learning platform where students belong to a team, each of which belongs to a curriculum:
CREATE TABLE teams (
id SERIAL,
name string NOT NULL,
curriculum_id integer NOT NULL
);
CREATE TABLE curricula (
id SERIAL,
name string NOT NULL
);
CREATE UNIQUE INDEX index_curricula_on_name ON curricula USING btree (name);
Curricula have to be unique by name, and while most curricula are allowed to have multiple teams associated to them, one can not. I am trying to add a partial (unique) index on the teams table so as to add a restraint on the curriculum.
I know I can partially constrain the curriculum id itself with...
CREATE UNIQUE INDEX index_teams_on_curriculum_id ON teams USING btree (curriculum_id)
WHERE curriculum_id = 1;
... but this is not viable, as the IDs for the curriculum will vary across environments (dev, staging, etc).
Is there a way to constrain the teams.curriculum_id column by curricula.name instead?

You could implement something like this with a trigger or with a fake immutable function in a CHECK constraint. Both have their weak spots.
But this can also be implemented with pure SQL - only using NOT NULL, CHECK, UNIQUE and FK constraints. No weak spot.
CREATE TABLE curriculum (
curriculum_id serial PRIMARY KEY
, curriculum text UNIQUE NOT NULL
, team_unique boolean UNIQUE NOT NULL
, CONSTRAINT curriculum_team_uni UNIQUE (curriculum_id, team_unique) -- for multicolumn FK
);
CREATE TABLE team (
team_id serial PRIMARY KEY
, team text NOT NULL
, curriculum_id integer NOT NULL
, team_unique boolean NOT NULL
-- , CONSTRAINT fk1 FOREIGN KEY (curriculum_id) REFERENCES curriculum
, CONSTRAINT fk2 FOREIGN KEY (curriculum_id, team_unique)
REFERENCES curriculum (curriculum_id, team_unique)
);
CREATE UNIQUE INDEX team_curriculum_uni_idx ON team (team_unique)
WHERE team_unique;
Add a boolean NOT NULL column to parent and child table and make it UNIQUE in the parent table. So only one row in curriculum can be marked unique - to implement your restrictive requirement:
one can not
A partial unique index team_curriculum_uni_idx enforces only a single reference to it.
If there were multiple unique curriculums (to be referenced once only), we would remove the UNIQUE constraints on curriculum.team_unique and extend the partial unique index on team to (curriculum_id, team_unique).
The FK (fk2) forces to inherit the combination of columns.
This makes it simple to add a UNIQUE constraint to enforce a single team for the unique curriculum.
The default MATCH SIMPLE behavior of Postgres FK constraints only enforces combinations without NULL values. We can either use MATCH FULL or another plain FK (fk1) to enforce only existing curriculum_id. I commented the additional FK since we don't need it in this configuration (both FK columns defined NOT NULL).
SQL Fiddle.
Related:
MATCH FULL vs MATCH SIMPLE in foreign key constraints
CONSTRAINT to check values from a remotely related table (via join etc.)
Disable all constraints and table checks while restoring a dump
Enforcing constraints “two tables away”

Is unique foreign keys across multiple tables via normalization and without null columns possible?

This is a relational database design question, not specific to any RDBMS. A simplified case:
I have two tables Cars and Trucks. They have both have a column, say RegistrationNumber, that must be unique across the two tables.
This could probably be enforced with some insert/update triggers, but I'm looking for a more "clean" solution if possible.
One way to achieve this could be to add a third table, Vehicles, that holds the RegistrationNumber column, then add two additional columns to Vehicles, CarID and TruckID. But then for each row in Vehicles, one of the columns CarID or TruckID would always be NULL, because a RegistrationNumber applies to either a Car or a Truck leaving the other column with a NULL value.
Is there anyway to enforce a unique RegistrationNumber value across multiple tables, without introducing NULL columns or relying on triggers?

This is a bit complicated. Having the third table, Vehicles is definitely part of the solution. The second part is guaranteeing that a vehicle is either a car or a truck, but not both.
One method is the "list-all-the-possibilities" method. This is what you propose with two columns. In addition, this should have a constraint to verify that only one of the ids is filled in. A similar approach is to have the CarId and TruckId actually be the VehicleId. This reduces the number of different ids floating around.
Another approach uses composite keys. The idea is:
create table Vehicles (
Vehicle int primary key,
Registration varchar(255),
Type varchar(255),
constraint chk_type check (type in ('car', 'truck')),
constraint unq_type_Vehicle unique (type, vehicle), -- this is redundant, but necessary
. . .
);
create table car (
VehicleId int,
Type varchar(255), -- always 'car' in this table
constraint fk_car_vehicle foreign key (VehicleId) references Vehicles(VehicleId),
constraint fk_car_vehicle_type foreign key (Type, VehicleId) references Vehicles(Type, VehicleId)
);

See the following tags: class-table-inheritance shared-primary-key
You have already outlined class table inheritance in your question. The tag will just add a few details, and show you some other questions whose answers may help.
Shared primary key is a handy way of enforcing the one-to-one nature of IS-A relationships such as the relationship between a vehicle and a truck. It also allows a foreign key in some other table to reference a vehicle and also a truck or a car, as the case may be.

You can add the third table Vehicles containing a single column RegistratioNumber on which you apply the unique constraint, then on the existing tables - Cars and Trucks - use the RegistrationNumber as a foreign key on the Vehicles table. In this way you don't need an extra id, avoid the null problem and enforce the uniqueness of the registration number.
Update - this solution doesn't prevent a car and a truck to share the same registration number. To enforce this constraint you need to add either triggers or logic beyond plain SQL. Otherwise you may want to take a look at Gordon's solution that involves Composite Foreign Keys.

Can a foreign key be NULL and/or duplicate?

Please clarify two things for me:
Can a Foreign key be NULL?
Can a Foreign key be duplicate?
As fair as I know, NULL shouldn't be used in foreign keys, but in some application of mine I'm able to input NULL in both Oracle and SQL Server, and I don't know why.

Short answer: Yes, it can be NULL or duplicate.
I want to explain why a foreign key might need to be null or might need to be unique or not unique. First remember a Foreign key simply requires that the value in that field must exist first in a different table (the parent table). That is all an FK is by definition. Null by definition is not a value. Null means that we do not yet know what the value is.
Let me give you a real life example. Suppose you have a database that stores sales proposals. Suppose further that each proposal only has one sales person assigned and one client. So your proposal table would have two foreign keys, one with the client ID and one with the sales rep ID. However, at the time the record is created, a sales rep is not always assigned (because no one is free to work on it yet), so the client ID is filled in but the sales rep ID might be null. In other words, usually you need the ability to have a null FK when you may not know its value at the time the data is entered, but you do know other values in the table that need to be entered. To allow nulls in an FK generally all you have to do is allow nulls on the field that has the FK. The null value is separate from the idea of it being an FK.
Whether it is unique or not unique relates to whether the table has a one-one or a one-many relationship to the parent table. Now if you have a one-one relationship, it is possible that you could have the data all in one table, but if the table is getting too wide or if the data is on a different topic (the employee - insurance example #tbone gave for instance), then you want separate tables with a FK. You would then want to make this FK either also the PK (which guarantees uniqueness) or put a unique constraint on it.
Most FKs are for a one to many relationship and that is what you get from a FK without adding a further constraint on the field. So you have an order table and the order details table for instance. If the customer orders ten items at one time, he has one order and ten order detail records that contain the same orderID as the FK.

1 - Yes, since at least SQL Server 2000.
2 - Yes, as long as it's not a UNIQUE constraint or linked to a unique index.

Yes foreign key can be null as told above by senior programmers... I would add another scenario where Foreign key will required to be null....
suppose we have tables comments, Pictures and Videos in an application which allows comments on pictures and videos. In comments table we can have two Foreign Keys PicturesId, and VideosId along with the primary Key CommentId. So when you comment on a video only VideosId would be required and pictureId would be null... and if you comment on a picture only PictureId would be required and VideosId would be null...

it depends on what role this foreign key plays in your relation.
if this foreign key is also a key attribute in your relation, then it can't be NULL
if this foreign key is a normal attribute in your relation, then it can be NULL.

Here's an example using Oracle syntax:
First let's create a table COUNTRY
CREATE TABLE TBL_COUNTRY ( COUNTRY_ID VARCHAR2 (50) NOT NULL ) ;
ALTER TABLE TBL_COUNTRY ADD CONSTRAINT COUNTRY_PK PRIMARY KEY ( COUNTRY_ID ) ;
Create the table PROVINCE
CREATE TABLE TBL_PROVINCE(
PROVINCE_ID VARCHAR2 (50) NOT NULL ,
COUNTRY_ID VARCHAR2 (50)
);
ALTER TABLE TBL_PROVINCE ADD CONSTRAINT PROVINCE_PK PRIMARY KEY ( PROVINCE_ID ) ;
ALTER TABLE TBL_PROVINCE ADD CONSTRAINT PROVINCE_COUNTRY_FK FOREIGN KEY ( COUNTRY_ID ) REFERENCES TBL_COUNTRY ( COUNTRY_ID ) ;
This runs perfectly fine in Oracle. Notice the COUNTRY_ID foreign key in the second table doesn't have "NOT NULL".
Now to insert a row into the PROVINCE table, it's sufficient to only specify the PROVINCE_ID. However, if you chose to specify a COUNTRY_ID as well, it must exist already in the COUNTRY table.

By default there are no constraints on the foreign key, foreign key can be null and duplicate.
while creating a table / altering the table, if you add any constrain of uniqueness or not null then only it will not allow the null/ duplicate values.

Simply put, "Non-identifying" relationships between Entities is part of ER-Model and is available in Microsoft Visio when designing ER-Diagram. This is required to enforce cardinality between Entities of type " zero or more than zero", or "zero or one". Note this "zero" in cardinality instead of "one" in "one to many".
Now, example of non-identifying relationship where cardinality may be "zero" (non-identifying) is when we say a record / object in one entity-A "may" or "may not" have a value as a reference to the record/s in another Entity-B.
As, there is a possibility for one record of entity-A to identify itself to the records of other Entity-B, therefore there should be a column in Entity-B to have the identity-value of the record of Entity-B. This column may be "Null" if no record in Entity-A identifies the record/s (or, object/s) in Entity-B.
In Object Oriented (real-world) Paradigm, there are situations when an object of Class-B does not necessarily depends (strongly coupled) on object of class-A for its existence, which means Class-B is loosely-coupled with Class-A such that Class-A may "Contain" (Containment) an object of Class-A, as opposed to the concept of object of Class-B must have (Composition) an object of Class-A, for its (object of class-B) creation.
From SQL Query point of view, you can query all records in entity-B which are "not null" for foreign-key reserved for Entity-B. This will bring all records having certain corresponding value for rows in Entity-A alternatively all records with Null value will be the records which do not have any record in Entity-A in Entity-B.

Can a Foreign key be NULL?
Existing answers focused on single column scenario. If we consider multi column foreign key we have more options using MATCH [SIMPLE | PARTIAL | FULL] clause defined in SQL Standard:
PostgreSQL-CREATE TABLE
A value inserted into the referencing column(s) is matched against the values of the referenced table and referenced columns using the given match type. There are three match types: MATCH FULL, MATCH PARTIAL, and MATCH SIMPLE (which is the default). MATCH FULL will not allow one column of a multicolumn foreign key to be null unless all foreign key columns are null; if they are all null, the row is not required to have a match in the referenced table. MATCH SIMPLE allows any of the foreign key columns to be null; if any of them are null, the row is not required to have a match in the referenced table. MATCH PARTIAL is not yet implemented.
(Of course, NOT NULL constraints can be applied to the referencing column(s) to prevent these cases from arising.)
Example:
CREATE TABLE A(a VARCHAR(10), b VARCHAR(10), d DATE , UNIQUE(a,b));
INSERT INTO A(a, b, d)
VALUES (NULL, NULL, NOW()),('a', NULL, NOW()),(NULL, 'b', NOW()),('c', 'b', NOW());
CREATE TABLE B(id INT PRIMARY KEY, ref_a VARCHAR(10), ref_b VARCHAR(10));
-- MATCH SIMPLE - default behaviour nulls are allowed
ALTER TABLE B ADD CONSTRAINT B_Fk FOREIGN KEY (ref_a, ref_b)
REFERENCES A(a,b) MATCH SIMPLE;
INSERT INTO B(id, ref_a, ref_b) VALUES (1, NULL, 'b');
-- (NULL/'x') 'x' value does not exists in A table, but insert is valid
INSERT INTO B(id, ref_a, ref_b) VALUES (2, NULL, 'x');
ALTER TABLE B DROP CONSTRAINT IF EXISTS B_Fk; -- cleanup
-- MATCH PARTIAL - not implemented
ALTER TABLE B ADD CONSTRAINT B_Fk FOREIGN KEY (ref_a, ref_b)
REFERENCES A(a,b) MATCH PARTIAL;
-- ERROR: MATCH PARTIAL not yet implemented
DELETE FROM B; ALTER TABLE B DROP CONSTRAINT IF EXISTS B_Fk; -- cleanup
-- MATCH FULL nulls are not allowed
ALTER TABLE B ADD CONSTRAINT B_Fk FOREIGN KEY (ref_a, ref_b)
REFERENCES A(a,b) MATCH FULL;
-- FK is defined, inserting NULL as part of FK
INSERT INTO B(id, ref_a, ref_b) VALUES (1, NULL, 'b');
-- ERROR: MATCH FULL does not allow mixing of null and nonnull key values.
-- FK is defined, inserting all NULLs - valid
INSERT INTO B(id, ref_a, ref_b) VALUES (1, NULL, NULL);
db<>fiddle demo

I think it is better to consider the possible cardinality we have in the tables.
We can have possible minimum cardinality zero. When it is optional, the minimum participation of tuples from the related table could be zero, Now you face the necessity of foreign key values to be allowed null.
But the answer is it all depends on the Business.

The idea of a foreign key is based on the concept of referencing a value that already exists in the main table. That is why it is called a foreign key in the other table. This concept is called referential integrity. If a foreign key is declared as a null field it will violate the the very logic of referential integrity. What will it refer to? It can only refer to something that is present in the main table. Hence, I think it would be wrong to declare a foreign key field as null.

I think foreign key of one table also primary key to some other table.So it won't allows nulls.So there is no question of having null value in foreign key.

What are the drawbacks of foreign key constraints that reference non-primary-key columns?

I want to know if there are any drawbacks between a referential relation that uses primary key columns versus unique key columns (in SQL Server a foreign key constraint can only reference columns in a primary key or unique index).
Are there differences in how queries are parsed, in specific DB systems (e.g. Microsoft SQL Server 2005), based on whether a foreign key references a primary key versus a unique key?
Note that I'm not asking about the differences between using columns of different datatypes for referential integrity, joins, etc.
Purely as an example, imagine a DB in which there is a 'lookup table' dbo.Offices:
CREATE TABLE dbo.Offices (
ID int NOT NULL IDENTITY(1,1) CONSTRAINT PK_Codes PRIMARY KEY,
Code varchar(50) NOT NULL CONSTRAINT UQ_Codes_Code UNIQUE
);
There is also a table dbo.Patients:
CREATE TABLE dbo.Patients (
ID int NOT NULL IDENTITY(1,1) CONSTRAINT PK_Patients PRIMARY KEY,
OfficeCode varchar(50) NOT NULL,
...
CONSTRAINT FK_Patients_Offices FOREIGN KEY ( OfficeCode )
REFERENCES dbo.Offices ( Code )
);
What are the drawbacks of the table dbo.Patients and its constraint FK_Patients_Offices as in the T-SQL code above, versus the following alternate version:
CREATE TABLE dbo.Patients (
ID int NOT NULL IDENTITY(1,1) CONSTRAINT PK_Patients PRIMARY KEY,
OfficeID int NOT NULL,
...
CONSTRAINT FK_Patients_Offices FOREIGN KEY ( OfficeID )
REFERENCES dbo.Offices ( ID )
);
Obviously, for the second version of dbo.Patients, the values in the column OfficeID don't need to be updated if changes are made to values in the Code column of dbo.Offices.
Also (obvious) is that using the Code column of dbo.Offices for foreign key references largely defeats the purpose of the surrogate key column ID – this is purely an artifact of the example. [Is there a better example of a table for which foreign key references might reasonably use a non-primary key?]

There is no drawback.
However..
Why do you have an ID column in the Offices table? A surrogate key is used to reduce space and improve performance over, say, a varchar column when used in other tables as a foreign key.
If you are going to use the varchar column for foreign keys, then you don't need a surrogate key.
Most benefits of having the IDENTITY are squandered by using the Code column for FKs.

Why do you think there would be any drawbacks??
Quite the contrary! It's good to see you're enforcing referential integrity as everyone should! No drawbacks - just good practice to do this!
I don't see any functional difference or any problems/issues with referencing a unique index vs. referencing a primary key.
Update: since you're not interested in performance- or datatype-related issues, this last paragraph probably doesn't add any additional value.
The only minor thing I see is that your OfficeCode is both a VARCHAR and thus you might run into issues with collation and/or casing (upper-/lower-case, depending on your collation), and JOIN's on a fairly large (up to 50 bytes) and varying length field are probably not quite as efficient as JOIN conditions based on a small, fixed-length INT column.

A primary key is a candidate key and is not fundamentally different from any other candidate key. It is a widely observed convention that one candidate key per table is designated as a "primary" one and that this is the key used for all foreign key references.
A possible advantage of singling out one key in this way is that you make the use of the key clearer to users of the database: they know which key is the one being referenced without looking in every referencing table. This is entirely optional however. If you find it convenient to do otherwise or if requirements dictate that some other key should be referenced by a foreign key then I suggest you do that.

Assuming you add an index on the code column (which you definitely should as soon as you reference to it), is there anything to be said against getting rid of the entire ID column and using the code column as PK as well?

The most significant one I can think of is that, if they ever renumber the offices, you'll either lose integrity or need to update both tables. However likely that might be.
The performance consequences are vanishingly small unless you have irrationally large office codes, and even then less than you probably expect.
It's not considered a significant determinant of database design for most people.

Big flaw
We were able to enter some value into dbo.Patients.OfficeID that is not there in dbo.Offices.ID
There is no meaning to say that there is a reference.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas