What are the drawbacks of foreign key constraints that reference non-primary-key columns? - sql

I want to know if there are any drawbacks between a referential relation that uses primary key columns versus unique key columns (in SQL Server a foreign key constraint can only reference columns in a primary key or unique index).
Are there differences in how queries are parsed, in specific DB systems (e.g. Microsoft SQL Server 2005), based on whether a foreign key references a primary key versus a unique key?
Note that I'm not asking about the differences between using columns of different datatypes for referential integrity, joins, etc.
Purely as an example, imagine a DB in which there is a 'lookup table' dbo.Offices:
CREATE TABLE dbo.Offices (
ID int NOT NULL IDENTITY(1,1) CONSTRAINT PK_Codes PRIMARY KEY,
Code varchar(50) NOT NULL CONSTRAINT UQ_Codes_Code UNIQUE
);
There is also a table dbo.Patients:
CREATE TABLE dbo.Patients (
ID int NOT NULL IDENTITY(1,1) CONSTRAINT PK_Patients PRIMARY KEY,
OfficeCode varchar(50) NOT NULL,
...
CONSTRAINT FK_Patients_Offices FOREIGN KEY ( OfficeCode )
REFERENCES dbo.Offices ( Code )
);
What are the drawbacks of the table dbo.Patients and its constraint FK_Patients_Offices as in the T-SQL code above, versus the following alternate version:
CREATE TABLE dbo.Patients (
ID int NOT NULL IDENTITY(1,1) CONSTRAINT PK_Patients PRIMARY KEY,
OfficeID int NOT NULL,
...
CONSTRAINT FK_Patients_Offices FOREIGN KEY ( OfficeID )
REFERENCES dbo.Offices ( ID )
);
Obviously, for the second version of dbo.Patients, the values in the column OfficeID don't need to be updated if changes are made to values in the Code column of dbo.Offices.
Also (obvious) is that using the Code column of dbo.Offices for foreign key references largely defeats the purpose of the surrogate key column ID – this is purely an artifact of the example. [Is there a better example of a table for which foreign key references might reasonably use a non-primary key?]

There is no drawback.
However..
Why do you have an ID column in the Offices table? A surrogate key is used to reduce space and improve performance over, say, a varchar column when used in other tables as a foreign key.
If you are going to use the varchar column for foreign keys, then you don't need a surrogate key.
Most benefits of having the IDENTITY are squandered by using the Code column for FKs.

Why do you think there would be any drawbacks??
Quite the contrary! It's good to see you're enforcing referential integrity as everyone should! No drawbacks - just good practice to do this!
I don't see any functional difference or any problems/issues with referencing a unique index vs. referencing a primary key.
Update: since you're not interested in performance- or datatype-related issues, this last paragraph probably doesn't add any additional value.
The only minor thing I see is that your OfficeCode is both a VARCHAR and thus you might run into issues with collation and/or casing (upper-/lower-case, depending on your collation), and JOIN's on a fairly large (up to 50 bytes) and varying length field are probably not quite as efficient as JOIN conditions based on a small, fixed-length INT column.

A primary key is a candidate key and is not fundamentally different from any other candidate key. It is a widely observed convention that one candidate key per table is designated as a "primary" one and that this is the key used for all foreign key references.
A possible advantage of singling out one key in this way is that you make the use of the key clearer to users of the database: they know which key is the one being referenced without looking in every referencing table. This is entirely optional however. If you find it convenient to do otherwise or if requirements dictate that some other key should be referenced by a foreign key then I suggest you do that.

Assuming you add an index on the code column (which you definitely should as soon as you reference to it), is there anything to be said against getting rid of the entire ID column and using the code column as PK as well?

The most significant one I can think of is that, if they ever renumber the offices, you'll either lose integrity or need to update both tables. However likely that might be.
The performance consequences are vanishingly small unless you have irrationally large office codes, and even then less than you probably expect.
It's not considered a significant determinant of database design for most people.

Big flaw
We were able to enter some value into dbo.Patients.OfficeID that is not there in dbo.Offices.ID
There is no meaning to say that there is a reference.

Related

Does defining the constraint PRIMARY KEY already makes sure that the column values are unique and not null or do you have to define it seperately?

Does defining the constraint PRIMARY KEY already makes sure that the column values are unique and not null or do you have to define it seperately?
Yes. But one exception is Sqlite.
See https://sqlite.org/lang_createtable.html
Each row in a table with a primary key must have a unique combination of values in its primary key columns. For the purposes of determining the uniqueness of primary key values, NULL values are considered distinct from all other values, including other NULLs. If an INSERT or UPDATE statement attempts to modify the table content so that two or more rows have identical primary key values, that is a constraint violation.
According to the SQL standard, PRIMARY KEY should always imply NOT NULL. Unfortunately, due to a bug in some early versions, this is not the case in SQLite. Unless the column is an INTEGER PRIMARY KEY or the table is a WITHOUT ROWID table or the column is declared NOT NULL, SQLite allows NULL values in a PRIMARY KEY column. SQLite could be fixed to conform to the standard, but doing so might break legacy applications. Hence, it has been decided to merely document the fact that SQLite allows NULLs in most PRIMARY KEY columns.

Do I really need PRIMARY KEY when using UNIQUE NOT NULL columns?

My knowledge in SQL is limited and I would appreciate someone who could help me to clarify the use of PRIMARY KEY in the following circumstances. I created a table to support ISO country information. I'm using MariaDB 10 but I believe that will not be relevant for the kind of questions I have(?)
CREATE TABLE IF NOT EXISTS python.country
(
iso_code INTEGER( 3) NOT NULL ,
iso_2_alpha VARCHAR( 2) NOT NULL ,
iso_3_alpha VARCHAR( 3) NOT NULL ,
short_name VARCHAR( 32) NOT NULL ,
long_name VARCHAR( 64) NOT NULL ,
flag_link VARCHAR(2000) DEFAULT(NULL),
CONSTRAINT CK_iso_code CHECK (iso_code > 0 AND iso_code <= 999) ,
CONSTRAINT CK_iso_alpha CHECK (
iso_2_alpha RLIKE BINARY '^[A-Z]+$' AND LENGTH(iso_2_alpha) = 2
AND
iso_3_alpha RLIKE BINARY '^[A-Z]+$' AND LENGTH(iso_3_alpha) = 3
) ,
CONSTRAINT CK_names CHECK (
short_name RLIKE '^\\p{L}+(\\.?[[:blank:]]\\p{L}+)*\\p{L}+$'
AND
long_name RLIKE '^\\p{L}+(\\.?[[:blank:]]\\p{L}+)*\\p{L}+$'
) ,
CONSTRAINT UN_short_name UNIQUE (short_name) ,
CONSTRAINT UN_long_name UNIQUE (long_name) ,
CONSTRAINT UN_iso_2_alpha UNIQUE (iso_2_alpha) ,
CONSTRAINT UN_iso_3_alpha UNIQUE (iso_3_alpha)
-- ???
-- CONSTRAINT PK_country PRIMARY KEY (iso_code,iso_2_alpha,iso_3_alpha)
); -- ENGINE = 'InnoDB';
Question 1: Since all main columns (iso_code,iso_2_alpha,iso_3_alpha) are NOT NULL and UNIQUE does make sense to create a composite PRIMARY KEY? I "believe" it's waste of space and time when inserting new elements?
Question 2: Can I use iso_code safely has being the FOREIGN KEY in other table?
Many thanks.
Since all main columns (iso_code,iso_2_alpha,iso_3_alpha) are NOT NULL and UNIQUE does make sense to create a composite PRIMARY KEY? I "believe" it's waste of space and time when inserting new elements?
Your proposed PK is a superkey over existing keys. It's not necessary in and of itself. You could choose to declare one of your unique key constraints as a PK instead but it's not necessary.
Can I use iso_code safely has being the FOREIGN KEY in other table?
If you also mark iso_code as a unique key in this table, that should work fine.
Some people would recommend that every table always have an autogenerated column marked as PK. That's fine so long as you also enforce the logical keys. Unfortunately, many people will just create that auto-PK and no other keys, which means your data is nonsense.
You've chosen (currently) to just have the logical keys. I think that's fine in this case, especially as several (iso_code, iso_2_alpha and iso_3_alpha) are likely to be more compact that the recommended autogenerated column.
Can't comment on performance and efficiency but one thing with composite keys is that when you use them as a primary key, you have to repeat them in your foreign key. I.e, PK iso_code, iso_2_alpha, iso_3_alpha will be additional FK columns in all the tables related. You also have to then query by these 3 columns in your SQL queries. Bit of a PITA IMO when you can simply use a generic, unique self generating column.
If you can use iso_code and you are sure you never ever ever will have the chance to require inserting a duplicate iso_code that has a different iso_2_alpha, iso_3_alpha then go ahead. But, you should future proof and make table more robust and anticipate the unexpected, use a new dedicated id column unrelated to the business, IMHO.

mysql: difference between primary key and unique index? [duplicate]

At work we have a big database with unique indexes instead of primary keys and all works fine.
I'm designing new database for a new project and I have a dilemma:
In DB theory, primary key is fundamental element, that's OK, but in REAL projects what are advantages and disadvantages of both?
What do you use in projects?
EDIT: ...and what about primary keys and replication on MS SQL server?
What is a unique index?
A unique index on a column is an index on that column that also enforces the constraint that you cannot have two equal values in that column in two different rows. Example:
CREATE TABLE table1 (foo int, bar int);
CREATE UNIQUE INDEX ux_table1_foo ON table1(foo); -- Create unique index on foo.
INSERT INTO table1 (foo, bar) VALUES (1, 2); -- OK
INSERT INTO table1 (foo, bar) VALUES (2, 2); -- OK
INSERT INTO table1 (foo, bar) VALUES (3, 1); -- OK
INSERT INTO table1 (foo, bar) VALUES (1, 4); -- Fails!
Duplicate entry '1' for key 'ux_table1_foo'
The last insert fails because it violates the unique index on column foo when it tries to insert the value 1 into this column for a second time.
In MySQL a unique constraint allows multiple NULLs.
It is possible to make a unique index on mutiple columns.
Primary key versus unique index
Things that are the same:
A primary key implies a unique index.
Things that are different:
A primary key also implies NOT NULL, but a unique index can be nullable.
There can be only one primary key, but there can be multiple unique indexes.
If there is no clustered index defined then the primary key will be the clustered index.
You can see it like this:
A Primary Key IS Unique
A Unique value doesn't have to be the Representaion of the Element
Meaning?; Well a primary key is used to identify the element, if you have a "Person" you would like to have a Personal Identification Number ( SSN or such ) which is Primary to your Person.
On the other hand, the person might have an e-mail which is unique, but doensn't identify the person.
I always have Primary Keys, even in relationship tables ( the mid-table / connection table ) I might have them. Why? Well I like to follow a standard when coding, if the "Person" has an identifier, the Car has an identifier, well, then the Person -> Car should have an identifier as well!
Foreign keys work with unique constraints as well as primary keys. From Books Online:
A FOREIGN KEY constraint does not have
to be linked only to a PRIMARY KEY
constraint in another table; it can
also be defined to reference the
columns of a UNIQUE constraint in
another table
For transactional replication, you need the primary key. From Books Online:
Tables published for transactional
replication must have a primary key.
If a table is in a transactional
replication publication, you cannot
disable any indexes that are
associated with primary key columns.
These indexes are required by
replication. To disable an index, you
must first drop the table from the
publication.
Both answers are for SQL Server 2005.
The choice of when to use a surrogate primary key as opposed to a natural key is tricky. Answers such as, always or never, are rarely useful. I find that it depends on the situation.
As an example, I have the following tables:
CREATE TABLE toll_booths (
id INTEGER NOT NULL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
...
UNIQUE(name)
)
CREATE TABLE cars (
vin VARCHAR(17) NOT NULL PRIMARY KEY,
license_plate VARCHAR(10) NOT NULL,
...
UNIQUE(license_plate)
)
CREATE TABLE drive_through (
id INTEGER NOT NULL PRIMARY KEY,
toll_booth_id INTEGER NOT NULL REFERENCES toll_booths(id),
vin VARCHAR(17) NOT NULL REFERENCES cars(vin),
at TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
amount NUMERIC(10,4) NOT NULL,
...
UNIQUE(toll_booth_id, vin)
)
We have two entity tables (toll_booths and cars) and a transaction table (drive_through). The toll_booth table uses a surrogate key because it has no natural attribute that is not guaranteed to change (the name can easily be changed). The cars table uses a natural primary key because it has a non-changing unique identifier (vin). The drive_through transaction table uses a surrogate key for easy identification, but also has a unique constraint on the attributes that are guaranteed to be unique at the time the record is inserted.
http://database-programmer.blogspot.com has some great articles on this particular subject.
There are no disadvantages of primary keys.
To add just some information to #MrWiggles and #Peter Parker answers, when table doesn't have primary key for example you won't be able to edit data in some applications (they will end up saying sth like cannot edit / delete data without primary key). Postgresql allows multiple NULL values to be in UNIQUE column, PRIMARY KEY doesn't allow NULLs. Also some ORM that generate code may have some problems with tables without primary keys.
UPDATE:
As far as I know it is not possible to replicate tables without primary keys in MSSQL, at least without problems (details).
If something is a primary key, depending on your DB engine, the entire table gets sorted by the primary key. This means that lookups are much faster on the primary key because it doesn't have to do any dereferencing as it has to do with any other kind of index. Besides that, it's just theory.
In addition to what the other answers have said, some databases and systems may require a primary to be present. One situation comes to mind; when using enterprise replication with Informix a PK must be present for a table to participate in replication.
As long as you do not allow NULL for a value, they should be handled the same, but the value NULL is handled differently on databases(AFAIK MS-SQL do not allow more than one(1) NULL value, mySQL and Oracle allow this, if a column is UNIQUE)
So you must define this column NOT NULL UNIQUE INDEX
There is no such thing as a primary key in relational data theory, so your question has to be answered on the practical level.
Unique indexes are not part of the SQL standard. The particular implementation of a DBMS will determine what are the consequences of declaring a unique index.
In Oracle, declaring a primary key will result in a unique index being created on your behalf, so the question is almost moot. I can't tell you about other DBMS products.
I favor declaring a primary key. This has the effect of forbidding NULLs in the key column(s) as well as forbidding duplicates. I also favor declaring REFERENCES constraints to enforce entity integrity. In many cases, declaring an index on the coulmn(s) of a foreign key will speed up joins. This kind of index should in general not be unique.
There are some disadvantages of CLUSTERED INDEXES vs UNIQUE INDEXES.
As already stated, a CLUSTERED INDEX physically orders the data in the table.
This mean that when you have a lot if inserts or deletes on a table containing a clustered index, everytime (well, almost, depending on your fill factor) you change the data, the physical table needs to be updated to stay sorted.
In relative small tables, this is fine, but when getting to tables that have GB's worth of data, and insertrs/deletes affect the sorting, you will run into problems.
I almost never create a table without a numeric primary key. If there is also a natural key that should be unique, I also put a unique index on it. Joins are faster on integers than multicolumn natural keys, data only needs to change in one place (natural keys tend to need to be updated which is a bad thing when it is in primary key - foreign key relationships). If you are going to need replication use a GUID instead of an integer, but for the most part I prefer a key that is user readable especially if they need to see it to distinguish between John Smith and John Smith.
The few times I don't create a surrogate key are when I have a joining table that is involved in a many-to-many relationship. In this case I declare both fields as the primary key.
My understanding is that a primary key and a unique index with a not‑null constraint, are the same (*); and I suppose one choose one or the other depending on what the specification explicitly states or implies (a matter of what you want to express and explicitly enforce). If it requires uniqueness and not‑null, then make it a primary key. If it just happens all parts of a unique index are not‑null without any requirement for that, then just make it a unique index.
The sole remaining difference is, you may have multiple not‑null unique indexes, while you can't have multiple primary keys.
(*) Excepting a practical difference: a primary key can be the default unique key for some operations, like defining a foreign key. Ex. if one define a foreign key referencing a table and does not provide the column name, if the referenced table has a primary key, then the primary key will be the referenced column. Otherwise, the the referenced column will have to be named explicitly.
Others here have mentioned DB replication, but I don't know about it.
Unique Index can have one NULL value. It creates NON-CLUSTERED INDEX.
Primary Key cannot contain NULL value. It creates CLUSTERED INDEX.
In MSSQL, Primary keys should be monotonically increasing for best performance on the clustered index. Therefore an integer with identity insert is better than any natural key that might not be monotonically increasing.
If it were up to me...
You need to satisfy the requirements of the database and of your applications.
Adding an auto-incrementing integer or long id column to every table to serve as the primary key takes care of the database requirements.
You would then add at least one other unique index to the table for use by your application. This would be the index on employee_id, or account_id, or customer_id, etc. If possible, this index should not be a composite index.
I would favor indices on several fields individually over composite indices. The database will use the single field indices whenever the where clause includes those fields, but it will only use a composite when you provide the fields in exactly the correct order - meaning it can't use the second field in a composite index unless you provide both the first and second in your where clause.
I am all for using calculated or Function type indices - and would recommend using them over composite indices. It makes it very easy to use the function index by using the same function in your where clause.
This takes care of your application requirements.
It is highly likely that other non-primary indices are actually mappings of that indexes key value to a primary key value, not rowid()'s. This allows for physical sorting operations and deletes to occur without having to recreate these indices.

Can there be constraints with the same name in a DB?

This is a follow-on question from the one I asked here.
Can constraints in a DB have the same name?
Say I have:
CREATE TABLE Employer
(
EmployerCode VARCHAR(20) PRIMARY KEY,
Address VARCHAR(100) NULL
)
CREATE TABLE Employee
(
EmployeeID INT PRIMARY KEY,
EmployerCode VARCHAR(20) NOT NULL,
CONSTRAINT employer_code_fk FOREIGN KEY (EmployerCode) REFERENCES Employer
)
CREATE TABLE BankAccount
(
BankAccountID INT PRIMARY KEY,
EmployerCode VARCHAR(20) NOT NULL,
Amount MONEY NOT NULL,
CONSTRAINT employer_code_fk FOREIGN KEY (EmployerCode) REFERENCES Employer
)
Is this allowable? Does it depend on the DBMS (I'm on SQL Server 2005)? If it is not allowable, does anyone have any suggestions on how to work around it?
No - a constraint is a database object as well, and thus its name needs to be unique.
Try adding e.g. the table name to your constraint, that way it'll be unique.
CREATE TABLE BankAccount
(
BankAccountID INT PRIMARY KEY,
EmployerCode VARCHAR(20) NOT NULL,
Amount MONEY NOT NULL,
CONSTRAINT FK_BankAccount_Employer
FOREIGN KEY (EmployerCode) REFERENCES Employer
)
We basically use "FK_"(child table)_(parent table)" to name the constraints and are quite happy with this naming convention.
Information from MSDN
That constraint names have to be unique to the schema (ie. two different schemas in the same database can both contain a constraint with the same name) is not explicitly documented. Rather you need to assume the identifiers of database objects must be unique within the containing schema unless specified otherwise. So the constraint name is defined as:
Is the name of the constraint. Constraint names must follow the rules for identifiers, except that the name cannot start with a number sign (#). If constraint_name is not supplied, a system-generated name is assigned to the constraint.
Compare this to the name of an index:
Is the name of the index. Index names must be unique within a table or view but do not have to be unique within a database. Index names must follow the rules of identifiers.
which explicitly narrows the scope of the identifier.
The other answers are all good but I thought I'd add an answer to the question in the title, i.e., "can there be constraints with the same name in a DB?"
The answer for MS SQL Server is yes – but only so long as the constraints are in different schemas. Constraint names must be unique within a schema.
I was always puzzled why constraint names must be unique in the database, since they seem like they're associated with tables.
Then I read about SQL-99's ASSERTION constraint, which is like a check constraint, but exists apart from any single table. The conditions declared in an assertion must be satisfied consistently like any other constraint, but the assertion can reference multiple tables.
AFAIK no SQL vendor implements ASSERTION constraints. But this helps explain why constraint names are database-wide in scope.
It depends on the DBMS.
For example on PostgreSQL, the answer is yes :
Because PostgreSQL does not require constraint names to be unique
within a schema (but only per-table), it is possible that there is
more than one match for a specified constraint name.
Source : https://www.postgresql.org/docs/current/static/sql-set-constraints.html
I've seen Foreign Keys constraint names equals on 2 different tables within the same schema.
Does it depend on the DBMS (I'm on SQL Server 2005)?
Yes, apparently it does depend on the DBMS.
Other answers say it's not permitted, but I have a MS SQL CE ("Compact Edition") database in which I accidentally successfully created two FK contraints, in two tables, with the same contraint name.
Good practice is to create index and constraint names specifying table name at the beginning.
There's 2 approaches, with index/constraint type at the beginning or at the end) eg.
UQ_TableName_FieldName
or
TableName_FieldName_UQ
Foreign keys names should also contain names of referenced Table/Field(s).
One of good naming conventions is to give table names in form of FullName_3LetterUniqueAlias eg.
Employers_EMR
Employees_EMP
BankAccounts_BNA
Banks_BNK
This give you opportunity to use "predefined" aliases in queries which improves readability and also makes Naming of foreign keys easier, like:
EMPEMR_EmployerCode_FK
BNKEMR_EmployerCode_FK

Primary key or Unique index?

At work we have a big database with unique indexes instead of primary keys and all works fine.
I'm designing new database for a new project and I have a dilemma:
In DB theory, primary key is fundamental element, that's OK, but in REAL projects what are advantages and disadvantages of both?
What do you use in projects?
EDIT: ...and what about primary keys and replication on MS SQL server?
What is a unique index?
A unique index on a column is an index on that column that also enforces the constraint that you cannot have two equal values in that column in two different rows. Example:
CREATE TABLE table1 (foo int, bar int);
CREATE UNIQUE INDEX ux_table1_foo ON table1(foo); -- Create unique index on foo.
INSERT INTO table1 (foo, bar) VALUES (1, 2); -- OK
INSERT INTO table1 (foo, bar) VALUES (2, 2); -- OK
INSERT INTO table1 (foo, bar) VALUES (3, 1); -- OK
INSERT INTO table1 (foo, bar) VALUES (1, 4); -- Fails!
Duplicate entry '1' for key 'ux_table1_foo'
The last insert fails because it violates the unique index on column foo when it tries to insert the value 1 into this column for a second time.
In MySQL a unique constraint allows multiple NULLs.
It is possible to make a unique index on mutiple columns.
Primary key versus unique index
Things that are the same:
A primary key implies a unique index.
Things that are different:
A primary key also implies NOT NULL, but a unique index can be nullable.
There can be only one primary key, but there can be multiple unique indexes.
If there is no clustered index defined then the primary key will be the clustered index.
You can see it like this:
A Primary Key IS Unique
A Unique value doesn't have to be the Representaion of the Element
Meaning?; Well a primary key is used to identify the element, if you have a "Person" you would like to have a Personal Identification Number ( SSN or such ) which is Primary to your Person.
On the other hand, the person might have an e-mail which is unique, but doensn't identify the person.
I always have Primary Keys, even in relationship tables ( the mid-table / connection table ) I might have them. Why? Well I like to follow a standard when coding, if the "Person" has an identifier, the Car has an identifier, well, then the Person -> Car should have an identifier as well!
Foreign keys work with unique constraints as well as primary keys. From Books Online:
A FOREIGN KEY constraint does not have
to be linked only to a PRIMARY KEY
constraint in another table; it can
also be defined to reference the
columns of a UNIQUE constraint in
another table
For transactional replication, you need the primary key. From Books Online:
Tables published for transactional
replication must have a primary key.
If a table is in a transactional
replication publication, you cannot
disable any indexes that are
associated with primary key columns.
These indexes are required by
replication. To disable an index, you
must first drop the table from the
publication.
Both answers are for SQL Server 2005.
The choice of when to use a surrogate primary key as opposed to a natural key is tricky. Answers such as, always or never, are rarely useful. I find that it depends on the situation.
As an example, I have the following tables:
CREATE TABLE toll_booths (
id INTEGER NOT NULL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
...
UNIQUE(name)
)
CREATE TABLE cars (
vin VARCHAR(17) NOT NULL PRIMARY KEY,
license_plate VARCHAR(10) NOT NULL,
...
UNIQUE(license_plate)
)
CREATE TABLE drive_through (
id INTEGER NOT NULL PRIMARY KEY,
toll_booth_id INTEGER NOT NULL REFERENCES toll_booths(id),
vin VARCHAR(17) NOT NULL REFERENCES cars(vin),
at TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
amount NUMERIC(10,4) NOT NULL,
...
UNIQUE(toll_booth_id, vin)
)
We have two entity tables (toll_booths and cars) and a transaction table (drive_through). The toll_booth table uses a surrogate key because it has no natural attribute that is not guaranteed to change (the name can easily be changed). The cars table uses a natural primary key because it has a non-changing unique identifier (vin). The drive_through transaction table uses a surrogate key for easy identification, but also has a unique constraint on the attributes that are guaranteed to be unique at the time the record is inserted.
http://database-programmer.blogspot.com has some great articles on this particular subject.
There are no disadvantages of primary keys.
To add just some information to #MrWiggles and #Peter Parker answers, when table doesn't have primary key for example you won't be able to edit data in some applications (they will end up saying sth like cannot edit / delete data without primary key). Postgresql allows multiple NULL values to be in UNIQUE column, PRIMARY KEY doesn't allow NULLs. Also some ORM that generate code may have some problems with tables without primary keys.
UPDATE:
As far as I know it is not possible to replicate tables without primary keys in MSSQL, at least without problems (details).
If something is a primary key, depending on your DB engine, the entire table gets sorted by the primary key. This means that lookups are much faster on the primary key because it doesn't have to do any dereferencing as it has to do with any other kind of index. Besides that, it's just theory.
In addition to what the other answers have said, some databases and systems may require a primary to be present. One situation comes to mind; when using enterprise replication with Informix a PK must be present for a table to participate in replication.
As long as you do not allow NULL for a value, they should be handled the same, but the value NULL is handled differently on databases(AFAIK MS-SQL do not allow more than one(1) NULL value, mySQL and Oracle allow this, if a column is UNIQUE)
So you must define this column NOT NULL UNIQUE INDEX
There is no such thing as a primary key in relational data theory, so your question has to be answered on the practical level.
Unique indexes are not part of the SQL standard. The particular implementation of a DBMS will determine what are the consequences of declaring a unique index.
In Oracle, declaring a primary key will result in a unique index being created on your behalf, so the question is almost moot. I can't tell you about other DBMS products.
I favor declaring a primary key. This has the effect of forbidding NULLs in the key column(s) as well as forbidding duplicates. I also favor declaring REFERENCES constraints to enforce entity integrity. In many cases, declaring an index on the coulmn(s) of a foreign key will speed up joins. This kind of index should in general not be unique.
There are some disadvantages of CLUSTERED INDEXES vs UNIQUE INDEXES.
As already stated, a CLUSTERED INDEX physically orders the data in the table.
This mean that when you have a lot if inserts or deletes on a table containing a clustered index, everytime (well, almost, depending on your fill factor) you change the data, the physical table needs to be updated to stay sorted.
In relative small tables, this is fine, but when getting to tables that have GB's worth of data, and insertrs/deletes affect the sorting, you will run into problems.
I almost never create a table without a numeric primary key. If there is also a natural key that should be unique, I also put a unique index on it. Joins are faster on integers than multicolumn natural keys, data only needs to change in one place (natural keys tend to need to be updated which is a bad thing when it is in primary key - foreign key relationships). If you are going to need replication use a GUID instead of an integer, but for the most part I prefer a key that is user readable especially if they need to see it to distinguish between John Smith and John Smith.
The few times I don't create a surrogate key are when I have a joining table that is involved in a many-to-many relationship. In this case I declare both fields as the primary key.
My understanding is that a primary key and a unique index with a not‑null constraint, are the same (*); and I suppose one choose one or the other depending on what the specification explicitly states or implies (a matter of what you want to express and explicitly enforce). If it requires uniqueness and not‑null, then make it a primary key. If it just happens all parts of a unique index are not‑null without any requirement for that, then just make it a unique index.
The sole remaining difference is, you may have multiple not‑null unique indexes, while you can't have multiple primary keys.
(*) Excepting a practical difference: a primary key can be the default unique key for some operations, like defining a foreign key. Ex. if one define a foreign key referencing a table and does not provide the column name, if the referenced table has a primary key, then the primary key will be the referenced column. Otherwise, the the referenced column will have to be named explicitly.
Others here have mentioned DB replication, but I don't know about it.
Unique Index can have one NULL value. It creates NON-CLUSTERED INDEX.
Primary Key cannot contain NULL value. It creates CLUSTERED INDEX.
In MSSQL, Primary keys should be monotonically increasing for best performance on the clustered index. Therefore an integer with identity insert is better than any natural key that might not be monotonically increasing.
If it were up to me...
You need to satisfy the requirements of the database and of your applications.
Adding an auto-incrementing integer or long id column to every table to serve as the primary key takes care of the database requirements.
You would then add at least one other unique index to the table for use by your application. This would be the index on employee_id, or account_id, or customer_id, etc. If possible, this index should not be a composite index.
I would favor indices on several fields individually over composite indices. The database will use the single field indices whenever the where clause includes those fields, but it will only use a composite when you provide the fields in exactly the correct order - meaning it can't use the second field in a composite index unless you provide both the first and second in your where clause.
I am all for using calculated or Function type indices - and would recommend using them over composite indices. It makes it very easy to use the function index by using the same function in your where clause.
This takes care of your application requirements.
It is highly likely that other non-primary indices are actually mappings of that indexes key value to a primary key value, not rowid()'s. This allows for physical sorting operations and deletes to occur without having to recreate these indices.