Do I really need PRIMARY KEY when using UNIQUE NOT NULL columns? - sql

My knowledge in SQL is limited and I would appreciate someone who could help me to clarify the use of PRIMARY KEY in the following circumstances. I created a table to support ISO country information. I'm using MariaDB 10 but I believe that will not be relevant for the kind of questions I have(?)
CREATE TABLE IF NOT EXISTS python.country
(
iso_code INTEGER( 3) NOT NULL ,
iso_2_alpha VARCHAR( 2) NOT NULL ,
iso_3_alpha VARCHAR( 3) NOT NULL ,
short_name VARCHAR( 32) NOT NULL ,
long_name VARCHAR( 64) NOT NULL ,
flag_link VARCHAR(2000) DEFAULT(NULL),
CONSTRAINT CK_iso_code CHECK (iso_code > 0 AND iso_code <= 999) ,
CONSTRAINT CK_iso_alpha CHECK (
iso_2_alpha RLIKE BINARY '^[A-Z]+$' AND LENGTH(iso_2_alpha) = 2
AND
iso_3_alpha RLIKE BINARY '^[A-Z]+$' AND LENGTH(iso_3_alpha) = 3
) ,
CONSTRAINT CK_names CHECK (
short_name RLIKE '^\\p{L}+(\\.?[[:blank:]]\\p{L}+)*\\p{L}+$'
AND
long_name RLIKE '^\\p{L}+(\\.?[[:blank:]]\\p{L}+)*\\p{L}+$'
) ,
CONSTRAINT UN_short_name UNIQUE (short_name) ,
CONSTRAINT UN_long_name UNIQUE (long_name) ,
CONSTRAINT UN_iso_2_alpha UNIQUE (iso_2_alpha) ,
CONSTRAINT UN_iso_3_alpha UNIQUE (iso_3_alpha)
-- ???
-- CONSTRAINT PK_country PRIMARY KEY (iso_code,iso_2_alpha,iso_3_alpha)
); -- ENGINE = 'InnoDB';
Question 1: Since all main columns (iso_code,iso_2_alpha,iso_3_alpha) are NOT NULL and UNIQUE does make sense to create a composite PRIMARY KEY? I "believe" it's waste of space and time when inserting new elements?
Question 2: Can I use iso_code safely has being the FOREIGN KEY in other table?
Many thanks.

Since all main columns (iso_code,iso_2_alpha,iso_3_alpha) are NOT NULL and UNIQUE does make sense to create a composite PRIMARY KEY? I "believe" it's waste of space and time when inserting new elements?
Your proposed PK is a superkey over existing keys. It's not necessary in and of itself. You could choose to declare one of your unique key constraints as a PK instead but it's not necessary.
Can I use iso_code safely has being the FOREIGN KEY in other table?
If you also mark iso_code as a unique key in this table, that should work fine.
Some people would recommend that every table always have an autogenerated column marked as PK. That's fine so long as you also enforce the logical keys. Unfortunately, many people will just create that auto-PK and no other keys, which means your data is nonsense.
You've chosen (currently) to just have the logical keys. I think that's fine in this case, especially as several (iso_code, iso_2_alpha and iso_3_alpha) are likely to be more compact that the recommended autogenerated column.

Can't comment on performance and efficiency but one thing with composite keys is that when you use them as a primary key, you have to repeat them in your foreign key. I.e, PK iso_code, iso_2_alpha, iso_3_alpha will be additional FK columns in all the tables related. You also have to then query by these 3 columns in your SQL queries. Bit of a PITA IMO when you can simply use a generic, unique self generating column.
If you can use iso_code and you are sure you never ever ever will have the chance to require inserting a duplicate iso_code that has a different iso_2_alpha, iso_3_alpha then go ahead. But, you should future proof and make table more robust and anticipate the unexpected, use a new dedicated id column unrelated to the business, IMHO.

Related

Should I use a unique constraint in a table even though it isn't necessarily required?

In Microsoft SQL Server, when creating tables, are there any downsides to using a unique constraint on a column even though you don't really need it to be unique?
An example would be descriptions for say a role in a user management system:
CREATE TABLE Role
(
ID TINYINT PRIMARY KEY NOT NULL IDENTITY(0, 1),
Title CHARACTER VARYING(32) NOT NULL UNIQUE,
Description CHARACTER VARYING(MAX) NOT NULL UNIQUE
)
My fear is that validating this constraint when doing frequent insertions in other tables will be a very time consuming process. I am unsure as to how this constraint is validated, but I feel like it could be done in a very efficient way or as a linear comparison.
Your fear becomes true: UNIQUE constraint are implemented as indices, and this is time and space consuming.
So, whenever you insert a new row, the database have to update the table, and also one index for each unique constraint.
So, according to you:
using a unique constraint on a column even though you don't really need it to be unique
the answer is no, don't use it. there are time and space downsides.
Your sample table would need a clustered index for the Id, and 2 extra indices, one for each unique constraint. This takes up space, and time to update the 3 indices on the inserts.
This would only be justified if you made queries filtering by those fields.
BY THE WAY:
The original post sample table have several flaws:
that syntax is not SQL Server syntax (and you tagged this as SQL Server)
you cannot create an index in a varchar(max) column
If you correct the syntax and create this table:
CREATE TABLE Role
(
ID tinyint PRIMARY KEY NOT NULL IDENTITY(0, 1),
Title varchar(32) NOT NULL UNIQUE,
Description varchar(32) NOT NULL UNIQUE
)
You can then execute sp_help Role and you'll find the 3 indices.
The database creates an index which backs up the UNIQUE constraint, so it should be very low-cost to do the uniqueness check.
http://msdn.microsoft.com/en-us/library/ms177420.aspx
The Database Engine automatically creates a UNIQUE index to enforce the uniqueness requirement of the UNIQUE constraint. Therefore, if an attempt to insert a duplicate row is made, the Database Engine returns an error message that states the UNIQUE constraint has been violated and does not add the row to the table. Unless a clustered index is explicitly specified, a unique, nonclustered index is created by default to enforce the UNIQUE constraint.
Is it typically a good practice to constrain it if you know the data
will always be unique but it doesn't necessarily need to be unique for
the application to function correctly?
My question to you: would it make sense for two roles to have different titles but the same description? e.g.
INSERT INTO Role ( Title , Description )
VALUES ( 'CEO' , 'Senior manager' ),
( 'CTO' , 'Senior manager' );
To me it would seem to devalue the use of the description; if there were many duplications then it might make more sense to do something more like this:
INSERT INTO Role ( Title )
VALUES ( 'CEO' ),
( 'CTO' );
INSERT INTO SeniorManagers ( Title )
VALUES ( 'CEO' ),
( 'CTO' );
But then again you are not expecting duplicates.
I assume this is a low activity table. You say you fear validating this constraint when doing frequent insertions in other tables. Well, that will not happen (unless there is a trigger we cannot see that might update this table when another table is updated).
Personally, I would ask the designer (business analyst, whatever) to justify not applying a unique constraint. If they cannot then I would impose the unqiue constraint based on common sense. As is usual for such a text column, I would also apply CHECK constraints e.g. to disallow leading/trailing/double spaces, zero-length string, etc.
On SQL Server, the data type tinyint only gives you 256 distinct values. No matter what you do outside of the id column, you're not going to end up with a very big table. It will surely perform quickly even with a dozen indexed columns.
You usually need at least one unique constraint besides the surrogate key, though. If you don't have one, you're liable to end up with data like this.
1 First title First description
2 First title First description
3 First title First description
...
17 Third title Third description
18 First title First description
Tables that permit data like that are usually wrong. Any table that uses foreign key references to this table won't be able to report correctly, say, the number of "First title" used.
I'd argue that allowing multiple, identical titles for roles in a user management system is a design error. I'd probably argue that "title" is a really bad name for that column, too.

ORACLE Table design: M:N table best practice

I'd like to hear your suggestions on this very basic question:
Imagine these three tables:
--DROP TABLE a_to_b;
--DROP TABLE a;
--DROP TABLE b;
CREATE TABLE A
(
ID NUMBER NOT NULL ,
NAME VARCHAR2(20) NOT NULL ,
CONSTRAINT A_PK PRIMARY KEY ( ID ) ENABLE
);
CREATE TABLE B
(
ID NUMBER NOT NULL ,
NAME VARCHAR2(20) NOT NULL ,
CONSTRAINT B_PK PRIMARY KEY ( ID ) ENABLE
);
CREATE TABLE A_TO_B
(
id NUMBER NOT NULL,
a_id NUMBER NOT NULL,
b_id NUMBER NOT NULL,
somevalue1 VARCHAR2(20) NOT NULL,
somevalue2 VARCHAR2(20) NOT NULL,
somevalue3 VARCHAR2(20) NOT NULL
) ;
How would you design table a_to_b?
I'll give some discussion starters:
synthetic id-PK column or combined a_id,b_id-PK (dropping the "id" column)
When synthetic: What other indices/constraints?
When combined: Also index on b_id? Or even b_id,a_id (don't think so)?
Also combined when these entries are referenced themselves?
Also combined when these entries perhaps are referenced themselves in the future?
Heap or Index-organized table
Always or only up to x "somevalue"-columns?
I know that the decision for one of the designs is closely related to the question how the table will be used (read/write ratio, density, etc.), but perhaps we get a 20/80 solution as blueprint for future readers.
I'm looking forward to your ideas!
Blama
I have always made the PK be the combination of the two FKs, a_id and b_id in your example. Adding a synthetic id field to this table does no good, since you never end up looking for a row based on a knowledge of its id.
Using the compound PK gives you a constraint that prevents the same instance of the relationship between a and b from being inserted twice. If duplicate entries need to be permitted, there's something wrong with your data model at the conceptual level.
The index you get behind the scenes (for every DBMS I know of) will be useful to speed up common joins. An extra index on b_id is sometimes useful, depending on the kinds of joins you do frequently.
Just as a side note, I don't use the name "id" for all my synthetic pk columns. I prefer a_id, b_id. It makes it easier to manage the metadata, even though it's a little extra typing.
CREATE TABLE A_TO_B
(
a_id NUMBER NOT NULL REFERENCES A (a_id),
b_id NUMBER NOT NULL REFERENCES B (b_id),
PRIMARY KEY (a_id, b_id),
...
) ;
It's not unusual for ORMs to require (or, in more clueful ORMs, hope for) an integer column named "id" in addition to whatever other keys you have. Apart from that, there's no need for it. An id number like that makes the table wider (which usually degrades I/O performance just slightly), and adds an index that is, strictly speaking, unnecessary. It isn't necessary to identify the entity--the existing key does that--and it leads new developers into bad habits. (Specifically, giving every table an integer column named "id", and believing that that column alone is the only key you need.)
You're likely to need one or more of these indexed.
a_id
b_id
{a_id, b_id}
{b_id, a_id}
I believe Oracle should automatically index {a_id, b_id}, because that's the primary key. Oracle doesn't automatically index foreign keys. Oracle's indexing guidelines are online.
In general, you need to think carefully about whether you need ON UPDATE CASCADE or ON DELETE CASCADE. In Oracle, you only need to think carefully about whether you need ON DELETE CASCADE. (Oracle doesn't support ON UPDATE CASCADE.)
the other comments so far are good.
also consider adding begin_dt and end_dt to the relationship. in this way, you can manage a good number of questions about each relationship through time. (consider baseline issues)

What are the drawbacks of foreign key constraints that reference non-primary-key columns?

I want to know if there are any drawbacks between a referential relation that uses primary key columns versus unique key columns (in SQL Server a foreign key constraint can only reference columns in a primary key or unique index).
Are there differences in how queries are parsed, in specific DB systems (e.g. Microsoft SQL Server 2005), based on whether a foreign key references a primary key versus a unique key?
Note that I'm not asking about the differences between using columns of different datatypes for referential integrity, joins, etc.
Purely as an example, imagine a DB in which there is a 'lookup table' dbo.Offices:
CREATE TABLE dbo.Offices (
ID int NOT NULL IDENTITY(1,1) CONSTRAINT PK_Codes PRIMARY KEY,
Code varchar(50) NOT NULL CONSTRAINT UQ_Codes_Code UNIQUE
);
There is also a table dbo.Patients:
CREATE TABLE dbo.Patients (
ID int NOT NULL IDENTITY(1,1) CONSTRAINT PK_Patients PRIMARY KEY,
OfficeCode varchar(50) NOT NULL,
...
CONSTRAINT FK_Patients_Offices FOREIGN KEY ( OfficeCode )
REFERENCES dbo.Offices ( Code )
);
What are the drawbacks of the table dbo.Patients and its constraint FK_Patients_Offices as in the T-SQL code above, versus the following alternate version:
CREATE TABLE dbo.Patients (
ID int NOT NULL IDENTITY(1,1) CONSTRAINT PK_Patients PRIMARY KEY,
OfficeID int NOT NULL,
...
CONSTRAINT FK_Patients_Offices FOREIGN KEY ( OfficeID )
REFERENCES dbo.Offices ( ID )
);
Obviously, for the second version of dbo.Patients, the values in the column OfficeID don't need to be updated if changes are made to values in the Code column of dbo.Offices.
Also (obvious) is that using the Code column of dbo.Offices for foreign key references largely defeats the purpose of the surrogate key column ID – this is purely an artifact of the example. [Is there a better example of a table for which foreign key references might reasonably use a non-primary key?]
There is no drawback.
However..
Why do you have an ID column in the Offices table? A surrogate key is used to reduce space and improve performance over, say, a varchar column when used in other tables as a foreign key.
If you are going to use the varchar column for foreign keys, then you don't need a surrogate key.
Most benefits of having the IDENTITY are squandered by using the Code column for FKs.
Why do you think there would be any drawbacks??
Quite the contrary! It's good to see you're enforcing referential integrity as everyone should! No drawbacks - just good practice to do this!
I don't see any functional difference or any problems/issues with referencing a unique index vs. referencing a primary key.
Update: since you're not interested in performance- or datatype-related issues, this last paragraph probably doesn't add any additional value.
The only minor thing I see is that your OfficeCode is both a VARCHAR and thus you might run into issues with collation and/or casing (upper-/lower-case, depending on your collation), and JOIN's on a fairly large (up to 50 bytes) and varying length field are probably not quite as efficient as JOIN conditions based on a small, fixed-length INT column.
A primary key is a candidate key and is not fundamentally different from any other candidate key. It is a widely observed convention that one candidate key per table is designated as a "primary" one and that this is the key used for all foreign key references.
A possible advantage of singling out one key in this way is that you make the use of the key clearer to users of the database: they know which key is the one being referenced without looking in every referencing table. This is entirely optional however. If you find it convenient to do otherwise or if requirements dictate that some other key should be referenced by a foreign key then I suggest you do that.
Assuming you add an index on the code column (which you definitely should as soon as you reference to it), is there anything to be said against getting rid of the entire ID column and using the code column as PK as well?
The most significant one I can think of is that, if they ever renumber the offices, you'll either lose integrity or need to update both tables. However likely that might be.
The performance consequences are vanishingly small unless you have irrationally large office codes, and even then less than you probably expect.
It's not considered a significant determinant of database design for most people.
Big flaw
We were able to enter some value into dbo.Patients.OfficeID that is not there in dbo.Offices.ID
There is no meaning to say that there is a reference.

mysql: difference between primary key and unique index? [duplicate]

At work we have a big database with unique indexes instead of primary keys and all works fine.
I'm designing new database for a new project and I have a dilemma:
In DB theory, primary key is fundamental element, that's OK, but in REAL projects what are advantages and disadvantages of both?
What do you use in projects?
EDIT: ...and what about primary keys and replication on MS SQL server?
What is a unique index?
A unique index on a column is an index on that column that also enforces the constraint that you cannot have two equal values in that column in two different rows. Example:
CREATE TABLE table1 (foo int, bar int);
CREATE UNIQUE INDEX ux_table1_foo ON table1(foo); -- Create unique index on foo.
INSERT INTO table1 (foo, bar) VALUES (1, 2); -- OK
INSERT INTO table1 (foo, bar) VALUES (2, 2); -- OK
INSERT INTO table1 (foo, bar) VALUES (3, 1); -- OK
INSERT INTO table1 (foo, bar) VALUES (1, 4); -- Fails!
Duplicate entry '1' for key 'ux_table1_foo'
The last insert fails because it violates the unique index on column foo when it tries to insert the value 1 into this column for a second time.
In MySQL a unique constraint allows multiple NULLs.
It is possible to make a unique index on mutiple columns.
Primary key versus unique index
Things that are the same:
A primary key implies a unique index.
Things that are different:
A primary key also implies NOT NULL, but a unique index can be nullable.
There can be only one primary key, but there can be multiple unique indexes.
If there is no clustered index defined then the primary key will be the clustered index.
You can see it like this:
A Primary Key IS Unique
A Unique value doesn't have to be the Representaion of the Element
Meaning?; Well a primary key is used to identify the element, if you have a "Person" you would like to have a Personal Identification Number ( SSN or such ) which is Primary to your Person.
On the other hand, the person might have an e-mail which is unique, but doensn't identify the person.
I always have Primary Keys, even in relationship tables ( the mid-table / connection table ) I might have them. Why? Well I like to follow a standard when coding, if the "Person" has an identifier, the Car has an identifier, well, then the Person -> Car should have an identifier as well!
Foreign keys work with unique constraints as well as primary keys. From Books Online:
A FOREIGN KEY constraint does not have
to be linked only to a PRIMARY KEY
constraint in another table; it can
also be defined to reference the
columns of a UNIQUE constraint in
another table
For transactional replication, you need the primary key. From Books Online:
Tables published for transactional
replication must have a primary key.
If a table is in a transactional
replication publication, you cannot
disable any indexes that are
associated with primary key columns.
These indexes are required by
replication. To disable an index, you
must first drop the table from the
publication.
Both answers are for SQL Server 2005.
The choice of when to use a surrogate primary key as opposed to a natural key is tricky. Answers such as, always or never, are rarely useful. I find that it depends on the situation.
As an example, I have the following tables:
CREATE TABLE toll_booths (
id INTEGER NOT NULL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
...
UNIQUE(name)
)
CREATE TABLE cars (
vin VARCHAR(17) NOT NULL PRIMARY KEY,
license_plate VARCHAR(10) NOT NULL,
...
UNIQUE(license_plate)
)
CREATE TABLE drive_through (
id INTEGER NOT NULL PRIMARY KEY,
toll_booth_id INTEGER NOT NULL REFERENCES toll_booths(id),
vin VARCHAR(17) NOT NULL REFERENCES cars(vin),
at TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
amount NUMERIC(10,4) NOT NULL,
...
UNIQUE(toll_booth_id, vin)
)
We have two entity tables (toll_booths and cars) and a transaction table (drive_through). The toll_booth table uses a surrogate key because it has no natural attribute that is not guaranteed to change (the name can easily be changed). The cars table uses a natural primary key because it has a non-changing unique identifier (vin). The drive_through transaction table uses a surrogate key for easy identification, but also has a unique constraint on the attributes that are guaranteed to be unique at the time the record is inserted.
http://database-programmer.blogspot.com has some great articles on this particular subject.
There are no disadvantages of primary keys.
To add just some information to #MrWiggles and #Peter Parker answers, when table doesn't have primary key for example you won't be able to edit data in some applications (they will end up saying sth like cannot edit / delete data without primary key). Postgresql allows multiple NULL values to be in UNIQUE column, PRIMARY KEY doesn't allow NULLs. Also some ORM that generate code may have some problems with tables without primary keys.
UPDATE:
As far as I know it is not possible to replicate tables without primary keys in MSSQL, at least without problems (details).
If something is a primary key, depending on your DB engine, the entire table gets sorted by the primary key. This means that lookups are much faster on the primary key because it doesn't have to do any dereferencing as it has to do with any other kind of index. Besides that, it's just theory.
In addition to what the other answers have said, some databases and systems may require a primary to be present. One situation comes to mind; when using enterprise replication with Informix a PK must be present for a table to participate in replication.
As long as you do not allow NULL for a value, they should be handled the same, but the value NULL is handled differently on databases(AFAIK MS-SQL do not allow more than one(1) NULL value, mySQL and Oracle allow this, if a column is UNIQUE)
So you must define this column NOT NULL UNIQUE INDEX
There is no such thing as a primary key in relational data theory, so your question has to be answered on the practical level.
Unique indexes are not part of the SQL standard. The particular implementation of a DBMS will determine what are the consequences of declaring a unique index.
In Oracle, declaring a primary key will result in a unique index being created on your behalf, so the question is almost moot. I can't tell you about other DBMS products.
I favor declaring a primary key. This has the effect of forbidding NULLs in the key column(s) as well as forbidding duplicates. I also favor declaring REFERENCES constraints to enforce entity integrity. In many cases, declaring an index on the coulmn(s) of a foreign key will speed up joins. This kind of index should in general not be unique.
There are some disadvantages of CLUSTERED INDEXES vs UNIQUE INDEXES.
As already stated, a CLUSTERED INDEX physically orders the data in the table.
This mean that when you have a lot if inserts or deletes on a table containing a clustered index, everytime (well, almost, depending on your fill factor) you change the data, the physical table needs to be updated to stay sorted.
In relative small tables, this is fine, but when getting to tables that have GB's worth of data, and insertrs/deletes affect the sorting, you will run into problems.
I almost never create a table without a numeric primary key. If there is also a natural key that should be unique, I also put a unique index on it. Joins are faster on integers than multicolumn natural keys, data only needs to change in one place (natural keys tend to need to be updated which is a bad thing when it is in primary key - foreign key relationships). If you are going to need replication use a GUID instead of an integer, but for the most part I prefer a key that is user readable especially if they need to see it to distinguish between John Smith and John Smith.
The few times I don't create a surrogate key are when I have a joining table that is involved in a many-to-many relationship. In this case I declare both fields as the primary key.
My understanding is that a primary key and a unique index with a not‑null constraint, are the same (*); and I suppose one choose one or the other depending on what the specification explicitly states or implies (a matter of what you want to express and explicitly enforce). If it requires uniqueness and not‑null, then make it a primary key. If it just happens all parts of a unique index are not‑null without any requirement for that, then just make it a unique index.
The sole remaining difference is, you may have multiple not‑null unique indexes, while you can't have multiple primary keys.
(*) Excepting a practical difference: a primary key can be the default unique key for some operations, like defining a foreign key. Ex. if one define a foreign key referencing a table and does not provide the column name, if the referenced table has a primary key, then the primary key will be the referenced column. Otherwise, the the referenced column will have to be named explicitly.
Others here have mentioned DB replication, but I don't know about it.
Unique Index can have one NULL value. It creates NON-CLUSTERED INDEX.
Primary Key cannot contain NULL value. It creates CLUSTERED INDEX.
In MSSQL, Primary keys should be monotonically increasing for best performance on the clustered index. Therefore an integer with identity insert is better than any natural key that might not be monotonically increasing.
If it were up to me...
You need to satisfy the requirements of the database and of your applications.
Adding an auto-incrementing integer or long id column to every table to serve as the primary key takes care of the database requirements.
You would then add at least one other unique index to the table for use by your application. This would be the index on employee_id, or account_id, or customer_id, etc. If possible, this index should not be a composite index.
I would favor indices on several fields individually over composite indices. The database will use the single field indices whenever the where clause includes those fields, but it will only use a composite when you provide the fields in exactly the correct order - meaning it can't use the second field in a composite index unless you provide both the first and second in your where clause.
I am all for using calculated or Function type indices - and would recommend using them over composite indices. It makes it very easy to use the function index by using the same function in your where clause.
This takes care of your application requirements.
It is highly likely that other non-primary indices are actually mappings of that indexes key value to a primary key value, not rowid()'s. This allows for physical sorting operations and deletes to occur without having to recreate these indices.

Primary key or Unique index?

At work we have a big database with unique indexes instead of primary keys and all works fine.
I'm designing new database for a new project and I have a dilemma:
In DB theory, primary key is fundamental element, that's OK, but in REAL projects what are advantages and disadvantages of both?
What do you use in projects?
EDIT: ...and what about primary keys and replication on MS SQL server?
What is a unique index?
A unique index on a column is an index on that column that also enforces the constraint that you cannot have two equal values in that column in two different rows. Example:
CREATE TABLE table1 (foo int, bar int);
CREATE UNIQUE INDEX ux_table1_foo ON table1(foo); -- Create unique index on foo.
INSERT INTO table1 (foo, bar) VALUES (1, 2); -- OK
INSERT INTO table1 (foo, bar) VALUES (2, 2); -- OK
INSERT INTO table1 (foo, bar) VALUES (3, 1); -- OK
INSERT INTO table1 (foo, bar) VALUES (1, 4); -- Fails!
Duplicate entry '1' for key 'ux_table1_foo'
The last insert fails because it violates the unique index on column foo when it tries to insert the value 1 into this column for a second time.
In MySQL a unique constraint allows multiple NULLs.
It is possible to make a unique index on mutiple columns.
Primary key versus unique index
Things that are the same:
A primary key implies a unique index.
Things that are different:
A primary key also implies NOT NULL, but a unique index can be nullable.
There can be only one primary key, but there can be multiple unique indexes.
If there is no clustered index defined then the primary key will be the clustered index.
You can see it like this:
A Primary Key IS Unique
A Unique value doesn't have to be the Representaion of the Element
Meaning?; Well a primary key is used to identify the element, if you have a "Person" you would like to have a Personal Identification Number ( SSN or such ) which is Primary to your Person.
On the other hand, the person might have an e-mail which is unique, but doensn't identify the person.
I always have Primary Keys, even in relationship tables ( the mid-table / connection table ) I might have them. Why? Well I like to follow a standard when coding, if the "Person" has an identifier, the Car has an identifier, well, then the Person -> Car should have an identifier as well!
Foreign keys work with unique constraints as well as primary keys. From Books Online:
A FOREIGN KEY constraint does not have
to be linked only to a PRIMARY KEY
constraint in another table; it can
also be defined to reference the
columns of a UNIQUE constraint in
another table
For transactional replication, you need the primary key. From Books Online:
Tables published for transactional
replication must have a primary key.
If a table is in a transactional
replication publication, you cannot
disable any indexes that are
associated with primary key columns.
These indexes are required by
replication. To disable an index, you
must first drop the table from the
publication.
Both answers are for SQL Server 2005.
The choice of when to use a surrogate primary key as opposed to a natural key is tricky. Answers such as, always or never, are rarely useful. I find that it depends on the situation.
As an example, I have the following tables:
CREATE TABLE toll_booths (
id INTEGER NOT NULL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
...
UNIQUE(name)
)
CREATE TABLE cars (
vin VARCHAR(17) NOT NULL PRIMARY KEY,
license_plate VARCHAR(10) NOT NULL,
...
UNIQUE(license_plate)
)
CREATE TABLE drive_through (
id INTEGER NOT NULL PRIMARY KEY,
toll_booth_id INTEGER NOT NULL REFERENCES toll_booths(id),
vin VARCHAR(17) NOT NULL REFERENCES cars(vin),
at TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
amount NUMERIC(10,4) NOT NULL,
...
UNIQUE(toll_booth_id, vin)
)
We have two entity tables (toll_booths and cars) and a transaction table (drive_through). The toll_booth table uses a surrogate key because it has no natural attribute that is not guaranteed to change (the name can easily be changed). The cars table uses a natural primary key because it has a non-changing unique identifier (vin). The drive_through transaction table uses a surrogate key for easy identification, but also has a unique constraint on the attributes that are guaranteed to be unique at the time the record is inserted.
http://database-programmer.blogspot.com has some great articles on this particular subject.
There are no disadvantages of primary keys.
To add just some information to #MrWiggles and #Peter Parker answers, when table doesn't have primary key for example you won't be able to edit data in some applications (they will end up saying sth like cannot edit / delete data without primary key). Postgresql allows multiple NULL values to be in UNIQUE column, PRIMARY KEY doesn't allow NULLs. Also some ORM that generate code may have some problems with tables without primary keys.
UPDATE:
As far as I know it is not possible to replicate tables without primary keys in MSSQL, at least without problems (details).
If something is a primary key, depending on your DB engine, the entire table gets sorted by the primary key. This means that lookups are much faster on the primary key because it doesn't have to do any dereferencing as it has to do with any other kind of index. Besides that, it's just theory.
In addition to what the other answers have said, some databases and systems may require a primary to be present. One situation comes to mind; when using enterprise replication with Informix a PK must be present for a table to participate in replication.
As long as you do not allow NULL for a value, they should be handled the same, but the value NULL is handled differently on databases(AFAIK MS-SQL do not allow more than one(1) NULL value, mySQL and Oracle allow this, if a column is UNIQUE)
So you must define this column NOT NULL UNIQUE INDEX
There is no such thing as a primary key in relational data theory, so your question has to be answered on the practical level.
Unique indexes are not part of the SQL standard. The particular implementation of a DBMS will determine what are the consequences of declaring a unique index.
In Oracle, declaring a primary key will result in a unique index being created on your behalf, so the question is almost moot. I can't tell you about other DBMS products.
I favor declaring a primary key. This has the effect of forbidding NULLs in the key column(s) as well as forbidding duplicates. I also favor declaring REFERENCES constraints to enforce entity integrity. In many cases, declaring an index on the coulmn(s) of a foreign key will speed up joins. This kind of index should in general not be unique.
There are some disadvantages of CLUSTERED INDEXES vs UNIQUE INDEXES.
As already stated, a CLUSTERED INDEX physically orders the data in the table.
This mean that when you have a lot if inserts or deletes on a table containing a clustered index, everytime (well, almost, depending on your fill factor) you change the data, the physical table needs to be updated to stay sorted.
In relative small tables, this is fine, but when getting to tables that have GB's worth of data, and insertrs/deletes affect the sorting, you will run into problems.
I almost never create a table without a numeric primary key. If there is also a natural key that should be unique, I also put a unique index on it. Joins are faster on integers than multicolumn natural keys, data only needs to change in one place (natural keys tend to need to be updated which is a bad thing when it is in primary key - foreign key relationships). If you are going to need replication use a GUID instead of an integer, but for the most part I prefer a key that is user readable especially if they need to see it to distinguish between John Smith and John Smith.
The few times I don't create a surrogate key are when I have a joining table that is involved in a many-to-many relationship. In this case I declare both fields as the primary key.
My understanding is that a primary key and a unique index with a not‑null constraint, are the same (*); and I suppose one choose one or the other depending on what the specification explicitly states or implies (a matter of what you want to express and explicitly enforce). If it requires uniqueness and not‑null, then make it a primary key. If it just happens all parts of a unique index are not‑null without any requirement for that, then just make it a unique index.
The sole remaining difference is, you may have multiple not‑null unique indexes, while you can't have multiple primary keys.
(*) Excepting a practical difference: a primary key can be the default unique key for some operations, like defining a foreign key. Ex. if one define a foreign key referencing a table and does not provide the column name, if the referenced table has a primary key, then the primary key will be the referenced column. Otherwise, the the referenced column will have to be named explicitly.
Others here have mentioned DB replication, but I don't know about it.
Unique Index can have one NULL value. It creates NON-CLUSTERED INDEX.
Primary Key cannot contain NULL value. It creates CLUSTERED INDEX.
In MSSQL, Primary keys should be monotonically increasing for best performance on the clustered index. Therefore an integer with identity insert is better than any natural key that might not be monotonically increasing.
If it were up to me...
You need to satisfy the requirements of the database and of your applications.
Adding an auto-incrementing integer or long id column to every table to serve as the primary key takes care of the database requirements.
You would then add at least one other unique index to the table for use by your application. This would be the index on employee_id, or account_id, or customer_id, etc. If possible, this index should not be a composite index.
I would favor indices on several fields individually over composite indices. The database will use the single field indices whenever the where clause includes those fields, but it will only use a composite when you provide the fields in exactly the correct order - meaning it can't use the second field in a composite index unless you provide both the first and second in your where clause.
I am all for using calculated or Function type indices - and would recommend using them over composite indices. It makes it very easy to use the function index by using the same function in your where clause.
This takes care of your application requirements.
It is highly likely that other non-primary indices are actually mappings of that indexes key value to a primary key value, not rowid()'s. This allows for physical sorting operations and deletes to occur without having to recreate these indices.