Should every table have a primary key?

Should every table have a primary key? - sql

I read somewhere saying that every table should have a primary key to fulfill 1NF.
I have a tbl_friendship table.
There are 2 fields in the table : Owner and Friend.
Fields of Owner and Friends are foreign keys of auto increment id field in tbl_user.
Should this tbl_friendship has a primary key?
Should I create an auto increment id field in tbl_friendship and make it as primary key?

Primary keys can apply to multiple columns! In your example, the primary key should be on both columns, For example (Owner, Friend). Especially when Owner and Friend are foreign keys to a users table rather than actual names say (personally, my identity columns use the "Id" naming convention and so I would have (OwnerId, FriendId)
Personally I believe every table should have a primary key, but you'll find others who disagree.
Here's an article I wrote on the topic of normal forms.
http://michaeljswart.com/2011/01/ridiculously-unnormalized-database-schemas-part-zero/

Yes every table should have a primary key.
Yes you should create surrogate key.. aka an auto increment pk field.
You should also make "Friend" an FK to that auto increment field.
If you think that you are going to "rekey" in the future you might want to look into using natural keys, which are fields that naturally identify your data. The key to this is while coding always use the natural identifiers, and then you create unique indexes on those natural keys. In the future if you have to re-key you can, because your ux guarantees your data is consistent.
I would only do this if you absolutely have to, because it increases complexity, in your code and data model.

It is not clear from your description, but are owner and friend foreign keys and there can be only one relationship between any given pair? This makes two foreign key column a perfect candidate for a natural primary key.
Another option is to use surrogate key (extra auto-incremented column as you suggested). Take a look here for an in-depth discussion.

A primary key can be something abstract as well. In this case, each tuple (owner, friend), e.g. ("Dave","Matt") can form a unique entry and therefore be your primary key. In that case, it would be useful not to use names, but keys referencing another table. If you guarantee, that these tuples can't have duplicates, you have a valid primary key.
For processing reasons it might be useful to introduce a special primary key, like an autoincrement field (e.g. in MySQL) or using a sequence with Oracle.

To comply with 1NF (which is not completely aggreed upon what defines 1NF), yes you should have a primary key identified on each table. This is necessary to provide for uniqueness of each record.
http://en.wikipedia.org/wiki/First_normal_form
In general, you can create a primary key in many ways, one of which is to have an auto-increment column, another is to have a column with GUIDs, another is to have two or more columns that will identify a row uniquely when taken together.

Your table will be much easier to manage in the long term if it has a primary key. At the very least, you need to uniquely identify each record in the table. The field that is used to uniquely identify each record might as well be the primary key.

Yes every table should have (at least one) key. Duplicating rows in any table is undesirable for lots of reasons so put the constraint on those two columns.

Related

What is the difference between a primary key and a surrogate key?

I googled a lot, but I did not find the exact straight forward answer with an example.
Any example for this would be more helpful.

The primary key is a unique key in your table that you choose that best uniquely identifies a record in the table. All tables should have a primary key, because if you ever need to update or delete a record you need to know how to uniquely identify it.
A surrogate key is an artificially generated key. They're useful when your records essentially have no natural key (such as a Person table, since it's possible for two people born on the same date to have the same name, or records in a log, since it's possible for two events to happen such they they carry the same timestamp). Most often you'll see these implemented as integers in an automatically incrementing field, or as GUIDs that are generated automatically for each record. ID numbers are almost always surrogate keys.
Unlike primary keys, not all tables need surrogate keys, however. If you have a table that lists the states in America, you don't really need an ID number for them. You could use the state abbreviation as a primary key code.
The main advantage of the surrogate key is that they're easy to guarantee as unique. The main disadvantage is that they don't have any meaning. There's no meaning that "28" is Wisconsin, for example, but when you see 'WI' in the State column of your Address table, you know what state you're talking about without needing to look up which state is which in your State table.

A surrogate key is a made up value with the sole purpose of uniquely identifying a row. Usually, this is represented by an auto incrementing ID.
Example code:
CREATE TABLE Example
(
SurrogateKey INT IDENTITY(1,1) -- A surrogate key that increments automatically
)
A primary key is the identifying column or set of columns of a table. Can be surrogate key or any other unique combination of columns (for example a compound key). MUST be unique for any row and cannot be NULL.
Example code:
CREATE TABLE Example
(
PrimaryKey INT PRIMARY KEY -- A primary key is just an unique identifier
)

All keys are identifiers used as surrogates for the things they identify. E.F.Codd explained the concept of system-assigned surrogates as follows [1]:
Database users may cause the system to generate or delete a surrogate,
but they have no control over its value, nor is its value ever
displayed to them.
This is what is commonly referred to as a surrogate key. The definition is immediately problematic however because Codd was assuming that such a feature would be provided by the DBMS. DBMSs in general have no such feature. The keys are normally visible to at least some DBMS users as, for obvious reasons, they have to be. The concept of a surrogate has therefore morphed slightly in usage. The term is generally used in the data management profession to mean a key that is not exposed and used as an identifier in the business domain. Note that this is essentially unrelated to how the key is generated or how "artificial" it is perceived to be. All keys consist of symbols invented by humans or machines. The only possible significance of the term surrogate therefore relates how the key is used, not how it is created or what its values are.
[1] Extending the database relational model to capture more meaning, E.F.Codd, 1979

This is a great treatment describing the various kinds of keys:
http://www.agiledata.org/essays/keys.html

A surrogate key is typically a numeric value. Within SQL Server, Microsoft allows you to define a column with an identity property to help generate surrogate key values.
The PRIMARY KEY constraint uniquely identifies each record in a database table.
Primary keys must contain UNIQUE values.
A primary key column cannot contain NULL values.
Most tables should have a primary key, and each table can have only ONE primary key.
http://www.databasejournal.com/features/mssql/article.php/3922066/SQL-Server-Natural-Key-Verses-Surrogate-Key.htm

I think Michelle Poolet describes it in a very clear way:
A surrogate key is an artificially produced value, most often a
system-managed, incrementing counter whose values can range from 1 to
n, where n represents a table's maximum number of rows. In SQL Server,
you create a surrogate key by assigning an identity property to a
column that has a number data type.
http://sqlmag.com/business-intelligence/surrogate-key-vs-natural-key
It usually helps you use a surrogate key when you change a composite key with an identity column.

Primary key in "many-to-many" table

I have a table in a SQL database that provides a "many-to-many" connection.
The table contains id's of both tables and some fields with additional information about the connection.
CREATE TABLE SomeTable (
f_id1 INTEGER NOT NULL,
f_id2 INTEGER NOT NULL,
additional_info text NOT NULL,
ts timestamp NULL DEFAULT now()
);
The table is expected to contain 10 000 - 100 000 entries.
How is it better to design a primary key? Should I create an additional 'id' field, or to create a complex primary key from both id's?
DBMS is PostgreSQL

This is a "hard" question in the sense that there are pretty good arguments on both sides. I have a bias toward putting in auto-incremented ids in all tables that I use. Over time, I have found that this simply helps with the development process and I don't have to think about whether they are necessary.
A big reason for this is so foreign key references to the table can use only one column.
In a many-to-many junction table (aka "association table"), this probably isn't necessary:
It is unlikely that you will add a table with a foreign key relationship to a junction table.
You are going to want a unique index on the columns anyway.
They will probably be declared not null anyway.
Some databases actually store data based on the primary key. So, when you do an insert, then data must be moved on pages to accommodate the new values. Postgres is not one of those databases. It treats the primary key index just like any other index. In other words, you are not incurring "extra" work by declaring one more more columns as a primary key.
My conclusion is that having the composite primary key is fine, even though I would probably have an auto-incremented primary key with separate constraints. The composite primary key will occupy less space so probably be more efficient than an auto-incremented id. However, if there is any chance that this table would be used for a foreign key relationship, then add in another id field.

A surrogate key wont protect you from adding multiple instances of (f_id1, f_id2) so you should definitely have a unique constraint or primary key for that. What would the purpose of a surrogate key be in your scenario?

Yes that's actually what people commonly do, that key is called surrogate key.. I'm not exactly sure with PostgreSQL, but in MySQL by using surrogate key you can delete/edit the records from the user interface.. Besides, this allows the database to query the single key column faster than it could multiple columns.. Hope it helps..

Why we can't have more than one primary key?

I Know there can't be more than 1 primary key in a table but what is the technical reason ?

Pulled directly from SO:
You can only have one primary key, but you can have multiple columns in your primary key.
You can also have Unique Indexes on your table, which will work a bit like a primary key in that they will enforce unique values, and will speed up querying of those values.
Primary in the context of Primary Key means that it's ranked first in importance. Therefore, there can only be one key. It's by definition.
It's also usually the key for which the index has the actual data attached to it, that is, the data is stored with the primary key index. Other indices contain only the data that's being indexed, and perhaps some Included Columns.

In fact E.F.Codd (the inventor of the Relational Database Model) [1] originated the term "primary key" to mean any number of keys of a relation - not just one. He made it clear that it was quite possible to have more than one such key. His suggestion was that the database designer could choose one key as a preferred identifier ("the primary key") - but in principle this was optional and such a choice was "arbitrary" (that was his word). Because all keys enjoy the same properties as each other there is no fundamental need to choose any one over another.
Later on [2] what Codd originally called primary keys became known as candidate keys and the one key singled out as the preferred one became known as the "primary" key. This was not really a fundamental shift however because a primary key means exactly the same as a candidate key. Since they are equivalent concepts it doesn't really mean anything important when we say there "must" only be one primary key. If you have more than one candidate key you could quite reasonably call more than one of them "primary" if you prefer because it doesn't make any logical or practical difference to the meaning and function of the database.
It has been argued (by me among others) that the idea of designating one key per table as "primary" is utterly superfluous and sometimes a positive hinderance to a good understanding of database design and data intgrity issues. However, the concept is so entrenched we are probably stuck with it.
So the proper answer to your question is "convention" and "convenience". There is no good technical reason at all.
[1] A Relational Model of Data for Large Shared Data Banks (1970)
[2] E.g. in "Further Normalization of the Relational Data Base Model" (1971)

Well, it's called "primary" for a reason. As in, its the one key used to uniquely identify the record... and there "can be only one".
You could certainly mimick a second "primary" key by having an index placed on one or more other fields that are unique but for the purposes of your database server it's generally only necessary if your key isn't unique enough to cross database servers in a merge replication situation. (ie: multi master).

PRIMARY KEY is usually equivalent to UNIQUE INDEX NOT NULL. So you can effectively have multiple "primary keys" on a single table.

The primary key is the key which uniquely identifies that record.
I'm not sure if you're asking if a) there can be a single primary key spanning multiple columns, or b) if you can have multiple keys which uniquely identify the record.
The first is possible, known as a composite primary key.
The second is possible also, but only one is called the primary key.

Because the "primary" in "primary key" denotes its, mmm, singularity(?).
But if you need more, you can define UNIQUE keys which have quite the same behaviour.

The technical reason is that there can be only one primary. Otherwise it wouldn't be called so.
However a primary key can include several columns - see 7.5.2. Multiple-Column Indexes

The primary key is the one (of possibly many) unique identifiers of a particular row in a table. The other unique identifiers, which were not designated as the primary one, are hence often refereed to as secondary unique indexes.

Primary key allows us to uniquely identify each record in the table. You can have 2 primary keys in a table but they are called Composite Primary Keys. "When you define more than one column as your primary key on a table, it is called a composite primary key."

A primary key defines record uniqueness. To have two different measures of uniqueness can be problematic. For example, if you have primary keys A and B and you insert records where A is the same and B is different, then are those records the same or different? If you consider them different, then make your primary a composite of A and B. If you consider them the same record, then just use A or B as the primary key.

For non-clustered index we can create two index and are typically made on non-primary key columns used in JOIN, WHERE , ORDER BY clauses.
While in clustered index we have only one index and that on primary key. So if we have two primary keys there is ambiguity.
Also in referential intergrity there is ambiguity selecting one of the two primary keys.

Only one primary key possible on the table because primary key creates a clustered index on the table which stored data physically on the leaf node in ordered way based on that primary key column.
If we try to create one another primary key on that table then there will be one major problem related to the data.Because be can not store same data of the table in two different-2 order.

What is the difference between a primary key and a unique constraint?

Someone asked me this question on an interview...

Primary keys can't be null. Unique keys can.

A primary key is a unique field on a table but it is special in the sense that the table considers that row as its key. This means that other tables can use this field to create foreign key relationships to themselves.
A unique constraint simply means that a particular field must be unique.

Primary key can not be null but unique can have only one null value.
Primary key create the cluster index automatically but unique key not.
A table can have only one primary key but unique key more than one.

TL;DR Much can be implied by PRIMARY KEY (uniqueness, reference-able, non-null-ness, clustering, etc) but nothing that can't be stated explicitly using UNIQUE.
I suggest that if you are the kind of coder who likes the convenience of SELECT * FROM... without having to list out all those pesky columns then PRIMARY KEY is just the thing for you.
a relvar can have several keys, but we choose just one for underlining
and call that one the primary key. The choice is arbitrary, so the
concept of primary is not really very important from a logical point
of view. The general concept of key, however, is very important! The
term candidate key means exactly the same as key (i.e., the addition
of candidate has no real significance—it was proposed by Ted Codd
because he regarded each key as a candidate for being nominated as the
primary key)... SQL allows a subset of a table's columns to be
declared as a key for that table. It also allows one of them to be
nominated as the primary key. Specifying a key to be primary makes
for a certain amount of convenience in connection with other
constraints that might be needed
What Is a Key? by Hugh Darwen
it's usual... to single out one key as the primary key (and any other
keys for the relvar in question are then said to be alternate keys).
But whether some key is to be chosen as primary, and if so which one,
are essentially psychological issues, beyond the purview of the
relational model as such. As a matter of good practice, most base
relvars probably should have a primary key—but, to repeat, this rule,
if it is a rule, really isn't a relational issue as such... Strong
recommendation [to SQL users]: For base tables, at any rate, use
PRIMARY KEY and/or UNIQUE specifications to ensure that every such
table does have at least one key.
SQL and Relational Theory: How to Write Accurate SQL Code
By C. J. Date
In standard SQL PRIMARY KEY
implies uniqueness but you can specify that explicitly (using UNIQUE).
implies NOT NULL but you can specify that explicitly when creating columns (but you should be avoiding nulls anyhow!)
allows you to omit its columns in a FOREIGN KEY but you can specify them explicitly.
can be declared for only one key per table but it is not clear why (Codd, who originally proposed the concept, did not impose such a restriction).
In some products PRIMARY KEY implies the table's clustered index but you can specify that explicitly (you may not want the primary key to be the clustered index!)
For some people PRIMARY KEY has purely psychological significance:
they think it signifies that the key will be referenced in a foreign key (this was proposed by Codd but not actually adopted by standard SQL nor SQL vendors).
they think it signifies the sole key of the table (but the failure to enforce other candidate keys leads to loss of data integrity).
they think it implies a 'surrogate' or 'artificial ' key with no significance to the business (but actually imposes unwanted significance on the enterprise by being exposed to users).

Every primary key is a unique constraint, but in addition to the PK, a table can have additional unique constraints.
Say you have a table Employees, PK EmployeeID. You can add a unique constraint on SSN, for example.

Unique Key constraints:
Unique key constraint will provide you a constraint like the column values should retain uniqueness.
It will create non-clustered index by default
Any number of unique constraints can be added to a table.
It will allow null value in the column.
ALTER TABLE table_name
ADD CONSTRAINT UNIQUE_CONSTRAINT
UNIQUE (column_name1, column_name2, ...)
Primary Key:
Primary key will create column data uniqueness in the table.
Primary key will create clustered index by default
Only one Primay key can be created for a table
Multiple columns can be consolidated to form a single primary key
It wont allow null values.
ALTER TABLE table_name
ADD CONSTRAINT KEY_CONSTRAINT
PRIMARY KEY (column_name)

In addition to Andrew's answer, you can only have one primary key per table but you can have many unique constraints.

Primary key's purpose is to uniquely identify a row in a table. Unique constraint ensures that a field's value is unique among the rows in table.
You can have only one primary key per table. You can have more than one unique constraint per table.

A primary key is a minimal set of columns such that any two records with identical values in those columns have identical values in all columns. Note that a primary key can consist of multiple columns.
A uniqueness constraint is exactly what it sounds like.

The UNIQUE constraint uniquely identifies each record in a database table.
The UNIQUE and PRIMARY KEY constraints both provide a guarantee for uniqueness for a column or set of columns.
A PRIMARY KEY constraint automatically has a UNIQUE constraint defined on it.
Note that you can have many UNIQUE constraints per table, but only one PRIMARY KEY constraint per table

Primary key can't be null but unique constraint is nullable.
when you choose a primary key for your table it's atomatically Index that field.

Primary keys are essentially combination of (unique +not null). also when referencing a foreign key the rdbms requires Primary key.
Unique key just imposes uniqueness of the column.A value of field can be NULL in case of uniqe key. Also it cannot be used to reference a foreign key, that is quite obvious as u can have null values in it

Both guarantee uniqueness across the rows in the table, with the exception of nulls as mentioned in some of the other answers.
Additionally, a primary key "comes with" an index, which could be either clustered or non-clustered.

There are several good answers in here so far. In addition to the fact that a primary key can't be null, is a unique constraint itself, and can be comprised of multiple columns, there are deeper meanings that depend on the database server you are using.
I am a SQL Server user, so a primary key has a more specific meaning to me. In SQL Server, by default, primary keys are also part of what is called the "clustered index". A clustered index defines the actual ordering of data pages for that particular table, which means that the primary key ordering matches the physical order of rows on disk.
I know that one, possibly more, of MySql's table formats also support clustered indexing, which means the same thing as it does in SQL Server...it defines the physical row ordering on disk.
Oracle provides something called Index Organized Tables, which order the rows on disk by the primary key.
I am not very familiar with DB2, however I think a clustered index means the rows on disk are stored in the same order as a separate index. I don't know if the clustered index must match the primary key, or if it can be a distinct index.

A great number of the answers here have discussed the properties of PK vs unique constraints. But it is more important to understand the difference in concept.
A primary key is considered to be an identifier of the record in the database. So these will for example be referenced when creating foreign key references between tables. A primary key should therefore under normal circumstances never contain values that have any meaining in your domain (often automatically incremential fields are used for this).
A unique constraint is just a way of enforcing domain specific business rules in your database schema.
Becuase a PK it is an identifier for the record, you can never change the value of a primary key.

Should I use Natural Identity Columns without GUID?

Is it ok to define tables primary key with 2 foreign key column that in combination must be unique? And thus not add a new id column (Guid or int)?
Is there a negative to this?

Yes, it's completely OK. Why not? The downside of composite primary keys is that it can be long and it might be harder to identify a single row uniquely from the application perspective. However, for a couple integer columns (specially in junction tables), it's a good practice.

Natural versus Artificial primary keys is one of those issues that gets widely debated and IMHO the discussion only seems to see the positions harden.
In my opinion both work so long as the developer knows how to avoid the downside of both. A natural primary key (whether composite or single column) more nearly ensures that duplicate rows are not added to the DB. Whereas with artificial primary keys it is necessary to first check the record is unique (as opposed to the primary key, which being artificial will always be unique). One effective way to achieve this is to have a unique or candidate index on the fields that make the record unique (e.g. the fields that make a candidate for a primary key)
Meanwhile an artificial primary key makes for easy joining. Relations can be made with a single field to single field join. With a composite key the writer of the SQL statement must know how many fields to include in the join.

For some definitions of "ok", yes. As long as you never intend to add additional fields to this intersection table, that'll be fine. However, if you intend to have more fields, it's good practice to have an ID field. It's still fine, come to think of it, but can be more awkward.
Unless, of course, disk space is at a serious premium!

If you look into any database textbook, you will find such tables en masse. This is the default way to define n-to-m relations. For example:
article = (id, title, text)
author = (id, name)
article_author = (article_id, author_id)
Semantically, article_author is not a new entity so you might refrain from defining it as the primary key and instead create it as a normal index with UNIQUE constraint.

Yes, I agree, with "some definition of OK" it is OK. But the moment you decide to reference this composite primary key from somewhere (i.e. move it to a foreign key), it quickly bocomes NG (Not Good).

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas