Primary key in "many-to-many" table - sql

I have a table in a SQL database that provides a "many-to-many" connection.
The table contains id's of both tables and some fields with additional information about the connection.
CREATE TABLE SomeTable (
f_id1 INTEGER NOT NULL,
f_id2 INTEGER NOT NULL,
additional_info text NOT NULL,
ts timestamp NULL DEFAULT now()
);
The table is expected to contain 10 000 - 100 000 entries.
How is it better to design a primary key? Should I create an additional 'id' field, or to create a complex primary key from both id's?
DBMS is PostgreSQL

This is a "hard" question in the sense that there are pretty good arguments on both sides. I have a bias toward putting in auto-incremented ids in all tables that I use. Over time, I have found that this simply helps with the development process and I don't have to think about whether they are necessary.
A big reason for this is so foreign key references to the table can use only one column.
In a many-to-many junction table (aka "association table"), this probably isn't necessary:
It is unlikely that you will add a table with a foreign key relationship to a junction table.
You are going to want a unique index on the columns anyway.
They will probably be declared not null anyway.
Some databases actually store data based on the primary key. So, when you do an insert, then data must be moved on pages to accommodate the new values. Postgres is not one of those databases. It treats the primary key index just like any other index. In other words, you are not incurring "extra" work by declaring one more more columns as a primary key.
My conclusion is that having the composite primary key is fine, even though I would probably have an auto-incremented primary key with separate constraints. The composite primary key will occupy less space so probably be more efficient than an auto-incremented id. However, if there is any chance that this table would be used for a foreign key relationship, then add in another id field.

A surrogate key wont protect you from adding multiple instances of (f_id1, f_id2) so you should definitely have a unique constraint or primary key for that. What would the purpose of a surrogate key be in your scenario?

Yes that's actually what people commonly do, that key is called surrogate key.. I'm not exactly sure with PostgreSQL, but in MySQL by using surrogate key you can delete/edit the records from the user interface.. Besides, this allows the database to query the single key column faster than it could multiple columns.. Hope it helps..

Related

How is primary key is different from unique key when both are actually serving the same purpose as per my understanding?

Primary key is actually the one which cannot be repeated for more than one entries so as the same for unique key as far as I know.
Let's say, we take employee IDs as primary keys for 100 number of employees, this means that employee ID for two employees cannot and can never be same.
But then, what is the unique key? As employee ID is a unique identifier for each employee. What I just know is that the primary key cannot cater Null values whereas unique key can cater just one Null value.
But is that actually the difference between the two? Can someone, make it more easily understandable preferably with a code example.
Also, after differentiating between two, how can we define both in a single dataset? Are there any set rules which we have to follow.
A primary key has three properties:
The key is unique across all rows of a table.
The key (or no components of a composite key) are NULL.
There is only one per table.
In general, primary keys are used for foreign key relationships. They are typically integers, because that is somewhat more efficient for indexing.
Other columns or combinations of columns can be unique and non-NULL. Those are candidate primary keys. However, a table has only one primary key.
I hope my answer will clear your doubt,
Primary key is used as for foreign key to referencing to different tables.
But when we implement large scale Database and our requirement of Referencing more then two Keys from same table to different table(s).
At that time unique key come in scenario and will help you to do the above task easily, as per your(programmer) requirements.
Primary key can't have NULL values. Unique key can allows one NULL value.
A database table can't have more that one primary key but multiple unique keys are allowed in one table.
In sql server a clustered index automatically gets created with the primary key. On the other hand one unique key generates one non-clustered index.

Transform Logical Data model to SQL Table design

In a LDM I recently made, I have an entity which has the following structure:
Building_ID (Primary Key, Foreign Key),
Plant_ID (Foreign Key),
Build_Year (Primary Key),
Size
I need to create a table in a SQL database using this design. The question I'm running into is how do I handle the primary keys here? Is it OK for a SQL table to have multiple primary keys? If the answer to this question is yes, then which column should act as the unique index? Should I create a new column to act as the unique index identifier?
Any SQL table for any relational database system (SQL Server, Oracle, Firebird, IBM DB2, Sybase etc.) I know can only ever have one primary key - after all, it's the primary key - there can only ever be one.
However, a primary key can be made up from multiple columns (called a "compound primary key"). There are downsides such as: all foreign key constraints from other tables also must specify all columns in the compound PK, thus making joining the tables a bit of a pain (since you need to specify all equality constraints for all columns included in the key in your JOIN).
Besides a primary key, you can also have multiple alternate keys - other column(s) that could also identify the row uniquely. Those make excellent candidates for e.g. indices, if those can help you speed up access to the table (but don't over-index your tables! Less is more)

Should every table have a primary key?

I read somewhere saying that every table should have a primary key to fulfill 1NF.
I have a tbl_friendship table.
There are 2 fields in the table : Owner and Friend.
Fields of Owner and Friends are foreign keys of auto increment id field in tbl_user.
Should this tbl_friendship has a primary key?
Should I create an auto increment id field in tbl_friendship and make it as primary key?
Primary keys can apply to multiple columns! In your example, the primary key should be on both columns, For example (Owner, Friend). Especially when Owner and Friend are foreign keys to a users table rather than actual names say (personally, my identity columns use the "Id" naming convention and so I would have (OwnerId, FriendId)
Personally I believe every table should have a primary key, but you'll find others who disagree.
Here's an article I wrote on the topic of normal forms.
http://michaeljswart.com/2011/01/ridiculously-unnormalized-database-schemas-part-zero/
Yes every table should have a primary key.
Yes you should create surrogate key.. aka an auto increment pk field.
You should also make "Friend" an FK to that auto increment field.
If you think that you are going to "rekey" in the future you might want to look into using natural keys, which are fields that naturally identify your data. The key to this is while coding always use the natural identifiers, and then you create unique indexes on those natural keys. In the future if you have to re-key you can, because your ux guarantees your data is consistent.
I would only do this if you absolutely have to, because it increases complexity, in your code and data model.
It is not clear from your description, but are owner and friend foreign keys and there can be only one relationship between any given pair? This makes two foreign key column a perfect candidate for a natural primary key.
Another option is to use surrogate key (extra auto-incremented column as you suggested). Take a look here for an in-depth discussion.
A primary key can be something abstract as well. In this case, each tuple (owner, friend), e.g. ("Dave","Matt") can form a unique entry and therefore be your primary key. In that case, it would be useful not to use names, but keys referencing another table. If you guarantee, that these tuples can't have duplicates, you have a valid primary key.
For processing reasons it might be useful to introduce a special primary key, like an autoincrement field (e.g. in MySQL) or using a sequence with Oracle.
To comply with 1NF (which is not completely aggreed upon what defines 1NF), yes you should have a primary key identified on each table. This is necessary to provide for uniqueness of each record.
http://en.wikipedia.org/wiki/First_normal_form
In general, you can create a primary key in many ways, one of which is to have an auto-increment column, another is to have a column with GUIDs, another is to have two or more columns that will identify a row uniquely when taken together.
Your table will be much easier to manage in the long term if it has a primary key. At the very least, you need to uniquely identify each record in the table. The field that is used to uniquely identify each record might as well be the primary key.
Yes every table should have (at least one) key. Duplicating rows in any table is undesirable for lots of reasons so put the constraint on those two columns.

Why we can't have more than one primary key?

I Know there can't be more than 1 primary key in a table but what is the technical reason ?
Pulled directly from SO:
You can only have one primary key, but you can have multiple columns in your primary key.
You can also have Unique Indexes on your table, which will work a bit like a primary key in that they will enforce unique values, and will speed up querying of those values.
Primary in the context of Primary Key means that it's ranked first in importance. Therefore, there can only be one key. It's by definition.
It's also usually the key for which the index has the actual data attached to it, that is, the data is stored with the primary key index. Other indices contain only the data that's being indexed, and perhaps some Included Columns.
In fact E.F.Codd (the inventor of the Relational Database Model) [1] originated the term "primary key" to mean any number of keys of a relation - not just one. He made it clear that it was quite possible to have more than one such key. His suggestion was that the database designer could choose one key as a preferred identifier ("the primary key") - but in principle this was optional and such a choice was "arbitrary" (that was his word). Because all keys enjoy the same properties as each other there is no fundamental need to choose any one over another.
Later on [2] what Codd originally called primary keys became known as candidate keys and the one key singled out as the preferred one became known as the "primary" key. This was not really a fundamental shift however because a primary key means exactly the same as a candidate key. Since they are equivalent concepts it doesn't really mean anything important when we say there "must" only be one primary key. If you have more than one candidate key you could quite reasonably call more than one of them "primary" if you prefer because it doesn't make any logical or practical difference to the meaning and function of the database.
It has been argued (by me among others) that the idea of designating one key per table as "primary" is utterly superfluous and sometimes a positive hinderance to a good understanding of database design and data intgrity issues. However, the concept is so entrenched we are probably stuck with it.
So the proper answer to your question is "convention" and "convenience". There is no good technical reason at all.
[1] A Relational Model of Data for Large Shared Data Banks (1970)
[2] E.g. in "Further Normalization of the Relational Data Base Model" (1971)
Well, it's called "primary" for a reason. As in, its the one key used to uniquely identify the record... and there "can be only one".
You could certainly mimick a second "primary" key by having an index placed on one or more other fields that are unique but for the purposes of your database server it's generally only necessary if your key isn't unique enough to cross database servers in a merge replication situation. (ie: multi master).
PRIMARY KEY is usually equivalent to UNIQUE INDEX NOT NULL. So you can effectively have multiple "primary keys" on a single table.
The primary key is the key which uniquely identifies that record.
I'm not sure if you're asking if a) there can be a single primary key spanning multiple columns, or b) if you can have multiple keys which uniquely identify the record.
The first is possible, known as a composite primary key.
The second is possible also, but only one is called the primary key.
Because the "primary" in "primary key" denotes its, mmm, singularity(?).
But if you need more, you can define UNIQUE keys which have quite the same behaviour.
The technical reason is that there can be only one primary. Otherwise it wouldn't be called so.
However a primary key can include several columns - see 7.5.2. Multiple-Column Indexes
The primary key is the one (of possibly many) unique identifiers of a particular row in a table. The other unique identifiers, which were not designated as the primary one, are hence often refereed to as secondary unique indexes.
Primary key allows us to uniquely identify each record in the table. You can have 2 primary keys in a table but they are called Composite Primary Keys. "When you define more than one column as your primary key on a table, it is called a composite primary key."
A primary key defines record uniqueness. To have two different measures of uniqueness can be problematic. For example, if you have primary keys A and B and you insert records where A is the same and B is different, then are those records the same or different? If you consider them different, then make your primary a composite of A and B. If you consider them the same record, then just use A or B as the primary key.
For non-clustered index we can create two index and are typically made on non-primary key columns used in JOIN, WHERE , ORDER BY clauses.
While in clustered index we have only one index and that on primary key. So if we have two primary keys there is ambiguity.
Also in referential intergrity there is ambiguity selecting one of the two primary keys.
Only one primary key possible on the table because primary key creates a clustered index on the table which stored data physically on the leaf node in ordered way based on that primary key column.
If we try to create one another primary key on that table then there will be one major problem related to the data.Because be can not store same data of the table in two different-2 order.

What is the difference between a primary key and a unique constraint?

Someone asked me this question on an interview...
Primary keys can't be null. Unique keys can.
A primary key is a unique field on a table but it is special in the sense that the table considers that row as its key. This means that other tables can use this field to create foreign key relationships to themselves.
A unique constraint simply means that a particular field must be unique.
Primary key can not be null but unique can have only one null value.
Primary key create the cluster index automatically but unique key not.
A table can have only one primary key but unique key more than one.
TL;DR Much can be implied by PRIMARY KEY (uniqueness, reference-able, non-null-ness, clustering, etc) but nothing that can't be stated explicitly using UNIQUE.
I suggest that if you are the kind of coder who likes the convenience of SELECT * FROM... without having to list out all those pesky columns then PRIMARY KEY is just the thing for you.
a relvar can have several keys, but we choose just one for underlining
and call that one the primary key. The choice is arbitrary, so the
concept of primary is not really very important from a logical point
of view. The general concept of key, however, is very important! The
term candidate key means exactly the same as key (i.e., the addition
of candidate has no real significance—it was proposed by Ted Codd
because he regarded each key as a candidate for being nominated as the
primary key)... SQL allows a subset of a table's columns to be
declared as a key for that table. It also allows one of them to be
nominated as the primary key. Specifying a key to be primary makes
for a certain amount of convenience in connection with other
constraints that might be needed
What Is a Key? by Hugh Darwen
it's usual... to single out one key as the primary key (and any other
keys for the relvar in question are then said to be alternate keys).
But whether some key is to be chosen as primary, and if so which one,
are essentially psychological issues, beyond the purview of the
relational model as such. As a matter of good practice, most base
relvars probably should have a primary key—but, to repeat, this rule,
if it is a rule, really isn't a relational issue as such... Strong
recommendation [to SQL users]: For base tables, at any rate, use
PRIMARY KEY and/or UNIQUE specifications to ensure that every such
table does have at least one key.
SQL and Relational Theory: How to Write Accurate SQL Code
By C. J. Date
In standard SQL PRIMARY KEY
implies uniqueness but you can specify that explicitly (using UNIQUE).
implies NOT NULL but you can specify that explicitly when creating columns (but you should be avoiding nulls anyhow!)
allows you to omit its columns in a FOREIGN KEY but you can specify them explicitly.
can be declared for only one key per table but it is not clear why (Codd, who originally proposed the concept, did not impose such a restriction).
In some products PRIMARY KEY implies the table's clustered index but you can specify that explicitly (you may not want the primary key to be the clustered index!)
For some people PRIMARY KEY has purely psychological significance:
they think it signifies that the key will be referenced in a foreign key (this was proposed by Codd but not actually adopted by standard SQL nor SQL vendors).
they think it signifies the sole key of the table (but the failure to enforce other candidate keys leads to loss of data integrity).
they think it implies a 'surrogate' or 'artificial ' key with no significance to the business (but actually imposes unwanted significance on the enterprise by being exposed to users).
Every primary key is a unique constraint, but in addition to the PK, a table can have additional unique constraints.
Say you have a table Employees, PK EmployeeID. You can add a unique constraint on SSN, for example.
Unique Key constraints:
Unique key constraint will provide you a constraint like the column values should retain uniqueness.
It will create non-clustered index by default
Any number of unique constraints can be added to a table.
It will allow null value in the column.
ALTER TABLE table_name
ADD CONSTRAINT UNIQUE_CONSTRAINT
UNIQUE (column_name1, column_name2, ...)
Primary Key:
Primary key will create column data uniqueness in the table.
Primary key will create clustered index by default
Only one Primay key can be created for a table
Multiple columns can be consolidated to form a single primary key
It wont allow null values.
ALTER TABLE table_name
ADD CONSTRAINT KEY_CONSTRAINT
PRIMARY KEY (column_name)
In addition to Andrew's answer, you can only have one primary key per table but you can have many unique constraints.
Primary key's purpose is to uniquely identify a row in a table. Unique constraint ensures that a field's value is unique among the rows in table.
You can have only one primary key per table. You can have more than one unique constraint per table.
A primary key is a minimal set of columns such that any two records with identical values in those columns have identical values in all columns. Note that a primary key can consist of multiple columns.
A uniqueness constraint is exactly what it sounds like.
The UNIQUE constraint uniquely identifies each record in a database table.
The UNIQUE and PRIMARY KEY constraints both provide a guarantee for uniqueness for a column or set of columns.
A PRIMARY KEY constraint automatically has a UNIQUE constraint defined on it.
Note that you can have many UNIQUE constraints per table, but only one PRIMARY KEY constraint per table
Primary key can't be null but unique constraint is nullable.
when you choose a primary key for your table it's atomatically Index that field.
Primary keys are essentially combination of (unique +not null). also when referencing a foreign key the rdbms requires Primary key.
Unique key just imposes uniqueness of the column.A value of field can be NULL in case of uniqe key. Also it cannot be used to reference a foreign key, that is quite obvious as u can have null values in it
Both guarantee uniqueness across the rows in the table, with the exception of nulls as mentioned in some of the other answers.
Additionally, a primary key "comes with" an index, which could be either clustered or non-clustered.
There are several good answers in here so far. In addition to the fact that a primary key can't be null, is a unique constraint itself, and can be comprised of multiple columns, there are deeper meanings that depend on the database server you are using.
I am a SQL Server user, so a primary key has a more specific meaning to me. In SQL Server, by default, primary keys are also part of what is called the "clustered index". A clustered index defines the actual ordering of data pages for that particular table, which means that the primary key ordering matches the physical order of rows on disk.
I know that one, possibly more, of MySql's table formats also support clustered indexing, which means the same thing as it does in SQL Server...it defines the physical row ordering on disk.
Oracle provides something called Index Organized Tables, which order the rows on disk by the primary key.
I am not very familiar with DB2, however I think a clustered index means the rows on disk are stored in the same order as a separate index. I don't know if the clustered index must match the primary key, or if it can be a distinct index.
A great number of the answers here have discussed the properties of PK vs unique constraints. But it is more important to understand the difference in concept.
A primary key is considered to be an identifier of the record in the database. So these will for example be referenced when creating foreign key references between tables. A primary key should therefore under normal circumstances never contain values that have any meaining in your domain (often automatically incremential fields are used for this).
A unique constraint is just a way of enforcing domain specific business rules in your database schema.
Becuase a PK it is an identifier for the record, you can never change the value of a primary key.