What does PRIMARY KEY actually signify, and does my table need one? - sql

I have a PostgreSQL 9.3 database with a users table that stores usernames in their case-preserved format. All queries will be case insensitive, so I should have an index that supports that. Additionally, usernames must be unique, regardless of case.
This is what I have come up with:
forum=> \d users
Table "public.users"
Column | Type | Modifiers
------------+--------------------------+------------------------
name | character varying(24) | not null
Indexes:
"users_lower_idx" UNIQUE, btree (lower(name::text))
Expressed in standard SQL syntax:
CREATE TABLE users (
name varchar(24) NOT NULL
);
CREATE UNIQUE INDEX "users_lower_idx" ON users (lower(name));
With this schema, I've satisfied all my constraints, albeit without a primary key. The SQL standard doesn't support functional primary keys, so I cannot promote the index:
forum=> ALTER TABLE users ADD PRIMARY KEY USING INDEX users_lower_idx;
ERROR: index "users_lower_idx" contains expressions
LINE 1: ALTER TABLE users ADD PRIMARY KEY USING INDEX users_lower_id...
^
DETAIL: Cannot create a primary key or unique constraint using such an index.
But, I already have the UNIQUE constraint, and the column is already marked "NOT NULL." If I had to have a primary key, I could construct the table like this:
CREATE TABLE users (
name varchar(24) PRIMARY KEY
);
CREATE UNIQUE INDEX "users_lower_idx" ON users (lower(name));
But then I'll have two indexes, and that seems wasteful and unnecessary to me. So, does PRIMARY KEY mean anything special to postgres beyond "UNIQUE NOT NULL," and am I missing anything by not having one?

First off, practically every table should have a primary key.
citext
The additional module provides a data type of the same name. "ci" for case insensitive. Per documentation:
The citext module provides a case-insensitive character string type,
citext. Essentially, it internally calls lower when comparing
values. Otherwise, it behaves almost exactly like text.
It is intended for exactly the purpose you describe:
The citext data type allows you to eliminate calls to lower in SQL
queries, and allows a primary key to be case-insensitive.
Bold emphasis mine.
Be sure to read the manual about limitations first. Install it once per database with
CREATE EXTENSION citext;
text
If you don't want to go that route, I suggest you add a serial as surrogate primary key.
CREATE TABLE users (
user_id serial PRIMARY KEY
, username text NOT NULL
);
I would use text instead of varchar(24). Use a CHECK constraint if you need to enforce a maximum length (that may change at a later time). Details:
Any downsides of using data type "text" for storing strings?
Change PostgreSQL columns used in views
Along with the UNIQUE index in your original design (without type cast):
CREATE UNIQUE INDEX users_username_lower_idx ON users (lower(username));
The underlying integer of a serial is small and fast and does not have to waste time with lower() or the collation of your database. That's particularly useful for foreign key references. I mostly prefer that over some natural primary key with varying properties.
Both solutions have pros and cons.

I would suggest using a primary key, as you have stated you want something that is unique, and as you have demonstrated that you can put unique constraints on a username. I will assume that since this is a unique,not null username that you will use this to track your users in other parts of the Database, as well as allow usernames to be changed.
This is where a primary key will come in handy, instead of having to go into all of your tables and change the value of the Username column, you will only have one place to change it.
Example
Without primary key:
Table users
Username
'Test'
Table thingsdonebyUsers
RandomColumn AnotherColumn Username
RandomValue RandomValue Test
Now assume your user wants to change his username to Test1, well now you have to go find everywhere you used Username and change that to the new value before you change it in your users table since I'm assuming you will have a constraint there.
With Primary Key
Table users
PK Username
1 'Test'
Table thingsdonebyUsers
RandomColumn AnotherColumn PK_Users
RandomValue RandomValue 1
Now you can just change your users table and be done with the change.
You can still enforce unique and not null on your username column as you demonstrated.
This is just one of the many advantages of having normalized tables, which requires your tables to have a Primary Key that is an unrelated value(forget what the proper name is for this right now).
As for what a PK actually signifies, it just a non nullable unique column that identifies the row, so in this sense you already have a Primary Key on your table. The thing is that usually PKs are INT numbers because of the reason that I explained above.

Short answer: No, you don't need a declarative "PRIMARY KEY", since the UNIQUE index serves the same exact purpose.
Long answer:
The idea of having Primary Keys comes from database systems where the data is physically in key order. This requires having a single, "primary" key. MySQL InnoDB is this way, as are many older databases.
However, PostgreSQL does not keep the tables in key order; it separates the indexes, including the primary key index, from the heap, which is essentially unordered. As a result, in Postgres, there is no material difference between primary keys and unique indexes. You can even create a foreign key against a unique index, as long as that index covers the whole table.
That being said, some tools external to PostgreSQL look for primary keys and do not regard unique indexes as being equivalent. These tools may cause you issues because of not finding a PK.

Related

Primary key in "many-to-many" table

I have a table in a SQL database that provides a "many-to-many" connection.
The table contains id's of both tables and some fields with additional information about the connection.
CREATE TABLE SomeTable (
f_id1 INTEGER NOT NULL,
f_id2 INTEGER NOT NULL,
additional_info text NOT NULL,
ts timestamp NULL DEFAULT now()
);
The table is expected to contain 10 000 - 100 000 entries.
How is it better to design a primary key? Should I create an additional 'id' field, or to create a complex primary key from both id's?
DBMS is PostgreSQL
This is a "hard" question in the sense that there are pretty good arguments on both sides. I have a bias toward putting in auto-incremented ids in all tables that I use. Over time, I have found that this simply helps with the development process and I don't have to think about whether they are necessary.
A big reason for this is so foreign key references to the table can use only one column.
In a many-to-many junction table (aka "association table"), this probably isn't necessary:
It is unlikely that you will add a table with a foreign key relationship to a junction table.
You are going to want a unique index on the columns anyway.
They will probably be declared not null anyway.
Some databases actually store data based on the primary key. So, when you do an insert, then data must be moved on pages to accommodate the new values. Postgres is not one of those databases. It treats the primary key index just like any other index. In other words, you are not incurring "extra" work by declaring one more more columns as a primary key.
My conclusion is that having the composite primary key is fine, even though I would probably have an auto-incremented primary key with separate constraints. The composite primary key will occupy less space so probably be more efficient than an auto-incremented id. However, if there is any chance that this table would be used for a foreign key relationship, then add in another id field.
A surrogate key wont protect you from adding multiple instances of (f_id1, f_id2) so you should definitely have a unique constraint or primary key for that. What would the purpose of a surrogate key be in your scenario?
Yes that's actually what people commonly do, that key is called surrogate key.. I'm not exactly sure with PostgreSQL, but in MySQL by using surrogate key you can delete/edit the records from the user interface.. Besides, this allows the database to query the single key column faster than it could multiple columns.. Hope it helps..

Should every table have a primary key?

I read somewhere saying that every table should have a primary key to fulfill 1NF.
I have a tbl_friendship table.
There are 2 fields in the table : Owner and Friend.
Fields of Owner and Friends are foreign keys of auto increment id field in tbl_user.
Should this tbl_friendship has a primary key?
Should I create an auto increment id field in tbl_friendship and make it as primary key?
Primary keys can apply to multiple columns! In your example, the primary key should be on both columns, For example (Owner, Friend). Especially when Owner and Friend are foreign keys to a users table rather than actual names say (personally, my identity columns use the "Id" naming convention and so I would have (OwnerId, FriendId)
Personally I believe every table should have a primary key, but you'll find others who disagree.
Here's an article I wrote on the topic of normal forms.
http://michaeljswart.com/2011/01/ridiculously-unnormalized-database-schemas-part-zero/
Yes every table should have a primary key.
Yes you should create surrogate key.. aka an auto increment pk field.
You should also make "Friend" an FK to that auto increment field.
If you think that you are going to "rekey" in the future you might want to look into using natural keys, which are fields that naturally identify your data. The key to this is while coding always use the natural identifiers, and then you create unique indexes on those natural keys. In the future if you have to re-key you can, because your ux guarantees your data is consistent.
I would only do this if you absolutely have to, because it increases complexity, in your code and data model.
It is not clear from your description, but are owner and friend foreign keys and there can be only one relationship between any given pair? This makes two foreign key column a perfect candidate for a natural primary key.
Another option is to use surrogate key (extra auto-incremented column as you suggested). Take a look here for an in-depth discussion.
A primary key can be something abstract as well. In this case, each tuple (owner, friend), e.g. ("Dave","Matt") can form a unique entry and therefore be your primary key. In that case, it would be useful not to use names, but keys referencing another table. If you guarantee, that these tuples can't have duplicates, you have a valid primary key.
For processing reasons it might be useful to introduce a special primary key, like an autoincrement field (e.g. in MySQL) or using a sequence with Oracle.
To comply with 1NF (which is not completely aggreed upon what defines 1NF), yes you should have a primary key identified on each table. This is necessary to provide for uniqueness of each record.
http://en.wikipedia.org/wiki/First_normal_form
In general, you can create a primary key in many ways, one of which is to have an auto-increment column, another is to have a column with GUIDs, another is to have two or more columns that will identify a row uniquely when taken together.
Your table will be much easier to manage in the long term if it has a primary key. At the very least, you need to uniquely identify each record in the table. The field that is used to uniquely identify each record might as well be the primary key.
Yes every table should have (at least one) key. Duplicating rows in any table is undesirable for lots of reasons so put the constraint on those two columns.

Example Use of MySQL Indexes?

I was reading this question about the difference between these 4: Differences between INDEX, PRIMARY, UNIQUE, FULLTEXT in MySQL?
However, it is all very abstract and vague to me after reading it. Maybe it would help me understand it more concretely if I had some examples of when I would use it.
For example, I think for the field user_id, we would use the index UNIQUE, correct?
A primary key is not an index, per se --it's a constraint.
The primary key uniquely identifies a row from all the rest - that means the values must be unique. A primary key is typically made of one column, but can be made of more than one - multiple columns are called a composite....
A unique constraint is implemented in MySQL as an index - it guarantees that the same value can not occur more than once in the column(s) it is defined for. A unique constraint/index is redundant on a primary key column, and a primary key could be considered a synonym but with bigger implications. These too support composites...
In MySQL (and SQL Server), there are two types of indexes - clustered and non-clustered. A clustered index is typically associated with the primary key, and automatically created if a primary key is defined in the CREATE TABLE statement. But it doesn't have to be - it's the most important index to a table, so if it's more optimal to associate with different columns then the change should be reviewed. There can only be one clustered index for a table - the rest are non-clustered indexes. The amount of space you have to define indexes depends on the table engine - 1,000 for MyISAM and 767 for InnoDB. Indexes, clustered an non, are used to speed up data retrieval and their use can be triggered by using the columns in SELECT, JOIN, WHERE and ORDER BY clauses. But they also slow down INSERT/UPDATE/DELETE statements because of maintaining that data.
Full Text indexes are explicitly for Full Text Search (FTS) functionality - no other functionality can make use of them. They are only for columns defined with string based data types.
Mind that indexes are not ANSI - the similarities are thankfully relatively consistent. Oracle doesn't distinguish indexes - they're all the same.
Here's a brief description of what they are and when to use them:
INDEX: To speed up searches for values in this column.
UNIQUE: When you want to constrain the column to only contain unique values. This also adds an index. Example: if you only want each email to be registered once, you can add a unique constraint on the email column.
PRIMARY KEY: Contains the identity for each row. A primary key also implies a unique index and a "not null" constraint. It is often an auto-increment id, but it could also be a natural key. It is generally a good idea to create a primary key for each table, although it is not required.
FULL TEXT: This is a special type of index that allows you to perform searches for text strings much faster than LIKE '%foo%'.
Note that I am only considering single column indexes here. It is also possible to have a multi-column index.
if you have a "Person" table with id,name,email and bio information...
The primary key is the id, maybe it's a number it will allways be unique and you could use it as a reference in other tables (foreign keys)
the email is unique on each person, so you could add a UNIQUE constraint there
you might want to search a person over his name constantly so you could add an INDEX there
and finally a full text search in the bio attribute because it might be useful on a search
primary keys are also UNIQUE

What is the difference between a primary key and a unique constraint?

Someone asked me this question on an interview...
Primary keys can't be null. Unique keys can.
A primary key is a unique field on a table but it is special in the sense that the table considers that row as its key. This means that other tables can use this field to create foreign key relationships to themselves.
A unique constraint simply means that a particular field must be unique.
Primary key can not be null but unique can have only one null value.
Primary key create the cluster index automatically but unique key not.
A table can have only one primary key but unique key more than one.
TL;DR Much can be implied by PRIMARY KEY (uniqueness, reference-able, non-null-ness, clustering, etc) but nothing that can't be stated explicitly using UNIQUE.
I suggest that if you are the kind of coder who likes the convenience of SELECT * FROM... without having to list out all those pesky columns then PRIMARY KEY is just the thing for you.
a relvar can have several keys, but we choose just one for underlining
and call that one the primary key. The choice is arbitrary, so the
concept of primary is not really very important from a logical point
of view. The general concept of key, however, is very important! The
term candidate key means exactly the same as key (i.e., the addition
of candidate has no real significance—it was proposed by Ted Codd
because he regarded each key as a candidate for being nominated as the
primary key)... SQL allows a subset of a table's columns to be
declared as a key for that table. It also allows one of them to be
nominated as the primary key. Specifying a key to be primary makes
for a certain amount of convenience in connection with other
constraints that might be needed
What Is a Key? by Hugh Darwen
it's usual... to single out one key as the primary key (and any other
keys for the relvar in question are then said to be alternate keys).
But whether some key is to be chosen as primary, and if so which one,
are essentially psychological issues, beyond the purview of the
relational model as such. As a matter of good practice, most base
relvars probably should have a primary key—but, to repeat, this rule,
if it is a rule, really isn't a relational issue as such... Strong
recommendation [to SQL users]: For base tables, at any rate, use
PRIMARY KEY and/or UNIQUE specifications to ensure that every such
table does have at least one key.
SQL and Relational Theory: How to Write Accurate SQL Code
By C. J. Date
In standard SQL PRIMARY KEY
implies uniqueness but you can specify that explicitly (using UNIQUE).
implies NOT NULL but you can specify that explicitly when creating columns (but you should be avoiding nulls anyhow!)
allows you to omit its columns in a FOREIGN KEY but you can specify them explicitly.
can be declared for only one key per table but it is not clear why (Codd, who originally proposed the concept, did not impose such a restriction).
In some products PRIMARY KEY implies the table's clustered index but you can specify that explicitly (you may not want the primary key to be the clustered index!)
For some people PRIMARY KEY has purely psychological significance:
they think it signifies that the key will be referenced in a foreign key (this was proposed by Codd but not actually adopted by standard SQL nor SQL vendors).
they think it signifies the sole key of the table (but the failure to enforce other candidate keys leads to loss of data integrity).
they think it implies a 'surrogate' or 'artificial ' key with no significance to the business (but actually imposes unwanted significance on the enterprise by being exposed to users).
Every primary key is a unique constraint, but in addition to the PK, a table can have additional unique constraints.
Say you have a table Employees, PK EmployeeID. You can add a unique constraint on SSN, for example.
Unique Key constraints:
Unique key constraint will provide you a constraint like the column values should retain uniqueness.
It will create non-clustered index by default
Any number of unique constraints can be added to a table.
It will allow null value in the column.
ALTER TABLE table_name
ADD CONSTRAINT UNIQUE_CONSTRAINT
UNIQUE (column_name1, column_name2, ...)
Primary Key:
Primary key will create column data uniqueness in the table.
Primary key will create clustered index by default
Only one Primay key can be created for a table
Multiple columns can be consolidated to form a single primary key
It wont allow null values.
ALTER TABLE table_name
ADD CONSTRAINT KEY_CONSTRAINT
PRIMARY KEY (column_name)
In addition to Andrew's answer, you can only have one primary key per table but you can have many unique constraints.
Primary key's purpose is to uniquely identify a row in a table. Unique constraint ensures that a field's value is unique among the rows in table.
You can have only one primary key per table. You can have more than one unique constraint per table.
A primary key is a minimal set of columns such that any two records with identical values in those columns have identical values in all columns. Note that a primary key can consist of multiple columns.
A uniqueness constraint is exactly what it sounds like.
The UNIQUE constraint uniquely identifies each record in a database table.
The UNIQUE and PRIMARY KEY constraints both provide a guarantee for uniqueness for a column or set of columns.
A PRIMARY KEY constraint automatically has a UNIQUE constraint defined on it.
Note that you can have many UNIQUE constraints per table, but only one PRIMARY KEY constraint per table
Primary key can't be null but unique constraint is nullable.
when you choose a primary key for your table it's atomatically Index that field.
Primary keys are essentially combination of (unique +not null). also when referencing a foreign key the rdbms requires Primary key.
Unique key just imposes uniqueness of the column.A value of field can be NULL in case of uniqe key. Also it cannot be used to reference a foreign key, that is quite obvious as u can have null values in it
Both guarantee uniqueness across the rows in the table, with the exception of nulls as mentioned in some of the other answers.
Additionally, a primary key "comes with" an index, which could be either clustered or non-clustered.
There are several good answers in here so far. In addition to the fact that a primary key can't be null, is a unique constraint itself, and can be comprised of multiple columns, there are deeper meanings that depend on the database server you are using.
I am a SQL Server user, so a primary key has a more specific meaning to me. In SQL Server, by default, primary keys are also part of what is called the "clustered index". A clustered index defines the actual ordering of data pages for that particular table, which means that the primary key ordering matches the physical order of rows on disk.
I know that one, possibly more, of MySql's table formats also support clustered indexing, which means the same thing as it does in SQL Server...it defines the physical row ordering on disk.
Oracle provides something called Index Organized Tables, which order the rows on disk by the primary key.
I am not very familiar with DB2, however I think a clustered index means the rows on disk are stored in the same order as a separate index. I don't know if the clustered index must match the primary key, or if it can be a distinct index.
A great number of the answers here have discussed the properties of PK vs unique constraints. But it is more important to understand the difference in concept.
A primary key is considered to be an identifier of the record in the database. So these will for example be referenced when creating foreign key references between tables. A primary key should therefore under normal circumstances never contain values that have any meaining in your domain (often automatically incremential fields are used for this).
A unique constraint is just a way of enforcing domain specific business rules in your database schema.
Becuase a PK it is an identifier for the record, you can never change the value of a primary key.

SQL: To primary key or not to primary key?

I have a table with sets of settings for users, it has the following columns:
UserID INT
Set VARCHAR(50)
Key VARCHAR(50)
Value NVARCHAR(MAX)
TimeStamp DATETIME
UserID together with Set and Key are unique. So a specific user cannot have two of the same keys in a particular set of settings. The settings are retrieved by set, so if a user requests a certain key from a certain set, the whole set is downloaded, so that the next time a key from the same set is needed, it doesn't have to go to the database.
Should I create a primary key on all three columns (userid, set, and key) or should I create an extra field that has a primary key (for example an autoincrement integer called SettingID, bad idea i guess), or not create a primary key, and just create a unique index?
----- UPDATE -----
Just to clear things up: This is an end of the line table, it is not joined in anyway. UserID is a FK to the Users table. Set is not a FK. It is pretty much a helper table for my GUI.
Just as an example: users get the first time they visit parts of the website, a help balloon, which they can close if they want. Once they click it away, I will add some setting to the "GettingStarted" set that will state they helpballoon X has been disabled. Next time when the user comes to the same page, the setting will state that help balloon X should not be shown anymore.
Having composite unique keys is mostly not a good idea.
Having any business relevant data as primary key can also make you troubles. For instance, if you need to change the value. If it is not possible in the application to change the value, it could be in the future, or it must be changed in an upgrade script.
It's best to create a surrogate key, a automatic number which does not have any business meaning.
Edit after your update:
In this case, you can think of having conceptually no primary key, and make this three columns either the primary key of a composite unique key (to make it changeable).
Should I create a primary key on all three columns (userid, set, and key)
Make this one.
Using surrogate primary key will result in an extra column which is not used for other purposes.
Creating a UNIQUE INDEX along with surrogate primary key is same as creating a non-clustered PRIMARY KEY, and will result in an extra KEY lookup which is worse for performance.
Creating a UNIQUE INDEX without a PRIMARY KEY will result in a HEAP-organized table which will need an extra RID lookup to access the values: also not very good.
How many Key's and Set's do you have? Do these need to be varchar(50) or can they point to a lookup table? If you can convert this Set and Key into SetId and KeyId then you can create your primary key on the 3 integer values which will be much faster.
I would probably try to make sure that UserID was a unique identifier, rather than having duplicates of UserID throughout the code. Composite keys tend to get confusing later on in your code's life.
I'm assuming this is a lookup field for config values of some kind, so you could probably go with the composite key if this is the case. The data is already there. You can guarantee it's uniqueness using the primary key. If you change your mind and decide later that it isn't appropriate for you, you can easily add a SettingId and make the original composite key a unique index.
Create one, separate primary key. No matter what how bussines logic will change, what new rules will have to be applied to your Key VARCHAR(50) field - having one primary key will make you completly independent of bussines logic.
In my experience it all depends how many tables will be using this table as FK information. Do you want 3 extra columns in your other tables just to carry over a FK?
Personally I would create another FK column and put a unique constraint over the other three columns. This makes foreign keys to this table a lot easier to swallow.
I'm not a proponent of composite keys, but in this case as an end of the line table, it might make sense. However, if you allow nulls in any of these three fields becasue one or more of the values is not known at the time of the insert, there can be difficulty and a unique index might be better.
Better have UserID as 32 bit newid() or unique identifier because UserID as int gives a hint to the User of the probable UserID. This will also solve your issue of composite key.