Superkey vs. Candidate key - sql

What is the difference between a super key and a candidate key in an ERDB?

A superkey is a set of columns that uniquely identifies a row. A candidate key is a MINIMAL set of columns that uniquely identifies a row: remove any column from it and it no longer does. So essentially a superkey is a candidate key with zero or more extra, unnecessary columns added.

A candidate key is a minimal superkey.

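Candidate key = a minimal set of columns that identifies a row
Super key = any set of columns that contains a candidate key, so it is at least as wide as one
For me, a super key wider than a candidate key only adds redundant columns, which tends to obscure what actually identifies the row.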

Let's keep it simple
SuperKey - A set of attributes that uniquely identifies a row. So if even a single attribute is unique, then every set of attributes that contains it is also a superkey.
Candidate Key - A superkey that has no proper subset which can also identify the rows uniquely. Or, more simply, it is a minimal superkey.

In a nutshell: a CANDIDATE KEY is a minimal SUPER KEY.
A super key is a combination of columns (or attributes) that uniquely identifies any record (or tuple) in a relation (table) in an RDBMS.
For instance, consider the following dependencies in a table R having columns A, B, C and D.
(This table is just a quick example, so it does not cover all the dependencies that R could have.)
Attribute set (determinant) ---can identify---> (dependent)
A   -> AD
B   -> ABCD
C   -> CD
AC  -> ACD
AB  -> ABCD
ABC -> ABCD
BCD -> ABCD
Now B, AB, ABC and BCD each identify all columns, so those four qualify as super keys.
But B⊂AB, B⊂ABC and B⊂BCD, so AB, ABC and BCD are disqualified as CANDIDATE KEYS: a proper subset of each already identifies every row, so they are not minimal. Hence only B is a candidate key.
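The reasoning above can be checked mechanically. The following is a minimal sketch, assuming the listed dependencies are the only ones that hold in R; the function names (`closure`, `is_superkey`, `is_candidate_key`) are made up for this illustration. It computes the attribute closure of each determinant and then tests minimality by looking at every proper subset.

```python
from itertools import combinations

# Dependencies copied from the list above: determinant -> dependent.
FDS = [
    ("A", "AD"),
    ("B", "ABCD"),
    ("C", "CD"),
    ("AC", "ACD"),
    ("AB", "ABCD"),
    ("ABC", "ABCD"),
    ("BCD", "ABCD"),
]
ALL_COLUMNS = frozenset("ABCD")


def closure(attrs):
    """Return every column that the given attributes determine."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in FDS:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return frozenset(result)


def is_superkey(attrs):
    """A superkey determines all columns of the table."""
    return closure(attrs) == ALL_COLUMNS


def is_candidate_key(attrs):
    """A candidate key is a superkey with no proper subset that is a superkey."""
    if not is_superkey(attrs):
        return False
    return not any(
        is_superkey(subset)
        for size in range(len(attrs))
        for subset in combinations(attrs, size)
    )


determinants = [lhs for lhs, _ in FDS]
superkeys = [d for d in determinants if is_superkey(d)]
candidate_keys = [d for d in determinants if is_candidate_key(d)]

print("superkeys:", superkeys)            # B, AB, ABC, BCD
print("candidate keys:", candidate_keys)  # B
```

Only the determinants listed above are tested here. Any other set containing B (for example BC) would also pass `is_superkey`, which is consistent with the remark that the list is not exhaustive.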

Related

Candidate key or Super key

Consider a relational table with different columns. What would you call a set of columns whose values are unique and not null: a super key or a candidate key?
A super key is a set of one or more columns that uniquely identifies rows in a table.
Candidate keys are selected from the set of super keys. The only thing we take care of while selecting a candidate key is that it should not have any redundant attribute. That's the reason they are also termed minimal super keys.
Suppose an Employee table has three columns: Emp_Code, Emp_Number, Emp_Name, where Emp_Code and Emp_Number are each unique on their own.
Super keys:
All of the following sets are able to uniquely identify rows of the employee table.
{Emp_Code}
{Emp_Number}
{Emp_Code, Emp_Number}
{Emp_Code, Emp_Name}
{Emp_Code, Emp_Number, Emp_Name}
{Emp_Number, Emp_Name}
Candidate Keys:
As I stated above, they are the minimal super keys with no redundant attributes.
{Emp_Code}
{Emp_Number}
As a Summary:
A superkey is a set of columns that uniquely identifies a row, whereas a candidate key is a MINIMAL set of columns that uniquely identifies a row. So essentially a superkey is a candidate key with zero or more extra, unnecessary columns added.
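Because every superkey contains at least one candidate key, the superkey list for the Employee example can be generated from the two candidate keys. This is a small sketch, assuming {Emp_Code} and {Emp_Number} really are the only candidate keys; `all_superkeys` is a helper name invented for the illustration.

```python
from itertools import combinations

COLUMNS = ("Emp_Code", "Emp_Number", "Emp_Name")
# Assumption taken from the example: these are the only candidate keys.
CANDIDATE_KEYS = [{"Emp_Code"}, {"Emp_Number"}]


def all_superkeys(columns, candidate_keys):
    """Return every non-empty column set that contains a candidate key."""
    found = []
    for size in range(1, len(columns) + 1):
        for combo in combinations(columns, size):
            if any(key <= set(combo) for key in candidate_keys):
                found.append(combo)
    return found


superkeys = all_superkeys(COLUMNS, CANDIDATE_KEYS)
for key in superkeys:
    print(key)
```

This yields the same six sets listed in the answer. {Emp_Name} alone is excluded because it contains no candidate key.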

What is it called when there are two columns in a table that are individually unique?

As the question asks I'm wondering what you would call it when a row could be uniquely identified by two separate columns.
I apologize for my lack of formatting, but here goes:
ColumnA: 1,2,3,4,5
ColumnB: A,B,C,D,E
ColumnC: 1,2,1,2,1
ColumnD: A,A,B,B,A
So both Column A and Column B are each individually primary keys? I don't think this is correct, so what would you call this in terms of "keys"?
If you mean that every row always has a unique A value and every row always has a unique B value then each of them is a "superkey". (Actually, the sets {A} and {B}.)
So the answer is, the table has two one-column superkeys.
But according to your data no smaller set of columns within either superkey is itself a superkey, so each is also a "candidate key". (The only smaller set is {}, and it is not a superkey given your data.)
So here an equivalent answer is, the table has two one-column candidate keys.
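As a rough check against the sample data in the question, the sketch below looks for the smallest column sets whose values never repeat. Note the assumption: uniqueness in five sample rows only shows what this sample permits, whereas a real key is a rule about every state the table may hold. The helper names are invented for the example.

```python
from itertools import combinations

COLUMNS = ("ColumnA", "ColumnB", "ColumnC", "ColumnD")
# The five rows from the question, read column by column.
ROWS = [
    (1, "A", 1, "A"),
    (2, "B", 2, "A"),
    (3, "C", 1, "B"),
    (4, "D", 2, "B"),
    (5, "E", 1, "A"),
]


def is_unique(column_names):
    """True if no two rows agree on all of the given columns."""
    indexes = [COLUMNS.index(name) for name in column_names]
    projected = [tuple(row[i] for i in indexes) for row in ROWS]
    return len(set(projected)) == len(projected)


def minimal_unique_sets():
    """Unique column sets that contain no smaller unique column set."""
    minimal = []
    # Sizes are visited in increasing order, so any smaller unique set
    # has already been recorded when a larger one is examined.
    for size in range(len(COLUMNS) + 1):
        for combo in combinations(COLUMNS, size):
            covered = any(set(m) <= set(combo) for m in minimal)
            if not covered and is_unique(combo):
                minimal.append(combo)
    return minimal


print(minimal_unique_sets())
```

For this sample the result is the two one-column sets {ColumnA} and {ColumnB}. The empty set is tested too and fails, since all five rows agree on "no columns".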
A base table that is like that and that can hold your example data is in 5NF. There is no benefit in decomposing it into other tables.
There is a tradition to pick a candidate key as a "primary key". Then each other one is an "alternate key". But this is not a particularly helpful tradition.
In SQL PRIMARY KEY or UNIQUE NOT NULL actually declares a superkey. You can only have one PRIMARY KEY declared per table. SQL does not allow declaring {} as a superkey.
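To see both declarations enforced, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The table name `t`, the column types and the `try_insert` helper are assumptions made for the illustration; other SQL engines may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE t (
        ColumnA INTEGER PRIMARY KEY,   -- one declared key
        ColumnB TEXT NOT NULL UNIQUE,  -- a second, independently declared key
        ColumnC INTEGER,
        ColumnD TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "A", 1, "A"),
        (2, "B", 2, "A"),
        (3, "C", 1, "B"),
        (4, "D", 2, "B"),
        (5, "E", 1, "A"),
    ],
)


def try_insert(row):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO t VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


dup_a = try_insert((1, "F", 1, "A"))  # repeats ColumnA = 1
dup_b = try_insert((6, "A", 1, "A"))  # repeats ColumnB = 'A'
fresh = try_insert((6, "F", 1, "A"))  # new value in both key columns
print(dup_a, dup_b, fresh)
```

The first two inserts are rejected and the third is accepted, so each column is enforced as unique on its own, even though only one of them carries the PRIMARY KEY label.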

Which is better, have a primary key composed of an integer and a foreign key or have a primary key autoincrement and a foreign key?

I have a problem. The database admin has set up the following structure: the primary key of the table TCModulo is a composite key of ID_modulo and ID_sistema, where ID_sistema is a foreign key to the table TCSistemas.
I think it would be better for the field ID_modulo of the table TCModulo to be the primary key with an auto_increment constraint, and for the field ID_sistema to be only a foreign key.
Which one is better?
Whether the PK of TCModulo is (ID_modulo) or (ID_modulo,ID_sistema) depends on what goes in the table. We cannot answer your question unless you tell us. Presumably an ID_modulo value in a row is how you refer to some modulo. You have to tell us how that works. Once that is settled for every column, and given what situations can arise, there is no choice left about which sets of columns are candidates for primary key.
A set of columns whose subrow values are unique in a table is called a superkey. Any subrow containing a unique subrow is unique. So any set of columns containing a superkey is a superkey. A subrow that contains no (smaller) unique subrow is called a candidate key. So a superkey that contains no (smaller) superkey is a candidate key. One of the candidate keys of a table is chosen as primary key.
If ID_modulo uniquely determined a module over the whole application, then (ID_modulo) would be unique with no smaller unique subrow inside so it would be a candidate key. It would be the only one so it would be the primary key.
If ID_modulo uniquely determined a module only per sistema, then (ID_modulo,ID_sistema) would be unique with no smaller unique subrow (assuming there can be more than one sistema) so it would be a candidate key. It would be the only one so it would be the primary key.
So what candidate keys are available to be chosen as primary key is up to how your application refers to modulos. After that there is no choice about candidate keys. In each of these two cases there's only one candidate key so there's no choice about primary key either.
As to whether you should have a unique id overall or only within sistema or both or anything else, that depends on other ergonomic issues. E.g. you are uniquely kentverger in stackoverflow (now; user names aren't necessarily unique), but perhaps uniquely Kent at home. E.g. you probably prefer to call today something like the 4th of July, rather than day 185. But note that any candidate key serves as a unique identifier. So if ID_modulo is unique only within sistema, still (ID_modulo,ID_sistema) is unique overall.
Note that this has nothing to do with modulos being many-to-one with sistemas per se. It has to do with columns forming unique subrows.
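The two cases can be tried out with a small sketch using Python's built-in `sqlite3` module. The column types, the sample rows and the `accepts` helper are assumptions made for the illustration, and the foreign key to TCSistemas is left out to keep it short. The sample rows reuse ID_modulo 1 in two different sistemas.

```python
import sqlite3


def accepts(primary_key, rows):
    """Create TCModulo with the given PRIMARY KEY columns and try to load rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE TCModulo ("
        "ID_modulo INTEGER NOT NULL, "
        "ID_sistema INTEGER NOT NULL, "
        f"PRIMARY KEY ({primary_key}))"
    )
    try:
        conn.executemany("INSERT INTO TCModulo VALUES (?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()


# (ID_modulo, ID_sistema): modulo 1 exists in sistema 1 and in sistema 2.
rows = [(1, 1), (2, 1), (1, 2)]

per_sistema = accepts("ID_modulo, ID_sistema", rows)
overall = accepts("ID_modulo", rows)
print(per_sistema, overall)
```

With the composite key the rows load, because ID_modulo only has to be unique within a sistema. With (ID_modulo) alone the third row is rejected. Which declaration is right depends on which of the two situations the application actually has.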
I always prefer to use an identity (auto-increment) for the primary key, as it keeps the pages clustered better and avoids fragmentation on the disk. You need a foreign key ID_sistema anyway, so add that too.

Difference between Primary key and Candidate key

I have read about Keys in RDBMS.
https://stackoverflow.com/a/6951124/1647112
I however couldn't understand the need to use a candidate key. If a primary key is all that is needed to uniquely identify a row in a table, why is candidate key required?
Please give a good example as to state the differences and importance of various keys.
Thanks in advance.
A table can have one or more candidate keys - these are keys that uniquely identify a row in the table.
However, only one of these candidate keys can be chosen to be the primary key.
From the above answer I came to this conclusion:
Super key(one or more attributes used for selecting one or more rows)
||
\/
Candidate key(one or more attributes from super used for selecting a single row)
||
\/
Primary key(one attribute among candidate keys used for selecting a single row)
Am I correct?

Regarding Candidate Keys and Superkeys

I have a quick question regarding candidate keys and superkeys. Say you have two keys (a, b) where 'a' is a primary key and 'b' is a candidate key. Would the combination of these two keys be a superkey, i.e. would (a,b) be a superkey? Or would it be a candidate key? My assumption is that it would be a superkey because the definition of a candidate key states that it is an irreducible superkey and the combination of the two fields a and b could be reduced to either a or b. Is this logic correct? Or am I missing something here? Thanks!
Would the combination of these two keys be a superkey, i.e. would (a,b) be a superkey?
Yes, it would still uniquely identify rows.
Or would it be a candidate key?
No, it would no longer be minimal.
My assumption is that it would be a superkey because the definition of a candidate key states that it is an irreducible superkey and the combination of the two fields a and b could be reduced to either a or b. Is this logic correct?
Almost. Yes, it would be a superkey, but not because it can be reduced. It would be a superkey because it is unique. Being reducible is what stops it from being a candidate key.
Every candidate key is a superkey, but not every superkey is a candidate key. So {a} is both a candidate key and a superkey, {b} is both a candidate key and a superkey, and {a, b} is just a superkey.