Candidate key or Super key - sql

Consider a relational table with several columns. What would you call a collection of columns whose values are unique and not null: a super key or a candidate key?

A super key is a set of one or more columns that uniquely identifies rows in a table.
Candidate keys are selected from the set of super keys. The only requirement when selecting a candidate key is that it must not contain any redundant attribute. That is why candidate keys are also called minimal super keys.
Suppose an Employee table has three columns: Emp_Code, Emp_Number, Emp_Name. Assume Emp_Code and Emp_Number are each unique, while Emp_Name is not.
Super keys:
All of the following sets are able to uniquely identify rows of the employee table.
{Emp_Code}
{Emp_Number}
{Emp_Code, Emp_Number}
{Emp_Code, Emp_Name}
{Emp_Code, Emp_Number, Emp_Name}
{Emp_Number, Emp_Name}
Candidate Keys:
As I stated above, they are the minimal super keys with no redundant attributes.
{Emp_Code}
{Emp_Number}
In summary:
A superkey is a set of columns that uniquely identifies a row, whereas a candidate key is a MINIMAL set of columns that uniquely identifies a row. Every candidate key is therefore also a superkey, and any other superkey is a candidate key with extra, unnecessary columns added to it.
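The Employee example can be checked mechanically. The following is a minimal Python sketch, not a definitive method: the sample rows are hypothetical (Emp_Name repeats on purpose), and the helper names `is_unique`, `superkeys` and `candidate_keys` are made up for illustration. Note that a key is really a rule about every state the table may ever hold, so sample data can disprove a key but never fully prove one.

```python
from itertools import combinations

# Hypothetical sample data (an assumption, not from the original answer).
COLUMNS = ("Emp_Code", "Emp_Number", "Emp_Name")
ROWS = [
    ("E01", 1001, "Alice"),
    ("E02", 1002, "Bob"),
    ("E03", 1003, "Alice"),  # same name as the first row
]


def is_unique(col_indexes, rows):
    """True if projecting the rows onto these columns yields no duplicates."""
    projected = [tuple(row[i] for i in col_indexes) for row in rows]
    return len(set(projected)) == len(projected)


def superkeys(columns, rows):
    """Every non-empty column set that uniquely identifies the sample rows."""
    found = []
    for size in range(1, len(columns) + 1):
        for indexes in combinations(range(len(columns)), size):
            if is_unique(indexes, rows):
                found.append(frozenset(columns[i] for i in indexes))
    return found


def candidate_keys(all_superkeys):
    """Superkeys that have no proper subset which is also a superkey."""
    return [k for k in all_superkeys if not any(other < k for other in all_superkeys)]


SUPERKEYS = superkeys(COLUMNS, ROWS)
CANDIDATE_KEYS = candidate_keys(SUPERKEYS)

for key in SUPERKEYS:
    label = "candidate key" if key in CANDIDATE_KEYS else "superkey only"
    print(sorted(key), label)
```

With these rows the sketch reports the same six superkeys listed above, and only {Emp_Code} and {Emp_Number} survive the minimality test.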


How is a primary key different from a unique key when, as I understand it, both serve the same purpose?

As far as I know, a primary key value cannot be repeated across entries, and the same is true of a unique key.
Let's say we use employee IDs as primary keys for 100 employees. This means that two employees can never have the same employee ID.
But then, what is a unique key, given that the employee ID is already a unique identifier for each employee? All I know is that a primary key cannot hold NULL values, whereas a unique key can hold just one NULL value.
But is that actually the difference between the two? Can someone make it easier to understand, preferably with a code example?
Also, once the two are differentiated, how can we define both in a single dataset? Are there any set rules we have to follow?
A primary key has three properties:
The key is unique across all rows of a table.
The key (or, for a composite key, every one of its components) is never NULL.
There is only one per table.
In general, primary keys are used for foreign key relationships. They are typically integers, because that is somewhat more efficient for indexing.
Other columns or combinations of columns can be unique and non-NULL. Those are candidate primary keys. However, a table has only one primary key.
I hope my answer clears up your doubt.
A primary key is typically the key that foreign keys in other tables reference.
But in a large-scale database you may need other tables to reference more than one key of the same table.
That is where unique keys come in: a foreign key can reference a unique key as well as a primary key, as your requirements dictate.
A primary key can't contain NULL values. A unique key can; how many NULLs it allows depends on the DBMS (SQL Server allows only one, while PostgreSQL, MySQL, and SQLite allow several by default).
A table can't have more than one primary key, but it can have multiple unique keys.
In SQL Server, a primary key creates a clustered index by default (unless the table already has one or NONCLUSTERED is specified), whereas a unique key creates a nonclustered index by default.
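Since the question asks for a code example, here is a small sketch using Python's built-in `sqlite3` module. SQLite is only a stand-in chosen because it runs without a server; the `employee` table, its columns and the `try_insert` helper are assumptions made for illustration. NULL handling in particular varies by DBMS: SQLite accepts several NULLs in a UNIQUE column, while SQL Server accepts only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        -- NOT NULL is spelled out because SQLite does not imply it
        -- for a non-integer PRIMARY KEY on an ordinary table.
        emp_code TEXT NOT NULL PRIMARY KEY,
        email    TEXT UNIQUE,               -- a unique key; NULL is allowed
        emp_name TEXT NOT NULL
    )
    """
)


def try_insert(emp_code, email, emp_name):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO employee (emp_code, email, emp_name) VALUES (?, ?, ?)",
                (emp_code, email, emp_name),
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "first row": try_insert("E01", "a@example.com", "Alice"),
    "duplicate primary key": try_insert("E01", "b@example.com", "Bob"),
    "duplicate unique value": try_insert("E02", "a@example.com", "Bob"),
    "NULL primary key": try_insert(None, "c@example.com", "Carol"),
    "first NULL in unique column": try_insert("E03", None, "Dave"),
    "second NULL in unique column": try_insert("E04", None, "Erin"),
}

for case, accepted in results.items():
    print(f"{case}: {'accepted' if accepted else 'rejected'}")
```

Both constraints reject duplicates; the visible difference in this sketch is that the unique column tolerates missing values while the primary key does not.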

Natural key vs auto_increment key as the primary key

My question is about using a natural key versus an auto_increment integer as the primary key.
For example, I have tables A, B and A_B_relation. A and B hold some kind of objects, and A_B_relation records the many-to-many relation between A and B.
Both A and B have their own global unique id, such as UUID. The UUID is available to user, this means user may query A or B by UUID.
There are two ways to design the tables' primary keys:
Use the auto_increment integer. A_B_relation references the integer as FK.
Use the UUID. A_B_relation references the UUID as FK.
For example, a user wants to query the info of all the B's associated with an A, given A's UUID.
For the first case, the query flow is this:
First, query A's integer primary key by UUID from `A`.
Then, query all the B's integer primary keys from `A_B_relation`.
Finally, query all the B's info from `B`.
For the latter case, the flow is as below:
Query all the B's UUIDs from `A_B_relation` by A's UUID.
Query all the B's info from `B`.
So I think the latter case is more convenient. Is this right? What are the drawbacks of the latter case?
In my opinion, whether a natural key or an auto-increment key is more convenient depends on the solution you are building. Both approaches have pros and cons. So the best course is to understand both key types properly, analyze what kind of business solution you are trying to provide, and select the appropriate primary key type.
A natural key is a column or a set of columns which can be used to uniquely identify a record in a table. These columns contain real data which has a relationship with the rest of the columns of the table.
An auto-incremented key, also called a surrogate key, is a single table column containing unique numeric values which can be used to uniquely identify a single row of data in a table. These values are generated at run time when a record is inserted into the table and have no relationship with the rest of the data in the row.
The main advantage of a natural key is that it has its own meaning and requires fewer joins with other tables, whereas with a surrogate key we would need to join to the referenced table to get the results we got directly with the natural key.
But say we cannot get all the required data from a single table and have to join with another table anyway. Then a surrogate key is more convenient than a natural key, because most of the time natural keys are strings and larger than surrogate keys, and joining tables on larger values takes more time.
A natural key has its own meaning. So when it comes to searching records, natural keys have an advantage over surrogate keys. But say that over time our program logic changes and we have to change the natural key value. This will be difficult and will cause a cascade effect across all foreign key relationships. We can avoid this problem by using a surrogate key. Since a surrogate key has no relationship with the rest of the values in a row, changes to the logic won't have an effect on the surrogate key.
So, as I see it, the convenience or inconvenience of a surrogate key or a natural key depends entirely on the solution you are providing.
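For the A / B / A_B_relation case in the question, the three-step flow of the integer-key design does not have to be three separate queries; it can be written as one joined statement. The sketch below uses Python's `sqlite3` as a stand-in database, and the lowercase table names, the `info` column and the sample rows are all assumptions for illustration.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Surrogate integer primary keys; the UUID stays as a unique natural key.
    CREATE TABLE a (id INTEGER PRIMARY KEY, uuid TEXT NOT NULL UNIQUE, info TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, uuid TEXT NOT NULL UNIQUE, info TEXT);
    CREATE TABLE a_b_relation (
        a_id INTEGER NOT NULL REFERENCES a(id),
        b_id INTEGER NOT NULL REFERENCES b(id),
        PRIMARY KEY (a_id, b_id)
    );
    """
)

# Hypothetical sample rows.
a_uuid = str(uuid.uuid4())
conn.execute("INSERT INTO a (id, uuid, info) VALUES (?, ?, ?)", (1, a_uuid, "a-one"))
conn.executemany(
    "INSERT INTO b (id, uuid, info) VALUES (?, ?, ?)",
    [(1, str(uuid.uuid4()), "b-one"),
     (2, str(uuid.uuid4()), "b-two"),
     (3, str(uuid.uuid4()), "b-three")],
)
conn.executemany("INSERT INTO a_b_relation (a_id, b_id) VALUES (?, ?)", [(1, 1), (1, 3)])

# One statement covers all three steps: look up A by UUID, follow the
# relation rows, and fetch the matching B rows.
rows = conn.execute(
    """
    SELECT b.uuid, b.info
    FROM a
    JOIN a_b_relation AS r ON r.a_id = a.id
    JOIN b ON b.id = r.b_id
    WHERE a.uuid = ?
    ORDER BY b.id
    """,
    (a_uuid,),
).fetchall()

infos = [info for _uuid, info in rows]
print(infos)
```

So the integer-key design costs one extra join rather than extra round trips, while the relation table stores two small integers per row instead of two UUID strings.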

What is it called when there are two columns in a table that are individually unique?

As the question asks I'm wondering what you would call it when a row could be uniquely identified by two separate columns.
I apologize for my lack of formatting, but here goes:
ColumnA: 1,2,3,4,5
ColumnB: A,B,C,D,E
ColumnC: 1,2,1,2,1
ColumnD: A,A,B,B,A
So both Column A and Column B are each individually primary keys? I don't think this is correct, so what would you call this in terms of "keys"?
If you mean that every row always has a unique A value and every row always has a unique B value then each of them is a "superkey". (Actually, the sets {A} and {B}.)
So the answer is, the table has two one-column superkeys.
But also, according to your data, no smaller set of columns of either superkey is itself a superkey, so each is also a "candidate key". (The only smaller set is {}, but it's not a superkey given your data.)
So here an equivalent answer is, the table has two one-column candidate keys.
A base table that is like that and that can hold your example data is in 5NF. There is no benefit in decomposing it into other tables.
There is a tradition to pick a candidate key as a "primary key". Then each other one is an "alternate key". But this is not a particularly helpful tradition.
In SQL PRIMARY KEY or UNIQUE NOT NULL actually declares a superkey. You can only have one PRIMARY KEY declared per table. SQL does not allow declaring {} as a superkey.
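As a rough illustration, the sample data from the question can be loaded into SQLite through Python's `sqlite3` module, each column tested for uniqueness with `COUNT(DISTINCT ...)`, and the two keys then declared. The table names `t` and `t_keyed` and the helper `column_is_unique` are hypothetical, and the check only speaks for the rows currently stored, not for every row the table might hold later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT, c INTEGER, d TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, "A", 1, "A"), (2, "B", 2, "A"), (3, "C", 1, "B"),
     (4, "D", 2, "B"), (5, "E", 1, "A")],
)


def column_is_unique(column):
    """True if the column has as many distinct values as the table has rows."""
    # The column name comes from the fixed tuple below, never from user input.
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM t"
    ).fetchone()
    return distinct == total


unique_columns = [col for col in ("a", "b", "c", "d") if column_is_unique(col)]
print(unique_columns)

# Declare both keys: one as PRIMARY KEY, the other as UNIQUE NOT NULL.
conn.execute(
    "CREATE TABLE t_keyed (a INTEGER PRIMARY KEY, b TEXT NOT NULL UNIQUE, c INTEGER, d TEXT)"
)
conn.execute("INSERT INTO t_keyed SELECT a, b, c, d FROM t")

try:
    # New value for a, but b repeats an existing value.
    conn.execute("INSERT INTO t_keyed VALUES (6, 'A', 2, 'B')")
    duplicate_b_rejected = False
except sqlite3.IntegrityError:
    duplicate_b_rejected = True
```

Which of the two columns gets PRIMARY KEY is an arbitrary choice in this sketch; swapping the declarations would enforce the same uniqueness.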

Foreign keys vs secondary keys

I used to think that foreign key and secondary key are the same thing.
After Googling, the results are even more confusing: some consider them to be the same, while others say that a secondary key is an index that doesn't have to be unique and allows faster access to data than the primary key.
Can someone explain the difference?
Or is it indeed a case of mixed terminology?
Does it maybe differ per database type?
The definition in wiki/Foreign_key states that:
In the context of relational databases, a foreign key is a field (or collection of fields) in one table that uniquely identifies a row of another table. In other words, a foreign key is a column or a combination of columns that is used to establish and enforce a link between two tables.
The table containing the foreign key is called the referencing or child table, and the table containing the candidate key is called the referenced or parent table.
Take the example of the case:
A customer may place 0,1 or more orders.
From the point of the business, each customer is identified by a unique id (Primary Key) and instead of repeating the customer information with each order, we place a reference, or a pointer to that unique customer id (Customer's Primary Key) in the order table. By looking at any order, we can tell who placed it using the unique customer id.
The relationship between the parent (Customer table) and the child (Order table) is established when you set the value of the FK in the Order table after the Customer row has been inserted. Also, deleting a parent row may affect its child rows, depending on the referential integrity settings (cascading rules) defined when the FK was created. FKs help establish integrity in a relational database system.
As for the "Secondary Key", the term refers to a structure of 1 or more columns that together help retrieve 1 or more rows of the same table. The word 'key' is somewhat misleading to some. The Secondary Key does not have to be unique (unlike the PK). It is not the Primary Key of the table. It is used to locate rows in the same table it is defined within (unlike the FK). Its enforcement is only through an index (either unique or not), and its implementation is optional. A table could have 0, 1 or more Secondary Keys. For example, in an Employee table, you may use an auto-generated column as a primary key. Alternatively, you may decide to use the Employee Number or SSN to retrieve employee information.
Sometimes people mix the term "Secondary Key" with the term "Candidate Key" or "Alternate Key" (usually appears in Normalization context) but they are all different.
A foreign key is a key that references a key (usually the primary key, which is backed by an index) in some other table. For example, if you have a table of customers, one of the columns on that table may be a country column which would just contain an ID number, which would match the ID of that country in a separate Country table. That country column in the customer table would be a foreign key.
A secondary key on the other hand is just a different column in the table that you have used to create an index (which is used to speed up queries). Foreign keys have nothing to do with improving query speeds.
"Secondary key" is not a term I'm familiar with. It doesn't appear in the index of Database Design for Mere Mortals and I don't remember it in Pro SQL Server 2012 Relational Database Design and Implementation (my two "goto" books for database design). It also doesn't appear in the index for SQL for Smarties. It sounds like it's not an actual term at all.
I've always used the term "candidate key".
A candidate key is a way to uniquely identify an entity. You identify all the candidate keys during the design phase of a database system. During the implementation phase, you will decide on a primary key: either one of the candidate keys or an artificial key. The primary key will probably be implemented with a primary key constraint; the candidate keys will probably be implemented with unique constraints.
A foreign key is an instance of one entity's candidate key in another entity, representing a relationship between the two entities. It will probably be implemented with a foreign key constraint.
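To make the contrast concrete, here is a short sketch with Python's `sqlite3` module: a foreign key constraint that rejects an order for a non-existent customer and cascades deletes, next to a plain non-unique index of the kind some answers call a "secondary key". The `customer` and `orders` tables, the index name and the sample rows are assumptions for illustration, and SQLite stands in for whatever DBMS you use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        last_name   TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        -- Foreign key: must match an existing customer row.
        customer_id INTEGER NOT NULL
            REFERENCES customer(customer_id) ON DELETE CASCADE
    );
    -- "Secondary key" in the index sense: speeds up lookups by last_name,
    -- enforces nothing, and does not need to be unique.
    CREATE INDEX idx_customer_last_name ON customer(last_name);
    """
)

conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Smith"), (2, "Smith")])
conn.execute("INSERT INTO orders VALUES (10, 1)")

try:
    conn.execute("INSERT INTO orders VALUES (11, 99)")  # customer 99 does not exist
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The indexed column happily holds duplicate values.
smiths = conn.execute(
    "SELECT COUNT(*) FROM customer WHERE last_name = 'Smith'"
).fetchone()[0]

# Deleting the parent row removes its orders because of ON DELETE CASCADE.
conn.execute("DELETE FROM customer WHERE customer_id = 1")
orders_left = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

print(orphan_rejected, smiths, orders_left)
```

The foreign key constrains which values may be stored, while the index changes nothing about which rows are valid; it only affects how quickly they can be found.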

Superkey vs. Candidate key

What is the difference between a super key and a candidate key in ERDB?
A superkey is a set of columns that uniquely identifies a row. A candidate key is a MINIMAL set of columns that uniquely identifies a row. So every candidate key is also a superkey, and any other superkey is a candidate key with extra, unnecessary columns added to it.
candidate key is a minimal superkey
Candidate key = minimal key to identify a row
Super key = at least as wide as a candidate key
For me, a super key would generally introduce ambiguities over a candidate key
Let's keep it simple
Super key - A set of attributes that uniquely identifies a row. So if even a single attribute is unique, then every set of attributes containing that unique attribute is a super key.
Candidate key - A super key with no proper subset that can still identify the rows uniquely. Or, put simply, it is a minimal super key.
In a nutshell: a CANDIDATE KEY is a minimal SUPER KEY,
where a super key is a combination of columns (or attributes) that uniquely identifies any record (or tuple) in a relation (table) in an RDBMS.
For instance, consider the following dependencies in a table having columns A, B, C and D
(Giving this table just for a quick example so not covering all dependencies that R could have).
Attribute set (determinant) ---can identify---> (dependent)
A -----> AD
B -----> ABCD
C -----> CD
AC -----> ACD
AB -----> ABCD
ABC -----> ABCD
BCD -----> ABCD
Now, B, AB, ABC and BCD each identify all columns, so those four qualify as super keys.
But B⊂AB, B⊂ABC and B⊂BCD, hence AB, ABC and BCD are disqualified as CANDIDATE KEYS: a proper subset of each can already identify the relation, so they aren't minimal. Hence only B is the candidate key, not the others.
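The same reasoning can be sketched in Python with the standard attribute-closure computation. The dependencies are the ones from the table above; the function names and the brute-force enumeration of attribute sets are illustrative choices, not part of the original answer. Because it tests every attribute set, the sketch also reports the supersets of B that the list above leaves out (BC, BD, ABD and ABCD).

```python
from itertools import combinations

ATTRIBUTES = frozenset("ABCD")

# Dependencies from the example, as (determinant, dependent) pairs.
FDS = [
    (frozenset("A"), frozenset("AD")),
    (frozenset("B"), frozenset("ABCD")),
    (frozenset("C"), frozenset("CD")),
    (frozenset("AC"), frozenset("ACD")),
    (frozenset("AB"), frozenset("ABCD")),
    (frozenset("ABC"), frozenset("ABCD")),
    (frozenset("BCD"), frozenset("ABCD")),
]


def closure(attrs, fds):
    """All attributes determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for determinant, dependent in fds:
            if determinant <= result and not dependent <= result:
                result |= dependent
                changed = True
    return frozenset(result)


def superkeys(attributes, fds):
    """Attribute sets whose closure covers every attribute of the relation."""
    found = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            if closure(combo, fds) == attributes:
                found.append(frozenset(combo))
    return found


def candidate_keys(all_superkeys):
    """Superkeys with no proper subset that is also a superkey."""
    return [k for k in all_superkeys if not any(other < k for other in all_superkeys)]


SUPERKEYS = superkeys(ATTRIBUTES, FDS)
CANDIDATE_KEYS = candidate_keys(SUPERKEYS)

print(["".join(sorted(k)) for k in SUPERKEYS])
print(["".join(sorted(k)) for k in CANDIDATE_KEYS])
```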