How would you go about fixing an existing yearbook database that uses a composite key on the fields school and year, which no longer identify a unique row? One of these schools is releasing a biannual yearbook. Should I just generate an id and use that for the primary key?
I suggest adding a semester or term field. You could just create a surrogate key, but adding another field to your composite key gives you the flexibility to handle quarters/semesters neatly.
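As a rough sketch of the suggestion above, the composite key can be widened with a term column. This uses Python's built-in sqlite3 module only so the example runs anywhere; the table name `yearbook`, the `title` column, the school names, and the convention that existing rows default to term 1 are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE yearbook (
        school TEXT    NOT NULL,
        year   INTEGER NOT NULL,
        term   INTEGER NOT NULL DEFAULT 1,  -- assumption: annual yearbooks use term 1
        title  TEXT,
        PRIMARY KEY (school, year, term)    -- composite key widened with term
    )
""")

# A school with one yearbook per year relies on the default term.
conn.execute("INSERT INTO yearbook (school, year, title) VALUES ('Northside', 2010, 'Annual')")
# The school with a biannual yearbook gets two rows for the same year.
conn.execute("INSERT INTO yearbook VALUES ('Eastside', 2010, 1, 'Spring issue')")
conn.execute("INSERT INTO yearbook VALUES ('Eastside', 2010, 2, 'Fall issue')")

# The same (school, year, term) combination is still rejected.
try:
    conn.execute("INSERT INTO yearbook VALUES ('Eastside', 2010, 2, 'Duplicate')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM yearbook").fetchone()[0]
```

Any table that references the old (school, year) key would need the extra column as well, which is the usual cost of widening a composite key.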
A primary key cannot be repeated across entries, and as far as I know the same is true of a unique key.
Let's say we take employee IDs as the primary key for 100 employees; this means that no two employees can ever have the same employee ID.
But then, what is a unique key, given that the employee ID is already a unique identifier for each employee? All I know is that a primary key cannot hold NULL values, whereas a unique key can hold just one NULL value.
But is that actually the difference between the two? Can someone make it easier to understand, preferably with a code example?
Also, once the two are distinguished, how can we define both in a single dataset? Are there any set rules we have to follow?
A primary key has three properties:
The key is unique across all rows of a table.
The key (and every component of a composite key) is never NULL.
There is only one per table.
In general, primary keys are used for foreign key relationships. They are typically integers, because that is somewhat more efficient for indexing.
Other columns or combinations of columns can be unique and non-NULL. Those are candidate primary keys. However, a table has only one primary key.
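A minimal sketch of these properties, using Python's built-in sqlite3 module. The `employee` table, its columns, and the sample values are assumptions for illustration. Note that NULL handling in unique columns varies between database systems: SQLite, used here, accepts NULL in a UNIQUE column, and other systems differ in how many NULLs they permit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,  -- the one primary key of the table
        email       TEXT UNIQUE,          -- a unique key (candidate key if also non-NULL)
        badge_no    TEXT UNIQUE           -- several unique keys may coexist
    )
""")
conn.execute("INSERT INTO employee VALUES (1, 'a@example.com', 'B-001')")


def try_insert(row):
    """Return 'ok' if the row is accepted, 'rejected' on a constraint violation."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


results = {
    "duplicate primary key": try_insert((1, "x@example.com", "B-002")),
    "duplicate unique value": try_insert((2, "a@example.com", "B-003")),
    "null in unique column": try_insert((3, None, "B-004")),
}
```

Both kinds of key reject duplicates; the difference lies in how many may exist per table and whether NULL is tolerated.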
I hope my answer clears up your doubt.
A primary key is used as the target of foreign keys that reference the table from other tables.
But when we implement a large-scale database, we may need to reference more than one key of the same table from different table(s).
That is where unique keys come in: they let you do this easily, according to your (the programmer's) requirements.
A primary key can't have NULL values. A unique key allows one NULL value (this is SQL Server's behavior; some other database systems allow several).
A database table can't have more than one primary key, but multiple unique keys are allowed in one table.
In SQL Server, a clustered index is created with the primary key by default. A unique key, on the other hand, generates a non-clustered index.
I tried searching Google for this, but I can't find a comparison between them. If anyone can explain, it would be a great help.
Primary Key
The attribute that uniquely identifies a row or record in a relation is known as the primary key,
like the page number of a book.
Secondary Key
A field or combination of fields that is the basis for retrieval is known as a secondary key (mainly used for finding details in large data sets),
like the index page of a book.
Foreign Key
A field used to refer to records in another table (the primary key of the referenced table).
Primary Key: a single field chosen by the designer to uniquely identify a record in a table (relation); it cannot be null (empty/unassigned).
Foreign Key: the primary key of one table appearing (cross-referenced) in another table.
Secondary (or Alternative) Key: any field in the table that isn't selected as either of the two types above.
Hope this helps.
It is traditional in SQL to designate one of the keys of a table as the "primary key". A "secondary" or "alternate" key is any key that is not selected as the primary. (The distinction doesn't have any basis in relational theory.)
A foreign key is a rather different kind of thing, and should have its own question.
I googled a lot, but I did not find a straightforward answer with an example.
An example of this would be very helpful.
The primary key is a unique key in your table that you choose that best uniquely identifies a record in the table. All tables should have a primary key, because if you ever need to update or delete a record you need to know how to uniquely identify it.
A surrogate key is an artificially generated key. They're useful when your records essentially have no natural key (such as a Person table, since it's possible for two people born on the same date to have the same name, or records in a log, since it's possible for two events to happen such that they carry the same timestamp). Most often you'll see these implemented as integers in an automatically incrementing field, or as GUIDs that are generated automatically for each record. ID numbers are almost always surrogate keys.
Unlike primary keys, not all tables need surrogate keys, however. If you have a table that lists the states in America, you don't really need an ID number for them. You could use the state abbreviation as a primary key code.
The main advantage of surrogate keys is that they're easy to guarantee as unique. The main disadvantage is that they don't have any meaning. Nothing about "28" tells you it means Wisconsin, for example, but when you see 'WI' in the State column of your Address table, you know what state you're talking about without needing to look up which state is which in your State table.
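To make the state example concrete, here is a small sketch using Python's built-in sqlite3 module. The table and column names (`state`, `address`, `address_id`, `street`, `state_code`) are assumptions for illustration; the state table uses its abbreviation as a natural primary key, while the address table, which has no obvious natural key, gets a surrogate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE state (
        code TEXT NOT NULL PRIMARY KEY,   -- natural key: the abbreviation itself
        name TEXT NOT NULL
    );
    CREATE TABLE address (
        address_id INTEGER PRIMARY KEY,   -- surrogate key: generated, carries no meaning
        street     TEXT NOT NULL,
        state_code TEXT NOT NULL REFERENCES state(code)
    );
    INSERT INTO state VALUES ('WI', 'Wisconsin');
    INSERT INTO address (street, state_code) VALUES ('1 Main St', 'WI');
""")

# The address row is readable on its own: no join is needed to see the state.
address_row = conn.execute(
    "SELECT address_id, street, state_code FROM address"
).fetchone()
```

Here `address_id` is just a generated number, whereas `state_code` already tells the reader which state is meant.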
A surrogate key is a made up value with the sole purpose of uniquely identifying a row. Usually, this is represented by an auto incrementing ID.
Example code:
CREATE TABLE Example
(
    SurrogateKey INT IDENTITY(1,1) -- A surrogate key that increments automatically
);
A primary key is the identifying column or set of columns of a table. It can be a surrogate key or any other unique combination of columns (for example, a compound key). It MUST be unique for every row and cannot be NULL.
Example code:
CREATE TABLE Example
(
    PrimaryKey INT PRIMARY KEY -- A primary key is just a unique identifier
);
All keys are identifiers used as surrogates for the things they identify. E.F.Codd explained the concept of system-assigned surrogates as follows [1]:
Database users may cause the system to generate or delete a surrogate,
but they have no control over its value, nor is its value ever
displayed to them.
This is what is commonly referred to as a surrogate key. The definition is immediately problematic however because Codd was assuming that such a feature would be provided by the DBMS. DBMSs in general have no such feature. The keys are normally visible to at least some DBMS users as, for obvious reasons, they have to be. The concept of a surrogate has therefore morphed slightly in usage. The term is generally used in the data management profession to mean a key that is not exposed and used as an identifier in the business domain. Note that this is essentially unrelated to how the key is generated or how "artificial" it is perceived to be. All keys consist of symbols invented by humans or machines. The only possible significance of the term surrogate therefore relates to how the key is used, not how it is created or what its values are.
[1] Extending the database relational model to capture more meaning, E.F.Codd, 1979
This is a great treatment describing the various kinds of keys:
http://www.agiledata.org/essays/keys.html
A surrogate key is typically a numeric value. Within SQL Server, Microsoft allows you to define a column with an identity property to help generate surrogate key values.
The PRIMARY KEY constraint uniquely identifies each record in a database table.
Primary keys must contain UNIQUE values.
A primary key column cannot contain NULL values.
Most tables should have a primary key, and each table can have only ONE primary key.
http://www.databasejournal.com/features/mssql/article.php/3922066/SQL-Server-Natural-Key-Verses-Surrogate-Key.htm
I think Michelle Poolet describes it in a very clear way:
A surrogate key is an artificially produced value, most often a
system-managed, incrementing counter whose values can range from 1 to
n, where n represents a table's maximum number of rows. In SQL Server,
you create a surrogate key by assigning an identity property to a
column that has a number data type.
http://sqlmag.com/business-intelligence/surrogate-key-vs-natural-key
A surrogate key is usually most helpful when you replace a composite key with an identity column.
My problem is about choosing between a natural key and an auto_increment integer as the primary key.
For example, I have tables A and B and A_B_relation. A and B may be any kind of object, and A_B_relation records the many-to-many relation between A and B.
Both A and B have their own globally unique ID, such as a UUID. The UUID is available to the user, which means the user may query A or B by UUID.
There are two ways to design the tables' primary keys.
Use the auto_increment integer. A_B_relation references the integer as FK.
Use the UUID. A_B_relation references the UUID as FK.
For example, the user wants to query the info of all the B's associated with an A, given A's UUID.
For the first case, the query flow is this:
First, query A's integer primary key by UUID from `A`.
Then, query all the B's integer primary keys from `A_B_relation`.
Finally, query all the B's info from `B`.
For the latter case, the flow is as below:
Query all the B's UUIDs from `A_B_relation` by A's UUID.
Query all the B's info from `B`.
So I think the latter case is more convenient. Is this right? What are the drawbacks of the latter case?
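The two designs and their query flows can be sketched side by side as follows. This uses Python's built-in sqlite3 module; the lowercase table names, the `info` column, and the placeholder UUID strings are assumptions for illustration. Each multi-step flow is written here as a single SQL statement with joins, which is one common way to express it.

```python
import sqlite3

A_UUID = "uuid-a1"  # placeholder value, not a real UUID


def surrogate_design():
    """Integer primary keys; the UUID is a unique, user-facing column."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE a (id INTEGER PRIMARY KEY, uuid TEXT NOT NULL UNIQUE);
        CREATE TABLE b (id INTEGER PRIMARY KEY, uuid TEXT NOT NULL UNIQUE, info TEXT);
        CREATE TABLE a_b_relation (
            a_id INTEGER NOT NULL REFERENCES a(id),
            b_id INTEGER NOT NULL REFERENCES b(id),
            PRIMARY KEY (a_id, b_id)
        );
        INSERT INTO a VALUES (1, 'uuid-a1');
        INSERT INTO b VALUES (1, 'uuid-b1', 'first'), (2, 'uuid-b2', 'second');
        INSERT INTO a_b_relation VALUES (1, 1), (1, 2);
    """)
    # Three tables are involved: a (UUID -> id), the relation, then b.
    return conn.execute("""
        SELECT b.uuid, b.info
        FROM a
        JOIN a_b_relation r ON r.a_id = a.id
        JOIN b ON b.id = r.b_id
        WHERE a.uuid = ?
        ORDER BY b.uuid
    """, (A_UUID,)).fetchall()


def uuid_design():
    """The UUID itself is the primary key and the foreign key."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE a (uuid TEXT NOT NULL PRIMARY KEY);
        CREATE TABLE b (uuid TEXT NOT NULL PRIMARY KEY, info TEXT);
        CREATE TABLE a_b_relation (
            a_uuid TEXT NOT NULL REFERENCES a(uuid),
            b_uuid TEXT NOT NULL REFERENCES b(uuid),
            PRIMARY KEY (a_uuid, b_uuid)
        );
        INSERT INTO a VALUES ('uuid-a1');
        INSERT INTO b VALUES ('uuid-b1', 'first'), ('uuid-b2', 'second');
        INSERT INTO a_b_relation VALUES ('uuid-a1', 'uuid-b1'), ('uuid-a1', 'uuid-b2');
    """)
    # Only two tables are involved: the relation already stores A's UUID.
    return conn.execute("""
        SELECT b.uuid, b.info
        FROM a_b_relation r
        JOIN b ON b.uuid = r.b_uuid
        WHERE r.a_uuid = ?
        ORDER BY b.uuid
    """, (A_UUID,)).fetchall()
```

Both functions return the same rows; the UUID design saves one join at the price of storing and indexing wider key values in the relation table.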
In my opinion, the convenience of using either a natural key or an auto-increment key depends on the solution you are providing. Both methods have pros and cons. So the best approach is to understand both key types properly, analyze what kind of business solution you are trying to provide, and select the appropriate primary key type.
A natural key is a column or a set of columns which can be used to uniquely identify a record in a table. These columns contain real data which has a relationship with the rest of the columns of the table.
An auto-incremented key, also called a surrogate key, is a single table column containing unique numeric values which can be used to uniquely identify a single row of data in a table. These values are generated at run time when a record is inserted into the table and have no relationship with the rest of the data in the row.
The main advantage of a natural key is that it has its own meaning and requires fewer joins with other tables, whereas with a surrogate key we would have to join to the referenced table to get the results we got with the natural key.
But say we cannot get all the data required from a single table and have to join with another table anyway. Then it is convenient to use a surrogate key instead of a natural key, because most of the time natural keys are strings and larger in size than surrogate keys, and it takes more time to join tables on larger values.
A natural key has its own meaning. So when it comes to searching records, it is more advantageous to use natural keys over surrogate keys. But say that over time our program logic changes and we have to change the natural key value. This will be difficult and will cause a cascade effect over all foreign key relationships. We can overcome this problem by using a surrogate key. Since a surrogate key has no relationship with the rest of the values of a row, changes to the logic won't have an effect on the surrogate key.
So, as I see it, the convenience or inconvenience of using a surrogate key or a natural key depends entirely on the solution you are providing.
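The point about changing a natural key value can be illustrated with a small sketch using Python's built-in sqlite3 module. The `product` and `order_line` tables, the `sku` column, and the sample values are hypothetical. Because child rows reference the surrogate `product_id`, changing the natural value touches exactly one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,   -- surrogate key, never changes
        sku        TEXT NOT NULL UNIQUE   -- natural key, may change with the business
    );
    CREATE TABLE order_line (
        line_id    INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES product(product_id)
    );
    INSERT INTO product VALUES (1, 'OLD-100');
    INSERT INTO order_line VALUES (10, 1), (11, 1);
""")

# Changing the natural key value updates a single row in product;
# no order_line row has to be rewritten.
cursor = conn.execute("UPDATE product SET sku = 'NEW-100' WHERE product_id = 1")
rows_touched = cursor.rowcount

lines_with_sku = conn.execute("""
    SELECT o.line_id, p.sku
    FROM order_line o
    JOIN product p ON p.product_id = o.product_id
    ORDER BY o.line_id
""").fetchall()
```

Had `order_line` stored the `sku` directly as its foreign key, every referencing row would have needed the new value as well.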
I read somewhere that every table should have a primary key to fulfill 1NF.
I have a tbl_friendship table.
There are 2 fields in the table : Owner and Friend.
The Owner and Friend fields are foreign keys to the auto-increment id field in tbl_user.
Should tbl_friendship have a primary key?
Should I create an auto-increment id field in tbl_friendship and make it the primary key?
Primary keys can apply to multiple columns! In your example, the primary key should be on both columns, for example (Owner, Friend), especially when Owner and Friend are foreign keys to a users table rather than actual names. (Personally, my identity columns use the "Id" naming convention, so I would have (OwnerId, FriendId).)
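A minimal sketch of such a two-column primary key, using Python's built-in sqlite3 module. The `name` column and the sample users are assumptions for illustration, and the column names follow the (OwnerId, FriendId) convention mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE tbl_user (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL
    );
    CREATE TABLE tbl_friendship (
        OwnerId  INTEGER NOT NULL REFERENCES tbl_user(id),
        FriendId INTEGER NOT NULL REFERENCES tbl_user(id),
        PRIMARY KEY (OwnerId, FriendId)   -- composite primary key
    );
    INSERT INTO tbl_user (name) VALUES ('Dave'), ('Matt');
    INSERT INTO tbl_friendship VALUES (1, 2);
""")


def try_insert(owner_id, friend_id):
    """Return 'ok' if the friendship row is accepted, 'rejected' otherwise."""
    try:
        conn.execute("INSERT INTO tbl_friendship VALUES (?, ?)", (owner_id, friend_id))
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


same_pair_again = try_insert(1, 2)    # violates the composite primary key
reverse_pair = try_insert(2, 1)       # a different (OwnerId, FriendId) pair
unknown_friend = try_insert(1, 99)    # violates the foreign key to tbl_user
```

Note that (1, 2) and (2, 1) count as different rows under this key; whether a friendship should be stored in both directions is a separate design decision.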
Personally I believe every table should have a primary key, but you'll find others who disagree.
Here's an article I wrote on the topic of normal forms.
http://michaeljswart.com/2011/01/ridiculously-unnormalized-database-schemas-part-zero/
Yes, every table should have a primary key.
Yes, you should create a surrogate key, a.k.a. an auto-increment PK field.
You should also make "Friend" an FK to that auto-increment field.
If you think that you are going to "rekey" in the future, you might want to look into using natural keys, which are fields that naturally identify your data. The key to this is to always use the natural identifiers while coding, and to create unique indexes on those natural keys. In the future, if you have to re-key, you can, because your unique indexes guarantee that your data is consistent.
I would only do this if you absolutely have to, because it increases complexity in your code and data model.
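One way to read this advice, sketched with Python's built-in sqlite3 module: keep the auto-increment surrogate as the primary key and add a unique index on the natural identifiers. The index name `ux_friendship_owner_friend` and the sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_friendship (
        id     INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate primary key
        Owner  INTEGER NOT NULL,
        Friend INTEGER NOT NULL
    );
    -- Unique index on the natural key keeps the pairs consistent.
    CREATE UNIQUE INDEX ux_friendship_owner_friend
        ON tbl_friendship (Owner, Friend);
    INSERT INTO tbl_friendship (Owner, Friend) VALUES (1, 2);
""")

# A second row for the same pair gets a fresh id but is still rejected.
try:
    conn.execute("INSERT INTO tbl_friendship (Owner, Friend) VALUES (1, 2)")
    duplicate_pair_rejected = False
except sqlite3.IntegrityError:
    duplicate_pair_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM tbl_friendship").fetchone()[0]
```

Without the unique index, the surrogate key alone would happily accept the same (Owner, Friend) pair many times.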
It is not clear from your description, but are owner and friend foreign keys, and can there be only one relationship between any given pair? If so, the two foreign key columns are a perfect candidate for a natural primary key.
Another option is to use a surrogate key (an extra auto-incremented column, as you suggested). Take a look here for an in-depth discussion.
A primary key can be something abstract as well. In this case, each tuple (owner, friend), e.g. ("Dave","Matt"), can form a unique entry and therefore be your primary key. In that case, it would be useful not to use names, but keys referencing another table. If you guarantee that these tuples can't have duplicates, you have a valid primary key.
For processing reasons it might be useful to introduce a special primary key, like an autoincrement field (e.g. in MySQL) or using a sequence with Oracle.
To comply with 1NF (although what exactly defines 1NF is not completely agreed upon), yes, you should have a primary key identified on each table. This is necessary to provide for uniqueness of each record.
http://en.wikipedia.org/wiki/First_normal_form
In general, you can create a primary key in many ways, one of which is to have an auto-increment column, another is to have a column with GUIDs, another is to have two or more columns that will identify a row uniquely when taken together.
Your table will be much easier to manage in the long term if it has a primary key. At the very least, you need to uniquely identify each record in the table. The field that is used to uniquely identify each record might as well be the primary key.
Yes, every table should have (at least one) key. Duplicate rows in any table are undesirable for lots of reasons, so put the constraint on those two columns.