If two tables have a "One To One" relationship, should they have the same column as primary key? - sql

Let's say I have a table called Users which represents registered users of a website. I also have an AccountActivation table which stores the randomly generated string sent to a new user's email to verify that email.
The AccountActivation table has UserId column which also happen to be the primary key for the Users table. It also has the ActivationCode column to store the code. Either column could uniquely identify a row in the AccountActivation table.
So if I choose the activation code column as the primary key, I end up having two one-to-one tables with different primary keys. I thought in one to one relationship, the two tables must have the same primary key?

If you choose ActivationCode as PK, then why do you have two one-to-one relations?
The only relation that's there is
AccountActivation.UserId -> Users.UserId
or what else do you think you suddenly have?
If go do what you suggested, then the table Users has its PK on UserId and table AccountActivation has its PK on ActivationCode - not a problem at all, and there's no reason not to do it this way.
Which column (UserId or ActivationCode) you pick for the PK of AccountActivation doesn't matter - that doesn't influence / disturb the FK relationship between AccountActivation and User, nor does it add an extra one-to-one relationship of any kind .....
If you do choose ActivationCode for the PK of AccountActivation, the only extra step that I would take is creating a nonclustered index on UserId so that queries that join the two tables will benefit from maximum performance.

If there is only to be one ActivationCode they could share the UserId. But that would imply that when a user re-generated a key you should, either update the old row or delete it.
But why you need to store such data? You could also composite the account activation code with some sort of computation and encryption with unique data from the User.
Just to illustrate my suggestion:
Users table has two columns UserId CreationDate
Then the token might be UserId + CreationDate (example). You would be able to generate and check it without the extra data in the database. I know that this might not suite your requirements.

Make the UserId column in the AccountActivation a foreign key to the Users table.
Users
=====
UserId primary key
Name
Address
etc...
AccountActivation
=================
UserId primary key (foreign key to Users.UserId)
ActivationCode (unique constraint)
Now you have a one-to-one relationship

You need not have the same column as primary key in 2 tables to have one-to-one relationship.
You can have any column as primary key in the AccountActivation table.
UserId which is the foreign key to the AccountActivation table is the primary key in Users table. So, you should definitely be able to uniquely identify a users activation code from the AccountActivation table using this column, no matter whether it is primary key of that table(But it should be unique and I hope it would).

Related

Use or not Key as both FK and PK

On a database I have the following:
USERS
Id (PK)
Name
DOCTORS
Id (PK)
UserId (FK)
CurriculumVitae
Birthdate
...
All doctors I also Users of the application but have extra columns and I am displaying all doctors on a page.
Does it make sense to have Id and UserId on DOCTORS table?
Or should I make UserId both the PK and FK of table DOCTORS:
DOCTORS
UserId (PK, FK)
CurriculumVitae
Birthdate
...
In your opinion when should I go for this option?
Yes, you can make UserId (as you have defined it) both the primary key and foreign key. The primary key specifies that it is unique within the table. The foreign key maps it back to the Users table.
This is an example of a subsetting relationship.
I would actually call the columns Users.UserId and Doctors.DoctorId. I think that is less confusing nomenclature that captures the most important aspect of the two columns. However, the actual names are not important for your question (and yours are reasonable).
If you did not make it a PK also then you could have duplicates which is probably not what you want.

One Primary Key Value in many tables

This may seem like a simple question, but I am stumped:
I have created a database about cars (in Oracle SQL developer). I have amongst other tables a table called: Manufacturer and a table called Parentcompany.
Since some manufacturers are owned by bigger corporations, I will also show them in my database.
The parentcompany table is the "parent table" and the Manufacturer table the "child table".
for both I have created columns, each having their own Primary Key.
For some reason, when I inserted the values for my columns, I was able to use the same value for the primary key of Manufacturer and Parentcompany
The column: ManufacturerID is primary Key of Manufacturer. The value for this is: 'MBE'
The column: ParentcompanyID is primary key of Parentcompany. The value for this is 'MBE'
Both have the same value. Do I have a problem with the thinking logic?
Or do I just not understand how primary keys work?
Does a primary key only need to be unique in a table, and not the database?
I would appreciate it if someone shed light on the situation.
A primary key is unique for each table.
Have a look at this tutorial: SQL - Primary key
A primary key is a field in a table which uniquely identifies each
row/record in a database table. Primary keys must contain unique
values. A primary key column cannot have NULL values.
A table can have only one primary key, which may consist of single or
multiple fields. When multiple fields are used as a primary key, they
are called a composite key.
If a table has a primary key defined on any field(s), then you cannot
have two records having the same value of that field(s).
Primary key is table-unique. You can use same value of PI for every separate table in DB. Actually that often happens as PI often incremental number representing ID of a row: 1,2,3,4...
For your case more common implementation would be to have hierarchical table called Company, which would have fields: company_name and parent_company_name. In case company has a parent, in field parent_company_name it would have some value from field company_name.
There are several reasons why the same value in two different PKs might work out with no problems. In your case, it seems to flow naturally from the semantics of the data.
A row in the Manufacturers table and a row in the ParentCompany table both appear to refer to the same thing, namely a company. In that case, giving a company the same id in both tables is not only possible, but actually useful. It represents a 1 to 1 correspondence between manufacturers and parent companies without adding extra columns to serve as FKs.
Thanks for the quick answers!
I think I know what to do now. I will create a general company table, in which all companies will be stored. Then I will create, as I go along specific company tables like Manufacturer and parent company that reference a certain company in the company table.
To clarify, the only column I would put into the sub-company tables is a column with a foreign key referencing a column of the company table, yes?
For the primary key, I was just confused, because I hear so much about the key needing to be unique, and can't have the same value as another. So then this condition only goes for tables, not the whole database. Thanks for the clarification!

MS SQL creating many-to-many relation with a junction table

I'm using Microsoft SQL Server Management Studio and while creating a junction table should I create an ID column for the junction table, if so should I also make it the primary key and identity column? Or just keep 2 columns for the tables I'm joining in the many-to-many relation?
For example if this would be the many-to many tables:
MOVIE
Movie_ID
Name
etc...
CATEGORY
Category_ID
Name
etc...
Should I make the junction table:
MOVIE_CATEGORY_JUNCTION
Movie_ID
Category_ID
Movie_Category_Junction_ID
[and make the Movie_Category_Junction_ID my Primary Key and use it as the Identity Column] ?
Or:
MOVIE_CATEGORY_JUNCTION
Movie_ID
Category_ID
[and just leave it at that with no primary key or identity table] ?
I would use the second junction table:
MOVIE_CATEGORY_JUNCTION
Movie_ID
Category_ID
The primary key would be the combination of both columns. You would also have a foreign key from each column to the Movie and Category table.
The junction table would look similar to this:
create table movie_category_junction
(
movie_id int,
category_id int,
CONSTRAINT movie_cat_pk PRIMARY KEY (movie_id, category_id),
CONSTRAINT FK_movie
FOREIGN KEY (movie_id) REFERENCES movie (movie_id),
CONSTRAINT FK_category
FOREIGN KEY (category_id) REFERENCES category (category_id)
);
See SQL Fiddle with Demo.
Using these two fields as the PRIMARY KEY will prevent duplicate movie/category combinations from being added to the table.
There are different schools of thought on this. One school prefers including a primary key and naming the linking table something more significant than just the two tables it is linking. The reasoning is that although the table may start out seeming like just a linking table, it may become its own table with significant data.
An example is a many-to-many between magazines and subscribers. Really that link is a subscription with its own attributes, like expiration date, payment status, etc.
However, I think sometimes a linking table is just a linking table. The many to many relationship with categories is a good example of this.
So in this case, a separate one field primary key is not necessary. You could have a auto-assign key, which wouldn't hurt anything, and would make deleting specific records easier. It might be good as a general practice, so if the table later develops into a significant table with its own significant data (as subscriptions) it will already have an auto-assign primary key.
You can put a unique index on the two fields to avoid duplicates. This will even prevent duplicates if you have a separate auto-assign key. You could use both fields as your primary key (which is also a unique index).
So, the one school of thought can stick with integer auto-assign primary keys, and avoids compound primary keys. This is not the only way to do it, and maybe not the best, but it won't lead you wrong, into a problem where you really regret it.
But, for something like what you are doing, you will probably be fine with just the two fields. I'd still recommend either making the two fields a compound primary key, or at least putting a unique index on the two fields.
I would go with the 2nd junction table. But make those two fields as Primary key. That will restrict duplicate entries.

What is the difference between a candidate key and a primary key?

Is it that a primary key is the selected candidate key chosen for a given table?
Candidate Key – A Candidate Key can be any column or a combination of columns that can qualify as unique key in database. There can be multiple Candidate Keys in one table. Each Candidate Key can qualify as Primary Key.
Primary Key – A Primary Key is a column or a combination of columns that uniquely identify a record. Only one Candidate Key can be Primary Key.
More on this link with example
John Woo's answer is correct, as far as it goes. Here are a few additional points.
A primary key is always one of the candidate keys. Fairly often, it's the only candidate.
A table with no candidate keys does not represent a relation. If you're using the relational model to help you build a good database, then every table you design will have at least one candidate key.
The relational model would be complete without the concept of primary key. It wasn't in the original presentation of the relational model. As a practical matter, the use of foreign key references without a declared primary key leads to a mess. It could be a logically correct mess, but it's a mess nonetheless. Declaring a primary key lets the DBMS help you enforce the data rules. Most of the time, having the DBMS help you enforce the data rules is a good thing, and well worth the cost.
Some database designers and some users have some mental confusion about whether the primary key identifies a row (record) in a table or an instance of an entity in the subject matter that the table represents. In an ideal world, it's supposed to do both, and there should be a one-for-one correspondence between rows in an entity table and instances of the corresponding entity.
In the real world, things get screwed up. Somebody enters the same new employee twice, and the employee ends up with two ids. Somebody gets hired, but the data entry slips through the cracks in some manual process, and the employee doesn't get an id, until the omission is corrected. A database that does not collapse the first time things get screwed up is more robust than one that does.
Primary key -> Any column or set of columns that can uniquely identify a record in the table is a primary key. (There can be only one Primary key in the table)
Candidate key -> Any column or set of columns that are candidate to become primary key are Candidate key. (There can be one or more candidate key(s) in the table, if there is only one candidate key, it can be chosen as Primary key)
A Primary key is a special kind of index in that:
there can be only one;
it cannot be nullable
it must be unique.
Candidate keys are selected from the set of super keys, the only thing we take care while selecting the candidate key is: It should not have any redundant attribute.
Example of an Employee table:
Employee (
Employee ID,
FullName,
SSN,
DeptID
)
Candidate Key: are individual columns in a table that qualifies for the uniqueness of all the rows. Here in Employee table EmployeeID & SSN are Candidate keys.
Primary Key: are the columns you choose to maintain uniqueness in a table. Here in Employee table, you can choose either EmployeeID or SSN columns, EmployeeID is a preferable choice, as SSN is a secure value.
Alternate Key: Candidate column other the Primary column, like if EmployeeID is PK then SSN would be the Alternate key.
Super Key: If you add any other column/attribute to a Primary Key then it becomes a super key, like EmployeeID + FullName, is a Super Key.
Composite Key: If a table does not have a single column that qualifies for a Candidate key, then you have to select 2 or more columns to make a row unique. Like if there is no EmployeeID or SSN columns, then you can make FullName + DateOfBirth as Composite primary Key. But still, there can be a narrow chance of duplicate row.
There is no difference. A primary key is a candidate key. By convention one candidate key in a relation is usually chosen to be the "primary" one but the choice is essentially arbitrary and a matter of convenience for database users/designers/developers. It doesn't make a "primary" key fundamentally any different to any other candidate key.
A table can have so many column which can uniquely identify a row. This columns are referred as candidate keys, but primary key should be one of them because one primary key is enough for a table. So selection of primary key is important among so many candidate key. Thats the main difference.
Think of a table of vehicles with an integer Primary Key.
The registration number would be a candidate key.
In the real world registration numbers are subject change so it depends somewhat on the circumstances what might qualify as a candidate key.
Primary key -> Any column or set of columns that can uniquely identify a record in the table is a primary key. (There can be only one Primary key in the table) and
the candidate key-> the same as Primary key but the Primary Key chosen by DB administrator's prospective for example(the primary key the least candidate key in size)
A primary key is a column (or columns) in a table that uniquely identifies the rows in that table.
CUSTOMERS
CustomerNo FirstName LastName
1 Sally Thompson
2 Sally Henderson
3 Harry Henderson
4 Sandra Wellington
For example, in the table above, CustomerNo is the primary key.
The values placed in primary key columns must be unique for each row: no duplicates can be tolerated. In addition, nulls are not allowed in primary key columns.
So, having told you that it is possible to use one or more columns as a primary key, how do you decide which columns (and how many) to choose?
Well there are times when it is advisable or essential to use multiple columns. However, if you cannot see an immediate reason to use multiple columns, then use one. This isn't an absolute rule, it is simply advice. However, primary keys made up of single columns are generally easier to maintain and faster in operation. This means that if you query the database, you will usually get the answer back faster if the tables have single column primary keys.
Next question — which column should you pick? The easiest way to choose a column as a primary key (and a method that is reasonably commonly employed) is to get the database itself to automatically allocate a unique number to each row.
In a table of employees, clearly any column like FirstName is a poor choice since you cannot control employee's first names. Often there is only one choice for the primary key, as in the case above. However, if there is more than one, these can be described as 'candidate keys' — the name reflects that they are candidates for the responsible job of primary key.
If superkey is a big set than candidate key is some smaller set inside big set and primary key any one element(one at a time or for a table) in candidate key set.
First you have to know what is a determinant?
the determinant is an attribute that used to determine another attribute in the same table.
SO the determinant must be a candidate key. And you can have more than one determinant.
But primary key is used to determine the whole record and you can have only one primary key.
Both primary and candidate key can consist of one or more attributes

Database Design

I am making a webapp right now and I am trying to get my head around the database design.
I have a user model(username (which is primary key), password, email, website)
I have a entry model(id, title, content, comments, commentCount)
A user can only comment on an entry once. What is the best and most efficient way to go about doing this?
At the moment, I am thinking of another table that has username (from user model) and entry id (from entry model)
**username id**
Sonic 4
Sonic 5
Knuckles 2
Sonic 6
Amy 15
Sonic 20
Knuckles 5
Amy 4
So then to list comments for entry 4 it searches for id=4.
On a side note:
Instead of storing a commentCount, would it be better to calculate the comment count from the database each time when needed?
Your design is basically sound. Your third table should be named something like UsersEntriesComments, with fields UserName, EntryID and Comment. In this table, you would have a compound primary key consisting of the UserName and EntryID fields; this would enforce the rule that each user can comment on each entry only once. The table would also have foreign key constraints such that UserName must be in the Users table, and EntryID must be in the Entries table (the ID field, specifically).
You could add an ID field to the Users table, but many programmers (myself included) advocate the use of "natural" keys where possible. Since UserNames must be unique in your system, this is a perfectly valid (and easily readable) primary key.
Update: just read your question again. You don't need the Comments or the CommentsCount fields in your Entries table. Comments would properly be stored in the UsersEntriesComments table, and the counts would be calculated dynamically in your queries (saving you the trouble of updating this value yourself).
Update 2: James Black makes a good point in favor of not using UserName as the primary key, and instead adding an artificial primary key to the table (UserID or some such). If you use UserName as the primary key, allowing a user to change their user name is more difficult, as you have to change the username in all the related tables as well.
What exactly do you mean by
entry model(id, title, content, **comments**, commentCount)
(emphasis mine)? Since it looks like you have multiple comments per entity, they should be stored in a separate table:
comments(id, entry_id, content, user_id)
entry_id and user_id are foreign keys to respective tables. Now you just need to create a unique index on (entry_id, user_id) to ensure user can only add one comment per entity.
Also, you may want to create a surrogate (numeric, generated via sequence / identity) primary key for your users table instead of making user name your PK.
Here's my recommendation for your data model:
USERS table
USER_ID (pk, int)
USER_NAME
PASSWORD
EMAIL
WEBSITE
ENTRY table
ENTRY_ID (pk, int)
ENTRY_TITLE
CONTENT
ENTRY_COMMENTS table
ENTRY_ID (pk, fk)
USER_ID (pk, fk)
COMMENT
This setup allows an ENTRY to have 0+ comments. When a comment is added, the primary key being a composite key of ENTRY_ID and USER_ID means that the pair can only exist once in the table (IE: 1, 1 won't allow 1, 1 to be added again).
Do not store counts in a table - use a VIEW for that so the number can be generated based on existing data at the time of execution.
I wouldn't use the username as a primary ID. I would make a numeric id with autoincrement
I would use that new id in the relations table with a unique key on the 2 fields
Even though it isn't in the question, you may want to have a userid that is the primary key, otherwise it will be difficult if the user is allowed to change their username, or make certain people know you cannot change your username.
Make the joined table have a unique constraint on the userid and entryid. That way the database forces that there is only one comment/entry/user.
It would help if you specified a database, btw.
It sounds like you want to guarantee that the set of comments is unique with respect to username X post_id. You can do this by using a unique constraint, or if your database system doesn't support that explicitly, with an index that does the same. Here's some SQL expressing that:
CREATE TABLE users (
username VARCHAR(10) PRIMARY KEY,
-- any other data ...
);
CREATE TABLE posts (
post_id INTEGER PRIMARY KEY,
-- any other data ...
);
CREATE TABLE comments (
username VARCHAR(10) REFERENCES users(username),
post_id INTEGER REFERENCES posts(post_id),
-- any other data ...
UNIQUE (username, post_id) -- Here's the important bit!
);